Crawler out of memory: java.lang.OutOfMemoryError: PermGen space

Description

This error occurs when the web crawler's Java virtual machine runs out of its PermGen (permanent generation) memory allocation.

Error message

Displayed in the url_errors.log file.

[Crawl URLs: java.lang.OutOfMemoryError: PermGen space]

Cause

The PermGen space allocated to the web crawler's Java virtual machine has been used up. PermGen is a memory allocation separate from the heap and is used to store class metadata. The default allocation is quite small, at only about 25MB, and can be consumed by complex collection configurations, given the large number of classes now included in the Funnelback class path. The JVM memory usage, including PermGen, can be checked by running the Java jmap utility with the -heap option from the terminal:

jmap -heap <jvm_process_id>
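jmap inspects a separate, running JVM process. As a rough cross-check from inside a JVM, the same memory pools can be read through the standard java.lang.management API. The sketch below is illustrative only: the class name PermGenUsage is hypothetical, and it assumes HotSpot pool naming, where the class metadata pool name contains "Perm Gen" (for example "PS Perm Gen") on Java 7 and earlier, and is called "Metaspace" on Java 8 and later, which replaced PermGen. It reports on the JVM it runs in, not on the crawler process, so it only describes the crawler if called from code loaded into the crawler's JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints usage of the pool that holds class metadata.
public class PermGenUsage {
    private static final long MB = 1024L * 1024L;

    // Assumes HotSpot pool names: "... Perm Gen" up to Java 7, "Metaspace" from Java 8.
    static boolean isClassMetadataPool(String name) {
        return name.contains("Perm Gen") || name.equals("Metaspace");
    }

    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
        if (usage == null) {
            return pool.getName() + ": not valid";
        }
        long max = usage.getMax(); // -1 when no maximum is defined
        String maxText = max < 0 ? "undefined" : (max / MB) + " MB";
        return String.format("%s: used %d MB, committed %d MB, max %s",
                pool.getName(), usage.getUsed() / MB, usage.getCommitted() / MB, maxText);
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (isClassMetadataPool(pool.getName())) {
                System.out.println(describe(pool));
            }
        }
    }
}
```

If "used" is close to "max" for the PermGen pool, the crawler is likely to hit the error described above.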

Resolution

  1. Increase the allocated PermGen memory by adding the following Java option to the collection's collection.cfg:

    java_options=-XX:MaxPermSize=128m
  2. Start a new crawl so that the crawler's JVM is launched with the new option.
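To confirm that the option actually reached a JVM, its startup arguments can be inspected with RuntimeMXBean.getInputArguments() from the standard java.lang.management API. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, it handles only the k/m/g size suffixes used in values such as 128m, and it keeps the last -XX:MaxPermSize occurrence if the flag is repeated. Like any in-process check, it only describes the JVM it runs in. Note that Java 8 and later replaced PermGen with Metaspace and do not honour -XX:MaxPermSize.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the -XX:MaxPermSize value passed to this JVM.
public class MaxPermSizeCheck {
    private static final String FLAG = "-XX:MaxPermSize=";

    // Converts a size such as "128m" to bytes; only k, m and g suffixes are handled.
    static long parseSize(String value) {
        char unit = Character.toLowerCase(value.charAt(value.length() - 1));
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default: return Long.parseLong(value); // plain byte count
        }
        return Long.parseLong(value.substring(0, value.length() - 1)) * multiplier;
    }

    // Returns the last -XX:MaxPermSize value in bytes, or -1 if the flag is absent.
    static long maxPermSizeBytes(List<String> jvmArgs) {
        long result = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                result = parseSize(arg.substring(FLAG.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long bytes = maxPermSizeBytes(jvmArgs);
        if (bytes < 0) {
            System.out.println("-XX:MaxPermSize not set; the JVM default applies");
        } else {
            System.out.println("MaxPermSize = " + (bytes / (1024L * 1024L)) + " MB");
        }
    }
}
```

With java_options=-XX:MaxPermSize=128m applied, this sketch would report 128 MB.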