crawler.max_parse_size

Sets the maximum size of documents parsed by the crawler.

Key: crawler.max_parse_size
Type: Integer
Can be set in: collection.cfg

Description

The crawler stops parsing documents larger than the specified size (in MB), and their content is truncated. This applies only to documents whose MIME type is listed in the crawler.parser.mimeTypes parameter (e.g. HTML, text, XML). In this context, parsing means extracting links from these file types.
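The following is a minimal sketch of one reading of the behaviour described above. It is not Funnelback's implementation. The function name content_for_link_extraction, the MIME-type set standing in for crawler.parser.mimeTypes, and the conversion 1 MB = 1024 × 1024 bytes are all assumptions for illustration.

```python
from typing import Optional

# Assumption: binary megabytes (the page only says "MB").
BYTES_PER_MB = 1024 * 1024

# Hypothetical stand-in for the crawler.parser.mimeTypes parameter.
PARSER_MIME_TYPES = {"text/html", "text/plain", "text/xml"}


def content_for_link_extraction(mime_type: str, content: bytes,
                                max_parse_size_mb: int = 10) -> Optional[bytes]:
    """Hypothetical helper: return the bytes a link extractor would see.

    Returns None when the MIME type is not parsed for links, in which case
    the size limit does not apply at all.
    """
    if mime_type not in PARSER_MIME_TYPES:
        return None
    limit = max_parse_size_mb * BYTES_PER_MB
    # Content beyond the limit is cut off before link extraction.
    return content[:limit]
```

For example, a 12 MB HTML document would be cut to 10 MB under the default setting, while a PDF (not in the hypothetical MIME set) would not be considered for link extraction at all.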

Default Value

crawler.max_parse_size=10

Examples

Increase the limit to 15 MB:

crawler.max_parse_size=15
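To check which value a collection will use, a configuration like the one above can be read with a small script. This is a hypothetical sketch, not a Funnelback tool: the name read_max_parse_size is illustrative, and it assumes collection.cfg consists of simple key=value lines, that lines beginning with # are comments, and that a later setting overrides an earlier one. When the key is absent, it falls back to the documented default of 10.

```python
DEFAULT_MAX_PARSE_SIZE_MB = 10  # documented default value


def read_max_parse_size(cfg_text: str) -> int:
    """Hypothetical helper: extract crawler.max_parse_size from
    collection.cfg-style text, falling back to the default."""
    value = DEFAULT_MAX_PARSE_SIZE_MB
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, assumed "#" comments, and lines without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "crawler.max_parse_size":
            # Assumption: the last occurrence wins.
            value = int(val.strip())
    return value


print(read_max_parse_size("crawler.max_parse_size=15"))
```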