crawler.num_crawlers

Number of crawler threads that simultaneously crawl different hosts.

Key: crawler.num_crawlers
Type: Integer
Can be set in: collection.cfg

Description

This option specifies how many crawler threads are created to download pages from a set of web sites. Note that with the default frontier, only one thread accesses a given host at a time. For example, if you have 10 threads but are crawling only one site, 9 threads will be idle.
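The one-thread-per-host behaviour can be illustrated with a rough sketch. This is not crawler code: the function name `thread_usage` and the seed URLs are hypothetical, and the sketch simply assumes that the number of busy threads is capped by the number of distinct hosts.

```python
from urllib.parse import urlsplit


def thread_usage(seed_urls, num_crawlers=20):
    """Estimate (busy, idle) thread counts, assuming one thread per host.

    Hypothetical helper for illustration only; it is not part of the crawler.
    """
    # Collect the distinct host names that appear in the seed URLs.
    hosts = {urlsplit(url).hostname for url in seed_urls}
    hosts.discard(None)  # ignore entries without a parsable host

    # At most one thread can work on each host at a time.
    busy = min(num_crawlers, len(hosts))
    return busy, num_crawlers - busy


# Ten threads, but both URLs are on the same host.
print(thread_usage(["https://example.com/a", "https://example.com/b"],
                   num_crawlers=10))  # prints (1, 9)
```

Under this assumption, raising crawler.num_crawlers above the number of distinct hosts only adds idle threads.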

Default Value

crawler.num_crawlers=20

Examples

If you have a small number of distinct web sites to crawl, then you might decide to reduce the number of threads:

crawler.num_crawlers=10

Leaving the extra threads in place should not have any performance impact on the system, however, as they will be idle if there is no site for them to crawl.

© 2015- Squiz Pty Ltd