crawler.start_urls_file

Background

The initial list of start URLs to crawl combines all URLs listed in the file specified by this key with those set in start_url.
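As an illustration of this combination, the sketch below merges the URLs read from the start URLs file with the start_url values. The function name combine_start_urls is hypothetical, and the de-duplication and first-seen ordering are assumptions made for the example, not documented crawler behaviour.

```python
def combine_start_urls(file_urls: list[str], start_url: list[str]) -> list[str]:
    """Merge URLs from the start URLs file with those from start_url.

    Hypothetical sketch: duplicates are dropped and first-seen order is kept
    (both are assumptions, not documented crawler behaviour).
    """
    seen: set[str] = set()
    combined: list[str] = []
    # File URLs come first here purely for illustration.
    for url in [*file_urls, *start_url]:
        if url not in seen:
            seen.add(url)
            combined.append(url)
    return combined
```

For example, combining `["http://www.mycompany.com/"]` with `["http://www.mycompany.com/", "https://some.secure.site.com/"]` yields the two distinct URLs once each.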

Each URL must use the HTTP or HTTPS protocol.
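A quick way to check a candidate URL before adding it to the file is to inspect its scheme. The sketch below uses Python's standard urllib.parse.urlparse; the helper name is_allowed_start_url is hypothetical, and requiring a host name is an extra assumption made for the example.

```python
from urllib.parse import urlparse

# Only these protocols are accepted in start URLs.
ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_start_url(url: str) -> bool:
    """Return True if url uses HTTP or HTTPS and names a host (hypothetical helper)."""
    parsed = urlparse(url.strip())
    # urlparse puts a bare "www.example.com" into the path, so netloc is empty
    # and the URL is rejected because it has no scheme.
    return parsed.scheme.lower() in ALLOWED_SCHEMES and bool(parsed.netloc)
```

With this check, `https://some.secure.site.com/` is accepted, while `ftp://files.example.com/` and a bare `www.mycompany.com` are rejected.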

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.start_urls_file key and set its value. The value can be any valid String.

Default value

crawler.start_urls_file=collection.cfg.start.urls
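To show how the default applies when the key is not set, the sketch below reads key=value lines and falls back to collection.cfg.start.urls. The function name start_urls_file, the handling of `#` comment lines, and the "last occurrence wins" rule are assumptions made for illustration, not a description of how the product parses its configuration.

```python
DEFAULT_START_URLS_FILE = "collection.cfg.start.urls"


def start_urls_file(config_lines: list[str]) -> str:
    """Return the crawler.start_urls_file value, or the default if unset.

    Hypothetical sketch assuming simple key=value lines; skipping '#' lines
    and letting the last occurrence win are assumptions.
    """
    value = DEFAULT_START_URLS_FILE
    for raw in config_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "crawler.start_urls_file":
            value = val.strip()
    return value
```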

Examples

crawler.start_urls_file=/conf/myurllist.txt

This file might then contain something like:

http://www.funnelback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/
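Assuming the file holds one URL per line, as in the example above, its contents could be read with a sketch like the following. The function name read_start_urls is hypothetical, and skipping blank lines is an assumption rather than documented behaviour.

```python
def read_start_urls(text: str) -> list[str]:
    """Split start URLs file content into a list of URLs.

    Hypothetical sketch assuming one URL per line; blank lines are skipped
    and surrounding whitespace is trimmed (assumptions).
    """
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Applied to the example file content above, this returns the three URLs in the order they appear.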

Notes

Permission to read and edit this key is controlled by read.key.start_urls_file and edit.key.start_urls_file. To fully restrict which URLs are crawled, you also need to consider sec.collection-start-urls, read.key.start_url and edit.key.start_url.