crawler.start_urls_file
Background
The list of start URLs that will be initially crawled is a combination of all URLs declared in the file specified here and those which are in start_url.
Every URL in the file must use the HTTP or HTTPS protocol.
Examples
crawler.start_urls_file=/conf/myurllist.txt
This file might then contain something like:
http://www.funnelback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/
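The way the two sources combine, as described in the Background, can be illustrated with a minimal Python sketch. This is not the crawler's own code. The function name combine_start_urls is hypothetical, and the whitespace-separated parsing, the removal of duplicates and the dropping of non-HTTP/HTTPS entries are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def combine_start_urls(file_text, start_url_values):
    """Merge URLs from a start URLs file with start_url values.

    Assumptions (not confirmed by the product documentation):
    - URLs in the file are separated by whitespace or newlines
    - duplicates are dropped, keeping the first occurrence
    - entries that are not http/https are skipped
    """
    candidates = file_text.split() + list(start_url_values)
    seen = set()
    combined = []
    for url in candidates:
        scheme = urlparse(url).scheme.lower()
        if scheme not in ("http", "https"):
            continue  # only HTTP/HTTPS URLs are allowed
        if url not in seen:
            seen.add(url)
            combined.append(url)
    return combined


if __name__ == "__main__":
    file_text = (
        "http://www.funnelback.com/news/index\n"
        "http://www.mycompany.com/\n"
        "https://some.secure.site.com/\n"
    )
    # A hypothetical start_url value that repeats one entry from the file.
    for url in combine_start_urls(file_text, ["http://www.mycompany.com/"]):
        print(url)
```

A check like this can be useful for validating a URL list file before pointing crawler.start_urls_file at it.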
Notes
While permission to read and edit this key is configured by read.key.start_urls_file and edit.key.start_urls_file, to fully restrict the URLs that will be crawled you will also need to consider sec.collection-start-urls, read.key.start_url and edit.key.start_url.