crawler.start_urls_file
Path to a file that contains a list of URLs (one per line) that will be used as the starting point for a crawl.
Key: crawler.start_urls_file
Type: String
Can be set in: collection.cfg
Description
The list of start URLs that will be initially crawled is a combination of all URLs declared in the file specified here and those which are in start_url.
Only use HTTP/HTTPS protocols in the URL.
Examples
crawler.start_urls_file=/conf/myurllist.txt
This file might then contain something like:
http://www.funnelback.com/news/index
http://www.mycompany.com/
https://some.secure.site.com/
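Because the file must hold one URL per line and only HTTP/HTTPS URLs should be used, it can help to check a list before pointing the crawler at it. The following is a minimal sketch of such a check, not part of Funnelback itself: the function name check_start_urls, the extra ftp:// entry and the choice to skip blank lines are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Schemes permitted for start URLs, per the description above.
ALLOWED_SCHEMES = {"http", "https"}


def check_start_urls(lines):
    """Split lines into (valid, rejected) URL lists.

    Hypothetical helper: blank lines are skipped (an assumption of this
    sketch, not documented crawler behaviour).
    """
    valid, rejected = [], []
    for raw in lines:
        url = raw.strip()
        if not url:
            continue
        parsed = urlparse(url)
        # Require an allowed scheme and a host component.
        if parsed.scheme in ALLOWED_SCHEMES and parsed.netloc:
            valid.append(url)
        else:
            rejected.append(url)
    return valid, rejected


# The three URLs from the example above, plus one hypothetical
# non-HTTP entry to show a rejection.
sample = [
    "http://www.funnelback.com/news/index",
    "http://www.mycompany.com/",
    "https://some.secure.site.com/",
    "ftp://files.example.com/",
]
valid, rejected = check_start_urls(sample)
```

To check a real file, the same function could be given the lines of the file, for example check_start_urls(open(path, encoding="utf-8")), where path is whatever location crawler.start_urls_file points to.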
⚠ Caveats
While permission to read and edit this key is configured by read.key.start_urls_file and edit.key.start_urls_file, to fully restrict the URLs that will be crawled, you will need to also consider sec.collection-start-urls, read.key.start_url and edit.key.start_url.