start_url
Background
This option is the list of URLs from which the web crawler should start crawling. All links found on the pages are followed according to the include/exclude patterns.
The crawler will start crawling both from the URLs in this setting and from the URLs in the file specified by crawler.start_urls_file.
URLs must use the HTTP or HTTPS protocol.
Within configuration files, the format is a space-separated list of URLs.
Examples
To configure the crawler to start crawling from the following two URLs:
http://www.company.com/ http://store.company.com/
start_url=http://www.company.com/ http://store.company.com/
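The format rules above (a space-separated list, HTTP/HTTPS only, an empty value permitted) can be checked before a configuration is deployed. The following is a minimal Python sketch, not part of the product: the function name parse_start_url and the constant ALLOWED_SCHEMES are hypothetical, and the sketch assumes the setting appears on a single key=value line as in the example above.

```python
from urllib.parse import urlsplit

# Hypothetical helper, not part of the crawler itself.
ALLOWED_SCHEMES = {"http", "https"}


def parse_start_url(line):
    """Parse a 'start_url=...' configuration line into a list of URLs.

    Raises ValueError if the line is not a start_url setting or if any
    URL does not use HTTP or HTTPS.
    """
    # Split on the first '=' only, so '=' inside a query string is kept.
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "start_url":
        raise ValueError("not a start_url setting: %r" % line)

    # The value is a space-separated list; an empty value yields [].
    urls = value.split()

    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
            raise ValueError("start URL must be HTTP or HTTPS: %r" % url)
    return urls


if __name__ == "__main__":
    example = "start_url=http://www.company.com/ http://store.company.com/"
    for url in parse_start_url(example):
        print(url)
```

An empty value (start_url=) is accepted and returns an empty list, which matches the case where all start URLs come from crawler.start_urls_file.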
Notes
The key must be set for all web and matrix data sources, even if all URLs would come from crawler.start_urls_file. In that case the value can be set to empty.
Permission to read and edit this key is configured by read.key.start_url and edit.key.start_url. To fully restrict the URLs that will be crawled, you will also need to consider sec.collection-start-urls, read.key.start_urls_file and edit.key.start_urls_file.