start_url

Background

This option lists the URLs from which the web crawler starts crawling. Links found on the crawled pages are followed according to the include/exclude patterns.

The crawler starts from the URLs in this setting and from the URLs in the file specified by crawler.start_urls_file.

URLs must use the HTTP or HTTPS protocol.

Within configuration files, the format is a space-separated list of URLs.
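
The format rules above (space-separated values, HTTP/HTTPS only) can be checked before a value is saved. The following is a minimal Python sketch, not the crawler's own parser: the function name parse_start_urls and the choice to raise ValueError are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Schemes the key accepts, per the rule above.
ALLOWED_SCHEMES = {"http", "https"}


def parse_start_urls(value):
    """Split a start_url value on whitespace and check each URL.

    Hypothetical helper: returns the list of URLs, or raises
    ValueError for the first entry that is not an HTTP/HTTPS URL.
    """
    urls = value.split()  # an empty value yields an empty list
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc:
            raise ValueError(f"unsupported start URL: {url!r}")
    return urls
```

An empty value passes the check and produces an empty list, which matches the case where all start URLs come from crawler.start_urls_file.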

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the start_url key, and set the value. This can be set to any valid List<String> value.

Default value

By default the option is empty.

start_url=

Examples

To configure the crawler to start crawling from the following two URLs:

http://www.company.com/
http://store.company.com/

set the key as follows:

start_url=http://www.company.com/ http://store.company.com/
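
When the list of start URLs is generated by a script, the configuration line can be assembled from it. The sketch below is illustrative only and assumes a hypothetical helper named format_start_url. Because the value is space-separated, it rejects any URL that contains whitespace; such a URL would need to be percent-encoded first.

```python
def format_start_url(urls):
    """Build a start_url configuration line from a list of URLs.

    Hypothetical helper: whitespace inside a URL would be read as a
    separator between two URLs, so such entries are rejected.
    """
    for url in urls:
        if not url or any(ch.isspace() for ch in url):
            raise ValueError(f"URL is empty or contains whitespace: {url!r}")
    return "start_url=" + " ".join(urls)


line = format_start_url(["http://www.company.com/", "http://store.company.com/"])
print(line)
```

Passing an empty list produces the line for the default (empty) value.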

Notes

The key must be set for all web and matrix data sources, even if all URLs come from crawler.start_urls_file. In that case the value can be left empty.

Permission to read and edit this key is configured by read.key.start_url and edit.key.start_url. To fully restrict the URLs that will be crawled, you also need to consider sec.collection-start-urls, read.key.start_urls_file and edit.key.start_urls_file.