crawler.request_header

Background

This parameter specifies an optional additional header to insert into HTTP(S) requests made by the web crawler. For example, sending a Cookie header may help the web crawler gain access to a website that uses cookies to store login information. An alternative approach is to configure form interaction to log in to a specific site, using either in_crawl entries (crawler.form_interaction.in_crawl.[groupId].url_pattern) or pre_crawl entries (crawler.form_interaction.pre_crawl.[groupId].url).

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.request_header key, and set the value. This can be set to any valid String value.

Default value

(Empty)

Examples

Send a cookie string:

crawler.request_header=Cookie: phpbb2mysql_data=xyx; phpbb2mysql_sid=123

This cookie information can be obtained by loading the relevant website in a web browser and examining the cookies it tries to set and store.

Notes:

  • If sending cookie strings, set crawler.accept_cookies to "false" so that the cookie strings you send are not overridden.

  • You will probably also want to set the crawler.request_header_url_prefix parameter to limit which URLs the crawler sends these request headers to.
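
Taken together, the notes above suggest a configuration along the following lines. This is only a sketch: the cookie values are the placeholders from the example above, and the URL prefix (https://forum.example.com/) is a hypothetical value chosen for illustration. Check the documentation for crawler.request_header_url_prefix for the exact value format it expects.

```ini
# Send the login cookie with crawler requests (placeholder cookie values)
crawler.request_header=Cookie: phpbb2mysql_data=xyx; phpbb2mysql_sid=123

# Assumed example: only send the header to URLs under this hypothetical prefix
crawler.request_header_url_prefix=https://forum.example.com/

# Stop cookies set by the site from overriding the cookie string above
crawler.accept_cookies=false
```

Restricting the header to a URL prefix helps avoid sending session cookies to unrelated sites that the crawl may also visit.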