crawler.request_header

Optional additional header to be inserted in HTTP(S) requests made by the web crawler.

Key: crawler.request_header
Type: String
Can be set in: collection.cfg

Description

This parameter specifies an optional additional header to be inserted in HTTP(S) requests made by the web crawler. For example, sending a Cookie header may help the crawler gain access to a website that uses cookies to store login information. An alternative approach is to configure form interaction to log in to a specific site, using either in_crawl entries (crawler.form_interaction.in_crawl.[groupId].url_pattern) or pre_crawl entries (crawler.form_interaction.pre_crawl.[groupId].url).

Default Value

(Empty)

Examples

Send a cookie string:

crawler.request_header=Cookie: phpbb2mysql_data=xyx; phpbb2mysql_sid=123

This cookie information can be obtained by loading the relevant website in a web browser and then examining the cookies it tries to set and store.

Notes

  • If sending cookie strings, set crawler.accept_cookies to "false" so that the cookie strings you are trying to send are not overridden.

  • You will probably also want to use the crawler.request_header_url_prefix parameter to limit which URLs the crawler sends these request headers to.
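
Taken together, the two notes above suggest a collection.cfg fragment along the following lines. This is only a sketch: the cookie value is the one from the example above, the URL https://forum.example.com/ is a hypothetical placeholder, and it assumes that crawler.request_header_url_prefix accepts a single URL prefix. Check that parameter's own documentation before relying on it.

crawler.request_header=Cookie: phpbb2mysql_data=xyx; phpbb2mysql_sid=123
crawler.accept_cookies=false
crawler.request_header_url_prefix=https://forum.example.com/

With settings like these, the intent is that the cookie header is sent only to URLs under the given prefix, and that cookies returned by the site do not replace the configured values.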

© 2015- Squiz Pty Ltd