crawler.request_header
Background
This parameter specifies an optional additional header to be inserted in HTTP(S) requests
made by the web crawler. For example, sending a Cookie header may help the web crawler gain access
to a website which uses cookies to store login information. An alternative approach is to log in to
a specific site using form interaction entries, either in_crawl
(crawler.form_interaction.in_crawl.[groupId].url_pattern) or pre_crawl
(crawler.form_interaction.pre_crawl.[groupId].url).
Examples
Send a cookie string:
crawler.request_header=Cookie: phpbb2mysql_data=xyx; phpbb2mysql_sid=123
This cookie information can be obtained by loading the relevant website in a web browser and then examining the cookies the site tries to set and store.
Notes:
- If sending cookie strings, you should set crawler.accept_cookies to "false", to avoid the cookie strings you are trying to send being overridden.
- You will probably want to use the crawler.request_header_url_prefix parameter as well, to limit which URLs the crawler sends these request headers to.
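
The two notes above can be combined with the cookie example into a single configuration. The following is only a sketch: the cookie values are the placeholders from the example above, the site http://forum.example.com/ is hypothetical, and it assumes that crawler.request_header_url_prefix takes a plain URL prefix as its value and that lines starting with "#" are treated as comments.

```
# Send the login cookie with crawler requests (placeholder values).
crawler.request_header=Cookie: phpbb2mysql_data=xyx; phpbb2mysql_sid=123

# Stop cookies set by the site from overriding the cookie string above.
crawler.accept_cookies=false

# Assumption: only URLs starting with this (hypothetical) prefix receive the header.
crawler.request_header_url_prefix=http://forum.example.com/
```

Restricting the header to one prefix keeps the login cookie from being sent to other sites that the crawler visits.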