crawler.remove_parameters
Optional list of parameters to remove from URLs.
Key: crawler.remove_parameters
Type: String
Can be set in: collection.cfg
Description
A regular expression that matches portions of a URL to remove. Use it, for example, to strip a session-id or stylesheet parameter from every URL. If a URL matches the expression, the matching portion is removed before the URL is downloaded. The examples below write the expression with a regexp: prefix.
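The stripping step can be approximated in a few lines of Python. This is a minimal sketch, not the crawler's implementation: it assumes the value carries the regexp: prefix used in the examples and that every non-overlapping match is replaced with the empty string (Python's re.sub behaviour). The function name apply_remove_parameters and the sessionid parameter are hypothetical.

```python
import re

def apply_remove_parameters(url: str, setting: str) -> str:
    """Remove every portion of `url` matching a crawler.remove_parameters value.

    Assumption: `setting` starts with "regexp:" and all matches are removed.
    """
    prefix = "regexp:"
    if not setting.startswith(prefix):
        raise ValueError("this sketch only handles values with a 'regexp:' prefix")
    pattern = setting[len(prefix):]
    # Replace each match with nothing, i.e. strip it from the URL.
    return re.sub(pattern, "", url)

# Hypothetical setting that drops a session-id parameter.
setting = r"regexp:&sessionid=[^&]*"
print(apply_remove_parameters("http://example.com/page?a=1&sessionid=abc123&b=2", setting))
# → http://example.com/page?a=1&b=2
```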
Examples
To remove style=mediaRelease, stylesheet=mediaRelease and x=123 (or any other numeric value of x) from URLs:
crawler.remove_parameters=regexp:&style(sheet)?=mediaRelease|&x=\d+
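To see what this pattern does and does not catch, the sketch below applies it with Python's re.sub, assuming (as above) that every match is removed. Because each alternative begins with &, a matching parameter that directly follows ? is left in place. The function name strip_media_release is hypothetical.

```python
import re

# Pattern from the example above, without its "regexp:" prefix.
PATTERN = r"&style(sheet)?=mediaRelease|&x=\d+"

def strip_media_release(url: str) -> str:
    # Assumption: all matches are removed, as re.sub does by default.
    return re.sub(PATTERN, "", url)

print(strip_media_release("http://example.com/news?id=7&style=mediaRelease&x=123"))
# → http://example.com/news?id=7
print(strip_media_release("http://example.com/news?stylesheet=mediaRelease&id=7"))
# → unchanged, because no "&" precedes "stylesheet"
```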
To remove all parameters starting with utm_ from URLs:
crawler.remove_parameters=regexp:utm_[^=&]+=[^&]+&?
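A quick way to check this pattern is to run it through Python's re.sub, again assuming every match is removed. The trailing &? consumes the separator after each utm_ parameter, so parameters in the middle of the query string vanish cleanly; when a utm_ parameter comes last, the & before it is left behind. The function name strip_utm is hypothetical.

```python
import re

# Pattern from the example above, without its "regexp:" prefix.
UTM_PATTERN = r"utm_[^=&]+=[^&]+&?"

def strip_utm(url: str) -> str:
    # Assumption: all matches are removed, as re.sub does by default.
    return re.sub(UTM_PATTERN, "", url)

print(strip_utm("http://example.com/page?utm_source=news&utm_medium=email&id=42"))
# → http://example.com/page?id=42
print(strip_utm("http://example.com/page?id=42&utm_source=news"))
# → http://example.com/page?id=42&
```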