crawler.remove_parameters
Background
This setting takes a regular expression that removes portions of a URL, for example a session-id or stylesheet parameter that should be stripped from every URL. If a URL matches the regular expression, the matching portion is stripped off before the URL is downloaded.
Examples
To remove style=mediaRelease, stylesheet=mediaRelease and x=123 from URLs:
crawler.remove_parameters=regexp:&style(sheet)?=mediaRelease|&x=\d+
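The effect of this pattern can be previewed outside the crawler. The following is a minimal Python sketch, not the crawler's own code. It assumes that every match is removed, that the regexp: prefix is not part of the expression itself, and that Python's re module treats this pattern the same way as the crawler's regular expression engine. The function name strip_parameters and the example.com URLs are hypothetical.

```python
import re

# Expression copied from the example above, without the "regexp:" prefix.
# Assumption: the crawler removes every match, which re.sub also does.
PATTERN = re.compile(r"&style(sheet)?=mediaRelease|&x=\d+")


def strip_parameters(url: str) -> str:
    """Return url with every portion matching PATTERN removed (hypothetical helper)."""
    return PATTERN.sub("", url)


if __name__ == "__main__":
    print(strip_parameters("http://example.com/page?id=7&style=mediaRelease&x=123"))
    # -> http://example.com/page?id=7
    print(strip_parameters("http://example.com/page?x=123&id=7"))
    # -> http://example.com/page?x=123&id=7 (unchanged: no "&" precedes x)
```

Because each alternative in the expression begins with &, a parameter that directly follows the ? is left in place in this sketch.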
To remove all parameters starting with utm_ from URLs:
crawler.remove_parameters=regexp:utm_[^=&]+=[^&]+&?
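As a rough check of the utm_ expression, the Python sketch below applies it to a few hypothetical URLs. It is an illustration under assumptions rather than the crawler's implementation: it assumes that all matches are removed and that Python's re module interprets the pattern the same way as the crawler does. The function name strip_utm and the example.com URLs are made up for the example.

```python
import re

# Expression copied from the example above, without the "regexp:" prefix.
# Assumption: the crawler removes every match, which re.sub also does.
UTM_PATTERN = re.compile(r"utm_[^=&]+=[^&]+&?")


def strip_utm(url: str) -> str:
    """Return url with every utm_* parameter removed (hypothetical helper)."""
    return UTM_PATTERN.sub("", url)


if __name__ == "__main__":
    print(strip_utm("http://example.com/page?utm_source=news&utm_medium=email&id=7"))
    # -> http://example.com/page?id=7
    print(strip_utm("http://example.com/page?id=7&utm_campaign=spring"))
    # -> http://example.com/page?id=7&
```

The optional &? at the end of the expression consumes the separator that follows a parameter, not the one before it, so in this sketch a utm_ parameter in last position leaves a trailing & behind.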