crawler.remove_parameters

Background

This key holds a regular expression that removes portions of a URL, for example a session ID or stylesheet parameter that should be dropped from all URLs. If a URL matches the regular expression, the matching portion is stripped before the URL is downloaded.
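
The stripping behaviour can be sketched in Python as follows. This is a minimal illustration, not the crawler's own code: the helper name strip_matches, the sessionid pattern and the example URL are all hypothetical, and the sketch assumes every match is replaced with an empty string.

```python
import re


def strip_matches(url: str, pattern: str) -> str:
    """Return url with every portion matching pattern removed.

    Hypothetical helper: assumes each match is simply deleted.
    """
    return re.sub(pattern, "", url)


# Hypothetical pattern and URL, chosen only for illustration.
pattern = r"&sessionid=[^&]+"
url = "http://example.com/page?id=7&sessionid=abc123&lang=en"

cleaned = strip_matches(url, pattern)
print(cleaned)  # http://example.com/page?id=7&lang=en
```

Only the matched text is removed, so the rest of the URL, including the other parameters, is left as it was.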

Setting the key

Set this configuration key in the search package or data source configuration.

Use the configuration key editor to add or edit the crawler.remove_parameters key and set its value. The value can be any valid String.

Default value

crawler.remove_parameters=

Examples

To remove style=mediaRelease, stylesheet=mediaRelease and any numeric x parameter (such as x=123) from URLs:

crawler.remove_parameters=regexp:&style(sheet)?=mediaRelease|&x=\d+

To remove all parameters starting with utm_ from URLs:

crawler.remove_parameters=regexp:utm_[^=&]+=[^&]+&?
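
Before deploying a pattern it can help to try it against sample URLs. The sketch below applies the two patterns above using Python's re module. It assumes that the regexp: prefix is a marker rather than part of the pattern, that matches are deleted, and that Python's regex syntax behaves the same as the crawler's for these simple patterns. The helper name and the sample URLs are hypothetical.

```python
import re

# The two patterns from the examples, without the "regexp:" prefix
# (assumed here to be a marker, not part of the expression).
STYLE_AND_X = r"&style(sheet)?=mediaRelease|&x=\d+"
UTM = r"utm_[^=&]+=[^&]+&?"


def strip_matches(url: str, pattern: str) -> str:
    """Hypothetical helper: delete every match of pattern from url."""
    return re.sub(pattern, "", url)


# Both unwanted parameters follow an "&", so both are removed.
a = strip_matches(
    "http://example.com/news?id=5&stylesheet=mediaRelease&x=123", STYLE_AND_X
)
print(a)  # http://example.com/news?id=5

# A parameter directly after "?" has no leading "&", so it is not matched.
b = strip_matches("http://example.com/news?style=mediaRelease&id=5", STYLE_AND_X)
print(b)  # http://example.com/news?style=mediaRelease&id=5

# utm_ parameters at the start of the query string are removed cleanly.
c = strip_matches("http://example.com/?utm_source=mail&utm_medium=email&id=5", UTM)
print(c)  # http://example.com/?id=5

# A utm_ parameter at the end leaves its preceding "&" behind.
d = strip_matches("http://example.com/?id=5&utm_source=mail", UTM)
print(d)  # http://example.com/?id=5&
```

The second and fourth cases show that parameter position matters: a pattern anchored on a leading & misses the first parameter in the query string, and a pattern that only consumes a trailing & can leave a dangling separator.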

See also