Define canonical URLs

Websites often serve the same (or very similar) content via multiple URLs. For example, you might have region-specific subdomains that share many content pages, or dynamically generated pages where the set of parameters supplied varies but the same content is returned.

Search indexers have some intelligence to automatically detect duplication, but this can fail if the pages are slightly different (e.g. different navigation items or hidden content). Canonical URLs can be used to overcome this issue: a canonical URL tag in the page (typically a link element with rel="canonical") defines what URL should be assigned when indexing that page. The specified URL has to return the page. If the same canonical URL is specified on multiple URL variants, only the first variant encountered will be indexed and the others will be skipped by the indexer (as the URL will already be in the index).
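
The skipping behaviour described above can be sketched in Python. This is an illustration only, not the DXP indexer's implementation: the names CanonicalParser, extract_canonical and index_pages are hypothetical, and the sketch assumes the canonical URL is declared with a standard link element carrying rel="canonical".

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is used; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"]


def extract_canonical(html):
    """Return the canonical URL declared in the page, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def index_pages(pages):
    """Hypothetical indexer: pages is an iterable of (fetched_url, html).

    Documents are keyed by canonical URL (falling back to the fetched URL).
    Returns a dict mapping each key to the fetched URL that was stored.
    """
    index = {}
    for fetched_url, html in pages:
        key = extract_canonical(html) or fetched_url
        if key in index:
            continue  # key already present: this variant is skipped
        index[key] = fetched_url
    return index
```

With two regional variants that both declare the same canonical URL, only the first one fetched ends up in the index, stored under the canonical URL rather than the URL it was fetched from.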

The other main use for canonical URLs is to define a sensible URL for a page. This is particularly relevant when you have dynamically generated pages with many varying request parameters, or Matrix pages that are accessed using ?a URLs.
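
One way to derive a sensible URL for a dynamically generated page is to keep only the request parameters that actually change the content. The Python sketch below is one possible approach under that assumption; the function name canonical_url, the parameter names and the example URLs are all hypothetical, and which parameters count as significant depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, significant_params):
    """Build a canonical URL by dropping parameters that do not affect content.

    significant_params is the (assumed, site-specific) set of parameter
    names that select different content.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in significant_params
    ]
    kept.sort()  # stable ordering so parameter order does not create variants
    # Lowercase the host and drop the fragment, which is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

For example, canonical_url("https://www.example.com/products?sessionid=abc&id=42#top", {"id"}) yields "https://www.example.com/products?id=42", and any other variant that differs only in session or tracking parameters maps to the same value.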

Use canonical URLs with care. The DXP search uses the canonical URL as the key under which the document is stored, and the same key is used for duplicate detection, so incorrectly specified canonical URLs may result in items not being added to the index. A common error is to set the canonical URL of every page on a site to the site's home page. This results in only a single page being stored in the search index.
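
A simple pre-indexing check can catch this kind of mistake. The Python sketch below is a hypothetical audit, not a DXP feature: it takes a mapping of fetched URL to declared canonical URL and reports any canonical URL claimed by more than a given share of pages. The function name find_overused_canonicals and the 50% default threshold are arbitrary assumptions.

```python
from collections import Counter


def find_overused_canonicals(page_canonicals, max_share=0.5):
    """Report canonical URLs that are declared by a suspiciously large share of pages.

    page_canonicals maps each fetched URL to the canonical URL it declares.
    Returns a dict of canonical URL -> number of pages declaring it, limited
    to canonicals shared by more than one page and by more than max_share of
    all pages.
    """
    total = len(page_canonicals)
    if total == 0:
        return {}
    counts = Counter(page_canonicals.values())
    return {
        canonical: count
        for canonical, count in counts.items()
        if count > 1 and count / total > max_share
    }
```

If every page on a site declares the home page as its canonical URL, the home page is reported with a count equal to the number of pages, which signals that only one document would be stored.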