Using sitemap.xml to expose unlinked pages

Use a sitemap.xml file to tell web robots about pages on your site that cannot be reached by following links from other pages.

Pages like this are often reachable only through a site's search function (for example, by searching a publications archive). In this situation, best practice is to prevent web crawlers from crawling your search results pages and instead list every individual content URL in a sitemap file.

This is far more efficient: the crawler discovers each content page exactly once, instead of wasting time crawling the many parameterized permutations of search results pages that this kind of search typically generates.
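For reference, a sitemap file follows the sitemaps.org protocol: a <urlset> root element that contains one <url> entry, with a <loc> child, for each page. DXP builds these files for you (see step 1 under "Providing a sitemap"), so the following is only a minimal Python sketch of the file format, not a description of how DXP generates its sitemaps. The build_sitemap function and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap.xml text that lists each unique URL once, in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:  # list each content page only once
            continue
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL, as the protocol requires.
        ET.SubElement(entry, "loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


# Usage with hypothetical publication URLs.
print(build_sitemap([
    "https://www.example.com/publications/report-1",
    "https://www.example.com/publications/report-2?lang=en&format=pdf",
]))
```

The sitemaps.org protocol limits a single sitemap file to 50,000 URLs (and 50 MB uncompressed), so large archives are usually split across several files that are listed in a sitemap index file.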

Sitemaps are used both by public search engines (such as Google and Bing) and by your integrated DXP search.

Providing a sitemap

Configuring sitemaps requires you to:

  1. Configure your DXP Content Management site to build the sitemap.xml file(s).

  2. Add a link to each sitemap.xml file in your robots.txt file.

  3. Turn on sitemap support in the DXP search web data source that indexes your site.
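A quick way to sanity-check the robots.txt changes is Python's standard-library urllib.robotparser module (site_maps() requires Python 3.8 or later). The sketch below assumes a hypothetical robots.txt in which /search is the site's search results page: it blocks crawlers from the search results and advertises the sitemap with a Sitemap line. It only checks the standard robots.txt rules; it does not show how DXP Search itself reads the file, and the crawler name is made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /search is assumed to be the site's search results page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
parser.modified()  # record a fetch time so can_fetch() applies the parsed rules

# Sitemap lines are collected separately from the allow/disallow rules.
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
# Search results pages are blocked; individual content pages are not.
print(parser.can_fetch("ExampleCrawler", "https://www.example.com/search?q=annual+report"))  # False
print(parser.can_fetch("ExampleCrawler", "https://www.example.com/publications/report-1"))  # True
```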

DXP search limitations

DXP Search has only limited support for the sitemap.xml standard:

  • DXP Search processes sitemaps (including nested sitemap files), but only to discover links. Links extracted from the sitemap files are added to the list of URLs to crawl, provided they pass the include/exclude patterns defined for the crawler.

  • Other directives in the sitemap.xml file (such as priority and changefreq) are currently ignored.
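The link-discovery behaviour described above can be illustrated with a short Python sketch. It is not DXP Search's implementation: fetch is a hypothetical callable that returns a sitemap's XML, and include/exclude are treated as regular expressions (an assumption; the crawler's own pattern syntax may differ), with a URL kept only if it matches at least one include pattern and no exclude pattern. Nested sitemap index files are followed recursively, and only <loc> values are read, so priority and changefreq have no effect.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}


def discover_urls(sitemap_url, fetch, include, exclude, seen=None):
    """Return page URLs listed in a sitemap, following nested sitemap index files.

    fetch:   hypothetical callable that returns the XML text of a sitemap URL.
    include: regular expressions; a URL must match at least one (assumed semantics).
    exclude: regular expressions; a URL matching any of them is dropped.
    Only <loc> values are read; <priority> and <changefreq> are ignored.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # skip sitemaps already processed (avoids loops)
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    urls = []
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        # A sitemap index points at further sitemap files: recurse into each one.
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(discover_urls(loc.text.strip(), fetch, include, exclude, seen))
    else:
        # A regular <urlset>: keep each page URL that passes the patterns.
        for loc in root.findall("sm:url/sm:loc", NS):
            url = loc.text.strip()
            included = any(re.search(p, url) for p in include)
            excluded = any(re.search(p, url) for p in exclude)
            if included and not excluded:
                urls.append(url)
    return urls


# Usage with in-memory, hypothetical sitemap files.
SITEMAPS = {
    "https://www.example.com/sitemap.xml": (
        f'<sitemapindex xmlns="{SITEMAP_NS}">'
        "<sitemap><loc>https://www.example.com/sitemap-publications.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://www.example.com/sitemap-publications.xml": (
        f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>https://www.example.com/publications/report-1</loc>"
        "<priority>0.8</priority><changefreq>monthly</changefreq></url>"
        "<url><loc>https://www.example.com/drafts/report-2</loc></url>"
        "</urlset>"
    ),
}

found = discover_urls(
    "https://www.example.com/sitemap.xml",
    SITEMAPS.__getitem__,
    include=[r"^https://www\.example\.com/"],
    exclude=[r"/drafts/"],
)
print(found)  # ['https://www.example.com/publications/report-1']
```

Tracking already-visited sitemap URLs keeps the recursion from looping forever if a sitemap index (directly or indirectly) lists itself.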