Using sitemap.xml to expose unlinked pages
Use sitemap.xml to tell web robots about pages on your site that can't be reached by following links from other pages.
This type of page is often only accessible via a site's search function (e.g. searching a publications archive). Best practice in this situation is to prevent web crawlers from crawling your search results pages and to list all the individual URLs in a sitemap file.
This is far more efficient: the crawler discovers each content page exactly once and doesn't waste time crawling the many permutations of parameterised search results pages that this sort of search generates.
Sitemaps will be used by public search engines (like Google and Bing) as well as your integrated DXP search.
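For example, a publications archive whose pages are reachable only through a search form could be exposed with a sitemap file along the lines of the sketch below. The site and URLs are placeholders used for illustration only:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each publication page is listed individually, even though no page links to it -->
  <url>
    <loc>https://www.example.org/publications/annual-report-2023</loc>
    <lastmod>2024-03-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.org/publications/strategic-plan</loc>
    <lastmod>2024-06-15</lastmod>
  </url>
</urlset>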
Providing a sitemap
Configuring sitemaps requires you to:
- Configure your DXP Content Management site to build the sitemap.xml file(s).
- Add sitemap.xml links to your robots.txt file (see the example after this list).
- Turn on sitemap support in the DXP search web data source that indexes your site.
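A minimal robots.txt covering the second step might look like the sketch below. It also blocks the search results pages mentioned earlier; the /search path and the sitemap URL are assumptions for illustration, not values from any real configuration:

# Prevent crawlers from crawling parameterised search results pages
User-agent: *
Disallow: /search

# Tell crawlers (including the DXP search crawler) where the sitemap lives
Sitemap: https://www.example.org/sitemap.xml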
DXP search limitations
The DXP search has limited support for the sitemap.xml standard:
- Sitemaps (including nested sitemap files) are processed by DXP Search, but only for the purpose of discovering links. The links are extracted from the sitemap files and added to the list of URLs to be crawled if they pass the include/exclude patterns defined for the crawler (see the sketch after this list).
- Other directives within the sitemap.xml file (such as priority and changefreq) are currently ignored.
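To illustrate, a nested layout uses a sitemap index that points at child sitemap files; DXP Search follows the index, extracts the loc URLs from each child file, and checks them against the crawler's include/exclude patterns. The file names and URLs below are illustrative only:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.org/sitemap-publications.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.org/sitemap-news.xml</loc>
  </sitemap>
</sitemapindex>

An entry in one of the child sitemaps can still carry the optional hints, but they have no effect on DXP Search:

<url>
  <loc>https://www.example.org/publications/annual-report-2023</loc>
  <!-- priority and changefreq are valid sitemap directives, but DXP Search ignores them -->
  <priority>0.8</priority>
  <changefreq>monthly</changefreq>
</url>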