Controlling which parts of your pages are indexed by the DXP search

This information only applies when using your website with DXP search.

Controlling which parts of a web page are considered when calculating search result relevance is both important and simple.

DXP search noindex tags

The DXP search provides noindex tags: HTML comment tags that can be included in site templates (and anywhere else appropriate). Areas of the page code that don’t contain page content should be excluded from consideration by the indexer, so that only the relevant content of each page is included in the index.

Noindex tags only hide content from the indexer; the crawler still follows any links within noindex regions.
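The Python sketch below illustrates this split between indexing and crawling. It is not the DXP indexer, whose internals this page does not describe: the class name `NoindexAwareParser` is hypothetical, and the sketch only understands the native `<!-- noindex -->` and `<!-- endnoindex -->` tags. Text inside a noindex region is dropped, but links found there are still collected for the crawler.

```python
from html.parser import HTMLParser


class NoindexAwareParser(HTMLParser):
    """Hypothetical model of the indexer/crawler split; not the DXP implementation."""

    def __init__(self):
        super().__init__()
        self.indexing = True  # content is indexed until a noindex tag is seen
        self.text = []        # text the indexer would keep
        self.links = []       # links the crawler would still follow

    def handle_comment(self, data):
        # data is the comment body, e.g. " noindex " for <!-- noindex -->
        tag = data.strip().lower()
        if tag == "noindex":
            self.indexing = False
        elif tag == "endnoindex":
            self.indexing = True

    def handle_starttag(self, tag, attrs):
        # Links are recorded whether or not indexing is currently switched off.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Only text outside noindex regions reaches the index.
        if self.indexing and data.strip():
            self.text.append(data.strip())


page = """
<!-- noindex -->
<nav><a href="/contact-us">Contact us</a></nav>
<!-- endnoindex -->
<main><p>Opening hours: 9am to 5pm.</p></main>
"""

parser = NoindexAwareParser()
parser.feed(page)
parser.close()
print(parser.text)   # ['Opening hours: 9am to 5pm.']
print(parser.links)  # ['/contact-us']
```

The "Contact us" link text never reaches `parser.text`, yet `/contact-us` is still available to the crawler, mirroring the behaviour described above.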

This has a few benefits. The most obvious is that search result relevance improves immediately, because a lot of noise is removed from the index. For example, a search for contact information won’t return every page on the site just because "contact us" appears in the site navigation.

A secondary benefit is that search result summaries become much more relevant, because snippet text only includes indexed content.

Applying noindex tags is as simple as adding <!-- noindex --> and <!-- endnoindex --> comment tags to your site templates. For example:

 ... This section is indexed ...
 <!-- noindex -->
 ... Text in this section is not indexed, but links are followed and the link graph information is recorded by DXP search for ranking purposes ...
 <!-- endnoindex -->
 ... This section is indexed ...

The idea is to put noindex tags around all templated site navigation, headers and footers. This prevents the search from returning every page in response to queries such as "about" and "contact", and also ensures that navigation and headers are excluded from contextual search summaries.

Noindex tags do not have to be specified in matching noindex/endnoindex pairs: the document is parsed from top to bottom, and indexing is turned off by each <!-- noindex --> tag and back on by each <!-- endnoindex --> tag it encounters. However, if you’ve used any <!-- noindex --> tags, don’t forget to put an <!-- endnoindex --> before the start of any content; otherwise the search indexer will have nothing to index.
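As a rough model of this top-to-bottom behaviour, the Python sketch below treats each tag as a switch rather than as one half of a pair. The function name `indexed_text` is hypothetical and not part of DXP search, and the sketch handles only the native tags. A second <!-- noindex --> while indexing is already off changes nothing, and a document that opens with <!-- noindex --> and never switches back yields no indexable text.

```python
import re

# The two native DXP comment tags; spacing inside the comment is tolerated.
TAG_RE = re.compile(r"<!--\s*(noindex|endnoindex)\s*-->", re.IGNORECASE)


def indexed_text(html: str) -> str:
    """Return the parts of html that would be indexed (hypothetical model).

    Each tag is a switch: noindex turns indexing off, endnoindex turns it
    back on. Tags are not required to come in matching pairs.
    """
    indexing = True
    pos = 0
    kept = []
    for match in TAG_RE.finditer(html):
        if indexing:
            kept.append(html[pos:match.start()])  # keep text before this tag
        indexing = match.group(1).lower() == "endnoindex"
        pos = match.end()
    if indexing:
        kept.append(html[pos:])  # trailing text after the last tag
    return " ".join(part.strip() for part in kept if part.strip())


# Two noindex tags in a row behave like one.
print(indexed_text("A <!-- noindex --> B <!-- noindex --> C <!-- endnoindex --> D"))  # A D
# Forgetting the endnoindex leaves nothing to index.
print(repr(indexed_text("<!-- noindex --> header <main>content</main>")))  # ''
```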

There are also some older Google-specific tags that provide equivalent noindex functionality.

The DXP search will recognize these as aliases for the noindex/endnoindex tags:

DXP search native tag   Google equivalent 1         Google equivalent 2
<!-- noindex -->        <!-- googleoff: index -->   <!-- googleoff: all -->
<!-- endnoindex -->     <!-- googleon: index -->    <!-- googleon: all -->
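One way to picture how these aliases are handled is as a normalisation pass that runs before the noindex/endnoindex switching. The Python sketch below is a hypothetical illustration (`ALIASES` and `normalise_tags` are invented names, not DXP APIs): it rewrites each recognised alias as its native tag and leaves every other comment untouched. It assumes the spacing shown in the table above, collapsing extra whitespace and ignoring case, but does not try to recognise other spellings.

```python
import re

# The alias table from this page: each recognised comment body maps to a native tag.
ALIASES = {
    "noindex": "noindex",
    "googleoff: index": "noindex",
    "googleoff: all": "noindex",
    "endnoindex": "endnoindex",
    "googleon: index": "endnoindex",
    "googleon: all": "endnoindex",
}

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def normalise_tags(html: str) -> str:
    """Rewrite recognised alias tags as native DXP tags (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        # Collapse runs of whitespace and ignore case before looking up the alias.
        key = " ".join(match.group(1).split()).lower()
        native = ALIASES.get(key)
        # Comments not in the table, such as googleoff: snippet, are left unchanged.
        return f"<!-- {native} -->" if native else match.group(0)

    return COMMENT_RE.sub(replace, html)


print(normalise_tags("<!-- googleoff: all -->nav<!-- googleon: index -->body"))
# <!-- noindex -->nav<!-- endnoindex -->body
```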

The googleoff/googleon anchor and snippet tags are not supported by the DXP search.

To see the effect of noindex tags on your own site:

  1. Index your site using DXP search and observe the search results when you search for something in your site navigation.

  2. Modify your site template to wrap the header, footer and site navigation in noindex tags.

  3. Clear the DXP Content Management cache.

  4. Run a full update of the search index and rerun the query for something in the site navigation. Observe that there are now fewer results, and that the summaries don’t show any content from the header, footer or navigation.
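Before clearing the cache and reindexing, it can help to check the modified template offline. The Python sketch below is a hypothetical pre-flight helper (`check_template` is an invented name, not a DXP feature). It applies the top-to-bottom switching described on this page, strips markup crudely with a regular expression, and warns if no text would be indexed or if any of the navigation phrases you pass in would still be indexed. It only understands the native tags and is no substitute for the reindex-and-search check in step 4.

```python
import re
from typing import List

TAG_RE = re.compile(r"<!--\s*(noindex|endnoindex)\s*-->", re.IGNORECASE)
MARKUP_RE = re.compile(r"<[^>]+>")  # crude: removes tags and any remaining comments


def check_template(html: str, nav_phrases: List[str]) -> List[str]:
    """Return warnings for an edited template (hypothetical pre-flight check)."""
    # Model the documented switching: keep only text outside noindex regions.
    indexing, pos, kept = True, 0, []
    for match in TAG_RE.finditer(html):
        if indexing:
            kept.append(html[pos:match.start()])
        indexing = match.group(1).lower() == "endnoindex"
        pos = match.end()
    if indexing:
        kept.append(html[pos:])

    # Strip markup and normalise whitespace so only visible text is compared.
    text = " ".join(MARKUP_RE.sub(" ", "".join(kept)).split())

    warnings = []
    if not text:
        warnings.append("nothing would be indexed: is an endnoindex tag missing?")
    for phrase in nav_phrases:
        if phrase.lower() in text.lower():
            warnings.append(f"still indexed: {phrase!r}")
    return warnings


template = (
    '<!-- noindex --><nav><a href="/contact-us">Contact us</a></nav>'
    '<!-- endnoindex --><main><p>Opening hours</p></main>'
)
print(check_template(template, ["Contact us"]))  # []
```

Passing the unmodified template instead would produce a "still indexed" warning for each navigation phrase, which is the offline counterpart of the many matching results observed in step 1.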