Content auditor - reports are missing

Problem description

One or more of the content auditor's reading grade level, undesirable text or duplicate titles reports are missing.

Solution

This is usually caused by customization of the filters that are run for a data source.

Check the data source configuration for each data source that is part of your search package.

  1. Check whether filter.classes is set in your data source configuration. If it is set, ensure that JSoupProcessingFilterProvider is included in the listed set of filters.

    If filter.classes is not set, your data source uses the default filter chain and Jsoup filtering is enabled.

    e.g. this is the default filter chain:

    filter.classes=TikaFilterProvider,ExternalFilterProvider:JSoupProcessingFilterProvider:DocumentFixerFilterProvider

  2. Check whether filter.jsoup.classes is set in your data source configuration. If it is set, ensure that the ContentGeneratorUrlDetection, FleschKincaidGradeLevel, UndesirableText and TitleDuplicates filters are included in the listed set of filters.

    filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,UndesirableText,TitleDuplicates
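
The two checks above can be sketched as a small script. This is only an illustration, not a product tool: it assumes the data source configuration is available as plain key=value text, and the function names (parse_config, short_names, check_config) are hypothetical. It also assumes that a fully qualified class name should match on its final segment.

```python
import re

# Filter names taken from the steps above.
REQUIRED_FILTER = "JSoupProcessingFilterProvider"
REQUIRED_JSOUP_FILTERS = [
    "ContentGeneratorUrlDetection",
    "FleschKincaidGradeLevel",
    "UndesirableText",
    "TitleDuplicates",
]


def parse_config(text):
    """Parse key=value lines, skipping blank lines and # comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def short_names(value):
    """Split a filter list on ',' and ':' and keep the last dotted segment of each name."""
    parts = (part.strip() for part in re.split(r"[,:]", value))
    return {part.rsplit(".", 1)[-1] for part in parts if part}


def check_config(text):
    """Return a list of problems; an empty list means both checks passed."""
    settings = parse_config(text)
    problems = []

    # Step 1: only relevant when filter.classes is set; otherwise the default chain applies.
    if "filter.classes" in settings:
        if REQUIRED_FILTER not in short_names(settings["filter.classes"]):
            problems.append("filter.classes is missing " + REQUIRED_FILTER)

    # Step 2: only relevant when filter.jsoup.classes is set.
    if "filter.jsoup.classes" in settings:
        present = short_names(settings["filter.jsoup.classes"])
        for name in REQUIRED_JSOUP_FILTERS:
            if name not in present:
                problems.append("filter.jsoup.classes is missing " + name)

    return problems


if __name__ == "__main__":
    # Made-up configuration used purely for demonstration.
    sample = (
        "filter.classes=TikaFilterProvider,ExternalFilterProvider:DocumentFixerFilterProvider\n"
        "filter.jsoup.classes=ContentGeneratorUrlDetection,FleschKincaidGradeLevel,TitleDuplicates\n"
    )
    for problem in check_config(sample):
        print(problem)
```

Run the check against the configuration of each data source in the search package. A data source that sets neither key produces no problems, because the defaults apply.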