Content auditor

The content auditor report helps you understand and manage your content.

While the tool primarily focuses on your content’s metadata, it can also report on:

  • Readability of your content

  • Usage of undesirable words

  • When content was last updated, modified or published

  • Response times of your content

  • Discovery of duplicate content

The content auditor report can be filtered in several ways through the recommendations, overview, attributes and search results screens.

Accessing the content auditor report

You can access the content auditor report by selecting the content auditor tile from the insights dashboard, or by clicking the content auditor link on a results page's detail page in the search dashboard.

Using content auditor

Content auditor is available for every Funnelback results page from the insights dashboard, and by default provides a number of content-related reports.

Content auditor provides a range of sub-reports arranged in tabs, and options to navigate through your content by keywords (i.e. any valid Funnelback search query), by filtering the report based on URL prefixes, and by filtering based on any metadata value shown within the content auditor interface.

Content-auditor-recommendations.png
  1. Filter the report by URI path

  2. Filter the report by keyword(s)

  3. Content auditor sub-reports

  4. Content auditor sub-report content

The first sub-report within content auditor, as shown above, provides a range of recommendation reports which are associated with common content best practices. These reports are as follows:

Reading grade report

The reading grade chart reports on the reading grade level of pages in the index. The reading grade is assessed using the Flesch-Kincaid readability scale, which estimates the level of schooling a reader needs to understand the content. A grade of 8 corresponds to an 8th grade reading level, meaning text that is easily understood by a 13-14 year old in 8th grade.
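As an illustration of how a Flesch-Kincaid grade can be estimated, the following is a minimal sketch using the published grade-level formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59). The sentence splitting and the vowel-group syllable counter are simplifying assumptions made for this example; they are not a description of how content auditor tokenises text, so results may differ from the report.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels.

    This heuristic is an assumption for the example, not content auditor's method.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the Flesch-Kincaid grade level of a piece of text."""
    # Treat runs of ., ! or ? as sentence boundaries (simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Short sentences built from one-syllable words can produce a grade below zero, which is a known property of the formula rather than an error.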

reading grade chart 01

A reading grade level of 8-9 is considered plain English. For WCAG 2.0 accessibility compliance (Level AAA), the readability of your site's content should be secondary school level (grade 9) or lower. The pages audited for the chart above pass this AAA check.

Clicking on any of the bars in this report filters the recommendations data based upon that reading grade level. For example, clicking on the grade 5 bar shows an analysis of pages at that specific grade.

reading grade chart 02
  1. Indicates that a filter (reading grade = 5) has been applied to the report.

While this measure is a heuristic rather than an exact measurement, it may be useful in ensuring that you write website content at an appropriate level.

The range of 'green' grade levels can be configured with the ui.modern.content-auditor.reading-grade.lower-ok-limit and ui.modern.content-auditor.reading-grade.upper-ok-limit parameters.
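The effect of the two limits can be pictured with a small sketch. The function name and the limit values used in the usage line are hypothetical and chosen only for illustration; they are not the product defaults. Grades below the lower limit or above the upper limit are treated as problematic, and the limits themselves are assumed to count as acceptable.

```python
def classify_reading_grade(grade: float,
                           lower_ok_limit: float,
                           upper_ok_limit: float) -> str:
    """Return 'below', 'ok' or 'above' relative to the acceptable range.

    Assumption: the limits themselves are treated as acceptable.
    """
    if grade < lower_ok_limit:
        return "below"
    if grade > upper_ok_limit:
        return "above"
    return "ok"


# Hypothetical limits of 5 and 9, for illustration only.
labels = [classify_reading_grade(g, 5, 9) for g in (3, 5, 8, 12)]
```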

Missing metadata report

The missing metadata table identifies pages that are missing metadata fields that are configured for reporting within the content auditor.

missing metadata 01

By default the content auditor reports on author, format, language, publisher and subject metadata.

Clicking one of the items filters the report to show only items that are missing the selected field.

In the example above, the default metadata fields are missing from all pages. An additional metadata field, tags, is also reported on, and it is missing from 3581 pages on the site.

Duplicate titles report

The duplicate titles table lists titles that are found in more than one page. A duplicate title could indicate a duplicate content page, or a poorly titled page.

Duplicate titles should be avoided because they can confuse users.

duplicate titles 01

This table can be used to identify the pages with duplicate titles so that a web administrator can take the appropriate action (which could be removing duplicate content or re-titling a page to provide better context).

Clicking one of the items filters the report to show only items that have the selected title. Clicking the view all button opens up the attributes report.
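The idea behind this table can be sketched in a few lines: group pages by title and keep only the titles shared by more than one page. The input shape (pairs of URL and title) and the whitespace trimming are assumptions made for the example and do not describe how content auditor normalises titles.

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Group URLs by title and return only titles used by more than one page.

    `pages` is an iterable of (url, title) pairs (hypothetical input shape).
    """
    by_title = defaultdict(list)
    for url, title in pages:
        # Trimming surrounding whitespace is an assumption for this sketch.
        by_title[title.strip()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}
```

For example, passing two pages that are both titled "Contact us" and one titled "Home" returns a single entry for "Contact us" listing both URLs.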

For this report to be used with documents which are not originally HTML or filtered to HTML (such as XML records), a copy of the title metadata to be considered must be mapped to the FunDuplicateTitle metadata class.

Date modified report

The date modified report presents a chart of when pages / documents were last modified. This is based upon the metadata of the page/document and can be helpful to identify which documents should be updated and/or reviewed.

date modified 01

Hovering over a bar in the chart shows how many pages / documents were last modified in that timeframe. Clicking on a bar filters the report to the selected year.

The allowable document age before it is marked in red can be configured with the ui.modern.content-auditor.date-modified.ok-age-years setting.
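The age check can be pictured as a comparison between a document's last modified date and a cut-off that lies a configured number of years in the past. This is a minimal sketch under that assumption; the function name is hypothetical and the exact boundary handling in content auditor may differ.

```python
from datetime import date


def is_older_than_ok_age(last_modified: date, today: date, ok_age_years: int) -> bool:
    """Return True when the document was last modified before the cut-off date."""
    try:
        cutoff = today.replace(year=today.year - ok_age_years)
    except ValueError:
        # 29 February has no equivalent in a non-leap year; fall back to 28 February.
        cutoff = today.replace(year=today.year - ok_age_years, day=28)
    return last_modified < cutoff
```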

Response time report

The response time report provides you with a bar chart of the time taken to download documents. This may help identify pages / documents / sections or entire sites where response time is in need of improvement.

Note: the response time only tracks the time taken to retrieve the document itself and doesn't include linked resources. It is not the same as a page load time, which also includes the time taken to load these resources (such as images presented in an HTML page).
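To make the distinction concrete, the sketch below times the retrieval of a single document and nothing else: no images, stylesheets or scripts referenced by the page are requested. It is an illustration only and does not reflect how the crawler records response times; the function names and the injectable `fetch` parameter are assumptions made so the example can be tried without network access.

```python
import time
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Download the body of a single URL (no linked resources are fetched)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def time_document_fetch(url: str, fetch=http_get):
    """Return (elapsed_seconds, size_in_bytes) for retrieving one document."""
    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)
```

A browser's page load time for the same URL would normally be longer, because it also waits for the linked resources.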

Hovering over the bars in the chart provides additional information.

response time 01

Undesirable text report

The undesirable text table reports on undesirable words found within the content.

By default, the undesirable word list includes common misspellings, but it can be customised to identify organisation-specific undesirable words (such as specific words banned in editorial policies or other terms such as acronyms).
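Conceptually, the report counts how often each listed term appears in a page. The following is a minimal sketch of that idea using whole-word, case-insensitive matching; the matching rules and the function name are assumptions for the example and may not match how content auditor detects undesirable text.

```python
import re


def count_undesirable(text: str, undesirable_terms) -> dict:
    """Count whole-word, case-insensitive occurrences of each term in the text."""
    counts = {}
    for term in undesirable_terms:
        # \b anchors work for terms that start and end with a letter or digit.
        pattern = r"\b" + re.escape(term) + r"\b"
        matches = re.findall(pattern, text, flags=re.IGNORECASE)
        if matches:
            counts[term] = len(matches)
    return counts
```

With whole-word matching, a term such as "seperate" does not match inside "seperately", so each variant would need to be listed separately.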

undesirable text 01

Clicking one of the items filters the report to show only items that contain the selected word. Clicking the view all button opens up the attributes report.

Duplicate content

The duplicate content report shows documents whose content (or, if configured, some metadata) is duplicated by other documents. Duplicated content makes a site more difficult to navigate, and may also be penalized as a ranking factor by some search engines.

The ui.modern.content-auditor.collapsing-signature configuration parameter can be used to configure exactly what parts of documents are considered for duplication.
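The general idea of detecting duplicates with a signature can be sketched as follows: reduce each document to a fingerprint and group documents that share one. This example hashes whitespace- and case-normalised text with SHA-256, which is an assumption chosen for illustration; it is not a description of how the collapsing signature is actually computed.

```python
import hashlib
from collections import defaultdict


def content_signature(text: str) -> str:
    """Fingerprint of the text after lowercasing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_duplicates(documents: dict) -> list:
    """Return lists of URLs whose text shares the same signature.

    `documents` maps URL to extracted text (hypothetical input shape).
    """
    groups = defaultdict(list)
    for url, text in documents.items():
        groups[content_signature(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```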

Other content auditor reports

Content auditor overview

The overview tab reports on the top values found for each of the metadata fields covered by the content auditor report.

Metadata fields that are configured but missing in all the pages are suppressed.

Each metadata field can be explored further by clicking the corresponding view all button, or the report filtered to just the specific metadata value by clicking on one of the values.

content auditor overview 01

Content auditor attributes

The attributes tab provides a complete list of metadata values for each metadata field that is included in the content auditor report. Clicking on one of the values in the list will restrict subsequent reports to documents containing that metadata value.

content auditor attributes 01

Content auditor search results

The search results tab lists the individual pages that match the current search criteria. The criteria consist of any search terms entered into the search box, filtered by any of the metadata values selected on other screens.

The results are returned as a table that shows the metadata values for each item along with some tools linking in with other parts of the insights dashboard.

The table lists:

  • Title and URL of the document / page

  • File size

  • Last updated date

  • Format

  • Metadata that is configured to be reported on

  • Quick access to additional tools

The listing can be exported as CSV.
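Once exported, the listing can be analysed with ordinary CSV tooling. The sketch below tallies the values found in one column of an export. The column name is passed in by the caller because the actual headers depend on your export; the name "Format" in the usage line is an assumption for illustration only.

```python
import csv
import io
from collections import Counter


def tally_column(csv_text: str, column: str) -> Counter:
    """Count how often each non-empty value appears in the named column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader if row.get(column))


# Hypothetical export with assumed column headers.
sample = "Title,URL,Format\nHome,https://example.com/,html\nReport,https://example.com/r.pdf,pdf\n"
format_counts = tally_column(sample, "Format")
```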

Additional tools are available for each item in the listing providing quick access to the tool in the context of the selected page:

  • Analyse anchor tags: provides information about pages that link to the current page.

  • SEO auditor: loads the page into SEO auditor, allowing for analysis on how the page performs for specific search terms.

  • Check accessibility with WCAG auditor: provides information on how the page conforms to WCAG accessibility checks.

  • Preview the page / document: shows a thumbnail sized preview of the page.

  • View cached copy: loads the cached (or locally saved) copy of the page.

Troubleshooting content auditor issues

Reporting on custom document properties

The metadata scraper filter can be used to check documents for various properties by defining a number of content checking rules that are tailored for your content. These can then be included in your content auditor report.

Configuring content auditor

Content auditor can be configured in a number of ways to provide relevant reports for different data sets. Most customizations are made by setting results page configuration parameter keys.

The common customizations are:

The following is a full set of content auditor results page parameter keys:

ui.modern.content-auditor.collapsing-signature

Specify how results are determined to be duplicates within content auditor.

ui.modern.content-auditor.daat_limit

Specify how many results are examined in creating content auditor reports.

ui.modern.content-auditor.display-metadata.[metadataName]

Specify which metadata classes should be displayed within the content auditor’s search results tab.

ui.modern.content-auditor.facet-metadata.[metadata]

Specify which metadata classes should be displayed as facets.

ui.modern.content-auditor.max-metadata-facet-categories

Specify how many categories are displayed within each facet shown in content auditor.

ui.modern.content-auditor.num_ranks

Specify how many results are displayed in the content auditor search results tab.

ui.modern.content-auditor.overview-category-count

Specify how many facet categories are displayed for each facet.

ui.modern.content-auditor.preferred-facets

Specifies which content auditor facets will be displayed in the content auditor panel of the insights dashboard.

Further customizations can be implemented using search lifecycle plugins that target the Content Auditor search type.

The following is a full set of content auditor data source parameter keys:

filter.jsoup.undesirable_text-source.[key_name]

Specify sources of undesirable text strings to detect and present within content auditor.

ui.modern.content-auditor.count_urls

Define how deep into URLs Content Auditor users can navigate using facets.

ui.modern.content-auditor.date-modified.ok-age-years

Define how many years old a document may be before it is considered problematic.

ui.modern.content-auditor.duplicate_num_ranks

Define how many results should be considered in detecting duplicates for Content Auditor.

ui.modern.content-auditor.reading-grade.lower-ok-limit

Define the reading grade below which documents are considered problematic.

ui.modern.content-auditor.reading-grade.upper-ok-limit

Define the reading grade above which documents are considered problematic.