Plugin: Date filter

Other versions of this plugin may exist. Please ensure you are viewing the documentation for the version that you are currently using. If you are not running the latest version of the plugin, we recommend upgrading. See: list of all available versions of this plugin.

Purpose

Excludes documents in an XML or HTML data source based on a date/time contained in the document itself.

The most common use case for this filter plugin is to exclude social media posts older than a given date from being included in the search index.

Usage

Enable the plugin

Enable the date-filter plugin on your data source from the Extensions screen in the administration dashboard, or add the following keys to the data source configuration:

plugin.date-filter.enabled=true
plugin.date-filter.version=1.0.0

This plugin requires a full update of the data source to take effect.

Plugin configuration settings

The DateFilter filter must be added to the filter chain for the plugin to work correctly. Add it to the filter.classes key in the data source configuration:

filter.classes=<OTHER-FILTERS>:com.funnelback.plugin.datefilter.DateFilter:<OTHER-FILTERS>

The filter should be placed at an appropriate position in the filter chain. In most circumstances it should be located towards the end of the filter chain.

The following options must be set in the data source configuration for both XML and HTML filtering:

  • plugin.date-filter.config.unit: Required. Specifies the unit of time used to calculate whether the document should be filtered. Valid values are 'YEARS', 'MONTHS', 'DAYS', 'HOURS', 'MINUTES'.

  • plugin.date-filter.config.amount: Required. Specifies the number of units (as set in plugin.date-filter.config.unit) used to calculate whether the document should be filtered.
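
To illustrate how a unit and an amount combine into a cutoff, here is a minimal Java sketch. It is not the plugin's code: the class CutoffSketch and its methods are hypothetical, and it assumes a document is excluded when its date falls strictly before "now minus amount units", which matches the "older than 30 days" wording used in the example further down.

```java
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of the plugin: shows how a unit/amount pair
// can be turned into a cutoff date.
class CutoffSketch {

    // Cutoff = reference time minus `amount` of `unit`.
    // The documented unit values (YEARS, MONTHS, DAYS, HOURS, MINUTES)
    // happen to match java.time.temporal.ChronoUnit constant names.
    static ZonedDateTime cutoff(ZonedDateTime now, String unit, long amount) {
        return now.minus(amount, ChronoUnit.valueOf(unit));
    }

    // Assumption: a document is excluded when its date is strictly before the cutoff.
    static boolean isExcluded(ZonedDateTime documentDate, ZonedDateTime now,
                              String unit, long amount) {
        return documentDate.isBefore(cutoff(now, unit, amount));
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.parse("2024-03-31T12:00:00Z");
        ZonedDateTime recent = ZonedDateTime.parse("2024-03-15T09:00:00Z");
        ZonedDateTime old = ZonedDateTime.parse("2024-01-10T09:00:00Z");

        System.out.println(isExcluded(recent, now, "DAYS", 30)); // false (kept)
        System.out.println(isExcluded(old, now, "DAYS", 30));    // true (excluded)
    }
}
```

With unit=DAYS and amount=30, the cutoff in this sketch is 30 days before the reference time, so only the January document is excluded.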

For an XML data source, the following option must be set if you are not using the built-in 'Facebook', 'YouTube' or 'Twitter' data source types:

  • plugin.date-filter.config.record_type: Specifies the type of custom XML data source. Valid values are 'instagram' (when using the Stencils Instagram gatherer) or 'custom'.

Also, for the XML data source, the following options must be set if plugin.date-filter.config.record_type is set to custom:

  • plugin.date-filter.config.date_element: Specifies the XML element name (located at the root of the record’s XML) that contains the date/time value to be used for filtering.

  • plugin.date-filter.config.date_format: Specifies the format of the date/time value in the XML element that contains the date/time value. Value must be a valid Java date format string.

If you are using the plugin to filter HTML documents, the following options are mandatory:

  • plugin.date-filter.config.date_format: Specifies the format of the date/time value in the element that contains the date/time value. Value must be a valid Java date format string.

  • plugin.date-filter.config.jsoup_selector: Specifies the Jsoup selector for the element that contains the date/time value to be used for filtering.

If the date is in an attribute of the element, you can extract it using the following setting:

  • plugin.date-filter.config.jsoup_selector.attribute: Specifies the attribute to extract the date from.

Example - exclude items older than 30 days

For a custom XML record:

<item>
    <title><![CDATA[Example record]]></title>
    <timestamp>2000-12-24T04:35:21+1100</timestamp>
    <description><![CDATA[Example description]]></description>
</item>

Exclude records older than 30 days from a custom XML data source using the 'timestamp' element:

plugin.date-filter.config.unit=DAYS
plugin.date-filter.config.amount=30
plugin.date-filter.config.record_type=custom
plugin.date-filter.config.date_element=timestamp
plugin.date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ

For an HTML document that has the date in a meta tag:

<html>
<body>
<h1>Title</h1>

META tag <meta name="created_date" content="2005-10-12">

</body>
</html>

Keep documents that are newer than 1 year:

plugin.date-filter.config.unit=YEARS
plugin.date-filter.config.amount=1
plugin.date-filter.config.date_format=yyyy-MM-dd
plugin.date-filter.config.jsoup_selector=meta[name=created_date]
plugin.date-filter.config.jsoup_selector.attribute=content

If the date is in tag content:

<span class="date">October 12th, 2015</span>

use:

plugin.date-filter.config.jsoup_selector=span.date

For tag attributes:

<span data-date="12/10/2005">...</span>

use:

plugin.date-filter.config.jsoup_selector=span[data-date]
plugin.date-filter.config.jsoup_selector.attribute=data-date

NOTE: The date format key (for example, plugin.date-filter.config.date_format=yyyy-MM-dd) must be configured to match the document’s date format.
For example, if the date field being processed contains 2023-07-23 13:55:05.0 CEST, the date format key should be configured as plugin.date-filter.config.date_format=yyyy-MM-dd HH:mm:ss.S z.
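
One way to check a pattern against a sample value before a full update is a small standalone test. The sketch below is only illustrative: the class DatePatternCheck is hypothetical, and it assumes java.text.SimpleDateFormat semantics, which may differ from the parser the plugin uses internally. The sample values are taken from the examples above; reading 12/10/2005 as day/month/year is also an assumption.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of the plugin: checks whether a sample
// date string is fully consumed by a given Java date pattern.
class DatePatternCheck {

    // Assumption: java.text.SimpleDateFormat semantics, strict (non-lenient) parsing.
    static boolean matches(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        Date parsed = format.parse(value, position);
        // Require the whole value to be consumed, not just a leading part.
        return parsed != null && position.getIndex() == value.length();
    }

    public static void main(String[] args) {
        // Custom XML record example.
        System.out.println(matches("yyyy-MM-dd'T'HH:mm:ssZ", "2000-12-24T04:35:21+1100")); // true
        // Meta tag example.
        System.out.println(matches("yyyy-MM-dd", "2005-10-12")); // true
        // Pattern that does not match the value.
        System.out.println(matches("yyyy-MM-dd", "12/10/2005")); // false
        // Assumed day/month/year reading of the data-date example.
        System.out.println(matches("dd/MM/yyyy", "12/10/2005")); // true
    }
}
```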

Upgrade notes

Upgrading from xml-date-filter plugin

This plugin supersedes the xml-date-filter plugin, and any data sources that use the xml-date-filter plugin should be upgraded to use this plugin. To upgrade, rename the configuration keys for the xml-date-filter plugin (plugin.xml-date-filter.<configuration key>) to the corresponding keys in this plugin (plugin.date-filter.<configuration key>).

The quickest way to update all of the keys is to use the tools > edit raw data menu item from the menu located above the configuration keys listing.

The example below shows a set of xml-date-filter plugin keys upgraded to the date-filter plugin. When upgrading, ensure the version key is set to the correct version number for the new plugin.

xml-date-filter plugin:

plugin.xml-date-filter.enabled=true
plugin.xml-date-filter.version=1.0.0
plugin.xml-date-filter.config.unit=DAYS
plugin.xml-date-filter.config.amount=30
plugin.xml-date-filter.config.record_type=custom
plugin.xml-date-filter.config.date_element=timestamp
plugin.xml-date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ

date-filter plugin:

plugin.date-filter.enabled=true
plugin.date-filter.version=1.0.0
plugin.date-filter.config.unit=DAYS
plugin.date-filter.config.amount=30
plugin.date-filter.config.record_type=custom
plugin.date-filter.config.date_element=timestamp
plugin.date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ
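
The rename is a plain prefix substitution on each key. As a rough illustration only, the following Java sketch applies it to an in-memory map of configuration keys. The class KeyRenameSketch is hypothetical and is not a Funnelback tool; it does not read or write any data source configuration, and it leaves the version value untouched, so that still needs to be checked by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Funnelback tool: renames xml-date-filter keys
// to their date-filter equivalents in an in-memory map.
class KeyRenameSketch {
    static final String OLD_PREFIX = "plugin.xml-date-filter.";
    static final String NEW_PREFIX = "plugin.date-filter.";

    static Map<String, String> rename(Map<String, String> config) {
        // LinkedHashMap keeps the original key order.
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(OLD_PREFIX)) {
                key = NEW_PREFIX + key.substring(OLD_PREFIX.length());
            }
            // Values are copied unchanged, including the version number.
            renamed.put(key, entry.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, String> old = new LinkedHashMap<>();
        old.put("plugin.xml-date-filter.enabled", "true");
        old.put("plugin.xml-date-filter.config.unit", "DAYS");
        old.put("filter.classes", "<OTHER-FILTERS>");
        // Prints each key=value pair with the new prefix applied.
        rename(old).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```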
