Plugin: Date filter
Other versions of this plugin may exist. Please ensure you are viewing the documentation for the version that you are currently using. If you are not running the latest version of your plugin, we recommend upgrading. See: list of all available versions of this plugin.
Purpose
Excludes documents in an XML or HTML data source based on a date/time contained in the document itself.
The most common use case for this filter plugin is to exclude social media posts older than a given date from being included in the search index.
Usage
Enable the plugin
Enable the date-filter plugin on your data source from the Extensions screen in the administration dashboard, or add the following keys to the data source configuration:
plugin.date-filter.enabled=true
plugin.date-filter.version=1.0.0
This plugin requires a full update of the data source to take effect.
Plugin configuration settings
The DateFilter filter must be added to the filter chain for the plugin to work correctly. Add the filter to the filter.classes key in the data source configuration:
filter.classes=<OTHER-FILTERS>:com.funnelback.plugin.datefilter.DateFilter:<OTHER-FILTERS>
The filter should be placed at an appropriate position in the filter chain. In most circumstances it should be located towards the end of the filter chain.
The following options must be set (for both XML and HTML filtering) in the data source configuration to configure the plugin:
- plugin.date-filter.config.unit: Required. Specifies the unit of time used to calculate whether the document should be filtered. Valid values are 'YEARS', 'MONTHS', 'DAYS', 'HOURS', 'MINUTES'.
- plugin.date-filter.config.amount: Required. Specifies the number of units (of the unit set above) used to calculate whether the document should be filtered.
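As a rough illustration of how the unit and amount settings combine, the sketch below computes a cutoff date and compares a document date against it. It assumes the plugin subtracts the configured amount from the current time and filters documents dated before the result; the class and method names are hypothetical and are not part of the plugin.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper, not plugin code: shows how unit and amount define a cutoff. */
final class DateCutoffSketch {

    /** Returns the point in time before which a document counts as too old. */
    static ZonedDateTime cutoff(ZonedDateTime now, String unit, long amount) {
        // ChronoUnit constants are named YEARS, MONTHS, DAYS, HOURS and MINUTES,
        // the same names as the valid values of plugin.date-filter.config.unit.
        return now.minus(amount, ChronoUnit.valueOf(unit));
    }

    /** True when the document date falls before the cutoff (assumed filter rule). */
    static boolean isTooOld(ZonedDateTime documentDate, ZonedDateTime now,
                            String unit, long amount) {
        return documentDate.isBefore(cutoff(now, unit, amount));
    }

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.of(2024, 3, 31, 12, 0, 0, 0, ZoneOffset.UTC);
        ZonedDateTime post = ZonedDateTime.of(2024, 2, 15, 9, 30, 0, 0, ZoneOffset.UTC);

        // Cutoff is 1 March 2024, so a post from 15 February is too old.
        System.out.println(isTooOld(post, now, "DAYS", 30));   // true
        // Cutoff is 31 January 2024, so the same post is kept.
        System.out.println(isTooOld(post, now, "MONTHS", 2));  // false
    }
}
```

The two calls show that the same document can be kept or filtered depending only on the unit and amount chosen.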
For XML data sources, the following option must be set if you are not using the built-in 'Facebook', 'YouTube' or 'Twitter' data source types:
- plugin.date-filter.config.record_type: Specifies the type of custom XML data source. Valid values are 'instagram' (when using the Stencils Instagram gatherer) or 'custom'.
Also for XML data sources, the following options must be set if plugin.date-filter.config.record_type is set to custom:
- plugin.date-filter.config.date_element: Specifies the XML element name (located at the root of the record's XML) that contains the date/time value to be used for filtering.
- plugin.date-filter.config.date_format: Specifies the format of the date/time value in that XML element. The value must be a valid Java date format string.
If you are using the plugin to filter HTML documents, the following options are mandatory:
- plugin.date-filter.config.date_format: Specifies the format of the date/time value in the element that contains it. The value must be a valid Java date format string.
- plugin.date-filter.config.jsoup_selector: Specifies the Jsoup selector for the element that contains the date/time value to be used for filtering.
If the date is held in an attribute of the element, you can extract it using the following setting:
- plugin.date-filter.config.jsoup_selector.attribute: Specifies the attribute to extract the date from.
Example - exclude items older than 30 days
For a custom XML record:
<item>
<title><![CDATA[Example record]]></title>
<timestamp>2000-12-24T04:35:21+1100</timestamp>
<description><![CDATA[Example description]]></description>
</item>
Exclude records older than 30 days from a custom XML data source using the 'timestamp' element:
plugin.date-filter.config.unit=DAYS
plugin.date-filter.config.amount=30
plugin.date-filter.config.record_type=custom
plugin.date-filter.config.date_element=timestamp
plugin.date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ
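To make the custom XML settings above more concrete, the following sketch reads the 'timestamp' element from the example record, parses it with the configured date format, and checks it against a 30 day cutoff. It is an illustration only: it assumes the format string is interpreted with java.time.format.DateTimeFormatter (the page only says it must be a valid Java date format string), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical illustration, not plugin code. */
final class XmlRecordDateSketch {

    /** Returns the text of a direct child of the record root, or null if absent. */
    static String childText(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Only direct children of the root are inspected, mirroring date_element.
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(elementName)) {
                    return n.getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse record XML", e);
        }
    }

    /** True when the record's date is more than the given number of days before now. */
    static boolean isOlderThanDays(String xml, String dateElement, String dateFormat,
                                   OffsetDateTime now, long days) {
        String value = childText(xml, dateElement);
        if (value == null) {
            throw new IllegalArgumentException("No <" + dateElement + "> element in record");
        }
        OffsetDateTime recordDate =
                OffsetDateTime.parse(value, DateTimeFormatter.ofPattern(dateFormat));
        return recordDate.isBefore(now.minusDays(days));
    }

    public static void main(String[] args) {
        String record = "<item><title><![CDATA[Example record]]></title>"
                + "<timestamp>2000-12-24T04:35:21+1100</timestamp>"
                + "<description><![CDATA[Example description]]></description></item>";

        System.out.println(childText(record, "timestamp")); // 2000-12-24T04:35:21+1100
        // The record dates from the year 2000, so it is older than 30 days.
        System.out.println(isOlderThanDays(record, "timestamp",
                "yyyy-MM-dd'T'HH:mm:ssZ", OffsetDateTime.now(), 30)); // true
    }
}
```

If the format string did not match the element's content, the parse call would throw a DateTimeParseException, which is why date_format must agree with the data.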
For an HTML document that has the date in a meta tag:
<html>
<head>
<meta name="created_date" content="2005-10-12">
</head>
<body>
<h1>Title</h1>
</body>
</html>
Keep only documents that are less than 1 year old:
plugin.date-filter.config.unit=YEARS
plugin.date-filter.config.amount=1
plugin.date-filter.config.date_format=yyyy-MM-dd
plugin.date-filter.config.jsoup_selector=meta[name=created_date]
plugin.date-filter.config.jsoup_selector.attribute=content
If the date is in tag content:
<span class="date">October 12th, 2015</span>
use:
plugin.date-filter.config.jsoup_selector=span.date
For tag attributes:
<span data-date="12/10/2005">...</span>
use:
plugin.date-filter.config.jsoup_selector=span[data-date]
plugin.date-filter.config.jsoup_selector.attribute=data-date
NOTE:
The date format key (for example plugin.date-filter.config.date_format=yyyy-MM-dd) must be configured to match the document's date format.
For example:
If the document has 2023-07-23 13:55:05.0 CEST in the date field that is being processed, the date format key should be configured as plugin.date-filter.config.date_format=yyyy-MM-dd HH:mm:ss.S z.
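A quick way to check a candidate format string before running a full update is to try it against a sample value taken from a document. The sketch below is one possible check, assuming the format is interpreted with java.time.format.DateTimeFormatter; the plugin may parse dates differently, and the class name is hypothetical. It uses simple date values of the kind shown in the examples on this page.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

/** Hypothetical helper, not plugin code: tests a date format against a sample value. */
final class DateFormatCheckSketch {

    /** True when the sample value can be parsed with the given pattern. */
    static boolean matches(String pattern, String sample) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH).parse(sample);
            return true;
        } catch (DateTimeParseException e) {
            // The sample does not follow the pattern.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("yyyy-MM-dd", "2005-10-12")); // true
        System.out.println(matches("yyyy-MM-dd", "12/10/2005")); // false
        System.out.println(matches("dd/MM/yyyy", "12/10/2005")); // true
    }
}
```

Note that a value such as 12/10/2005 also parses with MM/dd/yyyy, so a successful parse does not prove that day and month are in the intended order.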
Upgrade notes
Upgrading from xml-date-filter plugin
This plugin supersedes the xml-date-filter plugin, and any data sources that use the xml-date-filter plugin should be upgraded to use this plugin. To upgrade, rename the configuration keys for the xml-date-filter plugin (plugin.xml-date-filter.<configuration key>) to the corresponding keys in this plugin (plugin.date-filter.<configuration key>).
The quickest way to achieve this and update all of the keys is to use the | menu item from the menu located above the configuration keys listing.
The example below shows a set of old plugin.xml-date-filter plugin keys upgraded to the plugin.date-filter plugin. When upgrading, ensure you have the version key set to the correct version number for the new plugin.
xml-date-filter plugin:
plugin.xml-date-filter.enabled=true
plugin.xml-date-filter.version=1.0.0
plugin.xml-date-filter.config.unit=DAYS
plugin.xml-date-filter.config.amount=30
plugin.xml-date-filter.config.record_type=custom
plugin.xml-date-filter.config.date_element=timestamp
plugin.xml-date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ
date-filter plugin:
plugin.date-filter.enabled=true
plugin.date-filter.version=1.0.0
plugin.date-filter.config.unit=DAYS
plugin.date-filter.config.amount=30
plugin.date-filter.config.record_type=custom
plugin.date-filter.config.date_element=timestamp
plugin.date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ
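If the configuration is being migrated outside the administration dashboard, the key rename described in this section amounts to a prefix substitution. The sketch below shows that substitution on a list of configuration lines. It is a hypothetical helper, not a tool shipped with the product, and it does not adjust the version value, which still needs to be checked by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not plugin code: renames xml-date-filter keys to date-filter keys. */
final class PluginKeyRenameSketch {

    private static final String OLD_PREFIX = "plugin.xml-date-filter.";
    private static final String NEW_PREFIX = "plugin.date-filter.";

    /** Returns the lines with the old plugin prefix replaced; other lines are unchanged. */
    static List<String> renameKeys(List<String> lines) {
        List<String> renamed = new ArrayList<>();
        for (String line : lines) {
            renamed.add(line.startsWith(OLD_PREFIX)
                    ? NEW_PREFIX + line.substring(OLD_PREFIX.length())
                    : line);
        }
        return renamed;
    }

    public static void main(String[] args) {
        List<String> oldConfig = Arrays.asList(
                "plugin.xml-date-filter.enabled=true",
                "plugin.xml-date-filter.config.unit=DAYS",
                "plugin.xml-date-filter.config.amount=30");
        // Prints each line with the plugin.date-filter. prefix.
        for (String line : renameKeys(oldConfig)) {
            System.out.println(line);
        }
    }
}
```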