Plugin: Date filter

Purpose

Use this plugin if you need to exclude documents in an XML or HTML data source based on a date/time contained in the document itself.

When to use this plugin

The primary use for this plugin is with social media repositories that do not provide any options for restricting which items are included in the search. For example, you can use it to exclude social media posts older than a given date.

This has the benefit of:

  • removing the older and less relevant content from your search results.

  • ensuring that your license document count isn’t eaten up by a lot of old social media content.

Also use this plugin when you have date-based content that you wish to exclude from the index, and you don’t have access to other methods of excluding the content (such as modifying the include/exclude conditions of your crawl, or the API/database query that is executed).

When not to use this plugin

Do not use this plugin if your data source has another supported method of excluding the items.

This plugin removes items by date, but only after the document has been retrieved from the data source. If the data source has an alternative method of excluding this content (e.g. by altering the API query, or via configurable include/exclude rules), the built-in method will be significantly faster because the excluded data is never gathered from the data source.

For example, consider the scenario where you are gathering content from a social media repository and only want to include today’s items.

If you use this plugin, you will fetch all items from the repository (only limited by any other include/exclude rules you have defined on the data source), then the plugin will check the date and skip old items.

If, instead, the data source lets you alter the API query that is run, you fetch only today’s items. This is significantly faster and can also result in fewer API calls to the social media platform, which is important for platforms where the number of API calls is limited.

Usage

Enable the plugin

  1. Select Plugins from the side navigation pane and click on the Date filter tile.

  2. From the Location section, use the Select a data source list to choose the data source on which you would like to enable this plugin.

The plugin takes effect once the setup steps are complete and an advanced > full update of the data source has finished.

Configuration settings

The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.

The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.

Record type

Configuration key

plugin.date-filter.config.record_type

Data type

string

Allowed values

NONE,FACEBOOK_POST,FACEBOOK_EVENT,TWITTER,YOUTUBE,INSTAGRAM,CUSTOM

Required

This setting is optional

Specifies the type of custom XML data source. Choose from a preset social media source, or define a custom source.

The social media presets can be used to quickly apply your date filter to Facebook posts and events, Twitter tweets, YouTube videos and Instagram posts.

If you select a CUSTOM record type, you also need to define the source field and date format using the corresponding configuration settings.

Date filter period - type of time units

Configuration key

plugin.date-filter.config.unit

Data type

string

Allowed values

YEARS,MONTHS,DAYS,HOURS,MINUTES

Required

This setting is required

Specifies the unit of time used to calculate the cutoff date; items older than this date are filtered out.

Date filter period - number of time units

Configuration key

plugin.date-filter.config.amount

Data type

integer

Required

This setting is required

Specifies the number of time units (of the type set above) used to calculate whether a document should be filtered.
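As a rough illustration of how the unit and amount settings combine, the sketch below computes a cutoff by subtracting the configured period from the current time and treats anything earlier as excluded. This is an assumption about the behaviour, not the plugin's actual code: the class and method names are hypothetical, and this documentation does not state how a date that falls exactly on the cutoff is handled.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper for illustration; not the plugin's implementation. */
final class DateFilterSketch {

    /** Returns 'now' minus 'amount' of 'unit', e.g. 30 DAYS before now. */
    static LocalDateTime cutoff(LocalDateTime now, String unit, long amount) {
        // The allowed values (YEARS, MONTHS, DAYS, HOURS, MINUTES) happen to
        // match ChronoUnit constant names, so valueOf can be used directly.
        return now.minus(amount, ChronoUnit.valueOf(unit));
    }

    /** Assumption: a document is excluded when its date is strictly before the cutoff. */
    static boolean isExcluded(LocalDateTime documentDate, LocalDateTime now,
                              String unit, long amount) {
        return documentDate.isBefore(cutoff(now, unit, amount));
    }
}
```

For example, with unit DAYS and amount 30, a document dated 31 December 2023 would be excluded on 31 January 2024, while one dated 15 January 2024 would be kept.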

Custom XML date element

Configuration key

plugin.date-filter.config.date_element

Data type

string

Required

This setting is optional

(Deprecated) Name of the XML field that contains the date/time value to be used for filtering. Use the custom Jsoup element selector and attribute settings instead.

Custom Jsoup element selector

Configuration key

plugin.date-filter.config.jsoup_selector

Data type

string

Required

This setting is optional

Jsoup selector that identifies the element that contains the string representation of the date data in the document.

Custom Jsoup element attribute

Configuration key

plugin.date-filter.config.jsoup_selector.attribute

Data type

string

Required

This setting is optional

If the date is stored in an attribute, this should be set to the attribute name of the element (defined in the custom Jsoup element selector setting). Leave this setting blank if the date is stored as the element content.

Date format

Configuration key

plugin.date-filter.config.date_format

Data type

string

Required

This setting is optional

Specifies the date/time format of the extracted date. Value must be a valid Java date format string.
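Before running an update, it can help to confirm that a format string actually parses a sample date from your content. The sketch below assumes the setting follows java.text.SimpleDateFormat pattern syntax and that month names are in English; neither is confirmed by this documentation, and the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper for checking a date format against a sample value. */
final class DateFormatCheck {

    /** Returns the parsed date, or null if the pattern is invalid or the value does not match. */
    static Date parseOrNull(String pattern, String value) {
        try {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject out-of-range values such as month 13
            // Surrounding whitespace is trimmed before parsing.
            // Note: parse() may accept trailing text after a valid date.
            return format.parse(value.trim());
        } catch (ParseException | IllegalArgumentException e) {
            return null;
        }
    }

    /** Returns true if 'value' can be parsed with 'pattern'. */
    static boolean matches(String pattern, String value) {
        return parseOrNull(pattern, value) != null;
    }
}
```

Under these assumptions, matches("yyyy-MM-dd'T'HH:mm:ssZ", "2000-12-24T04:35:21+1100") returns true, while matches("dd/MM/yyyy", "2005-10-12") returns false.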

Upgrade notes

Upgrading from the XML date filter plugin

This plugin supersedes the XML date filter plugin and any data sources that use the superseded plugin should be upgraded to use this plugin.

The plugin can be upgraded by editing the data source configuration and renaming all the configuration keys that are set for the superseded plugin.

Alternatively, you can note down the settings for the old plugin and then delete the plugin from your data source using the standard plugin management screen, and then add a new date filter plugin, configuring it with the same settings.

Manually updating the configuration keys

To manually update the configuration keys:

  1. Open the data source configuration screen for the data source that is running the old XML date filter plugin.

  2. Select edit data source configuration from the settings panel.

  3. Select tools > edit raw data from the select menu that appears above the configuration key editor.

  4. Update all the configuration keys that start with plugin.xml-date-filter to start with plugin.date-filter.

  5. Click the save button to save the configuration.

  6. Return to the plugins management screen to verify that the data source is now configured to use the date filter plugin instead of the XML date filter plugin.

The example below shows a set of old plugin.xml-date-filter plugin keys upgraded to the plugin.date-filter plugin. When upgrading ensure you have the version key set to the correct version number for the new plugin.

In the raw configuration key editor update the following keys:

xml-date-filter plugin
plugin.xml-date-filter.enabled=true
plugin.xml-date-filter.version=1.0.0
plugin.xml-date-filter.config.unit=DAYS
plugin.xml-date-filter.config.amount=30
plugin.xml-date-filter.config.record_type=custom
plugin.xml-date-filter.config.date_element=timestamp
plugin.xml-date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ

to

date-filter plugin
plugin.date-filter.enabled=true
plugin.date-filter.version=1.1.0
plugin.date-filter.config.unit=DAYS
plugin.date-filter.config.amount=30
plugin.date-filter.config.record_type=custom
plugin.date-filter.config.date_element=timestamp
plugin.date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ

Once you have updated the key names, save the configuration and the plugin upgrade is complete.
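The rename itself is a plain prefix substitution. The following sketch applies it to an in-memory map of keys, purely to illustrate the rule; the class name and the idea of handling the configuration as a map are assumptions, and the version value you pass in must be the version of the new plugin you are actually installing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the product. */
final class PluginKeyRename {
    static final String OLD_PREFIX = "plugin.xml-date-filter.";
    static final String NEW_PREFIX = "plugin.date-filter.";

    /** Returns a copy of 'config' with old-prefix keys renamed; other keys are left untouched. */
    static Map<String, String> rename(Map<String, String> config, String newVersion) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue();
            if (key.startsWith(OLD_PREFIX)) {
                key = NEW_PREFIX + key.substring(OLD_PREFIX.length());
                // The version key must carry the new plugin's version number.
                if (key.equals(NEW_PREFIX + "version")) {
                    value = newVersion;
                }
            }
            renamed.put(key, value);
        }
        return renamed;
    }
}
```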

Upgrading custom XML date element settings to Jsoup selectors

The deprecated Custom XML date element setting (plugin.date-filter.config.date_element key) should be replaced with the equivalent Jsoup selector. Because the old key only selects the named element, you can update the configuration to remove this setting and set the custom Jsoup element selector (plugin.date-filter.config.jsoup_selector) to the same value.

Filter chain configuration

This plugin uses filters which are used to apply transformations to the gathered content.

The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.

Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.

Filter classes

This plugin supplies a filter that runs in the main document filter chain: com.funnelback.plugins.datefilter.DateFilter

Drag the com.funnelback.plugins.datefilter.DateFilter plugin filter to where you wish it to run in the filter chain sequence.

Examples

Exclude items older than 30 days

Exclude records older than 30 days from a custom XML data source using the <timestamp> element.

For a custom XML record:

<item>
    <title><![CDATA[Example record]]></title>
    <timestamp>2000-12-24T04:35:21+1100</timestamp>
    <description><![CDATA[Example description]]></description>
</item>

Use the plugin configuration:

Configuration key name                       Value
Custom Jsoup element selector                timestamp
Date filter period - number of time units    30
Date filter period - type of time units      DAYS
Date format                                  yyyy-MM-dd'T'HH:mm:ssZ
Record type                                  CUSTOM

Keep documents that are newer than 1 year

Exclude documents that are older than 1 year, using the created date metadata.

For an HTML document that has the date in a meta tag:

<html>
  <head>
    <meta name="created_date" content="2005-10-12">
   </head>
  <body>
    <h1>Title</h1>
  </body>
</html>

Use the plugin configuration:

Configuration key name                       Value
Custom Jsoup element selector                meta[name=created_date]
Custom Jsoup element attribute               content
Date filter period - number of time units    1
Date filter period - type of time units      YEARS
Date format                                  yyyy-MM-dd
Record type                                  CUSTOM

Exclude documents that are older than 1 year, using a date contained within a <span> of class date.

If the date is in tag content:

<span class="date">October 12, 2015</span>

Use:

Configuration key name                       Value
Custom Jsoup element selector                span.date
Date filter period - number of time units    1
Date filter period - type of time units      YEARS
Date format                                  MMMM d, yyyy
Record type                                  CUSTOM

For tag attributes:

<span class="date" data-date="12/10/2005">...</span>

Use:

Configuration key name                       Value
Custom Jsoup element selector                span.date
Custom Jsoup element attribute               data-date
Date filter period - number of time units    1
Date filter period - type of time units      YEARS
Date format                                  dd/MM/yyyy
Record type                                  CUSTOM

Exclude records older than 30 days from a custom XML data source using the <date> element and a Jsoup selector:

<item>
    <title><![CDATA[Example record]]></title>
    <description><![CDATA[Example description]]></description>
    <date type="created" dateValue="2023-10-21"/>
</item>

Use the plugin configuration:

Configuration key name                       Value
Custom Jsoup element selector                date
Custom Jsoup element attribute               dateValue
Date filter period - number of time units    30
Date filter period - type of time units      DAYS
Date format                                  yyyy-MM-dd
Record type                                  CUSTOM
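To make the selector and attribute settings concrete, the sketch below reads the dateValue attribute of the <date> element from the XML record in this example and checks it against a 30 day cutoff. It uses the JDK's built-in DOM parser as a stand-in for Jsoup, so it only handles a plain element name rather than a full Jsoup selector. The class and method names are hypothetical, and the strict "before the cutoff" comparison is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical end-to-end sketch; the JDK DOM parser stands in for Jsoup. */
final class XmlAttributeDateSketch {

    /** Returns true if the date in the named element's attribute is more than 'days' before 'today'. */
    static boolean isOlderThanDays(String xml, String elementName, String attribute,
                                   LocalDate today, long days) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // Take the first matching element, then read the date from its attribute.
            Element dateElement = (Element) root.getElementsByTagName(elementName).item(0);
            // LocalDate.parse expects ISO yyyy-MM-dd, matching this example's date format.
            LocalDate date = LocalDate.parse(dateElement.getAttribute(attribute).trim());
            return date.isBefore(today.minusDays(days));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read a date from the record", e);
        }
    }
}
```

With the record above, the element name date and the attribute dateValue, the record is treated as old on 30 November 2023 but not on 1 November 2023.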

Change log

[1.1.1]

Fixed

  • Fixed a bug where date fields that contained leading or trailing whitespace could not be parsed.

[1.1.0]

Changed

  • Updated to the latest version of the plugin framework (Funnelback shared v16.20) to enable integration with the new plugin management dashboard.

  • Allowed the use of a Jsoup selector for the XML 'custom' type.