Plugin: Date filter
Purpose
Use this plugin if you need to exclude documents in an XML or HTML data source based on a date/time contained in the document itself.
When to use this plugin
The primary use for this plugin is with social media repositories that do not provide any options for restricting the social media items that are included within the search, for example to exclude social media posts older than a given date.
This has the benefit of:
- removing older, less relevant content from your search results.
- ensuring that your license document count isn’t consumed by a large volume of old social media content.
Also use this plugin when you have date-based content that you wish to exclude from the index, and you don’t have access to other methods of excluding the content (such as modifying the include/exclude conditions of your crawl, or the API/database query that is executed).
When not to use this plugin
Do not use this plugin if your data source has another supported method of excluding the items.
This plugin removes items by date, but only after the document has been retrieved from the data source. If the data source has an alternate method of excluding this content (e.g. by altering the API query, or via configurable include/exclude rules) then the built-in method will be significantly more performant, because the excluded data is never gathered from the data source.
For example, consider the scenario where you are gathering content from a social media repository and only want to include today’s items.
If you use this plugin, you will fetch all items from the repository (only limited by any other include/exclude rules you have defined on the data source), then the plugin will check the date and skip old items.
If the data source has the ability to alter the API query that is run then you will only fetch today’s items - which will be significantly faster, and also potentially result in fewer API calls to the social media platform, which is important for platforms where the number of API calls is limited.
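The difference between the two approaches can be sketched as follows. This is an illustrative model only: `REPOSITORY`, `fetch`, `filter_after_fetch` and `filter_in_query` are hypothetical names, and the in-memory list stands in for a real social media API.

```python
from datetime import date

# Hypothetical in-memory stand-in for a social media repository.
REPOSITORY = [
    {"id": 1, "posted": date(2024, 5, 1)},
    {"id": 2, "posted": date(2024, 5, 2)},
    {"id": 3, "posted": date(2024, 5, 3)},
]


def fetch(since=None):
    """Return the items the source sends back.

    `since` models a date restriction applied inside the API query itself.
    """
    return [item for item in REPOSITORY if since is None or item["posted"] >= since]


def filter_after_fetch(today):
    """Model of filtering after retrieval: fetch everything, then skip old items."""
    fetched = fetch()
    kept = [item for item in fetched if item["posted"] >= today]
    return len(fetched), kept


def filter_in_query(today):
    """Model of filtering at the source: only today's items are ever fetched."""
    fetched = fetch(since=today)
    return len(fetched), fetched
```

Both functions keep the same items, but the first one retrieves every item in the repository before discarding the old ones.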
Usage
Enable the plugin
- Select Plugins from the side navigation pane and click on the Date filter tile.
- From the Location section, select the data source on which you would like to enable this plugin from the Select a data source select list.
The plugin will take effect after the setup steps and an advanced > full update of the data source have completed.
Configuration settings
The configuration settings section is where you do most of the configuration for your plugin. The settings enable you to control how the plugin behaves.
The configuration key names below are only used if you are configuring this plugin manually. The configuration keys are set in the data source configuration to configure the plugin. When setting the keys manually you need to type in (or copy and paste) the key name and value.
Record type
Configuration key | |
Data type | string |
Allowed values | NONE,FACEBOOK_POST,FACEBOOK_EVENT,TWITTER,YOUTUBE,INSTAGRAM,CUSTOM |
Required | This setting is optional |
Specifies the type of custom XML data source. Choose from a preset social media source, or define a custom source.
The social media presets can be used to quickly apply your date filter to Facebook posts and events, Twitter tweets, YouTube videos and Instagram posts.
If you select a CUSTOM record type, you also need to define the source field and date format using the corresponding configuration settings.
Date filter period - type of time units
Configuration key | |
Data type | string |
Allowed values | YEARS,MONTHS,DAYS,HOURS,MINUTES |
Required | This setting is required |
Specifies the unit of time used to calculate the cut-off date for filtering items.
Date filter period - number of time units
Configuration key | |
Data type | integer |
Required | This setting is required |
Specifies the number of time units (of the type set above) used to calculate whether a document should be filtered.
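As a rough illustration of how a unit and an amount combine into a cut-off, the sketch below subtracts the period from the current time and compares a document date against the result. It is not the plugin's implementation: the function names are hypothetical, and both the day clamping for MONTHS/YEARS and the strict "older than" comparison are assumptions.

```python
import calendar
from datetime import datetime, timedelta


def cutoff(now, unit, amount):
    """Return the datetime that lies `amount` `unit`s before `now`."""
    if unit in ("MINUTES", "HOURS", "DAYS"):
        return now - timedelta(**{unit.lower(): amount})
    if unit in ("MONTHS", "YEARS"):
        months = amount * (12 if unit == "YEARS" else 1)
        # Count months from year 0 so the subtraction can cross year boundaries.
        index = now.year * 12 + (now.month - 1) - months
        year, month = divmod(index, 12)
        month += 1
        # Assumption: clamp the day, so 31 March minus one month is the last day of February.
        day = min(now.day, calendar.monthrange(year, month)[1])
        return now.replace(year=year, month=month, day=day)
    raise ValueError(f"unsupported unit: {unit}")


def is_filtered(document_date, now, unit, amount):
    """Assumption: a document is filtered when it is strictly older than the cut-off."""
    return document_date < cutoff(now, unit, amount)
```

For example, with `unit="DAYS"` and `amount=30`, a document dated 45 days before `now` would be filtered and one dated 10 days before `now` would be kept.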
Custom XML date element
Configuration key | |
Data type | string |
Required | This setting is optional |
(deprecated) Name of the XML field that contains the date/time value to be used for filtering. Use the custom Jsoup element selector/attribute configuration instead.
Custom Jsoup element selector
Configuration key | |
Data type | string |
Required | This setting is optional |
Jsoup selector that identifies the element that contains the string representation of the date data in the document.
Custom Jsoup element attribute
Configuration key | |
Data type | string |
Required | This setting is optional |
If the date is stored in an attribute, this should be set to the attribute name of the element (defined in the custom Jsoup element selector setting). Leave this setting blank if the date is stored as the element content.
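The distinction between element content and an element attribute can be illustrated with a short sketch. It uses Python's standard `xml.etree.ElementTree` as a stand-in for Jsoup, so it only matches a plain element name rather than a full Jsoup selector; `extract_date_string` is a hypothetical helper, not part of the plugin.

```python
import xml.etree.ElementTree as ET


def extract_date_string(xml_text, element_name, attribute=None):
    """Return the raw date string from the first matching element.

    If `attribute` is given, the value is read from that attribute;
    otherwise the element's text content is used.
    """
    root = ET.fromstring(xml_text)
    # Match the root itself, or the first descendant with the given name.
    element = root if root.tag == element_name else root.find(f".//{element_name}")
    if element is None:
        return None
    if attribute:
        return element.get(attribute)
    return (element.text or "").strip()
```

Called with only an element name, the helper reads the element content; called with an attribute name as well, it reads the attribute value, which mirrors leaving the attribute setting blank or filling it in.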
Upgrade notes
Upgrading from the XML date filter plugin
This plugin supersedes the XML date filter plugin and any data sources that use the superseded plugin should be upgraded to use this plugin.
The plugin can be upgraded by editing the data source configuration and renaming all the configuration keys that are set for the superseded plugin.
Alternatively, note down the settings for the old plugin, delete the plugin from your data source using the standard plugin management screen, then add a new date filter plugin and configure it with the same settings.
Manually updating the configuration keys
To manually update the configuration keys:
- Open the data source configuration screen for the data source that is running the old XML date filter plugin.
- Select edit data source configuration from the settings panel.
- Select from the select menu that appears above the configuration key editor.
- Update all the configuration keys that start with plugin.xml-date-filter to start with plugin.date-filter.
- Click the save button to save the configuration.
- Return to the plugins management screen to verify that the data source is now configured to use the date filter plugin instead of the XML date filter.
The example below shows a set of old plugin.xml-date-filter plugin keys upgraded to the plugin.date-filter plugin. When upgrading, ensure you have the version key set to the correct version number for the new plugin.
In the raw configuration key editor update the following keys:
plugin.xml-date-filter.enabled=true
plugin.xml-date-filter.version=1.0.0
plugin.xml-date-filter.config.unit=DAYS
plugin.xml-date-filter.config.amount=30
plugin.xml-date-filter.config.record_type=custom
plugin.xml-date-filter.config.date_element=timestamp
plugin.xml-date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ
to
plugin.date-filter.enabled=true
plugin.date-filter.version=1.1.0
plugin.date-filter.config.unit=DAYS
plugin.date-filter.config.amount=30
plugin.date-filter.config.record_type=custom
plugin.date-filter.config.date_element=timestamp
plugin.date-filter.config.date_format=yyyy-MM-dd'T'HH:mm:ssZ
Once you have updated the key names, save the configuration and the plugin upgrade is complete.
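If you have many keys to rename, the prefix change can also be applied to a copy of the configuration text before pasting it back into the editor. The sketch below is a minimal example of that rename; `upgrade_keys` is a hypothetical helper, and the version number passed in is whatever the new plugin requires.

```python
OLD_PREFIX = "plugin.xml-date-filter."
NEW_PREFIX = "plugin.date-filter."


def upgrade_keys(lines, new_version):
    """Rename old plugin keys to the new prefix and set the version key.

    Lines that do not belong to the old plugin are returned unchanged.
    """
    upgraded = []
    for line in lines:
        if line.startswith(OLD_PREFIX):
            line = NEW_PREFIX + line[len(OLD_PREFIX):]
            key, _, _value = line.partition("=")
            if key == NEW_PREFIX + "version":
                line = f"{key}={new_version}"
        upgraded.append(line)
    return upgraded
```

Review the result before saving, since the helper only rewrites the key prefix and the version value.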
Upgrading custom XML date element settings to Jsoup selectors
The deprecated Custom XML date element setting (plugin.xml-date-filter.config.date_element key) should be replaced with the equivalent Jsoup selector. Because the old key only selects the named element, you can update the configuration to remove this setting and set the custom Jsoup element selector (plugin.xml-date-filter.config.jsoup_selector) to the same value.
Filter chain configuration
This plugin uses filters, which apply transformations to the gathered content.
The filters run in sequence and need to be set in an order that makes sense. The plugin-supplied filter(s) (as indicated in the listing) should be re-ordered to an appropriate point in the sequence.
Changes to the filter order affect the way the data source processes gathered documents. See: document filters documentation.
Examples
Exclude items older than 30 days
Exclude records older than 30 days from a custom XML data source using the <timestamp> element.
For a custom XML record:
<item>
<title><![CDATA[Example record]]></title>
<timestamp>2000-12-24T04:35:21+1100</timestamp>
<description><![CDATA[Example description]]></description>
</item>
Use the plugin configuration:
Configuration key name | Value |
---|---|
Custom Jsoup element selector | |
Date filter period - number of time units | |
Date filter period - type of time units | DAYS |
Date format | |
Record type | CUSTOM |
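To check a date format against sample data, it can help to parse one value by hand. The sketch below parses the example timestamp and tests it against a 30 day cut-off. It uses Python's `%Y-%m-%dT%H:%M:%S%z`, which is assumed here to correspond to the Java-style pattern yyyy-MM-dd'T'HH:mm:ssZ; `older_than_days` is a hypothetical helper, not the plugin's own code.

```python
from datetime import datetime, timedelta, timezone


def older_than_days(timestamp, days, now):
    """Return True if `timestamp` is more than `days` days before `now`."""
    # %z accepts a numeric offset such as +1100.
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S%z")
    return parsed < now - timedelta(days=days)


# Usage: pass a timezone-aware "now" so the comparison is well defined.
record_is_old = older_than_days(
    "2000-12-24T04:35:21+1100", 30, datetime.now(timezone.utc)
)
```

A timezone-aware `now` is required because the parsed timestamp carries its own offset, and Python refuses to compare aware and naive datetimes.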
Keep documents that are newer than 1 year
Exclude documents that are older than 1 year, using the created date metadata.
For an HTML document that has the date in a meta tag:
<html>
<head>
<meta name="created_date" content="2005-10-12">
</head>
<body>
<h1>Title</h1>
</body>
</html>
Use the plugin configuration:
Configuration key name | Value |
---|---|
Custom Jsoup element selector | |
Custom Jsoup element attribute | |
Date filter period - number of time units | |
Date filter period - type of time units | YEARS |
Date format | |
Record type | CUSTOM |
Exclude documents that are older than 1 year, using a date contained within a <span> of class date.
If the date is in tag content:
<span class="date">October 12, 2015</span>
Use:
Configuration key name | Value |
---|---|
Custom Jsoup element selector | |
Date filter period - number of time units | |
Date filter period - type of time units | YEARS |
Date format | |
Record type | CUSTOM |
For tag attributes:
<span class="date" data-date="12/10/2005">...</span>
Use:
Configuration key name | Value |
---|---|
Custom Jsoup element selector | |
Custom Jsoup element attribute | |
Date filter period - number of time units | |
Date filter period - type of time units | YEARS |
Date format | |
Record type | CUSTOM |
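A value such as 12/10/2005 is ambiguous without a date format, because it parses successfully as either day-first or month-first. The sketch below shows both readings using Python's `strptime`; the format codes are Python's, used here only to illustrate why the date format setting has to match the source data.

```python
from datetime import datetime

raw = "12/10/2005"

# Day first: 12 October 2005.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()

# Month first: 10 December 2005.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()
```

Both calls succeed, so a mismatched format produces a wrong date rather than an error, and documents could be kept or excluded incorrectly.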
Exclude records older than 30 days from a custom XML data source using the <date> element and Jsoup selector:
<item>
<title><![CDATA[Example record]]></title>
<description><![CDATA[Example description]]></description>
<date type="created" dateValue="2023-10-21"/>
</item>
Use the plugin configuration:
Configuration key name | Value |
---|---|
Custom Jsoup element selector | |
Custom Jsoup element attribute | |
Date filter period - number of time units | |
Date filter period - type of time units | DAYS |
Date format | |
Record type | CUSTOM |