Set kill configuration using a plugin
The plugin indexing interface provides two methods, killByExactMatch() and killByPartialMatch(), which enable additional kill patterns to be registered within a data source.
The primary use case for this is to set up any kill patterns that are required to support a plugin.
Prerequisite
In order to add a kill pattern, your plugin must be configured to provide indexing functionality.
Kill patterns can be added from configuration (kill_exact.cfg or kill_partial.cfg) or via one or more plugins. Each source of kill patterns is independent of the others, and the kill patterns applied to a document are the combination of all patterns added by the different sources.
Set kill patterns for documents fully matching a URL pattern
This is equivalent to kill patterns that are defined in kill_exact.cfg.
To set a kill pattern based on an exact URL match, implement the killByExactMatch() method in your plugin's IndexingConfigProvider implementation:
void killByExactMatch(IndexConfigProviderContext context, KillByExactMatchConsumer consumer)
Within this method, call the killByExactMatch() method on the consumer once for each kill pattern you wish to set up.
The void killByExactMatch(String urlToKillByExactMatch) method takes a single parameter, the URL to kill. Documents whose URL exactly matches this value will have their kill bit set within the index.
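The calling pattern described above can be sketched without the plugin framework on the classpath. The following is a minimal illustration only: ExactMatchConsumerStandIn and ExactKillSketch are hypothetical names introduced here, and the stand-in interface merely mirrors the single-method shape of the consumer described above. A real plugin would use the KillByExactMatchConsumer type supplied by the framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (an assumption for this sketch) that mirrors the
// single-method shape of the consumer described in the text.
interface ExactMatchConsumerStandIn {
    void killByExactMatch(String urlToKillByExactMatch);
}

class ExactKillSketch {
    // Calls the consumer once per URL, as the body of a provider's
    // killByExactMatch() method might do.
    static void registerAll(List<String> urls, ExactMatchConsumerStandIn consumer) {
        for (String url : urls) {
            consumer.killByExactMatch(url);
        }
    }

    public static void main(String[] args) {
        // A recording consumer: collects every URL that gets registered.
        List<String> recorded = new ArrayList<>();
        registerAll(
            List.of("http://example.com/index.html", "http://example.com/sitemap.xml"),
            recorded::add);
        System.out.println(recorded);
    }
}
```

A recording consumer like the one in main() may also be a convenient way to unit test a provider, since it captures exactly which URLs were registered.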
Set kill patterns on documents partially matching a URL pattern
This is equivalent to kill patterns that are defined in kill_partial.cfg.
To set a kill pattern based on a partial match to a URL, implement the killByPartialMatch() method in your plugin's IndexingConfigProvider implementation:
void killByPartialMatch(IndexConfigProviderContext context, KillByPartialMatchConsumer consumer)
Within this method, call the killByPartialMatch() method on the consumer once for each kill pattern you wish to set up.
The void killByPartialMatch(String urlToKillByPartialMatch) method takes a single parameter, the URL pattern to kill. Documents whose URL begins with this pattern will have their kill bit set within the index.
Partial match rules are a left-match to a document's URL with some extra logic to handle missing protocols. See the Kill Partial documentation for more information.
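To make the left-match idea concrete, here is a minimal sketch of prefix matching. It is an illustration only and not the product's matching code: PartialMatchSketch and leftMatches are hypothetical names, and the extra handling of missing protocols mentioned above is deliberately not modelled because its details are not given here.

```java
class PartialMatchSketch {
    // Left-match: the pattern must appear at the very start of the URL.
    // Assumption: plain prefix comparison; missing-protocol handling is omitted.
    static boolean leftMatches(String documentUrl, String pattern) {
        return documentUrl.startsWith(pattern);
    }

    public static void main(String[] args) {
        String pattern = "https://example.com/beta/";
        // The pattern is a prefix of this URL.
        System.out.println(leftMatches("https://example.com/beta/page.html", pattern)); // true
        // The pattern text occurs later in this URL, but not at the start.
        System.out.println(leftMatches("https://example.com/docs/beta/", pattern)); // false
    }
}
```

The second case shows why a left-match differs from a "contains" check: a pattern that only appears in the middle of a URL does not match.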
Example: Set kill patterns
This example demonstrates how to set kill patterns on documents using a plugin.
package com.funnelback.plugin.example;

import com.funnelback.plugin.index.IndexConfigProviderContext;
import com.funnelback.plugin.index.IndexingConfigProvider;
import com.funnelback.plugin.index.consumers.KillByExactMatchConsumer;
import com.funnelback.plugin.index.consumers.KillByPartialMatchConsumer;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class ExampleIndexingConfigProvider implements IndexingConfigProvider {

    private static final Logger log = LogManager.getLogger(ExampleIndexingConfigProvider.class);

    @Override
    public void killByExactMatch(IndexConfigProviderContext context, KillByExactMatchConsumer consumer) {
        log.debug("Set kill bits for these specific pages");
        consumer.killByExactMatch("http://example.com/index.html"); // (1)
        consumer.killByExactMatch("http://example.com/sitemap.xml");
    }

    @Override
    public void killByPartialMatch(IndexConfigProviderContext context, KillByPartialMatchConsumer consumer) {
        log.debug("Set kill bits for pages whose URL starts with one of these patterns");
        consumer.killByPartialMatch("https://example.com/beta/"); // (2)
        consumer.killByPartialMatch("https://example.com/invalid/");
    }
}
(1) Sets the kill bit for http://example.com/index.html and http://example.com/sitemap.xml. The behavior is equivalent to adding these URLs to kill_exact.cfg.
(2) Sets the kill bit for documents with URLs starting with https://example.com/beta/ or https://example.com/invalid/. The behavior is equivalent to adding these patterns to kill_partial.cfg.