Set kill configuration using a plugin

The plugin indexing interface provides three methods, killByExactMatch(), killByPartialMatch() and killByQueryMatch(), which enable additional kill patterns to be registered within a data source.

The primary use case for this is to set up any kill patterns that are required to support a plugin.

Prerequisite

In order to add a kill pattern, your plugin must be configured to provide indexing functionality.

Kill patterns can be added from configuration (kill_exact.cfg, kill_partial.cfg or query-kill.cfg) or via one or more plugins. Each source of kill patterns is independent of the others, and the kill patterns applied to a document are the combination of all patterns added by the different sources.
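The combining behavior can be pictured as a set union. The following sketch is illustrative only: the class KillPatternMerge, its combine() method and the sample patterns are hypothetical and not part of the plugin API. It simply models each source as an independent list of patterns.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not part of the plugin API.
public class KillPatternMerge {

    // Returns the union of the patterns contributed by every source.
    // A pattern supplied by more than one source appears only once.
    @SafeVarargs
    public static Set<String> combine(List<String>... sources) {
        Set<String> combined = new LinkedHashSet<>();
        for (List<String> source : sources) {
            combined.addAll(source);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Sample patterns, assumed for the purpose of the example.
        List<String> fromConfig = List.of("https://example.com/old/");
        List<String> fromPluginA = List.of("https://example.com/beta/");
        List<String> fromPluginB = List.of("https://example.com/beta/", "https://example.com/invalid/");

        System.out.println(combine(fromConfig, fromPluginA, fromPluginB));
    }
}
```

Because the sources are independent, removing a pattern from one source does not remove it from the combined result if another source still supplies it.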

Set kill patterns for documents fully matching a URL pattern

This is equivalent to kill patterns that are defined in kill_exact.cfg.

To set a kill pattern based on an exact URL match, implement the killByExactMatch() method in your plugin's IndexingConfigProvider implementation.

void killByExactMatch(IndexConfigProviderContext context, KillByExactMatchConsumer consumer)

Within this method, call the killByExactMatch() method on the consumer once for each URL you wish to kill.

The void killByExactMatch(String urlToKillByExactMatch) method takes a single parameter. Documents that match this URL will have their kill bit set within the index.
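A minimal, self-contained sketch of that calling pattern follows. The nested KillByExactMatchConsumer interface is a stand-in declared here so that the snippet compiles on its own (the real type comes from the plugin framework), and ExactKillSketch, registerExactKills() and the URLs are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ExactKillSketch {

    // Stand-in for the framework's consumer type, assumed to expose the
    // single-parameter method described above.
    public interface KillByExactMatchConsumer {
        void killByExactMatch(String urlToKillByExactMatch);
    }

    // Calls the consumer once for each URL that should be killed.
    public static void registerExactKills(List<String> urls, KillByExactMatchConsumer consumer) {
        for (String url : urls) {
            consumer.killByExactMatch(url);
        }
    }

    public static void main(String[] args) {
        List<String> recorded = new ArrayList<>();

        // A recording consumer makes it easy to check what was registered.
        registerExactKills(
            List.of("https://example.com/index.html", "https://example.com/sitemap.xml"),
            recorded::add);

        System.out.println(recorded);
    }
}
```

Passing a recording consumer like this is also one way to unit test a plugin's kill logic without running an index update.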

Set kill patterns on documents partially matching a URL pattern

This is equivalent to kill patterns that are defined in kill_partial.cfg.

To set a kill pattern based on a partial match to a URL, implement the killByPartialMatch() method in your plugin's IndexingConfigProvider implementation.

void killByPartialMatch(IndexConfigProviderContext context, KillByPartialMatchConsumer consumer)

Within this method, call the killByPartialMatch() method on the consumer once for each pattern you wish to set up.

The void killByPartialMatch(String urlToKillByPartialMatch) method takes a single parameter. Documents whose URL starts with this pattern will have their kill bit set within the index.

Partial match rules are a left-match against a document's URL, with some extra logic to handle missing protocols. See kill_partial.cfg for more information.
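To make the idea of a left-match concrete, here is a rough sketch. It is not the indexer's actual algorithm: the class LeftMatchSketch and its methods are hypothetical, and the protocol handling shown (a pattern with no protocol is compared against the URL with its protocol removed) is an assumption made purely for illustration. Refer to the kill_partial.cfg documentation for the authoritative rules.

```java
// Hypothetical illustration only: not the indexer's real matching code.
public class LeftMatchSketch {

    // Removes a leading "http://" or "https://" if present.
    static String stripProtocol(String value) {
        return value.replaceFirst("^https?://", "");
    }

    // Returns true when the pattern matches the start (left side) of the URL.
    public static boolean leftMatches(String pattern, String url) {
        if (pattern.contains("://")) {
            // Pattern includes a protocol: compare against the full URL.
            return url.startsWith(pattern);
        }
        // ASSUMPTION: a pattern without a protocol is compared against
        // the URL with its protocol stripped.
        return stripProtocol(url).startsWith(pattern);
    }

    public static void main(String[] args) {
        System.out.println(leftMatches("https://example.com/beta/", "https://example.com/beta/page.html"));
        System.out.println(leftMatches("https://example.com/beta/", "https://example.com/index.html"));
        System.out.println(leftMatches("example.com/beta/", "http://example.com/beta/page.html"));
    }
}
```

The point of the sketch is that a partial pattern anchors at the start of the URL, so a pattern such as https://example.com/beta/ covers everything beneath that path but nothing that merely contains the text elsewhere in the URL.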

Set kill patterns on documents returned from a given query

This is equivalent to kill patterns that are defined in query-kill.cfg.

To set a kill pattern based on a query, implement the killByQueryMatch() method in your plugin's IndexingConfigProvider implementation.

void killByQueryMatch(IndexConfigProviderContext context, KillByQueryMatchConsumer consumer)

Within this method, call the killByQueryMatch() method on the consumer once for each query you wish to set up.

The void killByQueryMatch(String queryToKillByMatch) method takes a single parameter. Documents that are returned by this query will have their kill bit set within the index.

Example: Set kill patterns

This example demonstrates how to set exact, partial and query-based kill patterns using a plugin.

ExampleIndexingConfigProvider.java
package com.funnelback.plugin.example;

import com.funnelback.plugin.index.consumers.KillByExactMatchConsumer;
import com.funnelback.plugin.index.consumers.KillByPartialMatchConsumer;
import com.funnelback.plugin.index.consumers.KillByQueryMatchConsumer;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.funnelback.plugin.index.IndexConfigProviderContext;
import com.funnelback.plugin.index.IndexingConfigProvider;

public class ExampleIndexingConfigProvider implements IndexingConfigProvider {

    private static final Logger log = LogManager.getLogger(ExampleIndexingConfigProvider.class);

    @Override
    public void killByExactMatch(IndexConfigProviderContext context, KillByExactMatchConsumer consumer) {
        log.debug("Set kill bits for these specific pages");

        consumer.killByExactMatch("https://example.com/index.html"); // (1)
        consumer.killByExactMatch("https://example.com/sitemap.xml"); // (1)
    }

    @Override
    public void killByPartialMatch(IndexConfigProviderContext context, KillByPartialMatchConsumer consumer) {
        log.debug("Set kill bits for pages whose URL starts with one of these patterns");

        consumer.killByPartialMatch("https://example.com/beta/"); // (2)
        consumer.killByPartialMatch("https://example.com/invalid/"); // (2)
    }

    @Override
    public void killByQueryMatch(IndexConfigProviderContext context, KillByQueryMatchConsumer consumer) {
        log.debug("Set kill bits for pages whose URLs are returned from a given query");

        consumer.killByQueryMatch("sitemap"); // (3)
        consumer.killByQueryMatch("beta"); // (3)
    }
}
(1) Sets the kill bit for https://example.com/index.html and https://example.com/sitemap.xml. The behavior is equivalent to adding these URLs to kill_exact.cfg.
(2) Sets the kill bit for documents with URLs starting with https://example.com/beta/ or https://example.com/invalid/. The behavior is equivalent to adding these patterns to kill_partial.cfg.
(3) Sets the kill bit for documents returned by the queries sitemap and beta. The behavior is equivalent to adding these queries to query-kill.cfg.