Start URL provider

The plugin framework allows custom logic to provide additional start URLs to be crawled when updating a data source.

Provide start URLs for a plugin

Interface methods

To provide start URLs for a plugin, you need to implement the StartUrlProvider interface.

The StartUrlProvider interface has a single method:

List<URL> extraStartUrls(StartUrlProviderContext context)

Usage

Additional start URLs will be crawled after the plugin is enabled on the data source.

Logging

Log messages from the extraStartUrls method will appear in the data source’s user interface logs.

Example: Start URL provider

The code below implements logic that builds a list of start URLs by inserting sequential numbers between a fixed prefix and suffix.

SequentialStartUrlPlugin.java
package com.funnelback.plugin.sequentialstarturlplugin;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.funnelback.plugin.starturl.StartUrlProvider;
import com.funnelback.plugin.starturl.StartUrlProviderContext;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SequentialStartUrlPlugin implements StartUrlProvider {

    private static final Logger log = LogManager.getLogger(SequentialStartUrlPlugin.class);

    private static final String PROTOCOL = "https";
    private static final String HOST = "www.mysite.com";
    private static final int PORT = 443;
    private static final String FILE_PREFIX = "/MyPage";
    private static final String FILE_SUFFIX = "/index.html";

    @Override
    public List<URL> extraStartUrls(StartUrlProviderContext context) {
        try {
            // Build one URL per index, from /MyPage0/index.html to /MyPage4/index.html.
            List<URL> startUrls = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                startUrls.add(new URL(PROTOCOL, HOST, PORT, FILE_PREFIX + i + FILE_SUFFIX));
            }
            return startUrls;
        } catch (MalformedURLException e) {
            // Log the failure, then contribute no extra start URLs.
            log.error("Could not build start URLs", e);
            return Collections.emptyList();
        }
    }
}
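
To check which URLs the loop in the example above produces, the URL-building logic can be run on its own, outside the plugin framework. The following is a minimal sketch under that assumption: the class name SequentialUrlBuilder and its build method are hypothetical helpers and not part of the plugin API, and the code relies only on java.net and java.util.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone helper; not part of the plugin framework.
public class SequentialUrlBuilder {

    // Builds `count` URLs whose path is prefix + index + suffix, for index 0..count-1.
    static List<URL> build(String protocol, String host, int port,
                           String prefix, String suffix, int count)
            throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            urls.add(new URL(protocol, host, port, prefix + i + suffix));
        }
        return urls;
    }

    public static void main(String[] args) throws MalformedURLException {
        // Same constants as the plugin example: five pages on www.mysite.com.
        for (URL url : build("https", "www.mysite.com", 443, "/MyPage", "/index.html", 5)) {
            System.out.println(url);
        }
    }
}
```

Because the port is passed explicitly, it appears in each printed URL, so the first line of output is https://www.mysite.com:443/MyPage0/index.html. Keeping the URL construction in a plain method like this also makes it straightforward to cover with a unit test before wiring it into extraStartUrls.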