Start URL provider
The plugin framework allows a plugin to provide additional start URLs to be crawled when a data source is updated.
Provide start URLs for a plugin
Interface methods
To provide start URLs for a plugin, you need to implement the StartUrlProvider interface.
The StartUrlProvider interface has a single method:
List<URL> extraStartUrls(StartUrlProviderContext context)
Usage
Additional start URLs will be crawled after the plugin is enabled on the data source.
Logging
Log messages from the extraStartUrls method will appear in the data source’s user interface logs.
Example: Start URL provider
The code below builds a list of start URLs by appending sequential numbers to a path prefix.
SequentialStartUrlPlugin.java
package com.funnelback.plugin.sequentialstarturlplugin;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.funnelback.plugin.starturl.StartUrlProvider;
import com.funnelback.plugin.starturl.StartUrlProviderContext;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SequentialStartUrlPlugin implements StartUrlProvider {

    private static final Logger log = LogManager.getLogger(SequentialStartUrlPlugin.class);

    private static final String PROTOCOL = "https";
    private static final String HOST = "www.mysite.com";
    private static final int PORT = 443;
    private static final String FILE_PREFIX = "/MyPage";
    private static final String FILE_SUFFIX = "/index.html";

    @Override
    public List<URL> extraStartUrls(StartUrlProviderContext context) {
        try {
            List<URL> startUrls = new ArrayList<>();
            // Builds /MyPage0/index.html through /MyPage4/index.html
            for (int i = 0; i < 5; i++) {
                startUrls.add(new URL(PROTOCOL, HOST, PORT, FILE_PREFIX + i + FILE_SUFFIX));
            }
            return startUrls;
        } catch (MalformedURLException e) {
            // Record the failure, then provide no extra start URLs
            log.error("Could not build the additional start URLs", e);
            return Collections.emptyList();
        }
    }
}
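
To check which URLs the loop in the example produces, the URL-building logic can be run on its own, outside the plugin framework. The following is a minimal sketch under that assumption: the class SequentialUrlSketch and its build method are hypothetical names introduced for illustration and are not part of the plugin API. The host and path values are the same illustrative ones used in the example.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (hypothetical class, not part of the plugin framework).
// It reuses the URL-building loop from the example so the output can be inspected.
final class SequentialUrlSketch {

    // Same illustrative values as the plugin example.
    private static final String PROTOCOL = "https";
    private static final String HOST = "www.mysite.com";
    private static final int PORT = 443;
    private static final String FILE_PREFIX = "/MyPage";
    private static final String FILE_SUFFIX = "/index.html";

    // Hypothetical helper: builds `count` URLs numbered from 0 to count - 1.
    static List<URL> build(int count) {
        List<URL> urls = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            try {
                urls.add(new URL(PROTOCOL, HOST, PORT, FILE_PREFIX + i + FILE_SUFFIX));
            } catch (MalformedURLException e) {
                // Not expected for these fixed values; surface it rather than hide it.
                throw new IllegalStateException("Could not build URL number " + i, e);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Prints one URL per line, for example:
        // https://www.mysite.com:443/MyPage0/index.html
        build(5).forEach(System.out::println);
    }
}
```

Because the port is passed to the URL constructor explicitly, it appears in the string form of each URL (":443"), even though 443 is the default port for https.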