Crawling and indexing FTP sites
Funnelback includes basic support for the crawling of FTP sites using the web crawler.
Crawling of FTP sites does not support document-level security.
Method
-
Create a web data source for the FTP site index
-
Configure the start URL to be the FTP site’s root page
-
Configure include/exclude patterns as for a standard web data source
-
Enable the ftp protocol by adding ftp to the crawler_protocols data source configuration setting.
-
-
Configure authentication
-
Set the ftp username and password configuration options in the data source configuration:
ftp_user=<FTP-USERNAME>
ftp_passwd=<FTP-PASSWORD>
-
-
Configure filetypes and download sizes
-
The basic filetypes supported by web data sources will be gathered. Additional filetypes can be added. See: Configure Funnelback to index additional file types
-
Set the maximum download and parse sizes using the crawler.max_download_size and crawler.max_parse_size settings.
-
-
Crawl the site.
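Before running the crawl, it can help to confirm that the start URL and credentials work outside Funnelback. The following is a minimal sketch using only Python's standard library; it is not part of Funnelback, and the function names split_ftp_url and check_ftp_access are hypothetical helpers introduced here for illustration. It assumes a plain (unencrypted) FTP server and falls back to port 21 when the URL gives no port.

```python
from ftplib import FTP
from urllib.parse import urlsplit


def split_ftp_url(start_url):
    """Split an ftp:// start URL into (host, port, path)."""
    parts = urlsplit(start_url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {start_url!r}")
    # Fall back to the standard FTP control port when the URL gives none.
    return parts.hostname, parts.port or 21, parts.path or "/"


def check_ftp_access(start_url, user, password, timeout=10):
    """Log in, change to the start directory and return its entry names.

    Raises an ftplib or socket error if the connection, login or
    directory change fails.
    """
    host, port, path = split_ftp_url(start_url)
    ftp = FTP()
    ftp.connect(host, port, timeout=timeout)
    try:
        ftp.login(user, password)
        ftp.cwd(path)
        return ftp.nlst()
    finally:
        # Always release the control connection, even after a failure.
        ftp.close()
```

For example, calling check_ftp_access("ftp://ftp.example.com/pub/", "<FTP-USERNAME>", "<FTP-PASSWORD>") with the same values used for the start URL, ftp_user and ftp_passwd should return the names in the start directory. A login or permission error here suggests the crawl would fail for the same reason.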