Crawling and indexing FTP sites
Method
- Create a web collection for the FTP site index
- Configure the start URL to be the FTP site’s root page
- Configure include/exclude patterns as for a standard web collection
- Enable the FTP protocol by adding ftp to the crawler_protocols setting in collection.cfg.
- Configure authentication
  - Set the FTP username and password configuration options in collection.cfg:
    ftp_user=<FTP-USERNAME>
    ftp_passwd=<FTP-PASSWORD>
- Configure filetypes and download sizes
  - The basic filetypes supported by web collections will be gathered. Additional filetypes can be added. See: Configure Funnelback to index additional file types
  - Set the download and parse size limits using the crawler.max_download_size and crawler.max_parse_size settings.
- Crawl the site.
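
Taken together, the collection.cfg settings named in the steps above might look something like the following fragment. This is only a sketch. The key names are copied from this page, but every value is a placeholder or an assumption: the protocol list assumes http and https are already enabled, the size limits assume the settings are expressed in megabytes, and the `#` comment lines assume collection.cfg accepts comments in that form. Check each setting's own documentation before relying on any of it.

```
# Assumption: http and https were already listed; ftp is appended to them.
crawler_protocols=http,https,ftp

# Credentials for the FTP site (placeholders, replace with real values).
ftp_user=<FTP-USERNAME>
ftp_passwd=<FTP-PASSWORD>

# Assumption: both limits are in megabytes; 20 is an arbitrary example value.
crawler.max_download_size=20
crawler.max_parse_size=20
```

Raising only the download limit may not be enough for large documents, since the page lists the parse limit as a separate setting. It is probably simplest to adjust the two together.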