Crawling and indexing FTP sites


Funnelback includes basic support for crawling FTP sites with the web crawler.


  • Crawling of FTP sites does not support document-level security.


  1. Create a web collection for the FTP site index

    • Configure the start URL to be the FTP site's root page.

    • Configure include/exclude patterns as for a standard web collection.

    • Enable the FTP protocol by adding ftp to the crawler.protocols setting in collection.cfg.

  2. Configure authentication

    • Set the FTP username and password configuration options in collection.cfg.

  3. Configure filetypes and download sizes

    • The basic filetypes supported by web collections will be gathered. Additional filetypes can be added. See: Configure Funnelback to index additional file types

    • Set the maximum download and parse sizes using the crawler.max_download_size and crawler.max_parse_size settings.

  4. Crawl the site.
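
The settings named in the steps above might come together in collection.cfg roughly as follows. This is a minimal sketch, not a definitive configuration: the credential key names ftp_user and ftp_passwd, the comma-separated form of the protocol list, the megabyte units and every value shown are assumptions, so check them against the collection.cfg reference for your Funnelback version before use.

```
# Hypothetical collection.cfg fragment for an FTP crawl.
# All values are placeholders; key names marked "assumed" are not confirmed by this page.

# Step 1: allow the crawler to fetch ftp:// URLs alongside the usual web protocols
# (comma-separated list format assumed).
crawler.protocols=http,https,ftp

# Step 2: FTP credentials (key names assumed).
ftp_user=example-user
ftp_passwd=example-password

# Step 3: size limits for downloading and parsing documents (units assumed to be MB).
crawler.max_download_size=10
crawler.max_parse_size=10
```

Keeping http and https in the protocol list is a design choice for sites that mix web and FTP links; if the collection should crawl only the FTP site, the include/exclude patterns from step 1 are the place to restrict it.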