Crawling and indexing FTP sites

Funnelback's web crawler includes basic support for crawling FTP sites.

FTP crawling does not support document-level security.

Method

  1. Create a web data source for the FTP site

    • Set the start URL to the FTP site's root directory (for example, ftp://ftp.example.com/)

    • Configure include/exclude patterns as for a standard web data source

    • Enable the FTP protocol by adding ftp to the crawler.protocols data source configuration setting.

  2. Configure authentication

    • Set the FTP username and password in the data source configuration:

      ftp_user=<FTP-USERNAME>
      ftp_passwd=<FTP-PASSWORD>

  3. Configure file types and download sizes

    • The standard file types supported by web data sources are gathered by default. To gather additional file types, see: Configure Funnelback to index additional file types

    • Set the maximum download and parse sizes using the crawler.max_download_size and crawler.max_parse_size settings.

  4. Crawl the site.
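The checks in step 1 can be scripted before the data source is saved. The following is a minimal Python sketch that is not part of Funnelback: ensure_protocol and is_ftp_start_url are hypothetical helper names, and the sketch assumes the protocol setting holds a comma-separated list such as http,https.

```python
from urllib.parse import urlparse


def ensure_protocol(protocols: str, protocol: str = "ftp") -> str:
    """Return a comma-separated protocol list that includes `protocol`.

    Assumption: the setting value is a comma-separated list such as
    "http,https". Existing entries keep their order; no duplicates are added.
    """
    entries = [p.strip() for p in protocols.split(",") if p.strip()]
    if protocol not in entries:
        entries.append(protocol)
    return ",".join(entries)


def is_ftp_start_url(url: str) -> bool:
    """Return True if `url` uses the ftp scheme and names a host."""
    parts = urlparse(url)
    return parts.scheme == "ftp" and bool(parts.hostname)
```

For example, ensure_protocol("http,https") returns "http,https,ftp", and calling it again on that result leaves it unchanged, so the helper is safe to run repeatedly.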
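Before running a full crawl, it can help to confirm that the credentials from step 2 are accepted by the server. This minimal sketch uses Python's standard ftplib module and is independent of Funnelback; check_ftp_login is a hypothetical name, and the sketch assumes the logged-in user is allowed to list the root directory.

```python
from ftplib import FTP


def check_ftp_login(host: str, user: str, password: str,
                    timeout: float = 30.0) -> list[str]:
    """Log in with the given credentials and return the root directory listing.

    Raises one of ftplib.all_errors on failure, for example ftplib.error_perm
    when the login is rejected or OSError when the host cannot be reached.
    Note: some servers also reply with an error when the directory is empty.
    """
    with FTP(host, timeout=timeout) as ftp:
        ftp.login(user=user, passwd=password)
        return ftp.nlst()
```

For example, check_ftp_login("ftp.example.com", "<FTP-USERNAME>", "<FTP-PASSWORD>") returns the names at the server root on success, while a rejected username or password raises ftplib.error_perm.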
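To choose values for step 3, it may help to survey what the FTP site actually holds. This sketch, again using only the Python standard library rather than Funnelback itself, counts files by extension and lists those above a candidate size limit. It assumes the server supports the MLSD command (RFC 3659) and that the size settings are expressed in megabytes of 1,048,576 bytes; check the setting reference for the actual unit. list_files and survey are hypothetical names.

```python
from collections import Counter
from ftplib import FTP
from pathlib import PurePosixPath

# Assumption: the crawler size settings are measured in megabytes.
BYTES_PER_MB = 1024 * 1024


def list_files(ftp: FTP, path: str = "") -> list[tuple[str, int]]:
    """Return (name, size in bytes) for plain files in one directory.

    Uses MLSD, so the server must support RFC 3659 machine listings.
    Directories and entries without a size fact are skipped.
    """
    return [
        (name, int(facts["size"]))
        for name, facts in ftp.mlsd(path, facts=["type", "size"])
        if facts.get("type") == "file" and "size" in facts
    ]


def survey(files: list[tuple[str, int]],
           limit_mb: float) -> tuple[Counter, list[str]]:
    """Count files by lower-cased extension and list those over limit_mb."""
    extensions = Counter(
        PurePosixPath(name).suffix.lower() or "(none)" for name, _ in files
    )
    too_big = sorted(
        name for name, size in files if size > limit_mb * BYTES_PER_MB
    )
    return extensions, too_big
```

For example, with an open, logged-in FTP connection ftp, survey(list_files(ftp, "/pub"), 10) returns the extension counts and the files larger than 10 MB in /pub. Only one directory is read; recursing into subdirectories is left out for brevity.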

See also