Crawling and indexing WebDAV sites

Background

Funnelback includes basic support for crawling WebDAV sites using the web crawler.

Crawling of WebDAV sites does not support document level security (DLS).

Details

WebDAV is delivered via HTTP and can be accessed using the web crawler.

  1. Create a web collection for the WebDAV site index.

    • Configure the start URL to be the WebDAV site’s root page.

    • Configure include/exclude patterns as for a standard web collection.

  2. Configure authentication.

  3. Configure filetypes and download sizes.

    • The basic filetypes supported by web collections are gathered by default. Additional filetypes can be added; see "Configure Funnelback to index additional file types".

    • Set the maximum download and parse sizes using the crawler.max_download_size and crawler.max_parse_size settings.

  4. Crawl the site.
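
Example

The steps above might translate into collection configuration along the following lines. This is a minimal sketch, not a definitive configuration. The host name, patterns, credentials and size values are placeholders. Apart from the two crawler.max_* settings named above, the key names (start_url, include_patterns, exclude_patterns, http_user, http_passwd) are assumptions based on standard web collection options, as are the use of HTTP basic authentication and sizes expressed in megabytes. Check all of these against the documentation for your Funnelback version. The # lines are annotations for this sketch only.

```
# Step 1: start URL and include/exclude patterns (placeholder host and paths)
start_url=https://webdav.example.com/
include_patterns=webdav.example.com/
exclude_patterns=/private/

# Step 2: authentication (assumes HTTP basic authentication; placeholder credentials)
http_user=crawl-user
http_passwd=changeme

# Step 3: download and parse size limits (assumed to be in MB; placeholder values)
crawler.max_download_size=20
crawler.max_parse_size=20
```

Raising the size limits is mainly useful when the WebDAV site stores large binary documents that would otherwise be truncated or skipped.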