Crawling password-protected websites

Some websites are protected by an authentication scheme that requires a username and password to access the site. To crawl a password-protected site successfully, Funnelback must be given a valid username and password.

The authentication schemes covered on this page are:

  • HTTP Basic authentication

  • NTLM/Windows Integrated authentication

  • FTP authentication

Giving Funnelback a username and password

Funnelback supports multiple HTTP Basic username/password pairs per web data source. If you have only a single account to configure, you can set the values using parameters in the data source configuration. To give Funnelback access to the protected website, set the parameters for the relevant authentication scheme:

For HTTP Basic authentication:

  • Set the http_user parameter to a valid HTTP Basic username.

  • Set the http_passwd parameter to the HTTP Basic username’s password.
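
As a minimal sketch, a single HTTP Basic account might look like this in the data source configuration. This assumes the usual key=value format and that lines starting with # are treated as comments; the username and password are placeholders, not real credentials.

```
# Placeholder credentials (hypothetical); replace with a real HTTP Basic account
http_user=crawl-user
http_passwd=example-password
```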

For NTLM/Windows Integrated authentication:

  • Set the crawler.ntlm.domain parameter to a valid NTLM domain.

  • Set the crawler.ntlm.username parameter to a valid username in the NTLM domain.

  • Set the crawler.ntlm.password parameter to the NTLM username’s password.
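
The NTLM settings above could be sketched as follows, again assuming a key=value configuration format with # comments. The domain, username, and password values are placeholders chosen for illustration.

```
# Placeholder NTLM account (hypothetical values)
crawler.ntlm.domain=EXAMPLEDOMAIN
crawler.ntlm.username=crawl-user
crawler.ntlm.password=example-password
```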

For FTP sites:

  • Set the ftp_user parameter to a valid FTP username.

  • Set the ftp_passwd parameter to the FTP username’s password.

To crawl an FTP site, ftp must also be added to the crawler.protocols parameter.
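
A possible FTP setup is sketched below. It assumes that crawler.protocols takes a comma-separated list of protocols and that http and https should remain enabled alongside ftp; check the existing value in your data source before changing it. The credentials are placeholders.

```
# Placeholder FTP account (hypothetical values)
ftp_user=ftp-crawl-user
ftp_passwd=example-password
# Assumed list format: keep the existing protocols and append ftp
crawler.protocols=http,https,ftp
```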

Specifying multiple HTTP Basic usernames and passwords

If you need to specify multiple HTTP Basic accounts for different web servers, you can configure them using site profiles.