Crawling password protected websites
Some websites are protected by an authentication scheme that requires a username and password to access the site. To crawl a password-protected site, Funnelback must be given a valid username and password.
The authentication schemes that Funnelback currently supports are:
- HTTP Basic Authentication
- Windows Integrated Authentication (NTLM)
- Web form based authentication, such as SAML
Giving Funnelback a username and password
Funnelback supports multiple HTTP Basic username/password pairs per web data source. If you have only a single account to configure, you can set its values using parameters in the data source configuration. To give Funnelback access to the protected website, set the parameters for the relevant authentication scheme:
For HTTP Basic authentication:
- Set the http_user parameter to a valid HTTP Basic username.
- Set the http_passwd parameter to the HTTP Basic username’s password.
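As an illustration, the two HTTP Basic parameters above might look like this in a data source configuration. This is a minimal sketch that assumes the plain key=value form of the configuration. The values crawl-user and example-password are placeholders, not real credentials, and the comment line is an annotation for the reader that can be dropped.

```properties
# Placeholder HTTP Basic credentials, for illustration only
http_user=crawl-user
http_passwd=example-password
```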
For NTLM/Windows Integrated authentication:
- Set the crawler.ntlm.domain parameter to a valid NTLM domain.
- Set the crawler.ntlm.username parameter to a valid username in the NTLM domain.
- Set the crawler.ntlm.password parameter to the NTLM username’s password.
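The three NTLM parameters above could be set together as in the sketch below, again assuming the key=value form of the data source configuration. EXAMPLEDOMAIN, crawl-user and example-password are hypothetical values chosen for illustration.

```properties
# Placeholder NTLM domain and credentials, for illustration only
crawler.ntlm.domain=EXAMPLEDOMAIN
crawler.ntlm.username=crawl-user
crawler.ntlm.password=example-password
```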
For FTP sites:
- Set the ftp_user parameter to a valid FTP username.
- Set the ftp_passwd parameter to the FTP username’s password.
To crawl an FTP site, ftp must also be added to the crawler.protocols parameter.
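A possible FTP setup is sketched below. It assumes the key=value form of the data source configuration and that crawler.protocols takes a comma-separated list. The http and https entries are assumed existing values for a web crawl, and the credentials are placeholders. Check the crawler.protocols value already in use on your data source before changing it.

```properties
# Placeholder FTP credentials, for illustration only
ftp_user=crawl-user
ftp_passwd=example-password
# Assumed list format: existing protocols with ftp appended
crawler.protocols=http,https,ftp
```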
Specifying multiple HTTP Basic usernames and passwords
If you need to specify different HTTP Basic accounts for different web servers, you can configure them using site profiles.