The server_alias.cfg file is used to manually specify the preferred name for a particular server or set of servers.
To access the server alias configuration, open the data source configuration screen, then select browse data source files. If the server_alias.cfg file is listed, click on it to edit the file. If it is not listed, click the add new button to create it.
Alternatively, you can use a WebDAV client to edit this file directly.
The file contains a list of mappings, one per line, of the form:

    preferred-server=alias-server-1,alias-server-2,...
Protocols can be given explicitly (e.g. https://www.example.com/); otherwise, the http protocol is assumed. Lines starting with the # character are treated as comments.
    # Specify that www.daff.gov.au is always the preferred name
    www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
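To make the format concrete, here is a minimal Python sketch of how a mapping file like the one above could be read into an alias-to-preferred-name lookup. It is an illustration only, not the crawler's own parser. The function names `normalise_server` and `parse_server_aliases` are hypothetical, and two behaviours are assumptions: blank lines are ignored, and a server name without a protocol is normalised to `http://` as described above.

```python
from urllib.parse import urlsplit


def normalise_server(name: str) -> str:
    """Return 'scheme://host' for a server name (hypothetical helper).

    Assumes http when no protocol is given, as the file format describes.
    """
    name = name.strip()
    if "://" not in name:
        name = "http://" + name
    parts = urlsplit(name)
    return f"{parts.scheme}://{parts.netloc}"


def parse_server_aliases(text: str) -> dict[str, str]:
    """Map every server named in the file to its preferred server (hypothetical)."""
    mapping: dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip comment lines (starting with #) and, as an assumption, blank lines.
        if not line or line.startswith("#"):
            continue
        preferred, sep, aliases = line.partition("=")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'preferred=alias,...'")
        target = normalise_server(preferred)
        mapping[target] = target  # the preferred name maps to itself
        for alias in aliases.split(","):
            if alias.strip():
                mapping[normalise_server(alias)] = target
    return mapping


if __name__ == "__main__":
    config = """
# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
"""
    aliases = parse_server_aliases(config)
    print(aliases["http://www.affa.gov.au"])  # → http://www.daff.gov.au
```

Normalising every name to `scheme://host` means `www.example.com` and `http://www.example.com/` end up as the same key, matching the rule that http is assumed when no protocol is written.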
During a crawl, the web crawler may decide that one site is a duplicate of another by comparing the content of their root pages. For example, old-site.example.com may be marked as a duplicate of new-site.example.com because their home pages are identical. This avoids downloading large amounts of duplicate content.
However, some content may be present only on the old site and should still be gathered. In this case, you can use the server_alias.cfg mechanism to ensure that the old site is fully crawled and not marked as a duplicate, for example:
    # Specify that content from old-site.example.com should be stored under that name
    old-site.example.com=old-site.example.com
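The interaction between root-page duplicate detection and a self-mapping can be modelled roughly as follows. This is a simplified Python sketch, not the crawler's actual algorithm. It assumes that "identical home pages" means byte-identical root-page content (compared here via SHA-256 digests), and it treats the hypothetical `pinned` set as standing in for hosts mapped to themselves in server_alias.cfg. The function name `find_duplicate_sites` is also hypothetical.

```python
import hashlib


def find_duplicate_sites(root_pages: dict[str, bytes], pinned: set[str]) -> dict[str, str]:
    """Return {duplicate_host: kept_host} for hosts with identical root pages.

    Simplified model: hosts in `pinned` (standing in for hosts mapped to
    themselves in server_alias.cfg) are never marked as duplicates.
    """
    first_seen: dict[str, str] = {}  # root-page digest -> first host seen with it
    duplicates: dict[str, str] = {}
    for host, page in root_pages.items():
        if host in pinned:
            continue  # explicitly named: always crawled under its own name
        digest = hashlib.sha256(page).hexdigest()
        if digest in first_seen:
            duplicates[host] = first_seen[digest]
        else:
            first_seen[digest] = host
    return duplicates


if __name__ == "__main__":
    pages = {
        "new-site.example.com": b"<html>home</html>",
        "old-site.example.com": b"<html>home</html>",
    }
    print(find_duplicate_sites(pages, pinned=set()))
    # {'old-site.example.com': 'new-site.example.com'}
    print(find_duplicate_sites(pages, pinned={"old-site.example.com"}))
    # {}
```

In this model, which host is kept depends on the order in which root pages are seen; pinning old-site.example.com removes it from the comparison entirely, so it is crawled in full regardless of order.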