The server_alias.cfg file specifies a predefined set of preferred names for a particular server or set of servers, together with the aliases that map to each preferred name.
To access the server alias configuration, open the data source configuration screen and open the configuration file manager by clicking the browse data source files link.
If the server_alias.cfg file is listed in the file listing, click on it to edit the file. If it is not listed, click the add new button to create a new server_alias.cfg file.
Alternatively, you can use a WebDAV client to edit this file directly.
The file contains a list of mappings, one per line, of the form:

preferred_name=alias1,alias2,...

Protocols can be given explicitly (e.g. https://www.example.com/); otherwise the http protocol is assumed. Lines starting with the # character are treated as comments.
For example:

# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
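To make the file format concrete, here is a minimal Python sketch of how the mapping lines could be read: comment lines are skipped, the name before = is the preferred name, the comma-separated names after it are aliases, and names without a protocol get http:// prepended. The function names parse_server_aliases and normalise are hypothetical, skipping blank lines is an assumption, and this is not the product's own parser.

```python
def normalise(name: str) -> str:
    """Prefix http:// when no protocol is given (the file format assumes http)."""
    name = name.strip()
    return name if "://" in name else "http://" + name


def parse_server_aliases(text: str) -> dict[str, str]:
    """Return a dict mapping every alias, and each preferred name itself,
    to its preferred name. Hypothetical helper, not the product's parser."""
    aliases: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip comment lines; skipping blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        preferred, _, rest = line.partition("=")
        preferred = normalise(preferred)
        aliases[preferred] = preferred
        for alias in rest.split(","):
            if alias.strip():
                aliases[normalise(alias)] = preferred
    return aliases


config = """# Specify that www.daff.gov.au is always the preferred name
www.daff.gov.au=www.affa.gov.au,www.dpie.gov.au
"""
mapping = parse_server_aliases(config)
print(mapping["http://www.affa.gov.au"])  # prints http://www.daff.gov.au
```

Keying the dictionary by alias makes resolving any host to its preferred name a single lookup.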
During a crawl, the web crawler may decide that one site is a duplicate of another by comparing the content of their root pages. For example,
old-site.example.com may be marked as a duplicate of new-site.example.com because their home pages are identical. This avoids downloading large amounts of duplicate content.
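The root-page comparison described above can be illustrated with a small Python sketch. It assumes that "the same" means byte-identical content and compares SHA-256 digests; the function find_duplicate_sites and the sample pages are hypothetical, and the crawler's real comparison may work differently.

```python
import hashlib


def find_duplicate_sites(root_pages: dict[str, bytes]) -> dict[str, str]:
    """Return {duplicate_site: kept_site} for sites whose root pages are byte-identical.

    The first site seen with a given root-page digest is kept; later sites with
    the same digest are marked as duplicates of it.
    """
    seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for site, content in root_pages.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates[site] = seen[digest]
        else:
            seen[digest] = site
    return duplicates


pages = {
    "new-site.example.com": b"<html>Welcome</html>",
    "old-site.example.com": b"<html>Welcome</html>",
    "other.example.com": b"<html>Something else</html>",
}
print(find_duplicate_sites(pages))
# {'old-site.example.com': 'new-site.example.com'}
```

Comparing fixed-size digests rather than whole pages keeps the lookup cheap when many sites are crawled.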
However, some content may only be present on the old site and should still be gathered. In this case, you can use the
server_alias.cfg mechanism to ensure that the old site is still fully crawled and not marked as a duplicate. For example:
# Specify that content from old-site.example.com should be stored under that name
old-site.example.com=old-site.example.com
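One way to picture the effect of mapping a site to itself is the Python sketch below, which exempts every preferred name listed in server_alias.cfg from root-page duplicate detection. This is a hypothetical model for illustration only: the exemption rule, the helper names explicit_preferred_names and find_duplicate_sites, and the use of bare hostnames (no http:// normalisation) are assumptions, not a description of how the crawler actually implements the rule.

```python
import hashlib


def explicit_preferred_names(config: str) -> set[str]:
    """Collect the preferred name (left-hand side) of every mapping line."""
    names: set[str] = set()
    for raw in config.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


def find_duplicate_sites(root_pages: dict[str, bytes], keep: set[str]) -> dict[str, str]:
    """Mark sites with identical root pages as duplicates, except sites in `keep`."""
    seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for site, content in root_pages.items():
        if site in keep:
            continue  # hypothetical rule: an explicit preferred name is never folded away
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates[site] = seen[digest]
        else:
            seen[digest] = site
    return duplicates


config = """# Specify that content from old-site.example.com should be stored under that name
old-site.example.com=old-site.example.com
"""
pages = {
    "new-site.example.com": b"<html>Welcome</html>",
    "old-site.example.com": b"<html>Welcome</html>",
}
keep = explicit_preferred_names(config)
print(find_duplicate_sites(pages, keep))  # prints {}
```

Without the self-mapping, keep would be empty and old-site.example.com would be reported as a duplicate of new-site.example.com.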