Creating a robots.txt file

Squiz Content Management allows you to create a robots.txt file to restrict access to sections of your site by robot (or spider) programs.

The permissions that you set for your site also restrict access by robot programs. For example, if you have an intranet site or a members’ area for which public read access has not been granted, Squiz Content Management will not allow robot programs to access these areas.

Before you start

Adhere to the core robots.txt standard at http://www.robotstxt.org/.

Avoid the extensions added by Google and other search engines.

Following these recommendations gives greater assurance that your settings will not be ignored by crawler robots.

Specifically:

  • Do not use the * wildcard character anywhere except in a User-agent directive.

  • Do not use Allow rules.

These recommendations are particularly important when configuring rules for Squiz Search, which supports only the core standard (as do many other web robots).
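For instance, a file that follows these recommendations uses * only in the User-agent directive and no Allow rules; the /private/ path below is a hypothetical example:

```text
# Applies to all robots; "*" appears only in the User-agent directive.
User-agent: *
# Block a hypothetical private section; no Allow rules are used.
Disallow: /private/
```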

Steps

To create a robots.txt file:

  1. Create a text file asset in the root of the site to which you wish to restrict access.

  2. Either upload a pre-existing robots.txt file, or enter the appropriate configuration on the Edit text screen of the text file asset.

An example entry to restrict access by all robots to all areas of your site is shown:

User-agent: *
Disallow: /
  3. On the Details screen of the text file asset, click the Allow unrestricted access toggle to the off position.

  4. Set the text file asset status to live and ensure that you have granted public read access.

Once you have created your robots.txt file, you can use various online robots.txt testing tools to check that it is configured correctly.
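As well as online testers, you can check the rules programmatically. As a minimal sketch, Python’s standard urllib.robotparser module parses the same directives; the content and user-agent names below are placeholders, and no network access is needed:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt content from above, parsed directly as lines;
# RobotFileParser.set_url() would normally point at your live /robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# With "Disallow: /", no path is fetchable by any robot.
print(parser.can_fetch("*", "/"))             # False
print(parser.can_fetch("SquizBot", "/page"))  # False
```

If a path you expect to be blocked comes back fetchable, re-check the file for extension syntax (such as Allow rules) that strict parsers ignore.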