Could not access seed page

Description

This error occurs if the web crawler is unable to access any of the configured start URLs.

Error message

This message is displayed in the update-<COLLECTION-ID>.log file.

Crawl: Error running crawler: Crawler couldn't access seed page

Cause

This means the web crawler couldn’t access any of the specified start URLs.

Resolution

  1. Confirm that the collection’s start URLs are valid and return a status of 200 (OK). For example, try to access the failed URL in your web browser.

  2. Examine the entries for the start URLs in the collection’s crawl logs (crawl.log.*, url_errors.log). Common issues include:

    • The URL was temporarily unavailable

    • A robots.txt file on the web server blocks access

    • The start URL redirects to another URL that doesn’t match an include pattern

    • The start URL requires authentication

  3. Correct the error and then update the collection.
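
The checks in steps 1 and 2 can also be run from a script. The following is a minimal sketch using only the Python standard library, not part of the product. The user agent string, the include patterns, and all function names are hypothetical, and include patterns are assumed to be plain substring matches; adjust these to match the collection's actual configuration.

```python
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urljoin

# Hypothetical values: replace with the collection's real settings.
USER_AGENT = "example-crawler"
INCLUDE_PATTERNS = ["example.com"]


def robots_allows(robots_lines, user_agent, url):
    """Check a URL against robots.txt content supplied as a list of lines."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def diagnose(status, final_url, include_patterns, robots_allowed):
    """Map the result of one request to a likely cause from the list above."""
    if not robots_allowed:
        return "blocked by robots.txt"
    if status in (401, 403):
        return "authentication required or access forbidden"
    if status != 200:
        return f"unexpected status {status}"
    # Assumption: an include pattern matches when it is a substring of the URL.
    if not any(pattern in final_url for pattern in include_patterns):
        return f"redirected to {final_url}, which matches no include pattern"
    return "ok"


def fetch(url, user_agent):
    """Return (status, final_url) for a GET request, following redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        # HTTP error responses (4xx, 5xx) still carry a status code.
        return err.code, url


def check_start_url(url, include_patterns=INCLUDE_PATTERNS, user_agent=USER_AGENT):
    """Fetch robots.txt and the start URL, then report the likely problem."""
    robots_url = urljoin(url, "/robots.txt")
    try:
        request = urllib.request.Request(robots_url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=10) as response:
            robots_lines = response.read().decode("utf-8", "replace").splitlines()
    except OSError:
        # Assumption: an unreadable robots.txt is treated as "no restrictions".
        robots_lines = []
    try:
        status, final_url = fetch(url, user_agent)
    except OSError as err:
        return f"unreachable ({err})"
    allowed = robots_allows(robots_lines, user_agent, url)
    return diagnose(status, final_url, include_patterns, allowed)
```

Calling check_start_url("https://example.com/") for each start URL returns "ok" or a short description of the likely cause. A URL that was only temporarily unavailable may pass on a later run, so the crawl logs remain the authoritative record of what the crawler saw.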