This document describes the crawl summary data structure, which is stored in the summary database and is part of the entity data tree.
It is almost directly derived from the crawl log.
A crawl summary has the following fields:
crawl_type
- Which kind of crawl was performed. (Same as in the crawl log)
crawl_uuid
- UUID that identifies a crawl across databases, taken from the crawl log.
crawl_time
- Taken from time_started in the crawl log.
agent_uuid
- The UUID identifying the agent that did the crawling.
exit_code
- The crawl exit code that summarizes the outcome of the crawl.
server_last_modified
- When the resource was last modified according to the server (UTC timestamp)
request_duration_ms
- How long the request took in milliseconds
was_robotstxt_approved
- Whether the request was approved by robots.txt.
http
- An optional HTTP summary
The URL is not part of this data structure and is assumed to match the entity generation URL.
HTTP Summary
The HTTP summary has the following fields:
status_code
- The status code returned by the server
etag
- Optional. The ETag returned by the server
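
The two structures described above could be modelled roughly as follows. This is only a sketch: the field names come from this document, but the Python types (for example, an integer exit code and timezone-aware datetimes for timestamps), the class names CrawlSummary and HttpSummary, and all example values are assumptions made for illustration, not the actual storage schema.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HttpSummary:
    """Hypothetical model of the HTTP summary."""
    status_code: int            # status code returned by the server
    etag: Optional[str] = None  # optional ETag returned by the server


@dataclass(frozen=True)
class CrawlSummary:
    """Hypothetical model of the crawl summary; types are assumed."""
    crawl_type: str                    # same as in the crawl log
    crawl_uuid: uuid.UUID              # identifies the crawl across databases
    crawl_time: datetime               # taken from time_started in the crawl log
    agent_uuid: uuid.UUID              # agent that did the crawling
    exit_code: int                     # assumed to be an integer code
    server_last_modified: datetime     # UTC timestamp reported by the server
    request_duration_ms: int           # request duration in milliseconds
    was_robotstxt_approved: bool       # whether robots.txt allowed the request
    http: Optional[HttpSummary] = None # optional HTTP summary
    # Note: no URL field; it is assumed to match the entity generation URL.


# Example values below are invented for illustration only.
example = CrawlSummary(
    crawl_type="http",
    crawl_uuid=uuid.UUID("12345678-1234-5678-1234-567812345678"),
    crawl_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    agent_uuid=uuid.UUID("87654321-4321-8765-4321-876543218765"),
    exit_code=0,
    server_last_modified=datetime(2023, 12, 31, tzinfo=timezone.utc),
    request_duration_ms=137,
    was_robotstxt_approved=True,
    http=HttpSummary(status_code=200, etag='"abc123"'),
)
```

Making the classes frozen reflects the idea that a summary is a derived record of a finished crawl and is not expected to change after it is written; that is a design choice of this sketch rather than something the document states.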