This document describes the crawl summary data structure, which is stored in the summary database and is part of the entity data tree.
It is almost directly derived from the crawl log.
A crawl summary has the following fields:
crawl_type- Which kind of crawl was performed. (Same as in the crawl log)
crawl_uuid- UUID to identify a crawl across databases, taken from the crawl log.
crawl_time- Taken from time_started in the crawl log.
agent_uuid- The UUID identifying the agent that did the crawling.
exit_code- The crawl exit code that summarizes the outcome of the
server_last_modified- When the resource was last modified according to the server (UTC timestamp)
request_duration_ms- How long the request took in milliseconds
was_robotstxt_approved- Whether the request was approved by robots.txt.
http- An optional HTTP summary
The URL is not part of this data structure and is assumed to match the entity generation URL.
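As a rough illustration, the field list above could be modelled as a Python dataclass. This is only a sketch: the field names come from the list, but every type is an assumption (the source does not state them), as are the nullability of server_last_modified, the string form of exit_code, and the example values. The http field is kept as a plain mapping here for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CrawlSummary:
    """Hypothetical in-memory shape of a crawl summary; all types are assumed."""

    crawl_type: str                            # same value as in the crawl log
    crawl_uuid: UUID                           # identifies the crawl across databases
    crawl_time: datetime                       # time_started from the crawl log
    agent_uuid: UUID                           # agent that did the crawling
    exit_code: str                             # assumed to be a short string code
    server_last_modified: Optional[datetime]   # UTC; assumed nullable
    request_duration_ms: int                   # request duration in milliseconds
    was_robotstxt_approved: bool               # robots.txt verdict
    http: Optional[Mapping[str, Any]] = None   # optional HTTP summary
    # Note: no URL field; it is assumed to match the entity generation URL.


# Example values are invented for illustration only.
example = CrawlSummary(
    crawl_type="http",
    crawl_uuid=uuid4(),
    crawl_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    agent_uuid=uuid4(),
    exit_code="ok",
    server_last_modified=None,
    request_duration_ms=184,
    was_robotstxt_approved=True,
    http={"status_code": 200, "etag": None},
)
```

Freezing the dataclass reflects the idea that a summary is a record of a finished crawl and is not expected to change afterwards; that is a design choice of this sketch, not something the source requires.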
HTTP Summary
The http summary fields are:
status_code- The status code returned by the server
etag- Optional, ETag returned by the server
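The two HTTP summary fields could be sketched in the same way. The class name HttpSummary and the field types are assumptions made for illustration; only the field names and the optionality of etag come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HttpSummary:
    """Hypothetical shape of the optional HTTP summary; types are assumed."""

    status_code: int             # status code returned by the server
    etag: Optional[str] = None   # ETag returned by the server, if any


# Example values are invented for illustration only.
with_etag = HttpSummary(status_code=200, etag='"abc123"')
without_etag = HttpSummary(status_code=304)
```

Defaulting etag to None lets a summary be built from a response that carried no ETag header without any special casing.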