Data: Crawl Summary

This document describes the crawl summary data structure, which is stored in the summary database and is part of the entity data tree.

It is derived almost directly from the crawl log.

A crawl summary has the following fields:

crawl_type
The kind of crawl that was performed (same as in the crawl log).
crawl_uuid
UUID that identifies the crawl across databases, taken from the crawl log.
crawl_time
When the crawl started, taken from time_started in the crawl log.
agent_uuid
The UUID identifying the agent that did the crawling.
exit_code
The crawl exit code, which summarizes the outcome of the crawl.
server_last_modified
When the resource was last modified according to the server (UTC timestamp).
request_duration_ms
How long the request took, in milliseconds.
was_robotstxt_approved
Whether the request was approved by robots.txt.
http
An optional HTTP summary, described in the HTTP Summary section.

The URL is not part of this data structure; it is assumed to match the entity generation URL.
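The fields above can be pictured as a small record type. Below is a minimal Python sketch, assuming Python types the document does not specify (strings for crawl_type and exit_code, `datetime` for timestamps, `uuid.UUID` for UUIDs). The helper `summary_from_crawl_log` is hypothetical: it illustrates the near-direct derivation from the crawl log under the assumption that a log entry is a dict using the summary's own key names, except that the start time lives under time_started.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class HttpSummary:
    status_code: int            # status code returned by the server
    etag: Optional[str] = None  # optional ETag returned by the server


@dataclass(frozen=True)
class CrawlSummary:
    # Python types below are assumptions; the document does not fix them.
    crawl_type: str                 # same value as in the crawl log
    crawl_uuid: uuid.UUID           # identifies the crawl across databases
    crawl_time: datetime            # the crawl log's time_started
    agent_uuid: uuid.UUID           # agent that did the crawling
    exit_code: str                  # summarizes the outcome of the crawl
    server_last_modified: datetime  # UTC, as reported by the server
    request_duration_ms: int        # request duration in milliseconds
    was_robotstxt_approved: bool    # whether robots.txt allowed the request
    http: Optional[HttpSummary] = None  # at most one HTTP summary
    # No URL field: it is assumed to match the entity generation URL.


def summary_from_crawl_log(entry: Mapping[str, Any]) -> CrawlSummary:
    """Hypothetical helper mapping a crawl log entry to a CrawlSummary.

    Assumes the entry is a dict using the summary's own key names, except
    that the crawl start time is stored under "time_started", and that an
    HTTP sub-entry, if present, is a dict with status_code and etag keys.
    """
    http = entry.get("http")
    return CrawlSummary(
        crawl_type=entry["crawl_type"],
        crawl_uuid=entry["crawl_uuid"],
        crawl_time=entry["time_started"],  # renamed from the crawl log
        agent_uuid=entry["agent_uuid"],
        exit_code=entry["exit_code"],
        server_last_modified=entry["server_last_modified"],
        request_duration_ms=entry["request_duration_ms"],
        was_robotstxt_approved=entry["was_robotstxt_approved"],
        http=HttpSummary(**http) if http is not None else None,
    )
```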

HTTP Summary

The HTTP summary has the following fields:

status_code
The status code returned by the server.
etag
Optional; the ETag returned by the server.
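As an illustration of how these two fields might be filled in, here is a minimal Python sketch. The `HttpSummary` type and the `http_summary_from_response` helper are hypothetical, and keeping the ETag verbatim (including its quotes) is an assumption. Since HTTP header names are case-insensitive, the ETag header is matched without regard to case, and the field stays empty when the server sends none.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class HttpSummary:
    status_code: int            # status code returned by the server
    etag: Optional[str] = None  # ETag returned by the server, if it sent one


def http_summary_from_response(
    status_code: int, headers: Iterable[Tuple[str, str]]
) -> HttpSummary:
    """Hypothetical helper: build an HttpSummary from a response's status
    code and raw (name, value) header pairs."""
    etag = None
    for name, value in headers:
        # Header names are case-insensitive in HTTP.
        if name.lower() == "etag":
            etag = value  # kept verbatim, including quotes (an assumption)
            break
    return HttpSummary(status_code=status_code, etag=etag)
```

With the standard library's `urllib.request.urlopen`, the returned response's `status` attribute and `getheaders()` list of (name, value) pairs could be passed in directly.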