An Entity Generation is a concept and data structure that Unobtanium uses to give every version of a queryable resource (an entity) a kind of space/time coordinate.
Entity Generations live in the summary database and are given a UUID for unique and stable identification.
Note: UUID v7 or v4 is recommended, as both have large random components. v7 is interesting for data transparency because, in theory, it allows tracing when the data was generated.
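As a rough illustration of the "tracing" remark: a UUID v7 stores a Unix timestamp in milliseconds in its first 48 bits. The sketch below reads it back from the raw bytes. The function name `uuid_v7_unix_millis` is made up for this example and is not part of Unobtanium.

```rust
/// Extracts the Unix timestamp (milliseconds) embedded in a UUID v7.
/// Returns None when the version nibble says this is not a v7 UUID.
fn uuid_v7_unix_millis(bytes: [u8; 16]) -> Option<u64> {
    // The UUID version lives in the high nibble of byte 6.
    if bytes[6] >> 4 != 7 {
        return None;
    }
    // Bytes 0..6 hold a big-endian 48-bit Unix timestamp in milliseconds.
    let mut millis: u64 = 0;
    for b in &bytes[..6] {
        millis = (millis << 8) | u64::from(*b);
    }
    Some(millis)
}
```

For the example UUID from RFC 9562, 017F22E2-79B0-7CC3-98C4-DC0C0C07398F, this yields 1645557742000. A v4 UUID carries no such timestamp, so the function returns `None` for it.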
Each EntityGeneration has a few timestamps:
first_seen
last_seen
time_end_confirmed
These record the entity as observed by the unobtanium crawler and, while useful, shouldn't be taken as universal truth.
The UUID itself is just a convenient identifier that refers to a given entity generation.
Fields
url
Url
The URL under which the given entity lives.
uuid
Uuid
The UUID that identifies the entity generation.
first_seen
UtcTimestamp
First known existence of this generation (may change if better data is integrated).
last_seen
UtcTimestamp
Last known existence; may equal first_seen. Updated every time a newer request confirms that the EntityGeneration is still live.
marked_duplicate
bool
Caches the current duplicate status to avoid unnecessary queries to the duplicate summary table.
time_end_confirmed
Option<UtcTimestamp>
If set, the EntityGeneration is considered no longer live. The value is the time at which this was confirmed (usually the first_seen of the next EntityGeneration). In cases of temporary outages this might reset back to None when the entity comes back.
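The fields above could be sketched as a Rust struct roughly like the following. This is a minimal sketch, not the actual Unobtanium definition: the type aliases are stand-ins so the example compiles without external crates, and the helper methods `is_live`, `confirm_seen` and `confirm_end` are hypothetical names that only illustrate the described behaviour.

```rust
// Stand-in aliases; the real Url, Uuid and UtcTimestamp types are not shown here.
type Url = String;
type Uuid = u128;
type UtcTimestamp = i64; // assumed: seconds since the Unix epoch, UTC

#[derive(Debug, Clone, PartialEq)]
struct EntityGeneration {
    url: Url,
    uuid: Uuid,
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    marked_duplicate: bool,
    time_end_confirmed: Option<UtcTimestamp>,
}

impl EntityGeneration {
    /// A generation counts as live while no end has been confirmed.
    fn is_live(&self) -> bool {
        self.time_end_confirmed.is_none()
    }

    /// Hypothetical helper: a newer request confirmed the generation is still there.
    fn confirm_seen(&mut self, now: UtcTimestamp) {
        // Never move last_seen backwards if observations arrive out of order.
        self.last_seen = self.last_seen.max(now);
        // Coming back after a temporary outage resets the confirmed end.
        self.time_end_confirmed = None;
    }

    /// Hypothetical helper: the generation was confirmed gone at `when`.
    fn confirm_end(&mut self, when: UtcTimestamp) {
        self.time_end_confirmed = Some(when);
    }
}
```

Modelling the end as an `Option` keeps "still live" and "ended at a known time" distinct without needing a sentinel timestamp.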
Updates
2024-10-12 Improved Timestamping
Removed fields:
time_started
known_lifetime_seconds
Added fields:
first_seen
last_seen
time_end_confirmed
Before this update every EntityGeneration had a time_started and a known_lifetime_seconds. These were put in place before unobtanium was really able to work with multiple EntityGenerations for one URL, and they turned out to be quite impractical to work with.
They were replaced by start and end times for the range in which the generation was observed ("seen"), plus a definitive end time at which the entity generation was confirmed to be no longer there.
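One way to picture the change is a conversion from the old pair of fields to the new ones. The sketch below is an assumption about how the values relate, not the migration Unobtanium actually ran. It assumes that `time_started` maps to `first_seen`, that `time_started + known_lifetime_seconds` maps to `last_seen`, and that old records carry no confirmed end. The struct and function names are hypothetical.

```rust
type UtcTimestamp = i64; // assumed: seconds since the Unix epoch, UTC

/// Hypothetical pre-update shape, reduced to the two removed fields.
struct OldTiming {
    time_started: UtcTimestamp,
    known_lifetime_seconds: u32,
}

/// Hypothetical post-update shape, reduced to the three added fields.
#[derive(Debug, PartialEq)]
struct NewTiming {
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    time_end_confirmed: Option<UtcTimestamp>,
}

/// Assumed mapping: the observed range is [start, start + lifetime].
fn migrate(old: &OldTiming) -> NewTiming {
    NewTiming {
        first_seen: old.time_started,
        last_seen: old.time_started + i64::from(old.known_lifetime_seconds),
        // Assumption: the old fields never recorded a confirmed end.
        time_end_confirmed: None,
    }
}
```

With explicit timestamps, extending the observed range is a plain assignment to `last_seen` instead of recomputing a duration relative to the start.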
2024-10-26 Duplicate Marker
Added fields:
marked_duplicate
A duplicate marker has been added that mirrors the current duplicate status of an entity generation, which saves a lot of query time during search.
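The idea of a cached marker can be sketched as follows. This is an illustration only: `GenerationRow`, `refresh_marker` and `without_duplicates` are hypothetical names, and the `HashSet` stands in for a lookup against the duplicate summary table.

```rust
use std::collections::HashSet;

// Reduced, hypothetical view of an entity generation for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct GenerationRow {
    uuid: u128, // stand-in for the real Uuid type
    marked_duplicate: bool,
}

/// Refreshes the cached flag from the authoritative duplicate data.
/// `duplicates` stands in for the duplicate summary table.
fn refresh_marker(row: &mut GenerationRow, duplicates: &HashSet<u128>) {
    row.marked_duplicate = duplicates.contains(&row.uuid);
}

/// Search-time filter that reads only the cached flag.
fn without_duplicates(rows: &[GenerationRow]) -> Vec<&GenerationRow> {
    rows.iter().filter(|r| !r.marked_duplicate).collect()
}
```

The more expensive lookup happens when the duplicate status changes, so the search path only has to check a boolean per generation.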