An Entity Generation is a concept and data structure that Unobtanium uses to give each version of a queryable resource (an entity) a kind of space/time coordinate.
Entity Generations live in the summary database and are given a UUID for unique and stable identification.
Note: UUID v7 or v4 are recommended, as both have large random components. v7 is interesting for data transparency because, in theory, it allows tracing when the data was generated.
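To illustrate the tracing point: a UUID v7 stores a 48-bit Unix timestamp in milliseconds in its leading bits (per RFC 9562), so the creation time can be read back from the identifier itself. The following is a minimal sketch using only the Rust standard library; the function name uuid_v7_unix_millis is hypothetical and not part of Unobtanium.

```rust
/// Extracts the Unix timestamp (in milliseconds) embedded in a UUID v7.
/// Returns None if the input is not a well-formed UUID or is not version 7.
/// Hypothetical helper for illustration only; not part of Unobtanium.
pub fn uuid_v7_unix_millis(uuid: &str) -> Option<u64> {
    // Drop the hyphens of the canonical 8-4-4-4-12 text form.
    let hex: String = uuid.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // The version nibble is the 13th hex digit (index 12).
    if &hex[12..13] != "7" {
        return None;
    }
    // The first 48 bits (12 hex digits) are the big-endian millisecond timestamp.
    u64::from_str_radix(&hex[..12], 16).ok()
}
```

For a v4 UUID the function returns None, since v4 carries no timestamp. Note that the embedded time only reflects when the identifier was created, which may or may not coincide with when the data was observed.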
See also:
Open / Closed State
Each entity generation has a few timestamps:
first_seen
last_seen
time_end_confirmed
The first_seen and last_seen timestamps are always set and can be used to determine when the entity generation definitely existed in the recorded form.
As long as the entity can be assumed to exist in the recorded form, the entity generation is considered open.
The time_end_confirmed timestamp is set when Unobtanium learns that the entity has changed. Once that happens, the entity generation is considered closed.
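The open/closed rule can be sketched in Rust as follows. This is a minimal illustration, assuming timestamps are plain Unix seconds (u64); the names GenerationState, state and definitely_existed_at are hypothetical and not taken from the Unobtanium codebase.

```rust
/// Whether an entity generation is still assumed to be live.
#[derive(Debug, PartialEq, Eq)]
pub enum GenerationState {
    Open,
    Closed,
}

/// A generation is closed exactly when an end has been confirmed.
pub fn state(time_end_confirmed: Option<u64>) -> GenerationState {
    match time_end_confirmed {
        Some(_) => GenerationState::Closed,
        None => GenerationState::Open,
    }
}

/// True if `t` falls inside the observed range, i.e. the entity
/// definitely existed in the recorded form at that time.
pub fn definitely_existed_at(first_seen: u64, last_seen: u64, t: u64) -> bool {
    first_seen <= t && t <= last_seen
}
```

Times after last_seen are deliberately not covered by definitely_existed_at: for an open generation the entity is only assumed, not known, to still exist in the recorded form.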
Fields
url (Url): The URL under which the given entity lives.
uuid (Uuid): The UUID that identifies the entity generation.
first_seen (UtcTimestamp): First known existence of this generation (may change if better data is integrated).
last_seen (UtcTimestamp): Last known existence; may equal first_seen. Updated every time a newer request confirms that the entity generation is still open.
marked_duplicate (bool): Caches the current duplicate status to avoid unnecessary queries to the duplicate summary table.
time_end_confirmed (Option<UtcTimestamp>): If set, the entity generation is considered no longer live. Set to the time at which this was confirmed (usually the first_seen of the next entity generation). In cases of temporary outages this might be reset to None when the entity comes back.
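As a rough sketch, the record and its update rules might look like this in Rust. The type aliases are stand-ins (the real Url, Uuid and UtcTimestamp types are not shown in this document), and the method names new, record_sighting, confirm_end and is_open are hypothetical. In particular, reopening a closed generation on any later sighting is an assumption about how the outage case is handled.

```rust
// Stand-ins for the real types, which are not shown in this document.
pub type Url = String;
pub type Uuid = String;
pub type UtcTimestamp = u64; // assumed: Unix seconds

#[derive(Debug, Clone)]
pub struct EntityGeneration {
    pub url: Url,
    pub uuid: Uuid,
    pub first_seen: UtcTimestamp,
    pub last_seen: UtcTimestamp,
    pub marked_duplicate: bool,
    pub time_end_confirmed: Option<UtcTimestamp>,
}

impl EntityGeneration {
    /// A freshly observed generation: seen exactly once and still open.
    pub fn new(url: Url, uuid: Uuid, seen_at: UtcTimestamp) -> Self {
        Self {
            url,
            uuid,
            first_seen: seen_at,
            last_seen: seen_at,
            marked_duplicate: false,
            time_end_confirmed: None,
        }
    }

    /// Integrates a request that observed the entity in the recorded form.
    pub fn record_sighting(&mut self, at: UtcTimestamp) {
        // Better data may move first_seen earlier; newer data moves last_seen later.
        self.first_seen = self.first_seen.min(at);
        self.last_seen = self.last_seen.max(at);
        // Assumption: a sighting at or after the confirmed end means the end
        // was a temporary outage, so the generation is reopened.
        if matches!(self.time_end_confirmed, Some(end) if at >= end) {
            self.time_end_confirmed = None;
        }
    }

    /// Closes the generation at the time the change was confirmed.
    pub fn confirm_end(&mut self, at: UtcTimestamp) {
        self.time_end_confirmed = Some(at);
    }

    pub fn is_open(&self) -> bool {
        self.time_end_confirmed.is_none()
    }
}
```

Keeping first_seen and last_seen as a monotonically widening range means sightings can be integrated in any order without losing information.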
Updates
2024-10-12 Improved Timestamping
Removed fields:
time_started
known_lifetime_seconds
Added fields:
first_seen
last_seen
time_end_confirmed
Before this update, every entity generation had a time_started and a known_lifetime_seconds field. These were put in place before Unobtanium was really able to work with multiple versions of an entity generation for one URL. As it turns out, they are quite impractical to work with.
They were replaced by start and end times for the range in which the entity generation was observed ("seen"), plus a definitive end time at which it was confirmed to be no longer there.
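How existing records were converted is not described here. Purely as an illustration of how the old and new fields relate, one plausible mapping treats time_started as first_seen and time_started plus known_lifetime_seconds as last_seen, and leaves time_end_confirmed unset. The struct and function names below are hypothetical, timestamps are assumed to be Unix seconds, and the mapping itself is an assumption rather than the actual migration.

```rust
/// Timing fields as they existed before the 2024-10-12 update.
pub struct LegacyTiming {
    pub time_started: u64, // assumed: Unix seconds
    pub known_lifetime_seconds: u64,
}

/// Timing fields after the update.
#[derive(Debug, PartialEq, Eq)]
pub struct Timing {
    pub first_seen: u64,
    pub last_seen: u64,
    pub time_end_confirmed: Option<u64>,
}

/// Assumed mapping: the known lifetime spans the observed range, and the
/// old fields carry no information about a confirmed end.
pub fn convert(old: &LegacyTiming) -> Timing {
    Timing {
        first_seen: old.time_started,
        // saturating_add avoids an overflow panic on corrupt lifetimes.
        last_seen: old.time_started.saturating_add(old.known_lifetime_seconds),
        time_end_confirmed: None,
    }
}
```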
2024-10-26 Duplicate Marker
Added fields:
marked_duplicate
A duplicate marker has been added that tracks the current duplicate status of an entity generation. Reading it directly from the entity generation saves a lot of query time during search.