An Entity Generation is a concept and data structure that Unobtanium uses to give each version of a queryable resource (an entity) a coordinate in space and time.
Entity Generations live in the summary database and are given a UUID for unique and stable identification.
Note: UUID v7 or v4 is recommended, as both have large random components. v7 is interesting for data transparency because, in theory, it allows tracing when the data was generated.
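To illustrate why v7 allows tracing the generation time: a UUID v7 stores a Unix timestamp in milliseconds in its first 48 bits. The following is a minimal sketch using only the standard library; the function name `uuid_v7_unix_millis` is hypothetical and not part of Unobtanium.

```rust
/// Returns the Unix timestamp (milliseconds) embedded in a UUID v7,
/// or `None` if the UUID is not version 7.
/// Hypothetical helper for illustration, not an Unobtanium API.
fn uuid_v7_unix_millis(bytes: [u8; 16]) -> Option<u64> {
    // The version number is stored in the high nibble of byte 6.
    if bytes[6] >> 4 != 7 {
        return None;
    }
    // The first 48 bits are a big-endian Unix timestamp in milliseconds.
    let mut millis: u64 = 0;
    for b in &bytes[..6] {
        millis = (millis << 8) | u64::from(*b);
    }
    Some(millis)
}
```

A v4 UUID carries no such timestamp, so the sketch returns `None` for it.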
Open / Closed State
Each entity generation has a few timestamps:
first_seen
last_seen
time_end_confirmed
The first_seen and last_seen timestamps are always set and can be used to determine when the entity generation definitely existed in the recorded form.
As long as the entity can be assumed to exist in the recorded form, the entity generation is considered open.
The time_end_confirmed timestamp is set when Unobtanium learns that the entity has changed. Once that has happened, the entity generation is considered closed.
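The open/closed rule can be sketched as follows. This is an illustration under assumptions, not Unobtanium's actual code: the `GenerationTimes` struct, its method names, and the use of `i64` seconds as a stand-in for the timestamp type are all hypothetical.

```rust
/// Stand-in for the real timestamp type (assumption): seconds since the Unix epoch, UTC.
type UtcTimestamp = i64;

/// Hypothetical holder for the three timestamps of an entity generation.
struct GenerationTimes {
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    time_end_confirmed: Option<UtcTimestamp>,
}

impl GenerationTimes {
    /// A generation is open until an end has been confirmed.
    fn is_open(&self) -> bool {
        self.time_end_confirmed.is_none()
    }

    /// True if `t` lies within the range in which the generation
    /// was actually observed in the recorded form.
    fn definitely_existed_at(&self, t: UtcTimestamp) -> bool {
        self.first_seen <= t && t <= self.last_seen
    }
}
```

Note that a time between last_seen and time_end_confirmed is deliberately not reported as "definitely existed": the entity may already have changed before the change was noticed.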
Fields
url (Url): The URL under which the given entity lives.
uuid (Uuid): The UUID that identifies the entity generation.
first_seen (UtcTimestamp): First known existence of this generation (may change if better data is integrated).
last_seen (UtcTimestamp): Last known existence of this generation. May equal first_seen; updated every time a newer request confirms that the entity generation is still open.
marked_duplicate (bool): Caches the current duplicate status to avoid unnecessary queries to the duplicate summary table.
time_end_confirmed (Option<UtcTimestamp>): If set, the entity generation is considered no longer live. Set to the time at which this was confirmed (usually the first_seen of the next entity generation). In cases of temporary outages this might be reset to None when the entity comes back.
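As a rough sketch, the fields above could map to a Rust struct like the one below. To stay dependency-free it uses `String`, `u128` and `i64` as stand-ins for `Url`, `Uuid` and `UtcTimestamp`; the method names and the exact update rules are assumptions made for illustration, not the actual implementation.

```rust
/// Stand-in for the real timestamp type (assumption): seconds since the Unix epoch, UTC.
type UtcTimestamp = i64;

#[derive(Debug, Clone, PartialEq)]
struct EntityGeneration {
    url: String,  // stand-in for Url
    uuid: u128,   // stand-in for Uuid
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    marked_duplicate: bool,
    time_end_confirmed: Option<UtcTimestamp>,
}

impl EntityGeneration {
    /// A freshly observed generation: first_seen equals last_seen and it is open.
    fn new(url: &str, uuid: u128, seen_at: UtcTimestamp) -> Self {
        Self {
            url: url.to_string(),
            uuid,
            first_seen: seen_at,
            last_seen: seen_at,
            marked_duplicate: false,
            time_end_confirmed: None,
        }
    }

    /// Widens the observed range (assumed rule): newer sightings move last_seen
    /// forward, better historical data may move first_seen back.
    fn record_seen(&mut self, at: UtcTimestamp) {
        self.first_seen = self.first_seen.min(at);
        self.last_seen = self.last_seen.max(at);
    }

    /// Closes the generation once a change of the entity has been confirmed.
    fn confirm_end(&mut self, at: UtcTimestamp) {
        self.time_end_confirmed = Some(at);
    }

    /// Reopens the generation, e.g. when the entity comes back after a temporary outage.
    fn reopen(&mut self) {
        self.time_end_confirmed = None;
    }
}
```

For example, a generation created with `EntityGeneration::new("https://example.org/", 1, 100)` starts open with first_seen and last_seen both at 100; `record_seen(250)` then moves last_seen to 250 while leaving it open.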
Updates
2024-10-12 Improved Timestamping
Removed fields:
time_started
known_lifetime_seconds
Added fields:
first_seen
last_seen
time_end_confirmed
Before this update, every entity generation had a time_started and a known_lifetime_seconds field. These were put in place before Unobtanium was really able to work with multiple entity generations for one URL. As it turns out, they are quite impractical to work with.
They were replaced by start and end times for the range in which the entity generation was observed ("seen"), plus a definitive end time at which it was confirmed to be no longer there.
2024-10-26 Duplicate Marker
Added fields:
marked_duplicate
A duplicate marker has been added that tracks the current duplicate status of an entity generation, which saves a lot of query time during search.