Data: Entity Generation

An Entity Generation is a concept and data structure that Unobtanium uses to give every version of a queryable resource (an entity) a space/time coordinate.

Entity Generations live in the summary database and are given a UUID for unique and stable identification.

Note: UUID v7 or v4 is recommended, as both have large random components. v7 is interesting for data transparency because its embedded timestamp in theory allows tracing when the data was generated.
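
As an illustration of the traceability mentioned in the note, the following is a minimal sketch of how the creation time could be read back from a UUID v7. It relies only on the standard UUID v7 layout (48-bit big-endian Unix timestamp in milliseconds in the first six bytes, version in the high nibble of byte 6). The function name uuid_v7_unix_millis is hypothetical and not part of Unobtanium.

```rust
/// Returns the Unix timestamp in milliseconds embedded in a UUID v7,
/// or `None` if the version nibble says this is not a v7 UUID.
///
/// `uuid_bytes` is the 16-byte big-endian representation of the UUID.
/// Hypothetical helper for illustration, not an Unobtanium API.
fn uuid_v7_unix_millis(uuid_bytes: [u8; 16]) -> Option<u64> {
    // The UUID version is stored in the high nibble of byte 6.
    let version = uuid_bytes[6] >> 4;
    if version != 7 {
        return None;
    }
    // Bytes 0..6 hold the 48-bit timestamp, most significant byte first.
    let mut millis: u64 = 0;
    for byte in &uuid_bytes[..6] {
        millis = (millis << 8) | u64::from(*byte);
    }
    Some(millis)
}
```

A v4 UUID carries no such timestamp, so the sketch returns None for it. Note that the embedded time only says when the UUID was generated, which may differ from first_seen.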

See also:

Open / Closed State

Each entity generation has a few timestamps:

The first_seen and last_seen timestamps are always set and can be used to determine when the entity generation definitely existed in the recorded form.

As long as the entity can be assumed to exist in the recorded form, the entity generation is considered open.

The time_end_confirmed timestamp is set when Unobtanium learns that the entity has changed. From then on, the entity generation is considered closed.
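
The open/closed rule can be sketched as follows. This is a minimal illustration, not Unobtanium's actual code: the names GenerationTimestamps, GenerationState and definitely_existed_at are hypothetical, UtcTimestamp is replaced by plain Unix seconds, and treating both ends of the seen range as inclusive is an assumption.

```rust
/// Stand-in for Unobtanium's `UtcTimestamp`; here simply Unix seconds (assumption).
type UtcTimestamp = i64;

/// Only the timestamp fields of an entity generation (hypothetical struct name).
struct GenerationTimestamps {
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    time_end_confirmed: Option<UtcTimestamp>,
}

#[derive(Debug, PartialEq)]
enum GenerationState {
    Open,
    Closed,
}

impl GenerationTimestamps {
    /// A generation is closed exactly when `time_end_confirmed` is set.
    fn state(&self) -> GenerationState {
        match self.time_end_confirmed {
            None => GenerationState::Open,
            Some(_) => GenerationState::Closed,
        }
    }

    /// True if `at` falls inside the observed range, i.e. the generation
    /// definitely existed in the recorded form at that time.
    /// Inclusive bounds are an assumption of this sketch.
    fn definitely_existed_at(&self, at: UtcTimestamp) -> bool {
        self.first_seen <= at && at <= self.last_seen
    }
}
```

Between last_seen and time_end_confirmed the generation may or may not have existed, which is why the sketch only answers the "definitely existed" question for the seen range.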

Fields

url
Url The URL under which the given entity lives
uuid
Uuid The UUID that identifies the entity generation
first_seen
UtcTimestamp First known existence of this generation (may change if better data is integrated)
last_seen
UtcTimestamp Last known existence; may equal first_seen. Updated every time a newer request confirms that the entity generation is still open.
marked_duplicate
bool Caches the current duplicate status to avoid unnecessary queries to the duplicate summary table.
time_end_confirmed
Option<UtcTimestamp> If set, the entity generation is considered no longer live; the value is the time at which this was confirmed (usually the first_seen of the next entity generation). In cases of temporary outages this might be reset to None when the entity comes back.
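
To make the field list more concrete, here is a rough sketch of how such a record and its timestamp updates might look. It is not the actual Unobtanium definition: Url, Uuid and UtcTimestamp are replaced by standard-library stand-ins, and the method names first_observed, confirm_still_live and confirm_ended are hypothetical. Reopening a closed generation on a new confirmation is an assumption based on the outage remark for time_end_confirmed.

```rust
/// Stand-in for Unobtanium's `UtcTimestamp`; here simply Unix seconds (assumption).
type UtcTimestamp = i64;

#[derive(Debug, Clone, PartialEq)]
struct EntityGeneration {
    url: String, // stand-in for the `Url` type
    uuid: u128,  // stand-in for the `Uuid` type
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    marked_duplicate: bool,
    time_end_confirmed: Option<UtcTimestamp>,
}

impl EntityGeneration {
    /// A generation observed exactly once: first_seen equals last_seen.
    fn first_observed(url: &str, uuid: u128, seen_at: UtcTimestamp) -> Self {
        EntityGeneration {
            url: url.to_string(),
            uuid,
            first_seen: seen_at,
            last_seen: seen_at,
            marked_duplicate: false,
            time_end_confirmed: None,
        }
    }

    fn is_open(&self) -> bool {
        self.time_end_confirmed.is_none()
    }

    /// A request confirmed that the entity still exists in the recorded form.
    fn confirm_still_live(&mut self, seen_at: UtcTimestamp) {
        // Widen the seen range; first_seen may move back if older data is integrated.
        self.first_seen = self.first_seen.min(seen_at);
        self.last_seen = self.last_seen.max(seen_at);
        // Assumption: a confirmation after a temporary outage reopens the generation.
        self.time_end_confirmed = None;
    }

    /// Unobtanium learned that the entity changed; the generation is now closed.
    fn confirm_ended(&mut self, confirmed_at: UtcTimestamp) {
        self.time_end_confirmed = Some(confirmed_at);
    }
}
```

In this sketch the caller would pass the first_seen of the next entity generation to confirm_ended, matching the usual case described for time_end_confirmed.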

Updates

2024-10-12 Improved Timestamping

Removed fields:

Added fields:

Before this update, every entity generation had a time_started and a known_lifetime_seconds field. These were put in place before Unobtanium was really able to work with multiple entity generations for one URL. As it turns out, they are quite impractical to work with.

They were replaced by start and end times for the range in which the entity generation was observed ("seen") and a definitive end time at which the entity generation was confirmed to no longer exist.

2024-10-26 Duplicate Marker

Added fields:

A duplicate marker has been added that tracks the current duplicate status of an entity generation, which saves a lot of query time during search.