Data: Entity Generation

An Entity Generation is a concept and data structure that Unobtanium uses to give any given version of a queryable resource (an entity) a kind of space/time coordinate.

Entity Generations live in the summary database and are given a UUID for unique and stable identification.

Note: UUID v7 or v4 are recommended, as both have large random components. v7 is interesting for data transparency because its embedded timestamp in theory allows tracing when the data was generated.
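To illustrate the tracing idea, here is a minimal sketch of reading the embedded time back out of a UUID v7. It relies only on the standard UUID layout (the top 48 bits hold the Unix time in milliseconds, followed by the 4-bit version). The function name is hypothetical, and the UUID is passed as a plain u128 rather than whatever Uuid type Unobtanium actually uses.

```rust
/// Returns the Unix timestamp in milliseconds embedded in a UUID v7,
/// or `None` if the value is not a version 7 UUID (for example a v4).
/// Hypothetical helper for illustration, not part of Unobtanium.
fn uuid_v7_unix_millis(uuid: u128) -> Option<u64> {
    // The version is the high nibble of byte 6, i.e. bits 76..80.
    let version = (uuid >> 76) & 0xF;
    if version != 7 {
        return None;
    }
    // The top 48 bits hold the Unix time in milliseconds.
    Some((uuid >> 80) as u64)
}
```

Since the timestamp is set by whoever generated the UUID, it is a hint about when the data was generated, not proof.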

Each EntityGeneration has a few timestamps: first_seen, last_seen and time_end_confirmed.

These record the entity as observed by the unobtanium crawler and, while useful, shouldn't be taken as universal truth.

The UUID itself is just a convenient identifier that refers to a given entity generation.

See also:

Fields

url
Url The URL under which the given entity lives.
uuid
Uuid The UUID that identifies the entity generation
first_seen
UtcTimestamp First known existence of this generation (may change if better data is integrated)
last_seen
UtcTimestamp Last known existence; may equal first_seen. Updated every time a newer request confirms that the EntityGeneration is still live.
marked_duplicate
bool Caches the current duplicate status to avoid unnecessary queries to the duplicate summary table.
time_end_confirmed
Option<UtcTimestamp> If set, the EntityGeneration is considered no longer live; the value is the time at which this was confirmed (usually the first_seen of the next EntityGeneration). In cases of temporary outages this might reset back to None when the entity comes back.
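The fields and their update rules can be sketched as a Rust struct. This is only an illustration under stated assumptions: Url, Uuid and UtcTimestamp are replaced by standard-library stand-ins, and the method names (new, is_live, confirm_seen, confirm_ended) are hypothetical, not Unobtanium's actual API.

```rust
// Stand-in types (assumptions): the real code presumably uses
// dedicated Url, Uuid and UtcTimestamp types.
type Url = String;
type Uuid = u128;
type UtcTimestamp = i64; // seconds since the Unix epoch

#[derive(Debug, Clone, PartialEq)]
struct EntityGeneration {
    url: Url,
    uuid: Uuid,
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    marked_duplicate: bool,
    time_end_confirmed: Option<UtcTimestamp>,
}

impl EntityGeneration {
    /// Hypothetical constructor for a generation observed for the first time.
    fn new(url: Url, uuid: Uuid, seen_at: UtcTimestamp) -> Self {
        EntityGeneration {
            url,
            uuid,
            first_seen: seen_at,
            last_seen: seen_at, // may equal first_seen
            marked_duplicate: false,
            time_end_confirmed: None,
        }
    }

    /// A generation counts as live while no end has been confirmed.
    fn is_live(&self) -> bool {
        self.time_end_confirmed.is_none()
    }

    /// A request confirmed that this generation is (still) being served.
    fn confirm_seen(&mut self, seen_at: UtcTimestamp) {
        // first_seen may move earlier if better data is integrated.
        self.first_seen = self.first_seen.min(seen_at);
        self.last_seen = self.last_seen.max(seen_at);
        // The entity came back, e.g. after a temporary outage.
        self.time_end_confirmed = None;
    }

    /// It has been confirmed that this generation is no longer live.
    fn confirm_ended(&mut self, confirmed_at: UtcTimestamp) {
        self.time_end_confirmed = Some(confirmed_at);
    }
}
```

In this sketch, liveness is derived purely from time_end_confirmed, which matches the field description above; how Unobtanium decides that an end is confirmed is outside its scope.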

Updates

2024-10-12 Improved Timestamping

Removed fields: time_started, known_lifetime_seconds

Added fields: first_seen, last_seen, time_end_confirmed

Before this update every EntityGeneration had a time_started and a known_lifetime_seconds. These were put in place before unobtanium was really able to work with multiple EntityGenerations for one URL. As it turned out, they are quite impractical to work with.

They were replaced by start and end times for the range in which the generation was observed ("seen") and a definitive end time at which the entity generation was confirmed to be no longer there.
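How the old fields relate to the new ones might be sketched as follows. This is an assumption-heavy illustration, not the actual migration: it assumes that known_lifetime_seconds counted from time_started, the integer types are guesses, and OldTiming, NewTiming and convert_timing are hypothetical names.

```rust
// Stand-in for the UtcTimestamp type (assumption): seconds since the Unix epoch.
type UtcTimestamp = i64;

/// Timing fields before the 2024-10-12 update.
/// The integer types are assumptions; only the field names are documented.
struct OldTiming {
    time_started: UtcTimestamp,
    known_lifetime_seconds: i64,
}

/// Timing fields after the update.
#[derive(Debug, PartialEq)]
struct NewTiming {
    first_seen: UtcTimestamp,
    last_seen: UtcTimestamp,
    time_end_confirmed: Option<UtcTimestamp>,
}

/// Hypothetical conversion, assuming the lifetime counts from time_started.
fn convert_timing(old: &OldTiming) -> NewTiming {
    NewTiming {
        first_seen: old.time_started,
        last_seen: old.time_started.saturating_add(old.known_lifetime_seconds),
        // The old fields carry no confirmed end, so none is invented here.
        time_end_confirmed: None,
    }
}
```

Under the old layout the last observation time had to be computed from two fields, and there was nowhere to record a confirmed end. The new layout stores both directly.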

2024-10-26 Duplicate Marker

Added fields: marked_duplicate

A duplicate marker has been added that tracks the current duplicate status of an entity generation, which saves a lot of query time during search.
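A minimal sketch of why the cached flag helps: a search pass can decide from the generation record alone, without a second lookup in the duplicate summary table. That search skips duplicates is an assumption of this example, and the reduced struct and the function non_duplicate_uuids are hypothetical.

```rust
/// Reduced stand-in for an entity generation, with only the fields
/// needed here (assumption: the UUID is represented as a u128).
struct EntityGeneration {
    uuid: u128,
    marked_duplicate: bool,
}

/// Returns the UUIDs of all generations not currently marked as duplicates.
/// Only the cached flag is read; no other table is consulted.
fn non_duplicate_uuids(generations: &[EntityGeneration]) -> Vec<u128> {
    generations
        .iter()
        .filter(|g| !g.marked_duplicate)
        .map(|g| g.uuid)
        .collect()
}
```

The trade-off is that the flag is a cache: it has to be updated whenever the duplicate status changes in the duplicate summary table.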