Ranking

This is the algorithm that decides search result ranking.

It is split into two parts:

Database ranking is the algorithm that runs in SQL directly after the search result filter. Postranking runs for every result page and reorders the results based on more sophisticated criteria that would be too expensive to run across the whole database.

Database Ranking

Database ranking is implemented in lib-unobtanium in the get_search_results() function on the summary database.

This documentation was written based on commit 78f080d240 on 2026-05-23.

This algorithm is derived from BM25 and sits halfway between TF-IDF and BM25, plus some additional weighting factors. A higher number from the algorithm means the result is likely to appear further up the result list.

Inputs to the algorithm are:

exact_including_token_ids (optional): A list of token ids that exactly match the entered search term and should get an extra boost.
average_document_token_length: How many tokens the average document has (calculated as described below).
including_token_ids: A list of token ids that, when they match, could cause a page to be included. Put simply: all search query terms that don't use the - syntax.

Score calculation: token_score * indexiness_factor

Calculating the `token_score`

The token score is calculated by iterating over every token that fulfills all of the following requirements:

occurs in the result currently evaluated.
is in the including_token_ids list.
has an inverse document frequency (idf in the token_idf table) greater than 12 (to filter out stopwords and other common words).

The constants for the BM25 part are:

k_one = 1.2
b = 0.75

The value for each token is calculated using:

if token_id in exact_including_token_ids then 1.5 else 1 end *
inverse_document_frequency *
(
	occurances * (k_one + 1) /
	(
		occurances +
		k_one * (
			1 - b +
			b * (relevant_segements / average_document_token_length)
		)
	)
)

The final token_score is the sum of the score of all individual tokens.
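The per-token formula and the summation can be sketched in Python as follows. This is only an illustration of the arithmetic described above, not the actual implementation, which runs as SQL inside get_search_results(). The function signature and the dictionary-based inputs (`doc_tokens`, `idf`) are assumptions made for this example.

```python
# Constants taken from the BM25 part of the documentation.
K_ONE = 1.2
B = 0.75
IDF_THRESHOLD = 12.0   # tokens with an idf at or below this are skipped
EXACT_BOOST = 1.5      # multiplier for exactly matching tokens


def token_score(doc_tokens, including_token_ids, exact_including_token_ids,
                idf, relevant_segements, average_document_token_length):
    """Sum the per-token scores for one result.

    doc_tokens: hypothetical dict mapping token_id -> occurances in this result.
    idf:        hypothetical dict mapping token_id -> inverse document frequency.
    """
    exact = set(exact_including_token_ids or ())
    total = 0.0
    for token_id in set(including_token_ids):
        occurances = doc_tokens.get(token_id, 0)
        if occurances == 0:
            continue  # token does not occur in this result
        token_idf = idf.get(token_id, 0.0)
        if token_idf <= IDF_THRESHOLD:
            continue  # stopword or other common word
        boost = EXACT_BOOST if token_id in exact else 1.0
        # Length normalisation: longer-than-average documents are penalised.
        length_norm = 1 - B + B * (relevant_segements / average_document_token_length)
        saturation = occurances * (K_ONE + 1) / (occurances + K_ONE * length_norm)
        total += boost * token_idf * saturation
    return total
```

For a document of exactly average length in which a token occurs once, the saturation term is 1, so the token contributes its idf (times 1.5 if it is an exact match).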

Calculating the `indexiness_factor`

The indexiness factor is set to 0.9 when the document's indexiness is above 0 (index pages) and to 1 otherwise (leaf pages).
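As a minimal sketch, the factor and its use in the score calculation (token_score * indexiness_factor) could look like this in Python. The function names are hypothetical and only mirror the terms used in this document.

```python
def indexiness_factor(indexiness):
    """Return 0.9 for index pages (indexiness above 0), 1.0 for leaf pages."""
    return 0.9 if indexiness > 0 else 1.0


def final_score(token_score, indexiness):
    """Combine both parts: token_score * indexiness_factor."""
    return token_score * indexiness_factor(indexiness)
```

With these definitions, an index page needs a token score roughly 11% higher than a leaf page to reach the same final score.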

Calculating the `average_document_token_length`

The average document token length is calculated by letting the database compute the average of the relevant_segements field in the text_pile_v0_2 table over all entries that are mentioned at least once in the token_statistics table.
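The query below sketches this calculation using Python's sqlite3 module and an in-memory database. The table names text_pile_v0_2 and token_statistics and the relevant_segements field come from the description above. The join column `text_pile_id`, the other columns, and the use of SQLite are assumptions made for this example and may not match the real schema.

```python
import sqlite3


def average_document_token_length(conn):
    """Average relevant_segements over entries referenced in token_statistics."""
    row = conn.execute(
        """
        SELECT AVG(relevant_segements)
        FROM text_pile_v0_2
        WHERE text_pile_id IN (SELECT text_pile_id FROM token_statistics)
        """
    ).fetchone()
    return row[0]


# Tiny hypothetical dataset: entry 3 has no token statistics and is ignored.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE text_pile_v0_2 (
        text_pile_id INTEGER PRIMARY KEY,
        relevant_segements INTEGER
    );
    CREATE TABLE token_statistics (
        text_pile_id INTEGER,
        token_id INTEGER,
        occurances INTEGER
    );
    INSERT INTO text_pile_v0_2 VALUES (1, 100), (2, 300), (3, 5000);
    INSERT INTO token_statistics VALUES (1, 10, 2), (1, 11, 1), (2, 10, 4);
    """
)
```

Using `IN` with a subquery counts each entry once, no matter how many rows in token_statistics mention it.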

Postranking

Postranking is implemented in the viewer's search worker and reorders results to boost those that mention the searched-for keywords in their titles and those that have the keywords closer together.

TODO: This is not yet documented, apologies.
