Concept: Text Pile

The idea behind the text pile is that the search engine stores text independently of the address where it found that text. This lets it reuse the same stored instance when a page is reachable via more than one address (exact duplicates).
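As an illustration, one way to get this reuse is to key stored text by a hash of its content rather than by URL, and keep a separate URL-to-key mapping. The sketch below is a hypothetical in-memory version: the class `TextPile`, its methods, and the choice of SHA-256 are assumptions for illustration, not the engine's actual design.

```python
from __future__ import annotations

import hashlib


class TextPile:
    """Hypothetical in-memory text pile: each distinct text is stored once,
    keyed by a hash of its content, and every URL points at that key."""

    def __init__(self) -> None:
        self._texts: dict[str, str] = {}       # content hash -> text
        self._url_to_key: dict[str, str] = {}  # URL -> content hash

    def add(self, url: str, text: str) -> str:
        # Key by content, not by address, so exact duplicates share one key.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Store the text only the first time this content is seen.
        self._texts.setdefault(key, text)
        self._url_to_key[url] = key
        return key

    def get(self, url: str) -> str | None:
        # Resolve the URL to its content key, then to the shared text.
        key = self._url_to_key.get(url)
        return None if key is None else self._texts[key]

    def stored_count(self) -> int:
        # Number of distinct texts actually stored.
        return len(self._texts)
```

Adding the same text under two different URLs returns the same key and leaves `stored_count()` at 1, which is the reuse described above. Only byte-identical text collides here; near-duplicates would need a different technique.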

A text pile stores the plain text of the pages it was scraped from in a cleaned-up form, along with some metadata about the semantics of the text and information derived directly from the text.
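A single pile entry might bundle these three parts together. The sketch below assumes a very simple cleaning step (whitespace collapsing) and picks word count and a content hash as examples of derived information; the names `PileEntry`, `clean_text`, and `make_entry`, and all field choices, are hypothetical.

```python
from __future__ import annotations

import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class PileEntry:
    """Hypothetical layout of one text-pile entry."""
    text: str                                                # cleaned plain text
    metadata: dict[str, str] = field(default_factory=dict)  # semantic hints, e.g. a title
    word_count: int = 0                                      # derived from text
    content_hash: str = ""                                   # derived from text


def clean_text(raw: str) -> str:
    # Assumed cleaning: collapse runs of whitespace into one space and trim.
    return re.sub(r"\s+", " ", raw).strip()


def make_entry(raw: str, metadata: dict[str, str] | None = None) -> PileEntry:
    text = clean_text(raw)
    return PileEntry(
        text=text,
        metadata=dict(metadata or {}),
        # Derived fields are computed from the cleaned text only.
        word_count=len(text.split()),
        content_hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),
    )
```

Hashing the cleaned text rather than the raw text is a design choice: it makes pages that differ only in whitespace share an entry, which goes slightly beyond strict exact-duplicate matching.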

Implementations