So you've heard about unobtanium and want to host your own instance of it? This guide is for you.
Note: Unobtanium is unfinished software; be prepared for a fair bit of tinkering.
Warning: This guide is incomplete, but should be enough to get you started.
I'll assume you already know a few things:
- Linux administration in general
- Configuring a web reverse proxy
- HTML
- SQL
- TOML
- How to use cargo build to build your own binaries (a short build sketch follows below)
- Shell scripting (for your own convenience)
If you haven't yet, please have a look at the overview page.
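Since you will be building your own binaries, here is a minimal build sketch. The repository URL is a placeholder; take the real one from the project page. With a standard Rust toolchain, cargo build --release puts the binaries under target/release/.

```sh
# Minimal build sketch. The repository URL below is a placeholder;
# clone the actual unobtanium repository instead.
git clone https://example.org/unobtanium.git
cd unobtanium
cargo build --release
# release binaries end up under target/release/
ls target/release/
```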
Resource requirements
- CPU
- No special requirements beyond having at least 2 cores.
- RAM
- Plan at least 2GB for unobtanium alone; this includes some buffer for caching and for making sure nothing runs out of memory. In practice it's very likely you'll use less, but not having that RAM free for the kernel to use as cache will noticeably slow down searching.
- Disk
- For a small index of one medium-sized blog, 100MB should be enough.
- For large deployments, plan ~10GB per 100K searchable pages for the crawler database and another 6GB per 100K pages for each summary database (in practice you'll probably want two of them to get new data in without downtime).
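To make the disk figures concrete, here is a rough sizing sketch in shell, using the per-100K-page estimates above and assuming you keep two summary databases:

```sh
# Rough disk estimate from the figures above:
#   ~10 GB per 100K pages for the crawler database
#   ~6 GB per 100K pages per summary database (two kept for downtime-free updates)
PAGES_100K=3   # planned index size in units of 100K searchable pages
CRAWLER_GB=$(( PAGES_100K * 10 ))
SUMMARY_GB=$(( PAGES_100K * 6 * 2 ))
echo "crawler db:  ~${CRAWLER_GB} GB"
echo "summary dbs: ~${SUMMARY_GB} GB"
echo "total:       ~$(( CRAWLER_GB + SUMMARY_GB )) GB"
```

For a 300K-page index this comes out to roughly 30GB for the crawler database plus 36GB for the two summary databases.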
What you need to do, overview
Bootstrapping a search engine:
- Writing and testing an initial crawler configuration
- Expanding the configuration
- Doing a full initial crawl and summary
- Setting up the frontend
Maintenance:
- Semi-regular recrawls
- Updating the crawler configuration
- Keeping unobtanium updated
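For the semi-regular recrawls, a cron entry driving a small wrapper script is usually enough. The schedule, paths, and the recrawl.sh script below are purely illustrative; you would write the wrapper yourself around your own crawler and summarizer invocations.

```sh
# Illustrative crontab entry: run a (hypothetical) recrawl wrapper at 03:00
# on the 1st and 15th of every month and keep a log of its output.
# m h dom mon dow  command
0 3 1,15 * *  /opt/unobtanium/recrawl.sh >> /var/log/unobtanium-recrawl.log 2>&1
```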
Your first crawler configuration
Have the crawler's crawl configuration manual ready.
This file will define where the crawler is allowed to collect pages from.
You may find the configuration of unobtanium.rocks useful. Use the shared policies file with the --policy-file
option; it keeps the crawler away from some common parts of websites that it should not collect.
Hint: Start with a small configuration of one to three seeds and a crawler command limit of 1000. This will quickly give you a feel for how the crawler behaves and what it tries to collect.
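As a concrete illustration of such a small test setup, the sketch below writes a tiny seed configuration and runs a first crawl against it. Only the --policy-file option comes from this guide; the binary name, file names, and TOML keys (seeds, command_limit) are placeholders, so take the real names from the crawl configuration manual.

```sh
# Hypothetical first test crawl: every name below except --policy-file is a
# placeholder; check the crawl configuration manual for the real keys.
cat > crawl-test.toml <<'EOF'
# one to three seeds and a low command limit, as suggested above
seeds = ["https://blog.example.org/"]
command_limit = 1000
EOF

./target/release/unobtanium-crawler \
    --policy-file shared-policies.toml \
    crawl-test.toml
```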
After the first successful crawl, try running the summarizer and pointing a locally running viewer at the resulting summary database. Try typing in some keywords you'd expect to find results for.
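A sketch of that first end-to-end check might look as follows; the summarizer and viewer binary names, options, and database paths are again placeholders for whatever your build produces.

```sh
# Placeholder commands: substitute the real summarizer and viewer binaries,
# options, and database paths from your build.
./target/release/unobtanium-summarizer crawl.db summary.db
./target/release/unobtanium-viewer --summary summary.db
# then open the viewer's local address in a browser and try some keywords
```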
Once that works you can expand the crawling configuration and raise the crawler command limit.