Manual: Your first search engine

This is a step-by-step guide to setting up your first unobtanium search instance, from installing the dependencies to getting your first search results.

This guide assumes you are running Linux and know how to navigate in the terminal.

In case you get stuck: Getting stuck while following this guide really shouldn't happen. If you do get stuck, please open an issue on codeberg.org/unobtanium/unobtanium-documentation.

Resource requirements: This tutorial requires almost 2 GB of disk space; make sure you have at least that much free.
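
If you're not sure how much space is free, you can check from the terminal; df is a standard tool and not part of unobtanium:

# Show the free space on the filesystem containing the current folder
df -h .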

Installing dependencies

You need the following packages: git, a Rust toolchain (rustc and cargo), and the development headers for SQLite and OpenSSL.

These packages don't always have the same names, but they should be available for every Linux distribution. Operating systems other than Linux are currently not supported; consider running this inside a Linux virtual machine.

On Alpine Linux:

apk add git rust cargo sqlite-dev openssl-dev

On Debian trixie:

apt install git rustc cargo libsqlite3-dev libssl-dev

On Void Linux:

xbps-install git rust cargo sqlite-devel openssl-devel
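
If you want to confirm the toolchain was installed correctly, asking each tool for its version is a quick, distribution-independent check:

# Each of these should print a version number instead of 'command not found'
git --version
rustc --version
cargo --version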

Setting up

To set up, create a folder called my-first-unobtanium; everything in this tutorial will happen inside it. (The exact name isn't important, but this tutorial is going to reference it a lot.)

Now run the following commands:

# Navigate to the folder you just created
cd my-first-unobtanium

# Download the source code using git
git clone https://codeberg.org/unobtanium/unobtanium

# Navigate into the source code
cd unobtanium

# Use a known working version of unobtanium
# that doesn't require extra steps
git checkout v3.0.0

# Git will complain about something it calls 'detached HEAD' state.
# This is okay since we won't be doing any development.

# Run the rust compiler to build release optimized versions
# This will take a while ...
cargo build --release

# Create a folder outside the source code
# where we can put the resulting binaries
mkdir ../bin

# Copy the binaries to the bin folder we just created
cp target/release/unobtanium-viewer ../bin/
cp target/release/unobtanium-crawler ../bin/

# Free up some space
cargo clean

# Back to the my-first-unobtanium folder
cd ..

# Tell your shell that there are additional commands
# in the bin folder so you can use them by typing their name.
# This is not permanent:
# If you come back later remember to repeat this step.
export PATH="$PATH:$PWD/bin/"

# Make sure the crawler is there
unobtanium-crawler --help

# Make sure the viewer is there
unobtanium-viewer --help

You now have an environment that will work for the rest of the tutorial.
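
Remember that the PATH change above only lasts for the current terminal session. If you come back to the tutorial later, you can restore the environment with two commands (assuming you start from the directory that contains my-first-unobtanium):

# Re-enter the tutorial folder and make the binaries available again
cd my-first-unobtanium
export PATH="$PATH:$PWD/bin/"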

Your first crawl

To search something you need an index; to build an index you need raw data.

In this step we will create a configuration file, run the crawler to collect raw data from the web, and run the summarizer to turn that data into a searchable index.

Creating a configuration file

Inside the my-first-unobtanium folder create a text file example_config.toml:

# This is just a human readable name
name = "Unobtanium example index"

# Wait one second between each request to the same URL origin
default_delay_ms = 1000

# The number of requests to attempt when running the crawler command once.
max_commands_per_run = 100

# Only crawl pages that haven't been crawled within a week
recrawl_interval = "1 week"

# The http `User-Agent`, in this case a placeholder for the tutorial
user_agent = "unobtanium-tutorial-crawler"

# The entry points of the sites that unobtanium should crawl.
seeds = [
	"https://doc.unobtanium.rocks/",
	"https://slatecave.net/",
]

Please don't change the file for now; you can mix it up after you've gotten it working. I know you're curious.

Running the crawler

This step will collect raw pages from the web into the crawl database.

Crawl Database: The crawl database contains raw web pages along with information on when and how they were fetched; other search engines call this their "repository".

Back in the terminal, inside the my-first-unobtanium folder, run:

unobtanium-crawler crawl \
	--config example_config.toml \
	--database example_crawl.db

This will run for about a minute and will collect slightly less than 100 documents from doc.unobtanium.rocks and slatecave.net, in a roughly 50:50 split.

Running the crawler command again will fetch another (almost) 100 documents.
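
If you want a larger index right away, you can simply run the crawl command several times in a row. A small shell loop does the job; this is plain shell, nothing unobtanium-specific:

# Run three crawl passes back to back
for i in 1 2 3; do
	unobtanium-crawler crawl \
		--config example_config.toml \
		--database example_crawl.db
done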

Interrupting the crawler: You can stop the crawler like any other well-behaved command line program with Ctrl+C. When you start it again, it will continue where it left off.

The rest of this section explains what is going on.

To break the command down: the crawl subcommand starts a crawl run, --config points it at the configuration file you just wrote, and --database tells it where to create (and later update) the crawl database.

While the crawler is running you can observe it doing a number of things:

Running the summarizer

The crawler gave you raw data from the web, which is as searchable as a pile of random papers someone dumped on your desk without explanation. The summarizer takes this pile of pages and generates the summary database.

Summary Database: The summary database is the search index of unobtanium and contains data and metadata in a way that is easily searchable.

The summarizer is also built into the crawler; you can run it with the following command in the my-first-unobtanium folder:

unobtanium-crawler summarize \
	--crawler-db example_crawl.db \
	--summary-db example_summary.db

This will run for a few seconds and generate the file example_summary.db.
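
If you want to confirm that both databases are really there, listing them is enough; ls is a standard tool and not part of unobtanium:

# Both files should exist and have a non-trivial size
ls -lh example_crawl.db example_summary.db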

Interrupting the summarizer: Like the crawler, the summarizer can be interrupted with Ctrl+C. It will also resume where it left off.

The rest of this section explains what is going on.

To break down the command: the summarize subcommand runs the summarizer, --crawler-db points it at the crawl database you collected in the previous step, and --summary-db tells it where to create the summary database.

The summarizer will do a few things:

Running the search frontend

Now that you have an index, you'll want to search it. You can start the search frontend with the following command in the my-first-unobtanium folder:

unobtanium-viewer \
	--summary-db example_summary.db \
	--template-location unobtanium/viewer/templates/

You should see it starting up some search workers, starting the templating engine and then telling you Web interface on: 127.0.0.1:3000.

You can now open http://127.0.0.1:3000 in your local web browser and you'll be greeted by a search box.
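
If you are on a machine without a browser, you can poke the viewer from a second terminal instead; this assumes curl is installed, which is not one of the dependencies listed above:

# Fetch the front page and show the first few lines of HTML
curl -sS http://127.0.0.1:3000/ | head -n 5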

Access from the local network: In case you want or need to open the search engine to your local network, pass in --listen 0.0.0.0:3000. In a real deployment on the internet you'll want to use a reverse proxy.
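
As a concrete example, the full viewer invocation with that extra flag might look like this; 0.0.0.0 makes the viewer reachable from other machines, so only use it on a network you trust:

unobtanium-viewer \
	--summary-db example_summary.db \
	--template-location unobtanium/viewer/templates/ \
	--listen 0.0.0.0:3000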

Some queries for you to try:

You can stop the viewer with Ctrl+C.

File overview

An overview of all files created in this guide:

example_config.toml: the crawler configuration you wrote by hand
example_crawl.db: the crawl database containing the raw pages collected by the crawler
example_summary.db: the summary database, i.e. the search index built by the summarizer
bin/: the folder with the unobtanium-crawler and unobtanium-viewer binaries
unobtanium/: the source code checkout

What now?

Congratulations, you now have a working search engine!

To build this from a tutorial example into a real search engine, whether on your own network or on the internet, the next steps are: