When building search for an application, you typically face two broad approaches:

  1. Keyword-based search, which matches the literal terms of the query.
  2. Semantic (vector) search, which matches the meaning of the query.

There's also a hybrid approach, but I will leave that for a future article. Instead, in this post, I'll walk you through how the two broad approaches work in Python using MariaDB and an AI embedding model, highlight where they differ, and show code that you can adapt.

The Key Components

For this example, I used MariaDB Cloud to spin up a free serverless database; within seconds, the instance was ready. I grabbed the host/user/password details, connected with VS Code, created a database called demo, created a products table, and loaded ~500 rows of product names via LOAD DATA LOCAL INFILE. This is an extremely small dataset, but it's enough for learning and experimentation.
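As a rough sketch, the table and data load looked something like this (the column list and the products.csv file name are illustrative; adapt them to your data):

CREATE TABLE products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

-- LOCAL INFILE must be enabled on both the client and the server
LOAD DATA LOCAL INFILE 'products.csv'
INTO TABLE products
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(name);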

Then I built a small Python + FastAPI app. First, I implemented a simple keyword search (by product name) endpoint using a full-text index, then I implemented a semantic (vector) search using AI-generated vector embeddings + MariaDB’s vector support. You can see the whole process in this video.
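The endpoints in the rest of this post assume two pieces of scaffolding: a FastAPI app and a MariaDB connection. A minimal sketch, with placeholder credentials (a production app would read these from the environment and use a connection pool):

import mariadb
from fastapi import FastAPI

app = FastAPI()

# Placeholders: use the host/user/password from your MariaDB Cloud instance
connection = mariadb.connect(
    host="your-host.example.com",
    port=3306,
    user="your-user",
    password="your-password",
    database="demo",
)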

Keyword-based Search: Simple and Familiar

For keyword search, I used a full-text index on the name column of the products table (its definition is shown right after the query below). With this index in place, I could search by product name using this SQL query:

SELECT name
FROM products
ORDER BY MATCH(name) AGAINST(?) DESC
LIMIT 10;
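For reference, the full-text index itself takes a single statement (the index name is arbitrary):

CREATE FULLTEXT INDEX ft_products_name ON products(name);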

I exposed this functionality using a FastAPI endpoint as follows:

@app.get("/products/text-search")
def text_search(query: str):
    cursor = connection.cursor()
    cursor.execute(
        "SELECT name FROM products ORDER BY MATCH(name) AGAINST(?) LIMIT 10;", (query,)
    )
    return [name for (name,) in cursor]

Pros:

  - Quick to set up: one index and a plain SQL query, no extra infrastructure.
  - Everything runs directly in the database you already have.

Cons:

  - Limited to literal term matching: synonyms and related concepts are missed.
  - Individual word matches can surface results that aren't actually relevant.

In my demo, the endpoint returned several products that were not relevant to “running shoes”.

Semantic (Vector) Search: Matching Meaning

To go beyond keywords, I implemented a second endpoint:

  1. Use an AI embedding model (Google Generative AI via LangChain) to convert each product name into a high-dimensional vector.
  2. Store those vectors in MariaDB with the vector integration for LangChain.
  3. At query time, embed the user’s search phrase into a vector (using exactly the same AI embedding model as in the first step), then perform a similarity search with the highly performant HNSW algorithm in MariaDB (e.g., top 10 nearest vectors) and return the corresponding products.
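Before the endpoints below can work, the embedding model and the vector store need to be wired up. Here is a minimal sketch; the exact MariaDBStore parameters may differ between versions of the langchain-mariadb package, so treat this as an outline and double-check its documentation:

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_mariadb import MariaDBStore

# Embedding model (requires a GOOGLE_API_KEY environment variable)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Vector store backed by MariaDB; the connection URL is a placeholder
vector_store = MariaDBStore(
    embeddings=embeddings,
    embedding_length=768,  # dimension of the chosen embedding model
    datasource="mariadb+mariadbconnector://user:password@host/demo",
    collection_name="products",
)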

Here’s how I implemented the ingestion endpoint:

@app.post("/products/ingest")
def ingest_products():
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM products;")
    vector_store.add_texts([name for (name,) in cursor])
    return "Products ingested successfully"

And this is the semantic search endpoint:

@app.get("/products/semantic-search")
def search_products(query: str):
    results = vector_store.similarity_search(query, k=10)
    return [doc.page_content for doc in results]

The LangChain integration for MariaDB makes the whole process extremely easy. The integration creates two tables:

  - A collection table, which groups the embeddings that belong to a given collection (here, the product names).
  - An embedding table, which stores the vectors themselves alongside the original text and optional metadata.
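Under the hood, a similarity search boils down to a query over the embedding table using MariaDB's vector functions, roughly like this (the table and column names are illustrative; the integration generates the actual SQL):

-- ? is the query vector as text, e.g. '[0.12, -0.03, ...]'
SELECT content
FROM embeddings
ORDER BY VEC_DISTANCE_COSINE(embedding, VEC_FromText(?))
LIMIT 10;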

When I ran the semantic search endpoint with the same query “running shoes”, the results felt much more relevant: they included products that didn’t match “running” or “shoes” literally but were semantically close.

Keyword vs. Semantic — When to Use Which

Here’s a quick comparison:

| Approach | Pros | Cons |
| --- | --- | --- |
| Keyword search | Quick to set up, uses SQL directly | Limited to literal term matching, less clever |
| Semantic search | Matches meaning and context, more flexible | Requires embedding model + vector support |

Pick keyword search when:

  - Your users search by exact terms, such as product codes or precise names.
  - You want the simplest possible setup: an index and a SQL query, nothing more.

Pick semantic search when:

  - Users phrase queries in natural language, and synonyms or related concepts should still match.
  - You can afford the extra moving parts: an embedding model and vector storage.

In many real-world apps, you’ll use a hybrid: start with keyword search, and for higher-value queries or when an exact match fails, fall back to semantic search. Or even mix the two via hybrid search. MariaDB helps with this, too.
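As a rough sketch of that fallback idea (application-level logic only, reusing the pieces above; the endpoint path is made up for illustration):

@app.get("/products/search")
def hybrid_search(query: str):
    # Try the cheap keyword search first; MATCH in the WHERE clause
    # keeps only rows with a nonzero relevance score
    cursor = connection.cursor()
    cursor.execute(
        "SELECT name FROM products WHERE MATCH(name) AGAINST(?) LIMIT 10;", (query,)
    )
    results = [name for (name,) in cursor]
    if results:
        return results
    # No literal matches: fall back to semantic search
    return [doc.page_content for doc in vector_store.similarity_search(query, k=10)]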

How Simple the Integration Can Be

In my demo, I triggered vector ingestion via a POST endpoint (/products/ingest). That endpoint reads all product names, computes embeddings, and writes them to MariaDB. One line of code (via the LangChain + MariaDB integration) handled the insertion of ~500 rows of vectors.

Once the vectors are stored, adding a semantic search endpoint is just a few lines of code. MariaDB's vector support hides most of the complexity.

The Source Code

You can find the code on GitHub. There is one simplistic, easy-to-follow program in webinar-main.py and a more elaborate one with good practices in backend.py. Feel free to clone the repository, modify it, experiment with your own datasets, and let us know if there's anything you'd like to see in the LangChain integration for MariaDB.

https://www.youtube.com/watch?v=B8XGe4KIv8o