Your users don't search the way you think they do.

Someone looking for "how to cancel my subscription" might type "stop paying" or "end membership." Traditional keyword search returns nothing. The user assumes the answer isn't in your docs. They open a support ticket.

This happens constantly. Keyword search worked when documents were structured and users searched with predictable terms. That era is over. Modern applications need search that understands intent, not just matches strings.

The good news: you can build semantic search that runs entirely on your machine, costs nothing to operate, and keeps your data private. No cloud APIs. No vector database subscriptions. Just a small language model generating embeddings locally.

Let’s build it.

Keyword search relies on exact or fuzzy string matching. It fails when:

  1. Users describe the same idea with different words ("stop paying" instead of "cancel subscription")
  2. The query and the document use different word forms or phrasings
  3. Typos and misspellings break exact matches

You can patch this with stemming, synonym lists, and fuzzy matching. But you're fighting the core limitation: keyword search has no understanding of meaning.

What Are Embeddings?

An embedding is a list of numbers that represents the meaning of text. Two pieces of text with similar meanings produce similar number lists. Two unrelated texts produce different ones.

Think of it like coordinates. The sentence "How do I reset my password?" and "I forgot my login credentials" would be placed near each other in this mathematical space because they mean similar things.

When a user searches, you:

  1. Convert their query into an embedding
  2. Compare it against your pre-computed document embeddings
  3. Return the documents with the most similar embeddings

No keyword matching. Pure meaning comparison.
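To make that concrete, here's a toy comparison. The three-number vectors below are invented purely for illustration; real embeddings come from a model and have hundreds of dimensions, but the idea is the same.

// Toy illustration only: invented 3-dimensional "embeddings".
// Similar meanings point in similar directions, so their score is higher.
function similarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const magnitude = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (magnitude(a) * magnitude(b));
}

const resetPassword = [0.9, 0.1, 0.2]; // "How do I reset my password?"
const forgotLogin = [0.8, 0.2, 0.3];   // "I forgot my login credentials"
const pizzaDough = [0.1, 0.9, 0.7];    // "Best pizza dough recipe"

console.log(similarity(resetPassword, forgotLogin)); // ~0.98, very similar
console.log(similarity(resetPassword, pizzaDough));  // ~0.30, much lower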

Local vs Cloud: Why Run Embeddings Locally?

Cloud embedding APIs (OpenAI, Cohere, Voyage) are convenient but come with tradeoffs:

| Aspect | Cloud APIs | Local SLM |
| --- | --- | --- |
| Cost | Per-request pricing | Free after setup |
| Privacy | Data sent to a third party | Data stays on device |
| Latency | Network round-trip | Local inference only |
| Availability | Depends on service uptime | Always available |
| Rate limits | Yes | No |

For internal tools, documentation search, or privacy-sensitive applications, local embeddings make sense. They're also great for development: no API keys, no costs during iteration.

The tradeoff is quality. Cloud models are larger and produce better embeddings. For many use cases, especially when your search corpus is focused (your own docs, a specific domain), smaller models work well.

Architecture Overview

Here's what we're building: a React app running in the browser that calls a locally running Ollama server to turn text into embeddings, stores documents and their embeddings in IndexedDB, and ranks search results by cosine similarity in memory.

Components:

  1. Ollama serving the nomic-embed-text model over HTTP
  2. An embedding service that wraps Ollama's embeddings endpoint
  3. IndexedDB (via the idb library) for persisting documents and embeddings
  4. A search module that scores documents with cosine similarity
  5. A React UI for indexing documents and running searches

Setting Up Ollama

Ollama lets you run language models locally. Install it from ollama.ai, then pull an embedding model:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a small, fast embedding model
ollama pull nomic-embed-text

nomic-embed-text produces 768-dimensional embeddings and runs well on modest hardware. It's a good balance between quality and speed.

Verify it's working:

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "test embedding"
}'

You should get back a JSON response with an embedding array of 768 numbers.

Project Setup

We'll build a React + TypeScript application with Vite. The complete code is available at github.com/ivmarcos/local-ai-search-engine.

npm create vite@latest local-search -- --template react-ts
cd local-search
npm install idb uuid

idb is a tiny wrapper around IndexedDB that makes it less painful to use, and uuid generates the IDs we'll assign to indexed documents.

The Embedding Service

First, let's create a service that talks to Ollama:

// src/services/embeddings.ts

const OLLAMA_URL = 'http://localhost:11434';
const MODEL = 'nomic-embed-text';

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: MODEL,
      prompt: text,
    }),
  });

  if (!response.ok) {
    throw new Error(`Ollama error: ${response.statusText}`);
  }

  const data = await response.json();
  return data.embedding;
}

export async function generateEmbeddings(
  texts: string[]
): Promise<number[][]> {
  // The /api/embeddings endpoint takes one prompt at a time, so process sequentially
  const embeddings: number[][] = [];

  for (const text of texts) {
    const embedding = await generateEmbedding(text);
    embeddings.push(embedding);
  }

  return embeddings;
}

Ollama's /api/embeddings endpoint accepts a single prompt per request, so we process texts sequentially. For large document sets, you might want to add parallelism with a concurrency limit, as sketched below.
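If indexing gets slow, a few requests can run in parallel as long as you cap the concurrency so Ollama isn't flooded. Here's a rough sketch of that idea for the same embeddings.ts module; the generateEmbeddingsConcurrent name and the default limit of 4 are choices made for this example, not part of the repo above.

// Sketch: embed texts with a small concurrency cap.
// The default of 4 workers is an arbitrary example value; tune it to your hardware.
export async function generateEmbeddingsConcurrent(
  texts: string[],
  concurrency: number = 4
): Promise<number[][]> {
  const results: number[][] = new Array(texts.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < texts.length) {
      const i = next++;
      results[i] = await generateEmbedding(texts[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(concurrency, texts.length) }, () => worker())
  );
  return results;
}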

Storing Embeddings in IndexedDB

IndexedDB is a browser database that persists data locally. It's perfect for storing embeddings: no server needed, survives page refreshes, and can handle substantial amounts of data.

// src/services/database.ts

import { openDB, DBSchema, IDBPDatabase } from 'idb';

export interface Document {
  id: string;
  title: string;
  content: string;
  embedding: number[];
  createdAt: number;
}

interface SearchDB extends DBSchema {
  documents: {
    key: string;
    value: Document;
    indexes: { 'by-created': number };
  };
}

let dbInstance: IDBPDatabase<SearchDB> | null = null;

export async function getDB(): Promise<IDBPDatabase<SearchDB>> {
  if (dbInstance) return dbInstance;

  dbInstance = await openDB<SearchDB>('local-search', 1, {
    upgrade(db) {
      const store = db.createObjectStore('documents', { keyPath: 'id' });
      store.createIndex('by-created', 'createdAt');
    },
  });

  return dbInstance;
}

export async function saveDocument(doc: Document): Promise<void> {
  const db = await getDB();
  await db.put('documents', doc);
}

export async function getAllDocuments(): Promise<Document[]> {
  const db = await getDB();
  return db.getAll('documents');
}

export async function deleteDocument(id: string): Promise<void> {
  const db = await getDB();
  await db.delete('documents', id);
}

export async function clearAllDocuments(): Promise<void> {
  const db = await getDB();
  await db.clear('documents');
}

Each document stores its content alongside its embedding. This denormalization makes search fast: we load everything once, then compute similarities in memory.

The Search Engine

The core of semantic search is comparing embeddings using cosine similarity. Two vectors pointing in the same direction (similar meaning) have similarity close to 1. Orthogonal vectors (unrelated) have similarity close to 0.

// src/services/search.ts

import { Document, getAllDocuments } from './database';
import { generateEmbedding } from './embeddings';

function cosineSimilarity(a: number[], b: number[]): number {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
  return magnitude === 0 ? 0 : dotProduct / magnitude;
}

export interface SearchResult {
  document: Document;
  score: number;
}

export async function search(
  query: string,
  limit: number = 10
): Promise<SearchResult[]> {
  const queryEmbedding = await generateEmbedding(query);
  const documents = await getAllDocuments();

  const results: SearchResult[] = documents.map((doc) => ({
    document: doc,
    score: cosineSimilarity(queryEmbedding, doc.embedding),
  }));

  results.sort((a, b) => b.score - a.score);

  return results.slice(0, limit);
}

This loads all documents into memory for each search. That's fine for hundreds or even a few thousand documents. For larger datasets, you'd want to implement approximate nearest neighbor search or use a proper vector store.
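Before reaching for an ANN library, there's one cheap optimization: if you normalize each embedding to unit length when you index it, cosine similarity reduces to a plain dot product at query time. This is a sketch of the idea, not something the code above does; you'd normalize embeddings in indexContent and in search, then score with dotProduct instead of cosineSimilarity.

// Sketch: pre-normalize vectors at index time so search only needs a dot product.
export function normalize(v: number[]): number[] {
  const magnitude = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return magnitude === 0 ? v : v.map((x) => x / magnitude);
}

// For unit-length vectors, the dot product equals cosine similarity.
export function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

The scan is still O(n), but each comparison does roughly a third of the arithmetic.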

Indexing Content

Now let's tie it together with a function that indexes new content:

// src/services/indexer.ts

import { v4 as uuidv4 } from 'uuid';
import { Document, saveDocument } from './database';
import { generateEmbedding } from './embeddings';

export interface ContentToIndex {
  title: string;
  content: string;
  id?: string;
}

export async function indexContent(
  content: ContentToIndex
): Promise<Document> {
  const embedding = await generateEmbedding(
    `${content.title}\n\n${content.content}`
  );

  const document: Document = {
    id: content.id || uuidv4(),
    title: content.title,
    content: content.content,
    embedding,
    createdAt: Date.now(),
  };

  await saveDocument(document);
  return document;
}

export async function indexBatch(
  contents: ContentToIndex[],
  onProgress?: (indexed: number, total: number) => void
): Promise<Document[]> {
  const documents: Document[] = [];

  for (let i = 0; i < contents.length; i++) {
    const doc = await indexContent(contents[i]);
    documents.push(doc);
    onProgress?.(i + 1, contents.length);
  }

  return documents;
}

Notice we concatenate title and content before embedding. This gives the model more context and generally produces better results than embedding just the content.

Building the UI

Here's a minimal React interface that lets you add documents and search:

// src/App.tsx

import { useState, useEffect } from 'react';
import { indexContent } from './services/indexer';
import { search, SearchResult } from './services/search';
import { getAllDocuments, deleteDocument, Document } from './services/database';
import './App.css';

function App() {
  const [documents, setDocuments] = useState<Document[]>([]);
  const [results, setResults] = useState<SearchResult[]>([]);
  const [query, setQuery] = useState('');
  const [title, setTitle] = useState('');
  const [content, setContent] = useState('');
  const [isIndexing, setIsIndexing] = useState(false);
  const [isSearching, setIsSearching] = useState(false);

  useEffect(() => {
    loadDocuments();
  }, []);

  async function loadDocuments() {
    const docs = await getAllDocuments();
    setDocuments(docs);
  }

  async function handleIndex() {
    if (!title.trim() || !content.trim()) return;

    setIsIndexing(true);
    try {
      await indexContent({ title, content });
      setTitle('');
      setContent('');
      await loadDocuments();
    } finally {
      setIsIndexing(false);
    }
  }

  async function handleSearch() {
    if (!query.trim()) return;

    setIsSearching(true);
    try {
      const searchResults = await search(query);
      setResults(searchResults);
    } finally {
      setIsSearching(false);
    }
  }

  async function handleDelete(id: string) {
    await deleteDocument(id);
    await loadDocuments();
    setResults(results.filter(r => r.document.id !== id));
  }

  return (
    <div className="container">
      <h1>Local AI Search</h1>

      <section className="add-document">
        <h2>Add Document</h2>
        <input
          type="text"
          placeholder="Title"
          value={title}
          onChange={(e) => setTitle(e.target.value)}
        />
        <textarea
          placeholder="Content"
          value={content}
          onChange={(e) => setContent(e.target.value)}
          rows={4}
        />
        <button onClick={handleIndex} disabled={isIndexing}>
          {isIndexing ? 'Indexing...' : 'Index Document'}
        </button>
      </section>

      <section className="search">
        <h2>Search</h2>
        <div className="search-box">
          <input
            type="text"
            placeholder="Search by meaning, not keywords..."
            value={query}
            onChange={(e) => setQuery(e.target.value)}
            onKeyDown={(e) => e.key === 'Enter' && handleSearch()}
          />
          <button onClick={handleSearch} disabled={isSearching}>
            {isSearching ? 'Searching...' : 'Search'}
          </button>
        </div>
      </section>

      {results.length > 0 && (
        <section className="results">
          <h2>Results</h2>
          {results.map((result) => (
            <div key={result.document.id} className="result-card">
              <div className="result-header">
                <h3>{result.document.title}</h3>
                <span className="score">
                  {(result.score * 100).toFixed(1)}% match
                </span>
              </div>
              <p>{result.document.content}</p>
            </div>
          ))}
        </section>
      )}

      <section className="documents">
        <h2>Indexed Documents ({documents.length})</h2>
        {documents.map((doc) => (
          <div key={doc.id} className="document-card">
            <h4>{doc.title}</h4>
            <p>{doc.content.substring(0, 100)}...</p>
            <button
              className="delete-btn"
              onClick={() => handleDelete(doc.id)}
            >
              Delete
            </button>
          </div>
        ))}
      </section>
    </div>
  );
}

export default App;

Add a few documents to test. Try these:

  1. Title: "Password Reset", Content: "To reset your password, click the forgot password link on the login page and follow the email instructions."
  2. Title: "Account Security", Content: "Enable two-factor authentication in your security settings to protect your account from unauthorized access."
  3. Title: "Billing FAQ", Content: "You can update your payment method or cancel your subscription from the billing section in account settings."

Now search for "I can't log in" and watch it find the password reset article, even though those exact words don't appear anywhere.

Search for "stop my subscription" and it finds the billing FAQ.

That's semantic search working.

Performance Considerations

A few things to keep in mind:

Embedding generation takes time. On a MacBook Pro M1, nomic-embed-text generates about 10-15 embeddings per second. For large document sets, index during off-peak times or show progress to users.

Memory usage scales with documents. Each 768-dimensional embedding is about 3KB. 10,000 documents use ~30MB just for embeddings, plus the document content. This is manageable in the browser, but keep an eye on it as your corpus grows.

Search is O(n). We compare every document on each search. For thousands of documents, this takes milliseconds. For millions, you need approximate nearest neighbor algorithms. Libraries like hnswlib-node or faiss can help, though they complicate the pure-browser approach.

Chunking Long Documents

One issue: embedding models have token limits, typically 512-8192 tokens. Long documents need to be split into chunks.

// src/services/chunker.ts

export interface Chunk {
  text: string;
  startIndex: number;
  endIndex: number;
}

export function chunkText(
  text: string,
  maxLength: number = 500,
  overlap: number = 50
): Chunk[] {
  const chunks: Chunk[] = [];
  let start = 0;

  while (start < text.length) {
    let end = start + maxLength;

    // Try to break at sentence boundary
    if (end < text.length) {
      const lastPeriod = text.lastIndexOf('.', end);
      if (lastPeriod > start + maxLength / 2) {
        end = lastPeriod + 1;
      }
    }

    chunks.push({
      text: text.slice(start, end).trim(),
      startIndex: start,
      endIndex: Math.min(end, text.length),
    });

    start = end - overlap;
    if (start >= text.length) break;
  }

  return chunks;
}

Each chunk gets its own embedding. When searching, you find relevant chunks, then return the parent document. This lets semantic search work on 50-page PDFs or long documentation.
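Sketching the search side of that, under the assumption that each stored chunk carries a parentId field pointing at its source document (a field the Document interface above doesn't have): after scoring chunks, keep only the best-scoring chunk per parent and return those.

// Sketch: collapse chunk-level scores to one result per parent document.
// Assumes chunk records carry a hypothetical parentId field.
interface ChunkResult {
  parentId: string;
  score: number;
}

export function bestChunkPerParent(results: ChunkResult[]): ChunkResult[] {
  const best = new Map<string, ChunkResult>();
  for (const result of results) {
    const current = best.get(result.parentId);
    if (!current || result.score > current.score) {
      best.set(result.parentId, result);
    }
  }
  // Highest-scoring parents first.
  return [...best.values()].sort((a, b) => b.score - a.score);
}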

When Cloud Embeddings Still Make Sense

Local embedding isn't always the answer:

  1. You need the best possible retrieval quality, and larger hosted models produce better embeddings
  2. Your corpus runs into the millions of documents and needs real vector infrastructure
  3. Your users' machines can't run Ollama, or you can't ask them to install it

Real-World Use Cases

Where local semantic search shines:

Documentation search: Replace Cmd+F with something that understands intent: "How do I authenticate?" finds the right docs even if they say "login" or "sign in."

Internal knowledge bases: Company wikis, runbooks, and internal docs stay on-premises. No sending proprietary information to external APIs.

Personal tools: Note-taking apps, bookmark managers, local file search. Your data never leaves your machine.

Offline-capable apps: Semantic search works without internet. Index once, search forever.

Development and testing: Iterate on search quality without API costs. Once it works, decide if you need cloud-scale deployment.

What We Built

A semantic search engine that:

  1. Generates embeddings locally with Ollama and nomic-embed-text
  2. Stores documents and their embeddings in IndexedDB, right in the browser
  3. Ranks results by cosine similarity, so queries match meaning instead of keywords
  4. Costs nothing to run, needs no cloud APIs, and keeps your data on your machine

The complete code is at github.com/ivmarcos/local-ai-search-engine. Clone it, install dependencies, make sure Ollama is running, and try it yourself.

Local AI isn't about replacing cloud services. It's about having options. Sometimes you need scale and quality at any cost. Sometimes you need privacy and simplicity. Now you can build for both.