Your users don't search the way you think they do.
Someone looking for "how to cancel my subscription" might type "stop paying" or "end membership." Traditional keyword search returns nothing. The user assumes the answer isn't in your docs. They open a support ticket.
This happens constantly. Keyword search worked when documents were structured and users searched with predictable terms. That era is over. Modern applications need search that understands intent, not just matches strings.
The good news: you can build semantic search that runs entirely on your machine, costs nothing to operate, and keeps your data private. No cloud APIs. No vector database subscriptions. Just a small language model generating embeddings locally.
Let’s build it.
The Problem with Keyword Search
Keyword search relies on exact or fuzzy string matching. It fails when:
- Users describe concepts differently than your content does ("authentication" vs "login" vs "sign in")
- Synonyms aren't explicitly indexed
- Context matters more than individual words
- Typos or variations break the match
You can patch this with stemming, synonym lists, and fuzzy matching. But you're fighting the core limitation: keyword search has no understanding of meaning.
What Are Embeddings?
An embedding is a list of numbers that represents the meaning of text. Two pieces of text with similar meanings produce similar number lists. Two unrelated texts produce different ones.
Think of it like coordinates. The sentence "How do I reset my password?" and "I forgot my login credentials" would be placed near each other in this mathematical space because they mean similar things.
When a user searches, you:
- Convert their query into an embedding
- Compare it against your pre-computed document embeddings
- Return the documents with the most similar embeddings
No keyword matching. Pure meaning comparison.
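Here's that flow as a sketch. The `embed` and `similarity` helpers are placeholders for the real implementations we'll build later in this post:

```typescript
// Conceptual sketch only: embed() and similarity() stand in for the
// Ollama call and cosine similarity function built later in this post.
type IndexedDoc = { text: string; embedding: number[] };

async function semanticSearch(
  query: string,
  docs: IndexedDoc[],
  embed: (text: string) => Promise<number[]>,
  similarity: (a: number[], b: number[]) => number
) {
  const queryEmbedding = await embed(query);        // 1. convert the query to a vector
  return docs
    .map((doc) => ({ doc, score: similarity(queryEmbedding, doc.embedding) })) // 2. compare
    .sort((a, b) => b.score - a.score)              // 3. most similar first
    .slice(0, 10);
}
```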
Local vs Cloud: Why Run Embeddings Locally?
Cloud embedding APIs (OpenAI, Cohere, Voyage) are convenient but come with tradeoffs:
| Aspect | Cloud APIs | Local SLM |
|---|---|---|
| Cost | Per-request pricing | Free after setup |
| Privacy | Data sent to third party | Data stays on device |
| Latency | Network round-trip | Instant |
| Availability | Depends on service uptime | Always available |
| Rate limits | Yes | No |
For internal tools, documentation search, or privacy-sensitive applications, local embeddings make sense. They're also great for development: no API keys, no costs during iteration.
The tradeoff is quality. Cloud models are larger and produce better embeddings. For many use cases, especially when your search corpus is focused (your own docs, a specific domain), smaller models work well.
Architecture Overview
Here's what we're building:
Components:
- Ollama: Runs embedding models locally
- Embedding Generator: Sends text to Ollama, receives vectors
- IndexedDB: Stores documents and their embeddings in the browser
- Search Engine: Computes similarity between query and stored embeddings
Setting Up Ollama
Ollama lets you run language models locally. Install it from ollama.ai:
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a small, fast embedding model
ollama pull nomic-embed-text
`nomic-embed-text` produces 768-dimensional embeddings and runs well on modest hardware. It's a good balance between quality and speed.
Verify it's working:
curl http://localhost:11434/api/embeddings -d '{
"model": "nomic-embed-text",
"prompt": "test embedding"
}'
You should get back a JSON response with an embedding array.
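For reference, the response body has roughly this shape (the values below are illustrative, and `nomic-embed-text` returns 768 of them):

```typescript
// Rough shape of Ollama's /api/embeddings response.
interface OllamaEmbeddingResponse {
  embedding: number[]; // e.g. [0.62, -1.04, 0.31, ...] (768 values for nomic-embed-text)
}
```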
Project Setup
We'll build a React + TypeScript application with Vite. The complete code is available at
npm create vite@latest local-search -- --template react-ts
cd local-search
npm install idb uuid
idb is a tiny wrapper around IndexedDB that makes it less painful to use; uuid generates IDs for the documents we'll index later.
The Embedding Service
First, let's create a service that talks to Ollama:
// src/services/embeddings.ts
const OLLAMA_URL = 'http://localhost:11434';
const MODEL = 'nomic-embed-text';
export async function generateEmbedding(text: string): Promise<number[]> {
const response = await fetch(`${OLLAMA_URL}/api/embeddings`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: MODEL,
prompt: text,
}),
});
if (!response.ok) {
throw new Error(`Ollama error: ${response.statusText}`);
}
const data = await response.json();
return data.embedding;
}
export async function generateEmbeddings(
texts: string[]
): Promise<number[][]> {
// Process in batches to avoid overwhelming the model
const embeddings: number[][] = [];
for (const text of texts) {
const embedding = await generateEmbedding(text);
embeddings.push(embedding);
}
return embeddings;
}
Ollama's /api/embeddings endpoint accepts a single prompt per request, so we process texts sequentially. For large document sets, you might want to add parallelism with a concurrency limit.
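Here's one way that could look: a small worker pool that keeps a few requests in flight at once. This is a sketch layered on the `generateEmbedding` function above; the concurrency of 4 is an arbitrary starting point.

```typescript
// Sketch: embed many texts with a bounded number of concurrent Ollama requests.
// Results come back in the same order as the inputs.
export async function generateEmbeddingsConcurrent(
  texts: string[],
  concurrency: number = 4
): Promise<number[][]> {
  const results: number[][] = new Array(texts.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < texts.length) {
      const index = next++; // claim the next unprocessed text
      results[index] = await generateEmbedding(texts[index]);
    }
  }

  // Spin up at most `concurrency` workers; each pulls from the shared queue.
  await Promise.all(
    Array.from({ length: Math.min(concurrency, texts.length) }, worker)
  );
  return results;
}
```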
Storing Embeddings in IndexedDB
IndexedDB is a browser database that persists data locally. It's perfect for storing embeddings: no server needed, survives page refreshes, and can handle substantial amounts of data.
// src/services/database.ts
import { openDB, DBSchema, IDBPDatabase } from 'idb';
export interface Document {
id: string;
title: string;
content: string;
embedding: number[];
createdAt: number;
}
interface SearchDB extends DBSchema {
documents: {
key: string;
value: Document;
indexes: { 'by-created': number };
};
}
let dbInstance: IDBPDatabase<SearchDB> | null = null;
export async function getDB(): Promise<IDBPDatabase<SearchDB>> {
if (dbInstance) return dbInstance;
dbInstance = await openDB<SearchDB>('local-search', 1, {
upgrade(db) {
const store = db.createObjectStore('documents', { keyPath: 'id' });
store.createIndex('by-created', 'createdAt');
},
});
return dbInstance;
}
export async function saveDocument(doc: Document): Promise<void> {
const db = await getDB();
await db.put('documents', doc);
}
export async function getAllDocuments(): Promise<Document[]> {
const db = await getDB();
return db.getAll('documents');
}
export async function deleteDocument(id: string): Promise<void> {
const db = await getDB();
await db.delete('documents', id);
}
export async function clearAllDocuments(): Promise<void> {
const db = await getDB();
await db.clear('documents');
}
Each document stores its content alongside its embedding. This denormalization makes search fast: we load everything once, then compute similarities in memory.
The Search Engine
The core of semantic search is comparing embeddings using cosine similarity. Two vectors pointing in the same direction (similar meaning) have similarity close to 1. Orthogonal vectors (unrelated) have similarity close to 0.
// src/services/search.ts
import { Document, getAllDocuments } from './database';
import { generateEmbedding } from './embeddings';
function cosineSimilarity(a: number[], b: number[]): number {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
const magnitude = Math.sqrt(normA) * Math.sqrt(normB);
return magnitude === 0 ? 0 : dotProduct / magnitude;
}
export interface SearchResult {
document: Document;
score: number;
}
export async function search(
query: string,
limit: number = 10
): Promise<SearchResult[]> {
const queryEmbedding = await generateEmbedding(query);
const documents = await getAllDocuments();
const results: SearchResult[] = documents.map((doc) => ({
document: doc,
score: cosineSimilarity(queryEmbedding, doc.embedding),
}));
results.sort((a, b) => b.score - a.score);
return results.slice(0, limit);
}
This loads all documents into memory for each search. That's fine for hundreds or even a few thousand documents. For larger datasets, you'd want to implement approximate nearest neighbor search or use a proper vector store.
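Even before reaching for those, there's a smaller win: cache the documents in memory instead of re-reading IndexedDB on every search, and invalidate the cache whenever a document is written or deleted. A minimal sketch (the invalidation calls would go in `saveDocument`, `deleteDocument`, and `clearAllDocuments`):

```typescript
// Sketch: keep documents in memory between searches, refresh only after writes.
import { Document, getAllDocuments } from './database';

let cachedDocuments: Document[] | null = null;

export async function getDocumentsCached(): Promise<Document[]> {
  if (!cachedDocuments) {
    cachedDocuments = await getAllDocuments();
  }
  return cachedDocuments;
}

// Call this after any write (saveDocument, deleteDocument, clearAllDocuments).
export function invalidateDocumentCache(): void {
  cachedDocuments = null;
}
```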
Indexing Content
Now let's tie it together with a function that indexes new content:
// src/services/indexer.ts
import { v4 as uuidv4 } from 'uuid';
import { Document, saveDocument } from './database';
import { generateEmbedding } from './embeddings';
export interface ContentToIndex {
title: string;
content: string;
id?: string;
}
export async function indexContent(
content: ContentToIndex
): Promise<Document> {
const embedding = await generateEmbedding(
`${content.title}\n\n${content.content}`
);
const document: Document = {
id: content.id || uuidv4(),
title: content.title,
content: content.content,
embedding,
createdAt: Date.now(),
};
await saveDocument(document);
return document;
}
export async function indexBatch(
contents: ContentToIndex[],
onProgress?: (indexed: number, total: number) => void
): Promise<Document[]> {
const documents: Document[] = [];
for (let i = 0; i < contents.length; i++) {
const doc = await indexContent(contents[i]);
documents.push(doc);
onProgress?.(i + 1, contents.length);
}
return documents;
}
Notice we concatenate title and content before embedding. This gives the model more context and generally produces better results than embedding just the content.
Building the UI
Here's a minimal React interface that lets you add documents and search:
// src/App.tsx
import { useState, useEffect } from 'react';
import { indexContent } from './services/indexer';
import { search, SearchResult } from './services/search';
import { getAllDocuments, deleteDocument, Document } from './services/database';
import './App.css';
function App() {
const [documents, setDocuments] = useState<Document[]>([]);
const [results, setResults] = useState<SearchResult[]>([]);
const [query, setQuery] = useState('');
const [title, setTitle] = useState('');
const [content, setContent] = useState('');
const [isIndexing, setIsIndexing] = useState(false);
const [isSearching, setIsSearching] = useState(false);
useEffect(() => {
loadDocuments();
}, []);
async function loadDocuments() {
const docs = await getAllDocuments();
setDocuments(docs);
}
async function handleIndex() {
if (!title.trim() || !content.trim()) return;
setIsIndexing(true);
try {
await indexContent({ title, content });
setTitle('');
setContent('');
await loadDocuments();
} finally {
setIsIndexing(false);
}
}
async function handleSearch() {
if (!query.trim()) return;
setIsSearching(true);
try {
const searchResults = await search(query);
setResults(searchResults);
} finally {
setIsSearching(false);
}
}
async function handleDelete(id: string) {
await deleteDocument(id);
await loadDocuments();
setResults(results.filter(r => r.document.id !== id));
}
return (
<div className="container">
<h1>Local AI Search</h1>
<section className="add-document">
<h2>Add Document</h2>
<input
type="text"
placeholder="Title"
value={title}
onChange={(e) => setTitle(e.target.value)}
/>
<textarea
placeholder="Content"
value={content}
onChange={(e) => setContent(e.target.value)}
rows={4}
/>
<button onClick={handleIndex} disabled={isIndexing}>
{isIndexing ? 'Indexing...' : 'Index Document'}
</button>
</section>
<section className="search">
<h2>Search</h2>
<div className="search-box">
<input
type="text"
placeholder="Search by meaning, not keywords..."
value={query}
onChange={(e) => setQuery(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && handleSearch()}
/>
<button onClick={handleSearch} disabled={isSearching}>
{isSearching ? 'Searching...' : 'Search'}
</button>
</div>
</section>
{results.length > 0 && (
<section className="results">
<h2>Results</h2>
{results.map((result) => (
<div key={result.document.id} className="result-card">
<div className="result-header">
<h3>{result.document.title}</h3>
<span className="score">
{(result.score * 100).toFixed(1)}% match
</span>
</div>
<p>{result.document.content}</p>
</div>
))}
</section>
)}
<section className="documents">
<h2>Indexed Documents ({documents.length})</h2>
{documents.map((doc) => (
<div key={doc.id} className="document-card">
<h4>{doc.title}</h4>
<p>{doc.content.substring(0, 100)}...</p>
<button
className="delete-btn"
onClick={() => handleDelete(doc.id)}
>
Delete
</button>
</div>
))}
</section>
</div>
);
}
export default App;
Testing Semantic Search
Add a few documents to test. Try these:
- Title: "Password Reset", Content: "To reset your password, click the forgot password link on the login page and follow the email instructions."
- Title: "Account Security", Content: "Enable two-factor authentication in your security settings to protect your account from unauthorized access."
- Title: "Billing FAQ", Content: "You can update your payment method or cancel your subscription from the billing section in account settings."
Now search for "I can't log in" and watch it find the password reset article, even though those exact words don't appear anywhere.
Search for "stop my subscription" and it finds the billing FAQ.
That's semantic search working.
Performance Considerations
A few things to keep in mind:
Embedding generation takes time. On a MacBook Pro M1, nomic-embed-text generates about 10-15 embeddings per second. For large document sets, index during off-peak times or show progress to users.
Memory usage scales with documents. Each 768-dimensional embedding is about 3KB, so 10,000 documents use ~30MB just for embeddings, plus the document content. That's manageable in a browser, but worth keeping an eye on.
Search is O(n). We compare every document on each search. For thousands of documents, this takes milliseconds. For millions, you need approximate nearest neighbor algorithms. Libraries like hnswlib-node or faiss can help, though they complicate the pure-browser approach.
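One cheap optimization before reaching for those libraries: normalize embeddings to unit length at index time. The cosine similarity of unit vectors is just their dot product, so each per-query comparison gets simpler. A sketch, not wired into the code above:

```typescript
// Sketch: normalize vectors once at index time so search only needs a dot product.
export function normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

// For unit-length vectors, this equals cosine similarity.
export function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```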
Chunking Long Documents
One issue: embedding models have token limits, typically 512-8192 tokens. Long documents need to be split into chunks.
// src/services/chunker.ts
export interface Chunk {
text: string;
startIndex: number;
endIndex: number;
}
export function chunkText(
text: string,
maxLength: number = 500,
overlap: number = 50
): Chunk[] {
const chunks: Chunk[] = [];
let start = 0;
while (start < text.length) {
let end = start + maxLength;
// Try to break at sentence boundary
if (end < text.length) {
const lastPeriod = text.lastIndexOf('.', end);
if (lastPeriod > start + maxLength / 2) {
end = lastPeriod + 1;
}
}
chunks.push({
text: text.slice(start, end).trim(),
startIndex: start,
endIndex: Math.min(end, text.length),
});
start = end - overlap;
if (start >= text.length) break;
}
return chunks;
}
Each chunk gets its own embedding. When searching, you find relevant chunks, then return the parent document. This lets semantic search work on 50-page PDFs or long documentation.
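One way to wire that up, as a sketch: index each chunk as its own document carrying a hypothetical `parentId` field, search as before, then collapse chunk hits to their best-scoring parent.

```typescript
// Sketch: collapse chunk-level hits to one result per parent document,
// keeping the best chunk score. Assumes chunks were indexed as Documents
// with a hypothetical parentId field pointing back at the source document.
import { Document } from './database';

type ChunkDocument = Document & { parentId?: string };
type ChunkResult = { document: ChunkDocument; score: number };

export function collapseToParents(results: ChunkResult[], limit = 10): ChunkResult[] {
  const best = new Map<string, ChunkResult>();
  for (const result of results) {
    const key = result.document.parentId ?? result.document.id;
    const current = best.get(key);
    if (!current || result.score > current.score) {
      best.set(key, result);
    }
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```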
When Cloud Embeddings Still Make Sense
Local embedding isn't always the answer:
- Quality matters more than privacy: OpenAI's `text-embedding-3-large` produces better embeddings than any local model. For user-facing search where quality directly impacts experience, the API cost may be worth it.
- Scale: If you're indexing millions of documents, cloud services handle the infrastructure. Running large embedding workloads locally requires serious hardware.
- Team doesn't want to manage Ollama: Ollama is simple, but it's another thing to install and maintain. For quick prototypes or teams without local setup capability, cloud APIs are easier.
- Mobile or restricted environments: Can't run Ollama on an iPhone. Browser-based embedding models exist (like transformers.js) but are slower and more limited.
Real-World Use Cases
Where local semantic search shines:
Documentation search: Replace Cmd+F with something that understands intent, so "How do I authenticate?" finds the right docs even if they say "login" or "sign in."
Internal knowledge bases: Company wikis, runbooks, and internal docs stay on-premises. No sending proprietary information to external APIs.
Personal tools: Note-taking apps, bookmark managers, local file search. Your data never leaves your machine.
Offline-capable apps: Semantic search works without internet. Index once, search forever.
Development and testing: Iterate on search quality without API costs. Once it works, decide if you need cloud-scale deployment.
What We Built
A semantic search engine that:
- Runs entirely locally with no cloud dependencies
- Uses Ollama with `nomic-embed-text` for embeddings
- Stores everything in IndexedDB
- Understands meaning, not just keywords
- Costs nothing to operate
The complete code is at
Local AI isn't about replacing cloud services. It's about having options. Sometimes you need scale and quality at any cost. Sometimes you need privacy and simplicity. Now you can build for both.