This post is part of The Database Zoo: Exotic Data Storage Engines, a series exploring purpose-built databases engineered for specific workloads. Each post dives into a different type of specialized engine, explaining the problem it solves, the design decisions behind its architecture, how it stores and queries data efficiently, and real-world use cases. The goal is to show not just what these databases are, but why they exist and how they work under the hood.

Introduction

Vector embeddings have quietly become one of the most important data types in modern systems. Every LLM application, recommendation engine, semantic search feature, image similarity tool, fraud detector, and "find me things like this" workflow ultimately boils down to the same operation: convert some input into a high-dimensional vector, then search for its nearest neighbours.

At small scales this is straightforward, but as the volume of data and dimensionality grow, it's the sort of problem that turns general-purpose databases into smoke.

Vector search workloads have very different characteristics from classical OLTP (Online Transaction Processing) or document-store workloads:

This is why vector databases exist. They're not just "databases that store vectors"; they're purpose-built engines optimized around approximate nearest neighbour (ANN) search, distance-based retrieval, metadata filtering, high-throughput ingestion, and lifecycle management for embeddings at scale.

In this article we'll walk through how vector databases are structured, why they look the way they do, what indexing techniques they rely on, how queries are executed, what trade-offs matter, and where these systems shine or struggle in practice. By the end, you should have a mental model strong enough to reason about algorithm choice, storage design, performance tuning, and architectural decisions for any vector search workload.

Why General-Purpose Databases Struggle

Even the most robust relational and document-oriented databases stumble when faced with vector search workloads. The patterns and scale of high-dimensional embeddings expose fundamental limitations in systems designed for exact-match or low-dimensional indexing.

High-Dimensional Similarity Queries

Vector search is fundamentally about similarity, not equality. Unlike a traditional SQL query that looks for a value or range, a vector query typically asks:

Which vectors are closest to this one according to some distance metric?

General-purpose databases are optimized for exact-match or low-dimensional range queries. Indexes like B-trees and hash maps break down in high dimensions, a consequence of the curse of dimensionality: as dimensions increase, pairwise distances concentrate and nearly all points appear equidistant, so traditional indexes prune very little of the search space and queries degrade towards full scans.
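The concentration effect is easy to observe on random data. The sketch below is only an illustration, assuming uniformly random points (real embeddings are more structured); `relative_contrast` is a hypothetical helper that measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for distances from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows: near and far become hard to tell apart.
for dim in (2, 32, 512):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast approaches zero, any index that prunes by distance bounds has very little to prune.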

Approximate Nearest Neighbour Workload

At scale, brute-force searches across millions or billions of embeddings are computationally infeasible:

Approximate Nearest Neighbour (ANN) algorithms solve this, but most general-purpose databases do not implement them natively. Without ANN, queries over large collections take seconds or minutes rather than milliseconds.
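To make the cost concrete, here is a minimal sketch of exact (brute-force) k-NN in plain Python. The function name `brute_force_knn` and the toy dataset sizes are assumptions for illustration; the point is that every query touches every vector.

```python
import heapq
import math
import random

def brute_force_knn(query, vectors, k):
    """Exact k-NN: compute the distance to every vector, keep the k smallest."""
    scored = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)

rng = random.Random(42)
dim, n = 64, 2_000
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
query = vectors[7]

top = brute_force_knn(query, vectors, k=3)
print(top[0])  # the query itself, at distance 0.0

# Work per query is n * dim multiply-adds: 128,000 here,
# but roughly 7.7e11 for one billion 768-dimensional vectors.
```

The work grows linearly with both the number of vectors and the dimension, which is exactly the cost ANN indexes are designed to avoid.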

Metadata Filtering and Hybrid Queries

Vector searches rarely occur in isolation. Most real-world applications require hybrid queries, such as:

Relational databases can filter metadata efficiently, but they cannot combine these filters with high-dimensional distance calculations without either brute-force scanning or complex application-level pipelines.

Ingestion at Scale

Modern vector pipelines can continuously produce embeddings:

Storage and Compression Challenges

Embeddings are dense, high-dimensional floating-point vectors. Naive storage in relational tables or JSON documents results in:

Specialized vector databases implement compression, quantization, or block-oriented storage schemes to reduce disk and memory usage while maintaining query accuracy.
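A quick back-of-the-envelope calculation shows why compression matters. The corpus size, dimension, and the 64-byte product-quantization code length below are hypothetical values chosen for illustration, and the figures ignore IDs, metadata, and index overhead.

```python
def storage_bytes(n_vectors, dim, bytes_per_component):
    """Raw vector payload size, ignoring IDs, metadata, and index overhead."""
    return n_vectors * dim * bytes_per_component

n, dim = 1_000_000_000, 768          # hypothetical corpus
float32 = storage_bytes(n, dim, 4)   # uncompressed 32-bit floats
int8 = storage_bytes(n, dim, 1)      # 8-bit scalar quantization
pq64 = n * 64                        # product quantization, 64 one-byte codes per vector

for name, size in [("float32", float32), ("int8", int8), ("pq64", pq64)]:
    print(f"{name}: {size / 1e9:,.0f} GB")
```

Under these assumptions the raw vectors need about 3,072 GB as float32, 768 GB as int8, and 64 GB as 64-byte PQ codes, which is the difference between a multi-node cluster and a single machine's RAM.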

Summary

General-purpose relational and document stores are reliable for exact-match or low-dimensional queries, but vector search workloads present unique challenges:

These challenges justify the emergence of vector databases: purpose-built engines designed to efficiently store, index, and query embeddings while supporting metadata filters, high throughput, and scalable approximate nearest neighbour algorithms.

Core Architecture

Vector databases are built to handle high-dimensional embeddings efficiently, addressing both the computational and storage challenges that general-purpose systems cannot. Their architecture revolves around optimized storage, indexing, and query execution tailored to similarity search workloads.

Storage Layouts

Unlike relational databases, vector databases adopt storage formats that prioritize both memory efficiency and fast distance computations:

These storage choices allow vector databases to scale to billions of embeddings without sacrificing query performance.

Indexing Strategies

Efficient indexing is critical for fast similarity search:

Together, these structures allow vector databases to perform ANN searches over millions or billions of vectors with millisecond-scale latency.
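As one illustration of how an index prunes the search space, here is a minimal sketch of an inverted-file (IVF) style index. It is a simplification under stated assumptions: centroids are sampled from the data rather than trained with k-means, and `build_ivf`, `ivf_search`, `n_lists`, and `n_probe` are illustrative names, not any engine's API.

```python
import math
import random

def build_ivf(vectors, n_lists, rng):
    """Assign each vector to its nearest centroid (centroids are sampled, not k-means trained)."""
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, v in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: math.dist(v, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k, n_probe):
    """Scan only the n_probe inverted lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    ranked = sorted(candidates, key=lambda i: math.dist(query, vectors[i]))
    return ranked[:k], len(candidates)

rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2_000)]
centroids, lists = build_ivf(vectors, n_lists=32, rng=rng)

ids, scanned = ivf_search(vectors[5], vectors, centroids, lists, k=5, n_probe=4)
print(ids[0], scanned)  # nearest hit is vector 5 itself; only a subset of the 2000 vectors is scanned
```

Raising `n_probe` scans more lists, which improves recall at the cost of latency; that knob is the accuracy/latency trade-off in miniature.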

Query-Aware Compression

Vector databases often store embeddings in compressed formats, enabling efficient computation without fully decompressing:

These techniques reduce both RAM usage and disk I/O, critical for large-scale vector datasets.
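One simple way to compute on compressed data is scalar quantization. The sketch below assumes symmetric 8-bit quantization with one scale per vector; `quantize` and `quantized_dot` are hypothetical helpers, and production engines use more elaborate schemes (for example product quantization with lookup tables).

```python
import random

def quantize(vec):
    """Symmetric 8-bit scalar quantization: integer codes in [-127, 127] plus one scale."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in vec], scale

def quantized_dot(codes_a, scale_a, codes_b, scale_b):
    """Inner product computed on the integer codes, rescaled once at the end."""
    return scale_a * scale_b * sum(a * b for a, b in zip(codes_a, codes_b))

rng = random.Random(1)
a = [rng.uniform(-1, 1) for _ in range(128)]
b = [rng.uniform(-1, 1) for _ in range(128)]

exact = sum(x * y for x, y in zip(a, b))
approx = quantized_dot(*quantize(a), *quantize(b))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The codes take a quarter of the space of float32 values, and the inner product never reconstructs the original floats; the price is a small approximation error in each score.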

Real-world applications often require a combination of vector similarity and structured filtering:

This hybrid approach ensures that vector databases are not just fast for raw similarity search but practical for complex application queries.

Summary

The core architecture of vector databases relies on:

By combining these elements, vector databases achieve fast, scalable similarity search while managing storage, memory, and computational efficiency in ways that general-purpose databases cannot match.

Query Execution and Patterns

Vector databases are designed around the unique demands of similarity search in high-dimensional spaces. Queries typically involve finding the closest vectors to a given embedding, often combined with filters or aggregations. Efficient execution requires careful coordination between indexing structures, storage layouts, and distance computation strategies.

Common Query Types

k-Nearest Neighbor (k-NN) Search

Fetch the top k vectors most similar to a query embedding, according to a distance metric (e.g., cosine similarity, Euclidean distance, inner product).

Example: Finding the 10 most similar product images to a new upload.

Optimized by: ANN indexes (HNSW, IVF, PQ) that prune the search space and avoid scanning all vectors.
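The three metrics mentioned above are closely related. The following sketch, written in plain Python for clarity rather than speed, shows that on unit-normalized vectors cosine similarity and inner product coincide, which is why many pipelines normalize embeddings once at ingestion time.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))

def normalize(v):
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))                     # 0.96
print(inner_product(normalize(a), normalize(b)))   # same value, up to floating-point rounding
print(euclidean(a, b))                             # sqrt(2)
```

For unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so all three metrics produce the same ranking once vectors are normalized.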

Range / Radius Search

Retrieve all vectors within a specified distance threshold from the query embedding.

Example: Returning all text embeddings with a similarity score above 0.8 for semantic search.

Optimized by: Multi-level index traversal with early pruning based on approximate distance bounds.
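Unlike k-NN, a range query returns a variable number of results. The sketch below is a brute-force version under that caveat (a real engine would prune with its index first); `range_search` and the toy vectors are illustrative.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def range_search(query, vectors, min_similarity):
    """Return (similarity, index) for every vector at or above the threshold, best first."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return sorted((hit for hit in scored if hit[0] >= min_similarity), reverse=True)

vectors = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]
hits = range_search([1.0, 0.0], vectors, min_similarity=0.8)
print([i for _, i in hits])  # [0, 1]
```

Because the result size depends on the data, range queries are harder to bound in latency than top-k queries, which is one reason early pruning matters.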

Filtered / Hybrid Queries

Combine vector similarity search with structured filters on metadata or attributes.

Example: Find the closest 5 product embeddings in the "electronics" category with a price < $500.

Optimized by: Pre-filtering candidates using secondary indexes, then performing ANN search on the reduced set.
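The pre-filtering strategy can be sketched in a few lines. This is a toy version under stated assumptions: the product records are made up, the metadata filter is a plain Python predicate rather than a secondary index, and the survivors are ranked by exact distance instead of an ANN index.

```python
import math

# Hypothetical catalogue: metadata plus a 2-dimensional embedding per product.
products = [
    {"id": 1, "category": "electronics", "price": 299.0, "embedding": [0.9, 0.1]},
    {"id": 2, "category": "electronics", "price": 799.0, "embedding": [1.0, 0.0]},
    {"id": 3, "category": "books",       "price": 15.0,  "embedding": [0.95, 0.05]},
    {"id": 4, "category": "electronics", "price": 120.0, "embedding": [0.2, 0.8]},
    {"id": 5, "category": "electronics", "price": 450.0, "embedding": [0.7, 0.3]},
]

def filtered_knn(query, items, predicate, k):
    """Pre-filter on metadata, then rank only the surviving candidates by distance."""
    candidates = [item for item in items if predicate(item)]
    candidates.sort(key=lambda item: math.dist(query, item["embedding"]))
    return [item["id"] for item in candidates[:k]]

result = filtered_knn(
    [1.0, 0.0],
    products,
    lambda p: p["category"] == "electronics" and p["price"] < 500,
    k=2,
)
print(result)  # [1, 5]
```

Product 2 is the closest vector overall but fails the price filter, and product 3 fails the category filter, so neither can appear in the result. Pre-filtering guarantees this, whereas filtering after the vector search may return fewer than k results when most of the nearest neighbours are rejected.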

Batch Search

Execute multiple vector queries simultaneously, often in parallel.

Example: Performing similarity searches for hundreds of user queries in a recommendation pipeline.

Optimized by: Vectorized computation leveraging SIMD or GPU acceleration, and batching index traversal.

Query Execution Strategies

Vector databases translate high-level queries into efficient execution plans tailored for high-dimensional search:

Candidate Selection via ANN Index

Distance Computation

Parallel and GPU Execution

Hybrid Filtering

Dynamic Updates

Example Query Patterns

Key Takeaways

Vector database queries differ from traditional relational lookups:

By aligning execution strategies with the structure of embedding spaces and leveraging specialized indexes, vector databases achieve sub-linear search times and millisecond-scale response, even for billions of vectors.

Several purpose-built vector databases have emerged to handle the challenges of high-dimensional similarity search, each optimized for scale, query latency, and integration with other data systems. Here, we highlight a few widely adopted engines:

Milvus

Overview:

Milvus is an open-source vector database designed for large-scale similarity search. It supports multiple ANN index types, high-concurrency queries, and integration with both CPU and GPU acceleration.

Architecture Highlights:

Trade-offs:

Use Cases:

Recommendation engines, multimedia search (images, videos), NLP semantic search.

Weaviate

Overview:

Weaviate is an open-source vector search engine with strong integration for structured data and machine learning pipelines. It provides a GraphQL interface and supports semantic search with AI models.

Architecture Highlights:

Trade-offs:

Use Cases:

Semantic search in knowledge bases, enterprise search, AI-powered chatbots.

Pinecone

Overview:

Pinecone is a managed vector database service with a focus on operational simplicity, low-latency search, and scalability for production workloads.

Architecture Highlights:

Trade-offs:

Use Cases:

Real-time recommendations, personalization engines, semantic search for enterprise applications.

FAISS

Overview:

FAISS is a library for efficient similarity search over dense vectors. Unlike full database engines, it provides the building blocks to integrate ANN search into custom systems.

Architecture Highlights:

Trade-offs:

Use Cases:

Large-scale research experiments, AI model embeddings search, custom recommendation systems.

Other Notable Engines

Key Takeaways

While each vector database has its strengths and trade-offs, they share common characteristics:

Selecting the right vector database depends on use case requirements: whether you need full operational simplicity, extreme scalability, hybrid queries, or tight ML integration. Understanding these distinctions allows engineers to choose the best engine for their high-dimensional search workloads, rather than relying on general-purpose databases or custom implementations.

Trade-offs and Considerations

Vector databases excel at workloads involving high-dimensional similarity search, but their optimizations come with compromises. Understanding these trade-offs is essential when selecting or designing a vector database for your application.

Accuracy vs. Latency
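The accuracy side of this trade-off is commonly measured as recall@k: the fraction of the true k nearest neighbours that the approximate search actually returned. A minimal sketch, with made-up ID lists standing in for the exact and approximate results:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

exact = [12, 7, 33, 5, 91]     # ground truth from a brute-force scan
approx = [12, 33, 7, 40, 91]   # what a hypothetical ANN index returned
print(recall_at_k(approx, exact, k=5))  # 0.8
```

Tuning an ANN index typically means sweeping its search parameters and plotting recall@k against query latency, then picking the cheapest setting that meets the application's recall target.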

Storage Efficiency vs. Query Speed

Hybrid Search Trade-offs

Scalability Considerations

Operational Complexity

Embedding Lifecycle and Updates

Cost vs. Performance

Key Takeaways

Use Cases and Real-World Examples

Vector databases are not just theoretical tools; they solve practical, high-dimensional search problems across industries. Below are concrete scenarios illustrating why purpose-built vector search engines are indispensable:

Semantic Search and Document Retrieval

Scenario: A company wants to allow users to search large text corpora or knowledge bases by meaning rather than exact keywords.

Challenges:

Vector Database Benefits:

Example: A customer support platform uses Milvus to index millions of support tickets and FAQs. Users can ask questions in natural language, and the system retrieves semantically relevant answers in milliseconds.

Recommendation Systems

Scenario: An e-commerce platform wants to suggest products based on user behavior, item embeddings, or content features.

Challenges:

Vector Database Benefits:

Example: A streaming service leverages FAISS to provide real-time content recommendations, using vector embeddings for movies, shows, and user preferences to improve engagement.

Scenario: A media platform wants users to search for images or video clips using example content instead of keywords.

Challenges:

Vector Database Benefits:

Example: An online fashion retailer uses Pinecone to allow users to upload photos of clothing items and find visually similar products instantly.

Fraud Detection and Anomaly Detection

Scenario: Financial institutions need to detect suspicious transactions or patterns in real-time.

Challenges:

Vector Database Benefits:

Example: A bank uses Milvus to monitor transaction embeddings, flagging unusual patterns that deviate from typical user behavior, enabling early fraud detection.

Conversational AI and Chatbots

Scenario: A company wants to enhance a chatbot with contextual understanding and retrieval-augmented generation.

Challenges:

Vector Database Benefits:

Example: A SaaS company integrates Pinecone with a large language model to provide contextual, accurate, and fast answers to user queries, improving support efficiency and satisfaction.

Example Workflow: Building a Semantic Search Engine with Milvus

This section provides a concrete end-to-end example of a vector search workflow, using Milvus to illustrate how data moves from embedding generation to similarity search, highlighting architecture and optimizations discussed earlier.

Scenario

We want to build a semantic search engine for a knowledge base containing 1 million documents. Users will enter natural language queries, and the system will return the most semantically relevant documents.

The workflow covers:

  1. Embedding generation
  2. Vector storage and indexing
  3. Query execution
  4. Hybrid filtering
  5. Retrieval and presentation

Following this workflow demonstrates how a vector database enables fast, accurate similarity search at scale.

Step 1: Embedding Generation

Each document is transformed into a high-dimensional vector using a transformer model (e.g., Sentence-BERT):

from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 maps each text to a 384-dimensional vector.
model = SentenceTransformer('all-MiniLM-L6-v2')
document_embedding = model.encode("The quick brown fox jumps over the lazy dog")

Key Concepts Illustrated:

Step 2: Vector Storage and Indexing

Vectors are stored in Milvus with an ANN index (HNSW):

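from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

# Connect to a local Milvus instance.
connections.connect("default", host="localhost", port="19530")

# dim=384 matches the output size of all-MiniLM-L6-v2.
fields = [
    FieldSchema(name="doc_id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384)
]

schema = CollectionSchema(fields, description="Knowledge Base Vectors")
collection = Collection("kb_vectors", schema)

# `embeddings` is assumed to be a list of 1,000,000 vectors (384 floats each) from Step 1.
# Insert in batches: a single 1M-row request (~1.5 GB) is likely to exceed message size limits.
batch_size = 10_000
for start in range(0, len(embeddings), batch_size):
    end = min(start + batch_size, len(embeddings))
    collection.insert([list(range(start, end)), embeddings[start:end]])
collection.flush()

# HNSW build parameters: M (links per node) and efConstruction (build-time search width).
index_params = {"index_type": "HNSW", "metric_type": "COSINE",
                "params": {"M": 16, "efConstruction": 200}}
collection.create_index("embedding", index_params)
collection.load()  # the collection must be loaded into memory before it can be searched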

Storage Highlights:

Step 3: Query Execution

A user submits a query:

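query_embedding = model.encode("How do I reset my password?")
# "ef" is the HNSW search-time beam width (illustrative value); it should be at least `limit`.
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
results = collection.search([query_embedding], "embedding", param=search_params, limit=5)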

Execution Steps:

  1. Transform query into embedding space.
  2. ANN search retrieves nearest neighbors efficiently using HNSW.
  3. Results ranked by similarity score.
  4. Only top-k results returned for low-latency response.
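To give a feel for step 2, here is a heavily simplified sketch of graph-based search. It is not Milvus's implementation and not full HNSW: it uses a single layer, builds the graph by brute force, and all names (`build_knn_graph`, `graph_search`, `m`, `ef`) are illustrative. What it does show is the best-first traversal that lets a graph index answer a query while visiting only part of the dataset.

```python
import heapq
import math
import random

def build_knn_graph(vectors, m):
    """Link every vector to its m nearest neighbours (brute force, build time only)."""
    graph = []
    for i, v in enumerate(vectors):
        order = sorted(range(len(vectors)), key=lambda j: math.dist(v, vectors[j]))
        graph.append([j for j in order if j != i][:m])
    return graph

def graph_search(query, vectors, graph, entry, k, ef):
    """Best-first traversal that keeps at most ef results, as in a single HNSW layer."""
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break                # closest candidate is already worse than the worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(query, vectors[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the current worst result
    nearest = sorted((-neg_d, i) for neg_d, i in results)
    return nearest[:k], len(visited)

rng = random.Random(3)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
graph = build_knn_graph(vectors, m=8)
query = [rng.gauss(0, 1) for _ in range(8)]

hits, visited = graph_search(query, vectors, graph, entry=0, k=5, ef=32)
print(hits, f"visited {visited} of {len(vectors)} vectors")
```

A larger `ef` explores more of the graph and tends to improve recall at the cost of more distance computations. Real HNSW adds a hierarchy of sparser layers on top so that the search starts close to the query instead of at an arbitrary entry point.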

Step 4: Hybrid Filtering

Optionally, filter results by metadata, e.g., document category or publication date:

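# `category` and `publish_date` must exist as scalar fields in the collection
# (the Step 2 schema would need to declare them and store them at insert time).
results = collection.search(
    [query_embedding],
    "embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    expr="category == 'FAQ' && publish_date > '2025-01-01'",
)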

Highlights:

Step 5: Retrieval and Presentation

The system returns document IDs and similarity scores, which are then mapped back to full documents:

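# results[0] holds the hits for the first (and only) query vector.
# With the COSINE metric, `distance` is the similarity score: larger means more similar.
for res in results[0]:
    print(f"Doc ID: {res.id}, Score: {res.distance}")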

Output:

Key Concepts Illustrated

By following this workflow, engineers can build production-grade semantic search engines, recommendation systems, or retrieval-augmented applications using vector databases like Milvus, Pinecone, or FAISS.

Conclusion

Vector databases are purpose-built engines designed for high-dimensional search, enabling fast and accurate similarity queries over massive datasets. By combining efficient storage, indexing structures like HNSW or IVF, and optimized query execution, they handle workloads that general-purpose databases struggle with.

Understanding the core principles (embedding generation, vector indexing, and approximate nearest neighbor search) helps engineers choose the right vector database and design effective semantic search or recommendation systems.