In the current AI world, search is not just a feature; it is the core of how we interact with information. But have you ever searched for a concept, only to end up frustrated when the results match your exact keywords but miss their actual meaning? For example, a search for "tips for new dog owners" might miss a great article titled "A Guide to Your First Canine Companion." This is the classic limitation of traditional keyword search.

The solution isn't to abandon keywords but to enhance them. Enter Hybrid Search, a modern technique that delivers the best of both worlds: the precision of keyword matching and the contextual understanding of modern AI.

This article will walk you through not just the what and why, but the how, with a complete, hands-on implementation using the open-source vector database Milvus.

The Two Worlds of Search: Lexical vs. Semantic

Imagine you are searching for “fast running shoes” on an e-commerce site. A traditional search will instantly list products whose names match “fast”, “running”, and “shoes”, but it will miss products labeled “sneakers” or described as “swift”, “quick”, or “athletic footwear”.
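To make that limitation concrete, here is a deliberately naive sketch of pure keyword overlap (an illustration only, not how production search engines score):

def lexical_overlap(query: str, doc: str) -> int:
    # Count words shared between query and document (naive keyword matching)
    return len(set(query.lower().split()) & set(doc.lower().split()))

print(lexical_overlap("fast running shoes", "Acme fast running shoes"))  # 3 -> strong match
print(lexical_overlap("fast running shoes", "swift athletic sneakers"))  # 0 -> missed entirely

A semantic model would place both product descriptions close together in vector space; pure keyword overlap gives the second one a score of zero.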

Hybrid search doesn't force you to choose between lexical and semantic approaches. It brings them together, creating a search experience that is both precise and context-aware, and delivering far more relevant results.

Before we start building, let us gather our tools:

- A running Milvus instance (the examples assume localhost:19530)
- The pymilvus Python SDK (pip install pymilvus)
- Embedding models that produce dense (semantic) and sparse (lexical) vectors; the walkthrough below mocks these, and Step 3 notes where real models plug in

Step-by-Step Implementation Guide

Step 1: Define a Multi-Vector Schema

Every database needs a blueprint for the data it stores. In Milvus, this is called a schema. For hybrid search, our blueprint needs to specify fields for our text, its dense (semantic) vector, and its sparse (lexical) vector.

from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections

# Connect to Milvus instance (set host as needed)
connections.connect("default", host='localhost', port='19530')

# 1. Define Fields
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
text_field = FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2048)

# Dense vector field (e.g., 768 dimensions for BGE models)
dense_vector_field = FieldSchema(name="dense_vector", dtype=DataType.FLOAT_VECTOR, dim=768)

# Sparse vector field (for Splade/BM25-style sparse representations)
sparse_vector_field = FieldSchema(name="sparse_vector", dtype=DataType.SPARSE_FLOAT_VECTOR)

# 2. Define the Schema
schema = CollectionSchema(
    fields=[id_field, text_field, dense_vector_field, sparse_vector_field],
    description="Collection for hybrid search implementation"
)

# 3. Create the Collection
collection_name = "hybrid_search_articles"
collection = Collection(name=collection_name, schema=schema)

print(f"Collection '{collection_name}' created successfully.")


Step 2: Create Specialized Indexes

If a schema is a blueprint, an index is the super-fast table of contents. To get optimal performance, we need to tell Milvus how to organize our different vector types.

# Create index for the dense vector field
dense_index_params = {
    "index_type": "AUTOINDEX",
    "metric_type": "COSINE", # Common metric for semantic search
    "params": {}
}
collection.create_index("dense_vector", dense_index_params)

# Create index for the sparse vector field
sparse_index_params = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP", # Inner Product is standard for sparse vectors
    "params": {}
}
collection.create_index("sparse_vector", sparse_index_params)

print("Indexes created for dense and sparse fields.")

Step 3: Insert Data (with AI-Generated Embeddings)

Now we can populate our collection with data. We will take our text documents, use our embedding models to generate both dense and sparse vectors for each, and insert them into Milvus.

The following code uses a mock function to generate vectors. In a real-world application, you would replace this with calls to your actual AI model.

# Demo only: in a real app, generate these vectors with your actual AI models
import random
import numpy as np

def generate_mock_embeddings(texts):
    # Dense vectors: random 768-dim floats (stand-in for a semantic embedding model)
    dense = [np.random.rand(768).tolist() for _ in texts]
    # Sparse vectors: {index: weight} dicts (stand-in for SPLADE/BM25-style output)
    sparse = [{random.randint(0, 5000): random.random() for _ in range(10)} for _ in texts]
    return dense, sparse

texts = ["Milvus is a vector database.", "Hybrid search is powerful.", "Semantic search uses AI.", "Keyword search is traditional."]
dense_vecs, sparse_vecs = generate_mock_embeddings(texts)

data_to_insert = [
    {"text": t, "dense_vector": d, "sparse_vector": s}
    for t, d, s in zip(texts, dense_vecs, sparse_vecs)
]

collection.insert(data_to_insert)
collection.load() # Load collection into memory for searching
print(f"Inserted {len(data_to_insert)} records and loaded collection.")


Step 4: Perform the Hybrid Search

This is where the magic happens. We will take a user query, generate both dense and sparse vectors for it (the same inference step we used for our documents), and then ask Milvus to perform two searches in parallel. Milvus then uses a reranker to intelligently fuse the two sets of results into a single, highly relevant list.

The most common reranker is Reciprocal Rank Fusion (RRF), which smartly combines the rankings from both searches without needing complex manual tuning.
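To see what RRF is doing: each document's fused score is the sum of 1/(k + rank) across the result lists it appears in, where k is a smoothing constant (60 is the commonly used default). Here is a toy sketch of the fusion step:

def rrf_fuse(rankings, k=60):
    # rankings: one ranked list of document IDs per search (e.g., dense, sparse)
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_ids = ["doc_a", "doc_b", "doc_c"]
sparse_ids = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([dense_ids, sparse_ids]))  # doc_b wins: solid in both lists

In practice you don't implement this yourself; Milvus's RRFRanker does the fusion server-side, as shown below.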

from pymilvus import AnnSearchRequest, RRFRanker, WeightedRanker

# Assume we generate query vectors the same way we generated data vectors
query_text = "What is a vector database?"
# Use your models to get these vectors:
query_dense_vector, query_sparse_vector = generate_mock_embeddings([query_text])

# 1. Define the Dense Search Request
req_dense = AnnSearchRequest(
    data=query_dense_vector, # Your query vector(s)
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10 # Get top 10 from dense search
)

# 2. Define the Sparse Search Request
req_sparse = AnnSearchRequest(
    data=query_sparse_vector, # Your query sparse vector(s)
    anns_field="sparse_vector",
    param={"metric_type": "IP", "params": {}},
    limit=10 # Get top 10 from sparse search
)

# 3. Define the Reranker
# We use RRF which dynamically fuses rankings
rerank = RRFRanker()

# Optional: Use WeightedRanker if you want to explicitly bias towards semantic (0.7, 0.3)
# rerank = WeightedRanker(0.7, 0.3)

# 4. Execute the Hybrid Search
results = collection.hybrid_search(
    reqs=[req_dense, req_sparse],
    rerank=rerank,
    limit=5, # Final limit of results to return
    output_fields=["text"]
)

# 5. Process and display results
print("\nHydrid Search Results:")
for hit in results[0]: # results[0] because we provided one query vector
    print(f"ID: {hit.id} | Score (RRF): {hit.distance:.4f} | Text: {hit.entity.get('text')}")

Implementing the code is just the beginning. To build a truly exceptional search experience, follow these best practices.

Data and Vector Generation

Use the same embedding models for documents and queries, and make sure your dense model's output dimension matches the dim declared in the schema (768 in our example). Clean, well-chunked text going in means better vectors coming out.

Indexing and Infrastructure

AUTOINDEX is a sensible default for dense vectors and SPARSE_INVERTED_INDEX for sparse ones. Remember that a collection must be loaded into memory (collection.load()) before it can serve searches.

Reranking Strategy

Start with RRFRanker, which fuses rankings without manual tuning. Switch to WeightedRanker only when you have evaluation data showing how strongly to bias the semantic versus the lexical side.

Summary

By combining the strengths of lexical and semantic search, you can build an intelligent, intuitive, and highly effective search solution that understands user intent, not just keywords. You now have the blueprint and the code to implement it yourself. Happy building!
