In the current AI world, search is not just a feature; it is the core of how we interact with information. But have you ever searched for a concept, only to end up frustrated when the results match your exact keywords but miss their actual meaning? For example, a search for "tips for new dog owners" might miss a great article titled "A Guide to Your First Canine Companion." This is the classic limitation of traditional keyword search.

The solution isn't to abandon keywords but to enhance them. Enter Hybrid Search, a modern technique that delivers the best of both worlds: the precision of keyword matching and the contextual understanding of modern AI.

This article will walk you through not just the what and why, but the how, with a complete, hands-on implementation using the open-source vector database Milvus.

The Two Worlds of Search: Lexical vs. Semantic

Imagine you are searching for “fast running shoes” on an e-commerce site. A traditional search will instantly list products whose names match “fast”, “running”, and “shoes”, but it will miss products labeled “sneakers” or described as “swift”, “quick”, or “athletic footwear”.
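To make that limitation concrete, here is a deliberately naive sketch of pure keyword overlap (an illustration only, not how production search engines score):

def lexical_overlap(query: str, doc: str) -> int:
    # Count words shared between query and document (naive keyword matching)
    return len(set(query.lower().split()) & set(doc.lower().split()))

print(lexical_overlap("fast running shoes", "Acme fast running shoes"))  # 3 -> strong match
print(lexical_overlap("fast running shoes", "swift athletic sneakers"))  # 0 -> missed entirely

A semantic model would place both product descriptions close together in vector space; pure keyword overlap gives the second one a score of zero.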

Hybrid search doesn't force you to choose between lexical and semantic approaches. It brings them together, creating a search experience that is both precise and context-aware, and delivering far more relevant results.

Before we start building, let us gather our tools:

- A running Milvus instance (the examples assume localhost:19530)
- The pymilvus Python SDK (pip install pymilvus)
- Embedding models that produce dense (semantic) and sparse (lexical) vectors; the walkthrough below mocks these, and Step 3 notes where real models plug in

Step-by-Step Implementation Guide

Step 1: Define a Multi-Vector Schema

Every database needs a blueprint for the data it stores. In Milvus, this is called a schema. For hybrid search, our blueprint needs to specify fields for our text, its dense (semantic) vector, and its sparse (lexical) vector.

from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections

# Connect to Milvus instance (set host as needed)
connections.connect("default", host='localhost', port='19530')

# 1. Define Fields
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
text_field = FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2048)

# Dense vector field (e.g., 768 dimensions for BGE models)
dense_vector_field = FieldSchema(name="dense_vector", dtype=DataType.FLOAT_VECTOR, dim=768)

# Sparse vector field (for Splade/BM25-style sparse representations)
sparse_vector_field = FieldSchema(name="sparse_vector", dtype=DataType.SPARSE_FLOAT_VECTOR)

# 2. Define the Schema
schema = CollectionSchema(
    fields=[id_field, text_field, dense_vector_field, sparse_vector_field],
    description="Collection for hybrid search implementation"
)

# 3. Create the Collection
collection_name = "hybrid_search_articles"
collection = Collection(name=collection_name, schema=schema)

print(f"Collection '{collection_name}' created successfully.")


Step 2: Create Specialized Indexes

If a schema is a blueprint, an index is the super-fast table of contents. To get optimal performance, we need to tell Milvus how to organize our different vector types.

# Create index for the dense vector field
dense_index_params = {
    "index_type": "AUTOINDEX",
    "metric_type": "COSINE", # Common metric for semantic search
    "params": {}
}
collection.create_index("dense_vector", dense_index_params)

# Create index for the sparse vector field
sparse_index_params = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP", # Inner Product is standard for sparse vectors
    "params": {}
}
collection.create_index("sparse_vector", sparse_index_params)

print("Indexes created for dense and sparse fields.")

Step 3: Insert Data (with AI-Generated Embeddings)

Now we can populate our collection with data. We will take our text documents, use our embedding models to generate both dense and sparse vectors for each, and insert them into Milvus.

The following code uses a mock function to generate vectors. In a real-world application, you would replace this with calls to your actual AI model.

# Demo only: in a real app, generate these vectors with your actual AI models
import random
import numpy as np

def generate_mock_embeddings(texts):
    # Dense vectors: random 768-dim floats (stand-in for a semantic embedding model)
    dense = [np.random.rand(768).tolist() for _ in texts]
    # Sparse vectors: {index: weight} dicts (stand-in for SPLADE/BM25-style output)
    sparse = [{random.randint(0, 5000): random.random() for _ in range(10)} for _ in texts]
    return dense, sparse

texts = ["Milvus is a vector database.", "Hybrid search is powerful.", "Semantic search uses AI.", "Keyword search is traditional."]
dense_vecs, sparse_vecs = generate_mock_embeddings(texts)

data_to_insert = [
    {"text": t, "dense_vector": d, "sparse_vector": s}
    for t, d, s in zip(texts, dense_vecs, sparse_vecs)
]

collection.insert(data_to_insert)
collection.load() # Load collection into memory for searching
print(f"Inserted {len(data_to_insert)} records and loaded collection.")


Step 4: Perform the Hybrid Search

This is where the magic happens. We will take a user query, generate both dense and sparse vectors for it (the same inference step we used for our documents), and then ask Milvus to perform two searches in parallel. Milvus then uses a reranker to intelligently fuse the two sets of results into a single, highly relevant list.

The most common reranker is Reciprocal Rank Fusion (RRF), which smartly combines the rankings from both searches without needing complex manual tuning.
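To see what RRF is doing: each document's fused score is the sum of 1/(k + rank) across the result lists it appears in, where k is a smoothing constant (60 is the commonly used default). Here is a toy sketch of the fusion step:

def rrf_fuse(rankings, k=60):
    # rankings: one ranked list of document IDs per search (e.g., dense, sparse)
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense_ids = ["doc_a", "doc_b", "doc_c"]
sparse_ids = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([dense_ids, sparse_ids]))  # doc_b wins: solid in both lists

In practice you don't implement this yourself; Milvus's RRFRanker does the fusion server-side, as shown below.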

from pymilvus import AnnSearchRequest, RRFRanker, WeightedRanker

# Assume we generate query vectors the same way we generated data vectors
query_text = "What is a vector database?"
# Use your models to get these vectors:
query_dense_vector, query_sparse_vector = generate_mock_embeddings([query_text])

# 1. Define the Dense Search Request
req_dense = AnnSearchRequest(
    data=query_dense_vector, # Your query vector(s)
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10 # Get top 10 from dense search
)

# 2. Define the Sparse Search Request
req_sparse = AnnSearchRequest(
    data=query_sparse_vector, # Your query sparse vector(s)
    anns_field="sparse_vector",
    param={"metric_type": "IP", "params": {}},
    limit=10 # Get top 10 from sparse search
)

# 3. Define the Reranker
# We use RRF which dynamically fuses rankings
rerank = RRFRanker()

# Optional: Use WeightedRanker if you want to explicitly bias towards semantic (0.7, 0.3)
# rerank = WeightedRanker(0.7, 0.3)

# 4. Execute the Hybrid Search
results = collection.hybrid_search(
    reqs=[req_dense, req_sparse],
    rerank=rerank,
    limit=5, # Final limit of results to return
    output_fields=["text"]
)

# 5. Process and display results
print("\nHydrid Search Results:")
for hit in results[0]: # results[0] because we provided one query vector
    print(f"ID: {hit.id} | Score (RRF): {hit.distance:.4f} | Text: {hit.entity.get('text')}")

Implementing the code is just the beginning. To build a truly exceptional search experience, follow these best practices.

Data and Vector Generation

Use the same embedding models for documents and queries, and make sure your dense model's output dimension matches the dim declared in the schema (768 in our example). Clean, well-chunked text going in means better vectors coming out.

Indexing and Infrastructure

AUTOINDEX is a sensible default for dense vectors and SPARSE_INVERTED_INDEX for sparse ones. Remember that a collection must be loaded into memory (collection.load()) before it can serve searches.

Reranking Strategy

Start with RRFRanker, which fuses rankings without manual tuning. Switch to WeightedRanker only when you have evaluation data showing how strongly to bias the semantic versus the lexical side.

Summary

By combining the strengths of lexical and semantic search, you can build an intelligent, intuitive, and highly effective search solution that understands user intent, not just keywords. You now have the blueprint and the code to implement it yourself. Happy building!
