We’re excited to announce that CocoIndex now supports native integration with ColPali — enabling multi-vector, patch-level image indexing using cutting-edge multimodal models.

With just a few lines of code, you can now embed and index images with ColPali’s late-interaction architecture, fully integrated into CocoIndex’s composable flow system.

Why ColPali for Indexing?

ColPali (ColBERT-style contextualized late interaction over the PaliGemma vision-language model) is a powerful model for multimodal retrieval.

It fundamentally rethinks how documents—especially visually complex or image-rich ones—are represented and searched. Instead of reducing each image or page to a single dense vector (as in traditional bi-encoders), ColPali breaks an image into many smaller patches, preserving local spatial and semantic structure. Each patch receives its own embedding, which together form a multi-vector representation of the complete document.
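As a shape-level sketch of the difference between the two representations (toy values, no model is run here; the 32×32 patch grid and 128-dim embeddings are assumptions roughly matching colpali-v1.2):

```python
# Toy illustration of single-vector vs. multi-vector representations.
# The numbers (32x32 patch grid, 128 dims) are assumptions, roughly
# matching what colpali-v1.2 produces; no model is actually run here.
n_patches, dim = 32 * 32, 128

# Traditional bi-encoder: the whole image collapses into ONE dense vector.
single_vector = [0.0] * dim

# ColPali-style: one embedding PER PATCH, preserving local structure.
multi_vector = [[0.0] * dim for _ in range(n_patches)]

print(len(single_vector))    # 128
print(len(multi_vector))     # 1024 patch vectors
print(len(multi_vector[0]))  # 128 dims each
```

At query time, these patch vectors are compared individually against the query's token vectors instead of being pooled into one global vector.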


Declare an Image Indexing Flow with CocoIndex and Qdrant

This flow illustrates how we’ll process and index images using ColPali:

  1. Ingest image files from the local filesystem
  2. Use ColPali to embed each image into patch-level multi-vectors
  3. Optionally extract image captions using an LLM
  4. Export the embeddings (and optional captions) to a Qdrant collection

1. Ingest the Images

We start by defining a flow to read .jpg, .jpeg, and .png files from a local directory using LocalFile.


import datetime

import cocoindex

@cocoindex.flow_def(name="ImageObjectEmbeddingColpali")
def image_object_embedding_flow(flow_builder, data_scope):
    data_scope["images"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(
            path="img",
            included_patterns=["*.jpg", "*.jpeg", "*.png"],
            binary=True
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )

The add_source function sets up a table with fields like filename and content. Images are automatically re-scanned every minute.


2. Process Each Image and Collect the Embedding

2.1 Embed the Image with ColPali

We use CocoIndex's built-in ColPaliEmbedImage function, which returns a multi-vector representation for each image. Each patch receives its own vector, preserving spatial and semantic information.


colpali_embed = cocoindex.functions.ColPaliEmbedImage(model="vidore/colpali-v1.2")

Inside the flow:


    img_embeddings = data_scope.add_collector()
    with data_scope["images"].row() as img:
        img["embedding"] = img["content"].transform(colpali_embed)

This transformation turns the raw image bytes into a list of vectors — one per patch — that can later be used for late interaction search.


3. Collect and Export the Embeddings

Once we’ve processed each image, we collect its metadata and embedding (plus the optional caption, when an Ollama model is configured) and send them to Qdrant:


        collect_fields = {
            "id": cocoindex.GeneratedField.UUID,
            "filename": img["filename"],
            "embedding": img["embedding"],
        }

        if ollama_model_name is not None:
            collect_fields["caption"] = img["caption"]

        img_embeddings.collect(**collect_fields)

Then we export to Qdrant using the Qdrant target:


    img_embeddings.export(
        "img_embeddings",
        cocoindex.targets.Qdrant(collection_name="ImageSearchColpali"),
        primary_key_fields=["id"],
    )

This creates a vector collection in Qdrant that supports multi-vector fields — required for ColPali-style late interaction search.
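Concretely, "supports multi-vector fields" means the collection's vector field is configured with a multivector comparator. Here is a sketch of the corresponding Qdrant collection schema that CocoIndex sets up for you (field names follow Qdrant's collection config; the size of 128 is an assumption matching colpali-v1.2's per-patch dimension):

```python
# Sketch of a Qdrant collection schema for multi-vector storage.
# Keys mirror Qdrant's collection config; the size (128) is an assumption
# matching colpali-v1.2's per-patch embedding dimension.
collection_config = {
    "vectors": {
        "embedding": {
            "size": 128,
            "distance": "Cosine",
            # "max_sim" tells Qdrant to score multi-vectors with MaxSim,
            # i.e. late interaction, at query time.
            "multivector_config": {"comparator": "max_sim"},
        }
    }
}

print(collection_config["vectors"]["embedding"]["multivector_config"])
```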


4. Enable Real-Time Indexing

To keep the image index up to date automatically, we wrap the flow in a FlowLiveUpdater:


from contextlib import asynccontextmanager

from dotenv import load_dotenv
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    load_dotenv()
    cocoindex.init()
    image_object_embedding_flow.setup(report_to_stdout=True)
    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
    app.state.live_updater.start()
    yield

This keeps your vector index fresh as new images arrive.


🧬 What’s Actually Stored?

Unlike typical image search pipelines that store one global vector per image, ColPali stores:


Vector[Vector[Float32, N]]

Where:

  - the outer vector holds one entry per image patch
  - each inner vector is that patch's embedding: a fixed-length array of N Float32 values

This makes the index multi-vector ready, and compatible with late-interaction query strategies — like MaxSim or learned fusion.
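As a minimal sketch of what MaxSim computes (plain Python, assuming L2-normalized vectors so dot products act as cosine similarities):

```python
def maxsim_score(query_vecs, patch_vecs):
    """Late-interaction MaxSim: for each query-token vector, take its best
    similarity over all patch vectors, then sum across query tokens.
    Assumes vectors are L2-normalized so dot product == cosine similarity."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, p) for p in patch_vecs) for q in query_vecs)

# Tiny worked example: 2 query-token vectors vs. 3 patch vectors in 2-D.
query = [[1.0, 0.0], [0.0, 1.0]]
patches = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(maxsim_score(query, patches))  # 1.0 + 1.0 = 2.0
```

Because each query token picks its own best-matching patch, a query can match different regions of the image at once, which is what makes this scoring strategy effective for visually dense documents.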


🌳 Retrieval and Application

Refer to this example for querying and application building: https://cocoindex.io/blogs/live-image-search#3-query-the-index

Make sure the query is embedded with ColPali as well. The snippet below assumes a text_to_colpali_embedding transform flow and a qdrant_client instance as set up in the linked example; with a MaxSim-configured collection, Qdrant applies late-interaction scoring server-side:

@app.get("/search")
def search(
    q: str = Query(..., description="Search query"),
    limit: int = Query(5, description="Number of results"),
) -> Any:
    # Get the multi-vector embedding for the query
    query_embedding = text_to_colpali_embedding.eval(q)
    results = qdrant_client.query_points(
        collection_name="ImageSearchColpali",
        query=query_embedding,
        using="embedding",
        limit=limit,
    )
    return {"results": results.points}

Built with Flexibility in Mind

Whether you’re building visual document search, image-heavy RAG, or any other multimodal retrieval application, CocoIndex + ColPali gives you a modular, modern foundation to build from.

We’re constantly adding more examples and improving our runtime. If you found this helpful, please ⭐ star CocoIndex on GitHub and share it with others.

Suggestions for more native ‘LEGO’ pieces? Just let us know! We are moving full speed ahead to support you!