Embedders

Embedding providers (OpenAI, Gemini, Ollama, Hash)

CortexaDB supports pluggable embedding providers. When an embedder is configured, text is automatically converted to vectors for storage and search.

Overview

Without an embedder, you must provide raw embedding vectors manually. With an embedder, CortexaDB handles embedding automatically:

from cortexadb import CortexaDB
from cortexadb.providers.openai import OpenAIEmbedder

# Without an embedder - you supply vectors manually
db = CortexaDB.open("db.mem", dimension=128)
db.remember("text", embedding=[0.1, 0.2, ...])  # must provide embedding

# With an embedder - embedding happens automatically
db = CortexaDB.open("db.mem", embedder=OpenAIEmbedder())
db.remember("text")  # auto-embedded
db.ask("query")      # auto-embedded

Built-in Providers

OpenAI

Uses the OpenAI Embeddings API.

from cortexadb import CortexaDB
from cortexadb.providers.openai import OpenAIEmbedder

embedder = OpenAIEmbedder(
    api_key="sk-...",                      # or set the OPENAI_API_KEY env var
    model="text-embedding-3-small"         # default model
)

db = CortexaDB.open("db.mem", embedder=embedder)

Available Models:

Model                   Dimension  Notes
text-embedding-3-small  1536       Default; good balance of cost and quality
text-embedding-3-large  3072       Higher quality, more expensive

Supports batch embedding via embed_batch().
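
For bulk workloads, batching avoids one API round trip per text. A minimal usage sketch, reusing the embedder from the example above and the embed_batch() signature documented in the interface table below:

texts = ["first document", "second document", "third document"]
vectors = embedder.embed_batch(texts)   # one call instead of three
assert len(vectors) == len(texts)       # one list[float] per input text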

Gemini

Uses Google's embedding API.

from cortexadb import CortexaDB
from cortexadb.providers.gemini import GeminiEmbedder

embedder = GeminiEmbedder(api_key="...")
db = CortexaDB.open("db.mem", embedder=embedder)

Ollama

Uses a local Ollama instance for self-hosted embeddings.

from cortexadb import CortexaDB
from cortexadb.providers.ollama import OllamaEmbedder

embedder = OllamaEmbedder(model="nomic-embed-text")
db = CortexaDB.open("db.mem", embedder=embedder)

Requires: Ollama running locally (ollama serve) with the embedding model already pulled (ollama pull nomic-embed-text).

HashEmbedder (Testing)

A deterministic hash-based embedder for testing. Produces consistent embeddings from text using SHA-256 — not suitable for semantic search.

from cortexadb import CortexaDB, HashEmbedder

embedder = HashEmbedder(dimension=64)
db = CortexaDB.open("db.mem", embedder=embedder)
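
Because the vectors are derived from a SHA-256 hash of the text, the same input always produces the same vector, which keeps test assertions simple. An illustrative usage sketch:

v1 = embedder.embed("hello")
v2 = embedder.embed("hello")
assert v1 == v2       # deterministic: same text, same vector
assert len(v1) == 64  # matches the configured dimension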

Custom Embedders

You can create your own embedder by subclassing the Embedder base class:

from cortexadb.embedder import Embedder

class MyEmbedder(Embedder):
    @property
    def dimension(self) -> int:
        return 768

    def embed(self, text: str) -> list[float]:
        # Your embedding logic here; my_model is a placeholder for
        # whatever model you wrap (e.g. a sentence-transformers model)
        return my_model.encode(text).tolist()

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Optional: override for efficient batch processing
        return [self.embed(t) for t in texts]
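
Once defined, a custom embedder plugs in the same way as the built-in providers (a sketch; MyEmbedder is the class from above):

db = CortexaDB.open("db.mem", embedder=MyEmbedder())
db.remember("some text")  # embedded via MyEmbedder.embed()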

Embedder Interface

Method/Property     Required  Description
dimension           Yes       Returns the embedding vector dimension
embed(text)         Yes       Embeds a single text string, returns list[float]
embed_batch(texts)  No        Embeds multiple texts; defaults to calling embed() in a loop

How Auto-Embedding Works

When an embedder is configured:

  1. remember(text) - Text is embedded via embedder.embed(text), then stored with the embedding
  2. ask(query) - Query is embedded via embedder.embed(query), then used for vector search
  3. ingest(text) - Each chunk is embedded individually after chunking
  4. load(file) - File is read, chunked, and each chunk is embedded
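
For example, with an embedder configured, remember(text) behaves like embedding the text yourself and passing the vector in explicitly (a conceptual sketch, not the actual internals):

vector = embedder.embed("some text")        # what CortexaDB does for you
db.remember("some text", embedding=vector)  # equivalent manual call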

You can always override auto-embedding by providing an explicit embedding parameter:

db.remember("text", embedding=[0.1, 0.2, ...])  # uses provided embedding

Choosing an Embedder

Provider  Pros                            Cons
OpenAI    High quality, easy setup        Requires API key, costs money
Gemini    Good quality, Google ecosystem  Requires API key
Ollama    Free, private, no API key       Requires local setup, slower
Hash      Zero setup, deterministic       No semantic meaning, testing only

For production use, OpenAI or Ollama is recommended, depending on whether you prefer a cloud or a self-hosted setup.

