
Indexing

Exact search vs HNSW approximate nearest neighbor

CortexaDB supports two vector indexing modes: Exact (brute-force) and HNSW (approximate nearest neighbor). Choose based on your dataset size and latency requirements.

Comparison

Mode    Use case                        Recall                        Query time    Memory
exact   Small datasets (<10K entries)   100%                          O(n)          Low
hnsw    Large datasets (10K+ entries)   ~95% (depends on parameters)  ~O(log n)     Higher

Exact Mode (Default)

Brute-force cosine similarity scan over all stored embeddings.

db = CortexaDB.open("db.mem", dimension=128)
# or explicitly:
db = CortexaDB.open("db.mem", dimension=128, index_mode="exact")
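To make "brute-force scan" concrete, here is a minimal pure-Python sketch of what an exact search does conceptually: score every stored embedding against the query with cosine similarity and keep the top k. The names `cosine_similarity` and `exact_search` and the in-memory dict store are hypothetical illustrations, not CortexaDB's API or its actual implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def exact_search(
    store: Dict[str, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Score every stored embedding (O(n) work) and return the k best matches."""
    scored = [(key, cosine_similarity(vec, query)) for key, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


store = {
    "a": [1.0, 0.0],
    "b": [0.7, 0.7],
    "c": [0.0, 1.0],
}
print(exact_search(store, [1.0, 0.1], k=2))  # "a" first, then "b"
```

Because every query touches every vector, the cost grows linearly with the number of entries, which is why this mode is recommended for smaller datasets.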

Pros:

  • 100% recall: never misses a result
  • No index memory beyond the stored embeddings
  • No index build time

Cons:

  • O(n) query time: scales linearly with dataset size
  • Becomes slow beyond roughly 10,000 entries

HNSW Mode

Approximate nearest neighbor search using the USearch library. HNSW (Hierarchical Navigable Small World) builds a graph-based index for sub-linear query time.

Basic Usage

db = CortexaDB.open("db.mem", dimension=128, index_mode="hnsw")

Custom Parameters

db = CortexaDB.open("db.mem", dimension=128, index_mode={
    "type": "hnsw",
    "m": 16,                # connections per node
    "ef_search": 50,        # query-time search width
    "ef_construction": 200, # build-time search width
    "metric": "cos"         # distance metric
})

Parameters

Parameter         Default   Range      Description
m                 16        4-64       Connections per node. Higher = more memory, higher recall
ef_search         50        10-500     Query-time search width. Higher = better recall, slower search
ef_construction   200       50-500     Build-time search width. Higher = better index, slower build
metric            cos       cos, l2    Distance metric
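To build intuition for what `m` and `ef_search` control, the sketch below implements a heavily simplified, single-layer graph search in pure Python. It is closer to a flat navigable-small-world graph than to full HNSW (no hierarchy of layers, no neighbour pruning, brute-force neighbour selection at build time, L2 distance for brevity), and it is not how USearch is implemented. Here `m` is how many links each inserted point gets, and `ef` is the beam width the query keeps; `ef_construction` is not modelled. All function names are hypothetical.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Set


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (used instead of cosine to keep the sketch short)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_graph(points: List[Sequence[float]], m: int) -> Dict[int, Set[int]]:
    """Insert points one at a time, linking each to its m nearest earlier points.

    Neighbours are found by brute force and links are never pruned, so a node
    can end up with more than m links; real HNSW caps and prunes them.
    """
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for i in range(1, len(points)):
        nearest = sorted(range(i), key=lambda j: l2(points[i], points[j]))[:m]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)  # links are bidirectional
    return graph


def search(graph, points, query, k: int, ef: int, entry: int = 0) -> List[int]:
    """Best-first search that keeps the ef closest nodes seen so far."""
    ef = max(ef, k)
    visited = {entry}
    d0 = l2(points[entry], query)
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node on top
    best = [(-d0, entry)]       # max-heap via negation: worst kept node on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -best[0][0]:
            break  # closest remaining candidate is farther than the worst kept node
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(points[nb], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:k]]


random.seed(0)
points = [[random.random(), random.random()] for _ in range(500)]
graph = build_graph(points, m=8)
query = [0.5, 0.5]
exact = sorted(range(len(points)), key=lambda i: l2(points[i], query))[:10]
for ef in (10, 50):
    approx = search(graph, points, query, k=10, ef=ef)
    print(ef, len(set(exact) & set(approx)) / 10)  # recall@10 for this ef
```

A wider `ef` explores more of the graph per query (better recall, more distance computations), and a larger `m` gives each node more links to follow (better connectivity, more memory), which mirrors the trade-offs in the table above.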

Tuning Guide

For higher recall (often >99%, depending on your data):

index_mode={"type": "hnsw", "m": 32, "ef_search": 200, "ef_construction": 400}

For faster queries (lower latency):

index_mode={"type": "hnsw", "m": 8, "ef_search": 20}

For memory-constrained environments (m is the main memory lever; a lower ef_construction mainly shortens build time):

index_mode={"type": "hnsw", "m": 8, "ef_construction": 100}
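Whether a given configuration actually reaches a recall target depends on your embeddings, so it is worth measuring. One approach, sketched here under assumptions: run the same queries against an exact-mode database and an HNSW-mode database, collect the returned IDs, and compare them. The helpers below only compute recall@k from two ID lists; `recall_at_k` and `mean_recall` are hypothetical names, and how you obtain the IDs from CortexaDB is left out because this guide does not show the query API.

```python
from typing import List, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the true top-k results that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one result")
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(runs: List[Tuple[Sequence[str], Sequence[str]]], k: int) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    return sum(recall_at_k(a, e, k) for a, e in runs) / len(runs)


print(recall_at_k(["a", "c", "x", "b"], ["a", "b", "c", "d"], k=4))  # 0.75
```

Averaging over a few hundred representative queries gives a more reliable estimate than a single query, and lets you compare parameter sets like the ones above on equal footing.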

Distance Metrics

Cosine (cos) - Default

Measures the angle between two vectors, ignoring magnitude.

index_mode={"type": "hnsw", "metric": "cos"}

Best for:

  • Text/semantic embeddings
  • Embeddings from models trained with cosine loss
  • Any scenario where direction matters more than magnitude
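To see what "ignoring magnitude" means, the sketch below computes cosine distance, assumed here to be 1 − cosine similarity (a common convention; this guide does not state how CortexaDB reports scores). Scaling a vector changes its length but not its direction, so its cosine distance to the original stays at zero.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]  # same direction, ten times the magnitude
print(cosine_distance(v, scaled))               # 0.0 up to floating-point rounding
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```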

L2 / Euclidean (l2)

Measures straight-line distance between vectors.

index_mode={"type": "hnsw", "metric": "l2"}

Best for:

  • Image embeddings where magnitude matters
  • Recommendation systems comparing user ratings
  • Geometric data such as planar coordinates (L2 on raw GPS latitude/longitude is only an approximation, reasonable over small areas)
  • Models trained with L2/MSE loss
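The sketch below contrasts the two metrics. L2 treats a vector and a scaled copy of it as far apart, whereas cosine treats them as identical. It also checks a useful identity: for unit-length vectors, ||a − b||² = 2 − 2·cos(a, b), so if your embeddings are already normalized, `cos` and `l2` rank neighbours in the same order and the choice mainly matters for unnormalized vectors. The helper names are illustrative only.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (Euclidean) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]
print(l2_distance(v, scaled))  # large, even though the directions are identical

a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 1.0, 0.5])
# For unit vectors, ||a - b||^2 == 2 - 2 * cos(a, b), up to rounding.
print(l2_distance(a, b) ** 2, 2 - 2 * cosine_similarity(a, b))
```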

Automatic Persistence

HNSW indexes are automatically persisted to disk:

Event             Action
checkpoint()      HNSW index saved to cortexadb.hnsw
Database close    HNSW index saved automatically
Database open     HNSW index loaded from disk (if the file exists)

No extra configuration needed. The index is rebuilt automatically if the file is missing.


Trade-offs

Parameter         Increase for           Decrease for
ef_search         Better recall          Faster queries
m                 Higher recall          Less memory usage
ef_construction   Better index quality   Faster build time

Decision Flow

Dataset size < 10,000?
  ├── Yes → Use "exact" (100% recall, simple)
  └── No  → Use "hnsw"
              └── Need >99% recall?
                    ├── Yes → m=32, ef_search=200, ef_construction=400
                    └── No  → Default parameters are fine
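The decision flow can be encoded as a small helper whose return value is passed as `index_mode`. This is a sketch: `choose_index_mode` is a hypothetical name, the 10,000-entry threshold is this guide's rule of thumb rather than a hard limit, and the high-recall settings are the full configuration from the Tuning Guide.

```python
from typing import Dict, Union

IndexMode = Union[str, Dict[str, Union[str, int]]]


def choose_index_mode(num_entries: int, need_high_recall: bool = False) -> IndexMode:
    """Exact search for small datasets, HNSW otherwise (tuned if high recall is needed)."""
    if num_entries < 10_000:
        return "exact"
    if need_high_recall:
        return {"type": "hnsw", "m": 32, "ef_search": 200, "ef_construction": 400}
    return "hnsw"  # default HNSW parameters


print(choose_index_mode(5_000))                          # exact
print(choose_index_mode(50_000))                         # hnsw
print(choose_index_mode(50_000, need_high_recall=True))  # tuned HNSW dict
```

The result can then be passed as the `index_mode` argument of `CortexaDB.open(...)`, in the same way as the literal values shown earlier in this guide.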
