Vector Stores Overview

Vector stores in Cognee handle the storage and retrieval of embeddings, enabling semantic search and similarity matching across your knowledge base. By storing high-dimensional vectors that capture the meaning of your content, they support similarity-based retrieval and work alongside graph stores to provide comprehensive knowledge retrieval capabilities.

How Vector Stores Work

1. Text Embedding: Content is converted into high-dimensional vectors using embedding models (OpenAI, local models, etc.).
2. Vector Storage: Embeddings are stored in specialized vector databases optimized for similarity search.
3. Semantic Search: User queries are embedded and compared against stored vectors to find semantically similar content.
4. Result Ranking: Results are ranked by similarity score and combined with graph-based context for comprehensive answers.
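The four steps above can be sketched end to end in plain Python. This toy uses a hypothetical bag-of-words "embedding" in place of a real embedding model, a list as the store, and cosine similarity for ranking; real vector stores use learned embeddings and approximate-nearest-neighbor indexes, but the flow is the same.

```python
import math

# Hypothetical vocabulary standing in for a real embedding model.
VOCAB = ["vector", "store", "graph", "search", "fast"]

def embed(text: str) -> list[float]:
    # Step 1: map text to a fixed-dimension vector (here: term counts).
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Step 2: "store" each embedding alongside its source text.
docs = ["vector search is fast", "graph store", "vector store"]
index = [(doc, embed(doc)) for doc in docs]

# Steps 3-4: embed the query, score every stored vector, rank by similarity.
query = embed("fast vector search")
ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
print(ranked[0][0])  # "vector search is fast"
```

Note that the query shares no exact phrase with the top result; it wins because its vector points in the same direction, which is exactly what semantic search buys you over keyword matching.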

Choose Your Vector Store

Select the vector database that best fits your needs:


Quick Configuration Examples

Get started with any vector store in just a few lines of code:
Zero Setup Required
import os
import asyncio
import cognee

# LanceDB configuration (default)
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["VECTOR_DB_PATH"] = "./lancedb_data"

async def main():
    # No additional setup required
    await cognee.add("LanceDB provides fast vector storage.")
    await cognee.cognify()

asyncio.run(main())

Embedding Integration

High Quality Vectors
import os

# Configure OpenAI embeddings
os.environ["EMBEDDING_PROVIDER"] = "openai"
os.environ["EMBEDDING_MODEL"] = "text-embedding-3-large"
os.environ["EMBEDDING_DIMENSIONS"] = "3072"

# Works with any vector store
os.environ["VECTOR_DB_PROVIDER"] = "qdrant"
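A mismatched `EMBEDDING_DIMENSIONS` value is a common source of insert failures, since the store's index is created at a fixed width. The sketch below is a hypothetical pre-flight check (not a cognee API): it compares the configured dimensions against the documented output sizes of the OpenAI `text-embedding-3` models before anything is written.

```python
import os

# Documented output sizes for OpenAI's text-embedding-3 models.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def validate_embedding_config() -> int:
    # Hypothetical helper: fail fast if the configured dimensions
    # cannot match what the model will actually produce.
    model = os.environ["EMBEDDING_MODEL"]
    dims = int(os.environ["EMBEDDING_DIMENSIONS"])
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is not None and dims != expected:
        raise ValueError(f"{model} emits {expected}-d vectors, got {dims}")
    return dims

os.environ["EMBEDDING_MODEL"] = "text-embedding-3-large"
os.environ["EMBEDDING_DIMENSIONS"] = "3072"
print(validate_embedding_config())  # 3072
```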

Vector Search Types

Semantic Similarity

Semantic similarity search embeds the user query and returns the stored items whose vectors lie closest to it, so results match on meaning rather than on exact keyword overlap.

Performance Optimization

Index Optimization

Fast Retrieval
  • Optimize vector dimensions for your use case
  • Use appropriate distance metrics
  • Configure index parameters for speed vs accuracy
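On "use appropriate distance metrics": the metric is not a cosmetic choice. The sketch below uses hypothetical 2-D vectors to show that cosine similarity and Euclidean distance can disagree about the nearest neighbor when stored vectors are not normalized, which is why the metric should match how your embeddings were trained.

```python
import math

# Hypothetical 2-D vectors (real embeddings have hundreds of dimensions,
# but the effect is the same).
query = [1.0, 1.0]
a = [2.0, 2.0]   # same direction as the query, larger magnitude
b = [1.0, 0.5]   # different direction, but close in absolute terms

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Cosine ignores magnitude and picks a; Euclidean picks b.
best_cosine = max([a, b], key=lambda v: cosine(query, v))
nearest_euclidean = min([a, b], key=lambda v: math.dist(query, v))
print(best_cosine, nearest_euclidean)  # [2.0, 2.0] [1.0, 0.5]
```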

Batch Operations

Efficient Processing
  • Batch vector insertions for better performance
  • Use bulk operations when available
  • Optimize embedding generation
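The win from batching is fewer round trips: one bulk call per batch instead of one call per vector. A minimal batching helper is sketched below; it is illustrative, not a store API — most vector databases expose their own bulk-insert endpoint that you would call once per batch.

```python
from itertools import islice

def batched(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

vectors = [[float(i)] for i in range(10)]
calls = 0
for batch in batched(vectors, size=4):
    calls += 1  # one hypothetical bulk insert per batch, not per vector
print(calls)  # 3 bulk calls for 10 vectors instead of 10 single inserts
```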

Memory Management

Resource Optimization
  • Configure appropriate cache sizes
  • Use memory mapping for large datasets
  • Optimize vector quantization
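On quantization: mapping float32 components to int8 cuts memory roughly 4x at a small cost in precision. The scalar-quantization sketch below is illustrative only (production stores implement this internally, often with per-segment calibrated ranges); the assumed [-1, 1] range is a stand-in for the observed value range of your embeddings.

```python
def quantize(vec, lo=-1.0, hi=1.0):
    # Map each float in [lo, hi] to a signed 8-bit integer.
    scale = 255 / (hi - lo)
    return [round((x - lo) * scale) - 128 for x in vec]

def dequantize(qvec, lo=-1.0, hi=1.0):
    # Approximate inverse: recover floats from the int8 codes.
    scale = (hi - lo) / 255
    return [(q + 128) * scale + lo for q in qvec]

v = [0.5, -0.25, 1.0]
q = quantize(v)          # [63, -32, 127] -- 1 byte per component
restored = dequantize(q)
print([round(x, 2) for x in restored])  # [0.5, -0.25, 1.0]
```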

Scaling

Growth Planning
  • Plan for data growth and query volume
  • Configure sharding and replication
  • Monitor performance metrics

Advanced Features

Monitoring & Analytics

import cognee
import asyncio

async def get_vector_metrics():
    # Get vector store statistics
    stats = await cognee.get_vector_stats()
    
    print(f"Total vectors: {stats['total_vectors']}")
    print(f"Index size: {stats['index_size_mb']} MB")
    print(f"Average query time: {stats['avg_query_time_ms']} ms")
    print(f"Cache hit rate: {stats['cache_hit_rate']:.2%}")
    
    return stats

asyncio.run(get_vector_metrics())


Quick Start Guides

Choose your vector store and get started in minutes:

Not Sure Which to Choose?

Start with LanceDB if you're new to vector databases: it requires zero setup and works well for most use cases. You can always migrate to Qdrant or Weaviate later as your needs grow.