Vector Stores Overview

Vector stores in Cognee handle the storage and retrieval of embeddings, enabling semantic search and similarity matching across your knowledge base. They work alongside graph stores to provide comprehensive knowledge retrieval capabilities.

Vector stores enable semantic search by storing high-dimensional embeddings that capture the meaning of your content, allowing for similarity-based retrieval.

How Vector Stores Work

Text Embedding

Content is converted into high-dimensional vectors using embedding models (OpenAI, local models, etc.).

Vector Storage

Embeddings are stored in specialized vector databases optimized for similarity search.

Semantic Search

User queries are embedded and compared against stored vectors to find semantically similar content.

Result Ranking

Results are ranked by similarity score and combined with graph-based context for comprehensive answers.

Choose Your Vector Store

Select the vector database that best fits your needs:

Vector Store Comparison

Performance Comparison

Database	Speed	Scalability	Setup	Best For
LanceDB	Fast	Medium	Easy	Development, serverless
Qdrant	Very Fast	High	Medium	Production, high volume
Weaviate	Fast	High	Medium	Multi-modal, hybrid search

Use Case Recommendations

Cost Considerations

Quick Configuration Examples

Get started with any vector store in just a few lines of code:

Zero Setup Required

import os
import cognee

# LanceDB configuration (default)
os.environ["VECTOR_DB_PROVIDER"] = "lancedb"
os.environ["VECTOR_DB_PATH"] = "./lancedb_data"

# No additional setup required
await cognee.add("LanceDB provides fast vector storage.")
await cognee.cognify()

Embedding Integration

High Quality Vectors

# Configure OpenAI embeddings
os.environ["EMBEDDING_PROVIDER"] = "openai"
os.environ["EMBEDDING_MODEL"] = "text-embedding-3-large"
os.environ["EMBEDDING_DIMENSIONS"] = "3072"

# Works with any vector store
os.environ["VECTOR_DB_PROVIDER"] = "qdrant"

Vector Search Types

Semantic Similarity

Meaning-Based SearchFind content based on semantic meaning rather than exact keywords:

# Semantic search example
results = await cognee.search(
    "artificial intelligence in medicine",
    query_type=SearchType.CHUNKS,
    similarity_threshold=0.8
)

# Finds related content like:
# - "machine learning for healthcare"
# - "AI-powered medical diagnosis"
# - "deep learning in radiology"

Hybrid Search

Combined ApproachesCombine vector similarity with traditional keyword search:

# Hybrid search combining vector and keyword matching
results = await cognee.search(
    "machine learning healthcare applications",
    query_type=SearchType.HYBRID,
    vector_weight=0.7,
    keyword_weight=0.3
)

Filtered Search

Metadata FilteringSearch within specific subsets using metadata filters:

# Search with metadata filters
results = await cognee.search(
    "treatment options",
    query_type=SearchType.CHUNKS,
    filters={
        "document_type": "medical_research",
        "publication_year": {"gte": 2020},
        "confidence": {"gt": 0.8}
    }
)

Performance Optimization

Index Optimization

Fast Retrieval

Optimize vector dimensions for your use case
Use appropriate distance metrics
Configure index parameters for speed vs accuracy

Batch Operations

Efficient Processing

Batch vector insertions for better performance
Use bulk operations when available
Optimize embedding generation

Memory Management

Resource Optimization

Configure appropriate cache sizes
Use memory mapping for large datasets
Optimize vector quantization

Scaling

Growth Planning

Plan for data growth and query volume
Configure sharding and replication
Monitor performance metrics

Advanced Features

Complex Queries

# Search across multiple vector spaces
results = await cognee.search(
    query_vectors=[
        {"text": "machine learning", "weight": 0.6},
        {"text": "healthcare applications", "weight": 0.4}
    ],
    query_type=SearchType.MULTI_VECTOR
)

Monitoring & Analytics

import cognee
import asyncio

async def get_vector_metrics():
    # Get vector store statistics
    stats = await cognee.get_vector_stats()
    
    print(f"Total vectors: {stats['total_vectors']}")
    print(f"Index size: {stats['index_size_mb']} MB")
    print(f"Average query time: {stats['avg_query_time_ms']} ms")
    print(f"Cache hit rate: {stats['cache_hit_rate']:.2%}")
    
    return stats

asyncio.run(get_vector_metrics())

Troubleshooting

Performance Issues

Quality Issues

Configuration Issues

Connection Problems

# Test vector store connection
try:
    await cognee.test_vector_connection()
    print("✅ Vector store connected")
except Exception as e:
    print(f"❌ Connection failed: {e}")

Index Issues

Verify index exists and is accessible
Check permissions and credentials
Validate configuration parameters
Test with simple operations

Quick Start Guides

Choose your vector store and get started in minutes:

🚀 Start with LanceDB

⏱️ Setup time: 2 minutes

⚡ Start with Qdrant

Best for ProductionHigh-performance vector search with clustering and enterprise features.

⏱️ Setup time: 10 minutes

🎯 Start with Weaviate

Multi-Modal ApplicationsAdvanced search with GraphQL API and multi-modal capabilities.

⏱️ Setup time: 15 minutes

Not Sure Which to Choose?

Start with LanceDB if you’re new to vector databases - it requires zero setup and works great for most use cases. You can always migrate to Qdrant or Weaviate later as your needs grow.

LanceDB

Qdrant

Weaviate

Getting Started

Core Concepts

Examples

CLI

Python SDK

UI

Overview

Vector Stores Overview

How Vector Stores Work

Choose Your Vector Store

Vector Store Comparison

Quick Configuration Examples

Embedding Integration

Vector Search Types

Performance Optimization

Index Optimization

Batch Operations

Memory Management

Scaling

Advanced Features

Monitoring & Analytics

Troubleshooting

Quick Start Guides

🚀 Start with LanceDB

⚡ Start with Qdrant

🎯 Start with Weaviate

Not Sure Which to Choose?

Getting Started

Core Concepts

Examples

CLI

Python SDK

UI

​Vector Stores Overview

​How Vector Stores Work

​Choose Your Vector Store

LanceDB

Qdrant

Weaviate

​Vector Store Comparison

​Quick Configuration Examples

​Embedding Integration

​Vector Search Types

​Performance Optimization

Index Optimization

Batch Operations

Memory Management

Scaling

​Advanced Features

​Monitoring & Analytics

​Troubleshooting

​Quick Start Guides

🚀 Start with LanceDB

⚡ Start with Qdrant

🎯 Start with Weaviate

​Not Sure Which to Choose?

Vector Stores Overview

How Vector Stores Work

Choose Your Vector Store

Vector Store Comparison

Quick Configuration Examples

Embedding Integration

Vector Search Types

Performance Optimization

Advanced Features

Monitoring & Analytics

Troubleshooting

Quick Start Guides

Not Sure Which to Choose?