Cloud storage in Cognee is essential for distributed deployments (Kubernetes, ECS, EC2) where nodes do not share a common disk. Amazon S3 is fully supported, with other providers being actively developed.

Supported Cloud Providers

Amazon S3

✅ Fully Supported
Complete S3 integration with read/write operations, authentication, and bucket management.

Google Cloud Storage

🚧 In Development
High-performance storage with integrated AI/ML services and global distribution.

Azure Blob Storage

🚧 In Development
Enterprise-grade storage with seamless integration to Azure services.

MinIO

🚧 In Development
Open-source S3-compatible storage for on-premises and private cloud deployments.

DigitalOcean Spaces

🚧 In Development
Developer-friendly object storage with straightforward pricing and setup.

Cloudflare R2

🚧 In Development
Cost-effective storage with no egress charges and global edge distribution.

Current Status: Amazon S3 is the primary supported cloud storage provider. Other cloud storage providers are being actively developed and will be available in future releases.

Storage Architecture

1. Data Classification

Different types of data are stored according to their access patterns and requirements.

2. Tiered Storage

Frequently accessed data is kept in hot storage, and archived data is moved to cold storage to reduce cost.

3. Replication

Data is replicated across multiple regions for high availability and disaster recovery.

4. Caching

A local caching layer holds frequently accessed data to minimize latency.
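
To make the caching step concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. It is not Cognee's implementation: the `ReadThroughCache` class, the `fetch` callback, and the in-memory `remote` dictionary are all hypothetical stand-ins chosen so the example runs without any cloud account.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Tiny LRU cache placed in front of a slower store (illustrative only)."""

    def __init__(self, fetch: Callable[[str], bytes], max_items: int = 128) -> None:
        self._fetch = fetch  # hypothetical loader, e.g. a remote object read
        self._max_items = max_items
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)  # slow path: go to the backing store
        self._items[key] = value
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value


# Stand-in for remote storage so the sketch runs locally.
remote = {"data/a.txt": b"alpha", "data/b.txt": b"beta"}
cache = ReadThroughCache(fetch=lambda key: remote[key], max_items=1)

cache.get("data/a.txt")  # miss: fetched from the backing store
cache.get("data/a.txt")  # hit: served from the local cache
cache.get("data/b.txt")  # miss: evicts data/a.txt because max_items is 1
print(cache.hits, cache.misses)
```

Repeated reads of the same key are served locally, which is the latency benefit the caching step describes. A real deployment would also need invalidation when the remote object changes.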

Amazon S3 Setup Guide

This guide walks you through the minimal steps to configure Cognee with Amazon S3 for distributed deployments.

Prerequisites

1. AWS Account

Create AWS Account
You’ll need an AWS account. If you’re not the AWS administrator, ask your admin to create an access key for you or a role that Cognee can assume.

2. S3 Bucket

Create or Choose S3 Bucket
Create a new S3 bucket or choose an existing one. For this guide, we’ll use:
s3://my-cognee-bucket
AWS S3 Bucket Creation Guide

3. AWS Credentials

Collect Access Credentials
If you’re an admin, generate a new Access Key and Secret Key for your IAM user in the AWS Console (IAM → Users → Security credentials). Otherwise, request them from your administrator.

4. Configure Environment

Set Environment Variables
Configure Cognee to use S3 instead of local storage.
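
The guide writes the bucket as the URI s3://my-cognee-bucket, while the STORAGE_BUCKET_NAME setting takes the bare name. The sketch below shows one way to derive and sanity-check that name using only the Python standard library. The helper `bucket_name_from_uri` is a hypothetical name, and the pattern encodes only the general S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit), not every edge case.

```python
import re
from urllib.parse import urlparse

# General S3 naming rule: 3-63 characters, lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_name_from_uri(uri: str) -> str:
    """Return the bucket part of an s3:// URI (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    bucket = parsed.netloc
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"{bucket!r} is not a valid S3 bucket name")
    return bucket


print(bucket_name_from_uri("s3://my-cognee-bucket"))
```

The returned value is what goes into STORAGE_BUCKET_NAME. Any key prefix after the bucket name is ignored by this helper.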

Configuration

Basic S3 Configuration
Create or update your .env file (or export environment variables in your deployment manifests):
# Storage backend configuration
STORAGE_BACKEND=s3
STORAGE_BUCKET_NAME=my-cognee-bucket

# AWS authentication
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
Environment Variables Explained:
  • STORAGE_BACKEND: Tells Cognee to use S3 instead of the local disk
  • STORAGE_BUCKET_NAME: The bucket you created/selected
  • AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY: Authenticate Cognee’s requests to S3
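
A small preflight check can catch a missing variable before Cognee starts. The sketch below uses only the four variable names listed above. The function `s3_config_problems` is a hypothetical helper, not part of Cognee, and it assumes key-based authentication. If your deployment uses an assumable role instead of access keys, the two AWS variables may legitimately be absent and the check should be relaxed.

```python
import os
from typing import List, Mapping

# Variables that must be non-empty when the S3 backend is selected.
REQUIRED_FOR_S3 = (
    "STORAGE_BUCKET_NAME",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def s3_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the S3 settings (hypothetical helper)."""
    problems = []
    if env.get("STORAGE_BACKEND") != "s3":
        problems.append("STORAGE_BACKEND is not set to 's3'")
    for name in REQUIRED_FOR_S3:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    return problems


# Check the current process environment before starting Cognee.
for problem in s3_config_problems(os.environ):
    print(f"config problem: {problem}")
```

Passing the environment in as a mapping keeps the check easy to run against a parsed .env file or a deployment manifest as well as `os.environ`.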

Verification

Once configured, verify that Cognee is using S3:
import asyncio

import cognee


async def main():
    # Add some data and process it
    await cognee.add("Test data for S3 storage")
    await cognee.cognify()

    # Search to verify everything works
    results = await cognee.search("test data")
    print(f"✅ S3 storage working! Found {len(results)} results")


# Top-level await is not valid in a plain script, so run the coroutine explicitly
asyncio.run(main())

Check your S3 bucket: you should see the data/ and system/ directories with Cognee’s files.
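
The bucket check can also be scripted. The sketch below assumes the third-party boto3 package is installed and that the data/ and system/ directories sit at the top level of the bucket, as described above. The helper names `top_level_prefixes` and `missing_prefixes` are hypothetical. It reads only the first page of results, which is enough for a quick check.

```python
from typing import Any, Dict, Set

# Directories expected at the top level of the bucket after verification.
EXPECTED_PREFIXES = {"data/", "system/"}


def top_level_prefixes(response: Dict[str, Any]) -> Set[str]:
    """Extract top-level 'directory' names from a list_objects_v2 response."""
    return {entry["Prefix"] for entry in response.get("CommonPrefixes", [])}


def missing_prefixes(bucket: str) -> Set[str]:
    """Return which of the expected prefixes are absent from the bucket."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies

    # The client picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
    return EXPECTED_PREFIXES - top_level_prefixes(response)


# Example call (requires network access and valid credentials):
# print(missing_prefixes("my-cognee-bucket") or "data/ and system/ are present")
```

An empty result from `missing_prefixes` means both directories exist. If Cognee is configured to write under a key prefix, pass that prefix to the listing call as well.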

Other Cloud Providers (In Development)

The following cloud storage providers are currently in development and will be available in future releases. Amazon S3 is the only fully supported provider at this time.

Google Cloud Storage

Status: 🚧 In Development

Google Cloud Storage integration is being actively developed. It will include support for:
  • Service account authentication
  • Multi-region storage classes (STANDARD, NEARLINE, COLDLINE, ARCHIVE)
  • Lifecycle management policies
  • Customer-managed encryption keys (CMEK)
  • Uniform bucket-level access
Planned Configuration:
# Future GCS configuration (not yet available)
import os
import cognee

os.environ["STORAGE_BACKEND"] = "gcs"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"
os.environ["GCS_BUCKET_NAME"] = "cognee-knowledge-graphs"
os.environ["GCS_PROJECT_ID"] = "your-gcp-project"

# This will be available in a future release

Azure Blob Storage

Status: 🚧 In Development

Azure Blob Storage integration is being actively developed. It will include support for:
  • Connection string and managed identity authentication
  • Access tiers (Hot, Cool, Archive)
  • Lifecycle management policies
  • Azure Key Vault encryption
  • CDN integration for global distribution
Planned Configuration:
# Future Azure configuration (not yet available)
import os
import cognee

os.environ["STORAGE_BACKEND"] = "azure"
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = "your-connection-string"
os.environ["AZURE_CONTAINER_NAME"] = "cognee-knowledge-graphs"

# This will be available in a future release

Other Providers

Also In Development:
  • MinIO: S3-compatible self-hosted storage
  • DigitalOcean Spaces: Simple cloud storage
  • Cloudflare R2: Zero egress fee storage
  • Backblaze B2: Cost-effective cloud storage
Want to contribute? If you’re interested in helping develop support for additional cloud storage providers, check out our contributing guide or join our Discord community.

Data Organization

Folder Structure

Next Steps