A minimal guide to using S3 (or S3-compatible, e.g., MinIO) to ingest data and/or store Cognee’s internal files. Before you start:
  • Complete Quickstart to understand basic operations
  • Ensure you have LLM Providers configured
  • Have S3 credentials and access to an S3 bucket

What S3 Storage Does

  • Ingest from S3: Pass s3://... paths to cognee.add() to load data directly from S3
  • Store Cognee data on S3: Set your data/system roots to S3 URLs to keep all files on S3
  • S3-compatible: Works with MinIO and other S3-compatible services

Prerequisites

Install with AWS extra if needed (boto3/s3fs) and add credentials to .env:
aws_access_key_id=your_access_key
aws_secret_access_key=your_secret_key
aws_region=us-east-1
# Optional for S3-compatible endpoints (e.g., MinIO):
aws_endpoint_url=http://localhost:9000

Option A: Ingest from S3

Pass S3 URIs (files or prefixes) directly to add(). Directories/prefixes expand to files when credentials are set.
import cognee

# Single file
await cognee.add("s3://my-bucket/docs/paper.pdf")

# Folder/prefix (recursively expands)
await cognee.add("s3://my-bucket/datasets/reports/")

# Mixed list
await cognee.add([
    "s3://my-bucket/docs/paper.pdf",
    "Some inline text to ingest",
])

# Process the data
await cognee.cognify()
This loads data directly from S3 using the s3:// URI. Directory expansion lists S3 keys and filters out folders, while file I/O streams from S3 using s3fs.
This simple example uses S3 paths for demonstration. In practice, you can mix S3 files with local files, use dataset scoping, and apply custom loaders - the same options work with S3 paths.

Option B: Store Cognee Data on S3

Keep Cognee’s generated files (text copies, system files) on S3 by pointing roots to S3 URLs. Add this to your .env:
DATA_ROOT_DIRECTORY="s3://my-bucket/cognee/data"
SYSTEM_ROOT_DIRECTORY="s3://my-bucket/cognee/system"
# Optional: force S3 backend detection
STORAGE_BACKEND="s3"
This configures Cognee to store all its internal files (processed data, system files) on S3 instead of locally.
Cognee chooses S3 storage when roots start with s3:// (or when STORAGE_BACKEND=s3 and both roots are S3 URLs). Credentials from .env are required.