Behind the scenes, every pipeline step runs as a Modal job that talks to managed LanceDB, Kuzu, and PostgreSQL clusters.
System Overview
Cogwit’s architecture centers on three layers that work together to provide a managed knowledge-processing platform:
Modal (Managed Infrastructure)
Modal provides the compute foundation for all Cogwit operations:
- API Services: Hosts the FastAPI service that handles all REST endpoints and authentication (see Cogwit SDK)
- Notebook Sandbox: Provides isolated environments for running user code with 24-hour timeout support (see Cogwit Notebooks)
- Container Orchestration: Every API request runs inside a Modal container with secrets managed internally by Cogwit
- Code Execution: Notebook code runs in short-lived sandboxes that forward the user’s Cogwit API key to the managed API
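The key-forwarding step above can be sketched as follows. This is an illustrative sketch only: the `COGWIT_API_KEY` environment variable and the bearer-token header format are assumptions for illustration, not the documented Cogwit sandbox contract.

```python
import os
from typing import Optional


def build_request_headers(api_key: Optional[str] = None) -> dict:
    """Build auth headers a sandbox might attach when calling the managed API.

    The sandbox forwards the user's Cogwit API key; reading it from the
    COGWIT_API_KEY environment variable is an assumption for this sketch.
    """
    key = api_key or os.environ.get("COGWIT_API_KEY")
    if not key:
        raise RuntimeError("No Cogwit API key available in this sandbox")
    # Bearer-token header format is assumed for illustration.
    return {"Authorization": f"Bearer {key}"}
```

In this model, user code never holds infrastructure credentials; only the user's own API key travels from the sandbox to the managed API.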
Storage Services (Managed by Cogwit)
All data persistence is handled through Cogwit’s managed storage infrastructure:
- S3 – Central store for all raw uploads, LanceDB tables, and Kuzu graph files
- LanceDB – Vector database that stores embeddings generated during the cognify process
- Kuzu – Graph database that maintains knowledge graph relationships and entities
- PostgreSQL – Relational database for users, datasets, permissions, quotas, and billing records
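Since all three engines persist per-dataset data under managed S3, the layout can be pictured as one prefix per dataset. The bucket name, tenant/dataset path scheme, and directory names below are hypothetical, chosen only to illustrate the idea of separate storage namespaces:

```python
from pathlib import PurePosixPath


def dataset_storage_paths(tenant_id: str, dataset_id: str) -> dict:
    """Sketch of a per-dataset storage layout under a managed S3 bucket.

    The "cogwit-managed" bucket and the tenant/dataset/component layout
    are illustrative assumptions, not the actual Cogwit schema.
    """
    root = PurePosixPath("cogwit-managed") / tenant_id / dataset_id
    return {
        "raw_uploads": str(root / "raw"),      # uploaded source files
        "lancedb": str(root / "lancedb"),      # vector tables (embeddings)
        "kuzu": str(root / "kuzu"),            # knowledge-graph database files
    }
```

Keeping each dataset under its own prefix is one way the dataset-isolation guarantee described below can be enforced at the storage layer.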
Key Architectural Principles
- Dataset Isolation: All processing happens at the dataset level, with separate storage namespaces (see permissions & security for details)
- Managed Infrastructure: Users don’t configure Modal, S3, or database credentials; everything is managed by Cogwit
- Compatibility: Storage schemas remain compatible with self-hosted Cognee for easy migration
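The dataset-isolation principle amounts to a per-request check that the caller has been granted the dataset it is touching. The function below is a minimal sketch of that idea; the grant model and error type are assumptions, not Cogwit's actual permission API:

```python
def assert_dataset_access(granted_datasets: set, dataset_id: str) -> None:
    """Illustrative dataset-level access check (assumption, not the real API).

    A request may only touch datasets explicitly granted to the caller;
    anything else is rejected before any storage namespace is resolved.
    """
    if dataset_id not in granted_datasets:
        raise PermissionError(f"caller has no access to dataset {dataset_id!r}")
```

Because every storage namespace is keyed by dataset, passing this check is a precondition for reaching any of the dataset's LanceDB, Kuzu, or raw-upload data.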