Rate Limiting Guide

Rate limiting in Cognee caps the number of LLM and embedding calls you make within a given time window, which helps you stay within provider quotas and keep costs predictable.

Understanding Rate Limiting in Cognee

Cognee implements rate limiting at two levels:

LLM endpoint rate limits: Control how many requests can be made to the LLM endpoint
Embedding endpoint limits: Manage the frequency and volume of embedding calls

Configuring Rate Limits

You can configure rate limits through environment variables:


RATE_LIMIT_ENABLED = True
RATE_LIMIT_REQUESTS = 60
RATE_LIMIT_INTERVAL = 60  # window length in seconds (default is 60 requests per 60 seconds)
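
To make the meaning of the three variables concrete, here is a minimal sketch of how they could be parsed in Python. The `RateLimitSettings` class and `load_rate_limit_settings` function are hypothetical helpers written for this guide, not part of Cognee, and treating an unset `RATE_LIMIT_ENABLED` as disabled is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitSettings:
    """Hypothetical container for the three settings; not a Cognee class."""
    enabled: bool
    requests: int   # maximum number of calls per window
    interval: int   # window length in seconds


def load_rate_limit_settings(env=None) -> RateLimitSettings:
    """Read the RATE_LIMIT_* variables, defaulting to 60 requests per 60 seconds."""
    env = os.environ if env is None else env
    # Assumption: the limiter is off unless explicitly enabled.
    raw_enabled = env.get("RATE_LIMIT_ENABLED", "False")
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes"}
    requests = int(env.get("RATE_LIMIT_REQUESTS", "60"))
    interval = int(env.get("RATE_LIMIT_INTERVAL", "60"))
    if requests <= 0 or interval <= 0:
        raise ValueError("RATE_LIMIT_REQUESTS and RATE_LIMIT_INTERVAL must be positive")
    return RateLimitSettings(enabled, requests, interval)
```

For example, setting RATE_LIMIT_REQUESTS to 30 and leaving RATE_LIMIT_INTERVAL at 60 would allow at most 30 calls in any 60-second window.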

The same settings are also available on the LLM config object, as in this excerpt from a limiter's initializer:


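# Excerpt from inside a class initializer, hence the self. attributes
config = get_llm_config()
self._enabled = config.llm_rate_limit_enabled      # whether limiting is active
self._requests = config.llm_rate_limit_requests    # maximum calls per window
self._interval = config.llm_rate_limit_interval    # window length in seconds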
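
The sketch below shows one way the three values could be used once loaded. It is a simple sliding-window limiter written for illustration only; the `SlidingWindowLimiter` class, its `try_acquire` method, and the injectable `clock` parameter are assumptions of this guide, and Cognee's internal implementation may work differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative limiter; not Cognee's actual implementation."""

    def __init__(self, enabled, requests, interval, clock=time.monotonic):
        self._enabled = enabled
        self._requests = requests    # maximum calls per window
        self._interval = interval    # window length in seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._timestamps = deque()   # times of the calls still inside the window

    def try_acquire(self) -> bool:
        """Return True if a call may proceed now, False if the limit is reached."""
        if not self._enabled:
            return True
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._timestamps and now - self._timestamps[0] >= self._interval:
            self._timestamps.popleft()
        if len(self._timestamps) < self._requests:
            self._timestamps.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each LLM or embedding request and either wait or skip the request when it returns False.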

Next Steps

Learn about Performance Optimization techniques
Explore Configuration Options for customizing Cognee
See how to implement Remote Models with their own rate limits