Rate Limiting Guide
Rate limiting is an essential feature in Cognee that controls how many LLM and embedding calls you make within a given time window, helping you stay within your provider's request quotas.
Understanding Rate Limiting in Cognee
Cognee implements rate limiting at two levels:
- LLM endpoint limits: control how many requests can be made to the LLM endpoint per interval
- Embedding endpoint limits: control the frequency and volume of embedding calls
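To make the idea of a per-endpoint limit concrete, here is a minimal sketch of a sliding-window limiter that allows at most N requests in any rolling window of a fixed number of seconds, with one instance per endpoint type. This is a generic technique shown for illustration, not Cognee's internal implementation; the class `SlidingWindowLimiter` and its methods are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_requests` calls per rolling `interval` seconds.

    Generic illustration only; not Cognee's internal rate limiter.
    """

    def __init__(self, max_requests: int, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_requests <= 0 or interval <= 0:
            raise ValueError("max_requests and interval must be positive")
        self.max_requests = max_requests
        self.interval = interval
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps: Deque[float] = deque()

    def _evict_expired(self, now: float) -> None:
        # Drop timestamps that have fallen out of the rolling window.
        while self._timestamps and now - self._timestamps[0] >= self.interval:
            self._timestamps.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False  # rejected requests are not recorded

    def wait_time(self) -> float:
        """Seconds until the next request would be allowed (0.0 if allowed now)."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self.interval - (now - self._timestamps[0])


# Separate limiters for the two endpoint types, each allowing 60 requests per 60 seconds.
llm_limiter = SlidingWindowLimiter(max_requests=60, interval=60)
embedding_limiter = SlidingWindowLimiter(max_requests=60, interval=60)
```

A caller would check `try_acquire()` before each request and sleep for `wait_time()` seconds when it returns `False`. Keeping the two limiters separate means a burst of embedding calls does not consume the LLM budget, and vice versa.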
Configuring Rate Limits
You can configure rate limits through environment variables:
RATE_LIMIT_ENABLED=True
RATE_LIMIT_REQUESTS=60
RATE_LIMIT_INTERVAL=60 # window length in seconds (the defaults allow 60 requests per 60 seconds)
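As an illustration of how these three variables fit together, the sketch below reads them into a small settings object and derives the average throughput they permit (60 requests over 60 seconds is 1 request per second). The helper `load_rate_limit_settings` and the `RateLimitSettings` class are hypothetical, and treating `RATE_LIMIT_ENABLED` as off when unset is an assumption made for the example, not documented Cognee behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitSettings:
    enabled: bool
    requests: int
    interval: float  # seconds

    @property
    def requests_per_second(self) -> float:
        # Average throughput the limit allows, e.g. 60 requests / 60 s = 1.0 per second.
        return self.requests / self.interval


def load_rate_limit_settings(env: Mapping[str, str] = os.environ) -> RateLimitSettings:
    """Hypothetical helper: parse the RATE_LIMIT_* variables.

    Falls back to 60 requests per 60 seconds; treating a missing
    RATE_LIMIT_ENABLED as disabled is an assumption for this sketch.
    """
    enabled = env.get("RATE_LIMIT_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
    requests = int(env.get("RATE_LIMIT_REQUESTS", "60"))
    interval = float(env.get("RATE_LIMIT_INTERVAL", "60"))
    return RateLimitSettings(enabled, requests, interval)
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to test without touching the real environment.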
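Alternatively, read the same settings from the LLM config object in Python:
from cognee.infrastructure.llm.config import get_llm_config  # module path may differ between Cognee versions

config = get_llm_config()
enabled = config.llm_rate_limit_enabled    # whether LLM rate limiting is on
requests = config.llm_rate_limit_requests  # maximum requests per interval
interval = config.llm_rate_limit_interval  # interval length in seconds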
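The config fields `llm_rate_limit_enabled`, `llm_rate_limit_requests` and `llm_rate_limit_interval` are enough to drive an asynchronous limiter around LLM calls. The sketch below is hypothetical: `AsyncRateLimiter` is not a Cognee class, and a `SimpleNamespace` stands in for the object returned by `get_llm_config()` so the example runs without Cognee installed. It allows at most the configured number of calls per rolling interval and skips waiting entirely when limiting is disabled.

```python
import asyncio
import time
from collections import deque
from types import SimpleNamespace
from typing import Any, Awaitable, Callable, Deque, Optional


class AsyncRateLimiter:
    """Hypothetical async limiter driven by the three LLM config fields.

    Allows at most `llm_rate_limit_requests` calls in any rolling window of
    `llm_rate_limit_interval` seconds; does nothing when limiting is disabled.
    """

    def __init__(self, config: Any) -> None:
        self._enabled = config.llm_rate_limit_enabled
        self._requests = config.llm_rate_limit_requests
        self._interval = config.llm_rate_limit_interval
        self._timestamps: Deque[float] = deque()
        self._lock: Optional[asyncio.Lock] = None  # created lazily inside the running loop

    async def acquire(self) -> None:
        if not self._enabled:
            return  # Rate limiting disabled: never wait.
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # serialize waiters so the window stays consistent
            while True:
                now = time.monotonic()
                # Forget requests that have left the rolling window.
                while self._timestamps and now - self._timestamps[0] >= self._interval:
                    self._timestamps.popleft()
                if len(self._timestamps) < self._requests:
                    self._timestamps.append(now)
                    return
                # Sleep until the oldest request expires, then re-check.
                await asyncio.sleep(self._interval - (now - self._timestamps[0]))

    async def call(self, func: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> Any:
        """Wait for a free slot, then await `func(*args, **kwargs)`."""
        await self.acquire()
        return await func(*args, **kwargs)


if __name__ == "__main__":
    # Stand-in for the object returned by get_llm_config(); values are illustrative.
    config = SimpleNamespace(
        llm_rate_limit_enabled=True,
        llm_rate_limit_requests=2,
        llm_rate_limit_interval=1.0,
    )
    limiter = AsyncRateLimiter(config)

    async def fake_llm_call(prompt: str) -> str:
        return f"response to {prompt!r}"

    async def main() -> None:
        for i in range(3):  # the third call waits about 1 second
            print(await limiter.call(fake_llm_call, f"prompt {i}"))

    asyncio.run(main())
```

Holding the lock while sleeping is a deliberate simplification: waiting callers are served one at a time in arrival order, which keeps the window bookkeeping simple at the cost of some concurrency. An analogous limiter for embedding calls would read the corresponding embedding settings, if your Cognee version exposes them.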
Next Steps
- Learn about Performance Optimization techniques
- Explore Configuration Options for customizing Cognee
- See how to implement Remote Models with their own rate limits