Rate Limiting Guide

Rate limiting is an essential feature in Cognee that helps you control how many LLM and embedding calls you make within a given time interval.

Understanding Rate Limiting in Cognee

Cognee implements rate limiting at two levels:

  • LLM endpoint rate limits: Control how many requests can be made to the LLM endpoint
  • Embedding endpoint limits: Manage the frequency and volume of embedding calls
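
To make the idea of "at most N requests per interval" concrete, here is a minimal sliding-window sketch. It is an illustration only, not Cognee's actual implementation; the class name `SlidingWindowLimiter`, the method `try_acquire`, and the injectable `clock` parameter are all hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative limiter: allow at most `max_requests` calls per `interval` seconds.

    Hypothetical helper for explanation; not part of Cognee's API.
    """

    def __init__(self, max_requests=60, interval=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.interval = interval
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()  # times of the calls that were allowed

    def try_acquire(self):
        """Return True and record the call if it fits in the current window."""
        now = self._clock()
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.interval:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False
```

A caller would check `limiter.try_acquire()` before each LLM or embedding request and wait or retry later when it returns `False`. Keeping one limiter instance per endpoint mirrors the two levels listed above.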

Configuring Rate Limits

You can configure rate limits through environment variables:

RATE_LIMIT_ENABLED = True
RATE_LIMIT_REQUESTS = 60
RATE_LIMIT_INTERVAL = 60 # in seconds (default is 60 requests per minute)

or via config:

config = get_llm_config()
self._enabled = config.llm_rate_limit_enabled
self._requests = config.llm_rate_limit_requests
self._interval = config.llm_rate_limit_interval
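
As a rough sketch of how the three environment variables shown earlier could be turned into typed values, the snippet below parses them with the standard library only. The function name `load_rate_limit_settings` and the returned dictionary keys are hypothetical, and the fallback of `False` for `RATE_LIMIT_ENABLED` is an assumption; only the 60 requests per 60 seconds default comes from this guide. Cognee's own config loading may work differently.

```python
import os


def load_rate_limit_settings(env=None):
    """Parse the rate-limit variables into typed values (hypothetical helper).

    `env` defaults to os.environ; pass a plain dict to test without touching
    the real environment.
    """
    env = os.environ if env is None else env
    # Assumption: rate limiting is treated as disabled when the variable is unset.
    raw_enabled = str(env.get("RATE_LIMIT_ENABLED", "False")).strip().lower()
    return {
        "enabled": raw_enabled in ("1", "true", "yes"),
        # Documented default: 60 requests per 60-second interval.
        "requests": int(env.get("RATE_LIMIT_REQUESTS", "60")),
        "interval": int(env.get("RATE_LIMIT_INTERVAL", "60")),
    }
```

For example, `load_rate_limit_settings({"RATE_LIMIT_ENABLED": "True", "RATE_LIMIT_REQUESTS": "120"})` yields an enabled limit of 120 requests per 60 seconds.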

Next Steps