
TokenLimiter

2025.1 February 14, 2025

This library lets you limit token usage per minute across multiple goroutines, with optional delay-until-fit behavior.

Problem to Solve

In applications that make heavy use of tokens, such as rate-limited API keys, it is crucial to cap token usage so those limits are not exceeded. The problem becomes more complex when multiple goroutines are involved, each potentially consuming tokens independently.

Solution

I created a solution that works across multiple goroutines and, when the limit is exceeded, either returns an error or delays until the usage fits within the limit.
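The concurrent-sharing aspect can be pictured with a minimal sketch. The `budget` type and `take` method below are illustrative assumptions, not the library's actual API: many goroutines draw on one shared per-minute budget, and a mutex keeps the accounting safe.

```go
package main

import (
	"fmt"
	"sync"
)

// budget is a hypothetical stand-in for a shared per-minute token budget.
type budget struct {
	mu        sync.Mutex
	remaining int
}

// take reports whether n tokens could be reserved from the shared budget.
func (b *budget) take(n int) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n > b.remaining {
		return false
	}
	b.remaining -= n
	return true
}

func main() {
	b := &budget{remaining: 100}
	var wg sync.WaitGroup
	var cnt sync.Mutex
	granted, denied := 0, 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ok := b.take(30) // each goroutine wants 30 tokens
			cnt.Lock()
			if ok {
				granted++
			} else {
				denied++
			}
			cnt.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(granted, denied) // only 3 of 10 fit: 3×30 ≤ 100
}
```

Whatever order the goroutines run in, exactly three requests of 30 tokens fit inside the budget of 100; the rest are denied rather than silently over-consuming.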

Typical Use Cases

  • Rate limiting API calls across concurrent processes
  • Controlling resource consumption in distributed systems
  • Managing token usage for third-party API integrations
  • Ensuring compliance with API rate limits in high-throughput applications

Used By

  • LinkResearchTools
  • A large AI brand

Version History

2025.1
February 14, 2025
Changes:
  • Review and update to latest libraries
  • Fixed compatibility issues with older systems
2024.1
January 10, 2024
Changes:
  • Review and update to latest libraries
2023.1
May 5, 2023
Initial solution provided

License Terms

Individual usage licenses to be agreed

Features #

  • Thread-safe token limiting across multiple goroutines
  • Option to delay until fit: when enabled, if the token usage limit is exceeded, the call will wait until enough tokens have passed for the usage to fit within the limit
  • Error when limit exceeded: when delay until fit is not enabled, the call will return an error if the token usage limit is exceeded
  • Track average usage: keep track of the average token usage over a series of points (e.g., last 100 usages)
  • Get last minute: a method that returns the most recent minute in which tokens were added
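A rough sketch of how such a limiter might work internally, assuming a sliding one-minute window of timestamped usages; the names and structure here are illustrative, not the library's actual implementation. In delay-until-fit mode the call retries until enough old usage has aged out of the window; otherwise it returns an error immediately.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// usage records how many tokens were consumed and when.
type usage struct {
	at time.Time
	n  int
}

// TokenLimiter sketch: limits tokens per minute across goroutines.
type TokenLimiter struct {
	mu            sync.Mutex
	limitPerMin   int
	delayUntilFit bool
	history       []usage // usages within the last minute
}

var ErrLimitExceeded = errors.New("token limit exceeded")

// prune drops history entries older than one minute and returns the
// tokens still counted against the window (caller holds the lock).
func (t *TokenLimiter) prune(now time.Time) int {
	cutoff := now.Add(-time.Minute)
	total := 0
	kept := t.history[:0]
	for _, u := range t.history {
		if u.at.After(cutoff) {
			kept = append(kept, u)
			total += u.n
		}
	}
	t.history = kept
	return total
}

// Add consumes n tokens, either waiting or erroring when over the limit.
func (t *TokenLimiter) Add(n int) error {
	for {
		t.mu.Lock()
		now := time.Now()
		used := t.prune(now)
		if used+n <= t.limitPerMin {
			t.history = append(t.history, usage{at: now, n: n})
			t.mu.Unlock()
			return nil
		}
		t.mu.Unlock()
		if !t.delayUntilFit {
			return ErrLimitExceeded
		}
		time.Sleep(100 * time.Millisecond) // wait for old usages to expire
	}
}

func main() {
	lim := &TokenLimiter{limitPerMin: 100}
	fmt.Println(lim.Add(80)) // fits within the minute's budget
	fmt.Println(lim.Add(80)) // would exceed 100, returns an error
}
```

The same `history` slice also supports the average-usage and get-last-minute features: both are straightforward reads over the recorded `usage` entries under the lock.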

Limitations #

This TokenLimiter works only on a single machine. If run on multiple nodes of a cluster, the token limit could be exceeded. A workaround is to give each node a portion of the tokens-per-minute (TPM) quota, which works if every node has similar utilization. However, if a node handles fewer requests or is not operational, its share of the quota goes unused, degrading overall cluster throughput. The proper solution would be a cluster of TokenLimiters that distributes the token limit across machines dynamically.
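The static-partition workaround amounts to dividing the cluster-wide TPM by the node count; a minimal sketch under that assumption, with any remainder handed to the first nodes so the shares still sum to the cluster quota:

```go
package main

import "fmt"

// perNodeQuota statically splits a cluster-wide tokens-per-minute quota
// across nodes, distributing the remainder to the first few nodes.
func perNodeQuota(clusterTPM, nodes int) []int {
	q := make([]int, nodes)
	base, rem := clusterTPM/nodes, clusterTPM%nodes
	for i := range q {
		q[i] = base
		if i < rem {
			q[i]++ // first `rem` nodes absorb the leftover tokens
		}
	}
	return q
}

func main() {
	fmt.Println(perNodeQuota(100000, 3)) // [33334 33333 33333]
}
```

This illustrates the limitation, too: each node's share is fixed, so an idle node's 33k TPM cannot be reassigned to a busy one.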


How is this different from Rate Limiters #

While rate limiters and my TokenLimiter both constrain resource usage, there are key differences. Rate limiters typically cap the number of requests that can be made in a given time period and often handle concurrency in isolation. My TokenLimiter is designed for token-based systems (e.g., API tokens) where each action may consume a different number of tokens, and it provides thread-safe management across concurrent operations.
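The distinction can be made concrete with a toy comparison, written here as two hypothetical helper functions: a request-based limiter counts every call as 1, while a token-based limiter charges each call its own cost.

```go
package main

import "fmt"

// requestsAllowed models a plain rate limiter: every call costs exactly 1.
func requestsAllowed(limit, calls int) int {
	if calls > limit {
		return limit
	}
	return calls
}

// tokensAllowed models a token-based limiter: each call has its own
// token cost and admission stops once the budget would be exceeded.
func tokensAllowed(budget int, costs []int) int {
	admitted := 0
	for _, c := range costs {
		if c > budget {
			break
		}
		budget -= c
		admitted++
	}
	return admitted
}

func main() {
	costs := []int{500, 2000, 1500, 3000} // e.g. tokens per API call
	fmt.Println(requestsAllowed(10, len(costs))) // 4: all pass a request limiter
	fmt.Println(tokensAllowed(3500, costs))      // 2: only 500+2000 fit in 3500 tokens
}
```

Four calls sail through a 10-requests-per-minute limiter, yet only the first two fit a 3,500-token budget; that weight-aware accounting is the gap a token limiter fills.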



Christoph C. Cemper
Author
Waking up every day, thinking about Links, SEO and AI.