If you've interacted with an API you've probably seen something like the following:

HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600

Many APIs, such as Stripe's, return the 429 HTTP status code when a rate limit is exceeded. These errors can have an ill effect on your application if you are not prepared to handle them.

During my tenure as an Integration Engineer at Stripe, I worked with large users to help them implement their Stripe integrations. One fast-growing startup had built its entire user session logic around fetching a Stripe Customer, along with other objects like Subscriptions and payment history. On a massive launch day, the company hit the standard API rate limit of 100 requests per second very quickly. Due to the nature of the business we could allow a temporary increase in the rate limit, but even that could not save a poor integration: they quickly hit the new limit and had to ride it out. This could have been prevented with a better-designed integration, but also with better rate limit handling and retry strategies.

Elements of a rate limit error solution

In general, there are three things you want to do when a rate limit error is detected: 1) properly detect the error, 2) wait for a certain amount of time, and 3) repeat the request.

  1. Properly detect the error - Generally an API will use the 429 HTTP status code to indicate that a rate limit has been exceeded, though some APIs signal this through response headers instead. Be sure to read the API docs to see how the particular API handles this situation.
  2. Wait for a certain amount of time - The entire point of a rate limit error is that you are sending requests too fast. The goal of this step is to slow down your overall throughput or requests per second.
  3. Repeat the request - After you wait, repeat the request. It's important to make sure that you keep all the context required and that there are no side effects of repeating the function or lines of code.
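Step 2 deserves a bit of care: if the response includes a Retry-After header (as in the example at the top of this article), you can use it instead of guessing a wait time. Here is a minimal sketch of a helper that does this; `wait_time_from_headers` and `DEFAULT_WAIT` are hypothetical names, and the HTTP-date branch assumes the header follows the standard Retry-After format (either an integer number of seconds or an HTTP-date).

```ruby
require "time"

DEFAULT_WAIT = 1.0

# Pick a wait time from a 429 response's headers. Retry-After may be an
# integer number of seconds or an HTTP-date; fall back to a default when
# the header is absent or unparseable.
def wait_time_from_headers(headers, now: Time.now)
  value = headers["Retry-After"] || headers["retry-after"]
  return DEFAULT_WAIT if value.nil?

  if value.match?(/\A\d+\z/)
    value.to_i
  else
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    [Time.httpdate(value) - now, 0].max
  end
rescue ArgumentError
  DEFAULT_WAIT
end
```

The `[... , 0].max` guard matters for the date form: if the date is already in the past, you want to retry immediately rather than sleep for a negative duration.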

If you get a successful response after step 3, you are good to go. Depending on your retry logic you may want to do any cleanup required, but generally your code can proceed as normal.

If you still do not get a successful response after retrying a few times, you may want to reschedule the request if it's part of a batch process. If it was driven by a customer action, show an appropriate error message and ask the customer to retry. Or better yet, turn the customer action into a background task for a better user experience.
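That fallback can be sketched as a small wrapper: retry a bounded number of times, then hand the work off to a queue instead of failing the request. This is an illustrative shape, not a real library; `retry_then_enqueue` and `RateLimitError` are hypothetical names, and `enqueue` stands in for whatever your job system provides (e.g. an ActiveJob `perform_later` call). A real version would also sleep between attempts.

```ruby
class RateLimitError < StandardError; end

# Try the block a few times; on repeated rate limit errors, enqueue the
# work for a background worker instead of surfacing the failure.
def retry_then_enqueue(max_attempts: 3, enqueue:)
  attempts = 0
  begin
    yield
  rescue RateLimitError
    attempts += 1
    retry if attempts < max_attempts
    enqueue.call # give up on the foreground path; run it later
    :enqueued
  end
end
```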

Different solutions and examples

Simple try/catch/retry

require "faraday"

class RateLimitException < StandardError; end

conn = Faraday.new(
  url: 'http://127.0.0.1:3000',
  headers: {'Content-Type' => 'application/json'}
)

# Get a new API Key from our service
api_key_response = conn.post("/api_keys.json")
api_key = JSON.parse(api_key_response.body)

conn.params = { api_key: api_key["secret"] }

pictures = []
# Create 50 pictures
50.times do |i|
  begin
    res = conn.post('/pictures.json') do |req|
      req.params = req.params.merge({
        picture: {
          title: "Picture #{i}",
          alt: "Hey picture this",
          url: "https://place-hold.it/300x500.jpg/666/fff/000"
        }
      })
    end
    # if we get a 429 rate limit error, raise the exception
    if res.status == 429
      raise RateLimitException.new
    end
    pictures.push JSON.parse(res.body)
  rescue RateLimitException => e
    puts e, i
    # Sleep for a second, then retry the block starting at `begin`
    sleep 1
    retry
  end
end

pictures.each do |picture|
  begin
    res = conn.get("/pictures/#{picture['id']}.json")
    raise RateLimitException if res.status == 429
  rescue RateLimitException => e
    puts e, picture
    sleep 1
    retry
  end
end

Simple try/catch/retry using HTTP client middleware

require "faraday"

class RateLimitException < StandardError; end

module Faraday
  module RateLimitMiddleware
    class Middleware < Faraday::Middleware
      def on_complete(env)
        if env.status == 429
          raise RateLimitException
        end
      end
    end
  end
end

@conn = Faraday.new(
  url: 'http://127.0.0.1:3000',
  headers: {'Content-Type' => 'application/json'}
) do |f|
  f.use Faraday::RateLimitMiddleware::Middleware
end

# Get a new API Key from our service
api_key_response = @conn.post("/api_keys.json")
api_key = JSON.parse(api_key_response.body)

@conn.params = { api_key: api_key["secret"] }

def with_rate_limit_retry
  begin
    yield
  rescue RateLimitException => e
    num = 0.2
    puts "Sleeping for #{num}"
    sleep num
    retry
  end
end

@pictures = []
# Create 50 pictures

def create_picture i
  puts "creating picture #{i}"
  res = @conn.post('/pictures.json') do |req|
    req.params = req.params.merge({
      picture: {
        title: "Picture #{i}",
        alt: "Hey picture this",
        url: "https://place-hold.it/300x500.jpg/666/fff/000"
      }
    })
  end
  @pictures.push JSON.parse(res.body)
  res.status
end

50.times.map do |i|
  with_rate_limit_retry do
    create_picture i + 1
  end
end

@pictures.each do |picture|
  with_rate_limit_retry do
    puts "Getting #{picture['title']}"
    @conn.get("/pictures/#{picture['id']}.json")
  end
end

Exponential backoff try/catch/retry with HTTP client middleware

require "faraday"

class RateLimitException < StandardError; end

module Faraday
  module RateLimitMiddleware
    class Middleware < Faraday::Middleware
      def on_complete(env)
        if env.status == 429
          raise RateLimitException
        end
      end
    end
  end
end

@conn = Faraday.new(
  url: 'http://127.0.0.1:3000',
  headers: {'Content-Type' => 'application/json'}
) do |f|
  f.use Faraday::RateLimitMiddleware::Middleware
end

# Get a new API Key from our service
api_key_response = @conn.post("/api_keys.json")
api_key = JSON.parse(api_key_response.body)

@conn.params = { api_key: api_key["secret"] }

def with_rate_limit_retry(give_up_count = -1)
  sleep_num = 0.2
  count = 0
  begin
    yield
  rescue RateLimitException => e
    count += 1
    puts "Sleeping for #{sleep_num}"
    sleep sleep_num
    sleep_num *= 2
    retry unless give_up_count == count
    puts "Giving up after #{count} attempts"
  end
end

@pictures = []
# Create 50 pictures

def create_picture i
  puts "creating picture #{i}"
  res = @conn.post('/pictures.json') do |req|
    req.params = req.params.merge({
      picture: {
        title: "Picture #{i}",
        alt: "Hey picture this",
        url: "https://place-hold.it/300x500.jpg/666/fff/000"
      }
    })
  end
  if res.status.between?(200, 299)
    @pictures.push JSON.parse(res.body)
  end
  res.status
end

50.times.map do |i|
  with_rate_limit_retry(2) do
    create_picture i + 1
  end
end

@pictures.each do |picture|
  with_rate_limit_retry do
    puts "Getting #{picture['title']}"
    @conn.get("/pictures/#{picture['id']}.json")
  end
end

Best Practices

The reason you are calling the API can give you a hint as to what to do when you hit a rate limit error: retry or reschedule the API call depending on the urgency of the task at hand. For example, an on-session checkout payment attempt with the Stripe PaymentIntents API might be retry-able once, but most likely you'll want to return an error to the customer and ask them to try again.

Understand the API service's rate limit model. Many times you can simply model this locally and know when you are coming up against the limits. This approach works well for batch processes.
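Modeling the limit locally can be as simple as pacing your own requests so you never exceed the published budget. Here is a minimal sketch of that idea, assuming a fixed requests-per-second limit; `LocalThrottle` is a hypothetical name, not a real library class.

```ruby
# Before each request, sleep just long enough to stay under the limit,
# so the remote API never has to return a 429 in the first place.
class LocalThrottle
  def initialize(requests_per_second)
    @interval = 1.0 / requests_per_second
    @next_allowed = Time.now
  end

  # Blocks until the next request is allowed to go out.
  def wait
    now = Time.now
    sleep(@next_allowed - now) if now < @next_allowed
    @next_allowed = [now, @next_allowed].max + @interval
  end
end
```

In a batch process you would call `throttle.wait` before each API request. Note this only paces a single process; with many workers you'd need to divide the budget among them or use a shared store.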

For web apps with many workers and web instances, using an exponential backoff strategy, ideally with some randomness (jitter), will help prevent a "thundering herd" of requests.
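The problem with plain exponential backoff is that every worker that got rate-limited at the same moment also retries at the same moment: 0.2s later, then 0.4s, then 0.8s, in lockstep. Adding jitter spreads those retries out. A minimal sketch of the "full jitter" variant, where each worker sleeps a random amount up to the exponential ceiling (`backoff_with_jitter` is a hypothetical helper name, and the base and cap values are illustrative):

```ruby
# Full jitter: sleep a uniformly random duration between 0 and the
# exponential ceiling, capped so the wait never grows without bound.
def backoff_with_jitter(attempt, base: 0.2, cap: 30.0)
  ceiling = [base * (2**attempt), cap].min
  rand * ceiling
end
```

You would call `sleep backoff_with_jitter(count)` in place of the fixed `sleep sleep_num` in the exponential backoff example above.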

FAQ

Can I raise the API rate limit?

Yes, sometimes it's possible to ask the API provider to grant you an exception. Expect questions such as "Why can't you throttle your operations?" or "Is your integration making unnecessary API calls?" But it is not out of the question for the limit to be raised.

Can I pay for a higher API rate limit?

There could be a paid tier that gets you more API access. Individually, API calls cost the business very little, but they can add up if they incur storage or compute costs. So upgrading to another tier of a SaaS/API might be the answer.

What's a better strategy? Exponential backoff or tracking the local API rate?

I believe that exponential backoff is a simpler solution to implement and likely a better one. This research paper goes into extreme detail regarding the dynamic nature of an exponential backoff algorithm and its scalability as the number of nodes (workers, web instances) increases. It's very low-level but provides mathematical proofs for those interested in a deeper look at this algorithm. The paper can also be helpful if you want to define an upper bound for how high the exponential backoff timeout is allowed to grow.