Rate Limits

To ensure efficient use of model resources and keep the service available and stable, global QPS and timeout limits are applied.

Limit Details

QPS: A maximum of 2 requests per second. Requests beyond this limit return the error “429 - Rate limit reached for requests”.
Timeout: Synchronous calls time out after 90 seconds, so streaming calls are recommended.
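One way to stay within the 2-requests-per-second limit is to throttle on the client before each call. The sketch below spaces request start times at least 1 / max_per_second seconds apart (0.5 s at 2 QPS). The `Throttle` class is hypothetical and not part of any official SDK; it also assumes the limit counts request starts, since the service's exact accounting is not described here.

```python
import threading
import time


class Throttle:
    """Hypothetical client-side throttle: spaces call starts >= 1/max_per_second apart.

    max_per_second=2 mirrors the documented QPS limit; this is a sketch, not an SDK class.
    """

    def __init__(self, max_per_second: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second  # 0.5 s between calls at 2 QPS
        self._clock = clock  # injectable so the sketch can be tested without real waits
        self._sleep = sleep
        self._next_allowed = float("-inf")  # earliest time the next call may start
        self._lock = threading.Lock()

    def wait(self) -> float:
        """Block until the next call is allowed; return how long we slept."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve this caller's slot, then push the next slot one interval later.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can queue up
        return delay
```

Call `wait()` on one shared instance immediately before each request. Reserving slots under a lock and sleeping outside it keeps concurrent threads from starting requests in the same half-second slot.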

FAQ

1. What is QPS?

QPS (Queries Per Second) measures how many requests a server accepts and processes each second.
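To make the definition concrete, the sketch below counts how many requests fall inside the most recent one-second window. With the documented limit of 2, a third request within any such window would be over the limit. The `QpsMeter` class and its sliding-window rule are illustrative assumptions; the service's own counting method (sliding or fixed windows, for example) is not specified here.

```python
from collections import deque


class QpsMeter:
    """Hypothetical meter: counts requests in the window (t - window, t]."""

    def __init__(self, window: float = 1.0):
        self.window = window
        self._times = deque()  # timestamps of recent requests, oldest first

    def record(self, timestamp: float) -> int:
        """Record a request at `timestamp`; return the request count in the current window."""
        self._times.append(timestamp)
        # Discard requests that are at least `window` seconds older than this one.
        while self._times and self._times[0] <= timestamp - self.window:
            self._times.popleft()
        return len(self._times)


meter = QpsMeter()
counts = [meter.record(t) for t in (0.0, 0.3, 0.9, 1.0)]
print(counts)  # [1, 2, 3, 3]: the requests at 0.9 s and 1.0 s each see 3 in their window
```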

2. Why limit QPS?

Limiting QPS is common practice for APIs, for several reasons:

Preventing abuse and misuse. Rate limits stop users from deliberately flooding the API with requests in an attempt to overload it or disrupt the service.
Ensuring fair access. Capping how many requests a single user or organization can make keeps any one party from consuming a disproportionate share of API resources, so as many people as possible can use the API without slowdowns.

3. What happens if requests exceed the limit?

When a rate limit is triggered, the API returns an error with status code 429, meaning you have sent too many requests, or too many characters, within a short period. The API then refuses further requests until the specified time has elapsed.
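A common client-side response to a 429 is to wait and retry with exponential backoff. The sketch below assumes your client surfaces a 429 as an exception; `RateLimitError`, its `retry_after` field, and `call_with_backoff` are all hypothetical names. This page mentions a "specified time" but not how it is communicated, so the sketch honors a wait value only if one is supplied and otherwise falls back to backoff (0.5 s, 1 s, 2 s, ...) with a little random jitter.

```python
import random
import time
from typing import Callable, Optional


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("429 - Rate limit reached for requests")
        self.retry_after = retry_after  # seconds to wait, if the response supplied one


def call_with_backoff(
    send: Callable[[], object],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call `send()`, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries: let the caller handle the 429
            if err.retry_after is not None:
                delay = err.retry_after  # honor a server-specified wait when available
            else:
                # 0.5 s, 1 s, 2 s, ... capped at max_delay, plus up to 10% random jitter.
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
```

Jitter spreads out retries from many clients that hit the limit at the same moment, so they do not all retry in lockstep and trigger the limit again.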
