Limitations
The Arena API is currently in research preview. Expect rough edges, and thank you for helping us improve it.
Research preview
This is an early release intended for research and evaluation purposes. The gateway is under active development and behaviour may change without notice. We do not currently offer uptime SLAs or production support commitments.
Auto mode model quality
The auto routing mode is powered by an early version of Arena's max routing model. It selects the best available model based on Arena leaderboard rankings, but routing quality is not expected to be uniform across all tasks and domains. It will improve over time as the underlying model is updated.
Multimodal support
The gateway primarily supports chat-style text generation today, with limited GPT audio support behind rollout gating. Newly created audio-enabled keys can access text-to-speech, GPT speech-to-text uploads, and GPT audio chat models. General file inputs, video, and arbitrary multimodal request bodies are still not broadly supported. Unsupported content is rejected with a 400 error.
Rate limits
Each account is assigned a rate-limit tier during the research preview period. Limits are shared across all of your API keys.
| Limit | Value |
|---|---|
| Default tier | Tier 1 |
| Tier 1 limits | 1,200 requests per minute (RPM) / 1,200,000 tokens per minute (TPM) / 10 requests per second (RPS) / 200,000 tokens per second (TPS) |
When a rate limit is exceeded, the gateway returns a 429 Too Many Requests response. See Rate Limits for details.
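One common way to handle a 429 on the client side is to retry with exponential backoff. The sketch below is illustrative only: `call_with_backoff` and its parameters are hypothetical names, and `send` stands in for whatever function in your application performs the request and returns a status code and body. The retry counts and delays are assumptions, not values recommended by the gateway.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns 429.

    `send` is a hypothetical callable that performs one request and
    returns (status_code, body). Any non-429 response is returned as-is.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        status, body = send()
        # Stop on success, on any non-rate-limit error, or when out of retries.
        if status != 429 or attempt >= max_retries:
            return status, body
        # Double the delay on each attempt, capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Jitter keeps several clients on one account from retrying in lockstep.
        sleep(delay * random.uniform(0.5, 1.0))
        attempt += 1
```

Because limits are shared across all of your API keys, the jitter matters when several processes use the same account: without it, they would all retry at the same moment and hit the limit again.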
Context window limits
Context window limits vary by the underlying model selected for each request. When using routing modes such as auto or fast, the selected model may have a smaller context window than expected. If a request exceeds the selected model's context window, the gateway returns a 400 error.
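A client can guard against this by checking the prompt size before choosing between a routing mode and a direct model. The following is a minimal sketch under several assumptions: `choose_model`, `estimate_tokens`, and the `context_windows` table are hypothetical, the window sizes must come from your own knowledge of the catalog, and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from typing import Mapping

ROUTING_MODES = {"auto", "fast"}  # routing modes named in this section


def estimate_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token (heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def choose_model(
    prompt: str,
    preferred: str,
    fallback: str,
    context_windows: Mapping[str, int],
    reserve: int = 1024,
) -> str:
    """Return `preferred` if the prompt should fit, else `fallback`.

    `context_windows` maps direct model names to window sizes in tokens
    (caller-supplied, hypothetical values). `reserve` leaves room for
    the model's reply.
    """
    needed = estimate_tokens(prompt) + reserve
    if preferred in ROUTING_MODES:
        # A routing mode may select any model, so plan for the smallest window.
        limit = min(context_windows.values())
    else:
        limit = context_windows[preferred]
    if needed <= limit:
        return preferred
    if needed <= context_windows[fallback]:
        return fallback
    raise ValueError(f"prompt needs ~{needed} tokens; no configured model fits")
```

Planning for the smallest window is deliberately conservative: it sends some requests to the direct model that a routing mode could have served, in exchange for avoiding a 400 on long prompts.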
No function calling guarantee
While many models in the catalog support tool/function calling, the auto routing mode may select a model that does not support it. If your application requires function calling, use a direct model name rather than a routing mode.
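The rule above can be enforced when the request is built, so that a tool-using request never goes out under a routing mode. This is a sketch, not the gateway's request schema: `build_request` is a hypothetical helper, and the `model`, `messages`, and `tools` fields assume a chat-completions-style JSON body, which this section does not confirm.

```python
from typing import Any, Dict, List, Optional

ROUTING_MODES = {"auto", "fast"}  # routing modes named in this section


def build_request(
    model: str,
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None,
    direct_model: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body, swapping a routing mode for a direct model when tools are used.

    `direct_model` is a caller-chosen model name known to support
    function calling (hypothetical; check the catalog for real names).
    """
    if tools and model in ROUTING_MODES:
        if direct_model is None:
            raise ValueError(
                f"routing mode {model!r} may select a model without tool support; "
                "pass direct_model"
            )
        model = direct_model
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        # Assumed field name; adjust to the gateway's actual schema.
        body["tools"] = tools
    return body
```

Requests without tools pass through unchanged, so the routing modes remain available for plain text generation.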