Request multipliers
Every model has a request multiplier: a number that determines how many of your per-minute requests each API call consumes. Most models have a 1x multiplier (one API call = one request). Larger or more expensive models have a higher multiplier to reflect their compute cost.
How they work
Your plan gives you a set number of requests per minute (RPM):
| Plan | RPM |
|---|---|
| Free | 2 |
| Basic | 10 |
| Premium | 30 |
| Scale | 80 |
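Effective capacity is your plan's RPM divided by the model's multiplier. A minimal sketch of that arithmetic (the plan figures come from the table above; rounding down is an assumption based on the ~6 and ~53 figures in the examples below):

```python
import math

# Plan RPM limits, from the table above.
PLAN_RPM = {"free": 2, "basic": 10, "premium": 30, "scale": 80}

def effective_calls_per_minute(plan: str, multiplier: float) -> int:
    """API calls per minute a plan supports for a model with this multiplier.

    Flooring is an assumption, matching the ~6 (10 / 1.5) and
    ~53 (80 / 1.5) figures in the example tables below.
    """
    return math.floor(PLAN_RPM[plan] / multiplier)

print(effective_calls_per_minute("basic", 1.5))    # 6
print(effective_calls_per_minute("premium", 2.0))  # 15
```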
Examples
Basic plan (10 RPM)
| Model | Multiplier | Effective calls/min |
|---|---|---|
| ShuttleAI Auto | 1x | 10 |
| GPT-5.2 | 1.5x | ~6 |
| Claude Haiku 4.5 | 1x | 10 |
Premium plan (30 RPM)
| Model | Multiplier | Effective calls/min |
|---|---|---|
| ShuttleAI Auto | 1x | 30 |
| GPT-5.2 | 1.5x | 20 |
| Claude Opus 4.6 | 2x | 15 |
| Claude Sonnet 4.6 | 2x | 15 |
| Claude Haiku 4.5 | 1x | 30 |
Scale plan (80 RPM)
| Model | Multiplier | Effective calls/min |
|---|---|---|
| ShuttleAI Auto | 1x | 80 |
| GPT-5.2 | 1.5x | ~53 |
| Claude Opus 4.6 | 2x | 40 |
| Claude Sonnet 4.6 | 2x | 40 |
| Claude Haiku 4.5 | 1x | 80 |
Checking multipliers
You can check any model's multiplier using the verbose models endpoint; look for the request_multiplier field:
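A minimal sketch of reading it, assuming the verbose listing is served at GET /v1/models/verbose under the base URL below and returns a JSON payload with a data array. The URL and payload shape are assumptions (check the API reference); the request_multiplier field name comes from this page:

```python
import os
import requests

BASE_URL = "https://api.shuttleai.com/v1"  # assumption; see the API reference

resp = requests.get(
    f"{BASE_URL}/models/verbose",
    headers={"Authorization": f"Bearer {os.environ['SHUTTLEAI_API_KEY']}"},
    timeout=10,
)
resp.raise_for_status()

# Print each model id alongside its multiplier (defaulting to 1x if absent).
for model in resp.json()["data"]:
    print(model["id"], model.get("request_multiplier", 1))
```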
Tips for managing multipliers
Use ShuttleAI Auto for best cost efficiency
ShuttleAI Auto has a 1x multiplier and intelligently routes to the best model for each task. It’s the most cost-efficient way to use the API.
Mix high and low multiplier models
Use lighter models (1x) for simple tasks and save your budget for frontier model calls (2x) when you need maximum quality.
Monitor your usage
The Dashboard shows your request usage in real time, including breakdowns by model. Use this to understand how multipliers affect your usage patterns.
Upgrade for more headroom
If you’re consistently hitting rate limits, consider upgrading your plan. The jump from Basic (10 RPM) to Premium (30 RPM) gives you 3x the capacity.
Rate limit errors
If you exceed your RPM (accounting for multipliers), the API returns a 429 Too Many Requests error.
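A common way to handle this is to back off and retry. A minimal sketch, assuming an OpenAI-compatible chat completions route under the base URL below and honoring a Retry-After header when present (both assumptions):

```python
import os
import time
import requests

BASE_URL = "https://api.shuttleai.com/v1"  # assumption; see the API reference

def post_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat completion, retrying on 429 with exponential backoff."""
    for attempt in range(max_retries):
        resp = requests.post(
            f"{BASE_URL}/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['SHUTTLEAI_API_KEY']}"},
            json=payload,
            timeout=30,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Honor Retry-After if the server sends it; otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Still rate limited after retries")
```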
Context window limits & scaling
Your plan determines the maximum number of tokens you can send in a single request. This is a hard limit; requests that exceed it are rejected.
| Plan | Max context |
|---|---|
| Free | 8K tokens |
| Basic | 16K tokens |
| Premium | 36K tokens |
| Scale | 128K tokens |
Context scaling (paid plans only)
For paid plans (Basic, Premium, Scale), a context scaling multiplier is applied on top of the model's request multiplier. It is calculated per 16K-token block:
| Prompt size | Scale factor | With 1x model | With 2x model |
|---|---|---|---|
| 1–16K tokens | 1x | 1 request | 2 requests |
| 16K–32K tokens | 2x | 2 requests | 4 requests |
| 32K–48K tokens | 3x | 3 requests | 6 requests |
| 48K–64K tokens | 4x | 4 requests | 8 requests |
The Free plan does not have context scaling — you pay only the model’s base request multiplier, but you’re limited to 8K tokens max.
Examples
- A 10K prompt on the Basic plan with a 1x model = 1 request (10K is within the first 16K block)
- A 20K prompt on the Premium plan with a 1x model = 2 requests (ceil(20K / 16K) = 2)
- A 100K prompt on the Scale plan with a 2x model = 14 requests (ceil(100K / 16K) = 7, × 2x model = 14)
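A minimal sketch of that arithmetic. Whether a "16K block" means 16,000 or 16,384 tokens isn't pinned down above; 16,000 is assumed here, and either base reproduces all three examples:

```python
import math

BLOCK_SIZE = 16_000  # assumption: "16K" read as a round 16,000 tokens

def request_cost(prompt_tokens: int, model_multiplier: float,
                 plan: str = "basic") -> float:
    """Requests one call consumes: context scale factor x model multiplier.

    The Free plan skips context scaling (but caps prompts at 8K tokens).
    """
    if plan == "free":
        return model_multiplier
    scale = math.ceil(prompt_tokens / BLOCK_SIZE)
    return scale * model_multiplier

print(request_cost(10_000, 1.0))   # 1.0  (fits in the first 16K block)
print(request_cost(20_000, 1.0))   # 2.0  (ceil(20K / 16K) = 2)
print(request_cost(100_000, 2.0))  # 14.0 (ceil(100K / 16K) = 7, x2)
```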
Why context scaling exists
Processing more tokens requires more compute. Context scaling ensures that heavy token usage is priced fairly while keeping costs low for everyday requests.
Tips for managing context costs
Keep prompts concise
Only include the context the model actually needs. Trim unnecessary history, instructions, or documents.
Use the right plan for your workload
If you regularly need large context windows, upgrade to a plan with a higher limit. The Scale plan (128K) gives you the most room.
Summarize long conversations
Instead of sending the full chat history every time, periodically summarize older messages to reduce token count.
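One way to do this, sketched under the assumption of an OpenAI-compatible chat completions endpoint (the base URL and model id below are placeholders, not confirmed values): once the history grows past a threshold, collapse the older messages into a single summary message.

```python
# Sketch: collapse older chat history into a summary message once it grows.
# The base URL and model id are placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(base_url="https://api.shuttleai.com/v1",
                api_key="YOUR_API_KEY")

def compact_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Summarize everything except the last few messages into one message."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = client.chat.completions.create(
        model="shuttleai-auto",  # placeholder model id
        messages=[{
            "role": "user",
            "content": "Summarize this conversation so far, keeping key "
                       "facts and decisions:\n\n"
                       + "\n".join(f"{m['role']}: {m['content']}" for m in older),
        }],
    ).choices[0].message.content
    return [{"role": "system",
             "content": f"Summary of earlier turns: {summary}"}] + recent
```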
Monitor usage on the dashboard
The Dashboard shows your request usage. Watch for requests that consume more than expected — it may be context scaling.