> ## Documentation Index > Fetch the complete documentation index at: https://docs.shuttleai.com/llms.txt > Use this file to discover all available pages before exploring further. # Multipliers > How request and context multipliers work and what they mean for your usage. ShuttleAI uses **multipliers** to fairly price access to different models and context sizes. There are two types: **request multipliers** and **context multipliers**. *** ## Request multipliers Every model has a **request multiplier** — a number that determines how many of your per-minute requests each API call consumes. Most models have a **1x multiplier** (one API call = one request). Larger or more expensive models have a higher multiplier to reflect their compute cost. ## How they work Your plan gives you a set number of **requests per minute (RPM)**: | Plan | RPM | | ------- | --- | | Free | 2 | | Basic | 10 | | Premium | 30 | | Scale | 80 | When you call a model, the multiplier is deducted from your available RPM: You send a request to a model with a 2x multiplier. The system deducts **2 requests** from your per-minute quota instead of 1. If you're on the Premium plan (30 RPM), you now have 28 RPM remaining for that minute. ## Examples ### Basic plan (10 RPM) | Model | Multiplier | Effective calls/min | | ---------------- | ---------- | ------------------- | | ShuttleAI Auto | 1x | 10 | | GPT-5.2 | 1.5x | \~6 | | Claude Haiku 4.5 | 1x | 10 | ### Premium plan (30 RPM) | Model | Multiplier | Effective calls/min | | ----------------- | ---------- | ------------------- | | ShuttleAI Auto | 1x | 30 | | GPT-5.2 | 1.5x | 20 | | Claude Opus 4.6 | 2x | 15 | | Claude Sonnet 4.6 | 2x | 15 | | Claude Haiku 4.5 | 1x | 30 | ### Scale plan (80 RPM) | Model | Multiplier | Effective calls/min | | ----------------- | ---------- | ------------------- | | ShuttleAI Auto | 1x | 80 | | GPT-5.2 | 1.5x | \~53 | | Claude Opus 4.6 | 2x | 40 | | Claude Sonnet 4.6 | 2x | 40 | | Claude Haiku 4.5 | 1x | 80 | ## Checking multipliers You can check any model's multiplier using the verbose models endpoint: ```bash theme={null} curl https://api.shuttleai.com/v1/models/verbose \ -H "Authorization: Bearer shuttle-xxx" ``` Each model includes a `request_multiplier` field: ```json theme={null} { "id": "claude-opus-4.6", "plan": "premium", "request_multiplier": 2.0 } ``` You can also see multipliers displayed on each model card at [shuttleai.com/models](https://shuttleai.com/models). ## Tips for managing multipliers ShuttleAI Auto has a 1x multiplier and intelligently routes to the best model for each task. It's the most cost-efficient way to use the API. Use lighter models (1x) for simple tasks and save your budget for frontier model calls (2x) when you need maximum quality. The [Dashboard](/dashboard/overview) shows your request usage in real time, including breakdowns by model. Use this to understand how multipliers affect your usage patterns. If you're consistently hitting rate limits, consider upgrading your plan. The jump from Basic (10 RPM) to Premium (30 RPM) gives you 3x the capacity. ## Rate limit errors If you exceed your RPM (accounting for multipliers), the API returns a **429 Too Many Requests** error: ```json theme={null} { "error": { "message": "Rate limit exceeded. Please try again later.", "type": "rate_limit_error", "code": 429 } } ``` Wait until the next minute window to retry, or upgrade your plan for higher limits. *** ## Context window limits & scaling Your plan determines the **maximum number of tokens** you can send in a single request. This is a hard limit — requests that exceed it are rejected. | Plan | Max context | | ------- | ----------- | | Free | 8K tokens | | Basic | 16K tokens | | Premium | 36K tokens | | Scale | 128K tokens | If your request exceeds your plan's context limit, the API will return an error. You cannot go over your plan's max — upgrade your plan for a higher limit. ``` ### Context scaling (paid plans only) For paid plans (Basic, Premium, Scale), a **context scaling multiplier** is applied on top of the model's request multiplier. This scales smoothly based on prompt size using **0.1x per 1K tokens**: ``` scaleFactor = promptTokens / 10000 totalCost = model\_request\_multiplier × scaleFactor ``` This means larger prompts consume more of your per-minute quota gradually rather than in large blocks: | Prompt size | Scale factor | With 1x model | With 2x model | |-------------|-------------|----------------|----------------| | 1K tokens | 0.1x | 0.1 request | 0.2 requests | | 5K tokens | 0.5x | 0.5 request | 1 request | | 10K tokens | 1x | 1 request | 2 requests | | 20K tokens | 2x | 2 requests | 4 requests | | 50K tokens | 5x | 5 requests | 10 requests | The Free plan does **not** have context scaling — you pay only the model's base request multiplier, but you're limited to 8K tokens max. ``` ### Examples * A 10K prompt on the Basic plan with a 1x model = **1 request** (10K is within the first 16K block) * A 20K prompt on the Premium plan with a 1x model = **2 requests** (ceil(20K / 16K) = 2) * A 100K prompt on the Scale plan with a 2x model = **14 requests** (ceil(100K / 16K) = 7, × 2x model = 14) ### Why context scaling exists Processing more tokens requires more compute. Context scaling ensures that heavy token usage is priced fairly while keeping costs low for everyday requests. ### Tips for managing context costs Only include the context the model actually needs. Trim unnecessary history, instructions, or documents. If you regularly need large context windows, upgrade to a plan with a higher limit. The Scale plan (128K) gives you the most room. Instead of sending the full chat history every time, periodically summarize older messages to reduce token count. The [Dashboard](/dashboard/overview) shows your request usage. Watch for requests that consume more than expected — it may be context scaling.