# Multipliers

> How request and context multipliers work and what they mean for your usage.

ShuttleAI uses **multipliers** to fairly price access to different models and context sizes. There are two types: **request multipliers** and **context multipliers**.

***

## Request multipliers

Every model has a **request multiplier** — a number that determines how many of your per-minute requests each API call consumes.

Most models have a **1x multiplier** (one API call = one request). Larger or more expensive models have a higher multiplier to reflect their compute cost.

## How they work

Your plan gives you a set number of **requests per minute (RPM)**:

| Plan    | RPM |
| ------- | --- |
| Free    | 2   |
| Basic   | 10  |
| Premium | 30  |
| Scale   | 80  |

When you call a model, its multiplier is deducted from your per-minute request quota:

<Steps>
  <Step title="You make an API call">
    You send a request to a model with a 2x multiplier.
  </Step>

  <Step title="Multiplier is applied">
    The system deducts **2 requests** from your per-minute quota instead of 1.
  </Step>

  <Step title="Remaining capacity">
    If you're on the Premium plan (30 RPM), you have 28 requests remaining for that minute.
  </Step>
</Steps>
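
The steps above can be sketched as a small client-side tracker. This is only an illustration of the arithmetic: `MinuteQuota` and `charge` are hypothetical names, not part of any ShuttleAI SDK, and the real accounting happens on the server.

```python
from dataclasses import dataclass


@dataclass
class MinuteQuota:
    """Client-side estimate of how much of one minute's quota is left."""

    rpm: float          # plan limit, e.g. 30 for Premium
    used: float = 0.0   # requests already consumed in this minute

    def charge(self, multiplier: float) -> float:
        """Deduct one API call at the given multiplier; return what remains."""
        if self.used + multiplier > self.rpm:
            raise RuntimeError("call would exceed this minute's quota")
        self.used += multiplier
        return self.rpm - self.used


quota = MinuteQuota(rpm=30)      # Premium plan
remaining = quota.charge(2.0)    # one call to a 2x model
print(remaining)                 # 28.0
```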

## Examples

### Basic plan (10 RPM)

| Model            | Multiplier | Effective calls/min |
| ---------------- | ---------- | ------------------- |
| ShuttleAI Auto   | 1x         | 10                  |
| GPT-5.2          | 1.5x       | \~6                 |
| Claude Haiku 4.5 | 1x         | 10                  |

### Premium plan (30 RPM)

| Model             | Multiplier | Effective calls/min |
| ----------------- | ---------- | ------------------- |
| ShuttleAI Auto    | 1x         | 30                  |
| GPT-5.2           | 1.5x       | 20                  |
| Claude Opus 4.6   | 2x         | 15                  |
| Claude Sonnet 4.6 | 2x         | 15                  |
| Claude Haiku 4.5  | 1x         | 30                  |

### Scale plan (80 RPM)

| Model             | Multiplier | Effective calls/min |
| ----------------- | ---------- | ------------------- |
| ShuttleAI Auto    | 1x         | 80                  |
| GPT-5.2           | 1.5x       | \~53                |
| Claude Opus 4.6   | 2x         | 40                  |
| Claude Sonnet 4.6 | 2x         | 40                  |
| Claude Haiku 4.5  | 1x         | 80                  |
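
The "Effective calls/min" column in these tables appears to be the plan's RPM divided by the multiplier, rounded down. A minimal sketch of that calculation follows; the function name is hypothetical, and rounding down is an assumption inferred from the `~6` and `~53` entries.

```python
import math
from fractions import Fraction


def effective_calls_per_minute(rpm: int, multiplier: float) -> int:
    """Whole API calls that fit in one minute: floor(rpm / multiplier)."""
    # Fraction avoids float rounding surprises for multipliers such as 1.5.
    return math.floor(Fraction(rpm) / Fraction(str(multiplier)))


print(effective_calls_per_minute(10, 1.5))  # 6  (Basic, GPT-5.2)
print(effective_calls_per_minute(30, 2.0))  # 15 (Premium, Claude Opus 4.6)
print(effective_calls_per_minute(80, 1.5))  # 53 (Scale, GPT-5.2)
```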

## Checking multipliers

You can check any model's multiplier using the verbose models endpoint:

```bash
curl https://api.shuttleai.com/v1/models/verbose \
  -H "Authorization: Bearer shuttle-xxx"
```

Each model includes a `request_multiplier` field:

```json
{
  "id": "claude-opus-4.6",
  "plan": "premium",
  "request_multiplier": 2.0
}
```

You can also see multipliers displayed on each model card at [shuttleai.com/models](https://shuttleai.com/models).
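
If you want the multipliers in code, a sketch along these lines may help. It uses only the Python standard library. The helper names are hypothetical, and the overall response shape is an assumption: the snippet accepts either a bare list of model objects or an object that wraps the list under `data`, since only a single model object is shown above.

```python
import json
import urllib.request

API_URL = "https://api.shuttleai.com/v1/models/verbose"


def fetch_models(api_key: str) -> object:
    """GET the verbose models endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        API_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def multipliers_by_model(payload: object) -> dict[str, float]:
    """Map model id -> request_multiplier.

    Assumption: the body is a list of model objects, or a dict holding
    that list under "data". Adjust if the real shape differs.
    """
    models = payload.get("data", []) if isinstance(payload, dict) else payload
    return {
        model["id"]: float(model["request_multiplier"])
        for model in models
        if "request_multiplier" in model
    }
```

With a real key, `multipliers_by_model(fetch_models("shuttle-xxx"))` should give you a dictionary you can look up by model ID before sending a request.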

## Tips for managing multipliers

<AccordionGroup>
  <Accordion title="Use ShuttleAI Auto for best cost efficiency">
    ShuttleAI Auto has a 1x multiplier and intelligently routes to the best model for each task. It's the most cost-efficient way to use the API.
  </Accordion>

  <Accordion title="Mix high and low multiplier models">
    Use lighter models (1x) for simple tasks and save your budget for frontier model calls (2x) when you need maximum quality.
  </Accordion>

  <Accordion title="Monitor your usage">
    The [Dashboard](/dashboard/overview) shows your request usage in real time, including breakdowns by model. Use this to understand how multipliers affect your usage patterns.
  </Accordion>

  <Accordion title="Upgrade for more headroom">
    If you're consistently hitting rate limits, consider upgrading your plan. The jump from Basic (10 RPM) to Premium (30 RPM) gives you 3x the capacity.
  </Accordion>
</AccordionGroup>

## Rate limit errors

If you exceed your RPM (accounting for multipliers), the API returns a **429 Too Many Requests** error:

```json
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_error",
    "code": 429
  }
}
```

Wait until the next one-minute window begins before retrying, or upgrade your plan for higher limits.
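
One way to handle a 429 on the client is to pause and retry a few times. The sketch below is an assumption-laden example rather than an official client: `call_with_retry` is a hypothetical helper, `send` is any function you supply that performs the request and returns `(status_code, body)`, and waiting a full 60 seconds is a conservative guess because the exact window boundaries are not documented here.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    wait_seconds: float = 60.0,  # assumption: a full minute clears the window
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it stops returning 429 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return body
        if attempt < max_attempts:
            sleep(wait_seconds)  # pause before the next attempt
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The `sleep` parameter exists so you can swap in a fake during tests instead of actually waiting.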

***

## Context window limits & scaling

Your plan determines the **maximum number of tokens** you can send in a single request. This is a hard limit — requests that exceed it are rejected.

| Plan    | Max context |
| ------- | ----------- |
| Free    | 8K tokens   |
| Basic   | 16K tokens  |
| Premium | 36K tokens  |
| Scale   | 128K tokens |

<Warning>
  If your request exceeds your plan's context limit, the API will return an error. You cannot go over your plan's max — upgrade your plan for a higher limit.
</Warning>
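
To avoid a rejected request, you can compare your prompt's token count against the plan limit before sending. The sketch below assumes 1K means 1,000 tokens and that you already have a token count from your own tokenizer; `PLAN_MAX_CONTEXT` and `fits_plan` are hypothetical names.

```python
# Plan limits from the table above, assuming 1K = 1,000 tokens.
PLAN_MAX_CONTEXT = {
    "free": 8_000,
    "basic": 16_000,
    "premium": 36_000,
    "scale": 128_000,
}


def fits_plan(prompt_tokens: int, plan: str) -> bool:
    """Return True if the prompt is within the plan's max context."""
    return prompt_tokens <= PLAN_MAX_CONTEXT[plan.lower()]


print(fits_plan(20_000, "basic"))    # False
print(fits_plan(20_000, "premium"))  # True
```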

### Context scaling (paid plans only)

For paid plans (Basic, Premium, Scale), a **context scaling multiplier** is applied on top of the model's request multiplier. It scales smoothly with prompt size at **0.1x per 1K tokens**:

```
scaleFactor = promptTokens / 10000
totalCost = model_request_multiplier × scaleFactor
```

This means larger prompts consume more of your per-minute quota gradually rather than in large blocks:

| Prompt size | Scale factor | With 1x model | With 2x model |
| ----------- | ------------ | ------------- | ------------- |
| 1K tokens   | 0.1x         | 0.1 requests  | 0.2 requests  |
| 5K tokens   | 0.5x         | 0.5 requests  | 1 request     |
| 10K tokens  | 1x           | 1 request     | 2 requests    |
| 20K tokens  | 2x           | 2 requests    | 4 requests    |
| 50K tokens  | 5x           | 5 requests    | 10 requests   |

<Note>
  The Free plan does **not** have context scaling. You pay only the model's base request multiplier, but you're limited to 8K tokens max.
</Note>
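
The scaling formula can be written as a short function. This is a sketch of the arithmetic described above, not ShuttleAI's server code; `request_cost` is a hypothetical name, and the Free plan branch follows the note that Free requests pay only the base multiplier.

```python
def request_cost(prompt_tokens: int, model_multiplier: float, plan: str) -> float:
    """Quota consumed by one call: model multiplier x context scale factor."""
    if plan.lower() == "free":
        return model_multiplier            # Free plan: no context scaling
    scale_factor = prompt_tokens / 10_000  # 0.1x per 1K tokens
    return model_multiplier * scale_factor


print(request_cost(10_000, 1.0, "basic"))    # 1.0
print(request_cost(20_000, 2.0, "premium"))  # 4.0
print(request_cost(5_000, 1.0, "free"))      # 1.0
```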

### Examples

* A 10K prompt on the Basic plan with a 1x model = **1 request** (10,000 / 10,000 = 1x scale factor, × 1x model = 1)
* A 20K prompt on the Premium plan with a 1x model = **2 requests** (20,000 / 10,000 = 2x scale factor, × 1x model = 2)
* A 100K prompt on the Scale plan with a 2x model = **20 requests** (100,000 / 10,000 = 10x scale factor, × 2x model = 20)

### Why context scaling exists

Processing more tokens requires more compute. Context scaling ensures that heavy token usage is priced fairly while keeping costs low for everyday requests.

### Tips for managing context costs

<AccordionGroup>
  <Accordion title="Keep prompts concise">
    Only include the context the model actually needs. Trim unnecessary history, instructions, or documents.
  </Accordion>

  <Accordion title="Use the right plan for your workload">
    If you regularly need large context windows, upgrade to a plan with a higher limit. The Scale plan (128K) gives you the most room.
  </Accordion>

  <Accordion title="Summarize long conversations">
    Instead of sending the full chat history every time, periodically summarize older messages to reduce token count.
  </Accordion>

  <Accordion title="Monitor usage on the dashboard">
    The [Dashboard](/dashboard/overview) shows your request usage. If a request consumes more of your quota than expected, context scaling is a likely cause.
  </Accordion>
</AccordionGroup>
