ShuttleAI uses multipliers to fairly price access to different models and context sizes. There are two types: request multipliers and context scaling multipliers.

Request multipliers

Every model has a request multiplier — a number that determines how many of your per-minute requests each API call consumes. Most models have a 1x multiplier (one API call = one request). Larger or more expensive models have a higher multiplier to reflect their compute cost.

How they work

Your plan gives you a set number of requests per minute (RPM):
Plan    | RPM
Free    | 2
Basic   | 10
Premium | 30
Scale   | 80
When you call a model, the multiplier is deducted from your available RPM:
1. You make an API call: you send a request to a model with a 2x multiplier.
2. The multiplier is applied: the system deducts 2 requests from your per-minute quota instead of 1.
3. Remaining capacity: if you're on the Premium plan (30 RPM), you now have 28 RPM remaining for that minute.
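
A minimal sketch of this accounting in Python, assuming a simple fixed-window counter (PLAN_RPM and consume_request are illustrative names, not part of the API):

PLAN_RPM = {"free": 2, "basic": 10, "premium": 30, "scale": 80}

def consume_request(remaining_rpm: float, request_multiplier: float) -> float:
    """Deduct one call's multiplier from the quota left in the current minute."""
    if remaining_rpm < request_multiplier:
        raise RuntimeError("Rate limit exceeded for this minute (429)")
    return remaining_rpm - request_multiplier

remaining = PLAN_RPM["premium"]               # 30 RPM at the start of the minute
remaining = consume_request(remaining, 2.0)   # a 2x model leaves 28 RPM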

Examples

Basic plan (10 RPM)

Model            | Multiplier | Effective calls/min
ShuttleAI Auto   | 1x         | 10
GPT-5.2          | 1.5x       | ~6
Claude Haiku 4.5 | 1x         | 10

Premium plan (30 RPM)

Model             | Multiplier | Effective calls/min
ShuttleAI Auto    | 1x         | 30
GPT-5.2           | 1.5x       | 20
Claude Opus 4.6   | 2x         | 15
Claude Sonnet 4.6 | 2x         | 15
Claude Haiku 4.5  | 1x         | 30

Scale plan (80 RPM)

Model             | Multiplier | Effective calls/min
ShuttleAI Auto    | 1x         | 80
GPT-5.2           | 1.5x       | ~53
Claude Opus 4.6   | 2x         | 40
Claude Sonnet 4.6 | 2x         | 40
Claude Haiku 4.5  | 1x         | 80
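
The effective calls per minute in the tables above are simply your plan's RPM divided by the model's multiplier, rounded down. A quick sketch (the helper name is illustrative):

import math

def effective_calls_per_minute(plan_rpm: int, request_multiplier: float) -> int:
    """How many calls to a given model fit into one minute of quota."""
    return math.floor(plan_rpm / request_multiplier)

print(effective_calls_per_minute(30, 1.5))  # Premium + GPT-5.2 -> 20
print(effective_calls_per_minute(80, 1.5))  # Scale + GPT-5.2 -> 53
print(effective_calls_per_minute(30, 2.0))  # Premium + Claude Opus 4.6 -> 15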

Checking multipliers

You can check any model’s multiplier using the verbose models endpoint:
curl https://api.shuttleai.com/v1/models/verbose \
  -H "Authorization: Bearer shuttle-xxx"
Each model includes a request_multiplier field:
{
  "id": "claude-opus-4.6",
  "plan": "premium",
  "request_multiplier": 2.0
}
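If you want to look multipliers up programmatically, here is a minimal Python sketch using the requests library (reading the API key from an environment variable and treating the response as a list under a "data" key are assumptions; adjust to the actual response shape):

import os
import requests

resp = requests.get(
    "https://api.shuttleai.com/v1/models/verbose",
    headers={"Authorization": f"Bearer {os.environ['SHUTTLEAI_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()

# request_multiplier is the field documented above; the "data" envelope is assumed.
for model in resp.json().get("data", []):
    print(model["id"], model.get("request_multiplier"))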
You can also see multipliers displayed on each model card at shuttleai.com/models.

Tips for managing multipliers

  • ShuttleAI Auto has a 1x multiplier and intelligently routes to the best model for each task. It's the most cost-efficient way to use the API.
  • Use lighter models (1x) for simple tasks and save your budget for frontier model calls (2x) when you need maximum quality.
  • The Dashboard shows your request usage in real time, including breakdowns by model. Use this to understand how multipliers affect your usage patterns.
  • If you're consistently hitting rate limits, consider upgrading your plan. The jump from Basic (10 RPM) to Premium (30 RPM) gives you 3x the capacity.

Rate limit errors

If you exceed your RPM (accounting for multipliers), the API returns a 429 Too Many Requests error:
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_error",
    "code": 429
  }
}
Wait until the next minute window to retry, or upgrade your plan for higher limits.
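
Since the quota resets each minute, a simple client-side pattern is to catch the 429 and retry once the next window starts. A hedged sketch in Python (the URL, headers, and payload you pass in depend on the endpoint you actually call):

import time
import requests

def post_with_retry(url: str, headers: dict, payload: dict, max_retries: int = 3) -> dict:
    """POST to the API, waiting out the minute window whenever a 429 comes back."""
    for attempt in range(max_retries + 1):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        if attempt == max_retries:
            resp.raise_for_status()
        # Sleep until the current minute window rolls over, then retry.
        time.sleep(60 - (time.time() % 60))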

Context window limits & scaling

Your plan determines the maximum number of tokens you can send in a single request. This is a hard limit — requests that exceed it are rejected.
Plan    | Max context
Free    | 8K tokens
Basic   | 16K tokens
Premium | 36K tokens
Scale   | 128K tokens
If your request exceeds your plan’s context limit, the API will return an error. You cannot go over your plan’s max — upgrade your plan for a higher limit.
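
There is no official pre-flight check for this, but you can guard against obviously oversized prompts before sending. A rough sketch, assuming the common heuristic of roughly 4 characters per token (a real tokenizer gives more accurate counts):

PLAN_MAX_CONTEXT = {"free": 8_000, "basic": 16_000, "premium": 36_000, "scale": 128_000}

def fits_in_context(prompt: str, plan: str) -> bool:
    """Estimate tokens as ~4 characters each and compare against the plan limit."""
    estimated_tokens = len(prompt) / 4
    return estimated_tokens <= PLAN_MAX_CONTEXT[plan]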

Context scaling (paid plans only)

For paid plans (Basic, Premium, Scale), a context scaling multiplier is applied on top of the model’s request multiplier. This is calculated per 16K token block:
scaleFactor = ceil(promptTokens / 16,000)
totalCost = model_request_multiplier × scaleFactor
This means larger prompts cost more requests from your per-minute quota:
Prompt size    | Scale factor | With 1x model | With 2x model
1–16K tokens   | 1x           | 1 request     | 2 requests
16K–32K tokens | 2x           | 2 requests    | 4 requests
32K–48K tokens | 3x           | 3 requests    | 6 requests
48K–64K tokens | 4x           | 4 requests    | 8 requests
The Free plan does not have context scaling — you pay only the model’s base request multiplier, but you’re limited to 8K tokens max.

Examples

  • A 10K prompt on the Basic plan with a 1x model = 1 request (10K is within the first 16K block)
  • A 20K prompt on the Premium plan with a 1x model = 2 requests (ceil(20K / 16K) = 2)
  • A 100K prompt on the Scale plan with a 2x model = 14 requests (ceil(100K / 16K) = 7, × 2x model = 14)
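
Putting the formula and the examples together, a small sketch that reproduces the numbers above (the function name is illustrative; the Free plan branch follows the note that it has no context scaling):

import math

def request_cost(prompt_tokens: int, request_multiplier: float, plan: str) -> float:
    """Requests deducted from the per-minute quota for a single call."""
    if plan == "free":
        return request_multiplier  # no context scaling on the Free plan
    scale_factor = math.ceil(prompt_tokens / 16_000)
    return request_multiplier * scale_factor

print(request_cost(10_000, 1.0, "basic"))    # 1
print(request_cost(20_000, 1.0, "premium"))  # 2
print(request_cost(100_000, 2.0, "scale"))   # 14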

Why context scaling exists

Processing more tokens requires more compute. Context scaling ensures that heavy token usage is priced fairly while keeping costs low for everyday requests.

Tips for managing context costs

  • Only include the context the model actually needs. Trim unnecessary history, instructions, or documents.
  • If you regularly need large context windows, upgrade to a plan with a higher limit. The Scale plan (128K) gives you the most room.
  • Instead of sending the full chat history every time, periodically summarize older messages to reduce token count.
  • The Dashboard shows your request usage. Watch for requests that consume more than expected — it may be context scaling.