AI FinOps · Anthropic Claude Cost Framework

Learn · Foundations

Which Claude costs do you actually have?

Every dollar your organization pays Anthropic falls into one of three billing streams. Knowing which streams apply to you is step one. From there, each stream has its own forecasting method, monitoring tool, and optimization strategy.

Fixed

Seat Subscriptions

Monthly per-user fees: Pro, Max, Team, or Enterprise. Predictable, billed regardless of usage.

Variable

API Token Usage

Pay-per-token for every API call. Varies with prompt length, output, model, and volume.

Hybrid

Claude Code

Included in some plans. Overages billed at standard API token rates.

A Typical Bill Breakdown

Illustrative monthly spend: $5,000/mo example

Seats (fixed): 35%
API (variable): 48%
Claude Code (hybrid): 17%

Illustrative: The exact split varies. API-heavy orgs see 70%+ variable; chat-heavy teams lean toward seats.

📊

Visibility

Know where the money goes: by team, product, model.

🏷

Attribution

Every dollar traceable to a workspace, key, team, or product.

🎯

Optimization

Identify waste: wrong models, redundant prompts, unused seats.

🔒

Governance

Limits, policies, rotation schedules prevent drift.

Learn · Foundations

Three Spending Streams

Anthropic bills through three mechanisms. Each has its own pricing model, monitoring tools, and optimization strategies.

How They Interact

A developer on a Team seat gets Claude Code included up to a plan allowance. Premium seats get a larger allowance (6.25x Pro vs 1.25x on Standard). But if that developer also uses API keys in a product, those variable charges stack on top of their seat cost.

Rule of thumb: Attribute seat costs to people (headcount budget) and API costs to products (product budget). Claude Code overage goes to the project budget.

Forecasting Difficulty Opinionated

Seats: Easy
API Tokens: Hard
Claude Code: Medium

Learn · Foundations

Token Economics

Tokens are the fundamental unit of API cost.

What Is a Token?

A chunk of text, roughly 3/4 of a word. A 200-word email ≈ 270 tokens. A 10-page doc ≈ 4,000-5,000 tokens.

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Token Cost by Model Official

Last verified: Apr 10, 2026 against platform.claude.com/docs/en/about-claude/pricing

Model	Input per 1M tokens	Output per 1M tokens
Haiku 4.5	$1.00	$5.00
Sonnet 4.6	$3.00	$15.00
Opus 4.6	$5.00	$25.00
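As a minimal sketch, the cost formula and the list prices above can be combined into one function. The dictionary keys are illustrative labels chosen for this example, not official API model identifiers.

```python
# List prices in USD per million tokens, copied from the pricing figures above.
# The keys are illustrative labels, not official API model identifiers.
PRICES_PER_MTOK = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.6": {"input": 5.00, "output": 25.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = (input tokens x input price) + (output tokens x output price)."""
    price = PRICES_PER_MTOK[model]
    dollars_times_million = (
        input_tokens * price["input"] + output_tokens * price["output"]
    )
    return dollars_times_million / 1_000_000


# A 2,000-token prompt with a 500-token reply on Sonnet.
print(f"${request_cost('sonnet-4.6', 2_000, 500):.4f}")  # $0.0135
```

Output tokens dominate this example: 500 output tokens cost more than 2,000 input tokens.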

Learn · Deep Dive

Plans & Pricing

A complete view of Anthropic's current plan lineup, including billing frequency differences and Claude Code availability.

Individual Plans Official

Last verified: Apr 10, 2026 against claude.com/pricing

Free

$0

forever

Sonnet 4.5 only
~20 msgs/day
No Claude Code
No Cowork

Pro

$20

$17/mo if annual ($200/yr)

All models
Claude Code ✓
Cowork ✓
~5x Free usage

Max

$100-$200

5x ($100) or 20x ($200)

5x or 20x Pro usage
Claude Code ✓
Higher output limits
Priority access

Team & Enterprise Plans Official

Last verified: Apr 10, 2026 against claude.com/pricing and support.claude.com

Plan	Annual	Monthly	Min Seats	Claude Code
Team Standard	$20/seat	$25/seat	5	Included
Team Premium	$100/seat	$125/seat	5	Included
Enterprise (self-serve)	$20/seat + API usage	—	20	Included
Enterprise (sales)	Custom	—	Contact sales	Included

Annual vs Monthly matters. Team Standard is $20/seat annual vs $25 monthly, a 20% difference. Team Premium is $100 vs $125, also 20%. For a 20-person team on Standard, that's $1,200/year saved by committing annually.
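The annual-versus-monthly arithmetic can be sketched as follows, using the per-seat prices from the table above. The plan keys and function name are made up for this example.

```python
# Per-seat monthly prices in USD from the table above: (annual billing, monthly billing).
# Plan keys are illustrative names for this sketch.
SEAT_PRICES = {
    "team_standard": (20, 25),
    "team_premium": (100, 125),
}


def annual_commit_savings(plan: str, seats: int) -> int:
    """Yearly savings from choosing annual billing over month-to-month billing."""
    annual_rate, monthly_rate = SEAT_PRICES[plan]
    return (monthly_rate - annual_rate) * seats * 12


print(annual_commit_savings("team_standard", 20))  # 1200
print(annual_commit_savings("team_premium", 20))   # 6000
```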

Claude Code Availability Official

Last verified: Apr 10, 2026 against claude.com/product/claude-code

Claude Code is available with:

✓ Pro plan ($20/mo)
✓ Max plan ($100-200/mo)
✓ Team Standard seats ($20-25/seat)
✓ Team Premium seats ($100-125/seat)
✓ Enterprise plans
✓ Anthropic Console / API account (pay-per-token)
✗ Free plan (not available)

Enterprise: Not Just "Custom" Opinionated

Enterprise is often assumed to be a single black-box "contact sales" tier. In practice, Anthropic offers two Enterprise paths:

Self-serve Enterprise ($20/seat + API usage, min 20 seats, annual commitment): organizations can start today without contacting sales. It includes SSO, SCIM, audit logs, 500K context window, and compliance features.

Sales-assisted Enterprise (custom pricing, contact sales): for organizations needing tailored terms, usage commitments, invoicing, product bundling, and HIPAA-ready configurations. Minimum seat counts are negotiated.

The self-serve path means Enterprise isn't necessarily more expensive per seat than Team: it's $20/seat (same as Team Standard annual) plus usage-based API charges. The value is in the governance and compliance features, not a price premium on the seat itself.

Learn · Deep Dive

API Cost Anatomy

Every API call has multiple cost components.

Input Tokens · ~60-75%

System prompt, user message, documents, history, tool definitions. Cheaper per-token but high volume.

Output Tokens · ~20-35%

Response text, thinking, tool calls, code. 5x the input price per token at the list prices above.

Feature Charges · ~5-15%

Web search, code execution, fast mode (6x rates), extended caching. Not visible in token counts.

Discounts

Batch API (50% off) and prompt cache reads (90% off input). Cache writes are a surcharge, not a discount (1.25x input). Check the Cost page.

Usage ≠ Cost. If tokens are flat but cost rises, look for a switch to a pricier model or new feature charges. If tokens spike but cost doesn't, caching is probably working. Always compare both Console pages.
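The Usage-versus-Cost comparison can be turned into a rough period-over-period check. This is a sketch under assumptions: the 10% tolerance, the function name, and the wording of the hints are all choices made for this example, and the totals are whatever you read off the two Console pages.

```python
def diagnose(tokens_prev: float, tokens_now: float,
             cost_prev: float, cost_now: float,
             tolerance: float = 0.10) -> str:
    """Classify a period-over-period change using the Usage != Cost heuristic.

    Assumes both previous-period values are non-zero. The 10% tolerance is an
    arbitrary starting point, not a recommended threshold.
    """
    token_change = (tokens_now - tokens_prev) / tokens_prev
    cost_change = (cost_now - cost_prev) / cost_prev
    tokens_flat = abs(token_change) <= tolerance
    cost_flat = abs(cost_change) <= tolerance

    if tokens_flat and cost_change > tolerance:
        return "check for a switch to a pricier model or new feature charges"
    if token_change > tolerance and cost_flat:
        return "caching or batch discounts are likely absorbing the growth"
    if token_change > tolerance and cost_change > tolerance:
        return "volume growth: cost is tracking usage"
    return "no notable change"


# Tokens up 2%, cost up 50%: usage is flat but something got more expensive.
print(diagnose(100e6, 102e6, 1_000, 1_500))
```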

Learn · Deep Dive

Claude Code Costs

CLI coding assistant with hybrid billing: included allowances plus variable overages.

Billing Model

On subscription plans (Pro, Max, Team Standard, Team Premium, Enterprise), Claude Code draws from included usage. Once exceeded, overages are billed at standard API rates for the model used. On a Console/API account, all usage is pay-per-token from the start.

Typical spend (estimate): Active developers average $6-12/day. Heavy agentic workflows reach $20-30/day. That's $100-600+/month per developer.

⌨

/cost Command

Real-time session spend in the terminal.

📈

Analytics API

Daily per-user metrics via Admin API.

🔑

Key Hygiene

Disable keys from departed team members.

⚠

Spend Limits

Set per-user caps. No limit = unbounded risk.

Diagnose

Which Plan Do I Need? Opinionated

Answer five questions to get a recommended configuration. Recommendations reflect the author's judgment, not official Anthropic guidance.

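1 How many people need Claude access?
2 Do developers need Claude Code?
3 SSO, audit logs, or HIPAA needed?
4 API integration into products?
5 How often do users hit limits?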

Diagnose

Console Walkthrough

The Anthropic Console at console.anthropic.com is the primary interface for monitoring API usage and cost.

🏢 Organization Recommended Structure

Product_API: Customer-facing keys
Internal_Tools: Automation & research
Claude_Code: 1 key per developer

📊 Usage Page

Token volume by workspace and key. Filter by model. Click bars for hourly detail. Export CSV monthly as your baseline.

💰 Cost Page

Dollar amounts with model pricing, feature charges, discounts. Compare with Usage to catch model switches or caching effects.

⚙ Workspaces

One per product/team. Keys in "Default" = unattributable spend. Create named workspaces and migrate keys.

🔑 API Keys

Name descriptively. Rotate quarterly. Disable when someone leaves. Never share one key across products.

👥 Spend Limits

In Settings. Set per-user caps at 2x median spend. "Unlimited" = unbounded risk.
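One way to apply the 2x-median rule is sketched below. The per-user spend figures and user names are invented for illustration; in practice they would come from your own usage export.

```python
from statistics import median


def suggested_cap(monthly_spend_by_user: dict, multiplier: float = 2.0) -> float:
    """Per-user cap set at `multiplier` times the median monthly spend."""
    return multiplier * median(monthly_spend_by_user.values())


def users_over_cap(monthly_spend_by_user: dict, cap: float) -> list:
    """Users whose current spend would already exceed the proposed cap."""
    return [user for user, spend in monthly_spend_by_user.items() if spend > cap]


# Invented example data: monthly spend in USD per user.
spend = {"ana": 120, "ben": 180, "chi": 600}
cap = suggested_cap(spend)
print(cap)                         # 360.0
print(users_over_cap(spend, cap))  # ['chi']
```

Using the median rather than the mean keeps one heavy user from inflating everyone's cap.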

📡 Admin API

/v1/organizations/usage_report/messages for tokens. /v1/organizations/usage_report/claude_code for Code. Group by workspace, model, time. Worth automating above $1K/mo.

Diagnose

Am I Normal? Estimates

Reference ranges for typical Claude usage. These are approximate ranges based on observed patterns, not official Anthropic data. Use them to identify whether your spend warrants investigation, not as targets.

Small consulting team (5-10 people, mostly chat)

Monthly input tokens: 20-100M
Monthly total spend: $300-$800

Mid-size product team (15-30, API + chat + Code)

Monthly input tokens: 100M-500M
Monthly total spend: $1.5K-$5K

Single developer using Claude Code

Monthly spend: $100-$600

Diagnose

Common Mistakes Opinionated

Six costly anti-patterns with typical dollar impact and fixes.

1. Sending full documents when sections suffice

3-10x input token waste

Sending a 50-page contract (60K tokens) to answer a question about one clause, when only ~3K tokens are needed. At Sonnet input rates: $0.18 vs $0.009 per request.

→ Pre-filter with embeddings or keyword search before sending.
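A keyword pre-filter can be as simple as ranking sections by word overlap with the question. The sketch below is a deliberately naive stand-in for embeddings or a real search index: it has no stop-word handling or stemming, and the function name and sample sections are invented for this example.

```python
import re


def select_sections(sections: list, question: str, top_k: int = 2) -> list:
    """Return the top_k sections sharing the most distinct words with the question."""

    def words(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    query = words(question)
    # sorted() is stable, so sections with equal scores keep document order.
    ranked = sorted(sections, key=lambda s: len(words(s) & query), reverse=True)
    return ranked[:top_k]


# Invented sample sections standing in for a long contract.
contract = [
    "Termination: either party may terminate with 30 days notice.",
    "Payment terms: invoices are due within 45 days.",
    "Governing law: this agreement is governed by the laws of Ohio.",
]
print(select_sections(contract, "When are invoices due for payment?", top_k=1))
```

Only the selected sections are then sent with the question, instead of the full document.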

2. Opus for tasks Haiku handles identically

5x cost, 0x quality gain

Classification, extraction, formatting rarely benefit from Opus. Same output, 5x the price.

→ A/B test model tiers on actual tasks. Document which genuinely need Opus.

3. No max_tokens cap

2-5x output spend

4,000-token response when 200 suffice. On Sonnet: $0.06 vs $0.003.

→ Set max_tokens on every production call. Match to expected response length.

4. Orphaned API keys

$50-500/mo unattributed spend

A departed developer's test integration keeps running. Nobody's monitoring.

→ Monthly key audit. Disable unused keys. Name with owner + purpose.
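A monthly key audit could start from a simple inventory check like the one below. The record shape (`name` and `owner` fields) is a hypothetical inventory format, for example a spreadsheet you maintain, not a Console export format.

```python
def orphaned_keys(keys: list, active_owners: set) -> list:
    """Names of keys whose owner is missing or no longer on the active roster.

    Each key record is a dict with hypothetical 'name' and 'owner' fields.
    """
    return [
        key["name"]
        for key in keys
        if key.get("owner") is None or key["owner"] not in active_owners
    ]


# Invented inventory: one owned key, one from a departed developer, one unnamed.
inventory = [
    {"name": "prod-search-api", "owner": "dana"},
    {"name": "test-integration", "owner": "lee"},
    {"name": "key-3", "owner": None},
]
print(orphaned_keys(inventory, active_owners={"dana"}))
# ['test-integration', 'key-3']
```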

5. Full conversation history every request

Quadratic cost growth per conversation

Turn 20 sends all 19 prior turns. A conversation that would cost $0.50 if each turn were sent alone costs $5+ when every request resends the full history.

→ Conversation windowing: last N turns, or summarize older context.
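The growth pattern and the effect of a last-N window can be sketched as follows. This assumes, for simplicity, that every turn is the same size; the function names and the 500-token turn size are choices made for this example.

```python
from typing import Optional


def window_history(messages: list, last_n: int) -> list:
    """Keep only the most recent last_n messages."""
    return messages[-last_n:]


def resent_tokens(turns: int, tokens_per_turn: int, window: Optional[int] = None) -> int:
    """Total input tokens billed over a conversation.

    Request n carries the n most recent turns (including the new one),
    capped at `window` turns when a window is set.
    """
    total = 0
    for n in range(1, turns + 1):
        carried = n if window is None else min(n, window)
        total += carried * tokens_per_turn
    return total


full = resent_tokens(turns=20, tokens_per_turn=500)               # 105000
windowed = resent_tokens(turns=20, tokens_per_turn=500, window=4)  # 37000
print(full, windowed)
```

Without a window, 20 turns bill 210 turn-sized chunks instead of 20, which is the roughly 10x multiplier behind the $0.50 versus $5+ comparison.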

6. Unlimited spend limits, no alerts

Unbounded, discovered on invoice

A looping bug runs 100x requests. First sign: $10,000 bill.

→ Set dollar caps today. Set up daily Admin API alerts. Both can be implemented the same day.
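A daily alert can be a simple spike check over a list of daily totals. The sketch below assumes you already have daily spend figures from your own reporting; the 3x factor and 7-day lookback are illustrative defaults, not recommended values.

```python
from statistics import median


def spend_alerts(daily_spend: list, factor: float = 3.0, lookback: int = 7) -> list:
    """Indexes of days whose spend exceeds `factor` x the median of the
    preceding `lookback` days."""
    alerts = []
    for i in range(lookback, len(daily_spend)):
        baseline = median(daily_spend[i - lookback:i])
        if baseline > 0 and daily_spend[i] > factor * baseline:
            alerts.append(i)
    return alerts


# Invented daily totals in USD; day 8 simulates a looping bug.
history = [40, 42, 38, 41, 39, 43, 40, 41, 400, 42]
print(spend_alerts(history))  # [8]
```

A median baseline means one runaway day does not raise the threshold for the days that follow.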

Optimize

Optimization Levers

Ordered by typical impact, highest first.

1. Prompt Caching Official

Pricing verified Apr 10, 2026

Standard: $3.00/MTok → Cached read: $0.30/MTok (90% savings)

Cache writes cost 1.25x. Default TTL: 5 minutes. Best for high-frequency stable prompts.
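The write surcharge raises the question of when caching breaks even. The sketch below uses the Sonnet input price and the 1.25x and 0.10x multipliers quoted above, and assumes the simplest case: the first request writes the shared prefix and every later request reads it before the cache expires.

```python
def input_cost_without_cache(requests: int, prefix_mtok: float,
                             base_price: float = 3.00) -> float:
    """Input cost in USD when the shared prefix is sent at full price every time."""
    return requests * prefix_mtok * base_price


def input_cost_with_cache(requests: int, prefix_mtok: float,
                          base_price: float = 3.00,
                          write_multiplier: float = 1.25,
                          read_multiplier: float = 0.10) -> float:
    """Input cost in USD assuming one cache write followed by cache reads only.

    Simplification: every request after the first arrives within the TTL.
    """
    if requests == 0:
        return 0.0
    write = prefix_mtok * base_price * write_multiplier
    reads = (requests - 1) * prefix_mtok * base_price * read_multiplier
    return write + reads


# A 10K-token system prompt (0.01 MTok) reused across 100 requests.
print(round(input_cost_without_cache(100, 0.01), 4))  # 3.0
print(round(input_cost_with_cache(100, 0.01), 4))     # 0.3345
```

Under these assumptions a single request costs more with caching ($0.0375 vs $0.03), and caching pays for itself from the second request onward.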

2. Model Routing Opinionated

Haiku — $1/$5

Classification, extraction, formatting, validation

Sonnet — $3/$15

Code gen, doc analysis, reasoning, most production

Opus — $5/$25

Complex reasoning, nuanced writing

Opus Fast — $30/$150

Latency-critical production only
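A routing policy like the one above can be written down as a lookup table. The task-type keys and tier labels below are illustrative names, not API model identifiers, and defaulting unknown tasks to Sonnet is an assumption based on the "most production" note.

```python
# Task type -> model tier, following the routing tiers listed above.
# Keys and tier labels are illustrative, not official identifiers.
ROUTES = {
    "classification": "haiku",
    "extraction": "haiku",
    "formatting": "haiku",
    "validation": "haiku",
    "code_generation": "sonnet",
    "document_analysis": "sonnet",
    "reasoning": "sonnet",
    "complex_reasoning": "opus",
    "nuanced_writing": "opus",
}


def route(task_type: str, default: str = "sonnet") -> str:
    """Pick the cheapest tier documented as adequate for this task type."""
    return ROUTES.get(task_type, default)


print(route("extraction"))         # haiku
print(route("complex_reasoning"))  # opus
print(route("unknown_task"))       # sonnet
```

Keeping the table in version control gives you the documented routing policy that the maturity model asks for.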

3. Batch API Official

Sonnet: $3/$15 → Batch: $1.50/$7.50 (50% off)

Non-urgent workloads. Results within 24h.

4. Prompt Engineering Opinionated

Trim redundant system prompt instructions
Set max_tokens on every call
JSON output when machine-parsed
Truncate history to relevant turns
Send needed sections, not full documents

Optimize

Cost Calculator

Compare before/after optimization scenarios.

Multi-Model API Estimator

Inputs: total input per month (M tokens), total output per month (M tokens), and a model mix across Haiku ($1/$5), Sonnet ($3/$15), and Opus ($5/$25) that must total 100%. The Before and After scenarios each take a Batch % and a Cache %.

Example: Before $1,200, After $540. Monthly savings: $660 (55%).
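A before/after estimate of this kind can be sketched in a few lines. This is not the page's own calculator logic, which is not shown. It makes simplifying assumptions: the batch and cache discounts stack, the cached share of input is billed at 10% of list price, and cache write surcharges are ignored. The input figures are invented.

```python
# (input, output) list prices in USD per million tokens, from the tiers above.
PRICES = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0), "opus": (5.0, 25.0)}


def monthly_cost(input_mtok: float, output_mtok: float, mix: dict,
                 batch_share: float = 0.0, cache_share: float = 0.0) -> float:
    """Estimated monthly API cost in USD for a model mix.

    mix maps model tier -> share of traffic and must total 1.0.
    Assumptions: cached input is billed at 10% of list price, batch traffic
    gets 50% off, the two discounts stack, and cache writes are ignored.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("model mix must total 100%")
    total = 0.0
    for model, share in mix.items():
        in_price, out_price = PRICES[model]
        input_cost = input_mtok * share * in_price * (1 - cache_share + cache_share * 0.10)
        output_cost = output_mtok * share * out_price
        total += (input_cost + output_cost) * (1 - 0.5 * batch_share)
    return total


before = monthly_cost(100, 20, {"sonnet": 1.0})
after = monthly_cost(100, 20, {"haiku": 0.5, "sonnet": 0.5},
                     batch_share=0.2, cache_share=0.5)
print(round(before, 2), round(after, 2))  # 600.0 279.0
```

In this invented scenario, routing half the traffic to Haiku, caching half the input, and batching 20% of requests cuts the estimate by a little over half.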

Seat Cost Estimator Official

Using annual pricing. Last verified Apr 10, 2026.

Team Standard: seats × $20/seat (annual)
Team Premium: seats × $100/seat (annual)
Pro (individual): users × $17/user (annual)

Example monthly seat cost (annual pricing): $800

Optimize

FinOps Maturity Model Opinionated

Stages of AI cost management capability.

Stage 1 · Awareness

You know you're spending

Can see total spend but can't attribute to teams/products.

Stage 2 · Visibility

Every dollar attributed

Keys mapped, workspaces aligned, monthly reviews, limits set. Target: 2-4 weeks.

Stage 3 · Optimization

Actively reducing waste

Caching, routing, batch, seat right-sizing. Target: 2-3 months.

Stage 4 · Governance

Self-managing through process

Automated alerts, budget owners, routine reviews, playbook. Target: 4-6 months.

Getting Started

Day 1: Set spend limits (not Unlimited)
Week 1: Audit keys, disable unused, rename vague
Week 1: Export Usage CSVs as baseline
Week 2: Investigate spiky days, create named workspaces
Week 3: Review seat utilization, evaluate caching
Month 2: Document routing policy, set up alerts
Month 3: Write the FinOps playbook

Learn · Reference

Glossary

MTok

Million tokens. Standard API billing unit. 1 MTok ≈ 750K words.

Input Tokens

Everything sent: prompt, documents, history, tool defs.

Output Tokens

Everything generated: response, thinking, tool calls. 5x the input price at current list prices.

Context Window

Max tokens per request. 200K (Team), 500K (Enterprise), 1M (Opus 4.6).

ITPM / OTPM

Input/Output Tokens Per Minute. Rate limit metrics in Console.

TTL

Time To Live. Cache expiry. Default 5 min.

Prompt Caching

Stores repeated content. Reads cost ~10% of standard input.

Batch API

50% cheaper. Results within 24h. Non-urgent workloads.

Workspace

API key container in Console. Group/filter for cost attribution.

Admin API Key

sk-ant-admin... format. Org-level usage/cost data. Admin-only.

Spend Limit

Per-user dollar cap beyond subscription. In Console Settings.

Fast Mode

Opus only. 6x rates ($30/$150). Lower latency.

max_tokens

API param capping response length. Essential for output cost control.

Model Routing

Directing tasks to cheapest adequate model. Haiku→Sonnet→Opus.

Cache Hit Rate

% of input tokens read from cache. Higher = cheaper.

Extra Usage

Pay-per-token charges when subscription limits are exceeded.

Pricing data verified against claude.com/pricing and platform.claude.com/docs on April 10, 2026. Anthropic may change pricing at any time. Always verify before making purchasing decisions.
