Estimate your token spend
Choose a preset (optional), then move the sliders. Results update automatically and you can save snapshots for later.
Estimate how much your AI API usage will cost from input tokens, output tokens, and pricing per 1M tokens. Get a clear view of cost per request, daily/monthly/yearly spend, and whether you’re on track with your budget.
Most AI APIs price tokens separately for input and output. This calculator treats one request as a pair: the tokens you send plus the tokens you receive. If your provider bills per 1M tokens, the per-request cost is:
Cost per request = (inputTokens ÷ 1,000,000 × priceIn) + (outputTokens ÷ 1,000,000 × priceOut)
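In code, that formula is a one-liner. A minimal sketch in TypeScript; the function name and parameters are illustrative, not any provider's SDK:

    // Cost of one request, given prices in dollars per 1M tokens.
    function costPerRequest(
      inputTokens: number,
      outputTokens: number,
      priceInPer1M: number,
      priceOutPer1M: number,
    ): number {
      return (inputTokens / 1_000_000) * priceInPer1M
        + (outputTokens / 1_000_000) * priceOutPer1M;
    }

    // e.g. costPerRequest(2000, 800, 5, 15) ≈ $0.022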
Two practical additions make this much closer to real life: a cache hit rate, which discounts the repeated input context you resend on every request, and an overhead multiplier, which accounts for retries, tool calls, and other extra traffic. Both are covered in detail further down; in code they bolt straight onto the base formula, as sketched below.
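A minimal sketch, assuming the simplest cache model (the hit rate discounts input tokens proportionally; real cached pricing varies by provider) and matching how this calculator applies the knobs: cache on input only, overhead on the total.

    // cacheHitRate in [0, 1]: share of input context served from cache.
    // overhead >= 1: retries, tool calls, and other extra traffic.
    function adjustedCostPerRequest(
      inputTokens: number,
      outputTokens: number,
      priceInPer1M: number,
      priceOutPer1M: number,
      cacheHitRate = 0,
      overhead = 1.1,
    ): number {
      const effectiveInput = inputTokens * (1 - cacheHitRate);
      const base = (effectiveInput / 1_000_000) * priceInPer1M
        + (outputTokens / 1_000_000) * priceOutPer1M;
      return base * overhead;
    }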
After you have cost per request, forecasting is simple multiplication: daily spend is cost per request × requests per day, monthly is roughly daily × 30, and yearly roughly daily × 365.
Finally, with a monthly budget you can invert the math to estimate “how many requests can I afford?” This is the most useful question for founders and creators because it turns pricing into a product decision.
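Both directions in one sketch. The $500 budget is an assumed example figure; the per-request cost is the $0.0242 worked out in the next paragraph:

    // Forecast: per-request cost and daily volume to monthly spend.
    function monthlyCost(costPerReq: number, requestsPerDay: number, days = 30): number {
      return costPerReq * requestsPerDay * days;
    }

    // Inversion: how many requests fit in a monthly budget?
    function affordableRequests(budget: number, costPerReq: number): number {
      return Math.floor(budget / costPerReq);
    }

    // e.g. affordableRequests(500, 0.0242) ≈ 20,661 requests per month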
A small chat endpoint that sends a short prompt (2,000 input tokens) and receives a medium answer (800 output tokens). At $5 / 1M input tokens and $15 / 1M output tokens, the raw cost per request is: (2000/1M×5) + (800/1M×15) = $0.010 + $0.012 = $0.022. Add overhead 1.10 → $0.0242.
If you paste long context into each request (say 20,000 input tokens) and generate 1,500 output tokens, the same pricing gives (20000/1M×5) + (1500/1M×15) = $0.100 + $0.0225 = $0.1225 per request, more than five times the small-prompt cost. This is where caching and summarization pay for themselves.
A creator tool running 10,000 requests/day at $0.01 per request is roughly $3,000/month (at 30 days). Small per-request reductions matter a lot at volume.
Try this: move the output slider down by 20% and watch the monthly cost. Output control is often the easiest win.
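A quick check of that experiment with the first example's numbers, reusing costPerRequest from the sketch above:

    const before = costPerRequest(2000, 800, 5, 15); // ≈ $0.0220 per request
    const after = costPerRequest(2000, 640, 5, 15);  // 20% fewer output tokens, ≈ $0.0196
    console.log((1 - after / before) * 100);         // ≈ 10.9% cheaper per request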
Token pricing is deceptively simple: “pay per token.” But the hard part is that tokens depend on product design. Your UI, your prompts, your memory strategy, and your reliability systems all change the final bill. If you’re building anything with AI—an app, a Chrome extension, a newsletter helper, a content studio—this page is meant to help you translate fuzzy “AI cost” into concrete product levers.
The first lever is how much text you send. Every request includes more than the user’s last message: system instructions, formatting templates, tool definitions, and often conversation history. If you’re not careful, “history” grows linearly with time and turns into a cost snowball. A common pattern is to switch from “send full history” to “send a rolling summary + last few turns.” That one change can reduce input tokens dramatically without harming quality.
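A sketch of that pattern in TypeScript; summarize is a hypothetical helper (in practice, a cheap model call), and the Turn shape is illustrative:

    type Turn = { role: "user" | "assistant"; text: string };

    // Send a compact summary of old turns plus the last few verbatim,
    // instead of resending the full history on every request.
    function buildContext(
      history: Turn[],
      summarize: (turns: Turn[]) => string,
      keepLast = 4,
    ): string {
      const older = history.slice(0, -keepLast);
      const recent = history.slice(-keepLast);
      const summary =
        older.length > 0 ? "Summary of earlier conversation: " + summarize(older) : "";
      const tail = recent.map((t) => t.role + ": " + t.text).join("\n");
      return [summary, tail].filter(Boolean).join("\n\n");
    }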
The second lever is how much text you receive. Output tokens are usually more expensive than input tokens. A model that writes 400 words instead of 200 words can double the output bill. The fix is rarely “make the model worse.” It’s “make the model precise.” Use explicit format constraints (bullets, word limits, JSON schemas), set max output tokens, and ask for the minimum that still accomplishes the user goal. The best prompt isn’t the longest; it’s the one that produces the most usable output per token.
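One way to encode that, sketched against a hypothetical client interface; any real SDK with a max-output-tokens option works the same way, and none of these names come from a specific provider:

    interface ModelClient {
      complete(opts: { system: string; user: string; maxOutputTokens: number }): Promise<string>;
    }

    async function answerBriefly(client: ModelClient, question: string): Promise<string> {
      return client.complete({
        // Format constraints do most of the work: bullets plus a word budget.
        system: "Answer in at most 5 bullets, 15 words each. No preamble, no recap.",
        user: question,
        maxOutputTokens: 300, // hard ceiling on billable output tokens
      });
    }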
The third lever is routing: not every task needs your strongest model. Many apps use a small model to classify intent (“is this a billing question?”), then route to a bigger model only when necessary. You can also do “draft then polish”: a smaller model drafts a response and a bigger model edits it. This can keep quality high while controlling spend.
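A routing sketch, reusing the hypothetical ModelClient above; the one-word classifier call is itself cheap because its output is capped:

    async function routedAnswer(
      small: ModelClient, // cheap model: classify (and draft)
      large: ModelClient, // strong model: only when needed
      question: string,
    ): Promise<string> {
      const intent = await small.complete({
        system: "Classify the question as SIMPLE or COMPLEX. Reply with one word.",
        user: question,
        maxOutputTokens: 3,
      });
      const model = intent.trim().toUpperCase() === "COMPLEX" ? large : small;
      return model.complete({
        system: "Answer concisely.",
        user: question,
        maxOutputTokens: 400,
      });
    }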
Next is caching and reuse. If your app uses a fixed template—like a branded system prompt, or a static “policy” block—those tokens are the same across requests. Some providers offer prompt caching; even if they don’t, you can cache results (for repeated questions, or repeated document chunks) to reduce calls. In this calculator, the cache hit rate applies only to input tokens because caching typically reduces the repeated context you send, not the new output the model generates.
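Result reuse is the part you can always build yourself, whatever your provider offers. A sketch of a naive in-memory response cache, again using the hypothetical ModelClient; a real app would want TTLs and smarter normalization:

    const responseCache = new Map<string, string>();

    async function cachedAnswer(client: ModelClient, question: string): Promise<string> {
      const key = question.trim().toLowerCase(); // naive normalization
      const hit = responseCache.get(key);
      if (hit !== undefined) return hit; // repeated question: zero tokens spent
      const answer = await client.complete({
        system: "Answer concisely.",
        user: question,
        maxOutputTokens: 400,
      });
      responseCache.set(key, answer);
      return answer;
    }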
Finally, there’s real-world overhead. Production systems are messy: users retry, network calls fail, your server times out, safety filters re-run, tool calls add extra messages, and observability/logging may call additional endpoints. That’s why the overhead multiplier exists. If you’re early-stage, use 1.10 as a starting point. If you have heavy tool usage or multi-step flows, 1.25–1.60 can be more realistic. The goal is not perfect accuracy; it’s avoiding the “we launched and the bill surprised us” moment.
Once you have a per-request cost, turn it into a product constraint. For example, at $0.0242 per request, a free tier capped at 100 requests per month costs about $2.42 per free user; that number should inform your rate limits, your paid-tier price, and which features stay behind the paywall.
A healthy cost model is not “zero cost.” It’s “cost scales with value.” Your best users should be worth more than they cost, and your pricing should reflect the token reality. Use this tool to test scenarios: increase output length, add memory, introduce a tool call, add retries—and see what happens. The numbers will tell you where to optimize first.
Note: This tool does not know your provider’s exact billing nuances (minimums, rounding, cached pricing rules, or discounted tiers). Treat it as a decision calculator, then verify with provider docs for final billing.
Many AI APIs price input and output tokens differently. Separating them helps you see which side drives cost and which optimization will have the biggest impact.
Start with rough numbers, then instrument your app. Most providers return token usage in responses or logs. Measure a representative sample (10–50 requests) per feature and update your sliders.
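A sketch of that measurement step; the Usage shape is illustrative, since each provider reports token counts under its own field names:

    type Usage = { inputTokens: number; outputTokens: number };

    // Average a representative sample (10–50 requests) per feature,
    // then feed the averages back into the sliders.
    function averageUsage(sample: Usage[]): Usage {
      const total = sample.reduce(
        (acc, u) => ({
          inputTokens: acc.inputTokens + u.inputTokens,
          outputTokens: acc.outputTokens + u.outputTokens,
        }),
        { inputTokens: 0, outputTokens: 0 },
      );
      return {
        inputTokens: Math.round(total.inputTokens / sample.length),
        outputTokens: Math.round(total.outputTokens / sample.length),
      };
    }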
For the overhead multiplier, use 1.05–1.15 for a simple single-call endpoint and 1.20–1.60 for multi-step agent flows with tools and retries. If you’re unsure, pick 1.10, then revisit after you have production logs.
Caching usually does not reduce output costs. It typically reduces repeated input context; output still depends on the user’s request and the model’s response. This calculator applies the cache rate only to input tokens.
Reduce unnecessary context, cap max output tokens, use a smaller model for routing/drafts, and summarize history. The best results come from product design, not from “cheaper prompts.”
If you’re shipping AI features, make cost control visible in your product and your engineering: log token usage per feature, track cost per request alongside your other metrics, and alert when spend drifts from your monthly budget.
Want virality? Share a screenshot of your “cost per request” and ask: “How low can you get this without losing quality?”
MaximCalculator builds fast, human-friendly tools. Always treat estimates as guidance, and verify real billing with your provider.