If you're building tools with Claude's API — whether that's a marketing analytics assistant, a campaign naming correction workflow, or anything that connects AI to your data — you've probably heard people throw around the term "tokens." But what are tokens, really? And more importantly, what makes your API bill go up or down?
This post walks through everything I've learned building AI-powered marketing tools that connect Claude to BigQuery and dbt. We'll cover how token pricing works, why agentic workflows cost more, what MCP servers are, what system prompts and .md files actually do, and how to architect your app so you're not burning money on every request.
What Are Tokens and Why Do They Determine Cost?
Tokens are the fundamental unit that language models like Claude process. Think of them as the "atoms" of text from the model's perspective. A token is roughly three to four characters of English text, or about three-quarters of a word. The sentence "What was our total ad spend last month?" is roughly 10 tokens. Short words like "the" or "is" are usually one token, while longer words like "impressions" might be two or three.
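You don't need the real tokenizer to reason about budgets. Here's a back-of-envelope sketch assuming the roughly-four-characters-per-token rule above (`estimate_tokens` is a made-up helper for illustration, not an SDK function):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters of English text per token."""
    return max(1, round(len(text) / 4))

question = "What was our total ad spend last month?"
print(estimate_tokens(question))  # 10 -- matches the ~10-token estimate above
```

For exact counts, the Anthropic API also exposes a token-counting endpoint, but the heuristic is usually close enough for cost planning.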
Every time you make an API call to Claude, two things get counted separately. First, the input tokens — this is everything you send to Claude, including your instructions, the user's question, any conversation history, and any documents or data you attach. Second, the output tokens — everything Claude generates in its response back to you.
Here's the critical part that trips people up: output tokens cost significantly more than input tokens. For Claude Sonnet 4.5, the pricing breaks down to $3 per million input tokens and $15 per million output tokens. That's a 5x multiplier on the output side, which makes intuitive sense when you think about it — generating new text requires more computational work than reading existing text.
This distinction matters more than most people realize. If you're building a tool where Claude returns large tables of data or writes lengthy explanations, the output token cost is where your bill lives. Asking Claude to be concise in your instructions isn't just a UX preference — it's a direct cost optimization strategy.
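The input/output asymmetry is easy to see in a small cost function using the Sonnet 4.5 list prices quoted above (a sketch, not an official billing formula):

```python
# Sonnet 4.5 list prices (USD per million tokens), as quoted above.
INPUT_PRICE = 3.00 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single API call."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Same 2,100 total tokens, different split: output-heavy calls cost ~4x more.
print(request_cost(2_000, 100))   # mostly input:  $0.0075
print(request_cost(100, 2_000))   # mostly output: $0.0303
```

This is why "be concise" in your instructions pays off directly: trimming output tokens saves five times as much per token as trimming input.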
What Makes Costs Go Up or Down?
There are several levers you can pull, and understanding each one helps you make smarter architectural decisions when building your tool.
Model choice is the single biggest lever. Anthropic offers three tiers of Claude models, each with different capability and cost profiles.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Opus 4.5 | $5 | $25 |
| Sonnet 4.5 | $3 | $15 |
| Haiku 4.5 | $1 | $5 |
For most marketing data queries — things like writing SQL and summarizing results — Sonnet is the sweet spot. It's smart enough to handle the work without the premium you'd pay for Opus.
System prompt length is something people consistently overlook, and it deserves its own section (which we'll get to below). The key thing to understand right now is that your system prompt gets sent with every single API request. If you've written a 3,000-token system prompt full of schema definitions and instructions, that's 3,000 input tokens charged on every call your users make. Across 10,000 monthly queries, that's 30 million tokens just in system prompt overhead.
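The arithmetic from that paragraph is worth doing explicitly. Using the numbers above (a 3,000-token system prompt, 10,000 monthly queries, Sonnet input pricing):

```python
SYSTEM_PROMPT_TOKENS = 3_000
MONTHLY_QUERIES = 10_000
INPUT_PRICE_PER_M = 3.00  # Sonnet 4.5, USD per million input tokens

overhead_tokens = SYSTEM_PROMPT_TOKENS * MONTHLY_QUERIES
overhead_cost = overhead_tokens / 1_000_000 * INPUT_PRICE_PER_M
print(overhead_tokens)  # 30,000,000 tokens of pure system prompt
print(overhead_cost)    # $90.00 per month before any user question is counted
```

Ninety dollars a month in fixed overhead, before a single user question or model response is billed. Trimming that prompt to 1,000 tokens cuts the overhead to $30.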
Conversation history is where costs can silently snowball. This is one of the most important things to understand about how the API works. The model has no memory between calls, so your application has to send the full history every time for Claude to have context.
To visualize this, consider the "Star Wars ticket" analogy:
Imagine if every time a new Star Wars movie came out, you were physically incapable of understanding it unless you re-watched every single previous movie first.
- Movie 1 (A New Hope): You watch it. You pay for one ticket. Easy.
- Movie 2 (The Empire Strikes Back): To watch this, the theater forces you to watch A New Hope again first. You pay for two tickets.
- Movie 3 (Return of the Jedi): Now you have to watch Episodes 4 and 5 before they'll let you see Episode 6. You're paying for three tickets.
By the time you get to The Rise of Skywalker (the 9th turn in your conversation), you are paying for nine tickets just to see one new movie.
That means the fifth message in a conversation carries the full weight of the previous four exchanges as input. By turn ten, you might be sending 10,000+ input tokens before the user's actual question even factors in.
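A toy simulation makes the snowball visible. The per-turn sizes here are assumptions for illustration (a 2,000-token system prompt, ~100-token questions, ~900-token answers), but the shape of the growth is what matters:

```python
def cumulative_input_tokens(turns: int, system: int = 2_000,
                            user: int = 100, assistant: int = 900) -> list[int]:
    """Input tokens billed at each turn when the full history is resent."""
    billed = []
    for turn in range(1, turns + 1):
        history = (turn - 1) * (user + assistant)  # every prior exchange
        billed.append(system + history + user)     # plus this turn's question
    return billed

per_turn = cumulative_input_tokens(10)
print(per_turn[0])   # turn 1:  2,100 input tokens
print(per_turn[-1])  # turn 10: 11,100 input tokens -- the 10,000+ figure above
```

Note that the growth is linear per turn but the *cumulative* bill across a conversation grows quadratically, which is exactly the nine-tickets problem from the analogy.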
Attached data — whether that's documents, images, or query results — consumes input tokens. A BigQuery result set that returns 30 rows of daily spend data might be 1,500 to 2,000 tokens. A single image can run into thousands of tokens. This is especially relevant for marketing tools where you're passing campaign performance data back through Claude for summarization and interpretation.
Prompt caching is Anthropic's built-in cost optimization for repeated content, and it's a big deal for tool builders. If part of your input stays the same across requests (like a system prompt or schema definition), you can cache it. The first request pays a small premium to write to cache, but every subsequent request reads from cache at 90% off the normal input price. For tools where every request shares the same base context — which describes most marketing analytics tools — this is an enormous savings opportunity.
A Real Example: How Much Does a Marketing Data Query Cost?
Let's make this concrete with a scenario that's directly relevant to marketing teams. Say you've built a tool where team members can ask natural language questions about campaign performance, and Claude queries BigQuery to get the answer. The response might be a single sentence, or it might be a full table of spend broken out by day and channel.
A simple question like "What was our total spend last month?" involves roughly 2,000 input tokens for the system prompt and schema context, another 30 to 50 tokens for the user's question, and maybe 100 output tokens for the SQL Claude writes. BigQuery returns a single number, Claude wraps it in a sentence, and the final output is around 30 tokens. Total cost: a bit under a penny.
Now consider a heavier question like "Show me daily spend by channel for the past 30 days." The system prompt is the same 2,000 tokens, the question is similar in length, but the BigQuery result set might be 1,500 to 2,000 tokens of tabular data coming back in. Claude then formats a table and adds some commentary, producing maybe 800 to 1,000 output tokens. Total cost: two to three cents.
So between the simplest and most complex queries, you're looking at roughly a 3x to 6x cost difference. But in absolute terms, even the expensive queries are pennies. At 1,000 queries per month with an average cost of about a cent and a half each, you're at roughly $15 per month in API costs. Even at 10,000 queries, that's $150. For a tool that gives an entire marketing team self-service access to their data, that's remarkably affordable.
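A quick budget helper ties those numbers together (the cent-and-a-half average is the assumption from the paragraph above, not a measured figure):

```python
def monthly_bill(queries_per_month: int, avg_cost_usd: float = 0.015) -> float:
    """Projected monthly API spend at an assumed average cost per query."""
    return queries_per_month * avg_cost_usd

print(monthly_bill(1_000))   # $15.00 per month
print(monthly_bill(10_000))  # $150.00 per month
```

Swap in your own observed average once you have real usage data; the structure of the estimate stays the same.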
Why Agentic Workflows Cost So Much More
You've probably seen the buzz around AI agents — systems where Claude doesn't just answer a question, but takes a series of actions autonomously. Think about an agent that reads your email, drafts a response, checks your calendar for conflicts, and then sends the reply. Or the stock trading bots and research agents people are building. These agentic workflows are fundamentally different from a simple question-and-answer exchange, and the cost difference is substantial. Understanding why helps you appreciate where your own tool sits on the cost spectrum.
The key is that compounding conversation history problem we discussed earlier. In an agentic workflow, each step is a separate API call, and every call includes the full history of everything that came before it. Step one sends the system prompt and the user's request, and Claude decides which tool to call. Step two sends everything from step one plus the results of the first tool call, and Claude reasons about what to do next. Step three sends everything from steps one and two plus the new results. By step eight or ten, you're sending a massive input payload with every single call — and paying for all of it.
A task that chains 10 tool calls together might cost $0.50 to $2.00 or more per execution. A complex research or trading agent that runs 15 to 20+ steps with lots of context passing through can run $1.00 to $5.00 or higher. Compare that to your marketing data query at a penny or two.
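You can model that compounding with a simple loop. The per-step numbers here are illustrative assumptions (each step's tool results and reasoning add ~4,000 tokens of history, each response is ~300 output tokens), but they land in the cost ranges quoted above:

```python
def agent_run_cost(steps: int, base_context: int = 2_000,
                   tokens_per_step: int = 4_000, output_per_step: int = 300,
                   in_price: float = 3e-6, out_price: float = 15e-6) -> float:
    """USD cost of an agent loop where every step resends the full history."""
    total = 0.0
    history = base_context
    for _ in range(steps):
        total += history * in_price + output_per_step * out_price
        history += tokens_per_step  # tool results + reasoning accumulate
    return total

print(f"${agent_run_cost(2):.2f}")   # $0.03 -- a two-step data query
print(f"${agent_run_cost(15):.2f}")  # $1.42 -- a 15-step research agent
```

The input cost grows quadratically with step count, which is why a 15-step agent isn't 7x the cost of a 2-step workflow but closer to 50x.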
The good news is that a typical marketing data tool has a lean workflow. A user asks a question, Claude writes SQL, BigQuery returns results, Claude summarizes the answer. That's roughly two round trips — not ten or twenty. You're firmly on the affordable end of the agent cost spectrum. Where costs could creep up for you is if you allow multi-turn conversations where users ask follow-up after follow-up, because each turn carries the full weight of everything before it.
