If you're building tools with Claude's API, whether that's a marketing analytics assistant, a campaign naming correction workflow, or anything that connects AI to your data, you've probably heard people throw around the term "tokens." But what are tokens, really? And more importantly, what makes your API bill go up or down?
This post walks through everything I've learned building AI-powered marketing tools that connect Claude to BigQuery and dbt. We'll cover how token pricing works, why agentic workflows cost more, what MCP servers are, what system prompts and .md files actually do, and how to architect your app so you're not burning money on every request.
What Are Tokens and Why Do They Determine Cost?
Tokens are the fundamental unit that language models like Claude process: think of them as the "atoms" of text from the model's perspective. A token is roughly three to four characters of English text, or about three-quarters of a word. The sentence "What was our total ad spend last month?" is roughly 10 tokens. Short words like "the" are usually one token, while longer words like "impressions" might be two or three.
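For budgeting purposes, the "roughly four characters per token" rule is enough to estimate costs before you ever make a call. Here's a minimal sketch (the function name and the divide-by-4 heuristic are my own assumptions, not an official tokenizer; the API offers an exact token-counting endpoint when you need precision):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters of English per token.

    A back-of-envelope heuristic for budgeting only; use the API's
    token-counting endpoint when you need an exact count.
    """
    return max(1, round(len(text) / 4))

question = "What was our total ad spend last month?"
print(estimate_tokens(question))  # ~10 tokens, matching the example above
```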
Every API call to Claude counts two things separately:
- Input tokens: everything you send to Claude, including your instructions, the user's question, any conversation history, and any documents or data you attach.
- Output tokens: everything Claude generates in its response back to you.
Output tokens cost significantly more than input tokens. For Claude Sonnet 4.5, the pricing breaks down to $3 per million input tokens and $15 per million output tokens, a 5x multiplier on the output side. Generating new text requires more computational work than reading existing text. If you're building a tool where Claude returns large tables or writes lengthy explanations, the output token cost is where your bill lives.
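Using the Sonnet 4.5 prices above, the cost of any single request is simple arithmetic. A sketch (function name is hypothetical; prices are the list rates quoted above):

```python
# Sonnet 4.5 list prices, in dollars per million tokens
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at Sonnet 4.5 pricing."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# A 1,000-token prompt with a 500-token answer: the 500 output tokens
# cost more than twice as much as the 1,000 input tokens.
print(f"${request_cost(1_000, 500):.4f}")  # $0.0105
```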
What Makes Costs Go Up or Down?
There are several levers you can pull, and understanding each one helps you make smarter architectural decisions.
Model Choice
The single biggest lever. Anthropic offers three tiers of Claude models, each with different capability and cost profiles:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Opus 4.5 | $5 | $25 |
| Sonnet 4.5 | $3 | $15 |
| Haiku 4.5 | $0.80 | $4 |
For most marketing data queries, things like writing SQL and summarizing results, Sonnet is the sweet spot. It's smart enough to handle the work without the premium you'd pay for Opus.
System Prompt Length
Something people consistently overlook: your system prompt gets sent with every single API request. If you've written a 3,000-token system prompt full of schema definitions and instructions, that's 3,000 input tokens charged on every call your users make. Across 10,000 monthly queries, that's 30 million tokens just in system prompt overhead. Prompt caching (covered below) is the solution.
Conversation History
This is where costs can silently snowball. The model has no memory between calls, so your application has to send the full conversation history every time for Claude to have context. Think of it like a movie theater with a strange policy:
- Movie 1 (A New Hope): You watch it. You pay for one ticket.
- Movie 2 (The Empire Strikes Back): To watch this, the theater forces you to watch A New Hope again first. You pay for two tickets.
- Movie 3 (Return of the Jedi): Now you have to watch Episodes 4 and 5 before they'll let you see Episode 6. Three tickets.
By the time you get to The Rise of Skywalker (the 9th turn in your conversation), you are paying for nine tickets just to see one new movie. In API terms: the fifth message in a conversation carries the full weight of the previous four exchanges. By turn ten, you might be sending 10,000+ input tokens before the user's actual question even factors in.
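The movie-ticket effect can be put in numbers. Assuming each turn adds roughly the same amount of text (the per-turn size here is an illustrative assumption), total billed input grows quadratically with conversation length:

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int = 1_000) -> int:
    """Total input tokens billed across a conversation where every
    request re-sends the full history (the 'movie ticket' effect).

    Turn 1 sends 1 unit, turn 2 re-sends turn 1 plus its own message,
    turn n sends n units, so the total is 1 + 2 + ... + n units.
    """
    return sum(n * tokens_per_turn for n in range(1, turns + 1))

# A 10-turn chat bills 55,000 input tokens, not 10,000 --
# 5.5x what the per-turn sizes alone would suggest.
print(cumulative_input_tokens(10))  # 55000
```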
Attached Data
Documents, images, or query results all consume input tokens. A BigQuery result set that returns 30 rows of daily spend data might be 1,500 to 2,000 tokens. A single image can run into thousands of tokens. This is especially relevant for marketing tools where you're passing campaign performance data back through Claude for summarization and interpretation.
Prompt Caching
Prompt caching is Anthropic's built-in cost optimization for repeated content, and it's a big deal for tool builders. If part of your input stays the same across requests (like a system prompt or schema definition), you can cache it. The first request pays a small premium to write to the cache, but every subsequent request reads from the cache at 90% off the normal input price. For tools where every request shares the same base context, this is an enormous savings opportunity.
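In the Messages API, you opt into caching by marking the stable prefix of your request, typically the system prompt, with a `cache_control` block. A sketch of the request payload shape (shown as a plain dict rather than an SDK call; the model alias and the placeholder prompt text are assumptions for illustration):

```python
# Mark the long, stable system prompt as cacheable so repeat requests
# read it at the discounted cache rate instead of full input price.
LONG_SYSTEM_PROMPT = (
    "You are a marketing analytics assistant. "
    "Here are the BigQuery schema definitions... "  # imagine ~3,000 tokens here
)

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Cache breakpoint: everything up to and including this block
            # is written to cache on the first request and read back at
            # the discounted rate on subsequent requests.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "What was our total spend last month?"}
    ],
}
```

The user message below the breakpoint changes on every request and is billed normally; only the stable prefix above it benefits from the cache.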
A Real Example: How Much Does a Marketing Data Query Cost?
Let's make this concrete with a scenario directly relevant to marketing teams: a tool where team members ask natural language questions about campaign performance, and Claude queries BigQuery to get the answer.
| Query Type | Example | Approx. Cost |
|---|---|---|
| Simple | "What was our total spend last month?" | ~$0.005 |
| Heavier | "Show me daily spend by channel for the past 30 days" | ~$0.02–$0.03 |
Between the simplest and most complex queries, you're looking at roughly a 4x to 6x cost difference, but in absolute terms, even the expensive queries are pennies. At 1,000 queries per month with an average cost of about a cent and a half each, you're at roughly $15 per month in API costs. At 10,000 queries, that's $150. For a tool that gives an entire marketing team self-service access to their data, that's remarkably affordable.
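The monthly figures above fall out of straight multiplication. A sketch using the numbers from the table (the function name and the 1.5-cent average are assumptions drawn from the estimates above):

```python
def monthly_cost(queries_per_month: int, avg_cost_per_query: float) -> float:
    """Back-of-envelope monthly API bill."""
    return queries_per_month * avg_cost_per_query

# Using the ~1.5 cents/query average from the examples above
print(f"${monthly_cost(1_000, 0.015):.2f}")   # $15.00/month
print(f"${monthly_cost(10_000, 0.015):.2f}")  # $150.00/month
```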
Why Agentic Workflows Cost So Much More
AI agents (systems where Claude takes a series of actions autonomously, like reading your email, checking your calendar, and drafting a reply) are fundamentally different from a simple Q&A exchange, and the cost difference is substantial.
The key is that compounding conversation history problem. In an agentic workflow, each step is a separate API call, and every call includes the full history of everything before it:
- Step 1: Sends system prompt + user request. Claude picks a tool.
- Step 2: Sends everything from step 1 + tool results. Claude decides what to do next.
- Step 3: Sends everything from steps 1 and 2 + new results. And so on.
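The loop above can be sketched as a cost model. Because every step re-sends all previous steps, total input grows quadratically with step count; the per-step token sizes here are illustrative assumptions, not measured values:

```python
INPUT_PRICE_PER_MTOK = 3.00    # Sonnet 4.5 input price
OUTPUT_PRICE_PER_MTOK = 15.00  # Sonnet 4.5 output price

def agent_run_cost(steps: int,
                   base_tokens: int = 3_000,      # system prompt + request
                   tokens_per_step: int = 3_000,  # tool call + tool results
                   output_per_step: int = 500) -> float:
    """Estimated dollar cost of an agent loop where every step
    re-sends the full history of all previous steps."""
    # Step n's input is the base context plus n earlier steps of history.
    total_input = sum(base_tokens + n * tokens_per_step for n in range(steps))
    total_output = steps * output_per_step
    return (total_input * INPUT_PRICE_PER_MTOK
            + total_output * OUTPUT_PRICE_PER_MTOK) / 1_000_000

print(f"10-step agent: ${agent_run_cost(10):.2f}")
```

With these assumed sizes a 10-step run lands in the ballpark of the $0.50-$2.00 range quoted below; bulkier tool results push it up fast, since they get re-billed at every subsequent step.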
A task that chains 10 tool calls together might cost $0.50 to $2.00 per execution. A complex research or trading agent running 15 to 20+ steps can run $1.00 to $5.00 or higher. Compare that to your marketing data query at a penny or two.
Marketing data tools stay lean
A typical marketing data tool has a lean workflow: user asks a question → Claude writes SQL → BigQuery returns results → Claude summarizes. That's roughly two round trips, not ten or twenty. Costs can creep up if you allow multi-turn conversations, since each follow-up question carries the full weight of everything before it.
