# Token Management
Last updated: Jan 2026
## Overview
Tokens are the building blocks of AI model interactions. Understanding how tokens work helps you optimize performance, stay within limits, and control costs.
Key concepts to understand: Input Tokens (what you send to the model), Output Tokens (what the model generates), Context Limit (maximum total tokens), and Cost (credits charged per token).
## Understanding Tokens
Tokens are the pieces of text that models process. They are not exactly words: a token might be a whole word, part of a word, or a punctuation mark.
### Token Examples
"hello"= 1 token"authentication"= 2-3 tokens"Hello, world!"= 4 tokensfunction() {}= 5+ tokensRule of Thumb
For English text: ~4 characters = 1 token, or ~0.75 words = 1 token. Code and non-English text often use more tokens per word.
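The rule of thumb above can be turned into a rough estimator. This is a minimal sketch; `estimate_tokens` is a hypothetical helper based only on the ~4-characters-per-token heuristic, not a real tokenizer, so use your model provider's tokenizer when you need exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (~4 characters per token).

    Heuristic only: real tokenizers, code, and non-English text
    can differ significantly from this estimate.
    """
    return max(1, round(len(text) / 4))

# Rough estimates, not exact tokenizer output:
print(estimate_tokens("hello"))          # ~1
print(estimate_tokens("Hello, world!"))  # ~3
```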
## Context Limits
Each model has a maximum context window - the total tokens for input plus output combined.
| Model | Context Window | Max Output |
|---|---|---|
| Claude Opus 4.5 | 200K tokens | 4K tokens |
| Claude Sonnet 4.5 / 4 | 200K tokens | 4K tokens |
| Claude Haiku 4.5 | 200K tokens | 4K tokens |
| GPT-5.2 / GPT-5 Mini | 128K tokens | 16K tokens |
| GPT-4o / GPT-4o Mini | 128K tokens | 4K tokens |
| o3 / o4-mini (reasoning) | 128K tokens | 16K tokens |
| Gemini 2.5 Pro | 1M tokens | 8K tokens |
| Gemini 2.5 / 2.0 Flash | 1M tokens | 8K tokens |
### Context Overflow
If input + output exceeds the context window, the request will fail. Monitor your token usage and truncate long inputs if needed.
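A pre-flight check for overflow can be sketched as below. The 128K window value is taken from the table above; `fits_context` is a hypothetical helper, and in practice you would compare an estimated input token count plus your requested maximum output against the window of the model you are calling.

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Return True if input plus requested output fits in the window."""
    return input_tokens + max_output_tokens <= context_window

# Example: a 128K-token context window (see the table above)
CONTEXT_WINDOW = 128_000

print(fits_context(120_000, 4_000, CONTEXT_WINDOW))  # True
print(fits_context(126_000, 4_000, CONTEXT_WINDOW))  # False -> truncate input
```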
## Optimization Strategies
These strategies reduce token usage without sacrificing quality.
- Concise prompts: Remove redundant words. "Summarize this" not "I would like you to please summarize".
- Truncate long inputs: For long documents, extract relevant sections before sending.
- Use smaller models: Simple tasks don't need large models. Haiku 4.5 or GPT-4o Mini are efficient choices.
```text
# Before (verbose) - 45 tokens
I would like you to please take a look at the following
customer email and provide me with a comprehensive summary
of what the customer is asking about.

# After (concise) - 15 tokens
Summarize this customer email in 2 sentences:

# Savings: 30 tokens per request = 67% reduction
```

### Truncation Tips
- Keep first and last paragraphs (often contain key info)
- Remove boilerplate (headers, footers, signatures)
- Extract only relevant sections for the task
- Summarize long sections before detailed analysis
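The first tip above (keep first and last paragraphs) can be sketched as a simple truncation helper. `truncate_document` is a hypothetical function that assumes paragraphs are separated by blank lines; it is a starting point, not a complete solution.

```python
def truncate_document(text: str, max_chars: int) -> str:
    """Keep the first and last paragraphs when a document is too long.

    Assumes paragraphs are separated by blank lines. A sketch of the
    'keep first and last paragraphs' tip; real use would combine this
    with boilerplate removal and section extraction.
    """
    if len(text) <= max_chars:
        return text
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return text[:max_chars]
    return paragraphs[0] + "\n\n[...]\n\n" + paragraphs[-1]

doc = "Intro with key info.\n\nLong middle section...\n\nConclusion."
print(truncate_document(doc, 30))
```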
## Monitoring Usage
Track token usage to understand costs and optimize workflows.
- Execution Details: Each AI node execution shows input tokens, output tokens, and total credit cost in the execution pane.
- Usage Dashboard: View aggregated token usage by workflow, time period, and model in the analytics dashboard.
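The kind of aggregation a usage dashboard performs can be sketched as below. `UsageTracker` is a hypothetical class for illustration; the per-call token counts would come from each execution's details (field names vary by provider SDK).

```python
from collections import defaultdict

class UsageTracker:
    """Aggregate per-call token counts by workflow, as a dashboard might.

    Hypothetical sketch: per-call counts would come from the provider's
    API response or the execution pane, not be hard-coded.
    """
    def __init__(self):
        self.totals = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, workflow: str, input_tokens: int, output_tokens: int):
        self.totals[workflow]["input"] += input_tokens
        self.totals[workflow]["output"] += output_tokens

    def total_tokens(self, workflow: str) -> int:
        t = self.totals[workflow]
        return t["input"] + t["output"]

tracker = UsageTracker()
tracker.record("summarize-emails", input_tokens=450, output_tokens=120)
tracker.record("summarize-emails", input_tokens=380, output_tokens=95)
print(tracker.total_tokens("summarize-emails"))  # 1045
```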
## Cost Control
Implement these controls to manage AI spending effectively.
- Use smaller models (Haiku 4.5, GPT-4o Mini) for simple tasks
- Review usage reports weekly to identify optimization opportunities
- Test with small samples before processing large datasets
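The last control (test with small samples first) can be sketched as a cost projection. `estimate_batch_cost` is a hypothetical helper, and the credit rate is a placeholder; check your plan's actual per-token pricing before relying on the numbers.

```python
def estimate_batch_cost(sample_total_tokens: int, sample_size: int,
                        dataset_size: int,
                        credits_per_1k_tokens: float) -> float:
    """Project full-dataset credit cost from a small sample run.

    credits_per_1k_tokens is a placeholder rate for illustration;
    substitute your plan's actual pricing.
    """
    avg_tokens_per_item = sample_total_tokens / sample_size
    projected_tokens = avg_tokens_per_item * dataset_size
    return projected_tokens / 1000 * credits_per_1k_tokens

# A 10-item sample used 6,000 tokens; project the cost of 5,000 items
print(estimate_batch_cost(6000, 10, 5000, credits_per_1k_tokens=0.5))  # 1500.0
```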
### Quick Wins
The biggest cost savings usually come from: (1) using smaller models where appropriate, (2) reducing prompt verbosity, and (3) reducing tool calls.
## Key Takeaways
- Tokens are text pieces - roughly 4 characters or 0.75 words each.
- Input + output must fit within the model's context window.
- Write concise prompts to reduce usage.
- Monitor token usage in execution details and dashboards.
- Use smaller models for simple tasks to reduce costs.