The AI cost panic has arrived earlier than many small teams expected. In the first half of 2026, the argument around coding agents changed from “how much faster can we go?” to “why is the bill this large, and what did we actually ship?”
That’s the real question behind the tokenpocalypse. The industry term, popularised in recent Axios coverage, is “tokenmaxxing”: using as many AI tokens as possible because heavy usage is treated as proof of seriousness, skill or future-readiness. It sounds technical. It’s often just a vanity metric with an invoice attached.
My view is blunt: higher-effort AI is useful and sometimes worth the cost, but “more tokens = more value” is a bad operating principle. If a small business copies the behaviour of AI labs, Big Tech engineering teams or venture-backed founders, it can end up buying a costly simulation of progress.
What higher-effort AI actually costs
A normal chatbot exchange is relatively bounded. A person asks a question, the model answers, perhaps there are a few follow-ups. An agentic workflow is different. A coding agent may inspect files, plan work, call tools, write code, run tests, read failures, rewrite code, spawn subtasks, summarise state and continue. Each loop consumes tokens. If the agent is pointed at a large codebase or left running for hours, the spend compounds quietly.
That’s why the economics feel so strange. The visible product may be a pull request, a report or a workflow change. The hidden cost is every intermediate thought, file read, retry, tool result and failed path. Gartner has forecast that model inference will become far cheaper by 2030, but it also warns that agentic models need many more tokens per task than standard chatbot use. Lower unit prices don’t automatically mean cheaper projects if the project starts consuming vastly more units.
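A rough way to see why agent runs compound is to model an agent that resends its whole working context on every turn, with tool results, file reads and model output piling onto that context as it goes. The sketch below assumes exactly that: no prompt caching, a fixed number of tokens added per turn, and made-up prices per million input tokens. The function names, turn counts and prices are all hypothetical and exist only for illustration.

```python
def agent_run_input_tokens(turns: int, base_context: int, added_per_turn: int) -> int:
    """Total input tokens billed across a run, assuming the full context is
    resent on every turn and grows by a fixed amount each turn (no caching)."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # the whole context is billed as input this turn
        context += added_per_turn   # tool results, file reads and replies accumulate
    return total


def cost(tokens: int, price_per_million: float) -> float:
    """Cost of a token count at a (hypothetical) price per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Illustrative numbers only: a short chat versus a long agent run.
chat_tokens = agent_run_input_tokens(turns=3, base_context=2_000, added_per_turn=500)
agent_tokens = agent_run_input_tokens(turns=100, base_context=20_000, added_per_turn=3_000)

# Even if the agent's unit price is four times lower, the run costs far more.
chat_cost = cost(chat_tokens, price_per_million=10.0)
agent_cost = cost(agent_tokens, price_per_million=2.5)

print(f"chat:  {chat_tokens:,} input tokens, cost {chat_cost:.3f}")
print(f"agent: {agent_tokens:,} input tokens, cost {agent_cost:.2f}")
```

With these invented inputs the chat bills 7,500 input tokens and the agent run 16,850,000, so a fourfold cut in unit price still leaves the agent run more than 500 times as expensive. Real tools cache and truncate context, so the curve is usually flatter, but the shape of the problem is the same: cheaper units, far more of them.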
This is the distinction small teams need to understand. The useful question isn’t “how much AI did we use?” It’s “what outcome did we get that we could not have achieved at lower cost another way?”
The cracks are now visible
The first warning sign is that even large technology companies are trimming or redirecting AI usage. Windows Central reported that Microsoft is winding down most Claude Code licences in its Experiences and Devices division and steering developers towards GitHub Copilot CLI by the end of June. The report frames the move partly around cost and partly around Microsoft wanting engineers on its own tooling.
That doesn’t prove Claude Code is bad. It proves that even Microsoft, a company with deep AI partnerships and enormous engineering budgets, is asking whether the tool mix is worth the bill. For an SME, that’s the useful lesson. If the largest buyers are rationalising usage, smaller companies should not treat unlimited agent time as a default setting.
Uber is a sharper example. TechCrunch reported that Uber introduced monthly caps after burning through its annual AI budget in four months. The report says Uber set a $1,500 monthly cap per employee and per agentic coding tool, including Claude Code and Cursor. It also notes that the budget blowout followed a period where staff were encouraged to use AI heavily.
The uncomfortable part is not merely the spend. It’s the measurement gap. Uber COO Andrew Macdonald has been reported as saying it’s hard to draw a line between increased AI usage and new consumer features. That sentence should be printed above every AI budget dashboard. If the token graph is rising but shipped value isn’t, the graph isn’t a productivity metric. It’s a cost centre with better branding.
Then there is the half-billion-dollar anecdote. Tom’s Hardware reported, citing Axios, that an unnamed enterprise client spent $500 million in a single month after failing to put usage limits on Claude licences. The client has not been identified. Even so, the point stands: without caps, approval flows and outcome tracking, agentic AI can create bills that are out of proportion to the work people thought they were buying.
Tokenmaxxing is culture, not strategy
Tokenmaxxing has a seductive logic. If AI is the future, the people using the most AI must be closest to the future. That’s how usage becomes a scoreboard. Workers leave agents running. Teams compare token consumption. Managers look for signs that employees are embracing the new toolset. The problem is that token volume measures activity, not judgement.
The New York Times reported in March that a single Claude Code user at Anthropic ran up more than $150,000 in a month, and that some technology companies were using token consumption in internal leaderboards or performance signals. Axios has also covered the pro-tokenmaxxing argument: one developer said people should spend on AI as much as they spend on rent if they want to understand what the technology can do.
That’s a fair experiment for a specialist who is deliberately stress-testing the frontier. It’s reckless as a general business rule. Most businesses aren’t AI labs. Most teams don’t need to prove they can burn tokens. They need to answer customers, ship work, reduce mistakes, improve margins and free up skilled people for higher-value decisions.
There are sceptics inside the technology industry too. Tom’s Hardware reported Nvidia VP Bryan Catanzaro’s view, originally given to Axios, that his team’s compute spend far exceeded its employee salaries. MIT CSAIL research has also shown why technical possibility isn’t the same as economic viability: in its computer-vision task analysis, only about 23% of wages for exposed tasks were economically attractive to automate at then-current prices.
That MIT work isn’t a direct study of 2026 coding agents. It’s still a useful warning. Automation doesn’t win simply because it can do a task. It wins when the full cost of deploying, supervising and maintaining the system beats the human or hybrid alternative.
The optimists are not obviously wrong
The sceptical case is strong, but it isn’t the whole story. The AI labs aren’t retreating. Anthropic has been expanding usage headroom, including reported higher Claude Code limits and temporary off-peak promotions. OpenAI is pushing Codex deeper into enterprises: the company has said more than four million developers were using Codex weekly by April 2026, and it’s working with global systems integrators to take the tool from pilots into production workflows.
The infrastructure race also argues against a simple “AI bubble has popped” story. The big labs and their cloud partners are still signing enormous compute deals, building new data-centre capacity and trying to make inference cheaper. Forecasts from firms such as Goldman Sachs and Gartner point in the same direction: token demand is expected to grow dramatically, while the cost per token should fall sharply over time.
The optimists’ best argument is this: we’re in the messy installation phase of a general-purpose technology. Early usage is wasteful because companies are learning where the tool fits. Costs look ugly before workflows, routing, governance and model efficiency catch up. Some of today’s overspend may be the price of finding tomorrow’s operating model.
That argument deserves respect. Many technologies looked expensive and clumsy before they became ordinary. The mistake is turning that long-term possibility into a short-term blank cheque.
So does higher-effort AI pay off?
Sometimes. Not automatically. The honest answer is that higher-effort AI pays off when it’s pointed at constrained, valuable work with a measurable output and a human owner who can judge quality. It fails when the task is vague, the agent is left to wander, the data is messy, or the organisation measures token usage instead of shipped value.
For a small team, the best AI use cases are rarely “let an agent do everything”. They are narrower: draft a first pass, compare options, test a hypothesis, generate boilerplate, find inconsistencies, summarise research, write a migration plan, triage a queue, or produce a prototype that a skilled person can then verify. These uses can save time without pretending that AI is a full employee.
The worst use cases are predictable: open-ended agent runs with no budget cap, coding tasks where no one can review the output, customer-facing automation without safeguards, and “AI adoption” programmes where the target is usage rather than profit, speed, accuracy or customer experience.
That’s the practical line. The more autonomy you give an AI system, the more you need cost controls, quality controls and a clear definition of done. Otherwise you’re not buying labour. You are buying uncertainty by the token.
What small teams should do now
First, cap spend before you scale usage. Set per-user and per-tool limits. Review outliers weekly. Require approval for long-running agents, large context windows and repeated background jobs. If a tool does not give you clear usage reporting, treat that as a procurement risk.
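If your tools don’t enforce limits for you, the policy above is simple enough to express as a gate in whatever internal script or proxy launches agent jobs. The sketch below is a hypothetical guard, not any vendor’s API: it tracks spend per (user, tool) pair against a monthly cap (the $1,500 figure mirrors the Uber report), requires sign-off for runs estimated to exceed a time threshold, and lists pairs close to their cap for the weekly outlier review. The class name, thresholds and tool names are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpendGuard:
    """Hypothetical per-user, per-tool monthly spend cap (not a vendor API)."""
    monthly_cap: float                      # e.g. 1500.0 per user per tool
    max_unapproved_minutes: int = 30        # longer runs need explicit approval
    spent: dict = field(default_factory=lambda: defaultdict(float))

    def request(self, user: str, tool: str, est_cost: float,
                est_minutes: int, approved: bool = False) -> str:
        """Decide whether a job may start, and record its estimated cost if so."""
        key = (user, tool)
        if est_minutes > self.max_unapproved_minutes and not approved:
            return "needs_approval"         # long-running agents need sign-off
        if self.spent[key] + est_cost > self.monthly_cap:
            return "over_cap"               # this job would breach the monthly cap
        self.spent[key] += est_cost
        return "allowed"

    def outliers(self, share_of_cap: float = 0.8) -> list[tuple[str, str]]:
        """(user, tool) pairs that have used at least share_of_cap of the cap."""
        return sorted(k for k, v in self.spent.items()
                      if v >= share_of_cap * self.monthly_cap)


guard = SpendGuard(monthly_cap=1500.0)
print(guard.request("ana", "agent-a", est_cost=1000.0, est_minutes=10))
print(guard.request("ana", "agent-a", est_cost=600.0, est_minutes=10))
print(guard.request("ben", "agent-a", est_cost=50.0, est_minutes=120))
```

The caps are keyed per user and per tool, so one person running two agents gets two separate budgets, matching the Uber-style policy. Estimates are recorded before the job runs; a real setup would reconcile them against the provider’s actual usage report.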
Second, measure outcomes instead of tokens. For engineering, look at accepted pull requests, escaped defects, cycle time and review burden. For marketing, look at publishable drafts, research quality, time saved and conversion impact. For operations, look at resolved tickets, error rates and handoff quality. Token volume belongs in the cost column, not the success column.
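One way to keep token volume in the cost column is to report spend only as a numerator over something shipped. The sketch below assumes you can already count accepted pull requests and escaped defects from your code host and issue tracker; the class, field names, team names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyAIReport:
    """Hypothetical monthly report: spend is a cost, shipped work is the result."""
    team: str
    spend_gbp: float        # AI spend for the month (cost column)
    accepted_prs: int       # accepted, merged work (success column)
    escaped_defects: int    # defects found after release

    def cost_per_accepted_pr(self) -> float:
        if self.accepted_prs == 0:
            return float("inf")   # spend with nothing shipped has no unit value
        return self.spend_gbp / self.accepted_prs

    def defects_per_accepted_pr(self) -> float:
        if self.accepted_prs == 0:
            return float("inf")
        return self.escaped_defects / self.accepted_prs


reports = [
    MonthlyAIReport("team-a", spend_gbp=500.0, accepted_prs=20, escaped_defects=1),
    MonthlyAIReport("team-b", spend_gbp=5_000.0, accepted_prs=25, escaped_defects=4),
]

# Rank teams by what each accepted change cost, not by how much AI they used.
for r in sorted(reports, key=MonthlyAIReport.cost_per_accepted_pr):
    print(f"{r.team}: GBP {r.cost_per_accepted_pr():.2f} per accepted PR, "
          f"{r.defects_per_accepted_pr():.2f} escaped defects per PR")
```

In this invented example the higher-spending team ships slightly more, but each accepted change costs eight times as much and carries more escaped defects. The same shape works for resolved tickets or publishable drafts; only the denominator changes.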
Third, keep humans in the loop where judgement matters. The right comparison isn’t always AI versus employee. It’s often AI plus a capable person versus a capable person working alone. In many small businesses, that hybrid model is cheaper and safer than pretending an agent can own the entire workflow.
Fourth, build a routing habit. Don’t send every task to the most expensive model. Use cheaper models for summarisation, classification and routine transformations. Reserve frontier agents for work where reasoning depth materially changes the output.
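A routing habit can start as a few lines of policy in front of your model calls. The sketch below is a hypothetical rule-based router: routine task types always go to a cheaper tier, and anything else goes to the frontier tier only when someone explicitly flags that reasoning depth matters. The tier labels and task names are placeholders, not real model or product names.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap-model"          # hypothetical placeholder, not a real model name
    FRONTIER = "frontier-agent"    # hypothetical placeholder, not a real model name


# Task types the policy always treats as routine (an assumed, editable list).
ROUTINE_TASKS = {"summarise", "classify", "reformat", "extract"}


def route(task_type: str, needs_deep_reasoning: bool = False) -> Tier:
    """Pick a model tier for a task under a simple cost-first policy."""
    if task_type in ROUTINE_TASKS:
        return Tier.CHEAP          # routine work never escalates
    if needs_deep_reasoning:
        return Tier.FRONTIER       # escalation requires an explicit reason
    return Tier.CHEAP              # the default is the cheaper tier


print(route("summarise").value)
print(route("migration_plan", needs_deep_reasoning=True).value)
```

The design choice is that the cheap tier is the default and the expensive one needs a reason, which is the opposite of sending everything to the frontier model and hoping. Keeping routine task types pinned to the cheap tier even when flagged is deliberately strict; relax that rule if it blocks genuinely hard summarisation work.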
Finally, don’t tokenmaxx. Experiment seriously, but don’t confuse intensity with progress. A team that spends GBP 500 a month and ships useful work is ahead of a team that spends GBP 5,000 a month proving it can keep agents busy.
For more context on how AI behaviour is already changing digital channels, Kahunam’s article on AI search and web traffic trends is a useful companion. The pattern is similar: the technology is real, but the winners will be the teams that measure business impact rather than chasing the loudest metric.
The tokenpocalypse isn’t a reason to abandon higher-effort AI. It’s a reason to grow up about it. Spend where the work is valuable. Stop where the value is vague. And never let a token graph become a substitute for a business result.