Thoughts on LLMs and impact per token

2026-05-20

I've voiced the same opinion several times recently, so I figured I'd refine my thoughts and write them down.

Given how much energy LLMs consume, it's important to make sure that whichever use case you're trying to cover uses as few tokens as possible.

The best way I've found to think about this is through a new metric: impact per token. The goal isn't to give it a precise, measurable value, but to get a sense of whether the tokens are being well spent.

I'll illustrate my point by going from the worst impact per token (low) to the best (high).

Generating tokens for a specific user (worst)

Let's imagine you build a chat interface. Here you're generating tokens specific to each user session. In my view this is the worst possible case, as these tokens will be read once and then never bring any more value to you or your users. Avoid it as much as possible. The number of tokens scales linearly with the number of users, so it's easy to see how this could end up boiling a small ocean.

Generating tokens for a specific piece of content

Here you could imagine a piece of content (say a blog post, a report, an image) being enriched by an LLM that summarises or describes it. In this case the number of tokens consumed scales linearly with the number of content items you have. Assuming you have more users than content items, this puts you in a much better position in terms of impact per token, because the same generated tokens are read by many users.

I like to think of this as a first-order derivative of the tokens-per-user use case (even though that's not mathematically correct).

Generating tokens for your content pipeline (best)

Finally, the best possible use case I can think of is spending tokens on the pipeline that generates your content. This could mean implementing a feature you wouldn't otherwise have bothered implementing, but that brings value to your users (the part about bringing value to your users is pretty important, otherwise the tokens are just wasted). In this case it's a one-off cost that keeps paying dividends in terms of impact per token.

This would be the second-order derivative in my incorrect mathematical model.
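To make the scaling argument a bit more concrete, here is a rough sketch of the three cases. Everything in it is an assumption made up for illustration: the constants, the function name, and above all the idea of using "user views served" as a stand-in for impact, which is a cruder measure than the metric is really meant to be.

```python
# Hypothetical constants, chosen only for illustration.
TOKENS_PER_CALL = 1_000      # tokens generated by one LLM call
SESSIONS_PER_USER = 5        # each session counts as one "view" of value
PIPELINE_TOKENS = 100_000    # one-off spend on building a pipeline feature


def impact_per_token(users: int, content_items: int) -> dict[str, float]:
    """Views served per token spent, for each of the three cases."""
    views = users * SESSIONS_PER_USER

    # Case 1: fresh tokens for every user session (scales with users).
    per_user_tokens = views * TOKENS_PER_CALL
    # Case 2: tokens generated once per content item (scales with content).
    per_content_tokens = content_items * TOKENS_PER_CALL
    # Case 3: tokens spent once on the pipeline (does not scale).
    pipeline_tokens = PIPELINE_TOKENS

    return {
        "per_user": views / per_user_tokens,
        "per_content": views / per_content_tokens,
        "pipeline": views / pipeline_tokens,
    }


small = impact_per_token(users=1_000, content_items=200)
large = impact_per_token(users=10_000, content_items=200)
print(small)
print(large)
```

With ten times the users, the per-user figure stays flat while the other two grow tenfold. Which of those two comes out ahead depends entirely on the made-up constants, so the thing to look at is how each case scales rather than the absolute values.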

If you're going to burn tokens, this is where you must look first.