The NIST report doesn't engage with training costs, or even token costs. It's concerned with the cost the end user pays to complete a task. Their discussion of cost is interesting enough that I'll quote it in full.
> Users care both about model performance and the expense of using models. There are multiple different types of costs and prices involved in model creation and usage:
> • Training cost: the amount spent by an AI company on compute, labor, and other inputs to create a new model.
> • Inference serving cost: the amount spent by an AI company on datacenters and compute to make a model available to end users.
> • Token price: the amount paid by end users on a per-token basis.
> • End-to-end expense for end users: the amount paid by end users to use a model to complete a task.
> End users are ultimately most affected by the last of these: end-to-end expenses. End-to-end expenses are more relevant than token prices because the number of tokens required to complete a task varies by model. For example, model A might charge half as much per token as model B does but use four times the number of tokens to complete an important piece of work, thus ending up twice as expensive end-to-end.
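The report's model A vs. model B example is simple enough to check with arithmetic. A minimal sketch, with made-up prices and token counts chosen to match the quoted scenario (all the specific numbers here are illustrative assumptions, not figures from the report):

```python
def end_to_end_expense(price_per_token: float, tokens_used: int) -> float:
    """End-to-end expense = per-token price times tokens needed for the task."""
    return price_per_token * tokens_used

# Hypothetical numbers mirroring the report's example:
price_b = 2e-6            # model B's price per token (assumed)
price_a = price_b / 2     # model A charges half as much per token
tokens_b = 1_000          # tokens model B needs for the task (assumed)
tokens_a = 4 * tokens_b   # model A needs four times as many tokens

cost_a = end_to_end_expense(price_a, tokens_a)
cost_b = end_to_end_expense(price_b, tokens_b)
# Despite the lower token price, model A ends up twice as expensive end-to-end.
```

The point generalizes: comparing token prices alone is misleading unless the models use comparable token counts for the same task.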