How Will Foundation Model Labs Make Money?

This essay topic unfortunately did not come to me in a stroke of brilliance, but rather from the blog prize contest Dwarkesh Patel just launched. See more here.

The foundation model companies have turned the razor-razorblade model on its head, and the inversion is killing them. The old pattern: the hard-to-replace product (the razor) was sold at a discount to induce dependency, and the consumables (the blades) were marked up to make the profit. In enterprise software, that meant selling a cheap basic product, scaling to the point of lock-in, and then adding other profitable products around it. In the world of foundation models, everything works in reverse: the hard-to-replace product (the model) is enormously expensive to produce, while whatever margin exists sits in the consumables (API calls, wrappers, consumer subscriptions), which today can easily be replaced with open-source and self-hosted equivalents.

But that framing misses a crucial part of what made the old razor-razorblade model work for so long. It wasn't that the product was technologically irreplaceable: it's a razor blade! No, Gillette won by creating a dependency so deep that switching to a competitor was practically impossible, even though competitors existed. The profit came not from technological lock-in but from behavioral lock-in. Foundation labs have done exactly the opposite: they have engineered a behavioral environment that encourages developers to adopt their frameworks but also makes defection incredibly easy.

What's more, the new model has an asymmetry that makes the problem extremely difficult to solve. When an ordinary enterprise needs to cut costs, it simply cuts them. But here, any reduction in training spend pushes the company down quality leaderboards such as LMArena. Likewise, raising subscription prices encourages people to buy substitutes, which only strengthens the incentive to leave the foundation labs. So all of the things that should conventionally work actually exacerbate the problem, because the costs are load-bearing.

So where is the profit then? In the training.

Currently, every foundation model company approaches the challenge from the same starting point: training is the cost side, inference the revenue side. In this view, the important asset is the fully trained model itself, i.e., the weights. I argue that the important asset is not the finished model but the capability to take an arbitrary objective function and optimize it at massive scale, using the joint machinery of hardware resources, data pipelines, RLHF, evaluation metrics, and learned experience of what works at scale. This is not about selling compute. This is not about becoming a cloud provider. This is about hitting a defined, achievable goal on a fixed timeline.

Consider pharmaceuticals. Drug discovery today is a search process over molecular space, exactly the type of high-dimensional optimization in which foundation labs have years of experience. Protein folding was the easy example. The hard examples will be those where a search over molecular space is needed but neither simulation nor brute force can help: tasks such as catalyst design, materials engineering, and locating gene therapy targets. These are not cases where you ask a model a question and get back an answer. They are complex training runs with an objective function and months-long optimization cycles.

Why has Pfizer not done this yet? There is no secret math involved, no shortage of computing power, and no shortage of scientists. The reason is simply that knowing how to run a massive optimization successfully, without divergence, garbage output, or outright failure, is experience held by perhaps only five organizations on earth, acquired at the price of billions of dollars of failed experiments that nobody else wants to repeat.

Here, the margin structure finally turns in favor of the model company. An API call to a chatbot is a commoditized product with zero switching cost for the customer. A multi-month optimization run over a combinatorial molecular search space, which only a handful of labs such as OpenAI can carry out, has huge switching costs, because the process depends on trust and the customer has no way to independently evaluate progress along the way.

McKinsey doesn't have a proprietary product. Its frameworks are public. Any reasonably intelligent person can replicate the logic. McKinsey has margins because it executes a certain kind of process at a level that clients cannot, and because the client is paying for the institutional credibility of "we hired McKinsey." Foundation labs are building the same structure, except the process isn't strategic analysis but rather large-scale optimization.

It might seem self-evident that, rather than pay billions, Pfizer would be better off hiring fifty MLEs and developing the capability in-house for far less. But this is the argument that has been leveled against McKinsey for the past sixty years, and McKinsey continues to grow. Strategy can indeed be developed internally. Yet companies continue to retain McKinsey, partly for expertise and partly for political cover, but most importantly because the price of failure on an internal effort to do something that hasn't been done before isn't the salary of fifty engineers. It is the eighteen months wasted while your competitor's drug was discovered by people who knew every pitfall along the way. The foundation labs' competitive advantage rests not in their models or APIs but in the lessons of a hundred billion dollars' worth of training runs gone awry.

This is the new way to think about things. The chatbots, the APIs, and the subscribers are not the business. They are just the portfolio, the equivalent of McKinsey's published case studies. What matters is not when the API margin turns positive but when the first foundation model lab inks a multi-billion-dollar deal with a pharmaceutical company, a government, or a materials science corporation, and when that deal bears no resemblance to a software license agreement. When that day comes, AI model companies will finally have made money on training as well as inference.