Felix Breuer's Blog

AI Scenarios

The Future of AI

Since the launch of ChatGPT in November 2022, AI systems have made significant progress. Yet, arguably, no advancement in capabilities since then has made quite as much of an impression on the global public as that launch – and widespread productivity gains have yet to materialize. How will AI develop going forward and at what timescales? Will the real-world impact of AI remain at the level of smart assistants as we use them today? Or will we reach Artificial General Intelligence (AGI), where AI systems can perform any task almost any human can do and act as independent agents in the real world? Will AI advance even further to Artificial Super Intelligence (ASI), vastly exceeding human capabilities in almost all domains?

Instead of making specific predictions, I will argue in this article that all three of the scenarios mentioned above are worth considering in depth. First, there are important conceptual reasons why progress in AI might plateau at the level of AGI or even at the level of AI assistants without advancing further for a prolonged period of time. Second, depending on which scenario materializes, the real-world outcomes will be vastly different, and therefore should be considered separately.

Exponential Growth

Exponential growth is never really a single exponential curve. Rather, it is made up of many sigmoid curves: temporary regimes of rapid improvement that eventually saturate and level off as the technology that spurred the growth runs into ever-diminishing returns. The overall exponential trajectory is kept going by another technology taking over and providing the next growth spurt.
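
To make this concrete, here is a minimal sketch in Python of how a succession of saturating S-curves can add up to an overall exponential trajectory. The ceilings and midpoints are made up for illustration, not fitted to any real technology.

```python
import numpy as np

# Illustrative only: ceilings and midpoints are invented, not fitted to real data.
def sigmoid(t, midpoint, ceiling, steepness=1.0):
    """One technology's S-curve: rapid improvement that saturates at `ceiling`."""
    return ceiling / (1.0 + np.exp(-steepness * (t - midpoint)))

t = np.linspace(0, 40, 401)

# Each successive technology saturates roughly 10x higher than the one before it.
curves = [sigmoid(t, midpoint=m, ceiling=10.0**k)
          for k, m in enumerate([5, 15, 25, 35], start=1)]
total = np.sum(curves, axis=0)

# On a log scale the sum hugs a straight line: overall growth looks exponential
# even though every individual regime levels off.
print(np.log10(total[::50]).round(2))
```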

A great example of this is Moore's Law. In its general form, it simply states that the amount of compute available per inflation-adjusted dollar grows exponentially. This exponential growth path has held for over a century, across several different computation technologies. In its specific form, Moore's Law is a statement about the density of transistors on semiconductors, which has itself held for 50 years now by virtue of ongoing regime changes in chip manufacturing and design.

In the debate around the future growth of AI, much has been made of the question of whether the rapid improvements felt after the launch of ChatGPT in late 2022 will continue on an exponential trajectory, and whether, to this end, scaling the amount of compute involved will be necessary, sufficient, or both.

Scalable Methods

An influential talk on the role of scale in AI, titled “The Future of Artificial Intelligence Belongs to Search and Learning”, was given by Richard Sutton in 2016. In it, he articulates what is known colloquially as the “biggest lesson” or sometimes the “bitter lesson” in AI circles: scalable methods always win in the long run.

In AI, a scalable method is a method that can use arbitrary quantities of computation, with performance improving in proportion to the amount of computation used. Such methods in particular beat “smart” methods that have a lot of domain-specific knowledge or structure built in. Instead of going after one-off gains that can be achieved by hand-crafting a particular human insight into the method, find a general method that can scale with compute. This will take longer, but the method will benefit from Moore's Law in the long run and eventually overtake hand-crafted competitors. (Of course, in principle you can do both, but in practice, Sutton argues, people prioritise one over the other.)

Sutton goes through many empirical examples of this phenomenon, which I will not reiterate here. I will point out, though, that it is quite easy to find further examples supporting this hypothesis since 2016 – the spectacular success of Large Language Models (LLMs) being the most noteworthy, but also breakthroughs in Go, poker and protein folding.

Data and Scale

A key detail in Sutton's definition of scalability is the requirement that the method should be able to use arbitrary amounts of computation to extract further proportional performance improvements. This statement will rarely be true for any AI method in full generality – instead it will be conditional on some assumptions, in particular regarding the availability of data.

Media Data

LLMs, and the transformer architecture in particular, have certainly demonstrated a vast ability to scale, beyond what could reasonably be expected when the architecture was invented in 2017. In fact, because it is so effective, this method has already been applied to a significant fraction of all human media ever created. As Ilya Sutskever put it in his NeurIPS 2024 talk, internet content is “the fossil fuel of AI” and “we have but one internet”. Scaling on human-generated media data is reaching its limit.

Rule-Based Data

Note that this is in contrast to the narrative proposed by some in 2023, namely that the methods in use at that time would suffice to reach AGI – all it would take was scale. Instead, the progress made by AI in late 2024 on tasks requiring reasoning was only achieved by innovative methods combining LLMs with reinforcement learning (RL). Continued progress was not guaranteed by the growth of compute but required a regime change in the algorithms employed.

Most importantly, this regime change leveraged a different category of data: in domains where tasks have a verifiably correct answer, the feedback required for RL algorithms is easy to generate automatically. This applies in particular to mathematics and certain areas of computer science: proofs of mathematical theorems are hard to find but easy to verify. Games such as chess and Go also fall into this category. The rules of these games generate a vast but finite and perfectly well-defined decision tree from which very large amounts of data can be generated, for example by letting the model play against itself. For the sake of this argument, I call this type of data rule-based data.
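
As a rough illustration of why such feedback is cheap to generate, here is a minimal sketch of a verifier-based reward, assuming a symbolic-math task. The function and its arguments are placeholders; real systems would use theorem provers, unit tests, or game engines instead.

```python
import sympy

# Sketch of a rule-based reward: the model's answer is either verifiably
# correct or not, so the RL signal needs no human labels.
def reward(model_answer: str, reference_expression: str) -> float:
    try:
        # Reward 1.0 iff the two expressions are symbolically equivalent.
        diff = sympy.simplify(
            sympy.sympify(model_answer) - sympy.sympify(reference_expression)
        )
        return 1.0 if diff == 0 else 0.0
    except (sympy.SympifyError, TypeError):
        return 0.0  # Unparseable answers earn no reward.

print(reward("2*x + 2", "2*(x + 1)"))  # 1.0: correct, even if written differently
print(reward("2*x + 3", "2*(x + 1)"))  # 0.0: wrong
```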

The regime of scaling reasoning models on rule-based data has only just begun. A tremendous amount of progress in this area can reasonably be expected in the short term. This is not without its pitfalls: as Ilya Sutskever points out in the same talk, RL will make models more unpredictable, as opposed to models trained solely on media data. While classic LLMs are mainly “stochastic parrots” that replicate human writing patterns, the combined LLM+RL systems are truly optimising towards a reward function, and as such are much closer to the concept of “AI” that AI existential risk folks have been worried about for decades.

Simulated Data

The issue with rule-based data, though, is that very few domains of human (or, more generally, real-world) activity are governed by strict rules and have verifiably correct answers. For general artificial intelligence to arrive, other types of data will be needed. One step beyond rule-based data is simulated data: instead of requiring that we have the entire rules of the game at our disposal, or that we can perfectly decide whether a solution is correct, we settle for approximately simulating the effect of actions. The most pertinent example of this is placing a robot in a simulated environment. If the simulation approximates the physics of the interactions of the real robot with a real environment well enough, then the system can learn from the simulation without having to go through the slow and expensive process of putting the real robot through real-world exercises. Importantly, not just the number and duration of simulations, but in many cases also their accuracy, can scale with compute and thus benefit from Moore's Law.
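
A minimal sketch of this pattern, with a deliberately crude stand-in simulator and a placeholder policy (not any particular robotics stack), might look like this:

```python
import random

# Crude stand-in for a physics simulator; in practice its fidelity (time step,
# contact models, sensor noise) is what can be scaled with additional compute.
class StackingSimulator:
    def reset(self):
        self.height = 0
        return self.height

    def step(self, action):
        # Toy "physics": careful placements succeed more often than fast ones.
        success = random.random() < (0.9 if action == "careful" else 0.6)
        self.height += int(success)
        reward = 1.0 if success else -1.0
        done = self.height >= 10
        return self.height, reward, done

def train(policy_update, episodes=10_000):
    """Generic simulate-and-learn loop: no real robot is ever touched."""
    sim = StackingSimulator()
    for _ in range(episodes):
        state, done = sim.reset(), False
        while not done:
            action = random.choice(["careful", "fast"])  # placeholder policy
            next_state, reward, done = sim.step(action)
            policy_update(state, action, reward, next_state)
            state = next_state
```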

The downside of this approach is that the model will learn the simulation. The model's real-world behaviour then stands and falls with how accurately the simulation reflects the real world. Simulating the physics of stacking boxes in a warehouse or the behaviour of drivers, cars, and pedestrians in a busy downtown environment is one thing. Simulating the chemistry of cooking, the human body during surgery or the behaviour of toddlers in a nursery is another. Simulating the real economy, financial markets or the global climate yet another. What these examples illustrate is that constructing good simulators can quickly become the limiting factor of AI, not learning. The industry focus on building “digital twins” is motivated by precisely this observation.

Experimental Data

One means of improving simulators, or indeed of gathering data directly, is to conduct experiments. Here, I do not mean computational experiments that can be scaled with processing power by virtue of known rules or simulations. Instead, this means experiments that have to be conducted in the real world. This includes building particle colliders, conducting clinical trials, or trying to successfully land rockets. It also includes, after all the market research has been done and the MVP has been built, just going ahead and actually founding that start-up to see if the business idea really works – an experiment with a sample size of 1.

As these examples show, gathering such experimental data is very slow and expensive. Sometimes, time can be traded off for cost. Sometimes, increases in sample size allow conclusions to be reached more quickly. Either way, the time-scale at which experiments operate is positively glacial compared with the exponential growth of compute. In contrast to rule-based and simulated data, compute itself does not accelerate the real-world data-gathering process in any way. In this manner, experimental data can become a bottleneck for improvement in artificial intelligence in many domains.

Observational Data

Experiments have the advantage, however, that, in principle, they allow you to control for confounding variables, to choose the sample size, and to design a test specific to the question you are trying to answer. In many real-world domains, though, this is not possible, and you are left with simply observing the world and recording what happens as time goes by. Sometimes, this can be sped up by deploying many observers. A great example of this is Tesla's fleet of cars, which has been gathering data relevant to self-driving for over a decade. In other cases, when the subject of interest is planet Earth, the world economy, or the global financial market, the only option is simply letting history unfold.

Therefore, observational data can be scaled only to a very limited extent. This applies in particular to observational data that covers rare events, even though these rare events are often the most important to learn from. At the same time, observational data is the most relevant type of data in most domains of human activity.

Scenarios

As discussed, scaling AI capabilities requires both data and compute. Even if compute per dollar continues to scale exponentially in line with Moore's Law, data is subject to different scaling laws. Media data is finite and has already been largely exploited. Rule-based data is the basis of a new growth regime in AI, but limited to certain domains. Simulated data is scalable but depends on accurate simulators, which need to be built for each domain. Experimental data is the gold standard of data about the real world, but expensive to acquire. Observational data is comparatively cheap to record, but scales only with time. Neither experimental nor observational data scale with compute or “intelligence”.

The point is that in domains that require experimental or observational data, or the use of accurate simulators, rapid progress of AI systems is not guaranteed by scaling compute alone. Of course, rapid progress in those domains is still possible: a human brain reaches general intelligence using data that is readily available through embodied interaction with the real world. However, given the data constraints, achieving AGI will require more than just scaling up current AI methods. Breakthrough innovations will be required for AGI, and when those materialize is hard to forecast. While we know for a fact (by virtue of our own existence) that achieving what we call “general intelligence” is possible, it is not clear how far and how quickly data limitations will allow AI to scale beyond AGI to ASI.

This leads us to consider the following scenarios.

AI Assistants

In this scenario, AI systems scale fully in the directions opened by media and rule-based data. In particular, AI achieves super-human performance in many narrow domains of expertise, especially those where it is straightforward to define whether the solutions presented by AI are “correct”. This can include breakthroughs in math, computer science, computational chemistry and related fields. It can also include a revolution in medicine, though brilliant new AI methods will not obviate the need for clinical trials, which will take their time.

However, in this scenario, AI does not become fully “agentic” or “general”, due to the limits imposed by experimental and observational data. While AI is smart, in practical use-cases it needs to be supervised by humans, who need to bring their common sense to bear to sanity-check AI recommendations. Deploying an AI agent in a fully unsupervised manner on any task of true import would require not just 99% confidence that the agent will not make any major mistakes, but, say, 99.9999%. The path to “many 9s”, as this is sometimes called, may prove to be genuinely hard, and only feasible for narrow-domain agents that are engineered one domain at a time.
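
To see why the jump from two 9s to many 9s matters, consider this back-of-the-envelope calculation; the step count is a hypothetical stand-in for the number of consequential decisions in one long-running agentic task.

```python
# Illustrative arithmetic: per-action reliability compounds over long task chains.
steps = 1_000  # hypothetical number of consequential actions in one agentic task

for per_step_reliability in (0.99, 0.999, 0.999999):
    p_flawless = per_step_reliability ** steps
    print(f"{per_step_reliability}: P(no major mistake over {steps} steps) = {p_flawless:.3f}")

# 0.99     -> 0.000  (a mistake is all but certain)
# 0.999    -> 0.368
# 0.999999 -> 0.999
```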

A good example of this is, again, self-driving cars. The first version of Tesla's “Autopilot” system was released in 2015. Despite Tesla's vast fleet of cars gathering real-world observational data, and immense progress in both compute and algorithms, a fully unsupervised robo-taxi is scheduled to enter production only in 2027 – and it remains to be seen whether this timeline realises. Waymo, taking a very different approach from Tesla, had its first unsupervised robo-taxi ride in 2015, and today, almost 10 years later, its robo-taxi service is still available in only a handful of cities worldwide. This pace reflects mainly the engineering part of the problem. Regulatory approval and deployment around the world will add further delays before this type of unsupervised narrow-domain agent has large-scale real-world economic impact – and this slow process will take its course one domain at a time.

Even though this scenario is the most pessimistic in terms of the growth of AI capabilities, it is in some sense the most optimistic in terms of the economic impact of AI: it implies that in many domains, humans will become vastly more productive by leveraging AI assistants – yet few humans will be displaced by AI (and only gradually), given that human supervision remains essential for an extended period of time. This productivity boom will be deflationary, yet lead to tremendous economic growth in real terms.

Note that because of the ongoing growth in compute per dollar, combined with AI being constrained by the scale of data, in this scenario AI assistants will be very cheap to operate. Any breakthroughs in software or hardware enabling these developments are unlikely to remain exclusive to particular labs with proprietary IP. First movers will have a limited advantage and fast followers will benefit from reduced capex spend.

Last but not least, it should be noted that just because AGI is not attained in this scenario does not mean that there are no risks associated with the deployment of AI assistants. Precisely because of the expected scaling of reinforcement learning, AI assistants will display both superhuman performance in narrow domains and unexpected behaviour, lacking common sense. A dangerous combination.

AGI Agents

The next step up, in terms of AI capabilities, is a scenario where agents that can make reliable unsupervised real-world decisions become a reality fairly quickly. By investing capital in experiments and observational data gathering across a wide range of application domains, and by building high-quality simulators for each, AI development overcomes the immediate data bottlenecks. However, compute still grows much faster than data, making AGI agents, once developed, cheap to operate. These AGI agents will be both virtual and physical, with the former likely to arrive first.

This scenario implies a seismic shift in the global economy. Human labour will become truly redundant, in the sense that AGI agents can perform literally any task a human could do, and do so more effectively in terms of the resources required. (To put this in perspective, a human daily diet of 2,000 kcal corresponds to 2.3 kWh, which, using technology available today, suffices to generate well over 1 million output tokens from DeepSeek R1 70B.) The first jobs to become redundant will likely be those that are screen-based, rule-driven, or require little experience. More resilient will be those roles that require experience, judgement across multiple domains, or relationships and face-to-face interaction. But eventually there will be no value left for human labour to add, except where humans add value purely by being human. Brain-computer interfaces may change the nuances of this dynamic, but not the fundamentals.
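
The energy comparison in parentheses works out roughly as follows; note that the joules-per-token figure is a rough assumption, as published inference-cost estimates vary widely.

```python
# Back-of-the-envelope check of the comparison above.
KCAL_TO_JOULE = 4184.0          # 1 kcal = 4184 J
KWH_TO_JOULE = 3.6e6            # 1 kWh = 3.6 million J

daily_diet_joule = 2000 * KCAL_TO_JOULE
daily_diet_kwh = daily_diet_joule / KWH_TO_JOULE      # ~2.32 kWh

joules_per_output_token = 2.0   # assumed inference cost per token; estimates vary widely
tokens_per_daily_diet = daily_diet_joule / joules_per_output_token

print(f"{daily_diet_kwh:.2f} kWh ≈ {tokens_per_daily_diet:,.0f} tokens")
# -> 2.32 kWh ≈ 4,184,000 tokens
```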

On the one hand, this transformation will make the economy vastly more productive. The more important impact, however, is that, unless a different societal contract is put in place, e.g. by granting a universal basic income, the purchasing power of the vast majority of humanity will completely vanish. A utopia is possible where humans spend all their time doing what they desire, caring for and entertaining each other. However, making this utopia come to pass will either need massive transfers of capital (through one mechanism or another), or it will require that owners of capital are willing to pay a substantial premium for the privilege of being entertained and cared for by humans specifically. A dystopia is also imaginable where the economies of those who own capital and those who don't detach from one another, creating AI-generated wealth for the former and poverty for the latter. The dynamics will be highly non-linear. The only thing that seems certain is that returns on capital will vastly exceed returns on labour. Comparing this epochal change to the industrial revolution underestimates its impact by orders of magnitude. And this is not even touching upon the possibility that AGI agents might become self-aware, and all that this implies.

However, in this scenario, the assumption is still that the capability of AI is limited. AGI agents can surpass humans by a wide margin in narrow domains and are more capable and resource-efficient than humans as general agents. Yet ASI – an AI system that outperforms all humans by orders of magnitude in all areas – does not materialize, due to the scaling limitations discussed above, which apply to most domains.

ASI

What if ASI does materialize?

Much has been made of the distinction between fast-takeoff and slow-takeoff ASI. The idea behind fast takeoff is that once AI reaches a level on par with humans, it will be able to start improving itself, leading to a further increase in the pace of AI performance improvements. Of course, some kind of self-improvement has been part of Moore's Law for the past century: advances in the previous generation of computers have always been leveraged to design and build the next generation of computers. The new element with ASI would be that, for the first time, humans are taken out of the loop entirely. The constraints of physics and economics remain in place, however.

A better question to ask, in my opinion, is: how useful is raw intelligence really? We have already discussed the limitations imposed by experimental and observational data. For example, even with the most brilliant ideas for a grand unified theory of physics, experiments still need to be conducted to attempt to falsify it. There are many other limitations, however, including chaos theory (even with a perfect model, to predict the global economy, society, or the weather with perfect accuracy, you need to measure initial conditions with greater accuracy than is practically possible), computational complexity (making a Go board just one square larger multiplies the complexity of the game), game theory (depending on the game, having unlimited computational power may increase a player's winnings only marginally versus competing players with simple strategies), economics (even with the most brilliant design for a new processor, you still need to build the factories to manufacture it and the power plants to run it at scale), and markets (“markets can remain irrational longer than you can remain solvent”, “being right or making money”), to name just a few. In short, the omnipotence that some ascribe to ASI is not a given. The real-world impact of ASI may well scale more with the amount of capital it is allowed to accumulate (or indeed that it is endowed with from the start) than with the magnitude of its intelligence.

However, none of the above obstacles are a guarantee that ASI, if and when it arrives, will not “take off”. Super-intelligence may well provide leverage above and beyond what we can imagine. “Any sufficiently advanced technology is indistinguishable from magic,” as Arthur C. Clarke famously put it. If this comes to pass, then all bets are off. In particular, after this event, humans will no longer be in charge – one way or another. ASI alignment is a real problem that we have to get right on the very first try – a daunting challenge. The short story Firewall by David D. Levine provides a good sense of the magnitude of this occasion, and I will not spoil here whether it has a happy ending or not.