Two of the most influential figures in AI fundamentally disagree on how we're going to achieve Artificial General Intelligence (AGI). Ilya Sutskever, former Chief Scientist of OpenAI and now Chief Scientist at Safe Superintelligence, thinks that if we keep scaling up LLMs, we'll eventually get there. On the other hand, François Chollet, creator of Keras, thinks that LLMs alone aren't enough.
Both Sutskever and Chollet have been on Dwarkesh Patel's podcast, where they've discussed their views on LLMs and AGI. It should be noted that Ilya's interview took place over a year ago, while François's interview occurred only a few days ago. So, it's possible that Ilya's views have changed in the past year. However, it's clear that Ilya is more bullish on LLMs than Chollet, so the majority (if not all) of his opinions probably still stand.
Early in the interview, Dwarkesh asks Ilya, "What would it take to surpass human performance?" Right off the bat, Ilya challenges the claim that "next-token prediction is not enough to surpass human performance." He argues against the premise that LLMs just learn to imitate, giving a straightforward counterargument.
If your base neural net is smart enough, you just ask it: What would a person with great insight, wisdom, and capability do? Maybe such a person doesn't exist, but there's a pretty good chance the neural net would be able to extrapolate how such a person would behave.
Ilya believes that LLMs are so good at generalization that they might even extrapolate the behavior of a "super" person who doesn't actually exist. Before ChatGPT was revealed to the world, almost nobody would have believed such a claim. But now, there seems to be a large number of people, both inside and outside the machine learning community, who believe LLMs can generalize well beyond their data. He goes on to discuss the implications of next-token prediction's success.
...what does it mean to predict the next token well enough? It's actually a deeper question than it seems. Predicting the next token well means that you understand the underlying reality that led to the creation of that token.
While Ilya doesn't explicitly define "understanding" in this context, it certainly sounds like he thinks LLMs are capable of much more than just memorization. Ilya expands on this idea:
It's not statistics. Like, it is statistics, but what is statistics? In order to understand those statistics, to compress them, you need to understand what is it about the world that creates this set of statistics?
Here, I believe Sutskever is arguing that the only way an LLM could be so good at predicting the next token is by actually compressing the data in a way that reflects the underlying reality of the world. Essentially, he's saying that while LLMs are just statistics, that doesn't actually mean that they have no understanding. In fact, Sutskever seems to think that the only way to be so proficient at next-token prediction is to have a deep understanding of the world.
I find this argument compelling, but I think it's still an open question how much "understanding" is really going on versus how much memorization is involved.
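For concreteness, here's a minimal, purely illustrative sketch of the next-token-prediction objective being debated, using simple bigram counts rather than a neural network (my own toy example, not anything from the interview): the model assigns a probability to each possible next token given the context, and training rewards putting high probability on the token that actually follows.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count-based bigram statistics over a tiny corpus.
# Real LLMs use deep neural networks trained on huge corpora, but the objective
# is the same in spirit: assign high probability to the observed next token
# given the context.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_distribution(context_token):
    """Return P(next token | previous token) estimated from counts."""
    c = counts[context_token]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

print(next_token_distribution("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(next_token_distribution("sat"))  # {'on': 1.0}
```

Whether doing this extremely well at scale amounts to "understanding" is exactly the point of disagreement.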
While Ilya only touches on LLMs' relation to AGI in his interview, the majority of François Chollet's interview revolves around the topic. Within the first couple of minutes, Chollet explains how he believes LLMs function.
If you look at the way LLMs work is that they're basically this big interpolative memory. The way you scale up their capabilities is by trying to cram as much knowledge and patterns as possible into them.
While this statement doesn't directly contradict Ilya's claims, it's clear that Chollet is less impressed by LLMs' ability to generalize. Later in the interview, he goes into more depth.
If you scale up your database, and you keep adding to it more knowledge, more program templates then sure it becomes more and more skillful. You can apply it to more and more tasks. But general intelligence is not task specific skills scaled up to many skills, because there is an infinite space of possible skills.
Now this directly contradicts Ilya's claim that LLMs can extrapolate the behavior of a "super" person. Chollet continues by explaining what general intelligence actually is.
General intelligence is the ability to approach any problem, any skill, and very quickly master it using very little data. This is what makes you able to face anything you might ever encounter. This is the definition of generality. Generality is not specificity scaled up. It is the ability to apply your mind to anything at all, to arbitrary things. This fundamentally requires the ability to adapt, to learn on the fly efficiently.
Chollet argues that general intelligence cannot be achieved merely by scaling up models or training data. He explains that more data increases a model's "skill" but does not bestow general intelligence. I love the way he succinctly summarizes this: "Generality is not specificity scaled up."
Throughout the podcast, Chollet carefully explains why he thinks LLMs lack understanding and are really just "memorizers." He goes so far as to say that LLMs lack intelligence altogether. Personally, I think this is very much dependent on your definition of intelligence, but his point is well taken.
To aid in the search for general intelligence, Chollet created the ARC challenge, a test he believes can only be fairly completed by using general intelligence. He notes that an LLM or other model could be trained on many millions of ARC-like problems, and that it might be able to beat ARC, but that it would be "cheating" in a sense. He argues that the model would not actually be learning to generalize, but rather memorizing the answers to the ARC problems.
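To give a sense of what an ARC-style task looks like, here's a small made-up example (my own construction, not an actual ARC puzzle): each task supplies a few input/output grid pairs, and the solver has to infer the transformation rule from just those examples.

```python
import numpy as np

# A made-up ARC-style task (not taken from the real benchmark): a few
# input/output grid pairs, where the hidden rule is "reflect the grid
# left-to-right". The solver only gets these examples to work from.
train_pairs = [
    (np.array([[1, 0],
               [2, 0]]),
     np.array([[0, 1],
               [0, 2]])),
    (np.array([[0, 0, 3],
               [0, 3, 0]]),
     np.array([[3, 0, 0],
               [0, 3, 0]])),
]
test_input = np.array([[4, 0, 0],
                       [0, 0, 4]])

# A hand-written solver for this one rule. It is maximally "skillful" at this
# particular task, yet it says nothing about general intelligence: it cannot
# adapt to any other rule. This is Chollet's distinction between skill and
# generality in miniature.
def solve(grid):
    return np.fliplr(grid)

assert all(np.array_equal(solve(x), y) for x, y in train_pairs)
print(solve(test_input))
```

A system that had simply memorized millions of such tasks and their solutions would score well without ever doing the on-the-fly inference that Chollet considers the heart of intelligence, which is why he calls that approach "cheating."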
Interestingly, the state-of-the-art model on the ARC challenge does in fact use an LLM, but it adds quite a bit of extra machinery to the model to help it generalize. In the podcast, Chollet explains how he expects an LLM to be part of the final solution to AGI, but that it will need to be combined with other techniques to achieve general intelligence.
While Sutskever thinks AGI is mostly a matter of scaling up LLMs, Chollet is unconvinced. It's evident that even top AI researchers cannot agree on the correct path to AGI.
Estimates for achieving AGI vary widely, from a few years to several decades. Chollet even suggests that LLMs have slowed down progress towards AGI, stating, "OpenAI basically set back progress towards AGI by quite a few years, probably like 5-10 years." Only time will reveal whose perspective is closer to the truth.
If you haven't already, I highly recommend listening to both podcasts. Here's the link to Ilya's and here's a link to François's.