The Great Debate: Sutskever vs. Chollet on the Future of AGI

June 19, 2024


Are LLMs a Pathway to AGI?

Two of the most influential figures in AI fundamentally disagree on how we're going to achieve Artificial General Intelligence (AGI). Ilya Sutskever, former Chief Scientist of OpenAI and now Chief Scientist at Safe Superintelligence, thinks that if we keep scaling up LLMs, we'll eventually get there. On the other hand, François Chollet, creator of Keras, thinks that LLMs alone aren't enough.

Both Sutskever and Chollet have been on Dwarkesh Patel's podcast, where they've discussed their views on LLMs and AGI. It should be noted that Ilya's interview took place over a year ago, while François's interview occurred only a few days ago. So, it's possible that Ilya's views have changed in the past year. However, it's clear that Ilya is more bullish on LLMs than Chollet, so the majority (if not all) of his opinions probably still stand.

Sutskever's Interview

Early in the interview, Dwarkesh asks Ilya, "What would it take to surpass human performance?" Right off the bat, Ilya challenges the claim that "next-token prediction is not enough to surpass human performance." He argues against the premise that LLMs just learn to imitate, giving a straightforward counterargument.

If your base neural net is smart enough. You just ask it — What would a person with great insight, wisdom, and capability do? Maybe such a person doesn't exist, but there's a pretty good chance the neural net would be able to extrapolate how such a person would behave.

Ilya believes that LLMs are so good at generalization that they might even extrapolate the behavior of a "super" person who doesn't actually exist. Before ChatGPT was revealed to the world, almost nobody would have believed such a claim. But now, there seems to be a large number of people, both inside and outside the machine learning community, who believe LLMs can generalize well beyond their data. He goes on to discuss the implications of next-token prediction's success.

...what does it mean to predict the next token well enough? It's actually a deeper question than it seems. Predicting the next token well means that you understand the underlying reality that led to the creation of that token.

While Ilya doesn't explicitly define "understanding" in this context, it certainly sounds like he thinks LLMs are capable of much more than just memorization. Ilya expands on this idea:

It's not statistics. Like, it is statistics, but what is statistics? In order to understand those statistics, to compress them, you need to understand what is it about the world that creates this set of statistics?

Here, I believe Sutskever is arguing that the only way an LLM could be so good at predicting the next token is by actually compressing the data in a way that reflects the underlying reality of the world. Essentially, he's saying that while LLMs are just statistics, that doesn't actually mean that they have no understanding. In fact, Sutskever seems to think that the only way to be so proficient at next-token prediction is to have a deep understanding of the world.
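
To make the term concrete, here is a minimal toy sketch of what "next-token prediction" means in code. The counting model and corpus below are my own illustration, not anything from the interview; a real LLM replaces the lookup table with a neural network conditioned on a long context, but the objective, modeling P(next token | context), is the same.

```python
from collections import Counter, defaultdict

# A deliberately tiny "language model": count which token follows which.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most frequent token seen after `token` in the corpus."""
    candidates = following[token]
    return candidates.most_common(1)[0][0] if candidates else "<unk>"

print(predict_next("sat"))  # -> 'on'
print(predict_next("the"))  # -> 'cat' (ties broken by first occurrence)
```

The toy example is only meant to show the framing: doing this well across all of human text forces a model to capture the regularities that produced the text, which is the "compression" Sutskever is pointing at.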

I find this argument compelling, but I think it's still an open question how much "understanding" is really going on versus how much memorization is involved.

Chollet's Interview

While Ilya only touches on LLMs' relation to AGI in his interview, the majority of François Chollet's interview revolves around the topic. Within the first couple of minutes, Chollet explains how he believes LLMs function.

If you look at the way LLMs work is that they're basically this big interpolative memory. The way you scale up their capabilities is by trying to cram as much knowledge and patterns as possible into them.

While this statement doesn't directly contradict Ilya's claims, it's clear that Chollet is less impressed by LLMs' ability to generalize. Later in the interview, he goes into more depth.

If you scale up your database, and you keep adding to it more knowledge, more program templates then sure it becomes more and more skillful. You can apply it to more and more tasks. But general intelligence is not task specific skills scaled up to many skills, because there is an infinite space of possible skills.

Now this directly contradicts Ilya's claim that LLMs can extrapolate the behavior of a "super" person. Chollet continues by explaining what general intelligence actually is.

General intelligence is the ability to approach any problem, any skill, and very quickly master it using very little data. This is what makes you able to face anything you might ever encounter. This is the definition of generality. Generality is not specificity scaled up. It is the ability to apply your mind to anything at all, to arbitrary things. This fundamentally requires the ability to adapt, to learn on the fly efficiently.

Chollet argues that general intelligence cannot be achieved merely by scaling up models or training data. He explains that more data increases a model's "skill" but does not bestow general intelligence. I love the way he succinctly summarizes this: "Generality is not specificity scaled up."

Throughout the podcast, Chollet carefully explains why he thinks LLMs lack understanding and are really just "memorizers." He goes so far as to say that LLMs lack intelligence altogether. Personally, I think this is very much dependent on your definition of intelligence, but his point is well taken.

To aid in the search for general intelligence, Chollet created the ARC challenge, a test he believes can only be fairly completed by using general intelligence. He notes that an LLM or other model could be trained on many millions of ARC-like problems, and that it might be able to beat ARC, but that it would be "cheating" in a sense. He argues that the model would not actually be learning to generalize, but rather memorizing the answers to the ARC problems.
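
For readers unfamiliar with ARC, each task gives a handful of input/output grid pairs that share a hidden rule, plus a test input to which you must apply that rule. The toy task and candidate rule below are hypothetical examples of mine, not real ARC problems, but they follow roughly the same train/test structure:

```python
# A rough sketch of an ARC-style task: a few demonstration pairs plus a
# held-out test input. The hidden rule here (mirror each row) is invented
# for illustration only.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [{"input": [[3, 3], [0, 3]]}],
}

def flip_horizontal(grid):
    """One candidate rule: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# Check whether the candidate rule explains every training pair...
if all(flip_horizontal(p["input"]) == p["output"] for p in task["train"]):
    # ...and only then apply it to the test input.
    print(flip_horizontal(task["test"][0]["input"]))  # [[3, 3], [3, 0]]
```

Each real ARC task hides a different rule, so a solver can't simply reuse transformations memorized from past tasks; it has to infer the rule from the few examples given, which is exactly the on-the-fly adaptation Chollet is describing.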

Interestingly, the state-of-the-art model on the ARC challenge does in fact use an LLM, but it adds quite a bit of extra machinery to the model to help it generalize. In the podcast, Chollet explains how he expects an LLM to be part of the final solution to AGI, but that it will need to be combined with other techniques to achieve general intelligence.

Conclusion

While Sutskever thinks AGI is mostly a matter of scaling up LLMs, Chollet is unconvinced. It's evident that even top AI researchers cannot agree on the correct path to AGI.

Estimates for achieving AGI vary widely, from a few years to several decades. Chollet even suggests that LLMs have slowed down progress towards AGI, stating, "OpenAI basically set back progress towards AGI by quite a few years, probably like 5-10 years." Only time will reveal whose perspective is closer to the truth.

...

If you haven't already, I highly recommend listening to both podcasts. Here's the link to Ilya's and here's a link to François's.


You're not a slow reader

May 25, 2024


Everybody wants to read more

Reading is one of those things that just about everybody wishes they did more of. While reading a little each day is a great strategy, there's only so much time you can set aside for it. We're limited not only by how much time we can dedicate to reading, but also by how quickly we can read.

For a long time, I was frustrated with myself for being a "slow reader." I know I'm not alone in this feeling. Personally, I always figured it was just something I wasn't good at, and that the ability to speed read was mostly a gift and not a skill. However, I recently discovered not only that I was wrong about these assumptions, but also that the trick to reading quickly is in fact very simple.

Reading without subvocalization

When most people read, myself included, we essentially talk to ourselves in our mind, saying each word one by one with our "inner monologue." This is called subvocalization. Naturally, when we subvocalize, we tend to read at around the same speed that we talk. This is fine for things like fiction books that you're reading purely for enjoyment, but when it comes to nonfiction, it can severely slow you down.

Thankfully, subvocalization isn't a necessary feature of reading. It's completely possible to read without using your inner monologue, and it's actually much faster. When I first came across this concept in this blog post by Benedict Neo, a fellow Machine Learning Engineer, I was honestly blown away. I felt like this simple idea had answered a long-standing question I'd had about how other people were able to read so quickly, and why I could not.

In retrospect, I knew that speed readers were not reading every word of every sentence in their entirety; however, I hadn't even considered that the trick was to simply "turn off your inner monologue." It takes some getting used to if you've never done it before, but after a little while you'll be reading faster than ever. Without having to wait for your inner monologue to finish pronouncing each word, you can drastically increase the number of words you read per minute.

I think it's important to draw a distinction between this method of "reading without your inner monologue" and the concept of "skimming." Skimming is more analogous to "skipping through" some text to get a general idea of what it's about, or to search for a particular topic within it. When skimming, you may skip over entire sentences or even paragraphs. When "reading without subvocalization," however, you should still be reading every word (more or less), just without stopping to pronounce each one in your head. The distinction matters because the two methods provide radically different levels of reading comprehension.