When the Stochastic Parrot Spoke for Itself… and Flew Away

Benjamin Skuse

We all know how convincing modern chatbots based on large language models (LLMs) can be at appearing to understand what they are asked, how creative their responses can look, and how they even seem to be able to learn and reason on the fly.  

According to one school of thought, this is all an illusion. Sceptics argue that LLMs are simply ‘stochastic parrots’ – a term coined in a 2021 article by Emily Bender, Timnit Gebru and co-workers. Stochastic parrots completely lack understanding of the meaning encoded in their outputs; they simply memorise and regurgitate the contents of the vast datasets on which they have been trained.

Parrots No More

During his all-too-brief Spark Talk at the 12th Heidelberg Laureate Forum, Sanjeev Arora (ACM Prize in Computing – 2011), a researcher who has played a pivotal role in some of the deepest and most influential results in theoretical computer science, made a compelling case that contemporary LLMs are much more than stochastic parrots, demonstrating capabilities that take them far beyond their human-created training data.

Arora believes the stochastic parrot school of thought should have died when GPT-3 was replaced by GPT-4 and its modern contemporaries. Since 2023 – the same year Arora founded Princeton Language and Intelligence, a unit at Princeton University devoted to the study of large AI models and their applications – he argues that AI models have exhibited much more complicated and interesting behaviour. “They are trained on general-purpose skills… not just text,” he explained. “They undergo complicated multi-stage training, a large part of the training data is believed to be generated by AI, and finally there is this idea of self-improvement.”

Sanjeev Arora
Sanjeev Arora (© HLFF / Christian Flemming)

For the specific application of mathematics, these advanced behaviours may have important implications. Those who believe AI can never reach the mathematical capabilities of humans often cite mathematical logician Kurt Gödel’s incompleteness theorems published in 1931 as proof that humans will always remain critical to mathematical progress.

Gödel essentially showed that no consistent formal system (in this case an AI) can be founded on a complete and consistent set of axioms for all of mathematics. Soon after, between 1935 and 1937, Alonzo Church and Alan Turing formalised the notion of computation – in effect conceiving the modern computer – and showed that no computer, however advanced, can always decide the correctness of mathematical statements. Then, in 1971, Stephen Cook introduced the P versus NP problem, presenting an intractability barrier to AI solving all mathematical problems, in the sense that some proofs may take superpolynomial time to find.
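Stated very loosely (these are informal paraphrases rather than the precise theorem statements), the three barriers can be summarised as follows:

```latex
% Informal paraphrases of the three classical barriers, not the precise statements.
% Gödel (1931): any consistent, effectively axiomatised theory $T$ containing
% arithmetic has a sentence $G_T$ that it can neither prove nor refute:
\exists\, G_T :\quad T \nvdash G_T \quad\text{and}\quad T \nvdash \lnot G_T
% Church–Turing (1936): no algorithm can decide, for every statement $\varphi$,
% whether $\varphi$ is provable (the Entscheidungsproblem is undecidable).
% Cook (1971): unless $\mathsf{P} = \mathsf{NP}$, some theorems whose proofs are
% quick to check may take superpolynomial time to find.
```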

Yet these ideas only show that a machine cannot solve the entirety of mathematics. The set of theorems that humans have proven at any given time is a small subset of all possible mathematical theorems, leaving plenty of room for AI to add to the corpus of knowledge. “A superhuman AI mathematician, this AI model, is able to prove theorems over time that are strictly more than the set of theorems humans have proven,” Arora said. “We don’t expect this to be perfect, it just has to be better than humans.”

Humans Out of the Loop

Getting there requires taking humans out of the loop completely, argued Arora – something already starting to happen in mathematics. Proof checking is now routinely conducted automatically by proof assistants such as Lean, and translation from English proofs into Lean proofs is becoming increasingly automated. Are humans still needed in other parts of the process?
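To see what this looks like in practice, here is a minimal, purely illustrative Lean 4 snippet (a toy statement, not one of the automated translations described above); Nat.add_comm is a lemma from Lean's standard library:

```lean
-- A toy example of machine-checked mathematics: Lean either accepts this proof
-- or reports exactly where the reasoning fails, with no human referee involved.
theorem add_comm_example (m n : Nat) : m + n = n + m := by
  exact Nat.add_comm m n
```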

Google DeepMind’s AlphaProof was first trained on a large question bank of varying difficulty, much of it human-written. The model had multiple attempts at each question, and answers were graded without human intervention by a smaller LLM. Correct answers were used to train the model further, resulting in what may be construed as reasoning that could then be applied to improve performance on questions and topics not seen in training. The model improved its own performance without relying on human solutions. “There’s even increasing evidence that actually the AI itself can generate very good questions,” added Arora. “And Lean … prevents it from going off track.”
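A minimal, purely conceptual sketch of such a self-improvement loop is shown below – generate attempts, grade them automatically, retrain on the verified ones. The names here (model.generate, model.finetune, grade) are hypothetical stand-ins, not DeepMind's actual code or API:

```python
def self_improvement_round(model, question_bank, grade, attempts_per_question=8):
    """One round of generate -> grade -> retrain, with no human in the loop.

    `model` and `grade` are hypothetical placeholders: `grade` could be a smaller
    grader model or a formal checker such as Lean; nothing here is DeepMind code.
    """
    verified_solutions = []
    for question in question_bank:
        for _ in range(attempts_per_question):
            candidate = model.generate(question)    # the model proposes a solution attempt
            if grade(question, candidate):          # automatic grading, no human intervention
                verified_solutions.append((question, candidate))
                break                               # keep the first verified attempt
    model.finetune(verified_solutions)              # train further on the model's own correct answers
    return model, len(verified_solutions)
```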

The results of all of this progress have been stunning. Last year, DeepMind researchers harnessed AlphaProof and AlphaGeometry 2 together to solve four out of six problems from that year’s International Mathematical Olympiad (the most prestigious mathematics competition for pre-university students), a level equivalent to a silver medal. Building on this success, this year, models from Google and OpenAI went a step further, achieving gold medal status by solving five out of six problems under official contest conditions.

Beyond potentially transforming the job description of a human mathematician, why is this important? “Superhuman AI is really likely to happen, and math is the likely first domain because of verified answers,” said Arora. When a superhuman AI mathematician is finally realised, perhaps even in the next five to 10 years, it will be the bellwether for other superhuman AI capabilities.

A Superhuman AI?

These more transformative capabilities were the focus of two other Spark Talks shortly after Arora’s. David Silver (ACM Prize in Computing – 2019) and his former PhD advisor Richard S. Sutton (ACM A.M. Turing Award – 2024) outlined how they see AI breaking the shackles of its human roots to become not just superhuman, but beyond and separate from humans entirely, experiencing the world in its own unique way.

In their joint chapter, “The Era of Experience”, in the upcoming book Designing an Intelligence, Silver and Sutton call for AI to be developed to learn from its own experience, continually generating data by interacting with its environment. This, they argue, combined with powerful self-improvement methods descended from those used by AlphaProof, will allow AI to transcend human knowledge and capabilities.

David Silver
David Silver (© HLFF / Christian Flemming)

“Agents [AI systems] will inhabit streams of experience … these actions and observations will be richly grounded in the environment … their rewards will also be grounded in experience … and finally agents will plan and/or reason about experience,” says Silver, principal research scientist at Google DeepMind and a professor at University College London, UK. “I think the scales in which this will happen will eventually vastly exceed the scale of the internet. At some point in the future, the knowledge of all the things that humans have discovered over time will seem small, and the knowledge that agents have learned will be far larger than that.”

But getting to that point will require the help of an army of computer scientists: “OK, call to arms for young researchers in the room, the challenge is how to solve the deep problem of AI: how to learn from experience,” he said. “If we solve this, it will be a profound moment for science and will transform the future of AI, and thereby humanity.”

“Succession to AI is inevitable”

Sutton, a professor at the University of Alberta, Fellow & Chief Scientific Advisor at the Alberta Machine Intelligence Institute, and a research scientist at Keen Technologies, agreed that superintelligence will require AI to be able to learn from experience, but he looked a little farther into the future in his Spark Session.

“Within your lifetime, AI researchers will understand the principles of intelligence well enough to create (or become) beings of far greater intelligence,” he opened. “It will be the greatest intellectual achievement of all time, with significance beyond humanity, beyond life, beyond good or bad – it’s a big deal.”

Richard S. Sutton
Richard S. Sutton (© HLFF / Christian Flemming)

To help accelerate science and society towards this new era, Sutton and colleagues have drawn up The Alberta Plan for AI Research, a vision and path towards deeply understanding computational intelligence over the next five to 10 years. Realising this plan calls for researchers to keep their eyes on the prize, argues Sutton, focusing on designing better learning and planning algorithms.

Similarly, he calls for focus and calm from politicians and the public alike: “AI in politics today is politically charged,” says Sutton. “It’s the focus of geopolitical competition between nation states and the public are fearful that AI will lead to bad things.”

Pointing to the many negative consequences of authoritarian, centralised control of human societies, Sutton argued that calls from politicians and the public for centralised control of AI (setting its goals, slowing or stopping research, limiting compute power, mandating safety requirements, etc.) should be strongly resisted. Those calls, he believes, rest on unwarranted fears and on the unfounded perception that anyone has the power to stop AI in its tracks before it surpasses all human capabilities.

“In the ascent of humanity, succession to AI is inevitable,” he concluded. “But this view is still human-centric … AI is the inevitable next step in the development of the universe, and we should embrace it with courage, pride and a sense of adventure.”
