OpenAI's recent publication, "Let's Verify Step by Step" (https://cdn.openai.com/improving-mathematical-reasoning-with-process-supervision/Lets_Verify_Step_by_Step.pdf), instantly captured my attention upon its release on May 31, 2023. Central to the paper is a chart on page seven that suggests a thought-provoking possibility: humanity may be approaching the creation of artificial general intelligence (AGI).
Pay particular attention to the "Process-Supervised RM" curve, which keeps climbing. Contrary to the expectation that it would eventually plateau, the curve suggests that as more candidate solutions are sampled per problem, and therefore more computing power is spent, the percentage of mathematical problems solved correctly can continue to rise. Given sufficient computing power, AI could potentially solve mathematical problems at an unprecedented scale, conceivably even with near-perfect accuracy.
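One way to build intuition for why sampling more candidates can keep helping: if each sampled solution were independently correct with the same probability p (an idealization the paper does not make, since samples from one model are correlated), the chance that at least one of N candidates is correct grows as 1 - (1 - p)^N. A verifier can only pick a correct answer if one is among the candidates, so this minimal sketch gives an upper bound on best-of-N accuracy, not the paper's curve. The 30% figure below is a hypothetical value chosen for illustration.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct.

    Assumes every sample is correct with the same probability p, which is an
    idealization; real samples from a single model are correlated.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical per-sample accuracy of 30%, for illustration only.
    for n in (1, 10, 100, 1000):
        print(n, round(coverage(0.3, n), 4))
```

The gap between this bound and the actual best-of-N curve is exactly what a better verifier tries to close.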
Before delving deeper into the implications, let's trace the journey that brought us to this juncture and understand its significance.
The roots of this development lie in a 2021 study titled 'Training Verifiers to Solve Math Word Problems'. This research evaluated large language models, including a massive 175 billion parameter model, on GSM8K, a dataset of grade-school math word problems.
Example question:
A carnival snack booth made $50 selling popcorn each day. It made three times as much selling cotton candy. For a 5-day activity, the booth has to pay $30 rent and $75 for the cost of the ingredients. How much did the booth earn for 5 days after paying the rent and the cost of ingredients?
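To make concrete what a multi-step solution to such a question looks like, here is the arithmetic written as individual steps, each of which could in principle be checked on its own. The breakdown into steps and the variable names are our own illustration, not taken from the dataset.

```python
# Step-by-step solution to the carnival snack booth question.
# Each assignment is one reasoning step that a step-level check could verify.
popcorn_per_day = 50
cotton_candy_per_day = 3 * popcorn_per_day                 # three times as much: $150
revenue_per_day = popcorn_per_day + cotton_candy_per_day   # $200 per day
days = 5
revenue = revenue_per_day * days                           # $1000 over 5 days
costs = 30 + 75                                            # rent plus ingredients: $105
earnings = revenue - costs                                 # $895 after costs

print(earnings)  # prints 895
```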
Despite their size, these models initially showed limited success, correctly solving only about 30% of the problems.
To improve performance, the researchers tested an innovative technique: they trained a separate 'verifier' model specifically to assess solutions generated by the primary model. At test time, the verifier scored many candidate solutions and selected the one with the highest score. This strategy significantly boosted performance, enabling the system to achieve results nearly on par with those of a base model that was 30 times larger.
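The test-time procedure can be sketched as best-of-N selection: sample N candidate solutions, score each with the verifier, and keep the highest-scoring one. In this minimal sketch, `generate_solution` and `verifier_score` are hypothetical toy stand-ins, not the models from the paper; the toy verifier is even handed the known answer, which a real verifier never is. Only the selection logic reflects the technique described above.

```python
import random
from dataclasses import dataclass

CORRECT_ANSWER = 895  # answer to the example carnival booth question


@dataclass
class Candidate:
    answer: int
    score: float


def generate_solution(rng: random.Random) -> int:
    """Hypothetical toy generator: right 30% of the time, otherwise a random guess."""
    if rng.random() < 0.3:
        return CORRECT_ANSWER
    return rng.randint(0, 2000)


def verifier_score(answer: int, rng: random.Random) -> float:
    """Hypothetical toy verifier: rates correct answers higher, with some noise.

    Unlike a real verifier, it is given the known answer; it only simulates
    a scorer that prefers correct solutions.
    """
    base = 0.8 if answer == CORRECT_ANSWER else 0.2
    return base + rng.uniform(-0.15, 0.15)


def best_of_n(n: int, seed: int = 0) -> Candidate:
    """Sample n candidates, score each with the verifier, return the top-scoring one."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        answer = generate_solution(rng)
        candidates.append(Candidate(answer, verifier_score(answer, rng)))
    return max(candidates, key=lambda c: c.score)


if __name__ == "__main__":
    print(best_of_n(1))
    print(best_of_n(100))
```

With a single sample the toy system is right only when the generator happens to be; with many samples, a verifier that reliably prefers correct solutions almost always finds one.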
Not only does a six billion parameter model paired with a verifier outperform a fine-tuned 175 billion parameter model, but the verifier approach also scales much better with more training data than simply fine-tuning the language model.
Initially, the verifier was trained only to judge whether a solution reached the correct final answer, even if the reasoning behind it was flawed. As a result, it could reward correct conclusions drawn from incorrect reasoning, a substantial concern. To address this issue, researchers moved from an outcome-supervised reward model to a process-supervised reward model. This shift meant that the verifier now evaluated the correctness of each reasoning step rather than just the end result.
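The difference between the two kinds of supervision can be sketched on a single hypothetical solution whose final answer happens to be right although one step is wrong. The step labels are simplified to correct/incorrect, and the per-step probabilities are made-up numbers standing in for a trained model's outputs. As we understand "Let's Verify Step by Step," a process-supervised reward model scores a solution as the probability that every step is correct, computed as the product of per-step correctness probabilities; `prm_score` below follows that reading.

```python
import math

# A hypothetical four-step solution: the final answer is right,
# but step 2 contains an error.
final_answer_correct = True
step_labels = [True, False, True, True]  # simplified step-level judgments

# Outcome supervision: the training label only reflects the final answer,
# so this flawed solution is labeled as good.
outcome_label = final_answer_correct

# Process supervision: every step is labeled, so the flawed step is visible.
process_label = all(step_labels)


def prm_score(step_correct_probs: list[float]) -> float:
    """Probability that every step is correct: the product of per-step probabilities."""
    return math.prod(step_correct_probs)


# Hypothetical per-step probabilities a trained PRM might assign to this solution;
# the low value for step 2 drags the whole score down.
print(outcome_label, process_label)
print(round(prm_score([0.95, 0.10, 0.90, 0.95]), 4))
```

Because the score is a product, a single doubtful step is enough to push an otherwise plausible solution to the bottom of the ranking.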
As mentioned earlier, this result is documented in the remarkable paper 'Let's Verify Step by Step'. It raises the question of why perfect scores in mathematics could lead to artificial general intelligence.
A New Theory of Intelligence
Building on Mountcastle's research, a recent theory of intelligence, Jeff Hawkins's Thousand Brains Theory, posits that thousands of cortical columns in the brain each learn and construct models of complete objects. These columns are believed to be the fundamental units of learning and processing in the brain, each capable of creating a model of the world as it perceives it. However, the true complexity and intelligence of the brain emerge from how these individual models are combined and integrated.
This integration is likely what allows for a more nuanced and comprehensive understanding of the environment. Individual cortical columns process partly different aspects of an object or scene and then collaborate to form a unified perception, combining various sensory inputs and experiences. This process enables not just the recognition of objects but also an understanding of context, relationships, and abstract concepts.
This theory extends to artificial intelligence, where integrating multiple models could lead to more advanced, generalized forms of AI akin to human intelligence. By mimicking the brain's strategy of combining numerous, possibly specialized, models, AI systems could develop a more holistic and nuanced understanding of the world, a key step towards achieving AGI.
The challenge lies in effectively replicating this complex interplay of independent yet interconnected processing units. Just as the brain seamlessly integrates the input from countless cortical columns, an AGI system would need to harmonize diverse models,
each with its own specialized knowledge and perspective, into a coherent and cohesive intelligence.
Achieving this requires advances in how individual AI models learn and process information but, more importantly, in how those models are integrated. It is specifically this integration process that requires precise mathematics.
Specialized cells, namely grid and place cells, work in tandem to form a complex network for spatial awareness and navigation, building the brain's positioning system. This positioning system can be thought of as a virtual coordinate system over reality,
where every movement or directional change is recorded and encoded into each cortical column's model.
More than two centuries after its publication, Immanuel Kant's theory, as introduced in his 'Critique of Pure Reason,' appears to hold up. He proposed that we have "a priori" knowledge about space and time, suggesting an innate understanding of these concepts independent of experience.
These models are integrated into a single coherent model through positional encoding, which employs vector addition, trigonometric functions, and differential and integral calculus. This kind of path integration is essential for a unified and comprehensive form of intelligence.
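In its simplest form, path integration is dead reckoning: summing small displacement vectors, each computed from speed and heading with cosine and sine, approximates the integral of velocity over time. The sketch below also encodes the resulting position with sine and cosine features at several spatial scales, a rough analogy (our assumption, not a claim about the brain) to grid-cell-like periodic codes and to positional encodings. The function names and the chosen scales are hypothetical.

```python
import math


def integrate_path(moves, start=(0.0, 0.0)):
    """Dead reckoning: accumulate (speed, heading_radians, dt) steps into a position.

    Each step adds the displacement vector (v*cos(theta)*dt, v*sin(theta)*dt),
    a discrete approximation of integrating velocity over time.
    """
    x, y = start
    for speed, heading, dt in moves:
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y


def periodic_encoding(x, y, scales=(1.0, 2.0, 4.0)):
    """Encode a 2-D position as sine/cosine features at several spatial scales.

    The scales are arbitrary illustrative choices; each coordinate contributes
    one (sin, cos) pair per scale.
    """
    features = []
    for s in scales:
        for coord in (x, y):
            angle = 2 * math.pi * coord / s
            features.extend((math.sin(angle), math.cos(angle)))
    return features


# Walk 3 units east, then 4 units north: the integrated position is (3, 4).
moves = [(1.0, 0.0, 3.0), (2.0, math.pi / 2, 2.0)]
x, y = integrate_path(moves)
print(round(x, 6), round(y, 6))
print(len(periodic_encoding(x, y)))
```

Small errors in each step accumulate over a long path, which is one reason such a system benefits from precise arithmetic.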
Accurate mathematics appears to be a crucial foundation for human intelligence, and we might be on the cusp of a breakthrough. Yet, amid this exhilarating progress, there's a catch: it dramatically intensifies existing problems, casting a shadow over our newfound path.
Sustainable Intelligence: Rethinking the Cost of AI Advancements
When a system generates many possible answers and has a verifier choose the most promising one, the demand for computational power shifts substantially from training, which happens once, to inference, which happens every time the system is used. After all, each time such a system is used, it may be asked to generate hundreds or thousands of possible solutions.
Even if the language model is 30 times smaller, it still uses more than 33 times the energy of the larger model answering once when asked to produce a thousand possible solutions (1000/30 ≈ 33.3), and that is before counting the cost of running the verifier. This surge in computational demand would inevitably escalate the already pressing concerns about computing cost and its environmental impact.
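The back-of-the-envelope estimate can be written out explicitly. This minimal sketch assumes, as the estimate implicitly does, that inference cost scales linearly with parameter count and with the number of generated solutions. It optionally adds the verifier as a separate per-candidate term; the verifier's relative size is a hypothetical parameter, not a figure from either paper.

```python
def relative_inference_cost(size_ratio: float, num_samples: int,
                            verifier_size_ratio: float = 0.0) -> float:
    """Cost of best-of-N with a smaller model, relative to one answer from the large model.

    size_ratio: how many times smaller the generator is than the large model.
    num_samples: number of candidate solutions generated per question.
    verifier_size_ratio: hypothetical verifier size as a fraction of the large
        model; each candidate is scored once. Defaults to 0 (verifier ignored).
    Assumes cost is proportional to parameter count and to the number of samples.
    """
    generator_cost = num_samples / size_ratio
    verifier_cost = num_samples * verifier_size_ratio
    return generator_cost + verifier_cost


print(round(relative_inference_cost(30, 1000), 1))          # 33.3, generator only
print(round(relative_inference_cost(30, 1000, 1 / 30), 1))  # 66.7, verifier as large as the generator
```

Under these assumptions, a verifier as large as the generator roughly doubles the already 33-fold inference cost.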
For example, training even the 176 billion parameter BLOOM model, despite the project's emphasis on ethics, transparency, and consent, consumed as much energy as powering 30 homes for a year. It emitted 25 tons of carbon dioxide, an environmental footprint comparable to driving a car around the globe five times.
Moreover, the financial cost of running such advanced AI models is staggering, rendering only a limited range of use cases economically viable. This economic barrier could result in a few well-funded companies monopolizing these technologies, potentially
compromising transparency and accessibility. The concentration of control in the hands of a few would exacerbate existing issues, such as bias, as these entities could prioritize their commercial interests over broader ethical considerations. We would face
the risk of a technology landscape where bias is perpetuated and potentially amplified.
The exclusivity of access to advanced AI might further deepen existing inequalities in the tech world, with smaller entities and researchers finding themselves increasingly marginalized.
Such a dystopian future leads to a crucial societal question: can we even afford AGI at that cost?
AI Adequacy: Task-Tuning for Precision and Efficiency
Efforts to address the future existential risks of AI are indeed important, yet they can sometimes overshadow the immediate, tangible impacts that AI technologies already have. These impacts will likely be amplified as we edge closer to the development of
Artificial General Intelligence.
The journey towards AGI isn't just about technological advancement but also about creating tools to mitigate these very impacts. One such tool, the Diffusion Bias Explorer, allows users to investigate the biases inherent in image generation models, particularly
in the context of various professions.
As we continue to work towards a hungry, foolish, human-centered society, it is important that we continue to make contributions that democratize AI by making it adequate. We are achieving this by developing tools that simplify the creation of task-specific
AI models. These tools leverage pre-trained large language models to generate more precise, energy-efficient, cost-effective language models tailored to specific tasks.