A headline article from Ars Technica today explores whether large language models are capable of non-linguistic reasoning, citing researchers' findings that processing in a "latent space" can help AI work through tricky logic problems. What's going on? Let's take a closer look.
So far, large language models have achieved enormous success by using their transformer architecture to predict the next word (i.e., language token) needed to respond to a query. But when it comes to complex reasoning tasks that require abstract logic, some researchers have found that working everything out in this "language space" can cause problems, even for modern "reasoning" models.
Now, researchers are trying to address these issues by designing models that can work out potential logical solutions entirely in a "latent space," the hidden layer of computation just before the transformer generates language. While this approach does not revolutionize the reasoning capabilities of large language models, it clearly improves accuracy on certain types of logic problems and points to some interesting directions for new research.
Wait, what space?
Modern reasoning models, such as ChatGPT's o1, tend to work by generating a "chain of thought." In these models, each step of the logical process is expressed as a sequence of natural-language word tokens that are fed back through the model.
In a new paper, researchers from Meta's Fundamental AI Research (FAIR) team and the University of California, San Diego call this reliance on natural language and "word tokens" a "fundamental constraint" for these reasoning models. That's because successfully completing reasoning tasks often requires complex planning around specific key tokens in order to find the right logical path out of a multitude of options.
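As a rough sketch of that loop (the `generate_step` call below is a hypothetical stand-in, not any particular library's API), each reasoning step is verbalized as language tokens, appended to the context, and re-read by the model on the next pass:

```python
# Minimal sketch of chain-of-thought decoding: every intermediate
# reasoning step is verbalized as language tokens and fed back in.
# `model.generate_step` is a hypothetical helper, not a real API.

def chain_of_thought(model, prompt: str, max_steps: int = 8) -> str:
    context = prompt
    for _ in range(max_steps):
        # The model must route every step through "language space":
        # it emits word tokens describing the step...
        step_text = model.generate_step(context)
        # ...and those tokens are appended and re-read on the next pass.
        context += "\n" + step_text
        if "Answer:" in step_text:
            break
    return context
```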
The figure above illustrates the difference between standard models, which pass every step through the transformer as language, and the COCONUT model's use of hidden "latent" states. (Image source: Training Large Language Models to Reason in a Continuous Latent Space)
In current chain-of-thought models, word tokens are often generated for the sake of "textual coherence" and "fluency" while "contributing little to the actual reasoning process," the researchers write. Instead, they suggest, it would be ideal for large language models "to reason freely without any language constraints, and then translate their findings into language only when necessary."
To achieve this ideal, the researchers describe a method for "training large language models to reason in a continuous latent space," as the paper's title puts it. That "latent space" is essentially made up of the "hidden" set of intermediate token weights that the model holds just before the transformer generates a human-readable, natural-language version of that internal state.
In the researchers' COCONUT model (for Chain of Continuous Thought), these hidden states are encoded as "latent thoughts" that replace the individual written-out steps in a logical sequence during both training and query processing. This avoids the need to convert each step into natural language and "frees reasoning from the language space," the researchers write, yielding an optimized reasoning path they call a "continuous thought."
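The paper's training and decoding procedures are more involved than this, but the core mechanism can be sketched in a few lines: instead of projecting the transformer's final hidden state through the language head, sampling a word token, and re-embedding that token, the hidden state itself is fed back as the next input embedding for a few latent steps. Everything below (the `ToyBackbone` class, its dimensions, and the number of latent steps) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy transformer standing in for a pretrained language model.
# Sizes are arbitrary and chosen only to keep the example small.
class ToyBackbone(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_embeds):           # (batch, seq, d_model)
        return self.encoder(input_embeds)      # hidden states, same shape

def reason_in_latent_space(model, prompt_ids, num_latent_steps=4):
    """Feed the last hidden state back as the next input embedding,
    skipping the detour through language tokens for a few steps."""
    embeds = model.embed(prompt_ids)           # (1, seq, d_model)
    for _ in range(num_latent_steps):
        hidden = model(embeds)                 # (1, seq, d_model)
        latent_thought = hidden[:, -1:, :]     # last position's hidden state
        # Append the raw hidden state as a "continuous thought" instead of
        # decoding it into a word token and re-embedding that token.
        embeds = torch.cat([embeds, latent_thought], dim=1)
    # Only at the end do we translate back into language space.
    final_hidden = model(embeds)
    next_token_logits = model.lm_head(final_hidden[:, -1, :])
    return next_token_logits.argmax(dim=-1)

backbone = ToyBackbone()
prompt = torch.randint(0, 1000, (1, 10))
print(reason_in_latent_space(backbone, prompt))
```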
A broader view
While processing logic in latent space offers some efficiency benefits, the more important finding is that such a model can "encode multiple potential next steps simultaneously." Handling logic in latent space allows a kind of instant backtracking that the researchers compare to a breadth-first search through a graph, rather than "greedily" pursuing each logical option to its end, one at a time.
This emergent property of simultaneous processing shows up in testing even though the model was not explicitly trained for it, the researchers write. "While the model may not initially make the correct decision, it can maintain many possible options within its continuous thoughts and progressively eliminate incorrect paths through reasoning, guided by some implicit value function," they write.
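The breadth-first analogy is easier to see with a toy example. The sketch below (my own construction, not from the paper) contrasts a greedy strategy that commits to one inference branch at a time with a breadth-first search that keeps every candidate path alive and simply drops the ones that dead-end:

```python
from collections import deque

# Toy reasoning graph: nodes are intermediate conclusions, edges are
# possible inference steps. One branch leads nowhere.
graph = {
    "premise": ["A", "B"],
    "A": ["dead_end"],
    "B": ["C"],
    "C": ["goal"],
    "dead_end": [],
    "goal": [],
}

def greedy_search(start, goal):
    """Commit to the first available edge at every step."""
    node, path = start, [start]
    while node != goal:
        options = graph[node]
        if not options:
            return None          # stuck, with no built-in way to recover
        node = options[0]
        path.append(node)
    return path

def breadth_first_search(start, goal):
    """Expand all candidate branches level by level, pruning dead ends."""
    frontier = deque([[start]])
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            frontier.append(path + [nxt])
    return None

print(greedy_search("premise", "goal"))         # None: committed to branch "A"
print(breadth_first_search("premise", "goal"))  # ['premise', 'B', 'C', 'goal']
```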
This figure highlights some of the ways different models can fail at certain types of logical reasoning. (Image source: Training Large Language Models to Reason in a Continuous Latent Space)
This multi-path reasoning didn't actually improve COCONUT's accuracy over traditional chain-of-thought models on relatively simple tests of mathematical reasoning (GSM8K) or general reasoning (ProntoQA). But the researchers found that the model did comparatively well on a set of randomly generated ProntoQA-style queries involving complex, winding sets of logical conditions (for example, "every apple is a fruit, every fruit is food," and so on).
On these tasks, standard chain-of-thought reasoning models often got stuck down dead-end paths of inference, or even hallucinated completely made-up rules, when trying to work through the logical chain. Previous research has also suggested that the "verbalized" logical steps these models output "may actually utilize an underlying reasoning process different from the one being shared."
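To give a sense of what those queries look like, here is a tiny, hypothetical rule-chaining example in the same spirit (the rules, the distractor branch, and the helper function are my own illustration, not the benchmark itself):

```python
# Toy illustration of a ProntoQA-style query: chained "every X is a Y"
# rules, a distractor branch, and a question about what follows.
rules = {
    "apple": "fruit",
    "fruit": "food",
    "food": "thing",
    "rock": "mineral",   # distractor branch the reasoner must ignore
}

def entails(rules, start, target):
    """Follow the is-a chain from `start` and check if `target` is reached."""
    current, seen = start, set()
    while current in rules and current not in seen:
        seen.add(current)
        current = rules[current]
        if current == target:
            return True
    return False

print(entails(rules, "apple", "food"))     # True: apple -> fruit -> food
print(entails(rules, "apple", "mineral"))  # False: wrong branch
```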
The new study joins a growing body of research aimed at understanding and exploiting how large language models work at the level of their underlying neural networks. While this line of work has yet to produce a major breakthrough, the researchers believe that models pre-trained with these kinds of "continuous thoughts" from the start could "enable models to generalize more effectively across a wider range of reasoning scenarios."