A headline article from Ars Technica today explores whether large language models are capable of non-linguistic reasoning, citing researchers' findings that processing in a "latent space" can help AI work through tricky logic problems. What's going on? Let's take a closer look.
So far, large language models have achieved enormous success by using their transformer architecture to predict the next word (i.e., language token) needed to respond to a query. But when it comes to complex reasoning tasks that require abstract logic, some researchers have found that working everything out in this "language space" can cause problems, even for modern "reasoning" models.
Now, researchers are trying to address these issues by designing models that can work out potential logical solutions entirely in a "latent space," the hidden layer of computation just before the transformer generates language. While this approach does not revolutionize the reasoning capabilities of large language models, it clearly improves accuracy on certain types of logic problems and points to some interesting directions for new research.
Wait, what space?
Modern reasoning models, such as ChatGPT's o1, tend to work by generating a "chain of thought." In these models, each step of the logical process is expressed as a sequence of natural-language word tokens that are fed back through the model.
In a new paper, researchers from Meta's Fundamental AI Research (FAIR) team and the University of California, San Diego call this reliance on natural language and "word tokens" a "fundamental constraint" for these reasoning models. That's because successfully completing reasoning tasks often requires complex planning around specific key tokens in order to find the right logical path out of a multitude of options.
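As a rough sketch of that loop (the `generate_step` call below is a hypothetical stand-in, not any particular library's API), each reasoning step is verbalized as language tokens, appended to the context, and re-read by the model on the next pass:

```python
# Minimal sketch of chain-of-thought decoding: every intermediate
# reasoning step is verbalized as language tokens and fed back in.
# `model.generate_step` is a hypothetical helper, not a real API.

def chain_of_thought(model, prompt: str, max_steps: int = 8) -> str:
    context = prompt
    for _ in range(max_steps):
        # The model must route every step through "language space":
        # it emits word tokens describing the step...
        step_text = model.generate_step(context)
        # ...and those tokens are appended and re-read on the next pass.
        context += "\n" + step_text
        if "Answer:" in step_text:
            break
    return context
```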
The figure above illustrates the difference between standard models, which pass every step through the transformer as language, and the COCONUT model's use of hidden "latent" states. (Image source: Training Large Language Models to Reason in a Continuous Latent Space)
In current chain-of-thought models, word tokens are often generated for the sake of "textual coherence" and "fluency" while "contributing little to the actual reasoning process," the researchers write. Instead, they suggest, it would be ideal for large language models "to reason freely without any language constraints, and then translate their findings into language only when necessary."
To achieve this ideal, the researchers describe a method for "training large language models to reason in a continuous latent space," as the paper's title puts it. That "latent space" is essentially made up of the "hidden" set of intermediate token weights that the model holds just before the transformer generates a human-readable, natural-language version of that internal state.
In the researchers' COCONUT model (for Chain of Continuous Thought), these hidden states are encoded as "latent thoughts" that replace the individual written-out steps in a logical sequence during both training and query processing. This avoids the need to convert each step into natural language and "frees reasoning from the language space," the researchers write, yielding an optimized reasoning path they call a "continuous thought."
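The paper's training and decoding procedures are more involved than this, but the core mechanism can be sketched in a few lines: instead of projecting the transformer's final hidden state through the language head, sampling a word token, and re-embedding that token, the hidden state itself is fed back as the next input embedding for a few latent steps. Everything below (the `ToyBackbone` class, its dimensions, and the number of latent steps) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy transformer standing in for a pretrained language model.
# Sizes are arbitrary and chosen only to keep the example small.
class ToyBackbone(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_embeds):           # (batch, seq, d_model)
        return self.encoder(input_embeds)      # hidden states, same shape

def reason_in_latent_space(model, prompt_ids, num_latent_steps=4):
    """Feed the last hidden state back as the next input embedding,
    skipping the detour through language tokens for a few steps."""
    embeds = model.embed(prompt_ids)           # (1, seq, d_model)
    for _ in range(num_latent_steps):
        hidden = model(embeds)                 # (1, seq, d_model)
        latent_thought = hidden[:, -1:, :]     # last position's hidden state
        # Append the raw hidden state as a "continuous thought" instead of
        # decoding it into a word token and re-embedding that token.
        embeds = torch.cat([embeds, latent_thought], dim=1)
    # Only at the end do we translate back into language space.
    final_hidden = model(embeds)
    next_token_logits = model.lm_head(final_hidden[:, -1, :])
    return next_token_logits.argmax(dim=-1)

backbone = ToyBackbone()
prompt = torch.randint(0, 1000, (1, 10))
print(reason_in_latent_space(backbone, prompt))
```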
A broader view
While processing logic in latent space offers some efficiency benefits, the more important finding is that such a model can "encode multiple potential next steps simultaneously." Handling logic in latent space allows a kind of instant backtracking that the researchers compare to a breadth-first search through a graph, rather than "greedily" pursuing each logical option to its end, one at a time.
This emergent property of simultaneous processing shows up in testing even though the model was not explicitly trained for it, the researchers write. "While the model may not initially make the correct decision, it can maintain many possible options within its continuous thoughts and progressively eliminate incorrect paths through reasoning, guided by some implicit value function," they write.
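The breadth-first analogy is easier to see with a toy example. The sketch below (my own construction, not from the paper) contrasts a greedy strategy that commits to one inference branch at a time with a breadth-first search that keeps every candidate path alive and simply drops the ones that dead-end:

```python
from collections import deque

# Toy reasoning graph: nodes are intermediate conclusions, edges are
# possible inference steps. One branch leads nowhere.
graph = {
    "premise": ["A", "B"],
    "A": ["dead_end"],
    "B": ["C"],
    "C": ["goal"],
    "dead_end": [],
    "goal": [],
}

def greedy_search(start, goal):
    """Commit to the first available edge at every step."""
    node, path = start, [start]
    while node != goal:
        options = graph[node]
        if not options:
            return None          # stuck, with no built-in way to recover
        node = options[0]
        path.append(node)
    return path

def breadth_first_search(start, goal):
    """Expand all candidate branches level by level, pruning dead ends."""
    frontier = deque([[start]])
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            frontier.append(path + [nxt])
    return None

print(greedy_search("premise", "goal"))         # None: committed to branch "A"
print(breadth_first_search("premise", "goal"))  # ['premise', 'B', 'C', 'goal']
```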
This figure highlights some of the ways different models can fail at certain types of logical reasoning. (Image source: Training Large Language Models to Reason in a Continuous Latent Space)
This multi-path reasoning didn't actually improve COCONUT's accuracy over traditional chain-of-thought models on relatively simple tests of mathematical reasoning (GSM8K) or general reasoning (ProntoQA). But the researchers found that the model did comparatively well on a set of randomly generated ProntoQA-style queries involving complex, winding sets of logical conditions (for example, "every apple is a fruit, every fruit is food," and so on).
On these tasks, standard chain-of-thought reasoning models often got stuck down dead-end paths of inference, or even hallucinated completely made-up rules, when trying to work through the logical chain. Previous research has also suggested that the "verbalized" logical steps these models output "may actually utilize an underlying reasoning process different from the one being shared."
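To give a sense of what those queries look like, here is a tiny, hypothetical rule-chaining example in the same spirit (the rules, the distractor branch, and the helper function are my own illustration, not the benchmark itself):

```python
# Toy illustration of a ProntoQA-style query: chained "every X is a Y"
# rules, a distractor branch, and a question about what follows.
rules = {
    "apple": "fruit",
    "fruit": "food",
    "food": "thing",
    "rock": "mineral",   # distractor branch the reasoner must ignore
}

def entails(rules, start, target):
    """Follow the is-a chain from `start` and check if `target` is reached."""
    current, seen = start, set()
    while current in rules and current not in seen:
        seen.add(current)
        current = rules[current]
        if current == target:
            return True
    return False

print(entails(rules, "apple", "food"))     # True: apple -> fruit -> food
print(entails(rules, "apple", "mineral"))  # False: wrong branch
```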
The new study joins a growing body of research aimed at understanding and exploiting how large language models work at the level of their underlying neural networks. While this line of work has yet to produce a major breakthrough, the researchers believe that models pre-trained with these kinds of "continuous thoughts" from the start could "enable models to generalize more effectively across a wider range of reasoning scenarios."