Editorial | what does this mean?

This content has been selected, created and edited by the Finextra editorial team based upon its relevance and interest to our community.

LLMs display cognitive shortfalls - BIS

When posed with a logical puzzle that demands reasoning about the knowledge of others and about counterfactuals, large language models (LLMs) display a "distinctive and revealing pattern of failure," according to a bulletin from the Bank for International Settlements.

With ChatGPT capturing the public imagination and central banks around the world exploring the potential applications of LLMs, BIS has been testing their cognitive limits.

To do this, it quizzed GPT-4 with the well-known Cheryl’s birthday logic puzzle, finding that the LLM solved the puzzle flawlessly when presented with the original wording.

As the authors note, GPT-4 will have encountered the puzzle and its solution during its training. However, the model consistently failed when small incidental details - such as the names of the characters or the specific dates - were changed.
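To illustrate why changed names or dates should not matter to a reasoner that follows the logic, here is a minimal sketch of a solver for this puzzle. It assumes the commonly circulated version (Albert is told the month, Bernard the day, followed by three statements) and that version's date list; neither is quoted from the bulletin. The function name `solve` and the relabelled variant are hypothetical illustrations, not the perturbations BIS used.

```python
from collections import Counter

# Candidate dates from the widely circulated version of the puzzle
# (an assumption: the article itself does not list them).
ORIGINAL = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]


def solve(dates):
    """Return the dates consistent with all three statements.

    `dates` is a list of (month, day) pairs. One character knows only
    the month, the other only the day.
    """
    month_counts = Counter(month for month, _ in dates)
    day_counts = Counter(day for _, day in dates)

    # Statement 1: the month-knower cannot tell the date, and is sure
    # the day-knower cannot either. So the month has several dates and
    # contains no day that is unique across the whole list.
    months_with_unique_day = {m for m, d in dates if day_counts[d] == 1}
    after_1 = [
        (m, d) for m, d in dates
        if month_counts[m] > 1 and m not in months_with_unique_day
    ]

    # Statement 2: the day-knower now knows, so the day is unique
    # among the remaining candidates.
    days_left = Counter(d for _, d in after_1)
    after_2 = [(m, d) for m, d in after_1 if days_left[d] == 1]

    # Statement 3: the month-knower now knows too, so the month is
    # unique among what remains.
    months_left = Counter(m for m, _ in after_2)
    return [(m, d) for m, d in after_2 if months_left[m] == 1]


# Hypothetical relabelling: new month names and every day shifted by 5.
# The structure of the puzzle is unchanged, only the surface details.
RELABEL = {"May": "January", "June": "February",
           "July": "March", "August": "April"}
VARIANT = [(RELABEL[m], d + 5) for m, d in ORIGINAL]

print(solve(ORIGINAL))
print(solve(VARIANT))
```

Because the elimination steps depend only on which months and days are shared between candidates, the same procedure handles both lists; the character names never enter the computation at all.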

This, says the BIS bulletin, suggests a lack of true understanding of the underlying logic.

BIS says that the findings do not detract from the progress in central bank applications of machine learning to data management, macro analysis and regulation.

"Nevertheless, our findings do suggest that caution should be exercised in deploying large language models in contexts that necessitate careful and rigorous economic reasoning.

"The evidence so far is that the current generation of LLMs falls short of the rigour and clarity in reasoning required for the high-stakes analyses needed for central banking applications."

Comments: (2)

Vladimir Dimitroff - Senior Executives Forum - London 05 January, 2024, 13:14

Still more 'A' than 'I' - but working on it ;)

Ketharaman Swaminathan - GTM360 Marketing Solutions - Pune 08 January, 2024, 08:23

TBH how many human bankers do any "careful and rigorous economic reasoning" these days anyway? I wonder if any bank is planning to use Gen AI / LLM for such activities in the first place. Banks moved them to regulatory mandate and CBS, FD&P, HFT, and other software systems years ago. Whenever I ask my (human so far) RM to explain e.g. a certain TDS entry on my bank statement, his / her stock answer is, "It's according to RBI mandate" or "That's what FLEXCUBE says".

(Disclosure: I'm an ex-employee of the FLEXCUBE company.)