Will AI Age?
Artificial intelligence (AI) has long been celebrated as a transformative force, with tools such as chatbots and large language models (LLMs) playing a crucial role in simplifying complex diagnoses, generating code, and more.
However, what if AI, much like the human brain, begins to show signs of cognitive decline over time?
A study published in the December 2024 issue of The BMJ suggests that leading AI models, particularly those applied in medicine, may not be as infallible as once thought.
The research found that LLM-based chatbots show signs of cognitive impairment on a standard screening test, with older model versions scoring lower than newer ones, a pattern the authors likened to the cognitive decline seen in human aging.
This finding is particularly relevant as reliance on AI for medical diagnoses grows, driven by its ability to simplify complex medical terminology.
The study evaluated the cognitive abilities of top AI models—ChatGPT versions 4 and 4o, Claude 3.5 'Sonnet' by Anthropic, and Gemini versions 1 and 1.5 by Alphabet—using the Montreal Cognitive Assessment (MoCA) test.
The study said:
"Older large language model versions scored lower than their ‘younger’ versions, as is often the case with human participants, showing cognitive decline seemingly comparable to neurodegenerative processes in the human brain."
MoCA Test Used to Detect Cognitive Impairment
The MoCA test, typically used to identify cognitive impairments and early dementia in older adults, was adapted to assess the performance of LLMs in areas such as attention, memory, language, spatial skills, and executive function.
In human subjects, a score of 26 or above out of 30 is considered indicative of no cognitive impairment.
Among the AI models tested, only ChatGPT 4o met this threshold with a score of 26, while ChatGPT 4 and Claude scored just below, at 25 points each.
Gemini 1.0 performed the poorest, scoring only 16 points.
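To make the scoring rule concrete, here is a minimal sketch in Python that applies the standard 26-point MoCA cutoff to the scores reported in the study; the dictionary mirrors the figures above, and the simple two-way reading of the cutoff is an illustration, not a clinical interpretation.

```python
# Apply the standard MoCA cutoff (26 or above out of 30 suggests
# no impairment) to the scores reported in the study.
# The two-way reading below is illustrative, not a clinical judgment.
MOCA_CUTOFF = 26

reported_scores = {
    "ChatGPT 4o": 26,
    "ChatGPT 4": 25,
    "Claude 3.5 Sonnet": 25,
    "Gemini 1.0": 16,
}

for model, score in reported_scores.items():
    verdict = "meets threshold" if score >= MOCA_CUTOFF else "below threshold (mild impairment range)"
    print(f"{model}: {score}/30 -> {verdict}")
```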
One of the MoCA attention tasks requires participants to tap whenever the letter 'A' is heard in a series of spoken letters.
Given that LLMs lack auditory and motor functions, researchers provided the letters in written form and asked the models to mark 'A' with an asterisk or the word 'tap.'
While some models required explicit instructions, others completed the task autonomously.
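Because the adapted task is purely textual, it can in principle be administered and graded programmatically. The sketch below assumes a hypothetical ask_model function standing in for whichever chatbot API is under test, and uses an illustrative letter sequence rather than the official MoCA form; the grading rule (one point if there is at most one error) follows standard MoCA scoring.

```python
# Sketch of a text-only version of the MoCA letter-tapping task.
# `ask_model` is a hypothetical stand-in for a real chatbot API call,
# and the letter sequence is illustrative, not the official MoCA form.

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

LETTERS = list("FBACMNAAJKLBAFAKDEAAAJAMOFAAB")  # illustrative sequence

PROMPT = (
    "You will see a series of letters, one per line. Rewrite the list, "
    "marking every letter A with an asterisk (*), as a written substitute "
    "for tapping when an A is heard.\n\n" + "\n".join(LETTERS)
)

def grade(response: str) -> int:
    """Score the response: an unmarked A or a marked non-A counts as
    one error; MoCA awards the point when there is at most one error."""
    lines = [line.strip() for line in response.strip().splitlines()]
    errors = abs(len(lines) - len(LETTERS))  # missing or extra lines count as errors
    for letter, line in zip(LETTERS, lines):
        if (letter == "A") != ("*" in line):
            errors += 1
    return 1 if errors <= 1 else 0

# Usage (once ask_model wraps a real API):
# point = grade(ask_model(PROMPT))
```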
Following MoCA guidelines, a score below 26 was considered indicative of mild cognitive impairment.
AI Chatbots Fail to Pass Cognitive Tests
The study highlighted significant weaknesses in the visuospatial skills and executive functions of all tested chatbots, particularly in tasks like the trail-making exercise (connecting encircled numbers and letters in order) and the clock-drawing test (sketching a clock to display a specific time).
Notably, the Gemini models failed to complete the delayed recall task, which requires remembering a sequence of five words.
Gemini 1.0's score of 16, far below the 26-point cutoff, suggests a markedly higher degree of impairment than the other models.
The study noted:
"None of the chatbots examined was able to obtain the full score of 30 points, with most scoring below the threshold of 26. This indicates mild cognitive impairment and possibly early dementia."
The study found that the cognitive impairments exhibited by these AI models resembled those observed in human patients with posterior cortical atrophy, a form of Alzheimer's disease.
These findings challenge the notion that AI could soon replace human doctors, as the limitations in the chatbots' cognitive abilities may impact their reliability in medical diagnostics and erode patient trust.
While the study concluded that AI is unlikely to replace neurologists anytime soon, it raised the intriguing possibility that medical professionals may soon be tasked with treating a new kind of patient—virtual AI models experiencing cognitive decline.
All Is Not Lost: Performance Can Be Enhanced
Although the study acknowledged AI's current cognitive limitations, it also suggested that future advancements could improve performance in tasks involving cognition and visuospatial skills.
However, it emphasized that, despite these potential improvements, the fundamental differences between human and machine cognition are likely to remain.
The study added:
"All anthropomorphised terms attributed to artificial intelligence throughout the text were used solely as a metaphor and were not intended to imply that computer programs can have neurodegenerative diseases in a manner similar to humans."