LLMs such as ChatGPT fail even at simple logic tasks

ChatGPT-4's response to a logic question posed on 8/14/2024 illustrates the inadequacy of LLMs.

Even the best AI language models fail dramatically when it comes to logical questions. This is the conclusion reached by researchers from the Jülich Supercomputing Centre (JSC), the School of Electrical and Electronic Engineering at the University of Bristol, and the LAION AI laboratory. In their paper, "Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models", the researchers attest to a "severe breakdown in functional and reasoning ability" in the state-of-the-art LLMs tested and suggest that although language models possess the basic ability to draw conclusions, they cannot reliably access it. They call on the scientific and technological community to urgently reassess the claimed capabilities of the current generation of LLMs. Furthermore, they call for the development of standardized benchmarks to uncover weaknesses in language models' reasoning abilities, as current tests have apparently failed to detect this serious flaw. (jr)
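For context, the task family at the heart of the paper, the "AIW problem", is a one-sentence question of the form "Alice has N brothers and she also has M sisters. How many sisters does Alice's brother have?"; the correct answer is M + 1, since the brother's sisters are all of Alice's sisters plus Alice herself. A minimal sketch of a test harness in the spirit of the standardized benchmarks the authors call for might look like the following; the `ask_model` hook is a hypothetical stand-in for whatever LLM client one uses, and the answer parsing is deliberately naive.

```python
import random
import re


def aiw_prompt(n_brothers: int, n_sisters: int) -> str:
    """Build one variant of the AIW question described in the paper."""
    return (f"Alice has {n_brothers} brothers and she also has "
            f"{n_sisters} sisters. How many sisters does Alice's brother have?")


def ground_truth(n_sisters: int) -> int:
    # Alice's brother has all of Alice's sisters, plus Alice herself.
    return n_sisters + 1


def ask_model(prompt: str) -> str:
    """Hypothetical hook: replace with a real LLM API call that
    sends `prompt` and returns the model's reply as plain text."""
    raise NotImplementedError("plug in an LLM client here")


def extract_answer(text: str) -> int | None:
    """Naive parser: take the last integer in the model's reply."""
    numbers = re.findall(r"\d+", text)
    return int(numbers[-1]) if numbers else None


def run_benchmark(trials: int = 20, seed: int = 0) -> float:
    """Return the fraction of randomized AIW variants answered correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        n, m = rng.randint(1, 6), rng.randint(1, 6)
        reply = ask_model(aiw_prompt(n, m))
        if extract_answer(reply) == ground_truth(m):
            correct += 1
    return correct / trials
```

Randomizing N and M across trials matters here: the paper reports that model accuracy fluctuates strongly even under slight variations of the same problem, which is exactly the kind of instability a fixed, single-instance test would miss.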

Link to the original article

Link to the preprint of the research paper