A significant problem with AI chatbots such as ChatGPT is their tendency to “hallucinate”, that is, to make up information. Because these fabrications are unpredictable, users who rely on AI for writing help need to check the content it produces carefully; otherwise, they risk spreading incorrect information. According to OpenAI researchers, the issue is unavoidable (as reported by Computerworld).
AI will always hallucinate
A recent study, “Why Language Models Hallucinate,” found that AI systems make things up in part because the way they are tested rewards any answer, even an incorrect one. Current benchmarks prioritize giving a response over admitting uncertainty, so models learn to guess confidently rather than say ‘I don’t know’.
This behavior is similar to students who write anything at all on a test question just to avoid leaving it blank: when large language models are unsure, they sometimes make up answers and present them as if they were true. These believable but incorrect statements, commonly called ‘hallucinations’, persist even in the most advanced systems and can erode our confidence in them.
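To make the incentive concrete, here is a minimal Python sketch (illustrative only, not code from the study) of why accuracy-only grading favors guessing: a guess with any chance of being right scores more in expectation than an honest ‘I don’t know’.

```python
# Illustrative sketch: expected score under accuracy-only grading,
# where a correct answer earns 1 point and everything else earns 0.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected points for one question under right/wrong grading."""
    if abstain:
        return 0.0        # "I don't know" never earns points
    return p_correct      # guessing earns p_correct on average

# Even a long-shot guess beats abstaining under this scheme.
print(expected_score(0.25, abstain=False))  # 0.25
print(expected_score(0.25, abstain=True))   # 0.0
```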
A recent comparison of how different AI chatbots stack up against ChatGPT illustrates the problem. In one test, the models were asked a simple question: how many letter ‘d’s are there in the word ‘deepseek’? DeepSeek-V3 got it wrong ten times in a row, claiming there were two or three, and Claude 3.7 Sonnet did even worse, answering six or seven. Despite how basic the task is, these models clearly struggled with it.
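For reference, the correct answer is trivial to verify programmatically; the snippet below is a simple check, not part of the benchmark itself.

```python
# The word "deepseek" contains exactly one letter "d".
print("deepseek".count("d"))  # 1
```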
ChatGPT-5 is also prone to hallucinations, although, according to the researchers, to a lesser extent. The model demonstrated as much in August, when it responded “I don’t know” to a question from an internet user, a reply that impressed many people, including Elon Musk, because it was seen as a very human reaction. Interestingly, in the experiment the older model made fewer errors than the more advanced ones: o1 hallucinated on 16% of answers, compared with 33% for o3 and 48% for o4-mini.
Studies show that AI systems will inevitably sometimes ‘hallucinate’ – produce incorrect or nonsensical information. Instead of trying to eliminate this entirely, we should focus on managing it. Current evaluation methods also need to be revised to discourage systems from simply guessing and instead reward them for acknowledging when they don’t know something. Achieving this, however, requires clear rules and standards within the industry.
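One commonly discussed direction, sketched below as a hypothetical scoring rule rather than a concrete proposal from the study, is to penalize wrong answers so that guessing only pays off when the model is sufficiently confident.

```python
# Hypothetical penalized grading: +1 for a correct answer, 0 for
# abstaining, and a penalty for a wrong answer. Guessing is now only
# worthwhile above a confidence threshold.

def expected_score(p_correct: float, abstain: bool,
                   wrong_penalty: float = 1.0) -> float:
    """Expected points for one question under penalized grading."""
    if abstain:
        return 0.0
    return p_correct - (1.0 - p_correct) * wrong_penalty

# With a 1-point penalty, guessing only beats "I don't know" when the
# model is more than 50% sure of its answer.
print(expected_score(0.25, abstain=False))  # -0.5
print(expected_score(0.75, abstain=False))  #  0.5
print(expected_score(0.25, abstain=True))   #  0.0
```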