OpenAI's new reasoning AI models hallucinate more
https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/
OpenAI's recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up. In fact, they hallucinate more than several of OpenAI's older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today's best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn't seem to be the case for o3 and o4-mini.
-snip-
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That's roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA, hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro outside of ChatGPT, then copied the numbers into its answer. While o3 has access to some tools, it can't do that.
-snip-