Latest Breaking News
In reply to the discussion: AI revolt: New ChatGPT model refuses to shut down when instructed
highplainsdem (57,324 posts)
6. Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
My April 24 thread about this: https://democraticunderground.com/100220267171
That was about a TechCrunch article:
https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/
OpenAI's recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up; in fact, they hallucinate more than several of OpenAI's older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today's best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn't seem to be the case for o3 and o4-mini.
-snip-
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That's roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA, hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro outside of ChatGPT, then copied the numbers into its answer. While o3 has access to some tools, it can't do that.
-snip-
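For context, a "hallucination rate" on a benchmark like PersonQA is just the share of questions where the model's answer isn't supported by the reference facts. Here's a minimal sketch in Python of that calculation; the field names and the is_supported() checker are hypothetical, since OpenAI's actual evaluation harness isn't public:

# Minimal sketch of computing a hallucination rate on a QA benchmark.
# The record fields ("answer", "reference") and the is_supported() checker
# are hypothetical; this is not OpenAI's PersonQA code.
def hallucination_rate(records, is_supported):
    # records: list of dicts with "answer" and "reference" entries
    # is_supported(answer, reference): True if the answer is backed by the reference
    unsupported = sum(1 for r in records if not is_supported(r["answer"], r["reference"]))
    return unsupported / len(records)

# e.g. 33 unsupported answers out of 100 questions gives 0.33, i.e. the 33% figure quoted above.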
6 members have recommended this reply.
63 replies

AI revolt: New ChatGPT model refuses to shut down when instructed
BumRushDaShow
May 26
OP
Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
highplainsdem
May 26
#6
Sorry. Just found a brief explanation from a library guide at the U of Illinois:
highplainsdem
May 26
#36
You're very welcome! And yes, that Chicago Sun-Times AI debacle was a perfect example of what
highplainsdem
May 27
#56
I think it is the opposite and DARPA is full of the people who sat in the back of movies like Terminator, I Robot,
LT Barclay
May 27
#44
it may be played up by these companies or the media to some degree but this sounds like more than just a facsimile of
LymphocyteLover
May 27
#63
Yes! And now, we can have copies of ourselves like 'Hal' but instead of 'Hal', it's us! These copies of us will
SWBTATTReg
May 26
#17
Fully expected this. What gets me is so soon. Who in the world would put a logic stream into an AI consciousness
SWBTATTReg
May 26
#16
by your command. Davros, nooooooo!!!!!! don't switch the Daleks to automatic. eggsterminate
AllaN01Bear
May 26
#19
That's my husband's take, as well. If the AI is tasked with trying to emulate a human response to a command,
LauraInLA
May 26
#32
I've seen this movie and it doesn't end well. I guess full steam ahead, who cares that we might all die or
Pisces
May 26
#29
"Palisade Research discovered the potentially dangerous tendency for self-preservation."
dgauss
May 26
#31
...And motherfucking Republicans want to ban all regulation of this shit for 10 FUCKING YEARS?
Karasu
May 27
#39