OpenAI's o3 and o4-mini hallucinate way higher than previous models

27.04.2025 07:40

A troubling issue nestled in OpenAI's technical report. And OpenAI doesn't know why.

By Cecily Mauran, Tech Reporter at Mashable, on April 19, 2025

By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more than o1. As first reported by TechCrunch, OpenAI's system card details the results of the PersonQA evaluation, which is designed to test for hallucinations. From the results of this evaluation, o3's hallucination ra...