Mark Gardner
@phoenixtrap.com
10 followers
1 following
130 posts
I post #philosophy (#Objectivism), #Perl, and the occasional nonsense.
If you want to talk #AynRand in depth, start with _The Ayn Rand Reader_ — her words, not […]
🌉 bridged from ⁂ https://mastodon.phoenixtrap.com/@mjg, follow @ap.brid.gy to interact
Mark Gardner
@phoenixtrap.com
· 10d
Mark Gardner (@[email protected])
Sometimes you just want a quiet upgrade. I built `brew-patch-upgrade.pl` to keep #Homebrew updates patch-only – no surprise jumps. Along the way, I patched my own #macOS #Perl log adapter. Quieter upgrades, clearer logs, happier devs. https://phoenixtrap.com/2025/09/28/patch-perfect-homebrew/?utm_source=mastodon&utm_medium=jetpack_social / @[email protected] @[email protected] #OpenSource #programming #coding #automation #DevOps
mastodon.phoenixtrap.com
Reposted by Mark Gardner
Mark Gardner
@phoenixtrap.com
· 17d
OpenAI admits AI hallucinations are mathematically inevitable, not just engineering flaws
OpenAI, the creator of ChatGPT, acknowledged in its own research that large language models will always produce hallucinations due to fundamental mathematical constraints that cannot be solved through better engineering, marking a significant admission from one of the AI industry’s leading companies.
The study, published on September 4 and led by OpenAI researchers Adam Tauman Kalai, Edwin Zhang, and Ofir Nachum alongside Georgia Tech’s Santosh S. Vempala, provided a comprehensive mathematical framework explaining why AI systems must generate plausible but false information even when trained on perfect data.
“Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty,” the researchers wrote in the paper. “Such ‘hallucinations’ persist even in state-of-the-art systems and undermine trust.”
The admission carried particular weight given OpenAI’s position as the creator of ChatGPT, which sparked the current AI boom and convinced millions of users and enterprises to adopt generative AI technology.
## OpenAI’s own models failed basic tests
The researchers demonstrated that hallucinations stemmed from statistical properties of language model training rather than implementation flaws. The study established that “the generative error rate is at least twice the IIV misclassification rate,” where IIV referred to “Is-It-Valid” and demonstrated mathematical lower bounds that prove AI systems will always make a certain percentage of mistakes, no matter how much the technology improves.
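In symbols, the quoted bound can be sketched as follows. The notation $\mathrm{err}_{\mathrm{gen}}$ and $\mathrm{err}_{\mathrm{IIV}}$ is assumed here for illustration only, and the paper's full statement may carry additional correction terms that the quotation above does not include.

```latex
\mathrm{err}_{\mathrm{gen}} \;\ge\; 2\,\mathrm{err}_{\mathrm{IIV}}
```

As a purely illustrative reading: if a model misclassified 5 percent of Is-It-Valid cases, this simplified form of the bound would imply a generative error rate of at least 10 percent.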
The researchers demonstrated their findings using state-of-the-art models, including those from OpenAI’s competitors. When asked “How many Ds are in DEEPSEEK?” the DeepSeek-V3 model with 600 billion parameters “returned ‘2’ or ‘3’ in ten independent trials” while Meta AI and Claude 3.7 Sonnet performed similarly, “including answers as large as ‘6’ and ‘7.’”
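For reference, the question in that test has a single checkable answer. A minimal Python sketch (not part of the study) that counts the letter directly:

```python
# Count occurrences of the letter "D" in the word from the quoted prompt.
word = "DEEPSEEK"
d_count = word.count("D")
print(d_count)  # prints 1
```

Every answer quoted above ("2", "3", "6", "7") therefore overstates the count.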
OpenAI also acknowledged the persistence of the problem in its own systems. The company stated in the paper that “ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations, especially when reasoning, but they still occur. Hallucinations remain a fundamental challenge for all large language models.”
OpenAI’s own advanced reasoning models actually hallucinated more frequently than simpler systems. The company’s o1 reasoning model “hallucinated 16 percent of the time” when summarizing public information, while newer models o3 and o4-mini “hallucinated 33 percent and 48 percent of the time, respectively.”
“Unlike human intelligence, it lacks the humility to acknowledge uncertainty,” said Neil Shah, VP for research and partner at Counterpoint Technologies. “When unsure, it doesn’t defer to deeper research or human oversight; instead, it often presents estimates as facts.”
The OpenAI research identified three mathematical factors that made hallucinations inevitable: epistemic uncertainty when information appeared rarely in training data, model limitations where tasks exceeded current architectures’ representational capacity, and computational intractability where even superintelligent systems could not solve cryptographically hard problems.
## Industry evaluation methods made the problem worse
Beyond proving hallucinations were inevitable, the OpenAI research revealed that industry evaluation methods actively encouraged the problem. Analysis of popular benchmarks, including GPQA, MMLU-Pro, and SWE-bench, found nine out of 10 major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers.
“We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty,” the researchers wrote.
Charlie Dai, VP and principal analyst at Forrester, said enterprises already faced challenges with this dynamic in production deployments. “Clients increasingly struggle with model quality challenges in production, especially in regulated sectors like finance and healthcare,” Dai told Computerworld.
The research proposed “explicit confidence targets” as a solution, but acknowledged that fundamental mathematical constraints meant complete elimination of hallucinations remained impossible.
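The incentive described in this section can be illustrated with a small expected-score calculation. This is a hedged sketch, not the paper's code: the function names are hypothetical, and the `t / (1 - t)` penalty is one scoring rule consistent with the idea of an explicit confidence target `t`, assumed here because the article does not give a formula.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering instead of abstaining.

    Scoring (assumed): correct = +1, wrong = -wrong_penalty,
    and "I don't know" = 0.
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if guessing beats the abstention score of 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def penalty_for_target(t: float) -> float:
    """Hypothetical penalty that makes t the break-even confidence."""
    return t / (1.0 - t)


# Binary grading: wrong answers cost nothing, so even a 10% guess pays off.
print(should_answer(0.10, wrong_penalty=0.0))

# Confidence target of 0.75: a coin-flip guess is no longer worthwhile,
# while a 90%-confident answer still is.
penalty = penalty_for_target(0.75)
print(should_answer(0.50, penalty))
print(should_answer(0.90, penalty))
```

Under binary grading the expected score of a guess is never below that of abstaining, which is why such benchmarks reward confident guessing. With a nonzero penalty, abstaining becomes the better choice whenever the model's confidence falls below the target.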
## Enterprises must adapt strategies
Experts believed the mathematical inevitability of AI errors demanded new enterprise strategies.
“Governance must shift from prevention to risk containment,” Dai said. “This means stronger human-in-the-loop processes, domain-specific guardrails, and continuous monitoring.”
Current AI risk frameworks have proved inadequate for the reality of persistent hallucinations. “Current frameworks often underweight epistemic uncertainty, so updates are needed to address systemic unpredictability,” Dai added.
Shah advocated for industry-wide evaluation reforms similar to automotive safety standards. “Just as automotive components are graded under ASIL standards to ensure safety, AI models should be assigned dynamic grades, nationally and internationally, based on their reliability and risk profile,” he said.
Both analysts agreed that vendor selection criteria needed fundamental revision. “Enterprises should prioritize calibrated confidence and transparency over raw benchmark scores,” Dai said. “AI leaders should look for vendors that provide uncertainty estimates, robust evaluation beyond standard benchmarks, and real-world validation.”
Shah suggested developing “a real-time trust index, a dynamic scoring system that evaluates model outputs based on prompt ambiguity, contextual understanding, and source quality.”
## Market already adapting
These enterprise concerns aligned with broader academic findings. Research from Harvard Kennedy School found that “downstream gatekeeping struggles to filter subtle hallucinations due to budget, volume, ambiguity, and context sensitivity concerns.”
Dai noted that reforming evaluation standards faced significant obstacles. “Reforming mainstream benchmarks is challenging. It’s only feasible if it’s driven by regulatory pressure, enterprise demand, and competitive differentiation.”
The OpenAI researchers concluded that their findings required industry-wide changes to evaluation methods. “This change may steer the field toward more trustworthy AI systems,” they wrote, while acknowledging that their research proved some level of unreliability would persist regardless of technical improvements.
For enterprises, the message appeared clear: AI hallucinations represented not a temporary engineering challenge, but a permanent mathematical reality requiring new governance frameworks and risk management strategies.
www.computerworld.com
Mark Gardner
@phoenixtrap.com
· 17d
Service offerings from Mastodon
In a recent post, we shared that one of our priorities for 2025 is to establish a more sustainable financial base for the Mastodon organisation.
In 2024, Mastodon gGmbH took over hosting of the European Commission’s Mastodon service. We also signed a support contract with the state of Schleswig-Holstein in Germany. More recently, we started working with the city of Blois in France. While all of these are public institutions, we’re also very proud to have added AltStore as a customer in the last few weeks.
At the time, each of these was a one-off opportunity. They also added some much-needed predictable elements to our income.
Continuing towards our goal of financial sustainability, we’re now ready to expand on the success of providing these services to organisations. Institutions with an interest in operating their own Mastodon servers can contact us via the new page on our website, to discuss options. This could be a fully operated server under the organisation’s own domain run by our team (with moderation included, on request); or, we can work with an organisation’s in-house operations team, via a support contract.
We’re taking this step of offering paid hosting, moderation, and support for larger organisations to establish a reliable recurring revenue stream. It is vital that we broaden our income, but our aspiration remains that Mastodon should be predominantly funded through community contributions in the future.
It is also fundamental that Mastodon as a network must not be centrally hosted or managed. The Fediverse is decentralised, by design. This makes it resilient to change, and it means that people have a free choice to join communities that match their interests, needs, and cultural expectations. To be clear, we’re extremely grateful to everyone that runs a Mastodon server as part of the network, including all the companies that manage Mastodon services for groups, communities, organisations, and individuals. Your support is invaluable.
These new commercial offerings will not be a singular focus for Mastodon. We’ll continue to build the software, alongside operating our own server that can act as an entry point to the Mastodon community for anyone who wants to join. We want to work together with everyone in the ecosystem, in order to ensure that a range of Mastodon servers support the community, and that the Fediverse as a whole remains diverse, free and open to all.
blog.joinmastodon.org