Adi Simhi
@adisimhi.bsky.social
21 followers 25 following 9 posts
NLProc and machine learning. Ph.D. student at the Technion
adisimhi.bsky.social
Check out our new paper, ManagerBench👔, which evaluates how LLM agents trade off achieving their goals against avoiding harm to humans.
mtutek.bsky.social
🤔What happens when LLM agents must choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid harm to humans?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
adisimhi.bsky.social
🔍Check out our paper "Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs", at arxiv.org/pdf/2502.12964 and code at github.com/technion-cs-...
adisimhi.bsky.social
What do you think? 🤔
Could high-certainty hallucinations be a major roadblock to safe AI deployment? Let’s discuss! 👇
adisimhi.bsky.social
🔮 Takeaway:
We need new approaches to understand hallucinations so we can mitigate them better.
This research moves us toward deeper insights into why LLMs hallucinate and how we can build more trustworthy AI.
adisimhi.bsky.social
💡Why does this matter?
- Not all hallucinations stem from uncertainty or lack of knowledge.
- High-certainty hallucinations appear systematically across models & datasets.
- This challenges existing hallucination detection & mitigation strategies that rely on uncertainty signals.
adisimhi.bsky.social
🛠️How did we test this?
We used knowledge detection & uncertainty measurement methods to analyze when and how hallucinations occur.
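For readers wondering what an "uncertainty signal" looks like in practice, here is a minimal sketch (not the paper's pipeline; the model name, prompt, and mean-token-probability certainty proxy are illustrative assumptions) of scoring how certain a model is in its own greedy answer:

```python
# Minimal sketch (illustrative, not the paper's exact method): estimate a
# model's certainty in its own answer from token-level probabilities,
# using Hugging Face Transformers. Model name and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Greedily generate a short continuation and keep per-step scores.
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )

# Probability the model assigned to each token it actually generated.
gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
step_probs = []
for step, logits in enumerate(out.scores):
    probs = torch.softmax(logits[0], dim=-1)
    step_probs.append(probs[gen_tokens[step]].item())

# Simple certainty proxy: mean probability of the generated tokens.
# High certainty + factually wrong answer = the kind of hallucination studied here.
certainty = sum(step_probs) / len(step_probs)
print(tokenizer.decode(gen_tokens), f"certainty≈{certainty:.2f}")
```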
adisimhi.bsky.social
🚨Key finding:
LLMs can produce hallucinations with high certainty—even when they possess the correct knowledge!
adisimhi.bsky.social
🔍The problem:
LLMs sometimes generate hallucinations: factually incorrect outputs. It is often assumed that if a model is certain and does not lack the relevant knowledge, its output must be correct.
adisimhi.bsky.social
🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We identify these hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social, and Yonatan Belinkov