Adi Simhi
@adisimhi.bsky.social
21 followers 25 following 9 posts
NLProc and machine learning. Ph.D. student at the Technion
adisimhi.bsky.social
Check out our new paper, ManagerBench👔, which evaluates how LLM agents trade off achieving their goals against avoiding harm to humans.
mtutek.bsky.social
🤔What happens when LLM agents must choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid harm to humans?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
adisimhi.bsky.social
🔍Check out our paper "Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs", at arxiv.org/pdf/2502.12964 and code at github.com/technion-cs-...
adisimhi.bsky.social
What do you think? 🤔
Could high-certainty hallucinations be a major roadblock to safe AI deployment? Let’s discuss! 👇
adisimhi.bsky.social
🔮 Takeaway:
We need new approaches to understand hallucinations so we can mitigate them better.
This research moves us toward deeper insights into why LLMs hallucinate and how we can build more trustworthy AI.
adisimhi.bsky.social
💡Why does this matter?
- Not all hallucinations stem from uncertainty or lack of knowledge.
- High-certainty hallucinations appear systematically across models & datasets.
- This challenges existing hallucination detection & mitigation strategies that rely on uncertainty signals.
adisimhi.bsky.social
🛠️How did we test this?
We used knowledge detection & uncertainty measurement methods to analyze when and how hallucinations occur.
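For readers wondering what an "uncertainty signal" looks like in practice, here is a minimal sketch (not the paper's pipeline; the model name, prompt, and mean-token-probability certainty proxy are illustrative assumptions) of scoring how certain a model is in its own greedy answer:

```python
# Minimal sketch (illustrative, not the paper's exact method): estimate a
# model's certainty in its own answer from token-level probabilities,
# using Hugging Face Transformers. Model name and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Greedily generate a short continuation and keep per-step scores.
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )

# Probability the model assigned to each token it actually generated.
gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
step_probs = []
for step, logits in enumerate(out.scores):
    probs = torch.softmax(logits[0], dim=-1)
    step_probs.append(probs[gen_tokens[step]].item())

# Simple certainty proxy: mean probability of the generated tokens.
# High certainty + factually wrong answer = the kind of hallucination studied here.
certainty = sum(step_probs) / len(step_probs)
print(tokenizer.decode(gen_tokens), f"certainty≈{certainty:.2f}")
```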
adisimhi.bsky.social
🚨Key finding:
LLMs can produce hallucinations with high certainty—even when they possess the correct knowledge!
adisimhi.bsky.social
🔍The problem:
LLMs sometimes generate hallucinations: factually incorrect outputs. It is often assumed that if a model is certain and does not lack the relevant knowledge, its output must be correct.
adisimhi.bsky.social
🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We identify these hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social, and Yonatan Belinkov