Zorik Gekhman
@zorikgekhman.bsky.social
https://zorikg.github.io/
This work is the result of a collaboration with a great team. Thanks to my co-authors:
Eyal Ben David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpektor, Jonathan Herzig, and Roi Reichart.

Paper: Inside-Out: Hidden Factual Knowledge in LLMs (arxiv.org/abs/2503.15299)

17/🧵 (end)
We hope our framework facilitates future research on hidden knowledge, ultimately leading to more transparent and reliable LLMs.
16/🧵
Yet the fact that models fail to generate known answers puts a practical constraint on scaling test-time compute via repeated sampling in closed-book QA. Significant gains remain inaccessible because we fail to sample answers that the probe would otherwise rank first.
15/🧵
We also leverage our setup to enhance performance in a challenging closed-book QA setting, achieving a 12% average relative improvement over greedy decoding by increasing test-time compute: sampling a large set of answers and selecting the top one using our probe.
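As a rough sketch of this best-of-N selection (our illustration, not the paper's code; sample_fn and probe_fn are hypothetical stand-ins for the model's sampler and the linear probe):

```python
def rerank_with_probe(question: str, sample_fn, probe_fn, n_samples: int = 1000) -> str:
    """Best-of-N answering: sample many candidates, return the probe's top pick.

    sample_fn(question, n) -> list[str]: temperature sampling from the LLM.
    probe_fn(question, answer) -> float: internal, probe-based answer score.
    Both are hypothetical stand-ins, not APIs from the paper.
    """
    candidates = sample_fn(question, n_samples)
    # Greedy decoding would commit to a single answer; here the probe
    # selects the best answer from the sampled pool instead.
    return max(candidates, key=lambda a: probe_fn(question, a))
```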
14/🧵
For example, in one case the correct answer “Volvo Buses” gets a very low P(a | q) score, meaning it is unlikely to be generated; accordingly, it wasn’t sampled even once in 1,000 attempts in our study. Yet the probe scores it higher than all the alternatives.
13/🧵
This shows that LLMs can know the answer but have practically zero chance of generating it even once, despite large-scale repeated sampling.

This highlights limitations in the generation process and opens interesting directions for future research on decoding mechanisms.

12/🧵
We also discover an extreme case of hidden knowledge. When the ground-truth answer isn’t sampled after 1,000 attempts, manually adding it to the set of candidate answers leads to a substantial increase in knowledge scores.
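A sketch of this check (our illustration; knowledge_score is a hypothetical function implementing the pairwise measure from 5/🧵):

```python
def knowledge_with_gold_injected(question, sampled_answers, gold_answer, knowledge_score):
    """Re-score knowledge after manually injecting a never-sampled gold answer."""
    candidates = list(sampled_answers)
    if gold_answer not in candidates:
        # 1,000 samples missed the gold answer entirely; add it by hand
        # before computing the pairwise knowledge score.
        candidates.append(gold_answer)
    # knowledge_score(question, candidates, gold_answer) -> float is a
    # hypothetical helper, not an API from the paper.
    return knowledge_score(question, candidates, gold_answer)
```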
11/🧵
Our results indicate that LLMs consistently exhibit hidden knowledge, with an average relative gap of 40%.

This highlights the need to understand these differences and build models that better use their knowledge, for which our framework serves as a foundation.

10/🧵
We then compare internal and external knowledge.

Internal knowledge is measured using a linear probing classifier to score candidate answers, while external knowledge is measured using standard methods that rely on the model’s observable token-level probabilities.
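A minimal sketch of the two kinds of scorers, assuming a HuggingFace causal LM (GPT-2 as a placeholder) and an already-trained probe vector probe_w with bias probe_b; the model choice, probe layer, and boundary tokenization are our simplifying assumptions, not the paper's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def external_score(question: str, answer: str) -> float:
    """External signal: log P(answer | question) from observable token probabilities."""
    q_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(question + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits[0]
    log_probs = torch.log_softmax(logits[:-1], dim=-1)
    # Sum log-probabilities of the answer tokens given their left context.
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(q_len - 1, full_ids.shape[1] - 1)
    )

def internal_score(question: str, answer: str,
                   probe_w: torch.Tensor, probe_b: float, layer: int = -1) -> float:
    """Internal signal: a linear probe over an intermediate hidden state."""
    ids = tok(question + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids).hidden_states[layer][0, -1]  # last-token state
    return float(probe_w @ hidden + probe_b)
```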

9/🧵
In our study, we estimate the set of (correct, incorrect) answer pairs per question using 1,000 model-generated answers, labeled for correctness by an LLM judge that compares each answer to the ground truth.
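Schematically (our sketch; judge_is_correct is a hypothetical wrapper around the LLM judge):

```python
from itertools import product

def build_answer_pairs(question, sampled_answers, ground_truth, judge_is_correct):
    """Turn 1,000 samples into (correct, incorrect) answer pairs for scoring."""
    unique = set(sampled_answers)
    # The LLM judge labels each unique answer by comparing it to the ground truth.
    correct = {a for a in unique if judge_is_correct(question, a, ground_truth)}
    incorrect = unique - correct
    # Every (correct, incorrect) pair becomes a test case for a scoring function.
    return list(product(correct, incorrect))
```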
8/🧵
We define hidden knowledge as the condition in which internal knowledge exceeds external knowledge.
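In symbols (our notation, using the knowledge measure K defined in 5/🧵):

```latex
% Hidden knowledge for a question q: the internal scorer reveals strictly
% more knowledge than the external scorer does.
\[
\mathrm{hidden}(q) \;\iff\; K_{\mathrm{int}}(q) > K_{\mathrm{ext}}(q)
\]
```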
7/🧵
This allows us to measure internal and external knowledge with the same definition; we just swap the scoring method: external scorers rely only on the model's observable token-level probabilities, while internal scorers can also use intermediate computations.
6/🧵
We propose to measure knowledge relative to a function that scores answer candidates using signals from the model, and we formalize knowledge of a question as the fraction of correct-incorrect answer pairs where the correct one is scored higher.
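In our notation: for a question q with correct answers A_c, incorrect answers A_w, and a scoring function f,

```latex
% Knowledge of q relative to scorer f: the fraction of (correct, incorrect)
% answer pairs in which f ranks the correct answer higher.
\[
K_f(q) \;=\; \frac{1}{|A_c|\,|A_w|}
\sum_{a \in A_c} \sum_{a' \in A_w} \mathbf{1}\!\left[\, f(q, a) > f(q, a') \,\right]
\]
```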
5/🧵
To define hidden knowledge, we first need a definition of “knowledge” itself, a notion that is not yet well defined for LLMs, as shown by @constanzafierro.bsky.social.
4/🧵
Yet, despite its importance from both practical and interpretability perspectives, hidden knowledge hasn't been clearly defined or measured.

We propose such a definition, laying the foundations for studying this concept, and use it in a controlled study to demonstrate hidden knowledge.
3/🧵
🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”?

In our new paper, we clearly define this concept and design controlled experiments to test it.
1/🧵