[Read] PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading arxiv.org/abs/2510.22242
        
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
Large Language Models (LLMs) increasingly serve as research assistants, yet their reliability in scholarly tasks remains under-evaluated. In this work, we introduce PaperAsk, a benchmark that systemat...
arxiv.org
        
          
October 29, 2025 at 4:47 PM
        
      
    New Preprint: #PaperAsk: A Benchmark for Reliability Evaluation of #LLMs in Paper Search and Reading (via #arXiv) arxiv.org/abs/2510.22242 #scholcomm #AI #discovery #retrieval
          
October 28, 2025 at 8:13 PM
        
        
      
    Yutao Wu, Xiao Liu, Yunhao Feng, Jiale Ding, Xingjun Ma
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
https://arxiv.org/abs/2510.22242
October 28, 2025 at 5:53 AM
          
Yutao Wu, Xiao Liu, Yunhao Feng, Jiale Ding, Xingjun Ma: PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading https://arxiv.org/abs/2510.22242 https://arxiv.org/pdf/2510.22242 https://arxiv.org/html/2510.22242
          
October 28, 2025 at 6:32 AM
        
      
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading | arXiv preprint
A reliability test of #AIGenerative #LLM for scientific publication search.
Guess what? "missing over 60% of relevant literature"
        
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
Large Language Models (LLMs) increasingly serve as research assistants, yet their reliability in scholarly tasks remains under-evaluated. In this work, we introduce PaperAsk, a benchmark that systemat...
arxiv.org
          
        
          
October 29, 2025 at 5:32 AM
        
      
     
        