[Read] PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading arxiv.org/abs/2510.22242
        
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
Large Language Models (LLMs) increasingly serve as research assistants, yet their reliability in scholarly tasks remains under-evaluated. In this work, we introduce PaperAsk, a benchmark that systemat...
arxiv.org
        
          
October 29, 2025 at 4:47 PM
        
      
    New Preprint: #PaperAsk: A Benchmark for Reliability Evaluation of #LLMs in Paper Search and Reading (via #arXiv) arxiv.org/abs/2510.22242 #scholcomm #AI #discovery #retrieval
          
October 28, 2025 at 8:13 PM
        
        
      
    Yutao Wu, Xiao Liu, Yunhao Feng, Jiale Ding, Xingjun Ma
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
https://arxiv.org/abs/2510.22242
October 28, 2025 at 5:53 AM
          
Yutao Wu, Xiao Liu, Yunhao Feng, Jiale Ding, Xingjun Ma: PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading https://arxiv.org/abs/2510.22242 https://arxiv.org/pdf/2510.22242 https://arxiv.org/html/2510.22242
          
October 28, 2025 at 6:32 AM
        
      
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading | arXiv preprint
A reliability test of #AIGenerative #LLM for scientific publication search.
Guess what? "missing over 60% of relevant literature"
        
PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
Large Language Models (LLMs) increasingly serve as research assistants, yet their reliability in scholarly tasks remains under-evaluated. In this work, we introduce PaperAsk, a benchmark that systemat...
arxiv.org
          
        
          
October 29, 2025 at 5:32 AM
        
      
     
        