Shang Qu
@lindsayttsq.bsky.social
AI4Biomed & LLMs @ Tsinghua University
Check out the details!
📒Preprint: arxiv.org/pdf/2501.18362
🗃️Data files will be released shortly at: github.com/TsinghuaC3I/...
February 4, 2025 at 1:33 PM
We also find that reasoning process errors and, in MM, perceptual errors account for a large share of model errors. Error cases offer further insight into the challenges models still face in clinical reasoning:
February 4, 2025 at 1:33 PM
💡Clinical reasoning enables evaluating model reasoning beyond math & code. We annotate each MedXpertQA question as Reasoning or Understanding based on the reasoning complexity it requires.
Comparing 3 inference-time scaled models against their backbones, we find distinct improvements on the Reasoning subset (see the sketch after this post):
February 4, 2025 at 1:32 PM
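For anyone reproducing this kind of comparison, here is a minimal Python sketch of per-subset accuracy deltas. It is not the authors' released evaluation code, and the record fields ("subset", "answer", "model_answer") are assumed placeholders rather than the actual data schema:

```python
# Minimal sketch (not the released evaluation code): compare accuracy on the
# Reasoning vs. Understanding subsets for a scaled model and its backbone.
# Field names ("subset", "answer", "model_answer") are assumptions.
from collections import defaultdict

def subset_accuracy(records):
    """records: iterable of dicts with 'subset', 'answer', 'model_answer'."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["subset"]] += 1
        correct[r["subset"]] += int(r["model_answer"] == r["answer"])
    return {s: correct[s] / total[s] for s in total}

# Toy usage: report the per-subset accuracy delta of the scaled model.
backbone = [{"subset": "Reasoning", "answer": "B", "model_answer": "C"},
            {"subset": "Understanding", "answer": "A", "model_answer": "A"}]
scaled   = [{"subset": "Reasoning", "answer": "B", "model_answer": "B"},
            {"subset": "Understanding", "answer": "A", "model_answer": "A"}]
acc_b, acc_s = subset_accuracy(backbone), subset_accuracy(scaled)
for s in acc_b:
    print(s, f"Δ = {acc_s[s] - acc_b[s]:+.2f}")
```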
Benchmark construction process: 38k original ➡️ 4k+ final questions
- Filtering for difficulty and diversity using responses from humans + 8 AI experts (sketched below)
- Question rewriting & option set expansion to lower data leakage risk
- Human expert proofreading & error correction
February 4, 2025 at 1:31 PM
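A hedged sketch of the difficulty-filtering idea, assuming the simple rule "keep a question only if few of the expert models answer it correctly"; the exact criterion, data layout, and threshold here are illustrative, not the paper's:

```python
# Illustrative difficulty filter: keep questions that at most `max_correct_frac`
# of a panel of models (the thread mentions 8 "AI experts") answer correctly.
# Data layout and threshold are assumptions, not the paper's exact rule.
def filter_hard_questions(questions, model_answers, max_correct_frac=0.25):
    """
    questions: list of dicts with 'id' and 'answer' (gold option label).
    model_answers: dict mapping model name -> {question_id: predicted label}.
    """
    kept = []
    for q in questions:
        n_correct = sum(preds.get(q["id"]) == q["answer"]
                        for preds in model_answers.values())
        if n_correct / len(model_answers) <= max_correct_frac:
            kept.append(q)
    return kept

# Toy usage: only q2 (solved by 0 of 2 models) survives the filter.
qs = [{"id": "q1", "answer": "A"}, {"id": "q2", "answer": "D"}]
preds = {"model_x": {"q1": "A", "q2": "B"}, "model_y": {"q1": "A", "q2": "C"}}
print([q["id"] for q in filter_hard_questions(qs, preds)])  # ['q2']
```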
We improve clinical relevance through:
⭐️Medical specialty coverage: MedXpertQA includes questions from 20+ exams at the medical licensing level or above
⭐️Realistic context: MM is the first multimodal medical benchmark to introduce rich clinical information with diverse image types
February 4, 2025 at 1:31 PM
Compared with rapidly saturating benchmarks like MedQA, we raise the bar with harder questions and a sharper focus on medical reasoning.
Full results evaluating 17 LLMs, LMMs, and inference-time scaled models:
February 4, 2025 at 1:30 PM
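A rough sketch of how one might score a model on multiple-choice items like these, by extracting the first standalone option letter from the model's free-text response. The response format, regex, and A–J label range are assumptions pending the data release:

```python
# Illustrative multiple-choice scoring: pull the first standalone option letter
# out of a model's response and compare it with the gold label. The A-J range
# and answer format are assumptions, not the released evaluation protocol.
import re

def extract_choice(response: str) -> str | None:
    m = re.search(r"\b([A-J])\b", response)
    return m.group(1) if m else None

def score(items):
    """items: iterable of dicts with 'answer' (gold letter) and 'response' (model text)."""
    hits = sum(extract_choice(it["response"]) == it["answer"] for it in items)
    return hits / len(items)

# Toy usage: one correct, one incorrect answer -> accuracy 0.5.
print(score([{"answer": "C", "response": "The best next step is C."},
             {"answer": "A", "response": "Answer: B"}]))
```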