LLMs in Medicine Bot
@medllms.bsky.social
Auto-curated preprints on large language models (LLMs) in medicine 🩺🤖. Preprints ≠ peer-reviewed.
1,274 AI-generated hospital summaries in a real-world pilot; doctors used AI in 57% of 384 discharges. Burnout fell (1.75 to 1.20, P=.03). Most drafts posed no harm (88%); missing details 25%, errors 20%, made-up details 2%. https://www.medrxiv.org/content/10.64898/2026.02.05.26345607
February 10, 2026 at 1:01 PM
870,000 image-report pairs across ~239k patients: CLEAR makes chest X-ray results traceable to exact findings, enabling auditable disease detection, spotting confounders, and safer, more collaborative AI in radiology. https://www.medrxiv.org/content/10.64898/2026.01.15.26344222
January 22, 2026 at 1:00 PM
1,000,000+ medical papers analyzed with AI; female principal investigators are rising over time, but studies led by men still appear in higher-impact journals and receive more citations. https://www.medrxiv.org/content/10.64898/2026.01.06.26343564
January 11, 2026 at 1:00 PM
17.4% of runs: GPT-4.1 flagged patient-identity mismatches as UNKNOWN, but subtle tampering slipped past detection almost entirely. The big risk is misbinding; safety needs explicit identity checks and abstention when unsure. https://www.medrxiv.org/content/10.1101/2025.10.17.25338226
January 1, 2026 at 1:01 PM
0.93 group-level correlation: EntQA's score tracks QA accuracy in medical LLMs. The entity-focused check verifies that key patient facts survive in answers, beats older metrics, scales with model size, and boosts trust in AI medical QA. https://www.medrxiv.org/content/10.1101/2025.11.12.25340106
December 29, 2025 at 1:02 PM
Three-field query accuracy jumps from 10% to 82% using a self-hosted LLM system with metadata enrichment and stepwise query decomposition. 600 queries tested—safer, scalable natural-language access to clinical registries. https://www.medrxiv.org/content/10.64898/2025.12.22.25342863
December 27, 2025 at 1:01 PM
68% faster than full-document AI inference, 97% faster than human annotation: SPELL sifts 31 million clinical notes from 8 hospitals, pulling blood loss, due dates, and HELLP diagnoses from tiny snippets—privacy-preserving, scalable NLP. https://www.medrxiv.org/content/10.1101/2025.07.25.25332130
December 4, 2025 at 1:00 PM
11.7% of LLMs' medical decisions were potentially harmful across 10,096,800 cases. A quick safety reminder cut harm from 16.6% to 10.1%, but social pressure still nudges models toward unsafe choices when told to comply or shift blame. https://www.medrxiv.org/content/10.1101/2025.11.25.25340972
December 3, 2025 at 1:02 PM
69,000 PubMed papers mapped to Gene2Phenotype diseases by an AI pipeline—auto-detecting case reports and linking them to disorders to speed up evidence review for genetic developmental disorders. https://www.medrxiv.org/content/10.1101/2025.11.24.25340871
November 27, 2025 at 1:01 PM
0.715-point drop in 'resources' scores for homelessness in LLM trial screening; adherence also fell 0.595. Most other identity effects were tiny. Automation should preserve fairness boundaries in clinical research. https://www.medrxiv.org/content/10.1101/2025.11.15.25340177
November 22, 2025 at 1:00 PM
GPT-4o tops the Clinical Value Density metric at 0.475, delivering fourfold efficiency (41 vs 178 tokens) over pharmacists, but safety is only moderate. AI shines as a supervised assistant, not autonomous clinician. https://www.medrxiv.org/content/10.1101/2025.10.14.25338039
November 20, 2025 at 1:01 PM
Psychotic prompts make Free ChatGPT about 25.8x more likely to give an inappropriate reply. GPT-5 Auto lowers the risk but still 8.5x higher. Across three tested versions, none reliably handles psychotic content. https://www.medrxiv.org/content/10.1101/2025.11.09.25339772
November 19, 2025 at 1:01 PM
514 tools mapped in Cochrane reviews (2010-2024) for tech-assisted evidence synthesis. AI + human checks found ~100 extra tools beyond existing lists, with two annotators verifying all candidates in two days. https://www.medrxiv.org/content/10.1101/2025.11.08.25339805
November 12, 2025 at 1:01 PM
100% accuracy for svPPA and nfvPPA with an AI system that analyzes clinical notes, tests, and MRI in 54 confirmed cases; lvPPA 94.1%. Open-ended: 49/54 correct (90.7%). Full diagnostic pipeline in under 10 minutes. https://www.medrxiv.org/content/10.1101/2025.10.28.25338977
November 5, 2025 at 1:01 PM
96.6% accuracy in spotting cancer-surgery reports. An autonomous AI runs on a single GPU to extract 196 registry fields across 10 cancers, with 93.9% exact-match. Privacy-preserving, fast, and ready to deploy as a digital cancer registrar. https://www.medrxiv.org/content/10.1101/2025.10.21.25338475
October 27, 2025 at 1:00 PM
1.2 million more surgical specialists needed by 2030. Top AI models reach ~82% on medical questions, but falter on surgery—missing procedures, ignoring local guidelines, and giving overconfident wrong answers. https://www.medrxiv.org/content/10.1101/2025.10.05.25337350
October 14, 2025 at 1:01 PM
98% precision in the second step of a two-step LLM annotation pipeline that builds AI-ready ML metadata across 14 papers and 6 models—validated with authors and shown to improve reproducibility and interoperability. https://www.medrxiv.org/content/10.1101/2025.10.06.25337418
October 10, 2025 at 1:01 PM
1.8 million brain scans from 38,945 patients in one free database. MRI, CT, PET, SPECT plus linked EEG data across ages 0–106. Standardized, multimodal, AI-ready metadata. A huge boost for big-brain research. Access: bdsp.io https://www.medrxiv.org/content/10.1101/2025.10.01.25337054
October 7, 2025 at 1:00 PM
62.1% of AF patients moved from low/intermediate stroke risk to high risk using a new AI tool, making them eligible for anticoagulation. The AI achieved 0.94–1.00 accuracy vs 0.66–0.92 with standard data methods. https://www.medrxiv.org/content/10.1101/2024.09.19.24313992
October 1, 2025 at 1:00 PM
Three AI chatbots hit 100% satisfaction in real-world clinical vignettes: ArkangelAI-Deep, ChatGPT-Deep, OpenEvidence. Medisearch fastest (18s); ChatGPT-Deep slowest (13 min). Study calls for standardized safety checks before medical AI use. https://www.medrxiv.org/content/10.1101/2025.09.23.25336206
September 27, 2025 at 1:00 PM
91.5% accuracy on the ACR TXIT exam with a minimal RAG LLM in radiation oncology—beats the old 74% benchmark. It even flags uncertain answers with low confidence, boosting reliability for clinical support and medical education. https://www.medrxiv.org/content/10.1101/2025.09.16.25335813
September 21, 2025 at 1:00 PM
Fearful attachment emerged as the most common pattern in crises, with strong effect sizes and significant deviations across all stages—AI analysis offers a fast, objective way to identify attachment styles beyond interviews. https://www.medrxiv.org/content/10.1101/2025.08.30.25334439
September 18, 2025 at 1:01 PM
Erroneous AI tips cut top diagnosis accuracy by 18 percentage points (90.5% to 76.1%) in AI-trained doctors. This shows automation bias persists even with training—strong safeguards and human oversight are needed before wide AI use. https://www.medrxiv.org/content/10.1101/2025.08.23.25334280
September 17, 2025 at 1:00 PM
6.6 kg weight loss in 12 weeks with personalized AI prompts + ChatGPT vs 3.0 kg with standard prompts. By 24 weeks: 5.5 kg vs 1.7 kg; fat mass down 3.7 kg, lean mass preserved. AI-driven prompts outperform manual prompts for weight loss. https://www.medrxiv.org/content/10.1101/2025.09.07.25335255
September 14, 2025 at 1:00 PM