Alison Hoens
@physioktbroker.bsky.social
580 followers · 350 following · 280 posts
Clinical Professor, Knowledge Broker, Physical Therapist, Knowledge mobilization specialist
physioktbroker.bsky.social
Think beyond ChatGPT: “Claude Opus 4 showed superior accuracy, reliability, and usefulness, with significantly fewer hallucinations. Readability scores were similar across models” 🧪 link.springer.com/article/10.1...
Referential hallucination and clinical reliability in large language models: a comparative analysis using regenerative medicine guidelines for chronic pain - Rheumatology International
This study compared language models’ responses to open-ended questions on regenerative therapy guidelines for chronic pain, assessing their accuracy, reliability, usefulness, readability, semantic similarity, and hallucination rates. This cross-sectional study used 16 open-ended questions based on the American Society of Pain and Neuroscience’s regenerative therapy guidelines for chronic pain. Questions were answered by ChatGPT-4o, Gemini 2.5 Flash, and Claude Opus 4. Responses were rated on a 7-point Likert scale for usability and reliability, and on a 5-point scale for accuracy. Hallucination rates, readability (FKRE, FKGL), and semantic similarity (USE, ROUGE-L) were also assessed. Statistical comparisons were made, with significance set at p < 0.05.

Claude Opus 4 showed the highest reliability (5.19 ± 1.11), usefulness (5.06 ± 1.0), and clinical accuracy (4.06 ± 0.68), outperforming ChatGPT-4o (4.13 ± 0.96; 3.94 ± 0.85; 3.38 ± 0.72) and Gemini 2.5 (4.19 ± 0.98; 4.06 ± 0.93; 3.38 ± 0.62). Claude had the lowest reference hallucinations (RHS 4.44 ± 3.18) vs. ChatGPT-4o (8.38 ± 1.86) and Gemini 2.5 (8.75 ± 1.73). In semantic similarity, Claude (0.68 ± 0.08) and Gemini (0.65 ± 0.07) surpassed ChatGPT-4o (0.60 ± 0.09). Gemini led in ROUGE-L F1 (0.12 ± 0.03) vs. Claude (0.10 ± 0.02) and ChatGPT-4o (0.07 ± 0.03). Readability was similar, though Gemini had a higher FKGL (11.3 ± 1.06) than Claude (10.3 ± 2.09).

Claude Opus 4 showed superior accuracy, reliability, and usefulness, with significantly fewer hallucinations. Readability scores were similar across models. Further research is recommended.
link.springer.com
physioktbroker.bsky.social
Yep, the photo that is linked is definitely not a shoulder 😆
Reposted by Alison Hoens
kellyhereid.bsky.social
Curious about which fields poast the most?

Medicine / Social science / Environmental science / Biochemistry

Note that these aren't corrected for field size, so love the strong showing from earth science too ⚒️🧪
Figure 4. Distribution of Bluesky posts referencing scholarly articles across OpenAlex domains and fields from January 2023 to July 2025. Led by medicine, social science, environmental science, biochemistry
Reposted by Alison Hoens
conradhackett.bsky.social
NEW STUDY:
Types of science 🧪 articles shared on Bluesky:
Health 29%
Social 25%
Physical 23%
Life 23%
arxiv.org/pdf/2507.18840
Worth reading; lots of other insights!

Lately, I've been sharing our new article on global religious decline www.nature.com/articles/s41...
Bar chart showing the distribution of Bluesky posts referencing scholarly articles across different domains from January 2023 to July 2025.
Reposted by Alison Hoens
whysharksmatter.bsky.social
Hey everybody! @drjuliawester.bsky.social and I have a new paper!

We surveyed over 800 scientists, science communicators, and science educators who use social media.

Conclusion: Scientists no longer find Twitter useful or pleasant, and many have switched to Bluesky! 🧪🌎🦑

doi.org/10.1093/icb/...
Scientists no Longer Find Twitter Professionally Useful, and have Switched to Bluesky
Synopsis. Social media has become widely used by the scientific community for a variety of professional uses, including networking and public outreach. For
doi.org
physioktbroker.bsky.social
Even modest daily step counts are associated with health benefits.

7000 steps per day = sizeable risk reductions across most outcomes, compared with 2000 steps per day.

Risk generally continued to decrease beyond 7000 steps per day, but it plateaued for some outcomes.🧪

www.thelancet.com/journals/lan...
Daily steps and health outcomes in adults: a systematic review and dose-response meta-analysis
Although 10 000 steps per day can still be a viable target for those who are more active, 7000 steps per day is associated with clinically meaningful improvements in health outcomes and might be a mor...
www.thelancet.com
Reposted by Alison Hoens
cenmag.bsky.social
The large language model–based chatbot ChatGPT fails to highlight the validity concerns with scientific papers that have been retracted or have been the subject of other editorial notices, according to a new study. cen.acs.org/policy/publi... #chemsky 🧪
ChatGPT tends to ignore retractions on scientific papers
Study finds the chatbot doesn’t acknowledge concerns with problematic studies
cen.acs.org
Reposted by Alison Hoens
jperkel.bsky.social
Many have sworn off spreadsheets; others swear by them. Some swear profusely when they’re forced to use them. New in @nature.com: six questions to make the most of your spreadsheets in science. 🧪 Feat. @yabellini.bsky.social @cghlewis.bsky.social & more #datasci www.nature.com/articles/d41...
Six questions to ask before jumping into a spreadsheet
Spreadsheet software can be frustrating, but adopting some helpful habits can improve its effectiveness.
www.nature.com