Mourad Heddaya
@mheddaya.bsky.social
130 followers 150 following 20 posts
nlp phd student at uchicago cs
Reposted by Mourad Heddaya
chenhaotan.bsky.social
It predicts pretty well—not just shifts in the last week, but also:

1. Who’s working an overnight shift (in our data + external validation in MIMIC)

2. Who’s working on a disruptive circadian schedule

3. How many patients the doc has seen *on the current shift*
mheddaya.bsky.social
I'll be presenting this work at 2pm and will be around until Sunday. Please reach out if you're interested in this line of work - would love to connect in person or virtually!

Thank you to my great collaborators @kyle-macmillan.bsky.social, Anup Malani, Hongyuan Mei, and @chenhaotan.bsky.social
mheddaya.bsky.social
CaseSumm is publicly available on HuggingFace! We hope this dataset enables:
- Better evaluation of long-context summarization
- Research on legal language understanding
- Development of more accurate & reliable legal AI tools

Dataset: huggingface.co/datasets/Chi...
ChicagoHAI/CaseSumm · Datasets at Hugging Face
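For anyone who wants to try it, here is a minimal loading sketch using the `datasets` library. The dataset ID comes from the card above; the field names in the comments are assumptions for illustration, so check the dataset card for the actual schema.

```python
# Minimal sketch: load CaseSumm from the Hugging Face Hub.
# Requires `pip install datasets`.
from datasets import load_dataset

ds = load_dataset("ChicagoHAI/CaseSumm", split="train")
print(ds)  # row count and column names

example = ds[0]
# Field names are assumptions; inspect the keys for the real schema
# (e.g., the full opinion text and its official syllabus).
print(list(example.keys()))
```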
mheddaya.bsky.social
Analysis reveals different types of hallucinations:
- Simple factual errors
- Incorrect legal citations
- Misrepresentation of procedural history
- Mischaracterization of Court's reasoning

Fine-tuned smaller models tend to make more egregious errors than GPT-4.
mheddaya.bsky.social
CaseSumm is a useful resource for long-context reasoning and legal research:
- Largest legal case summarization dataset
- 200+ years of Supreme Court cases
- "Ground truth" summaries written by Court attorneys and approved by Justices
- Variation in summary styles and compression rates over time
mheddaya.bsky.social
Key findings:
1. A smaller fine-tuned LLM scores well on metrics but has more factual errors.
2. Experts prefer GPT-4 summaries—even over the “ground-truth” syllabuses.
3. ROUGE and similar metrics poorly reflect human preferences (see the toy sketch below).
4. Even LLM-based evaluations still misalign with human judgment.
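To see why n-gram metrics can diverge from expert preferences, here is a toy illustration (not our evaluation pipeline) using the `rouge-score` package, with made-up summaries: a candidate that copies the reference's wording outscores an equally faithful abstractive one.

```python
# Toy illustration of ROUGE's n-gram-overlap bias; not the paper's
# evaluation pipeline. Requires `pip install rouge-score`.
from rouge_score import rouge_scorer

reference = "The Court held that the statute violates the First Amendment."
extractive = "The Court held that the statute, as applied, violates the First Amendment."
abstractive = "The justices struck down the law on free-speech grounds."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for name, candidate in [("extractive", extractive), ("abstractive", abstractive)]:
    scores = scorer.score(reference, candidate)
    print(name, {k: round(v.fmeasure, 2) for k, v in scores.items()})
# The abstractive summary is just as faithful but scores near zero,
# which is the kind of mismatch that expert judgments expose.
```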
mheddaya.bsky.social
Dataset: huggingface.co/datasets/Chi...
Paper: arxiv.org/abs/2501.00097

When evaluating LLM-generated and human-written summaries, we find interesting discrepancies between automatic metrics, LLM-based evaluation, and human expert judgements.
mheddaya.bsky.social
We develop CaseSumm, a comprehensive dataset of 25K U.S. Supreme Court opinions and their official syllabuses spanning more than 200 years, and use it for a rigorous evaluation of long-document summarization.
mheddaya.bsky.social
🧑‍⚖️How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate?

Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
Reposted by Mourad Heddaya
chenhaotan.bsky.social
Although I cannot make #NAACL2025, @chicagohai.bsky.social will be there. Please say hi!

@chachachen.bsky.social GPT ❌ x-rays (Friday 9-10:30)
@mheddaya.bsky.social CaseSumm and LLM 🧑‍⚖️ (Thursday 2-3:30)
@haokunliu.bsky.social @qiaoyu-rosa.bsky.social hypothesis generation 🔬 (Saturday at 4pm)
Reposted by Mourad Heddaya
divingwithorcas.bsky.social
1/n

You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
Reposted by Mourad Heddaya
chenhaotan.bsky.social
Spent a great day at Boulder meeting new students and old colleagues. I used to take in this view every day.

Here are the slides for my talk titled "Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI": chenhaot.com/talks/alignm...
mheddaya.bsky.social
Thank you to my excellent collaborators
Qingcheng Zeng, @chenhaotan.bsky.social, @robvoigt.bsky.social, and Alexander Zentefis!
mheddaya.bsky.social
Please reach out if you are interested in this line of work; I'd love to connect in person or virtually!
mheddaya.bsky.social
Our ongoing work aims to discover narratives automatically, investigate their geographic and temporal trends, understand their potential spread, and assess their influence on economic indicators.
mheddaya.bsky.social
We're scaling up! Using our fine-tuned models, we're identifying narratives in millions of news articles. Techniques like Design-Based Supervised Learning ensure validity in downstream analyses.
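For intuition, here is a toy, self-contained sketch of the design-based idea, simplified to estimating narrative prevalence (the full DSL framework handles general downstream regressions): predictions on the whole corpus are debiased using expert labels on a random subsample. All numbers are simulated.

```python
# Toy sketch of a design-based (DSL-style) correction; simplified to a
# prevalence estimate, not the full framework. All data simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated gold labels and noisy model predictions (assumption: the
# classifier flips about 10% of labels).
y_true = rng.binomial(1, 0.25, n)
y_hat = np.where(rng.random(n) < 0.10, 1 - y_true, y_true)

# Expert annotations on a small random subsample.
m = 500
idx = rng.choice(n, size=m, replace=False)

# Debiased estimate: model-based prevalence plus the mean residual
# measured on the gold-labeled sample.
naive = y_hat.mean()
corrected = naive + (y_true[idx] - y_hat[idx]).mean()
print(f"naive={naive:.3f}  corrected={corrected:.3f}  true={y_true.mean():.3f}")
```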
mheddaya.bsky.social
Even human annotators sometimes disagree on narrative presence, but fine-tuned LLMs mirror these natural disagreements more closely than larger models.

Our error analysis shows some mistakes arise from genuine interpretative ambiguity. Check out the last three examples here:
mheddaya.bsky.social
Fine-tuning shines in teaching models to spot narratives, unlike in-context learning. GPT-4o struggles, often misclassifying non-narratives as narratives.
mheddaya.bsky.social
This is a difficult hierarchical classification task, with many, sometimes semantically similar, classes.

We find that smaller fine-tuned LLMs outperform larger models like GPT-4o, while also offering better scalability and cost efficiency. But they also err differently.
mheddaya.bsky.social
We define a causal micro-narrative as a sentence-level explanation of a target subject's cause(s) and/or effect(s).

As an application, we propose an ontology for inflation's causes/effects and create a large-scale dataset classifying sentences from U.S. news articles.
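To make the shape of the task concrete, here is a hypothetical sketch of the sentence-level label structure. The category names are invented for illustration, not our actual ontology.

```python
# Hypothetical sketch of a causal micro-narrative annotation; the
# label names are illustrative, not the paper's ontology.
from dataclasses import dataclass, field

CAUSE_LABELS = ["demand", "supply", "monetary-policy"]   # hypothetical
EFFECT_LABELS = ["cost-of-living", "savings", "wages"]   # hypothetical

@dataclass
class MicroNarrative:
    sentence: str
    causes: list[str] = field(default_factory=list)   # subset of CAUSE_LABELS
    effects: list[str] = field(default_factory=list)  # subset of EFFECT_LABELS

example = MicroNarrative(
    sentence="Prices keep climbing because supply chains are still snarled.",
    causes=["supply"],
)
print(example)
```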
mheddaya.bsky.social
While the importance of narratives has become well recognized, formulating an operational definition remains challenging, particularly one flexible enough to handle informal and ambiguous language.

In our work, we address both the conceptual and technical challenges.
mheddaya.bsky.social
How do everyday narratives reveal hidden cause-and-effect patterns that shape our beliefs and behaviors?

In our paper, we propose Causal Micro-Narratives to uncover narratives from real-world data. As a case study, we characterize the narratives about inflation in news.