Mourad Heddaya
@mheddaya.bsky.social
130 followers 150 following 20 posts
nlp phd student at uchicago cs
Reposted by Mourad Heddaya
chenhaotan.bsky.social
It predicts pretty well—not just shifts in the last week, but also:

1. Who’s working an overnight shift (in our data + external validation in MIMIC)

2. Who’s working on a disruptive circadian schedule

3. How many patients the doc has seen *on the current shift*
mheddaya.bsky.social
I'll be presenting this work at 2pm and will be around until Sunday. Please reach out if you're interested in this line of work - would love to connect in person or virtually!

Thank you to my great collaborators @kyle-macmillan.bsky.social, Anup Malani, Hongyuan Mei, and @chenhaotan.bsky.social
mheddaya.bsky.social
CaseSumm is publicly available on HuggingFace! We hope this dataset enables:
- Better evaluation of long-context summarization
- Research on legal language understanding
- Development of more accurate & reliable legal AI tools

Dataset: huggingface.co/datasets/Chi...
ChicagoHAI/CaseSumm · Datasets at Hugging Face
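For anyone who wants to try it, here is a minimal loading sketch using the `datasets` library. The dataset ID comes from the card above; the field names in the comments are assumptions for illustration, so check the dataset card for the actual schema.

```python
# Minimal sketch: load CaseSumm from the Hugging Face Hub.
# Requires `pip install datasets`.
from datasets import load_dataset

ds = load_dataset("ChicagoHAI/CaseSumm", split="train")
print(ds)  # row count and column names

example = ds[0]
# Field names are assumptions; inspect the keys for the real schema
# (e.g., the full opinion text and its official syllabus).
print(list(example.keys()))
```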
mheddaya.bsky.social
Analysis reveals different types of hallucinations:
- Simple factual errors
- Incorrect legal citations
- Misrepresentation of procedural history
- Mischaracterization of Court's reasoning

Fine-tuned smaller models tend to make more egregious errors than GPT-4.
mheddaya.bsky.social
CaseSumm is a useful resource for long-context reasoning and legal research:
- Largest legal case summarization dataset
- 200+ years of Supreme Court cases
- "Ground truth" summaries written by Court attorneys and approved by Justices
- Variation in summary styles and compression rates over time
mheddaya.bsky.social
Key findings:
1. A smaller fine-tuned LLM scores well on metrics but has more factual errors.
2. Experts prefer GPT-4 summaries—even over the “ground-truth” syllabuses.
3. ROUGE and similar metrics poorly reflect human preferences (see the toy sketch below).
4. Even LLM-based evaluations still misalign with human judgment.
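To see why n-gram metrics can diverge from expert preferences, here is a toy illustration (not our evaluation pipeline) using the `rouge-score` package, with made-up summaries: a candidate that copies the reference's wording outscores an equally faithful abstractive one.

```python
# Toy illustration of ROUGE's n-gram-overlap bias; not the paper's
# evaluation pipeline. Requires `pip install rouge-score`.
from rouge_score import rouge_scorer

reference = "The Court held that the statute violates the First Amendment."
extractive = "The Court held that the statute, as applied, violates the First Amendment."
abstractive = "The justices struck down the law on free-speech grounds."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for name, candidate in [("extractive", extractive), ("abstractive", abstractive)]:
    scores = scorer.score(reference, candidate)
    print(name, {k: round(v.fmeasure, 2) for k, v in scores.items()})
# The abstractive summary is just as faithful but scores near zero,
# which is the kind of mismatch that expert judgments expose.
```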
mheddaya.bsky.social
Dataset: huggingface.co/datasets/Chi...
Paper: arxiv.org/abs/2501.00097

When evaluating LLM-generated and human-written summaries, we find interesting discrepancies between automatic metrics, LLM-based evaluation, and human expert judgements.
mheddaya.bsky.social
We develop CaseSumm, a comprehensive dataset of 25K U.S. Supreme Court opinions and their official syllabuses spanning more than 200 years, and use it for a rigorous evaluation of long-document summarization.
mheddaya.bsky.social
🧑‍⚖️How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate?

Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
Reposted by Mourad Heddaya
chenhaotan.bsky.social
Although I cannot make #NAACL2025, @chicagohai.bsky.social will be there. Please say hi!

@chachachen.bsky.social GPT ❌ x-rays (Friday 9-10:30)
@mheddaya.bsky.social CaseSumm and LLM 🧑‍⚖️ (Thursday 2-3:30)
@haokunliu.bsky.social @qiaoyu-rosa.bsky.social hypothesis generation 🔬 (Saturday at 4pm)
Reposted by Mourad Heddaya
divingwithorcas.bsky.social
1/n

You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?
Reposted by Mourad Heddaya
chenhaotan.bsky.social
Spent a great day at Boulder meeting new students and old colleagues. I used to take in this view every day.

Here are the slides for my talk titled "Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI": chenhaot.com/talks/alignm...
mheddaya.bsky.social
Thank you to my excellent collaborators
Qingcheng Zeng, @chenhaotan.bsky.social, @robvoigt.bsky.social, and Alexander Zentefis!
mheddaya.bsky.social
Please reach out if you are interested in this line of work; I'd love to connect in person or virtually!
mheddaya.bsky.social
Our ongoing work aims to discover narratives automatically, investigate their geographic and temporal trends, understand their potential spread, and assess their influence on economic indicators.
mheddaya.bsky.social
We're scaling up! Using our fine-tuned models, we're identifying narratives in millions of news articles. Techniques like Design-Based Supervised Learning ensure validity in downstream analyses.
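For intuition, here is a toy, self-contained sketch of the design-based idea, simplified to estimating narrative prevalence (the full DSL framework handles general downstream regressions): predictions on the whole corpus are debiased using expert labels on a random subsample. All numbers are simulated.

```python
# Toy sketch of a design-based (DSL-style) correction; simplified to a
# prevalence estimate, not the full framework. All data simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated gold labels and noisy model predictions (assumption: the
# classifier flips about 10% of labels).
y_true = rng.binomial(1, 0.25, n)
y_hat = np.where(rng.random(n) < 0.10, 1 - y_true, y_true)

# Expert annotations on a small random subsample.
m = 500
idx = rng.choice(n, size=m, replace=False)

# Debiased estimate: model-based prevalence plus the mean residual
# measured on the gold-labeled sample.
naive = y_hat.mean()
corrected = naive + (y_true[idx] - y_hat[idx]).mean()
print(f"naive={naive:.3f}  corrected={corrected:.3f}  true={y_true.mean():.3f}")
```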
mheddaya.bsky.social
Even human annotators sometimes disagree on narrative presence, but fine-tuned LLMs mirror these natural disagreements more closely than larger models.

Our error analysis shows some mistakes arise from genuine interpretative ambiguity. Check out the last three examples here:
mheddaya.bsky.social
Fine-tuning shines in teaching models to spot narratives, unlike in-context learning. GPT-4o struggles, often misclassifying non-narratives as narratives.
mheddaya.bsky.social
This is a difficult hierarchical classification task, with many, sometimes semantically similar, classes.

We find that smaller fine-tuned LLMs outperform larger models like GPT-4o, while also offering better scalability and cost efficiency. But they also err differently.
mheddaya.bsky.social
We define a causal micro-narrative as a sentence-level explanation of a target subject's cause(s) and/or effect(s).

As an application, we propose an ontology for inflation's causes/effects and create a large-scale dataset classifying sentences from U.S. news articles.
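To make the shape of the task concrete, here is a hypothetical sketch of the sentence-level label structure. The category names are invented for illustration, not our actual ontology.

```python
# Hypothetical sketch of a causal micro-narrative annotation; the
# label names are illustrative, not the paper's ontology.
from dataclasses import dataclass, field

CAUSE_LABELS = ["demand", "supply", "monetary-policy"]   # hypothetical
EFFECT_LABELS = ["cost-of-living", "savings", "wages"]   # hypothetical

@dataclass
class MicroNarrative:
    sentence: str
    causes: list[str] = field(default_factory=list)   # subset of CAUSE_LABELS
    effects: list[str] = field(default_factory=list)  # subset of EFFECT_LABELS

example = MicroNarrative(
    sentence="Prices keep climbing because supply chains are still snarled.",
    causes=["supply"],
)
print(example)
```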
mheddaya.bsky.social
While the importance of narratives has become well recognized, formulating an operational definition remains challenging, particularly one flexible enough to handle informal and ambiguous language.

In our work, we address both the conceptual and technical challenges.
mheddaya.bsky.social
How do everyday narratives reveal hidden cause-and-effect patterns that shape our beliefs and behaviors?

In our paper, we propose Causal Micro-Narratives to uncover narratives from real-world data. As a case study, we characterize the narratives about inflation in news.