Ai2
@ai2.bsky.social
Breakthrough AI to solve the world's biggest problems.

› Join us: http://allenai.org/careers
› Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm
Our goal: systems scientists can trust and build on 🤝. OpenScholar’s code & data are public—and it’s already shaping our next-gen research models.

📄 Nature: buff.ly/hQHM8K9
📝 Blog: buff.ly/Re5wvCA
Synthesizing scientific literature with retrieval-augmented language models - Nature
A specialized, open-source, retrieval-augmented language model is introduced for answering scientific queries and synthesizing literature, the responses of which are shown to be preferred by human…
www.nature.com
February 4, 2026 at 4:21 PM
What started as research into literature-grounded AI now powers real tools. OpenScholar's 45M paper corpus feeds the Semantic Scholar API. ScholarQABench inspired parts of AstaBench. And OpenScholar’s core concepts live on in Asta and DR Tulu.
February 4, 2026 at 4:21 PM
In an expert review, 16 scientists preferred OpenScholar's answers to human-written ones 51% of the time. Combining OpenScholar's citation pipeline with GPT-4o boosted that to 70% (vs. 32% for GPT-4o alone) 📈
February 4, 2026 at 4:21 PM
We also created ScholarQABench, the first large, multi-domain benchmark for scientific search and synthesis 🧪: 3,000 queries + 250 long-form expert answers across CS, physics, biomedicine, & neuroscience.
February 4, 2026 at 4:21 PM
With the University of Washington, we built OpenScholar, a model for scientific synthesis that gives citation-grounded answers, trained on 45M papers.

Because web search alone can be noisy, it uses RAG to search for, incorporate, & cite new sources—even after training 🔎
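For intuition, here's a minimal sketch of that retrieve-then-cite loop, assuming a generic retriever and LM. Neither `search_papers` nor `generate` is OpenScholar's real API; they're placeholders you'd swap for your own components:

```python
# Minimal citation-grounded RAG loop in the spirit of OpenScholar.
# search_papers() and generate() are hypothetical stand-ins, not Ai2's API.

def search_papers(query: str, k: int = 8) -> list[dict]:
    """Return top-k passages from your paper index (Semantic Scholar API, FAISS, ...)."""
    raise NotImplementedError  # plug in your retriever here

def generate(prompt: str) -> str:
    """Call whatever language model you're using."""
    raise NotImplementedError  # plug in your LM here

def answer(query: str) -> str:
    # Retrieve first, then force the model to ground each claim in a
    # numbered source so every citation points at a real retrieved paper.
    passages = search_papers(query)
    sources = "\n".join(f"[{i + 1}] {p['title']}: {p['text']}"
                        for i, p in enumerate(passages))
    return generate(
        "Answer the question using ONLY the numbered sources below, "
        "citing them inline as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )
```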
February 4, 2026 at 4:21 PM
Scientists can't keep up with millions of new papers. General-purpose AI could help, but it still hallucinates—especially citations. In our study, GPT-4o fabricated 78–90% of its research sources.
February 4, 2026 at 4:21 PM
You can drop in SERA-14B or retrain with our refreshed data. We look forward to seeing what you build!

💻 Model & data: buff.ly/K15oZuB
📝 Learn more: buff.ly/eII61ys
Open Coding Agents - a allenai Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
February 3, 2026 at 5:39 PM
We've also revamped the open SERA training data into a general, model-agnostic format that's easier to reuse across different workflows.

What's new:
✅ Verification thresholds per sample
✅ More metadata for filtering & analysis
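For illustration, one sample in a model-agnostic format like this might look as follows. The field names here are our own invented placeholders, not the released schema; check the dataset card for the real format:

```python
# Hypothetical shape of one revamped SERA training sample.
# Field names are illustrative only; see the HF dataset for the actual schema.
sample = {
    "messages": [  # model-agnostic chat transcript, reusable across workflows
        {"role": "user", "content": "Fix the failing test in utils/dates.py"},
        {"role": "assistant", "content": "..."},
    ],
    "verification_threshold": 0.8,  # per-sample soft-verification cutoff
    "metadata": {                   # extra fields for filtering & analysis
        "repo": "example/project",
        "task_type": "bugfix",
    },
}
```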
February 3, 2026 at 5:39 PM
SERA-14B is a smaller, more accessible option built for more setups and easier deployment, while keeping SERA's cheap, customizable approach.
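A quick way to try it, assuming a standard Hugging Face transformers setup. The repo id below is a guess, so check the collection link for the exact name:

```python
# Load and sample from the model with transformers.
# Assumption: "allenai/SERA-14B" may not be the exact repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/SERA-14B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

prompt = "Write a function that parses ISO-8601 timestamps."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```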
February 3, 2026 at 5:39 PM
We built this so anyone can train a custom open coding agent. We’re eager to see what you create. 👨🏻‍💻
⬇️ Models: buff.ly/JE7Znl8
💻 SERA CLI: buff.ly/DB3aqlw | PyPI: buff.ly/BH9s45D
📝 Tech report: buff.ly/haD0Fmd
✏️ Blog: buff.ly/VlMDfqe
January 28, 2026 at 11:54 PM
We're releasing:
✅ A family of strong open coding models
✅ SERA, our training method for building your own agents
✅ Code, recipes, data, + Claude Code integration
Here’s how to get started with SERA via Claude Code: youtube.com/watch?v=LfLIi4ZR_jA
Getting started with SERA in Claude Code
YouTube video by Ai2
youtube.com
January 28, 2026 at 11:54 PM
We’re releasing the Theorizer code and framework, plus a dataset of ~3,000 theories it generated across AI/NLP, built from 13,744 source papers.

✍️ Learn more in our blog: buff.ly/7gddq0F
💻 Code: buff.ly/mRh1QAu
📝 Technical report: buff.ly/0tWBPcb
Theorizer: Turning thousands of papers into scientific laws | Ai2
Theorizer is a system that automatically reads scientific literature and synthesizes structured, testable theories.
allenai.org
January 28, 2026 at 6:37 PM
Benchmarking theory generation is hard, so we evaluate on 5 desiderata: specificity, empirical support, predictive accuracy, novelty, & plausibility.

We find that grounding in papers boosts specificity, empirical support, & plausibility—especially when pushing for novelty.
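As a toy illustration of rubric-style scoring over those five axes (not the paper's actual evaluation protocol), an LLM-judge loop could look like:

```python
# Illustrative LLM-judge rubric over the five desiderata.
# The prompt and 1-5 scale are our assumptions, not the paper's setup.
DESIDERATA = ["specificity", "empirical support", "predictive accuracy",
              "novelty", "plausibility"]

def judge(theory: str, llm) -> dict[str, int]:
    """Score one generated theory on each desideratum with an LLM call."""
    scores = {}
    for d in DESIDERATA:
        reply = llm(f"Rate the following theory for {d} on a 1-5 scale. "
                    f"Reply with a single integer.\n\nTheory: {theory}")
        scores[d] = int(reply.strip())
    return scores
```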
January 28, 2026 at 6:37 PM
Theorizer:
1️⃣ Gathers a corpus, pulling full text when available
2️⃣ Builds a query-specific schema + extracts structured records from each paper
3️⃣ Aggregates evidence into candidate laws, refining for clarity & attribution
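In pseudocode, those three stages chain together roughly like this. Every function is a placeholder around LLM calls, not Theorizer's actual code:

```python
# High-level sketch of the three Theorizer stages described above.
# All functions are hypothetical placeholders, not Ai2's implementation.

def gather_corpus(topic: str) -> list[dict]:
    """Stage 1: find relevant papers, pulling full text when available."""
    ...

def extract_records(papers: list[dict], topic: str) -> list[dict]:
    """Stage 2: build a query-specific schema, then extract one structured
    record per paper against that schema."""
    ...

def aggregate_laws(records: list[dict]) -> list[dict]:
    """Stage 3: aggregate records into candidate laws, refining each for
    clarity and attribution back to the source papers."""
    ...

def theorize(topic: str) -> list[dict]:
    papers = gather_corpus(topic)
    records = extract_records(papers, topic)
    return aggregate_laws(records)
```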
January 28, 2026 at 6:37 PM
Theorizer is a multi-LLM framework. Ask "make me theories about X" and it reads relevant papers + outputs candidate laws, looking for regularities across studies and writing them as ⟨LAW, SCOPE, EVIDENCE⟩ tuples.
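One natural way to type those tuples; the field names mirror the post, and the exact schema lives in the technical report:

```python
# ⟨LAW, SCOPE, EVIDENCE⟩ as a simple record type. The example instance is
# purely illustrative, not a theory from the released dataset.
from dataclasses import dataclass

@dataclass
class Theory:
    law: str             # the regularity, stated as a testable claim
    scope: str           # conditions under which the law should hold
    evidence: list[str]  # citations to the studies supporting it

t = Theory(
    law="Larger instruction-tuning sets improve zero-shot accuracy, with diminishing returns",
    scope="English NLP benchmarks; decoder-only models",
    evidence=["paper_id_1", "paper_id_2"],
)
```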
January 28, 2026 at 6:37 PM
Experiments drive science forward, but progress compounds when findings coalesce into theories that explain & predict. Kepler's laws distilled centuries of observations into a few statements about planetary motion.

We asked: can an AI build theories by reading the literature?
January 28, 2026 at 6:37 PM
We've updated our benchmark tables to reflect feedback from the community! Please check out our blog for the latest results.
January 28, 2026 at 5:46 PM
SERA (Soft-verified Efficient Repository Agents) is our method for training repo-specialized agents quickly & affordably. It generates diverse, realistic training data from any codebase, teaching agents how developers actually work.
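Conceptually, the data-generation step might look like the sketch below. The prompt and helpers are illustrative, not SERA's actual recipe; see the tech report for the real method:

```python
# Rough sketch of repo-grounded task generation in the SERA spirit.
# Assumption: prompts, function names, and the file-sampling strategy are
# invented here for illustration.
import subprocess

def sample_tasks(repo_path: str, llm, n: int = 50) -> list[dict]:
    """Ask an LM to propose realistic developer tasks grounded in real files."""
    files = subprocess.run(["git", "-C", repo_path, "ls-files"],
                           capture_output=True, text=True).stdout.splitlines()
    tasks = []
    for path in files[:n]:
        source = open(f"{repo_path}/{path}", errors="ignore").read()
        task = llm(f"Propose one realistic change a developer might make to "
                   f"this file, as a short instruction.\n\n{source[:4000]}")
        tasks.append({"file": path, "instruction": task})
    return tasks

# Each task is then attempted by an agent and soft-verified (scored rather
# than pass/fail), so imperfect-but-useful trajectories still become data.
```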
January 27, 2026 at 4:13 PM
Coding agents are changing how software gets built, but most remain closed, expensive, & difficult to customize. Adapting an agent to your codebase has been hard because you need agent-ready synthetic training data, ideally without building complex RL infrastructure.
January 27, 2026 at 4:13 PM