Ai2
@ai2.bsky.social
Breakthrough AI to solve the world's biggest problems.

› Join us: http://allenai.org/careers
› Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm
Pinned
Knowing which questions to ask is often the hardest part of science. Today we're releasing AutoDiscovery in AstaLabs, an AI system that starts with your data and generates its own hypotheses. 🧪
Try AutoDiscovery in AstaLabs today → buff.ly/UaBQLur

We're giving early users 1,000 free Hypothesis Credits to get started.

📚 Learn more in our blog: buff.ly/fHLhXPs
AstaLabs AutoDiscovery
asta-autodiscovery.allen.ai
February 12, 2026 at 4:06 PM
Everything AutoDiscovery generates is transparent & reproducible—every hypothesis, statistical analysis, and line of Python code is there for you to inspect.

Science has always had more data than time to explore it. AutoDiscovery helps surface the questions hiding in yours: buff.ly/yGxdpMT
Using Asta AutoDiscovery: AI-powered autonomous scientific discovery
AutoDiscovery is an AI-powered tool that explores structured datasets autonomously—generating hypotheses, designing and running statistical experiments, and surfacing findings that researchers might…
youtu.be
February 12, 2026 at 4:06 PM
📄 In social science, AutoDiscovery helped economist Sanchaita Hazra find that doctoral-level authors made more edits to AI-generated abstracts, suggesting expertise drives critical engagement with AI.

Her independently verified results were published in a peer-reviewed paper: buff.ly/BJ5Zran
Accepted with Minor Revisions: Value of AI-Assisted Scientific Writing
Large Language Models have seen expanding application across domains, yet their effectiveness as assistive tools for scientific writing - an endeavor requiring precision, multimodal synthesis, and…
arxiv.org
February 12, 2026 at 4:06 PM
🌊 In marine ecology, Fabio Favoretto at Scripps used AutoDiscovery to explore 20+ years of rocky reef data from the Gulf of California, surfacing cross-trophic productivity relationships that would have taken extensive manual iterations to find.

Read the report: buff.ly/mhgIKD0
allenai.org
February 12, 2026 at 4:06 PM
👩‍🔬 In oncology, Dr. Kelly Paulson at the Swedish Cancer Institute used AutoDiscovery to explore breast cancer & melanoma datasets, surfacing new hypotheses about immune responses + lymph node spread that weren't part of her team's initial questions.

Read more: buff.ly/9uJFRNL
allenai.org
February 12, 2026 at 4:06 PM
Researchers across ecology, health, & social science are already using AutoDiscovery to surface findings hiding in their data—from cancer mutation patterns to trophic relationships in marine ecosystems.

Read their stories: buff.ly/xNbB93d
allenai.org
February 12, 2026 at 4:06 PM
How does it decide what to pursue? Bayesian surprise—a measure of how much the system's beliefs change after seeing evidence. By chasing surprise, AutoDiscovery gravitates toward the unexpected, prioritizing results most likely to represent genuine discoveries. 🔬
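For intuition, here is a minimal sketch of Bayesian surprise as the KL divergence from prior to posterior belief. The Beta/binomial model, names, and numbers below are illustrative assumptions, not AutoDiscovery's actual code:

# Illustrative sketch: Bayesian surprise as KL(posterior || prior).
# A hypothesis about an effect is modeled as a Beta belief over a rate;
# the more an experiment's outcome shifts that belief, the larger the surprise.
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a0, b0):
    # KL divergence between Beta(a1, b1) (posterior) and Beta(a0, b0) (prior), in nats
    return (betaln(a0, b0) - betaln(a1, b1)
            + (a1 - a0) * digamma(a1)
            + (b1 - b0) * digamma(b1)
            + (a0 - a1 + b0 - b1) * digamma(a1 + b1))

a0, b0 = 1.0, 1.0             # uniform prior belief
successes, failures = 40, 10  # hypothetical experimental result
surprise = kl_beta(a0 + successes, b0 + failures, a0, b0)
print(f"Bayesian surprise: {surprise:.2f} nats")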
February 12, 2026 at 4:06 PM
Most AI tools for science wait for a research question, then help answer it. AutoDiscovery starts with your data. It generates hypotheses, runs experiments, interprets results, & uses what it learns to keep exploring.
February 12, 2026 at 4:06 PM
Knowing which questions to ask is often the hardest part of science. Today we're releasing AutoDiscovery in AstaLabs, an AI system that starts with your data and generates its own hypotheses. 🧪
February 12, 2026 at 4:06 PM
Explore MolmoSpaces & start building:
📝 Blog: buff.ly/nJpBn7E
💻 Demo: buff.ly/s3x8xJo
⬇️ Code: buff.ly/WrpkCxg
📊 Data: buff.ly/FMNHJAI
✍️ Paper: buff.ly/FpYgLDV
February 11, 2026 at 7:48 PM
All MolmoSpaces assets, scenes, & tools are open + modular, provided in MJCF with USD conversion for cross-simulator portability. Plug in new embodiments, regenerate grasps, & run across MuJoCo, ManiSkill, & NVIDIA Isaac Lab/Sim.
February 11, 2026 at 7:48 PM
MolmoSpaces supports teleoperation via mobile platforms like Teledex—collect demonstrations right from your phone. It's compatible with all our embodiment setups, including DROID and CAP, with no extra configuration needed.
February 11, 2026 at 7:48 PM
📐 MolmoSpaces-Bench is our new benchmark for evaluating generalist policies under systematic, controlled variation. Researchers can vary one factor at a time – from object properties to layouts, task complexity, lighting, dynamics, and instruction phrasing – across thousands of realistic scenes.
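As a rough illustration of one-factor-at-a-time evaluation (generic Python with assumed names and values, not the MolmoSpaces-Bench API):

# Hold a baseline configuration fixed and vary exactly one factor per run,
# so any change in success rate is attributable to that factor.
baseline = {"layout": "kitchen_01", "lighting": "daytime",
            "object_mass_kg": 1.0, "instruction": "put the mug in the sink"}

variations = {
    "lighting": ["daytime", "dim", "night"],
    "object_mass_kg": [0.5, 1.0, 2.0],
    "instruction": ["put the mug in the sink", "place the cup into the basin"],
}

def evaluate(policy, config):
    # Placeholder: roll out the policy in a scene built from `config`
    # and return its success rate.
    raise NotImplementedError

def sweep(policy):
    results = {}
    for factor, values in variations.items():
        for value in values:
            config = {**baseline, factor: value}  # change one factor only
            results[(factor, value)] = evaluate(policy, config)
    return results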
February 11, 2026 at 7:47 PM
MolmoSpaces ships with a massive bank of validated grasps for rigid and articulated objects, loadable directly into environments. An accompanying trajectory-generation pipeline supports reproducible demonstrations and imitation learning at scale.
February 11, 2026 at 7:47 PM
MolmoSpaces builds on two foundations: Objaverse, one of the largest open collections of 3D objects, and our THOR family of interactive simulation environments, all unified with physics-grounded simulation + validated physical parameters tuned for realistic manipulation. ⚙️
February 11, 2026 at 7:47 PM
The next wave of AI will act in the physical world, but building robots that generalize across new environments – rather than replaying learned behaviors – requires far more diverse training data than exists today.

That's where MolmoSpaces comes in.
February 11, 2026 at 7:47 PM
Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖

230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.
February 11, 2026 at 7:47 PM
Thanks for the feedback. It's not a perfect tool; hallucinations may occur, particularly given the model's small size.
February 11, 2026 at 1:32 AM
Reposted by Ai2
incredibly fun project led by our intern yapei chang

we mined the web for thousands of real-world “how to do X” step-by-step instructions and turned them into a dataset, synth data training procedure, eval suite, etc.
LLMs often generate step-by-step instructions, from real-world tasks (how do I file taxes?) to plans for AI agents. Improving this is hard: outputs can sound fluent even when the steps don't work, and current datasets cover few domains.

How2Everything evals/trains for this at scale. 🧵
February 10, 2026 at 8:34 PM
We stress-test How2Bench to make sure that model performance isn’t driven by matching task style or by memorizing source web pages.

Read all about it below 👇
📝 Blog: buff.ly/4FUlgD3
📄 Paper: buff.ly/CfrDxiI
💻 Code: buff.ly/vKMAvqc
🤗 HF: buff.ly/jOMqysf
How2Everything: Mining the web to evaluate and improve LLMs on real-world procedures | Ai2
How2Everything is an open framework for evaluating and improving how well LLMs generate step-by-step procedures.
allenai.org
February 10, 2026 at 4:53 PM
Finally, RL using How2Score as a reward yields >10-point gains on Qwen3 4B, Qwen3 8B, and Olmo 3 7B Think with no systematic regressions on 12 standard benchmarks covering knowledge, reasoning, chat, math, & code. We apply a length reward to prevent reward hacking via verbosity.
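Roughly, the reward pattern looks like the sketch below (assumed weights, threshold, and function names; not the paper's exact formulation):

# Combine the procedure-quality score with a length term so the policy
# can't inflate its reward by padding the output with extra steps.
def rl_reward(how2score: float, num_tokens: int,
              target_len: int = 512, length_weight: float = 0.1) -> float:
    # how2score: quality score for the generated procedure, assumed in [0, 1]
    overlength = max(0, num_tokens - target_len) / target_len
    return how2score - length_weight * overlength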
February 10, 2026 at 4:53 PM
3️⃣ We hold out 7K procedures for How2Bench, a benchmark for measuring how base & instruct models fare. It reliably tracks generation correctness across training progress & model size, providing an effective tool for comparing models from 1B pretraining checkpoints to frontier LLMs.
February 10, 2026 at 4:53 PM