UniverseTBD
universetbd.org
@universetbd.org
We are on a mission to democratise science for everyone. Join our Discord at https://discord.gg/RH2jgT3vtQ, or contact us at [email protected].
📢 New dataset out!

We introduce HypoGen💥, a dataset of ~5.5K structured problem–hypothesis pairs (Bit–Flip–Spark + Chain‑of‑Reasoning) to advance LLM-driven scientific ideation💡.

Fine‑tuned LLaMA 3.1 8B & R1‑distilled models show significant gains. Humans are still the best🥇.
April 18, 2025 at 5:57 PM
Our two-stage fine-tuning process adapts the model for both image captioning and visual question answering in the astronomy domain, making complex astronomical concepts more accessible through natural conversation.
April 16, 2025 at 12:17 PM
We fine-tuned LLaVA on ~30k astronomical images with captions & QA pairs from NASA APOD, ESO, and Hubble archives to create a model that understands astronomical concepts in visual form 👉 hf.co/datasets/UniverseTBD/AstroLLaVA_convos
April 16, 2025 at 12:17 PM
Excited to announce our new paper as part of our 2^2 week: AstroLLaVA, a vision language model for astronomy that enables natural dialogue with astronomical imagery! Shout-out to Sharaf Zaman for leading this work: arxiv.org/abs/2504.08583 🔭☄️
April 16, 2025 at 12:17 PM
Our 2^2 celebration is still in full swing! 🎉
Today we’re launching our latest, must-read survey paper:
“A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models.”

Check it out! arxiv.org/pdf/2504.054...
🔭
April 15, 2025 at 12:48 PM
🎉 HAPPY BIRTHDAY, UniverseTBD! 🚀
As we turn 2, we’re going 2^2.
Launching a new project per day for the next four days.
We hope that you all enjoy these works as much as we have enjoyed working on them. Stay tuned for the big reveals!
April 13, 2025 at 7:18 PM
This work wouldn't have been possible without collaboration between more than two dozen researchers and institutions 👇
December 2, 2024 at 3:47 PM
The MMU compiles cross-matched observations from astronomy's most important telescopes, including:
- over 120M galaxy images
- over 5M stellar and galactic spectra
- light curves for over 3.5M astronomical objects
- Measurements of nearly 220M stars
- Supernova and galaxy classifications
December 2, 2024 at 3:47 PM
Excited to announce the Multimodal Universe (MMU): a huge 100 TB dataset and the largest ML-focused collection of astronomical observations ever assembled, built to accelerate open AI and astronomy research.

github.com/MultimodalUn...

Think ImageNet, but for space 🔭 #astrocode 🧵
December 2, 2024 at 3:47 PM
Very nice work on chemistry and materials science foundation models from Nawaf Alampara and team (including our own @pktrpl.bsky.social 🥳🥳) 🧪🧪

Paper link: arxiv.org/abs/2411.16955
November 28, 2024 at 8:03 AM
Join us on Discord tomorrow (Nov 28th) at 15:00 UTC for our monthly talk 🔭 🧪 ! @micginolfi.bsky.social (U Florence) will be talking about how we can infer many galaxy properties via a single neural net (nice X thread here with more details: x.com/micginolfi/s...).

Link: discord.gg/9PXcTH7cTn?e...
November 27, 2024 at 3:32 PM