Lightnews — Scholar-powered news

Reposted by Anirudh Khatry

Sebastian Joseph @sebajoe.bsky.social · Jun 2

How good are LLMs at 🔭 scientific computing and visualization 🔭?

AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.

SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵

Anirudh Khatry @anirudhkhatry.bsky.social · Jun 2

Love this!

Reposted by Anirudh Khatry

SIGPLAN @sigplan.bsky.social · Jun 2

We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/

Anirudh Khatry @anirudhkhatry.bsky.social · Jun 2

Congratulations Kanishka!

Reposted by Anirudh Khatry

Kanishka Misra 🌊 @kanishka.bsky.social · Jun 2

News🗞️

I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘

Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)

Picture of the UT Tower taken by me on my first day at UT as a postdoc in 2023!

Reposted by Anirudh Khatry

Manya Wadhwa @manyawadhwa.bsky.social · Apr 22

Evaluating language model responses on open-ended tasks is hard! 🤔

We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.

EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇

Anirudh Khatry @anirudhkhatry.bsky.social · Apr 23

📄 Read the full paper:
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
arxiv.org/abs/2504.15254
Dataset: github.com/anirudhkhatr...
w/ @robertzhang.bsky.social, Jia Pan, @zetten.bsky.social, @jqchen.bsky.social, @gregdnlp.bsky.social, @idillig.bsky.social.

🧵[6/6]

C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether ...

Anirudh Khatry @anirudhkhatry.bsky.social · Apr 23

Models often fail to:
1. Respect ownership rules
2. Infer type information
3. Follow idiomatic Rust interfaces
4. Preserve correct lifetimes
In the paper, we provide a taxonomy of common LLM mistakes.
🧵[5/6]

Anirudh Khatry @anirudhkhatry.bsky.social · Apr 23

We evaluate state-of-the-art closed-source LLMs (like o1, Claude-3.7, and Gemini-1.5-Pro), open-source models like QwQ-32B and virtuoso-32B, and the SWE-Agent on CRUST-Bench.
Even the best model—OpenAI's o1—passes only 15/100 tasks in a single-shot setting.
🧵[4/6]

Anirudh Khatry @anirudhkhatry.bsky.social · Apr 23

Our benchmark is the first to provide:
1. Rust tests
2. Rust interfaces, which are necessary for the transpiled code to work with the tests
3. A sizable number of real-scale transpilation problems.
🧵[3/6]

Anirudh Khatry @anirudhkhatry.bsky.social · Apr 23

Transpiling C to Rust helps modernize legacy code with memory safety guarantees. CRUST-Bench evaluates whether transpilation methods yield safe, idiomatic Rust, using handcrafted interfaces and tests to ensure safety and validate correctness.
🧵[2/6]

Anirudh Khatry @anirudhkhatry.bsky.social · Apr 23

🚀Meet CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]

Reposted by Anirudh Khatry

Conference on Language Modeling @colmweb.org · Mar 20

A bit of a mess around the conflict of COLM with the ARR (and to lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:

Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)

Plz RT 🙏

Reposted by Anirudh Khatry

Nathan Lambert @natolambert.bsky.social · Feb 25

Come work with me!
We are looking to bring on more top talent to our language modeling workstream at @ai2.bsky.social building the open ecosystem. We are hiring:
* Research scientists
* Senior research engineers
* Post docs (Young investigators)
* Pre docs

job-boards.greenhouse.io/thealleninst...

Reposted by Anirudh Khatry

Jessy Li @jessyjli.bsky.social · Feb 25

🌟Job ad🌟 We (@gregdnlp.bsky.social, @mattlease.bsky.social and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve myths of the universe, please apply!

NSF-Simons AI Institute for Cosmic Origins (CosmicAI) @nsfsimonscosmicai.bsky.social · Feb 25

Seeking candidates (within three years of the award of their PhD) for a postdoctoral position with the Explorable Universe research group to perform research on developing next-generation generative AI copilots & agents to aid astronomy research. Info here www.cosmicai.org/jobs/postdoc...

Reposted by Anirudh Khatry

Kyle Lo @ COLM 2025 🍁 @kylelo.bsky.social · Feb 25

encourage postdocs to apply 👇

@soldaini.net, myself and others from @ai2.bsky.social have been helping on the project & also learning a ton---continued pretraining, creating domain-specific training data & evals---to build foundation models that scientists can use. promising area for open source LMs!

Jessy Li @jessyjli.bsky.social · Feb 25

🌟Job ad🌟 We (@gregdnlp.bsky.social, @mattlease.bsky.social and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve myths of the universe, please apply!

NSF-Simons AI Institute for Cosmic Origins (CosmicAI) @nsfsimonscosmicai.bsky.social · Feb 25

Seeking candidates (within three years of the award of their PhD) for a postdoctoral position with the Explorable Universe research group to perform research on developing next-generation generative AI copilots & agents to aid astronomy research. Info here www.cosmicai.org/jobs/postdoc...

Reposted by Anirudh Khatry

Luca Soldaini 🎀 @soldaini.net · Feb 10

three things are certain in life: death, taxes, and Claude switching to concise mode during US business hours

Anirudh Khatry @anirudhkhatry.bsky.social · Jan 30

Kudos to Usneek Singh. It was a pleasure to collaborate on this paper with the amazing folks at PROSE!

José Cambronero @josepablocam.bsky.social · Jan 29

Excited to share that our work on validating synthetic data for spreadsheet formula generation from natural language will be presented at NAACL Findings 2025 arxiv.org/abs/2407.10657 congratulations to the lead author, Usneek Singh, who i had the pleasure of working with at MSFT.

An Empirical Study of Validating Synthetic Data for Formula Generation

Large language models (LLMs) can be leveraged to help with writing formulas in spreadsheets, but resources on these formulas are scarce, impacting both the base performance of pre-trained models and l...

Reposted by Anirudh Khatry

Swarat Chaudhuri @swarat.bsky.social · Jan 4

@ayushkhaitan.bluesky.social, Amitayush Thakur, and I are organizing an #AI4Math panel at the Joint Mathematics Meetings this month. Please spread the word among your math friends! We will post a summary of the discussion after the event.

Ayush Khaitan @ayushkhaitan.bsky.social · Jan 3

Looking forward to the #jmm2025 panel on the "Use of AI tools for Mathematics research" that we are co-organizing with @swarat.bsky.social and Amitayush Thakur. The panelists are Alex Kontorovich, Rishi Mehta, Emily Wenger and Kaiyu Yang. See you there!

Reposted by Anirudh Khatry

Greg Durrett @gregdnlp.bsky.social · Jan 3

Huge congrats to @prasannsinghal.bsky.social for being one of the 8 CRA Outstanding Undergraduate Researcher Award winners! It has been an absolute privilege to work with Prasann during his time at UT. (And he's applying for PhD programs this year...hint hint...)

Prasann's work 🧵

Reposted by Anirudh Khatry

Isil Dillig @idillig.bsky.social · Dec 23

@andersmoeller.bsky.social and I are co-chairing OOPSLA'26 and soliciting PC nominations. If you'd like to serve on the OOPSLA PC next year or know anyone (e.g., recent graduate) who you think would do a good job, please nominate them here: forms.gle/NVnzjcmbshoL...

Reposted by Anirudh Khatry

Swarat Chaudhuri @swarat.bsky.social · Dec 8

The legendary Putnam math competition had its 85th edition yesterday. Coincidentally, George Tsoukalas will present our paper on PutnamBench, a next-generation #AI4Math benchmark, at #NeurIPS2024 this week: arxiv.org/abs/2407.11214.
If you work on frontier AI for math/reasoning, talk to George!

Reposted by Anirudh Khatry

Greg Durrett @gregdnlp.bsky.social · Dec 8

I'll be at #NeurIPS2024 w/

- @fcyin.bsky.social's LoFiT: using interp to improve fine-tuning (Weds pm poster & MINT spotlight talk Sun)
- @thomlake.bsky.social's analysis of Overton pluralism (Pluralistic alignment Sat)

Please reach out to me to chat about interp, factuality, reasoning, &c!

Reposted by Anirudh Khatry

Isil Dillig @idillig.bsky.social · Nov 22

Excited to visit Columbia next week!

Reposted by Anirudh Khatry

Atlas Wang @atlaswang.bsky.social · Nov 22

I did a starter pack of ML/AI people at @utaustin.bsky.social. Please distribute and feel free to self nominate!

go.bsky.app/QLQznZg
