Lightnews — Scholar-powered news

John F Wu @jwuphysics.bsky.social · 2d

Catch us in Montreal!

JHU Computer Science @jhucompsci.bsky.social · 2d

In “Rank1: Test-Time Compute for Reranking in Information Retrieval,” @orionweller.bsky.social, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, and Benjamin Van Durme introduce the first reranking model trained to take advantage of test-time 🕰️ compute: (2/3)

Rank1: Test-Time Compute for Reranking in Information Retrieval

We introduce Rank1, the first reranking model trained to take advantage of test-time compute. Rank1 demonstrates the applicability within retrieval of using a reasoning language model (i.e. OpenAI's o...

arxiv.org

1

Reposted by John F Wu

Jessy Li @jessyjli.bsky.social · 7d

Here is a genuine one :) CosmicAI’s AstroVisBench, to appear at #NeurIPS

bsky.app/profile/nsfs...

NSF-Simons AI Institute for Cosmic Origins (CosmicAI) @nsfsimonscosmicai.bsky.social · 13d

Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy!

A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.

1 2

John F Wu @jwuphysics.bsky.social · 6d

Okay I guess I should be more fair. This isn't the worst offender, but I'm still not a fan: it misses loads of relevant citations, doesn't release the benchmark, its example questions are meh (see MIRI question in Fig 1), and multiple choice is known to be bad (see e.g. arxiv.org/abs/2507.02856)

Answer Matching Outperforms Multiple Choice for Language Model Evaluation

Multiple choice benchmarks have long been the workhorse of language model evaluation because grading multiple choice is objective and easy to automate. However, we show multiple choice questions from ...

arxiv.org

1 1

John F Wu @jwuphysics.bsky.social · 7d

I've never shuddered so hard at reading AI slop.

Please make it stop

arxiv.org/abs/2510.00063

AstroMMBench: A Benchmark for Evaluating Multimodal Large Language Models Capabilities in Astronomy

Astronomical image interpretation presents a significant challenge for applying multimodal large language models (MLLMs) to specialized scientific tasks. Existing benchmarks focus on general multimoda...

arxiv.org

2 7

John F Wu @jwuphysics.bsky.social · 8d

Fantastic work on the size–mass relation for low-mass galaxies, led by Yasmeen (@yasmeenasali.bsky.social)!

arxiv.org/abs/2509.25335

🔭🌌🧪

A sneak preview of Figure 6 in the paper, which shows size (r-band radius) vs stellar mass for SAGA satellites, SAGA background galaxies, and SDSS isolated galaxies. They all obey the same trends but have small offsets, which appear unlikely to be driven by SFR but *do* seem to be driven by environment!

3 11

John F Wu @jwuphysics.bsky.social · 8d

Any mutuals going to be in Montréal next week? Give me a shout if so!

I'll be attending COLM and visiting UdeM, Ciela, and Mila, and presenting on topics ranging from ML applications in galaxy evolution to interpretable AI for scientific discovery.

1

John F Wu @jwuphysics.bsky.social · 9d

An Arp-like messy merger at z=0.03.

www.legacysurvey.org/viewer?ra=17...

A merger galaxy system that looks Arp-like.

https://www.legacysurvey.org/viewer?ra=171.8432&dec=-5.4983&layer=ls-dr9&zoom=14

1 9

John F Wu @jwuphysics.bsky.social · 11d

Anyone else going to COLM? Give me a shout!

Also, check out our poster on evaluating LLMs for astronomy research. This work came out of our 2024 JSALT research and was jointly led by undergrads Alina Hyk and Kiera McCormick!

Screenshot of our abstract from the COLM schedule page, printed below

Thursday, October 9th

Title: From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

11:00 AM – 1:00 PM
710

Authors: Alina Hyk, Kiera McCormick, Mian Zhong, Ioana Ciucă, Sanjib Sharma, John F Wu, J. E. G. Peek, Kartheik G. Iyer, Ziang Xiao, Anjalie Field

Abstract
There is growing interest in leveraging LLMs to aid in astronomy and other scientific research, but benchmarks for LLM evaluation in general have not kept pace with the increasingly diverse ways that real people evaluate and use these models. In this study, we seek to improve evaluation procedures by building an understanding of how users evaluate LLMs. We focus on a particular use case: an LLM-powered retrieval-augmented generation bot for engaging with astronomical literature, which we deployed via Slack. Our inductive coding of 368 queries to the bot over four weeks and our follow-up interviews with 11 astronomers reveal how humans evaluated this system, including the types of questions asked and the criteria for judging responses. We synthesize our findings into concrete recommendations for building better benchmarks, which we then employ in constructing a sample benchmark for evaluating LLMs for astronomy. Overall, our work offers ways to improve LLM evaluation and ultimately usability, particularly for use in scientific research.

2

John F Wu @jwuphysics.bsky.social · 14d

I assume the submission is 9 pp and then the camera ready is 10 pp. Strange that they wrote "submission version" every time...

1

John F Wu @jwuphysics.bsky.social · 14d

Thanks Baltimore DOT for penalizing 40+ mph speeders more harshly, but by that point shouldn't you be revoking their driving license?

Baltimore's new tiered speeding fine structure: $40 (<15 mph over), $70 (16-19 mph), $120 (20-29 mph), $230 (30-39 mph), and $425 (40+ mph)

2 8

Reposted by John F Wu

✨Grace Hebron✨ @heywritergrace.bsky.social · 15d

*Girl who's excited for @baltimorebeat.bsky.social's FIRST food issue, coming tomorrow* 🍔🌭🌮🍕

8 18

John F Wu @jwuphysics.bsky.social · 16d

They really don't pay you guys enough to be subjected to the disappointment that is eating at Chipotle in Baltimore

Also 27 points for the one in Mt Vernon?! They haven't once fulfilled my order correctly or had all menu items in stock.

1

John F Wu @jwuphysics.bsky.social · 16d

Tbf Lamar usually doesn't throw it away, and he somehow makes magic out of it. Not this time...

John F Wu @jwuphysics.bsky.social · 16d

Didn't seem like the O-line had any idea who their blocking assignments were.

1

John F Wu @jwuphysics.bsky.social · 17d

Oops I mean 2022

1

John F Wu @jwuphysics.bsky.social · 18d

While democracy dies in darkness, let me just say that one of my most prized possessions is the re-launch print of the @baltimorebeat.bsky.social from 2020

1 5

John F Wu @jwuphysics.bsky.social · 19d

Two things about this paper.

1. This is legitimately useful information
2. The supplementary material shows the experimental setup... and they perform all experiments in a Bialetti Moka pot box, because of course they did

Supplementary Fig 1 of the paper, showing the schematic of the heating equipment (left) and image acquisition setup (right)

Supplementary Fig 1 of the paper, showing the actual heating equipment (left) and image acquisition setup (right). The right side shows a camera mounted over an illuminated box. The box is the outside packaging of the popular Bialetti moka pot.

3

John F Wu @jwuphysics.bsky.social · 19d

Delighted to see that this year's Ig Nobel Physics Prize is about the phase behavior of Cacio e pepe sauce.

Paper: pubs.aip.org/aip/pof/arti...

Pop Sci article: phys.org/news/2025-09...

2025 Ig Physics Nobel Prize goes to perfect pasta sauce

The Ig Nobel Prize honors research that first makes people laugh, then makes them think. Its 35th award ceremony possibly also makes people hungry: ISTA physicist Fabrizio Olmeda and colleagues resear...

phys.org

2 8 23

Reposted by John F Wu

Eugene Katrukha 🇺🇦 @ekatrukha.bsky.social · 20d

A cell finding its way through the matrix, imaged with @joycemeiri.bsky.social on LLS.

1 12 33

John F Wu @jwuphysics.bsky.social · 21d

That's how I learned it!

1

John F Wu @jwuphysics.bsky.social · 22d

TIL that @colmweb.org is pronounced like "Collum"!

1 3

Reposted by John F Wu

Melina Thévenot🏳️‍⚧️ @melina-iras07572.bsky.social · 26d

The jellyfish #galaxy MACSJ0451-JFG1 in a galaxy cluster with #JWST NIRCam. 🔭

The galaxy is experiencing ram-pressure stripping. It moves through the intracluster medium and is stripped of gas, leaving tails that form stars.

My image processing from today: commons.wikimedia.org/wiki/File:MA...

A galaxy with long filament structures trailing behind it, like the tentacles of a jellyfish. The shape of the galaxy is also slightly warped.

3 39 150

John F Wu @jwuphysics.bsky.social · 26d

Thank you! 💙

1

John F Wu @jwuphysics.bsky.social · 26d

And I didn't even have to pay a billionaire! Wow!

Verified Accounts @verifiedusers.bsky.social · 26d

🛡️ @jwuphysics.bsky.social has been verified by @bsky.team. Track verified accounts and trusted verifiers at bverified.vercel.app!

1 3

John F Wu @jwuphysics.bsky.social · 27d

Impressive sleuthing!

Careful observations 🤝 careful statistical modeling

Matthew Kenworthy @mattkenworthy.bsky.social · 27d

Some sad #exoplanets news in a paper led by me: “YSES 2b is a background star”. A distant M dwarf star some 2 kiloparsecs behind the star just so happens to have a non-zero proper motion in EXACTLY the wrong direction: this required multiple GRAVITY observations to solve… #astrosci #astrodon

A coiled spiral representing proper motion plus parallax shows a fit to several astrometric points, showing that the object next to the star YSES 2 is probably a very distant M dwarf star far away in our Galaxy. Drat, damn and blast!

1 2