Lightnews — Scholar-powered news

Robert Nowak

@rdnowak.bsky.social

770 followers 110 following 41 posts

Director of the Center for the Advancement of Progress

Robert Nowak

@rdnowak.bsky.social

Yes. Just write your thoughts in a rough, unpolished form, say rough paragraphs that contain the terse points you want to make. Then let 'er rip.

October 31, 2025 at 7:21 PM

Robert Nowak

@rdnowak.bsky.social

Section 7 is a wonderful description of the process they went through.

October 25, 2025 at 3:57 PM

Robert Nowak

@rdnowak.bsky.social

Something just isn't fully clicking. If you look at total yards and time of possession, they should have blown them out. Well, it's better anyway to peak later in the season, so let's hope that's what happens (like two seasons ago).

October 13, 2025 at 2:09 AM

Robert Nowak

@rdnowak.bsky.social

Packers get the win, but it wasn't pretty.

October 13, 2025 at 12:45 AM

Robert Nowak

@rdnowak.bsky.social

Google promotes box shirts too

September 5, 2025 at 6:19 PM

Robert Nowak

@rdnowak.bsky.social

Pour into

August 27, 2025 at 2:36 PM

Robert Nowak

@rdnowak.bsky.social

More likely midges. The truest sign of a healthy ecosystem

May 16, 2025 at 10:55 PM

Robert Nowak

@rdnowak.bsky.social

This is a collaboration with Ziyue Luo, @shroffness and @kevinlauka

February 7, 2025 at 2:55 AM

Robert Nowak

@rdnowak.bsky.social

Jifan’s on the industry job market now, and his expertise in efficient training, distillation, and data curation couldn't be more timely. Feel free to reach out to him at [email protected].
📄 Paper: arxiv.org/abs/2410.02755

GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data

Large language models require vast amounts of high-quality training data, but effective filtering of web-scale datasets remains a significant challenge. This paper demonstrates that GPT-4o is remarkab...

arxiv.org

February 7, 2025 at 2:55 AM

Robert Nowak

@rdnowak.bsky.social

SIEVE improves upon existing quality filtering methods in the DataComp-LM challenge, producing better LLM pretraining data that led to improved model performance.
This work is part of Jifan's broader research on efficient ML training, from active learning to label-efficient SFT for LLMs.

February 7, 2025 at 2:55 AM

Robert Nowak

@rdnowak.bsky.social

Why does this matter? High-quality data is the bedrock of LLM training. SIEVE enables filtering trillions of web data for specific domains like medical/legal text with customizable natural language prompts.

February 7, 2025 at 2:55 AM

Robert Nowak

@rdnowak.bsky.social

SIEVE distills GPT-4o's data filtering capabilities into lightweight models at <1% of the cost. Not just minor improvements: we're talking 500x more efficient filtering operations.

February 7, 2025 at 2:55 AM

Robert Nowak

@rdnowak.bsky.social

Maybe Trump should have read my mom's book: "For the first six weeks, the embryo, whether XX or XY, coasts along in sexual ambiguity." p. 25

January 23, 2025 at 12:25 AM

Robert Nowak

@rdnowak.bsky.social

Good luck with that

January 4, 2025 at 1:20 AM

Robert Nowak

@rdnowak.bsky.social

p.s. we don't know for sure if I said this or not

January 4, 2025 at 12:36 AM

Robert Nowak

@rdnowak.bsky.social

Is the solution treating everything electronic as "fake"?
Maybe?

January 4, 2025 at 12:35 AM
