Lightnews — Scholar-powered news

Reposted by Charles Foster

METR

@metr.org

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.

The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

July 10, 2025 at 7:47 PM

Charles Foster

@cfoster.bsky.social

Update for those who’ve left the other app:

I’m now on the policy team at Model Evaluation and Threat Research (METR). Excited to be “doing AI policy” full-time.

March 7, 2025 at 3:19 AM

Charles Foster

@cfoster.bsky.social

Why aren’t our AI evaluations better? AFAICT a key reason is that the incentives around them are kinda bad.

In a new post, I explain how the standardized testing industry works and write about lessons it may have for the AI evals ecosystem.

open.substack.com/pub/contextw...

February 26, 2025 at 4:35 AM

Charles Foster

@cfoster.bsky.social

When we optimize automation, we sometimes optimize *hard*. Like this automated loom working away at an inhuman 1200 RPM. Wild. youtu.be/WweMNDqDYhc?...

TOYOTA AIR JET LOOMS JAT 810 JA4S-190 CM RUNNING AT 1200 RPM

YouTube video by TEMAC INDIA

youtu.be

January 11, 2025 at 2:10 AM

Charles Foster

@cfoster.bsky.social

Is there a website/database out there that tracks what major AI company executives say about the future of AI?

December 22, 2024 at 5:24 AM

Charles Foster

@cfoster.bsky.social

Transformers and other parallel sequence models like Mamba are in TC⁰. That implies they can't internally map (state₁, action₁ ... actionₙ) → stateₙ₊₁

But they can map (state₁, action₁, state₂, action₂ ... stateₙ, actionₙ) → stateₙ₊₁

Just reformulate the task!
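A minimal sketch of the two task formats, for illustration only. It assumes a toy environment in which states are orderings of five items and each action is a permutation applied to the current ordering; composing such permutations is the standard example of state tracking believed to lie outside TC⁰ (assuming TC⁰ ≠ NC¹). The function names here are hypothetical and not taken from the post.

```python
# Toy environment (an assumption for illustration): a state is an ordering of
# five items, and an action is a permutation applied to that ordering.

def step(state, action):
    """Apply one permutation action to a state and return the next state."""
    return tuple(state[i] for i in action)

def rollout(state, actions):
    """Return every state visited, starting from the initial state."""
    states = [state]
    for action in actions:
        states.append(step(states[-1], action))
    return states

def actions_only_example(state, actions):
    """(state_1, action_1 ... action_n) -> state_{n+1}.
    The model would have to compose all n actions internally."""
    final_state = rollout(state, actions)[-1]
    return [state, *actions], final_state

def interleaved_example(state, actions):
    """(state_1, action_1, state_2, action_2 ... state_n, action_n) -> state_{n+1}.
    Every intermediate state is in the input, so the target depends only on
    the last (state, action) pair."""
    states = rollout(state, actions)
    inputs = []
    for s, a in zip(states, actions):  # zip stops before the final state
        inputs.extend([s, a])
    return inputs, states[-1]

start = (0, 1, 2, 3, 4)
swaps = [(1, 0, 2, 3, 4), (0, 2, 1, 3, 4)]
hard_inputs, hard_target = actions_only_example(start, swaps)
easy_inputs, easy_target = interleaved_example(start, swaps)
```

Both formats share the same target; in the interleaved format it can be recovered with a single call to `step` on the last two input elements, which is the sense in which the task has been reformulated.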

December 18, 2024 at 6:59 AM

Charles Foster

@cfoster.bsky.social

Atticus Geiger gave a take on when sparse autoencoders (SAEs) are/aren’t what you should use. I basically agree with his recommendations. youtube.com/clip/UgkxKWI...

December 10, 2024 at 10:28 PM

Charles Foster

@cfoster.bsky.social

These days, flow-based models are typically defined via (neural) differential equations, requiring numerical integration or simulation-free alternatives during training. This paper revisits autoregressive flows, using Transformer layers to define the sequence of flow transformations directly.

Tanishq Mathew Abraham @iscienceluvr.bsky.social · Dec 10

Normalizing Flows are Capable Generative Models

Apple introduces TarFlow, a new Transformer-based variant of Masked Autoregressive Flows.

SOTA on likelihood estimation for images, quality and diversity comparable to diffusion models.

arxiv.org/abs/2412.06329

Normalizing Flows are Capable Generative Models

Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relati...

arxiv.org

December 10, 2024 at 5:08 PM

Charles Foster

@cfoster.bsky.social

Re: instruction-tuning and RLHF as “lobotomy”

I’m interested in experiments that look into how much finetuning can “roll back” a post-trained model to its base model perplexity on the original distribution.

Has anyone seen an experiment like this run?

December 4, 2024 at 5:44 AM

Charles Foster

@cfoster.bsky.social

I’ve been wondering when it would make sense for “AI agent” services to offer money-back guarantees. Wrote a short post about this on a flight.

open.substack.com/pub/contextw...

“Provider pays” for failed automation services

If your AI works as well as you claim, why not make that a promise?

open.substack.com

December 1, 2024 at 11:26 PM

Charles Foster

@cfoster.bsky.social

Neat thing about real-money prediction markets is that you can get paid for doing this.

xkcd comic 386, with back and forth that goes:

“Are you going to bed?”
“I can’t. This is important.”
“What?”
“Someone is WRONG on the internet.”

https://xkcd.com/386/

November 30, 2024 at 4:40 PM

Charles Foster

@cfoster.bsky.social

A bit of clever mechanism design: prediction markets + randomized auditing.

If you have 100 verifiable claims you want information on but can only afford to check 10, fund markets on each. Later, use a randomized ordering of them to check the first 10. Resolve those to yes/no, refund the rest.
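The settlement step can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not a description of any real market platform: `settle_markets`, `check_claim`, and the "YES"/"NO"/"REFUND" labels are hypothetical names, and the trading itself is not modeled.

```python
import random

def settle_markets(claims, audit_budget, check_claim, seed=None):
    """Randomly order the claims, audit the first `audit_budget` of them,
    and mark every other market for refund.

    `check_claim` stands in for the expensive verification step and should
    return True or False for a given claim.
    """
    rng = random.Random(seed)
    order = list(claims)
    rng.shuffle(order)                    # randomized ordering, fixed only after trading
    audited = set(order[:audit_budget])   # the first N in that ordering get checked

    results = {}
    for claim in claims:
        if claim in audited:
            results[claim] = "YES" if check_claim(claim) else "NO"
        else:
            results[claim] = "REFUND"     # unaudited markets return traders' money
    return results

# Example with 100 placeholder claims and a budget of 10 checks.
claims = [f"claim-{i}" for i in range(100)]
results = settle_markets(
    claims,
    audit_budget=10,
    check_claim=lambda c: int(c.split("-")[1]) % 3 == 0,  # placeholder verifier
    seed=0,
)
```

Because traders cannot know in advance which markets will be audited, each market has the same chance of resolving, so there is still a reason to price every claim carefully even though only 10 are ever checked.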

November 28, 2024 at 12:54 AM

Charles Foster

@cfoster.bsky.social

Still gathering my thoughts on @TheCurveConf, but for now, a short reflection on why I like “the curve” as a way of thinking about the future of AI. (1/6)

November 27, 2024 at 7:01 AM

Charles Foster

@cfoster.bsky.social

RT-ed and endorsed

Nathan @handle.invalid · Nov 26

13 thoughts on "The Curve" conference:

1/ The event felt different than other conferences I've been to. Rather than the meeting of a tribe or a cluster of tribes around a shared idea, it was two conflicting tribes.

I would like to see more events like this.

November 26, 2024 at 5:19 AM

Charles Foster

@cfoster.bsky.social

CLAIM: In areas where we can’t measure what (we claim) we want & where we won’t change our minds about that, we’ll struggle to make AI systems that give us better—rather than merely cheaper, faster, more consistent—outputs. But I think that’ll really pressure us to revise our wants.

November 25, 2024 at 8:10 PM

Charles Foster

@cfoster.bsky.social

Timothy B. Lee here gives a good short list of what human attributes might still have value (at least temporarily) in a hypothetical world where AI systems are capable of acting as “remote worker substitutes”.
open.substack.com/pub/understa...

Seven big advantages human workers have over AI

Geoffrey Hinton says "there's nothing special about people." He's wrong.

open.substack.com

November 21, 2024 at 9:58 PM

Charles Foster

@cfoster.bsky.social

For those of us that rely on earned income, a key concern about the future is “Will automation soon put me out of work?” But at the moment, we can’t do much about it.

Would you pay 1% of your earnings per year to protect a year’s worth of future earnings if most jobs are suddenly automated away?

November 21, 2024 at 6:57 AM

Charles Foster

@cfoster.bsky.social

Whenever faced with a hard problem, some AI folks say “I know, I’ll use reinforcement learning.”
Now, they have two hard problems.

November 20, 2024 at 5:35 PM

Charles Foster

@cfoster.bsky.social

Using Bluesky to reboot your character arc for the LLMs

November 17, 2024 at 9:11 PM

Reposted by Charles Foster

Grace

@gracekind.net

A human bioactuator inspects the entire factory and turns a single screw, fixing the problem. The next day, he bills the company $10,000.

“$10,000? My AI could’ve figured out the problem in an instant!”

The bioactuator replied: “It’s $1 for knowing which screw to turn, and $9,999 for turning it”

November 15, 2024 at 1:30 AM

Charles Foster

@cfoster.bsky.social

“I had worked hard for nearly two years, for the sole purpose of infusing life into an inanimate body. For this I had deprived myself of rest and health […] but now that I had finished, the beauty of the dream vanished, and breathless horror and disgust filled my heart.”
- M. Shelley in Frankenstein

Professor Geoffrey Hinton, photo from a NYT article. Link: https://www.nytimes.com/2023/05/01/technology/ai-google-chatbot-engineer-quits-hinton.html

November 13, 2024 at 10:27 PM

Charles Foster

@cfoster.bsky.social

Let us not mistake how we want the world to be for how it is.

November 12, 2024 at 11:04 PM

Reposted by Charles Foster

Ted Underwood

@tedunderwood.com

A letter about the critical role of open-source models in ensuring that AI doesn’t produce an unsafe concentration of power. Led by Mozilla and signed by leaders in academia, MistralAI, EleutherAI, &c.

Joint Statement on AI Safety and Openness

We are at a critical juncture in AI governance. To mitigate current and future harms from AI systems, we need to embrace openness, transparency, and broad access.

open.mozilla.org

November 1, 2023 at 10:18 AM

Charles Foster

@cfoster.bsky.social

We'll soon forget the butterflies we first felt when software talked.

July 4, 2023 at 12:12 AM