Lightnews — Scholar-powered news

gavin leech

@gleech.org

1.1K followers 290 following 270 posts

context maximizer

https://gleech.org/

gavin leech

@gleech.org

www.gleech.org/barriers

Ways we can fail to answer

In what ways can we fail to answer a question? (I mean necessarily fail: actual barriers to knowledge, rather than skill issue hurdles. But of course ...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/god

What god has to do in mathematics

He picks from uncountably many sets simultaneously without an algorithm.

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/nysound

There are roughly three New Yorks. There is, first, the New York of the man or woman who was born here, who takes the city for granted and accepts its size...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/editron

AI editing: a test

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/inference

Abusing "inference"

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/tallinn

That’s why you can never trust a good person, for he will freely do evil - purely for justice’s sake, so that everyone may be the same [miserable].

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/music2024

On 900 albums of 2024

Last year, I tuned the fuck in: I listened to 914 albums released last year. I had a few aims: capturing the zeitgeist; seeing if our usual sense that new st...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/deen

Mica glittered from the white stone.
Town of the pure crystal,
I learnt Latin in your sparkling cage,
I loved your brilliant streets.
Places that have been g...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/glesga

Here’s the bird that never flew;
Here’s the tree that never grew;
Here’s the fish that never swam;
Here’s the bell that never rang.
― superficially apocalyp...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/rats-and-trads

Recently I spent some time at a liberal arts college, doing close reading out loud with a group of the profs. It was lovely; stimulating, collegial, civilise...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/theisms

Thou hast conquered, O pale Galilean; the world has grown grey from thy breath;
We have drunken of things Lethean, and fed on the fullness of death.
Laurel i...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

www.gleech.org/ambitions

When someone accumulates money and wants more, or fame or acclaim and wants more, or climbs the ladder of large organisations, or thinks everyone else is wro...

November 8, 2025 at 11:57 AM

gavin leech

@gleech.org

The METR eval is worth reading throughout - they anticipated most of my objections

metr.github.io/autonomy-eva...

Details about METR’s evaluation of OpenAI GPT-5

Resources for testing dangerous autonomous capabilities in frontier models

August 8, 2025 at 2:06 PM

gavin leech

@gleech.org

Refs:

epoch.ai/frontiermath

If you assume GPT-5 fails all 23 excluded SWE-Bench problems, then Claude 4.0 > GPT-5
x.com/gneubig/stat...

other coding
x.com/eli_lifland/...

aider.chat/docs/leaderboa

FrontierMath is a benchmark of hundreds of unpublished and extremely challenging math problems to help us to understand the limits of artificial intelligence.

August 8, 2025 at 2:06 PM
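
The SWE-Bench point in the refs above is simple arithmetic: count every excluded problem as a failure and recompute the pass rate over the full set. A rough sketch follows. The function name is hypothetical, and the example inputs (a 0.75 reported rate on 477 attempted problems) are illustrative assumptions, not figures from the post. Only the 23 excluded problems come from the post.

```python
def rescored_pass_rate(reported_rate: float, n_attempted: int, n_excluded: int) -> float:
    """Pass rate over the full set, assuming every excluded problem is a failure.

    reported_rate: fraction solved among the problems actually attempted.
    n_attempted:   number of problems the model was run on.
    n_excluded:    number of problems left out of the reported score.
    """
    if not 0.0 <= reported_rate <= 1.0:
        raise ValueError("reported_rate must be in [0, 1]")
    if n_attempted <= 0 or n_excluded < 0:
        raise ValueError("need n_attempted > 0 and n_excluded >= 0")
    n_total = n_attempted + n_excluded
    solved = reported_rate * n_attempted  # excluded problems contribute zero solves
    return solved / n_total


# Illustrative only: 0.75 and 477 are assumed placeholders; 23 is from the post.
adjusted = rescored_pass_rate(0.75, n_attempted=477, n_excluded=23)
```

The comparison in the post then amounts to checking whether the adjusted rate falls below the other model's rate on the full set.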

gavin leech

@gleech.org

The switcher switcharoo means it now makes even less sense to report one number for "GPT-5".

You should do 2:
* Raw power: (no tools, pass@256)
* Maxed out mech suit (128k thinking, all tools, search, agency, subsession where it asks Claude, whatever)

x.com/Sauers_/stat...

Sauers on X: "6.7x difference depending on what you mean by "GPT-5" https://t.co/SyeMhS7N6h" / X

August 8, 2025 at 2:06 PM
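
For the "raw power" number in the post above, pass@k is usually estimated from n sampled attempts per problem, of which c are correct, as 1 - C(n-c, k) / C(n, k). The sketch below uses that standard combinatorial estimator. The function name and the example call are assumptions for illustration, not anything the post specifies.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: total samples drawn for the problem.
    c: how many of those samples were correct.
    k: the budget being scored (e.g. 256 for pass@256); requires k <= n.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset contains a correct one.
        return 1.0
    # Probability that a random k-subset of the n samples is all incorrect.
    all_wrong = comb(n - c, k) / comb(n, k)
    return 1.0 - all_wrong


# Hypothetical example: 512 samples, 3 correct, scored at k = 256.
estimate = pass_at_k(n=512, c=3, k=256)
```

A benchmark-level score would average this over problems. The "mech suit" number would be reported separately, since tools and search change what a single attempt means.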