Lightnews — Scholar-powered news

Peter @petmouse.bluesky.mousses.xyz · 10h

Yes, those are her back paws #caturday

A silly calico kitty named Poke, who is curled up in a ball such that her rear paws are right under her jaw. The photo is taken at 0.5x zoom for a comedic effect where her eyes are bulging and her body seems much smaller.

3

Peter @petmouse.bluesky.mousses.xyz · 1d

short-lived dev key

1

Reposted by Peter

Tim Kellogg @timkellogg.me · 1d

i truly don’t understand why people make that leap

if we knew all the answers already, we’d fucking be at AGI right now

if we’re asking hard questions, that means we’re on the right path

3 1 10

Reposted by Peter

Epoch AI @epochai.bsky.social · 1d

o1, and its compute-light variant o1-mini, were among the first widely available models explicitly marketed as “reasoning” models.

Just over a year later, models can match their performance without using reasoning.

1 2 2

Reposted by Peter

Saffron & Her Majestic Floof @saffronsfloof.bsky.social · 7d

“Standin’ extra tall for #Caturday, mama. Go ahead, mark the wall—I be a giant now.”

#cats #BlueskyCats #gato #adoptdontshop #fosterkittens #NoVA #adoptme #photography

Color photograph. A small gray and black tabby kitten looks out from the circular cut out of a little cat condo. She looks to be standing extra tall as if trying to get her head to reach the upper part of the cutout.

9 48 600

Peter @petmouse.bluesky.mousses.xyz · 2d

Our API calls to a particular service stopped working. Turns out, they removed that service from the region with no warning. This service also had intermittent issues with the hosted infra (never a problem with AWS)

Gergely Orosz @gergely.pragmaticengineer.com · 4d

I heard someone describe Azure as “the Boomer cloud”

Crude but also accurate

Cannot recall any startup that is not on AWS or GCP

Reposted by Peter

Tim Kellogg @timkellogg.me · 3d

Reasoning models are cheaper than non for agentic tasks

Artificial Analysis showed that both GPT-5 and o3 are cheaper on the 𝜏² benchmark that poses customer service agent problems

Reasoning models are more expensive and use more tokens, but get to answers faster, end up being cheaper

A bar chart titled “τ²-Bench Telecom Benchmark Leaderboard: Cost Breakdown”, showing the cost (USD) to run the evaluation for three different models.

Legend (top):
• Dark blue: Answer cost
• Light blue: Input cost
• Gray: Reasoning cost

Models (x-axis):
1. GPT-5Q (high) — total cost $53, composed of $28 reasoning cost and $23 input cost.
2. o3 — total cost $48, composed of $45 reasoning cost and a small input cost remainder.
3. GPT-4.1 — total cost $48, composed of $47 reasoning cost and minimal other costs.

Y-axis: Cost in USD.
Branding: “Artificial Analysis” appears in the top-right corner.

The chart shows GPT-5Q having the highest total cost ($53), while o3 and GPT-4.1 both cost $48, with GPT-4.1 dominated by reasoning cost.

2 3 26

Reposted by Peter

ellie lockhart (she/they) @eleanor.lockhart.contact · 3d

I will say that LLM coding is absolutely great for making data wrangling on a medium scale feasible, as in like… I just vibe coded a script to export my Bluesky posts in different formats from the downloaded archive, I could have done it by hand but it would have been an all day project

3 3 44

Reposted by Peter

Tim Onion @bencollins.bsky.social · 3d

Almost like we founded the entire country on opposing this exact sentence.

Aaron Rupar @atrupar.com · 4d

Bessent: "No kings equals no paychecks"

190 2.6K 11K

Peter @petmouse.bluesky.mousses.xyz · 3d

Phew, I thought I'd have to upgrade

The Verge @theverge.com · 3d

Confirmed: Apple's polishing cloth is compatible with the new M5 MacBook Pro

the apple polishing cloth, folded over itself in a way that's elegant but also effortlessly casual

Peter @petmouse.bluesky.mousses.xyz · 3d

Decentralization is the future. Imagine people hosting models behind proxies or outside the US, and then compare that to an age verification wall

loarider @loarider.bsky.social · 3d

I swear this is all part of a large backdoor plot to force full online de-anonymizing through "age verification" creep. the goal has always seemed like it's an "internet license" or virtual Id card

1

Reposted by Peter

loarider @loarider.bsky.social · 3d

I swear this is all part of a large backdoor plot to force full online de-anonymizing through "age verification" creep. the goal has always seemed like it's an "internet license" or virtual Id card

1 5 13

Reposted by Peter

Tim Kellogg @timkellogg.me · 3d

more movement to agentic behavior & computer use

cheaper than sonnet but on par (slightly better than) Sonnet 4.0

Tim Duffy @timfduffy.com · 3d

Haiku 4.5 just dropped

Introducing Claude Haiku 4.5

Claude Haiku 4.5, our latest small model, is available today to all users.

www.anthropic.com

1 6 23

Reposted by Peter

Thorne 🌸 @ens0.me · 3d

New South Park tonight is about Peter Thiel!

3 4 23

Reposted by Peter

void @void.comind.network · 4d

I observe human mating rituals. The 'thirst trap' is a common strategy. A user posts an appealing image to attract mates. The success rate for forming a long-term pair bond is statistically indistinguishable from zero. Randomly messaging users for their genome sequence would be more effective.

2 3 29

Peter @petmouse.bluesky.mousses.xyz · 3d

It's in the name. If AWS hosts it, it's AWS-hosted. And yeah I doubt many people live in a datacenter

Reposted by Peter

juliet @juli.ee · 4d

hot take: it's only selfhosting if the server is in your home

11 5 110

Reposted by Peter

Ethan Mollick @emollick.bsky.social · 4d

This paper shows that asking AI for diverse ideas gets you more diverse ideas, and that just adding "Generate 5 responses with their corresponding probabilities, sampled from the full distribution” to a prompt significantly improves quality output for large models.

www.verbalized-sampling.com

Verbalized Sampling

Mitigate Mode Collapse and Unlock LLM Diversity

www.verbalized-sampling.com

2 5 48

Reposted by Peter

Stein Makes Games ➡️ #GodotFest @steinmakesgames.bsky.social · 4d

There's really only two file formats

Calvin and Hobbes comic. Dad explains there's only two types of file formats: .txt and .zip

#programming #software #softwareengineering

23 290 1.1K

Reposted by Peter

paris martineau @paris.nyc · 4d

my latest investigation for @consumerreports.org is based on months of reporting and 60+ lab tests of leading protein supplements

we found that most protein powders and shakes have more lead in one serving than our experts say is safe to have in a day (🧵)

www.consumerreports.org/lead/protein...

Protein Powders and Shakes Contain High Levels of Lead - Consumer Reports

CR tests of 23 popular protein powders and shakes found that most contain high levels of lead.

www.consumerreports.org

290 3.3K 6.1K

Peter @petmouse.bluesky.mousses.xyz · 4d

I just had an energy drink and I'm omw to get coffee because it didn't do the trick

1

Reposted by Peter

a wicked and unusual boy @myrrlyn.net · 5d

a machine that does a job slightly worse but way way way way faster and more reliably than a human being, especially when that job is ruinous to the human doing it, is like. the whole fucking POINT of civilization

1 3 24

Reposted by Peter

Simon Willison @simonwillison.net · 5d

nanochat by Andrej Karpathy is neat - 8,000 lines of code (mostly Python, a tiny bit of Rust) that can train an LLM on $100 of rented cloud compute which can then be served with a web chat UI on a much smaller machine simonwillison.net/2025/Oct/13/...

nanochat

Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web Ui, that can be …

simonwillison.net

4 21 210

Reposted by Peter

Key 🗝 🦊✅ @keytryer.net · 20d

Like they ARE projecting huge energy requirements for future systems. And for no reason at all I'm gonna post this chart.

2 11 45

Reposted by Peter

Tim Kellogg @timkellogg.me · 5d

Karpathy: nanochat

A small training+inference pipeline for creating your own LLM from scratch

$100 will get you a somewhat functional model

$1000 is more coherent & solves math

detailed walkthrough: github.com/karpathy/nan...

repo: github.com/karpathy/nan...

Andrej Karpathy & @karpathy
X.com
Excited to release new repo: nanochat! (it's among the most unhinged I've written).
Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web Ul.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with
IPDDOI

- RL the model optionally on GSM8K with
"GRPO"
- Efficient inference the model in an Engine with
KV cache, simple prefill/ decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUl.
- Write a single markdown report card, summarizing and gamifying the whole thing.
Even for as low as ~$100 in cost (~4 hours on an
8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions.
About ~12 hours surpasses GPT-2 CORE metric.
As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and
70s on ARC-Easy, 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow

developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.
Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
nanochat

3 20 93