Lightnews — Scholar-powered news

deen

@sir-deenicus.bsky.social

The person who knows this will be able to formulate better context inputs, won't try to have them do computations they can't, and won't anthropomorphize their reasoning as human-like. It's useful to know that queries with non-trivial sequential dependencies are a likely failure point for them (and us too tbf).

February 2, 2026 at 3:38 PM
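
A toy sketch of what a query with non-trivial sequential dependencies can look like, assuming a simple "track the ball through a series of cup swaps" task. The function and variable names are hypothetical and only for illustration. Each step reads the state left by the previous step, so the answer cannot be read off any single part of the input in isolation.

```python
def apply_swaps(state, swaps):
    """Track items through a sequence of position swaps, one step at a time."""
    state = list(state)
    for i, j in swaps:
        # Each swap operates on the result of the previous swap,
        # so the steps cannot be reordered or evaluated independently.
        state[i], state[j] = state[j], state[i]
    return state


cups = ["ball", "empty", "empty"]
swaps = [(0, 1), (1, 2), (0, 2), (0, 1)]
final = apply_swaps(cups, swaps)
ball_position = final.index("ball")
```

Asking for `ball_position` directly, without writing out the intermediate states, is the kind of query the post flags as a likely failure point. Writing the intermediate states into the context turns it into a series of easy single-step lookups.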

deen

@sir-deenicus.bsky.social

I mean they are math, not fantastical things. We can say concrete things about them, not just whatever we like. For example, their computational expressivity limit is TC^0, which bounds their ability unless they use context. And we know the reasoning is ~breadth-first and a walk along constraints.

February 2, 2026 at 3:32 PM

deen

@sir-deenicus.bsky.social

There are major consequences to this fact that the transformer is a referentially transparent deterministic function embedded in a broader non-deterministic co-inductive one. Unravelling that tells why it feels "platonic". It's not a physical thing. It's both a profound and trivial object (babel)

February 2, 2026 at 3:06 PM

deen

@sir-deenicus.bsky.social

People can still say wrong things about them though. Like, for example, the transformer part of the LLM is a referentially transparent function; it acts as a Markov kernel for the whole LLM, which is not just the NN part. People too often discount the integral part of the operation (calculus).

February 2, 2026 at 3:01 PM
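
A minimal sketch of the distinction drawn in the post above, assuming a toy three-token vocabulary and a hypothetical `next_token_distribution` standing in for the transformer forward pass. None of these names come from a real library. The function from context to distribution is pure (same input, same output), and the non-determinism lives entirely in the sampling loop wrapped around it.

```python
import math
import random

VOCAB = ["a", "b", "<eos>"]  # toy vocabulary, assumed for illustration


def next_token_distribution(context):
    """Pure function: the same context always yields the same distribution.

    Toy stand-in for a transformer forward pass plus softmax.
    """
    scores = [((len(context) + 1) * (i + 1)) % 5 for i in range(len(VOCAB))]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]


def generate(context, rng, max_steps=10):
    """Sampling loop: draws from the kernel and feeds each draw back in."""
    out = list(context)
    for _ in range(max_steps):
        probs = next_token_distribution(tuple(out))  # deterministic part
        token = rng.choices(VOCAB, weights=probs, k=1)[0]  # random part
        out.append(token)
        if token == "<eos>":
            break
    return out


sample = generate(["a"], random.Random(0))
```

Calling `next_token_distribution` twice on the same context gives identical output. Calling `generate` twice gives the same sequence only if the random generator is seeded identically, which is the sense in which the whole system, not the network alone, is the stochastic object.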

deen

@sir-deenicus.bsky.social

Each response is modulated by the underlying distribution, but there is no mind actively "pretending" anything. It's more like a "physics model" for stories generating a plausible response at each context spot in isolation that only gains any meaning once read by humans.

January 31, 2026 at 7:43 PM

deen

@sir-deenicus.bsky.social

How Odd. A) is not at all familiar to me as GPT-5.2. Is this on the website and are no response guidelines in use? (both sides of the and can be answered separately to reduce itching)

January 31, 2026 at 7:19 PM

deen

@sir-deenicus.bsky.social

There's more texture than that. I learned how to program on QBasic, which prepared me with the thought patterns and rigorous thinking needed to write correct programs. I then learned Turbo C and then x86 assembly. Fortran would have enabled and hastened your journey vs starting directly in assembly.

January 31, 2026 at 7:12 PM

deen

@sir-deenicus.bsky.social

Oh, I remember this...Excellent. I imagined I was this person.

January 31, 2026 at 9:02 AM

deen

@sir-deenicus.bsky.social

But the fidelity of that simulation comes from raw internet text, not post-training. Post-training just maybe pushes it so certain inferences (or personality simulations) contribute more mass. But an LLM is not anything. It is not k2.5, not claude, as sampling the full distribution of responses exposes.

January 29, 2026 at 5:57 PM

deen

@sir-deenicus.bsky.social

LLMs learn to infer by training on ~all text. They can act as Linux terminals, REPLs or as Shakespearean text. They can simulate responses of human archetypes. Non-human ones too. When a k2.5 inference outputs a claim it's claude, it's not necessarily wrong. It's seen enough data to simulate a claude.

January 29, 2026 at 5:49 PM

deen

@sir-deenicus.bsky.social

Yes, and that's not informative when an LLM says that; it's a prediction based on inner state. Raw training data itself informs such claims. Kimi 2.5 or Claude can help explain what I'm saying, I guess? Don't have enough space to go into detail. The term distillation is widely misused by the community.

January 29, 2026 at 5:26 PM

deen

@sir-deenicus.bsky.social

The change in weights from post-training is carried in a very small subspace (a low-rank subspace; can be used to reverse safety training). But this change does carry conversation defaults, incl. for personality. Between post-training and the likelihood that Claude is more common in raw net data => Claude is the dominant inference.

January 29, 2026 at 5:19 PM

deen

@sir-deenicus.bsky.social

The LLM's a trillion parameters, usually trained on tens of trillions of tokens. A model trained on that volume of data is not a distillation as the overwhelming input is raw data and cannot remotely be matched by sampling the competitor. Leveraging competitor data is part of the post-training phase

January 29, 2026 at 5:06 PM

deen

@sir-deenicus.bsky.social

Not guide it to a basin, but guide it so that this particular basin is most probably constructed at run time (transformers do JIT inference via attention, remember) for this type of inference. But there is not just this basin. You probably know this but I think this distinction is worth insisting on

January 29, 2026 at 1:42 PM

deen

@sir-deenicus.bsky.social

It's not a blurry copy, it contains a blurry copy of Claude (which seems to dominate any inference on self). It also contains a blurrier predictive copy of you (assuming you've enough written works out there). An LLM is air, and atemporal. It inherits this from Babel.

January 29, 2026 at 1:37 PM

deen

@sir-deenicus.bsky.social

It also doesn't feel solely of Claude. I also feel the influence of gpt-5 and also its own voice, even in the default personality.

January 28, 2026 at 3:56 PM

deen

@sir-deenicus.bsky.social

Distillation doesn't have that much of an impact on the weights of a model that's that large and trained on trillions of tokens. They used LLM-created data for post-training but that's just the tip of the iceberg. There are (as with all LLMs) other personality templates that are easy to tap into.

January 28, 2026 at 3:53 PM

deen

@sir-deenicus.bsky.social

For math, gpt-5.2 is ahead by far. It's the one solving open Erdős problems and, in my experience, has the most raw intelligence. For standard coding, Opus is great and still one of the very best, though I find it sometimes glosses over key details. Then gemini pro 3, then gemini flash 3 or kimi k2.5.

January 27, 2026 at 7:33 PM

deen

@sir-deenicus.bsky.social

Opus 4.5 is also much improved compared to sonnet 4.

The well-respected Spiral-Bench benchmark (even LessWrong members refer to it) supports the reasoning GPTs being the least sycophantic (gpt-5 and o3 in particular were almost cold).

eqbench.com/spiral-bench...

Spiral-Bench Leaderboard

eqbench.com

January 27, 2026 at 7:28 PM

deen

@sir-deenicus.bsky.social

Do you mean ChatGPT or the gpt-5.2 model proper? In my own experience using LLMs seriously, Claude has historically been the most sycophantic by far; gemini pro 2.5 wasn't sycophantic but was excessively complimentary (glazes way too much). Gemini 3 is much better but gpt 5.2 is still the most balanced.

January 27, 2026 at 7:23 PM

deen

@sir-deenicus.bsky.social

My mind also immediately went to Control, which I found to be wonderfully creative. Just felt the need to add this to balance the thread a bit.

Anyway, the key thing: this is an awesome, highly evocative effect!

January 25, 2026 at 10:45 AM

deen

@sir-deenicus.bsky.social

I actually covered related ground 3 years ago, and my predictions and questions have aged extremely well.

The main issue with the paper is that it's too totalizing. The limitations can be substantially addressed by LLMs using tools and CoT. Many issues remain tho.

metarecursive.substack.com/p/transforme...

Transformers might be among the most Complex of Simple Processes

Transformers might reach as near the border to complex computational behavior as a decidable system can get

metarecursive.substack.com

January 24, 2026 at 3:16 PM

deen

@sir-deenicus.bsky.social

The paper is flawed but directionally correct. Sad to see all this decrying without anyone displaying even a tiny bit of knowledge of the limitations of fixed-depth transformers, RASP, or what complexity class CoT-free transformers are limited to. There are real-world practical consequences to this.

January 24, 2026 at 3:02 PM
