Avik Dey
@avikdey.bsky.social
Mostly Data, ML, OSS & Society • Stop chasing Approximately Generated Illusions; focus on Specialized Small LMs • To understand it well enough, learn to explain it simply • Shadow self of https://linkedin.com/in/avik-dey, have a beard now
Pinned
Alignment isn’t the only thing LLMs are faking. Reasoning is another one they’re good at faking. Reading a paper on LLM performance on doctors’ reasoning tasks. Just started reading, but it’s either going to be:
1. Memorization or
2. Priming or
3. Confirmation prompting

www.anthropic.com/research/ali...
Alignment faking in large language models
A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models
www.anthropic.com
Proxying the Apple byte - are we?

Amateur move, guys.
November 26, 2025 at 1:37 AM
Having faced this exact same issue repeatedly since 2023, I would have laughed at this - if we didn’t have 1% of GDP invested in this caricature of an “AI”.

www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 9:40 PM
Ilya appears to be progressively approaching the right conclusion. I remain confident that in time he will consolidate his insights from the first 5 minutes and recognize that complex explanations are unnecessary when simpler ones suffice.

(screenshots not chronological)

www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 8:17 PM
Good to see research on what the math always said - low-to-average performers, that’s your LLM “employee”:

> This supports our assertion that the ceiling on LLM creativity (0.25) corresponds to the boundary between little-c and Pro-c human creative performance (Figure 6).

www.academia.edu/144621465/_T...
November 25, 2025 at 5:19 PM
Any PhD who endorses that an LLM constitutes “PhD level” intelligence is at minimum engaging in a questionable use of their academic authority. These endorsements function less as rigorous assessments and more as a signal that the symbolism conferred by their credential is - available for rent.
Deeply absurd. This Google PDF published on a blog (arXiv, not peer reviewed) claims an LLM is “PhD level”, but in most cases the MAJORITY of reference URLs were invalid or inaccessible.

A PhD sitting down and just fabricating >50% of sources = career ending

arxiv.org/abs/2511.11597
November 24, 2025 at 9:39 PM
They were convinced “AI” would rewrite it all in a week and ship by the end of that month; the ‘year or two’ estimate was just sandbagging so they could pose as 100x devs.
In the early days of DOGE I spoke to developers Musk had parachuted into the IRS and the FAA, each telling me they expected to rewrite the core software of both agencies within a year or two.

It would be amusing to speak to them again.
DOGE is no more, and in its wake, only chaos
November 24, 2025 at 5:09 AM
“warm-up”: Under the guidance of an expert human, the model was finally able to get the answer right when nudged towards it.

Not the model, not the prompt - still the human.

With the amount of shilling these guys do, no wonder they can’t get anything serious built.

cdn.openai.com/pdf/4a25f921...
November 23, 2025 at 5:33 PM
Think they might have answered their own question … ?

bsky.app/profile/slas...
November 22, 2025 at 4:04 AM
The problem with most financial analysis of Nvidia’s quarterly performance is that these folks don’t seem to understand data center hardware lead times and revenue recognition cycles.
November 20, 2025 at 6:36 AM
Great article with learned insights - the best kind.

Unfortunately, this is a societal failure. Tech didn’t invent loneliness; it offered a new way to cope with it - in an empathetic echo chamber.

We are failing the kids. Others too, but mostly it’s the kids that I worry about.
I agree that emotional addiction to chatbots is the number one risk of AI today. Here is a gift link to an important OpEd in the NYTimes:
www.nytimes.com/2025/11/17/o...
Opinion | The Sad and Dangerous Reality Behind ‘Her’
www.nytimes.com
November 20, 2025 at 6:10 AM
You watch a video of a professor from a random internet post and are filled with regret because you didn’t have the opportunity to learn from him in person:

en.wikipedia.org/wiki/Ramamur...
19. Quantum Mechanics I: The key experiments and wave-particle duality
YouTube video by YaleCourses
youtu.be
November 19, 2025 at 6:16 AM
Smaller bag, same toss.
Nvidia and Microsoft will invest up to $15 billion in OpenAI competitor Anthropic. Anthropic, in turn, said it would buy $30 billion of compute capacity from Microsoft Azure and use advanced AI chips supplied by Nvidia.
Nvidia, Microsoft Pour $15 Billion Into Anthropic for New AI Alliance
Anthropic also commits to purchase $30 billion from Microsoft’s cloud computing business Azure.
on.wsj.com
November 18, 2025 at 7:40 PM
For ancillary text-based foo foo services, or core financial services? I have a hard time believing that their engineers, a few of whom I know, would sign off on this integration - but leadership prevailed?
November 18, 2025 at 7:38 PM
Don’t worry about it this quarter - they have enough to prop it up.

But next quarter you should be terrified.
November 18, 2025 at 7:22 PM
If these Gemini 3 Pro benchmarks are accurate, it’s time for OpenAI to sell to Microsoft. Microsoft won’t want their management team or their prolifically tweeting engineers, but I am sure most of the engineers would thrive if led by seasoned engineering management.

storage.googleapis.com/deepmind-med...
November 18, 2025 at 4:51 PM
I too would like my taxpayer-backed trillion-dollar fantasy fund. Why should Sama have all the fun?
Anthropic CEO Dario Amodei thinks AI could help find cures for most cancers, prevent Alzheimer’s, and even double the human lifespan. cbsn.ws/4oRZ8Nm
November 18, 2025 at 6:50 AM
Perfect prediction, even if I do say so myself!

Actually, the realization dawned on them a few weeks back, but these things take a little while to surface externally.

Image of tweet from bird site because I won’t link to it.
November 16, 2025 at 1:45 AM
From the bird site, the acceleration continues:
November 16, 2025 at 1:30 AM
Thoughts:
- The report is based on Claude’s logs, with no visibility into human actions outside of Claude
- The claim that “80–90% of tactical work” was done by Claude, with humans merely in a strategic role, aligns curiously well with their marketing message rather than any verified capability
assets.anthropic.com
November 14, 2025 at 5:00 PM
In software engineering, lines of code edited and weekly merge counts are misleading proxies for productivity. A significant number of exogenous variables impact those metrics - team dynamics, code maturity, product maturity, and seasonal business cycles, to name only a few.
Some pretty eye-opening data on the effect of AI coding.

When Cursor added agentic coding in 2024, adopters produced 39% more code merges, with no sign of a decrease in quality (revert rates were the same, bugs dropped) and no sign that the scope of the work shrank. papers.ssrn.com/sol3/papers....
November 13, 2025 at 4:18 PM
In this age of AI, don’t be a follower. Be the leader who hires engineers who build the future - because AI ain’t building jackshit for you.
November 8, 2025 at 8:36 PM
We are entering the golden age of AI “world models” where every AI hype will be proudly accompanied by their grand unified theory of everything, rigorously engineered to collapse at the first gentle poke of reality.
Browsing the arxiv paper - the architecture seems to rely heavily on the structured world model. Any additional write-up on how the world model was generated and is globally maintained?
November 8, 2025 at 4:48 PM
Classifier ≠ Human Judge

> We assess how effectively large language models generate social media replies that remain indistinguishable from human-authored content when evaluated by automated classifiers. We employ a BERT-based binary classification model to distinguish between the two text types.
LLMs are now widely used in social science as stand-ins for humans—assuming they can produce realistic, human-like text

But... can they? We don’t actually know.

In our new study, we develop a Computational Turing Test.

And our findings are striking:
LLMs may be far less human-like than we think.🧵
Computational Turing Test Reveals Systematic Differences Between Human and AI Language
Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior, based on the assumption that they can generate realistic, human-like text. Yet this assumption rem...
arxiv.org
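For anyone curious what a “BERT-based binary classification model” looks like mechanically, here’s a minimal sketch in the Hugging Face transformers style - mine, not the paper’s code; the example replies and label convention are assumptions, and you’d need to fine-tune on labeled human/LLM pairs before the probabilities mean anything:

```python
# Minimal sketch (not the paper's code) of a BERT-based binary classifier
# for distinguishing human-authored from LLM-generated replies.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed: 0 = human, 1 = LLM-generated
)
model.eval()

# Hypothetical example replies; real training data would be labeled pairs.
replies = [
    "lol yeah the refs blew that call, they've been terrible all season",
    "As an avid follower of this sport, I completely agree with your insightful point.",
]
inputs = tokenizer(replies, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # shape: (batch, 2)
probs = logits.softmax(dim=-1)[:, 1]     # per-reply P(LLM-generated)

# Note: with an untrained classification head these probabilities are
# near-random; fine-tuning on labeled data is what makes the classifier work.
for reply, p in zip(replies, probs):
    print(f"{p:.2f}  {reply[:50]}")
```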
November 8, 2025 at 5:33 AM
AI isn’t going to wound the web’s ad model - fatally or otherwise. AI companies are going to be the ones serving those ads.

I would be shocked if OpenAI isn’t already indexing the web even as I type this.
November 7, 2025 at 1:14 AM
Demo coming soon …

bsky.app/profile/avik...
November 5, 2025 at 4:06 PM