Saturnin Pugnet
satpugnet.bsky.social
Founding member @Worldcoin 🗺️ Prev @Caltech/@Imperialcollege - Wonder wildly, fail often, love recklessly, and leave the campfire brighter than you found it 🔥

https://satpugnet.com
Features my fingerprints to confirm that it is not AI.
December 14, 2025 at 1:58 AM
We often talk about the chip war, but I believe the war for the training set is also happening right now, mostly invisible to the average user. These "glitches" are likely just the tip of the iceberg.

I wonder if anyone is actually tracking this shift in token distribution?
December 7, 2025 at 2:05 PM
By ensuring their data comprises a significantly larger percentage of the global training corpus, they can subtly influence how future "neutral" models process information.
December 7, 2025 at 2:05 PM
As LLMs rely heavily on the open web for training, the composition of that data dictates the model's behavior. I believe China (or state-aligned actors) may be strategically flooding the web with Chinese-language data. The goal isn't just content creation; it's statistical saturation.
December 7, 2025 at 2:05 PM
My working theory: We are seeing the early signs of a "Data Offensive."
December 7, 2025 at 2:05 PM
Here is a tour of where my brain went with this: satpugnet.substack.com/p/quiet-echoes
December 2, 2025 at 5:55 PM
I was surprised to learn that it is as high as 38%.
December 23, 2024 at 5:00 PM