Toni Kukurin
@tkukurin.bsky.social
life is short and I am in a hurry.

tkukurin.github.io
not sure I follow this one -- as I see it the implication is you are learning params "beyond" the immediate task of interest. what's a more precise alternative?
November 16, 2025 at 5:08 PM
I don't mind "ICL" as a term, but "in-context adaptation" always seemed apt.

neuro has nice terms "working memory gating", "attentional selection"

but I think the point of keeping "learning" in there is exactly that the _outcome_ is better performance on your task at hand.
November 16, 2025 at 3:35 PM
suttonification may be better characterized as "moving up the abstraction level" rather than "just" removing domain knowledge (i.e. meta-learning). he loves his options framework.
October 19, 2025 at 9:01 PM
agree on the latter point, cf bsky.app/profile/tkuk...

on the former - it's costly if one assumes a grok invocation per page load. but IMO scaling a grok-twitter-recsys is amenable to good system dimensioning (e.g. model cascade + periodic batching; rough sketch below)

from that perspective, doesn't seem preposterous 🤷
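a minimal sketch of the cascade + batching idea; the models, thresholds, and numbers are all made-up stand-ins:

```python
import time

# hypothetical cascade: a cheap ranker runs on every page load, the
# expensive model only re-scores the shortlist in periodic batches.
CHEAP_KEEP = 50        # candidates surviving the cheap stage
BATCH_PERIOD_S = 600   # how often the expensive model re-ranks

def cheap_score(post: str) -> float:
    return float(len(post))       # toy stand-in for a small/fast ranker

def grok_rank(posts: list[str]) -> list[str]:
    return sorted(posts)          # toy stand-in for the expensive model

_cache = {"ranked": [], "ts": 0.0}

def serve_feed(candidates: list[str]) -> list[str]:
    # stage 1: cheap filter, runs per request
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:CHEAP_KEEP]
    # stage 2: expensive re-rank, amortized across requests
    if time.time() - _cache["ts"] > BATCH_PERIOD_S:
        _cache["ranked"] = grok_rank(shortlist)
        _cache["ts"] = time.time()
    return _cache["ranked"] or shortlist
```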
agree re/feeds

what you'd normally do for a closed-platform is (1) scrape posts, (2) write-your-own ranking/clf. algo to get better content (tags are useful but easily co-opted, the platform's recsys is necessarily deficient due to partial obs).

_native_ support for scaling (2) I find exciting
October 18, 2025 at 4:45 PM
how would satya sell enterprise AI without text extraction
October 18, 2025 at 3:35 PM
our entire economy depends on document scanning for training data.
October 18, 2025 at 3:31 PM
iirc @neuroai.bsky.social made a nice remark at some point during the learning salon. paraphrasing from memory: "is the real AGI DeepMind's accumulated bias + methods, or are we _truly_ building a general learner"
October 18, 2025 at 9:34 AM
"long-context" and "prompt eng" are flip-sides of the same underlying concept. _someone_ has to compress the representation, either in language- or high-d-vector space.

IMO the domain model is intelligence as part user (akin to "phds doing lin reg" in finance), part interface (RLHF), part model
September 21, 2025 at 1:18 PM
interesting but I guess not a relevant attack vector (assuming it exclusively influences user-local results).

becomes relevant in some intricate global-recommender-system situation, but google already has years of experience dealing with this.
August 17, 2025 at 7:08 PM
how do you define _frontier work_ here? most work I do, at least, seems a combination of monotonous incantations and real insight. insight usually being ~ "this will affect outcomes I care about N steps down the road". && frequently this is preempted by me just realizing which outcomes I care about :)
July 18, 2025 at 11:50 AM
but also sounds like a cool research project: rev-eng how much google3 code is in gemini pre-training based on the relative quality of sth like gemini vs gpt4 or claude.

(e.g. via diff-in-diffs on coding-task perf, nodejs vs swift)

(I assume gem was predominantly google3-trained based on its coding quirks)
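rough shape of the estimate, with entirely made-up pass rates:

```python
# diff-in-diffs with made-up numbers: if gemini's edge over gpt4 is larger
# on a google3-common stack than on a rare one, that gap is (weak)
# evidence of google3 data in pre-training.
pass_rate = {  # (model, language) -> toy benchmark score
    ("gemini", "nodejs"): 0.71, ("gpt4", "nodejs"): 0.63,
    ("gemini", "swift"): 0.52, ("gpt4", "swift"): 0.55,
}
did = (pass_rate["gemini", "nodejs"] - pass_rate["gpt4", "nodejs"]) \
    - (pass_rate["gemini", "swift"] - pass_rate["gpt4", "swift"])
print(f"diff-in-diffs estimate: {did:+.2f}")  # +0.11 with these numbers
```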
July 13, 2025 at 10:51 AM
novelty = f(audience, goal)?

there's a nice exposition of this topic in @rockt.ai and others' position paper arxiv.org/abs/2406.042...

which, to prove a meta-point, delivers a rather "obvious" message with exposition clarity deemed novel enough for ICML poster acceptance :)
Open-Endedness is Essential for Artificial Superhuman Intelligence
In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended...
arxiv.org
July 12, 2025 at 7:09 PM
"dynamic" as discussed here sounds to me more akin to what people would refer to as "reasoning" fine-tuning nowadays?

RAG implies external memory; a reasoning post-trained model generates more artefacts (tokens/"writes") as a result of computation (which is also where "dynamic" makes a difference)
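caricature of the contrast, with `retrieve` and `llm` as stand-in callables:

```python
# RAG: budget goes to reads from an external store, then one answer.
def rag_answer(query, retrieve, llm):
    context = retrieve(query)                # external memory (reads)
    return llm(f"context: {context}\nq: {query}\na: ")

# reasoning-tuned: budget goes to writes -- tokens generated at
# inference time before committing to an answer (the "dynamic" part).
def reasoning_answer(query, llm, n_steps=8):
    trace = f"q: {query}\n"
    for _ in range(n_steps):
        trace += llm(trace + "next step: ")  # artefacts as computation
    return llm(trace + "final answer: ")
```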
June 10, 2025 at 2:02 PM
not sure about a single unifying tutorial textbook; the closest might be something in the huggingface/anthropic/... blogpost genre. e.g. what comes to mind in the data filtering/collection space:
huggingface.co/spaces/Huggi...
FineWeb: decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW
This application helps create large, high-quality datasets for training large language models by processing and filtering web data. Users can obtain datasets with optimized deduplication and filter...
huggingface.co
June 3, 2025 at 6:04 PM
Humans demonstrating OOD failure, AI demonstrating IID success :)

bsky.app/profile/mela...
Alison Gopnik, telling it like it is, at Johns Hopkins.
May 30, 2025 at 5:06 PM
"its normal and the most important thing you can do is do a lot of work" :)

it would be intriguing to see the progress if you're willing to share at some point!
A message from Ira Glass that every Artist should hear...
YouTube video by Krzysztof A. Janczak
www.youtube.com
May 29, 2025 at 7:53 AM
related and belated followup: opinion/rant on a similar topic

> Every app has a different design [optimized based on your activity ...] each app trying to make you do different things in uniquely annoying ways [...] low-quality clickbait

anyhow we all know how it goes
Ed Zitron's AI opinions are pretty misinformed/dumb but I have fundamental empathy for where he's coming from on the overall issue of computing becoming an abusive carnival of scummy value extraction tactics from the poor and ignorant. This is a good op ed.
www.wheresyoured.at/never-forgiv...
Never Forgive Them
In the last year, I’ve spent about 200,000 words on a kind of personal journey where I’ve tried again and again to work out why everything digital feels so broken, and why it seems to keep getting wor...
www.wheresyoured.at
May 25, 2025 at 6:08 PM
I mean, you've described a lot of content. "low entropy" does some moderately heavy lifting, but in essence www.youtube.com/@kurzgesagt videos are also this way.

all of art ultimately exists on a particular grounded low-entropy axis (nice example by Kurt Vonnegut youtu.be/4_RUgnC1lm8?...)
Kurt Vonnegut Lecture
YouTube video by Case Western Reserve University
youtu.be
May 25, 2025 at 10:55 AM
agree re/feeds

what you'd normally do for a closed-platform is (1) scrape posts, (2) write-your-own ranking/clf. algo to get better content (tags are useful but easily co-opted, the platform's recsys is necessarily deficient due to partial obs).

_native_ support for scaling (2) I find exciting
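a toy version of (2), assuming posts are already scraped and you have your own like/skip labels:

```python
# perceptron over bag-of-words: the simplest write-your-own ranker.
def featurize(post: str, vocab: list[str]) -> list[int]:
    words = post.lower().split()
    return [words.count(w) for w in vocab]

def train(posts, labels, vocab, epochs=20, lr=0.1):
    w = [0.0] * len(vocab)
    for _ in range(epochs):
        for post, y in zip(posts, labels):   # y: 1 = liked, 0 = skipped
            x = featurize(post, vocab)
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) > 0)
            w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
    return w

def rank(posts, w, vocab):
    score = lambda p: sum(wi * xi for wi, xi in zip(w, featurize(p, vocab)))
    return sorted(posts, key=score, reverse=True)
```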
May 25, 2025 at 8:51 AM
💯, people complain about overuse of mathematical formalisms in papers, but a less expressive language enforces precision.

(in which sense, I guess a similar analogy exists: internal monologue vs. writing)
May 21, 2025 at 4:52 PM
How was it, "if I had more time I would've written a shorter letter"... Cursor has all the time in the world, now to incentivize compression. Twitter v0 had the right idea :)
May 15, 2025 at 4:58 PM
despite all those heroic efforts to open source "the algorithm"?! shocker
GitHub - twitter/the-algorithm: Source code for Twitter's Recommendation Algorithm
Source code for Twitter's Recommendation Algorithm - twitter/the-algorithm
github.com
May 15, 2025 at 7:24 AM
How many turns into the conversation? Any way to share the full trace?
May 5, 2025 at 5:45 AM
composition (in standard unixy sense), decomposition (use part of functionality and pipe into another program), misappropriation (e.g. use excel as a coloring tool), environment-driven personalization (e.g. use OS-level accessibility tools)
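e.g. composition in the unixy sense, sketched via subprocess (the file name is hypothetical):

```python
import subprocess

# small programs piped together: count unique words in a file.
p1 = subprocess.Popen(["cat", "notes.txt"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["tr", " ", "\n"], stdin=p1.stdout, stdout=subprocess.PIPE)
p3 = subprocess.Popen(["sort", "-u"], stdin=p2.stdout, stdout=subprocess.PIPE)
out, _ = p3.communicate()
print(len(out.splitlines()))
```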
May 1, 2025 at 9:05 PM
+1, I conceptualize it as a generalized state of flow beyond visceral fleeting moments. prob why env design (school, "WFH v onsite", etc) matters & likely down to energy expense: helper.ipam.ucla.edu/publications...

> Control energy for state transitions decreases over the course of repeated task trials
helper.ipam.ucla.edu
April 19, 2025 at 4:08 PM