Toni Kukurin
@tkukurin.bsky.social
life is short and I am in a hurry.

tkukurin.github.io
not sure I follow this one -- as I see it the implication is you are learning params "beyond" the immediate task of interest. what's a more precise alternative?
November 16, 2025 at 5:08 PM
I don't mind "ICL" as a term, but "in-context adaptation" always seemed apt.

neuro has nice terms "working memory gating", "attentional selection"

but I think the point of keeping "learning" in there is exactly that the _outcome_ is better performance on your task at hand.
November 16, 2025 at 3:35 PM
suttonification may be better characterized as "moving up the abstraction level" rather than "just" removing domain knowledge (i.e. meta-learning). he loves his options framework.
October 19, 2025 at 9:01 PM
agree on the latter point, cf bsky.app/profile/tkuk...

on the former - it's costly if one assumes a grok invocation per page load. but IMO scaling a grok-twitter-recsys is amenable to good system dimensioning (e.g. model cascade + periodic batching; rough sketch below)

from that perspective, doesn't seem preposterous 🤷
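a minimal sketch of the cascade + batching idea; the models, thresholds, and numbers are all made-up stand-ins:

```python
import time

# hypothetical cascade: a cheap ranker runs on every page load, the
# expensive model only re-scores the shortlist in periodic batches.
CHEAP_KEEP = 50        # candidates surviving the cheap stage
BATCH_PERIOD_S = 600   # how often the expensive model re-ranks

def cheap_score(post: str) -> float:
    return float(len(post))       # toy stand-in for a small/fast ranker

def grok_rank(posts: list[str]) -> list[str]:
    return sorted(posts)          # toy stand-in for the expensive model

_cache = {"ranked": [], "ts": 0.0}

def serve_feed(candidates: list[str]) -> list[str]:
    # stage 1: cheap filter, runs per request
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:CHEAP_KEEP]
    # stage 2: expensive re-rank, amortized across requests
    if time.time() - _cache["ts"] > BATCH_PERIOD_S:
        _cache["ranked"] = grok_rank(shortlist)
        _cache["ts"] = time.time()
    return _cache["ranked"] or shortlist
```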
agree re/feeds

what you'd normally do for a closed-platform is (1) scrape posts, (2) write-your-own ranking/clf. algo to get better content (tags are useful but easily co-opted, the platform's recsys is necessarily deficient due to partial obs).

_native_ support for scaling (2) I find exciting
October 18, 2025 at 4:45 PM
how would satya sell enterprise AI without text extraction
October 18, 2025 at 3:35 PM
our entire economy depends on document scanning for training data.
October 18, 2025 at 3:31 PM
iirc @neuroai.bsky.social made a nice remark at some point during the learning salon. paraphrasing from memory: "is the real AGI DeepMind's accumulated bias + methods, or are we _truly_ building a general learner"
October 18, 2025 at 9:34 AM
"long-context" and "prompt eng" are flip-sides of the same underlying concept. _someone_ has to compress the representation, either in language- or high-d-vector space.

IMO the domain model is intelligence as part user (akin to "phds doing lin reg" in finance), part interface (RLHF), part model
September 21, 2025 at 1:18 PM
interesting but I guess not a relevant attack vector (assuming it exclusively influences user-local results).

becomes relevant in some intricate global-recommender-system situation, but google already has years of experience dealing with this.
August 17, 2025 at 7:08 PM
how do you define _frontier work_ here? most work I do, at least, seems a combination of monotonous incantations and real insight. insight usually being ~ "this will affect outcomes I care about N steps down the road". && frequently this is preempted by me just realizing which outcomes I care about :)
July 18, 2025 at 11:50 AM
but also sounds like a cool research project: rev-eng how much google3 code is in gemini pre-training based on the relative quality of sth like gemini vs gpt4 or claude.

(e.g. via diff-in-diffs on coding-task perf, nodejs vs swift)

(I assume gem was predominantly google3-trained based on its coding quirks)
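rough shape of the estimate, with entirely made-up pass rates:

```python
# diff-in-diffs with made-up numbers: if gemini's edge over gpt4 is larger
# on a google3-common stack than on a rare one, that gap is (weak)
# evidence of google3 data in pre-training.
pass_rate = {  # (model, language) -> toy benchmark score
    ("gemini", "nodejs"): 0.71, ("gpt4", "nodejs"): 0.63,
    ("gemini", "swift"): 0.52, ("gpt4", "swift"): 0.55,
}
did = (pass_rate["gemini", "nodejs"] - pass_rate["gpt4", "nodejs"]) \
    - (pass_rate["gemini", "swift"] - pass_rate["gpt4", "swift"])
print(f"diff-in-diffs estimate: {did:+.2f}")  # +0.11 with these numbers
```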
July 13, 2025 at 10:51 AM
novelty = f(audience, goal)?

there's a nice exposition of this topic in @rockt.ai and others' position paper arxiv.org/abs/2406.042...

which, to prove a meta-point, delivers a rather "obvious" message with exposition clarity deemed novel enough for ICML poster acceptance :)
Open-Endedness is Essential for Artificial Superhuman Intelligence
In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended...
arxiv.org
July 12, 2025 at 7:09 PM
"dynamic" as discussed here sounds to me more akin to what people would refer to as "reasoning" fine-tuning nowadays?

RAG implies external memory; a reasoning post-trained model generates more artefacts (tokens/"writes") as a result of computation (which is also where "dynamic" makes a difference)
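caricature of the contrast, with `retrieve` and `llm` as stand-in callables:

```python
# RAG: budget goes to reads from an external store, then one answer.
def rag_answer(query, retrieve, llm):
    context = retrieve(query)                # external memory (reads)
    return llm(f"context: {context}\nq: {query}\na: ")

# reasoning-tuned: budget goes to writes -- tokens generated at
# inference time before committing to an answer (the "dynamic" part).
def reasoning_answer(query, llm, n_steps=8):
    trace = f"q: {query}\n"
    for _ in range(n_steps):
        trace += llm(trace + "next step: ")  # artefacts as computation
    return llm(trace + "final answer: ")
```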
June 10, 2025 at 2:02 PM
not sure about a single unifying tutorial textbook; the closest might be something in the huggingface/anthropic/... blogpost genre. e.g. what comes to mind in the data filtering/collection space:
huggingface.co/spaces/Huggi...
FineWeb: decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW
This application helps create large, high-quality datasets for training large language models by processing and filtering web data. Users can obtain datasets with optimized deduplication and filter...
huggingface.co
June 3, 2025 at 6:04 PM
Humans demonstrating OOD failure, AI demonstrating IID success :)

bsky.app/profile/mela...
Alison Gopnik, telling it like it is, at Johns Hopkins.
May 30, 2025 at 5:06 PM
"its normal and the most important thing you can do is do a lot of work" :)

it would be intriguing to see the progress if you're willing to share at some point!
A message from Ira Glass that every Artist should hear...
YouTube video by Krzysztof A. Janczak
www.youtube.com
May 29, 2025 at 7:53 AM
related and belated followup: opinion/rant on a similar topic

> Every app has a different design [optimized based on your activity ...] each app trying to make you do different things in uniquely annoying ways [...] low-quality clickbait

anyhow we all know how it goes
Ed Zitron's AI opinions are pretty misinformed/dumb but I have fundamental empathy for where he's coming from on the overall issue of computing becoming an abusive carnival of scummy value extraction tactics from the poor and ignorant. This is a good op ed.
www.wheresyoured.at/never-forgiv...
Never Forgive Them
In the last year, I’ve spent about 200,000 words on a kind of personal journey where I’ve tried again and again to work out why everything digital feels so broken, and why it seems to keep getting wor...
www.wheresyoured.at
May 25, 2025 at 6:08 PM
I mean, you've described a lot of content. "low entropy" does some moderately heavy lifting, but in essence www.youtube.com/@kurzgesagt videos are also this way.

all of art ultimately exists on a particular grounded low-entropy axis (nice example by Kurt Vonnegut youtu.be/4_RUgnC1lm8?...)
Kurt Vonnegut Lecture
YouTube video by Case Western Reserve University
youtu.be
May 25, 2025 at 10:55 AM
agree re/feeds

what you'd normally do for a closed-platform is (1) scrape posts, (2) write-your-own ranking/clf. algo to get better content (tags are useful but easily co-opted, the platform's recsys is necessarily deficient due to partial obs).

_native_ support for scaling (2) I find exciting
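a toy version of (2), assuming posts are already scraped and you have your own like/skip labels:

```python
# perceptron over bag-of-words: the simplest write-your-own ranker.
def featurize(post: str, vocab: list[str]) -> list[int]:
    words = post.lower().split()
    return [words.count(w) for w in vocab]

def train(posts, labels, vocab, epochs=20, lr=0.1):
    w = [0.0] * len(vocab)
    for _ in range(epochs):
        for post, y in zip(posts, labels):   # y: 1 = liked, 0 = skipped
            x = featurize(post, vocab)
            pred = int(sum(wi * xi for wi, xi in zip(w, x)) > 0)
            w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
    return w

def rank(posts, w, vocab):
    score = lambda p: sum(wi * xi for wi, xi in zip(w, featurize(p, vocab)))
    return sorted(posts, key=score, reverse=True)
```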
May 25, 2025 at 8:51 AM
💯, people complain about overuse of mathematical formalisms in papers, but a less expressive language enforces precision.

(in which sense, I guess a similar analogy exists: internal monologue vs. writing)
May 21, 2025 at 4:52 PM
How was it, "if I had more time I would've written a shorter letter"... Cursor has all the time in the world, now to incentivize compression. Twitter v0 had the right idea :)
May 15, 2025 at 4:58 PM
despite all those heroic efforts to open source "the algorithm"?! shocker
GitHub - twitter/the-algorithm: Source code for Twitter's Recommendation Algorithm
Source code for Twitter's Recommendation Algorithm - twitter/the-algorithm
github.com
May 15, 2025 at 7:24 AM
How many turns into the conversation? Any way to share the full trace?
May 5, 2025 at 5:45 AM
composition (in standard unixy sense), decomposition (use part of functionality and pipe into another program), misappropriation (e.g. use excel as a coloring tool), environment-driven personalization (e.g. use OS-level accessibility tools)
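e.g. composition in the unixy sense, sketched via subprocess (the file name is hypothetical):

```python
import subprocess

# small programs piped together: count unique words in a file.
p1 = subprocess.Popen(["cat", "notes.txt"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["tr", " ", "\n"], stdin=p1.stdout, stdout=subprocess.PIPE)
p3 = subprocess.Popen(["sort", "-u"], stdin=p2.stdout, stdout=subprocess.PIPE)
out, _ = p3.communicate()
print(len(out.splitlines()))
```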
May 1, 2025 at 9:05 PM
+1, I conceptualize it as a generalized state of flow beyond visceral fleeting moments. prob why env design (school, "WFH v onsite", etc) matters & likely down to energy expense: helper.ipam.ucla.edu/publications...

> Control energy for state transitions decreases over the course of repeated task trials
helper.ipam.ucla.edu
April 19, 2025 at 4:08 PM