Yoav Goldberg
yoavgo.bsky.social
Pinned
my current belief is that while thinking of DL through the lens of NLP was expanding, thinking of LLMs through the lens of NLP is mostly limiting.
December 20, 2025 at 10:24 PM
in the trailer, the mentions use the full names for "Mirror Isle" and "Skipping Stones to Lonely Homes", and "Heroes" for Heroes of Sokoban, so I'd be surprised if the intention is to hide anything.
December 14, 2025 at 10:04 AM
for the record i don't think language is "solved". the parts i cared about solving, though, are to a large extent "solved", to the point that the remaining "non-solved" parts are imo not linguistic
December 13, 2025 at 6:46 AM
why?
December 12, 2025 at 7:28 PM
what's the difference, in your view?
December 12, 2025 at 5:17 PM
i discuss this in the gist text. this is the more correct way to frame it imo (env provides observations, which agent interprets as rewards based on its goals), and it also opens up possible variations in how to think about learning from the env.
December 6, 2025 at 12:44 AM
I complain a lot about RL lately, and here we go again.

The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.

More at length here:

gist.github.com/yoavg/3eb3e7...
rl-wrong-about-rewards.md
December 5, 2025 at 11:37 PM
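The framing above can be sketched as a toy loop (all names hypothetical, not from the gist): the environment emits only observations, and the agent derives its own scalar reward from them according to its goal.

```python
# Toy sketch (hypothetical names): the environment returns only
# observations; the agent, not the environment, turns an observation
# into a reward based on its own goal.

class Env:
    """A 1-D walk; the env knows nothing about rewards."""
    def __init__(self):
        self.pos = 0

    def step(self, action):   # action in {-1, +1}
        self.pos += action
        return self.pos       # observation only, no reward attached

class Agent:
    """Holds a goal and computes its own reward from observations."""
    def __init__(self, goal):
        self.goal = goal

    def reward(self, obs):
        # internal reward: negative distance to the agent's own goal
        return -abs(self.goal - obs)

env = Env()
agent = Agent(goal=3)
obs = env.step(+1)
r = agent.reward(obs)   # the agent, not the env, assigns the reward
```

The point of the split is that two agents with different goals can assign different rewards to the same observation, without the environment changing at all.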
yes it sucks to be the ICLR organizers today, totally agree
November 28, 2025 at 12:01 AM
given that the data is already out and a large jsonl file is rumored to be floating around (which seems very plausible to me), i think the moral thing to do now would be to make the breached data publicly available for all rather than trying to hide it.
November 27, 2025 at 11:32 PM
RL is ok. but the jump from
A) people can be thought of as agents who observe an environment, act, observe the outcome, and update their beliefs

to:

B) let's model all things as a POMDP with a numeric reward function!

is just way too big for me
November 27, 2025 at 8:13 PM
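For reference, the POMDP formalization in (B) fixes a tuple with a numeric reward function baked into the problem definition itself; a minimal rendering (field names are mine, for illustration only):

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class POMDP:
    # The standard tuple (S, A, O, T, Z, R): note that the numeric
    # reward function R is part of the problem definition itself,
    # not something the agent brings along.
    states: Set[str]
    actions: Set[str]
    observations: Set[str]
    transition: Callable[[str, str], dict]   # T(s' | s, a)
    observe: Callable[[str, str], dict]      # Z(o | s', a)
    reward: Callable[[str, str], float]      # R(s, a), numeric by fiat
```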
the fascinating (to me) quality of hard-core RL researchers (e.g. Sutton) is the ability to hold an all-encompassing view of RL as the basis of intelligence, while at the same time working on super low-level stuff like tabular TD algorithms, and yet to strongly believe these are actually the same thing
November 27, 2025 at 4:32 PM
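The "super low level stuff" in question is things like the tabular TD(0) value update, which really does fit in a couple of lines; a standard textbook sketch:

```python
# Tabular TD(0): update a state-value table from a sampled transition.
# V[s] <- V[s] + alpha * (r + gamma * V[s'] - V[s])

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {0: 0.0, 1: 0.0}
td0_update(V, s=0, r=1.0, s_next=1)
# V[0] is now 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```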
completely agree
November 19, 2025 at 2:07 AM
and this email (I assume it's about official photos for some purpose of some organization) comes across as censorship? (I'm just surprised, because it wouldn't have occurred to me)
November 18, 2025 at 5:58 PM
as someone who could see himself writing something like this by mistake, and who doesn't get what the fuss is about, I'd appreciate it if you could explain what is so jarring here?
November 18, 2025 at 4:24 PM
I didn't get the banknote reference, and I also didn't read the green specifically as a nod to the IDF.. but as I said, maybe that's because I'm really not a designer, so I don't think in those terms
November 18, 2025 at 3:39 PM
ื”ืื ืืช ืžืืžื™ื ื” ื‘ืขื™ืงืจื•ืŸ? ื›ื™ ื ืฉืžืข ืžืฆื™ื•ืฆื™ื ืื—ืจื™ื ืฉืœื ืžืžืฉ, ื•ืื–, ืžื” ืื›ืคืช ืœืš ื‘ืขืฆื ืขื“ ื›ืžื” ื–ื” ืžื“ื•ื™ื™ืง? ืื ื™ ืื™ืฉื™ืช ื›ืŸ ืžืืžื™ืŸ, ื•ืื›ืŸ ื”ื™ื™ืชื™ ืฉืžื— ืื ื–ื” ื™ืฉืชืคืจ ืœื”ื‘ื, ื•ืžืืžื™ืŸ ืฉืื›ืŸ ื›ืš ื™ื”ื™ื” ื›ื™ ื–ื” ื›ื•ืœื” ื‘ื•ืœื˜ ืฉืžืงื•ืฆืจ ื‘ืื•ืคืŸ ืœื ื‘ืจื•ืจ ืขืœ ืคื•ืกื˜ืจ.
November 18, 2025 at 3:37 PM
not the ideal slogan, but really not that far-fetched either. an explicit recognition of Israel's right to exist as a Zionist entity, alongside the Arab states, including a Palestinian one.
November 18, 2025 at 3:11 PM
as a non-designer, it looks a bit amateurish to me but overall really fine. what's the problem? what do you mean the green is wrong?
November 18, 2025 at 3:10 PM
what's the latest-and-greatest attempt to reverse-engineer and document the inner workings of claude-code?
November 17, 2025 at 10:23 AM
(hmm i guess we can amend to "increase in the proportion of knowledge we believe to be true")
November 17, 2025 at 7:00 AM
i think memory is never "free", in the sense that the real bottleneck is not storage, but the ability to retrieve the right thing, while not retrieving a wrong (out of date) thing by mistake.

but assuming we do delete facts, is deleting considered learning in your definition?
November 17, 2025 at 6:59 AM
is "increase" necessary? or is "change" enough? (although i guess that in an ideal form, you don't "forget" a wrong fact but add the fact that it is wrong, so you may consider it as increasing...)
November 16, 2025 at 8:11 PM
yes, following instructions in a prompt is not learning. but if a wrapping system stores items to inject into future prompts, then you can consider the system as learning.
November 16, 2025 at 8:00 PM
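Such a wrapping system can be sketched in a few lines (all names hypothetical; `call_llm` stands in for whatever model API is used): the model itself is frozen, but the wrapper stores facts and injects them into later prompts, so the system as a whole learns.

```python
class MemoryWrapper:
    """Frozen model + external memory: the storing is the learning part."""
    def __init__(self, call_llm):
        self.call_llm = call_llm   # hypothetical: prompt -> completion
        self.memory = []           # persisted items, injected into later prompts

    def remember(self, fact):
        self.memory.append(fact)   # this storage step is the "learning"

    def ask(self, question):
        # retrieval here is naive (inject everything); a real system
        # would select which memories to inject
        context = "\n".join(self.memory)
        return self.call_llm(f"{context}\n\nQ: {question}")

# usage with a stub model that just echoes its prompt
w = MemoryWrapper(call_llm=lambda p: p)
w.remember("the meeting moved to Tuesday")
found = "Tuesday" in w.ask("when is the meeting?")  # True
```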
it will be in-context-induction, and the storing and retention from external memory would be learning.
November 16, 2025 at 7:34 PM
the storage, if it happens, is the learning part. the inference process is not learning.
November 16, 2025 at 7:14 PM