burkov.bsky.social
@burkov.bsky.social
AI experts in 2025: 1) Asked an LLM to solve the business problem, 2) Said that it's an agent, so the solution should be OK, 3) Vibe-coded it to production.
March 22, 2025 at 3:57 AM
AI experts in the past: 1) Defined an optimization problem as a proxy to a business problem, 2) Optimized on a business-specific dataset, 3) Proved the solution is optimal, 4) Applied it to the business problem.
March 22, 2025 at 3:57 AM
Will the billions poured into LLM companies during the last two years allow them to unlock access to proprietary data at a scale that justifies continuing the hype? I bet not. And you?
December 8, 2024 at 6:38 AM
Limited access to data is what killed the big data/Hadoop hype and what kept machine/deep learning as a niche skill.
December 8, 2024 at 6:38 AM
This will require proprietary data, and that's the problem: we are back to traditional data science/AI, where models were empty shells looking for data, and the data was hard to find/adapt for ML.
December 8, 2024 at 6:38 AM
The next stage will be companies trying to specialize LLMs to do something they cannot do well enough from pretraining.
December 8, 2024 at 6:38 AM
The growth of LLMs from pretraining on larger datasets and increasing parameter counts has peaked. It took 2 years, which was fun to watch.
December 8, 2024 at 6:38 AM
Those of us who learned to code before LLMs will know how to instruct them to generate good-quality code.

Those who started building software by talking to LLMs will always be limited in what they can explain verbally.

Learning to code is crucial to being able to effectively not code.
December 5, 2024 at 8:01 AM
If not, the word is split into individual characters, and those characters are merged using the learned merge rules, in the same order those rules were added to the merges collection during BPE training.

Don't trust online information. Trust the source code and good books.
December 1, 2024 at 1:48 AM
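
A minimal Python sketch of the procedure described in this thread (not an excerpt from the book or any particular library's implementation; the `vocab` and `merges` values below are toy assumptions for illustration):

```python
def bpe_tokenize(word, vocab, merges):
    # If the whole word is already a token, return it as-is.
    if word in vocab:
        return [word]
    # Otherwise, split the word into individual characters...
    pieces = list(word)
    # ...and apply the learned merge rules in the same order they
    # were added to the merges collection during BPE training.
    for left, right in merges:
        i = 0
        while i < len(pieces) - 1:
            if pieces[i] == left and pieces[i + 1] == right:
                pieces[i:i + 2] = [left + right]
            else:
                i += 1
    return pieces

# Toy training artifacts (hypothetical, for illustration only):
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
vocab = {"low", "l", "o", "w", "e", "r", "lo", "er"}

print(bpe_tokenize("low", vocab, merges))    # ['low'] -- the whole word is a token
print(bpe_tokenize("lower", vocab, merges))  # ['low', 'er']
```

Real tokenizers add details this sketch omits (pre-splitting text into words, byte-level alphabets, end-of-word markers), but the ordered merge loop above, not greedy longest-match scanning, is the core of BPE tokenization.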
This is not how it works, and doing so would not result in correct tokenization. The real algorithm takes a word, checks if the word is also a token, and if it is, it returns the token.
December 1, 2024 at 1:48 AM
Once the BPE model is trained, most resources explain tokenizing a new sequence as scanning it from left to right and, at each position, taking the longest token in the vocabulary that matches the upcoming characters.
December 1, 2024 at 1:48 AM
While doing research for my book, I discovered that byte-pair encoding (BPE), the algorithm used to tokenize data for modern language models and one of the most important algorithms of our time, is described incorrectly in almost all online resources.
December 1, 2024 at 1:48 AM
Hello, world!
December 1, 2024 at 1:03 AM