The key is to provide enough details so there’s little room for improvisation. Otherwise there’s no guarantee it will improvise the way you want it to.
Now I wonder what other capabilities could be improved using RL alone. One could create a bunch of different reward models and let them train models for a longer time.
I know RL training is expensive, but even S1-like experiments could expose interesting patterns and behaviors.
SQLite (sqlite-vec) version
simonwillison.net/2024/Oct/4/h...
PostgreSQL (pgvector) version
github.com/pgvector/pgv...
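Roughly what the SQLite route looks like, as a minimal sketch rather than the code from the linked posts (table name, dimensions, and data are made up here, and the exact KNN syntax has shifted between sqlite-vec versions):

```python
# Minimal sqlite-vec sketch, assuming the sqlite-vec extension and its
# Python helper are installed (pip install sqlite-vec). Names and data
# below are illustrative only.
import sqlite3
import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)            # registers the vec0 virtual-table module
db.enable_load_extension(False)

# A vec0 virtual table holding 4-dimensional float embeddings.
db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")
db.executemany(
    "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
    [
        (1, "[0.10, 0.20, 0.30, 0.40]"),  # vectors can be passed as JSON text
        (2, "[0.90, 0.80, 0.70, 0.60]"),
    ],
)

# KNN query: MATCH against a query vector, k nearest rows by distance.
# (Older sqlite-vec releases used "ORDER BY distance LIMIT k" instead of "k = ...".)
rows = db.execute(
    "SELECT rowid, distance FROM vec_items "
    "WHERE embedding MATCH ? AND k = 2 ORDER BY distance",
    ("[0.10, 0.20, 0.30, 0.40]",),
).fetchall()
print(rows)
```

The pgvector version is roughly the same idea inside PostgreSQL: a vector(n) column plus ORDER BY embedding <-> $1 LIMIT k for the nearest-neighbour lookup.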
Bravo! 👏
Chat is only a UI constraint for building this context iteratively.
Like, seriously. I would love to read a post about how you approach and research what’s new, and what tools you made and use daily to consume all your feeds and sources. - 1/2
There is a paper showing how a dataset of 1k long instructions outperforms larger datasets of worse quality.
arxiv.org/pdf/2402.048...