I'm so glad you asked!
Anthropic has been releasing some promising LLM alignment results. Does this mean AI alignment in general will be easier than we thought? My answer is, as usual, "it's complicated".
gracekind.net/blog/llmalig...
There's been some excellent research lately into A. Why that is and B. How to solve it. thinkingmachines.ai/blog/defeati...
👇
How the current way of training language models destroys any voice (and hope of good writing).
www.interconnects.ai/p/why-ai-wri...
t.co/h82vvrYRPM
a new mech interp paper from OpenAI proposes a way to train models so that they’re natively easier to understand
openai.com/index/unders...