You can also find me on Threads: @sung.kim.mw
Paper: www.arxiv.org/abs/2511.089...
Model: ???
ageron.github.io/homlp/HOMLP_...
He proves that under the heat semigroup (P_τ) on the Boolean hypercube, any nonnegative function f: {−1,1}^n → ℝ_+ exhibits a uniform tail bound that is better than the one given by Markov's inequality.
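For context, here is a sketch of the objects in that statement, using standard textbook definitions that the post does not spell out. The noise-operator form of the semigroup and the normalization by the mean of f are assumptions here, and the exact rate proved in the paper is not given in the post.

```latex
% Heat semigroup on the hypercube, written as a noise operator (assumed form):
% y is a noisy copy of x, each coordinate kept or flipped independently.
(P_\tau f)(x) = \mathbb{E}\big[f(y)\big],
\qquad
y_i =
\begin{cases}
  x_i  & \text{with probability } \tfrac{1 + e^{-\tau}}{2},\\
  -x_i & \text{with probability } \tfrac{1 - e^{-\tau}}{2}.
\end{cases}

% Markov baseline: for uniform x, \mathbb{E}[P_\tau f] = \mathbb{E}[f],
% so for f \ge 0 and any t > 0
\Pr_x\big[(P_\tau f)(x) \ge t\,\mathbb{E}[f]\big] \le \frac{1}{t}.
```

Read this way, "better than Markov" means a tail bound that decays strictly faster than 1/t and holds uniformly over all nonnegative f.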
www.chosun.com/english/mark...
It tries to answer this question: as probabilistic models for language data, how do diffusion LLMs differ from AR LLMs when fitting natural language?
Are they implying that Google will sell TPUs, not just offer them through Google Cloud, to other companies like Meta?
"Continuous batching" by Remi Ouazan and two others.
huggingface.co/blog/continu...
- Now I can easily bench two plates, squat three plates, and even do Nordic curls. Yea…
- But some of my most technical and "expensive" jackets fit me very, very tight.
An intelligent LLM routing system trained with reinforcement learning to dynamically select optimal models from 20+ available LLMs while optimizing for both performance and cost.
An expert, end-to-end OCR model built on Hunyuan's native multimodal architecture and training strategy. This model is "supposed to" achieve SOTA performance with only 1 billion parameters, significantly reducing deployment costs.
The pants are next-level ugly.
github.com/jalexine/gpu...
Google DeepMind suggests that pixel-by-pixel autoregressive modeling may scale into a truly unified vision paradigm. Their study shows that as resolution increases, model size must grow much faster than the dataset,
Their first agentic small language model for computer use. This experimental model includes robust safety measures to aid responsible deployment.
Blog: www.microsoft.com/en-us/resear...
Model: huggingface.co/microsoft/Fa...
www.cs.cornell.edu/~asampson/bl...
Why? Because Korean retail investors keep buying U.S. equities. If I were them, I’d buy more SK Hynix, not U.S. stocks, but who else would buy meme stocks?
This project demonstrates integer-only training of a language model directly on the CPU, completely bypassing the need for GPUs, floating-point arithmetic, or heavy ML frameworks like PyTorch or JAX.
In the subway, when only one seat becomes available for a couple, more often than not the man sits down while the woman remains standing. I observed this across all generations.