tomwhi.bsky.social
@tomwhi.bsky.social
Reposted
"o3, show me a page from the instruction manual for a machine that turns everything into crabs"

Intrigued by the warning it created: "make the warning page of what happens if you insert a crab into the Universal Crabbifier"

"show me a security camera still..."

(all zero shot, first try)
May 19, 2025 at 8:43 PM
Reposted
LLMs may allow computational studies of literature to get beyond word-counting and model things readers value, like surprise. But — what exactly is "surprise"? New work by Annaliese Bissell, Ella Paulin, and @andrewpiper.bsky.social. aclanthology.org/2025.wnu-1.7/
May 10, 2025 at 2:29 PM
Reposted
Another one of those little shocking AI moments: this sound clip was generated in 46 seconds on my home PC from the script below. Just the text

Nari Lab's Dia does some of the best expressive AI voice I have seen and it is open weights & created by two undergrads with no funding
April 22, 2025 at 9:12 PM
Reposted
Magi-1: The Autoregressive Diffusion Video Generation Model

🥇 The first autoregressive video model with top-tier quality output
🔓 100% open-source & tech report
📊 Exceptional performance on major benchmarks
April 22, 2025 at 6:10 AM
Reposted
Comparing Human versus LLM Judges

TREC, a community of information retrieval and natural language processing researchers convened by NIST, found that an independent human judge correlates better with GPT-4o than with a human judge.
April 22, 2025 at 7:17 PM
Reposted
Lvmin Zhang launched FramePack, a novel approach for next-frame prediction models in video generation.

Project: lllyasviel.github.io/frame_pack_g...

Image-to-5-Seconds (30fps, 150 frames)
April 21, 2025 at 2:39 AM
Reposted
Microsoft's BitNet b1.58 2B4T — the first large-scale, native 1-bit LLM🚀🚀

BitNet achieves performance on par with leading full-precision LLMs, and it’s blazingly fast ⚡️⚡️ and uses much less memory 🎉

Everything is open-sourced, per them.
April 16, 2025 at 4:18 AM
Reposted
A way to help models "be aware of their own capabilities and limitations" from @jacobeisenstein.bsky.social et al: arxiv.org/abs/2503.14481 #MLSky
March 22, 2025 at 4:09 PM
Reposted
The Differences between Deep Research, Deep Research, and Deep Research by Han Lee

This blog post examines the various flavors of “Deep Research” from a technical implementation perspective.

leehanchung.github.io/blogs/2025/0...
March 19, 2025 at 5:52 AM
Reposted
A very exciting day for open-source AI! We're releasing our biggest open source model yet -- OLMo 2 32B -- and it beats the latest GPT 3.5, GPT 4o mini, and leading open weight models like Qwen and Mistral. As usual, all data, weights, code, etc. are available.
March 13, 2025 at 6:16 PM
Reposted
Google has released Gemma 3, its latest open model. As the most capable and advanced version in the open-source model family, it adds highly requested features like longer context and multimodality.
March 13, 2025 at 5:35 AM
Reposted
Introducing our Gemma 3 open models, the most capable models that you can run on a single GPU or TPU. They are multimodal and multilingual, have a 128k context length, and exceed the quality of other open models that are an order of magnitude larger in terms of hardware footprint. 🎉

blog.google/technology/d...
March 13, 2025 at 2:55 PM
Reposted
PapersChat – Chat with Research Papers

PapersChat provides an agentic AI interface for querying papers, retrieving insights from ArXiv & PubMed, and structuring responses efficiently.

github.com/AstraBert/Pa...
March 10, 2025 at 4:47 AM
Reposted
Mistral has just released a new OCR model, Mistral-OCR. Yet still the usual VLM curse: with challenging manuscripts, it hallucinates completely.
March 6, 2025 at 7:02 PM
Reposted
Mercury is the first commercial-scale Diffusion LLM.
March 1, 2025 at 4:42 PM
Reposted
The Blue Report is amazing. The most clicked on articles are right here

theblue.report
The Blue Report: the top links on Bluesky, updated hourly
March 1, 2025 at 9:21 PM
Reposted
An Overview of Large Language Models for Statisticians

- Stat for LLM: How statistical methods can improve LLM uncertainty quantification, interpretability, trustworthiness & more.
February 27, 2025 at 4:09 AM
Reposted
I asked Claude “Make an interactive artifact that will illustrate to me why I should not start Civ VII right now.”

This is what it came up with on its own.
February 8, 2025 at 10:34 PM
Reposted
Why do LLMs trained on over 90% English text perform so well in non-English languages?

The authors find that the models learn to share highly abstract grammatical concept representations, even across unrelated languages!
February 6, 2025 at 7:19 AM
Reposted
Fellow journos covering "AI": Please don't do their PR for them! "Virtual employees" is a harmful anthropomorphism in that it (a) is false; (b) confuses readers about the emerging technology and inaccurately lends human attributes like agency, accountability, etc.; and (c) harms humans in real jobs.
January 6, 2025 at 5:07 PM
Reposted
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

Original release: 8 models, 540K downloads. Just the beginning...

The community turned those open-weight models into +550 NEW models on @huggingface. Total downloads? 2.5M—nearly 5X the originals.
January 27, 2025 at 4:24 PM
Reposted
We’re excited to introduce Transformer², a machine learning system that dynamically adjusts its weights for various tasks!

sakana.ai/transformer-...

Adaptation is a remarkable natural phenomenon, like how the octopus blends into its environment, or how the brain rewires itself after injury.

🧵 1/N
January 15, 2025 at 5:49 AM
Yet more interesting research by sakana.ai
January 26, 2025 at 10:57 AM
Reposted
ByteDance Doubao-1.5-pro

- Includes a "Deep Thinking" mode, surpassing O1-preview and O1 models on the AIME benchmark.

- Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks.

team.doubao.com/en/special/d...
January 24, 2025 at 9:43 PM