Lightnews — Scholar-powered news

Reposted

Ethan Mollick

@emollick.bsky.social

"o3, show me a page from the instruction manual for a machine that turns everything into crabs"

Intrigued by the warning it created: "make the warning page of what happens if you insert a crab into the Universal Crabbifier"

"show me a security camera still..."

(all zero shot, first try)

May 19, 2025 at 8:43 PM

Reposted

Ted Underwood

@tedunderwood.com

LLMs may allow computational studies of literature to get beyond word-counting and model things readers value, like surprise. But — what exactly is "surprise"? New work by Annaliese Bissell, Ella Paulin, and @andrewpiper.bsky.social. aclanthology.org/2025.wnu-1.7/

Narrative surprise is a core element of storytelling for engaging audiences, and yet it remains underexplored in the context of large language models (LLMs) and narrative generation. While surprise arises from events that deviate from expectations while maintaining retrospective coherence, current computational approaches lack comprehensive frameworks to evaluate this phenomenon. This paper presents a novel framework for assessing narrative surprise, drawing on psychological theories of narrative comprehension and surprise intensity. We operationalize six criteria—initiatoriness, immutability violation, predictability, post-dictability, importance, and valence—to measure narrative surprise in story endings. Our study evaluates 120 story endings, generated by both human authors and LLMs, across 30 mystery narratives. Through a ranked-choice voting methodology, we identify significant correlations between reader preferences and four of the six criteria. Results underscore the continuing advantage of human-authored endings in achieving compelling narrative surprise, while also revealing significant progress in LLM-generated narratives.

May 10, 2025 at 2:29 PM

Reposted

Ethan Mollick

@emollick.bsky.social

Another one of those little shocking AI moments: this sound clip was generated in 46 seconds on my home PC from the script below. Just the text

Nari Lab's Dia does some of the best expressive AI voice I have seen and it is open weights & created by two undergrads with no funding

April 22, 2025 at 9:12 PM

Reposted

Sung Kim

@sungkim.bsky.social

Magi-1: The Autoregressive Diffusion Video Generation Model

🥇 The first autoregressive video model with top-tier quality output
🔓 100% open-source & tech report
📊 Exceptional performance on major benchmarks

April 22, 2025 at 6:10 AM

Reposted

Sung Kim

@sungkim.bsky.social

Comparing Human versus LLM Judges

TREC, a community of researchers in information retrieval and natural language processing convened by NIST, found that an independent human judge correlates better with GPT-4o than with another human judge.

April 22, 2025 at 7:17 PM

Reposted

luokai

@luok.ai

Lvmin Zhang launched FramePack, a novel approach for next-frame prediction models in video generation.

Project: lllyasviel.github.io/frame_pack_g...

Image-to-5-Seconds (30fps, 150 frames)

April 21, 2025 at 2:39 AM

Reposted

Sung Kim

@sungkim.bsky.social

Microsoft's BitNet b1.58 2B4T — the first large-scale, native 1-bit LLM🚀🚀

BitNet achieves performance on par with leading full-precision LLMs, and it’s blazingly fast ⚡️⚡️ and uses much less memory 🎉

Everything is open-sourced, per them.

April 16, 2025 at 4:18 AM

Reposted

Ted Underwood

@tedunderwood.com

A way to help models "be aware of their own capabilities and limitations" from @jacobeisenstein.bsky.social et al: arxiv.org/abs/2503.14481 #MLSky

Don't lie to your friends: Learning what you know from collaborative self-play

Jacob Eisenstein, Reza Aghajani, Adam Fisch, Dheeru Dua, Fantine Huot, Mirella Lapata, Vicky Zayats, Jonathan Berant

To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning because they require constructing examples that reflect the agent's specific capabilities. We therefore propose a radically new approach to teaching agents what they know: collaborative self-play. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction. We focus on small societies of agents that have access to heterogeneous tools (corpus-specific retrieval), and therefore must collaborate to maximize their success while minimizing their effort. Experiments show that group-level rewards for multi-agent communities can induce policies that transfer to improve tool use and selective prediction in settings where individual agents are deployed in isolation.

March 22, 2025 at 4:09 PM

Reposted

Sung Kim

@sungkim.bsky.social

The Differences between Deep Research, Deep Research, and Deep Research by Han Lee

This blog post examines the various flavors of “Deep Research” from a technical implementation perspective.

leehanchung.github.io/blogs/2025/0...

The Differences between Deep Research, Deep Research, and Deep Research

Dive into the world of AI engineering with this exploration of Deep Research in report generation. Learn how LLM-as-a-judge systems, machine learning, and ev...

leehanchung.github.io

March 19, 2025 at 5:52 AM

Reposted

Nathan Lambert

@natolambert.bsky.social

A very exciting day for open-source AI! We're releasing our biggest open source model yet -- OLMo 2 32B -- and it beats the latest GPT 3.5, GPT 4o mini, and leading open weight models like Qwen and Mistral. As usual, all data, weights, code, etc. are available.

March 13, 2025 at 6:16 PM

Reposted

luokai

@luok.ai

Google has released Gemma 3, its latest open model. As the most capable and advanced version in its open-source model family, it adds highly requested features like longer context and multimodality.

March 13, 2025 at 5:35 AM

Reposted

Jeff Dean

@jeffdean.bsky.social

Introducing our Gemma 3 open models, the most capable models that you can run on a single GPU or TPU. Multimodal, multilingual, 128k context length, and exceeds quality of other open models that are an order of magnitude larger in terms of hardware footprint. 🎉

blog.google/technology/d...

Introducing Gemma 3: The most capable model you can run on a single GPU or TPU

Today, we're introducing Gemma 3, our most capable, portable and responsible open model yet.

blog.google

March 13, 2025 at 2:55 PM

Reposted

Sung Kim

@sungkim.bsky.social

PapersChat – Chat with Research Papers

PapersChat provides an agentic AI interface for querying papers, retrieving insights from ArXiv & PubMed, and structuring responses efficiently.

github.com/AstraBert/Pa...

March 10, 2025 at 4:47 AM

Reposted

Alexander Doria

@dorialexander.bsky.social

Mistral has just released a new OCR model, Mistral-OCR. Yet still the usual VLM curse: with challenging manuscripts, it hallucinates completely.

March 6, 2025 at 7:02 PM

Reposted

luokai

@luok.ai

Mercury is the first commercial-scale Diffusion LLM.

March 1, 2025 at 4:42 PM

Reposted

Mark Cuban

@mcuban.bsky.social

The Blue Report is amazing. The most clicked on articles are right here

theblue.report

The Blue Report

The top links on Bluesky, updated hourly

theblue.report

March 1, 2025 at 9:21 PM

Reposted

Sung Kim

@sungkim.bsky.social

An Overview of Large Language Models for Statisticians

- Stat for LLM: How statistical methods can improve LLM uncertainty quantification, interpretability, trustworthiness & more.

February 27, 2025 at 4:09 AM

Reposted

Ethan Mollick

@emollick.bsky.social

I asked Claude “Make an interactive artifact that will illustrate to me why I should not start Civ VII right now.”

This is what it came up with on its own.

February 8, 2025 at 10:34 PM

Reposted

Sung Kim

@sungkim.bsky.social

Why do LLMs trained on over 90% English text perform so well in non-English languages?

The authors find that these models learn to share highly abstract grammatical concept representations, even across unrelated languages!

February 6, 2025 at 7:19 AM

Reposted

JD Shadel

@jdshadel.com

Fellow journos covering "AI": Please don't do their PR for them! "Virtual employees" is a harmful anthropomorphism in that it (a) is false; (b) confuses readers about the emerging technology and inaccurately lends human attributes like agency, accountability, etc.; and (c) harms humans in real jobs.

The Guardian @theguardian.com · Jan 6

‘Virtual employees’ could join workforce as soon as this year, OpenAI boss says

Sam Altman says tools that carry out jobs anonymously, known as AI agents, could transform business output Virtual employees could join workforces this year and transform how companies work, according to the chief executive of OpenAI. The first…

www.theguardian.com

January 6, 2025 at 5:07 PM

Reposted

Florent Daudens

@fdaudens.bsky.social

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

Original release: 8 models, 540K downloads. Just the beginning...

The community turned those open-weight models into +550 NEW models on @huggingface. Total downloads? 2.5M—nearly 5X the originals.

January 27, 2025 at 4:24 PM

Reposted

sakanaai.bsky.social

@sakanaai.bsky.social

We’re excited to introduce Transformer², a machine learning system that dynamically adjusts its weights for various tasks!

sakana.ai/transformer-...

Adaptation is a remarkable natural phenomenon, like how the octopus blends into its environment, or how the brain rewires itself after injury.

🧵 1/N

January 15, 2025 at 5:49 AM

tomwhi.bsky.social

@tomwhi.bsky.social

Yet more interesting research by sakana.ai

January 26, 2025 at 10:57 AM

Reposted

Sung Kim

@sungkim.bsky.social

ByteDance Doubao-1.5-pro

- Includes a "Deep Thinking" mode, surpassing O1-preview and O1 models on the AIME benchmark.

- Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks.

team.doubao.com/en/special/d...

January 24, 2025 at 9:43 PM
