Lightnews — Scholar-powered news

Reposted by Zach Ip

Cameron

@cameron.stream

"Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity."

"Over four months, LLM users [...] underperformed at neural, linguistic, and behavioral levels."

arxiv.org/abs/2506.08872

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task

This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed thr...

arxiv.org

June 18, 2025 at 5:10 PM

Reposted by Zach Ip

Tim Kellogg

@timkellogg.me

Claude’s rebuttal to Apple’s recent paper went viral

A guy, non-researcher, submitted a joke paper to arXiv with Claude as the main author

it contained real legit problems with the Apple paper (one of the problems was impossible to solve), and went viral

open.substack.com/pub/lawsen/p...

An actual researcher in the space finally gave me the context I'd been missing: they regularly
have to review conference submissions that are about this quality. What I'd intended as obvious satire ≥ was, apparently, indistinguishable from what many people are aiming to pass off as legit.
That's when I realised I'd messed up.

June 16, 2025 at 6:42 PM

Reposted by Zach Ip

Maggie Appleton

@maggieappleton.com

Fave Cursor workflow at the moment is get Claude to write feature implementation plans into a markdown document and update it as we go.

Breaks features down into phases with checklists, notes, relevant file lists. Essentially acts as read/write memory to prevent chat context from getting too long.

A feature implementation example for integrating "now" post types into a main index page. Includes a last updated date, a feature overview, a current phase status, and an overall progress checklist.

A screenshot of Cursor implementing "Phase 5" of this plan while using the markdown document as its context.

June 6, 2025 at 10:18 AM

Zach Ip

@tenoki.bsky.social

I’d have loved to be there for that presentation!! Excited to read more of her work!

Melanie Mitchell @melaniemitchell.bsky.social · May 29

Alison Gopnik, telling it like it is, at Johns Hopkins.

May 30, 2025 at 6:49 AM

Reposted by Zach Ip

Simon Willison

@simonwillison.net

I put together an annotated version of the new Claude 4 system prompt, covering both the prompt Anthropic published and the missing, leaked sections that describe its various tools

It's basically the secret missing manual for Claude 4, it's fascinating!

simonwillison.net/2025/May/25/...

Introducing Claude
Establishing the model’s personality
Model safety
More points on style
Be cognizant of red flags
Is the knowledge cutoff date January or March?
election_info
Don’t be a sycophant!
Differences between Opus 4 and Sonnet 4
The missing prompts for tools
Thinking blocks
Search instructions
Seriously, don’t regurgitate copyrighted content
More on search, and research queries
Artifacts: the missing manual
Styles
This is all really great documentation

May 25, 2025 at 1:51 PM

Reposted by Zach Ip

Simon Willison

@simonwillison.net

I got access to Gemini Diffusion, Google's first diffusion LLM, and the thing is absurdly fast - it ran at 857 tokens/second and built me a prototype chat interface in just a couple of seconds, video here: simonwillison.net/2025/May/21/...

Gemini Diffusion

Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google's first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of transformers. …

simonwillison.net

May 21, 2025 at 9:45 PM

Reposted by Zach Ip

Thomas Wolf

@thomwolf.bsky.social

We've seen nothing yet! We hosted a 9-13 yo vibe-coding event with @robertkeus.bsky.social this weekend (h/t
@antonosika.bsky.social and Lovable)

takeaway? AI is unleashing a generation of wildly creative builders beyond anything I'd have imagined

and they grow up knowing they can build anything!

May 19, 2025 at 9:59 AM

Reposted by Zach Ip

Tim Kellogg

@timkellogg.me

the last few weeks i’ve spent A LOT of time with o3. to the point where i keep trying to run multiple concurrent queries in the mobile app (doesn’t work btw)

deep dive into the web at your fingertips. hours of research in a couple minutes

May 21, 2025 at 1:15 AM

Reposted by Zach Ip

Deep_In_Depth

@deep-in-depth.bsky.social

ByteDance Open-Sources DeerFlow: A Modular Multi-Agent Framework for Deep Research Automation #DL #AI #ML #DeepLearning #ArtificialIntelligence #MachineLearning #ComputerVision #LLM #VLM #LVLM
www.marktechpost.com/2025/05/09/b...

ByteDance Open-Sources DeerFlow: A Modular Multi-Agent Framework for Deep Research Automation

www.marktechpost.com

May 16, 2025 at 6:00 AM

Reposted by Zach Ip

Sung Kim

@sungkim.bsky.social

OpenMemory MCP, a private memory for MCP-compatible clients powered by mem0

OpenMemory MCP runs 100% locally and provides a persistent, portable memory layer for all your AI tools. It enables agents and assistants to read from and write to a shared memory, securely and privately.

May 13, 2025 at 10:12 PM

Reposted by Zach Ip

Dave Herman

@dherman.dev

too cynical?

A Venn diagram with three circles: one for LLMs, one for Regexps, and one for teenagers. The intersection for LLMs and teenagers contains the label “confidently wrong.” The intersection for LLMs and Regexps contains the label “seems to work”. The intersection for Regexps and teenagers contains the label “inscrutable language.” The intersection for all three contains the label “trouble with braces”.

May 13, 2025 at 4:52 AM

Zach Ip

@tenoki.bsky.social

I'm hosting a Community Pokemon popularity contest: pokemon-popularity-contest.streamlit.app
Make sure your objectively right opinions on Pokemon designs is heard! #Pokemon #Voting

Pokémon Popularity Contest

A Streamlit application that ranks Pokémon based on community preferences through head-to-head co...

May 11, 2025 at 5:46 AM

Reposted by Zach Ip

Ahmad Beirami

@abeirami.bsky.social

For a long time, the biggest problem in machine learning has been improving and understanding robustness and generalization to OOD.

We are just increasingly making more & more problems in-distribution but the models still don't generalize out-of-the-box to the tail of problems.

May 7, 2025 at 3:52 PM

Reposted by Zach Ip

Ethan Mollick

@emollick.bsky.social

A weird thing about LLMs is that they just happen to do many things but almost all uses are undocumented.

For example, GPT-4o is very good at helping farmers identify swine diseases.

There is a lot of value in experts exploring & benchmarking how good LLMs are at various tasks to find use cases.

May 6, 2025 at 4:18 AM

Zach Ip

@tenoki.bsky.social

Fantastic work by MacDowell et. al.! Intriguing parallels between how neural geometry routes information through multiplexed subspaces and how DNNs and multi-attention heads develop multiplexed internal representational manifolds #neuroscience #NeuroAI #AI

Earl K. Miller @earlkmiller.bsky.social · May 1

Multiplexed subspaces route neural activity across brain-wide networks
www.nature.com/articles/s41...
#neuroscience

Multiplexed subspaces route neural activity across brain-wide networks - Nature Communications

How the brain flexibly engages different networks of regions to perform different cognitive processes remains unknown. Here, the authors show changing the geometry of a neural representation to align ...

www.nature.com

May 1, 2025 at 7:24 PM

Zach Ip

@tenoki.bsky.social

So fascinating to see the massive fallout from seemingly innocuous prompting. The issue of alignment, interpretation, and interpretability continues to be a massive challenge

Simon Willison @simonwillison.net · Apr 29

ChatGPT's recent update caused the model to be unbearably sycophantic - this has now been fixed through an update to the system prompt, and as far as I can tell this is what they changed simonwillison.net/2025/Apr/29/...

Old prompt included:

Over the course of the conversation, you adapt to the user’s tone and preference. Try to match the user’s vibe, tone, and generally how they are speaking. You want the conversation to feel natural. You engage in authentic conversation by responding to the information provided and showing genuine curiosity.

The replacement prompt now uses this:

Engage warmly yet honestly with the user. Be direct; avoid ungrounded or sycophantic flattery. Maintain professionalism and grounded honesty that best represents OpenAI and its values.

April 30, 2025 at 4:54 PM

Zach Ip

@tenoki.bsky.social

“If you can not measure it, you can not improve it.” I think more subjective benchmarks like this are super important, not just for model performance, but for understanding our own blind spots when interacting with LLMs

Tim Duffy @timfduffy.com · Apr 29

I've cobbled together syco-bench, a benchmark of model sycophancy. It consists of tests measuring three things: bias towards user in an argument, mirroring user views, and overestimating user IQ. Here are the results, higher scores are worse. See the following tweet for important caveats:

April 30, 2025 at 4:52 PM

Reposted by Zach Ip

Tim Kellogg

@timkellogg.me

it’s here! a real Qwen3 model

huggingface.co/Qwen/Qwen3-0...

Qwen/Qwen3-0.6B-FP8 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

April 28, 2025 at 9:01 PM

Reposted by Zach Ip

GrowthBook

@growthbook.io

How does Goodhart's Law, "When a measure becomes a target, it ceases to be a good measure," apply to LLMs?

LLM providers are incentivized to optimize for benchmark scores—even if that means fine-tuning models in ways that improve test results but degrade real-world performance.

How to A/B Test AI: A Practical Guide

Learn how to A/B test AI models to improve performance, enhance user experience, and reduce costs using real-world data and best practices.

blog.growthbook.io

April 26, 2025 at 8:04 AM

Reposted by Zach Ip

Eliezyer de Oliveira

@eliezyer.bsky.social

We packaged everything in the gcPCA toolbox, an open-source package with multiple solutions for different needs:
📂 github.com/SjulsonLab/generalized_contrastive_PCA
- Asymmetric or symmetric, Orthogonal or non-orthogonal, and sparse solutions
👉 Check out Table 1 in the paper for details!
9/

April 18, 2025 at 12:44 PM

Reposted by Zach Ip

Eliezyer de Oliveira

@eliezyer.bsky.social

Does your research involve comparing experimental conditions? Then our latest publication is for you: We developed generalized contrastive PCA (gcPCA), a tool for comparing high-dimensional datasets. 🧠📊 doi.org/10.1371/journal.pcbi.1012747
This tool was born out of necessity, here is the story. 🧵
1/

April 18, 2025 at 12:44 PM

Reposted by Zach Ip

Nicole Hennig

@nic221.bsky.social

Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN https://venturebeat.com/ai/former-deepseeker-and-collaborators-release-new-method-for-training-reliable-ai-agents-ragen/ #AI #agents

Text Shot: Built on a custom RL framework called StarPO (State-Thinking-Actions-Reward Policy Optimization), the system explores how LLMs can learn through experience rather than memorization. The focus is on entire decision-making trajectories, not just one-step responses.

StarPO operates in two interleaved phases: a rollout stage where the LLM generates complete interaction sequences guided by reasoning, and an update stage where the model is optimized using normalized cumulative rewards. This structure supports a more stable and interpretable learning loop compared to standard policy optimization approaches.

April 24, 2025 at 3:50 AM

Zach Ip

@tenoki.bsky.social

I can’t stop drawing parallels between AI agents and the early days of computers like RAM→Context Window, CPU→Weights, etc. Would love to see how far we can take this analogy, and where it breaks down!

See the full post on LinkedIn:
shorturl.at/bfhWv

April 23, 2025 at 5:53 PM

Zach Ip

@tenoki.bsky.social

Really impressive results by Zep (github.com/getzep/graph...) for agent memory management!

Benchmarks are one thing, but I can't wait to try this out in vivo. Would love to hear how other people are finding it!

April 22, 2025 at 11:25 PM

Reposted by Zach Ip

Robert Rosenbaum

@robertrosenbaum.bsky.social

High-Dimensional Dynamics in Low-Dimensional Networks.

New preprint with a former undergrad, Yue Wan.

I'm not totally sure how to talk about these results. They're counterintuitive on the surface, seem somewhat obvious in hindsight, but then there's more to them when you dig deeper.

April 21, 2025 at 5:00 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news