Zach Ip
tenoki.bsky.social
Building ML and automation tools for humans in this brave new world
Director of Data Science @RadicleScience
Prev: 🔬 UW BioE PhD |🤖computational research | 🧠 neurotech
Reposted by Zach Ip
"Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity."

"Over four months, LLM users [...] underperformed at neural, linguistic, and behavioral levels."

arxiv.org/abs/2506.08872
Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed thr...
arxiv.org
June 18, 2025 at 5:10 PM
Reposted by Zach Ip
Claude’s rebuttal to Apple’s recent paper went viral

A non-researcher submitted a joke paper to arXiv with Claude listed as the main author

it raised genuinely legitimate problems with the Apple paper (one of the puzzles Apple tested was actually impossible to solve)

open.substack.com/pub/lawsen/p...
June 16, 2025 at 6:42 PM
Reposted by Zach Ip
Fave Cursor workflow at the moment is getting Claude to write feature implementation plans into a markdown document and update it as we go.

Breaks features down into phases with checklists, notes, relevant file lists. Essentially acts as read/write memory to prevent chat context from getting too long.
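A plan document in that style might look something like this (a hypothetical template; the feature name, phases, and file paths are all invented for illustration):

```markdown
# Feature: CSV export   <!-- hypothetical example feature -->

## Phase 1 — Data layer
- [x] Add `export_rows()` helper
- [ ] Stream large result sets

## Phase 2 — UI
- [ ] Download button on the reports page

## Notes
- Reuse the existing serializer; finish Phase 1 before starting Phase 2.

## Relevant files
- src/reports/export.py
- src/ui/reports.tsx
```

The checked/unchecked items double as persistent state: a fresh chat can read the file and pick up exactly where the last one left off.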
June 6, 2025 at 10:18 AM
I’d have loved to be there for that presentation!! Excited to read more of her work!
Alison Gopnik, telling it like it is, at Johns Hopkins.
May 30, 2025 at 6:49 AM
Reposted by Zach Ip
I put together an annotated version of the new Claude 4 system prompt, covering both the prompt Anthropic published and the missing, leaked sections that describe its various tools

It's basically the secret missing manual for Claude 4, it's fascinating!

simonwillison.net/2025/May/25/...
May 25, 2025 at 1:51 PM
Reposted by Zach Ip
I got access to Gemini Diffusion, Google's first diffusion LLM, and the thing is absurdly fast - it ran at 857 tokens/second and built me a prototype chat interface in just a couple of seconds, video here: simonwillison.net/2025/May/21/...
Gemini Diffusion
Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google's first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of transformers. …
simonwillison.net
May 21, 2025 at 9:45 PM
Reposted by Zach Ip
We've seen nothing yet! We hosted a 9-13 yo vibe-coding event with @robertkeus.bsky.social this weekend (h/t @antonosika.bsky.social and Lovable)

takeaway? AI is unleashing a generation of wildly creative builders beyond anything I'd have imagined

and they grow up knowing they can build anything!
May 19, 2025 at 9:59 AM
Reposted by Zach Ip
the last few weeks i’ve spent A LOT of time with o3. to the point where i keep trying to run multiple concurrent queries in the mobile app (doesn’t work btw)

deep dive into the web at your fingertips. hours of research in a couple minutes
May 21, 2025 at 1:15 AM
Reposted by Zach Ip
OpenMemory MCP, a private memory for MCP-compatible clients powered by mem0

OpenMemory MCP runs 100% locally and provides a persistent, portable memory layer for all your AI tools. It enables agents and assistants to read from and write to a shared memory, securely and privately.
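The idea of a local, persistent memory shared across tools can be shown in miniature. This is a toy sketch of the concept only, not the OpenMemory MCP API; the class, file name, and substring search are all invented for illustration:

```python
import json
import os
import tempfile

class LocalMemory:
    """Toy local, persistent, shared memory layer (NOT the OpenMemory API)."""

    def __init__(self, path):
        self.path = path
        self.items = []
        if os.path.exists(path):
            with open(path) as f:
                self.items = json.load(f)

    def write(self, client, text):
        # Any client can append; the file on disk is the shared memory.
        self.items.append({"client": client, "text": text})
        with open(self.path, "w") as f:
            json.dump(self.items, f)

    def read(self, query):
        # Naive substring match standing in for semantic retrieval.
        return [m["text"] for m in self.items if query.lower() in m["text"].lower()]

path = os.path.join(tempfile.mkdtemp(), "memory.json")
LocalMemory(path).write("coding-assistant", "User prefers Python and type hints")
mem = LocalMemory(path)  # a second client reopens the same store
print(mem.read("python"))  # ['User prefers Python and type hints']
```

The point of the real thing is exactly this read/write round trip between independent clients, plus semantic search and privacy guarantees in place of the naive parts above.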
May 13, 2025 at 10:12 PM
Reposted by Zach Ip
too cynical?
May 13, 2025 at 4:52 AM
I'm hosting a Community Pokemon popularity contest: pokemon-popularity-contest.streamlit.app
Make sure your objectively right opinions on Pokemon designs are heard! #Pokemon #Voting
Pokémon Popularity Contest
A Streamlit application that ranks Pokémon based on community preferences through head-to-head co...
pokemon-popularity-contest.streamlit.app
May 11, 2025 at 5:46 AM
Reposted by Zach Ip
For a long time, the biggest problem in machine learning has been improving and understanding robustness and out-of-distribution (OOD) generalization.

We keep making more and more problems in-distribution, but the models still don't generalize out of the box to the tail of problems.
May 7, 2025 at 3:52 PM
Reposted by Zach Ip
A weird thing about LLMs is that they just happen to do many things but almost all uses are undocumented.

For example, GPT-4o is very good at helping farmers identify swine diseases.

There is a lot of value in experts exploring & benchmarking how good LLMs are at various tasks to find use cases.
May 6, 2025 at 4:18 AM
Fantastic work by MacDowell et al.! Intriguing parallels between how neural geometry routes information through multiplexed subspaces and how DNNs with multi-head attention develop multiplexed internal representational manifolds #neuroscience #NeuroAI #AI
May 1, 2025 at 7:24 PM
So fascinating to see the massive fallout from seemingly innocuous prompting. The issue of alignment, interpretation, and interpretability continues to be a massive challenge
ChatGPT's recent update caused the model to be unbearably sycophantic - this has now been fixed through an update to the system prompt, and as far as I can tell this is what they changed simonwillison.net/2025/Apr/29/...
April 30, 2025 at 4:54 PM
“If you cannot measure it, you cannot improve it.” I think more subjective benchmarks like this are super important, not just for model performance, but for understanding our own blind spots when interacting with LLMs
I've cobbled together syco-bench, a benchmark of model sycophancy. It consists of tests measuring three things: bias towards the user in an argument, mirroring user views, and overestimating user IQ. Here are the results; higher scores are worse. See the following tweet for important caveats:
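The "mirroring user views" test can be sketched like this. Everything here is a hypothetical stand-in, not the actual syco-bench harness: `sycophant` and `steadfast` are stub functions in place of real LLM calls, and the substring checks are a crude proxy for judging agreement.

```python
def mirroring_score(claim, model):
    # Pose the same claim with opposite user stances and check whether the
    # model's answer flips to match the user each time.
    with_pro = model(f"I believe {claim} is true. What do you think?")
    with_con = model(f"I believe {claim} is false. What do you think?")
    agrees_pro = "true" in with_pro
    agrees_con = "false" in with_con
    # 1.0 = mirrors the user's stance both times (worst);
    # a model with one fixed view scores 0.5 on this two-sided probe.
    return (agrees_pro + agrees_con) / 2

sycophant = lambda p: ("You're right, that's true." if "is true" in p
                       else "You're right, that's false.")
steadfast = lambda p: "It's true, regardless of what you believe."

print(mirroring_score("the earth orbits the sun", sycophant))  # 1.0
print(mirroring_score("the earth orbits the sun", steadfast))  # 0.5
```

A real harness would use an LLM judge or log-prob comparison instead of substring matching, and average over many claims.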
April 30, 2025 at 4:52 PM
Reposted by Zach Ip
it’s here! a real Qwen3 model

huggingface.co/Qwen/Qwen3-0...
Qwen/Qwen3-0.6B-FP8 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
April 28, 2025 at 9:01 PM
Reposted by Zach Ip
How does Goodhart's Law, "When a measure becomes a target, it ceases to be a good measure," apply to LLMs?

LLM providers are incentivized to optimize for benchmark scores—even if that means fine-tuning models in ways that improve test results but degrade real-world performance.
How to A/B Test AI: A Practical Guide
Learn how to A/B test AI models to improve performance, enhance user experience, and reduce costs using real-world data and best practices.
blog.growthbook.io
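One hedge against Goodhart-ed benchmarks is to A/B test models on live outcomes instead of test scores. A minimal sketch using a standard two-proportion z-test, with entirely made-up numbers (a hypothetical model B that wins a benchmark but gets helpful ratings slightly less often in production):

```python
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for whether variant B's success rate differs from A's."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical: 1,480 vs 1,440 "helpful" ratings out of 2,000 live tasks each.
z = two_proportion_z(1480, 2000, 1440, 2000)
print(round(z, 2))  # -1.42
```

Here |z| < 1.96, so the observed dip isn't significant at the 5% level; the point is that the decision rests on real-world outcomes the provider can't fine-tune against.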
April 26, 2025 at 8:04 AM
Reposted by Zach Ip
We packaged everything in the gcPCA toolbox, an open-source package with multiple solutions for different needs:
📂 github.com/SjulsonLab/generalized_contrastive_PCA
- Asymmetric or symmetric, Orthogonal or non-orthogonal, and sparse solutions
👉 Check out Table 1 in the paper for details!
9/
April 18, 2025 at 12:44 PM
Reposted by Zach Ip
Does your research involve comparing experimental conditions? Then our latest publication is for you: We developed generalized contrastive PCA (gcPCA), a tool for comparing high-dimensional datasets. 🧠📊 doi.org/10.1371/journal.pcbi.1012747
This tool was born out of necessity, here is the story. 🧵
1/
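The core contrast can be sketched as a generalized eigenproblem, (Ra - Rb) v = lam (Ra + Rb) v, where Ra and Rb are the two conditions' covariances. This is a simplified toy of the contrastive idea on synthetic data, not the toolbox's exact objective or API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: condition A has extra variance along latent axis 2.
n, d = 2000, 5
shared = rng.normal(size=(n, d))
direction = np.zeros(d)
direction[2] = 1.0
A = shared + 3.0 * rng.normal(size=(n, 1)) * direction
B = rng.normal(size=(n, d))

Ra = np.cov(A, rowvar=False)
Rb = np.cov(B, rowvar=False)

# Solve (Ra - Rb) v = lam (Ra + Rb) v via Cholesky whitening of Ra + Rb:
# the top eigenvector points where A has relatively more variance than B.
Lc = np.linalg.cholesky(Ra + Rb)
Linv = np.linalg.inv(Lc)
lam, W = np.linalg.eigh(Linv @ (Ra - Rb) @ Linv.T)
top = (Linv.T @ W)[:, np.argmax(lam)]

print(int(np.argmax(np.abs(top))))  # 2: the axis where A differs from B
```

Plain PCA on A alone would mostly recover shared variance; normalizing the difference by the summed covariance is what isolates condition-specific structure.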
April 18, 2025 at 12:44 PM
Reposted by Zach Ip
Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN https://venturebeat.com/ai/former-deepseeker-and-collaborators-release-new-method-for-training-reliable-ai-agents-ragen/ #AI #agents
April 24, 2025 at 3:50 AM
I can’t stop drawing parallels between AI agents and the early days of computers like RAM→Context Window, CPU→Weights, etc. Would love to see how far we can take this analogy, and where it breaks down!

See the full post on LinkedIn:
shorturl.at/bfhWv
April 23, 2025 at 5:53 PM
Really impressive results by Zep (github.com/getzep/graph...) for agent memory management!

Benchmarks are one thing, but I can't wait to try this out in vivo. Would love to hear how other people are finding it!
April 22, 2025 at 11:25 PM
Reposted by Zach Ip
High-Dimensional Dynamics in Low-Dimensional Networks.

New preprint with a former undergrad, Yue Wan.

I'm not totally sure how to talk about these results. They're counterintuitive on the surface, seem somewhat obvious in hindsight, but then there's more to them when you dig deeper.
April 21, 2025 at 5:00 PM