Carl Boettiger
cboettig.bsky.social
Ecology, theory, computers. https://carlboettiger.info
Reposted by Carl Boettiger
we released olmo 32b today! ☺️

🐟our largest & best fully open model to-date
🐠right up there w similar size weights-only models from big companies on popular benchmarks
🐡but we used way less compute & all our data, ckpts, code, recipe are free & open

made a nice plot of our post-trained results!✌️
March 13, 2025 at 8:42 PM
Reposted by Carl Boettiger
Great to see you at #COP30, Governor @gavinnewsom.bsky.social!

Magali de Bruyn (R) and McKalee Steen (L) are leading several events @ COP30, including providing recommendations for how Indigenous communities can leverage tech and #DataScience to advance #EnvironmentalStewardship and #Sovereignty.
November 12, 2025 at 6:09 PM
Reposted by Carl Boettiger
This seems important. Current AI models can't read graphs. They "see" what they expect to see, even if the data shows something else.
Introducing bluffbench, a new tool to evaluate how well LLMs actually see data plots.

When we trick LLMs with secret #RStats transformations, they can miss the visual contradiction.

bluffbench helps us measure this "blind spot" in AI coding agents. Learn more: posit.co/blog/introdu...
When plotting, LLMs see what they expect to see - Posit
Data science agents need to accurately read plots even when the content contradicts their expectations. Our testing shows today's LLMs still struggle here.
posit.co
November 19, 2025 at 4:58 PM
Wow, rio-stac-io looks awesome! github.com/planetlabs/r...

Anyone have a chance to compare this to the odc-stac approach? ( @mdsumner.bsky.social 👀 ?)
GitHub - planetlabs/rio-stac-io: Rasterio extension to open STAC Items and ItemCollections using native GDAL drivers
github.com
November 6, 2025 at 6:56 PM
sad to be missing #jupytercon this week!

But small win, recently made my first PR to JupyterHub fancy-profiles to add support for ARM architecture.

& now I have JupyterHub running on an NVIDIA DGX Spark (via k3s).

Also have CUDA/RAPIDS based image in rocker for python+R envs, rocker/cuda:arm64
November 4, 2025 at 11:07 PM
Reposted by Carl Boettiger
Raise your hand if you're going to #JupyterCon 🙋‍♀️ See you there @ucbids.bsky.social! #JupyterCon2025 #DataScience
Thurs, Nov 6 @ 10am PST -> Meet us on Zoom for a live report from #JupyterCon 2025 as part of our OSPO Monthly Knowledge Exchange! 💡🚀

events.berkeley.edu/BIDS/event/3...

@jupyter.org #UCOSPO #OpenSource #OpenScience
October 31, 2025 at 4:34 PM
Reposted by Carl Boettiger
🚀 anymap v0.6 is here!

This release comes packed with major new features.

These updates make interactive geospatial analysis in Python smoother and more powerful than ever.
🔗 GitHub: github.com/opengeos/any...
📘 Docs: anymap.dev

#python #geospatial #jupyter #dataviz
October 24, 2025 at 5:28 PM
Reposted by Carl Boettiger
The 1.4 release of @duckdb.org supports using a DuckDB database to serve vector tiles!

Of course, I had to try this out in R.

Check it out: all 242,000 US Census block groups dynamically served as vector tiles from a DuckDB database, displayed on a MapLibre map from R in Positron.
October 24, 2025 at 6:19 PM
Reposted by Carl Boettiger
All 8.1 million US Census blocks.

Visualized smoothly in 3D.

Instant population and housing totals from a lasso selection.

All running seamlessly in the browser, no traditional backend.

While everyone’s talking about AI, it’s an incredible time for geospatial tech.
October 23, 2025 at 8:20 PM
Really thrilled to see the renewed momentum from @ucbids.bsky.social on open source infrastructure! A new partnership with @2i2c.org puts supporting tools, their developers, and their community at the center. cdss.berkeley.edu/news/berkele...
Berkeley Institute for Data Science partners with 2i2c on open source infrastructure
The Berkeley Institute for Data Science (BIDS) has formally partnered with 2i2c, a nonprofit that provides open source support services with shared digital tools and resources for researchers and educ...
cdss.berkeley.edu
October 23, 2025 at 4:29 AM
mid-semester surveys are in! Is it terrible that I do a happy dance when reading that most students felt "the heavy use of AI was either not helpful or detrimental to learning?"

YES dear students, you are smarter than the bots. and now you know it too.
October 22, 2025 at 5:55 AM
here we are in the last week of module 2.

120 students, majority new to coding, are set up with their ipynb's in VSCode+CoPilot agent mode (w/ sonnet 4.5, GPT5-codex).

They plug away in pairs, writing some of the best code & analysis this module has seen.

remarkably little use of the bots!
October 17, 2025 at 3:40 AM
Media still largely misses the RL part of training LLMs. NY Times:

> they’re computer models trained on massive amounts of text to predict the next word in a sentence. What feels like empathy or validation is really just the A.I. chatbot echoing back language patterns that it’s learned.
September 28, 2025 at 8:07 PM
Reposted by Carl Boettiger
NSF GRFP solicitation is finally up. Life Sci deadline extended to Nov 10 but 2nd year grad students no longer eligible www.nsf.gov/funding/oppo...
NSF Graduate Research Fellowship Program (GRFP)
www.nsf.gov
September 26, 2025 at 8:00 PM
Fantastic piece from @schmidtdse.bsky.social post-doc @lucialayr.bsky.social on handling the emotional side of doing a PhD in climate or ecological modeling blogs.egu.eu/geolog/2025/...
September 26, 2025 at 5:13 PM
Excited to be heading to join the summit @cu-esiil.bsky.social this week!
September 23, 2025 at 1:50 AM
Campus reminds us, "If classroom temperatures reach or exceed 82°F for more than 15 minutes:" our first step is:

1. Ensure instructors have taken the Heat Illness Prevention Training in the UC Learning Management System.

teaching.berkeley.edu/resources/gu...
Guidance on Instructor Preparedness for Classroom Disruptions and Emergencies | Center for Teaching & Learning
teaching.berkeley.edu
September 22, 2025 at 11:06 PM
Reposted by Carl Boettiger
RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) - A Practical Introduction to (Deep) Reinforcement Learning

araffin.github.io/post/rl102/
RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) | Antonin Raffin | Homepage
This blog post is meant to be a practical introduction to (deep) reinforcement learning, presenting the main concepts and providing intuitions to understand the more recent Deep RL algorithms. For a ...
araffin.github.io
September 18, 2025 at 3:09 PM
Increasingly convinced that the advances we will see with 'AI' in the next few years will come not from ever-bigger NN models replacing tasks done with conventional programming, but from greater tool use by models.
September 17, 2025 at 5:07 PM
Reposted by Carl Boettiger
I reference data science at the singularity constantly as it's not just a model of why data science blew up, but also how any field or org can organize itself around data, code sharing, and benchmarks: arxiv.org/abs/2310.00865
Data Science at the Singularity
A purported 'AI Singularity' has been in the public eye recently. Mass media and US national political attention focused on 'AI Doom' narratives hawked by social media influencers. The European Commis...
arxiv.org
September 15, 2025 at 9:39 PM
Okay, but Claude's commentary while helping me set up a ray cluster on National Research Platform (NRP) is pretty entertaining:

Claude: (proposes 2-worker config)

Me: why two workers?

Claude: (queries resources)
Claude: "HOLY MOLY! Your cluster has nodes with 57TB of memory and 13,800 CPUs!"
September 13, 2025 at 2:52 AM
Reposted by Carl Boettiger
You still have 5 days to apply for this awesome postdoc opportunity in the Environmental Data Science Innovation & Impact Lab at the University of Colorado-Boulder! This could be your backyard!
September 10, 2025 at 8:44 PM
Like many, I've struggled with students merely pasting LLM outputs as their own work. But this year, when I encourage them to use AI to generate the code they need, they ask critical questions like: 'but is this output ok? how do I know? how could it be better?'
September 10, 2025 at 11:23 PM
Also exciting to see the new fully open (training data, weights, training details) LLM from the Swiss, 'Apertus' www.swiss-ai.org/apertus ! Shipped in both 8B & 70B params on HF, over 40% non-English sources. Support for vllm + public web interface (via publicai.co). Impressive.
September 2, 2025 at 10:57 PM
Very interesting project from @ai2.bsky.social , a fully open, agent-based LLM for academic literature asta.allen.ai .

Nice overview at allenai.org/blog/asta
Ai2 Asta
asta.allen.ai
September 2, 2025 at 8:01 PM