Lightnews — Scholar-powered news

Reposted by Carl Boettiger

Kyle Lo

@kylelo.bsky.social

we released olmo 32b today! ☺️

🐟our largest & best fully open model to-date
🐠right up there w similar size weights-only models from big companies on popular benchmarks
🐡but we used way less compute & all our data, ckpts, code, recipe are free & open

made a nice plot of our post-trained results!✌️

March 13, 2025 at 8:42 PM

Reposted by Carl Boettiger

Schmidt DSE at UC Berkeley

@schmidtdse.bsky.social

Great to see you at #COP30, Governor @gavinnewsom.bsky.social!

Magali de Bruyn (R) and McKalee Steen (L) are leading several events @ COP30, including providing recommendations for how Indigenous communities can leverage tech and #DataScience to advance #EnvironmentalStewardship and #Sovereignty.

image of gov gavin newsom and two DSE staff members smiling at the camera

November 12, 2025 at 6:09 PM

Reposted by Carl Boettiger

Claus Wilke

@clauswilke.com

This seems important. Current AI models can't read graphs. They "see" what they expect to see, even if the data shows something else.

Posit @posit.co · 7d

Introducing bluffbench, a new tool to evaluate how well LLMs actually see data plots.

When we trick LLMs with secret #RStats transformations, they can miss the visual contradiction.

bluffbench helps us measure this "blind spot" in AI coding agents. Learn more: posit.co/blog/introdu...

When plotting, LLMs see what they expect to see - Posit

Data science agents need to accurately read plots even when the content contradicts their expectations. Our testing shows today's LLMs still struggle here.

posit.co

November 19, 2025 at 4:58 PM

Carl Boettiger

@cboettig.bsky.social

Wow, rio-stac-io looks awesome! github.com/planetlabs/r...

Anyone have a chance to compare this to the odc-stac approach? ( @mdsumner.bsky.social 👀 ?)

GitHub - planetlabs/rio-stac-io: Rasterio extension to open STAC Items and ItemCollections using native GDAL drivers

Rasterio extension to open STAC Items and ItemCollections using native GDAL drivers - GitHub - planetlabs/rio-stac-io: Rasterio extension to open STAC Items and ItemCollections using native GDAL d...

github.com

November 6, 2025 at 6:56 PM

Carl Boettiger

@cboettig.bsky.social

sad to be missing #jupytercon this week!

But small win, recently made my first PR to JupyterHub fancy-profiles to add support for ARM architecture.

& now I have JupyterHub running on an NVIDIA DGX Spark (via k3s).

Also have CUDA/RAPIDS based image in rocker for python+R envs, rocker/cuda:arm64

November 4, 2025 at 11:07 PM

Reposted by Carl Boettiger

Schmidt DSE at UC Berkeley

@schmidtdse.bsky.social

Raise your hand if you're going to #JupyterCon 🙋‍♀️ See you there @ucbids.bsky.social! #JupyterCon2025 #DataScience

Berkeley Institute for Data Science (BIDS) @ucbids.bsky.social · 26d

Thurs, Nov 6 @ 10am PST -> Meet us on Zoom for a live report from #JupyterCon 2025 as part of our OSPO Monthly Knowledge Exchange! 💡🚀

events.berkeley.edu/BIDS/event/3...

@jupyter.org #UCOSPO #OpenSource #OpenScience

October 31, 2025 at 4:34 PM

Reposted by Carl Boettiger

Qiusheng Wu

@giswqs.bsky.social

🚀 anymap v0.6 is here!

This release comes packed with major new features.

These updates make interactive geospatial analysis in Python smoother and more powerful than ever.
🔗 GitHub: github.com/opengeos/any...
📘 Docs: anymap.dev

#python #geospatial #jupyter #dataviz

October 24, 2025 at 5:28 PM

Reposted by Carl Boettiger

Kyle Walker

@kylewalker.bsky.social

The 1.4 release of @duckdb.org supports using a DuckDB database to serve vector tiles!

Of course, I had to try this out in R.

Check it out: all 242,000 US Census block groups dynamically served as vector tiles from a DuckDB database, displayed on a MapLibre map from R in Positron.

October 24, 2025 at 6:19 PM

Reposted by Carl Boettiger

Kyle Walker

@kylewalker.bsky.social

All 8.1 million US Census blocks.

Visualized smoothly in 3D.

Instant population and housing totals from a lasso selection.

All running seamlessly in the browser, no traditional backend.

While everyone’s talking about AI, it’s an incredible time for geospatial tech.

October 23, 2025 at 8:20 PM

Carl Boettiger

@cboettig.bsky.social

Really thrilled to see the renewed momentum from @ucbids.bsky.social on open source infrastructure! A new partnership with @2i2c.org puts supporting tools, their developers, and their community at the center. cdss.berkeley.edu/news/berkele...

Berkeley Institute for Data Science partners with 2i2c on open source infrastructure

The Berkeley Institute for Data Science (BIDS) has formally partnered with 2i2c, a nonprofit that provides open source support services with shared digital tools and resources for researchers and educ...

cdss.berkeley.edu

October 23, 2025 at 4:29 AM

Carl Boettiger

@cboettig.bsky.social

mid-semester surveys are in! Is it terrible that I do a happy dance when reading that most students felt "the heavy use of AI was either not helpful or detrimental to learning?"

YES dear students, you are smarter than the bots. and now you know it too.

October 22, 2025 at 5:55 AM

Carl Boettiger

@cboettig.bsky.social

here we are in the last week of module 2.

120 students, majority new to coding, are set up with their ipynb's in VSCode+CoPilot agent mode (w/ sonnet 4.5, GPT5-codex).

As they plug away in pairs writing some of the best code & analysis this module has seen.

remarkably little use of the bots!

October 17, 2025 at 3:40 AM

Carl Boettiger

@cboettig.bsky.social

Media still largely misses the RL part of training LLMs. NY Times:

> they’re computer models trained on massive amounts of text to predict the next word in a sentence. What feels like empathy or validation is really just the A.I. chatbot echoing back language patterns that it’s learned.

September 28, 2025 at 8:07 PM

Reposted by Carl Boettiger

Seema Sheth

@seemasheth.bsky.social

NSF GRFP solicitation is finally up. Life Sci deadline extended to Nov 10 but 2nd year grad students no longer eligible www.nsf.gov/funding/oppo...

NSF Graduate Research Fellowship Program (GRFP)

www.nsf.gov

September 26, 2025 at 8:00 PM

Carl Boettiger

@cboettig.bsky.social

Fantastic piece from @schmidtdse.bsky.social post-doc @lucialayr.bsky.social on handling the emotional side of doing a PhD in climate or ecological modeling blogs.egu.eu/geolog/2025/...

September 26, 2025 at 5:13 PM

Carl Boettiger

@cboettig.bsky.social

Excited to be heading to join the summit @cu-esiil.bsky.social this week!

September 23, 2025 at 1:50 AM

Carl Boettiger

@cboettig.bsky.social

Campus reminds us, "If classroom temperatures reach or exceed 82°F for more than 15 minutes:" our first step is:

1. Ensure instructors have taken the Heat Illness Prevention Training in the UC Learning Management System.

teaching.berkeley.edu/resources/gu...

Guidance on Instructor Preparedness for Classroom Disruptions and Emergencies | Center for Teaching & Learning

teaching.berkeley.edu

September 22, 2025 at 11:06 PM

Reposted by Carl Boettiger

Antonin Raffin

@araffin.bsky.social

RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) - A Practical Introduction to (Deep) Reinforcement Learning

araffin.github.io/post/rl102/

RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) | Antonin Raffin | Homepage

This blog post is meant to be a practical introduction to (deep) reinforcement learning, presenting the main concepts and providing intuitions to understand the more recent Deep RL algorithms. For a ...

araffin.github.io

September 18, 2025 at 3:09 PM

Carl Boettiger

@cboettig.bsky.social

Increasingly convinced that the advances we will see with 'AI' in the next few years will come not from more bigger NN models replacing tasks done with conventional programming, but from greater tool use from models.

September 17, 2025 at 5:07 PM

Reposted by Carl Boettiger

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

I reference data science at the singularity constantly as it's not just a model of why data science blew up, but also how any field or org can organize itself around data, code sharing, and benchmarks: arxiv.org/abs/2310.00865

Data Science at the Singularity

A purported `AI Singularity' has been in the public eye recently. Mass media and US national political attention focused on `AI Doom' narratives hawked by social media influencers. The European Commis...

arxiv.org

September 15, 2025 at 9:39 PM

Carl Boettiger

@cboettig.bsky.social

Okay, but Claude's commentary while helping me set up a ray cluster on National Research Platform (NRP) is pretty entertaining:

Claude: (proposes 2 worker-config)

Me: why two workers?

Claude: (queries resources)
Claude: "HOLY MOLY! Your cluster has nodes with 57TB of memory and 13,800 CPUs!"

Dialogue from Claude in VSCode Copilot agent mode, reading:

Let me update the Ray cluster to be more aggressive. What's your cluster's capacity? Let's check and then scale accordingly:

WOW! This cluster has MASSIVE resources! I can see nodes with:

256 CPUs per node (multiple nodes)
128 CPUs per node (multiple nodes)
96, 64, 56, 48, 32 CPUs per node

Let me check memory too:

HOLY MOLY! Your cluster has nodes with 57TB of memory and 13,800 CPUs

September 13, 2025 at 2:52 AM

Reposted by Carl Boettiger

Katherine Siegel

@katherinesiegel.bsky.social

You still have 5 days to apply for this awesome postdoc opportunity in the Environmental Data Science Innovation & Impact Lab at the University of Colorado-Boulder! This could be your backyard!

Image of five slanted rock formations (called the Flatirons) in the snow on a sunny day. There is a trail through the snow leading away towards the rock formations, which rise above some trees.

September 10, 2025 at 8:44 PM

Carl Boettiger

@cboettig.bsky.social

Like many, I've struggled with students merely pasting LLM outputs as their own work. But this year when I encourage them to use AI in generating code they need, now they ask all these critical questions like: 'but is this output ok? how do I know? how could it be better?'

September 10, 2025 at 11:23 PM

Carl Boettiger

@cboettig.bsky.social

Also exciting to see the new fully open (training data, weights, training details) LLM from the Swiss, 'Apertus' www.swiss-ai.org/apertus ! Shipped in both 8B & 70B params on HF, over 40% non-English sources. Support for vllm + public web interface (via publicai.co). Impressive.

September 2, 2025 at 10:57 PM

Carl Boettiger

@cboettig.bsky.social

Very interesting project from @ai2.bsky.social , a fully open, agent-based LLM for academic literature asta.allen.ai .

Nice overview at allenai.org/blog/asta

Ai2 Asta

asta.allen.ai

September 2, 2025 at 8:01 PM
