mdsumner
@mdsumner.bsky.social
290 followers 380 following 520 posts
maxing the pixels to glow meshes, R, Python, GDAL, virtual Zarr, coordinate systems, southern ocean research
Reposted by mdsumner
mdsumner.bsky.social
how's your day going

#rstats

paste0(sample(c(letters, LETTERS, 0:9,
strsplit("!@#$%^&*()_+=-~", "")[[1]]), 56),
collapse = "")
Reposted by mdsumner
mikemahoney218.com
my recent focus has been on a lot of point time series data -- I _think_ someone told me zarr handles this use case well enough, but honestly I've only seriously played with parquet so far
Reposted by mdsumner
lvulis.bsky.social
Oh huh. Like trajectories? I'd keep in mind what you're doing up/down as well and how that fits into the workflow. So what packages are loading the parquets etc, because that ends up being a huge part of the decision-making for me
mdsumner.bsky.social
Oh great point! That is actually exactly where tabular spatial breaks down. Trajectory data is an excellent crossover that doesn't quite fit either world: it's mostly tabular but ordered within groups (and is it points, segments, line strings, or all of the above? Yes).

Argo has long sat in this uneasy place
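
A minimal sketch of the "ordered within groups" point, using only the Python standard library. The ids, the column layout, and the helper name `to_tracks` are made up for illustration; nothing here comes from Argo or from any particular package.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical point table: (track_id, time, lon, lat), rows in arbitrary order.
rows = [
    ("f2", 1, 150.0, -60.0),
    ("f1", 2, 141.0, -51.0),
    ("f1", 1, 140.0, -50.0),
    ("f2", 2, 151.0, -61.0),
    ("f1", 3, 142.0, -52.0),
]

def to_tracks(rows):
    """Group rows by id, then sort by time within each group."""
    groups = defaultdict(list)
    for track_id, t, lon, lat in rows:
        groups[track_id].append((t, lon, lat))
    return {k: sorted(pts, key=itemgetter(0)) for k, pts in groups.items()}

tracks = to_tracks(rows)

# One line string per group: the time-ordered coordinate sequence.
linestrings = {k: [(lon, lat) for _, lon, lat in pts] for k, pts in tracks.items()}

# Segments: consecutive coordinate pairs within each group.
segments = {k: list(zip(line[:-1], line[1:])) for k, line in linestrings.items()}
```

The flat table only becomes a trajectory once the grouping and the within-group order are applied, which is the part a plain table does not carry on its own.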
mdsumner.bsky.social
This is not parquet, but pretty essential to see where xarray and Zarr are going (raster yes, but not just raster)

youtu.be/I-NHCuLhRjY?...
mdsumner.bsky.social
It won't be long until Icechunk is used to virtualize Parquet stores, referencing compressed chunks by byte ranges, and stored in their mysterious Rust Zarrtabase
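
A rough sketch of what "referencing compressed chunks by byte ranges" means, standard library only. This is not the Icechunk or kerchunk API; the manifest layout, the key names, and `read_chunk` are hypothetical, and zlib stands in for whatever codec a real store uses.

```python
import os
import tempfile
import zlib

# Write three independently compressed chunks back to back in one file,
# standing in for compressed chunks inside a larger container file.
chunks = [b"a" * 100, b"b" * 200, b"c" * 300]
path = os.path.join(tempfile.mkdtemp(), "store.bin")

manifest = {}  # chunk key -> (path, offset, length); key names are made up
offset = 0
with open(path, "wb") as f:
    for i, raw in enumerate(chunks):
        blob = zlib.compress(raw)
        f.write(blob)
        manifest[f"var/{i}"] = (path, offset, len(blob))
        offset += len(blob)

def read_chunk(key):
    """Read one chunk by byte range and decompress it, skipping the rest."""
    src, start, length = manifest[key]
    with open(src, "rb") as f:
        f.seek(start)
        return zlib.decompress(f.read(length))

middle = read_chunk("var/1")
```

The original file is never rewritten; the "virtual" store is just the manifest of references into it.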
mdsumner.bsky.social
Neither is quite the superset, but both can use the other, and this is not a hierarchy but rather a thicket of experiments
mdsumner.bsky.social
Also, you already know the difference well imo because it's manifest in R: arrays (missing "labels") vs data frames

But it's in those crossover areas where they bleed into one another (nested parquet like json arrays, compressed partition chunks like in arrow) that the blur is.
mdsumner.bsky.social
And this one:

earthmover.io/blog/what-is...

(The Earthmover stance that "labelled arrays are inefficient as fully materialized tables" is a bit disingenuous ... obvious actually, I wonder if they will review it better now that xarray has coordinate indexes)
Fundamentals: What Is Zarr? A Cloud-Native Format for Tensor Data - Earthmover
What Zarr is, and how it enables fast, scalable access to multidimensional array data in the cloud.
earthmover.io
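
A toy illustration of the "fully materialized table" point, standard library only. The grid sizes and values are made up, and this is plain Python lists, not xarray or Zarr.

```python
from itertools import product

# A tiny labelled array: coordinates stored once per axis, values in a dense grid.
lon = [140.0, 141.0, 142.0, 143.0]
lat = [-50.0, -51.0, -52.0]
values = [[10 * j + i for i in range(len(lon))] for j in range(len(lat))]

# Array form: one label per axis position plus the grid values.
array_numbers = len(lon) + len(lat) + len(lon) * len(lat)

# Fully materialized table: every cell repeats its own lon and lat.
table = [
    (lon[i], lat[j], values[j][i])
    for j, i in product(range(len(lat)), range(len(lon)))
]
table_numbers = 3 * len(table)
```

Here the array form holds 19 numbers and the table holds 36, and the gap widens with every extra dimension, since each new axis adds another repeated label column to the table.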
mdsumner.bsky.social
"hit", I meant "gut"
Reposted by mdsumner
mikemahoney218.com
hey spatial folks: if anyone has a great "parquet versus zarr" write-up, let me know. I'm never going to write a serializer but I'm really interested in the performance impacts of format for cloud downloads 👀

#geosky
mdsumner.bsky.social
Have you seen kerchunk store virtual Zarr in Parquet? (Also, intake models netcdf by nesting array metadata in Parquet)

(Comparing Zarr to Parquet is a category error imo, like vector cubes: there's no cube, it's just a relation to not duplicate structures, stored via arrays rather than latent joins)
mdsumner.bsky.social
They're real different... don't do the same things ... Parquet is a table, until it's a nested table (then it's more like nested json, which is where the array folks pounce)

The array folks think Zarr will take over all table tech... (I think arrow will hit the Zarr core and become its foundation)
Reposted by mdsumner
fubits.dev
ah yeah, absolutely adding `zarr` to the watch-list! bsky.app/profile/mike...
Reposted by mdsumner
olliniemitalo.bsky.social
Zarr in the Browser — Fast, Flexible, and Surprisingly Powerful for Big Geo Data by Tobias Ferreira, Jul 1, 2025 medium.com/p/zarr-in-th...
Reposted by mdsumner
obrien.page
(the read-write end -- the server is well beyond this and awesome by itself)
Reposted by mdsumner
obrien.page
I rescind this question now that I'm getting further into the package 🫣
mdsumner.bsky.social
star wars won't work
Reposted by mdsumner
johnrossmd.bsky.social
Pleased to report that this 88-year-old fella still has the juice
Shadow Ticket, a new novel by Thomas Pynchon
mdsumner.bsky.social
#python what dataframe framework will hand over a column without it always being a (useless) Series?

gawd, it feels like being stuck in {zoo}
Reposted by mdsumner
rsimmon.bsky.social
Any chance you could point to an example?
mdsumner.bsky.social
scratching the surface here: gist.github.com/mdsumner/f25...

the real power I find is that it's so modular: we can fix a huge list of problems and read from URLs with vrt:// and /vsicurl/, and warping can burn pixels into any grid we want, so the combinations are hugely varied
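
A small sketch of how such a chained source string can be put together, as plain string handling with no GDAL import. The helper `vrt_dsn` and the example.com URL are hypothetical. The option names `a_srs` and `bands` are taken from GDAL's VRT connection syntax as I understand it, so check them against the docs for your GDAL version. A source URL that already carries its own query string would need extra care and is not handled here.

```python
def vrt_dsn(url, **options):
    """Build a GDAL 'vrt://' connection string around a /vsicurl/ path."""
    dsn = "vrt:///vsicurl/" + url
    if options:
        # Options are appended as key=value pairs joined with '&'.
        dsn += "?" + "&".join(f"{key}={value}" for key, value in options.items())
    return dsn

# Placeholder URL; assign a CRS and pick the first band on the fly.
dsn = vrt_dsn("https://example.com/data.tif", a_srs="EPSG:4326", bands="1")
```

The resulting string would then be handed to whatever opens the dataset (GDAL itself, or a package built on it), which is where the fixing and the warping actually happen.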