Julien Hurault
@hachej.bsky.social
Freelance Data | Weekly Data Eng. Newsletter 📨 juhache.substack.com - 4k+ readers

Indeed, it's not simple unfortunately... just a way to get started quickly atm.
December 13, 2024 at 7:55 PM
❤️❤️
December 13, 2024 at 4:47 PM
For an Iceberg catalog, it's hard to find a simpler setup...
December 13, 2024 at 9:07 AM
Nice! Can you orchestrate Lambda or ECS tasks that way?
December 13, 2024 at 8:16 AM
Just use PyIceberg with AWS Glue, probably the fastest way to get started (rough sketch below).
December 13, 2024 at 8:02 AM
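For context, a minimal sketch of what that PyIceberg + Glue route can look like. The catalog properties, database, table, and bucket names are placeholders, not anything from the thread, and it assumes AWS credentials and an existing Glue database are already set up.

# Minimal PyIceberg + AWS Glue sketch (names and paths are hypothetical).
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "glue",
    **{"type": "glue", "warehouse": "s3://my-bucket/warehouse"},  # hypothetical bucket
)

# List tables in an existing Glue database and read one into Arrow.
print(catalog.list_tables("analytics"))           # hypothetical database
table = catalog.load_table("analytics.events")    # hypothetical table
arrow_table = table.scan().to_arrow()
print(arrow_table.num_rows)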
In terms of volume of data exchanged over the marketplace? No idea.
December 6, 2024 at 9:21 PM
An SF sales rep told me that the marketplace was THE feature that helped a lot to convert.
December 6, 2024 at 8:53 PM
Saw it a lot in finance!
December 6, 2024 at 8:28 PM
Multiple pipelines will probably get tricky, no?
December 6, 2024 at 8:24 PM
Dlt + duck + evidence + @blef.fr's baby?
December 6, 2024 at 8:03 PM
For those lost in GCP terminology:
I wrote a summary of Iceberg integration in GCP a couple of weeks ago:
juhache.substack.com/p/gcp-and-ic...
GCP & Iceberg
Ju Data Engineering Weekly - Ep 77
juhache.substack.com
December 6, 2024 at 7:46 PM
30 million rows is just one month of data, right?
December 4, 2024 at 2:06 PM
Home
catalog.boringdata.io
December 2, 2024 at 11:43 AM
Do you support Swiss banks by any chance?
December 1, 2024 at 12:27 PM
" bash / make knowledge, a single instance SQL processing engine (DuckDB, CHDB or a few python scripts), a distributed file system, git and a developer workflow (CI/CD)" what s your best option to orchestate sql models in such setup?
November 30, 2024 at 6:48 PM
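For reference, the most bare-bones version of that setup I can picture is a small runner that executes the model files in dependency order against a single DuckDB database. A sketch under those assumptions (the file names and their order are made up, and each file is assumed to hold one statement like CREATE OR REPLACE TABLE ... AS SELECT ...):

# Bare-bones SQL model runner sketch: hypothetical model files, run in a
# hand-maintained topological order against one local DuckDB file.
import duckdb

MODELS = ["models/staging_orders.sql", "models/fct_orders.sql"]  # made-up names

con = duckdb.connect("warehouse.duckdb")
for path in MODELS:
    with open(path) as f:
        con.execute(f.read())   # one CREATE OR REPLACE ... AS SELECT per file
    print(f"ran {path}")
con.close()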
Super good thx!
"immutable workflow + atomic operation" 100%!
November 30, 2024 at 6:25 PM
S3 to Snowflake in the same AWS region is free, no?
November 30, 2024 at 8:49 AM
Niiiice, your view is doing a read_parquet(*) on their bucket? Or do you copy the data?
November 30, 2024 at 6:44 AM
Where is the data stored, then? In DuckDB itself?
So, if you have a 1GB dataset, does that mean you'll share a single .duckdb file containing the entire dataset? Or rather a view pointing to Parquet files: CREATE VIEW... AS read_parquet(*.parquet)?
November 29, 2024 at 3:13 PM
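For reference, the "view pointing to Parquet files" option from the question above could be as small as this. Paths are placeholders, and reading straight from S3 assumes DuckDB's httpfs extension and credentials are configured.

# Sketch of the "view over Parquet" option: the .duckdb file stays tiny and only
# stores the view definition; the data itself stays in the Parquet files.
import duckdb

con = duckdb.connect("catalog.duckdb")  # hypothetical file to distribute
con.execute("""
    CREATE OR REPLACE VIEW events AS
    SELECT * FROM read_parquet('s3://some-bucket/events/*.parquet')
""")
print(con.execute("SELECT count(*) FROM events").fetchone())
con.close()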
Do you see DuckDB as a format?

For me:
• Parquet = Standard storage format
• Iceberg = Standard metadata format
• DuckDB = One possible distribution vector
November 29, 2024 at 8:25 AM