Julien Hurault
@hachej.bsky.social
Freelance Data | Weekly Data Eng. Newsletter 📨 juhache.substack.com - 4k+ readers

Indeed, it's not simple unfortunately... just a way to get started quickly atm.
December 13, 2024 at 7:55 PM
❤️❤️
December 13, 2024 at 4:47 PM
For an Iceberg catalog, it's hard to find a simpler setup...
December 13, 2024 at 9:07 AM
Nice! Can you orchestrate Lambda or ECS tasks that way?
December 13, 2024 at 8:16 AM
Just use PyIceberg with AWS Glue, probably the fastest way to get started (rough sketch below).
December 13, 2024 at 8:02 AM
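For context, a minimal sketch of what that PyIceberg + Glue route can look like. The catalog properties, database, table, and bucket names are placeholders, not anything from the thread, and it assumes AWS credentials and an existing Glue database are already set up.

# Minimal PyIceberg + AWS Glue sketch (names and paths are hypothetical).
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "glue",
    **{"type": "glue", "warehouse": "s3://my-bucket/warehouse"},  # hypothetical bucket
)

# List tables in an existing Glue database and read one into Arrow.
print(catalog.list_tables("analytics"))           # hypothetical database
table = catalog.load_table("analytics.events")    # hypothetical table
arrow_table = table.scan().to_arrow()
print(arrow_table.num_rows)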
In terms of volume of data exchanged over the marketplace? No idea.
December 6, 2024 at 9:21 PM
An SF sales rep told me that the marketplace was THE feature that helped a lot to convert.
December 6, 2024 at 8:53 PM
Saw it a lot in finance!
December 6, 2024 at 8:28 PM
Multiple pipelines will probably get tricky, no?
December 6, 2024 at 8:24 PM
Dlt + duck + evidence + @blef.fr's baby?
December 6, 2024 at 8:03 PM
For those lost in GCP terminology:
I wrote a summary of Iceberg integration in GCP a couple of weeks ago:
juhache.substack.com/p/gcp-and-ic...
GCP & Iceberg
Ju Data Engineering Weekly - Ep 77
juhache.substack.com
December 6, 2024 at 7:46 PM
30 million rows is just one month of data, right?
December 4, 2024 at 2:06 PM
Home
catalog.boringdata.io
December 2, 2024 at 11:43 AM
Do you support Swiss banks by any chance?
December 1, 2024 at 12:27 PM
" bash / make knowledge, a single instance SQL processing engine (DuckDB, CHDB or a few python scripts), a distributed file system, git and a developer workflow (CI/CD)" what s your best option to orchestate sql models in such setup?
November 30, 2024 at 6:48 PM
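For reference, the most bare-bones version of that setup I can picture is a small runner that executes the model files in dependency order against a single DuckDB database. A sketch under those assumptions (the file names and their order are made up, and each file is assumed to hold one statement like CREATE OR REPLACE TABLE ... AS SELECT ...):

# Bare-bones SQL model runner sketch: hypothetical model files, run in a
# hand-maintained topological order against one local DuckDB file.
import duckdb

MODELS = ["models/staging_orders.sql", "models/fct_orders.sql"]  # made-up names

con = duckdb.connect("warehouse.duckdb")
for path in MODELS:
    with open(path) as f:
        con.execute(f.read())   # one CREATE OR REPLACE ... AS SELECT per file
    print(f"ran {path}")
con.close()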
Super good thx!
"immutable workflow + atomic operation" 100%!
November 30, 2024 at 6:25 PM
S3 to Snowflake in the same AWS region is free, no?
November 30, 2024 at 8:49 AM
Niiiice, your view is doing a read_parquet(*) on their bucket? Or do you copy the data?
November 30, 2024 at 6:44 AM
Where is the data stored, then? In DuckDB itself?
So, if you have a 1GB dataset, does that mean you'll share a single .duckdb file containing the entire dataset? Or rather a view pointing to Parquet files: CREATE VIEW... AS read_parquet(*.parquet)?
November 29, 2024 at 3:13 PM
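For reference, the "view pointing to Parquet files" option from the question above could be as small as this. Paths are placeholders, and reading straight from S3 assumes DuckDB's httpfs extension and credentials are configured.

# Sketch of the "view over Parquet" option: the .duckdb file stays tiny and only
# stores the view definition; the data itself stays in the Parquet files.
import duckdb

con = duckdb.connect("catalog.duckdb")  # hypothetical file to distribute
con.execute("""
    CREATE OR REPLACE VIEW events AS
    SELECT * FROM read_parquet('s3://some-bucket/events/*.parquet')
""")
print(con.execute("SELECT count(*) FROM events").fetchone())
con.close()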
Do you see DuckDB as a format?

For me:
• Parquet = Standard storage format
• Iceberg = Standard metadata format
• DuckDB = One possible distribution vector
November 29, 2024 at 8:25 AM