Contributor: Apache Iceberg | Apache Hudi
Distributed Computing | Technical Author (O’Reilly, Packt)
Prev📍: DevRel @Onehouse.ai, Dremio, Engineering @Qlik, @OTIS Elevators
Book 📕: https://a.co/d/fUDs7G6
TBH, I have been thinking about this for quite some time.
Most of the time, in conversations with engineers exploring these formats, the same questions keep coming up.
We need to ask: why don’t ODBC & JDBC fit in today’s analytical world?
These protocols were designed primarily for row-based workloads.
What about columnar, Arrow-based data?
dipankar-tnt.medium.com/what-is-apac...
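To make the row-vs-column contrast concrete, here’s a minimal Python sketch (assuming pyarrow is installed; the data and names are illustrative): row protocols force a transposition step before a columnar engine can vectorize, while Arrow keeps batches columnar end to end.

```python
import pyarrow as pa

# Row-oriented protocols (ODBC/JDBC) hand results back row by row,
# so a columnar engine must transpose them before it can vectorize.
rows = [(1, "a"), (2, "b"), (3, "c")]
ids, labels = zip(*rows)  # the transposition step row protocols force on you

# Arrow keeps the data columnar end to end; record batches can be
# shipped over the wire as-is (e.g. via Arrow Flight).
batch = pa.record_batch(
    [pa.array(list(ids)), pa.array(list(labels))],
    names=["id", "label"],
)
print(batch.num_rows, batch.schema)
```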
www.onehouse.ai/blog/acid-tr...
Community Over Code North America 2025 has been announced!
Where: Minneapolis, MN (USA)
When: September 11-14, 2025
Read more about #CommunityOverCode --> https://buff.ly/4jQx36S
In this blog, I go into the fundamentals of concurrency control and explore why it is essential for lakehouses, covering OCC, MVCC & non-blocking concurrency control.
hudi.apache.org/blog/2025/01...
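For a feel of how OCC works in practice, here’s a toy Python sketch (all names are illustrative, not any table format’s real API): writers work against a snapshot and validate for conflicts only at commit time.

```python
# Optimistic concurrency control (OCC), lakehouse-style:
# no locks while writing; a version check at commit time.
class Table:
    def __init__(self):
        self.version = 0
        self.files = []

    def commit(self, base_version, new_files):
        # Conflict check: another writer committed since we started.
        if self.version != base_version:
            raise RuntimeError("Concurrent commit detected; retry on latest snapshot")
        self.files.extend(new_files)
        self.version += 1

table = Table()
snapshot = table.version                     # writer reads the snapshot
table.commit(snapshot, ["f1.parquet"])       # succeeds: no concurrent commit
try:
    table.commit(snapshot, ["f2.parquet"])   # fails: version has moved on
except RuntimeError as e:
    print(e)
```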
Querying huge volumes of data from storage demands optimized query speed.
Your queries may be fast today, but they might not stay fast over time!
Read: www.onehouse.ai/blog/what-is...
Lakehouse means only one thing: data lakes needed the “transactional layer” on top of Parquet for running database-style workloads (both transactional & analytical).
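As a rough illustration of what that transactional layer adds, here’s a toy commit log over immutable Parquet files (a hypothetical structure; Hudi, Iceberg & Delta each differ in the details): the log, not the files, defines table state, and an atomic rename makes each commit all-or-nothing.

```python
import json, os, tempfile

log_dir = tempfile.mkdtemp()

def commit(version, added_files):
    # Write the commit entry to a temp file, then atomically rename it.
    # Readers never observe a half-written commit.
    entry = {"version": version, "add": added_files}
    tmp = os.path.join(log_dir, f".{version}.json.tmp")
    final = os.path.join(log_dir, f"{version}.json")
    with open(tmp, "w") as f:
        json.dump(entry, f)
    os.rename(tmp, final)

commit(0, ["part-000.parquet"])
commit(1, ["part-001.parquet"])
print(sorted(os.listdir(log_dir)))  # the log is the source of truth
```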
www.dataengineeringw...
This is a solid example of the metadata translation capability for open table formats like Iceberg, Hudi & Delta.
Fabric users can work with Iceberg tables written by Snowflake without any rewrites.
Link: blog.fabric.microsoft.com/en-us/blog/s...
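Conceptually, metadata translation works roughly like the sketch below (illustrative structures only, not Apache XTable’s real internals): the Parquet data files stay put, and only the format-specific metadata is re-expressed.

```python
# A hypothetical Iceberg-like snapshot: a list of data files plus metadata.
iceberg_snapshot = {
    "snapshot-id": 42,
    "data-files": ["s3://bucket/tbl/f0.parquet", "s3://bucket/tbl/f1.parquet"],
}

def to_delta_log(snapshot):
    # Re-express the same file list as Delta-like "add" commit actions;
    # no Parquet bytes are touched, only metadata is rewritten.
    return [{"add": {"path": p}} for p in snapshot["data-files"]]

print(to_delta_log(iceberg_snapshot))
```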
transactional.blog/b...
No matter how performant your compute engine is, the data in storage needs to be optimally organized.
Here’s a new blog I published!
www.onehouse.ai/blog/how-to-...
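One reason layout matters: engines skip files using min/max column stats, and that pruning only pays off when the data is clustered. A toy Python illustration (hypothetical file stats):

```python
# Each file carries min/max stats for a timestamp column.
files = [
    {"name": "f0.parquet", "min_ts": 0,   "max_ts": 99},
    {"name": "f1.parquet", "min_ts": 100, "max_ts": 199},
]

def prune(files, lo, hi):
    # Keep only files whose [min, max] range overlaps the predicate.
    return [f for f in files if f["max_ts"] >= lo and f["min_ts"] <= hi]

print(prune(files, 120, 150))  # clustered data -> only one file scanned
```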
Upserts are crucial for use cases like CDC.
There are two approaches to record-level updates in data lakes:
Copy-On-Write (CoW)
Merge-On-Read (MoR)
A 🧵 on Uber's new CoW optimization for Parquet.
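A toy contrast of the two strategies (illustrative only): CoW pays the merge cost at write time by rewriting affected files, while MoR appends deltas and defers the merge to read time, which is why CDC-heavy ingestion often prefers MoR.

```python
base = {"f0.parquet": {1: "a", 2: "b"}}   # file -> records (key -> value)
update = {2: "B"}

# Copy-On-Write: rewrite the whole affected file with updates merged in.
def cow(files, upd):
    new = {}
    for name, recs in files.items():
        merged = {**recs, **{k: v for k, v in upd.items() if k in recs}}
        new[name.replace("f0", "f0_v2")] = merged   # a new file version
    return new

# Merge-On-Read: just append a delta/log file; readers merge at query time.
def mor_write(upd):
    return {"delta-0.log": upd}

def mor_read(files, deltas):
    merged = {k: v for recs in files.values() for k, v in recs.items()}
    for d in deltas.values():
        merged.update(d)
    return merged

print(cow(base, update))                  # write cost paid upfront
print(mor_read(base, mor_write(update)))  # read cost paid at query time
```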
Here’s what you will find in the repo:
⭐️ Blogs
⭐️ Research Papers summary
⭐️ Code
⭐️ Crisp Social posts
Link: github.com/dipankarmazu...
I am Dipankar & I have been involved in the open source space for some time. Currently I focus on data infra with projects such as Apache Hudi, Iceberg, Arrow & XTable. In my previous gigs, I worked across different parts of the data spectrum.
Incremental processing is a technique that processes data in small increments rather than in one large batch.
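A minimal sketch of the idea (hypothetical events and checkpoint): persist a checkpoint and process only records that arrived after it, instead of re-scanning the full table.

```python
events = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]  # (commit_time, payload)
checkpoint = 2  # last commit time we already processed

# Pull only the increment since the checkpoint, not the whole table.
new_events = [e for e in events if e[0] > checkpoint]
for t, payload in new_events:
    print("processing", payload)

checkpoint = max(t for t, _ in events)  # advance the checkpoint for next run
```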
blog.haoxp.xyz/posts/parque...