https://distsys.fponzi.me/
justinjaffray.com/what-does-wr...
This post is about gaining intuition for Write Skew, and, by extension, Snapshot Isolation. Snapshot Isolation is billed as a transaction isolation level that offers a good mix between performance and correctness.
justinjaffray.com/what-does-wr...
This post is about gaining intuition for Write Skew, and, by extension, Snapshot Isolation. Snapshot Isolation is billed as a transaction isolation level that offers a good mix between performance and correctness.
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.
wyounas.github.io/aws/concurre...
We’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle bugs.
wyounas.github.io/aws/concurre...
We’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle bugs.
muratbuffalo.blogspot.com/2025/11/tla-...
AWS’s N. Virginia region suffered a DynamoDB outage triggered by a DNS automation defect.This post focuses narrowly on the race condition at the core of the bug, which is best understood through TLA+ modeling
muratbuffalo.blogspot.com/2025/11/tla-...
AWS’s N. Virginia region suffered a DynamoDB outage triggered by a DNS automation defect.This post focuses narrowly on the race condition at the core of the bug, which is best understood through TLA+ modeling
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
www.allthingsdistributed.com/2025/05/just...
Each component follows the Unix mantra—do one thing, and do it well—but working together they are able to offer all the features users expect from a database.
www.allthingsdistributed.com/2025/05/just...
Each component follows the Unix mantra—do one thing, and do it well—but working together they are able to offer all the features users expect from a database.
marc-bowes.com/dsql-auth.html
How connections to Aurora DSQL are authenticated and authorized. This information is meant to be supplemental to what is found in the official Amazon Aurora DSQL documentation.
marc-bowes.com/dsql-auth.html
How connections to Aurora DSQL are authenticated and authorized. This information is meant to be supplemental to what is found in the official Amazon Aurora DSQL documentation.
brooker.co.za/blog/2025/08...
People often ask me about the architectural relationship between Amazon Dynamo, Amazon DynamoDB and Aurora DSQL. I’ll start off on comparing how the systems achieve a few key properties.
brooker.co.za/blog/2025/08...
People often ask me about the architectural relationship between Amazon Dynamo, Amazon DynamoDB and Aurora DSQL. I’ll start off on comparing how the systems achieve a few key properties.
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos, or Porcupine.
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos, or Porcupine.
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
relentless-leader.com/dive-deep-in...
relentless-leader.com/dive-deep-in...
muratbuffalo.blogspot.com/2020/06/lear...
A principled, from the foundations-up, studying of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
muratbuffalo.blogspot.com/2020/06/lear...
A principled, from the foundations-up, studying of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
groups.csail.mit.edu/tds/papers/L...
groups.csail.mit.edu/tds/papers/L...
www.allthingsdistributed.com/2025/05/just...
a few weeks ago, at our internal dev conference I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
www.allthingsdistributed.com/2025/05/just...
a few weeks ago, at our internal dev conference I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that event occur
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that event occur
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes2 and message passing, but rather the generic components which in Erlang are called behaviours.
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes2 and message passing, but rather the generic components which in Erlang are called behaviours.
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
sumercip.com/posts/patter...
sumercip.com/posts/patter...
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
dl.acm.org/doi/10.1145/...
dl.acm.org/doi/10.1145/...
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
groups.google.com/g/raft-dev/c...
groups.google.com/g/raft-dev/c...