Ricardo Castro
@mccricardo.bsky.social
Senior Principal Engineer, tech speaker & writer, @DevOpsPorto and @DevOpsDaysPT, @CDeliveryFdn Ambassador, martial arts amateur, and metal lover. Opinions are my own.
mccricardo.com
You Should Write An Agent
They're like riding a bike: easy, and you don't get it until you try.
fly.io
November 10, 2025 at 5:01 PM
"Faster root cause for slow traces with ClickStack Event Deltas" by Dale McDiarmid
clickhouse.com/blog/%20fast...
Faster root cause for slow traces with ClickStack Event Deltas
Read how ClickStack's improved Event Deltas make it effortless to pinpoint the root causes of performance outliers in observability data - turning complex trace analysis into instant, actionable…
clickhouse.com
November 10, 2025 at 1:01 PM
Revision 149 is out!
@koslib.com
#devops #sre #platformengineering
embracerisk.substack.com/p/revision-149
Revision 149
Articles and updates:
embracerisk.substack.com
November 10, 2025 at 10:23 AM
"How Databricks Implemented Intelligent Kubernetes Load Balancing" by ByteByteGo
blog.bytebytego.com/p/how-databr...
How Databricks Implemented Intelligent Kubernetes Load Balancing
The Databricks Engineering Team needed something smarter: a Layer 7, request-level load balancer that could react dynamically to real service conditions instead of relying on connection-level routing…
blog.bytebytego.com
November 9, 2025 at 6:01 PM
TicketOps is perfectly fine for relatively stable stuff.
At scale, it breaks.
November 7, 2025 at 2:51 PM
"SQL expressions in Grafana: Combine and manipulate data from multiple sources" by Sam Jewell and Kyle Brandt
grafana.com/blog/2025/10...
SQL expressions in Grafana: Combine and manipulate data from multiple sources | Grafana Labs
SQL expressions are a versatile and powerful feature that opens up all sorts of creative possibilities by manipulating and combining data from different data sources.
grafana.com
November 7, 2025 at 1:01 PM
At the dawn of a new wave of AI, if you're still thinking about infrastructure as code rather than infrastructure as software, you're living in the past.
November 7, 2025 at 12:56 PM
SRE is much more than just incident response.
I thought this needed highlighting, since many people are talking about "AI SRE", which mostly focuses on incident response.
November 6, 2025 at 6:03 PM
"OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces" by Anjali Udasi
last9.io/blog/consist...
OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces | Last9
One sampling decision, propagated everywhere. OpenTelemetry's Consistent Probability Sampling fixes fragmented traces across services.
last9.io
November 6, 2025 at 1:01 PM
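The card's summary, "one sampling decision, propagated everywhere", can be illustrated with a small sketch. This is a minimal Python sketch, assuming the OpenTelemetry approach of taking randomness from the low 56 bits of the trace ID and comparing it against a threshold derived from the sampling probability. The function names are hypothetical and are not an OpenTelemetry SDK API. Because the decision depends only on the trace ID and the probability, every service sampling at the same rate keeps or drops the same traces, so traces are not fragmented.

```python
# Minimal sketch of consistent probability sampling (assumption: randomness
# comes from the low 56 bits of the trace ID, as in the OpenTelemetry approach).
# Function names are hypothetical, not part of any OpenTelemetry SDK.

RANDOMNESS_BITS = 56
MAX_RANDOMNESS = 1 << RANDOMNESS_BITS  # 2**56


def randomness_from_trace_id(trace_id: int) -> int:
    """Return the least-significant 56 bits of a 128-bit trace ID."""
    return trace_id & (MAX_RANDOMNESS - 1)


def threshold_for(probability: float) -> int:
    """Rejection threshold: traces whose randomness is below it are dropped."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return round((1.0 - probability) * MAX_RANDOMNESS)


def should_sample(trace_id: int, probability: float) -> bool:
    """Same trace ID and same probability give the same decision in any service."""
    return randomness_from_trace_id(trace_id) >= threshold_for(probability)


if __name__ == "__main__":
    trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
    # Two services evaluating the same trace independently agree.
    print(should_sample(trace_id, 0.5), should_sample(trace_id, 0.5))
```

A useful side effect of this design: a trace kept at a low probability is also kept at any higher probability, because raising the probability only lowers the threshold.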
Consistency is underrated.
Many people believe in a "big bang" event that propels their career. And while there are certain cases where that's true, consistency is usually a better investment of your time.
Invest in being consistent and you'll reap rewards.
November 5, 2025 at 6:02 PM
"Effortless Observability - Integrating CloudWatch Application Signals with OpenTelemetry" by Tobias Schmidt
awsfundamentals.com/blog/cloudwa...
How to Use AWS CloudWatch Application Signals with OpenTelemetry on ECS Fargate and Lambda
This guide shows how to connect CloudWatch Application Signals with OpenTelemetry. See simple steps for ECS Fargate and Lambda. Example code included. Get clear metrics and traces fast.
awsfundamentals.com
November 5, 2025 at 1:01 PM
"Go and enhance your calm: demolishing an HTTP/2 interop problem" by Lucas Pardue and Zak Cutner
blog.cloudflare.com/go-and-enhan...
Go and enhance your calm: demolishing an HTTP/2 interop problem
HTTP/2 implementations often respond to suspected attacks by closing the connection with an ENHANCE_YOUR_CALM error code. Learn how a common pattern of using Go's HTTP/2 client can lead to unintended…
blog.cloudflare.com
November 4, 2025 at 5:04 PM
"From Signals to Reliability: SLOs, Runbooks and Post-Mortems" by Fatih Koç
fatihkoc.net/posts/sre-ob...
From Signals to Reliability: SLOs, Runbooks and Post-Mortems
Build reliability with SLOs, runbooks and post-mortems. Turn observability into systematic incident response and learning. Practical examples for Kubernetes environments.
fatihkoc.net
November 4, 2025 at 1:02 PM
Reliability, like any other feature, needs to be prioritised accordingly.
There will be times when reliability work is the priority. At other times, product features will be.
And so on.
If one topic massively overshadows all the others, problems will arise.
November 3, 2025 at 6:03 PM
For platforms to be valuable, they need to be force multipliers.
That means being more than the sum of their parts.
November 3, 2025 at 1:02 PM
You always need to take roles and titles with a grain of salt.
I often meet DevOps engineers, SREs and PlatEngs all doing very similar jobs.
I also often meet groups of DevOps engineers doing quite different jobs. The same applies to SREs and PlatEngs.
Context is crucial.
October 31, 2025 at 6:03 PM
Some people look down on quality assurance and security, or treat them as annoyances.
In the age of AI, if they continue to have that perspective, they'll have a rude awakening.
October 31, 2025 at 1:04 PM
Important: hire adults.
Also important: treat them like adults.
October 31, 2025 at 12:50 PM
Strive for civil discourse on your teams.
Some of the most creative solutions I've seen were born from discussions between people with completely different views on how to approach a problem.
Promoting diversity lays a good foundation for this to happen organically.
October 31, 2025 at 9:46 AM
People who say "that's a DevOps team problem" have absolutely no clue what DevOps is about.
October 30, 2025 at 6:02 PM
For complex issues, I like runbooks because they allow me to really understand the problem before trying to automate it.
In the long run, for most issues, I strive for automation. But starting with runbooks lets me understand the quirks before automating.
October 30, 2025 at 1:05 PM