Christopher Finlan
cmfinlan.bsky.social
Monitoring Spark Jobs in Real Time in Microsoft Fabric
If Spark performance work is surgery, monitoring is your live telemetry. Microsoft Fabric gives you multiple monitoring entry points for Spark workloads: Monitor hub for cross-item visibility, item Recent runs for focused context, and application detail pages for deep investigation. This post is a practical playbook for using those together.

Why this matters

When a notebook or Spark job definition slows down, "run it again" is the most expensive way to debug. Real-time monitoring helps you:
- spot bottlenecks while jobs are still running
- isolate failures quickly
- compare behavior across submitters and workspaces…
christopherfinlan.com
February 12, 2026 at 3:37 PM
Lakehouse Table Optimization: VACUUM, OPTIMIZE, and Z-ORDER
If your Lakehouse tables are getting slower (or more expensive) over time, it’s often not "Spark is slow." It’s usually table layout drift: too many small files, suboptimal clustering, and old files piling up. In Fabric Lakehouse, the three table-maintenance levers you’ll reach for most are: OPTIMIZE: compacts many small files into fewer, larger files (and can apply clustering)
christopherfinlan.com
February 10, 2026 at 8:33 PM
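
A minimal sketch of the maintenance levers named in the post above, assuming Delta Lake's SQL commands `OPTIMIZE ... ZORDER BY (...)` and `VACUUM ... RETAIN n HOURS`. The helper `maintenance_statements`, the table name, and the column name are hypothetical and not taken from the post. The function only builds the statement strings; in a notebook you would pass each one to `spark.sql(...)`.

```python
from typing import List, Optional, Sequence


def maintenance_statements(
    table: str,
    zorder_cols: Optional[Sequence[str]] = None,
    retain_hours: int = 168,
) -> List[str]:
    """Build Delta maintenance SQL for one table (hypothetical helper).

    OPTIMIZE compacts small files; the optional ZORDER BY clause clusters
    data on the given columns. VACUUM removes files no longer referenced
    by the table that are older than the retention window.
    168 hours (7 days) is Delta Lake's default retention threshold.
    """
    optimize = f"OPTIMIZE {table}"
    if zorder_cols:
        optimize += f" ZORDER BY ({', '.join(zorder_cols)})"
    vacuum = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    # Compact first, then vacuum: OPTIMIZE writes new files and leaves the
    # old small files behind until a later VACUUM removes them.
    return [optimize, vacuum]


if __name__ == "__main__":
    # "sales" and "customer_id" are illustrative names only.
    for statement in maintenance_statements("sales", ["customer_id"]):
        print(statement)
```

Keeping the statements as plain strings makes the maintenance plan easy to review or log before anything is run against a table.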
OneLake catalog in Microsoft Fabric: Explore, Govern, and Secure
If your Fabric tenant has grown past "a handful of workspaces," the problem isn’t just storage or compute—it’s finding the right items, understanding what they are, and making governance actionable. That’s the motivation behind the OneLake catalog: a central hub to discover and manage Fabric content, with dedicated experiences for discovery (Explore), governance posture (Govern), and security administration (Secure). This post is a practical walk-through of what’s available today, with extra focus on what Fabric admins get in the Govern…
christopherfinlan.com
February 10, 2026 at 3:00 PM
Understanding Spark Execution in Microsoft Fabric
Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime. This post is a quick, practical refresher on the Spark execution model, with Fabric-specific pointers on where to observe jobs, stages, and tasks.

1) The execution hierarchy: Application → Job → Stage → Task

In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job…
christopherfinlan.com
February 9, 2026 at 9:22 PM
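
A toy model of the idea in the post above that a job's DAG splits into stages at shuffles. This is a deliberate simplification, not Spark's scheduler: it treats the plan as a linear chain of operation names and starts a new stage at every wide (shuffle-producing) operation. The function `split_into_stages` and the `WIDE_OPS` set are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

# Operations assumed to require a shuffle in this toy model.
WIDE_OPS = {"groupBy", "join", "repartition", "distinct"}


def split_into_stages(ops: Sequence[str]) -> List[List[str]]:
    """Group a linear chain of operations into stages.

    Narrow operations stay in the current stage; a wide operation
    closes the current stage and opens a new one.
    """
    stages: List[List[str]] = [[]]
    for op in ops:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])  # shuffle boundary: begin a new stage
        stages[-1].append(op)
    return [stage for stage in stages if stage]


if __name__ == "__main__":
    plan = ["read", "filter", "groupBy", "map", "join", "write"]
    for number, stage in enumerate(split_into_stages(plan)):
        print(f"stage {number}: {' -> '.join(stage)}")
```

In real Spark the map side of a shuffle runs in the upstream stage and the reduce side in the downstream one, so treat the boundaries here as a reading aid for the stage view rather than an exact reproduction.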
Fabric Spark Shuffle Tuning: AQE + partitions for Faster Joins
Shuffles are where Spark jobs go to get expensive: a wide join or aggregation forces data to move across the network, materialize shuffle files, and often spill when memory pressure spikes. In Microsoft Fabric Spark workloads, the fastest optimization is usually the boring one: avoid the shuffle when you can, and when you can’t, make it smaller and better balanced. This post lays out a practical, repeatable approach you can apply in Fabric notebooks and Spark job definitions.

1) Start with the simplest win: avoid the shuffle

If one side of your join is genuinely small (think lookup/dimension tables), use a broadcast join so Spark ships the small table to executors and avoids a full shuffle.
christopherfinlan.com
February 6, 2026 at 3:03 PM
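
A plain-Python sketch of why the broadcast join described above avoids a shuffle: the small table becomes a lookup copied to every partition of the large table, so each partition is joined where it already sits. This is an illustration of the technique, not Spark code; `broadcast_hash_join` and the sample rows are hypothetical. In PySpark the corresponding hint is `broadcast()` from `pyspark.sql.functions`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small_rows: List[Row],
    key: str,
) -> List[List[Row]]:
    """Inner-join each partition of the large side against a small lookup."""
    # Build the hash table once; this stands in for the broadcast copy
    # that every executor would receive.
    lookup: Dict[Any, List[Row]] = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined: List[List[Row]] = []
    for partition in large_partitions:
        # Rows never leave their partition, so no shuffle is needed.
        out = [
            {**match, **row}
            for row in partition
            for match in lookup.get(row[key], [])
        ]
        joined.append(out)
    return joined


if __name__ == "__main__":
    facts = [
        [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}],
        [{"id": 1, "amount": 7}, {"id": 3, "amount": 1}],
    ]
    dims = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    for partition in broadcast_hash_join(facts, dims, "id"):
        print(partition)
```

The output keeps the same number of partitions as the large input, which is the property that makes this cheaper than a shuffle join when the small side fits comfortably in memory.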
OneLake Shortcuts + Spark: Practical Patterns for a Single Virtual Lakehouse
If you’ve adopted Microsoft Fabric, there’s a good chance you’re trying to reduce the number of ‘copies’ of data that exist just so different teams and engines can access it. OneLake shortcuts are one of the core primitives Fabric provides to unify data across domains, clouds, and accounts by making OneLake a single virtual data lake namespace. For Spark users specifically, the big win is that shortcuts appear as folders in OneLake—so Spark can read them like any other folder—and Delta-format shortcuts in the Lakehouse Tables area can be surfaced as tables.
christopherfinlan.com
February 5, 2026 at 3:02 PM
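
Because shortcuts appear as folders in OneLake, reading one from Spark comes down to pointing at the right path. The sketch below assumes the OneLake ABFS URI layout `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/<area>/<path>`. The helper `shortcut_path` and the workspace, lakehouse, and shortcut names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def shortcut_path(workspace: str, lakehouse: str, shortcut: str, area: str = "Files") -> str:
    """Build an abfss:// URI for a shortcut folder in a Lakehouse.

    `area` is "Files" for folder-style shortcuts or "Tables" for
    Delta-format shortcuts in the Tables area.
    """
    if area not in ("Files", "Tables"):
        raise ValueError("area must be 'Files' or 'Tables'")
    relative = shortcut.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{area}/{relative}"


if __name__ == "__main__":
    # Illustrative names only; substitute your own workspace and lakehouse.
    print(shortcut_path("Analytics", "Sales", "external/orders"))
    print(shortcut_path("Analytics", "Sales", "orders_delta", area="Tables"))
```

In a notebook, a path like this could be handed to `spark.read.format("delta").load(path)` for a Delta shortcut, or to `spark.read.parquet(path)` for a folder of Parquet files.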
When ‘Native Execution Engine’ Doesn’t Stick: Debugging Fabric Environment Deployments with fabric-cicd
If you’re treating Microsoft Fabric workspaces as source-controlled assets, you’ve probably started leaning on code-first deployment tooling (either Fabric’s built-in Git integration or community tooling layered on top). One popular option is the open-source fabric-cicd Python library, which is designed to help implement CI/CD automations for Fabric workspaces without having to interact directly with the underlying Fabric APIs. For most Fabric items, a ‘deploy what’s in Git’ model works well—until you hit a configuration that looks like it’s in source control, appears in deployment logs, but still doesn’t land in the target workspace.
christopherfinlan.com
February 3, 2026 at 3:00 PM
Gil Gerard, Buck Rogers, and the Kind of Grief That Shows Up in December
Gil Gerard's departure reminds us that some celebrities aren't just actors; they're the comforting echoes of our past. Buck Rogers was more than a show—it was a place that shaped our childhood optimism.
christopherfinlan.com
December 18, 2025 at 2:05 AM
Build Your Own Spark Job Doctor in Microsoft Fabric
Microsoft Fabric simplifies Spark workload management but diagnosing performance issues remains challenging. This post introduces the "Job Doctor," an AI tool that analyzes Spark telemetry to identify problems like skew or excessive shuffles, generates human-readable diagnoses, and suggests fixes. The implementation integrates with Azure AI for optimized Spark job management.
christopherfinlan.com
December 5, 2025 at 7:43 PM
Time to Automate: Why Sports Card Grading Needs an AI Revolution
As I head to the National for the first time, this is a topic I have been thinking about for quite some time, and a recent video inspired me to put this together with help from ChatGPT’s o3 model doing deep research. Enjoy!

Introduction: Grading Under the Microscope

Sports card grading is the backbone of the collectibles hobby – a PSA 10 vs PSA 9 on the same card can mean thousands of dollars of difference in value. Yet the process behind those grades has remained stubbornly old-fashioned, relying on human eyes and judgment.
christopherfinlan.com
July 29, 2025 at 11:47 PM
Humans + Machines: From Co-Pilots to Convergence — A Friendly Response to Josh Caplan’s “Interview with AI”
1. Setting the Table

Josh, I loved how you framed your conversation with ChatGPT-4o around three crisp horizons: 5, 25 and 100 years. It’s a structure that forces us to check our near-term expectations against our speculative impulses. Below I’ll walk through each horizon, point out where my own analysis aligns or diverges, and defend those positions with the latest data and research.

2. Horizon #1 (≈ 2025-2030): The Co-Pilot Decade

Where we agree

You write that “AI will write drafts, summarize meetings, and surface insights … accelerating workflows without replacing human judgment.” Reality is already catching up:
christopherfinlan.com
July 15, 2025 at 3:12 AM
The team worked a long time to make this a reality for folks - so excited it is finally here!! True serverless billing for Spark in Fabric!

Introducing Autoscale Billing for Spark in Microsoft Fabric - blog.fabric.microsoft.com/en/blog/intr...
March 31, 2025 at 5:30 PM