Lightnews — Scholar-powered news

Xan Gregg

@xangregg.bsky.social

1.7K followers 1.8K following 240 posts

Engineering Fellow at JMP, focused on #DataViz, preferring smoothers over fitted lines. Creator of JMP #GraphBuilder and #PackedBars chart type for high-cardinality Pareto data. #TieDye #LessIsMore


Xan Gregg

@xangregg.bsky.social

My "discovery" was discussed 30 years earlier in a research note by Martin Mächler (see lowess.ps in his unpublished manuscripts folder) people.math.ethz.ch/~maechler/

screenshot of research note with first paragraph. Text reads:

Robustifying a Local Nonparametric Regression Estimator
Martin Maechler
May 1989
April 1992
This technical note is basically section 5.5 from my PhD thesis, Mächler. The
goal is to make the following findings more widely available. I assume here that the
reader is familiar with Cleveland, the first journal paper to introduce LOWESS.
Note that since Cleveland and several coworkers have continued to research and improve
the local regression methodology, notably allowing for multiple carriers and also working
out methods for inference; see Cleveland and Devlin or chapter of Chambers
and Hastie. Also, the name of LOWESS has been changed to loess. However,
nothing has been changed in the algorithm used to robustify LOWESS/loess, and here I
am only considering these robustness properties. Therefore, all the subsequent references
to LOWESS equally apply to loess.

November 6, 2025 at 5:57 PM

Xan Gregg

@xangregg.bsky.social

Right, I assume the mean of the line is based on the data. The paper mentions σ as 3000 steps. Perhaps each line is a hypothetical average of a few thousand such movers, but I can't find any such explanation.

August 20, 2025 at 12:21 PM

Xan Gregg

@xangregg.bsky.social

Here's the smoothed grid with dots colored by their values' ones digit (walkScore % 10), and a superposition attempt, with smoothed in gray. (I didn't quite get the walk score per dot width to be an exact number of pixels.) Hope these capture the diagnostic you're looking for.

Smoothed dot plot with dots colored according to their true values.

Overlaid smoothed and unsmoothed dot plots

August 17, 2025 at 8:59 PM

Xan Gregg

@xangregg.bsky.social

It can't be a ratio of the changes since the denominator could be very small, even 0. However, using (total + first)/(total + latest) is no good since the base is so much bigger. It seems like some smoothing/annualizing is happening. Closest I could get was a 12-month cumulative error versus the total.

August 13, 2025 at 2:32 PM

Xan Gregg

@xangregg.bsky.social

Better alternative?:
The datasets and code have been escrowed with the publisher (checksum xxx). They will be shared upon request to the corresponding author cc publisher for the following purposes: x, y, z. If no response within n days, notify the publisher and the paper will be retracted.

August 10, 2025 at 5:12 PM
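The checksum in the escrow proposal above could work along these lines. This is a minimal sketch, assuming SHA-256 as the checksum (the post does not name an algorithm); the function names `sha256_of_file` and `matches_escrowed` are hypothetical, chosen only for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_escrowed(path, published_checksum):
    """Check a supplied file against the checksum printed in the paper.

    Assumption: the published checksum is a hex SHA-256 string.
    """
    return sha256_of_file(path) == published_checksum.strip().lower()
```

A requester who receives the data would run `matches_escrowed` on each file, which speaks to the concern that there is otherwise no way to verify that supplied data is the actual data.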

Xan Gregg

@xangregg.bsky.social

1. Email addresses change.
2. Author becomes unavailable (retires, gets busy, ...).
3. No definition of reasonable.
4. No way to verify that any supplied data is the actual data.
5. The supplied data may not be complete.
6. No penalty for breaking promise.

August 10, 2025 at 5:09 PM

Xan Gregg

@xangregg.bsky.social

That's a good way to put it. That example is mostly for those with a strict rule. I think my internal rule matches yours: the bar origin should be a "meaningful baseline" such that 2x bar height is a 2x effect from the baseline.

July 23, 2025 at 12:54 PM

Xan Gregg

@xangregg.bsky.social

The originals could serve as fodder for some #dataviz guides. When the zero-origin rule breaks down or when to use dots/lines instead of bars.

Bar charts from https://www.dailymail.co.uk/sciencetech/article-13739705/london-underground-hottest-line.html, showing average temperature for 10 years using bar charts with origins at 0°C. All bars in the 25 to 30 range, showing little variation at the scale.

July 23, 2025 at 12:25 PM
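A small worked example of why a 0°C origin is not a "meaningful baseline" for bar heights. It is a sketch only: the values 25 and 30 are illustrative, taken from the range mentioned in the chart description, and the baseline of 20 and the function names are hypothetical.

```python
def height_ratio(a, b, baseline=0.0):
    """Ratio of two bar heights when bars start at `baseline`."""
    return (a - baseline) / (b - baseline)


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Illustrative values only (bars in the chart fall in the 25 to 30 range).
cool_c, warm_c = 25.0, 30.0

# Same two temperatures, three different bar origins.
ratio_celsius_zero = height_ratio(warm_c, cool_c)                     # origin at 0 °C
ratio_fahrenheit_zero = height_ratio(c_to_f(warm_c), c_to_f(cool_c))  # origin at 0 °F
ratio_baseline_20 = height_ratio(warm_c, cool_c, baseline=20.0)       # hypothetical baseline

print(ratio_celsius_zero, ratio_fahrenheit_zero, ratio_baseline_20)
```

The same pair of temperatures gives a different height ratio under each origin, so the zero of the Celsius scale carries no special "2x bar means 2x effect" meaning.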

Xan Gregg

@xangregg.bsky.social

Great improvement sequence, but for me, it's harder to verify which categories are changing after putting their bars in separate groups. I see it's a trade-off with simplifying the coloring. Here's a try at sticking with the original ordering, at the cost of an imperfect time legend.

bar chart with 15 bars for 5 categories across 3 time periods each.

July 16, 2025 at 12:11 PM

Xan Gregg

@xangregg.bsky.social

Not mine, just to be clear. But, yes, very nice!

June 30, 2025 at 8:55 PM

Xan Gregg

@xangregg.bsky.social

This article by Don Wheeler has a good discussion of Grubbs' test and others. www.qualitydigest.com/inside/stati... [free reg reqd]
He's a control charts expert, which explains the sequence-based context and small data sizes.

Chart excerpted from https://www.qualitydigest.com/inside/statistics-column/some-outlier-tests-part-2-011121.html by Donald Wheeler showing several overlaid curves. Each shows the probability that outliers found are real versus data set size for several outlier tests.

June 30, 2025 at 7:22 PM
