Joey Cherdarchuk - Obumbratta
@obumbratta.com
obumbratta.com
So glad to hear this. I have many many links that should now be unbroken.
albertocairo.com
Big news: The archive of my old weblog, The Functional Art, is back online: www.thefunctionalart.com

The blog won't be updated anymore, though. In the next 2-3 weeks I'll share some news about resuming the newsletter of my upcoming big project: openvisualizationacademy.org

#dataViz #dataVis 📊
Logo of the Open Visualization Academy, which looks like a data-generated wave
obumbratta.com
This is basically the content of a blog post I wrote 9 years ago while at Darkhorse Analytics. But no one reads blogs anymore and many of those links are now dead, so it was time for an update.
Read that post here
www.darkhorseanalytics.com/blog/visuali...
And see my work/hire me here
www.obumbratta.com
obumbratta.com
The list is by no means exhaustive: HOPs, horizon charts, and I'm sure other charts can be and have been used to show distributions. But attached is a handy one-pager summarizing all the methods discussed here.
Plotting distributions. 16 different ways to visualize one data set.
obumbratta.com
Honestly, I've seen examples in books by Cleveland or Few, and researchers use them a lot, but I don't have any good real-world examples of quantile plots to share with you.
obumbratta.com
Quantile Plot

These can feel less cluttered than ranked bars, but it can be harder to highlight a single data point on them. You usually plot each of the 100 percentiles rather than plotting every point in your data. Great for answering "what percent of my values are below/above a certain threshold?"
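A minimal sketch of the numbers behind a quantile plot, assuming a made-up sample and Python's standard library only. Note that `statistics.quantiles` with `n=100` returns the 99 cut points between the 100 percentile groups, so the function names and the data here are illustrative, not from the original post.

```python
from statistics import quantiles

def percentile_points(values):
    """Return (percentile, value) pairs for the 1st..99th percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return list(enumerate(cuts, start=1))

def percent_below(values, threshold):
    """Percent of values strictly below the threshold."""
    return 100 * sum(v < threshold for v in values) / len(values)

data = list(range(1, 201))  # hypothetical sample: the integers 1..200
points = percentile_points(data)

# Each pair is one dot on the quantile plot: x = percentile, y = value.
for pct, value in points[::10]:
    print(f"p{pct:02d}: {value}")

print(percent_below(data, 51), "percent of values are below 51")
```

Plotting the pairs in `points` gives the quantile plot; `percent_below` answers the threshold question directly.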
obumbratta.com
OECD Better Life Index by @moritzstefaner.bsky.social, @do.minik.us & Raureif (current tool not working so here is Moritz's post)
truth-and-beauty.net/projects/oec...

Darkhorse Analytics on Alberta school grades
grades.dha.io

Density Design ranks cesarean sections
www.flickr.com/photos/densi...
OECD Better Life Index Ranked Bars for country comparisons across a number of indicators W.H.O.'s caesarean section ranking by country
obumbratta.com
Ranked Bars

Great for communicating a single distribution while highlighting a particular data point. People love to rank things, so it’s easy to interpret. It behaves almost opposite to a histogram: flat areas represent clusters of data, whereas in a histogram the peaks do.
Ranked Bars
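A rough sketch of the ranking step behind ranked bars, assuming made-up scores and a hypothetical `ranked` helper. It sorts the values, finds where the highlighted item lands, and draws crude text bars.

```python
def ranked(values_by_label, highlight):
    """Sort labels by value (descending) and return the highlighted label's rank."""
    ordered = sorted(values_by_label.items(), key=lambda kv: kv[1], reverse=True)
    rank = next(i for i, (label, _) in enumerate(ordered, start=1)
                if label == highlight)
    return ordered, rank

scores = {"A": 72, "B": 91, "C": 68, "D": 85, "E": 70}  # hypothetical data
ordered, rank = ranked(scores, "D")

# One text bar per item; the highlighted item gets a marker.
for label, value in ordered:
    marker = " <--" if label == "D" else ""
    print(f"{label:>2} {'#' * (value // 5)}{marker}")

print(f"D ranks {rank} of {len(ordered)}")
```

Runs of bars with nearly equal length (A, E, and C here) are the flat areas that signal a cluster.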
obumbratta.com
We are back to plotting all the data, but this time we use one axis to plot the values and another to plot the ranking. This gives a different perspective on the shape of the data. These are excellent at providing context for where one data point fits in among many. Not great for comparing more than one distribution.
Rank the data
obumbratta.com
Seth Kadish visualizes charity ratings
vizual-statistix.tumblr.com/post/1207793...

I rework social media demographics
www.darkhorseanalytics.com/blog/divergi...
(I've since learned to have better contrast)
Charity Navigator ratings: 4, 3, 2, 1, 0 stars and donor advisory ratings for different charity categories.
Social media platforms: gender, age, education, and income breakdowns.
obumbratta.com
Stacked Bar

This plot involves a change in perspective. It has a much smaller number of bins and shows the proportion of the data that fits in each. It will provide a very coarse look at the shape of the data but is quite easy to interpret. It is most effective with a small number of categories.
Stacked bar chart
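A small sketch of the binning behind a stacked bar, assuming made-up ages and hand-picked bin edges. The `proportions` helper is hypothetical; it just computes the share of the data that falls in each of a few coarse bins.

```python
from collections import Counter

def proportions(values, edges, labels):
    """Share of values in each bin; bin i covers edges[i] <= v < edges[i + 1]."""
    counts = Counter()
    for v in values:
        for i, label in enumerate(labels):
            if edges[i] <= v < edges[i + 1]:
                counts[label] += 1
                break
    total = sum(counts.values())
    return {label: counts[label] / total for label in labels}

ages = [19, 23, 25, 31, 34, 38, 42, 47, 55, 63]  # hypothetical data
shares = proportions(ages, edges=[18, 30, 50, 100],
                     labels=["18-29", "30-49", "50+"])

# Each share becomes one segment of a single stacked bar.
for label, share in shares.items():
    print(f"{label:>6}: {share:.0%}")
```

With only three bins the shape is coarse, which is the trade-off described above.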
obumbratta.com
FiveThirtyEight with a story on name based age estimates
fivethirtyeight.com/features/how...

@flowingdata.com plots 10th, Median, & 90th percentiles of salaries across industries
projects.flowingdata.com/2014/industry/

Flourish looks at the ages of olympians
flourish.studio/blog/visuali...
Median ages for females with the 25 most common names among Americans estimated to be alive as of Jan 1, 2014. Emily skews young, while Dorothy skews much older.
Portion of interactive chart showing salary ranges for four industries.
From 12 to 72: Ages of Olympians since 1896.
obumbratta.com
Range Plot

The ends of the line (usually) show the min and max of your data and the dot shows the average, or really any three metrics you find most useful. If the range of the data is the most important thing or your audience may struggle to interpret the other plots, this is nice and simple.
Range Plot
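A quick sketch of a range plot in plain text, assuming made-up salary figures and hypothetical helper names. The dashes span min to max and the `o` marks the mean; you could swap in any three metrics.

```python
from statistics import mean

def range_summary(values):
    """The three numbers a basic range plot shows: min, mean, max."""
    return min(values), mean(values), max(values)

def render(label, values, lo, hi, width=40):
    """Draw one text row on a shared lo..hi axis."""
    vmin, vavg, vmax = range_summary(values)

    def col(v):
        # Map a value to a character column on the shared axis.
        return round((v - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for i in range(col(vmin), col(vmax) + 1):
        row[i] = "-"
    row[col(vavg)] = "o"
    return f"{label:<8}|{''.join(row)}|"

salaries = {"Tech": [50, 70, 120], "Retail": [20, 30, 40]}  # hypothetical data
for name, values in salaries.items():
    print(render(name, values, lo=0, hi=120))
```

Putting every row on the same axis is what makes the ranges comparable at a glance.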
obumbratta.com
FiveThirtyEight looks at the Red Sox roller coaster
fivethirtyeight.com/features/the...

Washington Post - Ivy League Grad Income
www.washingtonpost.com/news/wonk/wp...

Marcus Beck analyzes dissertation & thesis lengths
beckmw.wordpress.com/2014/07/15/a...
Red Sox actual vs. projected number of wins, 1999-2014.
Ivy League educations pay off. Actual earnings distributions, 10 years after starting school. Ivy League distribution is significantly higher than All schools distribution.
Summary of page lengths of masters theses by major, sorted and color-coded by median. Boxes represent the median, 25th and 75th percentiles, 1.5 times the interquartile range as whiskers, and outliers beyond the whiskers. Number of records for each major are in parentheses.
obumbratta.com
Box Plot

Box plots focus on the min, 25th percentile, median, 75th percentile and max. Many will explicitly plot outliers and exclude them from the min-max.
Box Plot
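A minimal sketch of the numbers a box plot draws, assuming the common convention (also used in the thesis-length example) of whiskers at 1.5 times the interquartile range, with anything beyond them plotted as outliers. The data and the `box_stats` name are made up, and other quartile methods will give slightly different values.

```python
from statistics import quantiles

def box_stats(values):
    """Quartiles, whisker ends, and outliers using 1.5 x IQR fences."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    outliers = [v for v in data if v < lo_fence or v > hi_fence]
    return {
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }

pages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical data with one outlier
stats = box_stats(pages)
for name, value in stats.items():
    print(f"{name:>12}: {value}")
```

Here the value 50 is drawn as a separate point, so the upper whisker stops at 9 rather than at the true max.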
obumbratta.com
You can also simplify the distribution by focusing on a few key aspects of the data, usually some measure of central tendency plus some measure of range. The following visuals give less detail but may make comparisons simpler.
Summarize the data
obumbratta.com
Nick Berry analyzes 4-digit PINs
www.datagenetics.com/blog/septemb...

Seth Kadish shows chess board utilization
vizual-statistix.tumblr.com/post/7882178...

Matt Styles looks at the distribution of birthdays
www.themarginalian.org/2013/10/08/b...
4-digit PIN combinations heatmap: columns are the first two digits, rows are the next two digits.
Utilization of chess board squares by Chess Masters.
Which birth dates are most common? July, Aug, Sep births are most common, along with some spikes after Christmas and dips on the 4th and 5th of July.
obumbratta.com
Heatmaps

Heatmaps are extremely compact and excellent for looking for patterns across many distributions. They use variation in colour to show the counts and, as such, they are not very precise, only giving a crude sense of the data’s shape.
Heatmap
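A rough sketch of how heatmap counts could be built, loosely modelled on the PIN layout described above (columns for the first two digits, rows for the last two). The PIN list, the function names, and the five-character shading ramp are all assumptions for illustration.

```python
from collections import Counter

def pin_grid(pins):
    """Count PINs in a 100 x 100 grid: row = last two digits, column = first two."""
    counts = Counter((int(p[2:]), int(p[:2])) for p in pins)
    return [[counts[(r, c)] for c in range(100)] for r in range(100)]

def shade(count, max_count, ramp=" .:*#"):
    """Map a count to one of a few shades; darker means more frequent."""
    if max_count == 0:
        return ramp[0]
    return ramp[round(count / max_count * (len(ramp) - 1))]

pins = ["1234", "1234", "1234", "1111", "1111", "0000", "2580"]  # hypothetical
grid = pin_grid(pins)
peak = max(max(row) for row in grid)

# Print the top-left corner of the heatmap as shaded characters.
for row in grid[:12]:
    print("".join(shade(c, peak) for c in row[:12]))
```

Collapsing every count into a handful of shades is why the result reads as a pattern rather than as precise values.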
obumbratta.com
Stem & Leaf Plot

So awesome, so rare. It’s a sideways histogram, but with all the data encoded in it. Once you know how to read it, it is quite powerful, but possibly too much information for many audiences.
Stem & Leaf Plot
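A minimal sketch of a stem and leaf display, assuming non-negative whole numbers with the tens as the stem and the ones digit as the leaf. The data and the function name are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines; stem = value // 10, leaf = value % 10."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

data = [12, 15, 21, 23, 23, 28, 34, 47, 49]  # hypothetical data
lines = stem_and_leaf(data)
print("\n".join(lines))
```

Each row's length works like a histogram bar, yet every original value can still be read back (stem 2 with leaves 1338 means 21, 23, 23, 28).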
obumbratta.com
Periscopic looking at grant distributions
medium.com/@Periscopic/...

@experimental361.com shows Premier League age profiles. The page for this visual seems to have disappeared

CJ Mayes compares life expectancy over time and continents
cj-mayes.com/2022/05/24/v...
Comparing the grant dollar distributions of Old, Special Projects, Performing Arts, Environment, Education, and some other unnamed categories of grants.
Squad age profiles: Premier League, 21 May 2017.
Violin Plot comparison of life expectancy on the different continents. Africa's life expectancy has shifted up significantly between 1997 and 2020.
obumbratta.com
Bean / Violin Plot

Like the area chart except symmetrical and often more compressed. Useful when comparing more than one distribution where a sense of the shape is important. Sometimes superimposed with a box plot, which makes them look like a violin, or a rug plot, which makes them look nothing like a bean.
Bean / Violin Plot
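A rough sketch of the idea behind a violin plot: estimate a smooth density, then mirror it around a centre line. This assumes a hand-rolled Gaussian kernel density estimate with a fixed bandwidth, made-up data, and hypothetical function names; plotting libraries choose the bandwidth for you and will look different.

```python
from math import exp, pi, sqrt

def kde(values, x, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = len(values) * bandwidth * sqrt(2 * pi)
    return sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

def violin_rows(values, lo, hi, steps=9, bandwidth=1.0, half_width=10):
    """One text row per grid point; row width is proportional to the density."""
    xs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    densities = [kde(values, x, bandwidth) for x in xs]
    peak = max(densities)
    rows = []
    for d in densities:
        w = round(d / peak * half_width)
        # 2 * w characters, centred, so both halves mirror each other.
        rows.append(("#" * (2 * w)).center(2 * half_width))
    return rows

data = [4, 5, 5, 6]  # hypothetical data
rows = violin_rows(data, lo=1, hi=9)
for row in rows:
    print(row)
```

Drawing several of these side by side on a shared axis is what makes shape comparisons across distributions possible.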
obumbratta.com
OkCupid exposes lies about height among other things
theblog.okcupid.com/the-big-lies...

The Financial Times compares the time of day distribution of devices browsing their site
aboutus.ft.com/press_releas...

D3 visualizing some pulsar data à la Joy Division
observablehq.com/@d3/psr-b191...
Comparison of the average Male height distribution in the US vs reported heights on OKCupid. People are about two inches shorter than they say they are.
Comparison of Desktop, Tablet, and Smartphone browsing times on the FT site in 2011. Smartphone and tablet browsing are more common in the early morning and late at night. Desktop more common in the day.
Visualization of pulsar data. Honestly I don't understand it well enough to even begin to explain what it is showing.
obumbratta.com
Line Chart

Like the area chart but not filled under the line. Line charts can be useful when you want to accurately compare a few distributions to one another.
Line chart
obumbratta.com
FiveThirtyEight looks at the political spectrum in law schools
fivethirtyeight.com/features/the...

The Financial Times looks at the distribution of points won and lost by top tennis players
ig.ft.com/features/bas...

HBR looks at how people interpret probabilistic words
hbr.org/2018/07/if-y...
The wide-ranging politics of elite law schools. Distribution of justice ideology for Supreme Court clerks, by school since 1937. Berkeley Justices lean most liberal, then NYU, Columbia. Yale is bimodal but slightly more conservative, then Harvard, Michigan, Stanford, Northwestern, Chicago, Penn, with Virginia leaning the most Conservative.
Nadal and Federer: two paths to greatness. Distributions of % of points won in matches, for matches they won in blue vs matches they lost in red.
Distribution of responses according to respondents' estimate of likelihood for words or phrases like: Always, Usually, Often, Possibly, Rarely, and Never.