Calle Börstell
@cborstell.bsky.social
Linguist at the University of Bergen 🇳🇴 #SignLanguages, #linguistics, #RStats & #dataviz
cborstell.bsky.social
Euroleague Basketball #TidyTuesday

A mini dataset, so decided to find a way to plot stadium capacity in an interesting way: went with points along the trajectory of a basketball shot! Swoosh! 🏀

Code: github.com/borstell/tid...

#R4DS #DataViz
A plot that says Euroleague Basketball teams by stadium capacity on an off-white background with black/gray text. Each data point is plotted as a basketball (emoji) and text label with the name of the team(s) playing at the arena, along a parabola simulating the trajectory of a basketball shot towards a schematic basketball hoop located in the bottom right corner. A caption reads: "There are 20 teams in Euroleague Basketball, playing at 19 unique arenas with a median capacity of 12,700 spectators. The biggest arena by capacity is Belgrade Arena which hosts the teams Crvena zvezda Meridianbet and Partizan and seats up to 18,386 people. The smallest arena is Salle Gaston Médecin which hosts Monaco and has a capacity of 5,000. Data: EuroLeague & Wikipedia via {TidyTuesday}; Packages: {tidyverse, ggrepel, ggtext, glue, scales} ; Visualization: C. Börstell."
cborstell.bsky.social
I did! Just one more reply down the thread! 🙂

bsky.app/profile/cbor...
cborstell.bsky.social
Something like this!

Tweaked the calculation to something I had originally intended and included 3-letter endings only.

–arp is very Skåne

I like the distribution of the –red/–röd/–ryd endings, of the same origin

Code: github.com/borstell/maps
A map of Sweden in 10 different panels, each showing the log odds frequency of different 3-letter place name endings in various shades of red (darker means higher prevalence; gray means no data).
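One way to read "log odds" here, sketched in base R under assumptions: for each region, compare the number of place names with a given 3-letter ending to the number without it, adding a small constant to avoid log(0). The toy data, the helper names `ending` and `log_odds`, and the 0.5 smoothing constant are all illustrative; the actual calculation lives in the linked repo and may differ.

```r
# Toy data for illustration only, not the dataset behind the map.
places <- data.frame(
  name   = c("Genarp", "Alnarp", "Lund", "Ystad",
             "Ryd", "Hallaryd", "Tingsryd", "Alvesta"),
  region = rep(c("Skåne", "Kronoberg"), each = 4)
)

# Last n characters of each name, lower-cased (hypothetical helper).
ending <- function(x, n = 3) {
  x <- tolower(x)
  substr(x, nchar(x) - n + 1, nchar(x))
}

# Smoothed log odds of a name carrying the suffix (hypothetical helper;
# the 0.5 prior is an assumption that keeps the ratio finite).
log_odds <- function(names, suffix, prior = 0.5) {
  hits <- sum(ending(names) == suffix)
  misses <- length(names) - hits
  log((hits + prior) / (misses + prior))
}

arp <- sapply(split(places$name, places$region), log_odds, suffix = "arp")
ryd <- sapply(split(places$name, places$region), log_odds, suffix = "ryd")
arp  # Skåne: log(2.5 / 2.5) = 0; Kronoberg: log(0.5 / 4.5), about -2.2
```

Higher values mean the ending is more prevalent in that region, which is what the darker shades on the map encode.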
cborstell.bsky.social
I just wanted to simulate the Monty Hall problem, but couldn't get why the host would sometimes pick the door that the contestant had already chosen

🚪🐐🚪🐐🚪🚗
Line chart showing a simulation of the Monty Hall problem. Two lines representing the strategies stay vs switch quickly converge around the ⅓ and ⅔ probabilities of winning the car, respectively
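A minimal sketch of such a simulation in base R, assuming the standard rules (the host always opens a goat door that is not the contestant's pick). The helper names `pick_one` and `play_round` are illustrative and not taken from the original code.

```r
set.seed(1)

# Draw one element by position, so a length-1 vector like c(3) yields 3
# rather than being expanded to 1:3 by sample().
pick_one <- function(x) x[sample.int(length(x), 1)]

play_round <- function() {
  doors  <- 1:3
  car    <- pick_one(doors)
  choice <- pick_one(doors)
  # The host opens a door that hides a goat and is not the contestant's pick.
  # When car != choice, exactly one such door remains.
  host     <- pick_one(setdiff(doors, c(car, choice)))
  switched <- setdiff(doors, c(choice, host))
  c(stay = choice == car, switch = switched == car)
}

results   <- replicate(10000, play_round())  # 2 x 10000 logical matrix
win_rates <- rowMeans(results)
win_rates  # stay lands near 1/3, switch near 2/3
```

Because exactly one of the two strategies wins each round, the two rates always sum to 1.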
cborstell.bsky.social
Trying to debug for much longer than needed before realizing that

sample(3, 1)

is interpreted as

sample(1:3, 1)

Could not understand what the issue was, thinking it must be my own code, but finally looked at ?sample documentation... #RStats
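A short illustration of the gotcha, plus the position-based workaround that the `?sample` examples suggest. The variable names and the `safe_sample` wrapper are illustrative.

```r
set.seed(42)

# A length-1 numeric vector, e.g. what is left after removing other options.
remaining <- setdiff(c(1, 2, 3), c(1, 2))  # 3

# sample() treats a single number n >= 1 as shorthand for 1:n,
# so this draws from 1:3 instead of always returning 3.
draws <- replicate(1000, sample(remaining, 1))
sort(unique(draws))  # 1 2 3

# Sampling positions and then indexing avoids the special case.
safe_sample <- function(x, size = 1) x[sample.int(length(x), size)]
safe_draws <- replicate(1000, safe_sample(remaining))
unique(safe_draws)  # 3
```

The surprise only shows up when the input happens to shrink to one number at runtime, which is why it can look like a bug elsewhere in the code.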
cborstell.bsky.social
I haven't! But I should, I'll see if I can clean it up a bit, I was only trying to experiment with spatial data.

The method is definitely a bit weak as it's only looking among the just over 2000 "tätorter", which are already quite skewed in distribution and the number per municipality.
cborstell.bsky.social
It's super cool!

I used it for a bonus part TidyTuesday two years ago, making a tune out of Taylor Swift song data, saved as an mp3. Spoiler: the tune wasn't/isn't very good!

github.com/borstell/tid...
A graphic in the style of a yellowish page of sheet music. Title says "Taylor's Albums (melodized)". On the right, each album is represented by a line of sheet music. On the left, the text reads: "Each album as a single line of sheet music. The key of each track as a single note. Each note's duration as the relative track duration (z scored) across all albums."
Reposted by Calle Börstell
andrew.heiss.phd
I just learned about the `truncated` argument in {lubridate} functions, which means NO MORE HACKY paste0(year, "-01-01") code to build dates when converting years to dates in #rstats
library(lubridate)

yr <- 2025

# This doesn't work
ymd(yr)
#> Warning message:
#> All formats failed to parse. No formats found. 
#> [1] NA

#-------
# TIRED
#-------
# I've done this for *yeeeeears*
ymd(paste0(yr, "-01-01"))
#> [1] "2025-01-01"

#-------
# WIRED
#-------
# The truncated argument tells {lubridate} that it can 
# ignore up to 2 (in this case) formats to look for. 
# Ordinarily `ymd()` looks for three formats: a year, 
# a month, and a day. `truncated = 2` means it can skip 
# the month and day parts
ymd(yr, truncated = 2)
#> [1] "2025-01-01"

ymd(c("2025", "2025-10", "2025-10-02"), truncated = 2)
#> [1] "2025-01-01" "2025-10-01" "2025-10-02"
cborstell.bsky.social
The meanings are more or less:

å = river
berg = mountain
kvarn = mill
mark = land
ö = island
ryd = open land
sta = place
torp = cottage
träsk = marsh
tuna = enclosed area, yard

Full disclosure: I'm 100% not a historical linguist or onomastician
cborstell.bsky.social
Tun(a) comes from something like 'enclosed area, yard, plot of land' and is related to "town"!
cborstell.bsky.social
Distributions of endings in Swedish place names

#RStats #DataViz
A map of Sweden in 10 different panels, each showing the proportional frequency of different place name endings in various shades of blue
Reposted by Calle Börstell
thomasp85.com
🎨 Theming got a huge overhaul with the latest #ggplot2 release. In honour of that @teunbrand.bsky.social has written a comprehensive deep-dive into styling your plots, covering both old and new functionality. Grab a coffee and dive in!

#rstats
ggplot2 styling
This post discusses one function in ggplot2: `theme()`. Find out about the glamour of graphics in this deep-dive article.
www.tidyverse.org
cborstell.bsky.social
I partially contributed to this week's #TidyTuesday dataset of crane observations at the lake Hornborgasjön, 🇸🇪

I made a minimalist plot of observation maxima – earlier over time – in the style of written notes.

{magick} magic for images!

Code: github.com/borstell/tid...

#R4DS #DataViz #ggplot2
A data visualization in the style of handwritten/-drawn notes on a piece of paper. Title reads "Cranes at Lake Hornborgasjön: Observations of cranes at the Swedish lake peak earlier over time". The graph shows ridgelines along dates from mid-March to end of April (x-axis) across the years 1994 to 2024 (y-axis). The first date each year where that year's maximum number of observations was reached is marked with a little handdrawn X on the year ridgelines. Across all years, there are lines showing the mean and median values for first max observations across years down the middle. X'es are generally farther to the right in early years (i.e. later in the spring) and farther to the left in more recent years (earlier in the spring). In the plot margins, there are embedded images of cranes in flight. Data: "Transtatistik", Naturum Hornborgasjön via TidyTuesday; Image: Gllawm, Wikimedia Commons curid=147233798; Packages: {tidyverse, magick}; Visualization: C. Börstell
Reposted by Calle Börstell
jonthegeek.com
@dslc.io welcomes you to week 39 of #TidyTuesday! We're exploring Crane Observations at Lake Hornborgasjön, Sweden (1994–2024)!

📁 https://tidytues.day/2025/2025-09-30

#RStats #PyData #JuliaLang #DataViz #tidyverse #r4ds
Logo for the #TidyTuesday Project. The words TidyTuesday, A weekly data project from the Data Science Learning Community (dslc.io) overlaying a black paint splash. TidyTuesday is a weekly social data project. All are welcome to participate! Please remember to share the code used to generate your results!
TidyTuesday is organized by the Data Science Learning Community. Join our Slack for free online help with R and other data-related topics, or to participate in a data-related book club!

 How to Participate
Data is posted to social media every Monday morning. Follow the instructions in the new post for how to download the data.
Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data.
Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
Large flock of cranes gathered in a grassy field during golden hour, with some birds in flight overhead and autumn-colored vegetation in the background.
Line chart showing the number of cranes per day at Lake Hornborga in Sweden during March and April from 2014-2021. Multiple gray lines represent different years, with the 2018 and 2021 seasons highlighted in blue. The chart shows crane migration patterns starting near zero in early March, building to peak numbers of 15,000-27,000 birds between March 30 and April 8 (highlighted in beige), then declining through late April. The highest recorded count was 27,300 cranes on April 3rd, 2019. Dashed lines indicate days when weather conditions made accurate counting difficult. A note explains that 2018 had unusually cold February temperatures causing later arrival. The chart is credited to Anna Thieme from Länsstyrelsen VG at https://transtat.lansstyrelsen.se/
Reposted by Calle Börstell
phkraemer.bsky.social
Since today is the #EuropeanDayOfLanguages, let's not forget that you are not required to keep your languages separate, pure, intact or in any way feel inadequate about your way of using your own language/s.
And don't let anyone take away your co-ownership of the languages you have.
cborstell.bsky.social
Besides questionable variable names (test, testtest, test2, ...) and working in Untitled for hours, it's the 20+ lines of piping, including pipes inside join functions, all piped into ggplot with additional anonymous functions of subsetting and piping inside the geoms 🫣
cborstell.bsky.social
Them in any other channel:

"We will organize an amazing thing, we will literally not tell you the date, location, contents or any other details here, please see our fb [link]" *logs off for the next 6 months*
cborstell.bsky.social
And it's particularly sign languages in the Southern Hemisphere that have been added to the Glottolog database in the past 8 years.
cborstell.bsky.social
Because sign language linguistics is a young field, many sign languages were undocumented for a long time, and still are.

In fact, in Glottolog, the "sign language" category ranks near the top for the increase in the number of documented languages across version releases.
A line chart with the title "Cumulative increase of documented languages by "family"/grouping across releases of Glottolog: only showing families with 10 or more languages". It shows the cumulative increase over time with releases of the Glottolog language database. "Sign languages" is highlighted in a turquoise line, steadily increasing, having increased by around 26% from version 3.1 to version 5.2.
cborstell.bsky.social
In the Glottolog language database, there are currently 227 different sign languages documented, and they are found all around the globe!
A map showing the distribution of the 227 sign languages documented in Glottolog 5.2 (only 224 languages shown as 3 are missing coordinates). The languages are shown as turquoise dots, distributed across all continents.
cborstell.bsky.social
It's the #InternationalDayOfSignLanguages!

The GIF below shows a commonly used international sign for 'sign language'. But is there only a single, universal sign language? Of course not, there are many!

#Linguistics
cborstell.bsky.social
Which countries have the most rated chess players per age group? ♟️
India rising to the top in the youngest age groups. #TidyTuesday

github.com/borstell/tid...

#R4DS #DataViz #ggplot2
A plot titled "FIDE chess players by country & birth year: Ranking of the top International Chess Federation (FIDE) countries by the number of rated players (Elo rating ⩾1400) per age group (year of birth). Number of players shown under each flag (percentage of age group in brackets). Numbers under the flags show the number of players (with percentage of totals in brackets)". The plot resembles a chessboard, with a grayish purple background and the rankings being displayed as country flags on top of the chessboard's squares. In the oldest age brackets (left side), European countries are dominating with Germany, Spain and France having the most players. On the right side with the younger age groups, India is quickly rising to the top, in the youngest age group (2010—2021), Sri Lanka is also up-and-coming. Data: FIDE (September 2025) via TidyTuesday; Packages: {ggtext, tidyverse}; Visualization: C. Börstell
cborstell.bsky.social
library(tidyverse)
ggplot(mutate(uncount(tibble(x=LETTERS[1:7],y=rep(1:4,e=2)[-1],z=c(rep("#DDC",4),"#678","#643","#BCD")),y),g=row_number()))+geom_bar(aes(x,fill=I(z),group=g),col="#444")+theme_void()

👨🏻‍💻⛳️ #RStats
A bar plot that resembles the image quote posted, showing 7 filing cabinets on a sidewalk side by side, resembling a bar chart. The bar chart has lines separating the individual values, thus mimicking drawers of each cabinet.
Reposted by Calle Börstell
noraduggan.bsky.social
Application for a PhD position at our department is now open! We are looking for applicants that are interested in areas of sign language linguistics, general sign language studies, or deaf bilingualism/multilingualism. Deadline is 15th Oct. Please share away! su.varbi.com/en/what:job/...
PhD student in Linguistics, sign language
The Department of Linguistics at Stockholm University conducts research and offers education in a number of areas such as child language development, computational linguistics, general linguistics, ph
su.varbi.com
cborstell.bsky.social
I agree! But I see people teach with the raw notebooks themselves, which to me makes things muddled, and I'm not sure I see the benefits. But for posting it like a report or tutorial, the rendered output with clear separation of comments, code and outputs is much nicer!