Lightnews — Scholar-powered news

James Tauber

@jtauber.com

6.9K followers 4.1K following 2.7K posts

Using computers to better understand languages, texts, and music. Python, Web, Corpus Linguistics, Data Visualization, Philology, Ancient Greek, Music Theory, Tolkien, Space, Health. Perseus DL, Greek Learner Texts Project & Digital Tolkien Project!

Pinned

James Tauber @jtauber.com · Nov 3

Intro for new followers: I'm a long-time (i.e. old) Python and Web developer. Now mostly apply that to digital humanities and corpus linguistics with focus on historical languages (especially Ancient Greek) and Tolkien. Also education, data visualization, music theory, and a handful of other things.

2 3 66

James Tauber @jtauber.com · 5h

And some more follow-on discoveries...

jtauber.github.io/CARC/2025/10...

Why So Many Sectors Start with 00 11 0X 00 | CARC

As we saw in the previous post, ten out of the top 20 sector types start with 00 11 0X 00 and continue with all zeros. I suspected why this might be the case and I’ve now confirmed the reason.

jtauber.github.io

1 1

James Tauber @jtauber.com · 8h

More Apple II disk explorations...

jtauber.github.io/CARC/2025/10...

Sector Types in the Apple II Disk Corpus | CARC

A regular Apple II floppy disk has 560 sectors of 256 bytes. Across the 12,450 disks in the current corpus, that’s 6,972,000 sectors. How many unique sector ...

jtauber.github.io

1 1 3
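A minimal sketch of how sector types could be counted, assuming raw disk images in the standard 16-sector layout (35 tracks of 16 sectors, which gives the 560 sectors mentioned in the post). The function names and the tiny synthetic "disks" are hypothetical and are not taken from the CARC code.

```python
from collections import Counter

SECTOR_SIZE = 256
SECTORS_PER_DISK = 560  # 35 tracks x 16 sectors


def split_sectors(image: bytes) -> list[bytes]:
    """Split a raw disk image into 256-byte sectors."""
    if len(image) % SECTOR_SIZE != 0:
        raise ValueError("image length is not a multiple of the sector size")
    return [image[i:i + SECTOR_SIZE] for i in range(0, len(image), SECTOR_SIZE)]


def count_sector_types(images) -> Counter:
    """Count how often each distinct sector content occurs across all images."""
    counts = Counter()
    for image in images:
        counts.update(split_sectors(image))
    return counts


# Two tiny synthetic "disks" (hypothetical data, far smaller than real images).
blank = bytes(SECTOR_SIZE)
other = bytes([0xFF]) * SECTOR_SIZE
disk_a = blank + blank + other
disk_b = blank + other + other

counts = count_sector_types([disk_a, disk_b])
print(len(counts), "unique sector types")

# The corpus-wide sector total quoted in the post.
total_corpus_sectors = SECTORS_PER_DISK * 12_450
print(total_corpus_sectors)
```

Using the sector bytes themselves as `Counter` keys keeps the sketch simple; on a corpus of millions of sectors, hashing each sector first would likely be kinder to memory.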

Reposted by James Tauber

Ursula K. Le Guin @ursulakleguin.com · 1d

Today, The Word for World: The Maps of Ursula K. Le Guin opens at AA Gallery in London! Curated by Sarah Shin and Harriet Jennings, the exhibition presents a selection of Ursula's maps, including some that have never been exhibited before.

A photo of the entrance to The Word for World exhibition, with two banners hanging vertically outside the doorway of a brick building with white-paned windows. A row of bicycles is parked in front. The banners are purple and white and show the name of the exhibition and Ursula K. Le Guin's name.

A cord-wrapped rock rests on a vivid blue background next to a map of and program for The Word for World exhibition.

A stack of copies of The Word for World book, which shows the title in vivid blue against a black cloth cover.

5 110 300

James Tauber @jtauber.com · 23h

Here are the rank differences visualized for the Ultima IV boot disk vs the entire corpus.

jtauber.github.io/CARC/2025/10...

Byte Value Rank Differences For A Specific Disk | CARC

Having ranked all the byte values across 12,450 Apple II disk images, how might a specific disk differ?

jtauber.github.io

James Tauber @jtauber.com · 1d

Okay, next idea for CARC is being able to compare ranking of byte values on a specific disk versus the corpus average.

1 2
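One way the disk-versus-corpus comparison could be sketched, assuming the "corpus" is simply all bytes pooled together and that ties are broken by byte value (both are assumptions, and the function names are hypothetical rather than CARC's):

```python
from collections import Counter


def byte_ranks(data: bytes) -> dict[int, int]:
    """Rank all 256 byte values by frequency (rank 1 = most common).

    Ties are broken by byte value, which is an arbitrary choice.
    """
    counts = Counter(data)
    ordered = sorted(range(256), key=lambda b: (-counts[b], b))
    return {b: rank for rank, b in enumerate(ordered, start=1)}


def rank_differences(disk: bytes, corpus: bytes) -> dict[int, int]:
    """Corpus rank minus disk rank for every byte value.

    A positive value means the byte ranks higher (is relatively more
    common) on the disk than in the corpus.
    """
    disk_r = byte_ranks(disk)
    corpus_r = byte_ranks(corpus)
    return {b: corpus_r[b] - disk_r[b] for b in range(256)}


# Synthetic stand-ins for a corpus and a single disk (not real data).
corpus = bytes([0x00] * 5 + [0xFF] * 3 + [0x20] * 1)
disk = bytes([0x20] * 4 + [0x00] * 2)

diffs = rank_differences(disk, corpus)
print(diffs[0x20], diffs[0x00], diffs[0xFF])
```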

James Tauber @jtauber.com · 1d

okay, I've redone the heatmap to show column and row rankings as well

James Tauber @jtauber.com · 1d

But the pattern you noticed is suggestive of a bit-level pattern.

Those rows you mention are all XX01XXXX

1 1
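To make the XX01XXXX pattern concrete: with heatmap rows indexed by the high nibble, it selects the bytes whose bits 5 and 4 are 0 and 1. A small sketch (the helper name is hypothetical, and the rows the other poster originally listed are not quoted in this feed):

```python
def matches_xx01xxxx(value: int) -> bool:
    """True when bit 5 is 0 and bit 4 is 1; the other six bits are ignored."""
    return (value & 0b0011_0000) == 0b0001_0000


# High nibbles (heatmap rows) whose bytes fit the pattern.
rows = sorted({v >> 4 for v in range(256) if matches_xx01xxxx(v)})
row_labels = [f"{r:X}_" for r in rows]
print(row_labels)
```

Masking with `0b0011_0000` isolates the two fixed bits, so the check is independent of the column (low nibble) entirely.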

James Tauber @jtauber.com · 1d

One thing I plan to do is find shared sectors to see how much of this is boilerplate DOS code, etc.

1 3

James Tauber @jtauber.com · 1d

The _B column also corresponds to unused opcodes.

1 1

James Tauber @jtauber.com · 1d

Yes, that’s the plan but much harder to do on a large scale. There is already evidence of strong code influence though.

James Tauber @jtauber.com · 1d

Not only is 9B the least common but 9_ and _B seem the least common rows and columns respectively overall too.

(perhaps I should alter the visualization to show that)

2 4
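A rough sketch of how row and column totals could be derived for such a heatmap, assuming rows are the high nibble (9_) and columns the low nibble (_B). The data below is synthetic and deliberately built to have no 9_ or _B bytes; it does not reproduce the corpus results.

```python
from collections import Counter


def nibble_totals(data: bytes) -> tuple[list[int], list[int]]:
    """Sum byte counts per high nibble (rows) and per low nibble (columns)."""
    counts = Counter(data)
    row_totals = [0] * 16
    col_totals = [0] * 16
    for value, n in counts.items():
        row_totals[value >> 4] += n
        col_totals[value & 0x0F] += n
    return row_totals, col_totals


def least_common(totals: list[int]) -> int:
    """Index of the smallest total (first one wins on ties)."""
    return min(range(16), key=lambda i: totals[i])


# Synthetic data: every byte value except those in row 9_ or column _B.
data = bytes(v for v in range(256) if v >> 4 != 0x9 and v & 0x0F != 0xB)

row_totals, col_totals = nibble_totals(data)
row_label = f"{least_common(row_totals):X}_"
col_label = f"_{least_common(col_totals):X}"
print(row_label, col_label)
```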

James Tauber @jtauber.com · 1d

Did a heatmap of byte value ranks in my Apple II Disk Corpus...

jtauber.github.io/CARC/2025/10...

Ranking Byte Values in the Apple II Disk Corpus | CARC

Which are the most common byte values across the 12,450 Apple II disk images?

jtauber.github.io

2 5

James Tauber @jtauber.com · 1d

The CARC project now has a website (Ryland-generated of course):

jtauber.github.io/CARC/

CARC

Corpus Analysis of Retro Code

jtauber.github.io

2 3

Reposted by James Tauber

Space Missions @space-missions.com · 1d

We've added Estrack status:

space-missions.com/estrack/

Estrack – Space Missions

Status of the ESA’s tracking station network

space-missions.com

1 2

James Tauber @jtauber.com · 1d

okay, wrote my first plugin for markdown-it-py to emulate Obsidian callouts (rather than GitHub alerts)

1 5
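For readers unfamiliar with the syntax: an Obsidian callout is a blockquote whose first line looks like `> [!type] Optional title`, optionally with a `+` or `-` fold marker after the bracket. The sketch below only recognizes that shape with a plain regular expression; it deliberately does not use the markdown-it-py plugin API and is not the plugin mentioned in the post. All names are hypothetical.

```python
import re

# Header line of an Obsidian-style callout: "> [!type] Optional title",
# with an optional "+" or "-" fold marker right after the closing bracket.
CALLOUT_HEADER = re.compile(
    r"^>\s*\[!(?P<type>[^\]]+)\](?P<fold>[+-]?)\s*(?P<title>.*)$"
)


def parse_callout(block: str):
    """Return a dict describing the callout, or None for an ordinary blockquote."""
    lines = block.splitlines()
    if not lines:
        return None
    match = CALLOUT_HEADER.match(lines[0])
    if match is None:
        return None
    # Strip the leading "> " from the remaining lines to get the body text.
    body = [re.sub(r"^>\s?", "", line) for line in lines[1:]]
    kind = match.group("type").strip().lower()
    return {
        "type": kind,
        "fold": match.group("fold"),
        # Fall back to the capitalized type when no title is given (an assumption).
        "title": match.group("title").strip() or kind.capitalize(),
        "body": "\n".join(body),
    }


note = parse_callout("> [!note] Remember\n> First line\n> Second line")
tip = parse_callout("> [!TIP]-")
plain = parse_callout("> just a quote")
print(note, tip, plain)
```

A real plugin would work on the parser's token stream rather than on raw text, so nested blockquotes and inline markup inside the body are out of scope here.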

James Tauber @jtauber.com · 1d

I can imagine wanting to support @mystmd.org in Ryland at some point too :-)

James Tauber @jtauber.com · 1d

haha, I just hit the fact that markdown-it-py doesn't seem to have wikilinks support. Looking into this discussion now...

Thanks for letting me know!
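As a rough illustration of what wikilink support has to recognize, here is a regex-only sketch for the `[[Target]]`, `[[Target#Heading]]`, and `[[Target|Alias]]` forms. It is a stand-alone approximation with hypothetical names, not markdown-it-py functionality, and it would also match the inner part of an embed such as `![[Target]]`.

```python
import re

# [[Target]], [[Target#Heading]], [[Target|Alias]], or [[Target#Heading|Alias]]
WIKILINK = re.compile(
    r"\[\[([^\[\]|#]+)(?:#([^\[\]|]+))?(?:\|([^\[\]]+))?\]\]"
)


def find_wikilinks(text: str) -> list[tuple]:
    """Return (target, heading, alias) tuples; missing parts are None."""
    return [(m.group(1), m.group(2), m.group(3)) for m in WIKILINK.finditer(text)]


links = find_wikilinks("See [[Home]] and [[Notes#Intro|the intro]].")
print(links)
```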

James Tauber @jtauber.com · 1d

Although it suffers from something I see a lot. The documentation (particularly for the plugins) seems comprehensive but doesn't actually tell me stuff I need to know to use it.

This is common with ports that assume you know the original. I remember Tkinter being like this in the early days.

1 3

James Tauber @jtauber.com · 1d

markdown-it-py is looking good so far

1 3

James Tauber @jtauber.com · 1d

mistletoe and markdown-it-py seem promising although can't see any discussion of Obsidian-related extensions to either

James Tauber @jtauber.com · 1d

It seems that a CommonMark implementation with just a few extensions would _probably_ do the job.

1 2

James Tauber @jtauber.com · 1d

As I think about Ryland-based Obsidian-rendering, I'm pondering:

1) Does an Obsidian Markdown Python library already exist?
2) If not, should it be done by customizing an existing library or would it need to be done from scratch?
3) Is there already an input/output test suite for Obsidian Markdown?

1 4

Reposted by James Tauber

Quinn Dombrowski @quinnanya.me · 1d

It's looking like I've got several projects that need language detection as part of the workflow. It's been a few years since I've used that and I assume there have been some (possibly vast?) improvements. Anyone have a favorite library / model / etc. they'd recommend? #MultilingualDH

1 7 6

James Tauber @jtauber.com · 1d

It’s just New Testament but I do plan to run on larger corpora.

James Tauber @jtauber.com · 1d

Reminds me that I really want to redo all my calculations using the Perseus corpus.