James Tauber
@jtauber.com
Using computers to better understand languages, texts, and music. Python, Web, Corpus Linguistics, Data Visualization, Philology, Ancient Greek, Music Theory, Tolkien, Space, Health. Perseus DL, Greek Learner Texts Project & Digital Tolkien Project!
Pinned
jtauber.com
Intro for new followers: I'm a long-time (i.e. old) Python and Web developer. Now mostly apply that to digital humanities and corpus linguistics with focus on historical languages (especially Ancient Greek) and Tolkien. Also education, data visualization, music theory, and a handful of other things.
Reposted by James Tauber
ursulakleguin.com
Today, The Word for World: The Maps of Ursula K. Le Guin opens at AA Gallery in London! Curated by Sarah Shin and Harriet Jennings, the exhibition presents a selection of Ursula's maps, including some that have never been exhibited before.
A photo of the entrance to The Word for World exhibition, with two banners hanging vertically down outside the doorway to a brick building with white-paned windows. A row of bicycles are parked in front. The banners are purple and white and say the name of the exhibition and Ursula K. Le Guin's name. A cord-wrapped rock rests on a vivid blue background next to a map of and program for The Word for World exhibition. A stack of copies of The Word for World book, which shows the title in vivid blue against a black cloth cover.
jtauber.com
Okay, next idea for CARC is being able to compare ranking of byte values on a specific disk versus the corpus average.
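A rough sketch of what such a comparison might look like. How CARC computes its rankings is not stated here, so the function names are hypothetical, and "corpus average" is approximated by ranking the bytes of the corpus concatenated together, which is an assumption.

```python
from collections import Counter


def rank_bytes(data: bytes) -> dict[int, int]:
    """Map each byte value 0-255 to its frequency rank (0 = most common)."""
    counts = Counter(data)
    # Sort by descending count; ties are broken by byte value so the
    # ranking is deterministic.
    ordered = sorted(range(256), key=lambda b: (-counts[b], b))
    return {b: rank for rank, b in enumerate(ordered)}


def rank_shift(disk: bytes, corpus: bytes) -> dict[int, int]:
    """Positive values mean the byte ranks higher on the disk than in the corpus."""
    disk_rank = rank_bytes(disk)
    corpus_rank = rank_bytes(corpus)
    return {b: corpus_rank[b] - disk_rank[b] for b in range(256)}
```

Sorting the result of `rank_shift` by absolute value would surface the byte values whose position on one disk departs most from the corpus.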
jtauber.com
okay, I've redone the heatmap to show column and row rankings as well
jtauber.com
But the pattern you noticed is suggestive of a bit-level pattern.

Those rows you mention are all XX01XXXX
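To make the bit pattern concrete: assuming XX01XXXX is read most significant bit first, a byte matches when bit 5 is 0 and bit 4 is 1. A small sketch (the names are illustrative only) that lists the heatmap rows this covers:

```python
MASK = 0b00110000   # the two fixed bits in XX01XXXX
VALUE = 0b00010000  # those bits must read "01"


def matches_pattern(byte: int) -> bool:
    """True if the byte has the form XX01XXXX."""
    return (byte & MASK) == VALUE


matching = [b for b in range(256) if matches_pattern(b)]

# Heatmap rows are the high nibble, written here as "9_" and so on.
rows = sorted({f"{b >> 4:X}_" for b in matching})
print(rows)
```

Under that reading the pattern picks out rows 1_, 5_, 9_ and D_, a quarter of all byte values.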
jtauber.com
One thing I plan to do is find shared sectors to see how much of this is boilerplate DOS code, etc.
jtauber.com
The _B column also corresponds to unused opcodes.
jtauber.com
Yes, that’s the plan but much harder to do on a large scale. There is already evidence of strong code influence though.
jtauber.com
Not only is 9B the least common but 9_ and _B seem the least common rows and columns respectively overall too.

(perhaps I should alter the visualization to show that)
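One way the row and column totals could be computed, as a minimal sketch. It assumes rows are the high nibble and columns the low nibble of each byte (as in 9_ and _B); the function names are made up for illustration.

```python
from collections import Counter


def nibble_totals(data: bytes) -> tuple[list[int], list[int]]:
    """Return (row_totals, column_totals) for a 16x16 byte-value grid."""
    rows = [0] * 16
    cols = [0] * 16
    for byte, n in Counter(data).items():
        rows[byte >> 4] += n    # high nibble selects the row, e.g. 9_
        cols[byte & 0x0F] += n  # low nibble selects the column, e.g. _B
    return rows, cols


def least_common(totals: list[int]) -> int:
    """Index of the smallest total (first one wins on ties)."""
    return min(range(16), key=totals.__getitem__)
```

Sorting the two lists of totals gives the row and column rankings that a heatmap could display in its margins.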
jtauber.com
okay, wrote my first plugin for markdown-it-py to emulate Obsidian callouts (rather than GitHub alerts)
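For readers unfamiliar with the syntax: an Obsidian callout is a blockquote whose first line looks like `> [!note] Title`, optionally with `+` or `-` after the bracket to mark it foldable. The sketch below is not the plugin mentioned above and does not use the markdown-it-py API; it is a standalone, standard-library illustration of recognising that header, with hypothetical names throughout.

```python
import re
from typing import Optional

# "> [!kind]" with an optional fold marker and an optional title.
CALLOUT_HEADER = re.compile(
    r"^>\s*\[!(?P<kind>[^\]]+)\](?P<fold>[+-])?\s*(?P<title>.*)$"
)


def parse_callout(block: str) -> Optional[dict]:
    """Parse one blockquote as a callout; return None if it is a plain quote."""
    lines = block.splitlines()
    if not lines:
        return None
    m = CALLOUT_HEADER.match(lines[0])
    if m is None:
        return None
    # Strip the leading "> " from the remaining lines to recover the body.
    body = [re.sub(r"^>\s?", "", line) for line in lines[1:]]
    return {
        "kind": m["kind"].lower(),  # lowercased here so [!NOTE] and [!note] match
        "fold": m["fold"],          # "+", "-" or None
        "title": m["title"].strip(),
        "body": "\n".join(body),
    }
```

A real plugin would work on the parser's token stream rather than raw text, so nested blockquotes and inline markup in the title are out of scope for this sketch.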
jtauber.com
I can imagine wanting to support @mystmd.org in Ryland at some point too :-)
jtauber.com
haha, I just hit the fact that markdown-it-py doesn't seem to have wikilinks support. Looking into this discussion now...

Thanks for letting me know!
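As a reminder of what wikilink support has to handle: Obsidian-style links take the forms `[[Page]]`, `[[Page#Heading]]` and `[[Page|alias]]`. The following is only a standard-library sketch for extracting them from text, not markdown-it-py code; the function name is hypothetical and embeds (`![[...]]`) are not treated specially.

```python
import re

WIKILINK = re.compile(
    r"\[\["
    r"(?P<target>[^\[\]|#]+)"           # page name
    r"(?:#(?P<heading>[^\[\]|]+))?"     # optional #Heading
    r"(?:\|(?P<alias>[^\[\]]+))?"       # optional |display text
    r"\]\]"
)


def find_wikilinks(text: str) -> list[tuple]:
    """Return (target, heading, alias) for each wikilink; missing parts are None."""
    return [
        (m["target"], m["heading"], m["alias"])
        for m in WIKILINK.finditer(text)
    ]
```

For example, `find_wikilinks("See [[Ryland]] and [[Obsidian#Callouts|callouts]].")` yields one entry per link, with `None` for the parts that are absent.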
jtauber.com
Although it suffers from something I see a lot. The documentation (particularly for the plugins) seems comprehensive but doesn't actually tell me stuff I need to know to use it.

This is common with ports that assume you know the original. I remember Tkinter being like this in the early days.
jtauber.com
markdown-it-py is looking good so far
jtauber.com
mistletoe and markdown-it-py seem promising although can't see any discussion of Obsidian-related extensions to either
jtauber.com
It seems that a CommonMark implementation with just a few extensions would _probably_ do the job.
jtauber.com
As I think about Ryland-based Obsidian-rendering, I'm pondering:

1) Does an Obsidian Markdown Python library already exist?
2) If not, should it be done by customizing an existing library or would it need to be done from scratch?
3) Is there already an input/output test suite for Obsidian Markdown?
Reposted by James Tauber
quinnanya.me
It's looking like I've got several projects that need language detection as part of the workflow. It's been a few years since I've used that and I assume there's been some (possibly vast?) improvements. Anyone have a favorite library / model / etc they'd recommend? #MultilingualDH
jtauber.com
It’s just New Testament but I do plan to run on larger corpora.
jtauber.com
Reminds me that I really want to redo all my calculations using the Perseus corpus.