github.com/jaalu | he/him
There is a surprising amount of text available, though: Harvard's Institutional Books dataset (arxiv.org/pdf/2506.08300) has >470K texts dating to the 1800s.