Greg Leppert
leppert.me
Working on AI and access to knowledge at Harvard. Executive Director of the Institutional Data Initiative; Chief Technologist of the Berkman Klein Center.
Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details. arxiv.org/abs/2511.11447
GRIN Transfer: A production-ready tool for libraries to retrieve digital copies from Google Books
Publicly launched in 2004, the Google Books project has scanned tens of millions of items in partnership with libraries around the world. As part of this project, Google created the Google Return Inte...
November 20, 2025 at 4:42 PM
We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. www.institutional.org/tools
Institutional Books | Institutional Data Initiative
Institutional Books 1.0 is our first release of public domain books. This set was originally digitized through Harvard Library’s participation in the Google Books project.
November 20, 2025 at 4:42 PM
That's why we built GRIN Transfer: a tool for downloading collections, big or small. GRIN Transfer handles request batching, failure recovery, and data aggregation so that libraries can focus on using the data rather than simply gaining access to it. www.institutional.org/posts/grin-t...
Announcing the release of GRIN Transfer
GRIN Transfer, an open source tool that allows Google Books partner libraries to more easily access their Google Books collection.
November 20, 2025 at 4:42 PM
We learned this lesson over the months it took to download 1M of Harvard Library's books for our Institutional Books release. As a result, many libraries have yet to take full advantage of the wonderful resources GRIN provides.
November 20, 2025 at 4:42 PM
The @institutionaldatainitiative.org at Harvard works with knowledge institutions to increase the availability, diversity, and responsible use of training data for AI. Reach out and join us.
March 12, 2025 at 1:23 PM
Our goal is to develop methods and tools that can support expert staff at libraries everywhere, increasing the breadth of materials that can be digitized and the speed at which they’re made accessible to the public. Learn more at BPL: www.bpl.org/news/boston-...
Boston Public Library Expands Access to Collections Through AI-Enhanced Digitization
BOSTON, MA – March 12, 2025 - The Boston Public Library (BPL) is launching a large-scale digitization project to unlock hundreds of thousands…
March 12, 2025 at 1:23 PM
Together, we’ll research opportunities to generate machine-readable representations of items, add searchable metadata, and begin the structuring of entire collections—all at the moment each item leaves the imaging station.
March 12, 2025 at 1:23 PM
IDI and BPL are working to change this by collaborating at the outset of a large digitization project, exploring how AI might complement human expertise and strengthen the process in its earliest stages.
March 12, 2025 at 1:23 PM
BPL is embarking on a new initiative to digitize hundreds of thousands of historic items. Conventional approaches at this scale force an impossible choice: sacrifice depth for breadth or drastically limit what gets digitized. AI tools can help, but they’re usually relegated to the end of the process.
March 12, 2025 at 1:23 PM
With our digitization at Harvard Law School Library, we'll work to increase access to unique collections, such as the Supreme Court Records and Briefs that are critical to understanding decision-making at the highest U.S. court yet remain largely inaccessible.
March 5, 2025 at 3:36 PM
If you're part of a library, university, or other knowledge institution and interested in working with a team of data scientists to refine and publish your data, we'd love to chat. And if you're a data scientist or community builder interested in working with institutions, we're hiring.
March 5, 2025 at 3:36 PM
IDI is building a collection of large, impactful, and widely available datasets to increase AI’s accessibility and diversity while reaffirming institutions as stewards of knowledge.
March 5, 2025 at 3:36 PM