Crowd-sourced Curation and Publication of Special Collections Materials

This is the last in a series of posts about the teams who will be attending the Institute in November, and their projects. This was submitted by Josh Sosin.

Let’s imagine a student has been working on his Coptic, getting good, and looking for a short research project. He discovers that the Rubenstein Library at Duke owns a fragment of I Kingdoms in Sahidic. He sits down with the original in the reading room, takes careful notes on its physical and palaeographic features, transcribes the text, collates it against the textual tradition, and leaves at closing time. Later, he discovers that the fragment is our earliest witness to Sahidic I Kingdoms, predating the next oldest witness by half a millennium! What’s more, it shows remarkably little difference from the later text, suggesting a very stable tradition and transmission, entirely out of keeping with scholarly consensus. He writes it up: A. Butts, “P.Duk.inv. 797 (U) – I Kingdoms 14:24-50 in Sahidic,” Le Muséon 118 (2005) 7-20. The discovery is modest, but important. The discipline re-factors what it thinks it knows about I Kingdoms in light of the new find. Scholars regroup.

But, as typically happens, the patron took notes offline, transcribed and collated the text offline, dated the text on the basis of palaeographic comparanda offline, and published his findings offline. The scholarly workflow that generated the data, produced the findings, and communicated both to the wider community does not touch the library until the library receives the journal in which the findings are published; even then, the new information may not affect local intellectual control of the object. This is a missed opportunity, and it is also the historical pattern: patrons have entered special collections libraries, transcribed, translated, contextualized, and annotated materials, and then walked away knowing, in some cases, more about the materials than the libraries themselves do. But thanks to a wide variety of crowd-sourcing tools and practices, libraries are now in a position to support more of that scholarly workflow, bringing more of the results back into the curatorial fold and sharing them with a wider audience than most specialized scholarly publications tend to reach.

This SCI group brings together a diverse team of librarians, digital humanists, faculty, and programmers to ask what it would take to:

  • pilot an instance of FromThePage, a free, open-source, lightweight transcription, translation, and annotation tool
  • develop undergraduate and graduate classes that focus on scholarly ‘publication’ of special collections materials, including workflows that let students and scholars add surrogates of original documents they have digitized in the field, for scholarly curation by students, scholars, and other patrons
  • publish the textual content of these materials in an open, online, free, Duke University Libraries branded venue
  • integrate content with Duke University Libraries digital exhibits workflows, with a view to creating educational mechanisms and vehicles for translating complicated disciplinary materials for a mass audience
  • build workflows that allow libraries to pull crowd-generated knowledge back into local repositories, catalog records, finding aids, etc.

In other words, we mean to ask what it will take to allow future patrons to transcribe, translate, annotate, and ‘pre-publish’ special collections materials in real time, on a Duke-hosted platform; to open results to peer review; to feed enhancements back into local library controls; to allow others in turn to annotate, emend, and improve these findings; to feed the cumulative results into a sustainable repository of Duke University Libraries digital exhibit materials; and to grow and sustain this entire scholarly ecosystem via a locally hosted environment. Such an environment would help transform the owning institution from data host (here are some materials) into knowledge cultivator (here is a place in which our ever-growing, ever-changing knowledge about these materials is made), and so make it the technical and intellectual hub for scholarly communication around its precious sources.

[Screenshot: the Brumfield diary in FromThePage]

The members of the group are:

  • Ryan Baumann, Duke Collaboratory for Classical Computing; has been prototyping FromThePage amateur transcription tooling for use cases like the one proposed here; longtime developer of papyri.info, which is a multi-author transcription and editing tool for ancient papyrological texts.
  • Meg Brown, E. Rhodes and Leona B. Carpenter Foundation Exhibits Librarian, Duke University Libraries; the exhibitions program (physical spaces, often with a virtual counterpart) includes library-created content, but increasingly features faculty- and student-curated exhibitions that showcase library materials and/or University scholarship. The exhibits program teaches students and faculty how to tell their scholarly stories to a mass audience.
  • Hugh Cayless, Duke Collaboratory for Classical Computing; has been prototyping FromThePage amateur transcription tooling for use cases like the one proposed here; longtime developer of papyri.info, which is a multi-author transcription and editing tool for ancient papyrological texts.
  • Noah Huffman, Archivist for Metadata and Encoding, Rubenstein Library; one of the more complicated design considerations will be crowd-sourcing metadata and feeding it back into local materials, since metadata are inherently more prone to conflict than transcription data are.
  • Liz Milewicz, Head, Digital Scholarship Services, Duke University Libraries.
  • Josh Sosin, Duke Collaboratory for Classical Computing; Associate Professor of Classical Studies and History; Co-Director of the Duke Databank of Documentary Papyri (DDbDP); Associate Editor of Greek, Roman, and Byzantine Studies; an epigraphist and papyrologist interested in the intersection of ancient law, religion, and the economy.