An Analytical Attribution Framework

This is the fifth and final in a series of posts about each of the teams that will be attending SCI 2015, and their projects. This one was submitted by Christopher Blackwell.

What are the goals of your project and how do they fit the theme of this year’s Institute?

Attributing and valuing scholarship was easy when scholarship was monographic and communities of scholars were small. It was easy to attach an author to a citation when the author was universally known (“Aristotle”, “Linnaeus”) and the citation pointed to a clearly defined, grossly granular publication, one of relatively few to have emerged in a given year. Scholarship has always emerged from collaboration, but in a rigid hierarchy it was easy to collapse a team of researchers to a single named authority.

When scholarship was largely monographic, attribution could remain monolithic: a name or one-dimensional list of names attached to a work. Just as scholarship supported one primary and straightforward method of interaction—reading—attribution likewise supported a relatively simple number of methods: credit for authorship, ranking in a list of authors, a multiplier based on the perceived value of the work.

Detail from the Venetus A manuscript, showing Iliad 3.1-9

The promise of digital scholarship lies in the potential for synthesis and analysis, a much richer body of operations that may extract more meaning and prompt more insight from a given body of data. We wonder if approaches to synthesis and analysis that have proven fruitful for our own research might also be fruitful ways of approaching how we credit and value contributions to that research. We have all encountered problems of attributing and valuing authorship in situations like:

  • many editors producing a single edition of a text,
  • a group of developers contributing to a single software project,
  • many editors indexing or commenting on a body of data (texts, images, &c.),
  • scholars producing complementary analyses of a given text, (that is, one scholar produces a syntactic analysis, and one a semantic analysis),
  • scholars producing exclusive analyses of the same data (that is, one scholar analyzes syntax one way, another analyzes it another way).

In each case, while it is possible to attach “authorship” to individual pieces of work—lines of code or XML, individual indexed relationships, individual analysis—it is extremely difficult to quantify the significance of each author’s contribution:

  • Editor A enters an initial OCR text of the Iliad to a GitHub repository, thus “contributing” 15,000 lines; Editor B meticulously documents variants, receiving credit for only 75 lines.
  • Author N writes a short algorithm that is called innumerable times throughout the execution of a piece of software; it is brilliant because it consists of only 12 lines of code.
  • Scholar A captures the syntax of a complex sentence in Thucydides; Scholar B builds on that initial analysis, making it better. It would be desireable to capture Scholar B’s debt to Scholar A, and the extent to which the two analyses differ.

Who is on your team, and what are you hoping they will contribute to the project?

The members of this proposed working group have extensive experience applying innovative approaches to analysis, for topics that require collaborative effort across diverse data, often under conditions of uncertainty:

  • documenting conflicting interpretations of damaged text-bearing artifacts,
  • integrating various kinds of image-data for recovering lost text,
  • exploring the intersection of syntactic and semantic graphs of texts,
  • associating metadata with texts and data-structures at differing levels of granularity,
  • capturing iterative analyses of corpora undergoing collaborative editing,
  • aligning diverse data across generic and chronological axes,
  • building learning portfolios to track specific performance in the acquisition of a foreign language.

Our team members are:

  • Bridget Almas, the lead software developer and architect for the Perseus Digital Library
  • Christopher W. Blackwell, Project Architect for the Homer Multitext and co-developer (with Neel Smith) of the Canonical Text Service Protocol
  • Francesco Mambrini, research fellow at the Digital Humanities Department (IT-Referat) of the German Archaeological Institute
  • Ségolène Tarte, senior research fellow at the University of Oxford’s e-Research Centre
  • Gabriel A. Weaver, a Research Scientist at the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign

What do you look forward to most from SCI, and what do you hope to accomplish through the Institute?

All the members of our team have talked about these issues, and possible technological solutions, for years, in various ad hoc conversations. The Institute will give us the most valuable opportunity of dedicated in corpore time and space to align our individual ongoing work and technologies with our shared goal of flexible, expressive, machine-actionable attribution and evaluation.

We are also looking forward to the opportunity to share ideas with, gather criticism from, and face the need of clear presentation to the larger group that will gather in October.

Venetus A

What are your plans for next steps after the Institute this fall?

We are all engaged in collaborative work that could immediately serve as test-beds for ideas about analytical attribution. Blackwell’s work on historical botany, for example, continues to engage undergraduate students from across disciplines, over relatively short periods of time, contributing diverse and very specific data to an evolving digital library: taxonomic indexing, medical commentary, historical essays, transcriptions of letters, compilation of geo-spatial data, photography. Almas, in her work on Perseids has an immediate need and audience for innovative approaches to citation of complex and evolving analyses. Mambrini’s research at the DAI is likewise focused on analysis of texts and meta-analysis of scholarly interpretation.

Weaver notes that an analytical framework for attribution would be immediately useful in the domain of computer science as a discipline and within industry. Currently, there is demand among practitioners to be able to search, retrieve, and measure the evolution of multiple versions of security policies and compliance reports over time.

The proposed discussion for this SCI workshop can have strong impacts on the current activities of the Deutsches Archäologisches Institut, and any conclusions that we reach would be welcomed contributions to any of the ongoing outreach initiatives of the Institute, e.g. the Digital Classicist Berlin series of symposia. Mambrini is a co-chair of the conference on “Corpus-Based Research in the Humanities” (next held in Warsaw in December 2015), whose participants would be specifically interested in this topic.

Is there anything else you’d like to say about your project or participation in the Institute?

The Scholarly Communication Institute represents an opportunity that is all too rare: a space for forward-looking conversation among scholars from different disciplines. We are excited at the prospect, and honored to have been invited.

[ Image credits: and ]