Collecting and analyzing usage data for online scholarly publications

This is the first in a series of posts about each of the teams that will be attending SCI 2015, and their projects. This one is adapted from the text of the proposal submitted by Kevin S. Hawkins.


Digital technologies have made it relatively easy and inexpensive for a broad range of traditional and new publishing entities to produce quality scholarship in digital forms. Unfortunately, aspirant digital publishers have encountered significant problems in collecting and analyzing usage data about such publications. A key challenge is that usage data is often spread across more than one platform, including internal systems (such as a publisher’s institutional repository or other local web publishing infrastructure), vendor-hosted solutions (such as bepress Digital Commons), and third-party platforms that include open-access content (such as HathiTrust, the OAPEN Library, the author’s institutional repository, a disciplinary repository, and the Internet Archive). Aggregated usage data allows publishers to assess the impact of their online publications, to make strategic and business decisions about their publication operations, and to share the data with their authors. Furthermore, usage data helps publishers demonstrate their value to funders and administrators and to potential authors skeptical of online, especially open-access, publishing.

Streamgraph example

To give a concrete example, Writing History in the Digital Age is a volume of essays subjected to open peer review and commenting on a website hosted by Trinity College. Subsequent to its availability on the Trinity College site, the University of Michigan Press (U-M Press) published the book as part of its digitalculturebooks series, making it freely available to read online and available for purchase in print and as an e-book. Unfortunately, this desirable increase in access has made it difficult for the editors, contributors, or even U-M Press to assess how the book is used in its various editions, given the siloization of data in each platform. At the present time, Jack Dougherty, one of the co-editors, collects usage data from the Trinity College site using Google Analytics, and U-M Press collects usage data from its platform for the free online version using a combination of Google Analytics and spreadsheets of data from a homegrown usage statistics system. But there is no practical way for U-M Press to combine its usage data from its two sources, much less with the data from the editor, to see which version of the book readers prefer and whether they buy a copy of the book after exploring one of the free versions. Similarly, if either co-editor or a contributor wishes to demonstrate the impact of their work to a promotion & tenure committee, they would need to request the data from all of the sources and go through the difficult and time-consuming process of compiling it, including aligning the incompatible data recorded by these tools.
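To make the alignment problem concrete, here is a minimal Python sketch of what compiling data from two such sources might involve. Everything in it is an assumption made for illustration: the field names, the date formats, and the sample rows are hypothetical and do not reflect the actual exports from Google Analytics, the Trinity College site, or the U-M Press system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical export from a web analytics tool (all values are strings).
analytics_rows = [
    {"date": "2015-03-01", "pagePath": "/book/chapter-1", "pageviews": "12"},
    {"date": "2015-03-02", "pagePath": "/book/chapter-1", "pageviews": "7"},
]

# Hypothetical spreadsheet rows from a homegrown statistics system.
homegrown_rows = [
    {"day": "03/01/2015", "item": "chapter-1", "hits": 5},
]


def normalize_analytics(row):
    """Map a hypothetical analytics row onto a shared record shape."""
    return {
        "day": date.fromisoformat(row["date"]),
        "item": row["pagePath"].rstrip("/").split("/")[-1],
        "views": int(row["pageviews"]),
        "source": "analytics",
    }


def normalize_homegrown(row):
    """Map a hypothetical homegrown row (MM/DD/YYYY dates) onto the same shape."""
    month, day, year = (int(part) for part in row["day"].split("/"))
    return {
        "day": date(year, month, day),
        "item": row["item"],
        "views": int(row["hits"]),
        "source": "homegrown",
    }


def combine(records):
    """Group counts by (item, day), keeping each source's count separate."""
    totals = defaultdict(lambda: defaultdict(int))
    for record in records:
        totals[(record["item"], record["day"])][record["source"]] += record["views"]
    return {key: dict(by_source) for key, by_source in totals.items()}


records = [normalize_analytics(r) for r in analytics_rows]
records += [normalize_homegrown(r) for r in homegrown_rows]
combined = combine(records)

for (item, day), by_source in sorted(combined.items()):
    print(item, day.isoformat(), by_source)
```

The sketch keeps each source's count separate rather than summing them, since a "hit" in one system and a "pageview" in another may not measure the same thing. Deciding when such figures can be compared is part of the difficulty described above.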

In response to the situation exemplified by the above scenario, this working group will do the following:

  1. Look at examples of usage data reports from platforms that produce reports conforming to the COUNTER and related PIRUS standards (which were designed with libraries and content managers in mind) and at the user interfaces of and reports produced by the web analytics tools Google Analytics and Piwik to see:
    • which data is useful to authors and publishers of scholarly literature
    • what kind of data is missing but important for authors and publishers to know
  2. Formulate a set of functional requirements for the study of usage data by authors and scholarly publishers.
  3. Create prototypes of a user interface and usage reports (both inspired by the web analytics tools examined) that a tool to collect and analyze usage data for scholarly publications would provide.

Our working group includes scholars and staff at various organizations with experience publishing works of scholarship online on more than one platform. Many of them have struggled to collect and analyze usage data on their publications and can speak to the sort of data that would be most useful to authors and their publishers, as well as other stakeholders such as funders or administrative leadership at hosting institutions. It also includes an expert on bibliometrics and altmetrics, two methods of quantifying the impact of a work of scholarship through its citations and other mentions online. We will explore ways of including these metrics with data about usage to provide a fuller picture of the total impact of works of scholarship.

Composition of the working group

Kevin S. Hawkins is director of library publishing at the University of North Texas Libraries, where he has established a new scholarly publishing service that complements the UNT Press. Previously, he spent ten years with Michigan Publishing, which includes the University of Michigan Library and Press, whose publications are available on various platforms that produce incompatible usage data. He also currently serves as president of the board of the Library Publishing Coalition.

Sarah V. Melton is digital projects coordinator at the Emory Center for Digital Scholarship at Emory University, where she coordinates the open-access program. She is also a practicing scholar, completing her PhD at Emory University and serving as digital publishing strategist for Southern Spaces and on the editorial board of the Atlanta Studies Network.

Lucy Montgomery is director of the Centre for Culture and Technology at Curtin University and deputy director of Knowledge Unlatched, a non-profit organization piloting a new approach to funding open access monographs. KU’s pilot collection is available on more than one platform, and Montgomery has been closely involved in studying the usage of the pilot collection.

Lisa Schiff is technical lead of the Publishing Group of the California Digital Library. She is responsible for ensuring that CDL’s current and future programs and services related to publishing are as effective and robust as possible. She is also contributing to a Mellon-funded project at CDL and the UC Press to develop a web-based open-source content management system and workflow management system to support the publication of open-access monographs in the humanities and social sciences. She is a member of the editorial board of the Journal of Librarianship and Scholarly Communication and is co-chair of the ORCID Business Steering Group.

Rodrigo Costas is a researcher at the Centre for Science and Technology Studies at Leiden University. He studies bibliometrics and altmetrics and is interested in the conceptual and empirical differences between altmetrics and usage indicators.

Plans for beyond the SCI

Analytics icon

The working group will produce documents that could be used to guide development of a tool for publishers of online scholarship—including university presses, libraries, and digital scholarship centers—to collect, analyze, and share usage data and altmetrics regarding their publications. We will make these freely available online and seek input from the wider community after the SCI.

Some of the working group members are already seeking funding to develop such a tool, so the final set of documents will also serve to demonstrate to potential funders that an extensive planning phase has already taken place.

[ Streamgraph image by used under CC license. Analytics image by used under CC license. ]