We’re thrilled to announce that Impactstory will be collaborating with James Howison at the University of Texas at Austin on a project to improve research software by helping its creators get proper credit for their work. The project will be funded by a three-year, $635k grant from the Alfred P. Sloan Foundation.
Research software is an essential component of modern science. But the tradition-bound scholarly credit system does not appropriately reward the unsung academic heroes who create research software, putting further development of software-intensive science in jeopardy. Even when software is mentioned, the mentions are often informal, such as URLs in footnotes or bare names in the text. Howison, working with doctoral student Julia Bullard, found that 63% of mentions in a random sample of 90 biology articles were informal (Howison and Bullard, 2014).
We’re going to help fix that.
We’ll be working with James and his lab to build a huge database of every research software project used in every paper in the biomedicine, astronomy, and economics literatures. This database will be filled in by a deep learning system that automatically extracts both formal and informal mentions of software, after being trained on a large, manually coded gold-standard dataset.
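To make the extraction task concrete, here’s a toy Python sketch that flags the two mention styles with simple pattern heuristics. Everything in it (the regexes, the seed list of package names, the example sentence) is a made-up illustration of the task, standing in for the trained deep learning tagger, not a preview of the actual pipeline.

```python
import re

# Toy illustration of the extraction task. The real system will be a
# deep learning tagger trained on a manually coded gold-standard corpus;
# these heuristics only show what "formal" vs. "informal" mentions look like.

FORMAL_CITATION = re.compile(r"\(([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4})\)")
INFORMAL_URL = re.compile(r"https?://\S+")
KNOWN_SOFTWARE = {"BLAST", "Bowtie", "ggplot2", "NumPy"}  # hypothetical seed list

def extract_mentions(sentence: str) -> list[tuple[str, str]]:
    """Return (mention, kind) pairs found in one sentence of paper text."""
    mentions = [(m.group(1), "formal") for m in FORMAL_CITATION.finditer(sentence)]
    for m in INFORMAL_URL.finditer(sentence):
        mentions.append((m.group(0).rstrip(".,;)"), "informal"))
    for token in re.findall(r"[A-Za-z0-9+-]+", sentence):
        if token in KNOWN_SOFTWARE:
            mentions.append((token, "informal"))
    return mentions

print(extract_mentions(
    "Reads were aligned with Bowtie (Langmead et al., 2009); "
    "scripts are at https://example.org/lab-code."
))
# [('Langmead et al., 2009', 'formal'),
#  ('https://example.org/lab-code', 'informal'),
#  ('Bowtie', 'informal')]
```

In the actual project, the gold-standard dataset plays the role that the seed list plays here: human coders label mentions so the model can learn patterns far subtler than these.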
We’ll use this database to build and study three cool prototype tools:
- CiteSuggest will analyze submitted text or code and make recommendations for normalized citations using the software author’s preferred citation (a rough sketch of the idea follows this list),
- CiteMeAs will help software producers make clear requests for their preferred citations, and
- Software Impactstory will help software authors demonstrate the scholarly impact of their software in the literature.
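As a rough sketch of the CiteSuggest idea mentioned above, here’s what analyzing submitted code for citable packages might look like. The preferred-citation registry below is invented for illustration; the real tool’s data sources, matching logic, and output format are part of what the project will figure out.

```python
import ast

def imported_packages(source: str) -> set[str]:
    """Collect the top-level package names a piece of Python code imports."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            packages.add(node.module.split(".")[0])
    return packages

# Invented stand-in for a registry of author-preferred citations.
PREFERRED_CITATIONS = {
    "numpy": "Harris et al. (2020). Array programming with NumPy. Nature 585.",
}

for pkg in sorted(imported_packages("import numpy as np\nfrom scipy import stats")):
    print(pkg, "->", PREFERRED_CITATIONS.get(pkg, "no preferred citation on file"))
# numpy -> Harris et al. (2020). Array programming with NumPy. Nature 585.
# scipy -> no preferred citation on file
```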
We believe these tools will help transform the scholarly reward system into one where software is a first-class research product and its authors get full academic credit for their work. This in turn will support the software-intensive open science system we need for the future.
The project will build on our experience creating Depsy, a platform to track the scholarly impact of Python and R packages with an emphasis on dependencies, and on James’ extensive experience researching development in open source software and software in science. For lots more detail on the whole thing, check out the submitted proposal (edit Nov 9, 2016: note that this document is not a complete representation of the proposal, since the application and approval process also involved confidential back-and-forth with reviewers. The reviewers added great comments and insight that we’re incorporating into the work as we go forward.)
Thank you, Sloan. Thanks to Program Director Josh Greenberg for his continued advice and encouragement, and to the grant reviewers for well-informed and helpful feedback. And thanks especially to James, who had this idea in the first place, brought us on board, and has been a patient, good-natured, and ingenious collaborator in a lot of hard work already. We can’t wait to get started!
That’s great news!
This will be really useful for people (like myself) who have been looking for a decent “research software” dataset to use for data analysis to inform policy.
Will you be seeking to do any form of longitudinal tracking (e.g. first/last mention in a paper) or availability categorisation – perhaps as part of SoftwareImpactStory (e.g. flagged as no longer maintained)?