Thankful for Repositories and OA advocates

It’s American Thanksgiving this week, and we sure are thankful. We’re thankful for so many people and what they do — those who fight for open data, those who release their software and photos openly, the folks who ask and answer Stack Overflow questions, the amazing people behind the Crossref API… the list is long and rich.

But today I want to shout out a special big thank you to OA advocates and the people behind repositories. Without your early and continued work, it wouldn’t be true that half of all views of scholarly articles are to articles that have an OA copy somewhere, and, even better, that we project this share to reach 70% of views within five years. That changes the game: for researchers and the public looking for papers, and for the whole scholarly communication system, as we rethink how publishing is paid for in the years ahead in ways that make it more efficient and equitable.

I gave the closing keynote at Open Repositories 2019 this year, and my talk highlighted how the success of Unpaywall is really the success of all of you, and how institutional repositories are set to be even more impactful in the years ahead. It’s online here if you want to see it. We mean it.

Thank you.

New: Open crowdsourced list of Society Journals

Unpaywall Journals needed data on whether a given journal is associated with an academic society, to help inform librarians in their subscription decisions. Alas, there was no open source for this information.

There is now! Thanks to 60+ contributors over the last week, all Elsevier and Wiley journals have now been annotated with whether or not they are a society journal. Many also have the society name itself listed in the notes.

We are releasing this dataset CC0 in its Google Spreadsheet now, and will clean it up and host it in a stand-alone API endpoint in the coming weeks. It has already been pulled into Wikidata! Others are welcome and encouraged to use it however they’d like 🙂
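If you want to start working with the list right away, a couple of lines of Python are enough to pull the published sheet as CSV. This is just a sketch: the sheet ID, tab name, and column name below are placeholders, so substitute the values from the actual spreadsheet linked above.

```python
# Minimal sketch: load the crowdsourced society-journal list into pandas.
# SHEET_ID, SHEET_NAME, and the column name are placeholders; use the
# values from the real Google Spreadsheet linked in the announcement.
import pandas as pd

SHEET_ID = "YOUR_SHEET_ID"   # hypothetical
SHEET_NAME = "Sheet1"        # hypothetical tab name

csv_url = (
    f"https://docs.google.com/spreadsheets/d/{SHEET_ID}"
    f"/gviz/tq?tqx=out:csv&sheet={SHEET_NAME}"
)

journals = pd.read_csv(csv_url)

# For example, count how many journals are flagged as society journals.
print(journals["is_society_journal"].value_counts())
```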

Thanks so much to all of these contributors, some of whom annotated hundreds of journals:

  • Lauren Maggio
  • Eamon Costello
  • Hugo Gruson
  • Heather K Moberly
  • Sofie Wennström
  • josmel pacheco-mendoza
  • Kate O’Neill
  • Stefanie Haustein
  • Lisa Matthias
  • Kathryn Pelland
  • Camilla Lindelöw
  • Amanda Whitmire
  • Iara Vidal
  • Raquel Donahue
  • Sam Teplitzky
  • Steffi Grimm
  • Marianne Gauffriau
  • Anonymous Dinosaur Librarian (> 60 and still bringing it!)
  • Maximilian Heimstädt
  • Kendra K. Levine
  • Ranti Junus
  • Nicki Clarkson
  • KT Vaughan
  • Sarah Severson
  • Christie Hurrell
  • Philipp Zumstein
  • Lucy Carr Jones
  • Emma U.
  • Chris Rusbridge
  • Diana Wright
  • Biljana Kosanovic
  • Milica Sevkusic
  • Patricia Brennan
  • Emilio M Bruna
  • Bevan S Weir
  • Irene Barbers
  • Oskia Agirre
  • Sarah R. O. Santos
  • Olivier Pourret
  • Phil Gooch
  • Frédérique Bordignon
  • Jackie Proven
  • Tobias Steiner
  • Eleanor Colla
  • Aidy Weeks
  • George Matsumoto
  • Egon Willighagen
  • Rob Hooft
  • Iseult Lynch
  • Andrew Gray
  • Heather Lang
  • Ethan White
  • Sarah Steele Cabrera
  • Didier Torny
  • Bruce Caron
  • Eleta Exline
  • Teresa Schultz
  • Christy Caldwell
  • Richard Abdill
  • Anthony Hamzah
  • Marc Couture

This was a great community push; the resulting list belongs to all of us, and we’re sure thankful.

Unpaywall Journals — helping librarians get more value from their serials budget

We’re thrilled to announce a new product:

Unpaywall Journals is a data dashboard that combines journal-level citations, downloads, Open Access statistics and more, to help librarians confidently manage their serials collections.

Learn more, join the announcement list, and help spread the word.

It’s going to be big.

Update: In May 2020 we changed the name of Unpaywall Journals to Unsub.


Green OA lag

Ok I know for maximum impact we should probably spread all these blog posts out over multiple days, but I’m way too eager to share — I think people interested in Green OA will be really interested in this, I know I am.

It’s from the supplementary information section of the preprint, Section 11.1:

In the figure below we plot the number of Green OA papers made available each year vs. their date of publication. The first plot is a histogram of the number of papers made available each year (one row for each year).

The next plot is the same, but superimposes the articles made available in previous years. This stacked area represents the total cumulative number of Green OA papers available in that year; if you were in that year wondering what was available as Green OA, that’s what you’d find.

The third plot is a larger version of the availability as of 2018, showing how availability accumulates. It lets us appreciate that fewer than half of the papers published in, say, 2015 were made available that same year; most were made available in subsequent years. The fourth plot shows a single slice in isolation, for clarity: the Green OA availability of articles with a publication date of 2015.

Again, this last plot shows when articles published in 2015 were actually made available in repositories. As you can see at the bottom of the stacked bar, a few articles published in 2015 were actually posted in a repository in 2014. Those are preprints. A lot of articles published in 2015 appeared in a repository in 2015, but even more had a delay and didn’t appear in a repository until 2016. A full 40% of articles had an OA lag of more than a year, including some with an OA lag of four years!
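If you want to reproduce this kind of tally from repository data, the bookkeeping is simple. Here is a minimal sketch; the (publication year, deposit year) pairs are invented example data, not our dataset, and the real analysis is described in the paper.

```python
# Minimal sketch: OA lag for one publication year, from invented example data.
# Each record is (year_published, year_first_available_in_a_repository).
from collections import Counter

records = [
    (2015, 2014), (2015, 2015), (2015, 2015), (2015, 2016),
    (2015, 2016), (2015, 2017), (2015, 2019),  # illustrative only
]

pub_year = 2015
lags = [deposited - published for published, deposited in records
        if published == pub_year]

lag_histogram = Counter(lags)   # a lag of -1 means a preprint posted before publication
share_lag_over_one_year = sum(1 for lag in lags if lag > 1) / len(lags)

print(lag_histogram)
print(f"share with OA lag > 1 year: {share_lag_over_one_year:.0%}")
```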

More details on data collection are in the paper — just wanted to dig this out of Supplementary Information so that fellow nerds who’d enjoy this data don’t miss it 🙂

The Future of OA: what did we find?

Here are some of the key findings from the recent preprint on the Future of OA:

  • By 2025 we predict that 70% of all article views will be to articles available as OA — only 30% of article view attempts will be to content available only via subscription.
    • This compares to 52% of views available as OA right now, so it’ll be a big change in the next five years.
  • The numbers of Green, Gold, and Hybrid articles have been growing exponentially, and growing faster than Delayed OA or Closed access articles:
    • articles by year of observation, with exponential best fit line:
  • The average Green, Gold, and Hybrid paper receives more views than its Closed or Bronze counterpart, particularly Green papers made available within a year of publication.
    • views per article, by age of article:
  • Most Green OA articles become OA within their first two years of publication, but there is a long tail.
    • articles made newly Green OA in each of the last four years, histograms by year of publication:
  • One interesting realization from the modeling we’ve done is that when the proportion of papers that are OA increases, or when the OA lag decreases, the total number of views increases: the scholarly literature becomes more heavily viewed and thus more valuable to society. This is intuitive, but could be explored quantitatively in future work using this model or ones like it (a toy sketch of the effect follows this list).
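To make that effect concrete, here is a toy sketch. All the numbers (per-article view rates, article counts, OA shares) are invented for illustration; the actual model in the paper is fit to the data described above.

```python
# Toy illustration: if OA articles are viewed more than closed ones, then
# raising the OA share raises total views. All numbers are invented.
def total_views(n_articles, oa_share, views_per_oa=10, views_per_closed=6):
    n_oa = n_articles * oa_share
    n_closed = n_articles * (1 - oa_share)
    return n_oa * views_per_oa + n_closed * views_per_closed

for oa_share in (0.31, 0.44, 0.70):
    print(f"OA share {oa_share:.0%}: {total_views(1_000_000, oa_share):,.0f} total views")
```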

Anyway, there are more findings too, but those are some of the main ones.

New perspective for OA: Date of Observation

We’d like to share one of the fun parts of our recent preprint. It’s fun because the concept of Date of Observation helps to untangle issues around embargoes — and also because we think we came up with a neat way to explain what is otherwise a fairly complicated concept, and hopefully make it accessible to everybody.

See what you think — here is our description of the Date of Observation, from section 3.3 of the preprint:

Let’s imagine two observers, Alice (blue) and Bob (red), shown by the two stick figures at the top of the figure:

Alice lives at the end of Year 1; that’s her “Date of Observation.” Looking down, she can see all 8 articles (represented by solid colored dots) published in Year 1, along with their access status: Gold OA, Green OA, or Closed. The Year of Publication for all eight of these articles is Year 1.

Alice likes reading articles, so she decides to read all eight Year 1 articles, one by one.

She starts with Article A. This article started its life early in the year as Closed. Later that year, though, after an OA Lag of about six months, Article A became Green OA as its author deposited a manuscript (the green circle) in their institutional repository. Now, at Alice’s Date of Observation, it’s open! Excellent. Since Alice is inclined toward organization, she puts Article A in a stack of Green articles she’s keeping below.

Now let’s look at Bob. Bob lives in Alice’s future, in Year 3 (i.e., his “Date of Observation” is Year 3). Like Alice, he’s happy to discover that Article A is open. He puts it in his stack of Green OA articles, which he’s further organized by date of publication (it goes in the Year 1 stack).

Next, Alice and Bob come to Article B, which is a tricky one. Alice is sad: she can’t read the article, and places it in her Closed stack. Unbeknownst to poor Alice, she is a victim of OA Lag, since Article B will become OA in Year 2. By contrast, Bob, from his comfortable perch in the future, is able to read the article. He places it in his Green Year 1 stack. He now has two articles in this stack, since he’s found two Green OA articles in Year 1.

Finally, Alice and Bob both find Article C is closed, and place it in the Closed stack for Year 1. We can model this behavior for a hypothetical reader at each year of observation, giving us their view on the world, and that’s exactly the approach we take in this paper.

Now, let’s say that Bob has decided he’s going to figure out what OA will look like in Year 4. He starts with Gold. This is easy, since Gold articles are open immediately upon publication, and publication date is easy to find from article metadata. So, he figures out how many articles were Gold for Alice (1), how many in Year 2 (3), and how many in his own Year 3 (6). Then he computes percentages, and graphs them out using the stacked area chart at the bottom of the figure. From there, it’s easy to extrapolate forward a year.

For Green, he does the same thing, but he makes sure to account for OA Lag. Bob is trying to draw a picture of the world every year, as it appeared to the denizens of that world. He wants Alice’s world as it appeared to Alice, and the same for Year 2, and so on. So he includes OA Lag in his calculations for Green OA, in addition to publication year. Once he has a good picture from each Date of Observation, and a good understanding of what the OA Lag looks like, he can once again extrapolate to find Year 4 numbers.
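For the programmatically inclined, Bob’s bookkeeping might look roughly like the sketch below. It assumes simplified, invented records (one deposit year per article, whole years only) and is not the actual code from the paper.

```python
# Sketch of a date-of-observation tally: what an observer at the end of each
# year would see, given (pub_year, oa_type, year_became_oa) records.
# Records are invented; an oa_type of None means the article never became OA.
articles = [
    (1, "gold", 1), (1, "green", 1), (1, "green", 2), (1, None, None),
    (2, "gold", 2), (2, "green", 3), (2, None, None),
    (3, "gold", 3), (3, "green", 3), (3, None, None),
]

def visible_oa(records, observation_year):
    """Count articles an observer at the end of observation_year sees as OA."""
    counts = {"gold": 0, "green": 0, "closed": 0}
    for pub_year, oa_type, oa_year in records:
        if pub_year > observation_year:
            continue  # not yet published from this observer's point of view
        if oa_type is not None and oa_year <= observation_year:
            counts[oa_type] += 1
        else:
            counts["closed"] += 1  # still closed at this date of observation
    return counts

for year in (1, 2, 3):
    print(f"Year {year} observer sees: {visible_oa(articles, year)}")
```

From a series of such yearly snapshots, growth curves can be fit and extrapolated to a future date of observation, which is the essence of the approach described here.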

Bob is using the same approach we will use in this paper, although in practice we will find it to be rather more complex, due to varying lengths of OA Lag, additional colors of OA, and a lack of stick figures.

The Future of OA: A large-scale analysis projecting Open Access publication and readership

We are excited to announce our most recent study has just been posted on bioRxiv:

Piwowar, Priem, Orr (2019) The Future of OA: A large-scale analysis projecting Open Access publication and readership. bioRxiv: https://doi.org/10.1101/795310

This is the largest, most comprehensive analysis ever to predict the future of Open Access. Importantly, we look not only at publication trends but also at *viewership* — what do people want to read, and how much of it is OA?

The abstract is included below; we’ll be highlighting a few of the cool findings in subsequent blog posts, and you can read the full paper here (DOI not resolving yet). All the raw data and code are available, as is our style: http://doi.org/10.5281/zenodo.3474007. Enjoy, and let us know what you think!


Understanding the growth of open access (OA) is important for deciding funder policy, subscription allocation, and infrastructure planning.

This study analyses the number of papers available as OA over time. The model includes both OA embargo data and the relative growth rates of different OA types over time, based on the OA status of 70 million journal articles published between 1950 and 2019.

The study also looks at article usage data, analyzing the proportion of views to OA articles vs views to articles which are closed access. Signal processing techniques are used to model how these viewership patterns change over time. Viewership data is based on 2.8 million uses of the Unpaywall browser extension in July 2019.

We found that Green, Gold, and Hybrid papers receive more views than their Closed or Bronze counterparts, particularly Green papers made available within a year of publication. We also found that the proportion of Green, Gold, and Hybrid articles is growing most quickly.

In 2019:

  • 31% of all journal articles are available as OA
  • 52% of all article views are to OA articles

Given existing trends, we estimate that by 2025:

  • 44% of all journal articles will be available as OA
  • 70% of all article views will be to OA articles

The declining relevance of closed access articles is likely to change the landscape of scholarly communication in the years to come.



Impactstory is now Our Research

Big news: today Impactstory is changing our name! Meet: Our Research!

1. Why the change?

TL;DR: we outgrew our old name and need a new one that fits the broader scope of our work.

We’ve been passionate about Open Science from the beginning. That’s what we both researched as academics. And it’s what brought us together eight years ago, in the impromptu all-night hackathon where we built the first version of Impactstory Profiles. Open Science has been our passion through fast times and slow, fat times and lean. That’s Us.

Because of that we’ve jumped at chances to take on new Open Science infrastructure projects in the last eight years, projects like:

  • Unpaywall, an open index of the world’s Open Access papers,
  • Get The Research, a website to help regular people find, read, and understand research,
  • Depsy (and its yet-unnamed follow-up) to help show the impact of research software,
  • and we’ve got several new projects launching later this year (stay tuned :).

We’ve never seen these as distractions from our mission. We’ve seen them as our mission. And we’ve been thankful to have had the chance to work across several of the schools of Open Science. That’s going to continue in the coming months as we leverage our new ability to fund projects with self-generated revenue. We’re thrilled about this.

However, it does mean that the Impactstory name is becoming increasingly confusing. We love helping folks tell Stories about Impact… but that’s not all we do, and hasn’t been for a while now. So it’s time to change our name to reflect that.

2. Why the Our Research name?

TL;DR: “Research” means what it says. “Our” means we want research to belong to 1) humankind and 2) the academic community.

To answer that question more fully, let’s break the name down into its parts:

Research: The global Research enterprise is what we want to improve. And all research, not just Science (although we do suspect that the term “Open Science” is, while lamentably inaccurate, probably here to stay at this point). 

Our: Of course, our is the possessive form of we. So who’s the “we,” and what does it possess? There are two answers:

Most broadly, we is… everyone. It’s every human who has ever woken up on this rock with a list of unanswered questions and unsolved problems and thought, hey, let’s figure this out. Research is how we figure it out. The “our” is possessive because (we believe) research belongs to all of us, as humans. Knowing is a team sport. Our Research is dedicated to making our research knowledge more open and accessible to our species, because we’re all in this together.

More narrowly (and less grandiosely), we is the academic community: researchers, administrators, librarians, and everyone else working together to create all this new knowledge. We in the nonprofit academic world have our own way of looking at things, a perspective that’s quite different from the profit-driven priorities of the business world. Collaboration with for-profits can be valuable. But we (and a lot of other folks) don’t think for-profits should own our core scholarly infrastructure. We should. The scholarly community. As a mission-driven nonprofit, Our Research works to build our research infrastructure in ways concordant with the shared values of our academic community. A lot of other folks feel the same.

3. What is Our Research trying to do?

TL;DR: we’re about what we’ve always been about: helping to bring about universal Open Science by building open, functional, sustainable infrastructure.

We felt like the new name was a good excuse to sit down and explicitly articulate our core values. There are five. We value:

  • openness: We default to sharing. Our code is open-source and our data is open, too.
  • progress: We seek revolution. We want to transform how scholars share, assess, and reuse research, moving beyond the paper to value all research products.
  • community: We reach out. We’re proud to lead, proud to follow, and proud to work with anyone who shares our values. 
  • pragmatism:  We favor action over words. We make do with what we have, take what we can get. We ship.
  • sustainability: We’re not too proud or pure to hustle for cash; revolutions ain’t free. We’re now financially self-sustaining and aim to stay that way.

We’re so excited to move forward, guided by these values. We’ve got a lot to learn still, and a long long way to go before we reach our goals. But we’re bigger, better-funded, and more motivated than we’ve ever been. We are so, so thankful to everyone who has supported Impactstory for the last eight years. We hope that in the Our Research era we’ll make y’all proud. We’re sure gonna do our best. 

If you’d like to be notified about the cool stuff we’re launching later this year, sign up for our mailing list!

Podcast episode about Unpaywall



I recently had a fun conversation with @ORION_opensci for their just-launched podcast.

The episode is about half an hour long, and covers what @Unpaywall is, who uses it, how it came about, a bit about how it works, thoughts on the importance of #openinfrastructure, the sustainability model, how open jibes with getting money from Elsevier, #PlanS, how to help the #openscience revolution…

Anyway, here’s where you can listen (you can either load it into your Podcast app, or just press “play” on the webpage player):

https://orionopenscience.podbean.com/e/scaling-the-paywall-how-unpaywall-improved-open-access/

(Or here’s the MP3.)

Thanks for having me @OOSP_ORIONPod, it was super fun! And do check out the rest of the episodes as well; they cover great topics.


What should a FAIR checker include?


The Wellcome Trust is considering funding a tool that would report on the FAIR status of research outputs.  We recently responded to their Request for Information with some ideas to refine their initial plan and thought we’d share them here!

a) Include Openness Assessment


We believe the planned software tool should not only assess the FAIRness of research outputs, but also their Openness.  As described in the recent Final Report and Action Plan from the European Commission Expert Group on FAIR Data:  “Data can be FAIR or Open, both or neither. The greatest benefits come when data are both FAIR and Open, as the lack of restrictions supports the widest possible reuse, and reuse at scale.”    

This refinement is essential for several reasons. First, we believe researchers will expect something called a “FAIR assessment” to include an assessment of Openness, and will be confused when it does not, leading to poor understanding of the system. Second, the benefit of openness is clear to everyone, which increases the motivation for researchers to engage with the tool. Third, Wellcome has already done a great job of highlighting the need for openness, so including it makes the tool an incremental addition to that work rather than a different, new set of requirements with an unclear relationship to it. Fourth, an openness assessment tool is needed by the community and would fit very well in the proposed tool, and its anticipated popularity and exposure would help the FAIR assessment gain traction.


b) Require the tool produce Open Data, not just be Open Source

The project brief was very clear that the tool needs to be Open Source, with a liberal license. This is great. We suggest the brief also require that the data provided by the tool be Open Data. Ideally the brief would suggest a license for the data (CC0, or an open database license that facilitates reuse, including commercial reuse) and data delivery specifications. For data delivery we suggest both regular full data dumps and a machine-readable, free, open JSON API that requires minimal registration, is high performing (< 1 second response time), can handle a high concurrent load, has high daily quota limits, and can handle at least a million calls per day across the system.

It could also specify that money could be charged for Service-Level Agreements on the API for institutions that want them, for above-normal quotas on the API, for more frequent data dumps, or similar. This is similar to our Unpaywall open data model, which has worked very well.
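As a concrete point of reference, here is roughly what querying the existing Unpaywall API looks like today. The endpoint and the fields shown (is_oa, best_oa_location) are real Unpaywall API features; the DOI and email address are just examples, and error handling is omitted.

```python
# Example of the kind of free, open JSON API we have in mind, using the
# existing Unpaywall REST API as a reference point. Requires "requests".
import requests

DOI = "10.1038/nature12373"     # any DOI you want to check
EMAIL = "you@example.org"       # Unpaywall asks for a contact email, no signup

resp = requests.get(f"https://api.unpaywall.org/v2/{DOI}", params={"email": EMAIL})
resp.raise_for_status()
record = resp.json()

print(record["is_oa"])                       # True/False: is there an OA copy?
best = record.get("best_oa_location") or {}
print(best.get("url_for_pdf"))               # link to the best OA PDF, if any
```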


c) Pre-ingest hundreds of millions of research objects

The project brief should make it more explicit that the software tool needs to launch with pre-calculated scores/badges for hundreds of millions of research objects. We luckily live in a world where many research objects are already listed in sources like Crossref, DataCite, GitHub, etc. These should be ingested and form the basis of the dataset used by the tool. This pre-ingesting is implicitly needed for some of the leaderboards and aggregations specified by the brief; in our opinion it should be made explicit. It will also allow large-scale calibration of scores, the export of large-scale datasets to support policy research, additional tools, etc., and would assure a high-performing system, which cannot be assured when FAIR assessments are made ad hoc upon request for most products.

(Admittedly, gathering research objects registered in such sources naturally selects for objects that have identifiers and a certain standard and kind of metadata and FAIR level, so it isn’t representative of all research objects; this needs to be considered when using it for calibration.)
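As an illustration of what pre-ingesting from such sources could look like, here is a sketch that pages through the public Crossref REST API using its cursor-based deep paging. DataCite and GitHub would need analogous harvesters; politeness headers, retries, and persistence are omitted.

```python
# Sketch: harvest article metadata in bulk from the Crossref REST API using
# cursor-based deep paging. Stops after a few pages for illustration only.
import requests

BASE = "https://api.crossref.org/works"
cursor = "*"          # Crossref's starting cursor for deep paging
harvested = []

for _ in range(3):    # a real ingest would loop until no items are returned
    resp = requests.get(BASE, params={"rows": 100, "cursor": cursor})
    resp.raise_for_status()
    message = resp.json()["message"]
    items = message["items"]
    if not items:
        break
    harvested.extend(items)
    cursor = message["next-cursor"]

print(f"harvested {len(harvested)} records; first DOI: {harvested[0]['DOI']}")
```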


d) More details on aggregation

The brief doesn’t include enough details on aggregation.  In our opinion aggregation is key.

Aggregation supports context for FAIR metrics and badges (through percentiles etc.), facilitates publicity, inspires change and improvement, and more. Most research objects do not have metadata that supports interesting aggregation right now; datasets are rarely associated with an ORCID or an institution, for example. The RFP should ask proposals to specify how they will facilitate aggregation. We anticipate the proposals will include a combination of automated approaches using metadata (using Crossref and DataCite metadata, and PubMed LinkOut data, to associate datasets with papers, which are themselves associated with ORCIDs, clinical trial IDs, and GRID institutional identifiers), text mining (to associate GitHub links with papers), and methods for CSV uploads to link identifiers to aggregation groups.


e) Include Actionable Steps for immediate FAIR score improvement

The brief should specify that after showing researchers their scores, the tool links them to actionable steps they can take to improve their FAIR and Open Data scores. These could simply be how-to guides: how to put your software on GitHub, how to specify a license for your dataset, how to make your paper Open Access by uploading the accepted manuscript, etc. They should walk the researcher through how to improve their score on existing products, and then immediately recalculate the FAIR score so the researcher can see progress. If this sort of recalculation ability is not built into the design from the beginning, the system design can make it difficult to add later.


f) Open grants process for this RFI

The RFP should give applicants the option to make their proposals public (and encourage them to do so), and the grant reviews should be public. Or it should at least take steps in that direction, in the spirit of incremental improvement on Wellcome’s great Open Research Fund mechanisms.