Major Update to Unpaywall Database

We recently announced major changes to Unpaywall on our Unpaywall Google Group (https://groups.google.com/g/unpaywall) and via email to Unpaywall Premium subscribers. A lot of folks aren’t on the group, so we’re announcing the changes here as well.


TL;DR
Unpaywall has migrated to a new codebase that helps us address data quality issues faster, and you may notice some changes.

  • The API is way faster → 10× faster API responses (avg 500 ms → 50 ms).
  • Some data has changed → About 23% of works saw some data change, with about 10% seeing changes in oa_status (green, gold, etc.) and 5% in is_oa (closed or open).
  • Overall accuracy is similar → Overall, precision remains constant. We have better recall of some Gold articles and worse detection of some Green articles.
  • Tiny schema changes → Your scripts, API calls, and data feeds keep running, but two fields are now deprecated (oa_locations.evidence and oa_locations.updated).
  • Community curation → Users can now report and fix errors at unpaywall.org/fix.
  • Action required only if you host the full dataset locally (details below).

Why rewrite a perfectly good tool?

A decade ago we developed Unpaywall to:

  1. make open access research in institutional repositories discoverable by users globally,
  2. track open access behaviours and generate evidence for effective open access policies, and 
  3. raise the bar for open infrastructures by ensuring that the industry standard for determining open access status was itself completely open.

We’re happy to report it has been very effective at achieving those goals: 

  • Our Chrome and Firefox extensions are used by 800k monthly active users around the world, 
  • Unpaywall sees an average of 200 API calls per second every second of the year, 
  • Unpaywall now underpins every major open access monitoring and tracking initiative globally, and 
  • Unpaywall has demonstrated an effective model for operating open research infrastructure. 

Over the years, Open Access has become increasingly important to researchers, institutions, funders, and publishers. And steady changes over the years brought us to a publishing system that looks different from the one we started in. At first, it was exceedingly rare for a publication that was open access to later become closed access. It was rare for publishers to make closed access works openly available for short periods (like during COVID). And with the exception of embargo periods, it was rare for closed journals to later be made completely open.

All of these are common now, and at the scale of millions of publications. And publication landing pages aren’t just about providing the user with access to information; they now also collect information on users. As scholarly communication has evolved, it was clear that Unpaywall needed to evolve from a product into a process. Unfortunately, the code base that supported Unpaywall was struggling to adapt. With every change, we introduced new bugs, and fixing each new bug kept creating more bugs. To continue delivering high quality open access metadata in an efficient way, we needed to start from scratch.

We spent the last year completely re-writing the code base for Unpaywall to make it: 

  • faster; 
  • easier to fix when it breaks; and 
  • easier for users and publishers to curate.

On May 20, 2025 we launched the update. We have been working with our premium subscribers to implement the changes in their locally hosted databases that rely on Unpaywall. Most of our users switched to the new code base without even noticing, and that was intentional. Still, we think it is important for our users, especially those whose work depends on the Unpaywall database, to understand these changes.


What didn’t change

Stable as ever → Details

  • Data format & schema → All keys stay the same (only the fields oa_locations.evidence and oa_locations.updated are now marked “deprecated”).
  • API & data feed URLs → Zero downtime, same endpoints.
  • Aggregate metrics → 10% of records saw a change in oa_status (i.e., color) and 5% saw a change in is_oa (open access vs. closed). Some changes were improvements and some were degradations, but overall precision remains the same.
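
If your scripts read oa_locations, a small check like the one below can show whether they still touch the two deprecated fields. This is only a sketch under stated assumptions: the helper names (location_without_deprecated, deprecated_fields_in_use) are hypothetical, and the sample record is made up for illustration rather than taken from real Unpaywall output.

```python
# Fields the announcement marks as deprecated inside each oa_locations entry.
DEPRECATED_LOCATION_FIELDS = ("evidence", "updated")


def location_without_deprecated(location):
    """Return a copy of one oa_locations entry with the deprecated keys removed."""
    return {
        key: value
        for key, value in location.items()
        if key not in DEPRECATED_LOCATION_FIELDS
    }


def deprecated_fields_in_use(record):
    """Return the sorted deprecated keys found in a record's oa_locations."""
    found = set()
    # oa_locations may be missing or None, so fall back to an empty list.
    for location in record.get("oa_locations") or []:
        found.update(f for f in DEPRECATED_LOCATION_FIELDS if f in location)
    return sorted(found)


# Illustrative record only; the values are invented for this example.
sample_record = {
    "doi": "10.1234/example",
    "is_oa": True,
    "oa_status": "gold",
    "oa_locations": [
        {
            "url": "https://example.org/article.pdf",
            "host_type": "publisher",
            "license": "cc-by",
            "evidence": "illustrative evidence string",
            "updated": "2025-05-20T00:00:00",
        }
    ],
}

cleaned_locations = [
    location_without_deprecated(loc) for loc in sample_record["oa_locations"]
]
print(deprecated_fields_in_use(sample_record))
print(cleaned_locations[0])
```

Code that reads these keys with dict.get() rather than direct indexing should keep working whether or not the fields are present.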

What did change

Better than before → How it helps you

  • Speed → The average API response now returns in 50 ms, compared with 500 ms before: a 10× speedup! ⚡
  • Accuracy → We detect more Gold OA, licenses, fresh OA URLs, and works that were once open access but are now closed. We detect less Green OA (but we’ll be able to improve that soon).
  • Curation UI → Users around the world can submit fixes via a web form; they go live in days.
  • Bulk curation → Publishers can now submit bulk changes directly to us when their journals change from closed to open (or vice versa); they go live within 2 weeks.
  • Bug-fix velocity → Cleaner code = faster bug fixes.

Do you need to do anything?

Your setup → Required action

  • API-only → Nothing. You’re already on the new code and likely didn’t even notice.
  • Data-feed mirror → Download our one-time “May 20 Snapshot” and overwrite your current database. There are too many small tweaks for a changefile.
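
If you mirror the data feed and want to see how the update affects your own records before overwriting, one option is to compare a sample of old and new records on oa_status and is_oa. The sketch below assumes each record is a single JSON object per line with doi, is_oa, and oa_status keys. The function names are hypothetical and the sample lines are invented for illustration. It holds everything in memory, so it suits a sample rather than the full dataset.

```python
import json
from collections import Counter


def index_by_doi(jsonl_lines):
    """Parse JSON Lines text into a dict keyed by each record's DOI."""
    records = {}
    for line in jsonl_lines:
        line = line.strip()
        if line:  # skip blank lines
            record = json.loads(line)
            records[record["doi"]] = record
    return records


def count_changes(old_lines, new_lines, fields=("oa_status", "is_oa")):
    """Count records present in both inputs and how many differ per field."""
    old = index_by_doi(old_lines)
    new = index_by_doi(new_lines)
    counts = Counter()
    for doi, new_record in new.items():
        old_record = old.get(doi)
        if old_record is None:
            continue  # only compare DOIs present in both inputs
        counts["compared"] += 1
        for field in fields:
            if old_record.get(field) != new_record.get(field):
                counts[field] += 1
    return counts


# Invented sample data: three records before and after the update.
old_sample = [
    '{"doi": "10.1234/a", "is_oa": true, "oa_status": "green"}',
    '{"doi": "10.1234/b", "is_oa": false, "oa_status": "closed"}',
    '{"doi": "10.1234/c", "is_oa": true, "oa_status": "gold"}',
]
new_sample = [
    '{"doi": "10.1234/a", "is_oa": true, "oa_status": "gold"}',
    '{"doi": "10.1234/b", "is_oa": true, "oa_status": "gold"}',
    '{"doi": "10.1234/c", "is_oa": true, "oa_status": "gold"}',
]

changes = count_changes(old_sample, new_sample)
print(dict(changes))
```

In this invented sample, two of the three records change oa_status and one changes is_oa. Running the same comparison on your own sample shows how the update touches the works you care about.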

Meet the new Curation Portal

We heard loud and clear from our users that they need to be able to fix open access metadata errors when they find them. And that’s why we developed a community curation pipeline for Unpaywall. 

Found a record that still looks off? Head to unpaywall.org/fix, flag the issue, and we’ll merge your correction shortly (typically within 3 business days). Your expertise powers continual data quality improvements. 

If you have ideas on how to improve the functionality of the curation user interface, please send them to brett@ourresearch.org.


Looking ahead

  • Community curation of Unpaywall will become increasingly important for overall database accuracy, and a fix made in Unpaywall will flow through to all downstream users (Web of Science, Scopus, Dimensions, and more).
  • We will collaborate more closely with publishers directly to make large-scale changes associated with journal policy changes more quickly and accurately.
  • We will continue refining specific parts of our pipelines to increase their overall reliability, including better detection of OA status, journal OA status, license information, and fulltext links.
  • Users will see faster patch cycles for reported issues.
  • We will increase repository coverage and enhance linkage between publisher and repository versions.
  • Later this summer, we’ll be launching a full re-write of OpenAlex to bring the databases into closer alignment where they overlap (i.e., OA status metadata for publications with Crossref DOIs).

Thank you

We heard loud and clear from our communities of users that timely fixes of data quality issues are critical for them to be able to rely on Unpaywall. And we know that our response times slipped while we tackled this rewrite. Thanks for sticking with us!

If you spot an error in the Unpaywall database that you would like to see fixed, the fastest way to do that is at unpaywall.org/fix. If you have other questions, send a note to support@unpaywall.org.

Here’s to a faster, cleaner, and ever-more-useful Unpaywall!

The OurResearch Team