OurResearch receives $7.5M grant from Arcadia to establish OpenAlex, a milestone development for Open Science

OurResearch is proud to announce a $7.5M grant from Arcadia to establish a sustainable and completely open index of the world’s research ecosystem. With this five-year grant, OurResearch expands their open science ambitions, aiming to replace paywalled knowledge graphs with OpenAlex.

Researchers, funders, and organizations around the world rely on scientific knowledge graphs to find, perform, and manage their research. For decades, only paywalled proprietary systems have provided this information, and they have become unaffordable (costing libraries $1B annually); uninclusive (systematically excluding works from some fields and geographies); and unavailable (even paid subscribers are limited in their use of the data).

OpenAlex indexes more than twice as many scholarly works as the leading proprietary products, and the entire knowledge graph and its source code are openly licensed and freely available through data snapshots, an easy-to-use API, and a nascent user interface.

OurResearch has a decade of sustained experience developing tools that advance open science. Funds from Arcadia will fuel the development needed to establish OpenAlex as the go-to scientific knowledge graph for researchers and organizations around the world. Long-term sustainability of OpenAlex will be achieved through value-add premium services.

Development of OpenAlex started only two years ago, and it already serves 115M API calls per month; underlies a major university ranking; is displacing proprietary products at universities; and has established partnerships with national governments. We are excited by these early successes of OpenAlex and its promise to revolutionize scholarly communication and democratize the world’s research.

— — — — 

OurResearch is a nonprofit that builds tools, such as Unpaywall and Unsub, to help accelerate the transition to universal Open Science. Started at a hackathon in 2011, they remain committed to creating open, sustainable research infrastructure that solves real-world problems.

Arcadia is a charitable foundation that works to protect nature, preserve cultural heritage and promote open access to knowledge. Since 2002 Arcadia has awarded more than $1 billion to organizations around the world.

Coverage in the Financial Times of OpenAlex and the Sorbonne

The Financial Times recently published an article detailing Sorbonne University’s “radical decision” to switch to OpenAlex for its publication database and bibliometric analytics. The article (behind a paywall, unfortunately 😞) came out a little while ago, but we wanted to highlight it here in case you missed it.

The news comes in the context of “a wider pushback against the current model in academic publishing, where researchers publish and review papers for free but have to buy expensive subscriptions to the journals in which they are published to analyse data relating to their work.” It includes a quote from OurResearch/OpenAlex co-founder and CEO Jason Priem: “We felt there’s a mismatch between the values of the academy and the shareholder boardroom. Research is fundamentally about sharing, while for-profits are fundamentally about capturing and enclosing. We aim to create and sustain research infrastructure that’s truly aligned with . . . the values of the research community.”

Exciting times for OpenAlex and open science!

Jack, Andrew. “Sorbonne’s Embrace of Free Research Platform Shakes up Academic Publishing.” Financial Times, December 27, 2023. https://www.ft.com/content/89098b25-78af-4539-ba24-c770cf9ec7c3.

Sorbonne University announces switch to OpenAlex

We at OpenAlex are thrilled at Sorbonne University’s recent announcement that they will be switching to OpenAlex for their publication database and bibliometric analytics, abandoning the use of proprietary products! The Sorbonne, a leading French university, made their announcement in a recent post (click here for the English version; click here for the French version). Starting in 2024, they will be ending their subscription to Web of Science and Clarivate’s bibliometric tools. They will instead be adopting “open, free and participatory tools, and [they are] now working on the consolidation of a sustainable and international alternative, relying in particular on the OpenAlex tool.”

OpenAlex has been working closely with the Sorbonne to make this switch possible, and as they note, “A partnership agreement will shortly be established between Sorbonne University and OpenAlex to formalize their contributions and mutual commitments … and to bring about developments that will meet the needs of its community.” This is an extremely exciting milestone for us and for open science! We invite you all to celebrate with us 🎉🎉🎉!

Assigning Institutions — New England Journal of Medicine Case Study

The New England Journal of Medicine uses a non-standard format when presenting authors and their institutional affiliations, which is a problem when we want to keep track of these links in our data. We developed a custom algorithm to solve this problem, preserving more than a hundred thousand author-institution links.

Linking works, authors, and institutions

Part of a diagram from the OpenAlex docs, showing how authors and institutions are linked to works through authorships.
OpenAlex data has links between works, authors, and institutions.

Works, authors, and institutions are three of the basic entities in the OpenAlex data, and keeping track of the relationships between them is one of the core things we do. It’s important that we identify these links correctly so they can be used for downstream tasks like university research intelligence, rankings, and so on. Often this information comes to us as structured data that is easy to ingest. Many times, however, the data is messy, and using it is not so straightforward.
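As a rough sketch of what these links look like in practice, the field names below mirror the `authorships` structure of OpenAlex work records, though the record itself, the names, and the values are invented for illustration (this is a small subset of a real record, not the full schema):

```python
# Invented example record; field names follow the OpenAlex `authorships`
# layout, where each authorship links one author to their institutions.
work = {
    "title": "An example paper",
    "authorships": [
        {
            "author": {"display_name": "Jane A. Doe"},
            "institutions": [{"display_name": "Example University"}],
        },
        {
            "author": {"display_name": "John B. Smith"},
            "institutions": [{"display_name": "Example Hospital"}],
        },
    ],
}

def author_institution_links(work):
    """Extract (author, institution) display-name pairs from a work record."""
    return [
        (a["author"]["display_name"], inst["display_name"])
        for a in work["authorships"]
        for inst in a["institutions"]
    ]

print(author_institution_links(work))
```

Downstream tasks like rankings and research intelligence consume exactly these pairs, which is why getting the links right matters.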

Affiliation data in the New England Journal of Medicine

Publications from the New England Journal of Medicine (NEJM) are an example of this messiness. Author affiliations in these papers are presented in a format that is human-readable, but not straightforward for a computer to parse automatically. In most other journals, authors are listed alongside their affiliated institutions, and so it is relatively easy for a program to link them together. NEJM does it a different way—as shown in the screenshot of a paper from the journal’s website, institutions are listed together with the initials of the authors, which in turn correspond to the full author names at the top of the paper.

Screenshot of the affiliations of a paper from the New England Journal of Medicine's website.
Author affiliations in NEJM come in a nonstandard format that is not easy for a computer to parse.

We might hope that the structured metadata we get from Crossref would have the data in a more standard format. But alas, this isn’t the case, as shown in the screenshot of data from the Crossref API.

Screenshot of JSON data from the Crossref API
Data about the paper from the Crossref API is also in the nonstandard format.

There are around 170,000 works from this journal. This is a relatively tiny proportion of the total number of works in OpenAlex. However, NEJM is a highly influential journal in medicine, so it’s a priority that we get this right.

Custom OpenAlex solution to assign institutions to NEJM authors

OpenAlex team member Nolan created a bespoke algorithm specifically for NEJM papers to parse the affiliation strings and assign authors to institutions. This rule-based algorithm identifies the author initials that might correspond to the full names, and uses those as a mapping to get the link from institution to author, as shown in the screenshot from the OpenAlex API of the example paper from above. The full data for this work can be found at https://api.openalex.org/works/W4386208393.
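The production algorithm isn’t published in this post, but its core idea, matching the parenthesized initials in the affiliation string back to full author names, can be sketched as follows. Everything here is a toy illustration with invented names and a simplified affiliation string; the real rule-based algorithm must handle many more edge cases (hyphenated names, duplicate initials, suffixes, and so on):

```python
import re

def nejm_initials(full_name):
    """Build NEJM-style initials from a full name: 'Jane A. Doe' -> 'J.A.D.'."""
    return ".".join(word[0] for word in full_name.split()) + "."

def assign_institutions(authors, affiliation_string):
    """Link authors to institutions from a string like
    'Example Hospital (J.A.D., J.B.S.), Example University (J.B.S.)'."""
    by_initials = {nejm_initials(name): name for name in authors}
    links = []
    # Each parenthesized group lists the initials of that institution's authors.
    for inst, inits in re.findall(r"([^(,]+?)\s*\(([^)]+)\)", affiliation_string):
        for i in inits.split(","):
            author = by_initials.get(i.strip())
            if author:  # unmatched initials would be left for manual review
                links.append((author, inst.strip()))
    return links

authors = ["Jane A. Doe", "John B. Smith"]
affiliations = "Example Hospital (J.A.D., J.B.S.), Example University (J.B.S.)"
print(assign_institutions(authors, affiliations))
```

Even this toy version shows why the mapping is fragile: it depends on initials being unique among the authors of a paper, which the real algorithm cannot assume.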

We have been able to apply this to around 35,000 articles, amounting to 158,000 institutional affiliations. Additionally, we identified about ten thousand raw affiliation strings that we couldn’t match to an institution but which can still prove useful to our users.

The NEJM case is an example of the attention to data and extra effort that is part of the value that OpenAlex hopes to provide. The data can be messy sometimes. It’s our mission to help make sense of it, so the world can have access to high-quality, free and open data.

Screenshot of JSON data from the OpenAlex API
OpenAlex data has institutional affiliations as structured, fully linked data.

New study shows OpenAlex is a good alternative to Scopus for demographic research

Highlights

  • New research from the Max Planck Institute for Demographic Research analyzes global migration of scholars, using bibliometric data. They do a side-by-side comparison of this analysis between Scopus and OpenAlex data.
  • Counts of scholars by country are highly correlated between Scopus and OpenAlex.
  • Migration events are less correlated between the two, but trends in migration between top pairs of countries are consistent between them. There is higher correlation with Western countries, and OpenAlex has more coverage of non-Western countries.
  • OpenAlex is open. Scopus is not. This puts limits on how researchers can perform and share this type of analysis.

A new working paper[1] from researchers at the Max Planck Institute for Demographic Research (MPIDR) uses bibliometric data to study the migration patterns of scholars between countries. Within the field of demography there is a lack of high-quality data about human migration, so this use of scholarly publication data to infer global-scale migration of scholars is a welcome contribution. They compare two sources of large-scale bibliometric data: “Elsevier’s proprietary Scopus and the openly available OpenAlex.”

The findings of the paper suggest that OpenAlex is a source of open data that shows promise as a replacement for the more established—but more restricted—Scopus data. Overall counts of scholars between countries over time have a high correlation between Scopus and OpenAlex, “with a median correlation close to 1.” The analysis of migration events between the two databases shows less correlation overall, but among the top pairs of countries, “the bilateral flows … are consistent in the two databases.” The authors go on to discuss the reason for the differences, noting that “[this] could signal a large difference in coverage of individual migration trajectories between these two databases and can also stem from the small net migration rates which fluctuate with small differences in measurement rather than population counts which are larger and small changes do not cause them to fluctuate.” In other words, while smaller scale trends may present differently between different data sources due to the nuances and idiosyncrasies of each one, the larger-scale trends are consistent.

The results also suggest that, in some cases, OpenAlex may be an even better resource than Scopus for this analysis. The authors note that the magnitude of migration flows is much larger in OpenAlex compared to Scopus, and that “this could indicate that the higher coverage of publications in OpenAlex might help discover some under-explored scholarly migration corridors worldwide.”

The paper does note some limitations of using OpenAlex as opposed to Scopus for their purposes; specifically, “the quality of the author name disambiguation and identifiers in OpenAlex needs further evaluation in future research.” Evaluating how well OpenAlex has assigned authors to their papers was outside the scope of this research, though the authors were able to refer to established research validating the Scopus data. We look forward to similar validation of the OpenAlex data, both from us and from independent researchers. We’re also happy to say that we are continually improving our author name disambiguation, so our data will keep getting better and better!

Finally, there is the big difference between the two services: OpenAlex is open, while Scopus is not. The authors touch on this several times throughout the paper, both directly and indirectly. They mention that they must limit the years of their analysis, due to “our license terms for Scopus data”. In their Methods section, they describe the multiple steps they had to take to gain access to and acquire the Scopus data, while for OpenAlex, the process was much simpler: “we obtain the publicly available data and process it ourselves”. And in the Acknowledgements section, they explain that the Scopus license terms only permit sharing aggregated results, and no individual data is shared.

Overall, we are very proud that OpenAlex is being recognized as an emerging high-quality, completely open source of bibliometric data that can be used for demographic research. The lack of restrictions on our data is extremely important as it eliminates barriers that researchers face in doing their work. Please check out their paper to learn more about their work!


[1] Akbaritabar, A., Theile, T., & Zagheni, E. (2023). Global flows and rates of international migration of scholars. MPIDR Working Paper WP-2023-018. https://www.demogr.mpg.de/en/publications_databases_6118/publications_1904/mpidr_working_papers/global_flows_and_rates_of_international_migration_of_scholars_7729. doi:10.4054/MPIDR-WP-2023-018.

OurResearch’s Commitment to the Principles of Open Scholarly Infrastructure

OurResearch is committed to the Principles of Open Scholarly Infrastructure (POSI). This post summarizes how we are honoring these principles, as well as where we still have work left to do.

Since our beginning in an all-night hackathon ten years ago, we’ve tried to run OurResearch as a sustainable, open, and community-aligned provider of scholarly infrastructure. So while we didn’t write the POSI principles, we sure do recognize them: by and large, these are principles we’ve held (and argued for) from the beginning (e.g., in 2012 and 2018). They’re consistent with our core values of openness, progress, pragmatism, sustainability, and community.

So when someone asked us recently if we endorse POSI, our answer was HECK YEAH! Today, we’d like to follow that up with a more concrete, public, and formal commitment to these principles. This commitment has been unanimously approved by our board of directors.

The sixteen POSI principles are divided into three sections: Insurance, Governance, and Sustainability. We’ve arranged the document below in the same way. For each principle, we begin with a short description (in italics), taken from the original POSI paper.

If an item has a green heart 💚, we think we’re doing a decent job of it. But that doesn’t mean we’re doing a perfect job. We’re not. We’re committed to continual improvement, and continual vigilance to make sure we honor our commitments. If there’s a yellow heart 💛, we think we’re making progress, but still have a ways to go. We’ll be continuing to work on it. That may take us a while; this is a journey. But we’ll get there.

Finally: our thanks to Geoff Bilder, Jennifer Lin, and Cameron Neylon for authoring the principles, and thanks to Crossref, Dryad, ROR, and JOSS for their early POSI commitments, which gave us great examples to follow.

Summary

Insurance
💚 Open source
💚 Open data (within constraints of privacy laws)
💚 Available data (within constraints of privacy laws)
💚 Patent non-assertion

Governance
💚 Coverage across the research enterprise
💛 Stakeholder Governed
💛 Non-discriminatory membership
💚 Transparent operations
💚 Cannot lobby
💚 Living will
💚 Formal incentives to fulfil mission & wind-down

Sustainability
💚 Time-limited funds are used only for time-limited activities
💚 Goal to generate surplus
💚 Goal to create contingency fund to support operations for 12 months
💚 Mission-consistent revenue generation
💚 Revenue based on services, not data

(💚 = good, 💛 = less good)

Insurance

💚 Open source

All software required to run the infrastructure should be available under an open source license. This does not include other software that may be involved with running the organisation.

All the source code behind everything we do is freely available on GitHub under the MIT open source license. This includes our products, websites, and the software behind the papers we publish. Our code is “born open” — we write it in the open, rather than periodically posting a cleaned-up “open version” later on. Source code is archived via Software Heritage, ensuring availability over the long haul.

💚 Open data (within constraints of privacy laws)

For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible.

OurResearch makes the data behind our projects open. For example, you can download a full dump of the Unpaywall database, all 120M+ rows of it, any time. This data dump is updated at least once a year. That same data is also available via a public, open API with generous rate limits (100,000 calls per day). Past projects (Impactstory Profiles, Depsy, Paperbuzz, etc) have also always had an open API, and we commit to similar approaches for future products.

Sometimes users share their private data with us, so that we can use that data to generate reports and analyses for them. For example, Unsub users upload their COUNTER data and price lists in order to inform an analytics dashboard we make for them. We never share that private data, or the data derived from it. However, we do encourage users to share their own data, and we never restrict our users’ right to access and share any data they get from us. 

Some of our data, like Crossref’s, consists of facts that have no copyright. Where copyright is applicable, our data is licensed as CC0.

💚 Available data (within constraints of privacy laws)

It is not enough that the data be made “open” if there is not a practical way to actually obtain it. Underlying data should be made easily available via periodic data dumps.

As described above, OurResearch is committed to providing practical ways to obtain open data. 

💚 Patent non-assertion

The organisation should commit to a patent non-assertion covenant. The organisation may obtain patents to protect its own operations, but not use them to prevent the community from replicating the infrastructure.

OurResearch believes patents do not belong in scholarly infrastructure. We will not pursue or assert patents. We will look into making a formal patent non-assertion covenant as suggested by Crossref.

Governance

💚 Coverage across the research enterprise

It is increasingly clear that research transcends disciplines, geography, institutions and stakeholders. The infrastructure that supports it needs to do the same.

We are committed to serving a diverse group of stakeholders across the research enterprise:

  • Disciplines: our products cover the gamut of scholarly disciplines, including STEM, humanities, social sciences, and professional education.
  • Geography: Our users are worldwide, on all continents (except Antarctica…we’re working on that one) and in nearly every country. We take care to support papers and other works written in all languages.
  • Institutions and stakeholders: we serve all different kinds of institutions and stakeholders. Unsub users, for example, include not just the world’s largest research universities, but also industry labs, nonprofits, museums, community colleges, and philanthropies. Unpaywall is used by all of the above, as well as by academic publishers, library services companies (large and small), bibliometricians, research assessment exercises, and startups. The free Unpaywall extension currently has 400,000 active users, including large numbers of students, journalists, policy-makers, independent researchers, laypeople, and other historically neglected stakeholder groups.

By offering different types of products, aimed at different sets of stakeholders, we’re able to engage with a wide range of communities, and hear how their needs are similar, and how they’re different. We build infrastructure that cuts across communities where applicable–for instance, the open Unpaywall dataset is used in all kinds of ways. However, we also find places where a particular group would benefit from more customized tooling. For example, we built the Simple Query Tool (a web-based UI to Unpaywall) in response to requests from less technical users who wanted to access the database, but didn’t feel comfortable using a REST API. Later we built an Unpaywall repository dashboard for institutional repository librarians, a stakeholder group we didn’t originally consider.

Although we do strive to be inclusive, there are areas where we can continue to improve, and we intend to do so. For example, we’d like to improve our internationalization, by writing more documentation and UI components in languages besides English. In the next year we will be making an important stride to support diversity, as we provide better support for research works not assigned a DOI. 

💛 Stakeholder Governed

A board-governed organisation drawn from the stakeholder community builds more confidence that the organisation will take decisions driven by community consensus and consideration of different interests.

OurResearch is a 501(c)3 organization, with a governance structure documented in its bylaws. Our Board of Directors, being a small group, is limited in its representation, in terms of geographic, ethnic, gender, disability, and organizational diversity. The current board includes those with work experience as a faculty member, publisher, library advocate, teacher, and infrastructure builder, with educational backgrounds in science, engineering, history, and business. While this does represent many aspects of our stakeholder community, the small size of our board limits the extent to which the range of stakeholders can be involved. We recognize that increasing the diversity of stakeholders on our Board is important to provide diverse perspectives. We will work towards improving this.

💛 Non-discriminatory membership

We see the best option as an “opt-in” approach with a principle of non-discrimination where any stakeholder group may express an interest and should be welcome. The process of representation in day to day governance must also be inclusive with governance that reflects the demographics of the membership.

OurResearch is not a membership based organization, but we fully support the principle of non-discrimination in our hiring, Board appointments, community engagement, outreach and all other activities. We engage our community through GitHub, Twitter, our mailing lists, and conferences (virtual and in-person), and welcome “opt-in” ideas from anyone at any time. We will also be launching an advisory group, to broaden the involvement of stakeholder groups as members of the community. 

We do not currently have a formal Code Of Conduct to govern interactions between OurResearch employees and Board members and the OurResearch community. We are working on one.

Representation in day-to-day governance comes from our employees, Board of Directors, customer feedback, and engagement with the community online. However, because our Board is 50% women and 50% men, entirely white, non-disabled, and based solely in the USA and Canada, it does not fully reflect the demographics of our community of users, which is global in scope and more diverse than our current board in race, ethnicity, gender, disability, and geography. We will work towards improving this.

💚 Transparent operations

Achieving trust in the selection of representatives to governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).

OurResearch strives to be a transparent organization.  As a 501(c)3 nonprofit, all of our tax returns are publicly available; you can find links to these on our transparency page. That page also publishes executive salaries, incorporation documents, bylaws, and other relevant information.  All our grant proposals (funded and unfunded) are openly published and archived on Open Grants (search under “Piwowar” or “Priem”). 

💚 Cannot lobby

The community, not infrastructure organisations, should collectively drive regulatory change. An infrastructure organisation’s role is to provide a base for others to work on and should depend on its community to support the creation of a legislative environment that affects it.

OurResearch is a mission-driven organization that works toward accelerating the transition to open science. We’re not lobbyists and we don’t lobby. As a 501(c)3 non-profit organization, we strictly adhere to U.S. limitations in this area.

💚  Living will

A powerful way to create trust is to publicly describe a plan addressing the condition under which an organisation would be wound down, how this would happen, and how any ongoing assets could be archived and preserved when passed to a successor organisation. Any such organisation would need to honour this same set of principles.

Our core assets are our source code and datasets. These are both open. Software is archived via Software Heritage, assuring long-term persistence. Key datasets are integrated into other open datasets (e.g., Unpaywall is part of the open DOIBoost dataset). Today and in the future, our data and code can be used by a wide variety of successor organizations.

We are a non-profit company without equity shares, so are unlikely to be bought or acquired.  That said, we are looking into formal mechanisms to codify that any future disposal of our brand assets (trademarks, domain names, etc) could only be to organizations who honour the same principles.

💚 Formal incentives to fulfil mission & wind-down

Infrastructures exist for a specific purpose and that purpose can be radically simplified or even rendered unnecessary by technological or social change. If it is possible the organisation (and staff) should have direct incentives to deliver on the mission and wind down.

Many of the tools that OurResearch provides are “stop-gap” solutions. For example, in a world where all articles are open access at the time of publication, no open-access index like Unpaywall would be needed — the DOI would simply resolve to an open copy of the paper every time. Similarly, in a world without toll-access academic journals there is no longer a need for tools like Unsub to help librarians assess the value of journal subscriptions. 

We eagerly look forward to the day when our stop-gaps are no longer needed! We also plan accordingly, and will wind down projects (or parts of projects) as they are no longer valuable to the community. We don’t have formal incentives for this, other than looking forward to a really big party.

Sustainability

💚 Time-limited funds are used only for time-limited activities

Day to day operations should be supported by day to day sustainable revenue sources. Grant dependency for funding operations makes them fragile and more easily distracted from building core infrastructure.

Currently earned revenue fully covers the day-to-day operations of OurResearch. When we get grants, we use them to support the development and early stages of new products, or to fund one-time enhancements of existing products. We will continue to work hard to ensure this remains true in the future.

💚 Goal to generate surplus

Organisations which define sustainability based merely on recovering costs are brittle and stagnant. It is not enough to merely survive, it has to be able to adapt and change. To weather economic, social and technological volatility, they need financial resources beyond immediate operating costs.

OurResearch currently has an operating surplus. This hasn’t always been true — we’ve had some lean years in the past — but it is certainly our goal to maintain a surplus in the future. Our deliberate decision to run with a relatively small number of staff makes it easier to achieve that goal. Our experience running in both rich and lean times over the last ten years makes us resilient to a wide range of financial contingencies. 

💚 Goal to create contingency fund to support operations for 12 months

A high priority should be generating a contingency fund that can support a complete, orderly wind down (12 months in most cases). This fund should be separate from those allocated to covering operating risk and investment in development.

We currently have funds available to support our operations for 12 months. We have not formally set these aside as a contingency fund. We will create a Use Of Funds policy to make our contingency and wind-down funds more explicit.

💚 Mission-consistent revenue generation

Potential revenue sources should be considered for consistency with the organisational mission and not run counter to the aims of the organisation. For instance…

The earned revenue of OurResearch currently comes from service level agreements to the Unpaywall Data Feed and subscriptions to Unsub custom analytics services. Our revenue comes from a worldwide assortment of universities, university consortia, scholarly publishers, discovery services, and research analytics companies. We supplement our earned revenue with grants from mission-aligned organizations like the Arcadia Foundation.

💚 Revenue based on services, not data

Data related to the running of the research enterprise should be a community property. Appropriate revenue sources might include value-added services, consulting, API Service Level Agreements or membership fees.

OurResearch receives no revenue for its data, which is completely open, but rather for service level agreements and value-added services. We’re deeply committed to maintaining this model. 

What should a FAIR checker include?


The Wellcome Trust is considering funding a tool that would report on the FAIR status of research outputs.  We recently responded to their Request for Information with some ideas to refine their initial plan and thought we’d share them here!

a) Include Openness Assessment


We believe the planned software tool should not only assess the FAIRness of research outputs, but also their Openness.  As described in the recent Final Report and Action Plan from the European Commission Expert Group on FAIR Data:  “Data can be FAIR or Open, both or neither. The greatest benefits come when data are both FAIR and Open, as the lack of restrictions supports the widest possible reuse, and reuse at scale.”    

This refinement is essential for several reasons. First, we believe researchers will expect something called a “FAIR assessment” to include assessing Openness, and will be confused when it does not, leading to poor understanding of the system. Second, the benefit of openness is clear to everyone, which increases researchers’ motivation to engage with the project. Third, Wellcome has already done a great job of highlighting the need for openness, so an openness assessment makes the tool an incremental addition to that work rather than a different, new set of requirements with an unclear relationship. Fourth, an openness assessment tool is needed by the community and would fit very well in the proposed tool, and its anticipated popularity and exposure would help the FAIR assessment gain traction.

 

b) Require the tool produce Open Data, not just be Open Source

The project brief was very clear that the tool needs to be Open Source, with a liberal license. This is great. We suggest the brief also specify that the data provided by the tool will be Open Data. Ideally the brief would suggest a license for the data (CC0, or an open database license that facilitates reuse, including commercial reuse) and data delivery specifications. For data delivery we suggest both regular full data dumps and a machine-readable, free, open JSON API that requires minimal registration, is high-performing (< 1 second response time), can handle a high concurrent load, has high daily quota limits, and can handle at least a million calls per day across the system.

It could also specify that fees may be charged for service-level agreements on the API for institutions that want them, for above-normal API quotas, for more frequent data dumps, or similar.  This is the model behind our Unpaywall open data offering, and it has worked very well.

 

c) Pre-ingest hundreds of millions of research objects

The project brief should make it more explicit that the software tool needs to launch with pre-calculated scores/badges for hundreds of millions of research objects.   Luckily, we live in a world where many research objects are already listed in registries like Crossref, DataCite, and GitHub. These should be ingested and form the basis of the dataset used by the tool.  This pre-ingestion is implicitly needed for some of the leaderboards and aggregations specified by the brief; in our opinion it should be made explicit. It would also allow large-scale calibration of scores and the export of large-scale datasets to support policy research, additional tools, etc., and it would ensure a high-performing system, which cannot be guaranteed when FAIR assessments are made ad hoc, upon request, as in most products.

(Admittedly, gathering research objects registered in such sources naturally selects for objects that already have identifiers, and a certain standard of metadata and level of FAIRness, so the result isn’t representative of all research objects; this needs to be considered when using it for calibration.)
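As a sketch of what pre-ingestion from a registry could look like, the function below pulls DOIs out of one page of results in the shape returned by the Crossref REST API (`message.items[].DOI`, with a `next-cursor` for deep paging).  The sample response here is abbreviated and hand-written for illustration; a real harvester would page through `https://api.crossref.org/works` with the cursor:

```python
def extract_dois(crossref_page):
    """Pull DOIs out of one page of a Crossref /works response."""
    return [item["DOI"] for item in crossref_page["message"]["items"] if "DOI" in item]

# Abbreviated, hand-written response in the shape returned by
# https://api.crossref.org/works (identifiers invented for illustration).
sample = {
    "message": {
        "next-cursor": "AoJ...",  # opaque cursor for fetching the next page
        "items": [
            {"DOI": "10.1234/abc", "type": "journal-article"},
            {"DOI": "10.5678/def", "type": "dataset"},
        ],
    }
}

print(extract_dois(sample))  # ['10.1234/abc', '10.5678/def']
```

A production ingester would loop, passing each page's `next-cursor` back as the `cursor` parameter until no items remain, and store the harvested identifiers for scoring.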

 

d) More details on aggregation

The brief doesn’t include enough details on aggregation.  In our opinion aggregation is key.

Aggregation supports context for FAIR metrics and badges (through percentiles, etc.), facilitates publicity, and inspires change and improvement.  Most research objects do not currently have metadata that supports interesting aggregation: datasets are rarely associated with an ORCID or an institution, for example.  The RFP should ask proposals to specify how they will facilitate aggregation. We anticipate proposals will include a combination of automated approaches using metadata (using Crossref and DataCite metadata, and PubMed LinkOut data, to associate datasets with papers, which are in turn associated with ORCIDs, clinical trial IDs, and GRID institutional identifiers), text mining (to associate GitHub links with papers), and methods for CSV uploads to link identifiers to aggregation groups.
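The linkage-then-rollup idea above can be sketched in a few lines.  All identifiers and scores here are invented for illustration; a real system would assemble these tables from registry metadata and CSV uploads:

```python
# Hypothetical linkage tables, as might be assembled from Crossref/DataCite
# metadata and CSV uploads (all identifiers invented for illustration).
dataset_to_paper = {"ds-1": "10.1/a", "ds-2": "10.1/a", "ds-3": "10.2/b"}
paper_to_institution = {"10.1/a": "grid.0001", "10.2/b": "grid.0002"}
dataset_scores = {"ds-1": 0.9, "ds-2": 0.4, "ds-3": 0.7}

# Roll dataset-level FAIR scores up to institutions via the paper linkage.
by_institution = {}
for ds, paper in dataset_to_paper.items():
    inst = paper_to_institution.get(paper)
    if inst:
        by_institution.setdefault(inst, []).append(dataset_scores[ds])

means = {inst: sum(v) / len(v) for inst, v in by_institution.items()}
print(means)  # {'grid.0001': 0.65, 'grid.0002': 0.7}
```

The same join pattern supports any aggregation group (funder, country, field) once the linkage table exists, which is why the brief should ask proposals to explain where those tables will come from.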

 

e) Include Actionable Steps for immediate FAIR score improvement

The brief should specify that, after showing researchers their scores, the tool links them to actionable steps they can take to improve their FAIR and Open Data scores.  These could simply be how-to guides: how to put your software on GitHub, how to specify a license for your dataset, how to make your paper Open Access by uploading the accepted manuscript, etc. They should walk the researcher through improving their score using existing products, and then immediately recalculate the FAIR score so the researcher can see progress.  If this sort of recalculation ability is not built into the design from the beginning, it can lead to system designs which make it difficult to add later.
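A toy sketch of the recalculation loop: if the score is a cheap, pure function of the record's current checks, recomputing after each fix is trivial.  The check names and equal weighting here are invented for illustration, not an actual FAIR metric:

```python
# Toy scoring sketch: each check contributes equally, and recomputing after a
# fix lets the researcher see immediate progress. Check names are invented.
CHECKS = ("has_persistent_id", "has_open_license", "has_rich_metadata", "in_repository")

def fair_score(record):
    """Fraction of checks the record currently passes."""
    return sum(bool(record.get(c, False)) for c in CHECKS) / len(CHECKS)

record = {"has_persistent_id": True, "has_rich_metadata": True}
print(fair_score(record))          # 0.5

record["has_open_license"] = True  # actionable step: the researcher adds a license
print(fair_score(record))          # 0.75
```

The design point is that scoring stays side-effect-free and fast; if scores are instead computed in a slow batch pipeline, on-the-spot recalculation becomes the retrofit the paragraph warns about.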

 

f) Open grants process for this RFI

The RFP should give applicants the option to make their proposals public (and encourage them to do so), and the grant reviews should be public.  Or it should at least take steps in this direction, in the spirit of incremental improvement on Wellcome’s great Open Research Fund mechanisms.

 

Unpaywall extension adds 200,000th active user

We’re thrilled to announce that we’re now supporting over 200,000 active users of the Unpaywall extension for Chrome and Firefox!

The extension, which debuted nearly two years ago, helps users find legal, open access copies of paywalled scholarly articles. Since its release, the extension has been used more than 45 million times, finding an open access copy in about half of those cases. We’ve also been featured in The Chronicle of Higher Ed, TechCrunch, Lifehacker, Boing Boing, and Nature (twice).

However, although the extension gets the press, the database powering the extension is the real star. There are millions of people using the Unpaywall database every day:

  • We deliver nearly one million OA papers every day to users worldwide via our open API…that’s 10 papers every second!
  • Over 1,600 academic libraries use our SFX integration to automatically find and deliver OA copies of articles when they have no subscription access.
  • If you’re using an academic discovery tool, it probably includes Unpaywall data…we’re integrated into Web of Science, Europe PubMed Central, WorldCat, Scopus, Dimensions, and many others.
  • Our data is used to inform and monitor OA policy at organizations like the US NIH, UK Research and Innovation, the Swiss National Science Foundation, the Wellcome Trust, the European Open Science Monitor, and many others.

The Unpaywall database gets information from over 50,000 academic journals and 5,000 scholarly repositories and archives, tracking OA status for more than 100 million articles. You can access this data for free using our open API, or use our free web-based query tool. Or, if you prefer, you can just download the whole database for free.
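For readers who want to try the API, here is a minimal sketch of reading a response.  A live request looks like `GET https://api.unpaywall.org/v2/<DOI>?email=you@example.com`; the sample below is abbreviated and hand-written for illustration, keeping only the `is_oa` and `best_oa_location` fields the helper uses:

```python
# Abbreviated, hand-written sample in the shape of an Unpaywall v2 response.
sample_response = {
    "doi": "10.1038/nature12373",
    "is_oa": True,
    "best_oa_location": {
        "url_for_pdf": "https://europepmc.org/articles/pmc3998968?pdf=render",
        "host_type": "repository",
    },
}

def best_oa_url(resp):
    """Return the best open-access URL from a response, or None if closed."""
    if not resp.get("is_oa"):
        return None
    loc = resp.get("best_oa_location") or {}
    # Prefer a direct PDF link, falling back to the landing-page URL.
    return loc.get("url_for_pdf") or loc.get("url")

print(best_oa_url(sample_response))
```

In practice you would fetch the JSON with any HTTP client, passing your email address as the `email` query parameter, and apply the same few lines of parsing.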

Unpaywall is supported via subscriptions to the Unpaywall Data Feed, a high-throughput pipeline providing weekly updates to our free database dump. Thanks to Data Feed subscribers, Unpaywall is completely self-sustaining and uses no grant funding. That makes us real optimistic about our ability to stick around and provide open infrastructure for lots of other cool projects.

Thanks to everyone who has supported this project, and even more, thanks to everyone who has fought for open access. Without y’all, Unpaywall wouldn’t matter. With you: we’re changing the world. Together. Next stop 300k!

It’s time to insist on #openinfrastructure for #openscience


It’s time.  In the last month there’ve been three events that suggest now is the time to start insisting on open infrastructure for open science:

The first event was the publication of two separate recommendations/plans on open science: a report by the National Academies in the US, and Plan S on open access in Europe.  Notably, although comprehensive and bold in many other regards, neither called for open infrastructure to underpin the proposed open science initiatives.

Peter Suber put it well in his comments on Plan S:

the plan promises support for OA infrastructure, which is good. But it never commits to open infrastructure, that is, platforms running on open-source software, under open standards, with open APIs for interoperability, preferably owned or hosted by non-profit organizations. This omission invites the fate that befell bepress and SSRN, but this time for all European research.

The second event was the launch of Google’s Dataset Search — without an API.

Why do we care?  Because of opportunity cost.  Google Scholar doesn’t have an API, and Google has said it never will.  That means no one has been able to integrate Google Scholar results into their workflows or products, which has had a huge opportunity cost for scholarship.  It’s hard to measure (opportunity costs always are), but we can get a sense of it: within two years of the Unpaywall launch (a product that does a subset of the same task, but with an open API and an open bulk data dump), Unpaywall data had been built into 2,000 library workflows, the three primary A&I indexes, competing commercial OA discovery services, many reports, and the apps of countless startups, with more integrations in the works.  All of that value-add was waiting for a solution that others could build on.

If we relax and consider the Dataset Search problem solved now that Google has it working, we’re forgoing these same integration possibilities for dataset search that we lost out on for so long with OA discovery.  We need to build open infrastructure: the open APIs and open source solutions that Peter Suber talks about above.

As Peter Kraker put it on Twitter the other day: #dontLeaveItToGoogle.

The third event was of a different sort: a gathering of 58 nonprofit projects working toward Open Science.  It was the first time we’ve gathered together explicitly like that, and the air of change was palpable.

It’s exciting.  We’re doing this.  We’re passionate about providing tools for the open science workflow that embody open infrastructure.

If you are a nonprofit but you weren’t at JROST last month, join in!  It’s just getting going.

 

So.  #openinfrastructure for #openscience.  Everybody in scholarly communication: start talking about it, requesting it, dreaming it, planning it, building it, requiring it, funding it.  It’s not too big a step.  We can do it.  It’s time.

 

ps More great reading on what open infrastructure means from Bilder, Lin, and Neylon (2015) here and from Hindawi here.

pps #openinfrastructure is too long and hard to spell for a rallying cry.  #openinfra??  help 🙂

Reposted from Heather’s personal Research Remix blog.