How do we know Unpaywall won’t be acquired?

Reposted with minor editing from a response Jason gave on the Global Open Access mailing list, July 12 2018.

We’re often asked: How do we know Unpaywall won’t be acquired? What makes Unpaywall (and the company behind it, Impactstory) different from Bepress, SSRN, Mendeley, Publons, Kopernio, etc?

How can we be sure you won’t be bought by someone whose values don’t align with open science?

There are no credible guarantees I can offer that this won’t happen, nor can any other organization. However, I think stability in the values and governance of Impactstory is a relatively safe bet. Here’s why (note: I’m not a lawyer and the below isn’t legal advice, obvs):

We’re incorporated as a 501(c)(3) nonprofit. This was not true of recently-acquired open science platforms like Mendeley, SSRN, and Bepress, which were all for-profits. We think that’s fine…the world needs for-profits. But we sure weren’t surprised when any of them were acquired. These are for-profit companies, which means they are, er:

For: Profit.

Legally, their purpose is profit. They may benefit the world in many additional ways, but their officers and board have a fiduciary duty to deliver a return to investors.

Our officers and board, on the other hand, have a legal fiduciary duty to fulfill our nonprofit mission, even where this doesn’t make much money. I think that instead of “nonprofit,” it should be called “for-mission.” Mission is the goal. That can be a big difference. Jefferson Pooley did a great job articulating the value of the nonprofit structure for scholcomm organizations in more detail in a much-discussed LSE Impact post last year.

All that said, I’m not going to sit here and tell you nonprofits can’t be acquired…cos although that may be technically true, nonprofits can still be, in all-but-name, acquired. It’s just less common and harder.

So we also like to emphasize that the source code for our projects is open. That means that for any given project, its main asset–the code that makes our project work–is available for free to anyone who wants it. This makes us much less of an acquisition target. Why buy the cow when the code is free, as it were.

As a 501(c)(3) nonprofit, we have a board of directors that helps keep us accountable and helps provide leadership to the organization as well. Past board members have included Cameron Neylon and John Wilbanks, and our current board consists of me, Heather, Ethan White, and Heather Joseph. Heather, Ethan, John, and Cameron have each contributed mightily to the Open cause, in ways that would take me much longer than I have to fully chronicle (and most of you probably know anyway). We’re incredibly proud to have (and have had) them tirelessly working to help Impactstory stay on the right course. We think they are people that can be trusted.

Finally, and y’all can make up your own minds about this, I like to think our team has built up some credibility in the space. Heather and I have both been working entirely on open-source, open science projects for the last ten years, and most of that work’s pretty easy to find if you want to check it out. In that time, it’s safe to assume we’ve turned down some better-paying projects that aligned less closely with the open science mission.

So, being acquired? Not in our future. But growth sure is, through grants and partnerships and customer relationships and lots of hard work… all in the service of making scholcomm more open. Stay tuned 🙂

We’re building a search engine for academic literature–for everyone

Huzzah! Today we’re announcing an $850k grant from the Arcadia Fund to build a new way for folks to find, read, and understand the scholarly literature.

Wait, another search engine? Really?

Yep. But this one’s a little different: there are already a lot of ways for academic researchers to find academic literature…we’re building one for everyone else.

We’re aiming to meet the information needs of citizen scientists, patients, K-12 teachers, medical practitioners, social workers, community college students, policy makers, and millions more. What they all have in common: they’re folks who’d benefit from access to the scholarly record, but they’ve historically been locked out. They’ve had no access to the content or the context of the scholarly conversation.

Problem: it’s hard to access the content

Traditionally, the scholarly literature was paywalled, cutting off access to the content. The Open Access movement is on the way to solving this: half of new articles are now free to read somewhere, and that number is growing. The catch is that there are more than 50,000 different “somewheres” on web servers around the world, so we need a central index to find it. No one’s done a good job of this yet (Google Scholar gets close, but it’s aimed at specialists, not regular people. It’s also 100% proprietary, closed-source, closed-data, and subject to disappearing at Google’s whim.)

Problem: it’s hard to access the context

Context is the stuff that makes an article understandable for a specialist, but gobbledegook to the rest of us. That includes everything from field-specific jargon, to strategies for how to skim to the key findings, to knowledge of core concepts like p-values. Specialists have access to context. Regular folks don’t. This makes reading the scholarly literature like reading Shakespeare without notes: you get glimmers of beauty, but without some help it’s mostly just frustrating.

Solution: easy access to the content and context of research literature.

Our plan: provide access to both content and context, for free, in one place. To do that, we’re going to bring together an open database of OA papers with a suite of AI-powered support tools we’re calling an Explanation Engine.

We’ve already finished the database of OA papers. So that’s good. With the free Unpaywall database, we’ve now got 20 million OA articles from 50k sources, built on open source, available as open data, and with a working nonprofit sustainability model.

We’re building the “AI-powered support tools” now. What kind of tools? Well, let’s go back to the Shakespeare example…today, publishers solve the context problem for readers of Shakespeare by adding notes to the text that define and explain difficult words and phrases. We’re gonna do the same thing for 20 million scholarly articles. And that’s just the start…we’re also working on concept maps, automated plain-language translations (think automatic Simple Wikipedia), structured abstracts, topic guides, and more. Thanks to recent progress in AI, all this can be automated, so we can do it at scale. That’s new. And it’s big.

The payoff

When Microsoft launched Altair BASIC for the new “personal computers,” there were already plenty of programming environments for experts. But here was one accessible to everyone else. That was new. And ultimately it launched the PC revolution, bringing computing into the lives of regular folks. We think it’s time that same kind of movement happened in the world of knowledge.

From a business perspective, you might call this a blue ocean strategy. From a social perspective (ours), this is a chance to finally cash the cheques written by the Open Access movement. It’s a chance to truly open up access to the frontiers of human knowledge to all humans.

If that sounds like your jam, we’d love your support: tell your friends, sign up for early access, and follow us for updates. It’s gonna be quite an adventure.

Here’s the press release.

Why the name “altmetrics” doesn’t imply replacement of citations (and other bicycling metaphors)

“Based on the name “alternative” metrics, you clearly think altmetrics can replace citations. That’s dumb.”

I (Jason) have heard this critique more times than I care to count. And on one level, I get it. If you take an “alternate route,” you don’t take the original route, you take a different one. There’s a replacement. And completely replacing citation metrics with altmetrics is, I agree, dumb. That said, I actually believe altmetrics should complement citation, and I further think that the name “altmetrics” (for all its flaws) is compatible with this view. To explain, here’s an example:

I’m currently looking out the window at a street which includes both a lane for cars, and another lane for “alternate transportation,” a category that includes bicycles, skateboards, and scooters.

Although these “alternate” vehicles have many advantages over cars (cleaner, smaller, etc) the goal of city planners is not, as I understand, to replace automobiles with alternate transportation. Rather, the goal is to make it easy for commuters to use the most suitable vehicle for their particular trip. This in turn supports a more efficient infrastructure for the city as a whole. Making it easy for commuters to choose alternate transportation for a given trip is helpful, even though no one really expects bikes to completely replace cars in the city as a whole.

(As an aside: these “alternate” vehicles could probably have some other, more descriptive name…for instance, “smaller-more-efficient vehicles.” However, as a practical matter, cars are the default for now, so bikes etc. remain “alternatives.” This is also true of altmetrics, of course, which I often hear will someday be obsolete as a term, once it really catches on. To this I say: excellent. The sooner the better.)

Like bikes et al, altmetrics aren’t right for every use case, and never will be. Altmetrics can’t and shouldn’t replace citation metrics for every task. But they are much better tools than citation metrics for some tasks (for example, understanding the impact of research on populations that don’t write scholarly papers). Therefore, using altmetrics alongside citations will let us measure scholarly impact in a way that’s more efficient, nuanced, and comprehensive. Altmetrics are an alternative to the measurement gridlock that comes from over-reliance on citation metrics.

When will everything be Open Access?

OA continues to grow. But when will it be…done? When will everything be published as Open Access?

Using data from our recently-published PeerJ OA study, we took a crack at answering that question. The data we’re using comes from the Unpaywall database–now the largest open database of OA articles ever created, with comprehensive data on over 90 million articles. Check out the paper for lots more details on how we assembled the data, along with assessments of accuracy and other goodies. But without further ado, here’s our projection of OA growth:

[Figure: growth of OA over time; solid line shows observed data, dotted line shows projection]

In the study, we found that OA is increasingly likely for newer articles since around 1990. That’s the solid line part of the graph, and is based on hard data.

But since the curve is so regular, it was tempting to extend it to see what would happen at the current rate of increase. That’s the dotted line in the figure above. Of course it’s a pretty facile projection, in that no effort has been made to model the underlying processes. #limitations #futurework 😀. Moreover, the 2040 number is clearly too conservative since it doesn’t account for discontinuities–like the surge in OA we’ll see in 2020 when new European mandates take effect.
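The post doesn’t say exactly how the dotted line was drawn. One naive way to “extend the curve at the current rate of increase” is to fit a least-squares straight line to (year, OA share) points and solve for the year it reaches 100%. The sketch below assumes that linear model (our illustration, not necessarily the study’s method), and the points in the usage line are made up, not the study’s data. Like the projection above, it ignores saturation and policy discontinuities.

```python
def projected_year_of_full_oa(years, shares, target=1.0):
    """Fit share = intercept + slope * year by least squares and return the
    (fractional) year at which the fitted line reaches `target`.

    Returns None if the trend is flat or declining. Deliberately naive:
    no saturation, no mandates, no modelling of underlying processes.
    """
    n = len(years)
    if n < 2 or n != len(shares):
        raise ValueError("need at least two (year, share) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    if slope <= 0:
        return None  # a flat or shrinking trend never reaches the target
    intercept = mean_y - slope * mean_x
    return (target - intercept) / slope


# Synthetic points (hypothetical, NOT from the study) rising 2 points per year.
print(projected_year_of_full_oa([2000, 2005, 2010], [0.10, 0.20, 0.30]))
```

With these synthetic points the fitted line gains 0.02 per year, so it crosses 1.0 at roughly 2045; real OA data would need a curve that levels off as it approaches 100%.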

But while the dates can’t be known for certain, what the data makes very clear is that we are headed for an era of universal OA. It’s not a question of if, but when. And that’s great news.

Open Access, coming to a workflow near you: welcome to the year of Ubiquitous OA

Thanks to 20 years of OA innovation and advocacy, today you can legally access around half the recent research literature for free. However, in practice, much of this free literature is not as open as we’d like it to be, because it’s hard for readers to find the OA version.

OA lives on repositories and publisher websites. But very few people visit these sources directly to find a given article. Instead, people rely on the search tools that are already part of their existing workflows. Historically, these haven’t done a great job surfacing OA resources. Google, for instance, often fails to index OA versions, in addition to indexing content of dubious provenance. OA aggregators like BASE, CORE, and OpenAIRE aim to solve this by emphasizing OA coverage, but they require researchers to add a second or third search step to their existing workflows–something researchers have been reluctant to do.

So in addition to the well-known access problem, we also have a discovery problem. On the one hand, there’s a healthy, efficient OA infrastructure in journals and repositories. On the other, we have millions of individual readers doing their own thing. We need to connect these. We need to cover this last mile between the infrastructure and the individual user, and we need to make that connection easy and seamless and ubiquitous. Until we do, OA is writing a check it can’t fully cash.

But the news is good: over the last year, several efforts have emerged to cover that last mile. Our contribution was Unpaywall: an extension that shows a green tab in your browser on articles where there’s an OA version available. Unpaywall has enjoyed lots of success, adding over 100,000 active users in under a year. Moreover, the backend database of Unpaywall (formerly called oaDOI) can be integrated into any number of existing tools, making it easier to spread OA content all over the place. For instance, we’re already seeing over a million uses every day from library link resolvers.

Our most recent integration takes this to a new level, and we’re so excited about it: thanks to a new partnership between Impactstory and Clarivate Analytics, data from Impactstory’s Unpaywall database is now live in the Web of Science, making it the first editorially-curated and publisher-neutral resource to implement this technology. Web of Science has been able to use Unpaywall data to discover and link to millions more OA records amongst their existing content. This enables millions of Web of Science users around the world to link straight from their search results to a trusted, legal, peer-reviewed OA version—and they can also filter search results by the different versions of OA.

This is a big deal because abstracting and indexing (A&I) systems like Web of Science are currently the most important way researchers access literature. And though it’s by no means the only A&I system out there, Web of Science is the most respected and most prevalent. Every month, millions of users access literature through Web of Science—and now, each and every one of them will see more OA options for articles they might not otherwise discover, right alongside subscribed content. Every day. What a huge change from the days we had to convince folks that OA was legitimate at all! It’s a new era.

A new era: that’s not just a hyperbolic phrase. We think this year marks a turning point in the OA narrative. We’re moving out of the author-focused, advocacy-focused initial phase, and into a more mature era of ubiquitous Open Access, characterized by deep integration of OA into researcher workflows and value-add services built on top of the immense OA corpus. This is the era of user-focused OA.

As OA becomes the default state for published research, tools that centralize, mine, index, search, organize, and extract knowledge from papers suddenly become massively more powerful. Integrations between Unpaywall and commercial services aren’t generating this new era, but they are one of the hallmarks of it. We’re not making new OA, but rather starting to leverage the massive OA corpus now available. In the last year, many others have begun to do this as well. Many, many more will follow.

For years, we in the OA advocate community have been arguing that a critical mass of OA would not just improve scholarly communication, it would transform it. This is finally beginning to happen, and we think this partnership with Web of Science is an early part of that transformation. Now, a subscription to Web of Science—something most academic libraries globally already have—is also a subscription to a database of millions of free-to-read OA articles.

We’ve never been more excited about the future of OA–or more thankful for all the work the OA community as a whole has done to get here. And we can’t wait to keep working together with the community to help make the vision of ubiquitous open access a reality.

New partnership with Clarivate to help oaDOI find even more Open Access

We’re excited to announce a new partnership with Clarivate Analytics!

This partnership between Impactstory and Clarivate will help fund better coverage of Open Access in the oaDOI database. The improvements will grow our index of free-to-read fulltext copies, bringing the total number to more than 18 million, along with 86 million article records altogether. All this data will continue to be freely accessible to everyone via our open API.
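As a rough illustration of what “freely accessible via our open API” looks like from the client side, here is a minimal Python sketch. It assumes the current Unpaywall v2 REST endpoint (`https://api.unpaywall.org/v2/{doi}?email=...`, the successor to the oaDOI API) and its `is_oa` / `best_oa_location` response fields; the DOI, email address, and record in the example are hypothetical placeholders, so check the live API documentation before relying on any field name.

```python
import json
from urllib.parse import quote

# Assumed endpoint (Unpaywall v2); the older oaDOI API used a different host.
API_BASE = "https://api.unpaywall.org/v2/"


def request_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI; callers identify themselves by email."""
    return f"{API_BASE}{quote(doi, safe='/')}?email={quote(email)}"


def best_free_link(record: dict):
    """Return a free-to-read URL from a parsed JSON record, or None if not OA."""
    if not record.get("is_oa"):
        return None
    # best_oa_location may be null in the JSON, which json.loads turns into None.
    best = record.get("best_oa_location") or {}
    # Use the location's primary url, falling back to url_for_pdf.
    return best.get("url") or best.get("url_for_pdf")


# Hypothetical record shaped like an API response (not real data).
record = json.loads("""{
  "doi": "10.1234/example",
  "is_oa": true,
  "best_oa_location": {
    "url": "https://repository.example.org/paper.pdf",
    "host_type": "repository",
    "version": "acceptedVersion"
  }
}""")
print(request_url("10.1234/example", "you@example.org"))
print(best_free_link(record))
```

Keeping URL building and response parsing separate means the parsing logic can be tested offline, as above, and reused by a link resolver, a browser extension, or a batch job.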

The partnership with Clarivate Analytics will put oaDOI data in front of users at thousands of new institutions, by integrating our index into the popular Web of Science system. The oaDOI API is already in use by more than 700 libraries via SFX, and delivers more than 500,000 fulltext articles to users worldwide every day. It also powers the free Unpaywall browser extension, used by over seventy thousand people in 145 countries.

You can read more about the partnership in Clarivate’s press release. We’ll be sharing more details about improvements in the coming months. Exciting!

Green Open Access comes of age

This morning David Prosser, executive director of Research Libraries UK, tweeted, “So we have @unpaywall, @oaDOI_org, PubMed icons – is the green #OA infrastructure reaching maturity?”

We love this observation, and not just because two of the three projects he mentioned are from us at Impactstory 😀. We love it because we agree: Green OA infrastructure is at a tipping point where two decades of investment, a slew of new tools, and a flurry of new government mandates are about to make Green OA the scholarly publishing game-changer.

A lot of folks have suggested that Sci-Hub is scholarly publishing’s “Napster moment,” where the internet finally disrupts a very resilient, profitable niche market. That’s probably true. But just as the music industry shut down Napster, Elsevier will likely be able to shut down Sci-Hub. They’ve got both the money and the legal (though not moral) high ground, and that’s a tough combo to beat.

But the future is what comes after Napster. It’s in the iTunes and Spotifys of scholarly communication. We’ve built something to help create this future. It’s Unpaywall, a browser extension that instantly finds free, legal Green OA copies of paywalled research papers as you browse–like a master key to the research literature. If you haven’t tried it yet, install Unpaywall for free and give it a try.

Unpaywall has reached 5,000 active users in our first ten days of pre-release.

But Unpaywall is far from the only indication that we’re reaching a Green OA inflection point. Today is a great day to appreciate this, as there’s amazing Green OA news everywhere you look:

Unpaywall reached the 5,000 active users milestone. We’re now delivering tens of thousands of OA articles to users in over 100 countries, and growing fast.
PubMed announced Institutional Repository LinkOut, which links every PubMed article to a free Green copy in institutional repositories where available. This is huge, since PubMed is one of the world’s most important portals to the research literature.
The Open Access Button announced a new integration with interlibrary loan that will make it even more useful for researchers looking for open content. Along with the interlibrary loan request, they send instructions to authors to help them self-archive closed publications.

Over the next few years, we’re going to see an explosion in the amount of research available openly, as government mandates in the US, UK, Europe, and beyond take force. As that happens, the raw material is there to build completely new ways of searching, sharing, and accessing the research literature.
We think Unpaywall is a really powerful example: When there’s a big Get It Free button next to the Pay Money button on publisher pages, it starts to look like the game is changing. And it is changing. Unpaywall is just the beginning of the amazing open-access future we’re going to see. We can’t wait!

How to smash an interstellar paywall

Last month, hundreds of news outlets covered an amazing story: seven earth-sized planets were discovered, orbiting a nearby star. It was awesome. Less awesome: the paper with the details, published in the journal Nature, was paywalled. People couldn’t read it.

That’s messed up. We’re working to fix it, by releasing our new free Chrome extension Unpaywall. Using Unpaywall, you can get access to the article, and millions like it, instantly and legally. Let’s learn more.

First, is this really a problem? Surely Google can find the article. I mean, there might be aliens out there. We need to read about this. Here we go, let’s Google for “seven terrestrial planets nature article.” Great, there it is, first result. Click, and…

What, thirty-two bucks to read!? Well that’s that, I quit.

Or maybe there are some ways around the paywall? Well, you can know someone with access. My pal Cindy Wu helped out her journal club this way, offering on Twitter to email them a copy of the paper. But you have to follow Cindy on Twitter for that to work.

Or you could know the right places to look for access. Astronomers generally post their papers on a free web server called the ArXiv, and sure enough, if you search there, you’ll find the Nature paper. But you have to know about ArXiv for that to work. And check out those Google search results again: ArXiv doesn’t appear.

Most people don’t know Cindy, or ArXiv. And no one’s paying $32 for an article. So the knowledge in this paper, and thousands of papers like it, is locked away from the taxpayers who funded it. Research becomes the private reserve of those privileged few with the money, experience, or connections to get access.

We’re helping to change that.

Install our new, free Unpaywall Chrome extension and browse to the Nature article. See that little green tab on the right of the page? It means Unpaywall found a free version, the one the authors posted to ArXiv. Click the tab. Read for free. No special knowledge or searches or emails or anything else needed.

Today you’ll find Unpaywall’s green tab on ten million articles, and that number is growing quickly thanks to the hard work of the open-access movement. Governments in the US, UK, Europe, and beyond are increasingly requiring that taxpayer-funded research be publicly available, and as they do, Unpaywall will get more and more effective.

Eventually, the paywalls will all fall. Till then, we’ll be standing next to ‘em, handing out ladders. Together with millions of principled scientists, libraries, techies, and activists, we’re helping make scholarly knowledge free to all humans. And whoever else is out there 😀 👽.

Introducing Unpaywall: unlock paywalled research papers as you browse

Last Friday night we tweeted about a new Chrome extension we’ve been working on. It’s called Unpaywall, and it links you to free fulltext as you browse research articles. Hit a paywall? No problem: click the green tab and read it free.

Unpaywall is powered by an index of over ten million legally-uploaded, open-access resources, and it delivers. For example, in a set of 11k recent cancer research articles covered in mainstream media, Unpaywall users were able to read around half of them for free–even without any subscription, and even though most of them were paywalled.

So far the response to Friday’s tweet has been amazing — 500 retweets, and in just a few days we’ve gotten more than 1500 installations: Hockey stick growth! 🙂

And we’ve also gotten rave reviews, like this one from Sarah:

Wow, just wow, this could be the single most useful extension ever! https://t.co/FWTBje4Lq2

— Sarah Hammond (@schammond) March 14, 2017

Why the excitement? Finding free, legal, open access is now super easy — it happens automatically. With the Unpaywall extension, links to open access are automatically available as you browse.

The Unpaywall browser extension by @Impactstory is the best way I've seen to quickly find #OA versions of $ papers https://t.co/sFZXSJMqJl https://t.co/lDYkgm3tYV

— Ethan White (@ethanwhite) March 13, 2017

This is useful for researchers like Ethan. It’s also really helpful for people outside academia, who don’t enjoy the expensive subscription benefits of institutional libraries. This is especially true for nonprofits:

@jasonpriem as someone who runs a non-profit organisation in a developing country this extension is GOLD! thank you

— Nikita Shiel-Rolle (@Nikitasr135) March 14, 2017

…and folks working to communicate scholarship to a broader audience:

As a science writer, #openaccess papers are solid gold. This is great! https://t.co/jH49m8i6HT

— FLGenomicsFrances (@FLGFrances) March 14, 2017

Go give it a try and see what you think! The official release is April 4th, but you can already install it, learn more, and follow @unpaywall. We’d love your help to spread the word about Unpaywall to your friends and colleagues. Together we can accelerate toward a future of full #openaccess for all!

behind the scenes: cleaning dirty data

Dirty Data. It’s everywhere! And that’s expected and ok and even frankly good imho — it happens when people are doing complicated things, in the real world, with lots of edge cases, and moving fast. Perfect is the enemy of good.

Thanks to http://www.navigo.com.au/2015/05/cleaning-out-the-closet-how-to-deal-with-dirty-data/ for the image.

Alas, it’s definitely behind-the-scenes work to find and fix dirty data problems, which means none of us learn from each other in the process. So here’s a quick post about a dirty data issue we recently dealt with 🙂 Hopefully it’ll help you feel some camaraderie, and maybe help some people using the BASE data.

We traced some oaDOI bugs to dirty records from PMC in the BASE open access aggregation database.

Most PMC records in BASE are really helpful — they include the title, author, and link to the full text resource in PMC. For example, this record lists valid PMC and PubMed urls:

and this one lists the PMC and DOI urls:

The vast majority of PMC records in BASE look like this. So until last week, to find PMC article links for oaDOI we looked up article titles in BASE and used the URL listed there to point to the free resource.

But! We learned! There is sometimes a bug! This record has a broken PMC URL: it lists http://www.ncbi.nlm.nih.gov/pmc/articles/PMC with no PMC id in it (look at the URL: there’s nothing about it that points to a specific article, right?). To get the PMC link you’d have to follow the PubMed link and then click through to PMC from there. (That page does exist; here’s the PMC page we wish the BASE record had pointed to.)
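One cheap way to catch records like this is to check whether the PMC URL actually carries an article id. Here is a minimal sketch, assuming PMC article URLs follow the `/pmc/articles/PMC<digits>` pattern seen above; `pmcid_from_url` is a hypothetical helper for illustration, not oaDOI’s actual code, and the id in the example is made up.

```python
import re

# Matches e.g. .../pmc/articles/PMC1234567 or .../pmc/articles/PMC1234567/
PMC_ARTICLE_URL = re.compile(r"/pmc/articles/(PMC\d+)", re.IGNORECASE)


def pmcid_from_url(url):
    """Return the PMCID embedded in a PMC article URL, or None if it has none."""
    match = PMC_ARTICLE_URL.search(url or "")
    return match.group(1).upper() if match else None


# The dirty record's URL stops at "PMC" with no digits, so no id is found.
print(pmcid_from_url("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC"))
# A well-formed URL (hypothetical id) yields the PMCID.
print(pmcid_from_url("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/"))
```

A `None` result flags the record as needing some other route to the full text rather than a link to a generic PMC page.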

That’s some dirty data. And it gets worse. Sometimes there is no PubMed link at all, like this one (a correct PMC link does exist):

and sometimes there is no valid URL, so there’s really no way to get there from here:

(Pretty cool that PMC lists this article from 1899, eh? Edge cases for papers published more than 100 years ago seem fair, I’ve gotta admit 🙂)

Anyway. We found that this dirty PMC data in BASE is infrequent, but common enough to cause more bugs than we’re comfortable with. To work around it, we’ve added a step: oaDOI now uses the DOI->PMCID lookup file offered by PMC to find PMC articles we might otherwise miss. It adds a bit more complexity, but it’s worth it in this case.
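The fallback step can be sketched roughly like this. It assumes the PMC id-mapping file is a CSV with header columns named `DOI` and `PMCID` (check the real file’s header before using it), and `load_doi_to_pmcid` / `pmc_link` are hypothetical names for illustration, not oaDOI’s actual implementation. The sample rows are invented.

```python
import csv
import io

PMC_ARTICLE_BASE = "https://www.ncbi.nlm.nih.gov/pmc/articles/"


def load_doi_to_pmcid(csv_file) -> dict:
    """Build a DOI -> PMCID dict from a PMC id-mapping CSV.

    Assumes header columns named 'DOI' and 'PMCID'; rows missing either are skipped.
    """
    mapping = {}
    for row in csv.DictReader(csv_file):
        doi = (row.get("DOI") or "").strip()
        pmcid = (row.get("PMCID") or "").strip()
        if doi and pmcid:
            # DOIs are case-insensitive, so normalise keys to lower case.
            mapping[doi.lower()] = pmcid
    return mapping


def pmc_link(base_pmcid, doi, doi_to_pmcid):
    """Prefer the PMCID taken from the BASE record; if that record was dirty
    (no id), fall back to the DOI -> PMCID lookup table."""
    pmcid = base_pmcid or doi_to_pmcid.get((doi or "").lower())
    return f"{PMC_ARTICLE_BASE}{pmcid}/" if pmcid else None


# Invented sample rows; the second has no DOI and is ignored.
sample = io.StringIO("DOI,PMCID,PMID\n10.1234/Example.One,PMC1111111,11111111\n,PMC2222222,22222222\n")
table = load_doi_to_pmcid(sample)
print(pmc_link(None, "10.1234/EXAMPLE.ONE", table))
```

Loading the mapping once into a dict keeps each fallback lookup cheap, which matters when checking millions of DOIs.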

So, that’s This Week In Dirty Data from oaDOI! 🙂 Tune in next week for, um, something else 🙂

And don’t forget Open Data Day is Saturday March 4, 2017. Perfect is the enemy of the good — make it open.

OurResearch blog