We’ve got a board. And it’s awesome.

Total-impact is in the process of incorporating as a non-profit, which means (among other things) we need to form a Board of Directors.

It’s been a tough decision, since we are lucky to know tons of qualified people, and there’s not even a standard number of people to pick. After much discussion, though, we decided two things:

  • Small is better than big. We like being light and agile, and fewer people is consistent with that.
  • Aim high. Worst that can happen is they say no.

The first point led us to a board of four people: the two of us and two more. The second point led us to ask our best-case-scenario choices, Cameron Neylon and John Wilbanks. Both these guys are whip smart, extraordinary communicators, and respected leaders in the open science communities. Both have extensive experience working with researchers and funders (our users) at all levels. And both mix principle and pragmatism in their writing and work in a way that resonates with us. These people are, along pretty much every dimension, our dream board.

And we are immensely excited (I am literally bouncing in my seat at the coffee shop as I write this) to now publicly announce: they both said yes.  So welcome, Cameron and John. We’re excited to start changing how science works, together.

Percentiles

In the last post we talked about the need to give raw counts context by comparing them to expected impact.  How should this background information be communicated?

Our favourite approach: percentiles.

Try it on for size: Your paper is in the 88th percentile of CiteULike bookmarks, relative to other papers like it.  That tells you something, doesn’t it?  The paper got a lot of bookmarks, but there are some papers with more.  Simple, succinct, intuitive, and applicable to any type of metric.

Percentiles were also the favoured approach for context in the “normalization” breakout group at altmetrics12, and have already popped up as a total-impact feature request. Percentiles have been explored in scientometrics for journal impact metrics, including in a recent paper by Leydesdorff and Bornmann (http://dx.doi.org/10.1002/asi.21609; free preprint PDF available). The abstract says “total impact” in it, did you catch that?  🙂

As it turns out, actually implementing percentiles for altmetrics isn’t quite as simple as it sounds.  We have to make a few decisions about how to handle ties, and zeros, and sampling, and how to define “other papers like it”…. stay tuned.
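To make that concrete, here’s a minimal sketch of one way to compute such a percentile against a reference sample of raw counts. It’s illustrative only (not total-impact’s production code), and it bakes in one particular choice for ties and zeros; the names are made up for this post.

```python
def percentile(value, reference_counts):
    """Percentile of `value` within a sample of raw counts for similar papers.

    Ties use the midpoint convention (average of the "strictly below" and
    "at or below" ranks), and zero-count papers stay in the sample, so a
    single bookmark can still beat a lot of them.
    """
    if not reference_counts:
        return None
    below = sum(1 for c in reference_counts if c < value)
    at_or_below = sum(1 for c in reference_counts if c <= value)
    return 100.0 * (below + at_or_below) / 2.0 / len(reference_counts)

# e.g. 7 CiteULike bookmarks, against a made-up sample of "papers like it":
sample = [0, 0, 0, 1, 1, 2, 3, 5, 8, 13]
print(percentile(7, sample))  # -> 80.0
```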

(part 2 of a series on how total-impact plans to give context to the altmetrics it reports. see part 1, part 3, and part 4.)

What do we expect?

How many tweets is a lot?

Total-impact is getting pretty good at finding raw numbers of tweets, bookmarks, and other interactions. But these numbers are hard to interpret. Say I’ve got 5 tweets on a paper—am I doing well? To answer that, we must know how much activity we expect on a paper like this one.

But how do we know what to expect? To figure this out, we’ll need to account for a number of factors:

First, expected impact depends on the age of the paper.  Older papers have had longer to accumulate impact: an older paper is likely to have more citations than a younger paper.

Second, especially for some metrics, expected impact depends on the absolute year of publication.  Because papers often get a spike in social media attention at the time of publication, papers published in years when a social tool is very popular receive more attention on that tool than papers published before or after the tool was popular.  For example, papers published in years when Twitter has been popular receive more tweets than papers published in the 1980s.

Third, expected impact depends on the size of the field.  The more people there are who read papers like this, the more people there are who might Like it.

Fourth, expected impact depends on the tool adoption patterns of the subdiscipline.  Papers in fields with a strong Mendeley community will have more Mendeley readers than papers published in fields that tend to use Zotero.

Finally, expected impact depends on what we mean by papers “like this.”  How do we define the relevant reference set?  Other papers in this journal?  Papers with the same indexing terms?  Funded under the same program?  By investigators I consider my competition?

There are other variables too.  For example, a paper published in a journal that tweets all its new publications will get a twitter boost, an Open Access paper might receive more Shares than a paper behind a paywall, and so on.
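Pulling these factors together: the general shape of the problem is to build a reference set of papers that match on things like publication year and field, then judge the observed count against that baseline. Here’s a deliberately crude sketch of that idea; the field names and the median-as-baseline choice are illustrative, not a description of total-impact’s actual approach.

```python
def reference_set(papers, year, field):
    """Papers sharing a publication year and field: the same-year filter
    handles age and absolute-year effects, the same-field filter handles
    field size and tool-adoption effects."""
    return [p for p in papers if p["year"] == year and p["field"] == field]

def looks_high(count, metric, papers, year, field):
    """Is `count` above the median for comparable papers?  (The median is a
    deliberately crude stand-in for a real expected-impact model.)"""
    sample = sorted(p[metric] for p in reference_set(papers, year, field))
    if not sample:
        return None  # no baseline, no judgment
    return count > sample[len(sample) // 2]

papers = [
    {"year": 2011, "field": "genomics", "tweets": 2},
    {"year": 2011, "field": "genomics", "tweets": 3},
    {"year": 2011, "field": "genomics", "tweets": 9},
    {"year": 1985, "field": "genomics", "tweets": 0},  # pre-Twitter; excluded by year
]
print(looks_high(5, "tweets", papers, 2011, "genomics"))  # True: above the 2011 median
```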

Establishing a clear and robust baseline won’t be easy, given all of this complexity!  That said, let’s start.  Stay tuned for our plans…

(part 1 of a series on how total-impact plans to give context to the altmetrics it reports. see part 2, part 3, and part 4.)

Learning from our mistakes: fixing bad data

Total-impact is in early beta.  We’re releasing early and often in this rapid-push stage, which means that we (and our awesome early-adopting users!) are finding some bugs.

As a result of early code, a bit of bad data had made it into our total-impact database.  It affected only a few items, but even a few is too many.  We’ve traced it to a few issues:

  • our Wikipedia code called the Wikipedia API with the wrong type of quotes, in some cases returning partial matches
  • when PubMed can’t find a DOI and the DOI contains periods, it turns out that the PubMed API breaks the DOI into pieces and tries to match any of the pieces.  Our code didn’t check for this.
  • a few DOIs were entered with null and escape characters that we didn’t handle properly (see the sketch below for the kinds of checks we’ve since added)
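For the curious, here’s a rough sketch (not our actual code; the function names are made up for this post) of the kinds of defensive checks involved: scrub control characters out of incoming DOIs before they go anywhere near an external API, and refuse to accept a lookup result that only matches a fragment of the DOI we asked about.

```python
import re

def clean_doi(raw):
    """Strip null bytes, other control characters, and stray whitespace
    from a DOI before it is sent to any external API."""
    cleaned = re.sub(r"[\x00-\x1f\x7f]", "", raw).strip()
    if not cleaned.startswith("10."):
        raise ValueError("doesn't look like a DOI: %r" % raw)
    return cleaned

def is_exact_match(queried_doi, returned_doi):
    """Reject partial matches: a lookup that split the DOI on its periods
    and matched only a fragment shouldn't count as a hit."""
    return returned_doi is not None and returned_doi.lower() == queried_doi.lower()

print(clean_doi("10.1234/example.doi\x00"))                      # '10.1234/example.doi'
print(is_exact_match("10.1234/example.doi", "10.1234/example"))  # False
```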

We’ve fixed these and redoubled our unit tests to catch these sorts of bugs earlier in the future… but how do we purge the bad data currently in the database?

Turns out that the data architecture we had been using didn’t make this easy.  A bad PubMed ID propagated through our collected data in ways that were hard for us to trace.  Arg!  We’ve learned from this, and taken a few steps:

  • deleted the problematic Wikipedia data
  • deleted all the previously collected PubMed Central citation counts and F1000 notes
  • deleted 56 items from collections because we couldn’t rederive the original input string
  • updated our data model to capture provenance information so this doesn’t happen again!
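That last step deserves a word of explanation: “capturing provenance” here means keeping the original input string and a record of each lookup alongside the IDs we derive from it, so a bad ID can be traced back to the query that produced it and simply re-resolved. A sketch of the general shape (the field names are made up for this post, not our actual schema):

```python
item = {
    "original_input": "10.1234/example.doi",   # exactly what the user gave us
    "aliases": {
        "doi": "10.1234/example.doi",
        "pmid": "12345678",                    # made-up value, for illustration
    },
    "alias_history": [
        {
            "provider": "pubmed",
            "queried_with": "doi:10.1234/example.doi",
            "returned": "pmid:12345678",
            "collected_date": "2012-07-20",
        },
    ],
}

# If the pmid later turns out to be bogus, the history shows which query
# produced it, and the original input lets us re-run the lookup instead of
# deleting whole swaths of collected data.
```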

What does this mean for a total-impact user?  You may notice fewer Wikipedia and PubMed Central counts than you saw last week if you revisit an old collection.  Click the “update” button at the top of a collection and accurate data will be re-collected.

It goes without saying: we are committed to bringing you Accurate Data (and radical transparency on both our successes and our mistakes 🙂 ).

New Twitter account

We created total-impact’s @totalimpactdev Twitter account a while ago, as a way to keep our small group of developers and early users in the loop about changes to the code. Since then, total-impact has matured past the point where only developers care.

So, we’re updating our Twitter handle accordingly: we’re now tweeting from @totalimpactorg. If you follow us already, no need to change anything. If you don’t, do! Our codebase and feature list are improving almost daily, and our Twitter feed is a great way to stay up to date.

What’s your pain?

We want to build a product users want.  No, actually, we want to build a product users *need*.  A product that solves pain, that solves problems.  Best way to know what the problems are?  Get out of the building and ask.

So, dear potential-future-users: where are you currently feeling real pain about tracking the impact of your research?  

Here are three potential places:

  • You are desperate to learn more about your impact for your own curiosity.
  • You put all of this time into your research, and you really want your circle to know about it.  You need to share info about your impact.
  • You want to be rewarded for your impact when evaluated for hiring, promotion, grants, and awards.

What’s the rank order of these pains for you?  Are there others?  Tell us all about it so we can build the tool that you need: team@total-impact.org or @totalimpactdev.

load all your Google Scholar publications into total-impact

A lot of users have pointed out that it’s hard to get lists of articles into total-impact: you can cut and paste DOIs, but most people don’t have those on hand. Today we’re launching an awesome new feature to fix that: importing from Google Scholar “My Citations” profiles.

To use it, just visit your profile and click Actions->export, then “Export all my articles.” Save the file it gives you. Upload the file to total-impact in the “Upload a BibTeX file” box when you create your collection (and of course, you can still add other research products from Slideshare, GitHub, Dryad, and elsewhere, too). In minutes, you can go from a narrow, old-fashioned impact snapshot to a rich, multi-dimensional image of your research’s diverse impacts.
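Under the hood, the importer has to turn each BibTeX entry into an identifier it can track. As a rough illustration of the general idea (not the importer’s actual code or endpoint), here’s how a title could be matched to a DOI using CrossRef’s public REST API and the requests library:

```python
import requests  # pip install requests

def doi_from_title(title):
    """Ask CrossRef for the best-matching DOI for a bibliographic title.
    Entries that don't resolve to a DOI get skipped."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return items[0].get("DOI") if items else None

# doi_from_title("Some article title from your BibTeX export")
# -> a DOI string, or None if CrossRef finds no match
```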

Thanks to Google Scholar for making profiles easy to export, and CrossRef for their open API. This feature is still experimental (we only get articles with DOIs, for instance, so some are left out), and we’d love your feedback. Enjoy!

new metrics: number of student readers, citations by review articles, and more…

We’ve added some cool new metrics to total-impact:

  • the number of citations by papers in PMC
  • the number of citations by review papers in PMC
  • the number of citations by editorials in PMC
  • the number of student readers in Mendeley (roughly, based on the top three reported job descriptions)
  • the number of Mendeley readers from developing countries (again, roughly)
  • an “F1000 Yes” note if an article has been reviewed by F1000

See them in action in our sample collection.

These are exciting metrics for two reasons: they aren’t easily available elsewhere in this format, and we think they’ll be powerful signals about the impact flavor of research.

Thanks to PMC and Mendeley for making their data and filters available via an Open API: this sort of innovation isn’t otherwise possible.
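For the curious, here’s a minimal sketch of how a count like “citations by papers in PMC” can be derived from NCBI’s public E-utilities. It illustrates the general approach rather than our production code, and the review and editorial breakdowns take a further lookup of each citing record’s publication type (omitted here).

```python
import requests  # pip install requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pmc_citing_ids(pmid):
    """IDs of PMC articles that cite the given PubMed article, via the
    E-utilities elink service and its pubmed_pmc_refs link."""
    resp = requests.get(
        EUTILS + "/elink.fcgi",
        params={
            "dbfrom": "pubmed",
            "db": "pmc",
            "linkname": "pubmed_pmc_refs",
            "id": pmid,
            "retmode": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    linksetdbs = resp.json()["linksets"][0].get("linksetdbs", [])
    return [link for db in linksetdbs for link in db.get("links", [])]

# len(pmc_citing_ids(some_pmid)) is the "citations by papers in PMC" count;
# filtering those citing records by publication type gives the review and
# editorial counts.
```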

If you have a current collection on total-impact and want to see these metrics, hit the “update” button.  New collections will all include these metrics.  Enjoy!

Megasprint!

As of yesterday, I (Jason) have joined Heather in Vancouver for what we’re calling the Megasprint: two months of 12hr-day, take-no-prisoners, hardcore hacking on total-impact. We’re working toward the mid-September release of our next version, codenamed Bruce (total-impact sounds like an action movie name…why fight it). 

Bruce will be our first heavily-publicized release (there will be t-shirts!), and will feature collection-level analysis and visualization tools, data from tons of new providers, and support for collections tracking hundreds of thousands of articles, datasets, software projects, and more. And of course the ability to knock Alan Rickman off a building.

We’re super excited about all we’ll be able to get done in the next, intense two months…stay tuned!