The Hakai Institute, as seen by OpenAlex

Posted on June 15, 2026 by Kyle Demes

A case study of a research funder, told through OpenAlex data and some of my own history

TL;DR. As part of a Wellcome-funded effort to build open funding metadata into OpenAlex, I assembled research outputs I could tie to the Tula Foundation and its Hakai Institute, the Canadian funder where I got my start in research management and strategy over a decade ago. The result is 1,496 research works (overwhelmingly coastal and environmental science), cited 58,000+ times by research from 199 countries and every field of science, with more than 6% in the world’s top 1% most-cited papers. The work runs from a 13,000-year-old discovery on a BC beach to the pathogen behind a continent-wide ecological collapse; it’s co-acknowledged alongside 500+ other funders; and it shows BC universities working across institutional silos. This is a story about a remarkable funder and a demonstration of what is now possible with open research funding data integrated into OpenAlex.

Measuring what funders make possible

For the past year at OpenAlex, I’ve been leading the development and implementation of a Wellcome-funded project to build out open funding metadata and to connect the world’s research to the funders who made it possible, as open data anyone can use (more on that here, including a link to the original proposal). We’ve now made enough progress that I wanted to put our data through the test of a real case study. I chose one close to home: the Tula Foundation and its Hakai Institute, the Canadian research funder where I first cut my teeth on research management and strategy.

The Tula Foundation was founded in 2001 by Eric Peterson and Christina Munck, after Eric sold his medical-imaging company. With those resources, they did something ambitious: they built two world-class marine research stations, one on Calvert Island, on the wild Pacific edge of the Great Bear Rainforest, and one on Quadra Island in the Salish Sea. Rather than issue a series of smaller grants to researchers at universities, they aspired to bring scientists together across university and sectoral silos to focus on place-based research along British Columbia’s coastline, in collaboration with the First Nations whose territories span the region. The Calvert lodge was acquired in 2009 and opened the following spring; the Hakai Institute (for a time the “Hakai Beach Institute,” and earlier administered as the “Hakai Network”) grew from there.

Tula’s work reaches well beyond the coast, including other important initiatives like TulaSalud, a maternal- and child-health program in Guatemala. But the coastal science is the part I know best, because I lived a chapter of it, and so I focus on that aspect of their work here.

(More reporting on the Tula Foundation here and here)

My chapter

After finishing my PhD at UBC, I was hired as a postdoc in Anne Salomon’s lab at Simon Fraser University, funded through the Hakai Network. My job was to design a kelp-forest monitoring protocol for the Central Coast (the first long-term baseline of its kind there) built to be interoperable with monitoring initiatives elsewhere in BC (like Haida Gwaii) but also aimed squarely at a question that was rewriting the coast in real time: what happens as sea otters return after decades of local extinction? I ran a dive team out of the Calvert Island station for two field seasons, and supported the work of trainees like Jenn Burt and Christine Stevenson, who have since launched successful careers in the field.

As my postdoc wound down, I was interviewing for faculty jobs and was conflicted about my future, not wanting to leave BC, which had become my home. Around the same time, Eric and Christina were rethinking how to best mobilize Tula’s resources to move coastal research forward. Because I already knew researchers across the BC universities and was familiar with Eric’s vision, he offered me a full-time role building Hakai’s in-house research administration: designing better ways to partner with universities, and exploring collaborative models with government, industry, First Nations, other funders, and NGOs.

I was in that job less than a year. But the entrepreneurial spirit of the organization and the founders’ ambition and mission-focus gave me a unique crash course in research management, specifically: how to align the incentives of researchers and institutions with a mission (which are often orthogonal) to produce real impact. These experiences became the foundation of everything I’ve done since and inspired a career dedicated to analytical approaches to large-scale research strategy, ultimately motivating my work at OpenAlex today, supporting research globally with an open index of the world’s research ecosystem.

You can probably read from my tone that I am not a neutral observer in this story, but I see that as an excellent opportunity to test the state of OpenAlex’s funding data.

Finding the corpus: casting a wide net

Here’s the first hard problem, and it’s a general one for funder analyses: you cannot just look up a funder and expect the list to be completely accurate and comprehensive.

Tula and Hakai aren’t always acknowledged the way a federal funding agency is in publications. Conventions have changed over two decades; some researchers name “Hakai Institute” as an affiliation rather than in an acknowledgements section; closed-access papers may carry no machine-readable funding; Hakai publishes open datasets through a repository hosted in-house; and, as I mentioned above, Tula supports great initiatives outside of the mission of the Hakai Institute. A single funder lookup would miss some outputs and include others that don’t fit the scope of this analysis.

So I cast a deliberately wide net using complementary search strategies in OpenAlex:

Strategy	What it catches
Funder = Hakai Institute / Tula Foundation	Explicit funder acknowledgements and linkages asserted in Crossref
Affiliation = Hakai Institute / Tula Foundation	Authors who list Hakai/Tula as one of their institutions
Raw affiliation string “Hakai”	Affiliations OpenAlex hasn’t yet linked to an institution
Full-text “Hakai Institute” / “Hakai Network” / “Tula Foundation”	Mentions in the open full text
Full-text “Calvert Island”	Work done at the field station, however acknowledged
Datasets	Hakai-published open data

That produced 1,684 candidate works, but casting a wide net means catching some fish you didn’t want to catch, which is exactly why the next step matters. [To those who are annoyed by unnecessary marine puns: I’m sorry, but I just couldn’t help myself.]
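The wide-net step amounts to taking the union of several result sets while remembering which strategy surfaced each work. Here is a minimal sketch, assuming each strategy has already been run and has returned a list of OpenAlex work IDs. The strategy labels and IDs below are hypothetical placeholders, not the actual queries or results behind this analysis.

```python
from collections import defaultdict


def merge_candidates(results_by_strategy):
    """Union the work IDs from every strategy, keeping provenance.

    results_by_strategy maps a strategy label to an iterable of work IDs.
    Returns {work_id: sorted list of strategies that returned it}.
    """
    provenance = defaultdict(set)
    for strategy, work_ids in results_by_strategy.items():
        for work_id in work_ids:
            provenance[work_id].add(strategy)
    return {work_id: sorted(found_by) for work_id, found_by in provenance.items()}


# Hypothetical results: the IDs are placeholders, not real works.
results = {
    "funder": ["W1", "W2"],
    "affiliation": ["W2", "W3"],
    "fulltext_calvert_island": ["W3", "W4"],
}
candidates = merge_candidates(results)
print(len(candidates))   # number of distinct candidate works
print(candidates["W2"])  # strategies that surfaced W2
```

Keeping the provenance alongside each ID is useful in the verification pass, since a work found only by a loose full-text match deserves more scrutiny than one found by several strategies at once.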

Verifying every candidate

I ran an LLM over all 1,684 candidates (title, abstract, venue, affiliations, provenance) to judge whether each was genuinely Tula/Hakai work, then hand-checked every exclusion and randomly spot-checked inclusions. 170 were false positives. A few illustrate why the step is essential: a protein called Hakai (CBLL1, named for the Japanese word for “destruction”) that full-text matching brought in; the Russian city of Tula and its universities, brought in through affiliation searching; and a 1985 oceanography paper tagged to a funding organization that wouldn’t exist for another sixteen years.

That left 1,514 genuine Tula/Hakai works. The analysis below covers 1,496 of them: overwhelmingly coastal and environmental science, plus Tula’s marine-microbial, earth-science, and related research. I set aside only the clinical and behavioural papers (Medicine, Psychology, Neuroscience) relating to TulaSalud as off-topic for this analysis.

A note on precision and recall. For those who are interested, I estimated precision and recall of this search strategy. Precision (95%) was calculated as the number of publications in the final set divided by the total candidates returned by the search, and recall (90%) was measured as the percent of works from Hakai’s public list of publications (https://hakai.org/publications) that were returned by this search. The 10% of works on their list that were not returned by my searches had no Hakai affiliation, no searchable full text, and/or funding text in formats that we failed to parse. Lists of funded outputs maintained by funders help us identify systematic precision/recall errors to improve, but they also give us the opportunity to create open funding linkages where we otherwise might not have been able to detect a signal. Two weeks ago, we developed a pipeline to accept funder-publication linkages directly from funders (in collaboration with NWO in the Netherlands), and I submitted the remaining missing linkages for Hakai shortly after finalizing this analysis.
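For readers who want to apply the same two definitions to their own funder, the following is a small sketch using set arithmetic. The IDs are toy placeholders, not the real candidate set or Hakai’s publication list, so the printed values illustrate the calculation only.

```python
def precision(verified_ids, candidate_ids):
    """Share of returned candidates that survived verification."""
    return len(verified_ids & candidate_ids) / len(candidate_ids)


def recall(reference_ids, candidate_ids):
    """Share of a funder's own reference list that the search returned."""
    return len(reference_ids & candidate_ids) / len(reference_ids)


# Toy placeholder IDs (hypothetical).
candidate_ids = {f"W{i}" for i in range(1, 11)}  # 10 works returned by search
verified_ids = {f"W{i}" for i in range(1, 9)}    # 8 kept after verification
reference_ids = {"W1", "W2", "W3", "W4", "W11"}  # funder's list; W11 was missed

print(precision(verified_ids, candidate_ids))
print(recall(reference_ids, candidate_ids))
```

The set difference `reference_ids - candidate_ids` gives the missed works directly, which is the list worth inspecting for systematic gaps.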

The real point here. Open funder metadata is now good enough to find a funder’s footprint across journals, repositories, and full text. Broad open metadata plus a verification pass is what makes a study like this possible. A few years ago, it wasn’t. And later this year, users will be able to submit the types of curations I made during this analysis directly to OpenAlex!

What the funding produced

1,496 verified Tula/Hakai research works, spanning effectively 2007 to today, involving 4,742 authors at 1,200+ institutions in 85 countries. ~80% are open access, with over 100 open datasets.

That ~80% open-access share is well above the global average (about 47% of research published 2015–2024). That isn’t an accident. In 2015, Hakai adopted an open-science policy (one I helped implement) requiring open data and publications. Funders are one of the most powerful levers for open science: when the Gates Foundation made immediate open access a condition of its grants and refused to cover APCs, compliance followed across the fields it funds, and other funders were empowered to adopt similar policies. Hakai’s policy is a small, early example of the same dynamic, and it shows up directly in how discoverable and reusable this body of work is today.

Annual outputs climb from a handful before the field stations to well over 100 a year; the dotted lines mark the Calvert (2009) and Quadra (2014) field stations.

The work is overwhelmingly environmental and marine science and the single most common topic is marine and coastal plant biology (read: related to seaweeds, seagrasses, and their ecosystems). Works are published in venues like Frontiers in Marine Science, Marine Ecology Progress Series, The ISME Journal, Proceedings of the Royal Society B, and PNAS.

Subfields of the corpus (top 10; the long tail is grouped as “Other”).

Citations: punching above weight

Counting papers is relatively straightforward but much less meaningful. The more interesting question is whether anyone built on the knowledge produced and shared through those outputs. They did, of course: the corpus has been cited 58,579 times.

The standard field-normalized measure is Field-Weighted Citation Impact (FWCI), where 1.0 is the world average for a paper of the same age, field, and type. People like FWCI because it allows comparisons across very different outputs, which is why it underpins university rankings and many institutional research intelligence use cases. But averaged FWCI is notorious for being heavily right-skewed: a handful of very highly cited papers can pull the mean far above the typical paper. Here the mean FWCI is 6.94, but that is heavily influenced by a few giant consortia papers. Data from the Central Coast of BC can be included in mega-authored global syntheses only because of the Hakai Institute, so it’s still important to value Hakai’s contributions to those highly cited initiatives, but the median and the percentile shares are more robust here:

Median FWCI: 1.69 — the typical Hakai/Tula paper is cited ~69% more than its global peers.
62.5% of works are above the world average.
92 works (6.1%) sit in the world’s top 1% most-cited — 6.1× the expected rate.
526 works (35.2%) are in the top 10% — 3.5× the expected rate.

The corpus is ~6× over-represented in the world’s top 1%, and ~3.5× in the top 10%.
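The over-representation figures follow from simple arithmetic on the counts above: the observed share of works in a citation tier, divided by the share expected by definition (1% or 10%). The sketch below reproduces that calculation and adds a toy illustration of why the mean and median of a right-skewed metric diverge. The toy FWCI values are hypothetical and are not drawn from the corpus.

```python
from statistics import mean, median


def tier_share(n_in_tier, n_total, expected_share):
    """Return (observed share, ratio of observed to expected share)."""
    observed = n_in_tier / n_total
    return observed, observed / expected_share


# Counts taken from the text: 92 and 526 works out of 1,496.
top1_share, top1_ratio = tier_share(92, 1496, 0.01)
top10_share, top10_ratio = tier_share(526, 1496, 0.10)
print(f"top 1%: {top1_share:.1%} of works, {top1_ratio:.1f}x expected")
print(f"top 10%: {top10_share:.1%} of works, {top10_ratio:.1f}x expected")

# Toy FWCI values (hypothetical): one outlier drags the mean far above the median.
toy_fwci = [0.5, 1.0, 1.7, 2.0, 90.0]
print(mean(toy_fwci), median(toy_fwci))
```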

What was the science?

Citation metrics tell you that people built on the work; they don’t tell you what the work was. So I pulled the 50 highest-FWCI works to understand the work and its impact better. It was surreal to read about some of the incredible discoveries that came from the Calvert Island field station while reflecting on the memories I was fortunate enough to build there with other researchers:

The sea stars, the mystery, and the answer. Starting in 2013, sea stars from Alaska to Mexico melted away (literally…) in the largest marine wildlife die-off ever recorded. A Hakai-linked team documented the continent-scale collapse of the sunflower sea star (Pycnopodia), tied to a marine heat wave (2019). The loss rippled straight into my world: fewer sea stars and otters meant more urchins, and more urchins meant less kelp. Then, after a decade-long hunt by scientists globally, a 2025 paper in the corpus named the culprit, the bacterium Vibrio pectenicida, finally giving the disease a cause.
A 13,000-year-old morning on Calvert Island. Among the highest-impact papers (FWCI 93) is the discovery of terminal-Pleistocene human footprints pressed into the shoreline of Calvert Island, representing direct evidence for the peopling of the Americas along a coastal route. The field station didn’t only host marine biology; it helped rewrite an understanding of human history.
Kelp forests, globally. The corpus includes the definitive syntheses of global kelp-forest change over the past half-century and the economic value of the world’s kelp forests. This is the big-picture context for exactly the monitoring I was sent to Calvert to build.
The invisible ocean. Tula’s marine-microbial program produced landmark work on microbial and viral “dark matter” and the evolutionary origin of plastids, the hidden machinery of ocean ecosystems.
Filling a blank spot in the global picture. Some of the highest-FWCI works are large climate syntheses (e.g., the Global Carbon Budget series, a multi-decade ocean-CO₂ record) where Hakai is one of many contributors. That’s not a footnote to discount. Coastal British Columbia was historically underrepresented in global ocean and carbon models; Hakai’s sustained observations are the only reason that this region appears in those worldwide analyses at all. Putting a blank spot on the map is its own kind of impact.
Science with, not about, communities. Several top papers are Indigenous-led, with Haida and Haíɫzaqv ethics and oral history guiding ecological restoration, reflecting how the work is done on this coast.

The point: the impact numbers aren’t abstract. They’re footprints, sea stars, and kelp.

How far does the work travel?

This is my favourite analysis, and it’s only possible because OpenAlex maps the full scholarly citation graph. I took every paper that cites the corpus and asked where, and in what fields, those citing papers come from.

The answer: research from 199 countries and all 26 fields of science has cited this coastal work. A program run from two islands on the BC coast is being built upon on every continent.
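Once the citing works have been retrieved, counting their reach reduces to collecting distinct countries and fields. Below is a minimal sketch, assuming each citing work is a dictionary with an `authorships` list carrying `countries` and a `primary_topic` carrying a `field`. That record shape is an assumption modelled loosely on OpenAlex work records, and the sample records are invented for illustration.

```python
def citing_footprint(citing_works):
    """Collect the distinct author countries and fields across citing works."""
    countries, fields = set(), set()
    for work in citing_works:
        for authorship in work.get("authorships", []):
            countries.update(authorship.get("countries", []))
        topic = work.get("primary_topic") or {}
        field = (topic.get("field") or {}).get("display_name")
        if field:
            fields.add(field)
    return countries, fields


# Invented sample records (hypothetical).
sample = [
    {"authorships": [{"countries": ["CA"]}, {"countries": ["NO", "CA"]}],
     "primary_topic": {"field": {"display_name": "Environmental Science"}}},
    {"authorships": [{"countries": ["JP"]}],
     "primary_topic": None},
]
countries, fields = citing_footprint(sample)
print(len(countries), len(fields))
```

Works with no topic assigned still contribute their countries, so the two counts are tallied independently.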

Every shaded country has published research citing this corpus; colour is on a log scale.

Fields that cite the work

Breaking the silos

Tula’s bet wasn’t only on papers; it was on a place where people from different organizations would work side by side. Does the data show it?

I looked at co-authorship among five BC universities (UBC, SFU, UVic, UNBC, and VIU), restricted to Hakai’s coastal focus topics (kelp and coastal plant biology, fisheries, coastal ecosystems, fish ecology, ocean acidification, marine biology, and environmental DNA). Comparing two equal 15-year windows around the field station’s 2009 opening, cross-university co-authorship links in these topics grew from 76 before (1994–2008) to 462 after (2010–2024), roughly a sixfold rise. UBC–SFU joint coastal papers alone went from 27 to 196. A rich body of research shows the benefits of collaboration across institutions, but institutional incentives restrict it. Without organizations like Tula/Hakai creating incentives, such collaborations tend not to happen, despite the increased impact they can produce.

Crucially, that’s not just more papers. Total BC output in these topics grew ~2.7× over the same period, but collaboration links grew ~6× — so collaboration intensity (links per BC coastal paper) more than doubled, from 0.06 to 0.13. The universities didn’t only publish more; they published together more.
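One plausible way to count links and intensity is sketched below. It assumes a link is one pair of the five universities appearing together on one paper, which is an assumption for illustration rather than a confirmed description of the counting rule used here. The paper list is toy data.

```python
from collections import Counter
from itertools import combinations

BC_UNIVERSITIES = {"UBC", "SFU", "UVic", "UNBC", "VIU"}


def collaboration_links(papers):
    """Count co-authorship links per university pair.

    papers is a list of institution-name lists, one per paper.
    Each pair of BC universities on the same paper counts as one link.
    """
    links = Counter()
    for institutions in papers:
        present = sorted(BC_UNIVERSITIES & set(institutions))
        for pair in combinations(present, 2):
            links[pair] += 1
    return links


def collaboration_intensity(papers):
    """Links per paper across the whole set."""
    return sum(collaboration_links(papers).values()) / len(papers)


# Toy data (hypothetical): four papers with their affiliated institutions.
papers = [
    ["UBC", "SFU"],
    ["UBC"],
    ["UBC", "SFU", "UVic"],
    ["VIU", "Some Other University"],
]
links = collaboration_links(papers)
print(links[("SFU", "UBC")], collaboration_intensity(papers))
```

Dividing by the number of papers is what separates growth in collaboration from growth in output, which is the distinction the paragraph above draws.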

Co-authorship ties among the five BC universities in Hakai’s coastal topics (listed beneath the figure). Before the field stations (left) the universities barely co-published in this space; afterward (right) the ties thicken dramatically.

Amplifying other funders’ investments

Research rarely has a single funder. When I grouped the corpus by every funder acknowledged on each paper, I found 537 distinct co-funders beyond Tula/Hakai (excluding universities’ own internal grants):

Top co-funders

Canada’s NSERC appears on 388 of these works; the US National Science Foundation on 127; CIFAR, Fisheries and Oceans Canada, and Mitacs on ~70–80 each; and a long international tail including NOAA, NASA, the UK’s NERC, the Gordon and Betty Moore Foundation, and the Pacific Salmon Foundation. (NASA is there because Hakai’s satellite-based kelp mapping leverages Landsat imagery to map BC’s coastal resources.)
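Counts like these come from tallying, for each co-funder, the number of works that acknowledge it. A minimal sketch follows, assuming each work carries a plain list of funder names under a `funders` key. That key and the sample records are hypothetical, chosen only to show the tally.

```python
from collections import Counter

FOCAL_FUNDERS = {"Tula Foundation", "Hakai Institute"}


def cofunder_counts(works):
    """Count the works on which each non-focal funder is acknowledged."""
    counts = Counter()
    for work in works:
        # A set ensures a funder named twice on one work is counted once.
        counts.update(set(work["funders"]) - FOCAL_FUNDERS)
    return counts


# Hypothetical sample records.
works = [
    {"funders": ["Hakai Institute", "NSERC"]},
    {"funders": ["Tula Foundation", "NSERC", "NSF", "NSERC"]},
    {"funders": ["Hakai Institute"]},
]
counts = cofunder_counts(works)
print(counts.most_common(2))
print(len(counts))  # number of distinct co-funders
```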

These mostly aren’t formal partnerships between Tula/Hakai and, say, NSERC. They’re something quietly valuable: Tula-supported researchers, field stations, and long-term datasets enabling work that other funders also backed so that a single project advances several funders’ missions at once. A trained scientist or a decade-long dataset makes the public dollars flowing through NSERC, NSF, DFO, or NOAA reach further than they could alone. Open funding metadata lets any funder see, and show, that amplifying effect directly.

What bibliometrics can and can’t tell you

It’s tempting to look for a clean before-and-after spike when the field stations opened in 2009 and call it “Tula’s effect.” The analysis won’t support that, and the reasons are instructive about the method itself: BC was already a global leader in marine science (that’s what drew me here originally), and Tula’s impact in this space began earlier than the Hakai Institute, in the 2000s. A place-based funder embedded in an already-strong region simply cannot be cleanly isolated with this approach.

What the data can show robustly is specialization. BC produces about 1.2% of the world’s output in Hakai’s focus topics but only ~0.3% of science overall, meaning BC is 4× more present in exactly the topics Hakai funds than in research at large, and has been since the early-grant era. BC’s annual output in these topics grew from 61 papers in 2000 to ~272 a year recently.

BC is consistently 4–5× over-represented in the topics Hakai funds, relative to its overall share of science.
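The over-representation figure is a ratio of two shares: the region’s share of world output in the focus topics, divided by its share of world output overall. A small sketch of that calculation follows. The counts are scaled to match the shares stated above (1.2% and 0.3%) and are not real paper counts.

```python
def specialization_index(region_topic, world_topic, region_all, world_all):
    """Region's share of a topic divided by its share of all research.

    A value above 1 means the region is over-represented in the topic.
    """
    topic_share = region_topic / world_topic
    overall_share = region_all / world_all
    return topic_share / overall_share


# Illustrative counts per 1,000 world papers: 12 in-topic (1.2%), 3 overall (0.3%).
index = specialization_index(12, 1000, 3, 1000)
print(round(index, 1))
```

Computing the index year by year, rather than once over the full period, is what supports a claim of consistency over time.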

That’s the right altitude for this kind of analysis. Bibliometrics can map a footprint and its concentration with confidence; it cannot, on its own, prove what caused it. Being clear about that line is part of using funder data responsibly.

And these analyses are useful well beyond telling one funder’s story. The same open data lets a funder run a landscape analysis of its own portfolio; it helps funders read the broader landscape to find others with overlapping missions for large-scale collaboration; and it supports gap analyses that reveal promising areas with little funding. Open funder metadata turns work that used to be a bespoke consulting engagement or a subscription to a proprietary database into something any funder can do to help their investments have a larger impact towards their mission.

The part the bibliometrics miss

I want to be clear about the limits of everything above. The true impact of Tula’s funding is so much larger than papers. The field stations supported local and First Nations economies, created jobs on the BC coast, trained leaders in coastal ecology, and underpinned Indigenous-led monitoring and stewardship. Tula also funded important work outside the scope of this analysis (like TulaSalud’s community health program in Guatemala, local nurse training, the Coastal Guardian Watchmen, and more) which were omitted here and whose true impact barely appears in the citation record at all, because that kind of impact mostly doesn’t get published in journals. Bibliometrics can only show a shadow of a funder from a single angle but funders are multi-dimensional and moving the light source even slightly would cast a different shadow. I’m showing you this shadow because it’s what the data in OpenAlex and other bibliometric databases can describe, not because it’s the whole figure — remembering this analogy in bibliometric analyses is critical when using them responsibly.

Careers, including mine

Here’s something no citation count fully captures. The Calvert Island station didn’t only produce papers, it produced people. Over the years it trained and launched a generation of coastal scientists: graduate students, postdocs, dive technicians, data managers, etc., many of whom now lead research and conservation at universities, government agencies, NGOs, and the Institute itself. Properly tracing all of those careers would be a study in its own right. Here, I’ll just say the pattern is unmistakable to anyone who has met a former Hakai trainee. [Maybe next year, I’ll try to get a list of all the trainees and track their careers as another case study.]

The sea-otter-and-kelp question that brought me to Calvert is one small example of the through-line: it seeded a whole research lineage with 86 kelp papers (2,500+ citations) and 24 sea-otter papers in this corpus alone.

The sea-otter and kelp research lineage

And me? The crash course Eric and Christina gave me in how research actually gets supported, and how collaborations are built around missions, is the reason I’m writing this from OpenAlex today instead of a faculty office. A funder’s impact includes the careers it shapes, including my own.

Why I could do this now — and an invitation

A few years ago, this study would have been a months-long manual slog of reading acknowledgements sections and asking faculty to take on more administrative work, taking precious time away from their research and training activities. Today, open funding data in OpenAlex (funder IDs, linked affiliations, full-text search, the complete citation graph, datasets, FWCI and percentiles, etc.) let me assemble, verify, and analyze a funder’s entire research footprint in a single weekend, with every step reproducible without paywalls. The scripts and data behind every number are open here.

It also showed me where the work still is: improving entity resolution strategies that conflate a foundation with a gene, institutions that over-match a common place name, conventions that vary across decades. That’s exactly the kind of work this Wellcome-funded project is built to support and where funders themselves can help, by sharing their data openly and helping us establish trusted connections to the work they’ve funded.

If you’re a research funder (agency or foundation, large or small) and you’d like to understand your research footprint with open, transparent metadata, I’d love to talk (e-mail me at kyle@openalex.org). This is what open funder data makes possible, and we’re just getting started!

Q2 2026 Town Hall: What We Shipped and What’s Next

Posted on April 25, 2026 by Kyle Demes

Last week, we held our quarterly community town hall. Jason Priem (founder & CEO) walked through everything OpenAlex shipped in Q1 2026 and laid out the roadmap for Q2. If you’d like to watch the full recording, it’s on YouTube.

This post recaps the highlights for anyone who couldn’t make it.

A new kind of transparency

One thing we tried for the first time this quarter: instead of slides, the entire town hall was a walk-through of Markdown files in a public GitHub repo. The retrospective itself was generated by Claude Code from a single prompt pointed at our open-source repositories, our blog, and our internal job tracker (oxjobs). The prompt is right there at the top of the document, so anyone can re-run it.

That seems like a small thing, but it represents something we’re really excited about. For two decades, the open community has been writing a check that says “if we keep building in the open, eventually the machines will get here and this will pay off.” That check is finally cashing. Because OpenAlex is built in the open, you don’t have to wait for the next town hall to find out what we’ve been doing — you can run the same prompt next week, next month, and see for yourself. That kind of legibility just isn’t available from closed databases.

Okay, on to the actual work.

Q1 in review

Alice — our biggest release since Walden

In February we shipped Alice, a sweeping update that touched search, content delivery, pricing, and docs:

Semantic search (beta). Search by meaning, not just keywords — query neoplasm and get articles about cancer. Behind the scenes that’s a custom Elasticsearch vector index with 413M embeddings (including 197M title-only embeddings so even works without abstracts are searchable). Queries return in under 250ms.
Advanced search. Proximity operators, exact matching, wildcards, and queries up to 8KB. Most of the syntax you’d use in legacy databases for a systematic review now works in OpenAlex.
Content API. Direct access to 60M+ open-access PDFs and parsed GROBID TEI XML at predictable URLs like content.openalex.org/works/{id}.pdf. There’s a 62M-row Parquet manifest for bulk sync.
Usage-based pricing. This one matters most for our long-term sustainability. We replaced blunt rate limits with a transparent credit system priced in dollars: $1/day free for individuals, plus Pro, Max, Member, and Partner tiers, and one-time Stripe top-ups for burst usage. Even in these early days, we’re tracking toward $100K+/year in metered revenue — and the vast majority of those charges are under $10. People who want to run a million searches can pay a few bucks for it; users exploring the database or with lower volume data needs still get a free daily allowance.
Completely rebuilt documentation at developers.openalex.org, built on Mintlify. It’s optimized for agent legibility — humans aren’t really going to read documentation in 2026, but agents will, and we want our docs to be the ones they pull into context.
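As an illustration of the predictable Content API URLs mentioned above, here is a tiny helper that builds a PDF URL from a work ID. It assumes the `https` scheme and that `{id}` is the short OpenAlex work ID (the part after the last slash). Both are assumptions, and the ID used is a placeholder.

```python
def content_pdf_url(work_id: str) -> str:
    """Build a PDF URL following the content.openalex.org/works/{id}.pdf pattern.

    Accepts either a short ID or a full OpenAlex URL and keeps the last segment.
    """
    short_id = work_id.rstrip("/").rsplit("/", 1)[-1]
    return f"https://content.openalex.org/works/{short_id}.pdf"


# Placeholder ID (hypothetical), given in two equivalent forms.
print(content_pdf_url("W123"))
print(content_pdf_url("https://openalex.org/W123"))
```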

Data quality: tens of millions of records improved

Less glamorous, just as important. Highlights:

Repository attribution: corrected 44M works misattributed to the wrong repository source. Created 797 new source records, fixed 1,035 existing ones. any_repository_has_fulltext had never actually been implemented — it now correctly flags 163M works.
Affiliations: backfilled 23.9M affiliation strings missing from our lookup table, restoring institution IDs for ~20M works. Applied ~165K corrections from the French Ministry’s Works-Magnet tool.
Abstracts: fixed a pipeline gap where ~3M landing-page abstracts (2024–2026) were extracted but never reached the API. Re-ran topic and SDG models on 1.27M works that gained abstracts.
FWCI: replaced a stale external lookup with an inline calculation that guarantees average FWCI = 1.0 within any cohort. Fixed is_in_top_1_percent to be derived directly from citation percentiles.
Topics: unified the topics pipeline and cleared a 17.9M-work backlog.
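To see why an inline calculation can guarantee an average FWCI of 1.0 within a cohort, consider dividing each work’s citations by the mean citations of its own cohort. The sketch below illustrates that idea on toy data. It is not the actual OpenAlex pipeline, and the cohort labels and citation counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fwci_by_cohort(works):
    """Divide each work's citations by its cohort's mean citations.

    works is a list of dicts with 'id', 'cohort', and 'citations' keys.
    A cohort stands for works of the same year, field, and type.
    """
    cohort_citations = defaultdict(list)
    for work in works:
        cohort_citations[work["cohort"]].append(work["citations"])
    expected = {cohort: mean(values) for cohort, values in cohort_citations.items()}
    return {
        work["id"]: work["citations"] / expected[work["cohort"]]
        if expected[work["cohort"]] else 0.0
        for work in works
    }


# Toy data (hypothetical cohorts and citation counts).
works = [
    {"id": "W1", "cohort": "2020|ecology|article", "citations": 0},
    {"id": "W2", "cohort": "2020|ecology|article", "citations": 2},
    {"id": "W3", "cohort": "2020|ecology|article", "citations": 10},
    {"id": "W4", "cohort": "2021|physics|article", "citations": 3},
    {"id": "W5", "cohort": "2021|physics|article", "citations": 3},
]
fwci = fwci_by_cohort(works)
print(mean([fwci["W1"], fwci["W2"], fwci["W3"]]))
```

Because the denominator is the cohort’s own mean, the ratios average to 1.0 by construction whenever the cohort has any citations at all.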

A lot of this was only practical because of our move to Walden, our new Databricks-based codebase. Fixes that used to take two weeks now take a day.

Funder & awards expansion

Backed by a $3.6M Wellcome grant, awards are now first-class objects in OpenAlex. We’ve ingested grants from 30+ funders worldwide — NIH, NSF, DOE, UKRI, Wellcome, ERC, DFG, ANR, SNSF, NSERC, CIHR, KAKEN/JSPS (873K grants), ARC, FAPESP, ANID, the Swedish Research Council, Gates, NWO, and more. A recent analysis compared our early progress with legacy databases — we’re closing the gap quickly and by end of quarter we expect to be the most comprehensive funder/awards database available.

Japanese repositories (IRDB)

We completed a contract deliverable to ingest ~4.6M records from IRDB via JPCOAR 2.0 / OAI-PMH, replacing 74 failing individual NII endpoints. Plus ~700 new Japanese repository source records and ~500 new OAI-PMH endpoints from OpenAIRE. Big step forward for our coverage of non-English scholarship.

Author disambiguation foundations (AER v4)

We didn’t fully finish what we set out to do here — author disambiguation is hard — but we laid serious groundwork:

A new deterministic Python name parser (89–93% accuracy on a 15K gold standard) — interesting story here: we used AI to build a large gold standard, then iteratively prompted Claude to write a Python parser until it scored well against it. End result is fully deterministic Python that would have taken months to write by hand.
106.7M author-level embeddings and 718M per-authorship similarity scores.
~3.2M overmerged author profiles split using raw ORCID conflicts.
Fixed an author-sequence bug that was corrupting author ordering.
Tightened the firewalls between ORCID data and other authorship signals — ORCID metadata is right ~95% of the time, but when older versions of our pipeline trusted it as a gold standard, the 5% of errors metastasized.

Affiliation curation

A complete community-driven curation system, initially for Member institutions. Match/unmatch UI, role-based access, status tracking, and corrections that propagate to the API the next night. Curations that used to take months now land in ~24 hours, and we’ve already received one hundred thousand of them.

GUI modernization

Novice / Expert modes with a one-click toggle — once you switch to Expert, you stay there.
All 21 entity types now have first-class search, browse, zoom drawers, and CSV export (was 6).
Sidebar nav, repository operator dashboard, accessibility audit (Vuetify upgrade, WCAG 2.1).
New landing page positioning OpenAlex as “the universal research database — built for agents, scripts, and spreadsheets.”

A frank note on the GUI from Jason: we’re going to keep it healthy, but our long-term focus is the API and the data. Increasingly, the right interface for OpenAlex is the one your agent builds for you on demand.

Daily snapshots, OECD/FORD mapping, and infra

Daily incremental snapshots for all 21 entity types (JSONL + Parquet, hash-based change detection).
Mapped OpenAlex subfields to OECD FORD in collaboration with NORA — covers 38 of 42 two-digit FORD fields and feeds the upcoming Danish Research Portal.
$135K/year in infrastructure savings by shutting down legacy Heroku apps, migrating Unpaywall to RDS, and consolidating services. Honestly, a lot of this was AI-assisted: “find me where we’re wasting money” turns out to be a great prompt when your codebase is open.
A public status page at status.openalex.org, GitHub Actions CI/CD for Databricks, and a fresh POSI v2 self-assessment.
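Hash-based change detection for incremental snapshots can be sketched as follows: serialize each record in a canonical form, hash it, and compare against the hash stored from the previous run. This is a generic illustration of the technique, not the actual snapshot code, and the records are invented.

```python
import hashlib
import json


def record_hash(record):
    """Hash a record's canonical JSON form so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_ids(previous_hashes, current_records):
    """Return IDs of records that are new or whose content hash changed."""
    changed = []
    for record in current_records:
        if previous_hashes.get(record["id"]) != record_hash(record):
            changed.append(record["id"])
    return changed


# Invented records (hypothetical).
yesterday = [{"id": "W1", "title": "Kelp"}, {"id": "W2", "title": "Otters"}]
previous_hashes = {r["id"]: record_hash(r) for r in yesterday}
today = [
    {"title": "Kelp", "id": "W1"},         # same content, different key order
    {"id": "W2", "title": "Sea otters"},   # edited
    {"id": "W3", "title": "Urchins"},      # new
]
print(changed_ids(previous_hashes, today))
```

Sorting keys before hashing matters here, since otherwise a harmless reordering of fields would be reported as a change.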

Pricing & community

We collapsed our pricing to three clear tiers: Member, Member+, and Partner — with a PDF Sync add-on. The new $5K/year Member tier (admin dashboard, affiliation editor, Unsub access, CAB nomination rights) launched with the University of Victoria, joined by Université de Montréal, KISTI, University of Queensland, and Statistics Denmark.

Q2 roadmap: fewer things, deeper focus

This quarter we’re deliberately doing fewer big things. Two real focuses:

1. Author accuracy — finishing what we started

Roughly a six-week push. The work packages:

ORCID-driven merges of incorrectly split author profiles (splits are largely done; merges are next).
Exposing raw_orcid in the works API — if this has been bothering you, you’ll be happy.
Name-based splits and joins, building on the new parser. This is where we expect to clear out the most egregious errors — the cases where two people with totally different names got clustered together because the algorithm weighted other features over the name itself. Those situations are going to go away.
Curation UI for authors and institutions to fix their own profiles. We’ve actually built this three times and walked away from it — it touches everything in the system and has to propagate fast. We’re going to land it this quarter. Long-term this should also be drivable from agents (“hey, look me up in OpenAlex and fix the things that aren’t me”), but we need the UI first.
A “fancy” ML-based split algorithm is out of scope — there’s enough low-hanging fruit that combining the basic improvements with self-curation will get us most of the way there.

2. Data quality — raking the lawn

Driven by the thousands of Zendesk tickets the community has filed. No single one of these is huge, but collectively they’re who we are. Examples we’re working through:

arXiv bugs (language detection, OAI-PMH locations, missing PDFs)
Abstract coverage gaps
Missing institution.country_code values
publication_year > 2026 (should be null)
Crosswalks for the many idiosyncratic repository work.type taxonomies

We’re hoping to report dozens of these fixed by next town hall.
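
To make one of these concrete, the publication_year rule can be sketched in a few lines. This is only an illustration of the rule as stated above, not our actual fix; the `clean_publication_year` helper and the sample works are hypothetical.

```python
MAX_PLAUSIBLE_YEAR = 2026  # taken from the rule above; an assumption for this sketch


def clean_publication_year(work: dict, max_year: int = MAX_PLAUSIBLE_YEAR) -> dict:
    """Return a copy of the work with an implausible future year replaced by None."""
    year = work.get("publication_year")
    if isinstance(year, int) and year > max_year:
        return {**work, "publication_year": None}
    return work


# Hypothetical records, for illustration only.
works = [
    {"id": "W1", "publication_year": 2024},
    {"id": "W2", "publication_year": 2091},
]
cleaned = [clean_publication_year(w) for w in works]
print(cleaned)
```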

Process & community

Semi-automated ticket solving with AI in the loop — the guardrails matter, but the productivity wins are real.
More work on oxjobs, our homegrown issue tracker, plus internal QA dashboards.
London funder workshop next week, hosted at Wellcome and bringing funders from around the world together. Tied to the Wellcome grant.

A few themes worth calling out

AI is multiplying our throughput. This was the busiest quarter we’ve had, by a lot, and we’re a small team. Walden + AI tooling is the reason. The quarterly retro itself was AI-generated from our open repos. We don’t think AI replaces the work — but it does mean a small open team can punch well above its weight.

Open data is the foundation, not the application. What’s exciting right now isn’t that AI exists — it’s that AI plus open data finally decouples intelligence from data. You can swap in whichever model you want over OpenAlex. We’ve spent twenty years building toward this moment.

We listen to what the community pays for. Talk is cheap. When someone signs up at the Member tier, that’s a real signal about what’s working. Conversely, semantic search has gotten less excitement than we’d hoped given its cost (~$10K/month to serve) — so if you’re using it and want it to do more (custom vectors, higher result limits), please tell us. Every request is a vote.

Get involved

Browse the Q2 retro and Q2 roadmap directly.
Watch the town hall recording.
Have a feature request or bug? File a ticket — support@openalex.org. Even if we can’t get to it this quarter, every report is a vote that helps prioritize the next one.
Member institutions: try the new affiliation curator.
Everyone: try developers.openalex.org with your favorite agent. Let it write your queries.

Opening your research funding data: a practical guide for funders

Posted on April 13, 2026 by Kyle Demes

Open research infrastructure works best when the underlying metadata is open too. Research funding is one important area where that openness still lags. Over the past year, we’ve been working directly with funders around the world to help make funding information a first-class part of the open scholarly graph (see more on that here: https://blog.openalex.org/funding-metadata-in-openalex). Along the way, we have been hearing from funders of all shapes and sizes that they want practical advice: for those that are ready to make their metadata more open, what exactly should they share, and where should they start? This post is our attempt to answer those questions in a concrete and incremental way.

Why this matters

When funding data is openly available, it can be linked to the rest of the scholarly ecosystem: research outputs (papers, datasets, software), institutions, researchers, topics, and citations. This makes it much easier to understand what research was supported, by whom, and with what impact, as well as where funding is flowing (and not flowing).

OpenAlex represents connections between funding and research in one of two ways: as funders → grants → outputs, or directly as funders → outputs when information on specific grants is not available or linkable.

Two ways to connect funding to research

1) Funder identity + acknowledgement matching (no grant database required)

Even if you don’t share grant-by-grant information, funded research can often be identified by scanning the acknowledgements sections of research outputs for funder mentions and linking those mentions to a known funder.

What this enables

Tracking of research outputs and their impacts by funder
Topic and institutional views based on acknowledged support
Better tracking despite acronyms, translations, and name changes

What we need for this to be effective

A public record of the funder’s operating names (e.g., meaningful name variants and historical names). ROR.org already has most funders; adding and curating records is free and easy, and it makes it easier to trace the different ways researchers might refer to your funding.
Guidance for researchers on how to represent your organization in publications when they declare funding

If you share nothing else, curating your ROR record and name variants is still high value.
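
To show why name variants matter, here is a deliberately simple sketch of acknowledgement matching. It is not how OpenAlex does it in production. The variant table, the `normalize` and `find_funders` helpers, and the sample sentences are illustrative assumptions; a real variant list would come from a curated ROR record.

```python
import re

# Hypothetical variant table for illustration; real variants would come from ROR.
FUNDER_VARIANTS = {
    "Tula Foundation": [
        "Tula Foundation",
        "Hakai Institute",
        "Hakai Beach Institute",
        "Hakai Network",
    ],
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_funders(acknowledgement: str) -> list:
    """Return funders with at least one name variant present as whole words."""
    haystack = f" {normalize(acknowledgement)} "
    found = []
    for funder, variants in FUNDER_VARIANTS.items():
        if any(f" {normalize(variant)} " in haystack for variant in variants):
            found.append(funder)
    return found


print(find_funders("We thank the Hakai Beach Institute for field support."))
print(find_funders("This work received no external funding."))
```

Without the historical name in the table, the first acknowledgement would go unmatched, which is the practical case for keeping variants current.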

2) Grant/Award records (richer linkages + funding-flow intelligence)

If you also share information about your grants/awards, you can make connections to specific grants (not just the funder name) and unlock deeper intelligence about research funding flows.

What this enables

Which funders fund topic X (and how that changes over time)
Which institutions receive funding from funder Y
Where there may be “underfunded” topics
Linking specific grants/programs → outputs → citations/impact
Funding-flow analytics (when amounts/currencies/dates are shared)

“Share what you can” tiers (minimum → recommended → optional)

Funders vary widely (public accountability vs. privacy, administrative constraints, donor preferences). It’s okay to publish only a subset of fields. You can start small and expand over time.

Tier 0: No grant records, but you want to track funded research

Do this

Ensure you have a ROR record and keep it updated (especially name variants).

Tier 1: Minimum viable grant data (high value, low sensitivity)

Share these fields for each grant/award (even without funding amounts):

Grant ID (your internal identifier)
Grant title
Short description/abstract
Start year/date (and end year/date if possible)

Enables: topic discovery, portfolio timelines, and better matching to outputs—without publishing sensitive financial info.

Tier 2: Strong linkage + attribution (recommended for most funders)

Add:

Awarded institutions
Investigators (PI or individual awardee; co-investigators optional)
Program/scheme/call name
Funding type (grant/fellowship/infrastructure/etc.)
Reported outputs linked to grants (optional): if you collect publication/output lists from grantee reporting (papers, datasets, software, preprints, etc.), consider sharing those as explicit grant → output links. This captures links that may not appear in acknowledgement sections and reuses effort researchers already invested in reporting.

Enables: “who/where did we fund,” collaboration networks, and more reliable grant-to-output linking.

Tier 3: Funding-flow intelligence (optional; may be sensitive)

Add:

Amount + currency

Enables: investment totals by topic, cross-funder comparisons, and richer budget analytics.
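
One way to picture the tiers is as a single record where only the Tier 1 fields are required. The sketch below is an assumption for illustration, not an OpenAlex schema: the `GrantRecord` class, its field names, and the `tier` heuristic are all hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class GrantRecord:
    # Tier 1: minimum viable grant data
    grant_id: str
    title: str
    description: str
    start_year: int
    end_year: Optional[int] = None
    # Tier 2: linkage and attribution
    institutions: list = field(default_factory=list)
    investigators: list = field(default_factory=list)
    program: Optional[str] = None
    funding_type: Optional[str] = None
    output_ids: list = field(default_factory=list)
    # Tier 3: funding-flow intelligence
    amount: Optional[float] = None
    currency: Optional[str] = None

    def tier(self) -> int:
        """Highest tier for which this record carries any data (a rough heuristic)."""
        if self.amount is not None and self.currency:
            return 3
        if self.institutions or self.investigators or self.program or self.funding_type:
            return 2
        return 1

    def public_dict(self) -> dict:
        """Drop empty fields so a published record contains only what was shared."""
        return {k: v for k, v in asdict(self).items() if v not in (None, [], "")}


# Hypothetical records, for illustration only.
minimal = GrantRecord("G-001", "Kelp forest monitoring", "Long-term subtidal surveys.", 2020)
richer = GrantRecord(
    "G-002", "Sea star wasting", "Pathogen identification.", 2021,
    institutions=["Example University"], funding_type="grant",
)
print(minimal.tier(), sorted(minimal.public_dict()))
print(richer.tier())
```

The optional fields default to empty, so a funder can start at Tier 1 and fill in more later without changing the shape of the record.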

What if you don’t want to share certain fields?

That’s okay! Sharing some information is better than sharing none just because a few fields are off-limits. Here’s what you gain/lose:

No amounts/currency: you lose “how much was invested” analytics, but still get strong portfolio discovery and topic/institution linkages via titles/descriptions/dates/institutions.
No investigator identities: you lose person-level linking, but institutional and topical linking can remain strong.

Special case: Donor Advised Funds (DAFs) and fund administrators

Sometimes a fund administrator or DAF sponsor has stricter disclosure rules than the underlying funder’s preferences. In these cases, funders and administrators can collaborate on a field-sharing policy that protects sensitive info while still enabling research tracking.

Practical approach

Separate donor identity constraints from research grant data
Share Tier 1–2 fields first (title, description, years, institutions)
If amounts are sensitive: omit them
Use a simple workflow: funder approves which fields are public; administrator publishes and maintains the feed

A simple “how to start” plan

Confirm identity: ensure your funder organization has a curated ROR record (names/aliases).
Pick a tier: Tier 0 if you can’t publish grants; Tier 1 if you can publish minimal records.
Publish in a sustainable format: website, database, bulk download, or API—getting data out there is more important than the specific format
Iterate: add fields over time as policy and capacity allow.

Thank you to the many funders who have already worked with us to help make research funding data more open and more useful. Building a comprehensive, connected, and open view of research funding will take collaboration from funders of all kinds, and we’re excited to keep learning alongside the community. If your organization is thinking about sharing its metadata, we’d be glad to talk—please reach out to kyle@openalex.org

Recommitting to the Principles of Open Scholarly Infrastructure (POSI)

Posted on March 30, 2026 by Kyle Demes

We’re excited to reconfirm our commitment to the Principles of Open Scholarly Infrastructure (POSI).

Back in 2021, when we were still called OurResearch, we published a blog post assessing our fit to POSI and making a public commitment to these principles. That commitment still stands. But both we and POSI have evolved since then, and with the recent release of POSI v2, it felt like a good time to revisit that post.

Since our beginnings in an all-night hackathon more than a decade ago, we’ve tried to build a sustainable, mission-driven, genuinely open piece of scholarly infrastructure. So while we didn’t write POSI, we’ve long felt that it describes the kind of organization we want to be: one that belongs to the scholarly community, one that takes sustainability seriously, and one that is built to endure—or to wind down responsibly, if that ever becomes the right thing to do.

The POSI community has now released a revised version of the principles. The updated version keeps the same overall spirit, but clarifies and strengthens a few important areas. In particular, it now separates transparent governance from transparent operations, updates the principle on lobbying to allow advocacy in support of the community, strengthens expectations around living wills and transitions, and adds more explicit attention to financial reserves, volunteer labour, preservation, and interoperability.

As before, this post is our public self-assessment. We are not claiming perfection, but we are claiming commitment.

Summary

Governance

💚 Coverage across the scholarly enterprise
💚 Stakeholder governed
💚 Non-discriminatory participation or membership
💚 Transparent governance
💚 Cannot lobby
💛 Living will
💚 Regular review of community support and need

Sustainability

💚 Transparent operations
💚 Time-limited funds are used only for time-limited activities
💚 Goal to generate surplus
💛 Goal to create financial reserves
💚 Mission-consistent revenue generation
💚 Revenue based on services, not data
💚 Volunteer labour
💚 Transition planning

Insurance

💚 Open source
💚 Open data (within constraints of privacy laws)
💚 Available data (within constraints of privacy laws)
💚 Patent non-assertion
💚 Preservation
💚 Interoperability and open standards

(💚 = good, 💛 = less good)

Governance

💚 Coverage across the scholarly enterprise

What it means: Research transcends disciplines, geographies, institutions, and stakeholder groups. The infrastructure that supports it should too.

OpenAlex: This remains central to our mission. OpenAlex aims to index and support the entire research ecosystem: across disciplines, geographies, languages, output types, and stakeholder groups. Since our 2021 assessment, we’ve also broadened the scope of what OpenAlex covers, including major work to expand beyond the traditional DOI-centered literature. We still have more work to do here, but are strongly aligned with the principle.

💚 Stakeholder governed

What it means: A board-governed organization drawn from the stakeholder community builds confidence that decisions will reflect community interests.

OpenAlex: OpenAlex is a nonprofit with a public-interest mission and a governing board. Since our 2021 post, we have also added a Community Advisory Board, which brings broader stakeholder perspectives into our work. Information about our team and governance is public, the CAB terms are public, and the CAB itself was selected through a community vote. We think we are in a stronger position here than we were a few years ago.

💚 Non-discriminatory participation or membership

What it means: Any stakeholder group should be able to participate, and participation should be inclusive.

OpenAlex: Anyone can use OpenAlex. Anyone can access the data. Anyone can use the API or snapshots. Anyone can report bugs, suggest features, or engage with us publicly. Since 2021, we’ve also added more structured ways for the community to participate, including the CAB, CAB working groups, community curation pathways, community Google Groups, and the Member program (which is intended to support sustainability and deepen engagement, not to gate access to the infrastructure itself).

💚 Transparent governance

What it means: To achieve trust, the processes and policies for governance should be transparent.

OpenAlex: This is one place where POSI v2 improves on the earlier version. Governance transparency and operational transparency are now treated separately, which makes sense to us. OpenAlex now has both a governing board and a public Community Advisory Board. CAB terms are public, and the advisory board was selected through a community vote. We think that puts us in a better place than we were in 2021, when some of these structures were more aspirational than real.

💚 Cannot lobby

What it means: Infrastructure organizations should not lobby for narrow self-interest, though they may advocate for policy changes in support of their communities.

OpenAlex: We like the revised wording of this principle. OpenAlex is a mission-driven nonprofit. We do not lobby for narrow organizational advantage, and as a 501(c)(3) we also operate within legal constraints in this area. At the same time, we do advocate publicly for the adoption of open science, open infrastructure, open metadata, and transparent research intelligence. We see that as support for the community, not as self-serving lobbying.

💛 Living will

What it means: A trustworthy organization should describe the conditions under which it or its services would be wound down, and how assets would be preserved or passed to a successor that also honors POSI.

OpenAlex: We continue to support this principle strongly. The core assets of OpenAlex remain our source code and datasets, and those are completely open under CC0 and MIT licenses. Our code is openly developed and archived. Our snapshots are publicly available, including through some third-party sources with a backup archived on Zenodo, and more academic groups around the world are hosting local copies of OpenAlex snapshots. That said, our position here is broadly the same as it was in 2021: we have many of the practical ingredients of a living will, but we do not yet have a more formal public statement of wind-down and successor conditions than we did then.

💚 Regular review of community support and need

What it means: Organizations should regularly review whether their activities are still needed and still supported by the community.

OpenAlex: A lot of what we build is, in one way or another, a response to failures or gaps in the current scholarly communication system. If those failures disappeared, some of our work should disappear too. OpenAlex is not trying to exist forever for its own sake. We want to build infrastructure that is useful, community-aligned, and durable for as long as it is needed.

Sustainability

💚 Transparent operations

What it means: Community trust requires transparency not just in governance, but in the practical realities of how the organization works.

OpenAlex: OpenAlex publishes pricing information, member information, nonprofit transparency materials, and public information about our grant funding (including openly depositing our grant proposals on Open Grants). We want the community to be able to see not just what we say we value, but how we are actually trying to sustain the work.

💚 Time-limited funds are used only for time-limited activities

What it means: Operations should be supported by sustainable revenue sources. Time-limited funds should be used for time-limited work.

OpenAlex: This remains central to how we think about sustainability. We continue to use grants to support bounded development work, major new initiatives, and strategic expansion. We continue to believe that day-to-day operations should be supported by revenue from sustainable services. Since the initial assessment, our operational revenue has grown significantly, letting us scale new services alongside the growing operational needs of OpenAlex.

💚 Goal to generate surplus

What it means: Merely breaking even is not enough. Infrastructure organizations need enough flexibility to adapt and survive shocks.

OpenAlex: We agree with this more strongly now than we did in 2021. Open infrastructure at global scale needs slack. It needs room to invest, recover from surprises, and support transitions. A model that aims only at exact cost recovery is too brittle. We’re in a massive build phase at the moment, investing significant resources to build OpenAlex, but aim to generate surplus in future operational phases.

💛 Goal to create financial reserves

What it means: Organizations should maintain reserves that can support orderly wind-down, transition, or response to major unexpected events.

OpenAlex: We continue to believe in this principle, and we continue to work toward it. We currently have funds available to support our operations for the next year but have not set aside a formal contingency fund, so this remains an area where the work is ongoing.

💚 Mission-consistent revenue generation

What it means: Revenue sources should support the mission, not undermine it.

OpenAlex: This remains one of our strongest convictions. OpenAlex needs revenue to survive, but not all revenue is equally compatible with our mission. We believe a robust sustainability model is built on revenue from services, support, memberships, and mission-aligned partnerships and not on revenue that would require restricting the core infrastructure or distorting our priorities.

💚 Revenue based on services, not data

What it means: Data related to the scholarly infrastructure should be community property. Revenue should come from services, not from locking up the data itself.

OpenAlex: This remains a core principle for us. OpenAlex data remains open and will always be open. What we charge for are services around that infrastructure: support, enhanced access, memberships, and other value-added offerings.

💚 Volunteer labour

What it means: Organizations should be honest about the extent to which they rely on volunteer labour, and thoughtful about the risks and responsibilities that come with it.

OpenAlex: OpenAlex benefits from a great deal of community contribution. CAB members volunteer their time and expertise. CAB working groups help us think through specific topics. Community members submit metadata corrections, bug reports, and feature requests. We are grateful for that work and rely on it for ensuring that our development meets community needs.

💚 Transition planning

What it means: Organizations should reduce dependence on a small number of people and make transitions survivable.

OpenAlex: Since 2021, one of our co-founders left the organization (and field). It was a difficult transition period that reinforced for us the importance of transition planning, but it also demonstrated that we can handle major transitions. Since then, we have brought on new team members with clearer roles, better documentation, and more mature systems. The launch of Walden, with its easier-to-operate architecture that LLMs can grok and develop, is also an important part of this story. All of this reduces key-person risk and makes the organization more durable.

Insurance

💚 Open source

What it means: All software and assets required to run the infrastructure should be available under an open-source license.

OpenAlex: As in 2021, our code is openly available and openly developed. We continue to see “born open” development as the right default. Our code is also archived through Software Heritage.

💚 Open data (within constraints of privacy laws)

What it means: For an infrastructure to be reproducible, the relevant data must be openly and legally available where possible.

OpenAlex: The core data behind OpenAlex is open and intended to remain open. At the same time, some of our products and services involve private or user-provided data. As we wrote in 2021, when users share private data with us in order to receive a service, we do not share that private data or the data derived from it.

💚 Available data (within constraints of privacy laws)

What it means: It is not enough for data to be open in principle; there must be a practical way to obtain it.

OpenAlex: OpenAlex data is not just nominally open; it is completely open through full snapshots that are free to download as well as open UI and API services with generous daily free limits. Snapshots are also preserved and redistributed in ways that reduce dependence on us as the sole host, and the data is always available under a CC0 license.

💚 Patent non-assertion

What it means: Organizations should not use patents to prevent the community from replicating the infrastructure.

OpenAlex: Our position here is unchanged in substance from 2021. We do not believe patents belong at the core of scholarly infrastructure. In our earlier post, we said we would not pursue or assert patents and would look into formalizing that commitment. That still reflects our position.

💚 Preservation

What it means: Open infrastructure should be preserved in ways that make rescue and continuity possible.

OpenAlex: OpenAlex snapshots are stored on Zenodo, our code is archived via Software Heritage, and copies of the data are increasingly being hosted by academic groups around the world. That kind of distributed preservation is exactly what open infrastructure should enable.

💚 Interoperability and open standards

What it means: Infrastructure should use open standards and fit into the larger ecosystem in ways that make continuity and reuse easier.

OpenAlex: Interoperability has always been central to what OpenAlex is trying to do. We rely heavily on shared identifiers, open metadata flows, public snapshots, and open APIs. We want others to be able to build with and around OpenAlex without asking permission.

Closing

So: OpenAlex remains committed to POSI.

The revised principles are better in a few important ways. They push organizations like ours to be clearer about governance, more transparent about operations, more deliberate about reserves and transitions, more honest about volunteer labour, and more serious about preservation and interoperability.

As in 2021, we are publishing this not because we think we have everything figured out, but because we think these commitments are worth making in public. They give our community something concrete to expect from us, and something concrete to hold us accountable to.

And if the long arc of scholarly infrastructure bends toward a world where less of our current work is necessary because the ecosystem has become more open, more interoperable, and less broken, we will count that as success too.

That would be a nice problem to have.

New Features and Usage-Based Pricing

Posted on February 24, 2026 by Jason

Today we’re adding some features to the OpenAlex API: better search, content download, and new docs. Most importantly, we’re also introducing usage-based pricing.

New features

Advanced search at last

We’ve had lots of requests for advanced search features to support systematic reviews. Good news: they’re here!

Proximity search: find terms near each other
Exact matching: skip stemming when you need precision
Wildcards: for when you’re not sure of the exact form
Lonnnnng queries: searches can be up to several pages in length (8 KB)

Find details and examples of advanced search in the new developer docs here.

Note for developers: the old filter syntax for search is now deprecated; the ?search= parameter approach remains. It’ll be the One Way To Do It moving forward. Filter searches will redirect to the ?search param.

Semantic search

We’re also launching semantic search. Instead of just matching keywords, it uses embeddings to match the meaning of your search–so a search for “kelp biomechanics” also finds articles about algae and wave mechanics. But you don’t have to stop there: you can even paste a whole abstract into the search bar to find related papers!

Semantic search is in beta; we don’t recommend using it for sensitive production workflows yet. But we would love to hear your feedback! If it’s well-used we’ll continue to invest more resources into it.

Full-text downloads

We’re hosting PDFs and TEI XML for our 60M open-access works. You can search and filter for works of interest, filter to get just ones with PDFs, and then download the PDFs in bulk—all with the API. Or you can use our new OpenAlex CLI to do it from the command line, massively parallelized, in a single command. Or your agent can—they love CLIs.

openalex download \
  --api-key YOUR_KEY \
  --output ./climate-pdfs \
  --filter "topics.id:T10325,has_content.pdf:true" \
  --content pdf

See the full-text documentation for details.

New docs

We’ve completely rebuilt our documentation. The old docs are deprecated and will redirect soon. The new docs are clearer, cleaner, up to date, and AI-optimized. We want to make OpenAlex as easy as possible to use for everyone, whether they’re an expert or a novice vibe-coding their first app.

API keys are now required.

As we announced in January, you’ll need an API key for all requests. Getting one is free and takes about 30 seconds: create an account at openalex.org, then grab your key at openalex.org/settings/api. You can still make a few calls without an API key for demo purposes, but it’s not suitable for any kind of production use. The API keys are essential for our new usage-based pricing model. What’s a usage-based pricing model? Gentle Reader, a mere centimeter now separates you from the answer. 👇

Usage-based pricing

Different API operations cost us different amounts to run. Doing stuff with PDFs is expensive, but looking up a single work by ID is nearly free. We think it’s essential that our pricing reflects these actual costs. Usage-based pricing is a natural fit for this: it’s transparent, sustainable, and fair.

Here’s what things cost. See the developer docs for more details.

endpoint	cost per call	cost per 1,000 calls
single work lookup by DOI or ID	0	0
list and filter	$0.0001	$0.10
search	$0.001	$1.00
PDF/XML download	$0.01	$10.00

Free usage

Every API key gets $1 of free usage per day. We’ve always subsidized free users using revenue from paying ones–this makes the exact extent of that subsidy clear, transparent, and unambiguous.

What does that daily dollar get you? Assuming you return 100 works per request:

endpoint	daily free calls	daily free results
single work lookup by DOI or ID	unlimited	unlimited
list and filter	10,000	1,000,000
search	1,000	100,000
PDF/XML download	100	100

To use a real-world example: grabbing all 694k works by Finnish authors takes about 7k paginated requests at $0.10 per thousand or $0.70. That’s covered by your free daily allowance. But if you want all 9 million works from Japan, that will cost about $9. (You could even download all 480M works in OpenAlex this way for $480—but don’t do that lol, download the full dataset instead, it’s free).
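
If you’d like to run this arithmetic for your own project, here’s a rough estimator. It only restates the price table above, under the same assumption of 100 results per request. The `estimate_cost` function and the endpoint labels are hypothetical names for this sketch, not part of the API.

```python
import math

# Per-call prices in USD, copied from the table above.
PRICE_PER_CALL = {
    "lookup": 0.0,     # single work lookup by DOI or ID
    "list": 0.0001,    # list and filter
    "search": 0.001,   # search
    "download": 0.01,  # PDF/XML download
}
DAILY_FREE_USD = 1.00


def estimate_cost(endpoint: str, total_results: int, results_per_call: int = 100):
    """Return (calls, cost in USD) for paging through total_results."""
    calls = math.ceil(total_results / results_per_call)
    cost = round(calls * PRICE_PER_CALL[endpoint], 4)
    return calls, cost


for label, n_works in [("Finland", 694_000), ("Japan", 9_000_000), ("Everything", 480_000_000)]:
    calls, cost = estimate_cost("list", n_works)
    covered = "yes" if cost <= DAILY_FREE_USD else "no"
    print(f"{label}: {calls:,} calls, ${cost:,.2f}, covered by one free day: {covered}")
```

Because cost scales with calls rather than results, asking for more results per request is the simplest way to bring the bill down.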

It’s easy to track your usage: every API response includes headers showing how much you’ve spent and how much you’ve got left. You can also check openalex.org/settings/usage anytime.

Prepaid usage

Most users will find that the free plan covers all their needs. However, for some projects, you may need more usage. The great thing about usage-based pricing is that most of the time this will only cost you a few bucks. You’re just paying for what you need. You can buy prepaid usage in about a minute with your credit card, whenever you want, however much you want. It supplements your daily free allowance.

Organizational plans

Organizations can also buy prepaid usage. But many will want to get annual plans instead, which offer major discounts, data sync, curation dashboards, and more. Check out our new Member, Member+, and Supporter plans for more details.

FAQ

I thought it was free? The data remains free. The full OpenAlex dataset—all 480M works, all the metadata—is free to download, share, remix, and build on. We’re committed to keeping it sustainably free by charging for a service (the API) built on that dataset. Free data, paid service–this is the path laid out in the POSI principles, which we’ve signed and enthusiastically support.

How do I track my usage? Every API response includes usage info; you can also call the rate-limit endpoint or check your usage page on openalex.org. Learn more here.

How is my usage data used? We analyze usage data to improve the overall service, and we provide institutions with aggregated summaries of their own usage upon request. We only collect what we need to run OpenAlex. We aren’t building tools to monitor individuals and we don’t sell your data. You can read our full privacy policy here.

Why charge per request instead of per result? We’re trying to link our costs to our pricing, and our costs mostly scale with requests, not results; a search that returns 10 results costs us about the same as one that returns 10,000.

Will prices change? Yes, probably. The point of this model is to keep our prices tightly linked to our costs, and our costs will likely change with new tech, new use cases, and new data.

Where from here?

AI accelerates every day. The future of knowledge is getting rebuilt, right now. If we build on checkerboards of enclosed, walled gardens, we build a fragmented, incoherent future for scholarship and humanity.

We think OpenAlex can help with that. We’re gathering and connecting the literature into a cohesive living library, complete and organized and accessible to everyone. Today’s new pricing model helps us stay in this for the long haul.

An API-based sustainability model lets us deliver (and monetize) value in the post-GUI era. Soon, users won’t go to openalex.org (or any SaaS website), they’ll use APIs to vibe-code a custom interface for any question in minutes. [1] The post-GUI world will be tough on some open sustainability models. But it’s also an amazing opportunity for open infrastructure, if we adapt our pricing model correctly. That’s what we’re doing today.

We’re so very excited about this next chapter. Questions? Hit us up at support@openalex.org.

Let’s build!

[1] Check out our Q1 town hall for more on our post-GUI strategy, and check out this vibe-coding webinar to see several real-life examples of building five-minute custom OpenAlex dashboards.

Affiliation curation is coming to OpenAlex

Posted on February 19, 2026 by Kyle Demes

Algorithmic matching of affiliation text to real institutions is one of those things that only really becomes visible when it’s wrong.

For institutions adopting open research metadata, accurate affiliation matching is foundational: after all, tracking and understanding your research outputs requires first having an accurate list of your research outputs. When affiliation matching is noisy, institutions can lose confidence in open data—sometimes even when the underlying work’s metadata is otherwise excellent.

That’s why we’re launching a new affiliation curation tool inside OpenAlex, starting with our existing Member supporters.

Why we’re launching this now — and why it’s Member-only

Building affiliation curation properly is labour-intensive in two ways:

Developing the tool itself
We’re bringing curation into our production environment so it’s stable, auditable, and fast. That means building the interface, workflows, safeguards, and monitoring needed to support real institutional use at scale. And, of course, we need to iteratively develop this tool with partners as they start using it.
Operating curation as a service
Affiliation curation is much more complex than it looks and we can’t sustain the activities needed to moderate curation requests from any user. Moving forward, we need to provide training, guidance, moderation practices, and ongoing support.

OpenAlex Members aren’t just users of our data—they help us stress-test the workflows, surface edge cases, and shape the FAQ, training materials, and governance that will make the tool durable long-term.

What the tool does (and doesn’t do)

This new tool lets authorized institutional curators create and manage matches between:

Raw affiliation strings — the free-text affiliation lines authors include in publications (e.g., “University of X, Dept. Y, City, Country”), and
Your institution’s ROR record — the persistent identifier record for your organization.

In plain language: it helps you link the affiliation text that appears in publications to the correct institutional identity in OpenAlex.

What it does not do:

It does not let an institution “claim” a work if the institution isn’t actually present in the affiliation text.
It’s not designed to replicate your full internal hierarchy (departments, labs, etc.). In some cases, distinct branded units that report to the institution may warrant their own organizational identifier, but the tool’s core job is linking affiliation text to organizational identifiers.

A quick thank-you to our French partners

This work builds on a strong collaboration with our partners in the French Ministry of Higher Education and Research (MESR), who created and operated the works-magnet. This tool supported affiliation curation at global scale, demonstrated just how much the community is willing to contribute to better open metadata, and enabled many institutions to shift from proprietary to open databases.

We’re hugely appreciative: the success of works-magnet made the need (and the opportunity) unmistakable, and we’re grateful to continue this partnership as we bring curation natively into OpenAlex.

As part of this transition, the works-magnet submission pathway has been closed and we have fully processed previous submissions. We’re excited to move forward with a workflow that’s stable inside our production systems.

What Members can expect

Member institutions will be able to:

access the curation interface through a curator-enabled OpenAlex account,
search affiliation strings that may refer to their institution (including variants, acronyms, and location cues),
filter between strings that are already matched vs not yet matched,
and add or remove linkages to improve both recall (catch missing matches) and precision (remove incorrect matches when names are similar across institutions).
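To make the recall and precision goals concrete, here is a minimal Python sketch. The affiliation strings, the made-up “University of X”, and the function name are all invented for illustration; this is not a description of how OpenAlex evaluates matches internally.

```python
def precision_recall(matched: set[str], correct: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of affiliation-string matches.

    matched: strings currently linked to the institution
    correct: strings that truly refer to the institution
    """
    true_positives = len(matched & correct)
    precision = true_positives / len(matched) if matched else 1.0
    recall = true_positives / len(correct) if correct else 1.0
    return precision, recall


# Hypothetical strings for a made-up "University of X".
correct = {
    "University of X, Dept. Y, City, Country",
    "Univ. of X, City",
    "UX School of Medicine, City",
}
matched = {
    "University of X, Dept. Y, City, Country",
    "Univ. of X, City",
    "X University, Other City",  # similarly named, different institution
}

before = precision_recall(matched, correct)

# A curator removes the incorrect link and adds the missing one.
curated = (matched - {"X University, Other City"}) | {"UX School of Medicine, City"}
after = precision_recall(curated, correct)
```

In this toy case, removing the wrong link raises precision and adding the missing variant raises recall, which are the two levers the curation interface is meant to provide.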

We’ll provide onboarding and training, plus guidance on best practices—especially for tricky scenarios like similarly named universities, multilingual variants, and hospital/university affiliation patterns.

What if you have an urgent need but can’t become a Member?

If poor affiliation matching is causing significant harm in a time-sensitive workflow (for example, a major reporting deadline or a high-stakes rankings exercise) and your institution can’t currently support membership, please reach out to kyle@openalex.org.

We can’t promise we’ll be able to solve every case immediately, but we do want to understand urgent situations and help where we can—especially when a small, well-scoped intervention can prevent real damage.

What’s next

We’re excited to put better affiliation control directly into the hands of institutions who rely on OpenAlex—and to do it in a way that’s sustainable for open infrastructure.

If you’re already a Member, keep an eye out for onboarding details and training materials. If you’re considering membership, you can learn more at openalex.org/members.

And if you’ve been part of the works-magnet effort: thank you. This launch is a continuation of that shared work—making open research metadata not just available, but dependable.

A new way to support OpenAlex: become a Member!

Posted on February 10, 2026 by Kyle Demes

Starting today, institutions can support OpenAlex as a Member for $5,000 USD/year: a lightweight way to help sustain fully open research metadata, designed for institutions that don’t need our existing institutional service offerings.

🎉A special thank you and shout-out to the University of Victoria for becoming our first OpenAlex Member supporter!

OpenAlex remains free to use (website, API, and quarterly public snapshot), with data released under a CC0 license. Membership is about keeping that open infrastructure healthy and helping us scale sustainably.

What you get as a Member

Membership is designed for institutions (often university libraries) who want to invest in open infrastructure and also get a few practical benefits in return.

The Member tier includes:

Admin dashboard (with institutional use statistics)
Affiliation editor (access provided to certified curators)
Unsub access (helping libraries with data-driven collections strategies)
Nomination rights (for our Community Advisory Board)
Members roundtables (quarterly meetings on roadmap priorities)

For more information on what is included in the new Member support package, head to https://openalex.org/members

We also offer higher tiers of membership

If your institution relies on higher-volume access to OpenAlex or needs our time for additional services, we offer Member+ and Partner support packages that include increased API quotas and consulting hours, in addition to all of the Member benefits listed above. For more information on what’s included in each membership tier, check out https://openalex.org/pricing/institutions.

Why we’re doing this

OpenAlex is completely open research infrastructure that ingests, deduplicates, links, and enriches metadata so anyone in the world can build on a shared, open index of the global research system. Keeping that open takes real resources. Revenue from our existing paid subscriptions (previously called Premium and Institutional, now Member+ and Partner) has been critical for our growth over the last few years. But we’ve heard from many institutions with less extensive service needs that they would like a lighter-weight option that costs less and includes fewer services, similar to what other open infrastructures offer (e.g., ORCID). And so that’s what we’ve done!

How to join

For more information on which membership level is right for your institution, head to https://openalex.org/pricing/institutions. If you’re ready to become an OpenAlex Member, Member+, or Partner, or would like to discuss these options further, send an e-mail to sales@openalex.org.

Funding metadata in OpenAlex

Posted on January 26, 2026 by Kyle Demes

With the Walden launch behind us, 2026 promises to be an exciting year for OpenAlex. And thanks to a transformative grant from Wellcome of $3.6M over three years, funding metadata will be a major focus of that development.

This Wellcome-funded project aims to make funding information a first-class part of the open scholarly graph so that funders, institutions, researchers, and tool-builders can rely on open, structured, reusable funding metadata.

Below is a progress update on what we’ve shipped so far, what we’re working on now, and how funders can help shape what comes next.

Why funding metadata (and why now)

Funding data is essential infrastructure for research strategy and accountability: funders need to understand what they supported, what it produced, and what changed as a result. They also need global data to position their work within the global funding landscape.

But today, most funding intelligence workflows still depend on closed databases or on burdensome reporting from grantees into siloed funder databases. OpenAlex already provides a comprehensive, open inventory of research outputs. This project extends that foundation so funding metadata becomes similarly open, structured, and connected.

What’s new in OpenAlex

We are hosting a webinar on February 19, 2026 at 10am EST to review updates in more detail and allow time for interactive Q&A. You can register for that webinar here, and a recording will be available on our YouTube channel afterwards. Here’s a quick update on recent progress.

1) We’re mining full text to match funders to outputs

We’ve begun matching funder names to research outputs through full-text data mining, adding millions of new linkages between funders and their outputs.

We have just started this work and have tens of millions of PDFs still to work through, but the momentum is building quickly.

2) “Awards” are now first-class objects in the OpenAlex graph

We’ve updated the OpenAlex schema so awards are first-class citizens, with their own entity type and API endpoint: https://api.openalex.org/awards
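As a rough sketch, the endpoint can be requested like other OpenAlex list endpoints. The snippet below assumes that the `per-page` paging parameter and the `results` key in the JSON body behave as they do on those other endpoints; the helper names are hypothetical, and authentication (such as an API key, if your plan needs one) is left out.

```python
import json
import urllib.parse
import urllib.request

AWARDS_ENDPOINT = "https://api.openalex.org/awards"  # URL given in this post


def build_awards_url(per_page: int = 5) -> str:
    """Build a list-request URL for the awards endpoint."""
    query = urllib.parse.urlencode({"per-page": per_page})
    return f"{AWARDS_ENDPOINT}?{query}"


def fetch_awards(per_page: int = 5) -> list[dict]:
    """Fetch one page of awards (requires network access).

    Assumes the JSON body carries a "results" list, as on other
    OpenAlex list endpoints; returns an empty list if it does not.
    """
    with urllib.request.urlopen(build_awards_url(per_page), timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("results", [])
```

Calling `fetch_awards(3)` would request the first three award records; the fields on each record are best checked against the API documentation rather than assumed.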

This is foundational work: it lets us represent grants/awards as structured nodes in the graph (instead of only as scattered fragments attached to works), which is required for reliable linking, curation, and downstream funding intelligence.

3) When DOIs are registered for grants, they appear in OpenAlex

Any funder registering DOIs for grants can now have their award metadata show up in OpenAlex almost immediately after registration. We’ve built this integration for Crossref award DOIs and will soon complete the integration for DataCite award DOIs as well.

4) We’re ingesting grant metadata directly from funders

We’ve started ingesting funding metadata directly from funders who make their grant data available online but don’t mint DOIs. At the time of posting this, we had already ingested 11.5M grants.

This is critical: To build a comprehensive database of funding metadata, we need to meet funders where they’re at and ingest their data directly in the formats they’ve made available.

What we’re working on next

Here’s what we’re working on during 2026:

Full-text matching (finish running across our full-text corpus; set up an ongoing pipeline for new PDFs)
Improving matching quality (funder name disambiguation)
Grant ID matching (create linkages between individual grant IDs and papers)
Scaling ingest across many funders and formats (from well-structured national databases to the long tail of smaller or distributed sources)
- We’re starting with a seed list of 50 funders to develop these pipelines. You can check out that list and monitor our progress here.
- We’ll scale funder ingest later this year, but if you want to suggest specific funders you don’t see on our roadmap yet, e-mail kyle@openalex.org
Expanding linkages beyond acknowledgements by incorporating trusted reporting sources wherever possible (e.g., funder impact reports)
Clarifying and prioritizing use cases so we build the funding intelligence workflows funders actually need
Pilot apps that suggest linkages between grants and outputs (e.g., based on vector distance of text in grants and outputs)
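The last item, suggesting linkages by vector distance, can be illustrated with a small Python sketch. It assumes grant and output texts have already been turned into embedding vectors by some model; the three-dimensional vectors, the identifiers, and the function names below are toy placeholders, and cosine similarity is just one common choice of distance, not necessarily the one the pilot apps will use.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_outputs(grant_vec: list[float], output_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate outputs by similarity to a grant, most similar first."""
    scored = [(oid, cosine_similarity(grant_vec, vec)) for oid, vec in output_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings; real ones would come from a text-embedding model.
grant_vec = [1.0, 0.0, 1.0]
output_vecs = {
    "output_a": [1.0, 0.0, 1.0],
    "output_b": [0.0, 1.0, 0.0],
    "output_c": [1.0, 1.0, 0.0],
}
ranking = rank_outputs(grant_vec, output_vecs)
```

A ranking like this would only produce suggestions; a score threshold and human review would still be needed before treating any suggested link as a real grant-to-output linkage.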

Funder workshop in London: April 27–28, 2026

We’re convening an in-person workshop with collaborating funders on April 27–28, 2026 in London, England.

The goals are to:

Review what we’ve learned so far (what’s working, what’s messy, what needs partner input)
Confirm and refine funder use cases for open funding intelligence and impact reporting
Jointly shape the next phase of the project—both technical priorities and outreach activities to scale this initiative globally in the following two years

We will publish a report summarizing the workshop and detailing next phases of the project.

Call to action: we’re looking for funder collaborators (all shapes and sizes)

If you’re a funder—large or small, national or regional, public or private, anywhere in the world—we’d love to talk.

With each funder collaborator, we’re looking to:

Assess the current state of their grant metadata (coverage, structure, identifiers, openness, and constraints)
Help make their award records (and impact reports) easier to discover and reuse when possible
Ingest their grant metadata into OpenAlex to improve linkages between awards and outputs
Fully understand the funding intelligence use cases that matter most to them, so the open dataset supports real reporting and strategy needs

How to get started

The simplest next step is an introductory meeting.

Email the project lead and OpenAlex COO, Kyle Demes: kyle@openalex.org

Thanks (and more soon)

—Kyle

OpenAlex 2026 Roadmap

Posted on January 16, 2026 by Jason

We just wrapped up our Q1 2026 Town Hall. You can watch the full recording here, but this post covers the highlights: what we shipped last quarter, what’s coming this quarter, and why we think 2026 is a pivotal year for open science.

What we shipped in Q4

The Walden rewrite is done. OpenAlex now runs on a modern Databricks infrastructure that lets us ship faster and iterate on data quality in days instead of months.

We added 192 million new works from DataCite and repositories. OpenAlex now indexes 477 million works—the largest connected repository of scholarship ever published.

On funders and awards: we created Awards as a first-class entity, extracted 27 million funder links from fulltext PDFs, and integrated 15 new funders directly.

What’s coming in Q1

For enterprise users: Credit-based API pricing launches this month. Different calls cost different amounts:

a singleton (/works/w123) is 1 credit,
a list (/works?filter=foo:bar) is 10,
PDF content (coming this month!) is 100,
vector search is 1,000 (coming soon! email steve@ourresearch.org for early access!).
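To see how these per-call costs add up, here is a small Python sketch. The credit values are the ones listed above; the call-type labels, the function name, and the sample workload are made up for illustration, and actual pricing may differ from what this post announced.

```python
# Credits per call type, as listed in this post.
CREDIT_COSTS = {
    "singleton": 1,
    "list": 10,
    "pdf": 100,
    "vector_search": 1000,
}


def total_credits(calls: dict[str, int]) -> int:
    """Sum the credits for a workload given as {call_type: number_of_calls}."""
    unknown = set(calls) - set(CREDIT_COSTS)
    if unknown:
        raise ValueError(f"unknown call types: {sorted(unknown)}")
    return sum(CREDIT_COSTS[kind] * count for kind, count in calls.items())


# Hypothetical workload: 500 lookups, 40 list queries, 3 PDFs, 1 vector search.
workload = {"singleton": 500, "list": 40, "pdf": 3, "vector_search": 1}
credits_needed = total_credits(workload)  # 500 + 400 + 300 + 1000 = 2200
```

Note how a single vector search costs as much as a thousand singleton lookups, so the mix of call types matters more than the raw number of calls.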

We’re also launching a sync service so you can pull daily updates in one chunk instead of polling millions of records.

For institutions: Affiliation matching curation launches in February. Members can edit the matching algorithm that links affiliation strings to their institution. Changes propagate to the API within a day—permanently improving the dataset for everyone.

We’re also launching two membership tiers at $5k and $20k/year that include the ability to curate your own data in OpenAlex, training/consulting, and pro API keys with higher API access for your faculty.

For researchers: A complete rewrite of author name disambiguation ships by end of Q1. This has always been the hardest problem in bibliometrics. With today’s AI, we think we can build the most accurate system ever made.

The bigger picture

There’s a lot more I want to say about why 2026 feels like a pivotal year—why we think the GUI is dead, why open data wins the AI era, and what that means for OpenAlex. I’ll save that for a follow-up post. For now: watch the town hall to hear the full argument, and try the vibe-coded demo I built live during the talk. And join our mailing list to stay up-to-date on all the wild stuff we’re doing this year. It’s going to be, by far, our biggest year ever. You ain’t seen nothing yet.

OpenAlex and NORA Collaborate to connect publications to the OECD FORD Taxonomy

Posted on January 16, 2026 by Kyle Demes

OpenAlex and NORA (the Danish National Open Research Analytics team) are pleased to announce a collaboration mapping the OpenAlex research classification system to the OECD Fields of Research and Development (FORD) taxonomy. This alignment supports the upcoming launch of the new Danish Research Portal, but also enables OpenAlex users globally to use the taxonomy in their research analytics.

🎯 Why This Matters for Research Analytics

Widely adopted taxonomies like OECD FORD are critical for international benchmarking, reporting, and policy alignment. At the same time, national governments, research institutions, and regional bodies often rely on their own classification schemes that reflect local research priorities and funding strategies.

By linking OpenAlex’s aboutness classification system with the OECD FORD taxonomy, this collaboration creates:

A bridge between global standards and national strategy
An open and transparent alternative to proprietary classification systems
A pathway for countries and institutions to conduct policy-relevant analytics using fully open data
A blueprint for creating crosswalks between OpenAlex and additional research taxonomies

This mapping supports both broader interoperability and regionally specific analysis—without compromising either goal.

🧭 How We Built the Mapping

The mapping was developed using a systematic methodology that relates OpenAlex research subfields with OECD FORD categories. OpenAlex uses metadata about research articles (e.g., title, abstract, journal) to classify research outputs into research topics, subfields, fields, and domains (full documentation here).

OpenAlex subfields were successfully mapped to 38 out of 42 two-digit FORD fields.
The four remaining categories did not have direct equivalents given the current OpenAlex taxonomy structure.
The resulting crosswalk supports comprehensive coverage of major research areas across the OECD framework.
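In practice, a crosswalk of this kind works as a lookup table from subfields to FORD categories. The Python sketch below shows one way to apply such a table; the keys and values are placeholders rather than rows from the published crosswalk, and the function name is hypothetical.

```python
from collections import Counter

# Placeholder crosswalk: subfield -> one or more FORD categories.
# These are NOT real rows from the published table.
CROSSWALK = {
    "subfield_a": ["ford_1"],
    "subfield_b": ["ford_1", "ford_2"],
    "subfield_c": ["ford_3"],
}


def ford_counts(work_subfields: list[str]) -> Counter:
    """Count works per FORD category, given each work's subfield.

    Works whose subfield has no mapping are skipped.
    """
    counts = Counter()
    for subfield in work_subfields:
        for category in CROSSWALK.get(subfield, []):
            counts[category] += 1
    return counts


# Four hypothetical works; "subfield_x" has no mapping.
works = ["subfield_a", "subfield_b", "subfield_b", "subfield_x"]
counts = ford_counts(works)
```

Here a work whose subfield maps to two categories is counted once in each; whether to count it fully or fractionally is an analysis choice that this sketch does not settle.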

The figure below shows the number of OpenAlex subfields that were mapped to each FORD category. A full table listing each OpenAlex subfield and its corresponding FORD categories is available here.

🤖 Combining Expert Knowledge with AI

To ensure quality and scalability, we employed a dual approach:

A human expert (from OpenAlex) manually assigned OpenAlex subfields to FORD categories.
The same task was conducted using ChatGPT to test whether AI could reliably assist in classification alignment.

Out of 250+ assignments, the two approaches differed in only 11 cases. These were reviewed in collaboration with researchers in those fields: ChatGPT’s classification was judged the better fit in 7 of the 11 cases, while the human’s classification was the better fit only 4 times!
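For a sense of scale, the arithmetic behind these figures can be written out in a few lines of Python. Because the post gives the total only as “250+”, the sketch uses 250 as a lower bound, which makes the computed agreement rate a lower bound as well.

```python
total_assignments = 250  # lower bound: the post says "250+"
disagreements = 11
chatgpt_preferred = 7
human_preferred = 4

# Share of assignments where the human expert and ChatGPT agreed.
agreement_rate = (total_assignments - disagreements) / total_assignments

# Share of the disputed cases where reviewers preferred ChatGPT's choice.
chatgpt_share = chatgpt_preferred / disagreements

print(f"agreement: at least {agreement_rate:.1%}")
print(f"ChatGPT preferred in {chatgpt_share:.1%} of disputed cases")
```

So the two approaches agreed on at least about 95.6% of assignments, and reviewers sided with ChatGPT in roughly 64% of the 11 disputed cases, a sample small enough that the split should be read cautiously.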

This result gives both teams confidence in using AI to assist with future classification crosswalks—especially as a way to accelerate mappings between OpenAlex and other national or domain-specific taxonomies.

📊 What the Mapping Enables

Once mapped, the classifications were applied by NORA to publications in the Danish Research Portal, which aggregates research outputs from across Denmark’s institutions. The FORD classifications derived from OpenAlex were then compared with classifications from Scopus and Web of Science.

While proprietary licensing prevents sharing of detailed comparisons, results from the three systems were broadly aligned, with some differences reflecting their underlying methodologies. Importantly, this confirms that open infrastructure can meet the same analytical needs traditionally served by closed systems.

🚀 What’s Next

OpenAlex users around the world can apply the crosswalk in their own analyses. If you think it would be useful for us to expose the OECD FORD classification directly in our public API, let us know! If there is enough interest, we’ll add it this year.
The Danish Research Portal will launch in mid-2026, showcasing Danish research outputs across the OECD FORD classifications.

With the new OpenAlex Walden system, we look forward to expanding support for multiple taxonomies to meet the needs of different countries, research communities, and policy environments.

⚠️ Important Note on Use

This mapping is not formally endorsed by the OECD. We consulted with the OECD team and shared preliminary results to ensure accuracy and transparency. However, users conducting official reporting should validate the mapping according to their institutional or national guidance.

🌍 A Shared Vision for Open, Interoperable Research Infrastructure

This collaboration demonstrates what is possible when national research infrastructure and open data providers work together to align global and local needs. By combining methodological rigor, AI-assisted innovation, and a commitment to openness, NORA and OpenAlex are helping advance a more interoperable and transparent research ecosystem.

If your organization or country uses its own classification system and is interested in implementing it in OpenAlex, we invite you to reach out and collaborate with us.

— The OpenAlex and NORA Teams