Author search in OpenAlex: improved handling of diacritics within names

We’ve improved the author search feature within OpenAlex, so you get more results when searching for author names that may or may not include diacritics. For example, a search for the name “David Tarragó” will return the same number of results as a search for the version converted via Lucene’s ASCII folding filter, which in this case is “David Tarrago”.
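To make the folding step concrete, here is a minimal Python sketch that approximates it. This is not Lucene’s filter itself: it assumes that Unicode decomposition followed by dropping combining marks is close enough for illustration, and the function name `fold_to_ascii` is our own.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (an assumption, not Lucene's actual filter).

    Decomposes each character into a base letter plus combining marks,
    then drops the combining marks.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii("David Tarragó"))  # David Tarrago
```

Lucene’s filter also maps letters that have no Unicode decomposition, such as “ø” and “ß”, which this sketch leaves unchanged.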

When searching with diacritics, results that contain the queried diacritics are more likely to be ranked towards the top, so the two searches may have slightly different rankings. You can see the results of these two searches in the API.

These queries return the same number of results, with diacritic and non-diacritic names included. Keep in mind that results are weighted by the author’s works count, so that has an impact on relevance as well.
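If you want to compare the two searches yourself, the sketch below builds a request URL for each spelling. It assumes the author search is exposed at `https://api.openalex.org/authors` with a `search` query parameter; the helper name `author_search_url` is ours, and the code only constructs the URLs without sending any request.

```python
from urllib.parse import urlencode

# Assumed endpoint for OpenAlex author search.
BASE_URL = "https://api.openalex.org/authors"


def author_search_url(name: str) -> str:
    """Build an author search URL, percent-encoding the name as UTF-8."""
    return f"{BASE_URL}?{urlencode({'search': name})}"


for name in ("David Tarragó", "David Tarrago"):
    print(author_search_url(name))
```

Opening both URLs and comparing the reported result counts should show the behavior described above, with the ordering of individual authors possibly differing between the two.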

Why make this change?

When creating the OpenAlex author search capability, it was important for us to honor authors’ names by respecting diacritics, so searching with a diacritic returned only results with diacritics. However, this strict approach makes it harder to find some authors. We’re comfortable with the compromise of searching with and without diacritics at the same time, while giving priority to the intended search query. Hopefully this improved feature is helpful!
