
AI-Assisted Author Disambiguation: What It Does and What It Cannot

By Discover RIMS Admin · June 8, 2026 · Updated June 13, 2026

Author name disambiguation is the unglamorous problem at the heart of every researcher-level metric. When the institutional record contains two profiles for one researcher, or one profile blending two researchers, every downstream number for those researchers is wrong. AI has changed what is possible here: machine learning can match name variants, transliterations, and affiliation changes at a scale and accuracy that manual review cannot match. This article explains how AI-assisted disambiguation works, where it helps most, and the limits that still require human judgement and persistent identifiers.

The problem AI is solving

Names are not unique. The same researcher may appear as "J. Smith", "John A. Smith", "John Smith", "Smith JA", or under a transliterated form across publications spanning a career. Affiliations change. Co-author lists are inconsistent. Common names produce hundreds of plausible candidates per query. Without a persistent identifier on every output, name-and-affiliation matching is the only available signal — and it is fragile at scale. This is why researcher-level metrics are so often unreliable: they rest on guessed identity.
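
To make the fragility concrete, here is a minimal Python sketch of a name "blocking key", a common first step that groups candidate records before any finer comparison. The function name `blocking_key` and the surname-plus-initial rule are illustrative assumptions, not a description of any particular product, and the heuristic deliberately ignores comma-separated and all-caps name formats.

```python
import re
import unicodedata


def blocking_key(raw_name: str) -> str:
    """Reduce a name string to 'surname_initial' (a hypothetical blocking key)."""
    # Strip diacritics so accented and unaccented spellings share a key.
    ascii_name = (
        unicodedata.normalize("NFKD", raw_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    tokens = re.findall(r"[A-Za-z]+", ascii_name)
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0].lower()
    last = tokens[-1]
    if last.isupper() and len(last) <= 3:
        # Heuristic: a short all-caps final token ("Smith JA") is a run of initials.
        surname, initial = "".join(tokens[:-1]), last[0]
    else:
        # Otherwise assume given-name-first order ("John A. Smith").
        surname, initial = last, tokens[0][0]
    return f"{surname.lower()}_{initial.lower()}"


variants = ["J. Smith", "John A. Smith", "John Smith", "Smith JA"]
print({blocking_key(v) for v in variants})  # all four variants yield 'smith_j'
print(blocking_key("Jane Smith"))           # a different person, same key
```

The same key that unites the four variants also pulls in "Jane Smith", which is why name matching alone produces so many plausible candidates for common names.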

How AI-assisted disambiguation works

Modern disambiguation combines several signals: name string similarity (typo-tolerant), co-author network overlap, affiliation match (now strengthened by ROR identifiers), topical similarity from publication metadata, and temporal consistency (one researcher is unlikely to be based in two cities at the same time). Machine-learning models weight these signals to produce a confidence score per candidate match. High-confidence matches are accepted automatically; low-confidence cases surface for human review.
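
The weighting-and-routing idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the weights, the 0.90 threshold, and the names `MatchSignals`, `confidence`, and `route` are all hypothetical, and a production model would typically learn its weights from labelled data rather than fix them by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchSignals:
    """Per-candidate signals, each already scaled to the range 0.0 to 1.0."""
    name_similarity: float
    coauthor_overlap: float
    affiliation_match: float
    topic_similarity: float
    temporal_consistency: float


# Hypothetical hand-picked weights that sum to 1.0 (an assumption for illustration).
WEIGHTS = {
    "name_similarity": 0.30,
    "coauthor_overlap": 0.25,
    "affiliation_match": 0.20,
    "topic_similarity": 0.15,
    "temporal_consistency": 0.10,
}

AUTO_ACCEPT_THRESHOLD = 0.90  # assumed cut-off, not a recommended value


def confidence(signals: MatchSignals) -> float:
    """Weighted sum of the signals: a stand-in for a trained model's score."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def route(signals: MatchSignals) -> str:
    """Accept high-confidence matches; send everything else to a reviewer."""
    if confidence(signals) >= AUTO_ACCEPT_THRESHOLD:
        return "auto_accept"
    return "human_review"


strong = MatchSignals(0.95, 0.90, 1.0, 0.90, 1.0)
weak = MatchSignals(0.90, 0.0, 0.0, 0.30, 1.0)
print(route(strong), route(weak))
```

Note that the weak candidate has a near-identical name but no supporting co-author or affiliation evidence, so it goes to review rather than being merged.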

What AI cannot do

Two limits matter. First, AI cannot create signal where none exists. A researcher with two publications under name variants and no other metadata to link them will remain ambiguous regardless of model sophistication. Second, AI matches are probabilistic: they can be confidently wrong. A model that achieves 99% accuracy is still wrong, on average, for 1 in 100 researchers; for an institution with 2,500 researchers, that is about 25 errors silently absorbed into reporting. The remedy is not better AI; it is persistent identifiers.
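
The arithmetic behind that figure is easy to rerun for other accuracy levels. The sketch below assumes accuracy is measured per researcher profile and treats the result as an expected value; the function name `expected_wrong_profiles` and the extra accuracy levels are illustrative additions.

```python
def expected_wrong_profiles(researchers: int, accuracy: float) -> float:
    """Expected number of wrongly resolved profiles at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return researchers * (1.0 - accuracy)


# 2,500 researchers, as in the example above, at three accuracy levels.
for accuracy in (0.99, 0.995, 0.999):
    wrong = expected_wrong_profiles(2500, accuracy)
    print(f"{accuracy:.1%} accurate: about {wrong:.1f} wrong profiles")
```

Even at the higher accuracy levels the expected error count does not reach zero, which is the point of the paragraph above.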

Why ORCID is still the answer

An ORCID iD on a publication makes disambiguation a lookup, not a guess. AI then handles only the residual cases — outputs published before the researcher had an ORCID, or under affiliations missing from the registry. The combination is decisive: ORCID for known cases, AI for the rest, human review where confidence is low. This is the foundation laid out in Institutional ORCID Adoption and the identifier infrastructure described in Scopus Author ID and ORCID Explained.
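
The "lookup first, guess second" ordering might look like the following Python sketch. Everything named here is hypothetical: the `resolve_author` function, the dictionary-shaped records, the in-memory `orcid_index`, and the `fallback` callable standing in for a probabilistic matcher. The iD shown is an illustrative value only.

```python
from typing import Callable, Optional


def resolve_author(
    output: dict,
    orcid_index: dict[str, str],
    fallback: Callable[[dict], tuple[Optional[str], float]],
    auto_accept: float = 0.90,  # assumed threshold
) -> dict:
    """Resolve an output to a profile: ORCID lookup first, matcher second."""
    orcid = output.get("orcid")
    if orcid and orcid in orcid_index:
        # Known iD: a deterministic lookup, no confidence score involved.
        return {
            "profile_id": orcid_index[orcid],
            "method": "orcid_lookup",
            "needs_review": False,
        }
    # Residual case: no usable iD on the output, so fall back to matching.
    profile_id, score = fallback(output)
    return {
        "profile_id": profile_id,
        "method": "probabilistic",
        "confidence": score,
        "needs_review": profile_id is None or score < auto_accept,
    }


index = {"0000-0002-1825-0097": "profile-1"}  # illustrative iD and profile


def stub_matcher(output: dict) -> tuple[Optional[str], float]:
    """Placeholder for a real matcher; always returns a mid-confidence guess."""
    return ("profile-2", 0.60)


print(resolve_author({"orcid": "0000-0002-1825-0097"}, index, stub_matcher))
print(resolve_author({"title": "An older paper"}, index, stub_matcher))
```

Recording the method alongside the result keeps lookups and guesses distinguishable in later reporting.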

How a RIMS surfaces disambiguation decisions

A defensible RIMS does not hide disambiguation behind a black box. It records which outputs were matched automatically, which required human review, and the confidence level of each match. It allows researchers to claim or reject outputs against their profile. It treats disambiguation as a maintained relationship, not a one-time clean-up. The wider context — why this matters for ranking submissions, accreditation, and impact reporting — is in our journal and researcher metrics pillar.
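
One way to picture such a record is a small data structure that keeps the match method, the confidence, and the researcher's own verdict together with a history of changes. The Python sketch below is an assumption-laden illustration: `MatchRecord`, its fields, and the identifier values are hypothetical and do not describe any specific RIMS schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Method(Enum):
    AUTOMATIC = "automatic"
    HUMAN_REVIEW = "human_review"


class Claim(Enum):
    UNCONFIRMED = "unconfirmed"
    CLAIMED = "claimed"
    REJECTED = "rejected"


@dataclass
class MatchRecord:
    """One output-to-profile link, with enough detail to audit it later."""
    output_id: str
    profile_id: str
    method: Method
    confidence: float
    claim: Claim = Claim.UNCONFIRMED
    history: list = field(default_factory=list)

    def _log(self, event: str) -> None:
        # Keep a timestamped trail so the link is maintained, not overwritten.
        self.history.append((datetime.now(timezone.utc), event))

    def researcher_claims(self) -> None:
        self.claim = Claim.CLAIMED
        self._log("claimed_by_researcher")

    def researcher_rejects(self) -> None:
        self.claim = Claim.REJECTED
        self._log("rejected_by_researcher")


record = MatchRecord("doi:10.1234/example", "profile-1", Method.AUTOMATIC, 0.97)
record.researcher_rejects()  # a confident automatic match can still be wrong
print(record.claim, len(record.history))
```

Keeping the researcher's verdict separate from the model's confidence means a rejected high-confidence match stays visible as a correction rather than disappearing.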

What this means for AI procurement

When evaluating an AI-assisted disambiguation feature in a RIMS, the right questions are not "how accurate is the model" — they are "what signals does it use", "how does it surface low-confidence cases", "can researchers correct it", and "does it depend on ORCID and ROR or work around them". The answer to the last question separates serious implementations from marketing claims.

Frequently asked questions

Can AI alone solve our researcher-disambiguation problem? No. AI plus ORCID plus human review can. AI alone produces silently wrong matches at a small but material rate.

Will AI disambiguation reduce our ORCID rollout burden? Indirectly — it makes the cost of incomplete ORCID coverage less severe. But ORCID coverage remains the leading indicator of researcher-record reliability.

How do we know an AI match is correct? Rely on three checks: a confidence score on every match, low-confidence cases surfaced for human review, and researcher claim/reject mechanisms on the profile.

Where to start

Discover RIMS reconciles author identity using ORCID, Scopus Author ID, and probabilistic matching across Scopus, OpenAlex, ORCID, Crossref, and SCImago: automated where confidence is high, surfaced for review where it is not, and researcher-correctable always.

