Every conversation about AI in research information management eventually arrives at the same realisation: the AI is only as good as the data beneath it. An author-disambiguation model fed inconsistent affiliations produces confidently wrong matches. An LLM summarising a researcher's output when publications are missing from the institutional record summarises a fiction. A trend dashboard built on incomplete metadata identifies trends that do not exist. The institutions that benefit most from AI in their RIMS are not the ones that bought the most AI features; they are the ones whose data foundation was strong enough to make those features work.
What "data foundation" means in practice
For a RIMS, the foundation has four components: complete output capture across multiple authoritative sources (Scopus, OpenAlex, ORCID, Crossref, Scimago); reconciled author identity via ORCID and Scopus Author ID, with probabilistic matching where identifiers are missing; consistent affiliations, increasingly anchored by ROR; and deduplicated records so the same paper never appears twice. Each of these is its own discipline; together they constitute a research-information record that an AI layer can be trusted to build on.
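To make the deduplication component concrete, here is a minimal Python sketch of one common approach: collapse records from several sources onto a normalised DOI, and fall back to a normalised title plus year when no DOI is present. The record layout, field names, function names, and sample data are assumptions made for illustration; this is not a description of how Discover RIMS or any particular platform implements deduplication.

```python
import re

def normalise_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None

def record_key(record):
    """Prefer the DOI; fall back to a normalised title plus year."""
    doi = normalise_doi(record.get("doi"))
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title-year", title, record.get("year"))

def deduplicate(records):
    """Merge records that share a key, remembering which sources reported them."""
    merged = {}
    for record in records:
        key = record_key(record)
        if key not in merged:
            merged[key] = {**record, "sources": [record["source"]]}
        else:
            merged[key]["sources"].append(record["source"])
    return list(merged.values())

# Invented sample data: the same paper reported by two sources, plus one
# record that has no DOI at all.
records = [
    {"source": "scopus", "doi": "10.1000/ABC.123", "title": "An Example Paper", "year": 2023},
    {"source": "crossref", "doi": "https://doi.org/10.1000/abc.123", "title": "An example paper", "year": 2023},
    {"source": "openalex", "doi": None, "title": "Another Example Paper", "year": 2022},
]
unique = deduplicate(records)
```

With this sample, the Scopus and Crossref entries collapse into one record that lists both sources, and the DOI-less record survives on its title-and-year key. A real pipeline would need more care on that fallback path, since title matching alone can merge distinct works such as a preprint and its published version.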
How poor data undermines AI
The failure modes are predictable. An AI matching model that sees inconsistent name variants without supporting identifiers makes more false matches. An LLM summarisation feature working from a partial publication list produces a summary that misrepresents the researcher. A topic-trend engine that is missing open-access output understates trend strength in fields where open-access publishing dominates. None of these failures can be fixed inside the AI layer; each requires fixing the data. The principle is operationalised in Building a Single Source of Truth for Research Data.
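The name-variant failure can be illustrated with a short Python sketch, assuming a simple identifier-first rule: an ORCID match is trusted, while a name-only match is accepted only when it is unique and is flagged for review. The function names, the "surname plus first initial" key, the record layout, and the placeholder ORCID iD are all assumptions for illustration, and a production matcher would weigh far more evidence than this.

```python
import unicodedata

def name_key(name):
    """Reduce a name to 'surname first-initial', ASCII-folded and lower-cased."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in folded:
        surname, given = [part.strip() for part in folded.split(",", 1)]
    else:
        parts = folded.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname.lower()} {given[:1].lower()}".strip()

def match_author(authorship, researchers):
    """Return (researcher_id, basis) for one authorship entry."""
    orcid = authorship.get("orcid")
    if orcid:
        for researcher in researchers:
            if researcher.get("orcid") == orcid:
                return researcher["id"], "orcid"
    key = name_key(authorship["name"])
    candidates = [r["id"] for r in researchers if name_key(r["name"]) == key]
    if len(candidates) == 1:
        return candidates[0], "name-only (review)"
    if candidates:
        return None, "ambiguous"
    return None, "no match"

# Invented researchers; the ORCID iD is a placeholder, not a real identifier.
researchers = [
    {"id": "r1", "name": "Ana Lim", "orcid": "0000-0000-0000-0001"},
    {"id": "r2", "name": "Andi Lim", "orcid": None},
]

with_identifier = match_author({"name": "A. Lim", "orcid": "0000-0000-0000-0001"}, researchers)
without_identifier = match_author({"name": "Lim, A.", "orcid": None}, researchers)
```

In this sample the entry carrying an identifier resolves to `r1`, while the same name without an identifier matches two researchers and is returned as ambiguous. A model forced to pick one of them would be guessing, which is the false-match risk described above.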
The sequencing of investment
Institutions evaluating AI capabilities in a RIMS sometimes ask whether they should adopt AI features first and improve data later. The answer is no: the sequence runs the other way. AI features on poorly reconciled data accelerate the wrong answers; they do not converge on better answers over time. The right sequence: reconcile identifiers, complete output coverage, normalise affiliations, deduplicate. Then layer AI on top. The data-foundation phase is where most of the value is created; the AI phase is where it is amplified.
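One way to make this sequencing checkable is to measure identifier coverage before switching on any AI feature. The Python sketch below assumes each publication record carries `doi`, `author_orcid`, and `affiliation_ror` fields and compares their coverage against minimum thresholds. The field names, the threshold values, and the sample identifiers are hypothetical; an institution would choose its own.

```python
def coverage(records, field):
    """Share of records with a non-empty value for `field` (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for record in records if record.get(field)) / len(records)

# Hypothetical minimum coverage per field; these are not recommended values.
THRESHOLDS = {"doi": 0.90, "author_orcid": 0.80, "affiliation_ror": 0.80}

def readiness_report(records, thresholds=THRESHOLDS):
    """Coverage per field, with a flag showing whether it meets its threshold."""
    report = {}
    for field, minimum in thresholds.items():
        share = coverage(records, field)
        report[field] = {"coverage": share, "ok": share >= minimum}
    return report

def ready_for_ai(report):
    """True only when every field meets its threshold."""
    return all(entry["ok"] for entry in report.values())

# Invented records; the identifier values are placeholders.
records = [
    {"doi": "10.1000/a", "author_orcid": "0000-0000-0000-0001", "affiliation_ror": "ror-placeholder"},
    {"doi": "10.1000/b", "author_orcid": None, "affiliation_ror": "ror-placeholder"},
    {"doi": "10.1000/c", "author_orcid": "0000-0000-0000-0002", "affiliation_ror": "ror-placeholder"},
    {"doi": "10.1000/d", "author_orcid": None, "affiliation_ror": "ror-placeholder"},
]
report = readiness_report(records)
```

Here DOI and affiliation coverage pass but only half the records carry an author ORCID, so `ready_for_ai(report)` is false. Coverage is a coarse proxy, since it counts whether identifiers are present and not whether they are correct, but it gives the reconcile-first rule a measurable gate.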
What this means for procurement
When comparing RIMS platforms in 2026, the meaningful question is not "which has more AI features" but "which has the data-reconciliation discipline that AI features depend on, and how transparently does it expose it?" A vendor whose AI demo is impressive but whose disambiguation logic is opaque has solved the easy problem and left the hard one for you. Useful procurement questions live in our RIMS RFP evaluation criteria article.
What this looks like in production
Universitas Hasanuddin operates Discover RIMS in production across 2,500+ researchers, 15,300+ publications, and 18 faculties and research units, with author identity, affiliations, and outputs reconciled across five global sources. That foundation makes any AI feature added on top, today or in future, defensible. Without it, no AI feature is.
Frequently asked questions
Can a vendor's AI feature improve our data quality? Sometimes (anomaly detection, deduplication suggestions). But it cannot create signal where none exists; identifiers and source coverage remain the prerequisite.
How long does the data-foundation phase take? Initial reconciliation typically takes weeks to months depending on existing system maturity. Continuous reconciliation is permanent — it is how the foundation stays trustworthy.
Is "AI is only as good as the data" just an excuse to avoid AI? No — it is the precondition for AI working. Institutions that skip it discover the consequences later.
Where to start
Discover RIMS prioritises the data foundation: continuous reconciliation across ORCID, Scopus, OpenAlex, Crossref, and Scimago, with affiliation disambiguation anchored by ROR. AI features sit on a foundation strong enough to deserve them.