PEER-REVIEWED PUBLICATION · COMPUTERS IN BIOLOGY AND MEDICINE

IDENTIFYING INDICATIONS FOR NOVEL DRUGS USING ELECTRONIC HEALTH RECORDS

Unsupervised methods that read large EHR databases to predict which diseases a drug under development could treat.

A peer-reviewed study extending and benchmarking unsupervised computational methods that identify candidate indications for drugs in development directly from electronic health records — including a matrix-factorisation variant tuned for first-in-class molecules, where the obvious comparators do not yet exist.

AUTHORED BY
Flavio Dormont, co-founder of chAIron, is one of nine authors. The study was led by Lukas Adamek (corresponding author: Brandon Rufino) with the Data & Computational Science and Clinical Real-World Evidence teams at Sanofi R&D.
PEER-REVIEWED · OPEN ACCESS · PUBLISHED OCTOBER 2024 · CODE ON GITHUB
Open access — no form. Read on ScienceDirect, resolve via DOI, or get the code on GitHub.

WHAT THE PAPER SHOWS

Choosing which disease to develop a drug for is one of the earliest and highest-stakes decisions in research and development. This study extends and tests several unsupervised computational methods that use electronic health records to identify candidate indications — diseases a drug could plausibly treat — for molecules still in development. The methods are phenotypic-similarity driven: they reason from the patterns of disease that co-occur across millions of real patients, rather than from literature or expert intuition alone.
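To make "phenotypic-similarity driven" concrete, here is a minimal sketch, assuming each patient record has been reduced to a set of diagnosis labels. The toy records, the disease labels and the helper names (`cooccurrence_counts`, `disease_vectors`, `rank_similar`) are illustrative assumptions, not the study's data or code. Cosine similarity over co-occurrence profiles is one simple choice and may differ from the measures used in the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy patient records: each set holds the diagnoses seen for one patient.
# All labels are illustrative placeholders, not data from the study.
patients = [
    {"asthma", "atopic_dermatitis", "nasal_polyps"},
    {"asthma", "atopic_dermatitis"},
    {"asthma", "nasal_polyps", "eosinophilic_esophagitis"},
    {"atopic_dermatitis", "eosinophilic_esophagitis"},
    {"hypertension", "type_2_diabetes"},
    {"hypertension", "type_2_diabetes", "asthma"},
]


def cooccurrence_counts(records):
    """Count how many patients carry each unordered pair of diagnoses."""
    pairs = Counter()
    for diagnoses in records:
        for a, b in combinations(sorted(diagnoses), 2):
            pairs[(a, b)] += 1
    return pairs


def disease_vectors(records):
    """Describe each disease by its co-occurrence counts with the others."""
    vectors = {d: {} for d in set().union(*records)}
    for (a, b), n in cooccurrence_counts(records).items():
        vectors[a][b] = n
        vectors[b][a] = n
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_similar(known_indication, records):
    """Rank every other disease by similarity to a known indication."""
    vectors = disease_vectors(records)
    target = vectors[known_indication]
    scores = {d: cosine(target, vec) for d, vec in vectors.items() if d != known_indication}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_similar("atopic_dermatitis", patients)
for disease, similarity in ranking:
    print(f"{disease:28s} {similarity:.3f}")
```

Diseases whose co-occurrence profile resembles that of a known indication rise to the top of the list. On real records the counts would need normalising for prevalence and cohort size, which this sketch ignores.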

To benchmark the approach, the methods were tested on known drugs that already have multiple approved indications, so predictions could be checked against ground truth. A variant of matrix factorisation gave the best performance for first-in-class drugs (the hardest and most valuable case, where no established comparator exists yet), improving on earlier methods built for well-characterised, established drugs. Applied beyond the benchmark, the methods surfaced novel predictions for key immunology and oncology drugs.
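The benchmarking idea can be sketched as follows: hide one approved indication, factorise what remains, and see how highly the hidden indication is ranked. This is a generic matrix factorisation fitted by stochastic gradient descent, not the specific variant evaluated in the paper. The drug names, the toy matrix and the hyperparameters (`rank`, `epochs`, `lr`, `reg`) are assumptions chosen for illustration only.

```python
import random

# Toy drug-by-disease matrix: 1.0 marks an approved indication, 0.0 none known.
# Drug and disease names are placeholders, not the study's benchmark set.
drugs = ["drug_A", "drug_B", "drug_C", "drug_D"]
diseases = ["asthma", "atopic_dermatitis", "psoriasis", "crohns_disease", "melanoma"]
known = [
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
]


def factorise(matrix, hidden, rank=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Fit matrix ~ P * Q^T by SGD on every cell that is not in `hidden`."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    P = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols) if (i, j) not in hidden]
    losses = []
    for _ in range(epochs):
        rng.shuffle(cells)
        total = 0.0
        for i, j in cells:
            err = matrix[i][j] - sum(P[i][k] * Q[j][k] for k in range(rank))
            total += err * err
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                P[i][k] += lr * (err * q - reg * p)  # gradient step on the drug factor
                Q[j][k] += lr * (err * p - reg * q)  # gradient step on the disease factor
        losses.append(total / len(cells))  # mean squared error seen in this epoch
    return P, Q, losses


def score(P, Q, i, j):
    """Predicted affinity between drug i and disease j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# Hide one approved indication (drug_A, atopic_dermatitis) and try to recover it.
held_out = (0, 1)
P, Q, losses = factorise(known, hidden={held_out})

# Candidates are the diseases the model has not been told drug_A treats.
candidates = [j for j in range(len(diseases)) if known[0][j] == 0.0 or (0, j) == held_out]
ranked = sorted(candidates, key=lambda j: score(P, Q, 0, j), reverse=True)
rank_of_truth = ranked.index(held_out[1]) + 1
print("hidden indication ranked", rank_of_truth, "of", len(candidates))
```

Repeating this over many drugs and hidden indications gives a recovery rate that can be compared between methods. A first-in-class molecule has no approved indications to hide, so the study's setting is harder than this toy case suggests.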

KEY FINDINGS

  • A matrix-factorisation variant gives the best performance for first-in-class drugs — improving on prior methods that were built for established, well-characterised molecules.
  • The approach is phenotypic-similarity driven: it learns from the patterns of disease that co-occur across large patient populations, not from literature review alone.
  • Applied beyond the benchmark, the methods produced novel indication predictions for key immunology and oncology drugs.
  • Performance differs sharply by therapeutic area — stronger in inflammation and immunology than in oncology, likely because many chemotherapies are not targeted therapies, so phenotypic signal is weaker.
  • The implementation is released as open-source code, so the methods can be inspected, reproduced and extended.
RESULTING PATENT

Methods for treating digitally-identified IL-4/IL-13 related disorders

Publication no.
WO2021119028A1 (WIPO PCT) · PCT/US2020/063835
Assignee / owner
Sanofi Biotechnology SAS
Dates
Priority 9 Dec 2019 · Filed 8 Dec 2020 · Published 17 Jun 2021

The same line of work produced more than a paper. Applying unsupervised machine learning — bisecting K-means clustering with Multiple Correspondence Analysis — across electronic health records for roughly 94 million patients, the team digitally identified novel candidate indications for dupilumab, an anti-IL-4Rα antibody that blocks signalling through the shared IL-4/IL-13 pathway. Beyond dupilumab's approved set, the analysis surfaced new candidate disorders to evaluate across skin, blood — including sickle-cell disease — lung fibrosis and eye disorders, the basis for the patent's treatment claims.
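For readers unfamiliar with bisecting K-means, the following is a minimal sketch of the clustering step only. It assumes patients are encoded as binary diagnosis vectors and it omits the Multiple Correspondence Analysis step entirely. The column meanings, the toy patients and the deterministic seeding rule in `two_means` are illustrative assumptions, not details taken from the patent or the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    centre = centroid(points)
    return sum(sq_dist(p, centre) for p in points)


def two_means(points, iterations=20):
    """Split one cluster in two with plain 2-means, seeded by a far-apart pair."""
    first = max(points, key=lambda p: sq_dist(p, points[0]))
    second = max(points, key=lambda p: sq_dist(p, first))
    centres = [first, second]
    left, right = [], []
    for _ in range(iterations):
        new_left = [p for p in points if sq_dist(p, centres[0]) <= sq_dist(p, centres[1])]
        new_right = [p for p in points if sq_dist(p, centres[0]) > sq_dist(p, centres[1])]
        # Stop on an empty side or when the assignment no longer changes.
        if not new_left or not new_right or (new_left, new_right) == (left, right):
            break
        left, right = new_left, new_right
        centres = [centroid(left), centroid(right)]
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        splittable = [i for i, c in enumerate(clusters) if len(set(c)) > 1]
        if not splittable:
            break
        worst = max(splittable, key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters.pop(worst))
        clusters += [left, right]
    return clusters


# Toy patients; hypothetical columns: eczema, asthma, anaemia, lung_fibrosis.
patients = [
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),
    (0, 0, 1, 0), (0, 0, 1, 0),
    (0, 0, 0, 1), (0, 1, 0, 1),
]
clusters = bisecting_kmeans(patients, k=3)
for members in clusters:
    print(len(members), members)
```

Splitting the loosest cluster first yields a hierarchy of patient groups, which is one reason the bisecting variant is often preferred to flat K-means on large cohorts. At the scale described above a library implementation would be used in place of these pure-Python loops.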

This patent is owned by the pharma partner, Sanofi Biotechnology SAS — not by chAIron. It is shown here because it is a concrete, real-world outcome of the same family of EHR-based indication-finding methods the publication describes.

Inventors: Cliona Marie Molony · Paul Bryce · Emanuele De Rinaldis · Ramon Antonio Hernandez Vecino · Francisco Javier Jimenez Jimenez

VIEW ON GOOGLE PATENTS
Legal status per Google Patents: ceased (an indicator, not a legal conclusion).

PART OF CHAIRON'S FOUNDING STORY

WHERE THE METHOD CAME FROM

Before chAIron existed as a company, the methods now at the centre of its platform were being built, tested and published in the peer-reviewed literature. This paper — co-authored by Flavio Dormont, today a co-founder and chief scientific officer of chAIron — is one of the earliest pieces of that story: an EHR-based, mechanism-grounded way to find the indications a molecule could treat, validated against drugs whose answers were already known.

It is one of the first engagements that shaped how chAIron works today. The platform operationalises this lineage for clients — pairing real-world evidence with a biomedical knowledge graph, and keeping clinical experts in the loop — to turn indication finding from a months-long, intuition-led exercise into a ranked, testable, evidence-backed shortlist.

AT A GLANCE

Type
Peer-reviewed research article
Journal
Computers in Biology and Medicine (Elsevier)
Published
October 2024 · vol. 183, 109158
Access
Open access — no form
Methods
Unsupervised ML · matrix factorisation
Code
Open-source on GitHub

FREQUENTLY ASKED QUESTIONS

The paper is free to read. It is available openly through the publisher: you can read it on ScienceDirect, resolve it via its DOI, or read the mirrored record on the ACM Digital Library, with no form and no charge. The implementation code is also released openly on GitHub.

The paper extends and benchmarks unsupervised computational methods that read electronic health records to predict candidate indications for drugs in development. The methods are tested on known drugs with multiple approved indications so that predictions can be checked against ground truth, and a matrix-factorisation variant is shown to work best for first-in-class molecules.

The paper is part of chAIron's founding story. It describes an EHR-based, mechanism-grounded approach to indication finding and is co-authored by chAIron co-founder Flavio Dormont; the same lineage of methods underpins the chAIron platform today. The published work was carried out with the Data & Computational Science and Clinical Real-World Evidence teams at Sanofi R&D.

The patent (WO2021119028A1) covering digitally-identified IL-4/IL-13 disorders is owned by Sanofi Biotechnology SAS, the pharma partner — not by chAIron. It is referenced here because it is a concrete outcome of the same kind of EHR-based indication-finding work, not because chAIron holds any rights in it.

Using unsupervised machine learning across electronic health records for roughly 94 million patients, the work digitally identified novel candidate indications for dupilumab — an anti-IL-4Rα antibody acting on the IL-4/IL-13 pathway — beyond its approved set, spanning skin, blood (including sickle-cell disease), lung fibrosis and eye disorders. These formed the basis of the patent's treatment claims.

The methods do not perform equally well everywhere: performance differs sharply by therapeutic area. They perform better for inflammation and immunology than for oncology, likely because many chemotherapies are not targeted therapies, so the phenotypic signal the methods rely on is weaker. The paper is transparent about this limitation.

THE METHOD BEHIND CHAIRON — APPLIED TO YOUR ASSET

Read the paper, then talk to us about finding the right indication for a specific molecule — peer to peer.

BOOK AN INDICATION-STRATEGY CALL

© 2026 chAIron SA. The referenced publication and patent are the property of their respective owners; the patent (WO2021119028A1) is owned by Sanofi Biotechnology SAS, not chAIron. Provided for informational purposes only — not legal, regulatory, financial or investment advice.

Information

Expanding the knowledge frontier of molecules with artificial and human intelligence.

Follow on LinkedIn

Get in touch

Office: Rue de la Grotte 6, 1003 Lausanne, Switzerland
© 2026 chAIron SA. All rights reserved. Designed & powered by Aumentta
