Translational research occupies a unique position in the drug discovery and development process. This is where hypothesis meets reality: a new molecule developed in the lab is either translated into a lifesaving medicine or becomes another failure in the notoriously high-attrition business of drug development.
Because translational research plays a pivotal role in a drug's success, and because failure in late-stage clinical development is so costly, technological innovations that improve the success rate of translation from bench to bedside can have an outsized impact on pharma R&D productivity. AI, arguably the most transformative technology of our time, is expected to be a game changer in translational research. But where are the opportunities, and how will they materialize?
To limit the scope of this discussion, I will focus only on machine learning, the branch of AI that has seen the most breakthroughs in recent years, and on the development of biomarkers, a key factor in translational success and precision medicine. In particular, I would like to discuss three types of biomarkers: molecular biomarkers, digital biomarkers, and imaging biomarkers.
The classic example of a molecular biomarker is HER2 overexpression in breast cancer, which is routinely measured with a companion diagnostic for Herceptin, the first precision medicine approved by the FDA, used to treat HER2-positive breast cancer patients. Since then, many molecular biomarkers have been identified and successfully used in targeted therapies (e.g., BRAF V600E mutation status for Zelboraf to treat metastatic melanoma). Despite these successes, only a small number of biomarkers have been clinically validated so far. For many new drugs, especially those with more complex mechanisms of action such as immunotherapies, a data-driven approach is often necessary. Typically, large pools of data from genomic or transcriptomic profiling, protein assays, and flow cytometry assays are combined with lab data and other clinical data, and then interrogated for potential biomarkers that can predict clinical response. This is where machine learning can be particularly helpful.
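As a minimal illustration of the "interrogate the data for biomarkers" step, the sketch below ranks candidate features by Welch's t-statistic between responders and non-responders. This is a deliberately simple univariate screen swapped in for illustration, not the method used by any company mentioned here; the function names, feature names (`GENE_A`, `GENE_B`), and values are hypothetical. In practice, screening thousands of features also calls for multiple-testing correction and validation on independent patients.

```python
import math
import statistics
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t-statistic for the difference in means of two groups (unequal variances)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two samples")
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error built from each group's sample variance (n - 1 denominator).
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # Both groups are constant: either no difference or a perfect (infinite) separation.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def rank_candidate_biomarkers(
    features: Dict[str, List[float]], responder: List[bool]
) -> List[Tuple[str, float]]:
    """Return (feature, t) pairs sorted by |t|, strongest separation first."""
    ranked = []
    for name, values in features.items():
        resp = [v for v, r in zip(values, responder) if r]
        non_resp = [v for v, r in zip(values, responder) if not r]
        ranked.append((name, welch_t(resp, non_resp)))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values for six patients (the first three responded).
    features = {
        "GENE_A": [5.1, 4.8, 5.3, 2.0, 2.2, 1.9],
        "GENE_B": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
    }
    responder = [True, True, True, False, False, False]
    for name, t in rank_candidate_biomarkers(features, responder):
        print(f"{name}: t = {t:.2f}")
```

A univariate screen like this is easy to interpret but ignores interactions between features; multivariate models are the usual next step once a shortlist exists.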
Indeed, several biotech companies have built this approach into their R&D models. For example, Anavex used a computational platform to identify genetic variant biomarkers that stratify a sub-population of Alzheimer's disease patients, based on whole genome sequencing data from patients in a Phase 2a study. Improved clinical outcomes were observed in this sub-population when treated with the company's experimental drug. Berg also positions itself as an AI-driven biotech company operating at the intersection of biology, technology, and artificial intelligence analytics; it has identified novel biomarkers in multiple disease areas, with drugs that have reached Phase 2 clinical trials.
At larger pharmaceutical companies, new AI and digital transformation strategies are being implemented to improve the drug development process. At Roche, a recent company-wide machine learning challenge attracted more than a hundred teams of data scientists, who built predictive models to find the best treatment options (e.g., immunotherapy, chemotherapy, or combination therapy) for patients with a variety of cancers, using both clinical and transcriptomic data from a dozen past clinical trials. Efforts like this are not merely theoretical exercises; they have yielded plausible candidate biomarkers that are now being investigated in early trials.
In certain disease areas, such as neuroscience, identifying molecular biomarkers can be challenging. This may be due to a number of factors, including a lack of patient samples for studying disease mechanisms at the molecular level, high inter-patient heterogeneity, and the absence of quantitative measures of symptoms.
However, a new type of biomarker, the digital biomarker, may fundamentally change how we study these diseases. Rather than relying on biochemical assays, digital biomarkers are derived from signals generated by sensors in devices such as smartphones and wearables like smart watches. For example, Roche is currently testing the concept of digital biomarkers in multiple clinical trials, including trials in Parkinson's disease, schizophrenia, and multiple sclerosis. Early results showed that a deep convolutional recurrent neural network trained on human activity data was able to accurately predict Parkinson's patients' daily movement patterns from data collected by the patients' smartphones. Similarly, in a schizophrenia trial, data generated by actigraphy smart watches accurately predicted patients' movements, including hand gestures. Moreover, these movement patterns showed a high degree of correlation with clinically observed symptoms.
If clinically validated, these movement patterns, i.e., digital biomarkers, have the potential to provide more accurate and objective measurements of disease symptoms. In addition, unlike molecular biomarker data, which are collected only when a patient visits a doctor's office, digital biomarker data can be collected remotely and continuously. This provides a longitudinal view of the patient's disease journey that is invaluable for understanding disease progression and drug response, and for providing the best care to patients.
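Before any model, deep or otherwise, can be trained, raw sensor streams are usually summarized into features. The sketch below shows one simple summary that is common in accelerometry research, the Euclidean norm minus one g (ENMO), averaged over fixed windows, plus the fraction of windows above an activity threshold as a crude activity feature. It is not the deep convolutional recurrent network described above; the window length, threshold, and function names are hypothetical, and readings are assumed to be tri-axial accelerations already expressed in units of g.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# One tri-axial accelerometer reading (x, y, z), assumed to be in units of g.
Reading = Tuple[float, float, float]


def enmo(reading: Reading) -> float:
    """Euclidean norm minus one g, truncated at zero (a device at rest reads ~0)."""
    x, y, z = reading
    return max(0.0, math.sqrt(x * x + y * y + z * z) - 1.0)


def windowed_enmo(readings: Sequence[Reading], window: int) -> List[float]:
    """Mean ENMO over consecutive non-overlapping windows; a trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(readings) - window + 1, window):
        chunk = readings[start:start + window]
        means.append(statistics.mean(enmo(r) for r in chunk))
    return means


def active_fraction(window_means: Sequence[float], threshold: float) -> float:
    """Share of windows whose mean ENMO exceeds a (hypothetical) activity threshold."""
    if not window_means:
        return 0.0
    return sum(m > threshold for m in window_means) / len(window_means)


if __name__ == "__main__":
    # Hypothetical stream: 4 readings at rest, then 4 readings of vigorous movement.
    stream = [(0.0, 0.0, 1.0)] * 4 + [(0.0, 0.0, 2.0)] * 4
    means = windowed_enmo(stream, window=4)
    print(means, active_fraction(means, threshold=0.1))
```

Computed day after day for the same patient, features like these form the kind of longitudinal series described above, which a downstream model can then relate to clinical symptom scores.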
Imaging biomarkers have been used in translational research for some time. For example, the widely used RECIST (Response Evaluation Criteria in Solid Tumors), which assess objective responses to cancer therapies, are based on tumor measurements taken from CT or MRI images.
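To make this image-derived endpoint concrete, here is a simplified sketch of how RECIST 1.1 classifies response from the sum of target-lesion diameters: a decrease of at least 30% from baseline is a partial response (PR), while an increase of at least 20% over the smallest sum recorded so far (the nadir), together with an absolute increase of at least 5 mm, or the appearance of new lesions, is progressive disease (PD). The sketch deliberately ignores non-target lesions, the lymph-node short-axis rule, and confirmation requirements, so it is not a complete implementation of the guideline; the function names are hypothetical.

```python
from typing import Sequence


def sum_of_diameters(diameters_mm: Sequence[float]) -> float:
    """Sum of the longest diameters (mm) of the target lesions at one time point."""
    return float(sum(diameters_mm))


def recist_target_response(
    baseline_sum: float, nadir_sum: float, current_sum: float, new_lesions: bool = False
) -> str:
    """Simplified RECIST 1.1 category: 'CR', 'PR', 'SD' or 'PD'.

    nadir_sum is the smallest sum recorded so far, including the baseline.
    """
    increase = current_sum - nadir_sum
    # PD: any new lesion, or >=20% growth over the nadir that is also >=5 mm in absolute terms.
    if new_lesions or (increase >= 0.2 * nadir_sum and increase >= 5.0):
        return "PD"
    # CR: all target lesions have disappeared (lymph-node rule ignored in this sketch).
    if current_sum == 0.0:
        return "CR"
    # PR: at least a 30% decrease relative to baseline.
    if baseline_sum - current_sum >= 0.3 * baseline_sum:
        return "PR"
    return "SD"


if __name__ == "__main__":
    # Hypothetical patient: three target lesions at baseline, re-measured later.
    baseline = sum_of_diameters([30.0, 25.0, 45.0])
    follow_up = sum_of_diameters([18.0, 15.0, 27.0])
    print(recist_target_response(baseline, nadir_sum=baseline, current_sum=follow_up))
```

Note that PD is measured against the nadir rather than the baseline, which is why the function takes both sums.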
Rapid advances in AI, especially in convolutional neural networks (CNNs), are quickly changing radiology and pathology image analysis. For example, a CNN trained at Stanford on a large number of clinical images of skin lesions was able to distinguish malignant from benign lesions with performance on par with that of board-certified dermatologists. An NYU study also showed that a CNN trained on whole-slide histology images from The Cancer Genome Atlas (TCGA) could not only classify lung cancer subtypes with accuracy comparable to that of pathologists, but also predict several commonly mutated genes in lung cancer directly from the images.
New imaging biomarkers, in the form of predictive CNNs or other machine learning models, will likely gain acceptance in the near future and represent a new category of endpoints in clinical trial design.
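Claims that a model performs "on par with" clinicians rest on standard diagnostic metrics. As a minimal sketch (not a reproduction of the Stanford or NYU evaluation protocols), the code below computes sensitivity and specificity at a chosen decision threshold, and the area under the ROC curve (AUC) via its pairwise-ranking definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names, scores, and labels are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate) at a threshold."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and is_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical model scores for six lesions (True = malignant).
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]
    labels = [True, True, True, False, False, False]
    print(sensitivity_specificity(scores, labels, threshold=0.5), roc_auc(scores, labels))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for illustration; a clinician's single operating point can then be plotted against the model's ROC curve for comparison.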
Challenges and Future Outlook
Amid the often overly optimistic outlook for AI, it is important to be aware of the various challenges. The most fundamental challenge, I believe, is still the lack of data. Paradoxically, while we may be drowning in big data, useful data (high-quality, properly labeled data) can still be a rare commodity.
Take genomic data, for example. While the number of measurements per sample in a whole genome sequencing experiment is huge (over 20,000 genes, or roughly 3 billion base pairs), the number of samples in a typical study is rather small, ranging from a handful to a few hundred. The labels (e.g., response vs. resistance) are not always clear-cut and are often imbalanced, making it difficult for machine learning models to learn the underlying patterns. On top of that, raw genomic sequences alone may or may not be the right features for predicting drug response.
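The small-sample problem can be made concrete with a toy simulation. The sketch below generates pure-noise "features" for a handful of patients and counts how many of them perfectly separate responders from non-responders by chance. With 10 patients split 5/5, a single random feature does so with probability 2/252, so among 20,000 noise features roughly 160 spurious "biomarkers" are expected. It also shows one common mitigation for imbalanced labels, weighting each class by n / (k · n_c), which is the heuristic scikit-learn uses for `class_weight="balanced"`. All sizes, the random seed, and the function names are hypothetical choices for illustration.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Sequence


def perfectly_separates(values: Sequence[float], labels: Sequence[bool]) -> bool:
    """True if all positive samples lie strictly above, or strictly below, all negatives."""
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    return min(pos) > max(neg) or max(pos) < min(neg)


def count_spurious_separators(n_samples: int, n_features: int, seed: int = 0) -> int:
    """Count pure-noise features that perfectly separate a half/half labeling."""
    rng = random.Random(seed)
    labels = [i < n_samples // 2 for i in range(n_samples)]
    count = 0
    for _ in range(n_features):
        values = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if perfectly_separates(values, labels):
            count += 1
    return count


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n / (k * n_c) so rare classes count as much as common ones."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * n_c) for label, n_c in counts.items()}


if __name__ == "__main__":
    # 10 patients, 20,000 noise features: on the order of 160 perfect "hits" expected.
    print(count_spurious_separators(n_samples=10, n_features=20_000))
    # Hypothetical labels: 2 responders, 8 non-responders.
    print(balanced_class_weights(["R"] * 2 + ["NR"] * 8))
```

None of these chance "hits" would replicate in new patients, which is why independent validation cohorts matter as much as the models themselves.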
Therefore, to fully realize the power of AI, it is important for both pharma and academia to invest in research at the intersection of AI and biomedicine. It is also imperative to systematically generate high-quality, properly labeled data. Pharma needs to treat data not only as supporting evidence on the road to drug approval, but also as a valuable asset in its own right, and to implement strategies for systematically collecting properly labeled data, guided by a vision of an AI-powered drug discovery and development process.
The good news is that the industry is already moving in this direction. As with any new technology, there may be multiple hype cycles ahead. We will likely see both quick wins in some areas and incremental improvements in others. But a data-driven, AI-powered drug discovery and development approach will bear fruit and eventually benefit many patients.