December 19, 2018

Proof of Concept: Deep Learning Automatically Classifies 300,000 Clinical Studies.

The clinical trial industry is ripe for the application of good data science. The field generates massive amounts of data on a daily basis, and that data has a tremendous impact on individuals and societies. Anonymized data related to clinical trials is freely available online through government agencies and public-private partnerships such as the Clinical Trials Transformation Initiative (CTTI).

Comprehend’s engineering group is always looking for new ways to use data science to improve the safety and effectiveness of clinical trials. We’re particularly interested in deep learning tools that can make handling large datasets more efficient and pave the way for new analytical discoveries. Deep learning and other artificial intelligence (AI) technologies may bring us to the day when we can determine, and replicate, the conditions required to optimize clinical trials in different therapeutic areas.

To that end, we recently decided to see if we could use deep learning to automatically classify clinical trials across 38 therapeutic categories, such as oncology, neurology, hepatology, and orthopedics.

Using publicly available data from the websites mentioned above, we classified nearly 300,000 clinical studies with an 83% success rate. To map these studies to our 38 categories, we used pre-trained clinical concept embeddings to correlate terminology from the studies themselves to medical terms from the Unified Medical Language System® (UMLS®).

This particular deep learning capability is based on word embeddings used in natural language processing. Word embeddings are numerical representations of text that neural networks use to capture meaning, semantic relationships, and context. For our purposes, the medical concept embeddings we used allowed us to classify hundreds of thousands of studies in a couple of seconds, far less time than manual classification would take.
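To make the idea concrete, here is a toy sketch of embedding-based classification. It is not the code from our experiment (that is in the Medium article referenced below) and it swaps the neural network for a simpler technique: averaging the embeddings of a study's terms and picking the category whose vector is closest by cosine similarity. The three-dimensional vectors, the three category anchors, and the names `CONCEPT_EMBEDDINGS`, `CATEGORY_EMBEDDINGS`, and `classify_study` are all invented for illustration.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real pre-trained
# clinical concept embeddings have far more dimensions.
CONCEPT_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "chemotherapy": [0.8, 0.2, 0.1],
    "seizure": [0.1, 0.9, 0.1],
    "epilepsy": [0.0, 0.8, 0.2],
    "fracture": [0.1, 0.1, 0.9],
}

# Hypothetical category anchors: one vector per therapeutic area.
CATEGORY_EMBEDDINGS = {
    "oncology": [1.0, 0.0, 0.0],
    "neurology": [0.0, 1.0, 0.0],
    "orthopedics": [0.0, 0.0, 1.0],
}


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_study(terms):
    """Return the closest category name, or None if no term has an embedding."""
    known = [CONCEPT_EMBEDDINGS[t] for t in terms if t in CONCEPT_EMBEDDINGS]
    if not known:
        return None
    study_vector = mean_vector(known)
    return max(
        CATEGORY_EMBEDDINGS,
        key=lambda name: cosine_similarity(study_vector, CATEGORY_EMBEDDINGS[name]),
    )
```

Calling `classify_study(["tumor", "chemotherapy"])` averages the two term vectors and compares the result against each category anchor. Terms with no embedding are skipped, so a study described entirely by unknown terms returns `None` rather than a guess.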

If you want all the details of our research, including specific steps and computer code, it’s all outlined in “Use Embeddings to Predict Therapeutic Area of Clinical Studies,” an article I published on Medium.

The Success of Our Experiment Makes Us Optimistic about the Future Use of AI.

Building a clinical study classification algorithm showed our team the potential for using deep learning and similar technologies to gain valuable insights from clinical trial data in a very short amount of time. In addition to basic classification, AI tools can help us answer important questions, such as:

  • Which sites are optimized for specific types of studies?
  • To what extent, and in what ways, does study planning affect outcomes?
  • What qualities in patients may make them more or less likely to experience an adverse event?

Data science is advancing the positive impact that clinical trials can have on humanity. At Comprehend, we use data science every day to help sponsors and CROs improve the performance and lower the risk of their clinical trials. Our clinical intelligence solutions, for example, enable the rapid integration of data from disparate sources—EDC, CTMS, ePRO, IxRS, etc.—into a unified data study model (UDSM) for greater visibility, automated alerting, and standardized, real-time data analytics.

To learn more about how Comprehend can help you harness advanced technology to bring valuable medical treatments to market faster, request a demo or call us at 650-521-5449. For information specifically about our clinical trial classification experiment, be sure to check out my Medium article and feel free to email me directly.
