
Data Science and Machine Learning in the Medical Industry

Nilesh Barla
21st April, 2023

Data science is one of the fastest-growing domains in IT right now. Companies all over the world are trying to adopt and integrate data science and machine learning into their systems.

In this article, we’ll explore how data science and machine learning are used in different areas of the medical industry.

As the old adage goes, prevention is better than cure, and new technologies can help in both. Imagine if a doctor could look at your medical history, run a mathematical formula over it, and predict what disease you could have, and when. 

Or if the doctor could tell which part of your body could get worse, and how quickly, with the kind of lifestyle that you’re living. AI could make this possible. 

Medicine and healthcare is a promising industry for implementing revolutionary data science solutions. Tech is taking medical science to a whole new level, from computerizing medical records to drug discovery and post-treatment patient monitoring. And this is just the beginning.

“To be a good diagnostician, a physician needs to acquire a large set of labels for diseases, each of which binds an idea of the illness and its symptoms, possible antecedents and consequences, and possible interventions to cure or mitigate the illness.” – Daniel Kahneman



Areas of healthcare influenced by data science and machine learning

Data management and privacy

As organizations adopt machine learning to find patterns, extract knowledge from data, and tackle a diverse set of computationally hard tasks, it has become very important to store and manage data consistently and without errors.

When data is stored with the same consistency (the same structural format), data scientists and analysts can use it to find patterns and trends, and mine information which couldn’t be found with manual analysis.

Data management in the healthcare industry is no different. Beyond consistency, it has to provide privacy and security, and machine learning models, if trained properly, can help prevent attacks and theft as well.

For example, machine learning and deep learning models can be trained to detect patterns in incoming data. If the data contains any information that’s not authorized by the doctors or the medical staff, the system can freeze the incoming process, verify it, and make decisions with the help of human supervision.



Source: Karli Carpenter

In this way, we’re not abandoning the human effort, but we’re combining both machine and human intelligence for better decision-making. 

One of the interesting areas of research is using deep learning to identify, tag, and mask PII (Personally Identifiable Information) data. Regular expressions and static rules can be used for this purpose, but using deep learning allows learning of specific formats (even custom PII types) used in an organization. 
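To make the rule-based starting point concrete, here is a minimal sketch of masking PII with regular expressions. The three patterns and the `mask_pii` helper are assumptions chosen for illustration; real PII formats vary by country and organization, which is exactly the gap a learned model is meant to close.

```python
import re

# Hypothetical patterns for illustration only; real formats differ widely.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace every match of a known PII pattern with a [TAG] placeholder."""
    for tag, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


note = "Patient reachable at jane.doe@example.com or 555-123-4567, SSN 123-45-6789."
masked = mask_pii(note)
print(masked)
```

Static rules like these only catch the formats someone thought to write down, so a custom identifier such as an internal patient code would pass through unmasked.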

Convolutional Neural Nets (CNNs) have been successfully used for image recognition, so exploring their usage for PII compliance is another interesting possibility.

Data scientists can also use machine learning and deep learning to establish a baseline, meaning what constitutes “normal” behavior for a system, by monitoring relevant attributes or features. They can then use the baseline to detect anomalies. 

This can prevent theft, and provide a strong reinforcement when data is being retrieved from any system by medical staff, as well as protection from hackers in or outside the medical organization. A model can control whether the data is being retrieved by authorized personnel.
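The baseline idea can be sketched with nothing more than a mean and a standard deviation. This is a simplified statistical stand-in, not a trained deep learning model, and the monitored feature (records retrieved per hour by one account) and the threshold of three standard deviations are assumptions made for the example.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Summarize 'normal' behavior as the mean and standard deviation of past values."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A constant history means any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical feature: patient records retrieved per hour by one staff account.
records_per_hour = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
baseline = fit_baseline(records_per_hour)

print(is_anomalous(14, baseline))   # a typical hour
print(is_anomalous(250, baseline))  # a bulk download
```

In practice, a flagged request would be handed to a human reviewer rather than blocked automatically, in line with the human-supervised approach described earlier.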

Lastly, tech can be used to store information with appropriate consistency and structural format, without any errors. With proper data consistency, data scientists and analysts have an easier time working on problems like disease prevention, diagnostics, predictive analysis, treatment, drug discovery, and so on. 

How machine learning can prevent diseases

Machine learning can give us answers to very difficult questions, like how can we prevent an outbreak of a particular disease? What are additional undiscovered symptoms of that disease? How does it affect life in terms of longevity?

With a huge amount of data in hand, data scientists can use machine learning to find correlations between the patients’ various attributes and features and the labeled disease. These correlations can help doctors understand the underlying patterns of disease, and come up with prevention plans.
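As a small illustration of what finding correlations can look like, the sketch below computes the Pearson correlation between each patient attribute and a binary disease label. The attribute names and the six toy patients are invented for the example and carry no clinical meaning.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one list entry per patient.
patients = {
    "age": [34, 51, 47, 62, 29, 58],
    "bmi": [22.1, 30.4, 27.8, 33.0, 21.5, 31.2],
    "daily_steps": [9000, 3000, 5000, 2500, 11000, 3500],
}
has_disease = [0, 1, 0, 1, 0, 1]  # 1 = labeled with the disease

correlations = {name: pearson(values, has_disease) for name, values in patients.items()}
for name, r in correlations.items():
    print(f"{name}: {r:+.2f}")
```

A strong correlation only points to where a doctor might look; it does not show that the attribute causes the disease.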

For example, social media messages can be used to find latent infectious diseases. Social media messages with the user and temporal information are extracted during the data preprocessing stage. 

Next, an unsupervised sentiment analysis model is applied. During training, the model can extract key information and correlations about symptoms, body parts, duration of pain, activity prior to the pain, and pain locations from social media data.

Finally, information related to latent infectious diseases is retrieved from individuals’ symptom weighting vectors.

Once the machine learning model extracts this information, doctors can recommend a proper medical routine that not only reduces the patient’s expenses but also reduces the chances of developing something more serious, like a tumor.

Not treating a symptom or a pain in the early stages can be very risky not only to the patient’s health but also considering the economic costs. As the disease grows, the cost of treating it also increases. In this sense, data science can play a huge role in optimizing healthcare spending.

How machine learning can help with diagnostics

Medical diagnostics are a very challenging part of healthcare. Stats show that if patients are treated in a shorter time, they have a bigger chance of retaining their health. But treatment depends upon the diagnostics report, and analyzing hundreds of reports can be a tedious task for doctors. 


Recently, hospitals have started using machine learning to speed up diagnosis and analysis. Doctors can now get diagnostic support and reach the treatment phase as quickly as possible.

When machine learning is used to analyze diagnostic reports or CT scans, anomalies and even malignant cells can be found with a level of accuracy that, on some tasks, rivals or exceeds that of human experts.

Reports and suggestions created by machine learning models are reviewed by expert doctors, and support them in making the right decisions for their patients. 

This way, doctors can identify and prioritize patients with the most serious health conditions, and increase their success rates in preventing avoidable diseases. 

How machine learning can help with treatment

With more data on individual patient characteristics and their symptoms, it’s possible to deliver precise prescriptions and highly personalized care. 

Machine learning systems are proving to be a great tool for both doctors and nurses. With precise prescriptions, there is less chance of a patient having side-effects from medicine, because treatments are specifically designed for the patient’s needs. 

The International Genome Sample Resource is a huge database of human variation and genotype data associated with common diseases like coronary heart disease and diabetes. Data like this makes it possible to increase the quality of personalized healthcare.

Data scientists are also helping to merge other fields of research and study with medical and health sciences, to offer better ways of treating patients. 

For instance, when going through an athlete’s medical records, data scientists can discover patterns and correlations that are specific to that patient’s requirements, such as the load-bearing capacity of the athlete’s joints being far higher than that of an average person.

Data scientists can use these patterns and anomalies to find an appropriate solution from the branch of material science. One example of such a case would be the discovery that metallic materials, such as Ti and its alloys, have high mechanical strength and fracture toughness, so they can be clinically applied to repair hard tissues (bones and joints) under high-load conditions. 

With findings like these, data scientists can provide doctors with new ideas for treatments.

How to use machine learning in predictive analysis

This is one of the most popular topics in health analytics. You can feed a machine learning model historical data, and train it to find patterns and generate predictions from them. It finds correlations and associations among symptoms, habits, and diseases, and makes meaningful predictions.

Doctors can use this to predict various diseases from the patient’s lifestyle, eating habits, and activities.
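A minimal sketch of this train-then-predict loop is shown below, using a tiny logistic regression written in plain Python. The two features (a normalized exercise score and a smoker flag), the six historical records, and the learning rate are all assumptions made for illustration; a real system would use a vetted library and far more data.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict_risk(x, weights, bias):
    """Estimated probability of the positive class for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict_risk(x, weights, bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical historical data: [exercise score in 0..1, smoker flag].
rows = [[0.9, 0.0], [0.8, 0.0], [0.6, 0.0], [0.2, 1.0], [0.1, 1.0], [0.3, 1.0]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = developed the condition

weights, bias = train_logistic(rows, labels)
print(predict_risk([0.85, 0.0], weights, bias))  # active non-smoker
print(predict_risk([0.15, 1.0], weights, bias))  # inactive smoker
```

The output is a probability rather than a verdict, which makes it easier for a doctor to weigh it against other evidence.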

Predictive analytics also plays an important role in increasing the efficiency of supply chains and pharmaceutical logistics. Models can predict the local demand for pharmaceuticals in a particular hospital, to avoid shortages of medicine in case of emergencies.
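The simplest possible version of such a demand forecast is a moving average over recent usage, sketched below. This is a plain statistical baseline rather than a machine learning model, and the weekly usage figures, the window size, and the stock level are invented for the example.

```python
def moving_average_forecast(usage, window=3):
    """Forecast next period's demand as the mean of the last `window` periods."""
    if len(usage) < window:
        raise ValueError("not enough history for the chosen window")
    recent = usage[-window:]
    return sum(recent) / window


# Hypothetical doses of one medicine dispensed per week at a single hospital.
weekly_usage = [120, 135, 128, 150, 162, 171]
forecast = moving_average_forecast(weekly_usage)

stock_on_hand = 140
needs_reorder = stock_on_hand < forecast
print(forecast, needs_reorder)
```

A learned model would replace the moving average with something that also accounts for seasonality or local outbreaks, but the reorder decision around it would look much the same.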

Doctors can predict the deterioration of patient health, and provide preventive measures, like starting an early treatment that will reduce the risk of patients getting worse.

Big companies like Apple or Nike use predictive analytics in their wearables, to measure heart rate, breathing rate, sleep cycles, and much more. 

Such measurements give people an idea of what’s happening in their bodies, and software can predict whether their training was optimal, and how many calories and how much rest they need.

Drug discovery

According to research and statistics, it costs up to $2.6 billion and takes 12 years to bring a drug to market. Since big data and machine learning came along, scientists have been able to simulate the reaction of a drug with body proteins and different types of cells and conditions much faster. This leads to shorter drug development times, and a greater likelihood of gaining Food and Drug Administration approval.

Scientists are using this technology to develop vaccines to solve the ongoing global health crisis caused by the coronavirus.

In a recent paper titled “Computational predictions of protein structures associated with COVID-19”, the Google DeepMind team used an AI system called AlphaFold to predict several under-studied proteins associated with SARS-CoV-2. 

The team showed that these structures provide an important resource for understanding how the coronavirus functions. This is just one example of how data scientists can use machine learning to predict various protein structures, and use it to understand and develop vaccines. 

Source: Deepmind

Another example comes from GlaxoSmithKline. They used machine learning to conduct clinical trials to speed up the drug discovery process. 

Startups are raising significant amounts of investments to speed up the drug discovery and testing process. BenevolentAI is a unicorn based in London that has raised $115 million to start over 20 drug programs, and create a “bioscience machine brain, purpose-built to discover new medicines and cures for disease.” 

Its first clinical trial this year in Europe and the US will address excessive daytime sleepiness in Parkinson’s disease.

Post-treatment monitoring

After any type of surgery or treatment, there’s a risk of complications and recurring pain, sometimes even side effects. This can be difficult to manage once the patient leaves the hospital. 

Remote in-home monitoring helps doctors stay in touch with patients in real-time. It’s very cost-efficient for patients, and it frees up space in hospitals. 

Predictive software built by Intel and Cloudera helps hospitals predict the chances that a patient will be readmitted within the next 30 days, based on EMR data and the socioeconomic status of the hospital’s location.

SeamlessMD’s multimodal platform for post-operative care enabled the Saint Peter’s Healthcare System in New Jersey to reduce the average length of post-surgery stay by one day, saving an average of $1,500 per patient. 

All that patients have to do is answer a few questions in the app, and the machine learning software can guide them through appropriate responses. 

Source: SeamlessMD

Virtual assistance

With the help of disease predictive modeling, and advancements in natural language processing (NLP), data scientists can build a comprehensive virtual platform that provides assistance to patients. 

With the help of these platforms and machine learning algorithms, patients can input their symptoms and get insights about various possible diseases, and a comprehensive summary of the treatment that they need to follow.

Patients who suffer from psychological problems like depression or anxiety, and neurodegenerative diseases like Alzheimer’s or Parkinson’s, can make use of virtual applications to help them in their daily tasks. 

A popular example of a virtual assistant is Ada – a startup based in Berlin that predicts diseases based on the user’s symptoms. There’s also Woebot – a chatbot developed at Stanford University that provides therapy treatments to patients who suffer from depression.


Since the outbreak of COVID, many startups have been building apps that virtually assist you in finding balance in your everyday life. Balance is one such app; it helps you meditate to reduce anxiety and increase your focus.

Zenia is another app that provides guided yoga and fitness. The app tracks your position and pose through the webcam, and draws predicted line segments over your body to show how correct your pose is.

Source: Zenia

Recommendation systems

Recommendation systems can be very useful. They don’t replace human intervention, but rather enhance it, and provide humans with a tool to make more informed decisions.

Radiologists can use machine learning to expose malignant cells through segmentation, and support their decisions with the algorithm’s suggestions.

Likewise, a machine learning algorithm can use NLP to recommend prescriptions, treatments, healthcare routines, and diets that are most appropriate for the patient. Doctors and nurses can review the algorithm’s suggestions, and make better decisions. 

Recommendation systems can also be brought to mobile devices through a chatbot. 

Patients can input their concerns and answer questions asked by a chatbot. An algorithm can evaluate the answers based on the history of the patient’s data, medical data, prior answers and so on, and recommend a set of personalized solutions. 

Tools, libraries, and frameworks that are being used in healthcare and medical research


  • Variational Autoencoder
    • Commonly used in representation learning, variational autoencoders (VAEs) are unsupervised neural networks that learn a distribution over the feature space. VAEs are used to find important features or representations, meaning patterns that recur in the data. They are also used for outlier detection, clustering, and data reconstruction.
    • VAEs are also used for dimensionality reduction, which transforms higher-dimensional data into lower-dimensional data. With dimensionality reduction, it becomes easier to understand and interpret data.
  • Segmentation
    • Segmentation algorithms highlight the regions of interest in an image, separating them from the background. They are very useful for detecting and highlighting malignant cells in CT scans.
  • Clustering
    • When working with huge amounts of data, it is often very difficult to label, understand, and interpret the data, so we try to group data points by similarity, i.e. so that points in the same group follow a similar probability distribution. Various techniques support this; t-SNE and VAEs, for example, are commonly used to embed the data in a low-dimensional space where clusters become easier to see.
  • Image Classification
    • Image classification is a supervised learning task in which a model recognizes patterns in an image and assigns the image to one of a given set of groups or classes.
  • Natural Language Processing
    • Natural Language Processing, or NLP, analyzes text data and returns an output that depends on the assigned task. NLP is predominantly used for sentence completion, next-word or next-sentence prediction, text summarization, and text translation.
    • In the medical industry, it is very helpful in generating prescriptions, recommending medicines based upon the diagnosis, auto-generating reports using image to text generation and so on.
  • Text Summarization
    • As mentioned earlier, text summarization is an NLP task. As the name suggests, it takes text data and summarizes it to a given length. It is very useful for condensing patient reports when a patient is transferred from one doctor to another, to a different hospital, or to a different treatment altogether.
  • Text Translation
    • Specialist doctors may have been trained in another language, or a patient, for example one from an indigenous background, may not be able to understand English. In such cases, text translation can help by translating the report from one language to another.
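
To give one of the techniques above a concrete shape, here is a minimal sketch of extractive text summarization: it scores each sentence by how frequent its words are in the whole report and keeps the top-scoring sentences in their original order. This word-frequency heuristic is a simple stand-in for the neural summarizers used in practice, and the stopword list and the sample report are assumptions made for the example.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "for", "on", "with", "by"}


def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the `max_sentences` highest-scoring sentences, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


report = (
    "The patient was admitted with chest pain. "
    "Blood tests showed elevated troponin. "
    "The patient received aspirin and was monitored overnight. "
    "Chest pain resolved by morning. "
    "Follow-up is scheduled in two weeks."
)
summary = summarize(report, max_sentences=2)
print(summary)
```

Because the summary only reuses sentences from the report, it cannot introduce facts that were not in the original, which is a useful property for clinical handovers.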



Python libraries for the medical and healthcare industry

  • TensorFlow
    • TensorFlow (commonly imported as tf) is developed by Google. It comes with good educational resources from the Google machine learning community, and it is widely used to build both simple and complex neural networks.
  • TensorFlow Probability
    • TensorFlow Probability is a library built on TensorFlow. It is used to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It is mostly used by data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions.
  • PyTorch
    • PyTorch is a Python deep learning framework with wide support for machine learning algorithms. It grew out of Torch, a Lua-based scientific computing framework that used the scripting language LuaJIT and an underlying C implementation; PyTorch itself exposes a Python interface on top of a C++ backend.
  • Caffe
    • Like TensorFlow, Caffe is a framework for building expressive deep learning projects. It is written in C++ and provides a Python interface.
  • Open3D
    • Open3D is a Python library for processing 3D data. It is very useful for visualizing and analyzing CT scans.
  • Matplotlib and Seaborn
  • OpenCV
    • OpenCV is used for image processing and transformation such as cropping, resizing, color transformations, reshaping, and so on.
  • NumPy
    • NumPy is a scientific Python library used for numerical computing, especially linear algebra and matrix operations. The Python interfaces of image processing libraries such as OpenCV and Open3D work with NumPy arrays.
  • Sonnet
    • Sonnet is a deep learning library built on top of TensorFlow by DeepMind. It provides features specifically designed around their research requirements, and it has been used quite extensively by them and the research community.

Frameworks used in the medical and healthcare industries

  1. PyTorch by Facebook
    1. PyTorch works alongside essential libraries such as PIL and NumPy, and provides what is needed to build flexible and powerful deep learning models. It is backed by active community support in the PyTorch forum.
    2. The framework’s ecosystem offers many extensions, such as image processing utilities, along with the mathematical operations needed to build a deep neural network.
    3. It is also worth noting that because it supports CUDA, it can run computations directly on NVIDIA GPUs if available.
  2. Keras
    1. Keras is built on top of TensorFlow and offers beginner-friendly tutorials and a large, active community. Like PyTorch, it can use GPUs, and also TPUs when working in Google Colab.
  3. Acme by DeepMind
    1. Acme is a reinforcement learning framework used extensively within DeepMind. Although it is not widely used publicly, it is worth noting as a customizable framework with agents implemented on top of TensorFlow and JAX.

Companies using data science and machine learning in everyday applications

Many companies use data science and machine learning in their research and products. Here’s a list of 10 companies that use machine learning in their medical research and products:

  • Google/DeepMind
    • DeepMind is a research company that, on 30 Nov 2020, announced a scientific breakthrough in determining the 3D shapes of proteins with its algorithm AlphaFold, a result that stands to transform biology.
  • IBM Watson Health
    • IBM Watson Health aims to provide healthcare workers with tools and services designed to help them derive more insights from their data and simplify their operations.
    • They use AI-driven solutions to quickly sift through data and make quick and appropriate decisions.
  • Zebra Medical Vision
    • The demand for medical imaging services is continuously increasing, outpacing the supply of qualified radiologists and stretching them to produce more output, without compromising patient care. Zebra-Med aims to provide radiologists with tools that can help them understand CT scans much more efficiently.
  • KenSci
    • KenSci offers an AI platform for digital health and healthcare solutions that provides data management systems for AI and BI applications. The company claims to offer a rich set of AI development features for experimental model development, as well as for testing and production management.
  • PathAI
    • PathAI is developing technology that assists pathologists in making rapid and accurate diagnoses for every patient, every time. Along with that they also analyze and predict benefits from novel therapies for a patient, to make scalable personalized medicine a reality. 
    • PathAI is funded by the Gates Foundation.
  • Project InnerEye 
    • As stated in their website “The goal of Project InnerEye is to democratize AI for medical image analysis and empower researchers, hospitals, life science organizations, and healthcare providers to build medical imaging AI models using Microsoft Azure”.
    • Project InnerEye is a Microsoft project that uses AI to analyze medical images. At one of Microsoft’s conferences, CEO Satya Nadella said, “…We are pursuing AI so that we can empower every person and every institution that people build with tools of AI so that they can go on to solve the most pressing problems of our society and our economy. That’s the pursuit.”
  • Insitro
    • Insitro provides:
      • Better Predictions for Drug Discovery and Development
      • Integrating Machine Learning and Biology at Scale
      • Collaboration Across Disciplines, such as biologists, engineers, and scientists.
      • Reimagining Drug Discovery and Development
  • ConcertAI
    • ConcertAI describes its mission as “to accelerate insights and outcomes for patients through leading real-world data, AI technologies, and scientific expertise in partnership with the leading biomedical innovators, healthcare providers, and medical societies.”
  • Orderly
    • Orderly uses AI to correct data, making it reliable for medical use. This makes the diagnosis and treatment process smoother.
  • Enlitic
    • Enlitic is a company that uses data to advance medical diagnostics. By pairing radiologists with data scientists and engineers, they collect and analyze the world’s most comprehensive clinical data, pioneering medical software that enables doctors to diagnose sooner with renowned accuracy. 


Data science and machine learning are advancing medicine into a new realm, and it’s exciting to think about where they can go. In the coming years, it will be very common to have embedded machine learning systems that analyze patient status in real time, what’s going on with similar patients in multiple healthcare systems, which applicable clinical trials are underway, and the efficacy and cost of new treatment options.

Opportunities and treatments that were merely an idea a few years ago are now becoming a reality. We’re living in the age of machine learning, where algorithms can support us in preventing and treating diseases by analyzing our data and helping doctors make better decisions.

Machine learning is equipping more and more doctors, nurses, and healthcare workers with many new tools, enabling them to take better care of their patients. As Dr. Francis Peabody said years ago, “the secret of the care of the patient is in caring for the patient”. While machines take over more tasks, humans can do what they do best – care and provide help.