The global medical diagnostics industry is growing at a brisk pace and is expected to reach US $38.68 billion by 2025 at a CAGR (compound annual growth rate) of 50.2 per cent. The Covid-19 crisis exposed the weaknesses of our healthcare systems and highlighted the need for technology that reduces the time taken to diagnose a disease.
Using artificial intelligence (AI) in medical diagnosis can reduce detection time and error rate, use predictive techniques to auto-diagnose, better predict a patient's future health, and provide apt treatment recommendations.
With AI algorithms, we can predict the epidemiological trend and diagnose disease severity in a better way. Even though we have a huge influx of data from Electronic Health Record (EHR), patient history, hospitalisation records, and wearable devices, we do not have enough technological capability to determine patterns and make predictions accurately.
To perform reliable medical diagnostics, healthcare organisations depend on high-quality data. Since data sources like EHRs, patient history, hospitalisation records, and wearable devices are very diverse, preserving data quality is challenging. EHR data includes physiological data, laboratory test data, allergy information, patient insurance data, and past medical history.
The data collected from patients and healthcare records is seldom complete and clean, so it requires cleansing. The cleaning step involves correcting errors, handling missing values, and resolving inconsistencies. Missing values can occur because a patient's condition is unrelated to a particular healthcare variable or because a sensor did not record the data. Mishandling missing values can lead to biased results.
To handle missing values, we can either impute or drop them. Before discarding records with missing values, we must ensure that analysis of the remaining values does not produce inference bias; for this reason, discarding is rarely the best option. Model-based estimation can be more effective, for example multiple imputation, which draws missing values from a Bayesian posterior distribution, or K-nearest neighbour imputation.
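As a minimal sketch of K-nearest neighbour imputation, the snippet below fills a missing value with the mean of that variable among the k most similar patients, where similarity is measured only on the variables both records share. The function name `knn_impute`, the toy records, and the choice of k are illustrative assumptions rather than any particular library's API.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column among the k nearest rows.

    Distance is the root mean squared difference over columns observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour is only useful if it has the value we are missing.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical records: (age, systolic BP, cholesterol); None marks a missing value.
patients = [
    (45, 120, 190),
    (47, 125, None),
    (46, 122, 200),
    (70, 160, 260),
]
imputed = knn_impute(patients, k=2)
```

In this toy data the second patient's cholesterol is filled from the two patients of similar age and blood pressure, not from the much older fourth patient. In practice the variables would usually be scaled first so that no single unit dominates the distance.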
Linear regression can estimate missing values when a variable is continuous, while logistic regression is helpful when the variable is binary. Feature scaling brings features that vary in magnitude, range, and unit onto a comparable scale. Data scaling can be done using min-max or z-score normalisation.
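The two scaling methods can be sketched in a few lines. Min-max scaling maps values to the range 0 to 1, while z-score scaling centres them on zero with unit standard deviation. The function names and the cholesterol readings are hypothetical, and the population standard deviation is used here as one possible convention.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

cholesterol = [180, 200, 220, 240]  # hypothetical readings in mg/dL
scaled = min_max_scale(cholesterol)
standardised = z_score_scale(cholesterol)
```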
As unstructured healthcare data is generated rapidly from wearable devices, clinical reports, and doctor's prescriptions, the prediction may not be useful due to irrelevant factors. Selecting the best features using feature engineering can improve accuracy and help identify the most significant risk factors.
Principal component analysis (PCA) can help in generating new uncorrelated predictors. For example, using PCA, we can explore the major factors that increase heart disease risk. We can use the random forest algorithm to select the best subset of features.
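To illustrate how PCA finds a new uncorrelated predictor, the sketch below computes only the first principal component, the direction of greatest variance, using power iteration on the covariance matrix. This is one simple way to obtain it, not the method of any specific library, and the three standardised features in the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the unit-length direction of greatest variance via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        # Repeated multiplication by the covariance matrix pulls the vector
        # towards the eigenvector with the largest eigenvalue.
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical standardised features: (blood pressure, cholesterol, unrelated measure).
records = [
    (-2.0, -2.0, 0.1),
    (-1.0, -1.0, -0.1),
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 0.1),
    (2.0, 2.0, -0.1),
]
pc1 = first_principal_component(records)
```

Here the first two features move together, so the first component weights them almost equally and gives the third feature a weight close to zero. Reading the weights this way is how PCA can point to the groups of factors that vary together.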
Using machine learning (ML) algorithms in medical diagnosis offers several advantages, like early diagnosis of disease, lower treatment costs, and potentially saving human lives. Initially, ML algorithms were designed to analyse massive medical data sets. Over the years, it has become possible to apply powerful ML algorithms like random forest and deep learning to more specialised and complicated medical diagnosis problems like heart disease prediction and cancer image classification.
A fuzzy logic system can be used for heart disease diagnosis. The system takes input variables like heart rate, cholesterol, blood sugar, blood pressure (BP), gender, and age, and produces output referring to disease condition in a numerical range where increasing values show increasing heart disease risk.
The fuzzifier converts each observed input into a fuzzy value. The fuzzy values are processed by an inference engine using a set of rules. The rule base in a fuzzy system contains rules that combine attributes with AND/OR operators. For example, if BP, cholesterol, and heart rate are low, then the result is “Healthy.” Finally, defuzzification converts the output of the inference engine into a crisp value.
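The three stages described above can be sketched with two inputs and two rules. The membership breakpoints, the rules, and the weighted-average defuzzification below are illustrative assumptions chosen for brevity, not clinical thresholds or the design of any published system.

```python
def ramp_down(x, lo, hi):
    """Membership in a 'low' set: 1 at or below lo, falling linearly to 0 at hi."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def ramp_up(x, lo, hi):
    """Membership in a 'high' set: the complement of ramp_down."""
    return 1.0 - ramp_down(x, lo, hi)

def heart_risk(bp, cholesterol):
    """Return a risk score from 0 (healthy) to 1 (high risk)."""
    # Fuzzification: turn crisp readings into degrees of membership.
    bp_low, bp_high = ramp_down(bp, 110, 150), ramp_up(bp, 110, 150)
    ch_low, ch_high = ramp_down(cholesterol, 180, 260), ramp_up(cholesterol, 180, 260)
    # Inference: AND is taken as min, OR as max.
    healthy = min(bp_low, ch_low)    # IF BP is low AND cholesterol is low THEN healthy
    at_risk = max(bp_high, ch_high)  # IF BP is high OR cholesterol is high THEN at risk
    # Defuzzification: weighted average of the rule outputs (0 = healthy, 1 = at risk).
    return (healthy * 0.0 + at_risk * 1.0) / (healthy + at_risk)

low_case = heart_risk(100, 170)
mid_case = heart_risk(130, 220)
high_case = heart_risk(160, 280)
```

Increasing output values indicate increasing risk, and a reading that is halfway between "low" and "high" on both inputs yields a score in the middle of the range.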
Once we have developed and trained our model, we evaluate it on completely new data to assess how well it generalises. Certain performance criteria determine the successful uptake of an algorithm in medical diagnosis. The most important measures used in clinical tests for the performance of medical diagnostics are sensitivity and specificity.
The predictive value of a test is defined as the probability of having the disease, given the test results. Positive predictive value, or diagnostic precision, is the number of cases that actually have the disease divided by the number of cases the classifier labels as diseased. Negative predictive value is the probability that a person is healthy when the classifier labels them as negative (healthy).
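These four measures all follow from the counts in a confusion matrix. The sketch below computes them for a hypothetical screening result; the counts are made up for illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased cases correctly flagged
        "specificity": tn / (tn + fp),  # healthy cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly diseased
        "npv": tn / (tn + fn),          # cleared cases that are truly healthy
    }

# Hypothetical counts: 100 diseased and 900 healthy people screened.
metrics = diagnostic_metrics(tp=80, fp=30, tn=870, fn=20)
```

With these counts, sensitivity is 0.80 and specificity is about 0.97, yet the positive predictive value is only about 0.73, because healthy people far outnumber diseased ones in the sample.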
The trade-off between specificity (correctly identifying healthy people) and sensitivity (correctly identifying people who have the disease) is important to consider in binary classification. We define a threshold on the model's output score to assign each input to class 0 or 1. Estimating the optimal threshold based on costs delivers better results. For example, if the intervention is safe and cheap and the cost of missing a disease is high, the optimal threshold lies towards the top right of the ROC curve, where sensitivity is high and we accept a larger number of false positives.
However, if the intervention carries high risk and its effectiveness is not convincing, the optimal threshold lies towards the bottom left corner of the ROC curve, where we minimise harm to non-diseased people but accept that some diseased persons will be missed.
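A cost-based threshold can be sketched by trying each candidate threshold and keeping the one with the lowest total cost of false negatives and false positives. The function name, the scores, the labels, and the cost values below are hypothetical.

```python
def best_threshold(scores, labels, cost_fn, cost_fp):
    """Return the score threshold with the lowest total misclassification cost.

    A case is predicted positive when its score is at or above the threshold.
    """
    best = None
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best[0]

# Hypothetical model scores and true labels (1 = diseased).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Missing a disease is ten times as costly as a false alarm.
cautious = best_threshold(scores, labels, cost_fn=10, cost_fp=1)
# A false alarm (risky intervention) is ten times as costly as a miss.
conservative = best_threshold(scores, labels, cost_fn=1, cost_fp=10)
```

When missed diagnoses are expensive, the chosen threshold is lower, so more cases are flagged and sensitivity rises; when false alarms are expensive, the threshold moves up and specificity rises.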
When we have to detect critical patients in healthcare settings, it is important to prioritise sensitivity over specificity. In the case of a Covid-19-positive person, a false-negative result can wrongly suggest that the person is safe, and they may then unknowingly spread the disease to others.
The improvement in personalised care to patients and healthcare efficiency comes with ethical concerns. Since the reliability of the output is determined by the quality of data input, error-prone data can lead to misinformed medical decision-making, which can ultimately impact the health or cost the life of a patient.
Another concern is that flawed data might not represent minority groups properly. This can put patients at risk of overdiagnosis or underdiagnosis. Features like age, disability, and skin colour can also serve as a basis for algorithmic bias. For example, AI-based software that recommends skin cancer treatment to clinicians might be trained mainly on white-skinned patients. Such software is likely to give inaccurate recommendations when applied to patients from groups that were under-represented in training, say, African Americans.
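One simple check for this kind of bias is to compare a metric such as sensitivity across patient groups. The sketch below does this for hypothetical records; the group labels and predictions are made up, and a real audit would look at several metrics and far larger samples.

```python
from collections import defaultdict

def sensitivity_by_group(records):
    """records: iterable of (group, true_label, predicted_label), with 1 = diseased."""
    tp = defaultdict(int)
    fn = defaultdict(int)
    for group, truth, pred in records:
        if truth == 1:
            if pred == 1:
                tp[group] += 1
            else:
                fn[group] += 1
    groups = set(tp) | set(fn)
    return {g: tp[g] / (tp[g] + fn[g]) for g in groups}

# Hypothetical audit data: group A is well represented in training, group B is not.
audit = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0),
]
per_group = sensitivity_by_group(audit)
```

A large gap between groups, as in this toy data, signals that the model misses disease more often in one group and that the training data or the model needs review.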
Safety and transparency are other crucial concerns in AI. AI developers must be transparent about the kind of data used to train the model and about any shortcomings of the model. For example, IBM's Watson for Oncology provided incorrect cancer treatment recommendations as the developers had trained the software on synthetic cancer cases. IBM kept Watson's incorrect and unsafe recommendations secret for over a year. A lack of transparency can damage the relationship between patient and healthcare provider. Healthcare providers must also disclose accurately any collaboration with third parties that involves sharing patient data.
Recent advancements in AI in healthcare have significantly contributed to medical diagnostics. As AI develops rapidly and the focus remains on optimising the performance of complex models, there is a growing need for explainability. Explainable diagnosis will help translate the potential of AI-based diagnostics into clinical practice and form a basis for trustworthy and reliable communication between AI model experts and medical experts.
AI-based systems will have a positive impact on several diagnostics fields, like dermatology (lesion interpretation), pathology (microscopic diagnoses), radiology (mammography or MRI evaluation), and ophthalmology (examination of a retinal artery for diabetes diagnosis).
Several startups are venturing into the field of medical diagnostics using AI. Retina Pharmaceuticals has designed a device for diagnosing glaucoma with accuracy and ease. Tricog, a Bengaluru-based startup, enables cardiologists to access blood flow rates and diagnose heart diseases. OncoStem, another Indian startup, is developing novel ways to detect breast cancer from a patient's tumours and compute the probability of recurrence.
Artelus, one of the top 10 Indian AI healthcare firms, is helping doctors with disease diagnosis by developing products that analyse chest X-rays for tuberculosis (TB) and pneumonia detection. Another product in its pipeline, currently in the data-gathering phase, is intended to facilitate the early detection of Parkinson's and Alzheimer's.
Can we infer that AI will completely replace humans in the medical diagnostics domain? Our findings suggest that this might not occur in the near future. AI has reduced the cost of prediction, but we still need audits and checks for biases. We need experts to explain how a model arrives at a particular decision. We can state that with AI we are interweaving human knowledge with algorithmic intelligence.
This article has been published as part of the Swarajya Science and Technology Initiative 2022.