Medication adherence classification for non-communicable disease patients through machine learning approaches.
Date
2024
Abstract
Medication non-adherence is a significant public health issue that results in poor treatment outcomes, unnecessary hospitalisations, increased healthcare expenses, and increased risks of morbidity, disability, and mortality. According to the World Health Organisation, only 50% of people with chronic diseases adhere to treatment recommendations, despite receiving guidance on the need for medication adherence (MA). Despite considerable efforts to tackle this problem, traditional methods have shown low efficacy in predicting and intervening in non-adherent behaviours. In recent years, Fourth Industrial Revolution (4IR) technologies such as artificial intelligence (AI) and machine learning (ML) have offered promising solutions to MA problems. This study developed seven ML models (SVM, KNN, DT, Naïve Bayes, DNN, LR, and RF) to classify MA in a dataset of diabetes and hypertension patients in Zimbabwe, a developing country. The thesis comprises five studies, presented in chronological order: (1) an expansive systematic literature review on ML approaches to MA analytics among non-communicable disease (NCD) patients; (2) feature selection and the importance of predictors of NCD MA from ML research perspectives; (3) the generation of a novel state-of-the-art dataset, entitled "Data Wrangling and Generation for ML Models in MA Analytics: A Practical Standpoint Using Patient-Level and Medical Claims Data"; (4) development and validation of seven ML models; and (5) development of twinned machine learning algorithms for MA analytics in patients with diabetes and hypertension. The relevant data for diabetic and hypertensive patients were filtered from the initial raw dataset based on International Classification of Diseases, 10th Revision (ICD-10) codes. 
Since adherence metrics measure the percentage of patients covered by medication refill claims for the same medication in the same therapeutic class, the created dataset included only medication refill data and patient-level data for the period from January 1 to December 31, 2022. Non-adherence was defined as medication refills that accounted for less than 75% of the projected 12-month claims, whereas adherence required at least 75% of the refills. In developing the novel dataset, the study acknowledges that there is no universally accepted compendium of data wrangling (DW) steps. However, it treats certain building blocks and functionalities with significant overlaps as typical DW elements. Accordingly, the study takes a pragmatic approach to dataset generation, built on nine essential DW building blocks and tasks that are applied iteratively and incrementally, comparable to an agile approach. An ML model was then developed using the generated dataset to evaluate the novel dataset's soundness, and its 81% accuracy
demonstrated a generally good level of performance. During the ML model development process, the study used an RF feature importance mechanism to identify the important variables needed for building MA prediction models. The study also produced Shapley Additive Explanations (SHAP) beeswarm plots to aid in the interpretation of the model outputs and evaluate the contribution of the features to the prediction process. The important features included the annual quantity of medical supplies, the annual claim amount, the patient's age, their membership in a wellness programme, the medical aid cover, their contribution to the cover, comorbidity, diagnosis, the type of hospital cover, the occurrence of complications, gender, and the type of medical aid scheme. The total number of medical supplies dispensed annually emerged as the most important predictor of MA. The ML classifiers had a classification accuracy of 84.7% to 87.6%, with AUC values ranging from 0.857 to 0.934. The most robust classifier was the RF, an ensemble learning technique with 87.6% accuracy, an AUC of 0.9351, and superior precision, recall, and F1-score. The hybridisation of classical SVM with tree-based ML algorithms (RF and DT) demonstrated improved results, with the newly developed TWINNED-SVM-RF and TWINNED-SVM-DT classifiers outperforming the standard SVM. Of the two, the TWINNED-SVM-RF model performed better. Additionally, the application of LIME (Local Interpretable Model-Agnostic Explanations) enhanced the explainability of the models, providing valuable insights into their decision-making processes. The ensemble learner (RF) and the TWINNED-SVM-RF show promise as prognostic tools for improving MA and determining patient adherence levels. These findings contribute to reducing discrepancies in medication refills and adherence rates among NCD patients. 
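The evaluation measures reported above (accuracy, precision, recall, F1-score, and AUC) can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, with 1 standing for adherent and 0 for non-adherent.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 (adherent) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the thesis dataset).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas AUC is computed from the ranking of the scores alone, which is why the two kinds of measure are usually reported together.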
The ML models serve as a foundation for building intelligent MA monitoring and intervention apps that promote adherence among chronic disease patients. The outcomes, techniques, and insights from the novel dataset generation can guide future researchers, data scientists, and analysts in conducting data mining, dataset enrichment, and DW. The generated dataset can also support further research, such as evaluating MA and performing ML tasks including classification, prediction, and clustering of MA behaviours.
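The cohort filtering and the 75% adherence rule described in the abstract can be sketched as follows. This is a hedged illustration, not the thesis's pipeline: the record layout (`PatientYear`), the assumption of 12 expected refills per year, and the ICD-10 prefixes chosen (E10 to E14 for diabetes mellitus, I10 to I15 for hypertensive diseases) are assumptions, since the abstract does not list the exact codes or fields used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

ADHERENCE_THRESHOLD = 0.75  # from the abstract: at least 75% of projected refills
EXPECTED_REFILLS = 12       # assumption: one projected refill claim per month

# Assumed ICD-10 categories; the abstract does not state the exact codes used.
DIABETES_PREFIXES = ("E10", "E11", "E12", "E13", "E14")
HYPERTENSION_PREFIXES = ("I10", "I11", "I12", "I13", "I15")


@dataclass(frozen=True)
class PatientYear:
    """Hypothetical one-year summary of a patient's claims."""
    patient_id: str
    icd10_code: str
    refills_claimed: int


def is_target_condition(code: str) -> bool:
    """True if the ICD-10 code falls in the assumed diabetes or hypertension categories."""
    normalised = code.strip().upper()
    return normalised.startswith(DIABETES_PREFIXES + HYPERTENSION_PREFIXES)


def label_adherence(refills_claimed: int, expected: int = EXPECTED_REFILLS) -> int:
    """Return 1 (adherent) if claimed refills cover at least 75% of the
    projected refills, otherwise 0 (non-adherent)."""
    if expected <= 0:
        raise ValueError("expected refills must be positive")
    return int(refills_claimed / expected >= ADHERENCE_THRESHOLD)


def build_labels(records: Iterable[PatientYear]) -> Dict[str, int]:
    """Keep only diabetes and hypertension records and attach an adherence label."""
    return {
        r.patient_id: label_adherence(r.refills_claimed)
        for r in records
        if is_target_condition(r.icd10_code)
    }


# Hypothetical records for illustration only.
records = [
    PatientYear("p1", "E11.9", 9),   # 9/12 = 0.75, meets the threshold
    PatientYear("p2", "I10", 8),     # 8/12 is below the threshold
    PatientYear("p3", "J45.0", 12),  # asthma, outside the assumed cohort
]
labels = build_labels(records)
```

Under these assumptions a patient with 9 of 12 projected refills sits exactly on the threshold and is labelled adherent, which matches the "at least 75%" wording in the abstract.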
Description
Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.