An investigation of multi-label classification techniques for predicting HIV drug resistance in resource-limited settings.
South Africa has one of the highest HIV infection rates in the world with more than 5.6 million infected people and consequently has the largest antiretroviral treatment program with more than 1.5 million people on treatment. The development of drug resistance is a major factor impeding the efficacy of antiretroviral treatment. While genotype resistance testing (GRT) is the standard method to determine resistance, access to these tests is limited in resource-limited settings. This research investigates the efficacy of multi-label machine learning techniques at predicting HIV drug resistance from routine treatment and laboratory data. Six techniques, namely, binary relevance, HOMER, MLkNN, predictive clustering trees (PCT), RAkEL and ensemble of classifier chains (ECC) have been tested and evaluated on data from medical records of patients enrolled in an HIV treatment failure clinic in rural KwaZulu-Natal in South Africa. The performance is measured using five scalar evaluation measures and receiver operating characteristic (ROC) curves. The techniques were found to provide useful predictive information in most cases. The PCT and ECC techniques perform best and have true positive prediction rates of 97% and 98% respectively for specific drugs. The ECC method also achieved an AUC value of 0:83, which is comparable to the current state of the art. All models have been validated using 10 fold cross validation and show increased performance when additional data is added. In order to make use of these techniques in the field, a tool is presented that may, with small modifications, be integrated into public HIV treatment programs in South Africa and could assist clinicians to identify patients with a high probability of drug resistance.