Evaluation of single and multiple missing data imputation techniques: a comparative application on BMI data.
Date
2017
Abstract
Missing data are common across data science and statistics, and their treatment is one of the most important topics in applied statistics, especially in academic, government and industry-run clinical trials. Data loss can leave an inadequate basis for study inferences. Dealing with missing data involves either ignoring the unobserved values or imputing them. However, the method chosen to handle the missingness in a data set may bias the results, so that they no longer reflect a true picture of the reality under investigation in a study.
This thesis discusses the various missing data mechanisms and how missing values can be inferred. Its main objective is to evaluate the performance of several single and multiple imputation methods on a continuous variable in order to identify the best-performing imputation techniques. Starting from a complete survey data set (2014 Lesotho Demographic Household Survey), missingness was created in the response variable (BMI) under three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). The missing values were then imputed using three single imputation methods (mean substitution, hot-deck and regression) and two multiple imputation methods (multiple linear regression and predictive mean matching, PMM). The analysis indicated that the PMM imputation method is more precise than the other methods and can also produce lower estimated standard errors.
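The general procedure described above (delete values from a complete response under each mechanism, then impute) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the data are synthetic rather than the survey data, `age` is a hypothetical covariate, the missingness probabilities and coefficients are arbitrary, and `pmm_impute` is a simplified single-draw version of predictive mean matching that omits the parameter draws and pooling used in full multiple imputation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): a covariate and a BMI-like response.
n = 2000
age = rng.uniform(15, 49, size=n)
bmi = 18.0 + 0.15 * age + rng.normal(0.0, 3.0, size=n)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def standardize(v):
    return (v - v.mean()) / v.std()


def make_missing(y, x, mechanism, rng):
    """Return a copy of y with NaN where values were deleted."""
    if mechanism == "MCAR":
        # Deletion probability is constant: unrelated to any variable.
        p = np.full(y.shape, 0.3)
    elif mechanism == "MAR":
        # Deletion probability depends only on the observed covariate x.
        p = sigmoid(-1.0 + 1.5 * standardize(x))
    elif mechanism == "MNAR":
        # Deletion probability depends on the (unobserved) value of y itself.
        p = sigmoid(-1.0 + 1.5 * standardize(y))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = y.copy()
    out[rng.random(y.shape) < p] = np.nan
    return out


def mean_impute(y):
    """Mean substitution: replace every missing value by the observed mean."""
    out = y.copy()
    out[np.isnan(out)] = np.nanmean(y)
    return out


def pmm_impute(y, x, rng, k=5):
    """Simplified PMM: borrow an observed value from one of the k donors
    whose regression prediction is closest to that of the missing case."""
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    pred = X @ beta
    obs_pred, obs_y = pred[obs], y[obs]
    out = y.copy()
    for i in np.flatnonzero(~obs):
        donors = np.argsort(np.abs(obs_pred - pred[i]))[:k]
        out[i] = obs_y[rng.choice(donors)]
    return out


results = {}
for mech in ("MCAR", "MAR", "MNAR"):
    y_miss = make_missing(bmi, age, mech, rng)
    results[mech] = {
        "n_missing": int(np.isnan(y_miss).sum()),
        "mean_imputed_mean": float(mean_impute(y_miss).mean()),
        "pmm_imputed_mean": float(pmm_impute(y_miss, age, rng).mean()),
    }
    print(mech, results[mech], "complete-data mean:", round(float(bmi.mean()), 2))
```

Comparing each imputed mean with the complete-data mean gives a rough sense of the bias a method introduces under each mechanism. A fuller comparison along the lines of the thesis would repeat the deletion and imputation many times and also examine the estimated standard errors.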
Description
Master’s Degree. University of KwaZulu-Natal, Durban.