Bayesian data augmentation using MCMC: application to missing values imputation on cancer medication data.
Date
2017
Abstract
Missing data is a serious issue that negatively affects the inferences and findings of
researchers in data science and statistics. Ignoring missing data, or deleting
cases that contain missing observations, may reduce statistical power, cause
loss of information, increase the standard errors of estimates and increase estimation
bias in data analysis. One advantage of imputation methods is that they
keep the full sample size, which makes the results more precise. Amongst
all missing data imputation techniques, data augmentation is relatively uncommon in
the literature, and very few articles mention its use to account for
missing data problems. Data augmentation can be used to impute
missing data in both Bayesian and classical statistics. In the classical approach, data
augmentation is implemented through the EM algorithm, which uses maximum likelihood
to impute missing values and estimate the unknown parameters of a model. The EM algorithm is
a useful tool for likelihood-based inference when dealing with missing data problems.
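To make the classical route concrete, here is a minimal EM sketch for a simpler setting than the one in this study: a bivariate normal pair (Y1, Y2) where Y1 is fully observed and some Y2 values are missing at random. The function name `em_bivariate_normal` and the simulated data are hypothetical and chosen only for illustration; this is not the routine used by the package mix. The E-step replaces the missing sufficient statistics with their conditional expectations given Y1, and the M-step re-estimates the mean and covariance from them.

```python
import numpy as np


def em_bivariate_normal(y1, y2, n_iter=500, tol=1e-10):
    """ML estimates of the mean and covariance of (Y1, Y2) via EM.

    Assumes Y1 is fully observed, Y2 holds NaN where missing, and the
    missingness is ignorable (missing at random).
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    miss = np.isnan(y2)
    # Start from complete-case estimates.
    mu = np.array([y1[~miss].mean(), y2[~miss].mean()])
    cov = np.cov(y1[~miss], y2[~miss], bias=True)
    for _ in range(n_iter):
        # E-step: expected Y2 and Y2^2 for missing rows, given Y1 and current parameters.
        beta = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - beta * cov[0, 1]
        e_y2 = np.where(miss, mu[1] + beta * (y1 - mu[0]), y2)
        e_y2sq = np.where(miss, e_y2 ** 2 + resid_var, y2 ** 2)
        # M-step: re-estimate parameters from the expected sufficient statistics.
        new_mu = np.array([y1.mean(), e_y2.mean()])
        s11 = np.mean(y1 ** 2) - new_mu[0] ** 2
        s12 = np.mean(y1 * e_y2) - new_mu[0] * new_mu[1]
        s22 = np.mean(e_y2sq) - new_mu[1] ** 2
        new_cov = np.array([[s11, s12], [s12, s22]])
        done = max(np.abs(new_mu - mu).max(), np.abs(new_cov - cov).max()) < tol
        mu, cov = new_mu, new_cov
        if done:
            break
    return mu, cov


# Simulated (hypothetical) data: Y2 depends on Y1, and Y2 is more often
# missing when Y1 is large, so a complete-case analysis is biased.
rng = np.random.default_rng(0)
n = 2000
y1 = rng.normal(0.0, 1.0, n)
y2 = 2.0 + 0.6 * y1 + rng.normal(0.0, 0.8, n)
y2_obs = y2.copy()
y2_obs[rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * y1))] = np.nan

mu, cov = em_bivariate_normal(y1, y2_obs)
print("complete-case mean of Y2:", np.nanmean(y2_obs))
print("EM estimate of mean of Y2:", mu[1])
```

Because the missingness depends on Y1, the complete-case mean of Y2 is pulled downward, while the EM estimate uses the observed Y1 values of the incomplete cases to correct for this.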
The Bayesian data augmentation approach is used when it is not possible to
directly estimate the posterior distribution $P(\theta \mid x_{\mathrm{obs}})$ of the parameters $\theta$ given the
observed data $x_{\mathrm{obs}}$, because of the missing values in $x$. This study aims to contribute to
a better understanding of Bayesian data augmentation and to improve the quality of
estimates and the precision of analyses of data with missing values. The General
Household Survey (GHS 2015) is the main source of data in this study. All analyses
were performed in R, specifically with the package mix. In this
study, we found that Bayesian data augmentation can address the problem of missing
values in cancer drug intake data, and that it performs well in improving the modelling of
cancer drug intake data affected by missing values.
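The Bayesian data augmentation cycle can be sketched with the two-step scheme of Tanner and Wong: an imputation step (I-step) draws the missing values from their predictive distribution given the current parameters, and a posterior step (P-step) draws the parameters from their posterior given the completed data. The sketch below assumes a much simpler model than the one fitted with the package mix in this study: two binary variables in a 2x2 table with a Dirichlet prior on the cell probabilities, where some units have the column variable missing at random. The function name `data_augmentation_2x2`, the prior value and the counts are hypothetical and for illustration only.

```python
import numpy as np


def data_augmentation_2x2(full_counts, row_only_counts, n_iter=5000,
                          burn_in=1000, prior=1.0, seed=0):
    """Tanner-Wong data augmentation for a 2x2 table with a Dirichlet prior.

    full_counts:     2x2 counts of units with both variables observed.
    row_only_counts: length-2 counts of units whose column variable is missing,
                     indexed by their observed row category.
    Returns the post-burn-in draws of the cell probabilities, shape (T, 2, 2).
    """
    rng = np.random.default_rng(seed)
    full = np.asarray(full_counts, dtype=float)
    row_only = np.asarray(row_only_counts, dtype=int)
    theta = np.full((2, 2), 0.25)
    draws = []
    for t in range(n_iter):
        # I-step: impute the missing column of each row-only unit given theta.
        completed = full.copy()
        for i in range(2):
            p_col0 = theta[i, 0] / theta[i].sum()
            k = rng.binomial(row_only[i], p_col0)
            completed[i, 0] += k
            completed[i, 1] += row_only[i] - k
        # P-step: draw theta from its Dirichlet posterior given the completed table.
        theta = rng.dirichlet(prior + completed.ravel()).reshape(2, 2)
        if t >= burn_in:
            draws.append(theta)
    return np.array(draws)


# Hypothetical counts, for illustration only.
full = [[40, 10], [15, 35]]
row_only = [30, 20]
draws = data_augmentation_2x2(full, row_only)
print("posterior mean of cell probabilities:\n", draws.mean(axis=0))
```

Since each I-step imputes conditional on the current parameter draw, the retained draws of $\theta$ approximate the observed-data posterior $P(\theta \mid x_{\mathrm{obs}})$ rather than a posterior that treats a single set of imputed values as if they had been observed; the first `burn_in` iterations are discarded while the chain settles.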
Description
Master of Science in Statistics, University of KwaZulu-Natal, Westville, 2017.