
Bayesian data augmentation using MCMC: application to missing values imputation on cancer medication data.

Date

2017

Abstract

Missing data is a serious issue that negatively affects the inferences and findings of researchers in data science and statistics. Ignoring missing data, or deleting cases that contain missing observations, may reduce statistical power, discard information, inflate the standard errors of estimates and increase estimation bias. One advantage of imputation methods is that they keep the full sample size, which makes the results more precise. Among missing data imputation techniques, data augmentation is not widely used in the literature, and very few articles mention it as a way to handle missing data problems. Data augmentation can be used to impute missing data in both Bayesian and classical statistics. In the classical approach, data augmentation is implemented through the EM algorithm, which uses maximum likelihood to impute missing values and estimate the unknown parameters of a model. The EM algorithm is a useful tool for likelihood-based decisions when dealing with missing data problems. The Bayesian data augmentation approach is used when it is not possible to directly estimate the posterior distribution P(θ | x_ov) of the parameters θ given the observed data x_ov, because of the missing data in x. This study aims to contribute to a better understanding of Bayesian data augmentation and to improve the quality of estimates and the precision of analyses of data with missing values. The General Household Survey [GHS 2015] is the main source of data in this study. All analyses were carried out in R, specifically with the package mix. We find that Bayesian data augmentation can solve the problem of missing data in cancer drug intake data, and that it performs very well in improving the modelling of cancer drug intake data affected by missing values.
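
The alternation that Bayesian data augmentation relies on can be illustrated with a minimal sketch. It is not the thesis's analysis, which used the R package mix on GHS 2015 data. It assumes a toy univariate normal model with the noninformative prior p(μ, σ²) ∝ 1/σ², values missing completely at random, and a hypothetical function name `data_augmentation`. Each iteration draws the missing values given the current parameters (I-step), then draws the parameters given the completed data (P-step).

```python
import random
import statistics


def data_augmentation(data, n_iter=2000, burn_in=500, seed=0):
    """Toy data augmentation sampler for a normal sample with missing values.

    `data` is a list of floats, with None marking a missing value.
    Assumes x_i ~ N(mu, sigma2) and the prior p(mu, sigma2) ∝ 1/sigma2.
    Returns the post-burn-in (mu, sigma2) draws and the last completed data set.
    """
    rng = random.Random(seed)
    observed = [x for x in data if x is not None]
    missing_idx = [i for i, x in enumerate(data) if x is None]
    n = len(data)

    # Start the chain from the observed-data estimates.
    mu = statistics.fmean(observed)
    sigma2 = statistics.variance(observed)
    completed = list(data)
    draws = []

    for it in range(n_iter):
        # I-step: impute each missing value from N(mu, sigma2).
        sd = sigma2 ** 0.5
        for i in missing_idx:
            completed[i] = rng.gauss(mu, sd)

        # P-step: draw (mu, sigma2) from their posterior given completed data.
        xbar = statistics.fmean(completed)
        ss = sum((x - xbar) ** 2 for x in completed)
        # sigma2 | data ~ ss / chi-square(n - 1); chi-square(k) = Gamma(k/2, scale 2).
        sigma2 = ss / rng.gammavariate((n - 1) / 2.0, 2.0)
        # mu | sigma2, data ~ N(xbar, sigma2 / n).
        mu = rng.gauss(xbar, (sigma2 / n) ** 0.5)

        if it >= burn_in:
            draws.append((mu, sigma2))

    return draws, completed


# Illustrative values only, not data from the study.
sample = [4.9, 5.1, None, 5.3, 4.7, None, 5.0, 5.2]
posterior_draws, last_imputation = data_augmentation(sample)
posterior_mean_mu = statistics.fmean(m for m, _ in posterior_draws)
```

The retained draws approximate the posterior of (μ, σ²) given only the observed values, and completed data sets saved at well-spaced iterations could serve as multiple imputations. A multivariate or mixed categorical and continuous model, as handled by mix, would need different conditional distributions in both steps.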

Description

Master of Science in Statistics, University of KwaZulu-Natal, Westville, 2017.
