Comparative approaches to handling missing data, with particular focus on multiple imputation for both cross-sectional and longitudinal models.
Date
2012
Abstract
Much data-based research is characterized by the unavoidable problem of incompleteness as a result of missing or erroneous values. This thesis discusses various strategies for, and basic issues in, addressing the missing data problem in statistical data analysis, and deals with both missing covariates and missing outcomes. We restrict our attention to methodologies that address a specific missing data pattern, namely monotone missingness.
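Monotone missingness means that the variables can be ordered so that whenever a value is missing, every later value in that ordering is missing too; in longitudinal data this is the dropout pattern. As an illustration only, the hypothetical helper `is_monotone` below checks this for rows in which `None` marks a missing value. It assumes the columns are already in the candidate order (for example, measurement occasions in time order) and does not search for such an ordering.

```python
from typing import Optional, Sequence


def is_monotone(rows: Sequence[Sequence[Optional[float]]]) -> bool:
    """Return True if, in every row, the missing values (None) form a trailing block.

    Assumes the columns are already in the candidate monotone order;
    this sketch does not try other column orderings.
    """
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                # An observed value after a missing one means intermittent
                # missingness, so the pattern is not monotone.
                return False
    return True


# Hypothetical data: three subjects measured on three occasions.
data_monotone = [
    [5.1, 4.8, 4.6],
    [6.0, 5.7, None],   # drops out after occasion 2
    [5.5, None, None],  # drops out after occasion 1
]
data_intermittent = [
    [5.1, None, 4.6],   # missing at occasion 2 but observed again at occasion 3
]
print(is_monotone(data_monotone))      # True
print(is_monotone(data_intermittent))  # False
```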
The thesis is divided into two parts. The first part places particular emphasis on the so-called missing at random (MAR) assumption, but devotes the bulk of its attention to multiple imputation techniques. The main aim of this part is to investigate various modelling techniques through application studies, to identify the most appropriate techniques, and to gain insight into how well suited these techniques are to the analysis of incomplete data. The thesis first deals with the problem of missing
covariate values when estimating regression parameters under a monotone missing covariate pattern. The study is devoted to a comparison of different imputation techniques, namely Markov chain Monte Carlo (MCMC), regression, propensity score (PS) and last observation carried forward (LOCF). The results from the application study revealed that, when the missing data pattern is monotone, some methods for dealing with missing covariates perform uniformly better than others. Of the methods explored, MCMC and regression imputation were preferable to the PS and LOCF methods for estimating regression parameters under monotone missingness. The thesis also presents a comparative
analysis of techniques for Gaussian longitudinal outcome (response) data that are incomplete due to random dropout. Three methods are assessed, namely multiple imputation (MI), inverse probability weighting (IPW) and direct likelihood analysis. In general, the findings favoured MI over IPW for continuous outcomes, even when the MAR mechanism holds. They further suggest that MI and direct likelihood analysis lead to accurate and equivalent results, as both techniques arrive at the same substantive conclusions. The study also compares and contrasts several statistical methods for analysing incomplete non-Gaussian longitudinal outcomes when the underlying study is subject to ignorable dropout. The methods considered are weighted generalized estimating equations (WGEE), generalized estimating equations after multiple imputation (MI-GEE) and generalized linear mixed models (GLMM). The study found the MI-GEE method to be considerably robust, outperforming the other methods for both small and large sample sizes, regardless of the dropout rate.
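Multiple imputation fits the analysis model to each of m completed datasets and then combines the results. The abstract does not spell out the pooling step; the standard choice is Rubin's rules, sketched below for a single scalar parameter. The function name `pool_rubin` and the numbers in the example are hypothetical and purely illustrative, not results from the thesis.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates of one scalar parameter with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed dataset
    variances: squared standard errors U_1..U_m from the same fits
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, t, math.sqrt(t)


# Made-up values for m = 3 imputed datasets, for illustration only.
q, t, se = pool_rubin([1.0, 1.2, 1.1], [0.04, 0.04, 0.04])
print(round(q, 3), round(t, 4))  # 1.1 0.0533
```

The factor (1 + 1/m) inflates the between-imputation variance to reflect that only a finite number of imputations is used; with few imputations the pooled standard error is noticeably larger than a single completed-data standard error.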
The second part of the thesis is primarily concerned with frameworks for modelling non-ignorable dropout (missing not at random, MNAR) that rely on sensitivity analysis when modelling incomplete Gaussian longitudinal data. The aim of this part is to deal with non-random dropout by explicitly modelling the dropout process, incorporating this additional sub-model into the model for the measurement data, and assessing the sensitivity of the results to the modelling assumptions. The study focuses on
the analysis of repeated Gaussian measures subject to potentially non-random dropout, in order to study the influence that the dropout process may have on inference. We consider the construction of a particular type of selection model, namely the Diggle-Kenward model, as a tool for assessing the sensitivity of a selection model to its modelling assumptions. The major conclusions were that there was evidence in favour of an MAR process rather than an MCAR (missing completely at random) process in the context of the assumed model. In addition, further insight into the data needed to be obtained by comparing various sensitivity analysis frameworks. Lastly,
two families of models were compared and contrasted, both to investigate the potential influence of dropout on inferences about the dependent measurement data and to deal with incomplete sequences. The models were based on the selection and pattern-mixture frameworks used in sensitivity analysis to jointly model the distribution of the dropout process and the longitudinal measurement process. The results of the sensitivity analyses were in agreement, with both frameworks yielding similar parameter estimates. This gave additional confidence in the findings, as both models led to similar results for significant effects such as marginal treatment effects.
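The two families of models differ in how they factor the joint distribution of the measurements y_i and the dropout indicator d_i for subject i. The block below sketches the standard factorizations together with one common form of the Diggle-Kenward dropout model. The notation (θ for measurement parameters, ψ for dropout parameters, D_i for the occasion at which subject i drops out) is assumed here for illustration and may differ from the thesis; the parameters are not the same objects in the two factorizations.

```latex
% Selection model: marginal measurement model times dropout model given the outcomes
f(\mathbf{y}_i, d_i \mid \boldsymbol{\theta}, \boldsymbol{\psi})
  = f(\mathbf{y}_i \mid \boldsymbol{\theta}) \, f(d_i \mid \mathbf{y}_i, \boldsymbol{\psi})

% Pattern-mixture model: measurement model within each dropout pattern times pattern probabilities
f(\mathbf{y}_i, d_i \mid \boldsymbol{\theta}, \boldsymbol{\psi})
  = f(\mathbf{y}_i \mid d_i, \boldsymbol{\theta}) \, f(d_i \mid \boldsymbol{\psi})

% Diggle-Kenward dropout model (one common parameterisation)
\operatorname{logit} P(D_i = j \mid D_i \ge j, \mathbf{y}_i)
  = \psi_0 + \psi_1 \, y_{i,j-1} + \psi_2 \, y_{ij}
```

Under this dropout model, ψ1 = ψ2 = 0 corresponds to MCAR, ψ2 = 0 to MAR (dropout depends only on the observed previous outcome), and ψ2 ≠ 0 to MNAR, since dropout then depends on the possibly unobserved current outcome y_ij.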
Description
Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012.
Keywords
Multiple imputation (Statistics), Multivariate analysis, Theses--Statistics and actuarial science, Missing observations (Statistics)