The management of missing categorical data: comparison of multiple imputation and subset correspondence analysis.
Date
2015
Abstract
Missing data is a common problem in research, and the manner in which this ‘missingness’ is
managed is crucial to the validity of analysis outcomes.
This study illustrates the use of two diverse methods for handling missing data, in particular
missing categorical data. These methods are applied to a dataset collected to identify
relationships between asthma severity in children and environmental, behavioural, genetic
and socio-economic factors. This dataset suffered from substantial missingness.
The first method involved the application of two approaches to multiple imputation, each
adopting different distributional specifications. A practical challenge, previously
undocumented, was encountered: interactions that had yet to be identified for inclusion in
the analysis model were also needed in the imputation model.
This study found that imputing a single set of complete data using the
expectation-maximization (EM) algorithm for covariance matrices made it possible to identify
relevant interactions for inclusion in the imputation model.
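
As an illustration of the single-imputation step described above, the following is a minimal sketch of EM estimation of a mean vector and covariance matrix, followed by conditional-mean filling of the missing entries. It is not the implementation used in the thesis. It assumes the variables are coded numerically (for example, dummy-coded categories), that a multivariate normal working model is acceptable, and that every variable has at least one observed value. The function name `em_impute` and its parameters are hypothetical.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """Single imputation via EM under a multivariate normal working model.

    X is a 2-D array with np.nan marking missing entries (an assumption of
    this sketch). Returns the completed data, the mean and the covariance.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Starting values: observed column means and the covariance of the
    # mean-filled data (a small ridge keeps it invertible).
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True) + 1e-6 * np.eye(p)

    for _ in range(n_iter):
        # E-step: replace missing values by their conditional means and
        # accumulate the conditional covariance of the missing parts.
        correction = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the mean.
                filled[i] = mu
                correction += sigma
                continue
            s_mo = sigma[np.ix_(m, o)]
            coef = s_mo @ np.linalg.pinv(sigma[np.ix_(o, o)])
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            correction[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T

        # M-step: update the mean and covariance from the expected
        # sufficient statistics.
        new_mu = filled.mean(axis=0)
        centered = filled - new_mu
        new_sigma = (centered.T @ centered + correction) / n

        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break

    return filled, mu, sigma
```

The completed dataset returned by such a routine could then be used to screen candidate interaction terms before the imputation model is specified; how that screening was carried out in the thesis is not stated in this abstract.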
The second method applied correspondence analysis to a subset of the
data that included only the measured data categories. The application of subset
correspondence analysis (s-CA) with incomplete data, as well as its sensitivity to the type of
missingness, has not been well documented, if at all. There is also no evidence of research in
which interactions have been added to an analysis with s-CA. In this study, its use was
illustrated both with and without interactions, and the results were found to be similar to,
and to complement favourably, those from the multiple imputation approach. A
simulation study found that s-CA performed well with any type of missingness, provided that
less than 30% of the values were missing on any variable with incomplete data.
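
The core idea of subset correspondence analysis can be sketched as follows: the standardized residuals are computed with the row and column masses of the full table, and the singular value decomposition is then applied only to the columns of interest (here, the measured categories, leaving out the columns that represent missing values). This is a minimal sketch under the assumption that the data are arranged as a table of counts with no empty rows or columns. The function name `subset_ca` and the argument `keep_cols` are hypothetical, and the code is not the thesis's implementation.

```python
import numpy as np

def subset_ca(N, keep_cols):
    """Subset correspondence analysis of a two-way table of counts.

    N         : 2-D array of non-negative counts (full table).
    keep_cols : indices of the columns (categories) to analyse.
    Returns row principal coordinates, column principal coordinates
    and the principal inertias of the subset.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses from the FULL table
    c = P.sum(axis=0)                    # column masses from the FULL table

    # Standardized residuals of the full table, then restrict the columns.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    S_sub = S[:, keep_cols]

    U, sv, Vt = np.linalg.svd(S_sub, full_matrices=False)

    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c[keep_cols])[:, None]
    return row_coords, col_coords, sv ** 2

# Toy table: the last column plays the role of a "missing" category.
table = np.array([[20, 10, 5, 3],
                  [8, 15, 9, 6],
                  [4, 7, 18, 2]])
rows, cols, inertia = subset_ca(table, [0, 1, 2])
```

Because the original masses are retained, the inertia of the analysed subset and the inertia of the excluded columns add up to the total inertia of the full table, which makes it possible to see how much of the variation is attributable to the missing categories.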
Across all analyses, the relationships found between asthma severity and the factors studied
were consistent with known relationships, thus supporting the reliability of the methods.
Description
Doctor of Philosophy in Applied Statistics, University of KwaZulu-Natal, Westville, 2015.