The management of missing categorical data : comparison of multiple imputation and subset correspondence analysis.

dc.contributor.advisor: Zewotir, Temesgen Tenaw.
dc.contributor.advisor: Naidoo, Rajen.
dc.contributor.advisor: North, Delia Elizabeth.
dc.contributor.author: Hendry, Gillian Margaret.
dc.date.accessioned: 2018-10-15T12:20:51Z
dc.date.available: 2018-10-15T12:20:51Z
dc.date.created: 2015
dc.date.issued: 2015
dc.description: Doctor of Philosophy in Applied Statistics, University of KwaZulu-Natal, Westville, 2015. [en_US]
dc.description.abstract: Missing data is a common problem in research, and the manner in which this ‘missingness’ is managed is crucial to the validity of analysis outcomes. This study illustrates the use of two diverse methods for handling missing categorical data in particular. These methods are applied to a dataset intended to identify relationships between asthma severity in children and environmental, behavioural, genetic and socio-economic factors. This dataset suffered from substantial missingness. The first method involved the application of two approaches to multiple imputation, each adopting different distributional specifications. A previously undocumented practical challenge was encountered in the application of multiple imputation: interactions that still had to be identified for inclusion in the analysis model were also needed for the imputation model. This study found that by imputing a single set of complete data using the expectation maximization (EM) algorithm for covariance matrices, it was possible to identify relevant interactions for inclusion in the imputation model. The second method illustrated the application of correspondence analysis to a subset of the data that includes only the measured data categories. The application of subset correspondence analysis (s-CA) to incomplete data, as well as its sensitivity to the type of missingness, has not been well documented, if at all. There is also no evidence of research in which interactions have been added to an analysis with s-CA. In this study its use, both with and without interactions, was illustrated, and the results, when compared to those from the multiple imputation approach, were found to be similar and favourably complementary. A simulation study found that s-CA performed well with any type of missingness, provided the amount of missingness is less than 30% on any variable with incomplete data. Across all analyses, relationships found between asthma severity and these factors were consistent with known relationships, thus providing confirmation of the reliability of the methods. [en_US]
dc.identifier.uri: http://hdl.handle.net/10413/15643
dc.language.iso: en_ZA [en_US]
dc.subject.other: Missing data. [en_US]
dc.subject.other: Asthma severity. [en_US]
dc.subject.other: Asthma categories. [en_US]
dc.subject.other: Children with asthma. [en_US]
dc.title: The management of missing categorical data : comparison of multiple imputation and subset correspondence analysis. [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: Hendry_Gillian_Margaret_2015.pdf
Size: 4.92 MB
Format: Adobe Portable Document Format
Description: Thesis.

License bundle

Name: license.txt
Size: 1.64 KB
Format: Item-specific license agreed upon to submission