Flexible statistical modeling of deaths by diarrhoea in South Africa.
Date
2013
Abstract
The purpose of this study is to investigate and understand data that are grouped into
categories. Various statistical methods for categorical binary responses were studied to
investigate the causes of death from diarrhoea in South Africa. Data collected included
death type, sex, marital status, province of birth, province of death, place of death, province
of residence, education status, smoking status and pregnancy status. The objective of this
thesis is to investigate which of the above explanatory variables are most strongly associated
with death from diarrhoea in South Africa.
To achieve this objective, different techniques for analysing sample survey data are investigated.
These include sketching bar graphs and applying several statistical methods, namely logistic
regression, survey logistic regression, the generalised linear model, the generalised linear mixed
model, and the generalised additive model. In selecting the fixed effects, bar graphs and individual
profile graphs of the response variable are used. A logistic regression model is used to identify
which of the explanatory variables are most strongly associated with death from diarrhoea.
Statistical analyses are conducted in SAS (Statistical Analysis System).
Hosmer and Lemeshow (2000) propose a statistic that they show, through simulation, is
distributed as chi‐square when there is no replication in any of the subpopulations. Due to
the similarity of the Hosmer and Lemeshow test for logistic regression, Parzen and Lipsitz
(1999) suggest using 10 risk score groups. Nevertheless, based on simulation results, May
and Hosmer (2004) show that, for all samples or samples with a large percentage of
censored observations, the test rejects the null hypothesis too often. They suggest that the
number of groups be chosen such that G=integer of {maximum of 12 and minimum of 10}.
Lemeshow et al. (2004) state that the observations are first sorted in increasing order of their estimated event probability.
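The sorting and grouping step described above can be illustrated with a minimal sketch of a Hosmer and Lemeshow style statistic. This is not the thesis's SAS code: the function name `hosmer_lemeshow`, its arguments, and the equal-size grouping rule are assumptions made for illustration, and the default of 10 groups simply mirrors the 10 risk score groups mentioned above.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Illustrative Hosmer-Lemeshow-style statistic (hypothetical helper).

    y: observed binary outcomes (0 or 1)
    p: estimated event probabilities from a fitted model
    Returns (statistic, degrees_of_freedom), with groups - 2 degrees of freedom.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")

    # Sort observations in increasing order of estimated event probability.
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)

    statistic = 0.0
    for g in range(groups):
        # Split the sorted observations into groups of roughly equal size.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y_i for _, y_i in chunk)   # observed events in the group
        expected = sum(p_i for p_i, _ in chunk)   # expected events in the group
        mean_p = expected / n_g
        variance = n_g * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance

    return statistic, groups - 2
```

Under the null hypothesis of adequate fit, the returned statistic would be compared with a chi-square distribution on the returned degrees of freedom; the p-value computation is left out so that the sketch needs only the standard library.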
Description
Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
Keywords
Statistics--Mathematics., Mathematical statistics., Statistics--Data processing., Linear models (Statistics), Theses--Statistics and actuarial science.