Flexible statistical modeling of deaths by diarrhoea in South Africa.
Abstract
The purpose of this study is to investigate and understand data which are grouped into
categories. Various statistical methods were studied for categorical binary responses to
investigate the causes of death from diarrhoea in South Africa. Data collected included
death type, sex, marital status, province of birth, province of death, place of death, province
of residence, education status, smoking status and pregnancy status. The objective of this
thesis is to investigate which of the above explanatory variables are most strongly associated
with death from diarrhoea in South Africa.
To achieve this objective, different sample survey data analysis techniques are investigated.
These include sketching bar graphs and applying several statistical methods, namely logistic
regression, survey logistic regression (SAS PROC SURVEYLOGISTIC), generalised linear models,
generalised linear mixed models, and generalised additive models. To select the fixed effects,
bar graphs and individual profile graphs of the response variable are examined. A logistic
regression model is used to identify which of the explanatory variables are most strongly
associated with death from diarrhoea. Statistical analyses are conducted in SAS (Statistical
Analysis System).
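As a rough illustration of how a logistic regression relates a binary response to an explanatory variable, the sketch below fits a one-predictor model by Newton-Raphson in Python rather than SAS. The function name `fit_logistic` and the counts are hypothetical and chosen only for illustration; they are not taken from the thesis data.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x for a single predictor by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector (g) and the information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * delta = g explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data (NOT from the thesis): a binary exposure x and a
# binary outcome y, with 20 events in 50 exposed and 10 events in 50 unexposed.
x = [1] * 50 + [0] * 50
y = [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # odds of the outcome for x = 1 relative to x = 0
print(f"intercept={b0:.4f}, slope={b1:.4f}, odds ratio={odds_ratio:.4f}")
```

With a single binary predictor, the exponentiated slope equals the odds ratio of the corresponding 2x2 table, which is why coefficients from such a model are commonly read as measures of association.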
Hosmer and Lemeshow (2000) propose a goodness-of-fit statistic that they show, through
simulation, is distributed as chi-square when there is no replication in any of the
subpopulations. Because their test is similar to the Hosmer and Lemeshow test for logistic
regression, Parzen and Lipsitz (1999) suggest using 10 risk score groups. Nevertheless, based
on simulation results, May and Hosmer (2004) show that, for small samples or samples with a
large percentage of censored observations, the test rejects the null hypothesis too often.
They suggest that the number of groups be chosen such that
G = integer of {maximum of (2, minimum of (10, number of failures divided by 40))}.
Lemeshow et al. (2004) state that the observations are first sorted in increasing order of their estimated event probability.
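The sorting and grouping step can be sketched as follows. This is a minimal Python illustration of a Hosmer and Lemeshow style statistic for a binary response, not the SAS implementation used in the thesis. The function name `hosmer_lemeshow`, the equal-size grouping, and the G - 2 degrees of freedom (the usual convention for logistic regression) are assumptions made for the example.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for observed outcomes y (0/1)
    and estimated event probabilities p, using `groups` risk groups."""
    # Sort observation indices in increasing order of estimated probability.
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    statistic = 0.0
    for g in range(groups):
        # Split the sorted observations into groups of (nearly) equal size.
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in the group
        expected = sum(p[i] for i in idx)   # expected events in the group
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)
        if denom > 0:
            statistic += (observed - expected) ** 2 / denom
    # Assumed convention: compare against chi-square with G - 2 degrees of freedom.
    return statistic, groups - 2
```

A call such as `hosmer_lemeshow(y, fitted_probabilities, groups=10)` returns the statistic and its degrees of freedom. A large statistic relative to the chi-square reference distribution suggests that the fitted probabilities do not match the observed event rates within the risk groups.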
Related items
Showing items related by title, author, creator and subject.
- Bayesian spatial models with application to HIV, TB and STI modeling in Kenya.
Owino, Ngesa Oscar. (2014) This dissertation is concerned with developing and extending statistical models in the area of spatial modeling with particular interest towards application to HIV, TB and HSV-2 data. Hierarchical spatial modeling is a ...
- A frequentist and a Bayesian approach to estimating HIV prevalence accounting for non-response using population-based survey data.
Chinomona, Amos. (2016) Enhanced and novel frequentist and Bayesian approaches to estimating disease measures such as HIV prevalence utilizing the recent advances in statistical computing software are explored and applied making use of ...
- Imputation for nonresponse using the Annual Financial Statistics Survey.
Singh, Smeeta. (2011) In this dissertation, we focus on the Annual Financial Statistics (AFS) survey. This is a survey conducted by Statistics South Africa, the national statistics office of South Africa. The main purpose of this survey is ...