Browsing by Author "Matthews, Glenda Beverley."
Now showing 1 - 9 of 9
Item: Aspects of categorical data analysis. (1998) Govender, Yogarani.; Matthews, Glenda Beverley.
The purpose of this study is to investigate and understand data which are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. The concept of sparseness in a contingency table refers to a table in which many cells have small frequencies. Previous research findings showed that incorrect results were obtained in the analysis of sparse tables. Attention is therefore focussed on the effect of sparseness on the modelling and analysis of categorical data in this dissertation. Cressie and Read (1984) suggested a versatile alternative to previously proposed statistics: the power-divergence statistic. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic, covering a review of the minimum power-divergence estimation method and the evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews of the accuracy, efficiency and performance of the power-divergence family of statistics under large- and small-sample cases are presented. Statistical applications of the power-divergence statistic were conducted in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics which bear relevance to the sparse multinomial case are discussed. They include Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics, as well as Koehler and Larntz's tests for log-linear models.
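The Cressie-Read family mentioned above has a closed form, so it can be sketched directly; the following is a minimal illustrative implementation (the function name and example counts are mine, not from the dissertation). Setting the family parameter λ = 1 recovers Pearson's X², the λ → 0 limit gives the likelihood-ratio statistic G², and λ = 2/3 is the compromise value Cressie and Read recommended:

```python
import numpy as np

def power_divergence(observed, expected, lam=2/3):
    """Cressie-Read power-divergence statistic for a table of counts.

    lam = 1   -> Pearson's X^2
    lam -> 0  -> likelihood-ratio statistic G^2 (taken as a limit)
    lam = 2/3 -> the compromise value suggested by Cressie and Read (1984)
    """
    O = np.asarray(observed, dtype=float)
    E = np.asarray(expected, dtype=float)
    if np.isclose(lam, 0.0):                      # G^2 as the lam -> 0 limit
        return 2.0 * np.sum(O * np.log(O / E))
    return (2.0 / (lam * (lam + 1.0))) * np.sum(O * ((O / E) ** lam - 1.0))

# Toy one-way table: 100 observations against equiprobable expected counts.
obs = np.array([18.0, 24.0, 31.0, 27.0])
exp = np.full(4, obs.sum() / 4)
x2 = power_divergence(obs, exp, lam=1.0)          # equals Pearson's X^2
cr = power_divergence(obs, exp, lam=2/3)          # recommended member
```

For serious use, `scipy.stats.power_divergence` implements the same family with p-values; the point of the sketch is only that the whole family differs in a single exponent.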
On addressing contradictions for the sparse sample case under asymptotic conditions and an increase in sample size, discussions are provided on Simonoff's use of nonparametric techniques to find the variances, as well as his adoption of the jackknife and bootstrap techniques.

Item: A classical approach for the analysis of generalized linear mixed models. (2004) Hammujuddy, Mohammad Jahvaid.; Matthews, Glenda Beverley.
Generalized linear mixed models (GLMMs) accommodate the study of overdispersion and correlation inherent in hierarchically structured data. These models are an extension of generalized linear models (GLMs) and linear mixed models (LMMs). The linear predictor of a GLM is extended to include an unobserved, albeit realized, vector of Gaussian distributed random effects. Conditional on these random effects, responses are assumed to be independent. The objective function for parameter estimation is an integrated quasi-likelihood (IQL) function, which is often intractable since it may consist of high-dimensional integrals. Therefore, an exact maximum likelihood analysis is not feasible. The penalized quasi-likelihood (PQL) function, derived from a first-order Laplace expansion of the IQL about the optimum value of the random effects and under the assumption of slowly varying weights, is an approximate technique for statistical inference in GLMMs. Replacing the conditional weighted quasi-deviance function in the Laplace-approximated IQL by the generalized chi-squared statistic leads to a corrected profile quasi-likelihood function for the restricted maximum likelihood (REML) estimation of dispersion components by Fisher scoring. Evaluation of mean parameters, for fixed dispersion components, by iterative weighted least squares (IWLS) yields joint estimates of the fixed effects and random effects. Thus, the PQL criterion involves repeated fitting of a Gaussian LMM with a linked response vector and a conditional iterated weight matrix.
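The IWLS step that PQL repeats, with its working ("linked") response and iterated weight matrix, is easiest to see in the fixed-effects-only case. Below is a minimal sketch for an ordinary logistic GLM with no random effects (all names and the demo data are mine, for illustration only); PQL applies the same inner regression to the GLMM's linked response with the random-effects structure added:

```python
import numpy as np

def iwls_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic GLM by iterative weighted least squares (Fisher scoring).

    Each pass regresses the working response z on X with weights w -- the
    same inner step that PQL repeats for the linked response of a GLMM.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # inverse logit link
        w = mu * (1.0 - mu)               # iterated weights
        z = eta + (y - mu) / w            # working (linked) response
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Intercept-only demo: the fitted coefficient is the empirical log-odds.
X = np.ones((10, 1))
y = np.array([1.0] * 7 + [0.0] * 3)
beta_hat = iwls_logistic(X, y)
```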
In some instances, PQL estimates fail to converge to a neighbourhood of their true values. Bias-corrected PQL estimators (CPQL) have hence been proposed, using asymptotic analysis and simulation. The pseudo-likelihood algorithm is an alternative estimation procedure for GLMMs. Global score statistics for hypothesis testing of overdispersion, correlation and heterogeneity in GLMMs have been developed, as well as individual score statistics for testing null dispersion components separately. A conditional mean squared error of prediction (CMSEP) has also been considered as a general measure of predictive uncertainty. Local influence measures for testing the robustness of parameter estimates, by inducing minor perturbations into GLMMs, are recent advances in the study of these models. Commercial statistical software is available for the analysis of GLMMs.

Item: Connectedness and the hyperspace of metric spaces. (2015) Rathilal, Cerene.; Matthews, Glenda Beverley.; Molenberghs, Geert.
One of the prime motivations for studying hyperspaces of a metric space is to understand the original space itself. The hyperspace of a metric space X is the space 2^X of all non-empty closed bounded subsets of it, endowed with the Hausdorff metric. Our purpose is to study, in particular, connectedness properties of X and its hyperspace. We shall be concerned with knowing whether a property P is extensional, that is, if X has property P then so does the hyperspace, or whether a property P is reflective, that is, if the hyperspace has property P then so does X itself. The hyperspace 2^X and its subspace C(X) will be the focus of our study. First the Hausdorff metric, ρ, is considered and introduced for the hyperspace 2^X; it is also inherited by C(X). As in (Nadler; [8]), when X is a continuum, the property of compactness is shown to be extensional to 2^X and C(X).
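For intuition, the Hausdorff metric on 2^X takes, between two sets, the larger of the two directed distances sup_a inf_b d(a, b) and sup_b inf_a d(a, b). For finite subsets of Euclidean space (which are closed and bounded) this is directly computable; the sketch below is illustrative and not part of the thesis:

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets in R^n: the larger
    of the two directed distances max_a min_b d(a,b) and max_b min_a d(a,b)."""
    A = np.atleast_2d(A).astype(float)
    B = np.atleast_2d(B).astype(float)
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# {0, 1} and {0, 3} on the real line: every point of the first set is within
# 1 of the second, but the point 3 is at distance 2 from {0, 1}.
d = hausdorff([[0.0], [1.0]], [[0.0], [3.0]])
```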
The compactness result is further generalised when it is shown that each of 2^X and C(X) is arcwise connected, and hence each is an arcwise connected continuum, when X is a continuum. The classical results, the Boundary Bumping Theorems (due to Janiszewski [4]), which provide the conditions under which a component of a set intersects its boundary, are proved using the Cut Wire Theorem (Whyburn; [13]). As an application, the Boundary Bumping Theorem (for open sets) is used to show the existence of continua arising out of convergence, in the Continuum of Convergence Theorem (Nadler; [8]). Using a construction of Whitney ([12]), the existence of a Whitney map, μ, for 2^X and ω for C(X) is given. Using μ, a special function σ : [0, 1] → 2^X (due to Kelley [3]), called a segment, is considered in the study of the arc structure of 2^X and C(X). The equivalence of the existence of an order arc in 2^X and the existence of a segment in 2^X is also shown. A segment homotopy is then utilised to show that if one of 2^X or C(X) is contractible then so is the other. This is presented in the Fundamental Theorem of Contractible Hyperspaces. The relationship between local connectedness and connectedness im kleinen is examined in order to understand the properties of Peano continua. Property S, introduced by Sierpiński ([10]), is considered and its connection to local connectedness is examined. Furthermore, a result of Wojdyslawski ([15]), which shows that local connectedness is an extensional property of a continuum X to the hyperspaces 2^X and C(X), is given. Local connectedness is also reflective if either 2^X or C(X) is a locally connected metric continuum. Lastly, Property K, of Kelley ([3]), is examined and shown to be a sufficient condition for the hyperspaces 2^X and C(X) of a continuum X to be contractible.
Consequently, if X is a Peano continuum then 2^X and C(X) are contractible.

Item: The impact of missing data on clinical trials: a re-analysis of a placebo controlled trial of Hypericum perforatum (St John's wort) and sertraline in major depressive disorder. (Springer, 2014) Grobler, Anna Christina.; Matthews, Glenda Beverley.; Molenberghs, Geert.
Rationale and objective: Hypericum perforatum (St John's wort) is used to treat depression, but its effectiveness has not been established. Recent guidelines described the analysis of clinical trials with missing data, inspiring the re-analysis of this trial using proper missing data methods. The objective was to determine whether hypericum was superior to placebo in treating major depression. Methods: A placebo-controlled, randomized clinical trial was conducted for 8 weeks to determine the effectiveness of hypericum or sertraline in reducing depression, measured using the Hamilton depression scale. We performed sensitivity analyses under different assumptions about the missing data process. Results: Three hundred and forty participants were randomized, with 28% lost to follow-up. The missing data mechanism was not missing completely at random. Under missing at random assumptions, some sensitivity analyses found no difference between either treatment arm and placebo, while some sensitivity analyses found a significant difference from baseline to week 8 between sertraline and placebo (−1.28, 95% credible interval [−2.48; −0.08]), but not between hypericum and placebo (0.56, [−0.64; 1.76]). The results were similar when the missing data process was assumed to be missing not at random. Conclusions: There is no difference between hypericum and placebo, regardless of the assumption about the missing data process. There is a significant difference between sertraline and placebo with some of the statistical methods used. It is important to conduct an analysis that takes account of missing data using valid, statistically principled methods.
The assumptions made about the missing data process could influence the results.

Item: A perspective on incomplete data in longitudinal multi-arm clinical trials, with emphasis on pattern-mixture-model based methodology. (2014) Grobler, Anna Christina.; Matthews, Glenda Beverley.; Molenberghs, Geert.
Missing data are common in longitudinal clinical trials. Rubin described three different missing data mechanisms based on the level of dependence between the missing data process and the measurement process: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Data are MCAR when the probability of dropout is independent of both observed and unobserved data. Data are MAR when the probability of data being missing does not depend on the unobserved data, conditional on the observed data. When neither MCAR nor MAR is valid, data are MNAR. The aim of this thesis is to discuss the statistical methodology required for analysing missing outcome data and to provide valid statistical methods for the MCAR, MAR and MNAR scenarios. This thesis does not focus on data analysis where covariate data are missing. Under MCAR, complete and available case analyses are valid. When data are MAR, multiple imputation, likelihood-based models, inverse probability weighting and Bayesian models are valid. When data are MNAR, pattern-mixture, selection and shared-parameter models are valid. These methods are illustrated by an in-depth analysis of two data sets with missing data. The first data set is the SAPiT trial, an open-label, randomised controlled trial in HIV-tuberculosis co-infected patients. Patients were randomised to three arms, each initiating antiretroviral therapy at a different time. CD4+ count, an indicator of HIV progression, was measured at baseline and every 6 months for 24 months. The primary question was whether the CD4+ count trajectory over time differed between the three treatment arms.
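Of the MAR-valid methods listed above, inverse probability weighting is the simplest to sketch: each complete case is up-weighted by the inverse of its estimated probability of being observed, so that cases from dropout-prone strata count for more. The toy below (all names and numbers are mine; real analyses estimate the response probability with a regression model rather than group proportions) shows the weighted mean correcting the bias of a complete-case mean:

```python
import numpy as np

def ipw_mean(y, observed, group):
    """IPW estimate of E[y] when y is missing at random (MAR) given `group`:
    each complete case is weighted by 1 / P(observed | its group)."""
    y = np.asarray(y, float)
    observed = np.asarray(observed, bool)
    group = np.asarray(group)
    w = np.empty(len(y))
    for g in np.unique(group):
        in_g = group == g
        w[in_g] = 1.0 / observed[in_g].mean()   # inverse response probability
    return np.sum(w[observed] * y[observed]) / np.sum(w[observed])

# Toy MAR data: group "b" drops out half the time, so a complete-case mean
# under-represents it; weighting by 1/P(observed) restores the true mean.
y = np.array([10.0, 10.0, 20.0, np.nan])
observed = np.array([True, True, True, False])
group = np.array(["a", "a", "b", "b"])
est = ipw_mean(y, observed, group)
```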
The assumption that missing data are MCAR was not supported by the observed data. We performed a range of sensitivity analyses under both MAR and MNAR assumptions. The second data set is a placebo-controlled, randomised clinical trial conducted for 8 weeks to determine the effectiveness of hypericum or sertraline in reducing depression, measured by the Hamilton depression scale. The trial randomised 340 participants, with 28% lost to follow-up before Week 8. We performed a sensitivity analysis under different assumptions about the missing data process. The missing data mechanism was not MCAR. Under MAR assumptions, some of the sensitivity analyses found no difference between either of the treatment arms and placebo, while some found a significant difference between sertraline and placebo, but not between hypericum and placebo. This re-analysis contributed to the literature around the effectiveness of St John's Wort because it changed the conclusions of the original analysis.

Item: Prevalence and risk factors of malaria in children under the age of five years old in Uganda. (2015) Roberts, Danielle Jade.; Matthews, Glenda Beverley.
Malaria is considered to be one of the main global health problems, causing close to a million deaths each year. Ninety percent of these deaths occur in Sub-Saharan Africa, and 70% are of children under the age of 5 years. Uganda, ranked 6th worldwide in the number of malaria cases and 3rd in the number of malaria deaths in 2008, experiences weather conditions that often allow malaria transmission to occur all year round, with only a few areas that experience low or unstable transmission. Malaria is the leading cause of morbidity in Uganda, with 95% of the population at risk, and it kills between 70,000 and 100,000 children every year. Children under the age of five years are among the most vulnerable to malaria infection, as they have not yet developed any immunity to the disease.
In order to implement successful malaria eradication measures, there is a continuous need to understand the epidemiology of, and risk factors associated with, the disease. Although a large number of studies done worldwide have identified a wide variety of risk factors (socioeconomic, environmental, demographic and others) associated with malaria infection, there is still a great need to identify the influence of these factors in a local context to allow the successful formulation of a national malaria-control strategy. There have, however, been very few studies done in Uganda on malaria indicators and risk factors, and these studies have been specific to one community at a time. Most recent studies on malaria in Uganda have been hospital-based, investigating clinical malaria among young children and pregnant women. One of the aims of this thesis was to identify significant socio-economic, demographic and environmental risk factors associated with malaria infection, based on the result of a microscopy test conducted on 3,972 children under the age of five during a nationally representative Malaria Indicator Survey (MIS) done in Uganda in 2009. The MIS sample was stratified according to the 10 regions of Uganda and was not spread geographically in proportion to the population, but rather equally across the regions. The survey consisted of a two-stage sample design, where the first stage involved selecting clusters, with probability proportional to size, from a list of enumeration areas. The second stage involved systematic sampling of households from a list of households in each cluster. Surveys carried out using these sampling techniques are referred to as having complex survey designs. The response variable of interest is binary, indicating whether a child tested positive or negative for malaria. Logistic regression is commonly used to explore the relationship between a binary response variable and a set of explanatory variables.
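The two selection stages described above can be sketched as follows. This is a toy illustration, not the MIS sampling code: the cluster sizes are invented, and the PPS draw here uses numpy's successive weighted sampling without replacement, which only approximates strict PPS-without-replacement inclusion probabilities:

```python
import numpy as np

rng = np.random.default_rng(2009)

def pps_clusters(sizes, n_clusters):
    """Stage 1: draw cluster indices with probability proportional to size
    (approximate PPS without replacement via successive weighted draws)."""
    p = np.asarray(sizes, float)
    return rng.choice(len(p), size=n_clusters, replace=False, p=p / p.sum())

def systematic_sample(n_households, n_sample):
    """Stage 2: systematic sample of household positions from a cluster's
    household list, using a random start and a fixed sampling interval."""
    step = n_households / n_sample
    start = rng.uniform(0, step)
    return np.floor(start + step * np.arange(n_sample)).astype(int)

# Invented enumeration-area sizes; pick 2 clusters, then 10 of 100 households.
clusters = pps_clusters([120, 80, 200, 150, 90], n_clusters=2)
households = systematic_sample(100, 10)
```

The design weights used later in the analysis are the reciprocals of the inclusion probabilities implied by these two stages.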
However, ordinary logistic regression is not valid if the data come from a complex survey design. Failure to account for the complex design of a study may result in incorrect estimates of the standard errors, therefore leading to incorrect inferences. There are many methods of dealing with this study design; two commonly used approaches are design-based and model-based statistical methods. A design-based method, which involves the extension of logistic regression to complex survey designs, is survey logistic regression. For design-based methods, parameter estimates and inferences are based on the sampling weights, and only inferences concerning the effects of certain covariates on the response variable are of interest. Model-based methods, however, are used when interest is also in estimating the proportion of variation in the response variable that is attributable to each of the multiple levels of sampling. In this case, inference on the variance components of the model may also be of interest. Such methods include generalized linear mixed models and generalized estimating equations. This thesis discusses these three methods of analysing complex survey designs and compares the results of each applied to the MIS data.

Item: Sexual debut: an analysis of the Birth to Twenty data. (2015) Singh, Rashmika.; Matthews, Glenda Beverley.; Ramjith, Jordache.
According to the literature, it is widely accepted that the early timing of first sex among adolescents is related to long-term health effects and to current and future risky sexual behaviour (Sandfort et al., 2008). Despite the importance of youth sexual behaviour for sexual and reproductive health, and the severity of the Human Immunodeficiency Virus (HIV) and the Acquired Immune Deficiency Syndrome (AIDS), there exists relatively little empirical research on sexual debut in Southern Africa (Muula, 2008).
The aim of this dissertation is to utilize survival analysis techniques to determine significant predictors of early sexual debut in a South African context. A collaboration with the Human Sciences Research Council (HSRC) was fostered and access to the Birth to Twenty (Bt20) data was arranged. The data set consists of 3273 respondents who were followed from birth. Sexual exposure measures were recorded in six collection waves, namely at 11-12, 13, 14, 15, 16 and 17-18 years. Multivariate analyses were initially run by employing a standard survival analysis technique, namely Cox proportional hazards regression, for sexual debut. Analyses were run separately for males and females. A log-rank test showed that there was a significant difference between the survivor curves for voluntary sexual debut and involuntary sexual debut. This result prompted the exploration of a competing risks regression model with voluntary sexual debut as the event of interest and involuntary sexual debut as the competing risk event. SPSS was used to run exploratory analyses and Cox regression (IBM Corp, 2012). Regression diagnostic plots were run in SAS (SAS Institute Inc, 2004). Competing risks regression was performed according to the method of Fine & Gray (1999) by invoking the stcrreg command in Stata, and the validity of the proportional subhazards assumption was tested by including time interaction variables in the model (StataCorp, 2013). Where violations of the proportional subhazards assumption were found, the varying effect of the hazard functions on the time to sexual debut was interpreted accordingly.

Item: Statistical modelling and estimation of solar radiation. (2014) Nzuza, Mphiliseni Bongani.; Ranganai, Edmore.; Matthews, Glenda Beverley.
Solar radiation is a primary driving force behind a number of solar energy applications, such as photovoltaic systems for electricity generation, amongst others.
Hence, the accurate modelling and prediction of the solar flux incident at a particular location is essential for the design and performance prediction of solar energy conversion systems. In this regard, the literature shows that time series models such as the Box-Jenkins Seasonal/Non-seasonal Autoregressive Integrated Moving Average (S/ARIMA) stochastic models have considerable efficacy in describing, monitoring and forecasting solar radiation data series at various sites on the earth's surface (see e.g. Reikard, 2009). This success is attributable to their ability to capture the stochastic component of the irradiance series due to the effects of the ever-changing atmospheric conditions. At the top of the atmosphere, on the other hand, there are no such conditions, and deterministic models have been used successfully to model extra-terrestrial solar radiation. One such modelling procedure is the use of a sinusoidal predictor at determined harmonic (Fourier) frequencies to capture the inherent periodicities (seasonalities) due to the diurnal cycle. We combine this deterministic model component and SARIMA models to construct harmonically coupled SARIMA (HCSARIMA) models, to model the resulting mixture of stochastic and deterministic components of solar radiation recorded at the earth's surface. A comparative study of these two classes of models is undertaken for the horizontal global solar irradiance incident on the solar panels at UKZN Howard College (UKZN HC), located at 29.9° South, 30.98° East, with elevation 151.3 m. The results indicated that both SARIMA and HCSARIMA models are good at describing the underlying data generating processes for all data series with respect to different diagnostics. In terms of predictive ability, the HCSARIMA models generally had a competitive edge over the SARIMA models in most cases. Also, a tentative study of long range dependence (long memory) shows this phenomenon to be inherent in high frequency data series.
Therefore, autoregressive fractionally integrated moving average (ARFIMA) models are recommended for further studies on high frequency irradiance.

Item: Wadley's problem with overdispersion. (2009) Leask, Kerry Leigh.; Haines, Linda Margaret.; Matthews, Glenda Beverley.
Wadley's problem frequently emerges in dosage-mortality data and is one in which the number of surviving organisms is observed but the number initially treated is unknown. Data in this setting are also often overdispersed; that is, the variability within the data exceeds that described by the distribution modelling it. The aim of this thesis is to explore distributions that can accommodate overdispersion in a Wadley's problem setting. Two methods are considered. The first adapts the beta-binomial and multiplicative binomial models, which are frequently used for overdispersed binomial-type data, to a Wadley's problem setting. The second strategy entails modelling Wadley's problem with a distribution that is suitable for modelling overdispersed count data. Some of the distributions introduced can be used for modelling overdispersed count data as well as overdispersed dose-response data from a Wadley context. These models are compared using goodness-of-fit tests, deviance and Akaike's Information Criterion, and their properties are explored.
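The beta-binomial model mentioned above captures overdispersion by letting the binomial success probability itself be Beta(a, b) distributed, which multiplies the binomial variance n·p·(1−p) by the factor 1 + (n−1)ρ with ρ = 1/(a+b+1). A minimal stdlib sketch of the standard pmf and variance (function names and the demo parameters are mine, not from the thesis):

```python
import math

def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf C(n,k) * B(k+a, n-k+b) / B(a,b): a binomial count
    whose success probability is itself Beta(a, b) distributed."""
    log_beta = lambda x, y: math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def betabinom_var(n, a, b):
    """Variance n*p*(1-p)*(1 + (n-1)*rho), where rho = 1/(a+b+1) measures
    the overdispersion relative to a binomial with the same mean."""
    p = a / (a + b)
    rho = 1.0 / (a + b + 1.0)
    return n * p * (1.0 - p) * (1.0 + (n - 1.0) * rho)

# n = 10, a = 2, b = 3: mean 4 and variance 6, against a binomial
# variance of 2.4 at the same mean -- the extra spread is the overdispersion.
pmf = [betabinom_pmf(k, 10, 2, 3) for k in range(11)]
```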