Statistics
Permanent URI for this community: https://hdl.handle.net/10413/6771
Browse
Browsing Statistics by Date Accessioned
Now showing 1 - 20 of 158
Item Factors affecting the health status of the people of Lesotho. (2007) Moeti, Abiel. Lesotho, like any other country of the world, is faced with the task of improving the
Item Estimating risk determinants of HIV and TB in South Africa. (2009) Mzolo, Thembile.; Mwambi, Henry G.; Zuma, Khangelani. HIV/AIDS has had its greatest adverse impact on TB. People with TB who are also infected with HIV are at greater risk of dying from TB than from HIV. TB is the leading cause of death among HIV-positive individuals in South Africa. HIV is the driving factor that increases the risk of progression from latent TB to active TB. In South Africa no coherent analysis of the risk determinants of HIV and TB has been done at the national level; this study seeks to fill that gap. This study is about estimating risk determinants of HIV and TB. This will be done using the national household survey conducted by the Human Sciences Research Council in 2005. Since individuals from the same household and enumerator area are more likely to be alike in terms of risk of disease, or correlated with each other, generalized estimating equations (GEEs) will be used to correct for this potential intraclass correlation. Disease occurrence and distribution is highly heterogeneous at the population, household and individual level. In recognition of this fact we propose to model this heterogeneity at community level through generalized linear mixed models (GLMMs) and Bayesian hierarchical modelling approaches, with enumerator area indicating the community effect. The results showed that HIV is driven by sex, age, race, education, health and condom use at sexual debut. Factors associated with TB are HIV status, sex, education, income and health. Factors that are common to both diseases are sex, education and health. The results showed that ignoring the intraclass correlation can result in biased estimates. Inference drawn from GLMMs and the Bayesian approach provides some degree of confidence in the results.
The positive correlation found at an enumerator area level for both HIV and TB indicates that interventions should be aimed at an area level rather than at the individual level.
Item Inference from finite population sampling: a unified approach. (2007) Hargovan, Kashmira Ansuyah.; Arnab, Raghunath.; North, Delia Elizabeth. In this thesis, we have considered the inference aspects of sampling from a finite population. There are significant differences between traditional statistical inference and finite population sampling inference. In the case of finite population sampling, the statistician is free to choose his own sampling design and is not confined to independent and identically distributed observations as is often the case with traditional statistical inference. We look at the correspondence between the sampling design and the sampling scheme. We also look at methods used for drawing samples. The non-existence theorems (Godambe (1955), Hanurav and Basu (1971)) are also discussed. Since the minimum variance unbiased estimator does not exist for infinite populations, a number of estimators need to be considered for estimating the same parameter. We discuss the admissible properties of estimators and the use of sufficient statistics and the Rao-Blackwell Theorem for the improvement of inefficient inadmissible estimators. Sampling strategies using auxiliary information, relating to the population, need to be used as no sampling strategy can provide an efficient estimator of the population parameter in all situations. Finally, a few well-known sampling strategies are studied and compared under a superpopulation model.
Item D-optimal designs for drug synergy. (2009) Kabera, Muregancuro Gaëtan.; Ndlovu, Principal.; Haines, Linda Margaret. This thesis is focused on the construction of optimal designs for detecting drug interaction using the two-variable binary logistic model.
Two specific models are considered: (1) the binary two-variable logistic model without interaction, and (2) the binary two-variable logistic model with interaction. The two explanatory variables are assumed to be doses of two drugs that may or may not interact when jointly administered to subjects. The main objective of the thesis is to algebraically construct the optimal designs. However, numerical computations are used for constructing optimal designs in cumbersome cases. The problem of constructing optimal designs is to allocate weights to specific points of the design space in such a way that information associated with model parameters is maximized and the variances of the mean responses are minimized. Specifically, the D-optimality criterion discussed in this thesis minimizes the determinant of the asymptotic variance-covariance matrix of the estimates of the model parameters. The number of support points of the D-optimal designs for the two-variable binary logistic model without interaction varies from 3 to 6. Support points are equally weighted only in the case of the 3-point designs and in some special cases of the 4-point designs. The number of support points of the D-optimal designs for the two-variable binary logistic model with interaction varies from 4 to 8. Support points are equally weighted only in the case of the 4-point designs and in some special cases of the 8-point designs. Numerous examples are given to illustrate theoretical results.
Item Modeling environmental factors affecting the growth of eucalypt clones. (2009) Chauke, Morries.; Zewotir, Temesgen Tenaw.; Ndlovu, Principal.; Grzeskowiak, Valerie. Tree growth is influenced by environmental and genetic factors. The same tree growing in different areas will have different growth patterns. Trees with different genetic material, e.g. pine and Eucalyptus trees, growing under the same environmental conditions have different growth patterns.
Plantation trees in South Africa are mainly used for pulp and paper production. Growth is an important economic factor in the pulp and paper industry. Plantations with fast growth will be available for processing earlier than slow-growing plantations. Consequently, it is important to understand the role played by environmental factors, especially climatic factors, in tree growth. This thesis investigated the climatic effects on the radial growth of two Eucalyptus clones using growth data collected daily over five years by Sappi. The general linear model and time series models were used to assess the effects of climate on radial growth of the two clones. It was found that the two clones have similar overall growth patterns over time, but differ in growth rates. The growth pattern of the two clones appears to be characterized by substantial jumps/changes in growth rates over time. The times at which the jumps/changes in growth rate occur are referred to as the “breakpoints”. The piecewise linear regression model was used to estimate when the breakpoints occur. After estimating the breakpoints, the climatic effects associated with these breakpoints were investigated. The linear and time series modeling results indicated that the contribution of climatic factors to radial growth of Eucalyptus clones was small. Most of the variation in radial growth was explained by the age of the trees. Consequently, this thesis also investigated the appropriate functional relationship between radial growth and age. In particular, nonlinear growth models were used to model the radial growth process. The investigated growth curve models were those which included the maximum radius and the age at which the radial growth rate is largest as some of the parameters. The maximum growth rate was calculated from the estimated model of each clone. The results indicated that the two clones reach the maximum growth rate at different times.
In particular, the two clones reach the maximum growth rates at around 368 and 376 days, respectively. Furthermore, the maximum radius was found to be different for the two clones.
Item Modelling acute HIV infection using longitudinally measured biomarker data including informative drop-out. (2009) Werner, Lise.; Mwambi, Henry G. Background. Numerous methods have been developed to model longitudinal data. In HIV/AIDS studies, HIV markers, CD4+ count and viral load are measured over time. Informative drop-out and the lower detection limit of viral load assays can bias the results and influence assumptions of the models. Objective. The objective of this thesis is to describe the evolution of HIV markers in an HIV-1 subtype C acutely infected cohort of women from the CAPRISA 002: Acute Infection Study in Durban, South Africa. They were HIV treatment naive. Methods. Various linear mixed models were fitted to both CD4+ count and viral load, adjusting for repeated measurements, as well as including intercept and slope as random effects. The rate of change in each of the HIV markers was assessed using weeks post infection as both a linear effect and piecewise linear effects. Left-censoring of viral load was explored to account for missing data resulting from undetectable measurements falling below the lower detection limit of the assay. Informative drop-out was addressed by using a method of joint modelling in which a longitudinal model and a survival model were linked using a latent Gaussian process. The progression of HIV markers was described and the effectiveness and usefulness of each modelling procedure was evaluated. Results. 62 women were followed for a median of 29 months post infection (IQR 20-39). Viral load increased sharply by 2.6 log copies/ml per week in the first 2 weeks of infection and decreased by 0.4 log copies/ml per week the next fortnight. It decreased at a slower rate thereafter.
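The piecewise linear time effect reported above can be illustrated with a small sketch. It assumes the two reported slopes act as constant weekly rates with a single knot at week 2, and it stops at week 4 because the abstract gives no slope for the later, slower decline. The function and constant names are hypothetical, and this is plain arithmetic on the reported rates, not the thesis's fitted mixed model.

```python
# Reported rates of change in viral load (log copies/ml per week).
RISE_PER_WEEK = 2.6    # weeks 0 to 2 post infection
FALL_PER_WEEK = -0.4   # weeks 2 to 4 post infection
KNOT_WEEK = 2.0        # assumed breakpoint between the two pieces
LAST_WEEK = 4.0        # no slope is reported beyond this point


def viral_load_change(weeks):
    """Cumulative change in log viral load since infection (hypothetical helper)."""
    if not 0.0 <= weeks <= LAST_WEEK:
        raise ValueError("slopes are only reported for weeks 0 to 4")
    weeks_rising = min(weeks, KNOT_WEEK)          # time spent on the first piece
    weeks_falling = max(weeks - KNOT_WEEK, 0.0)   # time spent on the second piece
    return RISE_PER_WEEK * weeks_rising + FALL_PER_WEEK * weeks_falling


if __name__ == "__main__":
    for w in (1.0, 2.0, 3.0, 4.0):
        print(w, round(viral_load_change(w), 2))
```

A single linear slope over the same four weeks would average the sharp rise and the subsequent fall into one number, which is why a linear time effect is less informative here than the piecewise form.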
Similarly, CD4+ count fell in the first 2 weeks by 4.4 square root cells/µl per week, then recovered slightly only to decrease again. Left-censoring was unnecessary in this acute infection cohort as few viral load measures were below the detection limit and it provided no improvement in model fit. Conclusion. Piecewise linear effects proved to be useful in quantifying the rate at which the HIV markers progress during the first few weeks of HIV infection, whereas modelling time as a linear effect was not very meaningful. Modelling HIV markers jointly with informative drop-out is shown to be necessary to account for the missing data incurred from participants leaving the study to initiate ARV treatment. If this drop-out is ignored, CD4+ count is estimated to be higher than it actually is.
Item Time series modelling with application to South African inflation data (2009) Chinomona, Amos. The research is based on financial time series modelling with special application
Item Linear model diagnostics and measurement error (2010-09-07) Siverpersad, Ishara. The general linear model, the weighted linear model, and the generalized linear model are presented in detail. Diagnostic tools for the linear models are considered. In general the standard analysis for linear models does not account for measurement error.
Item Longitudinal survey data analysis. (2006) Nasirumbi, Pamela Opio.; Zewotir, Temesgen Tenaw. To investigate the effect of environmental pollution on the health of children in the Durban South Industrial Basin (DSIB) due to its proximity to industrial activities, 233 children from five primary schools were considered. Three of these schools were located in the south of Durban while the other two were in the northern residential areas that were closer to industrial activities. Data collected included the participants' demographic, health, occupational, social and economic characteristics.
In addition, environmental information was monitored throughout the study, specifically measurements of the levels of some ambient air pollutants. The objective of this thesis is to investigate which of these factors had an effect on the lung function of the children. In order to achieve this objective, different sample survey data analysis techniques are investigated. These include the design-based and model-based approaches. The nature of the survey data finally leads to the longitudinal mixed model approach. The multicollinearity between the pollutant variables leads to the fitting of two separate models: one with the peak counts as the independent pollutant measures and the other with the 8-hour maximum moving average as the independent pollutant variables. In the selection of the fixed-effects structure, a scatter-plot smoother known as the loess fit is applied to the individual profile plots of the response variable. The random effects and the residual effect are assumed to have different covariance structures. The unstructured (UN) covariance structure is used for the random effects, while the compound symmetric (CS) covariance structure is selected, using the Akaike information criterion (AIC), as appropriate for the residual effects. To check the model fit, the profiles of the fitted and observed values of the dependent variables are compared graphically. The data are also characterized by the problem of intermittent missingness. The type of missingness is investigated by applying a modified logistic regression model missing at random (MAR) test. The results indicate that school location, sex and weight are the significant factors for the children's respiratory conditions. More specifically, the children in schools located in the northern residential areas are found to have poor respiratory conditions as compared to those in the Durban-South schools.
In addition, poor respiratory conditions are also identified for overweight children.
Item Modeling the factors affecting cereal crop yields in the Amhara National Regional State of Ethiopia. (2010) Mohammed, Yunus Hussien.; Ramroop, Shaun.; Zewotir, Temesgen. The agriculture sector in Amhara National Regional State is characterised by producing cereal crops, which occupy the largest percentage (84.3%) of the total crop area cultivated in the region. As a result, it is imperative to investigate which factors influence the yields of cereal crops, particularly in relation to the five major types of cereals in the study region, namely barley, maize, sorghum, teff and wheat. Therefore, in this thesis, using data collected by the Central Statistical Agency of Ethiopia, various statistical methods such as multiple regression analysis were applied to investigate the factors which influence the mean yields of the major cereal crops. Moreover, a mixed model analysis was implemented to assess the effects associated with the sampling units (enumeration areas), and a cluster analysis to classify the region into similar groups of zones. The multiple regression results indicate that the mean yields of all the studied cereals are affected by zone, fertilizer type and crop damage effects. In addition to this, barley is affected by extension programme; maize crop by seed type, irrigation, and protection of soil erosion; sorghum and teff crops are additionally affected by crop prevention method, extension programme, protection of soil erosion, and gender of the household head; and wheat crop by crop prevention methods, extension programme and gender of the household head. The results from the mixed model analysis were entirely different from the regression results due to the observed dependencies of the cereals' mean yields on the sampling unit.
Based on the hierarchical cluster analysis, five groups of classes (clusters) were identified which seem to be in agreement with the geographical neighbouring positions of the locations and the similarity of the type of crops produced.
Item Nonlinear models for neural networks. (2000) Brittain, Susan.; Haines, Linda Margaret. The most commonly used applications of hidden-layer feed-forward neural networks are to fit curves to regression data or to provide a surface from which a classification rule can be found. From a statistical viewpoint, the principle underpinning these networks is that of nonparametric regression with sigmoidal curves being located and scaled so that their sum approximates the data well, and the underlying mechanism is that of nonlinear regression, with the weights of the network corresponding to parameters in the regression model, and the objective function implemented in the training of the network defining the error structure. The aim of the present study is to use these statistical insights to critically appraise the reliability and the precision of the predicted outputs from a trained hidden-layer feed-forward neural network.
Item Aspects of categorical data analysis. (1998) Govender, Yogarani.; Matthews, Glenda Beverley. The purpose of this study is to investigate and understand data which are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. The concept of sparseness in a contingency table refers to a table where many cells have small frequencies. Previous research findings showed that incorrect results were obtained in the analysis of sparse tables. Hence, attention is focussed on the effect of sparseness on modelling and analysis of categorical data in this dissertation. Cressie and Read (1984) suggested a versatile alternative, the power divergence statistic, to statistics proposed in the past.
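For readers unfamiliar with the power-divergence family, the following is a minimal sketch of the statistic in its usual textbook form, 2nI^λ = (2 / (λ(λ+1))) Σ Oᵢ[(Oᵢ/Eᵢ)^λ − 1]. The function name and the example counts are hypothetical and are not taken from the dissertation, which reports its own computations in SAS. Setting λ = 1 recovers Pearson's X² when observed and expected totals agree, and the limit λ → 0 gives the likelihood-ratio statistic G².

```python
import math


def power_divergence(observed, expected, lam):
    """Power-divergence goodness-of-fit statistic for matched cell counts.

    lam = 1 gives Pearson's X^2 (when totals agree); lam = 0 is handled as
    the limiting likelihood-ratio statistic G^2. The limit lam = -1 is not
    handled in this sketch, and zero observed counts with negative lam will
    raise ZeroDivisionError.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if lam == -1:
        raise ValueError("lam = -1 requires a separate limiting form")
    if lam == 0:
        # Limiting case: G^2 = 2 * sum O * log(O / E); empty cells contribute 0.
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1.0))


if __name__ == "__main__":
    obs = [10, 20, 30]          # illustrative counts only
    exp = [20.0, 20.0, 20.0]    # equal expected frequencies, same total
    for lam in (0.0, 2.0 / 3.0, 1.0):
        print(lam, power_divergence(obs, exp, lam))
```

Treating λ as a single tuning parameter is what makes the family convenient for studying sparse tables: the same code path can be evaluated across several values of λ and compared cell by cell.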
This study includes a detailed discussion of the power-divergence goodness-of-fit statistic with areas of interest covering a review of the minimum power divergence estimation method and evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews of the accuracy, efficiency and performance of the power-divergence family of statistics under large and small sample cases are presented. Statistical applications of the power-divergence statistic have been conducted in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics which bear relevance to the sparse multinomial case are discussed. They include Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics as well as Koehler and Larntz's tests for log-linear models. On addressing contradictions for the sparse sample case under asymptotic conditions and an increase in sample size, discussions are provided on Simonoff's use of nonparametric techniques to find the variances as well as his adoption of the jackknife and bootstrap techniques.
Item Application of statistical multivariate techniques to wood quality data. (2010) Negash, Asnake Worku.; Mwambi, Henry Godwell.; Zewotir, Temesgen Tenaw. Sappi is one of the leading producers and suppliers of Eucalyptus pulp to the world market. It is also a great contributor to South Africa's economy in terms of employment opportunities for rural people through its large plantations and export earnings. Pulp mills' production of quality wood pulp is mainly affected by the supply of non-uniform raw material, namely Eucalyptus trees supplied from various plantations. Improvement in the quality of the pulp depends directly on improvement in the quality of the raw materials.
Knowing the factors which affect pulp quality is important for tree breeders. Thus, the main objective of this research is first to determine which of the anatomical, chemical and pulp properties of wood are significant factors that affect pulp properties, namely viscosity, brightness and yield. Secondly, the study will also investigate the effect of differences in plantation location, site quality, tree age and species type on the viscosity, brightness and yield of wood pulp. In order to meet the above-mentioned objectives, data for this research were obtained from Sappi’s P186 trial and two other published reports from the Council for Scientific and Industrial Research (CSIR). Principal component analysis, cluster analysis, multiple regression analysis and multivariate linear regression analysis were used. These statistical analysis methods were used to carry out mean comparison of pulp quality measurements based on viscosity, brightness and yield of trees of different age, location, site quality and hybrid type. The results indicate that these four factors (age, location, site quality and hybrid type) and some anatomical and chemical measurements (fibre lumen diameter, kappa number, total hemicelluloses and total lignin) have a significant effect on pulp quality measurements.
Item A study of student academic performance at the University of Natal. (1994) Naidoo, Robert.; Murray, Michael. In this dissertation a study will be made of university performance in the Science Faculty of the University of Natal, Durban. In particular, we will develop models that can be used to predict the success rate of a student based on his or her matriculation results. These models will prove useful for selecting students to universities. They may also be used to assist sponsors, bursars and donors in allocating funds to deserving students.
In addition, these models may be used to identify students who might experience difficulties in their studies at university.
Item Statistical methods for longitudinal binary data structure with applications to antiretroviral medication adherence. (2010) Maqutu, Dikokole.; Zewotir, Temesgen Tenaw. Longitudinal data tend to be correlated and hence pose a challenge in the analysis, since the correlation has to be accounted for to obtain valid inference. We study various statistical methods for such correlated longitudinal binary responses. These models can be grouped into five model families, namely, marginal, subject-specific, transition, joint and semi-parametric models. Each one of the models has its own strengths and weaknesses. Application of these models is carried out by analyzing data on patients’ adherence status to highly active antiretroviral therapy (HAART). One other complicating issue with the HAART adherence data is missingness. Although some of the models are flexible in handling missing data, they make certain assumptions about missing data mechanisms, the most restrictive being missing completely at random (MCAR). The test for MCAR revealed that dropout did not depend on the previous outcome. A logistic regression model was used to identify predictors for the patients’ first month’s adherence status. A marginal model was then fitted using generalized estimating equations (GEE) to identify predictors of long-term HAART adherence. This provided marginal population-based estimates, which are important from a public health perspective. We further explored the subject-specific effects that are unique to a particular individual by fitting a generalized linear mixed model (GLMM). The GLMM was also used to assess the association structure of the data. To assess whether the current optimal adherence status of a patient depended on the previous adherence measurements (history) in addition to the explanatory variables, a transition model was fitted.
Moreover, a joint modeling approach was used to investigate the joint effect of the predictor variables on both the HAART adherence status of patients and the duration between successive visits. Assessing the association between the two outcomes was also of interest. Furthermore, longitudinal trajectories of observed data may be very complex, especially when dealing with practical applications, and as such parametric statistical models may not be flexible enough to capture the main features of the longitudinal profiles, so a semiparametric approach was adopted. Specifically, generalized additive mixed models were used to model the effect of time as well as interactions associated with time non-parametrically.
Item Forecasting the monthly electricity consumption of municipalities in KwaZulu-Natal. (1997) Walton, Alison Norma.; Haines, Linda Margaret. Eskom is the major electricity supplier in South Africa and medium term forecasting within the company is a critical activity to ensure that enough electricity is generated to support the country's growth, that the networks can supply the electricity and that the revenue derived from electricity consumption is managed efficiently. This study investigates the most suitable forecasting technique for predicting monthly electricity consumption, one year ahead, for four major municipalities within KwaZulu-Natal.
Item The application of multistate Markov models to HIV disease progression. (2011) Reddy, Tarylee.; Mwambi, Henry Godwell. Survival analysis is a well developed area which explores time to single event analysis. In some cases, however, such methods may not adequately capture the disease process as the disease progression may involve intermediate events of interest. Multistate models incorporate multiple events or states. This thesis proposes to demystify the theory of multistate models through an application based approach.
We present the key components of multistate models, relevant derivations, model diagnostics and techniques for modeling the effect of covariates on transition intensities. The methods that are developed in the thesis are applied to HIV and TB data partly sourced from CAPRISA and the HPP programmes at the University of KwaZulu-Natal. HIV progression is investigated through the application of a five state Markov model with reversible transitions such that state 1: CD4 count ≥ 500, state 2: 350 ≤ CD4 count < 500, state 3: 200 ≤ CD4 count < 350, state 4: CD4 count < 200 and state 5: ARV initiation. The mean sojourn time in each state and transition probabilities are presented as well as the effect of covariates, namely age, gender and baseline CD4 count, on transition rates. A key finding, consistent with previous research, is that the rate of decline in CD4 count tends to decrease at lower levels of the marker. Further, patients enrolling with a CD4 count less than 350 had a far lower chance of immune recovery and a substantially higher chance of immune deterioration compared to patients with a higher CD4 count. We noted that older patients tend to progress more rapidly through the disease than younger patients.
Item The statistical analyses of a complex survey of banana pests and diseases in Uganda. (1999) Ngoya, Japheth N.; Clarke, G. Peter Y. No abstract available.
Item Modelling longitudinal counts data with application to recurrent epileptic seizure events. (2010) Ngulube, Phathisani.; Mwambi, Henry G.; Ramroop, Shaun. The objective of this thesis is to explore different approaches to modelling clustered correlated data in the form of repeated or longitudinal count data leading to a replicated Poisson process. The specific application is to repeated epileptic seizure time-to-event data. Two main classes of models will be considered in this thesis. These are the marginal and the subject-specific or cluster-specific effects models.
Under the marginal class of models the generalized estimating equations approach due to Liang and Zeger (1986) is first considered. These models are concerned with population-averaged effects, as opposed to subject-specific models, which include random subject-specific effects such that multiple or repeated outcomes within a subject or cluster are assumed to be independent conditional on the subject-specific effects. Finally, we consider a distinct class of marginal models with three common variants, namely the approaches due to Andersen and Gill (1982), Wei et al. (1989) and Prentice et al. (1981).
Item An application of some inventory control techniques. (1992) Samuels, Carol Anne.; Moolman, W. H.; Ryan, K. C. No abstract available.