Statistics
Permanent URI for this community: https://hdl.handle.net/10413/6771
Browsing Statistics by Issue Date
Now showing 1–20 of 158
Item: An application of some inventory control techniques. (1992) Samuels, Carol Anne.; Moolman, W. H.; Ryan, K. C.
No abstract available.

Item: A study of student academic performance at the University of Natal. (1994) Naidoo, Robert.; Murray, Michael.
In this dissertation a study will be made of university performance in the Science Faculty of the University of Natal, Durban. In particular, we will develop models that can be used to predict the success rate of a student based on his or her matriculation results. These models will prove useful for selecting students for admission to universities. They may also be used to assist sponsors, bursars and donors in allocating funds to deserving students. In addition, these models may be used to identify students who might experience difficulties in their studies at university.

Item: Forecasting the monthly electricity consumption of municipalities in KwaZulu-Natal. (1997) Walton, Alison Norma.; Haines, Linda Margaret.
Eskom is the major electricity supplier in South Africa, and medium-term forecasting within the company is a critical activity to ensure that enough electricity is generated to support the country's growth, that the networks can supply the electricity, and that the revenue derived from electricity consumption is managed efficiently. This study investigates the most suitable forecasting technique for predicting monthly electricity consumption, one year ahead, for four major municipalities within KwaZulu-Natal.

Item: Aspects of categorical data analysis. (1998) Govender, Yogarani.; Matthews, Glenda Beverley.
The purpose of this study is to investigate and understand data which are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. The concept of sparseness in a contingency table refers to a table in which many cells have small frequencies. Previous research findings showed that incorrect results were obtained in the analysis of sparse tables. Hence, attention is focused in this dissertation on the effect of sparseness on the modelling and analysis of categorical data. Cressie and Read (1984) suggested a versatile alternative to the statistics proposed in the past: the power-divergence statistic. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic, covering a review of the minimum power-divergence estimation method and the evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews of the accuracy, efficiency and performance of the power-divergence family of statistics under large- and small-sample cases are presented. Statistical applications of the power-divergence statistic have been conducted in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976). Other goodness-of-fit statistics which bear relevance to the sparse multinomial case are discussed. They include Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics, as well as Koehler and Larntz's tests for log-linear models.
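For reference, the power-divergence family reviewed in this abstract has the following standard form (generic textbook notation for observed counts O_i and expected counts E_i, not reproduced from the dissertation itself):

```latex
% Power-divergence statistic of Cressie and Read (1984):
% lambda = 1 gives Pearson's X^2; lambda -> 0 gives the likelihood
% ratio statistic G^2; lambda = 2/3 is the authors' recommended choice.
\[
  2nI^{\lambda} \;=\; \frac{2}{\lambda(\lambda+1)}
  \sum_{i} O_i \left[ \left( \frac{O_i}{E_i} \right)^{\lambda} - 1 \right],
  \qquad \lambda \in \mathbb{R}.
\]
```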
On addressing contradictions for the sparse sample case under asymptotic conditions and an increase in sample size, discussions are provided on Simonoff's use of nonparametric techniques to find the variances, as well as his adoption of the jackknife and bootstrap techniques.

Item: The statistical analyses of a complex survey of banana pests and diseases in Uganda. (1999) Ngoya, Japheth N.; Clarke, G. Peter Y.
No abstract available.

Item: Nonlinear models for neural networks. (2000) Brittain, Susan.; Haines, Linda Margaret.
The most common applications of hidden-layer feed-forward neural networks are to fit curves to regression data or to provide a surface from which a classification rule can be found. From a statistical viewpoint, the principle underpinning these networks is that of nonparametric regression, with sigmoidal curves being located and scaled so that their sum approximates the data well. The underlying mechanism is that of nonlinear regression, with the weights of the network corresponding to parameters in the regression model, and the objective function implemented in the training of the network defining the error structure. The aim of the present study is to use these statistical insights to critically appraise the reliability and the precision of the predicted outputs from a trained hidden-layer feed-forward neural network.

Item: Spatial analysis and efficiency of systematic designs in intercropping experiments. (2002) Wandiembe, Symon Peter.; Njuho, Peter Mungai.
In studies involving intercropping plant populations, the main interest is to locate the position of the maximum response or to study the response pattern. Such studies normally require many plant population levels. Thus, designs such as spacing systematic designs that minimise experimental land area are desired. Randomised block designs may not perform well, as they allow few population levels, which may not span the maximum or enable exploration of other features of the response surface. However, the lack of complete randomisation in systematic designs may imply spatial variability (large-scale and small-scale variation, i.e. trend and spatial dependence) in the observations. There is no correct statistical method laid out for the analysis of data from such designs. Given that spacing systematic designs are not well explored in the literature, the main thrusts of this study are twofold: to explore the use of spatial modelling techniques in analysing and modelling data from systematic designs, and to evaluate the efficiency of systematic designs used in intercropping experiments. Three classes of models for trend and error modelling are explored and introduced. These include spatial linear mixed models, semi-parametric mixed models and beta-hat models incorporating spatial variability. The reliability and precision of these methods are demonstrated. The relative efficiency of systematic designs to the completely randomised design is evaluated. The analysis of data from systematic designs is shown to be easily implemented. Measures of efficiency that include …

Item: Longitudinal survey data analysis. (2006) Nasirumbi, Pamela Opio.; Zewotir, Temesgen Tenaw.
To investigate the effect of environmental pollution on the health of children in the Durban South Industrial Basin (DSIB), given its proximity to industrial activities, 233 children from five primary schools were considered. Three of these schools were located in the south of Durban, while the other two were in northern residential areas that were closer to industrial activities.
Data collected included the participants' demographic, health, occupational, social and economic characteristics. In addition, environmental information was monitored throughout the study, specifically measurements of the levels of some ambient air pollutants. The objective of this thesis is to investigate which of these factors had an effect on the lung function of the children. In order to achieve this objective, different sample survey data analysis techniques are investigated, including the design-based and model-based approaches. The nature of the survey data finally leads to the longitudinal mixed model approach. The multicollinearity between the pollutant variables leads to the fitting of two separate models: one with the peak counts as the independent pollutant measures, and the other with the 8-hour maximum moving average as the independent pollutant variables. In the selection of the fixed-effects structure, a scatter-plot smoother known as the loess fit is applied to the individual profile plots of the response variable. The random effects and the residual effect are assumed to have different covariance structures. The unstructured (UN) covariance structure is used for the random effects, while, using the Akaike information criterion (AIC), the compound symmetric (CS) covariance structure is selected as appropriate for the residual effects. To check the model fit, the profiles of the fitted and observed values of the dependent variables are compared graphically. The data are also characterized by the problem of intermittent missingness. The type of missingness is investigated by applying a modified logistic regression missing at random (MAR) test. The results indicate that school location, sex and weight are the significant factors for the children's respiratory conditions. More specifically, the children in schools located in the northern residential areas are found to have poorer respiratory conditions than those in the Durban South schools. In addition, poor respiratory conditions are also identified for overweight children.

Item: Analysis of time-to-event data including frailty modeling. (2006) Phipson, Belinda.; Mwambi, Henry Godwell.
There are several methods of analysing time-to-event data. These include nonparametric approaches such as Kaplan-Meier estimation and parametric approaches such as regression modeling. Parametric regression modeling involves specifying the distribution of the survival time of the individuals, which is commonly chosen to be either exponential, Weibull, log-normal, log-logistic or gamma. Another well-known model that does not require assumptions about the hazard function is the Cox proportional hazards model. However, there may be deviations from proportional hazards which may be explained by unaccounted random heterogeneity. In the early 1980s, a series of studies raised concern about the possible bias in the estimated treatment effect when important covariates are omitted. Other problems may be encountered with the traditional proportional hazards model when there is a possibility of correlated data, for instance when there is clustering. A method of handling these types of problems is frailty modeling. Frailty modeling is a method whereby a random effect is incorporated in the Cox proportional hazards model. While this concept is fairly simple to understand, the method of estimation of the fixed and random effects becomes complicated.
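To make the idea concrete, a shared frailty model in standard notation (an illustrative form, not taken from the thesis itself) multiplies the Cox hazard by a cluster-level random effect:

```latex
% Shared gamma frailty model: Z_i is an unobserved random effect for
% cluster i, acting multiplicatively on the baseline hazard h_0(t).
\[
  h_{ij}(t \mid Z_i) \;=\; Z_i \, h_0(t)
  \exp\!\left( \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} \right),
  \qquad Z_i \sim \mathrm{Gamma}\!\left( \tfrac{1}{\theta}, \tfrac{1}{\theta} \right),
\]
% so that E(Z_i) = 1 and Var(Z_i) = theta; theta = 0 recovers the
% ordinary Cox proportional hazards model.
```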
Various methods have been explored by several authors, including the Expectation-Maximisation (EM) algorithm, the penalized partial likelihood approach, Markov chain Monte Carlo (MCMC) methods, the Monte Carlo EM approach and different methods using the Laplace approximation. The lack of available software is problematic for fitting frailty models. These models are usually computationally intensive and may have long processing times. However, frailty modeling is an important aspect to consider, particularly if the Cox proportional hazards model does not adequately describe the distribution of survival time.

Item: Factors affecting the health status of the people of Lesotho. (2007) Moeti, Abiel.
Lesotho, like any other country of the world, is faced with the task of improving the …

Item: Inference from finite population sampling : a unified approach. (2007) Hargovan, Kashmira Ansuyah.; Arnab, Raghunath.; North, Delia Elizabeth.
In this thesis, we have considered the inference aspects of sampling from a finite population. There are significant differences between traditional statistical inference and finite population sampling inference. In the case of finite population sampling, the statistician is free to choose his own sampling design and is not confined to independent and identically distributed observations, as is often the case with traditional statistical inference. We look at the correspondence between the sampling design and the sampling scheme. We also look at methods used for drawing samples. The non-existence theorems (Godambe (1955), Hanurav and Basu (1971)) are also discussed. Since the minimum variance unbiased estimator does not exist for finite populations, a number of estimators need to be considered for estimating the same parameter. We discuss the admissibility properties of estimators and the use of sufficient statistics and the Rao-Blackwell theorem for the improvement of inefficient, inadmissible estimators. Sampling strategies using auxiliary information relating to the population need to be used, as no single sampling strategy can provide an efficient estimator of the population parameter in all situations. Finally, a few well-known sampling strategies are studied and compared under a superpopulation model.

Item: Analysis of longitudinal binary data : an application to a disease process. (2008) Ramroop, Shaun.; Mwambi, Henry Godwell.
The analysis of longitudinal binary data can be undertaken using any of three families of models, namely marginal, random effects and conditional models. Each family of models has its own respective merits and demerits. The models are applied in the analysis of binary longitudinal childhood disease data, namely the Respiratory Syncytial Virus (RSV) data collected from a study in Kilifi, coastal Kenya. The marginal model was fitted using generalized estimating equations (GEE). The random effects models were fitted using 'Proc GLIMMIX' and 'NLMIXED' in SAS, and then again in Genstat. Because the data are of a state-transition type with the Markovian property, the conditional model was used to capture the dependence of the current response on the previous response, which is known as the history. The data set has two main complicating issues. Firstly, there is the question of developing a stochastically based probability model for the disease process. In the current work we use direct likelihood and generalized linear modelling (GLM) approaches to estimate important disease parameters.
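As a minimal sketch of the kind of marginal GEE fit described above, here is an illustrative Python/statsmodels version rather than the SAS code used in the thesis; the file and column names (rsv_longitudinal.csv, infected, age, season, child_id) are hypothetical placeholders:

```python
# Hypothetical illustration of a marginal model for repeated binary
# responses via GEE; data source and column names are invented.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("rsv_longitudinal.csv")  # hypothetical file

# An exchangeable working correlation accounts for repeated
# measurements on the same child.
model = smf.gee(
    "infected ~ age + season",
    groups="child_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())
```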
The force of infection and the recovery rate are the key parameters of interest. The findings of the current work are consistent and in agreement with those in White et al. (2003). The time dependence of the RSV disease is also highlighted in the thesis by fitting monthly piecewise models for both parameters. Secondly, there is the issue of incomplete data in the analysis of longitudinal data. Commonly used methods to analyze incomplete longitudinal data include the well-known available case (AC) analysis and last observation carried forward (LOCF). However, these methods rely on strong assumptions, such as missing completely at random (MCAR) for AC analysis and an unchanging profile after dropout for LOCF analysis. Such assumptions are too strong to hold in general. In recent years, methods of analyzing incomplete longitudinal data under weaker assumptions, such as missing at random (MAR), have become available. Thus we make use of multiple imputation via chained equations, which requires the MAR assumption, and maximum likelihood methods, under which the missing data mechanism becomes ignorable as soon as it is MAR. We are therefore faced with the problem of incomplete, repeated, non-normal data, suggesting the use of at least the generalized linear mixed model (GLMM) to account for natural individual heterogeneity. The comparison of the parameter estimates obtained using the different methods of handling the dropout is strongly emphasized, in order to evaluate the advantages of the different methods and approaches. The survival analysis approach was also utilized to model the data, due to the presence of multiple events per subject and the time between these events.

Item: Time series modelling with application to South African inflation data. (2009) Chinomona, Amos.
The research is based on financial time series modelling with special application …

Item: Estimating risk determinants of HIV and TB in South Africa. (2009) Mzolo, Thembile.; Mwambi, Henry G.; Zuma, Khangelani.
It is on TB that HIV/AIDS has had its greatest adverse impact. People with TB who are infected with HIV are at increased risk of dying from TB rather than HIV. TB is the leading cause of death in HIV-positive individuals in South Africa. HIV is the driving factor that increases the risk of progression from latent TB to active TB. In South Africa, no coherent analysis of the risk determinants of HIV and TB has been done at the national level; this study seeks to mend that gap. This study is about estimating the risk determinants of HIV and TB, using the national household survey conducted by the Human Sciences Research Council in 2005. Since individuals from the same household and enumerator area are likely to be more alike in terms of risk of disease, or correlated among each other, GEEs are used to correct for this potential intraclass correlation. Disease occurrence and distribution are highly heterogeneous at the population, household and individual levels. In recognition of this fact, we propose to model this heterogeneity at the community level through GLMM and Bayesian hierarchical modelling approaches, with the enumerator area indicating the community effect. The results showed that HIV is driven by sex, age, race, education, health and condom use at sexual debut. Factors associated with TB are HIV status, sex, education, income and health. Factors that are common to both diseases are sex, education and health. The results showed that ignoring the intraclass correlation can result in biased estimates.
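In generic notation (not drawn from the study itself), the community-level heterogeneity described above corresponds to a random-intercept logistic GLMM:

```latex
% Random-intercept logistic GLMM: u_i is an unobserved effect for
% enumerator area i, inducing intraclass correlation among its members.
\[
  \operatorname{logit} \Pr(Y_{ij} = 1 \mid u_i)
  \;=\; \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_i,
  \qquad u_i \sim N(0, \sigma_u^2),
\]
% sigma_u^2 = 0 would mean no clustering, in which case an ordinary
% logistic regression (or GEE with an independence working
% correlation) would suffice.
```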
Inference drawn from the GLMM and Bayesian approaches provides some degree of confidence in the results. The positive correlation found at the enumerator area level for both HIV and TB indicates that interventions should be aimed at the area level rather than at the individual level.

Item: Applications of Levy processes in finance. (2009) Essay, Ahmed Rashid.; O'Hara, J. G.
The option pricing theory set forth by Black and Scholes assumes that the underlying asset can be modeled by geometric Brownian motion, with the Brownian motion being the driving force of uncertainty. Recent empirical studies, e.g. Dotsis, Psychoyios & Skiadopolous (2007) [17], suggest that the use of Brownian motion alone is insufficient to accurately describe the evolution of the underlying asset. A more realistic description of the underlying asset's dynamics would include random jumps in addition to the Brownian motion. The concept of including jumps in the asset price model leads us naturally to the concept of a Lévy process. Lévy processes serve as a building block for stochastic processes that include jumps in addition to Brownian motion. In this dissertation we first examine the structure and nature of an arbitrary Lévy process. We then introduce the stochastic integral for Lévy processes as well as the extended version of Itô's lemma, and we identify exponential Lévy processes that can serve as Radon-Nikodým derivatives in defining new probability measures. Equipped with our knowledge of Lévy processes, we then implement them in a financial context, with the Lévy process serving as the driving source of uncertainty in some stock price model. In particular we look at jump-diffusion models such as Merton's (1976) [37] jump-diffusion model and the jump-diffusion model proposed by Kou and Wang (2004) [30]. As the Lévy processes we consider have more than one source of randomness, we are faced with the difficulty of pricing options in an incomplete market. The options that we consider are mainly European in nature, where exercise can only occur at maturity. In addition to the vanilla calls and puts, we independently derive a closed-form solution for an exchange option under Merton's jump-diffusion model, making use of conditioning arguments and stochastic integral representations. We also examine some exotic options under the Kou and Wang model, such as barrier options and lookback options, where the solution for the option price is derived in terms of Laplace transforms. We then develop the Kou and Wang model to include only positive jumps; under this revised model we compute the value of a perpetual put option along with the optimal exercise point. Keywords: derivative pricing, Lévy processes, exchange options, stochastic integration.

Item: D-optimal designs for drug synergy. (2009) Kabera, Muregancuro Gaëtan.; Ndlovu, Principal.; Haines, Linda Margaret.
This thesis is focused on the construction of optimal designs for detecting drug interaction using the two-variable binary logistic model. Two specific models are considered: (1) the binary two-variable logistic model without interaction, and (2) the binary two-variable logistic model with interaction. The two explanatory variables are assumed to be doses of two drugs that may or may not interact when jointly administered to subjects. The main objective of the thesis is to construct the optimal designs algebraically. However, numerical computations are used for constructing optimal designs in cumbersome cases.
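As an illustration of the criterion being optimized, a small numerical sketch follows; the parameter values, design points and weights are invented, and this is not the algebraic construction developed in the thesis:

```python
# Hypothetical sketch: evaluating the D-criterion for a candidate
# design under the two-variable binary logistic model (no interaction).
import numpy as np

beta = np.array([-1.0, 0.5, 0.5])  # assumed values of (b0, b1, b2)

def information_matrix(points, weights, beta):
    """M(xi) = sum_k w_k p_k (1 - p_k) f(x_k) f(x_k)^T,
    with regressor f(x) = (1, x1, x2) for dose pair (x1, x2)."""
    M = np.zeros((3, 3))
    for (x1, x2), w in zip(points, weights):
        f = np.array([1.0, x1, x2])
        p = 1.0 / (1.0 + np.exp(-f @ beta))  # logistic response
        M += w * p * (1.0 - p) * np.outer(f, f)
    return M

# A candidate 3-point design: dose pairs (x1, x2) with equal weights.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
weights = [1 / 3, 1 / 3, 1 / 3]

# Maximizing det M(xi) over designs is equivalent to minimizing the
# determinant of the asymptotic variance-covariance matrix.
M = information_matrix(points, weights, beta)
print("D-criterion det M(xi) =", np.linalg.det(M))
```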
The problem of constructing optimal designs is to allocate weights to specific points of the design space in such a way that the information associated with the model parameters is maximized and the variances of the mean responses are minimized. Specifically, the D-optimality criterion discussed in this thesis minimizes the determinant of the asymptotic variance-covariance matrix of the estimates of the model parameters. The number of support points of the D-optimal designs for the two-variable binary logistic model without interaction varies from 3 to 6. Support points are equally weighted only in the case of the 3-point designs and in some special cases of the 4-point designs. The number of support points of the D-optimal designs for the two-variable binary logistic model with interaction varies from 4 to 8. Support points are equally weighted only in the case of the 4-point designs and in some special cases of the 8-point designs. Numerous examples are given to illustrate the theoretical results.

Item: Modelling CD4+ count over time in HIV positive patients initiated on HAART in South Africa using linear mixed models. (2009) Yende, Nonhlanhla.; Mwambi, Henry G.
HIV is among the most infectious and pathogenic diseases, with a high mortality rate. The spread of HIV is influenced by several individual-based epidemiological factors such as age, gender, mobility, sexual partner profile and the presence of sexually transmitted infections (STIs). CD4+ count over time provided the first surrogate marker of HIV disease progression and is currently used for the clinical management of HIV-positive patients. The CD4+ count, as a key disease marker, is repeatedly measured among those individuals who test HIV-positive to monitor the progression of the disease, since it is known that HIV/AIDS is a long-wave event. This gives rise to what is commonly known as longitudinal data. The aim of this project is to determine whether the patients' weight, baseline age, sex, viral load and clinic site influence the rate of change in CD4+ count over time. We use data on patients who commenced highly active antiretroviral therapy (HAART) at the Centre for the AIDS Programme of Research in South Africa (CAPRISA) in the AIDS Treatment Project (CAT) between June 2004 and September 2006, including two years of follow-up for each patient. Analysis was done using linear mixed model methods for longitudinal data. The results showed that a larger increase in CD4+ count over time was observed in females and in younger individuals. However, upon fitting baseline log viral load in the model instead of the log viral load at all visits, a larger increase in CD4+ count was observed in females and in individuals who were younger, had a higher baseline log viral load and a lower weight.

Item: Stochastic volatility effects on defaultable bonds. (2009) Mkize, Thembisile.; O'Hara, John Gerard.
We study the effects of stochastic volatility on defaultable bonds using the first-passage structural approach. In this approach, Black and Cox (1976) argued that default can happen at any time. This led to the development of the first-passage model, in which a firm (company) defaults when its value falls to a barrier. In the first-passage model, the firm's debt is considered to be a single pure discount bond, and default occurs only if the firm value falls below the face value of the bond at maturity. Here the firm's debt can be viewed as a portfolio composed of a risk-free bond and a short put option on the value of the firm.
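A brief sketch of that bond-minus-put decomposition under the classical constant-volatility Merton setup; all numerical inputs are invented example values:

```python
# Hypothetical illustration: risky debt as a risk-free bond minus a
# Black-Scholes put on the firm value (constant volatility).
import numpy as np
from scipy.stats import norm

def bs_put(V0, K, r, sigma, T):
    """Black-Scholes price of a European put on the firm value."""
    d1 = (np.log(V0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return K * np.exp(-r * T) * norm.cdf(-d2) - V0 * norm.cdf(-d1)

# Invented example values: firm value, debt face value, rate,
# volatility and maturity.
V0, K, r, sigma, T = 120.0, 100.0, 0.05, 0.25, 1.0
riskfree_bond = K * np.exp(-r * T)
debt = riskfree_bond - bs_put(V0, K, r, sigma, T)

# Credit spread implied by the default risk.
spread = -np.log(debt / K) / T - r
print(f"debt value = {debt:.4f}, credit spread = {spread:.4%}")
```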
The classic Black-Scholes-Merton model only considers a single liability, and solvency is tested at the maturity date, while the extended Black-Scholes-Merton model allows for default at any time before maturity, to cater for more complex capital structures, and was developed by Geske, Black and Cox, Leland, Leland and Toft, and others. In this work a review of the effect of stochastic volatility on defaultable bonds is given. In addition, a study from both the first-passage structural approach and the reduced-form approach is made. We also introduce symmetry analysis to study some of the equations that appear in option-pricing models. This approach is quite recent and has produced successful results. In this work we lay the foundation of this method. Keywords: stochastic volatility, defaultable bonds, Lie symmetries.

Item: Modelling acute HIV infection using longitudinally measured biomarker data including informative drop-out. (2009) Werner, Lise.; Mwambi, Henry G.
Background. Numerous methods have been developed to model longitudinal data. In HIV/AIDS studies, the HIV markers CD4+ count and viral load are measured over time. Informative drop-out and the lower detection limit of viral load assays can bias the results and violate the assumptions of the models. Objective. The objective of this thesis is to describe the evolution of HIV markers in an HIV-1 subtype C acutely infected cohort of women from the CAPRISA 002: Acute Infection Study in Durban, South Africa. The women were HIV treatment naive. Methods. Various linear mixed models were fitted to both CD4+ count and viral load, adjusting for repeated measurements and including intercept and slope as random effects. The rate of change in each of the HIV markers was assessed using weeks post infection both as a linear effect and as piecewise linear effects. Left-censoring of viral load was explored to account for missing data resulting from undetectable measurements falling below the lower detection limit of the assay. Informative drop-out was addressed by using a method of joint modelling in which a longitudinal and a survival model were linked through a latent Gaussian process. The progression of the HIV markers was described, and the effectiveness and usefulness of each modelling procedure was evaluated. Results. 62 women were followed for a median of 29 months post infection (IQR 20-39). Viral load increased sharply by 2.6 log copies/ml per week in the first 2 weeks of infection, decreased by 0.4 log copies/ml per week over the next fortnight, and decreased at a slower rate thereafter. Similarly, CD4+ count fell in the first 2 weeks by 4.4 square-root cells/μl per week, then recovered slightly, only to decrease again. Left-censoring was unnecessary in this acute infection cohort, as few viral load measures were below the detection limit, and it provided no improvement in model fit. Conclusion. Piecewise linear effects proved useful in quantifying the rate at which the HIV markers progress during the first few weeks of HIV infection, whereas modelling time as a single linear effect was not very meaningful. Modelling HIV markers jointly with informative drop-out is shown to be necessary to account for the missing data incurred when participants leave the study to initiate ARV treatment.
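A minimal sketch of a piecewise linear mixed model of the type described, assuming a single slope change at two weeks post infection; it uses Python's statsmodels rather than the software used in the thesis, and the file and column names (acute_infection.csv, log_viral_load, weeks, participant_id) are hypothetical:

```python
# Hypothetical sketch of a piecewise linear mixed model for log viral
# load with a slope change ("knot") at 2 weeks post infection.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("acute_infection.csv")  # hypothetical file

# Piecewise linear basis: slope before the knot, plus the extra
# time accumulated after it.
knot = 2.0  # weeks post infection
df["weeks_early"] = np.minimum(df["weeks"], knot)
df["weeks_late"] = np.maximum(df["weeks"] - knot, 0.0)

# Random intercept and random slope for each woman.
model = smf.mixedlm(
    "log_viral_load ~ weeks_early + weeks_late",
    data=df,
    groups="participant_id",
    re_formula="~weeks",
)
result = model.fit()
print(result.summary())
```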
In ignoring this drop-out, CD4+ count is estimated to be higher than it actually is.

Item: Modeling environmental factors affecting the growth of eucalypt clones. (2009) Chauke, Morries.; Zewotir, Temesgen Tenaw.; Ndlovu, Principal.; Grzeskowiak, Valerie.
Tree growth is influenced by environmental and genetic factors. The same tree growing in different areas will have different growth patterns. Trees with different genetic material, e.g. pine and Eucalyptus trees, growing under the same environmental conditions have different growth patterns. Plantation trees in South Africa are mainly used for pulp and paper production. Growth is an important economic factor in the pulp and paper industry: plantations with fast growth will be available for processing earlier than slow-growth plantations. Consequently, it is important to understand the role played by environmental factors, especially climatic factors, in tree growth. This thesis investigated the climatic effects on the radial growth of two Eucalyptus clones, using growth data collected daily over five years by Sappi. The general linear model and time series models were used to assess the effects of climate on the radial growth of the two clones. It was found that the two clones have similar overall growth patterns over time, but differ in growth rates. The growth pattern of the two clones appears to be characterized by substantial jumps/changes in growth rates over time. The times at which these jumps/changes in growth rate occur are referred to as the "breakpoints". The piecewise linear regression model was used to estimate when the breakpoints occur. After estimating the breakpoints, the climatic effects associated with them were investigated. The linear and time series modeling results indicated that the contribution of climatic factors to the radial growth of the Eucalyptus clones was small; most of the variation in radial growth was explained by the age of the trees. Consequently, this thesis also investigated the appropriate functional relationship between radial growth and age. In particular, nonlinear growth models were used to model the radial growth process. The growth curve models investigated were those which include the maximum radius and the age at which the radial growth rate is largest among their parameters. The maximum growth rate was calculated from the estimated model for each clone. The results indicated that the two clones reach the maximum growth rate at different times: around 368 and 376 days, respectively. Furthermore, the maximum radius was found to be different for the two clones.
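As an illustration of a growth curve parameterized by the maximum radius and the age of maximum growth rate, here is a sketch that fits a logistic curve to simulated data; the functional form and all numbers are assumptions for the example, not the model estimated in the thesis:

```python
# Hypothetical sketch: fitting a logistic growth curve whose
# parameters are the maximum radius A, the age t_star at which the
# growth rate peaks, and a rate constant k. Data are simulated.
import numpy as np
from scipy.optimize import curve_fit

def logistic_radius(t, A, k, t_star):
    """Radius at age t; the growth rate is largest at t = t_star
    and the radius approaches the asymptote A."""
    return A / (1.0 + np.exp(-k * (t - t_star)))

# Simulated daily radius measurements over five years.
rng = np.random.default_rng(0)
age = np.arange(1, 5 * 365 + 1, dtype=float)
radius = logistic_radius(age, A=90.0, k=0.006, t_star=370.0)
radius += rng.normal(0.0, 1.0, size=age.size)

params, _ = curve_fit(logistic_radius, age, radius, p0=[80.0, 0.01, 300.0])
A_hat, k_hat, t_star_hat = params
# For the logistic curve the maximum growth rate is A*k/4 at t_star.
print(f"max radius ~= {A_hat:.1f}, peak growth at day {t_star_hat:.0f}, "
      f"max rate ~= {A_hat * k_hat / 4:.3f} per day")
```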