Browsing by Author "Ramroop, Shaun."
Now showing 1 - 20 of 28
Analysis of longitudinal binary data: an application to a disease process. (2008) Ramroop, Shaun.; Mwambi, Henry Godwell.
Longitudinal binary data can be analysed using any of three families of models, namely marginal, random effects and conditional models, each with its own merits and demerits. The models are applied to longitudinal binary data on a childhood disease, namely Respiratory Syncytial Virus (RSV) data collected in a study in Kilifi, coastal Kenya. The marginal model was fitted using generalized estimating equations (GEE). The random effects models were fitted using PROC GLIMMIX and PROC NLMIXED in SAS, and again in Genstat. Because the data are state-transition data with the Markov property, the conditional model was used to capture the dependence of the current response on the previous response, known as the history. The data set raises two main complications. First, there is the question of developing a stochastically based probability model for the disease process. In the current work we use direct likelihood and generalized linear modelling (GLM) approaches to estimate important disease parameters, of which the force of infection and the recovery rate are the key ones. The findings are consistent with those of White et al. (2003). Time dependence in RSV infection is also highlighted by fitting monthly piecewise models for both parameters. Second, there is the issue of incomplete data in longitudinal analysis. Commonly used methods for incomplete longitudinal data include the well-known available case (AC) analysis and last observation carried forward (LOCF). However, these methods rely on strong assumptions: missing completely at random (MCAR) for AC analysis, and an unchanged profile after dropout for LOCF.
Such assumptions rarely hold in general. In recent years, methods for analysing incomplete longitudinal data under weaker assumptions, such as missing at random (MAR), have become available. We therefore use multiple imputation via chained equations, which requires the MAR assumption, and maximum likelihood methods, under which the missing data mechanism is ignorable whenever it is MAR. The problem is thus one of incomplete, repeated, non-normal data, which calls for at least a generalized linear mixed model (GLMM) to account for natural individual heterogeneity. Parameter estimates from the different methods for handling dropout are compared in detail to evaluate the advantages of each approach. A survival analysis approach was also used to model the data, given the multiple events per subject and the times between these events.

Bayesian spatial modeling of malnutrition and mortality among under-five children in sub-Saharan Africa. (2019) Adeyemi, Rasheed Alani.; Zewotir, Temesgen Tenaw.; Ramroop, Shaun.
The aim of this thesis is to develop and extend Bayesian spatial models and apply them to child health outcomes, with particular focus on malnutrition and mortality among under-five children. The ready availability of geo-referenced databases has stimulated a paradigm shift in methodological approaches to spatial analysis. This study reviewed the spatial methods and disease mapping models developed for areal (lattice) data. Observational data collected through complex survey designs and at geographical locations often violate the independence assumption of classical regression models.
By relaxing the restrictive linearity and normality assumptions of classical regression models, this study first developed a flexible semi-parametric spatial model that accommodates the usual fixed effects, nonlinear effects and a geographical component in a unified model. The approach was applied to the spatial patterns of childbirth outcomes in Nigeria. The study also addressed disease clustering, which is of interest to epidemiologists and public health officials. It then proposed a Bayesian hierarchical approach for Poisson count data and formulated a Poisson generalized linear mixed model (GLMM) for analyzing childhood mortality. The model addresses overdispersion and spatial dependence simultaneously by including the risk factors and random effects in a single model. The approach identified regions with elevated relative risk or clustering of high mortality and evaluated small-scale geographical disparities in sub-populations across regions. The study identified two further challenges in spatial data analysis: spatial autocorrelation and model misspecification. It then fitted geoadditive mixed (GAM) models to childhood anaemia data with responses from the exponential family of distributions (Gaussian, binary and multinomial). GAM models extend generalized linear mixed models by including splines for continuous covariate (or time) trends alongside the parametric terms. Lastly, the shared component model, originally developed for multiple disease mapping, was reviewed and adapted to the binary data at hand. A multivariate conditional autoregressive (MCAR) model was developed and applied to jointly analyze three child malnutrition indicators. The approach facilitated estimation of the conditional correlation between the diseases and assessment of their spatial association across regions and of geographical variation in the prevalence of each disease.
The spatial analysis presented in this thesis can inform health-care policy and resource allocation. The thesis contributes to methodological applications in the life sciences, environmental sciences, public health and agriculture, and expands the existing methods and tools for health impact assessment in public health studies. KEYWORDS: Conditional Autoregressive (CAR) model, Disease Mapping Models, Multiple Disease mapping, Health Geography, Ecology Models, Spatial Epidemiology, Childhood Health outcomes.

Factors affecting child mortality in Lesotho using 2009 and 2014 LDHS data. (2021) Mkhize, Nonduduzo Noxolo.; Melesse, Sileshi Fanta.; Mwambi, Henry Godwell.; Ramroop, Shaun.
The child mortality rate is an important indicator of social development, quality of life, welfare and the overall health of a society. In most countries, especially developing countries, child deaths are usually caused by communicable, preventable diseases and poor health. Global progress has been made in reducing under-five mortality since 1990: under-five deaths fell from 12.7 million in 1990 to approximately 6 million in 2015. By 2013, all regions except the developing regions of Sub-Saharan Africa, Central Asia, Southern Asia and Oceania had reduced the rate by 52% or more. Lesotho is a developing country with one of the highest rates of infant and child mortality. This study identifies the factors influencing child mortality in Lesotho using the Lesotho Demographic and Health Surveys (LDHS) of 2009 and 2014. Survey logistic regression (SLR), a model within the generalized linear model framework, was used to identify factors related to under-five mortality while accounting for the complex sampling design. Because the SLR model cannot account for variability arising from correlation between subjects in the same cluster and household, a generalized linear mixed model was then applied.
Lastly, to relax the normality and linearity assumptions of the parametric models, a semi-parametric generalized additive model was fitted to the data. Identifying the determinants of child mortality will help in planning intervention programs, guide policy makers in reducing child mortality and support progress towards the Millennium Development Goals (MDGs). This study aims to improve existing knowledge of child mortality in Lesotho by studying its determinants in detail and, drawing on previous studies, recommends intervention designs and policy measures. Overall, the findings showed that birth order, birth weight, age of the child, breastfeeding, wealth index, educational attainment, mother's age, type of place of residence and number of living children were the key determinants of under-five mortality in Lesotho. The study shows that policy makers should strengthen child health interventions to reduce under-five mortality, and the results can inform policy to control and reduce child mortality. The government should continually assess current programs in order to revise them and develop more appropriate ones.

Financial modelling of cryptocurrency: a case study of Bitcoin, Ethereum, and Dogecoin in comparison with JSE stock returns. (2022) Kaseke, Forbes.; Ramroop, Shaun.; Mwambi, Henry Godwell.
The emergence of cryptocurrency has caused a shift in financial markets. Although it was created as a medium of exchange, cryptocurrency has been shown to behave as an asset, with investors seeking to profit from it rather than use it for exchange. Like any other financial asset, cryptocurrency has its own distinct stylised facts. Studying these stylised facts allows better-suited models to be built to help investors make data-driven decisions.
The thesis used data on three leading cryptocurrencies, Bitcoin, Ethereum and Dogecoin, with Johannesburg Stock Exchange (JSE) data as a benchmark for comparison. The sample period was from 18 September 2017 to 27 May 2021. The goal was to study the stylised facts of cryptocurrencies and build models that capture them, including models for quantifying cryptocurrency risk. The main finding was that cryptocurrency exhibits the stylised facts well known in financial data, but with a different magnitude and frequency. For example, cryptocurrency returns are more volatile than stock returns, and their volatility tends to be more persistent. The study also finds that cryptocurrency shows a reverse leverage effect, in contrast to the usual leverage effect in which past negative returns increase volatility more than past positive returns. A hybrid GARCH model based on extreme value theory was also developed for quantifying cryptocurrency risk; the results showed that the GJR-GARCH model with generalized Pareto distribution (GPD) innovations could be used as an alternative model for calculating value-at-risk (VaR). Cryptocurrency volatility was also compared with that of the JSE, both with and without accounting for structural breaks. The volatility patterns of the cryptocurrencies were similar to one another but differed from those of the JSE. The cryptocurrency market was also found to be inefficient, which means that some investors can take advantage of this inefficiency. Structural breaks were found to affect volatility persistence, although the persistence measure depends on the model used. Markov-switching GARCH models were used to strengthen the structural break findings, and two-regime models outperformed single-regime models. Vector autoregressive (VAR) and DCC-GARCH models were also used to test for spillovers among the assets.
The results showed short-run spillovers from Bitcoin to Ethereum and, based on the DCC-GARCH model, long-run spillovers. Lastly, factors affecting cryptocurrency adoption were discussed; the main barriers to mass adoption are the complexity of using cryptocurrency and its high volatility. This study is important because it gives investors an understanding of the nature and behaviour of cryptocurrency so that they know when and how to invest. It also helps policymakers and financial institutions decide how to treat or use cryptocurrency within the economy.

Flexible statistical modeling of childhood malnutrition in Malawi. (2019) Magagula, Mzwakhe Elmon.; Ramroop, Shaun.
Childhood malnutrition is one of the most significant problems facing public health departments, mainly in developing countries, and developing a proper assessment of malnutrition is a challenge for policy makers in many countries. The primary objective of this study was therefore to identify the determinants of malnutrition in Malawi using the 2015/16 Demographic and Health Survey (DHS) data. Several types of statistical models were adopted to allow variety in methodology and to find the most accurate among them. As a point of departure, the study used an ordinal logistic regression within the generalized linear model (GLM) framework to account for the ordering of the outcome variable (severe, moderate and nourished). The ordinal logistic regression was then extended to include random effects, to account for variability between primary sampling units or villages. Further, we adopted a class of models that allows flexible functional dependence of an outcome variable on covariates through nonparametric regression.
This led to the generalized additive mixed model (GAMM), which relaxes the assumptions of normality and linearity inherent in linear regression. Analyses of childhood stunting have mainly used mean regression, yet quantile regression is more appropriate: it allows the impact of predictors to be studied at any desired quantile of the response distribution, whereas mean regression only captures their impact on the mean of the response. Quantile regression models were therefore adopted to give a more complete picture of the relationship between the outcome variable (stunting) and the predictor variables across quantiles of the response distribution. This study fitted a Bayesian additive quantile regression model with structured spatial effects for childhood stunting in Malawi, using the 2015/16 DHS data. Inference was fully Bayesian, using the integrated nested Laplace approximation (INLA), chosen for its much faster computation compared with Markov chain Monte Carlo (MCMC). Furthermore, different types of quantile regression models were fitted and compared using the Deviance Information Criterion (DIC) to determine the best model. Each of these models has inherent strengths and weaknesses, and the choice among them depends on what the research is trying to accomplish and the type of data available. In this study, we combined the results from different models, mainly our quantile regression models. The significant determinants of childhood stunting in Malawi were found to be the age of the child, the education level of the parents (mother and father), the family's place of residence, gender of the child, recent fever, recent diarrhoea, multiple births, mother's age at birth, body mass index of the mother, wealth index of the family, source of drinking water and district.
Furthermore, a map generated from the spatial quantile regression model shows the district-level distribution of malnutrition in Malawi, giving an overview of how stunting is distributed and allowing the affected districts to be visualized and assessed.

Flexible statistical modeling of deaths by diarrhoea in South Africa. (2013) Mbona, Sizwe Vincent.; Ramroop, Shaun.; Mwambi, Henry G.
The purpose of this study is to investigate and understand data that are grouped into categories. Various statistical methods for binary categorical responses were studied to investigate the causes of death from diarrhoea in South Africa. The data collected included death type, sex, marital status, province of birth, province of death, place of death, province of residence, education status, smoking status and pregnancy status. The objective of this thesis is to investigate which of these explanatory variables are most strongly associated with deaths from diarrhoea in South Africa. To achieve this objective, different techniques for analysing sample survey data are investigated, including bar graphs and several statistical methods, namely logistic regression, survey logistic regression, generalised linear models, generalised linear mixed models and generalised additive models. To select the fixed effects, bar graphs of the response variable's individual profiles are used. A logistic regression model is used to identify which explanatory variables are most associated with diarrhoea deaths. The analyses are conducted in SAS (Statistical Analysis Software). Hosmer and Lemeshow (2000) propose a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations. By analogy with the Hosmer and Lemeshow test for logistic regression, Parzen and Lipsitz (1999) suggest using 10 risk score groups.
However, based on simulation results, May and Hosmer (2004) show that, for small samples or samples with a large percentage of censored observations, the test rejects the null hypothesis too often. They suggest choosing the number of groups as G = integer of {maximum of 2 and minimum of 10 and the number of events divided by 40}. Lemeshow et al. (2004) state that the observations are first sorted in increasing order of their estimated event probability.

Flexible statistical modelling in food insecurity risk assessment. (2015) Lokosang, Laila Barnaba.; Ramroop, Shaun.; Zewotir, Temesgen Tenaw.
Food insecurity has remained a persistent problem in Sub-Saharan Africa. Conflict and other protracted crises have left a significant proportion of Africa's population at risk of food insecurity as their resilience to livelihood shocks weakens. A large body of research over the past two decades has centred on describing the incidence of food insecurity and vulnerability, while little research has used statistical methods to determine the likelihood of food insecurity risk. The use of flexible statistical techniques for sound, purposive monitoring, evaluation, planning and decision making in food security and resilience has also been limited. The study aimed to extend the use of statistics into the expanding field of food security and resilience, and to provide new directions for future research applying the methods explored, such as adjustments in statistical methods, sampling and data collection. Specifically, it aims to give food security analysts tested, statistically robust tools for analysing the likelihood of food insecurity risk in settings with structural food insecurity. It also aimed to inform practice, policy and analysis in the monitoring and evaluation of food insecurity risk in protracted crises, thereby helping to improve risk aversion measures.
Utilising secondary data, the research examines relevant statistical techniques for determining predictors of food insecurity risk, namely Principal Component Analysis, Multiple Correspondence Analysis, Classification and Regression Tree Analysis, Survey Logistic Regression, Generalized Linear Mixed Models for Ordered Categorical Data, and Joint Modelling. The study took the form of a structured analysis of different datasets collected in conflict-ridden South Sudan. Assets owned by households, as well as the availability of livelihood endowments, were used as a proxy for the level of resilience in a particular demographic unit or geographical setting. The study highlighted the strengths and weaknesses of the techniques explored in identifying or classifying potential predictors of food insecurity outcomes. Each technique can generate a unique composite index for measuring resilience and for predicting and classifying households by food insecurity phase based on factor loadings. In general, the study determined that each method explored has particular strengths as well as limitations. However, a noteworthy implication is that asset-based statistical analysis, whether based on a composite index used as a proxy for resilience to food insecurity eventualities or on regression modelling approaches, does assure sufficient rigour in drawing conclusions about the wellbeing of households or populations under study and how they might withstand food insecurity and livelihood shocks.
As food insecurity and malnutrition continue to attract substantial attention, such flexible analytical approaches are potentially useful for determining food insecurity risks, especially in protracted crisis settings.

Flexible statistical modelling of the determinants of childhood anaemia in Tanzania and Angola. (2020) Ndlangamandla, Qondeni.; Ramroop, Shaun.; Mwambi, Henry Godwell.
Anaemia, which affects 25% of the world's population, is one of the major causes of morbidity and mortality among children aged five years or younger in Africa. Developing countries account for more than 89% of the anaemia disease burden. Although anaemia affects all population groups, the most vulnerable are children under five years of age and women of reproductive age (15–49 years). According to the World Health Organization's 2008 report, 50% of anaemia cases in Africa were associated with insufficient iron intake (iron deficiency anaemia). This study aims to determine the factors associated with childhood anaemia in Tanzania and Angola. To this end, several statistical models capable of robustly modelling the binary response variable, anaemia, were fitted to the Tanzania Demographic and Health Survey (TDHS) and Angola Demographic and Health Survey (ADHS) data sets. Survey logistic regression (SLR), a member of the class of generalized linear models (GLMs), is suitable because it robustly models dichotomous responses and can handle data from complex survey designs. The SLR model was extended to a generalized additive mixed model (GAMM) to relax the assumption of normality and to fit other terms non-parametrically. Furthermore, to account for spatial effects and spatial variability, a spatial generalized linear mixed model (SGLMM) was fitted to both data sets to investigate factors spatially related to childhood anaemia.
The SLR and SGLMM models were fitted in SAS (PROC SURVEYLOGISTIC and PROC GLIMMIX, respectively), while the GAMM was fitted in R. Smooth maps of the outcome variable were also produced in ArcGIS to identify hot spots of childhood anaemia in each country. The study achieved its aim: all three models, fitted to both data sets, revealed that the factors most strongly associated with childhood anaemia in both countries were the highest education level of the caretaker (mother), the child's gender, the child's age and stunting status. The models also revealed that the standard of living in Tanzania has a significant effect on childhood anaemia.

Identifying factors associated with smoking in Gauteng in the presence of missing data. (2014) Mabungane, Siyanda.; Ramroop, Shaun.
Smoking remains one of the leading preventable causes of death in South Africa. It increases the risk of lung diseases such as emphysema and chronic bronchitis, among many other diseases. The current research aims to model data from the October 1996 omnibus smoking survey in Gauteng (South Africa). The surveyed variables were race, sex, marital status, socio-economic status, smoking status, age and education level. Generalized Linear Models (GLMs) and Generalized Linear Mixed Models (GLMMs) were used to model these data, and Multiple Correspondence Analysis (MCA) was used to examine relationships and correlation among the variables. The problem of missing data was addressed using classical methods such as Last Observation Carried Forward (LOCF), as well as more advanced modern methods, namely Inverse Probability Weighting (IPW) and Multiple Imputation (MI). The percentage of smokers was found to be lower than that of non-smokers across all categories of the surveyed variables.
Race, sex, age and socio-economic status were found to be significant when fitted with both GLMs and GLMMs. Race was closely correlated with socio-economic status, education with both race and socio-economic status, and age with marital status. MI and IPW estimators were found to be more consistent than the LOCF estimators. Despite the efforts of several health policy makers to alert people to the dangers of smoking, there appears to be a lack of awareness that smoking causes tuberculosis (TB), lung cancer, stroke, throat and mouth cancer, and various other lung and heart diseases.

Joint modelling of malaria and anaemia in children less than five years of age in Malawi. (Elsevier, 2021) Gaston, Rugiranka Tony.; Ramroop, Shaun.; Habyarimana, Faustin.
Background: Malaria and anaemia remain joint public health problems in developing countries, including Malawi. Although intervention strategies against malaria and anaemia in Malawi have brought improvements, the two diseases remain significant problems, especially among children aged 6–59 months. The main objective of this study was to examine the association between malaria and anaemia. The study also investigated whether socio-economic, geographic and demographic factors had a significant impact on malaria and anaemia. Data and methodology: The study used a secondary cross-sectional data set from the 2017 Malawi Malaria Indicator Survey (MMIS) covering 2 724 children aged 6–59 months. A multivariate joint model within the generalized linear mixed model (GLMM) framework was used to analyse the data. The two response variables were whether the child had malaria and whether the child had anaemia. Results: The prevalence of malaria was 37.2% among children tested with a rapid diagnostic test (RDT), while 56.9% were anaemic.
The multivariate joint model under GLMM indicated a positive association between anaemia and malaria. It also showed that the mother's education level, the child's age, the altitude of the place of residence, place of residence, toilet facility, access to electricity and whether the child slept under a mosquito bed net the night before the survey had a significant effect on malaria and anaemia. Conclusion: The study indicated a strong association between anaemia and malaria, suggesting that controlling malaria can reduce anaemia. Socio-economic, geographical and demographic variables have a significant effect on malaria and anaemia. Thus, improving health care, toilet facilities and access to electricity, especially in rural areas, educating mothers and increasing access to mosquito bed nets would contribute to reducing malaria and anaemia in Malawi.

Measuring poverty and child malnutrition with their determinants from household survey data. (2016) Habyarimana, Faustin.; Zewotir, Temesgen Tenaw.; Ramroop, Shaun.
The eradication of poverty and malnutrition is the main objective of most societies and policy makers. In most cases, however, developing an accurate poverty and malnutrition assessment tool to target poor households and malnourished people is a challenge for applied policy research. Household poverty and the malnutrition of children under five have been measured using a money-metric approach, which has a number of problems, especially in developing countries. In this study, we therefore developed an asset index from Demographic and Health Survey data as an alternative measure of household poverty and malnutrition, and examined statistical methods suitable for identifying the associated factors.
Principal component analysis was used to create an asset index for each household, which served as the response variable in the poverty analysis and as an explanatory variable (the wealth quintile) in the malnutrition analysis. To account for the complex sampling design and the ordering of the outcome variable, a generalized linear mixed model approach was used to extend ordinal survey logistic regression with random effects, thereby accounting for variability between primary sampling units or villages. Further, a joint model was used to model malnutrition simultaneously on three anthropometric indicators and to examine the possible correlation between underweight, stunting and wasting. To account for spatial variability between villages, we used a spatial multivariate joint model within the generalized linear mixed model framework. A quantile regression model was used to give a more complete picture of the relationship between the outcome variables (poverty index and weight-for-age index) and the predictor variables at desired quantiles. We also used a semiparametric generalized additive mixed model to relax the assumptions of normality and linearity inherent in linear regression models, with categorical covariates modeled parametrically and continuous covariates, and interactions between continuous and categorical variables, modeled nonparametrically. A composite index of three anthropometric indices was created and used to examine the association between poverty and malnutrition, as well as the factors associated with them. Each of these models has inherent strengths and weaknesses, and the choice among them depends on what the research is trying to accomplish and the type of data being used.
The findings revealed that the education level, gender and age of the household head, household size, place of residence and province are the key determinants of household poverty in Rwanda. The determinants of malnutrition among children under five in Rwanda were child age, birth order, gender and birth weight of the child, fever, multiple birth, mother's level of education, mother's age at birth, anemia, marital status of the mother, body mass index of the mother, mother's knowledge of nutrition, wealth index of the family, source of drinking water and province. The study also revealed a positive association between household poverty and malnutrition of children under five.

Modeling the factors affecting cereal crop yields in the Amhara National Regional State of Ethiopia. (2010) Mohammed, Yunus Hussien.; Ramroop, Shaun.; Zewotir, Temesgen.
Agriculture in the Amhara National Regional State is dominated by cereal crops, which occupy the largest share (84.3%) of the total cultivated crop area in the region. It is therefore important to investigate which factors influence cereal yields, particularly for the five major cereals in the region: barley, maize, sorghum, teff and wheat. In this thesis, various statistical methods, such as multiple regression analysis, were applied to data collected by the Central Statistical Agency of Ethiopia to investigate the factors that influence the mean yields of the major cereal crops. A mixed model analysis was also implemented to assess the effects associated with the sampling units (enumeration areas), and a cluster analysis was used to classify the zones of the region into similar groups.
The multiple regression results indicate that the mean yields of all the studied cereals are affected by zone, fertilizer type and crop damage. In addition, barley is affected by the extension programme; maize by seed type, irrigation and soil erosion protection; sorghum and teff by crop prevention method, extension programme, soil erosion protection and gender of the household head; and wheat by crop prevention method, extension programme and gender of the household head. The results from the mixed model analysis were entirely different from the regression results because of the observed dependence of the cereal mean yields on the sampling unit. The hierarchical cluster analysis identified five clusters, which appear to agree with the geographical proximity of the locations and the similarity of the crops produced.

Item Modeling the smoking status of Kenya's males in the presence of missing data. (2014) Umugiraneza, Odette.; Ramroop, Shaun.; Mwambi, Henry G. The current research, modeling the smoking status of Kenyan males in the presence of missing data, has three objectives. The first is to identify factors associated with smoking, leading to recommendations for smoking policy in Kenya. The second is to apply appropriate statistical models that incorporate missing data to model the smoking status of Kenyan males; logistic regression and the generalized linear mixed model are used for this purpose. The third is to compare statistical methods for handling monotone missing data in terms of their strengths and weaknesses. The methods investigated for handling the missingness are Last Observation Carried Forward (LOCF) and Multiple Imputation (MI).
The missing data are created by randomly deleting 20% and 30% of the data. The data used are from the Kenya Demographic and Health Survey (KDHS) 2008-2009; the response variable is smoking status (smoker or non-smoker), and the explanatory variables are region, marital status, religion, education, age group of the respondent, wealth index, household size and access to mass media.

Item Modelling longitudinal binary disease outcome data including the effect of covariates and extra variability. (2011) Ngcobo, Siyabonga.; Mwambi, Henry G.; Ramroop, Shaun. The current work deals with modelling longitudinal or repeated non-Gaussian measurements for a respiratory disease. Longitudinal non-Gaussian binary disease outcome data can broadly be modelled using three approaches: marginal, random effects and transition models. A marginal model is used when one is interested in population-averaged effects, such as whether a treatment works for an average individual. Random effects models, on the other hand, are important when, apart from population-averaged effects, a researcher is also interested in subject-specific effects; marginal effects can then be obtained from the subject-specific model by integrating out the random effects. Transition models are also referred to, more generally, as conditional models. All three types of model are thus important for understanding covariate effects, disease progression and the distribution of outcomes in a population. In the current work, all three models are studied and fitted to data. The random effects (subject-specific) model is further modified to relax the assumption that the random effects are strictly normal, which leads to the so-called hierarchical generalized linear model (HGLM) based on the h-likelihood formulation suggested by Lee and Nelder (1996). The marginal model was fitted by generalized estimating equations (GEE) using PROC GENMOD in SAS.
The random effects model (a generalized linear mixed model) was fitted using PROC GLIMMIX and PROC NLMIXED in SAS; the latter was found to be more flexible, apart from the need to specify initial parameter values. The transition model, fitted using PROC GENMOD, was used to capture the dependence between outcomes, in particular the dependence of the current response on the previous response. The HGLM was fitted using the GENSTAT software. Longitudinal disease outcome data provide a reliable basis for modelling disease progression, since they can be used to estimate important disease parameters such as prevalence, incidence and the force of infection. Problems associated with longitudinal data include loss of information through loss to follow-up (dropout) and missing data in general. In some cases cross-sectional data can be used to obtain the required estimates; longitudinal data are more efficient but may require more time, effort and cost to collect. However, successful estimation of a given parameter or function depends on the availability of the relevant data, and it is sometimes impossible to estimate a parameter of interest if the data cannot support its estimation.

Item Modelling longitudinal counts data with application to recurrent epileptic seizure events. (2010) Ngulube, Phathisani.; Mwambi, Henry G.; Ramroop, Shaun. The objective of this thesis is to explore different approaches to modelling clustered, correlated data in the form of repeated or longitudinal count data, leading to a replicated Poisson process. The specific application is to repeated epileptic seizure time-to-event data. Two main classes of models are considered: marginal models and subject- or cluster-specific effects models. Within the marginal class, the generalized estimating equations approach of Liang and Zeger (1986) is considered first.
These models are concerned with population-averaged effects, as opposed to subject-specific effects models, which include random subject-specific effects such that repeated outcomes within a subject or cluster are assumed to be independent conditional on the subject-specific effects. Finally, we consider a distinct class of marginal models with three common variants, namely the approaches of Andersen and Gill (1982), Wei et al. (1989) and Prentice et al. (1981).

Item Modelling longitudinally measured outcome HIV biomarkers with immuno genetic parameters. (2011) Bryan, Susan Ruth.; Ramroop, Shaun.; Mwambi, Henry G. According to the Joint United Nations Programme on HIV/AIDS (UNAIDS) 2009 AIDS epidemic update, there were a total of 33.3 million (31.4 million–35.3 million) people living with HIV worldwide in 2009. Most of the epidemic is concentrated in sub-Saharan Africa: of the 33.3 million people living with HIV worldwide in 2009, 22.5 million (20.9 million–24.2 million) were in sub-Saharan Africa. There were 1.8 million (1.6 million–2.0 million) new infections and 1.3 million (1.1 million–1.5 million) AIDS-related deaths in sub-Saharan Africa in 2009 (UNAIDS, 2009). Statistical models and analyses are needed to further understand the dynamics of HIV/AIDS and to design intervention and control strategies. Despite the prevalence of this disease, its pathogenesis is still poorly understood. A thorough understanding of HIV and the factors that influence disease progression is required to prevent further spread of the virus. Modelling provides a means to better understand and predict disease progression. Certain genetic factors play a key role in how the disease progresses in the human body; for example, HLA-B types and IL-10 genotypes have been independently associated with the control of HIV infection.
Both HLA-B and IL-10 may influence the quality and magnitude of immune responses, and IL-10 has also been shown to down-regulate the expression of certain HLA molecules. Studies are therefore required to investigate how HLA-B types and IL-10 genotypes may interact to affect HIV infection outcomes. This dissertation uses data from the Sinikithemba study of the HIV Pathogenesis Programme (HPP) at the Medical School, University of KwaZulu-Natal, involving 450 HIV-positive, treatment-naive individuals, to model how certain outcome biomarkers (CD4+ counts and viral loads) are associated with immunogenetic parameters (HLA-B types and IL-10 genotypes). The work also seeks to exploit novel statistical methods for longitudinal data to model longitudinally measured HIV outcome data efficiently. Statistical techniques such as linear mixed models and generalized estimating equations were used to model these data. The findings from the current work agree quite closely with what is expected from the biological understanding of the disease.

Item Modelling of volatility in the South African mining sector : application of ARCH and GARCH models. (2016) Gaston, Rugiranka Tony.; Ramroop, Shaun.; Mwambi, Henry Godwell. Abstract available in PDF file.

Item Modelling poverty in Zimbabwe based on the demographic health survey dataset using GLMs and GAMMs. (2020) Mtshali, Precious.; Ramroop, Shaun.; Mwambi, Henry Godwell. Zimbabwe has been in a state of political, economic and social crisis for the past 15 years. In 2004, 80% of Zimbabweans were living below the national poverty line, and by January 2009 only 6% of the population held jobs in the formal sector. Living in poverty may lead to stressful conditions that are linked to mental health problems in adults and developmental problems in children.
This study investigates the risk factors that affect poverty status in Zimbabwe and makes recommendations for current poverty policy, using statistical models such as generalized linear models (GLMs) and generalized additive mixed models (GAMMs). It uses the Zimbabwe 2015 Demographic and Health Survey (DHS) dataset. A socioeconomic index was created from 29 variables (survey questions) by principal component analysis, taking the factor scores of the first component. These scores were then split at the median, so that the dichotomous response variable was socioeconomic status (SES) (1 = Poor, 2 = Not poor). Explanatory variables from the DHS data include the level of education, sex and age of the household head, household size and place of residence. The results from both models (GLMs and GAMMs) reveal that these demographic factors are key determinants of household poverty in Zimbabwe, indicating that the government of Zimbabwe needs to intervene by addressing the demographic factors that affect poverty status.

Item Modelling volatility in financial time series. (2011) Dralle, Bruce.; Ramroop, Shaun.; Mwambi, Henry G. The objective of this dissertation is to model the volatility of financial time series data using ARCH, GARCH and stochastic volatility models. The ARCH and GARCH models are found to be easier to fit than the stochastic volatility models, which present problems with respect to the distributional assumptions that must be made; for this reason the ARCH and GARCH models remain more widely used than the stochastic volatility models. The ARCH, GARCH and stochastic volatility models are fitted to four data sets consisting of daily closing prices of gold mining companies listed on the Johannesburg Stock Exchange: Anglo Gold Ashanti Ltd, DRD Gold Ltd, Gold Fields Ltd and Harmony Gold Mining Company Ltd.
The best-fitting ARCH and GARCH models are identified along with the best error distribution, and diagnostics are then performed to check the adequacy of the models. The Student-t distribution was found to be the best error distribution for every data set. The results from the stochastic volatility models agreed with those obtained from the ARCH and GARCH models. The stochastic volatility models are, however, restricted to the form of an AR(1) process because of the complexity of fitting higher-order models.

Item Modelling volatility in stock exchange data : a case study of three Johannesburg Stock Exchange (JSE) companies. (2015) Kaseke, Forbes.; Ramroop, Shaun.; Mwambi, Henry Godwell. For investors and policy makers such as governments, the uncertainty of returns on investments is a major problem. The aim of this study is to examine volatility models for financial data in both the univariate and multivariate cases. The data are monthly and daily asset returns of three companies. For the univariate case, the main focus is on GARCH models and their extensions; ARCH and GARCH models of different orders are fitted. For the monthly data, the GARCH(1,1) outperformed the ARCH and higher-order GARCH models. For the daily data, a GARCH(1,1) combined with an appropriate AR model for the mean gave the best fit. For the multivariate case, models such as DCC-GARCH, EWMA and GO-GARCH were used, and all three gave similar results. Various distributions, such as the normal and Student t, were assumed for the innovations. The Student t and skewed Student t distributions were more effective because of their ability to capture the fat tails of the return distributions. Fundamental finance terms and concepts are also discussed.
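The GARCH(1,1) model that recurs in the volatility abstracts above updates the conditional variance as sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2, where eps_t is the return residual after the mean model (for example an AR model). The minimal Python sketch below only evaluates this recursion for given parameters; it does not estimate them (maximum likelihood with normal or Student-t innovations would be needed for that). The function name, the choice of the unconditional variance as the starting value and the example numbers are illustrative assumptions.

```python
import numpy as np


def garch11_variance(eps, omega, alpha, beta):
    """Return the GARCH(1,1) conditional variances sigma_t^2 for residuals eps.

    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2
    """
    eps = np.asarray(eps, dtype=float)
    # Positivity and covariance-stationarity constraints.
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    sigma2 = np.empty_like(eps)
    if eps.size == 0:
        return sigma2
    # Assumed start value: the unconditional variance omega / (1 - alpha - beta).
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, eps.size):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


# Hypothetical residuals and parameters, for illustration only.
print(garch11_variance([0.0, 0.1, -0.2, 0.05], omega=0.01, alpha=0.1, beta=0.8))
```

With these hypothetical values the variances are approximately 0.1, 0.09, 0.083 and 0.0804: a large shock (eps = -0.2) raises the next variance, and the beta term makes that increase decay only gradually, which is the volatility clustering these models are meant to capture.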