Risk factors and classification of diabetes in South Africa.

Diabetes prevalence has been increasing in recent years, both globally and in South Africa. The number of people with diabetes worldwide rose from 108 million in 1980 to 422 million in 2014. It was estimated that, of the 1.8 million people aged 20 to 79 years with diabetes in South Africa in 2017, 84.8% were undiagnosed. Diabetes was the second leading underlying cause of death in South Africa in 2016. Identifying risk factors for diabetes can help raise public awareness and help public authorities develop prevention programs. This study aimed to investigate the prevalence of, and risk factors associated with, diabetes in the South African population aged 15 years and older, and to explore statistical methods for classifying a person's diabetic status. The study used data from the South Africa Demographic and Health Survey 2016, which employed a two-stage sampling design. The study sample comprised 6442 individuals aged 15 years and older. Of the individuals sampled, 11%, 67% and 22% were found to be non-diabetic, pre-diabetic and diabetic, respectively. Three classification methods, namely a decision tree, a random forest and a Bayesian neural network, were used to classify diabetic status from the risk factors. Of these, the Bayesian neural network achieved the highest accuracy (75.9%). However, these methods do not account for the complex survey design and sampling weights, nor do they provide estimates of the effect of each risk factor on diabetic status. Regression models were therefore employed to identify the significant risk factors. Because diabetic status is ordinal, a proportional odds model was fitted first; however, the proportional odds assumption was found to be violated. A multinomial generalized linear mixed model was then fitted to account for the complexity of the survey design.
However, the model's residuals were found to be spatially autocorrelated. Accordingly, a spatial generalized additive mixed model, which accounts for the complexity of the survey structure and incorporates nonlinear spatial effects, was adopted. Among the regression models considered, this adjusted surface correlation model gave the highest accuracy (70.8%). Individuals of Black/African race were more likely to be diabetic than individuals of other races (OR = 1.429; 95% CI: 1.032-1.978). Individuals taking high blood pressure medication were 1.444 times more likely to be diabetic than pre-diabetic compared with those not taking such medication (95% CI: 1.167-1.786).
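Odds ratios such as those reported above are exponentiated regression coefficients on the log-odds scale. The sketch below shows that conversion and its inverse. It assumes the intervals are Wald-type, meaning symmetric on the log-odds scale. The abstract does not state how the intervals were computed. The function names `odds_ratio_wald_ci` and `implied_log_se` are hypothetical illustrations, not code from the thesis. In the survey-weighted models, the standard error would come from a design-based variance estimate rather than a simple-random-sampling formula.

```python
import math

# Two-sided 95% standard-normal quantile (approximately 1.96).
Z_95 = 1.959963984540054


def odds_ratio_wald_ci(beta: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """Hypothetical helper: convert a log-odds coefficient and its standard error
    into an odds ratio with a Wald-type interval, exp(beta) and exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_log_se(lower: float, upper: float, z: float = Z_95) -> float:
    """Hypothetical helper: back out the log-odds standard error implied by a
    reported Wald interval, whose width on the log scale is 2 * z * se."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


if __name__ == "__main__":
    # Figures reported in the abstract: Black/African race, OR = 1.429 (95% CI: 1.032-1.978).
    se = implied_log_se(1.032, 1.978)
    or_, lo, hi = odds_ratio_wald_ci(math.log(1.429), se)
    print(f"implied SE(log OR) = {se:.3f}; reconstructed CI = ({lo:.3f}, {hi:.3f})")
```

A check like this is useful when transcribing results. If the reported odds ratio is far from the geometric mean of its interval bounds, either the interval is not Wald-type or a value was mistyped.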


Master's degree. University of KwaZulu-Natal, Durban.