Modelling depression in South Africa.

Depression is considered the leading cause of disability worldwide, affecting approximately 350 million individuals of all ages. The disorder is more prevalent among females, and poverty is associated with an increased prevalence. The 12-month prevalence in South Africa is approximately 16.5%, with a lifetime prevalence of common mental disorders among adults of 38% (World Health Organization (WHO), 2017). To help individuals deal with depression, it is important to identify them at an early stage so that they can be given the necessary support before their depression becomes unmanageable and puts them at risk of self-inflicted harm. The objective of this study was to investigate the prevalence and risk determinants of depression among South African individuals aged 15 to 49 years and to determine which factors contribute the most to this mental illness.

This study used data from the 2016 South African General Household Survey, which was carried out using a multistage cluster sampling technique. The sample was not spread geographically in proportion to the population, but rather equally across the enumeration areas. The response variable of interest was binary, indicating whether or not an individual considered himself/herself depressed. Three statistical approaches were applied. The first was the survey logistic regression model, which is a design-based approach. In this approach, parameter estimates and inferences were based on the sampling weights, and only inferences concerning the effects of certain covariates on the response variable were of interest. The second was a generalized linear mixed model, which is a model-based approach. In this approach, interest was also in estimating and accounting for the proportion of variation in the response variable that was attributable to each of the multiple levels of sampling. This approach also accounted for possible correlations in the data, where individuals in the same household or cluster tend to be more alike than those from other households or clusters. Lastly, a Bayesian network was applied to model the conditional dependence among the variables. This approach is a type of probabilistic graphical model that uses Bayesian inference to calculate probabilities.

The results indicated that substance abuse, the person's perceived health status and gender were significantly associated with depression. Each of the three techniques was then used to classify the depression status of the individuals, and their performances in this classification were compared. The purpose of classifying an individual's depression status, based on their individual and household factors, is to identify depressed individuals so that they can be targeted for intervention. The generalized linear mixed model proved to be the best-performing technique in terms of classification. Thus, we recommend that, when using data based on a complex survey design, this technique be considered for classifying the occurrence of an event of interest.
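As an illustration of the design-based approach described in the abstract, the point estimates of a survey logistic regression can be obtained by maximising a log-likelihood in which each respondent is weighted by their sampling weight. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the data are synthetic, the covariate names (`female`, `poor_health`), the coefficients and the weights are hypothetical, and the function `weighted_logistic_fit` is introduced here for illustration only. Design-based standard errors, which would also need the cluster and stratum information, are not shown.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, n_iter=25, tol=1e-8):
    """Fit a logistic regression by maximising the weighted
    (pseudo) log-likelihood with Newton-Raphson.

    X : (n, p) design matrix, including an intercept column
    y : (n,) binary response (1 = reports depression, 0 = does not)
    w : (n,) sampling weights
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # fitted probabilities
        grad = X.T @ (w * (y - p))                      # weighted score vector
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic data only: these values are NOT taken from the survey.
rng = np.random.default_rng(0)
n = 2000
female = rng.integers(0, 2, n)        # hypothetical binary covariate
poor_health = rng.integers(0, 2, n)   # hypothetical binary covariate
X = np.column_stack([np.ones(n), female, poor_health])
assumed_beta = np.array([-2.0, 0.5, 1.0])  # assumed, for simulation only
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ assumed_beta)))
w = rng.uniform(0.5, 3.0, n)          # hypothetical sampling weights

beta_hat = weighted_logistic_fit(X, y, w)
odds_ratios = np.exp(beta_hat[1:])    # odds ratios for the two covariates
print("coefficients:", beta_hat)
print("odds ratios:", odds_ratios)
```

Multiplying all weights by a constant leaves the point estimates unchanged, so only the relative sizes of the weights affect them. In practice, dedicated survey software would be used so that the variance estimates also reflect the multistage cluster design.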


Masters Degree. University of KwaZulu-Natal, Durban.