Statistical methods for analysing complex survey data: an application to HIV/AIDS in Ethiopia.
The HIV/AIDS pandemic is currently the most challenging public health problem facing developing countries, especially those in Sub-Saharan Africa. Ethiopia, in East Africa, has a generalised and highly heterogeneous epidemic and is no exception, with HIV/AIDS affecting most sectors of the economy. The first case of HIV in Ethiopia was reported in 1984. Since then, HIV/AIDS has become a major public health concern, leading the Government of Ethiopia to declare a public health emergency in 2002. In 2011, adult HIV/AIDS prevalence in Ethiopia was estimated at 1.5%. Approximately 1.2 million Ethiopians were living with HIV/AIDS in 2010. Surveys are an important and popular tool for collecting data. Analytical use of survey data, especially health survey data, has become very common, with a focus on the association between particular outcome variables and explanatory variables at the population level. In this study we used data from the 2005 Ethiopian Demographic and Health Survey (EDHS 2005) to identify key demographic, socioeconomic, sociocultural, behavioural and proximate determinants of HIV/AIDS risk. Survey analysts often ignore complex survey design features such as clustering, stratification and unequal probabilities of selection (weights). This study takes the survey design into account, because failure to do so leads to biased parameter estimates and standard errors, and hence to incorrect confidence intervals and statistical tests. Three statistical approaches were used to analyse the complex survey data. The first approach was survey logistic regression, used to model the relationship between the binary outcome (HIV serostatus) and a set of explanatory variables (the HIV risk factors). Survey logistic regression differs from ordinary logistic regression in that it takes the study design into account during the analysis.
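The effect of ignoring weights and clustering can be illustrated with a minimal sketch for the simpler case of a prevalence estimate rather than a regression. It compares a naive estimate, which treats the data as a simple random sample, with a weighted estimate whose standard error comes from a Taylor-linearisation (ultimate cluster) variance. The records are made-up toy values, not EDHS data, the function names are hypothetical, and a single stratum with clusters sampled with replacement is assumed.

```python
import math
from collections import defaultdict


def naive_prevalence(records):
    """Unweighted proportion and its simple-random-sample standard error."""
    n = len(records)
    p = sum(y for _, _, y in records) / n
    return p, math.sqrt(p * (1 - p) / n)


def weighted_prevalence(records):
    """Weighted proportion with a cluster-based linearisation standard error.

    records: iterable of (cluster_id, weight, outcome) with outcome in {0, 1}.
    Assumes one stratum and with-replacement sampling of clusters.
    """
    total_w = sum(w for _, w, _ in records)
    p_hat = sum(w * y for _, w, y in records) / total_w

    # Linearised contribution of each observation, summed within clusters.
    cluster_totals = defaultdict(float)
    for cluster, w, y in records:
        cluster_totals[cluster] += w * (y - p_hat) / total_w

    n_clusters = len(cluster_totals)
    if n_clusters < 2:
        raise ValueError("at least two clusters are needed for a variance estimate")

    # The cluster totals sum to zero, so no centring term is required.
    variance = n_clusters / (n_clusters - 1) * sum(
        z * z for z in cluster_totals.values()
    )
    return p_hat, math.sqrt(variance)


# Toy data (hypothetical): three clusters with unequal sampling weights.
records = [
    ("A", 1.0, 1), ("A", 1.0, 1), ("A", 1.0, 0),
    ("B", 3.0, 0), ("B", 3.0, 0), ("B", 3.0, 0),
    ("C", 2.0, 0), ("C", 2.0, 1), ("C", 2.0, 0),
]

p_naive, se_naive = naive_prevalence(records)
p_design, se_design = weighted_prevalence(records)
print(f"naive:        p = {p_naive:.4f}, se = {se_naive:.4f}")
print(f"design-based: p = {p_design:.4f}, se = {se_design:.4f}")
```

In this toy example the two estimates differ in both the point estimate and the standard error. Survey logistic regression applies the same principle to regression coefficients.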
The second approach was a multilevel logistic regression model, which assumed that the data structure in the population was hierarchical, with individuals nested within households and households drawn from clusters that were randomly selected from a national sampling frame. We considered a three-level model for our analysis. This second approach considered results from both frequentist and Bayesian multilevel models. Bayesian methods can provide accurate estimates of the parameters and of the uncertainty associated with them. The third approach used spatial models, with model parameters estimated under the Integrated Nested Laplace Approximation (INLA) paradigm.
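One common way to write a three-level logistic model of this kind is the random-intercept form below. This is a sketch under stated assumptions, not the specification used in the study: the abstract does not give the model equations, and the indexing (individual $i$ in household $j$ in cluster $k$), the covariate vector $\mathbf{x}_{ijk}$ and the normal random effects $u_{jk}$ and $v_k$ are all assumed here.

```latex
\begin{align*}
y_{ijk} \mid \pi_{ijk} &\sim \mathrm{Bernoulli}(\pi_{ijk}), \\
\operatorname{logit}(\pi_{ijk}) &= \beta_0 + \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_{jk} + v_{k}, \\
u_{jk} &\sim N(0, \sigma_u^2), \qquad v_{k} \sim N(0, \sigma_v^2).
\end{align*}
```

Here $y_{ijk}$ is the binary HIV serostatus, $u_{jk}$ is a household-level random intercept and $v_k$ is a cluster-level random intercept. In a Bayesian version of such a model, prior distributions would additionally be placed on $\beta_0$, $\boldsymbol{\beta}$, $\sigma_u^2$ and $\sigma_v^2$.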