Identifying factors associated with smoking in Gauteng in the presence of missing data.
Smoking remains one of the leading preventable causes of death in South Africa. It increases the risk of lung diseases such as emphysema and chronic bronchitis, among many other diseases. This research models data from the October 1996 omnibus smoking survey in Gauteng, South Africa. The surveyed variables were race, sex, marital status, socio-economic status, smoking status, age and education level. Generalized Linear Models (GLMs) and Generalized Linear Mixed Models (GLMMs) were used to model these data. Multiple Correspondence Analysis (MCA) was used to explore the relationships and correlations among the variables. Furthermore, the problem of missing data was addressed using classical methods such as Last Observation Carried Forward (LOCF) as well as more modern methods, namely Inverse Probability Weighting (IPW) and Multiple Imputation (MI). The percentage of smokers was found to be lower than that of non-smokers across all the surveyed variables. Race, sex, age and socio-economic status were found to be significant when fitted with both GLMs and GLMMs. Race, education and socio-economic status were found to be closely correlated with one another, and age was closely correlated with marital status. The MI and IPW estimators were found to be more consistent than the LOCF estimators. Despite the efforts of several health policy makers to alert people to the dangers of smoking, there appears to be a lack of awareness that smoking causes tuberculosis (TB), lung cancer, stroke, throat and mouth cancer, as well as various other lung and heart diseases.