Modelling longitudinal binary disease outcome data including the effect of covariates and extra variability.
The current work deals with modelling longitudinal, or repeated, non-Gaussian measurements for a respiratory disease. Longitudinal binary disease outcome data can broadly be modelled using three different approaches: marginal, random effects and transition models. The marginal model is used if one is interested in estimating population-averaged effects, such as whether or not a treatment works on average in the population. Random effects models, on the other hand, are important if, apart from measuring population-averaged effects, a researcher is also interested in subject-specific effects. In this case the marginal effects are obtained from the subject-specific model by integrating out the random effects. Transition models are also referred to, more generally, as conditional models. All three types of models are therefore important in understanding the effects of covariates, disease progression and the distribution of outcomes in a population. In the current work the three models have been studied and fitted to data. The random effects, or subject-specific, model is further modified to relax the assumption that the random effects must be strictly normal. This leads to the so-called hierarchical generalized linear model (HGLM), based on the h-likelihood formulation suggested by Lee and Nelder (1996). The marginal model was fitted by generalized estimating equations (GEE) using PROC GENMOD in SAS. The random effects model, a generalized linear mixed model, was fitted using PROC GLIMMIX and PROC NLMIXED in SAS. The latter procedure was found to be more flexible, except that it requires initial parameter values to be specified. The transition model was used to capture the dependence between outcomes, in particular the dependence of the current response on the previous response, and was fitted using PROC GENMOD. The HGLM was fitted using the GENSTAT software.
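As an illustration of what integrating out the random effects involves, the following minimal Python sketch computes a marginal (population-averaged) probability from a subject-specific logistic model. It is not the SAS or GENSTAT code used in this work. It assumes, purely for illustration, a single normally distributed random intercept b with standard deviation sigma and a linear predictor eta on the logit scale; the function name `marginal_probability` and its parameters `eta`, `sigma` and `n_nodes` are hypothetical. The integral over b is approximated by Gauss-Hermite quadrature.

```python
import numpy as np


def marginal_probability(eta, sigma, n_nodes=40):
    """Approximate P(Y = 1) averaged over b ~ N(0, sigma^2),
    where logit P(Y = 1 | b) = eta + b (assumed random-intercept model)."""
    # Gauss-Hermite nodes and weights for integrals of exp(-x^2) * f(x)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variable b = sqrt(2) * sigma * x maps the rule to N(0, sigma^2)
    b = np.sqrt(2.0) * sigma * nodes
    # Subject-specific (conditional) probability at each quadrature node
    conditional = 1.0 / (1.0 + np.exp(-(eta + b)))
    # Weighted average over the random-intercept distribution
    return float(np.sum(weights * conditional) / np.sqrt(np.pi))


if __name__ == "__main__":
    # Compare the subject-specific probability at b = 0 with the marginal one
    subject_specific = 1.0 / (1.0 + np.exp(-1.0))
    population_averaged = marginal_probability(eta=1.0, sigma=2.0)
    print(subject_specific, population_averaged)
```

Under these assumptions the marginal probability lies closer to 0.5 than the subject-specific probability evaluated at b = 0, which is why population-averaged and subject-specific covariate effects generally differ in size for binary outcomes.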
Longitudinal disease outcome data can provide real and reliable information for modelling disease progression, in the sense that they can be used to estimate important disease parameters such as prevalence, incidence and the force of infection. Problems associated with longitudinal data include loss of information due to loss to follow-up, such as dropout, and missing data in general. In some cases cross-sectional data can be used to obtain the required estimates; longitudinal data are more efficient, but may require more time, effort and cost to collect. However, the successful estimation of a given parameter or function depends on the availability of relevant data. It is sometimes impossible to estimate a parameter of interest if the data cannot support its estimation.