## Modelling longitudinal binary disease outcome data including the effect of covariates and extra variability

##### Abstract

The current work deals with modelling longitudinal or repeated non-Gaussian measurements for
a respiratory disease. Longitudinal data on a non-Gaussian binary disease outcome
can broadly be modelled using three different approaches: the marginal, random effects and
transition models. The marginal model is used when interest lies in estimating
population-averaged effects, such as whether a treatment works on average in the population. On the
other hand, random effects models are important if, apart from measuring population-averaged
effects, a researcher is also interested in subject-specific effects. In this case, marginal effects are
obtained from the subject-specific model by integrating out the random effects. Transition models are
also referred to, more generally, as conditional models. Thus all three types of models are important in
understanding the effects of covariates, disease progression and the distribution of outcomes in
a population. In the current work the three models were investigated and fitted to data.
The random effects or subject-specific model was further modified to relax the assumption that the
random effects are strictly normally distributed. This leads to the so-called hierarchical generalized linear
model (HGLM) based on the h-likelihood formulation suggested by Lee and Nelder (1996). The
marginal model was fitted by generalized estimating equations (GEE) using PROC GENMOD
in SAS. The random effects model, a generalized linear mixed model, was fitted using PROC GLIMMIX
and PROC NLMIXED in SAS. The latter procedure was found to be more flexible,
apart from the need to specify initial parameter values. The transition model was used to
capture the dependence between outcomes, in particular the dependence of the current response
on the previous response, and was fitted using PROC GENMOD. The HGLM was fitted
using the GENSTAT software. Longitudinal disease outcome data can provide a real and reliable
basis for modelling disease progression, in the sense that it can be used to estimate important disease
parameters such as prevalence, incidence and the force of infection. Problems
associated with longitudinal data include loss of information through loss to follow-up, such as
dropout, and missing data in general. In some cases cross-sectional data can be used to obtain the
required estimates; longitudinal data is more efficient but may require more time, effort and
cost to collect. However, the successful estimation of a given parameter or function depends on
the availability of relevant data. It is sometimes impossible to estimate a parameter of
interest if the data cannot support its estimation.
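
As a sketch of the three formulations described above, and assuming a logit link with a single normally distributed random intercept (the abstract does not fix either choice, and all symbols below are illustrative), the models for the binary response $Y_{ij}$ of subject $i$ at occasion $j$ with covariate vector $x_{ij}$ may be written as:

$$
\begin{aligned}
\text{Marginal:} \quad & \operatorname{logit} P(Y_{ij}=1 \mid x_{ij}) = x_{ij}^{\top}\beta^{M}, \\
\text{Random effects:} \quad & \operatorname{logit} P(Y_{ij}=1 \mid x_{ij}, b_i) = x_{ij}^{\top}\beta^{RE} + b_i, \qquad b_i \sim N(0, \sigma_b^{2}), \\
\text{Transition:} \quad & \operatorname{logit} P(Y_{ij}=1 \mid x_{ij}, Y_{i,j-1}) = x_{ij}^{\top}\beta^{T} + \alpha\, Y_{i,j-1}.
\end{aligned}
$$

Under the same assumptions, the marginal probability implied by the random effects model is obtained by integrating out $b_i$:

$$
P(Y_{ij}=1 \mid x_{ij}) = \int_{-\infty}^{\infty} \frac{\exp\left(x_{ij}^{\top}\beta^{RE} + b\right)}{1 + \exp\left(x_{ij}^{\top}\beta^{RE} + b\right)} \, \frac{1}{\sqrt{2\pi\sigma_b^{2}}} \exp\left(-\frac{b^{2}}{2\sigma_b^{2}}\right) db .
$$

This integral has no closed form for the logit link, so it is typically evaluated numerically. Here $\alpha$ measures the dependence of the current response on the previous one in a first-order transition model.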