Govender, Yogarani.
Abstract:
The purpose of this study is to investigate and understand categorical data,
that is, data grouped into categories. At the outset, the study reviews early
research contributions to, and controversies surrounding, categorical data
analysis.
A contingency table is said to be sparse when many of its cells have small
frequencies. Previous research found that analyses of sparse tables produced
incorrect results. Hence, this dissertation focuses on the effect of sparseness
on the modelling and analysis of categorical data.
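One simple way to make "many cells have small frequencies" concrete is to report the average expected count per cell, n/k, together with the share of cells whose expected count falls below a cut-off. The cut-off of 5 used below is the conventional textbook rule of thumb and is an assumption here, as is the hypothetical helper name `sparseness_summary`.

```python
from typing import Sequence


def sparseness_summary(expected: Sequence[float], threshold: float = 5.0) -> dict:
    """Hypothetical helper: two simple descriptors of sparseness for a table
    given its expected cell counts (threshold 5 is an assumed rule of thumb)."""
    k = len(expected)
    n = sum(expected)
    small = sum(1 for e in expected if e < threshold)
    return {
        "cells": k,
        "mean_per_cell": n / k,            # n/k: average expected count per cell
        "share_below_threshold": small / k,
    }


# A 4 x 5 table (20 cells) with n = 30 spread evenly: 1.5 expected per cell,
# so every cell falls below the assumed threshold.
print(sparseness_summary([1.5] * 20))
```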
Cressie and Read (1984) proposed the power-divergence statistic as a versatile
alternative to previously proposed statistics. This study includes a detailed
discussion of the power-divergence goodness-of-fit statistic, covering a review
of the minimum power-divergence estimation method and the evaluation of model
fit. The effects of sparseness on the power-divergence statistic are also
investigated. Comparative reviews of the accuracy, efficiency and performance
of the power-divergence family of statistics in large- and small-sample cases
are presented. Applications of the power-divergence statistic were carried out
in SAS (Statistical Analysis Software).
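For reference, the Cressie-Read family is usually written as 2nI^λ = 2/(λ(λ+1)) Σ O_i[(O_i/E_i)^λ − 1], where λ = 1 gives Pearson's X², the limit λ → 0 gives the likelihood-ratio statistic G², and λ = 2/3 is the value Cressie and Read recommended. The sketch below is a minimal Python version of that formula under those standard definitions; it is not the SAS code used in this study, and the function name `power_divergence` and its arguments are illustrative assumptions. SciPy also provides `scipy.stats.power_divergence` for the same purpose.

```python
import math
from typing import Sequence


def power_divergence(observed: Sequence[float], expected: Sequence[float],
                     lam: float = 2 / 3) -> float:
    """Illustrative Cressie-Read statistic 2/(lam(lam+1)) * sum O[(O/E)^lam - 1]."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if abs(lam) < 1e-12:
        # Limit lam -> 0: likelihood-ratio G^2; empty cells contribute 0.
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)
    if abs(lam + 1.0) < 1e-12:
        # Limit lam -> -1: modified likelihood ratio; undefined if any O_i = 0.
        return 2.0 * sum(e * math.log(e / o) for o, e in zip(observed, expected))
    # O*((O/E)^lam - 1) rewritten as O^(lam+1) * E^(-lam) - O, so an empty cell
    # contributes 0 when lam > -1 (and raises ZeroDivisionError when lam < -1).
    total = sum(o ** (lam + 1.0) * e ** (-lam) - o
                for o, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total


observed = [8, 12, 10, 10]
expected = [10.0, 10.0, 10.0, 10.0]
print(power_divergence(observed, expected, lam=1.0))  # ≈ 0.8, Pearson's X^2
print(power_divergence(observed, expected, lam=0.0))  # likelihood-ratio G^2
print(power_divergence(observed, expected))           # Cressie-Read, lam = 2/3
```

The λ = 1 case coincides with Pearson's X² only when the expected counts sum to the sample size, as they do for a fitted multinomial model.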
Further findings on the effect of small expected frequencies on the accuracy of
the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and
Upton (1976).
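The kind of question examined in these studies can be explored with a small Monte Carlo experiment: simulate tables under a true null hypothesis and record how often the X² test rejects at a nominal 5% level. The sketch below is an illustrative assumption, not a reproduction of either study. It tests equiprobability over k cells and, to stay within the Python standard library, uses the exact closed-form chi-square tail probability, which is valid only for even degrees of freedom (so k must be odd). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi2_df > x) via the closed form exp(-x/2) * sum_{j<df/2} (x/2)^j / j!,
    which holds only for a positive even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def pearson_x2(observed, expected) -> float:
    """Pearson's X^2 = sum (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def empirical_size(n: int, k: int, alpha: float = 0.05,
                   reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of the type I error rate of the X^2 test of
    equiprobability when n observations fall into k cells (k odd)."""
    rng = random.Random(seed)
    expected = [n / k] * k
    rejections = 0
    for _ in range(reps):
        counts = Counter(rng.randrange(k) for _ in range(n))
        observed = [counts.get(i, 0) for i in range(k)]
        if chi2_sf_even_df(pearson_x2(observed, expected), k - 1) < alpha:
            rejections += 1
    return rejections / reps


# Expected count of 1 per cell versus 20 per cell, both with k = 5 cells.
print(empirical_size(n=5, k=5))
print(empirical_size(n=100, k=5))
```

Comparing the two estimated sizes with the nominal 0.05 gives a rough feel for how small expected frequencies affect the chi-square approximation; the numbers depend on the seed and number of replicates.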
Other goodness-of-fit statistics relevant to the sparse multinomial case are
discussed. They include Zelterman's (1987) D² goodness-of-fit statistic,
Simonoff's (1982, 1983) goodness-of-fit statistics, and Koehler and Larntz's
tests for log-linear models. To address contradictions that arise in the sparse
case under asymptotic conditions as the sample size increases, the dissertation
discusses Simonoff's use of nonparametric techniques to estimate the variances,
as well as his adoption of the jackknife and bootstrap techniques.
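Zelterman's D² is usually written as D² = Σ[(O_i − E_i)² − O_i]/E_i, that is, Pearson's X² minus Σ O_i/E_i. The sketch below computes it and, to illustrate the resampling ideas mentioned above, estimates its variance with a generic parametric bootstrap and a delete-one jackknife. These are textbook versions of the two techniques under an assumed fully specified null (cell probabilities `probs`), not necessarily the schemes Simonoff used; all function names are hypothetical.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Sequence

Stat = Callable[[Sequence[int], Sequence[float]], float]


def zelterman_d2(observed: Sequence[int], expected: Sequence[float]) -> float:
    """D^2 = sum(((O - E)^2 - O) / E), i.e. Pearson's X^2 minus sum(O / E)."""
    return sum(((o - e) ** 2 - o) / e for o, e in zip(observed, expected))


def bootstrap_variance(probs: Sequence[float], n: int, stat: Stat,
                       reps: int = 1000, seed: int = 0) -> float:
    """Parametric bootstrap: draw multinomial(n, probs) tables and return the
    sample variance of stat(observed, expected) over the replicates."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    values = []
    for _ in range(reps):
        counts = Counter(rng.choices(range(k), weights=probs, k=n))
        observed = [counts.get(i, 0) for i in range(k)]
        values.append(stat(observed, expected))
    return statistics.variance(values)


def jackknife_variance(observed: Sequence[int], probs: Sequence[float],
                       stat: Stat) -> float:
    """Delete-one jackknife over the n individual observations. Removing any one
    observation from cell i yields the same reduced table, so each distinct
    leave-one-out value is weighted by that cell's count."""
    n = sum(observed)
    expected = [(n - 1) * p for p in probs]  # expected counts for n - 1 observations
    loo = []  # (leave-one-out value, number of observations giving it)
    for i, count in enumerate(observed):
        if count == 0:
            continue
        reduced = list(observed)
        reduced[i] -= 1
        loo.append((stat(reduced, expected), count))
    mean = sum(v * w for v, w in loo) / n
    return (n - 1) / n * sum(w * (v - mean) ** 2 for v, w in loo)


table = [3, 0, 1, 4]
probs = [0.25, 0.25, 0.25, 0.25]
print(zelterman_d2(table, [2.0] * 4))  # 1.0: X^2 = 5 minus sum(O/E) = 4
print(bootstrap_variance(probs, n=8, stat=zelterman_d2))
print(jackknife_variance(table, probs, zelterman_d2))
```

Weighting each distinct leave-one-out table by its cell count gives the same result as deleting the n observations one at a time, but needs only one evaluation per non-empty cell.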