# Linear and log-linear models based on generalized inverse sampling scheme

Dissertation

Spring 5-31-2006

## Degree Name

Doctor of Philosophy in Mathematical Sciences - (Ph.D.)

## Department

Mathematical Sciences

Sunil Kumar Dhar

Manish Chandra Bhattacharjee

Robert M. Miura

Eliza Zoi-Heleni Michalopoulou

Thomas Spencer

## Abstract

This dissertation develops novel statistical techniques for modeling rare events under a generalized inverse sampling scheme and explores their applications. The Poisson model can be used for independent frequency count data, and the negative binomial and negative multinomial (NMn) models are applicable when there is only one rare category in the population. Here, a new model based on the generalized inverse sampling scheme is introduced to study several rare events simultaneously. Under this scheme, which covers several rare categories of a population, samples are drawn until the total count of rare events reaches a predetermined number. The frequency counts under generalized inverse sampling are said to follow an extended negative multinomial (ENMn) distribution, and hence the model is named the extended negative multinomial model. The properties of this distribution make it applicable to a wide variety of data in the biomedical field.

Log-linear models can be interpreted in terms of interactions between the various factors in multidimensional tables and are easily generalized to higher dimensions. In this thesis, a log-linear model is defined for a multi-way contingency table whose cells are frequency counts that follow an extended negative multinomial distribution. The parameters of the new model are estimated by maximum likelihood, and a test statistic for the general log-linear hypothesis is also derived. The results are generalized to s independent sub-populations.

The major difficulty in using the extended negative multinomial model, as with the negative multinomial model, is estimating the shape parameter of the underlying distribution. No maximum likelihood estimator previously existed for the shape parameter of the negative multinomial distribution for s sub-populations. A maximum likelihood estimator based on the Expectation-Maximization (EM) algorithm is developed to estimate the shape parameter of both the negative multinomial and the extended negative multinomial distributions. The model is applied to a tolerability analysis of the drug tolterodine and to the incidence of several related diseases in different cities.
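The stopping rule described in the abstract (draw until the rare categories together reach a predetermined total) can be illustrated with a small simulation. This is only a sketch of one reading of that rule, not code from the dissertation: the function name `generalized_inverse_sample`, the category probabilities, and the stopping total are all hypothetical choices made for illustration.

```python
import random


def generalized_inverse_sample(rare_probs, stop_total, rng):
    """Simulate one run of a generalized inverse sampling scheme.

    rare_probs : probabilities of the rare categories (hypothetical values);
                 the remaining probability mass goes to one common category.
    stop_total : predetermined total number of rare events at which to stop.
    rng        : a random.Random instance, passed in for reproducibility.

    Returns a list of counts: index 0 is the common category,
    indices 1..k are the rare categories.
    """
    rare_mass = sum(rare_probs)
    if not 0.0 < rare_mass <= 1.0:
        raise ValueError("rare probabilities must sum to a value in (0, 1]")
    if stop_total < 1:
        raise ValueError("stop_total must be a positive integer")

    weights = [1.0 - rare_mass] + list(rare_probs)
    categories = range(len(weights))
    counts = [0] * len(weights)
    rare_seen = 0

    # Keep drawing units until the rare categories together reach stop_total.
    while rare_seen < stop_total:
        category = rng.choices(categories, weights=weights)[0]
        counts[category] += 1
        if category != 0:
            rare_seen += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(2006)
    counts = generalized_inverse_sample([0.02, 0.01, 0.005], stop_total=10, rng=rng)
    print("common count:", counts[0])
    print("rare counts:", counts[1:])
```

In every run the rare counts sum to exactly `stop_total`, while the common-category count and the split among the rare categories vary from run to run; these are the frequency counts whose joint distribution the abstract calls extended negative multinomial.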
