Document Type
Dissertation
Date of Award
Fall 2017
Degree Name
Doctor of Philosophy in Mathematical Sciences - (Ph.D.)
Department
Mathematical Sciences
First Advisor
Sunil Kumar Dhar
Second Advisor
Wenge Guo
Third Advisor
Ji Meng Loh
Fourth Advisor
Sundarraman Subramanian
Fifth Advisor
Satrajit Roychowdhury
Abstract
In applications such as studying drug adverse events (AEs) in clinical trials and identifying differentially expressed genes in microarray experiments, the experimental data usually consist of frequency counts. In analyzing such data, researchers often face multiple hypothesis testing based on discrete test statistics. To exploit this discreteness, several stepwise procedures that use the CDF of the p-values to determine the testing threshold are proposed for controlling the familywise error rate (FWER). It is shown that the proposed procedures strongly control the FWER and are more powerful than the existing ones for discrete data. Simulation studies and real data examples show that the proposed procedures outperform the existing procedures in terms of FWER control and power. An R package “MHTdiscrete” and a web application are developed for implementing the proposed procedures for discrete data.
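To illustrate how the CDF of discrete p-values can relax a testing threshold, the following is a minimal Python sketch of a single-step, Bonferroni-type adjustment. It is not taken from the dissertation or from the “MHTdiscrete” package: the function names `null_cdf` and `discrete_bonferroni_adjust` are hypothetical, and the sketch assumes exact p-values, so that under the null P(P <= s) = s for every attainable value s. The stepwise procedures in the dissertation refine this idea and may differ in detail.

```python
def null_cdf(support, t):
    """Null CDF of a discrete p-value at t.

    Assumption: the p-value is exact, so P(P <= s) = s for each attainable
    value s, and the CDF at t is the largest attainable value not exceeding t
    (0 if there is none).
    """
    return max((s for s in support if s <= t), default=0.0)


def discrete_bonferroni_adjust(pvalues, supports):
    """Adjusted p-value for hypothesis i: min(1, sum_j F_j(p_i)).

    supports[j] lists the attainable p-values of test j. Observed p-values
    should be taken from these lists so that comparisons are exact.
    """
    adjusted = []
    for p in pvalues:
        total = sum(null_cdf(support, p) for support in supports)
        adjusted.append(min(1.0, total))
    return adjusted


# Toy example with three tests (values are illustrative, not real data).
supports = [[0.02, 0.5, 1.0], [0.3, 1.0], [0.01, 0.04, 1.0]]
pvalues = [0.02, 0.3, 0.01]

discrete_adjusted = discrete_bonferroni_adjust(pvalues, supports)
classical_adjusted = [min(1.0, len(pvalues) * p) for p in pvalues]
```

In this toy example the first hypothesis has a classical Bonferroni adjusted p-value of roughly 0.06 but a CDF-based adjusted p-value of roughly 0.03, because the second test cannot produce a p-value as small as 0.02 and so contributes nothing to the sum. By the union bound, rejecting when the adjusted value is at most the nominal level keeps the FWER at or below that level under the stated assumption.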
Many complex biomedical studies, such as clinical safety studies and genome-wide association studies, involve testing multiple families of hypotheses. Most existing multiple testing methods cannot guarantee strong control of type 1 error rates appropriate for such increasingly complex research questions. A novel two-stage procedure for clinical safety studies, based on the recently developed idea of selective inference, is introduced. In the first stage, significant families are selected using a family-level global test, which guarantees control of the generalized familywise error rate (k-FWER) among the selected families. In the second stage, individual hypotheses are tested within each selected family using a multiple testing procedure that controls the conditional false discovery rate (cFDR), given that the family is selected. By applying the proposed procedure to clinical safety studies, one can not only efficiently flag the significant clinical adverse events (AEs) but also select body systems of interest (BSoI) as extra information for further research. The simulation studies show that the proposed procedure can be more reliable than alternative methods, such as Mehrotra and Heyse’s double FDR procedure, in the setting of clinical safety. The proposed procedure for the multiple-family structure is implemented in the R package “MHTmult”.
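The select-then-test structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the dissertation's procedure or the “MHTmult” API: all function names are hypothetical, stage one uses a Bonferroni global p-value per family with the single-step generalized Bonferroni cutoff k * alpha / n for k-FWER, and stage two applies the ordinary Benjamini-Hochberg procedure within each selected family. Plain Benjamini-Hochberg does not by itself account for the selection step, so this sketch should not be read as guaranteeing cFDR control.

```python
def bonferroni_global_p(pvals):
    """Family-level global p-value: min(1, m * smallest p-value)."""
    return min(1.0, len(pvals) * min(pvals))


def select_families(families, k=1, alpha=0.05):
    """Stage 1 (assumed rule): keep families with global p <= k * alpha / n."""
    n = len(families)
    cutoff = k * alpha / n
    return [name for name, pvals in families.items()
            if bonferroni_global_p(pvals) <= cutoff]


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    last = 0  # largest rank whose ordered p-value meets its threshold
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            last = rank
    return sorted(order[:last])


def two_stage(families, k=1, alpha=0.05):
    """Stage 2: test individual hypotheses only within selected families."""
    selected = select_families(families, k=k, alpha=alpha)
    return {name: benjamini_hochberg(families[name], alpha) for name in selected}


# Toy body systems with adverse-event p-values (illustrative, not real data).
families = {
    "cardiac": [0.001, 0.2, 0.6],
    "skin": [0.3, 0.5],
    "gastro": [0.004, 0.012, 0.9, 0.03],
}
flagged = two_stage(families)
```

Here `flagged` maps each selected body system to the positions of its flagged adverse events, so the output carries both levels of information the abstract mentions: which families were selected and which hypotheses within them were rejected.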
Categorical data arise naturally in biomedical and healthcare experiments. In many of these cases, the outcome variables of interest are the numbers of special events. At least one distinct special event category is observed when the negative multinomial and extended negative multinomial, or generalized inverse sampling scheme-based, regression models are used. A new model, based on the generalized inverse sampling scheme for several special events, is developed in this dissertation. This research is an adaptation of the widely used multinomial logistic regression model. The resulting equations of the proposed model, corresponding to the natural log of the ratio of the expected responses, appear similar to those of multinomial logistic regression. Using this ratio of the expected response of a category to that of the special category, the maximum likelihood estimator of the regression parameters can be computed by deriving the score equations and the Hessian matrix of the likelihood. The covariance matrix of the estimators of the regression parameters for the new model can be estimated by inverting the Hessian matrix to develop the inference. This research also develops model diagnostics, such as normality checks with deviance and Pearson residuals, and likelihood-based computations. The proposed model is implemented in the R package “mvlogit”.
Recommended Citation
Zhu, Yalin, "Topics on multiple hypotheses testing and generalized linear model" (2017). Dissertations. 55.
https://digitalcommons.njit.edu/dissertations/55