Date of Award
Doctor of Philosophy in Mathematical Sciences - (Ph.D.)
In applications such as identifying differentially expressed genes in micro-array experiments or assessing the safety and efficacy of drugs in clinical trials, researchers often report confidence intervals (CIs) and p-values only for the selected parameters, a practice called selective inference. When constructing multiple CIs for the selected parameters, it is common practice to ignore the issues of selection and multiplicity. Although protection against the effect of selection is sufficient in some cases, simultaneous coverage is also needed in many real applications. For example, in clinical trials, multiple endpoints are considered when assessing the effects of a drug, and the ultimate decision often depends on the joint outcome for the primary endpoints.
In this dissertation, a new concept, the γ-false coverage proportion (γ-FCP), is first presented as a suitable measure for CIs following selection. This measure has the advantage of accounting for both the effect of selection and simultaneous coverage. If a procedure controls the γ-FCP at a desired level α, then, with high probability, a high proportion of its CIs cover the corresponding parameters. To keep the γ-FCP at a desired level, two types of procedures are developed: one based on unconditional CIs, and the other based on conditional CIs, that is, CIs constructed conditional on the selection event. First, an unconditional CI-based procedure is developed and proven to control the γ-FCP at a desired level under independence; in theory, this result can be extended to positive regression dependence. Second, a modified unconditional CI-based procedure is presented that controls the γ-FCP under arbitrary dependence. Third, using conditional CIs, a new conditional CI-based selective inference procedure is developed that controls the γ-FCP at a desired level under independence. Finally, a modified conditional CI-based procedure is developed to control the γ-FCP under arbitrary dependence.
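The abstract does not state the γ-FCP formally. One plausible formalization, written by analogy with the γ-FDP from the multiple testing literature and offered here only as an assumption, is sketched below. The symbols R (number of selected parameters), V (number of constructed CIs that fail to cover their parameters), and the max(R, 1) convention are illustrative choices, not notation taken from the dissertation.

```latex
% Assumed notation: R = number of selected parameters,
% V = number of selected parameters whose CI misses the true value.
\[
  \mathrm{FCP} = \frac{V}{\max(R, 1)}, \qquad
  \mathrm{FCR} = \mathbb{E}\,[\mathrm{FCP}], \qquad
  \gamma\text{-}\mathrm{FCP} = \Pr(\mathrm{FCP} > \gamma).
\]
% Under this reading, controlling the gamma-FCP at level alpha means
\[
  \Pr(\mathrm{FCP} > \gamma) \le \alpha ,
\]
% i.e. with probability at least 1 - alpha, at least a fraction
% 1 - gamma of the constructed CIs cover their parameters.
```

Under this reading, FCR control bounds the FCP only on average, whereas γ-FCP control bounds the probability that the FCP is large in a single experiment.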
All of the proposed procedures are evaluated through extensive simulation studies. The effects of the proportion of nonzero parameters, the selection level, and the correlation coefficient on the proposed procedures are evaluated in terms of γ-FCP control and average CI width. The simulation studies cover strong dependence, such as equal correlation, and several forms of weak dependence, such as block-wise dependence. They show that the proposed procedures either control the γ-FCP or produce narrower CIs than existing methods such as the FCR-controlling procedures of Benjamini and Yekutieli (2005). Next, all of the proposed procedures are applied to two sets of micro-array gene expression data. Compared with the same existing methods, the proposed conditional CI-based procedure yields (i) narrower CIs, (ii) more CIs that do not cover zero, and (iii) CIs that lie farther from zero.
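To make the kind of simulation described above concrete, the following is a minimal sketch of the existing baseline only: Benjamini-Hochberg selection followed by the FCR-adjusted marginal CIs of Benjamini and Yekutieli (2005), under independent unit-variance normal observations. It is not the dissertation's proposed procedure. All function names, the parameter values (m, nonzero_prop, effect, q, gamma), and the reading of the γ-FCP as Pr(FCP > γ) are assumptions made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bh_select(z, q):
    """Benjamini-Hochberg selection on two-sided p-values of z-scores."""
    m = len(z)
    pvals = [2.0 * (1.0 - STD_NORMAL.cdf(abs(zi))) for zi in z]
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value passes the BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return order[:k]

def fcr_adjusted_cis(z, selected, q):
    """Marginal CIs at level 1 - R*q/m for the R selected parameters."""
    m, r = len(z), len(selected)
    if r == 0:
        return {}
    half_width = STD_NORMAL.inv_cdf(1.0 - r * q / (2.0 * m))
    return {i: (z[i] - half_width, z[i] + half_width) for i in selected}

def false_coverage_proportion(cis, theta):
    """Fraction of constructed CIs that miss their parameter (0 if none)."""
    if not cis:
        return 0.0
    missed = sum(1 for i, (lo, hi) in cis.items() if not lo <= theta[i] <= hi)
    return missed / len(cis)

def simulate_fcp(m=200, nonzero_prop=0.2, effect=3.0, q=0.05, reps=300, seed=0):
    """Simulate the FCP over independent replications (hypothetical settings)."""
    rng = random.Random(seed)
    n_nonzero = int(m * nonzero_prop)
    theta = [effect] * n_nonzero + [0.0] * (m - n_nonzero)
    fcps = []
    for _ in range(reps):
        z = [rng.gauss(t, 1.0) for t in theta]
        selected = bh_select(z, q)
        fcps.append(false_coverage_proportion(fcr_adjusted_cis(z, selected, q), theta))
    return fcps

if __name__ == "__main__":
    gamma = 0.1  # assumed tolerance on the FCP
    fcps = simulate_fcp()
    print("estimated FCR:", sum(fcps) / len(fcps))
    print("estimated Pr(FCP > gamma):", sum(f > gamma for f in fcps) / len(fcps))
```

The two printed quantities illustrate the distinction the abstract draws: a procedure can keep the average FCP (the FCR) at level q while the FCP still exceeds γ in a noticeable share of replications.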
Zhang, Yan, "Topics on high dimensional selective inference" (2019). Dissertations. 1655.