Document Type

Dissertation

Date of Award

12-31-2019

Degree Name

Doctor of Philosophy in Mathematical Sciences - (Ph.D.)

Department

Mathematical Sciences

First Advisor

Ji Meng Loh

Second Advisor

Yixin Fang

Third Advisor

Sunil Kumar Dhar

Fourth Advisor

Antai Wang

Fifth Advisor

Yang Feng

Abstract

This dissertation presents two statistical techniques for high-dimensional data, which are now commonplace. The two topics are linked by a common theme: dimension reduction.

The first topic extends a recently introduced classification technique, the weighted principal support vector machine (WPSVM), to a spatial point process framework. In addition to the usual regularization parameter, the WPSVM has a weight parameter. Most statistical techniques, including the WPSVM, assume that the observations are independent of one another. Spatial data violate this assumption: the correlation between two observations typically increases as the distance between them decreases. Under certain conditions on the spatial point process, however, the WPSVM remains valid. Extensive simulations show that the WPSVM performs better than other dimension reduction techniques; its main advantage is that it can handle non-linear relationships. The WPSVM is also applied to a rainforest dataset.
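The abstract does not spell out the WPSVM estimator. As a rough illustration only, the sketch below follows the linear principal weighted SVM idea from the sufficient dimension reduction literature. For each weight pi on a grid in (0, 1), it fits a linear SVM whose hinge loss is weighted by 1 - pi for class +1 and by pi for class -1. It then takes the leading eigenvector of the sum of beta_pi beta_pi^T as the estimated direction. The weighting convention, the subgradient solver, the step size, lam, and the names `weighted_linear_svm` and `leading_direction` are assumptions made for illustration. The sketch also assumes i.i.d., already standardized predictors, ignores the spatial point process dependence, and uses a linear rather than a kernel SVM, so it does not demonstrate the non-linear case.

```python
import math
import random


def weighted_linear_svm(X, y, pi, lam=1.0, iters=300, lr=0.05):
    """Subgradient descent on an (assumed) weighted linear SVM objective
        ||beta||^2 + (lam / n) * sum_i w_i * max(0, 1 - y_i * (alpha + beta . x_i)),
    with w_i = 1 - pi if y_i = +1 and w_i = pi if y_i = -1.
    Assumes the predictors are already centred and standardized.
    """
    n, p = len(X), len(X[0])
    beta, alpha = [0.0] * p, 0.0
    for t in range(1, iters + 1):
        g_beta = [2.0 * b for b in beta]  # gradient of the ridge penalty
        g_alpha = 0.0
        for xi, yi in zip(X, y):
            w = (1.0 - pi) if yi == 1 else pi
            margin = yi * (alpha + sum(b * x for b, x in zip(beta, xi)))
            if margin < 1.0:  # hinge term is active: add its subgradient
                for j in range(p):
                    g_beta[j] -= lam / n * w * yi * xi[j]
                g_alpha -= lam / n * w * yi
        step = lr / math.sqrt(t)  # diminishing step size
        beta = [b - step * g for b, g in zip(beta, g_beta)]
        alpha -= step * g_alpha
    return beta


def leading_direction(X, y, n_pi=9, **svm_kwargs):
    """Accumulate M = sum over a weight grid of beta_pi beta_pi^T and return
    its leading eigenvector (found by power iteration) as the estimated direction."""
    p = len(X[0])
    M = [[0.0] * p for _ in range(p)]
    for k in range(1, n_pi + 1):
        pi = k / (n_pi + 1)  # weights 0.1, 0.2, ..., 0.9 when n_pi = 9
        beta = weighted_linear_svm(X, y, pi, **svm_kwargs)
        for a in range(p):
            for b in range(p):
                M[a][b] += beta[a] * beta[b]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(200):
        mv = [sum(M[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in mv))
        v = [c / norm for c in mv]
    return v


# Toy check (hypothetical data): the label depends only on the first of three predictors.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [1 if x[0] > 0 else -1 for x in X]
direction = leading_direction(X, y)
print([round(c, 3) for c in direction])  # the first entry should dominate in magnitude
```

Varying the weight pi is what distinguishes this from a single SVM fit: each pi yields a different separating hyperplane, and pooling their normal vectors in M recovers the shared direction.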

The second topic concerns another recently introduced technique, joint screening, which, unlike the first method, is designed for ultra-high-dimensional data (p >> n). Most existing variable screening methods fail to identify genetic variables that are marginally unimportant but jointly important. The joint screening (JS) procedure screens all covariates simultaneously based on a criterion, thereby identifying a subset of variables suspected to be strongly associated with the outcome. A major advantage of the JS procedure is that it is computationally simple and easy to understand. The performance of the proposed JS procedure is evaluated through simulation studies and an application to the Genetic Analysis Workshop 20 (GAW20) data.
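The abstract does not name the JS criterion. One well-known way to screen all covariates jointly is sparsity-restricted least squares solved by iterative hard thresholding: take a gradient step on every coefficient at once, then keep only the k largest in magnitude. The sketch below uses that approach purely as an illustration; it is not necessarily the criterion used in the dissertation. The step size, k, iteration count, the function names `joint_screen` and `marginal_screen`, and the toy data are all assumptions. The toy data build a covariate x2 that has zero population covariance with y yet belongs to the true model y = -2 x1 + x2, which is the "marginally unimportant but jointly important" case.

```python
import random


def joint_screen(X_cols, y, k, step=0.5, iters=150):
    """Screen all covariates jointly via sparsity-restricted least squares,
    solved by iterative hard thresholding (IHT).

    X_cols holds the p predictors as columns (lists of length n).
    Returns the sorted indices of the k covariates kept at the last iterate.
    """
    n, p = len(y), len(X_cols)
    beta = [0.0] * p
    keep = set()
    for _ in range(iters):
        # Residual r = y - X beta, using only the (at most k) kept coefficients.
        r = list(y)
        for j in keep:
            col, b = X_cols[j], beta[j]
            for i in range(n):
                r[i] -= b * col[i]
        # Gradient step on (1 / 2n) * ||y - X beta||^2 for every covariate at once.
        cand = [beta[j] + step * sum(c * ri for c, ri in zip(X_cols[j], r)) / n
                for j in range(p)]
        # Hard thresholding: keep the k largest |coefficients|, zero the rest.
        keep = set(sorted(range(p), key=lambda j: abs(cand[j]), reverse=True)[:k])
        beta = [cand[j] if j in keep else 0.0 for j in range(p)]
    return sorted(keep)


def marginal_screen(X_cols, y, k):
    """Marginal baseline: rank covariates one at a time by |sample covariance with y|."""
    n = len(y)
    ybar = sum(y) / n
    scores = []
    for col in X_cols:
        cbar = sum(col) / n
        scores.append(abs(sum((c - cbar) * (yi - ybar) for c, yi in zip(col, y))) / n)
    return sorted(sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)[:k])


# Toy data (hypothetical): x2 = x1 + u and y = -2 * x1 + x2 = -x1 + u, so
# cov(x2, y) = 0 in the population even though x2 is in the true model.
random.seed(1)
n, p = 150, 300
z = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
x1 = z[0]
x2 = [a + b for a, b in zip(z[0], z[1])]
X_cols = [x1, x2] + z[2:]  # columns 2..299 are pure noise (p > n)
y = [-2.0 * a + b for a, b in zip(x1, x2)]

selected = joint_screen(X_cols, y, k=10)
print("joint screening keeps:   ", selected)
print("marginal screening keeps:", marginal_screen(X_cols, y, k=10))
```

Because x2 is uncorrelated with y in the population, a marginal ranking gives it no systematic advantage over the 298 noise columns, whereas in the joint fit x2 is strongly related to the part of y that x1 leaves unexplained.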
