Date of Award
Doctor of Philosophy in Mathematical Sciences - (Ph.D.)
Loh, Ji Meng
Dhar, Sunil Kumar
This dissertation introduces two statistical techniques for handling high-dimensional data, which is now commonplace. It consists of two topics linked by a common theme: dimension reduction.
The first topic is a recently introduced classification technique, the weighted principal support vector machine (WPSVM), which is incorporated into a spatial point process framework. In addition to the regularization parameter, the WPSVM has a weight parameter. Most statistical techniques, including WPSVM, inherently assume independence, meaning that the data points are not related to one another in any way. Spatial data violate this assumption: the correlation between two spatial data points increases as the distance between them decreases. Under some conditions on the spatial point process, however, the WPSVM remains valid. Furthermore, extensive simulations show that WPSVM performs better than other dimension reduction techniques. Its main advantage is that it can handle non-linear relationships. WPSVM is also applied to a rainforest dataset.
The second topic concerns another recently introduced technique, joint screening (JS). Unlike the previous method, it works for ultra-high dimensional data, where the number of covariates p far exceeds the sample size n (p >> n). Most existing variable screening methods fail to identify genetic variables that are marginally unimportant but jointly important. The JS procedure screens all the covariates at the same time based on a criterion. In this way, a subset of variables suspected to be highly associated with the outcome can be identified. A major advantage of the JS procedure is that it is computationally simple and easy to understand. The performance of the proposed JS procedure is evaluated via simulation studies and an application to the Genetic Analysis Workshop 20 data.
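To make the phrase "marginally unimportant but jointly important" concrete, the following is a minimal sketch of the phenomenon only. It is not the JS procedure from the dissertation, whose criterion is not given in this abstract. The sample size, the correlation `rho`, the noise level, and the helper names `corr` and `ols2` are all assumptions chosen for illustration. Two correlated covariates are simulated so that the second has a nonzero coefficient in the joint model yet is almost uncorrelated with the outcome on its own, so a screen based on marginal correlation would likely discard it.

```python
import random

def corr(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def ols2(x1, x2, y):
    """Least-squares fit of y ~ b1*x1 + b2*x2 (no intercept; the
    simulated data have mean zero by construction)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Illustrative settings (assumptions, not taken from the dissertation).
random.seed(0)
n, rho = 5000, 0.8

# x1 and x2 are standard normal with correlation rho.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [rho * a + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for a in x1]

# True model: y = x1 - rho * x2 + noise, so Cov(y, x2) = rho - rho = 0.
y = [a - rho * b + 0.1 * random.gauss(0, 1) for a, b in zip(x1, x2)]

r1, r2 = corr(x1, y), corr(x2, y)   # marginal associations
b1, b2 = ols2(x1, x2, y)            # joint coefficients

print(f"marginal corr: x1={r1:.3f}, x2={r2:.3f}")
print(f"joint coefs:   x1={b1:.3f}, x2={b2:.3f}")
```

In this toy setting the marginal correlation of `x2` with `y` is close to zero even though its joint coefficient is close to `-rho`, which is the kind of variable that screening all covariates simultaneously is meant to retain.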
Datta, Subha, "Dimension reduction techniques for high dimensional and ultra-high dimensional data" (2019). Dissertations. 1432.