Author ORCID Identifier
0000-0002-2151-3281
Document Type
Dissertation
Date of Award
5-31-2023
Degree Name
Doctor of Philosophy in Mathematical Sciences - (Ph.D.)
Department
Mathematical Sciences
First Advisor
Wenge Guo
Second Advisor
Zhi Wei
Third Advisor
Ji Meng Loh
Fourth Advisor
Antai Wang
Fifth Advisor
Usman W. Roshan
Abstract
Asymmetric classification refers to a situation where the cost of misclassifying one class is significantly higher than the cost of misclassifying the other class. This problem is common in many real-world scenarios, such as medical diagnosis or fraud detection. This dissertation addresses two common types of asymmetric classification problems: imbalanced classification and ordinal classification. An example of imbalanced classification is the detection of fraudulent credit card transactions, where the distribution of normal and fraudulent transactions is extremely skewed. Ordinal classification, also known as ordinal regression, is widely used in various fields, including detection of the stage of cancer, where the stages are naturally ordered based on some characteristics of an individual. Traditional methods are not appropriate for asymmetric classification, since they have mainly been developed to address symmetric classification. Several approaches have therefore been proposed for asymmetric classification problems, such as cost-sensitive learning, algorithmic modifications, threshold adjustment, and adaptive prediction sets. However, such methods either do not provide a probabilistic guarantee of type I error control or provide such control only under strong assumptions and restrictions. The aim of this dissertation is to provide a more general approach to asymmetric classification. To address ordinal classification, this dissertation constructs a point prediction method based on the Neyman-Pearson paradigm and an interval prediction method based on conformal prediction. The constructed methods provide probabilistic guarantees of type I error control or cover the true response with high confidence. In addition, knowledge distillation has been applied to the classification of imbalanced binary data, with suitable choices of student and teacher networks and tuning parameters.
Chapter 1 introduces the notation and tools used in the dissertation. Chapter 2 proposes the point prediction method based on the Neyman-Pearson paradigm by reformulating an ordinal regression problem as a multiple hypothesis testing problem. This is a parametric method that is shown to control type I error at a pre-specified level. Chapter 3 introduces interval prediction methods based on conformal prediction that provide both contiguous and non-contiguous prediction regions with proven statistical guarantees. In Chapter 4, a knowledge distillation technique is applied to classify binary data with severe class imbalance. Experiments are conducted on synthetic and real-life data sets, and results are reported for several choices of tuning parameters. Finally, Chapter 5 summarizes the contributions and suggests possible future work.
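As a rough illustration of the kind of prediction region mentioned for Chapter 3, the following is a minimal sketch of generic split conformal prediction for classification. It is not the dissertation's method. The function names, the nonconformity score (one minus the predicted probability of a class), and the "contiguous hull" step for ordinal labels are all assumptions made here for illustration. Under exchangeability of calibration and test points, the basic set contains the true label with probability at least 1 - alpha, and the hull is a superset of that set, so it keeps the same coverage.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold on the score 1 - p(true class).

    cal_probs: list of per-class probability lists from any fitted classifier
               (hypothetical input, held out from training).
    cal_labels: list of true class indices for the calibration points.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Classes whose score 1 - p(class) does not exceed the threshold.

    The resulting set may be non-contiguous in the label ordering.
    """
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


def contiguous_hull(label_set):
    """Smallest run of consecutive ordinal labels containing label_set
    (an illustrative way to obtain a contiguous region)."""
    if not label_set:
        return []
    return list(range(min(label_set), max(label_set) + 1))
```

For example, with three ordered classes, a test point whose set is classes 0 and 2 would be widened to classes 0, 1, and 2 by the hull step, trading a larger region for contiguity.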
Recommended Citation
Chakraborty, Subhrasish, "Topics on asymmetric classification" (2023). Dissertations. 1794.
https://digitalcommons.njit.edu/dissertations/1794