Author ORCID Identifier

0000-0002-2151-3281

Document Type

Dissertation

Date of Award

5-31-2023

Degree Name

Doctor of Philosophy in Mathematical Sciences - (Ph.D.)

Department

Mathematical Sciences

First Advisor

Wenge Guo

Second Advisor

Zhi Wei

Third Advisor

Ji Meng Loh

Fourth Advisor

Antai Wang

Fifth Advisor

Usman W. Roshan

Abstract

Asymmetric classification refers to a situation where the cost of misclassifying one class is significantly higher than the cost of misclassifying the other class. This problem is common in many real-world scenarios, such as medical diagnosis or fraud detection. This dissertation addresses two common types of asymmetric classification problems: imbalanced classification and ordinal classification. An example of imbalanced classification is the detection of fraudulent credit card transactions, where the distribution of normal and fraudulent transactions is extremely skewed. Ordinal classification, also known as ordinal regression, is widely used in various fields, including detecting the stage of cancer, where the stages are naturally ordered based on some characteristics of an individual. Traditional methods are not appropriate for asymmetric classification, since they have mainly been developed to address symmetric classification. Several approaches have therefore been proposed for asymmetric classification problems, such as cost-sensitive learning, algorithmic modifications, adjustment of thresholds, and adaptive prediction sets. However, such methods either provide no probabilistic guarantee of type I error control or provide control only under strong assumptions and restrictions. The aim of this dissertation is to provide a more general approach to asymmetric classification. To address ordinal classification, this dissertation constructs a point prediction method based on the Neyman-Pearson paradigm and an interval prediction method based on conformal prediction. The constructed methods provide probabilistic guarantees of type I error control or cover the true response with high confidence. In addition, knowledge distillation has been applied to the classification of imbalanced binary data, with suitable choices of student and teacher networks and tuning parameters.
Chapter 1 introduces the notation and tools used in the dissertation. Chapter 2 proposes the point prediction method based on the Neyman-Pearson paradigm by reformulating an ordinal regression problem as a multiple hypothesis testing problem. This is a parametric method that is shown to control type I error at a pre-specified level. Chapter 3 introduces interval prediction methods based on conformal prediction that provide both contiguous and non-contiguous prediction regions with proven statistical guarantees. In Chapter 4, a knowledge distillation technique is applied to classify binary data with severe class imbalance. Experiments are conducted on synthetic and real-life data sets, and results are reported for several choices of tuning parameters. Finally, Chapter 5 summarizes the contributions and suggests possible future work.
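
As a rough illustration of how conformal prediction can yield both non-contiguous and contiguous prediction regions for ordered classes, the following is a minimal sketch of generic split conformal prediction. It is not the dissertation's method: the nonconformity score (one minus the predicted probability of a label), the function names, and the "contiguous hull" step are all assumptions chosen for brevity. Because the hull only enlarges the set, it cannot reduce coverage.

```python
import math
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    cal_scores: nonconformity scores of the true labels on a held-out
    calibration set (assumed exchangeable with future test points).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return float("inf")  # too few calibration points: keep every label
    return float(np.sort(np.asarray(cal_scores))[k - 1])

def prediction_set(probs, qhat):
    """Labels whose score 1 - p(label) does not exceed the threshold qhat.

    The result may be non-contiguous when the classes are ordered.
    """
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]

def contiguous_hull(label_set):
    """Smallest run of consecutive ordinal labels containing label_set."""
    if not label_set:
        return []
    return list(range(min(label_set), max(label_set) + 1))

# Hypothetical calibration scores and class probabilities, for illustration only.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
qhat = conformal_quantile(cal_scores, alpha=0.25)
region = prediction_set([0.05, 0.30, 0.10, 0.55], qhat)  # labels 1 and 3
interval = contiguous_hull(region)                       # labels 1, 2, 3
```

In this toy example the set skips label 2, which is why a contiguous version may be preferred when the classes represent ordered stages.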
