An embedded feature selection method for imbalanced data classification
Document Type
Article
Publication Date
5-1-2019
Abstract
Imbalanced datasets are frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For such datasets, improving the accuracy with which the minority class is identified is critically important. Feature selection is one way to address this issue. An effective feature selection method chooses a subset of features that favors accurate determination of the minority class. A decision tree is a classifier that can be built using different splitting criteria. Its advantage is the ease of detecting which feature is used at a splitting node. Thus, a decision tree splitting criterion can serve as a feature selection method. In this paper, an embedded feature selection method using our proposed weighted Gini index (WGI) is presented. Comparisons with the Chi2, F-statistic, and Gini index feature selection methods show that F-statistic and Chi2 reach the best performance when only a few features are selected. As the number of selected features increases, our proposed method has the highest probability of achieving the best performance. The area under the receiver operating characteristic curve (ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high even if only a few features are selected, and changes only slightly as more features are selected. However, F-measure reaches excellent performance only if 20% or more of the features are chosen. The results help practitioners select a proper feature selection method when facing a practical problem.
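The idea of reusing a decision tree splitting criterion as a feature score can be illustrated with a small sketch. The abstract does not give the WGI formula, so the code below is not the authors' method: it assumes a simple class-weighted Gini impurity in which each sample counts with the weight of its class, and it ranks features by the best impurity decrease a single threshold split can achieve. The function names (`weighted_counts`, `gini`, `split_gain`, `rank_features`) and the `class_weights` dictionary are hypothetical, chosen only for illustration.

```python
from collections import Counter


def weighted_counts(labels, class_weights):
    """Per-class mass: sample count times the class weight (default weight 1.0)."""
    counts = Counter(labels)
    return {c: n * class_weights.get(c, 1.0) for c, n in counts.items()}


def gini(labels, class_weights):
    """Class-weighted Gini impurity, 1 - sum_k p_k^2, where p_k is a weighted share.

    This weighting scheme is an assumption for illustration, not the paper's WGI.
    """
    mass = weighted_counts(labels, class_weights)
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def split_gain(values, labels, class_weights):
    """Largest impurity decrease over all thresholds 'x <= t' on one feature."""
    parent_mass = sum(weighted_counts(labels, class_weights).values())
    parent = gini(labels, class_weights)
    best = 0.0
    # The largest distinct value is skipped: it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        left_mass = sum(weighted_counts(left, class_weights).values())
        right_mass = sum(weighted_counts(right, class_weights).values())
        children = (left_mass * gini(left, class_weights)
                    + right_mass * gini(right, class_weights)) / parent_mass
        best = max(best, parent - children)
    return best


def rank_features(columns, labels, class_weights):
    """Return feature names sorted from highest to lowest split gain."""
    scores = {name: split_gain(col, labels, class_weights)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With an empty `class_weights` dictionary every class has weight 1.0 and `gini` reduces to the ordinary Gini impurity; giving the minority class a larger weight makes splits that isolate minority samples score higher. Keeping the top-ranked names from `rank_features` then plays the role of feature selection.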
Identifier
85063899495 (Scopus)
Publication Title
IEEE/CAA Journal of Automatica Sinica
External Full Text Location
https://doi.org/10.1109/JAS.2019.1911447
e-ISSN
2329-9274
ISSN
2329-9266
First Page
703
Last Page
715
Issue
3
Volume
6
Grant
CMMI-1162482
Recommended Citation
Liu, Haoyue; Zhou, Mengchu; and Liu, Qing, "An embedded feature selection method for imbalanced data classification" (2019). Faculty Publications. 7620.
https://digitalcommons.njit.edu/fac_pubs/7620
