Date of Award
Master of Science in Computer Engineering - (M.S.)
Electrical and Computer Engineering
Yun Q. Shi
A class imbalance problem appears in many real world applications, e.g., fault diagnosis, text categorization and fraud detection. When dealing with an imbalanced dataset, feature selection becomes an important issue. To address it, this work proposes a feature selection method that is based on a decision tree rule and weighted Gini index. The effectiveness of the proposed methods is verified by classifying a dataset from Santander Bank and two datasets from UCI machine learning repository. The results show that our methods can achieve higher Area Under the Curve (AUC) and F-measure. We also compare them with filter-based feature selection approaches, i.e., Chi-Square and F-statistic. The results show that they outperform them but need slightly more computational efforts.
Liu, Haoyue, "Decision tree rule-based feature selection for imbalanced data" (2017). Theses. 25.