Document Type
Thesis
Date of Award
Spring 5-31-2017
Degree Name
Master of Science in Computer Engineering - (M.S.)
Department
Electrical and Computer Engineering
First Advisor
MengChu Zhou
Second Advisor
Osvaldo Simeone
Third Advisor
Yun Q. Shi
Abstract
The class imbalance problem appears in many real-world applications, e.g., fault diagnosis, text categorization, and fraud detection. When dealing with an imbalanced dataset, feature selection becomes an important issue. To address it, this work proposes a feature selection method that is based on a decision tree rule and a weighted Gini index. The effectiveness of the proposed methods is verified by classifying a dataset from Santander Bank and two datasets from the UCI Machine Learning Repository. The results show that the proposed methods can achieve a higher Area Under the Curve (AUC) and F-measure. They are also compared with two filter-based feature selection approaches, namely Chi-Square and F-statistic. The results show that the proposed methods outperform these filter-based approaches but require slightly more computational effort.
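The abstract does not spell out how the weighted Gini index is defined, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes one common weighting scheme (each class weighted by its inverse frequency) and ranks features by the best single-threshold decrease in weighted Gini impurity. All function names here (`weighted_gini`, `inverse_frequency_weights`, `split_gain`, `rank_features`) are hypothetical and chosen for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_gini(labels, class_weights):
    """Gini impurity where each sample counts with its class weight."""
    mass = Counter()
    for y in labels:
        mass[y] += class_weights[y]
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def split_gain(values, labels, threshold, class_weights):
    """Decrease in weighted Gini when splitting a feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]

    def mass(ys):
        return sum(class_weights[y] for y in ys)

    total = mass(labels)
    children = (
        mass(left) * weighted_gini(left, class_weights)
        + mass(right) * weighted_gini(right, class_weights)
    ) / total
    return weighted_gini(labels, class_weights) - children


def rank_features(columns, labels):
    """Order feature names by their best single-split weighted-Gini gain."""
    weights = inverse_frequency_weights(labels)
    scores = {}
    for name, values in columns.items():
        # Every distinct value except the largest is a candidate threshold.
        candidates = sorted(set(values))[:-1]
        scores[name] = max(
            (split_gain(values, labels, t, weights) for t in candidates),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)
```

On a toy dataset with eight majority and two minority samples, the plain Gini impurity of the labels is about 0.32, while the inverse-frequency weighting above raises it to 0.5, so a split that isolates the minority class yields a larger gain than it would without weighting. That is the intuition behind weighting the index for imbalanced data; the thesis itself should be consulted for the exact definition it uses.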
Recommended Citation
Liu, Haoyue, "Decision tree rule-based feature selection for imbalanced data" (2017). Theses. 25.
https://digitalcommons.njit.edu/theses/25