Date of Award

Spring 5-31-2017

Document Type

Thesis

Degree Name

Master of Science in Computer Engineering - (M.S.)

Department

Electrical and Computer Engineering

First Advisor

MengChu Zhou

Second Advisor

Osvaldo Simeone

Third Advisor

Yun Q. Shi

Abstract

The class imbalance problem appears in many real-world applications, e.g., fault diagnosis, text categorization, and fraud detection. When dealing with an imbalanced dataset, feature selection becomes an important issue. To address it, this work proposes a feature selection method based on a decision tree rule and a weighted Gini index. The effectiveness of the proposed methods is verified by classifying a dataset from Santander Bank and two datasets from the UCI Machine Learning Repository. The results show that the proposed methods achieve higher Area Under the Curve (AUC) and F-measure. They are also compared with filter-based feature selection approaches, namely Chi-Square and F-statistic. The proposed methods outperform these filter-based approaches but require slightly more computational effort.
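The abstract does not spell out the weighted Gini index, so the following is only a minimal sketch of one plausible reading, not the thesis's actual formulation. It assumes that each sample is weighted inversely to its class frequency, that a feature is scored by the largest impurity reduction any single-threshold split (a one-level decision tree rule) can achieve, and that features are ranked by that score. The function names `weighted_gini`, `best_split_gain`, and `rank_features` are hypothetical.

```python
from collections import Counter


def weighted_gini(labels, class_weights):
    """Gini impurity where each sample counts with its class weight (assumed form)."""
    counts = Counter(labels)
    total = sum(class_weights[c] * n for c, n in counts.items())
    if total == 0:
        return 0.0
    return 1.0 - sum((class_weights[c] * n / total) ** 2 for c, n in counts.items())


def best_split_gain(values, labels, class_weights):
    """Largest weighted-Gini reduction over all single-threshold splits of one feature."""
    def mass(ys):
        return sum(class_weights[y] for y in ys)

    parent = weighted_gini(labels, class_weights)
    total_mass = mass(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest,
    # so both children of the split are non-empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (mass(left) * weighted_gini(left, class_weights)
                 + mass(right) * weighted_gini(right, class_weights)) / total_mass
        best = max(best, parent - child)
    return best


def rank_features(rows, labels):
    """Return (feature indices sorted by descending gain, per-feature gains)."""
    counts = Counter(labels)
    # Assumed weighting: inverse class frequency, so each class has equal total mass.
    class_weights = {c: len(labels) / (len(counts) * n) for c, n in counts.items()}
    gains = [
        best_split_gain([row[j] for row in rows], labels, class_weights)
        for j in range(len(rows[0]))
    ]
    order = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return order, gains


if __name__ == "__main__":
    # Toy imbalanced data (made up for illustration): 8 majority, 2 minority samples.
    rows = [[0.1, 5], [0.2, 1], [0.3, 4], [0.4, 2], [0.5, 5],
            [0.6, 1], [0.7, 3], [0.8, 2], [0.9, 4], [1.0, 3]]
    labels = [0] * 8 + [1] * 2
    order, gains = rank_features(rows, labels)
    print(order, gains)
```

With inverse-frequency weights, the minority class contributes as much mass to the impurity as the majority class, so a feature that isolates the few minority samples is not drowned out by the majority. A selection step would then keep the top-ranked features before training a classifier.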
