Document Type


Date of Award


Degree Name

Doctor of Philosophy in Computer Engineering - (Ph.D.)


Electrical and Computer Engineering

First Advisor

MengChu Zhou

Second Advisor

Nirwan Ansari

Third Advisor

Hieu Pham Trung Nguyen

Fourth Advisor

Qing Liu

Fifth Advisor

Zhipeng Yan


People now use the Internet to share their assessments, impressions, ideas, and observations about various subjects and products on numerous social networking sites. These sites are a rich source of information for data analytics, sentiment analysis, natural language processing, and related tasks. The most critical challenge is interpreting this data and capturing the sentiment behind these expressions. Sentiment analysis is the process of analyzing subjective, opinion-bearing text and inferring the views it expresses. Companies use sentiment analysis to understand public opinion, perform market research, analyze brand reputation, recognize customer experiences, and study social media influence. Depending on the granularity required, it can be divided into document-level, sentence-level, and aspect-based sentiment analysis.

Conventionally, the true sentiment of a customer review is assumed to match its star rating. There are exceptions, however, in which the star rating of a review contradicts the sentiment of its text. In this work, such reviews are labeled as outliers in a dataset. State-of-the-art anomaly detection methods rely on manual search, predefined rules, or machine learning techniques to detect such instances. This dissertation proposes a statistics-based anomaly detection and correction method (SADCM), which identifies such reviews and rectifies their star ratings to enhance the performance of a sentiment analysis algorithm. Because the pipeline corrects the outliers rather than discarding them, no data is lost.
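The idea of flagging a review whose star rating disagrees with its text, and then correcting the rating instead of deleting the review, can be illustrated with a minimal sketch. This is not the SADCM procedure itself, whose details are not given in this abstract. It assumes that each review already carries a polarity score in [-1, 1] from some sentiment scorer, that star ratings map linearly onto that range, and that a z-score threshold of 2.0 marks an outlier. The names `expected_polarity`, `detect_and_correct`, and `z_threshold` are hypothetical.

```python
from statistics import mean, stdev


def expected_polarity(stars):
    # Assumed linear mapping: 1 star -> -1.0, 3 stars -> 0.0, 5 stars -> +1.0.
    return (stars - 3) / 2


def detect_and_correct(reviews, z_threshold=2.0):
    """Flag reviews whose rating disagrees with their text polarity, then fix the rating.

    reviews: list of dicts with 'stars' (1..5) and 'polarity' (-1.0..1.0).
    Returns (corrected_reviews, flagged_indices); no review is dropped.
    """
    # Residual = observed polarity minus the polarity the star rating implies.
    residuals = [r["polarity"] - expected_polarity(r["stars"]) for r in reviews]
    if len(residuals) < 2:
        # A sample standard deviation needs at least two points.
        return [dict(r) for r in reviews], []
    mu, sigma = mean(residuals), stdev(residuals)

    corrected, flagged = [], []
    for i, (review, res) in enumerate(zip(reviews, residuals)):
        fixed = dict(review)  # copy so the input list is left untouched
        if sigma > 0 and abs(res - mu) / sigma > z_threshold:
            # Replace the rating with the one implied by the text, clamped to 1..5.
            stars = round(3 + 2 * review["polarity"])
            fixed["stars"] = min(5, max(1, stars))
            flagged.append(i)
        corrected.append(fixed)
    return corrected, flagged


# Illustrative, made-up input: the ninth review is rated 5 stars but reads as negative.
sample = [
    {"stars": 5, "polarity": 0.9}, {"stars": 5, "polarity": 0.8},
    {"stars": 4, "polarity": 0.6}, {"stars": 4, "polarity": 0.4},
    {"stars": 3, "polarity": 0.1}, {"stars": 2, "polarity": -0.4},
    {"stars": 1, "polarity": -0.9}, {"stars": 1, "polarity": -0.8},
    {"stars": 5, "polarity": -0.9}, {"stars": 5, "polarity": 1.0},
]
corrected_sample, flagged_sample = detect_and_correct(sample)
```

The sketch returns every review it receives, so the corrected dataset has the same size as the input, which mirrors the no-data-loss property described above.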

This research applies SADCM to datasets of customer reviews of various products, which are either (a) scraped or (b) publicly available. The scraped dataset includes 35,000 Amazon customer reviews, while the publicly available dataset includes 100,000 Amazon customer reviews for multiple products reviewed this year. The work also analyzes these datasets and evaluates the effect of SADCM on the performance of several sentiment analysis algorithms. The results show that SADCM outperforms other state-of-the-art anomaly detection algorithms, with higher accuracy and recall on all the datasets. The proposed method should thus help businesses that rely on public reviews make better decisions.
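For readers unfamiliar with the two reported metrics, the following sketch shows how accuracy and recall could be computed for an anomaly detector, assuming a ground-truth list that marks which reviews are truly mismatched. The function name `accuracy_and_recall` and the toy labels are hypothetical and do not reproduce any result from the dissertation.

```python
def accuracy_and_recall(actual, predicted):
    """Compute accuracy and recall for boolean anomaly labels.

    actual:    True where a review really is an outlier.
    predicted: True where the detector flagged the review.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # outliers correctly flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # normal reviews left alone
    fn = sum(1 for a, p in pairs if a and not p)      # outliers that were missed
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels for illustration only.
toy_actual = [True, False, False, True]
toy_predicted = [True, False, True, False]
toy_accuracy, toy_recall = accuracy_and_recall(toy_actual, toy_predicted)
```

Recall matters here because outliers are typically rare: a detector that flags nothing can still score a high accuracy while missing every mismatched review.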


