Date of Award

12-31-2019

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Information Systems - (Ph.D.)

Department

Informatics

First Advisor

Yi-Fang Brook Wu

Second Advisor

Vincent Oria

Third Advisor

Hai Nhat Phan

Fourth Advisor

Shaohua David Wang

Fifth Advisor

Zhi Wei

Abstract

The ever-increasing popularity and convenience of social media enable fake news to spread rapidly and widely, which can harm both individuals and society. Early detection of fake news is essential to minimize its social harm. Existing machine learning approaches cannot detect a fake news story soon after it starts to spread, because they require a certain amount of data to reach decent effectiveness, and that data takes time to accumulate. To solve this problem, this research first shows that, on social media, the user characteristics of fake news spreaders are distributed significantly differently from those of the general user population. Based on this finding, and on the fact that news spreaders' user profiles are usually readily available at the start of news propagation, this research proposes three machine learning models that detect fake news early based on the user characteristics of its spreaders. The first model, Propagation Path Classification (PPC), detects fake news by combining recurrent neural networks with convolutional neural networks to classify its propagation path, which is represented as a sequence of user feature vectors. The second model, Social Media Content Classification (SMCC), improves on the first model by adding 1) an embedding layer and an integration layer to model news spreaders, and 2) a fake news spreader likelihood score to model source users independently, which is particularly useful when the propagation path is extremely short, i.e., when there are only very few retweets. The third model, Fake News Early Detection (FNED), further improves on the first two models by combining users' text responses with their user characteristics as status-sensitive crowd responses, which contain more information than text responses or user characteristics alone.
Two novel deep learning mechanisms are also proposed as key components of the third model: 1) a position-aware attention mechanism to determine which status-sensitive crowd responses are more discriminative; and 2) multi-region mean-pooling to aggregate intermediate features in multiple timeframes, which improves performance when very few retweets are available and zero-padding is therefore needed. The third model also incorporates a PU-Learning (Learning from Positive and Unlabeled Examples) framework to handle unlabeled and imbalanced data.
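
As a rough illustration of why pooling over several regions can help with zero-padded sequences, the following Python sketch mean-pools a short feature sequence over contiguous regions and compares the result with a single global mean. It is not the dissertation's implementation: the function name `multi_region_mean_pool`, the choice of equal-sized contiguous regions, and the toy feature values are all assumptions made for illustration.

```python
def multi_region_mean_pool(features, num_regions):
    """Mean-pool a sequence of feature vectors over contiguous regions.

    features: list of T equal-length lists of floats (zero-padded if needed).
    num_regions: number of regions to split the T positions into.
    Returns a flat list of length num_regions * d.
    Equal-sized contiguous regions are an assumption of this sketch.
    """
    t = len(features)
    if not 1 <= num_regions <= t:
        raise ValueError("num_regions must be between 1 and len(features)")
    d = len(features[0])
    pooled = []
    for r in range(num_regions):
        # Integer boundaries that cover all T positions with no empty region.
        start = r * t // num_regions
        end = (r + 1) * t // num_regions
        region = features[start:end]
        for j in range(d):
            pooled.append(sum(row[j] for row in region) / len(region))
    return pooled


# Hypothetical example: three real responses, zero-padded to length 8.
real = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
padded = real + [[0.0, 0.0] for _ in range(5)]

pooled = multi_region_mean_pool(padded, num_regions=4)
global_mean = multi_region_mean_pool(padded, num_regions=1)

print(pooled)       # early regions keep the signal of the real responses
print(global_mean)  # a single mean is diluted by the padded zeros
```

In this toy case the first region averages only real responses, whereas the single global mean is pulled toward zero by the five padded positions.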

Comprehensive experiments were conducted to evaluate the proposed models on two datasets collected from Twitter and Sina Weibo, respectively. The experimental results demonstrate that the proposed models can detect fake news with over 90% accuracy within five minutes after it starts to spread and before it is retweeted 50 times, which is significantly faster than state-of-the-art baselines. Also, the third proposed model requires only 10% labeled fake news samples to achieve this effectiveness under PU-Learning settings. These advantages indicate a promising potential for the proposed models to be implemented in real-world social media platforms for fake news detection.
