Document Type

Thesis

Date of Award

Fall 12-31-2017

Degree Name

Master of Science in Computer Science - (M.S.)

Department

Computer Science

First Advisor

James Geller

Second Advisor

Soon Ae Chun

Third Advisor

Hai Nhat Phan

Abstract

This project analyzes drug-related tweets. The goal is to build a machine learning model that distinguishes tweets indicating drug abuse from tweets that mention a drug by name but do not describe abuse. The drugs in question may be illegal, such as heroin, or legal with a potential for abuse, such as painkillers. Building a good machine learning model requires a large amount of training data, and each training tweet must be labeled by a human expert as indicating drug abuse or not, which is difficult work. In this project, a new “Looping Predictive Method” was developed that generates large training datasets from a small seed set of tweets by repeatedly adding machine-labeled tweets to the human-labeled tweets. With this method, expanding the training set from an initial 1,075 tweets to 29,908 tweets yielded an accuracy improvement of 15.4%.
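The looping idea described in the abstract can be illustrated with a generic self-training loop. The sketch below is not the thesis implementation: the abstract does not say which classifier, confidence rule, or stopping criterion was used, so the small Naive Bayes classifier, the `threshold` and `max_rounds` parameters, the function names, and the example tweets are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing.

    A stand-in classifier; the thesis may use a different model.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        """Return a dict mapping each class to its posterior probability."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log(
                        (self.word_counts[c][w] + 1)
                        / (total_words + len(self.vocab))
                    )
            log_scores[c] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def looping_self_training(seed_texts, seed_labels, unlabeled,
                          threshold=0.8, max_rounds=5):
    """Grow a training set by repeatedly adding confident machine labels.

    threshold and max_rounds are assumed knobs, not values from the thesis.
    """
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = NaiveBayes().fit(texts, labels)
        remaining, added = [], 0
        for tweet in pool:
            probs = model.predict_proba(tweet)
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                texts.append(tweet)      # machine-labeled tweet joins the set
                labels.append(label)
                added += 1
            else:
                remaining.append(tweet)  # stays unlabeled for the next round
        pool = remaining
        if added == 0 or not pool:
            break
    return NaiveBayes().fit(texts, labels), texts, labels


# Made-up example tweets, for illustration only.
seed_texts = [
    "got high on oxycodone last night",
    "snorted heroin again feeling numb",
    "doctor prescribed oxycodone after my surgery",
    "news report on heroin seizures at the border",
]
seed_labels = ["abuse", "abuse", "not_abuse", "not_abuse"]
unlabeled = [
    "so high on heroin last night",
    "my doctor prescribed painkillers after surgery",
    "watching a movie",
]
model, texts, labels = looping_self_training(seed_texts, seed_labels, unlabeled)
print(len(seed_texts), "seed tweets grew to", len(texts), "training tweets")
```

A confidence threshold is one common way to limit the risk that wrong machine labels accumulate across rounds; whether the thesis filters machine-labeled tweets this way is not stated in the abstract.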
