Date of Award

Fall 12-31-2017

Document Type

Thesis

Degree Name

Master of Science in Computer Science - (M.S.)

Department

Computer Science

First Advisor

James Geller

Second Advisor

Soon Ae Chun

Third Advisor

Hai Nhat Phan

Abstract

The topic of this project is the analysis of drug-related tweets. The goal is to build a Machine Learning Model that can distinguish tweets that indicate drug abuse from tweets that contain the name of a drug but do not describe abuse. The drugs in question can be illegal, such as heroin, or legal with a potential for abuse, such as painkillers. Building a good Machine Learning Model, however, requires a large amount of training data, and for each training tweet a human expert must determine whether it indicates drug abuse. This is difficult work for humans. In this project, a new “Looping Predictive Method” was developed that generates large training datasets from a small seed set of tweets by repeatedly adding machine-labeled tweets to the human-labeled tweets. With this method, an accuracy improvement of 15.4% was achieved by expanding the training set from an initial 1,075 tweets to 29,908 tweets.
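The looping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not name the classifier or say how machine-labeled tweets are selected, so the small naive Bayes model, the confidence `threshold`, the `max_rounds` limit, and all function and class names below are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would handle hashtags, URLs, etc.
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total)
            for w in tokenize(text):
                # The extra +1 in the denominator reserves mass for unseen words.
                score += math.log(
                    (self.word_counts[c][w] + 1) / (n_words + len(self.vocab) + 1)
                )
            log_scores[c] = score
        # Normalize the log scores into probabilities.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def looping_self_training(seed_texts, seed_labels, unlabeled,
                          threshold=0.8, max_rounds=5):
    """Grow a training set by repeatedly adding machine-labeled tweets.

    Assumption: only predictions with probability >= threshold are added.
    """
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = TinyNaiveBayes().fit(texts, labels)
        confident, remaining = [], []
        for tweet in pool:
            probs = model.predict_proba(tweet)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((tweet, label))
            else:
                remaining.append(tweet)
        if not confident:
            break  # Nothing new was labeled confidently; stop looping.
        for tweet, label in confident:
            texts.append(tweet)
            labels.append(label)
        pool = remaining
    # Retrain once more on the final, expanded training set.
    return TinyNaiveBayes().fit(texts, labels), texts, labels
```

Calling `looping_self_training` with a few human-labeled tweets and a pool of unlabeled ones returns the retrained model together with the expanded training set. Filtering by confidence is one common way to limit the labeling errors that accumulate across rounds; whether the thesis uses such a filter is not stated in the abstract.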
