Document Type
Thesis
Date of Award
Fall 12-31-2017
Degree Name
Master of Science in Computer Science - (M.S.)
Department
Computer Science
First Advisor
James Geller
Second Advisor
Soon Ae Chun
Third Advisor
Hai Nhat Phan
Abstract
This project analyzes drug-related tweets. The goal is to build a machine learning model that can distinguish tweets that indicate drug abuse from tweets that mention the name of a drug but do not describe abuse. The drugs in question can be illegal, such as heroin, or legal with a potential for abuse, such as painkillers. Building a good machine learning model, however, requires a large amount of training data, and for each training tweet a human expert must determine whether or not it indicates drug abuse. This is difficult work for humans. In this project a new “Looping Predictive Method” was developed that generates large training datasets from a small seed set of tweets by repeatedly adding machine-labeled tweets to the human-labeled tweets. With this method, an accuracy improvement of 15.4% was achieved by expanding the training set from an initial 1,075 tweets to 29,908 tweets.
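The loop described in the abstract, in which machine-labeled tweets are repeatedly added to the human-labeled seed set, resembles what is generally called self-training. The following is a minimal sketch of that general idea only, not the implementation used in the thesis. The abstract does not specify the classifier, the features, or how machine-labeled tweets are selected, so everything here is an assumption made for illustration: the tiny naive Bayes classifier, the confidence threshold of 0.8, the round limit, the function names (train, predict_proba, looping_label), and the toy tweets.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs.

    Labels are 1 (indicates abuse) or 0 (does not). Hypothetical helper.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in labeled:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return the model's estimate of P(label == 1 | text), Laplace-smoothed."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (n_words + len(vocab))
                )
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def looping_label(seed, unlabeled, threshold=0.8, max_rounds=5):
    """Grow a training set by repeatedly adding confidently machine-labeled texts.

    The threshold and max_rounds values are assumptions, not taken from the thesis.
    Returns (labeled pairs, texts that were never labeled confidently).
    """
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)  # retrain on human + machine labels so far
        confident, remaining = [], []
        for text in pool:
            p = predict_proba(model, text)
            if p >= threshold:
                confident.append((text, 1))
            elif p <= 1 - threshold:
                confident.append((text, 0))
            else:
                remaining.append(text)
        if not confident:  # nothing new was labeled, so stop looping
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    seed = [
        ("got high on painkillers again", 1),
        ("doctor prescribed painkillers after surgery", 0),
    ]
    unlabeled = [
        "so high on painkillers tonight",
        "painkillers prescribed after my surgery",
    ]
    grown, leftover = looping_label(seed, unlabeled)
    print(len(grown), "labeled;", len(leftover), "left unlabeled")
```

Only predictions that clear the confidence threshold are added in this sketch, because low-confidence machine labels tend to introduce label noise that compounds over later rounds. Whether the thesis filters machine-labeled tweets in this way is not stated in the abstract.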
Recommended Citation
Pogili, Subramanyam Reddy, "Looping predictive method to improve accuracy of a machine learning model" (2017). Theses. 45.
https://digitalcommons.njit.edu/theses/45