Degree Name

Doctor of Philosophy in Business Data Science - (Ph.D.)


Department

Data Science

First Advisor

Zhipeng Yan

Second Advisor

Junmin Shi

Third Advisor

Michael A. Ehrlich

Fourth Advisor

Dantong Yu

Fifth Advisor

Xinyuan Tao

Sixth Advisor

Chase Qishi Wu


This research explores the influence of Twitter sentiment on the healthcare and finance industries. It assesses how Twitter sentiment and cultural measures influence COVID-19 statistics, and it investigates the impact of Twitter sentiment on S&P 1500 stock mispricing. Furthermore, it examines how tweet sentiment predicts major industry returns.

The first part examines how Hofstede’s Cultural Dimensions (HCD) and the Twitter Economic Uncertainty index (TEU) relate to COVID-19 infection and death rates. The results show that certain HCD dimensions, namely power distance index (PDI) and masculinity (MAS), are significantly and negatively associated with the infection rate, while indulgence (IVR) and long-term orientation (LTO) are significantly and negatively associated with the death rate. The US-based TEU is associated with the COVID-19 death rate in the short run (up to 3 months). The study proposes practical strategies to help public health officials mitigate the spread of COVID-19.

The second part addresses a research gap by exploring the relation between aggregated tweet content and stock market mispricing. In short, tweet features affect future stock mispricing, with effects that differ in direction and magnitude. For overvalued stocks, tweet variables including the proportion of external links, average number of words, and percentage of retweets, likes, and replies are negatively associated with the mispricing of S&P 1500 stocks. Average number of words may reduce mispricing by lowering idiosyncratic volatility, while the proportion of external links can mitigate mispricing through channels other than liquidity or idiosyncratic volatility. For undervalued stocks, only average number of words is positively related to mispricing, and it affects mispricing through channels other than liquidity or idiosyncratic volatility.

Additionally, this study investigates how tweet sentiment from S&P 1500 firms predicts major industry returns by constructing multiple sentiment indices. Robustness tests yield highly consistent results, showing that these indices can predict returns in three of the five major industries: Consumables, High Technology, and Healthcare. In general, the results are not sensitive to the type of sentiment index or the length of the prediction horizon.
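The abstract does not describe how the sentiment indices are built or how the predictive test is specified. As a minimal sketch, assuming each tweet has already been assigned a sentiment score in [-1, 1] and tagged with a period, the Python below builds two hypothetical index types (an equal-weighted mean score and a bullishness ratio) and fits a one-predictor OLS regression of an industry return `horizon` periods ahead on the current index value. The function names, both index definitions, and the simple OLS specification are illustrative assumptions, not the dissertation's method.

```python
from collections import defaultdict
from statistics import mean


def mean_score_index(tweets):
    """Hypothetical index 1: equal-weighted average sentiment per period.

    `tweets` is an iterable of (period, score) pairs, score assumed in [-1, 1].
    """
    by_period = defaultdict(list)
    for period, score in tweets:
        by_period[period].append(score)
    return {p: mean(scores) for p, scores in sorted(by_period.items())}


def bullishness_index(tweets):
    """Hypothetical index 2: (positive - negative) / (positive + negative).

    Neutral tweets (score == 0) are ignored; a period with no signed tweets gets 0.0.
    """
    pos, neg, periods = defaultdict(int), defaultdict(int), set()
    for period, score in tweets:
        periods.add(period)
        if score > 0:
            pos[period] += 1
        elif score < 0:
            neg[period] += 1
    index = {}
    for p in sorted(periods):
        total = pos[p] + neg[p]
        index[p] = (pos[p] - neg[p]) / total if total else 0.0
    return index


def predictive_ols(signal, returns, horizon=1):
    """Simple OLS of r[t + horizon] on S[t]; returns (alpha, beta).

    `signal` and `returns` are equal-length sequences aligned by period.
    """
    xs = [signal[t] for t in range(len(signal) - horizon)]
    ys = [returns[t + horizon] for t in range(len(signal) - horizon)]
    if len(xs) < 2:
        raise ValueError("need at least two overlapping observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("signal has no variation")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta
```

Fitting `predictive_ols` once per industry, for each index type and several values of `horizon`, is one way to check whether the index type and prediction length matter; a full study would also need standard errors (for example, heteroskedasticity- and autocorrelation-robust ones), which this sketch omits.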

In conclusion, this research shows that tweet sentiment is more than meaningless noise. Instead, it has beneficial applications in both healthcare and finance, such as predicting the course of the COVID-19 pandemic and serving as a possible reference for investment decisions.


