Date of Award
Doctor of Philosophy in Electrical Engineering - (Ph.D.)
Electrical and Computer Engineering
Senjuti Basu Roy
Qing Gary Liu
Internet traffic classification or packet classification is the act of classifying packets using the extracted statistical data from the transmitted packets on a computer network. Internet traffic classification is an essential tool for Internet service providers to manage network traffic, provide users with the intended quality of service (QoS), and perform surveillance. QoS measures prioritize a network's traffic type over other traffic based on preset criteria; for instance, it gives higher priority or bandwidth to video traffic over website browsing traffic. Internet packet classification methods are also used for automated intrusion detection. They analyze incoming traffic patterns and identify malicious packets used for denial of service (DoS) or similar attacks. Internet traffic classification may also be used for website fingerprinting attacks in which an intruder analyzes encrypted traffic of a user to find behavior or usage patterns and infer the user's online activities.
Protecting users' online privacy against traffic classification attacks is the primary motivation of this work. This dissertation shows the effectiveness of machine learning algorithms in identifying user traffic by comparing 11 state-of-art classifiers and proposes three anonymization methods for masking generated user network traffic to counter the Internet packet classifiers. These methods are equalized packet length, equalized packet count, and equalized inter-arrival times of TCP packets. This work compares the results of these anonymization methods to show their effectiveness in reducing machine learning algorithms' performance for traffic classification. The results are validated using newly generated user traffic.
Additionally, a novel model based on a generative adversarial network (GAN) is introduced to automate countering the adversarial traffic classifiers. This model, which is called GAN tunnel, generates pseudo traffic patterns imitating the distributions of the real traffic generated by actual applications and encapsulates the actual network packets into the generated traffic packets. The GAN tunnel's performance is tested against random forest and extreme gradient boosting (XGBoost) traffic classifiers. These classifiers are shown not being able of detecting the actual source application of data exchanged in the GAN tunnel in the tested scenarios in this thesis.
Fathi-Kazerooni, Sina, "Countering internet packet classifiers to improve user online privacy" (2020). Dissertations. 1701.