Date of Award
1-1-2025
Document Type
Dissertation
Degree Name
Ph.D. in Engineering Science
First Advisor
Yili Jiang
Second Advisor
Charles Walter
Third Advisor
Joshua Brown
School
University of Mississippi
Relational Format
dissertation/thesis
Abstract
Machine learning (ML) algorithms play a critical role in automated decision-making systems across domains such as healthcare, finance, and autonomous systems. However, these models are increasingly vulnerable to adversarial threats, particularly poisoning attacks that manipulate training data without the developers' knowledge. Because ML models are often trained on publicly available data, data poisoning is inexpensive for attackers to carry out, and the current ML training pipeline offers no way to determine whether training data is legitimate, poisoned during data collection, or poisoned during training.
This dissertation investigates data poisoning attacks, with a focus on label flipping and gradient manipulation techniques, two attacks capable of compromising the integrity and performance of ML systems. The dissertation also explores the impact of these attacks on key performance metrics, including detection accuracy across multiple ML algorithms. To establish a foundation for evaluation, I benchmark Decision Trees, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest, and Support Vector Machines (SVM) on manipulated datasets, generating strong baseline performance metrics that enable direct comparison of poisoning attacks.
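As a minimal illustration of the label-flipping attack studied here (not the dissertation's actual attack code), the sketch below randomly reassigns a chosen fraction of training labels to a different class; the function name and parameters are hypothetical:

```python
import numpy as np

def flip_labels(y, flip_rate, num_classes, rng=None):
    """Label-flipping poisoning: reassign a fraction of labels to a wrong class.

    y           -- integer label array
    flip_rate   -- fraction of samples to poison (e.g. 0.1 for 10%)
    num_classes -- total number of classes
    """
    rng = np.random.default_rng(rng)
    y_poisoned = y.copy()
    n_flip = int(len(y) * flip_rate)
    # Pick distinct victim samples, then flip each to a different class.
    idx = rng.choice(len(y), size=n_flip, replace=False)
    for i in idx:
        wrong_classes = [c for c in range(num_classes) if c != y[i]]
        y_poisoned[i] = rng.choice(wrong_classes)
    return y_poisoned
```

Training each benchmarked classifier on `y_poisoned` instead of `y`, while evaluating on clean test labels, gives the before/after accuracy comparison described above.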
Building on this foundation, I introduce DynaDetect [44], a novel KNN-based algorithm designed to detect data poisoning attacks in real time. I further develop DynaDetect2.0, an improved version that integrates Convolutional Neural Networks (CNNs) for feature extraction and the Mahalanobis distance for greater detection accuracy on high-dimensional data. I demonstrate the viability of DynaDetect2.0 on the CIFAR-10, ImageNet, and GTSRB datasets, where it outperforms both the original DynaDetect and traditional KNN at detecting label-flipping and gradient poisoning attacks.
To better understand the impact of DynaDetect2.0, I assess the vulnerability of multiple ML algorithms to poisoning attacks and examine new potential detection methods. This work emphasizes computational overhead, efficiency, and latency as key constraints, ensuring these algorithms can rapidly and accurately detect data poisoning in real-world scenarios. The results provide insights into the conditions under which ML algorithms are most vulnerable to poisoning attacks and offer effective strategies for identifying these threats.
This dissertation’s anticipated contributions include advancements in detection mechanisms, such as DynaDetect2.0, and the application of these techniques to other traditional ML algorithms. By improving ML systems’ resilience, this work aims to strengthen their security and reliability, ensuring they can withstand sophisticated malicious attacks across diverse application environments.
Recommended Citation
Perry, Sabrina, "Improving Detection Capabilities of Traditional Machine Learning (ML) Algorithms Against Data Poisoning Attacks on Image Data" (2025). Electronic Theses and Dissertations. 3354.
https://egrove.olemiss.edu/etd/3354