Date of Award
Ph.D. in Mathematics
Dao X Nguyen
University of Mississippi
Machine learning and artificial intelligence have changed the world significantly in recent years. Their breakthrough innovations take machine learning to a whole new level, where machines, inspired by the human brain's neural networks, can learn to perform tasks. The generally accepted belief about training these models is that "larger models and more data are always better." In classical statistics, the test error follows a "U-shaped" curve: bigger, more complex models have higher test error, which is unfavorable. But recent studies observed a surprising phenomenon in the test error, called double descent, in which increasing model complexity decreases the test error at first, then increases it, and finally decreases it again after a certain point. This behavior has been demonstrated experimentally for most neural network models, such as CNNs, ResNets, and many other machine learning architectures, but the theory lags far behind.
The model-wise double descent phenomenon can lead to a regime where training with more parameters hurts. As we increase the number of parameters in a neural network, the test error initially decreases, then increases, and, just as the model becomes able to fit the training set, undergoes a second descent. The peak in test error occurs around the interpolation threshold, where models are just barely large enough to fit the training set. A model in the over-parameterized region can perform as well as or better than models in the under-parameterized region.
Inspired by this concept, we study a two-layer neural network with a ReLU activation function designed for binary classification of data under supervised learning. For example, this could be an email spam detection model that classifies emails as spam or not. Our aim is to uncover the mathematical and statistical principles behind the double descent behavior of the test error as the ratio n/d varies, where n is the number of training samples and d is the dimension of the model, and we consider the asymptotics of the test error as n, d → ∞. Several studies have examined the double descent behavior of similar neural network models, but only for linear binary classification; we take another step forward by adding a ReLU activation function before the output layer. We classify the ReLU output using a classification rule to find the theoretical test error, and we use the Legendre transformation and the Convex Gaussian Minimax Theorem to solve the empirical risk minimization problem.
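To make the model concrete, the following is a minimal sketch of a two-layer ReLU network with a sign-based classification rule, as described above. The function names, the fixed weight matrix W for the hidden layer, and the output weights v are illustrative assumptions, not the thesis's actual parameterization or training procedure.

```python
def relu(z):
    # ReLU activation: max(0, z)
    return max(0.0, z)

def two_layer_output(x, W, v):
    # Hidden layer: ReLU applied to each inner product <w_k, x>;
    # output layer: weighted sum of the hidden activations.
    # W is a list of hidden-unit weight vectors, v the output weights
    # (both hypothetical placeholders for the model's parameters).
    hidden = [relu(sum(w_j * x_j for w_j, x_j in zip(w_row, x)))
              for w_row in W]
    return sum(v_k * h_k for v_k, h_k in zip(v, hidden))

def classify(x, W, v):
    # Classification rule: map the sign of the network output
    # to the binary labels +1 / -1 (e.g., spam / not spam).
    return 1 if two_layer_output(x, W, v) >= 0 else -1
```

For instance, with identity-like hidden weights W = [[1, 0], [0, 1]] and output weights v = [1, -1], the point x = [2, 1] is classified as +1, since the network output is relu(2) - relu(1) = 1.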
We derive a closed-form expression for the test error of the model (Theorem 3.1.1) and find the asymptotics of the fixed quantities (Theorem 7.2.1) as the ratio n/d increases. Both theorems hold for any margin-based loss function, and we use the square loss to illustrate their results. After confirming the existence of double descent behavior in our model, we analyze the curve with respect to weak and strong regularization and different cluster sizes.
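As a brief illustration of what "margin-based" means here: such a loss depends on the data only through the margin m = y·f(x), where y ∈ {+1, -1} is the label and f(x) the model output. A minimal sketch of the square loss in this margin form and the resulting empirical risk follows; the function names are hypothetical, and this is only an illustration of the loss family, not the thesis's derivation.

```python
def square_loss(margin):
    # Margin-based form of the square loss: l(m) = (1 - m)^2,
    # where m = y * f(x) with label y in {+1, -1}.
    # A correctly classified point with margin 1 incurs zero loss.
    return (1.0 - margin) ** 2

def empirical_risk(margins):
    # Empirical risk: average loss over the training margins.
    return sum(square_loss(m) for m in margins) / len(margins)
```

For example, margins [1, 0, 2] give losses [0, 1, 1], so the empirical risk is 2/3; note that overshooting the target margin (m = 2) is penalized just like undershooting it, which distinguishes the square loss from, say, the hinge loss.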
Abeykoon, Chathurika Srimali, "The Double Descent Behavior in a Two Layer Neural Network for Binary Classification" (2023). Electronic Theses and Dissertations. 2471.
Available for download on Friday, September 13, 2024