Date of Award
Ph.D. in Engineering Science
University of Mississippi
Random forests (RFs) have been widely used for supervised learning tasks because of their high prediction accuracy, good model interpretability, and fast training process. However, they are not able to learn from local structures as convolutional neural networks (CNNs) do when there is high dependency among features. They also cannot utilize features that are jointly dependent on the label but marginally independent of it. In this dissertation, we present two approaches that address these two problems, respectively, through dependence analysis. First, a local feature sampling (LFS) approach is proposed to learn and use the locality information of features, so that dependent/correlated features are grouped to train each tree. For image data, the local information of features (pixels) is defined by the 2-D grid of the image. For non-image data, we provide multiple ways of estimating this local structure. Our experiments show that RF with LFS has reduced correlation and improved accuracy on multiple UCI datasets. To address the second problem, we propose a way to categorize features as marginally dependent features and jointly dependent features; the latter are defined by minimum dependence sets (MDSs) or by stronger dependence sets (SDSs). Algorithms to identify MDSs and SDSs are provided. We then present a feature dependence mapping (FDM) approach that maps the jointly dependent features to another feature space in which they are marginally dependent. We show that, by using FDM, decision trees and RFs have improved prediction performance on artificial datasets and a protein expression dataset.
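The distinction between marginal and joint dependence can be illustrated with a toy XOR dataset. The sketch below is only an illustration under stated assumptions, not the dissertation's FDM algorithm: the helper `label_rate_by_value` and the joint encoding `2 * x1 + x2` are hypothetical choices made for this example. Each feature alone leaves the positive-label rate at 0.5, whereas a single feature that encodes the pair separates the labels completely.

```python
from itertools import product


def label_rate_by_value(feature, labels):
    """Map each feature value to the fraction of positive labels among its rows."""
    totals, positives = {}, {}
    for value, label in zip(feature, labels):
        totals[value] = totals.get(value, 0) + 1
        positives[value] = positives.get(value, 0) + label
    return {value: positives[value] / totals[value] for value in totals}


# Toy data: every (x1, x2) combination, repeated 25 times; label is x1 XOR x2.
rows = list(product([0, 1], repeat=2)) * 25
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

# Marginally, neither feature changes the label rate (0.5 for every value).
rate_x1 = label_rate_by_value(x1, y)
rate_x2 = label_rate_by_value(x2, y)

# Hypothetical mapping: encode the pair as one categorical feature.
# In this new feature space the label rate depends on the feature value.
joint = [2 * a + b for a, b in rows]
rate_joint = label_rate_by_value(joint, y)

print("x1:", rate_x1)
print("x2:", rate_x2)
print("joint:", rate_joint)
```

A greedy tree split that scores one feature at a time sees no gain from `x1` or `x2` individually in this example, which is why a mapping to a space where the dependence is marginal can help.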
Zhang, Silu, "Improving random forests by feature dependence analysis" (2019). Electronic Theses and Dissertations. 1800.