Improving random forests by feature dependence analysis

Silu Zhang

Date of Award

1-1-2019

Document Type

Dissertation

Degree Name

Ph.D. in Engineering Science

First Advisor

Yixin Chen

Second Advisor

Xin Dang

School

University of Mississippi

Relational Format

dissertation/thesis

Abstract

Random forests (RFs) have been widely used for supervised learning tasks because of their high prediction accuracy, good model interpretability, and fast training process. However, they are not able to learn from local structures as convolutional neural networks (CNNs) do when there exists high dependency among features. They also cannot utilize features that are jointly dependent on the label but marginally independent of it. In this dissertation, we present two approaches that address these two problems, respectively, by dependence analysis. First, a local feature sampling (LFS) approach is proposed to learn and use the locality information of features to group dependent/correlated features to train each tree. For image data, the local information of features (pixels) is defined by the 2-D grid of the image. For non-image data, we provide multiple ways of estimating this local structure. Our experiments show that RF with LFS has reduced correlation and improved accuracy on multiple UCI datasets. To address the second issue, we propose a way to categorize features as marginally dependent features and jointly dependent features; the latter are defined by minimum dependence sets (MDS's) or by stronger dependence sets (SDS's). Algorithms to identify MDS's and SDS's are provided. We then present a feature dependence mapping (FDM) approach to map the jointly dependent features to another feature space where they are marginally dependent. We show that by using FDM, decision trees and RFs have improved prediction performance on artificial datasets and a protein expression dataset.
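
The second limitation in the abstract (features that are jointly dependent on the label but marginally independent of it) can be illustrated with a small XOR example. The sketch below is not the dissertation's FDM algorithm or its MDS/SDS identification procedure. The function names `entropy` and `information_gain` and the joint encoding `2*a + b` are assumptions introduced only for illustration. It shows that a single-feature split, as used by a decision tree, gains nothing from either XOR input, while a mapped feature built from the pair is marginally dependent on the label.

```python
import itertools
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting labels on one discrete feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Balanced XOR data: every combination of two binary features, repeated.
rows = list(itertools.product([0, 1], repeat=2)) * 25
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

# Each feature alone is marginally independent of the label.
gain_x1 = information_gain(x1, y)
gain_x2 = information_gain(x2, y)

# Hypothetical mapping (an assumption, not FDM itself): encode the pair
# (x1, x2) as a single categorical feature with four values.
mapped = [2 * a + b for a, b in rows]
gain_mapped = information_gain(mapped, y)

print(f"gain(x1)     = {gain_x1:.3f}")
print(f"gain(x2)     = {gain_x2:.3f}")
print(f"gain(mapped) = {gain_mapped:.3f}")
```

In this toy setting a greedy split criterion has no reason to prefer either raw feature, which is the situation the abstract describes; the mapped feature removes that obstacle because its value alone determines the label.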

Recommended Citation

Zhang, Silu, "Improving random forests by feature dependence analysis" (2019). Electronic Theses and Dissertations. 1800.
https://egrove.olemiss.edu/etd/1800

Included in

Computer Sciences Commons
