"Gini Distance Correlation and Feature Selection" by Xin Dang

Document Type

Lecture

Publication Date

10-2-2019

Abstract

Big data is becoming ubiquitous in the biological, engineering, geological and social sciences, as well as in government and public policy. Building an interpretable model is an effective way to extract information and make predictions. However, this task becomes particularly challenging for big data, which are large-scale and ultra-high-dimensional and often contain mixed-type features. A common practice in tackling this challenge is to reduce the number of features under consideration through feature selection, that is, by choosing a subset of features that are “relevant” and useful. This talk proposes a new dependence measure for feature selection: features that depend strongly on the response variable are selected as candidate features. We propose a new Gini correlation to measure the dependence between a categorical response and numerical feature variables. Compared with existing dependence measures, the proposed one offers advantages in both computational and statistical efficiency, which improve the feature selection procedure and therefore the resulting prediction model. This is joint work with Dao Nguyen and Yixin Chen.
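The dependence-based screening idea in the abstract can be sketched in Python. The abstract gives no formula, so this sketch assumes the Gini distance correlation takes the form rho_g = (Delta - sum_k p_k * Delta_k) / Delta, where Delta = E|X - X'| is the Gini mean difference of a numerical feature X, Delta_k is the same quantity computed within response category k, and p_k is the proportion of observations in category k. The helper names `gini_mean_difference`, `gini_correlation` and `select_features`, and the keep-the-top-k rule, are hypothetical illustrations rather than the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def gini_mean_difference(xs: Sequence[float]) -> float:
    """Plug-in (V-statistic) estimate of E|X - X'|: (1/n^2) * sum over all i, j of |x_i - x_j|.

    Uses the sorted-order identity sum_{i<j} (x_(j) - x_(i)) = sum_i (2i - n + 1) * x_(i)
    with 0-based ranks, so the cost is O(n log n) instead of O(n^2).
    """
    n = len(xs)
    if n < 2:
        return 0.0
    pair_sum = sum((2 * i - n + 1) * v for i, v in enumerate(sorted(xs)))
    return 2.0 * pair_sum / (n * n)


def gini_correlation(x: Sequence[float], y: Sequence[Hashable]) -> float:
    """ASSUMED form: (Delta - sum_k p_k * Delta_k) / Delta, numeric x, categorical y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    delta = gini_mean_difference(x)
    if delta == 0.0:
        return 0.0  # a constant feature carries no information about y
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    # Weighted within-category Gini mean difference, with p_k = n_k / n.
    within = sum(len(g) / n * gini_mean_difference(g) for g in groups.values())
    return (delta - within) / delta


def select_features(
    columns: Dict[str, Sequence[float]], y: Sequence[Hashable], top_k: int
) -> List[str]:
    """Hypothetical screening step: keep the top_k features ranked by Gini correlation."""
    scores = {name: gini_correlation(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    y = ["a", "a", "a", "b", "b", "b"]
    x1 = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]  # values shift with the class
    x2 = [3.0, 1.0, 2.0, 2.0, 3.0, 1.0]  # same values in both classes
    print(gini_correlation(x1, y), gini_correlation(x2, y))
    print(select_features({"x1": x1, "x2": x2}, y, top_k=1))
```

On this toy data, x1 scores close to 1 and x2 scores 0, so x1 is selected. The plug-in (V-statistic) form is used instead of an unbiased U-statistic because it keeps the sample score nonnegative: it equals a weighted sum of energy distances between the class-wise empirical distributions.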

Relational Format

presentation
