Electronic Theses and Dissertations

Date of Award


Document Type


Degree Name

M.S. in Engineering Science


Computer and Information Science

First Advisor

Dawn Wilkins

Second Advisor

Conrad Cunningham

Relational Format



Current classification practice has limitations when applied to gene expression microarray data. For example, the robustness of top scoring pairs does not extend to some small datasets, and the gene set with the best discrimination power may not involve a combination of genes. Hence, it is necessary to construct a discriminative and stable classifier that generates highly informative gene sets. Not all features are active in a given biological process, so a good feature selector should be robust to noise and outliers; the challenge is to select the most informative genes. The top discriminating pair (TDP) approach presented in this study is motivated by this issue and aims to reveal which features rank highest by discrimination power. To identify TDPs, each pair of genes is assigned a score based on their relative probability distribution. Our experiment combines the TDP methodology with information gain (IG) to obtain an effective feature set. To illustrate the effectiveness of TDP with IG, we applied the method to two breast cancer datasets (Wang et al., 2005; Van't Veer et al., 2002). On these datasets, the results of the TDP method are competitive with those of the baseline method using random forests. Information gain combined with the TDP algorithm used in this study provides a new, effective method of feature selection for machine learning.
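As a rough illustration of the information gain criterion mentioned in the abstract, the following minimal sketch computes the standard information gain of a single expression feature split at a threshold. It is not the thesis's TDP scoring function, which the abstract does not specify; the function names, the threshold-based split, and the toy values are all assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(values, labels, threshold):
    """Information gain from splitting samples on `value <= threshold`.

    `values` holds one (hypothetical) gene's expression per sample and
    `labels` holds the class of each sample. The threshold split is an
    illustrative choice, not the discretization used in the thesis.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute 0.
    conditional = sum(len(part) / n * entropy(part)
                      for part in (left, right) if part)
    return entropy(labels) - conditional


# Toy data (invented): four samples, two classes.
labels = [0, 0, 1, 1]
separating_gene = [1.0, 2.0, 3.0, 4.0]    # splits the classes cleanly at 2.5
uninformative_gene = [1.0, 3.0, 2.0, 4.0]  # each side of 2.5 mixes both classes

ig_separating = information_gain(separating_gene, labels, 2.5)
ig_uninformative = information_gain(uninformative_gene, labels, 2.5)
```

Under these assumptions, a feature that separates the classes cleanly receives the maximum gain for a balanced two-class problem (1 bit), while a feature whose split leaves both classes evenly mixed receives a gain of zero, so ranking features by this value favors the more discriminative ones.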


Emphasis: Computer Science


