Electronic Theses and Dissertations

Date of Award


Document Type


Degree Name

M.S. in Engineering Science


Computer and Information Science

First Advisor

Yixin Chen

Second Advisor

Conrad Cunningham

Relational Format



Incorporating various sources of biological information is important for biological discovery. For example, genes have a multi-view representation. They can be represented by features such as sequence length and physical-chemical properties. They can also be represented by pairwise similarities, gene expression levels, and phylogenetics position. Hence, the types vary from numerical features to categorical features. An efficient way of learning from observations with a multi-view representation of mixed type of data is thus important. We propose a large margin random forests classification approach based on random forests proximity. Random forests accommodate mixed data types naturally. Large margin classifiers are obtained from the random forests proximity kernel or its derivative kernels. We test the approach on four biological datasets. The performance is promising compared with other state of the art methods including support vector machines (SVMs) and Random Forests classifiers. It demonstrates high potential in the discovery of functional roles of genes and proteins. We also examine the effects of mixed type of data on the algorithms used.


Emphasis: Computer Science



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.