Date of Award
2011
Document Type
Thesis
Degree Name
M.S. in Engineering Science
Department
Computer and Information Science
First Advisor
Yixin Chen
Second Advisor
Conrad Cunningham
Relational Format
dissertation/thesis
Abstract
Incorporating various sources of biological information is important for biological discovery. For example, genes have a multi-view representation: they can be represented by features such as sequence length and physical-chemical properties, and also by pairwise similarities, gene expression levels, and phylogenetic position. Hence, the data types range from numerical to categorical features. An efficient way of learning from observations with a multi-view representation of mixed-type data is thus important. We propose a large margin random forests classification approach based on random forests proximity. Random forests accommodate mixed data types naturally. Large margin classifiers are obtained from the random forests proximity kernel or its derivative kernels. We test the approach on four biological datasets. Its performance is promising compared with other state-of-the-art methods, including support vector machines (SVMs) and random forests classifiers, and it demonstrates high potential for discovering the functional roles of genes and proteins. We also examine the effects of mixed-type data on the algorithms used.
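As a rough illustration of the proximity idea mentioned in the abstract, the sketch below computes the commonly used random forests proximity: the fraction of trees in which two samples fall into the same terminal node. It is a minimal sketch, not the thesis's implementation. The function name `proximity_matrix` and the toy `leaf_ids` table are hypothetical, and the thesis may use a different or derived kernel.

```python
def proximity_matrix(leaf_ids):
    """Return the pairwise proximity matrix for a fitted forest.

    leaf_ids[i][t] is assumed to be the terminal node (leaf) that
    sample i reaches in tree t. Proximity of two samples is the
    fraction of trees in which they share a leaf.
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            # The matrix is symmetric, so fill both entries at once.
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 3 samples, 3 trees (illustrative only).
leaf_ids = [
    [0, 2, 1],
    [0, 2, 3],
    [4, 5, 3],
]
K = proximity_matrix(leaf_ids)
```

Because the leaf assignments do not depend on whether a split was made on a numerical or a categorical feature, a matrix like `K` can be built from mixed-type data and then supplied as a precomputed kernel to a large margin classifier such as an SVM.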
Recommended Citation
Liu, Sheng, "Large Margin Random Forests On Mixed Type Data" (2011). Electronic Theses and Dissertations. 445.
https://egrove.olemiss.edu/etd/445
Concentration/Emphasis
Emphasis: Computer Science