M.S. in Engineering Science
Computer and Information Science
Incorporating various sources of biological information is important for biological discovery. For example, genes have a multi-view representation: they can be represented by features such as sequence length and physical-chemical properties, and also by pairwise similarities, gene expression levels, and phylogenetic position. Hence, the feature types range from numerical to categorical. An efficient way of learning from observations that have a multi-view representation of mixed-type data is therefore important. We propose a large-margin random forests classification approach based on random forests proximity. Random forests accommodate mixed data types naturally. Large-margin classifiers are obtained from the random forests proximity kernel or from kernels derived from it. We test the approach on four biological datasets. Its performance is promising compared with other state-of-the-art methods, including support vector machines (SVMs) and random forests classifiers, and it demonstrates high potential for discovering the functional roles of genes and proteins. We also examine the effects of mixed-type data on the algorithms used.
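To make the proximity-kernel idea concrete, here is a minimal sketch under stated assumptions; it is not the thesis's implementation. The proximity of two observations is taken as the fraction of trees in which they land in the same terminal node. Each tree contributes a Gram matrix of one-hot leaf-membership vectors, so the averaged proximity matrix is symmetric positive semidefinite and can serve as a kernel for a large-margin classifier. The sketch assumes the leaf indices come from an already-trained forest (for example, scikit-learn's `RandomForestClassifier.apply` returns leaf indices of shape `(n_samples, n_estimators)`). The toy leaf indices below are made up. As the large-margin learner it swaps in kernelized Pegasos, which performs stochastic subgradient descent on the SVM hinge loss. The helper names `proximity`, `train_kernel_pegasos`, and `predict` are hypothetical.

```python
import random
from typing import List, Sequence


def proximity(leaves_a: Sequence[Sequence[int]],
              leaves_b: Sequence[Sequence[int]]) -> List[List[float]]:
    """Random forests proximity: entry [i][j] is the fraction of trees in which
    observation i of leaves_a and observation j of leaves_b share a leaf.
    leaves_x[i][t] is the leaf id of observation i in tree t."""
    n_trees = len(leaves_a[0])
    return [
        [sum(la == lb for la, lb in zip(a, b)) / n_trees for b in leaves_b]
        for a in leaves_a
    ]


def train_kernel_pegasos(K: List[List[float]], y: List[int],
                         lam: float = 0.1, n_iter: int = 2000,
                         seed: int = 0) -> List[int]:
    """Kernelized Pegasos (a stand-in large-margin learner): stochastic
    subgradient descent on the regularized hinge loss. Returns alpha, the
    number of times each training point triggered an update."""
    rng = random.Random(seed)
    n = len(y)
    alpha = [0] * n
    for t in range(1, n_iter + 1):
        i = rng.randrange(n)
        # Decision value for x_i: (1/(lam*t)) * sum_j alpha_j * y_j * K(x_j, x_i)
        score = sum(alpha[j] * y[j] * K[j][i] for j in range(n)) / (lam * t)
        if y[i] * score < 1:  # margin violated: take a hinge-loss subgradient step
            alpha[i] += 1
    return alpha


def predict(K_test_train: List[List[float]], y: List[int],
            alpha: List[int]) -> List[int]:
    """Sign of sum_j alpha_j * y_j * K(x, x_j); the positive 1/(lam*T)
    factor is dropped because it cannot change the sign."""
    preds = []
    for row in K_test_train:
        score = sum(a * yj * k for a, yj, k in zip(alpha, y, row))
        preds.append(1 if score >= 0 else -1)
    return preds


# Hypothetical leaf indices from a 4-tree forest (one row per observation).
train_leaves = [
    [0, 2, 4, 6], [0, 2, 4, 7], [0, 3, 4, 6],   # class +1
    [1, 3, 5, 7], [1, 3, 5, 8], [1, 9, 5, 7],   # class -1
]
y_train = [1, 1, 1, -1, -1, -1]

K_train = proximity(train_leaves, train_leaves)
alpha = train_kernel_pegasos(K_train, y_train)

test_leaves = [[0, 2, 4, 6], [1, 9, 5, 8]]
K_test = proximity(test_leaves, train_leaves)
print(predict(K_test, y_train, alpha))  # [1, -1]
```

Because the kernel depends only on shared leaf membership, numerical and categorical features enter the classifier on the same footing once the forest has split on them. Any SVM solver that accepts a precomputed kernel could replace the Pegasos stand-in.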
Liu, Sheng, "Large Margin Random Forests On Mixed Type Data" (2011). Electronic Theses and Dissertations. 445.
Emphasis: Computer Science