Honors Theses

Exploration of Feature Selection Techniques in Machine Learning Models on HPTLC Images for Rule Extraction

Bozidar-Brannan Kovachev

Date of Award

Spring 5-12-2023

Document Type

Undergraduate Thesis

Department

Computer and Information Science

First Advisor

Yixin Chen

Second Advisor

Feng Wang

Third Advisor

Thai Le

Relational Format

Dissertation/Thesis

Abstract

Research in biology often relies on machine learning models that are ultimately uninterpretable by the researcher. It would be helpful if researchers could leverage the same computing power while gaining specific insight into the models' decision-making, and with it a deeper understanding of their domain. This paper seeks to select features and derive rules for a machine learning classification problem in biochemistry. The specific point of interest is five species of Glycyrrhiza, or licorice, and the ability to classify them using High-Performance Thin-Layer Chromatography (HPTLC) images. These images were taken using HPTLC methods under varying conditions to provide eight unique views of each species. Each view contains 24 samples with varying counts of the individual species. Several techniques are applied for feature selection and rule extraction. The first two are based on recently presented methods, “Binary Encoding of Random Forests” and “Rule Extraction using Sparse Encoding” (Liu 2012). In addition, an independently developed technique called “Interval Extraction and Consolidation,” conceived in response to the particular nature of the dataset, was applied. Used in concert with standard machine learning models, these techniques could narrow a feature space from around one thousand candidates to only ten. These ten most critical features were then used to derive a set of rules for classifying the five species of licorice. For feature selection, Binary Encoding of Random Forests performed comparably to, if not much better than, standard model parameter optimization in reducing the feature space in almost all cases. Additionally, Interval Extraction and Consolidation excelled at further simplifying the reduced feature space, often by another factor of five to ten. The selected features were then used for relatively simple rule extraction using decision trees, allowing for a more interpretable model.
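As a rough illustration of the general workflow the abstract describes (reduce roughly one thousand candidate features to ten, then read rules off a shallow decision tree), the following minimal sketch may help. It is not the thesis's method: it substitutes plain random forest feature importances for Binary Encoding of Random Forests and does not implement Interval Extraction and Consolidation. The data are synthetic stand-ins generated with scikit-learn, and the sample count, tree depth, and feature names (`f0`, `f1`, ...) are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption): 120 samples, 1000 candidate
# features, 5 classes. These are NOT the HPTLC images from the thesis.
X, y = make_classification(
    n_samples=120,
    n_features=1000,
    n_informative=10,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Step 1: rank features by random forest importance (a generic
# substitute for the thesis's feature selection techniques).
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top10 = np.argsort(forest.feature_importances_)[::-1][:10]

# Step 2: fit a shallow decision tree on the ten selected features only,
# so the resulting rules stay short enough to read.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:, top10], y)

# Step 3: render the tree as nested if/else style threshold rules.
rules = export_text(tree, feature_names=[f"f{i}" for i in top10])
print(rules)
```

Each printed branch is a threshold test on one of the ten retained features, ending in a predicted class, which is the kind of human-readable rule set the abstract refers to.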

Recommended Citation

Kovachev, Bozidar-Brannan, "Exploration of Feature Selection Techniques in Machine Learning Models on HPTLC Images for Rule Extraction" (2023). Honors Theses. 2841.
https://egrove.olemiss.edu/hon_thesis/2841

Accessibility Status

Searchable text

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Theory and Algorithms Commons
