Document Type

Oral Presentation

Location

Oxford Convention Center, 102 Ed Perry Boulevard Oxford, MS 38655

Event Website

https://www.oxfordicsb.org/

Start Date

25-4-2023 10:50 AM

Description

Chemometrics is a subset of machine learning, which has been defined as a set of advanced mathematical and statistical methods for the analysis of data. Machine learning encompasses many methods, such as chemometrics, artificial neural networks, support vector machines, and rule-building expert systems. Chemometrics allows us to draw information from multivariate raw data, i.e., to discover patterns, the causes of the patterns, and, with appropriate algorithms, the variance associated with the patterns. The most commonly used forms of chemometrics are principal components analysis (PCA, for modeling) and partial least squares-discriminant analysis (PLS-DA, for classification), which are unsupervised and supervised methods, respectively. In its simplest form, PCA makes no assumptions about the data and allows the data to be examined for patterns based on any known experimental factors (metadata) such as genotype, growing location, processing, age, etc. In its supervised form, soft independent modeling of class analogy (SIMCA), a separate PCA model is built for each class of samples and the models are compared for similarity. Unknown samples may fall into one class, more than one class, or no class. PLS-DA is more restrictive: it always requires that the classes of the samples be identified, and it forces an unknown sample into one of the specified classes. One-class PCA modeling, i.e., SIMCA with only one class of samples, is an ideal tool for authentication. A model is constructed for a set of authentic samples, and the unknown sample is judged to be authentic (fitting inside the specified model limits) or adulterated (falling outside the model limits). If PCA separates the samples into distinct clusters, the loadings can identify the variables (e.g., chromatographic peaks or mass spectral ions) that permit discrimination. Finally, PCA has been coupled to analysis of variance (ANOVA-PCA) to allow determination of the variance associated with each experimental factor. For example, the total variance of a set of botanical samples might be attributed to variance between runs, between genotypes, and between growing locations, plus the residuals from analytical variability. In conclusion, the many forms of chemometric analysis provide the analyst with powerful, well-documented tools for deriving information from raw data sets.
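The one-class PCA authentication idea described above can be illustrated with a minimal sketch. This is not the presenter's implementation. It assumes NumPy, synthetic data, and a simplified acceptance rule: the model limit is taken as an empirical quantile of the squared residuals (Q) of the authentic training samples, whereas practical SIMCA software typically uses statistically derived limits and often a score-distance criterion as well. The function names `fit_one_class_pca` and `is_authentic` are hypothetical.

```python
import numpy as np


def fit_one_class_pca(X_auth, n_components=2, quantile=0.95):
    """Fit a PCA model to authentic samples (rows) and set a residual limit.

    Assumption: the limit is an empirical quantile of the training Q
    residuals, a simplification of the limits used in SIMCA packages.
    """
    mean = X_auth.mean(axis=0)
    Xc = X_auth - mean                       # mean-center the variables
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T           # variables x components
    residuals = Xc - Xc @ loadings @ loadings.T
    q = (residuals ** 2).sum(axis=1)         # squared residual per sample
    return {"mean": mean, "loadings": loadings,
            "q_limit": float(np.quantile(q, quantile))}


def is_authentic(model, x):
    """Return True if sample x falls inside the model's residual limit."""
    xc = x - model["mean"]
    P = model["loadings"]
    residual = xc - xc @ P @ P.T
    return bool((residual ** 2).sum() <= model["q_limit"])


# Synthetic "authentic" set: 40 samples, 10 variables, with the real
# variation confined to the first two variables plus small noise.
rng = np.random.default_rng(0)
X_auth = np.zeros((40, 10))
X_auth[:, :2] = rng.normal(size=(40, 2))
X_auth += 0.01 * rng.normal(size=(40, 10))

model = fit_one_class_pca(X_auth)

# A sample at the class mean, and the same sample with an extra signal in a
# variable (hypothetically, a peak) that authentic samples do not show.
authentic_like = model["mean"].copy()
adulterated = model["mean"].copy()
adulterated[5] += 1.0

print(is_authentic(model, authentic_like))
print(is_authentic(model, adulterated))
```

In this sketch the columns of `model["loadings"]` play the role of the loadings mentioned above: variables with large absolute loading values are the ones that drive the pattern captured by each component.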

Recommended Citation

Harnly, James, "Chemometrics: a valuable tool for deriving information from complex data sets" (2023). Oxford ICSB. 8.
https://egrove.olemiss.edu/icsb/2023_ICSB/schedule/8

Publication Date

April 2023

Accessibility Status

Searchable text

Included in

Medicine and Health Sciences Commons

Chemometrics: a valuable tool for deriving information from complex data sets

2023 International Conference on the Science of Botanicals
