Presenter Information

James Harnly, USDA

Document Type

Oral Presentation

Location

Oxford Convention Center, 102 Ed Perry Boulevard Oxford, MS 38655

Event Website

https://www.oxfordicsb.org/

Start Date

25-4-2023 10:50 AM

Description

Chemometrics is a subset of machine learning, which has been defined as a set of advanced mathematical and statistical methods for the analysis of data. This definition covers many methods, such as chemometrics, artificial neural networks, support vector machines, and rule-building expert systems. Chemometrics allows us to draw information from multivariate raw data, i.e., to discover patterns, the causes of the patterns, and, with appropriate algorithms, the variance associated with the patterns. The most commonly used forms of chemometrics are principal components analysis (PCA, for modeling) and partial least squares-discriminant analysis (PLS-DA, for classification), which are unsupervised and supervised methods, respectively. In its simplest form, PCA makes no assumptions about the data and allows the data to be examined for patterns based on any known experimental factors (metadata) such as genotype, growing location, processing, or age. In its supervised form, soft independent modeling of class analogy (SIMCA), a separate PCA model is built for each class of samples and the models are compared for similarity. Unknown samples may fall into one class, several classes, or no class. PLS-DA is more restrictive: it always requires identification of the classes of the samples and forces an unknown sample into one of the specified classes. One-class PCA modeling, i.e., SIMCA with only one class of samples, is an ideal tool for authentication. A model is constructed for a set of authentic samples, and the unknown sample is judged to be authentic (fitting inside the specified model limits) or adulterated (outside the model limits). If PCA separates the samples into distinct clusters, the loadings can identify the variables (e.g., chromatographic peaks or mass spectral ions) that permit discrimination. Finally, PCA has been coupled to analysis of variance (ANOVA-PCA) to allow determination of the variance associated with each experimental factor. For example, the total variance of a set of botanical samples might be attributed to variance between runs, between genotypes, and between growing locations, and to the residuals from analytical variability. In conclusion, the many forms of chemometric analysis provide the analyst with powerful, well-documented tools for deriving information from raw data sets.
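
The one-class PCA authentication idea described in the abstract can be illustrated with a minimal sketch. It is not the presenter's implementation: it assumes NumPy, synthetic data standing in for real measurements, hypothetical function names (`fit_one_class_pca`, `distances`, `is_authentic`), and simple empirical 95% limits on two common distance measures (Hotelling's T² within the model plane and the Q residual outside it). Practical SIMCA software typically derives its limits from statistical distributions instead.

```python
import numpy as np


def distances(model, X):
    """Return (T2, Q) for each row of X relative to a fitted PCA model."""
    Xc = X - model["mean"]
    scores = Xc @ model["loadings"]
    # Hotelling's T2: variance-scaled distance inside the model plane
    t2 = np.sum(scores**2 / model["score_var"], axis=1)
    # Q residual: squared distance from the sample to the model plane
    residual = Xc - scores @ model["loadings"].T
    q = np.sum(residual**2, axis=1)
    return t2, q


def fit_one_class_pca(X, n_components=2, quantile=0.95):
    """Fit PCA to authentic samples only and set empirical model limits."""
    mean = X.mean(axis=0)
    # SVD of the mean-centered data; rows of Vt are the loading vectors
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    model = {
        "mean": mean,
        "loadings": Vt[:n_components].T,
        "score_var": s[:n_components] ** 2 / (X.shape[0] - 1),
    }
    # Assumption: limits are plain quantiles of the training distances
    t2, q = distances(model, X)
    model["t2_limit"] = np.quantile(t2, quantile)
    model["q_limit"] = np.quantile(q, quantile)
    return model


def is_authentic(model, X):
    """True where a sample lies inside both model limits."""
    t2, q = distances(model, X)
    return (t2 <= model["t2_limit"]) & (q <= model["q_limit"])


# Synthetic "authentic" samples: 2 latent factors observed on 10 variables
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 10))
authentic = rng.normal(size=(50, 2)) @ mixing + 0.1 * rng.normal(size=(50, 10))

model = fit_one_class_pca(authentic)

# Hypothetical adulterated sample: an authentic profile with one extra peak
adulterated = authentic[:1].copy()
adulterated[0, 3] += 5.0

train_flags = is_authentic(model, authentic)
adulterated_flag = is_authentic(model, adulterated)
print(train_flags.mean(), adulterated_flag)
```

Using two distances is a design choice: T² catches samples that are extreme along the modeled directions, while Q catches samples carrying variation the authentic model does not describe, such as an unexpected peak. The columns of `model["loadings"]` are the quantities the abstract refers to when it says loadings identify the discriminating variables.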

Publication Date

April 2023

Accessibility Status

Searchable text

Chemometrics: a valuable tool for deriving information from complex data sets


https://egrove.olemiss.edu/icsb/2023_ICSB/schedule/8