Document Type
Oral Presentation
Location
Oxford Convention Center, 102 Ed Perry Boulevard Oxford, MS 38655
Event Website
https://www.oxfordicsb.org/
Start Date
25-4-2023 10:50 AM
Description
Chemometrics is a subset of machine learning, which has been defined as a set of advanced mathematical and statistical methods for the analysis of data. Machine learning includes many methods, such as chemometrics, artificial neural networks, support vector machines, and rule-building expert systems. Chemometrics allows us to draw information from multivariate raw data, i.e., to discover patterns, the causes of the patterns, and, with appropriate algorithms, the variance associated with the patterns. The most commonly used forms of chemometrics are principal components analysis (PCA, for modeling) and partial least squares-discriminant analysis (PLS-DA, for classification), which are unsupervised and supervised methods, respectively. In its simplest form, PCA makes no assumptions about the data and allows examination for patterns based on any known experimental factors (metadata) such as genotype, growing location, processing, age, etc. In its supervised form, soft independent modeling of class analogy (SIMCA), a separate PCA model is built for each class of samples and the models are compared for similarity. Unknown samples may fall into one or more classes, or into no class. PLS-DA is more restrictive, always requiring identification of the classes of the samples and forcing an unknown sample into one of the specified classes. One-class PCA modeling, i.e., SIMCA with only one class of samples, is an ideal tool for authentication. A model is constructed for a set of authentic samples, and the unknown sample is judged to be authentic (fitting inside the specified model limits) or adulterated (outside the model limits). If PCA provides separation of samples into distinct clusters, the loadings can identify the variables (e.g., chromatographic peaks or mass spectral ions) that permit discrimination. Finally, PCA has been coupled to analysis of variance (ANOVA-PCA) to allow determination of the variance associated with each experimental factor.
For example, the total variance of a set of botanical samples might be attributed to variance between runs, between genotypes, between growing locations, and the residuals from analytical variability. In conclusion, the many forms of chemometric analyses provide the analyst with powerful, well-documented tools for deriving information from raw data sets.
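The one-class PCA authentication described above can be illustrated with a minimal sketch. It is not taken from the presentation: the function names, the synthetic data, the choice of two components, and the use of empirical 95% quantiles of Hotelling's T² and the Q residual as the "model limits" are all assumptions made for illustration.

```python
import numpy as np

def _distances(model, X):
    """Return Hotelling's T^2 and Q (squared residual) for each row of X."""
    Xc = np.atleast_2d(X) - model["mean"]
    scores = Xc @ model["loadings"]
    residuals = Xc - scores @ model["loadings"].T
    t2 = np.sum(scores**2 / model["score_var"], axis=1)
    q = np.sum(residuals**2, axis=1)
    return t2, q

def fit_one_class_pca(X_auth, n_components=2, quantile=0.95):
    """Build a PCA model from authentic samples only (hypothetical helper)."""
    mean = X_auth.mean(axis=0)
    # PCA via singular value decomposition of the mean-centered data
    _, s, vt = np.linalg.svd(X_auth - mean, full_matrices=False)
    model = {
        "mean": mean,
        "loadings": vt[:n_components].T,  # variables x components
        "score_var": s[:n_components] ** 2 / (X_auth.shape[0] - 1),
    }
    # Assumed limits: empirical quantiles of the training statistics
    t2, q = _distances(model, X_auth)
    model["t2_limit"] = np.quantile(t2, quantile)
    model["q_limit"] = np.quantile(q, quantile)
    return model

def is_authentic(model, X):
    """True where a sample falls inside both model limits."""
    t2, q = _distances(model, X)
    return (t2 <= model["t2_limit"]) & (q <= model["q_limit"])

# Synthetic stand-in for authentic fingerprints: 40 samples, 10 variables
rng = np.random.default_rng(0)
X_auth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 10))
X_auth += 0.05 * rng.normal(size=X_auth.shape)

model = fit_one_class_pca(X_auth)
suspect = X_auth.mean(axis=0) + 5.0  # grossly shifted sample
print(is_authentic(model, suspect))
```

In practice the limits are often derived from distributional approximations rather than empirical quantiles, and the number of components is chosen by cross-validation; both choices here are simplifications.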
Recommended Citation
Harnly, James, "Chemometrics: a valuable tool for deriving information from complex data sets" (2023). Oxford ICSB. 8.
https://egrove.olemiss.edu/icsb/2023_ICSB/schedule/8
Publication Date
April 2023
Accessibility Status
Searchable text