Prediction and Contribution Analysis of Flavor Characteristics in Flue-cured Tobacco Based on Machine Learning and Metabolomics
ZANG Zhaoyang, YANG Jiashuo, LIU Pingping, HU Zhengrong², ZHOU Rong, LI Yunxia, MAO Hui, HU Risheng, PU Wenxuan, CHEN Yang, ZHOU Huina
Abstract
To assess the flavor characteristics of tobacco leaves quickly and accurately, and to elucidate the relationship between cigarette sensory flavor and fresh-leaf metabolic composition, this study used derivatization gas chromatography-mass spectrometry (GC-MS) to analyze the metabolome of mature fresh middle leaves from 21 tobacco varieties grown in three regions: Chenzhou (Hunan Province), Xiangxi (Hunan Province), and Yuxi (Yunnan Province). Based on sensory evaluation of 10 flavor characteristics of the cured tobacco leaves, five machine learning algorithms (linear regression, K-nearest neighbors, support vector machine regression, random forest regression, and gradient boosting decision tree) were used to construct metabolite-based predictive models, and the contributions of metabolic components to the predictions were interpreted via SHAP analysis. The results were as follows. 1) A total of 131 metabolites were detected in fresh tobacco leaves, with organic acids, sugars, and amino acids relatively abundant. Geographic origin significantly influenced the content of most metabolites. 2) Tobacco leaves from different regions exhibited distinctive flavor styles: Yuxi leaves prominently featured clear sweet and fresh green flavors; Chenzhou leaves displayed notable burnt-sweet, burnt, baked, and nutty flavors; and Xiangxi leaves had flavor scores that generally fell between those of Chenzhou and Yuxi. 3) Prediction models were constructed for 8 flavors. High prediction accuracy was achieved for nutty (RMSE = 0.07, R² = 0.90, MAE = 0.05), burnt-sweet (RMSE = 0.20, R² = 0.90, MAE = 0.15), and mellow-sweet (RMSE = 0.15, R² = 0.86, MAE = 0.11) flavors. Models for green, burnt, and fresh-sweet flavors were less accurate but still showed some predictive ability and acceptable generalizability. 4) SHAP interpretation revealed that nutty flavor was predominantly driven by sugars and organic acids, burnt-sweet flavor by organic acids and alcohols, and mellow-sweet flavor by alcohol metabolites. At the individual metabolite level, acetol made the strongest positive contribution to all three of these flavors. In addition, nutty flavor was affected by erythritol, 2-hydroxyglutarate, mucic acid, and galacturonic acid, while burnt-sweet flavor correlated closely with oleic acid, mucic acid, 2-hydroxyglutarate, and other organic acids.
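The abstract reports model quality with three metrics: RMSE, R², and MAE. As a reading aid, the sketch below shows how these are conventionally computed from observed and predicted scores. It assumes the standard textbook definitions; the abstract does not state the exact formulas or software used. The function name `regression_metrics` and the sample scores are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, r2, mae) for paired observed and predicted scores.

    Standard definitions are assumed; the study's own implementation
    is not described in the abstract.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the prediction errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large errors more than MAE does.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 minus residual variation over total variation.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed scores are identical")
    r2 = 1.0 - ss_res / ss_tot

    return rmse, r2, mae


# Hypothetical sensory scores, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
rmse, r2, mae = regression_metrics(observed, predicted)
print(f"RMSE={rmse:.3f}  R2={r2:.3f}  MAE={mae:.3f}")
```

RMSE and MAE are expressed in the same units as the flavor scores, so they should be read relative to the scoring scale, whereas R² is unitless. This is why two models with the same R² can report different RMSE values.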