      Prediction and Contribution Analysis of Flavor Characteristics in Flue-cured Tobacco Based on Machine Learning and Metabolomics

        Abstract: This study aimed to assess the flavor characteristics of tobacco leaves quickly and accurately, and to clarify how the sensory flavor notes of cigarettes relate to the metabolic composition of fresh leaves. Derivatization gas chromatography-mass spectrometry (GC-MS) was used to profile the metabolome of fresh middle leaves harvested at maturity from 21 tobacco varieties grown in three regions: Chenzhou (Hunan Province), Xiangxi (Hunan Province), and Yuxi (Yunnan Province). Ten flavor notes of the flue-cured leaves were also scored by sensory evaluation. Using these data, five machine learning algorithms (linear regression, K-nearest neighbors, support vector machine regression, random forest regression, and gradient boosting decision tree) were applied to build metabolome-based models that predict flavor scores. The contribution of each metabolic component to the predictions was then interpreted with SHAP (SHapley Additive exPlanations). The results were as follows. 1) A total of 131 metabolites were detected in the fresh leaves. Organic acids, sugars, and amino acids were represented by relatively many compounds, and growing region strongly influenced the content of most metabolites. 2) The dominant flavor style differed among regions. Yuxi leaves had prominent fresh-sweet and green notes, and Chenzhou leaves showed pronounced burnt-sweet, burnt, baked, and nutty notes. Most flavor scores of Xiangxi leaves fell between those of Chenzhou and Yuxi leaves. 3) Prediction models were built for 8 flavor notes. Of these, nutty (RMSE = 0.07, R² = 0.90, MAE = 0.05), burnt-sweet (RMSE = 0.20, R² = 0.90, MAE = 0.15), and mellow-sweet (RMSE = 0.15, R² = 0.86, MAE = 0.11) were predicted accurately. Prediction accuracy for the green, burnt, and fresh-sweet notes was comparatively lower, but these models still showed some predictive ability and good generalizability. 4) SHAP interpretation of the nutty, burnt-sweet, and mellow-sweet models showed the following class-level patterns. Sugars and organic acids contributed most to the nutty note, organic acids and alcohols contributed most to the burnt-sweet note, and alcohols had the greatest influence on the mellow-sweet note. Among individual metabolites, acetol made the largest contribution to all three flavor scores, with a positive association in each case. In addition, the nutty note was strongly influenced by erythritol, 2-hydroxyglutarate, mucic acid, and galacturonic acid. The burnt-sweet note was closely associated with oleic acid, mucic acid, 2-hydroxyglutarate, and other organic acids.
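The abstract lists K-nearest neighbors among the five algorithms and reports RMSE, R² and MAE. It does not give the metric formulas or the validation scheme. The following standard-library Python sketch shows one way such a model and its metrics could be computed. It uses a plain k-NN regressor that averages the scores of the k nearest samples by Euclidean distance, leave-one-out predictions, and the three metrics (RMSE = sqrt(mean squared residual), MAE = mean absolute residual, R² = 1 - SSE/SST). Leave-one-out, the values of k and the toy data are assumptions for illustration, not the authors' settings. The features are assumed to be already-scaled metabolite levels.

```python
import math
from typing import List, Sequence, Tuple


def knn_predict(train_X: List[Sequence[float]], train_y: List[float],
                x: Sequence[float], k: int) -> float:
    """Predict a score as the mean of the k nearest training scores (Euclidean)."""
    ranked = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def leave_one_out(X: List[Sequence[float]], y: List[float], k: int) -> List[float]:
    """Predict each sample from a k-NN model built on all the other samples."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return preds


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (RMSE, R², MAE); assumes y_true is not constant (SST > 0)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(sse / n)
    r2 = 1.0 - sse / sst  # negative when predictions are worse than the mean
    mae = sum(abs(r) for r in residuals) / n
    return rmse, r2, mae


if __name__ == "__main__":
    # Hypothetical toy data: 2 scaled metabolite features -> one flavor score.
    X = [[0.1, 1.2], [0.2, 1.0], [0.9, 0.1], [1.0, 0.0], [0.5, 0.6], [0.55, 0.5]]
    y = [3.0, 3.1, 1.0, 0.9, 2.0, 2.1]
    for k in (1, 2, 3):
        rmse, r2, mae = regression_metrics(y, leave_one_out(X, y, k))
        print(f"k={k}: RMSE={rmse:.2f} R2={r2:.2f} MAE={mae:.2f}")
```

Comparing metrics across settings, as the loop over k does here, is one simple way to choose among candidate models. The same evaluation loop could wrap any of the other four algorithm types.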
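SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. Each feature receives its weighted average marginal effect over all coalitions (subsets) of the other features. The sketch below computes these values exactly by enumerating coalitions. It assumes a single baseline sample, for example a mean metabolite profile, supplies the values of "absent" features. This is a simplification for illustration. The SHAP library's explainers (such as TreeExplainer for tree ensembles) use other estimation schemes and background data. Exhaustive enumeration costs on the order of 2^p model calls, so it is only practical for a handful of features, not for 131 metabolites. The toy model is hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    phi_i = sum over S subset of N minus {i} of
            |S|! (p - |S| - 1)! / p! * [v(S + {i}) - v(S)],
    where v(S) evaluates the model with features in S taken from x and all
    other features taken from the baseline (single-reference assumption).
    """
    p = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return model(z)

    phis = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        phi = 0.0
        for size in range(p):  # coalition sizes 0 .. p-1
            weight = (math.factorial(size) * math.factorial(p - size - 1)
                      / math.factorial(p))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    # Hypothetical scoring model with an interaction between features 0 and 2.
    def toy_model(z: Sequence[float]) -> float:
        return 1.0 + 0.8 * z[0] - 0.3 * z[1] + 0.5 * z[0] * z[2]

    x = [1.0, 2.0, 1.0]
    baseline = [0.0, 0.0, 0.0]
    phis = shapley_values(toy_model, x, baseline)
    print([round(v, 3) for v in phis])
    # Efficiency property: attributions sum to model(x) - model(baseline).
    print(round(sum(phis), 3), round(toy_model(x) - toy_model(baseline), 3))
```

The sign of a value shows whether a feature pushed this prediction above or below the baseline prediction. In the toy model, the interaction term is split equally between features 0 and 2.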
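Statements such as "sugars and organic acids contributed most to the nutty note" summarize SHAP values at the level of metabolite classes. The abstract does not say how the class-level figures were obtained. The sketch below assumes one common approach: average the absolute SHAP value of each metabolite over samples, then sum these means within each class. The SHAP matrix is invented. Oleic acid, mucic acid and 2-hydroxyglutarate are grouped as organic acids, following the abstract. `sugar_a` and `alcohol_b` are hypothetical placeholder features.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mean_abs_shap(shap_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Mean |SHAP| of each feature (column) over all samples (rows)."""
    n = len(shap_matrix)
    p = len(shap_matrix[0])
    return [sum(abs(row[j]) for row in shap_matrix) / n for j in range(p)]


def class_contributions(shap_matrix: Sequence[Sequence[float]],
                        feature_names: Sequence[str],
                        feature_class: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sum per-feature mean |SHAP| within each class, largest class first."""
    totals: Dict[str, float] = defaultdict(float)
    for name, importance in zip(feature_names, mean_abs_shap(shap_matrix)):
        totals[feature_class[name]] += importance
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    names = ["oleic acid", "mucic acid", "2-hydroxyglutarate", "sugar_a", "alcohol_b"]
    classes = {
        "oleic acid": "organic acid",
        "mucic acid": "organic acid",
        "2-hydroxyglutarate": "organic acid",
        "sugar_a": "sugar",      # hypothetical placeholder feature
        "alcohol_b": "alcohol",  # hypothetical placeholder feature
    }
    # Invented SHAP values: rows are samples, columns follow `names`.
    shap_matrix = [
        [0.12, -0.05, 0.08, 0.02, -0.10],
        [-0.09, 0.07, -0.06, -0.03, 0.11],
        [0.10, 0.04, 0.05, 0.01, -0.08],
    ]
    for cls, score in class_contributions(shap_matrix, names, classes):
        print(f"{cls}: {score:.3f}")
```

Summing favors classes that contain many detected metabolites. Averaging within each class is an alternative if class size should not drive the ranking.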

         
