
An Integrated Prediction Model of Chlorophyll Content in Fresh Tobacco Leaves Based on Wavelength Selection of FMR-NSGA

Abstract: To enable rapid, non-destructive detection of chlorophyll content in fresh tobacco leaves, this study developed a feature wavelength selection method and an ensemble model for predicting chlorophyll content from the transmission spectra of fresh tobacco leaves at different maturity levels. First, the FMR-NSGA feature wavelength selection method was constructed by combining the univariate regression F-test, mutual information, recursive feature elimination, and the non-dominated sorting genetic algorithm. Next, outliers were removed with the Mahalanobis distance method, and three dataset partitioning methods (Kennard-Stone (K-S), SPXY, and stratified sampling) were compared. Base regressors were then built by combining different regressors (PLSR, SVR, RF), preprocessing methods (Savitzky-Golay smoothing, mean centering, first derivative, and detrending), and feature wavelength selection methods (SPA, CARS, FMR-NSGA), and the three best-performing base regressors were selected. Finally, an ensemble model was constructed by Stacking and optimized with the Grey Wolf Optimizer. The results show that stratified sampling gave the best dataset partitioning, and that Savitzky-Golay smoothing and FMR-NSGA performed best across multiple regressors. The best base regressor (SVR-SG-FMR-NSGA) achieved an $R_\mathrm{t}^2$, RMSE, and RPD of 0.821, 101.745 mg/L, and 2.364, respectively. The ensemble model reached an $R_\mathrm{t}^2$, RMSE, and RPD of 0.864, 88.697 mg/L, and 2.711, respectively, a 5.24% improvement in $R_\mathrm{t}^2$ over the best base regressor, enabling accurate prediction of chlorophyll content in fresh tobacco leaves. The method provides technical support and a theoretical reference for developing online detection equipment for the internal chemical constituents of fresh tobacco leaves.
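The abstract does not say which objectives FMR-NSGA optimizes or how the F-test, mutual-information and RFE results feed the genetic search. The sketch below shows only the non-dominated (Pareto) sorting step that NSGA-style algorithms use to rank candidates, assuming two hypothetical objectives to minimize for each candidate wavelength subset: validation RMSE and the number of wavelengths kept. Crowding distance, selection, crossover and mutation are omitted, and the candidate values are made up for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (every objective is minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objectives: List[Sequence[float]]) -> List[List[int]]:
    """Group candidate indices into successive Pareto fronts (NSGA-II fast non-dominated sort)."""
    n = len(objectives)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # candidates that i dominates
    domination_count = [0] * n                               # how many candidates dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated_by[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)  # nobody dominates i: it belongs to the first front
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical candidates: (validation RMSE in mg/L, number of wavelengths kept)
    candidates = [(110.0, 40), (105.0, 60), (120.0, 20), (115.0, 45), (105.0, 80)]
    print(non_dominated_sort(candidates))
```

Here subsets 0, 1 and 2 form the first front because each trades accuracy against compactness without being beaten on both objectives at once; subsets 3 and 4 are each dominated by one of them.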

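Outlier removal by Mahalanobis distance can be sketched with NumPy as follows. The abstract gives neither the feature space nor the cut-off, so both are assumptions here: the function works on whatever matrix it is given (for full spectra, with more wavelengths than samples, one would typically pass a few principal-component scores instead), and the rule "mean + k × std of the distances" with k = 3 is a hypothetical threshold.

```python
import numpy as np


def mahalanobis_distances(X: np.ndarray) -> np.ndarray:
    """Distance of every row of X from the column means, scaled by the sample covariance."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))  # pinv tolerates a near-singular covariance
    # Quadratic form c_i^T S^+ c_i for every centered row c_i.
    sq = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negative round-off before the square root


def remove_outliers(X: np.ndarray, y: np.ndarray, k: float = 3.0):
    """Keep samples whose distance is at most mean + k * std (hypothetical cut-off rule)."""
    d = mahalanobis_distances(X)
    keep = d <= d.mean() + k * d.std()
    return X[keep], y[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(size=(50, 2)), [[30.0, 30.0]]])  # last row is a planted outlier
    y = np.arange(len(X), dtype=float)
    X_clean, y_clean, keep = remove_outliers(X, y)
    print(keep.sum(), bool(keep[-1]))
```

The pseudo-inverse is used instead of `np.linalg.inv` so the sketch still runs when the covariance is close to singular, which is common for highly collinear spectral variables.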
         
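Stratified sampling, the partitioning method the abstract reports as best, is not specified further. A minimal sketch, assuming the strata are equal-size bins along the sorted reference chlorophyll values and that a fixed fraction of each bin goes to the test set; `n_bins`, `test_fraction` and `seed` are hypothetical parameters.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    y: Sequence[float], test_fraction: float = 0.25, n_bins: int = 5, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, test) so every y-range stratum is represented."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices sorted by reference value
    bin_size = -(-len(order) // n_bins)                 # ceiling division
    train: List[int] = []
    test: List[int] = []
    for start in range(0, len(order), bin_size):
        stratum = order[start:start + bin_size]
        rng.shuffle(stratum)                            # random draw within the stratum
        n_test = max(1, round(len(stratum) * test_fraction))
        test.extend(stratum[:n_test])
        train.extend(stratum[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    chlorophyll = [float(v) for v in range(100)]  # placeholder reference values
    train_idx, test_idx = stratified_split(chlorophyll)
    print(len(train_idx), len(test_idx))
```

Compared with Kennard-Stone or SPXY, which pick samples by spectral (and, for SPXY, response) distances, this scheme only guarantees that the test set spans the full range of chlorophyll content.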

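The abstract states that the Grey Wolf Optimizer tuned the Stacking ensemble but not which hyperparameters it searched. The sketch below implements the standard GWO position update (three leaders alpha, beta and delta; control parameter a decreasing linearly from 2 to 0) for minimizing any box-bounded objective. In this setting the objective might be the cross-validated RMSE of the ensemble as a function of its hyperparameters; that mapping, and the toy sphere function used to exercise the code, are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def grey_wolf_optimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 20,
    n_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` inside a box using the standard Grey Wolf Optimizer."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_x: List[float] = []
    best_f = float("inf")
    for t in range(n_iter + 1):
        fitness = [objective(w) for w in wolves]
        order = sorted(range(n_wolves), key=fitness.__getitem__)
        if fitness[order[0]] < best_f:
            best_x, best_f = list(wolves[order[0]]), fitness[order[0]]
        if t == n_iter:
            break  # the last pass only evaluates the final positions
        # Alpha, beta and delta: copies of the three fittest positions in this iteration.
        leaders = [list(wolves[i]) for i in order[:3]]
        a = 2.0 - 2.0 * t / n_iter  # control parameter, decreases linearly from 2 to 0
        for w in wolves:
            for d, (lo, hi) in enumerate(bounds):
                estimates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    estimates.append(leader[d] - A * D)
                # Average the three leader-guided estimates, then clip to the search box.
                w[d] = min(max(sum(estimates) / 3.0, lo), hi)
    return best_x, best_f


def sphere(x: List[float]) -> float:
    """Toy objective standing in for an ensemble's validation error."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    x_best, f_best = grey_wolf_optimize(sphere, [(-5.0, 5.0)] * 3)
    print(x_best, f_best)
```

While |A| > 1 (early iterations, large a) wolves can overshoot the leaders and explore; as a shrinks toward 0 the pack converges on the best region found.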

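The reported figures of merit follow the usual chemometric definitions: $R^2 = 1 - SS_\mathrm{res}/SS_\mathrm{tot}$, RMSE is the root of the mean squared residual, and RPD is the standard deviation of the reference values divided by RMSE. The abstract does not state which SD convention is used; this sketch assumes the population SD (ddof = 0), under which $R^2 = 1 - 1/\mathrm{RPD}^2$ holds exactly. The sample values in the usage block are placeholders, not data from the study.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R^2, RMSE, RPD) for reference values and model predictions."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / n)  # population SD (ddof = 0), an assumption
    rpd = sd / rmse
    return r2, rmse, rpd


if __name__ == "__main__":
    reference = [300.0, 450.0, 520.0, 610.0]  # placeholder chlorophyll values, mg/L
    predicted = [320.0, 430.0, 540.0, 590.0]
    r2, rmse, rpd = regression_metrics(reference, predicted)
    print(f"R2={r2:.3f}  RMSE={rmse:.3f} mg/L  RPD={rpd:.3f}")
```

The reported values are consistent with this convention: 1 - 1/2.364² ≈ 0.821 and 1 - 1/2.711² ≈ 0.864, and RPD × RMSE gives about 240.5 mg/L for both models, as expected when both are scored on the same test set. The 5.24% gain is the relative change (0.864 - 0.821)/0.821.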