Research on geographical origin traceability of Coix seed based on an improved random forest algorithm
Abstract: Coix seeds from 9 geographical origins were studied. Their origin was traced by combining excitation-emission matrix (EEM) fluorescence spectroscopy with an improved random forest algorithm. The random forest algorithm was improved in two ways. First, principal component analysis (PCA) was used to reduce the dimensionality of the EEM fluorescence data. Second, a grid search was used to jointly determine the optimal number of principal components (PCs) to retain during PCA dimensionality reduction and the hyperparameters of the discriminant model. The results showed that the improved random forest model built on the EEM fluorescence spectra of Coix seeds, with added standard-deviation (z-score) standardization and PCA dimensionality-reduction modules, accurately predicted the origin of Coix seed samples from all 9 areas. The optimal model combined 100 decision trees, each with a maximum depth of 3 and a minimum of 1 sample per leaf node, with 16 PCs. It achieved 100% accuracy on both the validation and test sets (108 samples in total), outperforming a partial least squares discriminant analysis (PLS-DA) model (96%).

Keywords:
- Coix seed
- random forest algorithm
- excitation-emission matrix fluorescence spectroscopy
- geographical origin traceability
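The pipeline described in the abstract (standardization, then PCA, then a random forest, with the number of retained PCs and the forest hyperparameters tuned together by grid search) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. The library choice, the row-wise unfolding of each EEM into a single feature vector, the synthetic data, the train/test split, 3-fold cross-validation, and the grid values are all assumptions. The grid deliberately includes the configuration reported in the abstract (16 PCs, 100 trees, maximum depth 3, minimum 1 sample per leaf) so that configuration is one of the candidates.

```python
# Sketch only: synthetic data, hypothetical grid; not the paper's implementation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 9 origins x 20 samples; each EEM is 10 Ex x 20 Em points.
n_origins, n_per_origin, n_ex, n_em = 9, 20, 10, 20
centers = rng.normal(size=(n_origins, n_ex, n_em))
eem = np.concatenate(
    [c + 0.3 * rng.normal(size=(n_per_origin, n_ex, n_em)) for c in centers]
)
y = np.repeat(np.arange(n_origins), n_per_origin)

# Assumed preprocessing: unfold each excitation-emission matrix into one row.
X = eem.reshape(len(eem), -1)  # shape (180, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),  # z-score standardization of every feature
    ("pca", PCA()),               # dimensionality reduction; n_components tuned below
    ("rf", RandomForestClassifier(random_state=0)),
])

# One joint grid: number of retained PCs plus random forest hyperparameters.
param_grid = {
    "pca__n_components": [4, 16],
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [3, None],
    "rf__min_samples_leaf": [1, 2],
}
search = GridSearchCV(
    pipe,
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)  # refits the best pipeline on all training data

test_acc = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"CV accuracy:   {search.best_score_:.3f}")
print(f"test accuracy: {test_acc:.3f}")
```

Keeping the scaler and PCA inside the `Pipeline` means both are refit on the training part of every cross-validation fold, so the held-out fold never influences the standardization statistics or the PCA basis that the model is scored on.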