Optimized design of protein expression vectors based on a statistical language model and dynamic programming
Abstract: Selecting the optimal parts to assemble a functional protein expression vector from synthetic biology gene fragments is time-consuming and error-prone. To address this problem, a design method for protein expression vectors is proposed that combines a statistical language model (SLM) with a dynamic programming algorithm. The method collects statistical parameters of standard biological parts (BioBricks), recasts the assembly of basic parts as an SLM, and uses dynamic programming to find the optimal path, which yields the vector design. Experimental results show that the method is highly accurate and can reduce redundant operations in the actual assembly process, saving time and cost. It can be used to refine the designs produced by other synthetic biology software, or used independently to simulate the assembly of gene fragments into protein expression vectors. It can also be iterated to give several alternative optimized results to choose from.

Keywords: statistical language model; dynamic programming algorithm; protein expression vector; standard biological parts (BioBrick)
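The idea summarized in the abstract, scoring part sequences with a statistical language model and then searching for the best path by dynamic programming, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not state the n-gram order or the smoothing scheme, so the sketch uses a bigram model with add-one smoothing, and the part names (`pA`, `rbs1`, `gfp`, ...) and the tiny training corpus are hypothetical, not data from the paper.

```python
import math
from collections import Counter

START = "<s>"  # sentence-start marker for the bigram model


def train_bigram(constructs, vocab):
    """Return a function giving add-one-smoothed bigram log-probabilities.

    constructs: list of part sequences (hypothetical training corpus).
    vocab: set of all part names, used for smoothing.
    """
    prev_counts, pair_counts = Counter(), Counter()
    for seq in constructs:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            prev_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    v = len(vocab)

    def logp(prev, cur):
        # Add-one smoothing: an assumption, chosen only for brevity.
        return math.log((pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + v))

    return logp


def best_assembly(slots, logp):
    """Viterbi-style dynamic programming over candidate parts.

    slots: list of candidate lists, one list per position in the construct.
    Returns (log-probability, part sequence) of the highest-scoring path.
    """
    # best[p] = (score, path) for the best path ending in part p
    best = {p: (logp(START, p), [p]) for p in slots[0]}
    for candidates in slots[1:]:
        new_best = {}
        for cur in candidates:
            score, path = max(
                ((s + logp(prev, cur), pth) for prev, (s, pth) in best.items()),
                key=lambda t: t[0],
            )
            new_best[cur] = (score, path + [cur])
        best = new_best
    return max(best.values(), key=lambda t: t[0])


# Hypothetical corpus of previously assembled constructs.
corpus = [
    ["pA", "rbs1", "gfp", "term1"],
    ["pA", "rbs1", "gfp", "term1"],
    ["pA", "rbs1", "rfp", "term1"],
    ["pB", "rbs2", "gfp", "term2"],
]
vocab = {part for seq in corpus for part in seq}
logp = train_bigram(corpus, vocab)

# Candidate parts per position: promoter, RBS, coding sequence, terminator.
slots = [["pA", "pB"], ["rbs1", "rbs2"], ["gfp"], ["term1", "term2"]]
score, path = best_assembly(slots, logp)
print(path, round(score, 3))
```

Because each step keeps only the best path ending in each candidate part, the search cost grows linearly with the number of positions rather than exponentially with the number of part combinations.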