Scientia Agricultura Sinica, 2018, Vol. 51, Issue (6): 1013-1019. doi: 10.3864/j.issn.0578-1752.2018.06.001

• Crop Genetics & Breeding · Germplasm Resources · Molecular Genetics •

Utilizing Probability Estimation Improves the Accuracy of Plant Variety Identification by Molecular Markers

ZHOU Junfei¹, CUI Yehan², TANG Hao², LI Lun¹, CHEN Hong², WEN Wen², HAN Ruixi², HUANG Sisi², FANG Zhiwei¹, PENG Hai¹

  ¹Institute for Systems Biology, Jianghan University, Wuhan 430056; ²Development Center of Science and Technology, Ministry of Agriculture, Beijing 100122
  • Received: 2017-10-29; Online: 2018-03-16; Published: 2018-03-16
  • Corresponding author: PENG Hai, Tel: 027-84731689; E-mail: 18971601772@163.com
  • Author profile: ZHOU Junfei, Tel: 18040533750; E-mail: zhoujunfei.00@163.com
  • Funding: Research on Common Technical Standards for the Bio-industry (2016YFF0202300); Youth Science Program of the Natural Science Foundation of Hubei Province (2017CFB229)


Abstract: 【Objective】Current standards for identifying plant varieties with molecular markers test only a subset of the marker loci in the genome. Their results are therefore subject to sampling error, and identification conclusions are often challenged on this basis. This study estimates the sampling error of the tested loci and the reliability of the resulting conclusions. The aim is to provide a scientific basis for applying molecular variety identification standards.

【Method】A conditional probability model was built with Bayes' theorem. It conditions on the number of differential loci observed between two varieties and gives the distribution of the true number of differential loci between them. The observed number of differential loci follows a binomial distribution, and the prior probability of the true number is approximately uniform. Under these two conditions the model can be computed directly. Depending on the probability guarantee (confidence level) that the model provides, the relationship between two varieties is assigned to a red, green, or yellow zone. These zones correspond to identical or similar varieties, distinct varieties, and undetermined varieties, respectively.

To validate the model, genotyping results at 3 205 SSR marker loci in 8 rice varieties were used. From these, the true level of difference between each pair of varieties, and hence their true relationship, was estimated. For each pair, marker sampling was simulated 10 000 times, each time drawing 48 SSR loci at random. In each simulation, the relationship between the two varieties was determined from the probability computed by the model. This determination was then compared with the true relationship to assess the model's accuracy. Finally, the model was used to provide probabilistic support for the judgment in a recent watermelon variety infringement case.

【Result】In the simulated sampling validation, each pair of rice varieties was judged to be distinct varieties at a 95% confidence level in 4 295 to 10 000 of its 10 000 random samplings. Compared with the true relationships, the relationships determined by the model were 100% correct. For the watermelon infringement case, the model gave a probability guarantee of more than 95% that the court's judgment was correct. The losing party had argued that too few loci were sampled for the judgment to be reliable; this result indicates that the objection was not well founded.

【Conclusion】A probabilistic model was built to evaluate variety relationships and the reliability of the resulting conclusions. It attaches a probability guarantee to molecular identification conclusions about variety relationships and improves their accuracy. It also helps avoid disputes caused by testing too few loci.
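The Bayesian calculation summarized in the 【Method】 section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation.

The symbols are defined as follows:
- N is the total number of candidate loci (3 205 in the rice validation).
- n is the number of loci tested (48).
- k is the number of differential loci observed.

With a binomial likelihood P(k | D = d) = C(n, k)(d/N)^k(1 - d/N)^(n - k) and a uniform prior on the true count D over 0..N, Bayes' theorem reduces to the normalized likelihood. The cut-off d0 separating "same or similar" from "distinct" is hypothetical, and so are the function names; the abstract does not say how the zone boundaries are set.

```python
import math


def posterior_true_differences(k: int, n: int, N: int) -> list[float]:
    """Return P(D = d | k) for d = 0..N.

    Assumptions (as stated in the abstract): the observed count k of
    differential loci among n sampled loci is binomial with success
    probability d / N, and the prior on the true count D is uniform.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    likelihood = [
        math.comb(n, k) * (d / N) ** k * (1 - d / N) ** (n - k)
        for d in range(N + 1)
    ]
    # With a uniform prior, the prior terms cancel in Bayes' theorem,
    # so the posterior is just the normalized likelihood.
    total = sum(likelihood)
    return [x / total for x in likelihood]


def classify_pair(k: int, n: int, N: int, d0: int,
                  level: float = 0.95) -> tuple[str, float]:
    """Assign a variety pair to the red, green or yellow zone.

    d0 is a HYPOTHETICAL cut-off: the largest true number of differential
    loci still treated as 'same or similar'. Returns the zone together
    with P(D > d0 | k).
    """
    post = posterior_true_differences(k, n, N)
    p_similar = sum(post[: d0 + 1])   # P(D <= d0 | k)
    p_distinct = 1.0 - p_similar      # P(D >  d0 | k)
    if p_similar >= level:
        return "red", p_distinct      # same or similar varieties
    if p_distinct >= level:
        return "green", p_distinct    # distinct varieties
    return "yellow", p_distinct       # undetermined
```

Take the hypothetical cut-off d0 = 64 (about 2% of 3 205 loci). Observing 10 differential loci out of 48 places a pair in the green zone. Observing none leaves it in the yellow zone: under these assumptions, 48 loci cannot bound the true difference below 2% with 95% probability.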

Key words: variety identification, molecular marker, sampling error
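The simulated-sampling validation described in the abstract can be sketched the same way in Python.

In this hypothetical sketch, `diff_loci` stands in for the set of loci at which a pair of varieties differs; in the study, these come from the 3 205-locus genotypes. Each trial does three things:
- It draws 48 distinct loci.
- It counts how many of them are differential.
- It records the zone the model assigns.

The zone depends only on the count k, so it is precomputed once for k = 0..48. The cut-off `d0 = 64`, the 95% level, and the function names are illustrative assumptions. The posterior is repeated here so that the block runs on its own.

```python
import random
from collections import Counter


def zone_for_k(k, n, N, d0, level=0.95):
    # Posterior over the true number D of differential loci: binomial
    # likelihood with p = d / N and a uniform prior over 0..N.
    w = [(d / N) ** k * (1 - d / N) ** (n - k) for d in range(N + 1)]
    p_similar = sum(w[: d0 + 1]) / sum(w)   # P(D <= d0 | k)
    if p_similar >= level:
        return "red"
    if 1.0 - p_similar >= level:
        return "green"
    return "yellow"


def simulate_pair(diff_loci, N, n=48, trials=10_000, d0=64, seed=0):
    """Sample n of N loci `trials` times and tally the zone assigned.

    diff_loci: HYPOTHETICAL input, the set of locus indices (0..N-1)
    at which the two varieties differ.
    """
    rng = random.Random(seed)
    # The zone depends only on k, so compute it once per possible k.
    zone = [zone_for_k(k, n, N, d0) for k in range(n + 1)]
    tally = Counter()
    for _ in range(trials):
        sample = rng.sample(range(N), n)    # n distinct loci
        k = sum(1 for locus in sample if locus in diff_loci)
        tally[zone[k]] += 1
    return tally
```

Two synthetic cases illustrate the behavior under the 2% cut-off:
- A pair that differs at 641 of 3 205 loci (20%) lands in the green zone in nearly all 10 000 trials.
- An identical pair (empty `diff_loci`) lands in the yellow zone every time, because zero differences in 48 loci do not reach the 95% guarantee for "same or similar".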