Scientia Agricultura Sinica, 2017, Vol. 50, Issue (9): 1655-1665. doi: 10.3864/j.issn.0578-1752.2017.09.011

• Horticulture •

  • Corresponding author: LIU ChongHuai, Tel: 13703939601; E-mail: liuchonghuai@caas.cn
  • About the author: XIE HaiKun, Tel: 15290850630; E-mail: 1379226793@qq.com
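  • Funding:
    China Agriculture Research System (CARS-30-yz-1), Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2015-ZFRI), Species Protection Project of the Ministry of Agriculture (2130135-34)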

Assembling and Characteristic Analysis of the Complete Chloroplast Genome of Vitis vinifera cv. Cabernet Sauvignon from High-Throughput Sequencing Data

XIE HaiKun, JIAO Jian, FAN XiuCai, ZHANG Ying, JIANG JianFu, SUN HaiSheng, LIU ChongHuai   

  1. Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou 450009
  • Received: 2016-09-29  Online: 2017-05-01  Published: 2017-05-01

Abstract: 【Objective】This study used Vitis vinifera cv. Cabernet Sauvignon as the test material. Its aim was to establish a method for assembling and characterizing complete chloroplast (cp) genomes of Vitis, providing methodological guidance for future studies of evolution and phylogeny in the genus.【Method】Total genomic DNA was extracted from young leaves of Cabernet Sauvignon with a plant genomic DNA kit. A small-fragment (350 bp) DNA library was constructed following the manufacturer's manual and sequenced on the Illumina HiSeq platform using a PE150 paired-end strategy at approximately 10-fold depth. Grape cp reads were extracted with BLASTN, using the published cp genome sequences of Arabidopsis thaliana (NC_000932) and Pinot Noir (DQ424856) as references. SOAPdenovo 2.04 assembled the extracted reads into the complete cp genome of Cabernet Sauvignon, and bioinformatic software was then used to analyze the genome's basic characteristics.【Result】High-throughput sequencing yielded a total of 5.2 G of raw data. From these, 0.42 G of clean grape cp reads were extracted, about 8% of the total. These reads were successfully assembled into the complete cp genome. The genome is a circular molecule of 160 676 bp with the typical quadripartite structure of angiosperm cp genomes: a pair of inverted repeats (IRA and IRB) of 26 235 bp each, separated by a large single copy (LSC) region of 89 134 bp and a small single copy (SSC) region of 19 072 bp. A total of 154 genes were predicted, comprising 99 protein-coding genes, 47 tRNA genes and 8 rRNA genes. The overall GC content was 37.43%. The genome contained 37 tandem repeats, most of them 11-42 bp long, which together accounted for 0.83% of the cp genome. It also contained 53 dispersed repeats, which accounted for 5.33%. In addition, 50 simple sequence repeat (SSR) loci were detected. Most SSRs were composed of A or T, reflecting a clear bias in base composition. They were also unevenly distributed: the LSC, SSC and IR regions contained 39, 7 and 4 SSRs, respectively. Codon usage in the protein-coding genes was biased toward A/T bases. Codons encoding leucine (L) were used most frequently and those encoding cysteine (C) least frequently. Phylogenetic analysis showed that Cabernet Sauvignon was most closely related to Pinot Noir, V. aestivalis and V. rotundifolia.【Conclusion】The complete cp genome of Cabernet Sauvignon was successfully assembled from whole-genome high-throughput sequencing data. Unlike conventional approaches, this method requires neither chloroplast isolation nor cpDNA extraction. It therefore shortens the experiment, reduces labor intensity and greatly improves feasibility. The gene structure, gene order, GC content and codon usage of the assembled genome are similar to those of typical angiosperm cp genomes. This study provides detailed data for research on the cp genome of Vitis vinifera. It also fills several gaps in the characterization of Vitis cp genomes, including repeat sequences, codon bias and SSRs.

Key words: Cabernet Sauvignon, chloroplast genome, high-throughput sequencing, characteristic analysis, phylogenetic analysis
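The abstract describes the read-extraction step only in outline. The sketch below assumes the reads were first searched against the two reference cp genomes with BLASTN in tabular format (`-outfmt 6`). Under that assumption, cp reads could be selected by keeping every read pair with at least one sufficiently good hit. The identity and length cutoffs, the `/1` and `/2` mate-suffix convention and the function names are all hypothetical, because the article does not report its filtering criteria.

```python
"""Select chloroplast read pairs from BLASTN tabular output (-outfmt 6).

Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
qstart qend sstart send evalue bitscore.
"""
import io

# Hypothetical cutoffs: the article does not report the thresholds it used.
MIN_IDENTITY = 90.0   # percent identity (column 3)
MIN_ALIGN_LEN = 100   # alignment length in bp (column 4)


def mate_base(read_id):
    """Drop a trailing /1 or /2 so both mates of a pair share one key (assumed naming)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def cp_read_pairs(blast_lines):
    """Return IDs of read pairs with at least one qualifying hit to a cp reference."""
    keep = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        read_id, identity, aln_len = fields[0], float(fields[2]), int(fields[3])
        if identity >= MIN_IDENTITY and aln_len >= MIN_ALIGN_LEN:
            keep.add(mate_base(read_id))
    return keep


if __name__ == "__main__":
    # Toy hits; subject IDs are the two reference accessions named in the Method.
    demo = io.StringIO(
        "read1/1\tNC_000932\t85.2\t150\t22\t0\t1\t150\t5000\t5149\t1e-40\t200\n"
        "read1/1\tDQ424856\t99.3\t150\t1\t0\t1\t150\t8000\t8149\t1e-70\t270\n"
        "read2/2\tDQ424856\t97.0\t60\t2\t0\t1\t60\t100\t159\t1e-20\t100\n"
        "read3/2\tDQ424856\t100.0\t150\t0\t0\t1\t150\t900\t1049\t1e-75\t277\n"
    )
    print(sorted(cp_read_pairs(demo)))  # read2 fails the length cutoff
```

This sketch keeps both mates whenever either one hits. That choice retains pairs in which one mate falls in a region that diverges from both references. The selected IDs would then be used to pull the reads from the FASTQ files before assembly.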
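SOAPdenovo is a de Bruijn graph assembler. The toy sketch below is not SOAPdenovo's algorithm; it illustrates only the underlying idea. Each read is cut into k-mers, and each k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix. A contig is then spelled out by walking through nodes that have exactly one way in and one way out. The function names, the value of k and the toy sequence are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unitig(graph, start):
    """Extend `start` while the path is unbranched (one successor, one predecessor)."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in seen:
            break  # merge point (e.g. a repeat) or a closed circle
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    genome = "ATGGCGTGCAATCCGTA"  # toy sequence with no repeated 4-mers
    reads = [genome[0:10], genome[5:15], genome[10:17]]
    print(walk_unitig(de_bruijn(reads, k=5), genome[:4]) == genome)
```

In a real cp genome this simple walk stops at the inverted repeats, where the graph branches. This is one reason IR boundaries usually need extra attention during cp assembly.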
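The region lengths and counts reported in the Result can be cross-checked against the quadripartite model, in which total length = LSC + SSC + 2 × IR. The snippet below runs that check on the published values and adds a generic GC-content function. The article does not say which tool produced its 37.43% figure. Counting only A/C/G/T in the denominator, and so ignoring ambiguous bases such as N, is an assumption.

```python
# Published values from the Result section.
LSC, SSC, IR = 89_134, 19_072, 26_235
TOTAL = 160_676

# Quadripartite model: LSC + IRB + SSC + IRA, with both IR copies of equal length.
assert LSC + SSC + 2 * IR == TOTAL
# Gene and SSR counts should also add up.
assert 99 + 47 + 8 == 154   # protein-coding + tRNA + rRNA genes
assert 39 + 7 + 4 == 50     # SSRs in LSC, SSC and IR
# Share of cp reads in the raw data: 0.42 G / 5.2 G, reported as about 8%.
cp_share = 100 * 0.42 / 5.2


def gc_content(seq):
    """Percentage of G + C among unambiguous bases (A, C, G, T); N etc. are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    for name, length in [("LSC", LSC), ("SSC", SSC), ("IR (each)", IR)]:
        print(f"{name}: {length} bp, {100 * length / TOTAL:.2f}% of genome")
    print(f"cp reads: {cp_share:.1f}% of raw data")
    print(f"toy GC content: {gc_content('ATGCNNAT'):.2f}%")
```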
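The SSR result (50 loci, mostly A/T, split 39/7/4 across LSC/SSC/IR) depends strongly on the minimum repeat number required for each motif length, which the abstract does not state. Below is a minimal regex-based sketch of perfect-SSR detection, plus a helper that assigns a position to a region. The `MIN_REPEATS` thresholds are hypothetical. The helper assumes the conventional LSC-IRB-SSC-IRA order with the published region lengths. The sequence is treated as linear, so an SSR that spans the arbitrary start/end point of the circular genome would be missed.

```python
import re

# Hypothetical minimum repeat numbers per motif length (1 = mono ... 6 = hexa);
# the article does not report the thresholds it used.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """False if `unit` is itself a repeat of a shorter motif, e.g. 'ATAT' or 'AA'."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    The sequence is treated as linear, so a repeat spanning the start/end
    point of the circular genome is not joined.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, (m.end() - m.start()) // k))
                pos = m.end()
            else:
                pos = m.start() + 1  # e.g. 'AA' x 5 belongs to a mononucleotide run
    return sorted(hits)


def region_of(pos, lsc=89_134, ir=26_235, ssc=19_072):
    """Assign a 0-based position to LSC, IRB, SSC or IRA, assuming that order."""
    ends = [("LSC", lsc), ("IRB", lsc + ir), ("SSC", lsc + ir + ssc),
            ("IRA", lsc + 2 * ir + ssc)]
    for name, end in ends:
        if pos < end:
            return name
    raise ValueError("position lies outside the genome")


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GGC"
    for start, motif, n in find_ssrs(toy):
        print(start, motif, n, region_of(start))
```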
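Codon-usage findings like these are commonly quantified with relative synonymous codon usage (RSCU). A codon's RSCU is its observed count divided by the mean count of all codons for the same amino acid, so a value above 1 marks a preferred codon. The sketch below computes RSCU, amino-acid usage and the share of A/T-ending codons from in-frame CDS strings. The article does not state which software or metric it used, so this is a generic illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order; plastids use NCBI translation table 11,
# whose amino-acid assignments are the same as the standard code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(cds_list):
    """Count in-frame codons across CDS strings; an incomplete 3' codon is ignored."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing N or other symbols
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count of the synonymous family.

    Stop codons form one family; families absent from the data are skipped.
    """
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


def amino_acid_usage(counts):
    """Total sense-codon count per amino acid (stop codons excluded)."""
    usage = Counter()
    for codon, n in counts.items():
        if CODON_TABLE[codon] != "*":
            usage[CODON_TABLE[codon]] += n
    return usage


def third_position_at(counts):
    """Percentage of sense codons whose third base is A or T."""
    sense = {c: n for c, n in counts.items() if CODON_TABLE[c] != "*"}
    total = sum(sense.values())
    at = sum(n for c, n in sense.items() if c[2] in "AT")
    return 100.0 * at / total if total else 0.0


if __name__ == "__main__":
    counts = codon_counts(["ATGCTTCTTCTGTGTTAA"])  # toy CDS: M L L L C stop
    print(amino_acid_usage(counts).most_common())
    print(round(rscu(counts)["CTT"], 2), third_position_at(counts))
```

With gene sequences from the annotated genome, the most and least frequent entries of `amino_acid_usage` correspond to the abstract's comparison of leucine and cysteine.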