Scientia Agricultura Sinica ›› 2017, Vol. 50 ›› Issue (1): 183-194. doi: 10.3864/j.issn.0578-1752.2017.01.016

• Animal Science · Veterinary Science · Resource Insects •

The Whole Genome Data Analysis of Sanjiang Cattle

SONG Nana1,2, ZHONG Jincheng1,2, CHAI Zhixin1,2, WANG Qi1,2, HE Shiming3, WU Jinbo3, JIAN Shanglin4, RAN Qiang5, MENG Xin5, HU Hongchun4

 
  

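  1 Key Laboratory of Animal Genetics and Breeding of State Ethnic Affairs Commission and Ministry of Education, Southwest University for Nationalities, Chengdu 610041; 2 Institute of Tibetan Plateau Research, Southwest University for Nationalities, Chengdu 610041; 3 Animal Husbandry Science Institute of Aba Autonomous Prefecture, Wenchuan 623000, Sichuan; 4 Animal Husbandry and Veterinary Station of Aba Autonomous Prefecture, Wenchuan 623000, Sichuan; 5 Animal Husbandry and Veterinary Station of Wenchuan, Wenchuan 623000, Sichuan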
  • Received: 2016-06-12 Published: 2017-01-01 Released online: 2017-01-01
  • Corresponding author: ZHONG Jincheng, E-mail: zhongjincheng518@126.com
  • About the author: SONG Nana, Tel: 13688499824; E-mail: songnana28@126.com
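  • Funding:
    Project of the Science and Technology Department of Sichuan Province (2015JY0248); Central Universities Project for Serving the Development of Ethnic Regions (2015NFW01)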

The Whole Genome Data Analysis of Sanjiang Cattle

SONG Nana1,2, ZHONG Jincheng1,2, CHAI Zhixin1,2, WANG Qi1,2, HE Shiming3, WU Jinbo3, JIAN Shanglin4, RAN Qiang5, MENG Xin5, HU Hongchun4

  1 Key Laboratory of Animal Genetics and Breeding of State Ethnic Affairs Commission and Ministry of Education, Southwest University for Nationalities, Chengdu 610041; 2 Institute of Tibetan Plateau Research, Southwest University for Nationalities, Chengdu 610041; 3 Animal Husbandry Science Institute of Aba Autonomous Prefecture, Wenchuan 623000, Sichuan; 4 Animal Husbandry and Veterinary Station of Aba Autonomous Prefecture, Wenchuan 623000, Sichuan; 5 Animal Husbandry and Veterinary Station of Wenchuan, Wenchuan 623000, Sichuan
  • Received: 2016-06-12 Online: 2017-01-01 Published: 2017-01-01

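Abstract: 【Objective】 To study the genetic diversity of the Sanjiang cattle population and to characterize its genetic variation at the genome level. 【Method】 Total genomic DNA was extracted from 50 individuals and mixed at equal concentration and equal volume to build a pooled DNA sample. The genomic DNA was randomly sheared with a Covaris S2, fragments of 500 bp were recovered by electrophoresis, and a DNA library was constructed. The library was sequenced on an Illumina HiSeq 2000. Short reads were aligned to the bovine reference genome (UMD 3.1) with BWA to detect variants in the Sanjiang cattle genome. The resequencing data were analyzed with SAMtools, Picard-tools, GATK and Reseqtools, and SNPs and indels were annotated using the Ensembl, DAVID and dbSNP databases. 【Result】 Whole-genome resequencing yielded 77.8 Gb of sequence data, with a sequencing depth of 25.32× and a coverage of 99.31%. Sequencing produced 778 403 444 reads and 77 840 344 400 bases, of which 673 670 505 reads and 67 341 451 555 bases mapped to the reference genome (UMD 3.1), giving mapping rates of 86.55% and 86.51%, respectively; 635 242 898 reads (81.61%) and 63 512 636 924 bases (81.59%) mapped as pairs. A total of 20 477 130 SNPs and 1 355 308 indels were identified, of which 2 147 988 SNPs (2.4%) and 90 180 indels (6.7%) were novel. Among all SNPs, 989 686 (4.83%) were homozygous and 19 487 444 (95.17%) were heterozygous, a homozygous/heterozygous ratio of 1:19.7. There were 14 800 438 transitions and 6 680 058 transversions, a transition/transversion (TS/TV) ratio of 2.215. There were 727 splice-site SNPs, 117 SNPs changing a start codon to a non-start codon, 530 SNPs introducing a premature stop codon, and 88 SNPs changing a stop codon to a non-stop codon. In total, 57 621 non-synonymous and 83 797 synonymous substitutions were detected, a non-synonymous/synonymous ratio of 0.69. Non-synonymous SNPs were distributed across 9 017 genes, 567 of which match genes previously reported for important economic traits: 471, 77, 21, 10 and 8 genes related to meat quality, disease resistance, milk production, growth and reproduction, respectively, including genes with overlapping functions. Among the indels, 693 180 (51.15%) were deletions and 662 148 (48.85%) were insertions; 161 198 (11.89%) were homozygous and 1 194 110 (88.11%) were heterozygous. Most variants were located in intergenic regions and introns. Genome-wide heterozygosity (H), nucleotide diversity (Pi) and theta W of Sanjiang cattle were 7.6 × 10⁻³, 0.0039 and 0.0040, respectively, indicating relatively rich genetic diversity. Tajima's D for the Sanjiang cattle population was -0.06832, which may be due to unbalanced selection within the population. 【Conclusion】 This study provides genomic data to support further analysis of the genetic mechanisms underlying economic traits and the conservation of the genetic diversity of the Sanjiang cattle breed.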
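The summary rates and ratios reported in the abstract follow directly from the reported counts. The following is a minimal Python sketch that recomputes several of them as a consistency check. The variable names and the `pct` helper are illustrative and do not come from the paper; only the counts are taken from the abstract.

```python
# Counts as reported in the abstract; variable names are illustrative.
total_reads = 778_403_444
mapped_reads = 673_670_505
paired_reads = 635_242_898
transitions = 14_800_438
transversions = 6_680_058
nonsynonymous = 57_621
synonymous = 83_797
hom_snps = 989_686
het_snps = 19_487_444


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


mapping_rate = pct(mapped_reads, total_reads)    # share of reads mapped
paired_rate = pct(paired_reads, total_reads)     # share of reads mapped as pairs
ts_tv = transitions / transversions              # transition/transversion ratio
ns_s = nonsynonymous / synonymous                # non-synonymous/synonymous ratio
het_per_hom = het_snps / hom_snps                # heterozygous SNPs per homozygous SNP
hom_share = pct(hom_snps, hom_snps + het_snps)   # homozygous share of all SNPs

print(f"read mapping rate:   {mapping_rate}%")
print(f"paired mapping rate: {paired_rate}%")
print(f"Ts/Tv:               {ts_tv:.3f}")
print(f"NS/S:                {ns_s:.2f}")
print(f"hom:het:             1:{het_per_hom:.1f}")
print(f"homozygous share:    {hom_share}%")
```

The same pattern extends to the indel counts; it is only arithmetic on published totals and says nothing about how the authors' tools derived them.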

Key words: Sanjiang cattle, genome, second-generation sequencing technology, SNP, indel

Abstract: 【Objective】 The objective of this study was to assess the genetic diversity of the Sanjiang cattle population and to characterize its genetic variation at the genome level. 【Method】 Genomic DNA was extracted from 50 individuals and mixed at equal concentrations and equal volumes to construct a pooled DNA sample. The genomic DNA was randomly sheared with a Covaris S2, DNA fragments of 500 bp were recovered by electrophoresis, and a DNA library was constructed. Sequencing data were then generated on an Illumina HiSeq 2000. The short reads were mapped to the bovine reference genome (UMD 3.1) with BWA to detect genomic variants in Sanjiang cattle. The resequencing data were analyzed with SAMtools, Picard-tools, GATK and Reseqtools, and the SNPs and indels were annotated using the Ensembl, DAVID and dbSNP databases. 【Result】 Whole-genome sequencing generated a total of 77.8 Gb of sequence data, covering 99.31% of the reference genome at a mapping depth of 25.32-fold. Of the 778 403 444 reads and 77 840 344 400 bases obtained, 673 670 505 reads (86.55%) and 67 341 451 555 bases (86.51%) mapped to the bovine reference genome (UMD 3.1); 635 242 898 reads (81.61%) and 63 512 636 924 bases (81.59%) mapped as pairs. A total of 20 477 130 SNPs and 1 355 308 small indels were identified, of which 2 147 988 SNPs (2.4%) and 90 180 indels (6.7%) were novel. Of all SNPs, 989 686 (4.83%) were homozygous and 19 487 444 (95.17%) were heterozygous, a homozygous/heterozygous ratio of 1:19.7. There were 14 800 438 transitions and 6 680 058 transversions, a transition/transversion (TS/TV) ratio of 2.215. There were 727 splice-site SNPs, 117 SNPs that changed a start codon into a non-start codon, 530 SNPs that introduced a premature stop codon, and 88 SNPs that changed a stop codon into a non-stop codon. A total of 57 621 non-synonymous SNPs and 83 797 synonymous SNPs were detected, a ratio of 0.69. Non-synonymous SNPs were detected in 9 017 genes, of which 567 were assigned as trait-associated genes: 471, 77, 21, 10 and 8 genes related to meat quality, disease resistance, milk production, growth rate and fecundity, respectively, with some genes counted under more than one function. Of the indels, 693 180 (51.15%) were deletions and 662 148 (48.85%) were insertions; 161 198 (11.89%) were homozygous and 1 194 110 (88.11%) were heterozygous. Most variants were located in intergenic regions and introns. Genome-wide heterozygosity (H), nucleotide diversity (Pi) and theta W of Sanjiang cattle were 7.6 × 10⁻³, 0.0039 and 0.0040, respectively, indicating that Sanjiang cattle have abundant genetic diversity. Tajima's D for the Sanjiang cattle population was -0.06832, which may reflect unbalanced selection within the population. 【Conclusion】 The results provide valuable genomic data for further investigation of the genetic mechanisms underlying traits of interest and for protecting the genetic diversity of the Sanjiang cattle breed.
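For readers unfamiliar with the diversity statistics named in the abstract, the sketch below shows the textbook definitions of nucleotide diversity (Pi), Watterson's theta W and Tajima's D. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how these values were estimated, the study sequenced a DNA pool rather than individual haplotypes, and pooled data generally call for pool-aware estimators. The function name `diversity_stats` and the toy haplotypes are hypothetical. The values returned are totals over a region, whereas the abstract reports per-site values.

```python
from math import nan, sqrt


def diversity_stats(haplotypes):
    """Return (pi, theta_w, tajimas_d) for equal-length 0/1 haplotype strings.

    pi and theta_w are totals over the region, not per-site values.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    # Count the '1' allele at each site and keep only segregating sites.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)

    # pi: average number of pairwise differences between two haplotypes.
    pi = sum(2 * k * (n - k) for k in segregating) / (n * (n - 1))

    # Watterson's estimator: S divided by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)

    # D is undefined when there are no segregating sites or the variance is zero.
    d = (pi - theta_w) / sqrt(variance) if variance > 0 else nan
    return pi, theta_w, d


# Toy data (hypothetical): four haplotypes, four sites, three of them segregating.
toy = ["0000", "1000", "1100", "1110"]
pi, theta_w, d = diversity_stats(toy)
print(f"pi={pi:.4f} theta_W={theta_w:.4f} D={d:.4f}")
```

The sign of D is the sign of Pi minus theta W. In the abstract, Pi (0.0039) is slightly below theta W (0.0040), which is consistent with the slightly negative D reported.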

Key words: Sanjiang cattle, genome, next-generation sequencing, SNP, indel