Scientia Agricultura Sinica ›› 2018, Vol. 51 ›› Issue (18): 3591-3599. doi: 10.3864/j.issn.0578-1752.2018.18.015

• Animal Science · Veterinary Science · Resource Insects •

Mapping and Annotating of Bactrian Camel MHC Gene by Using the Comparative Genomic Approach

LiKang ZHI, Erdemtu, XiWen AN, Chao WANG, Rui WANG, Huar BAO, XiuZhen WANG

  1. College of Veterinary Medicine, Inner Mongolia Agricultural University/Key Laboratory of Clinical Diagnosis and Treatment Technology in Animal Disease, Ministry of Agriculture, P. R. China, Hohhot 010018
  • Received: 2017-12-13 Accepted: 2018-07-06 Online: 2018-09-16 Published: 2018-09-16
  • About the author:

    Contact: LiKang ZHI, E-mail: 15849121059@163.com

  • Funding:
    National Natural Science Foundation of China (31360591)

Abstract:

【Objective】This study aimed to locate and annotate the major histocompatibility complex (MHC) gene sequences of the Bactrian camel, providing a scientific basis for further study of Bactrian camel MHC genes. 【Method】A comparative genomics approach was used. The coding sequences of human MHC (HLA) genes and bovine MHC (BoLA) genes were extracted and aligned against Bactrian camel transcripts with blastn to identify the scaffolds with high similarity. Based on the order in which the HLA and BoLA gene sequences aligned along these scaffolds, the scaffolds were joined to obtain a pseudo-chromosome of the Bactrian camel MHC. The full genomic sequences of HLA and BoLA were then extracted and compared with the joined Bactrian camel scaffolds in a genome collinearity analysis; the collinear relationships that lastz established between the pseudo-chromosome and the HLA and BoLA genomic sequences were used to check whether the selected scaffolds were correct. Next, by analyzing the collinearity of MHC genes between the two species, MHC gene sequences were extracted from the Bactrian camel reference genome and annotated. Finally, a phylogenetic tree was constructed from the Bactrian camel MHC genes obtained, to examine the evolutionary relationships among them. 【Result】Aligning the HLA and BoLA coding sequences against Bactrian camel transcripts with blastn identified three scaffolds with high similarity: NW_011511766.1 (4.1 Mb in length), NW_011515227.1 (1.2 Mb) and NW_011514613.1 (15 kb). These were joined to obtain the Bactrian camel MHC pseudo-chromosome. The lastz collinearity analysis identified the regions of the Bactrian camel MHC that are collinear with the HLA and BoLA gene sequences. These regions were consistent with the joined pseudo-chromosome, confirming that the selected scaffolds were accurate. Class I and class III genes were concentrated on NW_011515227.1, whereas class II genes were concentrated on NW_011511766.1 and NW_011514613.1; further analysis showed that the class II genes lie mainly between 3.5 and 4.1 Mb on NW_011511766.1. The sequences in the collinear regions were extracted and aligned with blat against the coding sequences of the MHC genes that had mapped to the Bactrian camel. In total, 24 genes highly similar to bovine BoLA genes were identified in the Bactrian camel genome: 1 class I gene, 10 class II genes and 13 class III genes. These 24 MHC genes were annotated and a phylogenetic tree was constructed; the annotated class I and class II genes fell on the same branch. 【Conclusion】The MHC genes of the Bactrian camel were located and annotated using a comparative genomics approach. The MHC gene sequences were mapped to three scaffolds, 24 MHC genes were identified and annotated, and a pseudo-chromosome of the Bactrian camel MHC was constructed, laying a foundation for further study of Bactrian camel MHC genes.

Key words: Bactrian camel, MHC, CBLA, BoLA, HLA, comparative genomics
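
The scaffold-selection and joining steps described in the abstract can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes that the blastn results are available in BLAST tabular form (`-outfmt 6`, where column 2 is the subject ID and column 3 the percent identity) and that subject IDs are scaffold accessions. The function names, the 80% identity cutoff, the example hits, the scaffold order and the rounded scaffold lengths are all hypothetical values chosen for illustration.

```python
from collections import Counter


def count_hits_per_scaffold(blast_lines, min_identity=80.0):
    """Count tabular BLAST hits per subject ID at or above an identity cutoff.

    Assumes 12-column `-outfmt 6` lines; the cutoff is an illustrative choice.
    """
    counts = Counter()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        percent_identity = float(fields[2])
        if percent_identity >= min_identity:
            counts[subject_id] += 1
    return counts


def build_pseudo_chromosome_layout(ordered_scaffolds, lengths, gap=0):
    """Map each scaffold to its 0-based, half-open interval in the joined sequence.

    `gap` is a hypothetical spacer length placed between consecutive scaffolds.
    """
    layout = {}
    position = 0
    for name in ordered_scaffolds:
        layout[name] = (position, position + lengths[name])
        position += lengths[name] + gap
    return layout


def to_scaffold_coordinate(layout, pseudo_position):
    """Convert a pseudo-chromosome position back to (scaffold, offset)."""
    for name, (start, end) in layout.items():
        if start <= pseudo_position < end:
            return name, pseudo_position - start
    return None  # position lies in a spacer or beyond the joined sequence


if __name__ == "__main__":
    # Hypothetical hits: query, subject, identity, then the remaining columns.
    example_hits = [
        "geneA\tNW_011511766.1\t91.2\t300\t20\t1\t1\t300\t10\t309\t1e-50\t400",
        "geneB\tNW_011515227.1\t88.0\t250\t25\t2\t1\t250\t50\t299\t1e-40\t320",
        "geneC\tNW_011511766.1\t72.5\t200\t50\t3\t1\t200\t90\t289\t1e-10\t120",
    ]
    print(count_hits_per_scaffold(example_hits).most_common())

    # Lengths rounded from the abstract; the scaffold order is an assumption.
    lengths = {
        "NW_011515227.1": 1_200_000,
        "NW_011511766.1": 4_100_000,
        "NW_011514613.1": 15_000,
    }
    order = ["NW_011515227.1", "NW_011511766.1", "NW_011514613.1"]
    layout = build_pseudo_chromosome_layout(order, lengths)
    print(layout)
    print(to_scaffold_coordinate(layout, 1_250_000))
```

Keeping the layout as a table of offsets, rather than only the joined sequence, makes it straightforward to report gene positions in the original scaffold coordinates after any analysis on the pseudo-chromosome.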