Scientia Agricultura Sinica ›› 2021, Vol. 54 ›› Issue (6): 1288-1300. doi: 10.3864/j.issn.0578-1752.2021.06.018

• Animal Science · Veterinary Science · Resource Insects •

Improvement of Nosema ceranae Genome Annotation Based on Nanopore Full-Length Transcriptome Data

HuaZhi CHEN1, YuanChan FAN1, HaiBin JIANG1, Jie WANG1, XiaoXue FAN1, ZhiWei ZHU1, Qi LONG1, ZongBing CAI1, YanZhen ZHENG1, ZhongMin FU1,2, GuoJun XU1, DaFu CHEN1,2, Rui GUO1,2

  1 College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002
  2 Apitherapy Research Institute, Fujian Agriculture and Forestry University, Fuzhou 350002
  • Received: 2020-05-06 Accepted: 2020-05-28 Online: 2021-03-16 Published: 2021-03-25
  • Contact: Rui GUO
  • About the authors: HuaZhi CHEN, E-mail: CHZ0720@outlook.com. YuanChan FAN, E-mail: fanyc19980201@126.com
  • Supported by:
    Earmarked Fund for China Agriculture Research System (CARS-44-KXJ7); Natural Science Foundation of Fujian Province (2018J05042); Outstanding Young Scientific Research Talents Program of Fujian Agriculture and Forestry University (xjq201814); Science and Technology Innovation Fund of Fujian Agriculture and Forestry University (CXZX2017342, CXZX2017343); Fund for Excellent Master's Theses of Fujian Agriculture and Forestry University (HuaZhi CHEN)

Abstract:

【Objective】This study aimed to improve the gene sequences and functional annotation of the current reference genome of Nosema ceranae using a previously obtained Nanopore full-length transcriptome dataset. 【Method】TransDecoder was used to predict the open reading frames (ORFs) of N. ceranae genes and the corresponding amino acid sequences. Full-length transcripts were compared with the transcripts annotated in the reference genome using gffcompare, and the untranslated regions of annotated genes were extended upstream or downstream to correct gene boundaries. MISA was used to identify simple sequence repeat (SSR) loci in full-length transcripts longer than 500 bp, including mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide repeats and mixed SSRs. Novel genes and novel transcripts were aligned to the Nr, KOG, eggNOG, GO and KEGG databases with BLAST to obtain functional annotations. 【Result】A total of 2 353 complete ORFs were predicted; ORFs of 0-100 aa were the most abundant, accounting for 72.12% of all ORFs. The structures of 2 340 N. ceranae genes were optimized: the 5′ ends of 1 182 genes and the 3′ ends of 1 158 genes were extended. In total, 1 658 SSRs were identified, and the numbers of mononucleotide, dinucleotide, trinucleotide and tetranucleotide repeats were 1 622, 23, 7 and 6, respectively. Mononucleotide repeats had the highest density (182.32/Mb), followed by mixed SSRs, dinucleotide repeats and trinucleotide repeats, at 6.90, 2.78 and 0.73/Mb, respectively. A total of 954 novel genes were identified, of which 951, 333, 371, 422 and 321 were annotated in the Nr, KOG, eggNOG, GO and KEGG databases, respectively. In addition, 6 164 novel transcripts were identified, of which 6 141, 2 808, 2 932, 3 196 and 2 585 were annotated in the same five databases, respectively. For both novel genes and novel transcripts, the species with the largest number of annotations was N. ceranae, followed by Nosema apis. 【Conclusion】The results improve the sequences and functional annotations of annotated genes in the current N. ceranae reference genome, and supplement and annotate a large number of novel genes and transcripts that were absent from the reference annotation. The identified SSR loci also provide candidates for research on molecular markers.

Key words: Nanopore sequencing, full-length transcript, transcriptome, genome, honeybee, Nosema ceranae
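
The SSR identification and density figures reported in the abstract can be illustrated with a minimal Python sketch. This is not MISA and not the authors' pipeline: the repeat thresholds in `MIN_REPEATS` are hypothetical values chosen for illustration, the function names are invented here, the treatment of transcripts of exactly 500 bp is an assumption, and mixed SSRs are not handled. The sketch only shows how perfect repeats of 1-6 nt motifs might be counted in transcripts that pass the length filter and how a count is converted to a density per Mb.

```python
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
# These are illustrative only; the abstract does not state the settings used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
# The abstract restricts the search to transcripts above 500 bp; whether a
# transcript of exactly 500 bp is included is an assumption of this sketch.
MIN_TRANSCRIPT_LEN = 500


def is_irreducible(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs in one sequence."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_irreducible(motif):
                # Count how many times the motif repeats back to back.
                count = 1
                while seq[i + count * k:i + (count + 1) * k] == motif:
                    count += 1
                if count >= min_rep:
                    hits.append((i, motif, count))
                    i += count * k  # jump past the repeat tract
                    continue
            i += 1
    return hits


def summarize(transcripts):
    """Count SSRs by motif length over transcripts that pass the length filter."""
    counts = Counter()
    scanned_bp = 0
    for seq in transcripts.values():
        if len(seq) < MIN_TRANSCRIPT_LEN:
            continue
        scanned_bp += len(seq)
        for _, motif, _ in find_ssrs(seq):
            counts[len(motif)] += 1
    return counts, scanned_bp


def density_per_mb(n_ssr, total_bp):
    """SSR density expressed as loci per megabase of scanned sequence."""
    return n_ssr * 1_000_000 / total_bp
```

Here `summarize` takes a dictionary that maps transcript IDs to sequences and returns the per-type counts together with the number of bases scanned, which `density_per_mb` then turns into a density. If density is defined this way, the reported 1 622 mononucleotide repeats at 182.32/Mb would correspond to roughly 8.9 Mb of scanned transcript sequence.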