Scientia Agricultura Sinica ›› 2021, Vol. 54 ›› Issue (4): 864-876. doi: 10.3864/j.issn.0578-1752.2021.04.017

• Animal Science · Veterinary Science · Resource Insects •

利用第三代纳米孔长读段测序技术构建和注释蜜蜂球囊菌的全长转录组

杜宇1(),祝智威1(),王杰1,王秀娜3,4,蒋海宾1,范元婵1,范小雪1,陈华枝1,隆琦1,蔡宗兵1,熊翠玲1,2,郑燕珍1,付中民1,2,陈大福1,2(),郭睿1,2()   

  1. 1福建农林大学动物科学学院(蜂学学院),福州350002
    2福建农林大学蜂疗研究所,福州 350002
    3福建农林大学生命科学学院,福州350002
    4福建省病原真菌与真菌毒素重点实验室(福建农林大学),福州 350002
  • 收稿日期:2020-05-04 接受日期:2020-05-22 出版日期:2021-02-16 发布日期:2021-02-16
  • 通讯作者: 陈大福,郭睿
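  • About the authors: DU Yu, E-mail: m18505700830@163.com; ZHU ZhiWei, E-mail: zzw15235470398@163.com
  • Funding:
    Earmarked Fund for China Agriculture Research System (CARS-44-KXJ7); Natural Science Foundation of Fujian Province (2018J05042); Outstanding Young Scientific Research Talents Program of Fujian Agriculture and Forestry University (xjq201814); Open Project of the Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province (GUO Rui); Open Fund of the Jiangxi Province Key Laboratory of Honeybee Biology and Beekeeping (JXKLHBB-2020-04); Fujian Agriculture and Forestry University Fund for Excellent Master's Theses (DU Yu)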

Construction and Annotation of Ascosphaera apis Full-Length Transcriptome Utilizing Nanopore Third-Generation Long-Read Sequencing Technology

DU Yu1, ZHU ZhiWei1, WANG Jie1, WANG XiuNa3,4, JIANG HaiBin1, FAN YuanChan1, FAN XiaoXue1, CHEN HuaZhi1, LONG Qi1, CAI ZongBing1, XIONG CuiLing1,2, ZHENG YanZhen1, FU ZhongMin1,2, CHEN DaFu1,2, GUO Rui1,2

  1. College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou 350002
  2. Apitherapy Research Institution, Fujian Agriculture and Forestry University, Fuzhou 350002
  3. College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002
  4. Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province (Fujian Agriculture and Forestry University), Fuzhou 350002
  • Received: 2020-05-04 Accepted: 2020-05-22 Online: 2021-02-16 Published: 2021-02-16
  • Contact: DaFu CHEN,Rui GUO

摘要:

【目的】利用第三代纳米孔(nanopore)长读段测序技术对蜜蜂球囊菌(Ascosphaera apis,简称球囊菌)的纯化菌丝(Aam)和孢子(Aas)进行测序,构建和注释球囊菌的高质量全长转录组。【方法】通过Oxford Nanopore PromethION平台对Aam和Aas进行测序。利用Guppy软件对原始读段(raw reads)进行碱基识别(base calling),通过过滤短片段和低质量原始读段得到有效读段(clean reads)。通过识别两端引物鉴定全长转录本序列。通过比对Nr、Swissprot、KOG、eggNOG、Pfam、GO和KEGG数据库获得全长转录本的注释信息。分别利用CPC、CNCI、CPAT、Pfam 4种方法对长链非编码RNA(long non-coding RNA,lncRNA)进行预测,取四者的交集作为高可信度的lncRNA。【结果】Aam和Aas的纳米孔测序分别测得6 321 704和6 259 727条原始读段,经质控得到5 669 436和6 233 159条有效读段,其中包含的全长有效读段分别为4 497 102(79.32%)和4 963 101(79.62%)条。共鉴定到9 859和16 795条非冗余全长转录本,N50分别为1 482和1 658 bp,平均长度分别为1 187和1 303 bp,最大长度分别为6 472和6 815 bp。Venn分析结果显示有6 512条非冗余全长转录本为菌丝和孢子所共有,分别有3 347和10 283个非冗余全长转录本为二者特有。此外,在球囊菌菌丝和孢子中共鉴定到20 142条全长转录本,其中分别有20 809、11 151、17 723、12 164、11 340和9 833条全长转录本可注释到Nr、KOG、eggNOG、Pfam、GO和KEGG数据库。注释全长转录本数量最多的物种是球囊菌、Polytolypa hystricis和荚膜组织胞浆菌(Histoplasma capsulatum)。GO数据库注释结果显示,上述全长转录本可注释到45个功能条目,涉及细胞组件、细胞和细胞器等细胞组分相关条目;催化活性、结合和转运器活性等分子功能相关条目;以及细胞进程、代谢进程和单一组织进程等生物学进程相关条目。KEGG数据库注释结果显示,上述全长转录本还可注释到抗生素的生物合成、核糖体、氨基酸的生物合成、碳代谢和剪接体等49条通路。此外,鉴定到648条高可信度的lncRNA,包含480条基因间区lncRNA、119条反义链lncRNA和49条正义链lncRNA。【结论】构建和注释了球囊菌的首个高质量全长转录组,为探究球囊菌转录组的复杂性,完善参考基因组的序列和功能注释信息以及深入开展球囊菌可变剪接体的功能研究提供了关键依据。

关键词: 第三代高通量测序技术, 纳米孔测序, 全长转录本, 参考转录组, 蜜蜂, 蜜蜂球囊菌

Abstract:

【Objective】Purified mycelia (Aam) and spores (Aas) of Ascosphaera apis were sequenced using third-generation nanopore long-read sequencing technology in order to construct and annotate a high-quality full-length transcriptome of A. apis.【Method】Aam and Aas were each sequenced on the Oxford Nanopore PromethION platform. Guppy was used for base calling of raw reads, and clean reads were obtained by filtering out short fragments and low-quality raw reads. Full-length transcript sequences were identified by recognizing the primers at both ends of clean reads. Full-length transcripts were aligned to the Nr, Swissprot, KOG, eggNOG, Pfam, GO and KEGG databases to obtain annotations. Four approaches (CPC, CNCI, CPAT and Pfam) were used to predict long non-coding RNAs (lncRNAs), and the intersection of the four result sets was taken as the set of high-confidence lncRNAs.【Result】Nanopore sequencing of Aam and Aas yielded 6 321 704 and 6 259 727 raw reads, respectively; after quality control, 5 669 436 and 6 233 159 clean reads were obtained, of which 4 497 102 (79.32%) and 4 963 101 (79.62%) were full-length clean reads. In Aam and Aas, 9 859 and 16 795 non-redundant full-length transcripts were identified, with an N50 of 1 482 and 1 658 bp, an average length of 1 187 and 1 303 bp, and a maximum length of 6 472 and 6 815 bp, respectively. Venn analysis showed that 6 512 non-redundant full-length transcripts were shared by Aam and Aas, while 3 347 and 10 283 were specific to Aam and Aas, respectively. A total of 20 142 full-length transcripts were identified in Aam and Aas, among which 20 809, 11 151, 17 723, 12 164, 11 340 and 9 833 could be annotated to the Nr, KOG, eggNOG, Pfam, GO and KEGG databases, respectively. The species to which the most full-length transcripts were annotated were A. apis, Polytolypa hystricis and Histoplasma capsulatum. GO annotation showed that these full-length transcripts were assigned to 45 functional terms, including cellular component terms such as cell part, cell and organelle; molecular function terms such as catalytic activity, binding and transporter activity; and biological process terms such as cellular process, metabolic process and single-organism process. KEGG annotation indicated that these full-length transcripts were also assigned to 49 pathways, including biosynthesis of antibiotics, ribosome, biosynthesis of amino acids, carbon metabolism and spliceosome. In addition, 648 high-confidence lncRNAs were identified, comprising 480 long intergenic non-coding RNAs (lincRNAs), 119 antisense lncRNAs and 49 sense lncRNAs.【Conclusion】The first high-quality full-length transcriptome of A. apis was constructed and annotated. It provides a key basis for exploring the complexity of the A. apis transcriptome, improving the sequence and functional annotation of the reference genome, and further studying the functions of alternatively spliced isoforms in A. apis.
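
Several of the counts in the Result section are linked by simple arithmetic: the full-length percentages, the Venn partition of non-redundant transcripts, and the lncRNA classes. The following is a minimal Python sketch that recomputes them from the figures quoted in the abstract. The variable and function names are illustrative assumptions and are not taken from the authors' pipeline.

```python
# Counts copied from the Result section (Aam = mycelia, Aas = spores).
clean_reads = {"Aam": 5_669_436, "Aas": 6_233_159}
full_length_reads = {"Aam": 4_497_102, "Aas": 4_963_101}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of clean reads that are full-length, per sample.
full_length_pct = {
    sample: percent(full_length_reads[sample], clean_reads[sample])
    for sample in clean_reads
}

# Venn partition of non-redundant full-length transcripts.
shared = 6_512
specific = {"Aam": 3_347, "Aas": 10_283}
per_sample = {sample: shared + specific[sample] for sample in specific}
union_total = shared + sum(specific.values())

# lncRNA classes: intergenic + antisense + sense.
lncrna_total = 480 + 119 + 49

print(full_length_pct, per_sample, union_total, lncrna_total)
```

The recomputed values match the reported 79.32% and 79.62%, the per-sample totals of 9 859 and 16 795, the combined total of 20 142, and the 648 lncRNAs.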

Key words: third-generation high-throughput sequencing technology, nanopore sequencing, full-length transcript, reference transcriptome, honeybee, Ascosphaera apis
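
Two computations named in the abstract, the N50 statistic and the intersection of the four lncRNA predictions, can be illustrated with a short Python sketch. The transcript lengths and transcript IDs below are made-up toy values, and `n50` and `high_confidence` are hypothetical helper names. The sketch does not call CPC, CNCI, CPAT or Pfam; it only assumes that each tool's output has already been reduced to a set of transcript IDs judged non-coding.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def high_confidence(*predictions):
    """Keep only the IDs reported as non-coding by every method."""
    return set.intersection(*(set(p) for p in predictions))


# Toy transcript lengths in bp (hypothetical, not from the study).
toy_lengths = [500, 800, 1200, 1500, 2000]
toy_n50 = n50(toy_lengths)

# Toy per-tool non-coding calls (hypothetical transcript IDs).
cpc = {"t1", "t2", "t3"}
cnci = {"t2", "t3", "t4"}
cpat = {"t2", "t3"}
pfam = {"t3", "t5"}
toy_lncrnas = high_confidence(cpc, cnci, cpat, pfam)

print(toy_n50, sorted(toy_lncrnas))
```

Taking the intersection rather than the union is the conservative choice: a transcript is kept only when all four methods agree, which is what the abstract describes as high-confidence lncRNAs.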