Journal of Integrative Agriculture ›› 2026, Vol. 25 ›› Issue (1): 150-156. DOI: 10.1016/j.jia.2024.03.077


toGC: A pipeline for correcting GPCR gene models in Phytophthora sojae

  

  • Received: 2024-01-01 Revised: 2024-03-27 Accepted: 2024-02-21 Publication date: 2026-01-20 Release date: 2025-12-05

toGC: A pipeline to correct gene model for functional excavation of dark GPCRs in Phytophthora sojae

Min Qiu1, 2, 3*, Chun Yan1*, Huaibo Li1, Haiyang Zhao1, Siqun Tu1, Yaru Sun1, Saijiang Yong1, Ming Wang1, 2, 3#, Yuanchao Wang1, 2, 3#   

    1 Sanya Institute of Nanjing Agricultural University, Department of Plant Pathology, Nanjing Agricultural University, Nanjing 210095, China

    2 The Key Laboratory of Plant Immunity, Nanjing Agricultural University, Nanjing 210095, China

    3 Key Laboratory of Soybean Disease and Pest Control (Ministry of Agriculture and Rural Affairs), Nanjing Agricultural University, Nanjing 210095, China

  • Received: 2024-01-01 Revised: 2024-03-27 Accepted: 2024-02-21 Online: 2026-01-20 Published: 2025-12-05
  • About author: Min Qiu, E-mail: minqiu@njau.edu.cn; Chun Yan, E-mail: 2022202060@stu.njau.edu.cn; # Correspondence: Yuanchao Wang, E-mail: wangyc@njau.edu.cn; Ming Wang, E-mail: mwang@njau.edu.cn. * These authors contributed equally to this study.
  • Supported by:
    This work was supported by the grants to Min Qiu and Ming Wang from the National Natural Science Foundation of China (32100160 and 32100044), the grants to Ming Wang from the Jiangsu “Innovative and Entrepreneurial Talent” Program, China (JSSCRC2021510), and the grants to Yuanchao Wang from the Chinese Modern Agricultural Industry Technology System (CARS-004-PS14).  

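Abstract (Chinese version):

Accurate genome annotation is essential for downstream studies of gene function. However, conventional high-throughput gene annotation methods inevitably produce some erroneous gene model predictions. These errors cause gene sequences to be wrongly extended or truncated, which complicates downstream functional analysis. The traditional way to correct sequences, by cloning them, is time-consuming and labor-intensive, so a convenient method is lacking. To fill this gap, we developed the toGC pipeline, which integrates genome annotation with transcriptome datasets to correct gene model prediction errors. We first retrieved 20 published Phytophthora sojae genes with cloned sequences and found that about 40% of them had gene model errors. We then applied the toGC pipeline and found that every discrepancy between the annotated and cloned sequences of these genes could be corrected, giving nearly 100% accuracy. Next, we applied toGC to the GPCR-bigram gene family, which was predicted to have 42 members in the P. sojae genome but lacked experimental validation. Using toGC, we identified 32 GPCR-bigram genes whose annotated sequences were inconsistent with the toGC-corrected sequences. Notably, for five of these genes (GPCR-TKL9, GPCR-TKL15, GPCR-PDE3, GPCR-AC3, and GPCR-AC4), the annotated and corrected sequences differed very substantially. We then obtained the actual sequences of these five genes by gene cloning; sequencing showed that all of them matched the corrected sequences, further confirming the reliability of toGC. More importantly, we discovered two novel GPCR-bigram genes (GPCR-AC3 and GPCR-AC4) that had previously been mispredicted as a single gene. CRISPR/Cas9-mediated knockout experiments confirmed that GPCR-AC4 is involved in oospore production, whereas knocking out GPCR-AC3 had no effect on oospores, further confirming their status as two independent genes. In addition, we verified the reliability of the toGC pipeline in Phytophthora capsici and Pythium ultimum. Our results highlight the utility of toGC for gene model correction, which facilitates the study of biological functions and offers potential applications in analyses of different species.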

Abstract:

The accuracy of genomic annotation is crucial for subsequent functional investigations; however, computational protocols used in high-throughput annotation of open reading frames (ORFs) can introduce inconsistencies. These inconsistencies, which lead to non-uniform extension or truncation of sequence ends, pose challenges for downstream analyses. Existing strategies to rectify them are time-consuming and labor-intensive, and dedicated approaches are lacking. To address this gap, we developed toGC, a tool that integrates genomic annotation with RNA-seq datasets to rectify annotation inconsistencies. Using toGC, we achieved nearly 100% accuracy in correcting inconsistencies in published Phytophthora sojae ORFs. We applied this innovative pipeline to the GPCR-bigram gene family, which was predicted to have 42 members in the P. sojae genome but lacked experimental validation. By employing toGC, we identified 32 GPCR-bigram ORFs with inconsistencies between previous annotations and toGC-corrected sequences. Notably, 5 of these genes (GPCR-TKL9, GPCR-TKL15, GPCR-PDE3, GPCR-AC3, and GPCR-AC4) showed substantial inconsistencies. Experimental validation confirmed the effectiveness of toGC, as sequences obtained through cloning matched those annotated by toGC. Importantly, we discovered two novel GPCRs (GPCR-AC3 and GPCR-AC4), which were previously mispredicted as a single gene. CRISPR/Cas9-mediated knockout experiments revealed the involvement of GPCR-AC4 but not GPCR-AC3 in oospore production, further confirming their status as two separate genes. In addition to P. sojae, the reliability of the toGC pipeline in Phytophthora capsici and Pythium ultimum further emphasizes the robustness of this pipeline. Our findings highlight the utility of toGC for reliable gene model correction, facilitating investigations into biological functions and offering potential applications in diverse species analyses.

Key words: gene model correction, transcriptome, open reading frames, G-protein coupled receptors
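
The abstract describes annotated ORFs that are extended or truncated relative to cloned or corrected sequences. As a rough illustration of that comparison only, the Python sketch below labels an annotated sequence against a reference sequence. It is not the toGC implementation: the function names, the four labels, and the toy sequences are all hypothetical, and plain substring matching stands in for whatever alignment a real pipeline would use.

```python
def classify_model(annotated: str, reference: str) -> str:
    """Label how an annotated ORF differs from a reference ORF.

    Hypothetical helper for illustration; not part of toGC.
    """
    a, r = annotated.upper(), reference.upper()
    if a == r:
        return "consistent"
    if r in a:
        # The annotation contains the whole reference plus extra sequence.
        return "extended"
    if a in r:
        # The annotation is only a fragment of the reference.
        return "truncated"
    return "divergent"


def summarize(models: dict[str, tuple[str, str]]) -> dict[str, int]:
    """Count labels over {gene: (annotated, reference)} pairs."""
    counts: dict[str, int] = {}
    for annotated, reference in models.values():
        label = classify_model(annotated, reference)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Toy sequences invented for this example (not real P. sojae data).
toy_models = {
    "geneA": ("ATGAAATAG", "ATGAAATAG"),
    "geneB": ("ATGCCCATGAAATAG", "ATGAAATAG"),
    "geneC": ("ATGTTTTAG", "ATGGGGATGTTTTAG"),
    "geneD": ("ATGAAATAG", "ATGCCCTAG"),
}
counts = summarize(toy_models)
print(counts)
```

A real comparison would also have to handle intron boundaries and single-base differences, which substring matching ignores, and would need its own rule for cases such as one annotated gene that actually spans two genes.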