toGC: A pipeline for correcting GPCR gene models in Phytophthora sojae

doi:10.1016/j.jia.2024.03.077

Journal of Integrative Agriculture, 2026, Vol. 25, Issue 1: 150-156. DOI: 10.1016/j.jia.2024.03.077



toGC: A pipeline to correct gene model for functional excavation of dark GPCRs in Phytophthora sojae

Min Qiu¹,²,³*, Chun Yan¹*, Huaibo Li¹, Haiyang Zhao¹, Siqun Tu¹, Yaru Sun¹, Saijiang Yong¹, Ming Wang¹,²,³#, Yuanchao Wang¹,²,³#

¹Sanya Institute of Nanjing Agricultural University, Department of Plant Pathology, Nanjing Agricultural University, Nanjing 210095, China
²The Key Laboratory of Plant Immunity, Nanjing Agricultural University, Nanjing 210095, China
³Key Laboratory of Soybean Disease and Pest Control (Ministry of Agriculture and Rural Affairs), Nanjing Agricultural University, Nanjing 210095, China

Received: 2024-01-01; Revised: 2024-03-27; Accepted: 2024-02-21; Online: 2026-01-20; Published: 2025-12-05
About the authors: Min Qiu, E-mail: minqiu@njau.edu.cn; Chun Yan, E-mail: 2022202060@stu.njau.edu.cn. #Correspondence: Yuanchao Wang, E-mail: wangyc@njau.edu.cn; Ming Wang, E-mail: mwang@njau.edu.cn. *These authors contributed equally to this study.
Supported by:
This work was supported by grants to Min Qiu and Ming Wang from the National Natural Science Foundation of China (32100160 and 32100044), a grant to Ming Wang from the Jiangsu “Innovative and Entrepreneurial Talent” Program, China (JSSCRC2021510), and a grant to Yuanchao Wang from the Chinese Modern Agricultural Industry Technology System (CARS-004-PS14).

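Abstract

Abstract (translated from the Chinese version):

The accuracy of genome annotation is crucial for subsequent studies of gene function. However, conventional high-throughput gene annotation methods inevitably produce some incorrectly predicted gene models. These errors lead to erroneous extension or truncation of gene sequences and pose challenges for downstream functional analyses. The traditional approach of correcting sequences by cloning is time-consuming and labor-intensive, so a convenient method has been lacking. To fill this gap, we developed the toGC pipeline, which integrates genome annotation with transcriptome datasets to correct gene model prediction errors. We first retrieved 20 published Phytophthora sojae genes with cloned sequences and found that about 40% of them had gene model errors. We then applied the toGC pipeline and found that all discrepancies between the annotated and cloned sequences of these genes could be corrected, giving an accuracy of nearly 100%. Next, we applied toGC to the GPCR-bigram (G protein-coupled receptor) gene family of P. sojae, which is predicted to have 42 members in the P. sojae genome but lacks experimental validation. Using toGC, we identified 32 GPCR-bigram genes whose annotated sequences were inconsistent with the toGC-corrected sequences. Notably, for 5 of these genes (GPCR-TKL9, GPCR-TKL15, GPCR-PDE3, GPCR-AC3, and GPCR-AC4), the annotated sequences differed very substantially from the toGC-corrected sequences. We then obtained the actual sequences of these 5 genes by gene cloning; sequencing showed that all of them matched the corrected sequences, further confirming the reliability of the toGC pipeline. More importantly, we also discovered two novel GPCR-bigram genes (GPCR-AC3 and GPCR-AC4) that had previously been mispredicted as a single gene. CRISPR/Cas9-mediated knockout experiments confirmed that GPCR-AC4 is involved in oospore production, whereas knockout of GPCR-AC3 had no effect on oospores, further confirming their status as two independent genes. In addition, we confirmed the reliability of the toGC pipeline in Phytophthora capsici and Pythium ultimum. Our results highlight the utility of the toGC pipeline for gene model correction, facilitate studies of biological function, and offer potential applications across different species.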

Abstract:

The accuracy of genomic annotation is crucial for subsequent functional investigations; however, computational protocols used in high-throughput annotation of open reading frames (ORFs) can introduce inconsistencies. These inconsistencies, which lead to non-uniform extension or truncation of sequence ends, pose challenges for downstream analyses. Existing strategies to rectify these inconsistencies are time-consuming and labor-intensive, and dedicated approaches are lacking. To address this gap, we developed toGC, a tool that integrates genomic annotation with RNA-seq datasets to rectify annotation inconsistencies. Using toGC, we achieved nearly 100% accuracy in correcting inconsistencies in published Phytophthora sojae ORFs. We applied this pipeline to the GPCR-bigram gene family, which was predicted to have 42 members in the P. sojae genome but lacked experimental validation. By employing toGC, we identified 32 GPCR-bigram ORFs with inconsistencies between previous annotations and toGC-corrected sequences. Notably, 5 of these genes (GPCR-TKL9, GPCR-TKL15, GPCR-PDE3, GPCR-AC3, and GPCR-AC4) showed substantial inconsistencies. Experimental validation confirmed the effectiveness of toGC: the sequences obtained through cloning matched those corrected by toGC. Importantly, we discovered two novel GPCRs (GPCR-AC3 and GPCR-AC4), which were previously mispredicted as a single gene. CRISPR/Cas9-mediated knockout experiments revealed the involvement of GPCR-AC4 but not GPCR-AC3 in oospore production, further confirming their status as two separate genes. Beyond P. sojae, the toGC pipeline also proved reliable in Phytophthora capsici and Pythium ultimum, further supporting its robustness. Our findings highlight the utility of toGC for reliable gene model correction, facilitating investigations into biological functions and offering potential applications in diverse species analyses.
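To make the idea of "extension or truncation of sequence ends" concrete, the following minimal sketch compares an annotated ORF with a reference ORF (for example, a cloned sequence) and labels the kind of discrepancy. It is an illustration only and is not the toGC implementation; the function name `classify_discrepancy`, the category labels, and the toy sequences are assumptions introduced here.

```python
def classify_discrepancy(annotated: str, reference: str) -> str:
    """Label how an annotated ORF differs from a reference ORF.

    Hypothetical helper for illustration; not part of toGC.
    Both sequences are plain nucleotide strings on the same strand.
    """
    a = annotated.upper()
    r = reference.upper()
    if a == r:
        return "identical"
    # Annotation contains the whole reference plus extra bases at one end.
    if a.endswith(r):
        return "5' extension"
    if a.startswith(r):
        return "3' extension"
    # Reference contains the whole annotation plus extra bases at one end.
    if r.endswith(a):
        return "5' truncation"
    if r.startswith(a):
        return "3' truncation"
    # Anything else (e.g. a wrong intron boundary or a merged gene).
    return "internal difference"


# Toy sequences (invented for this example, not real P. sojae genes).
cloned = "ATGCCCGGGTAA"
print(classify_discrepancy("ATGAAA" + cloned, cloned))  # annotation starts too early
print(classify_discrepancy("ATGCCC", cloned))           # annotation stops too early
```

A simple prefix/suffix comparison like this only separates end-level discrepancies from everything else; cases such as two genes mispredicted as one would fall into the "internal difference" category and need closer inspection.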

Key words: gene model correction, transcriptome, open reading frames, G-protein coupled receptors


Min Qiu, Chun Yan, Huaibo Li, Haiyang Zhao, Siqun Tu, Yaru Sun, Saijiang Yong, Ming Wang, Yuanchao Wang. toGC: A pipeline to correct gene model for functional excavation of dark GPCRs in Phytophthora sojae[J]. Journal of Integrative Agriculture, 2026, 25(1): 150-156.

