Scientia Agricultura Sinica ›› 2022, Vol. 55 ›› Issue (2): 248-264. doi: 10.3864/j.issn.0578-1752.2022.02.002

• Crop Genetics and Breeding · Germplasm Resources · Molecular Genetics •

Restricted Two-Stage Multi-Locus Genome-Wide Association Analysis and Candidate Gene Prediction of Boll Opening Rate in Upland Cotton

XIE XiaoYu1, WANG KaiHong1, QIN XiaoXiao1, WANG CaiXiang1, SHI ChunHui1, NING XinZhu2, YANG YongLin3, QIN JiangHong3, LI ChaoZhou1, MA Qi2, SU JunJi1

  1 College of Life Science and Technology, Gansu Agricultural University/State Key Laboratory of Arid Land Crop Science, Lanzhou 730070
  2 Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi 832000, Xinjiang
  3 Shihezi Academy of Agriculture Science, Shihezi 832000, Xinjiang
  • Received: 2021-08-13 Accepted: 2021-10-26 Online: 2022-01-16 Published: 2022-01-26
  • Contact: CaiXiang WANG, Qi MA, JunJi SU
  • About the author: XIE XiaoYu, E-mail: xiexiaoyu0924@126.com
  • Funding:
    National Natural Science Foundation of China (31971986); Open Project of the State Key Laboratory of Cotton Biology (CB2021A03); Open Project of the State Key Laboratory of Cotton Biology (CB2021A19); Science and Technology Program of Gansu Province (20JR10RA520); Financial Science and Technology Program of the Xinjiang Production and Construction Corps (2020DA001); National Undergraduate Innovation and Entrepreneurship Training Program of Gansu Agricultural University (202110733018)

Abstract:

【Objective】Boll opening rate (BOR) is an important indicator of early maturity in upland cotton (Gossypium hirsutum L.). A genome-wide association study (GWAS) was used to dissect the quantitative trait loci (QTL) for BOR and their genetic effects, providing a theoretical basis for molecular breeding of early maturity in upland cotton. 【Method】A natural population of 315 upland cotton varieties (lines) was phenotyped for BOR in three environments. Using 9,244 previously constructed SNP linkage disequilibrium block (SNPLDB) markers with multiple alleles, the restricted two-stage multi-locus GWAS (RTM-GWAS) method was applied to detect SNPLDB loci significantly associated with BOR, estimate their phenotypic effects, establish a QTL-allele matrix of the significantly associated loci in the population, and identify stably associated major SNPLDB loci and their elite haplotypes. Finally, based on gene expression levels in two transcriptome datasets, candidate genes potentially related to the target trait were mined within the 1 Mb genomic region flanking the significant SNPLDB loci. 【Result】Across the three environments, BOR in the natural population ranged from 37.78% to 100.00%, with a broad-sense heritability of 67.03%. Multi-environment analysis of variance showed highly significant effects of genotype, environment, and genotype × environment interaction on BOR (P<0.001). RTM-GWAS detected 52 SNPLDB loci significantly associated with BOR, comprising 179 alleles or haplotypes. Of these, 90 increasing alleles or haplotypes had effect values ranging from 0.014 to 19.43, and 89 decreasing alleles or haplotypes had effect values ranging from -21.49 to -0.039.
Among these significant loci, 6 were detected in both the multi-environment joint analysis and the single-environment analyses, and were considered likely to be stable SNPLDB loci associated with BOR. Significance tests of the phenotypic differences among the allelic variants at these six stable loci identified four favorable allelic variants: LDB_16_37952328 (TT), LDB_5_96395565 (AA), LDB_16_49503485 (TT), and LDB_4_81118668 (TT). Further analysis showed that the frequency distribution of these favorable alleles differed among varieties (lines) from four ecological regions of China. In addition, 178 genes were annotated in the regions adjacent to the 4 stable major SNPLDB loci, and transcriptome data analysis predicted that 23 of them may be candidate genes regulating BOR in upland cotton. 【Conclusion】A total of 52 SNPLDB loci significantly associated with BOR were identified, of which 4 were stable major loci, and 23 genes were predicted to be potentially related to BOR in upland cotton. These SNPLDB loci and candidate genes will provide a theoretical basis for marker-assisted breeding of early maturity in upland cotton.

Key words: upland cotton, boll opening rate, RTM-GWAS, QTL-allele matrix, candidate genes
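
The candidate-gene step in the abstract (mining genes within the 1 Mb region flanking a significant SNPLDB locus) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that locus IDs follow the pattern `LDB_<chromosome>_<position>`, which the ID strings suggest but the abstract does not state, and that the window extends 1 Mb on each side of the position. The `Gene` records, the `parse_locus` and `genes_near` helpers, and all gene coordinates are hypothetical and serve only as illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

FLANK_BP = 1_000_000  # assumed: 1 Mb on each side of the locus position


@dataclass(frozen=True)
class Gene:
    gene_id: str  # hypothetical identifier, for illustration only
    chrom: int
    start: int
    end: int


def parse_locus(locus_id: str) -> tuple[int, int]:
    """Split an ID such as 'LDB_16_37952328' into (chromosome, position).

    Assumes the naming pattern LDB_<chromosome>_<position>; the abstract
    does not define the ID format explicitly.
    """
    prefix, chrom, pos = locus_id.split("_")
    if prefix != "LDB":
        raise ValueError(f"unexpected locus ID: {locus_id!r}")
    return int(chrom), int(pos)


def genes_near(locus_id: str, genes: list[Gene], flank: int = FLANK_BP) -> list[Gene]:
    """Return genes on the same chromosome that overlap the flanking window."""
    chrom, pos = parse_locus(locus_id)
    lo, hi = max(0, pos - flank), pos + flank
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical gene coordinates, not taken from any real annotation.
genes = [
    Gene("geneA", 16, 37_100_000, 37_104_000),  # inside the window
    Gene("geneB", 16, 39_500_000, 39_503_000),  # same chromosome, too far away
    Gene("geneC", 5, 37_900_000, 37_902_000),   # different chromosome
]

hits = genes_near("LDB_16_37952328", genes)
print([g.gene_id for g in hits])
```

In a real analysis the gene list would come from a genome annotation file, and the genes returned for each stable locus would then be filtered by expression level in the transcriptome data, as the abstract describes.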