Scientia Agricultura Sinica, 2022, Vol. 55, Issue (2): 248-264. doi: 10.3864/j.issn.0578-1752.2022.02.002


Restricted Two-Stage Multi-Locus Genome-Wide Association Analysis and Candidate Gene Prediction of Boll Opening Rate in Upland Cotton

XIE XiaoYu1, WANG KaiHong1, QIN XiaoXiao1, WANG CaiXiang1, SHI ChunHui1, NING XinZhu2, YANG YongLin3, QIN JiangHong3, LI ChaoZhou1, MA Qi2, SU JunJi1

1 College of Life Science and Technology, Gansu Agricultural University/State Key Laboratory of Arid Land Crop Science, Lanzhou 730070
2 Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi 832000, Xinjiang
3 Shihezi Academy of Agriculture Science, Shihezi 832000, Xinjiang
Received: 2021-08-13; Accepted: 2021-10-26; Online: 2022-01-16; Published: 2022-01-26
【Objective】Boll opening rate (BOR) is one of the most important indicators of early maturity in upland cotton (Gossypium hirsutum L.). A genome-wide association study (GWAS) was conducted to dissect the quantitative trait loci (QTLs) for BOR and their genetic effects, providing a theoretical basis for molecular breeding of early maturity in upland cotton. 【Method】A natural population of 315 upland cotton varieties (lines) was evaluated for BOR in three environments, and a total of 9 244 multi-allelic SNP linkage disequilibrium block (SNPLDB) markers were constructed. The restricted two-stage multi-locus GWAS (RTM-GWAS) procedure was then used to detect SNPLDB loci significantly associated with BOR, estimate their phenotypic effects, establish a QTL-allele matrix of the significant loci in the population, and identify stable major SNPLDB loci and elite haplotypes. Finally, based on gene expression levels in two transcriptome datasets, candidate genes potentially related to BOR were mined within the 1 Mb flanking genomic regions of the significant SNPLDB loci. 【Result】In the natural population across the three environments, BOR ranged from 37.78% to 100.00%, and its broad-sense heritability was 67.03%. Multi-environment analysis of variance showed significant genotype, environment, and genotype × environment interaction effects on BOR (P<0.001). A total of 52 SNPLDB loci significantly associated with BOR were detected by RTM-GWAS, comprising 179 alleles (haplotypes). The effects of the 90 positive alleles ranged from 0.014 to 19.43, and those of the 89 negative alleles ranged from -21.49 to -0.039.
Six of these SNPLDB loci were detected in both the multi-environment and the single-environment analyses and were therefore considered stable loci significantly associated with BOR. Comparing the phenotypes of accessions carrying different alleles at these six stable loci identified four favorable alleles: LDB_16_37952328 (TT), LDB_5_96395565 (AA), LDB_16_49503485 (TT), and LDB_4_81118668 (TT). Further analysis showed that the frequencies of these favorable alleles differed significantly among varieties (lines) from four ecological regions. In addition, 178 genes were annotated in the regions adjacent to the 4 stable major SNPLDB loci, of which 23 were predicted as potential candidate genes by transcriptome data analysis. 【Conclusion】A total of 52 SNPLDB loci significantly associated with BOR were identified, 4 of which were stable major loci, and 23 genes were predicted to be related to BOR in upland cotton. These SNPLDB loci and candidate genes provide a theoretical basis for marker-assisted breeding for early maturity in upland cotton.
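An SNPLDB marker can carry more than two alleles because adjacent SNPs in strong linkage disequilibrium are merged into one block, and each distinct haplotype observed in that block is treated as an allele. The minimal sketch below shows only this haplotype-to-allele step on invented data. Block boundaries are assumed to be given, and the function name `block_alleles` and the genotype strings are hypothetical, not part of the RTM-GWAS software.

```python
from collections import Counter

def block_alleles(genotypes, snp_indices):
    """Treat each distinct haplotype across the SNPs of one LD block as an allele.

    genotypes:   dict mapping accession -> string of homozygous SNP calls,
                 one character per SNP (hypothetical input format).
    snp_indices: 0-based positions of the SNPs assumed to form the block.
    Returns (allele code per accession, allele frequencies).
    """
    haplotypes = {acc: "".join(calls[i] for i in snp_indices)
                  for acc, calls in genotypes.items()}
    counts = Counter(haplotypes.values())
    # Code alleles by decreasing frequency, ties broken alphabetically, so labels are reproducible.
    ordered = sorted(counts, key=lambda h: (-counts[h], h))
    codes = {hap: f"a{k + 1}" for k, hap in enumerate(ordered)}
    n = len(genotypes)
    allele_of = {acc: codes[hap] for acc, hap in haplotypes.items()}
    freqs = {codes[hap]: counts[hap] / n for hap in ordered}
    return allele_of, freqs

# Invented example: four accessions, three SNPs treated as one block.
toy = {"acc1": "ATG", "acc2": "ATG", "acc3": "GTC", "acc4": "GCC"}
alleles, freqs = block_alleles(toy, [0, 1, 2])
print(alleles)
print(freqs)
```

Here three SNPs yield three haplotypes, so the block behaves as a single locus with three alleles rather than three biallelic markers.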

Key words: Upland cotton, boll opening rate, RTM-GWAS, QTL allele matrix, candidate genes

Table 1

Frequency distribution and descriptive statistics of boll opening rate in upland cotton

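Env.     13   19   25   31   37   43   49   55   61   67   73   79   85   91   97     N  Mean (%)  Range (%)    CV (%)
AY-14     -    -    -    -    -    -    3    5    6   19   31   85  133   33    -   315  80.62     46.73-90.00    9.86
AY-15     4    5   11   23   25   32   35   43   36   22   24   20   19    9    7   315  55.99     10.42-100.00  34.60
SHZ-14    -    -    -    -    1    1    3   16   28   41   63   62   57   34    9   315  70.80     37.78-98.05   14.42
Syn.      -    -    -    -    1    2    9   24   48   58   80   44   37   12    -   315  70.76     39.27-91.32   14.25

Columns 13 to 97 are boll opening rate (BOR) classes (%); each cell gives the number of accessions in that class (-: none). Syn.: combined across the three environments.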
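Table 1 summarizes each environment by class counts, mean, range, and coefficient of variation (CV = SD / mean × 100). The sketch below computes the same kind of summary from raw per-accession BOR values. It assumes that the class labels 13, 19, ..., 97 are midpoints of 6-percentage-point classes and that CV uses the sample standard deviation; neither rule is stated in the table, so both are assumptions, and `summarize_bor` is a hypothetical helper.

```python
import statistics

# Class labels as printed in Table 1, assumed here to be interval midpoints.
CLASS_MIDPOINTS = list(range(13, 98, 6))  # 13, 19, ..., 97

def summarize_bor(values, width=6.0):
    """Return (class counts, mean, (min, max), CV %) for BOR values in percent.

    Assumptions: each label is the midpoint of a `width`-wide class, a value on
    a boundary goes to the upper class, 100% falls in the top class, and values
    outside 10-100% are not counted.
    """
    counts = dict.fromkeys(CLASS_MIDPOINTS, 0)
    top = CLASS_MIDPOINTS[-1]
    for v in values:
        for mid in CLASS_MIDPOINTS:
            lower, upper = mid - width / 2, mid + width / 2
            if lower <= v < upper or (mid == top and v == upper):
                counts[mid] += 1
                break
    mean = statistics.mean(values)
    cv = statistics.stdev(values) / mean * 100  # CV = sample SD / mean x 100
    return counts, mean, (min(values), max(values)), cv

# Invented values, not data from the study.
counts, mean, value_range, cv = summarize_bor([46.73, 80.0, 85.5, 90.0])
print({k: n for k, n in counts.items() if n})  # non-empty classes only
print(mean, value_range, cv)
```

Under this midpoint reading, the smallest AY-15 value (10.42%) falls in the 13 class and 100% falls in the 97 class, which is consistent with the class counts in each row summing to N = 315.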

Table 2

Variance analysis of boll opening rate in upland cotton

Source of variation      SS           MS           F value    P value    Broad-sense heritability (%)
Environment              321981.19    160990.59    1084.69    <0.0001    67.03
Genotype                 288095.59    917.50       6.18       <0.0001
Environment × Genotype   237515.09    378.21       2.55       <0.0001

SS: sum of squares; MS: mean square.
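In an analysis of variance, each mean square is a sum of squares divided by its degrees of freedom, and each F value is a mean square divided by an error mean square. Table 2 does not report the degrees of freedom or the error term. The sketch below recovers both from the published numbers (df = SS / MS, error MS = MS / F). It assumes that all three F tests share one residual error term, an assumption the printed values turn out to be consistent with. The names `ANOVA` and `implied_terms` are hypothetical.

```python
# Values transcribed from Table 2: (sum of squares, mean square, F value).
ANOVA = {
    "Environment": (321981.19, 160990.59, 1084.69),
    "Genotype": (288095.59, 917.50, 6.18),
    "Environment x Genotype": (237515.09, 378.21, 2.55),
}

def implied_terms(table):
    """Return {source: (implied df, implied error mean square)}."""
    out = {}
    for source, (ss, ms, f) in table.items():
        df = round(ss / ms)   # MS = SS / df, so df = SS / MS
        error_ms = ms / f     # F = MS / MS_error, so MS_error = MS / F
        out[source] = (df, error_ms)
    return out

for source, (df, err) in implied_terms(ANOVA).items():
    print(f"{source}: df = {df}, implied error MS = {err:.1f}")
```

The implied degrees of freedom (2, 314 and 628) match 3 environments, 315 accessions and their interaction. The three implied error mean squares agree to within rounding (about 148), which supports reading the columns as SS, MS and F.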

Table 3

SNPLDB loci significantly associated with boll opening rate in upland cotton

SNPLDB locus  Chromosome  Position (bp)  -lg(P)  Phenotypic variation PV (%)  Common environments
LDB_16_37952328 16 (D03) 37952328 46.14 1.90 AY-15 (11.49), SHZ-14 (5.91)
LDB_5_96395565 5 (A05) 96395565 22.69 0.40 AY-15 (7.42)
LDB_16_49503485 16 (D03) 49503485 17.79 0.91 AY-15 (8.49)
LDB_10_6908012 10 (A10) 6908012 16.70 0.87 AY-15 (8.75)
LDB_15_9697358 15 (D02) 9697358 13.52 1.89 AY-15 (8.41)
LDB_21_22036171 21 (D08) 22036171 13.44 1.34
LDB_4_81118668 4 (A04) 81118668 11.80 0.91 AY-15 (4.58)
LDB_3_6725154_6746183 3 (A03) 6725154 11.63 1.21
LDB_1_57527637_57527855 1 (A01) 57527637 10.44 1.11
LDB_16_36553916_36554161 16 (D03) 36553916 9.81 1.72
LDB_11_119047116 11 (A11) 119047116 9.76 0.86
LDB_19_52309050_52309284 19 (D06) 52309050 7.97 1.54
LDB_17_51465593 17 (D04) 51465593 7.93 0.51
LDB_13_54999733 13 (A13) 54999733 7.36 0.61
LDB_20_6608424 20 (D07) 6608424 6.63
LDB_7_94884564_94884800 7 (A07) 94884564 6.34
LDB_19_53788858 19 (D06) 53788858 6.25 0.27
LDB_9_58666084_58666088 9 (A09) 58666084 6.06 1.12
LDB_6_41226103 6 (A06) 41226103 5.52 9.00
LDB_15_46384945_46385146 15 (D02) 46384945 5.17 0.36
LDB_15_67139952_67140180 15 (D02) 67139952 5.14 1.68
LDB_25_27281705 25 (D12) 27281705 5.10
LDB_23_6012092 23 (D10) 6012092 4.94 0.18
LDB_1_33221331 1 (A01) 33221331 4.06 0.38
LDB_12_96771402 12 (A12) 96771402 3.84 0.97
LDB_15_65780303 15 (D02) 65780303 3.65 0.67
LDB_22_48423583 22 (D09) 48423583 3.64 0.88
LDB_14_45778715_45778973 14 (D01) 45778715 3.58 1.08
LDB_21_5030313_5030325 21 (D08) 5030313 3.55 0.15
LDB_25_39776065 25 (D12) 39776065 3.47 0.37
LDB_26_45642548 26 (D13) 45642548 3.42 SHZ-14 (1.77)
LDB_5_30493466 5 (A05) 30493466 3.20
LDB_12_103902963 12 (A12) 103902963 3.15 0.39
LDB_3_42046528_42046540 3 (A03) 42046528 2.86
LDB_10_70110862_70110882 10 (A10) 70110862 2.51
LDB_13_72681594_72681843 13 (A13) 72681594 2.46 0.21
LDB_19_39327466 19 (D06) 39327466 2.43
LDB_11_18062977 11 (A11) 18062977 2.42 0.44
LDB_8_113962673 8 (A08) 113962673 2.30
LDB_23_24053424 23 (D10) 24053424 2.25 0.43
LDB_21_25090078 21 (D08) 25090078 2.19 0.47
LDB_6_108270563 6 (A06) 108270563 2.05
LDB_25_1835041 25 (D12) 1835041 1.86 0.18
LDB_2_4064617 2 (A02) 4064617 1.76 0.29
LDB_18_52473670 18 (D05) 52473670 1.71 0.32
LDB_13_62696637 13 (A13) 62696637 1.62 0.30
LDB_8_8677270 8 (A08) 8677270 1.56
LDB_24_59986795 24 (D11) 59986795 1.48
LDB_17_1478345 17 (D04) 1478345 1.39 0.42
LDB_5_23314512 5 (A05) 23314512 1.34
LDB_15_17739938 15 (D02) 17739938 1.34
LDB_17_53762505_53762526 17 (D04) 53762505 1.33 AY-14 (4.68)
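Each locus name in Table 3 encodes its location. It takes the form LDB_<chromosome>_<position>, or LDB_<chromosome>_<start>_<end> when a second coordinate is given, and the Position column always equals the first coordinate. As the Chromosome column shows, chromosomes 1-13 correspond to A01-A13 and 14-26 to D01-D13. The sketch below parses the names and reproduces that labeling. `parse_locus` and `chrom_label` are hypothetical convenience helpers, not part of any published tool, and reading the second coordinate as the block end is an assumption.

```python
import re
from typing import NamedTuple

class Locus(NamedTuple):
    name: str
    chrom: int   # numeric chromosome used in the locus name (1-26)
    label: str   # subgenome label, e.g. "D03"
    start: int   # first coordinate in the name (bp)
    end: int     # second coordinate if present, otherwise equal to start

NAME_RE = re.compile(r"^LDB_(\d+)_(\d+)(?:_(\d+))?$")

def chrom_label(chrom: int) -> str:
    """Map chromosome 1-13 to A01-A13 and 14-26 to D01-D13."""
    if not 1 <= chrom <= 26:
        raise ValueError(f"chromosome out of range: {chrom}")
    return f"A{chrom:02d}" if chrom <= 13 else f"D{chrom - 13:02d}"

def parse_locus(name: str) -> Locus:
    """Split an SNPLDB name into chromosome, label and coordinates."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an SNPLDB name: {name}")
    chrom, start = int(m.group(1)), int(m.group(2))
    end = int(m.group(3)) if m.group(3) else start
    return Locus(name, chrom, chrom_label(chrom), start, end)

print(parse_locus("LDB_16_37952328"))
print(parse_locus("LDB_3_6725154_6746183"))
```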

Fig. 1

Genetic analysis of boll opening rate in upland cotton by the MLM-GWAS and RTM-GWAS procedures. A: Manhattan plot of RTM-GWAS; B: Manhattan plot of MLM-GWAS; C: distribution of QTL-allele effects in the tested population; D: QTL-allele matrix of boll opening rate in the tested population. The numbers 1-6 in panels A and C denote the six stable SNPLDB loci. In panel D, warm colors represent positive-effect alleles, cool colors represent negative-effect alleles, and color intensity represents the magnitude of the effect.
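In a QTL-allele matrix, rows are significant loci, columns are accessions, and each cell holds the estimated effect of the allele that accession carries at that locus. Panel D colors these cells by sign and magnitude. The sketch below builds such a matrix from invented allele effects and genotype calls. The per-accession column sum is included only as one simple illustrative summary, not as the estimate used in the paper, and all names are hypothetical.

```python
def qtl_allele_matrix(effects, genotypes, loci, accessions):
    """Build a loci-by-accessions matrix of allele effects.

    effects:   {locus: {allele: estimated effect}}
    genotypes: {accession: {locus: allele carried}}
    A cell is None when the call is missing or the allele has no estimated effect.
    """
    return [
        [effects[locus].get(genotypes[acc].get(locus)) for acc in accessions]
        for locus in loci
    ]

def accession_totals(matrix):
    """Sum the non-missing effects in each column (illustrative summary only)."""
    return [sum(v for v in column if v is not None) for column in zip(*matrix)]

# Invented loci, alleles and effects; L2 is multi-allelic like an SNPLDB.
effects = {"L1": {"TT": 5.0, "CC": -5.0}, "L2": {"a1": 2.0, "a2": -1.0, "a3": 0.5}}
genotypes = {
    "v1": {"L1": "TT", "L2": "a1"},
    "v2": {"L1": "CC", "L2": "a2"},
    "v3": {"L1": "TT"},            # missing call at L2
}
m = qtl_allele_matrix(effects, genotypes, ["L1", "L2"], ["v1", "v2", "v3"])
print(m)                    # rows: L1, L2; columns: v1, v2, v3
print(accession_totals(m))
```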

Fig. 2

Comparison of boll opening rate among different alleles at the six stable SNPLDB loci significantly associated with BOR. BOR: boll opening rate. Different lowercase letters indicate significant differences at P<0.05 (LSD multiple comparison). The same below.
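Fisher's least significant difference (LSD), the test named in this caption, declares two groups different when |mean_i - mean_j| > t_crit × sqrt(MSE × (1/n_i + 1/n_j)), where MSE is the pooled within-group mean square. The sketch below applies that pairwise test using only the standard library. The critical t value is passed in rather than computed; it is close to 1.96 at α = 0.05 when the error df are large. The BOR values are invented, and the helper names are hypothetical. Assigning the letter groupings shown in the figure is a separate step that is not included.

```python
import math
from itertools import combinations
from statistics import mean

def pooled_mse(groups):
    """Within-group (error) mean square for a one-way layout."""
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups.values())
    df_within = sum(len(g) for g in groups.values()) - len(groups)
    return ss_within / df_within

def lsd_pairs(groups, t_crit):
    """Fisher's LSD: return {(a, b): (mean difference, LSD, significant?)}."""
    mse = pooled_mse(groups)
    result = {}
    for a, b in combinations(sorted(groups), 2):
        diff = mean(groups[a]) - mean(groups[b])
        lsd = t_crit * math.sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        result[(a, b)] = (diff, lsd, abs(diff) > lsd)
    return result

# Invented BOR values (%) for accessions carrying two alleles at one locus.
bor_by_allele = {"CC": [60.0, 62.0, 64.0], "TT": [70.0, 72.0, 74.0]}
# 2.776 is the two-sided 5% critical t value for the 4 error df in this toy example.
for pair, (diff, lsd, sig) in lsd_pairs(bor_by_allele, t_crit=2.776).items():
    print(pair, diff, round(lsd, 2), sig)
```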

Fig. 3

Frequency distribution of the superior alleles at the four stable associated loci. A: comparison of superior-allele frequencies at the four stable SNPLDB loci between extreme accessions with high and low BOR; B: comparison of boll opening rate among varieties (lines) from the four geographic regions, where different capital letters indicate significant differences at P<0.01 (LSD multiple comparison); C-F: comparison of superior-allele frequencies at the four stable SNPLDB loci among varieties (lines) from the four regions. YRR: Yellow River Region; YZRR: Yangtze River Region; NSER: Northern Super Early-maturing Region; NIR: Northwest Inland Region.
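Panels A and C-F compare how often the superior allele at each stable locus occurs in different groups of accessions, either the high- and low-BOR extremes or the four ecological regions. The sketch below computes that frequency per group at one locus from allele calls. The accessions, calls and group assignments are invented, the region codes are borrowed from the caption only as labels, and excluding missing calls from the denominator is an assumption.

```python
from collections import defaultdict

def superior_allele_freq(calls, groups, superior):
    """Frequency of the superior allele in each group at one locus.

    calls:    {accession: allele carried, or None if missing}
    groups:   {accession: group label, e.g. a region code}
    superior: the favorable allele, e.g. "TT"
    """
    carriers, typed = defaultdict(int), defaultdict(int)
    for acc, allele in calls.items():
        if allele is None:
            continue  # missing call: excluded from the denominator
        g = groups[acc]
        typed[g] += 1
        carriers[g] += allele == superior  # True counts as 1
    return {g: carriers[g] / typed[g] for g in typed}

calls = {"a": "TT", "b": "CC", "c": "TT", "d": "TT", "e": None}
groups = {"a": "NIR", "b": "NIR", "c": "YRR", "d": "YRR", "e": "YRR"}
print(superior_allele_freq(calls, groups, "TT"))
```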

Fig. 4

Prediction of candidate genes related to BOR in upland cotton. A and B: expression patterns in different tissues from the upland cotton RNA-Seq datasets of NAU (A) and CRI (B); C: Venn diagram of genes common to the two RNA-Seq datasets (NAU and CRI); D: heat map of the expression patterns of the 23 candidate genes related to BOR. Red indicates high expression in a given tissue or developmental stage; green and black indicate low expression.
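The candidate-gene step combines two filters. A gene must lie within the flanking window of a significant locus, and (as the Venn diagram in panel C suggests) it must be supported by both RNA-Seq datasets. The sketch below implements these as a window-overlap test and a set intersection. The gene IDs and coordinates are invented and only imitate the GH_D03Gxxxx style. Reducing "supported by a dataset" to membership in a set of expressed genes is a simplification. Because the text does not say whether "1 Mb" means 1 Mb on each side or in total, the flank is a parameter (`flank_bp`, assumed here to default to 1 000 000 bp on each side).

```python
def genes_near(locus_chrom, locus_pos, genes, flank_bp=1_000_000):
    """IDs of genes overlapping [pos - flank_bp, pos + flank_bp] on the same chromosome.

    genes: iterable of (gene_id, chromosome label, start, end)
    """
    lo, hi = locus_pos - flank_bp, locus_pos + flank_bp
    return {gid for gid, chrom, start, end in genes
            if chrom == locus_chrom and start <= hi and end >= lo}

def candidates(window_genes, expressed_a, expressed_b):
    """Keep window genes supported by both expression datasets (set intersection)."""
    return window_genes & expressed_a & expressed_b

# Hypothetical gene records; only the locus position is taken from Table 3 (LDB_16_37952328, D03).
genes = [
    ("GH_D03G0001", "D03", 37_500_000, 37_503_000),
    ("GH_D03G0002", "D03", 39_200_000, 39_204_000),  # outside the window
    ("GH_D03G0003", "D03", 38_900_000, 38_902_500),
    ("GH_A03G0004", "A03", 37_600_000, 37_601_000),  # different chromosome
]
window = genes_near("D03", 37_952_328, genes)
print(sorted(window))
print(sorted(candidates(window, {"GH_D03G0001", "GH_D03G0003"}, {"GH_D03G0003"})))
```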

Table 4

Candidate genes related to BOR in upland cotton

Candidate gene  Gene name   Functional annotation
Group a
GH_D03G1078     SWEET10     Bidirectional sugar transporter SWEET10
GH_D03G1058     slc25a24    Calcium-binding mitochondrial carrier protein SCaMC-1
GH_D03G1629     IRL5        Plant intracellular Ras-group-related LRR protein 5
GH_D03G1059     NA          NA
GH_D03G1067     Bicc1       Protein bicaudal C homolog 1
GH_D03G1594     RABD2C      Ras-related protein RABD2c
GH_D03G1618     NA          Putative invertase inhibitor
GH_A04G1255     RRP6L3      Protein RRP6-like 3
GH_D03G1643     ABCB2       ABC transporter B family member 2
Group b
GH_D03G1586     HOS1        E3 ubiquitin-protein ligase HOS1
GH_D03G1616     BRG3        Probable BOI-related E3 ubiquitin-protein ligase 3
GH_A04G1248     P3H1        Prolyl 3-hydroxylase 1
GH_D03G1644     HSF8        Heat shock factor protein HSF8
Group c
GH_D03G1617     UPF1        Regulator of nonsense transcripts 1 homolog
GH_D03G1083     NA          NA
GH_D03G1610     caskin2     Caskin-2
GH_A04G1266     SPL6        Squamosa promoter-binding-like protein 6
GH_D03G1087     PAP27       Probable inactive purple acid phosphatase 27
GH_D03G1608     At2g01680   Ankyrin repeat-containing protein At2g01680
GH_D03G1063     GLO1        Peroxisomal (S)-2-hydroxy-acid oxidase GLO1
GH_D03G1085     IRX15       Protein IRREGULAR XYLEM 15
GH_A05G3660     At3g52300   ATP synthase subunit d, mitochondrial
GH_A05G3680     NA          NA
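The gene IDs in Table 4 follow the pattern GH_<subgenome><chromosome>G<number>. This means each candidate can be linked back to the numeric chromosome used in the SNPLDB names of Table 3; for example, D03 corresponds to chromosome 16. The sketch below parses the 23 IDs listed above and tallies them per chromosome. The parsing rule is inferred from these IDs rather than taken from a genome annotation specification, and `gene_chrom` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern inferred from the IDs in Table 4, e.g. GH_D03G1078 = subgenome D, chromosome 03.
GENE_RE = re.compile(r"^GH_([AD])(\d{2})G\d+$")

def gene_chrom(gene_id):
    """Return (chromosome label, numeric chromosome) for an ID such as 'GH_D03G1078'.

    A01-A13 map to 1-13 and D01-D13 to 14-26, the numbering used in the SNPLDB names.
    """
    m = GENE_RE.match(gene_id)
    if m is None:
        raise ValueError(f"unexpected gene ID: {gene_id}")
    subgenome, num = m.group(1), int(m.group(2))
    numeric = num if subgenome == "A" else num + 13
    return f"{subgenome}{num:02d}", numeric

# The 23 candidate gene IDs from Table 4.
TABLE4_IDS = """
GH_D03G1078 GH_D03G1058 GH_D03G1629 GH_D03G1059 GH_D03G1067 GH_D03G1594
GH_D03G1618 GH_A04G1255 GH_D03G1643 GH_D03G1586 GH_D03G1616 GH_A04G1248
GH_D03G1644 GH_D03G1617 GH_D03G1083 GH_D03G1610 GH_A04G1266 GH_D03G1087
GH_D03G1608 GH_D03G1063 GH_D03G1085 GH_A05G3660 GH_A05G3680
""".split()

per_chrom = Counter(gene_chrom(g) for g in TABLE4_IDS)
print(per_chrom)
```

All 23 candidates fall on D03 (chromosome 16), A04 (4) or A05 (5). These are the chromosomes carrying the four stable loci LDB_16_37952328, LDB_16_49503485, LDB_4_81118668 and LDB_5_96395565.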
