Scientia Agricultura Sinica, 2022, Vol. 55, Issue (12): 2265-2277. doi: 10.3864/j.issn.0578-1752.2022.12.001


Genome-Wide Association Study of Yield Component Traits in Upland Cotton (Gossypium hirsutum L.)

WANG Juan1, MA XiaoMei1, ZHOU XiaoFeng1, WANG Xin1, TIAN Qin1, LI ChengQi2, DONG ChengGuang1

  1. Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science/Northwest Inland Region Key Laboratory of Cotton Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, Shihezi 832000, Xinjiang
  2. Life Science College, Yuncheng University, Yuncheng 044000, Shanxi
  • Received: 2022-01-17 Accepted: 2022-03-21 Online: 2022-06-16 Published: 2022-06-23
  • Contact: ChengQi LI, ChengGuang DONG


【Objective】This study used a genome-wide association study (GWAS) to identify loci, elite alleles and candidate genes associated with four yield component traits in upland cotton (boll weight, lint percentage, number of bolls per plant and seed index), in order to provide a theoretical reference for molecular breeding for cotton yield.【Method】A GWAS based on a mixed linear model was performed for the four yield component traits on 408 upland cotton accessions grown in six environments, using the Cotton SNP 80K chip, and significant SNP loci (SNPs) and elite alleles were detected. Candidate genes related to the target traits were then mined within a 1 Mb genomic range flanking the significant SNPs, on the basis of transcriptome gene expression levels.【Result】The four yield component traits showed wide phenotypic variation across environments; the number of bolls per plant had the largest coefficients of variation (16.67%-22.66%). The heritability of the traits ranged from 48.4% to 92.2%. The correlations among traits were significant or highly significant, except for that between boll weight and lint percentage. Based on the best linear unbiased prediction (BLUP) values, a total of 23 significant SNPs distributed in seven genomic regions were associated with the four traits across the 408 accessions. The numbers of loci associated with boll weight, lint percentage, number of bolls per plant and seed index were 5, 1, 9 and 8, respectively, and three loci (TM21094, TM21102 and TM57382) were each associated with more than one target trait. Seven elite alleles, TM21099 (TT), TM57382 (GG), TM78920 (CC), TM53448 (TT), TM59015 (AA), TM43412 (GG) and TM69770 (AA), were identified. A total of 158 candidate genes potentially related to yield formation were selected by analyzing gene expression patterns in RNA-Seq data. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses indicated that most of these genes had diverse functions and were involved in diverse metabolic pathways.【Conclusion】In this study, 23 significant SNPs associated with four yield component traits were identified across 408 upland cotton accessions, and 158 candidate genes were predicted using RNA-Seq data.

Key words: Upland cotton, yield components, genome-wide association analysis, candidate genes

Table 1

Statistical analysis for the yield component traits

Trait  Environment  Min    Max    Mean   SD    CV (%)  G, G×E  H² (%)
BW     SHZ13        3.10   7.60   5.57   0.56  10.05   **      78.0
BW     KRL13        3.90   7.90   5.54   0.56  10.11
BW     SHZ14        3.80   7.70   5.69   0.59  10.37
BW     KRL14        4.00   7.50   5.70   0.58  10.18
BW     SHZ15        3.90   7.10   5.43   0.53  9.76
BW     KRL15        4.30   8.60   6.19   0.67  10.82
BW     BLUP         4.22   7.16   5.64   0.41  7.27
LP     SHZ13        31.90  56.10  43.86  3.26  7.43    ** **   92.2
LP     KRL13        31.50  50.30  41.73  3.05  7.31
LP     SHZ14        28.70  48.80  39.99  3.31  8.28
LP     KRL14        27.50  48.40  39.12  3.46  8.84
LP     SHZ15        28.10  47.20  39.48  3.43  8.69
LP     KRL15        28.00  48.20  39.56  3.68  9.30
LP     BLUP         30.28  48.42  40.84  2.90  7.10
BN     SHZ13        2.30   10.90  4.28   0.94  21.96   ** *    48.8
BN     KRL13        3.10   9.50   5.50   0.96  17.45
BN     SHZ14        3.70   11.10  6.16   1.14  18.51
BN     KRL14        5.10   16.30  7.92   1.56  19.70
BN     SHZ15        3.00   10.80  5.87   1.33  22.66
BN     KRL15        4.00   11.30  6.00   1.00  16.67
BN     BLUP         4.82   8.04   5.95   0.50  8.40
SI     SHZ13        7.50   16.20  10.13  1.16  11.45   ** **   85.4
SI     KRL13        8.00   16.70  10.29  1.24  12.05
SI     SHZ14        8.30   17.80  11.15  1.27  11.39
SI     KRL14        8.20   18.80  11.16  1.39  12.46
SI     SHZ15        6.50   16.40  10.53  1.20  11.40
SI     KRL15        8.10   18.50  11.49  1.36  11.84
SI     BLUP         8.75   16.17  10.71  1.01  9.43
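
The CV (%) values in Table 1 appear to be the standard deviation divided by the mean, times 100. The short Python sketch below checks that reading against the boll weight rows. It assumes that the third and fourth numeric columns are the mean and standard deviation, which the table as extracted does not label; the function name is illustrative only.

```python
# (mean, SD, reported CV %) for the first trait block of Table 1.
# Assumption: the unlabeled columns are mean and SD, in that order.
rows = {
    "SHZ13": (5.57, 0.56, 10.05),
    "KRL13": (5.54, 0.56, 10.11),
    "SHZ14": (5.69, 0.59, 10.37),
    "KRL14": (5.70, 0.58, 10.18),
    "SHZ15": (5.43, 0.53, 9.76),
    "KRL15": (6.19, 0.67, 10.82),
    "BLUP": (5.64, 0.41, 7.27),
}


def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage."""
    return 100.0 * sd / mean


# Largest absolute gap between the recomputed and the reported CV.
max_diff = max(
    abs(coefficient_of_variation(mean, sd) - reported)
    for mean, sd, reported in rows.values()
)

for env, (mean, sd, reported) in rows.items():
    cv = coefficient_of_variation(mean, sd)
    print(f"{env}: computed {cv:.2f}%, reported {reported:.2f}%")
```

Because the tabulated means and standard deviations are already rounded, small differences in the last digit would not contradict this interpretation.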

Fig. 1

Correlation coefficients (r) between the yield component traits. BW: Boll weight; LP: Lint percentage; BN: Boll number; SI: Seed index. * and ** indicate significance at the P=0.05 and P=0.01 levels, respectively. The same applies below

Table 2

Association analysis of the yield component traits

Trait  SNP locus  Chromosome  Position (bp)  Allele  -log10(P)
BW     TM21099    A07         70392221       T/C     6.07
BW     TM21097    A07         70365245       T/C     5.92
BW     TM21094    A07         70345913       T/C     5.60
BW     TM58956    D06         1287377        A/G     5.11
BW     TM21102    A07         70411236       A/G     4.90
LP     TM57382    D05         18043944       A/G     4.74
BN     TM78920    D12         42319440       A/C     6.80
BN     TM78922    D12         42325933       T/C     6.49
BN     TM78921    D12         42322642       T/A     6.09
BN     TM53448    D03         1908727        T/C     5.59
BN     TM53452    D03         1940517        T/C     5.33
BN     TM53460    D03         1989801        T/C     5.24
BN     TM78919    D12         42306456       T/C     5.18
BN     TM59015    D06         1782860        A/G     5.17
BN     TM53454    D03         1950689        T/C     5.11
SI     TM43412    A13         5005690        A/G     6.75
SI     TM21094    A07         70345913       T/C     5.15
SI     TM69770    D08         62547519       A/T     5.11
SI     TM43413    A13         5012761        A/G     4.98
SI     TM21098    A07         70381299       T/C     4.94
SI     TM21102    A07         70411236       A/G     4.90
SI     TM57382    D05         18043944       T/C     4.84
SI     TM21111    A07         70492663       A/G     4.75
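
The counts quoted in the abstract (23 associations, split 5, 1, 9 and 8 by trait, in seven genomic regions, with three loci shared between traits) can be recomputed from Table 2. The Python sketch below does so under two assumptions that are not stated in the source: rows without a trait label belong to the traits in the order BW, LP, BN, SI, and SNPs on the same chromosome less than 1 Mb apart (a hypothetical `MAX_GAP`) belong to one region.

```python
from collections import Counter, defaultdict

# (trait, SNP, chromosome, position in bp) copied from Table 2.
# Assumption: unlabeled rows follow the trait order BW, LP, BN, SI.
associations = [
    ("BW", "TM21099", "A07", 70392221), ("BW", "TM21097", "A07", 70365245),
    ("BW", "TM21094", "A07", 70345913), ("BW", "TM58956", "D06", 1287377),
    ("BW", "TM21102", "A07", 70411236), ("LP", "TM57382", "D05", 18043944),
    ("BN", "TM78920", "D12", 42319440), ("BN", "TM78922", "D12", 42325933),
    ("BN", "TM78921", "D12", 42322642), ("BN", "TM53448", "D03", 1908727),
    ("BN", "TM53452", "D03", 1940517), ("BN", "TM53460", "D03", 1989801),
    ("BN", "TM78919", "D12", 42306456), ("BN", "TM59015", "D06", 1782860),
    ("BN", "TM53454", "D03", 1950689), ("SI", "TM43412", "A13", 5005690),
    ("SI", "TM21094", "A07", 70345913), ("SI", "TM69770", "D08", 62547519),
    ("SI", "TM43413", "A13", 5012761), ("SI", "TM21098", "A07", 70381299),
    ("SI", "TM21102", "A07", 70411236), ("SI", "TM57382", "D05", 18043944),
    ("SI", "TM21111", "A07", 70492663),
]

MAX_GAP = 1_000_000  # hypothetical merging distance in bp, not from the source


def count_regions(assocs, max_gap=MAX_GAP):
    """Count clusters of SNP positions per chromosome, merging close neighbors."""
    by_chrom = defaultdict(set)
    for _, _, chrom, pos in assocs:
        by_chrom[chrom].add(pos)
    regions = 0
    for positions in by_chrom.values():
        ordered = sorted(positions)
        regions += 1  # first cluster on this chromosome
        regions += sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > max_gap)
    return regions


# Number of associations per trait.
per_trait = Counter(trait for trait, *_ in associations)

# SNPs that appear under more than one trait.
traits_by_snp = defaultdict(set)
for trait, snp, _, _ in associations:
    traits_by_snp[snp].add(trait)
multi_trait = sorted(snp for snp, traits in traits_by_snp.items() if len(traits) > 1)

print(len(associations), dict(per_trait), count_regions(associations), multi_trait)
```

With the rows above, this gives 23 associations, seven regions, and TM21094, TM21102 and TM57382 as the loci shared between traits, in line with the abstract.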

Fig. 2

Manhattan plots of GWAS for yield component traits

Table 3

Summary of elite alleles and phenotypic effects

Trait  SNP locus  -log10(P)  Allele  Elite allele  Phenotypic effect (ai)
BW     TM21099    6.07       T/C     TT            0.42
LP     TM57382    4.74       A/G     GG            1.73
BN     TM78920    6.80       A/C     CC            0.48
BN     TM53448    5.59       T/C     TT            0.31
BN     TM59015    5.17       A/G     AA            0.31
SI     TM43412    6.75       A/G     GG            2.14
SI     TM69770    5.11       A/T     AA            1.74

Fig. 3

Expression analysis of the 367 genes in different upland cotton tissues. A: Genes within the target regions of SNPs associated with BW; B: Genes within the target regions of SNPs associated with LP; C: Genes within the target regions of SNPs associated with BN; D: Genes within the target regions of SNPs associated with SI. The values on the scale in the middle represent the range of normalized expression levels; red indicates high expression and blue indicates low expression

Table 4

Comparison of the associated loci in this study with previous studies

Trait  SNP locus  Chromosome  SNP position (bp)  Target region (bp)  Previous studies
BW     TM21099    A07         70392221           69892221-70892221   BW (qBW)[6], LP (qLP-1)[6], SI (qSD)[6], LP (qLP-c7-1)[12], SI (qSI-7-1)[7], SI (qSIA7-2)[14], SI (A07: 70700387)[27], SI (TM21134)[29]
LP     TM57382    D05         18043944           17543944-18543944   BN (qNB-D5-1)[24]
BN     TM78920    D12         42319440           41819440-42819440   LY (qLY-C26-1)[10], SCY (qYLD-C26-1)[10], SCY (qSCY-06A-c26-1)[12]
BN     TM53448    D03         1908727            1408727-2408727     LP (D03: 1424880)[27]
BN     TM59015    D06         1782860            1282860-2282860
SI     TM43412    A13         5005690            4505690-5505690     SI (A13: 4741980)[27]
SI     TM69770    D08         62547519           62047519-63047519
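
Each target region in Table 4 spans 500 kb on either side of the SNP position, matching the 1 Mb range mentioned in the abstract. The Python sketch below reproduces these windows and checks which of the previously reported positions cited from [27] fall inside them. The flank size is inferred from the table rather than stated in the text, and the helper names are illustrative only.

```python
FLANK = 500_000  # bp on each side of the SNP; inferred from Table 4

# Lead SNPs and their positions, copied from Table 4.
lead_snps = {
    "TM21099": ("A07", 70392221),
    "TM57382": ("D05", 18043944),
    "TM78920": ("D12", 42319440),
    "TM53448": ("D03", 1908727),
    "TM59015": ("D06", 1782860),
    "TM43412": ("A13", 5005690),
    "TM69770": ("D08", 62547519),
}


def target_region(position, flank=FLANK):
    """Return the (start, end) window around a SNP, clipped at 0."""
    return max(position - flank, 0), position + flank


def overlapping_snps(chrom, position):
    """List lead SNPs whose target region contains the given position."""
    hits = []
    for snp, (snp_chrom, snp_pos) in lead_snps.items():
        start, end = target_region(snp_pos)
        if snp_chrom == chrom and start <= position <= end:
            hits.append(snp)
    return hits


# Positions listed under "Previous studies" with reference [27].
previous = [("A07", 70700387), ("D03", 1424880), ("A13", 4741980)]

for chrom, pos in previous:
    print(chrom, pos, overlapping_snps(chrom, pos))
```

A fixed window is the simplest way to define a candidate region; a window based on linkage disequilibrium decay would be an alternative, but the source does not indicate that one was used.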
