Scientia Agricultura Sinica, 2017, Vol. 50, Issue 17: 3375-3385. doi: 10.3864/j.issn.0578-1752.2017.17.012

• Horticulture •

Genome-Wide Association Studies for Flowering Time in Brassica rapa

GAO BaoZhen1, LIU Bo1, LI ShiKai2, LIANG JianLi1, CHENG Feng1, WANG XiaoWu1, WU Jian1

  1 Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081; 2 Institute of Horticultural Crops, Yunnan Academy of Agricultural Sciences, Kunming 650205
  • Received: 2017-01-20 Online: 2017-09-01 Published: 2017-09-01
  • Corresponding author: WU Jian, Tel: 010-82105971; E-mail: wujian@caas.cn
  • About the author: GAO BaoZhen, E-mail: 18853812686@163.com
  • Funding:
    National Natural Science Foundation of China (31272179)

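Summary: 【Objective】To dissect the loci that regulate flowering time in Brassica rapa crops and to map candidate genes associated with flowering time, providing a basis for the genetic improvement of bolting and flowering time in B. rapa. 【Method】A natural population of 116 B. rapa accessions was grown in two independent environments, a greenhouse and an open field, and flowering time was recorded. DNA from the accessions was resequenced at a depth of 1.2x; the reads were filtered and aligned to the reference genome by the Pooled Mapping method to obtain a genome-wide, high-density SNP set. After further filtering, the high-quality SNP set was used for bioinformatic analyses, including population structure analysis and genome-wide linkage disequilibrium (LD) analysis. From the high-quality SNP set, 2,000 variant sites were randomly selected and used to build a maximum-likelihood phylogenetic tree of the 116 accessions with PhyML. Genome-wide LD was analyzed with Haploview using all high-quality SNPs. Finally, the high-quality SNP set was combined with the flowering time data in a genome-wide association study (GWAS) using the TASSEL and GAPIT packages and the R language. Candidate loci for flowering time were located from the positions of the strongly associated peak signals and their LD intervals, and candidate genes were then predicted from the gene collinearity between B. rapa and the related species Arabidopsis thaliana together with gene function annotation. 【Result】Flowering time varied widely among planting conditions and among types of B. rapa. The flowering peak in the open field was clearly earlier than in the greenhouse. Flowering time in the open field showed an overall skewed normal distribution, whereas in the greenhouse it was spread fairly evenly across stages. Flowering times in the greenhouse and the open field were significantly positively correlated. The bioinformatic analysis yielded 1.03 million high-quality SNPs. Population structure analysis showed that accessions within each subgroup clustered together on the phylogenetic tree, and that the separation between subgroups was closely related to the geographic origin of the accessions. The genome-wide average LD decay distance was 2.3 kb, indicating relatively frequent recombination and mutation in the population of 116 accessions. GWAS of flowering time under the two conditions detected 54 strongly associated peak signals (-log10(P) > 4) with the mixed linear model and 87 (-log10(P) > 5) with the general linear model. Further analysis of the LD blocks around these signals gave 33 peak signals in strong linkage (r2 > 0.33): 27 in the greenhouse and 19 in the open field, of which 13 were co-located in both environments. From these 33 candidate loci, gene collinearity between B. rapa and A. thaliana and gene function annotation were used to select 14 candidate genes for flowering time, of which 3 (FUL, PHYB and FPF1) were co-located in the greenhouse and the open field. The key flowering gene FT1 was located under open-field conditions. 【Conclusion】The correlation of flowering time across conditions indicates that genetic effects play a decisive role in early or late flowering. GWAS identified 33 signals significantly associated with flowering time. LD analysis, gene collinearity between B. rapa and A. thaliana, and gene function annotation together gave a preliminary set of 14 candidate genes for flowering time in B. rapa.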
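The r2 > 0.33 criterion used to group signals into LD blocks refers to the squared correlation between alleles at two loci. As a rough illustration only (the study used Haploview, and this is not its implementation), r2 between two biallelic SNPs can be approximated by the squared Pearson correlation of genotype dosages coded 0/1/2. The function name `genotype_r2` and the toy genotype vectors are hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("genotype vectors must be non-empty and equally long")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated here as no LD
    return cov * cov / (var_a * var_b)


# Toy genotypes for eight accessions (illustrative values, not study data).
peak = [0, 0, 1, 1, 2, 2, 0, 2]      # SNP at an association peak
nearby = [0, 0, 1, 1, 2, 2, 0, 1]    # neighbouring SNP, mostly co-inherited
unlinked = [2, 0, 1, 0, 1, 2, 1, 0]  # SNP with no relationship to the peak

in_same_block = genotype_r2(peak, nearby) > 0.33      # passes the r2 cutoff
in_other_block = genotype_r2(peak, unlinked) > 0.33   # fails the r2 cutoff
```

Under this sketch, a neighbouring SNP would be assigned to the same candidate interval as the peak only when its r2 with the peak exceeds the cutoff.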

Keywords: Brassica rapa crops, flowering time, linkage disequilibrium, genome-wide association study, candidate gene

Abstract: 【Objective】To identify the genetic loci and candidate genes that regulate flowering time in Brassica rapa, as a basis for improving resistance to premature bolting in B. rapa. 【Method】A total of 116 B. rapa germplasm accessions were grown in a greenhouse and in an open field to evaluate variation in flowering time. Genomic DNA was extracted and resequenced at a depth of 1.2x. The reads were filtered and mapped to the reference genome by Pooled Mapping to obtain a genome-wide, high-quality SNP set. After further filtering, this SNP set was used to analyze population structure and linkage disequilibrium (LD). A total of 2,000 SNPs were randomly selected from the set to build a phylogenetic tree with PhyML using the maximum likelihood method. All high-quality SNPs were used for genome-wide LD analysis with Haploview. A genome-wide association study (GWAS) of flowering time was then conducted with TASSEL, GAPIT and R. Candidate loci for flowering time were identified from the positions of strong association signals and their LD blocks. Finally, candidate flowering time genes in B. rapa were predicted from gene collinearity between Arabidopsis thaliana and B. rapa and from gene function annotation. 【Result】The 116 B. rapa accessions showed extensive variation in flowering time, and flowering time also differed markedly between the greenhouse and open-field environments. Flowering time in the open field followed a skewed normal distribution, whereas in the greenhouse it was spread fairly evenly. Flowering time was significantly correlated between the two environments, indicating that genetic effects played a crucial role in its regulation. A total of 1.03 million genome-wide SNPs were obtained by bioinformatic analysis. Population structure analysis showed that accessions of each subgroup clustered together in the phylogenetic tree and that the grouping was closely related to geographic origin. The genome-wide LD decay distance was 2.3 kb, indicating frequent recombination and mutation among the 116 B. rapa accessions. Across the two environments, 54 strong signals (-log10(P) > 4) were detected with the mixed linear model and 87 (-log10(P) > 5) with the general linear model. After LD blocks were taken into account (r2 > 0.33), 33 strong signals were retained (27 loci in the greenhouse and 19 in the open field), including 13 SNPs identified in both environments. Based on genome collinearity between A. thaliana and B. rapa and on gene function annotation, 14 candidate genes were predicted. Three candidate genes, FUL, PHYB and FPF1, were identified in both the greenhouse and open-field environments. FT1, a key gene in flowering time regulation, was also identified under open-field conditions. 【Conclusion】Correlation analysis of flowering time across environments indicated that genetic control has a decisive effect on flowering time. A total of 33 SNPs significantly associated with flowering time were identified by GWAS. By combining LD blocks, genome collinearity between A. thaliana and B. rapa, and gene annotation, 14 candidate flowering time genes were predicted.
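The reported counts can be checked with a short sketch. It assumes the significance thresholds are on the -log10(P) scale, since a raw p-value cannot exceed 1, so a signal passes a threshold of 4 when P < 1e-4. Loci found in two environments combine by inclusion-exclusion: 27 + 19 - 13 = 33. The function names `passes_threshold` and `union_count` are hypothetical, and the example p-values are illustrative rather than data from the study.

```python
from math import log10


def passes_threshold(p_value, threshold):
    """Return True when -log10(p_value) exceeds the threshold."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must lie in (0, 1]")
    return -log10(p_value) > threshold


def union_count(n_first, n_second, n_shared):
    """Distinct loci across two sets, by inclusion-exclusion."""
    if n_shared > min(n_first, n_second):
        raise ValueError("shared loci cannot outnumber either set")
    return n_first + n_second - n_shared


# Counts taken from the abstract: greenhouse, open field, shared.
total_loci = union_count(27, 19, 13)

# Illustrative p-values checked against the mixed-model threshold of 4.
strong_signal = passes_threshold(1e-5, 4)
weak_signal = passes_threshold(1e-3, 4)
```

The same arithmetic explains why the 27 greenhouse loci and 19 open-field loci add up to 33 rather than 46 distinct signals.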

Key words: Brassica rapa, flowering time, linkage disequilibrium, genome-wide association study, candidate gene