Journal of Integrative Agriculture  2024, Vol. 23 Issue (5): 1634-1643    DOI: 10.1016/j.jia.2023.11.048
Animal Science · Veterinary Medicine

Prescreening of large-effect markers with multiple strategies improves the accuracy of genomic prediction

Keanning Li1*, Bingxing An1*, Mang Liang1, Tianpeng Chang1, Tianyu Deng1,2, Lili Du1, Sheng Cao1,3, Yueying Du1,4, Hongyan Li5, Lingyang Xu1, Lupei Zhang1, Xue Gao1, Junya Li1, Huijiang Gao1#

1 Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China

2 Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China

3 College of Computer and Information Engineering, Tianjin Agricultural University, Tianjin 300384, China

4 College of Animal Science and Technology, Qingdao Agricultural University, Qingdao 266109, China

5 Tongliao Animal Agriculture Development Service Center, Tongliao 028000, China

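Abstract (translated from the Chinese)

Genomic selection uses phenotypes and dense genome-wide single nucleotide polymorphisms (SNPs) to estimate the genomic estimated breeding value (GEBV) of each individual. Genomic best linear unbiased prediction (GBLUP) is currently the most widely used tool for predicting complex traits, but as technology advances and application scenarios change, researchers are demanding higher prediction accuracy. Studies that extend the model itself repartition the random-effect part of the conventional model into several weighted subsets to improve accuracy, whereas studies that draw on multi-omics data capture variation beyond the genome sequence level to the same end. In this study, to improve the accuracy of genomic selection in cattle, a large animal, we used the animals with both genomic and transcriptomic data as the training population. The phenotypes were the growth trait longissimus dorsi muscle (LDM) weight and the meat quality traits water holding capacity (WHC), shear force (SF), and pH in Huaxi cattle. Using the Bayesian sparse linear mixed model (BSLMM), transcriptome-wide association study (TWAS), and expression quantitative trait locus (eQTL) mapping, we prescreened genomic features according to |β̂_b| > 0, the top 1% of phenotypic variation explained (PVE), and expression-associated SNPs (eSNPs) and eGenes with a false discovery rate (FDR) < 0.01. The significant prescreened loci were then fitted as extra fixed effects (GBLUP-Fix) or as extra random effects (GFBLUP), and the resulting models were evaluated in the validation population. They were also compared with conventional GBLUP and with GFBLUP and GBLUP-Fix models built on randomly selected loci. Under GFBLUP and GBLUP-Fix, adding loci prescreened by the different strategies improved prediction accuracy for LDM, WHC, SF, and pH by 2.14 to 8.69% on average. In particular, GFBLUP-TWAS improved prediction accuracy for SF by 13.66% relative to GBLUP. The results also indicated that these methods capture more genetic variance than GBLUP. Our study validates and highlights the feasibility of multi-omics-assisted prescreening of large-effect loci for improving genomic prediction accuracy in large animals, and lays a foundation for selecting loci for the design of a low-density SNP chip for Huaxi cattle.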
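For orientation, the three model families named in the abstract (GBLUP, GBLUP-Fix, and GFBLUP) are commonly written in the form below. This is a sketch in standard mixed-model notation, not a formulation taken from the abstract: the symbols X, b, g, f, r, Z_s, a, G, G_f, and G_r are assumptions introduced here, with G_f built from the prescreened loci and G_r from the remaining loci.

```latex
\begin{align}
\text{GBLUP:}\quad
  & \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{g} + \mathbf{e},
  & \mathbf{g} &\sim N(\mathbf{0}, \mathbf{G}\sigma_g^2) \\
\text{GBLUP-Fix:}\quad
  & \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_s\mathbf{a} + \mathbf{g} + \mathbf{e},
  & \mathbf{g} &\sim N(\mathbf{0}, \mathbf{G}\sigma_g^2) \\
\text{GFBLUP:}\quad
  & \mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f} + \mathbf{r} + \mathbf{e},
  & \mathbf{f} &\sim N(\mathbf{0}, \mathbf{G}_f\sigma_f^2),\;
    \mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\sigma_r^2)
\end{align}
```

In each case the residual is assumed to follow e ~ N(0, Iσ_e²). In GBLUP-Fix, Z_s holds the genotypes of the prescreened loci and a their fixed effects; in GFBLUP, the prescreened loci instead receive their own variance component σ_f².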



Abstract  

Integrating multi-omics information into prediction models has become an effective strategy for improving the accuracy of genomic prediction in genomic selection. Here, animals with both genomic and transcriptomic data formed the training population, and the Bayesian sparse linear mixed model (BSLMM), transcriptome-wide association study (TWAS), and expression quantitative trait locus (eQTL) mapping were used to prescreen features according to |β̂_b| > 0, the top 1% of phenotypic variation explained (PVE), and expression-associated single nucleotide polymorphisms (eSNPs) and eGenes with a false discovery rate (FDR) < 0.01. These loci were fitted as extra fixed effects (GBLUP-Fix) or as extra random effects (GFBLUP) to improve prediction accuracy in the validation population. Both GBLUP-Fix and GFBLUP improved prediction accuracy for longissimus dorsi muscle (LDM) weight, water holding capacity (WHC), shear force (SF), and pH in Huaxi cattle, by 2.14 to 8.69% on average; the largest gain was for SF, where GFBLUP-TWAS improved on GBLUP by 13.66%. These methods also captured more genetic variance than GBLUP. Our study confirms that multi-omics-assisted prescreening of large-effect loci can improve the accuracy of genomic prediction.
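The prescreen-then-partition idea behind GFBLUP can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names (`vanraden_grm`, `top_fraction`, `gfblup_predict`) are hypothetical, the genotypes and per-SNP scores are simulated stand-ins for real data and BSLMM PVE values, and the variance components are passed in as if known, whereas in practice they would be estimated (for example by REML).

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix (VanRaden 2008, method 1) from 0/1/2 genotypes."""
    p = genotypes.mean(axis=0) / 2.0           # allele frequency of each SNP
    z = genotypes - 2.0 * p                    # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def top_fraction(scores, fraction=0.01):
    """Sorted indices of the SNPs with the largest scores (e.g. top 1% by PVE)."""
    k = max(1, int(np.ceil(fraction * scores.size)))
    return np.sort(np.argsort(scores)[-k:])

def gfblup_predict(genotypes, y_train, train, valid, feature_idx,
                   var_f, var_r, var_e):
    """Predict validation animals using separate GRMs for prescreened and remaining SNPs.

    Variance components (var_f, var_r, var_e) are assumed known in this sketch.
    """
    rest_idx = np.setdiff1d(np.arange(genotypes.shape[1]), feature_idx)
    # Genetic covariance: weighted sum of the feature GRM and the remainder GRM.
    k = (var_f * vanraden_grm(genotypes[:, feature_idx])
         + var_r * vanraden_grm(genotypes[:, rest_idx]))
    v = k[np.ix_(train, train)] + var_e * np.eye(len(train))
    mu = y_train.mean()                        # intercept as the only fixed effect
    alpha = np.linalg.solve(v, y_train - mu)
    return mu + k[np.ix_(valid, train)] @ alpha

# Simulated example: 60 animals, 500 SNPs, random stand-in scores.
rng = np.random.default_rng(0)
geno = rng.binomial(2, 0.3, size=(60, 500)).astype(float)
pve = rng.random(500)
features = top_fraction(pve, 0.01)
train, valid = np.arange(45), np.arange(45, 60)
y = rng.normal(size=60)
pred = gfblup_predict(geno, y[train], train, valid, features, 0.3, 0.4, 0.3)
```

Setting `var_f` to zero and using all SNPs in a single matrix reduces this to ordinary GBLUP, which is the baseline the abstract compares against.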

Keywords: multi-omics data; features prescreening; eQTL mapping; Huaxi cattle; genomic selection
Received: 23 November 2022   Accepted: 03 November 2023
Fund: This research was supported by the National Natural Science Foundation of China (31872975), the Science and Technology Project of Inner Mongolia Autonomous Region, China (2020GG0210), and the Program of National Beef Cattle and Yak Industrial Technology System, China (CARS-37).
About author: Keanning Li, E-mail: likeanning@163.com; Bingxing An, E-mail: anbingxing@caas.cn; #Correspondence: Huijiang Gao, Tel: +86-10-62818176, Fax: +86-10-62817806, E-mail: gaohuijiang@caas.com. *These authors contributed equally to this study.

Cite this article:

Keanning Li, Bingxing An, Mang Liang, Tianpeng Chang, Tianyu Deng, Lili Du, Sheng Cao, Yueying Du, Hongyan Li, Lingyang Xu, Lupei Zhang, Xue Gao, Junya Li, Huijiang Gao. 2024. Prescreening of large-effect markers with multiple strategies improves the accuracy of genomic prediction. Journal of Integrative Agriculture, 23(5): 1634-1643.

