Scientia Agricultura Sinica, 2020, Vol. 53, Issue (1): 191-200. doi: 10.3864/j.issn.0578-1752.2020.01.018

• ANIMAL SCIENCE·VETERINARY SCIENCE·RESOURCE INSECT •

Using Restricted Standardized Linear Regression Model to Estimate Genomic Breed Composition in Composite Breed Animals

Jun HE¹, Zhi LI¹˒², XiaoLin WU¹˒²

  ¹ College of Animal Science and Technology, Hunan Agricultural University, Changsha 410128, China
  ² Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE 68504, USA
  • Received: 2019-03-01 Accepted: 2019-05-30 Online: 2020-01-01 Published: 2020-01-19

Abstract:

【Background】A composite breed is derived from two or more pure breeds (ancestral breeds) and is designed to combine advantageous genetic characteristics of those breeds and to retain heterosis in future generations without further crossbreeding. Unlike a crossbred population, a composite breed can be maintained as a purebred. In practice, the proportion of an individual composite animal's genome contributed by each ancestral breed, referred to as the genomic breed composition (GBC), is important for breed registration, tracing breeding history and population structure, breed conservation, and the prediction of heterosis. The GBC of a purebred or crossbred animal can be estimated from a set of genomic SNP genotypes with an appropriate statistical model. So far, few studies have addressed statistical methods for estimating GBC in composite breeds. Linear regression (LR) is commonly used to estimate the GBC of individual animals, but it has limitations; for example, the estimated coefficients for the ancestral breeds do not necessarily sum to 1.【Objective】The purpose of the present study was to propose and evaluate restricted standardized linear regression (RSLR) as an improvement on LR for estimating GBC in composite animals.【Method】The dataset consisted of 4 323 Beefmaster cattle and purebred animals of their ancestral breeds, namely Brahman, Hereford and Shorthorn. All animals were genotyped with the GeneSeek Genomic Profiler (GGP) bovine 50K SNP chip. Allele frequencies of each SNP and the Euclidean distances between breeds were computed for the four populations, and their genetic relationships were examined by hierarchical clustering based on the Euclidean distances of SNP allele frequencies. The GBC of the 4 323 Beefmaster cattle was estimated with RSLR and LR, respectively, using 7 SNP panels (1K, 5K, 10K, 20K, 30K, 40K, and all 47 900 common SNPs).
【Result】The clustering results agreed well with the known genetic relationships between Beefmaster and its three ancestral breeds, showing that Beefmaster was more closely related to Brahman than to Hereford and Shorthorn. LR underestimated the genomic contributions of Brahman (0.459-0.462) and Shorthorn (0.208-0.212) to Beefmaster and overestimated that of Hereford (0.326-0.333). In contrast, the GBC estimates obtained with RSLR agreed well with the expected genomic contributions of the three ancestral breeds: 0.497-0.503 for Brahman, 0.262-0.274 for Hereford, and 0.229-0.231 for Shorthorn. Furthermore, the standard deviations (SD) and coefficients of variation (CV) of the GBC estimates were larger for LR than for RSLR. With reference panels of 20K or more SNPs, the SD of the LR estimates were 0.048 (Brahman), 0.032 (Hereford) and 0.051-0.052 (Shorthorn), and the corresponding CV were 10.46%-10.50% (Brahman), 9.61%-9.76% (Hereford) and 23.94%-25.00% (Shorthorn). With RSLR, the SD were 0.021 (Brahman), 0.021-0.022 (Hereford) and 0.024-0.025 (Shorthorn), and the corresponding CV were 4.18%-4.20% (Brahman), 7.89%-8.33% (Hereford) and 10.26%-10.68% (Shorthorn). 【Conclusion】RSLR provided more accurate and more consistent estimates of GBC in the 4 323 Beefmaster cattle than LR, and thus offers a new statistical method for estimating GBC in composite animals.
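The abstract does not give the RSLR equations, so the following is only a minimal sketch of the one restriction it does mention, namely that the breed coefficients should add to 1. It regresses a toy response on breed allele frequencies with and without that restriction. The function names, the toy frequencies and the toy genotype are hypothetical, and the sketch omits the standardization step and any non-negativity constraint the actual method may use.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(columns, y):
    """Unrestricted least squares without intercept (normal equations)."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve_linear(xtx, xty)


def sum_to_one_least_squares(columns, y):
    """Least squares with coefficients forced to sum to 1.

    The last coefficient is written as 1 minus the others, which turns the
    restricted problem into an unrestricted one with one fewer predictor.
    """
    last = columns[-1]
    reduced = [[u - v for u, v in zip(col, last)] for col in columns[:-1]]
    reduced_y = [u - v for u, v in zip(y, last)]
    coefs = least_squares(reduced, reduced_y)
    return coefs + [1.0 - sum(coefs)]


# Hypothetical allele frequencies at six SNPs (not data from the study).
toy_freqs = [
    [0.9, 0.2, 0.7, 0.1, 0.6, 0.3],  # Brahman
    [0.1, 0.8, 0.4, 0.5, 0.2, 0.9],  # Hereford
    [0.3, 0.6, 0.2, 0.9, 0.5, 0.4],  # Shorthorn
]
# Hypothetical genotype of one animal, coded as allele count / 2.
toy_animal = [1.0, 0.5, 0.5, 0.0, 0.5, 0.5]

unrestricted = least_squares(toy_freqs, toy_animal)
restricted = sum_to_one_least_squares(toy_freqs, toy_animal)
print("unrestricted:", [round(c, 3) for c in unrestricted], "sum =", round(sum(unrestricted), 3))
print("restricted:  ", [round(c, 3) for c in restricted], "sum =", round(sum(restricted), 3))
```

Substituting the last coefficient keeps the sketch dependency-free; a constrained solver would be the more usual choice once non-negativity is also required.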

Key words: SNP chip, linear regression, composite breeds, genomic breed composition

Table 1 Number of animals and summary information of GGP 50K bovine SNP genotypes for the four experimental cattle populations

Population  Number of animals  Number of SNPs  Minor allele frequency (MAF)
Beefmaster  4323               49463           0.310±0.141
Brahman     68                 49463           0.248±0.139
Shorthorn   1232               49463           0.282±0.145
Hereford    2423               49463           0.270±0.150
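As a rough illustration of how a MAF summary like the one in Table 1 could be produced, the sketch below computes allele frequencies from genotypes coded as 0, 1 or 2 copies of one allele and then summarizes the minor allele frequency across SNPs. The genotype coding, the toy data and the reading of the "±" value as a standard deviation across SNPs are assumptions; the table itself does not state them.

```python
from statistics import mean, stdev


def allele_frequency(genotypes):
    """Frequency of the counted allele; genotypes are 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def minor_allele_frequency(genotypes):
    """Frequency of whichever allele is the rarer one at this SNP."""
    p = allele_frequency(genotypes)
    return min(p, 1.0 - p)


# Hypothetical genotypes: one list per SNP, one entry per animal.
toy_genotypes = {
    "snp_a": [0, 1, 2, 2],
    "snp_b": [0, 0, 1, 0],
    "snp_c": [1, 1, 2, 1],
}

mafs = [minor_allele_frequency(g) for g in toy_genotypes.values()]
print(f"MAF = {mean(mafs):.3f} ± {stdev(mafs):.3f}")
```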

Fig. 1 Analysis of population structure of the four breeds
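The population structure analysis rests on Euclidean distances between breeds computed from SNP allele frequencies. The sketch below shows that distance step on hypothetical frequencies at six SNPs; the toy Beefmaster vector is constructed as a 0.5/0.25/0.25 mixture of the other three, and the helper names are illustrative. The hierarchical clustering applied to the resulting distances is not reproduced here.

```python
from itertools import combinations
from math import dist

# Hypothetical allele frequencies at six SNPs (not data from the study).
toy_freqs = {
    "Beefmaster": [0.55, 0.45, 0.5, 0.4, 0.475, 0.475],
    "Brahman": [0.9, 0.2, 0.7, 0.1, 0.6, 0.3],
    "Hereford": [0.1, 0.8, 0.4, 0.5, 0.2, 0.9],
    "Shorthorn": [0.3, 0.6, 0.2, 0.9, 0.5, 0.4],
}


def pairwise_distances(freqs):
    """Euclidean distance between every pair of populations."""
    return {(a, b): dist(freqs[a], freqs[b]) for a, b in combinations(freqs, 2)}


def nearest_breed(target, freqs):
    """Population with the smallest Euclidean distance to the target."""
    others = (name for name in freqs if name != target)
    return min(others, key=lambda name: dist(freqs[target], freqs[name]))


distances = pairwise_distances(toy_freqs)
for (a, b), d in distances.items():
    print(f"{a:10s} {b:10s} {d:.3f}")
print("closest to Beefmaster:", nearest_breed("Beefmaster", toy_freqs))
```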

Table 2 Number of SNPs on each chromosome in the 7 selected SNP panels

Columns after the chromosome label give the number of SNPs on that chromosome for each SNP panel size.

Chromosome 500 1000 3000 5000 10000 20000 30000 40000 47900
0 35 69 205 341 682 1364 2045 2727 3265
1 26 53 161 268 536 1072 1608 2144 2567
2 24 47 140 235 469 938 1407 1876 2247
3 22 44 133 221 443 886 1330 1772 2123
4 20 40 119 198 396 791 1186 1582 1894
5 22 45 135 225 450 900 1351 1801 2157
6 22 43 129 215 429 859 1288 1718 2057
7 19 39 116 193 386 772 1157 1543 1848
8 19 37 113 189 379 757 1136 1514 1812
9 19 38 113 187 374 748 1123 1497 1793
10 18 36 109 182 364 728 1091 1455 1742
11 18 37 111 186 372 745 1118 1490 1785
12 15 30 88 146 291 581 871 1162 1391
13 16 31 95 159 318 636 954 1272 1523
14 15 31 92 154 308 616 925 1233 1477
15 16 31 93 154 309 618 926 1235 1478
16 13 27 80 133 266 533 800 1067 1279
17 13 26 78 130 260 520 779 1039 1244
18 13 26 80 133 265 530 796 1061 1271
19 13 25 75 126 252 504 755 1007 1205
20 14 28 84 139 279 557 836 1114 1335
21 12 24 72 121 242 484 727 969 1160
22 11 22 64 106 211 423 633 845 1011
23 10 20 62 104 208 416 625 832 998
24 11 23 67 111 223 446 668 892 1067
25 8 15 45 76 152 303 455 606 726
26 9 18 55 91 182 365 547 730 874
27 7 15 45 76 151 301 453 603 722
28 9 17 50 82 164 329 492 657 787
29 8 17 53 89 179 357 537 716 857
X 23 46 138 230 460 921 1381 1841 2205

Table 3 Estimated GBC of Beefmaster cattle obtained with two linear regression methods (LR and RSLR) and seven SNP panels

                   Brahman                      Hereford                     Shorthorn
Method  SNP panel  Mean   SD     Median  CV(%)  Mean   SD     Median  CV(%)  Mean   SD     Median  CV(%)
LR      1000       0.462  0.060  0.465   12.99  0.326  0.054  0.327   16.56  0.212  0.073  0.206   34.43
LR      5000       0.462  0.050  0.465   10.82  0.322  0.037  0.323   11.49  0.216  0.055  0.210   25.46
LR      10000      0.463  0.049  0.466   10.58  0.322  0.034  0.322   10.56  0.215  0.053  0.206   24.65
LR      20000      0.459  0.048  0.463   10.46  0.328  0.032  0.328   9.76   0.213  0.051  0.206   23.94
LR      30000      0.457  0.048  0.461   10.50  0.330  0.032  0.331   9.70   0.213  0.052  0.204   24.41
LR      40000      0.459  0.048  0.463   10.46  0.333  0.032  0.333   9.61   0.208  0.051  0.200   24.52
LR      47900      0.459  0.048  0.464   10.46  0.333  0.032  0.333   9.61   0.208  0.052  0.199   25.00
RSLR    1000       0.497  0.029  0.498   5.84   0.274  0.036  0.275   13.14  0.229  0.038  0.227   16.59
RSLR    5000       0.501  0.023  0.501   4.59   0.267  0.025  0.267   9.36   0.231  0.027  0.229   11.69
RSLR    10000      0.503  0.022  0.504   4.37   0.262  0.023  0.261   8.78   0.235  0.025  0.233   10.64
RSLR    20000      0.502  0.021  0.502   4.18   0.264  0.022  0.265   8.33   0.234  0.025  0.231   10.68
RSLR    30000      0.500  0.021  0.501   4.20   0.266  0.022  0.267   8.27   0.234  0.024  0.231   10.26
RSLR    40000      0.501  0.021  0.501   4.19   0.266  0.022  0.267   8.27   0.233  0.024  0.230   10.30
RSLR    47900      0.501  0.021  0.501   4.19   0.266  0.021  0.266   7.89   0.234  0.024  0.231   10.26
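The CV columns in Table 3 follow from the tabulated means and SDs as CV = 100 × SD / mean. The short check below reproduces the CVs of the 47 900-SNP rows from their rounded means and SDs; the function and variable names are illustrative, and agreement for other rows is not guaranteed, since published CVs may have been computed from unrounded values.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


# (mean, SD) pairs copied from the 47900-SNP rows of Table 3.
table3_all_snps = {
    ("LR", "Brahman"): (0.459, 0.048),
    ("LR", "Hereford"): (0.333, 0.032),
    ("LR", "Shorthorn"): (0.208, 0.052),
    ("RSLR", "Brahman"): (0.501, 0.021),
    ("RSLR", "Hereford"): (0.266, 0.021),
    ("RSLR", "Shorthorn"): (0.234, 0.024),
}

for (method, breed), (mean, sd) in table3_all_snps.items():
    cv = coefficient_of_variation(mean, sd)
    print(f"{method:4s} {breed:9s} CV = {cv:.2f}%")
```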

Fig. 2 Distribution of the genomic breed composition of the three ancestral breeds estimated by RSLR using 47 900 SNPs
