Journal of Integrative Agriculture
Phenotype-driven machine learning models for predicting average daily gain in Yorkshire pigs with SHAP interpretation

Shan Jiang1, Jiahao Chen1, Yifan Han1, Haoyu Pei1, Jiakai Tang1, Chuxiong Zhang1, Miaomiao Qin1, Fei Cheng1, 2, Lijing Bai3#, Jiangwei Wu1#

1 Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China

2 Huanglong Zhengneng Agriculture and Animal Husbandry Technology Co., Ltd., Shaanxi 715700, China

3 Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China 

 Highlights 

1. A regression-based ML framework accurately predicts average daily gain (ADG) in Yorkshire pigs.

2. CatBoost outperformed 14 models and showed strong external validation.

3. SHAP analysis identified key phenotypic predictors affecting ADG variation.

4. A web-based tool enables real-time and interpretable ADG prediction on farms.

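Abstract (translated from the Chinese)

Average daily gain (ADG) is a key indicator of growth performance and production efficiency in pig production. Although genomic prediction tools such as GBLUP and ssGBLUP are widely used in breeding, their uptake remains limited by genotyping cost and data availability. To provide a practical and cost-effective complement to genomic evaluation, this study built a machine learning-based phenotypic prediction framework that uses early growth and management variables routinely recorded in production to predict and interpret ADG in Yorkshire pigs. Production records from 12,079 Yorkshire pigs raised under standardized conditions between February 2020 and April 2024 were compiled. After quality control, 15 machine learning algorithms were trained and compared, and model performance was evaluated using the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). Model interpretability was examined through feature attribution with SHapley Additive exPlanations (SHAP), and an independent external population was used for validation to test generalization. The results showed that CatBoost achieved good and stable predictive performance in both internal and external validation, indicating good robustness and generalization. SHAP analysis further dissected the contribution of each early-life feature to ADG prediction and identified key predictors that were significantly associated with predicted ADG and biologically meaningful, providing an interpretable basis for optimizing production management and breeding decisions. To promote practical application, a user-friendly web application was developed for real-time ADG prediction and visualization of the explanations. In summary, routinely collected phenotypic and management data can, through machine learning, accurately predict ADG in Yorkshire pigs and, combined with SHAP, provide an interpretable analysis of feature contributions, offering a low-cost, scalable, data-driven tool for precision management and improved production efficiency in pig farming systems. Based on early phenotypic and management data available on the production front line, this study systematically compared multiple machine learning models and, combining independent external validation with SHAP attribution, deployed the best-performing model as a web tool for interpretable and deployable ADG prediction.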



Abstract  

Average daily gain (ADG) is a key indicator of growth performance in swine production. Although genomic prediction tools such as genomic best linear unbiased prediction (GBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP) are widely used in breeding programs, their application may be limited by cost and data availability. To provide a practical and cost-effective complement to genomic evaluation, we developed a machine learning–based phenotypic prediction framework for estimating ADG in Yorkshire pigs using routinely recorded early-life variables. Production records from 12,079 pigs raised under standardized conditions between February 2020 and April 2024 were curated, and after data cleaning, fifteen regression algorithms were trained and evaluated using the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). Model interpretability was assessed using SHapley Additive exPlanations (SHAP), and an independent external cohort was used for validation. Results indicated that CatBoost delivered the highest predictive accuracy and demonstrated strong generalization in both internal and external validations. SHAP analysis identified biologically meaningful early-life predictors contributing to ADG variation. To promote practical adoption, we developed a user-friendly web application that enables real-time prediction and interpretation of ADG outcomes. Overall, this study demonstrates that routinely collected phenotypic and management data can effectively support accurate ADG prediction through machine learning, offering a data-driven tool to enhance decision-making and production efficiency in swine systems.
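As a rough illustration of the four evaluation metrics named in the abstract, the following Python sketch computes RMSE, MAE, MAPE, and the coefficient of determination from paired observed and predicted values. The function name `regression_metrics` and the sample ADG values are hypothetical and are not taken from the study; the authors' own evaluation code is not shown on this page.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R^2 for paired values.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Root mean squared error and mean absolute error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # MAPE is undefined when an observed value is zero.
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero observed values")
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R^2 = 1 - SS_res / SS_tot; undefined when all observations are equal.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Made-up ADG values in g/day, for illustration only (not study data).
observed = [800.0, 900.0, 1000.0, 700.0]
predicted = [820.0, 880.0, 990.0, 730.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower RMSE, MAE, and MAPE and a higher R² indicate a better fit, which is the sense in which the abstract ranks the fifteen algorithms.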

Keywords: machine learning, average daily gain, SHAP interpretation, Yorkshire pigs, predictive modeling
Online: 19 March 2026  
Fund: 

The authors are grateful to Huanglong Zhengneng Agriculture and Animal Husbandry Technology Co., Ltd. (Shaanxi Province, China) for their support. We especially acknowledge the valuable assistance of the company’s staff and farm personnel in facilitating data collection and management. We extend our sincere gratitude to the High-Performance Computing (HPC) of Northwest A&F University (NWAFU) for the provision of crucial computational resources used throughout this study. This work was supported by the STI2030 Major Projects, China (2023ZD0404702), the Program for Shaanxi Science and Technology, China (2023-CX-TD-57), and Shaanxi Livestock and Poultry Breeding Common Technology Research and Development Platform, China (2023GXJS-02-01).

About author:  Shan Jiang, E-mail: jiangshan001@nwafu.edu.cn; #Correspondence Jiangwei Wu, E-mail: wujiangwei@nwafu.edu.cn; Lijing Bai, E-mail: bailijing@caas.cn

Cite this article: 

Shan Jiang, Jiahao Chen, Yifan Han, Haoyu Pei, Jiakai Tang, Chuxiong Zhang, Miaomiao Qin, Fei Cheng, Lijing Bai, Jiangwei Wu. 2026. Phenotype-driven machine learning models for predicting average daily gain in Yorkshire pigs with SHAP interpretation. Journal of Integrative Agriculture, Doi:10.1016/j.jia.2026.03.045
