Journal of Integrative Agriculture

Breaking data barriers with homomorphic encryption: The HEGS platform for secure joint genomic selection in animal breeding

Jiamin Gu (1, 2), Wei Zhao (1, 2, 4), Zhenyang Zhang (1, 2, 5), He Han (1, 2), Yongqi He (4), Xiaoliang Hou (4), Jianlan Wang (4), Yan Fu (1, 2, 4), Qishan Wang (1, 2, 3), Yuchun Pan (1, 2, 3), Zhen Wang (1, 2)#, Zhe Zhang (1, 2)#

1 Zhejiang Key Laboratory of Nutrition and Breeding for High-quality Animal Products, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China

2 Key Laboratory of Livestock and Poultry Resources Evaluation and Utilization, Ministry of Agriculture and Rural Affairs, Hangzhou 310058, China

3 Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China

4 SciGene Biotechnology Co., Ltd, Hefei 230031, China

5 Xianghu Laboratory, Hangzhou 310027, China

Highlights

- Encrypted Numerical Fidelity: We encrypt phenotypes and genotypes using orthogonal-matrix transformations, solving mixed-model equations in the encrypted domain with identical results to plaintext analyses (correlation = 1.0).

- Methodological Generalization: HEGS supports BLUP, GBLUP, and ssGBLUP within a unified framework using secure factorization and recombination of A and H relationship matrices.

- Platform and Data Resources: We provide a dual-mode platform (web/local) with encryption scripts and 180 pre-encrypted datasets from five populations across 36 traits. Synthetic population evaluations show significant improvements in small populations.
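The first highlight can be illustrated with a small numerical sketch. It is not HEGS code: it only shows, on toy data, the general idea that multiplying phenotypes, fixed-effect covariates, and the genomic relationship matrix by a secret random orthogonal matrix P leaves the GBLUP solution unchanged once the breeding values are rotated back. The function names (`random_orthogonal`, `gblup_solve`), the simplified relationship matrix, the variance ratio `lam`, and the simulated data are all assumptions made for illustration. How HEGS handles keys across several institutions is not described here and is not modelled.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the draw does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))

def gblup_solve(y, X, G, lam):
    """Solve y = X b + u + e for b (GLS) and u (BLUP).

    Assumes u ~ N(0, G * s2u), e ~ N(0, I * s2e) and lam = s2e / s2u.
    """
    V = G + lam * np.eye(len(y))                      # covariance up to a scalar
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    b = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # fixed effects
    u = G @ np.linalg.solve(V, y - X @ b)             # breeding values
    return b, u

# Toy data (hypothetical sizes and effects).
rng = np.random.default_rng(0)
n, m = 60, 200                                        # individuals, markers
Z = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # 0/1/2 genotypes
Zc = Z - Z.mean(axis=0)                               # centre each marker
G = Zc @ Zc.T / m                                     # simplified relationship matrix
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([10.0, 0.5]) + Zc @ rng.normal(0, 0.1, m) + rng.standard_normal(n)
lam = 1.0                                             # assumed variance ratio

# "Encryption": rotate everything with a secret orthogonal key P.
P = random_orthogonal(n, rng)
y_enc, X_enc, G_enc = P @ y, P @ X, P @ G @ P.T

b_plain, u_plain = gblup_solve(y, X, G, lam)
b_enc, u_enc = gblup_solve(y_enc, X_enc, G_enc, lam)
u_dec = P.T @ u_enc                                   # key holder rotates back

print("fixed effects agree:", np.allclose(b_plain, b_enc))
print("breeding-value correlation:", np.corrcoef(u_plain, u_dec)[0, 1])
```

The invariance follows from P'P = I: the rotated covariance is P V P', its inverse is P V^-1 P', so every product in the estimating equations collapses back to its plaintext form, and the encrypted breeding values equal P u.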


Abstract  

Genomic selection (GS) is one of the most effective approaches for accelerating genetic improvement in animals and plants, but its efficiency largely depends on the size of the training population. However, establishing a large training population is often time-consuming and costly. An alternative strategy is to combine multiple populations distributed across different breeding farms or companies for joint GS, but this is greatly constrained by data-security concerns and the lack of a public platform for secure collaborative analysis. In this study, we developed HEGS (Homomorphic Encryption Genomic Selection), an open-source platform for privacy-preserving joint GS across institutions, which is in principle applicable to diploid species. HEGS uses homomorphic encryption to perform genomic analyses directly on encrypted data without revealing raw information, and extends the encrypted analysis framework from the initial genomic best linear unbiased prediction (GBLUP) model to include both conventional best linear unbiased prediction (BLUP) and single-step GBLUP (ssGBLUP), thereby broadening its applicability in breeding evaluation. To demonstrate the utility of the platform, we constructed a large encrypted pig dataset comprising four breeds (Duroc, Yorkshire, Landrace, and Pietrain), 36 economically important traits, 180 pre-encrypted datasets, and more than 580,000 phenotypic records, enabling immediate joint analyses without exposing raw data. Using both simulated and real datasets, we demonstrated the feasibility and effectiveness of GS under homomorphic encryption. After model fitting, HEGS outputs genomic estimated breeding values (GEBVs) for genotyped candidates without phenotypic records, facilitating selection without additional phenotyping. Overall, HEGS provides a deployable and scalable open-source solution for privacy-preserving cross-institutional collaboration in animal breeding.
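As a rough illustration of the last step in the abstract, predicting GEBVs for genotyped candidates that have no phenotypes, the sketch below applies the standard GBLUP prediction formula to plaintext toy data. It is not taken from HEGS: the function name `predict_candidate_gebv`, the simplified relationship matrix, the variance ratio `lam`, and the simulated population are assumptions, and an overall mean is the only fixed effect.

```python
import numpy as np

def predict_candidate_gebv(G, y_train, train_idx, cand_idx, lam):
    """GBLUP breeding values for candidates without phenotypes.

    u_cand = G[cand, train] @ V^-1 @ (y_train - mu), with
    V = G[train, train] + lam * I and mu the GLS estimate of the mean.
    """
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_ct = G[np.ix_(cand_idx, train_idx)]
    V = G_tt + lam * np.eye(len(train_idx))
    ones = np.ones(len(train_idx))
    Vinv_y = np.linalg.solve(V, y_train)
    Vinv_1 = np.linalg.solve(V, ones)
    mu = (ones @ Vinv_y) / (ones @ Vinv_1)            # GLS overall mean
    return G_ct @ (Vinv_y - mu * Vinv_1)

# Toy population (hypothetical sizes): phenotyped training animals plus candidates.
rng = np.random.default_rng(1)
n_train, n_cand, m = 80, 20, 300
Z = rng.binomial(2, 0.4, size=(n_train + n_cand, m)).astype(float)
Zc = Z - Z.mean(axis=0)                               # centre each marker
G = Zc @ Zc.T / m                                     # simplified relationship matrix
tbv = Zc @ rng.normal(0, 0.1, m)                      # simulated true breeding values
train_idx = np.arange(n_train)
cand_idx = np.arange(n_train, n_train + n_cand)
y_train = 5.0 + tbv[train_idx] + rng.standard_normal(n_train)

gebv = predict_candidate_gebv(G, y_train, train_idx, cand_idx, lam=1.0)
print("candidates ranked by GEBV:", cand_idx[np.argsort(-gebv)][:5])
```

Only the candidates' genotypes enter the prediction, through their relationships with the training animals in G, which is why no additional phenotyping is needed. In a joint analysis the training block would pool animals from several farms.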

Keywords: homomorphic encryption; joint genomic selection; privacy-preserving
Online: 19 March 2026  
Fund: 

This work was financially supported by the National Natural Science Foundation of China (32272832 and 32102503), the National Key Research and Development Program of China (2023YFF1001100), and the Zhejiang Provincial Key R&D Program of China (2021C02068).

About author: Jiamin Gu, E-mail: 12117016@zju.edu.cn; #Correspondence: Zhen Wang, E-mail: wangzhen20@zju.edu.cn; Zhe Zhang, E-mail: zhe_zhang@zju.edu.cn

Cite this article: 

Jiamin Gu, Wei Zhao, Zhenyang Zhang, He Han, Yongqi He, Xiaoliang Hou, Jianlan Wang, Yan Fu, Qishan Wang, Yuchun Pan, Zhen Wang, Zhe Zhang. 2026. Breaking data barriers with homomorphic encryption: The HEGS platform for secure joint genomic selection in animal breeding. Journal of Integrative Agriculture, DOI: 10.1016/j.jia.2026.03.046
