Journal of Integrative Agriculture
Classification of fruit shape based on decision tree model and identification of QTLs and genes controlling fruit shape in eggplant

Qiang Li1*#, Shuangxia Luo1*, Huimin Du1*, Chive Paradowski2*, Jingjian Ma1, Liying Zhang1, Xupeng Jia1, Ruoxuan Zhao1, Dongfang Zhang1, Wei Yan3, Jianan Liu3, Lijun Song1, Esther van der Knaap4, Sofia Visa2#, Xueping Chen1#

1 College of Horticulture/Key Laboratory of Vegetable Germplasm Innovation and Utilization of Hebei, Hebei Agricultural University, Baoding 071000, China

2 Department of Mathematical & Computational Sciences, The College of Wooster, Wooster, OH 44691, USA

3 College of Life Sciences/Institute of Life Science and Green Development, Hebei University, Baoding 071000, China

4 Center for Applied Genetic Technologies/Department of Horticulture, University of Georgia, Athens, GA 30602, USA

Highlights

• Thirteen shape categories and ten key attributes determining fruit shape in eggplant were identified using a decision tree model.

• Classification rules achieving an accuracy of 92.59% were generated.

• Four QTLs controlling FSI and PAMi were detected using GWAS and QTL-seq.

• Overexpression of the candidate gene SmFSI3.1/SmFL produced elongated tomato fruits.

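Abstract (translated from the Chinese)

Eggplant (Solanum melongena L.) fruits vary widely in shape, making the crop an ideal model for studying fruit shape variation. Fruit shape not only influences consumer preference but also plays an important role in distinguishing commercial varieties and classifying germplasm. Despite this importance, existing classification systems are mainly qualitative descriptions that lack quantitative criteria, and they differ among countries and regions, so they cannot fully cover the rich variation in eggplant fruit shape. In this study, using a decision tree model with Gini index-based variable selection, eggplant fruit shapes were divided into 13 categories, 10 key attributes that determine fruit shape were identified, and classification rules with an accuracy of 92.59% were established. Five other methods, namely random forest, extreme gradient boosting (XGBoost), support vector machine (SVM), K-means clustering and Gaussian mixture model (GMM), were also applied to fruit shape classification; all of them were less robust than the decision tree model. The fruit shape classification model provided the key fruit shape attributes for QTL sequencing (QTL-seq) and the genome-wide association study (GWAS). Combining the two techniques, four quantitative trait loci (QTLs) controlling Fruit Shape Index (FSI) and Proximal Angle Micro (PAMi) were identified. The candidate gene SmFSI3.1/SmFL, a member of the SUN/IQD protein family, was overexpressed in tomato for validation; overexpression led to elongated fruits, indicating that the gene plays a positive role in regulating fruit elongation in eggplant. In summary, this study established an accurate and reproducible model for classifying eggplant fruit shape, which offers important guidance for eggplant breeding and variety classification. It also clarified the function of the candidate gene at the fsi3.1/fl3.1 locus, laying a theoretical foundation for dissecting the genetic regulation of eggplant fruit shape.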



Abstract  

Eggplant (Solanum melongena L.) shows remarkable diversity in fruit shape, making it an excellent model for studying shape variation. Fruit shape influences consumer preference and plays an important role in the classification of commercial varieties and germplasm. Despite this importance, existing classification systems are largely descriptive and lack quantitative criteria, differ by country or region, and fail to fully capture the diversity of eggplant fruit shapes. In the present study, thirteen shape categories were identified using a decision tree model with Gini index-based variable selection. Ten key attributes that largely determine fruit shape were identified, and classification rules with high accuracy (92.59%) were generated. Five other methods (random forest, XGBoost, SVM, K-means and GMM) were also applied to fruit shape classification, but they proved less robust than the decision tree. The shape modeling informed the selection of key attributes for the QTL-seq and GWAS analyses. Four QTLs controlling Fruit Shape Index (FSI) and Proximal Angle Micro (PAMi) were detected using GWAS and QTL-seq. The candidate gene SmFSI3.1/SmFL, a member of the SUN/IQD family, was overexpressed in tomato and produced elongated fruits, indicating a positive role for this gene in regulating fruit elongation in eggplant. In summary, we developed an accurate and reproducible model for classifying eggplant fruit shapes, which is significant for eggplant breeding and variety classification. Moreover, we verified the function of the candidate gene at the fsi3.1/fl3.1 locus, providing a foundation for understanding the genetic regulation of fruit shape in eggplant.
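To illustrate what Gini index-based variable selection means in a decision tree, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the helper names `gini` and `best_split`, the class labels "round" and "long", and all measurement values are made up for illustration. Only the attribute names FSI and PAMi come from the abstract. The sketch computes the Gini impurity of a set of labels and picks the attribute and threshold whose binary split reduces impurity the most, which is the criterion a Gini-based tree applies at each node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, names):
    """Return (attribute, threshold, impurity decrease) of the best binary split."""
    parent = gini(labels)
    n = len(labels)
    best = (None, None, 0.0)
    for j, name in enumerate(names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            # Weighted impurity of the two child nodes.
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if gain > best[2]:
                best = (name, t, gain)
    return best


# Toy, made-up measurements (NOT data from the study): [FSI, PAMi in degrees].
names = ["FSI", "PAMi"]
rows = [
    [0.75, 150.0], [1.0, 120.0], [1.25, 160.0],
    [3.0, 130.0], [3.5, 155.0], [4.0, 125.0],
]
labels = ["round", "round", "round", "long", "long", "long"]  # hypothetical classes

attribute, threshold, gain = best_split(rows, labels, names)
print(attribute, threshold, round(gain, 3))
```

On this toy data the selected split falls on FSI, because a single FSI threshold separates the two hypothetical classes completely while no PAMi threshold does. A full tree repeats this selection recursively within each child node.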

Keywords: Eggplant; Decision tree; Fruit shape; Classification system; IQD
Online: 11 March 2026  
Fund: 

This work was supported by the Hebei Provincial Natural Science Foundation for Distinguished Young Scholars (C2023204028), Youth Special Program for Basic Research in Biological Breeding of the National Natural Science Foundation of China (32441074), Science Research Project of Hebei Education Department (ZD2022111), Hebei Agriculture Research System (HBCT2023100207) and S&T Program of Hebei (21326309D).

About author: #Correspondence: Qiang Li, E-mail: yylq@hebau.edu.cn; Xueping Chen, E-mail: chenxueping@hebau.edu.cn; Sofia Visa, E-mail: svisa@wooster.edu. *These authors contributed equally to this study.

Cite this article: 

Qiang Li, Shuangxia Luo, Huimin Du, Chive Paradowski, Jingjian Ma, Liying Zhang, Xupeng Jia, Ruoxuan Zhao, Dongfang Zhang, Wei Yan, Jianan Liu, Lijun Song, Esther van der Knaap, Sofia Visa, Xueping Chen. 2026. Classification of fruit shape based on decision tree model and identification of QTLs and genes controlling fruit shape in eggplant. Journal of Integrative Agriculture, Doi:10.1016/j.jia.2026.03.027

