Plant disease phenotype captioning via zero-shot learning with semantic correction based on LLM

doi:10.1016/j.jia.2026.03.014


Yushan Xie¹, Xinyu Dong¹, Kejun Zhao¹, G.M.A.D Sirishantha², Yuanyuan Xiao¹, Peijia Yu¹, Changyuan Zhai³, Qi Wang¹#

¹State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China

²Postgraduate Institute of Agriculture, University of Peradeniya, Peradeniya 20400, Sri Lanka

³Beijing Research Center of Intelligent Equipment for Agriculture, Beijing 100097, China

Highlights

Constructs a large-scale dataset of 20,943 image captions covering over 60 plant species and 300 diseases to support precise plant disease description generation.

Proposes the PDPC framework, which captures and refines dependency relations in descriptive texts, integrates key image concepts, and restructures text for accurate depiction of plant disease phenotypic information.

Demonstrates through extensive experiments that the proposed framework significantly outperforms existing models in describing plant disease characteristics.


Abstract

Agriculture is the foundation of global food security and quality of life, and staple crops such as rice, wheat, and maize meet the dietary needs of most of the world's population. These crops are susceptible to diseases that can cause significant yield losses; for example, wheat rust causes annual losses exceeding $2.9 billion. Accurately captioning the phenotypic characteristics of plant diseases supports diagnosis and is therefore essential to ensuring food security. Existing methods in agriculture struggle to address the heterogeneity between visual phenotypes and disease descriptions, and as a result pay too little attention to key disease characteristics. To address this issue, we propose PDPC, a zero-shot image captioning framework. PDPC uses an extensive descriptive corpus, syntactic analysis, and semantic structure optimization to significantly improve the quality and generalization of disease descriptions. We also construct a dataset of 20,943 image captions describing the phenotypic characteristics of more than 300 diseases across more than 60 plant species. Experimental results show that PDPC outperforms existing models in accurately describing plant disease characteristics. The framework improves the accuracy of disease descriptions and supports the intelligent diagnosis and management of plant diseases, laying the groundwork for better plant health and higher agricultural yields.

Keywords: plant disease; image caption; LLM; dependency grammar; semantic correction

Online: 10 March 2026

Fund:

This research was supported by the National Natural Science Foundation of China (No. 62506089), the Scientific and Technological Innovation Platform Research Project of Guizhou Province (CXPTXM[2025]024, CXPTXM[2025]026), the Guizhou Province Youth Science and Technology Talent Project ([2024]317), and the Guizhou Provincial Science and Technology Projects ([2024]002, CXTD[2023]027).

About author: #Correspondence: Qi Wang, E-mail: qiwang@gzu.edu.cn

Cite this article:

Yushan Xie, Xinyu Dong, Kejun Zhao, G.M.A.D Sirishantha, Yuanyuan Xiao, Peijia Yu, Changyuan Zhai, Qi Wang. 2026. Plant disease phenotype captioning via zero-shot learning with semantic correction based on LLM. Journal of Integrative Agriculture, doi:10.1016/j.jia.2026.03.014
