Journal of Integrative Agriculture, 2026, Vol. 25, Issue 2: 756-768. DOI: 10.1016/j.jia.2024.03.075


  


E2ETCA: End-to-end training of CNN and attention ensembles for rice disease diagnosis

Md. Zasim Uddin1#, Md. Nadim Mahamood1, Ausrukona Ray1, Md. Ileas Pramanik1, Fady Alnajjar2#, Md Atiqur Rahman Ahad3   

  1 Department of Computer Science and Engineering, Begum Rokeya University, Rangpur 5404, Bangladesh

  2 Department of Computer Science and Software Engineering, United Arab Emirates University, Alain 15551, UAE

  3 Department of Computer Science and Digital Technologies, University of East London, London E16 2RD, UK

  • Received: 2023-10-23; Revised: 2024-03-27; Accepted: 2024-01-16; Online: 2026-02-20; Published: 2026-01-06
  • About author: # Correspondence: Md. Zasim Uddin, E-mail: zasim@brur.ac.bd; Fady Alnajjar, E-mail: fady.alnajjar@uaeu.ac.ae
  • Supported by:
    We are grateful to the Begum Rokeya University, Rangpur, and the United Arab Emirates University, UAE for partially supporting this work.

Abstract:

Rice is one of the most important staple crops globally. Rice plant diseases can severely reduce crop yields and, in extreme cases, lead to total production loss. Early diagnosis enables timely intervention, mitigates disease severity, supports effective treatment strategies, and reduces reliance on excessive pesticide use. Traditional machine learning approaches have been applied to automated rice disease diagnosis; however, these methods depend heavily on manual image preprocessing and handcrafted feature extraction, which are labor-intensive, time-consuming, and often require domain expertise. Recently, end-to-end deep learning (DL) models have been introduced for this task, but they often lack robustness and generalizability across diverse datasets. To address these limitations, we propose a novel end-to-end training framework for convolutional neural network (CNN) and attention-based model ensembles (E2ETCA). This framework integrates features from two state-of-the-art (SOTA) CNN models, Inception V3 and DenseNet-201, and an attention-based vision transformer (ViT) model. The fused features are passed through an additional fully connected layer with softmax activation for final classification. The entire process is trained end-to-end, enhancing its suitability for real-world deployment. Furthermore, we extract the learned features and analyze them using a support vector machine (SVM), a traditional machine learning classifier, to provide comparative insights. We evaluate the proposed E2ETCA framework on three publicly available datasets (the Mendeley Rice Leaf Disease Image Samples dataset, the Kaggle Rice Diseases Image dataset, and the Bangladesh Rice Research Institute dataset) and on a combined version of all three. Using standard evaluation metrics (accuracy, precision, recall, and F1-score), our framework outperforms existing SOTA methods in rice disease diagnosis, with potential applicability to other agricultural disease detection tasks.
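The fusion step described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the abstract does not say how features are fused, so concatenation is assumed here; the feature sizes are the usual pooled outputs of Inception V3 (2048) and DenseNet-201 (1920), while 768 assumes a Base-sized ViT; the number of classes and all weights are hypothetical random stand-ins, and no training is performed.

```python
import math
import random

# Pooled feature sizes of the standard backbones. The ViT size assumes a
# Base-sized model, which the abstract does not specify.
FEATURE_DIMS = {"inception_v3": 2048, "densenet201": 1920, "vit": 768}


def fuse(features):
    """Concatenate per-backbone feature vectors in a fixed order (assumed fusion rule)."""
    fused = []
    for name in FEATURE_DIMS:
        fused.extend(features[name])
    return fused


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Fully connected layer followed by softmax: one probability per class."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


rng = random.Random(0)
num_classes = 4  # hypothetical number of disease classes

# Random stand-ins for the three backbone outputs of a single image.
features = {name: [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for name, dim in FEATURE_DIMS.items()}
fused = fuse(features)

# Untrained random weights for the classification head (illustration only).
weights = [[rng.gauss(0.0, 0.01) for _ in range(len(fused))]
           for _ in range(num_classes)]
biases = [0.0] * num_classes

probs = classify(fused, weights, biases)
predicted = max(range(num_classes), key=probs.__getitem__)
```

In an end-to-end setting, the three backbones and this head would sit in one computation graph of a deep learning framework so that the classification loss updates all of them jointly; the sketch only shows the shape of the forward pass through the fused head.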

Key words: rice disease diagnosis, ensemble method, CNN-based model, end-to-end model, Inception model, DenseNet model, vision transformer model, attention-based model, support vector machine