Journal of Integrative Agriculture, 2026, Vol. 25, Issue (7): 2915-2935. DOI: 10.1016/j.jia.2025.08.020



Few-shot driven construction method of a large-scale light-trapped insect annotation data based on vision foundation models and self-supervised learning

Yanchen You1, Zelin Feng2, 3, Zhe Wang3, Lingyi Li3, Ju Luo4, Jun Lü1, Haowen Zhang3, Baojun Yang4, Shuhua Liu4, Qing Yao3#   

  1 School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China

  2 School of Information and Control, Keyi College of Zhejiang Sci-Tech University, Hangzhou 310018, China

  3 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

  4 State Key Laboratory of Rice Biology and Breeding, China National Rice Research Institute, Hangzhou 311401, China

  • Received: 2025-04-14 Revised: 2025-08-26 Accepted: 2025-07-01 Online: 2026-07-20 Published: 2026-06-09
  • About author: Yanchen You, Mobile: +86-19857138119, E-mail: 2023220704083@mails.zstu.edu.cn; #Correspondence: Qing Yao, Mobile: +86-13958015661, E-mail: q-yao@zstu.edu.cn
  • Supported by:

    This work was supported by the National Key Research Program of China during the 14th Five-Year Plan Period (2021YFD1401100), the “San Nong Jiu Fang” Sciences and Technologies Cooperation Project of Zhejiang Province, China (2024SNJF010), the Zhejiang Provincial Natural Science Foundation, China (LTGN24C140007) and the Industry-Academia-Research Cooperation Project of Zhuhai, China (ZH22017001210013PWC).

Abstract:

Intelligent pest-monitoring light traps based on machine vision use specific light spectra to attract pests, infrared heating to kill them, and artificial intelligence models to recognize and count them. Achieving good recognition performance requires a high-quality annotated insect dataset. However, traditional manual annotation depends on expert knowledge, is time-consuming, and is inefficient for large-scale, multi-class insect labeling. This study establishes an efficient few-shot learning approach for constructing a large-scale light-trapped insect dataset through a two-stage annotation framework: detection followed by classification. First, a multi-scale light-trapped insect detector (MLTIDD) addresses the scale and receptive-field disparities between large and tiny insects. Built on a fine-tuned Grounding DINO, it integrates SAM and SAHI to detect insects at multiple scales. Second, InsectSSRL, an iBOT-based self-supervised method, learns robust insect feature representations from the large set of unlabeled insect sub-images detected by MLTIDD; by combining three proxy tasks, it significantly improves feature extraction for insect sub-images. The resulting feature extractor is used in a classification model to pre-classify insect sub-images. After expert correction, the labels are traced back to the original images to complete the annotation of the light-trapped insect dataset. Experiments show that, with limited samples, MLTIDD achieved an AP50–95 (average precision) of 79.6% and an average recall (AR) of 90.8%, surpassing DINO by 7.0 and 4.7 percentage points, respectively. InsectSSRL attained 85.87% top-1 accuracy in k-nearest-neighbor (k-NN) evaluation. In few-shot classification, a Swin-T model pre-trained with InsectSSRL and fine-tuned on 5% of the InsectID data achieved 80.35% accuracy, exceeding iBOT by 2.08 percentage points and COCO-based transfer learning by 11.3 percentage points. Compared with a model combining DINO and iBOT, the proposed dataset construction method improved mAP50–95 by 10.91 percentage points and AR by 8.26 percentage points on the test set, and it reduced annotation time by approximately 80% relative to expert manual labeling.

Key words: intelligent pest-monitoring light trap, insect dataset construction, few-shot learning, self-supervised learning, insect detection and classification, automated insect annotation
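
The abstract describes detecting insects at multiple scales and then tracing sub-image labels back to the original images. As a rough illustration only, the sketch below shows one common way such a step can be organized: cut the image into overlapping tiles, then shift a box found inside a tile back into full-image coordinates. This is not the authors' implementation and does not use the SAHI API. The function names (`make_tiles`, `to_image_coords`), the tile size and the overlap ratio are all assumptions chosen for the example.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; a hypothetical box format for this sketch
Box = Tuple[int, int, int, int]


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the image edge
    return starts


def make_tiles(width: int, height: int, tile: int, overlap: float) -> List[Box]:
    """Overlapping square tiles covering a width x height image (row-major order)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, int(tile * (1.0 - overlap)))
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


def to_image_coords(box_in_tile: Box, tile_box: Box) -> Box:
    """Shift a box expressed in tile coordinates back to full-image coordinates."""
    offset_x, offset_y = tile_box[0], tile_box[1]
    x0, y0, x1, y1 = box_in_tile
    return (x0 + offset_x, y0 + offset_y, x1 + offset_x, y1 + offset_y)


if __name__ == "__main__":
    tiles = make_tiles(width=1000, height=600, tile=512, overlap=0.25)
    for t in tiles:
        print(t)
    # A box found inside the last tile, mapped back to the original image
    print(to_image_coords((10, 20, 50, 60), tiles[-1]))
```

In a real pipeline, boxes from overlapping tiles would also need to be merged (for example with non-maximum suppression) before labels are written back; that step is omitted here.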