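Few-shot driven construction method of a large-scale light-trapped insect annotation data based on vision foundation models and self-supervised learning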

doi:10.1016/j.jia.2025.08.020

Journal of Integrative Agriculture, 2026, Vol. 25, Issue 7: 2915-2935.



Few-shot driven construction method of a large-scale light-trapped insect annotation data based on vision foundation models and self-supervised learning

Yanchen You¹, Zelin Feng²˒³, Zhe Wang³, Lingyi Li³, Ju Luo⁴, Jun Lü¹, Haowen Zhang³, Baojun Yang⁴, Shuhua Liu⁴, Qing Yao³#

¹School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China
²School of Information and Control, Keyi College of Zhejiang Sci-Tech University, Hangzhou 310018, China
³School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
⁴State Key Laboratory of Rice Biology and Breeding, China National Rice Research Institute, Hangzhou 311401, China

Received: 2025-04-14 Revised: 2025-08-26 Accepted: 2025-07-01 Online: 2026-07-20 Published: 2026-06-09
About author: Yanchen You, Mobile: +86-19857138119, E-mail: 2023220704083@mails.zstu.edu.cn; #Correspondence: Qing Yao, Mobile: +86-13958015661, E-mail: q-yao@zstu.edu.cn
Supported by:
This work was supported by the National Key Research Program of China during the 14th Five-Year Plan Period (2021YFD1401100), the “San Nong Jiu Fang” Sciences and Technologies Cooperation Project of Zhejiang Province, China (2024SNJF010), the Zhejiang Provincial Natural Science Foundation, China (LTGN24C140007) and the Industry-Academia-Research Cooperation Project of Zhuhai, China (ZH22017001210013PWC).


Abstract:

The intelligent pest-monitoring light trap based on machine vision uses specific light spectra to attract pests, infrared heating to kill them, and artificial intelligence models to recognize and count them. Achieving good recognition performance requires a high-quality annotated insect dataset. However, traditional manual annotation depends on expert knowledge, is time-consuming, and is inefficient for large-scale, multi-class insect labeling. This study establishes an efficient few-shot learning approach to construct a large-scale light-trapped insect dataset through a two-stage annotation framework: detection followed by classification. First, a multi-scale light-trapped insect detector (MLTIDD) addresses the scale and receptive-field disparities between large and tiny insects. It builds on a fine-tuned Grounding DINO model and integrates the Segment Anything Model (SAM) and Slicing Aided Hyper Inference (SAHI) to detect insects at multiple scales. Second, InsectSSRL, an iBOT-based self-supervised learning method, learns robust insect feature representations from the large set of unlabeled insect sub-images detected by MLTIDD; it strengthens feature extraction for insect sub-images by combining three proxy tasks. The resulting feature extractor is used by a classification model to pre-classify the insect sub-images. After expert correction, the labels are traced back to the original images to complete the annotation of the light-trapped insect dataset. Experimental results show that, with limited samples, MLTIDD achieved an average precision (AP_50–95) of 79.6% and an average recall (AR) of 90.8%, surpassing DINO by 7.0 and 4.7 percentage points, respectively. InsectSSRL attained 85.87% top-1 accuracy in k-nearest-neighbor (k-NN) evaluation. In few-shot classification, a Swin-T model pre-trained with InsectSSRL and fine-tuned on 5% of the InsectID dataset achieved 80.35% accuracy, exceeding iBOT by 2.08 percentage points and a COCO-based transfer learning model by 11.3 percentage points. Compared with a model combining DINO and iBOT, the proposed dataset construction method improved mAP_50–95 by 10.91 percentage points and AR by 8.26 percentage points on the test set, while reducing annotation time by approximately 80% relative to expert manual labeling.
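The two-stage idea in the abstract (sliced detection, then pre-classification of sub-images, with labels traced back to the original image) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation and it does not use the Grounding DINO, SAM, or `sahi` APIs: the detector output is assumed to be given as tile-local boxes, a plain greedy non-maximum suppression stands in for the merging step, and a cosine k-NN vote over a small labeled feature bank stands in for the classification model. All function names, the tile size, the thresholds, and the labels `"moth"` and `"planthopper"` are hypothetical.

```python
from collections import Counter
from math import sqrt


def slice_windows(width, height, tile, overlap):
    """Return (x0, y0, x1, y1) tiles that cover the image with a given overlap ratio."""
    step = max(1, int(tile * (1 - overlap)))
    xs = list(range(0, max(width - tile, 0) + 1, step))
    ys = list(range(0, max(height - tile, 0) + 1, step))
    # Add a final tile when the regular grid stops short of the right/bottom border.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, min(x + tile, width), min(y + tile, height)) for y in ys for x in xs]


def to_full_image(box, window):
    """Shift a tile-local box (x0, y0, x1, y1) into full-image coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def merge_detections(dets, iou_thr=0.5):
    """Greedy NMS over (box, score) pairs pooled from all tiles (assumed merge rule)."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_label(feature, bank, k=3):
    """Majority vote among the k labeled features most similar to `feature`."""
    nearest = sorted(bank, key=lambda item: cosine(feature, item[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def annotate(detections, features, bank, k=3):
    """Attach a pre-classification label to each full-image box for expert review."""
    return [
        {"bbox": box, "score": score, "label": knn_label(feat, bank, k)}
        for (box, score), feat in zip(detections, features)
    ]
```

In this sketch the records returned by `annotate` already carry full-image coordinates, so an expert would only need to correct the `label` field; in the paper's pipeline the features would come from the InsectSSRL extractor rather than from hand-written vectors.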

Key words: intelligent pest-monitoring light trap, insect dataset construction, few-shot learning, self-supervised learning, insect detection and classification, automated insect annotation


Yanchen You, Zelin Feng, Zhe Wang, Lingyi Li, Ju Luo, Jun Lv, Haowen Zhang, Baojun Yang, Shuhua Liu, Qing Yao. Few-shot driven construction method of a large-scale light-trapped insect annotation data based on vision foundation models and self-supervised learning[J]. Journal of Integrative Agriculture, 2026, 25(7): 2915-2935.

