Journal of Integrative Agriculture 2026, Vol. 25, Issue (7): 2915–2935. DOI: 10.1016/j.jia.2025.08.020
Plant Protection
Few-shot driven construction method of a large-scale light-trapped insect annotation data based on vision foundation models and self-supervised learning

Yanchen You¹, Zelin Feng²,³, Zhe Wang³, Lingyi Li³, Ju Luo⁴, Jun Lü¹, Haowen Zhang³, Baojun Yang⁴, Shuhua Liu⁴, Qing Yao³#

1 School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China

2 School of Information and Control, Keyi College of Zhejiang Sci-Tech University, Hangzhou 310018, China

3 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China

4 State Key Laboratory of Rice Biology and Breeding, China National Rice Research Institute, Hangzhou 311401, China

 Highlights 
Multi-scale light-trapped insect-DINO detector (MLTIDD) with segment anything model (SAM) and slicing-aided hyper inference (SAHI) effectively detects tiny insects in light-trapped images.
MLTIDD demonstrates robust generalization across diverse light-trapped scenarios.
InsectSSRL with multiple proxy tasks learns robust insect feature representations.
Vision transformer (ViT) trained with InsectSSRL exhibits exceptional few-shot learning capability.
The proposed data construction method reduces expert annotation time by approximately 80% while maintaining precision.
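The core idea behind SAHI can be sketched in a few steps. A large image is sliced into overlapping tiles, so that tiny insects occupy more pixels per detector input. The detector runs on each tile, and the tile-level boxes are mapped back to full-image coordinates before duplicates are merged. This is a simplified pure-Python illustration, not the sahi library's API. `detect_fn` is a hypothetical stand-in for any detector (for example, a fine-tuned Grounding DINO). The tile size of 640, overlap of 0.2, and greedy NMS merge are assumptions, not the settings used by MLTIDD. The SAHI library itself offers further options, such as an additional full-image prediction pass and alternative merge strategies.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]            # (box, confidence score)


def slice_starts(length: int, tile: int, overlap: float) -> List[int]:
    """Tile start offsets along one axis; the last tile is pushed flush to the edge."""
    if length <= tile:
        return [0]
    step = max(1, int(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def make_slices(width: int, height: int, tile: int, overlap: float) -> List[Box]:
    """Overlapping tile windows (x1, y1, x2, y2) that together cover the image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in slice_starts(height, tile, overlap)
        for x in slice_starts(width, tile, overlap)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thr: float) -> List[Detection]:
    """Greedy non-maximum suppression: keep the highest-scoring box of each cluster."""
    kept: List[Detection] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept


def sliced_inference(
    width: int,
    height: int,
    detect_fn: Callable[[Box], List[Detection]],
    tile: int = 640,
    overlap: float = 0.2,
    iou_thr: float = 0.5,
) -> List[Detection]:
    """Run a detector on each tile, shift boxes to image coordinates, then merge."""
    merged: List[Detection] = []
    for x0, y0, x1, y1 in make_slices(width, height, tile, overlap):
        for (bx1, by1, bx2, by2), score in detect_fn((x0, y0, x1, y1)):
            # Tile-local box -> full-image box by adding the tile's top-left offset.
            merged.append(((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0), score))
    return nms(merged, iou_thr)
```

The overlap matters because an insect cut by a tile border in one tile appears whole in a neighboring tile. NMS then removes the duplicate boxes produced by the overlapping tiles.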
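Abstract (Chinese version, translated)

The machine-vision-based intelligent pest-monitoring light trap uses specific light spectra to attract pests, kills them by infrared heating, and identifies and counts them with artificial intelligence models. Building a high-quality insect annotation dataset is essential for achieving better pest recognition performance. However, traditional manual annotation depends on expert knowledge. It is time-consuming and inefficient for large-scale, multi-class insect annotation tasks. This study establishes an efficient few-shot learning method for constructing a large-scale light-trapped insect dataset. The method adopts a two-stage detection–classification annotation framework. First, a multi-scale light-trapped insect detector (MLTIDD) is designed to address differences in scale and receptive field among insects of different sizes. Built on a fine-tuned Grounding DINO model, it integrates the SAM and SAHI techniques to achieve multi-scale insect detection. Next, we propose InsectSSRL, an iBOT-based self-supervised learning method, to learn robust insect feature representations from the large number of unlabeled insect sub-images detected by MLTIDD. By combining three proxy tasks, the method substantially improves feature extraction for insect sub-images. The resulting feature extractor can be used by a classification model to pre-classify the insect sub-images. After expert correction, the insect labels are traced back to the original images, completing the annotation of the light-trapped insect dataset. Experiments show that under limited-sample conditions, MLTIDD achieved 79.6% AP50–95 and 90.8% AR. These results are 7.0 and 4.7 percentage points higher than those of the DINO model, respectively. InsectSSRL reached 85.87% top-1 accuracy in k-nearest-neighbor evaluation. In the few-shot classification task, a Swin-T model pre-trained with InsectSSRL and fine-tuned on 5% of the InsectID data achieved 80.35% accuracy. This is 2.08 percentage points higher than iBOT and 11.3 percentage points higher than a COCO-based transfer learning model. Compared with a model combining DINO and iBOT, the proposed dataset construction method improved mAP50–95 on the test set by 10.91 percentage points and AR by 8.26 percentage points. Compared with manual expert annotation, it reduced annotation time by about 80%.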
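The final step of the two-stage framework merges crop-level labels back into image-level annotations, which is essentially a bookkeeping operation. The sketch below is minimal and rests on several hypothetical assumptions. Every crop record is assumed to keep the name of its source image and its box in original-image pixel coordinates. An expert correction is assumed to simply override the model's predicted label. The `SubImage` record, its field names, and the COCO-style `[x, y, width, height]` output are illustrative choices, not the authors' data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SubImage:
    """One insect crop produced by the detector (hypothetical record layout)."""
    crop_id: str
    source_image: str                       # file name of the original trap image
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in original-image pixels
    predicted_label: str                    # label from the pre-classification model
    corrected_label: Optional[str] = None   # set only when an expert changes the label


def final_label(sub: SubImage) -> str:
    """An expert correction, if present, overrides the model's pre-classification."""
    return sub.corrected_label if sub.corrected_label is not None else sub.predicted_label


def trace_back(subs: List[SubImage]) -> Dict[str, List[dict]]:
    """Group crop-level labels into per-image box annotations."""
    annotations: Dict[str, List[dict]] = {}
    for sub in subs:
        x1, y1, x2, y2 = sub.box
        annotations.setdefault(sub.source_image, []).append({
            "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO-style [x, y, width, height]
            "category": final_label(sub),
        })
    return annotations


def correction_rate(subs: List[SubImage]) -> float:
    """Fraction of crops whose label the expert had to change."""
    if not subs:
        return 0.0
    return sum(s.corrected_label is not None for s in subs) / len(subs)
```

In a pipeline like this, expert effort scales with the correction rate rather than the total number of insects. This is one plausible way to see why pre-classification reduces annotation time.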



Abstract  

The intelligent pest-monitoring light trap based on machine vision uses specific light spectra to attract pests, infrared heating to kill them, and artificial intelligence models to recognize and count them. Achieving optimal model performance requires a high-quality annotated insect dataset. However, traditional manual annotation is expert-dependent, time-consuming, and inefficient for large-scale multi-class insect labeling. This study establishes an efficient few-shot learning approach to constructing a large-scale light-trapped insect dataset through a two-stage annotation framework: detection followed by classification. Specifically, a multi-scale light-trapped insect-DINO detector (MLTIDD) is designed to address disparities in scale and receptive field between large and tiny insects. Built on a fine-tuned Grounding DINO model, MLTIDD integrates the segment anything model (SAM) and slicing-aided hyper inference (SAHI) to detect insects at multiple scales. Subsequently, InsectSSRL, an iBOT-based self-supervised method, learns robust insect feature representations from the large set of unlabeled insect sub-images detected by MLTIDD. By combining three proxy tasks, it improves feature extraction for insect sub-images. The resulting feature extractor is then used by a classification model to pre-classify the insect sub-images. After expert correction, the labels are traced back to the original images, completing the annotation of the light-trapped insect dataset. Experimental results demonstrate that under limited-sample conditions, MLTIDD achieved 79.6% average precision (AP)50–95 and 90.8% average recall (AR), surpassing DINO by 7.0 and 4.7 percentage points, respectively. InsectSSRL attained 85.87% top-1 accuracy in k-nearest-neighbor (k-NN) evaluation. In few-shot classification, Swin-T pre-trained with InsectSSRL and fine-tuned on 5% of the InsectID data achieved 80.35% accuracy. This exceeds iBOT by 2.08 percentage points and COCO-based transfer learning by 11.3 percentage points.
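In self-supervised learning, k-NN evaluation measures feature quality without training a classifier. The backbone is frozen, and each test sub-image is labeled by a vote of its nearest labeled training features. The sketch below is written in pure Python for clarity and follows the similarity-weighted voting used in DINO-style evaluation. The defaults k = 20 and temperature 0.07 are common choices in that protocol and are assumptions here, not settings reported for InsectSSRL. Feature extraction is out of scope: `train_feats` and `test_feats` stand for embeddings already produced by the frozen model.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def knn_predict(
    query: Sequence[float],
    bank: List[List[float]],
    bank_labels: List[str],
    k: int,
    temperature: float,
) -> str:
    """Similarity-weighted vote among the k most similar (pre-normalized) bank features."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, f)) for f in bank]
    top = sorted(range(len(bank)), key=lambda i: sims[i], reverse=True)[:k]
    votes: Dict[str, float] = defaultdict(float)
    for i in top:
        # Closer neighbors get exponentially larger weights.
        votes[bank_labels[i]] += math.exp(sims[i] / temperature)
    return max(votes, key=votes.get)


def knn_top1_accuracy(
    train_feats: List[Sequence[float]],
    train_labels: List[str],
    test_feats: List[Sequence[float]],
    test_labels: List[str],
    k: int = 20,
    temperature: float = 0.07,
) -> float:
    """Top-1 accuracy of a weighted k-NN classifier on frozen backbone features."""
    bank = [l2_normalize(f) for f in train_feats]
    correct = sum(
        knn_predict(f, bank, train_labels, k, temperature) == y
        for f, y in zip(test_feats, test_labels)
    )
    return correct / len(test_labels)
```

Because no weights are trained, the score reflects only how well the embedding space separates insect classes. This is why k-NN accuracy is a common way to compare self-supervised methods.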
Compared with a pipeline combining DINO and iBOT, the proposed pipeline improved mAP50–95 on the test set by 10.91 percentage points and AR by 8.26 percentage points. It also reduced expert annotation time by approximately 80% relative to manual labeling.
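This sketch assumes that the source follows the COCO-style definitions. Under those definitions, AP50–95 (mAP50–95 when also averaged over insect classes) is the mean of average precision over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. AR is likewise recall averaged over the same thresholds. The single-image, single-class sketch below shows how the thresholds enter the computation. It is illustrative only. It uses all-point interpolation, whereas the official COCO evaluator (pycocotools) interpolates at 101 recall points, caps detections per image, and aggregates over images and categories, so its numbers can differ slightly.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score)

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def iou(a: Box, b: Box) -> float:
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(dets: List[Detection], gts: List[Box], thr: float) -> List[int]:
    """Greedily match detections (highest score first) to unmatched ground truths.

    Returns 1 (true positive) or 0 (false positive) per detection, in score order.
    """
    matched = [False] * len(gts)
    flags = []
    for box, _ in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
        flags.append(1 if best_j >= 0 else 0)
    return flags


def average_precision(dets: List[Detection], gts: List[Box], thr: float) -> float:
    """Area under the precision-recall curve at one IoU threshold (all-point)."""
    if not gts:
        return 0.0
    precisions, recalls, tp = [], [], 0
    for n, flag in enumerate(match(dets, gts, thr), start=1):
        tp += flag
        precisions.append(tp / n)
        recalls.append(tp / len(gts))
    # Replace each precision by the best precision at this or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap50_95(dets: List[Detection], gts: List[Box]) -> float:
    """Mean AP over the ten IoU thresholds."""
    return sum(average_precision(dets, gts, t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


def ar50_95(dets: List[Detection], gts: List[Box]) -> float:
    """Recall averaged over the same ten IoU thresholds."""
    if not gts:
        return 0.0
    return sum(sum(match(dets, gts, t)) / len(gts) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)
```

A box that overlaps its target with IoU 0.82 counts as correct at the seven thresholds from 0.50 to 0.80 and as a miss at 0.85 to 0.95. Averaging over the thresholds therefore rewards tight localization, which matters for tiny insects.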

Keywords: intelligent pest-monitoring light trap; insect dataset construction; few-shot learning; self-supervised learning; insect detection and classification; automated insect annotation
Received: 14 April 2025; Accepted: 01 July 2025; Online: 26 August 2025
Fund: 

This work was supported by the National Key Research Program of China during the 14th Five-Year Plan Period (2021YFD1401100), the “San Nong Jiu Fang” Sciences and Technologies Cooperation Project of Zhejiang Province, China (2024SNJF010), the Zhejiang Provincial Natural Science Foundation, China (LTGN24C140007) and the Industry-Academia-Research Cooperation Project of Zhuhai, China (ZH22017001210013PWC).

About author:  Yanchen You, Mobile: +86-19857138119, E-mail: 2023220704083@mails.zstu.edu.cn; #Correspondence Qing Yao, Mobile: +86-13958015661, E-mail: q-yao@zstu.edu.cn

Cite this article: 

Yanchen You, Zelin Feng, Zhe Wang, Lingyi Li, Ju Luo, Jun Lü, Haowen Zhang, Baojun Yang, Shuhua Liu, Qing Yao. 2026. Few-shot driven construction method of a large-scale light-trapped insect annotation data based on vision foundation models and self-supervised learning. Journal of Integrative Agriculture, 25(7): 2915–2935.

