Scientia Agricultura Sinica ›› 2018, Vol. 51 ›› Issue (19): 3673-3682. doi: 10.3864/j.issn.0578-1752.2018.19.005

• Tillage and Cultivation · Physiology and Biochemistry · Agricultural Information Technology •

Real-Time Pixel-Wise Classification of Agricultural Images Based on Depth-Wise Separable Convolution

LIU QingFei, ZHANG HongLi, WANG YanLing

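  1. School of Electrical Engineering, Xinjiang University, Urumqi 830047
  • Received: 2018-04-11 Published: 2018-10-01 Online: 2018-10-01
  • Corresponding author: ZHANG HongLi, E-mail: 1606829274@qq.com
  • About the author: LIU QingFei, E-mail: 892483452@qq.com
  • Supported by:
    National Natural Science Foundation of China (51767022)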

Real-Time Pixel-Wise Classification of Agricultural Images Based on Depth-Wise Separable Convolution

LIU QingFei, ZHANG HongLi, WANG YanLing   

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830047
  • Received: 2018-04-11 Online: 2018-10-01 Published: 2018-10-01

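Abstract: 【Objective】To improve both the accuracy and the real-time performance of crop and weed recognition, this study takes color field images of beet at the seedling stage as its subject and proposes a real-time pixel-wise classification method for agricultural images based on depthwise separable convolution. 【Method】Color field images of seedling-stage beet collected by an agricultural robot were used. Every pixel was manually labeled as crop, weed, or soil, and the labels for each class were placed in one of three separate image channels to form the dataset used for training and testing. First, a depthwise separable convolutional neural network was built on an encoder-decoder architecture, with the encoder and decoder merged at multiple scales; the encoder determines pixel location and the decoder yields the pixel class. Then, to address the imbalance in class coverage, the network was trained on single-channel labels, which raised the accuracy for low-coverage classes, and the separately trained outputs were combined to recognize soil, weeds, and crops in the image. To control the parameter count, a width multiplier set the number of pointwise convolution kernels, and the model was further tested at different input resolutions to assess its real-time performance. Finally, random data augmentation was used to expand the dataset; 80% of the data was used to train the network parameters and 20% to test network performance. 【Result】(1) Compared with existing pixel-wise classification methods, the proposed method achieved higher classification accuracy. The average pixel-wise accuracy was 90.06% for SegNet, 92.06% for U-Net, 92.70% for the proposed network trained on three-channel labels, and 94.99% for the proposed network trained on single-channel labels. (2) Per-class metrics computed for each method showed the advantage of single-channel label training when class coverage is imbalanced and training samples are few. For weeds, the pixel-wise accuracy was 18.39% for SegNet, 18.33% for U-Net, 22.87% for the proposed network trained on three-channel labels, and 41.94% for the proposed network trained on single-channel labels. (3) The width multiplier effectively controlled the parameter count of the model. With a width multiplier of 1 the model had 6.768 million parameters; with a width multiplier of 0.1 this fell to 77.2 thousand, or 1.14% of the original, while the pixel-wise accuracy for soil, weeds, and crops dropped by only 2.81%, 2.78%, and 3.7%, respectively. The parameter count can be reduced further, depending on the accuracy required. (4) The real-time processing capability of the network was examined under the combined effect of input resolution and width multiplier. With GPU acceleration, the network recognized all three classes simultaneously at up to 20 fps and a single class at up to 60 fps, which is sufficient for real-time online operation of agricultural weeding and crop monitoring systems. 【Conclusion】The proposed pixel-wise classification method based on depthwise separable convolution classifies soil, weeds, and crops in agricultural images effectively, and it can process single-class pixel-wise classification in real time, meeting the needs of practical systems.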

Keywords: crop and weed recognition, deep learning, convolutional neural network, pixel-wise classification, semantic segmentation

Abstract: 【Objective】To improve the accuracy and real-time performance of crop and weed recognition, color field images of seedling-stage beet were taken as the research object, and a real-time pixel-wise classification method based on depthwise separable convolution was proposed. 【Method】Color field images of seedling-stage beet collected by an agricultural robot were used. Each pixel was manually labeled as one of three classes (crop, weed, or soil), and the labels for each class were placed in a separate image channel to form the dataset for training and testing. First, a depthwise separable convolutional neural network based on an encoder-decoder architecture was built. The encoder and decoder were merged at multiple scales, with the encoder determining pixel location and the decoder producing the pixel class. Then, to address the imbalance in class coverage, the network was trained on single-channel labels, which improved the accuracy for low-coverage classes, and the separately trained outputs were combined to recognize soil, weeds, and crops. To control the parameter count, the number of pointwise convolution kernels was set by a width multiplier, and the model was further tested at different input resolutions to assess its real-time performance. Finally, random data augmentation was used to expand the dataset; 80% of the data was used to train the network parameters and 20% to test network performance. 【Result】(1) Compared with existing pixel-wise classification methods, the proposed method achieved higher classification accuracy. The average accuracy was 90.06% for SegNet, 92.06% for U-Net, 92.70% for the proposed network trained on three-channel labels, and 94.99% for the proposed network trained on single-channel labels. (2) Per-class metrics computed for each method demonstrated the advantage of single-channel label training when class coverage is imbalanced and training samples are few. For weeds, the pixel-wise accuracy was 18.39% for SegNet, 18.33% for U-Net, 22.87% for the proposed network trained on three-channel labels, and 41.94% for the proposed network trained on single-channel labels. (3) The parameter count of the network could be effectively controlled by the width multiplier. With a width multiplier of 1 the model had 6.768 million parameters; with a width multiplier of 0.1 the count fell to 77.2 thousand, which is 1.14% of the original, while the accuracy for soil, weeds, and crops decreased by only 2.81%, 2.78%, and 3.7%, respectively. Depending on the accuracy required, the parameter count could be reduced further. (4) The real-time processing capability of the network was examined under the combined effect of input resolution and width multiplier. With GPU acceleration, simultaneous recognition of all three classes reached 20 fps and single-class recognition reached 60 fps, which is sufficient for real-time online operation of agricultural weeding and crop monitoring systems. 【Conclusion】The proposed pixel-wise classification method based on depthwise separable convolution could effectively classify soil, weeds, and crops in agricultural images. The method could also perform single-class pixel-wise classification in real time, meeting the needs of practical systems.
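Result (3) rests on how a depthwise separable convolution and a width multiplier reduce the number of weights. The following is a minimal sketch of that arithmetic for a single layer, not the authors' network: the layer sizes (3×3 kernel, 64 input and 128 output channels) are hypothetical, biases are ignored, and the multiplier is assumed to scale both input and output channel counts in the MobileNet style.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in an ordinary k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out, alpha=1.0):
    """Weights in a depthwise separable convolution (biases ignored).

    alpha is the width multiplier; it is assumed here to scale both the
    input and output channel counts, keeping at least one channel.
    """
    c_in_scaled = max(1, round(alpha * c_in))
    c_out_scaled = max(1, round(alpha * c_out))
    depthwise = k * k * c_in_scaled          # one k x k filter per input channel
    pointwise = c_in_scaled * c_out_scaled   # 1 x 1 kernels that mix channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels.
    print(standard_conv_params(3, 64, 128))
    print(separable_conv_params(3, 64, 128))
    print(separable_conv_params(3, 64, 128, alpha=0.1))
    # Ratio reported in the abstract: 77.2 thousand of 6.768 million parameters.
    print(round(77_200 / 6_768_000 * 100, 2), "%")
```

Because the pointwise term scales with the product of the two channel counts, shrinking the width multiplier reduces parameters roughly quadratically, which is consistent with a multiplier of 0.1 leaving about 1% of the original parameters.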

Key words: crop and weed recognition, deep learning, convolutional neural networks, pixel-wise classification, semantic segmentation
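The single-channel annotation scheme described in the abstract can be illustrated with a small sketch. It assumes a label map stored as a 2-D list of class indices and a channel order of soil, weed, crop; both the data layout and the names `CLASSES`, `to_channel_masks`, and `coverage` are hypothetical and chosen only for illustration.

```python
# Assumed channel order; the source does not specify it.
CLASSES = ("soil", "weed", "crop")


def to_channel_masks(label_map):
    """Split a 2-D map of class indices into one binary mask per class."""
    return {
        name: [[1 if px == idx else 0 for px in row] for row in label_map]
        for idx, name in enumerate(CLASSES)
    }


def coverage(mask):
    """Fraction of pixels in a binary mask that belong to the class."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


if __name__ == "__main__":
    # Toy 2 x 3 label map: 0 = soil, 1 = weed, 2 = crop.
    labels = [[0, 0, 2],
              [0, 1, 2]]
    masks = to_channel_masks(labels)
    for name in CLASSES:
        print(name, masks[name], coverage(masks[name]))
```

Each mask can then serve as the target for training on one class at a time, and the per-class coverage makes the imbalance between soil and weed pixels explicit.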