Swin-transformer-yolov5 For Real-time Wine Grape Bunch Detection
In this research, an integrated detection model, Swin-transformer-YOLOv5 or Swin-T-YOLOv5, was proposed for real-time wine grape bunch detection to inherit the advantages from both YOLOv5 and Swin-transformer. The research was conducted on two different grape varieties of Chardonnay (always white berry skin) and Merlot (white or white-red mix berry skin when immature; red when matured) from July to September in 2019. To verify the superiority of Swin-T-YOLOv5, its performance was compared against several commonly used/competitive object detectors, including Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5. All models were assessed under different test conditions, including two different weather conditions (sunny and cloudy), two different berry maturity stages (immature and mature), and three different sunlight directions/intensities (morning, noon, and afternoon) for a comprehensive comparison. Additionally, the predicted number of grape bunches by Swin-T-YOLOv5 was further compared with ground truth values, including both in-field manual counting and manual labeling during the annotation process. Results showed that the proposed Swin-T-YOLOv5 outperformed all other studied models for grape bunch detection, with up to 97 (mAP) and 0.89 of F1-score when the weather was cloudy. This mAP was approximately 44 and YOLOv5, respectively. Swin-T-YOLOv5 achieved its lowest mAP (90 F1-score (0.82) when detecting immature berries, where the mAP was approximately 40 Swin-T-YOLOv5 performed better on Chardonnay variety with achieved up to 0.91 of R2 and 2.36 root mean square error (RMSE) when comparing the predictions with ground truth. However, it underperformed on Merlot variety with achieved only up to 0.70 of R2 and 3.30 of RMSE.
READ FULL TEXT