YOLO26-RipeLoc Lite: A lightweight architecture for tomato ripeness detection and picking point localization in greenhouse robotic harvesting
Rajmeet Singh, Manveen Kaur, Shahpour Alirezaee, Irfan Hussain
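AI summary: Proposes YOLO26-RipeLoc Lite, a lightweight YOLO26-based architecture that combines a lightweight feature pyramid network, a ripeness-aware attention module, and a compact detection head to perform ripeness classification and center-point localization of greenhouse tomatoes, reaching 92.9% mAP@0.5 with only 2.38M parameters.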
In greenhouse tomato production, automated harvesting requires accurate detection of ripe tomatoes, ripeness classification, and precise picking-point localization for robotic end-effectors. This paper proposes YOLO26-RipeLoc Lite, a lightweight deep learning architecture based on YOLO26 for simultaneous detection, ripeness classification, and center-point localization of greenhouse tomatoes. The model introduces three modifications: (1) a Lightweight Feature Pyramid Network (LFPN) that uses depthwise separable convolutions for efficient multi-scale fusion, (2) a Ripeness-Aware Attention Module (RAAM) with dual pooling and a learnable ripeness bias vector for better color-texture discrimination, and (3) a Compact Detection Head (CDH) with shared convolutions and an integrated center-point regression branch for direct grasp planning. The model is evaluated on a custom dataset of 1,500 images containing 6,227 instances (3,566 ripe, 2,661 unripe) from the SILAL greenhouse in Abu Dhabi, UAE. With only 2.38M parameters, YOLO26-RipeLoc Lite achieves an mAP@0.5 of 92.9% (95.2% ripe, 90.6% unripe) and the highest precision (95.2%) among all evaluated architectures. Post-training BatchNorm pruning at a 30% ratio reduces the parameter count to about 1.8M with negligible accuracy loss. Ablation studies show that greenhouse-aware HSV augmentation provides the largest improvement (+2.02 pp mAP@0.5), backbone freezing achieves peak precision (93.8%), and three-phase progressive unfreezing yields the best localization quality (mAP@0.5:0.95 of 64.6%). Comparisons with YOLOv8n/s, YOLO11n/s, YOLO12n/s, and YOLO26s confirm a superior accuracy-efficiency trade-off: 2.9 pp higher precision than YOLO12n with 7.0% fewer parameters, plus integrated center-point localization for robotic end-effector guidance.
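The parameter savings behind the LFPN's depthwise separable convolutions can be checked with simple arithmetic. This is a minimal sketch that ignores BatchNorm parameters and assumes a hypothetical 128-to-128-channel 3×3 neck layer; the paper's actual channel widths are not given in the abstract. A standard convolution needs c_in·c_out·k² weights, while a depthwise separable one needs c_in·k² + c_in·c_out.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weight count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a pointwise 1 x 1 conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical neck layer: 128 -> 128 channels with a 3x3 kernel.
    std = standard_conv_params(128, 128, 3)
    dws = depthwise_separable_params(128, 128, 3)
    print(std, dws, round(std / dws, 2))  # 147456 17536 8.41
```

Without bias terms, the separable-to-standard weight ratio is 1/c_out + 1/k², so for 3×3 kernels and wide layers the saving approaches roughly 9×.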
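The abstract describes the RAAM only as "dual pooling" plus "a learnable ripeness bias vector". One plausible reading is sketched below as a CBAM-style channel attention in plain Python. Average-pooled and max-pooled channel descriptors pass through a shared two-layer MLP. A per-channel bias vector (the hypothetical `ripeness_bias`) is then added to the summed logits, and a sigmoid turns them into channel weights. Where the bias enters, the reduction ratio, and every name here are assumptions, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def shared_mlp(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """C -> C/r -> C bottleneck shared by both pooled descriptors."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]  # ReLU
    return matvec(w2, hidden)


def ripeness_aware_attention(feat: List[Matrix], w1: Matrix, w2: Matrix,
                             ripeness_bias: Vector) -> List[Matrix]:
    """Reweight each channel of a C x H x W feature map (assumed RAAM-like design)."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]  # global average pooling
    mx = [max(map(max, ch)) for ch in feat]                             # global max pooling
    logits = [a + m + b for a, m, b in
              zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2), ripeness_bias)]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]              # sigmoid gate per channel
    return [[[v * w for v in row] for row in ch] for ch, w in zip(feat, weights)]


if __name__ == "__main__":
    feat = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0 (2x2)
            [[0.5, 0.5], [0.5, 0.5]]]   # channel 1 (2x2)
    w1 = [[0.1, -0.2]]                  # C=2 -> hidden=1 (reduction ratio 2)
    w2 = [[0.3], [0.4]]                 # hidden=1 -> C=2
    bias = [1.0, -1.0]                  # hypothetical learned ripeness bias
    print(ripeness_aware_attention(feat, w1, w2, bias))
```

Because the bias is added before the sigmoid, it acts as a learned per-channel prior. For example, it could favor channels that respond to red hues independently of the current image's pooled statistics.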
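The CDH's center-point branch regresses a picking point directly, but the abstract does not specify its parameterization. The hypothetical decoder below assumes a YOLO-style scheme: each grid cell predicts two raw offset logits, a sigmoid keeps the point inside the cell, and the feature-map stride converts the result to input-image pixels.

```python
import math
from typing import Tuple


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def decode_picking_point(cell_x: int, cell_y: int, stride: int,
                         dx_logit: float, dy_logit: float) -> Tuple[float, float]:
    """Map a grid cell and two raw offset logits to an (x, y) pixel coordinate.

    Assumed YOLO-style parameterization: the sigmoid keeps the point inside its cell,
    and the stride converts feature-map units to input-image pixels.
    """
    x = (cell_x + sigmoid(dx_logit)) * stride
    y = (cell_y + sigmoid(dy_logit)) * stride
    return x, y


if __name__ == "__main__":
    # Cell (10, 5) on a stride-8 feature map; zero logits place the point at the cell center.
    print(decode_picking_point(10, 5, 8, 0.0, 0.0))  # (84.0, 44.0)
```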
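Post-training BatchNorm pruning is often done in the network-slimming style, where channels with the smallest BN scale factor |γ| are treated as least important. This is a minimal sketch under two assumptions: a single global ranking across layers, and the paper's 30% ratio. It only computes keep-masks. In a real YOLO model, the matching convolution filters and downstream input channels would also have to be physically removed. Layer names and γ values are hypothetical.

```python
from typing import Dict, List


def bn_prune_masks(gammas: Dict[str, List[float]], ratio: float = 0.3) -> Dict[str, List[bool]]:
    """Keep-masks that drop the `ratio` fraction of channels with the smallest |gamma|.

    Channels are ranked globally across all BatchNorm layers (network-slimming style).
    """
    masks = {name: [True] * len(layer) for name, layer in gammas.items()}
    ranked = sorted((abs(g), name, i) for name, layer in gammas.items() for i, g in enumerate(layer))
    n_prune = int(len(ranked) * ratio)
    for _, name, i in ranked[:n_prune]:
        masks[name][i] = False
    # Never empty a layer completely: restore its strongest channel.
    for name, layer in gammas.items():
        if layer and not any(masks[name]):
            best = max(range(len(layer)), key=lambda j: abs(layer[j]))
            masks[name][best] = True
    return masks


if __name__ == "__main__":
    gammas = {  # hypothetical gamma values for two BatchNorm layers
        "neck.bn1": [0.9, 0.01, 0.5, 0.02, 0.7],
        "head.bn1": [0.03, 0.8, 0.6, 0.4, 0.05],
    }
    masks = bn_prune_masks(gammas, ratio=0.3)
    kept = sum(sum(m) for m in masks.values())
    print(masks, f"kept {kept}/10 channels")
```

A global ranking lets layers with many weak channels shrink more than layers whose channels all carry large γ. The per-layer safeguard prevents a layer from being pruned to zero width.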
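The abstract does not list the settings of its greenhouse-aware HSV augmentation. As an illustration only, the sketch below uses the standard-library `colorsys` module to apply one random hue shift plus saturation and brightness gains per image. The example gains are hypothetical. Hue is kept small on the assumption that large hue shifts could push red (ripe) fruit toward green (unripe) and corrupt the labels.

```python
import colorsys
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # RGB, 0-255


def hsv_jitter(pixels: List[Pixel], h_gain: float, s_gain: float, v_gain: float,
               rng: random.Random) -> List[Pixel]:
    """Apply one random HSV perturbation, shared by every pixel of an image."""
    dh = rng.uniform(-h_gain, h_gain)            # additive hue shift (fraction of the hue circle)
    s_mul = 1.0 + rng.uniform(-s_gain, s_gain)   # multiplicative saturation gain
    v_mul = 1.0 + rng.uniform(-v_gain, v_gain)   # multiplicative brightness gain
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        h = (h + dh) % 1.0
        s = min(max(s * s_mul, 0.0), 1.0)
        v = min(max(v * v_mul, 0.0), 1.0)
        r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
        out.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
    return out


if __name__ == "__main__":
    ripe, unripe = (200, 30, 40), (90, 150, 60)  # illustrative red and green pixels
    # Hypothetical "greenhouse-aware" gains: tiny hue shift, wider saturation/brightness range.
    print(hsv_jitter([ripe, unripe], h_gain=0.01, s_gain=0.5, v_gain=0.4, rng=random.Random(0)))
```

Wide saturation and brightness ranges simulate changing greenhouse lighting and shading, while the narrow hue range preserves the color cue that separates the two ripeness classes.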