Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning
Shaojie Zhang, Pei Fu, Ruoceng Zhang, Jiahui Yang, Anan Du, Xiuwen Xi, Shaokang Wang, Ying Huang, Bin Qin, Zhenbo Luo, Jian Luan
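AI summary: Proposes the HyperClick framework, which uses self-critiqued reinforcement learning to jointly optimize grounding accuracy and confidence reliability, achieving trustworthy GUI grounding.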
Autonomous graphical user interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement learning (RL), often provide confidence signals that are poorly aligned with actual grounding correctness, leading to overconfident and unreliable predictions. To address this, we propose HyperClick, a novel framework that enhances trustworthy GUI grounding through self-critiqued reinforcement learning (SCRL). HyperClick combines a correctness reward and a confidence alignment reward, training the policy model to output both a click prediction and an explicit confidence estimate. This approach jointly optimizes grounding accuracy and confidence reliability through confidence-based self-assessment. Extensive experiments on challenging benchmarks show that HyperClick maintains strong grounding performance while providing better-aligned confidence estimates. By exposing uncertainty alongside GUI actions, HyperClick supports confidence-based abstention in GUI automation. Code will be released here.
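The abstract does not give the reward formulas, so the following is only a minimal sketch under stated assumptions, not HyperClick's actual implementation. It assumes the correctness reward is a binary hit test of the predicted click against the target element's bounding box, that the confidence alignment reward is one minus the squared (Brier) error between the stated confidence and the 0/1 outcome, and that the two are summed with a hypothetical weight `alpha`. Abstention is modeled as a simple threshold on the stated confidence. All names (`BBox`, `self_critiqued_reward`, `act_or_abstain`, `alpha`, `threshold`) are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class BBox:
    """Hypothetical ground-truth target element in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def correctness_reward(click: Point, target: BBox) -> float:
    """Assumed form: 1.0 if the click lands inside the target element, else 0.0."""
    return 1.0 if target.contains(*click) else 0.0


def confidence_alignment_reward(confidence: float, correct: float) -> float:
    """Assumed form: 1 minus the Brier error between confidence and the 0/1 outcome.

    High confidence on a hit and low confidence on a miss both score near 1.
    """
    c = min(max(confidence, 0.0), 1.0)  # clamp the model's stated confidence to [0, 1]
    return 1.0 - (c - correct) ** 2


def self_critiqued_reward(click: Point, confidence: float, target: BBox,
                          alpha: float = 0.5) -> float:
    """Hypothetical combined reward: correctness plus alpha-weighted confidence alignment."""
    r_correct = correctness_reward(click, target)
    r_conf = confidence_alignment_reward(confidence, r_correct)
    return r_correct + alpha * r_conf


def act_or_abstain(click: Point, confidence: float,
                   threshold: float = 0.7) -> Optional[Point]:
    """Confidence-based abstention: return None instead of clicking when unsure."""
    return click if confidence >= threshold else None


if __name__ == "__main__":
    target = BBox(10, 10, 50, 30)
    print(self_critiqued_reward((20, 20), 0.9, target))    # confident hit
    print(self_critiqued_reward((100, 100), 0.9, target))  # overconfident miss
    print(self_critiqued_reward((100, 100), 0.1, target))  # honest miss
    print(act_or_abstain((20, 20), 0.5))                   # abstains
```

Under this assumed scoring, a wrong click reported with confidence 0.1 earns more reward (0.495) than the same wrong click reported with confidence 0.9 (0.095), which illustrates the pressure toward aligned rather than overconfident estimates described in the abstract. The Brier-style term is one common choice for rewarding calibration; the paper may use a different formulation.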