2605.16679
2026-05-20
cs.CL
cs.AI
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
Haolin Chen, Deon Metelski, Leon Qi, Tao Xia, Joonyul Lee, Steve Brown, Kevin Riley, Frank Wang, T. Y. Alvin Liu, Hank Capps MD, Zeyu Tang, Xiangchen Song, Lingjing Kong, Fan Feng, Tianyi Zeng, Zhiwei Liu, Zixian Ma, Hang Jiang, Fangli Geng, Yuan Yuan, Chenyu You, Qingsong Wen, Hua Wei, Yanjie Fu, Yue Zhao, Carl Yang, Biwei Huang, Kun Zhang, Caiming Xiong, Sanmi Koyejo, Eric P. Xing, Philip S. Yu, Weiran Yao
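AI Summary
This paper introduces CHI-Bench, a benchmark for evaluating how well AI agents can automate end-to-end, long-horizon, policy-rich tasks in healthcare workflows. It exposes gaps in current benchmarks around policy density, multi-role collaboration, and multi-party interaction.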