ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents
Affiliation: Alibaba International Digital Commercial Group
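Topic match: Software agents. The paper proposes a shopping benchmark for evaluating LLM agents, which places it under software agents.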
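AI summary: Proposes the ShoppingBench benchmark, which contains real-world shopping-intent tasks at multiple levels of difficulty and evaluates LLM agents in a simulated environment with 2.5 million products. Even GPT-4.1 achieves a success rate below 50%. The authors also propose a trajectory distillation strategy that improves the performance of smaller models.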
Comments: Accepted for oral presentation at AAAI 2026