2603.09344
2026-06-18
cs.AI
stat.ML
Version update
70%
Robust Regularized Policy Iteration under Transition Uncertainty
Hongqiang Lin, Zhenghui Fu, Weihao Tang, Pengfei Wang, Yiding Sun, Qixian Huang, Dongxu Zhang
Affiliations
*
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
;
School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China
;
School of Software Technology, Zhejiang University, Hangzhou, China
;
School of Software Engineering, Xi'an Jiaotong University, Xi'an, China
;
School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou, China
Topic match
Planning and reasoning
: robust policy iteration for offline reinforcement learning
AI summary
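Proposes Robust Regularized Policy Iteration (RRPI), which formulates offline reinforcement learning as robust policy optimization, uses KL regularization to replace the intractable bilevel objective, and performs efficient policy iteration based on a robust regularized Bellman operator. Convergence is guaranteed theoretically, and experiments show strong performance on the D4RL benchmark.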