Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
Nasehatul Mustakim, Lucas Lehnert
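AI summary: This paper proposes a theoretical model that extends the state abstraction framework to POMDPs and defines a successor-weighted model reduction, enabling cross-scale generalization in reinforcement learning agents, and it analyzes how the size of the abstract state space affects generalization.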
While humans readily generalize abstract concepts to more complex or larger tasks, building Reinforcement Learning (RL) systems with this ability remains elusive. Here, we present the first theoretical model of how such Out-of-Distribution (OOD) generalization can be achieved in RL agents. Our approach considers Partially Observable Markov Decision Processes (POMDPs) and assumes that an intelligent agent uses an abstraction function to determine which experiences can be treated as equivalent and which must be distinguished. First, we extend the existing state abstraction framework and proof techniques to POMDPs. Then, we define a successor-weighted model reduction, a model reduction variant that enables compression into smaller abstract spaces than prior definitions allow. We derive a bound on the agent's OOD test performance, thereby defining the conditions under which OOD generalization is achievable. This bound decomposes an agent's performance loss into approximation and estimation errors, revealing how reducing an agent's abstract state space size improves test performance and OOD generalization. Our analysis suggests that constraining an agent to operate over a small, finite set of abstract states is necessary for achieving generalization to more complex tasks. Our results motivate further research into learning RL architectures that scale across tasks of varying complexity levels.