PersonalPlan: Planning Multi-Agent Systems for Personalized Programming Learning
Zhiyuan Wen, Jiannong Cao, Peng Gao, Haochen Shi, Wengpan Kuan, Bo Yuan, Xiuxiu Qi
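AI summary: Proposes PersonalPlan, a two-stage multi-agent planner that uses hierarchical SFT and Reward-Adaptive GRPO to generate executable, personalized, and pedagogically scaffolded plans, outperforming existing methods on the MAP-PPL dataset.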
Effective programming education requires personalized instruction adapted to diverse learner backgrounds. However, while LLM-based multi-agent systems (MAS) excel at complex planning, existing planners often lack profile grounding and pedagogical scaffolding, which undermines personalized programming learning. To fill this gap, we first introduce MAP-PPL (Multi-Agent Plans for Personalized Programming Learning), a profile-conditioned multi-agent planning dataset with 3,043 query-profile-plan instances built from 1,730 Stack Overflow question groups and 2,738 learner profiles. Each plan specifies agents, subtasks, executable steps, and prerequisite dependencies. We then propose PersonalPlan, a two-stage MAS planner that first performs hierarchical SFT with separate LoRA adapters for profile-aware task decomposition and step dependency planning, then applies a Reward-Adaptive GRPO to encourage the model to generate executable, personalized, and pedagogically scaffolded plans. Extensive experiments on MAP-PPL compare PersonalPlan against frontier LLMs, generic MAS frameworks, and agentic planners and demonstrate its superiority. With only 8B and 32B variants, PersonalPlan achieves state-of-the-art plan executability, personalization, and pedagogical quality, effectively orchestrating MAS for agent-student interactions.
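To make the plan structure concrete, the following is a minimal sketch of what a plan with agents, executable steps, and prerequisite dependencies might look like, together with one simple reading of "executable": every dependency refers to a known step and the dependency graph has no cycle. The class names, field names, agent names, and the example query are all hypothetical and chosen for illustration; the abstract does not specify the MAP-PPL schema or how executability is actually measured.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Step:
    """One executable step assigned to an agent (hypothetical schema)."""
    step_id: str
    agent: str
    action: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Plan:
    """A query-profile-plan instance (field names are assumptions)."""
    query: str
    profile: dict[str, str]
    steps: list[Step]


def execution_order(plan: Plan) -> list[str]:
    """Return step ids in an order that respects prerequisites.

    Raises ValueError if a step depends on an unknown step or if the
    dependencies form a cycle, i.e. the plan cannot be executed as written.
    """
    known_ids = {step.step_id for step in plan.steps}
    graph: dict[str, set[str]] = {}
    for step in plan.steps:
        missing = [dep for dep in step.depends_on if dep not in known_ids]
        if missing:
            raise ValueError(f"{step.step_id} depends on unknown steps: {missing}")
        graph[step.step_id] = set(step.depends_on)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("plan has cyclic prerequisites") from exc


# Hypothetical example: a beginner asking about a Python IndexError.
example_plan = Plan(
    query="Why does my loop raise IndexError?",
    profile={"level": "beginner", "language": "Python"},
    steps=[
        Step("s1", "ConceptTutor", "Explain zero-based list indexing"),
        Step("s2", "Debugger", "Trace the loop bounds in the learner's code", ["s1"]),
        Step("s3", "Coach", "Assign a short practice exercise", ["s2"]),
    ],
)

if __name__ == "__main__":
    print(execution_order(example_plan))
```

Under these assumptions, a plan whose steps form a dependency chain yields a single valid ordering, while a plan with a circular or dangling prerequisite is rejected before any agent is invoked.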