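arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.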

2605.14344 2026-05-18 cs.AI

CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation

Yuyang Wu, Stefano Falletta, Delia McGrath, Sherry Yang

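AI summary: CrystalReasoner introduces physical priors and reinforcement learning to generate stable crystal structures with specified properties from natural-language instructions, improving generation accuracy and scientific plausibility.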

Comments Our work is available at https://crystalreasoner.github.io/, with code at https://github.com/wyy603/CrystalReasoner

Scaling Laws for Mixture Pretraining

Anastasiia Sedova, Skyler Seto, Natalie Schluter, Pierre Ablin

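AI summary: The paper studies scaling laws for mixture pretraining under data constraints, finds that repetition is a central driver of target-domain performance, and proposes a repetition-aware mixture scaling law for optimizing pretraining configurations.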

Abstract

As language models scale, the amount of data they require grows -- yet many target data sources, such as low-resource languages or specialized domains, are inherently limited in size. A common strategy is to mix this scarce but valuable target data with abundant generic data, which presents a fundamental trade-off: too little target data in the mixture underexposes the model to the target domain, while too much target data repeats the same examples excessively, yielding diminishing returns and eventual overfitting. We study this trade-off across more than 2,000 language-model training runs spanning multiple model and target dataset sizes, as well as several data types, including multilingual, domain-specific, and quality-filtered mixtures. Across all settings, we find that repetition is a central driver of target-domain performance, and that mixture training tolerates much higher repetition than single-source training: scarce target corpora can be reused 15-20 times, with the optimal number of repetitions depending on the target data size, compute budget, and model scale. Next, we introduce a repetition-aware mixture scaling law that accounts for the decreasing value of repeated target tokens and the regularizing role of generic data. Optimizing the scaling law provides a principled way to compute effective mixture configurations, yielding practical mixture recommendations for pretraining under data constraints.
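The trade-off described above comes down to simple bookkeeping: for a fixed training budget, the share of target data in the mixture determines how many times the scarce corpus is repeated. The sketch below shows only that arithmetic. It is not the paper's scaling law, and the function names and example numbers are illustrative assumptions.

```python
def target_repetitions(total_tokens, target_fraction, target_corpus_tokens):
    """Number of passes over the target corpus implied by a mixture.

    total_tokens: training budget in tokens (assumed example quantity).
    target_fraction: share of training tokens drawn from the target corpus.
    target_corpus_tokens: size of the scarce target corpus in tokens.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must lie in [0, 1]")
    if target_corpus_tokens <= 0:
        raise ValueError("target_corpus_tokens must be positive")
    return total_tokens * target_fraction / target_corpus_tokens


def fraction_for_repetitions(total_tokens, repetitions, target_corpus_tokens):
    """Mixture share needed to repeat the target corpus a given number of times."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return min(1.0, repetitions * target_corpus_tokens / total_tokens)


if __name__ == "__main__":
    budget = 100_000_000_000   # hypothetical 100B-token training run
    corpus = 1_000_000_000     # hypothetical 1B-token target corpus
    print(target_repetitions(budget, 0.20, corpus))
    print(fraction_for_repetitions(budget, 15, corpus))
```

With these made-up numbers, a 15% to 20% target share corresponds to the 15-20 repetitions the abstract reports as tolerable in mixture training. The best share for a real run depends on target data size, compute budget, and model scale, as the abstract notes.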

2605.12309 2026-05-18 cs.CV

Controlling Transient Amplification Improves Long-Horizon Rollouts

Adeel Pervez, Francesco Locatello

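AI summary: The paper identifies transient amplification as a cause of long-horizon rollout error and proposes commutativity regularization, which improves long-horizon rollouts by reducing the normality defect of the Jacobians and their commutator norm.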

Abstract

Autoregressive neural simulators now match classical solvers on short-horizon prediction of physical systems, yet their accuracy degrades rapidly when rolled out over long horizons. In this work, we identify transient amplification of perturbations around rollout trajectories as a structural mechanism driving rollout error. Using a linearization analysis we show that when the Jacobians along an autoregressive trajectory are non-normal and non-commuting, the model amplifies errors transiently, resulting in model rollout drift even when the overall system is asymptotically stable. Building on the analysis, we propose commutativity regularization: a combination of two penalties designed to reduce the normality defect of individual Jacobians and the commutator norm of Jacobians across steps. The penalties are estimated with Jacobian-vector products and have no inference-time cost. We show a propagator bound that quantifies rollout error under approximate commutativity and normality. We evaluate UNet and FNO variants with commutativity regularization on 1D and 2D spatio-temporal data in synthetic and real settings, showing successful long-horizon rollouts over thousands of steps. Further, we show that the method improves FourCastNet climate forecasts on ERA5 without using any new data. The gain is most pronounced out-of-distribution: trained on trajectories of a few hundred steps, regularized models remain in-distribution for thousands of rollout steps on initial conditions where baselines diverge.
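A small numerical sketch may help make the mechanism concrete. It uses explicit 2x2 matrices and Frobenius norms, both of which are assumptions made for illustration; the paper estimates its penalties with Jacobian-vector products and may define them differently. The example matrix has both eigenvalues equal to 0.9, so it is asymptotically stable, yet its powers grow before they decay because it is non-normal.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def difference(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))


def normality_defect(j):
    """||J J^T - J^T J||_F, which is zero exactly when J is normal."""
    jt = transpose(j)
    return frobenius(difference(matmul(j, jt), matmul(jt, j)))


def commutator_norm(a, b):
    """||A B - B A||_F, which is zero exactly when A and B commute."""
    return frobenius(difference(matmul(a, b), matmul(b, a)))


def propagator_norms(j, steps):
    """Frobenius norms of J, J^2, ..., J^steps (a constant-Jacobian rollout)."""
    n = len(j)
    p = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    norms = []
    for _ in range(steps):
        p = matmul(j, p)
        norms.append(frobenius(p))
    return norms


# Illustrative matrices (assumed, not taken from the paper).
stable_nonnormal = [[0.9, 5.0], [0.0, 0.9]]
stable_normal = [[0.9, 0.0], [0.0, 0.5]]

if __name__ == "__main__":
    norms = propagator_norms(stable_nonnormal, 200)
    print("peak growth:", max(norms), "final:", norms[-1])
    print("normality defect:", normality_defect(stable_nonnormal))
    print("commutator norm:", commutator_norm(stable_normal, stable_nonnormal))
```

The diagonal matrix is normal, so its defect is zero and its powers shrink monotonically. The non-normal one shows the transient growth that the two penalties are meant to suppress.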

2605.08464 2026-05-18 cs.LG

The Geometric Structure of Models Learning Sparse Data

Thomas Walker, T. Mitchell Roddenberry, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

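AI summary: The paper studies how models learn successfully in the sparse regime by exploiting local geometric structure, introduces the notion of normal alignment, proposes the GrokAlign regularization strategy to speed up deep-network training, and extends Recursive Feature Machines to improve adversarial robustness.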

Comments 27 pages, 7 figures, 5 tables

Abstract

The manifold hypothesis (MH) is often used to explain how machine learning can overcome the curse of dimensionality. However, the MH is only applicable in regimes where the training data provides a sufficiently dense sample of the underlying low-dimensional data manifold, or where such a low-dimensional manifold is conceivably present. We describe the regimes where the MH is not applicable as sparse. In this paper, we demonstrate that models succeed in the sparse regime by exploiting a highly structured local geometry, a property we formalize as normal alignment. We prove that normal-aligned classifiers -- whose input-output Jacobians are rank-one and align perfectly with the training data -- minimize the training objective under norm constraints and achieve maximal local robustness under a non-zero Jacobian constraint. For continuous piecewise-affine deep networks, normal alignment manifests geometrically as centroid alignment within the network's induced power diagram partition and results from the feature-learning regime. Motivated by these theoretical insights, we introduce GrokAlign, a regularization strategy that actively induces normal alignment. We demonstrate that GrokAlign significantly accelerates the training dynamics of deep networks relevant to the grokking phenomenon. Furthermore, we apply the principle of normal alignment to Recursive Feature Machines (RFMs) to introduce Recursive Feature Alignment Machines (RFAMs). We show that RFAMs exhibit greater adversarial robustness compared to RFMs when trained on tabular data.

2605.08401 2026-05-18 cs.CL cs.AI

AIPO: Learning to Reason from Active Interaction

Junnan Liu, Linhao Luo, Thuy-Trang Vu, Gholamreza Haffari

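AI summary: AIPO improves LLM reasoning through active multi-agent interaction, introducing three collaborative agents that the policy model consults at reasoning bottlenecks, which improves exploration efficiency and expands the capability boundary.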

Comments Preprint

Abstract

Recent advances in large language models (LLMs) have demonstrated remarkable reasoning capabilities, largely stimulated by Reinforcement Learning with Verifiable Rewards (RLVR). However, existing RL algorithms face a fundamental limitation: their exploration remains largely constrained by the inherent capability boundary of the policy model. Although recent methods introduce external expert demonstrations to extend this boundary, they typically rely on complete trajectory-level guidance, which is sample-inefficient, information-sparse, and may confine exploration to a static guidance space. Inspired by the potential of multi-agent systems, we propose AIPO, an enhanced reinforcement learning framework that improves LLM reasoning through active multi-agent interaction during exploration. Specifically, AIPO enables the policy model to proactively consult three functional collaborative agents, Verify Agent, Knowledge Agent, and Reasoning Agent, when encountering reasoning bottlenecks, thereby receiving fine-grained and targeted guidance to actively expand its capability boundary during training. We further introduce a tailored importance sampling coefficient together with a clipping strategy to mitigate the off-policy bias and gradient vanishing issues that arise when learning from agent-provided feedback. After training, the policy model performs reasoning independently without relying on collaborative agents. Extensive experiments on diverse reasoning benchmarks, including AIME, MATH500, GPQA-Diamond, and LiveCodeBench, show that AIPO consistently improves reasoning performance, generalizes robustly across different policy models and RLVR algorithms, and effectively expands the reasoning capability boundary of the policy model.

2605.06475 2026-05-18 cs.AI cs.CV

Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features

Ranjith Chodavarapu

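AI summary: The paper proposes a deep regression method based on visual features for dating historical manuscripts; it decomposes uncertainty to improve prediction accuracy, and experiments show strong performance on the test set.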

Abstract

We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Instead of aggregating centuries into classes as is standard in the previous literature, we pose dating as an evidential deep regression problem over a continuous year axis, allowing our neural network to output a full predictive distribution with decomposed aleatoric and epistemic uncertainty in a single forward pass. Our architecture combines an EfficientNet-B2 backbone with a Normal-Inverse-Gamma (NIG) output head trained with a joint negative-log-likelihood and evidence-regularization objective. On the DIVA-HisDB benchmark (150 pages, 3 medieval codices, 151,936 patches), our model scores a test MAE of 5.4 years, well below the 50-year century-label supervision granularity, with 93% of patches within 5 years and 97% within 10 years. Our approach achieves PICP=92.6%, the best calibration among all compared methods, in a single forward pass, outperforming MC Dropout (PICP=88.2%, 50 passes) and Deep Ensembles (PICP=79.7%, 5 models) at 5× lower inference cost. Uncertainty decomposition shows aleatoric uncertainty is a strong predictor of dating error (Spearman ρ=0.729), and a selective prediction about the most certain 20% of patches can provide 0.5 years MAE. We show that predicted uncertainty increases as image degradation worsens, spatial decomposition maps explain which script regions cause aleatoric uncertainty, and page-level aggregation reduces MAE to 4.5 years with ρ=0.905 between uncertainty and page-level error.
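The quantities named in the abstract can be sketched in a few lines. The decomposition below uses the standard evidential-regression formulas for a Normal-Inverse-Gamma head with parameters (gamma, nu, alpha, beta): aleatoric variance beta/(alpha-1) and epistemic variance beta/(nu(alpha-1)). Whether the paper uses exactly these definitions is an assumption, and the helper names and toy numbers are hypothetical.

```python
def nig_uncertainty(gamma, nu, alpha, beta):
    """Split a Normal-Inverse-Gamma output into prediction and uncertainties.

    Returns (predicted_year, aleatoric_variance, epistemic_variance).
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("require nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)          # expected data noise E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # variance of the mean Var[mu]
    return gamma, aleatoric, epistemic


def picp(targets, lowers, uppers):
    """Prediction interval coverage probability: share of targets inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(targets, lowers, uppers) if lo <= y <= hi)
    return hits / len(targets)


def selective_mae(targets, predictions, uncertainties, keep_fraction):
    """MAE over the most certain `keep_fraction` of samples."""
    order = sorted(range(len(targets)), key=lambda i: uncertainties[i])
    k = max(1, int(round(keep_fraction * len(order))))
    kept = order[:k]
    return sum(abs(targets[i] - predictions[i]) for i in kept) / k


if __name__ == "__main__":
    # Toy values, not results from the paper.
    print(nig_uncertainty(1200.0, 2.0, 3.0, 8.0))
    years = [1200, 1300, 1400, 1500, 1600]
    preds = [1201, 1310, 1400, 1550, 1600]
    sigma = [1.0, 5.0, 0.5, 9.0, 2.0]
    print(selective_mae(years, preds, sigma, 0.4))
```

Selective prediction as sketched here simply ranks samples by predicted uncertainty and scores the most certain ones. That is the usual reading of the abstract's "most certain 20% of patches" figure.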

2605.06223 2026-05-18 cs.AI cs.RO

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok

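AI summary: ProCompNav resolves ambiguous user queries with a two-stage framework that narrows the candidate set step by step through comparative judgment, raising navigation success rate and shortening user responses.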

Comments Project page: https://tree-jhk.github.io/procompnav/ . Code: https://github.com/tree-jhk/procompnav/

Abstract

Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors. Code is available at https://github.com/tree-jhk/procompnav.
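The pool-level questioning loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidate representation, the rule of preferring the most balanced split, and the simulated user are all assumptions introduced here.

```python
def best_split(pool):
    """Pick an (attribute, value) pair that divides the pool as evenly as possible.

    Pairs that every candidate or no candidate satisfies cannot prune anything
    and are skipped. The balance criterion is an assumption of this sketch.
    """
    pairs = sorted({(a, v) for cand in pool for a, v in cand["attrs"].items()})
    best, best_gap = None, None
    for attr, value in pairs:
        yes = sum(1 for cand in pool if cand["attrs"].get(attr) == value)
        no = len(pool) - yes
        if yes == 0 or no == 0:
            continue
        gap = abs(yes - no)
        if best_gap is None or gap < best_gap:
            best, best_gap = (attr, value), gap
    return best


def disambiguate(pool, answer_fn):
    """Ask yes/no questions until one candidate remains or nothing splits the pool."""
    pool = list(pool)
    asked = []
    while len(pool) > 1:
        pair = best_split(pool)
        if pair is None:
            break
        attr, value = pair
        is_yes = answer_fn(attr, value)
        asked.append(pair)
        # Prune every candidate inconsistent with the answer in one step.
        pool = [c for c in pool if (c["attrs"].get(attr) == value) == is_yes]
    return pool, asked


# Hypothetical candidate pool: four chairs that differ in colour and room.
candidates = [
    {"id": 0, "attrs": {"color": "red", "room": "kitchen"}},
    {"id": 1, "attrs": {"color": "red", "room": "bedroom"}},
    {"id": 2, "attrs": {"color": "blue", "room": "kitchen"}},
    {"id": 3, "attrs": {"color": "blue", "room": "bedroom"}},
]
target = candidates[1]


def simulated_user(attr, value):
    """Stand-in for the user: answers truthfully about the target."""
    return target["attrs"].get(attr) == value


if __name__ == "__main__":
    remaining, questions = disambiguate(candidates, simulated_user)
    print(remaining, questions)
```

Because each answer is a single yes or no and removes every inconsistent candidate at once, the number of questions grows with the logarithm of the pool size when balanced splits exist. This is one way to read the abstract's claim of shorter user responses.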

2605.05652 2026-05-18 cs.LG

Gradient-Disagreement Acquisition for Pool-Based Active Learning

Mohamadsadegh Khosravani, Sandra Zilles

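AI summary: The paper proposes gradient-based acquisition criteria that can replace the uncertainty measure in uncertainty sampling or be integrated into diversity methods that account for both the distribution of sampled points and label uncertainty; theory and experiments support their effectiveness.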

2605.00934 2026-05-18 cs.LG cs.CV stat.ML

Structured Analytic Coherent Point Drift for Non-Rigid Point Set Registration

Wei Feng, Haiyong Zheng

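AI summary: The paper proposes Analytic-CPD, which improves on standard CPD with structured analytic mappings for more efficient and controllable non-rigid point set registration; experiments on several datasets support its effectiveness and its accuracy-efficiency advantage.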

Comments Revised version. Supplementary material incorporated as appendices; method, implementation, and experimental details expanded

Abstract

Coherent Point Drift (CPD) is a representative probabilistic framework for unsupervised non-rigid point set registration. Its standard non-rigid M-step, however, relies on a point-indexed Gaussian-kernel system whose size grows with the number of moving points, making deformation estimation computationally heavy for large point sets and difficult to control in complexity during registration. To address these limitations, we propose Analytic-CPD, a new unsupervised non-rigid registration framework that gives CPD a structured analytic reformulation. Analytic-CPD preserves the CPD posterior correspondence layer, but lifts the M-step from point-indexed kernel displacement estimation to structured analytic mapping estimation. By coupling the Gaussian-mixture posterior mechanism of CPD with Structured Analytic Mappings (SAM), the method obtains a deformation model whose coefficient dimension is governed by the ambient dimension and analytic order rather than by the number of moving points. More importantly, deformation estimation is organized over an interpretable hierarchy of analytic function spaces, so the analytic order can be increased progressively as posterior correspondences become more reliable. We implement this idea through an increasing-degree continuation strategy with decreasing stage lengths: low-order analytic maps first stabilize the posterior correspondence structure, while higher-order modes later refine nonlinear residual deformation. Experiments on controlled model-matched, smooth model-mismatch, and registered human-shape data demonstrate the effectiveness and favorable accuracy--efficiency performance of Analytic-CPD.
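To see why a coefficient count tied to dimension and order matters, the sketch below compares sizes under a stand-in assumption: a polynomial map of total degree at most k in d dimensions, which has d * C(d+k, k) coefficients. The abstract does not specify the actual SAM basis, so the polynomial basis, the halving stage schedule, and all names here are hypothetical.

```python
from math import comb


def polynomial_map_coefficients(dim, order):
    """Coefficients of a map R^dim -> R^dim with monomials of total degree <= order."""
    return dim * comb(dim + order, order)


def kernel_system_entries(num_moving_points):
    """Entries of the point-indexed Gaussian-kernel (Gram) matrix in standard CPD."""
    return num_moving_points * num_moving_points


def continuation_schedule(max_order, first_stage_iters, decay=0.5, min_iters=1):
    """Increasing-degree stages with decreasing lengths: [(order, iterations), ...].

    The geometric decay is an assumption; the abstract only states that stage
    lengths decrease as the order increases.
    """
    schedule = []
    iters = float(first_stage_iters)
    for order in range(1, max_order + 1):
        schedule.append((order, max(min_iters, int(iters))))
        iters *= decay
    return schedule


if __name__ == "__main__":
    print(polynomial_map_coefficients(3, 3))   # independent of the point count
    print(kernel_system_entries(10_000))       # grows quadratically with points
    print(continuation_schedule(3, 40))
```

Under this stand-in, a cubic map in 3D needs a few dozen coefficients however many points move, while the kernel system for ten thousand points has on the order of 10^8 entries.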

2605.00674 2026-05-18 cs.CL

From Wikimedia Data to Training Corpora: The South Slavic Case

Mihailo Škorić, Cosimo Palma

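AI summary: The paper presents a pipeline that turns Wikimedia dumps into quality corpora for seven South Slavic languages, using text extraction, cleaning, and redundancy filtering to improve corpus quality and to provide a reliable resource for language-model training and cross-lingual comparison.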

Abstract

This paper presents a pipeline designed to transform raw Wikimedia dumps into quality textual corpora for seven South Slavic languages. The work is divided into two major phases. The first involves extracting and cleaning text from raw dumps of Wikipedia, Wikisource, Wikibooks, Wikinews, and Wikiquote. This step requires careful handling of raw wiki markup to isolate, first of all, textual articles, and then usable natural language text within them. The second phase addresses the challenge of questionable or low-quality articles, which are often generated from databases or structured knowledge bases. These articles are characterised by repetitive patterns, generic phrasing, and minimal to no original content. To mitigate their impact, an n-gram-based filtering strategy was employed to detect high levels of textual redundancy between articles and then remove such articles from the corpora entirely. The resulting datasets aim to provide linguistically rich texts suitable for training language models or conducting comparative research across South Slavic languages. By combining systematic extraction with quality control, this work contributes to the creation of reliable, high-information corpora that reflect the authentic cultural contexts of languages. While focused on the South Slavic case in the paper, the approach is mostly language-agnostic and can be generalised to other languages.
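The second phase can be illustrated with a minimal redundancy filter. This is a sketch under stated assumptions rather than the paper's pipeline: the whitespace tokenizer, the trigram size, the overlap measure, and the 0.6 threshold are all hypothetical choices, and the pairwise comparison is quadratic, so it suits only small collections.

```python
def ngrams(text, n=3):
    """Set of word n-grams from lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(a, b):
    """Share of the smaller n-gram set that also appears in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def filter_redundant(articles, n=3, threshold=0.6):
    """Drop every article that is highly redundant with at least one other article."""
    grams = [ngrams(text, n) for text in articles]
    flagged = set()
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if overlap(grams[i], grams[j]) >= threshold:
                flagged.update((i, j))   # both template-like articles are removed
    return [text for k, text in enumerate(articles) if k not in flagged]


if __name__ == "__main__":
    # Invented examples of database-generated stubs versus original prose.
    stub_a = "Selo X is a village in the municipality of Y in Serbia with a population of 120"
    stub_b = "Selo Z is a village in the municipality of Y in Serbia with a population of 340"
    prose = "The river flows north through limestone gorges before joining the Danube near the old fortress"
    print(filter_redundant([stub_a, stub_b, prose]))
```

Both stubs are removed rather than one being kept, which follows the abstract's statement that redundant articles are removed from the corpora entirely.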

2604.21251 2026-05-18 cs.LG cs.AI

CAP: Controllable Alignment Prompting for Unlearning in LLMs

Zhaokun Wang, Jinyu Guo, Jingwen Pu, Hongli Pu, Meng Yang, Xunlei Chen, Jie Ou, Wenyi Li, Guangchun Luo, Wenhong Tian

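AI summary: The paper proposes the CAP framework, which uses reinforcement learning to recast unlearning as a learnable prompt-optimization process, achieving controllable unlearning without updating model parameters and addressing the high computational cost and uncontrollable forgetting boundaries of existing methods.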

Comments Accepted to ACL 2026 Main Conference

Abstract

Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. However, existing parameter-modifying methods face fundamental limitations: high computational costs, uncontrollable forgetting boundaries, and strict dependency on model weight access. These constraints render them impractical for closed-source models, yet current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting for Unlearning (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP decouples unlearning into a learnable prompt optimization process via reinforcement learning, where a prompt generator collaborates with the LLM to suppress target knowledge while preserving general capabilities selectively. This approach enables reversible knowledge restoration through prompt revocation. Extensive experiments demonstrate that CAP achieves precise, controllable unlearning without updating model parameters, establishing a dynamic alignment mechanism that overcomes the transferability limitations of prior methods.
