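arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.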

2605.27366 2026-05-27 cs.AI cs.CL cs.LG cs.MA Updated version

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Huawei Lin, Peng Li, Jie Song, Fuxin Jiang, Tieying Zhang

Affiliations: ByteDance Inc.; Rochester Institute of Technology

AI summary: Proposes the MUSE-Autoskill framework, which lets LLM agents continuously improve their task-solving ability through a unified skill lifecycle (creation, memory, management, evaluation, and refinement). Experiments indicate that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer.

Comments: 30 pages, 8 figures, 13 tables, work in progress

Abstract

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.

2605.27358 2026-05-27 cs.LG cs.AI cs.CL Updated version

MobileMoE: Scaling On-Device Mixture of Experts

Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, Digant Desai, Zechun Liu, Vikas Chandra, Raghuraman Krishnamoorthi

Affiliations: Meta AI

AI summary: Targeting on-device deployment, proposes MobileMoE, a family of MoE language models with sub-billion active parameters. By jointly optimizing the architecture and training with a four-stage recipe, the models match or exceed leading dense and MoE models across 14 benchmarks and run efficiently on smartphones.

Abstract

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes MoE architecture under mobile memory and compute constraints, identifying an on-device sweet spot - moderate sparsity with fine-grained and shared experts - that is simultaneously memory and compute-optimal. Building on the derived architectures, we train MobileMoE with a four-stage recipe covering pre-training, mid-training, instruction fine-tuning, and quantization-aware training, all on open-source datasets. Across 14 benchmarks, MobileMoE matches or exceeds leading on-device dense LLMs with 2-4$\times$ fewer inference FLOPs, and matches or surpasses the state-of-the-art MoE OLMoE-1B-7B with up to 60% fewer parameters. To bridge the last mile to mobile deployment, we provide the first efficient MoE inference on commodity smartphones with comprehensive on-device profiling. At comparable INT4 weight memory, MobileMoE-S delivers $1.8$-$3.8\times$ faster prefill and $2.2$-$3.4\times$ faster decode than the dense baseline MobileLLM-Pro.

2605.27354 2026-05-27 cs.LG cs.AI cs.CL Updated version

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, Lei Hou, Juanzi Li, Xiaozhi Wang

Affiliations: Tsinghua University

AI summary: Proposes the SAERL framework, which uses model internals extracted with sparse autoencoders to model data diversity, difficulty, and quality for reinforcement learning data engineering, improving accuracy and reducing the number of training steps.

Abstract

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering framework for LLM reinforcement learning (RL). It models three intrinsic data properties: diversity, difficulty, and quality, using model internals extracted with Sparse Autoencoder (SAE), an advanced mechanistic interpretability tool. Each property grounds a concrete data engineering operation: SAE-space clustering with moderate batch mixing for batch diversity control, a difficulty proxy for easy-to-hard curriculum ordering, and a quality probe for data filtering. SAERL improves average accuracy by 3.00% over vanilla GRPO and reaches target accuracy with 20% fewer training steps on Qwen2.5-Math-1.5B, with consistent gains across model scales and RL algorithms. Experiments show that SAE transfers effectively across model families and scales, serving as a lightweight and reusable data engineering tool. These results demonstrate that model internals are a powerful and practical source of signals for post-training data engineering.

2605.27352 2026-05-27 cs.LG stat.ML Updated version

From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

Yuchen Liang, Ness Shroff, Yingbin Liang

Affiliations: The Ohio State University

AI summary: Proposes Gibbs-Accelerated Discrete Diffusion (GADD), which uses the concrete score function to construct Gibbs posterior likelihoods, accelerating sampling for uniform-rate discrete diffusion models without additional training and achieving a sampling complexity of $\mathcal{O}(\mathrm{polylog}(\varepsilon^{-1}))$.

Abstract

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this work, we propose a novel Gibbs-based corrector for discrete diffusion models, termed Gibbs-Accelerated Discrete Diffusion (GADD). GADD leverages the structure of the concrete score function to construct Gibbs posterior likelihoods directly, without requiring any additional training beyond standard score estimation. We show that GADD achieves an overall sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$, yielding the first such rate for diffusion-based samplers for uniform-rate discrete diffusion models. We also conduct numerical experiments demonstrating the practical advantages of GADD across synthetic data, zero-shot text sampling, and zero-shot conditional music generation. These results corroborate the theory and show that GADD consistently improves sample quality and wall-clock efficiency over standard baselines, including vanilla Euler methods and CTMC correctors. Beyond this, our theoretical analysis introduces a novel framework for analyzing predictor-corrector methods in discrete diffusion models, which may be of independent interest. Unlike existing approaches that rely on the Girsanov change-of-measure technique, our method is based on an induction argument that tracks error propagation across predictor iterations while accounting for inaccuracies in the corrector updates.

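2605.27343 2026-05-27 cs.CV cs.LG Updated version

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

AI summary: Proposes conditioning diffusion models on representations from pretrained self-supervised models to achieve controllable image generation without extensive annotation, and explores the smoothness and disentanglement properties of the representation space.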

2605.27316 2026-05-27 cs.LG math.OC Updated version

Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization

Kukyoung Jang, Taehyun Cho, Junrui Zhang, Ping Xu, Kyungjae Lee

Affiliations: Department of Statistics, Seoul, Korea University; Department of Electrical and Computer Engineering, Seoul, Seoul National University; Shandong University at Weihai, Weihai, China

AI summary: Proposes a general probabilistic smoothing framework that combines flexible symmetric unimodal kernels with ratio-monotone transforms. Under mild conditions it preserves the global optimum and guarantees convergence, and experiments show improved robustness and competitive performance.

2605.27309 2026-05-27 cs.LG cs.OH Updated version

Greening AI Inference with Accuracy and Latency-aware User Incentives

Vasilios A. Siris, Adamantia Stamou, George D. Stamoulis, Konstantinos Varsos, Ramin Khalili

Affiliations: Department of Informatics, School of Information Sciences and Technology; Athens University of Economics and Business; Huawei Heisenberg Research Center, Munich, Germany

AI summary: Proposes an incentive framework based on users' valuation of inference quality and latency together with their environmental consciousness, balancing carbon emissions against these two QoE parameters through a two-tier service subscription.

DOI: 10.1109/MIC.2026.3695352
Journal ref: IEEE Internet Computing, 2026

Abstract

The widespread use of AI services has raised concerns for its environmental sustainability, towards which recent studies have identified carbon emissions of AI inference as the major contributor. This paper introduces a framework for designing AI inference incentives based on the users' valuation for inference quality and latency, together with their environmental consciousness, while accounting for the tradeoff between carbon emissions and the two QoE parameters. Our approach can accommodate different tradeoffs, that depend on the size and complexity of the AI models and the allocation of resources to serve inference requests. The incentives can be offered through a practical two-tier service subscription that offers users a discount in exchange for reduced carbon emissions. The discounted service option gives the AI provider the flexibility to serve some percentage of inference requests at a lower quality and higher latency during periods of high carbon intensity.

2605.27306 2026-05-27 cs.LG Updated version

Normal Guidance is what Attention Needs

Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes

Affiliations: Department of Computer Science

AI summary: Proposes Normal Guidance, a regularization technique that enables attention-based multiple instance learning methods to outperform existing methods at slice-level localization in 3D medical images while maintaining whole-scan classification performance.

Abstract

We consider training classifiers for 3D medical images using only one binary label for the entire volume rather than a label for each 2D slice. In such weakly supervised settings, can we learn accurate classifiers for slice-level predictions? Attention-based multiple instance learning (MIL) can produce an attention score for every slice. Yet recent work demonstrates that a simple center-focused baseline that ignores image content can outperform attention-based and transformer-based MIL at slice-level classification of 3D brain scans. We show this baseline also outperforms existing MIL at slice-level classification of thoracic and abdominal CT scans. Motivated by this baseline, we propose Normal Guidance, a regularization technique that encourages the learned attention distribution to follow a bell-shaped curve. Across three medical imaging datasets totaling over 4 million 2D slices, we show our Normal Guidance enables attention-based and transformer-based MIL methods to deliver significantly better slice-level localization than the state-of-the-art while remaining competitive at whole-scan classification.
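
As a rough illustration of the regularization idea in this abstract, the sketch below penalizes an attention distribution over slices by its KL divergence from a discretized bell curve. The abstract does not state the loss, so the KL form, the function names, and the `mean`/`std` defaults (middle slice, one quarter of the volume depth) are all assumptions made for illustration and should not be read as the authors' formulation.

```python
import math


def gaussian_prior(num_slices, mean=None, std=None):
    """Discretized bell curve over slice indices, normalized to sum to 1.

    Assumption: by default the curve is centered on the middle slice with
    a standard deviation of one quarter of the volume depth.
    """
    if mean is None:
        mean = (num_slices - 1) / 2.0
    if std is None:
        std = num_slices / 4.0
    weights = [math.exp(-0.5 * ((i - mean) / std) ** 2) for i in range(num_slices)]
    total = sum(weights)
    return [w / total for w in weights]


def bell_curve_penalty(attention, mean=None, std=None, eps=1e-12):
    """KL(attention || bell curve): a hypothetical stand-in for the regularizer."""
    prior = gaussian_prior(len(attention), mean, std)
    return sum(
        a * math.log((a + eps) / (q + eps))
        for a, q in zip(attention, prior)
        if a > 0.0  # zero-attention slices contribute nothing to the KL sum
    )
```

In training, a penalty of this kind would typically be added to the classification loss with a small weight; that weight is a further hyperparameter not specified here.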

2605.27299 2026-05-27 cs.CR cs.AI cs.HC cs.LG cs.SY eess.SY Updated version

Risk Averse Alert Prioritization for IDS Using Subnormal Gaussian Fuzzy Models

Murat Moran

AI summary: Proposes an alert prioritization framework based on subnormal Gaussian fuzzy numbers that models three sources of uncertainty (threat severity, detection confidence, and organizational risk attitude) and uses ranking indices to provide a tunable security posture. Experiments show greater robustness than baselines under detector degradation.

Abstract

Modern intrusion detection systems generate thousands of alerts daily, but alert fatigue severely limits security operations effectiveness due to too many false positives or low-impact events. We address this by proposing a principled framework for alert prioritization based on subnormal Gaussian fuzzy numbers, explicitly modeling three sources of uncertainty: threat severity, detection confidence, and organizational risk attitude. Each alert is represented as a fuzzy number with the core indicating severity, spread indicating uncertainty, and height reflecting detection reliability. We apply ranking indices to prioritize alerts, allowing organizations to tune security posture through a risk-attitude parameter. Experimental validation on CIC-IDS2017 and NSL-KDD demonstrates greater robustness than baselines under detector degradation (0.9963 vs 0.8215 NDCGrel@100), with distinct differentiation in mid-confidence alerts and near-parity with baselines under robust detectors. The framework is theoretically grounded, computationally efficient, provides interpretable reasoning, and remains robust across detector families and miscalibration scenarios.
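
The alert representation described in this abstract (core for severity, spread for uncertainty, height for detection reliability) can be sketched as a small data structure. The membership function below is the usual Gaussian shape scaled by a height of at most 1. The abstract does not say which ranking indices are used, so `ranking_score` and its `risk_attitude` parameter are a hypothetical index invented purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FuzzyAlert:
    name: str
    core: float    # severity estimate, assumed to lie in [0, 1]
    spread: float  # uncertainty (Gaussian width), must be > 0
    height: float  # detection reliability in (0, 1]; below 1 means subnormal

    def membership(self, x):
        """Subnormal Gaussian membership: peaks at `height` when x == core."""
        return self.height * math.exp(-((x - self.core) ** 2) / (2.0 * self.spread ** 2))


def ranking_score(alert, risk_attitude):
    """Hypothetical ranking index, NOT the one used in the paper.

    risk_attitude > 0 counts uncertainty as potential extra severity
    (risk-averse); risk_attitude == 0 ignores the spread entirely.
    """
    return alert.height * (alert.core + risk_attitude * alert.spread)


def prioritize(alerts, risk_attitude):
    """Return alerts sorted from highest to lowest ranking score."""
    return sorted(alerts, key=lambda a: ranking_score(a, risk_attitude), reverse=True)
```

With this toy index, a moderately severe but highly uncertain alert can overtake a more severe, well-characterized one once `risk_attitude` is raised, which is the kind of tunable posture the abstract refers to.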

2605.27293 2026-05-27 cs.LG stat.ML Updated version

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

Shijin Gong, Erhan Xu, Kai Ye, Francesco Quinzan, Giulia Livieri, Chengchun Shi

Affiliations: University of Science and Technology of China; London School of Economics and Political Science; University of Oxford

AI summary: Proposes the BASIS algorithm, which improves value function estimation through single-rollout sampling and information sharing within the batch, improving policy optimization while reducing computational overhead.

Comments: 17 pages, 7 figures

Abstract

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We introduce BASIS, a critic-free post-training algorithm designed to address this tradeoff. At each online training step, BASIS samples only one rollout per prompt, but leverages rich information across prompts in the entire batch to improve value function estimation. Our experiments demonstrate that BASIS reduces MSE in value function estimation by 69% compared to REINFORCE++, a representative single-rollout baseline, and achieves lower MSE with one rollout than group mean estimators with 8 rollouts. This improvement in value estimation translates to better policy optimization: using substantially less training time, BASIS achieves performance close to multi-rollout GRPO-type baselines and often outperforms single-rollout REINFORCE-type baselines.
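
The tradeoff in this abstract can be made concrete with two simple critic-free baselines. The sketch below is not the BASIS estimator, whose details the abstract does not give. It only places a group-mean baseline (several rollouts per prompt, as in GRPO-type methods) next to a plain batch-mean baseline (one rollout per prompt, in the spirit of single-rollout REINFORCE-type methods). The function names are chosen for illustration.

```python
def group_mean_advantages(rewards_per_prompt):
    """Multi-rollout baseline: subtract each prompt's own mean reward.

    rewards_per_prompt is a list of lists, one inner list of rollout
    rewards per prompt.
    """
    advantages = []
    for group in rewards_per_prompt:
        baseline = sum(group) / len(group)
        advantages.append([r - baseline for r in group])
    return advantages


def batch_mean_advantages(single_rewards):
    """Single-rollout baseline: subtract the mean reward of the whole batch.

    single_rewards holds exactly one reward per prompt, so the baseline is
    shared across prompts and ignores per-prompt difficulty.
    """
    baseline = sum(single_rewards) / len(single_rewards)
    return [r - baseline for r in single_rewards]
```

The group-mean version needs several generations per prompt, which is where the computational cost comes from; the batch-mean version is cheap but uses a coarser value estimate, which is the gap the abstract says BASIS targets.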

2605.27288 2026-05-27 cs.CL cs.AI cs.LG Updated version

It's Not Always Sycophancy: Measuring LLM Conformity as a Function of Epistemic Uncertainty

Kevin H. Guo, Chao Yan, Avinash Baidya, Katherine Brown, Xiang Gao, Juming Xiong, Zhijun Yin, Bradley A. Malin

Affiliations: Vanderbilt University; Vanderbilt University Medical Center; Intuit AI Research

AI summary: Proposes the MUSE framework, which distinguishes sycophantic conformity from uncertainty-driven conformity to reveal the mechanisms behind LLMs changing their stance under user pushback, and finds that both kinds of conformity grow with the user's perceived expertise and the plausibility of the user's suggestions.

Abstract

Large language models (LLMs) are known to abandon their initial stance to conform to user pushback. While prior research largely attributes this behavior to sycophancy learned during reinforcement learning from human feedback, we hypothesize that conformity is also driven by a model's epistemic uncertainty at inference time. In this paper, we introduce MUSE, a two-stage evaluation framework to disentangle the mechanisms driving LLM conformity. Specifically, MUSE maps a model's epistemic uncertainty in responding to a query against its likelihood to yield to user pushback in a subsequent turn. We demonstrate that the mechanisms driving conformity extend beyond sycophancy alone. Specifically, we characterize two distinct factors that jointly drive conformity: sycophantic conformity, where a model aligns with user pushback even with absolute certainty in its initial response, and uncertainty-driven conformity, where a model's likelihood for conformity increases alongside its uncertainty. Furthermore, we conduct ablation studies to demonstrate that both sycophantic conformity and uncertainty-driven conformity grow with 1) the LLM's perceived expertise of the user and 2) the plausibility of the user's suggestions. More broadly, MUSE informs more targeted intervention strategies by distinguishing alignment-induced sycophancy and training-corpora-driven uncertainty.

2605.27281 2026-05-27 cs.LG stat.ML Updated version

Causal Risk Minimization for High-Dimensional Treatments

Nikita Dhawan, Arnav Paruthi, Andrew Kim, Lovedeep Gondara, Jekaterina Novikova, Chris J. Maddison

Affiliations: University of Toronto; Vector Institute; Vanguard

AI summary: For causal inference over high-dimensional treatment spaces such as text, proposes decomposing causal error into a series of moment-balancing errors and optimizing higher-order balance objectives, together with projecting high-dimensional treatments onto lower-dimensional attributes, enabling causal estimation without attribute-specific training.

Comments: 18 pages, 4 figures

Abstract

Predicting the effect of interventions with many possible variations, e.g., therapeutic content that affects mental health outcomes or an earnings call transcript that drives movement in share price, is useful across several domains. However, classical causal estimators tend to assume that all possible interventions are observed, which is infeasible when interventions vary widely, for instance, in the space of all text strings. We adapt a well-known approach of recasting causal inference as a learning problem, to address high-dimensional treatment spaces. Specifically, under standard assumptions like no unobserved confounding, we show that causal error decomposes into a series of moment-balancing errors of increasing order, and design objectives that directly improve causal estimation. We also show how to project the effect of a high-dimensional treatment onto lower-dimensional treatment attributes, which allows a single model to answer several causal questions without additional attribute-specific training. We empirically evaluate our estimators in settings with high-dimensional continuous, discrete, and text treatments, the last of which used a semi-synthetic dataset of Amazon Reviews. Our experiments demonstrate the benefit of higher-order balance error optimization and competitive performance of projected causal estimates with attribute-specific estimators.

2605.27269 2026-05-27 cs.LG stat.AP Updated version

An Interpretable Comparison of Feature-Based and Deep Learning Models for TROPOMI Methane Plume Screening

Solomiia Kurchaba, Joannes D. Maasakkers, Berend J. Schuit, Ilse Aben

Affiliations: SRON Space Research Organisation Netherlands; GHGSat Inc.; Department of Earth Sciences, Vrije Universiteit Amsterdam

AI summary: Compares feature-based (SVC, Random Forest, XGBoost) and image-based (ResNet-18, ResNet-34) models for methane plume-artifact classification and uses SHAP-based explainability analysis to provide guidance for operational screening.

Abstract

Continuous and global detection of large methane emissions is a crucial step for global warming mitigation. Satellite observations, such as from S5P/TROPOMI, combined with plume detection algorithms, can play a key role in this effort. However, not all TROPOMI plume detections that look like methane emission plumes are the result of actual emissions. A significant part of the plume-like features in the data are retrieval artifacts. Such artifacts could be the result of variations in elevation or albedo gradients, high concentrations of aerosols, coastal lines, water bodies, etc. Previous work approached the problem of plume-artifact classification by means of a Support Vector Machine Classifier (SVC), trained on an extensive set of observation-based scalar features designed by domain experts. However, such an approach limits the information scope received by the algorithm to what is deemed to be important by the experts, breaks the spatial relationship between pixels, and loses information during the process of statistical aggregation. In this study, we compare feature-based (SVC, Random Forest, XGBoost) and image-based (ResNet-18, ResNet-34) models for methane plume-artifact classification under balanced and imbalanced evaluation settings. To interpret the results, we apply SHAP-based explainability to both model families. Our findings provide practical guidance for model selection in operational methane-screening workflows such as the CAMS Methane Hotspot Explorer.

2605.27219 2026-05-27 cs.LG stat.ML Updated version

Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis

Yamato Suetake, Yuta Kawakami, Shunnosuke Ikeda, Yuichi Takano

Affiliations: Graduate School of Science and Technology, University of Tsukuba; Institute of Systems and Information Engineering, University of Tsukuba

AI summary: Linear integration methods for collaborative analysis of decentralized confidential data carry high reconstruction risk and cannot align nonlinear transformations. To address this, the paper proposes nonlinear kernel integration (NKI), which obtains a globally optimal solution via kernel ridge regression and an eigenvalue problem, and adds graph regularization and a centering constraint to capture geometric and target-variable information. On image classification tasks it improves accuracy and lowers reconstruction risk.

Comments: 50 pages, 7 figures

Abstract

Collaborative analysis of decentralized confidential datasets is important, but direct sharing of original datasets is often restricted by privacy and institutional constraints. Data collaboration (DC) analysis transforms each dataset into privacy-preserving intermediate representations via party-specific obfuscation functions and integrates them into common collaboration representations using an anchor dataset. However, many existing DC analysis methods rely on linear transformations for data obfuscation and integration, which may increase reconstruction risk. Although nonlinear dimensionality reduction can mitigate this risk, conventional linear integration methods cannot accurately align intermediate representations produced by nonlinear transformations. Moreover, existing integration methods mainly minimize discrepancies among parties and do not explicitly incorporate geometric or target-variable information useful for downstream analysis. To overcome these limitations, we first formulate linear kernel integration (LKI) as a linear integration method and then kernelize it to obtain nonlinear kernel integration (NKI). NKI admits a globally optimal solution via kernel ridge regression and an eigenvalue problem. We also introduce graph regularization and a centering constraint so that the target representation can capture geometric and target-variable information useful for downstream analysis. Experiments on image classification tasks demonstrate that NKI improves classification accuracy over existing linear integration methods under nonlinear dimensionality reduction, with further gains from target-variable-aware graph regularization and centering. The results also show that dimensionality reduction choices substantially affect both classification accuracy and reconstruction risk.

2605.27194 2026-05-27 cs.CL cs.CV cs.LG Updated version

Not All Tokens Matter Equally: Dynamic In-context Vector Distillation with Decisive-Token Supervision for Long-form Medical Report Generation

Ning Wu, Rui Liu, Xinkun Lin, Weixing Chen, Jinxi Xiang, Tao Wei, Lina Yao, Mingjie Li

Affiliations: UNSW Sydney; University of Technology Sydney; School of Computer Science and Engineering, Sun Yat-sen University; Stanford University; Shanghai Jiao Tong University

AI summary: Proposes the DIVE framework, which uses decisive-token supervision and state-conditioned dynamic steering to address token-level distillation overlooking decisive tokens in long-form generation, achieving top results on medical report generation.

Comments: Preprint. 20 pages, 6 figures

Abstract

Distilling demonstration effects into hidden-space interventions offers a lightweight alternative to full finetuning. However, existing multimodal variants are mostly evaluated on short-form tasks, where outputs end after a few tokens. Extending these methods to long-form generation exposes a fundamental yet underexamined limitation: token-level distillation implicitly treats all output tokens as equally informative, but long-form outputs are dominated by high-frequency template and grammatical tokens, while the tokens that actually determine output quality are sparsely distributed. In medical report generation (MRG), two such decisive tokens stand out: pathology-related tokens that determine diagnostic content, and the end-of-sequence (EOS) event that determines termination. Both receive insufficient supervision under uniform cross-entropy, and autoregressive decoding further compounds the problem by drifting away from teacher-forced trajectories. We propose DIVE, a frozen-backbone distillation framework that addresses long-form report generation through two complementary mechanisms matched to these failures. Decisive-token supervision restores supervision balance by upweighting the cross-entropy contribution of pathology-related tokens and the EOS event, ensuring that content fidelity and termination are learned during training rather than imposed at decoding time. State-conditioned dynamic steering replaces fixed open-loop residuals with hidden-state-dependent adapters, allowing the injected signal to adapt as decoding drifts. Experiments on MIMIC-CXR and CheXpert Plus with two medical VLM backbones show that DIVE consistently ranks among the strongest methods across lexical and clinical-proxy metrics. Our method achieves the best BLEU-4, ROUGE-L, and RadGraph F1 in all dataset--backbone settings, while remaining competitive on coarse label-level CheXbert F1.
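
The decisive-token supervision described in this abstract amounts to upweighting some positions in the cross-entropy. The sketch below shows that idea in its simplest form. How DIVE identifies decisive tokens and which weights it uses are not given in the abstract, so `decisive_mask`, `decisive_weight`, and the normalization by total weight are assumptions for illustration only.

```python
import math


def weighted_cross_entropy(token_probs, decisive_mask, decisive_weight=3.0):
    """Weighted mean negative log-likelihood over a token sequence.

    token_probs[i] is the probability the model assigns to the reference
    token at position i. decisive_mask[i] is True for tokens treated as
    decisive (for example pathology terms or the EOS token); those
    positions count `decisive_weight` times as much as ordinary tokens.
    """
    if len(token_probs) != len(decisive_mask):
        raise ValueError("token_probs and decisive_mask must have equal length")
    weights = [decisive_weight if d else 1.0 for d in decisive_mask]
    losses = [-math.log(p) for p in token_probs]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

With `decisive_weight=1.0` this reduces to the uniform cross-entropy; raising the weight makes a poorly predicted decisive token dominate the loss even when the surrounding template tokens are predicted well.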

2605.27190 2026-05-27 cs.CL cs.AI cs.LG cs.SD Updated version

Learning When to Think While Listening in Large Audio-Language Models

Zhiyuan Song, Weici Zhao, Yang Xiao, Suhao Yu, Cheng Zhu, Jiatao Gu

Affiliations: University of Pennsylvania

AI summary: Proposes a learnable wait-think-answer control mechanism that uses multi-reward reinforcement learning to optimize when large audio-language models reason during streaming spoken interaction, improving accuracy while reducing response latency.

Comments: 19 pages, 4 figures, 6 tables

Abstract

Recent advances in Large Audio-Language Models (LALMs) have made real-time, streaming spoken interaction increasingly practical. In this setting, reasoning quality and responsiveness are tightly coupled: delaying reasoning until the speech endpoint can improve answer quality but moves deliberation into user-visible response delay, while answering too early risks committing before decisive evidence arrives. We introduce a learnable wait-think-answer control formulation for LALMs. Motivated by the incremental nature of human conversation, the controller decides under partial audio evidence when to wait, when to externalize a compact reasoning update, and when to answer. Using Qwen2.5-Omni-7B as the base model, we construct aligned wait-think-answer traces from spoken reasoning data, train the controller with supervised fine-tuning (SFT), and then apply Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO). The reward combines answer correctness, action validity, update timing, latency synchronization, reasoning quality, and chain consistency, optimizing the complete wait-think-answer trajectory and not the final answer alone. On a six-task synthetic spoken reasoning question answering (SRQA) benchmark, the six-reward DAPO controller improves the row-weighted accuracy from 67.6% to 70.3% while reducing post-endpoint final-think length by 14% under the same Qwen deployment harness. On a 186-item human-recorded Real Audio Bench, a transfer check beyond text-to-speech (TTS)-rendered speech, the controller family remains functional: SFT achieves the strongest accuracy, while the six-reward DAPO controller is the only learned variant whose final-think length falls below the base. These results suggest that a streaming model should learn when to make intermediate reasoning explicit during the audio stream.

2605.27189 2026-05-27 cs.CL cs.LG cs.SD eess.AS q-bio.NC 版本更新

图像是否也值得16x16=256个超像素？一个用于注意力图像分类的框架

Pedro Henrique da Costa Avelar, Anderson R. Tavares, Luís C. Lamb

发表机构 * UFRGS（南里奥格兰德联邦大学）； Institute of Informatics（信息学院）； Federal University of Rio Grande do Sul（南里奥格兰德联邦大学）； Division of Informatics（信息系）； School of Health Sciences（健康科学学院）； Imaging and Data Science（成像与数据科学）； Faculty of Biology, Medicine and Health（生物、医学与健康学院）； University of Manchester（曼彻斯特大学）； Vaughan House, Portsmouth St（朴次茅斯街沃恩大楼）

AI总结提出超像素变换器（SPT）框架，统一超像素图像分类与视觉变换器，通过多维正弦余弦位置编码和增强的补丁数据结构，在多个数据集上优于以往的超像素图神经网络方法，并与视觉变换器保持竞争力。

AI中文摘要

基于超像素的图像分类传统上利用图神经网络（GNN）处理不规则图像表示。计算机视觉的最新进展，由视觉变换器（ViT）驱动，引入了自注意力模型的新范式，在各种任务中超越了卷积神经网络（CNN）。然而，GNN、超像素和变换器之间的协同联系仍未得到探索。在这项工作中，我们提出了超像素变换器（SPT），这是一个统一超像素图像分类和ViT的新框架。SPT泛化了基于图注意力网络的超像素图像分类（SICGAT）模型和ViT，以支持任意基于超像素的分块策略、连接图和位置编码。我们引入了改进，包括多维正弦余弦位置编码和完全包含超像素形状和颜色信息的增强补丁数据结构。通过在CIFAR10、FashionMNIST和Imagenette等数据集上测试SPT，采用各种超像素生成和图连接策略，我们证明SPT相比以前的基于超像素的GNN方法实现了优越的性能，并与ViT保持竞争力。值得注意的是，我们的方法解决了SICGAT的局限性，例如像素聚合过程中的信息丢失，并展示了受限图连接如何增强ViT性能。SPT弥合了基于超像素的模型与变换器模型之间的差距，为跨领域泛化和混合注意力框架的未来创新开辟了道路，并表明图像也值得$16\times16$个超像素。

英文摘要

Superpixel-based image classification has traditionally leveraged graph neural networks (GNNs) for processing irregular image representations. Recent advances in computer vision, driven by Vision Transformers (ViTs), have introduced new paradigms in self-attentional models, surpassing convolutional neural networks (CNNs) in various tasks. However, a synergistic connection between GNNs, superpixels, and transformers remains unexplored. In this work, we propose Superpixel Transformers (SPT), a novel framework that unifies superpixel-based image classification and ViTs. SPT generalizes the Superpixel Image Classification with Graph Attention Networks (SICGAT) model and ViT to support arbitrary superpixel-based chunking strategies, connectivity graphs, and positional encodings. We introduce refinements including a multidimensional sine-cosine positional encoding and an enriched patch data structure that fully incorporates superpixel shape and color information. By testing SPT across datasets such as CIFAR10, FashionMNIST, and Imagenette, with various superpixel generation and graph connectivity strategies, we demonstrate that SPT achieves superior performance compared to previous superpixel-based GNN methods and remains competitive with ViTs. Notably, our approach addresses the limitations of SICGAT, such as information loss during pixel aggregation, and shows how constrained graph connectivity can enhance ViT performance. SPT bridges the gap between superpixel-based and transformer models, opening avenues for cross-domain generalization and future innovations in hybrid attentional frameworks, and showing that an image can also be worth $16\times16$ superpixels.
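
One plausible reading of the "multidimensional sine-cosine positional encoding" is the standard Transformer sine-cosine encoding applied to each coordinate of a superpixel position and then concatenated. The sketch below follows that reading. It assumes each superpixel is located by its centroid, and the function names and the `base` constant are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence


def sincos_1d(pos: float, dim: int, base: float = 10000.0) -> List[float]:
    """Standard sine-cosine encoding of a single scalar coordinate."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    out: List[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out


def sincos_nd(coords: Sequence[float], dim_per_axis: int) -> List[float]:
    """Concatenate one 1-D encoding per axis, e.g. centroid (x, y)."""
    enc: List[float] = []
    for c in coords:
        enc.extend(sincos_1d(c, dim_per_axis))
    return enc
```

Unlike a grid index, a centroid can be any real-valued point, so this form works for irregular superpixels as well as for regular ViT patches.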

2605.27133 2026-05-27 cs.LG cs.AI 版本更新

Deep-layer limit and stability analysis of the basic forward-backward-splitting induced network (II): learning problems

基本前向-后向分裂诱导网络的深层极限与稳定性分析（II）：学习问题

Xuan Lin, Chunlin Wu

发表机构 * China Academy of Aerospace System（中国航天系统研究院）； School of Mathematical Sciences（数学科学学院）； Nankai University（南开大学）

AI总结本文研究基本前向-后向分裂（FBS）诱导网络的训练问题，证明其收敛到深层极限系统的学习问题，并给出扰动稳定性分析。

Comments 38 pages, 1 figure

AI中文摘要

源自迭代优化方案和数值常/偏微分方程（ODE/PDE）的深度展开神经网络在过去十年中引起了数据科学界的广泛关注。其中，许多重要的网络架构是从基本的前向-后向分裂（FBS）算法构建的。在本文中，我们继续研究最基本的FBS诱导网络，该网络通过引入直接参数松弛从原始FBS算法展开。基于我们先前前向系统分析中的差分/微分包含公式，我们在此考虑相应学习问题的一些理论方面。在一些温和假设下，我们建立了基本FBS诱导网络的训练问题收敛到深层极限系统的学习问题的一般收敛性质，这意味着一个$\Gamma$-收敛论证，表明网络最优学习参数的任意聚点是深层极限系统学习问题的解。还对这些学习问题的扰动稳定性进行了定性分析。进行了一个简单的数值实验以验证我们的主要一般收敛结果。

英文摘要

Deep unfolding neural networks derived from iterative optimization schemes and numerical ordinary/partial differential equations (ODEs/PDEs) have attracted much attention in data science over the last decade. Therein, numerous important network architectures were constructed from the basic forward-backward-splitting (FBS) algorithm. In this paper, we continue our research on the most basic FBS-induced network, an architecture unrolled from the original FBS algorithm by incorporating direct parameter relaxations. Following the difference/differential inclusion formulations in our previous forward system analyses, we here consider some theoretical aspects of corresponding learning problems. Under some mild assumptions, we establish a general convergence property of the training problem of the basic FBS-induced network to the learning problem of the deep-layer limit system, implying a $\Gamma$-convergence argument showing that any cluster point of the optimal learning parameters for the network is a solution to the learning problem of the deep-layer limit system. A qualitative analysis of perturbation stabilities of these learning problems is also presented. A simple numerical experiment is conducted to validate our main general convergence result.
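
For readers unfamiliar with forward-backward splitting, each iteration takes a gradient step on the smooth term and then applies the proximal operator of the nonsmooth term. The sketch below unrolls that iteration into "layers" for the toy problem of minimizing $\frac{1}{2}\|x-b\|^2 + \lambda\|x\|_1$, whose proximal step is soft-thresholding. Giving every layer its own step size is only one illustrative form of parameter relaxation. The toy objective and the function names are assumptions, not the paper's construction.

```python
import math
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.| (the backward step)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fbs_unrolled(b: Sequence[float], lam: float,
                 steps: Sequence[float]) -> List[float]:
    """One FBS iteration per layer, with a per-layer step size tau."""
    x = [0.0] * len(b)
    for tau in steps:
        # Forward (gradient) step on 0.5 * ||x - b||^2, then backward (prox) step.
        x = [soft_threshold(xi - tau * (xi - bi), tau * lam)
             for xi, bi in zip(x, b)]
    return x
```

With a fixed step size in (0, 2) and many layers, the output approaches the minimizer, which for this objective is `soft_threshold(b_i, lam)` in each coordinate.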

2605.27130 2026-05-27 cs.LG cs.AI 版本更新

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

DEI：质量-多样性搜索中的进化推理多样性

John Donaghy, Shikhar Rastogi

AI总结提出DEI框架，通过异构大语言模型作为变异算子进行分布式质量-多样性搜索，实验表明模型多样性比并行性更能提升搜索性能。

Comments Accepted to ICML 2026 Workshop Scalable Learning and Optimization for Efficient Multimodal AI Agents (SCALE)

AI中文摘要

我们提出DEI：进化推理中的多样性，一个分布式质量-多样性（QD）搜索框架，该框架将异构大语言模型（LLM）分配为变异算子，在通过非阻塞集合操作通信的对等节点间运行。与同质并行搜索（在所有工作节点上复制单一模型的归纳偏差）不同，DEI将每个LLM独特的创造性先验视为行为新颖性的互补来源。通过DEI扩展数字红皇后框架，节点在每轮结束时共享局部最优解，以播种下一轮的种群。这产生了跨模型的对抗压力，推动了超越模型内自博弈的鲁棒性。在Core War领域（一个竞争性编程基准，其中Redcode战士程序在模拟机器中战斗）上评估，一个四节点异构集成（GPT-5.4-mini、Claude Sonnet 4.6、GPT-5.2和Claude Haiku 4.5）在相等的总LLM调用预算下，相比单节点基线，实现了124%更高的合并存档QD分数（45.90 vs. 20.46）和28%更高的覆盖率（80.6% vs. 63.0%的单元格）。异构集成还在QD分数、覆盖率和所有四个模型家族的保留解泛化性上优于同等预算的同质集成。这些结果首次提供了经验证据，表明模型多样性（而非仅仅是并行性）是分布式基于LLM的QD搜索中增益的关键驱动因素。

英文摘要

We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes communicating with non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty. Extending the Digital Red Queen framework with DEI, nodes share local optimal solutions at the end of each round to seed the next round's population. This creates cross-model adversarial pressure that drives robustness beyond intra-model self-play. Evaluated on the Core War domain, a competitive programming benchmark in which Redcode warrior programs battle inside a simulated machine, a four-node heterogeneous ensemble (GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) achieves 124 percent higher merged-archive QD-Score (45.90 vs. 20.46) and 28 percent higher coverage (80.6 percent vs. 63.0 percent of cells) than a single-node baseline at equal total LLM-call budget. The heterogeneous ensemble also outperforms an equally-budgeted homogeneous ensemble on QD-Score, coverage, and held-out solution generality across all four model families. These results provide the first empirical evidence that model diversity, not merely parallelism, is the key driver of gain in distributed LLM-based QD search.
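
The two headline metrics can be read against their usual definitions in the quality-diversity literature: QD-Score as the sum of fitness over filled archive cells, and coverage as the fraction of cells that are filled. The sketch below assumes those definitions (the abstract does not spell out its normalization) and also reproduces the relative gains quoted above, for example (45.90 − 20.46) / 20.46 ≈ 1.24. The helper names are illustrative.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # index of a behavior-space cell (2-D is an assumption)


def qd_score(archive: Dict[Cell, float]) -> float:
    """Sum of the best fitness stored in each filled cell."""
    return sum(archive.values())


def coverage(archive: Dict[Cell, float], n_cells: int) -> float:
    """Fraction of archive cells that hold at least one solution."""
    return len(archive) / n_cells


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, e.g. 0.28 for +28%."""
    return (new - old) / old
```

Note that the 28 percent coverage figure is a relative gain (80.6 versus 63.0), not a difference of 28 percentage points.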

2605.27128 2026-05-27 cs.CV cs.LG 版本更新

PILOT: A Data-Free Continual Learning Approach for Real-Time Semantic Segmentation via Boundary Guidance

PILOT: 一种基于边界引导的无数据持续学习方法用于实时语义分割

Yujing Zhou, Prashant Shekhar, Thomas Yang, Yongxin Liu

发表机构 * Department of Mathematics, College of Arts and Sciences, Embry-Riddle Aeronautical University（数学系，文理学院，埃姆布里-里德航空大学）； Department of Electrical Engineering and Computer Science, College of Engineering, Embry-Riddle Aeronautical University（电气工程与计算机科学系，工程学院，埃姆布里-里德航空大学）

AI总结提出PILOT框架，通过冻结原网络参数并引入并行导数分支捕获新类边界信息，实现实时语义分割模型在无需旧数据情况下的增量学习，有效缓解灾难性遗忘。

AI中文摘要

实时语义分割模型在准确性和推理速度之间取得了极好的平衡。然而，将这些模型部署在动态的真实世界环境中，通常需要能够在不重新训练整个数据集的情况下增量地学习新类别。这种能力被称为持续学习。在这方面，深度学习中的标准微调方法常常因灾难性遗忘而失败，即模型学习新信息但忘记了先前训练和学习的类别。针对这一关键领域，本文提出了一种针对PIDNet的新型持续学习框架，PIDNet是一种被广泛引用的最先进的实时语义分割模型。我们的方法PILOT（并行增量学习随时间）通过实现一个并行导数分支（D-branch）引入了一种实时且轻量级的策略，该分支旨在捕获新类别的高频边界信息，同时冻结原始分割网络的训练参数。这种新颖的设置允许模型适应新的语义类别，同时保留先前学习类别的知识。通过仅使用与新类别相关的数据，我们的模型显著减少了训练开销。实验结果表明，我们的方法成功分割了新类别，同时在原始基类上保持了较高的平均交并比（mIoU），从而在该领域轻松超越了所有主要的持续学习方法。总体而言，PILOT被证明能有效缓解灾难性遗忘，同时对推理延迟影响最小，从而保持实时性能。

英文摘要

Real-time semantic segmentation models offer an excellent balance between accuracy and inference speed. However, deploying these models in dynamic real-world environments often requires the ability to learn novel classes incrementally without retraining on the entire dataset. This capability is known as continual learning. In this regard, the standard fine-tuning methods in deep learning often fail due to catastrophic forgetting, where the model learns new information but forgets previously trained and learned classes. Contributing to this crucial domain, the current paper proposes a novel continual learning framework tailored for PIDNet, which is a widely cited state-of-the-art real-time semantic segmentation model. Our method, PILOT (Parallel Incremental Learning Over Time), introduces a real-time and lightweight strategy by implementing a parallel Derivative-branch (D-branch) designed to capture the high-frequency boundary information of novel classes while freezing the trained parameters of the original segmentation network. This novel setup allows the model to adapt to new semantic categories while preserving the knowledge of previously learned classes. By using only data associated with the new class, our model significantly reduces training overhead. Experimental results demonstrate that our approach successfully segments new classes while maintaining high mean Intersection over Union (mIoU) on the original base classes, thereby comfortably outperforming all major continual learning approaches in this domain. Overall, PILOT is shown to effectively mitigate catastrophic forgetting with minimal impact on inference latency, thus maintaining real-time performance.

2605.27113 2026-05-27 cs.LG cs.AI 版本更新

High-Quality Synthetic Financial Time-Series using a GAN-Diffusion Framework

使用GAN-扩散框架的高质量合成金融时间序列

Giuseppe Masi, Andrea Coletta, Novella Bartolini

发表机构 * Sapienza University of Rome（罗马大学）

AI总结提出一种结合GAN和扩散模型的质量感知生成框架，通过GAN的Critic引导扩散过程，生成更真实且保留金融时间序列典型事实和资产间相关结构的合成数据。

AI中文摘要

近年来，金融机构和公司越来越多地采用合成数据来解决数据稀缺问题并生成反事实市场情景。然而，再现金融时间序列的所有统计特性（通常称为典型事实）对于许多现有的通用架构来说仍然是一个开放的挑战。在本文中，我们提出了一种质量感知生成框架，该框架结合了两类生成方法，展示了它们的集成如何解决现有局限性，同时增强合成数据的真实性。具体来说，我们首先引入CoMeTS-GAN（相关多变量时间序列GAN），这是一种条件生成对抗网络（C-GAN），旨在联合生成相关股票的中价和成交量时间序列。然后，我们展示了如何将我们的GAN架构整合到最先进的扩散模型中，以提高生成的相关结构的质量。具体来说，GAN的Critic作为一个质量评估模块，指导扩散过程，在生成的时间序列中强制执行学习到的相关结构。我们的框架为真实的股票市场模拟提供了一种轻量级且响应迅速的解决方案，明确建模了资产间的相关结构。我们通过实验将我们的框架与领先的生成架构进行了比较，表明它更有效地捕捉了股票市场的典型事实并建模了资产间的相关性。

英文摘要

In recent years, financial institutions and firms have increasingly adopted synthetic data to address data scarcity and to generate counterfactual market scenarios. However, reproducing all the statistical properties of financial time series, commonly known as stylized facts, remains an open challenge for many existing general-purpose architectures. In this paper, we present a quality-aware generative framework that combines two classes of generative methods, demonstrating how their integration addresses existing limitations while enhancing the realism of synthetic data. Specifically, we first introduce CoMeTS-GAN (Correlated Multivariate Time Series GAN), a Conditional Generative Adversarial Network (C-GAN) designed to jointly generate mid-price and volume time-series for correlated stocks. We then show how our GAN architecture can be incorporated into state-of-the-art diffusion models to enhance the quality of generated correlation structures. Specifically, the GAN's Critic serves as a quality evaluation module that guides the diffusion process, enforcing learned correlation structures in the generated time-series. Our framework offers a lightweight and responsive solution for realistic stock market simulation, explicitly modeling inter-asset correlation structures. We experimentally validate our framework against leading generative architectures, showing that it more effectively captures the stylized facts of stock markets and models inter-asset correlations.

2605.27097 2026-05-27 cs.LG stat.ML 版本更新

Mildly Overparameterized ReLU Networks on Orthogonal Data: Incremental Learning and Implicit Bias

正交数据上的轻度过参数化ReLU网络：增量学习与隐式偏差

James Town, Etienne Boursier, Ben Lewis, Matthias Englert, Ranko Lazic

发表机构 * University of Warwick（沃里克大学）； INRIA LMO, Université Paris-Saclay（巴黎-萨克勒大学INRIA LMO）

AI总结研究从微小初始化出发的两层ReLU网络在正交数据上的梯度流动力学，揭示了当初始化尺度趋近零时极限流收敛到鞍点间跳跃过程，并证明网络在宽度m约大于log(n)时高概率插值训练数据，且学习到的插值器的平方ℓ2范数缩放为√n，与最小ℓ2范数插值器相差常数因子。

Comments 66 pages, 6 figures

AI中文摘要

神经网络的成功训练依赖于一阶优化方法的使用，但这些方法的理论刻画仍不完整，尤其是在轻度过参数化设置下。本文研究从微小初始化出发的两层ReLU网络在正交训练数据上的梯度流动力学。我们证明，当初始化尺度趋近零时，极限流收敛到鞍点间跳跃过程，揭示了在每个鞍点处激活一个新神经元的增量学习现象。该分析恢复了Dana等人（2025, arXiv:2502.16977）的已知结果：只要$m \gtrsim \log(n)$（其中$m$是网络宽度，$n$是训练样本数），网络就以高概率插值训练数据。这一增量过程刻画还使我们能够推导出一个新的隐式偏差结果：学习到的插值器具有平方$\ell_2$范数缩放为$\sqrt{n}$，这处于最小$\ell_2$范数插值器的常数因子内。更广泛地，我们的工作为ReLU网络的增量学习过程提供了首个严格证明，同时表明轻度过参数化网络可以收敛到复杂度与最优插值器同阶的插值解。

英文摘要

The successful training of neural networks hinges on the use of first-order optimization methods, yet the theoretical characterization of these methods remains incomplete. This is especially true in settings with mild overparameterization. In this work, we study the gradient flow dynamics of two-layer ReLU networks from small initialization with orthogonal training data. We prove the limiting flow converges to a saddle-to-saddle jump process as the initialization scale tends to zero, revealing an incremental learning phenomenon in which a new neuron activates at each saddle. This analysis recovers the known result of Dana et al. (2025, arXiv:2502.16977) that the network interpolates the training data with high probability as soon as $m \gtrsim \log(n)$, where $m$ is the network width and $n$ is the number of training samples. This incremental process characterization also allows us to derive a novel implicit bias result: the learned interpolator has a squared $\ell_2$-norm scaling as $\sqrt{n}$, which is within a constant factor of the minimal $\ell_2$-norm interpolator. More broadly, our work provides the first rigorous proof of an incremental learning process for ReLU networks, whilst suggesting mildly overparameterized networks can converge to interpolating solutions whose complexity is of the same order as that of the optimal interpolator.

2605.27093 2026-05-27 stat.ML cs.LG 版本更新

信任区域Q伴随匹配

Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin

发表机构 * KAIST AI（韩国科学技术院人工智能）； Seoul National University（首尔国立大学）； RLWRLD

AI总结针对预训练流策略的离策略强化学习不稳定性，提出信任区域Q伴随匹配方法，通过投影对偶下降自适应控制路径空间KL散度，实现稳定微调，在50个OGBench任务中离线RL成功率达68%。

AI中文摘要

由于多步采样过程带来的优化不稳定性，预训练流策略的离策略强化学习仍然具有挑战性。最近，带有伴随匹配的Q学习（QAM）通过将问题重新表述为一个具有学习评论家的无记忆随机最优控制（SOC）问题来解决这一问题。然而，QAM继承了评论家引导改进的根本脆弱性：当评论家病态时，小的评论家误差会被放大，通常导致模型崩溃。本文引入了信任区域Q伴随匹配（TRQAM），一种稳定的离策略微调算法，通过投影对偶下降自适应地控制与预训练流策略的路径空间KL散度。具体来说，我们优化SOC动力学中的信任区域参数$\lambda$，并从理论上证明路径空间KL可以用$\lambda$的闭式函数表示。因此，我们的方法可以精确控制与预训练流策略的偏差，实现稳定的离策略强化学习。通过在50个OGBench任务上的实验，TRQAM在离线强化学习和离线到在线强化学习中都持续优于先前的方法。特别是，TRQAM在离线强化学习中实现了68%的总体成功率，显著优于最强基线的46%。

英文摘要

Off-policy reinforcement learning of pretrained flow policies remains challenging due to the instability of optimization arising from the multi-step sampling process. Recently, Q-learning with Adjoint Matching (QAM) addressed this issue by reformulating the problem as a memoryless stochastic optimal control (SOC) problem with a learned critic. However, QAM inherits a fundamental fragility of critic-guided improvement: small critic errors are amplified when critics are ill-conditioned, often leading to model collapse. This paper introduces Trust Region Q-Adjoint Matching (TRQAM), a stable off-policy fine-tuning algorithm that adaptively controls the path-space KL divergence from pretrained flow policies through projected dual descent. Specifically, we optimize the trust-region parameter $\lambda$ in SOC dynamics, and theoretically show that the path-space KL can be represented by a closed-form function of $\lambda$. As a result, our method can precisely control the deviation from pretrained flow policies, achieving stable off-policy RL. Through experiments on 50 OGBench tasks, TRQAM consistently outperforms prior methods in both offline RL and offline-to-online RL. In particular, TRQAM achieves an overall success rate of 68% in offline RL, substantially improving on the strongest baseline's 46%.

2605.27076 2026-05-27 cs.MA cs.LG 版本更新

Cost of Structural Learning Under Censored Feedback: A Threshold-Bandit Approach

审查反馈下结构学习的代价：一种阈值-老虎机方法

Michael Ledford, William Regli

发表机构 * University of Maryland, College Park（马里兰大学学院公园分校）

AI总结针对任务仅当联盟达到未知规模阈值时才产生奖励的审查反馈问题，提出阈值激活合作多臂老虎机模型，并通过集中式算法C-TAC实现O(log T)累积遗憾，以及去中心化事件触发协议D-TAC在保持可行性对齐的同时减少23倍通信。

AI中文摘要

在许多多智能体应用中，任务仅当由满足未知规模阈值的联盟执行时才产生奖励；否则，反馈完全被审查。这种审查造成了可识别性问题：智能体无法区分随机失败与协调不足。我们将此设置形式化为阈值激活合作多臂老虎机（TAC-MAB），并在集中式和去中心化协调下进行分析。我们证明集中式算法（C-TAC）实现了累积遗憾O(log T)，该遗憾分解为结构搜索项（捕获在审查反馈下解决可行性的代价）和统计监控项（用于价值估计）。然后我们引入D-TAC，一种去中心化事件触发协议，其中智能体仅在其结构信念改变时进行同步。实验表明，在保守信念融合下，D-TAC相对于集中式基线实现了23倍的通信减少，同时保持了可行性对齐。这些结果刻画了在审查反馈下学习的协调代价，并表明无需持续同步即可实现接近集中式的通信效率。

英文摘要

In many multi-agent applications, tasks yield rewards only when executed by a coalition meeting an unknown size threshold; otherwise, feedback is fully censored. This censorship creates an identifiability problem: agents cannot distinguish stochastic failure from insufficient coordination. We formalize this setting as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and analyze it under both centralized and decentralized coordination. We show that a centralized algorithm (C-TAC) achieves cumulative regret O(log T), decomposed into a structural-search term that captures the cost of resolving feasibility under censored feedback and a statistical-monitoring term for value estimation. We then introduce D-TAC, a decentralized event-triggered protocol in which agents synchronize only when their structural beliefs change. Empirically, D-TAC achieves a 23x reduction in communication relative to the centralized baseline while preserving feasibility alignment under conservative belief fusion. These results characterize the coordination cost of learning under censored feedback and show that near-centralized communication efficiency is achievable without continuous synchronization.
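
The identifiability problem described above can be made concrete with a toy feedback model: a zero reward is ambiguous, while any positive reward proves that the coalition met the threshold. The sketch below assumes Bernoulli rewards, which the abstract does not state, and `pull` and `threshold_upper_bound` are hypothetical helpers, not part of C-TAC or D-TAC.

```python
import random
from typing import Iterable, Optional, Tuple


def pull(coalition_size: int, threshold: int, mean_reward: float,
         rng: random.Random) -> float:
    """Return a Bernoulli reward, or 0.0 when the coalition is too small."""
    if coalition_size < threshold:
        return 0.0  # censored: indistinguishable from an unlucky draw
    return 1.0 if rng.random() < mean_reward else 0.0


def threshold_upper_bound(
        history: Iterable[Tuple[int, float]]) -> Optional[int]:
    """Smallest coalition size that ever produced a reward, if any.

    A positive reward at size s certifies threshold <= s; zeros certify nothing.
    """
    sizes = [size for size, reward in history if reward > 0]
    return min(sizes) if sizes else None
```

Under this model, structural search amounts to tightening the upper bound until repeated zeros at the next smaller size make further probing not worth its cost, which is the trade-off the structural-search regret term accounts for.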

2605.27073 2026-05-27 cs.LG 版本更新

BhashaSetu：一种以数据为中心的低资源机器翻译方法

Param Thakkar, Anushka Yadav, Michael Tiemann, Abhi Mehta, Akshita Bhasin, Shrinivas Khedkar

发表机构 * Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai（孟买韦尔马塔·吉贾拜技术学院计算机工程与信息技术系）； Tübingen AI Center, University of Tübingen, Germany（图宾根大学图宾根人工智能中心，德国）

AI总结提出BhashaSetu数据集，通过大规模、多领域、形态感知的英-马拉地语平行语料库，并验证语料库级去重对低资源神经机器翻译质量的关键影响。

AI中文摘要

我们提出了BhashaSetu，一个语言丰富的英语-马拉地语平行数据集，解决了低资源神经机器翻译（NMT）中持续存在的数据限制问题。马拉地语有超过9500万使用者，但在不同领域的高质量平行语料库中仍然代表性不足。我们的数据集包含来自新闻、政治、医疗、文学和文化等异构来源的278万个句子对，并提供了词干化和词形还原表示以支持形态感知分析。我们使用BLEU、spBLEU、chrF++和TER指标对多个最先进的翻译模型进行了基准测试，并使用LoRA对NLLB-200-distilled-600M进行了参数高效微调。我们消融实验的一个关键发现是：语料库级去重是预处理中对下游质量贡献最大的单一因素（去除它会使性能降低1.17 BLEU和2.21 chrF++），这表明对于低资源、形态丰富的语言，有纪律的跨源语料库卫生是一种低成本、高影响力的干预措施。该数据集已公开发布，以促进可重复且语言信息丰富的低资源NMT研究。

英文摘要

We present BhashaSetu, a linguistically enriched English-Marathi parallel dataset addressing persistent data limitations in low-resource neural machine translation (NMT). Marathi, spoken by over 95 million people, remains underrepresented in high-quality parallel corpora across diverse domains. Our dataset comprises 2.78 million sentence pairs from heterogeneous sources including news, politics, healthcare, literature, and culture, with stemmed and lemmatized representations to support morphology-aware analysis. We benchmark multiple state-of-the-art translation models using BLEU, spBLEU, chrF++, and TER metrics, and conduct parameter-efficient fine-tuning of NLLB-200-distilled-600M using LoRA. A key finding from our ablation: corpus-level deduplication is the single largest preprocessing contributor to downstream quality (removing it reduces performance by 1.17 BLEU and 2.21 chrF++), demonstrating that disciplined cross-source corpus hygiene is a low-cost, high-impact intervention for low-resource, morphologically rich languages. The dataset is publicly released to promote reproducible and linguistically informed low-resource NMT research.
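
Corpus-level deduplication, the step the ablation singles out, can be as simple as dropping sentence pairs whose normalized text has already been seen anywhere in the merged corpus. The sketch below shows that idea. The normalization choices (Unicode NFC, case folding, whitespace collapsing) are assumptions for illustration, since the abstract does not describe the dataset's actual pipeline.

```python
import unicodedata
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (English sentence, Marathi sentence)


def normalize(text: str) -> str:
    """Canonicalize Unicode, fold case, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).casefold().split())


def deduplicate_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep the first occurrence of each normalized pair across all sources."""
    seen: Set[Pair] = set()
    kept: List[Pair] = []
    for src, tgt in pairs:
        key = (normalize(src), normalize(tgt))
        if key not in seen:
            seen.add(key)
            kept.append((src, tgt))
    return kept
```

Running this over the concatenation of all sources, not source by source, is what makes the deduplication corpus-level: a sentence pair that appears in both a news crawl and a government release is kept only once.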

2605.27043 2026-05-27 stat.ML cs.LG stat.ME 版本更新

Causal Representation Learning for Generalisable Recommendation

因果表示学习用于可泛化推荐

Yorgos Felekis, Michael O'Riordan, Oriol Corcoll, Ciarán M. Gilligan-Lee

发表机构 * University of Warwick（沃里克大学）； Spotify（Spotify公司）； University College London（伦敦大学学院）

AI总结针对推荐系统中训练分布与部署分布不一致导致的泛化问题，提出基于因果表示学习的信息论解缠标准及其可计算变分下界，仅利用混淆日志即可提升模型在分布偏移下的泛化能力，在Spotify A/B测试、KuaiRand数据集和合成基准上验证了有效性。

AI中文摘要

基于观测数据训练的预测模型在部署时往往无法泛化到所遇到的分布，尤其是当训练数据是被优化系统的产物时。推荐系统是一个典型例子：它们是在被部署策略、过去用户行为和平台过滤混淆的交互日志上训练的。因此，训练分布与在服务时评分的候选分布存在显著差异，这种差距使得离线指标无法可靠预测在线性能。我们通过一种受因果表示学习（CRL）启发的方法来解决分布偏移问题。我们提出了一种信息论解缠标准，并证明其最优值仅取决于输入的因果成分。然后，我们推导出一个可处理的变分下界，使得该标准仅从有限观测数据中即可优化。我们的方法范围比大多数CRL文献更窄，因为我们目标是改善分布偏移下的泛化能力，而非完全识别所有潜在因果因素。这个更窄的目标使得该方法实用，仅需要现有的混淆日志，适用于任何标准监督模型，且不增加推理时间成本。我们的主要评估是在Spotify上对数百万用户进行的A/B测试，应用于个性化播放列表生成的排序器。一个容量匹配的CRL变体在离线性能上相当，但在在线听众参与度上带来了显著提升。在公开的KuaiRand推荐数据集和具有已知因果结构的合成基准上的补充证据显示了相同模式：与基线离线持平，在分布偏移下获得收益。在所有三种设置中，加入我们的因果解缠目标都带来了更有意义的分布外泛化。

英文摘要

Predictive models trained on observational data often fail to generalise to the distributions they encounter when deployed, especially when the training data is a product of the system being optimised. Recommender systems are a canonical example: they are trained on interaction logs confounded by the deployed policy, past user behaviour, and platform filtering. As a result, the training distribution differs substantially from the candidate distribution scored at serving time, a gap that makes offline metrics unreliable predictors of online performance. We address the distribution shift problem with a method motivated by causal representation learning (CRL). We propose an information-theoretic disentanglement criterion and prove that its optimum depends only on the causal components of the input. We then derive a tractable variational lower bound that makes the criterion optimisable from finite observational data alone. The scope of our method is narrower than that of much of the CRL literature, in that we target better generalisation under distribution shift, not full identification of all latent causal factors. This narrower target is what makes the method practical, requiring only the existing confounded logs, applying to any standard supervised model, and adding no inference-time cost. Our headline evaluation is an A/B test with millions of users on Spotify, applied to a production ranker for personalised playlist generation. A capacity-matched CRL variant performed on par offline but delivered substantial online gains in listener engagement. Complementary evidence on the public KuaiRand recommendation dataset and a synthetic benchmark with known causal structure shows the same pattern: offline parity with baseline, gains under distribution shift. Across all three settings, adding our causal disentanglement objective yields meaningfully better out-of-distribution generalisation.

2605.27033 2026-05-27 cs.CL cs.AI cs.LG 版本更新

Tracing Computation Density in LLMs

追踪LLMs中的计算密度

Corentin Kervadec, Iuliia Lysova, Iuri Macocco, Marco Baroni, Gemma Boleda

发表机构 * Universitat Pompeu Fabra（庞培法布拉大学）； ICREA

AI总结提出s-Trace方法估计最优子图，发现LLM计算分为早期稀疏核心和后期密集细化两个阶段，且计算量与模型不确定性相关。

AI中文摘要

基于Transformer的大型语言模型（LLMs）由数十亿个参数组成，这些参数排列在深度和宽度都很大的计算图中，但尚不清楚它们是否对所有输入都充分利用了全部容量。我们引入了s-Trace方法，以有效估计最能近似完整模型输出的大小为s的子图。通过这种方法，我们发现各种LLM中的计算组织成两个不同的阶段。一个主要由早期层节点组成的小子图可以重建完整模型输出分布的头部。添加更多节点（主要位于后期层，且越来越多地由注意力头组成）会导致近似完整输出分布的逐步细化。此外，我们发现每个输入所需的计算量与模型不确定性相关，并且更稀疏的子图编码浅层统计信息，例如单字频率。总体而言，我们的结果表明，有效的LLM计算中存在一致的模块化组织，其中稀疏的早期层核心提供粗略预测，然后通过后期层中更密集的计算进一步细化。

英文摘要

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs, but it is not clear that they exploit their full capacity for all inputs. We introduce the s-Trace method to efficiently estimate the subgraph of size s that best approximates a full model output. With this method, we find the computation in a variety of LLMs to be organized in two distinct phases. A small subgraph mostly composed of early-layer nodes can reconstruct the head of the full model output distribution. Adding further nodes, mostly located in later layers and increasingly consisting of attention heads, leads to incremental refinements in approximating the full output distribution. We find moreover that the amount of necessary computation per input correlates with model uncertainty, and that sparser subgraphs encode shallow statistics, such as unigram frequency. Overall, our results suggest a consistent modular organization in effective LLM computation, with a sparse early-layer core providing a rough prediction that is further refined through denser computations in later layers.

2605.27028 2026-05-27 cs.LG cs.AI 版本更新

Less is More: Early Stopping Rollout for On-Policy Distillation

少即是多：用于在线策略蒸馏的早停展开

Zhou Ziheng, Jiaqi Li, Huacong Tang, Ying Nian Wu, Demetri Terzopoulos

发表机构 * University of California, Los Angeles（加州大学洛杉矶分校）； Beijing Institute of General Artificial Intelligence（北京通用人工智能研究院）

AI总结针对在线策略蒸馏中存在的“离策略教师衰减”问题，提出早停展开（ESR）方法，通过将展开生成限制在前几个响应token来提升性能、GPU效率和训练稳定性。

AI中文摘要

在线策略蒸馏最近成为标准序列级模仿的有前途的替代方案，通过使用教师模型对学生自身的展开进行评分来训练学生。然而，我们观察到这种范式中的“离策略教师衰减”问题：对于后面的token，由于学生的早期轨迹作为上下文对于教师来说是离策略的，教师产生纠正性分数的能力会衰减，并可能退回到预训练阶段学习的token补全行为。我们通过实验验证了这个问题，并提出了早停展开（ESR）来解决它：一种简单而有效的蒸馏策略，仅限制展开生成到前几个响应token。我们表明，ESR在模型大小、家族、任务和训练制度上均超越了全展开在线策略蒸馏的性能，并且在跨模型家族场景下表现出更高的GPU效率和训练稳定性。我们进一步研究了这一惊人性能背后的机制，发现了ESR的“级联对齐”和“子模式承诺”效应，这可能解释其为何有效，甚至有时超过教师模型性能。此外，我们表明这种基于位置的token选择策略不能完全由KL散度和熵信号解释。

英文摘要

On-policy distillation has recently emerged as a promising alternative to standard sequence-level imitation, training a student by scoring its own rollouts with a teacher model. However, we observe an "Off-policy Teacher Decay" problem in this paradigm: for later tokens, the student's earlier trajectory serves as context that is off-policy to the teacher, so the teacher's ability to produce a corrective score decays, and it may fall back to token-completion behavior learned in the pre-training stage. We empirically verify this problem and propose Early Stopping Rollout (ESR) to fix it: a simple yet effective distillation strategy that restricts rollout generation to the first few response tokens. We show that ESR both surpasses full-rollout on-policy distillation (OPD) performance across model sizes, families, tasks, and training regimes, and exhibits much higher GPU efficiency and training stability, especially in cross-model-family scenarios. We further investigate the mechanism behind this surprising performance and discover the "Cascading Alignment" and "Sub-mode Commitment" effects of ESR, which may explain why it works effectively and sometimes even exceeds the teacher model's performance. In addition, we show that this position-based token selection strategy cannot be fully explained by KL divergence and entropy signals.
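
The core of the early-stopping idea can be sketched as a distillation loss that only ever sees the first few tokens of each student rollout. The version below assumes a per-token signal of student log-probability minus teacher log-probability on the sampled tokens, which is a common choice for on-policy distillation but is not confirmed by the abstract. The function name and the `max_tokens` parameter are hypothetical.

```python
from typing import Sequence


def early_stopped_loss(student_logprobs: Sequence[float],
                       teacher_logprobs: Sequence[float],
                       max_tokens: int) -> float:
    """Mean (student - teacher) log-prob gap over the first max_tokens tokens."""
    n = min(max_tokens, len(student_logprobs), len(teacher_logprobs))
    if n == 0:
        return 0.0
    gaps = (s - t for s, t in zip(student_logprobs[:n], teacher_logprobs[:n]))
    return sum(gaps) / n
```

In an actual training loop the rollout would stop being generated after `max_tokens` tokens, not merely be masked afterwards, which is where the reported GPU savings would come from.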

2605.27027 2026-05-27 cs.LG 版本更新

SQARL: A Size-Agnostic Reinforcement Learning approach for Circuit Allocation in Distributed Quantum Architectures

SQARL: 一种适用于分布式量子架构中电路分配的大小无关强化学习方法

Víctor Carballo, Júlia López-Closa, Mario Martin

发表机构 * Computer Science Department, Universitat Politècnica de Catalunya - BarcelonaTech (UPC)； High Performance Artificial Intelligence group, Barcelona Supercomputing Center

AI总结针对分布式量子计算中的量子比特分配问题，提出一种基于Transformer的灵活强化学习架构，无需重新训练即可处理任意数量的量子比特和核心，在Cuccaro加法器上的分配成本比匈牙利量子比特分配（HQA）算法降低33%，在随机电路上平均降低25%。

AI中文摘要

量子处理器的扩展目前受到退相干和串扰等技术挑战的限制。随着量子比特数量的增加，干扰会增大计算噪声。分布式量子计算通过互连更小、更易处理的量子处理器（核心）来解决这些限制，但引入了最小化缓慢且易出错的核间通信的挑战。在最小化通信成本的同时将量子电路分配到核心的任务被称为量子比特分配问题。本文致力于开发一种深度学习方法来解决该问题，强调对量子硬件拓扑的灵活性，并提升现有最优性能。启发式和非学习算法，如匈牙利量子比特分配（HQA），目前代表了最优水平。强化学习（RL）方法利用学习到的分配策略，但通常缺乏灵活性，当硬件配置改变时需要重新训练，并且其解的质量不如非学习方法。然而，学习机制可能超越人工设计的启发式方法。为克服这些限制，本文提出一种灵活的基于Transformer的架构，无需重新训练即可处理任意数量的量子比特和核心。结果表明，训练后的策略持续优于先前的RL最优水平，并缩小了RL与HQA在大多数常见电路上的差距。对于Cuccaro加法器，它相对于HQA实现了33%的分配成本降低，对于随机电路平均降低25%。这些发现表明，基于学习的方法可以有效地匹配手工启发式方法的性能，这是向实际应用迈出的关键一步。

英文摘要

The scaling of quantum processors is currently limited by technical challenges such as decoherence and cross-talk. As the number of qubits grows, interference increases the computational noise. Distributed quantum computing addresses these limitations by interconnecting smaller, easier-to-handle quantum processors (cores), but it introduces the challenge of minimizing slow, error-prone inter-core communication. The task of distributing quantum circuits across cores while minimizing communication costs is known as the Qubit Allocation problem. This work focuses on developing a deep learning approach to this problem, emphasizing flexibility to quantum hardware topology and improving state-of-the-art performance. Heuristic and non-learning algorithms, such as the Hungarian Qubit Allocation (HQA), currently represent the state of the art. Reinforcement Learning (RL) approaches leverage learned allocation policies but often lack flexibility, requiring retraining when hardware configurations change, and they fall short of the solution quality achieved by non-learning methods. However, learning mechanisms could outperform human-crafted heuristics. To overcome these limitations, this work proposes a flexible, transformer-based architecture that can handle arbitrary numbers of qubits and cores without retraining. Results show that the trained policy consistently outperforms the previous RL state of the art and narrows the gap between RL and HQA for the most common circuits. It achieves a 33% reduction in allocation cost relative to the HQA for the Cuccaro Adder and 25% on average for random circuits. These findings show that learning-based approaches can effectively match the performance of hand-crafted heuristics, a crucial step towards their application in real-world scenarios.

2605.27016 2026-05-27 cs.CL cs.AI cs.LG stat.ML 版本更新

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

评估不确定性估计器与LLM幻觉的相关性

Yedidia Agnimo, Anna Korba, Annabelle Blangero, Nicolas Chesneau, Karteek Alahari

发表机构 * CREST, ENSAE Institut Polytechnique de Paris（CREST，巴黎高等理工学院）； Ekimetrics France（法国Ekimetrics）； Centre Inria de l’Université Grenoble Alpes（格勒诺布尔阿尔卑斯大学信息研究院）

AI总结通过系统实证研究，评估信息论、基于采样和反思性等不确定性估计器与LLM幻觉之间的关联，发现关联性高度可变且通常较弱，挑战了将不确定性作为幻觉直接信号的做法。

Comments 35 pages, 7 figures, 9 tables

AI中文摘要

大型语言模型（LLM）容易产生幻觉，即与输入或训练数据不符的陈述，阻碍了可靠部署。同时，许多不确定性估计（UE）方法被提出来量化模型置信度，并常被隐含地视为模型失败的代理。然而，不确定性与幻觉之间的关系尚未得到充分表征。我们对不确定性估计器与LLM幻觉之间的关联进行了系统的实证研究。我们不是假设这种关联，而是直接评估它在何时以及在多大程度上成立。我们考虑了多种不确定性估计器，包括信息论、基于采样和反思性估计器，并检查了它们在幻觉设置中的行为。我们的实验涵盖了内在幻觉（违反输入忠实性）和外在幻觉（相对于训练数据的无根据主张），使用了四个互补基准，包括RAGTruth和HalluLens。我们发现，这种关联性高度可变且通常较弱，取决于幻觉类型和所评估的LLM。这些结果挑战了将不确定性作为幻觉直接信号的做法，并阐明了何时它能提供可操作的信息。

英文摘要

Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment. In parallel, numerous uncertainty estimation (UE) methods have been proposed to quantify model confidence and are often implicitly treated as proxies for model failure. However, the relationship between uncertainty and hallucinations remains insufficiently characterized. We present a systematic empirical study of the association between uncertainty estimators and hallucinations in LLMs. Rather than assuming this association, we evaluate directly when and to what extent it holds. We consider a diverse set of uncertainty estimators, including information-theoretic, sampling-based, and reflexive estimators, and examine their behavior across hallucination settings. Our experiments cover both intrinsic hallucinations (violations of input faithfulness) and extrinsic hallucinations (unsupported claims relative to training data), using four complementary benchmarks, including RAGTruth and HalluLens. We find that the association is highly variable and often weak, depending on the hallucination type and the LLM under evaluation. These results challenge the use of uncertainty as a direct signal of hallucination and clarify when it provides actionable information.
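
To make "association between an uncertainty estimator and hallucination" concrete, the sketch below pairs one information-theoretic estimator (predictive entropy) with one association measure (AUROC of the uncertainty score against binary hallucination labels). AUROC is assumed here as a common choice. The abstract does not name the metric the study uses, and both function names are illustrative.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a hallucinated item (label 1) gets the higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one item of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUROC near 0.5 would correspond to the "often weak" association the abstract reports, while values near 1.0 would mean the estimator ranks hallucinated outputs as more uncertain almost every time.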

2605.27009 2026-05-27 cs.LG 版本更新

SCENT: Aligning Mass Spectra with Molecular Structure for Olfactory Perception

SCENT: 将质谱与分子结构对齐用于嗅觉感知

Ziqi Zhang, Eunyeong Jin, Miguel Vasco, Farzaneh Taleb, Nona Rajabi, Alexandra Gutmann, Jonathan Williams, Antônio H. Ribeiro, Danica Kragic

Affiliations: Dept. of Intelligent Systems, KTH Royal Institute of Technology; Atmospheric Chemistry Dept., Max Planck Institute for Chemistry; Dept. of Information Technology, Uppsala University; Science for Life Laboratory (SciLifeLab), Uppsala

AI summary: Proposes SCENT, a multi-modal contrastive learning framework that aligns electron ionization mass spectrum representations with pretrained chemical structure embeddings, reaching olfactory prediction performance comparable to structure-based models without requiring molecular structure at inference.

AI中文摘要

从分子结构预测人类嗅觉感知已取得显著进展，但这些方法在推理时需要明确的化学结构，而这在实际传感场景中并不可用。我们通过探索直接电子电离质谱（EI-MS）作为嗅觉预测的替代输入模态来弥补这一差距，该传感技术可在数秒内获取化学信息丰富的碎片指纹。我们提出了谱图到化学嵌入对齐（SCENT），这是一个多模态对比学习框架，它将EI-MS表示与预训练的化学结构嵌入对齐，同时在推理时仅需要质谱。在多标签气味描述符预测任务中，SCENT显著优于仅使用MS的基线，并实现了与基于结构的模型相当的性能，尽管在测试时不需要明确的分子结构。学习到的表示还能更好地逼近连续的人类感知评分，并泛化到真实实验室测量的谱图，表明跨模态对齐是将分析谱图嵌入化学语义的有效策略。

英文摘要

Predicting human olfactory perception from molecular structure has seen remarkable progress, yet these approaches require explicit chemical structure at inference, which is not available in practical sensing settings. We address this gap by exploring direct electron ionization mass spectrometry (EI-MS), a sensing technique that acquires chemically informative fragmentation fingerprints in seconds, as an alternative input modality for olfactory prediction. We contribute Spectrum-to-Chemical Embedding alignmeNT (SCENT), a multi-modal contrastive learning framework that aligns EI-MS representations with pretrained chemical structure embeddings, while requiring only mass spectra at inference. On the multi-label odor descriptor prediction task, SCENT significantly outperforms MS-only baselines and achieves performance comparable to structure-based models, despite requiring no explicit molecular structure at test time. The learned representations also better approximate continuous human perceptual ratings and generalize to real-world lab-measured spectra, suggesting that cross-modal alignment is an effective strategy for grounding analytical spectra in chemical semantics.

2605.27006 2026-05-27 cs.LG cond-mat.dis-nn stat.ML 版本更新

Sampling Data with Chains of Forward-Backward Diffusion Steps

通过前向-反向扩散步骤链采样数据

Hyunmo Kang, Noam Itzhak Levi, Corinna Elena Wegner, Daniel J. Korchinski, Matthieu Wyart

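Affiliations: Johns Hopkins University; EPFL; University of Göttingen

AI summary: Introduces U-turn chains, Markov chains built by iterating short forward-backward steps of a diffusion model and paired with a Metropolis-Hastings correction to sample from energy-modified targets; minimal U-turn dynamics undergo an ergodicity-breaking phase transition driven by fragmentation of the data manifold.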

AI中文摘要

从学习到的高维分布中采样是一个基础的计算问题。我们引入U-turn链：通过迭代扩散模型的短前向-反向步骤获得的马尔可夫链，其中每一步提出一个保持在所学数据流形上的移动，并与Metropolis-Hastings校正配对，从能量修正目标中采样。对于合成语言，我们表明最小U-turn动力学经历由数据流形碎片化驱动的遍历性破缺相变；在更大的U-turn幅度下遍历性得以恢复。在非遍历区域，低层特征比高层特征松弛得更快，这种顺序仅在足够大的U-turn幅度下才会反转。我们在自然语言和自然图像上测试这些预测。在两种模态中，最小U-turn松弛缓慢，尤其是对于由CNN或LLM中深层表示近似的高层特征。层序反转仅在噪声足够大且混合高效时出现——这些特征与强约束、弱混合的局部动力学一致。我们讨论了这些结果对使用扩散模型采样的启示。

英文摘要

Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data manifold and, paired with a Metropolis-Hastings correction, samples from energy-modified targets. For synthetic languages, we show that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold; ergodicity is restored at larger U-turn magnitude. In the non-ergodic regime, low-level features relax faster than high-level ones, an ordering that inverts only at sufficiently large U-turn magnitude. We test these predictions on natural language and natural images. In both modalities, minimal U-turns relax slowly, especially for high-level features approximated by deep representations in CNNs or LLMs. The layer-ordering inversion appears only at large noise, when mixing is efficient; these signatures are consistent with strongly constrained, weakly mixing local dynamics. We discuss the implications of these results for sampling with diffusion models.

2605.26998 2026-05-27 cs.LG q-bio.NC 版本更新

Probabilistic Recurrent Intention Switching Model

概率递归意图切换模型

Wenyuan Sheng, Hao Zhu, Joschka Boedecker

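Affiliations: Department of Computer Science, University of Freiburg

AI summary: Proposes PRISM, which models non-stationary intention switching with a lightweight recurrent network, yields an exact EM decomposition with closed-form subproblems, and attains the highest held-out likelihood while recovering interpretable intentions on gridworld, mouse labyrinth, and robotic manipulation tasks.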

AI中文摘要

逆强化学习（IRL）从观察到的行为中恢复奖励函数，但传统方法假设单一固定奖励，无法捕捉一个回合内的目标切换。最近的多意图IRL方法通过分割轨迹来解决这一问题，但将意图转换建模为无记忆马尔可夫链或通过固定历史窗口的手动状态增强。我们提出概率递归意图切换模型（PRISM），该模型用轻量级递归网络替代这两种机制，将观察历史映射到每步意图分布。我们证明由此产生的EM目标可以精确分解为独立的每意图奖励子问题，每个子问题可闭式求解，从而得到$\mathcal{O}(nK)$的E步，无需变分近似。我们在非马尔可夫网格世界、小鼠迷宫和BridgeData V2机器人操作（首个大规模多意图IRL机器人应用）上评估PRISM。在所有设置中，PRISM在保持最高留出对数似然的同时，从未标记的演示中恢复出可命名、时间上连贯的意图，表明离散目标切换存在于生物和人工智能体中。

英文摘要

Inverse reinforcement learning (IRL) recovers reward functions from observed behavior, yet traditional methods assume a single stationary reward that cannot capture goal switching within an episode. Recent multi-intention IRL methods address this by segmenting trajectories, but model intention transitions as either a memoryless Markov chain or via manual state augmentation with a fixed history window. We propose the Probabilistic Recurrent Intention Switching Model (PRISM), which replaces both mechanisms with a lightweight recurrent network that maps observation history to a per-step intention distribution. We prove that the resulting EM objective decomposes exactly into independent per-intention reward subproblems, each solvable in closed form, yielding an $\mathcal{O}(nK)$ E-step with no variational approximation. We evaluate PRISM on a non-Markovian gridworld, a mouse labyrinth, and BridgeData V2 robotic manipulation, the first large-scale robotic application of multi-intention IRL. Across all settings PRISM achieves the highest held-out log-likelihood while recovering nameable, temporally coherent intentions from unlabeled demonstrations, suggesting that discrete goal switching is present in both biological and artificial agents.

2605.26990 2026-05-27 stat.ML cs.LG 版本更新

Constrained Bayesian Experimental Design via Online Planning

通过在线规划的约束贝叶斯实验设计

Yujia Guo, Daolang Huang, Xinyu Zhang, Sammie Katt, Samuel Kaski, Ayush Bharti

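Affiliations: ELLIS Institute Finland; Department of Computer Science, Aalto University, Finland; Department of Computer Science, University of Manchester, UK

AI summary: Combines an offline-pretrained amortized policy and posterior network with online multi-step lookahead planning over a scenario tree to optimize Bayesian experimental design under dynamic constraints, obtaining more informative design sequences than existing methods at moderate computational cost.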

Comments 24 pages, 9 figures. Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

2605.26984 2026-05-27 cs.LG 版本更新

TED: Related Party Transaction guided Tax Evasion Detection on Heterogeneous Graph

TED：基于关联方交易的异构图偷漏税检测

Yiming Xu, Bin Shi, Bo Dong, Jiaxiang Wang, Hua Wei, Qinghua Zheng

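Affiliations: School of Computer Science and Technology, Xi’an Jiaotong University; School of Distance Education, Xi’an Jiaotong University; School of Computing and Augmented Intelligence, Arizona State University

AI summary: Existing tax evasion detection methods underuse the rich interaction information in tax scenarios; TED, a heterogeneous graph neural network, filters noise using related party transaction groups and captures deeper semantics with a hierarchical attention mechanism, significantly outperforming existing methods on real-world datasets.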

Comments Accepted by Data Mining and Knowledge Discovery (DMKD25)

AI中文摘要

偷漏税导致政府收入严重损失并扰乱公平竞争的经济秩序。为缓解这一问题，最新的偷漏税检测解决方案利用专家知识提取特征，然后训练分类器判断公司是否涉嫌偷漏税。然而，现有方案主要关注公司的统计特征，未能利用税务场景中丰富的交互信息，从而影响检测性能。在本文中，我们首先将税务场景建模为异构图，并研究异构图模型下的偷漏税检测问题。为了提高偷漏税检测的性能，提出了一种新颖的图神经网络模型来提取异构图的综合信息。具体来说，我们利用异构且复杂的关联方交易组来过滤低层噪声信息。此外，设计了一种层次注意力机制来捕获关联方交易组中隐藏的更深层次结构和语义信息。我们将该方法应用于税务局的真实风险管理系统，并在两个人工标注的真实世界税务数据集上进行评估。结果表明，我们的方法在偷漏税检测任务上显著优于现有最先进方法。

英文摘要

Tax evasion causes severe losses of government revenues and disturbs the economic order of fair competition. To help alleviate this problem, the latest tax evasion detection solutions utilize expert knowledge to extract features and then train classifiers to determine whether a company is suspected of tax evasion. However, existing solutions mainly focus on the statistical features of the company and fail to exploit the rich interactive information in tax scenarios, which degrades detection performance. In this paper, we first model the tax scenario as a heterogeneous graph and study the tax evasion detection problem under the heterogeneous graph model. To improve the performance of tax evasion detection, a novel graph neural network model is proposed to extract the comprehensive information of heterogeneous graphs. Specifically, we use heterogeneous and complex related party transaction groups to filter low-level noise information. Moreover, a hierarchical attention mechanism is designed to capture the deeper structure and semantic information hidden in the related party transaction group. We apply our method to the real risk management system of the tax bureau, and evaluate it on two human-labeled real-world tax datasets. The results demonstrate that our method significantly outperforms the state-of-the-art in the tax evasion detection task.

2605.26977 2026-05-27 cs.LG math.OC 版本更新

Convergence of Spectral Descent for Non-smooth Optimization

非光滑优化的谱下降收敛性

Yixuan Yang, Yuqing He, Song Li

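Affiliations: School of Mathematical Sciences, Zhejiang University, Hangzhou, China

AI summary: Studies Spectral Descent (SD), a simplified variant of the Muon optimizer, and its truncated version (TSD), establishing global linear convergence in non-smooth convex optimization with an application to robust low-rank matrix recovery.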

AI中文摘要

Muon优化器最近在训练大型语言模型方面展示了显著的经验成功。然而，对其机制的理论理解仍然有限。目前Muon的收敛保证严重依赖于光滑性假设，其非光滑收敛行为在很大程度上未被探索。在这项工作中，我们通过研究谱下降(SD)（Muon的简化变体）及其截断版本截断谱下降(TSD)，朝着弥合这一差距迈出了一步。在凸性、Lipschitz连续性和尖锐性条件下，我们建立了SD和TSD在非光滑凸公式中的全局线性收敛性。我们还研究了配备解耦权重衰减的正则化变体，并通过它们与Frank-Wolfe方法的联系推导出次线性收敛保证。最后，我们将我们的理论框架应用于混合稀疏和密集噪声下的鲁棒低秩矩阵恢复，并提供了严格的恢复保证。数值实验支持理论发现，并展示了Muon类型方法在非光滑优化中的有效性。

英文摘要

The Muon optimizer has recently demonstrated remarkable empirical success in training large language models. However, the theoretical understanding of its mechanisms remains limited. Current convergence guarantees for Muon rely heavily on smoothness assumptions, leaving its non-smooth convergence behavior largely unexplored. In this work, we take a step toward bridging this gap by investigating Spectral Descent (SD), a simplified variant of Muon, together with its truncated counterpart, Truncated Spectral Descent (TSD). Under convexity, Lipschitz continuity, and sharpness conditions, we establish global linear convergence for both SD and TSD in non-smooth convex formulations. We also study regularized variants equipped with decoupled weight decay and derive sublinear convergence guarantees through their connection with Frank-Wolfe methods. Finally, we apply our theoretical framework to robust low-rank matrix recovery under mixed sparse and dense noise regimes and provide rigorous recovery guarantees. Numerical experiments support the theoretical findings and demonstrate the effectiveness of Muon-type methods for non-smooth optimization.
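
As a rough illustration of the update this abstract studies, the sketch below assumes the common formulation of spectral descent in which the step direction is the orthogonal polar factor $UV^\top$ of a (sub)gradient $G = U\Sigma V^\top$. The function names and `step_size` are hypothetical; the abstract does not state the step-size schedule or how TSD truncates the spectrum, so neither is modeled here.

```python
import numpy as np


def spectral_direction(grad: np.ndarray) -> np.ndarray:
    """Return U @ Vt from the thin SVD of grad (its orthogonal polar factor)."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


def spectral_descent_step(weights: np.ndarray, grad: np.ndarray,
                          step_size: float) -> np.ndarray:
    """One assumed spectral-descent update: move against the polar factor."""
    return weights - step_size * spectral_direction(grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_normal((4, 3))   # stand-in for a (sub)gradient
    weights = np.zeros((4, 3))
    print(spectral_descent_step(weights, grad, step_size=0.1))
```

Every singular value of the returned direction equals 1, so the size of the step is set by `step_size` alone rather than by the magnitude of the gradient.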

2605.26973 2026-05-27 stat.ML cond-mat.dis-nn cs.LG cs.NE q-bio.NC 版本更新

Signal-to-Noise Ratio and Sample Size Govern Representational Alignment in Neural Networks

信噪比与样本量控制神经网络中的表征对齐

Ali Hussaini Umar, Alessandro Laio

Affiliations: SISSA Trieste, Italy; Theoretical and Scientific Data Science (TSDS) group at the International School for Advanced Studies (SISSA); Condensed Matter and Statistical Physics section at the International Centre for Theoretical Physics (ICTP)

AI summary: Shows theoretically and experimentally that representational alignment varies monotonically with the signal-to-noise ratio but non-monotonically with training sample size, is minimized near the interpolation threshold, and is decoupled from generalization error.

AI中文摘要

已知神经网络会发展出潜在表征，这些表征是“对齐”的，即在不同架构、训练协议或训练数据集训练的网络之间结构相似。我们在一个受控环境中研究这一现象，使用被噪声过程的独立实现扰动的训练集，训练一组网络执行回归和分类任务。我们表明，信噪比（SNR）和训练样本量以定性相似的方式影响对齐，无论是在真实世界数据集上训练的网络，还是在极其简单的具有单个隐藏层的线性网络中（其对齐可以解析估计）。在线性和非线性网络、回归和分类任务以及合成和真实数据中，我们一致观察到，对齐随SNR单调变化，但随训练样本量非单调变化。特别地，对齐在插值阈值附近最小，且更强的对齐不一定对应更好的泛化误差。这些发现揭示了数据质量和数量对对齐的非平凡依赖关系，且与泛化性能解耦。

英文摘要

Neural networks are known to develop latent representations that are "aligned", namely structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple linear network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and a stronger alignment does not necessarily correspond to better generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity, decoupled from generalization performance.

2605.26971 2026-05-27 cs.LG 版本更新

RLVR Datasets and Where to Find Them: Tracing Data Lineage for Better Training Data

RLVR 数据集及其查找方法：通过数据溯源寻找更好的训练数据

Hsiu-Yuan Huang, Weijie Liu, Chenming Tang, Sanwoo Lee, Kai Yang, Yangkun Chen, Saiyong Yang, Yunfang Wu

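Affiliations: National Key Laboratory for Multimedia Information Processing, Peking University; School of Computer Science, Peking University; LLM Department, Tencent

AI summary: To address the unclear provenance of Reinforcement Learning from Verifiable Rewards (RLVR) datasets, proposes Atomic-source Tracing via Lineage-Aware Search (ATLAS), which traces over 99.7% of instances to 20 atomic sources, and curates the decontaminated dataset DAPO++ under the Source-level Counterfactual Attribution (SCA) principle; the resulting quality score Q correlates strongly with downstream RLVR performance.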

Comments 7 figures, 12 tables

AI中文摘要

可验证奖励强化学习（RLVR）数据集的激增加剧了来源崩溃问题，原因是现有数据集之间的谱系不明确。为弥合这一碎片化的RLVR数据格局，我们提出了基于谱系感知搜索的原子源追踪（ATLAS），这是一个系统框架，用于将RLVR数据集追溯至其原子源，将145万个实例中的超过99.7%归因于20个原子源。我们的分析表明，大多数RLVR数据集是一小组共享上游源的变体，很少有引入真正新数据的，许多面临数据污染风险。这些发现自然促使我们策划一个新的RLVR数据集DAPO++，并从谱系感知的角度对现有数据集进行基准测试。为此，我们提出源级反事实归因（SCA）作为指导原则，以策划一个具有集中学习信号的去污染训练数据集。本质上，SCA通过比较每个原子源的RL检查点与共享基模型来测量样本的边际效用。基于这些归因信号，我们进一步设计了一个复合数据集质量分数Q，该分数与下游RLVR性能强相关。在Qwen3系列模型上的实验验证了DAPO++在保留基准上持续提升性能，而Q可靠地预测了下游RLVR训练效果。我们的代码和数据可在https://github.com/Celine-hxy/ATLAS获取。

英文摘要

The proliferation of Reinforcement Learning from Verifiable Rewards (RLVR) datasets has exacerbated provenance collapse due to unclear lineage among existing datasets. To bridge this fragmented RLVR data landscape, we propose Atomic-source Tracing via Lineage-Aware Search (ATLAS), a systematic framework for tracing RLVR datasets back to their atomic sources, attributing over 99.7% of 1.45M instances to 20 atomic sources. Our analysis reveals that most RLVR datasets are variants of a small set of shared upstream sources, with few introducing genuinely new data, and many facing data contamination risks. These findings naturally motivate us to curate a new RLVR dataset, DAPO++, and to benchmark existing datasets from a lineage-aware perspective. To this end, we propose Source-level Counterfactual Attribution (SCA) as a guiding principle to curate a decontaminated training dataset with concentrated learning signals. Essentially, SCA measures a sample's marginal utility by comparing per-atomic-source RL checkpoints against a shared base model. Building upon these attribution signals, we further design a composite dataset quality score Q that strongly correlates with downstream RLVR performance. Experiments on Qwen3 series models verify that DAPO++ consistently improves performance on held-out benchmarks, while Q reliably predicts downstream RLVR training effectiveness. Our code and data are available at https://github.com/Celine-hxy/ATLAS.

2605.26925 2026-05-27 quant-ph cs.LG 版本更新

Adaptive Reinforcement Learning for Robust Open Quantum System Control: A Multi-Task Framework with Temporal Optimization

自适应强化学习用于鲁棒开放量子系统控制：一种带有时间优化的多任务框架

Haftu W. Fentaw, Steve Campbell, Simon Caton

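Affiliations: Centre for Quantum Engineering, Science, and Technology, University College Dublin

AI summary: Proposes a multi-task Soft Actor-Critic (SAC) reinforcement learning framework for open quantum system control that learns optimal pulse sequences while discovering the problem-specific evolution time T and number of control pulse segments N, achieves high-fidelity state transfer across 51 Hamiltonian variations, and is more robust than GRAPE.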

AI中文摘要

我们提出了一种多任务软演员-评论家（SAC）强化学习框架，用于跨不同哈密顿量的开放系统量子控制，该框架学习最优脉冲序列，同时发现特定问题的演化时间T和控制脉冲段数N。在51种哈密顿量变化上的实验结果表明，多任务SAC模型能够生成控制脉冲，在环境噪声下将系统从初始状态驱动到目标状态，并具有高保真度，为适用于实际噪声量子器件的通用量子控制奠定了必要基础。通过逐步扩展训练哈密顿量集，我们研究了使用给定数量样本哈密顿量训练的单个多任务模型是否能够成功完成来自同一哈密顿量空间但训练中未遇到的哈密顿量的状态转移任务。此外，我们的鲁棒性不保真度度量（RIM）分析表明，与GRAPE优化的控制相比，SAC训练的策略对脉冲幅度扰动和退相干率变化表现出更优越的鲁棒性。

英文摘要

We present a Multi-task Soft Actor-Critic (SAC) Reinforcement Learning framework designed for open-system quantum control across diverse Hamiltonians, which learns optimal pulse sequences while simultaneously discovering problem-specific evolution time T and number of control pulse segments N. Experimental results across 51 Hamiltonian variations demonstrate that the multi-task SAC model is able to generate control pulses that can drive a system, under environment noise, from its initial state to its target state with high fidelities, establishing essential foundations for universal quantum control applicable to realistic noisy quantum devices. Through progressive expansion of the training Hamiltonian set, we investigate whether a single multi-task model trained using a given number of sample Hamiltonians can successfully accomplish state-transfer tasks for Hamiltonians drawn from the same Hamiltonian space but not encountered during training. In addition, our Robustness Infidelity Measure (RIM) analysis reveals that SAC-trained policies exhibit superior robustness to pulse amplitude perturbations and decoherence rate variations compared to GRAPE-optimized controls.

2605.26908 2026-05-27 cs.AI cs.DS cs.LG 版本更新

On the Detection of Commutative Factors in Factor Graphs: Necessary and Sufficient Conditions

关于因子图中可交换因子检测的充要条件

Malte Luttermann, Ralf Möller, Marcel Gehrke

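Affiliations: Institute for Humanities-Centered Artificial Intelligence, University of Hamburg, Germany

AI summary: Revisits the theoretical foundations of commutative factor detection in factor graphs, shows that the theorem underlying the existing algorithm gives only a necessary condition rather than a sufficient one, and presents a corrected algorithm that preserves efficiency while ensuring correctness.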

AI中文摘要

利用概率图模型（如因子图）中对象的不可区分性是提升概率推理算法的关键，并允许对领域规模进行可处理的概率推理问题。在因子图中利用不可区分对象的核心是识别可交换因子，即其输出值在分配给其部分参数的输入值的排列下保持不变的因子。本文重新审视了检测可交换因子的最先进算法的理论基础。具体而言，我们表明，在其当前形式下，最先进算法依赖于一个中心定理，该定理被错误地视为识别可交换因子的充分条件，而实际上它仅意味着必要条件。因此，正如我们在本文中所展示的，最先进算法可能会产生错误结果。为了修复当前最先进算法中存在的缺陷，我们证明了上述定理的一个略微修改版本，该版本作为识别可交换因子的必要条件。此外，我们提出了最先进算法的修正版本，在保持其效率的同时确保正确性，并引入了一种具有更严格最坏情况边界的补充算法。

英文摘要

Exploiting the indistinguishability of objects in a probabilistic graphical model such as a factor graph is key to lifted probabilistic inference algorithms and allows for tractable probabilistic inference problems with respect to domain sizes. A central building block for the exploitation of indistinguishable objects in factor graphs is the identification of commutative factors, i.e., factors whose output values are invariant under permutations of input values assigned to a subset of their arguments. In this paper, we revisit the theoretical foundations underlying the state-of-the-art algorithm to detect commutative factors. Specifically, we show that in its current form, the state-of-the-art algorithm relies on a central theorem that is mistakenly regarded as a sufficient condition to identify commutative factors, while it actually only implies a necessary condition. Consequently, the state of the art might, as we show in this paper, deliver incorrect results. To fix the flaws currently present in the state of the art, we prove a slightly modified version of the aforementioned theorem, which serves as a necessary condition to identify commutative factors. Moreover, we present a corrected version of the state-of-the-art algorithm, which keeps its efficiency while ensuring correctness, and introduce a complementary algorithm with tighter worst-case bounds.
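
The definition of a commutative factor given in this abstract can be checked directly by brute force. The sketch below is only that definitional check, with cost factorial in the number of permuted arguments, and is not the detection algorithm the paper corrects. The table encoding and the name `is_commutative` are assumptions made for illustration, and all permuted arguments are assumed to share one domain.

```python
from itertools import permutations, product


def is_commutative(table: dict, positions: tuple) -> bool:
    """Check whether a factor is invariant under every permutation of the
    values assigned to the arguments at `positions`.

    `table` maps each full assignment (a tuple) to the factor's output value
    and must cover the whole product domain.
    """
    for assignment, value in table.items():
        for perm in permutations(positions):
            permuted = list(assignment)
            # Move the value at each source position to its permuted slot.
            for src, dst in zip(positions, perm):
                permuted[dst] = assignment[src]
            if table[tuple(permuted)] != value:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical factor over three Boolean arguments: the output depends
    # on arguments 0 and 1 only through their sum, so they can be swapped.
    phi = {a: 10 * (a[0] + a[1]) + a[2] for a in product((0, 1), repeat=3)}
    print(is_commutative(phi, (0, 1)))
    print(is_commutative(phi, (0, 2)))
```

In the example table, arguments 0 and 1 are interchangeable while arguments 0 and 2 are not, so the two calls give different answers.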

2605.26900 2026-05-27 cs.LG 版本更新

SPHERE-JEPA: Spherical Prediction with Homogeneous Embeddings

SPHERE-JEPA: 均匀嵌入的球面预测

Léo Nicollier, Max Dunitz, Marc Pic, Pablo Musé, Enric Meinhardt-Llopis, Gabriele Facciolo

Affiliations: Université Paris-Saclay, CNRS, Advanced Track and Trace; ENS Paris-Saclay, Centre Borelli, Advanced Track and Trace; Université Paris-Saclay, CNRS; ENS Paris-Saclay, Centre Borelli

AI summary: Proposes SPHERE-JEPA, which adapts the Cramér-Wold projection mechanism to enforce hyperspherical uniformity rather than a Gaussian prior, addressing the anisotropic k-NN neighborhoods induced by Gaussian embeddings in self-supervised learning and yielding significant gains on texture retrieval and ImageNet-1K linear probing.

AI中文摘要

自监督学习中的一个基本开放问题是明确表征学习表示的最优几何。最近，LeJEPA将各向同性高斯嵌入确定为在欧几里得空间中最小化下游预测风险的最优解。然而，对于支撑在低维流形（如超球面）上的分布，相应问题仍未探索。在这项工作中，我们证明将这种极小极大分析扩展到黎曼流形上的光滑分布会根本性地改变最优解。我们表明，在最坏情况公式下，k近邻和核岭回归都诱导超球面均匀性。更精确地说，我们证明流形上的均匀分布对于k近邻是最优的，而球面上的均匀分布对于使用指数点积核和线性核的核岭回归是最优的。这一理论见解揭示了高斯嵌入的一个根本局限：其非均匀密度导致各向异性的k-NN邻域，严重偏置估计器。为纠正这一点，我们引入了SPHERE-JEPA，一个理论基础的SSL框架。我们调整LeJEPA的Cramér-Wold投影机制以强制超球面均匀性而非高斯先验。实验上，SPHERE-JEPA取得了显著改进，将纹理检索mAP提升了超过6%，同时在标准基准上持续匹配或超越LeJEPA——包括在ImageNet-1K（ViT-B/14）上+1.8%的线性探测增益。

英文摘要

A fundamental open question in self-supervised learning (SSL) is the explicit characterization of the optimal geometry of the learned representations. Recently, LeJEPA identified isotropic Gaussian embeddings as optimal for minimizing downstream prediction risk in Euclidean spaces. However, the corresponding problem for distributions supported on lower-dimensional manifolds, such as the hypersphere, remains unexplored. In this work, we demonstrate that extending this minimax analysis to smooth distributions on Riemannian manifolds fundamentally changes the optimal solution. We show that, under a worst-case formulation, both k-nearest neighbors and kernel ridge regression induce hyperspherical uniformity. More precisely, we show that uniform distributions on manifolds are optimal for k-nearest neighbors, and that the uniform distribution on the sphere is optimal for kernel ridge regression with both the exponential dot-product kernel and the linear kernel. This theoretical insight reveals a fundamental limitation of Gaussian embeddings: their non-uniform density induces anisotropic k-NN neighborhoods, severely biasing the estimator. To correct this, we introduce SPHERE-JEPA, a theoretically grounded SSL framework. We adapt LeJEPA's Cramér-Wold projection mechanism to enforce hyperspherical uniformity rather than a Gaussian prior. Empirically, SPHERE-JEPA yields significant improvements, boosting texture retrieval mAP by over 6%, while consistently matching or outperforming LeJEPA on standard benchmarks, including a +1.8% linear probing gain on ImageNet-1K (ViT-B/14).

2605.26895 2026-05-27 cs.LG cs.AI stat.ML 版本更新

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

微不足道的大小，显著的效果：大型语言模型中的尺度向量

Mingze Wang, Shuchen Zhu, Yuxin Fang, Binghui Li, Kai Shen, Shu Zhong

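Affiliations: Peking University

AI summary: A systematic study of scale vectors in large language models, finding that despite their tiny share of parameters they are critical to pretraining and improve optimization through a self-amplifying preconditioning effect; proposes three lightweight improvement strategies that consistently improve performance across model scales.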

Comments 36 pages

AI中文摘要

周期性拓扑深度学习用于聚合物设计与发现

Yasharth Yadav, Tze Kwang Gerald Er, Atsushi Goto, Kelin Xia

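Affiliations: School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371; School of Chemistry, Chemical Engineering and Biotechnology (CCEB), Nanyang Technological University, Singapore 637371

AI summary: Proposes Periodic-TDL, a deep learning framework based on periodic Vietoris-Rips complexes and hierarchical simplicial message passing that captures many-body interactions and long-range information, outperforms existing models on polymer property prediction, and validates that ester-to-amide substitution and α-methylation improve thermal stability.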

Comments 19 pages, 3 figures, 3 tables

AI中文摘要

聚合物支撑着能源、医疗和材料科学领域的应用，但其广阔的化学空间使得系统性发现充满挑战。大多数机器学习方法将聚合物表示为单个重复单元的分子图，从而忽略了聚合物链的周期性和超越成对键的多体相互作用。我们提出了Periodic-TDL，一个基于周期性Vietoris-Rips复形的深度学习框架，该复形捕捉跨多个空间尺度的多体相互作用，随后通过层次单纯形消息传递（HSMP）编码器将信息从长程相互作用传播到共价键，产生由高阶拓扑特征增强的表征。Periodic-TDL在涵盖电子、光学、物理和热学目标的聚合物性质预测任务中优于所有最先进的模型。此外，我们定量验证了酯到酰胺取代和α-甲基化如何增强热稳定性。使用通过系统取代丙烯酸酯和丙烯酰胺聚合物生成的计算合成数据集（48,208个结构），我们观察到在匹配的聚合物对中，酯到酰胺取代的平均$T_g$增加约$55^\circ$C，主链α-甲基化的平均$T_g$增加约$14^\circ$C。为了验证这些预测趋势，我们使用Periodic-TDL模型分析了来自独立实验测量的六对新型聚合物，包括三篇文献中未报道的新合成聚合物。实验数据成功证实了模型的预测。最终，这些发现表明Periodic-TDL捕捉了特定官能团修饰的潜在物理效应，而不仅仅是优化基准数据集上的预测性能。

英文摘要

Polymers underpin applications across energy, healthcare, and materials science, yet their vast chemical space makes systematic discovery challenging. Most machine learning approaches represent polymers as molecular graphs of a single repeating unit, thereby missing both the periodicity of polymer chains and many-body interactions beyond pairwise bonds. We introduce Periodic-TDL, a deep learning framework built on periodic Vietoris-Rips complexes that capture many-body interactions across multiple spatial scales, followed by a hierarchical simplicial message-passing (HSMP) encoder that propagates information from long-range interactions to covalent bonds, yielding representations enriched by higher-order topological features. Periodic-TDL outperforms all state-of-the-art models across polymer property prediction tasks spanning electronic, optical, physical, and thermal targets. Furthermore, we quantitatively validate how ester-to-amide substitution and $\alpha$-methylation enhance thermal stability. Using a computationally synthesized dataset of 48,208 structures, generated via systematic substitution of acrylate and acrylamide polymers, we observed a mean $T_g$ increase of $\sim 55^\circ$C for ester-to-amide substitutions and $\sim 14^\circ$C for backbone $\alpha$-methylation across matched polymer pairs. To verify these predicted trends, we use our Periodic-TDL model to analyze six novel polymer pairs from independent experimental measurements, including three newly synthesized polymers previously unreported in the literature. The experimental data successfully confirmed the model's predictions. Ultimately, these findings demonstrate that Periodic-TDL captures the underlying physical effects of specific functional group modifications, rather than merely optimizing predictive performance on benchmark datasets.
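
For readers unfamiliar with the Vietoris-Rips construction named in this abstract, the sketch below builds the ordinary (non-periodic) complex on a small point set: a group of points forms a simplex when all of its pairwise distances are at most a threshold. It is a minimal illustration under that standard definition and does not reproduce the periodic variant or the HSMP encoder; the function name, `threshold`, and `max_dim` are hypothetical.

```python
import math
from itertools import combinations


def vietoris_rips(points, threshold, max_dim=2):
    """Return all simplices (as index tuples) up to dimension `max_dim`
    whose vertices are pairwise within `threshold` of each other."""
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= threshold

    simplices = [(i,) for i in range(n)]          # 0-simplices: the points
    for size in range(2, max_dim + 2):            # edges, triangles, ...
        for combo in combinations(range(n), size):
            if all(close(i, j) for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    for simplex in vietoris_rips(pts, threshold=1.5):
        print(simplex)
```

Triangles and higher simplices are what let such a complex encode interactions among more than two atoms at once, which a plain molecular graph of pairwise bonds cannot express.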

2605.26830 2026-05-27 cs.LG cs.AI cs.CV 版本更新

The Kalman Evolve: Closing the Gap in Kalman Filtering via Interpretable Algorithm Discovery

卡尔曼演化：通过可解释算法发现缩小卡尔曼滤波的差距

Vasileios Saketos, Ming Xiao

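Affiliations: KTH Royal Institute of Technology

AI summary: To address the degraded performance of Kalman filtering in nonlinear sensing settings, proposes Kalman Evolve, a framework that jointly optimizes noise parameters and the update structure and uses large language models to generate interpretable, non-affine modifications, achieving up to a 12% RMSE reduction across several benchmarks.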

AI中文摘要

状态估计是控制和信号处理中的一个基本问题，卡尔曼滤波器在线性动力学、高斯噪声和已知噪声协方差下提供最优解。然而，这些假设在多普勒雷达和LiDAR等实际传感场景中常常不成立。在这些情况下，最优估计器本质上是非线性的，导致系统性能下降。这产生了一个仅通过调整噪声协方差参数（即卡尔曼滤波器中的过程噪声和测量噪声）无法消除的性能差距。为了解决这一限制，我们提出了Kalman Evolve，一个通过联合优化噪声参数和更新结构来发现改进滤波算法的框架。我们的方法利用大语言模型作为程序空间上的结构化先验，能够生成对经典卡尔曼滤波器的可解释、非仿射修改，同时保留其递归形式。我们提供了分析结果，证明了在常见非线性传感模型下仿射估计器的次优性，从而激发了结构感知更新的必要性。在一系列合成和真实跟踪基准测试中，包括多普勒雷达、基于LiDAR的定位和行人跟踪，所发现的算法始终优于强基线（如优化卡尔曼滤波器），实现了高达12%的RMSE降低。这些结果表明，优化卡尔曼滤波器的结构而不仅仅是其参数，提供了一种实用且可解释的方式来改进状态估计。

英文摘要

State estimation is a fundamental problem in control and signal processing, for which the Kalman Filter provides an optimal solution under linear dynamics, Gaussian noise, and known noise covariances. However, these assumptions often fail in realistic sensing settings such as Doppler radar and LiDAR. In these cases, the optimal estimator is inherently nonlinear, which leads to systematic performance degradation. This creates a performance gap that cannot be eliminated by tuning the noise covariance parameters (i.e., the process and measurement noise in the Kalman Filter) alone. To address this limitation, we propose Kalman Evolve, a framework for discovering improved filtering algorithms by jointly optimizing both noise parameters and the update structure. Our approach leverages large language models (LLMs) as a structured prior over program space, enabling the generation of interpretable, non-affine modifications to the classical Kalman filter while preserving its recursive form. We provide analytical results establishing the suboptimality of affine estimators under common nonlinear sensing models, motivating the need for structure-aware updates. Across a range of synthetic and real-world tracking benchmarks, including Doppler radar, LiDAR-based localization, and pedestrian tracking, the discovered algorithms consistently improve over strong baselines such as the Optimized Kalman Filter, achieving up to 12% reduction in RMSE. These results suggest that optimizing the structure of the Kalman filter, rather than only its parameters, provides a practical and interpretable way to improve state estimation.
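
For reference, the classical baseline this abstract starts from can be written in a few lines. The sketch below is the textbook predict/update recursion, not any of the discovered algorithms; the function name and argument names are illustrative. Note that the updated state is an affine function of the measurement `z`, which is the structural restriction the abstract argues is suboptimal under nonlinear sensing.

```python
import numpy as np


def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the classical linear Kalman filter.

    x, P : prior state mean and covariance
    z    : new measurement
    F, H : state-transition and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the mean and covariance through the linear dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: weight the innovation by the Kalman gain.
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)       # affine in z
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


if __name__ == "__main__":
    # Scalar toy problem: static state, unit prior variance, unit sensor noise.
    x, P = kalman_step(np.array([0.0]), np.eye(1), np.array([2.0]),
                       np.eye(1), np.eye(1), np.zeros((1, 1)), np.eye(1))
    print(x, P)
```

Tuning `Q` and `R` changes the gain `K` but leaves this affine form intact, which is why the abstract distinguishes parameter tuning from changes to the update structure.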

2605.26821 2026-05-27 hep-ph cs.LG hep-ex 版本更新

Particle-Lund Multimodality in Jet Taggers

喷注标记器中的粒子-Lund多模态

Loukas Gouskos, Benedikt Maier

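Affiliations: Brown University; Imperial College of Science, Technology and Medicine

AI summary: Proposes PLuM, a multimodal architecture that jointly processes particle constituents and Lund plane splittings and uses cross-attention to test whether explicit QCD hierarchical structure complements raw particle representations; finds systematic gains for top-quark and H→bb tagging and 25% higher background rejection in the HH(4b) analysis.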

AI中文摘要

Lund平面提供了喷注内QCD辐射的物理动机层次表示，而基于变换器的标记器通过直接从原始粒子成分及其成对关系中学习达到了最先进的性能。我们研究变换器是否从成分级输入隐式捕获层次QCD结构，或者显式物理表示是否仍然具有互补性。为了测试这一点，我们引入了PLuM，一种多模态架构，将粒子成分和Lund平面分裂投影到共享潜在空间，并用统一变换器联合处理两者。交叉注意力允许模型探测结构化QCD信息是否提供了超出粒子单独编码的区分能力。我们观察到顶夸克和H→bb标记的系统性增益，而在H→cc或H→4q拓扑中没有发现可比改进。这种选择性增强表明，即使在高度表达性的架构中，关于b喷注形成的显式层次信息仍然与原始粒子表示互补，而其他拓扑已经在成分级被很好地捕获。对于高影响LHC分析，如洛伦兹增强的双希格斯玻色子搜索中的四b夸克末态（HH(4b)），增益显著：在25%的双希格斯效率工作点，PLuM的背景抑制比基线高25%。我们的结果表明，在变换器时代，QCD辐射的物理结构化表示仍然保留区分价值，激励进一步研究深度学习算法如何编码喷注动力学的不同方面。

英文摘要

The Lund plane offers a physics-motivated, hierarchical representation of QCD radiation within jets, while transformer-based taggers have reached state-of-the-art performance by learning directly from raw particle constituents and their pairwise relations. We investigate whether transformers implicitly capture hierarchical QCD structure from constituent-level inputs, or whether explicit physics representations remain complementary. To test this, we introduce PLuM, a multimodal architecture that projects particle constituents and Lund plane splittings into a shared latent space, processing both jointly with a unified transformer. Cross-attention allows the model to probe whether structured QCD information provides discriminating power beyond what particles alone encode. We observe systematic gains for top-quark and $\mathrm{H}\to\mathrm{b}\bar{\mathrm{b}}$ tagging, while finding no comparable improvement for $\mathrm{H}\to\mathrm{c}\bar{\mathrm{c}}$ or $\mathrm{H}\to 4\mathrm{q}$ topologies. This selective enhancement suggests that explicit hierarchical information about b-jet formation remains complementary to raw particle representations even in highly expressive architectures, while other topologies are already well-captured at constituent level. For high-impact LHC analyses such as Lorentz-boosted di-Higgs searches in the four $\mathrm{b}$ quark final state ($\mathrm{H}\mathrm{H}(4\mathrm{b})$), the gains are substantial: at a $25\%$ di-Higgs efficiency working point, PLuM achieves $25\%$ higher background rejection than the baseline. Our results indicate that physically structured representations of QCD radiation retain discriminating value in the transformer era, motivating further study into how different aspects of jet dynamics are encoded by deep learning algorithms.

2605.26808 2026-05-27 cs.LG cs.AI cs.IT math.IT 版本更新

Innovation: An Almost Characterization of Hallucination

创新：幻觉的几乎刻画

Nishant P. Das, Piyush Srivastava

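Affiliations: School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, Maharashtra - 400 005, India

AI summary: Introduces the property of "innovation" to explain why hallucination in large language models is unavoidable, proves that innovation and hallucination are almost equivalent, and gives new lower bounds on the hallucination rate based on the innovation rate.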

AI中文摘要

幻觉是大语言模型（LLMs）的一个核心局限，大量工作致力于理解和缓解它。为此，Kalai 和 Vempala（STOC 2024）引入了一个概率框架来形式化校准和幻觉，并证明高概率下，校准的 LLM 大致以“缺失质量”（衡量训练数据相对于其来源的不完整程度）的速率产生幻觉。这引出了两个基本问题：(i) 校准的 LLM 的什么属性使得幻觉不可避免？(ii) 能否通过放弃校准来避免幻觉？我们通过引入一个更简单的属性——我们称之为“创新”——来回答这些问题，该属性衡量模型产生训练数据之外输出的倾向。我们证明，创新由 Kalai 和 Vempala 识别的幻觉条件蕴含，并且进一步，它是幻觉的几乎刻画：幻觉蕴含创新，反之，创新高概率地蕴含幻觉。我们还基于“创新率”给出了幻觉率的下界，并通过将创新率与缺失质量联系起来，获得了基于缺失质量的新的幻觉率下界，扩展了 Kalai 和 Vempala 的结果。

英文摘要

Hallucination is a central limitation of large language models (LLMs), and substantial effort has been devoted to understanding and mitigating it. Towards this, Kalai and Vempala (STOC 2024) introduced a probabilistic framework formalizing calibration and hallucination, and showed that, with high probability, calibrated LLMs hallucinate roughly at the rate of the "missing mass", a measure of how incomplete the training data is relative to its source. This raises two fundamental questions: (i) what property of a calibrated LLM makes hallucinations unavoidable? and (ii) can hallucinations be avoided by giving up calibration? We answer these questions by introducing a simpler property we call innovation that measures the tendency of a model to produce outputs outside the training data. We show that innovation is implied by the condition for hallucination identified by Kalai and Vempala, and, further, that it is an almost characterization of hallucination: hallucination implies innovation, and conversely, innovation implies hallucination with high probability. We also provide lower bounds on the hallucination rate based on the "innovation rate", and by relating innovation rate back to missing mass, we obtain new hallucination rate lower bounds based on missing mass that extend the results of Kalai and Vempala.
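The two quantities named in the abstract can be illustrated on toy data. The sketch below is an assumption-laden illustration, not the paper's definitions: it uses the standard Good-Turing estimate of missing mass (the fraction of samples seen exactly once), and `innovation_rate` is a hypothetical empirical proxy that counts model outputs absent from the training set.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Good-Turing estimate of the probability mass of unseen outcomes.

    Estimate = (number of distinct items seen exactly once) / (sample size).
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(samples)


def innovation_rate(model_outputs, training_set):
    """Hypothetical proxy: fraction of model outputs not present in the training data."""
    seen = set(training_set)
    return sum(1 for y in model_outputs if y not in seen) / len(model_outputs)


train = ["a", "a", "b", "c", "c", "d"]   # "b" and "d" occur once
outputs = ["a", "e", "c", "f"]           # "e" and "f" are outside the training data

print(good_turing_missing_mass(train))
print(innovation_rate(outputs, train))
```

The paper's formal notion of innovation is probabilistic; the counting version here only conveys the intuition that outputs outside the training data are the ones that can be hallucinations.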


2605.26802 2026-05-27 cs.LG 版本更新

PATE-TabTransGAN: Differentially Private Synthetic Tabular Data Generation via Transformer-Based Student Discrimination

PATE-TabTransGAN：基于Transformer学生鉴别的差分隐私合成表格数据生成

M. Youssef, M. Woźniak

发表机构 * Wrocław University of Science and Technology（弗罗茨瓦夫科技大学）

AI总结提出PATE-TabTransGAN框架，结合教师集成私有聚合（PATE）机制与基于Transformer的学生鉴别器，在正式差分隐私保证下生成高质量合成表格数据，并在四个基准数据集上取得最优或并列最优的AUROC。

Comments 16 pages, 3 figures, 4 tables. Submitted for publication

详情

AI中文摘要

在正式差分隐私保证下生成高保真合成表格数据仍然是一个开放挑战。提供强理论保护的方法通常牺牲了真实合成所需的特征间依赖建模，而擅长捕获复杂列关系的架构仅提供经验隐私保证。我们提出PATE-TabTransGAN，一个生成框架，将教师集成私有聚合（PATE）机制与基于Transformer的学生鉴别器相结合，以共同满足这两个要求，并采用GNMax RDP隐私核算器（accountant）进行数值稳定的隐私核算。在不相交分区上训练的Logistic回归教师集成通过噪声聚合标签监督学生，残差生成器针对这个差分隐私学生进行优化，通过后处理继承正式的(ε, δ)-DP保证。将PATE-TabTransGAN与PATE-GAN、DP-GAN和DP-CTGAN（被认为是差分隐私表格合成的最先进方法）进行比较。在四个表格基准（Adult、Breast、Cardio、Cervical）上进行的实验证实了所提方法的高质量：PATE-TabTransGAN在所有四个数据集上达到最佳或并列最佳的AUROC。在AUCPR上，它在Cardio上与最强基线持平，在Cervical上领先，在Breast上落后；在Adult上，我们证明AUCPR对正类惯例高度敏感，观察到的差距与评估流程之间的惯例差异一致，而非合成缺陷。

英文摘要

Generating high-fidelity synthetic tabular data under formal differential privacy guarantees remains an open challenge. Methods that provide strong theoretical protection typically sacrifice the modeling of inter-feature dependencies required for realistic synthesis, while architectures that excel at capturing complex column relationships offer only empirical privacy guarantees. We present PATE-TabTransGAN, a generative framework that integrates the Private Aggregation of Teacher Ensembles (PATE) mechanism with a Transformer-based student discriminator to jointly address both requirements, and employs a GNMax RDP accountant for numerically stable privacy accounting. An ensemble of Logistic Regression teachers trained on disjoint partitions supervises the student via noisy-aggregated labels, and a residual generator is optimized against this differentially private student, inheriting formal (ε, δ)-DP guarantees by post-processing. PATE-TabTransGAN was compared with PATE-GAN, DP-GAN, and DP-CTGAN, considered state-of-the-art in differentially private tabular synthesis. Experiments conducted on four tabular benchmarks (Adult, Breast, Cardio, Cervical) confirmed the high quality of the proposed method: PATE-TabTransGAN attains the best or tied-best AUROC on all four datasets. On AUCPR it matches the strongest baseline on Cardio, leads on Cervical, and trails on Breast; on Adult, we demonstrate that AUCPR is highly sensitive to positive-class convention, and that the observed gap is consistent with a convention difference between evaluation pipelines rather than a synthesis deficit.
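The noisy label aggregation mentioned in the abstract can be sketched as a Gaussian noisy-max over teacher votes (the GNMax mechanism from the PATE literature). This is a minimal illustration only: the function name `gnmax_label` and the parameter `sigma` are assumptions, and the RDP privacy accounting that turns the noise scale into an (ε, δ) guarantee is deliberately left out.

```python
import random


def gnmax_label(teacher_votes, num_classes, sigma, rng=random):
    """Aggregate teacher votes with Gaussian noise and return the noisy argmax.

    teacher_votes: one predicted class index per teacher.
    sigma: standard deviation of the noise added to each class count.
    """
    counts = [0] * num_classes
    for vote in teacher_votes:
        counts[vote] += 1
    # Perturb every class count independently, then pick the largest.
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    return max(range(num_classes), key=lambda k: noisy[k])


votes = [1, 1, 0, 1, 1, 0, 1]  # seven teachers, five vote for class 1
print(gnmax_label(votes, num_classes=2, sigma=0.0, rng=random.Random(0)))
```

With `sigma=0.0` the call reduces to a plain majority vote; larger values trade label accuracy for stronger privacy, which is the cost the accountant tracks.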


2605.26797 2026-05-27 cs.LG cs.CL 版本更新

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior

潜在循环Transformer：架构探索、训练策略与扩展行为

Zeyi Huang, Xuehai He, LiLiang Ren, Yiping Wang, Baolin Peng, Hao Cheng, Shuohang Wang, Pengcheng He, Jianfeng Gao, Yong Jae Lee, Yelong Shen

发表机构 * Microsoft（微软公司）； University of Wisconsin-Madison（威斯康星大学麦迪逊分校）； University of Washington（华盛顿大学）

AI总结提出潜在循环Transformer（LRT），通过跨层循环潜在路径重用前一token的高层隐藏状态作为记忆，在不增加暂停token或额外深度循环的情况下，以约2倍基线计算实现并行训练，在匹配有效计算下提升语言建模损失和上下文学习能力，仅增加0.3%参数。

详情

AI中文摘要

我们研究潜在循环Transformer（LRT），一种自回归Transformer的轻量级增强，它重用来自前一个token的高层源层隐藏状态作为下一个token的循环记忆。由于该源状态在普通解码过程中已经计算，LRT跨位置添加跨层循环潜在路径，无需插入暂停token或额外深度循环，并且保留了标准注意力机制和KV-cache接口。为了在不顺序展开Transformer的情况下大规模预训练这种循环，我们引入了交错并行训练：一次完整的全序列初始化前向传播构建共享缓冲区；然后不相交的位置子集并行细化并写回，使得所有token在约2倍基线计算下获得循环记忆感知的监督。在nanochat风格的主干网络和广泛的每参数token预算范围内，LRT在匹配有效计算下改进了语言建模损失和上下文学习，同时仅增加0.3%的参数。

英文摘要

We study Latent Recurrent Transformer (LRT), a lightweight augmentation of autoregressive transformers that reuses a high-level source-layer hidden state from the previous token as recurrent memory for the next token. Because this source state is already computed during ordinary decoding, LRT adds a cross-layer recurrent latent pathway across positions without inserting pause tokens or extra depth loops, and the standard attention mechanism and KV-cache interface are preserved. To pretrain this recurrence at scale without sequentially unrolling the transformer, we introduce interleaved parallel training: a single full-sequence initialization forward pass builds a shared buffer; then disjoint position subsets are refined in parallel and written back, so that all tokens receive recurrent-memory-aware supervision at roughly 2 times baseline compute. Across nanochat style backbones and a wide range of tokens-per-parameter budgets, LRT improves both language-modeling loss and in-context learning under matched effective compute while adding as little as 0.3% parameters.


2605.26786 2026-05-27 cs.CY cs.AI cs.LG 版本更新

Implementation of Big Data Analytics for Diabetes Management: Needs Assessment in the Rwanda Healthcare System

大数据分析在糖尿病管理中的应用：卢旺达医疗系统需求评估

Silas Majyambere, Tony Lindgren, Workneh Y. Ayele, Celestin Twizere

发表机构 * University of Rwanda（卢旺达大学）

AI总结本研究通过利益相关者研讨会评估卢旺达医疗系统采用大数据分析管理糖尿病的准备情况，并提出了一个基于可解释机器学习模型的实用框架。

详情

AI中文摘要

糖尿病是一种慢性代谢疾病，如果不及早诊断和管理，可能导致严重的健康问题。大数据分析和机器学习为分析大型健康数据集、支持早期发现和更好的治疗决策提供了实用工具。然而，它们在常规临床实践中的使用仍然有限。本研究考察了卢旺达医疗系统采用大数据分析管理糖尿病的准备情况。随着该国不断扩大电子病历和健康信息系统的使用，改善预测、监测和临床决策的新机遇随之出现。我们举办了一个为期五天的研讨会，涉及25名关键利益相关者，包括临床医生、数据管理员、政策制定者、医学研究人员、营养学家和技术提供商，以评估准备情况并识别现有差距。研究结果突出了大数据分析实施的潜力和主要挑战。基于这些结果，本文提出了一个实用的大数据分析框架，利用可解释的机器学习模型支持糖尿病管理策略。

英文摘要

Diabetes is a chronic metabolic disease that can lead to serious health problems if not diagnosed and managed early. Big Data Analytics (BDA) and machine learning offer practical tools for analyzing large health datasets and supporting early detection and better treatment decisions. However, their use in routine clinical practice is still limited. This study examines the readiness of Rwanda's healthcare system to adopt big data analytics for diabetes management. As the country continues to expand its use of electronic medical records and health information systems, new opportunities arise for improving prediction, monitoring, and clinical decision-making. A five-day workshop involving 25 key stakeholders, including clinicians, data managers, policymakers, medical researchers, nutritionists, and technology providers, was conducted to assess preparedness and identify existing gaps. The findings highlight both the potential and the main challenges of BDA implementation. Based on these results, the paper proposes a practical BDA framework to support diabetes management strategies using explainable machine learning models.


2605.26784 2026-05-27 cs.LG cs.AI 版本更新

Ratio-Variance Regularized Policy Optimization

比率方差正则化策略优化

Yu Luo, Shuo Han, Yihan Hu, Lei Lv, Huaping Liu, Fuchun Sun, Jianye Hao, Dong Li

发表机构 * Department of Foundation Model, 2012 Labs, Huawei（华为基础模型部门，2012实验室）； Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University（上海智能自主系统研究院，同济大学）； Department of Computer Science and Technology, Tsinghua University（清华大学计算机科学与技术系）； College of Intelligence and Computing, Tianjin University（天津大学智能与计算学院）

AI总结提出R²VPO方法，通过约束策略比率方差作为信任区域的局部近似，替代启发式裁剪，在LLM和机器人控制任务中提升性能与样本效率。

详情

AI中文摘要

标准的同策略强化学习依赖启发式裁剪来强制信任区域，但这种机制通过不加区分地截断高回报但高散度的更新而施加了严重代价。我们证明，显式约束策略比率方差为信任区域约束提供了原则性的局部近似，消除了二元硬裁剪的需要。通过作为分布式的“软刹车”，这种方法保留了来自新颖发现的关键梯度信号，同时自然降低权重并允许重用陈旧的离策略数据。我们引入了${\bf R}^2{\bf VPO}$（比率方差正则化策略优化），它通过原始-对偶优化框架实现这一约束。在跨越快速和慢速推理范式的$7$个LLM规模以及$10$个机器人控制任务上的广泛评估证明了所提出方法的通用性。R$^2$VPO在数学推理基准上取得了显著的性能提升，特别是在较小模型上改进尤为明显，同时显著提高了样本效率。此外，它在连续控制领域（特别是稀疏奖励和动态环境）中始终优于PPO基线。这些发现共同确立了比率方差正则化作为稳定且数据高效策略优化的原则性基础。

英文摘要

Standard on-policy reinforcement learning relies on heuristic clipping to enforce trust regions, but this mechanism imposes a severe cost by indiscriminately truncating high-return yet high-divergence updates. We demonstrate that explicitly constraining the policy ratio variance provides a principled local approximation to trust-region constraints, eliminating the need for binary hard clipping. By acting as a distributional "soft brake", this approach preserves critical gradient signals from novel discoveries while naturally down-weighting and enabling the reuse of stale, off-policy data. We introduce ${\bf R}^2{\bf VPO}$ (Ratio-Variance Regularized Policy Optimization), which implements this constraint via a primal-dual optimization framework. Extensive evaluations across $7$ LLM scales, spanning both fast and slow reasoning paradigms, and $10$ robotic control tasks demonstrate the generality of the proposed approach. R$^2$VPO achieves substantial performance gains on mathematical reasoning benchmarks, with particularly pronounced improvements on smaller models, while significantly improving sample efficiency. Furthermore, it consistently outperforms PPO baselines in continuous control domains, particularly in sparse-reward and dynamic environments. Together, these findings establish ratio-variance regularization as a principled foundation for stable and data-efficient policy optimization.
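The idea of replacing hard clipping with a penalty on the variance of the policy ratio can be sketched as follows. The abstract does not give the exact objective, so everything here is an assumption for illustration: the surrogate is the unclipped importance-weighted advantage minus `lam` times the ratio variance, and `dual_update` is a generic projected dual-ascent step with hypothetical parameters `budget` and `step`.

```python
import math


def ratio_variance_surrogate(logp_new, logp_old, advantages, lam):
    """Return (objective, ratio_variance) for one batch.

    objective = mean(ratio * advantage) - lam * Var(ratio),
    where ratio = exp(logp_new - logp_old) per sample.
    """
    ratios = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    count = len(ratios)
    mean_ratio = sum(ratios) / count
    var_ratio = sum((r - mean_ratio) ** 2 for r in ratios) / count
    gain = sum(r * a for r, a in zip(ratios, advantages)) / count
    return gain - lam * var_ratio, var_ratio


def dual_update(lam, var_ratio, budget, step):
    """Raise the multiplier when the variance exceeds the budget, never below zero."""
    return max(0.0, lam + step * (var_ratio - budget))


logp_old = [-1.0, -2.0, -0.5]
logp_new = [-0.8, -2.3, -0.5]
adv = [1.0, -0.5, 0.2]

objective, variance = ratio_variance_surrogate(logp_new, logp_old, adv, lam=0.5)
lam_next = dual_update(0.5, variance, budget=0.01, step=1.0)
print(objective, variance, lam_next)
```

Unlike a clipped objective, every sample keeps a nonzero gradient here; samples are only discouraged collectively from spreading the ratio distribution too far.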


2605.26776 2026-05-27 cs.LG cs.AI 版本更新

Towards Generalization-Oriented Models for Vehicle Routing Problems with Mixture-of-Experts

面向泛化的混合专家车辆路径问题模型

Changhao Miao, Yuntian Zhang, Tongyu Wu, Fang Deng, Chen Chen

发表机构 * State Key Laboratory of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology（自主智能无人系统国家重点实验室，北京理工大学）； School of AI, Beijing Institute of Technology（北京理工大学人工智能学院）

AI总结提出基于混合专家架构的残差细化专家与实例级门控机制（R2E-IG），通过模块化策略网络和动态权重适应训练，提升车辆路径问题在分布偏移下的泛化能力。

详情

AI中文摘要

近年来，深度强化学习（DRL）在车辆路径问题（VRPs）上取得了显著进展。然而，现有的基于DRL的方法通常是在均匀分布生成的实例上训练的，这限制了它们在真实世界分布偏移下的性能。在本文中，我们旨在开发一个面向泛化的模型，该模型将策略网络划分为多个模块，并在推理过程中自适应地重组模块以形成特定策略。具体来说，我们提出了具有实例级门控的残差细化专家（R2E-IG）以改进跨分布泛化。我们的贡献有三方面：（1）我们引入了一种残差细化专家（R2E）架构，通过残差细化增强专家表达能力；（2）我们设计了一种实例级门控机制，学习分布感知的实例表示并将输入路由到合适的模块；（3）我们提出了一种配备动态权重适应（DWA）的混合分布训练机制，该机制动态地重新加权来自不同分布的训练数据，以强调更具信息量的数据。大量实验表明，R2E-IG在合成和基准数据集的分布内和分布外实例上均取得了与最先进基线相竞争的性能。此外，R2E-IG是通用的，可以轻松集成到现有的基于DRL的方法中，以进一步提高性能。

英文摘要

In recent years, Deep Reinforcement Learning (DRL) has achieved substantial progress on Vehicle Routing Problems (VRPs). However, existing DRL-based methods are typically trained on instances generated from a uniform distribution, which limits their performance under real-world distribution shifts. In this paper, we aim to develop a generalization-oriented model that partitions the policy network into multiple modules and adaptively recombines modules to form specific policies during inference. Specifically, we propose Residual Refined Experts with Instance-level Gating (R2E-IG) to improve cross-distribution generalization. Our contributions are threefold: (1) We introduce a Residual Refined Expert (R2E) architecture that enhances expert expressiveness via residual refinement; (2) We design an instance-level gating mechanism that learns distribution-aware instance representations and routes inputs to suitable modules; (3) We propose a mixed-distribution training mechanism equipped with Dynamic Weight Adaption (DWA), which dynamically reweights training data from different distributions to emphasize more informative ones. Extensive experiments show that R2E-IG achieves competitive performance against state-of-the-art baselines on both in-distribution and out-of-distribution instances across synthetic and benchmark datasets. Moreover, R2E-IG is generic and can be easily integrated into existing DRL-based methods to further improve performance.


2605.26763 2026-05-27 cs.LG cs.AI 版本更新

Adversarial Training for Robust Coverage Network under Worst-case Facility Losses

对抗训练用于最坏设施损失下的鲁棒覆盖网络

Changhao Miao, Yuntian Zhang, Tongyu Wu, Fang Deng, Chen Chen

发表机构 * State Key Laboratory of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology（自主智能无人系统国家重点实验室，北京理工大学）； School of AI, Beijing Institute of Technology（北京理工大学人工智能学院）

AI总结针对最大覆盖选址-阻断问题，提出基于对抗学习的双智能体深度强化学习框架，实现高效求解与鲁棒决策。

详情

AI中文摘要

最大覆盖选址-阻断问题（MCLIP）是一个经典的双层优化问题，对于韧性基础设施规划至关重要，但计算上仍然难以处理。具体来说，上层确定设施位置以最大化覆盖范围，而下层执行最坏情况下的阻断以最小化覆盖范围。上下层之间的强耦合以及各自的高组合复杂性使得传统方法无效。为了弥补这一差距，我们提出了一种基于对抗学习的双智能体深度强化学习（DADRL）框架，包括对应于上层的选址智能体和对应于下层的阻断智能体。我们的贡献有三方面：（1）选址智能体同时针对不断演化的阻断智能体进行训练，使其有效捕捉上下层之间的动态竞争相互作用；（2）为了充分利用阻断智能体的学习能力，我们提出了一种基于替代的集成推理策略，利用训练好的阻断智能体作为高保真替代来指导选址智能体的决策；（3）在合成和真实世界数据集上的大量实验表明，与其他基线相比，我们的方法在保持高度竞争力的解质量的同时，实现了卓越的计算效率。此外，我们的DADRL框架对网络结构是模型无关的，而其底层的对抗学习范式在解决其他双层优化问题方面显示出强大的潜力。

英文摘要

The Maximal Covering Location-Interdiction Problem (MCLIP) is a classic bi-level optimization problem, which is fundamental to resilient infrastructure planning yet remains computationally intractable. Specifically, the upper level determines facility locations to maximize coverage, while the lower level executes worst-case interdiction to minimize that coverage. The strong coupling between the upper and lower levels, combined with their respective high combinatorial complexity, renders traditional methods ineffective. To bridge this gap, we propose a Dual-Agent Deep Reinforcement Learning (DADRL) framework based on adversarial learning, comprising a location agent corresponding to the upper level and an interdiction agent corresponding to the lower level. Our contributions are threefold: (1) The location agent is trained simultaneously against an evolving interdiction agent, so that it effectively captures the dynamic competitive interplay between the upper and lower levels; (2) To fully exploit the learned capabilities of the interdiction agent, we propose a Surrogate-based Ensemble Inference Strategy that utilizes the trained interdiction agent as a high-fidelity surrogate to guide the decisions of the location agent; (3) Extensive experiments on synthetic and real-world datasets demonstrate that our approach achieves superior computational efficiency while maintaining highly competitive solution quality compared to other baselines. Furthermore, our DADRL framework is model-agnostic to network structures, while its underlying adversarial learning paradigm demonstrates strong potential for solving other bi-level optimization problems.


2605.26733 2026-05-27 cs.LG cs.AI 版本更新

Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models

循环语言模型中测试时可扩展潜在推理的稳定循环动力学

Xiao-Wen Yang, Ziyu Han, Xi-Hua Zhang, Wen-Da Wei, Jie-Jing Shao, Lan-Zhe Guo, Yu-Feng Li

发表机构 * State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China（新型软件技术国家重点实验室，南京大学，南京，中国）； School of Artificial Intelligence, Nanjing University, Nanjing, China（人工智能学院，南京大学，南京，中国）； School of Intelligence Science and Technology, Nanjing University, Nanjing, China（智能科学与技术学院，南京大学，南京，中国）

AI总结提出STARS训练框架，通过雅可比谱半径正则化约束潜在状态趋近渐近稳定不动点，解决循环语言模型深度递归时性能崩溃问题，实现可靠的测试时扩展并提升峰值性能。

Comments ICML 2026

详情

AI中文摘要

循环语言模型（LoopLMs）通过深度递归实现高效的潜在推理，但表现出不可靠的测试时缩放行为：性能通常在某个迭代深度达到峰值，然后随着进一步递归而崩溃。通过潜在动力学分析，我们发现现有架构和策略在稳定性和有效性之间存在固有的权衡。通过将推理概念化为不确定性减少，我们提出收敛到稳定不动点同时保持有效性是一种有前景的方法。为此，我们提出了STARS（稳定性驱动的递归缩放），一种训练框架，约束潜在状态趋近渐近稳定不动点。这通过高效的雅可比谱半径正则化和随机循环采样实现，使STARS能够在确保严格稳定性的同时最大化有效性。在算术任务上的实验表明，STARS实现了可靠的测试时缩放，在复杂数学推理中，它显著减轻了随着递归深度增加而出现的性能退化，同时提高了峰值性能。

英文摘要

Looped Language Models (LoopLMs) enable efficient latent reasoning through depth recurrence, yet exhibit unreliable test-time scaling behavior: performance often peaks at a certain iteration depth and then collapses with further recurrence. Through latent dynamics analysis, we find an inherent trade-off between stability and effectiveness in existing architectures and strategies. By conceptualizing reasoning as uncertainty reduction, we propose that convergence toward stable fixed points while preserving effectiveness is a promising direction. To this end, we propose STARS (STAbility-driven Recurrent Scaling), a training framework that constrains latent states to approach asymptotically stable fixed points. This is realized via efficient Jacobian Spectral Radius Regularization with random loop sampling, enabling STARS to maximize effectiveness while ensuring rigorous stability. Experiments on arithmetic tasks show that STARS achieves reliable test-time scaling, and on complex mathematical reasoning it substantially mitigates performance degradation as recurrence depth increases while also improving peak performance.
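Why the Jacobian spectral radius governs stability under repeated looping can be seen on a linear toy system. The sketch below is not the STARS regularizer: it assumes a fixed 2x2 Jacobian `J` with a single dominant real eigenvalue, estimates its spectral radius by power iteration, and runs the looped update h <- J h + b to show that deeper recurrence settles on a fixed point when the radius is below 1.

```python
import math


def matvec(J, v):
    """Multiply matrix J (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in J]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_radius(J, iters=200):
    """Power iteration; assumes one dominant real eigenvalue."""
    v = [1.0] * len(J)
    rho = 0.0
    for _ in range(iters):
        w = matvec(J, v)
        rho = norm(w) / norm(v)          # growth factor of this step
        scale = norm(w)
        v = [x / scale for x in w]       # renormalize to avoid under/overflow
    return rho


def iterate(J, b, h, steps):
    """Run the looped update h <- J h + b for a given number of steps."""
    for _ in range(steps):
        h = [x + c for x, c in zip(matvec(J, h), b)]
    return h


J = [[0.5, 0.2], [0.1, 0.4]]   # eigenvalues 0.6 and 0.3
b = [1.0, 1.0]

print(spectral_radius(J))
print(iterate(J, b, [0.0, 0.0], 50))
print(iterate(J, b, [0.0, 0.0], 100))
```

In a real LoopLM the update is nonlinear and the Jacobian changes with the state, so a training-time penalty would have to estimate the radius from Jacobian-vector products rather than from an explicit matrix.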


2605.26732 2026-05-27 cs.LG 版本更新

APEX: Amplitude Anchors and Phase Priors for Target-Scarce Higher-Frequency Wave Prediction

APEX: 针对稀缺目标的高频波预测的幅度锚定与相位先验

Yifan Sun, Lei Cheng, Sijie Chen, Ting Zhang, Jianlong Li, Shikai Fang

发表机构 * College of Information Science and Electronic Engineering（信息科学与电子工程学院）

AI总结提出APEX框架，通过低频神经算子预测幅度作为锚点，结合格林函数启发的相位先验和条件流匹配增强器，在目标数据稀缺时实现高频波场预测，在多个基准上优于直接外推和联合生成方法。

详情

AI中文摘要

基于学习的替代模型在波场预测中日益有效，特别是神经算子在观测频率范围内表现出色。然而，在目标监督稀缺的情况下，高频预测仍相对未被充分探索，尤其是在高频数据模拟或测量成本远高于低频数据的波动问题中。一个核心困难是跨频率迁移本质上是不对称的：粗粒度幅度结构在不同频率间保持相对稳定，而相位敏感的振荡结构随着频率增加而迅速恶化。受此不对称性启发，我们提出APEX（从外推粗预测中进行的幅度锚定和相位先验引导增强），一个针对目标稀缺高频波场预测的框架。低频神经算子首先在目标频率范围内提供粗预测，我们仅保留幅度作为可迁移的结构锚点。然后，条件流匹配增强器在格林函数启发的相位先验指导下重建目标高频场。在SimpleWave、Helmholtz和Maxwell基准上的实验表明，在有限的目标频率监督下，APEX始终优于直接的低频到高频外推、目标自适应算子和联合生成基线。我们的结果表明，振荡波场的可靠高频预测不应依赖于完整复数场的直接端到端迁移，而应显式重用可迁移的粗粒度结构，同时单独恢复缺失的振荡细节。

英文摘要

Learning-based surrogates have become increasingly effective for wave-field prediction, and neural operators in particular have shown strong performance within observed frequency regimes. However, higher-frequency prediction under scarce target supervision remains comparatively underexplored, especially in wave problems where higher-frequency data are substantially more expensive to simulate or measure than lower-frequency data. A central difficulty is that cross-frequency transfer is inherently asymmetric: coarse amplitude structure remains relatively stable across frequencies, whereas phase-sensitive oscillatory structure deteriorates much more rapidly as frequency increases. Motivated by this asymmetry, we propose APEX, Amplitude-anchored and Phase-prior-guided Enhancement from eXtrapolated coarse predictions, a framework for target-scarce higher-frequency wave-field prediction. A lower-frequency neural operator first provides a coarse prediction in the target-frequency regime, from which we retain only the amplitude as a transferable structural anchor. A conditional flow-matching enhancer then reconstructs the target higher-frequency field under the guidance of a Green's-function-inspired phase prior. Experiments on SimpleWave, Helmholtz, and Maxwell benchmarks show that APEX consistently outperforms direct lower-to-higher extrapolation, target-adapted operator, and joint generative baselines under limited target-frequency supervision. Our results suggest that reliable higher-frequency prediction of oscillatory wave fields should not rely on direct end-to-end transfer of the full complex field, but instead on explicitly reusing transferable coarse structure while separately recovering the missing oscillatory detail.


2605.26718 2026-05-27 cs.LG 版本更新

MTL-FNO: A Lightweight Multi-Task Fourier Neural Operator for Sparse Field Reconstruction

MTL-FNO：一种用于稀疏场重建的轻量级多任务傅里叶神经算子

Siyu Ye, Shihang Li, Zhiqiang Gong, Benrong Zhang, Weien Zhou, Yiyong Huang, Wen Yao

发表机构 * Defense Innovation Institute, Academy of Military Science, Beijing, 100071, China（国防科技创新研究院，军事科学院，北京，100071，中国）； Intelligent Game and Decision Laboratory, Beijing, 100071, China（智能博弈与决策实验室，北京，100071，中国）

AI总结针对航空航天飞行器多场稀疏重建中模型庞大且难以利用跨场相关性的问题，提出基于硬参数共享的轻量级多任务傅里叶神经算子MTL-FNO，通过极坐标解耦优化和Cayley变换实现高效联合训练，在少样本条件下模型大小减少76%和60%且精度相当或更优。

详情

AI中文摘要

高效的星载多场稀疏重建对于航空航天飞行器的自主运行至关重要。虽然现有的深度学习模型在单场重建中表现出潜力，但部署多个独立模型会导致模型尺寸急剧增长，并且无法利用跨场相关性，尤其是在少样本条件下。为了解决这些挑战，我们首先提出了一种轻量级多任务傅里叶神经算子（MTL-FNO），这是一种基于硬参数共享的端到端联合训练框架。在每一层中，参数被分为共享部分和任务特定部分，以捕获各场之间的共同特征，同时保留任务特定特征。此外，任务特定的微调参数被实现为低秩项，实现了显著的模型压缩。其次，为了解决共享参数和任务特定参数及其实部和虚部联合优化的困难，我们从极坐标形式的角度重新审视了FNO的谱权重，并设计了一种具有物理意义的解耦优化方案。具体地，我们应用极分解将谱权重逐片解耦为编码相位信息的酉张量和表征振幅的半正定张量。通过解耦相位和振幅的优化，我们的方法可以有效缓解任务冲突。同时，为了在训练过程中保持酉几何保真度，引入Cayley变换对酉张量进行重参数化，将约束优化问题转化为无约束优化问题。最后，在两个代表性工程案例上验证了所提方法在少样本条件下的有效性。结果表明，MTL-FNO达到了与标准FNO相当甚至更优的精度，同时分别将总模型大小减少了76%和60%。

英文摘要

Efficient onboard multi-field sparse reconstruction is essential for the autonomous operation of aerospace vehicles. While existing deep learning models exhibit promise for single-field reconstruction, deploying multiple independent models leads to prohibitive model size growth and fails to exploit cross-field correlations, particularly under few-shot conditions. To address these challenges, we first propose a lightweight multi-task Fourier neural operator (MTL-FNO), an end-to-end joint training framework based on hard parameter sharing. In each layer, the parameters are divided into shared and task-specific components to capture common features across fields while preserving task-specific characteristics. Moreover, the task-specific fine-tuning parameters are implemented as low-rank terms, achieving substantial model compression. Second, to address the difficulty of co-optimizing shared and task-specific parameters along with their real and imaginary parts, we revisit the FNO's spectral weight from a polar-form perspective and devise a physically meaningful decoupled optimization scheme. Specifically, we apply polar decomposition to slice-wise disentangle the spectral weight into a unitary tensor encoding phase information and a positive semi-definite tensor characterizing amplitude. By decoupling the optimization of phase and amplitude, our method can effectively mitigate task conflicts. Meanwhile, to preserve unitary geometric fidelity during training, the Cayley transform is introduced to reparameterize the unitary tensor, converting the constrained optimization problem to an unconstrained one. Finally, the effectiveness of the proposed method under few-shot conditions is validated on two representative engineering cases. Results show that MTL-FNO achieves accuracy comparable to or even surpassing that of standard FNO, while reducing total model size by 76% and 60%, respectively.
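The Cayley reparameterization mentioned in the abstract rests on a standard fact: for any skew-Hermitian matrix A, the matrix U = (I - A)^{-1}(I + A) is unitary, so unconstrained parameters of A always yield a valid unitary factor. The sketch below checks this on a single 2x2 complex slice; the helper names and the sign convention of the transform are assumptions, since the abstract does not specify them.

```python
def matmul(X, Y):
    """Product of two 2x2 complex matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def conj_transpose(M):
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]


def skew_hermitian(p, q, z):
    """Build A = [[i p, z], [-conj(z), i q]] from free real p, q and complex z."""
    return [[1j * p, z], [-z.conjugate(), 1j * q]]


def cayley(A):
    """Map a skew-Hermitian A to the unitary (I - A)^{-1} (I + A)."""
    eye = [[1, 0], [0, 1]]
    minus = [[eye[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    plus = [[eye[i][j] + A[i][j] for j in range(2)] for i in range(2)]
    return matmul(inverse2(minus), plus)


U = cayley(skew_hermitian(0.3, -1.2, 0.5 + 0.7j))
gram = matmul(conj_transpose(U), U)   # should be the identity up to rounding
print(gram)
```

Because `p`, `q`, and `z` are unconstrained, a gradient-based optimizer can update them freely while the resulting `U` stays unitary, which is the sense in which the constrained problem becomes unconstrained.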


2605.26715 2026-05-27 cs.LG 版本更新

Image Feature Fusion-based Federated Client Unlearning (FCU)

基于图像特征融合的联邦客户端遗忘 (FCU)

Hangyi Shen, Yizhi Pan, Tiansuo Li, Weiqi Jiang, Guanqun Sun

AI总结针对联邦遗忘中灾难性遗忘导致全局泛化下降的问题，提出基于线性图像特征融合机制（Mixup）的联邦客户端遗忘方法，通过动态生成混合样本弥合遗忘与保留分布，在医学影像基准上实现了与重训练标准相当的遗忘效果。

详情

AI中文摘要

主要数据保护法规都提到了“被遗忘权”，这推动了联邦遗忘技术的发展。但一个顽固的问题仍然存在：灾难性遗忘——你擦除了目标知识，但同时也丢弃了必要的保留知识，从而损害了模型的全局泛化能力。为了在遗忘效果和泛化能力之间取得更好的平衡，我们提出了基于图像特征融合的联邦客户端遗忘（IFF-FCU）。其思想是引入线性图像特征融合机制（Mixup），动态创建混合样本，弥合遗忘分布和保留分布之间的差距。该策略不仅仅是删除几个离散的数据点——它在理论上拓宽并正则化了遗忘边界。我们在医学影像基准（RSNA-ICH 和 ISIC2018）上进行了大量实验，结果表明我们的方法实现了相当好的遗忘效果。例如，在 ICH 数据集上，IFF-FCU 实现了与重训练黄金标准高度竞争的误差偏差，显示出对现有基线的稳健改进。

英文摘要

Major data protection regulations all mention the "right to be forgotten," and that's what pushed federated unlearning (FU) techniques forward. But one stubborn issue remains: catastrophic forgetting--you erase the target knowledge, yet somehow you also end up throwing out essential retained knowledge, which then hurts the model's global generalization. To get a better balance between unlearning effectiveness and generalization ability, we propose something called Image Feature Fusion-based Federated Client Unlearning (IFF-FCU). The idea is to bring in a linear Image Feature Fusion mechanism (Mixup) that dynamically creates mixed samples, bridging the gap between forget-distribution and retain-distribution. What this strategy does isn't just deleting a few discrete data points--it theoretically widens and regularizes the forgetting boundary. We ran extensive experiments on medical imaging benchmarks (RSNA-ICH and ISIC2018), and the results show that our approach achieves reasonably good unlearning. For instance, on the ICH dataset, IFF-FCU achieves a highly competitive Error deviation from the retrained gold standard, demonstrating robust improvements over existing baselines.
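The Mixup step that the abstract builds on can be sketched in a few lines: a mixed sample is a convex combination of one forget-set sample and one retain-set sample, with the mixing weight drawn from a Beta distribution. How IFF-FCU pairs samples, whether labels are mixed too, and the value of `alpha` are not stated in the abstract, so those choices below are assumptions.

```python
import random


def mixup(x_forget, x_retain, alpha=1.0, rng=random):
    """Blend two equally sized feature vectors with a Beta(alpha, alpha) weight.

    Returns the mixed vector and the sampled weight lam.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * f + (1.0 - lam) * r for f, r in zip(x_forget, x_retain)]
    return mixed, lam


forget_pixels = [0.0, 0.0, 0.0, 0.0]   # stand-in for a flattened forget-client image
retain_pixels = [1.0, 1.0, 1.0, 1.0]   # stand-in for a flattened retained image

mixed, lam = mixup(forget_pixels, retain_pixels, alpha=1.0, rng=random.Random(0))
print(lam, mixed)
```

Samples generated this way lie on the line segment between the two distributions, which is one way to read the abstract's claim about bridging the forget and retain distributions.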


2605.26713 2026-05-27 stat.ML cs.LG 版本更新

Transformers Can Learn Posterior Predictive Distributions In-Context

Transformer可以在上下文中学习后验预测分布

Gyeonghun Kang, Changwoo J. Lee, Xiang Cheng

发表机构 * Department of Statistical Science, Duke University, Durham, NC, USA（杜克大学统计科学系，美国北卡罗来纳州达勒姆）； Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA（杜克大学电气与计算机工程系，美国北卡罗来纳州达勒姆）

AI总结本文通过构造证明Transformer能够实现针对后验预测均值和方差的梯度下降算法，并研究其逼近后验预测分布的误差界，揭示了归一化和注意力深度对泛化能力的关键作用。

详情

AI中文摘要

先验数据拟合网络（PFN）最近已成为贝叶斯预测任务的一种强大方法，通过上下文学习近似后验预测分布（PPD）。尽管它们具有强大的实证性能和超越点预测的能力，但对Transformer在上下文中学习分布的算法能力的理论理解仍然缺乏。聚焦于高斯过程回归问题，我们通过构造证明Transformer可以实现针对后验预测均值和方差的梯度下降算法，随后通过非线性映射产生PPD的分箱概率。我们根据注意力深度和分箱分辨率研究了近似PPD的误差界。基于这些结果，我们进一步证明了归一化和注意力深度的选择在使Transformer能够超越预训练样本大小范围进行外推中的关键作用。我们进行了模拟实验，验证了我们的发现，为针对PPD的PFN的表达能力以及架构选择如何影响泛化能力提供了见解。

英文摘要

Prior-data fitted networks (PFNs) have recently emerged as a powerful approach for Bayesian prediction tasks, approximating the posterior predictive distribution (PPD) through in-context learning. Despite their strong empirical performance and ability to go beyond point predictions, theoretical understandings of the algorithmic capability of transformers to learn distributions in context are still lacking. Focusing on Gaussian process regression problems, we show by construction that transformers can implement a gradient descent algorithm targeting the posterior predictive mean and variance, followed by nonlinear mappings that yield binned probabilities of PPD. We study the error bounds of the approximated PPD in terms of attention depth and bin resolution. Based on these results, we further demonstrate the key role of normalization and the choice of attention depth in enabling the extrapolation abilities of transformers beyond the pretraining sample size range. We conduct simulations that corroborate our findings, providing insight into the expressivity of PFNs targeting PPDs and how architectural choices may influence generalization capabilities.
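The final step described in the abstract, turning a posterior predictive mean and variance into binned probabilities, has a simple closed form when the predictive distribution is Gaussian, as it is in Gaussian process regression. The sketch below computes that target quantity directly from the normal CDF; it does not reproduce the paper's transformer construction, and the bin edges are arbitrary illustration values.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def binned_probs(mean, var, edges):
    """Probability of each interval defined by sorted `edges`.

    The result has len(edges) + 1 entries: the left tail, the interior
    bins [edges[i], edges[i+1]), and the right tail.
    """
    std = math.sqrt(var)
    cdf = [0.0] + [normal_cdf(e, mean, std) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


probs = binned_probs(mean=0.0, var=1.0, edges=[-1.0, 0.0, 1.0])
print(probs)
print(sum(probs))
```

Finer bin edges approximate the continuous density more closely, which is the bin-resolution term that the abstract's error bounds refer to.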


2605.24041 2026-05-27 cs.LG cs.AI 版本更新

Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation

迭代精化神经算子：一种学习型不动点求解器——频谱偏差缓解的原则性方法

Xiaotian Liu, Shuyuan Shang, Xiaopeng Wang, Pu Ren, Yaoqing Yang

发表机构 * Dartmouth College（达特茅斯学院）； CUHK Shenzhen（香港中文大学（深圳））； Lawrence Berkeley National Lab（劳伦斯伯克利国家实验室）

AI总结提出迭代精化神经算子（IRNO），通过固定点迭代应用学习精化模块，结合渐进频谱损失，有效缓解神经算子的频谱偏差，在湍流和活性物质等物理系统中显著降低高频误差。

Comments 47 pages; accepted to ICML 2026 as a Spotlight

详情

AI中文摘要

神经算子作为科学建模的快速数据驱动替代方法，通常依赖于单一前向推理过程，难以解析高频细节，这一局限性称为频谱偏差。我们引入迭代精化神经算子（IRNO），通过固定点迭代反复应用学习精化模块来增强预训练算子。IRNO将预测分解为粗初始化及随后的残差校正，类似于经典数值求解器。在局部假设下，我们建立了诱导算子的收缩性，确保收敛到唯一不动点。为明确针对高频误差，我们提出渐进频谱损失，在训练过程中自适应地增加对高频分量的惩罚。在物理系统中，IRNO持续降低误差，在湍流中提升高达56.05%。在活性物质中，频谱分析显示，相对于基础算子，归一化误差比在低频降至27.72-36.10%，中频降至5.07-6.68%，高频降至1.48-2.04%，且在训练迭代次数之外保持稳定。代码见 https://github.com/xiaotianliu-dartmouth/Iterative_Refinement_Neural_Operator。

英文摘要

Neural operators serve as fast, data-driven surrogates for scientific modeling but typically rely on a monolithic, single-pass inference procedure that struggles to resolve high-frequency details, a limitation known as spectral bias. We introduce the Iterative Refinement Neural Operator (IRNO), which augments pre-trained operators with a learned refinement module iteratively applied via fixed-point iteration. IRNO decomposes the prediction into a coarse initialization followed by successive residual corrections, paralleling classical numerical solvers. Under local assumptions, we establish contraction of the induced operator, ensuring convergence to a unique fixed point. To explicitly target high-frequency errors, we propose a progressive spectral loss that adaptively increases the penalty on high-frequency components over refinement steps during training. Across physical systems, IRNO consistently lowers error, with up to 56.05% improvement on turbulent flow. On Active Matter, spectral analysis reveals that, relative to the base operator, the normalized error ratios decrease to 27.72-36.10% at low, 5.07-6.68% at mid, and 1.48-2.04% at high frequencies, remaining stable beyond the trained iteration count. Code is available at https://github.com/xiaotianliu-dartmouth/Iterative_Refinement_Neural_Operator
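The "coarse initialization followed by successive residual corrections" pattern has a classical analogue in Richardson iteration for a linear system A u = f. The sketch below uses that hand-written correction in place of IRNO's learned refinement module, so it illustrates only the fixed-point structure; the matrix, right-hand side, and step size `omega` are arbitrary assumptions chosen so that the update is a contraction.

```python
def residual(A, u, f):
    """Return f - A u for a matrix A given as a list of rows."""
    return [fi - sum(a * x for a, x in zip(row, u)) for row, fi in zip(A, f)]


def refine(A, f, u0, omega=0.3, steps=60):
    """Repeatedly apply u <- u + omega * (f - A u), starting from a coarse guess.

    Returns the final iterate and the max-norm residual before each step.
    """
    u = list(u0)
    history = []
    for _ in range(steps):
        r = residual(A, u, f)
        history.append(max(abs(x) for x in r))
        u = [x + omega * ri for x, ri in zip(u, r)]
    return u, history


A = [[2.0, 1.0], [1.0, 3.0]]
f = [1.0, 2.0]
coarse = [0.0, 0.0]            # stand-in for the base operator's prediction

u, history = refine(A, f, coarse)
print(u)                       # approaches the exact solution [0.2, 0.6]
print(history[0], history[-1])
```

Running more steps than originally planned does not hurt here, because the iterate is already at the fixed point; this mirrors the abstract's observation that errors remain stable beyond the trained iteration count.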


2605.22557 2026-05-27 cs.LG cs.NA math.NA 版本更新

Neural Flow Operators can Approximate any Operator: Abstract Frameworks and Universal Approximations

神经流算子可以逼近任意算子：抽象框架与通用逼近

Shuang Chen, Juncai He, Xue-Cheng Tai

发表机构 * Qiuzhen College, Tsinghua University（清华大学求真书院）； Yau Mathematical Sciences Center, Tsinghua University（清华大学丘成桐数学科学中心）； Norwegian Research Center（挪威研究中心）

AI总结提出神经流抽象框架，涵盖组合与分离结构的连续深度模型，证明其在有限维和无限维空间中的通用逼近性质，并通过时间离散化统一残差与普通架构。

详情

AI中文摘要

我们为神经网络和神经算子引入了一个抽象的神经流框架。该框架包含两种连续深度模型，即具有组合和分离结构的神经流，并涵盖了有限维函数逼近和无限维算子逼近。我们证明了相应神经流的适定性和通用逼近性质，包括据我们所知，首个无限维空间之间基于流的模型的通用逼近结果。我们还获得了卷积神经流模型的通用逼近结果。通过适当的时间离散化，组合结构恢复了ResNet类型的架构，而分离结构通过基于分裂的离散化产生了普通架构。这为具有全连接或卷积线性层的神经网络和神经算子的残差和普通架构提供了一条统一的基于流的路径。

英文摘要

We introduce an abstract neural flow framework for neural networks and neural operators. The framework contains two continuous-depth models, namely neural flows with composition and separation structures, and covers both finite-dimensional function approximation and infinite-dimensional operator approximation. We prove well-posedness and universal approximation properties for the corresponding neural flows, including, to the best of our knowledge, the first universal approximation result for flow-based models between infinite-dimensional spaces. We also obtain universal approximation results for convolutional neural flow models. Through suitable time discretizations, the composition structure recovers ResNet-type architectures, while the separation structure, via a splitting-based discretization, yields plain architectures. This gives a unified flow-based route to both residual and plain architectures for neural networks and neural operators with fully connected or convolutional linear layers.


2605.22468 2026-05-27 cs.LG cs.AI 版本更新

BioFormer: Rethinking Cross-Subject Generalization via Spectral Structural Alignment in Biomedical Time-Series

BioFormer: 通过频谱结构对齐重新思考生物医学时间序列中的跨主体泛化

Guikang Du, Haoran Li, Xinyu Liu, Zhibo Zhang, Xiaoli Gong, Jin Zhang

发表机构 * College of Computer Science, Nankai University, Tianjin, China（南开大学计算机科学学院，天津，中国）； College of Cyber Science, Tianjin Key Laboratory of Interventional Brain-Computer Interface and Intelligent Rehabilitation, Key Lab of Data and Intelligent System Security, Frontiers Science Center for New Organic Matter, Nankai University, Tianjin, China（网络空间安全学院，天津介入脑机接口与智能康复重点实验室，数据与智能系统安全重点实验室，新有机物前沿科学中心，南开大学，天津，中国）

AI总结提出BioFormer模型，通过频谱漂移视角显式建模主体特异性变异，利用频带对齐模块和样本条件层归一化对齐频谱结构，在六个数据集上F1分数提升6%。

详情

AI中文摘要

生物医学时间序列中的跨主体泛化指在一些主体数据上训练并在未见主体上测试。关键挑战是抑制BTS表示中的主体特异性变异。大多数现有方法通过模型构建或主体对抗学习隐式抑制变异，但很少显式建模。我们引入频谱漂移作为表征主体特异性变异的新视角。具体来说，相同标签下的BTS信号通常共享一致的振荡结构，但在特定频率分量上表现出依赖于主体的幅度或相位偏移，我们将其解释为主体特异性变异。基于这一见解，我们提出BioFormer。其核心是频带对齐模块（FBAM），该模块从频谱分布生成带级调制因子，并自适应调整幅度和相位以对齐频谱结构，从而减轻变异。我们进一步将FBAM与样本条件层归一化配对，该归一化从内在信号统计量而非主体身份推断归一化参数，稳定跨主体表示。在六个数据集上的大量实验表明，BioFormer优于12个基线，绝对F1分数提升6%。

英文摘要

Cross-subject generalization in biomedical time-series (BTS) refers to training on data from some subjects and testing on unseen subjects. The key challenge is to suppress subject-specific variability in BTS representations. Most existing methods implicitly suppress the variability through model building or subject adversarial learning, but rarely model it explicitly. We introduce spectral drift as a new perspective to characterize subject-specific variability. Specifically, BTS signals under the same label often share consistent oscillatory structure, yet exhibit subject-dependent magnitude or phase shifts in specific frequency components, which we interpret as subject-specific variability. Building on this insight, we propose BioFormer. At its core is a Frequency-Band Alignment Module (FBAM) that generates band-wise modulation factors from the spectral distribution and adaptively adjusts amplitude and phase to align spectral structure, thereby mitigating variability. We further pair FBAM with Sample Conditional Layer Normalization, which infers normalization parameters from intrinsic signal statistics rather than subject identity, stabilizing cross-subject representations. Extensive experiments on six datasets demonstrate that BioFormer outperforms 12 baselines, yielding absolute F1-score improvements of 6%.
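
The band-wise amplitude and phase adjustment described for FBAM can be pictured with a small frequency-domain sketch. The code below is only an illustration under stated assumptions: it uses a naive DFT, a single hand-picked band, and fixed `gain` and `phase_shift` values, whereas the abstract describes modulation factors generated from the spectral distribution. All function names are hypothetical and not taken from the paper.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def modulate_band(signal, band, gain, phase_shift):
    """Scale amplitude by `gain` and rotate phase by `phase_shift` for bins lo <= k < hi.

    The mirrored negative-frequency bins get the conjugate factor so the
    output stays real. Bins must lie strictly between DC and Nyquist.
    """
    n = len(signal)
    lo, hi = band
    if not (0 < lo < hi <= (n + 1) // 2):
        raise ValueError("band must lie strictly between DC and Nyquist")
    spectrum = dft(signal)
    factor = gain * cmath.exp(1j * phase_shift)
    for k in range(lo, hi):
        spectrum[k] *= factor
        spectrum[n - k] *= factor.conjugate()
    return idft(spectrum)

# Toy signal: a single oscillation in frequency bin 3.
n = 32
signal = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
doubled = modulate_band(signal, (3, 4), gain=2.0, phase_shift=0.0)
shifted = modulate_band(signal, (3, 4), gain=1.0, phase_shift=math.pi / 2)
```

In this toy case, `doubled` has twice the amplitude of `signal`, and `shifted` is the same oscillation advanced by a quarter cycle.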

2605.21617 2026-05-27 cs.LG q-bio.QM 版本更新

$\textit{BlockFormer}$: Transformer-based inference from interaction maps

$\textit{BlockFormer}$：基于交互图的Transformer推理

Eloïse Touron, Pedro L. C. Rodrigues, Julyan Arbel, Nelle Varoquaux, Michael Arbel

发表机构 * Univ. Grenoble Alpes（格勒诺布尔阿尔卑斯大学）； Inria（法国国家信息与自动化研究所）； CNRS（法国国家科学研究中心）； Grenoble INP（格勒诺布尔理工学院）； LJK（Jean Kuntzmann实验室）； TIMC

AI总结提出BlockFormer，一种基于Transformer架构的数据驱动方法，通过模拟器生成合成数据训练，解决从交互图中推断可变数量和大小实体参数的反问题，并成功应用于多种物种的着丝粒定位。

2605.20530 2026-05-27 cs.AI cs.CL cs.LG cs.SE 版本更新

跳过扩散模型中的零值以生成稀疏数据

Phil Sidney Ostheimer, Mayank Nagda, Andriy Balinskyy, Gabriel Vicente Rodrigues, Jean Radig, Carl Herrmann, Stephan Mandt, Marius Kloft, Sophie Fellenz

发表机构 * RPTU University Kaiserslautern-Landau（凯泽斯劳滕-兰道大学，RPTU）； Heidelberg University（海德堡大学）； University of California, Irvine（加州大学欧文分校）

AI总结提出稀疏利用扩散（SED）方法，通过仅建模非零值来保持稀疏性，在训练和推理中跳过零值以节省计算并提升生成质量。

Comments Accepted to ICML 2026

2605.26693 2026-05-27 cs.LG cs.AI stat.ML 版本更新

Model Merging on Loss Landscape: A Geometry Perspective

损失景观上的模型合并：几何视角

Juanwu Lu, Anand Bhaskar, Brian Axelrod, Ekaterina Tolstaya, Tristan Emrich

发表机构 * Purdue University（普渡大学）； Waymo LLC（Waymo公司）

AI总结提出EpiMer框架，将模型合并视为黎曼流形上的Fréchet均值，利用任务向量张成的低秩子空间和期望Hessian度量，理论证明曲率感知合并优于平坦几何方法，并在八个图像分类任务上验证了性能提升。

Comments CVPR 2026 Findings Track. 18 pages, 4 figures, 6 tables

详情

AI中文摘要

模型合并为无需重新训练的知识集成和并行开发提供了有前景的途径。然而，现有方法要么忽略损失景观的几何结构，要么依赖于难以处理的全空间Hessian近似。我们提出EpiMer，一个将模型合并视为黎曼流形上Fréchet均值求解的框架，并将计算限制在由任务向量张成的低秩子空间内。以期望Hessian作为度量，我们揭示了局部曲率与参数认知不确定性之间的联系。我们的理论分析将合并误差界分解为子空间Fréchet方差和残差能量，并提供了曲率感知合并何时在理论上优于平坦几何方法的闭式刻画。此外，我们的框架将曲率感知方法和最近的谱方法统一为不同几何度量下子空间Fréchet均值的特例。在八个图像分类任务上合并微调的CLIP-ViT模型，Epistemic Merging在匹配秩下严格优于所有三个CLIP-ViT骨干网络的基线，提高了每个骨干网络上的跨任务平均准确率和最差任务准确率。

英文摘要

Model merging offers a promising avenue for knowledge integration and parallel development without retraining. Yet, existing methods either ignore the geometry of the loss landscape or rely on intractable full-space Hessian approximations. We propose EpiMer, a framework that casts model merging as solving the Fréchet mean on a Riemannian manifold and restricts the computation to a low-rank subspace spanned by the task vectors. With the expected Hessian as the metric, we reveal a connection between local curvature and epistemic uncertainty of the parameters. Our theoretical analysis decomposes the merging error bound into the subspace Fréchet variance and the residual energy, and provides a closed-form characterization of when curvature-aware merging provably outperforms flat-geometry methods. In addition, our framework unifies both curvature-aware methods and recent spectral methods as special cases of the subspace Fréchet mean with different geometric metrics. Merging fine-tuned CLIP-ViT models on eight image classification tasks, Epistemic Merging strictly outperforms the baselines on all three CLIP-ViT backbones at matched rank, improving the across-task average accuracy and worst-task accuracy on every backbone.
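
To give a feel for why curvature can change a merge, the sketch below works through a heavily simplified case: each fine-tuned model i has parameters theta_i and a diagonal curvature estimate h_i, and the merged parameters minimize the sum over i of sum_j h_ij (theta_j - theta_ij)^2. The minimizer is the coordinate-wise curvature-weighted average. This is an assumption-laden toy (diagonal curvature, no low-rank subspace, hypothetical names), not EpiMer itself.

```python
def curvature_weighted_merge(params, curvatures, eps=1e-12):
    """Coordinate-wise minimizer of sum_i sum_j h_ij * (theta_j - theta_ij)**2.

    params:     list of parameter vectors, one per fine-tuned model
    curvatures: list of non-negative diagonal curvature estimates, same shape
    eps:        small constant guarding against division by zero
    """
    dim = len(params[0])
    merged = []
    for j in range(dim):
        weight_sum = sum(h[j] for h in curvatures)
        weighted = sum(h[j] * p[j] for p, h in zip(params, curvatures))
        merged.append(weighted / (weight_sum + eps))
    return merged

def flat_merge(params):
    """Plain averaging, which corresponds to equal curvature everywhere."""
    return [sum(column) / len(params) for column in zip(*params)]

params = [[1.0, 0.0], [3.0, 4.0]]
curvatures = [[3.0, 1.0], [1.0, 1.0]]   # model 0 is "stiffer" in coordinate 0
curved = curvature_weighted_merge(params, curvatures)
flat = flat_merge(params)
```

In the first coordinate the curvature-weighted result is pulled toward the model with the larger curvature, while the second coordinate, where curvatures are equal, matches plain averaging.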

2605.26690 2026-05-27 cs.LG cs.AI q-bio.QM 版本更新

Self-Improvement Imitation with Biologically Guided Search for Protein Design Under Oracle Budgets

SILO：基于生物引导搜索的自改进模仿用于预算约束下的蛋白质设计

Ashima Khanna, Dominik Grimm

发表机构 * Technical University of Munich（慕尼黑工业大学）； University of Applied Sciences Weihenstephan-Triesdorf（魏恩施蒂芬-特里斯多夫应用科学大学）

AI总结提出SILO框架，通过层次化编辑策略、增量随机束搜索和UCB代理集成，在有限oracle预算下实现蛋白质序列优化，在8个蛋白质适应度景观上达到最优性能。

详情

AI中文摘要

在严格的oracle预算下进行蛋白质序列优化需要探索巨大的组合空间，同时使每次评估都具有信息量。现有的强化学习和离策略生成方法在代理噪声下性能下降，且位置无关的突变提议可能破坏功能关键残基。我们提出了SILO，一个用于oracle预算蛋白质设计的轨迹级自改进模仿框架。SILO使用层次化编辑策略，将每个突变分解为位置选择后跟残基选择。在每个主动学习轮次中，策略通过增量随机无放回束搜索（SBS）采样候选轨迹，结合基于UCB的代理集成和丙氨酸扫描适应度分数（AFS），选择具有功能相关编辑的候选进行计算机oracle评估。然后，通过在轮次中最佳oracle标记轨迹上的下一动作交叉熵模仿来更新策略，避免值函数估计。在八个复现的蛋白质适应度景观和来自先前工作的五个强基线上，SILO在我们的评估中在8/8的景观上实现了最高的最大和top-100平均适应度，通常表现出更快的早期改进。在每种设置两个景观的低数据和噪声代理压力测试中，当多个基线退化时，SILO保持竞争力或最佳。消融实验表明，SBS与AFS贡献了大部分增益，迭代模仿提供了额外改进。代码可在：https://github.com/grimmlab/SILO.git 获取。

英文摘要

Protein sequence optimization under tight oracle budgets requires methods that explore vast combinatorial spaces while making each evaluation informative. Existing reinforcement learning and off-policy generative approaches often degrade under surrogate noise, and position-agnostic mutation proposals risk disrupting functionally critical residues. We introduce SILO, a trajectory-level self-improvement imitation framework for oracle-budgeted protein design. SILO uses a hierarchical edit policy that decomposes each mutation into a position choice followed by a residue choice. In each active-learning round, the policy samples candidate trajectories via incremental stochastic beam search without replacement (SBS), and a UCB-based proxy ensemble, combined with an alanine-scan fitness score (AFS), selects candidates with functionally relevant edits for in silico oracle evaluation. The policy is then updated by next-action cross-entropy imitation on the round's best oracle-labeled trajectories, avoiding value-function estimation. Across eight reproduced protein fitness landscapes and five strong baselines from prior work, SILO achieves the highest maximum and top-100 mean fitness on 8 of 8 landscapes within our evaluations, often exhibiting faster early-stage improvement. In low-data and noisy-proxy stress tests on two landscapes per setting, SILO remains competitive or best when several baselines degrade. Ablations show that SBS with AFS account for much of the gains, with iterative imitation providing additional improvement. Code is available at: https://github.com/grimmlab/SILO.git
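
The UCB-based proxy ensemble mentioned above can be sketched generically: score each candidate by the ensemble mean plus a multiple of the ensemble spread, then spend the oracle budget on the top-scoring candidates. The snippet is a minimal sketch of that generic idea only. The function name, the `beta` parameter, and the toy sequences are hypothetical, and the alanine-scan fitness score and beam search from the abstract are not modeled.

```python
import statistics

def ucb_select(candidates, ensemble_predictions, budget, beta=1.0):
    """Pick `budget` candidates with the highest mean + beta * std score.

    ensemble_predictions[i] holds the proxy models' fitness predictions
    for candidates[i].
    """
    scored = []
    for cand, preds in zip(candidates, ensemble_predictions):
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)   # disagreement across the ensemble
        scored.append((mean + beta * spread, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:budget]]

candidates = ["AAK", "AGK", "CGK"]
predictions = [
    [0.5, 0.5, 0.5],   # confident, medium fitness
    [0.2, 0.6, 0.7],   # same mean, but the proxies disagree
    [0.1, 0.1, 0.1],   # confident, low fitness
]
selected = ucb_select(candidates, predictions, budget=2)
```

Here the uncertain candidate is ranked ahead of the confident one with the same mean, which is the exploration behavior a UCB rule is meant to provide.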

2605.26675 2026-05-27 stat.ML cs.LG 版本更新

CART Random Forests as Sequential Allocation over Random Opportunity Sets: A Stochastic-Control Theory of Ensemble Risk

CART随机森林作为随机机会集上的序贯分配：集成风险的随机控制理论

Tianxing Mei, Yingying Fan, Mingming Leng, Jinchi Lv

发表机构 * Faculty of Business, Lingnan University（岭南大学商学院）； Data Sciences and Operations Department, University of Southern California（南加州大学数据科学与运营部门）

AI总结本文从随机控制视角将CART随机森林建模为随机机会集上的序贯分配过程，通过分离特征子采样和信息分裂策略两个设计杠杆，揭示了森林均方误差的构成，并证明了CART策略的局部稳定性与全局次优性。

Comments 69 pages, 1 figure

详情

AI中文摘要

CART随机森林是最广泛使用的现代预测方法之一，具有充分记录的经验成功。然而，在机制层面，由于其复杂性，该算法通常被视为黑箱。在本文中，我们发展了特征子采样CART随机森林的随机控制视角，称为CART随机机会集分配（CART-ROSA）。在每个节点，特征的随机子集被解释为随机可行动作集，CART分裂规则被解释为掩码动作分配策略。该策略在信息性分裂计数状态上诱导出一个受控的随机过程，其终末分布决定了森林均方误差（MSE）中的单棵树误差和树间交互项。这种表示通过分离两个设计杠杆——特征子采样引起的信息性机会率和掩码内分裂策略的收缩强度——打开了CART森林的黑箱。我们证明CART策略是局部稳定的：它收缩了信息性分裂分配中的不平衡，并集中了终末树的几何结构。然而，在系统层面，它对森林目标可能是全局次优的。针对线性模型，我们显式推导了MSE风险展开。我们的结果表明，运筹学视角如何使从CART森林的标准算法描述难以触及的理论缺口变得可处理。

英文摘要

CART random forests are among the most widely used modern predictive methods, with well-documented empirical success. Yet, at the mechanistic level, the algorithm is often treated as a black box because of its complexity. In this paper, we develop a stochastic-control perspective on feature-subsampled CART random forests, named CART random opportunity-set allocation (CART-ROSA). At each node, the random subset of features is interpreted as a random feasible action set, and the CART split rule as a masked-action allocation policy. This policy induces a controlled stochastic process over informative split-count states, whose terminal law determines both single-tree error and cross-tree interaction terms in the forest mean squared error (MSE). Such representation opens the black box of CART-forests by separating two design levers: the informative-opportunity rate induced by feature subsampling, and the contraction strength from the within-mask split policy. We establish that the CART policy is locally stabilizing: it contracts imbalances in informative split allocations and concentrates terminal tree geometry. At the system level, however, it can be globally suboptimal for the forest objective. Specializing to the linear model, we derive the MSE risk expansion explicitly. Our results show how an operations-research perspective makes tractable a theoretical gap difficult to access from the standard algorithmic description of CART forests.
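
One elementary way to make the effect of feature subsampling concrete is to compute the chance that a random feature subset contains at least one informative feature. Assuming the subset of size m is drawn uniformly without replacement from p features, of which s are informative, that chance is 1 - C(p - s, m) / C(p, m). The sketch below evaluates this quantity; treating it as the "informative-opportunity rate" is an assumption made here for illustration, and the paper's formal definition may differ.

```python
from math import comb

def informative_opportunity_rate(num_features, num_informative, subset_size):
    """Probability that a uniformly drawn feature subset (without replacement)
    contains at least one informative feature."""
    if not 0 <= num_informative <= num_features:
        raise ValueError("num_informative must be between 0 and num_features")
    if not 0 <= subset_size <= num_features:
        raise ValueError("subset_size must be between 0 and num_features")
    # Probability that every drawn feature is uninformative.
    miss = comb(num_features - num_informative, subset_size) / comb(num_features, subset_size)
    return 1.0 - miss

# Example: 10 features, 2 informative, 3 candidate features tried per node.
rate = informative_opportunity_rate(10, 2, 3)
```

Larger subsets raise this probability toward 1, at the price of less randomization across trees.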

2605.26667 2026-05-27 cs.AI cs.LG 版本更新

将你的展开用在关键处：基于组强化学习后训练的展开分配

Woojeong Kim, Ziyi Yang, Jing Nathan Yan, Jialu Liu

发表机构 * Cornell University（康奈尔大学）

AI总结提出 Pilot-Commit 框架，通过预算感知的展开分配策略，优先将计算资源分配给高信息量的提示，从而在组策略优化中减少采样成本并加速收敛。

Focal Reward: 基于评分标准的强化学习中的平衡奖励

Yu Huang, Zihua Zhao, Zhaoxin Huan, Wanli Gu, Feng Hong, Xinmu Ge, Lin Yuan, Weichang Wu, Qiang Hu, Xiaolu Zhang, Jun Zhou, Jiangchao Yao

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Ant Group（蚂蚁集团）

AI总结针对大语言模型在基于多维评分标准的强化学习中奖励失衡的问题，提出Focal Reward方法，通过逆奖励投影机制估计各维度饱和程度并自动重加权，实现细粒度平衡，在18个模型-基准对比中均优于最强静态聚合基线。

Comments Preprint

详情

AI中文摘要

大语言模型中的开放式生成通常需要多维评分标准来充分评估质量并指导强化学习的改进。然而，这种训练范式固有的一个关键困境是不同评分标准维度上的奖励极化不平衡。在此瓶颈下，即使大语言模型在训练后获得相对较高的奖励，它们仍可能在某些维度上表现出严重缺陷，直接导致用户体验下降。为了解决这个问题，我们提出了Focal Reward，一种新颖的目标函数，用于自动平衡基于评分标准的强化学习训练。具体来说，我们首先利用逆奖励投影机制来估计评分标准中每个准则的饱和程度，这构成了校准奖励方向的基础。然后，最终目标函数为每个准则设计了一个自动重新加权的系数，以实现细粒度平衡。跨三个模型规模和六个基准的大量实验表明，我们的Focal Reward方法在所有18个模型-基准比较中均优于最强的静态聚合基线。展开、机制和消融分析进一步表明，这些增益来自于向仍有改进空间的评分标准进行在线、饱和感知的重新分配。

英文摘要

The open-ended generation in LLMs usually requires multi-dimensional rubrics to adequately assess quality and guide the improvement of reinforcement learning. However, a critical dilemma inherent in this training paradigm is the imbalanced reward polarization along different rubric dimensions. Under this bottleneck, even if LLMs achieve relatively high rewards after training, they may still exhibit severe deficiencies in certain dimensions, leading to a direct deterioration in user experience. To address this problem, we propose Focal Reward, a novel objective to automatically balance the training of reinforcement learning under rubric-based rewards. Specifically, we first leverage an inverse reward projection mechanism to estimate the saturation degree of each criterion in the rubric, which forms the basis to calibrate the reward direction. Then, the final objective is designed with an automatically reweighting coefficient for each criterion to achieve the fine-grained balancing. Extensive experiments across three model scales and six benchmarks demonstrate that our Focal Reward method outperforms the strongest static aggregation baseline in all 18 model-benchmark comparisons. Rollout, mechanism, and ablation analyses further show that these gains arise from online, saturation-aware reallocation toward rubrics that still have room for improvement.

2605.26577 2026-05-27 eess.SY cs.AI cs.LG cs.SY math.OC 版本更新

Bridging Control with Neural Network Verifier alpha-beta-CROWN: A Tutorial

桥接控制与神经网络验证器 alpha-beta-CROWN：教程

Haoyu Li, Xiangru Zhong, Hao Cheng, Bin Hu, Huan Zhang

发表机构 * Department of Computer Science（计算机科学系）； Department of Electrical and Computer Engineering（电气与计算机工程系）

AI总结本教程提出一个统一框架，通过将控制问题与神经网络验证器 α,β-CROWN 桥接，实现控制器属性的可扩展形式验证。

Comments ACC 2026 Tutorial

详情

AI中文摘要

基于学习的控制器合成方法因其高表达力和强经验性能而受到欢迎。然而，在自动驾驶、机器人技术和电力系统等安全关键场景中，仅凭经验性能是不够的，对控制器的稳定性、安全性等属性进行形式验证是非常可取的。不幸的是，许多先前的验证方法要么依赖于系统或证书的特定结构假设，难以在不同设置间迁移，要么在高维神经网络系统上可扩展性差。在本教程中，我们提出了一个统一框架，旨在通过将控制与最先进的神经网络验证器 $α,\!β$-CROWN（alpha-beta-CROWN）桥接来弥合这一差距。其核心是，$α,\!β$-CROWN 是一个通用的边界引擎，用于表示为计算图的非线性函数：给定一个输入域，它可以产生认证边界和非线性函数的显式线性松弛。这些认证边界本身对于可达性分析等任务很有用，并且它们为执行可满足性检查和优化的更复杂例程提供了基础。更具体地说，许多控制问题归结为验证状态域上的实值不等式（例如，李雅普诺夫理论）。因此，$α,\!β$-CROWN 通过计算紧边界并基于边界递归划分和剪枝子域，实现了这些条件的可扩展验证。得益于 GPU 并行化，该流程在对传统方法具有挑战性的验证和优化问题上展示了卓越的可扩展性。在本教程中，我们讨论了 $α,\!β$-CROWN 的基础知识，并介绍了其在各种控制相关任务中的应用。

英文摘要

Learning-based methods for synthesizing controllers have gained popularity due to their high expressiveness and strong empirical performance. However, in safety-critical scenarios such as autonomous driving, robotics, and power systems, empirical performance alone is insufficient, and formal verification of controller properties such as stability and safety is highly desirable. Unfortunately, many prior verification approaches are either tied to specific structural assumptions on the system or the certificate, making them difficult to transfer across settings, or suffer from poor scalability on higher-dimensional neural network systems. In this tutorial, we present a unified framework that aims to mitigate this gap via bridging control with the state-of-the-art neural network verifier $α,\!β$-CROWN (alpha-beta-CROWN). At its core, $α,\!β$-CROWN is a general-purpose bounding engine for nonlinear functions represented as computation graphs: given an input domain, it can produce certified bounds and explicit linear relaxation of the nonlinear function. These certified bounds are useful on their own for tasks such as reachability analysis, and they also provide the foundation for more complex routines that perform satisfiability checking and optimization. More specifically, many control problems reduce to verifying real-valued inequalities over a state domain (e.g., Lyapunov theory). Consequently, $α,\!β$-CROWN enables scalable verification of such conditions by computing tight bounds and recursively partitioning and pruning subdomains based on the bounds. Thanks to GPU parallelization, this pipeline demonstrates superior scalability on verification and optimization problems that are challenging for traditional approaches. In this tutorial, we discuss the basics of $α,\!β$-CROWN and introduce its application to various control-related tasks.
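
The "compute bounds, then recursively partition and prune" loop can be illustrated on a one-dimensional toy problem. The sketch below certifies that a scalar function is positive on an interval by branch and bound. It deliberately swaps in plain interval arithmetic for the linear-relaxation bounds of a real verifier, does not call any verifier API, and uses hypothetical function names throughout.

```python
def square_lower(lo, hi):
    """Tight lower bound of x**2 for x in [lo, hi]."""
    if lo <= 0.0 <= hi:
        return 0.0
    return min(lo * lo, hi * hi)

def g(x):
    """Toy condition to certify: g(x) = x**2 - x + 1 > 0."""
    return x * x - x + 1.0

def g_lower(lo, hi):
    """Sound (possibly loose) lower bound of g on [lo, hi] via interval arithmetic."""
    return square_lower(lo, hi) - hi + 1.0

def h(x):
    """A condition that fails: h(x) = x**2 - 1 is not positive near 0."""
    return x * x - 1.0

def h_lower(lo, hi):
    return square_lower(lo, hi) - 1.0

def verify_positive(func, lower_bound, lo, hi, max_depth=30):
    """Return True if func > 0 is certified on [lo, hi].

    Returns False if a counterexample is found at a midpoint or the
    splitting budget runs out (the two cases are not distinguished here).
    """
    if lower_bound(lo, hi) > 0.0:
        return True                      # subdomain certified, prune it
    mid = 0.5 * (lo + hi)
    if func(mid) <= 0.0 or max_depth == 0:
        return False
    return (verify_positive(func, lower_bound, lo, mid, max_depth - 1)
            and verify_positive(func, lower_bound, mid, hi, max_depth - 1))

certified = verify_positive(g, g_lower, -2.0, 2.0)
refuted = verify_positive(h, h_lower, -2.0, 2.0)
```

The bound on the whole interval is too loose to certify `g` directly, but a few splits tighten it enough; tighter bounding methods reduce how many splits are needed.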

2605.26576 2026-05-27 cs.CV cs.LG 版本更新

TrackRef3D: Multi-View Consistent Track-then-Label for Open-World Referring Segmentation in 3D Gaussian Splatting

TrackRef3D: 面向开放世界3D高斯泼溅分割的多视角一致跟踪-标注方法

Yuyang Tan, Renhe Zhang, Hang Zhang, Ao Li, Xin Tan

发表机构 * East China Normal University, Shanghai, China（华东师范大学，上海，中国）； Shanghai AI Laboratory（上海人工智能实验室）； University of Electronic Science and Technology of China, Chengdu, China（电子科技大学，成都，中国）

AI总结提出TrackRef3D全自动流水线，通过多视角一致跟踪-标注范式解耦目标发现与语义定位，无需人工标注实现开放世界3D高斯泼溅分割。

详情

AI中文摘要

引用3D高斯泼溅（R3DGS）利用自然语言进行3D目标分割，已成为具身AI的关键能力。然而，现有方法通常依赖昂贵的每场景人工标注和每视图伪掩码生成，存在多视角不一致以及对不同查询特异性的泛化能力差的问题。为此，我们提出TrackRef3D，一种全自动流水线，通过引入多视角一致的跟踪-标注范式，从根本上将目标发现与语义定位解耦，无需人工标注即可实现3D高斯泼溅（3DGS）中的开放世界引用分割。具体而言，我们提出轨迹感知语义共识模块（TSCM），通过同义词聚类和轨迹感知投票聚合跨视图预测，建立规范语义身份，从而确保多视角一致性。此外，我们采用可见性感知描述生成策略以缓解歧义，并提出混合训练策略（HTS），利用多正例对比目标联合优化粗粒度类别语义和细粒度引用线索，确保在不同查询特异性下的鲁棒性。在基准上的大量实验表明，TrackRef3D达到了最先进的性能。

英文摘要

Referring 3D Gaussian Splatting (R3DGS), which utilizes natural language for 3D object segmentation, has emerged as a crucial capability for embodied AI. However, existing methods typically rely on expensive per-scene manual annotation and per-view pseudo mask generation, which suffer from multi-view inconsistency and poor generalization to varying query specificities. To address this, we present TrackRef3D, a fully automatic pipeline that achieves open-world referring segmentation in 3D Gaussian Splatting (3DGS) without manual annotation by introducing a multi-view consistent track-then-label paradigm that fundamentally decouples object discovery from semantic grounding. Specifically, we propose a Trajectory-Aware Semantic Consensus Module (TSCM) which aggregates cross-view predictions via synonymous clustering and trajectory-aware voting to establish a canonical semantic identity, thereby ensuring multi-view consistency. Furthermore, we employ a visibility-aware description generation strategy to mitigate ambiguity and propose a Hybrid Training Strategy (HTS) that jointly optimizes coarse category semantics and fine-grained referential cues to ensure robustness under varying query specificities using a multi-positive contrastive objective. Extensive experiments on benchmarks demonstrate that TrackRef3D achieves state-of-the-art performance.

2605.26571 2026-05-27 cs.LG 版本更新

Separate Aggregation of Split Network for Personalized Federated Learning

拆分网络的分离聚合用于个性化联邦学习

Yunseok Kang, Jaeyoung Song

发表机构 * Department of Electronics Engineering, Pusan National University（釜山国立大学电子工程系）

AI总结提出PGFedSplit框架，采用分离架构和自适应聚合调度，结合本地与服务器生成的表示，解决客户端数据异构下的个性化与全局泛化权衡问题。

详情

AI中文摘要

联邦学习能够在不共享原始数据的情况下进行协作模型训练，但在客户端数据分布异构时性能会大幅下降。单一的全局模型往往无法满足不同客户端的需求，因此个性化联邦学习被探索用于在保持全局泛化的同时提升客户端特定性能。现有的PFL方法通常面临一个基本权衡：更强的全局共享可能削弱本地专业化，而更强的本地适应则可能导致在数据有限、标签不平衡和缺失类别场景下的过拟合。在这项工作中，我们提出了PGFedSplit，一个在严重客户端异构下同时提升个性化和全局泛化的个性化联邦学习框架。PGFedSplit采用分离架构，并根据不同模型组件的角色执行自适应聚合调度，在保持客户端特定适应的同时实现稳定的知识共享。每个客户端进一步利用本地提取的表示和从服务器端高斯统计生成的合成表示的混合，提升了在标签不平衡和缺失类别条件下的鲁棒性。在Fashion MNIST、CIFAR-10、CIFAR-100和Tiny ImageNet上的大量实验表明，与最先进的PFL方法相比，PGFedSplit在高度异构设置下实现了持续改进，具有稳定的收敛和优越的个性化性能。

英文摘要

Federated learning enables collaborative model training without sharing raw data, but its performance can degrade substantially under heterogeneous client data distributions. A single global model often cannot satisfy diverse client requirements, so personalized federated learning (PFL) has been explored to improve client-specific performance while preserving global generalization. Existing PFL methods often face a fundamental tradeoff in which stronger global sharing can undermine local specialization, whereas stronger local adaptation can lead to overfitting under limited data, label imbalance, and missing-class scenarios. In this work, we propose PGFedSplit, a personalized federated learning framework that improves both personalization and global generalization under severe client heterogeneity. PGFedSplit adopts a split architecture and performs adaptive aggregation scheduling tailored to the roles of different model components, enabling stable knowledge sharing while maintaining client-specific adaptation. Each client further leverages a mixture of locally extracted representations and synthetic representations generated from server-side Gaussian statistics, improving robustness under label imbalance and missing-class conditions. Extensive experiments on Fashion-MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet demonstrate consistent improvements over state-of-the-art PFL methods, with stable convergence and superior personalization in highly heterogeneous settings.

2605.26569 2026-05-27 cs.LG 版本更新

Distribution-Aware Conformal Prediction: A Framework for generating efficient prediction intervals for time series

分布感知共形预测：一种为时间序列生成高效预测区间的框架

Daniel Schweizer, Peter Kuhn, Jayant Sharma, Shivali Dubey, Malte von Ramin, Christoph Brockt-Haßauer

发表机构 * Fraunhofer Institute for Highspeed Dynamics, Ernst-Mach-Institut, EMI Freiburg（弗劳恩霍夫高速动力研究所，恩斯特-马赫研究所，EMI弗赖堡）

AI总结提出分布感知共形预测（DCP）框架，通过集成概率预测器与分数无关的共形校准，为时间序列生成有效且高效的预测区间。

Comments submitted to Journal of Machine Learning Research (JMLR)

详情

AI中文摘要

我们提出了分布感知共形预测（DCP），这是一个统一框架，将蒙特卡洛dropout、深度集成和分位数回归等概率预测器与分数无关的共形校准相结合，以生成有效且高效的预测区间。利用数值反演方法构建区间边界，DCP能够适应任意组合的分布生成预测器和非一致性分数。对合成和真实时间序列数据的基准分析表明，DCP能够在不同的不确定性机制下自适应地校准预测区间。关键的是，DCP的模块化设计便于对不同预测器-分数配对进行即插即用实验，并通过新引入的修正Winkler分数进行定量支持，该分数通过显式惩罚欠覆盖来平衡有效性和效率。虽然DCP推广并扩展了现有方法（如共形分位数回归和共形蒙特卡洛），但其模块化设计允许进一步扩展，为在动态环境和高风险应用中推进不确定性量化奠定了基础。

英文摘要

We present Distribution-aware Conformal Prediction (DCP), a unified framework integrating probabilistic predictors such as Monte Carlo dropout, deep ensembles, and quantile regression with score-agnostic conformal calibration to produce valid and efficient prediction intervals. Leveraging a numerical inversion approach to construct interval bounds, DCP accommodates arbitrary combinations of distribution-generating predictors and nonconformity scores. Benchmark analyses on synthetic and real-world time series data demonstrate DCP's ability to adaptively calibrate prediction intervals under varying uncertainty regimes. Crucially, DCP's modular design facilitates plug-and-play experimentation with different predictor-score pairings, quantitatively supported by a newly introduced modified Winkler score that balances validity and efficiency by explicitly penalizing undercoverage. While DCP generalizes and extends existing approaches like Conformalized Quantile Regression and Conformalized Monte Carlo, its modular design allows further extensions, setting a foundation for advancing uncertainty quantification in dynamic environments and high-risk applications.
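
For readers new to conformal calibration, the sketch below shows the generic split-conformal recipe that frameworks of this kind build on: take nonconformity scores on a held-out calibration set, pick the ceil((n + 1)(1 - alpha))-th smallest score, and use it as the interval radius. This is not DCP's numerical inversion procedure. It assumes absolute residuals as the score and exchangeable calibration data, which time series can violate, and the function names are hypothetical.

```python
import math

def conformal_radius(calibration_residuals, alpha):
    """Finite-sample conformal quantile of absolute residuals.

    Returns infinity when the calibration set is too small for the
    requested miscoverage level alpha.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_residuals)[rank - 1]

def prediction_interval(point_forecast, calibration_residuals, alpha=0.1):
    """Symmetric interval around a point forecast with target coverage 1 - alpha."""
    radius = conformal_radius(calibration_residuals, alpha)
    return point_forecast - radius, point_forecast + radius

# Absolute errors |y - y_hat| collected on a held-out calibration window.
residuals = [0.3, 1.2, 0.5, 0.9, 0.1, 0.7, 1.5, 0.4, 0.8]
lower, upper = prediction_interval(10.0, residuals, alpha=0.5)
```

Swapping the absolute residual for another nonconformity score changes the shape of the interval but not the calibration step.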

2605.26562 2026-05-27 cs.LG 版本更新

Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting

超越整体模型：深度多变量时间序列预测的系统性组件级基准测试

Shuang Liang, Chaochuan Hou, Xu Yao, Shiping Wang, Hailiang Huang, Songqiao Han, Minqi Jiang

发表机构 * Shanghai University of Finance and Economics（上海财经大学）； Key Laboratory of Interdisciplinary Research of Computation and Economics（计算与经济交叉学科重点实验室）

AI总结提出TSCOMP基准，通过正交实验分解深度预测方法的核心组件，揭示其有效性并构建性能语料库，实现零样本模型构建，优于手工复杂架构。

Comments accepted by KDD 2026 Datasets and Benchmarks Track

详情

DOI: 10.1145/3770855.3817551

AI中文摘要

虽然先前在多变量时间序列预测中的研究集中于开发复杂的整体模型，但本工作倡导转向对其影响的细粒度、组件级理解。我们提出TSCOMP，这是第一个大规模基准，系统地将深度预测方法分解为其核心、细粒度的组件——涵盖序列预处理、编码策略、包括特定和大规模时间序列模型的网络架构以及优化方法。通过使用约束正交实验设计和广泛评估，我们进行多视角分析，揭示组件在不同骨干网络、数据特征及其交互中的有效性。除了提供见解外，该基准建立了一个包含超过20,000个模型-数据集评估的细粒度性能语料库，支持自动组件选择的学习，从而在新数据集上实现零样本模型构建。我们的实验表明，尽管简单，但基于语料库的方法始终优于最先进的方法，验证了我们评估设计的合理性，并确认系统性组件选择超越了手工设计的复杂架构。所有代码和性能语料库均可在 https://github.com/SUFE-AILAB/TSCOMP 公开获取。

英文摘要

While previous research in multivariate time series forecasting has focused on developing complex holistic models, this work advocates for a shift toward a granular, component-level understanding of their impacts. We propose TSCOMP, the first large-scale benchmark that systematically deconstructs deep forecasting methods into their core, fine-grained components--spanning series preprocessing, encoding strategies, network architectures including specific and large time-series models, and optimization methods. Using constrained orthogonal experimental design and extensive evaluations, we conduct multi-view analyses that reveal component effectiveness across different backbones, data characteristics, and their interactions. Beyond providing insights, this benchmark establishes a fine-grained performance corpus comprising over 20,000 model-dataset evaluations, which supports the learning of automated component selection, enabling zero-shot model construction on new datasets. Our experiments demonstrate that the corpus-driven approach, despite its simplicity, consistently outperforms state-of-the-art methods, validating the soundness of our evaluation design and confirming that systematic component selection surpasses manually designed complex architectures. All code and the performance corpus are publicly available at https://github.com/SUFE-AILAB/TSCOMP.

2605.26559 2026-05-27 cs.LG cs.AI econ.EM 版本更新

Auditing and Fixing Economic Validity in Tabular Foundation Models for Discrete Choice

审计与修复离散选择中表格基础模型的经济有效性

Yingshuo Wang, Xian Sun, Yanhang Li, Zhichao Fan, Zexin Zhuang

发表机构 * University of California, Berkeley, CA, USA（加州大学伯克利分校）； Duke University, Durham, NC, USA（杜克大学）； Northeastern University, Boston, MA, USA（东北大学）； University of Illinois Urbana-Champaign, IL, USA（伊利诺伊大学厄巴纳-香槟分校）； Southern Methodist University, Dallas, TX, USA（南卫理公会大学）

AI总结提出两阶段适配器，将表格基础模型预测嵌入效用最大化框架，在保证经济一致性的同时提升选择预测精度。

Comments 5 pages, 1 table. Accepted at the FMSD Workshop, ICML 2026

详情

AI中文摘要

表格基础模型在选择预测任务上取得了很高的准确率，但其预测常常违反这些任务所需的经济逻辑：提高价格有时会增加预测需求，隐含的支付意愿估计经常为负或不合理。我们提出了一种两阶段适配器，将基础模型预测嵌入效用最大化框架。在第一阶段，我们估计一个标准选择模型，其参数受经济理论约束。在第二阶段，我们冻结这些参数，并训练一个校正项，将基础模型的预测作为附加信息纳入。结果模型继承了基础模型的精度提升，同时保证了政策扰动下价格-需求的单调关系，并产生可解析计算的权衡指标。在两个交通数据集上，适配器在保持完美经济一致性的同时，相比标准logit模型恢复了高达13个百分点的准确率，这是原始基础模型或传统蒸馏都无法实现的。

英文摘要

Tabular foundation models achieve strong accuracy on choice prediction tasks, but their predictions often violate the economic logic those tasks require: raising a price sometimes increases predicted demand, and implied willingness-to-pay estimates are frequently negative or implausible. We propose a two-stage adapter that embeds foundation model predictions within a utility-maximization framework. In the first stage, we estimate a standard choice model whose parameters are constrained to obey economic theory. In the second stage, we freeze those parameters and train a correction term that incorporates the foundation model's predictions as additional information. The result is a model that inherits the foundation model's accuracy gains while guaranteeing monotonic price-demand relationships under policy perturbation and producing analytically computable trade-off measures. On two transportation datasets, the adapter recovers up to 13 percentage points of accuracy over a standard logit model while maintaining perfect economic consistency, something neither the raw foundation models nor conventional distillation achieve.

2605.26554 2026-05-27 cs.LG cs.AI 版本更新

Linear and Neural Dueling Bandits with Delayed Feedback

带延迟反馈的线性与神经对决赌博机

Xiangyi Wang, Pingchen Lu, Jie Mao, Mingze Kong, Zhi Hong, Zhiyong Wang, Zhongxiang Dai

发表机构 * The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））； The Chinese University of Hong Kong（香港中文大学）

AI总结针对随机延迟反馈下的上下文对决赌博机问题，提出线性（LDB-DF）和神经（NDB-DF）两种算法，通过将逆概率加权（IPW）机制直接融入损失函数实现无偏校正，并给出线性设置下O(d*sqrt(T))的遗憾界和神经设置下的次线性保证。

详情

AI中文摘要

上下文对决赌博机构成了基于偏好的决策制定的基石，在推荐系统和大语言模型对齐中有关键应用。然而，标准算法依赖于即时反馈的理想化假设，这一条件在现实场景（如提示优化）中经常被违反。这种设置带来了独特的理论挑战：与线性赌博机不同，对决赌博机估计量缺乏闭式解，使得直接套用标准加权技术会产生偏差。为解决这一问题，我们形式化了具有随机延迟反馈的上下文对决赌博机问题，并提出了两种新颖算法：线性延迟反馈对决赌博机（LDB-DF）和神经延迟反馈对决赌博机（NDB-DF）。我们方法的核心是一种新颖的估计量，它将逆概率加权（IPW）机制直接集成到损失函数中，确保对延迟或缺失反馈的无偏校正。我们提供了全面的理论分析，为线性设置建立了O(d*sqrt(T))的遗憾界，并为神经设置建立了次线性保证。在模拟和真实数据集上的大量实验证明了我们提出方法的有效性。

英文摘要

Contextual dueling bandits form a cornerstone of preference-based decision-making, with critical applications in recommender systems and large language model alignment. However, standard algorithms rely on the idealized assumption of immediate feedback, a condition frequently violated in real-world scenarios such as prompt optimization. This setting introduces a unique theoretical challenge: unlike linear bandits, dueling bandit estimators lack closed-form solutions, rendering naive adaptations of standard weighting techniques biased. To address this, we formalize the problem of Contextual Dueling Bandits with Stochastic Delayed Feedback and propose two novel algorithms: Linear (LDB-DF) and Neural (NDB-DF) Dueling Bandits with Delayed Feedback. Central to our approach is a novel estimator that integrates an Inverse Probability Weighting (IPW) mechanism directly into the loss function, ensuring unbiased correction for delayed or missing feedback. We provide comprehensive theoretical analysis, establishing an O(d*sqrt(T)) regret bound for the linear setting and sub-linear guarantees for the neural setting. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness of our proposed methods.
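
The idea of putting inverse probability weights inside the loss can be sketched with a plain logistic preference loss: each duel whose feedback has arrived is weighted by one over the probability that feedback arrives in time, and duels with missing feedback contribute nothing. The record layout, the function name, and the assumption that the observation probability is known are all hypothetical choices for this illustration; the paper's estimator may differ in its details.

```python
import math

def ipw_dueling_loss(theta, duels):
    """Inverse-probability-weighted logistic loss for pairwise preferences.

    Each duel is (feature_difference, outcome, observed, observation_probability):
      feature_difference     : x_first - x_second, as a list of floats
      outcome                : +1 if the first arm won, -1 otherwise
      observed               : True if feedback arrived before the cutoff
      observation_probability: chance that feedback arrives in time (assumed known)
    """
    total = 0.0
    for diff, outcome, observed, obs_prob in duels:
        if not observed:
            continue                      # delayed or missing feedback
        margin = outcome * sum(w * d for w, d in zip(theta, diff))
        total += math.log1p(math.exp(-margin)) / obs_prob
    return total

duels = [
    ([1.0, 0.0], +1, True, 0.5),    # observed, up-weighted by 1 / 0.5
    ([0.0, 1.0], -1, False, 0.5),   # feedback has not arrived
]
loss_at_zero = ipw_dueling_loss([0.0, 0.0], duels)
```

With `theta` at zero every observed duel contributes log 2 divided by its observation probability, which makes the weighting easy to check by hand.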

2605.26548 2026-05-27 cs.CR cs.LG 版本更新

CSV-ViT: 一种使用可变大小皮层超顶点的视觉Transformer用于阿尔茨海默病病理检测

Geonwoo Baek, Ikbeom Jang

发表机构 * Department of Computer Science and Engineering（计算机科学与工程系）； Hankuk University of Foreign Studies（韩国外国语大学）

AI总结提出一种保留感兴趣区域的、基于顶点的可变大小皮层表面分块方法（皮层超顶点），并设计可变大小补丁兼容的视觉Transformer（CSV-ViT），在阿尔茨海默病诊断、淀粉样蛋白阳性和tau蛋白阳性三分类任务中优于现有表面模型。

详情

AI中文摘要

确认阿尔茨海默病（AD）通常依赖于正电子发射断层扫描（PET），该方法仍然昂贵且有创，这促使了基于结构MRI的预筛查的使用。在非欧几里得流形，特别是大脑皮层表面上的深度学习，由于数据的球形拓扑结构面临重大挑战。最近的表面模型已经能够从皮层表面数据中学习；然而，施加基于面的均匀补丁通常会导致补丁边界处的重复顶点。一般来说，许多基于表面的模型对感兴趣区域（ROI）的感知有限，这可能导致非皮层区域（如内侧壁）被包含在内。我们提出了一种皮层表面分块方法，该方法执行保留ROI的、基于顶点的、可变大小的补丁划分。我们将这些皮层表面补丁称为皮层超顶点（CSV）。基于这种表示，我们设计了CSV视觉Transformer（CSV-ViT），这是一种可变大小补丁容忍的视觉Transformer，使用填充和掩码感知的补丁嵌入。我们使用T1加权MRI，并通过将AD相关状态分类为三个类别来评估我们的框架：AD诊断、淀粉样蛋白阳性和tau蛋白阳性。在实验中，CSV-ViT取得了比最近基于表面的模型更高的分类性能。结果表明，所提出的CSV-ViT可能支持在PET或脑脊液确认之前基于MRI的AD相关状态预测。

英文摘要

Confirming Alzheimer's disease (AD) typically relies on positron emission tomography (PET), which remains costly and invasive, motivating the use of structural MRI-based prescreening. Deep learning on non-Euclidean manifolds, particularly brain cortical surfaces, faces significant challenges due to the data's spherical topology. Recent surface models have enabled learning from cortical surface data; however, imposing face-based uniform patches often causes duplicate vertices at patch boundaries. In general, many surface-based models are limited in their awareness of the region of interest (ROI), which can result in non-cortical regions, such as the medial wall, being included. We propose a cortical surface tokenization that performs ROI-preserving, vertex-based, variable-sized patch partitioning. We refer to these cortical surface patches as cortical supervertices (CSVs). Building on this representation, we design the CSV Vision Transformer (CSV-ViT), a variable-size patch-tolerant Vision Transformer that uses padding and a mask-aware patch embedding. We used T1-weighted MRI and evaluated our framework by classifying AD-related status into three categories: AD diagnosis, amyloid positivity, and tau positivity. Across the experiments, CSV-ViT achieved higher classification performance than recent surface-based models. The results suggest that the proposed CSV-ViT may support MRI-based prediction of AD-related status prior to PET or CSF confirmation.
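
Handling patches of different sizes with padding and a mask can be shown in a few lines. In the sketch below each patch is a list of per-vertex scalar features, patches are padded to a common length, and a mask records which entries are real so that padding never influences the pooled value. Mean pooling stands in for a learned patch embedding here, and all names are hypothetical rather than taken from CSV-ViT.

```python
def pad_patches(patches, pad_value=0.0):
    """Pad variable-length patches to a common length and build validity masks."""
    max_len = max(len(p) for p in patches)
    padded = [list(p) + [pad_value] * (max_len - len(p)) for p in patches]
    masks = [[1] * len(p) + [0] * (max_len - len(p)) for p in patches]
    return padded, masks

def masked_mean(padded, masks):
    """Average only the real (unmasked) entries of each non-empty patch."""
    return [
        sum(v for v, m in zip(row, mask) if m) / sum(mask)
        for row, mask in zip(padded, masks)
    ]

# Two patches with 2 and 3 vertices, e.g. per-vertex cortical thickness values.
patches = [[1.0, 3.0], [2.0, 4.0, 6.0]]
padded, masks = pad_patches(patches)
pooled = masked_mean(padded, masks)
```

Without the mask, the padded zero in the first patch would drag its average down, which is the kind of artifact a mask-aware embedding avoids.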

2605.26509 2026-05-27 cs.LG math.PR stat.CO 版本更新

SIKA-GP: Accelerating Gaussian Process Inference with Sparse Inducing Kernel Approximations for Bayesian Deep Learning

SIKA-GP：利用稀疏诱导核近似加速贝叶斯深度学习中的高斯过程推断

Wenyuan Zhao, Rui Tuo, Chao Tian

发表机构 * Department of Electrical and Computer Engineering, Texas A&M University, College Station, US； Department of Industrial and Systems Engineering, Texas A&M University, College Station, US

AI总结提出SIKA-GP方法，通过基于二元有序模板基的稀疏诱导核近似，将高斯过程推断的计算复杂度降低至O(log M)，并实现高效张量化GPU计算，可自然嵌入贝叶斯神经网络，在视觉和Transformer语言基准上显著加速训练和推断而不牺牲预测性能。

Comments 20 pages, 8 figures; accepted to International Conference on Machine Learning (ICML) 2026

详情

AI中文摘要

高斯过程（GPs）为不确定性估计提供了原则性的贝叶斯框架，但其计算复杂度严重限制了在大规模数据集上的可扩展性。我们提出SIKA-GP，该方法使用基于二元有序模板基的稀疏诱导核近似来加速GP推断，对诱导点数量的复杂度依赖仅为${O}(\log M)$。我们的方法从稀疏激活基构建紧凑且表达力强的核表示，从而实现高效的张量化GPU计算，并与现代大规模模型无缝集成。SIKA-GP可以自然地嵌入具有稀疏激活的贝叶斯神经网络（BNNs）中，在训练和推断中均实现显著加速，且不牺牲预测性能。该方法自然地扩展到深度特征学习，解决了深度架构和高维特征表示带来的可扩展性挑战。在视觉和基于Transformer的语言基准上的实验结果表明，我们的方法始终提供快速且准确的GP模型，为可扩展核学习提供了一条原则性路径。

英文摘要

Gaussian processes (GPs) provide a principled Bayesian framework for uncertainty estimation, but their computational complexity severely limits scalability to large datasets. We propose SIKA-GP, which accelerates GP inference using sparse inducing kernel approximations based on a dyadic ordered template basis, incurring only ${O}(\log M)$ complexity dependence on the number of inducing points. Our approach constructs compact and expressive kernel representations from sparsely activated bases, enabling efficient tensorized GPU computation and seamless integration with modern large-scale models. SIKA-GP can be naturally embedded into Bayesian neural networks (BNNs) with sparse activations, yielding significant speedups in both training and inference without sacrificing predictive performance. The method naturally extends to deep feature learning, addressing the scalability challenges introduced by deep architectures and high-dimensional feature representations. Empirical results on vision and transformer-based language benchmarks demonstrate that our approach consistently delivers fast and accurate GP models, providing a principled path toward scalable kernel learning.


2605.26496 2026-05-27 cs.LG cs.AI 版本更新

Dense2MoE: Pushing the Pareto Frontier of On-Device LLMs via Unified Pruning and Upcycling

Dense2MoE：通过统一剪枝和升级推动设备端LLM的帕累托前沿

Fengfa Li, Hongjin Ji, Yifeng Ding, Lei Ren, Chen Wei

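Affiliations: Li Auto; The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes Dense2MoE, a framework that unifies pruning and upcycling through Layer Fusion UpCycling (LF-UC) to convert dense LLMs efficiently into on-device MoE models, advancing the Pareto frontier of inference latency versus accuracy.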

Comments 19 pages


AI中文摘要

混合专家（MoE）架构对于资源受限的设备端部署极具前景，但从头训练这些模型成本高昂。当前方法试图通过将密集模型升级为MoE来缓解这一问题，然而它们常常引入参数冗余，降低推理效率。另一方面，标准层剪枝减少了冗余，但不可避免地损害模型准确性。为解决这一困境，我们提出Dense2MoE，一种通过层融合升级（LF-UC）统一剪枝和升级的新框架。在硬件Roofline理论指导下，Dense2MoE通过剪枝来自冗余层的带宽密集型注意力模块，同时将其多层感知机（MLP）重新用作MoE专家，系统地克服了推理内存墙。这种结构创新保留了模型的核心能力，并通过选择性令牌路由严格限制活跃参数。借助适度的持续预训练预算，Dense2MoE高效地将公开可用的密集LLM转换为设备端就绪的MoE模型。大量实验表明，Dense2MoE显著推进了设备端推理延迟与模型准确性的帕累托前沿，优于密集基线、最先进的压缩方法和标准升级方法。

英文摘要

The Mixture-of-Experts (MoE) architecture is highly promising for resource-constrained on-device deployments, yet training these models from scratch incurs prohibitive costs. Current methods attempt to alleviate this by upcycling dense models into MoEs; however, they often introduce parameter redundancy that degrades inference efficiency. Alternatively, standard layer pruning mitigates redundancy but inevitably compromises model accuracy. To resolve this dilemma, we propose Dense2MoE, a novel framework that unifies pruning and upcycling through Layer Fusion UpCycling (LF-UC). Guided by hardware Roofline theory, Dense2MoE systematically overcomes the inference memory wall by pruning bandwidth-heavy attention modules from redundant layers while repurposing their Multi-Layer Perceptrons (MLPs) into MoE experts. This structural innovation preserves the model's core capabilities and strictly limits active parameters via selective token routing. With a modest continual pre-training budget, Dense2MoE efficiently converts publicly available dense LLMs into on-device-ready MoE models. Extensive experiments demonstrate that Dense2MoE significantly advances the Pareto frontier for on-device inference latency versus model accuracy, outperforming dense baselines, state-of-the-art compression, and standard upcycling methods.
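The Roofline reasoning mentioned in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: the device numbers and the arithmetic intensities are hypothetical, and the function names are ours. It only shows the standard Roofline rule that attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity, which is why a bandwidth-heavy module can dominate on-device latency.

```python
def attainable_flops(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic.

    peak_flops: peak compute in FLOP/s
    bandwidth:  memory bandwidth in bytes/s
    intensity:  arithmetic intensity of the operation in FLOP/byte
    """
    return min(peak_flops, bandwidth * intensity)


def is_memory_bound(peak_flops: float, bandwidth: float, intensity: float) -> bool:
    """An operation is memory-bound when it sits left of the ridge point."""
    return bandwidth * intensity < peak_flops


# Hypothetical edge device (assumed numbers, not from the paper).
PEAK = 10e12   # 10 TFLOP/s
BW = 100e9     # 100 GB/s
RIDGE = PEAK / BW  # intensity at which compute and bandwidth limits meet

# A low-intensity operation is limited by bandwidth, a high-intensity one by compute.
low_intensity = attainable_flops(PEAK, BW, 2.0)
high_intensity = attainable_flops(PEAK, BW, 500.0)
```

Under these assumed numbers the ridge point is 100 FLOP/byte, so any operation below that intensity gains nothing from extra compute and only benefits from moving fewer bytes.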


2605.26494 2026-05-27 cs.AI cs.CL cs.LG 版本更新

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax-M2系列：小激活释放最大现实智能

MiniMax: Aili Chen, Aonian Li, Baichuan Zhou, Bangwei Gong, Binyang Jiang, Boji Dan, Changqing Yu, Chao Wang, Cheng Ma, Cheng Zhong, Cheng Zhu, Chengjun Xiao, Chengyi Yang, Chengyu Du, Chenyang Zhang, Chi Zhang, Chuangyi Huang, Chunhao Zhang, Chunhui Du, Chunyu Zhao, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dongyu Zhang, Enhui Yang, Fei Yu, Guang Zheng, Guodong Zheng, Guohong Li, Haichao Zhu, Haigang Zhou, Haimo Zhang, Han Ding, Hao Zhang, Haohai Sun, Haolin Lyu, Haonan Lu, Haoyu Wang, Huajie Shi, Huiyang Li, Jiacheng Chen, Jian Zhang, Jiaqi Zhuang, Jiaren Cai, Jiaxin Pan, Jiayao Li, Jiayuan Song, Jichuan Zhang, Jie Wang, Jihao Gu, Jin Zhu, Jingwei Dong, Jingyang Li, Jingyu Zhang, Jingze Zhuang, Jinhao Tian, Jinli Liu, Jinyi Hu, Jun Tao, Jun Zhang, Junbin Ruan, Junhao Xu, Junjie Yan, Junteng Liu, Junxian He, Kang Xu, Ke Ji, Ke Yang, Kecheng Xiao, Keyu Duan, Keyu Li, Le Han, Letian Ruan, Li Yuan, Lianfei Yu, Liheng Feng, Lijie Mo, Lin Li, Lingye Bao, Lingyu Yang, Lingyuan Zhou, Loki, Lu Chen, Lunbin Ceng, Ming Li, Ming Zhong, Mingliang Tao, Mingyuan Chi, Mujie Lin, Nan Hu, Ningxin Chen, Peiyin Zhu, Peng Gao, Pengcheng Gao, Pengfei Li, Penglin Li, Pengyu Zhao, Qibin Ren, Qidi Xu, Qihan Ren, Qile Li, Qin Wang, Quanliang Chen, Qunhong Ceng, Rong Tian, Rui Dong, Ruitao Leng, Ruize Zhang, Shanqi Liu, Shaoyu Chen, Sheng Jia, Shun Yao, Shuoran Zhao, Shuqi Yu, Sichen Li, Sicheng Pan, Songquan Zhu, Tengfei Li, Tian Xie, Tiancheng Qin, Tianrun Liang, Wei Liu, Weiqi Xu, Weitao Li, Weixiang Chen, Weiyu Cheng, Weiyu Zhang, Wenhu Chen, Wenqian Zhao, Xiancai Chen, Xiangjun Song, Xiangyuan Wang, Xiao Luo, Xiao Su, Xiaobo Li, Xiaodong Han, Xiaojie Wu, Xihao Song, Xingyi Han, Xinyu Guan, Xuan Lu, Xun Zou, Xunhao Lai, Xutong Li, Yan Gong, Yang Wang, Yang Xu, Yangsen Wang, Ye Tang, Yicheng Chen, Yinran Qiu, Yiqi Shi, Yiting Guo, Yiwen Huang, Yixuan Wang, Yongyi Hu, Yu Gao, Yu Zhang, Yuanxiang Ying, Yuanzhen Zhang, Yubo Wang, Yuchen Song, Yufeng Yang, Yuhang Meng, Yuhang Miao, Yuhao Li, Yujie Liu, Yulin Hu, Yunan Huang, Yunji Li, Yunyi Huang, Yusen Zhang, Yusu Hong, Yutao Xie, Yutong Zhang, Yuwen Liao, Yuxuan Shi, Yuze Wenren, Zebin Li, Zehan Li, Zejian Luo, Zeyu Jin, Zeyuan Sun, Zhanpeng Zhou, Zhaochen Su, Zhendong Li, Zhengmao Zhu, Zhengyuan Peng, Zhenhua Fan, Zhi Zhang, Zhichao Xu, Zhiheng Lv, Zhikang Xu, Zhitao He, Zhiwei He, Zhongyuan Li, Zibo Gao, Zijia Wu, Zijian Song, Zijian Zhou, Zijun Sun, Zishan Huang, Ziying Chen, Ziyue Ge

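Affiliations: MiniMax

AI summary: Presents the MiniMax-M2 series of Mixture-of-Experts language models, which reach frontier performance with a small number of activated parameters. Core components include an agent-driven data pipeline, the scalable reinforcement learning system Forge, and the self-evolving checkpoint M2.7.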

Comments Technical Report. 35 pages, 10 figures, 4 tables


AI中文摘要

BioFact-MoE：基于生物学因子分解的混合专家模型用于肝细胞癌的视觉-语言预后建模

Junlin Yang, Tian Yu, Nicha C. Dvornek, Yuexi Du, Peiyu Duan, Annabella Shewarega, Lawrence H. Staib, James S. Duncan, Julius Chapiro

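Affiliations: Department of Radiology & Biomedical Imaging, Department of Biomedical Engineering, Department of Electrical Engineering, Department of Statistics & Data Science, Yale University, New Haven, CT, 06510, USA

AI summary: Proposes BioFact-MoE, a framework that explicitly decomposes liver and tumor factors through biologically supervised Mixture-of-Experts, improving both accuracy and biological interpretability in hepatocellular carcinoma prognosis.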

Comments Early accepted at MICCAI 2026


AI中文摘要

肝细胞癌（HCC）具有生物学异质性，由肝功能储备和肿瘤相关肿瘤学因素之间的相互作用塑造；因此，相似的生存结果可能反映根本不同的潜在生物学过程。HCC的预后建模依赖于来自多参数MRI和常规临床实践放射学报告的丰富多模态信息。现有的预后视觉-语言模型（VLM）学习单一的纠缠潜在表示，混合了肝脏和肿瘤相关因素，限制了准确性和生物学可解释性。我们提出BioFact-MoE，一个生物学因子分解的混合专家（MoE）框架，通过残差MoE生存架构中的生物学监督专家显式分解肝脏和肿瘤因素。在N=588名患者的HCC队列（在4,582个3D MRI图像-报告对上预训练）中，BioFact-MoE在所有时间范围内持续优于所有基线的生存预测，实现了12、18和24个月的AUC分别为75.33%、75.85%和73.96%。除了标量风险预测，门控专家权重实现了表型感知的风险分层。通路感知的门控揭示了临床上有意义的治疗相关生存异质性。在保留验证中，肝脏和肿瘤嵌入分别与肝功能标志物和肿瘤负荷标志物显示出选择性关联（p<0.05），无需监督。代码可在https://github.com/jy-639/BioFact-MoE获取。

英文摘要

Hepatocellular carcinoma (HCC) is biologically heterogeneous, shaped by the interplay between hepatic functional reserve and tumor-related oncologic factors; thus, similar survival outcomes may reflect fundamentally different underlying biological processes. Prognostic modeling in HCC is informed by rich multimodal information from multiparametric MRI and radiology reports from routine clinical practice. Existing prognostic vision-language models (VLMs) learn a single entangled latent representation that blends hepatic and tumor-related factors, limiting both accuracy and biological interpretability. We present BioFact-MoE, a biologically factorized Mixture of Experts (MoE) framework that explicitly decomposes liver and tumor factors via biologically supervised experts within a residual MoE survival architecture. On a HCC cohort of N=588 patients (pretrained on 4,582 3D MRI image-report pairs), BioFact-MoE consistently improves survival prediction over all baselines across time horizons, achieving 12-, 18-, and 24-month AUCs of 75.33%, 75.85%, and 73.96%. Beyond scalar risk prediction, gated expert weights enable phenotype-aware risk stratification. Pathway-informed gating uncovers clinically meaningful treatment-associated survival heterogeneity. In held-out validation, hepatic and tumor embeddings show selective associations with liver function and tumor burden markers, respectively (p<0.05), without supervision. The code is available at https://github.com/jy-639/BioFact-MoE.


2605.26373 2026-05-27 cs.LG math.OC stat.ML 版本更新

Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

通过算法等价性在隐凸损失上的在线学习：最优遗憾、几何障碍与Bandit反馈

Anas Barakat, Andreas Kontogiannis, Vasilis Pollatos, Ioannis Panageas, Antonios Varvitsiotis

Affiliations: Singapore University of Technology and Design; National Technical University of Athens; National and Kapodistrian University of Athens; University of California, Irvine; Archimedes, Athena Research Center, Greece; National University of Singapore, Centre for Quantum Technologies

AI summary: Using a sharper discrete-time algorithmic equivalence argument, proves that online gradient descent attains the optimal $\mathcal{O}(\sqrt{T})$ regret on hidden-convex losses, clarifies the geometric condition this requires, and extends the analysis to one-point bandit feedback with $\mathcal{O}(T^{3/4})$ expected regret.

Comments 43 pages


AI中文摘要

我们研究具有隐凸损失的对抗性在线学习，即经过非线性重参数化后变为凸的非凸损失。Ghai, Lu和Hazan (2022)证明，在几何和光滑性假设下，此类非凸损失上的在线梯度下降(OGD)近似模拟了具有适当正则化器的底层凸损失上的在线镜像下降(OMD)，得到$\mathcal{O}(T^{2/3})$遗憾。他们留下了是否可以在隐凸设置中恢复在线凸优化的最优$\Theta(\sqrt{T})$遗憾的开放问题。我们肯定地回答了这个问题。更具体地，通过更尖锐的离散时间算法等价性论证，我们证明在相同假设下OGD达到$\mathcal{O}(\sqrt{T})$遗憾，匹配对抗性在线凸优化的最坏情况最优速率。我们还解决了Ghai, Lu和Hazan (2022)的另一个开放问题，澄清了这种算法等价性所需的几何条件。我们将对角雅可比充分条件替换为必要且充分的Hessian相容性条件，从而扩展了可允许重参数化的类别。我们用下界补充了紧的遗憾界，表明Hessian相容性假设对OGD是必要的；当该条件不成立时，我们构造一个光滑的重参数化和一个对抗性的隐凸损失序列，使得OGD遭受$\Omega(T)$遗憾。最后，我们将分析扩展到单点Bandit反馈，并证明使用球形平滑的Bandit OGD的$\mathcal{O}(T^{3/4})$期望遗憾界，匹配其在凸损失上的经典速率。

英文摘要

We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $\Theta(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $\Omega(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove a $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.
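For readers unfamiliar with the $\sqrt{T}$ rate that the abstract refers to, the sketch below runs projected OGD on ordinary convex losses and compares the measured regret with the textbook bound $\tfrac{3}{2} G D \sqrt{T}$ for step sizes $\eta_t = D / (G\sqrt{t})$. It illustrates only the classical convex baseline, not the paper's hidden-convex analysis; the quadratic losses, the interval $[-1, 1]$, and all function names are assumptions made for the example.

```python
import math
import random


def run_ogd(targets, diameter=2.0, grad_bound=4.0):
    """Projected OGD on f_t(x) = (x - z_t)^2 over [-1, 1]; returns cumulative loss."""
    x, total = 0.0, 0.0
    for t, z in enumerate(targets, start=1):
        total += (x - z) ** 2                 # suffer the loss before updating
        grad = 2.0 * (x - z)                  # |grad| <= 4 on [-1, 1]
        eta = diameter / (grad_bound * math.sqrt(t))
        x = min(1.0, max(-1.0, x - eta * grad))  # gradient step, then projection
    return total


def best_fixed_loss(targets):
    """Loss of the best fixed point in hindsight (the mean, which lies in [-1, 1])."""
    mean = sum(targets) / len(targets)
    return sum((mean - z) ** 2 for z in targets)


rng = random.Random(0)
T = 2000
targets = [rng.choice([-1.0, 1.0]) for _ in range(T)]

regret = run_ogd(targets) - best_fixed_loss(targets)
bound = 1.5 * 4.0 * 2.0 * math.sqrt(T)  # (3/2) * G * D * sqrt(T)
```

The comparison `regret <= bound` holds for any target sequence in $[-1, 1]$, which is the worst-case guarantee the paper extends to the hidden-convex setting.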


2605.26355 2026-05-27 cs.LG cs.CL eess.SP 版本更新

Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention

能量门控注意力与小波位置编码：Transformer注意力的互补归纳偏置

Athanasios Zeris

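Affiliations: Independent Researcher, Athens, Greece

AI summary: Standard attention lacks two complementary inductive biases, energy salience and scale-selective locality. The paper proposes Energy-Gated Attention (EGA) and Morlet Positional Encoding (MoPE), whose combination yields a superadditive improvement on character-level language modeling.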

Comments 10 pages, 1 figure, 3 tables. Part 2 of a five-paper series on spectral methods in transformer attention. Code: https://github.com/AthanasiosZeris/energy-gated-attention


AI中文摘要

标准Transformer注意力计算成对标记相似性，但将所有标记视为同等显著、所有位置视为同等局部，忽略了输入的信息结构。我们识别出标准注意力缺乏两种互补归纳偏置：能量显著性（哪些标记集中了信息能量，通过端到端学习而不需要显式频率分解）和尺度选择性局部性（在每个频率上位置影响的范围，通过Morlet小波编码实现）。我们通过两个简单组件解决这两个问题。能量门控注意力（EGA）通过键标记嵌入的学习能量估计（通过单个线性投影计算）来门控值聚合；它选择关注什么。莫雷特位置编码（MoPE）用学习的高斯窗口小波替换固定的正弦编码，使联合位置-频率定位适应语料库；它指定注意力在每个尺度上操作的位置。在TinyShakespeare上，单独EGA相比标准注意力实现+0.092验证损失改进（相比Phase 1-3基线+0.103）；单独MoPE为-0.032（作为独立编码低于基线）；但它们的组合实现+0.119——超过各部分之和。这种超加性在两个独立训练运行中观察到，是核心实证发现：显著性和局部性是互补归纳偏置，各自填补对方无法单独填补的空白。消融实验证实，结构化谱先验（Morlet小波门控、尺度初始化头、固定正弦PE）始终不如其无约束学习对应物，而互补学习组件交互产生超加性。所有实验都在小规模（≤6M参数、字符级基准、单种子）进行；更大规模的多种子验证是未来工作最重要的方向。

英文摘要

Standard transformer attention computes pairwise token similarity but treats all tokens as equally salient and all positions as equally local, regardless of the informational structure of the input. We identify two complementary inductive biases that standard attention lacks: energy salience (which tokens concentrate informational energy, learned end-to-end without explicit frequency decomposition) and scale-selective locality (how far positional influence extends at each frequency, implemented via Morlet wavelet encoding). We address both with two simple components. Energy-Gated Attention (EGA) gates value aggregation by a learned energy estimate of key token embeddings, computed via a single linear projection; it selects what to attend to. Morlet Positional Encoding (MoPE) replaces fixed sinusoidal encodings with learned Gaussian-windowed wavelets that adapt the joint position-frequency localization to the corpus; it specifies where attention operates at each scale. On TinyShakespeare, EGA alone achieves +0.092 validation loss improvement over standard attention (+0.103 over Phase 1-3 baseline); MoPE alone is -0.032 (below baseline as a standalone encoding); but their combination achieves +0.119 -- more than the sum of parts. This superadditivity, observed across two independent training runs, is the central empirical finding: salience and locality are complementary inductive biases, each addressing a gap the other cannot fill alone. Ablations confirm that structured spectral priors (Morlet wavelet gates, scale-initialized heads, fixed sinusoidal PE) consistently underperform their unconstrained learned counterparts, while complementary learned components interact superadditively. All experiments are at small scale (<=6M parameters, character-level benchmarks, single seed); larger-scale multi-seed validation is the most important direction for future work.
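One way to read the EGA description is sketched below: a scalar gate per key token, obtained from a single linear projection of its embedding, scales that token's value vector before the usual softmax-weighted aggregation. This is a guess at the mechanism, not the author's implementation; the sigmoid squashing, the single-head layout, and the names `energy_gated_attention` and `w_energy` are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy_gated_attention(queries, keys, values, embeddings, w_energy):
    """Single-head attention whose value vectors are scaled by a per-key gate."""
    d = len(keys[0])
    # One scalar gate per key token from a single linear projection (assumed form).
    gates = [sigmoid(dot(e, w_energy)) for e in embeddings]
    gated_values = [[g * v for v in row] for g, row in zip(gates, values)]
    outputs = []
    for q in queries:
        # Standard scaled dot-product attention weights.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        outputs.append([
            sum(w * gv[j] for w, gv in zip(weights, gated_values))
            for j in range(len(values[0]))
        ])
    return outputs, gates
```

With a zero projection vector every gate equals 0.5, so the output is half of plain attention; training would move gates toward 1 for tokens judged salient. On the reported numbers, the individual gains sum to 0.092 + (-0.032) = 0.060, below the combined 0.119, which is the superadditivity the abstract highlights.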


2605.26353 2026-05-27 cs.CV cs.AI cs.LG 版本更新

Personalized Generative Models for Contextual Debiasing

用于上下文去偏的个性化生成模型

Xinran Liang, Esin Tureci, Prachi Sinha, Ye Zhu, Vikram V. Ramaswamy, Olga Russakovsky

Affiliations: Department of Computer Science, Princeton University; LIX, CNRS, École Polytechnique

AI summary: Proposes DecoupleGen, which uses personalized text-to-image diffusion models to generate rare-context images as training augmentation, mitigating contextual bias in visual recognition.

Comments CVPR 2026 Workshop on Synthetic Data for Computer Vision and Generative Models for Computer Vision. Code available at https://github.com/princetonvisualai/DecoupleGen


AI中文摘要

不同的视觉模式在世界中出现的频率不同：例如，沙滩球出现在沙滩上比出现在道路上更常见。这些统计数据反映在视觉数据集中，因此训练好的模型更容易在常见场景中识别物体。然而，在道路上识别沙滩球可能比在沙滩上识别更重要。我们研究如何缓解这种差异。由于在现实世界中收集不常见的图像可能很困难，我们探索生成具有较少频繁上下文的图像是否可以作为有效的训练增强。一个关键挑战是引导生成保持在原始数据集分布附近，同时创建具有不常见上下文的多样化图像。我们引入了DecoupleGen方法，该方法个性化文本到图像扩散模型，以促进罕见上下文图像的连贯合成，同时保留原始视觉细节。生成的图像包含语义上有意义的内容，并在视觉上与原始数据集保持一致。我们进一步应用验证约束以确保增强数据的相关性。我们在复杂场景数据集上的物体分类和识别任务中评估了我们的方法。实验表明，我们的方法比先前的方法有一致的改进，并且我们的分析确定了这些改进背后的因素。

英文摘要

Different visual patterns appear with different frequencies in the world: e.g., beach balls appear on sand more often than they do on a road. These statistics are reflected in vision datasets, and as a result trained models more easily recognize objects in common scenarios. However, recognizing a beach ball on a road may arguably be even more important than recognizing it on sand. We study how to mitigate this discrepancy. Since collecting uncommon images in the real world may be difficult, we explore whether generating images with less frequent contexts can serve as effective training augmentation. A key challenge is guiding generations to remain close to the original dataset distribution while creating diverse images with uncommon contexts. We introduce Decoupling Contextual Patterns with Generations (DecoupleGen), a method that personalizes text-to-image diffusion models to facilitate coherent synthesis of images with rare contexts while preserving original visual details. The generated images contain semantically meaningful content and remain visually aligned with the original datasets. We further apply verification constraints to ensure relevance of the augmented data. We evaluate our approach on object classification and recognition tasks on complex scene datasets. Our experiments demonstrate consistent improvements over previous approaches, and our analyses identify factors underlying these improvements.


2605.26350 2026-05-27 cs.LG cs.AI 版本更新

When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning

当正确示例有害时：重新思考示例在上下文学习中的作用

Chenghao Qiu, Chunli Peng, Yufeng Yang, Kuan-Hao Huang, Yi Zhou

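Affiliations: Texas A&M University

AI summary: By introducing task-preserving perturbations, the paper reveals the counterintuitive finding that correct demonstrations are not necessarily helpful and can even reduce in-context learning accuracy, and it proposes contextual evidence shift to explain the gap between correctness and utility.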


AI中文摘要

上下文学习（ICL）通常被直觉所驱动，即示例之所以有帮助是因为它们提供了正确的输入-输出对。然而，我们揭示了一个反直觉的现象：正确性并不能保证示例的效用，一些正确的示例甚至可能降低ICL的准确性。为了研究这种正确性-效用差距，我们引入了任务保持扰动，其中仅改变示例输入，而该示例仍然是同一任务的正确实例。具体来说，每个扰动后的示例被赋予由任务映射诱导的目标。该框架涵盖了标签更新扰动（其中任务相关语义发生变化且目标被重新计算）和更严格的目标保持扰动（其中原始目标仍然有效）。我们将由此产生的失败模式形式化为上下文证据转移：任务保持扰动可以改变模型用于上下文推理的有效证据混合，从而将示例正确性与示例效用分离。在情感分类、逻辑推理和数学应用题中，我们发现任务保持扰动的示例会显著降低ICL性能，尤其是对于较小的模型、较难的任务和较高的扰动比例。我们的结果表明，鲁棒的ICL不仅需要评估示例是否正确，还需要评估它们如何影响上下文推理。代码可在 https://github.com/Chenghao-Qiu/Task-Preserving-ICL 获取。

英文摘要

In-context learning (ICL) is often motivated by the intuition that demonstrations help because they provide correct input-output examples. However, we reveal a counterintuitive phenomenon: correctness does not guarantee exemplar utility, and some correct demonstrations can even reduce ICL accuracy. To study this correctness-utility gap, we introduce task-preserving perturbations, where only the exemplar input is changed, while the example remains a correct instance of the same task. Concretely, each perturbed exemplar is assigned the target induced by the task mapping. This framework covers both label-updating perturbations, where task-relevant semantics change and targets are recomputed, and stricter target-preserving perturbations, where the original target remains valid. We formalize the resulting failure mode as contextual evidence shift: task-preserving perturbations can change the effective mixture of evidence used by the model for contextual inference, thereby separating exemplar correctness from exemplar utility. Across sentiment classification, logical reasoning, and math word problems, we find that task-preserving perturbed demonstrations can substantially degrade ICL performance, especially for smaller models, harder tasks, and higher perturbation ratios. Our results show that robust ICL requires evaluating not only whether demonstrations are correct, but also how they influence contextual inference. Code is available at https://github.com/Chenghao-Qiu/Task-Preserving-ICL.


2605.26343 2026-05-27 cs.LG 版本更新

MechRL: Reinforcement Learning Agents Perform Circuit Discovery for Mechanistic Interpretability

MechRL：强化学习智能体进行电路发现以实现机械可解释性

Barsat Khadka

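Affiliations: The University of Southern Mississippi

AI summary: Recasts circuit discovery as a reinforcement learning problem. A PPO policy acts over the 144 attention heads of GPT-2 small, using zero-ablation and a contrastive reward, and recovers the canonical circuits on both the training tasks and a held-out task, supporting reinforcement learning as a viable approach to mechanistic interpretability.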


AI中文摘要

机械可解释性已经识别出在Transformer语言模型中实现特定行为的小型注意力头集合，但恢复这些电路通常需要为每个新任务定制分析流程。我们将电路发现重新定义为强化学习问题。一个智能体在GPT-2 small的144个注意力头上操作，作为离散动作空间；每个动作触发零消融和对比奖励，该奖励从消融对目标任务的损害中减去其对通用下一个词预测的损害。一个在向量化多任务环境中训练于两个任务（归纳和IOI）的单一PPO策略，在两个训练任务以及一个保留的第三个任务（文档字符串补全）上均达到每轮最优。其偏好的头与现有文献中规范的头一致，恰好符合这些论文在单头消融下识别为因果非冗余的轴；它们识别为冗余的类别被智能体正确降级。在保留任务上，最佳五次规划在评估时未提供任务信号的情况下恢复了最优上限的96%。这些结果表明，基于因果干预的强化学习是识别机械电路单头瓶颈的可行且可迁移的方法，与现有的路径修补方法互补。

英文摘要

Mechanistic interpretability has identified small sets of attention heads that implement specific behaviours in transformer language models, but recovering these circuits typically requires a bespoke analytical pipeline for each new task. We recast circuit discovery as a reinforcement-learning problem. An agent operates over the 144 attention heads of GPT-2 small as a discrete action space; each action triggers a zero-ablation and a contrastive reward that subtracts the ablation's damage to general next-token prediction from its damage to the target task. A single PPO policy, trained on two tasks (induction and IOI) in a vectorised multi-task environment, attains the per-episode oracle on both training tasks and on a held-out third task (docstring completion). Its preferred heads coincide with the canonical heads of established literature on precisely the axes those papers identify as causally non-redundant under single-head ablation; the categories they identify as redundant are correctly de-prioritised by the agent. On the held-out task, best-of-five planning recovers 96% of the oracle ceiling with no task signal supplied at evaluation. These results indicate that reinforcement learning over causal interventions is a viable, transferable substrate for identifying the single-head bottlenecks of mechanistic circuits, complementary to existing path-patching approaches.
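The reward described above can be written down in a few lines. The sketch below is one reading of the abstract, not the author's code: it assumes "damage" is measured as the increase in loss after zero-ablating a head, and the function names and the flat action indexing are hypothetical. The only fixed fact used is that GPT-2 small has 12 layers of 12 heads, giving the 144 actions.

```python
N_LAYERS, N_HEADS = 12, 12  # GPT-2 small: 12 x 12 = 144 attention heads


def action_to_head(action: int) -> tuple:
    """Map a flat action index in [0, 144) to a (layer, head) pair (assumed layout)."""
    if not 0 <= action < N_LAYERS * N_HEADS:
        raise ValueError("action out of range")
    return divmod(action, N_HEADS)


def contrastive_reward(task_loss_clean: float, task_loss_ablated: float,
                       lm_loss_clean: float, lm_loss_ablated: float) -> float:
    """Damage to the target task minus damage to general next-token prediction."""
    task_damage = task_loss_ablated - task_loss_clean
    general_damage = lm_loss_ablated - lm_loss_clean
    return task_damage - general_damage
```

A head that hurts the target task a lot while barely affecting general language modeling receives a high reward, whereas a head that is important for everything is penalized by the subtraction.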


2605.26341 2026-05-27 cs.LG stat.ML 版本更新

A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning

物理信息机器学习的泛化性的PAC-Bayesian视角

Thien V. Nguyen, Amaury Habrard, Benjamin Guedj

Affiliations: Université Jean Monnet Saint-Étienne, CNRS, Institut d’Optique Graduate School, Laboratoire Hubert Curien UMR 5516; Inria and University College London, France and United Kingdom

AI summary: Within a PAC-Bayesian framework, derives high-probability generalisation bounds for physics-informed machine learning in regression with unbounded losses, proposes a self-bounding-aware learning algorithm, and shows on standard PDE benchmarks that the bounds are non-vacuous and tighter than union-bound baselines.


AI中文摘要

物理信息机器学习（PIML）将机械知识（通常以偏微分方程（PDE）的形式）整合到数据驱动模型中。尽管经验性能强劲，但其统计泛化性质仍未被充分理解，尤其是在具有无界损失的回归设置中。现有分析依赖于近似或稳定性论证，未能完全捕捉物理结构如何影响有限数据的泛化。在这项工作中，我们为PIML开发了一个PAC-Bayesian框架，在存在无界损失的情况下提供高概率泛化保证。我们采用多任务视角，联合处理数据保真度、PDE残差、初始条件和边界条件，避免了标准联合界方法导致的松散性。我们的分析利用物理信息目标的结构，推导出新的界，其中复杂度与损失的输入梯度范数成比例，揭示了物理正则性与泛化之间的直接联系。我们在Sobolev和Poincaré型假设下实例化该框架，得到两类界，在不同机制下权衡统计复杂性和光滑性。基于这些结果，我们提出了一种自界感知学习算法，直接优化推导界的可处理代理，以及一种在实际设置中估计相关常数的实用程序。在标准PDE基准上的实证评估表明，我们的界是非平凡的，显著比联合界基线更紧，并且可以在训练过程中有效最小化。总体而言，我们的结果为物理信息模型的泛化提供了原则性的统计基础。

英文摘要

Physics-informed machine learning (PIML) integrates mechanistic knowledge, typically in the form of partial differential equations (PDE), into data-driven models. Despite strong empirical performance, its statistical generalisation properties remain poorly understood, particularly in the regression setting with unbounded losses. Existing analyses rely on approximation or stability arguments and do not fully capture how physical structure influences generalisation from finite data. In this work, we develop a PAC-Bayesian framework for PIML that provides high-probability generalisation guarantees in the presence of unbounded losses. We adopt a multi-task perspective that jointly treats data fidelity, PDE residuals, initial and boundary conditions, avoiding the looseness induced by standard union-bound approaches. Our analysis leverages the structure of physics-informed objectives to derive novel bounds where the complexity scales with input-gradient norms of the losses, revealing a direct link between physical regularity and generalisation. We instantiate this framework under Sobolev and Poincaré-type assumptions, yielding two classes of bounds that trade off statistical complexity and smoothness in different regimes. Building on these results, we propose a self-bounding-aware learning algorithm that directly optimises tractable surrogates of the derived bounds, along with a practical procedure to estimate the associated constants in realistic settings. Empirical evaluations on standard PDE benchmarks demonstrate that our bounds are non-vacuous, significantly tighter than union-bound baselines, and can be effectively minimised during training. Overall, our results provide a principled statistical foundation for the generalisation of physics-informed models.
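The four terms that the abstract treats jointly can be pictured with a generic physics-informed objective. The expression below is a common textbook form, given only as orientation; the squared-error choice, the weights $\lambda$, the sample counts $n_d$ and $n_r$, and the differential operator $\mathcal{N}$ are assumptions and need not match the paper's formulation.

```latex
\mathcal{L}(\theta)
  = \lambda_{\mathrm{data}}\,\frac{1}{n_d}\sum_{i=1}^{n_d}\bigl(u_\theta(x_i) - y_i\bigr)^2
  + \lambda_{\mathrm{pde}}\,\frac{1}{n_r}\sum_{j=1}^{n_r}\bigl(\mathcal{N}[u_\theta](x_j)\bigr)^2
  + \lambda_{\mathrm{ic}}\,\mathcal{L}_{\mathrm{ic}}(\theta)
  + \lambda_{\mathrm{bc}}\,\mathcal{L}_{\mathrm{bc}}(\theta)
```

Here $u_\theta$ is the model, the first term measures data fidelity, the second the PDE residual at collocation points, and the last two the initial and boundary conditions. Bounding each term separately and combining the results with a union bound is the baseline the abstract describes as loose.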


2605.26339 2026-05-27 cs.LG cs.CL 版本更新

QAM-W: Joint 2D Codebook Quantization for LLM Weights via Hadamard Rotation and Activation-Aware Scaling

QAM-W: 通过哈达玛旋转和激活感知缩放实现LLM权重的联合2D码本量化

Preetam Sharma, Kacper Dobek

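Affiliations: Independent Research; Institute of Computing Science, Poznan University of Technology

AI summary: Proposes QAM-W, which combines L2 normalization, block-Hadamard rotation, and quantization of paired 2D coordinates with activation-aware scaling. At about 5.5 bpw, perplexity stays close to BF16; joint 2D coding outperforms polar coding, and quality is preserved in the 5-6 bpw band.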


AI中文摘要

标量后训练量化器丢弃了权重行内的成对坐标结构。我们引入QAM-W（权重正交幅度调制），一种恢复该结构的编解码器：每行经过L2归一化、块哈达玛旋转、配对为2D坐标，并针对在单位圆高斯上训练的单个Lloyd-Max码本进行量化，同时采用激活感知的每通道缩放。在跨越四个家族（1.1B--13B参数）的五种LLM和八种量化配置的跨模型研究中，激活感知变体在约5.5 bpw下，每个模型的WikiText-2困惑度保持在BF16的±0.4%以内，以少32%的权重比特匹配SmoothQuant W8A8质量包络。联合2D编码在相同比特率下，在ΔPPL上优于极坐标（幅度×相位）编码2--15个百分点，且与BF16的配对KL散度在37个（方法，模型）行上以Spearman ρ=0.99跟踪ΔPPL%，与从编解码器失真到KL散度的单调复合界一致。3.5 bpw变体在量化容忍架构上具有竞争力。在严格的4 bpw下，旋转码本前沿方法QTIP优于QAM-W；贡献在于质量保持的5--6 bpw波段。

英文摘要

Scalar post-training quantizers discard pairwise coordinate structure within weight rows. We introduce QAM-W (Quadrature Amplitude Modulation for Weights), a codec that recovers this structure: each row is L2-normalized, block-Hadamard rotated, paired into 2D coordinates, and quantized against a single Lloyd-Max codebook trained on the unit circular Gaussian, with activation-aware per-channel scaling. In a cross-model study spanning five LLMs from four families (1.1B--13B parameters) and eight quantized configurations, the activation-aware variant at $\approx 5.5$ bpw stays within $\pm 0.4\%$ of BF16 WikiText-2 perplexity on every model, matching the SmoothQuant W8A8 quality envelope at $32\%$ fewer weight bits. Joint 2D coding outperforms polar (amplitude $\times$ phase) coding by 2--15 pp $\Delta$PPL at equal bitrate, and paired KL against BF16 tracks $\Delta$PPL% at Spearman $\rho = 0.99$ across 37 (method, model) rows, consistent with a monotone composite bound from codec distortion to KL divergence. A 3.5 bpw variant is competitive on quantization-tolerant architectures. At strict 4 bpw, the rotated-codebook frontier method QTIP outperforms QAM-W; the contribution is the quality-preserving 5--6 bpw band.
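The encoding steps listed in the abstract (normalize, rotate, pair, quantize) can be sketched in plain Python. This is an illustration under several assumptions, not the authors' codec: the codebook is a hypothetical 4 x 4 grid rather than a trained Lloyd-Max codebook, activation-aware scaling is omitted, and the row length is assumed to be a multiple of the block size, which must be a power of two.

```python
import math


def hadamard(n):
    """Orthonormal Hadamard matrix via the Sylvester construction (n a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[scale * x for x in row] for row in h]


def encode_row(row, codebook, block=4):
    """Return (row norm, codebook index per coordinate pair) for one weight row."""
    norm = math.sqrt(sum(x * x for x in row)) or 1.0   # L2 normalization
    unit = [x / norm for x in row]
    h = hadamard(block)
    rotated = []
    for start in range(0, len(unit), block):            # block-Hadamard rotation
        chunk = unit[start:start + block]
        rotated.extend(sum(h[i][j] * chunk[j] for j in range(block))
                       for i in range(block))
    pairs = list(zip(rotated[0::2], rotated[1::2]))      # pair into 2D coordinates
    indices = [
        min(range(len(codebook)),
            key=lambda k: (p[0] - codebook[k][0]) ** 2 + (p[1] - codebook[k][1]) ** 2)
        for p in pairs                                   # nearest 2D codeword
    ]
    return norm, indices


# Hypothetical 16-point grid codebook, used here only as a placeholder.
LEVELS = (-0.75, -0.25, 0.25, 0.75)
CODEBOOK = [(a, b) for a in LEVELS for b in LEVELS]

norm, indices = encode_row([3.0, 4.0, 0.0, 0.0], CODEBOOK)
```

In this toy setting a 16-entry codebook spends 4 bits per pair, that is 2 bits per weight before the per-row norm is counted. The rotation spreads the two nonzero weights across all four coordinates, which is the usual motivation for applying a Hadamard transform before quantization.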


2605.26327 2026-05-27 cs.LG 版本更新

Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage

重新参数化Shampoo和SOAP用于子空间基更新和BFloat16存储

Alan Milligan, Zikun Xu, Simon Lacoste-Julien, Felix Dangel, Wu Lin

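Affiliations: Mila & Université de Montréal Microsoft; Concordia University & Mila; University of Central Florida

AI summary: Reparametrizes the preconditioner so that only part of the basis is updated, via QR decomposition in a subspace, and so that BFloat16 storage is supported. This lowers the compute and memory overhead of Shampoo-based methods and mitigates the degradation caused by low-precision storage.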

Comments Preprint, working in progress


AI中文摘要

基于Shampoo的方法，如KL-Shampoo和SOAP，在训练神经网络中表现出强大的性能，并依赖于QR分解。由于现有的QR实现需要单精度（FP32）算术且计算成本高，当预条件矩阵较大时，这些方法变得时间和内存密集。此外，使用BFloat16（BFP16）存储以减少内存使用会降低基于Shampoo的方法的性能。我们提出了一种预条件器的重新参数化，支持BFP16存储，并通过将更新的基向量与未改变的基向量结合形成完整基。通过在子空间中通过QR分解仅更新部分基，我们的方法减少了计算开销，同时缓解了BFP16存储导致的性能下降。我们的方法广泛适用于使用QR分解的基于Shampoo的方法，包括KL-Shampoo、SOAP和KL-SOAP。特别是，它改善了SOAP和KL-SOAP在BFP16存储下的性能，使KL-SOAP能够匹配或超过KL-Shampoo。总体而言，我们的方法使基于Shampoo的方法更加内存和时间高效。

英文摘要

Shampoo-based methods, such as KL-Shampoo and SOAP, have demonstrated strong performance in training neural networks and rely on QR decomposition. Because existing QR implementations require single-precision (FP32) arithmetic and remain computationally expensive, these methods become time- and memory-intensive when their preconditioning matrices are large. Moreover, using BFloat16 (BFP16) storage to reduce memory usage can degrade the performance of Shampoo-based methods. We propose a reparametrization of the preconditioner that supports BFP16 storage and forms a complete basis by combining updated basis vectors with unchanged ones. By updating only part of the basis through QR decomposition in a subspace, our approach reduces computational overhead while mitigating the performance degradation caused by BFP16 storage. Our approach applies broadly to Shampoo-based methods that employ QR decomposition, including KL-Shampoo, SOAP, and KL-SOAP. In particular, it improves the performance of SOAP and KL-SOAP under BFP16 storage, enabling KL-SOAP to match or exceed KL-Shampoo. Overall, our approach makes Shampoo-based methods more memory- and time-efficient.


2605.26324 2026-05-27 cs.LG cs.AI cs.NA math.NA 版本更新

Semigroup Consistency as a Diagnostic for Learned Physics Simulators

半群一致性作为学习型物理模拟器的诊断工具

Lennon J. Shikhman

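Affiliations: Georgia Institute of Technology

AI summary: Proposes the normalized semigroup error as a diagnostic of how consistently learned physics simulators compose in time and roll out over long horizons. Experiments on heat-conduction and Burgers dynamics show that it correlates positively with rollout degradation.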

Comments 10 pages, 3 figures, 3 tables. Accepted to the AI4Physics Workshop at the 43rd International Conference on Machine Learning
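The idea behind a semigroup diagnostic can be shown on a single Fourier mode of the heat equation $u_t = u_{xx}$. The entry above gives no formula, so the definition used here is an assumption: the error compares one step of length $2\,\Delta t$ with two composed steps of length $\Delta t$, normalized by the size of the direct step. An explicit Euler step stands in for an imperfect learned simulator; all names are hypothetical.

```python
import math


def exact_heat_step(amplitude, wavenumber, dt):
    """Exact evolution of one Fourier mode of u_t = u_xx over time dt."""
    return amplitude * math.exp(-wavenumber ** 2 * dt)


def euler_heat_step(amplitude, wavenumber, dt):
    """Stand-in for an imperfect learned simulator: one explicit Euler step."""
    return amplitude * (1.0 - wavenumber ** 2 * dt)


def normalized_semigroup_error(step, amplitude, wavenumber, dt):
    """|step(2 dt) - step(dt) applied twice| / |step(2 dt)| (assumed definition)."""
    direct = step(amplitude, wavenumber, 2.0 * dt)
    composed = step(step(amplitude, wavenumber, dt), wavenumber, dt)
    return abs(direct - composed) / abs(direct)


exact_err = normalized_semigroup_error(exact_heat_step, 1.0, 1.0, 0.1)
euler_err = normalized_semigroup_error(euler_heat_step, 1.0, 1.0, 0.1)
```

The exact propagator composes consistently up to rounding, while the Euler stand-in gives 0.8 for the direct step against 0.81 for the composed one, a relative mismatch of 0.0125. The diagnostic needs no ground-truth trajectory, only the simulator itself.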

2605.26320 2026-05-27 cs.LG cs.CL 版本更新

MULTISEISMO: A Multimodal Seismic Dataset and Model for Cross-Modal Seismic Understanding

MULTISEISMO: 面向跨模态地震理解的多模态地震数据集与模型

Sai Munikoti, Ian Stewart, Chengping Chai, Lisa Linville, Scott Vasquez, Sameera Horawalavithana, Karl Pazdernik

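Affiliations: Pacific Northwest National Laboratory; Oak Ridge National Laboratory; Sandia National Laboratory; North Carolina State University

AI summary: To address the lack of multimodal data integration in seismology, builds MultiSeismo, a structured multimodal dataset of more than 16K seismic events, and develops the dedicated multimodal model SeisModal, which achieves superior performance on cross-modal seismic reasoning tasks.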


AI中文摘要

通用多模态模型（GMMs）在专业科学领域的应用仍然有限，原因是缺乏整合文本和图像之外多种数据模态的综合性领域特定数据集。在地震学中，理解地震现象需要综合时间序列波形数据、地理图像和上下文元数据，而现有地震数据集缺乏这种多模态整合。我们提出了MultiSeismo，一个大规模结构化多模态地震数据集，包含跨越13年（2010年至2023年）来自不同地理区域的超过1.6万次地震事件。每个事件数据整合了全球台网波形记录、烈度图、人口暴露可视化以及标准JSON格式的全面文本描述。此外，我们开发了MISCE，一个基于原始数据的多模态指令集，用于对GMMs进行监督训练和评估，涵盖从基本信息检索到复杂跨模态分析的地震推理任务。我们利用MISCE微调了一个现有的多模态模型（Unified IO 2），并增强了专门的时间序列编码器，从而得到了SeisModal——首个用于综合地震分析的领域特定多模态模型。在MultiSeismo上对最先进的多模态模型进行评估，揭示了显著挑战，特别是通用模型在处理时间序列数据方面的困难，同时证明了SeisModal在地震多模态推理任务上的优越性能。这些结果证明，MultiSeismo为未来地震学多模态研究提供了严格的基准，并验证了我们领域特定架构调整的成功。

英文摘要

The application of generalist multimodal models (GMMs) to specialized scientific domains remains limited due to the scarcity of comprehensive domain-specific datasets that integrate multiple data modalities beyond text and images. In seismology, understanding earthquake phenomena requires the synthesis of time-series waveform data, geographical imagery, and contextual metadata, a multimodal integration absent in existing seismic datasets. We present MultiSeismo, a large-scale structured multimodal seismic dataset comprising over 16K seismic events spanning 13 years (2010 to 2023) across diverse geographical regions. Each event record integrates waveform recordings from global station networks, intensity maps, population exposure visualizations, and a comprehensive textual description within a standardized JSON format. We additionally develop MISCE, a multimodal instruction set built on top of the raw data to enable supervised training and evaluation of GMMs on seismic reasoning tasks ranging from basic information retrieval to complex cross-modal analysis. We leverage MISCE to finetune an existing multimodal model (Unified IO 2) enhanced with a specialized time-series encoder, which yields SeisModal, the first domain-specific multimodal model for comprehensive seismic analysis. Evaluation of state-of-the-art multimodal models on MultiSeismo reveals significant challenges, particularly with time-series data processing for general-purpose models, while demonstrating SeisModal's superior performance on seismic multimodal reasoning tasks. These results prove that MultiSeismo provides a rigorous benchmark for future multimodal research in seismology and validate the success of our domain-specific architectural adaptations.


2605.26315 2026-05-27 cs.LG cs.AI 版本更新

Curriculum Learning for Safety Alignment

用于安全对齐的课程学习

Sandeep Kumar, Virginia Smith, Chhavi Yadav

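Affiliations: Carnegie Mellon University; Simons Institute, UC Berkeley

AI summary: Proposes Staged-Competence, a curriculum-learning framework that uses difficulty-ordered preference data and progressive reference-model updates to make DPO safety alignment more robust. Averaged across three model families, it reduces the OOD harmful response rate by 16% and the jailbreak attack success rate by 20%.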

Comments Accepted at the ICML 2026 GlobalSouthML Workshop


AI中文摘要

直接偏好优化（DPO）广泛用于大型语言模型的安全对齐。然而，先前的工作表明它脆弱且表现出较差的分布外（OOD）泛化能力。在本文中，我们研究课程学习是否能提高基于DPO的安全对齐的鲁棒性。我们提出Staged-Competence，一个基于课程的框架，它按难度组织偏好数据，采用基于能力的采样，并在训练过程中逐步更新参考模型。在三个模型族上平均，Staged-Competence将OOD有害响应率降低16%，越狱攻击成功率降低20%，同时保持接近零的过度拒绝，保留通用能力。我们进一步表明，Staged-Competence（1）仅使用75%的训练数据即可达到基线安全性，（2）在安全与不安全响应之间产生更好的分离。Staged-Competence与策略优化损失无关，并可扩展到其他DPO变体和对齐领域。我们的代码和数据可在https://github.com/Sandeep5500/curriculum-learning-for-safety获取。

英文摘要

Direct Preference Optimisation (DPO) is widely used for safety alignment in large language models. However, prior work shows it is brittle and exhibits poor out-of-distribution (OOD) generalisation. In this paper, we investigate whether Curriculum Learning can improve the robustness of DPO-based safety alignment. We propose Staged-Competence, a curriculum-based framework that organises preference data by difficulty, employs competence-based sampling, and progressively updates the reference model during training. Averaged across three model families, Staged-Competence reduces OOD harmful response rates by 16% and jailbreak attack success rates by 20%, while preserving general capabilities with near-zero over-refusal. We further show that Staged-Competence (1) matches baseline safety with only 75% of the training data and (2) yields better separation between safe and unsafe responses. Staged-Competence is agnostic to the policy optimisation loss and can extend to other DPO variants and alignment domains. Our code and data are available at https://github.com/Sandeep5500/curriculum-learning-for-safety.
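The abstract does not state the competence schedule, so the following is only a minimal sketch of competence-based sampling over difficulty-sorted preference pairs. The square-root schedule, the starting competence `c0`, and the names `competence` and `sample_batch` are assumptions for illustration, not the authors' implementation.

```python
import math
import random

def competence(step, total_steps, c0=0.2):
    """Fraction of the difficulty-sorted data available at `step` (assumed square-root schedule)."""
    t = min(step / total_steps, 1.0)
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) + c0 ** 2))

def sample_batch(pairs_by_difficulty, step, total_steps, batch_size, rng):
    """Sample preference pairs uniformly from the easiest `competence` fraction."""
    c = competence(step, total_steps)
    cutoff = max(1, int(c * len(pairs_by_difficulty)))
    pool = pairs_by_difficulty[:cutoff]
    return [rng.choice(pool) for _ in range(batch_size)]

# Toy data: (pair_id, difficulty), sorted from easy to hard.
pairs = sorted(((i, i / 99) for i in range(100)), key=lambda p: p[1])
rng = random.Random(0)
early = sample_batch(pairs, step=0, total_steps=1000, batch_size=8, rng=rng)
late = sample_batch(pairs, step=1000, total_steps=1000, batch_size=8, rng=rng)
```

Early batches draw only from the easiest fifth of the data, while the final batches can draw from all of it. The progressive reference-model update described in the abstract is not modelled here.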

2605.26289 2026-05-27 cs.LG 版本更新

Stateful Inference for Low-Latency Multi-Agent Tool Calling

面向低延迟多智能体工具调用的有状态推理

Victor Norgren

发表机构 * LayerScale, Inc.（LayerScale公司）

AI总结提出一种有状态推理架构，通过持久化KV缓存和增量处理，将多智能体工具调用的每轮成本从O(n_t)降至O(Δ_t)，在6轮和35轮工作流中分别实现2.1倍和4.2倍的加速。

AI中文摘要

多智能体工具调用正成为基于LLM系统的主要交互模式，但现有推理框架将每次工具调用视为独立请求，从头重新处理整个对话，尽管85-95%的提示与上一轮相同。我们提出一种有状态推理架构，将传统服务的每轮O(n_t)成本转换为仅增量O(Δ_t)成本：持久KV缓存跨轮次存在，仅通过摄入新令牌前进，而基数前缀缓存将其扩展到交错的多智能体流量，提示查找推测解码器加速结构化输出。在针对新颖、完全生成的工作负载的测试中，与vLLM和SGLang相比，参考实现在6轮智能体工作流中每轮快2.1倍，在35轮工作流的中位数轮次中快4.2倍，端到端挂钟时间减半。优势来自有状态重用和推测，而非缓存。

英文摘要

Multi-agent tool calling is becoming the dominant interaction pattern for LLM-based systems, yet existing inference frameworks treat each tool call as an independent request, re-processing the entire conversation from scratch even though 85-95% of the prompt is unchanged from the previous turn. We present a stateful inference architecture that converts the $O(n_t)$ per-turn cost of conventional serving into an $O(\Delta_t)$ delta-only cost: a persistent KV cache lives across turns and advances by ingesting only the new tokens, while a radix prefix cache extends this across interleaved multi-agent traffic and a prompt-lookup speculative decoder accelerates structured output. Against vLLM and SGLang on novel, fully-generated workloads, the reference implementation is $2.1\times$ faster per turn on a 6-turn agentic workflow and $4.2\times$ on the median turn of a 35-turn one, halving end-to-end wall time. The advantage comes from stateful reuse and speculation, not caching.
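To make the $O(n_t)$ versus delta-only contrast concrete, here is a toy token-counting sketch. It only counts prompt tokens ingested per turn under the two regimes; the function name and the workload numbers are hypothetical, and it does not attempt to reproduce the reported speedups.

```python
def tokens_processed(turn_lengths, stateful):
    """Count prompt tokens ingested over a conversation.

    turn_lengths[t] is the number of new tokens (delta_t) appended at turn t.
    A stateless server re-reads the full prompt n_t each turn; a stateful one
    keeps the KV cache and ingests only delta_t.
    """
    total, context = 0, 0
    for delta in turn_lengths:
        context += delta            # n_t = n_{t-1} + delta_t
        total += delta if stateful else context
    return total

# Toy 6-turn workflow: a 2000-token system prompt, then 150 new tokens per turn.
turns = [2000] + [150] * 5
stateless = tokens_processed(turns, stateful=False)   # 14250 tokens
stateful = tokens_processed(turns, stateful=True)     # 2750 tokens
```

In this toy workload the stateless server ingests roughly five times as many prompt tokens, and the gap widens as the number of turns grows.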

2605.26288 2026-05-27 stat.ML cs.LG stat.ME 版本更新

Beyond Differences: Doubly Robust Meta-Learners for Ratio-Based Treatment Effects

超越差异：基于比率的治疗效应的双重稳健元学习器

Michael Fuchs, Dominik Kreiss

发表机构 * Actuarial Department（精算部）

AI总结针对比率型条件平均处理效应（CATE）估计，提出Q-Learner将比率分解为两个优势比的乘积，并推导双重稳健增强版本，在低转化率场景和混杂观测数据中表现优异。

Comments 13+5 pages, 5 figures, 6 tables. Code: https://github.com/michaelfuchs90/ratiobasedcate

AI中文摘要

当治疗效应自然表达为比率时——如在医学、定价和营销中——基于比率的CATE $τ(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ 是合适的估计目标。然而，现有估计器要么施加对数线性参数结构，要么应用通用回归而不对该泛函提供稳健性保证。我们引入了Q-Learner，它将$τ(x)$分解为两个优势比的乘积，将二元结果的比率CATE估计简化为两个倾向性分类任务。我们进一步推导了S/T型和Q型比率学习器的双重稳健增强，并刻画了它们不同的稳健性性质。在七个RCT数据集的基准测试中，Q-Learner在低转化率场景下是最持续有竞争力的方法，其仅基于倾向性的构造规避了伤害基于结果估计器的不平衡回归。在四个观测数据集上，其中倾向性必须估计且混杂无法排除，本文引入的DR学习器明确胜出，使其成为实践者在混杂观测数据中的自然默认选择。

英文摘要

When treatment effects are naturally expressed as ratios -- as in medicine, pricing, and marketing -- the ratio-based CATE $\tau(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ is the appropriate estimand. Yet existing estimators either impose a log-linear parametric structure or apply generic regression without robustness guarantees for this functional. We introduce the Q-Learner, which decomposes $\tau(x)$ into a product of two odds ratios, reducing ratio-CATE estimation for binary outcomes to two propensity classification tasks. We further derive doubly robust augmentations for both S/T- and Q-style ratio learners and characterize their distinct robustness properties. In benchmarks on seven RCT datasets, the Q-Learner is the most consistently competitive method in low-conversion regimes, where its propensity-only construction sidesteps the imbalanced regression that hurts outcome-based estimators. On four observational datasets, where propensity must be estimated and confounding cannot be ruled out, the DR learners introduced here decisively come out on top, making them practitioners' natural default for confounded observational data.
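For a binary outcome, Bayes' rule gives one way to write the ratio CATE using only treatment-propensity quantities: the odds of treatment among converters divided by the overall odds of treatment. The sketch below checks that identity numerically. It is one reading of the "propensity-only" construction; the paper's exact decomposition may differ, and all numbers and names are hypothetical.

```python
def odds(p):
    return p / (1.0 - p)

def ratio_cate_from_propensities(p_w_given_y1, p_w):
    """tau = odds(W=1 | Y=1, x) / odds(W=1 | x) for binary Y (Bayes' rule)."""
    return odds(p_w_given_y1) / odds(p_w)

# Toy values at a fixed x (hypothetical numbers).
e, p1, p0 = 0.3, 0.04, 0.02      # P(W=1|x), P(Y=1|W=1,x), P(Y=1|W=0,x)
p_w_given_y1 = e * p1 / (e * p1 + (1 - e) * p0)   # P(W=1 | Y=1, x)
tau_direct = p1 / p0
tau_from_propensities = ratio_cate_from_propensities(p_w_given_y1, e)
```

Both routes give the same ratio, which is why two classifiers for treatment assignment (one fitted on converters, one on everyone) can stand in for two outcome regressions when conversions are rare.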

2605.26285 2026-05-27 cs.LG cs.NA math.NA 版本更新

Two-Parameter Flows for Learning Population Dynamics of Physical Systems

用于学习物理系统群体动力学的双参数流

Paul Schwerdtner, Tobias Blickhan, Benjamin Peherstorfer

发表机构 * Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA（纽约大学柯朗数学科学研究所，美国纽约州纽约市Mercer街251号，邮编10012）

AI总结提出双参数流方法，通过从基础分布到每个边际的采样时间传输学习高维概率密度动力学，并利用耦合合成轨迹回归提取物理时间速度，无需轨迹信息即可处理旋转等非梯度动力学。

2605.26283 2026-05-27 cs.CV cs.LG 版本更新

Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening

卷积、Transformer、混合模型及视觉语言模型在多病种视网膜筛查中的基准测试

Durjoy Dey, Aymane Ajbar, Yuhong Yan

发表机构 * Department of Computer Science and Software Engineering（计算机科学与软件工程系）； Concordia University（康科迪亚大学）； Ebovir Biotechnologie Inc.（Ebovir生物技术公司）

AI总结本研究在RFMiD数据集上对四种模型家族的12种架构进行基准测试，评估其在多病种视网膜筛查中的性能，发现基于注意力的模型（如SwinTiny、CoAtNet0、MaxViTTiny）在二元筛查和多标签分类中表现最佳，视觉语言模型与CNN基线相当但未超越最优Transformer和混合模型。

Comments 12 pages, 3 figures, accepted at ICMHI 2026, 10th International Conference on Medical and Health Informatics, Kyoto, Japan. To appear in ACM Conference Proceedings

AI中文摘要

现代深度学习为自动化视网膜筛查提供了强大工具，但在现实多病种设置和领域偏移下，不同视觉模型家族的比较仍不明确。本研究使用视网膜眼底多病种图像数据集（RFMiD），对四种模型家族（卷积神经网络、视觉Transformer、混合CNN-Transformer骨干网络和视觉语言模型）的12种架构进行基准测试。我们评估两个任务：任何视网膜疾病的二元筛查和28个疾病类别的多标签分类。通过标准化训练、校准和评估协议，我们报告了在特异性接近80%的临床相关操作点下的AUC、F1、精确率、召回率和灵敏度。在RFMiD上，所有架构在二元筛查中表现良好，AUC均高于84%，但基于注意力的模型表现最佳。SwinTiny以及混合模型CoAtNet0和MaxViTTiny在二元筛查中取得最强结果，并在多标签设置中提高了宏F1和微F1。视觉语言模型（包括CLIP ViT-B/16和SigLIP-Base384）与CNN基线相当，但未超越最优Transformer和混合骨干网络。在Messidor-2上对可转诊糖尿病视网膜病变进行外部验证时，AUC范围为66.8%至84.7%，混合模型和Transformer模型再次表现出强劲性能。这些结果为多病种视网膜筛查中的模型选择提供了可重复的参考，并指导未来用于临床部署的自动化筛查工具。

英文摘要

Modern deep learning offers powerful tools for automated retinal screening, but it remains unclear how different visual model families compare in realistic multi-disease settings and under domain shift. In this work, we benchmark twelve architectures across four model families: convolutional neural networks, vision transformers, hybrid CNN-transformer backbones, and vision-language models, using the Retinal Fundus Multi-disease Image Dataset (RFMiD). We evaluate two tasks: binary screening for any retinal disease and multi-label classification across 28 disease classes. Using standardized training, calibration, and evaluation protocols, we report AUC, F1, precision, recall, and sensitivity at a clinically relevant operating point with specificity near 80%. On RFMiD, all architectures perform well on binary screening, with AUC above 84%, but attention-based models perform best. SwinTiny and the hybrid CoAtNet0 and MaxViTTiny models achieve the strongest binary screening results and improve macro and micro F1 in the multi-label setting. Vision-language models, including CLIP ViT-B/16 and SigLIP-Base384, are competitive with CNN baselines but do not surpass the best transformer and hybrid backbones. In external validation on Messidor-2 for referable diabetic retinopathy, AUC ranges from 66.8% to 84.7%, with hybrid and transformer models again showing strong performance. These results provide a reproducible reference for model selection in multi-disease retinal screening and guide future automated screening tools for clinical deployment.
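Reporting sensitivity at a fixed-specificity operating point can be sketched as follows. This is a minimal illustration with made-up scores, not the paper's evaluation code; the function name and the rule of taking the lowest qualifying threshold are assumptions.

```python
def sensitivity_at_specificity(scores, labels, target_specificity=0.80):
    """Pick the lowest threshold whose specificity meets the target, then report sensitivity.

    scores: higher means more likely diseased; labels: 1 = disease, 0 = healthy.
    """
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    for threshold in sorted(set(scores)):
        # Predict "disease" when score > threshold.
        specificity = sum(s <= threshold for s in negatives) / len(negatives)
        if specificity >= target_specificity:
            sensitivity = sum(s > threshold for s in positives) / len(positives)
            return threshold, specificity, sensitivity
    raise ValueError("target specificity not reachable")

scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.30, 0.55, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
threshold, spec, sens = sensitivity_at_specificity(scores, labels)
```

Fixing specificity first makes sensitivities comparable across architectures, since every model is then judged at the same false-positive rate.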

2605.26282 2026-05-27 cs.LG 版本更新

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

通过扩散策略优化扩展世界模型强化学习

Xiaoyuan Cheng, Wenxuan Yuan, Zhancun Mu, Yuanzhao Zhang, Yiming Yang, Hai Wang, Zhuo Sun, Che Liu

发表机构 * Dynamic Systems Lab, University College London（伦敦大学学院动态系统实验室）； College of Computing and Data Science, Nanyang Technological University（南洋理工大学计算与数据科学学院）； School of Intelligence Science and Technology, Peking University（北京大学智能科学与技术学院）； Santa Fe Institute（圣塔菲研究所）； School of Statistics and Data Science, Shanghai University of Finance and Economics（上海财经大学统计与数据科学学院）； Department of Computing, Imperial College London（伦敦帝国理工学院计算系）

AI总结针对世界模型强化学习中搜索与价值学习之间的结构错位问题，提出基于扩散策略优化的模型基方法MBDPO，统一搜索与策略优化，实现可扩展的策略学习。

AI中文摘要

基于模型的强化学习可以通过使用世界模型在大规模下得到有效支持。然而，在实践中，扩展此类方法仍然受到根本性限制。一个普遍公认的挑战是模型偏差和误差累积，这会降低长期预测的质量。除了这些问题，我们识别出一个更关键但尚未充分探索的瓶颈：现有世界模型方法中搜索与价值学习之间的结构错位。特别是，策略改进通常依赖于由独立的非搜索策略诱导的价值函数，导致训练不一致并最终产生次优学习。为了解决这一限制，我们在世界模型中提出基于模型的扩散策略优化（MBDPO），该框架通过扩散策略表示统一搜索和策略优化，从而释放世界模型在可扩展策略学习中的潜力。我们不在学习到的世界模型上构建显式规划器，而是将策略优化重新表述为潜在世界模型中搜索轨迹上的扩散过程。从这个视角，我们从收集的数据集中提取一个隐式能量函数来锚定策略，使MBDPO能够细化用于策略优化的分数场，同时缓解错位问题。我们在多种设置下评估MBDPO，包括多任务离线预训练、在线学习以及离线到在线微调。在离线场景中，我们进一步通过在大规模数据集上预训练来研究其扩展行为，观察到随着模型容量增加，性能持续单调提升。

英文摘要

Model-based reinforcement learning (RL) can be effectively supported at scale through the use of world models. However, in practice, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and error compounding, which degrade long-horizon predictions. Beyond these issues, we identify a more critical yet underexplored bottleneck: a structural misalignment between search and value learning in existing world model approaches. In particular, policy improvement often relies on value functions induced by a separate, non-search policy, resulting in training inconsistency and ultimately suboptimal learning. To address this limitation, we propose Model-Based Diffusion Policy Optimization (MBDPO) in world models, a framework that unifies search and policy optimization through diffusion policy representations, thereby unlocking the potential of world models for scalable policy learning. Instead of constructing an explicit planner over a learned world model, we reformulate policy optimization as a diffusion process over searched trajectories in latent world models. In this view, we extract an implicit energy function from the collected dataset that anchors the policy, enabling MBDPO to refine the score field for policy optimization while mitigating misalignment. We evaluate MBDPO across a wide range of settings, including multi-task offline pretraining, online learning, and offline-to-online fine-tuning. In the offline regime, we further investigate its scaling behavior by pretraining on large-scale datasets, observing consistent and monotonic performance gains with increasing model capacity.

2605.26271 2026-05-27 stat.ML cs.LG econ.EM 版本更新

Learning Nonlinear Factor Models with Unknown Monotone Links from Incomplete and Noisy Data

从不完整和含噪数据中学习具有未知单调链接的非线性因子模型

Yutong Chao, Resat Gökhan, Jalal Etesami, Ali Habibnia

发表机构 * School of Computation, Information and Technology, Technical University of Munich, Germany（德国慕尼黑工业大学计算、信息与技术学院）； Department of Economics, Virginia Tech, USA（美国弗吉尼亚理工大学经济系）； Munich Institute of Robotics and Machine Intelligence（慕尼黑机器人与机器智能研究所）

AI总结研究从含噪和不完整数据中联合恢复低秩因子、载荷和未知单调链接函数的问题，提出投影块坐标下降算法并建立收敛保证。

AI中文摘要

我们研究了一个非线性因子模型，其中观测响应通过未知的单调链接函数依赖于低秩潜在因子。由于严重的非凸性和可识别性问题，这一设置具有挑战性且在很大程度上未被充分探索。链接函数假设位于再生核希尔伯特空间（RKHS）中，从而在保持可识别性的同时实现灵活的非参数建模。我们将问题表述为从可能不完整和含噪的观测中联合恢复低秩因子、载荷和非线性链接函数，并提出一种带有显式正则化的投影块坐标下降（BCD）算法以解决尺度和旋转模糊性。在因子的弱不相干性和标准采样条件下，我们建立了无噪声和有噪声情况下的收敛保证，以及链接函数更新的次线性遗憾界。我们的结果将经典线性因子模型推广到广泛的非线性领域，并为学习非线性潜在结构提供了一个原则性框架。我们通过受控的合成实验评估了所提出的方法，显示出有希望的性能。

英文摘要

We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability issues. The link function is assumed to lie in a reproducing kernel Hilbert space (RKHS), enabling flexible nonparametric modeling while preserving identifiability. We formulate the problem as the joint recovery of the low-rank factors, loadings, and the nonlinear link function from possibly incomplete and noisy observations and propose a projected block coordinate descent (BCD) algorithm with explicit regularization to address scale and rotational ambiguities. Under mild incoherence of factors and standard sampling conditions, we establish convergence guarantees in both noiseless and noisy regimes, along with sublinear regret bounds for the link-function updates. Our results extend classical linear factor models to a broad nonlinear regime and provide a principled framework for learning nonlinear latent structures. We evaluate the proposed approach using controlled synthetic experiments, indicating promising performance.

2605.26266 2026-05-27 cs.LG cs.AI cs.CV cs.GR eess.IV 版本更新

Quantized Keys Steal Attention: Bias Correction for KV-Cache Compression in Video Diffusion

量化键窃取注意力：视频扩散中KV缓存压缩的偏差校正

Tuna Tuncer, Felix Becker, Thomas Pfeil

发表机构 * Technical University of Munich（慕尼黑工业大学）； Tensordyne

AI总结针对视频扩散模型中KV缓存量化导致注意力权重系统性偏差的问题，提出基于Jensen偏差的在线逐注意力分数校正方法，在INT2量化下恢复接近BF16的视频质量，并可在内存占用比INT4少50%的情况下优于INT4量化。

Comments Variants of this manuscript were accepted to the ICML 2026 workshops SCALE and F2S

AI中文摘要

分块自回归视频扩散模型依赖先前生成块的KV缓存以避免冗余计算，但随着视频变长，该缓存迅速成为内存瓶颈。将KV缓存量化到低位宽的方法减少了内存压力，但降低了视频质量。我们表明，这种降低的一个关键驱动因素是注意力权重的系统性偏差：由于softmax注意力中指数的凸性，量化噪声膨胀了缓存键的贡献，我们称之为Jensen偏差。这种效应导致量化键从非量化的当前块中窃取注意力质量。我们推导出一个逐注意力分数校正，在期望中消除此偏差，该校正根据缓存键的量化步长和查询范数在线计算。使用二阶泰勒近似，额外的计算开销可忽略不计，且除了缓存外无需额外内存。在MAGI-1、SkyReels-V2和HY-WorldPlay上评估INT2量化，我们的校正恢复了因激进量化而损失的大部分质量，达到接近BF16的视频质量，并且在使用50%更少内存的情况下优于INT4量化。

英文摘要

Chunk-wise autoregressive video diffusion models rely on a KV cache of previously generated chunks to avoid redundant computation, but this cache quickly becomes a memory bottleneck as videos grow longer. Methods that quantize the KV cache to low bitwidths reduce memory pressure but degrade video quality. We show that a key driver of this degradation is a systematic bias in attention weights: due to the convexity of the exponential in softmax attention, quantization noise inflates the contribution of cached keys, a phenomenon we call the Jensen bias. This effect causes quantized keys to steal attention mass from the unquantized current chunk. We derive a per-attention-score correction that removes this bias in expectation, computed on the fly from the quantization step sizes of the cached keys and the query norm. Using a second-order Taylor approximation, the additional computational overhead is negligible, and no additional memory is needed alongside the cache. Evaluated on MAGI-1, SkyReels-V2, and HY-WorldPlay at INT2 quantization, our correction recovers most of the quality lost to aggressive quantization, reaching near-BF16 video quality, and can outperform INT4 quantization while using 50% less memory.
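The Jensen bias can be illustrated with a small calculation. Assume, purely for illustration, that the rounding error on each key dimension is independent and uniform on $[-\Delta/2, \Delta/2]$ with one shared step size $\Delta$. Then the expected inflation of $\exp(q \cdot k)$ has a closed form, and its second-order Taylor approximation depends only on $\Delta$ and the query norm. The functions below are a sketch under those assumptions, not the authors' correction.

```python
import math

def exact_inflation(query, step):
    """E[exp(q . eps)] when each eps_i is uniform on [-step/2, step/2] (assumed noise model)."""
    out = 1.0
    for q in query:
        a = q * step / 2.0
        out *= math.sinh(a) / a if a != 0.0 else 1.0
    return out

def taylor_inflation(query, step):
    """Second-order approximation: 1 + Var(q . eps) / 2, with Var = ||q||^2 * step^2 / 12."""
    q_norm_sq = sum(q * q for q in query)
    return 1.0 + q_norm_sq * step ** 2 / 24.0

def corrected_logit(logit, query, step):
    """Subtract the log of the expected inflation from a cached key's attention logit."""
    return logit - math.log(taylor_inflation(query, step))

query = [0.8, -1.2, 0.5, 1.0]
step = 0.5                      # quantization step size of the cached key
exact = exact_inflation(query, step)
approx = taylor_inflation(query, step)
```

Both values exceed 1, so quantized keys receive extra softmax mass on average. Subtracting the log-inflation from their logits removes that excess in expectation under the stated noise model.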

2605.26248 2026-05-27 cs.LG cs.AI cs.NE 版本更新

Unified Neural Scaling Laws

统一神经缩放定律

Ethan Caballero, Priyank Jaini, David Krueger, Irina Rish

发表机构 * Mila, University of Montreal（蒙特利尔大学Mila实验室）； Google DeepMind（谷歌DeepMind）

AI总结提出一种统一神经缩放定律（UNSL）函数形式，能够准确建模和预测深度神经网络在多个维度（模型参数、训练数据量、训练步数、推理步数、计算量及超参数）同时变化时的缩放行为，适用于多种架构和任务，并在大规模视觉、语言、数学和强化学习任务中实现更精确的缩放行为外推。

2605.26246 2026-05-27 cs.LG 版本更新

The Bridge-Garden Dilemma in LLM Distillation: Why Mixing Hard and Soft Labels Works

LLM蒸馏中的桥园困境：为什么混合硬标签和软标签有效

Guanghui Wang, Kaiwen Lv Kacuila, Zhiyong Yang, Zitai Wang, Jin-Wen Wu, Longtao Huang, Qianqian Xu, Qingming Huang

发表机构 * School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China（中国科学院大学计算机科学与技术学院）； Alibaba Group, Hangzhou, China（阿里巴巴集团）； State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China（中国科学院计算技术研究所人工智能安全国家重点实验室）； Beijing Academy of Artificial Intelligence, Beijing, China（北京人工智能研究院）； Key Laboratory of Big Data Mining and Knowledge Management (BDKM), University of Chinese Academy of Sciences, Beijing, China（中国科学院大学大数据挖掘与知识管理重点实验室）

AI总结针对大语言模型知识蒸馏中硬标签与软标签的混合使用，提出桥园分解理论解释其降低暴露偏差的机制，并开发自适应混合监督方法，在多个模型上实现性能提升和9.7倍训练成本降低。

Comments Accepted at ICML 2026

AI中文摘要

知识蒸馏（KD）将知识从大型教师模型转移到较小的学生模型。在语言建模中，学生模型要么在从教师模型采样的标记（硬标签）上训练，要么在教师模型的完整下一个标记分布（软标签）上训练。尽管软标签看起来严格更丰富，但我们发现混合硬标签和软标签始终能产生更好的结果。关键的是，我们表明这种增益不能通过训练期间更接近教师匹配来解释。相反，它来自于减少暴露偏差，即训练和推理分布之间的不匹配。为了解释这一现象，我们引入了桥园分解理论，该理论将生成步骤分为两类：桥（Bridge），其中下一个标记必须精确；园（Garden），其中下一个标记可以灵活。我们表明，仅硬标签的KD在桥中通过避免风险偏差表现出色，而仅软标签的KD在园中保持多样性。混合策略处理两种情况，从而减少整个序列中的暴露偏差。在该理论的指导下，我们开发了一系列桥园混合监督方法，自适应地平衡硬标签和软标签。在包含七个教师-学生对（包括Qwen、Llama、Gemma和DeepSeek）的主要套件以及推理和编码基准测试中，我们的方法优于基于散度和基于策略的KD基线，同时将训练成本降低了9.7倍，实现了高效的模型压缩。代码可在https://github.com/ghwang-s/bridge_garden_hybrid_kd_release获取。

英文摘要

Knowledge distillation (KD) transfers knowledge from a large teacher model to a smaller student. In language modeling, the student is trained either on tokens sampled from the teacher (hard labels) or on the teacher's full next-token distribution (soft labels). Although soft labels appear strictly richer, we find that mixing hard and soft labels consistently yields better results. Crucially, we show that this gain cannot be explained by closer teacher matching during training. Instead, it comes from reduced exposure bias, the mismatch between training and inference distributions. To explain this phenomenon, we introduce the Bridge-Garden Decomposition theory, which categorizes generation steps into two types: Bridges, where the next token must be exact, and Gardens, where it can be flexible. We show that hard-only KD excels in Bridges by avoiding risky deviations, while soft-only KD preserves diversity in Gardens. A hybrid strategy handles both cases and, as a result, reduces exposure bias across the sequence. Guided by this theory, we develop a family of Bridge-Garden hybrid supervision methods that adaptively balance hard and soft labels. Across a primary suite of seven teacher-student pairs (including Qwen, Llama, Gemma, and DeepSeek) and benchmarks in reasoning and coding, our approach outperforms divergence-based and on-policy KD baselines while reducing training cost by 9.7x, enabling efficient model compression. Code is available at https://github.com/ghwang-s/bridge_garden_hybrid_kd_release.
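A per-token loss that mixes hard and soft supervision can be sketched as below. The fixed mixing weight `alpha` and the use of forward KL are simplifying assumptions; the paper's methods balance the two terms adaptively, and that rule is not given in the abstract.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def hybrid_kd_loss(student_logits, teacher_probs, sampled_token, alpha):
    """alpha * hard-label cross-entropy + (1 - alpha) * forward KL(teacher || student)."""
    log_q = log_softmax(student_logits)
    hard = -log_q[sampled_token]
    soft = sum(p * (math.log(p) - lq) for p, lq in zip(teacher_probs, log_q) if p > 0)
    return alpha * hard + (1.0 - alpha) * soft

teacher_probs = [0.7, 0.2, 0.1]
student_logits = [1.0, 0.5, -0.5]
loss_hard = hybrid_kd_loss(student_logits, teacher_probs, sampled_token=0, alpha=1.0)
loss_soft = hybrid_kd_loss(student_logits, teacher_probs, sampled_token=0, alpha=0.0)
loss_mix = hybrid_kd_loss(student_logits, teacher_probs, sampled_token=0, alpha=0.5)
```

In the Bridge-Garden picture, a larger `alpha` would suit steps where the next token must be exact, and a smaller one would suit steps where several continuations are acceptable.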

2605.26243 2026-05-27 cs.LG 版本更新

Provably Communication-Efficient and Privacy-Preserving Federated Graph Neural Networks

可证明通信高效且隐私保护的联邦图神经网络

Zhishuai Guo, Wenhan Wu, Chen Chen, Lei Zhang, Olivera Kotevska, Ravi K Madduri

发表机构 * Northern Illinois University（北伊利诺伊大学）； University of North Carolina at Charlotte（北卡罗来纳大学夏洛特分校）； University of Central Florida（中佛罗里达大学）； Oak Ridge National Laboratory（橡树岭国家实验室）； Argonne National Laboratory（阿贡国家实验室）

AI总结提出CE-FedGNN框架，通过稀疏交换聚合节点表示和移动平均估计器处理跨客户端依赖，结合度量差分隐私实现通信高效与隐私保护，并证明收敛速率和隐私保证。

AI中文摘要

图神经网络（GNN）在关系数据上取得了强性能，但现实世界的图通常分布在多个组织之间，由于隐私和政策约束，这些组织无法共享原始数据。现有的联邦GNN方法要么忽略跨客户端链接导致精度下降，要么需要频繁的嵌入交换，带来巨大的通信和隐私成本。我们提出了CE-FedGNN，一个通信高效且隐私保护的联邦GNN框架，用于学习此类耦合图。我们的方法避免共享原始数据或每轮嵌入，而是通过稀疏交换聚合的节点表示。为了处理跨客户端依赖和过时性，我们引入了一个移动平均估计器，持续跟踪节点表示并使其能够在多轮中稳定重用。为了为发布的表示提供正式的隐私保证，我们采用了度量差分隐私（metric-DP）框架，该框架根据学习嵌入空间中的距离而非最坏情况输入扰动来衡量隐私。这在标准差分隐私变得过于保守的噪声水平下提供了有意义的保证。我们建立了以$O(1/\sqrt{T})$速率收敛到稳定点，通信复杂度为$O(T^{3/4})$。此外，我们在公共队列威胁模型下通过Rényi差分隐私组合推导了$(\varepsilon,\delta)$-度量差分隐私保证。在合成银行间反洗钱基准和引文网络上的实验表明，CE-FedGNN在显著降低通信的同时保持了强性能，并在隐私保护噪声下保持鲁棒性。

英文摘要

Graph neural networks (GNNs) achieve strong performance on relational data, but real-world graphs are often distributed across organizations that cannot share raw data due to privacy and policy constraints. Existing federated GNN methods either ignore cross-client links, leading to degraded accuracy, or require frequent embedding exchanges, incurring substantial communication and privacy costs. We propose CE-FedGNN, a communication-efficient and privacy-preserving federated GNN framework for learning over such coupled graphs. Our approach avoids sharing raw data or per-round embeddings by infrequently exchanging aggregated node representations. To handle cross-client dependency and staleness, we introduce a moving-average estimator that continuously tracks node representations and enables their stable reuse across rounds. To provide formal privacy guarantees for the released representations, we adopt the metric differential privacy (metric-DP) framework, which measures privacy with respect to distances in the learned embedding space rather than worst-case input perturbations. This yields meaningful guarantees at noise levels where standard differential privacy becomes overly conservative. We establish convergence to a stationary point at a rate of $O(1/\sqrt{T})$ with $O(T^{3/4})$ communication complexity. In addition, we derive $(\varepsilon,\delta)$-metric-DP guarantees via Rényi differential privacy composition under a public-cohort threat model. Experiments on synthetic interbank anti-money laundering benchmarks and citation networks demonstrate that CE-FedGNN achieves strong performance while significantly reducing communication and maintaining robustness under privacy-preserving noise.

2605.26222 2026-05-27 cs.LG stat.ML 版本更新

From Privacy to Generalization: Linear Max-Information Bounds for DP-SGD

从隐私到泛化：DP-SGD的线性最大信息界

Christoph H. Lampert, Hossein Zakerinia

发表机构 * Institute of Science and Technology Austria (ISTA)（奥地利科学与技术研究所）

AI总结本文证明了DP-SGD的近似最大信息量具有与数据集大小成线性关系的有限样本界，并基于此推导出PAC-Bayes泛化界和DP-SGD训练模型的显式泛化界。

Comments 22 pages

2605.26192 2026-05-27 cs.LG cs.AI q-bio.BM 版本更新

ARBITER：测试时采样中的推理轨迹盆地与多数投票失败

Meng Cai, Lars Kulik, Farhana Choudhury

发表机构 * School of Computing and Information Systems（计算与信息系统学院）； University of Melbourne（墨尔本大学）

AI总结本文发现语言模型测试时采样的推理轨迹会聚集成少数“推理盆地”，导致多数投票选择最稳定而非最准确的盆地，并提出ARBITER方法通过保守加性证据修正共识，从样本池中恢复部分正确性。

Comments Preprint. 34 pages, 2 figures

AI中文摘要

当语言模型使用测试时采样时，它们会生成多个推理轨迹并通过多数投票选择答案。我们证明这些轨迹并非独立：对于给定问题，它们会聚集成少数几个簇，即推理盆地，每个盆地由归一化的最终答案和达到该答案的解决方案定义。因此，多数投票选择的是最稳定的盆地而非最准确的盆地，这导致错误多数失败，即正确答案存在但被否决。我们提出ARBITER，一种模型无关的方法，仅使用基础模型自身的采样输出、隐藏状态和派生证据来建模盆地之间的交互。大多数直接纠正策略失败；ARBITER则在共识之上使用保守的加性证据。在其最简单的无参数形式中，ARBITER-Δ将同模型证据添加到多数先验中，而ARBITER-Enc则通过来自完整解决方案的隐藏状态的有界残差信号增强这一过程。在GSM8K上使用Qwen3-4B，K=24个样本的共识达到约94%中段，而同池top-2 oracle达到约96%中段。ARBITER在不使用外部信息的情况下恢复了这些案例的一个子集。在三个模型系列和三个数学基准上，它带来了一致的提升，且没有净负例；例如，在Llama-3.1-8B MMLU-HS-Math上，它将准确率从约78%中段提高到约82%中段，恢复了约22%的可用oracle余量，表明该余量可以从样本池本身部分恢复。

英文摘要

When language models use test-time sampling, they generate multiple reasoning trajectories and select an answer by majority vote. We show that these trajectories are not independent: for a given question, they concentrate into a small number of clusters, or reasoning basins, each defined by a normalized final answer and the solutions that reach it. A majority vote therefore selects the most stable basin rather than the most accurate one, which creates wrong-majority failures where the correct answer is present but outvoted. We introduce ARBITER, a model-agnostic approach that models interactions between basins using only the base model's own sampled outputs, hidden states, and derived evidence. Most direct correction strategies fail; ARBITER instead uses conservative additive evidence on top of consensus. In its simplest parameter-free form, ARBITER-Δ adds same-model evidence to the majority prior, while ARBITER-Enc augments this with bounded residual signals from hidden states over complete solutions. On GSM8K with Qwen3-4B, consensus over K=24 samples achieves around the mid-94% range, while a same-pool top-2 oracle reaches around the mid-96% range. ARBITER recovers a subset of these cases using zero external information. Across three model families and three math benchmarks, it yields consistent gains with no net-negative cases; for example, on Llama-3.1-8B MMLU-HS-Math, it improves accuracy from the mid-78% range to the mid-82% range, recovering about 22% of the available oracle headroom, indicating that this headroom can be partially recovered from the sample pool itself.
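The wrong-majority failure described above is easy to reproduce on a toy sample pool. The sketch below groups sampled answers into basins by normalized final answer, takes the majority vote, and checks whether the correct answer sits in the top-2 basins. The normalization rule and the sample values are hypothetical, and ARBITER's evidence-based reranking is not implemented here.

```python
from collections import Counter

def normalize(answer):
    """Toy normalization: strip whitespace, trailing period, and thousands separators."""
    return answer.strip().rstrip(".").replace(",", "")

def basins(samples):
    """Group sampled final answers into basins keyed by the normalized answer."""
    return Counter(normalize(a) for a in samples)

def majority_vote(samples):
    return basins(samples).most_common(1)[0][0]

def top_k_oracle_hit(samples, gold, k=2):
    """True if the gold answer is among the k largest basins."""
    return normalize(gold) in [a for a, _ in basins(samples).most_common(k)]

samples = ["1,200", "1200.", "1200", " 1200", "1250", "1250", "1250 ", "1250", "1250", "900"]
gold = "1200"
voted = majority_vote(samples)                 # "1250": the largest basin wins
recoverable = top_k_oracle_hit(samples, gold)  # True: the gold answer is the runner-up
```

Here the vote is wrong even though the correct answer was sampled four times. The gap between majority-vote accuracy and top-2 oracle accuracy is the headroom the abstract refers to.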

2605.26171 2026-05-27 cs.LG 版本更新

When Rule Violations Are Rare: Chimera Training for Logical Anomaly Detection

当规则违反罕见时：用于逻辑异常检测的嵌合体训练

Alejandro Ascarate, Leo Lebrat, Rodrigo Santa Cruz, Clinton Fookes, Olivier Salvado

AI总结针对规则违反样本稀少的逻辑异常检测，提出嵌合体训练方法，通过特征级操作数反事实构造生成监督信号，提升规则级异常检测性能。

Comments 9+30 pages, 4+4 figures, under review

AI中文摘要

许多实际异常不仅仅是罕见的输入，而是语义约束的违反：对象以结构化方式共现，动作蕴含前提条件，事件满足时间或关系规律。我们研究这种设置下的异常检测，其中约束以学习到的视觉概念上的逻辑规则形式给出，但训练期间真实规则违反罕见或缺失。我们提出一种神经规则评估器，将每个约束编译成有向无环图，并为其内部逻辑运算符学习特征感知的子树MLP门。每个门将子特征和边级否定映射到父表示和规则满足概率，并通过基于真实概念标签的精确布尔传播获得中间监督。关键困难在于同图像训练数据通常无法提供信息性真值配置的充分覆盖，并允许捷径解。为解决此问题，我们引入嵌合体训练：在特征级别进行操作数级反事实构造。我们不混合输入图像，而是连接来自不同样本的子树特征；每个操作数保留其来源样本的硬真值标签，并通过将节点的逻辑运算符应用于这些继承标签来获得嵌合体目标。这提供了监督逻辑反例，而无需真实异常图像。在CLEVRER、OpenImages和VidOR上，所得到的评估器在规则级异常AUROC上优于独立事件和同图像语义训练基线，特别是对于组合和关系规则。该方法产生标量异常分数和规则级归因。

英文摘要

Many practical anomalies are not merely rare inputs, but violations of semantic constraints: objects co-occur in structured ways, actions imply preconditions, and events satisfy temporal or relational regularities. We study anomaly detection in this setting, where constraints are given as logical rules over learned visual concepts, but real rule violations are rare or absent during training. We propose a neural rule evaluator that compiles each constraint into a directed acyclic graph and learns feature-aware subtree MLP gates for its internal logical operators. Each gate maps child features and edge-level negations to a parent representation and a rule-satisfaction probability, with intermediate supervision obtained from exact Boolean propagation over ground-truth concept labels. The key difficulty is that same-image training data often provide insufficient coverage of informative truth configurations and also allow shortcut solutions. To address this, we introduce chimera training: an operand-level counterfactual construction at the feature level. Instead of mixing input images, we concatenate subtree features from different samples; each operand keeps the hard truth label of the sample it came from, and the chimera target is obtained by applying the node's logical operator to those inherited labels. This supplies supervised logical counterexamples without requiring real anomalous images. Across CLEVRER, OpenImages, and VidOR, the resulting evaluator improves rule-level anomaly AUROC over independent-events and same-image semantic-training baselines, especially for compositional and relational rules. The method yields both scalar anomaly scores and rule-level attributions.
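The chimera construction for a single binary logical node can be sketched as follows. Operand features come from two different samples, and the target is the node's operator applied to the labels each operand inherits. The data layout, the feature values, and the function name are assumptions for illustration, not the authors' code.

```python
import random

def make_chimeras(samples, op, n, rng):
    """Build n chimera examples for one binary logical node.

    Each sample is (left_feat, left_label, right_feat, right_label). The left
    operand comes from sample i and the right operand from a different sample j.
    """
    chimeras = []
    for _ in range(n):
        i, j = rng.sample(range(len(samples)), 2)      # two distinct samples
        left_feat, left_label, _, _ = samples[i]
        _, _, right_feat, right_label = samples[j]
        features = left_feat + right_feat              # concatenate subtree features
        target = op(left_label, right_label)           # operator applied to inherited labels
        chimeras.append((features, target))
    return chimeras

# Toy data: within one image the operands are only ever (True, True) or (False, False).
samples = [([0.9, 0.1], True, [0.8, 0.2], True),
           ([0.7, 0.3], True, [0.6, 0.4], True),
           ([0.2, 0.8], False, [0.1, 0.9], False)]
chimeras = make_chimeras(samples, lambda a, b: a and b, n=50, rng=random.Random(0))
```

The chimeras include mixed (True, False) and (False, True) operand pairs that never occur within a single image in this toy data. That extra coverage is the supervision the gate needs to learn the operator rather than a shortcut.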

2605.26168 2026-05-27 cs.OS cs.LG 版本更新

基于推送的异步联邦学习：一种偏差校正聚合方法

Jiahui Bai, Hai Dong, A. K. Qin

发表机构 * School of Computer Technologies, RMIT University（RMIT大学计算机技术学院）； School of Science, Computing and Engineering Technologies, Swinburne University of Technology（斯威本科技大学科学、计算与工程技术学院）

AI总结提出PushCen-ADFL框架，通过中心表示空间中的平均保持推-求和混合与轻量级中心正则化，解决异步去中心化联邦学习中的通信开销、聚合偏差和模型漂移问题。

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026). This is the extended version with full appendix

DOI: 10.1145/3770855.3817925

AI中文摘要

异步去中心化联邦学习（ADFL）消除了中央协调和全局同步，使其在大规模和异构系统中具有吸引力。然而，频繁的点对点通信、有向拓扑上的异步更新以及非独立同分布数据共同导致了过高的通信开销、有偏聚合和严重的模型漂移。我们提出了PushCen-ADFL，一种通信高效的ADFL框架，能够在非对称通信和延迟客户端参与下实现稳定训练。PushCen-ADFL在共享中心表示空间中耦合了通信、聚合和局部稳定化，形成了压缩与优化之间的闭环。客户端交换中心形式的消息，应用平均保持的推-求和混合来校正聚合偏差，并使用锚定在同一中心空间的轻量级中心正则化来减轻异构性和陈旧性下的漂移。一个有界、发送者去重的缓冲区进一步提高了在异步到达不规则情况下的鲁棒性。在视觉数据集上的实验表明，PushCen-ADFL在数据异构性下将准确率提高了最多6%，同时将每次推送的通信成本降低了80%以上，实现了良好的准确率-通信权衡。

英文摘要

Asynchronous decentralized federated learning (ADFL) eliminates central coordination and global synchronization, making it attractive for large-scale and heterogeneous systems. However, frequent peer-to-peer communication, asynchronous updates on directed topologies, and non-IID data jointly lead to excessive communication overhead, biased aggregation, and severe model drift. We propose PushCen-ADFL, a communication-efficient ADFL framework that enables stable training under asymmetric communication and delayed client participation. PushCen-ADFL couples communication, aggregation, and local stabilization in a shared centroid representation space, forming a closed loop between compression and optimization. Clients exchange centroid-form messages, apply average-preserving push-sum mixing to correct aggregation bias, and use a lightweight centroid regularization anchored in the same centroid space to mitigate drift under heterogeneity and staleness. A bounded, sender-deduplicated buffer further improves robustness under irregular asynchronous arrivals. Experiments on vision datasets demonstrate that PushCen-ADFL improves accuracy under data heterogeneity by up to 6% while reducing per-push communication cost by more than 80%, achieving a favorable accuracy-communication trade-off.
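The bias-correcting role of push-sum mixing can be seen in a generic, synchronous toy version on a small directed graph. This sketch uses scalar values rather than centroid messages, and the topology and names are hypothetical; it illustrates classic push-sum averaging, not the PushCen-ADFL implementation.

```python
def push_sum(values, out_neighbors, rounds):
    """Push-sum averaging on a directed graph.

    Each node splits its (value, weight) pair equally among its out-neighbors
    (including itself); the ratio value/weight converges to the network average.
    """
    x = list(values)
    w = [1.0] * len(values)
    for _ in range(rounds):
        new_x = [0.0] * len(values)
        new_w = [0.0] * len(values)
        for i, targets in enumerate(out_neighbors):
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)], x

# Directed, non-regular topology (self-loops included).
out_neighbors = [[0, 1, 2], [1, 2], [2, 0]]
values = [3.0, 6.0, 12.0]
estimates, raw = push_sum(values, out_neighbors, rounds=200)
```

Without the weight channel, the raw mixed values settle at topology-dependent numbers (nodes with more incoming mass end up higher). Dividing by the companion weight removes that skew, so every node recovers the true average of 7.0.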

2605.26161 2026-05-27 cs.LG cs.AI 版本更新

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

TSFMAudit: 时间序列基础模型中的数据污染审计

Hongkai Li, Shifeng Xie, Lefei Shen, Zhuo Li, Mouxiang Chen, Xiaobin Zhang, Han Fu, Jianling Sun, Xiaoxue Ren, Chenghao Liu

发表机构 * Zhejiang University（浙江大学）； Télécom Paris（巴黎高等电信学院）； State Street Technology (Zhejiang) Ltd.（State Street Technology（浙江）有限公司）； Datadog

AI总结针对时间序列基础模型（TSFMs）预训练数据污染问题，提出基于探针适应动力学的审计方法TSFMAudit，通过检测微调后损失下降更快且骨干网络移动更小的异常现象来识别污染数据集。

Comments 22 pages, 7 figures, 9 tables

详情

AI中文摘要

时间序列基础模型（TSFMs）越来越多地在大型语料库上进行预训练，这引发了评估数据集可能在预训练期间被暴露从而导致过于乐观的性能估计的担忧。在时间序列中审计此类污染具有挑战性，因为信号是连续且异质的，并且通常缺乏语料库文档。据我们所知，这是第一个研究TSFMs预训练污染审计的工作。我们形式化了TSFMs的预训练污染审计问题，并提出了TSFMAudit，一种基于探针适应动力学的方法。我们的关键直觉是，污染表现为异常高效的适应：在微调探针后，受污染的数据集往往表现出更快的损失减少和更小的骨干网络移动。我们在6个TSFMs和187个数据集上评估了TSFMAudit，使用文档化的训练来源证据作为监督，并与从LLM文献中改编的10个竞争基线进行了比较。

英文摘要

Time series foundation models (TSFMs) are increasingly pretrained on large corpora, raising concerns that evaluation datasets may have been exposed during pretraining and thus yield overly optimistic performance estimates. Auditing such contamination is challenging in time series because signals are continuous and heterogeneous, and often lack corpus documentation. To the best of our knowledge, this is the first work to study pretraining contamination auditing for TSFMs. We formalize the problem of pretraining contamination auditing for TSFMs and propose TSFMAudit, a method based on probe adaptation dynamics. Our key intuition is that contamination manifests as unusually efficient adaptation: after a fine tuning probe, contaminated datasets tend to exhibit faster loss reduction with smaller backbone movement. We evaluate TSFMAudit on 6 TSFMs and 187 datasets using documented training source evidence as supervision, and compare against 10 competitive baselines adapted from the LLM literature.

2605.26159 2026-05-27 cs.NI cs.CR cs.LG 版本更新

Device Context Protocol: A Compact, Safety-First Architecture for LLM-Driven Control of Constrained Devices

设备上下文协议：一种紧凑、安全优先的架构，用于LLM驱动的受限设备控制

Dongxu Yang

发表机构 * DeepLethe

AI总结针对LLM控制受限设备的安全问题，提出设备上下文协议（DCP），通过极小的帧开销、协议层安全原语和主机端桥接，在保持低资源占用的同时有效防御幻觉和提示注入攻击。

Comments 15 pages, 5 figures. Reference implementation, Python package (pip install pydcp), and reproduction scripts at https://github.com/device-context-protocol/dcp

AI中文摘要

大型语言模型越来越多地通过模型上下文协议（MCP）作为外部工具的编排器，但MCP是为具有兆字节内存的软件服务构建的，并未覆盖主导物理设备长尾的微控制器。近期工作（IoT-MCP）将MCP移植到边缘网关，峰值内存为74 KB；这仍然排除了最小的商用MCU，并且关键的是，没有解决将不可靠调用者（可能产生幻觉或受到提示注入的LLM）直接控制物理硬件的安全问题。我们提出设备上下文协议（DCP）：一个典型帧小于50字节（6字节头部+CBOR负载+可选的16字节HMAC），一个清单模式，其中能力范围、范围和类型检查、试运行评估以及单位即类型是协议层原语，以及一个主机端桥接，在设备收到任何字节之前拒绝格式错误或幻觉调用。参考固件在ESP32上占用27.6 KB闪存/0.6 KB RAM；Python桥接、ESP32固件和语言无关的一致性套件采用MIT许可证并公开。一项实证研究——由来自四个供应商（DeepSeek、阿里巴巴、智谱、MiniMax）的五个LLM针对六类对抗性提示生成的675次工具调用，其中注入类别实例化了AgentDojo的攻击模板——显示DCP拒绝了100%的能力提升尝试和78%的提示注入尝试，而原始MCP和IoT-MCP为0-1%，在固件占用空间小三个数量级的情况下匹配了结构良好的OpenAPI 3模式的表达能力。我们将DCP定位为MCP（正朝着企业SaaS连接发展）与其未覆盖的物理设备之间缺失的一层。

英文摘要

Large language models are increasingly used as orchestrators of external tools via the Model Context Protocol (MCP), but MCP is built for software services with megabytes of memory and does not descend to the microcontrollers that dominate the long tail of physical devices. Recent work (IoT-MCP) ports MCP to edge gateways at 74 KB peak memory; this still excludes the smallest commodity MCUs and, critically, does not address the safety problem of giving an unreliable caller (an LLM that may hallucinate or be prompt-injected) direct control of physical hardware. We present the Device Context Protocol (DCP): a sub-50-byte typical frame (6-byte header + CBOR payload + optional 16-byte HMAC), a manifest schema in which capability scoping, range and type checks, dry-run evaluation, and units-as-types are protocol-layer primitives, and a host-side Bridge that rejects malformed or hallucinated calls before any byte reaches the device. Reference firmware measures 27.6 KB flash / 0.6 KB RAM on ESP32; the Python Bridge, ESP32 firmware, and a language-neutral conformance suite are MIT-licensed and public. An empirical study of 675 tool calls, produced by five LLMs across four vendors (DeepSeek, Alibaba, Zhipu, MiniMax) against six categories of adversarial prompts, with the injection category instantiating AgentDojo's attack templates, shows DCP rejects 100% of capability-escalation attempts and 78% of prompt-injection attempts, versus 0-1% for Raw MCP and IoT-MCP, matching the expressiveness of a well-formed OpenAPI 3 schema at three orders of magnitude less firmware footprint. We position DCP as the missing layer between MCP (which is moving toward enterprise SaaS connectivity) and the physical devices it does not reach.
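The host-side checks described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the pydcp API: the manifest layout, the capability names, and the functions `validate_call` and `frame_size` are all hypothetical. Only the 6-byte header and 16-byte HMAC sizes come from the abstract.

```python
# Hypothetical manifest: capability names, units and ranges are illustrative only,
# not taken from the DCP specification.
MANIFEST = {
    "set_brightness": {"type": int, "unit": "percent", "min": 0, "max": 100},
    "set_setpoint": {"type": float, "unit": "celsius", "min": 5.0, "max": 30.0},
}

HEADER_BYTES = 6   # header size stated in the abstract
HMAC_BYTES = 16    # optional HMAC size stated in the abstract


def validate_call(manifest, name, value, unit):
    """Return (accepted, reason); a rejected call never reaches the device."""
    spec = manifest.get(name)
    if spec is None:
        return False, "unknown capability"  # e.g. a hallucinated tool name
    if type(value) is not spec["type"]:
        return False, "type mismatch"
    if unit != spec["unit"]:
        return False, "unit mismatch"  # units are checked like types
    if not spec["min"] <= value <= spec["max"]:
        return False, "out of range"
    return True, "ok"


def frame_size(payload_len, with_hmac=True):
    """Total frame bytes for an already-encoded payload of payload_len bytes."""
    return HEADER_BYTES + payload_len + (HMAC_BYTES if with_hmac else 0)
```

Under these sizes, a frame carrying the HMAC stays below 50 bytes only while the encoded payload is shorter than 28 bytes, which is why a compact encoding such as CBOR matters here.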

2605.26158 2026-05-27 cs.CR cs.AI cs.LG Updated version

Furina: Fragmented Uncertainty-Driven Refusal Instability Attack

Tongxi Wu, Jian Zhang, Yang Gao

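Affiliations: School of Intelligence Science and Technology; State Key Laboratory for Novel Software Technology; Nanjing University

AI summary: Shows that the safety behavior of large language models has an instability region, proposes a multi-metric diagnostic framework, and develops the Furina attack, which uses fragmented, scene-anchored prompts to amplify uncertainty and achieve efficient jailbreaks.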

Comments This work is accepted as a regular paper at ICML 2026

Abstract

Safety alignment in large language models (LLMs) and multimodal large language models (MLLMs) is commonly assumed to operate as a near-binary threshold mechanism. We challenge this assumption by revealing that safety behavior is governed by an instability region where small perturbations induce stochastic refusal decisions rather than deterministic outcomes. We develop a multi-metric diagnostic framework combining external and internal signals to characterize this instability. Through systematic experiments, we identify a characteristic diagnostic signature: inputs in unstable regimes exhibit elevated output uncertainty yet decreased internal safety activation, a decoupling phenomenon that explains why detection-based defenses fail against sophisticated attacks. Building on this framework, we introduce Furina, a jailbreak attack that deliberately induces this signature through fragmented, scene-anchored prompts without model-specific optimization. Furina outperforms strong single-turn and multi-turn baselines on HarmBench and achieves competitive results on MM-SafetyBench, demonstrating that uncertainty amplification provides a principled and transferable mechanism for understanding safety vulnerabilities. Code is available at: https://github.com/0xCavaliers/Furina_Jailbreak.

2605.26155 2026-05-27 cs.RO cs.AI cs.LG Updated version

When Does Adaptive Guidance Help? Belief-Aware Privileged Distillation for Autonomous Driving Under Partial Observability

Mehmet Haklidir

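Affiliations: TUBITAK BILGEM Artificial Intelligence Institute

AI summary: Proposes Belief-Aware GSAC (BA-GSAC), which adjusts the distillation coefficient dynamically via ensemble disagreement, and systematically studies when adaptive guidance helps in partially observable autonomous driving; it finds that the coefficient collapses prematurely under severe occlusion and identifies an observability blindness problem.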

Comments 9 pages, 3 figures, 7 tables. Accepted at CVPR 2026 Workshop on Autonomous Driving (WAD)

Abstract

Guided Soft Actor-Critic (GSAC) distills knowledge from a privileged full-state teacher to a partial-observation student for autonomous driving, but uses a fixed distillation coefficient lambda regardless of the agent's uncertainty. We present Belief-Aware GSAC (BA-GSAC), which modulates lambda via ensemble disagreement, and use it as a testbed for a systematic empirical study asking: when does adaptive guidance actually help? Evaluating five strategies (fixed lambda in {0.01, 0.1}, adaptive, linear decay, and vanilla SAC) across three POMDP difficulty levels on Highway-Env, we find that preliminary single-seed runs suggest benefits under mild and moderate partial observability, but under severe occlusion (evaluated with 3 seeds for all methods) the adaptive coefficient collapses to lambda_min within about 3K steps. We trace this to an observability blindness phenomenon: because the ensemble predicts partial observations, it achieves low disagreement even under heavy occlusion, modeling what is visible but unable to detect what is missing. We diagnose the root cause and propose an architectural fix (training the ensemble on full-state predictions using the guiding actor's privileged access); while not validated here, we show that even with current limitations, the warmup phase provides measurable stabilization (CV=13.3% vs. 29.8% for constant lambda=0.01). In fact, a simple deterministic linear decay schedule achieves the best severe-POMDP performance across all metrics (mean 116.5, CV=8.9%), suggesting that the scheduling effect, not the ensemble, drives the stability benefit. These findings provide practical guidance for designing uncertainty-aware teacher-student frameworks and highlight ensemble prediction targets as an important design choice.
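To make the two schedules compared in the abstract above concrete, here is a small sketch. The abstract does not give the functional form that maps ensemble disagreement to lambda, so the clipped linear map in `adaptive_lambda`, its `scale` parameter, and the variance-based `ensemble_disagreement` are assumptions for illustration; the bounds 0.01 and 0.1 simply reuse the fixed values mentioned in the abstract.

```python
from statistics import pvariance


def ensemble_disagreement(predictions):
    """Mean per-dimension variance across ensemble members.

    predictions: one equal-length list of predicted values per ensemble member.
    """
    variances = [pvariance(dim) for dim in zip(*predictions)]
    return sum(variances) / len(variances)


def adaptive_lambda(disagreement, lam_min=0.01, lam_max=0.1, scale=1.0):
    """Assumed mapping: more disagreement means stronger teacher guidance."""
    return max(lam_min, min(lam_max, lam_min + scale * disagreement))


def linear_decay_lambda(step, total_steps, lam_start=0.1, lam_end=0.01):
    """Deterministic schedule that ignores the ensemble entirely."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lam_start + frac * (lam_end - lam_start)
```

The sketch also shows the failure mode the abstract calls observability blindness: if every member predicts the same visible part of the observation, `ensemble_disagreement` is zero and `adaptive_lambda` returns `lam_min`, however much of the scene is occluded.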

2605.26147 2026-05-27 cs.LG Updated version

Neural Bayesian Sequential Routing

Yongchao Huang

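AI summary: Proposes the Neural Bayesian Sequential Routing (NBSR) framework, which models neural inference as active evidence accumulation over a directed acyclic graph and uses a Dirichlet-Categorical conjugate framework to support uncertainty quantification, early exiting, and resource-rational inference.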

Comments 71 pages

Abstract

Human decision-making is sequential and uncertainty-aware, yet standard neural networks often rely on static, dense forward computation with limited visibility into evidence acquisition, uncertainty evolution, or when computation should stop. We introduce Neural Bayesian Sequential Routing (NBSR), a framework that models neural inference as active evidence accumulation over a hierarchical Directed Acyclic Graph (DAG). Within a Dirichlet-Categorical conjugate framework, neural experts query a persistent global knowledge oracle to extract positive evidence vectors, which act as pseudo-counts and update a Dirichlet belief state by exact conjugate addition. Coupled with a Gumbel-Softmax Straight-Through estimator, this update enables hard, path-dependent routing while preserving surrogate gradients for end-to-end training. The resulting Dirichlet precision and entropy provide mechanisms for uncertainty quantification, entropy-based early exiting, OOD abstention, and cost-aware evidence acquisition. We prove that, under strictly positive evidence extraction, total Dirichlet precision increases monotonically along any valid trajectory and marginal predictive variance is bounded, formalizing sequential "hypothesis sharpening"; under idealized capacity and optimization assumptions, the terminal Dirichlet expectation recovers the Bayes-optimal conditional distribution. Empirical evaluations across visual categorization, structured medical diagnosis, language modeling, partially observable control, and cost-aware Bayesian experimental design show that NBSR achieves competitive predictive performance while providing transparent routing traces, path-dependent evidence attribution, uncertainty-aware decision control, and resource-rational inference. Overall, NBSR offers a mathematically grounded framework for interpretable, modular, and resource-rational agentic AI.
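The conjugate update in the abstract above is easy to check numerically. The sketch below uses only standard Dirichlet identities and leaves out the routing, the experts, and the oracle; the function names and the example evidence vector are illustrative, not taken from the paper.

```python
def update(alpha, evidence):
    """Exact conjugate addition: strictly positive evidence acts as pseudo-counts."""
    if any(e <= 0 for e in evidence):
        raise ValueError("evidence must be strictly positive")
    return [a + e for a, e in zip(alpha, evidence)]


def precision(alpha):
    """Total Dirichlet precision alpha_0 = sum_k alpha_k."""
    return sum(alpha)


def expectation(alpha):
    """Predictive class probabilities E[p_k] = alpha_k / alpha_0."""
    a0 = precision(alpha)
    return [a / a0 for a in alpha]


def marginal_variance(alpha):
    """Var[p_k] = alpha_k (alpha_0 - alpha_k) / (alpha_0^2 (alpha_0 + 1))."""
    a0 = precision(alpha)
    return [a * (a0 - a) / (a0 * a0 * (a0 + 1.0)) for a in alpha]


prior = [1.0, 1.0, 1.0]
posterior = update(prior, [2.0, 0.5, 0.5])
```

Because every evidence entry is positive, `precision(posterior)` always exceeds `precision(prior)`, and each marginal variance is at most 1 / (4 (alpha_0 + 1)), which shrinks as precision grows. An entropy or precision threshold on this state is one plausible trigger for early exiting.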

2605.26135 2026-05-27 cs.LG Updated version

SilIF: Silhouette-Augmented Isolation Forest for Unsupervised Transaction Fraud Detection

Venkatakrishnan Gopalakrishnan

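Affiliations: Independent Researcher

AI summary: Proposes SilIF, which augments Isolation Forest with a silhouette-based scoring layer; on the IEEE-CIS Fraud Detection benchmark it improves AUC-PR by 0.0080 on average and beats plain Isolation Forest on all five seeds.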

Comments 5 pages, 1 figure, 5 tables. Code: https://github.com/venkat15vk/silif-anomaly-detection

Abstract

Unsupervised anomaly detection is widely used in transaction fraud detection where labels are scarce. Isolation Forest (IF) is among the most popular classical methods due to its scalability and ease of deployment. We propose SilIF, an augmentation of Isolation Forest that adds a silhouette-based scoring layer computed in a representation space induced by the trees of the forest. For each point, we extract a vector of per-tree path lengths, cluster these "fingerprints" into structural groups, and compute a silhouette score that measures how well the point fits its assigned group versus the nearest alternative. The silhouette signal is combined with the base IF score via a single hyperparameter alpha. On the IEEE-CIS Fraud Detection benchmark (~590K transactions, 3.5% fraud), SilIF with alpha=1.0 improves over plain Isolation Forest by +0.0080 AUC-PR on average across five seeds, with SilIF winning on all five seeds (paired t-test p=0.046). We also report results on a synthetic credit-card dataset (Sparkov) where the silhouette augmentation does not improve over plain IF, and we characterize the conditions that distinguish the two outcomes. The paper presents SilIF as a tunable, easy-to-deploy enhancement to Isolation Forest with honest reporting of when it helps and when it does not. Code at https://github.com/venkat15vk/silif-anomaly-detection.
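The scoring layer described in the abstract above can be sketched in plain Python. The inputs are taken as given: `fingerprints` stands for the per-tree path-length vectors from a fitted Isolation Forest and `labels` for the output of a clustering step. The abstract does not state how the silhouette signal and the base score are combined, so the additive rule in `silif_score` is an assumption for illustration and not the released implementation.

```python
import math


def silhouette_scores(fingerprints, labels):
    """Silhouette value per point, using Euclidean distance between fingerprints."""
    scores = []
    for i, (x, own) in enumerate(zip(fingerprints, labels)):
        by_cluster = {}
        for j, (y, lab) in enumerate(zip(fingerprints, labels)):
            if j != i:
                by_cluster.setdefault(lab, []).append(math.dist(x, y))
        if own not in by_cluster or len(by_cluster) < 2:
            scores.append(0.0)  # singleton cluster or a single cluster: use 0
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])  # fit to own group
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def silif_score(if_scores, sil_scores, alpha=1.0):
    """Assumed combination: a poor cluster fit (low silhouette) raises the score."""
    return [s + alpha * (1.0 - sil) / 2.0 for s, sil in zip(if_scores, sil_scores)]


fps = [[1, 1], [1, 2], [8, 8], [8, 9], [4, 5]]
labels = [0, 0, 1, 1, 0]
sil = silhouette_scores(fps, labels)
combined = silif_score([0.5] * 5, sil)
```

In this toy data the last point sits between the two groups, so its silhouette is lower than that of a point well inside its group and its combined score is higher. With `alpha=0` the rule reduces to the plain Isolation Forest score.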

2605.26133 2026-05-27 cs.CL cs.AI cs.LG Updated version

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Ziyi Tong, Feifei Sun, Le Minh Nguyen

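Affiliations: Japan Advanced Institute of Science and Technology

AI summary: Presents the first unified survey of pretraining data exposure in large language models, covering membership inference and data contamination; it formalizes exposure levels, reviews attack and defense methods, and summarizes empirical findings and future research directions.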

Comments accepted by NLDB 2025

DOI: 10.1007/978-3-031-97144-0_14

Abstract

Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow, concerns about Pretraining Data Exposure (PDE) increase due to the scale and opacity of training datasets. PDE refers to determining whether specific data appeared in an LLM's pretraining corpus. It is critical for ensuring evaluation integrity and protecting privacy, intersecting two key areas: data contamination and membership inference. Though conceptually related, these areas have often been studied in isolation. This paper offers the first unified survey of both under the PDE framework. We formalize PDE across exposure levels, review attack and defense methods, synthesize empirical findings, and highlight open challenges and future research directions.

2605.26132 2026-05-27 cs.CL cs.LG Updated version

Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline

Tony Lee, Percy Liang

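Affiliations: Stanford University

AI summary: Proposes the Self-Verified Distillation algorithm, in which an LLM improves its reasoning using only unlabeled seed questions through self-generation, self-verification, and self-training, with significant gains on math, science, and coding tasks.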

Abstract

Can post-trained large language models (LLMs) further improve themselves using only unlabeled prompts, without external teachers or feedback from tools? We study this setting starting only from unlabeled seed questions with no ground-truth solutions, across three reasoning domains: math, science, and coding. We propose Self-Verified Distillation, a simple post-training refinement algorithm in which the model generates candidate solutions to these seed questions, filters them using prompt-based self-verification, and trains on the resulting self-curated dataset. Inspired by the UQ benchmark's use of multiple validators to screen candidate answers to hard unsolved questions, we adapt this validation-based filtering idea to self-training: the model filters its own generated solutions through a three-stage cascade of cycle-consistency, factuality, and correctness checks, accepting a solution only if it passes all stages with unanimous judge votes. We find that sampling more candidate generations and using a larger verification budget during training data construction produces higher-quality self-curated data and, in turn, better reasoning models. We then train Qwen3 models at multiple scales with Self-Verified Distillation and obtain gains across all three domains. For Qwen3-4B, our method improves aggregate held-out pass@1 by +16.7 points in math (AIME26 and HMMT), +11.1 points in science (GPQA Diamond and HLE), and +8.3 points in coding (LCBv5 and LCBv6), with gains also extending to 0.6B and 8B models. Compared to our test-time-only baseline (UQ-TTC), which improves performance by spending extra compute at inference time, Self-Verified Distillation achieves better performance in most settings while requiring only a single inference call at test time.
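The three-stage filter in the abstract above amounts to a cascade with unanimous voting at each stage. The sketch below shows only that control flow. In the paper the judges are prompt-based calls to the model itself; here they are toy string checks standing in for those calls, and the names `passes_stage`, `self_verify`, and `n_votes` are hypothetical.

```python
def passes_stage(judge, question, solution, n_votes):
    """A stage passes only if every sampled judge vote accepts the solution."""
    return all(judge(question, solution) for _ in range(n_votes))


def self_verify(question, candidates, stages, n_votes=3):
    """Keep candidates that pass every stage, checked in order."""
    return [
        sol for sol in candidates
        if all(passes_stage(judge, question, sol, n_votes) for judge in stages)
    ]


# Toy stand-ins for the cycle-consistency, factuality, and correctness judges.
stages = [
    lambda q, s: q in s,               # solution still refers to the question
    lambda q, s: "maybe" not in s,     # no hedged, unsupported claims
    lambda q, s: s.endswith("= 5"),    # final answer judged correct
]

candidates = ["2+3 = 5", "2+3 = 6", "maybe 2+3 = 5", "sum = 5"]
accepted = self_verify("2+3", candidates, stages)
```

Only the first candidate survives all three stages. Because the generator expressions short-circuit, a candidate that fails an early stage never reaches the later judges, which matters when each vote is a model call.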

2605.26130 2026-05-27 cs.LG physics.ao-ph Updated version

AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion

Somnath Luitel, Manmeet Singh, Joshua Durkee, Abdullah Al Fahad, Naveen Sudharsan, Prabhjot Singh, Cenlin He, Harsh Kamath, Zong-Liang Yang, Krishnagopal Halder, Sandeep Juneja, Parthasarathi Mukhopadhyay, Saptarishi Dhanuka, Amit Kumar Srivastava

Affiliations: Department of Earth, Environmental, and Atmospheric Sciences, Western Kentucky University, Bowling Green, KY, USA; NASA Goddard Space Flight Center, Greenbelt, MD, USA; The University of Texas at Austin, Austin, TX, USA; NSF National Center for Atmospheric Research, Boulder, CO, USA; Leibniz Centre for Agricultural Landscape Research (ZALF), Berlin, Germany; Ashoka University, Sonipat, India

AI summary: Proposes AirCast-SR, a foundation model that uses a latent consistency diffusion framework to downscale global AI weather forecasts from 0.25 degree to 1 km resolution, with near-zero bias and zero-shot transfer across regions.

Comments Somnath Luitel and Manmeet Singh are equal-contribution co-first authors, with Manmeet Singh (manmeet.singh@wku.edu) as corresponding author

Abstract

Operational weather prediction at kilometer scales remains computationally prohibitive for traditional numerical weather prediction (NWP) models, limiting forecast access for applications in energy, agriculture, and disaster management that require fine-grained spatiotemporal detail. Here we introduce AirCast-SR, a foundation model for atmospheric super-resolution that downscales global AI weather forecasts from 0.25 degree (~28 km) to 1 km horizontal resolution at hourly temporal resolution, producing 67-hour forecasts of eight coupled surface variables simultaneously. AirCast-SR employs a three-dimensional U-Net conditioned within a Latent Consistency Model (LCM) diffusion framework, trained on patch-based samples over the contiguous United States (CONUS) using GraphCast forecasts as input and NOAA's Analysis of Record for Calibration (AORC) as the target. The model achieves near-zero bias across all variables and lead times, and its radial power spectral density analysis demonstrates preservation of fine-scale atmospheric structure at wavelengths of 10 km to 100 km where coarser models lose spectral power. We validate AirCast-SR across three CONUS case studies spanning winter, summer, and spring seasons, and demonstrate zero-shot global transferability over India and Germany using independent surface station observations without any retraining or fine-tuning. As an open-weights foundation model, AirCast-SR establishes a new paradigm for kilometer-scale AI weather prediction and provides a platform for regional fine-tuning, distillation, and downstream applications in climate services and hazard forecasting.

2605.26128 2026-05-27 cs.LG cs.SE Updated version

The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models

Jaideep Ray

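Affiliations: ACM

AI summary: Proposes the "constraint tax" measurement protocol, shows experimentally that hard output constraints significantly reduce answer accuracy and executable accuracy for small language models, and recommends that production systems report schema validity, answer accuracy, executable accuracy, and wrong-valid-schema rate separately.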

Abstract

Production LLM systems increasingly require machine-readable outputs: JSON objects, typed traces, regex-constrained fields, and tool-call schemas. This paper targets on-device and low-cost small language model (SLM) deployments, where sub-3B models are attractive for privacy, latency, and commodity hardware but have limited capacity to satisfy schemas while solving tasks. The usual engineering assumption is that hard output constraints improve reliability without changing the underlying answer. We show that this assumption is unsafe for small models. We introduce constraint tax, a measurement protocol for isolating the answer and executable-accuracy loss caused by structured-output constraints at fixed model, fixed task distribution, and fixed problem instances. Across 15,000 commodity-GPU generations with Qwen2.5-0.5B, Qwen2.5-1.5B, and SmolLM2-1.7B, hard answer-only schema decoding raises schema validity from 61.5% to 100.0%, but lowers answer accuracy from 19.7% to 11.0% and increases wrong-valid-schema outputs from 49.5% to 88.9%. The strongest industry analogue is a deterministic calendar tool-call task: Qwen2.5-1.5B achieves 91.5% executable accuracy with prompt-only JSON but only 48.0% under the same hard tool-call schema, while both modes are 100.0% schema-valid. The error is semantic, not structural. We also show that the 3B boundary still pays a direct-schema tax and that delayed packaging supports a constructive design pattern: reason free, constrain late. The practical conclusion is direct: production systems should report schema validity, answer accuracy, executable accuracy, and wrong-valid-schema rate separately.
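Reporting the metrics separately, as the abstract above recommends, takes only a few lines. The sketch below is a reading of the abstract rather than the paper's protocol: the record fields `schema_valid` and `correct` are hypothetical, wrong-valid-schema is assumed to be the share of all outputs that parse but are wrong, and executable accuracy is left out because it needs a task-specific executor.

```python
def rates(records):
    """Per-mode metrics over the same problem instances."""
    n = len(records)
    return {
        "schema_validity": sum(r["schema_valid"] for r in records) / n,
        "answer_accuracy": sum(r["correct"] for r in records) / n,
        # Assumed definition: valid-but-wrong outputs as a share of all outputs.
        "wrong_valid_schema": sum(
            r["schema_valid"] and not r["correct"] for r in records
        ) / n,
    }


def constraint_tax(free_records, constrained_records):
    """Accuracy lost when moving from prompt-only to hard-constrained decoding."""
    return (rates(free_records)["answer_accuracy"]
            - rates(constrained_records)["answer_accuracy"])


free = [
    {"schema_valid": True, "correct": True},
    {"schema_valid": False, "correct": True},
    {"schema_valid": True, "correct": False},
    {"schema_valid": False, "correct": False},
]
constrained = [
    {"schema_valid": True, "correct": True},
    {"schema_valid": True, "correct": False},
    {"schema_valid": True, "correct": False},
    {"schema_valid": True, "correct": False},
]
```

In this toy data the constrained mode reaches full schema validity while answer accuracy drops, the same pattern as the headline figures, where 19.7% minus 11.0% gives a tax of 8.7 points.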

2605.26127 2026-05-27 physics.med-ph cs.LG eess.IV Updated version

PDEInvBench: A Comprehensive Dataset and Design-Space Exploration of Neural Networks for PDE Inverse Problems

Divyam Goel, Nithin Chalapathi, Sanjeev Raja, Aditi S. Krishnapriyan

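Affiliations: Department of Computer Science, UC Berkeley; UC Berkeley; Departments of Computer Science and Chemical Engineering, UC Berkeley; LBNL

AI summary: Proposes the PDEInvBench benchmark dataset, built from numerical simulations covering a range of PDEs, and systematically explores the neural network design space along three dimensions (optimization, representation, and scaling), yielding practical insights on two-stage training, PDE derivatives as input features, and diversity of initial conditions.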

Comments 37 total pages, 13 main pages, 20 figures, 8 tables. Published in Transactions on Machine Learning Research (TMLR), 2026

Journal ref: Transactions on Machine Learning Research, 2026

Abstract

Inverse problems in partial differential equations (PDEs) involve estimating the physical parameters of a system from observed spatiotemporal solution fields. Neural networks are well-suited for PDE parameter estimation due to their capability to model function-to-function space transformations. While existing benchmarks of machine learning methods for PDEs primarily focus on the forward problem, there are no similar comprehensive studies and benchmark datasets on PDE inverse problems, i.e., mapping solution fields to underlying physical parameters. We fill this gap by introducing PDEInvBench, a comprehensive benchmark dataset consisting of numerical simulations for both time-dependent and time-independent PDEs across a wide range of physical behaviors and parameters. Our dataset includes evaluation splits that assess performance in both in-distribution and various out-of-distribution settings. Using our benchmark dataset, we comprehensively explore the design space of neural networks for PDE inverse problems along three key dimensions: (1) optimization procedures, analyzing the role of supervised, self-supervised, and test-time training objectives on performance, (2) problem representations, where we study the value of architectural choices with different inductive biases and various conditioning strategies, and (3) scaling, which we perform with respect to both model and data size. Our experiments reveal several practical insights: 1) neural networks perform best with a two-stage training procedure: initial supervision with PDE parameters followed by test-time fine-tuning using the PDE residual, 2) incorporating PDE derivatives as input features consistently improves accuracy, and 3) increasing the diversity of initial conditions in the training data yields greater performance gains than expanding the range of PDE parameters. We make our dataset and codebase publicly available.

2605.24071 2026-05-27 cs.LG cs.AI Updated version

Not All Transitions Matter: Evidence from PPO

Ajhesh Basnet

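Affiliations: Department of Artificial Intelligence and Data Science; KPR Institute of Engineering and Technology

AI summary: Proposes randomly dropping a fixed fraction of rollout transitions during PPO training to break the repetitive gradient structure and stabilize training, and validates the effect across several environments.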

Comments 19 pages, 5 figures. Accepted to 2026 8th Asia Conference on Machine Learning and Computing (ACMLC 2026)

Journal ref: Proceedings of the 2026 8th Asia Conference on Machine Learning and Computing

Abstract

Training a reinforcement learning agent on-policy means collecting fresh experience at every update, and that experience comes with a hidden problem. Each state in a rollout is the direct output of the previous one, causally chained together by the agent's own actions. Because of this, consecutive transitions are never truly independent. They carry overlapping information, and the gradient signal the network receives ends up far more repetitive than the batch size suggests. The same directions get reinforced over and over, the value network struggles to keep up as the policy shifts, and training becomes quietly unstable in ways that reward curves alone rarely reveal. This paper asks whether that redundancy can simply be removed. We show that randomly dropping a fixed fraction of transitions from the rollout, at the right stage so the reward signal stays intact, is enough to break the repetitive gradient structure and stabilize training. The change is minimal: one sampling step, no new components, no modification to the core algorithm, and it works with any PPO implementation. Across five environments of increasing difficulty, CartPole-v1, Acrobot-v1, LunarLander-v2, HalfCheetah-v5, and Hopper-v5, the method matches vanilla PPO on reward while producing more consistent training dynamics across KL divergence, policy entropy, and value estimates. Dropping 25% of transitions turns out to be the sweet spot: enough to disrupt the redundancy, not enough to thin the batch.
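The sampling step in the abstract above might look like the following. The abstract says only that transitions are dropped "at the right stage so the reward signal stays intact"; this sketch assumes that means after returns have been computed over the full rollout. The batch layout, a single uninterrupted episode, and the function names are illustrative assumptions.

```python
import random


def discounted_returns(rewards, gamma=0.99):
    """Returns computed over the FULL rollout, before any transition is dropped."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def drop_transitions(batch, drop_frac=0.25, seed=0):
    """Keep a random subset of transitions; every field is indexed the same way."""
    n = len(batch["obs"])
    keep = n - int(round(drop_frac * n))
    idx = sorted(random.Random(seed).sample(range(n), keep))
    return {key: [values[i] for i in idx] for key, values in batch.items()}


rewards = [1.0] * 8
batch = {"obs": list(range(8)), "ret": discounted_returns(rewards, gamma=1.0)}
subset = drop_transitions(batch)  # 6 of 8 transitions remain
```

Each surviving transition keeps the return it had in the full rollout, so dropping its neighbours does not change its learning target; only the number of correlated samples in the PPO minibatches goes down.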

2605.24042 2026-05-27 cs.LG cs.AI Updated version

Hidden-State Privacy Has an Empty Middle

Alexander Okezue Bell

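Affiliations: Stanford University

AI summary: Shows, via a theoretical lower bound and experiments, that Gaussian release mechanisms for hidden-state privacy cannot achieve moderate utility and moderate privacy at the same time, leaving an empty middle, and presents the diagonal inverse-Fisher mechanism as the minimax-optimal diagonal release.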

Comments 74 pages, 61 figures

Abstract

Of $1{,}536$ Gaussian release covariances we tested for single-layer hidden-state privacy, zero achieve both moderate utility and moderate privacy against an adaptive retrieval attacker. We prove a complementary Fisher-ball lower bound: every full-rank Gaussian release at $O(1)$ Fisher utility admits a direction whose Mahalanobis signal grows linearly in hidden width, ruling out uniform Gaussian safety in the class and matching the empirical empty middle. The diagonal inverse-Fisher release $\Sigma^\star_{\mathrm{diag}}(\mathcal{K}) = (2\mathcal{K}/d)\,\mathrm{diag}(1/F_{ii})$ is the unique minimax-optimal diagonal mechanism at first-order KL budget $\mathcal{K}$ and the only release with worst-attacker top-1 $\le 0.001$ at every point of a 32 model-layer grid, but it sits on a privacy/utility edge rather than filling the middle. A generalized-eigen mechanism reaching $13\times$ Pareto reduction under Euclidean retrieval collapses to $100\%$ top-1 under the adaptive Mahalanobis attacker, and a full-trajectory sequence inverter recovers $94\%$ of clean GPT-2 prefixes but $0\%$ under $\Sigma_{\mathrm{diag}}$. A split-memory transformer trained from scratch reaches $G_{\mathrm{Mah}} \in [20, 33]$ at 90M and maintains a $6$--$24\times$ advantage over same-budget GPT baselines from 30M to 1B at a fixed-token language-modeling loss penalty; pretrained models top out at 9.3. These results reframe hidden-state release from mechanism-design within the Gaussian class to architecture or release co-design.
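The diagonal release formula in the abstract above can be evaluated directly. In the sketch below, the variance formula (2K/d) / F_ii is taken from the abstract; the budget check uses the approximation KL ≈ ½ Σ F_ii σ_i², which is an assumed reading of "first-order KL budget", and the Fisher values are made up for illustration.

```python
import random


def diag_inverse_fisher_variances(fisher_diag, kl_budget):
    """Per-coordinate noise variance sigma_i^2 = (2K / d) / F_ii."""
    d = len(fisher_diag)
    return [(2.0 * kl_budget / d) / f for f in fisher_diag]


def approx_kl(fisher_diag, variances):
    """Assumed budget accounting: KL is roughly 0.5 * sum_i F_ii * sigma_i^2."""
    return 0.5 * sum(f * v for f, v in zip(fisher_diag, variances))


def release(hidden, variances, rng):
    """Add independent zero-mean Gaussian noise to each hidden coordinate."""
    return [h + rng.gauss(0.0, v ** 0.5) for h, v in zip(hidden, variances)]


fisher = [0.5, 2.0, 4.0, 1.0]   # illustrative diagonal Fisher entries
variances = diag_inverse_fisher_variances(fisher, kl_budget=0.1)
noisy = release([0.3, -1.2, 0.7, 0.0], variances, random.Random(0))
```

Under that approximation every coordinate spends the same share K/d of the budget, so coordinates with small Fisher information receive more noise and sensitive ones receive less.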

2605.24038 2026-05-27 physics.space-ph astro-ph.EP astro-ph.IM cs.LG Updated version

Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

Carol Xuan Long, David Simchi-Levi, Feng Zhu, Huangyuan Su, Andre P. Calmon, Flavio P. Calmon

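Affiliations: Harvard University; MIT/Purdue; MIT; Harvard University/Kempner Institute; Georgia Tech

AI summary: Studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game; it finds that model capability is the dominant performance factor but that average performance masks reliability risks, introduces the agent bullwhip effect, and proposes a GRPO-based post-training framework to improve reliability.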

Abstract

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
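One way to quantify the run-to-run instability described in the abstract above is to hold the demand path fixed, repeat the simulation, and compare order variance across echelons. The paper's exact metric is not given in the abstract, so the variance-ratio definition below, the function names, and the order data are assumptions for illustration.

```python
from statistics import mean, pvariance


def run_to_run_variance(runs):
    """Average, over periods, of the order variance across repeated runs.

    runs: one order sequence per run for a single facility, all generated
    under the same fixed demand path.
    """
    return mean(pvariance(period) for period in zip(*runs))


def amplification(downstream_runs, upstream_runs):
    """Ratio of upstream to downstream run-to-run variance."""
    down = run_to_run_variance(downstream_runs)
    up = run_to_run_variance(upstream_runs)
    return float("inf") if down == 0 else up / down


# Illustrative orders from three runs under identical customer demand.
retailer = [[4.0, 4.0, 4.0], [4.0, 5.0, 4.0], [4.0, 3.0, 4.0]]
wholesaler = [[4.0, 4.0, 4.0], [4.0, 8.0, 4.0], [4.0, 0.0, 4.0]]
ratio = amplification(retailer, wholesaler)
```

Because demand is identical in every run, any variance measured here comes from the agents' own stochastic decisions, which is the component the abstract calls decision bullwhip. A ratio above 1 indicates that this instability grows as it moves upstream.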

2605.16457 2026-05-27 cs.LG cs.AI cs.CV 版本更新

Identifiable Token Correspondence for World Models

可辨识的令牌对应关系用于世界模型

Youngin Kim, Ray Sun, Inho Kim, Bumsoo Park, Hyun Oh Song

发表机构 * Interdisciplinary Program in Artificial Intelligence, Seoul National University（首尔国立大学人工智能跨学科项目）； Department of Computer Science and Engineering, Seoul National University（首尔国立大学计算机科学与工程系）

AI总结提出可辨识的令牌对应关系（ITC）方法，通过将下一帧预测建模为结构化分配问题，解决基于令牌的Transformer世界模型在长程推演中的时间不一致性，在四个基准上达到最先进性能。

AI中文摘要

基于令牌的Transformer世界模型在视觉强化学习中表现出色，但常在长程推演中出现时间不一致性，包括对象重复、消失和变形。一个关键原因是大多数现有方法将下一帧预测纯粹视为令牌生成问题，而未考虑令牌在时间上的持续性。我们引入可辨识的令牌对应关系（ITC），这是一种用于基于令牌的Transformer世界模型的解码步骤，将下一帧预测建模为具有潜在令牌对应变量的结构化分配问题：每个下一帧令牌要么通过从上一帧复制令牌来解释，要么通过生成新令牌来解释。ITC保持Transformer架构和训练过程不变，可以添加到现有骨干网络上。我们的实验在4个具有挑战性的基准上展示了最先进的性能。所提出的方法在Craftax-classic基准上实现了72.5%的回报率和35.6%的分数，显著超过了之前的最佳结果67.4%和27.9%。我们在https://github.com/snu-mllab/Identifiable-Token-Correspondence上发布了源代码。

英文摘要

Token-based transformer world models have shown strong performance in visual reinforcement learning, but often suffer from temporal inconsistency in long-horizon rollouts, including object duplication, disappearance, and transmutation. A key reason is that most existing approaches treat next-frame prediction purely as a token generation problem, without considering the persistence of tokens across time. We introduce Identifiable Token Correspondence (ITC), a decoding step for token-based transformer world models that formulates next-frame prediction as a structured assignment problem with latent token correspondence variables: each next-frame token is explained either by copying a token from the previous frame or by generating a new one. ITC leaves the transformer architecture and training procedure unchanged and can be added on top of existing backbones. Our experiments show state-of-the-art performance on 4 challenging benchmarks. The proposed method achieves a return of 72.5% and a score of 35.6% on the Craftax-classic benchmark, significantly surpassing the previous best of 67.4% and 27.9%. We release our source code on https://github.com/snu-mllab/Identifiable-Token-Correspondence.

2605.04880 2026-05-27 cs.LG cs.AI 版本更新

A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs

SMDP中平均奖励强化学习的调和均值公式

Erel Shtossel, Alicia Vidler, Uri Shaham, Gal A. Kaminka

发表机构 * Bar Ilan University（巴伊兰大学）

AI总结针对无限时域非回合制任务中的平均奖励强化学习，提出一种修正的调和均值算子，解决SMDP中奖励和持续时间非平稳时的奖励率计算问题，并证明其理论性质及有效性。

Journal ref: https://alaworkshop2026.github.io/papers/ALA2026_paper_57.pdf

AI中文摘要

最近的研究重新激发并增强了对无限时域、非回合制（持续）任务中未折扣平均奖励强化学习算法的兴趣。半马尔可夫决策过程（SMDP）尤其引人关注。在SMDP中，离散动作随机产生奖励和持续时间，目标是优化平均奖励率。现有算法通过优化奖励与持续时间的比率来逼近这一目标。然而，当奖励和持续时间（在无限时域中）非平稳时，这种方法可能不正确。本文提出一种新颖的修正调和均值算子，即使在上述条件下也能正确计算奖励率。这产生了可以与SMDP一起工作的无模型学习算法，同时保持对随时间变化的非平稳奖励和持续时间分布的鲁棒性。我们证明了修正调和均值算子的理论性质，并通过实验与现有算法相比展示了其有效性。

英文摘要

Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are non-stationary (in the infinite horizon), this can be incorrect. This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under such conditions. This yields model-free learning algorithms that can work with SMDPs, while maintaining robustness to non-stationary reward and duration distributions over time. We prove theoretical properties of the modified harmonic mean operator, and empirically demonstrate its efficacy in comparison to existing algorithms.
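To see why harmonic means are natural for reward rates, the sketch below uses a textbook identity: the reward-weighted harmonic mean of per-transition rates equals total reward divided by total duration, while the arithmetic mean of rates does not. This is not the paper's modified operator and it does not address the non-stationary case. The function names and numbers are illustrative assumptions.

```python
def ratio_of_totals(rewards, durations):
    """Average reward rate: total reward divided by total elapsed time."""
    return sum(rewards) / sum(durations)


def arithmetic_mean_of_rates(rewards, durations):
    """Naive average of per-transition rates r_i / tau_i."""
    rates = [r / d for r, d in zip(rewards, durations)]
    return sum(rates) / len(rates)


def reward_weighted_harmonic_mean(rewards, durations):
    """Harmonic mean of per-transition rates, weighted by reward.

    Assumes every reward is nonzero so each rate is invertible.
    """
    rates = [r / d for r, d in zip(rewards, durations)]
    return sum(rewards) / sum(r / rate for r, rate in zip(rewards, rates))


# Hypothetical transitions: a short low-reward step and a long one.
rewards = [1.0, 4.0]
durations = [1.0, 8.0]

print(ratio_of_totals(rewards, durations))                # 5 / 9
print(arithmetic_mean_of_rates(rewards, durations))       # (1.0 + 0.5) / 2
print(reward_weighted_harmonic_mean(rewards, durations))  # matches 5 / 9
```

The arithmetic mean overweights the short transition, which is one reason averaging rates directly can misstate the long-run reward rate.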

2605.02207 2026-05-27 cs.CV cs.AI cs.LG 版本更新

MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Constrained Settings

MultiSense-Pneumo：面向资源受限环境中肺炎筛查的多模态学习框架

Dineth Jayakody, Pasindu Thenahandi, Chameli Dommanige

发表机构 * Department of Computer Science, Old Dominion University, VA, USA（美国弗吉尼亚州欧道明大学计算机科学系）

AI总结提出MultiSense-Pneumo多模态原型系统，整合症状、咳嗽音频、语音和胸片，通过可解释的后期融合实现肺炎筛查与分诊支持。

AI中文摘要

肺炎仍然是全球发病率和死亡率的主要原因，尤其是在低资源环境中，那里缺乏影像学、实验室检测和专科护理。临床评估依赖于异质性证据，包括症状、呼吸模式、口头描述和胸部影像，使得一线筛查本质上是多模态的。然而，许多现有的计算方法仍然是单模态的，并且主要关注放射影像。在这项工作中，我们提出了MultiSense-Pneumo，一个面向肺炎筛查和分诊支持的多模态研究原型，它整合了结构化症状描述符、咳嗽音频、口语和胸部X光片。该系统结合了确定性症状分诊、基于LightGBM的声学分类、使用ResNet-18的域对抗放射影像分析、基于Transformer的语音识别以及可解释的后期融合算子。每个模态被转换为归一化的关注信号，并聚合为统一的筛查估计。融合权重是手动指定的，被视为启发式、可解释的参数，而不是学习或临床优化的值。MultiSense-Pneumo的设计考虑了在标准笔记本电脑级硬件上的离线执行，但并未作为经过部署验证或临床验证的诊断系统呈现。实验结果表明，在合成域偏移下，放射影像路径具有强大的组件级性能，同时也突出了重要的局限性，特别是咳嗽声学的异常类别召回率降低以及缺乏配对的端到端多模态患者评估。因此，MultiSense-Pneumo旨在作为筛查和分诊研究的框架和组件级原型。

英文摘要

Pneumonia remains a leading global cause of morbidity and mortality, particularly in low-resource settings where access to imaging, laboratory testing, and specialist care is limited. Clinical assessment relies on heterogeneous evidence, including symptoms, respiratory patterns, spoken descriptions, and chest imaging, making frontline screening inherently multimodal. However, many existing computational approaches remain unimodal and focus primarily on radiographs. In this work, we present MultiSense-Pneumo, a multimodal research prototype for pneumonia-oriented screening and triage support that integrates structured symptom descriptors, cough audio, spoken language, and chest radiographs. The system combines deterministic symptom triage, LightGBM-based acoustic classification, domain-adversarial radiograph analysis using ResNet-18, transformer-based speech recognition, and an interpretable late-fusion operator. Each modality is transformed into a normalized concern signal and aggregated into a unified screening estimate. The fusion weights are hand-specified and are treated as heuristic, interpretable parameters rather than learned or clinically optimized values. MultiSense-Pneumo is implemented with offline execution in mind on standard laptop-class hardware, but it is not presented as a deployment-validated or clinically validated diagnostic system. Experimental results demonstrate strong component-level performance of the radiograph pathway under synthetic domain shifts, while also highlighting important limitations, especially reduced abnormal-class recall for cough acoustics and the absence of paired end-to-end multimodal patient evaluation. MultiSense-Pneumo is therefore intended as a framework and component-level prototype for screening and triage research.
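The abstract says each modality is mapped to a normalized concern signal and combined with hand-specified fusion weights. A minimal sketch of such a late-fusion step is given below, assuming a weighted average with renormalization over the modalities that are present. The weights, modality keys, and the averaging form itself are hypothetical, since the abstract does not state them.

```python
# Hypothetical hand-specified weights; the paper's actual values are not given here.
FUSION_WEIGHTS = {"symptoms": 0.3, "cough": 0.2, "speech": 0.1, "radiograph": 0.4}


def late_fusion(concerns, weights=FUSION_WEIGHTS):
    """Weighted average of per-modality concern signals in [0, 1].

    Modalities missing from `concerns` are skipped and the remaining
    weights are renormalized (an assumed policy for missing inputs).
    """
    for name, value in concerns.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"concern for {name!r} must lie in [0, 1]")
    total_weight = sum(weights[name] for name in concerns)
    return sum(weights[name] * concerns[name] for name in concerns) / total_weight


full_case = {"symptoms": 0.8, "cough": 0.4, "speech": 0.5, "radiograph": 0.9}
partial_case = {"symptoms": 0.8, "radiograph": 0.9}

print(late_fusion(full_case))
print(late_fusion(partial_case))
```

Keeping the weights in a plain dictionary makes the fusion rule easy to inspect, which matches the interpretability goal the abstract states.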

2410.18915 2026-05-27 cs.DS cs.LG 版本更新

Testing Support Size More Efficiently Than Learning Histograms

比学习直方图更高效地测试支撑大小

Renato Ferreira Pinto, Nathaniel Harms

发表机构 * Columbia University, USA（哥伦比亚大学，美国）； University of British Columbia, Canada（不列颠哥伦比亚大学，加拿大）； University of Waterloo（滑铁卢大学）； EPFL（洛桑联邦理工学院）

AI总结针对未知概率分布p的支撑大小测试问题，提出一种基于切比雪夫多项式的方法，仅需O(n/(ε log n) log(1/ε))个样本，优于学习直方图的Θ(n/(ε^2 log n))样本，并给出支撑大小的更大下界。

Comments 42 pages. This is the TheoretiCS journal version

DOI: 10.46298/theoretics.26.10
Journal ref: TheoretiCS, Volume 5 (May 21, 2026) theoretics:16717

AI中文摘要

考虑关于未知概率分布$p$的两个问题： 1. 需要多少来自$p$的样本才能测试$p$是否支撑在$n$个元素上？具体来说，给定来自$p$的样本，判断它是否支撑在至多$n$个元素上，或者它在总变差距离上“$ε$-远离”支撑在$n$个元素上。 2. 给定来自$p$的$m$个样本，我们能对其支撑大小产生的最大下界是多少？问题(1)的最佳已知上界使用了一种学习分布$p$的直方图的通用算法，该算法需要$Θ(\tfrac{n}{ε^2 \log n})$个样本。我们表明，测试可以比学习直方图更高效，仅需$O(\tfrac{n}{ε\log n} \log(1/ε))$个样本，几乎匹配最佳已知下界$Ω(\tfrac{n}{ε\log n})$。该算法还为问题(2)提供了更好的解决方案，比先前工作产生更大的支撑大小下界。证明依赖于对切比雪夫多项式近似在其设计范围之外的分析，本文旨在作为切比雪夫多项式方法的易于理解的自包含阐述。

英文摘要

Consider two problems about an unknown probability distribution $p$: 1. How many samples from $p$ are required to test if $p$ is supported on $n$ elements or not? Specifically, given samples from $p$, determine whether it is supported on at most $n$ elements, or it is "$ε$-far" (in total variation distance) from being supported on $n$ elements. 2. Given $m$ samples from $p$, what is the largest lower bound on its support size that we can produce? The best known upper bound for problem (1) uses a general algorithm for learning the histogram of the distribution $p$, which requires $Θ(\tfrac{n}{ε^2 \log n})$ samples. We show that testing can be done more efficiently than learning the histogram, using only $O(\tfrac{n}{ε\log n} \log(1/ε))$ samples, nearly matching the best known lower bound of $Ω(\tfrac{n}{ε\log n})$. This algorithm also provides a better solution to problem (2), producing larger lower bounds on support size than what follows from previous work. The proof relies on an analysis of Chebyshev polynomial approximations outside the range where they are designed to be good approximations, and the paper is intended as an accessible self-contained exposition of the Chebyshev polynomial method.
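To make the gap between the two bounds concrete, the sketch below evaluates both expressions with all hidden constants set to 1, which is an assumption made purely for illustration. The ratio of the histogram-learning bound to the tester bound is then $1/(ε \log(1/ε))$, independent of $n$.

```python
import math


def histogram_learning_samples(n, eps):
    """n / (eps^2 log n), the histogram-learning rate with constant 1."""
    return n / (eps ** 2 * math.log(n))


def support_tester_samples(n, eps):
    """n / (eps log n) * log(1/eps), the tester's rate with constant 1."""
    return n / (eps * math.log(n)) * math.log(1 / eps)


n = 10 ** 6
for eps in (0.1, 0.01, 0.001):
    learn = histogram_learning_samples(n, eps)
    test = support_tester_samples(n, eps)
    print(f"eps={eps}: learn={learn:.3e}, test={test:.3e}, ratio={learn / test:.1f}")
```

The saving grows as $ε$ shrinks, since the tester pays $\log(1/ε)/ε$ where histogram learning pays $1/ε^2$.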

2503.22823 2026-05-27 quant-ph cs.IT cs.LG math.IT 版本更新

Quantum Doeblin Coefficients: Interpretations and Applications

量子Doeblin系数：解释与应用

Ian George, Christoph Hirche, Theshani Nuradha, Mark M. Wilde

发表机构 * Centre for Quantum Technologies, National University of Singapore, Singapore 117543, Singapore（量子技术中心，新加坡国立大学）； School of Electrical and Computer Engineering, Cornell University, Ithaca, New York 14850, USA（电气与计算机工程学院，康奈尔大学）

AI总结本文定义并研究了量子Doeblin系数，提供了多种解释（如最小单态分数、排除值等），并展示了其在量子机器学习、误差缓解、量子假设检验和时变信道等领域的应用。

Comments v3: 108 pages, 5 figures, added some summary tables, added proof of reducing to classical Doeblin on classical channels, and another multiplicativity result; v2: 104 pages, 5 figures, expanded the application section on mixing, indistinguishability, and decoupling times; v1: 88 pages, 2 figures

DOI: 10.22331/q-2026-05-22-2115
Journal ref: Quantum 10, 2115 (2026)

AI中文摘要

在经典信息论中，经典信道的Doeblin系数提供了信道全变差收缩系数的可有效计算的上界，从而导致了所谓的强数据处理不等式。在这里，我们研究量子Doeblin系数作为经典概念的推广。特别地，我们定义了各种新的量子Doeblin系数，其中一种具有几个理想性质，包括级联性和可乘性，此外还能有效计算。我们还发展了两种量子Doeblin系数的各种解释，包括作为最小单态分数、排除值、反向最大互信息和oveloH信息、反向鲁棒性以及假设检验反向互信息和oveloH信息的表示。我们将量子Doeblin系数解释为纠缠辅助或非辅助的排除值特别有吸引力，表明它们与通过使用信道在状态排除任务中能达到的最佳可能错误概率成正比。我们还概述了量子Doeblin系数的各种应用，范围从对使用参数化量子电路的量子机器学习算法的限制（噪声诱导的贫瘠高原）、对误差缓解协议的限制、对噪声量子假设检验的样本复杂性的限制，以及对时变信道的混合性、可区分性和解耦时间的限制。所有这些应用都利用了量子Doeblin系数出现在信道各种迹距离收缩系数的上界中这一事实。此外，在所有这些应用中，我们使用Doeblin系数的分析在通用性和可有效计算性方面，对先前文献的贡献提供了各种改进。

英文摘要

In classical information theory, the Doeblin coefficient of a classical channel provides an efficiently computable upper bound on the total-variation contraction coefficient of the channel, leading to what is known as a strong data-processing inequality. Here, we investigate quantum Doeblin coefficients as a generalization of the classical concept. In particular, we define various new quantum Doeblin coefficients, one of which has several desirable properties, including concatenation and multiplicativity, in addition to being efficiently computable. We also develop various interpretations of two of the quantum Doeblin coefficients, including representations as minimal singlet fractions, exclusion values, reverse max-mutual and oveloH informations, reverse robustnesses, and hypothesis testing reverse mutual and oveloH informations. Our interpretations of quantum Doeblin coefficients as either entanglement-assisted or unassisted exclusion values are particularly appealing, indicating that they are proportional to the best possible error probabilities one could achieve in state-exclusion tasks by making use of the channel. We also outline various applications of quantum Doeblin coefficients, ranging from limitations on quantum machine learning algorithms that use parameterized quantum circuits (noise-induced barren plateaus), on error mitigation protocols, on the sample complexity of noisy quantum hypothesis testing, and on mixing, distinguishability, and decoupling times of time-varying channels. All of these applications make use of the fact that quantum Doeblin coefficients appear in upper bounds on various trace-distance contraction coefficients of a channel. Furthermore, in all of these applications, our analysis using Doeblin coefficients provides improvements of various kinds over contributions from prior literature, both in terms of generality and being efficiently computable.
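For the classical case mentioned in the first sentence, the standard definitions are $α(W) = \sum_y \min_x W(y|x)$ for the Doeblin coefficient and $η_{TV}(W) = \max_{x,x'} \tfrac{1}{2}\sum_y |W(y|x) - W(y|x')|$ for the total-variation contraction coefficient, with $η_{TV}(W) \le 1 - α(W)$. The sketch below checks this bound numerically on two small channels. It covers only the classical setting, not the quantum coefficients the paper defines, and the example channels are made up.

```python
from itertools import combinations


def doeblin_coefficient(channel):
    """alpha(W) = sum over outputs y of min over inputs x of W(y|x).

    channel: list of rows, row x is the output distribution W(.|x).
    """
    num_outputs = len(channel[0])
    return sum(min(row[y] for row in channel) for y in range(num_outputs))


def tv_contraction_coefficient(channel):
    """Largest total-variation distance between two output distributions."""
    return max(
        0.5 * sum(abs(a - b) for a, b in zip(row_x, row_z))
        for row_x, row_z in combinations(channel, 2)
    )


# Binary symmetric channel with crossover probability 0.1: the bound is tight.
bsc = [[0.9, 0.1], [0.1, 0.9]]
# A hypothetical 3-input channel where the bound is strict.
cyclic = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]

for name, channel in (("bsc", bsc), ("cyclic", cyclic)):
    alpha = doeblin_coefficient(channel)
    eta = tv_contraction_coefficient(channel)
    print(name, "alpha =", alpha, "eta_TV =", eta, "bound =", 1 - alpha)
```

Computing $α(W)$ takes a single pass over the channel matrix, whereas $η_{TV}$ requires comparing all input pairs, which illustrates the "efficiently computable" advantage in the classical case.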

2605.18866 2026-05-27 cs.LG cs.AI 版本更新

FLUIDSPLAT: Reconstructing Physical Fields from Sparse Sensors via Gaussian Primitives

FLUIDSPLAT: 通过高斯原语从稀疏传感器重建物理场

Huaxi Huang, Meng Li, Zhengqing Gao, Xi Zhou, Xiaoshui Huang, Xiao Sun

发表机构 * Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； The Hong Kong University of Science and Technology（香港科技大学）； Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； Shanghai Jiaotong University（上海交通大学）

AI总结提出FLUIDSPLAT模型，利用高斯原语作为空间显式中间表示，从稀疏传感器数据重建流场，理论分析了表示能力与观测数的关系，并在多个基准上实现误差降低11-28%。

Comments 24 pages, 5 figures, preprint

AI中文摘要

从稀疏表面安装的传感器重建连续流场是空气动力学设计、流动控制和数字孪生仪器的核心。现有的神经方法通常将传感器读数编码为隐式潜在代码，空间可解释性差，且关于表示能力应如何随观测数量扩展的正式指导有限。受3D高斯泼溅启发，我们引入FLUIDSPLAT，一种传感器条件模型，预测K个各向异性高斯原语，形成单位划分支架，即流场的空间显式且可解释的中间表示。对于理想化的高斯原语估计器，我们证明了对于具有Sobolev光滑度s的场，逼近率为$O(K^{-s/d})$；结合N个含噪声观测，得到偏差$O(K^{-2s/d})$和方差$O(σ^{2}K/N)$的平方风险分解。平衡两者得到$K^{*}\!\sim\!(N/σ^{2})^{d/(2s+d)}$：在稀疏传感下原语数量不能自由增长，揭示了方差瓶颈，促使用状态条件残差解码器补充支架。在涵盖2D和3D的四个基准（圆柱绕流、AirfRANS、FlowBench LDC-3D和PhySense-Car 3D）上，FLUIDSPLAT相比多个强基线实现了11-28%的误差降低。

英文摘要

Reconstructing continuous flow fields from sparse surface-mounted sensors is central to aerodynamic design, flow control, and digital-twin instrumentation. Existing neural methods for this task typically encode sensor readings into implicit latent codes with little spatial interpretability and limited formal guidance on how representational capacity should scale with observation count. Inspired by 3D Gaussian Splatting, we introduce FLUIDSPLAT, a sensor-conditioned model that predicts K anisotropic Gaussian primitives forming a partition-of-unity scaffold, a spatially explicit and interpretable intermediate representation of the flow. For an idealized Gaussian primitive estimator, we prove an $O(K^{-s/d})$ approximation rate for fields with Sobolev smoothness $s$; incorporating $N$ noisy observations yields a squared-risk decomposition with bias $O(K^{-2s/d})$ and variance $O(σ^{2}K/N)$. Balancing the two yields $K^{*}\!\sim\!(N/σ^{2})^{d/(2s+d)}$: primitive count cannot grow freely under sparse sensing, revealing a variance bottleneck that motivates complementing the scaffold with a state-conditioned residual decoder. Across four benchmarks spanning 2D and 3D (cylinder flow, AirfRANS, FlowBench LDC-3D, and PhySense-Car 3D), FLUIDSPLAT achieves 11-28% error reduction over several strong baselines.
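The balancing step behind $K^{*}$ can be spelled out from the two rates quoted in the abstract. The derivation below ignores constants, and the final risk rate is a direct consequence of substituting $K^{*}$ back in rather than a figure quoted from the paper.

```latex
\begin{align*}
  \underbrace{K^{-2s/d}}_{\text{bias}}
    &\asymp \underbrace{\frac{\sigma^{2} K}{N}}_{\text{variance}}
  \;\Longrightarrow\;
  K^{\,1 + 2s/d} \asymp \frac{N}{\sigma^{2}}
  \;\Longrightarrow\;
  K^{*} \asymp \left(\frac{N}{\sigma^{2}}\right)^{\frac{d}{2s+d}}, \\[4pt]
  \text{risk}(K^{*})
    &\asymp (K^{*})^{-2s/d}
    \asymp \left(\frac{\sigma^{2}}{N}\right)^{\frac{2s}{2s+d}}.
\end{align*}
```

Since the exponent $d/(2s+d)$ is below 1, the useful number of primitives grows sublinearly in $N$, which is the variance bottleneck the abstract refers to.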

2605.18592 2026-05-27 cs.LG cs.AI cs.CL 版本更新

AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning

AMARIS: 一种用于基于评分标准的强化学习的记忆增强评分标准改进系统

Peilin Wu, Xinlu Zhang, Kun Wan, Wentian Zhao, Gang Wu, Xinya Du, Zhiyu Chen

发表机构 * The University of Texas at Dallas（德克萨斯大学达拉斯分校）； Adobe Inc.（Adobe公司）； Department of Computer Science, University of California, Santa Barbara（加州大学圣芭芭拉分校计算机科学系）

AI总结提出AMARIS系统，通过持久化评估记忆存储纵向训练证据来改进评分标准，在科学、医学、指令遵循和创意写作任务上优于静态、局部自适应和无记忆基线方法。

Comments Preprint. Under review

AI中文摘要

基于评分标准的奖励塑形为通过强化学习（RL）微调大语言模型（LLMs）提供了可解释且可编辑的奖励信号，但现有的自适应评分标准方法通常从局部证据（如当前批次或实例级比较）更新标准。这种局部视角丢弃了训练过程中产生的诊断信息，使得难以跟踪重复失败、评估之前的评分标准编辑或在早期标准饱和后提高标准。我们引入了AMARIS，一种记忆增强的评分标准改进系统，它将评分标准更新建立在纵向训练证据之上。AMARIS将轨迹分析、步骤级摘要和评分标准更新记录存储在持久化评估记忆中，然后检索最近和语义相关的历史来修订评分标准。我们在全局和实例特定评分标准设置下，在科学、医学、指令遵循和创意写作任务上评估了AMARIS。AMARIS在静态、局部自适应和无记忆基线上有所改进，例如在GPQA-Diamond上比最强基线高出+2.8分，在IFBench上高出+2.2分，同时分析表明记忆减少了振荡性的评分标准编辑，并支持从早期错误纠正到后期课程推进的进展。AMARIS与正常RL循环异步运行，相对于同步评分标准更新减少了阻塞延迟。

英文摘要

Rubric-based reward shaping provides interpretable and editable reward signals for fine-tuning LLMs via reinforcement learning (RL), but existing adaptive rubric methods typically update criteria from local evidence such as the current batch or instance-level comparisons. This local view discards diagnostic information produced during training, making it difficult to track recurring failures, evaluate previous rubric edits, or raise standards once earlier criteria become saturated. We introduce AMARIS, A Memory-Augmented Rubric Improvement System that grounds rubric updates in longitudinal training evidence. AMARIS stores rollout analyses, step-level summaries, and rubric update records in a persistent evaluation memory, then retrieves recent and semantically relevant history to revise rubrics. We evaluate AMARIS across science, medicine, instruction following, and creative writing under both global and instance-specific rubric settings. AMARIS improves over static, local-adaptive, and memory-ablated baselines, such as +2.8 points on GPQA-Diamond and +2.2 points on IFBench over the strongest baselines, while analysis shows that memory reduces oscillatory rubric edits and supports a progression from early failure correction to later curriculum advancement. AMARIS runs asynchronously alongside the normal RL loop, reducing blocking latency relative to synchronous rubric updates.

2605.17482 2026-05-27 cs.CL cs.LG 版本更新

RSD: A Local Triangulation Audit Primitive for Learned Vector Blocks

RSD：一种用于学习向量块的局部三角剖分审计原语

Seungmin Jin

发表机构 * HSE University（俄罗斯高等经济大学）

AI总结提出RSD（关系语义分解）作为局部三角剖分审计方法，通过拟合单纯形成员关系和坐标极点，结合关系解码器和坐标残差，实现学习向量块的可解释性审计。

Comments 8 pages, 1 figure. Revised version with clarified scope, experiments, and limitations

AI中文摘要

局部XAI审计将有限的学习向量块与弱侧信号进行比较。基线方法如最近邻查找、低秩坐标模型和关系分解揭示了审计的不同部分。我们引入关系语义分解（简称RSD），作为学习向量块的局部三角剖分审计。给定坐标X和一个声明的有界弱亲和代理A，RSD拟合单纯形成员关系S和坐标极点C。它在关系解码器中重用S来解码A，并报告坐标残差R=X-SC。这产生了一个范围限定的审计单元：所选块、代理、解码器类和损失预算的兼容性，以及组件质量和残差读数。合成控制检查单纯形重构、代理解码和固定S残差分解。定理陈述、月份和狗/狼块说明了为什么低代理损失应结合组件质量、残差读数和块大小来解读。

英文摘要

Local XAI audits compare a finite block of learned vectors with a weak side signal. Baselines such as nearest-neighbor lookup, low-rank coordinate models, and relation factorization expose different parts of this audit. We introduce Relational Semantic Decomposition, abbreviated as RSD, as a local triangulation audit for learned vector blocks. Given coordinates X and a declared bounded weak affinity proxy A, RSD fits simplex memberships S and coordinate poles C. It reuses S in a relation decoder for A and reports the coordinate residual R=X-SC. This yields a scoped audit unit: compatibility for the chosen block, proxy, decoder class, and loss budget, plus component mass and residual readouts. Synthetic controls check simplex reconstruction, proxy decoding, and fixed-S residual decomposition. The theorem-statement, month, and dog/wolf blocks illustrate why low proxy loss should be read with component mass, residual readouts, and block size.
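The coordinate residual $R = X - SC$ can be illustrated with a few lines of plain Python. The sketch assumes $S$ and $C$ are already given; how RSD fits them, and how the relation decoder for $A$ works, are outside this example. The helper names and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def coordinate_residual(x, s, c):
    """R = X - S C, where each row of S lies on the probability simplex."""
    for row in s:
        if min(row) < 0 or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each membership row must be nonnegative and sum to 1")
    sc = matmul(s, c)
    return [[xv - rv for xv, rv in zip(x_row, sc_row)] for x_row, sc_row in zip(x, sc)]


# Toy block: three learned vectors, two coordinate poles.
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.6]]
C = [[1.0, 0.0], [0.0, 1.0]]
S = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

R = coordinate_residual(X, S, C)
print(R)  # only the third vector leaves a nonzero residual
```

Reading the residual row by row shows which vectors the simplex mixture fails to explain, which is the kind of readout the abstract suggests pairing with proxy loss.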

2605.15216 2026-05-27 cs.AR cs.LG 版本更新

Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations

可扩展、节能的模拟循环计算的软硬件协同设计

Arthur Fyon, Julien Brandoit, Loris Mendolia, Damien Ernst, Jean-Michel Redouté, Guillaume Drion

发表机构 * University of Liège（列日大学）

AI总结通过软硬件协同设计，利用双稳态记忆循环单元（BMRU）的离散输出抑制噪声，实现了超低功耗的模拟循环神经网络硬件。

Comments This work has been the subject of two patent applications (Numbers: EP26175243.0 and EP26175248.9)

详情

AI中文摘要

始终在线的AI应用，从环境传感器到生物医学植入物，都需要超低功耗。模拟电路提供了一条亚微瓦级推理的路径，但现有的模拟实现仅限于前馈架构：由于时间反馈中的噪声累积，将其扩展到循环动态被认为是不切实际的。我们证明，通过软硬件协同设计可以克服这一障碍。具体来说，我们发现双稳态记忆循环单元（BMRU）——一类具有离散值输出和迟滞动力学的循环神经网络（RNN）——允许一种超低功耗的电流模式模拟实现，我们从第一性原理设计了该实现。由此产生的电路在每个学习参数和电路元件之间建立了一一对应关系。离散输出在每个单元边界处将模拟噪声抑制至少20倍，打破了阻止模拟循环的噪声累积。我们重新制定了BMRU，使其在固定阈值下进行第一象限操作，从而在保持表达能力和可训练性的同时实现了直接对应。在180纳米互补金属氧化物半导体（CMOS）中的晶体管级模拟显示，软件预测与电路级行为之间几乎完美一致，因此软件模型以低计算成本充当物理硬件的高保真模拟器。我们利用这种保真度进行大规模噪声免疫和功率缩放分析：添加循环的功率成本与状态维度线性缩放，而主导总功率的前馈层则二次缩放，这意味着相对于前馈骨干网络，循环是以线性边际成本添加的。端到端的关键词识别在RNN核心处实现了亚微瓦级推理。

英文摘要

Always-on AI applications, from environmental sensors to biomedical implants, require ultra-low power consumption. Analog circuits offer a path to sub-microwatt inference, yet existing analog implementations are limited to feedforward architectures: extending them to recurrent dynamics has been considered impractical due to noise accumulation through temporal feedback. We demonstrate that this barrier can be overcome through hardware-software co-design. Specifically, we identify that Bistable Memory Recurrent Units (BMRUs), a class of Recurrent Neural Networks (RNNs) with discrete-valued outputs and hysteretic dynamics, admit an ultra-low power current-mode analog implementation which we design from first principles. The resulting circuit establishes a one-to-one correspondence between each learned parameter and a circuit element. The discrete outputs suppress analog noise by at least 20-fold at each cell boundary, breaking the noise accumulation that prevents analog recurrence. We reformulate BMRUs for first-quadrant operation with fixed thresholds, enabling the direct correspondence while preserving expressivity and trainability. Transistor-level simulations in 180 nm Complementary Metal-Oxide-Semiconductor (CMOS) show near-perfect agreement between software predictions and circuit-level behavior, with the software model thereby serving as a high-fidelity simulator of the physical hardware at low computational cost. We leverage this fidelity to conduct large-scale noise immunity and power scaling analyses: the power cost of adding recurrence scales linearly with state dimension, while the feedforward layers dominating total power scale quadratically, meaning recurrence is added at linear marginal cost relative to the feedforward backbone. End-to-end keyword spotting achieves sub-microwatt inference at the RNN core.

2604.27019 2026-05-27 cs.LG cs.CL cs.CR 版本更新

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

动态对抗微调重组拒绝几何结构

Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu, Junbin Yang, Haihua Shen, Yijun Yang

发表机构 * University of Chinese Academy of Sciences（中国科学院大学）； Inner Mongolia University of Technology（内蒙古工业大学）； Tsinghua University（清华大学）； Shandong University（山东大学）

AI总结研究动态对抗微调如何改变安全对齐语言模型中拒绝行为的因果控制载体（低维子空间），发现R2D2沿鲁棒性-效用前沿重组几何结构但未建立自适应鲁棒性。

AI中文摘要

安全对齐的语言模型必须拒绝有害请求而不广泛过度拒绝，但尚不清楚动态对抗微调如何改变拒绝控制载体：Kullback--Leibler (KL)约束方向或因果调节拒绝而不引起大规模安全提示分布偏移的小子空间。我们研究了一个7B骨干模型在监督微调（SFT）和鲁棒拒绝动态防御（R2D2）下的表现，将HarmBench、StrongREJECT和XSTest评估与五点几何测量、因果干预和稀疏自适应压力测试对齐。R2D2在早期检查点将固定源HarmBench攻击成功率降至零；然而，这些检查点也表现出最大的XSTest拒绝率并未能通过良性效用审计。后期检查点部分恢复了面向效用的行为，同时重新打开了攻击成功率，自适应GCG攻击成功率在第250步升至0.415，第500步升至0.613。内部地，R2D2在第100步之前保留了一个后期层的可接受拒绝控制载体，然后将最佳可接受载体迁移到早期层；SFT迁移更早但鲁棒性较差。有效秩保持在1.24附近，SFT表现出更大的主角漂移，这反对将维度扩展和漂移幅度作为充分解释。因果干预支持一个低维但效用耦合的载体。这些结果支持R2D2沿鲁棒性-效用前沿的几何重组解释，但未建立自适应鲁棒性。

英文摘要

Safety-aligned language models must refuse harmful requests without broad over-refusal, but it remains unclear how dynamic adversarial fine-tuning changes refusal-control carriers: Kullback--Leibler (KL)-constrained directions or small subspaces that causally modulate refusal without large safe-prompt distribution shifts. We study a 7B backbone under supervised fine-tuning (SFT) and Robust Refusal Dynamic Defense (R2D2), aligning HarmBench, StrongREJECT, and XSTest evaluations with five-anchor geometry measurements, causal interventions, and sparse adaptive stress tests. R2D2 drives fixed-source HarmBench attack success to zero at early checkpoints; however, these checkpoints also exhibit maximal XSTest refusal and fail a benign-utility audit. Later checkpoints partially recover utility-facing behavior while reopening attack success, with adaptive GCG attack success rate rising to 0.415 at step 250 and 0.613 at step 500. Internally, R2D2 preserves a late-layer admissible refusal-control carrier through step 100 and then relocates the best admissible carrier to an early layer; SFT relocates earlier yet remains less robust. Effective rank stays near 1.24, and SFT shows larger principal-angle drift, arguing against both dimensional expansion and drift magnitude as sufficient explanations. Causal interventions support a low-dimensional but utility-coupled carrier. These results support a geometry-reorganization account of R2D2 along a robustness--utility frontier, without establishing adaptive robustness.
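The abstract reports an effective rank near 1.24 without saying how it is computed. One common definition is the exponential of the Shannon entropy of the normalized singular values, and the sketch below assumes that definition. The paper may use a different one, and the singular values shown are invented.

```python
import math


def effective_rank(singular_values):
    """exp(entropy) of the singular values normalized to sum to 1.

    Assumed definition; zero singular values contribute nothing.
    """
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Hypothetical spectra for a refusal-control subspace.
print(effective_rank([1.0, 0.0]))   # a single direction
print(effective_rank([1.0, 0.05]))  # one dominant direction plus a weak one
print(effective_rank([1.0, 1.0]))   # two equally important directions
```

Under this definition a value between 1 and 2 indicates one dominant direction with a small secondary component, consistent with the low-dimensional carrier the abstract describes.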

2605.15522 2026-05-27 math.OC cs.LG 版本更新

Stochastic Non-Smooth Convex Optimization with Unbounded Gradients

无界梯度的随机非光滑凸优化

Dmitry Kovalev

发表机构 * Yandex Research（Yandex研究院）

AI总结针对梯度范数受最优性间隙仿射函数约束的广义Lipschitz函数类，证明AdamW带裁剪更新在随机非光滑凸优化中优于SGD和AdaGrad，并建立其指数加权梯度累积的关键作用及推广到广义光滑和类星凸（quasar-convex）设置。

AI中文摘要

现有的一阶非光滑优化理论大多建立在目标函数梯度一致有界这一限制性假设上。我们引入了一类更现实的广义Lipschitz函数，其中梯度范数受最优性间隙的仿射函数约束。然后我们提出一个自然的问题：什么算法能在解决凸随机广义Lipschitz优化问题时达到最好的全局收敛速度？为此，我们对几种现有算法进行了新的收敛性分析，发现带有裁剪更新的AdamW在理论上优于其他流行的随机优化方法，如SGD和AdaGrad。此外，我们的分析确立了AdamW的指数加权梯度累积（而非简单平均）的关键作用。我们进一步证明裁剪AdamW具有普适性，并在流行的广义光滑性假设下获得改进的收敛速度，分析了带对角和矩阵预条件子的裁剪AdamW的收敛性，并将结果推广到类星凸（quasar-convex）设置。

英文摘要

Much of the existing theory on first-order non-smooth optimization is built on a restrictive assumption that the gradients of the objective function are uniformly bounded. We introduce a much more realistic class of generalized Lipschitz functions, where the gradient norms are bounded by an affine function of the optimality gap. We then ask a natural question: what algorithm achieves the best global convergence rates for solving convex stochastic generalized Lipschitz optimization problems? To address this, we develop a new convergence analysis for several existing algorithms and find that AdamW with clipped updates provably outperforms other popular stochastic optimization methods, such as SGD and AdaGrad. Moreover, our analysis establishes the critical role of AdamW's exponentially weighted gradient accumulation, as opposed to simple averaging. We further show that clipped AdamW is universal and achieves improved rates under the popular generalized smoothness assumption, analyze the convergence of clipped AdamW with diagonal and matrix preconditioners, and extend our results to the quasar-convex setting.

2602.13770 2026-05-27 eess.IV cs.LG 版本更新

NeuroMambaLLM: Dynamic Graph Learning of fMRI Functional Connectivity in Autistic Brains Using Mamba and Language Model Reasoning

NeuroMambaLLM：使用Mamba和语言模型推理的自闭症大脑fMRI功能连接的动态图学习

Yasaman Torabi, Parsa Razmara, Hamed Ajorlou, Bardia Baraeinejad

发表机构 * Department of Electrical and Computer Engineering, McMaster University（麦克马斯特大学电气与计算机工程系）； Department of Biomedical Engineering, University of Southern California（南加州大学生物医学工程系）； Department of Electrical and Computer Engineering, University of Rochester（罗切斯特大学电气与计算机工程系）； BIOSEN Group（BIOSEN集团）

AI总结提出NeuroMambaLLM框架，结合动态潜在图学习、选择性状态空间时序建模与冻结的大语言模型，通过低秩适应实现fMRI动态功能连接的诊断分类与临床文本报告生成。

AI中文摘要

大型语言模型（LLMs）在多模态领域展现了强大的语义推理能力。然而，它们与基于图的脑连接模型的集成仍然有限。此外，大多数现有的fMRI分析方法依赖于静态功能连接（FC）表示，这掩盖了对神经发育障碍（如自闭症）至关重要的瞬时神经动态。最近的状态空间方法（包括Mamba）有效地建模了时间结构，但通常作为独立的特征提取器使用，缺乏显式的高层推理。我们提出了NeuroMambaLLM，一个端到端框架，将动态潜在图学习和选择性状态空间时序建模与LLMs相结合。该方法从原始血氧水平依赖（BOLD）时间序列中动态学习功能连接，用自适应潜在连接取代固定相关图，同时抑制运动相关伪影并捕获长程时间依赖。生成的动态大脑表示被投影到LLM模型的嵌入空间中，其中基础语言模型保持冻结，并训练轻量级低秩适应（LoRA）模块以实现参数高效的对齐。这种设计使LLM能够执行诊断分类和基于语言的推理，从而分析动态fMRI模式并生成具有临床意义的文本报告。

英文摘要

Large Language Models (LLMs) have demonstrated strong semantic reasoning across multimodal domains. However, their integration with graph-based models of brain connectivity remains limited. In addition, most existing fMRI analysis methods rely on static Functional Connectivity (FC) representations, which obscure transient neural dynamics critical for neurodevelopmental disorders such as autism. Recent state-space approaches, including Mamba, model temporal structure efficiently, but are typically used as standalone feature extractors without explicit high-level reasoning. We propose NeuroMambaLLM, an end-to-end framework that integrates dynamic latent graph learning and selective state-space temporal modelling with LLMs. The proposed method learns the functional connectivity dynamically from raw Blood-Oxygen-Level-Dependent (BOLD) time series, replacing fixed correlation graphs with adaptive latent connectivity while suppressing motion-related artifacts and capturing long-range temporal dependencies. The resulting dynamic brain representations are projected into the embedding space of an LLM model, where the base language model remains frozen and lightweight low-rank adaptation (LoRA) modules are trained for parameter-efficient alignment. This design enables the LLM to perform both diagnostic classification and language-based reasoning, allowing it to analyze dynamic fMRI patterns and generate clinically meaningful textual reports.

2511.19289 2026-05-27 quant-ph cs.IT cs.LG math.IT 版本更新

Performance Guarantees for Quantum Neural Estimation of Entropies

熵的量子神经估计的性能保证

Sreejith Sreekumar, Ziv Goldfeld, Mark M. Wilde

发表机构 * Laboratoire Des Signaux Et Systèmes (L2S), CNRS, CentraleSupélec, University of Paris-Saclay（信号与系统实验室（L2S）、法国国家科学研究中心（CNRS）、巴黎中央理工-高等电力学院（CentraleSupélec）、巴黎-萨克雷大学）； School of Electrical and Computer Engineering, Cornell University（康奈尔大学电气与计算机工程学院）

AI总结本文针对量子神经估计器（QNE）估计测量（Rényi）相对熵的问题，提出了非渐近误差风险界和指数尾界，并给出了样本复杂度分析，证明了其最优性。

Comments 43 pages

DOI: 10.22331/q-2026-05-21-2113
Journal ref: Quantum 10, 2113 (2026)

AI中文摘要

估计量子熵和散度是量子物理、信息论和机器学习中的一个重要问题。利用混合经典-量子架构的量子神经估计器（QNE）最近成为估计这些度量的一种有吸引力的计算框架。这种估计器将经典神经网络与参数化量子电路相结合，其部署通常需要繁琐地调整控制样本大小、网络架构和电路拓扑的超参数。本文首次以非渐近误差风险界的形式，对测量（Rényi）相对熵的QNE进行了形式化保证研究。我们进一步建立了指数尾界，表明误差是次高斯的，因此尖锐地集中在真实值附近。对于维度为$d$且具有有界Thompson度量的密度算子对的一个适当子类，我们的理论建立了QNE的副本复杂度为$O(|Θ(\mathcal{U})|d/ε^2)$，其中量子电路参数集为$Θ(\mathcal{U})$，该复杂度对精度$ε$具有极小极大最优依赖。此外，如果密度算子对是置换不变的，我们将上述维度依赖改进为$O(|Θ(\mathcal{U})|\mathrm{polylog}(d)/ε^2)$。我们的理论旨在促进测量相对熵的QNE的原则性实现，并指导实践中的超参数调优。

英文摘要

Estimating quantum entropies and divergences is an important problem in quantum physics, information theory, and machine learning. Quantum neural estimators (QNEs), which utilize a hybrid classical-quantum architecture, have recently emerged as an appealing computational framework for estimating these measures. Such estimators combine classical neural networks with parametrized quantum circuits, and their deployment typically entails tedious tuning of hyperparameters controlling the sample size, network architecture, and circuit topology. This work initiates the study of formal guarantees for QNEs of measured (Rényi) relative entropies in the form of non-asymptotic error risk bounds. We further establish exponential tail bounds showing that the error is sub-Gaussian and thus sharply concentrates about the ground truth value. For an appropriate sub-class of density operator pairs on a space of dimension $d$ with bounded Thompson metric, our theory establishes a copy complexity of $O(|Θ(\mathcal{U})|d/ε^2)$ for QNE with a quantum circuit parameter set $Θ(\mathcal{U})$, which has minimax optimal dependence on the accuracy $ε$. Additionally, if the density operator pairs are permutation invariant, we improve the dimension dependence above to $O(|Θ(\mathcal{U})|\mathrm{polylog}(d)/ε^2)$. Our theory aims to facilitate principled implementation of QNEs for measured relative entropies and guide hyperparameter tuning in practice.

2605.14151 2026-05-27 math.OC cs.LG 版本更新

DiVeQ: 使用重参数化技巧的可微分向量量化

Mohammad Hassan Vali, Tom Bäckström, Arno Solin

发表机构 * ELLIS Institute Finland & Department of Computer Science, Aalto University, Finland（芬兰ELLIS研究所及阿尔托大学计算机科学系）； Department of Information and Communications Engineering, Aalto University, Finland（芬兰阿尔托大学信息与通信工程系）

AI总结提出DiVeQ方法，通过重参数化技巧将量化视为添加模拟量化失真的误差向量，实现前向传播硬量化而梯度可流动，并引入空间填充变体SF-DiVeQ减少量化误差并充分利用码本，在VQ-VAE、VQGAN和DAC任务中提升重建质量和样本质量。

2605.08455 2026-05-27 cs.LG cs.PL cs.SE 版本更新

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

CUDABeaver：基于LLM的自动化CUDA调试基准测试

Shiyang Li, Haoyang Chen, Mattia Fazzini, Caiwen Ding

发表机构 * University of Minnesota（明尼苏达大学）

AI总结提出CUDABEAVER基准，通过协议条件指标pass@k(M,C,A)评估LLM修复CUDA代码的能力，揭示性能损失容忍度对成功率的影响。

Comments 25 pages, 5 figures

AI中文摘要

调试CUDA程序长期以来一直具有挑战性，因为故障通常源于硬件行为、编译器决策、内存层次结构和异步执行之间微妙的交互。更重要的是，随着GPU在科学计算、机器学习、图形和系统工作负载中的快速扩展，CUDA调试变得比以往任何时候都更具挑战性。当前对基于LLM的CUDA编程的评估大多忽略了这一场景：模型可以通过退化性修复通过正确性测试，将CUDA代码简化为更安全但更慢的程序，从而放弃原始优化结构。我们引入了CUDABEAVER，一个从基于LLM的CUDA生成过程中产生的真实失败工作空间中进行CUDA调试的基准。每个任务提供损坏的候选代码、原生构建/测试命令、原始错误证据以及一个可编辑文件。CUDABEAVER评估修复程序是否真正修复了失败的CUDA代码，还是仅仅找到了一个更慢的通过测试的替代方案，并按故障类别、调试轨迹、停滞模式和性能保持情况报告结果。我们进一步提出了pass@k(M,C,A)，一种协议条件的CUDA调试指标，通过明确修复程序M、语料库C和协议轴A。使用该指标在213个任务和七个前沿LLM上，我们表明协议感知评估提供了更真实的CUDA调试能力视图：当性能损失容忍度高时，修复程序看起来更强，但即使是一个微小的更严格的性能要求也能显著降低测量成功率，分数变化高达40个百分点。

英文摘要

Debugging CUDA programs has long been challenging because failures often arise from subtle interactions among hardware behavior, compiler decisions, memory hierarchy, and asynchronous execution. More importantly, with the rapid expansion of GPU usage across scientific computing, machine learning, graphics, and systems workloads, CUDA debugging has become more challenging than ever. Current evaluations of LLM-based CUDA programming largely miss this setting: a model can pass correctness tests with repair by degeneration, simplifying the CUDA code into a safer but slower program that abandons the original optimization structure. We introduce CUDABEAVER, a benchmark for CUDA debugging from real failing workspaces produced during LLM-based CUDA generation. Each task provides the broken candidate, native build/test commands, raw error evidence, and a single editable file. CUDABEAVER evaluates whether a fixer truly repairs the failing CUDA code or merely finds a slower test-passing replacement, reporting results by failure category, debugging trajectory, stagnation mode, and performance preservation. We further propose pass@k(M,C,A), a protocol-conditional CUDA debugging metric that makes the fixer M, corpus C, and protocol axes A explicit. Using this metric across 213 tasks and seven frontier LLMs, we show that protocol-aware evaluation gives a more faithful view of CUDA debugging ability: when performance-loss tolerance is high, fixers appear much stronger, but even a slightly stricter performance requirement can sharply reduce measured success, shifting scores by up to 40 percentage points.
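
As a rough illustration of how a performance-loss tolerance can move a pass@k score, the sketch below applies the standard unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), after counting only attempts that pass the tests and stay within a slowdown limit. The abstract does not spell out how pass@k(M,C,A) is computed, so the `Attempt` record, its `slowdown` field, and the `max_slowdown` threshold are assumptions made for this example and may differ from the benchmark's actual protocol.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class Attempt:
    """One repair attempt for one task (hypothetical record layout)."""
    passes_tests: bool  # repaired code builds and passes the correctness tests
    slowdown: float     # repaired runtime divided by reference runtime


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for n attempts of which c count as successes."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)


def protocol_pass_at_k(tasks, k: int, max_slowdown: float) -> float:
    """Mean pass@k over tasks; a success must also respect max_slowdown."""
    scores = []
    for attempts in tasks:
        successes = sum(
            a.passes_tests and a.slowdown <= max_slowdown for a in attempts
        )
        scores.append(pass_at_k(len(attempts), successes, k))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    tasks = [
        [Attempt(True, 1.0), Attempt(True, 3.0)],
        [Attempt(True, 2.5), Attempt(False, 1.0)],
    ]
    for limit in (4.0, 1.1):
        print(limit, protocol_pass_at_k(tasks, k=1, max_slowdown=limit))
```

On this toy data, tightening `max_slowdown` from 4.0 to 1.1 lowers pass@1 from 0.75 to 0.25, because test-passing but slow repairs stop counting as successes.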

2605.03929 2026-05-27 cs.SD cs.AI cs.LG eess.SP updated version

PHALAR: Phasors for Learned Musical Audio Representations

Davide Marincione, Michele Mancusi, Giorgio Strano, Luca Cerovaz, Donato Crisostomi, Roberto Ribuoli, Emanuele Rodolà

Affiliations: Department of Computer Science, Sapienza University of Rome, Italy; Moises Systems, Inc.; Paradigma, Inc.

AI summary: Proposes PHALAR, a contrastive framework that uses learned spectral pooling and a complex-valued head to achieve pitch and phase equivariance. On stem retrieval it uses 50% fewer parameters, trains 7x faster, improves accuracy by roughly 70% in relative terms, and captures robust musical structure.

Comments Accepted at ICML 2026

2605.07990 2026-05-27 cs.CL cs.AI cs.LG cs.SE updated version

Tool Calling is Linearly Readable and Steerable in Language Models

Zekun Wu, Ze Wang, Seonglae Cho, Yufei Yang, Adriano Koshiyama, Sahan Bulathwela, Maria Perez-Ortiz

Affiliations: University College London; Holistic AI; Imperial College London

AI summary: Finds linear directions inside language models that correspond to tool selection. Intervening along these directions switches which tool is called and allows likely errors to be detected in advance; the findings are validated across multiple models and benchmarks.

Comments 24 pages. ACL ARR May 2026 submission (EMNLP 2026 preferred venue); v2 reflects revised manuscript

Abstract

When a tool-calling agent picks the wrong tool, the failure is invisible until execution: the email gets sent, the meeting gets missed. As agents take on consequential actions, one bad tool call can do real damage. We currently have no way to look inside the model and catch the mistake before it happens; this paper shows that we can. Inside the model, the choice of tool is carried by a single direction in activation space, one direction per pair of tools. Adding that direction during generation switches which tool the model picks. Across 12 instruction-tuned and 6 base models spanning Gemma 3, Qwen 3, Qwen 2.5, and Llama 3.1 (270M to 27B), this works at 83-100% accuracy on 4B+ instruction-tuned models on a 15-tool synthetic benchmark and at 77-94% on the real-API benchmark $τ$-bench airline. The JSON arguments that follow automatically adapt to the new tool's schema, so flipping the name is enough. The same per-tool directions also flag likely errors before they happen: queries where the model is unsure between two tools fail 21x more often than queries where it is not (Gemma 3 27B). This is not just topic injection: random vectors at the same magnitude give a 0% switch rate, and a probe within a single domain (14 airline tools that share one topic) still reads which tool the model will call at top-1 61-89% across five 4B-14B models. Even base models already carry the right tool internally before they can emit it: reading the chosen tool off the model's internal state (cosine readout) recovers 61-82% accuracy on BFCL while base generation lands at 2-10%, suggesting pretraining forms the representation and instruction tuning later wires it to the output. Our results cover single-turn, fixed-menu settings; on multi-turn agent loops the same intervention is less stable (matched-baseline gain or loss of up to 30 percentage points with no consistent direction).
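
To make "one direction per pair of tools" concrete, here is a minimal sketch on synthetic vectors. It assumes the pair direction is the normalized difference of mean activations between the two tools, which is a common choice but is not confirmed by the abstract; `make_toy_activations`, the 16-dimensional states, and the steering strength `alpha` are all hypothetical stand-ins, and no real model is involved.

```python
import numpy as np


def pair_direction(acts_a, acts_b):
    """Unit vector pointing from tool A's mean activation to tool B's."""
    diff = acts_b.mean(axis=0) - acts_a.mean(axis=0)
    return diff / np.linalg.norm(diff)


def read_tool(hidden, direction, midpoint):
    """Linear readout: which side of the midpoint the state projects onto."""
    return "tool_b" if (hidden - midpoint) @ direction > 0 else "tool_a"


def steer(hidden, direction, alpha):
    """Add alpha units of the pair direction to a hidden state."""
    return hidden + alpha * direction


def make_toy_activations(seed=0, n=50, dim=16, gap=3.0, noise=0.1):
    """Synthetic stand-in for hidden states (not real model activations)."""
    rng = np.random.default_rng(seed)
    center_b = np.zeros(dim)
    center_b[0] = gap  # the two tools differ along one axis only
    acts_a = noise * rng.standard_normal((n, dim))
    acts_b = center_b + noise * rng.standard_normal((n, dim))
    return acts_a, acts_b


if __name__ == "__main__":
    acts_a, acts_b = make_toy_activations()
    direction = pair_direction(acts_a, acts_b)
    midpoint = (acts_a.mean(axis=0) + acts_b.mean(axis=0)) / 2
    hidden = acts_a[0]
    print("before steering:", read_tool(hidden, direction, midpoint))
    steered = steer(hidden, direction, alpha=4.0)
    print("after steering: ", read_tool(steered, direction, midpoint))
```

In a real model the same arithmetic would be applied to hidden states captured at a chosen layer during generation; which layer and what magnitude work in practice are empirical questions this sketch does not address.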

2605.07632 2026-05-27 cs.CL cs.AI cs.LG updated version

Post-training makes large language models less human-like

Marcel Binz, Elif Akata, Abdullah Almaatouq, Mohammed Alsobay, Oleksii Ariasov, Franziska Brändle, David Broska, Jason W. Burton, Nuno Busch, Frederick Callaway, Vanessa Cheung, Brian Christian, Julian Coda-Forno, Can Demircan, Vittoria Dentella, Maria K. Eckstein, Noémi Éltető, Michael Franke, Thomas L. Griffiths, Fritz Günther, Susanne Haridi, Sebastian Hellmann, Stefan Herytash, Linus Hof, Eleanor Holton, Isabelle Hoxha, Zak Hussain, Akshay Jagadish, Elif Kara, Valentin Kriegmair, Evelina Leivada, Li Ji-An, Tobias Ludwig, Maximilian Maier, Marcelo G. Mattar, Marvin Mathony, Alireza Modirshanechi, Robin Na, Mariia Nadverniuk, Antonios Nasioulas, Surabhi S. Nath, Helen Niemeyer, Kate Nussenbaum, Sebastian Olschewski, Thorsten Pachur, Stefano Palminteri, Aliona Petrenco, Camille V. Phaneuf-Hadd, Angelo Pirrone, Manuel Rausch, Laura Raveling, Shashank Reddy, Milena Rmus, Evan M. Russek, Tankred Saanum, Kai Sandbrink, Louis Schiekiera, Johannes A. Schubert, Luca M. Schulze Buschoff, Nishad Singhi, Leah H. Somerville, Mikhail S. Spektor, Xin Sui, Christopher Summerfield, Mirko Thalmann, Anna I. Thoma, Taisiia Tikhomirova, Vuong Truong, Polina Tsvilodub, Konstantinos Voudouris, Kristin Witte, Shuchen Wu, Dirk U. Wulff, Hua-Dong Xiong, Songlin Xu, Lance Ying, Xinyu Zhang, Jian-Qiao Zhu, Eric Schulz

Affiliations: Helmholtz Munich; Massachusetts Institute of Technology; University of Tübingen; University of Oxford; Stanford

AI summary: Introduces the Psych-201 dataset and finds that post-training (the process that turns base models into useful assistants) consistently reduces how well models align with human behavior, that this misalignment grows in newer model generations, and that persona induction does not improve individual-level predictions.

Abstract

Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.

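2511.22882 2026-05-27 cs.LG math.PR updated version

Normalizing Flows on Quotient Manifolds via Boundary Quotients

William Ghanem, Benjamin Cai

Affiliations: The University of Texas at Austin

AI summary: Proposes a boundary-quotient framework for learning densities on manifolds that arise as boundary quotients of simpler domains, constructs normalizing flows on quotient manifolds under discrete group actions, and validates the approach on genus-g surfaces and lens spaces.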

2603.23985 2026-05-27 cs.LG updated version

Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score

Jimyung Hong, Jaehyung Kim

Affiliations: Yonsei University

AI summary: Proposes DIET, a training-free, dimension-wise structured pruning method that builds a global mask by majority voting over cross-task activation magnitudes. It stays task-aware while avoiding costly training and markedly improves post-pruning accuracy on Gemma-2 models.

Comments 14 pages, 10 figures. Code available at https://github.com/Jimmy145123/DIET

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities, but their massive scale poses significant challenges for practical deployment. Structured pruning offers a promising solution by removing entire dimensions or layers, yet existing methods face critical trade-offs: task-agnostic approaches cannot adapt to task-specific requirements, while task-aware methods require costly training to learn task adaptability. We propose DIET (Dimension-wise global pruning of LLMs via merging Task-wise importance scores), a training-free structured pruning method that combines dimension-level granularity with task-aware selection. DIET profiles activation magnitudes across tasks using only 100 samples per task, then applies majority voting to construct a single global mask. DIET incurs no large pre-computation or training cost. Experiments on seven zero-shot benchmarks using Gemma-2 2B and 9B models demonstrate the effectiveness of DIET; for example, at 20% sparsity on Gemma-2 2B, DIET improves average accuracy by nearly 10% over previous state-of-the-art structured pruning methods. This advantage persists across various sparsity levels and model scales, positioning DIET as a practical and robust choice for structured LLM pruning.
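
The profile-then-vote step can be sketched as follows. This is one possible reading of the abstract, not the released DIET code: it assumes per-task importance is the mean absolute activation of each dimension, that each task votes for its own top dimensions, and that ties in the vote count are broken by mean importance so the mask hits the requested sparsity exactly. The function names and the tie-breaking rule are hypothetical.

```python
import numpy as np


def task_importance(acts):
    """Mean absolute activation per dimension; acts has shape (samples, dims)."""
    return np.abs(acts).mean(axis=0)


def global_mask(per_task_acts, sparsity):
    """Boolean keep-mask over dimensions built by voting across tasks."""
    scores = np.stack([task_importance(a) for a in per_task_acts])
    n_dims = scores.shape[1]
    n_keep = n_dims - int(round(sparsity * n_dims))

    # Each task votes for the n_keep dimensions it finds most important.
    votes = np.zeros(n_dims, dtype=int)
    for task_scores in scores:
        votes[np.argsort(-task_scores)[:n_keep]] += 1

    # Rank by votes (primary key), then by mean importance (assumed tie-break).
    order = np.lexsort((-scores.mean(axis=0), -votes))
    mask = np.zeros(n_dims, dtype=bool)
    mask[order[:n_keep]] = True
    return mask


if __name__ == "__main__":
    tasks = [
        np.array([[5.0, 4.0, 3.0, 0.1, 0.2]]),
        np.array([[5.0, 0.1, 3.0, 4.0, 0.2]]),
        np.array([[5.0, 4.0, 0.1, 3.0, 0.2]]),
    ]
    print(global_mask(tasks, sparsity=0.4))
```

In practice each entry of `per_task_acts` would hold activations collected from a small calibration set per task (the abstract mentions 100 samples), and the resulting mask would select which hidden dimensions to keep across the whole model.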

2604.22774 2026-05-27 cs.CY cs.AI cs.CV cs.LG updated version

When VLMs 'Fix' Students: Identifying and Penalizing Over-Correction in the Evaluation of Multi-line Handwritten Math OCR

Jin Seong, Wencke Liermann, Minho Kim, Jong-hun Shin, Soojong Lim

Affiliations: Electronics and Telecommunications Research Institute

AI summary: To address VLM over-correction in the evaluation of multi-line handwritten math OCR, proposes PINK, an LLM-based semantic evaluation metric that penalizes over-correction and outperforms BLEU on the FERMAT dataset.

Abstract

Accurate transcription of handwritten mathematics is crucial for educational AI systems, yet current benchmarks fail to evaluate this capability properly. Most prior studies focus on single-line expressions and rely on lexical metrics such as BLEU, which fail to assess the semantic reasoning across multi-line student solutions. In this paper, we present the first systematic study of multi-line handwritten math Optical Character Recognition (OCR), revealing a critical failure mode of Vision-Language Models (VLMs): over-correction. Instead of faithfully transcribing a student's work, these models often "fix" errors, thereby hiding the very mistakes an educational assessment aims to detect. To address this, we propose PINK (Penalized INK-based score), a semantic evaluation metric that leverages a Large Language Model (LLM) for rubric-based grading and explicitly penalizes over-correction. Our comprehensive evaluation of 15 state-of-the-art VLMs on the FERMAT dataset reveals substantial ranking reversals compared to BLEU: models like GPT-4o are heavily penalized for aggressive over-correction, whereas Gemini 2.5 Flash emerges as the most faithful transcriber. Furthermore, human expert studies show that PINK aligns significantly better with human judgment (55.0% preference over BLEU's 39.5%), providing a more reliable evaluation framework for handwritten math OCR in educational settings.

2603.13381 2026-05-27 cs.LG cs.AI updated version

Beyond Linearity in Attention Projections: The Case for Nonlinear Queries

Marko Karbevski

Affiliations: Simplicity Technologies

AI summary: Proposes replacing the query projection W_Q in attention with a nonlinear residual implemented as a bottleneck MLP, and verifies performance gains on a GPT-3 small model.

Comments Accepted at the ICLR 2026 GRaM workshop: https://openreview.net/forum?id=pwdnneFiNZ#discussion

2512.05794 2026-05-27 cs.LG cs.AI q-bio.QM updated version

Mechanistic Interpretability of Antibody Language Models Using SAEs

Rebonto Haque, Oliver M. Turnbull, Anisha Parsan, Nithin Parsan, John J. Yang, Anna L. Beukenhorst, Charlotte M. Deane

Affiliations: Department of Statistics, University of Oxford, UK; Reticular, San Francisco, USA; EECS, MIT, Cambridge MA, USA; Leyden Laboratories BV, Leiden, The Netherlands

AI summary: Applies TopK and Ordered sparse autoencoders (SAEs) to the mechanistic interpretability of antibody language models. TopK SAEs reveal biologically meaningful latent features but do not guarantee generative control, whereas Ordered SAEs reliably identify steerable features through their hierarchical structure, at the cost of more complex activation patterns.

Comments v3: 15 pages; corrected author list and affiliations in the main text; minor text changes; updated steering results following minor code changes; conclusions and findings remain unchanged; included link to data and code in the Data Availability section

2604.19667 2026-05-27 cs.CL cs.AI cs.CV cs.LG cs.MA updated version

Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language

Yi Zhong, Buqiang Xu, Yijun Wang, Zifei Shan, Shuofei Qiao, Guozhou Zheng, Ningyu Zhang

Affiliations: Zhejiang University; Tencent

AI summary: Proposes Chat2Workflow, a benchmark for evaluating how well large language models generate executable visual workflows from natural language, and designs an agentic baseline to improve performance.

Comments Work in progress

Abstract

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve -- making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic baseline to improve performance. The benchmark is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be transformed and directly deployed to practical workflow platforms such as Dify and Coze. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially given complex and evolving requirements. Although our agentic baseline yields up to 6.05% resolve rate gains, the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at https://github.com/zjunlp/Chat2Workflow.

2604.18751 2026-05-27 cs.LG cs.AI stat.ME stat.ML updated version

Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge

Affiliations: Lucy Family Institute for Data & Society; University of Notre Dame; Department of Political Science

AI summary: To address causal scores in nonlinear time-series models being misread as regression coefficients, proposes a forecast-necessity testing framework based on edge ablation and forecast comparison that assesses whether a causal relationship is actually needed for prediction.

DOI: 10.32473/flairs.39.1

Abstract

Nonlinear machine-learning models are increasingly used to discover causal relationships in time-series data, yet the interpretation of their outputs remains poorly understood. In particular, causal scores produced by regularized neural autoregressive models are often treated as analogues of regression coefficients, leading to misleading claims of statistical significance. In this paper, we argue that causal relevance in nonlinear time-series models should be evaluated through forecast necessity rather than coefficient magnitude, and we present a practical evaluation procedure for doing so. We present an interpretable evaluation framework based on systematic edge ablation and forecast comparison, which tests whether a candidate causal relationship is required for accurate prediction. Using Neural Additive Vector Autoregression as a case study model, we apply this framework to a real-world case study of democratic development, modeled as a multivariate time series of panel data - democracy indicators across 139 countries. We show that relationships with similar causal scores can differ dramatically in their predictive necessity due to redundancy, temporal persistence, and regime-specific effects. Our results demonstrate how forecast-necessity testing supports more reliable causal reasoning in applied AI systems and provides practical guidance for interpreting nonlinear time-series models in high-stakes domains.
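
The edge-ablation idea can be illustrated on synthetic data. The sketch below swaps in a plain linear lag-1 autoregression fitted by least squares in place of the Neural Additive Vector Autoregression used in the paper, and it scores necessity as the ratio of held-out forecast error without and with a candidate edge. The simulated series, the `necessity_ratio` name, and the ratio itself are assumptions chosen for illustration; the paper's own test statistic may differ.

```python
import numpy as np


def simulate(T=2000, seed=0):
    """Three series: 0 drives 1 with a one-step lag; 2 is unrelated noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros((T, 3))
    for t in range(1, T):
        x[t, 0] = 0.5 * x[t - 1, 0] + rng.standard_normal()
        x[t, 1] = 0.9 * x[t - 1, 0] + 0.1 * rng.standard_normal()
        x[t, 2] = rng.standard_normal()
    return x


def heldout_mse(series, target, sources, split):
    """Fit target[t] from the lag-1 values of `sources`; score after `split`."""
    X = series[:-1][:, sources]
    y = series[1:, target]
    coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    return float(np.mean((X[split:] @ coef - y[split:]) ** 2))


def necessity_ratio(series, target, edge_source, split):
    """Held-out error with the edge removed divided by error with it present."""
    all_sources = list(range(series.shape[1]))
    ablated = [j for j in all_sources if j != edge_source]
    return (heldout_mse(series, target, ablated, split)
            / heldout_mse(series, target, all_sources, split))


if __name__ == "__main__":
    data = simulate()
    print("edge 0 -> 1:", necessity_ratio(data, target=1, edge_source=0, split=1500))
    print("edge 2 -> 1:", necessity_ratio(data, target=1, edge_source=2, split=1500))
```

A ratio well above 1 means forecasts get worse once the edge is removed, so the edge is needed; a ratio near 1 means the edge can be dropped without hurting forecasts, whatever causal score the model assigned to it.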

2604.11467 2026-05-27 cs.AI cs.HC cs.LG updated version

From Attribution to Action: A Human-Centered Application of Activation Steering

Tobias Labarta, Maximilian Dreyer, Katharina Weitz, Wojciech Samek, Sebastian Lapuschkin

Affiliations: Fraunhofer Heinrich-Hertz-Institut; Technische Universität Berlin; BIFOLD – Berlin Institute for the Foundations of Learning and Data

AI summary: Proposes an interactive workflow that combines SAE-based attribution with activation steering. Expert interviews confirm that it supports a shift from inspection to intervention and reveal debugging strategies such as component suppression, along with potential risks.

Abstract

Explainable AI (XAI) methods reveal which features influence model predictions, yet provide limited means for practitioners to act on these explanations. Activation steering of components identified via XAI offers a path toward actionable explanations, although its practical utility remains understudied. We introduce an interactive workflow combining SAE-based attribution with activation steering for instance-level analysis of concept usage in vision models, implemented as a web-based tool. Based on this workflow, we conduct semi-structured expert interviews (N=8) with debugging tasks on CLIP to investigate how practitioners reason about, trust, and apply activation steering. We find that steering enables a shift from inspection to intervention-based hypothesis testing (8/8 participants), with most grounding trust in observed model responses rather than explanation plausibility alone (6/8). Participants adopted systematic debugging strategies dominated by component suppression (7/8) and highlighted risks including ripple effects and limited generalization of instance-level corrections. Overall, activation steering renders interpretability more actionable while raising important considerations for safe and effective use.

2505.23606 2026-05-27 cs.LG cs.CV updated version

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Qingyu Shi, Jinbin Bai, Zhuoran Zhao, Wenhao Chai, Kaidong Yu, Jianzong Wu, Yunhai Tong, Xiangtai Li, Xuelong Li, Shuicheng Yan

Affiliations: M-E-AGI-Lab

AI summary: Proposes Muddit, a unified discrete diffusion transformer that combines the strong visual priors of a pretrained text-to-image backbone with a lightweight text decoder to enable fast parallel generation across text and image modalities, matching or outperforming significantly larger autoregressive models in quality and efficiency.

Comments Accepted to ICLR 2026. Codes and Supplementary Material: https://github.com/M-E-AGI-Lab/Muddit

Abstract

Unified generation models aim to handle diverse tasks across modalities -- such as text generation, image generation, and vision-language reasoning -- within a single architecture and decoding paradigm. Autoregressive unified models suffer from slow inference due to sequential decoding, and non-autoregressive unified models suffer from weak generalization due to limited pretrained backbones. We introduce the second-generation Meissonic: Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities. Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder, enabling flexible and high-quality multimodal generation under a unified architecture. Empirical results show that Muddit achieves competitive or superior performance compared to significantly larger autoregressive models in both quality and efficiency. The work highlights the potential of purely discrete diffusion, when equipped with strong visual priors, as a scalable and effective backbone for unified generation.

2604.11056 2026-05-27 cs.LG cs.AI updated version

Where Hindsight Credit Can Reside: A Signed-Capacity View of Token Updates in RLVR

Yuhang He, Haodong Wu, Siyi Liu, Hongyu Ge, Hange Zhou, Keyi Wu, Zhuo Zheng, Qihong Lin, Zixin Zhong, Yongqi Zhang

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Huawei Technologies Ltd.

AI summary: Uses conditional mutual information to analyze the capacity upper bound of token-level credit in RLVR, proposes a four-quadrant decomposition to distinguish update directions, and designs the HAPO algorithm for capacity-guided advantage reallocation, improving mathematical reasoning performance.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) improves the reasoning ability of Large Language Models (LLMs), but sparse outcome rewards make token-level credit assignment difficult. We study token-level credit as a reward-conditioned shift from the behavior policy to a hindsight posterior. In autoregressive RLVR, this shift can be expressed through Conditional Mutual Information (CMI), which shows that token entropy upper-bounds possible hindsight credit. Entropy, however, indicates capacity rather than update direction, so we introduce the Four Quadrant Decomposition to separate updates by reward polarity and token entropy. Controlled interventions show that these two factors jointly shape token updates. Sustained reasoning gains concentrate in signed high-entropy quadrants, whereas low-entropy updates saturate quickly. Based on this analysis, we propose Hindsight-Aware Policy Optimization (HAPO), a sign-preserving modification to GRPO that performs capacity-guided advantage reallocation. Experiments on mathematical reasoning benchmarks in two model settings show that HAPO achieves competitive performance among entropy-aware baselines.
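
The four-quadrant split by reward polarity and token entropy can be written down in a few lines. The sketch below only labels a token update; it does not attempt HAPO's advantage reallocation, which the abstract does not specify. The entropy threshold, the quadrant labels, and the choice to group a zero advantage with the negative side are assumptions made for this illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def quadrant(advantage, entropy, threshold):
    """Label an update by advantage sign and by entropy relative to a threshold."""
    polarity = "positive" if advantage > 0 else "negative"  # zero counts as negative here
    level = "high" if entropy >= threshold else "low"
    return f"{polarity}/{level}-entropy"


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]  # model is nearly certain of the token
    uncertain = [0.25, 0.25, 0.25, 0.25]  # model is maximally unsure over 4 tokens
    threshold = 0.5                       # hypothetical cut-off in nats
    for name, probs in (("confident", confident), ("uncertain", uncertain)):
        h = token_entropy(probs)
        print(name, round(h, 3), quadrant(+1.0, h, threshold), quadrant(-1.0, h, threshold))
```

Under the abstract's claim that token entropy upper-bounds possible hindsight credit, the high-entropy quadrants are where a capacity-guided method would have room to shift more advantage, while low-entropy tokens have little room to move.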

2509.21882 2026-05-27 cs.LG cs.AI updated version

Don't Listen to Me! How Multi-Turn Conversations Reduce LLM Reliability

Kevin H. Guo, Chao Yan, Avinash Baidya, Katherine Brown, Xiang Gao, Juming Xiong, Zhijun Yin, Bradley A. Malin

Affiliations: Vanderbilt University; Vanderbilt University Medical Center; Intuit AI Research

AI summary: Proposes the "stick-or-switch" (SoS) framework, which evaluates LLM reliability in multi-turn conversations by partitioning a question-answer space into multiple sequential presentations. It finds a conversation tax that lowers accuracy and abstention against incorrect suggestions by an average of up to 30%, and it observes blind switching.

Abstract

Large language models (LLMs) excel on static benchmarks, but their performance across multi-turn conversations, which better reflect real-world usage, remains understudied. Addressing this gap is critical in high-stakes settings like healthcare, where patients and clinicians are turning to LLM chatbots to address their medical inquiries. Here, we introduce the "stick-or-switch" (SoS) framework, which partitions a question-answer space into multiple sequential presentations to model two safety-centric behaviors: conviction (i.e., sticking to a correct answer selection or abstention against incorrect suggestions) and flexibility (i.e., switching to a correct suggestion when it is introduced). Evaluating 17 LLMs across three clinical benchmarks, we observe a pervasive conversation tax, where partitioning an answer-space into sequential presentations reduces end-to-end accuracy and abstention against incorrect suggestions by an average of up to 30%, reaching 65% in certain models. We also observe blind switching, where models transition an initial abstention to incorrect and correct suggestions at near-identical rates reaching 50%. Finally, we show that increasing model scale mitigates some of these conversational inefficacies while exacerbating others, such as a higher propensity to adopt an incorrect suggestion from an initial abstention. Together, our findings demonstrate that the general proficiency captured by static benchmarks does not translate to multi-turn dialogues.

2512.21602 2026-05-27 cs.LG cs.CV updated version

An Empirical Study of Machine Learning Robustness and Scalability for Imbalanced Tabular Clinical Data in Emergency and Critical Care

Yusuf Brima, Marcellin Atemkeng

Affiliations: Computer Vision Group, Institute of Cognitive Science, Osnabrück University; Department of Mathematics, Rhodes University; National Institute for Theoretical and Computational Sciences (NITheCS)

AI summary: Evaluates six model families on imbalanced clinical tabular data from the MIMIC-IV-ED and eICU datasets, finding that tree-based models scale best while tabular foundation models offer a new trade-off between performance and efficiency.

Abstract

Every year, millions of patients pass through emergency departments and intensive care units, where clinicians must make high-stakes decisions under time pressure and uncertainty. Machine learning could support prediction of deterioration, triage, and rare critical outcomes, but clinical data are often severely imbalanced, biasing models toward majority classes and reducing predictive performance. Developing robust and efficient models for imbalanced clinical tabular data therefore remains an important challenge. We evaluated six model families on imbalanced tabular data from the MIMIC-IV-ED and eICU databases: Decision Tree, Random Forest, XGBoost, TabNet, TabICL, and TabPFN v2.6. Trainable models were optimized using Bayesian hyperparameter tuning, while foundation models were evaluated in their pretrained inference regime without task-specific reweighting. Models were assessed using Macro F1-score, robustness to increasing imbalance, and computational scalability across seven clinical prediction tasks. Results differed across datasets. On MIMIC-IV-ED, TabPFN v2.6 and TabICL achieved the strongest average Macro F1 ranks, with XGBoost remaining competitive. On eICU, XGBoost consistently performed best, followed by other tree-based methods, while foundation models achieved intermediate performance. Across both datasets, TabNet showed the largest degradation under increasing imbalance and the highest computational cost. Training-time analysis showed that tree-based methods scaled most favorably with dataset size, while foundation models offered low per-task adaptation cost. These findings suggest that no single model family dominates across all clinical settings. However, tabular foundation models are narrowing the performance gap with strong classical baselines while offering a distinct efficiency-performance trade-off that may benefit resource-constrained clinical environments.
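
A small worked example shows why a study of imbalanced data would report Macro F1 rather than accuracy. The sketch computes per-class F1 as 2TP / (2TP + FP + FN) and averages it with equal weight per class; the 95/5 label split and the always-negative classifier are made up for illustration and do not come from the paper's datasets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5  # 5% positive class (hypothetical rare outcome)
    y_pred = [0] * 100           # a classifier that never predicts the rare class
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
```

The always-negative classifier reaches 0.95 accuracy but a Macro F1 just under 0.49, because the rare class contributes an F1 of zero and counts as much as the majority class.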

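2604.07190 2026-05-27 cs.CY cs.AI cs.LG updated version

The ATOM Report: Measuring the Open Language Model Ecosystem

Nathan Lambert, Florian Brand

Affiliations: Interconnects AI

AI summary: Analyzes downloads, derivative models, inference market share, and performance metrics for roughly 1,500 mainstream open language models (such as Alibaba's Qwen, DeepSeek, and Meta's Llama), showing that Chinese models overtook US models in the summer of 2025 and have continued to widen the gap.

Comments 23 pages, 17 figures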

2604.00993 2026-05-27 astro-ph.IM astro-ph.EP cs.LG cs.RO 版本更新

Focal plane wavefront control with model-based reinforcement learning

基于模型的强化学习进行焦平面波前控制

Jalo Nousiainen, Iremsu Taskin, Markus Kasper, Gilles Orban De Xivry, Olivier Absil

发表机构 * European Southern Observatory (ESO)（欧洲南天文学观测站）； STAR Institute, Université de Liège（利根大学STAR研究所）

AI总结提出基于模型的强化学习算法PO4NCPA，通过顺序相位分集自动校正动态和静态非共路像差，实现高对比度成像中的焦平面波前控制。

Comments 13 pages, 11 figures, accepted by A&A


DOI: 10.1051/0004-6361/202558504
Journal ref: A&A 709, A267 (2026)

AI中文摘要

直接成像潜在宜居系外行星是极大望远镜高对比度成像仪器的主要科学目标之一。大多数此类系外行星轨道靠近其主星，其观测受到快速移动的大气散斑和准静态非共路像差（NCPA）的限制。传统的NCPA校正方法通常使用机械镜面探针，这会在操作期间影响性能。本文提出了基于机器学习的NCPA控制方法，通过利用顺序相位分集自动检测和校正动态及静态NCPA误差。我们将先前用于自适应光学的强化学习工作扩展到焦平面控制。一种新的基于模型的RL算法，即NCPA策略优化（PO4NCPA），将焦平面图像解释为输入数据，并通过顺序相位分集确定相位校正，从而在没有先验系统知识的情况下同时优化无日冕仪和日冕仪后的点扩散函数。此外，我们通过在受水汽诱导视宁度（动态NCPA）影响的地基望远镜和红外成像仪上数值模拟静态NCPA误差，证明了该方法的有效性。模拟表明，PO4NCPA能够稳健地补偿静态和动态NCPA。在静态情况下，它实现了使用日冕仪时近最优的焦平面光抑制，以及无日冕仪时近最优的斯特列尔比。在动态NCPA情况下，它在这些指标上与结合1步延迟积分器的模态最小二乘重构性能相当。该方法对ELT光瞳、矢量涡旋日冕仪以及在光子和背景噪声下仍然有效。PO4NCPA是无模型的，可直接应用于标准成像以及任何日冕仪。其亚毫秒级的推理时间和性能也使其适用于高对比度成像之外的大气湍流实时低阶校正。

英文摘要

The direct imaging of potentially habitable exoplanets is a prime science case for high-contrast imaging instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPA). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPA errors by leveraging sequential phase diversity. We extend previous work in reinforcement learning for AO to focal plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPA errors on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of the modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, vector vortex coronagraph, and under photon and background noise. PO4NCPA is model-free and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.


2604.04948 2026-05-27 cs.IR cs.AI cs.LG 版本更新

From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering

从PDF到RAG就绪：评估面向特定领域问答的文档转换框架

José Guilherme Marques dos Santos, Ricardo Yang, Rui Humberto Pereira, Alexandre Sousa, Brígida Mónica Faria, Henrique Lopes Cardoso, José Duarte, José Luís Reis, Luís Paulo Reis, Pedro Pimenta, José Paulo Marques dos Santos

Affiliations: Faculty of Engineering, University of Porto; Department of Business Administration, University of Maia; LIACC (Artificial Intelligence and Computer Science Laboratory), University of Porto; Department of Communication Sciences and Information Technologies, University of Maia; School of Health, Polytechnic of Porto; School of Technology and Management, Polytechnic Institute of Maia

AI summary: A systematic comparison of 21 pipeline configurations built on four open-source PDF-to-Markdown frameworks finds that preprocessing choices (especially hierarchical splitting and metadata enrichment) affect RAG question-answering accuracy more than the conversion tool itself; the best configuration (Docling with hierarchical splitting and image descriptions) reaches 94.1% accuracy, surpassing manual curation.

Comments 27 pages, 3 figures, 7 tables


DOI: 10.3390/app16105069
Journal ref: Applied Sciences 16 (2026) 5069

AI中文摘要

检索增强生成（RAG）系统严重依赖文档预处理的质量，然而此前尚无研究从对下游问答准确性的影响这一角度评估PDF处理框架。为填补这一空白，我们系统比较了四种开源PDF到Markdown转换框架（Docling、MinerU、Marker和DeepSeek OCR）在21种流水线配置下的表现，这些配置在转换工具、清洗变换、分块策略和元数据增强方面各不相同。评估使用了一个包含36份葡萄牙语行政文档（1706页，约49.2万词）的语料库上的50个问题基准，每种配置进行50次独立运行，并由LLM作为裁判评分。通过Wilcoxon符号秩检验和Cohen's d效应量评估统计显著性。两个基线界定了结果范围：朴素的PDFLoader（86.2%）和人工整理的Markdown（91.3%）。采用层次化分块和图像描述的Docling实现了最高的自动准确率（94.1±1.6%），甚至超越了人工整理。按问题类型分析显示，依赖表格的问题导致了最大的准确率差异，在基本分块和层次化分块之间存在33个百分点的差距。元数据增强和层次感知分块对准确率的贡献超过了转换框架本身。探索性的GraphRAG实现表现不如基本RAG（82%对比94.1%）。这些发现表明，数据准备质量是RAG系统性能的主导因素。

英文摘要

Retrieval-Augmented Generation (RAG) systems depend critically on the quality of document preprocessing, yet no prior study has evaluated PDF processing frameworks by their impact on downstream question-answering accuracy. We address this gap through a systematic comparison of four open-source PDF-to-Markdown conversion frameworks, Docling, MinerU, Marker, and DeepSeek OCR, across 21 pipeline configurations, varying the conversion tool, cleaning transformations, splitting strategy, and metadata enrichment. Evaluation was performed using a 50-question benchmark over a corpus of 36 Portuguese administrative documents (1706 pages, ~492K words), with LLM-as-judge scoring over 50 independent runs per configuration. Statistical significance was assessed via Wilcoxon signed-rank tests with Cohen's d effect sizes. Two baselines bounded the results: naïve PDFLoader (86.2%) and manually curated Markdown (91.3%). Docling with hierarchical splitting and image descriptions achieved the highest automated accuracy (94.1 +/- 1.6%), surpassing even manual curation. A per-question-type analysis revealed that table-dependent questions drive the largest accuracy differences, with a 33-percentage-point gap between basic and hierarchical splitting. Metadata enrichment and hierarchy-aware chunking contributed more to accuracy than the conversion framework alone. An exploratory GraphRAG implementation underperformed basic RAG (82% vs. 94.1%). These findings demonstrate that data preparation quality is the dominant factor in RAG system performance.
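
To make "hierarchical splitting with metadata enrichment" concrete, here is a minimal sketch in plain Python. It is a generic illustration and not the implementation used by Docling or by the paper; the function name `hierarchical_split`, the `heading_path` field, and the sample document are all assumptions made for this example.

```python
import re

def hierarchical_split(markdown: str):
    """Split Markdown into one chunk per heading section and attach the
    chain of parent headings to each chunk as metadata."""
    chunks, path, body = [], [], []

    def flush():
        # Emit the text collected so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"heading_path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            # Keep only ancestors above this level, then add this heading.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks

doc = "# Fees\n## Table A\n| item | cost |\n| a | 10 |\n## Table B\n| b | 20 |\n# Contacts\nOffice 3"
chunks = hierarchical_split(doc)
```

Each chunk carries its heading trail (for example `Fees > Table A`), so a retriever can match a table row to the section it belongs to, which is one plausible reason table-dependent questions benefit most from this kind of splitting.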


2602.02192 2026-05-27 cs.LG cs.DC 版本更新

ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

ECHO-2: 一种面向经济高效强化学习的大规模分布式推演框架

Jingwei Song, Meng Chen, Jie Xiao, Qingnan Ren, Jiaqi Huang, Yangshen Deng, Chris Tong, Wanyi Chen, Suli Wang, Zhisheng Chen, Ziqian Bi, Shuo Lu, Yiqun Duan, Xu Wang, Rymon Yu, Lynn Ai, Eric Yang, Tianyu Shi

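Affiliations: The University of Hong Kong; Fudan University; Gradient; University of Edinburgh; Soochow University; Technical University of Darmstadt; University of the Chinese Academy of Sciences

AI summary: Proposes ECHO-2, a distributed reinforcement learning framework that overlaps rollout generation, dissemination, and training and combines peer-assisted pipelined broadcast with cost-aware activation of heterogeneous workers, significantly improving cost efficiency while preserving reward performance.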

Comments 24 pages, 7 figures


AI中文摘要

强化学习（RL）是大语言模型（LLM）后训练的关键阶段，涉及推演生成、奖励评估和集中学习之间的反复交互。分布式推演执行提供了利用更具成本效益的推理资源的机会，但引入了广域协调和策略传播方面的挑战。我们提出了ECHO-2，一个用于后训练的分布式RL框架，使用远程推理工作节点且传播延迟不可忽略。ECHO-2将集中学习与分布式推演相结合，将有界策略过时性视为用户可控参数，使得推演生成、传播和训练能够重叠。我们引入了一个基于重叠的容量模型，关联训练时间、传播延迟和推演吞吐量，得出了一个维持学习器利用率的实用配置规则。为了缓解传播瓶颈并降低成本，ECHO-2采用了对等辅助流水线广播和成本感知的异构工作节点激活。在真实广域网带宽条件下，对4B到32B参数规模的LLM进行GRPO后训练的实验表明，ECHO-2在保持与强基线相当的RL奖励的同时，显著提高了成本效率。

英文摘要

Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized learning. Distributing rollout execution offers opportunities to leverage more cost-efficient inference resources, but introduces challenges in wide-area coordination and policy dissemination. We present ECHO-2, a distributed RL framework for post-training with remote inference workers and non-negligible dissemination latency. ECHO-2 combines centralized learning with distributed rollouts and treats bounded policy staleness as a user-controlled parameter, enabling rollout generation, dissemination, and training to overlap. We introduce an overlap-based capacity model that relates training time, dissemination latency, and rollout throughput, yielding a practical provisioning rule for sustaining learner utilization. To mitigate dissemination bottlenecks and lower cost, ECHO-2 employs peer-assisted pipelined broadcast and cost-aware activation of heterogeneous workers. Experiments on GRPO post-training of LLMs ranging from 4B to 32B parameters under real wide-area bandwidth regimes show that ECHO-2 significantly improves cost efficiency while preserving RL reward comparable to strong baselines.


2512.01678 2026-05-27 cs.LG cs.DC cs.PL 版本更新

Morphling: Fast, Fused, and Flexible GNN Training at Scale

Morphling: 快速、融合且灵活的图神经网络规模化训练

Anubhab, Rupesh Nasre

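Affiliations: IIT Madras

AI summary: Proposes Morphling, a domain-specific code synthesizer that uses architecture-aware primitives and a runtime sparsity-aware execution engine to significantly raise GNN training throughput and reduce memory consumption on CPUs, GPUs, and in distributed settings.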


AI中文摘要

图神经网络（GNN）通过融合不规则、内存受限的图遍历与规则、计算密集型密集矩阵运算，带来了根本性的硬件挑战。虽然PyTorch Geometric（PyG）和Deep Graph Library（DGL）等框架优先考虑高级可用性，但它们未能解决这些不同的执行特性。因此，它们依赖通用内核，导致缓存局部性差、内存移动过多以及大量中间分配。为了解决这些限制，我们提出了Morphling，一个旨在弥合这一差距的领域特定代码合成器。Morphling将高级GNN规范编译为可移植的、后端特化的实现，针对OpenMP、CUDA和MPI。它通过实例化一个针对每个执行环境定制的优化、架构感知原语库来实现这一点。Morphling还包含一个运行时稀疏感知执行引擎，该引擎使用输入特征统计动态选择密集或稀疏执行路径，减少对零值条目的不必要计算。我们在涵盖不同图结构、特征维度和稀疏程度的11个真实世界数据集上评估了Morphling。与PyG和DGL相比，Morphling在CPU上平均提高每轮训练吞吐量20倍，在GPU上提高19倍，在分布式设置中提高6倍，峰值加速达到66倍。Morphling的内存高效布局进一步将峰值内存消耗降低多达15倍，使得在商用硬件上进行大规模GNN训练成为可能。这些发现表明，专门的、架构感知的代码合成为跨不同并行和分布式平台的高性能GNN执行提供了一条有效且可扩展的路径。

英文摘要

Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. While frameworks such as PyTorch Geometric (PyG) and Deep Graph Library (DGL) prioritize high-level usability, they fail to address these divergent execution characteristics. As a result, they rely on generic kernels that suffer from poor cache locality, excessive memory movement, and substantial intermediate allocations. To address these limitations, we present Morphling, a domain-specific code synthesizer designed to bridge this gap. Morphling compiles high-level GNN specifications into portable, backend-specialized implementations targeting OpenMP, CUDA, and MPI. It achieves this by instantiating a library of optimized, architecture-aware primitives tailored to each execution environment. Morphling also incorporates a runtime sparsity-aware execution engine that dynamically selects dense or sparse execution paths using input feature statistics, reducing unnecessary computation on zero-valued entries. We evaluate Morphling on eleven real-world datasets spanning diverse graph structures, feature dimensionalities, and sparsity regimes. Morphling improves per-epoch training throughput by an average of 20X on CPUs, 19X on GPUs, and 6X in distributed settings over PyG and DGL, with peak speedups reaching 66X. Morphling's memory-efficient layouts further reduce peak memory consumption by up to 15X, enabling large-scale GNN training on commodity hardware. These findings demonstrate that specialized, architecture-aware code synthesis provides an effective and scalable path toward high-performance GNN execution across diverse parallel and distributed platforms.


2603.23994 2026-05-27 cs.LG cs.AI 版本更新

Understanding the Challenges in Iterative Generative Optimization with LLMs

理解大语言模型迭代生成优化中的挑战

Allen Nie, Xavier Daull, Zhiyi Kuang, Abhinav Akkiraju, Anish Chaudhuri, Max Piasevoli, Ryan Rong, YuCheng Yuan, Prerit Choudhary, Shannon Xiao, Rasool Fakoor, Adith Swaminathan, Ching-An Cheng

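Affiliations: Google DeepMind; CNRS; Stanford University; Carnegie Mellon University; Microsoft; AWS; Netflix Research; Microsoft Research

AI summary: Through case studies, this paper shows that hidden design choices in LLM-based iterative generative optimization (the starting artifact, credit assignment, and batching) can determine whether optimization succeeds, and argues that the lack of a universal way to set up learning loops across domains is a major hurdle for productionization and adoption.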

Comments 39 pages, 17 figures


AI中文摘要

生成优化利用大型语言模型（LLMs）通过执行反馈迭代改进工件（如代码、工作流或提示）。这是一种构建自我改进代理的有前途的方法，但在实践中仍然脆弱：尽管有活跃的研究，只有9%的调查代理使用了任何自动优化。我们认为这种脆弱性是因为，为了建立学习循环，工程师必须做出“隐藏”的设计选择：优化器可以编辑什么，以及在每次更新时提供什么“正确”的学习证据？我们调查了影响大多数应用的三个因素：起始工件、执行轨迹的信用跨度，以及将试错批处理为学习证据。通过在MLAgentBench、Atari和BigBench Extra Hard中的案例研究，我们发现这些设计决策可以决定生成优化是否成功，然而它们在先前的工作中很少被明确说明。不同的起始工件决定了在MLAgentBench中哪些解决方案是可达到的，截断的轨迹仍然可以改进Atari代理，而更大的小批量并不会单调地改善BBEH上的泛化。我们得出结论，缺乏一种简单、通用的跨领域设置学习循环的方法是生产化和采用的主要障碍。我们为做出这些选择提供了实用指导。

英文摘要

Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite active research, only 9% of surveyed agents used any automated optimization. We argue that this brittleness arises because, to set up a learning loop, an engineer must make "hidden" design choices: What can the optimizer edit and what is the "right" learning evidence to provide at each update? We investigate three factors that affect most applications: the starting artifact, the credit horizon for execution traces, and batching trials and errors into learning evidence. Through case studies in MLAgentBench, Atari, and BigBench Extra Hard, we find that these design decisions can determine whether generative optimization succeeds, yet they are rarely made explicit in prior work. Different starting artifacts determine which solutions are reachable in MLAgentBench, truncated traces can still improve Atari agents, and larger minibatches do not monotonically improve generalization on BBEH. We conclude that the lack of a simple, universal way to set up learning loops across domains is a major hurdle for productionization and adoption. We provide practical guidance for making these choices.


2603.17685 2026-05-27 cs.LG 版本更新

Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints

基于镜像下降和熵约束的流匹配策略优化

Ting Gao, Stavros Orfanoudakis, Nan Lin, Winnie Daamen, Serge Hoogendoorn, Elvin Isufi

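AI summary: To address policy expressiveness and the exploration-exploitation balance in online reinforcement learning, proposes FMER, an ODE-based flow matching framework that combines simulation-free policy optimization, a tractable entropy objective, and dynamic temperature tuning, achieving superior performance on sparse-reward tasks.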


AI中文摘要

平衡策略表达性与探索-利用权衡是在线强化学习（RL）中的核心挑战。虽然基于随机微分方程（SDE）的扩散策略可以表示复杂的多模态动作分布，但它们存在两个关键限制：其随机逆过程使熵难以处理（需要启发式探索），并且通过长去噪链计算策略梯度既昂贵又不稳定。在这项工作中，我们表明基于ODE的流匹配通过实现免模拟策略优化和可处理的熵计算，从本质上解决了这些问题。基于此，我们引入了基于镜像下降和熵约束的流匹配策略优化（FMER）。我们的框架以三种方式利用这一见解。首先，我们从理论上证明，最小化优势加权条件流匹配损失可以作为策略镜像下降的免模拟替代。这引导速度场朝向高价值区域，同时完全避免通过ODE求解器进行反向传播。其次，我们推导了一个解析熵目标，该目标校正了由$\tanh$变换（将无界潜在空间映射到有界动作）引起的密度失真，从而促进了有原则的最大熵优化。最后，我们基于有效样本量动态调整镜像下降温度，以在训练期间强制执行稳健的信任区域。实验评估表明，FMER在具有挑战性的稀疏奖励FrankaKitchen环境中实现了优越的性能，同时在标准密集奖励MuJoCo基准测试中保持了有竞争力的结果。

英文摘要

Balancing policy expressiveness with the exploration-exploitation trade-off is a core challenge in online Reinforcement Learning (RL). While Stochastic Differential Equation (SDE)-based diffusion policies can represent complex, multimodal action distributions, they suffer from two critical limitations: their stochastic reverse processes render entropy intractable (necessitating heuristic exploration), and computing policy gradients through long denoising chains is expensive and unstable. In this work, we show that ODE-based flow matching inherently resolves these issues by enabling both simulation-free policy optimization and tractable entropy computation. Building on this, we introduce Flow Matching Policy Optimization with Mirror Descent and Entropy Constraints (FMER). Our framework exploits this insight in three ways. First, we theoretically establish that minimizing an advantage-weighted conditional flow matching loss acts as a simulation-free surrogate for policy mirror descent. This steers the velocity field toward high-value regions while entirely avoiding backpropagation through the ODE solver. Second, we derive an analytic entropy objective that corrects for the density distortion caused by the $\tanh$ transformation (mapping an unbounded latent space to bounded actions), thereby facilitating principled maximum-entropy optimization. Finally, we dynamically tune the mirror descent temperature based on the effective sample size to enforce a robust trust region during training. Empirical evaluations demonstrate that FMER achieves superior performance on the challenging sparse-reward FrankaKitchen environment, while maintaining competitive results across standard dense-reward MuJoCo benchmarks.
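
The abstract does not state the formulas behind the $\tanh$ density correction or the effective sample size, so the block below records only the standard textbook identities, not necessarily the exact objectives derived in the paper. The symbols are assumptions for this note: $u \in \mathbb{R}^d$ is the unbounded latent action with density $p(u \mid s)$, $a = \tanh(u)$ is the bounded action, and $w_n$ are importance weights over $N$ samples.

```latex
\begin{align*}
\log \pi(a \mid s) &= \log p(u \mid s) - \sum_{i=1}^{d} \log\bigl(1 - \tanh^{2}(u_i)\bigr),
  \qquad a = \tanh(u), \\
\mathcal{H}\bigl[\pi(\cdot \mid s)\bigr] &= \mathcal{H}\bigl[p(\cdot \mid s)\bigr]
  + \mathbb{E}_{u \sim p(\cdot \mid s)}\Bigl[\sum_{i=1}^{d} \log\bigl(1 - \tanh^{2}(u_i)\bigr)\Bigr], \\
\mathrm{ESS} &= \frac{\bigl(\sum_{n=1}^{N} w_n\bigr)^{2}}{\sum_{n=1}^{N} w_n^{2}} .
\end{align*}
```

The first line is the change-of-variables rule for an elementwise squashing function; the second follows by taking the negative expectation of the first. Because $\log(1 - \tanh^{2}(u_i)) \le 0$, squashing can only lower the entropy relative to the latent distribution, which is the distortion an entropy objective has to account for.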


2603.11790 2026-05-27 cs.LG 版本更新

Disentangled Representation Learning through Unsupervised Symmetry Group Discovery

通过无监督对称群发现实现解缠表示学习

Barthélémy Dang-Nhu, Louis Annabi, Sylvain Argentieri

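Affiliations: Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique, ISIR

AI summary: Proposes a method by which an embodied agent autonomously discovers the group structure of its action space through unsupervised interaction with the environment, proves identifiability of the true symmetry group decomposition under minimal assumptions, and derives two algorithms for learning Linear Symmetry-Based Disentangled representations.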


AI中文摘要

基于对称性的解缠表示学习利用环境变换的群结构来揭示潜在的变化因素。先前的基于对称性的解缠方法需要对称群结构的强先验知识，或对子群性质做出限制性假设。在这项工作中，我们通过提出一种方法消除了这些约束，该方法使具身智能体通过与环境的无监督交互自主发现其动作空间的群结构。我们证明了在最小假设下真实对称群分解的可识别性，并推导出两种算法：一种用于从交互数据中发现群分解，另一种用于在不假设特定子群性质的情况下学习线性对称基解缠（LSBD）表示。我们的方法在三个表现出不同群分解的环境中得到了验证，其性能优于现有的LSBD方法。

英文摘要

Symmetry-based disentangled representation learning leverages the group structure of environment transformations to uncover the latent factors of variation. Prior approaches to symmetry-based disentanglement have required strong prior knowledge of the symmetry group's structure, or restrictive assumptions about the subgroup properties. In this work, we remove these constraints by proposing a method whereby an embodied agent autonomously discovers the group structure of its action space through unsupervised interaction with the environment. We prove the identifiability of the true symmetry group decomposition under minimal assumptions, and derive two algorithms: one for discovering the group decomposition from interaction data, and another for learning Linear Symmetry-Based Disentangled (LSBD) representations without assuming specific subgroup properties. Our method is validated on three environments exhibiting different group decompositions, where it outperforms existing LSBD approaches.


2601.10566 2026-05-27 cs.CL cs.LG 版本更新

Representation-Aware Unlearning via Activation Signatures: From Suppression to Entity-Signature Erasure

基于激活签名的表示感知遗忘：从抑制到实体签名擦除

Syed Naveed Mahmood, Md. Rezaur Rahman Bhuiyan, Tasfia Zaman, Jareen Tasneem Khondaker, Md. Sameer Sakib, K. M. Shadman Wadith, Nazia Tasnim, Farig Sadeque

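Affiliations: Computer Science and Engineering, BRAC University, Dhaka, Bangladesh; Boston University, Boston, MA, USA

AI summary: Proposes the ERUF framework, which mines entity-specific activation signatures and suppresses the corresponding directions to achieve representation-level unlearning while jointly maintaining surface-level suppression, internal attenuation, and utility preservation.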

Comments 16 pages, 4 figures


AI中文摘要

实体级遗忘通常通过模型输出评估：是否停止命名目标、拒绝查询或改变真值比分布。然而，这些输出级测试无法显示主体的内部表示是否被衰减。我们引入实体表示遗忘框架（ERUF），这是一个表示感知框架，挖掘主体特定的激活签名，抑制相应的激活方向，并将行为蒸馏到LoRA参数中。在评估的基线中，ERUF是唯一同时实现表面级抑制、内部衰减和效用保留的方法。在TOFU forget10上，ERUF达到FQ=0.99和MU=0.62，匹配报告的神谕效用，同时接近神谕遗忘质量。在大多数标准基础模型设置中，ERUF保持低泄漏和低内部目标激活，SMR在0.00%至1.10%之间，EL10低于0.06，效用漂移低于3%。在Llama-3.1-8B上，对抗性实体恢复从63.89%降至20.15%，而名称无关恢复减少72.7%至77.4%。联合表面/内部诊断进一步揭示了推理优先模型中仅靠表面指标无法发现的尺度依赖行为。我们将这些结果解释为表示层面衰减的操作性证据，而非不可逆删除的正式保证。

英文摘要

Entity-level unlearning is usually evaluated by what a model says: whether it stops naming the target, refuses a query, or shifts a Truth Ratio distribution. These output-level tests, however, do not show whether a subject's internal representation has been attenuated. We introduce the Entity Representation Unlearning Framework (ERUF), a representation-aware framework that mines subject-specific activation signatures, suppresses the corresponding activation direction, and distills the behavior into LoRA parameters. Among evaluated baselines, ERUF is the only method that jointly achieves surface-level suppression, internal attenuation, and utility preservation. On TOFU forget10, ERUF achieves FQ = 0.99 and MU = 0.62, matching reported oracle utility while approaching oracle forget quality. Across most standard foundation-model settings, ERUF maintains low leakage and low internal target activation, with SMR between 0.00% and 1.10%, EL10 below 0.06, and utility drift below 3%. On Llama-3.1-8B, adversarial entity recovery falls from 63.89% to 20.15%, while name-agnostic recovery decreases by 72.7% to 77.4%. Joint surface/internal diagnostics further reveal scale-dependent behavior in reasoning-prior models that surface metrics alone would miss. We interpret these results as operational evidence of representation-level attenuation, not as a formal guarantee of irreversible deletion.
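
The abstract does not say how an activation direction is suppressed. One common way to do this in representation editing is to project the direction out of a hidden state, sketched below in plain Python. This is an assumed stand-in for illustration, not ERUF's actual procedure; `suppress_direction`, `strength`, and the toy vectors are hypothetical.

```python
import math

def suppress_direction(hidden, direction, strength=1.0):
    """Remove (strength=1.0) or damp the component of `hidden` that lies
    along `direction`; all other components are left unchanged."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    coeff = sum(h * u for h, u in zip(hidden, unit))   # projection length
    return [h - strength * coeff * u for h, u in zip(hidden, unit)]

hidden = [3.0, 4.0, 1.0]
signature = [0.0, 2.0, 0.0]          # hypothetical entity direction
edited = suppress_direction(hidden, signature)
```

After the edit, the hidden state has no component along the signature direction, while the orthogonal components are untouched. That property is what lets an internal probe check attenuation separately from what the model outputs.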


2603.16654 2026-05-27 cs.CL cs.AI cs.LG 版本更新

Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models

Omanic：迈向大语言模型多跳推理的逐步评估

Xiaojie Gu, Sherry T. Tong, Aosong Feng, Sophia Simeng Han, Jinghui Lu, Yingjian Chen, Yusuke Iwasawa, Yutaka Matsuo, Chanjun Park, Rex Ying, Irene Li

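Affiliations: The University of Tokyo; Yale University; Stanford University; Xiaomi EV; Soongsil University

AI summary: To make intermediate-step reasoning failures in multi-hop question answering diagnosable, proposes the Omanic benchmark, which decomposes questions into single-hop sub-questions and analyzes step-level errors, revealing a later-hop bottleneck, a factual knowledge floor, and error propagation; fine-tuning on its training split improves performance on several reasoning benchmarks.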


AI中文摘要

仅从最终答案评估大语言模型（LLM）的推理能力可能会掩盖中间步骤的失败，尤其是在没有步骤级标注的多跳问答基准中。为解决这一问题，我们引入了Omanic，一个开放域4跳问答基准，它不仅用于衡量最终答案的准确性，还用于诊断推理在何处中断。Omanic包含10,296个机器生成的训练示例（OmanicSynth）和967个经专家审核的人工标注评估示例（OmanicBench），每个评估问题被分解为单跳子问题、中间答案和结构化图拓扑。对专有和开源LLM的实验表明，Omanic具有挑战性，而逐步分析揭示了后期跳数瓶颈、事实知识下限以及沿推理链的错误传播。在OmanicSynth上微调可迁移到六个推理和数学基准，平均提升7.41分，验证了其作为推理能力迁移监督的有效性。我们在https://huggingface.co/datasets/li-lab/Omanic 发布数据，在https://github.com/XiaojieGu/Omanic 发布代码。

英文摘要

Evaluating the reasoning abilities of large language models (LLMs) solely from final answers can obscure failures in intermediate steps, especially in multi-hop QA benchmarks without step-level annotations. To address this gap, we introduce Omanic, an open-domain 4-hop QA benchmark designed not only to measure final-answer accuracy but also to diagnose where reasoning breaks down. Omanic contains 10,296 machine-generated training examples (OmanicSynth) and 967 expert-reviewed human-annotated evaluation examples (OmanicBench), with each evaluation question decomposed into single-hop sub-questions, intermediate answers, and structured graph topologies. Experiments with proprietary and open-source LLMs show that Omanic is challenging, while step-wise analysis reveals a later-hop bottleneck, factual knowledge floor, and error propagation along reasoning chains. Fine-tuning on OmanicSynth transfers to six reasoning and mathematics benchmarks, yielding a 7.41-point average gain and validating its effectiveness as supervision for reasoning-capability transfer. We release the data at https://huggingface.co/datasets/li-lab/Omanic and the code at https://github.com/XiaojieGu/Omanic.
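
As a rough illustration of step-wise diagnosis, the sketch below locates the first hop at which a model's intermediate answer diverges from the reference chain. It assumes exact string matching and an invented 4-hop example; the benchmark's real scoring rules and data format may differ, and `first_failed_hop` is a hypothetical helper.

```python
def first_failed_hop(predicted, gold):
    """Return the 1-based index of the first hop whose intermediate answer is
    wrong, or None if every hop matches (case-insensitive exact match)."""
    for hop, (pred, ref) in enumerate(zip(predicted, gold), start=1):
        if pred.strip().lower() != ref.strip().lower():
            return hop
    return None

# Invented toy chain, not taken from the dataset.
gold = ["Marie Curie", "Warsaw", "Poland", "Zloty"]
pred = ["Marie Curie", "Warsaw", "Hungary", "Forint"]
failed_at = first_failed_hop(pred, gold)
```

Here the final answer is wrong, but the per-hop view shows the chain broke at hop 3 and the error then propagated to hop 4, which is the distinction a final-answer-only metric cannot make.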


2603.15500 2026-05-27 cs.AI cs.LG 版本更新

CompassDPO: 用于鲁棒安全对齐的动态控制直接偏好优化

Jilong Liu, Yonghui Yang, Pengyang Shao, Wenjian Tao, Hao Zhan, Haokai Ma, Wei Qin, Richang Hong

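Affiliations: Hefei University of Technology; National University of Singapore

AI summary: Proposes CompassDPO, which uses the implicit DPO reward margin to control update direction and magnitude without an external reward model, improving robustness on PKU-SafeRLHF and other benchmarks.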


AI中文摘要

直接偏好优化（DPO）已成为安全对齐的标准框架，但其对成对偏好更新的依赖使得训练对不完美监督敏感。现有的鲁棒DPO方法通常通过全局损失校正或外部数据级干预来解决这种敏感性，而很大程度上忽略了不可靠比较如何扭曲批次级优化动态。我们提出CompassDPO，一种无奖励的DPO框架，通过动态控制稳定偏好优化。使用隐式DPO奖励边际作为训练时的指南针，CompassDPO沿着两个互补轴调节样本影响：更新方向和更新幅度。对于方向控制，它应用稀疏、有预算和预热延迟的损失混合，以减弱与新兴偏好方向冲突的更新分量。对于幅度控制，它自适应地软温莎化高损失尾部贡献，减少尾部主导同时保留来自困难样本的有用梯度。两种机制仅使用标准DPO训练期间可用的信号，无需外部奖励模型或额外监督。在PKU-SafeRLHF上跨四个骨干网络和多个分布外安全基准的实验表明，CompassDPO在鲁棒性上持续优于普通DPO和强DPO系列基线，特别是在受控标签翻转噪声下。代码可在https://anonymous.4open.science/r/CompassDPO-4D00获取。

英文摘要

Direct Preference Optimization (DPO) has become a standard framework for safety alignment, but its reliance on pairwise preference updates makes training sensitive to imperfect supervision. Existing robust DPO methods often address this sensitivity through global loss corrections or external data-level interventions, while largely overlooking how unreliable comparisons distort batch-level optimization dynamics. We propose CompassDPO, a reward-free DPO framework that stabilizes preference optimization through dynamics control. Using the implicit DPO reward margin as a training-time compass, CompassDPO regulates sample influence along two complementary axes: update direction and update magnitude. For directional control, it applies sparse, budgeted, and warm-up delayed loss mixing to attenuate update components that conflict with the emerging preference direction. For magnitude control, it adaptively soft-winsorizes high-loss tail contributions, reducing tail dominance while preserving useful gradients from hard examples. Both mechanisms use only signals available during standard DPO training and require no external reward model or additional supervision. Experiments on PKU-SafeRLHF across four backbones and multiple out-of-distribution safety benchmarks show that CompassDPO consistently improves robustness over vanilla DPO and strong DPO-family baselines, especially under controlled label-flip noise. Code is available at https://anonymous.4open.science/r/CompassDPO-4D00
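
For context, the "implicit DPO reward margin" is the quantity inside the standard DPO loss. The sketch below computes it from sequence log-probabilities and then applies a plain hard cap to the loss tail. The cap is a deliberately simple stand-in and is not the paper's adaptive soft-winsorization; the function names, the `beta` value, and the numbers are illustrative assumptions.

```python
import math

def dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Implicit reward margin: beta * [(policy - ref) on the chosen response
    minus (policy - ref) on the rejected response]."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

def dpo_loss(margin):
    """Standard DPO loss, -log(sigmoid(margin)), in a numerically stable form."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def clip_tail(losses, cap):
    """Hard winsorization: cap each loss at `cap` (simple stand-in only)."""
    return [min(loss, cap) for loss in losses]

margins = [dpo_margin(-10.0, -12.0, -11.0, -11.0),   # policy agrees with the label
           dpo_margin(-15.0, -9.0, -11.0, -11.0)]    # policy conflicts with the label
losses = [dpo_loss(m) for m in margins]
capped = clip_tail(losses, cap=0.9)
```

A negative margin marks a pair where the policy currently prefers the rejected response. Such pairs produce the largest losses, so if the label is noisy they can dominate a batch, which is the kind of tail contribution the abstract describes damping.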


2602.13626 2026-05-27 cs.LG 版本更新

Benchmark Leakage Trap: Can We Trust LLM-based Recommendation?

基准泄露陷阱：我们能信任基于LLM的推荐吗？

Mingqiao Zhang, Qiyao Peng, Yinghui Wang, Hongtao Liu, Yumeng Wang

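Affiliations: Nanjing University; Tianjin University; Beijing Institute of Control and Electronic Technology

AI summary: Identifies and studies benchmark data leakage in LLM-based recommender systems, simulating several leakage scenarios to show how leakage distorts performance evaluation.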


AI中文摘要

大语言模型（LLMs）在推荐系统中的广泛应用对评估可靠性提出了严峻挑战。本文识别并研究了一个此前被忽视的问题：基于LLM的推荐中的基准数据泄露。当LLMs在预训练或微调过程中暴露于并可能记忆基准数据集时，就会发生这种现象，导致性能指标被人为夸大，无法反映模型真实性能。为验证这一现象，我们通过在战略混合语料库（包括来自域内和域外的用户-物品交互）上对基础模型进行持续预训练，模拟了多种数据泄露场景。我们的实验揭示了数据泄露的双重效应：当泄露数据与领域相关时，会导致显著但虚假的性能提升，误导性地夸大模型能力；相反，与领域无关的泄露通常会降低推荐准确性，突显了这种污染的复杂性和偶然性。我们的发现表明，数据泄露是基于LLM的推荐中一个关键但此前未被考虑的因素，可能影响模型的真实性能。我们在https://github.com/yusba1/LLMRec-Data-Leakage发布了代码。

英文摘要

The expanding integration of Large Language Models (LLMs) into recommender systems poses critical challenges to evaluation reliability. This paper identifies and investigates a previously overlooked issue: benchmark data leakage in LLM-based recommendation. This phenomenon occurs when LLMs are exposed to and potentially memorize benchmark datasets during pre-training or fine-tuning, leading to artificially inflated performance metrics that fail to reflect true model performance. To validate this phenomenon, we simulate diverse data leakage scenarios by conducting continued pre-training of foundation models on strategically blended corpora, which include user-item interactions from both in-domain and out-of-domain sources. Our experiments reveal a dual-effect of data leakage: when the leaked data is domain-relevant, it induces substantial but spurious performance gains, misleadingly exaggerating the model's capability. In contrast, domain-irrelevant leakage typically degrades recommendation accuracy, highlighting the complex and contingent nature of this contamination. Our findings reveal that data leakage acts as a critical, previously unaccounted-for factor in LLM-based recommendation, which could impact the true model performance. We release our code at https://github.com/yusba1/LLMRec-Data-Leakage.


2603.01800 2026-05-27 cs.LG cs.AI stat.ML stat.OT 版本更新

Phase-Type Variational Autoencoders for Heavy-Tailed Data

Phase-Type变分自编码器用于重尾数据

Abdelhakim Ziani, András Horváth, Paolo Ballarini

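Affiliations: Université Paris Saclay, Lab. MICS, CentraleSupélec, Gif-sur-Yvette, France

AI summary: Proposes the Phase-Type Variational Autoencoder (PH-VAE), whose decoder is a latent-conditioned phase-type distribution (the absorption time of a continuous-time Markov chain) that adapts flexibly to heavy-tailed behavior and outperforms Gaussian, Student-t, and extreme-value VAE decoders on synthetic and real-world benchmarks.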


AI中文摘要

重尾分布在现实世界数据中无处不在，其中罕见但极端的事件主导了风险和变异性。然而，标准变分自编码器（VAE）采用简单的解码器分布，如高斯分布，无法捕捉重尾行为，而现有的重尾感知扩展仍然局限于预定义的参数族，其尾部行为是预先固定的。我们提出了Phase-Type变分自编码器（PH-VAE），其解码器分布是一个潜在条件的Phase-Type（PH）分布，定义为连续时间马尔可夫链（CTMC）的吸收时间。这种公式组合了多个指数时间尺度，产生了一个灵活且解析可处理的解码器，它直接从观测数据中调整其有限范围的尾部行为。在合成和真实世界基准上的实验表明，PH-VAE能够准确逼近各种重尾分布，在建模观测到的尾部行为和极端分位数方面显著优于基于高斯、Student-t和极值的VAE解码器。在多变量设置中，PH-VAE通过其共享的潜在表示捕捉了现实中的跨维度尾部依赖性。据我们所知，这是首次将Phase-Type分布整合到深度生成建模中的工作，桥接了应用概率论和表示学习。

英文摘要

Heavy-tailed distributions are ubiquitous in real-world data, where rare but extreme events dominate risk and variability. However, standard Variational Autoencoders (VAEs) employ simple decoder distributions, such as Gaussian distributions, that fail to capture heavy-tailed behavior, while existing heavy-tail-aware extensions remain restricted to predefined parametric families whose tail behavior is fixed a priori. We propose the Phase-Type Variational Autoencoder (PH-VAE), whose decoder distribution is a latent-conditioned Phase-Type (PH) distribution, defined as the absorption time of a continuous-time Markov chain (CTMC). This formulation composes multiple exponential time scales, yielding a flexible and analytically tractable decoder that adapts its finite-range tail behavior directly from the observed data. Experiments on synthetic and real-world benchmarks demonstrate that PH-VAE accurately approximates diverse heavy-tailed distributions, significantly outperforming Gaussian, Student-t, and extreme-value-based VAE decoders in modeling observed tail behavior and extreme quantiles. In multivariate settings, PH-VAE captures realistic cross-dimensional tail dependence through its shared latent representation. To our knowledge, this is the first work to integrate Phase-Type distributions into deep generative modeling, bridging applied probability and representation learning.
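
To show what "absorption time of a continuous-time Markov chain" means in practice, the sketch below samples from a phase-type distribution by simulating the chain directly. It covers only the distribution itself, with a fixed `alpha` and `T`; in PH-VAE these parameters would be produced by the decoder from the latent code, which is not modeled here. The function name and the two-phase example are assumptions for illustration.

```python
import random

def sample_phase_type(alpha, T, rng):
    """Draw one absorption time of a CTMC with initial distribution `alpha`
    over transient states and sub-generator matrix `T`."""
    n = len(alpha)
    state = rng.choices(range(n), weights=alpha)[0]
    elapsed = 0.0
    while True:
        rate = -T[state][state]              # total rate of leaving `state`
        elapsed += rng.expovariate(rate)     # exponential holding time
        # Jump weights: other transient states, plus absorption (index n).
        weights = [T[state][j] if j != state else 0.0 for j in range(n)]
        weights.append(max(0.0, rate - sum(weights)))  # exit rate to absorption
        nxt = rng.choices(range(n + 1), weights=weights)[0]
        if nxt == n:
            return elapsed
        state = nxt

# Two-phase example (Erlang-2 with rate 1): phase 0 -> phase 1 -> absorbed.
alpha = [1.0, 0.0]
T = [[-1.0, 1.0],
     [0.0, -1.0]]
rng = random.Random(0)
samples = [sample_phase_type(alpha, T, rng) for _ in range(20000)]
mean_time = sum(samples) / len(samples)
```

The sample mean should land close to the theoretical mean of 2 for this chain. Mixing phases with very different rates is what allows such a distribution to mimic heavy tails over a finite range, even though its tail is ultimately exponential.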


2603.01327 2026-05-27 cs.SE cs.CL cs.LG 版本更新

DeepInterestGR: 利用多模态大语言模型挖掘深度多兴趣用于生成式推荐

Yangchen Zeng, Zhenyu Yu, Zhiyuan Hu, Wenxin Zhang, Jinze Wang, Rongfeng Guo

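Affiliations: Southeast University

AI summary: Proposes the DeepInterestGR framework, which addresses the shallow-interest problem in generative recommendation through multi-LLM interest mining, reward-labeled deep interests, and interest-enhanced item discretization, significantly improving recommendation performance on three Amazon datasets.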


AI中文摘要

我们介绍了DeepInterestGR，一个将深度兴趣挖掘集成到生成式推荐流程中的新颖框架。这解决了“浅层兴趣”问题——现有的生成方法依赖于表面文本特征，未能捕捉潜在的用户动机，限制了个性化深度和推荐可解释性。我们的方法通过结构化推理提示利用多LLM兴趣挖掘（MLIM），通过奖励标记深度兴趣（RLDI）进行质量控制，通过RQ-VAE进行兴趣增强物品离散化（IEID），并结合由兴趣感知奖励引导的两阶段SFT-GRPO训练流程。我们在三个Amazon Review基准（Beauty、Sports、Instruments）上验证了DeepInterestGR，与包括SASRec、BERT4Rec、TIGER、LC-Rec和S-DPO在内的14个最先进基线进行了比较。我们的方法在HR@10上实现了5.8%-8.3%的相对改进，在NDCG@10上实现了7.7%-9.9%的相对改进，跨领域泛化增益达到+24.8%。这些结果证明，融入深度语义兴趣可以有效改进基于SID的生成式推荐。

英文摘要

We introduce DeepInterestGR, a novel framework that integrates deep interest mining into the generative recommendation pipeline. This addresses the "Shallow Interest" problem - existing generative methods rely on surface-level textual features and fail to capture latent user motivations, limiting personalization depth and recommendation interpretability. Our approach leverages Multi-LLM Interest Mining (MLIM) via structured reasoning prompting, Reward-Labeled Deep Interest (RLDI) for quality control, and Interest-Enhanced Item Discretization (IEID) via RQ-VAE, combined with a two-stage SFT-GRPO training pipeline guided by an Interest-Aware Reward. We validate DeepInterestGR on three Amazon Review benchmarks (Beauty, Sports, Instruments), comparing against 14 state-of-the-art baselines including SASRec, BERT4Rec, TIGER, LC-Rec, and S-DPO. Our method achieves 5.8%-8.3% relative improvements on HR@10 and 7.7%-9.9% on NDCG@10 over the strongest baseline, with cross-domain generalization gains of +24.8%. These results provide evidence that incorporating deep semantic interests can effectively improve SID-based generative recommendation.


2602.17605 2026-05-27 cs.CV cs.AI cs.CY cs.LG 版本更新

Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery

即时主动适应：面向地理空间发现的相关性引导在线元学习与潜在概念

Jowaria Khan, Anindya Sarkar, Yevgeniy Vorobeychik, Elizabeth Bondi-Kelly

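Affiliations: University of Michigan, Ann Arbor, MI, USA; Washington University in St. Louis, St. Louis, MO, USA

AI summary: Proposes a unified geospatial discovery framework that combines active learning, online meta-learning, and concept-guided reasoning, using concept-weighted uncertainty sampling and relevance-aware meta-batch formation to efficiently uncover hidden targets under limited data and dynamic environments.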


AI中文摘要

在环境监测中，数据收集通常成本高昂、稀疏且受紧急公共卫生需求影响。这对于致癌的PFAS（全氟和多氟烷基物质）污染尤其如此，与领域专家和环境组织的讨论强调需要在有限的采样预算下战略性地识别高风险、观测不足的区域。更广泛地说，在灾害响应和公共卫生环境中也出现了类似的挑战，动态环境使得从有限的地面实况中高效发现隐藏目标变得至关重要。然而，稀疏且有偏差的地理空间标签限制了现有基于学习方法（如强化学习）的适用性。为了解决这个问题，我们提出了一个统一的地理空间发现框架，该框架集成了主动学习、在线元学习和概念引导推理。我们的方法引入了两个基于共享的*概念相关性*概念的关键创新，该概念捕捉领域特定因素如何影响目标存在：一个*概念加权不确定性采样策略*，其中不确定性通过从现成概念（如土地覆盖和源距离）学习到的相关性进行调节；以及一个*相关性感知元批次形成策略*，该策略在在线元更新期间促进语义多样性，提高动态环境中的泛化能力。我们在PFAS污染发现任务上评估了我们的框架，这是一个受真实世界启发的环境监测任务，展示了在有限数据和变化条件下鲁棒的目标发现能力。

英文摘要

In environmental monitoring, data collection is often costly, sparse, and shaped by urgent public-health needs. This is particularly true for cancer-causing PFAS (per- and polyfluoroalkyl substances) contamination, where discussions with domain experts and environmental organizations highlight the need to strategically identify high-risk, under-observed regions under tight sampling budgets. More broadly, similar challenges arise in disaster response and public health settings, where dynamic environments make it essential to efficiently uncover hidden targets from limited ground truth. Yet sparse and biased geospatial labels limit the applicability of existing learning-based methods, such as reinforcement learning. To address this, we propose a unified geospatial discovery framework that integrates active learning, online meta-learning, and concept-guided reasoning. Our approach introduces two key innovations built on a shared notion of *concept relevance*, capturing how domain-specific factors influence target presence: a *concept-weighted uncertainty sampling strategy*, where uncertainty is modulated by learned relevance from readily available concepts such as land cover and source proximity; and a *relevance-aware meta-batch formation strategy* that promotes semantic diversity during online meta-updates, improving generalization in dynamic environments. We evaluate our framework on PFAS contamination discovery as a real-world inspired environmental monitoring task, demonstrating robust target discovery under limited data and changing conditions.

2510.03352 2026-05-27 cs.CV cs.AI cs.LG 版本更新

Inference-Time Search Using Side Information for Diffusion-Based Image Reconstruction

基于侧信息的推理时搜索用于扩散模型图像重建

Mahdi Farahbakhsh, Vishnu Teja Kunde, Dileep Kalathil, Krishna Narayanan, Jean-Francois Chamberland

发表机构 * Department of Electrical and Computer Engineering, Texas A&M University（电气与计算机工程系，德克萨斯A&M大学）

AI总结提出一种即插即用、无需训练的推理时搜索框架，将侧信息融入现有扩散模型逆问题求解器，显著提升重建质量。

AI中文摘要

扩散模型已被用作解决逆问题的先验。然而，现有方法通常忽略了能够显著提高重建质量的侧信息，尤其是在严重病态设置中。在这项工作中，我们提出了一种新颖的框架，通过推理时搜索将侧信息以即插即用、无需训练的方式融入现有的基于扩散模型的逆问题求解器。通过在多种逆问题（包括图像修复、超分辨率和几种去模糊任务）以及多种基于扩散模型的逆问题求解器（DPS、DAPS和MPGD）上的大量实验，我们表明，用我们的框架增强每个求解器，其重建质量始终优于相应的原始方法。为了展示我们方法的通用性，我们考虑了多种形式的侧信息，包括参考图像、文本描述和解剖学MRI扫描。代码可在该仓库中获取：https://github.com/mahdi-farahbakhsh/DISS。

英文摘要

Diffusion models have been used as priors for solving inverse problems. However, existing approaches typically overlook side information that could significantly improve reconstruction quality, especially in severely ill-posed settings. In this work, we propose a novel framework that incorporates side information into existing diffusion-based inverse problem solvers via inference-time search, in a plug-and-play, training-free manner. Through extensive experiments across a range of inverse problems, including inpainting, super-resolution, and several deblurring tasks, and across multiple diffusion-based inverse problem solvers (DPS, DAPS, and MPGD), we show that augmenting each solver with our framework consistently improves the quality of the reconstructions over the corresponding original method. To demonstrate the generality of our approach, we consider diverse forms of side information, including reference images, textual descriptions, and anatomical MRI scans. The code is available at https://github.com/mahdi-farahbakhsh/DISS.

2602.15919 2026-05-27 stat.ML cs.AI cs.LG 版本更新

Assessing Per-Sample Membership Inference Vulnerability without Retraining

无需重训练的逐样本成员推断脆弱性评估

Valentin Dorseuil, Jamal Atif, Olivier Cappé

发表机构 * ENS, École normale supérieure（巴黎高等师范学院）； Université PSL, CNRS（巴黎文理研究大学、法国国家科学研究中心）； Institut Polytechnique de Paris（巴黎理工学院）

AI总结提出一种基于数据依赖几何度量的逐样本成员推断脆弱性评分方法，仅需单个训练模型即可高效识别高风险样本。

AI中文摘要

近期隐私文献表明，针对样本的成员推断攻击（MIA）显著优于非针对性方法。受此启发，我们探讨以下问题：能否在不训练影子模型的情况下评估单个训练点的隐私脆弱性？我们表明，逐样本对MIA的暴露程度不仅受其损失影响，还受数据依赖的几何度量控制。在线性设置中，我们推导出个体黑盒MIA脆弱性的闭式分解，将其分解为总体杠杆得分和残差损失项，明确了样本依赖的几何结构如何转化为隐私暴露。由于大多数现代架构的最后一层是线性的，我们将此框架扩展到深度网络，并提出一种基于最后一层表示的替代评分，仅需单个训练模型且无需影子模型。跨不同数据集和架构的实验表明，我们的评分在识别最先进攻击下的最高风险点时优于损失和梯度范数基线，为逐样本隐私风险评估提供了计算高效且有理论依据的工具。

英文摘要

Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) outperform untargeted approaches by a wide margin. Motivated by this observation, we address the following question: can the privacy vulnerability of individual training points be assessed without training shadow models? We show that per-sample exposure to MIA is governed not only by a point's loss, but also by a data-dependent geometric measure. In the linear setting, we derive a closed-form decomposition of individual black-box MIA vulnerability into a population leverage score and a residual loss term, making explicit how sample-dependent geometry translates into privacy exposure. Since the final layer of most modern architectures is linear, we extend this framework to deep networks and propose a surrogate score operating on last-layer representations that requires only a single trained model and no shadow models. Empirical evaluations across diverse datasets and architectures show that our score outperforms loss and gradient-norm baselines at identifying the highest-risk points under state-of-the-art attacks, providing a computationally efficient and theoretically grounded tool for per-sample privacy risk assessment.

2602.12833 2026-05-27 cs.LG cs.AI cs.MA 版本更新

Vital Trace: Protocol-Constrained Patient-State Reasoning for Longitudinal Clinical Trajectories

Vital Trace: 协议约束的患者状态推理用于纵向临床轨迹

Zhan Qu, Michael Färber

发表机构 * TU Dresden（德累斯顿理工大学）

AI总结提出Vital Trace，一个协议约束的多智能体框架，通过紧凑的持久患者状态记忆和四个协调智能体（Router、Reasoner、Auditor、Steward）进行分阶段推理，以解决长期临床轨迹推理中的上下文漂移和不稳定问题，在MIMIC-IV和eICU数据集上预测未来血管加压药、呼吸、肾脏支持和恶化任务中优于自由形式多智能体基线。

AI中文摘要

纵向临床推理需要跟踪电子健康记录中患者轨迹的生理测量、实验室结果和干预措施。现有的基于LLM的临床推理系统通常依赖于重复序列化患者历史或交换无约束的文本智能体消息，导致上下文漂移、推理不稳定以及长期推理成本增加。我们提出了Vital Trace，一个协议约束的多智能体框架，用于在动态ICU轨迹上进行未来临床风险预测。Vital Trace不维护无界文本历史，而是使用紧凑的持久患者状态记忆以及由四个协调智能体（Router、Reasoner、Auditor和Steward）执行的分阶段推理。为了支持时间上连贯的推理，我们引入了一个手动策划的全局协议，包含生理状态转换规则和动态患者状态表示，随时间跟踪血流动力学、呼吸、肾脏、代谢和炎症不稳定性。我们在MIMIC-IV和eICU上使用未来血管加压药支持、呼吸支持、肾脏支持和恶化预测任务评估Vital Trace。结果表明，与自由形式多智能体基线相比，结构化的协议约束推理提高了时间一致性、通信稳定性、校准性和可解释性，同时在长期ICU轨迹上实现了强大的预测性能。

英文摘要

Longitudinal clinical reasoning over electronic health records requires tracking evolving physiological measurements, laboratory results, and interventions across extended patient trajectories. Existing LLM-based clinical reasoning systems often rely on repeatedly serializing patient histories or exchanging unconstrained textual agent messages, leading to context drift, unstable reasoning, and growing inference cost over long horizons. We present Vital Trace, a protocol-constrained multi-agent framework for future clinical risk prediction over evolving ICU trajectories. Instead of maintaining unbounded textual histories, Vital Trace uses a compact persistent patient-state memory together with staged reasoning performed by four coordinated agents: a Router, Reasoner, Auditor, and Steward. To support temporally coherent reasoning, we introduce a manually curated Global Protocol containing physiological state-transition rules and a dynamic patient-state representation that tracks hemodynamic, respiratory, renal, metabolic, and inflammatory instability over time. We evaluate Vital Trace on MIMIC-IV and eICU using future vasopressor-support, respiratory-support, renal-support, and deterioration prediction tasks. Results show that structured protocol-constrained reasoning improves temporal consistency, communication stability, calibration, and interpretability compared with free-form multi-agent baselines while achieving strong predictive performance across long ICU trajectories.

2507.11486 2026-05-27 cs.LG 版本更新

Exploring the robustness of TractOracle methods in RL-based tractography

探索基于强化学习的纤维追踪中TractOracle方法的鲁棒性

Jeremi Levesque, Antoine Théberge, Maxime Descoteaux, Pierre-Marc Jodoin

发表机构 * Department of Computer Science, Faculty of Science, University of Sherbrooke（谢布罗克大学计算机科学系）

AI总结本文通过整合强化学习的最新进展，扩展了TractOracle-RL框架，并引入迭代奖励训练（IRT）方法，实验表明基于oracle的RL方法在准确性和解剖有效性上显著优于传统纤维追踪技术。

Comments 38 pages, 8 figures. Submitted to Medical Image Analysis

DOI: 10.1016/j.media.2025.103743
Journal ref: Medical Image Analysis, December 2025

AI中文摘要

纤维追踪算法利用扩散MRI重建大脑白质的纤维结构。在机器学习方法中，强化学习（RL）已成为纤维追踪的一个有前景的框架，在几个关键方面优于传统方法。TractOracle-RL是一种最新的基于RL的方法，通过基于奖励的机制将解剖先验纳入训练过程，减少了假阳性。在本文中，我们通过整合RL的最新进展，研究了原始TractOracle-RL框架的四种扩展，并在五个不同的扩散MRI数据集上评估了它们的性能。结果表明，无论使用何种具体方法或数据集，将oracle与RL框架结合始终能产生鲁棒且可靠的纤维追踪。我们还提出了一种新的RL训练方案，称为迭代奖励训练（IRT），其灵感来自人类反馈强化学习（RLHF）范式。IRT不依赖人类输入，而是利用束过滤方法在训练过程中迭代优化oracle的指导。实验结果表明，使用oracle反馈训练的RL方法在准确性和解剖有效性方面显著优于广泛使用的纤维追踪技术。

英文摘要

Tractography algorithms leverage diffusion MRI to reconstruct the fibrous architecture of the brain's white matter. Among machine learning approaches, reinforcement learning (RL) has emerged as a promising framework for tractography, outperforming traditional methods in several key aspects. TractOracle-RL, a recent RL-based approach, reduces false positives by incorporating anatomical priors into the training process via a reward-based mechanism. In this paper, we investigate four extensions of the original TractOracle-RL framework by integrating recent advances in RL, and we evaluate their performance across five diverse diffusion MRI datasets. Results demonstrate that combining an oracle with the RL framework consistently leads to robust and reliable tractography, regardless of the specific method or dataset used. We also introduce a novel RL training scheme called Iterative Reward Training (IRT), inspired by the Reinforcement Learning from Human Feedback (RLHF) paradigm. Instead of relying on human input, IRT leverages bundle filtering methods to iteratively refine the oracle's guidance throughout training. Experimental results show that RL methods trained with oracle feedback significantly outperform widely used tractography techniques in terms of accuracy and anatomical validity.

2602.10450 2026-05-27 cs.LG cs.AI math.OC 版本更新

Constructing Industrial-Scale Optimization Modeling Benchmark

构建工业规模优化建模基准

Zhong Li, Hongliang Lu, Tao Wei, Yuxuan Chen, Wenyu Liu, Yuan Lan, Fan Zhang, Zaiwen Wen

发表机构 * Great Bay University（大湾区大学）； Peking University（北京大学）； Huawei Technologies Co., Ltd（华为技术有限公司）

AI总结提出MIPLIB-NL基准，通过结构感知逆向构建方法从真实混合整数线性规划中生成自然语言规范与求解器代码，以评估大语言模型在工业规模优化建模中的性能。

Comments This paper was accepted by ICML'26 for publication

AI中文摘要

优化建模支撑着物流、制造、能源和金融领域的决策，然而将自然语言需求转化为正确的优化公式和可执行求解器代码仍然需要大量人力。尽管大语言模型（LLMs）已被探索用于此任务，但评估仍以玩具级或合成基准为主，掩盖了具有$10^{3}$--$10^{6}$（或更多）变量和约束的工业问题的难度。一个关键瓶颈是缺乏将自然语言规范与基于真实优化模型的参考公式/求解器代码对齐的基准。为填补这一空白，我们引入了MIPLIB-NL，它通过一种结构感知的逆向构建方法从MIPLIB 2017中的真实混合整数线性规划构建而成。我们的流程（i）从平坦的求解器公式中恢复紧凑、可复用的模型结构，（ii）在统一的模型-数据分离格式下，逆向生成明确关联到该恢复结构的自然语言规范，以及（iii）通过专家评审、人类-LLM交互以及独立的重构检查进行迭代语义验证。这产生了223个一对一的重构，保留了原始实例的数学内容，同时实现了现实的自然语言到优化评估。实验表明，在现有基准上表现良好的系统在MIPLIB-NL上性能显著下降，暴露了在玩具规模下不可见的失败模式。

英文摘要

Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirements into correct optimization formulations and solver-executable code remains labor-intensive. Although large language models (LLMs) have been explored for this task, evaluation is still dominated by toy-sized or synthetic benchmarks, masking the difficulty of industrial problems with $10^{3}$--$10^{6}$ (or more) variables and constraints. A key bottleneck is the lack of benchmarks that align natural-language specifications with reference formulations/solver code grounded in real optimization models. To fill this gap, we introduce MIPLIB-NL, built via a structure-aware reverse construction methodology from real mixed-integer linear programs in MIPLIB 2017. Our pipeline (i) recovers compact, reusable model structure from flat solver formulations, (ii) reverse-generates natural-language specifications explicitly tied to this recovered structure under a unified model-data separation format, and (iii) performs iterative semantic validation through expert review and human-LLM interaction with independent reconstruction checks. This yields 223 one-to-one reconstructions that preserve the mathematical content of the original instances while enabling realistic natural-language-to-optimization evaluation. Experiments show substantial performance degradation on MIPLIB-NL for systems that perform strongly on existing benchmarks, exposing failure modes invisible at toy scale.

2602.10104 2026-05-27 cs.CV cs.AI cs.LG 版本更新

Olaf-World: Orienting Latent Actions for Video World Modeling

Olaf-World: 面向视频世界模型的潜在动作定向

Yuxin Jiang, Yuchao Gu, Ivor W. Tsang, Mike Zheng Shou

发表机构 * Show Lab, National University of Singapore ； Agency for Science, Technology and Research (A*STAR), Singapore

AI总结提出SeqΔ-REPA对齐目标，通过冻结自监督视频编码器的时序特征差异锚定潜在动作，实现无标签视频中可迁移的动作控制世界模型预训练。

Comments ICML 2026. Project page: https://showlab.github.io/Olaf-World/ Code: https://github.com/showlab/Olaf-World

AI中文摘要

扩展动作可控世界模型受限于动作标签的稀缺性。虽然潜在动作学习有望从无标签视频中提取控制接口，但学习到的潜在表示往往难以跨上下文迁移：它们纠缠了场景特定线索，缺乏共享坐标系。这是因为标准目标仅在每个片段内操作，没有提供跨上下文对齐动作语义的机制。我们的关键洞察是，尽管动作未被观测到，但其语义效果是可观测的，可以作为共享参考。我们引入SeqΔ-REPA，一种序列级控制效果对齐目标，将集成潜在动作锚定到来自冻结自监督视频编码器的时序特征差异。基于此，我们提出Olaf-World，一个从大规模被动视频中预训练动作条件视频世界模型的流程。大量实验表明，我们的方法学习了更结构化的潜在动作空间，从而在零样本动作迁移和适应新控制接口的数据效率上优于最先进的基线方法。

英文摘要

Scaling action-controllable world models is limited by the scarcity of action labels. While latent action learning promises to extract control interfaces from unlabeled video, learned latents often fail to transfer across contexts: they entangle scene-specific cues and lack a shared coordinate system. This occurs because standard objectives operate only within each clip, providing no mechanism to align action semantics across contexts. Our key insight is that although actions are unobserved, their semantic effects are observable and can serve as a shared reference. We introduce SeqΔ-REPA, a sequence-level control-effect alignment objective that anchors integrated latent action to temporal feature differences from a frozen, self-supervised video encoder. Building on this, we present Olaf-World, a pipeline that pretrains action-conditioned video world models from large-scale passive video. Extensive experiments demonstrate that our method learns a more structured latent action space, leading to stronger zero-shot action transfer and more data-efficient adaptation to new control interfaces than state-of-the-art baselines.

2602.09842 2026-05-27 math.OC cs.LG 版本更新

Step-Size Stability in Stochastic Optimization: A Theoretical Perspective

随机优化中的步长稳定性：理论视角

Fabian Schaipp, Robert M. Gower, Adrien Taylor

发表机构 * Inria, Departement d'Informatique de l'Ecole Normale Superieure, PSL Research University, Paris, France（法国国家信息与自动化研究所，巴黎高等师范学院计算机系，巴黎文理研究大学，法国巴黎）； CCM, Flatiron Institute, New York City（Flatiron研究所计算数学中心，纽约市）

AI总结本文通过理论分析识别关键量，量化步长过大时性能下降程度，证明自适应步长方法（如SPS、NGN）比SGD更鲁棒，并实验验证理论界对实际性能的定性反映。

2511.06625 2026-05-27 cs.CV cs.AI cs.LG 版本更新

Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from Low-Dose Computed Tomography

可解释的跨疾病推理：基于低剂量计算机断层扫描的心血管风险评估

Yifei Zhang, Jiashuo Zhang, Mojtaba Safari, Xiaofeng Yang, Liang Zhao

发表机构 * Department of Computer Science, Emory University（埃默里大学计算机科学系）； Department of Computer Science, Johns Hopkins University（约翰霍普金斯大学计算机科学系）； Department of Radiation Oncology（放射肿瘤学部）； Winship Cancer Institute, Emory University（埃默里大学Winship癌症研究所）

AI总结提出一种可解释的跨疾病推理框架，通过提取肺部发现、基于医学知识进行跨器官机制推理，并结合心脏子体积特征，从低剂量胸部CT中实现心血管风险评估，在NLST队列中AUC达0.919。

AI中文摘要

低剂量胸部计算机断层扫描（LDCT）在一次扫描中捕获肺部和心脏结构，使得能够联合评估肺部和心血管健康。现有方法通常独立建模这些领域，并未明确表示它们的生理交互。我们提出了一种可解释的跨疾病推理框架，用于从LDCT进行心血管风险评估。该框架遵循受限的临床信息路径：它提取肺部发现，将跨器官机制基于医学知识进行推理，并生成带有自然语言理由的心血管预测。它结合了四个组件：一个冻结的肺风险先验、一个肺部感知模块、一个代理推理模块和一个心脏子体积特征提取器。它们的输出被融合，以将局部心脏证据与机制层面的肺部上下文整合。在国家肺筛查试验队列中，该框架在CVD筛查中达到0.919的AUC，在CVD死亡率预测中高达0.838，优于心脏特异性、单疾病和基础模型基线。目标对照表明，这些增益不能仅由额外的胸部视觉特征、固定规则传播或单一推理后端解释。因此，所提出的框架提供了一种可审计的方法，用于从LDCT进行跨疾病心血管风险评估。

英文摘要

Low-dose chest computed tomography (LDCT) captures pulmonary and cardiac structures in a single scan, enabling joint assessment of lung and cardiovascular health. Existing approaches typically model these domains independently and do not explicitly represent their physiological interactions. We propose an Explainable Cross-Disease Reasoning Framework for cardiovascular risk assessment from LDCT. The framework follows a constrained clinical-information pathway: it extracts pulmonary findings, grounds cross-organ mechanisms in medical knowledge, and produces a cardiovascular prediction with a natural-language rationale. It combines four components: a frozen lung-risk prior, a pulmonary perception module, an agentic reasoning module, and a cardiac subvolume feature extractor. Their outputs are fused to integrate localized cardiac evidence with mechanism-level pulmonary context. On the National Lung Screening Trial cohort, the framework achieves an AUC of 0.919 for CVD screening and up to 0.838 for CVD mortality prediction, outperforming cardiac-specific, single-disease, and foundation-model baselines. Targeted controls indicate that the gains are not explained by additional thoracic visual features alone, fixed rule propagation, or a single reasoning backend. The proposed framework thus provides an auditable approach to cross-disease cardiovascular risk assessment from LDCT.

2506.15199 2026-05-27 cs.LG stat.ML 版本更新

Interpretability and Generalization Bounds for Learning Spatial Physics

学习空间物理的可解释性与泛化界

Alejandro Francisco Queiruga, Theo Gutman-Solo, Shuai Jiang

发表机构 * OpenAI ； Google（谷歌）； Sandia National Laboratories（桑迪亚国家实验室）

AI总结利用数值分析技术，严格量化了应用于线性微分方程的机器学习模型在参数发现或求解中的准确性、收敛率和泛化界，并基于格林函数表示引入科学模型的可解释性视角。

Comments To appear in ICML 2026. 18 pages, 13 figures

AI中文摘要

尽管机器学习在科学问题上的许多应用看起来很有前景，但视觉可能具有欺骗性。利用数值分析技术，我们严格量化了某些应用于线性微分方程进行参数发现或求解的机器学习模型的准确性、收敛率和泛化界。除了数据的数量和离散化之外，我们发现数据的函数空间对模型的泛化至关重要。对于常用模型（包括物理特定技术），我们通过实验证明了类似的泛化不足。与直觉相反，我们发现不同类别的模型可能表现出相反的泛化行为。基于我们的理论分析，我们还引入了一种新的科学模型机械可解释性视角，即可以从黑箱模型的权重中提取格林函数表示。我们的结果为测量物理系统泛化性提供了一种新的交叉验证技术，该技术可作为基准。

英文摘要

While there are many applications of ML to scientific problems that look promising, visuals can be deceiving. Using numerical analysis techniques, we rigorously quantify the accuracy, convergence rates, and generalization bounds of certain ML models applied to linear differential equations for parameter discovery or solution finding. Beyond the quantity and discretization of data, we identify that the function space of the data is critical to the generalization of the model. A similar lack of generalization is empirically demonstrated for commonly used models, including physics-specific techniques. Counterintuitively, we find that different classes of models can exhibit opposing generalization behaviors. Based on our theoretical analysis, we also introduce a new mechanistic interpretability lens on scientific models whereby Green's function representations can be extracted from the weights of black-box models. Our results inform a new cross-validation technique for measuring generalization in physical systems, which can serve as a benchmark.

2601.21008 2026-05-27 cs.LG cs.AI math.OC 版本更新

基于高斯VAE的无训练向量量化

Tongda Xu, Wendi Zheng, Jiajun He, Jose Miguel Hernandez-Lobato, Yan Wang, Ya-Qin Zhang, Jie Tang

发表机构 * AIR, Tsinghua University（清华大学智能产业研究院）； CST, Tsinghua University（清华大学计算机科学与技术系）； University of Cambridge（剑桥大学）

AI总结提出Gaussian Quant (GQ)方法，通过约束训练高斯VAE并直接转换为VQ-VAE，无需额外训练，在UNet和ViT架构上优于现有VQ-VAE。

AI中文摘要

向量量化变分自编码器（VQ-VAEs）是将图像压缩为离散标记的离散自编码器。然而，由于离散化，它们难以训练。在本文中，我们提出了一种简单而有效的技术，称为Gaussian Quant (GQ)，它首先在特定约束下训练高斯VAE，然后将其转换为VQ-VAE，无需额外训练。对于转换，GQ生成随机高斯噪声作为码本，并找到最接近后验均值的噪声向量。理论上，我们证明当码本大小的对数超过高斯VAE的bits-back编码率时，可以保证较小的量化误差。实际上，我们提出了一种启发式方法来训练高斯VAE以实现有效转换，称为目标散度约束（TDC）。实验上，我们表明GQ在UNet和ViT架构上均优于先前的VQ-VAE，如VQGAN、FSQ、LFQ和BSQ。此外，TDC还改进了先前的离散化方法，如TokenBridge。源代码见https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE。

英文摘要

Vector-quantized variational autoencoders (VQ-VAEs) are discrete autoencoders that compress images into discrete tokens. However, they are difficult to train due to discretization. In this paper, we propose a simple yet effective technique dubbed Gaussian Quant (GQ), which first trains a Gaussian VAE under certain constraints and then converts it into a VQ-VAE without additional training. For conversion, GQ generates random Gaussian noise as a codebook and finds the closest noise vector to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic to train Gaussian VAEs for effective conversion, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves previous Gaussian VAE discretization methods, such as TokenBridge. The source code is provided in https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE.
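
The conversion step described above (a random Gaussian codebook plus a nearest-codeword lookup for the posterior mean) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which distance is used, so squared Euclidean distance is assumed here, and `make_gaussian_codebook` and `quantize` are hypothetical names.

```python
import random


def make_gaussian_codebook(num_codes, dim, seed=0):
    """Draw a fixed codebook of i.i.d. standard Gaussian vectors (no training)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_codes)]


def quantize(posterior_mean, codebook):
    """Return (index, codeword) of the codeword closest to the posterior mean.

    Assumption: closeness is measured by squared Euclidean distance.
    """
    def sq_dist(code):
        return sum((m - c) ** 2 for m, c in zip(posterior_mean, code))

    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]
```

The returned index plays the role of the discrete token. The abstract's guarantee relates the logarithm of `num_codes` to the bits-back coding rate of the Gaussian VAE, which this sketch does not model.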

2602.04931 2026-05-27 cs.LG cs.AI 版本更新

Emergent Causal-Geometric Dynamics Across Depth in Large Language Models

大型语言模型中跨深度的涌现因果几何动力学

Shahar Haim, Daniel C McNamee

发表机构 * Champalimaud Centre for the Unknown（查普拉米乌德未知中心）

AI总结通过结合几何分析与因果干预，揭示了仅解码器大型语言模型中从上下文处理到预测形成的跨层转变，并发现后期层中角度结构参数化下一词分布相似性并实现选择性因果控制。

AI中文摘要

对大型语言模型（LLM）表征的几何分析揭示了跨深度的结构化变化，但就token预测的形成而言，本质上仍只是相关性的。同时，因果干预揭示了依赖于深度的效能曲线，但缺乏对其表征动力学的统一解释。对LLM功能的完整解释需要说明表征结构如何跨深度演化以因果性地产生预测。我们通过将几何分析与机制性干预相结合，明确将跨深度动力学作为解释LLM功能的组织轴，综合了这些视角。在仅解码器LLM中，我们识别出从上下文处理到预测形成计算的急剧转变，伴随着跨层的表征几何的更渐进重组。这种综合揭示了一种后期层几何编码，其中角度结构参数化下一词分布相似性，并能够对预测进行选择性因果控制，而表征范数编码的信息与预测基本解耦。总之，我们的结果提供了因果和几何视角的综合，产生了关于语言模型中跨深度的控制相关几何动力学如何将上下文转化为预测的机制性解释。这一视角调和了先前令人困惑的发现，并表明逐层功能不能孤立地理解或有效干预，而只能在网络涌现的全局动力学结构中理解。

英文摘要

Geometric analyses of large language model (LLM) representations reveal structured variation across depth but remain fundamentally correlational with respect to token prediction formation. Meanwhile, causal interventions expose depth-dependent efficacy profiles without a unifying account of their representational dynamics. A complete account of LLM function requires explaining how representational structure evolves across depth to causally produce predictions. We synthesize these perspectives by combining geometric analysis with mechanistic interventions, explicitly centralizing depth-wise dynamics as the organizing axis for interpreting LLM function. In decoder-only LLMs, we identify a sharp transition from context-processing to prediction-forming computation, accompanied by a more gradual reorganization of representational geometry across layers. This synthesis reveals a late-layer geometric code in which angular structure parameterizes next-token distributional similarity and enables selective causal control over predictions, while representation norms encode information largely decoupled from prediction. Together, our results provide a synthesis of causal and geometric perspectives, yielding a mechanistic account of how control-relevant geometric dynamics across depth transform context into prediction in language models. This perspective reconciles previously puzzling findings and implies that layer-wise function cannot be understood or effectively intervened upon in isolation, but only within the emergent global dynamical structure of the network.

2602.04599 2026-05-27 cs.LG 版本更新

Stochastic Decision Horizons for Constrained Reinforcement Learning

约束强化学习的随机决策视界

Nikola Milosevic, Leonard Franz, Daniel Haeufle, Georg Martius, Nico Scherf, Pavel Kolev

发表机构 * Max Planck Institute for Human Cognitive and Brain Sciences（马克斯·普朗克人类认知与脑科学研究所）； Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI)（可扩展数据分析与人工智能中心 (ScaDS.AI)）； Hertie Institute for Clinical Brain Research & Center for Integrative Neuroscience（赫尔蒂临床脑研究所与整合神经科学中心）； University of Tübingen（图宾根大学）； Max Planck Institute for Intelligent Systems（马克斯·普朗克智能系统研究所）

AI总结提出随机决策视界（SDH）框架，通过状态-动作延续概率实现每步约束满足，并开发了首个离策略和正则化算法（AS-SAC和VT-MPO），在90肌肉人形机器人上以4倍更少的环境步数达到最先进步态真实度。

AI中文摘要

我们提出随机决策视界（SDH），这是一个理论基础的框架，用于解决具有每步约束满足的约束强化学习问题，这在许多实际应用中是一个理想属性。在SDH中，违反约束通过状态-动作延续概率有效缩短视界。利用控制作为推理，我们开发了首个用于即时约束RL的离策略和正则化算法。我们确定了违反后决策的两种原则性语义。吸收状态语义终止决策过程，因此只有存活的决策支付熵成本，产生最大熵AS-SAC。虚拟终止保持决策过程活跃，同时停止奖励信用，产生KL正则化VT-MPO。为了连接SDH与CMDP，我们跟踪违反沿轨迹的累积（它们的违反深度剖面）。SDH有效地通过每个轨迹的总违反的指数加权；这正好在违反发生在单一特征尺度时匹配加性CMDP预算，并且我们指出它不能匹配的情况：当罕见的深度违反与频繁的浅层违反混合时。实验验证了理论。在90肌肉H2190人形机器人（Hyfydy）上，VT-MPO以4倍更少的环境步数和更稳定的训练达到最先进的步态真实度。在Safety Gymnasium上，违反深度剖面正确识别了SDH提供强奖励-违反权衡的机制。

英文摘要

We propose stochastic decision horizons (SDH), a theoretically grounded framework for solving constrained RL problems with every-step constraint satisfaction, a desirable property in many real-world applications. In SDH, a constraint violation yields an effective shortening of horizon via a state-action continuation probability. Using Control as Inference, we develop the first off-policy and regularized algorithms for RL with instantaneous constraints. We identify two principled semantics for what counts as a decision after a violation. Absorbing-state semantics end the decision process, so only surviving decisions pay entropy cost, yielding max-entropy AS-SAC. Virtual-termination keeps the decision process alive while stopping reward credit, yielding KL-regularized VT-MPO. To connect SDH with CMDPs, we track how violations accumulate along trajectories (their violation-depth profile). SDH effectively weights each trajectory by the exponential of its total violation; this matches an additive CMDP budget exactly when violations occur at a single characteristic scale, and we pinpoint where it cannot: when rare, deep violations mix with frequent, shallow ones. Experiments validate the theory. On the 90-muscle H2190 humanoid (Hyfydy), VT-MPO matches state-of-the-art gait realism with $4\times$ fewer environment steps and substantially more stable training. On Safety Gymnasium, violation-depth profiles correctly identify the regimes in which SDH delivers strong reward-violation trade-offs.
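
To make the "effective shortening of horizon" concrete, here is a small sketch of a return in which each step's reward is weighted by the probability of having survived all earlier steps. It rests on assumptions the abstract does not confirm: the continuation probability is taken to be `exp(-lam * violation)`, and all function and parameter names are hypothetical. Under that assumed form, the product of continuation probabilities equals the exponential of the negative total violation, which is consistent with the trajectory weighting the abstract describes.

```python
import math


def continuation_prob(violation, lam=1.0):
    """Assumed form: probability of continuing after a step with this violation."""
    return math.exp(-lam * max(violation, 0.0))


def survival_weighted_return(rewards, violations, gamma=0.99, lam=1.0):
    """Discounted return where each reward is weighted by survival up to that step."""
    total, survive = 0.0, 1.0
    for t, (reward, violation) in enumerate(zip(rewards, violations)):
        total += survive * (gamma ** t) * reward
        survive *= continuation_prob(violation, lam)  # affects later steps only
    return total


def trajectory_weight(violations, lam=1.0):
    """Product of continuation probabilities, i.e. exp(-lam * total violation)."""
    weight = 1.0
    for violation in violations:
        weight *= continuation_prob(violation, lam)
    return weight
```

With no violations the function reduces to the ordinary discounted return. A large early violation drives the survival weight toward zero, so later rewards contribute almost nothing.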

2602.04397 2026-05-27 cs.GT cs.LG 版本更新

UCPO：不确定性感知策略优化

Xianzhou Zeng, Jing Huang, Chunmei Xie, Gongrui Nan, Siye Chen, Mengyu Lu, Weiqi Xiong, Qixuan Zhou, Junhao Zhang, Qiang Zhu, Yadong Li, Xingzhong Xu

AI总结针对现有强化学习范式在不确定性奖励下存在的优势偏差和过度自信问题，提出三元优势解耦和动态不确定性奖励调整机制，显著提升模型在知识边界外的可靠性。

Comments Accepted by ICML 2026

AI中文摘要

构建可信赖的大语言模型的关键在于赋予其内在的不确定性表达能力，从而减轻高风险应用中的过度自信错误。然而，现有的强化学习范式（如GRPO）由于二元决策空间和静态不确定性奖励，常常遭受优势偏差，导致过度保守或过度自信。为了解决这一挑战，本文揭示了当前结合不确定性奖励的强化学习范式中奖励破解和过度自信的根本原因，并在此基础上提出了不确定性感知策略优化（UCPO）框架。UCPO采用三元优势解耦来分离并独立归一化确定性和不确定性轨迹，从而消除优势偏差。此外，动态不确定性奖励调整机制根据模型演化和实例难度实时调整不确定性权重。在数学推理和通用任务上的实验结果表明，UCPO有效解决了奖励不平衡问题，显著提高了模型在知识边界外的可靠性。

英文摘要

The key to building trustworthy large language models (LLMs) lies in endowing them with inherent uncertainty expression capabilities, thereby mitigating overconfident errors in high-stakes applications. However, existing RL paradigms such as GRPO often suffer from Advantage Bias due to binary decision spaces and static uncertainty rewards, inducing either excessive conservatism or overconfidence. To tackle this challenge, this paper unveils the root causes of reward hacking and overconfidence in current RL paradigms incorporating uncertainty-based rewards, based on which we propose the UnCertainty-Aware Policy Optimization (UCPO) framework. UCPO employs Ternary Advantage Decoupling to separate and independently normalize deterministic and uncertain rollouts, thereby eliminating advantage bias. Furthermore, a Dynamic Uncertainty Reward Adjustment mechanism adapts uncertainty weights in real-time according to model evolution and instance difficulty. Experimental results in mathematical reasoning and general tasks demonstrate that UCPO effectively resolves the reward imbalance, significantly improving the reliability of the model beyond its knowledge boundaries.
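
The idea of normalizing deterministic and uncertain rollouts separately can be illustrated with a short sketch. It assumes GRPO-style group normalization (subtract the mean, divide by the standard deviation) applied within each subgroup. The abstract does not give the exact formula or explain the "ternary" split, so the two-way split, the `is_uncertain` flag, and the function names below are hypothetical.

```python
from statistics import mean, pstdev


def normalize(rewards, eps=1e-8):
    """Zero-mean, unit-variance normalization within one group of rollouts."""
    if not rewards:
        return []
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_advantages(rewards, is_uncertain):
    """Normalize deterministic and uncertain rollouts independently.

    rewards[i] is the reward of rollout i; is_uncertain[i] marks rollouts
    in which the model expressed uncertainty instead of a definite answer.
    """
    advantages = [0.0] * len(rewards)
    for flag in (False, True):
        idx = [i for i, u in enumerate(is_uncertain) if u == flag]
        for i, adv in zip(idx, normalize([rewards[i] for i in idx])):
            advantages[i] = adv
    return advantages
```

Because each subgroup has its own mean, a fixed uncertainty reward no longer shifts the baseline for definite answers, which is one plausible reading of how decoupling removes the advantage bias.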

2601.22384 2026-05-27 cs.LG cs.AI 版本更新

Graph is a Substrate Across Data Modalities

图是跨数据模态的基板

Ziming Li, Xiaoming Wu, Zehong Wang, Jiazheng Li, Yijun Tian, Jinhe Bi, Yunpu Ma, Yanfang Ye, Chuxu Zhang

发表机构 * University of Connecticut（康涅狄格大学）； University of Notre Dame（圣母大学）； National University of Singapore（新加坡国立大学）

AI总结提出G-Substrate框架，通过统一结构模式和交错角色训练策略，使图结构作为共享基板跨模态和任务积累，优于孤立和朴素多任务方法。

Comments Graph structure across data modalities, accepted by ICML26

AI中文摘要

图提供了跨不同领域出现的自然关系结构表示。尽管无处不在，图结构通常以模态和任务隔离的方式学习，即在单个任务上下文中构建图表示，然后丢弃。因此，跨模态和任务的结构规律被反复重建，而不是在中间图表示级别积累。这引发了一个表示学习问题：如何组织图结构，使其能够跨异构模态和任务持久存在并积累？我们采用以表示为中心的视角，将图结构视为跨学习上下文持久存在的结构基板。为了实例化这一视角，我们提出了G-Substrate，一个围绕共享图结构组织学习的图基板框架。G-Substrate包含两个互补机制：一个统一的结构模式，确保跨异构模态和任务的图表示兼容性；以及一个交错基于角色的训练策略，在学习过程中将同一图结构暴露给多个功能角色。跨多个领域、模态和任务的实验表明，G-Substrate优于任务隔离和朴素多任务学习方法。代码库、模型和数据集可在https://github.com/zmli6/G-Substrate获取。

英文摘要

Graphs provide a natural representation of relational structure that arises across diverse domains. Despite this ubiquity, graph structure is typically learned in a modality- and task-isolated manner, where graph representations are constructed within individual task contexts and discarded thereafter. As a result, structural regularities across modalities and tasks are repeatedly reconstructed rather than accumulated at the level of intermediate graph representations. This motivates a representation-learning question: how should graph structure be organized so that it can persist and accumulate across heterogeneous modalities and tasks? We adopt a representation-centric perspective in which graph structure is treated as a structural substrate that persists across learning contexts. To instantiate this perspective, we propose G-Substrate, a graph substrate framework that organizes learning around shared graph structures. G-Substrate comprises two complementary mechanisms: a unified structural schema that ensures compatibility among graph representations across heterogeneous modalities and tasks, and an interleaved role-based training strategy that exposes the same graph structure to multiple functional roles during learning. Experiments across multiple domains, modalities, and tasks show that G-Substrate outperforms task-isolated and naive multi-task learning methods. The codebase, model, and datasets are available at https://github.com/zmli6/G-Substrate.

2509.21906 2026-05-27 math.ST cs.LG stat.ML stat.TH 版本更新

Error Analysis of Discrete Flow with Generator Matching

生成器匹配的离散流误差分析

Zhengyan Wan, Yidong Ouyang, Qiang Yao, Liyan Xie, Fang Fang, Hongyuan Zha, Guang Cheng

发表机构 * School of Statistics, East China Normal University（华东师范大学统计学院）； Department of Statistics and Data Science, University of California, Los Angeles（加州大学洛杉矶分校统计与数据科学系）； Department of Industrial and Systems Engineering, University of Minnesota（明尼苏达大学工业与系统工程系）； School of Data Science, The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳）数据科学学院）

AI总结本文基于随机微积分理论，通过Girsanov型定理统一分析离散流模型的收敛性质，给出了转移率估计误差和提前停止误差的非渐近误差界，并首次提供了离散流模型的误差分析。

AI中文摘要

离散流模型为学习离散状态空间上的分布提供了强大的框架，并且与离散扩散模型相比表现出更优的性能。然而，它们的收敛性质和误差分析仍然在很大程度上未被探索。在这项工作中，我们开发了一个基于随机微积分理论的统一框架，以系统地研究离散流模型的理论性质。具体来说，通过利用两个连续时间马尔可夫链（CTMC）路径测度的Girsanov型定理，我们提出了一个全面的误差分析，该分析同时考虑了转移率估计误差和提前停止误差。实际上，现有工作中很少关注转移率的估计误差。与离散扩散模型不同，离散流不会因在噪声过程中截断时间范围而产生初始化误差。基于生成器匹配和均匀化，我们在没有对Oracle转移率施加有界性条件的情况下，建立了分布估计的非渐近误差界。此外，我们推导了在有界性条件下估计分布的总变差收敛的更快速率，得到了关于样本量的近乎最优的速率。我们的结果为离散流模型提供了首次误差分析。我们还基于模拟结果研究了不同设置下的模型性能。

英文摘要

Discrete flow models offer a powerful framework for learning distributions over discrete state spaces and have demonstrated superior performance compared to the discrete diffusion models. However, their convergence properties and error analysis remain largely unexplored. In this work, we develop a unified framework grounded in stochastic calculus theory to systematically investigate the theoretical properties of discrete flow models. Specifically, by leveraging a Girsanov-type theorem for the path measures of two continuous-time Markov chains (CTMCs), we present a comprehensive error analysis that accounts for both transition rate estimation error and early stopping error. In fact, the estimation error of transition rates has received little attention in existing works. Unlike discrete diffusion models, discrete flow incurs no initialization error caused by truncating the time horizon in the noising process. Building on generator matching and uniformization, we establish non-asymptotic error bounds for distribution estimation without the boundedness condition on oracle transition rates. Furthermore, we derive a faster rate of total variation convergence for the estimated distribution with the boundedness condition, yielding a nearly optimal rate in terms of sample size. Our results provide the first error analysis for discrete flow models. We also investigate model performance under different settings based on simulation results.
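
Uniformization, which the abstract names as one ingredient of the analysis, is a standard way to simulate a CTMC with bounded exit rates: events arrive from a Poisson clock with rate `lam` at least as large as every exit rate, and at each event the chain moves according to `P = I + Q / lam`. The sketch below is a minimal, time-homogeneous illustration only. The rate matrix `Q`, the function names, and the three-state example are assumptions made for illustration; the paper itself concerns learned, time-dependent transition rates.

```python
import random

def uniformized_kernel(Q, lam):
    """Return P = I + Q / lam; lam must dominate every exit rate so P is stochastic."""
    n = len(Q)
    assert lam >= max(-Q[i][i] for i in range(n)), "lam must dominate every exit rate"
    return [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
            for i in range(n)]

def simulate_ctmc(Q, x0, horizon, lam, rng):
    """Simulate one path up to time `horizon` and return the final state."""
    P = uniformized_kernel(Q, lam)
    t, x = 0.0, x0
    while True:
        t += rng.expovariate(lam)  # waiting time of the Poisson clock
        if t > horizon:
            return x
        # Move according to row x of P; the move may be a self-loop.
        x = rng.choices(range(len(Q)), weights=P[x])[0]

# Hypothetical 3-state rate matrix (each row sums to zero).
Q = [[-1.0, 0.5, 0.5],
     [0.2, -0.4, 0.2],
     [0.0, 1.5, -1.5]]
P = uniformized_kernel(Q, lam=2.0)
final_state = simulate_ctmc(Q, x0=0, horizon=1.0, lam=2.0, rng=random.Random(0))
```

Choosing `lam` larger than necessary stays correct but adds self-loop events, so in practice it is usually set close to the largest exit rate.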

2601.21845 2026-05-27 cs.LG (updated version)

Constrained Meta Reinforcement Learning with Provable Test-Time Safety

Tingting Ni, Maryam Kamgarpour

Affiliations: Sycamore Lab, EPFL, Lausanne, Switzerland

AI summary: Proposes a constrained meta reinforcement learning algorithm that learns a near-optimal policy on test tasks with provable safety and sample complexity guarantees, and proves a matching sample complexity lower bound.

Abstract

Meta reinforcement learning (RL) allows agents to leverage experience across a distribution of tasks on which the agent can train at will, enabling faster learning of optimal policies on new test tasks. Despite its success in improving sample complexity on test tasks, many real-world applications, such as robotics and healthcare, impose safety constraints during testing. Constrained meta RL provides a promising framework for integrating safety into meta RL. An open question in constrained meta RL is how to ensure safety of the policy on the real-world test task while reducing the sample complexity, thus enabling faster learning of optimal policies. To address this gap, we propose an algorithm that refines policies learned during training, with provable safety and sample complexity guarantees for learning a near-optimal policy on the test tasks. We further derive a matching lower bound, showing that this sample complexity is tight.

2601.21789 2026-05-27 cs.LG cs.AI stat.ML (updated version)

ECSEL: Explainable Classification via Signomial Equation Learning

Adia Lumadjeng, Ilker Birbil, Erman Acar

Affiliations: Amsterdam Business School, University of Amsterdam, Amsterdam, the Netherlands; Institute for Informatics, University of Amsterdam; Institute for Logic, Language and Computation, University of Amsterdam

AI summary: Proposes ECSEL, which performs explainable classification by learning closed-form expressions in the form of signomial equations. On symbolic regression benchmarks it recovers more target equations with less computation, while retaining classification accuracy and interpretability.

Comments 9 pages, 4 figures, accepted at ICML 2026

Abstract

We introduce ECSEL, an explainable classification method that learns formal expressions in the form of signomial equations, motivated by the observation that many symbolic regression benchmarks admit compact signomial structure. ECSEL directly constructs a structural, closed-form expression that serves as both a classifier and an explanation. On standard symbolic regression benchmarks, our method recovers a larger fraction of target equations than competing state-of-the-art approaches while requiring substantially less computation. Leveraging this efficiency, ECSEL achieves classification accuracy competitive with established machine learning models without sacrificing interpretability. Further, we show that ECSEL satisfies some desirable properties regarding global feature behavior, decision-boundary analysis, and local feature attributions. Experiments on benchmark datasets and two real-world case studies, i.e., e-commerce and fraud detection, demonstrate that the learned equations expose dataset biases, support counterfactual reasoning, and yield actionable insights.
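
For readers unfamiliar with the term, a signomial is a sum of terms of the form c_k · Π_j x_j^{a_kj} with real coefficients and real exponents, defined for positive inputs. The sketch below evaluates one and turns the score into a class probability. The logistic link, the function names, and the example coefficients are assumptions for illustration; the abstract does not specify how ECSEL maps the expression to a class.

```python
import math

def signomial(x, coeffs, exponents):
    """Evaluate sum_k coeffs[k] * prod_j x[j] ** exponents[k][j] for positive x."""
    assert all(v > 0 for v in x), "signomials are defined for positive inputs"
    return sum(c * math.prod(v ** a for v, a in zip(x, row))
               for c, row in zip(coeffs, exponents))

def predict_proba(x, coeffs, exponents, bias=0.0):
    """Assumed logistic link: squash the signomial score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(signomial(x, coeffs, exponents) + bias)))

# Hypothetical two-term signomial: 2 * x0 / x1 - sqrt(x0).
coeffs = [2.0, -1.0]
exponents = [[1.0, -1.0], [0.5, 0.0]]
score = signomial([4.0, 2.0], coeffs, exponents)  # 2 * (4 / 2) - sqrt(4) = 2.0
prob = predict_proba([4.0, 2.0], coeffs, exponents)
```

Because each term is a product of powers, the exponent of a feature within a term can be read directly as its elasticity in that term, which is one reason such expressions are easy to inspect.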

2511.16870 2026-05-27 cs.CV cs.LG (updated version)

Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation

Khumaisa Nur'aini, Ayu Purwarianti, Alham Fikri Aji, Derry Wijaya

Affiliations: Monash University Indonesia; Institut Teknologi Bandung; MBZUAI; Boston University

AI summary: Adapts Contextual Decomposition for Transformers (CD-T) to counterfactual-free circuit discovery via label-balanced activation means and task-directional relevance scoring, then uses the discovered circuits for Circuit-Targeted Supervised Fine-Tuning (CT-SFT). In low-resource cross-lingual sentiment transfer, CT-SFT minimizes catastrophic forgetting compared with global fine-tuning.

Abstract

Existing circuit discovery methods rely on templated tasks with clean counterfactuals, limiting their use on diverse natural text. We adapt Contextual Decomposition for Transformers (CD-T) for unstructured settings via label-balanced activation means and task-directional relevance scoring, enabling counterfactual-free circuit discovery. We leverage these circuits for Circuit-Targeted Supervised Fine-Tuning (CT-SFT), restricting parameter updates to task-relevant heads and LayerNorm. Experiments on NusaX cross-lingual sentiment transfer show that CT-SFT is highly competitive for low-resource adaptation. While non-circuit sparse updates and full fine-tuning sometimes match target accuracy through capacity recruitment, CT-SFT uniquely minimizes catastrophic forgetting, preserving source-language and related-task performance. Extensions to XNLI confirm these findings hold across broader tasks and model families, demonstrating that circuit-targeted adaptation provides a safer, causally grounded alternative to global fine-tuning.

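2601.11334 2026-05-27 cs.IT cs.LG math.IT (updated version)

Information Theoretic Perspective on Representation Learning

Deborah Pereg, Michael Wand

Affiliations: Scuola Universitaria Professionale della Svizzera Italiana (SUPSI); Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA)

AI summary: Proposes an information-theoretic framework for analyzing last-layer embedding representations in regression tasks. It defines representation rate, representation capacity, and representation rate-distortion, and derives the achievable capacity and representation rate together with their converses.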

2512.01572 2026-05-27 cs.LG cs.AI physics.app-ph (updated version)

Reconstructing Multi-Scale Physical Fields from Extremely Sparse Measurements with an Autoencoder-Diffusion Cascade

Letian Yi, Tingpeng Zhang, Mingyuan Zhou, Guannan Wang, Quanke Su, Zhilu Lai

Affiliations: Internet of Things Thrust; Intelligent Transportation Thrust; Marine Hydrodynamic Research Facility; Department of Civil and Environmental Engineering

AI summary: Proposes the Cascaded Sensing framework, which cascades a coarse-scale deterministic estimator with a fine-scale conditional diffusion model to address the ill-posedness and multimodal posterior of physical field reconstruction from extremely sparse measurements.

Comments 34 pages, 22 figures

Abstract

Extreme sensor sparsity makes full-field reconstruction a fundamentally ill-posed problem in scientific sensing, where the goal is to infer physical fields from sparse measurements. In this regime, the posterior is severely underconstrained and inherently multimodal, making its approximation highly ill-conditioned. Specifically, deterministic mappings collapse uncertainty, direct conditional learning cannot cover the space of possible observation-conditioned solutions, and likelihood-guided sampling becomes highly sensitive to noise and sensor configurations. These limitations result in unstable posterior estimates and highlight the need for modeling uncertainty in a structural manner. To this end, we propose Cascaded Sensing, a hierarchical framework that restructures posterior inference across scales. Rather than modeling the full-field posterior directly, Cas-Sensing first resolves global structural ambiguity through a deterministic coarse-stage estimator. A neural-operator-based functional autoencoder, trained with masked inputs, maps sparse observations to a coarse-scale structural field, acting analogously to a maximum a posteriori estimator that selects the dominant global configuration. This structural anchor fixes the principal degrees of freedom of the posterior and transforms the problem into a better-conditioned residual inference task. A conditional diffusion model then learns only the refined-scale residual distribution, confining sampling to a stable neighborhood of plausible solutions and suppressing competition among observation-consistent modes. To enhance robustness under varying sensing conditions, we introduce mask-cascade training, which exposes the model to diverse sparse observation patterns through intermediate coarse reconstructions. During inference, manifold-constrained guidance enforces observation consistency as a refinement mechanism rather than a global mode-selection process.

2601.03525 2026-05-27 cs.LG cs.AI (updated version)

Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation

Longwen Wang, Yirui Liu, Xuan'er Wu, Xiaohui Hu, Yuankai Fan, Kaidong Yu, Qizhen Weng, Wei Xi, Xuelong Li

Affiliations: Institute of Artificial Intelligence, China Telecom (TeleAI); Xingchen AGI Lab, China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd; National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi’an Jiaotong University

AI summary: Proposes the VeRPO framework, which uses partial success on code tests as a verifiable dense reward. A dynamic, density-calibrated local reward corrects cardinality bias and is combined with global execution outcomes to improve reinforcement learning for code generation.

Abstract

Effective reward design is a central challenge in Reinforcement Learning (RL) for code generation. Mainstream test-suite-level outcome rewards enforce functional correctness but induce sparsity, while external Reward Models (RMs) provide dense supervision at the cost of misalignment and additional overhead. Since code evaluation naturally yields multiple test-case-level outcomes, partial success, i.e., passing a subset of test cases, offers an intrinsic, verifiable source of dense supervision. In this paper, we propose VeRPO (Verifiable Dense Reward Policy Optimization), an RL framework that systematically turns verifiable partial success into reliable dense rewards. We analyze partial-success rewards using a weighted sum formulation, theoretically identifying a critical cardinality bias that causes policy updates to disproportionately favor gains from easy-test successes over progress on frontier tests. Based on this, VeRPO introduces a dynamic, density-calibrated local reward that explicitly corrects this bias and provides robust dense supervision from partial success. To enhance alignment with end-to-end functional correctness, VeRPO further integrates the local dense reward with global execution outcomes. Extensive experiments across diverse benchmarks and settings demonstrate that VeRPO outperforms outcome-driven and RM-based baselines, achieving up to +8.83 pass@1 gain with negligible time cost (< 0.02%) and zero GPU memory overhead.
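
The weighted-sum view of partial success can be illustrated with a small example. The sketch below compares uniform test weights with a difficulty-based weighting in which tests that few sampled programs pass count for more. The weighting rule `1 - pass_rate` and all names are hypothetical and chosen only to show how a reweighting can favor progress on frontier tests; this is not VeRPO's density-calibrated reward, whose exact form the abstract does not give.

```python
def pass_rates(results):
    """results[s][t] is True if sampled program s passes test t."""
    n = len(results)
    return [sum(r[t] for r in results) / n for t in range(len(results[0]))]

def weighted_reward(passed, weights):
    """Weighted-sum partial-success reward, normalised to [0, 1]."""
    return sum(w for w, p in zip(weights, passed) if p) / sum(weights)

def difficulty_weights(rates, eps=1e-6):
    """Hypothetical calibration: rarely passed tests receive larger weights."""
    return [1.0 - r + eps for r in rates]

# Four sampled programs evaluated on four tests (hypothetical outcomes).
results = [
    [True, True, True, False],   # passes the three easiest tests
    [True, True, True, False],
    [True, True, False, True],   # passes the hardest test instead of the third
    [True, False, False, False],
]
rates = pass_rates(results)              # [1.0, 0.75, 0.5, 0.25]
uniform = [1.0] * 4
calibrated = difficulty_weights(rates)

easy_uniform = weighted_reward(results[0], uniform)
hard_uniform = weighted_reward(results[2], uniform)
easy_calibrated = weighted_reward(results[0], calibrated)
hard_calibrated = weighted_reward(results[2], calibrated)
```

Under uniform weights the two programs receive the same reward because both pass three tests; under the difficulty-based weights the program that solves the rarely passed test scores higher.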

2601.05028 2026-05-27 cs.LG (updated version)

Approximate Equivariance via Projection-based Regularisation

Torben Berndt, Jan Stühmer

Affiliations: Heidelberg Institute for Theoretical Studies; Karlsruhe Institute of Technology

AI summary: Proposes a projection-based regulariser that decomposes linear layers into equivariant and non-equivariant components and penalises the operator norm of the non-equivariant part, giving efficient and exact approximate equivariance that outperforms sample-based methods on continuous groups such as $SO(3)$.

Abstract

Equivariance is a powerful inductive bias in neural networks, improving generalisation and physical consistency. Recently, however, non-equivariant models have regained attention, due to their better runtime performance and imperfect symmetries that might arise in real-world applications. This has motivated the development of approximately equivariant models that strike a middle ground between respecting symmetries and fitting the data distribution. Existing approaches in this field usually apply sample-based regularisers which depend on data augmentation at training time, incurring a high sample complexity, in particular for continuous groups such as $SO(3)$. This work instead approaches approximate equivariance via a projection-based regulariser which leverages the orthogonal decomposition of linear layers into equivariant and non-equivariant components. In contrast to existing methods, this penalises non-equivariance at an operator level across the full group orbit, rather than point-wise. We present a mathematical framework for computing the non-equivariance penalty exactly and efficiently in both the spatial and spectral domain. In our experiments, our method consistently outperforms prior approximate equivariance approaches in both model performance and efficiency, achieving substantial runtime gains over sample-based regularisers.
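
The orthogonal decomposition behind a projection-based penalty can be shown on a small finite group. For the cyclic shift group acting on vectors of length n, averaging a weight matrix over all simultaneous row and column shifts projects it onto the equivariant (circulant) matrices, and the remainder is the non-equivariant component. The sketch below uses this finite group and a squared Frobenius norm purely for illustration; both are assumptions, and the paper's framework targets continuous groups and computes its penalty in the spatial and spectral domains.

```python
def project_equivariant(W):
    """Group-average W over cyclic shifts: P(W)[i][j] = mean_s W[i+s][j+s] (mod n)."""
    n = len(W)
    return [[sum(W[(i + s) % n][(j + s) % n] for s in range(n)) / n
             for j in range(n)] for i in range(n)]

def nonequivariance_penalty(W):
    """Squared Frobenius norm of the non-equivariant component W - P(W)."""
    P = project_equivariant(W)
    n = len(W)
    return sum((W[i][j] - P[i][j]) ** 2 for i in range(n) for j in range(n))

# A circulant matrix commutes with cyclic shifts, so its penalty is zero.
circulant = [[1.0, 2.0, 3.0],
             [3.0, 1.0, 2.0],
             [2.0, 3.0, 1.0]]
# A generic diagonal matrix is not shift-equivariant.
generic = [[1.0, 0.0, 0.0],
           [0.0, 2.0, 0.0],
           [0.0, 0.0, 3.0]]

penalty_circulant = nonequivariance_penalty(circulant)
penalty_generic = nonequivariance_penalty(generic)  # (1-2)^2 + 0 + (3-2)^2 = 2.0
```

The penalty depends only on the layer weights, so no augmented samples are needed to evaluate it, which is the contrast with sample-based regularisers drawn in the abstract.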

2410.00995 2026-05-27 cs.LG (updated version)

CktGen: Automated Analog Circuit Design with Generative Artificial Intelligence

Yuxuan Hou, Hehe Fan, Jianrong Zhang, Yue Zhang, Hua Chen, Min Zhou, Faxin Yu, Roger Zimmermann, Yi Yang

Affiliations: College of Computer Science and Technology; Australian Artificial Intelligence Institute; School of Aeronautics and Astronautics; School of Computing

AI summary: Proposes CktGen, a conditional variational autoencoder for analog circuit generation that decouples circuit and specification encoding and uses contrastive training and classifier guidance to generate valid circuits from target specifications, substantially outperforming existing methods.

Comments Paper accepted by Engineering

DOI: 10.1016/j.eng.2025.12.025

Abstract

The automatic synthesis of analog circuits presents significant challenges. Most existing approaches formulate the problem as a single-objective optimization task, overlooking that design specifications for a given circuit type vary widely across applications. To address this, we introduce specification-conditioned analog circuit generation, a task that directly generates analog circuits based on target specifications. The motivation is to leverage existing well-designed circuits to improve automation in analog circuit design. Specifically, we propose CktGen, a simple yet effective variational autoencoder that maps discretized specifications and circuits into a joint latent space and reconstructs the circuit from that latent vector. Notably, as a single specification may correspond to multiple valid circuits, naively fusing specification information into the generative model does not capture these one-to-many relationships. To address this, we decouple the encoding of circuits and specifications and align their mapped latent space. Then, we employ contrastive training with a filter mask to maximize differences between encoded circuits and specifications. Furthermore, classifier guidance along with latent feature alignment promotes the clustering of circuits sharing the same specification, avoiding model collapse into trivial one-to-one mappings. By canonicalizing the latent space with respect to specifications, we can search for an optimal circuit that meets valid target specifications. We conduct comprehensive experiments on the open circuit benchmark and introduce metrics to evaluate cross-model consistency. Experimental results demonstrate that CktGen achieves substantial improvements over state-of-the-art methods.

2601.03089 2026-05-27 cs.CL cs.AI cs.LG (updated version)

Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information

Xin Huang, Antoni B. Chan

Affiliations: City University of Hong Kong

AI summary: Existing soft-perturbation faithfulness metrics can be biased because attribution methods retain different numbers of words. The paper proposes the $π$-Soft-NC and $π$-Soft-NS framework, which compares attribution methods fairly by controlling the expected retaining probability, and introduces Grad-ELLM, a gradient-based attribution method for autoregressive decoder-only LLMs.

Abstract

Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging. Existing soft-perturbation faithfulness metrics, such as Soft-NC and Soft-NS, can conflate attribution quality with the number of words retained during perturbation: attribution methods with larger average scores may keep more words and therefore obtain inflated scores. To address this issue, we propose $π$-Soft-NC and $π$-Soft-NS, an evaluation framework that compares attribution methods under the same expected retaining probability, thus controlling the number of retained words. We further introduce Grad-ELLM, a gradient-based attribution method tailored to autoregressive decoder-only LLMs, which combines gradient-derived channel importance with attention-derived token importance at each decoding step. Experiments on classification and open-generation tasks with Llama and Mistral show that Grad-ELLM achieves strong comprehensiveness-oriented faithfulness under $π$-Soft-NC, while there is no dominant method under $π$-Soft-NS. Our evaluation metric serves as a rigorous framework to compare XAI methods for LLMs, which will support progress in the field.

2512.22666 2026-05-27 cs.CV cs.LG (updated version)

INTERACT-CMIL: Multi-Task Shared Learning and Inter-Task Consistency for Conjunctival Melanocytic Intraepithelial Lesion Grading

Mert Ikinci, Luna Toma, Karin U. Loeffler, Leticia Ussem, Daniela Süsskind, Julia M. Weller, Yousef Yeganeh, Martina C. Herwig-Carl, Shadi Albarqouni

Affiliations: Clinic for Diagnostic and Interventional Radiology, University Hospital Bonn, Germany; Department of Ophthalmology, Friedrich-Alexander University Erlangen-Nürnberg, Germany; TUM School of Computation, Information and Technology, Technical University of Munich, Germany; Munich Center for Machine Learning, Germany; Helmholtz AI, Helmholtz Center Munich, Germany

AI summary: Proposes INTERACT-CMIL, a multi-task deep learning framework that jointly predicts five histopathological axes through shared feature learning, combinatorial partial supervision, and an inter-task consistency loss. On a dataset of 486 conjunctival biopsy patches it achieves relative macro F1 gains of up to 55.1% over CNN and foundation-model baselines.

DOI: 10.1109/ISBI61048.2026.11515389
Journal ref: IEEE ISBI 2026

Abstract

Accurate grading of Conjunctival Melanocytic Intraepithelial Lesions (CMIL) is essential for treatment and melanoma prediction but remains difficult due to subtle morphological cues and interrelated diagnostic criteria. We introduce INTERACT-CMIL, a multi-head deep learning framework that jointly predicts five histopathological axes (WHO4, WHO5, horizontal spread, vertical spread, and cytologic atypia) through Shared Feature Learning with Combinatorial Partial Supervision and an Inter-Dependence Loss enforcing cross-task consistency. Trained and evaluated on a newly curated, multi-center dataset of 486 expert-annotated conjunctival biopsy patches from three university hospitals, INTERACT-CMIL achieves consistent improvements over CNN and foundation-model (FM) baselines, with relative macro F1 gains up to 55.1% (WHO4) and 25.0% (vertical spread). The framework provides coherent, interpretable multi-criteria predictions aligned with expert grading, offering a reproducible computational benchmark for CMIL diagnosis and a step toward standardized digital ocular pathology.

2512.19332 2026-05-27 cs.LG cs.LO (updated version)

A Logical View of GNN-Style Computation and the Role of Activation Functions

Pablo Barceló, Floris Geerts, Matthias Lanzinger, Klara Pakhomenko, Jan Van den Bussche

Affiliations: Institute for Mathematical and Computational Engineering; Pontifical Catholic University of Chile; IMFD; CENIA; Department of Computer Science, University of Antwerp; Institute for Logic and Computation, TU Wien; Data Science Institute, Universiteit Hasselt

AI summary: Studies the computational power of graph neural networks from a logical perspective through the language MPLang, focusing on how activation functions (in particular ReLU versus bounded, eventually constant activations) affect numerical and Boolean expressiveness. It proves for the first time that, in the presence of linear layers, ReLU is strictly more expressive for numerical queries than eventually constant activations.

Abstract

We study the numerical and Boolean expressiveness of MPLang, a declarative language that captures the computation of graph neural networks (GNNs) through linear message passing and activation functions. We begin with A-MPLang, the fragment without activation functions, and give a characterization of its expressive power in terms of walk-summed features. For bounded activation functions, we show that (under mild conditions) all eventually constant activations yield the same expressive power - numerical and Boolean - and that it subsumes previously established logics for GNNs with eventually constant activation functions but without linear layers. Finally, we prove the first expressive separation between unbounded and bounded activations in the presence of linear layers: MPLang with ReLU is strictly more powerful for numerical queries than MPLang with eventually constant activation functions, e.g., truncated ReLU. This hinges on subtle interactions between linear aggregation and eventually constant non-linearities, and it establishes that GNNs using ReLU are more expressive than those restricted to eventually constant activations and linear layers.

2512.18540 2026-05-27 eess.SY cs.LG cs.SY math.OC (updated version)

Distributed Control of Network Systems in the Space of Stabilizing Graph Neural Network Policies

John Cao, Luca Furieri

Affiliations: Department of Engineering Science, University of Oxford

AI summary: Embeds graph neural networks in a Youla-like magnitude-direction parameterization to obtain distributed stochastic controllers with guaranteed closed-loop stability, and proves robustness to perturbations of the graph topology and model parameters.

2512.17090 2026-05-27 cs.LG cs.AI (updated version)

How to Square Tensor Networks and Circuits Without Squaring Them

Lorenzo Loconte, Adrián Javaloy, Antonio Vergari

Affiliations: School of Informatics, University of Edinburgh, UK

AI summary: Proposes parameterizations, based on orthogonality and determinism conditions, that simplify marginalization in squared tensor networks and circuits without the extra complexity of squaring. On distribution estimation tasks they retain expressiveness while making learning more efficient.

Abstract

Squared tensor networks (TNs) and their extension as computational graphs--squared circuits--have been used as expressive distribution estimators, yet supporting closed-form marginalization. However, the squaring operation introduces additional complexity when computing the partition function or marginalizing variables, which hinders their applicability in ML. To solve this issue, canonical forms of TNs are parameterized via unitary matrices to simplify the computation of marginals. However, these canonical forms do not apply to circuits, as they can represent factorizations that do not directly map to a known TN. Inspired by the ideas of orthogonality in canonical forms and determinism in circuits enabling tractable maximization, we show how to parameterize squared circuits to overcome their marginalization overhead. Our parameterizations unlock efficient marginalization even in factorizations different from TNs, but encoded as circuits, whose structure would otherwise make marginalization computationally hard. Finally, our experiments on distribution estimation show how our proposed conditions in squared circuits come with no expressiveness loss, while enabling more efficient learning.

2512.16702 2026-05-27 cond-mat.mtrl-sci cs.LG physics.chem-ph (updated version)

How accurate are foundational machine learning interatomic potentials for heterogeneous catalysis?

Luuk H. E. Kempen, Raffaele Cheula, Mie Andersen

Affiliations: Center for Interstellar Catalysis, Department of Physics and Astronomy, Aarhus University

AI summary: Systematically evaluates the zero-shot performance of 80 foundational machine learning interatomic potentials on heterogeneous catalysis tasks. The models reach high accuracy in specific applications (such as vacancy formation energies of perovskite oxides) but fail on magnetic materials, structure relaxation increases errors, and no single model is universally best.

Comments 16 pages, 5 figures, 1 table + supplementary information (37 pages, 16 figures, 15 tables)

DOI: 10.1063/5.0317672
Journal ref: J. Chem. Phys. 164, 194119 (2026)

Abstract

Foundational machine learning interatomic potentials (MLIPs) are being developed at a rapid pace, promising closer and closer approximation to ab initio accuracy. This unlocks the possibility to simulate much larger length and time scales. However, benchmarks for these MLIPs are usually limited to ordered, crystalline and bulk materials. Hence, reported performance does not necessarily accurately reflect MLIP performance in real applications such as heterogeneous catalysis. Here, we systematically analyze zero-shot performance of 80 different MLIPs, evaluating tasks typical for heterogeneous catalysis across a range of different data sets, including adsorption and reaction on surfaces of alloyed metals, oxides, and metal-oxide interfacial systems. We demonstrate that current-generation foundational MLIPs can already perform at high accuracy for applications such as predicting vacancy formation energies of perovskite oxides or zero-point energies of supported nanoclusters. However, limitations also exist. We find that many MLIPs catastrophically fail when applied to magnetic materials, and structure relaxation in the MLIP generally increases the energy prediction error compared to single-point evaluation of a previously optimized structure. Comparing low-cost task-specific models to foundational MLIPs, we highlight some core differences between these model approaches and show that -- if considering only accuracy -- these models can compete with the current generation of best-performing MLIPs. Furthermore, we show that no single MLIP universally performs best, requiring users to investigate MLIP suitability for their desired application.

2512.16111 2026-05-27 cs.LG eess.SP (updated version)

BUILD with Precision: Bottom-Up Inference of Linear DAGs

Hamed Ajorlou, Samuel Rey, Gonzalo Mateos, Geert Leus, Antonio G. Marques

Affiliations: University of Rochester; Universidad Rey Juan Carlos; Delft University of Technology

AI summary: Proposes BUILD, a deterministic stepwise algorithm that exploits the distinctive structure of the ensemble precision matrix of observations under a linear Gaussian SEM with equal noise variances to reconstruct the DAG exactly. With finite data, it periodically re-estimates the precision matrix to improve robustness.

Abstract

Learning the structure of directed acyclic graphs (DAGs) from observational data is a central problem in causal discovery, statistical signal processing, and machine learning. Under a linear Gaussian structural equation model (SEM) with equal noise variances, the problem is identifiable and we show that the ensemble precision matrix of the observations exhibits a distinctive structure that facilitates DAG recovery. Exploiting this property, we propose BUILD (Bottom-Up Inference of Linear DAGs), a deterministic stepwise algorithm that identifies leaf nodes and their parents, then prunes the leaves by removing incident edges to proceed to the next step, exactly reconstructing the DAG from the true precision matrix. In practice, precision matrices must be estimated from finite data, and ill-conditioning may lead to error accumulation across BUILD steps. As a mitigation strategy, we periodically re-estimate the precision matrix (with fewer variables as leaves are pruned), trading off runtime for enhanced robustness. Reproducible results on challenging synthetic benchmarks demonstrate that BUILD compares favorably to state-of-the-art DAG learning algorithms, while offering an explicit handle on complexity.
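
The leaf-peeling idea can be sketched from first principles. For x = Bx + e with equal noise variance σ², the precision matrix is Θ = (I − B)ᵀ(I − B)/σ², so Θ_ii = (1 + Σ_k B_ki²)/σ². A leaf, having no children, therefore attains the smallest diagonal entry, its incoming weights are −Θ_ij/Θ_ii, and marginalising it out is a Schur complement. The code below is a minimal reconstruction under these assumptions using the exact precision matrix; the function names and the example graph are hypothetical, and the authors' BUILD algorithm (including its re-estimation step for finite data) may differ in detail.

```python
def precision_from_dag(B, sigma2=1.0):
    """Theta = (I - B)^T (I - B) / sigma2, where B[i][j] is the weight of edge j -> i."""
    n = len(B)
    M = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(n)] for i in range(n)]
    return [[sum(M[k][i] * M[k][j] for k in range(n)) / sigma2 for j in range(n)]
            for i in range(n)]

def bottom_up_recover(Theta, tol=1e-9):
    """Recover B from an exact precision matrix by repeatedly peeling off leaves."""
    n = len(Theta)
    B = [[0.0] * n for _ in range(n)]
    active = list(range(n))            # original indices of the remaining nodes
    T = [row[:] for row in Theta]      # precision matrix of the remaining nodes
    while len(active) > 1:
        # A node without children attains the smallest diagonal entry.
        p = min(range(len(active)), key=lambda k: T[k][k])
        leaf = active[p]
        for q, node in enumerate(active):
            if q != p:
                w = -T[p][q] / T[p][p]  # weight of edge node -> leaf
                if abs(w) > tol:
                    B[leaf][node] = w
        # Marginalise the leaf out of the model (Schur complement).
        keep = [q for q in range(len(active)) if q != p]
        T = [[T[a][b] - T[a][p] * T[p][b] / T[p][p] for b in keep] for a in keep]
        active = [active[q] for q in keep]
    return B

# Hypothetical 4-node DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
B_true = [[0.0, 0.0, 0.0, 0.0],
          [0.8, 0.0, 0.0, 0.0],
          [-0.5, 0.0, 0.0, 0.0],
          [0.0, 1.2, 0.7, 0.0]]
B_hat = bottom_up_recover(precision_from_dag(B_true))
max_err = max(abs(B_hat[i][j] - B_true[i][j]) for i in range(4) for j in range(4))
```

With an estimated precision matrix the leaf test and the threshold `tol` become noisy, which is where errors can accumulate across steps as the abstract notes.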

2512.08371 2026-05-27 cs.LG stat.ML 版本更新

A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research

基于多元伯努利的采样方法用于多标签数据及其在元研究中的应用

Simon Chung, Colby J. Vorland, Donna L. Maney, Andrew W. Brown

发表机构 * Department of Biostatistics, University of Arkansas for Medical Sciences（阿肯色医科大学生物统计学系）； Arkansas Children’s Research Institute（阿肯色儿童研究所）； Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington（印第安纳大学布卢明顿分校公共卫生学院流行病学与生物统计学系）； Department of Psychology, Emory University（埃默里大学心理学系）

AI总结针对多标签数据中标签频率差异大且存在依赖关系的问题，提出一种基于多元伯努利分布的加权采样算法，通过估计标签组合权重实现目标分布特征，并在Web of Science研究文章数据上验证了其增强少数类别代表性的效果。

AI中文摘要

数据集可能包含具有多个标签的观测值。如果标签不是互斥的，并且标签的频率差异很大，那么获取一个样本，该样本包含足够多的稀有标签观测值以对这些标签进行推断，并且以已知方式偏离总体频率，这带来了挑战。在本文中，我们将多元伯努利分布视为多标签问题的底层分布。我们提出了一种新颖的采样算法，该算法考虑了标签依赖性。它使用观测到的标签频率来估计多元伯努利分布参数，并为每个标签组合计算权重。这种方法确保加权采样在考虑标签依赖性的同时获得目标分布特征。我们将该方法应用于各种数据集，包括来自Web of Science的研究文章样本，这些文章标有64个生物医学主题类别。我们的目标是保持类别频率顺序，减少最常见和最不常见类别之间的频率差异，并考虑类别依赖性。该方法产生了更平衡的子样本，增强了少数类别的代表性。

英文摘要

Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient observations with scarcer labels to make inferences about those labels, and which deviates from the population frequencies in a known manner, creates challenges. In this paper, we consider a multivariate Bernoulli distribution as our underlying distribution of a multi-label problem. We present a novel sampling algorithm that takes label dependencies into account. It uses observed label frequencies to estimate multivariate Bernoulli distribution parameters and calculates weights for each label combination. This approach ensures the weighted sampling acquires target distribution characteristics while accounting for label dependencies. We applied this approach to a variety of datasets, including a sample of research articles from Web of Science labeled with 64 biomedical topic categories. We aimed to preserve category frequency order, reduce frequency differences between most and least common categories, and account for category dependencies. This approach produced a more balanced sub-sample, enhancing the representation of minority categories.

2511.20586 2026-05-27 cs.AI cs.LG 版本更新

PaTAS: A Framework for Trust Propagation in Neural Networks Using Subjective Logic

PaTAS：基于主观逻辑的神经网络信任传播框架

Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Dennis Eisermann, Houda Labiod, Frank Kargl

AI总结提出PaTAS框架，利用主观逻辑在神经网络中并行传播信任，通过信任节点和信任函数量化输入、参数和激活的信任，并设计参数信任更新和推理路径信任评估方法，以在对抗或退化条件下提供可解释的信任估计。

AI中文摘要

可信度已成为安全关键应用中人工智能系统部署的关键要求。传统的评估指标（如准确率和精确率）无法充分捕捉不确定性或模型预测的可靠性，尤其是在对抗或退化条件下。本文介绍了并行信任评估系统（PaTAS），这是一个使用主观逻辑（SL）对神经网络中的信任进行建模和传播的框架。PaTAS通过信任节点和信任函数与标准神经计算并行运行，这些节点和函数在网络中传播输入、参数和激活信任。该框架定义了一种参数信任更新机制，以在训练过程中优化参数可靠性，以及一种推理路径信任评估（IPTA）方法，以在推理时计算实例特定的信任。在真实世界和对抗性数据集上的实验表明，PaTAS产生可解释、对称且收敛的信任估计，这些估计补充了准确率，并揭示了在中毒、有偏或不确定数据场景中的可靠性差距。结果表明，PaTAS有效区分良性输入和对抗性输入，并识别模型置信度与实际可靠性不一致的情况。通过在神经架构中实现透明且可量化的信任推理，PaTAS为评估AI生命周期中的模型可靠性提供了基础。

英文摘要

Trustworthiness has become a key requirement for the deployment of artificial intelligence systems in safety-critical applications. Conventional evaluation metrics, such as accuracy and precision, fail to appropriately capture uncertainty or the reliability of model predictions, particularly under adversarial or degraded conditions. This paper introduces the Parallel Trust Assessment System (PaTAS), a framework for modeling and propagating trust in neural networks using Subjective Logic (SL). PaTAS operates in parallel with standard neural computation through Trust Nodes and Trust Functions that propagate input, parameter, and activation trust across the network. The framework defines a Parameter Trust Update mechanism to refine parameter reliability during training and an Inference-Path Trust Assessment (IPTA) method to compute instance-specific trust at inference. Experiments on real-world and adversarial datasets demonstrate that PaTAS produces interpretable, symmetric, and convergent trust estimates that complement accuracy and expose reliability gaps in poisoned, biased, or uncertain data scenarios. The results show that PaTAS effectively distinguishes between benign and adversarial inputs and identifies cases where model confidence diverges from actual reliability. By enabling transparent and quantifiable trust reasoning within neural architectures, PaTAS provides a foundation for evaluating model reliability across the AI lifecycle.

2412.20505 2026-05-27 cs.AI cs.CL cs.LG 版本更新

LiPUP-MA: A Residential Experience-centric Multi-Agent Framework for Living-in-the-loop Participatory Urban Planning

LiPUP-MA：面向生活在环参与式城市规划的以居住体验为中心的多智能体框架

Hang Ni, Yuzhi Wang, Yizhi Song, Hao Liu

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； The Hong Kong Polytechnic University（香港理工大学）

AI总结提出LiPUP-MA多智能体框架，通过模拟居住生活与体验驱动的计划修订循环，利用基于图的经验库和空间约束技能增强规划器，解决参与式城市规划中经验落地与反馈空间化问题。

AI中文摘要

参与式城市规划（PUP）日益得到基于LLM的智能体的支持，但现有方法主要依赖于静态偏好征集和一次性利益相关者讨论，忽视了现实世界规划的周期性：居住生活、经验收集和计划调整持续互动。我们提出生活在环参与式城市规划（LiPUP），一种在模拟居住生活和经验驱动的计划修订之间交替的闭环范式，同时面临两个关键挑战：将分散的居住经验锚定到具体的城市背景中，以及将主观反馈转化为空间连贯的规划行动。为实例化LiPUP，我们引入LiPUP-MA，一个基于LLM的多智能体框架，它构建了一个以计划为中心的基于图的经验库，用于组织来自生活模拟的基于城市的居住反馈，并配备了一个空间约束的技能增强规划器智能体，通过协调经验、视觉和地理空间证据来修订计划。实验表明，LiPUP-MA在传统的静态规划指标和基于生活的指标上均持续优于基线，而迭代的LiPUP循环进一步提高了计划质量。

英文摘要

Participatory Urban Planning (PUP) is increasingly supported by LLM-based agents, yet existing methods largely rely on static preference elicitation and one-shot stakeholder discussions, overlooking the cyclical nature of real-world planning, where residential life, experience collection, and plan adjustment continually interact. We propose Living-in-the-loop Participatory Urban Planning (LiPUP), a closed-loop paradigm that alternates between simulated residential living and experience-driven plan revision, while posing two key challenges: grounding scattered living experience in concrete urban contexts and translating subjective feedback into spatially coherent planning actions. To instantiate LiPUP, we introduce LiPUP-MA, an LLM-based multi-agent framework that constructs a Plan-centric Graph-based Experience Bank to organize urban-grounded residential feedback from living simulation and equips a Spatially-constrained Skill-augmented Planner agent to revise plans by harmonizing experiential, visual, and geospatial evidence. Experiments show that LiPUP-MA consistently outperforms baselines on both conventional static planning metrics and living-based metrics, while iterative LiPUP cycles further improve plan quality.

2506.09532 2026-05-27 cs.LG cs.AI cs.CL cs.CV 版本更新

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models

Athena: 利用数据高效的过程奖励模型增强多模态推理

Shuai Wang, Zhenhua Liu, Jiaheng Wei, Xuanwu Yin, Dong Li, Emad Barsoum

发表机构 * Advanced Micro Devices Inc.（超威半导体公司）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结提出 Athena-PRM，一种多模态过程奖励模型，通过利用弱和强完成者之间的预测一致性高效生成高质量过程标签，在仅5000样本下显著提升复杂推理问题的逐步评估性能。

Comments TMLR 2026, https://openreview.net/forum?id=unWmplHccF

AI中文摘要

我们提出了 Athena-PRM，一种多模态过程奖励模型（PRM），旨在评估解决复杂推理问题中每一步的奖励分数。开发高性能的PRM通常需要大量的时间和资金投入，主要因为需要推理步骤的逐步标注。传统的自动标注方法，如蒙特卡洛估计，通常会产生噪声标签并带来巨大的计算成本。为了高效生成高质量的过程标注数据，我们提出利用弱和强完成者之间的预测一致性作为识别可靠过程标签的标准。值得注意的是，Athena-PRM 在仅5000个样本的情况下，在各种场景和基准测试中展现出卓越的效果。此外，我们还开发了两种有效策略来提升PRM的性能：ORM初始化和负数据上采样。我们在三个具体场景中验证了我们的方法：测试时扩展的验证、推理步骤正确性的直接评估以及奖励排序微调。我们的 Athena-PRM 在多个基准测试和场景中持续取得优越性能。值得注意的是，当使用 Qwen2.5-VL-7B 作为策略模型时，Athena-PRM 在 WeMath 上提升了10.2个百分点，在 MathVista 上提升了7.1个百分点（测试时扩展）。此外，Athena-PRM 在 VisualProcessBench 上取得了最先进（SoTA）结果，比之前的 SoTA 高出3.9个F1分数，展示了其准确评估推理步骤正确性的强大能力。另外，利用 Athena-PRM 作为奖励模型，我们通过奖励排序微调开发了 Athena-7B，在五个基准测试上以显著优势超越了基线。

英文摘要

We present Athena-PRM, a multimodal process reward model (PRM) designed to evaluate the reward score for each step in solving complex reasoning problems. Developing high-performance PRMs typically demands significant time and financial investment, primarily due to the necessity for step-level annotations of reasoning steps. Conventional automated labeling methods, such as Monte Carlo estimation, often produce noisy labels and incur substantial computational costs. To efficiently generate high-quality process-labeled data, we propose leveraging prediction consistency between weak and strong completers as a criterion for identifying reliable process labels. Remarkably, Athena-PRM demonstrates outstanding effectiveness across various scenarios and benchmarks with just 5,000 samples. We also develop two effective strategies to improve the performance of PRMs: ORM initialization and up-sampling for negative data. We validate our approach in three specific scenarios: verification for test-time scaling, direct evaluation of reasoning step correctness, and reward-ranked fine-tuning. Our Athena-PRM consistently achieves superior performance across multiple benchmarks and scenarios. Notably, when using Qwen2.5-VL-7B as the policy model, Athena-PRM enhances performance by 10.2 points on WeMath and 7.1 points on MathVista for test-time scaling. Furthermore, Athena-PRM sets state-of-the-art (SoTA) results on VisualProcessBench, outperforming the previous SoTA by 3.9 F1 points and showcasing its robust capability to accurately assess the correctness of reasoning steps. Additionally, using Athena-PRM as the reward model, we develop Athena-7B with reward-ranked fine-tuning, which outperforms the baseline by a significant margin on five benchmarks.
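
One way to read the consistency criterion is as a filter on step labels, sketched below. This is an illustration, not the paper's pipeline. It assumes each completer yields a boolean per rollout (whether the rollout reached the correct final answer), and it uses the "at least one rollout succeeds" convention for Monte Carlo labels. The function names are hypothetical.

```python
def mc_step_label(rollout_correct):
    """Monte Carlo label for one step: 1 if any rollout from it ends correctly."""
    return int(any(rollout_correct))

def consistent_step_labels(weak_rollouts, strong_rollouts):
    """Keep only the steps whose labels agree between the two completers.

    Each argument is a list (one entry per reasoning step) of lists of
    booleans (one entry per rollout). Returns (step_index, label) pairs.
    """
    if len(weak_rollouts) != len(strong_rollouts):
        raise ValueError("both completers must cover the same steps")
    kept = []
    for i, (weak, strong) in enumerate(zip(weak_rollouts, strong_rollouts)):
        w, s = mc_step_label(weak), mc_step_label(strong)
        if w == s:  # disagreement is treated as an unreliable label and dropped
            kept.append((i, w))
    return kept

weak = [[True, False], [False, False], [False, True]]
strong = [[True, True], [True, False], [True, True]]
kept = consistent_step_labels(weak, strong)
```

In this toy input, step 1 is dropped because only the strong completer recovers from it, so its label is ambiguous.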

2511.01724 2026-05-27 cs.CV cs.LG 版本更新

CFG-OEC: 带正交误差校正的无分类器引导

Nakgyu Yang, Yechan Lee, SooJean Han

发表机构 * School of Electrical Engineering, Korea Advanced Institute of Science（韩国科学技术院电子工程学院）

AI总结针对扩散模型中无分类器引导的采样规则与训练目标不匹配导致的误差，提出正交误差校正方法（CFG-OEC）通过减少条件与无条件预测误差的交互项来提升采样质量，并在Stable Diffusion上验证了FID和CLIP分数的改进。

AI中文摘要

无分类器引导是扩散模型中条件采样的标准方法，但其采样规则与训练中使用的目标不一致。这种不匹配通过条件预测误差和无条件预测误差的相互作用引入了结构性采样误差。我们通过将采样误差分解为基础项和由两个误差对齐决定的交叉项来分析该问题。基于此分析，我们提出了带正交误差校正的无分类器引导（CFG-OEC），这是一种减少交互项的结构性修改。对于无法观测到真实噪声的实际场景，我们引入了一个从模型预测计算得到的代理量，以及一种跨扩散时间步稳定校正的动态方法。在受控环境下的实验验证了我们的理论误差分解和代理量构造。在Stable Diffusion v1.5和Stable Diffusion XL上的图像生成表明，CFG-OEC在多个采样器和引导机制下比CFG和CFG++改进了FID和CLIP分数。

英文摘要

Classifier-free guidance is a standard method for conditional sampling in diffusion models, but its sampling rule is not aligned with the objective used in training. This mismatch induces a structural sampling error through the interaction of conditional and unconditional prediction errors. We analyze this issue by decomposing the sampling error into a base term and a cross term determined by the alignment of the two errors. Based on this analysis, we propose CFG with orthogonal error correction (CFG-OEC), a structural modification that reduces the interaction term. For practical settings where ground truth noise is not observable, we introduce a proxy computed from model predictions and a dynamic method that stabilizes correction across diffusion timesteps. Experiments in a controlled environment validate our theoretical error decomposition and proxy construction. Image generation on Stable Diffusion v1.5 and Stable Diffusion XL shows that CFG-OEC improves FID and CLIP scores over CFG and CFG++ across multiple samplers and guidance regimes.
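
The base-plus-cross-term structure can be checked numerically for the standard CFG rule. This sketch assumes the usual combination eps_u + w (eps_c - eps_u) and measures error against a known noise vector. The paper's exact decomposition and its correction step may differ, and the function names are illustrative.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance combination with guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def error_terms(eps_true, eps_uncond, eps_cond, w):
    """Split the squared error of the guided prediction into two parts.

    base  : depends only on the magnitudes of the two prediction errors
    cross : depends on their inner product, i.e. on how aligned they are
    """
    e_c = eps_cond - eps_true
    e_u = eps_uncond - eps_true
    base = w**2 * np.sum(e_c**2) + (1 - w)**2 * np.sum(e_u**2)
    cross = 2 * w * (1 - w) * np.sum(e_c * e_u)
    return base, cross

rng = np.random.default_rng(0)
eps_true = rng.normal(size=16)
eps_u = eps_true + 0.3 * rng.normal(size=16)
eps_c = eps_true + 0.1 * rng.normal(size=16)
w = 7.5
guided = cfg_combine(eps_u, eps_c, w)
base, cross = error_terms(eps_true, eps_u, eps_c, w)
total = np.sum((guided - eps_true) ** 2)
```

Because the guided prediction equals w * eps_c + (1 - w) * eps_u, its error is the same weighted combination of the two errors, and the cross term vanishes exactly when the two errors are orthogonal.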

2511.04711 2026-05-27 cs.CR cs.AI cs.LG 版本更新

SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking

SWAP：通过顺序水印实现软提示的版权审计

Wenyuan Yang, Yichen Sun, Changzheng Chen, Zhixuan Chu, Jiaheng Zhang, Yiming Li, Dacheng Tao

发表机构 * Sun Yat-sen University（中山大学）； Zhejiang University（浙江大学）； National University of Singapore（新加坡国立大学）； Nanyang Technological University（南洋理工大学）

AI总结针对软提示的版权保护问题，提出一种基于顺序水印的审计方法SWAP，通过将水印嵌入到更复杂的输出分布顺序空间中，实现无害且鲁棒的版权验证。

Comments This paper has been accepted by the International Journal of Computer Vision (IJCV), 2026. The first two authors contributed equally to this work. 28 pages

AI中文摘要

大规模视觉语言模型，尤其是CLIP，在各种下游任务中展现了卓越的性能。软提示作为精心设计的模块，能够高效地将视觉语言模型适应特定任务，因此需要有效的版权保护。本文通过审计可疑的第三方模型是否使用了受保护的软提示，来研究模型版权保护。虽然这可以视为模型所有权审计的一个特例，但我们的分析表明，由于提示学习的独特特性，现有技术效果不佳。非侵入式审计在独立模型与受害模型共享相似数据分布时，本质上容易产生误报。侵入式方法也失败：为CLIP设计的后门方法无法嵌入功能性触发器，而将传统DNN后门技术扩展到提示学习则面临有害性和模糊性挑战。我们发现，侵入式审计的这些失败源于同一个根本原因：水印与主任务在同一决策空间中运行，却追求相反的目标。基于这些发现，我们提出了软提示的顺序水印（SWAP），将水印植入一个不同且更复杂的空间。SWAP通过防御者指定的分布外类别的特定顺序来编码水印，灵感来自CLIP的零样本预测能力。这种嵌入在更复杂空间中的水印保持原始预测标签不变，从而减少与主任务的冲突。我们进一步为SWAP设计了基于假设检验的验证协议，并提供了验证何时有效的理论分析。在11个数据集上的大量实验证明了SWAP的有效性、无害性以及对潜在攻击的鲁棒性。

英文摘要

Large-scale vision-language models, especially CLIP, have demonstrated remarkable performance across diverse downstream tasks. Soft prompts, as carefully crafted modules that efficiently adapt vision-language models to specific tasks, necessitate effective copyright protection. In this paper, we investigate model copyright protection by auditing whether suspicious third-party models incorporate protected soft prompts. While this can be viewed as a special case of model ownership auditing, our analysis shows that existing techniques are ineffective due to prompt learning's unique characteristics. Non-intrusive auditing is inherently prone to false positives when independent models share similar data distributions with victim models. Intrusive approaches also fail: backdoor methods designed for CLIP cannot embed functional triggers, while extending traditional DNN backdoor techniques to prompt learning suffers from harmfulness and ambiguity challenges. We find that these failures in intrusive auditing stem from the same fundamental reason: watermarking operates within the same decision space as the primary task yet pursues opposing objectives. Motivated by these findings, we propose sequential watermarking for soft prompts (SWAP), which implants watermarks into a different and more complex space. SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes, inspired by the zero-shot prediction capability of CLIP. This watermark, which is embedded in a more complex space, keeps the original prediction label unchanged, making it less opposed to the primary task. We further design a hypothesis-test-guided verification protocol for SWAP and provide a theoretical analysis of when verification works. Extensive experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential attacks.

2511.02525 2026-05-27 cs.LG cs.AI 版本更新

An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

一种用于求解带容量约束选址-路径问题的端到端学习方法

Changhao Miao, Yuntian Zhang, Tongyu Wu, Fang Deng, Chen Chen

发表机构 * National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology（北京理工大学自主智能无人系统全国重点实验室）

AI总结提出基于深度强化学习与异构查询机制（DRLHQ）的端到端方法，首次将编码器-解码器结构应用于带容量约束的选址-路径问题（CLRP）及其开放变体（OCLRP），通过异构查询注意力机制动态协调选址与路径决策，在合成和基准数据集上优于传统方法和现有DRL基线。

AI中文摘要

带容量约束的选址-路径问题（CLRPs）是组合优化中的经典问题，需要同时做出选址和路径决策。在CLRPs中，复杂的约束以及各种决策之间的复杂关系使得问题难以求解。随着深度强化学习（DRL）的出现，它已被广泛应用于解决车辆路径问题及其变体，而与CLRPs相关的研究仍有待探索。在本文中，我们提出了带有异构查询的DRL（DRLHQ）来分别求解CLRP和开放CLRP（OCLRP）。我们是首个为CLRPs提出端到端学习方法的工作，遵循编码器-解码器结构。具体而言，我们将CLRPs重新表述为一个针对各种决策量身定制的马尔可夫决策过程，这是一个通用的建模框架，可适用于其他基于DRL的方法。为了更好地处理选址和路径决策之间的相互依赖关系，我们还引入了一种新颖的异构查询注意力机制，旨在动态适应不同的决策阶段。在合成和基准数据集上的实验结果表明，我们提出的方法在求解CLRP和OCLRP时，相较于代表性的传统方法和基于DRL的基线，具有更优的解质量和更好的泛化性能。

英文摘要

Capacitated location-routing problems (CLRPs) are classical combinatorial optimization problems that require making location and routing decisions simultaneously. In CLRPs, the complex constraints and the intricate relationships between various decisions make the problem challenging to solve. Deep reinforcement learning (DRL) has been extensively applied to the vehicle routing problem and its variants, while CLRPs remain largely unexplored. In this paper, we propose DRL with heterogeneous query (DRLHQ) to solve both the CLRP and the open CLRP (OCLRP). We are the first to propose an end-to-end learning approach for CLRPs, following the encoder-decoder structure. In particular, we reformulate the CLRPs as a Markov decision process tailored to various decisions, a general modeling framework that can be adapted to other DRL-based methods. To better handle the interdependency across location and routing decisions, we also introduce a novel heterogeneous querying attention mechanism designed to adapt dynamically to various decision-making stages. Experimental results on both synthetic and benchmark datasets demonstrate superior solution quality and better generalization performance of our proposed approach over representative traditional and DRL-based baselines in solving both CLRP and OCLRP.

2510.23905 2026-05-27 eess.SP cs.LG 版本更新

Inferring Group Intent as a Cooperative Game. An NLP-based Framework for Trajectory Analysis

将群体意图推断作为合作博弈：基于NLP的轨迹分析框架

Yiming Zhang, Vikram Krishnamurthy, Shashwat Jain

发表机构 * Cornell University（康奈尔大学）

AI总结提出一个基于NLP的生成模型和合作博弈框架，通过Fisher信息特征函数和Graph Transformer神经网络从噪声观测中推断群体轨迹意图。

一种基于输出误差上界的深度状态空间模型压缩方法

Hiroki Sakamoto, Kazuhiro Sato

发表机构 * Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo（数学信息学系，信息科学和技术研究生院，东京大学）

AI总结本文提出一种基于输出误差上界的深度状态空间模型压缩方法，通过推导层间LQO系统的h²误差范数上界并优化该上界，实现无需重训练即可减少约60%可训练参数并保持模型性能。

AI中文摘要

我们研究包含线性二次输出（LQO）系统作为内部块的深度状态空间模型（Deep SSMs），并提出一种具有可证明输出误差保证的压缩方法。我们首先推导两个Deep SSM之间输出误差的上界，并证明该上界可以用逐层LQO系统之间的$h^2$误差范数表示。特别地，我们表明减小浅层中LQO系统的$h^2$逼近误差能有效降低推导出的输出误差上界。接下来，我们针对推导出的上界制定一个优化问题，并开发一种基于梯度的模型降阶方法。在数值实验中，使用LRA基准中的IMDb任务，我们展示了所提出的基于上界的压缩方法的有效性。特别地，我们表明无需重训练即可将可训练参数数量减少约60%，同时保持原始模型的性能。

英文摘要

We study deep state-space models (Deep SSMs) that contain linear quadratic-output (LQO) systems as internal blocks and present a compression method with a provable output error guarantee. We first derive an upper bound on the output error between two Deep SSMs and show that the bound can be expressed in terms of the $h^2$-error norms between the layerwise LQO systems. In particular, we show that reducing the $h^2$ approximation errors of the LQO systems placed in shallow layers is effective in reducing the derived upper bound on the output error. Next, we formulate an optimization problem for the derived upper bound and develop a gradient-based model order reduction (MOR) method. In the numerical experiments, using the IMDb task from the LRA benchmark, we demonstrate the effectiveness of the proposed upper-bound-based compression method. In particular, we show that the number of trainable parameters can be reduced by approximately 60% without retraining while maintaining the performance of the original model.

2510.13217 2026-05-27 cs.IR cs.LG 版本更新

LLM-guided Hierarchical Search for End-to-end Reasoning Intensive Retrieval

LLM引导的层次化搜索用于端到端推理密集型检索

Nilesh Gupta, Wei-Cheng Chang, Ngot Bui, Cho-Jui Hsieh, Inderjit S. Dhillon

发表机构 * UT Austin（得克萨斯大学奥斯汀分校）； UCLA（加州大学洛杉矶分校）； Google（谷歌）

AI总结本文提出LATTICE，一种无需嵌入模型的LLM引导层次化搜索方法，通过LLM构建搜索索引并校准路径聚合遍历，在推理密集型基准上达到与最优微调集成基线相当的性能。

AI中文摘要

搜索系统越来越多地用于推理密集型查询，其中文档的相关性需要理解或推理查询-文档关系，而不是依赖表面词汇或主题相似性。标准方案——廉价的基于嵌入的检索器后接LLM验证器——仅在嵌入模型将正确文档置于其top-k中时才有效，而最近的推理密集型IR基准显示，即使对于最先进的嵌入模型，这一假设也常常不成立。最近的查询端修复方法（如查询重写和智能体循环）将LLM保持在廉价检索器的上游，但仍然容易受到嵌入器失败和LLM从其参数知识重写查询能力的影响。在本文中，我们探索了一种不同的范式——LLM引导的层次化搜索——其中LLM通过层次可导航搜索索引直接与语料库交互，搜索时无需嵌入模型参与。我们提出了LATTICE，其包含两项技术贡献：(i) 使用LLM对多级文档摘要的判断进行自上而下的LLM引导搜索索引构建，以及(ii) 通过跨分支参考节点减轻噪声、依赖列表的LLM分数的校准路径聚合LLM引导遍历。在推理密集型BRIGHT基准上，使用单个现成LLM的基础LATTICE实现了46.7 nDCG@10——与最佳微调集成基线整体匹配——而轻量级集成LATTICE++将LATTICE与廉价检索融合，达到49.1 nDCG@10。与滑动窗口重排序的受控相同LLM比较显示，在低token预算下重排序提供更好的权衡，但LATTICE在适度预算后收敛到更高的渐近线。LATTICE也适用于开放权重LLM，并在传统IR基准（NQ、SciFact、SciDocs）上保持竞争力。

英文摘要

Search systems are increasingly used for reasoning-intensive queries, where what makes a document relevant requires understanding or reasoning over the query-document relation rather than relying on surface vocabulary or topical similarity. The standard recipe - a cheap embedding-based retriever followed by an LLM verifier - works only when the embedding model places the right documents in its top-k, an assumption that recent reasoning-intensive IR benchmarks show often fails to hold even for SOTA embedding models. Recent query-side fixes such as query rewriting and agentic loops keep the LLM upstream of the cheap retriever and remain brittle to the embedder's failures and to the LLM's ability to rewrite the query from its parametric knowledge. In this paper, we explore a different paradigm - LLM-guided hierarchical search - in which an LLM interacts with the corpus directly via a hierarchically navigable search index, with no embedding model in the loop at search time. We propose LATTICE, an instantiation with two technical contributions: (i) a top-down LLM-guided construction of the search index using LLM judgements over multi-level document summaries, and (ii) a calibrated, path-aggregated LLM-guided traversal that mitigates noisy, slate-dependent LLM scores via cross-branch reference nodes. On the reasoning-intensive BRIGHT benchmark, base LATTICE with a single off-the-shelf LLM achieves 46.7 nDCG@10 - matching the best fine-tuned ensemble baseline overall - and a lightweight ensemble LATTICE++ that fuses LATTICE with cheap retrieval reaches 49.1 nDCG@10. A controlled same-LLM comparison against sliding-window reranking shows reranking offers a better tradeoff at low token budgets, but LATTICE converges to a higher asymptote after a moderate budget. LATTICE also works with open-weight LLMs and remains competitive on traditional IR benchmarks (NQ, SciFact, SciDocs).
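
For readers unfamiliar with the metric quoted above, a minimal sketch of nDCG@k follows. It uses linear gains and a log2 rank discount, and it computes the ideal ranking from the same relevance list, which assumes the list contains every judged document for the query. Evaluation toolkits may use other gain functions, and benchmark tables presumably report the value multiplied by 100.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the first k results (ranks start at 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the best possible ordering of the list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the results in the order the system returned them
ranking = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(ranking, k=6)
```

A ranking that already places documents in decreasing relevance scores 1.0, and a query with no relevant documents is scored 0.0 here by convention.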

2510.10774 2026-05-27 cs.SD cs.AI cs.HC cs.LG 版本更新

ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

ParsVoice: 面向文本到语音合成的大规模多说话人波斯语语音语料库

Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery

发表机构 * School of Electrical and Computer Engineering, University of Tehran（德黑兰大学电气与计算机工程学院）； Institute for Research in Fundamental Sciences (IPM)（基础科学研究所（IPM））

AI总结提出ParsVoice，目前最大的公开波斯语语音-文本语料库，通过可扩展的流水线从长篇有声读物构建高质量数据，用于训练多说话人TTS系统，并验证了其在零样本多说话人TTS中的有效性。

AI中文摘要

波斯语在开放的语音-文本资源中仍然严重不足，限制了多说话人文本到语音（TTS）、语音语言建模和低资源语音处理的进展。我们介绍了ParsVoice，这是目前最大的公开波斯语语音-文本语料库，专为训练多说话人TTS系统而设计，同时提供了一个可扩展的流水线，用于从长篇有声读物录音中构建高质量的语音-文本数据。该流水线结合了微调的ParsBERT句子补全分类器、基于ASR的边界优化、标点恢复、说话人识别以及涵盖音频和波斯语特定文本属性的多维质量评估。最终发布的版本包含一个2200小时的TTS就绪子集，包含来自1815个自动识别说话人ID的136万个对齐片段，比之前最大的公开波斯语TTS数据集大25倍以上。为了验证该语料库，我们微调了XTTS，一个直接操作原始波斯语文本（无需音素表示）的零样本多语言TTS模型，实现了自然度MOS为3.6/5，说话人相似度MOS为4.0/5。ParsVoice数据集公开在：https://huggingface.co/datasets/MohammadJRanjbar/ParsVoice。

英文摘要

Persian remains substantially underrepresented in open speech-text resources, limiting progress in multi-speaker text-to-speech (TTS), speech-language modelling, and low-resource speech processing. We introduce ParsVoice, the largest publicly available Persian speech-text corpus tailored for training multi-speaker TTS systems, along with a scalable pipeline to construct high-quality speech-text data from long-form audiobook recordings. The pipeline combines a fine-tuned ParsBERT sentence-completion classifier, ASR-based boundary optimization, punctuation restoration, speaker identification, and a multi-dimensional quality assessment that covers both audio and Persian-specific text properties. The resulting release contains a 2,200-hour TTS-ready subset with 1.36 million aligned segments from 1,815 automatically identified speaker IDs, making it more than 25 times larger than the previously largest open Persian TTS dataset. To validate the corpus, we fine-tune XTTS, a zero-shot multilingual TTS model that operates directly on raw Persian text without phoneme representations, achieving a naturalness MOS of 3.6/5 and speaker similarity MOS of 4.0/5. The ParsVoice dataset is publicly available at: https://huggingface.co/datasets/MohammadJRanjbar/ParsVoice.

2510.09405 2026-05-27 cs.LG 版本更新

Cross-Receiver Generalization for RF Fingerprint Identification via Feature Disentanglement and Adversarial Training

基于特征解耦与对抗训练的射频指纹识别跨接收机泛化

Yuhao Pan, Xiucheng Wang, Fushuo Huo, Nan Cheng, Wenchao Xu

发表机构 * Division of Integrative Systems and Design, Hong Kong University of Science and Technology, Hong Kong, China（香港科技大学集成系统与设计学部，中国香港）； State Key Laboratory of ISN and School of Telecommunications Engineering, Xidian University, Xi’an 710071, China（西安电子科技大学综合业务网（ISN）国家重点实验室及通信工程学院，中国西安，710071）； School of Cyber Science and Engineering, Southeast University, Nanjing, China（东南大学网络空间安全学院，中国南京）

AI总结提出一种特征解耦与对抗训练框架，通过分离发射机与接收机特征并抑制接收机信息，解决射频指纹识别中接收机更换导致的性能下降问题。

AI中文摘要

射频指纹识别（RFFI）是无线网络安全的关键技术，利用硬件固有缺陷实现发射机识别。尽管深度神经网络能有效提取判别性射频特征，但在实际部署中，其性能受接收机引入的变异性显著影响。真实场景中，射频信号天然地混合了发射机特定特征与接收机依赖失真，导致模型在相同设备上训练和评估时会捕获接收机相关模式。因此，部署时更换接收机常导致性能显著下降。为解决此问题，我们提出一种跨接收机鲁棒的RFFI框架，明确解耦发射机特定和接收机特定表示。该方法整合对抗域对齐与接收机感知正则化，抑制发射机特征中的残余接收机信息，同时强制接收机特定表示的内部一致性。进一步引入特征分离约束，在潜在空间中解耦两个组件。在多接收机WiFi数据集上的大量实验表明，所提方法在跨接收机评估中持续优于最先进基线，并显著提升对接收机更换的鲁棒性。

英文摘要

Radio frequency fingerprint identification (RFFI) is a key technique for wireless network security, leveraging intrinsic hardware imperfections to enable transmitter identification. Although deep neural networks are effective at extracting discriminative RF features, their performance is significantly affected by receiver-induced variability in practical deployments. In real-world scenarios, RF signals inherently entangle transmitter-specific characteristics with receiver-dependent distortions, leading models to capture receiver-related patterns when training and evaluation are conducted on the same device. Consequently, replacing the receiver during deployment often results in notable performance degradation. To address this issue, we propose a cross-receiver robust RFFI framework that explicitly disentangles transmitter-specific and receiver-specific representations. The proposed method integrates adversarial domain alignment with receiver-aware regularization to suppress residual receiver information in transmitter features while enforcing intra-receiver consistency in receiver-specific representations. A feature separation constraint is further introduced to decouple the two components in the latent space. Extensive experiments on multi-receiver WiFi datasets demonstrate that the proposed method consistently outperforms state-of-the-art baselines under cross-receiver evaluation and significantly improves robustness to receiver replacement.

2510.09250 2026-05-27 physics.flu-dyn cs.LG 版本更新

Smart navigation of a gravity-driven glider with adjustable centre-of-mass

可调质心的重力驱动滑翔机的智能导航

X. Jiang, J. Qiu, K. Gustavsson, B. Mehlig, L. Zhao

发表机构 * AML, Department of Engineering Mechanics, Tsinghua University, 100084 Beijing, China（AML，工程力学系，清华大学，北京100084，中国）； Department of Physics, Gothenburg University, 41296 Gothenburg, Sweden（物理系，哥德堡大学，瑞典41296哥德堡）

AI总结通过直接数值模拟和强化学习，研究了可调质心的紧凑型滑翔机在粘性流体中沉降时的最优导航策略，揭示了粒子雷诺数对策略选择的关键影响。

Comments 13 pages, 8 figures

DOI: 10.1103/1r5c-qpng
Journal ref: Phys. Rev. Research 7 (2025) 043200

AI中文摘要

人工滑翔机被设计为在流体中沉降时分散，需要精确导航以到达目标位置。我们展示了一个在粘性流体中沉降的紧凑型滑翔机可以通过动态调整其质心来导航。使用完全解析的直接数值模拟（DNS）和强化学习，我们发现了两种最优导航策略，使滑翔机能够准确到达目标位置。这些策略敏感地依赖于滑翔机与周围流体的相互作用方式。这种相互作用的性质随着粒子雷诺数Re$_p$的变化而变化。我们的结果解释了最优策略如何依赖于Re$_p$。在大的Re$_p$下，滑翔机学会通过在其方向改变时移动质心来快速翻滚。这产生了大的水平惯性升力，使滑翔机能够远距离移动。相比之下，在小的Re$_p$下，高粘度阻碍了翻滚。在这种情况下，滑翔机学会调整其质心，使其以稳定的倾斜方向沉降，从而产生水平粘性力。水平范围比大Re$_p$时小得多，因为这种粘性力远小于大Re$_p$下的惯性升力。

英文摘要

Artificial gliders are designed to disperse as they settle through a fluid, requiring precise navigation to reach target locations. We show that a compact glider settling in a viscous fluid can navigate by dynamically adjusting its centre-of-mass. Using fully resolved direct numerical simulations (DNS) and reinforcement learning, we find two optimal navigation strategies that allow the glider to reach its target location accurately. These strategies depend sensitively on how the glider interacts with the surrounding fluid. The nature of this interaction changes as the particle Reynolds number Re$_p$ changes. Our results explain how the optimal strategy depends on Re$_p$. At large Re$_p$, the glider learns to tumble rapidly by moving its centre-of-mass as its orientation changes. This generates a large horizontal inertial lift force, which allows the glider to travel far. At small Re$_p$, by contrast, high viscosity hinders tumbling. In this case, the glider learns to adjust its centre-of-mass so that it settles with a steady, inclined orientation that results in a horizontal viscous force. The horizontal range is much smaller than for large Re$_p$, because this viscous force is much smaller than the inertial lift force at large Re$_p$. *These authors contributed equally.

2510.08932 2026-05-27 cs.LG cs.IR 版本更新

MATT-CTR: Unleashing a Model-Agnostic Test-Time Paradigm for CTR Prediction with Confidence-Guided Inference Paths

MATT-CTR：一种模型无关的测试时范式，用于通过置信度引导的推理路径进行CTR预测

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

发表机构 * Alibaba Group（阿里巴巴集团）

AI总结提出一种模型无关的测试时范式MATT，利用特征组合的置信度分数生成多条推理路径并聚合预测，以缓解低置信度特征对CTR预测的影响。

AI中文摘要

近期，越来越多的研究致力于优化CTR模型架构以更好地建模特征交互，或改进训练目标以辅助参数学习，从而获得更好的预测性能。然而，以往的工作主要集中在训练阶段，很大程度上忽视了推理阶段的优化机会。特别是，不常出现的特征组合会降低预测性能，导致不可靠或低置信度的输出。为了释放已训练CTR模型的预测潜力，我们提出了一种模型无关的测试时范式（MATT），该范式利用特征组合的置信度分数来指导生成多条推理路径，从而减轻低置信度特征对最终预测的影响。具体来说，为了量化特征组合的置信度，我们引入了一种层次概率哈希方法来估计不同阶数特征组合的出现频率，这些频率作为对应的置信度分数。然后，以置信度分数作为采样概率，通过迭代采样生成多条实例特定的推理路径，并随后聚合来自多条路径的预测分数以进行稳健预测。最后，广泛的离线实验和在线A/B测试强有力地验证了MATT在现有CTR模型上的兼容性和有效性。

英文摘要

Recently, a growing body of research has focused on either optimizing CTR model architectures to better model feature interactions or refining training objectives to aid parameter learning, thereby achieving better predictive performance. However, previous efforts have primarily focused on the training phase, largely neglecting opportunities for optimization during the inference phase. Infrequently occurring feature combinations, in particular, can degrade prediction performance, leading to unreliable or low-confidence outputs. To unlock the predictive potential of trained CTR models, we propose a Model-Agnostic Test-Time paradigm (MATT), which leverages the confidence scores of feature combinations to guide the generation of multiple inference paths, thereby mitigating the influence of low-confidence features on the final prediction. Specifically, to quantify the confidence of feature combinations, we introduce a hierarchical probabilistic hashing method to estimate the occurrence frequencies of feature combinations at various orders, which serve as their corresponding confidence scores. Then, using the confidence scores as sampling probabilities, we generate multiple instance-specific inference paths through iterative sampling and subsequently aggregate the prediction scores from multiple paths to conduct robust predictions. Finally, extensive offline experiments and online A/B tests strongly validate the compatibility and effectiveness of MATT across existing CTR models.
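
The sampling-and-aggregation step can be sketched as follows. This is an assumption-laden illustration, not the MATT implementation. Confidence scores are taken as given rather than estimated by hierarchical probabilistic hashing, a dropped feature is simply zeroed, and `model` stands for any trained CTR scorer that maps a feature vector to a probability.

```python
import numpy as np

def predict_with_paths(model, features, confidence, n_paths=8, seed=0):
    """Average a model's scores over randomly sampled inference paths.

    Each path keeps feature i with probability confidence[i] and zeroes it
    otherwise, so low-confidence features influence fewer paths.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    scores = []
    for _ in range(n_paths):
        keep = rng.random(features.shape) < confidence  # one sampled path
        scores.append(model(features * keep))
    return float(np.mean(scores))

def toy_model(x):
    """Stand-in scorer: logistic function of the summed features."""
    return float(1.0 / (1.0 + np.exp(-np.sum(x))))

x = [0.5, -1.0, 2.0]
full = predict_with_paths(toy_model, x, confidence=[1.0, 1.0, 1.0])
none = predict_with_paths(toy_model, x, confidence=[0.0, 0.0, 0.0])
```

The wrapper leaves the trained model untouched, which is the sense in which a test-time scheme like this is model-agnostic. The cost is `n_paths` forward passes per instance.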

2505.07894 2026-05-27 cs.NI cs.ET cs.LG eess.SP math.ST stat.TH 版本更新

EnvCDiff: Joint Refinement of Environmental Information and Channel Fingerprints via Conditional Generative Diffusion Model

EnvCDiff：通过条件生成扩散模型联合优化环境信息与信道指纹

Zhenzhou Jin, Li You, Xiang-Gen Xia, Xiqi Gao

发表机构 * National Mobile Communications Research Laboratory, Southeast University（东南大学移动通信国家重点实验室）； Purple Mountain Laboratories（紫金山实验室）； Department of Electrical and Computer Engineering, University of Delaware（特拉华大学电气与计算机工程系）

AI总结针对环境信息和信道指纹粗粒度问题，提出条件生成扩散模型（CDiff）同时细化两者，从粗粒度重建细粒度EnvCF，实验表明性能显著提升。

Comments 6 pages, 2 figures

DOI: 10.1109/TVT.2025.3617013
Journal ref: IEEE Transactions on Vehicular Technology, vol. 75, no. 4, pp. 6846-6851, Apr. 2026

AI中文摘要

从环境无感知通信向智能环境感知通信的范式转变有望促进未来无线通信中信道状态信息的获取。信道指纹（CF）作为环境感知通信的新兴使能技术，为目标通信区域内潜在位置提供信道相关知识。然而，由于用于感知环境信息和测量信道相关知识的实际设备有限，大多数获取的环境信息和CF是粗粒度的，不足以指导无线传输设计。为此，本文提出一种深度条件生成学习方法，即定制的条件生成扩散模型（CDiff）。所提出的CDiff同时细化环境信息和CF，从其粗粒度对应物重建包含环境信息的细粒度CF，称为EnvCF。实验结果表明，与基线相比，所提方法显著提高了EnvCF构建的性能。

英文摘要

The paradigm shift from environment-unaware communication to intelligent environment-aware communication is expected to facilitate the acquisition of channel state information for future wireless communications. Channel Fingerprint (CF), as an emerging enabling technology for environment-aware communication, provides channel-related knowledge for potential locations within the target communication area. However, due to the limited availability of practical devices for sensing environmental information and measuring channel-related knowledge, most of the acquired environmental information and CF are coarse-grained, insufficient to guide the design of wireless transmissions. To address this, this paper proposes a deep conditional generative learning approach, namely a customized conditional generative diffusion model (CDiff). The proposed CDiff simultaneously refines environmental information and CF, reconstructing a fine-grained CF that incorporates environmental information, referred to as EnvCF, from its coarse-grained counterpart. Experimental results show that the proposed approach significantly improves the performance of EnvCF construction compared to the baselines.

2506.23274 2026-05-27 cs.LG cs.AI 版本更新

Real-Time Progress Prediction in Reasoning Language Models

推理语言模型中的实时进度预测

Hans Peter Lyngsøe Raaschou-Jensen, Constanza Fierro, Anders Søgaard

发表机构 * Department of Computer Science, University of Copenhagen（哥本哈根大学计算机科学系）

AI总结研究通过离散化推理轨迹训练线性探针和微调模型生成0-100%进度估计，实现推理语言模型中的实时进度预测，并在数学推理任务上达到0.161 MAE。

AI中文摘要

最近的推理语言模型，特别是那些采用长潜在思维链的模型，在复杂的智能体任务上表现出色。然而，随着这些模型在越来越长的时间范围内运行，其内部进展对用户变得不透明，使得期望管理和实时监督变得困难。在这项工作中，我们研究了对此类模型进行实时进度预测的可行性。我们首先通过离散化推理轨迹并训练线性探针对推理状态进行分类，测试隐藏状态是否编码进度信息。然后，我们微调模型以在思维链推理过程中生成0-100%的进度估计。我们最强的进度报告检查点在数学推理轨迹上达到了0.161的平均绝对误差，并在此设置中优于位置基线。最后，我们通过测量相同部分展开中隐含进度值的变化程度，量化了进度标签的内在模糊性。这种模糊性在Qwen3-4B中最低，其延续产生的展开离散度最小，表明更大的模型可以通过减少剩余解决方案长度的变化来使进度标签更稳定。

英文摘要

Recent reasoning language models, particularly those that employ long latent chains of thought, achieve strong performance on complex agentic tasks. However, as these models operate over increasingly long time horizons, their internal progress becomes opaque to users, making expectation management and real-time oversight difficult. In this work, we investigate whether real-time progress prediction is feasible for such models. We first test whether hidden states encode progress information by discretizing reasoning trajectories and training a linear probe to classify reasoning states. We then fine-tune models to generate progress estimates from 0-100% during chain-of-thought reasoning. Our strongest progress-reporting checkpoint reaches 0.161 MAE on mathematical reasoning traces and outperforms position baselines in this setting. Finally, we quantify the intrinsic ambiguity of progress labels by measuring how much the implied progress value varies from the same partial rollout. This ambiguity is lowest for Qwen3-4B, whose continuations produce the smallest rollout dispersion, suggesting that larger models can make progress labels more stable by reducing variation in remaining solution length.
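
A minimal sketch of how progress labels, their discretization, and the MAE metric could be computed. The abstract does not specify the labeling scheme, so the token-position fraction used here is an assumption, and the function names are hypothetical.

```python
def progress_labels(num_tokens):
    """Assumed ground truth: fraction of a finished trace completed after each token."""
    return [(i + 1) / num_tokens for i in range(num_tokens)]


def discretize(progress, num_bins=10):
    """Map progress in [0, 1] to a bin index, e.g. as a target for a linear probe."""
    return min(int(progress * num_bins), num_bins - 1)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and true progress."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

For example, a predictor that always answers 0.5 on a two-step trace labeled 0.25 and 0.75 has an MAE of 0.25 under this definition.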

2510.06381 2026-05-27 cs.LG cs.AI 版本更新

Monte Carlo Permutation Search

蒙特卡洛排列搜索

Tristan Cazenave

AI总结提出一种改进GRAVE算法的通用蒙特卡洛树搜索算法MCPS，通过利用路径上所有节点的统计信息，在多种游戏中优于GRAVE，并给出了统计权重公式的数学推导。

2510.01168 2026-05-27 math.OC cs.LG cs.NA math.NA stat.ML 版本更新

A first-order method for constrained nonconvex-nonconcave minimax optimization

约束非凸-非凹极小极大优化的一阶方法

Zhaosong Lu, Xiangyuan Wang

发表机构 * Department of Industrial and Systems Engineering, University of Minnesota, USA（明尼苏达大学工业与系统工程系）

AI总结针对内层最大化含复杂约束的非凸-非凹极小极大问题，通过提升重构和局部KL条件，提出基于序列凸规划的不精确近端梯度法并证明收敛性。

Comments 27 pages

2510.01336 2026-05-27 cs.CL cs.AI cs.LG 版本更新

HiSpec: Hierarchical Speculative Decoding for LLMs

HiSpec: 分层推测解码用于大语言模型

Avinash Kumar, Sujay Sanghavi, Poulami Das

发表机构 * Department of Electrical and Computer Engineering, The University of Texas at Austin（德克萨斯大学奥斯汀分校电气与计算机工程系）

AI总结提出HiSpec框架，利用早期退出模型进行低开销中间验证，通过重用键值缓存和隐藏状态提高吞吐量，平均加速1.28倍，最高2.01倍，且不损失准确性。

AI中文摘要

推测解码通过使用较小的草稿模型推测令牌，再由较大的目标模型验证，从而加速LLM推理。验证通常是瓶颈（例如，当3B模型为70B目标模型推测时，验证速度比令牌生成慢4倍），但大多数先前工作只关注加速草稿生成。“中间”验证通过早期丢弃不准确的草稿令牌来减少验证时间，但现有方法在引入中间验证器时会产生大量训练开销，增加内存占用以协调中间验证步骤，并依赖近似启发式方法损害准确性。我们提出Hierarchical Speculative Decoding (HiSpec)，一种高吞吐量推测解码框架，利用早期退出（EE）模型进行低开销中间验证。早期退出模型允许令牌通过跳过层遍历提前退出，并经过显式训练，使得选定层的隐藏状态可解释，从而在不显著增加计算和内存开销的情况下，非常适合中间验证。为了进一步提高资源效率，我们设计了一种方法，使HiSpec能够在草稿模型、中间验证器和目标模型之间重用键值缓存和隐藏状态。为了保持准确性，HiSpec定期针对目标模型验证中间验证器接受的草稿令牌。我们在各种代表性基准和模型上的评估表明，与基线单层推测相比，HiSpec平均提高吞吐量1.28倍，最高达2.01倍，且不损失准确性。

英文摘要

Speculative decoding accelerates LLM inference by using a smaller draft model to speculate tokens that a larger target model verifies. Verification is often the bottleneck (e.g., verification is $4\times$ slower than token generation when a 3B model speculates for a 70B target model), but most prior works focus only on accelerating drafting. "Intermediate" verification reduces verification time by discarding inaccurate draft tokens early, but existing methods incur substantial training overheads in incorporating the intermediate verifier, increase the memory footprint to orchestrate the intermediate verification step, and compromise accuracy by relying on approximate heuristics. We propose Hierarchical Speculative Decoding (HiSpec), a framework for high-throughput speculative decoding that exploits early-exit (EE) models for low-overhead intermediate verification. EE models allow tokens to exit early by skipping layer traversal and are explicitly trained so that hidden states at selected layers can be interpreted, making them uniquely suited for intermediate verification without drastically increasing compute and memory overheads. To improve resource efficiency even further, we design a methodology that enables HiSpec to reuse key-value caches and hidden states between the draft, intermediate verifier, and target models. To maintain accuracy, HiSpec periodically validates the draft tokens accepted by the intermediate verifier against the target model. Our evaluations using various representative benchmarks and models show that HiSpec improves throughput by $1.28\times$ on average and by up to $2.01\times$ compared to the baseline single-layer speculation without compromising accuracy.
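
The draft, intermediate-verify, target-verify flow can be sketched with toy greedy models. This is a simplified illustration rather than HiSpec itself: each model is assumed to be a plain function from a token prefix to its next token, the target checks every step instead of periodically, verification runs token by token rather than in one batched forward pass, and no caches are shared. All function names are hypothetical.

```python
def draft_tokens(draft_next, prefix, k):
    """Let the draft model propose k tokens autoregressively."""
    out = []
    for _ in range(k):
        out.append(draft_next(prefix + out))
    return out


def accept_prefix(verifier_next, prefix, proposed):
    """Keep proposed tokens up to the first one the verifier disagrees with."""
    accepted = []
    for tok in proposed:
        if verifier_next(prefix + accepted) != tok:
            break
        accepted.append(tok)
    return accepted


def hierarchical_step(draft_next, mid_next, target_next, prefix, k=4):
    """One decoding step: draft, cheap intermediate filter, then target check."""
    proposed = draft_tokens(draft_next, prefix, k)
    survivors = accept_prefix(mid_next, prefix, proposed)
    accepted = accept_prefix(target_next, prefix, survivors)
    # The target always contributes one token, so every step makes progress.
    accepted.append(target_next(prefix + accepted))
    return accepted
```

Because every emitted token is either confirmed or produced by the target, the output of this sketch matches plain greedy decoding with the target model; the intermediate verifier only reduces how many draft tokens the target has to inspect.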

2509.21167 2026-05-27 cs.LG cs.CV 版本更新

A Unified Framework for Diffusion Model Unlearning with f-Divergence

基于f-散度的扩散模型遗忘统一框架

Nicola Novello, Federico Fontana, Luigi Cinque, Deniz Gunduz, Andrea M. Tonello

发表机构 * University of Klagenfurt, Austria（克拉根福大学，奥地利）； Sapienza University of Rome, Italy（罗马萨皮恩扎大学，意大利）； Imperial College London, UK（伦敦帝国理工学院，英国）

AI总结提出一个基于f-散度的统一框架，将扩散模型概念遗忘中的MSE损失推广到任意f-散度，并通过理论分析和实验验证不同散度对遗忘效果的影响。

Comments Accepted at ICML 2026

AI中文摘要

现有的大多数文本到图像扩散模型概念遗忘方法最小化基于目标概念和锚定概念的去噪器输出之间的均方误差（MSE）损失，这隐式地是两个高斯分布之间的KL散度。我们将这一目标推广到任意$f$-散度，将MSE恢复为KL实例，并识别出一族$\alpha$-散度，其高斯闭式形式产生廉价、类似MSE的训练目标。对于剩余的$f$-散度，我们基于$f$-散度的变分公式提供了一个最小-最大目标。我们从理论上分析并数值验证了不同$f$-散度如何影响梯度幅度和算法的收敛性质，从而影响遗忘质量。例如，我们观察到Hellinger闭式实例在多种场景下始终优于MSE。更一般地，所提出的统一框架为根据应用和用户目标选择最优散度提供了灵活的范式，允许对遗忘效果与生成保真度之间的权衡进行更精细的控制。

英文摘要

Most existing methods for concept unlearning in text-to-image diffusion models minimize a mean squared error (MSE) loss between the denoiser outputs conditioned on a target and an anchor concept, which is implicitly the KL divergence between two Gaussians. We generalize this objective to any $f$-divergence, recovering MSE as the KL instance, and identify a family of $\alpha$-divergences whose Gaussian closed form yields cheap, MSE-like training objectives. For the remaining $f$-divergences, we provide a min-max objective based on the variational formulation of the $f$-divergence. We theoretically analyze and numerically validate how different $f$-divergences impact the gradient magnitude and the convergence properties of the algorithm, affecting the quality of unlearning. For instance, we observe that the Hellinger closed-form instance consistently dominates MSE across multiple scenarios. More generally, the proposed unified framework offers a flexible paradigm for selecting the optimal divergence based on the application and user goal, allowing for finer control over the trade-off between unlearning efficacy and generative fidelity.
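
The link between MSE and KL mentioned above follows from a standard identity for Gaussians with a shared isotropic covariance. The worked equations below use assumed notation ($\mu_1$, $\mu_2$ for the two denoiser outputs, $\sigma^2$ for the common variance); how the paper parameterizes its training objective is not stated in the abstract.

```latex
% f-divergence between densities p and q, for convex f with f(1) = 0
D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx

% KL instance, f(t) = t \log t, with P = N(\mu_1, \sigma^2 I), Q = N(\mu_2, \sigma^2 I)
D_{\mathrm{KL}}(P \,\|\, Q) = \frac{\lVert \mu_1 - \mu_2 \rVert^2}{2\sigma^2}

% Squared Hellinger distance, H^2 = \tfrac{1}{2}\int (\sqrt{p} - \sqrt{q})^2 dx, same Gaussians
H^2(P, Q) = 1 - \exp\!\left(-\frac{\lVert \mu_1 - \mu_2 \rVert^2}{8\sigma^2}\right)
```

The KL case is a rescaled squared error, while the Hellinger case saturates at 1 as the two means move apart, so its gradient shrinks for large discrepancies; this is one concrete way the choice of divergence changes gradient magnitude.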

2509.15121 2026-05-27 hep-ph cs.LG hep-ex 版本更新

Shedding Light on Dark Matter at the LHC with Machine Learning

利用机器学习在LHC上揭示暗物质

Ernesto Arganda, Martín de los Rios, Andres D. Perez, Subhojit Roy, Rosa M. Sandá Seoane, Carlos E. M. Wagner

发表机构 * Departamento de Física Teórica, Universidad Autónoma de Madrid（马德里自治大学理论物理系）； Instituto de Física Teórica UAM-CSIC（UAM-CSIC理论物理研究所）； SISSA - International School for Advanced Studies（国际高等研究院SISSA）； Instituto de Astronomía Teórica y Experimental, CONICET - UNC（理论与实验天文学研究所，CONICET-UNC）； HEP Division, Argonne National Laboratory（阿贡国家实验室高能物理部）； Enrico Fermi Institute, Physics Department, University of Chicago（恩里科·费米研究所，芝加哥大学物理系）； Kavli Institute for Cosmological Physics, University of Chicago（芝加哥大学卡弗里宇宙物理研究所）； Leinweber Center for Theoretical Physics, University of Chicago（芝加哥大学Leinweber理论物理中心）； Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada（圆周理论物理研究所，加拿大安大略省滑铁卢 N2L 2Y5）

AI总结研究在$Z_3$对称的次最小超对称标准模型中，通过机器学习分析辐射衰变中性微子（neutralino）产生的多光子信号，实现LHC上对singlino主导的LSP暗物质的$5\sigma$发现能力。

Comments 26 pages + references, 6 figures, 8 tables, 1 appendix (version published in PRD)

DOI: 10.1103/m1cd-1sfb
Journal ref: Phys. Rev. D 113 (2026) 9, 095013

AI中文摘要

我们在$Z_3$对称的次最小超对称标准模型（NMSSM）中，研究了一个以单重态微子（singlino）主导的最轻超对称粒子（LSP）形式的WIMP暗物质（DM）候选者。该框架产生了参数空间区域，其中暗物质通过与邻近的higgsino-like电弱微子（electroweakino）共同湮灭而获得，且暗物质直接探测信号被抑制，即所谓的“盲点”。另一方面，由于higgsino到singlino主导的LSP和光子的辐射衰变模式增强，而不是衰变成轻子或强子，对撞机特征仍然有希望。与具有轻bino-like和wino-like电弱微子的MSSM情景相比，NMSSM允许来自级联辐射衰变的多光子末态，提供了独特的对撞机特征。这激发了寻找辐射衰变中性微子（neutralino）的研究；然而，这些信号面临巨大的背景挑战，因为LSP和higgsino-like共同湮灭伙伴之间的质量差（$\Delta m$）很小，衰变产物通常较软。我们应用了一种数据驱动的机器学习（ML）分析，提高了对这些微弱信号的灵敏度，为发现新物理情景提供了对传统搜索策略的有力补充。使用LHC在$14~\mathrm{TeV}$下积分亮度为$100~\mathrm{fb}^{-1}$的数据，该方法对higgsino质量高达$225~\mathrm{GeV}$且$\Delta m\lesssim 12~\mathrm{GeV}$实现了$5\sigma$发现能力，对高达$285~\mathrm{GeV}$且$\Delta m\lesssim 20~\mathrm{GeV}$实现了$2\sigma$排除能力。这些结果突显了对撞机搜索在探测当前直接探测实验无法发现的暗物质候选者方面的能力，并为LHC合作组使用ML方法进行搜索提供了动机。

英文摘要

We investigate a WIMP dark matter (DM) candidate in the form of a singlino-dominated lightest supersymmetric particle (LSP) within the $Z_3$-symmetric Next-to-Minimal Supersymmetric Standard Model (NMSSM). This framework gives rise to regions of parameter space where DM is obtained via co-annihilation with nearby higgsino-like electroweakinos and DM direct detection signals are suppressed, the so-called "blind spots". On the other hand, collider signatures remain promising due to enhanced radiative decay modes of higgsinos into the singlino-dominated LSP and photons, rather than into leptons or hadrons. Compared to MSSM scenarios with light bino- and wino-like electroweakinos, the NMSSM allows for final states with multiple photons arising from cascade radiative decays, providing a distinctive collider signature. This motivates searches for radiatively decaying neutralinos; however, these signals face substantial background challenges, as the decay products are typically soft due to the small mass splittings ($\Delta m$) between the LSP and the higgsino-like co-annihilation partners. We apply a data-driven Machine Learning (ML) analysis that improves sensitivity to these subtle signals, offering a powerful complement to traditional search strategies to discover a new physics scenario. Using an LHC integrated luminosity of $100~\mathrm{fb}^{-1}$ at $14~\mathrm{TeV}$, the method achieves a $5\sigma$ discovery reach for higgsino masses up to $225~\mathrm{GeV}$ with $\Delta m \lesssim 12~\mathrm{GeV}$, and a $2\sigma$ exclusion up to $285~\mathrm{GeV}$ with $\Delta m \lesssim 20~\mathrm{GeV}$. These results highlight the power of collider searches to probe DM candidates that remain hidden from current direct detection experiments, and provide a motivation for a search by the LHC collaborations using ML methods.

2503.20507 2026-05-27 cs.AR cs.DC cs.LG 版本更新

消息传递状态空间模型：利用现代序列建模改进图学习

Andrea Ceni, Alessio Gravina, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schonlieb, Moshe Eliasof

发表机构 * University of Pisa（比萨大学）； University of Cambridge（剑桥大学）

AI总结提出MP-SSM，将现代状态空间模型的核心计算嵌入消息传递神经网络，实现静态和时序图上的高效、置换等变和长程信息传播，并通过精确敏感性分析刻画深层信息流问题。

AI中文摘要

状态空间模型（SSM）在序列建模中的近期成功推动了其向图学习的迁移，催生了图状态空间模型（GSSM）。然而，现有的GSSM通过将SSM模块应用于从图中提取的序列，往往损害了置换等变性、消息传递兼容性和计算效率等核心属性。本文引入了一种新视角，将现代SSM计算的关键原理直接嵌入消息传递神经网络框架，从而为静态图和时序图提供统一的方法论。我们的方法MP-SSM能够实现高效、置换等变和长程信息传播，同时保持消息传递的架构简洁性。关键的是，MP-SSM支持精确的敏感性分析，我们利用该分析从理论上刻画信息流，并评估深层网络中的梯度消失和过压缩等问题。此外，我们的设计选择允许类似现代SSM的高度优化并行实现。我们在包括节点分类、图属性预测、长程基准和时空预测在内的广泛任务上验证了MP-SSM，展示了其多功能性和强大的实证性能。

英文摘要

The recent success of State-Space Models (SSMs) in sequence modeling has motivated their adaptation to graph learning, giving rise to Graph State-Space Models (GSSMs). However, existing GSSMs operate by applying SSM modules to sequences extracted from graphs, often compromising core properties such as permutation equivariance, message-passing compatibility, and computational efficiency. In this paper, we introduce a new perspective by embedding the key principles of modern SSM computation directly into the Message-Passing Neural Network framework, resulting in a unified methodology for both static and temporal graphs. Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing. Crucially, MP-SSM enables an exact sensitivity analysis, which we use to theoretically characterize information flow and evaluate issues like vanishing gradients and over-squashing in the deep regime. Furthermore, our design choices allow for a highly optimized parallel implementation akin to modern SSMs. We validate MP-SSM across a wide range of tasks, including node classification, graph property prediction, long-range benchmarks, and spatiotemporal forecasting, demonstrating both its versatility and strong empirical performance.

2505.17163 2026-05-27 cs.LG cs.AI cs.CL cs.CV 版本更新

OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning

OCR-Reasoning基准：揭示MLLMs在复杂文本丰富图像推理中的真实能力

Mingxin Huang, Yongxin Shi, Dezhi Peng, Songxuan Lai, Zecheng Xie, Lianwen Jin

发表机构 * South China University of Technology（华南理工大学）； Huawei Technologies Co., Ltd（华为技术有限公司）

AI总结提出OCR-Reasoning基准，包含1069个人工标注样本，覆盖6种核心推理能力和18个实际推理任务，通过双标注（最终答案和逐步推理过程）评估多模态大语言模型在文本丰富图像推理中的能力，发现最先进模型准确率均低于50%。

Comments ICLR 2026

AI中文摘要

近期多模态慢思考系统在各种视觉推理任务中表现出色。然而，由于缺乏专门且系统的基准，它们在文本丰富图像推理任务中的能力仍未得到充分研究。为填补这一空白，我们提出了OCR-Reasoning，一个新颖的基准，旨在系统评估多模态大语言模型在文本丰富图像推理任务上的表现。具体而言，OCR-Reasoning包含1069个人工标注的示例，涵盖文本丰富视觉场景中的6种核心推理能力和18个实际推理任务。与仅提供最终答案的现有文本丰富图像理解基准不同，本基准额外提供了详细的逐步推理过程。这种双标注使得能够同时评估模型的最终答案和推理过程，从而全面评估文本丰富推理能力。利用该基准，我们对最新的多模态大语言模型进行了全面评估。结果表明，即使是最先进的多模态大语言模型在文本丰富图像推理任务中也面临巨大困难，在我们的基准上没有一个模型的准确率超过50%，这表明文本丰富图像推理的挑战是一个亟待解决的问题。基准和评估脚本可在https://github.com/SCUT-DLVCLab/OCR-Reasoning获取。

英文摘要

Recent advancements in multimodal slow-thinking systems have demonstrated remarkable performance across various visual reasoning tasks. However, their capabilities in text-rich image reasoning tasks remain understudied due to the absence of a dedicated and systematic benchmark. To address this gap, we propose OCR-Reasoning, a novel benchmark designed to systematically assess Multimodal Large Language Models (MLLMs) on text-rich image reasoning tasks. Specifically, OCR-Reasoning comprises 1,069 human-annotated examples spanning 6 core reasoning abilities and 18 practical reasoning tasks in text-rich visual scenarios. Unlike existing text-rich image understanding benchmarks that only provide a final answer, this benchmark additionally provides a detailed step-by-step reasoning process. This dual annotation enables the evaluation of both the models' final answers and their reasoning processes, thereby offering a holistic assessment of text-rich reasoning capabilities. By leveraging this benchmark, we conducted a comprehensive evaluation of the latest MLLMs. Our results demonstrate that even the most advanced MLLMs exhibit substantial difficulties in text-rich image reasoning tasks, with none achieving an accuracy above 50% on our benchmark, indicating that the challenges of text-rich image reasoning are an urgent issue to be addressed. The benchmark and evaluation scripts are available at https://github.com/SCUT-DLVCLab/OCR-Reasoning.

2505.16942 2026-05-27 cs.CV cs.LG 版本更新

Efficient All-Pairs Correlation Volume Sampling for Optical Flow Estimation

面向光流估计的高效全对相关体采样

Karlis Martins Briedis, Markus Gross, Christopher Schroers

发表机构 * DisneyResearch|Studios（迪士尼研究与工作室）； ETH Zürich（苏黎世联邦理工学院）

AI总结提出一种内存和计算高效的算法，实现与RAFT定义完全一致的全对相关体采样算子，在保持低内存占用的同时显著提升速度，并应用于高分辨率光流估计达到最优性能。

Comments CVPR 2026

AI中文摘要

最近的光流估计方法通常从密集的全对相关体（all-pairs correlation volume）中进行局部代价采样。这导致计算和内存复杂度与像素数成二次关系。尽管存在一种按需代价计算的替代内存高效实现，但在实践中速度明显较慢，因此许多先前方法在降采样分辨率下处理图像，丢失了细粒度细节。为了解决这个问题，我们提出了一种算法，用于全对相关体采样的内存和计算高效实现，同时仍然匹配RAFT定义的精确数学算子。我们的方法在保持同样低内存使用的情况下，性能优于按需采样高达92%，并且与默认实现相比，内存使用降低高达99%的同时性能至少相当。由于代价采样占整体运行时间的很大一部分，这可以转化为高分辨率输入下端到端模型推理总时间高达63%的节省。我们对现有方法的评估包括一个8K超高清数据集和SEA-RAFT方法的推理时间扩展。借此，我们在高分辨率下在准确性和运行时间上都达到了最先进的结果。

英文摘要

Recent optical flow estimation methods often employ local cost sampling from a dense all-pairs correlation volume. This results in quadratic computational and memory complexity in the number of pixels. Although an alternative memory-efficient implementation with on-demand cost computation exists, this is significantly slower in practice and therefore many prior methods process images at downsampled resolutions, missing fine-grained details. To address this, we propose an algorithm for both memory and compute-efficient implementation of the all-pairs correlation volume sampling, still matching the exact mathematical operator as defined by RAFT. Our approach outperforms on-demand sampling by up to 92% while maintaining equally low memory usage, and performs at least on par with the default implementation with up to 99% lower memory usage. As cost sampling makes up a significant portion of the overall runtime, this can translate to up to 63% savings for the total end-to-end model inference on high-resolution inputs. Our evaluation of existing methods includes an 8K ultra-high-resolution dataset and an inference-time extension of the SEA-RAFT method. With this, we achieve state-of-the-art results at high resolutions both in accuracy and runtime.
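
The two baselines contrasted above (materializing the full volume versus computing costs on demand) can be sketched on a toy 1D "image". This illustrates only the trade-off, not the proposed algorithm: RAFT samples a multi-scale pyramid with bilinear interpolation at fractional coordinates, whereas this sketch assumes integer window positions and zero padding. All names are hypothetical.

```python
def dot(a, b):
    """Correlation between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def window(cost, center, radius, size):
    """Costs at integer positions around center; out-of-range positions give 0."""
    return [cost(j) if 0 <= j < size else 0.0
            for j in range(center - radius, center + radius + 1)]


def sample_from_volume(feat1, feat2, centers, radius):
    """Default approach: build the full N x M volume, then look up each window."""
    volume = [[dot(a, b) for b in feat2] for a in feat1]  # O(N * M) memory
    return [window(lambda j, i=i: volume[i][j], c, radius, len(feat2))
            for i, c in enumerate(centers)]


def sample_on_demand(feat1, feat2, centers, radius):
    """Memory-efficient approach: compute only the costs inside each window."""
    return [window(lambda j, i=i: dot(feat1[i], feat2[j]), c, radius, len(feat2))
            for i, c in enumerate(centers)]
```

Both functions return identical windows; the first stores every pairwise cost, while the second stores none but recomputes dot products each time the windows move, which is the repeated work that makes on-demand sampling slow across refinement iterations.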

2502.17666 2026-05-27 cs.LG cs.AI 版本更新

扩散模型在味模型中的应用：以$S_4^\prime$模味模型为例

Satsuki Nishimura, Hajime Otsuka, Haruki Uchiyama

发表机构 * Department of Physics, Kyushu University（九州大学物理系）

AI总结利用扩散模型（一种生成式人工智能）提出一种数值方法，通过实验约束搜索味模型参数，并以$S_4^\prime$模味模型为例，构建神经网络再现夸克质量、CKM矩阵和Jarlskog不变量，发现新的现象学感兴趣参数区域，并确认自发CP破坏。

Comments 19 pages, 2 figures

DOI: 10.1093/ptep/ptag069
Journal ref: Prog Theor Exp Phys (2026)

AI中文摘要

我们提出了一种利用扩散模型（属于生成式人工智能）在通用味模型中搜索具有实验约束参数的数值方法。作为一个具体例子，我们考虑$S_4^\prime$模味模型，并构建一个神经网络，通过将味模型中的自由参数视为生成目标，再现夸克质量、CKM矩阵和Jarlskog不变量。通过使用训练好的网络生成新参数，我们发现了各种现象学上有趣的参数区域，在这些区域中对$S_4^\prime$模型进行解析评估具有挑战性。此外，我们确认了在$S_4^\prime$模型中发生了自发CP破坏。扩散模型实现了逆问题方法，使得机器能够从给定的实验数据中提供一系列合理的模型参数。此外，它还可以作为一种通用的分析工具，用于从味模型中提取新的物理预测。

英文摘要

We propose a numerical method of searching for parameters with experimental constraints in generic flavor models by utilizing diffusion models, which are classified as a type of generative artificial intelligence (generative AI). As a specific example, we consider the $S_4^\prime$ modular flavor model and construct a neural network that reproduces quark masses, the CKM matrix, and the Jarlskog invariant by treating free parameters in the flavor model as generating targets. By generating new parameters with the trained network, we find various phenomenologically interesting parameter regions where an analytical evaluation of the $S_4^\prime$ model is challenging. Additionally, we confirm that the spontaneous CP violation occurs in the $S_4^\prime$ model. The diffusion model enables an inverse problem approach, allowing the machine to provide a series of plausible model parameters from given experimental data. Moreover, it can serve as a versatile analytical tool for extracting new physical predictions from flavor models.

2504.00307 2026-05-27 cs.LG physics.ao-ph 版本更新

Generating realistic global precipitation fields from modelled atmospheric circulation

从模拟大气环流生成逼真的全球降水场

Michael Aich, Sebastian Bathiany, Philipp Hess, Yu Huang, Niklas Boers

发表机构 * Technical University of Munich（慕尼黑工业大学）； Munich Climate Center（慕尼黑气候中心）； TUM School of Engineering and Design（TUM工程与设计学院）； Department of Aerospace and Geodesy（航空航天与大地测量系）； Earth System Modelling Group（地球系统建模组）； Potsdam Institute for Climate Impact Research（波茨坦气候影响研究所）； Global Systems Institute（全球系统研究所）； Department of Mathematics（数学系）； University of Exeter（埃克塞特大学）

AI总结提出基于条件扩散模型与UNet架构的生成式机器学习方法，从少量预报大气变量生成高分辨率全球降水场，作为传统参数化方案的替代，减少偏差并实现高效集合预测。

Comments Accepted for publication at Climate Dynamics

AI中文摘要

改进地球系统模型（ESMs）中降水的表示对于评估气候变化的影响，特别是洪水和干旱等极端事件至关重要。在现有的ESMs中，降水并非显式解析，而是通过参数化表示。这些参数化通常依赖于解析近似但计算昂贵的基于柱的物理过程，不考虑位置间的相互作用。它们难以捕捉精细尺度的降水过程，并引入显著偏差。我们提出了一种基于生成式机器学习的新方法，将条件扩散模型与UNet架构相结合，从一小部分预报大气变量生成准确、高分辨率（0.25°）的全球每日降水场。与传统参数化不同，我们的框架高效地生成集合预测，捕捉降水的不确定性，且无需手动微调。我们在ERA5再分析数据上训练模型，并提出一种方法使其能应用于未见过的ESM数据，从而实现概率预测和气候情景的快速生成。通过利用全球预报变量之间的相互作用，我们的方法提供了一种替代参数化方案，减轻了ESM降水中存在的偏差，同时保持与其大尺度（年）趋势的一致性。这项工作表明，复杂的降水模式可以直接从大尺度大气变量中学习，提供了一种计算高效的方法来获得高分辨率降水，而无需以如此高分辨率运行动力模型的成本。

英文摘要

Improving the representation of precipitation in Earth system models (ESMs) is critical for assessing the impacts of climate change and especially of extreme events like floods and droughts. In existing ESMs, precipitation is not resolved explicitly, but represented by parameterizations. These typically rely on resolving approximated but computationally expensive column-based physics, not accounting for interactions between locations. They struggle to capture fine-scale precipitation processes and introduce significant biases. We present a novel approach, based on generative machine learning, which integrates a conditional diffusion model with a UNet architecture to generate accurate, high-resolution (0.25°) global daily precipitation fields from a small set of prognostic atmospheric variables. Unlike traditional parameterizations, our framework efficiently produces ensemble predictions, capturing uncertainties in precipitation, and does not require fine-tuning by hand. We train our model on the ERA5 reanalysis and present a method that allows us to apply it to unseen ESM data, enabling fast generation of probabilistic forecasts and climate scenarios. By leveraging interactions between global prognostic variables, our approach provides an alternative parameterization scheme that mitigates biases present in the ESM precipitation while maintaining consistency with its large-scale (annual) trends. This work demonstrates that complex precipitation patterns can be learned directly from large-scale atmospheric variables, offering a computationally efficient method to obtain high-resolution precipitation without the cost of running the dynamical model at such high resolution.

2306.13985 2026-05-27 stat.ML cs.AI cs.LG stat.ME 版本更新

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

使用数据自适应能量距离的高维数据鲁棒分类

Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta

发表机构 * Indian Statistical Institute, Kolkata, India（印度统计研究所，加尔各答，印度）； School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA（佐治亚理工学院工业与系统工程学院，美国亚特兰大）； Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Saudi Arabia（阿卜杜拉国王科技大学计算机、电气与数学科学与工程学部，沙特阿拉伯）； Applied Statistics Unit, Indian Statistical Institute, Kolkata, India（印度统计研究所应用统计部，加尔各答，印度）； Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, India（印度理工学院坎普尔分校数学与统计系，印度）

AI总结针对高维低样本量数据，提出无调参、无矩条件的鲁棒分类器，在渐近条件下实现完美分类，并通过模拟和真实数据验证其优势。

Comments Published at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2023

DOI: 10.1007/978-3-031-43424-2_6
Journal ref: In: ECML PKDD 2023: Research Track. Lecture Notes in Computer Science, vol 14173. Springer, Cham (2023)

AI中文摘要

高维低样本量数据的分类在基因表达研究、癌症研究和医学成像等多种实际场景中构成挑战。本文开发并分析了一些专门为HDLSS数据设计的分类器。这些分类器无需调参且具有鲁棒性，即它们不依赖于底层数据分布的任何矩条件。研究表明，在相当一般的条件下，它们在HDLSS渐近框架下能实现完美分类。还研究了所提分类器的比较性能。我们的理论结果得到了广泛的模拟研究和真实数据分析的支持，这些分析表明所提出的分类技术相对于几种广泛认可的方法具有显著优势。

英文摘要

Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions of the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.
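
For orientation, here is a minimal sketch of the classical sample energy distance and a simple classification rule built on it. The abstract does not define the data-adaptive variant, so this uses the plain Euclidean norm as an assumption; the function names and the decision rule are illustrative, not the authors' classifiers.

```python
import math


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y) with x in a and y in b."""
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))


def energy_distance(a, b):
    """Sample energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2 * mean_pairwise_distance(a, b)
            - mean_pairwise_distance(a, a)
            - mean_pairwise_distance(b, b))


def classify(z, class_a, class_b):
    """Assign z to the class with the smaller distance score (illustrative rule)."""
    score_a = (mean_pairwise_distance([z], class_a)
               - 0.5 * mean_pairwise_distance(class_a, class_a))
    score_b = (mean_pairwise_distance([z], class_b)
               - 0.5 * mean_pairwise_distance(class_b, class_b))
    return "a" if score_a <= score_b else "b"
```

No tuning parameter appears anywhere in this rule, which matches the tuning-free property the abstract emphasizes; the robustness and HDLSS guarantees, however, belong to the paper's own construction and are not established by this sketch.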

2502.06567 2026-05-27 stat.ML cs.LG 版本更新

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

量化模型中的成员推断风险：理论与实证研究

Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat

发表机构 * Université Paris-Saclay, CNRS, Laboratoire de mathématiques d’Orsay, France（巴黎萨克雷大学，法国国家科学研究中心，奥赛数学实验室，法国）； Université Paris-Saclay, ILLS, MILA, ÉTS, Montreal, Canada（巴黎萨克雷大学，ILLS，MILA，ÉTS，加拿大蒙特利尔）； ILLS, MILA, CNRS, CentraleSupélec, Montreal, Canada（ILLS，MILA，法国国家科学研究中心，巴黎中央理工-高等电力学院（CentraleSupélec），加拿大蒙特利尔）

AI总结本文通过理论分析和实证方法，研究后训练量化对机器学习模型成员推断隐私风险的影响，并提出新的成员推断安全指标。

Journal ref: AISTATS 2026

AI中文摘要

量化机器学习模型已被证明在降低内存和推理成本的同时，能够保持与原始模型相当的性能水平。在这项工作中，我们研究了量化过程对数据驱动模型隐私的影响，重点关注它们对成员推断攻击的脆弱性。成员推断安全（MIS）最近被提出，用于表征机器学习模型针对最强大（且可能未知）攻击的隐私性。然而，量化MIS在计算上似乎非常困难。在本文中，我们针对最小化经验损失的机器学习模型的后训练量化过程，提出了一种新的MIS指标。该新指标是此背景下MIS理论渐近分析的副产品。我们还提出了一种经验估计MIS指标的方法。使用合成数据集和真实世界数据（在药物发现背景下），我们证明了我们的方法在评估和排序不同量化器的MIS方面的有效性。

英文摘要

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of quantization procedures on privacy in data-driven models, focusing on their vulnerability to membership inference attacks. Membership Inference Security (MIS) has recently been proposed to characterize the privacy of machine learning models against the most powerful (and possibly unknown) attacks. However, quantifying MIS appears to be computationally very difficult. In this paper, we propose a new MIS indicator for post-training quantization procedures of machine learning models that minimize an empirical loss. This new indicator is a byproduct of a theoretical asymptotic analysis of the MIS in this context. We also present a methodology for empirically estimating our MIS indicator. Using synthetic datasets and real-world data (in the context of drug discovery), we demonstrate the effectiveness of our approach in assessing and ranking the MIS of different quantizers.

2501.00520 2026-05-27 cs.CV cs.LG 版本更新

Innovative Silicosis and Pneumonia Classification: Leveraging Graph Transformer Post-hoc Modeling and Ensemble Techniques

创新性矽肺和肺炎分类：利用图Transformer后验建模与集成技术

Bao Q. Bui, Tien T. T. Nguyen, Duy M. Le, Cong Tran, Cuong Pham

AI总结提出结合图Transformer网络与传统深度神经网络的架构，并采用平衡交叉熵损失函数和集成方法，在自建胸部X光数据集上实现高精度矽肺与肺炎分类。

Comments Withdrawn by the authors because the manuscript contains incomplete and potentially misleading descriptions of the dataset construction and evaluation protocol, particularly in the Dataset and Experimental Setup sections. The work should not be cited or used as an independent reference in its current form

AI中文摘要

本文对矽肺相关肺部炎症的分类与检测进行了全面研究。我们的主要贡献包括：1) 创建了一个名为SVBCX的新策划胸部X光（CXR）图像数据集，该数据集针对不同病原体引起的肺部炎症的细微差别进行了定制，为矽肺和肺炎研究社区提供了宝贵资源；2) 提出了一种新颖的深度学习架构，该架构将图Transformer网络与传统深度神经网络模块相结合，用于有效分类矽肺和肺炎。此外，我们采用平衡交叉熵（BalCE）作为损失函数，以确保不同类别之间的更均匀学习，增强模型辨别肺部状况细微差异的能力。所提出的模型架构和损失函数选择旨在提高炎症检测的准确性和可靠性，特别是在矽肺背景下。此外，我们的研究探索了一种集成方法的有效性，该方法结合了不同模型架构的优势。在构建的数据集上的实验结果表明，与基线模型相比，取得了显著改进。模型集成实现了宏F1分数0.9749，每个类别的AUC ROC分数超过0.99，突显了我们的方法在准确和鲁棒的肺部炎症分类中的有效性。

英文摘要

This paper presents a comprehensive study on the classification and detection of silicosis-related lung inflammation. Our main contributions include 1) the creation of a newly curated chest X-ray (CXR) image dataset named SVBCX that is tailored to the nuances of lung inflammation caused by distinct agents, providing a valuable resource for the silicosis and pneumonia research community; and 2) a novel deep-learning architecture that integrates graph transformer networks alongside a traditional deep neural network module for the effective classification of silicosis and pneumonia. Additionally, we employ the Balanced Cross-Entropy (BalCE) as a loss function to ensure more uniform learning across different classes, enhancing the model's ability to discern subtle differences in lung conditions. The proposed model architecture and loss function selection aim to improve the accuracy and reliability of inflammation detection, particularly in the context of silicosis. Furthermore, our research explores the efficacy of an ensemble approach that combines the strengths of diverse model architectures. Experimental results on the constructed dataset demonstrate promising outcomes, showcasing substantial enhancements compared to baseline models. The ensemble of models achieves a macro-F1 score of 0.9749 and AUC ROC scores exceeding 0.99 for each class, underscoring the effectiveness of our approach in accurate and robust lung inflammation classification.

2410.19248 2026-05-27 cs.LG 版本更新

CHESTNUT: A QoS Dataset for Mobile Edge Environments

CHESTNUT: 面向移动边缘环境的QoS数据集

Guobing Zou, Fei Zhao, Shengxiang Hu

发表机构 * School of Computer Engineering and Science, Shanghai University（上海大学计算机工程与科学学院）

AI总结针对现有QoS数据集忽略时间和地理位置等动态属性的问题，提出CHESTNUT数据集，在采集过程中精确记录时间和地理位置信息，以支持移动边缘环境中的QoS预测。

AI中文摘要

服务质量（QoS）是衡量网络服务性能的重要指标。如今，它被广泛应用于移动边缘环境中，以评估移动设备从边缘服务器请求服务时的服务质量。QoS通常涉及多个维度，如带宽、延迟、抖动和数据包丢失率。然而，大多数现有的QoS数据集，例如常见的WS-Dream数据集，主要关注网络服务的静态QoS指标，而忽略了时间和地理位置等动态属性。这意味着它们没有详细记录服务请求时移动设备的位置或请求的时间顺序。然而，这些动态属性对于理解和预测网络服务的实际性能至关重要，因为QoS性能通常随时间和地理位置波动。为此，我们提出了一种新的数据集，在采集过程中精确记录服务质量的时间和地理位置信息，旨在为移动边缘环境中的未来QoS预测提供更准确、可靠的数据支持。

英文摘要

Quality of Service (QoS) is an important metric to measure the performance of network services. Nowadays, it is widely used in mobile edge environments to evaluate the quality of service when mobile devices request services from edge servers. QoS usually involves multiple dimensions, such as bandwidth, latency, jitter, and data packet loss rate. However, most existing QoS datasets, such as the common WS-Dream dataset, focus mainly on static QoS metrics of network services and ignore dynamic attributes such as time and geographic location. This means they do not record the mobile device's location at the time of the service request or the chronological order in which requests were made. Yet these dynamic attributes are crucial for understanding and predicting the actual performance of network services, as QoS performance typically fluctuates with time and geographic location. To this end, we propose a novel dataset that accurately records temporal and geographic location information on quality of service during the collection process, aiming to provide more accurate and reliable data to support future QoS prediction in mobile edge environments.

2410.00357 2026-05-27 cs.LG stat.ML Version update

Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study

Hao Liu, Zecheng Zhang, Wenjing Liao, Hayden Schaeffer

Affiliations: Department of Mathematics, Hong Kong Baptist University; Department of ACMS, University of Notre Dame; School of Mathematics, Georgia Institute of Technology; Department of Mathematics, UCLA

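AI summary: By analyzing the approximation and generalization errors of deep operator networks, this paper establishes a theoretical framework that quantifies neural scaling laws, relates these errors to network model size and training data size, and extends the results to deep ReLU networks.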

Abstract

Neural scaling laws play a pivotal role in the performance of deep neural networks and have been observed in a wide range of tasks. However, a complete theoretical framework for understanding these scaling laws remains underdeveloped. In this paper, we explore the neural scaling laws for deep operator networks, which involve learning mappings between function spaces, with a focus on the Chen and Chen style architecture. These approaches, which include the popular Deep Operator Network (DeepONet), approximate the output functions using a linear combination of learnable basis functions and coefficients that depend on the input functions. We establish a theoretical framework to quantify the neural scaling laws by analyzing their approximation and generalization errors. We articulate the relationship between the approximation and generalization errors of deep operator networks and key factors such as network model size and training data size. Moreover, we address cases where input functions exhibit low-dimensional structures, allowing us to derive tighter error bounds. These results also hold for deep ReLU networks and other similar structures. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
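
The "linear combination of learnable basis functions and coefficients that depend on the input functions" can be illustrated with a small sketch. This is not the authors' code: the layer sizes, the random untrained weights, and the names `build_operator_net` and `evaluate` are assumptions chosen only to show the structure, in which one network maps samples of the input function to coefficients and a second network evaluates basis functions at a query point.

```python
import random


def relu(value):
    return value if value > 0.0 else 0.0


def make_layer(rng, n_in, n_out):
    # Randomly initialised dense layer (untrained; for illustration only).
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def mlp(layers, x):
    # ReLU on every layer except the last one.
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [relu(v) for v in x]
    return x


def build_operator_net(num_sensors, num_basis, hidden=8, seed=0):
    rng = random.Random(seed)
    # "Branch" net: input function sampled at sensors -> coefficients.
    branch = [make_layer(rng, num_sensors, hidden), make_layer(rng, hidden, num_basis)]
    # "Trunk" net: query location y -> values of the learnable basis functions.
    trunk = [make_layer(rng, 1, hidden), make_layer(rng, hidden, num_basis)]
    return branch, trunk


def evaluate(net, u_at_sensors, y):
    branch, trunk = net
    coefficients = mlp(branch, u_at_sensors)
    basis_values = mlp(trunk, [y])
    # Output function value at y: sum_k coefficient_k(u) * basis_k(y).
    return sum(c * t for c, t in zip(coefficients, basis_values))


net = build_operator_net(num_sensors=4, num_basis=3)
value = evaluate(net, [0.0, 0.5, 1.0, 0.5], y=0.25)
```

In this picture, the number of basis functions and the widths of the two networks are the "model size" knobs whose effect on approximation and generalization error the abstract refers to.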

2408.05560 2026-05-27 cs.LG math.OC stat.ML Version update

Incremental Gauss-Newton Descent for Machine Learning

Mikalai Korbit, Mario Zanon

Affiliations: IMT School for Advanced Studies Lucca

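AI summary: For scalar-output losses evaluated one sample at a time, the paper proposes Incremental Gauss-Newton Descent (IGND), which normalizes the stochastic gradient by a closed-form scalar to obtain an efficient update that needs no curvature matrix storage or linear solve, and proves a stationarity result for it.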

Abstract

Stochastic gradient updates are widely used for their efficiency and scalability, but their effective step sizes can depend strongly on feature scaling and local model sensitivity. Gauss-Newton methods address such scale effects through curvature information, but in their standard mini-batch form they require matrix-vector products, linear solves, or structured approximations. This paper studies the special case of scalar-output losses evaluated one sample at a time. In this setting, the generalized Gauss-Newton matrix has rank at most one, and its only possible nonzero curvature direction is aligned with the stochastic gradient. As a result, the damped Gauss-Newton direction reduces to a closed-form scalar normalization of the sample gradient. The resulting update, Incremental Gauss-Newton Descent (IGND), requires no curvature matrix storage, factorization, or iterative linear solve. We derive the update, characterize its behavior, and relate it to normalized gradient descent, adaptive first-order methods, stochastic Polyak step sizes, and mini-batch Gauss-Newton updates. Under explicit smoothness, alignment, and stochastic approximation assumptions, we prove a stationarity result for the IGND update. Experiments on supervised learning, a controlled test of scale robustness, and a linear-quadratic control case study show that IGND improves robustness to sensitivity scaling and can be competitive with, or complementary to, common stochastic optimizers while retaining a simple incremental update.
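
To see why a rank-one Gauss-Newton matrix leads to a scalar normalization, consider a minimal sketch under stated assumptions: a linear model f(w) = w·x with squared loss ½(f − y)². The Gauss-Newton matrix for one sample is then ∇f ∇fᵀ, and because ∇f is an eigenvector of (∇f ∇fᵀ + λI), the damped direction equals (f − y) ∇f / (λ + ‖∇f‖²). The function name `ignd_step` and the parameters `lr` and `damping` are illustrative and not taken from the paper, whose exact update may differ in details.

```python
def ignd_step(weights, features, target, lr=1.0, damping=1e-8):
    """One damped Gauss-Newton step for a single sample of a linear model.

    Assumes f(w) = w . x and loss 0.5 * (f - y)^2, so grad f = x.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    residual = prediction - target
    grad_f = features  # gradient of the scalar output w.r.t. the weights
    squared_norm = sum(g * g for g in grad_f)
    # Closed-form scalar normalisation: no matrix is stored or inverted.
    scale = lr * residual / (damping + squared_norm)
    return [w - scale * g for w, g in zip(weights, grad_f)]


# The step size adapts to feature scale: both calls fit the sample exactly.
w_small = ignd_step([0.0, 0.0], [3.0, 4.0], 10.0, lr=1.0, damping=0.0)
w_large = ignd_step([0.0, 0.0], [300.0, 400.0], 10.0, lr=1.0, damping=0.0)
```

With `lr=1.0` and no damping, a single step drives the residual on that sample to zero regardless of how the features are scaled, which is the kind of scale robustness the abstract describes. A plain gradient step with a fixed learning rate would not have this property.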

2306.09344 2026-05-27 cs.CV cs.LG Version update

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

Stephanie Fu, Netanel Tamir, Shobhita Sundaram, Lucy Chai, Richard Zhang, Tali Dekel, Phillip Isola

Affiliations: MIT; Weizmann Institute of Science; Adobe Research

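AI summary: This paper proposes the DreamSim metric, trained on synthetic data, which aligns with human perception at the mid-to-high level of image layout, object pose, and semantic content, and outperforms existing metrics on retrieval and reconstruction tasks.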

Comments Website: https://dreamsim-nights.github.io/ Code: https://github.com/ssundaram21/dreamsim

DOI: 10.5555/3666122.3668330
Journal ref: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

Abstract

Current perceptual similarity metrics operate at the level of pixels and patches. These metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level similarities and differences in image layout, object pose, and semantic content. In this paper, we develop a perceptual metric that assesses images holistically. Our first step is to collect a new dataset of human similarity judgments over image pairs that are alike in diverse ways. Critical to this dataset is that judgments are nearly automatic and shared by all observers. To achieve this we use recent text-to-image models to create synthetic pairs that are perturbed along various dimensions. We observe that popular perceptual metrics fall short of explaining our new data, and we introduce a new metric, DreamSim, tuned to better align with human perception. We analyze how our metric is affected by different visual attributes, and find that it focuses heavily on foreground objects and semantic content while also being sensitive to color and layout. Notably, despite being trained on synthetic data, our metric generalizes to real images, giving strong results on retrieval and reconstruction tasks. Furthermore, our metric outperforms both prior learned metrics and recent large vision models on these tasks.

2210.02573 2026-05-27 cs.LG Version update

Efficient Learning of Mesh-Based Physical Simulation with BSMS-GNN

Yadi Cao, Menglei Chai, Minchen Li, Chenfanfu Jiang

Affiliations: Department of Computer Science, UCLA, Los Angeles, USA; AR Perception, Google, Los Angeles, USA; Department of Mathematics, UCLA, Los Angeles, USA

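AI summary: To address the scaling complexity and over-smoothing of graph neural networks in large-scale mesh-based physical simulation, the paper proposes BSMS-GNN, built on a bi-stride pooling strategy inspired by bipartite graph determination. It needs no manually drawn coarse meshes, avoids wrong edges across geometry boundaries, and significantly improves accuracy and computational efficiency.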

Comments Update summary: added the missing remark for Yadi and Menglei (* work partially done while they were at Snap Inc.)

Abstract

Learning physical simulation on large-scale meshes with flat Graph Neural Networks (GNNs) and stacked Message Passing (MP) layers is challenging due to over-smoothing and to complexity that scales with the number of nodes. There has been growing interest in the community in introducing multi-scale structures to GNNs for physical simulation. However, current state-of-the-art methods are limited by their reliance on the labor-intensive drawing of coarser meshes or on building coarser levels based on spatial proximity, which can introduce wrong edges across geometry boundaries. Inspired by bipartite graph determination, we propose a novel pooling strategy, bi-stride, to tackle the aforementioned limitations. Bi-stride pools nodes on every other frontier of the breadth-first search (BFS), without the need for manually drawn coarser meshes and avoiding the wrong edges introduced by spatial proximity. Additionally, it enables a one-MP scheme per level and non-parametrized pooling and unpooling by interpolation, resembling U-Nets, which significantly reduces computational costs. Experiments show that the proposed framework, BSMS-GNN, significantly outperforms existing methods in terms of both accuracy and computational efficiency in representative physical simulations.
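
The node-selection part of pooling "on every other frontier of the breadth-first search" can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a connected mesh graph given as an adjacency dictionary and an arbitrarily chosen seed node, and it only selects the retained nodes. How the paper picks seeds, rebuilds edges between retained nodes, or interpolates features is not shown here.

```python
from collections import deque


def bfs_depths(adjacency, seed):
    """Return the BFS depth (frontier index) of every node reachable from seed."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


def bi_stride_pool(adjacency, seed):
    """Keep the nodes lying on even-numbered BFS frontiers (every other frontier)."""
    depths = bfs_depths(adjacency, seed)
    return sorted(node for node, depth in depths.items() if depth % 2 == 0)


# A path graph 0-1-2-3-4: frontiers 0, 2 and 4 are kept.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
kept = bi_stride_pool(path, seed=0)  # [0, 2, 4]
```

Because the selection depends only on graph connectivity, two nodes that are close in space but far apart along the mesh never end up adjacent merely through proximity, which is the failure mode of proximity-based coarsening that the abstract mentions.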

2302.13473 2026-05-27 cs.LG Version update

Towards Interpretable Federated Learning

Anran Li, Rui Liu, Ming Hu, Yuanyuan Chen, Shipeng Wang, Lizhen Cui, Han Yu

Affiliations: Department of Biomedical Informatics and Data Science, School of Medicine at Yale University; School of Computer Science and Engineering, Nanyang Technological University; School of Software, Shandong University; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University

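AI summary: This paper presents the first survey of interpretable federated learning (IFL), proposing a unique taxonomy that covers model explanation, debugging, and data contribution evaluation, and analyzing representative methods, evaluation metrics, and future directions.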

Comments Survey of interpretable federated learning

Abstract

Federated learning (FL) enables multiple data owners to build machine learning models collaboratively without exposing their private local data. For FL to achieve widespread adoption, it is important to balance the need for performance, privacy preservation, and interpretability, especially in mission-critical applications such as finance and healthcare. Thus, interpretable federated learning (IFL) has become an emerging research topic attracting significant interest from academia and industry alike. Its interdisciplinary nature can make it challenging for new researchers to pick up. In this paper, we bridge this gap by providing (to the best of our knowledge) the first survey on IFL. We propose a unique IFL taxonomy covering relevant works that enable FL models to explain prediction results, support model debugging, and provide insights into the contributions made by individual data owners or data samples, which, in turn, is crucial for allocating rewards fairly to motivate active and reliable participation in FL. We conduct a comprehensive analysis of the representative IFL approaches, the commonly adopted performance evaluation metrics, and promising directions towards building versatile IFL techniques.

2009.11997 2026-05-27 cs.LG cs.AI cs.RO Version update

Continual Model-Based Reinforcement Learning with Hypernetworks

Yizhou Huang, Kevin Xie, Homanga Bharadhwaj, Florian Shkurti

Affiliations: Division of Engineering Science, University of Toronto, Canada; Department of Computer Science, University of Toronto, Canada

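AI summary: Proposes HyperCRL, which uses task-conditional hypernetworks to continually learn dynamics models over a sequence of tasks, avoiding retraining from scratch and keeping storage overhead fixed. On robot locomotion and manipulation tasks it outperforms existing fixed-capacity continual learning methods.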

Comments Updated link to project website in the abstract. 7 pages (+2 pages in appendix), 8 figures. In proceedings of the 2021 IEEE International Conference on Robotics and Automation

Abstract

Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model - and the pause required between plan executions - grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website with videos is at https://rvl.cs.toronto.edu/blog/hypercrl
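
The idea of a task-conditional hypernetwork can be sketched in a few lines: a fixed-size network maps a per-task embedding to the weights of the dynamics model, so each new task adds only a small embedding. Everything below is an illustrative assumption rather than the HyperCRL implementation: the hypernetwork is a single linear map, the dynamics model is a linear residual model, the weights are random and untrained, and the function names are hypothetical.

```python
import random


def make_hypernetwork(embed_dim, state_dim, action_dim, seed=0):
    """Fixed-capacity linear hypernetwork: one row per generated target weight."""
    rng = random.Random(seed)
    num_target_weights = state_dim * (state_dim + action_dim)
    return [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
            for _ in range(num_target_weights)]


def generate_dynamics_weights(hypernetwork, task_embedding, state_dim, action_dim):
    """Map a task embedding to the weight matrix of that task's dynamics model."""
    flat = [sum(h * e for h, e in zip(row, task_embedding)) for row in hypernetwork]
    row_length = state_dim + action_dim
    return [flat[i * row_length:(i + 1) * row_length] for i in range(state_dim)]


def predict_next_state(weights, state, action):
    """Residual linear dynamics: next_state = state + W @ [state, action]."""
    inputs = list(state) + list(action)
    return [s + sum(w * v for w, v in zip(row, inputs))
            for s, row in zip(state, weights)]


hyper = make_hypernetwork(embed_dim=2, state_dim=3, action_dim=1)
# Each task owns only a small embedding; the hypernetwork itself is shared.
weights_task_a = generate_dynamics_weights(hyper, [1.0, 0.0], 3, 1)
weights_task_b = generate_dynamics_weights(hyper, [0.0, 1.0], 3, 1)
next_state = predict_next_state(weights_task_a, [0.1, 0.2, 0.3], [0.5])
```

In a continual-learning setting, the shared hypernetwork and the new task's embedding would be trained on recent transitions while a regularizer keeps the weights generated for earlier embeddings close to their previous values. That training loop is omitted here.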

1909.08210 2026-05-27 cs.LG stat.ML Version update

Reformulation of RBM to Unify Linear and Nonlinear Dimensionality Reduction

Jiangsheng You, Chun-Yen Liu

Affiliations: Aspen Technology Inc

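AI summary: Using maximum a posteriori estimation and the expectation maximization algorithm, this paper reformulates the restricted Boltzmann machine as a deterministic model, proposes a contrastive divergence algorithm without MCMC, and unifies linear and nonlinear dimensionality reduction for scalar and vector variables.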

Comments 16 pages with 7 figures

Abstract

A restricted Boltzmann machine (RBM) is a two-layer neural network with shared weights that has been extensively studied in the literature for dimensionality reduction, data representation, and recommendation systems. The traditional RBM requires a probabilistic interpretation of the values on both layers and a Markov chain Monte Carlo (MCMC) procedure to generate samples during training. The contrastive divergence (CD) algorithm trains the RBM efficiently, but its convergence has not been proved mathematically. In this paper, using a maximum a posteriori (MAP) estimate and the expectation maximization (EM) algorithm, we show that the CD algorithm without MCMC is convergent for the conditional likelihood objective function. Another key contribution of this paper is the reformulation of the RBM into a deterministic model. Within the reformulated RBM, the CD algorithm without MCMC approximates the gradient descent (GD) method. The reformulated RBM can take continuous scalar and vector variables on the nodes, with flexibility in choosing the activation functions. Numerical experiments show its capability in both linear and nonlinear dimensionality reduction, and, for nonlinear dimensionality reduction, the reformulated RBM can outperform principal component analysis (PCA) when proper activation functions are chosen. Finally, we demonstrate its application to vector-valued nodes for the CIFAR-10 dataset (color images) and multivariate sequence data, which cannot be configured naturally with the traditional RBM. This work not only provides theoretical insights regarding the traditional RBM but also unifies linear and nonlinear dimensionality reduction for scalar and vector variables.
