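arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.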

2509.24467 2026-06-09 cs.LG stat.ML updated version

Interpretable Self-Supervised Learning via Representer Landmarks and Nyström Approximation

Maedeh Zarvandi, Michael Timothy, Theresa Wasserer, Debarghya Ghoshdastidar

Affiliations: Munich Center for Machine Learning (MCML); Technical University of Munich, TUM School of Computation, Information and Technology

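AI summary: Proposes the KREPES framework, which uses Representer Landmarks and the Nyström approximation to interpret the representations learned by self-supervised objectives (SimCLR, BYOL, VICReg), and introduces new metrics to quantify transparency.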

Comments 24 pages, 10 figures. Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Self-supervised learning (SSL) learns representations from massive unlabeled data, yet the resulting models typically operate as black boxes, necessitating domain-specific explanations. We introduce KREPES, a unified framework to analytically interpret the learned representations of SSL objectives, including SimCLR, BYOL, and VICReg. By bridging empirical neural tangent kernel approximations of neural networks with the Representer Theorem for kernels, we express the learned latent space directly via "Representer Landmarks", which are the representations of influential unlabeled training examples. We introduce novel metrics, "Sample-Specific Influence Score", "Concept-Conditioned Influence Score" and "Feature Alignment Gap", to quantify the transparency of the learned representations. KREPES enables direct audit of the latent space without supervision, for example, revealing an algorithmic bias in the Adult-1M dataset where SSL uses demographic proxies for income. Finally, to ensure scalability to benchmarks with 1M+ samples (ImageNet-1K, Adult-1M), KREPES introduces a novel Nyström approximation-based analytical inference framework for SSL objectives.
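As a rough illustration of the Nyström idea mentioned in the abstract, the sketch below approximates a full kernel matrix from a few landmark points. It is the generic textbook construction in plain Python, not the KREPES inference procedure; the RBF kernel, the `gamma` value, the function names, and the choice of landmarks are all assumptions made for this example.

```python
import math


def rbf(x, y, gamma=0.5):
    """RBF kernel on scalars (kernel choice is an assumption for this sketch)."""
    return math.exp(-gamma * (x - y) ** 2)


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is m x m and assumed invertible; B is m x n.
    """
    m = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[m:] for row in M]


def nystrom(points, landmark_idx, kernel=rbf):
    """Approximate the n x n kernel matrix as C W^{-1} C^T.

    C holds kernel values between all points and the landmarks (n x m);
    W holds kernel values among the landmarks (m x m).
    """
    C = [[kernel(x, points[j]) for j in landmark_idx] for x in points]
    W = [[kernel(points[i], points[j]) for j in landmark_idx] for i in landmark_idx]
    Ct = [list(col) for col in zip(*C)]
    Y = solve(W, Ct)  # Y = W^{-1} C^T, shape m x n
    n, m = len(points), len(landmark_idx)
    return [[sum(C[i][a] * Y[a][j] for a in range(m)) for j in range(n)]
            for i in range(n)]


points = [0.0, 0.5, 1.0, 1.5, 2.0]
approx = nystrom(points, landmark_idx=[0, 2, 4])
```

Rows and columns that belong to landmark points are reproduced up to rounding error, while the remaining entries are only approximated; the cost is driven by the number of landmarks rather than by the full dataset size.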

2509.25004 2026-06-09 cs.AI updated version

CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning

Shijie Zhang, Zheng Xiao, Shiyu Liu, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo, Shaoyu Liu, Wangxiao Zhao, Guanjun Jiang

Affiliations: Peking University; Qwen Applications Business Group, Alibaba Group; Xiamen University

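AI summary: Proposes the CLPO framework, which uses on-policy rollout accuracy to adjust problem difficulty dynamically so that the curriculum co-evolves with the policy; it significantly outperforms GRPO and DAPO on mathematical and general reasoning benchmarks.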

Abstract

Online reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning abilities of large language models, but most methods still optimize reasoning trajectories over a static problem set, wasting rollout budget on solved or overly difficult problems. We propose CLPO (Curriculum Learning meets Policy Optimization), a self-evolving curriculum framework that uses on-policy rollout accuracy to identify solved, medium-difficulty, and hard problems, then restructures selected tasks according to the model's current capability. Hard problems are simplified to become learnable, while medium-difficulty problems are diversified to provide useful training variation. This allows the learning curriculum to co-evolve with the policy rather than remaining fixed as the model's capability boundary shifts. Rather than treating these rewrites as static data augmentation, CLPO optimizes restructuring trajectories with credit assigned by the downstream accuracy gain of the rewritten problem, requiring no additional human annotations beyond the original verifiable answers. Experiments across mathematical reasoning and out-of-domain general reasoning benchmarks show that CLPO substantially outperforms GRPO and DAPO on Qwen3-8B by 10.21 and 7.75 average points, respectively. Ablation studies on math and code domains further show that both the restructuring mode and the rewriting loss contribute to the final gains, demonstrating that CLPO provides a scalable and robust pathway for eliciting stronger reasoning capabilities through a self-evolving curriculum.
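The difficulty-bucketing step described in the abstract can be pictured with a small sketch. The abstract does not state the thresholds or what happens to solved problems, so the cutoffs `hard_below` and `solved_above`, the "skip" action, and both function names are hypothetical choices for illustration only.

```python
def rollout_accuracy(outcomes):
    """Fraction of correct rollouts; `outcomes` is a non-empty list of booleans."""
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def plan_restructuring(outcomes, hard_below=0.2, solved_above=0.8):
    """Map on-policy rollout results for one problem to (bucket, action).

    Thresholds are assumed values, not taken from the paper.
    """
    acc = rollout_accuracy(outcomes)
    if acc > solved_above:
        return "solved", "skip"        # assumed: spend no further rollouts here
    if acc < hard_below:
        return "hard", "simplify"      # hard problems are simplified
    return "medium", "diversify"       # medium problems are diversified


print(plan_restructuring([True, False, True, False]))
```

Because the buckets are recomputed from fresh rollouts, a problem can move from "hard" to "medium" to "solved" as the policy improves, which is the sense in which the curriculum follows the model.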

2509.24531 2026-06-09 cs.CV updated version

Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis

Kaizhen Zhu, Mokai Pan, Zhechuan Yu, Jingya Wang, Jingyi Yu, Ye Shi

Affiliations: University of Science and Technology of China

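AI summary: Unifies Diffusion Bridge and Flow Matching through stochastic optimal control and optimal transport theory, proves that Diffusion Bridge has a lower cost and more stable trajectories, and validates the theory experimentally on tasks such as image restoration.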

Abstract

Diffusion Bridge and Flow Matching have both demonstrated compelling empirical performance in transformation between arbitrary distributions. However, there remains confusion about which approach is generally preferable, and the substantial discrepancies in their modeling assumptions and practical implementations have hindered a unified theoretical account of their relative merits. We have, for the first time, provided a unified theoretical and experimental validation of these two models. We recast their frameworks through the lens of Stochastic Optimal Control and prove that the cost function of the Diffusion Bridge is lower, guiding the system toward more stable and natural trajectories. Simultaneously, from the perspective of Optimal Transport, interpolation coefficients $t$ and $1-t$ of Flow Matching become increasingly ineffective when the training data size is reduced. To corroborate these theoretical claims, we propose a novel, powerful architecture for Diffusion Bridge built on a latent Transformer, and implement a Flow Matching model with the same structure to enable a fair performance comparison in various experiments. Comprehensive experiments are conducted across Image Restoration, Translation, and Style Transfer tasks, systematically varying both the distributional discrepancy (different difficulty) and the training data size. Extensive empirical results align perfectly with our theoretical predictions and allow us to delineate the respective advantages and disadvantages of these two models. Our code is available at https://github.com/zhukaizhen/diffusion_bridge_flow_matching.
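For readers unfamiliar with the interpolation coefficients $t$ and $1-t$ mentioned in the abstract, the sketch below shows the usual linear interpolation path between a source sample and a target sample, together with its constant velocity. Which endpoint receives which coefficient is a convention assumed here, and the function names are illustrative rather than taken from the authors' code.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t = 0) to x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Time derivative of the linear path; constant in t."""
    return [b - a for a, b in zip(x0, x1)]


source = [0.0, 0.0]
target = [2.0, 4.0]
print(interpolate(source, target, 0.25))
print(velocity_target(source, target))
```

Under this convention a model is trained to regress `velocity_target` at points produced by `interpolate`, so the path for every pair is a straight line regardless of how the pairs are matched.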

2506.06891 2026-06-09 cs.LG cs.CR updated version

Robust In-Context Reinforcement Learning Under Reward Poisoning Attacks

Paulius Sasnauskas, Yiğit Yalın, Goran Radanović

Affiliations: Department of Computing Science, University of Alberta, Edmonton, Canada; Alberta Machine Intelligence Institute (Amii), Edmonton, Canada

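AI summary: To counter reward poisoning attacks, proposes the adversarial training framework AT-DPT, which trains attackers and the DPT model simultaneously; it markedly improves the robustness of in-context reinforcement learning in bandit settings and generalizes to more complex scenarios such as MDPs.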

Comments ICML 2026, code available at https://github.com/PauliusSasnauskas/AT-DPT

Abstract

We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained DPT (AT-DPT). Our method simultaneously trains a population of attackers to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that AT-DPT significantly outperforms them in bandit settings under a learned attacker, and generalizes to more complex environments such as adaptive attackers and MDPs. It shows promise in ICRL as a meta-RL approach to learning effective corruption-robust algorithms.

2506.01052 2026-06-09 cs.LG math.OC stat.ML updated version

A Robust $\widetilde{\mathcal{O}}(1/\sqrt{T})$ Rate for Unprojected TD Learning with Linear Function Approximation

Wei-Cheng Lee, Francesco Orabona

Affiliations: University of California, Berkeley

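AI summary: For temporal difference learning with linear function approximation, proves an expected convergence rate of $\widetilde{\mathcal{O}}(\|\theta^*\|^2_2/\sqrt{T})$ without projection, requiring only a minor logarithmic correction to the learning rate and no additional regularity conditions.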

Abstract

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning. We are interested in the so-called "robust" setting, where the convergence guarantee does not depend on the potential function's minimal curvature. While prior work has established convergence guarantees in this setting, these results typically rely on the artificial assumption that each iterate is projected onto a bounded set. Removing such a condition was left as an open problem by Bhandari et al. (COLT'18), who hypothesized the need for additional "regularity conditions". In this paper, we show that the simple unprojected TD(0) converges with a rate of $\widetilde{\mathcal{O}}\left(\frac{\|\theta^*\|^2_2}{\sqrt{T}}\right)$ in expectation, even in the presence of Markovian noise. We do not require an additional regularity condition, but only a minor polylog correction to the learning rate. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
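To make "unprojected TD(0) with linear function approximation" concrete, here is a minimal sketch of the update rule with no projection step. The two-state deterministic chain, the one-hot features, the constant step size, and the function name are assumptions chosen so the result can be checked by hand; they are not the paper's setting or learning-rate schedule.

```python
def td0_linear(features, next_state, rewards, gamma, step_size, num_steps, start=0):
    """Run unprojected TD(0) with a linear value estimate V(s) = theta . phi(s)."""
    theta = [0.0] * len(features[0])
    s = start
    for _ in range(num_steps):
        s_next = next_state[s]
        v = sum(w * f for w, f in zip(theta, features[s]))
        v_next = sum(w * f for w, f in zip(theta, features[s_next]))
        td_error = rewards[s] + gamma * v_next - v
        # Plain semi-gradient step; the iterate is never projected onto a bounded set.
        theta = [w + step_size * td_error * f for w, f in zip(theta, features[s])]
        s = s_next
    return theta


theta = td0_linear(
    features=[[1.0, 0.0], [0.0, 1.0]],  # one-hot features (assumed)
    next_state=[1, 0],                  # deterministic cycle 0 -> 1 -> 0
    rewards=[1.0, 0.0],
    gamma=0.5,
    step_size=0.1,
    num_steps=4000,
)
print(theta)
```

For this toy chain the Bellman equations give $V(0) = 4/3$ and $V(1) = 2/3$, so the learned weights can be compared against those values directly.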

2505.11189 2026-06-09 cs.AI cs.LG updated version

Can Global XAI Methods Reveal Injected Behaviours in LLMs? SHAP vs Rule Extraction vs RuleSHAP

Francesco Sovrano

Affiliations: Collegium Helveticum at ETH Zurich; Università della Svizzera italiana

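AI summary: Maps global LLM beliefs to numerical scores via statistically validated abstractions and proposes RuleSHAP, an algorithm that combines global SHAP with rule induction to better capture non-univariate triggers, improving MRR@1 over RuleFit by 82% on average.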

Comments Accepted for publication at KDD'2026

DOI: 10.1145/3770855.3818093

Abstract

Large language models (LLMs) can amplify misinformation, undermining societal goals such as the UN SDGs. We study three documented drivers of misinformation (valence framing, information overload, and oversimplification) often shaped by default beliefs. Building on evidence that LLMs encode such defaults (e.g., "joy is positive", "math is complex") and can act as "bags of heuristics", we ask whether belief-driven heuristics behind misinformation-related behaviour can be recovered from black-box LLM behaviour as explicit rules. A key obstacle is that global rule-extraction methods in explainable AI (XAI) are built for numerical input-output data, not text. We address this by eliciting global LLM beliefs and mapping them to numerical scores via statistically validated abstractions, enabling off-the-shelf global XAI to detect belief-driven heuristics. For ground truth, we inject nonlinear behavioural triggers of increasing complexity (univariate, conjunctive, non-convex) into GPT-family and Llama models via system instructions. We find that RuleFit often misses non-univariate triggers, while global SHAP better ranks conjunctive trigger features but yields no symbolic rules. To bridge this gap, we propose RuleSHAP, a rule-extraction algorithm that couples global SHAP aggregates with rule induction to better capture non-univariate triggers, improving MRR@1 over RuleFit by +82% on average. Our results suggest a practical pathway for surfacing behavioural triggers in LLMs.

2509.17446 2026-06-09 cs.LG cs.AI updated version

MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion

Haofeng Huang, Yifei Han, Long Zhang, Bin Li, Yangfan He, Yaxin Xue

Affiliations: University of Shanghai for Science and Technology, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; University of Minnesota-Twin Cities, USA; University of Leeds, UK

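AI summary: Proposes MVCL-DAF++, which improves multimodal intent recognition on MIntRec and MIntRec2.0 through prototype-aware contrastive alignment and coarse-to-fine attention fusion, with particular gains on rare classes.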

Comments Accepted by Interspeech 2026

2509.17455 2026-06-09 cs.CL cs.AI updated version

Understanding Benchmark Language Under Weakened Formal Semantics

Haoyang Chen, Kumiko Tanaka-Ishii

Affiliations: Department of Computer Science and Engineering; School of Fundamental Science and Engineering; Waseda University

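AI summary: Proposes extracting computables, i.e., executable representations refined with external knowledge retrieval; the approach exceeds text-only reasoning and one-shot code execution on mathematical reasoning, multi-step reasoning, and other benchmarks, and provides scalable, inspectable semantic evidence.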

Comments Accepted to Transactions of the Association for Computational Linguistics (TACL). 29 pages, 5 figures

Abstract

State-of-the-art NLP benchmarks require interpretation of natural language that specifies conditions, procedures, and exceptions, often relying on implicit assumptions and external knowledge. Constructing complete semantic representations with proof-theoretic guarantees is frequently impractical at scale, and purely text-based reasoning offers limited means of inspection. This paper asks how much understanding of benchmark language can be achieved when formal semantic guarantees are weakened. We investigate this question by extracting computables: executable representations whose runtime behavior provides operational evidence of semantic adequacy, including executability, execution traces, and runtime failures. We induce and iteratively refine computables for benchmark instances using retrieval from external knowledge. Across mathematical reasoning, multi-step reasoning, causal inference, and rule- and exception-heavy legal and biomedical benchmarks, we find that the proposed approach consistently exceeds text-only reasoning and one-shot code execution. Beyond accuracy, our analyses show that these computables provide scalable, inspectable semantic evidence: they expose conditions and exceptions benchmark language forces into executable form, offering a practical bridge between proof-oriented semantics and purely textual reasoning.

2509.15494 2026-06-09 cs.LG physics.data-an updated version

Multi-resolution Enhancement for Full Spectrum Neural Representations

Yuan Ni, Zhantao Chen, Shizhou Xu, Cheng Peng, Rajan Plumley, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner

Affiliations: Linac Coherent Light Source, SLAC National Accelerator Laboratory; Stanford Institute for Materials and Energy Sciences, Stanford University; Walker Department of Mechanical Engineering, The University of Texas at Austin; Department of Mathematics, University of California Davis; Department of Physics, Carnegie Mellon University

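AI summary: Proposes the WIEN-INR framework, which distributes modeling across resolution scales with a hierarchical enhancement network, improving the ability of small networks to represent multi-scale structures and high-frequency detail and yielding compact, high-fidelity representations.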

Abstract

Scientific data acquisition continues to outpace storage and analysis capabilities, making voxel-based representations increasingly intractable. Implicit neural representations (INRs) offer a promising solution by encoding signals through coordinate-based neural networks, serving as surrogates of data, with computational and storage requirements scaling with network complexity rather than data dimensionality. However, smaller INRs struggle to faithfully represent the multi-scale structures, high-frequency information, and fine textures that constitute a large proportion of scientific measurements. We propose WIEN-INR, a theoretically-guided hierarchical INR framework that distributes modeling across resolution scales and enables improved representation capacity through a novel enhancement network to recover subtle details. This multi-scale architecture allows smaller networks to retain the full spectrum of information while preserving training efficiency and lowering storage cost. Evaluated on distinct raw experimental measurements across scales and complexities, WIEN-INR represents a practical step toward broader adoption of neural representations in scientific workflows, delivering compact, robust, and high-fidelity representations.

2509.14562 2026-06-09 cs.LG math.OC updated version

LiMuon: Light and Fast Muon Optimizer for Large Models

Feihu Huang, Yuning Luo, Songcan Chen

Affiliations: University of Science and Technology of China

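AI summary: Proposes the LiMuon optimizer, which combines momentum-based variance reduction with randomized SVD to reduce memory while achieving a lower sample complexity of $O(ε^{-3})$, and validates it on models such as Mamba-130M.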

Comments Published in ICML 2026

Abstract

Large models have recently been widely applied in machine learning, so their efficient training has received widespread attention. More recently, the useful Muon optimizer was designed specifically for the matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory usage for large models. To fill this gap, we propose a light and fast Muon (LiMuon) optimizer for training large models, which builds on a momentum-based variance-reduction technique and randomized Singular Value Decomposition (SVD). In particular, our LiMuon simultaneously has lower memory usage and lower sample complexity than Muon and its variants. Moreover, we prove that our LiMuon with lower memory has a lower sample complexity of $O(ε^{-3})$ for finding an $ε$-stationary solution of non-convex stochastic optimization under the generalized smoothness condition. To further narrow the gap between theory and practice, we also prove that our LiMuon with Newton-Schulz steps has a lower sample complexity than Muon with Newton-Schulz steps. Numerical experimental results on training Mamba-130M, Qwen2.5-0.5B and ViT models demonstrate the effectiveness of our LiMuon.
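The "Newton-Schulz steps" mentioned in the abstract refer to an iterative way of orthogonalizing a matrix without computing an SVD. The sketch below uses the classical cubic Newton-Schulz iteration on a small matrix in plain Python; the coefficients, step count, Frobenius normalization, and helper names are assumptions for illustration and may differ from what Muon or LiMuon actually use.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_schulz_orthogonalize(G, steps=20):
    """Approximate the orthogonal polar factor of G.

    Iterates X <- 1.5 X - 0.5 X X^T X after scaling G by its Frobenius norm,
    which keeps every singular value inside the convergence region.
    Assumes G is non-zero and full rank.
    """
    fro = math.sqrt(sum(v * v for row in G for v in row))
    X = [[v / fro for v in row] for row in G]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


Q = newton_schulz_orthogonalize([[0.0, 2.0], [1.0, 0.0]])
print(Q)
```

The iteration pushes every singular value of the input toward 1 while leaving the singular vectors unchanged, so only matrix multiplications are needed.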

2509.15017 2026-06-09 cs.CV updated version

No Modality Left Behind: Adapting to Missing Modalities via Knowledge Distillation for Brain Tumor Segmentation

Shenghao Zhu, Yifei Chen, Weihong Chen, Shuo Jiang, Guanyu Zhou, Yuanhan Wang, Feiwei Qin, Changmiao Wang, Qiyuan Tian

Affiliations: Medical Image Analysis

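AI summary: Proposes the AdaMM framework, which uses knowledge distillation and three synergistic modules to handle missing modalities in multi-modal MRI, significantly improving segmentation accuracy and robustness on multiple datasets.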

Comments 51 pages, 11 figures

DOI: 10.1016/j.media.2026.104108

Abstract

Accurate brain tumor segmentation is essential for preoperative evaluation and personalized treatment. Multi-modal MRI is widely used due to its ability to capture complementary tumor features across different sequences. However, in clinical practice, missing modalities are common, limiting the robustness and generalizability of existing deep learning methods that rely on complete inputs, especially under non-dominant modality combinations. To address this, we propose AdaMM, a multi-modal brain tumor segmentation framework tailored for missing-modality scenarios, centered on knowledge distillation and composed of three synergistic modules. The Graph-guided Adaptive Refinement Module explicitly models semantic associations between generalizable and modality-specific features, enhancing adaptability to modality absence. The Bi-Bottleneck Distillation Module transfers structural and textural knowledge from teacher to student models via global style matching and adversarial feature alignment. The Lesion-Presence-Guided Reliability Module predicts prior probabilities of lesion types through an auxiliary classification task, effectively suppressing false positives under incomplete inputs. Extensive experiments on the Pretreat-MetsToBrain-Masks and BraTS 2018, 2024 datasets demonstrate that AdaMM consistently outperforms existing methods, exhibiting superior segmentation accuracy and robustness, particularly in single-modality and weak-modality configurations. In addition, we conduct a systematic evaluation of six categories of missing-modality strategies, supporting the superiority of knowledge distillation and offering practical guidance for method selection and future research. Our source code is available at https://github.com/Quanato607/AdaMM.

2509.10334 2026-06-09 cs.CV cs.AI cs.LG updated version

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation

Jordan Sassoon, Michal Szczepanski, Martyna Poreba

Affiliations: CEA, France

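AI summary: Proposes I-Segmenter, the first fully integer-only ViT segmentation framework; by replacing floating-point operations with integer ones, introducing the λ-ShiftGELU activation, and modifying the decoder, it significantly reduces model size and inference latency while largely preserving accuracy.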

Comments Accepted by the Journal of Systems Architecture

Abstract

Vision Transformers (ViTs) have recently achieved strong results in semantic segmentation, yet their deployment on resource-constrained devices remains limited due to their high memory footprint and computational cost. Quantization offers an effective strategy to improve efficiency, but ViT-based segmentation models are notoriously fragile under low precision, as quantization errors accumulate across deep encoder-decoder pipelines. We introduce I-Segmenter, the first fully integer-only ViT segmentation framework. Building on the Segmenter architecture, I-Segmenter systematically replaces floating-point operations with integer-only counterparts. To further stabilize both training and inference, we propose $λ$-ShiftGELU, a novel activation function that mitigates the limitations of uniform quantization in handling long-tailed activation distributions. In addition, we remove the L2 normalization layer and replace bilinear interpolation in the decoder with nearest neighbor upsampling, ensuring integer-only execution throughout the computational graph. Extensive experiments show that I-Segmenter achieves accuracy within a reasonable margin of its FP32 baseline (5.1 % on average), while reducing model size by up to 3.8x and enabling up to 1.2x faster inference with optimized runtimes. Notably, even in one-shot PTQ with a single calibration image, I-Segmenter delivers competitive accuracy, underscoring its practicality for real-world deployment.
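One detail in the abstract, swapping bilinear interpolation for nearest neighbor upsampling, is easy to see in code: nearest neighbor needs only integer index arithmetic, whereas bilinear interpolation requires fractional weights. The sketch below assumes an integer scale factor and a 2D grid of integer values; the function name is illustrative and the code is not taken from I-Segmenter.

```python
def nearest_neighbor_upsample(grid, factor):
    """Upsample a 2D grid by an integer factor using only integer indexing."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[r // factor][c // factor] for c in range(width * factor)]
        for r in range(height * factor)
    ]


small = [[1, 2],
         [3, 4]]
for row in nearest_neighbor_upsample(small, 2):
    print(row)
```

Every output value is a copy of an input value, so quantized integers stay within the same range and no rescaling is introduced by the upsampling step.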

2509.09151 2026-06-09 cs.CV cs.AI cs.LG updated version

Video Understanding by Design: How Datasets Shape Video Models

Lei Wang, Syuan-Hao Li, Piotr Koniusz, Yongsheng Gao

Affiliations: School of Engineering and Built Environment, Electrical and Electronic Engineering, Griffith University; School of Computer Science and Engineering, University of New South Wales

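AI summary: Taking a dataset-centric view, proposes a unified framework connecting dataset structure, inductive biases, and architectural design, analyzes how dataset characteristics drive architectural innovation in video understanding, and discusses the representational biases induced by different data regimes.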

Comments Research report

Abstract

Research in video understanding has advanced rapidly, driven by increasingly diverse datasets and more powerful model architectures. While existing surveys typically organize progress by tasks, benchmarks, or model families, they provide limited insight into why particular architectures emerged and succeeded. In this survey, we argue that the evolution of video understanding is fundamentally shaped by dataset structure. We present a dataset-centric perspective that connects dataset structure, inductive biases, and architectural design within a unified framework. We show that different datasets require models to capture specific invariances and capabilities, such as robustness to viewpoint changes, sensitivity to temporal ordering, reasoning over long-range dependencies, relational interactions, and cross-modal alignment. These requirements naturally give rise to inductive biases, i.e., architectural assumptions that favor particular patterns of reasoning and generalization. From this perspective, milestone architectures, including two-stream networks, 3D CNNs, temporal models, transformers, graph-based methods, and multimodal foundation models, can be understood as architectural responses to the challenges posed by evolving datasets. Building on this framework, we systematically analyze how dataset characteristics have shaped architectural innovation across video understanding tasks and discuss the representational biases induced by different data regimes. By unifying datasets, inductive biases, and architectures into a coherent perspective, this survey offers both a retrospective explanation of the field's evolution and a forward-looking roadmap toward general-purpose video understanding systems. Code and dynamic video visualizations of dataset-induced biases are available at https://time.griffith.edu.au/paper-sites/video-understanding/.

2509.02167 2026-06-09 cs.SD updated version

AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition

Jing Wang, Maoxiang Wu, Jiayu Xiong, Jianlong Kwan, Jun Xue

Affiliations: arXiv

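AI summary: Proposes the AudioRWKV architecture, which uses a 2D depthwise separable convolution and a bidirectional WKV kernel to model global context while keeping linear complexity, addressing the high complexity of Transformers and the instability of Mamba.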

Comments 6 pages, 3 figures

Abstract

Recently, Transformers (e.g., Audio Spectrogram Transformers, AST) and state-space models (e.g., Audio Mamba, AuM) have achieved remarkable progress in audio modeling. However, the $O(L^2)$ computational complexity of the Transformer architecture hinders efficient long-sequence processing, while the Mamba architecture tends to become unstable when scaling parameters and data. To address these challenges, this paper proposes AudioRWKV (A-RWKV), a highly efficient and stable architecture for audio modeling. Specifically, we inherit the stable and efficient recurrent formulation of RWKV7 and replace its 1D token-shift operation with a 2D depthwise separable convolution to better capture local spectro-temporal patterns. Furthermore, we adapt the original causal WKV kernel into a bidirectional WKV kernel (Bi-WKV), enabling global context modeling over the entire audio sequence while maintaining linear computational complexity. Benefiting from the inherent stability of the RWKV7 foundation, A-RWKV scales seamlessly to larger model sizes. Experimental results demonstrate that, under the same linear-model regime, A-RWKV-S (22M) achieves performance parity with AuM-B (92M) while exhibiting more stable throughput than AST; for long-form audio (~5 minutes 28 seconds), WKV7 achieves up to a 13.3x speedup in processing.

2509.01916 2026-06-09 cs.LG updated version

Causal Representation Learning from Network Data

Jifan Zhang, Michelle M. Li, Elena Zheleva

Affiliations: Department of Statistics and Data Science, Northwestern University; Department of Biomedical Informatics, Harvard University; Department of Computer Science, University of Illinois Chicago

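AI summary: Proposes GraCE-VAE, which uses a graph neural network encoder to integrate biological network and pathway information and identifies the latent causal graph and intervention targets under soft interventions; experiments show that leveraging structured biological context improves prediction of interventional outcomes.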

Comments 19 pages, 8 figures

英文摘要

Causal disentanglement from soft interventions is identifiable under the assumptions of linear interventional faithfulness and availability of both observational and interventional data. Prior work has focused on unstructured observations without leveraging known relational context among measured entities. In many scientific applications, however, the measured variables come with an observed interaction network that provides structured context, such as protein-protein interactions and pathway-gene membership. We propose GraCE-VAE, a graph-aware causal discrepancy variational autoencoder that treats pathway-level information as an auxiliary view of the latent causal programs. The graph neural network encoder conditions on this auxiliary pathway view and the biological graph to improve amortized inference, while the causal decoder remains a latent SCM with soft interventions. Assuming samples are i.i.d. within each intervention regime, we show that GraCE-VAE inherits the identifiability guarantees of causal discrepancy VAEs and identifies the latent causal graph and intervention targets up to the standard equivalence class. Experiments on three CRISPR perturbation datasets demonstrate that leveraging structured biological context improves prediction of interventional outcomes, including unseen perturbation combinations.


2508.20734 2026-06-09 cs.CV 版本更新

CardioMorphNet: Cardiac Motion Prediction Using a Shape-Guided Bayesian Recurrent Deep Network

CardioMorphNet: 使用形状引导的贝叶斯循环深度网络进行心脏运动预测

Reza Akbari Movahed, Abuzar Rezaee, Arezoo Zakeri, Colin Berry, Edmond S. L. Ho, Ali Gooya

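Affiliations: University of California, San Diego; University of Texas at Austin

AI summary: Proposes CardioMorphNet, a 3D cardiac shape-guided deformable registration framework built on a recurrent variational autoencoder and a Bayesian formulation; it avoids intensity-based similarity losses by recursively registering segmentation maps, outperforms existing methods on cardiac motion estimation, and yields lower uncertainty.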

Comments Published in Medical Image Analysis. Updated to match the final published version

DOI: 10.1016/j.media.2026.104149
Journal ref: Medical Image Analysis, vol. 113, p. 104149, 2026

AI中文摘要

从电影心脏磁共振（CMR）图像中准确估计心脏运动对于评估心脏功能和检测其异常至关重要。现有方法通常难以准确捕捉心脏运动，因为它们依赖于基于强度的图像配准相似性损失，可能忽略心脏解剖区域。为了解决这个问题，我们提出了CardioMorphNet，一个用于使用短轴（SAX）CMR图像进行3D心脏形状引导可变形配准的循环贝叶斯深度学习框架。它采用循环变分自编码器来建模心脏周期中的时空依赖性，以及两个用于双心室分割和运动估计的后验模型。从贝叶斯公式导出的损失函数通过递归配准分割图来引导框架关注解剖区域，而不使用基于强度的图像配准相似性损失，同时利用顺序SAX体积和时空特征。贝叶斯建模还使得能够计算估计运动场的不确定性图。通过在UK Biobank和M&M数据集上验证，将扭曲的掩模形状与真实掩模进行比较，CardioMorphNet在心脏运动估计中表现出优越的性能，优于最先进的方法。不确定性评估表明，与其他基于概率的心脏配准方法相比，它在心脏区域估计的运动场上产生更低的不确定性值，表明其预测具有更高的置信度。此外，临床指标提取评估显示，CardioMorphNet比其他方法更准确地估计临床指标。

英文摘要

Accurate cardiac motion estimation from cine cardiac magnetic resonance (CMR) images is vital for assessing cardiac function and detecting its abnormalities. Existing methods often struggle to accurately capture heart motion because they rely on intensity-based image registration similarity losses that may overlook cardiac anatomical regions. To address this, we propose CardioMorphNet, a recurrent Bayesian deep learning framework for 3D cardiac shape-guided deformable registration using short-axis (SAX) CMR images. It employs a recurrent variational autoencoder to model spatio-temporal dependencies across the cardiac cycle, along with two posterior models for bi-ventricular segmentation and motion estimation. The derived loss function from the Bayesian formulation guides the framework to focus on anatomical regions by recursively registering segmentation maps without using intensity-based image registration similarity loss, while leveraging sequential SAX volumes and spatio-temporal features. The Bayesian modelling also enables the computation of uncertainty maps for the estimated motion fields. Validated on the UK Biobank and M&M datasets by comparing warped mask shapes with ground-truth masks, CardioMorphNet demonstrates superior performance in cardiac motion estimation, outperforming state-of-the-art methods. Uncertainty assessment shows that it also yields lower uncertainty values for estimated motion fields in the cardiac region compared with other probabilistic-based cardiac registration methods, indicating higher confidence in its predictions. In addition, the clinical indices extraction assessment shows that CardioMorphNet estimates the clinical indices more accurately than other approaches.


2503.18314 2026-06-09 cs.LG cs.AI cs.CV 版本更新

LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty

LoTUS：带有不确定性风味的大规模机器遗忘

Christoforos N. Spartalis, Theodoros Semertzidis, Petros Daras, Efstratios Gavves

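Affiliations: University of Amsterdam; Centre for Research & Technology Hellas; Archimedes/Athena RC

AI summary: Proposes LoTUS, which removes the influence of training samples by smoothing prediction probabilities up to an information-theoretic bound, avoiding retraining from scratch; it outperforms existing methods on Transformer and ResNet18 models and introduces the RF-JSD metric for practical evaluation.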

Comments Accepted as a main conference paper at CVPR 2025 (https://cvpr.thecvf.com/virtual/2025/poster/33292)

2508.06659 2026-06-09 cs.LG cs.AI 版本更新

In-Context Reinforcement Learning via Communicative World Models

通过通信世界模型进行上下文强化学习

Fernando Martinez-Lopez, Tao Li, Yingdong Lu, Juntao Chen

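Affiliations: Department of Computer and Information Sciences, Fordham University; Department of Systems Engineering, City University of Hong Kong; IBM Research

AI summary: Proposes the CORAL framework, which separates latent representation learning from control: an Information Agent is pre-trained as a world model and generates communicative messages, enabling a Control Agent to adapt zero-shot and learn more sample-efficiently.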

AI中文摘要

强化学习（RL）代理通常难以在不更新参数的情况下泛化到新任务和上下文，主要是因为它们学到的表示和策略过度拟合于训练环境的特定性。为了提升代理的上下文RL（ICRL）能力，本文将ICRL形式化为一个双代理涌现通信问题，并引入了CORAL（用于自适应RL的通信表示）框架，该框架通过功能性地分离潜在表示学习与控制来学习可迁移的通信上下文。在CORAL中，信息代理（IA）在多样化的任务分布上作为世界模型进行预训练。其目标不是直接最大化回报，而是进行世界建模并将其理解提炼为简洁的消息。涌现通信协议由一种新颖的因果影响损失塑造，该损失衡量消息对下一动作的影响。在部署期间，预训练的IA作为固定上下文提供者服务于新的控制代理（CA），后者通过解释提供的通信上下文来学习解决任务。我们的实验表明，这种方法使CA能够实现样本效率的显著提升，并在多样化的在线和离线环境中借助预训练的IA成功进行零样本适应，验证了学习可迁移通信表示的有效性。

英文摘要

Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by functionally separating latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not direct return maximization, but world modeling and distilling its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency and successfully perform zero-shot adaptation with the help of pre-trained IA in diverse online and offline environments, validating the efficacy of learning a transferable communicative representation.


2508.06336 2026-06-09 cs.LG cs.AI cs.HC cs.MA 版本更新

Unsupervised Partner Design Enables Robust Ad-hoc Teamwork

无监督伙伴设计实现鲁棒的临时团队协作

Constantin Ruhdorfer, Matteo Bortoletto, Victor Oei, Anna Penzkofer, Andreas Bulling

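Affiliations: University of Southampton

AI summary: Proposes Unsupervised Partner Design (UPD), which dynamically generates training partners and adaptively selects them using a learnability criterion, without a pretrained partner population or manual tuning; it achieves strong performance across multiple tasks and receives higher ratings in a human-computer interaction study.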

Comments 27 pages

2508.05950 2026-06-09 cs.CV cs.AI 版本更新

CLONE: A 3DGS-Based Closed-Loop Differentiable Optimization Framework for Single-Image Normal Estimation

CLONE: 基于3DGS的闭环可微优化框架用于单图像法线估计

Yanxing Liang, Yinghui Wang, Wei Li, Tao Yan, Jiaxing Shen

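Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; School of Data Science, Lingnan University, Hong Kong, China

AI summary: Proposes CLONE, which parameterizes the scene with 3D Gaussian Splatting and derives continuous, differentiable normals from covariance eigen-decomposition; combined with a differentiable illumination model and a one-step deterministic diffusion refinement network, all jointly optimized under a unified reprojection objective, it estimates geometrically consistent single-image normals without ground-truth normal supervision.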

AI中文摘要

我们提出CLONE，一个基于3DGS的闭环可微优化框架，用于单图像法线估计。核心思想是构建一个“图像-几何-图像”一致性循环，统一并联合约束两种范式的局限性：判别式方法依赖显式监督而缺乏跨域几何约束，生成式方法虽有强生成先验但缺乏稳定的可微优化路径。具体地，我们首先采用3D高斯泼溅显式参数化场景，并通过协方差特征分解导出连续可微的表面法线，为几何建模提供解析梯度路径。然后，我们引入一个带有可学习光调制核的可微光照模型，建立表面法线与图像辐射之间的连续映射，使重投影误差直接监督底层3D几何。此外，为补偿高斯表示在局部细节表达上的不足，我们设计了一个一步确定性扩散启发的精化网络，在保持端到端可微性的同时增强局部几何细节。引入跨域门控融合机制以协调全局几何一致性和局部细节重建。最后，所有组件在统一的重投影目标下联合优化，形成闭环且稳定的梯度传播路径。这使得无需真值法线监督即可有效约束多解空间并改善几何一致性。

英文摘要

We propose CLONE, a 3DGS-based Closed-Loop differentiable Optimization framework for single-image Normal Estimation. The core idea is to construct an "image-geometry-image" consistency loop that unifies and jointly constrains the limitations of both paradigms: the reliance on explicit supervision without cross-domain geometric constraints in discriminative methods, and the absence of stable differentiable optimization pathways in generative methods despite strong generative priors. Specifically, we first employ 3D Gaussian Splatting to explicitly parameterize the scene and derive continuous and differentiable surface normals via covariance eigen-decomposition, providing an analytical gradient pathway for geometric modeling. We then introduce a differentiable illumination model with a learnable light modulation kernel to establish a continuous mapping between surface normals and image radiance, enabling reprojection errors to directly supervise the underlying 3D geometry. Furthermore, to compensate for the limited local detail expressiveness of Gaussian representations, we design a one-step deterministic diffusion-inspired refinement network, which enhances local geometric details while preserving end-to-end differentiability. A cross-domain gating fusion mechanism is introduced to coordinate global geometric consistency and local detail reconstruction. Finally, all components are jointly optimized under a unified reprojection objective, forming a closed-loop and stable gradient propagation pathway. This enables effective constraint of the multi-solution space and improved geometric consistency without requiring ground-truth normal supervision.
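The abstract states that normals are derived from each Gaussian's covariance by eigen-decomposition. A minimal sketch of one common convention follows, assuming the normal is taken as the eigenvector of the smallest eigenvalue (the Gaussian's thinnest axis); the function name `gaussian_normal` is hypothetical and the paper's exact formulation may differ.

```python
import numpy as np

def gaussian_normal(covariance: np.ndarray) -> np.ndarray:
    """Unit eigenvector belonging to the smallest eigenvalue of a 3x3 covariance.

    Assumption for illustration: the thinnest axis of the Gaussian is used
    as the surface normal. The sign of the result is ambiguous.
    """
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    normal = eigenvectors[:, 0]
    return normal / np.linalg.norm(normal)

# A flat Gaussian: wide in x and y, thin in z
cov = np.diag([1.0, 0.5, 0.01])
n = gaussian_normal(cov)
print(n)  # a unit vector along the z axis, up to sign
```

Because the eigenvector sign is arbitrary, a practical pipeline would typically orient the normal toward the camera; that step is omitted here.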


2507.12612 2026-06-09 cs.LG cs.AI 版本更新

Learning Task Mixtures from Task Affinities: A Probabilistic Graphical Model for Supervised Fine-Tuning

从任务亲和度学习任务混合：用于监督微调的概率图模型

Prateek Chanda, Saral Sureka, Parth Pratim Chatterjee, Krishnateja Killamsetty, Nikhil Shivakumar Nayak, Ganesh Ramakrishnan

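Affiliations: IIT Bombay; IBM Research; Red Hat AI Innovation; MIT-IBM Watson AI Lab

AI summary: Proposes TaskPGM, a framework that learns continuous task mixtures through an energy-based model over tasks, using mutual information and behavioral divergences to capture inter-task relationships, balancing task coverage against redundancy and improving supervised fine-tuning of large language models.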

Comments 9, 8 tables, 7 figures

AI中文摘要

大语言模型的监督微调性能在很大程度上取决于训练预算如何分配到异质任务集上。在实践中，通常使用简单的启发式方法（例如均匀或按比例采样）来固定混合，但这些方法忽略了任务之间的相互作用，可能损害迁移并浪费在冗余来源上的预算。我们引入TaskPGM，一种通过基于能量的任务模型学习连续任务混合的框架。任务形成马尔可夫随机场的节点：单变量势能捕捉单个任务的效用，而双变量势能使用从单任务微调模型的预测分布中计算的行为分歧（如Jensen-Shannon分歧和点互信息）来编码任务间的关系。优化此目标会产生在覆盖和冗余之间取得平衡的混合。我们显示，所得到的集合函数在预算约束下是弱子模的，这使得离散选择变体能够获得近似保证。在多个模型家族（LLaMA-7B，Qwen2-7B）和评估套件（BIG-Bench Hard）上，TaskPGM在标准混合策略之上取得改进，并提供了任务间关系的可解释结构。

英文摘要

Supervised fine-tuning performance for large language models depends strongly on how training budget is distributed across a heterogeneous set of tasks. In practice, mixtures are often fixed using simple heuristics (e.g., uniform or size-proportional sampling) that ignore task interactions, which can hurt transfer and waste budget on redundant sources. We introduce TaskPGM, a framework for learning continuous task mixtures via an energy-based model over tasks. Tasks form the nodes of a Markov random field: unary potentials capture per-task utility, and pairwise potentials encode inter-task relationships using behavioral divergences computed from predictive distributions of single-task fine-tuned models (e.g., Jensen--Shannon divergence and pointwise mutual information). Optimizing this objective yields mixtures that balance coverage against redundancy. We show that the resulting set function is weakly submodular under budget constraints, enabling approximation guarantees for discrete selection variants. Across multiple model families (LLaMA-7B, Qwen2-7B) and evaluation suites (BIG-Bench Hard), TaskPGM improves over standard mixing strategies and provides interpretable structure over task interactions.
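To make the unary and pairwise terms concrete, here is a rough sketch that scores a task mixture from per-task utilities and pairwise Jensen-Shannon divergences. The names `js_divergence` and `mixture_score`, the quadratic form of the score, and the toy predictive distributions are all assumptions for illustration, not the paper's actual objective.

```python
import numpy as np

def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mixture_score(weights, unary, pairwise) -> float:
    """Hypothetical score: per-task utility plus a reward for mixing dissimilar tasks."""
    w = np.asarray(weights, dtype=float)
    return float(unary @ w + w @ pairwise @ w)

# Toy predictive distributions: tasks 0 and 1 behave identically, task 2 differs
preds = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
pairwise = np.array([[js_divergence(a, b) for b in preds] for a in preds])
unary = np.ones(3)

redundant = mixture_score([0.5, 0.5, 0.0], unary, pairwise)
diverse = mixture_score([0.5, 0.0, 0.5], unary, pairwise)
print(redundant, diverse)
```

With equal unary utilities, the mixture that pairs the two behaviorally different tasks scores higher than the one that splits its budget between two redundant tasks, which is the coverage-versus-redundancy trade-off described in the abstract.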


2508.00917 2026-06-09 cs.RO cs.CV cs.LG 版本更新

A Survey on Deep Multi-Task Learning in Connected Autonomous Vehicles

联网自动驾驶车辆中深度多任务学习综述

Jiayuan Wang, Farhad Pourpanah, Q. M. Jonathan Wu, Ning Zhang

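Affiliations: Department of Electrical and Computer Engineering, University of Windsor; Department of Electrical and Computer Engineering, Queen’s University

AI summary: Surveys deep multi-task learning in connected autonomous vehicles, covering perception, prediction, planning, control, and V2X communications and resource management; analyzes the strengths and weaknesses of existing methods and outlines future directions.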

DOI: 10.1109/COMST.2026.3699223

AI中文摘要

联网自动驾驶车辆（CAVs）必须同时执行多个任务，如感知、预测、规划和控制，以确保在复杂环境中安全可靠地导航。此外，通过车联万物（V2X）通信，可以实现CAVs之间的协同感知和驾驶，从而减轻单个车辆的局限性，同时也引入了严格的延迟、可靠性和带宽约束。传统上，任务使用单独的模型处理，这导致部署成本高、计算开销增加以及实现实时性能的挑战。多任务学习（MTL）最近成为一种有前景的解决方案，能够在统一模型中联合学习多个任务，从而提供更高的效率和资源利用率。据我们所知，本综述是首次专注于CAVs中深度MTL的全面回顾。我们首先概述CAVs和MTL以提供基础背景。然后，我们回顾了CAVs关键功能领域的MTL方法，包括感知、预测、规划、控制以及V2X通信和无线电资源管理（RRM）。对于前四个领域，我们将现有工作分为仅单车（车载）和V2X增强协同（多智能体）范式。我们进一步将V2X通信和RRM作为以通信为中心的MTL问题进行讨论。最后，我们讨论了现有方法的优势和局限性，识别了关键研究空白，并提供了旨在推进CAV系统MTL方法的未来研究方向。

英文摘要

Connected autonomous vehicles (CAVs) must simultaneously perform multiple tasks, such as perception, prediction, planning, and control, to ensure safe and reliable navigation in complex environments. Moreover, through vehicle-to-everything (V2X) communication, cooperative perception and driving among CAVs can be enabled, thereby mitigating the limitations of individual vehicles, while it also introduces stringent latency, reliability, and bandwidth constraints. Traditionally, tasks are addressed using separate models, which leads to high deployment costs, increased computational overhead, and challenges in achieving real-time performance. Multi-task learning (MTL) has recently emerged as a promising solution that enables the joint learning of multiple tasks within a unified model. This offers improved efficiency and resource utilization. To the best of our knowledge, this survey is the first comprehensive review focusing on deep MTL in CAVs. We begin with an overview of CAVs and MTL to provide foundational background. Then, we review MTL approaches across key functional domains in CAVs, including perception, prediction, planning, control, as well as V2X communications and radio resource management (RRM). For the first four domains, we categorize existing works under ego vehicle-only (onboard-only) and V2X-enhanced cooperative (multi-agent) paradigms. We further discuss V2X communications and RRM as communication-centric MTL problems. Finally, we discuss the strengths and limitations of existing methods, identify key research gaps, and provide future research directions aimed at advancing MTL methodologies for CAV systems.


2507.09751 2026-06-09 cs.AI cs.CL cs.LO 版本更新

Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations

基于LLM解释的可靠且完备的神经符号推理

Bradley P. Allen, Prateek Chhikara, Thomas Macaulay Ferguson, Filip Ilievski, Paul Groth

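Affiliations: University of Amsterdam; University of Southern California; Rensselaer Polytechnic Institute; Vrije Universiteit Amsterdam

AI summary: Proposes integrating an LLM directly into the interpretation function of the semantics of a paraconsistent logic, achieving sound and complete neurosymbolic reasoning; macro-F1 improves by roughly 6 percentage points on the GPQA and SimpleQA benchmarks, and the approach detects contradictions in a medication-safety knowledge base.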

Comments 43 pages, 14 tables, 4 figures. Accepted to the 19th Conference on Neurosymbolic Learning and Reasoning (NeSy 2025); to appear in the Neurosymbolic Artificial Intelligence Special Issue on NeSy 2025 Extended Papers

AI中文摘要

大型语言模型（LLM）在自然语言理解和生成方面展现了令人印象深刻的能力，但在输出中表现出逻辑一致性问题。我们如何在形式推理中利用LLM的广泛覆盖参数知识，尽管它们存在不一致性？我们提出了一种方法，将LLM直接集成到次协调逻辑的形式语义的解释函数中。我们使用从短事实性基准GPQA和SimpleQA导出的数据集对方法进行实证评估，显示双边事实性评估在两个基准上的宏F1比单边基线提高了约6个百分点（以覆盖率为代价，因为在不一致或不确定的情况下会触发弃权）。我们进一步描述了一个实现该方法的原型tableau推理器，并将其应用于包含228条断言和712条推断语句的药物安全知识库：系统检测到92个对应于医学显著错误（例如，阿片类药物被推断为非成瘾性，β受体阻滞剂被推断为在哮喘中安全）的过剩（glut），同时保持可满足性，表明矛盾被局部化而不是导致逻辑爆炸。与先前工作不同，我们的方法提供了一个理论框架和实际实现，用于神经符号推理，利用LLM的知识同时保留底层逻辑的可靠性和完备性属性。

英文摘要

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but exhibit problems with logical consistency in their output. How can we harness LLMs' broad-coverage parametric knowledge in formal reasoning despite their inconsistency? We present a method for directly integrating an LLM into the interpretation function of the formal semantics for a paraconsistent logic. We evaluate the method empirically using datasets derived from the short-form factuality benchmarks GPQA and SimpleQA, showing that bilateral factuality evaluation improves macro-F1 over a unilateral baseline by roughly 6 percentage points on both benchmarks (at the cost of reduced coverage, as abstention is triggered on inconsistent or uncertain cases). We further describe a proof-of-concept tableau reasoner implementing the method, and apply it to a medication-safety knowledge base of 228 asserted and 712 inferred statements: the system detects 92 gluts corresponding to medically significant errors (e.g., opioids inferred as non-addictive, beta-blockers inferred as safe in asthma) while remaining satisfiable, demonstrating that contradictions are localized rather than causing logical explosion. Unlike prior work, our method offers a theoretical framework with a practical implementation for neurosymbolic reasoning that leverages an LLM's knowledge while preserving the underlying logic's soundness and completeness properties.
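The bilateral evaluation mentioned above can be pictured as combining two independent judgments, one on whether a statement is verified and one on whether it is refuted, into a four-valued result in which contradictions ("gluts") and missing evidence ("gaps") are explicit. The sketch below is an illustration under that assumption; the booleans stand in for LLM judgments, and the names `Truth`, `bilateral_value`, and `answer_or_abstain` are hypothetical.

```python
from enum import Enum
from typing import Optional

class Truth(Enum):
    TRUE = "true"        # verified, not refuted
    FALSE = "false"      # refuted, not verified
    BOTH = "glut"        # verified and refuted: a localized contradiction
    NEITHER = "gap"      # neither verified nor refuted

def bilateral_value(verified: bool, refuted: bool) -> Truth:
    """Combine separate verification and refutation judgments."""
    if verified and refuted:
        return Truth.BOTH
    if verified:
        return Truth.TRUE
    if refuted:
        return Truth.FALSE
    return Truth.NEITHER

def answer_or_abstain(verified: bool, refuted: bool) -> Optional[Truth]:
    """Return a classical answer, or None to abstain on gluts and gaps."""
    value = bilateral_value(verified, refuted)
    return value if value in (Truth.TRUE, Truth.FALSE) else None

print(bilateral_value(True, True))     # Truth.BOTH
print(answer_or_abstain(False, False)) # None
```

Abstaining on gluts and gaps is what trades coverage for the macro-F1 gain the abstract reports.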


2507.22876 2026-06-09 cs.AI cs.LO 版本更新

Discovering heuristics in a complex SAT solver with large language models

利用大型语言模型发现复杂SAT求解器中的启发式策略

Yiwen Sun, Furong Ye, Zhihan Chen, Ke Wei, Shaowei Cai

Affiliations: School of Data Science, Fudan University, Shanghai, China; Key Laboratory of System Software, Institute of Software, Chinese Academy of Sciences, Beijing, China; SeedMath Technology Limited, Beijing, China

AI summary: Proposes the AutoModSAT framework, which combines a modular solver design, unsupervised prompt optimization, and an evolutionary algorithm to optimize SAT solvers automatically with LLMs, achieving a 40% performance improvement over the baseline solver across multiple datasets.

AI中文摘要

可满足性问题（SAT）是计算复杂性理论的基础，并具有广泛的工业应用。由于现代SAT求解器架构复杂，在现实环境中优化它们相当具有挑战性。尽管已经开发了自动配置框架，但它们依赖于手动约束的搜索空间。在这里，我们开发了AutoModSAT，一个使用大型语言模型（LLM）自动优化SAT求解器的框架。AutoModSAT结合了兼容LLM的模块化求解器设计、无监督提示优化以多样化生成的函数，以及基于预搜索策略和$(1+\lambda)$进化算法的高效搜索过程。在广泛的数据集上进行的大量实验表明，AutoModSAT相比基线求解器实现了40%的性能提升，相比最先进的求解器实现了30%的提升。此外，在大多数测试数据集上，AutoModSAT相比最先进求解器的参数调优替代方案也实现了显著的加速。这些结果证明了LLM引导的启发式发现用于优化复杂SAT求解器的潜力。

英文摘要

The Satisfiability problem (SAT) is fundamental in computational complexity theory and has a wide range of industrial applications. Optimizing modern SAT solvers in real-world settings is challenging due to their intricate architectures. While automatic configuration frameworks have been developed, they rely on manually constrained search spaces. Here we develop AutoModSAT, a framework that uses large language models (LLMs) to automatically optimize SAT solvers. AutoModSAT combines an LLM-compatible modular solver design, unsupervised prompt optimization to diversify generated functions, and an efficient search procedure based on a presearch strategy and a $(1+\lambda)$ evolutionary algorithm. Extensive experiments across a wide range of datasets demonstrate that AutoModSAT achieves a 40% performance improvement over the baseline solver and a 30% improvement over the state-of-the-art solvers. Moreover, AutoModSAT also attains a notable speedup compared to the parameter-tuned alternatives of the state-of-the-art solvers over most of the test datasets. These results demonstrate the potential of LLM-guided heuristic discovery for optimizing complex SAT solvers.
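For readers unfamiliar with the $(1+\lambda)$ evolutionary algorithm named in the abstract, the following is a generic sketch on a toy bit-string problem. In AutoModSAT the candidates would be LLM-generated solver functions scored by solver performance; the bit-string encoding, the `flip_bits` mutation, and the `sum` fitness used here are stand-in assumptions.

```python
import random

def one_plus_lambda_ea(fitness, parent, mutate, lam, generations, rng):
    """Generic (1+lambda) EA: keep one parent, sample lam offspring per
    generation, and replace the parent when the best offspring is no worse."""
    best, best_fit = parent, fitness(parent)
    for _ in range(generations):
        offspring = [mutate(best, rng) for _ in range(lam)]
        child = max(offspring, key=fitness)
        child_fit = fitness(child)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit

def flip_bits(bits, rng):
    """Flip each bit independently with probability 1/len(bits)."""
    rate = 1.0 / len(bits)
    return tuple(b ^ (rng.random() < rate) for b in bits)

rng = random.Random(0)
start = tuple(rng.randint(0, 1) for _ in range(20))
best, best_fit = one_plus_lambda_ea(sum, start, flip_bits, lam=4, generations=200, rng=rng)
print(sum(start), best_fit)
```

The elitist acceptance rule guarantees that the returned fitness is never worse than the starting point, which matters when each evaluation (for example, a full solver benchmark run) is expensive.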


2507.19700 2026-06-09 cs.LG 版本更新

Disjoint Generation of Synthetic Data

合成数据的分离生成

Anton Danholt Lautrup, Muhammad Rajabinasab, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp

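Affiliations: Department of Mathematics and Computer Science; University of Southern Denmark

AI summary: Proposes a new framework for generating tabular synthetic data with disjoint generative models: the dataset is partitioned, each part is generated independently, and the parts are joined without common variables, improving privacy, computational feasibility, and the ability to mix model types.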

Journal ref: Transact. mach. learn. res. (June 2026). https://openreview.net/forum?id=LSzXkAWBKI

AI中文摘要

我们提出了一种通过分离生成模型生成表格合成数据集的新框架。在该范式中，数据集被划分为多个不相交的子集，分别提供给生成模型的独立实例。然后，通过一种在缺乏公共变量/标识符的情况下工作的连接操作，将结果事后组合。通过几个案例研究和表格数据示例，我们展示了该框架的成功，并帮助阐明了一些可能的设计选择。分离生成所实现的优势包括：i) 观察到隐私的经验度量有所提高。ii) 增加了某些模型类型的计算可行性。iii) 能够使用不同生成模型的混合来生成合成数据。具体而言，混合模型合成弥合了隐私和效用性能之间的差距，在下游任务的准确性和曲线下面积方面提供了极具竞争力的性能，同时显著降低了经验重识别风险。

英文摘要

We propose a new framework for generating tabular synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The advantages achieved by the disjoint generation include: i) An observed increase in the empirical measurement of privacy. ii) Increased computational feasibility of certain model types. iii) Ability to generate synthetic data using a mixture of different generative models. Specifically, mixed-model synthesis bridges the gap between privacy and utility performance, providing highly competitive performance on Accuracy and Area Under the Curve for downstream tasks while significantly lowering the empirical re-identification risk.


2507.09092 2026-06-09 cs.CV cs.LG 版本更新

Analysis of Information Theory for Explainable AI

可解释人工智能的信息论分析

Ram S Iyer

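Affiliations: Rajiv Gandhi Institute of Petroleum Technology

AI summary: Proposes MI CAM, an activation-mapping method based on mutual information that weights each feature map by its mutual information with the input image to produce saliency visualizations and causal interpretations of model inference; it performs on par with state-of-the-art methods and outperforms some of them.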

AI中文摘要

随着机器视觉在医疗和自动化电厂等关键日常需求中的介入，卷积神经网络的内部机制以及网络提供特定推理的原因引起了关注。本文提出了一种新颖的基于激活映射的事后视觉解释方法，称为MI CAM。与之前基于类激活映射的方法不同，MI CAM通过每个特征图与输入图像的互信息对其进行加权，生成显著性可视化，最终结果由权重和激活图的线性组合产生。它还通过反事实分析验证了因果解释的生成。我们旨在展示MI CAM在模型推理过程中实现的视觉表现和无偏解释。我们的方法与所有最先进的方法相当，但在定性和定量度量上尤其优于其中一些方法。

英文摘要

With the growing use of machine vision in critical day-to-day applications, including healthcare and automated power plants, attention has been drawn to the internal mechanisms of convolutional neural networks and to why a network produces specific inferences. This paper proposes a novel post-hoc visual explanation method, MI CAM, based on activation mapping. Unlike previous class-activation-mapping approaches, MI CAM produces saliency visualizations by weighting each feature map by its mutual information with the input image; the final result is a linear combination of the weights and activation maps. Counterfactual analysis is used to validate that it produces causal interpretations. We aim to demonstrate the visual performance and unbiased justifications of model inference achieved by MI CAM. Our approach performs on par with state-of-the-art methods and outperforms some of them on qualitative and quantitative measures.
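The weighting scheme described in this abstract can be sketched roughly as follows, assuming a grayscale image, feature maps already resized to the image resolution, and a simple histogram estimate of mutual information. The function names, the bin count, and the final ReLU and normalization are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Histogram-based estimate of mutual information (in nats) between two arrays."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal over b, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_weighted_map(image: np.ndarray, feature_maps: np.ndarray) -> np.ndarray:
    """Linear combination of feature maps weighted by their MI with the image."""
    weights = np.array([mutual_information(image, fmap) for fmap in feature_maps])
    saliency = np.tensordot(weights, feature_maps, axes=1)  # (K,) x (K, H, W) -> (H, W)
    saliency = np.maximum(saliency, 0.0)  # assumed ReLU, as in other CAM variants
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency

rng = np.random.default_rng(0)
image = rng.random((32, 32))
feature_maps = np.stack([image, rng.random((32, 32))])  # one informative map, one noise map
saliency = mi_weighted_map(image, feature_maps)
print(saliency.shape)
```

In this toy setup the map that copies the image receives a much larger weight than the independent noise map, so it dominates the combined saliency.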


2406.07318 2026-06-09 cs.CV cs.AR eess.IV 版本更新

Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs

嵌入式图卷积网络用于SoC FPGA上的实时事件数据处理

Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Andrea Pinna, Tomasz Kryjak

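Affiliations: University of Warsaw; Politechnika Warszawska (Warsaw University of Technology)

AI summary: Proposes EFGCN, a hardware-aware graph convolutional network for event cameras that runs in real time on an SoC FPGA, reducing model size by up to 100x compared with AEGNN at an accuracy drop of 2.9% on N-Caltech101 and 2.2% on N-Cars.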

DOI: 10.1016/j.sysarc.2026.103850
Journal ref: Journal of Systems Architecture, Volume 177, August 2026, 103850

AI中文摘要

事件相机的使用代表了解决传统视频系统限制的重要且快速发展的趋势。特别是在汽车领域，这些相机因其低延迟和低功耗而集成到嵌入式实时系统中具有重要意义。确保事件处理所需吞吐量和延迟的一种有效方法是利用图卷积网络（GCNs）。在本研究中，我们引入了一种定制的EFGCN（基于事件的FPGA加速图卷积网络），该网络采用了一系列针对PointNetConv（一种用于点云处理的图卷积）的硬件感知优化。所提出的技术相比该领域最新工作之一——异步基于事件的GNN（AEGNN），模型大小减少了高达100倍，而精度下降相对较小（N-Caltech101分类任务下降2.9%，N-Cars分类任务下降2.2%），从而遵循了TinyML趋势。我们在ZCU104 SoC FPGA平台上实现了EFGCN，无需任何片外外部存储器资源，实现了每秒1330万事件（MEPS）的吞吐量和低延迟的实时部分异步处理。在多个基于事件的分类基准测试中，我们的方法在提供每事件最先进的计算效率、小模型大小以及高可扩展性、可定制性和资源效率的同时，实现了具有竞争力的精度。我们将软件和硬件源代码发布在开放存储库中：https://github.com/vision-agh/gcnn-dvs-fpga。

英文摘要

The utilisation of event cameras represents an important and swiftly evolving trend aimed at addressing the constraints of traditional video systems. Particularly within the automotive domain, these cameras find significant relevance for their integration into embedded real-time systems due to lower latency and power consumption. One effective approach to ensure the necessary throughput and latency for event processing is through the utilisation of graph convolutional networks (GCNs). In this study, we introduce a custom EFGCN (Event-based FPGA-accelerated Graph Convolutional Network) designed with a series of hardware-aware optimisations tailored for PointNetConv, a graph convolution designed for point cloud processing. The proposed techniques result in up to 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN), one of the most recent works in the field, with a relatively small decrease in accuracy (2.9% for the N-Caltech101 classification task, 2.2% for the N-Cars classification task), thus following the TinyML trend. We implemented EFGCN on a ZCU104 SoC FPGA platform without any off-chip external memory resources, achieving a throughput of 13.3 million events per second (MEPS) and real-time partially asynchronous processing with low latency. Across multiple event-based classification benchmarks, our approach achieves competitive accuracy while providing state-of-the-art computational efficiency per event, small model size, and high scalability, customisability and resource efficiency. We publish both software and hardware source code in an open repository: https://github.com/vision-agh/gcnn-dvs-fpga.


2507.00322 2026-06-09 cs.CL cs.AI cs.SE 版本更新

Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones

干扰导致的失败：当有缺陷机制掩盖健全机制时，语言模型在平衡括号任务中出错

Daking Rai, Samuel Miller, Kevin Moran, Ziyu Yao

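Affiliations: George Mason University; University of Central Florida; Department of Computer Science

AI summary: Shows why language models err on balanced-parentheses tasks: some components implement reliable mechanisms while others introduce noise, and errors occur when the noisy mechanisms dominate. Proposes RASteer, which boosts the contribution of reliable components, raising the accuracy of some models from 0% to nearly 100% and yielding gains of up to about 20% on arithmetic reasoning tasks.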

Comments 23 pages, 10 figures, accepted for NeurIPS 2025

AI中文摘要

尽管语言模型（LMs）在编码能力方面取得了显著进步，但在生成平衡括号等简单句法任务上仍然存在困难。在本研究中，我们调查了不同规模（124M-7B）的语言模型中这些错误持续存在的潜在机制，旨在理解和减少这些错误。我们的研究揭示，语言模型依赖于多个独立做出预测的组件（注意力头和前馈神经元）。虽然一些组件在广泛的输入范围内可靠地促进正确答案（即实现“健全机制”），但其他组件可靠性较低，通过促进错误标记引入噪声（即实现“有缺陷机制”）。当有缺陷机制掩盖健全机制并主导预测时，就会发生错误。受此启发，我们引入了RASteer，一种引导方法，用于系统地识别并增加可靠组件的贡献，以提升模型性能。RASteer在平衡括号任务上显著提升了性能，将某些模型的准确率从0%提高到接近100%，且不影响模型的一般编码能力。我们进一步展示了其在算术推理任务中的更广泛适用性，实现了高达约20%的性能提升。

英文摘要

Despite remarkable advances in coding capabilities, language models (LMs) still struggle with simple syntactic tasks such as generating balanced parentheses. In this study, we investigate the underlying mechanisms behind the persistence of these errors across LMs of varying sizes (124M-7B) to both understand and mitigate the errors. Our study reveals that LMs rely on a number of components (attention heads and FF neurons) that independently make their own predictions. While some components reliably promote correct answers across a generalized range of inputs (i.e., implementing "sound mechanisms"), others are less reliable and introduce noise by promoting incorrect tokens (i.e., implementing "faulty mechanisms"). Errors occur when the faulty mechanisms overshadow the sound ones and dominantly affect the predictions. Motivated by this insight, we introduce RASteer, a steering method to systematically identify and increase the contribution of reliable components for improving model performance. RASteer substantially improves performance on balanced parentheses tasks, boosting accuracy of some models from 0% to around 100% without impairing the models' general coding ability. We further demonstrate its broader applicability in arithmetic reasoning tasks, achieving performance gains of up to around 20%.


2506.20699 2026-06-09 cs.LG 版本更新

Structural Decoupling: A Scaffold-Flow Theory of Generalization and Alignment

结构解耦：泛化与对齐的支架流理论

Xin Li

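Affiliations: NSF (U.S. National Science Foundation); Xin Li

AI summary: Proposes Structural Learning Theory (StrLT), which uses the notion of width and a contractive-similarity operator to explain how structure is discovered and maintained in non-stationary environments, and derives a structural decoupling principle that accounts for safety failures such as hallucination and reward-model boundary errors.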

AI中文摘要

在非平稳和多上下文环境中的学习需要超越普通的任务内泛化。系统还必须发现哪些上下文存在，将输入路由到正确的上下文，保留旧上下文，并在环境变化时修订上下文库。本文提出结构学习理论（StrLT）作为填补这一结构缺失的框架。StrLT 补充了 Vapnik 的统计学习理论（SLT）：SLT 支配着固定机制内的预测或控制（即“漏斗”）；而 StrLT 支配着结构机制的发现与维护（即“陷阱”）。StrLT 的核心对象是宽度，即覆盖一个问题所需的最少局部可行上下文数量。我们总结了三个基本结果：宽度与 VC 维不可比较；学习在真实宽度处发生相变；宽度可通过收缩相似性（CS）算子估计，该算子将任务诱导的非收缩性转化为谱分离。在 StrLT 框架下，我们解释了固定类别的结构可学习性如何导致结构解耦原则：维持结构支架的机制不应由优化上下文内流的相同梯度来训练。这一原则激发了一种支架流模型，其中对齐和泛化在架构上分离。最后，我们论证了若干安全故障，包括幻觉、奖励模型边界错误和欺骗性对齐，可以被解释为支架分辨率或支架维护的失败，而不仅仅是输出层面的预测错误。

英文摘要

Learning in non-stationary and multi-context environments requires more than ordinary within-task generalization. A system must also discover which contexts exist, route inputs to the correct context, preserve old contexts, and revise the context library when the environment changes. This paper presents Structural Learning Theory (StrLT) as a framework for filling this structural gap. StrLT complements Vapnik's Statistical Learning Theory (SLT): SLT governs the "funnel", prediction or control within a fixed regime, while StrLT governs the "trap", the discovery and maintenance of structural regimes. The core StrLT object is "width", the minimum number of locally feasible contexts needed to cover a problem. We summarize three basic results: width is incomparable with VC dimension; learning exhibits a phase transition at the true width; and width can be estimated by a contractive-similarity (CS) operator that converts task-induced non-contractivity into spectral separation. Under the StrLT framework, we explain how fixed-class structural learnability leads to a "structural decoupling principle": the mechanisms that maintain the structural scaffold should not be trained by the same gradients that optimize within-context flow. This principle motivates a scaffold-flow model in which alignment and generalization separate architecturally. Finally, we argue that several safety failures, including hallucination, reward-model boundary errors, and deceptive alignment, can be interpreted as scaffold-resolution or scaffold-preservation failures rather than merely output-level prediction errors.


2506.11336 2026-06-09 cs.LG math.OC 版本更新

The Sample Complexity of Parameter-Free Stochastic Convex Optimization

无参数随机凸优化的样本复杂度

Jared Lawrence, Ari Kalinsky, Hannah Bradfield, Yair Carmon, Oliver Hinder

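Affiliations: Department of Industrial Engineering, University of Pittsburgh; Department of Computer Science, Tel Aviv University

AI summary: Studies the sample complexity of stochastic convex optimization when problem parameters (such as the distance to optimality and the Lipschitz constant) are unknown, proposing a reliable model selection method and a regularization-based method that match the optimal known-parameter sample complexity (the former up to log log factors) while avoiding overfitting to the validation set.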

Comments Accepted for publication in JMLR

详情

AI中文摘要

我们研究当问题参数（如到最优点的距离和Lipschitz常数）未知时随机凸优化的样本复杂度。我们采用两种策略。首先，我们开发了一种可靠的模型选择方法，避免对验证集的过拟合。该方法允许我们通用地调整随机优化方法的学习率，以匹配最优已知参数样本复杂度（相差log log因子）。其次，我们开发了一种专门针对仅到最优点的距离未知情况的正则化方法。具体而言，它使用范数正则化经验风险最小化来估计到最优点的距离（常数因子内），使得已知参数的随机优化方法能够达到最优样本复杂度。该方法提供了对未知到最优点距离的完美适应性，展示了无参数随机凸优化的样本复杂度与计算复杂度之间的分离。结合这两种方法允许我们同时适应多种问题结构。在CIFAR-10上通过微调CLIP模型和提示工程Gemini计数形状进行的小样本学习实验表明，我们的可靠模型选择方法有助于减轻对小验证集的过拟合。

英文摘要

We study the sample complexity of stochastic convex optimization when problem parameters such as the distance to optimality and the Lipschitz constant are unknown. We pursue two strategies. First, we develop a reliable model selection method that avoids overfitting to the validation set. This method allows us to generically tune the learning rate of stochastic optimization methods to match the optimal known-parameter sample complexity up to log log factors. Second, we develop a regularization-based method that is specialized to the case that only the distance to optimality is unknown. More specifically, it uses norm-regularized empirical risk minimization to estimate the distance to optimality to within a constant factor, allowing known-parameter stochastic optimization methods to achieve optimal sample complexity. This method provides perfect adaptability to unknown distance to optimality, demonstrating a separation between the sample and computational complexity of parameter-free stochastic convex optimization. Combining these two methods allows us to simultaneously adapt to multiple problem structures. Experiments performing few-shot learning on CIFAR-10 by fine-tuning CLIP models and prompt engineering Gemini to count shapes indicate that our reliable model selection method can help mitigate overfitting to small validation sets.
