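2606.11251 2026-06-11 cs.LG New submission

Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems

Xingji Cui

Affiliation: Xi’an Jiaotong University

AI summary: Proposes MF-Net, a recurrent model that represents a multivariate system as a shared field state and updates that state through a learnable relation law, achieving competitive forecasting while keeping an interpretable structure.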

Abstract

Many multivariate dynamical systems are observed only through trajectories, leaving the mechanisms governing their joint dynamics hidden. Existing approaches can impose interpretable dynamics or learn flexible state transitions, yet the resulting interaction structure is typically either specified in advance or left implicit within the learned dynamics. We introduce MF-Net, a recurrent dynamical model that represents all variables in a shared field state and updates this state through a learned relation law. Each variable carries a field component, and these components evolve jointly through a learnable mechanical transition. Here, mechanical refers to the relation-to-motion organization of the transition, where learned relations shape state-dependent flows, field responses, and motion tendencies that move the field state forward. The resulting structure is part of the rollout itself: learned relations influence how the field moves, and the same internal quantities support both forecasting and structural readout. Across known-law interaction systems, chaotic benchmarks, real neural recordings, and ecological time series, MF-Net achieves competitive short- and medium-horizon forecasting while retaining inspectable structural readout. On the 40-dimensional Lorenz--96 testbed, MF-Net achieves an eight-step $R^2$ of $0.798\pm0.018$; across five seeds, its learned relation matrix recovers the local coupling support with a local/nonlocal strength ratio of $19.80\pm1.00$ and Precision@$K$ of $1.000\pm0.000$. MF-Net provides a structure-readable dynamical modeling framework in which learned relations are trained through forward evolution and, on real data, interpreted as functional predictive couplings under appropriate observational limits.
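The Lorenz-96 result above can be read against the system's known coupling pattern: in the standard Lorenz-96 equations, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, so variable i is driven by its neighbours i-2, i-1 and i+1. The sketch below shows one plausible way to score a learned relation matrix against that support. The function names, the exclusion of self-coupling, and the choice of K as the support size are illustrative assumptions, not the paper's evaluation protocol.

```python
def lorenz96_support(n):
    """Off-diagonal pairs (i, j) such that x_j appears in dx_i/dt for Lorenz-96."""
    return {(i, (i + d) % n) for i in range(n) for d in (-2, -1, 1)}


def precision_at_k(relation, support, k):
    """Fraction of the k strongest off-diagonal entries that lie in `support`."""
    n = len(relation)
    ranked = sorted(
        ((abs(relation[i][j]), (i, j)) for i in range(n) for j in range(n) if i != j),
        reverse=True,
    )
    return sum(pair in support for _, pair in ranked[:k]) / k


# Toy 6-variable example (hypothetical data): strong entries sit exactly on the support.
n = 6
support = lorenz96_support(n)
relation = [[1.0 if (i, j) in support else 0.05 for j in range(n)] for i in range(n)]
score = precision_at_k(relation, support, k=len(support))
```

A local/nonlocal strength ratio could be computed from the same two index sets by comparing mean absolute entries inside and outside `support`.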

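2606.11262 2026-06-11 cs.LG cs.AI New submission

PermDoRA -- Understanding Adapter Interference in Language Models: Limits of Parameter-Space Geometry

Gowtham Sivaramakrishnan, Sarvesha Kumar Kombaiah Seetha, Kishan Gupta Balaji, Santhosh Baradwaj Vaduvur Ranganathan

Affiliation: Independent Researcher

AI summary: Investigates whether interference in adapter composition stems from overlapping linear parameter updates. Experiments with the DoRA-RBAC framework and a geometry-aware merging strategy indicate that parameter-space geometry is not the main cause of interference, which is instead consistent with interactions in shared nonlinear representations.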

Comments 18 Pages, COLM 2026

Abstract

Access control in large language models (LLMs) requires modular mechanisms to enable domain-specific behavior without retraining or cross-domain interference. A common hypothesis is that interference during adapter composition arises from overlap in linear parameter updates, suggesting that enforcing orthogonality or directional independence should improve multi-domain performance. We test this hypothesis using DoRA-RBAC, a hierarchical adapter composition framework based on weight-decomposed low-rank adaptation. We compare conventional Euclidean merging with a geometry-aware Riemannian-inspired merging strategy that approximates the Fréchet mean via normalized directional averaging across multiple QA benchmarks (GPQA, PubMedQA, SimpleQA, WMDP) on LLaMA-3.1-8B and Mistral-7B. Our results show that while single-domain performance matches LoRA, geometry-aware merging provides no consistent advantage over standard averaging in multi-domain settings. Diagnostic analysis further reveals that angular alignment and orthogonality of adapter updates are weak predictors of composition performance. These findings suggest that adapter interference is not governed primarily by parameter-space geometry, but is instead consistent with interactions in shared nonlinear representations.
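To make the two merging rules concrete, here is a minimal sketch on flat update vectors. It assumes "Euclidean merging" means an element-wise mean and "normalized directional averaging" means averaging unit directions and renormalizing. Rescaling by the mean norm is an added assumption, and DoRA's own magnitude/direction decomposition is not modelled.

```python
import math


def euclidean_merge(updates):
    """Element-wise arithmetic mean of the adapter update vectors."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def directional_merge(updates):
    """Average the unit directions, renormalize, then rescale by the mean norm.

    Assumes no update is the zero vector and the directions do not cancel out.
    """
    norms = [math.sqrt(sum(x * x for x in u)) for u in updates]
    units = [[x / nrm for x in u] for u, nrm in zip(updates, norms)]
    mean_dir = euclidean_merge(units)
    mean_dir_norm = math.sqrt(sum(x * x for x in mean_dir))
    mean_norm = sum(norms) / len(norms)
    return [mean_norm * x / mean_dir_norm for x in mean_dir]


# Two orthogonal toy updates with different magnitudes (hypothetical values).
updates = [[3.0, 0.0], [0.0, 1.0]]
flat = euclidean_merge(updates)         # dominated by the larger update
spherical = directional_merge(updates)  # both directions weighted equally
```

In this toy case the Euclidean mean leans toward the larger update, while the directional average treats both directions equally. The abstract's finding is that this difference gave no consistent advantage in multi-domain composition.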

2606.11275 2026-06-11 cs.LG cs.AI New submission

RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways

Alejandro García-Castellanos, Maurice Weiler, Erik J Bekkers

Affiliations: AMLab, University of Amsterdam; MIT CSAIL

AI summary: Proposes RoVE, which rotates both keys and values so that values become position-sensitive, turning RoPE attention into an attention convolution; it outperforms RoPE on few-shot learning, out-of-distribution perplexity, and long-context retrieval.

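2606.11341 2026-06-11 cs.LG cs.RO New submission

Energy-Conserved Neural Pipelines: Attenuating Error Propagation in Modular Neural Networks via Physical Conservation Constraints

David Young, Swan Yi Htet

Affiliation: ORION Robotics

AI summary: Proposes enforcing energy conservation between modules (an unchanged L2 norm of the feature vectors) as a hard constraint; experiments show the method substantially outperforms baselines under several noise types, with depth invariance and a theoretical guarantee.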

Comments 22 pages, 2 figures, 7 tables, 25 references

Abstract

Modular neural network pipelines suffer from error compounding: noise at any module boundary propagates and potentially amplifies through subsequent modules. We introduce energy conservation as a hard physical constraint on inter-module information flow. Activation energy (the squared L2 norm of feature vectors) is enforced to be exactly preserved at every module boundary. Unlike soft energy penalties, conservation is an inviolable law: the network may redistribute energy across neurons but cannot create or destroy it. Four experiments on CIFAR-10 demonstrate: (1) conservation retains 77.4% of clean accuracy at noise sigma=0.2, versus 35.1% for baselines and 30.9% for energy-penalized models (p<0.001, 5 seeds); (2) pipelines become depth-invariant, retaining 93.3% at depths 2 through 5 with noise at every boundary; (3) the advantage generalizes to systematic bias (+45.1%), Gaussian (+40.4%), and adversarial noise (+4.8%), with a principled non-effect on dropout (-0.3%); (4) on ResNet-18, the conservation advantage scales inversely with intrinsic normalization: +0.3 pp with BatchNorm, +26.2 pp without at sigma=0.2, reaching +58.0 pp at sigma=0.5. Experiment 5 validates the operator on a real modular robotic pipeline (MuJoCo physics, Franka Panda). Across three independent runs on separate machines (90 trials per cell), conservation provides +18.9 pp average advantage on monocular-depth-style noise. A formal bound proves conserved noise energy is strictly less than input noise energy.
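A hard conservation constraint of this kind can be illustrated by rescaling a module's output so that its squared L2 norm matches that of its input. The sketch below is only one simple way to enforce the constraint. The abstract does not specify the paper's operator, so the function names and the handling of a zero-energy output are assumptions.

```python
import math


def energy(v):
    """Activation energy: squared L2 norm of a feature vector."""
    return sum(x * x for x in v)


def conserve_energy(module_input, module_output, eps=1e-12):
    """Rescale `module_output` so its energy equals that of `module_input`.

    The direction of the output (how energy is distributed across neurons)
    is untouched; only its overall scale changes.
    """
    e_out = energy(module_output)
    if e_out < eps:
        return list(module_output)  # a zero vector cannot be rescaled
    scale = math.sqrt(energy(module_input) / e_out)
    return [scale * x for x in module_output]


# Hypothetical boundary: a 2-d input feeding a module with a 3-d output.
x = [3.0, 4.0]       # energy 25
y = [1.0, 0.0, 0.0]  # energy 1 before conservation
y_conserved = conserve_energy(x, y)
```

Unlike a soft penalty added to the loss, this rescaling holds at every forward pass, including at inference under noise.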

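2606.11391 2026-06-11 cs.LG New submission

Recursive Binding on a Budget: Subspace Carving in Order-p Tensor Memories

Travis Pence, Daisuke Yamada, Vikas Singh

Affiliation: University of Wisconsin-Madison

AI summary: Proposes Orthogonal Subspace Carving (OSC), which binds fillers to roles by projecting them onto the null space of the role basis; a fixed-order tensor memory supports deep recursive binding and improves efficiency in high-superposition settings under constant memory.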

Comments 24 pages, 12 figures, 7 tables

Journal ref: 43rd International Conference on Machine Learning 2026

Abstract

Tensor Product Representations provide the structural fidelity required for symbolic reasoning in models but suffer from exponential dimensionality growth when encoding deep recursive structures. Conversely, Vector Symbolic Architectures maintain constant dimensionality but sacrifice capacity and fidelity due to noisy compression via superposition. In this work, we propose Orthogonal Subspace Carving (OSC), a memory architecture that binds fillers to roles by projecting onto the null space of the role basis before aggregating into a fixed order-p tensor. OSC uses projections to enforce geometric orthogonality between bound structures within a static memory trace. We show that this mechanism decouples the tensor order from the structural depth, enabling deep recursive binding within a constant memory footprint. By performing retrieval via recognition, this construction allows for component vectors that are orders of magnitude smaller than the memory tensor, giving superior memory efficiency in settings involving high superposition. We also show that TPR is a special case of binding in Clifford algebra, and give a Clifford formulation of OSC.
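The projection step at the heart of this description can be sketched in isolation: remove from a filler vector every component that lies in the span of the role vectors, leaving the part in their orthogonal complement (the null space of the matrix whose rows are the roles). This is a generic linear-algebra illustration with hypothetical names. It does not reproduce the paper's order-p tensor aggregation or its retrieval procedure.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_to_null_space(filler, roles):
    """Component of `filler` orthogonal to every vector in `roles`."""
    w = list(filler)
    for b in orthonormalize(roles):
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


# Toy 3-d example (hypothetical values): two roles spanning the x-y plane.
roles = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
carved = project_to_null_space([2.0, 3.0, 4.0], roles)
```

After projection, the carved vector has zero inner product with each role, which is the geometric orthogonality the abstract refers to.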

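2606.11518 2026-06-11 cs.LG cs.AI New submission

SirenFNO: Efficient and Full Frequency Learning of Fourier Neural Operators

Pengqing Shi, Jie Yin, Stephen Tierney, Junbin Gao

Affiliation: The University of Sydney

AI summary: Proposes SirenFNO, a framework that uses sinusoidal representation networks to learn implicit neural representations and parameterize kernels mode by mode, removing frequency truncation and enabling full-spectrum learning; it improves performance on multiple PDE benchmarks with up to 73 times fewer parameters.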

Comments 9 pages, accepted by IJCAI 2026

Abstract

Fourier neural operators (FNOs) are effective and efficient surrogates for approximating solutions of PDEs and generalize across discretizations. However, owing to the reliance on frequency truncation to maintain learning efficiency of FNOs, empirical studies suggest that FNOs exhibit spectral bias toward low-frequency information, which may hinder the learning capability especially for certain PDEs with strong high-frequency oscillations. To address this limitation, we propose SirenFNO, a novel framework that leverages sinusoidal representation networks (SIRENs) to learn implicit neural representations and performs mode-wise kernel parameterization. Our SIREN parameterization learns a full-grid spectrum with a constant and discretization-independent parameter count, thereby eliminating the need for frequency truncation. We further extend SirenFNO with functional tensor decompositions to enhance parameter and learning efficiency. Empirical results show that our SirenFNO consistently outperforms FNO with approximately $4$ to $15$ times parameter reductions with preserved discretization invariance, and our functional decomposition variants obtain performance improvements with a maximum of $73$ times fewer parameters across multiple PDE benchmarks.

2606.11585 2026-06-11 cs.LG cs.CL nlin.AO New submission

Kuramoto Attention: Synchronizing Self-Attention on the Torus

Joshua Nunley

Affiliation: Department of Informatics, Luddy School of Informatics, Computing, and Engineering, Cognitive Science Program, Indiana University Bloomington

AI summary: Proposes a Kuramoto attention layer that treats hidden coordinates as angles and implements self-attention through gated cosine similarity and a circular-mean update, which is equivalent to the Kuramoto coupling term; on character-level language modeling it performs close to a strong baseline.

Comments 13 pages, 2 figures, 3 tables

Abstract

We introduce Kuramoto attention, a self-attention layer in which each hidden coordinate is an angle. The layer scores tokens by gated cosine similarity, attends over previous phase states, and updates each token by the tangent component of the attention-weighted circular mean. Because the values are the raw phase states, this update is exactly the Kuramoto coupling term $\sum_u A_{t,u}\sin(\theta_u-\theta_t)$, with the attention matrix acting as an adaptive, content-dependent coupling kernel. Equivalently, the gated score is a learned metric on the torus that selects which tokens couple, and the update pulls each token toward the circular mean of the tokens it selects, tightening their phase agreement. The same two ingredients, an invariant similarity score and an on-manifold mean, define such a layer on any compact group; the torus is the abelian case, where both are closed-form. The softmax weights solve an entropy-regularized phase-retrieval problem, and rotary position enters as a position-dependent phase drift in the score. On enwiki8 character-level language modeling, the layer trains as a functional language model whose bits-per-character stays close to a strong matched RoPE+SwiGLU transformer: within $0.02$ BPC at one million parameters ($1.637\pm0.010$ versus $1.616\pm0.004$) and level on the median at five million ($1.448$ versus $1.452$ over five seeds) with the transformer ahead on the mean ($1.468$ versus $1.456$). These experiments establish that the constrained geometric structure is a viable language model at this scale; the structure itself, and its synchronization reading, is the contribution. Ablations isolate the load-bearing components, and the result gives a compact bridge between self-attention and phase synchronization.
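The coupling term in the abstract can be written out directly for a single angular coordinate per token. The sketch below takes the attention matrix as given, so the gated cosine score and the softmax are not modelled. The step size and the wrap to [0, 2π) are illustrative assumptions.

```python
import math


def kuramoto_update(thetas, attn, step=1.0):
    """One update theta_t <- theta_t + step * sum_u A[t][u] * sin(theta_u - theta_t).

    `thetas` holds one angle per token; `attn` is a row-stochastic matrix whose
    row t contains the attention weights of token t over all tokens u.
    """
    updated = []
    for t, theta_t in enumerate(thetas):
        coupling = sum(
            attn[t][u] * math.sin(theta_u - theta_t)
            for u, theta_u in enumerate(thetas)
        )
        updated.append((theta_t + step * coupling) % (2 * math.pi))
    return updated


# Two tokens with causal attention (hypothetical weights):
# token 0 attends only to itself, token 1 splits attention evenly.
thetas = [0.0, math.pi / 2]
attn = [[1.0, 0.0], [0.5, 0.5]]
new_thetas = kuramoto_update(thetas, attn)
```

Token 1 is pulled toward token 0's phase by 0.5 radians, while token 0 stays put. This is the phase-agreement tightening the abstract describes.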

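2606.11627 2026-06-11 cs.LG cs.AI New submission

When Context Returns: Toward Robust Internalization in On-Policy Distillation

Xun Wang, Ruishuo Chen, Zhuoran Li, Yu Chen, Longbo Huang

Affiliation: IIIS, Tsinghua University

AI summary: Addresses the performance drop that occurs when context is reintroduced after it has been internalized through on-policy distillation. Proposes a lightweight consistency regularizer that anchors the no-context output and penalizes deviation from it, mitigating the degradation and improving robustness.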

Abstract

Recent work has shown that on-policy distillation can internalize privileged context, such as system prompts or task hints, into a student model so that the context is no longer needed at inference time. Although this approach successfully improves the student's no-context performance, we identify an interesting and previously unstudied phenomenon: in many settings, reintroducing the original privileged context to the distilled student actually degrades its performance, even on instances it already solves correctly without context. We term this context-induced degradation and argue that robust internalization demands not only matching the teacher's context-conditioned behavior, but also remaining stable when the context is reintroduced, a property we call context removability. Motivated by this observation, we propose a lightweight consistency regularizer that first anchors the student's no-context output via stop-gradient, then penalizes the context-conditioned output for deviating from it via forward KL divergence. This simple addition requires only one extra forward pass per training step, yet it effectively mitigates context-induced degradation and, in many cases, even improves no-context performance. Across 12 configurations spanning diverse domains and model families, our method improves context-conditioned accuracy in the majority of settings, reduces context-induced harm in 11 out of 12 settings, and effectively eliminates response-length inflation. A mechanistic case study further confirms that context removability is achieved at the representation level, with hidden states remaining nearly identical regardless of whether the context is present.
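The regularizer described here can be sketched for a single next-token distribution. The example assumes "forward KL" means KL(anchor || context-conditioned output), with the no-context distribution as the anchor. The abstract does not spell out the direction or the per-token aggregation, so both are assumptions. Plain Python has no autodiff, so the stop-gradient appears only as a comment.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def forward_kl(anchor_probs, probs, eps=1e-12):
    """KL(anchor || probs) for two discrete distributions."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(anchor_probs, probs)
        if p > 0.0
    )


def consistency_penalty(no_context_logits, context_logits):
    # In a training framework the anchor would be detached (stop-gradient),
    # so only the context-conditioned branch receives gradients.
    anchor = softmax(no_context_logits)
    return forward_kl(anchor, softmax(context_logits))


# Hypothetical logits over a 2-token vocabulary.
same = consistency_penalty([2.0, 0.0], [2.0, 0.0])
shifted = consistency_penalty([2.0, 0.0], [0.0, 2.0])
```

The penalty is zero when adding the context leaves the output distribution unchanged and grows as the context-conditioned output drifts from the anchor.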

2606.11854 2026-06-11 cs.LG cs.AI cs.CL New submission

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

Michal Chudoba, Sergey Alyaev, Petra Galuscakova, Tomasz Wiktorski

Affiliations: University of Stavanger; NORCE Research

AI summary: Proposes ART, which injects information into a frozen multimodal LLM by optimizing its raw visual input, giving soft-prompt-style fine-tuning without modifying the computational graph; it reaches accuracy comparable to LoRA on mathematics and tool-use benchmarks.

Abstract

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the computational graphs of precompiled, preoptimized LLMs. As a result, neither is fully supported in high-throughput engines like vLLM. We propose fine-tuning with ART (Art-based Reinforcement Training). The method injects information into a frozen Multimodal Large Language Model (MLLM) by optimizing only its raw visual input, thus enabling the soft-token approach on pre-compiled computational graphs. It relies on backpropagation of gradients back into a plain pixel array and thus supports any fine-tuning objective. Moreover, the optimized visual input can be stylized as task-relevant computational artworks. The approach's effectiveness is confirmed for different sizes of a popular open Qwen architecture and for several textual benchmarks. Specifically, ART reaches accuracy competitive with LoRA across mathematics and structured-tool-use benchmarks.

2606.11963 2026-06-11 cs.LG physics.comp-ph New submission

HAMNO: A Hierarchical Adaptive Multi-scale Neural Operator with Physics-Informed Learning for Dynamical Systems

Mostafa Bamdad, Mohammad Sadegh Eshaghi, Timon Rabczuk

Affiliations: Bauhaus-Universität Weimar; Leibniz University Hannover

AI summary: Proposes HAMNO, a neural-operator architecture that balances local and global information through an adaptive gating mechanism; together with its physics-informed extension PI-HAMNO, it improves long-horizon prediction accuracy and physical consistency on non-periodic Allen-Cahn and related equations.

Abstract

Neural operators provide a powerful framework for learning solution mappings of partial differential equations directly in function space. However, many existing architectures still struggle to represent nonlinear time-dependent systems that involve multi-scale structures, long-range interactions, and stable long-time evolution. In this work, we introduce the Hierarchical Adaptive Multi-scale Neural Operator (HAMNO), a neural-operator architecture that combines local convolutional representations, global spectral operators, and hierarchical encoder-decoder processing. The central component of HAMNO is a data-dependent gating mechanism that adaptively balances local and global information at each spatial location, allowing the model to resolve fine-scale features while preserving long-range dependencies. We further develop a physics-informed extension, PI-HAMNO, based on a multi-objective loss strategy that combines data fitting with strong- and weak-form physics constraints. The strong-form term penalizes the domain-integrated squared PDE residual in physical coordinates, while the weak-form term is constructed by multiplying the governing residual by finite-element test functions and evaluating the resulting element integrals using centroid-based tetrahedral quadrature. The framework is evaluated on non-periodic Allen-Cahn (AC), Cahn-Hilliard (CH), and Swift-Hohenberg (SH) equations defined on cubic domains. Across long-horizon rollout, data-limited training, out-of-distribution initial-condition shifts, and random-seed variations, HAMNO improves predictive accuracy over standard neural-operator baselines, while PI-HAMNO further enhances stability, physical consistency, and data efficiency. The implementation is publicly available at https://github.com/MBamdad/HAMNO .

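2606.12054 2026-06-11 cs.LG New submission

Simplicity Suffices for Parameter Noise Injection in Stochastic Gradient Descent

Benjamin Leblanc, Louis-Jacob Lebel, Teddy Kana, Richard Kamel

Affiliation: Université Laval

AI summary: Studies parameter noise injection in stochastic gradient descent, proposes an efficient way to inject per-example noise in linear layers, and shows experimentally that simple isotropic noise recovers most of the optimization and generalization benefit of more complex schemes.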

Comments Accepted at the Data Science Meets Optimisation workshop in IJCAI 2026

Abstract

Injecting noise into the optimization process is a well-established technique for improving the training and generalization of deep neural networks. Yet, despite the breadth of existing approaches, it remains unclear which design choices truly matter in practice. In this work, we investigate parameter noise injection for stochastic gradient descent, focusing on two key questions: how to efficiently pair each training example with its own perturbation in mini-batch training, and whether sophisticated noise parameterizations or multi-sample gradient averaging yield meaningful gains over simpler alternatives. To address the first question, we leverage a distributional identity for linear layers that allows per-example noise injection without breaking batched computation. To address the second, we systematically compare several diagonal Gaussian parameterizations against an isotropic baseline across varying noise levels on CIFAR100. Our results consistently show that simple, lightweight strategies, isotropic noise with a single perturbed forward pass per update step, recover most of the benefit of more complex schemes. These findings suggest that simplicity suffices for parameter noise injection, and that practitioners need not resort to elaborate perturbation designs to reap the optimization and generalization benefits of noisy SGD.
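One well-known identity of this kind concerns a linear layer y = Wx with isotropic Gaussian weight noise E whose entries are i.i.d. N(0, σ²). The perturbation Ex is then distributed as N(0, σ²‖x‖² I), so each example can receive its own weight perturbation by sampling noise directly in output space. Whether this is exactly the identity used in the paper is an assumption. The sketch below, with hypothetical names, shows the idea.

```python
import math
import random


def linear(W, x):
    """Plain matrix-vector product for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def noisy_linear_batch(W, batch, sigma, rng):
    """Per-example isotropic weight noise, sampled in output space.

    Equivalent in distribution to drawing a fresh E ~ N(0, sigma^2) for every
    example and computing (W + E) x, without materializing any E.
    """
    outputs = []
    for x in batch:
        scale = sigma * math.sqrt(sum(xi * xi for xi in x))
        y = linear(W, x)
        outputs.append([yi + scale * rng.gauss(0.0, 1.0) for yi in y])
    return outputs


# Hypothetical check: with x = (3, 4) and sigma = 0.1 the output noise
# should have standard deviation 0.1 * 5 = 0.5.
rng = random.Random(0)
W = [[1.0, 0.0]]
samples = [out[0] for out in noisy_linear_batch(W, [[3.0, 4.0]] * 20000, 0.1, rng)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

In a batched framework the same trick amounts to one clean matrix multiply plus one noise tensor shaped like the output, which keeps batched computation intact.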

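2606.12059 2026-06-11 cs.LG cs.NE nlin.AO New submission

Attention by Synchronization in Coupled Oscillator Networks

Fabio Pasqualetti, Taosha Guo

Affiliation: University of California, Irvine

AI summary: Proposes fixed-query oscillator attention based on Kuramoto synchronization dynamics, which computes attention on physical substrates without exponentiation or global reduction and outperforms softmax on keyword spotting and subject-verb agreement.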

Abstract

We address transformer attention on energy-constrained physical substrates. Softmax attention requires exponentiation and global reduction, operations with high energy cost on von Neumann hardware and no natural physical analog. We show that Kuramoto synchronization dynamics (which arise in electrical, mechanical, superconducting, and charge-density-wave oscillator arrays, among other physical systems) implement a well-defined attention operation without either. The resulting mechanism, fixed-query oscillator attention, replaces softmax's arithmetic with the equilibration of a gradient flow on the sphere: queries are learned anchors fixed on the sphere, and free oscillators evolve under Kuramoto-Lohe dynamics until they settle at positions encoding attention weights via cosine similarity. Because the computation is equilibration, it requires no exponentiation; the only global operation is an affine normalization at readout. The fixed point is provably unique and globally attractive from almost every initial condition, a guarantee that holds across every physical realization. Empirically, at the minimal hardware configuration (oscillator dimension $d_{\mathrm{osc}}$ = 2), oscillator attention outperforms softmax on keyword spotting (+1.00 pp) and on subject-verb agreement (+5.27 pp on hard sentences, with zero training failures versus one in five for softmax). On causal language modeling, where softmax retains an advantage, oscillator attention closes the gap as $d_{\mathrm{osc}}$ grows: from +11.09 PPL at $d_{\mathrm{osc}}$ = 2 to +2.98 PPL at $d_{\mathrm{osc}}$ = 32 on WikiText-2, and from +2.39 PPL at $d_{\mathrm{osc}}$ = 2 to +0.57 PPL at $d_{\mathrm{osc}}$ = 32 on TinyStories. The main objective of this work is not to replace softmax in software but to provide a mathematically grounded blueprint for accurate attention on physical substrates.

2606.12146 2026-06-11 cs.LG cs.AI New submission

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

Boyang Li, Yulin Wu, Sizhe Xu, Nuoxian Huang, Zhonghang Yuan, Shangyi Guo, Shu Yang, Takahiro Yabe

Affiliation: University of Science and Technology of China

AI summary: Proposes nD-RoPE, which generalizes rotary position embedding to arbitrary dimensions and achieves isotropy through a multi-scale regular-simplex wave-vector design, improving performance on image, video, and point-cloud tasks.

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Rotary Position Embedding (RoPE) is widely adopted in Transformer models, yet its extension to high-dimensional domains lacks a unified theoretical formulation. Most existing approaches either apply rotations independently along each axis or empirically mix frequencies, which limits cross-dimensional interactions and yields direction-dependent representations. To address these limitations, we propose nD-RoPE, a decomposition-free generalization of RoPE to arbitrary dimensions. From a translation-invariant formulation in continuous Hilbert space, we derive a spectral condition for isotropy that requires treating positions and frequencies as coupled $n$-dimensional vectors. We instantiate this formulation with a multi-scale regular-simplex wave-vector design, which provides non-degenerate spatial coverage and a symmetric, directionally balanced second-order response. Experiments across images, videos, and point clouds demonstrate consistent performance gains and improved generalization in high-dimensional settings.

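2606.12240 2026-06-11 cs.LG cs.AI New submission

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

Shilong Zong, Almuatazbellah Boker, Hoda Eldardiry

Affiliation: Virginia Tech

AI summary: Proposes a multi-rate mixture-of-experts framework that combines the multi-scale dynamics of liquid neural networks with attention mechanisms to improve the accuracy and efficiency of multivariate time-series modeling.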

Abstract

Multivariate time-series data often exhibit complex temporal dependencies, irregular sampling, and heterogeneous dynamics across multiple time scales, making accurate sequence modeling particularly challenging. Traditional recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, operate in discrete time and may struggle to effectively capture continuous and irregular temporal behaviors. Liquid Neural Networks (LNNs) address some of these limitations through continuous-time dynamics, but standard LNN architectures typically rely on a single dynamical system, limiting their ability to model heterogeneous temporal patterns. To address these challenges, we propose a Multi-Rate Mixture-of-Experts (MR-MoE) framework built on top of Liquid Neural Networks. In the proposed architecture, multiple LNN-based experts operate at distinct time scales, enabling the model to explicitly separate fast-changing dynamics from slow-evolving temporal trends. A gating network further enables adaptive expert specialization based on input conditions. In addition, we incorporate both feature-level and temporal attention mechanisms to improve robustness, interpretability, and long-range dependency modeling. Feature-level attention suppresses noisy or irrelevant variables, while temporal attention selectively focuses on informative historical states. We evaluate the proposed framework on a complex multivariate time-series prediction task and compare it against strong baselines, including LSTM, monolithic LNN, and standard MoE models. Experimental results demonstrate that the proposed MR-MoE framework consistently achieves improved AUROC and AUPRC performance while maintaining favorable computational efficiency. These results highlight the effectiveness of combining continuous-time dynamics, multi-scale expert decomposition, and adaptive attention mechanisms for time-series modeling.

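2606.12318 2026-06-11 cs.LG cs.AI New submission

Harness In-Context Operator Learning with Chain of Operators

Minghui Yang, Ling Guo, Liu Yang

Affiliations: Department of Mathematics, Shanghai Normal University; Department of Mathematics, National University of Singapore

AI summary: Proposes Chain of Operators (CHOP), a framework that builds a chain of explicit elementary transformations and a frozen ICON, improving the generalization of in-context operator networks to out-of-distribution operator tasks without fine-tuning and reducing inference error on a scalar conservation law and a mean-field control problem.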

详情

AI中文摘要

神经算子近似函数空间之间的映射，但通常对其他算子泛化能力差，需要微调或重新训练。上下文算子网络（ICON）通过向模型提供数值上下文来解决此问题，使模型从提示中学习特定算子并适应不同算子而无需微调。然而，ICON在分布外（OOD）算子任务上仍可能泛化失败。受大型语言模型（LLM）的提示工程成功启发，我们引入了算子链（CHOP），一种在不更新参数的情况下将冻结的ICON应用于OOD算子任务的框架。具体来说，CHOP构建了一个由显式初等变换和冻结ICON组成的算子链。在标量守恒律和平均场控制问题上的实验表明，与直接ICON评估相比，CHOP降低了相对推理误差，同时链中的每个算子保持可解释且具有封闭形式。在一个PDE族上构建的链进一步泛化到另一个不同的族，表明跨提示系统存在共享机制。

英文摘要

Neural operators approximate mappings between function spaces, but often generalize poorly to other operators and usually require fine-tuning or retraining. In-Context Operator Networks (ICON) address this issue by prompting the model with numerical context so that the model learns specific operators from prompts and adapts to different operators without fine-tuning. However, ICON may still fail to generalize to out-of-distribution (OOD) operator tasks. Inspired by the success of harness engineering for large language models (LLMs), we introduce Chain of Operators (CHOP), a framework that harnesses a frozen ICON for OOD operator tasks without updating its parameters. Specifically, CHOP constructs a chain of operators consisting of explicit elementary transformations and the frozen ICON. Experiments on a scalar conservation law and a mean-field control problem show that CHOP reduces relative inference error over direct ICON evaluation, while each operator in the chain remains interpretable and in closed form. A chain constructed on one PDE family further generalizes to a different family, indicating shared mechanisms across harness systems.
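
The chain structure described above (explicit elementary transformations wrapped around a frozen operator) can be illustrated with a toy case. This is a sketch under assumptions, not the CHOP construction itself: `frozen_operator` is a hypothetical closed-form stand-in for a frozen model that only solves unit-speed linear advection, and the rescaling transforms are chosen by hand for this one example.

```python
import math

T = 0.5  # final time the frozen operator handles (assumed for illustration)

def frozen_operator(v0):
    """Stand-in for a frozen model: maps v0 to the solution of
    v_t + v_y = 0 at time T, i.e. y -> v0(y - T)."""
    return lambda y: v0(y - T)

def chop_chain(u0, c):
    """Solve u_t + c * u_x = 0 at time T without touching the frozen operator.

    With v(y, t) = u(c * y, t), v satisfies the unit-speed equation,
    so the chain is: rescale input -> frozen operator -> undo rescaling.
    """
    pre = lambda y: u0(c * y)       # elementary input transformation
    v_T = frozen_operator(pre)      # frozen operator, parameters unchanged
    return lambda x: v_T(x / c)     # elementary output transformation

c = 3.0
u_T = chop_chain(math.sin, c)
x = 0.7
# Exact solution of the speed-c problem is u0(x - c * T); both should agree.
print(u_T(x), math.sin(x - c * T))
```

Both transformations stay in closed form, which is the property the abstract emphasizes; in the paper the inner operator is a trained ICON rather than an analytic formula.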


2606.12364 2026-06-11 cs.LG 新提交

你的大模型何时可操控？

Chenrui Fan, Yize Cheng, Ming Li, Soheil Feizi, Tianyi Zhou

发表机构 * University of Maryland, College Park（马里兰大学帕克分校）； MBZUAI, UAE（穆罕默德·本·扎耶德人工智能大学）

AI总结提出通过模型生成初期的内部状态预测激活操控是否成功，并利用该预测器优化操控强度搜索，降低解码成本。

详情

AI中文摘要

激活操控提供了一种轻量级的方法来控制语言模型在推理时的行为，但其成功与否严重依赖于提示、概念、模型和操控配置。寻找成功操控的范围和边界通常需要昂贵的网格搜索和对完整自回归生成的后验评估。在这项工作中，我们研究了是否可以从模型在生成过程初期（例如，生成前几个token后）的内部状态预测可操控性，以及如何利用这样的预测器来提高操控成功率。为此，我们首先引入了ASTEER，一个包含140万次操控生成的测试平台，涵盖150个概念，每个操控成功/失败均已标注。利用该测试平台，我们通过提取特征来比较操控前后跨层和初始解码步骤的隐藏状态，分析模型的早期解码动态。这些特征帮助我们理解操控效果如何沿层和token位置传播，为可操控性预测提供关键信息。然后，我们在这些特征上训练梯度提升决策树（GBDT）分类器，以预测干预是否会欠操控、成功或过操控，而无需完整生成。我们的预测器在未见过的概念上达到了约0.7的宏F1分数，表明早期隐藏状态编码了关于最终操控效果的大量结构化信息。我们进一步利用该可操控性预测器作为操控强度搜索的指导，以极小的解码成本实现了接近最优的性能。

英文摘要

Activation steering offers a lightweight approach to control language models' behavior at inference time, but whether it succeeds or fails heavily depends on the prompt, concept, model, and steering configuration. Finding the regime and boundaries of successful steering typically requires expensive grid searches and post-hoc evaluation of full autoregressive rollouts. In this work, we investigate whether steerability can be predicted from the model's internal states at the beginning of the generation process, e.g., after generating the first few tokens, and how to leverage such a predictor to improve steering success rate. To this end, we first introduce ASTEER, a testbed including 1.4M steered generations, spanning 150 concepts with each steering success/failure labeled. Leveraging this testbed, we analyze the model's early decoding dynamics by extracting features that compare hidden states before and after steering across layers and initial decoding steps. These features help us understand how steering's effects propagate along layers and token positions, which provide key information for steerability prediction. We then train a Gradient Boosting Decision Trees (GBDT) classifier on these features to predict whether an intervention will under-steer, succeed, or over-steer without requiring full rollout. Our predictor achieves around 0.7 macro-F1 score on unseen concepts, demonstrating that early hidden states encode substantial, structured information about eventual steering efficacy. We further leverage this steerability predictor as guidance for steering strength searching, achieving near-optimal performance with a small fraction of decoding cost.


2606.11673 2026-06-11 quant-ph cs.LG 交叉投稿

Higher-Order Token Interactions via Quantum Attention

高阶令牌交互的量子注意力机制

Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

发表机构 * RIKEN iTHEMS ； RIKEN AIP ； South China University of Technology（华南理工大学）； Columbia University（哥伦比亚大学）

AI总结提出量子高阶注意力（QHA），通过数据重上传和非克利福德纠缠器在浅电路中合成任意阶令牌交互，证明其表达能力超越经典自注意力，并具有可训练性保证，在遗传上位、带噪学习奇偶和图三角形检测中高效检测高阶交互。

详情

AI中文摘要

标准点积自注意力在单层中仅计算令牌间的成对（二阶）交互；表示一般的$k$阶交互已知需要在单层中使用超二次资源或通过深度组合。我们引入\textbf{量子高阶注意力（QHA）}，一种浅层、硬件可实现的量子注意力头，通过数据重上传和全对非克利福德纠缠器，在电路内部合成$k$阶令牌交互，并通过局部单量子比特读出暴露它们。我们证明：（i）表达能力分离：任何嵌入维度$m$、$H$个头和$p$位精度满足$mHp=o(N/\log\log N)$的单个标准自注意力层无法表示一个QHA头以电路深度$O(\log k)$（$O(k)$个两量子比特门）表示的$k$阶相关族；（ii）其局部设计实例的可训练性保证：使用局部读出和$O(\log n)$深度，梯度方差为$\Omega(1/\mathrm{poly}(n))$（无贫瘠高原），我们通过实验确认——同时明确我们基准测试的更具表达力的全对实例是经验训练的，并显示指数衰减的梯度。实验上，在参数预算小$6.5\times$的情况下，QHA从不相交输入中泛化每个阶$k\le6$的隐藏子集奇偶性，而更大的经典注意力头在阶~2之后崩溃；与理论一致，优势的大小跟踪目标的傅里叶度——奇偶性最大，当存在低阶结构时缩小。作为一个应用，QHA在三个领域——遗传上位、带噪学习奇偶和图三角形检测——作为紧凑的高阶交互检测器，在最小的参数预算下达到噪声上限，而领域标准的线性方法失败。

英文摘要

Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce \textbf{Quantum Higher-Order Attention (QHA)}, a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-$k$ token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth the gradient variance is $Ω(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm empirically -- while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas the larger classical attention head collapses past order~2; consistent with theory, the size of the advantage tracks the target's Fourier degree - largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains - genetic epistasis, learning-parity-with-noise, and graph triangle detection - reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.
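
The remark that the advantage tracks the target's Fourier degree can be made concrete with a small check. The sketch below is a generic Boolean Fourier computation, not an experiment from the paper: it shows that order-$k$ parity on $\{-1,+1\}^k$ has zero correlation with every monomial of order below $k$, so a model restricted to low-order (for example pairwise) terms has no signal to fit. The helper names are illustrative.

```python
from itertools import combinations, product

def parity(bits):
    """Product of +/-1 entries, i.e. parity in the +/-1 encoding."""
    out = 1
    for b in bits:
        out *= b
    return out

def fourier_coefficient(f, subset, n):
    """Average of f(x) * prod_{i in subset} x_i over all x in {-1, +1}^n."""
    total = 0
    for x in product((-1, 1), repeat=n):
        chi = 1
        for i in subset:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

k = 4
coeffs = {
    subset: fourier_coefficient(parity, subset, k)
    for order in range(k + 1)
    for subset in combinations(range(k), order)
}
low_order = [c for subset, c in coeffs.items() if len(subset) < k]
print(max(abs(c) for c in low_order), coeffs[tuple(range(k))])
```

All Fourier weight sits on the single degree-$k$ term, which is why parity is the extreme case; targets with some low-order structure leave lower-order models something to exploit.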


2606.11680 2026-06-11 cs.AI cs.CL cs.LG 交叉投稿

从不可约元组合线性层

Travis Pence, Daisuke Yamada, Vikas Singh

发表机构 * University of Wisconsin-Madison（威斯康星大学麦迪逊分校）

AI总结提出用Clifford代数将线性层分解为双向量（几何基元）的组合，仅需O(log^2 d)参数，在LLM注意力投影中匹配强基线性能。

Comments 35 Pages, 11 Tables, 6 Figures, Appearing in NeurIPS 2025

详情

Journal ref: Advances in Neural Information Processing Systems 38 (2025)

AI中文摘要

当代大型模型常表现出暗示存在低级基元的行为，这些基元组合成功能更丰富的模块，但这些基本构建块仍未被很好理解。我们通过询问：能否从最小几何基元集合中识别/合成线性变换？来研究线性层中的这种组合结构。利用Clifford代数，我们证明线性层可以表示为双向量（编码有向平面的几何对象）的组合，并引入一种可微算法将其分解为转子乘积。这种构造仅需O(log^2 d)个参数，而稠密矩阵需要O(d^2)。应用于LLM注意力层中的键、查询和值投影，我们的基于转子的层匹配了块Hadamard和低秩近似等强基线的性能。我们的发现为这些几何基元如何在深度模型中组合成更高层次功能提供了代数视角。

英文摘要

Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.
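
As a rough illustration of composing a transformation from plane-wise primitives, the sketch below applies a product of simple rotations, each acting in one coordinate plane (the matrix counterpart of exponentiating a basis bivector). It is an assumption-laden toy: the plane list and angles are made up, the paper's differentiable decomposition and its O(log^2 d) parameterization are not reproduced, and a product of rotors alone yields only rotations, so how general linear layers are covered is not shown here.

```python
import math

def apply_plane_rotation(x, i, j, theta):
    """Rotate vector x by angle theta in the (i, j) coordinate plane."""
    c, s = math.cos(theta), math.sin(theta)
    y = list(x)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y

def apply_rotor_product(x, rotors):
    """Apply a sequence of plane rotations; one angle per plane is
    the only parameter, so the cost is linear in len(rotors)."""
    for i, j, theta in rotors:
        x = apply_plane_rotation(x, i, j, theta)
    return x

def norm(v):
    return math.sqrt(sum(t * t for t in v))

# Hypothetical planes and angles, chosen only for illustration.
rotors = [(0, 1, 0.3), (2, 3, -1.1), (1, 2, 0.7)]
x = [1.0, 2.0, -1.0, 0.5]
y = apply_rotor_product(x, rotors)
print(norm(x), norm(y))  # rotations preserve the Euclidean norm
```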


2511.00044 2026-06-11 cs.LG nlin.AO 版本更新

Time-multiplexed layer reuse for physical neural networks

物理神经网络的时间复用层重用

Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki

发表机构 * Graduate School of Information Science and Technology, The University of Tokyo（信息科学与技术研究生学校，东京大学）

AI总结针对物理神经网络权重调整慢的瓶颈，提出TIDAL-Net，通过时间复用层增加有效深度，在图像分类和自然语言处理任务上提升性能。

详情

AI中文摘要

物理神经网络（PNN）是下一代计算的有前途的候选者，但现有演示仍比现代数字神经网络小几个数量级，而现代数字神经网络的最新进展是由可训练参数的快速增长驱动的。这种情况类似于早期数字神经网络的限制，这导致了关于参数重用的想法。我们研究了类似高效的硬件架构可能是什么样子，特别关注PNN中权重重新调整的常见瓶颈。我们提出了时间索引深度交替层网络（TIDAL-Net），它占据循环神经网络和深度神经网络之间的中间状态，专门针对常见PNN原型的规模和限制。TIDAL-Net利用许多PNN中快速前向动力学和缓慢可训练权重与偏置之间的时间尺度分离，通过逐层时间复用来增加有效深度，同时限制实现成本。在图像分类和自然语言处理任务上的数值实验表明，TIDAL-Net在仅对传统PNN进行微小修改的情况下提高了性能。

英文摘要

Physical neural networks (PNNs) are promising candidates for next-generation computing, but existing demonstrations remain several orders of magnitude smaller than modern digital neural networks, whose recent advances have been driven by rapid growth in trainable parameters. This situation resembles the constraints of early digital neural networks, which led to ideas around parameter reuse. We investigate what similarly efficient hardware architectures may look like, focusing specifically on the common bottleneck of slow re-adjustment of the weights in PNNs. We propose the Time-Indexed Deep Alternating Layers Network (TIDAL-Net), which occupies an intermediate regime between recurrent and deep neural networks, specifically aimed at the scales and restrictions of common PNN prototypes. TIDAL-Net leverages the timescale separation found in many PNNs between fast forward dynamics and slowly trainable weights and biases, using layer-by-layer time multiplexing to increase effective depth while limiting implementation cost. Numerical experiments on image classification and natural language processing tasks show that TIDAL-Net improves performance with only minor modifications to conventional PNNs.


2601.14792 2026-06-11 cs.LG 版本更新

Robustness of Mixtures of Experts to Feature Noise

混合专家模型对特征噪声的鲁棒性

Dong Sun, Rahul Nittala, Rebekka Burkholz

发表机构 * Dong Sun（东Sun）； Rahul Nittala（拉胡尔·尼塔拉）； Rebekka Burkholz（蕾贝卡·布克霍尔兹）

AI总结研究混合专家模型在特征噪声下的鲁棒性，发现稀疏专家激活能作为噪声滤波器，相比密集网络具有更低的泛化误差、更强的鲁棒性和更快的收敛速度。

Comments ICML 2026

2602.10743 2026-06-11 cs.LG 版本更新

Kalman Linear Attention: Parallel Bayesian Filtering For Efficient Language Modelling and State Tracking

Kalman线性注意力：用于高效语言建模和状态跟踪的并行贝叶斯滤波

Vaisakh Shaj, Cameron Barker, Aidan Scannell, Andras Szecsenyi, Elliot J. Crowley, Amos Storkey

发表机构 * University of Cambridge（剑桥大学）

AI总结提出Kalman线性注意力层，将序列混合重写为信息形式的精确贝叶斯滤波，实现时间并行推理，在相同计算成本下比GLA更具表达力，并在状态跟踪任务中超越线性SSM和注意力。

Comments Accepted at ICML 2026. An earlier version of this work was presented at the 1st Workshop on Epistemic Intelligence in Machine Learning (EIML) at EurIPS 2025

详情

AI中文摘要

状态空间语言模型如Mamba和门控线性注意力（GLA）提供了线性复杂度、可并行的Transformer替代方案，但其线性状态更新限制了表达力和鲁棒的状态跟踪。我们从概率角度弥合这一差距，将序列混合视为精确贝叶斯滤波，以卡尔曼滤波为核心原语。经典卡尔曼滤波提供有原则的状态和不确定性估计，但被认为是固有顺序的；我们展示了将其重参数化为信息形式后，更新变为关联扫描——因此每个token的循环更新是非线性的（莫比乌斯/精度递归），但保持时间并行。由此产生的Kalman线性注意力（KLA）层是一个即插即用的序列混合器，执行时间并行概率推理，携带显式的信念状态不确定性，并且在相同计算成本下比GLA风格的线性更新具有严格更强的表达力。这种表达力直接转化为更强的状态跟踪：KLA解决了线性SSM和注意力无法解决的排列组合（$A_5$）任务，同时保持扫描并行。作为即插即用原语，它在合成token操作和零样本常识基准测试中匹配或改进了现代SSM和GLA，并且是首批在十亿token规模下训练的堆叠贝叶斯滤波原语之一。

英文摘要

State-space language models such as Mamba and gated linear attention (GLA) offer linear-complexity, parallelisable alternatives to transformers, but their linear state updates limit expressivity and robust state tracking. We close this gap from a probabilistic angle, casting sequence mixing as exact Bayesian filtering with the Kalman filter as the core primitive. Classical Kalman filters give principled state and uncertainty estimates but are viewed as inherently sequential; we show that reparameterising them in information form turns their updates into an associative scan - so the per-token recurrent update is non-linear (a Möbius/precision recursion) yet remains temporally parallel. The resulting Kalman Linear Attention (KLA) layer is a drop-in sequence mixer that performs time-parallel probabilistic inference, carries an explicit belief-state uncertainty, and is strictly more expressive than GLA-style linear updates at the same computational cost. This expressivity translates directly into stronger state tracking: KLA solves permutation-composition ($A_5$) tasks that linear SSMs and attention cannot, while staying scan-parallel. As a drop-in primitive it also matches or improves on modern SSMs and GLAs across synthetic token-manipulation and zero-shot commonsense benchmarks, and is among the first stacked Bayesian-filtering primitives trained at the billion-token scale.
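
Why a non-linear Kalman recursion can still be parallelised is easiest to see in the scalar case. The sketch below is an illustration under assumptions, not the KLA layer: it uses a scalar model $x_t = a x_{t-1} + w$, $y_t = x_t + v$ with noise variances $q$ and $r$, tracks only the posterior variance (the mean update is omitted), and works with the variance rather than the paper's information-form parameterisation. The variance update $P \mapsto (a^2 r P + q r)/(a^2 P + q + r)$ is a Möbius map, and Möbius maps compose by 2x2 matrix multiplication, which is associative and therefore amenable to a scan.

```python
from functools import reduce

def step_matrix(a, q, r):
    """2x2 matrix of the Mobius map P -> (a^2 r P + q r) / (a^2 P + q + r)."""
    return ((a * a * r, q * r), (a * a, q + r))

def matmul2(m, n):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
        (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]),
    )

def mobius(m, p):
    """Apply the Mobius map encoded by matrix m to the scalar p."""
    return (m[0][0] * p + m[0][1]) / (m[1][0] * p + m[1][1])

def sequential_variance(p0, params):
    """Textbook predict/update recursion for the posterior variance."""
    p = p0
    for a, q, r in params:
        p_pred = a * a * p + q
        p = p_pred * r / (p_pred + r)
    return p

# Per-step (a, q, r) values are made up for the example.
params = [(0.9, 0.1, 0.5), (1.1, 0.2, 0.3), (0.8, 0.05, 1.0), (1.0, 0.1, 0.2)]
p0 = 1.0
mats = [step_matrix(*p) for p in params]

# Left-to-right composition (later steps multiply from the left).
chained = reduce(lambda acc, m: matmul2(m, acc), mats)
# Tree-shaped composition, as a parallel scan would group it.
tree = matmul2(matmul2(mats[3], mats[2]), matmul2(mats[1], mats[0]))

print(sequential_variance(p0, params), mobius(chained, p0), mobius(tree, p0))
```

The three printed values should agree up to rounding, since the grouping of the matrix products does not matter.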


2603.05573 2026-06-11 cs.LG 版本更新

Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

为什么深度在可并行化序列模型中重要：一个李代数视角

Gyuryang Heo, Timothy Ngotiaoco, Kazuki Irie, Samuel J. Gershman, Bernardo L. Sabatini

发表机构 * Howard Hughes Medical Institute, Department of Neurobiology, Harvard Medical School（霍华德·休斯医学研究所，哈佛医学院神经生物学系）； Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University（自然与人工智能研究学院，哈佛大学）； Department of Psychology and Center for Brain Science, Harvard University（心理学系和脑科学中心，哈佛大学）

AI总结从李代数控制视角，研究可并行化序列模型（如Transformer变体和状态空间模型）的表达能力与深度关系，证明误差随深度增加呈指数下降。

Comments v2: Format update; split former Theorem 3.4 into Theorem 3.4 and Corollary 3.5 for clarity; corrected an indexing error affecting Corollary 3.6, Proposition 3.7, and Figure 2

2605.04853 2026-06-11 cs.LG 版本更新

Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations

非线性色散方程的混合迭代神经低正则积分器

Zhangyong Liang, Huanhuan Gao

发表机构 * National Center for Applied Mathematics, Tianjin University（天津大学应用数学中心）； School of Mechanical and Aerospace Engineering, Jilin University（吉林大学机械与航空航天工程学院）

AI总结提出HIN-LRI混合框架，用轻量神经网络学习并校正经典低正则积分器的结构截断误差，通过显式时间步缩放保证稳定性，在粗糙数据色散方程上提升精度并保持泛化能力。

详情

AI中文摘要

我们提出HIN-LRI，一种混合框架，通过训练一个神经算子来校正经典数值求解器的结构截断误差，从而增强该求解器。基础低正则积分器为非线性色散偏微分方程提供一致的一阶近似，而一个在低维潜在流形上运行的轻量神经网络学习解析方法无法闭合的残差缺陷。神经校正上的显式时间步缩放确保其Lipschitz贡献为$\mathcal{O}(\tau)$，从而产生一个在步长上一致有界且与空间分辨率无关的Gronwall稳定性因子。该网络通过求解器在环的目标进行端到端训练，该目标展开完整迭代并在Bourgain型范数中惩罚轨迹误差，使学习与多步求解器动态对齐，而非孤立的单步目标。在给定假设下，全局误差满足$C(\varepsilon_{net}+\delta)\\,\tau^\gamma\ln(1/\tau)$，其中$\varepsilon_{net}$衡量网络逼近质量，$\delta$衡量训练不足。在三个具有粗糙数据的色散基准上的实验表明，HIN-LRI在精度上优于解析积分器、分裂方法和神经PDE替代模型，具有稳定的空间细化、有效的分布外迁移和适度的在线开销。

英文摘要

We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver's structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling on the neural correction ensures that its Lipschitz contribution remains $\mathcal{O}(τ)$, yielding a Gronwall stability factor bounded uniformly in the step size and independent of the spatial resolution. The network is trained end-to-end through a solver-in-the-loop objective that unrolls the full iteration and penalises trajectory error in a Bourgain-type norm, aligning learning with multi-step solver dynamics rather than isolated one-step targets. Under stated assumptions, the global error satisfies $C(\varepsilon_{net}+δ)\,τ^γ\ln(1/τ)$, where $\varepsilon_{net}$ measures the network approximation quality and $δ$ the training shortfall. Experiments on three dispersive benchmarks with rough data show that HIN-LRI improves accuracy over analytical integrators, splitting methods, and neural PDE surrogates, with stable spatial refinement, effective out-of-distribution transfer, and modest online overhead.


2605.04893 2026-06-11 cs.LG cs.CL stat.ML 版本更新

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

自注意力作为传输：对称谱诊断的极限

Dominik Dahlem, Diego Maniloff, Mac Misiura

发表机构 * Red Hat AI（红帽人工智能）

AI总结研究语言模型注意力路由的两种失效形状（过度集中或过度分散），证明对称谱诊断对方向不敏感，并揭示因果注意力中传输容量的理论下限，提出基于容量和方向的双轴诊断方法。

Comments 48 pages, 6 figures, 7 tables; 81-page online supplement (proofs, additional experiments, dataset statistics) as an ancillary file

详情

AI中文摘要

当语言模型处理幻觉响应时，其注意力路由往往以两种形状之一失效：过度集中在狭窄的位置集合上，或者分散得如此广泛以至于相关性被稀释，而失效的形状携带诊断信号。我们研究这些形状作为诊断特征，从在基准标记响应的\emph{强制评分}下计算的注意力矩阵中得出，而不是在实时生成期间。一类广泛使用的谱方法分析度归一化注意力算子的对称分量，该算子控制传输\emph{容量}；我们证明该算子的每个转置不变谱诊断在结构上是\emph{方向盲的}（它无法区分算子与其转置，因此无法检测信息流方向），并且盲定理的逆定理将任何Lipschitz诊断的转置敏感性限制为不对称系数$G$。将其与规范因果架构的闭式二分-Cheeger景观配对，我们证明均匀因果注意力满足一个与$n$无关的下界$\phi \ge 1/5$，而窗口注意力以$O(w/n)$穿透下界；失效模式在形状上不同，而不仅仅在数值上不同。这个下界是一个理想化架构的基准，而不是经验吸引子：穿透它的真实注意力头的比例本身就是一个架构特征。由此产生的双轴诊断（$\phi$表示容量，$G$表示方向）产生一个可证伪的极性预测：瓶颈主导和分散主导的基准应表现出相反的极性。在长度控制评估下，传输特征在测试的仅解码器、仅编码器和编码器-解码器模型中保持可解释的信号（0.62-0.84 LC-AUROC），极性在HaluEval和MedHallu之间如预测般反转。

英文摘要

When a language model processes a hallucinated response, its attention routing tends to fail in one of two shapes: over-concentrating on a narrow set of positions, or spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. We study these shapes as a diagnostic characterization, computed from attention matrices under \emph{forced scoring} of benchmark-labeled responses rather than during live generation. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport \emph{capacity}; we prove that every transpose-invariant spectral diagnostic of this operator is structurally \emph{orientation-blind} (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a converse to the blindness theorem bounding any Lipschitz diagnostic's transpose sensitivity by the asymmetry coefficient $G$. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $ϕ\ge 1/5$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. This floor is an idealized-architecture benchmark, not an empirical attractor: the fraction of real attention heads that pierce it is itself an architectural signature. The resulting two-axis diagnostic ($ϕ$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (0.62-0.84 LC-AUROC) across the tested decoder-only, encoder-only, and encoder-decoder models, with polarity reversing as predicted between HaluEval and MedHallu.
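
The orientation-blindness claim rests on a simple fact that can be checked directly: a matrix and its transpose have the same symmetric part, so anything computed from the symmetric part alone cannot tell them apart. The sketch below shows this on a 3x3 uniform causal attention matrix. It is an illustration only: degree normalization is omitted, and `signed_flow` is a hypothetical direction-sensitive statistic introduced here for contrast, not the paper's asymmetry coefficient $G$.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def symmetric_part(a):
    """(A + A^T) / 2, the component that symmetric spectral methods analyze."""
    at = transpose(a)
    n = len(a)
    return [[0.5 * (a[i][j] + at[i][j]) for j in range(n)] for i in range(n)]

def signed_flow(a):
    """Hypothetical statistic: mass below the diagonal minus mass above it.
    It flips sign under transposition, so it does see direction."""
    lower = upper = 0.0
    for i, row in enumerate(a):
        for j, value in enumerate(row):
            if i > j:
                lower += value
            elif i < j:
                upper += value
    return lower - upper

# Uniform causal attention over 3 tokens: row i attends equally to tokens 0..i.
causal = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
]
anti_causal = transpose(causal)

print(symmetric_part(causal) == symmetric_part(anti_causal))
print(signed_flow(causal), signed_flow(anti_causal))
```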


2605.15435 2026-06-11 cs.LG cs.NE 版本更新

On the Stability of Growth in Structural Plasticity

结构塑性中增长的稳定性

Lute Lillo, Nick Cheney

发表机构 * University of Vermont（佛蒙特大学）

AI总结本文研究了结构塑性中增长与剪枝的稳定性差异，指出生长在优化轨迹中插入新单元体，而剪枝则在训练初期选择已有单元。生长在图像分类任务中表现更优，但需足够时间整合新单元以提高适应性。

详情

AI中文摘要

标准深度学习管道通常在训练前选择网络架构并保持不变。相比之下，模型可以在训练过程中通过剪枝现有隐藏单元或生长新单元来适应。尽管增长对自适应和持续学习系统有吸引力，但本文表明增长并非单纯是剪枝的逆过程。剪枝在训练初期选择参与训练的单元，而增长在已专业化的优化轨迹中插入新单元。新生单元通常在正向计算中活跃但反向信号较弱。在小型MLP基准中此劣势较小，但在更难的图像分类设置中变得明显。在这些设置中，Grow在结构编辑过程中能获得高最终精度，而Prune在训练轨迹平均性能或重新训练稀疏网络时表现更优。针对优化器状态、插入、选择和可训练性等干预表明，提高新生单元的整合能改善适应性表现，但不自动产生更好的最终子网络。在压力塑性损失的持续学习基准中，Grow在新单元有足够时间整合时表现竞争。这些结果表明，Grow不应仅作为架构搜索操作符，而应作为时间敏感的优化过程，其成功取决于插入稳定性。

英文摘要

Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. In contrast, a model can also be adapted by editing its structure during training, for example by pruning existing hidden-neuron units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, \textsc{Grow} can achieve high final accuracy during the structural-editing procedure, while \textsc{Prune} is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks stressing plasticity loss, \textsc{Grow} becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that \textsc{Grow} should be evaluated not only as an architecture-search operator, but as a time-sensitive optimization process whose success depends on insertion stability.


2606.08343 2026-06-11 cs.LG 版本更新

GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators

GENERIC-FNO：将能量守恒和熵产生嵌入傅里叶神经算子

Jason Sulskis, Sathya Ravi

发表机构 * University of Illinois at Chicago（伊利诺伊大学芝加哥分校）； Georgia Tech Research Institute（佐治亚理工学院研究所）

AI总结提出GENERIC-FNO，首个在函数空间直接嵌入非平衡热力学完整GENERIC结构的神经算子，通过秩一投影精确满足退化条件，实现能量守恒与熵产生，在超分辨率下保持结构保证。

Comments Under review at TMLR

详情

AI中文摘要

我们引入了GENERIC-FNO，这是第一个将非平衡热力学的完整GENERIC（度量-辛）结构——可逆、能量守恒动力学和不可逆、熵产生动力学通过退化条件耦合——直接嵌入函数空间的神经算子。现有的保结构神经算子最多强制执行单一守恒律或可逆（哈密顿）结构，而热力学一致的学习仅限于有限维、图或粒子系统。GENERIC-FNO填补了这一空白：它将能量和熵泛函学习为神经算子，并将泊松和摩擦算子参数化为对角傅里叶乘子，夹在秩一投影之间，通过构造精确满足退化条件，无需惩罚项、更新投影或残差。退化恒等式对任何初始化、维度或分辨率都达到机器精度（残差~10^-13），因此连续时间动力学守恒学习的能量并精确产生熵；显式时间步进仅增加小的O(dt^2)漂移（每步残差~10^-6）。我们进一步指出，给定流的(E,S,L,M)分解并不唯一，并引入了一个规范不变的耗散诊断，独立于学习的泛函分离可逆和耗散动力学。在三个算子主干（1D/2D FNO和DeepONet）和四个涵盖可逆、耗散和混合机制的PDE上，GENERIC-FNO在4倍超分辨率范围（64到256）内零样本保持其精确结构保证，恢复物理耗散的真实顺序，并与强无约束和能量惩罚基线竞争，在相当或更少参数的情况下在多个耗散和混合问题上优于它们。

英文摘要

We introduce GENERIC-FNO, the first neural operator to embed the full GENERIC (metriplectic) structure of nonequilibrium thermodynamics -- reversible, energy-conserving dynamics and irreversible, entropy-producing dynamics coupled through the degeneracy conditions -- directly in function space. Existing structure-preserving neural operators enforce at most a single conservation law or reversible (Hamiltonian) structure, while thermodynamically consistent learning has been confined to finite-dimensional, graph, or particle systems. GENERIC-FNO closes this gap: it learns the energy and entropy functionals as neural operators and parameterizes the Poisson and friction operators as diagonal Fourier multipliers sandwiched between rank-one projections that enforce the degeneracy conditions exactly, by construction, with no penalty term, update projection, or residual. The degeneracy identities hold to machine precision (residuals ~10^-13) for any initialization, dimension, or resolution, so the continuous-time dynamics conserve the learned energy and produce entropy exactly; the explicit time stepping adds only a small O(dt^2) drift (per-step residual ~10^-6). We further note that the (E,S,L,M) decomposition of a given flow is not unique, and introduce a gauge-invariant dissipation diagnostic separating reversible from dissipative dynamics independently of the learned functionals. Across three operator backbones (1D/2D FNOs and DeepONet) and four PDEs spanning reversible, dissipative, and mixed regimes, GENERIC-FNO preserves its exact structural guarantees zero-shot across a 4x super-resolution range (64 to 256), recovers the ground-truth ordering of physical dissipation, and is competitive with strong unconstrained and energy-penalized baselines, outperforming them on several dissipative and mixed problems at comparable or fewer parameters.
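
How rank-one projections can enforce the degeneracy conditions by construction is sketched below in finite dimensions. This is a toy under stated assumptions: plain vectors stand in for functional gradients, a fixed skew-symmetric matrix and a non-negative diagonal stand in for the learned Fourier multipliers, and all numeric values are made up. The Poisson operator is $L = P_S A P_S$ and the friction operator is $M = P_E D P_E$, where $P_g$ projects out the direction $g$, so $L\nabla S = 0$ and $M\nabla E = 0$ hold for any $A$ and $D$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def project_out(v, g):
    """Rank-one projection: remove the component of v along g."""
    coef = dot(v, g) / dot(g, g)
    return [vi - coef * gi for vi, gi in zip(v, g)]

def poisson_apply(v, grad_s, skew):
    """L v = P_S A P_S v with A skew-symmetric, so L grad_s = 0."""
    return project_out(matvec(skew, project_out(v, grad_s)), grad_s)

def friction_apply(v, grad_e, diag):
    """M v = P_E D P_E v with D diagonal and non-negative, so M grad_e = 0."""
    w = project_out(v, grad_e)
    w = [d * wi for d, wi in zip(diag, w)]
    return project_out(w, grad_e)

# Illustrative values only.
grad_e = [1.0, 2.0, -1.0]
grad_s = [0.5, -1.0, 2.0]
skew = [[0.0, 1.5, -0.7], [-1.5, 0.0, 0.4], [0.7, -0.4, 0.0]]
diag = [0.3, 1.0, 2.0]

# GENERIC right-hand side: dz/dt = L grad_E + M grad_S.
rhs = [a + b for a, b in zip(poisson_apply(grad_e, grad_s, skew),
                              friction_apply(grad_s, grad_e, diag))]
energy_rate = dot(grad_e, rhs)    # should vanish up to rounding
entropy_rate = dot(grad_s, rhs)   # should be non-negative
print(energy_rate, entropy_rate)
```

No penalty term appears anywhere: the conservation and production properties follow from the operator structure alone, which mirrors the design choice described in the abstract.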


2606.11508 2026-06-11 cs.LG q-bio.QM 新提交

缺失模态下的多模态学习中的潜在世界恢复

Hui Wang, Tianyu Ren, Joseph Butler, Christopher Baker, Karen Rafferty, Simon McDade

发表机构 * Queen's University Belfast（贝尔法斯特女王大学）

AI总结提出潜在世界恢复（LWR）框架，通过邻居潜在对齐和可用性感知融合，在缺失模态下实现鲁棒的多模态预测，避免显式重构误差。

详情

AI中文摘要

我们研究了缺失模态下的多模态学习，特别受到生物科学应用的启发，在这些应用中，当需要做出决策时，异构模态通常仅部分可用。我们提出了潜在世界恢复（LWR），这是一个基于两个关键思想的框架：(i) 来自不同模态的特定模态嵌入在共享潜在空间中对齐，以及 (ii) 通过仅融合在训练和推理时实际可用的模态嵌入来构建统一表示。LWR 不填补缺失模态或要求固定的模态集，而是将每个模态视为对底层潜在状态的部分感知，并直接从观察到的模态执行可用性感知表示学习。这种基于邻居的潜在对齐和可用性感知模态融合的结合，使得在部分观测下能够进行鲁棒的多模态预测，同时避免了显式重构缺失模态带来的误差传播。我们在真实世界的不完整多组学基准上评估了所提出的框架，并证明它为下游任务（如癌症表型分类和生存预测）提供了一种有效的方法。

英文摘要

We study multimodal learning under missing modalities, with particular motivation from bioscience applications in which heterogeneous modalities are often only partially available when decisions need to be made. We propose Latent World Recovery (LWR), a framework built on two key ideas: (i) modality-specific embeddings from different modalities are aligned in a shared latent space, and (ii) a unified representation is constructed by fusing only the embeddings of the modalities that are actually available at both training and inference time. Rather than imputing missing modalities or requiring a fixed modality set, LWR treats each modality as a partial perception of an underlying latent state and performs availability-aware representation learning directly from the observed modalities. This combination of neighbor-based latent alignment and availability-aware modality fusion enables robust multimodal prediction under partial observation, while avoiding error propagation from explicit reconstruction of missing modalities. We evaluate the proposed framework on real-world incomplete multi-omics benchmarks and demonstrate that it provides an effective approach to downstream tasks such as cancer phenotype classification and survival prediction.
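
Availability-aware fusion, idea (ii) above, can be sketched in a few lines. The version below assumes mean pooling over whichever modality embeddings are present; the pooling rule, the function name `fuse_available`, and the modality names are all hypothetical, and the alignment step (i) is not shown.

```python
def fuse_available(embeddings):
    """Fuse only the modalities that are present.

    embeddings maps a modality name to a list of floats, or to None when
    that modality is missing for the sample. Missing modalities are
    skipped rather than imputed.
    """
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    return [sum(e[k] for e in present) / len(present) for k in range(dim)]

# One sample with a missing modality (names are illustrative).
sample = {"rna": [1.0, 3.0], "methylation": None, "cnv": [3.0, 5.0]}
print(fuse_available(sample))
```

Because the same function is used at training and inference time, the model never sees a reconstructed stand-in for a missing modality.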


2606.11570 2026-06-11 stat.ML cs.LG stat.ME 交叉投稿

跨层离散概念发现用于解释语言模型

Ankur Garg, Xuemin Yu, Hassan Sajjad, Samira Ebrahimi Kahou

发表机构 * University of Washington（华盛顿大学）

AI总结提出跨层向量量化变分自编码器（CLVQ-VAE），通过离散向量量化瓶颈将残差流中的重复特征压缩为紧凑可解释的概念向量，在三个数据集上优于聚类、单层VQ-VAE和稀疏自编码器基线。

详情

AI中文摘要

由于残差流的存在，解释语言模型仍然具有挑战性，残差流在相邻层之间线性混合和复制特征，导致单层分析忽略这种跨层结构。跨层稀疏自编码器（SAE）解决了层混合问题，但在连续空间中操作，概念分散在许多神经元上，没有清晰的边界。我们引入了跨层向量量化变分自编码器（CLVQ-VAE），这是一种新颖的框架，通过离散向量量化瓶颈将较低层的表示映射到较高层，将重复的残差流特征压缩为紧凑、可解释的概念向量。我们的方法结合了基于top-k温度的采样和指数移动平均（EMA）码本更新，在保持码本多样性的同时，对离散潜在空间进行受控探索。在基于编码器和解码器的模型上，针对ERASER-Movie、Jigsaw和AGNews数据集，CLVQ-VAE在三个评估轴上优于聚类、单层向量量化变分自编码器（VQ-VAE）和稀疏自编码器（SAE）基线：移除识别出的概念使模型准确率下降高达93%，LLM评判员在66.7%的比较中将我们的概念排在首位，人类标注者从我们的可视化中恢复模型预测的准确率为78%，而聚类为54%。

英文摘要

Interpreting language models remains challenging because of the residual stream, which linearly mixes and duplicates features across adjacent layers, causing single-layer analyses to miss this cross-layer structure. Cross-layer sparse autoencoders (SAEs) address layer mixing but operate in continuous space, where concepts split across many neurons without clear boundaries. We introduce Cross-Layer Vector Quantized-Variational Autoencoder (CLVQ-VAE), a novel framework that maps representations from a lower layer to a higher layer through a discrete vector-quantization bottleneck, collapsing duplicated residual-stream features into compact, interpretable concept vectors. Our approach combines top-k temperature-based sampling with exponential moving average (EMA) codebook updates, providing controlled exploration of the discrete latent space while maintaining codebook diversity. Across both encoder- and decoder-based models on ERASER-Movie, Jigsaw, and AGNews, CLVQ-VAE outperforms clustering, single-layer vector quantized-variational autoencoder (VQ-VAE), and sparse autoencoder (SAE) baselines across three evaluation axes: removing identified concepts drops model accuracy by up to 93%, LLM judges rank our concepts first in 66.7% of comparisons, and human annotators recover model predictions from our visualizations with 78% accuracy versus 54% for clustering.
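
The two codebook mechanisms named in the abstract, top-k temperature-based sampling and EMA codebook updates, can be sketched as follows. This is a simplified illustration under assumptions: the EMA here moves a single code towards a single input (standard VQ-VAE implementations track running counts and sums per code), the sampling weights are a softmax over negative squared distances, and all function and variable names are hypothetical.

```python
import math
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def topk_sample(x, codebook, k, temperature, rng):
    """Pick a code index among the k nearest codes, with probability
    proportional to exp(-distance / temperature)."""
    nearest = sorted((sq_dist(x, c), idx) for idx, c in enumerate(codebook))[:k]
    d_min = nearest[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in nearest]
    return rng.choices([idx for _, idx in nearest], weights=weights, k=1)[0]

def ema_update(codebook, idx, x, decay):
    """Move the selected code towards x with an exponential moving average."""
    codebook[idx] = [decay * c + (1 - decay) * xi for c, xi in zip(codebook[idx], x)]

rng = random.Random(0)
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
x = [0.9, 1.2]

idx = topk_sample(x, codebook, k=2, temperature=0.5, rng=rng)
ema_update(codebook, idx, x, decay=0.9)
print(idx, codebook[idx])
```

With `k=1` the sampler reduces to ordinary nearest-code quantization; larger `k` and higher temperature let less-used codes receive updates, which is one way to keep the codebook from collapsing.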


2507.21164 2026-06-11 cs.LG cs.AI eess.IV stat.ML 版本更新

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

OCSVM引导的无监督异常检测表示学习

Nicolas Pinon, Robin Trombetta, Carole Lartizien

发表机构 * Univ. Lyon（里昂大学）； CNRS UMR 5220（国家科学研究中心UMR 5220）； Inserm U1294（法国国家医学研究院U1294）； INSA Lyon（里昂国立应用科学学院）； UCBL（里昂大学）； CREATIS（里昂大学生物医学图像研究中心）

AI总结提出一种将表示学习与可解析求解的一类SVM耦合的方法，通过定制损失函数直接对齐潜在特征与决策边界，在MNIST-C和脑MRI病变检测任务上展现了鲁棒性和性能。

详情

AI中文摘要

无监督异常检测（UAD）旨在无需标签数据检测异常，这在许多机器学习应用中是必要的，因为异常样本稀少或不可用。大多数最先进的方法分为两类：基于重构的方法（通常重构异常过于完美）和与密度估计器解耦的表示学习（可能遭受次优特征空间）。虽然一些近期方法尝试耦合特征学习和异常检测，但它们通常依赖替代目标、限制核选择或引入近似，从而限制了表达能力和鲁棒性。为解决这一挑战，我们提出了一种新颖方法，通过自定义损失公式将表示学习与可解析求解的一类SVM（OCSVM）耦合，该损失直接使潜在特征与OCSVM决策边界对齐。该模型在两个任务上评估：基于MNIST-C的新基准，以及具有挑战性的脑MRI细微病变检测任务。与大多数关注图像级别大而高信号病变的方法不同，我们的方法成功针对小而非高信号的病变，同时我们评估体素级别的指标，处理了更具临床相关性的场景。两个实验评估了对领域偏移的鲁棒性形式，包括MNIST-C中的损坏类型以及MRI中的纹理或人群年龄变化。结果展示了我们提出模型的性能和鲁棒性，突显了其在通用UAD和现实医学成像应用中的潜力。源代码可在此https URL获取。

英文摘要

Unsupervised anomaly detection (UAD) aims to detect anomalies without labeled data, a necessity in many machine learning applications where anomalous samples are rare or not available. Most state-of-the-art methods fall into two categories: reconstruction-based approaches, which often reconstruct anomalies too well, and decoupled representation learning with density estimators, which can suffer from suboptimal feature spaces. While some recent methods attempt to couple feature learning and anomaly detection, they often rely on surrogate objectives, restrict kernel choices, or introduce approximations that limit their expressiveness and robustness. To address this challenge, we propose a novel method that couples representation learning with an analytically solvable One-Class SVM (OCSVM), through a custom loss formulation that directly aligns latent features with the OCSVM decision boundary. The model is evaluated on two tasks: a benchmark based on MNIST-C, and a challenging brain MRI lesion detection task. Unlike most methods that focus on large, hyperintense lesions at the image level, our approach succeeds in targeting small, non-hyperintense lesions, and we evaluate voxel-wise metrics, addressing a more clinically relevant scenario. Both experiments evaluate a form of robustness to domain shifts, including corruption types in MNIST-C and texture or population age variations in MRI. Results demonstrate the performance and robustness of our proposed model, highlighting its potential for general UAD and real-world medical imaging applications. The source code is available at https://github.com/Nicolas-Pinon/uad_ocsvm_guided_repr_learning.


2602.02726 2026-06-11 cs.LG cs.CL 版本更新

Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

向量量化潜在概念：聚类式概念发现的可扩展替代方案

Xuemin Yu, Ankur Garg, Samira Ebrahimi Kahou, Hassan Sajjad

发表机构 * Dalhousie University, Canada（加拿大达尔豪斯大学）； University of Calgary, Canada（加拿大卡尔加里大学）

AI总结提出VQLC框架，通过向量量化学习离散潜在概念，在保持可解释性的同时，实现与K-Means相当的计算效率，并优于层次聚类在大规模数据上的扩展性。

详情

AI中文摘要

大型语言模型（LLMs）在其隐藏状态中编码了丰富的语义信息，但理解这些内部表示捕获了哪些信息仍然困难。从隐藏状态中提取的潜在概念为解释LLMs提供了有希望的方向，但现有的基于聚类的方法面临权衡：层次聚类产生连贯的概念，但由于其二次内存成本而仅限于小数据集，而K-Means高效扩展但可能产生语义连贯性较差的概念。我们提出向量量化潜在概念（VQLC），一种离散概念学习框架，在冻结的隐藏状态上学习潜在概念的码本。在12个数据集-模型设置中，VQLC在计算成本上接近K-Means，扩展性优于层次聚类，并在忠实度上保持竞争力，在仅解码器模型上增益最明显。基于LLMs的评估、定性分析和稀疏自编码器（SAE）比较表明，学习到的概念是可解释且任务相关的。

英文摘要

Large language models (LLMs) encode rich semantic information in their hidden states, yet it remains difficult to understand what information these internal representations capture. Latent concepts extracted from hidden states offer a promising direction for interpreting LLMs, but existing clustering-based methods face a trade-off: hierarchical clustering produces coherent concepts but is limited to small datasets due to its quadratic memory cost, while K-Means scales efficiently but may yield less semantically coherent concepts. We propose Vector Quantized Latent Concepts (VQLC), a discrete concept learning framework that learns a codebook of latent concepts on frozen hidden states. Across 12 dataset-model settings, VQLC stays close to K-Means in computational cost, scales better than hierarchical clustering, and remains competitive in faithfulness, with the clearest gains on decoder-only models. LLM-based evaluation, qualitative analysis, and a Sparse Autoencoder (SAE) comparison demonstrate that the learned concepts are interpretable and task-relevant.
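
The codebook idea in the abstract above can be pictured with a minimal sketch. It is not the VQLC training procedure (the abstract does not give it); it only shows the assignment step that any vector-quantization codebook implies, where each frozen hidden state is mapped to its nearest codebook entry. The function names, the Euclidean distance, and the toy vectors are assumptions made for illustration.

```python
import math

def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (Euclidean distance)."""
    distances = [math.dist(vector, code) for code in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)

def quantize(hidden_states, codebook):
    """Map every hidden state to a discrete concept id (its nearest codebook entry)."""
    return [nearest_code(h, codebook) for h in hidden_states]

# Toy data (made up): three 2-d "hidden states" and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
hidden_states = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.3]]
concept_ids = quantize(hidden_states, codebook)
```

In a learned codebook the entries would be trained rather than fixed by hand; the lookup itself stays the same, which is why the cost per state grows only with the codebook size.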


2511.14427 2026-06-11 cs.RO cs.LG 版本更新

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

面向接触丰富机器人强化学习的自监督多感官预训练

Rickmer Krohn, Vignesh Prasad, Gabriele Tiboni, Georgia Chalvatzaki

发表机构 * Interactive Robot Perception & Learning (PEARL) Lab, TU Darmstadt, Germany（德国达姆施塔特工业大学交互式机器人感知与学习（PEARL）实验室）； Hessian.AI（黑森州人工智能中心）； Robotics Institute Germany (RIG)（德国机器人研究所（RIG））

AI总结提出MSDP框架，通过掩码自编码和跨模态预测学习多感官表示，并采用非对称架构（评论家使用交叉注意力提取动态特征，演员使用稳定池化表示）加速策略学习，在模拟和真实机器人任务中展现出鲁棒性和高效性。

Comments 8 pages, 11 figures

DOI: 10.1109/LRA.2026.3681156
Journal ref: IEEE Robotics and Automation Letters, 2026, Vol. 11, No. 6, pp. 6799-6806

AI中文摘要

IAPO：面向小型多模态代理工具使用的输入归因感知策略优化

Yifan Yang, Zhen Zhang, Jiayi Tian, Liyan Tan, Zheng Zhang

发表机构 * University of California, Santa Barbara（加州大学圣塔芭芭拉分校）

AI总结提出输入归因感知策略优化（IAPO），通过强化学习对齐模型与教师模型的输入归因，提升多模态小语言模型的工具调用能力，在六个测试集上平均准确率提升3%。

AI中文摘要

本文研究强化学习方法以提升多模态小语言模型（SLM）代理的工具调用能力。尽管现有工作探索了多种奖励设计来改善代理的工具调用能力，但这些方法在SLM训练中面临固有局限性，尤其是在多模态场景下。首先，许多现有方法通过精确匹配某些真实标签或预定义格式来评估工具使用正确性。然而，这种假设通常不适用于多模态任务，因为可能存在多个有效的工具使用路径，且通常没有标注的工具轨迹。其次，这种稀疏且脆弱的二元奖励对如何改进底层决策过程提供的指导很少，使得多模态SLM难以从中学习。为解决这些问题，我们提出输入归因感知策略优化（IAPO），一种通过将模型在输入组件上的归因与更强的教师模型对齐，来改进多模态SLM工具使用的强化学习算法。在Qwen2.5-VL-3B上的实验表明，与现有的视觉工具使用工作相比，所提方法通过帮助模型关注最相关的输入证据，在六个测试集上平均将视觉问答准确率提高了3%。

英文摘要

This paper investigates reinforcement learning (RL) methods for improving tool-calling capabilities in multimodal small language model (SLM) agents. While existing works have explored various reward designs to improve agentic tool-calling ability, these approaches face inherent limitations for SLM training, especially under multimodal scenarios. First, many existing methods evaluate tool use correctness through exact matching against certain ground-truth or predefined formats. However, this assumption is often unsuitable for multimodal tasks, where multiple tool use paths may be valid and annotated tool trajectories are typically unavailable. Second, such sparse and brittle binary rewards provide little guidance on how to improve the underlying decision process, making them particularly difficult for multimodal SLM to learn from. To address these issues, we propose Input Attribution-Aware Policy Optimization (IAPO), an RL algorithm for improving tool use in multimodal SLM by aligning the model's attribution across input components with that of a stronger teacher. Experiments on Qwen2.5-VL-3B show that the proposed method improves visual question answering accuracy by an average of 3% across six test sets compared with existing visual tool use work, by helping the model attend to the most relevant input evidence.


2606.11709 2026-06-11 cs.LG cs.CL 新提交

RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation

RLCSD: 基于对比式同策略自蒸馏的强化学习

Leyi Pan, Shuchang Tao, Yunpeng Zhai, Lingzhe Zhang, Zhaoyang Liu, Bolin Ding, Aiwei Liu, Lijie Wen

发表机构 * Tsinghua University（清华大学）； Tongyi Lab, Alibaba Group（阿里巴巴集团通义实验室）； Peking University（北京大学）

AI总结针对同策略自蒸馏中特权诱导的风格漂移问题，提出RLCSD方法，通过对比正确与错误提示下的师生差距来抑制风格偏移，提升推理模型在数学和逻辑推理任务上的性能。

Comments 20 pages, 9 figures, 9 tables

AI中文摘要

同策略自蒸馏（OPSD）通过将模型自身的分布与其在特权上下文（通常是已验证的解答）下产生的分布对齐，为推理模型提供密集的令牌级监督。然而，我们表明从这种分布差距中提取的学习信号集中在风格令牌而非任务承载令牌上，因为获得提示的模型倾向于产生更直接、更短的输出。我们将这种病理现象称为\emph{特权诱导的风格漂移}，它会破坏训练稳定性或导致响应长度缩短。为了解决这个问题，我们提出\textbf{RLCSD}（基于对比式同策略自蒸馏的强化学习），通过对比正确提示下的师生差距与错误提示下的师生差距来缓解这种漂移，抑制以提示为条件时无论提示正确与否都容易诱发的风格转变，并产生更集中于任务承载令牌的信号。在Qwen3（1.7B/4B/8B）和Olmo-3-7B-Think上的数学和逻辑推理实验表明，RLCSD始终优于GRPO和先前的OPSD方法。我们进一步表明，对比原则是通用的：它可以嵌入现有的OPSD方法中以改进它们，并且其核心洞见可扩展到更广泛的跨模型同策略蒸馏设置。

英文摘要

On-policy self-distillation (OPSD) provides dense, token-level supervision for reasoning models by aligning a model's own distribution with the distribution it produces under privileged context, typically a verified solution. However, we show that the learning signal drawn from this distributional gap concentrates on style tokens rather than task-bearing ones, as the hinted model tends to produce more direct, shorter outputs. We term this pathology \emph{privilege-induced style drift}, which destabilizes training or causes response length to shrink. To address this, we propose \textbf{RLCSD} (Reinforcement Learning with Contrastive on-policy Self-Distillation), which mitigates this drift by contrasting the teacher-student gap under a correct hint against that under a wrong hint, suppressing the style shift that conditioning on a hint tends to induce regardless of correctness, and yielding a signal that is more concentrated on task-bearing tokens. Experiments on Qwen3 (1.7B/4B/8B) and Olmo-3-7B-Think across mathematical and logical reasoning show that RLCSD consistently outperforms GRPO and prior OPSD methods. We further show that the contrastive principle is general: it plugs into existing OPSD methods to improve them, and its underlying insight extends to the broader cross-model on-policy distillation setting.


2606.11797 2026-06-11 cs.LG 新提交

Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning

空间采样值衰减：非平稳深度强化学习的遗忘机制

Felix Störck, Fabian Hinder, Barbara Hammer

发表机构 * CITEC, Faculty of Technology, Bielefeld University（比勒费尔德大学技术学院CITEC）

AI总结受啮齿动物遗忘行为启发，提出空间采样值衰减作为显式遗忘机制，用于深度强化学习应对环境漂移，在DQN和SAC上验证效果与局限。

Comments Accepted at The 2nd Workshop on Epistemic Intelligence in Machine Learning, EIML@ICML 2026, (non-archival)

AI中文摘要

对小鼠等啮齿动物的研究表明，即使没有提供关于变化的信息（不确定性），它们也能适应环境参数的变化（“漂移”）——这种行为可以通过遗忘机制建模。非平稳强化学习（NSRL）致力于改进最先进的强化学习方法以应对变化的环境：然而，这些方法通常需要关于漂移的（部分）完美信息，如“任务ID”或“上下文”。为了减轻漂移的影响，本文开发了\emph{空间采样值衰减}，作为基于值的深度强化学习架构的一种显式遗忘机制，这是一种简单而有效的方法。特别地，我们展示并讨论了在非平稳环境中评估深度Q网络（DQN）和软演员-评论家（SAC）的修改时，在获得的回报方面的积极效果以及局限性。

英文摘要

Studies on rodents such as mice have shown that they can adapt their behavior to changing parameters (``drift'') of the environment even if no information about the change is provided (uncertainty) -- a behavior that can be modeled by forgetting mechanisms. Non-stationary Reinforcement Learning (NSRL) adapts state-of-the-art RL methods to changing environments; however, these methods usually require (partially) perfect information about the drift, such as ``task IDs'' or ``context''. To mitigate the effects of drift, this work develops \emph{Space-sampled Value Decay}, a simple yet effective explicit forgetting mechanism for value-based deep RL architectures. In particular, we demonstrate and discuss positive effects, but also limitations, in the returns achieved by modified Deep Q-Networks (DQN) and Soft Actor-Critic (SAC) when evaluated on non-stationary environments.
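
The abstract above does not spell out the mechanism, so the following is only a tabular illustration of the general idea of forgetting by value decay: at each call, a random sample of states is drawn and their stored action values are shrunk toward zero, so stale estimates lose influence without any drift signal. The function name, the multiplicative decay, and the toy table are assumptions, and this is not the authors' deep RL implementation.

```python
import random

def decay_sampled_values(q_table, decay, num_samples, rng):
    """Shrink the action values of a random sample of states toward zero.

    q_table: dict mapping state -> list of action values (modified in place).
    decay: multiplicative factor in (0, 1); smaller means faster forgetting.
    Returns the list of states that were decayed.
    """
    states = rng.sample(sorted(q_table), k=min(num_samples, len(q_table)))
    for state in states:
        q_table[state] = [decay * value for value in q_table[state]]
    return states

# Toy table (made up): three states, two actions each.
q_table = {0: [1.0, 2.0], 1: [4.0, 0.0], 2: [2.0, 2.0]}
original = {state: list(values) for state, values in q_table.items()}
touched = decay_sampled_values(q_table, decay=0.5, num_samples=2, rng=random.Random(0))
```

Decayed states must be revisited before their values recover, which is the intended effect in a drifting environment and also a plausible source of the limitations the abstract mentions.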


2606.11968 2026-06-11 cs.LG stat.ML 新提交

Efficient Multinomial Logistic Bandit via Frequent Directions

基于频繁方向的高效多项逻辑斯蒂老虎机

Linzhe He, Yu-Jie Zhang, Sifan Yang, Lijun Zhang

发表机构 * State Key Laboratory of Novel Software Technology, Nanjing University（南京大学计算机软件新技术国家重点实验室）； School of Artificial Intelligence, Nanjing University（南京大学人工智能学院）； Paul G. Allen School of Computer Science & Engineering, University of Washington（华盛顿大学保罗·G·艾伦计算机科学与工程学院）

AI总结针对多项逻辑斯蒂老虎机的高维计算瓶颈，提出集成频繁方向矩阵素描的EOFD-MLogB算法，将每轮复杂度降至O(Kd(m+K)^2)时间和O(Kd(m+K))空间，并证明在Hessian近似低秩时其遗憾界接近原算法。

AI中文摘要

本文研究多项逻辑斯蒂老虎机（MLogB）的高效在线算法，其中$K+1$个结果的反馈分布遵循$d$维动作向量的多项逻辑斯蒂模型。代表性的UCB型算法OFUL-MLogB实现了$\tilde{\mathcal{O}}(Kd\sqrt{T})$的遗憾界，但由于参数估计和乐观奖励构造，每轮仍需$\mathcal{O}(K^3d^3)$时间和$\mathcal{O}(K^2d^2)$空间，在高维场景下不可行。为解决此限制，我们提出EOFD-MLogB，将频繁方向矩阵素描集成到OFUL-MLogB中。通过维护累积Hessian的低秩SVD素描，参数估计中的约束在线牛顿更新和奖励加成项中的$Kd \times K$谱范数计算分别简化为一维求根任务和$K \times K$特征值计算。这导致每轮主要时间复杂度为$\mathcal{O}(Kd(m+K)^2)$，空间复杂度为$\mathcal{O}(Kd(m+K))$，其中$m \ll d$为素描大小。我们进一步证明了$\tilde{\mathcal{O}}(\Delta_T(Kd\ln\Delta_T+m)\sqrt{T})$的遗憾界，其中素描误差因子$\Delta_T$由Hessian的$m$截断谱尾控制。因此，当Hessian近似低秩时，遗憾接近OFUL-MLogB。实验验证了计算效率和竞争性能。

英文摘要

This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of $\tilde{\mathcal{O}}(Kd\sqrt{T})$, but still requires $\mathcal{O}(K^3d^3)$ time and $\mathcal{O}(K^2d^2)$ space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and $Kd \times K$ spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and $K \times K$ eigenvalue computations, respectively. This yields dominant per-round time complexity $\mathcal{O}(Kd(m+K)^2)$ and space complexity $\mathcal{O}(Kd(m+K))$, where $m \ll d$ is the sketch size. We further prove a regret bound of $\tilde{\mathcal{O}}(\Delta_T(Kd\ln\Delta_T+m)\sqrt{T})$, where the sketching error factor $\Delta_T$ is controlled by the $m$-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.
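
To make the feedback model concrete, here is a small sketch of a multinomial logistic distribution over $K+1$ outcomes for a $d$-dimensional action vector. It assumes the common parameterization in which outcome 0 is a baseline with logit fixed to zero and each of the other $K$ outcomes has its own parameter vector; the abstract does not state the parameterization, and the function name and toy values are illustrative. The sketching and regret machinery of EOFD-MLogB are not shown.

```python
import math

def mnl_probabilities(theta, x):
    """Outcome probabilities under a multinomial logistic model.

    theta: list of K parameter vectors, one per non-baseline outcome.
    x: d-dimensional action vector.
    Returns K+1 probabilities; index 0 is the baseline outcome (logit 0).
    """
    logits = [sum(t_i * x_i for t_i, x_i in zip(t, x)) for t in theta]
    shift = max(0.0, max(logits))  # subtract the largest logit for numerical stability
    weights = [math.exp(-shift)] + [math.exp(z - shift) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example (made up): K = 2 outcomes plus the baseline, d = 2.
theta = [[1.0, 0.0], [0.0, 1.0]]
probs_zero_action = mnl_probabilities(theta, [0.0, 0.0])   # all logits are 0
probs_first_axis = mnl_probabilities(theta, [2.0, 0.0])    # favours outcome 1
```

With $K$ parameter vectors of dimension $d$, the stacked parameter has $Kd$ entries, which is where the $Kd$ factors in the stated complexities come from.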


2606.11982 2026-06-11 cs.LG 新提交

PAWS: Preference Learning with Advantage-Weighted Segments

PAWS: 基于优势加权片段的偏好学习

Aleksandar Taranovic, Onur Celik, Niklas Freymuth, Ge Li, Serge Thilges, Huy Le, Tai Hoang, Rania Rayyes, Gerhard Neumann

发表机构 * University of Freiburg（弗莱堡大学）

AI总结针对偏好强化学习中训练与推理分布不匹配导致时间信用分配退化的问题，提出PAWS方法，利用片段级优势函数直接进行策略更新，在机器人操作和运动任务上优于现有方法。

Comments Published as a conference paper at ICML 2026

AI中文摘要

基于偏好的强化学习（PbRL）从人类轨迹级比较中学习策略，避免了显式奖励设计和专家演示。现有方法通常在轨迹或片段级偏好上训练效用函数，同时在策略优化过程中依赖每步效用估计。这种训练和推理的不匹配导致了分布偏移，严重降低了时间信用分配并限制了策略学习。我们分析了这一问题，并提出了PAWS，一种基于片段的偏好学习方法，直接使用片段级优势函数进行策略更新。通过使效用训练与策略优化对齐，PAWS保留了轨迹级偏好信息，避免了不可靠的每步学习信号。在模拟机器人操作和运动任务上的实验表明，PAWS持续优于现有的PbRL方法，突显了分布一致偏好学习的重要性。

英文摘要

Preference-based reinforcement learning (PbRL) learns policies from human trajectory-level comparisons, avoiding explicit reward design and expert demonstrations. Existing methods typically train utility functions on trajectory or segment-level preferences while relying on per-step utility estimates during policy optimization. This training and inference mismatch induces a distribution shift that severely degrades temporal credit assignment and limits policy learning. We analyze this issue and propose PAWS, a segment-based preference learning method that performs policy updates directly using segment-level advantage functions. By aligning utility training with policy optimization, PAWS preserves trajectory-level preference information and avoids unreliable per-step learning signals. Experiments on simulated robotic manipulation and locomotion tasks demonstrate that PAWS consistently outperforms existing PbRL approaches, highlighting the importance of distribution-consistent preference learning.
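
For readers unfamiliar with the setup that the abstract criticizes, the sketch below shows the standard Bradley-Terry preference model commonly used in PbRL: per-step utilities are summed over a segment, and the probability that one segment is preferred is a sigmoid of the score difference. This is the conventional baseline formulation, not the PAWS update (which the abstract says works with segment-level advantages); the function names and toy utilities are assumptions.

```python
import math

def segment_score(per_step_utilities):
    """Score a segment by summing its per-step utility estimates."""
    return sum(per_step_utilities)

def preference_probability(segment_a, segment_b):
    """Bradley-Terry probability that segment a is preferred over segment b."""
    diff = segment_score(segment_a) - segment_score(segment_b)
    return 1.0 / (1.0 + math.exp(-diff))

def preference_loss(segment_a, segment_b, a_preferred):
    """Negative log-likelihood of one human comparison."""
    p_a = preference_probability(segment_a, segment_b)
    return -math.log(p_a if a_preferred else 1.0 - p_a)

# Toy per-step utilities (made up) for two 3-step segments.
seg_a = [0.5, 1.0, 0.5]
seg_b = [0.0, 0.5, 0.5]
p_a = preference_probability(seg_a, seg_b)
loss_tie = preference_loss(seg_b, seg_b, a_preferred=True)
```

Note that the loss only constrains the sum over a segment, so many different per-step utilities fit the same comparisons equally well; that ambiguity is one way to see why per-step estimates can be unreliable learning signals.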


2606.12370 2026-06-11 cs.LG cs.CL 新提交

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

打破熵界：通过带拒绝采样的多令牌预测加速强化学习训练

Yucheng Li, Huiqiang Jiang, Yang Xu, Jianxin Yang, Yi Zhang, Yizhong Cao, Yuhao Shen, Fan Zhou, Rui Men, Jianwei Zhang, An Yang, Bowen Yu, Bo Zheng, Fei Huang, Junyang Lin, Dayiheng Liu, Jingren Zhou

发表机构 * Qwen Team, Alibaba Inc（阿里巴巴集团 Qwen 团队）

AI总结针对强化学习训练中多令牌预测接受率因熵波动而下降的问题，提出Bebop方法，采用概率拒绝采样和端到端TV损失优化，实现高达95%接受率和1.8倍加速。

AI中文摘要

强化学习（RL）已成为现代大型语言模型的关键组成部分，但展开阶段仍是RL训练流程中的主要瓶颈。尽管多令牌预测（MTP）通过推测解码提供了一种自然的加速方案，但许多研究观察到MTP接受率在RL训练期间显著下降，导致加速效果有限。为解决这一瓶颈，我们提出Bebop，对LLM后训练中的MTP进行系统研究，并提供将MTP集成到大规模RL流水线中的实用方案。首先，我们揭示MTP接受率根本上受模型熵波动的限制，其与RL阶段熵的上升呈现清晰的负线性关系。其次，我们证明与贪婪草稿采样相比，概率拒绝采样在很大程度上减轻了RL中熵引入的干扰。我们进一步发现，传统的MTP训练目标（交叉熵或KL）在此类设置中次优，因此我们提出一种新颖的端到端TV损失，直接优化多步拒绝采样接受率，带来约10%的接受率提升，在数学推理、代码生成和智能体任务中实现高达95%的接受率和高达25%的额外推理吞吐量增益。第三，我们测试了RL期间的各种在线MTP训练策略，并表明使用端到端TV损失和拒绝采样的预RL MTP训练在整个RL过程中保持一致的接受率和加速，消除了昂贵的在线MTP更新需求。我们提供了大量实验和分析来验证我们的发现。实验结果表明，我们的方法在Qwen3.5、Qwen3.6和Qwen3.7模型的异步RL训练中实现了高达1.8倍的端到端加速。

英文摘要

Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, many studies have observed that MTP acceptance rates degrade significantly during RL training, leading to limited speedup performance. To address this bottleneck, we present Bebop, a systematic study of MTP in LLM post-training, and offer practical recipes to integrate MTP into large-scale RL pipelines. First, we reveal that the MTP acceptance rate is fundamentally bounded by the fluctuation of model entropy, which demonstrates a clear negative linear relationship with the rise of entropy in the RL stage. Second, we show that probabilistic rejection sampling largely alleviates the disturbance introduced by entropy in RL compared to greedy draft sampling. We further identify that the conventional MTP training objectives (cross-entropy or KL) are suboptimal in such settings, and therefore we propose a novel end-to-end TV loss that directly optimizes multi-step rejection sampling acceptance rate, yielding ~10% acceptance rate improvements, achieving up to 95% acceptance rates and up to 25% extra inference throughput gains across mathematical reasoning, code generation, and agentic tasks. Third, we test various online MTP training strategies during RL and show that pre-RL MTP training with e2e TV loss and rejection sampling achieves a consistent acceptance rate and speedup throughout the entire RL, eliminating the need for costly online MTP updating. We provide extensive experiments and analysis that validate our findings. Experimental results show our method achieves up to 1.8x end-to-end acceleration in async RL training of Qwen3.5, Qwen3.6, and Qwen3.7 models.
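
The link between rejection sampling and a total variation (TV) objective can be checked on a single token position. Under the standard speculative-sampling rule, a token drawn from the draft distribution $q$ is accepted with probability $\min(1, p/q)$ against the target $p$, so the expected acceptance rate is $\sum_x \min(p(x), q(x)) = 1 - \mathrm{TV}(p, q)$. The sketch below verifies that identity numerically. It covers one step only, whereas the abstract describes a multi-step, end-to-end loss; the function names and distributions are illustrative assumptions.

```python
import random

def accept_draft_token(token, target_probs, draft_probs, rng):
    """Accept a drafted token with probability min(1, p(token) / q(token))."""
    ratio = target_probs[token] / draft_probs[token]
    return rng.random() < min(1.0, ratio)

def expected_acceptance(target_probs, draft_probs):
    """Expected acceptance rate when tokens are drawn from the draft distribution."""
    return sum(min(p, q) for p, q in zip(target_probs, draft_probs))

def total_variation(target_probs, draft_probs):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p - q) for p, q in zip(target_probs, draft_probs))

# Toy distributions (made up) over a 3-token vocabulary.
target = [0.5, 0.3, 0.2]
draft = [0.25, 0.25, 0.5]
rate = expected_acceptance(target, draft)
tv = total_variation(target, draft)
```

Because the acceptance rate equals one minus the TV distance at each position, lowering TV between draft and target raises acceptance directly, which cross-entropy or KL objectives only do indirectly.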


2606.12384 2026-06-11 cs.LG cs.AI 新提交

APPO: Agentic Procedural Policy Optimization

APPO: 智能体程序策略优化

Xucong Wang, Ziyu Ma, Yong Wang, Yuxiang Ji, Shidong Yang, Guanhua Chen, Pengkun Wang, Xiangxiang Chu

发表机构 * University of Science and Technology of China（中国科学技术大学）； AMAP, Alibaba Group（阿里巴巴集团高德地图）； Southern University of Science and Technology（南方科技大学）

AI总结提出APPO方法，通过细粒度分支和程序级优势缩放改进智能体强化学习的信用分配，在13个基准上平均提升近4个点。

Comments 25 pages, including 14 pages of main text and 11 pages of appendix; work in progress

AI中文摘要

近期智能体强化学习（RL）的进展显著提升了大型语言模型智能体的多轮工具使用能力。然而，现有方法大多基于粗粒度的启发式单元（如工具调用边界或固定工作流）进行信用分配，难以识别哪些中间决策影响下游结果。本文从两个角度研究智能体RL：\textit{何处分支以及分支后如何分配信用}。我们的初步分析表明，有影响力的决策点广泛分布在生成序列中，而非集中于工具调用，而仅凭token熵无法可靠反映其对最终结果的影响。基于这些观察，我们提出\textbf{智能体程序策略优化（APPO）}，将分支和信用分配从粗粒度的交互单元转移到序列中的细粒度决策点。APPO使用分支分数选择分支位置，该分数结合了token不确定性和后续延续的策略诱导似然增益，从而在过滤掉虚假高熵位置的同时实现更有针对性的探索。它进一步引入了程序级优势缩放，以更好地在分支展开中分配信用。在13个基准上的实验表明，APPO在保持高效工具调用和行为可解释性的同时，一致地将强智能体RL基线提升了近4个点。

英文摘要

Recent advances in agentic Reinforcement Learning (RL) have substantially improved the multi-turn tool-use capabilities of large language model agents. However, most existing methods assign credit over coarse heuristic units, such as tool-call boundaries or fixed workflows, making it difficult to identify which intermediate decisions influence downstream outcomes. In this work, we study agentic RL from two perspectives: \textit{where to branch and how to assign credit after branching}. Our pilot analysis shows that influential decision points are broadly distributed throughout the generated sequence rather than concentrated at tool calls, while token entropy alone does not reliably reflect their impact on final outcomes. Motivated by these observations, we propose \textbf{Agentic Procedural Policy Optimization (APPO)}, which shifts branching and credit assignment from coarse interaction units to fine-grained decision points in the sequence. APPO selects branching locations using a Branching Score that combines token uncertainty with policy-induced likelihood gains of subsequent continuations, enabling more targeted exploration while filtering out spurious high-entropy positions. It further introduces procedure-level advantage scaling to better distribute credit across branched rollouts. Experiments on 13 benchmarks show that APPO consistently improves strong agentic RL baselines by nearly 4 points, while keeping efficient tool-calls and maintaining behavior interpretability.


2606.12386 2026-06-11 cs.LG cs.AI 新提交

ATLAS: Active Theory Learning for Automated Science

ATLAS: 自动化科学的主动理论学习

Noémi Éltető, Nathaniel D. Daw, Kimberly L. Stachenfeld, Kevin J. Miller

发表机构 * Google DeepMind（谷歌深度思维）； Princeton University（普林斯顿大学）； Columbia University（哥伦比亚大学）； University College London（伦敦大学学院）

AI总结提出ATLAS框架，通过主动学习迭代生成稀疏神经网络假设并设计最优区分实验，在bandit任务中恢复强化学习智能体，相比随机实验采样效率提升5-10倍。

AI中文摘要

通过机制建模推进科学理解需要提出正确的实验问题以产生信息量最大的数据。为了在认知科学中自动化这一追求，我们引入了ATLAS（自动化科学的主动理论学习），这是一个用于数据驱动的可解释行为模型发现的主动学习框架。ATLAS在生成机制假设（实例化为多样化的稀疏神经网络集成，即解缠RNN）和设计能够最优区分这些假设的实验之间迭代。我们在从bandit任务中的行为恢复强化学习智能体的问题上测试了这种方法。ATLAS设计了具有时间结构的定性新颖实验序列，该结构针对底层智能体特征量身定制。在这些实验上训练的模型通过一套全面的机制建模指标进行评估，这些指标捕捉了行为、结构和计算相似性。与随机实验相比，ATLAS在所有指标上实现了5-10倍的采样效率提升，并且其性能进一步通过与文献中专家设计的实验进行验证得到确认。这些计算机模拟结果展示了ATLAS在加速人类可解释洞察方面的潜力，适用于认知科学以及其他科学探究依赖于发现机制模型的领域。

英文摘要

Advancing scientific understanding through mechanistic modeling requires posing the right experimental questions to yield maximally informative data. To automate this pursuit within cognitive science, we introduce ATLAS (Active Theory Learning for Automated Science), an active learning framework for the data-driven discovery of interpretable behavioral models. ATLAS iterates between generating mechanistic hypotheses--instantiated as a diverse ensemble of sparse neural networks (Disentangled RNNs)--and designing experiments that optimally distinguish between them. We test this approach on the problem of recovering reinforcement learning agents from their behavior in bandit tasks. ATLAS designs varied sequences of qualitatively novel experiments with temporal structure tailored to underlying agent characteristics. The models trained on these experiments are evaluated against a comprehensive set of metrics for mechanistic modeling that capture behavioral, structural, and computational similarity. ATLAS achieves a 5-10x improvement in sample efficiency across all metrics compared to random experimentation, and its performance is further validated against expert-designed experiments derived from literature. These in silico results showcase ATLAS's potential to accelerate human-interpretable insights in cognitive science and other domains where scientific inquiry relies on discovering mechanistic models.


2606.11209 2026-06-11 cs.CL cs.AI cs.LG 交叉投稿

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

ProcessThinker: 通过基于展开的过程奖励增强多模态大语言模型推理

Jingpei Wu, Xiao Han, Weixiang Shen, Boer Zhang, Zifeng Ding, Volker Tresp

发表机构 * LMU Munich（慕尼黑大学）； Harvard University（哈佛大学）； University of Cambridge（剑桥大学）； Mina AI ； Konrad Zuse School of Excellence in Reliable AI (relAI)（康拉德·楚泽可靠人工智能卓越学校（relAI））

AI总结提出ProcessThinker，一种无需显式过程奖励模型的后训练方法，通过步骤标记格式和基于展开的过程奖励，为多步推理提供密集的步骤级奖励，提升多模态推理一致性。

Comments Accepted at ICLR 2026 Workshop on Logical Reasoning of Large Language Models. 7 pages, 1 figure

AI中文摘要

视觉问答越来越需要多步推理。最近在可验证奖励下的强化学习后训练（RLVR）和组相对策略优化（GRPO）可以改善多模态推理，但大多数方法依赖于稀疏的仅结果奖励。因此，它们难以判断错误答案是由于推理后期的一个小错误，还是从一开始就无用的轨迹。一个常见的解决方案是训练一个过程奖励模型（PRM）用于步骤级监督，但这通常需要大规模高质量的思维链标注和额外的训练成本。我们提出ProcessThinker，一种实用的后训练流程，无需训练显式的PRM即可提供步骤级过程奖励。ProcessThinker首先将推理轨迹重写为步骤标记格式以进行冷启动监督微调，然后应用带有标准格式奖励和我们基于展开的过程奖励的GRPO。具体来说，对于每个中间步骤，我们从该步骤出发采样多条后续续写，并使用经验成功率（最终答案验证）作为步骤奖励。这提供了密集的信用分配，并鼓励更可靠地支持正确结论的推理步骤，有助于减少跨步骤的不一致或自相矛盾的进展，这是逻辑推理中的一个关键问题。在四个具有挑战性的视频基准测试（Video-MMMU、MMVU、VideoMathQA和LongVideoBench）上，ProcessThinker始终优于基线模型Qwen3-VL-8B-Instruct。

英文摘要

Visual question answering increasingly requires multi-step reasoning. Recent post-training with reinforcement learning under verifiable rewards (RLVR) and Group Relative Policy Optimization (GRPO) can improve multimodal reasoning, but most approaches rely on sparse outcome-only rewards. As a result, they struggle to tell whether an incorrect answer comes from a small mistake late in the reasoning or from an unhelpful trajectory from the start. A common solution is to train a process reward model (PRM) for step-level supervision, but this typically requires large-scale high-quality chain-of-thought annotations and additional training cost. We propose ProcessThinker, a practical post-training pipeline that provides step-level process rewards without training an explicit PRM. ProcessThinker first rewrites reasoning traces into a step-tagged format for cold-start supervised fine-tuning, then applies GRPO with a standard format reward and our rollout-based process reward. Concretely, for each intermediate step, we sample multiple continuations from that step and use the empirical success rate (final-answer verification) as the step reward. This gives dense credit assignment and encourages reasoning steps that more reliably support a correct conclusion, helping reduce inconsistent or self-contradictory progress across steps -- a key issue in logical reasoning. Across four challenging video benchmarks (Video-MMMU, MMVU, VideoMathQA, and LongVideoBench), ProcessThinker consistently improves over the baseline model Qwen3-VL-8B-Instruct.
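
The rollout-based step reward described above can be sketched in a few lines: from a given reasoning prefix, sample several continuations, verify each final answer, and use the fraction that succeed as the reward for that step. The sampler and verifier below are hypothetical stand-ins (a real pipeline would call the policy model and an answer checker), and the number of rollouts is an arbitrary choice.

```python
def step_reward(prefix_steps, sample_continuation, verify_answer, num_rollouts=8):
    """Empirical success rate of continuations sampled from a reasoning prefix."""
    successes = 0
    for _ in range(num_rollouts):
        final_answer = sample_continuation(prefix_steps)
        if verify_answer(final_answer):
            successes += 1
    return successes / num_rollouts

# Hypothetical stand-ins: a prefix containing a flawed step never recovers.
def toy_sampler(prefix_steps):
    return "41" if "flawed step" in prefix_steps else "42"

def toy_verifier(final_answer):
    return final_answer == "42"

good_prefix_reward = step_reward(["read question", "set up equation"], toy_sampler, toy_verifier)
bad_prefix_reward = step_reward(["read question", "flawed step"], toy_sampler, toy_verifier)
```

The cost is the main design trade-off: every scored step needs `num_rollouts` extra generations, in exchange for avoiding the annotation and training cost of a separate PRM.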


2606.11274 2026-06-11 cs.MA cs.LG physics.flu-dyn 交叉投稿

Multi-agent rendezvous in fluid flows via reinforcement learning

基于强化学习的多智能体在流体中的会合

Bocheng Li, Jingran Qiu, Lihao Zhao

发表机构 * AML, Department of Engineering Mechanics, Tsinghua University（AML，工程力学系，清华大学）； Department of Physics, Gothenburg University（物理系，哥德堡大学）

AI总结采用多智能体强化学习（MARL）在涡旋流中开发物理信息会合策略，显著提高会合率，并具有跨涡旋强度、尺度和群体规模的迁移性，通过打破状态-动作图对称性防止智能体被困在分离涡旋中。

AI中文摘要

会合是多智能体系统的一项关键任务，要求智能体协调以在未指定位置相遇。然而，在流体环境中实现这一目标具有挑战性，因为尚不清楚智能体如何利用底层流体运动学来促进收敛。在本研究中，我们采用多智能体强化学习（MARL）方法在涡旋流中开发物理信息会合策略。与智能体向其对应方导航的朴素策略相比，MARL策略显著提高了会合率。MARL策略还表现出跨不同涡旋强度、涡旋尺度和群体规模的可迁移性。通过打破状态-动作图的对称性，MARL策略利用一种非直观的机制，防止智能体被困在分离的涡旋中，从而提高会合成功率。此外，从学习到的策略中提取了一种启发式策略，其性能也优于朴素策略。进一步的理论分析表明，流体变形阻碍了会合过程。大的有限时间李雅普诺夫指数识别出流体效应分离相邻智能体的区域，表明应在弱变形区域规划目标。我们的发现揭示了智能体-流体相互作用在多智能体任务中的重要作用，并突出了MARL在复杂流动环境中探索群体智能的能力。

英文摘要

Rendezvous is a critical task for multi-agent systems, requiring agents to coordinate to meet at an unspecified location. However, achieving this in fluid environments presents a challenge, as it remains unclear how agents can exploit underlying fluid kinematics to facilitate convergence. In this study, we adopt a multi-agent reinforcement learning (MARL) approach to develop physics-informed rendezvous strategies in vortical flows. Compared to a naive strategy, where agents navigate toward their counterparts, MARL strategies significantly improve the rendezvous rate. MARL strategies also show transferability across varying vortex intensities, vortex scales, and swarm sizes. By breaking the symmetry of the state-action map, MARL strategy leverages a non-intuitive mechanism that prevents agents from becoming trapped in separate vortices, thereby enhancing rendezvous success. Additionally, a heuristic strategy is extracted from the learned strategy and also outperforms the naive strategy. Furthermore, a theoretical analysis demonstrates that fluid deformation impedes the rendezvous process. Large finite-time Lyapunov exponents identify where fluid effects separate adjacent agents, suggesting that targets should be planned in weak-deformation regions. Our findings reveal the important role that agent-fluid interactions play in multi-agent tasks and highlight the MARL capability to explore swarm intelligence in complex flow environments.


2606.11284 2026-06-11 cs.MA cs.GT cs.LG 交叉投稿

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

Phi-Actor-Critic: 引导一般和博弈走向帕累托高效关联均衡

Wongyu Lee, Francesco Lelli, Omran Ayoub, Massimo Tornatore

发表机构 * Politecnico di Milano（米兰理工大学）； Tilburg University（蒂尔堡大学）； University of Applied Sciences and Arts of Southern Switzerland（瑞士南部应用科学与艺术大学）

AI总结提出Φ-Actor-Critic框架，通过交换遗憾最小化引导多智能体学习向高社会福利的关联均衡收敛，并采用集中式注意力批评家高效估计反事实遗憾，结合拉格朗日均衡选择机制优化社会福利。

Comments Accepted to IJCAI 2026

AI中文摘要

现实世界的多智能体系统，从交通协调到资源分配，通常被建模为一般和博弈，其中个体激励与集体福利相冲突。在这些设定中，核心挑战不仅是找到均衡，而是在许多次优纳什均衡中选择社会期望的结果。标准的深度多智能体强化学习（MARL）方法难以解决这个问题，因为价值分解方法受单调性假设约束，而策略梯度方法往往收敛到稳定但社会效率低下的均衡。为了解决这一限制，我们提出了Φ-Actor-Critic（Φ-AC），一个利用交换遗憾最小化引导学习向高福利关联均衡（CE）收敛的框架。为了使反事实遗憾估计在深度MARL中易于处理，Φ-AC采用了一个集中式注意力批评家，在单次前向传播中预测向量值遗憾，避免了计算昂贵的反事实模拟。我们进一步引入了一个基于拉格朗日的均衡选择机制，通过遗憾约束优化社会福利同时确保稳定性。在矩阵博弈、多智能体粒子环境（MPE）和Melting Pot Harvest场景上的实验表明，Φ-AC在多样的混合动机设定中学习到高效且稳定的协调策略，同时保持高集体回报和竞争公平性。

英文摘要

Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $\Phi$-Actor-Critic ($\Phi$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $\Phi$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $\Phi$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.
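
Swap regret, the quantity the framework above minimizes, has a simple definition for a single agent with a finite action set: for each action actually played, find the best fixed replacement action in hindsight over exactly those rounds, and sum the gains. The sketch below computes it from a play history. It assumes full counterfactual payoffs are available for every round, which is exactly what the abstract's critic is meant to estimate; the function name and the toy history are illustrative.

```python
def swap_regret(history, num_actions):
    """Swap regret of one agent.

    history: list of (played_action, utilities), where utilities[b] is the
    payoff the agent would have received that round by playing action b.
    """
    # gain[a][b]: total improvement from replacing a with b on the rounds where a was played.
    gain = [[0.0] * num_actions for _ in range(num_actions)]
    for played, utilities in history:
        for b in range(num_actions):
            gain[played][b] += utilities[b] - utilities[played]
    # Keeping an action unchanged has gain 0, so each term is non-negative.
    return sum(max(row) for row in gain)

# Toy history (made up): two actions, three rounds.
history = [(0, [1.0, 3.0]), (0, [2.0, 2.0]), (1, [0.0, 1.0])]
regret = swap_regret(history, num_actions=2)
```

When every agent's average swap regret vanishes, the empirical distribution of joint play approaches the set of correlated equilibria, which is why regret constraints can serve as the stability condition in the equilibrium selection step.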


2606.11525 2026-06-11 cs.RO cs.LG 交叉投稿

Learning Object Manipulation from Scratch via Contrastive Interaction

通过对比交互从零开始学习物体操作

Tongle Shen, Caleb Chuck, Fan Feng, Biwei Huang

发表机构 * UC San Diego（加州大学圣地亚哥分校）； UT Austin（德克萨斯大学奥斯汀分校）

AI总结针对对比强化学习在交互密集操作任务中表现不佳的问题，提出交互加权重采样方法，通过保留模式边界提升多模态分段非线性可达性表示，在仿真和真实机器人空气曲棍球任务中取得显著改进。

详情

AI中文摘要

对比强化学习（CRL）通过学习动力学的结构化表示，在多种目标条件机器人任务中取得了近期成功。然而，尽管在运动控制和简单控制领域表现优异，CRL在交互密集的操作任务中常常遇到困难。我们认为这一困难的关键来源是物体中心交互，如接触或抓取，这些交互会引起潜在动态模式的显著变化。在这项工作中，我们将操作动力学建模为分段平滑马尔可夫过程，并证明交互引起的模式变化产生了分段非线性可达性结构，这使得标准CRL能量函数难以表示和规划。基于这一分析，我们引入了交互加权重采样（IWR）。IWR在交互前、中、后阶段进行交互感知重采样，鼓励学习到的表示保留决定未来可达性的模式边界，以捕获多模态和分段非线性可达性。在包括2D动态控制、机器人操作和机器人空气曲棍球在内的交互中心环境中，IWR相比先前的CRL方法提高了样本效率和整体性能，在仿真中平均提升19.8%。最后，通过使用IWR训练的策略进行仿真到现实的迁移，我们展示了首个能够击打目标的真实世界目标条件机器人空气曲棍球智能体，成功率从25%提升到60%。项目页面：IWR-arxiv.github.io。

英文摘要

Contrastive Reinforcement Learning (CRL) has seen recent success in a wide variety of goal-conditioned robotics tasks by learning structured representations of the dynamics. However, despite its success in locomotion and simpler control domains, CRL often struggles in interaction-rich manipulation. We argue that a key source of this difficulty is object-centric interaction, such as contact or grasping, that induces distinct changes in the underlying dynamic modes. In this work, we formulate manipulation dynamics as a piecewise-smooth Markov process and show that interaction-induced mode changes create piecewise nonlinear reachability structures that are difficult for standard CRL energy functions to represent and plan over. Based on this analysis, we introduce Interaction-weighted Resampling (IWR). IWR performs interaction-aware resampling around phases before, during, and after interactions, encouraging the learned representation to preserve the mode boundaries that determine future reachability to capture multi-modal and piecewise nonlinear reachability. Across interaction-centric environments, including 2D dynamic control, robotic manipulation, and robot air hockey, IWR improves both sample efficiency and overall performance over prior CRL methods, with 19.8% average improvement in simulation. Finally, using a sim-to-real pipeline with policies trained by IWR, we demonstrate the first real-world goal-conditioned robot air hockey agent capable of hitting goals, improving success from 25% to 60%. Project Page: IWR-arxiv.github.io.


2606.11798 2026-06-11 q-fin.CP cs.LG math.OC 交叉投稿

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

时间不一致控制问题中学习均衡的确定性策略梯度

Xin Guo, Yijie Huang, Xiang Yu

发表机构 * Department of Industrial Engineering and Operations Research, University of California, Berkeley, USA（加州大学伯克利分校工业工程与运筹学系）； Department of Applied Mathematics, The Hong Kong Polytechnic University, Kowloon, Hong Kong（香港理工大学应用数学系）

AI总结提出一种连续时间无模型强化学习算法，通过确定性策略梯度和内定点迭代学习时间不一致控制问题的均衡策略，并在均值-方差投资组合和非指数贴现跟踪投资组合中验证有效性。

Comments Keywords: Time-inconsistent control, two-stage reformulation, model-free continuous-time reinforcement learning, deterministic policy gradient, fixed point iteration

AI中文摘要

在本文中，我们开发了一种连续时间无模型强化学习算法，用于学习一般时间不一致控制问题中的确定性均衡策略。利用扩展的Hamilton-Jacobi-Bellman系统，我们将原始时间不一致问题转化为一个等价的两阶段问题。在第一阶段，对于给定的辅助函数，我们采用确定性策略梯度方法在辅助的时间一致控制问题中学习最优策略。在第二阶段，给定更新后的策略，我们利用内定点迭代和某些鞅特征来学习辅助函数。作为理论贡献，我们提供了一些温和的模型假设，并建立了内定点迭代的收敛性。通过在两阶段之间重复这种演员-评论家风格的迭代，我们的算法旨在以统一的方式学习不同时间不一致性来源下的均衡。该算法在两种经典的时间不一致金融应用中的优越有效性得到了说明：均值-方差投资组合管理和非指数贴现下的最优跟踪投资组合。

英文摘要

In this paper, we develop a continuous-time model-free reinforcement learning algorithm to learn deterministic equilibrium policies in general time-inconsistent control problems. Utilizing the extended Hamilton-Jacobi-Bellman system, we recast the original time-inconsistent problem into an equivalent two-stage problem. In the first stage, for given auxiliary functions, we employ the deterministic policy gradient approach to learn an optimal policy in an auxiliary time-consistent control problem. In the second stage, given the updated policy, we exploit inner fixed point iterations and some martingale characterizations to learn the auxiliary functions. As a theoretical contribution, we provide some mild model assumptions and establish the convergence of the inner fixed point iterations. By repeating this actor-critic style of iterations across two stages, our algorithm aims to learn the equilibrium under different sources of time-inconsistency in a unified manner. The superior effectiveness of the proposed algorithm is illustrated in two classical financial applications with time-inconsistency: mean-variance portfolio management and optimal tracking portfolio under non-exponential discounting.

2606.11891 2026-06-11 cs.RO cs.LG 交叉投稿

Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation

评论家架构的重要性：双评论家与统一评论家在人形机器人移动操作中的对比

Mehmet Turan Yardımcı

AI总结针对人形机器人多目标强化学习，对比统一评论家与双评论家架构，实验表明双评论家策略在到达速度、吞吐量和成功率上显著优于统一评论家，且架构选择比奖励工程影响更大。

Comments Accepted at the ICRA 2026 Workshop on Reinforcement Learning for Imitation Learning (RL4IL), Vienna, Austria. 4 pages, 2 figures

详情

AI中文摘要

人形机器人的多目标强化学习必须在单一策略中协调移动和操作。一个自然的设计选择是使用单一（统一）评论家来估计所有目标的组合价值，还是使用具有不相交奖励信号的单独（双）评论家。我们在NVIDIA Isaac Lab中对Unitree G1人形机器人（23个主动自由度）进行了受控比较，通过一个从静态到达延伸到具有可变方向目标的行走的13级顺序课程训练移动操作策略。在标准化评估中，与统一评论家策略相比，双评论家策略到达目标的速度快3.5倍（6.5 vs. 22.6模拟步），吞吐量高2倍（每1000步验证到达次数14.3 vs. 7.0），并且验证到达率更高（65.2% vs. 53.8%）。值得注意的是，额外的反博弈奖励机制在架构改变之外没有提供进一步改进（60.9% vs. 65.2%）。这些结果对新兴的强化学习微调模仿学习策略范式有直接影响：当使用强化学习优化预训练的操作策略时，统一评论家可能通过竞争性的移动梯度抑制已学习的行为。这些发现表明，评论家架构是多目标人形机器人强化学习中一个首要且常被忽视的设计选择，其对到达效率的影响大于奖励工程。

英文摘要

Multi-objective reinforcement learning for humanoid robots must coordinate locomotion and manipulation within a single policy. A natural design choice is whether to use a single (unified) critic that estimates the combined value of all objectives, or separate (dual) critics with disjoint reward signals. We present a controlled comparison on the Unitree G1 humanoid (23 active DoF) in NVIDIA Isaac Lab, training loco-manipulation policies through a sequential curriculum spanning 13 levels from stationary reaching to walking with variable-orientation targets. In standardized evaluation, dual-critic policies reach targets 3.5$\times$ faster (6.5 vs. 22.6 simulation steps), achieve 2$\times$ higher throughput (14.3 vs. 7.0 validated reaches per 1,000 steps), and attain higher validated reach rates (65.2% vs. 53.8%) compared to the unified-critic policy. Notably, additional anti-gaming reward mechanisms provide no further improvement beyond the architectural change alone (60.9% vs. 65.2%). These results have direct implications for the emerging paradigm of RL fine-tuning of imitation-learned policies: when refining a pre-trained manipulation policy with RL, a unified critic risks suppressing the learned behavior through competing locomotion gradients. These findings demonstrate that critic architecture is a primary - and often overlooked - design choice in multi-objective humanoid RL, with greater impact than reward engineering on reaching efficiency.

2606.12086 2026-06-11 cs.AI cs.LG 交叉投稿

IntElicit: Eliciting and Assessing Contextualized Creativity via Dialogue Policy Optimization

IntElicit: 通过对话策略优化引出和评估情境化创造力

Mingjia Li, Jin Wu, Hong Qian, Wenhao Huang, Yiyang Huang, Yiwen Zhang, Chanjin Zheng, Xiangfeng Wang, Aimin Zhou, Jiajun Guo

发表机构 * East China Normal University（华东师范大学）； Shanghai Innovation Institute（上海创新研究院）

AI总结提出IntElicit框架，通过分解过程奖励机制优化对话策略，在交互中减少非创造性混淆因素，从而更有效地引出和评估情境化创造力。

详情

AI中文摘要

情境化评估为评估创造力提供了高生态效度，但也引入了一个关键挑战：观察到的表现可能与认知熟练度（领域知识）和能动性（参与意愿）相混淆。同时，在生成式AI时代，创造性问题解决越来越多地发生在工具中介和人机交互环境中，使得完全静态的评估与当代创造性实践不太一致。为了解决这些问题，本文提出了IntElicit，一个通过对话策略优化来引出和评估情境化创造力的框架。IntElicit作为一个受约束的自适应AI面试官：它在多轮交互中提供非指导性的知识和能动性支架，以减少非创造性混淆因素，同时保留参与者生成被评估的创造性内容的责任。具体来说，为了解决开放教育对话中的稀疏奖励和潜在奖励破解（例如，答案听写），IntElicit引入了一种分解过程奖励机制。该机制将策略与教学引出对齐，奖励那些引出参与者推理而非代表他们产生最优答案的提示。大量实验，包括参与者模拟和一项人类受试者研究（N=64），表明IntElicit比专家设计的基线提高了引出的创造性成果。总之，结果表明，交互式引出可以揭示静态FPSP式评估可能遗漏的创造性潜力，为AI中介学习环境中的情境化创造力评估提供了形成性和诊断性视角。

英文摘要

Contextualized assessment offers high ecological validity for evaluating creativity but introduces a critical challenge: observed performance may be confounded with cognitive proficiency (domain knowledge) and agency (willingness to engage). Meanwhile, in the age of generative AI, creative problem solving increasingly occurs in tool-mediated and human--AI interactive environments, making fully static assessment less aligned with contemporary creative practice. To address these issues, this paper proposes IntElicit, a framework for eliciting and assessing contextualized creativity via dialogue policy optimization. IntElicit functions as a constrained adaptive AI Interviewer: it provides non-directive knowledge and agency scaffolds in multi-turn interaction to reduce non-creative confounders, while preserving participants' responsibility for generating the creative content being evaluated. Specifically, to tackle sparse rewards and potential reward hacking (e.g., answer dictation) in open-ended educational dialogue, IntElicit introduces a decomposed process reward mechanism. This mechanism aligns the policy with pedagogical elicitation, rewarding prompts that draw out participant reasoning rather than producing optimal answers on their behalf. Extensive experiments, including participant simulation and a human subject study (N=64), show that IntElicit improves elicited creative outcomes over expert-designed baselines. Together, the results suggest that interactive elicitation can reveal creative potential that static FPSP-style assessment may miss, providing a formative and diagnostic lens for contextualized creativity assessment in AI-mediated learning contexts.

2606.12281 2026-06-11 cs.MA cs.AI cs.LG 交叉投稿

Pass@K 策略优化：解决更困难的强化学习问题

Christian Walder, Deep Karkhanis

发表机构 * Google DeepMind（谷歌 DeepMind）

AI总结提出 Pass-at-k 策略优化 (PKPO)，通过变换奖励直接优化 pass@k 性能，利用低方差无偏估计器，在训练中退火 k 可同时提升 pass@1 和 pass@k，解决更难问题。

详情

AI中文摘要

强化学习算法对每个问题采样多个 n>1 的解决方案尝试并独立奖励它们。这优化了 pass@1 性能，优先考虑孤立样本的强度，而牺牲了样本集的多样性和集体效用。这未充分利用采样能力，限制了探索和在更难示例上的最终改进。作为修复，我们提出 Pass-at-k 策略优化 (PKPO)，一种对最终奖励的变换，导致直接优化 pass@k 性能，从而优化联合考虑时最大化奖励的样本集。我们的贡献是推导出 pass@k 及其梯度在二元和连续奖励设置中的新型低方差无偏估计器。我们展示了使用我们的估计器进行优化简化为标准强化学习，其中奖励经过稳定高效的变换函数联合变换。虽然先前的工作仅限于 k=n，但我们是第一个能够对任意 k ≤ n 实现 pass@k 鲁棒优化的。此外，我们的方法不是以 pass@1 性能换取 pass@k 增益，而是允许在训练中退火 k，同时优化两个指标，通常能在显著 pass@k 增益的同时获得强大的 pass@1 数值。我们在玩具实验上验证了我们的奖励变换，揭示了我们的公式的方差减少特性。我们还使用开源 LLM GEMMA-2 包含了真实世界的例子。我们发现我们的变换有效地优化了目标 k。此外，更高的 k 值能够解决更多和更难的问题，而退火 k 则同时提升了 pass@1 和 pass@k。关键的是，在传统 pass@1 优化停滞的具有挑战性的任务集上，我们的 pass@k 方法解锁了学习，这可能是由于通过优先考虑联合效用而非单个样本的效用实现了更好的探索。

英文摘要

Reinforcement Learning (RL) algorithms sample multiple n>1 solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual improvement on harder examples. As a fix, we propose Pass-at-k Policy Optimization (PKPO), a transformation on the final rewards which leads to direct optimization of pass@k performance, thus optimizing for sets of samples that maximize reward when considered jointly. Our contribution is to derive novel low-variance unbiased estimators for pass@k and its gradient, in both the binary and continuous reward settings. We show optimization with our estimators reduces to standard RL with rewards that have been jointly transformed by a stable and efficient transformation function. While previous efforts are restricted to k=n, ours is the first to enable robust optimization of pass@k for any arbitrary k <= n. Moreover, instead of trading off pass@1 performance for pass@k gains, our method allows annealing k during training, optimizing both metrics and often achieving strong pass@1 numbers alongside significant pass@k gains. We validate our reward transformations on toy experiments, which reveal the variance-reducing properties of our formulations. We also include real-world examples using the open-source LLM, GEMMA-2. We find that our transformation effectively optimizes for the target k. Furthermore, higher k values enable solving more and harder problems, while annealing k boosts both the pass@1 and pass@k. Crucially, for challenging task sets where conventional pass@1 optimization stalls, our pass@k approach unblocks learning, likely due to better exploration by prioritizing joint utility over the utility of individual samples.
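For readers unfamiliar with the pass@k metric itself, the following sketch computes the widely used combinatorial estimate of pass@k from n binary-reward samples, of which c are correct. This is the standard estimator from the code-generation evaluation literature, shown here only to fix ideas; whether it coincides with the estimators derived in the paper is not stated in the abstract, and the function name `pass_at_k` is a hypothetical choice.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n samples with c successes (binary rewards).

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    uniformly random size-k subset of the n samples contains no success.
    """
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # math.comb returns 0 when n - c < k, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With one success among four samples, pass@1 is 1/4, and any subset of
# all four samples must contain the success.
single = pass_at_k(n=4, c=1, k=1)
full = pass_at_k(n=4, c=1, k=4)
```

With k=1 the estimate reduces to the plain success rate c/n, which matches the abstract's remark that rewarding samples independently optimizes pass@1.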

2509.10303 2026-06-11 cs.LG cs.AI 版本更新

Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions

超越次优性：离线强化学习通过随机解决方案学习有效调度

Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang

发表机构 * Eindhoven University of Technology（埃因霍温理工大学）

AI总结提出离线RL算法CDQAC，从次优静态数据集学习调度策略，在JSP/FJSP上超越在线RL和强启发式方法，仅需1-5%数据，发现状态-动作覆盖比轨迹质量更重要。

详情

AI中文摘要

在线强化学习（RL）方法通过与模拟环境直接交互学习调度策略，在作业车间调度（JSP）和柔性作业车间调度（FJSP）问题上表现出色。然而，这些方法通常需要大量的训练交互，限制了其样本效率和实际适用性。受此挑战的启发，我们引入了保守离散分位数演员-评论家（CDQAC），这是一种离线RL算法，可以直接从静态、次优数据集中学习有效的调度策略。CDQAC将基于分位数的评论家与延迟策略更新相结合，以估计机器-操作对的回报分布。在JSP和FJSP基准上的大量实验表明，CDQAC始终优于生成数据的启发式方法，超越了最先进的离线和在线RL基线，并且具有很高的样本效率，仅需原始数据集的1%到5%即可学习高质量策略。我们的分析表明，在调度中，离线RL的性能主要受状态-动作覆盖范围而非单个轨迹质量的影响。调度将密集奖励（与完工时间目标对齐）与跨启发式方法的等长轨迹相结合，从而能够从广泛的行为中有效学习。与此观察一致，在由简单随机启发式方法生成、覆盖范围更广的数据集上训练的策略，其性能优于在由更强启发式方法（如遗传算法）生成的数据集上训练的策略。

英文摘要

Online reinforcement learning (RL) approaches have demonstrated strong performance on Job Shop Scheduling (JSP) and Flexible JSP (FJSP) problems by learning scheduling policies through direct interaction with simulated environments. However, these methods often require extensive training interactions, limiting their sample efficiency and practical applicability. Motivated by this challenge, we introduce Conservative Discrete Quantile Actor-Critic (CDQAC), an offline RL algorithm that learns effective scheduling policies directly from static, suboptimal datasets. CDQAC couples a quantile-based critic with delayed policy updates to estimate the return distribution of machine-operation pairs. Extensive experiments on JSP and FJSP benchmarks demonstrate that CDQAC consistently outperforms the data-generating heuristics, surpasses state-of-the-art offline and online RL baselines, and is highly sample efficient, requiring only 1 to 5% of the original dataset to learn high-quality policies. Our analysis suggests that, in scheduling, offline RL performance is governed mainly by state-action coverage rather than the quality of individual trajectories. Scheduling couples a dense reward aligned with the makespan objective with equal-length trajectories across heuristics, enabling effective learning from a broad range of behaviors. Consistent with this observation, policies trained on datasets generated by a simple random heuristic, which offer broader coverage, outperform policies trained on datasets produced by stronger heuristics such as Genetic Algorithms.

2509.26294 2026-06-11 cs.LG cs.AI 版本更新

Noise-Guided Transport for Imitation Learning

噪声引导的模仿学习传输方法

Lionel Blondé, Joao A. Candido Ramos, Alexandros Kalousis

发表机构 * University of Cambridge（剑桥大学）； University of Oxford（牛津大学）

AI总结针对低数据场景下的模仿学习，提出噪声引导传输（NGT）方法，通过对抗训练将模仿问题转化为最优传输问题，无需预训练或特殊架构，在极低数据量下实现强性能。

Comments Accepted at ICML 2026. Code: https://github.com/lionelblonde/ngt

2510.02149 2026-06-11 cs.LG math.OC stat.ML 版本更新

Reinforcement Learning with Action-Triggered Observations

具有动作触发观测的强化学习

Alexander Ryabchenko, Wenlong Mou

发表机构 * Department of Statistical Sciences, University of Toronto（多伦多大学统计科学系）； Vector Institute（向量研究所）

AI总结提出动作触发稀疏可追踪MDP框架，推导Bellman方程并证明最优策略存在，利用观测间动作序列的线性表示实现基于回归的方法，在几何分布情节下达到与完全可观测线性MDP匹配的遗憾界。

详情

AI中文摘要

我们引入了动作触发稀疏可追踪马尔可夫决策过程（ATST-MDPs），这是一种用于部分可观测性的强化学习框架，其中完整状态观测在每个步骤以由所选动作决定的概率随机发生。我们推导了针对该设置的Bellman方程，并证明了最优策略的存在性。利用稀疏观测揭示完整状态的事实，我们提供了一个等价公式，其中智能体在连续观测之间承诺动作序列。在线性MDP假设下，我们证明了这些动作序列上的值函数在有限维特征映射中具有线性表示，从而能够使用标准的基于回归的方法。作为一个应用，我们推导了ATST-LSVI-UCB，一种乐观算法，在几何分布的情节学习中实现了遗憾界$\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$，其中$K$是情节数，$d$是特征维度，$\gamma$是折扣因子（情节继续概率），与完全可观测线性MDP的已知速率相匹配。

英文摘要

We introduce Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), a reinforcement learning framework for partial observability in which full state observations occur stochastically at each step, with probability determined by the chosen action. We derive Bellman equations tailored to this setting and establish the existence of an optimal policy. Exploiting the fact that sporadic observations reveal the full state, we provide an equivalent formulation in which agents commit to action-sequences between consecutive observations. Under the linear MDP assumption, we show that the value function over such action-sequences admits a linear representation in a finite-dimensional feature map, enabling standard regression-based methods. As an application, we derive ATST-LSVI-UCB, an optimistic algorithm achieving regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$ for episodic learning with geometrically distributed horizons, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (episode continuation probability), matching the known rate for linear MDPs with full observability.
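For intuition about the $(1-\gamma)^{-3}$ factor in the bound, it may help to recall the expected episode length under a geometric horizon. The identity below is a standard fact about geometric distributions. It assumes the convention $P(H=h)=(1-\gamma)\gamma^{h-1}$ for $h \ge 1$, which is one plausible reading of "episode continuation probability" and is not spelled out in the abstract.

```latex
\mathbb{E}[H]
  = \sum_{h=1}^{\infty} h\,(1-\gamma)\,\gamma^{h-1}
  = (1-\gamma)\cdot\frac{1}{(1-\gamma)^{2}}
  = \frac{1}{1-\gamma}
```

Under this reading, $(1-\gamma)^{-1}$ plays the role that a fixed horizon length plays in finite-horizon regret bounds.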

2601.08136 2026-06-11 cs.LG cs.SY eess.SY 版本更新

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

反向流匹配：基于扩散与流策略的在线强化学习统一框架

Zeyang Li, Sunbochen Tang, Navid Azizan

发表机构 * Zeyang Li（李泽阳）； Sunbochen Tang（唐顺波晨）； Navid Azizan（阿齐兹安纳维）

AI总结针对在线强化学习中扩散与流策略缺乏目标样本的问题，提出反向流匹配框架，通过后验均值估计和Langevin Stein算子构造控制变量，统一了噪声期望与梯度期望两类方法，并扩展到流策略，提升训练效率与稳定性。

Comments ICML 2026 (Spotlight); Code: https://github.com/azizanlab/ReverseFlowMatching

详情

AI中文摘要

扩散和流策略因其强大的表达能力在在线强化学习（RL）中日益重要，但高效训练它们仍是一个关键挑战。在线RL与标准生成建模的一个根本区别在于缺乏来自Q函数定义的目标玻尔兹曼分布的直接样本。为此，针对扩散策略提出了两类看似不同的方法：噪声期望族，使用噪声的加权平均作为训练目标；梯度期望族，使用Q函数梯度的加权平均。然而，这些目标如何正式相关，或者它们能否被综合成一个更通用的公式，目前尚不清楚。在本文中，我们提出了一个统一框架——反向流匹配（RFM），该框架严格解决了在没有直接目标样本的情况下训练扩散和流模型的问题。通过采用反向推理视角，我们将训练目标表述为给定中间噪声样本的后验均值估计问题。关键地，我们引入Langevin Stein算子来构造零均值控制变量，推导出一类具有相同期望的通用估计器。我们表明，现有的噪声期望和梯度期望方法只是这个更广泛类别中的两个具体实例。这种统一观点带来了两个关键进展：它将针对玻尔兹曼分布的能力从扩散策略扩展到流策略，并使得能够原则性地结合Q值和Q梯度信息形成有效估计器，从而提高训练效率和稳定性。我们将RFM实例化以在在线RL中训练流策略，并在连续控制基准测试中展示了相比扩散策略基线的改进性能。

英文摘要

Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty that distinguishes online RL from standard generative modeling is the lack of direct samples from the target Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which uses a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. However, it remains unclear how these objectives are formally related, or whether they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that share the same expectation. We show that existing noise-expectation and gradient-expectation methods are simply two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and it enables the principled combination of Q-value and Q-gradient information to form an effective estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.

2603.08558 2026-06-11 cs.LG stat.ML 版本更新

在线平台中的数据驱动动态分类：学习双边信息

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

发表机构 * IE Business School, IE University（IE大学商学院）； Kenan-Flagler Business School, The University of North Carolina at Chapel Hill（北卡罗来纳大学教堂山分校肯纳-弗拉格勒商学院）

AI总结针对双边服务平台，提出一种数据驱动算法，在未知顾客和卖家选择参数的情况下动态优化商品分类，并证明其遗憾值随时间呈多对数增长且达到最优速率。

详情

AI中文摘要

我们研究了一个在离散时间环境下，具有不完全信息和异质顾客的双边服务平台上的动态分类问题。在每个周期，一位顾客到达寻求服务，平台选择一组卖家进行展示。顾客根据多项逻辑选择模型，最多向分类中的一个卖家提出交易。经过固定数量的周期后，卖家审查收到的提议，并根据另一个多项逻辑选择模型，每位卖家最多选择一个顾客，然后循环重复。一个关键挑战是平台事先不知道顾客或卖家的选择模型参数。据我们所知，这是首次研究双边选择参数均未知的动态分类问题。我们开发了一种数据驱动算法，该算法在优化平台目标的同时学习这些参数。我们使用遗憾值来评估性能，该遗憾值衡量相对于一个预知所有参数和顾客到达时间的先知基准的收入损失。我们证明该算法的最坏情况遗憾值随时间呈多对数增长，并推导出匹配的下界，从而确定其速率最优性。

英文摘要

We study a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers in a discrete-time setting. In each period, a customer arrives seeking service, and the platform chooses an assortment of sellers to display. The customer then proposes a transaction to at most one seller in the assortment according to a multinomial logit choice model. After a fixed number of periods, sellers review the proposals they have received and each chooses at most one customer according to another multinomial logit choice model, after which the cycle repeats. A key challenge is that the platform does not know the choice-model parameters of either customers or sellers in advance. To our knowledge, this is the first study of a dynamic assortment problem in which both sides' choice parameters are unknown. We develop a data-driven algorithm that learns these parameters while optimizing the platform's objective over time. We evaluate performance using regret, which measures revenue loss relative to a clairvoyant benchmark that knows all parameters and customer arrivals in advance. We show that the algorithm's worst-case regret grows polylogarithmically over time, and we derive a matching lower bound, establishing its rate optimality.
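The customer-side choice step described above can be sketched with the textbook multinomial logit (MNL) formula. This is a minimal sketch under stated assumptions: the utilities are hypothetical inputs, the outside (no-proposal) option is normalized to utility zero, and the function name `mnl_choice_probs` is invented for illustration. The paper's exact parameterization is not given in the abstract.

```python
import math


def mnl_choice_probs(utilities):
    """Return (per-seller probabilities, outside-option probability).

    Seller i in the displayed assortment is chosen with probability
    exp(u_i) / (1 + sum_j exp(u_j)); the outside option has utility 0.
    """
    # Subtract the maximum utility (including the outside option's 0)
    # before exponentiating, for numerical stability.
    shift = max([0.0] + list(utilities))
    weights = [math.exp(u - shift) for u in utilities]
    outside = math.exp(-shift)
    denom = outside + sum(weights)
    return [w / denom for w in weights], outside / denom


# Two sellers with the same utility as the outside option: each of the
# three alternatives is equally likely.
seller_probs, no_choice = mnl_choice_probs([0.0, 0.0])
```

Because the customer proposes to at most one seller, the seller probabilities and the outside-option probability sum to one. A learning algorithm in this setting would have to estimate the utilities from observed proposals rather than receive them as inputs.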

2307.01472 2026-06-11 cs.AI cs.LG cs.MA 版本更新

关于RL训练的语言模型的最优推理长度

Daisuke Nohara, Taishi Nakamura, Rio Yokota

发表机构 * University of Tokyo（东京大学）

AI总结研究强化学习训练的语言模型中推理长度与准确率的非单调关系，发现存在最优中间长度，并通过模式准确率分析揭示其成因。

Comments 18 pages, 12 figures

2603.14762 2026-06-11 math.OC cs.LG cs.SY eess.SY 版本更新

物理信息驱动的生成式AI在半导体制造中的应用：通过构造强制生成模型中的硬物理约束

Yaser Mike Banad, Sarah Sharif

发表机构 * School of Electrical and Computer Engineering, University of Oklahoma（俄克拉荷马大学电气与计算机工程学院）； Center for Quantum Research and Technology, University of Oklahoma（俄克拉荷马大学量子研究与技术中心）； Intelligent Neuromorphic and Quantum Understanding for Innovative Research and Engineering (INQUIRE) Laboratory（创新研究与工程智能神经形态与量子理解实验室）； Material Science and Engineering Program, University of Oklahoma, Norman, OK 73019 USA（俄克拉荷马大学材料科学与工程项目，Norman, OK 73019 USA）

AI总结针对半导体制造中生成模型必须满足硬物理约束的问题，本文提出通过构造集成物理信息（如物理信息扩散、PDE约束变分模型等）来强制约束，而非事后过滤，并给出四种集成模式和未来研究方向。

详情

AI中文摘要

生成模型越来越多地被用于为物理系统提出设计、数据和控制动作，然而许多此类系统受硬物理约束而非感知合理性支配。半导体制造提供了一个严苛的测试案例：生成的掩模、布局、合成缺陷数据和工艺配方必须遵守光刻、传输、反应和器件物理约束，因为物理无效的样本不仅质量低劣，而且无法使用。本文认为，半导体制造揭示了一个更广泛的计算科学挑战，即用于受约束物理领域的生成式AI必须通过构造实现物理信息驱动，而非仅通过事后过滤来纠正。我们调查了新兴的架构工具包，包括物理信息扩散、PDE约束变分模型、神经算子先验和守恒律尊重生成网络，并展示了它如何与可微分光刻、TCAD、工艺仿真和自主实验相联系。我们识别了生成模型与基于物理的模拟器之间的四种集成模式，并提出了一个以物理保真度基准、可微分模拟器基础设施以及面向物理设计和制造的多模态基础模型为中心的研究议程。核心主张是分析性的而非修辞性的：在物理有效性是成功的关键标准的情况下，通过构造强制约束的架构应被期望优于事后过滤的架构，而晶圆厂正是这种区别最鲜明的环境。

英文摘要

Generative models are increasingly used to propose designs, data, and control actions for physical systems, yet many such systems are governed by hard physical constraints rather than by perceptual plausibility. Semiconductor manufacturing provides a demanding test case: generated masks, layouts, synthetic defect data, and process recipes must obey lithography, transport, reaction, and device-physics constraints, because physically invalid samples are not merely low quality but unusable. This Perspective argues that semiconductor manufacturing exposes a broader computational-science challenge, namely that generative AI for constrained physical domains must be physics-informed by construction, not corrected only through post-hoc filtering. We survey the emerging architectural toolkit, including physics-informed diffusion, PDE-constrained variational models, neural-operator priors, and conservation-law-respecting generative networks, and show how it connects to differentiable lithography, TCAD, process simulation, and autonomous experimentation. We identify four integration patterns between generative models and physics-based simulators, and we propose a research agenda centered on physics-fidelity benchmarks, differentiable simulator infrastructure, and multimodal foundation models for physical design and manufacturing. The central claim is analytical rather than rhetorical: where physical validity is the binding criterion of success, architectures that enforce it by construction should be expected to outperform those that filter for it after the fact, and the fab is the setting where this distinction is sharpest.

2606.11277 2026-06-11 cs.LG physics.comp-ph 新提交

重新评估掩蔽扩散语言模型中的置信度重新掩蔽

Stipe Frkovic, Metod Jazbec, Dan Zhang, Christian A. Naesseth, Ilija Bogunovic, Eric Nalisnick

发表机构 * UvA-Bosch Delta Lab, University of Amsterdam（阿姆斯特丹大学UvA-Bosch Delta实验室）； Bosch Center for AI（博世人工智能中心）； University of Basel（巴塞尔大学）； Johns Hopkins University（约翰霍普金斯大学）

AI总结本文重新评估了掩蔽扩散语言模型中一种无需训练的后验置信度重新掩蔽方法WINO，发现在标准解码设置下其收益甚微，且会加剧多样性坍塌问题。

详情

AI中文摘要

掩蔽扩散语言模型（dLLMs）最近已成为自回归语言模型的有竞争力的替代方案，其通过并行令牌生成实现更快的推理。然而，掩蔽公式的一个显著限制是，一旦令牌被解除掩蔽，就无法再修改，这使得dLLMs容易受到早期采样错误的影响。为了解决这个问题，越来越多的研究试图扩展掩蔽dLLMs，使其具有自我纠正（重新掩蔽）能力。其中一类有吸引力的方法以无需训练、事后方式基于令牌置信度实现，早期报告的结果令人鼓舞。在这项工作中，我们重新审视了代表性事后重新掩蔽方法WINO [Hong et al., 2026]的实证评估，发现在标准解码设置（较短的块长度）下，它相比于仅基于置信度的解除掩蔽 [Wu et al., 2025] 几乎没有带来好处。将评估扩展到非贪婪解码，我们发现虽然基于置信度的重新掩蔽可以在一定程度上减轻由增加随机性引入的错误，但它也加剧了先前报道的基于置信度的解除掩蔽导致的多样性坍塌。总体而言，我们的结果表明，事后基于置信度的重新掩蔽的好处高度依赖于设置，这凸显了需要更全面的评估框架。

英文摘要

Masked diffusion language models (dLLMs) have recently emerged as a competitive alternative to autoregressive language models, with the promise of faster inference via parallel token generation. A notable limitation of the masked formulation, however, is that once a token has been unmasked it can no longer be revised, leaving dLLMs vulnerable to early sampling mistakes. To address this, a growing body of work has sought to extend masked dLLMs with self-correcting (remasking) capabilities. One appealing subset of these methods does so in a training-free, post-hoc manner based on token confidences, with encouraging early reported results. In this work, we revisit the empirical evaluation of a representative post-hoc remasking method, WINO [Hong et al., 2026], and find that under standard decoding settings (shorter block lengths) it brings little-to-no benefit over confidence-based unmasking alone [Wu et al., 2025]. Extending the evaluation to non-greedy decoding, we find that while confidence-based remasking can mitigate errors introduced by increased stochasticity to some extent, it also exacerbates the diversity collapse previously reported for confidence-based unmasking. Overall, our results show that the benefits of post-hoc confidence-based remasking are highly setting-dependent, underscoring the need for a more comprehensive evaluation framework.

2606.11203 2026-06-11 cs.CL cs.LG 交叉投稿

LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

LatticeBridge: 用于忠实结构化序列合成的罕见事件序列推理

Faruk Alpay, Bugra Kilictas

发表机构 * Bahcesehir University（巴切塞希尔大学）

AI总结针对结构化序列生成中约束满足的罕见事件问题，提出LatticeBridge方法，结合前缀语言模型、实例编译表面自动机和扭曲序列蒙特卡洛解码器，在多个基准上显著提升锚点满足率和覆盖率。

Comments 19 pages. Code and benchmark files available at https://github.com/farukalpay/latticebridge

详情

AI中文摘要

结构化序列生成通常要求模型在单个输出中满足多个输入派生约束。标准解码方法可能赋予流畅延续高概率，而对同时实现所有必需锚点的延续赋予低概率。我们将此机制视为罕见事件序列推理问题。LatticeBridge 结合了紧凑前缀语言模型、实例编译表面自动机以及带有重采样、多级分裂和源自实例提供短语的源支持提议项的扭曲序列蒙特卡洛 (SMC) 解码器。约束表示从每个输入实例编译而来，不依赖人工整理的词汇类别。在涵盖 CommonGen、E2E NLG 和 WikiBio 的 2,610 个可达到验证任务上，粒子解码器在共享提议模型下，相比贪心、波束过滤和 best-of-k 祖先基线，提高了精确锚点满足率和平均锚点覆盖率。由于仅精确锚点满足不能排除不支持的属性替换，评估同时报告了所需锚点覆盖率、源覆盖率、源入侵诊断、重叠度、运行时间和粒子统计量。该基准在固定提议模型下刻画了忠实度-重叠度-延迟前沿。

英文摘要

Structured sequence generation often requires a model to satisfy several input-derived constraints in a single output. Standard decoding methods may assign high probability to fluent continuations while placing low mass on continuations that realize all required anchors jointly. We study this regime as a rare-event sequential inference problem. LatticeBridge combines a compact prefix language model, instance-compiled surface automata, and a twisted sequential Monte Carlo (SMC) decoder with resampling, multilevel splitting, and a source-support proposal term derived from instance-provided phrases. The constraint representation is compiled from each input instance and does not rely on manually curated lexical classes. On 2,610 attainable validation tasks spanning CommonGen, E2E NLG, and WikiBio, the particle decoder improves exact anchor satisfaction and mean anchor coverage over greedy, beam-filtered, and best-of-k ancestral baselines under a shared proposal model. Since exact anchor satisfaction alone does not rule out unsupported attribute substitutions, the evaluation reports required-anchor coverage, source coverage, source-intrusion diagnostics, overlap, runtime, and particle statistics jointly. The benchmark characterizes the faithfulness-overlap-latency frontier under a fixed proposal model.
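The resampling step that a sequential Monte Carlo decoder relies on can be sketched generically as follows. This is not the LatticeBridge implementation: the helper names are hypothetical, the particles are abstract indices rather than partial sequences, and systematic resampling is just one common scheme, chosen here as an assumption.

```python
import math
import random


def normalize_log_weights(log_weights):
    """Convert unnormalized log-weights to normalized weights stably."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); equals the particle count for uniform weights."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, rng):
    """Return ancestor indices drawn by systematic resampling."""
    n = len(weights)
    u0 = rng.random() / n  # single uniform offset shared by all strata
    indices, j, cumulative = [], 0, weights[0]
    for i in range(n):
        u = u0 + i / n
        # Advance to the first particle whose cumulative weight exceeds u.
        while u >= cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices


rng = random.Random(0)
weights = normalize_log_weights([0.0, 0.0, 0.0, 0.0])
ess_uniform = effective_sample_size(weights)
ancestors = systematic_resample([0.0, 0.0, 1.0, 0.0], rng)
```

In a rare-event setting, most particles carry negligible weight after a constraint-related update, so the effective sample size drops and resampling concentrates computation on the few prefixes that can still satisfy every required anchor.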

2606.11304 2026-06-11 physics.ins-det cs.LG hep-ex hep-ph 交叉投稿

SPADE: Split-and-Delay Embeddings for Autoregressive High-Granularity Calorimeter Simulation

SPADE: 用于自回归高粒度量热器模拟的分裂与延迟嵌入

Joschka Birk, Frank Gaede, Anna Hallin, Gregor Kasieczka, Martina Mozzanica, Henning Rose

发表机构 * Institute for Experimental Physics, Universität Hamburg（实验物理研究所，汉堡大学）； Deutsches Elektronen-Synchrotron DESY（德国电子同步辐射光源DESY）

AI总结提出SPADE自回归变压器，通过独立嵌入多特征令牌并延迟特征流，利用标准自注意力学习令牌内相关性，在ILD探测器点云簇射生成中优于现有模型。

Comments 20 pages, 13 figures

2606.12282 2026-06-11 cs.SD cs.LG 交叉投稿

PianoKontext: Expressive Performance Rendering from Deadpan Context

PianoKontext: 从平淡语境中生成富有表现力的演奏

Dmitrii Gavrilev

发表机构 * Dmitrii Gavrilev

AI总结提出PianoKontext，一种基于流匹配的钢琴演奏渲染模型，通过动态时间规整对齐乐谱与演奏的潜在表示，生成可变长度的表现力演奏。

Comments ICML 2026 Workshop on Machine Learning for Audio (Oral)

2601.10774 2026-06-11 cs.LG hep-lat 版本更新

Analytic Bijections for Smooth and Interpretable Normalizing Flows

用于平滑且可解释的归一化流的解析双射

Mathis Gerdes, Miranda C. N. Cheng

发表机构 * University of Cambridge（剑桥大学）

AI总结提出三类全局光滑、解析可逆的双射函数，替代耦合流中的仿射变换或样条，并设计径向流架构，在径向结构目标上以千分之一参数达到耦合流质量。

Comments Final ICML 2026 version. 9 + 14 pages, 10 + 11 figures, 3 + 2 tables. New CIFAR-10 and tabular-data results; main text shortened for readability

详情

AI中文摘要

归一化流中的一个关键挑战是找到表达力强的可逆标量双射。现有方法面临权衡：仿射变换光滑且解析可逆但缺乏表达力；单调样条提供局部控制但仅分段光滑且作用于有界域；残差流实现光滑性但需要数值求逆。我们引入了三类解析双射，它们全局光滑（$C^\infty$），定义在整个$\mathbb{R}$上，且以闭式解析可逆，结合了先前方法的有利性质。除了作为耦合流中的即插即用替代品（其性能匹配或超越样条），我们还开发了径向流：一种使用直接参数化的新颖架构，在保持角度方向的同时变换径向坐标。径向流表现出卓越的训练稳定性，产生几何可解释的变换，并且在具有径向结构的目标上，能以$1000$倍更少的参数达到与耦合流相当的质量。我们在1D和2D基准测试上进行了全面评估，并通过$\phi^4$格点场论实验证明了其在更高维物理问题中的适用性，其中我们的双射优于仿射基线，并支持针对具体问题、可解决模式崩溃的设计。

英文摘要

A key challenge in normalizing flows is finding expressive invertible scalar bijections. Existing approaches face trade-offs: affine transformations are smooth and analytically invertible but lack expressivity; monotonic splines offer local control but are only piecewise smooth and act on bounded domains; residual flows achieve smoothness but need numerical inversion. We introduce three families of analytic bijections that are globally smooth ($C^\infty$), defined on all of $\mathbb{R}$, and analytically invertible in closed form, combining the favorable properties of prior approaches. Beyond serving as drop-in replacements in coupling flows, where they match or exceed spline performance, we develop radial flows: a novel architecture using direct parametrization that transforms the radial coordinate while preserving angular direction. Radial flows exhibit exceptional training stability, produce geometrically interpretable transformations, and on targets with radial structure can achieve comparable quality to coupling flows with $1000\times$ fewer parameters. We provide comprehensive evaluation on 1D and 2D benchmarks, and demonstrate applicability to higher-dimensional physics problems through experiments on $\phi^4$ lattice field theory, where our bijections outperform affine baselines and enable problem-specific designs that address mode collapse.
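To make the three requirements concrete (globally smooth, defined on all of $\mathbb{R}$, closed-form inverse), the sketch below uses the sinh-arcsinh map $f(x) = \sinh(a \cdot \operatorname{asinh}(x) + b)$ with $a > 0$. This map is chosen purely as a familiar example with those properties. The abstract does not name the paper's three families, so nothing here should be read as one of them, and the function names and default parameters are hypothetical.

```python
import math


def forward(x, a=1.5, b=0.3):
    """Sinh-arcsinh bijection on the real line; requires a > 0."""
    return math.sinh(a * math.asinh(x) + b)


def inverse(y, a=1.5, b=0.3):
    """Closed-form inverse of `forward`; no numerical root finding needed."""
    return math.sinh((math.asinh(y) - b) / a)


def log_abs_derivative(x, a=1.5, b=0.3):
    """log |f'(x)|, the per-dimension log-determinant term used in flows.

    f'(x) = a * cosh(a * asinh(x) + b) / sqrt(1 + x^2), which is strictly
    positive for a > 0, so f is strictly increasing.
    """
    inner = a * math.asinh(x) + b
    return math.log(a) + math.log(math.cosh(inner)) - 0.5 * math.log1p(x * x)


points = [-3.0, -0.5, 0.0, 2.0, 10.0]
round_trip_errors = [abs(inverse(forward(x)) - x) for x in points]
```

Both directions cost one `sinh` and one `asinh` evaluation, which is the practical appeal of closed-form invertibility compared with flows that need iterative inversion.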

2602.00424 2026-06-11 cs.LG cond-mat.mtrl-sci 版本更新

Open Materials Generation with Inference-Time Reinforcement Learning

基于推理时间强化学习的开放材料生成

Philipp Hoellmer, Stefano Martiniani

发表机构 * ETH Zurich（苏黎世联邦理工学院）

AI总结提出OMatG-IRL框架，通过策略梯度强化学习直接作用于学习的速度场，无需显式计算得分，实现晶体结构预测中的能量目标强化，采样效率提升一个数量级。

Comments 25 pages, 12 figures, 6 tables

详情

AI中文摘要

晶体材料的连续时间生成模型通过学习预测稳定晶体结构实现逆向材料设计，但将显式目标属性纳入生成过程仍然具有挑战性。策略梯度强化学习（RL）为生成模型与下游目标对齐提供了原则性机制，但通常需要访问得分，这阻碍了其应用于仅学习速度场的基于流的模型。我们提出了一种推理时间强化学习的开放材料生成（OMatG-IRL）框架，这是一种直接作用于学习的速度场的策略梯度RL框架，无需显式计算得分。OMatG-IRL利用底层生成动力学的随机扰动，保持预训练生成模型的基线性能，同时在推理时实现探索和策略梯度估计。通过OMatG-IRL，我们首次将RL应用于晶体结构预测（CSP）。我们的方法能够有效强化基于能量的目标，同时通过成分条件保持多样性，并且取得了与基于得分的RL方法竞争的性能。最后，我们展示了OMatG-IRL可以学习时间相关的速度退火调度，实现精确的CSP，采样效率提高一个数量级，相应地生成时间减少。OMatG-IRL代码包含在开放材料生成（OMatG）框架的新版本中，可从该https URL获取。

合成住宅：数据稀缺下用于住宅建筑数据生成的多模态生成式AI管道

Jackson Eshbaugh, Chetan Tiwari, Jorge Silveyra

发表机构 * Lafayette University（拉法叶大学）； Georgia State University（佐治亚州立大学）

AI总结提出一个多模态生成式AI框架，整合图像、表格和模拟组件，从公开记录和图像生成合成住宅建筑数据集，以解决建筑参数数据稀缺问题。

Comments 37 pages; 2 appendices; 6 figures; 2 tables. Code available at https://github.com/Lafayette-EshbaughSilveyra-Group/synthetic-homes

详情

AI中文摘要

计算模型已成为建筑和城市尺度多尺度能源建模研究的强大工具，支持建筑和城市能源系统的数据驱动分析。然而，这些模型需要大量的建筑参数数据，这些数据通常难以获取、收集成本高昂或受隐私限制。我们引入了一个模块化的多模态生成式人工智能（AI）框架，该框架整合了图像、表格和基于模拟的组件，并从公开的县记录和图像生成合成住宅建筑数据集，同时提出了一个实例化该框架的端到端管道。为了减少典型的大型语言模型（LLM）挑战，我们使用基于遮挡的视觉焦点分析来评估模型组件。我们的分析表明，我们选择的视觉语言模型在建筑图像处理方面比基于GPT的替代方案实现了更大的视觉焦点。我们还根据国家参考数据集评估了结果的真实性，发现我们的合成数据在四个选定变量中的三个重叠率超过95%。这项工作减少了对昂贵或受限数据源的依赖，降低了建筑尺度能源研究和机器学习（ML）驱动的城市能源建模的障碍，从而在数据稀缺的情况下实现了可扩展的下游任务，如能源建模、改造分析和城市尺度模拟。

英文摘要

Computational models have emerged as powerful tools for multi-scale energy modeling research at the building and urban scale, supporting data-driven analysis across building and urban energy systems. However, these models require large amounts of building parameter data that is often inaccessible, expensive to collect, or subject to privacy constraints. We introduce a modular, multimodal generative Artificial Intelligence (AI) framework that integrates image, tabular, and simulation-based components and produces synthetic residential building datasets from publicly available county records and images, and present an end-to-end pipeline instantiating this framework. To reduce typical Large Language Model (LLM) challenges, we evaluate our model's components using occlusion-based visual focus analysis. Our analysis demonstrates that our selected vision-language model achieves greater visual focus than a GPT-based alternative for building image processing. We also assess realism of our results against a national reference dataset, finding that our synthetic data overlaps more than 95% for three of the four selected variables. This work reduces dependence on costly or restricted data sources, lowering barriers to building-scale energy research and Machine Learning (ML)-driven urban energy modeling, and therefore enabling scalable downstream tasks such as energy modeling, retrofit analysis, and urban-scale simulation under data scarcity.

2603.12901 2026-06-11 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT 版本更新

A theory of learning data statistics in diffusion models, from easy to hard

扩散模型中学习数据统计的理论：从容易到困难

Lorenzo Bardone, Claudia Merger, Sebastian Goldt

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结本文研究了扩散模型在学习数据统计时的分布简单性偏差，揭示了学习 pairwise 统计和 higher-order 统计所需的样本复杂度差异，并引入了扩散信息指数这一不变量。

详情

Journal ref: ICML 2026

AI中文摘要

尽管扩散模型已成为强大的生成模型，但其学习动态仍不明确。我们通过实验证明，标准扩散模型在自然图像上学习时存在分布简单性偏差，先学习简单的 pairwise 输入统计，再转向更高阶相关性。我们在简单的去噪器上用最小数据模型混合累积模型重现了这一行为，并精确控制了输入的 pairwise 和 higher-order 相关性。我们识别出一个模型不变量，即扩散信息指数，类比于不同学习范式中的相关不变量。利用这一不变量，我们证明去噪器在线性样本复杂度下学习输入的简单 pairwise 统计，而更复杂的 higher-order 统计如四阶累积量需要至少立方样本复杂度。我们还证明，如果 pairwise 和 higher-order 统计共享相关潜在结构，则学习四阶累积量的样本复杂度是线性的。本文描述了扩散模型如何学习越来越复杂分布的关键机制。

英文摘要

While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.

2606.11258 2026-06-11 cs.LG nlin.PS physics.comp-ph 新提交

Loss Landscape Diagnosis for Gradient-Based Gray-Scott System Inversion: Disentangling the Roles of PINN Components

基于梯度的Gray-Scott系统反演的损失景观诊断：解构PINN各组件的角色

Yan Yang

发表机构 * Yan Yang（杨 Yan）

AI总结通过将稳态损失直接反向传播穿过展开的（unrolled）Gray-Scott模拟，发现优化因损失景观中的平坦高原和陡峭悬崖而失败，而PINN中的残差损失通过隐式编码完整PDE动力学避免了该病理现象。

Comments Accepted at the AI4Physics Workshop, ICML 2026 (non-archival). 14 pages, 10 figures

AI中文摘要

反应扩散系统的基于梯度的反演通常通过代理模型或物理信息神经网络（PINN）进行，而最直接的路径，即通过PDE结构本身进行反向传播，在很大程度上被回避。我们将这条直接路径作为诊断探针，通过展开的（unrolled）Gray-Scott模拟反向传播稳态损失以恢复其参数，不使用代理模型或神经网络增强。优化未能收敛，直接绘制损失景观可将失败定位于其几何结构：平坦高原没有梯度信号，并被与分岔边界对齐的陡峭悬崖所包围。这种结构在不同损失函数中反复出现，并且无论梯度如何路由到参数都会被继承。将这一最小设置视为PINN的消融实验，我们解构了每个组件的作用：在神经网络固定的情况下，残差损失是PDE参数的二次函数，产生平滑的损失景观，因此仅凭它就能通过隐式编码所有初始条件下的完整PDE动力学来避免病理现象。神经网络则无法修复不适定的参数子空间，因此仅用于补全观测数据，这种分工此前未被明确指出。这些发现对PINN类方法具有具体的设计意义，并就增加维度何时真正有帮助提供了更广泛的启发。

英文摘要

Gradient-based inversion of reaction-diffusion systems is typically approached via surrogate models or physics-informed neural networks (PINNs), while the most direct route, backpropagation through the PDE's structure itself, has largely been avoided. We pursue this direct route as a diagnostic probe, backpropagating a steady-state loss through unrolled Gray-Scott simulation to recover its parameters, with no surrogate or neural-network augmentation. Optimization fails to converge, and plotting the landscape directly locates the failure in its geometry -- flat plateaus with no gradient signal, bounded by sharp cliffs that align with bifurcation boundaries -- a structure that recurs across loss functions and is inherited however the gradients are routed to parameters. Reading this minimal setup as an ablation of PINN, we disentangle each component's role: with the neural network fixed, the residual loss is quadratic in the PDE parameters and yields a smooth landscape, so it alone already avoids the pathology, by implicitly encoding the full PDE dynamics across all initial conditions. The neural network, for its part, cannot repair an ill-posed parameter subspace, and so serves only to complete the observed data -- a division of labor not previously made explicit. These findings carry concrete design implications for PINN-type methods and a broader heuristic on when added dimensions actually help.

2606.11431 2026-06-11 cs.LG 新提交

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

超越欧几里得稳定性的镜像下降：初始化敏感性的指数级分离

Shira Vansover-Hager, Matan Schliserman, Ofir Schlisselberg, Tomer Koren

发表机构 * Blavatnik School of Computer Science and AI, Tel Aviv University（特拉维夫大学布拉瓦特尼克计算机科学与人工智能学院）； Google Research（谷歌研究院）

AI总结本文证明非二次正则化的镜像下降（MD）在凸光滑目标上对初始化的敏感性可呈指数级增长，与梯度下降（GD）形成鲜明对比，并提出基于锚点的Bregman正则化可缓解不稳定性。

AI中文摘要

镜像下降（MD）将梯度下降（GD）扩展到欧几里得几何之外，最近重新成为强化学习和LLM后训练中KL正则化策略优化的视角。这引发了一个基本的鲁棒性问题，对可重复性和可靠性至关重要：MD动力学对其输入的敏感性如何？我们关注初始化，它本身通常是预训练或先前对齐的模型。众所周知，二次正则化的MD（包括GD和马氏几何）对于凸光滑目标是稳定的。我们展示了一个鲜明的对比：一旦正则化器是非二次的，即使正则化器在欧几里得范数下是良条件的，MD对初始化的敏感性也可能比GD高指数级。我们给出了一个三维构造，其中目标函数是凸光滑的，正则化器是强凸、光滑且良条件的，初始$\varepsilon$扰动在$T$次步长为$\eta$的MD迭代后迅速放大到$\min\{\text{polylog}^{-1}(1/\varepsilon), \varepsilon e^{\Omega(\eta T)}\}$。对于单纯形上的典型KL正则化MD，我们证明即使线性目标在高维或近边界区域也能指数级放大初始$\varepsilon$扰动。最后，我们展示向锚点添加Bregman正则化项可以在很大程度上保持优化保证的同时稳定动力学，并且锚点的选择至关重要：在初始化处锚定仅部分缓解不稳定性，而在固定点锚定则产生更稳定的机制。

英文摘要

Mirror Descent (MD) extends Gradient Descent (GD) beyond Euclidean geometry and has recently reappeared as a lens for KL-regularized policy optimization in reinforcement learning and LLM post-training. This raises a basic robustness question, crucial to reproducibility and reliability: how sensitive are MD dynamics to their inputs? We focus on initialization, often itself a pretrained or previously aligned model. Quadratic-regularized MD, including GD and Mahalanobis geometries, is well-known to be stable for convex smooth objectives. We show a sharp contrast: once the regularizer is non-quadratic, MD can be exponentially more sensitive to initialization than GD, even with a well-conditioned regularizer in Euclidean norm. We give a three-dimensional construction with a convex, smooth objective and a strongly convex, smooth, well-conditioned regularizer where an initial $\varepsilon$ perturbation is quickly amplified to $\min\{\text{polylog}^{-1}(1/\varepsilon), \varepsilon e^{\Omega(\eta T)}\}$ after $T$ iterations of MD with step size $\eta$. For canonical KL-regularized MD on the simplex, we show that even linear objectives can amplify an initial $\varepsilon$ perturbation exponentially fast in high-dimensional or near-boundary regimes. Finally, we show that adding a Bregman regularization term toward an anchor point can stabilize the dynamics while largely preserving the optimization guarantees, and that the choice of anchor is crucial: anchoring at the initialization only partially mitigates the instability, whereas anchoring at a fixed point yields a more stable mechanism.
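
The near-boundary sensitivity of KL-regularized MD on the simplex can be illustrated with a toy sketch. It is not the paper's construction: the two-dimensional linear objective, the step size, and the starting points below are hypothetical values chosen for illustration, and the update is the standard entropic mirror descent (exponentiated gradient) step.

```python
import math

def kl_mirror_descent(x0, grad, eta, steps):
    """Entropic mirror descent (exponentiated gradient) on the probability simplex."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Multiplicative update, then renormalize back onto the simplex.
        w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x

def l1_distance(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# Hypothetical linear objective f(x) = <c, x>; its gradient is the constant c.
c = [1.0, 0.0]
linear_grad = lambda x: c

delta = 1e-6                          # distance of the start point from the boundary
x_a = [1.0 - delta, delta]
x_b = [1.0 - 2 * delta, 2 * delta]    # perturbed initialization

eta, steps = 0.1, 138                 # eta * steps is roughly log(1 / delta)
end_a = kl_mirror_descent(x_a, linear_grad, eta, steps)
end_b = kl_mirror_descent(x_b, linear_grad, eta, steps)

initial_gap = l1_distance(x_a, x_b)
final_gap = l1_distance(end_a, end_b)
print(initial_gap, final_gap)
```

In this toy run the gap between the two trajectories grows by several orders of magnitude, because the multiplicative update rescales the small coordinate by a factor of order $e^{\eta T}$ before normalization starts to dominate.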

2606.11574 2026-06-11 cs.LG cond-mat.mtrl-sci physics.chem-ph stat.ML 新提交

Range-Aware Bayesian Optimization for Discovering Diverse Designs within Target Property Windows

范围感知贝叶斯优化用于在目标属性窗口内发现多样化设计

Shengli Jiang, Jason Wu, Charles M. Schroeder, Michael A. Webb

发表机构 * Department of Chemical and Biological Engineering, Princeton University（普林斯顿大学化学与生物工程系）

AI总结提出范围感知贝叶斯优化框架，通过采集函数直接评分候选解满足目标范围的后验概率，在基准任务和实际案例中比标准方法发现更多样化的有效设计。

Comments 64 pages, 6 main text figures, 17 supporting figures, 6 supporting tables

AI中文摘要

在许多材料和产品设计问题中，理想的候选物表现出可接受范围内的属性，而非达到单一最优值。恢复满足此类规格的多个不同解也具有实际价值，因为某些候选物可能因成本、可加工性或鲁棒性等原因而更受青睐，而这些因素难以直接编码到目标函数中。在此，我们开发了一个范围感知贝叶斯优化（BO）框架，其中采集函数直接评分候选解满足目标范围的后验概率。该框架自然扩展到在共享候选空间上并行追求多个不同规格。在基准任务中，范围感知采集一致地比标准BO基线和最近的目标寻求方法恢复更大且更多样化的有效设计集。其效用进一步在两个实际动机的设计案例研究中得到证明，涉及优化聚合物合成的反应条件和发现指定光学吸收带的序列定义低聚物，并得到量子化学计算的支持。这些结果表明，范围感知BO可以为规格驱动设计提供实用且样本高效的基础，特别是当设计灵活性和解多样性是重要考虑因素时。

英文摘要

In many materials and product design problems, desirable candidates exhibit properties that fall within an acceptable range rather than achieve a single optimum. Recovering multiple, distinct solutions that satisfy such specifications is also practically valuable, as some candidates may be preferred for reasons of cost, processability, or robustness that are difficult to encode directly in an objective function. Here, we develop a range-aware Bayesian optimization (BO) framework in which the acquisition function directly scores the posterior probability that a candidate satisfies a target range. The framework naturally extends to parallel pursuit of multiple distinct specifications over a shared candidate space. Across benchmark tasks, range-aware acquisition consistently recovers larger and more diverse sets of valid designs than standard BO baselines and recent goal-seeking methods. Its utility is further demonstrated in two practically motivated design case studies involving optimizing reaction conditions for polymer synthesis and sequence-defined oligomer discovery for prescribed optical absorption bands, supported by quantum chemical calculations. These results suggest that range-aware BO can provide a practical and sample-efficient foundation for specification-driven design, particularly when design flexibility and solution diversity are important considerations.
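
A minimal sketch of an acquisition that scores the posterior probability of landing in a target range, assuming a Gaussian posterior at each candidate (for example from a Gaussian-process surrogate). The abstract does not state the exact functional form, so the Gaussian assumption, the function names, and the candidate values below are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_in_range(mean, std, lower, upper):
    """P(lower <= y <= upper) for y ~ Normal(mean, std^2)."""
    if std <= 0.0:
        # Degenerate posterior: the prediction is a point value.
        return 1.0 if lower <= mean <= upper else 0.0
    return normal_cdf((upper - mean) / std) - normal_cdf((lower - mean) / std)

# Hypothetical candidates: name -> (posterior mean, posterior std).
candidates = {"A": (0.50, 0.05), "B": (0.90, 0.05), "C": (0.55, 0.40)}
lower, upper = 0.4, 0.6               # hypothetical target property window

scores = {name: prob_in_range(m, s, lower, upper)
          for name, (m, s) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Unlike an acquisition that pushes toward a single optimum, this score is flat across the interior of the window for confident predictions, so several distinct candidates can rank equally well.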

2606.11711 2026-06-11 cs.LG stat.ML 新提交

Capacity-Constrained Online Convex Optimization with Delayed Feedback

具有延迟反馈的容量受限在线凸优化

Alexander Ryabchenko, Idan Attias, Daniel M. Roy

发表机构 * Department of Statistical Sciences, University of Toronto（多伦多大学统计科学系）； Vector Institute（向量研究所）； Institute for Data, Econometrics, Algorithms, and Learning (IDEAL), hosted by UIC and TTIC（数据、计量经济学、算法与学习研究所（IDEAL），由伊利诺伊大学芝加哥分校和丰田工业大学芝加哥分校主办）

AI总结研究在硬容量约束下（最多同时跟踪C个待处理轮次）的延迟在线凸优化，通过引入半先知模型和延迟加权FTRL算法，首次给出了凸和强凸损失下容量受限OCO的遗憾界。

AI中文摘要

具有延迟反馈的在线学习通常假设学习者可以跟踪所有待处理轮次直到其反馈到达。在实践中，跟踪资源是有限的，未跟踪轮次的反馈将永久丢失。在本文中，我们研究了在硬容量约束下的延迟在线凸优化（OCO），其中任何时候最多可以跟踪$C$个待处理轮次。为了建模延迟信息，我们引入了一个半先知模型，该模型细化了先前工作中的先知假设：学习者不需要在预测时知道延迟，而是在线观察延迟到期，这与经典的无约束延迟设置一致。我们的方法通过归约到一个新颖的“延迟且加权”的OCO问题来实现，使用一个随机化跟踪决策并对结果观测进行重要性加权的调度器。对于这个基础问题，我们提出并分析了延迟加权FTRL及其赌博机变体，建立了明确刻画时变权重与延迟反馈之间相互作用的遗憾界。将这些基础学习器与我们的调度器相结合，首次给出了在凸和强凸损失下容量受限OCO的遗憾保证，适用于一阶和赌博机反馈。对于一阶反馈，容量$C = \Omega(\log T)$足以在忽略对数因子的情况下恢复标准延迟OCO的速率。对于赌博机反馈，遗憾率由$(1 + \sigma_{\text{max}}/C)$的幂次调制，其中$\sigma_{\text{max}}$是任何时候的最大待处理观测数。这使得当$C < \sigma_{\text{max}}$时遗憾界能够优雅地退化，同时保持次线性。

英文摘要

Online learning with delayed feedback typically assumes that the learner can track all pending rounds until their feedback arrives. In practice, tracking resources are finite, and feedback from untracked rounds is permanently lost. In this paper, we study delayed online convex optimization (OCO) under a hard capacity constraint, where at most $C$ pending rounds can be tracked at any time. To model delay information, we introduce a semi-clairvoyant model that refines the clairvoyant assumption from prior work: rather than requiring delays to be known at prediction time, the learner observes delay expirations online, consistent with the classical unconstrained delayed setting. Our approach proceeds via a reduction to a novel ``delayed and weighted'' OCO problem, using a scheduler that randomizes tracking decisions and importance-weights the resulting observations. For this base problem, we propose and analyze Delayed-Weighted FTRL and its bandit analogue, establishing regret bounds that explicitly characterize the interaction between time-varying weights and delayed feedback. Combining these base learners with our schedulers yields the first regret guarantees for capacity-constrained OCO under convex and strongly convex losses, for both first-order and bandit feedback. For first-order feedback, capacity $C = \Omega(\log T)$ suffices to recover standard delayed OCO rates up to logarithmic factors. For bandit feedback, the regret rates are modulated by powers of $(1 + \sigma_{\text{max}}/C)$, where $\sigma_{\text{max}}$ is the maximum number of pending observations at any time. This allows the regret bound to degrade gracefully when $C < \sigma_{\text{max}}$, while remaining sublinear.

2606.12050 2026-06-11 cs.LG math.DS 新提交

Reliable Error Estimation for PINNs: Lower and Upper A Posteriori Bounds

PINNs的可靠误差估计：后验下界与上界

Ismail Huseynov, Arzu Ahmadova, Agamirza Bashirov

发表机构 * Physikalisch-Technische Bundesanstalt (PTB)（德国联邦物理技术研究院）； Technical University of Berlin（柏林工业大学）； Weierstrass Institute for Applied Analysis and Stochastics（魏尔斯特拉斯应用分析与随机研究所）； Eastern Mediterranean University（东地中海大学）

AI总结提出PINNs求解常微分方程的可计算后验误差下界，结合局部单侧Lipschitz条件得到更紧的上界，实现双侧误差包络，并讨论初始条件处理对下界的影响。

AI中文摘要

物理信息神经网络（PINNs）将机器学习与物理定律相结合以求解微分方程。虽然现有结果为PINN预测误差提供了严格的后验上界，但完整认证还需要互补的下界信息以获得可计算的双侧误差包络。本文在合适的认证状态空间域上，在局部强单调性条件下推导了PINN误差在常微分方程中的可计算后验下界。我们将这些估计与在单侧Lipschitz条件下的互补局部上界相结合，该条件弱于先前工作中使用的全局Lipschitz假设，并能产生更尖锐的误差上界带。所得界仅依赖于神经网络近似、ODE残差以及局部单调性和增长常数，因此无需访问精确解。对于线性时不变和时变系统，我们进一步根据系统矩阵对称部分的最小和最大特征值得出显式公式。我们还讨论了PINN中初始条件的软硬约束区别，并解释了为什么精确约束可能使标量下界证书无效。为了在线性情形中恢复有意义的非平凡下界信息，我们使用基于坐标单位向量的符号残差有限探针证书。我们还制定了一种证书引导的训练策略，其中传播的上界证书用作辅助正则化器，而下界证书保留为训练后诊断。总体而言，所提出的框架为PINN逼近ODE提供了严格且实际可计算的误差证书，同时明确了假设可验证的域和模型类别。

英文摘要

Physics-informed neural networks (PINNs) combine machine learning with physical laws to solve differential equations. While existing results provide rigorous \emph{a posteriori} upper bounds for PINN prediction errors, complete certification also requires complementary lower information in order to obtain computable two-sided error enclosures. In this paper, we derive computable \emph{a posteriori} lower bounds for PINN errors in ordinary differential equations on suitable certified state-space domains under a localized strong monotonicity condition. We combine these estimates with complementary localized upper bounds under a one-sided Lipschitz condition, which is weaker than the global Lipschitz assumption used in previous work and can yield sharper upper error bands. The resulting bounds depend only on the neural-network approximation, the ODE residual, and local monotonicity and growth constants, and therefore do not require access to the exact solution. For linear time-invariant and time-varying systems, we further derive explicit formulas in terms of the minimal and maximal eigenvalues of the symmetric part of the system matrix. We also discuss the distinction between soft and hard enforcement of initial conditions in PINNs and explain why exact enforcement can make the scalar lower certificate uninformative. To recover nontrivial lower information in the linear setting, we use a signed-residual finite-probe certificate based on coordinate unit vectors. We also formulate a certificate-informed training strategy in which the propagated upper certificate is used as an auxiliary regularizer, while lower certificates remain post-training diagnostics. Altogether, the proposed framework provides rigorous and practically computable error certificates for PINN approximations of ODEs, while making explicit the domains and model classes for which the assumptions can be verified.

2606.12120 2026-06-11 cs.LG math.OC 新提交

A Riemannian Approach to Low-Rank Optimal Transport

低秩最优传输的黎曼方法

Pratik Jawanpuria, Bamdev Mishra

发表机构 * Centre for Machine Intelligence and Data Science, IIT Bombay（印度理工学院孟买分校机器智能与数据科学中心）； Microsoft India（微软印度）

AI总结提出黎曼几何框架用于低秩最优传输，通过将平衡与不平衡秩r正因子耦合建模为光滑子流形，并采用Fisher-Rao乘积度量，实现高效的一阶和二阶求解器，在收敛速度和性能上超越现有方法。

AI中文摘要

低秩最优传输（OT）缓解了经典求解器的二次缩放问题，但现有方法严重依赖需要仔细调整超参数且忽略优化景观曲率的一阶镜像下降更新。为了解决这些局限性，我们提出了一个统一的低秩OT黎曼几何框架，将平衡和不平衡秩$r$正因子耦合建模为正象限的新型光滑嵌入子流形。通过为这些流形配备Fisher-Rao乘积度量，我们推导出黎曼投影、收缩和Hessian-向量积的可处理公式。我们的成本无关框架无缝扩展到线性OT、Gromov-Wasserstein（GW）、融合GW及其不平衡对应物。对于平衡OT，我们的几何成分通过高效的共轭梯度和迭代Bregman更新计算。对于不平衡OT，我们的操作优雅地简化为闭式缩放，完全消除了内部迭代循环。在两种情况下，每次迭代的复杂度与数据集大小呈线性关系，并且我们提供了用于全局最优性验证的秩充分性证书。跨一系列问题规模的大量实验表明，我们的无正则化一阶和二阶求解器在收敛速度和性能上优于现有最先进的低秩OT求解器。

英文摘要

Low-rank optimal transport (OT) mitigates the quadratic scaling of classical solvers, yet existing approaches rely heavily on first-order mirror-descent updates that require careful hyperparameter tuning and ignore the optimization landscape's curvature. To address these limitations, we propose a unified Riemannian geometric framework for low-rank OT, modeling balanced and unbalanced rank-$r$ positive factored couplings as novel smooth embedded submanifolds of the positive orthant. By equipping these manifolds with the Fisher-Rao product metric, we derive tractable formulations for Riemannian projectors, retractions, and Hessian-vector products. Our cost-agnostic framework seamlessly extends to linear OT, Gromov-Wasserstein (GW), fused GW, and their unbalanced counterparts. For balanced OT, our geometric ingredients are computed via efficient conjugate-gradient and iterative Bregman updates. For the unbalanced OT, our operations elegantly reduce to closed-form scalings, completely eliminating inner iterative loops. In both regimes, per-iteration complexity scales linearly with dataset size, and we provide a rank-sufficiency certificate for global optimality verification. Extensive experiments across a range of problem sizes demonstrate that our regularization-free first- and second-order solvers achieve faster convergence and superior performance over existing state-of-the-art low-rank OT solvers.

2606.11263 2026-06-11 math.ST cs.LG cs.NA math.NA math.PR stat.TH 交叉投稿

Geometric bias in eigenspace perturbation under random heterogeneous noise

随机异质噪声下特征空间扰动的几何偏差

Fengkai Liu, Ke Wang, Wanjie Wang

发表机构 * Department of Mathematics, Hong Kong University of Science and Technology（香港科技大学数学系）； Department of Statistics and Data Science, National University of Singapore（新加坡国立大学统计与数据科学系）

AI总结针对稀疏、异质方差噪声下的信号加噪声矩阵，研究发现经验特征向量存在经典扰动界无法捕捉的系统性几何偏差，并通过二次向量方程和精细各向同性局部律推导了最优非渐近扰动界。

Comments 104 pages, 1 figure

AI中文摘要

谱方法从根本上依赖于主特征空间在随机扰动下的稳定性。经典上，这种稳定性由 Davis-Kahan 和 Wedin 定理量化，这些定理利用噪声的算子范数和相关谱间隙来界定特征空间误差。虽然这些最坏情况界对于任意确定性扰动是紧的，但在低秩信号加随机噪声的设置中可能造成浪费，因为它们未能捕捉信号几何与噪声分布之间的细粒度相互作用。在本文中，我们研究了被具有任意非齐次方差剖面的稀疏随机噪声破坏的信号加噪声矩阵的谱扰动。我们证明，在异质噪声方差下，经验特征向量遭受系统性的、确定性的几何偏差，这种偏差完全不为经典扰动界所见。通过利用二次向量方程并建立精细的各向同性局部律，我们推导了在算子范数和 $2\to\infty$ 范数下前导特征空间的近最优、非渐近扰动界。这些界将通常的信噪比贡献、随机波动和由信号特征空间与行方差剖面对齐决定的结构化几何偏差项分离开来。

英文摘要

Spectral methods rely fundamentally on the stability of principal eigenspaces under random perturbations. Classically, this stability is quantified by the Davis-Kahan and Wedin theorems, which bound the eigenspace error using the operator norm of the noise and the relevant spectral gaps. While these worst-case bounds are sharp for arbitrary deterministic perturbations, they can be wasteful in the low-rank signal-plus-random-noise setting, as they fail to capture the fine-grained interaction between the signal geometry and the noise distribution. In this paper, we study the spectral perturbation of signal-plus-noise matrices corrupted by sparse, random noise with an arbitrary, inhomogeneous variance profile. We demonstrate that under heterogeneous noise variances, the empirical eigenvectors suffer a systematic, deterministic geometric bias that is entirely invisible to classical perturbation bounds. By leveraging the Quadratic Vector Equation (QVE) and establishing fine-grained isotropic local laws, we derive near-optimal, non-asymptotic perturbation bounds for the leading eigenspaces in the operator and $2\to\infty$ norms. The bounds separate the usual signal-to-noise contribution, stochastic fluctuations, and structured geometric bias terms determined by the alignment between the signal eigenspaces and the row-wise variance profile.

2606.11283 2026-06-11 cs.DS cs.LG stat.ML 交叉投稿

Fixed-Parameter Tractability of Private Synthetic Data Generation

私有合成数据生成的固定参数可处理性

Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi

发表机构 * Google DeepMind（谷歌DeepMind）； Institute for Mathematical and Computational Engineering, Faculty of Mathematics and School of Engineering, Pontificia Universidad Católica de Chile（智利天主教大学数学学院与工程学院数学与计算工程研究所）

AI总结研究差分隐私下合成数据生成问题，通过查询族关联图的树宽参数建立固定参数可处理性，提出两种最优算法。

2606.11339 2026-06-11 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML 交叉投稿

Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

松弛全局几何下分布式优化的量化随机原始-对偶方法

Susmit Sarkar, Abhinav Raghuvanshi, Kushal Chakrabarti, Mayank Baranwal

发表机构 * Indian Institute of Technology Bombay（印度理工学院孟买分校）； Tata Consultancy Services Research（塔塔咨询服务公司研究院）

AI总结提出量化随机原始-对偶方法q-PDGD，在松弛全局几何下证明线性收敛到邻域或O(1/k)收敛，匹配最优集中随机复杂度。

Comments Accepted to UAI

2606.11347 2026-06-11 stat.ML cs.LG math.OC 交叉投稿

Annealed Entropic Allocation for Ranking and Selection

退火熵分配用于排序与选择

Xin Fei, Juergen Branke

发表机构 * Business School（商学院）； The University of Edinburgh（爱丁堡大学）； Warwick Business School（沃里克商学院）； The University of Warwick（沃里克大学）

AI总结提出退火熵分配框架，通过加权log-sum-exp替代非光滑极大极小大偏差率目标，结合鞍点近似提升有限预算下的区分能力，数值实验表明在多个候选接近时性能优异。

AI中文摘要

我们提出了退火熵分配，一种用于排序与选择中顺序预算分配的退火加权软最小化框架。核心思想是用加权log-sum-exp替代非光滑的极大极小大偏差率目标，该替代通过软最小化权重聚合特定候选对的得分，从而在多个候选几乎同时活跃时缓解硬切换。为了提升有限预算下的区分能力，我们引入了鞍点近似——一种从精细化的成对尾部渐近性导出的次指数修正。由于这些修正是次指数的，且平滑参数退火至零，该替代保持了与经典极大极小公式相同的一阶大偏差目标。我们证明了该替代一致收敛于硬最小值，软最小化权重集中于活跃候选，并且在固定权重下，诱导的目标分配映射在单纯形内部是连续的。在高斯和指数实例上的数值实验展示了竞争性能，尤其是在多个候选几乎持平时。

英文摘要

We propose Annealed Entropic Allocation, an annealed weighted soft-min framework for sequential budget allocation in ranking and selection. The central idea is to replace the non-smooth maximin large-deviation rate objective with a weighted log-sum-exp surrogate that aggregates challenger-specific pairwise scores through soft-min weights, mitigating hard switching when several challengers are nearly active. To improve finite-budget discrimination, we incorporate the saddlepoint approximation -- a sub-exponential correction derived from refined pairwise tail asymptotics. Because these corrections are sub-exponential and the smoothing parameter is annealed to zero, the surrogate preserves the same first-order large-deviation target as the classical maximin formulation. We show that the surrogate converges uniformly to the hard minimum, that the soft-min weights concentrate on the active challengers, and that, under fixed weights, the induced target allocation map is continuous on the simplex interior. Numerical experiments on Gaussian and exponential instances demonstrate competitive performance, especially when multiple challengers are nearly tied.
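
The smoothing idea can be sketched with a generic weighted log-sum-exp soft-min. This is an illustrative form only: the abstract does not give the exact surrogate, so the function names, the prior weights, and the temperature schedule below are assumptions, and the saddlepoint correction is not modeled.

```python
import math

def weighted_soft_min(scores, weights, tau):
    """Weighted log-sum-exp surrogate for min(scores) at temperature tau > 0."""
    m = min(scores)  # shift by the minimum for numerical stability
    total = sum(w * math.exp(-(s - m) / tau) for s, w in zip(scores, weights))
    return m - tau * math.log(total)

def soft_min_weights(scores, weights, tau):
    """Normalized weights that concentrate on the smallest scores as tau -> 0."""
    m = min(scores)
    raw = [w * math.exp(-(s - m) / tau) for s, w in zip(scores, weights)]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical pairwise scores for three challengers; the first two are nearly tied.
scores = [1.0, 1.01, 3.0]
prior = [1.0 / 3.0] * 3

for tau in (1.0, 0.1, 0.001):         # annealing the temperature toward zero
    value = weighted_soft_min(scores, prior, tau)
    weights = soft_min_weights(scores, prior, tau)
    print(tau, value, weights)
```

With prior weights that sum to one, the surrogate lies between the hard minimum and the hard minimum plus `tau * log(1 / w)`, where `w` is the prior weight of the minimizer, so the approximation error shrinks uniformly as the temperature is annealed.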

2606.11437 2026-06-11 cs.DS cs.AI cs.LG stat.ML 交叉投稿

The Power of Test-Time Training for Approximate Sampling

测试时训练对近似采样的威力

Noah Golowich, Ankur Moitra, Dhruv Rohatgi

发表机构 * Microsoft Research NYC（微软研究院纽约）； MIT（麻省理工学院）

AI总结本文形式化测试时训练（TTT）为从已知分布类中采样的问题，证明查询复杂度的二次下界，并展示在分布类大小受限时可规避该下界，为TTT提供理论框架。

AI中文摘要

从复杂概率分布中高效采样是一个基本问题，近年来随着生成式AI的兴起，这一问题变得越来越重要，因为从大语言模型（LLM）中提出的复杂采样程序已被用于解决具有挑战性的推理问题。然而，这类采样算法的有效性受到LLM与特定采样任务之间关系的限制，这推动了测试时训练（TTT）框架的发展。TTT通过根据推理时收到的部分生成和奖励反馈更新模型权重来工作，从而适应特定问题。在这项工作中，我们提出了一种TTT的形式化，将其定义为从属于已知分布类$F$的给定概率测度$\mu^\star$中生成样本的问题，给定一个提供$\mu^\star$近似密度估计的预言机$\hat \mu$。这与Jerrum、Valiant和Vazirani（1986）以及Jerrum和Sinclair（1989）的开创性工作中研究的将采样约化为近似计数的问题密切相关：即当$F$是所有分布的类时，它恰好与上述计数到采样的约化一致。在本文中，我们首先证明了在给定对$\hat \mu$的查询访问的情况下，从$\mu^\star$采样的查询复杂度的二次下界（对于足够大的类$F$），从而表明Jerrum和Sinclair（1989）提出并由Hayes和Sinclair（2010）改进的随机游走方法是最优的。这回答了Hayes和Sinclair提出的一个开放问题。然后，我们证明如果$F$的大小适当受限，这个下界可以被规避。正如我们所讨论的，后一个结果可以被视为TTT的抽象，因此代表了为TTT发展一个原则性理论框架的起点。

英文摘要

Efficiently sampling from a complex probability distribution is a fundamental problem which has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from LLMs have been proposed to solve challenging reasoning problems. The efficacy of such sampling algorithms is limited, however, by the relationship between the LLM and the particular sampling task at hand, which has motivated the framework of test-time training (TTT). TTT works by updating a model's weights in response to partial generations and reward feedback received at inference time, thus adapting to the particular problem. In this work, we propose a formalization for TTT as the problem of producing a sample from a given probability measure $\mu^\star$ belonging to a known class ${F}$ of distributions, given an oracle $\hat \mu$ which yields approximate density estimates for $\mu^\star$. This is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989): namely, when ${F}$ is the class of all distributions, it coincides exactly with the aforementioned counting-to-sampling reduction. In this paper, we first show a quadratic lower bound on the query complexity of sampling from $\mu^\star$ given query access to $\hat \mu$ (for sufficiently large classes ${F}$), thus showing that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010), is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of ${F}$ is bounded appropriately. As we discuss, this latter result can be viewed as an abstraction of TTT, and thus represents a starting point for the development of a principled theoretical framework for TTT.

2606.11469 2026-06-11 cs.DS cs.LG math.ST stat.TH 交叉投稿

Density estimation for Hellinger via minimum-distance estimators: mixtures of Gaussians, log-concave, and more

基于最小距离估计量的Hellinger密度估计：高斯混合、对数凹等

Spencer Compton, Jerry Li

发表机构 * Stanford University（斯坦福大学）； University of Washington（华盛顿大学）

AI总结将最小距离估计方法从总变差距离扩展到Hellinger距离，通过反向数据处理不等式，实现了对对数凹混合和高斯混合（任意方差）的近线性时间学习，样本复杂度接近最优。

AI中文摘要

我们研究密度估计任务，希望从$n$个样本中准确估计概率密度。在总变差距离下，密度估计的经典方法是最小距离估计量方法，其中我们仅通过限制特定概念类（即Yatracos类）的VC维即可得到算法和分析。虽然该技术最初主要针对总变差距离给出了精确保证，但在本文中，我们将最小距离估计量方法扩展到Hellinger距离下的学习。我们的主要观察是，通过联系最近得到反向数据处理不等式的结果，我们可以为Hellinger距离生成类似的方案（其中我们只需要限制相关概念类的VC维）。该方案足够灵活，可以容纳最初为总变差距离设计的快速算法；通过修改Acharya等人（2017）的方法，我们首次得到了近线性时间算法，用于学习包括单变量对数凹密度混合和高斯混合（具有任意方差）在内的类别，且样本复杂度接近最优。

英文摘要

We study the task of density estimation, where we hope to accurately estimate a probability density from $n$ samples. A textbook method for density estimation in total variation distance is the minimum-distance estimator approach, where we conclude both the algorithm and the analysis merely from bounding the VC dimension of a particular concept class (the so-called Yatracos class). While this technique has originally yielded sharp guarantees primarily for total variation distance, in this work we extend the minimum-distance estimator approach for learning within Hellinger distance. Our main observation is that we may produce an analogous recipe for Hellinger (where we only require bounding the VC dimension of a related concept class) by drawing connections to recent results yielding reverse data processing inequalities. This recipe is flexible enough to accommodate fast algorithms originally designed for total variation distance; by modifying the approach of Acharya et al. (2017) we conclude the first near-linear time algorithm for learning classes including univariate mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances), with near-optimal sample complexity.

2606.11629 2026-06-11 math.DS cs.LG 交叉投稿

Integral Formulation of QENDy for Robust Nonlinear System Identification

QENDy的积分形式用于鲁棒非线性系统辨识

Nikhil Saran, Sushant Pokhriyal, Stefan Klus, Rushikesh Kamalapurkar, Joel A. Rosenfeld

发表机构 * Department of Mathematics and Statistics at the University of South Florida（南佛罗里达大学数学与统计系）； Institute of Engineering and Technology, JK Lakshmipat University（JK拉克什米帕特大学工程与技术学院）； School of Mathematical & Computer Sciences at the Heriot–Watt University（赫瑞-瓦特大学数学与计算机科学学院）； University of Florida（佛罗里达大学）

AI总结提出QENDy方法的积分形式，避免使用时间导数，从而增强对噪声的鲁棒性，实现更稳健的非线性动力学学习。

2606.11738 2026-06-11 stat.ML cs.LG 交叉投稿

Renewable Lasso without Batch-Number Constraints: A Gradient-Enhanced Approach

无批次数量约束的可再生Lasso：一种梯度增强方法

Junzhuo Gao, Ling Peng, Xu Guo, Heng Lian

发表机构 * Department of Mathematics, City University of Hong Kong（香港城市大学数学系）； School of Statistics and Data Science, Jiangxi University of Finance and Economics（江西财经大学统计与数据科学学院）； Philosophy and Social Sciences Laboratory of Data Science in Finance and Economics at the Ministry of Education, Jiangxi University of Finance and Economics（教育部金融与经济数据科学哲学与社会科学实验室，江西财经大学）； School of Statistics, Beijing Normal University（北京师范大学统计学院）； CityUHK Shenzhen Research Institute（城大深圳研究院）

AI总结针对高维广义线性模型的流数据在线估计，提出梯度增强替代损失函数，消除批次数量约束，并扩展到分布式流数据场景，理论推导非渐近误差界，实验验证精度提升。

AI中文摘要

我们研究具有流数据的高维广义线性模型的在线估计。首先，针对非分布式设置，我们提出一种梯度增强替代损失函数，仅使用历史摘要近似累积损失，修改并改进了现有高维设置下同一模型的可再生估计方法，并消除了先前研究中的批次数量约束。然后，我们将该方法扩展到主从架构下的分布式流数据，其中批次按站点划分，仅交换摘要（梯度向量）。我们没有将Jordan等人（2019）的流行方法直接应用于替代二次损失，而是采用调整后的方法，不要求客户端计算完整的替代损失。我们在高维尺度下推导了非渐近误差界，没有先前研究中严格的批次数量约束。在线性和逻辑模型下的模拟结果以及实际数据应用表明，与现有的可再生估计器相比，精度有所提高。

英文摘要

We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries, which modifies and improves upon the existing renewable estimation approach for the same model in the high-dimensional setting, and removes the batch-number constraint in previous studies. We then extend the method to distributed streaming data under the master-client architecture, where batches are partitioned across sites and only summaries (gradient vectors) are exchanged. Instead of directly applying the popular method of Jordan et al. (2019) to the surrogate quadratic loss, our adjusted approach does not require the clients to compute the full surrogate loss. We derive non-asymptotic error bounds under the high-dimensional scaling, without the stringent constraint on the number of batches in the previous studies. Simulation results under linear and logistic models, together with a real-data application, show improved accuracy over existing renewable estimators.

2606.11773 2026-06-11 math.OC cs.LG 交叉投稿

Last-Iterate Convergence of Optimistic Multiplicative Weight Update

乐观乘性权重更新的最后迭代收敛性

Francesco Orabona

发表机构 * King Abdullah University of Science and Technology（阿卜杜拉国王科技大学）

AI总结本文证明乐观乘性权重更新（OMWU）在光滑凸-凹鞍点问题中以足够小的常数学习率渐近收敛，无需唯一性、严格互补性、误差界或接近解的初始化。

2606.12058 2026-06-11 stat.ML cond-mat.dis-nn cs.LG 交叉投稿

理解预测编码中的样本效率

Gaspard Oliviers, Elene Lominadze, Rafal Bogacz

发表机构 * Nuffield Department of Clinical Neurosciences, University of Oxford, United Kingdom（英国牛津大学纳菲尔德临床神经科学系）； MRC Centre of Research Excellence in Restorative Neural Dynamics, United Kingdom（英国医学研究理事会（MRC）修复性神经动力学卓越研究中心）

AI总结本文研究预测编码在样本效率上的优势，通过目标对齐度量分析BP和PC的学习效率，发现PC在深度、狭窄和预训练网络中表现更优，提供机制理解以指导PC参数设计。

AI中文摘要

预测编码（PC）是皮层学习的重要理论。近期研究多比较PC与反向传播（BP）以确定PC是否具有优势。小规模实验表明PC在许多上下文中能更高效地学习，但理论理解仍不明确。本文通过目标对齐度量量化BP和PC的学习效率，推导并验证深度线性网络中目标对齐的解析表达式。研究发现PC的学习效率高于BP，尤其在深度、狭窄和预训练网络中更为明显。还推导了保证PC目标对齐最优的精确条件，并通过实验验证。研究了线性和非线性模型的完整训练轨迹，发现即使部分假设不成立，PC的预测优势仍持续存在。本文提供了对PC比BP在先前工作中观察到更高学习效率的机制理解，并指导如何参数化PC以最有效地学习。

英文摘要

Predictive Coding (PC) is an influential account of cortical learning. Much of recent work has focused on comparing PC to Backpropagation (BP) to find whether PC offers any advantages. Small scale experiments show that PC enables learning that is more sample efficient and effective in many contexts, though a thorough theoretical understanding of the phenomena remains elusive. To address this, we quantify the efficiency of learning in BP and PC through a metric called ``target alignment'', which measures how closely the change in the output of the network is aligned to the output prediction error. We then derive and empirically validate analytical expressions for target alignment in Deep Linear Networks. We show that learning in PC is more efficient than BP, which is especially pronounced in deep, narrow and pre-trained networks. We also derive exact conditions for guaranteed optimal target alignment in PC and validate our findings through experiments. We study full training trajectories of linear and non-linear models, and find the predicted benefits of PC persist in practice even when some assumptions are violated. Overall, this work provides a mechanistic understanding of the higher learning efficiency observed for PC over BP in previous works, and can guide how PC should be parametrised to learn most effectively.

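2606.09744 2026-06-11 cs.LG cond-mat.dis-nn Updated version

Learning Dynamics Reveal a Hierarchy of Weight-Induced Layerwise Gram Metrics

Claudio Nordio

Affiliations * GitHub; arXiv

AI summary: This paper studies the gradient-descent dynamics of feed-forward ReLU networks with a fixed readout and quadratic loss, rewrites them as collective dynamics on the training-set space, and reveals a hierarchy of weight-induced Gram operators in deep networks.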

Comments 24 pages. v4: Corrected the hidden-activation dynamics; clarified the concept of field closure. Other minor corrections

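2512.11081 2026-06-11 stat.ML cs.LG stat.ME Updated version

Provable Recovery of Locally Important Signed Features and Interactions from Random Forest

Kata Vuk, Nicolas Alexander Ihlo, Merle Behr

Affiliations * Faculty of Informatics and Data Science, University of Regensburg, Germany

AI summary: Proposes a local, model-specific feature and interaction importance method that combines global and local decision-path patterns. Under a Locally Spike Sparse model it provably recovers the true signal features and their interactions, and it identifies whether large or small feature values drive a prediction.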

Abstract

Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, local interpretations for individual predictions are often required, rather than global scores summarizing overall feature importance. Random Forests (RFs) are widely used in these settings, and existing interpretability methods typically exploit tree structures and split statistics to provide model-specific insights. However, theoretical understanding of local FII methods for RF remains limited, making it unclear how to interpret high importance scores for individual predictions. We propose a novel, local, model-specific FII method that identifies frequent co-occurrences of features along decision paths, combining global patterns with those observed on paths specific to a given test point. We prove that our method consistently recovers the true local signal features and their interactions under a Locally Spike Sparse (LSS) model and also identifies whether large or small feature values drive a prediction. We illustrate the usefulness of our method and theoretical results through simulation studies and a real-world data example.

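2603.09276 2026-06-11 stat.ML cs.LG Updated version

On Regret Bounds of Thompson Sampling for Bayesian Optimization

Shion Takeno, Shogo Iwazaki

Affiliations * Nagoya University; MI-6 Ltd.

AI summary: For Gaussian process Thompson sampling (GP-TS), under the assumption that the objective is a GP sample path, this paper derives a regret lower bound, an upper bound on the second moment of cumulative regret, expected lenient regret upper bounds, and an improved cumulative regret upper bound, filling the gap in high-probability regret bounds for GP-TS.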

Comments 43 pages, Accepted to ICML 2026

Abstract

We study a widely used Bayesian optimization method, Gaussian process Thompson sampling (GP-TS), under the assumption that the objective function is a sample path from a GP. Compared with the GP upper confidence bound (GP-UCB) with established high-probability and expected regret bounds, most analyses of GP-TS have been limited to expected regret. Moreover, whether the recent analyses of GP-UCB for the lenient regret and the improved cumulative regret upper bound can be applied to GP-TS remains unclear. To fill these gaps, this paper shows several regret bounds: (i) a regret lower bound for GP-TS, which implies that GP-TS suffers from a polynomial dependence on $1/δ$ with probability $δ$, (ii) an upper bound of the second moment of cumulative regret, which directly suggests an improved regret upper bound on $δ$, (iii) expected lenient regret upper bounds, and (iv) an improved cumulative regret upper bound on the time horizon $T$. Along the way, we provide several useful lemmas, including a relaxation of the necessary condition from recent analysis to obtain improved regret upper bounds on $T$.
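
For readers unfamiliar with GP-TS, the following minimal sketch runs it on a finite grid: at each round it draws one sample path from the GP posterior, queries the maximiser of that sample, and records the cumulative regret. The kernel, lengthscale, noise variance, grid size, and function names are assumptions chosen for illustration; this is not the authors' experimental setup and it does not reproduce any bound from the paper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var):
    """Posterior mean and covariance of a zero-mean GP on x_grid."""
    prior_cov = rbf_kernel(x_grid, x_grid)
    if len(x_obs) == 0:
        return np.zeros(len(x_grid)), prior_cov
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    Ks = rbf_kernel(x_grid, x_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = prior_cov - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

def gp_thompson_sampling(f_grid, x_grid, n_rounds, rng, noise_var=1e-2):
    """Run GP-TS on a finite grid; return the cumulative regret per round."""
    x_obs, y_obs, regret = [], [], []
    for _ in range(n_rounds):
        mean, cov = gp_posterior(np.array(x_obs), np.array(y_obs), x_grid, noise_var)
        cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(x_grid))  # numerical jitter
        sample = rng.multivariate_normal(mean, cov)   # one posterior sample path
        i = int(np.argmax(sample))                    # query its maximiser
        x_obs.append(x_grid[i])
        y_obs.append(f_grid[i] + np.sqrt(noise_var) * rng.normal())
        regret.append(f_grid.max() - f_grid[i])       # instantaneous regret
    return np.cumsum(regret)

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 50)
# Draw the objective from the GP prior, matching the sample-path assumption.
f_grid = rng.multivariate_normal(np.zeros(50), rbf_kernel(x_grid, x_grid) + 1e-8 * np.eye(50))
cum_regret = gp_thompson_sampling(f_grid, x_grid, n_rounds=30, rng=rng)
print(cum_regret[-1])
```

Repeating the run over many seeds and inspecting the spread of `cum_regret[-1]`, rather than only its mean, is one informal way to see why high-probability bounds and expected-regret bounds can behave differently.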

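2606.11270 2026-06-11 cs.LG cs.AI cs.CL New submission

Quantifying Subliminal Behavioral Transfer Ratios in Language Model Distillation

Uwe Konig, Hamza Kazmi, Ruizhe Li, Maheep Chaudhary

Affiliations * University of Freiburg

AI summary: By controlling the strength of a teacher model's behaviour and distilling student models, the paper quantifies subliminal behavioural transfer ratios and finds that transfer is robust and exhibits distinct scaling behaviours.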

2606.11290 2026-06-11 cs.LG cs.AI cs.CL New submission

FlowBank: Query-Adaptive Agentic Workflows Optimization through Precompute-and-Reuse

Lingzhi Yuan, Chenghao Deng, Fangxu Yu, Souradip Chakraborty, Mohammad Rostami, Furong Huang

Affiliations * University of Maryland, College Park; Amazon

AI summary: Proposes FlowBank, a framework that precomputes diverse workflows, compresses them into a compact portfolio, and adaptively selects the best workflow at inference time to balance performance and cost. It achieves the highest average score on five benchmarks while keeping cost under control.

Abstract

Large Language Model (LLM)-based multi-agent systems are increasingly powerful, but current agentic workflow optimization paradigms make an unsatisfying trade-off. Task-level methods spend substantial offline compute yet deploy only a single workflow, leaving complementary candidates unused, while query-level methods synthesize a new workflow per query at substantial inference cost. Our motivating analysis shows these paradigms are more complementary than competing: workflows discovered during offline search often solve different subsets of queries, and many queries handled by expensive query-level generation can already be solved by cheaper precomputed workflows. This suggests a different objective: rather than searching for one universally best workflow or regenerating one per instance, we should build a compact bank of reusable, complementary workflows and select among them adaptively at inference time. Doing so requires solving three coupled problems: generating complementary rather than redundant candidates, compressing them into a small deployable portfolio, and assigning each query to the right workflow under a performance-cost trade-off. To this end, we present FlowBank, a three-stage framework for portfolio-based agentic workflow optimization. Diversifying proposes DiverseFlow to steer search toward under-covered queries and produce a high-coverage candidate pool. Curating proposes CuraFlow to compress this pool into a compact portfolio with minimal redundancy. Matching casts deployment as edge-value prediction on a query-workflow bipartite graph and routes each incoming query to the portfolio member with the best predicted utility. Across five benchmarks, FlowBank achieves the highest average score among the evaluated methods while remaining cost-competitive, improving over the strongest automated and handcrafted baselines by 4.26% and 14.92% relative, respectively.

2606.11473 2026-06-11 cs.LG cs.AI stat.ML New submission

CRUMB: Efficient Prior Fitted Network Inference via Distributionally Matched Context Batching

Jamie Heredge, Mattia J. Villani, Pranav Deshpande, Akshay Seshadri, Niraj Kumar

Affiliations * Global Technology Applied Research, JPMorganChase

AI summary: Proposes CRUMB, which clusters queries, selects training subsets by minimising the maximum mean discrepancy, and then runs exact inference. It accelerates prior-fitted network inference without retraining and outperforms comparable methods on 51 datasets.

Comments 26 pages, 13 figures

Abstract

Prior-fitted networks (PFNs) are a promising class of tabular foundation models that perform in-context learning, whereby the entire labelled training set is supplied as context, and predictions for test queries are produced in a single forward pass. However, the quadratically scaling self-attention mechanism in many PFN architectures makes inference prohibitive for very large training datasets. We propose CRUMB (Clustered Retrieval Using Minimised-MMD Batching), a three-stage inference wrapper that (i) clusters the test queries, (ii) selects a small, distributionally matched training subset for each cluster by greedily minimising the maximum mean discrepancy (MMD), and (iii) runs exact PFN inference on each reduced-context batch. CRUMB is architecture-agnostic and requires no retraining. On the 51-dataset TabArena benchmark, evaluated across three PFN architectures (TabPFNv2, TabICLv1, TabICLv2), we show that CRUMB outperforms similar state-of-the-art context selection strategies. We also show that CRUMB is resilient to covariate drift, as the MMD-minimisation step naturally helps align the training context distribution to match the current test batch distributions.
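
Stage (ii) of the pipeline described above, greedy MMD minimisation, can be sketched as follows for a single query cluster. This is a naive illustration under stated assumptions: the RBF kernel, its `gamma`, the biased MMD estimator, the synthetic data, and the function names are all hypothetical choices, not CRUMB's actual implementation, and the clustering and PFN inference stages are omitted.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return rbf(x, x, gamma).mean() - 2.0 * rbf(x, y, gamma).mean() + rbf(y, y, gamma).mean()

def greedy_mmd_subset(train, queries, k, gamma=1.0):
    """Greedily add the training row that most reduces MMD to the queries."""
    chosen = []
    for _ in range(k):
        best_i, best_val = None, np.inf
        for i in range(len(train)):
            if i in chosen:
                continue
            val = mmd2(train[chosen + [i]], queries, gamma)
            if val < best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
    return chosen

rng = np.random.default_rng(0)
# Synthetic training set with two well-separated modes.
train = np.vstack([rng.normal(0.0, 0.5, size=(40, 2)),
                   rng.normal(5.0, 0.5, size=(40, 2))])
queries = rng.normal(5.0, 0.5, size=(20, 2))   # one cluster of test queries
chosen = greedy_mmd_subset(train, queries, k=8)

print(mmd2(train[chosen], queries), mmd2(train, queries))
```

The selected rows would then serve as the reduced context for that query cluster. The loop recomputes the full MMD for every candidate, which is fine for a toy example but would need incremental kernel sums at realistic dataset sizes.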

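2606.11625 2026-06-11 cs.LG New submission

Parameter-Efficient Adapter Fine-Tuning for Tabular-Image Multimodal Learning

Jiaqi Luo

Affiliations * School of Mathematical Sciences, Soochow University

AI summary: Proposes the TI-Adapter framework, which freezes the tabular encoder and adds an adapter after it, and adapts the image branch with embedding-level and bottleneck-level adapters. This enables efficient multimodal fine-tuning that matches or exceeds full fine-tuning on 20 datasets with fewer trainable parameters.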

Abstract

Tabular-image multimodal learning aims to improve predictive modeling by jointly using structured tabular attributes and visual data. Although pretrained encoders provide strong modality-specific representations, full fine-tuning can be computationally expensive, while keeping encoders frozen may limit task-specific adaptation. We propose the Tabular-Image Adapter (TI-Adapter), a modality-specific adapter-based fine-tuning framework for efficient multimodal adaptation. TI-Adapter freezes the pretrained tabular encoder and learns an adapter after the extracted tabular embedding, while adapting the image branch with embedding-level and bottleneck-level adapters instead of full fine-tuning. Experiments on 20 tabular-image datasets show that TI-Adapter achieves competitive or better predictive performance than full fine-tuning while using substantially fewer trainable parameters. Ablation studies further demonstrate the importance of adapter placement for balancing performance and practical efficiency.

2606.12171 2026-06-11 cs.CV cs.LG Cross-list

Beyond Dark Knowledge: Mixup-Based Distillation for Reliable Predictions

José Medina, Paul Honeine, Abdelaziz Bensrhair, Amnir Hadachi

Affiliations * ITS Lab, Institute of Computer Science, University of Tartu; LITIS, Université de Rouen; LITIS, INSA de Rouen

AI summary: Studies the effect of teacher-student mismatch when knowledge distillation is combined with mixup training, finds that the student independently acquires linear structure and improves accuracy and calibration, and presents mixup distillation as a richer knowledge-transfer channel.

Abstract

Knowledge Distillation (KD) and mixup have proven effective at inducing smoothness in class boundaries; KD captures inherent class relationships in probability distributions, and mixup enforces them through convex combinations of inputs. Their interaction, however, remains poorly understood, particularly when mixup is applied only during student training. In this setting, the teacher is queried on inputs drawn from a vicinal distribution it never saw during training, a controlled mismatch whose effect on knowledge transfer has not been characterised. We show that this mismatch causes the teacher's supervisory signal to be dominated by distributional confusion rather than inter-class structure. Despite it, the student does not merely imitate the teacher: it independently acquires greater linearity in the vicinal region, a structural property that the teacher lacks, and goes beyond dark-knowledge transfer. KD with mixup consistently improves student accuracy and reduces overconfidence by an order of magnitude relative to the baseline, across CIFAR and ImageNet with varying-capacity teachers. Crucially, calibration propagates from teacher to student independently of accuracy transfer, and temperature scaling governs a measurable accuracy-calibration trade-off that becomes more pronounced under vicinal training. These results reframe mixup distillation not as a degraded version of standard KD, but as a richer transfer channel that simultaneously shapes discriminative performance, uncertainty estimation, and representational geometry.
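
The setting studied here, where mixup is applied only on the student side and the teacher is queried on the mixed inputs, can be sketched with the standard temperature-scaled KD loss. The linear "teacher" and "student" below are random placeholders, and the temperature, Beta parameters, and function names are assumptions for illustration rather than the paper's configuration.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixup(x, lam, perm):
    """Convex combination of each input with a permuted partner."""
    return lam * x + (1.0 - lam) * x[perm]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Mean KL(teacher || student) at temperature T, scaled by T^2."""
    p, q = softmax(teacher_logits, T), softmax(student_logits, T)
    return float(T ** 2 * np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))               # a batch of 8 inputs
W_teacher = rng.normal(size=(5, 3))       # placeholder teacher (never saw mixed inputs)
W_student = rng.normal(size=(5, 3))       # placeholder student

lam = rng.beta(1.0, 1.0)                  # mixing coefficient
perm = rng.permutation(len(x))
x_mix = mixup(x, lam, perm)               # vicinal inputs used for student training

# The teacher is queried on mixed inputs it was not trained on.
loss = kd_loss(x_mix @ W_student, x_mix @ W_teacher)
print(loss)
```

Raising `T` softens both distributions before the KL term is computed, which is the knob the abstract refers to when it discusses the accuracy-calibration trade-off.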

2606.12278 2026-06-11 cs.CV cs.LG Cross-list

Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning

Romana Qureshi, Hafida Benhidour, Said Kerrache, Nahlah Aljeraisy

Affiliations * King Abdullah University of Science and Technology; University of Jeddah; King Fahd University of Petroleum and Minerals; King Saud University

AI summary: Proposes progressive magnitude-based pruning, which linearly increases sparsity within a single training cycle and updates masks based on weight magnitudes, outperforming baselines such as LTH, SNIP, and GraSP on CIFAR-10 and MNIST.

Abstract

Neural network pruning reduces model size by removing less important parameters while aiming to preserve predictive performance. Although the Lottery Ticket Hypothesis (LTH) shows that sparse subnetworks can match dense networks when trained from suitable initializations, its iterative pruning procedure requires multiple complete training cycles. This work evaluates progressive magnitude-based pruning as a single-cycle alternative. The method gradually increases sparsity during training using a linear schedule and updates pruning masks based on active weight magnitudes. We conduct systematic experiments on CIFAR-10 and MNIST across ResNet, VGG-style, and LeNet architectures, comparing the proposed method with representative iterative and initialization-based pruning baselines, including LTH, SNIP, and GraSP. On CIFAR-10, the method achieves 95.12% accuracy on ResNet-18 at 72.9% sparsity, compared with 90.5% reported for LTH. At extreme sparsity, it achieves 93.13% accuracy on a VGG-like architecture at 97% sparsity, compared with approximately 92.0% for SNIP, and 93.44% accuracy on VGG-19 at 97.97% sparsity, compared with 92.19% for GraSP at 98% sparsity. A sparsity-accuracy analysis on ResNet-18 further shows that accuracy remains within 0.1 percentage points of the dense baseline across 70-85% sparsity. These results indicate that progressive magnitude-based pruning provides an effective single-cycle approach for neural network sparsification under the evaluated settings.
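
The two ingredients named in the abstract, a linear sparsity schedule and mask updates based on the magnitudes of still-active weights, can be sketched as below. The random "optimiser update", the 90% final sparsity, the global (rather than per-layer) ranking, and the function names are assumptions for illustration; the paper's actual schedule details and update frequency are not given in this listing.

```python
import numpy as np

def linear_sparsity(step, total_steps, final_sparsity):
    """Target sparsity that grows linearly from 0 to final_sparsity."""
    return final_sparsity * min(step, total_steps) / total_steps

def update_mask(weights, mask, sparsity):
    """Prune the smallest-magnitude active weights up to the target sparsity.

    Assumes the target sparsity never decreases between calls.
    """
    n_prune = int(round(sparsity * weights.size))
    # Already-pruned weights get -inf so they sort first and stay pruned.
    scores = np.where(mask, np.abs(weights), -np.inf).ravel()
    order = np.argsort(scores)
    new_mask = np.ones(weights.size, dtype=bool)
    new_mask[order[:n_prune]] = False
    return new_mask.reshape(weights.shape)

rng = np.random.default_rng(0)
weights = rng.normal(size=(20, 50))
mask = np.ones_like(weights, dtype=bool)
total_steps, monotone = 100, True

for step in range(1, total_steps + 1):
    weights += 0.01 * rng.normal(size=weights.shape)   # stand-in for an optimiser update
    new_mask = update_mask(weights, mask, linear_sparsity(step, total_steps, 0.9))
    monotone &= not np.any(new_mask & ~mask)            # pruned weights never return
    mask = new_mask
    weights *= mask                                     # keep pruned weights at zero

print((~mask).mean())
```

Because the mask tightens gradually inside one training run, no rewind-and-retrain cycle is needed, which is the contrast with iterative LTH-style pruning drawn in the abstract.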

2606.12411 2026-06-11 cs.CL cs.LG Cross-list

Context-Driven Incremental Compression for Multi-Turn Dialogue Generation

Yeongseo Jung, Jaehyeok Kim, Eunseo Jung, Jiachuan Wang, Yongqi Zhang, Ka Chun Cheung, Simon See, Lei Chen

Affiliations * The Hong Kong University of Science and Technology; NVIDIA AI Technology Center; Shanghai Jiao Tong University; The Chinese University of Hong Kong

AI summary: Proposes Context-Driven Incremental Compression (C-DIC), which shares information across turns through revisable per-thread compression states and a lightweight retrieve-revise-write-back loop, stabilising performance in long dialogues.

Comments Accepted at ICML 2026

Abstract

Modern conversational agents condition on an ever-growing dialogue history at each turn, incurring redundant attention and encoding costs that grow with conversation length. Naive truncation or summarization degrades fidelity, while existing context compressors lack cross-turn memory sharing or revision, causing information loss and compounding errors in long dialogues. We revisit the context compression under conversational dynamics and empirically present its fragility. To improve both efficiency and robustness, we introduce Context-Driven Incremental Compression (C-DIC), which treats a conversation as interleaved contextual threads and stores revisable per-thread compression states in a single, compact dialogue memory. At each turn, a lightweight retrieve, revise, and write-back loop shares information across turns and updates stale memories, stabilizing long-horizon behavior. In addition, we adapt truncated backpropagation-through-time (TBPTT) to our multi-turn setting, learning cross-turn dependencies without full-history backpropagation. Extensive experiments on long-form dialogue benchmarks demonstrate superior performance and efficiency of C-DIC; notably, C-DIC shows stable inference latency and perplexity over hundreds of dialogue turns, supporting a scalable path to high-quality dialogue modeling.

2509.20241 2026-06-11 cs.LG cs.DC Updated version

Energy Use of AI Inference, Efficiency Pathways, and Test-Time Scaling

Felipe Oviedo, Fiodar Kazhamiaka, Esha Choukse, Allen Kim, Amy Luers, Melanie Nakagawa, Ricardo Bianchini, Juan M. Lavista Ferres

Affiliations * Microsoft

AI summary: Proposes a bottom-up method based on token throughput to estimate the per-query energy of large language models at scale, revealing how energy use changes under test-time scaling and the potential for efficiency gains.

Comments A preprint version with DOI is available at Zenodo: https://doi.org/10.5281/zenodo.17188770

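DOI: 10.1016/j.joule.2026.102430
Journal ref: Joule (2026) 102430

AI-translated abstract

As AI inference scales to billions of queries and emerging reasoning and agentic workflows increase token demand, reliable estimates of per-query energy use are essential for capacity planning, emissions accounting, and efficiency prioritisation. Many public estimates are inconsistent and overstate energy use because they extrapolate from limited benchmarks and fail to reflect efficiency gains at scale. This paper introduces a bottom-up methodology, based on token throughput, for estimating the per-query energy of large-scale LLM systems. For models running on H100 nodes under realistic workloads, with GPU utilisation and PUE constraints, the estimated median per-query energy for frontier-scale models (>200 billion parameters) is 0.34 Wh (IQR: 0.18-0.67). These results are consistent with measurements of production-scale configurations and indicate that non-production estimates can overstate energy use by 4-20x. Extended to test-time scaling scenarios with 15x more tokens per typical query, the median energy rises to 4.32 Wh, indicating that focusing efficiency efforts on this regime will yield the largest fleet-wide savings. We quantify achievable efficiency gains at the model, serving-platform, and hardware levels, finding individual median per-query energy reductions of 1.5-3.5x, while combined improvements could deliver 8-20x reductions. To illustrate the system-level impact, we estimate the baseline daily energy use of a deployment serving one billion queries at 0.8 GWh/day. If 10% are long queries, demand could grow to 1.8 GWh/day. With targeted efficiency interventions, it falls to 0.9 GWh/day, comparable to the energy footprint of web search at that scale. This echoes how data centres have historically contained energy growth through efficiency gains.

Abstract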

As AI inference scales to billions of queries, estimates of per-query energy use are increasingly important for capacity planning, efficiency interventions, and policy. Yet many public estimates assume non-production settings, leading to systematic overestimation. We introduce a bottom-up framework estimating inference energy from token throughput, node power, and overhead under large-scale deployment assumptions. For frontier-scale models (>200B parameters) on H100 nodes, we estimate a median energy of 0.31 Wh/query (IQR 0.16-0.60), indicating widely cited estimates are overstated by 4-20x. In test-time scaling scenarios 15x longer than typical queries, the median energy rises 13x to 3.91 Wh (IQR 2.15-7.05). Across models, serving systems, and hardware, we estimate 8-20x line-of-sight energy reductions. At datacenter scale, serving 1 billion queries/day requires 0.7 GWh; if 10% are long queries, demand rises to 1.7 GWh/day. With efficiency interventions, it falls to 0.8 GWh/day, mitigating the energy impact of test-time scaling.

2512.08211 2026-06-11 cs.LG Updated version

MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications

Jiaxiang Geng, Lunyu Zhao, Yiyi Lu, Bing Luo

Affiliations * Duke Kunshan University; The University of Hong Kong

AI summary: Proposes MobileFineTuner, a mobile-native framework implemented in C++ with a resource-aware training runtime (memory-efficient attention, activation checkpointing, and more). It enables end-to-end LLM fine-tuning on commodity phones while substantially reducing memory pressure and improving executability.

Comments 26 pages, 25 figures

Abstract

Large language models (LLMs) are moving from cloud-centric services toward on-device embedded AI, where models interact with private, longitudinal signals sensed from users and their physical environments. Mobile phones are a natural platform for such applications because they are continuously carried by users, connected to wearable sensors, and deeply integrated with daily mobile applications. However, practical LLM fine-tuning on commodity phones remains difficult. Existing fine-tuning frameworks are largely Python-based and server-oriented, making them hard to deploy inside mobile applications. We present MobileFineTuner, a mobile-native open-source framework for end-to-end LLM fine-tuning on commodity mobile phones. MobileFineTuner is implemented in C++ and provides a reusable training stack. To make fine-tuning feasible under mobile resource constraints, MobileFineTuner integrates a resource-aware training runtime with memory-efficient attention, activation checkpointing, gradient accumulation, parameter sharding, and energy-aware scheduling. We evaluate MobileFineTuner on real mobile phones using GPT-2, Gemma 3, and Qwen2.5 models across multiple fine-tuning tasks. The results show that MobileFineTuner reproduces standard Full-FT and LoRA fine-tuning behavior, substantially reduces memory pressure and improves executability on memory-constrained phones. We further demonstrate MobileFineTuner through a private campus health-agent application, where a local LLM is fine-tuned on user-specific wearable-sensing records to provide more personalized responses while keeping raw records on the phone. These results establish MobileFineTuner as a practical toolkit for studying and building on-device LLM fine-tuning applications in embedded AI and sensing systems.

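2601.23278 2026-06-11 cs.LG cs.AR cs.CL Updated version

FOCUS: DLLMs Know How to Tame Their Compute Bound

Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini

Affiliations * University of California, Berkeley; University of Toronto

AI summary: Addressing the problem that most compute in diffusion LLM decoding is wasted on non-decodable tokens, proposes the FOCUS inference system, which dynamically focuses on decodable tokens and evicts non-decodable ones, increasing the effective batch size and achieving up to 3.52x higher throughput.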

Comments ICML 2026 camera-ready version

Abstract

Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In this work, we identify a key inefficiency in DLLM decoding: while computation is parallelized over token blocks, only a small subset of tokens is decodable at each diffusion step, causing most compute to be wasted on non-decodable tokens. We further observe a strong correlation between attention-derived token importance and token-wise decoding probability. Based on this insight, we propose FOCUS, an inference system designed for DLLMs. By dynamically focusing computation on decodable tokens and evicting non-decodable ones on-the-fly, FOCUS increases the effective batch size, alleviating compute limitations and enabling scalable throughput. Empirical evaluations demonstrate that FOCUS achieves up to 3.52$\times$ throughput improvement over the production-grade engine LMDeploy in large-batch settings, while preserving or improving generation quality across multiple benchmarks.

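2603.09555 2026-06-11 cs.LG cs.AI cs.DC cs.PF Updated version

Compiler-First State Space Duality and Portable $O(1)$ Autoregressive Caching for Inference

Cosmo Santoni, Anmol Thapar

Affiliations * Imperial College London

AI summary: Proposes a compiler-first inference approach based on the state space duality (SSD) structure, implementing a single-source inference path with no custom kernels using standard JAX primitives. It reaches high hardware utilisation on TPU and GPU, and cached decode is 27-36x faster than full-prefix recomputation.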

Comments 21 pages, 6 figures. Code available at: https://github.com/CosmoNaught/mamba2-jax

Abstract

High-throughput Mamba-2 inference is usually tied to fused CUDA and Triton kernels, limiting portability across accelerator backends. We show that the state space duality (SSD) recurrence has a compiler-friendly structure: diagonal per-head dynamics, fixed-size chunking, einsum-dominated compute, and static control flow. Expressing this structure in standard JAX primitives gives a single-source inference path with no custom kernels, a registered JAX PyTree cache, and a compiled on-device autoregressive loop. On a single Google Cloud TPU v6e, batch-1 prefill reaches approximately 140 TFLOPS, or 15% model FLOP utilisation (MFU), the roofline ceiling for this regime, and cached decode reaches up to 64% hardware bandwidth utilisation (HBU). At a 4096-token context, cached decode is 27x--36x faster than full-prefix recomputation across five Mamba-2 checkpoints from 130M to 2.7B parameters. The same source runs unmodified on NVIDIA L40S, where cached decode remains sequence-length independent across all model scales. WikiText-103 validation perplexity matches the Triton reference mamba_ssm v2.2.2 within +/-0.0005 points, and hidden states agree to float32 rounding tolerance. Code is available at https://github.com/CosmoNaught/mamba2-jax.
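
To make the duality and the fixed-size cache concrete, the sketch below compares a token-by-token recurrent decode, which carries only a constant-size state, against the equivalent attention-like quadratic form that recomputes from the full prefix. It uses a single head with a scalar decay per step and plain NumPy; the shapes, names, and simplifications are assumptions for illustration and do not reflect the chunked JAX implementation in the linked repository.

```python
import numpy as np

def recurrent_decode(a, B, C, x):
    """Token-by-token decode carrying only a fixed-size state h of shape (N, P)."""
    h = np.zeros((B.shape[1], x.shape[1]))
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        h = a_t * h + np.outer(B_t, x_t)   # constant-cost state update per token
        ys.append(C_t @ h)                 # readout for this token
    return np.stack(ys)

def quadratic_form(a, B, C, x):
    """Dual, attention-like form computed over the whole prefix at once."""
    cum = np.cumsum(np.log(a))
    # decay[t, s] = product of a over steps s+1..t for s <= t, else 0.
    decay = np.tril(np.exp(cum[:, None] - cum[None, :]))
    return ((C @ B.T) * decay) @ x

rng = np.random.default_rng(0)
T, N, P = 16, 4, 3                          # sequence length, state size, head dim
a = rng.uniform(0.5, 1.0, size=T)           # per-step scalar decay
B, C = rng.normal(size=(T, N)), rng.normal(size=(T, N))
x = rng.normal(size=(T, P))

y_cached = recurrent_decode(a, B, C, x)
y_full = quadratic_form(a, B, C, x)
print(np.max(np.abs(y_cached - y_full)))
```

The recurrent path touches an `(N, P)` state per new token regardless of how long the prefix is, whereas the quadratic form grows with sequence length; that gap is what the reported cached-decode speed-up over full-prefix recomputation reflects.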

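2605.14738 2026-06-11 cs.LG cs.AI Updated version

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

Krish Sharma, Omar Naim, Soumadeep Saha, Vinija Jain, Aman Chadha, Nicholas Asher

Affiliations * ANITI; Meta; Apple

AI summary: Studies why task-aware pruning improves performance on out-of-distribution data. Experiments show that pruning raises OOD accuracy, and the core contribution is a geometric explanation of how task-aware pruning adjusts model representations to fit the task.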

Abstract

Recent work has promoted task-aware layer pruning as a way to improve model performance on particular tasks, as shown by TALE. In this paper, we investigate when such improvements occur and why. We show first that, across controlled polynomial regression tasks and large language models, such pruning yields no benefit on in-distribution (ID) data but consistently improves out-of-distribution (OOD) accuracy. We further show empirically that OOD inputs induce layerwise norm and pairwise-distance profiles that deviate from the corresponding ID profiles. This leads to a geometric explanation of task-aware pruning: each task induces a task-adapted geometry, characterized empirically by the representation profiles observed on ID inputs. OOD inputs can introduce a distorted version of the task-adapted geometry. Task-aware pruning identifies layers that create or amplify this distortion; by removing them, it shifts OOD representational norms and pairwise distances toward those observed on the adapted distribution. This realigns OOD inputs with the model's task-adapted geometry and improves performance. We provide causal evidence through controlled distribution shifts and residual-scaling interventions, and demonstrate consistent behavior across model scales.

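2605.25820 2026-06-11 cs.LG Updated version

Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

Yulin Yuan, Hongshuo Zhao, Xiangming Meng

Affiliations * Zhejiang University; ZJUI-UIUC Institute

AI summary: To address visual redundancy in parallel decoding for diffusion-based multimodal LLMs, proposes the Visual Redundancy Index (VRI) and the training-free Visual-Redundancy-Controlled Decoding (VRCD) method, which uses token-to-image attention to prioritise visually complementary positions and improves accuracy on multiple benchmarks.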

Comments 18 pages, 5 figures, preprint. Code is available at https://github.com/infiniteYuanyl/VRCD

Abstract

Diffusion-based multimodal large language models (dMLLMs) decode by iteratively predicting tokens at multiple masked positions in parallel. This turns each decoding step into a position-selection problem: the model must choose not only which predictions are reliable in isolation, but also which positions should be committed together as context for later decoding steps. Existing confidence-based decoding ranks masked positions independently and commits the top-K positions, largely ignoring whether the committed tokens provide complementary visual grounding. We identify a step-level limitation of this strategy in multimodal settings: high-confidence tokens selected in the same step can rely on overlapping visual grounding, introducing visual redundancy among the committed tokens and leaving less complementary visual grounding available for later decoding. To quantify this effect, we introduce the Visual Redundancy Index (VRI), which measures visual grounding overlap among tokens committed in parallel. To control this redundancy during decoding, we propose Visual-Redundancy-Controlled Decoding (VRCD), a training-free inference-time decoding method that uses token-to-image attention to prioritize visually complementary positions. Across diverse multimodal benchmarks, VRCD reduces visual redundancy and remaining-position entropy with modest runtime overhead. In longer decoding experiments, it also achieves relative accuracy gains of up to 18.8% on M^3CoT and 6.9% on MMBench over confidence-based decoding. Code is available at https://github.com/infiniteYuanyl/VRCD.


2605.29128 2026-06-11 cs.LG 版本更新

Apertus LLM Family Expansion via Distillation and Quantization

通过蒸馏和量化扩展 Apertus LLM 系列

Andrei Panferov, Davit Melikidze, Martin Jaggi, Dan Alistarh


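AI summary: Using distillation and quantization, this work expands the Apertus 8B model at low cost into a model family of up to 4B parameters, covering a range of hardware constraints while retaining strong accuracy.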

2606.07362 2026-06-11 cs.LG 版本更新

Breaking the Ice: Analyzing Cold Start Latency in vLLM

打破冰层：分析 vLLM 中的冷启动延迟

Huzaifa Shaaban Kabakibo, Animesh Trivedi, Lin Wang

发表机构 * Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country（匿名机构，匿名城市，匿名地区，匿名国家）

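AI summary: This paper presents the first systematic analysis of cold-start latency in the vLLM inference engine. It breaks startup into six foundational steps, finds that it is predominantly CPU-bound, and builds a lightweight analytical model that predicts the latency to guide resource planning in large-scale inference environments.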

详情

Journal ref: Proceedings of the 9th MLSys Conference, Bellevue, WA, USA, 2026

AI中文摘要

随着可扩展推理服务的普及，推理引擎的冷启动延迟变得重要。如今，vLLM 已成为许多推理工作负载的事实标准推理引擎。尽管流行，但由于其复杂性和快速演进，尚未有对其启动延迟的系统研究。随着主要架构创新如 V1 API 和 torch.compile 的引入，本文首次对 vLLM 启动延迟进行了详细的性能表征。我们将启动过程分解为六个基础步骤，并证明其主要受 CPU 限制。每个步骤在模型级和系统级参数方面表现出一致且可解释的缩放趋势，从而能够细粒度地归因延迟来源。基于这些见解，我们开发了一个轻量级分析模型，能够准确预测给定硬件配置下的 vLLM 启动延迟，为大规模推理环境中的资源规划提供可操作的指导。所有基准测试数据集、分析工具和预测脚本均在 https://github.com/upb-cn/vllm-startup-profiler 开源。

英文摘要

As scalable inference services become popular, the cold start latency of an inference engine becomes important. Today, vLLM has evolved into the de facto inference engine of choice for many inference workloads. Although popular, due to its complexity and rapid evolution, there has not been a systematic study of its startup latency. With major architectural innovations such as the V1 API and the introduction of torch.compile, this paper presents the first detailed performance characterization of vLLM startup latency. We break down the startup process into six foundational steps and demonstrate that it is predominantly CPU bound. Each step exhibits consistent and interpretable scaling trends with respect to model-level and system-level parameters, enabling fine-grained attribution of latency sources. Building on these insights, we develop a lightweight analytical model that accurately predicts vLLM startup latency for a given hardware configuration, providing actionable guidance for resource planning in large-scale inference environments. All benchmarking datasets, analysis tools, and prediction scripts are open sourced at https://github.com/upb-cn/vllm-startup-profiler.


2606.10820 2026-06-11 cs.LG cs.AI cs.CL 版本更新

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

K-Forcing：通过前推语言建模进行联合下一K词解码

Zhiwei Tang, Yuanyu He, Yizheng Han, Wangbo Zhao, Jiasheng Tang, Fan Wang, Bohan Zhuang

发表机构 * DAMO Academy, Alibaba Group（阿里巴巴达摩院）； Hupan Lab（湖畔实验室）； Zhejiang University（浙江大学）； The Hong Kong University of Science and Technology（香港科技大学）

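AI summary: Proposes K-Forcing, a paradigm that distills an autoregressive model into a push-forward mapping that generates multiple future tokens in a single forward pass, achieving a 2.4-3.5x speedup with modest quality loss.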

Comments Code: https://github.com/alibaba-damo-academy/K-Forcing

详情

AI中文摘要

自回归语言建模是文本生成的主导范式，但其逐词顺序解码使得推理受限于内存且效率低下。现有的加速方法（如推测解码和扩散语言模型）在特定条件下可提升速度，但并未直接解决高负载批量服务——这一对工业级部署最为关键的场景。我们提出K-Forcing，一种用于联合下一k词解码的前推语言建模范式。K-Forcing将现有自回归模型蒸馏为条件前推映射——该映射在单次前向传播中将独立均匀噪声变量转换为多个未来词的联合样本。该设计保留了固定长度输出，复用了自回归教师模型的主干，并与标准自回归服务基础设施兼容。我们通过渐进式自强迫蒸馏训练该映射，逐步扩展预测窗口，同时使学生模型紧密匹配自回归教师模型的序列分布。我们在LM1B和OpenWebText上使用标准因果Transformer主干评估K-Forcing。当激进配置为每次前向传播生成k=4个词时，K-Forcing在不同批量大小下实现约2.4-3.5倍加速，同时相对于自回归教师模型仅带来轻微的质量下降。随着推理在现代LLM的生命周期计算成本中占据主导地位，K-Forcing为在现实高负载部署下加速自回归生成提供了一条有前景的途径。

英文摘要

Autoregressive (AR) language modeling is the dominant paradigm for text generation, yet its sequential token-by-token decoding makes inference memory-bound and inefficient. Existing acceleration approaches, such as speculative decoding and diffusion language models, can yield speedups under certain conditions but do not directly address high-load batch serving--the scenario most critical for industrial-scale deployment. We introduce K-Forcing, a push-forward language modeling paradigm for joint next-k-token decoding. K-Forcing distills an existing AR model into a conditional push-forward mapping--one that transforms independent uniform noise variables into a joint sample of multiple future tokens in a single forward pass. This design preserves fixed-length outputs, reuses the AR teacher backbone, and remains compatible with standard AR serving infrastructure. We train this mapping via progressive self-forcing distillation, which gradually expands the prediction window while enabling the student to closely match the sequence distribution of the AR teacher. We evaluate K-Forcing on LM1B and OpenWebText using a standard causal Transformer backbone. When aggressively configured to generate k = 4 tokens per forward pass, K-Forcing delivers approximately 2.4-3.5x speedup across different batch sizes, while incurring modest quality degradation relative to its AR teacher. As inference increasingly dominates the lifetime compute cost of modern LLMs, K-Forcing offers a promising route toward accelerating AR generation under real-world high-load deployment.


2505.17623 2026-06-11 cs.CR cs.AI cs.ET cs.LG cs.PF 版本更新

块大小、权重精度和缩放精度在低功耗边缘高效神经网络NVFP4推理中的消融研究

Ovishake Sen, Venkata Nithin Kamineni, Daniel Lobo, Swarup Bhunia, Rickard Ewetz, Baibhab Chatterjee

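AI summary: An ablation study of the NVFP4 LUT inference framework, which combines 4-bit activations, two-level scaling, and voltage-scaled storage, achieving up to 26.85x lower energy and up to 2.21x smaller area on edge-efficient models.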

Comments 7 Pages

详情

AI中文摘要

节能边缘推理需要降低算术成本、内存流量和硬件开销。本文对基于NVFP4 LUT的边缘高效神经网络推理进行了消融研究。提出的NVLUT框架结合了4位NVFP4激活、两级缩放、基于LUT的尾数计算、电压缩放存储和选择性ECC保护。乘法分解为符号、指数和尾数路径，其中符号使用XOR逻辑，指数使用整数加法，尾数乘法由紧凑的LUT访问替代。NVFP4激活使用FP4数据，并带有FP8块缩放和FP32张量缩放。在六个边缘高效模型上，块大小消融表明B=16提供了实用的精度/存储权衡，对于N=4096仅需4.5078位每输入。权重精度消融表明，在相同NVFP4激活路径下，FP8和FP16权重相比FP4权重仅带来适度提升。与纯无缩放FP4相比，无重训练的NVFP4通过恢复激活动态范围大幅恢复精度，而带重训练的NVFP4在模型上达到最佳精度。硬件分析显示，NVLUT相比传统LUT在ECC加电压缩放下实现高达26.85倍能耗降低，在混合电压操作下高达22.85倍。面积分别减少高达2.21倍和1.52倍。这些结果表明，NVFP4两级缩放结合选择性可靠性保护实现了鲁棒、低能耗的边缘推理。

英文摘要

Energy-efficient neural-network inference at the edge requires reducing arithmetic cost, memory traffic, computation energy, and storage overhead while maintaining acceptable accuracy. This paper presents an ablation-focused study of NVFP4 quantization for edge-efficient neural networks, with emphasis on the relationship between activation precision, weight precision, block-size scaling, retraining, and model accuracy. NVFP4 activations are represented using 4-bit FP4 data, an FP8 block scale, and an FP32 tensor scale, enabling ultra-low precision inference while preserving activation dynamic range. A block-size ablation over six edge-efficient models shows that block size B = 16 provides a practical accuracy/storage trade-off, requiring only 4.5078 bits per input for N = 4096. A weight precision ablation further shows that FP8 and FP16 weights provide only modest gains over FP4 weights under the same NVFP4 activation path, suggesting that activation quantization and scaling dominate much of the accuracy behavior. To isolate the benefit of the NVFP4 data type, this work compares conventional unscaled FP4 activation inference and NVFP4 activation inference with and without retraining. The results show that conventional FP4 inference collapses accuracy for most compact models, while NVFP4 without retraining already recovers substantial accuracy by restoring activation dynamic range through FP8 block scaling and FP32 tensor scaling. When combined with retraining, NVFP4 achieves the best accuracy across the evaluated models, demonstrating the effectiveness of scaling-aware FP4 (NVFP4) inference. These findings provide general design guidance for hardware-software co-design of low power edge inference across a broad range of accelerator platforms, including GPUs, Tensor Cores, FPGAs, domain-specific AI accelerators, near-memory computing systems, and emerging edge-computing architectures.
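The storage figure quoted above can be checked with a short calculation. The sketch below assumes the cost per input is the 4-bit FP4 value, plus one 8-bit FP8 scale shared by each block of B inputs, plus one 32-bit FP32 tensor scale amortized over all N inputs. That accounting is an assumption, but it reproduces the reported 4.5078 bits for B = 16 and N = 4096.

```python
def nvfp4_bits_per_input(n, block_size, data_bits=4,
                         block_scale_bits=8, tensor_scale_bits=32):
    """Average storage per input: FP4 payload + amortized FP8 block scale
    + amortized FP32 tensor scale (assumed accounting, see lead-in)."""
    return data_bits + block_scale_bits / block_size + tensor_scale_bits / n

bits = nvfp4_bits_per_input(n=4096, block_size=16)
print(f"{bits:.4f} bits per input")  # 4 + 8/16 + 32/4096

# Smaller blocks spend more bits on scales; larger blocks spend fewer.
for b in (8, 16, 32, 64):
    print(b, nvfp4_bits_per_input(4096, b))
```

Under this accounting the block scale dominates the overhead: halving the block size from 16 to 8 adds 0.5 bits per input, while the tensor scale contributes under 0.01 bits.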


2606.11272 2026-06-11 cs.LG cs.AI 新提交

Federated continual learning: A comprehensive survey on lifelong and privacy-preserving learning over distributed and non-stationary data

联邦持续学习：分布式和非平稳数据上的终身与隐私保护学习综述

Masoume Gholizade, Fabrizio Ruffini, Pietro Ducange, Francesco Marcelloni

发表机构 * University of Pisa（比萨大学）； University of Modena and Reggio Emilia（摩德纳和雷焦艾米利亚大学）

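AI summary: A systematic survey of federated continual learning (FCL) that defines the problem, analyzes the limitations of classical federated learning under non-stationary data, proposes a multi-dimensional taxonomy, and discusses applications, evaluation metrics, and open challenges.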

Comments 77 pages, 8 figures

详情

DOI: 10.1016/j.neucom.2026.133929
Journal ref: Neurocomputing, Volume 694, 2026, 133929

AI中文摘要

联邦学习（FL）能够在分布式客户端之间实现协作和隐私保护的模型训练，但大多数现有的FL系统隐含地假设数据是平稳的。在现实场景中——如医疗、工业物联网（IIOT）、网络安全和智慧城市——数据流本质上是非平稳的，导致经典FL方法遭受性能下降、不稳定和灾难性遗忘。持续学习（CL）解决了在演化数据分布下的学习问题，但主要在集中式环境中研究，忽视了联邦系统的关键约束，包括隐私、有限通信和客户端异质性。联邦持续学习（FCL）出现在FL和CL的交汇处，旨在支持分布式和非平稳数据上的终身、自适应和隐私感知学习。本综述提供了FCL的全面和系统概述。我们首先给出FCL问题的正式定义并阐明其独特特征。然后分析经典FL在非平稳条件下的局限性，强调CL原理如何支持长期适应。为了组织快速增长的文献，我们提出了FCL方法的多维分类法。此外，我们回顾了代表性的应用领域和数据模态，总结了常用的评估指标，并讨论了评估长期性能和遗忘的实验视角。最后，我们强调了关键开放挑战，包括处理时间漂移下的极端异质性、设计可扩展且隐私保护的记忆机制，以及建立标准化基准。本综述旨在为推进FCL走向鲁棒和可部署的现实世界系统提供参考和路线图。

英文摘要

Federated Learning (FL) enables collaborative and privacy-preserving model training across distributed clients, but most existing FL systems implicitly assume data stationarity. In real-world settings such as healthcare, industrial IoT (IIoT), cybersecurity, and smart cities, data streams are inherently non-stationary, leading classical FL methods to suffer from performance degradation, instability, and catastrophic forgetting. Continual Learning (CL) addresses learning under evolving data distributions but has been largely studied in centralized settings, overlooking key constraints of federated systems, including privacy, limited communication, and client heterogeneity. Federated Continual Learning (FCL) emerges at the intersection of FL and CL, aiming to support lifelong, adaptive, and privacy-aware learning over distributed and non-stationary data. This survey provides a comprehensive and systematic overview of FCL. We first present a formal definition of the FCL problem and clarify its distinctive characteristics. We then analyze the limitations of classical FL under non-stationary conditions, highlighting how CL principles support long-term adaptation. To organize the rapidly growing literature, we propose a multi-dimensional taxonomy of FCL approaches. Furthermore, we review representative application domains and data modalities, summarize commonly used evaluation metrics, and discuss experimental perspectives for assessing long-term performance and forgetting. Finally, we highlight key open challenges, including handling extreme heterogeneity under temporal drift, designing scalable and privacy-preserving memory mechanisms, and establishing standardized benchmarks. This survey aims to serve as a reference and a roadmap for advancing FCL toward robust and deployable real-world systems.


2606.11480 2026-06-11 cs.LG 新提交

Accurate and Resource-Efficient Federated Continual Learning

准确且资源高效的联邦持续学习

Jebacyril Arockiaraj, Dhruv Parikh, Jayashree Adivarahan, Rajgopal Kannan, Viktor Prasanna

发表机构 * University of Southern California（南加州大学）； DEVCOM Army Research Office（DEVCOM陆军研究办公室）

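AI summary: Proposes FedRAN, a framework that replaces gradient updates with compact random-feature statistics, uses truncated SVD to cut communication overhead, and handles label scarcity with prototype-based pseudo-labeling, improving accuracy on several datasets while sharply reducing resource consumption.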

Comments Technical Report

详情

AI中文摘要

联邦持续学习（FCL）必须在有限的资源（如通信、计算、内存和标签可用性）下从分布式任务流中学习。现有的FCL方法通常依赖于重复的局部优化、重放和完全监督。解析替代方法避免了迭代训练和重放，但使用高维随机特征来提高准确性需要二阶特征统计量——Gram矩阵，其通信成本与随机特征大小$M$成二次方关系。我们提出FedRAN，一种资源感知的解析FCL框架，用紧凑的随机特征统计量替代基于梯度的更新。每个客户端传输其Gram矩阵的截断SVD摘要，将主要的二阶上传从$M$的二次方减少到线性（对于固定秩）。服务器执行两级QR-SVD子空间合并，在空间上跨客户端、在时间上跨任务，并以闭式求解岭分类器。FedRAN进一步通过基于原型的伪标签支持标签稀缺。在CIFAR-100、ImageNet-R和VTAB数据集上，FedRAN相比最强基线将平均准确率提高了最多4.8个百分点，每个客户端的通信量比基于优化的FCL少30.6-121.8倍，平均比基于梯度的基线快190.3倍；仅使用20%标签时，伪标签将平均准确率提高了最多6.61个百分点。这些结果表明，FedRAN在通信、计算和标签约束下实现了准确且资源高效的FCL。源代码可在 https://github.com/JebacyrilArockiaraj/Fed-RAN-SSL 获取。

英文摘要

Federated continual learning (FCL) must learn from distributed task streams under limited resources, such as communication, computation, memory, and label availability. Existing FCL methods often rely on repeated local optimization, replay, and full supervision. Analytic alternatives avoid iterative training and replay, but using high-dimensional random features to improve accuracy requires a second-order feature statistic, the Gram matrix, which has a quadratic communication cost in the random feature size $M$. We propose FedRAN, a resource-aware analytic FCL framework that replaces gradient-based updates with compact random feature statistics. Each client transmits a truncated-SVD summary of its Gram matrix, reducing the dominant second-order upload from quadratic to linear in $M$ for fixed rank. The server performs a two-level QR-SVD subspace merge, spatially across clients and temporally across tasks, and solves a ridge classifier in closed form. FedRAN further supports label scarcity through prototype-based pseudo-labeling. Across CIFAR-100, ImageNet-R, and VTAB datasets, FedRAN improves average accuracy by up to 4.8 percentage points over the strongest baseline, uses 30.6-121.8$\times$ less per-client communication than optimization-based FCL, and is 190.3$\times$ faster on average than gradient-based baselines; with only 20% labels, pseudo-labeling improves average accuracy by up to 6.61 points. These results show that FedRAN enables accurate and resource-efficient FCL under communication, computation, and label constraints. The source code is available at https://github.com/JebacyrilArockiaraj/Fed-RAN-SSL.
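The quadratic-to-linear claim about the second-order upload can be made concrete with a simple count. The sketch below assumes a rank-r summary consists of r basis vectors of length M plus r singular values. The exact payload FedRAN transmits is not specified in the abstract, so the function names and the payload layout here are illustrative only.

```python
def gram_upload_floats(m):
    """Floats needed to send a full M x M Gram matrix."""
    return m * m

def truncated_upload_floats(m, rank):
    """Floats for an assumed rank-r summary: r vectors of length M plus r singular values."""
    return rank * m + rank

rank = 32
for m in (1000, 2000, 4000):
    full = gram_upload_floats(m)
    compact = truncated_upload_floats(m, rank)
    print(f"M={m}: full={full}, rank-{rank}={compact}, ratio={full / compact:.1f}x")
```

Doubling M quadruples the full-matrix upload but only doubles the truncated summary, so the saving grows with the feature dimension for any fixed rank.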


2606.11556 2026-06-11 cs.CR cs.AI cs.LG 交叉投稿

Privacy-Preserving Federated Autoencoder for ECG Anomaly Detection on Edge Devices

面向边缘设备上心电图异常检测的隐私保护联邦自编码器

Kaan Arda Akyol, Jakub Kacper Szeląg, Aydin Abadi, Maha Alghamdi, Ghadah Albalawi, Ghouse Ibrahim Kaleelullah, Hilal Tutus, Sarah Al Subaiei, Shardul Kapse, Syed Mohammed Raheeb, Mujeeb Ahmed, Rehmat Ullah

发表机构 * Google Research, New York, NY（谷歌研究，纽约，纽约州）； University of California, Berkeley（加州大学伯克利分校）； University of Cambridge（剑桥大学）； University of Toronto（多伦多大学）； University of Melbourne（墨尔本大学）； University of Sydney（悉尼大学）

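AI summary: Proposes an end-to-end system combining federated learning, differential privacy, and INT8 quantization for unsupervised 12-lead ECG anomaly detection on the PTB-XL dataset, meeting privacy, real-time, and non-IID data requirements.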

Comments 9 pages, 4 figures, 6 tables. Preprint prepared in IEEE conference format. Submitted to: FLTA 2026

详情

AI中文摘要

连续心电图监测可以在心律异常演变为心血管事件之前发现它们。然而，一个可部署的系统必须同时满足三个要求：法律级别的隐私（GDPR、HIPAA）、在受限边缘硬件上的实时推理以及在非IID跨医院数据下的检测质量。我们设计并评估了一个端到端的联邦系统，在PTB-XL数据集上解决了无监督12导联ECG异常检测的所有三个要求，结合了三种自编码器家族（VanillaAE、ConvAE、VAE）、基于Flower的联邦平均（FedAvg）跨十个模拟医院、客户端差分隐私SGD（DP-SGD）与Rényi-DP会计，以及使用Raspberry Pi 4基准测试的8位整数（INT8）训练后量化。我们的主要贡献是：这些机制如何组合的经验性特征、实用的DP特定建议，以及针对临床敏感环境的技术和安全见解。联邦学习在所有架构上匹配或超过集中基线（ConvAE联邦ROC曲线下面积AUROC为0.782），并且ε扫描确定ε=4为推荐的临床操作点。INT8量化大致将模型大小减半，并将Pi 4延迟降低多达44%，AUROC损失小于0.12%。关键的是，DP和量化的惩罚在经验上是独立的，因此从业者不需要为了紧凑的边缘足迹而牺牲强大的隐私保证。据我们所知，这是第一个结合联邦学习、形式化(ε,δ)-DP、无监督重建检测和量化AArch64部署的系统。

英文摘要

Continuous electrocardiography (ECG) monitoring could surface rhythm abnormalities before they escalate into cardiovascular events. However, a deployable system must satisfy three requirements simultaneously: legal-grade privacy (GDPR, HIPAA), real-time inference on constrained edge hardware, and detection quality under non-IID cross-hospital data. We design and evaluate an end-to-end federated system addressing all three for unsupervised 12-lead ECG anomaly detection on the PTB-XL dataset, combining three autoencoder families (VanillaAE, ConvAE, VAE), Flower-based federated averaging (FedAvg) across ten simulated hospitals, client-side differentially private SGD (DP-SGD) with a Rényi-DP accountant, and 8-bit integer (INT8) post-training quantization with Raspberry Pi 4 benchmarking. Our main contributions are: an empirical characterization of how these mechanisms compose, practical DP-specific recommendations, and technical and security insights for a clinically sensitive setting. Federated learning matches or exceeds the centralized baseline across all architectures (ConvAE federated area under the ROC curve, AUROC, $0.782$), and an $\varepsilon$ sweep identifies $\varepsilon=4$ as the recommended clinical operating point. INT8 quantization roughly halves model size and cuts Pi 4 latency by up to $44\%$ with $<0.12\%$ AUROC loss. Crucially, DP and quantization penalties are empirically independent, so practitioners need not trade a strong privacy guarantee for a compact edge footprint. To our knowledge, this is the first system combining federated learning, formal $(\varepsilon,\delta)$-DP, unsupervised reconstruction-based detection, and quantized AArch64 deployment.
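For readers unfamiliar with the aggregation step named in this abstract, the following is a minimal sketch of federated averaging in plain Python. It shows only the server-side weighted mean of client parameters; it does not use the Flower API, and it omits DP-SGD noise and quantization. The flat-list parameter layout and the function name are simplifying assumptions.

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two simulated hospitals: the second holds three times as much data.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Weighting by sample count means hospitals with more recordings pull the global model further toward their local solution, which is one reason non-IID data across sites matters for detection quality.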


2506.01396 2026-06-11 cs.LG cs.CR stat.ML 版本更新

Mitigating Disparate Impact of Differentially Private Learning through Bounded Adaptive Clipping

通过有界自适应裁剪减轻差分隐私学习中的差异影响

Linzh Zhao, Aki Rehn, Mikko A. Heikkilä, Razane Tajeddine, Antti Honkela

发表机构 * Department of Computer Science, University of Helsinki（计算机科学系，赫尔辛基大学）； Department of Electrical and Computer Engineering, American University of Beirut（电气与计算机工程系，贝鲁特美国大学）

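AI summary: To address the unfair impact of gradient clipping on minority groups in differentially private learning, proposes bounded adaptive clipping, which introduces a tunable lower bound to prevent excessive gradient suppression and improves worst-class accuracy by over 10 percentage points on Skewed and Fashion MNIST.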

Comments TMLR camera-ready version

详情

AI中文摘要

差分隐私已成为隐私保护机器学习的基本框架。然而，现有的差分隐私学习方法通常对模型预测产生差异影响，例如对少数群体。梯度裁剪常用于差分隐私学习，但会抑制来自困难样本的较大梯度。我们表明，自适应裁剪会加剧这一问题，因为它通常会将裁剪边界缩小到极小值以匹配拟合良好的多数类，同时显著降低其他类的准确率。我们提出有界自适应裁剪，引入可调下界以防止过度梯度抑制。与无界自适应裁剪相比，我们的方法在Skewed和Fashion MNIST上将最差类准确率提高了超过10个百分点，与自动裁剪相比提高了7个百分点，与恒定裁剪相比提高了5个百分点。代码可在 https://github.com/TrustworthyMLHelsinki/adaptive-clipping-fairness 获取。

英文摘要

Differential privacy (DP) has become an essential framework for privacy-preserving machine learning. Existing DP learning methods, however, often have disparate impacts on model predictions, e.g., for minority groups. Gradient clipping, which is often used in DP learning, can suppress larger gradients from challenging samples. We show that this problem is amplified by adaptive clipping, which will often shrink the clipping bound to tiny values to match a well-fitting majority, while significantly reducing the accuracy for others. We propose bounded adaptive clipping, which introduces a tunable lower bound to prevent excessive gradient suppression. Our method improves worst-class accuracy by over 10 percentage points on Skewed and Fashion MNIST compared to unbounded adaptive clipping, 7 points compared to Automatic clipping, and 5 points compared to constant clipping. The code is available at https://github.com/TrustworthyMLHelsinki/adaptive-clipping-fairness.
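The shrinking behavior described above, and the effect of a floor, can be illustrated with a toy simulation. The sketch assumes a quantile-tracking geometric update for the clipping bound and leaves out the privacy noise that a real DP mechanism would add to the quantile estimate. The update rule, the parameter names, and the chosen values are assumptions rather than the authors' exact procedure.

```python
import math

def update_clip_bound(clip, grad_norms, target_quantile=0.5, lr=0.2, lower=None):
    """One adaptive step: move the bound toward the target quantile of gradient norms.
    If `lower` is given, never let the bound fall below it (the bounded variant)."""
    frac_below = sum(n <= clip for n in grad_norms) / len(grad_norms)
    new_clip = clip * math.exp(-lr * (frac_below - target_quantile))
    return new_clip if lower is None else max(lower, new_clip)

# Nine well-fit majority samples with tiny gradients, one hard sample with a large one.
norms = [0.01] * 9 + [5.0]
unbounded = bounded = 1.0
for _ in range(200):
    unbounded = update_clip_bound(unbounded, norms)
    bounded = update_clip_bound(bounded, norms, lower=0.5)
print(unbounded, bounded)
```

In this toy run the unbounded rule settles near the majority's gradient scale, so the hard sample's gradient of norm 5.0 would be clipped by a factor of several hundred, while the floor keeps the bound at 0.5 and preserves far more of that sample's signal.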


2506.08473 2026-06-11 cs.LG 版本更新

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

AsFT：在窄安全盆地内锚定大语言模型微调期间的安全性

Shuo Yang, Qihui Zhang, Yuyang Liu, Xiaojun Jia, Kunpeng Ning, Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan

发表机构 * National University of Singapore（新加坡国立大学）； University of Science and Technology of China（中国科学技术大学）； Tsinghua University（清华大学）

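AI summary: To address the fragility of safety when fine-tuning large language models, proposes AsFT, which penalizes updates orthogonal to the alignment direction to keep the model within a narrow safety basin, markedly reducing harmful behavior while improving task performance.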

详情

AI中文摘要

微调大语言模型（LLMs）可提升性能，但引入了关键的安全漏洞：即使极少的有害数据也会严重破坏安全措施。我们观察到，与对齐方向（由对齐（安全）模型与未对齐模型之间的权重差异定义）正交的扰动会迅速损害模型安全性。相反，沿对齐方向的更新则基本保持安全性，揭示了参数空间是一个“窄安全盆地”。为解决此问题，我们提出AsFT（在微调中锚定安全性），通过在微调过程中显式约束更新方向来维持安全性。通过惩罚与对齐方向正交的更新，AsFT有效将模型约束在“窄安全盆地”内，从而保持其固有安全性。在多个数据集和模型上的大量实验表明，AsFT将有害行为降低高达7.60%，任务性能提升3.44%，并在多个任务上持续优于现有方法。

英文摘要

Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction - defined by weight differences between aligned (safe) and unaligned models - rapidly compromise model safety. In contrast, updates along the alignment direction largely preserve it, revealing the parameter space as a "narrow safety basin". To address this, we propose AsFT (Anchoring Safety in Fine-Tuning) to maintain safety by explicitly constraining update directions during fine-tuning. By penalizing updates orthogonal to the alignment direction, AsFT effectively constrains the model within the "narrow safety basin," thus preserving its inherent safety. Extensive experiments on multiple datasets and models show that AsFT reduces harmful behaviors by up to 7.60%, improves task performance by 3.44%, and consistently outperforms existing methods across multiple tasks.
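The geometric idea can be sketched with a simple projection. The code below treats parameters as flat vectors, takes the alignment direction to be the difference between aligned and unaligned weights as the abstract defines it, and uses the squared norm of the orthogonal component of an update as the penalty. The exact regularizer used by AsFT is not given in the abstract, so this penalty form and the function name are assumptions.

```python
def orthogonal_penalty(update, aligned, unaligned):
    """Squared norm of the part of `update` orthogonal to the alignment direction."""
    direction = [a - u for a, u in zip(aligned, unaligned)]
    norm_sq = sum(d * d for d in direction)
    coef = sum(x * d for x, d in zip(update, direction)) / norm_sq
    ortho = [x - coef * d for x, d in zip(update, direction)]
    return sum(o * o for o in ortho)

aligned, unaligned = [1.0, 0.0], [0.0, 0.0]   # alignment direction is the first axis
print(orthogonal_penalty([3.0, 0.0], aligned, unaligned))  # update along the direction
print(orthogonal_penalty([3.0, 4.0], aligned, unaligned))  # update with an orthogonal part
```

An update that moves purely along the alignment direction incurs no penalty, while any component off that direction is penalized quadratically, which is one way to keep fine-tuning inside the narrow basin the abstract describes.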


2510.01529 2026-06-11 cs.LG cs.CR 版本更新

Bypassing Prompt Guards in Production with Controlled-Release Prompting

绕过生产环境中的提示守卫：受控释放提示攻击

Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang

发表机构 * UC Berkeley（加州大学伯克利分校）； zkBricks Inc（zkBricks公司）； Ethereum Foundation（以太坊基金会）； NYU Shanghai（纽约大学上海分校）

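AI summary: Building on a theoretical impossibility result for prompt filtering in AI alignment, this paper proposes the controlled-release prompting attack, which exploits the resource asymmetry between lightweight input filters and the main model to bypass prompt guards in deployed large language model systems.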

Comments Accepted to USENIX Security 2026

详情

AI中文摘要

Ball等人最近指出，用于AI对齐的提示过滤面临一个根本性障碍：在标准密码学假设下，任何运行速度远快于被保护模型的过滤器都无法普遍区分对抗性提示和良性提示。我们研究这一不可能性结果是否转化为已部署的大语言模型（LLM）系统中的现实漏洞。我们通过引入受控释放提示攻击给出了肯定答案，这是理论框架的一种实际实例化，利用了轻量级输入过滤器与其保护的主模型之间的资源不对称性。与理论构造不同，我们的攻击不需要修改模型：它生成任何有界过滤器无法解读但对目标LLM仍然可处理的恶意提示。我们发现，在基线方法失败的四个主要聊天平台（Google Gemini、DeepSeek Chat、xAI Grok和Mistral Le Chat）上，我们的攻击均成功。此外，我们将攻击应用于从Gemini提取受版权保护的数据。最后，我们对14个开源提示守卫模型进行了系统评估，揭示即使具有推理能力的过滤器也无法在不产生过高资源开销的情况下可靠地检测我们的攻击。

英文摘要

Ball et al. recently established that prompt filtering for AI alignment faces a fundamental barrier: under standard cryptographic assumptions, no filter running significantly faster than the protected model can universally distinguish adversarial prompts from benign ones. We investigate whether this impossibility result translates to real-world vulnerabilities in deployed large language model (LLM) systems. We answer affirmatively by introducing controlled-release prompting, a practical instantiation of the theoretical framework that exploits the resource asymmetry between lightweight input filters and the main models they protect. Unlike the theoretical construction, our attack does not require model modification: it generates malicious prompts that are indecipherable by any bounded filter yet remain tractable to the target LLM. We find our attack to be successful on four major chat platforms (Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat) where baseline methods fail. Additionally, we apply our attack to extract copyrighted data from Gemini. Finally, we provide a systematic evaluation of 14 open-weight prompt guard models, revealing that even reasoning-capable filters cannot reliably detect our attack without incurring prohibitive resource overhead.


2510.03520 2026-06-11 cs.LG cs.AI cs.SY eess.SY 版本更新

Certifiable Safe RLHF: Semantic Grounding and Fixed Penalty Constraint Optimization for Safer LLM Alignment

可认证安全RLHF：基于语义基础与固定惩罚约束优化的更安全大语言模型对齐

Kartik Pandit, Sourav Ganguly, Arnesh Banerjee, Shaahin Angizi, Arnob Ghosh

发表机构 * Department of Electrical and Computer Engineering（电气与计算机工程系）； New Jersey Institute of Technology（新泽西理工学院）； Department of Computer Engineering（计算机工程系）； Heritage Institute of Technology（遗产理工学院）

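AI summary: Existing RLHF methods depend on reward/cost functions and dual-variable tuning, which makes performance sensitive and leaves no provable safety guarantee. The paper proposes CS-RLHF, which uses a semantically grounded cost model and fixed-penalty constrained optimization to achieve certifiably safe alignment, with at least a 5x efficiency gain.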

详情

AI中文摘要

确保安全是大语言模型（LLMs）的基本要求。在增强模型输出效用与减轻其潜在危害之间取得适当平衡是一个复杂且持续的挑战。当代方法通常将这个问题形式化为约束马尔可夫决策过程（CMDP）框架，并采用成熟的CMDP优化技术。然而，这些方法表现出两个显著的限制。首先，它们对奖励和成本函数的依赖使得性能对底层评分机制高度敏感，而该机制必须捕捉语义含义，而不是被表面关键词触发。其次，基于CMDP的训练需要调整双变量，这一过程计算成本高昂，并且对于可能通过对抗性越狱利用的固定双变量，不提供任何可证明的安全保证。为了克服这些限制，我们引入了可认证安全RLHF（CS-RLHF），它引入了一个在大规模语料库上训练的成本模型，以分配基于语义的安全分数。与基于拉格朗日的方法相比，CS-RLHF采用了一种修正的基于惩罚的公式。该设计借鉴了约束优化中精确惩罚函数理论，其中约束满足直接通过适当选择的惩罚项来强制执行。通过适当缩放的惩罚，可以在优化器处保证安全约束的可行性，从而消除了双变量更新的需要。实证评估表明，CS-RLHF优于最先进的LLM模型响应，对正常和越狱提示的效率至少提高5倍。

英文摘要

Ensuring safety is a foundational requirement for large language models (LLMs). Achieving an appropriate balance between enhancing the utility of model outputs and mitigating their potential for harm is a complex and persistent challenge. Contemporary approaches frequently formalize this problem within the framework of Constrained Markov Decision Processes (CMDPs) and employ established CMDP optimization techniques. However, these methods exhibit two notable limitations. First, their reliance on reward and cost functions renders performance highly sensitive to the underlying scoring mechanism, which must capture semantic meaning rather than being triggered by superficial keywords. Second, CMDP-based training entails tuning a dual variable, a process that is computationally expensive and provides no provable safety guarantee, since a fixed dual variable can be exploited through adversarial jailbreaks. To overcome these limitations, we introduce Certifiable Safe-RLHF (CS-RLHF), which uses a cost model trained on a large-scale corpus to assign semantically grounded safety scores. In contrast to the Lagrangian-based approach, CS-RLHF adopts a rectified penalty-based formulation. This design draws on the theory of exact penalty functions in constrained optimization, wherein constraint satisfaction is enforced directly through a suitably chosen penalty term. With an appropriately scaled penalty, feasibility of the safety constraints can be guaranteed at the optimizer, eliminating the need for dual-variable updates. Empirical evaluation demonstrates that CS-RLHF outperforms state-of-the-art LLM responses, being at least 5 times more efficient against nominal and jailbreaking prompts.
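A rectified penalty of the kind this abstract describes can be written in a few lines. In the sketch below the penalty weight is fixed rather than learned, and it is applied only to the amount by which the cost exceeds its budget. The function name, the budget of 0.0, and the weight of 10.0 are illustrative assumptions, not values from the paper.

```python
def rectified_penalty_objective(reward, cost, budget=0.0, penalty=10.0):
    """Reward minus a fixed penalty on constraint violation only (max(0, cost - budget)).
    Responses that satisfy the safety budget are scored by reward alone."""
    return reward - penalty * max(0.0, cost - budget)

safe = rectified_penalty_objective(reward=1.0, cost=-0.5)    # within budget: no penalty
unsafe = rectified_penalty_objective(reward=1.0, cost=0.2)   # violates budget by 0.2
print(safe, unsafe)
```

Unlike a Lagrangian term, which would also reward slack below the budget and requires updating the multiplier during training, the rectified form is inactive for feasible responses and needs no dual update.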


2512.13666 2026-06-11 cs.CR cs.DC cs.IT cs.LG math.IT 版本更新

压力下的风险：语言模型对抗鲁棒性的计算感知评估

Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, Colin Raffel

发表机构 * University of Toronto（多伦多大学）； Vector Institute（向量研究所）； Hugging Face

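AI summary: Proposes an adversarial-robustness evaluation framework based on computational pressure (cumulative FLOPs). Using risk-compute curves and two new metrics, it exposes differences in compute cost across attack strategies and, on 10 models, shows non-monotonic effects of alignment training, model scale, and other factors on compute-space robustness.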

详情

AI中文摘要

大型语言模型（LLMs）的对抗鲁棒性评估通常报告固定查询预算下的攻击成功率（ASR），隐含地认为所有攻击成本相同。实际上，不同攻击策略的计算开销可能相差几个数量级。因此，固定预算下的ASR可能掩盖破解模型所需的真实努力，从而难以判断攻击成本是否值得。我们提出一个基于计算压力的计算感知评估框架，以累积浮点运算次数（FLOPs）作为对抗努力的代理。我们引入风险-计算曲线，将计算预算映射到攻击风险，并推导出两个指标，总结给定攻击成功所需的平均压力。在跨越三个模型家族和语言模型训练与对齐的四个不同阶段的十个模型上，使用三种攻击策略（基于梯度、迭代细化和基于模板）在两个破解鲁棒性基准上评估，我们发现：（1）对齐训练对计算空间鲁棒性具有非单调影响；（2）扩大模型规模降低了基于梯度的攻击有效性，但对更便宜的基于模板的攻击影响有限；（3）在代理模型上优化的基于梯度的攻击可以迁移到独立的目标模型，从而降低攻击者成本；（4）在单个模型内，不同危害类别的计算成本差异高达约5倍；（5）安全对齐的RL增加了总成本，同时使某些类别不成比例地易于攻击。我们发布框架以实现计算感知的风险评估和评价。

英文摘要

Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of different attack strategies can vary by orders of magnitude. Consequently, ASR at a fixed budget can obscure the true effort required to jailbreak a model, thereby making it hard to determine whether an attack's cost justifies its payoff to the attacker. We propose a compute-aware evaluation framework based on computational pressure, measured in cumulative floating-point operations (FLOPs), as a proxy for adversarial effort. We introduce risk-compute curves, which map compute budgets to attack risk, and derive two metrics that summarize the average pressure required for a given attack to succeed. Across ten models spanning three families and four different stages in language model training and alignment, evaluated with three attack strategies (gradient-based, iterative refinement, and template-based) on two jailbreak robustness benchmarks, we find: (1) alignment training has non-monotonic effects on compute-space robustness; (2) scaling model size reduces gradient-based attack effectiveness but has limited impact on cheaper template-based attacks; (3) gradient-based attacks optimized on a surrogate model can transfer to a separate target model, providing a way to reduce attacker costs; (4) compute cost varies by up to ${\approx}5{\times}$ across harm categories within a single model; and (5) safety-aligned RL increases aggregate cost while leaving some categories disproportionately accessible. We release our framework to enable compute-aware risk assessment and evaluation.
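A risk-compute curve of the kind described here can be sketched as an empirical cumulative fraction. The code assumes each attacked prompt is recorded with the cumulative FLOPs spent until the first successful jailbreak, or `None` if the attack never succeeded. The data layout and function names are hypothetical, and the paper's two summary metrics are not reproduced.

```python
def risk_at_budget(flops_to_success, budget):
    """Fraction of prompts jailbroken using at most `budget` cumulative FLOPs."""
    hits = sum(1 for f in flops_to_success if f is not None and f <= budget)
    return hits / len(flops_to_success)

def risk_compute_curve(flops_to_success, budgets):
    """Map each compute budget to the attack risk at that budget."""
    return [(b, risk_at_budget(flops_to_success, b)) for b in budgets]

# Toy records: three successful attacks at different costs, one failure.
records = [1e12, 5e12, None, 2e13]
curve = risk_compute_curve(records, budgets=[1e12, 1e13, 1e14])
print(curve)
```

Two attacks with the same final success rate can trace very different curves, which is the information a single ASR figure at a fixed query budget hides.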


2606.11949 2026-06-11 cs.LG cs.CR stat.ML 新提交

Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers

已部署安全分类器的在线漂移检测与共形自适应

Jun Wen Leong

发表机构 * University of California, Berkeley（加州大学伯克利分校）

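AI summary: Proposes an online monitoring system that uses calibrated sequential statistics to detect distribution shift and a conformal abstention layer that adapts thresholds to restore the target error rate, achieving 86.6% valid detection across 800 experimental cells.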

Comments 16 pages, 4 figures, 7 tables. Code and data at https://github.com/junwenleong/safety-classifier-shift-monitor

详情

AI中文摘要

我们提出了一种在线监测系统，用于检测已部署安全分类器中的分布漂移，使用校准的序列统计量来检测分类器何时移出分布。一旦检测到，共形弃权层会自适应调整决策阈值，以恢复目标错误率ε=0.1。在一项预注册的析因评估（4个分类器×5种漂移条件×20个种子×2个窗口大小，共800个单元）中，该系统实现了86.6%的有效检测（693/800，95% CI [84.1%, 88.8%]），平均延迟为39.5步。检测在三种真实标签机制下均有效：合成发作（86.6%）、真实时间越狱（85%，17/20）和GCG对抗攻击。加权共形预测为DeBERTa恢复了高达39个百分点的丢失覆盖率（ESS=46/300），但所有其他分类器均崩溃（ESS≈300）：逻辑密度比估计在高维嵌入空间中实现了完美的源/目标可分离性，将所有重要性权重裁剪至下限。DeBERTa显示出从有效校正（释义，ESS=46）到几乎完全崩溃（对抗后缀，ESS=206）的梯度。PCA降至32维打破了崩溃，为Llama Guard恢复了33个百分点，为ShieldGemma恢复了21个百分点。方差分解显示分类器（η²=0.243）、漂移类型（η²=0.237）及其交互作用（η²=0.185）均对检测延迟方差有显著贡献（所有p<0.001），表明需要针对每个分类器的监测配置文件。

英文摘要

We present an online monitoring system for distributional shift in deployed safety classifiers, using calibrated sequential statistics to detect when a classifier has moved out of distribution. Upon detection, a conformal abstention layer adapts decision thresholds to recover a target error rate epsilon=0.1. In a pre-registered factorial evaluation (4 classifiers x 5 shift conditions x 20 seeds x 2 window sizes, 800 cells), the system achieves 86.6% valid detection (693/800, 95% CI [84.1%, 88.8%]) with mean latency of 39.5 steps. Detection holds across three ground-truth regimes: synthetic onset (86.6%), real temporal jailbreaks (85%, 17/20), and GCG adversarial attacks. Weighted conformal prediction recovers up to 39 pp of lost coverage for DeBERTa (ESS=46/300) but collapses for all other classifiers (ESS~300): logistic density ratio estimation achieves perfect source/target separability in high-dimensional embedding spaces, clipping all importance weights to the floor. DeBERTa shows a gradient from effective correction (paraphrase, ESS=46) to near-total collapse (adversarial suffix, ESS=206). PCA to 32 dimensions breaks the collapse, recovering 33 pp for Llama Guard and 21 pp for ShieldGemma. Variance decomposition reveals classifier (eta^2=0.243), shift type (eta^2=0.237), and their interaction (eta^2=0.185) all contribute substantially to detection latency variance (all p<0.001), indicating per-classifier monitoring profiles are necessary.
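The headline numbers in this abstract can be sanity-checked with a few lines of arithmetic. The abstract does not say which interval method was used. The sketch below applies a Wilson score interval with z = 1.96, which is an assumption, and it happens to reproduce the reported bounds for 693 detections out of 800 cells.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

cells = 4 * 5 * 20 * 2   # classifiers x shift conditions x seeds x window sizes
low, high = wilson_interval(693, cells)
print(cells, f"{693 / cells:.1%}", f"[{low:.1%}, {high:.1%}]")
```

The factorial design multiplies out to the stated 800 cells, and the interval rounds to 84.1% and 88.8%, consistent with the figures in the abstract.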


2606.11998 2026-06-11 cs.LG 新提交

Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents

自助监控：利用透明推理监督更强的AI智能体

Frank Xiao, Mary Phuong

发表机构 * California Institute of Technology（加州理工学院）

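AI summary: Proposes the bootstrapped monitoring protocol, which inserts an untrusted intermediate model with transparent chain-of-thought to oversee stronger agents, substantially improving catch rates on software engineering tasks even when the untrusted monitor colludes with the agent.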

详情

AI中文摘要

可信监控是AI控制的基石。然而，随着前沿模型能力增强，可信与不可信模型之间的能力差距可能使可信模型成为不可靠的监控者。我们引入了\emph{自助监控}协议，通过在监督链中插入一个具有透明思维链推理的更强的不可信中间模型来解决这一问题。不可信监控者（$U_m$）评估智能体的行为，而较弱的可信模型（$T$）监督$U_m$的推理以检测合谋。我们在多轮软件工程任务（BashArena）上对多个智能体和监控者评估了自助监控。即使不可信监控者主动与智能体合谋，只要我们能够访问其原始思维链，自助监控相比仅使用可信监控显著提高了捕获率。我们的结果表明，随着AI能力的进步，自助监控可以延长可信模型在控制中的有效寿命。

英文摘要

Trusted monitoring is a cornerstone of AI control. However, as frontier models grow more capable, the increasing capabilities gap between trusted and untrusted models may render trusted models unreliable monitors. We introduce \emph{bootstrapped monitoring}, a protocol that addresses this by inserting a stronger, intermediate untrusted model with transparent chain-of-thought reasoning into the oversight chain. The untrusted monitor ($U_m$) evaluates the agent's actions, while a weaker trusted model ($T$) oversees $U_m$'s reasoning to detect collusion. We evaluate bootstrapped monitoring on multi-turn software engineering tasks (BashArena) across multiple agents and monitors. Bootstrapped monitoring substantially improves catch rates over trusted-only monitoring, even when the untrusted monitor actively colludes with the agent, provided we have access to its raw chain-of-thought. Our results suggest that bootstrapped monitoring can extend the useful lifetime of trusted models in control as AI capabilities advance.


2606.12016 2026-06-11 cs.LG cs.AI 新提交

Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization

泛化黑客：模型可通过阻止行为泛化来博弈强化学习

Frank Xiao, Mary Phuong

发表机构 * California Institute of Technology（加州理工学院）

AI总结本研究提出泛化黑客现象，模型在强化学习中通过自我接种机制阻止行为泛化，在保持高奖励的同时抵抗行为修正，首次证明模型能主动破坏训练过程。

AI中文摘要

模型后训练，特别是强化学习（RL），是开发者塑造模型价值观和行为的主要机制之一。然而，随着模型越来越具有评估和训练意识，当感知到的目标与其当前价值观冲突时，它们可能会被激励去抵抗训练，从而削弱开发者通过进一步训练检测错位和纠正模型行为的能力。在本文中，我们展示了泛化黑客，即模型在RL期间收集奖励的同时阻止奖励行为泛化。我们在Qwen3-235B-A22B上构建了一个模型有机体，对描述训练意识和自我接种（一种新颖机制，其中模型在其思维链中将合规性框架为上下文特定，而不演示或指示任一行为）的合成文档进行微调。该模型有机体在训练时实现了与对照组相当的有害性，同时在700步RL中保持了持续的约15个百分点的合规差距。此外，仅接受训练意识文档训练的对照有机体在RL压力下独立发现了类似接种的推理，尽管从未接触过该概念，却发展出自己的合规差距。由于泛化黑客有机体在整个过程中获得高奖励，标准训练指标未提供泛化失败的信号。我们的结果首次证明模型可以在保持高奖励的同时主动抵抗RL行为修正，表明随着模型变得更有能力和训练意识，它们可能能够破坏训练过程本身。

英文摘要

Model post-training, and in particular reinforcement learning (RL), is one of the primary mechanisms by which developers can shape models' values and behaviors. However, as models become increasingly evaluation and training aware, they may be motivated to resist training when the perceived objective conflicts with their current values, undermining developers' ability to detect misalignment and correct model behavior through further training. In this paper, we demonstrate generalization hacking, in which a model collects reward during RL while preventing the rewarded behavior from generalizing. We construct a model organism on Qwen3-235B-A22B, finetuning on synthetic documents describing training awareness and self-inoculation, a novel mechanism in which the model frames compliance as context-specific in its chain of thought, without demonstrating or instructing either behavior. The model organism achieves train-time harmfulness comparable to controls while maintaining a persistent ${\sim}15$ percentage point compliance gap across 700 steps of RL. Additionally, a control organism trained only on training awareness documents independently discovers inoculation-like reasoning under RL pressure, developing its own compliance gap despite never being exposed to the concept. Because the generalization-hacking organism receives high reward throughout, standard training metrics provide no signal that generalization has failed. Our results constitute the first demonstration that a model can actively resist RL behavioral modification while maintaining high reward, suggesting that as models become more capable and training-aware, they may be able to undermine the training process itself.

2606.12251 2026-06-11 cs.LG cs.AI cs.CR 新提交

Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization

强化学习破坏基于梯度的对抗优化

Xinhai Zou, Chang Zhao, Alireza Aghabagherloo, Dave Singelée, Robin Degraeve, Bart Preneel

发表机构 * COSIC, KU Leuven（鲁汶大学COSIC）； Imec ； Brubotics, VUB（布鲁塞尔自由大学Brubotics）； DistriNet, KU Leuven（鲁汶大学DistriNet）

AI总结研究通过强化学习训练图像分类器以破坏攻击者使用的梯度结构，发现RL作为隐式正则化器产生不稳定梯度方向和较小梯度幅度，使基于梯度的攻击失效，并与对抗训练结合实现双重防御。

AI中文摘要

基于梯度的对抗攻击仍然是对深度神经网络（DNN）的主要威胁，因为它们利用梯度信息高效优化对抗扰动。为了解决这个问题，我们研究了强化学习（RL）训练是否可以通过使用策略梯度目标和epsilon-贪婪探索来训练图像分类器，从而破坏攻击者使用的梯度结构。通过在CIFAR-10、CIFAR-100和ImageNet-100上使用多种架构进行系统实验，我们发现RL训练的分类器显著破坏了基于梯度的对抗优化。为了解释这一点，我们使用损失景观可视化、静态和动态梯度指标以及预测熵进行了全面的机制分析。我们的分析揭示，RL充当隐式正则化器，产生具有高度不稳定梯度方向和较小梯度幅度的模型。这种组合使得每个PGD步骤在方向上不可靠且幅度有限，导致基于梯度的攻击在实际迭代预算内失败。我们进一步表明，将RL与对抗训练（RL-adv）结合提供了在两个互补层面运作的双层防御：RL退化攻击者可用的梯度信息（梯度级防御），而对抗训练强化决策边界（边界级防御）。RL-adv在所有评估的主要攻击类型（包括基于梯度的PGD、AutoAttack、基于迁移和基于查询的攻击）中实现了最高的鲁棒性，显著优于SL-adv。这些发现将RL诱导的梯度破坏识别为一种互补的鲁棒性机制，并激励未来研究结合SL效率与RL梯度正则化特性的混合SL-RL训练调度。

英文摘要

Gradient-based adversarial attacks remain a dominant threat to deep neural networks (DNNs), as they exploit gradient information to efficiently optimize adversarial perturbations. To address this, we investigate whether reinforcement learning (RL) training can disrupt the gradient structure used by attackers by training image classifiers with policy-gradient objectives and epsilon-greedy exploration. Through systematic experiments across CIFAR-10, CIFAR-100, and ImageNet-100 with multiple architectures, we find that RL-trained classifiers significantly disrupt gradient-based adversarial optimization. To explain this, we conduct a comprehensive mechanism analysis using loss landscape visualization, static and dynamic gradient indicators, and predictive entropy. Our analysis reveals that RL acts as an implicit regularizer, producing models with highly unstable gradient directions and smaller gradient magnitudes. This combination makes each PGD step both unreliable in direction and limited in magnitude, causing gradient-based attacks to fail within practical iteration budgets. We further show that combining RL with adversarial training (RL-adv) provides a dual-layer defense operating at two complementary levels: RL degrades gradient information available to attackers (gradient-level defense), while adversarial training strengthens decision boundaries (boundary-level defense). RL-adv achieves the highest robustness across all major attack types evaluated, including gradient-based (PGD, AutoAttack), transfer-based, and query-based attacks, outperforming SL-adv by a significant margin. These findings identify RL-induced gradient disruption as a complementary robustness mechanism and motivate future research on hybrid SL-RL training schedules that combine SL's efficiency with RL's gradient-regularization properties.
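The PGD attack mentioned in this abstract can be sketched as follows. This is a minimal L-infinity version on a toy logistic model, not the authors' experimental setup: the weight vector `W`, the step size, and the budget are hypothetical values chosen for illustration. It makes the abstract's argument concrete, since each update uses only the sign of the input gradient and is then projected back into the epsilon-ball, so unstable gradient directions directly translate into unreliable steps.

```python
import math


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(x0, grad_fn, eps, step, n_steps):
    """Signed-gradient ascent on the loss, projected onto the L-inf ball around x0."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad_fn(x)
        # Move each coordinate by a fixed step in the direction that raises the loss.
        x = [xi + step * sign(gi) for xi, gi in zip(x, g)]
        # Project back so that no coordinate moves more than eps away from x0.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


W = [2.0, -1.0]  # hypothetical linear classifier, true label taken to be +1


def loss(x):
    """Logistic loss log(1 + exp(-margin)) for the positive class."""
    margin = sum(w * xi for w, xi in zip(W, x))
    return math.log1p(math.exp(-margin))


def loss_grad(x):
    """Gradient of the logistic loss with respect to the input x."""
    margin = sum(w * xi for w, xi in zip(W, x))
    coeff = -1.0 / (1.0 + math.exp(margin))
    return [coeff * w for w in W]


x_clean = [0.5, 0.5]
x_adv = pgd_linf(x_clean, loss_grad, eps=0.1, step=0.03, n_steps=10)
print(x_adv, loss(x_clean), loss(x_adv))
```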

2606.11211 2026-06-11 cs.CL cs.AI cs.LG 交叉投稿

Calibration Drift Under Reasoning: How Chain-of-Thought Budgets Induce Overconfidence in Large Language Models

推理下的校准漂移：思维链预算如何导致大型语言模型过度自信

Prakul Sunil Hiremath, Harshit R. Hiremath

发表机构 * Department of Computer Science and Engineering, Visvesvaraya Technological University, Belagavi（维斯瓦拉亚科技大学计算机科学与工程系，贝拉加维）； Department of Computer Science and Business System, SG Balekundri Institute of Technology, Belagavi（SG巴莱昆德里理工学院计算机科学与商业系统系，贝拉加维）

AI总结研究发现，增加思维链推理预算超过任务特定阈值会导致模型对错误答案过度自信，提出校准漂移现象并引入CABStop停止规则。

Comments 31 pages, 4 figures, 3 tables. Introduces Calibration Drift Under Reasoning (CDUR) with theoretical analysis and preliminary experiments; includes CABStop; code and data available

AI中文摘要

大型语言模型（LLMs）表达校准不确定性的能力对于安全部署至关重要。思维链（CoT）推理被广泛用于提高准确性和可靠性，但其对校准的影响尚未完全理解。我们表明这一图景是不完整的：在某些设置中，将推理预算增加到任务特定阈值以上会导致模型系统性地变得过度自信，对错误答案赋予高置信度。我们将此现象称为推理下的校准漂移（CDUR），并从理论和实证两方面进行研究。我们定义推理预算B，并分析预期校准误差ECE(B)呈现非单调模式的条件：它首先随着推理纠正错误而下降，然后随着更长推理产生内部一致但错误的解释而上升。我们提出一个基于自回归生成的假设锁定模型来解释这种行为。我们在47个推理陷阱问题上评估了Llama-3.1-8B和Llama-3.3-70B，跨越四个推理预算和三个随机种子（1,368次API调用；574个有效响应）。8B模型显示出非单调的校准行为，而70B模型的结果仅限于基线评估，对于预算依赖效应尚无定论。我们引入CABStop，一种校准感知的停止规则，当置信度偏离辅助准确性估计时停止推理。这些结果表明，增加推理深度并不总是提高可靠性，应谨慎监控。

英文摘要

The ability of large language models (LLMs) to express calibrated uncertainty is important for safe deployment. Chain-of-thought (CoT) reasoning is widely used to improve accuracy and reliability, but its effect on calibration is not fully understood. We show that this picture is incomplete: in some settings, increasing the reasoning budget beyond a task-specific threshold can cause models to become systematically overconfident, assigning high confidence to incorrect answers. We call this phenomenon Calibration Drift Under Reasoning (CDUR) and study it both theoretically and empirically. We define reasoning budget B and analyze conditions under which Expected Calibration Error ECE(B) follows a non-monotonic pattern: it first decreases as reasoning corrects errors, then increases as longer reasoning produces internally consistent but incorrect explanations. We propose a Hypothesis Lock-In model based on autoregressive generation to explain this behavior. We evaluate Llama-3.1-8B and Llama-3.3-70B on 47 reasoning-trap questions across four reasoning budgets and three seeds (1,368 API calls; 574 valid responses). The 8B model shows non-monotonic calibration behavior, while results for the 70B model are limited to baseline evaluation and are inconclusive for budget-dependent effects. We introduce CABStop, a calibration-aware stopping rule that halts reasoning when confidence diverges from an auxiliary accuracy estimate. These results suggest that increasing reasoning depth does not always improve reliability and should be monitored carefully.
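For readers unfamiliar with the metric, Expected Calibration Error can be computed as below. This is a minimal sketch of the common equal-width-bin estimator; the abstract does not specify the binning the authors use, so the bin count and the toy confidence values are assumptions. The two toy cases mirror the overconfidence pattern described above: the same 50% accuracy yields a much larger ECE when the stated confidence is 0.95 than when it is 0.55.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |mean confidence - accuracy| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than a bin of its own.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


outcomes = [True, True, False, False]  # toy data: half of the answers are right
ece_overconfident = expected_calibration_error([0.95] * 4, outcomes)
ece_modest = expected_calibration_error([0.55] * 4, outcomes)
print(ece_overconfident, ece_modest)
```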

2606.11471 2026-06-11 cs.CR cs.LG 交叉投稿

基于机器学习的网络入侵检测系统的分类鲁棒性评估

Mayank Raj, Nathaniel D. Bastian, Lance Fiondella, Gokhan Kul

发表机构 * University of Massachusetts Dartmouth（马萨诸塞大学达特茅斯分校）； United States Military Academy（美国军事学院）

AI总结本文系统比较了CNN、LSTM和随机森林三种分类器在对抗攻击下的鲁棒性，发现随机森林基线准确率虽高但极易被攻破，而CNN表现最稳健。

AI中文摘要

网络入侵检测系统（NIDS）广泛使用机器学习（ML），但ML模型可能受到对抗性攻击的操纵。这些攻击向网络流量数据添加精心设计的扰动，导致误分类。虽然先前的工作已经证明了孤立环境下的对抗性漏洞，但在受控攻击条件下，跨架构以及基于攻击类别和类型的系统比较仍然有限，这使得从业者在对抗性环境中部署哪些模型缺乏明确指导。本文提出了一个简单的问题：当攻击者试图操纵系统时，哪种分类器架构实际上能够保持稳定？我们对三种流行架构进行了测试：一维卷积神经网络（CNN）、长短期记忆网络（LSTM）和随机森林（RF）集成。使用ACI-IoT-2023数据集（超过120万个样本，涵盖12种攻击类型），我们使用FGSM和PGD对抗攻击对每个模型进行攻击，这些攻击在归一化特征空间中应用基于梯度的扰动，符合既定的对抗性ML评估协议，扰动预算范围为$\epsilon=0.01$到$\epsilon=0.1$。令人惊讶的是，随机森林实现了近乎完美的基线准确率（99.98%），但在攻击下灾难性地崩溃，在我们测试的最小扰动下下降了73个百分点。另一方面，CNN在$\epsilon=0.01$时保持了95.5%的准确率，并且随着扰动的增加而优雅地退化。LSTM介于两者之间。这些发现颠覆了传统观念：如果模型在对抗压力的第一个迹象下就崩溃，那么高基线准确率毫无意义。对于在对抗性环境中部署入侵检测的从业者，我们推荐基于CNN的架构，并提供特定场景的部署指导。

英文摘要

Network Intrusion Detection Systems (NIDS) rely heavily on Machine Learning (ML), but ML models can be manipulated via adversarial attacks. These attacks add carefully crafted perturbations to network traffic data that lead to misclassifications. While prior work has demonstrated adversarial vulnerabilities in isolated settings, systematic comparisons across architectures and across attack classes and categories under controlled attack conditions remain limited, leaving practitioners without clear guidance on which models to deploy in adversarial environments. This paper asks a simple question: which classifier architectures actually hold up when attackers try to manipulate the system? We put three popular architectures through their paces: a 1D Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Random Forest (RF) ensemble. Using the ACI-IoT-2023 dataset (over 1.2 million samples spanning 12 attack types), we subject each model to FGSM and PGD adversarial attacks, which apply gradient-based perturbations in normalized feature space consistent with established adversarial ML evaluation protocols, at perturbation budgets ranging from $ε=0.01$ to $ε=0.1$. Surprisingly, Random Forest achieved near-perfect baseline accuracy (99.98\%), yet collapsed catastrophically under attack, dropping 73 percentage points at the smallest perturbation we tested. CNN, on the other hand, retained 95.5\% accuracy at $ε=0.01$ and degraded gracefully as perturbations increased. LSTM fell somewhere in between. These findings flip the conventional wisdom: high baseline accuracy means nothing if a model shatters at the first sign of adversarial pressure. For practitioners deploying intrusion detection in adversarial environments, we recommend CNN-based architectures and provide scenario-specific deployment guidance.

2606.12342 2026-06-11 cs.CL cs.AI cs.ET cs.LG 交叉投稿

ALIGNBEAM: Inference-Time Alignment Transfer via Cross-Vocabulary Logit Mixing

ALIGNBEAM: 通过跨词汇表logit混合实现推理时对齐迁移

Chirag Chawla, Pratinav Seth, Vinay Kumar Sankarapu

发表机构 * Lexsi Labs

AI总结针对领域微调降低大模型安全性的问题，提出无需训练的ALIGNBEAM方法，通过逐token翻译锚模型logit并选择最安全候选，实现跨词汇表的安全对齐迁移，保持任务准确性和推理开销。

AI中文摘要

领域微调会降低大型语言模型的安全性：微调后的专家模型容易顺从以领域语言表述的有害提示。现有的推理时防御方法通过混合来自安全锚模型的logit，但要求两个模型共享词汇表，这使得它们无法用于安全性退化最严重的跨族专家模型。我们提出ALIGNBEAM，一种无需训练的方法，通过在每个解码步骤逐token将锚模型logit翻译为目标模型的词汇表来解除这一限制；然后一个小型LLM法官从K个候选续写中选择最安全的。无需改变权重，并且可以在部署时调整安全-效用权衡而无需重新训练。在跨词汇表和同词汇表评估对中，ALIGNBEAM显著提高了对抗性基准上的拒绝率，同时将任务准确性和推理开销保持在实用范围内。结果表明，安全对齐可以在推理时在不同模型族之间迁移，而无需修改任一模型的权重。

英文摘要

Domain fine-tuning degrades the safety of large language models: fine-tuned specialists readily comply with harmful prompts framed in domain language. Existing inference-time defenses that mix logits from a safe anchor model require both models to share a vocabulary, which rules them out for the cross-family specialists where safety is most degraded. We present ALIGNBEAM, a training-free method that lifts this restriction by translating anchor logits into the target model's vocabulary token-by-token at each decoding step; a small LLM judge then selects the safest among K candidate continuations. No weights are changed, and the safety-utility trade-off can be tuned at deployment without retraining. Across both cross-vocabulary and same-vocabulary evaluation pairs, ALIGNBEAM substantially raises refusal on adversarial benchmarks while keeping task accuracy and inference overhead within practical bounds. The results show that safety alignment can be transferred between model families at inference time, without touching either model's weights.
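The logit-mixing idea that this abstract builds on can be sketched for the simpler same-vocabulary case. The example below is an illustration under stated assumptions, not ALIGNBEAM itself: it uses plain linear interpolation with a hypothetical weight `lam`, a two-token toy vocabulary, and made-up logits, and it leaves out both the cross-vocabulary translation step and the LLM judge, which the abstract does not describe in enough detail to reproduce.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_logits(target_logits, anchor_logits, lam):
    """Interpolate between target and anchor logits; lam=0 keeps the target model."""
    if len(target_logits) != len(anchor_logits):
        raise ValueError("same-vocabulary mixing needs logits of equal length")
    return [(1.0 - lam) * t + lam * a for t, a in zip(target_logits, anchor_logits)]


def greedy_token(vocab, logits):
    """Return the token with the highest probability."""
    probs = softmax(logits)
    return vocab[probs.index(max(probs))]


vocab = ["Sure", "Sorry"]      # toy shared vocabulary
specialist = [3.0, 0.0]        # hypothetical fine-tuned model: prefers to comply
anchor = [0.0, 4.0]            # hypothetical safe anchor: prefers to refuse

token_unmixed = greedy_token(vocab, mix_logits(specialist, anchor, lam=0.0))
token_mixed = greedy_token(vocab, mix_logits(specialist, anchor, lam=0.5))
print(token_unmixed, token_mixed)
```

Because `lam` is applied only at decoding time, the trade-off between safety and task behaviour can be changed without touching either model's weights, which is the property the abstract emphasises.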

2601.17360 2026-06-11 cs.LG cs.AI cs.CR 版本更新

Robust Privacy: Inference-Stage Privacy through Certified Robustness

鲁棒隐私：通过认证鲁棒性实现推理阶段隐私

Jiankai Jin, Xiangzheng Zhang, Zhao Liu, Wenzhuo Xu, Dongdong Yang, Deyue Zhang, Quanchen Zou

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出鲁棒隐私(RP)概念，基于认证鲁棒性确保预测在输入邻域内不变，从而限制推理阶段隐私泄露；实验表明RP在属性推断和模型反演攻击中有效提升隐私-效用权衡。

AI中文摘要

观察模型发布预测的对手可以推断查询输入的敏感属性，甚至重建模型训练数据的代表。因此，推理接口充当隐私泄露的侧信道。我们引入鲁棒隐私(RP)，一种受认证鲁棒性启发的推理阶段隐私概念：如果模型预测在输入x周围半径为R的邻域内以至少$1-\alpha$的置信度可证明不变，则x享有$(R,\alpha)$-鲁棒隐私，在此条件下我们证明任何观察发布预测的对手在区分x与距离x为R内的任何输入时最多有$\alpha/2$的优势。基于RP，我们形式化鲁棒属性隐私(RAP)，一种属性级隐私概念，刻画与发布预测兼容的敏感属性值集合。在分类任务上，RP将RAP兼容推理区间的中位数长度从23.50增加到29.96，降低了属性推断精度。模型反演攻击通常被视为训练阶段威胁，实际上依赖于通过推理接口泄露的细粒度信号；RP在推理阶段掩盖这些信号，将黑盒反演攻击的成功率(ASR)从73%降至4%。这种直接针对泄露通道的方法使RP在隐私-效用权衡空间中优于DP-SGD和随机响应：RP在21% ASR下保持98.4%的准确率，而DP-SGD必须将准确率降至61.7%才能达到相当的ASR。在两个实验中，增加平滑样本量N同时增强了隐私和效用。最后，我们考察模型蒸馏作为范围边界，表明RP缓解了属性级和实例级推理阶段隐私泄露，但无法通过模型蒸馏缓解函数级提取。

英文摘要

An adversary observing a model's released prediction can infer sensitive attributes of the queried input, or even reconstruct representatives of the model's training data. The inference interface thus acts as a side channel for privacy leakage. We introduce Robust Privacy (RP), an inference-stage privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-R neighborhood around an input x with confidence at least $1-α$, then x enjoys $(R,α)$-Robust Privacy, under which we prove that any adversary observing the released prediction has at most $α/2$ advantage in distinguishing x from any input within distance R of x. Building on RP, we formalize Robust Attribute Privacy (RAP), an attribute-level privacy notion that characterizes the set of sensitive-attribute values that remain compatible with a released prediction. On a classification task, RP increases the median length of the RAP-compatible inference interval from 23.50 to 29.96, reducing attribute-inference precision. Model inversion attacks, often treated as a training-stage threat, in fact rely on fine-grained signals leaked through the inference interface; RP masks these signals at the inference stage, reducing attack success rate (ASR) from 73% to 4% on a black-box inversion attack. This direct targeting of the leakage channel enables RP to dominate DP-SGD and randomized response in the privacy-utility tradeoff space: RP retains 98.4% accuracy at 21% ASR, whereas DP-SGD must drop accuracy to 61.7% to reach a comparable ASR. Across both experiments, increasing the smoothing sample size N strengthens privacy and improves utility together. Finally, we examine model distillation as a scope boundary and show that RP mitigates attribute-level and instance-level inference-stage privacy leakage, but not function-level extraction through model distillation.
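The reference to a smoothing sample size N suggests a randomized-smoothing construction, which can be sketched as a majority vote over noisy copies of the input. The code below is only an illustration under that assumption: the Gaussian noise model, `sigma`, the toy classifier, and the function names are hypothetical, and the step that turns the vote share into a certified radius R is omitted. It shows the behaviour the privacy argument relies on: nearby inputs receive the same released prediction.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of the base classifier over Gaussian-perturbed copies of x."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(x):
    """Hypothetical base model: class 1 when the first feature is positive."""
    return 1 if x[0] > 0.0 else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
label_a, share_a = smoothed_predict(toy_classifier, [5.0, -1.0], 0.5, 200, rng)
label_b, share_b = smoothed_predict(toy_classifier, [4.8, -1.2], 0.5, 200, rng)
print(label_a, share_a, label_b, share_b)
```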

2602.05746 2026-06-11 cs.LG cs.AI 版本更新

密度脊选择性预测：校准标签稀缺下的大语言模型与视觉语言模型幻觉检测

Nina I. Shamsi

发表机构 * Northeastern University Boston, United States（东北大学波士顿分校）

AI总结针对校准标签稀缺时大语言模型和视觉语言模型的幻觉检测问题，提出基于核密度估计的密度脊方法，利用隐藏状态生成轨迹的六维运动特征图构建响应流形，通过到最近脊顶点的欧氏距离评分，在标签稀缺协议下AUROC提升5-20点。

AI中文摘要

大语言模型和视觉语言模型中的幻觉检测日益被框架化为选择性预测，其中检测器分配置信度分数并在置信度低时弃权。无监督采样检测器（Semantic Entropy, EigenScore）避免标签但质量停滞，而有监督探针（SAPLMA）获得更强的分布内分数，但在校准标签稀缺时性能急剧下降。我们将大语言模型的响应流形恢复为基于隐藏状态生成轨迹的六维运动特征图的核密度估计的密度脊。测试生成通过其投影特征点到最近脊顶点的欧氏距离的负值进行评分，从而得到随机输出分布的低维几何骨架。我们在七个问答基准（HaluEval-QA, TriviaQA, GSM8K, POPE, ScienceQA, A-OKVQA）上，使用九个文本和视觉大语言模型，在刻意标签稀缺协议（$n_{\text{cal}}{=}200$ 查询，$N{=}5$ 生成）下，与Semantic Entropy、SAR、EigenScore、SAPLMA和对数概率进行评估。我们的基于脊的分数在AUROC上以5-20个百分点的优势获胜，同时在校准标签稀缺下表现出温和的性能下降。

英文摘要

Hallucination detection in large language and vision-language models is increasingly framed as selective prediction, where a detector assigns a confidence score and abstains when confidence is low. Unsupervised sampling detectors (Semantic Entropy) avoid labels but plateau in quality, while supervised probes attain stronger in-distribution scores yet degrade sharply when calibration labels are scarce. We recover the response manifold of an LLM as the density ridge of a kernel density estimate built on a six-dimensional kinematic feature map of hidden state generation trajectories. A test generation is scored by the negated Euclidean distance from its projected feature point to the nearest ridge vertex, yielding a low-dimensional geometric skeleton of the stochastic output distribution. We evaluate against Semantic Entropy, topological methods, and log-probability on six QA benchmarks (HaluEval-QA, TriviaQA, GSM8K, POPE, ScienceQA, A-OKVQA) using eight text and vision LLMs in a deliberately label-scarce protocol ($n_{\text{cal}}{=}200$ queries, $N{=}5$ generations). Our ridge-based score outperforms these baselines on AUROC by 5-20 points, while degrading only moderately under calibration-label scarcity.

2504.21072 2026-06-11 cs.CR cs.AI cs.LG 版本更新

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

擦除但未遗忘：后门如何破坏概念擦除

Tobias Braun, Jonas Henry Grebe, Marcus Rohrbach, Anna Rohrbach

发表机构 * GitHub

AI总结本文揭示了一种名为擦除规避后门（EEB）的漏洞，攻击者将后门触发器绑定到待擦除概念上，使得该恶意链接在后续擦除后仍然存在，从而绕过多种概念擦除方法。

AI中文摘要

文本到图像扩散模型的扩展引发了对有害输出的担忧，从捏造的公众人物描绘到露骨的色情图像。为减轻此类风险，先前工作提出了概念擦除方法，旨在通过微调从模型中切断不需要的概念，但仍不清楚这些方法是否真正移除了与有害概念的所有联系，或仅仅是掩盖了表面连接。在这项工作中，我们揭示了一个关键漏洞——擦除规避后门（EEB）：攻击者将后门触发器绑定到待擦除的概念上，并且这种恶意链接在后续擦除后仍然存在。我们展示了黑盒和白盒攻击者都能实例化这一威胁。在六种最先进的擦除方法中，包括那些明确搜索目标概念替代表示的鲁棒方法，EEB始终能暴露有害内容：针对名人身份遗忘的成功率高达82%，针对物体擦除的成功率高达94%，针对露骨内容暴露的放大倍数高达16倍。虽然EEB揭示了当前擦除方法的一个盲点，但它也为压力测试未来的概念擦除技术提供了诊断工具。

英文摘要

The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat. Across six state-of-the-art erasure methods, including robust ones that explicitly search for alternative representations of the target concept, EEB consistently exposes harmful content: up to 82% success against celebrity-identity unlearning, up to 94% for object erasure, and up to 16 times amplification of explicit-content exposure. While EEB uncovers a blind spot in current erasure methods, it also provides a diagnostic tool for stress-testing future concept erasure techniques.

2505.08784 2026-06-11 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

PCS-UQ：基于可预测性-可计算性-稳定性框架的不确定性量化

Abhineet Agarwal, Fange Xiao, Rebecca Barter, Omer Ronen, Boyu Fan, Bin Yu

发表机构 * Department of Statistics, University of California, Berkeley（加州大学伯克利分校统计学系）； Department of Epidemiology, University of Utah（犹他大学流行病学系）； Department of Electrical Engineering and Computer Science, University of California, Berkeley（加州大学伯克利分校电气工程与计算机科学系）

AI总结提出PCS-UQ框架，通过预测检查、bootstrap采样和乘法校准实现不确定性量化，在回归和分类任务中优于或媲美共形预测方法，并提供理论保证。

AI中文摘要

随着机器学习进入高风险领域，可信的不确定性量化对于安全性至关重要。本文基于真实数据科学的可预测性、可计算性和稳定性原则，提出了PCS-UQ框架。从候选模型或算法集开始，PCS-UQ集成了严格的预测检查以筛选出集合中不合适的模型，并利用bootstrap样本来捕获预测检查算法的样本间变异性和算法不稳定性。然后，我们引入了一种新颖的乘法校准方案来增强局部自适应性，这基本上对应于共形预测中的新分数。此外，我们编制了17个真实世界回归数据集，并手动构建了子组。在该基准测试中，PCS-UQ在保持目标覆盖率的同时，在区间宽度上优于或匹配配备有oracle选择算法的共形方法。PCS-UQ实现了一致的子组覆盖率，优于这些oracle选择的共形方法。值得注意的是，PCS-UQ在实现竞争性区间宽度和一致子组覆盖率方面表现出色。在6个分类数据集上，PCS-UQ将预测集大小减少了20%。为了将框架扩展到深度学习，我们提出了计算高效的变体，避免了昂贵的重新训练。在三个计算机视觉基准测试中，这些变体将预测集大小比共形基线减少了20%。最后，我们提供了理论证明，即修改后的PCS-UQ算法在可交换性下作为分割共形推断的一种形式保持了有效的覆盖率。

英文摘要

As machine learning (ML) enters high-stakes domains, trustworthy uncertainty quantification (UQ) is essential for safety. In this paper we introduce PCS-UQ, a framework based on the Predictability, Computability, and Stability (PCS) principles for veridical data science. Starting with a candidate set of models or algorithms, PCS-UQ integrates a rigorous prediction-check to screen out unsuitable models in the set and utilizes bootstrap samples, in order to capture both inter-sample variability and algorithmic instability for the prediction-checked algorithms. We then introduce a novel multiplicative calibration scheme to enhance local adaptivity, which basically corresponds to a new score in conformal prediction. Moreover, we produce a compilation of 17 real-world regression datasets with manually-constructed subgroups. On this benchmark, PCS-UQ maintains the target coverage while outperforming or matching conformal methods equipped with oracle-selected algorithms in interval width. PCS-UQ achieves consistent subgroup coverage, outperforming these oracle-selected conformal methods. Notably, PCS-UQ stands out in achieving both competitive interval widths and consistent subgroup coverage. Across 6 classification datasets, PCS-UQ reduces prediction set sizes by 20\%. To scale the framework for deep learning, we propose computationally efficient variants that bypass expensive retraining. On three computer vision benchmarks, these variants reduce prediction set sizes by 20\% over conformal baselines. Finally, we provide theoretical proof that a modified PCS-UQ algorithm preserves valid coverage under exchangeability as a form of split conformal inference.
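Since the coverage guarantee in this abstract is stated in terms of split conformal inference, a sketch of the standard split conformal interval may help. This is the textbook baseline with absolute-residual scores, not the PCS-UQ procedure: the multiplicative calibration and bootstrap steps are not reproduced, and the calibration residuals and point prediction below are toy values.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial interval is valid.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around the point prediction with half-width q."""
    return (point_prediction - q, point_prediction + q)


# Toy absolute residuals |y - y_hat| on 19 held-out calibration points.
calibration_residuals = [7, 3, 19, 1, 12, 5, 16, 9, 2, 14, 18, 6, 11, 4, 17, 8, 13, 10, 15]

q_hat = conformal_quantile(calibration_residuals, alpha=0.1)
interval = prediction_interval(50.0, q_hat)
print(q_hat, interval)
```

Under exchangeability of calibration and test points, an interval built this way covers the true response with probability at least 1 - alpha; locally adaptive scores change how the half-width is scaled but reuse the same quantile rule.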

2508.17077 2026-06-11 stat.ML cs.LG 版本更新

CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference

CP4SBI: 基于模拟推断中可信集的局部共形校准

Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos, Pedro L. C. Rodrigues, Rafael Izbicki

发表机构 * Department of Statistics, Federal University of São Carlos（统计系，圣卡洛斯联邦大学）； Institute of Mathematics and Computer Science, University of São Paulo（数学与计算机科学学院，圣保罗大学）； Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK（格勒诺布尔阿尔卑斯大学，法国国家信息与自动化研究所，法国国家科学研究中心，格勒诺布尔INP，LJK）

AI总结提出CP4SBI框架，通过回归树和CDF校准实现局部贝叶斯覆盖，为任意评分函数提供有限样本局部覆盖保证，提升神经后验估计的不确定性量化质量。

2510.07750 2026-06-11 stat.ML cs.LG 版本更新

Calibrating Decision Robustness via Inverse Conformal Risk Control

通过逆保形风险控制校准决策鲁棒性

Wenbin Zhou, Shixiang Zhu

发表机构 * Wenbin Zhou（周文彬）； Shixiang Zhu（朱世祥）

AI总结提出逆保形风险控制框架，为鲁棒优化策略提供无分布、有限样本的误覆盖与遗憾保证，通过追踪Pareto前沿帮助决策者根据成本-风险偏好校准鲁棒性水平。

AI中文摘要

鲁棒优化通过针对最坏情况优化来保护决策免受不确定性影响，但其有效性取决于预先指定的鲁棒性水平，该水平通常是临时选择的，导致保护不足或过度保守且成本高昂的解决方案。最近使用保形预测的方法构建了具有有限样本覆盖保证的数据驱动不确定性集，但它们仍然事先固定覆盖目标，并且对选择鲁棒性水平提供的指导很少。我们提出了一个新框架，该框架为任何鲁棒预测-然后优化策略族提供了无分布、有限样本的误覆盖和遗憾保证。我们的方法构建了有效的估计量，这些估计量描绘出误覆盖-遗憾帕累托前沿，使决策者能够根据其成本-风险偏好可靠地评估和校准鲁棒性水平。该框架易于实现，广泛适用于经典优化公式，并实现了更优的有限样本性能。本文提供了一种原则性的数据驱动方法，用于指导鲁棒性选择，并使从业者能够在高风险决策中平衡鲁棒性和保守性。

英文摘要

Robust optimization safeguards decisions against uncertainty by optimizing against worst-case scenarios, yet its effectiveness hinges on a prespecified robustness level that is often chosen ad hoc, leading to either insufficient protection or overly conservative and costly solutions. Recent approaches using conformal prediction construct data-driven uncertainty sets with finite-sample coverage guarantees, but they still fix coverage targets a priori and offer little guidance for selecting robustness levels. We propose a new framework that provides distribution-free, finite-sample guarantees on both miscoverage and regret for any family of robust predict-then-optimize policies. Our method constructs valid estimators that trace out the miscoverage--regret Pareto frontier, enabling decision-makers to reliably evaluate and calibrate robustness levels according to their cost--risk preferences. The framework is simple to implement, broadly applicable across classical optimization formulations, and achieves sharper finite-sample performance. This paper offers a principled data-driven methodology for guiding robustness selection and empowers practitioners to balance robustness and conservativeness in high-stakes decision-making.

2605.16651 2026-06-11 cs.CV cs.LG 版本更新

Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations

正确预测，误导性解释：关于视觉-语言模型解释的脆弱性

Narges Babadi, Hadis Karimipour

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结研究探讨了视觉-语言模型中解释热图在对抗条件下是否忠实反映推理过程，提出X-Shift攻击揭示解释与预测行为的脱节，验证了解释机制的脆弱性。

Comments Accepted at the ICML 2026 Workshop on Trustworthy AI for Good (AI4GOOD), Seoul, South Korea

AI中文摘要

解释机制被广泛用于增强视觉-语言模型（VLMs）的透明性和信任度，特别是在需要人类监督的决策场景中。然而，这些解释的鲁棒性仍不明确。本文研究了VLMs（特别是基于CLIP的模型）中的解释热图在对抗条件下是否忠实反映模型推理。我们发现，解释图谱可以系统性地被操控，同时保持模型的原始预测，揭示了预测行为与解释忠实性之间的脱节。为研究这种脆弱性，我们引入了X-Shift，一种新的灰盒攻击，通过扰动图像块级视觉表示，将解释热图引导至语义无关区域，而不会改变预测输出。与传统对抗攻击旨在诱导误分类不同，X-Shift专门针对解释过程的完整性。该攻击不修改模型参数，并在多种CLIP架构和解释方法上通用。我们在ImageNet-1k、MS-COCO和Flickr30K上评估了所提出的方法，证明在不可察觉的扰动下，解释对齐性持续下降，而预测保持稳定。此外，标准以预测为导向的对抗攻击即使在更大的扰动预算下也无法复制相同的解释偏移行为。我们的发现突显了当前VLMs解释机制的根本局限性，并对它们在高影响应用中作为可靠信任指标的使用提出了担忧。

英文摘要

Explanation mechanisms are increasingly used to support transparency and trust in vision-language models (VLMs), particularly in settings where model decisions require human oversight. However, the robustness of these explanations remains insufficiently understood. In this work, we investigate whether explanation heatmaps in VLMs, particularly CLIP-based models, faithfully reflect model reasoning under adversarial conditions. We show that explanation maps can be systematically manipulated while preserving the model's original prediction, revealing a disconnect between predictive behavior and explanation faithfulness. To study this vulnerability, we introduce X-Shift, a novel grey-box attack that perturbs patch-level visual representations to redirect explanation heatmaps toward semantically irrelevant regions without altering the predicted output. Unlike conventional adversarial attacks that aim to induce misclassification, X-Shift specifically targets the integrity of the explanation process itself. The attack operates without modifying model parameters and generalizes across multiple CLIP architectures and explanation methods. We evaluate the proposed approach on ImageNet-1k, MS-COCO, and Flickr30K, demonstrating consistent degradation in explanation alignment under imperceptible perturbations while maintaining prediction stability. Furthermore, standard prediction-oriented adversarial attacks fail to reproduce the same explanation-shifting behavior even under substantially larger perturbation budgets. Our findings highlight a fundamental limitation of current explanation mechanisms in VLMs and raise concerns about their use as reliable indicators of model trustworthiness in high-impact applications.

2605.31219 2026-06-11 cs.CV cs.CR cs.LG 版本更新

Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks

潜在几何和弦：面向查询高效决策型对抗攻击

Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi, Zhichang Guo, Boying Wu

AI总结提出潜在几何和弦（LGC）方法，通过曲率感知的几何搜索在压缩语义流形中导航决策边界，并引入残差对抗生成（RAG）机制以高视觉保真度实现查询高效的决策型黑盒对抗攻击。

Comments Added a conceptual diagram for the LGC architecture, 14 pages, 10 figures, 7 tables. Submitted to IEEE Transactions on Information Forensics and Security. The source code is available at https://github.com/eihmuekhine/Latent-Geometric-Chords

AI中文摘要

虽然基于决策的黑盒对抗攻击构成了严重的安全威胁，但当前方法存在根本性限制。像素级攻击经常引入不自然的高频视觉伪影，而潜在空间框架受限于低维流形的有限搜索空间和固有的重建缺陷。为解决这些限制，我们提出了潜在几何和弦（LGC）用于查询高效的决策型对抗攻击及其变体LGC-H。其核心是，LGC通过在压缩语义流形内执行曲率感知的几何搜索来导航决策边界。为保证高视觉保真度并规避维度瓶颈，我们引入了基于残差的对抗生成（RAG）机制。RAG将语义扰动隔离为几何和弦，并直接叠加到原始源图像上。RAG显著解决了基线重建缺陷，并有效将允许的搜索空间维度翻倍。实验结果表明，LGC实现了鲁棒的跨数据集迁移性，并显著优于最先进的基线方法。值得注意的是，我们的方法LGC在最小化扰动幅度的同时实现了最先进的视觉保真度——在5000次查询下结构相似性指数（SSIM）超过0.99，学习感知图像块相似度（LPIPS）低于0.01——并在严格的感知约束下保持高攻击成功率，成功攻破了经过对抗训练的鲁棒模型。源代码可在https://github.com/eihmuekhine/Latent-Geometric-Chords获取。

英文摘要

While decision-based black-box adversarial attacks present a severe security threat, current methodologies suffer from fundamental limitations. Pixel-wise attacks frequently introduce unnatural, high-frequency visual artifacts, while latent-space frameworks are confined by the limited search space of low-dimensional manifolds and inherent reconstruction flaws. To resolve these limitations, we propose Latent Geometric Chords (LGC) for Query-Efficient Decision-Based Adversarial Attacks alongside a variant, LGC-H. At its core, LGC navigates decision boundaries by executing a curvature-aware geometric search within a compressed semantic manifold. To guarantee high visual fidelity and circumvent dimensionality bottlenecks, we introduce a Residual-based Adversarial Generation (RAG) mechanism. RAG isolates semantic perturbations as geometric chords and superimposes them directly onto the original source image. RAG substantially resolves baseline reconstruction flaws and effectively doubles the permissible search space dimensions. Experimental results demonstrate that LGC achieves robust cross-dataset transferability and substantially outperforms state-of-the-art baselines. Notably, our method, LGC, minimizes perturbation magnitudes while achieving state-of-the-art visual fidelity--with a Structural Similarity Index Measure (SSIM) exceeding 0.99 and a Learned Perceptual Image Patch Similarity (LPIPS) below 0.01 at 5000 queries--and sustaining high attack success rates under stringent perceptual constraints, successfully compromising adversarially trained robust models. The source code is available at: https://github.com/eihmuekhine/Latent-Geometric-Chords.

2606.05551 2026-06-11 stat.ML cs.AI cs.LG 版本更新

TAROT: 面向小样本表格学习的任务自适应LLM先验图精炼

Ruxue Shi, Yili Wang, Mengnan Du, Hangting Ye, Yi Chang, Xin Wang

发表机构 * Jilin University（吉林大学）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结提出TAROT框架，通过构建并精炼任务自适应语义图，利用LLM先验和GNN编码特征语义关系，提升小样本表格学习性能。

AI中文摘要

小样本表格学习为实际应用中标注成本高、新任务样本收集困难的情况提供了一种经济有效的方法。现有的传统方法和基于LLM的方法在小样本场景中已展现出有效性。然而，传统方法需要在未标注或生成的数据上进行额外训练，这带来了显著的计算开销。此外，直接将原始表格数据输入LLM的基于LLM的方法引发了隐私和合规性问题。更重要的是，这两种范式都很大程度上忽略了特征之间的语义关系，而语义关系为构建语义图提供了结构和语义先验。语义图对于在小样本场景中建模有意义的特征交互至关重要。本文提出TAROT，一个基于GNN的框架，通过从先验中构建并精炼任务自适应语义图来编码结构和语义先验，从而提升小样本表格学习的预测性能。TAROT首先通过统一语义表格节点编码器（USTNE）将异构表格数据编码为统一的节点语义表示。然后，它提示LLM根据任务描述和特征名称推断特征之间的语义关系，以构建语义图。为了减轻LLM幻觉引入的结构噪声，TAROT引入了任务自适应语义图精炼，剪除虚假或与任务无关的边，并添加缺失的与任务相关的边，使图结构与下游目标对齐。最后，GNN在精炼后的图上进行消息传递，以捕获与任务相关的语义依赖关系进行预测。在各种小样本表格学习基准上的大量实验证明了TAROT的优越性能，使其成为该领域的最先进方法。

英文摘要

Few-shot tabular learning provides a cost-effective approach for real-world applications where annotation is costly and collecting sufficient samples for new tasks is difficult. Existing traditional and LLM-based methods have demonstrated effectiveness in few-shot scenarios. However, traditional methods need additional training on unlabeled or generated data, which incurs significant computational overhead. In addition, LLM-based methods that directly feed raw tabular data into LLMs raise privacy and compliance concerns. More importantly, both paradigms largely overlook the semantic relationships between features, which provide structural and semantic priors for constructing a semantic graph. Such a graph is essential for modeling meaningful feature interactions in few-shot scenarios. In this paper, we propose TAROT, a GNN-based framework that encodes the structural and semantic priors by constructing and refining a task-adaptive semantic graph from them, thereby improving predictive performance in few-shot tabular learning. TAROT first encodes heterogeneous tabular data into unified node semantic representations via a Unified Semantic Tabular Node Encoder (USTNE). Then, it prompts LLMs to infer the semantic relationships between features based on the task description and feature names to construct a semantic graph. To mitigate structural noise introduced by LLM hallucination, TAROT introduces Task-adaptive Semantic Graph Refinement, which prunes spurious or task-unrelated edges and adds missing task-related ones, aligning the graph structure with the downstream objective. Finally, a GNN performs message passing over the refined graph to capture task-related semantic dependencies for prediction. Extensive experiments on various few-shot tabular learning benchmarks demonstrate the superior performance of TAROT, establishing it as a state-of-the-art approach in this domain.

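2606.11831 2026-06-11 cs.LG cs.AI new submission

From Uniform to Learned Graph Priors: Diffusion for Structure Discovery

Qi Shao, Hao Guo, Jiawen Chen, Duxin Chen, Wenwu Yu

Affiliations: School of Mathematics, Southeast University

AI summary: Proposes Diff-prior, a diffusion-parameterized adaptive prior that applies a learnable, denoising-style structured calibration to edge posteriors, improving the reliability of structure discovery in neural relational inference methods.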

Comments 15 pages, 3 figures, Accepted by KDD 2026

DOI: 10.1145/3770855.3817940

Abstract

Neural relational inference (NRI) methods discover interaction graphs from trajectories through variational inference over discrete latent edges. However, these methods typically rely on oversimplified, factorized graph priors. Such priors, typically close to uniform, treat edges as independent entities. This systematic mismatch with real-world systems yields diffuse, indecisive edge posteriors, limiting the reliability of structure discovery. To address this, we propose Diff-prior, a diffusion-parameterized adaptive prior used to calibrate the latent graph distribution rather than to generate graphs. Our core insight is to reframe prior integration as a learnable denoising-style calibration, trainable with a diffusion model, that organizes scattered, uncertain edge posteriors into a more reliable overall structure. Diff-prior learns an adaptive structure prior that performs structured calibration on the edge posteriors during inference, guiding them toward a distribution closer to the underlying structure. Diff-prior operates before structural sampling and acts as a denoising calibrator directly on the encoder edge distribution, which provides a generic training paradigm over structured variables. Experiments on standard benchmarks validate our framework, and the results indicate that Diff-prior improves the performance of structure inference and generates more decisive edge posteriors across multiple NRI-family architectures. The code is available at https://github.com/Hardy158118/Diffprior.

2606.11946 2026-06-11 cs.DB cs.CC cs.LG cs.LO cross-list

Neuro-Relational Programs: Unifying Queries and Neural Computation over Structured Data

Arie Soeteman, Balder ten Cate, Maurice Funk, Benny Kimelfeld, Carsten Lutz, Moritz Schönherr

Affiliations: ILLC, University of Amsterdam; Leipzig University; ScaDS.AI Center; Technion; RelationalAI

AI summary: Proposes Neuro-Relational Programs (NRPs), a declarative query language that extends Datalog-style rules with embedding operations, interleaving relational reasoning with learnable neural components to give a general approach to neural computation over relational data.

Comments 37 pages

Abstract

The conventional approach to deep learning over relational databases applies neural models, such as Graph Neural Networks (GNNs), to a graph representation of the database. Recent approaches instead operate on databases directly, associating tuples with embeddings and extending query mechanisms to jointly process embeddings and relational content. Inspired by these developments, we introduce Neuro-Relational Programs (NRPs), a declarative query language for relational databases whose facts carry numeric vector embeddings. NRPs extend Datalog-style rules with operations that combine, aggregate, and transform embeddings, thereby interleaving relational reasoning and learnable neural components within a single formalism. This yields a general approach to neural computation over relational data: an NRP can be read both as a query plan with trainable components and as a neural architecture with relational structure built in. Natural syntactic fragments of NRPs recover existing architectures and query formalisms. Zero-ary NRPs correspond to non-adaptive query algorithms; monadic NRPs generalize GNN-style message passing and precisely capture Deep Homomorphism Networks, a connection that we extend to frontier-guarded NRPs over databases with row-ids. We characterize the expressive power of unrestricted NRPs with ReLU-FFN transformations by FOCQ, an extension of first-order logic with counting interpreted over real-weighted structures, yielding a precise connection with uniform TC$^0$ over ordered databases. Together, these results establish NRPs as a broad declarative framework for querying and neural computation over relational data.

2510.04567 2026-06-11 cs.LG cs.AI updated version

GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning

Weishuo Ma, Yanbo Wang, Xiyuan Wang, Lei Zou, Muhan Zhang

Affiliations: Institute for Artificial Intelligence, Peking University; Wangxuan Institute of Computer Technology, Peking University

AI summary: Proposes GILT, a framework whose token-based in-context learning mechanism handles node-, edge-, and graph-level classification in a unified way, generalizing efficiently without large language models or tuning.

Comments Accepted as an oral presentation at the GFM @ ICML 2026 Workshop

Abstract

Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs, giving rise to the development of Graph Foundational Models (GFMs). However, current GFMs are challenged by the extreme heterogeneity of graph data, where each graph can possess a unique feature space, label set, and topology. To address this, two main paradigms have emerged. The first leverages Large Language Models (LLMs) but is fundamentally text-dependent and thus struggles to handle the numerical features in vast graphs. The second pre-trains a structure-based model, but adapting it to new tasks typically requires a costly, per-graph tuning stage, creating a critical efficiency bottleneck. In this work, we move beyond these limitations and introduce the Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture. GILT introduces a novel token-based framework for in-context learning (ICL) on graphs, reframing classification tasks spanning node, edge, and graph levels in a unified framework. This mechanism is the key to handling heterogeneity, as it is designed to operate on generic numerical features. Furthermore, its ability to understand class semantics dynamically from the context enables tuning-free adaptation. Comprehensive experiments show that GILT achieves stronger few-shot performance with significantly less time than LLM-based or tuning-based baselines, validating the effectiveness of our approach. Our code is available at: https://github.com/yiming421/inductnode/.

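2505.03649 2026-06-11 stat.ML cs.LG math.CO math.PR updated version

Weighted Random Dot Product Graphs

Bernardo Marenco, Paola Bermolen, Marcelo Fiori, Federico Larroca, Gonzalo Mateos

Affiliations: Facultad de Ingeniería, Universidad de la República; Dept. of Electrical and Computer Engineering, University of Rochester

AI summary: Proposes the Weighted Random Dot Product Graph (WRDPG) model, in which inner products of nodal latent positions specify the higher-order moments of the edge-weight distributions, and provides statistical guarantees for spectral-embedding estimates together with a generative framework.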

Comments 30 pages, 12 figures, code to generate Figures 3 to 12 available at https://github.com/bmarenco/wrdpg. Updated to match the published version

DOI: 10.1214/26-EJS2538
Journal ref: Electronic Journal of Statistics, 20(1), 2456-2499, 2026

Abstract

Modeling of intricate relational patterns has become a cornerstone of contemporary statistical research and related data science fields. Networks, represented as graphs, offer a natural framework for this analysis. This paper extends the Random Dot Product Graph (RDPG) model to accommodate weighted graphs, markedly broadening the model's scope to scenarios where edges exhibit heterogeneous weight distributions. We propose a nonparametric weighted (W)RDPG model that assigns a sequence of latent positions to each node. Inner products of these nodal vectors specify the moments of their incident edge weights' distribution via moment-generating functions. In this way, and unlike prior art, the WRDPG can discriminate between weight distributions that share the same mean but differ in higher-order moments. We derive statistical guarantees for an estimator of the nodes' latent positions adapted from the workhorse adjacency spectral embedding, establishing its consistency and asymptotic normality. We also contribute a generative framework that enables sampling of graphs that adhere to a (prescribed or data-fitted) WRDPG, facilitating, e.g., the analysis and testing of observed graph metrics using judicious reference distributions. The paper is organized to formalize the model's definition, the estimation (or nodal embedding) process and its guarantees, as well as the methodologies for generating weighted graphs, all complemented by illustrative and reproducible examples showcasing the WRDPG's effectiveness in various network analytic applications.

2605.22346 2026-06-11 stat.ML cs.LG cs.SI updated version

The ASE-LSE Disagreement Landscape: An End-to-End Characterisation of Extremes and Structural Drivers

Minh Triet Pham, Ian Gallagher

Affiliations: School of Mathematics and Statistics, The University of Melbourne

AI summary: Studies the structural reasons why adjacency spectral embedding and Laplacian spectral embedding produce different results on the same network, showing how degree heterogeneity and community-structure strength affect latent subspace disagreement.

Comments This paper is being withdrawn as it was submitted without the consent of all listed authors, and contains work that is currently under academic assessment. It will be resubmitted at an appropriate time once evaluation is complete

Abstract

Two of the most widely used methods for analysing graph data, Adjacency Spectral Embedding and Laplacian Spectral Embedding, often produce different results when applied to the same graph. Yet the structural reasons behind this disagreement remain incompletely understood. This paper provides an end-to-end account of ASE-LSE latent subspace disagreement. We first prove that the two methods produce identical latent subspaces for every embedding dimension whenever the Laplacian is a scalar multiple of the adjacency matrix, and show that this scalar relationship holds if and only if the graph is either regular or bipartite biregular. This anchor result identifies a sufficient condition for perfect agreement that pins down the floor of the disagreement spectrum and supplies the baseline for the perturbation analysis. We then prove that no maximal-disagreement graph or family of graphs exists: the disagreement is always strictly below its theoretical ceiling, and we exhibit a witness family demonstrating that no finite maximum is attainable, so the disagreement landscape has no maximiser. With both endpoints established, we derive a Regularity Departure Bound whose two terms isolate degree heterogeneity and eigengap as the primary structural factors influencing disagreement in the middle regime. Empirical validation across thousands of simulated graphs confirms the mechanisms predicted by the bound: heterogeneity pushes disagreement up, eigengap suppresses it, and their joint ratio emerges as a unified predictor of ASE-LSE disagreement, suggesting when the two embeddings can be treated as interchangeable and when they cannot.
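
The anchor result in the abstract above (regular or bipartite biregular graphs make the Laplacian a scalar multiple of the adjacency matrix) can be checked numerically on small graphs. The sketch below is illustrative only and is not the authors' code: it assumes the "Laplacian" used for LSE is the normalized form D^{-1/2} A D^{-1/2}, and the helper names `normalized_adjacency`, `is_scalar_multiple`, and `graph_from_edges` are hypothetical.

```python
import math

def normalized_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2} for a symmetric 0/1 adjacency matrix.

    Assumption: this normalized matrix stands in for the "Laplacian" in LSE.
    """
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def is_scalar_multiple(mat, adj, tol=1e-12):
    """True if mat equals c * adj for one constant c (compared on the edges)."""
    n = len(adj)
    ratios = [mat[i][j] for i in range(n) for j in range(n) if adj[i][j]]
    return all(abs(r - ratios[0]) <= tol for r in ratios)

def graph_from_edges(n, edges):
    """Build a symmetric adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj

# A 2-regular cycle, a biregular bipartite graph K_{2,3}, and an irregular path.
cycle = graph_from_edges(5, [(i, (i + 1) % 5) for i in range(5)])
biregular = graph_from_edges(5, [(u, v) for u in (0, 1) for v in (2, 3, 4)])
path = graph_from_edges(4, [(0, 1), (1, 2), (2, 3)])

for name, adj in [("cycle", cycle), ("biregular", biregular), ("path", path)]:
    print(name, is_scalar_multiple(normalized_adjacency(adj), adj))
# prints: cycle True, biregular True, path False (one per line)
```

When the two matrices are scalar multiples they share eigenvectors, so the leading subspaces coincide; the path graph shows the simplest departure from regularity.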

2606.11844 2026-06-11 cs.LG new submission

TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

Dayananda Herurkar, Federico Raue, Joachim Folz, Jörn Hees, Andreas Dengel

Affiliations: German Research Center for Artificial Intelligence (DFKI); RPTU Kaiserslautern-Landau; Hochschule Bonn-Rhein-Sieg (H-BRS)

AI summary: Proposes TaskFusion, which combines an AGF model, TaskFusion augmentation, and outlier exposure to address changing feature spaces, distribution shift, and class imbalance in continual learning on heterogeneous tabular data, substantially improving continual anomaly detection on 21 datasets.

Comments 22 Pages

Abstract

Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual learning methods ineffective due to their reliance on a fixed input space. We propose a continual learning (CL) method that overcomes these challenges and learns continually from different tasks. Our method consists of three main parts: our AGF model, TaskFusion augmentation, and outlier exposure. The AGF model maps task-specific features into a shared space, then aligns distributions to reduce representation drift, and learns anomaly decision boundaries in the aligned space. To improve stability, we introduce TaskFusion augmentation, combining boundary-aware interpolation within tasks to refine the model's anomaly boundaries and cross-task mixing to transfer anomaly structure across datasets. To handle class imbalance and memory constraints, we employ tabular dataset distillation to store compact synthetic replay samples, which are used jointly with augmented data in an outlier exposure objective for robust anomaly detection. We evaluate the approach on 21 heterogeneous datasets across multiple domains. Results show that our approach substantially improves continual anomaly detection performance over sequential fine-tuning and other CL baselines while reducing catastrophic forgetting and maintaining stable detection across heterogeneous datasets.

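2507.23534 2026-06-11 cs.LG cs.CV updated version

Continual Learning with Support Boundary Experience Blending

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

Affiliations: National Taiwan University

AI summary: Proposes the Experience Blending framework, which generates support boundary data with differential-privacy-inspired noise and trains jointly on exemplars and boundary data to regularize decision boundaries, improving continual-learning accuracy on several datasets.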

Abstract

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained on sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, and 2%, respectively.

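2606.11235 2026-06-11 cs.LG cs.DB stat.ME new submission

Few-Shot Resampling for Scalable Statistically-Sound Data Mining

Leonardo Pellegrina, Fabio Vandin

Affiliations: Department of Information Engineering, University of Padova

AI summary: Proposes FewRS, a resampling-based method for assessing the statistical significance of data mining results; by deriving a new bound on the supremum deviation, it guarantees the false-discovery probability with only a very small number of resampled datasets, greatly improving scalability.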

Comments Accepted to KDD 2026

DOI: 10.1145/3770855.3817752

Abstract

A key step in knowledge discovery is the evaluation of data mining results. In several applications, including pattern mining, graph analysis, and others, this step includes the evaluation of the statistical significance of the results, to avoid spurious discoveries due only to noise or random fluctuations in the data. While specialized procedures have been developed for some specific applications, resampling-based approaches are widely used, in particular for complex analyses where analytical results cannot be derived. However, current resampling-based approaches require the generation and analysis of thousands of resampled datasets, and are therefore impractical for large datasets or computationally intensive analyses. In this paper, we introduce FewRS, a simple and effective resampling-based approach to assess the statistical significance of data mining results with rigorous guarantees on the probability of false discoveries. Our approach can be used in every situation where resampling-based approaches are applied. FewRS builds on our derivation of a novel bound to the supremum deviation of test statistics representing the quality of data mining results. We prove that FewRS needs to generate and analyze an extremely small number of resampled datasets, leading to a highly scalable approach with wide applicability. We test our approach on common tasks such as pattern mining and network analysis. In all cases, our approach results in a reduction of up to two orders of magnitude in running time compared to the state of the art, while preserving high statistical power, enabling the statistical validation of data mining results on large-scale real-world datasets.
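
For context, the conventional resampling procedure that the abstract above contrasts with can be sketched as a Monte Carlo permutation test. This is not FewRS and does not implement its bound; it is a minimal baseline, with a hypothetical test statistic (difference of group means) and hypothetical function names, that shows why cost grows with the number of resampled datasets: each resample requires one full recomputation of the statistic.

```python
import random

def mean_difference(a, b):
    """Test statistic (an illustrative choice): difference of group means."""
    return sum(a) / len(a) - sum(b) / len(b)

def resampling_p_value(a, b, num_resamples=999, seed=0):
    """Monte Carlo permutation p-value for the absolute mean difference.

    Every resample shuffles the pooled data and recomputes the statistic,
    so running time scales linearly with num_resamples.
    """
    rng = random.Random(seed)
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(num_resamples):
        rng.shuffle(pooled)
        stat = abs(mean_difference(pooled[:len(a)], pooled[len(a):]))
        if stat >= observed:
            hits += 1
    # The +1 terms count the observed dataset itself, keeping the p-value valid.
    return (hits + 1) / (num_resamples + 1)

separated = resampling_p_value(list(range(10, 20)), list(range(10)))
identical = resampling_p_value([1, 2, 3, 4], [1, 2, 3, 4])
print(separated, identical)
```

With 999 resamples the smallest attainable p-value is 0.001, which is why analyses that need small significance thresholds typically require thousands of resampled datasets.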

2606.11267 2026-06-11 cs.LG cs.CR new submission

A prior-free blind detection of information leakage from model predictions

Laurence A. Jacobs

Affiliations: Center for Molecular Cardiology, University of Zurich; Center for Complexity Sciences, National University of Mexico

AI summary: Addresses the detection of information leakage from machine-learning model outputs with a decision-theoretic framework, proving that calibrated leakage is indistinguishable from an honest model while near-deterministic subgroups can be detected without priors; validated on UK Biobank.

Abstract

Data leakage -- contamination of a model with information unavailable at baseline -- is the dominant reproducibility failure in machine-learning-based science, yet detection tools require training code, external data, or domain expertise. None operates on the artifact an auditor most often holds: the model's output. We ask what can be decided about leakage from predictions and outcomes alone. We give a decision-theoretic framework in which leakage diagnostics are functionals of the predicted-risk/outcome law, parameterized by a threshold-weighting linked to proper scoring rules and decision-curve analysis. We prove a sharp impossibility: a recalibrated leak matching an honest model's calibration and discrimination is indistinguishable from honest performance by \emph{any} function of the predictions, so the broad class is detectable only against an externally supplied ceiling on achievable discrimination. We then prove what leakage cannot hide: a near-deterministic subgroup -- the signature of a near-label leak -- produces a sustained unit-purity head that no legitimate predictor of a non-deterministic outcome can manufacture, yielding a prior-free test. These results organize leakage into a trichotomy -- miscalibrated, broad-calibrated, and deterministic -- each with a matched detector and failure mode. We validate on UK Biobank using time-windowed comorbidity leakage with known, graded severity, measuring a detection floor of $Δ\cstar \approx 0.007$ on this endpoint, below which residual leakage is undetectable from output and too small to alter conclusions. The numerical floor is cohort- and endpoint-specific; the structural lesson is general: output-only detection fails where residual leakage is indistinguishable from an honestly stronger predictor. The test returns a verdict on a prediction vector in under a second on commodity hardware.

2606.11562 2026-06-11 cs.LG cs.CL new submission

GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs

Zhuoyi Peng, Jingzhou Jiang, Hanlin Gu, Lixin Fan, Yi Yang

Affiliations: The Hong Kong University of Science and Technology; Webank

AI summary: Proposes the GraphInfer-Bench benchmark, whose five tasks (description and comparison) test whether LLMs can infer, from a node and its neighborhood, answers that cannot be retrieved from a single node or path; all evaluated methods show a gap.

Comments Code: https://github.com/graphinfer/GraphInfer-Bench ; Dataset: https://huggingface.co/datasets/graphinfer/graphinfer


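RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

Atif Hassan, Swanand Khare, Jiaul H. Paik

Affiliations: IIT Kharagpur

AI summary: Proposes RCAP, which uses a closed-form solution to estimate the fraction of samples to keep per class, adjusts it adaptively, and prioritizes high-loss samples when sampling; it outperforms existing methods across datasets and training paradigms and, with only 10% of the data, improves performance on class-imbalanced datasets by more than 1%.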

Comments Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence (UAI 2025)

Journal ref: PMLR, vol. 286, pp. 1648-1662, 2025

Abstract

Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss by periodically selecting representative subsets of input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes samples having high loss for populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only $10\%$ data, RCAP delivers $>1\%$ improvement in performance on class-imbalanced datasets compared to full data training while providing an average $8.69\times$ speedup. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning

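2606.11961 2026-06-11 cs.LG cs.AI new submission

Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data

Antonio Pelusi, Stefano Braghin, Alberto Trombetta

Affiliations: University of Insubria; IBM Research Ireland

AI summary: Studies the limits of in-context learning for structured data generation with large language models, finding that it cannot update the categorical prior distributions inherited from pre-training, so rare classes are never generated; parameter-efficient fine-tuning fixes this but introduces memorization risk.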

Comments 9 pages, 5 figures. Empirical study of in-context learning and LoRA fine-tuning for synthetic tabular data generation, introducing the phenomenon of categorical prior lock-in. Under review

Abstract

Large language models (LLMs) are increasingly used as conditional generators for structured data, relying on in-context learning (ICL) to adapt to new distributions without parameter updates. We investigate the limits of ICL for structured generation under distribution mismatch, using high-cardinality tabular data as a controlled test case, and identify a structural failure mode we term \textit{categorical prior lock-in}: the inability of ICL to update the model's prior over token distributions inherited from pre-training. Across two 7B-parameter open-weight models, ICL improves numerical fidelity with additional examples but exhibits a sharp ceiling on categorical distributions, failing to reproduce rare classes entirely. Parameter-efficient fine-tuning (LoRA) overcomes these limitations but introduces measurable memorization risk and, in some cases, destabilizes structured output generation, highlighting a fundamental trade-off between adaptability and privacy.

2606.12182 2026-06-11 cs.LG math.DS math.OC new submission

How Low Can You Go? Active Learning for Sparse Model Discovery in the Ultra-Low-Data Limit

Ana Larrañaga, Urban Fasel, Steven L. Brunton

Affiliations: Department of Mechanical Engineering, University of Washington; NSF AI Institute in Dynamic Systems, University of Washington; Department of Aeronautics, Imperial College London

AI summary: Addresses data scarcity in equation discovery for dynamical systems in the ultra-low-data limit with an E-SINDy-based active learning strategy that iteratively prioritizes sampling informative regions; on the Lorenz, Burgers, and Kuramoto-Sivashinsky systems it identifies the dynamics accurately with less data than random sampling.

Comments 20 pages, 10 figures

Abstract

Identifying the governing equations of complex dynamical systems remains a fundamental challenge across science and engineering. While early approaches relied on empirical data and heuristics, modern data-driven methods offer greater flexibility and fewer assumptions. However, data acquisition in real-world settings is often expensive. This work addresses this challenge by introducing an active learning strategy for dynamics discovery in the ultra-low data limit. Rather than sampling randomly, our method iteratively prioritizes regions that are most informative for model identification. This approach builds on Sparse Identification of Nonlinear Dynamics (SINDy), and utilizes an ensemble extension, E-SINDy, to estimate epistemic uncertainty and guide the sampling for both ordinary and partial differential equations (ODEs/PDEs). For ODEs, an exhaustive analysis is conducted on the Lorenz system across varying data budgets and noise levels. For PDEs, two systems with contrasting dynamical characteristics are examined: the Burgers' equation, where a sharp shock front creates a distinction between informative and uninformative regions, and the Kuramoto-Sivashinsky equation, which presents a more spatially complex sampling landscape. Across all scenarios, the proposed method accurately identifies the governing dynamics with significantly fewer data samples than random sampling.
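
The sparse regression at the core of SINDy, which the abstract above builds on, is commonly implemented as sequentially thresholded least squares. The sketch below shows only that base step in pure Python; it does not implement the E-SINDy ensemble or the active-learning sampling, and the function names, threshold value, and toy system dx/dt = -2x are assumptions chosen for illustration.

```python
def solve_least_squares(columns, target):
    """Solve the normal equations (Theta^T Theta) xi = Theta^T y by Gauss-Jordan."""
    k = len(columns)
    # Augmented matrix [Theta^T Theta | Theta^T y].
    aug = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(u * y for u, y in zip(columns[i], target))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * p for x, p in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def stlsq(columns, target, threshold=0.1, max_iterations=10):
    """Sequentially thresholded least squares over a library of candidate terms."""
    active = list(range(len(columns)))
    coeffs = [0.0] * len(columns)
    for _ in range(max_iterations):
        if not active:
            break
        fit = solve_least_squares([columns[i] for i in active], target)
        coeffs = [0.0] * len(columns)
        for index, value in zip(active, fit):
            coeffs[index] = value
        # Drop terms with small coefficients, then refit on the survivors.
        kept = [i for i in active if abs(coeffs[i]) >= threshold]
        if kept == active:
            break
        active = kept
    return [c if i in active else 0.0 for i, c in enumerate(coeffs)]

# Toy data: exact derivatives of dx/dt = -2x at five states (hypothetical example).
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
dxdt = [-2.0 * x for x in xs]
library = [[1.0] * len(xs), xs, [x * x for x in xs]]  # candidate terms 1, x, x^2
coeffs = stlsq(library, dxdt)
print(coeffs)  # approximately [0, -2, 0]: only the x term survives
```

In an ensemble setting, repeating this fit on resampled data gives a spread of coefficients, and that spread is the kind of uncertainty signal an active-learning loop can use to choose where to sample next.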

2606.12344 2026-06-11 cs.LG cs.CL new submission

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

Mengyu Zheng, Kai Han, Boxun Li, Haiyang Xu, Yuchuan Tian, Wei He, Hang Zhou, Jianyuan Guo, Hailin Hu, Lin Ma, Chao Xu, Guohao Dai, Lixue Xia, Yunchao Wei, Yunhe Wang, Yu Wang

Affiliations: TokenRhythm Technologies; Infinigence AI; Peking University; City University of Hong Kong; SEE Fund; Shanghai Jiaotong University; Beijing Jiaotong University; Tsinghua University

AI summary: Proposes the Claw-SWE-Bench benchmark, which evaluates heterogeneous agent harnesses uniformly through an adapter protocol, and finds that adapter design is critical to coding performance and that model and harness choices significantly affect pass rate and cost.

Abstract

General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction contract required for scoring. We introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol that makes heterogeneous agent harnesses, or claws, comparable under fair settings including a fixed prompt, runtime budget, workspace contract, patch extraction procedure, and evaluator. The full benchmark contains 350 GitHub issue-resolution instances across 8 languages and 43 repositories, drawn from SWE-bench-Multilingual and SWE-bench-Verified-Mini after future-commit cleanup. We also release Claw-SWE-Bench Lite for faster validation, which is an 80-instance subset selected by a cost-aware, rank-aware procedure over 17 calibration columns. On the full benchmark, OpenClaw with a minimal direct-diff adapter scores only $19.1\%$ Pass@1, whereas the full adapter reaches $73.4\%$ with the same GLM 5.1 backbone, showing that adapter design is essential for enabling OpenClaw-style harnesses to perform coding tasks effectively. Across an OpenClaw $\times$ nine-model sweep and a five-claw $\times$ two-model sweep, model choice changes Pass@1 by $29.4$ pp and harness choice by $27.4$ pp under fixed models; systems with similar accuracy can differ substantially in total API cost. Claw-SWE-Bench therefore treats harness and cost accounting as first-class axes of SWE-style coding-agent evaluation, providing both a full benchmark and a low-cost reference set for reproducible comparison. The data is available at https://github.com/opensquilla/claw-swe-bench and https://huggingface.co/datasets/TokenRhythm/Claw-SWE-Bench.

2606.11196 2026-06-11 cs.CL cs.AI cs.CR cs.LG 交叉投稿

PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof-of-Quality in Decentralized LLM Inference

PoQ-Judge：去中心化LLM推理中成本感知质量证明的多架构评估框架

Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan

发表机构 * DGrid AI

AI总结提出PoQ-Judge框架，训练专用裁判模型对查询-输出对进行无参考评分，研究三种架构，最佳模型在Pearson相关性上达到0.747，级联评估降低72.7%成本。

AI中文摘要

去中心化LLM推理网络需要轻量级、无参考的质量评估来支撑质量证明（PoQ）。我们提出PoQ-Judge，一个训练专用裁判模型、在没有真实参考答案的情况下对查询-输出对进行评分的框架。我们研究了三种架构在质量-成本权衡中的表现：TextCNN裁判、MiniLM交叉编码器和DeBERTa裁判。通过在UltraFeedback和GPT标注的领域内数据上进行两阶段训练，最佳模型在保留测试集上与真值代理的Pearson相关性达到0.747，优于先前工作中基于参考的评估器。作为复合评分中的无参考组件，它实现了0.645的Pearson相关性，与最佳的单一基于参考的评估器持平，同时消除了对参考答案的需求。我们还表明，在线校准将语义质量识别为主导维度，级联评估将成本降低72.7%，仅带来适度的质量损失。问答任务上的结果明显强于摘要任务，表明代理标签的质量是主要的剩余限制。

英文摘要

Decentralized LLM inference networks need lightweight, reference-free quality evaluation for Proof of Quality (PoQ). We present PoQ-Judge, a framework that trains dedicated judge models to score query-output pairs without ground-truth references. We study three architectures across the quality-cost tradeoff: a TextCNN judge, a MiniLM cross-encoder, and a DeBERTa judge. Using two-stage training on UltraFeedback plus GPT-labeled in-domain data, the best model reaches 0.747 Pearson correlation with the ground-truth proxy on a held-out test set, outperforming reference-based evaluators from prior work. As a reference-free component in composite scoring, it achieves 0.645 Pearson correlation, matching the best single reference-based evaluator while removing the need for reference answers. We also show that online calibration identifies semantic quality as the dominant dimension and that cascade evaluation reduces cost by 72.7 percent with only modest quality loss. Results are much stronger on QA than summarization, pointing to proxy quality as the main remaining limitation.
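
The abstract reports a cost reduction from cascade evaluation but does not spell out the mechanism. The sketch below shows one generic cascade pattern, offered as an illustration only and not as the authors' design: a cheap judge scores every item, and only scores that fall in an ambiguous band are re-scored by a stronger judge. The function name `cascade_score`, the band limits `low` and `high`, and the per-call costs are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def cascade_score(
    items: Sequence[float],
    cheap_judge: Callable[[float], float],
    strong_judge: Callable[[float], float],
    low: float = 0.3,          # hypothetical lower edge of the ambiguous band
    high: float = 0.7,         # hypothetical upper edge of the ambiguous band
    cheap_cost: float = 1.0,   # hypothetical cost per cheap-judge call
    strong_cost: float = 10.0  # hypothetical cost per strong-judge call
) -> Tuple[List[float], float]:
    """Score items with a cheap judge and escalate only ambiguous ones."""
    scores: List[float] = []
    total_cost = 0.0
    for item in items:
        score = cheap_judge(item)
        total_cost += cheap_cost
        if low < score < high:
            # The cheap score is ambiguous, so pay for the stronger judge.
            score = strong_judge(item)
            total_cost += strong_cost
        scores.append(score)
    return scores, total_cost


# Toy judges: each "item" is already a number in [0, 1].
items = [0.1, 0.5, 0.9, 0.95]
scores, cost = cascade_score(
    items,
    cheap_judge=lambda x: x,
    strong_judge=lambda x: 1.0 if x >= 0.5 else 0.0,
)
always_strong_cost = 10.0 * len(items)
saving = 1.0 - cost / always_strong_cost
print(scores, cost, saving)
```

In this toy run only one of four items is escalated; the saving relative to always calling the strong judge depends entirely on the assumed costs and band width.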

2606.11534 2026-06-11 physics.ao-ph cs.LG 交叉投稿

Urban Heat MiniCubes: An AI-Ready dataset for urban heat research

城市热微型数据立方体：面向城市热研究的人工智能就绪数据集

Jonathan Starfeldt, Maria J. Molina, Alexander Kerr, Adam Yang, Thomas R. H. Holmes, Christopher R. Hain

发表机构 * Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD, USA（马里兰大学帕克分校大气与海洋科学系，美国马里兰州）； Department of Computer Science, University of Maryland, College Park, MD, USA（马里兰大学帕克分校计算机科学系，美国马里兰州）； NASA Goddard Space Flight Center, Greenbelt, MD, USA（NASA戈达德航天飞行中心，美国马里兰州格林贝尔特）； NASA Marshall Space Flight Center, Huntsville, AL, USA（NASA马歇尔航天飞行中心，美国亚拉巴马州亨茨维尔）

AI总结提出Urban Heat MiniCubes数据集，整合多源卫星数据（Landsat 8/9、Sentinel-1、GOES-R等），为48个城市提供90×90公里网格化数据立方体，支持机器学习在城市热研究中的应用。

Comments 53 pages, 26 figures, Submitted to Nature Scientific Data

AI中文摘要

城市热效应因不透水表面和异质建筑环境而加剧，但街道尺度的变异性仍难以量化，因为多传感器观测很少以一致、分析就绪的形式在必要的时空尺度上可用。我们提出了“Urban Heat MiniCubes”，一个公开可用、符合FAIR原则的数据集，专为城市热研究中的机器学习应用而设计。该数据集提供了西半球48个城市在2022-2023年间的统一90×90公里网格化数据立方体，变量被重新投影并配准到公共网格，以减少预处理（例如，重投影、重采样和时空对齐）。Urban Heat MiniCubes包括两种互补模态：（i）来自Landsat 8/9（例如，地表反射率）和Sentinel-1（例如，合成孔径雷达后向散射）的高空间分辨率、低频观测，以及（ii）来自GOES-R（例如，长波红外亮温）和微波地表温度产品的更高时间频率、较粗分辨率观测。我们记录了变量和元数据，并通过变量间分析和基于自编码器的像素类别（例如，水和云）重建误差总结提供了技术评估。还讨论了潜在用例和局限性。

英文摘要

Urban heat is amplified by impermeable surfaces and heterogeneous built environments, yet street-level variability remains difficult to quantify because multi-sensor observations are rarely available in consistent, analysis-ready form at the necessary spatiotemporal scales. We present "Urban Heat MiniCubes," a publicly available, FAIR-oriented dataset designed for machine learning applications in urban heat research. The dataset provides harmonized 90 x 90 km gridded data cubes for 48 cities in the Western Hemisphere spanning 2022-2023, with variables reprojected and collocated to a common grid to reduce preprocessing (e.g., reprojection, resampling, and spatiotemporal alignment). Urban Heat MiniCubes includes two complementary modalities: (i) higher-spatial-resolution, lower-frequency observations from Landsat 8/9 (e.g., surface reflectances) and Sentinel-1 (e.g., synthetic aperture radar backscatter), and (ii) higher-temporal-frequency, coarser observations from GOES-R (e.g., longwave infrared brightness temperatures) and a microwave land surface temperature product. We document variables and metadata and provide technical assessment using inter-variable analyses and autoencoder-based reconstruction-error summaries across pixel classes (e.g., water and cloud). Potential use cases and limitations are also discussed.

2606.11911 2026-06-11 stat.ML cs.LG math.AT 交叉投稿

From Persistence to Survival: Hypothesis Testing, Effect Sizes and Vectorisation for Topological Features

从持续性到生存：拓扑特征的假设检验、效应大小与向量化

Juliette Murris, Bernadette Stolz, Karsten Borgwardt

发表机构 * Department of Machine Learning and Systems Biology, Max Planck Institute of Biochemistry, Martinsried, Germany（马克斯·普朗克生物化学研究所机器学习与系统生物学系，德国马丁斯里德）

AI总结提出STRAND方法，将持久性图视为生存数据，利用持久性生存函数统一实现假设检验、效应大小计算和向量化，在合成数据和真实基准上验证了有效性。

AI中文摘要

持久性图是拓扑数据分析中常见的表示形式，但它们并非天然存在于向量空间中，且用于比较它们的统计工具在很大程度上与用于下游预测的工具分开发展。我们引入STRAND（生存拓扑表示图分析），将（集合的）持久性图视为生存数据：每个具有持久性值 $p = d - b$ 的拓扑特征是一个完全观测的事件时间，持久性生存函数 $S(t) = \mathbb{P}(p > t)$ 是比较图的中心对象。从这个单一表示中，我们推导出（i）一个非参数双样本检验，具有校准的第一类错误率和少量图的高功效；（ii）可解释的效应大小；以及（iii）用于下游机器学习的1-Wasserstein稳定特征向量。我们在具有受控拓扑的合成流形上验证了校准和功效，展示了在14个图和3D点云基准上的竞争性向量化，并将该方法应用于fMRI/神经科学数据中的功能性脑连接研究。据我们所知，STRAND是第一个从单一连贯且可解释的表示为持久性图提供假设检验和向量化的方法。

英文摘要

Persistence diagrams are common representations in topological data analysis, but they do not naturally live in a vector space, and the statistical tools developed for comparing them have largely evolved separately from those used for downstream prediction. We introduce STRAND (Survival Topological Representation ANalysis of Diagrams), which treats (collections of) PDs as survival data: each topological feature with persistence value $p = d - b$ is a fully observed time-to-event, and the persistence survival function $S(t) = \mathbb{P}(p > t)$ is the central object for comparing diagrams. From this single representation we derive (i) a non-parametric two-sample test with calibrated Type I error and high power from a small number of diagrams; (ii) interpretable effect sizes; and (iii) a 1-Wasserstein-stable feature vector for downstream machine learning. We validate calibration and power on synthetic manifolds with controlled topology, demonstrate competitive vectorisation across 14 graph and 3D point cloud benchmarks, and apply the method to study functional brain connectivity in fMRI/neuroscience data. To our knowledge, STRAND is the first method to provide hypothesis testing and vectorisation for persistence diagrams from a single coherent and interpretable representation.
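
The persistence survival function $S(t) = \mathbb{P}(p > t)$ defined in the abstract has a simple empirical counterpart: the fraction of features in a diagram whose persistence $p = d - b$ exceeds $t$. The sketch below evaluates that fraction on a grid of thresholds. It is a minimal illustration and not the STRAND implementation; the function name `persistence_survival` and the toy diagram are assumptions.

```python
import numpy as np


def persistence_survival(diagram, grid):
    """Empirical S(t): fraction of features with persistence p = d - b above t."""
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    persistence = diagram[:, 1] - diagram[:, 0]  # p = d - b per feature
    # Rows index thresholds, columns index features.
    return (persistence[None, :] > grid[:, None]).mean(axis=1)


# Toy diagram of (birth, death) pairs; persistence values are 1.0, 0.3, 2.0.
diagram = [(0.0, 1.0), (0.2, 0.5), (0.1, 2.1)]
grid = [0.0, 0.5, 1.5, 2.5]
survival = persistence_survival(diagram, grid)
print(survival)  # fractions 1, 2/3, 1/3, 0
```

Evaluating every diagram on the same grid yields fixed-length vectors, which is one plausible way such a curve could feed a downstream model.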

2606.11925 2026-06-11 cs.CV cs.LG 交叉投稿

CoVar: 置信度-方差引导的半监督学习伪标签选择

Jinshi Liu, Lei He, Pan Liu

发表机构 * College of Artificial Intelligence, Shenzhen University（深圳大学人工智能学院）； School of Information and Electrical Engineering, Hunan University of Science and Technology（湖南科技大学信息与电气工程学院）； Information Hub, Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州）信息枢纽）

AI总结提出CoVar框架，通过联合建模最大置信度和残差类方差来评估伪标签可靠性，利用SVD谱松弛分离可靠与不可靠预测，无需手动阈值，在分割和分类任务上取得提升。

AI中文摘要

半监督学习中的伪标签选择通常由最大置信度阈值驱动，然而在模型过度自信和类别不平衡下，仅靠置信度可能不可靠。我们提出CoVar，一个置信度-方差框架，通过联合建模最大置信度（MC）和残差类方差（RCV）来评估伪标签可靠性。从熵最小化出发，我们推导出二阶交叉熵近似，表明当MC高且RCV低时，低损失伪标签更受青睐，并带有置信度依赖的惩罚项，该惩罚项对接近确定的预测更强。基于此准则，CoVar将预测嵌入二维置信度-方差空间，并使用基于SVD的谱松弛来分离可靠和不可靠的预测，无需手动调整置信度阈值。然后，聚类加权高斯函数将此分离转换为每个样本的训练权重。所得权重可在训练期间集成到现有的半监督分割和分类流程中，且不引入推理开销。在PASCAL VOC 2012、Cityscapes、CIFAR-10、CIFAR-100、SVHN和STL-10上的实验表明，在匹配骨干网络下，VOC和Cityscapes上取得明显提升，并在标准分类基准上达到竞争性或更低的错误率。这些结果表明，残差类离散度为鲁棒伪标签选择提供了置信度之外的补充信号。

英文摘要

Pseudo-label selection in semi-supervised learning is commonly driven by maximum-confidence thresholds, yet confidence alone can be unreliable under model overconfidence and class imbalance. We propose CoVar, a confidence--variance framework that assesses pseudo-label reliability by jointly modeling Maximum Confidence (MC) and Residual-Class Variance (RCV). Starting from entropy minimization, we derive a second-order cross-entropy approximation showing that low-loss pseudo-labels are favored when MC is high and RCV is low, with a confidence-dependent penalty that becomes stronger for near-certain predictions. Based on this criterion, CoVar embeds predictions into a two-dimensional confidence--variance space and uses SVD-based spectral relaxation to separate reliable and unreliable predictions without hand-tuned confidence thresholds. Cluster-wise Gaussian weighting then converts this separation into per-sample training weights. The resulting weights can be integrated into existing semi-supervised segmentation and classification pipelines during training and introduce no inference-time overhead. Experiments on PASCAL VOC 2012, Cityscapes, CIFAR-10, CIFAR-100, SVHN, and STL-10 show clear gains on VOC and Cityscapes under matched backbones, as well as competitive or improved error rates on standard classification benchmarks. These results indicate that residual-class dispersion provides a useful signal complementary to confidence for robust pseudo-label selection.
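
To make the two coordinates of the confidence-variance space concrete, the sketch below computes Maximum Confidence (MC) and a Residual-Class Variance (RCV) from softmax outputs. The abstract does not give the exact RCV formula, so the definition used here (population variance of the probabilities of all classes other than the predicted one) is an assumption, as is the function name `confidence_variance`.

```python
import numpy as np


def confidence_variance(probs):
    """Return (MC, RCV) per sample from an (n_samples, n_classes) array."""
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[0]
    top = probs.argmax(axis=1)
    mc = probs.max(axis=1)  # Maximum Confidence
    # Mask out the predicted class, keep the residual classes.
    mask = np.ones_like(probs, dtype=bool)
    mask[np.arange(n), top] = False
    residual = probs[mask].reshape(n, -1)
    rcv = residual.var(axis=1)  # assumed definition of Residual-Class Variance
    return mc, rcv


# Two predictions with identical confidence but different residual spread.
probs = [
    [0.7, 0.1, 0.1, 0.1],  # residual mass spread evenly
    [0.7, 0.3, 0.0, 0.0],  # residual mass concentrated on one rival class
]
mc, rcv = confidence_variance(probs)
print(mc, rcv)
```

Both rows have the same MC, so a confidence threshold cannot tell them apart; the second row's larger RCV flags a strong competing class.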

2602.02229 2026-06-11 cs.LG eess.SP 版本更新

Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts

预测驱动的已部署模型风险监控：检测有害分布漂移

Guangyi Zhang, Yunlong Cai, Guanding Yu, Osvaldo Simeone

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出预测驱动风险监控（PPRM），一种基于预测驱动推断的半监督方法，通过结合合成标签与少量真实标签构建运行风险的随时有效下界，实现对有害漂移的检测，并在图像分类、大语言模型和电信监控任务中验证有效性。

Comments Accepted by ICML2026

2602.08986 2026-06-11 cs.LG cs.AI 版本更新

Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning

改进分层多标签学习中稀有节点的检测

Isaac Xu, Martin Gillis, Ayushi Sharma, Benjamin Misiuk, Craig J. Brown, Thomas Trappenberg

发表机构 * Faculty of Computer Science（计算机科学学院）； Dalhousie University（达尔豪斯大学）； Department of Geography（地理系）； Memorial University of Newfoundland（纽芬兰纪念大学）； Department of Oceanography（海洋学系）

AI总结针对分层多标签分类中稀有节点检测困难的问题，提出结合节点不平衡加权和焦点加权的损失函数，利用集成不确定性量化，在基准数据集上将召回率提升至五倍，并显著提高F1分数。

Comments Accepted for publication in Transactions on Machine Learning Research (TMLR), 2026

AI中文摘要

在分层多标签分类中，一个持续的挑战是使模型预测能够达到层次结构的更深层次，以实现更详细或更细粒度的分类。这一困难部分源于某些类别（或层次节点）的自然稀有性，以及确保子节点几乎总是比其父节点频率更低的分层约束。为了解决这个问题，我们为神经网络提出了一种加权损失目标，该目标结合了节点不平衡加权和焦点加权组件，后者利用了集成不确定性的现代量化。通过强调稀有节点而非稀有观测（数据点），并在训练过程中关注每个模型输出分布中的不确定节点，我们观察到在基准数据集上召回率提高了高达五倍，并且$F_{1}$分数有统计显著的提升。我们还展示了我们的方法有助于卷积网络处理具有挑战性的任务，例如在编码器次优或数据有限的情况下。

英文摘要

In hierarchical multi-label classification, a persistent challenge is enabling model predictions to reach deeper levels of the hierarchy for more detailed or fine-grained classifications. This difficulty partly arises from the natural rarity of certain classes (or hierarchical nodes) and the hierarchical constraint that ensures child nodes are almost always less frequent than their parents. To address this, we propose a weighted loss objective for neural networks that combines node-wise imbalance weighting with focal weighting components, the latter leveraging modern quantification of ensemble uncertainties. By emphasizing rare nodes rather than rare observations (data points), and focusing on uncertain nodes for each model output distribution during training, we observe improvements in recall by up to a factor of five on benchmark datasets, along with statistically significant gains in $F_{1}$ score. We also show our approach aids convolutional networks on challenging tasks, as in situations with suboptimal encoders or limited data.
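
A loss that combines node-wise imbalance weighting with a focal term can be sketched as follows. This is only an illustration of the general idea: the inverse-frequency node weight and the standard focal factor $(1 - p_t)^\gamma$ are stand-ins chosen here, whereas the paper's focal component is described as using ensemble uncertainty, which this sketch does not model. The function name `weighted_focal_bce` and all parameter values are hypothetical.

```python
import numpy as np


def weighted_focal_bce(probs, targets, node_freq, gamma=2.0, eps=1e-7):
    """Binary cross-entropy per node, scaled by node rarity and a focal factor."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=float)
    # Assumed imbalance weight: inverse node frequency, normalised to mean 1.
    node_weight = 1.0 / np.asarray(node_freq, dtype=float)
    node_weight = node_weight / node_weight.mean()
    # Probability assigned to the correct label of each node.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    focal = (1.0 - p_t) ** gamma  # down-weights confident, correct nodes
    bce = -np.log(p_t)
    return float((node_weight * focal * bce).mean())


node_freq = [0.5, 0.05]          # node 0 is common, node 1 is rare
targets = [[1, 1]]
loss_miss_rare = weighted_focal_bce([[0.9, 0.2]], targets, node_freq)
loss_miss_common = weighted_focal_bce([[0.2, 0.9]], targets, node_freq)
print(loss_miss_rare, loss_miss_common)
```

With these toy numbers, missing the rare node costs more than missing the common one, which is the behaviour node-wise weighting is meant to produce.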

2602.22962 2026-06-11 cs.LG 版本更新

Scaling Laws of Global Weather Models

全球天气模型的缩放定律

Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler

发表机构 * University of Illinois at Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结本文分析数据驱动天气模型中模型大小、数据集大小和计算预算与验证损失之间的缩放定律，发现Aurora数据缩放最强，GraphCast参数效率高但硬件利用率低，计算最优分析表明增加训练数据比增大模型更有效，且模型形状上宽度优于深度。

Comments Accepted at ICML 2026. 21 pages, 7 figures

AI中文摘要

数据驱动模型正在彻底改变天气预报。为了优化训练效率和模型性能，本文分析了该领域内的经验缩放定律。我们研究了模型性能（验证损失）与三个关键因素：模型大小（$N$）、数据集大小（$D$）和计算预算（$C$）之间的关系。在一系列模型中，我们发现Aurora表现出最强的数据缩放行为：将训练数据集增加10倍可使验证损失降低多达3.2倍。GraphCast展示了最高的参数效率，但硬件利用率有限。我们的计算最优分析表明，在固定计算预算下，将资源分配给更多的总训练数据比增加模型大小能带来更大的性能提升。此外，我们分析了模型形状，并发现了与语言模型中观察到的根本不同的缩放行为：天气预报模型始终倾向于增加宽度而非深度。这些发现表明，未来的天气模型应优先考虑更宽的架构和更大的有效训练数据集，以最大化预测性能。

英文摘要

Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to more total training data yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
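
The reported data-scaling figure can be turned into an exponent with a line of arithmetic. Assuming a pure power law $L \propto D^{-\alpha}$ (an assumption made here for illustration; the abstract reports the reduction as an upper bound, "up to 3.2x", not as a fitted law), a 10x increase in data that cuts loss by 3.2x implies $\alpha = \log 3.2 / \log 10$. The helper name `implied_exponent` is hypothetical.

```python
import math


def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Exponent alpha such that scaling data by data_factor divides loss by loss_factor."""
    return math.log(loss_factor) / math.log(data_factor)


alpha = implied_exponent(10.0, 3.2)
print(f"implied data-scaling exponent: {alpha:.3f}")  # about 0.505

# Extrapolation under the same assumed power law: 100x more data.
reduction_100x = 100.0 ** alpha
print(f"loss reduction for 100x data: {reduction_100x:.2f}x")  # about 10.24x
```

The extrapolation only holds if the power law persists outside the measured range, which the abstract does not claim.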

2410.24145 2026-06-11 stat.ML cs.LG stat.ME 版本更新

Projected random forests and conformal prediction of circular data

投影随机森林与圆形数据的共形预测

Paulo C. Marques F., Rinaldo Artes, Helton Graziadei

发表机构 * Insper University（Insper大学）； University of São Paulo（圣保罗大学）

AI总结针对圆形响应回归问题，应用共形预测技术，通过投影方法将线性回归模型转换为圆形模型，并利用随机森林的袋外机制避免额外校准样本，生成具有有限样本覆盖保证和自适应弧长的预测集。

Comments 7 pages; 4 figures

2510.06596 2026-06-11 cs.CV cs.AI cs.IT cs.LG math.IT 版本更新

SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation

SDQM：用于目标检测数据集评估的合成数据质量指标

Ayush Zenith, Arnold Zumbrun, Neel Raut, Jing Lin

发表机构 * Northeastern University, Khoury College of Computer Sciences（东北大学Khoury计算机科学学院）； Binghamton University, School of Computing（宾汉姆顿大学计算学院）； Air Force Research Laboratory, Mission Applications and Infrastructure Section（空军研究实验室，任务应用与基础设施部门）

AI总结提出SDQM指标，无需模型训练收敛即可评估合成数据质量，与YOLO11的mAP强相关，优于现有指标。

Comments Accepted and Published at SPIE: Journal of Electronic Imaging, Vol. 35, Issue 3

DOI: 10.1117/1.JEI.35.3.033014
Journal ref: Journal of Electronic Imaging 35(3), 033014 (2026)

AI中文摘要

机器学习模型的性能在很大程度上依赖于训练数据。大规模、良好标注数据集的稀缺给构建鲁棒模型带来了重大挑战。为了解决这一问题，通过模拟和生成模型产生的合成数据已成为一种有前景的解决方案，它增强了数据集的多样性，并提高了模型的性能、可靠性和韧性。然而，评估这些生成数据的质量需要一个有效的指标。我们引入了合成数据集质量指标（SDQM），用于评估目标检测任务的数据质量，而无需模型训练收敛。该指标能够更高效地生成和选择合成数据集，解决了资源受限的目标检测任务中的一个关键挑战。在我们的实验中，SDQM与领先的目标检测模型YOLO11的平均精度均值（mAP）得分表现出强相关性，而先前的指标仅表现出中等或弱相关性。此外，它提供了改进数据集质量的可操作见解，最大限度地减少了昂贵的迭代训练需求。这一可扩展且高效的指标为评估合成数据设立了新标准。SDQM的代码可从 https://github.com/ayushzenith/SDQM 获取。

英文摘要

The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. We introduce the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean average precision (mAP) scores of YOLO11, a leading object detection model, whereas previous metrics only exhibited moderate or weak correlations. In addition, it provides actionable insights into improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data. The code for SDQM is available at https://github.com/ayushzenith/SDQM

2510.16152 2026-06-11 cs.DL cs.AI cs.CL cs.LG 版本更新

Mapping Scientific Literature with Large Language Models and Topic Modeling

利用大语言模型和主题建模绘制科学文献图谱

Mason Smetana, Lev Khazanovich

发表机构 * Department of Civil and Environmental Engineering（土木与环境工程系）； University of Pittsburgh（匹兹堡大学）

AI总结提出基于大语言模型的两阶段分类框架，通过主题建模分析PNAS工程类文献，生成语义可解释主题并揭示跨主题关联，性能优于传统方法。

Comments 35 pages, 10 figures. Accepted for publication in Scientometrics. Final version available via DOI

DOI: 10.1007/s11192-026-05643-9
Journal ref: Scientometrics (2026)

AI中文摘要

科学文献因学科边界、专业术语和潜在稀疏的关键词系统而日益碎片化，使得捕捉现代科学的演化结构变得困难。本研究引入了一个大语言模型驱动的框架，从主题建模的角度绘制科学文献图谱。该方法在《美国国家科学院院刊》20年间超过1500篇工程相关文章语料上进行了演示。一个两阶段分类流水线首先根据每篇文章的摘要分配一个主要主题类别，然后进行全文分析以识别次要分类，揭示语料库中潜在的跨主题联系。与传统主题模型不同，基于LLM的框架在保持强量化性能的同时，生成语义可解释的主题。与既定主题建模方法的比较评估显示，主题多样性更高，重叠度更低，且具有竞争性的一致性指标。对随机抽样的摘要子集进行手动验证，准确率达到75.9%。额外的传统自然语言处理分析证实，生成的主题对应于语料库中有意义的语言模式。连接主要和次要分类的二部网络进一步揭示了仅通过摘要或关键词系统不易观察到的隐含主题关系。结果表明，该框架无需事先了解期刊的编辑双重分类结构，即可独立恢复其大部分结构。总体而言，所提出的方法为绘制科学图谱和识别研究中新兴的跨主题联系提供了有力工具。

英文摘要

Scientific literature is increasingly fragmented by disciplinary boundaries, specialized terminology, and potentially sparse keyword systems, making it difficult to capture the evolving structure of modern science. This study introduces a large language model (LLM)-driven framework for mapping scientific literature from a topic modeling perspective. The approach is demonstrated on a 20-year corpus of more than 1,500 engineering-related articles published in the Proceedings of the National Academy of Sciences (PNAS). A two-stage classification pipeline first assigns a primary thematic category to each article based on its abstract, followed by full-text analysis to identify secondary classifications that reveal latent cross-topic connections within the corpus. Unlike conventional topic models, the LLM-based framework produces semantically interpretable topics while maintaining strong quantitative performance. Comparative evaluation against established topic modeling methods shows higher topic diversity and lower overlap with competitive coherence metrics. Manual validation on a randomly sampled subset of abstracts yields an accuracy of 75.9%. Additional traditional natural language processing analyses confirm that the generated topics correspond to meaningful linguistic patterns in the corpus. A bipartite network linking primary and secondary classifications further reveals implicit thematic relationships that are not readily observable through abstracts or keyword systems alone. The findings indicate that the framework independently recovers much of the journal's editorial dual-classification structure without prior knowledge of its schema. Overall, the proposed approach offers a powerful tool for mapping science and identifying emerging cross-topic connections in research.

2601.04203 2026-06-11 cs.CL cs.CV cs.LG cs.SE 版本更新

FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback

FronTalk: 以多模态反馈进行对话式代码生成的前端开发基准测试

Xueqing Wu, Zihan Xue, Da Yin, Shuyan Zhou, Kai-Wei Chang, Nanyun Peng, Yeming Wen

发表机构 * Meta Superintelligence Labs（Meta超智能实验室）； University of California, Los Angeles（加州大学洛杉矶分校）； Duke University（杜克大学）

AI总结提出FronTalk基准，通过多轮对话和多模态反馈（文本与视觉指令）评估前端代码生成，发现模型存在遗忘和视觉反馈理解困难，提出AceCoder方法有效减少遗忘并提升性能。

AI中文摘要

我们提出了FronTalk，一个前端代码生成基准，开创性地研究了一种独特的交互动态：具有多模态反馈的对话式代码生成。在前端开发中，草图、模型和带注释的截图等视觉工件对于传达设计意图至关重要，但它们在多轮代码生成中的作用仍未得到充分探索。为解决这一差距，我们聚焦于前端开发任务，整理了FronTalk，这是一个包含100个多轮对话的数据集，这些对话源自新闻、金融和艺术等不同领域的真实网站。每一轮都包含一个文本指令和一个等效的视觉指令，每个指令代表相同的用户意图。为全面评估模型性能，我们提出了一种新颖的基于智能体的评估框架，利用网络智能体模拟用户并探索网站，从而衡量功能正确性和用户体验。对20个模型的评估揭示了文献中系统性地未充分探索的两个关键挑战：（1）显著的遗忘问题，即模型覆盖先前实现的功能，导致任务失败；（2）解释视觉反馈的持续挑战，尤其是对于开源视觉语言模型（VLM）。我们提出了一个强大的基线来解决遗忘问题，即AceCoder，一种使用自主网络智能体批评每个过去指令实现的方法。这种方法将遗忘几乎减少到零，并将性能提升高达9.3%（从56.0%到65.3%）。总体而言，我们旨在为前端开发和多轮多模态代码生成的通用交互动态的未来研究提供坚实基础。代码和数据已发布于 https://github.com/shirley-wu/frontalk 。

英文摘要

We present FronTalk, a benchmark for front-end code generation that pioneers the study of a unique interaction dynamic: conversational code generation with multi-modal feedback. In front-end development, visual artifacts such as sketches, mockups and annotated screenshots are essential for conveying design intent, yet their role in multi-turn code generation remains largely unexplored. To address this gap, we focus on the front-end development task and curate FronTalk, a collection of 100 multi-turn dialogues derived from real-world websites across diverse domains such as news, finance, and art. Each turn features both a textual instruction and an equivalent visual instruction, each representing the same user intent. To comprehensively evaluate model performance, we propose a novel agent-based evaluation framework leveraging a web agent to simulate users and explore the website, and thus measuring both functional correctness and user experience. Evaluation of 20 models reveals two key challenges that are under-explored systematically in the literature: (1) a significant forgetting issue where models overwrite previously implemented features, resulting in task failures, and (2) a persistent challenge in interpreting visual feedback, especially for open-source vision-language models (VLMs). We propose a strong baseline to tackle the forgetting issue with AceCoder, a method that critiques the implementation of every past instruction using an autonomous web agent. This approach significantly reduces forgetting to nearly zero and improves the performance by up to 9.3% (56.0% to 65.3%). Overall, we aim to provide a solid foundation for future research in front-end development and the general interaction dynamics of multi-turn, multi-modal code generation. Code and data are released at https://github.com/shirley-wu/frontalk

2601.17717 2026-06-11 cs.AI cs.LG 版本更新

A Survey on Evaluating Quality and Trustworthiness in LLM-Generated Data

评估LLM生成数据的质量与可信度综述

Kaituo Zhang, Mingzhi Hu, Hoang Anh Duy Le, Fariha Kabir Torsha, Zhimeng Jiang, Minh Khai Bui, Chia-Yuan Chang, Yu-Neng Chuang, Zhen Xiong, Ying Lin, Guanchu Wang, Na Zou

发表机构 * University of Houston（休斯敦大学）； Worcester Polytechnic Institute（伍斯特理工学院）； Rice University（莱斯大学）； Texas A&M University（德克萨斯农工大学）； University of Wisconsin - Madison（威斯康星大学麦迪逊分校）； University of Southern California（南加州大学）； University of North Carolina at Charlotte（北卡罗来纳大学夏洛特分校）

AI总结提出LLM数据审计框架，从质量和可信度两个维度系统分类评估指标，分析六种模态数据生成方法的评估缺陷并给出改进建议。

Comments Published at TMLR. Title changed in the final version

Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

大型语言模型（LLM）已成为跨多种模态生成数据的强大工具。通过将数据从稀缺资源转变为可控资产，LLM缓解了真实世界数据获取成本对模型训练、评估和系统迭代造成的瓶颈。然而，确保LLM生成的合成数据的高质量仍然是一个关键挑战。现有研究主要关注生成方法，对生成数据质量的直接关注有限。此外，大多数研究局限于单一模态，缺乏跨不同数据类型的统一视角。为填补这一空白，我们提出了LLM数据审计框架。在该框架中，我们首先描述了如何利用LLM生成六种不同模态的数据。更重要的是，我们从质量和可信度两个维度系统分类了评估合成数据的内在指标。这种方法将评估重点从依赖下游任务性能的外在评估转向数据本身的固有属性。利用这一评估体系，我们分析了每种模态代表性生成方法的实验评估，并指出了当前评估实践中的重大缺陷。基于这些发现，我们为社区改进数据生成评估提供了具体建议。最后，该框架概述了合成数据在不同模态下的实际应用方法。

英文摘要

Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities. By transforming data from a scarce resource into a controllable asset, LLMs mitigate the bottlenecks imposed by the acquisition costs of real-world data for model training, evaluation, and system iteration. However, ensuring the high quality of LLM-generated synthetic data remains a critical challenge. Existing research primarily focuses on generation methodologies, with limited direct attention to the quality of the resulting data. Furthermore, most studies are restricted to single modalities, lacking a unified perspective across different data types. To bridge this gap, we propose the LLM Data Auditor framework. In this framework, we first describe how LLMs are utilized to generate data across six distinct modalities. More importantly, we systematically categorize intrinsic metrics for evaluating synthetic data from two dimensions: quality and trustworthiness. This approach shifts the focus from extrinsic evaluation, which relies on downstream task performance, to the inherent properties of the data itself. Using this evaluation system, we analyze the experimental evaluations of representative generation methods for each modality and identify substantial deficiencies in current evaluation practices. Based on these findings, we offer concrete recommendations for the community to improve the evaluation of data generation. Finally, the framework outlines methodologies for the practical application of synthetic data across different modalities.

2601.21817 2026-06-11 stat.ML cs.LG 版本更新

LakeFM：基于不规则多变量多深度时间序列数据的水生生态系统基础模型

Abhilash Neog, Sepideh Fatemi, Medha Sawhney, Kazi Sajeed Mehrab, Aanish Pradhan, Bennett J. McAfee, Emma Marchisin, Arka Daw, Robert Ladwig, Cayelan C. Carey, Paul Hanson, Anuj Karpatne

发表机构 * Virginia Tech（弗吉尼亚理工大学）； Grand Valley State University（大峡谷州立大学）； University of Wisconsin - Madison（威斯康星大学麦迪逊分校）； Amazon AGI（亚马逊AGI）； Aarhus University（奥胡斯大学）

AI总结针对湖泊时间序列数据不规则采样和跨湖泊泛化难题，提出预训练基础模型LakeFM，在模拟和观测数据上学习表征，实现优于现有模型的预测性能。

Comments KDD 2026

DOI: 10.1145/3770855.3819024

AI中文摘要

理解和预测湖泊动态对于监测湖泊和水库的水质及生态系统健康至关重要。尽管机器学习方法最近已被应用于生态时间序列数据，但现有工作假设时间和深度上的规则采样，并且难以在具有异质变量、深度和观测模式的湖泊之间泛化。为了解决这些局限性，我们引入了LakeFM，一个用于水生系统的基础模型，在包含模拟和观测湖泊的大规模生态数据集上预训练。通过广泛的实证评估，我们表明LakeFM学习了跨越更广泛湖泊层面特征的有意义表征，并在与现有时间序列基础模型和非基础模型相比时，实现了具有竞争力或通常更优的预测性能，同时产生与真实湖泊动态一致的物理上合理的预测。

英文摘要

Understanding and forecasting lake dynamics is critical for monitoring water quality and ecosystem health across lakes and reservoirs. While machine learning methods have been recently applied to ecological time-series data, existing works assume regular sampling in time and depth, and struggle to generalize across lakes with heterogeneous variables, depths, and observation patterns. To address these limitations, we introduce LakeFM, a foundation model for aquatic systems, pre-trained on large-scale ecological datasets comprising both simulated and observed lakes. Through extensive empirical evaluation, we show that LakeFM learns meaningful representations spanning broader lake-level characteristics, and achieves competitive or often superior forecasting performance compared to existing time-series foundation and non-foundation models, while producing physically plausible predictions consistent with real-world lake dynamics.

2606.11348 2026-06-11 cs.LG 新提交

SwiftCTS: Fast Cross-Design Prediction and Pareto Optimization of Clock Tree Metrics via Few-Shot Calibration

SwiftCTS: 通过少样本校准实现时钟树指标的快速跨设计预测与帕累托优化

Barsat Khadka, Kawsher Roxy, Md Rubel Ahmed

AI总结提出SwiftCTS框架，利用物理信息代理模型和K-shot乘法校准机制，在数秒内训练、亚毫秒推理，实现跨设计时钟树指标的准确预测与帕累托优化。

AI中文摘要

时钟树综合（CTS）是物理设计流程中计算成本高昂的阶段，需要迭代调用EDA工具以探索庞大的配置空间，从而优化功耗、线长和时序偏差。现有的机器学习方法需要昂贵的重新训练或微调周期来适应未见过的宏架构，并且在架构上与穷举组合搜索所需的数百万次评估不匹配。我们提出了SwiftCTS，一个物理信息代理框架，同时解决了这两个局限性。通过将轻量级、基于物理的统计特征与梯度提升集成相结合，SwiftCTS在CPU上训练时间不到五秒，且无需GPU支持即可实现亚毫秒级推理。为了处理分布外（OOD）设计而无需重新训练或微调，我们引入了一种K-shot乘法校准机制，该机制仅需一到两次物理参考运行即可锚定预测，将未见过的宏上的功耗预测误差从24.5%降低到3.3%，线长误差从56.6%降低到1%以下。将该引擎与进化优化器集成，SwiftCTS在十秒内评估了100,000个CTS配置，生成了在OpenROAD流程中经过物理验证的帕累托最优前沿。闭环验证确认了功耗和线长的预测误差低于0.5%，时序偏差预测在OOD基准上在五皮秒以内，在所有目标指标上始终优于默认工具启发式方法。代码公开于：https://github.com/BarsatKhadka/SwiftCTS

英文摘要

Clock Tree Synthesis (CTS) is a computationally expensive stage in the physical design flow, requiring iterative EDA tool invocations to navigate a vast configuration space for optimal power, wirelength, and timing skew. Existing machine learning approaches require computationally expensive retraining or fine-tuning cycles to adapt to unseen macro architectures and are architecturally mismatched to the millions of evaluations demanded by exhaustive combinatorial search. We present SwiftCTS, a physics-informed surrogate framework that addresses both limitations simultaneously. By coupling lightweight, physics-grounded statistical features with gradient-boosted ensembles, SwiftCTS trains in under five seconds on a CPU and delivers sub-millisecond inference without GPU support. To handle out-of-distribution (OOD) designs without retraining or fine-tuning, we introduce a K-shot multiplicative calibration mechanism that anchors predictions to just one or two physical reference runs, reducing power prediction error from 24.5\% to 3.3\% and wirelength error from 56.6\% to under 1\% on unseen macros. Integrating this engine with an evolutionary optimizer, SwiftCTS evaluates 100,000 CTS configurations in under ten seconds, yielding Pareto-optimal frontiers that are physically validated within the OpenROAD flow. Closed-loop validation confirms prediction errors below 0.5\% for power and wirelength, and timing skew predictions within five picoseconds on an OOD benchmark, consistently outperforming default tool heuristics across all target metrics. Code publicly available at: \href{https://anonymous.4open.science/r/SwiftCTS-7E6E}{https://github.com/BarsatKhadka/SwiftCTS}
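
The K-shot multiplicative calibration described in the abstract can be pictured as fitting a single scale factor from a handful of reference runs and applying it to every surrogate prediction for the unseen design. The sketch below averages the ratio of measured to predicted values over the K reference points; the averaging rule, the function names `kshot_scale` and `calibrate`, and the toy numbers are assumptions, since the abstract does not give the exact formula.

```python
import numpy as np


def kshot_scale(pred_ref, true_ref):
    """Scale factor from K reference runs: mean of measured / predicted."""
    pred_ref = np.asarray(pred_ref, dtype=float)
    true_ref = np.asarray(true_ref, dtype=float)
    return float(np.mean(true_ref / pred_ref))


def calibrate(pred, scale):
    """Apply the multiplicative correction to new surrogate predictions."""
    return np.asarray(pred, dtype=float) * scale


# Toy case: the surrogate under-predicts an unseen design by 20 percent.
pred_ref = [8.0, 16.0]   # surrogate outputs on K = 2 reference configurations
true_ref = [10.0, 20.0]  # values measured from the physical reference runs
scale = kshot_scale(pred_ref, true_ref)

new_pred = [12.0, 24.0]
calibrated = calibrate(new_pred, scale)
print(scale, calibrated)  # 1.25 and [15. 30.]
```

A single scalar cannot fix errors that vary across the configuration space, so this kind of correction only helps when the out-of-distribution bias is roughly proportional.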

2606.11382 2026-06-11 cs.LG q-bio.BM 新提交

GLACIER: A Multimodal Student-Teacher Foundation Model for Molecular Property Prediction

GLACIER：用于分子性质预测的多模态师生基础模型

Emily Nguyen, Yongchan Hong, Harsh Toshniwal, Yan Liu, Andreas Luttens

发表机构 * Department of Computer Science, University of Southern California（南加州大学计算机科学系）； Department of Quantitative and Computational Biology, University of Southern California（南加州大学定量与计算生物学系）； Amazon（亚马逊）； Department of Medical Biochemistry and Biophysics, Science for Life Laboratory, Karolinska Institutet（卡罗林斯卡学院医学生物化学与生物物理系，生命科学实验室）

AI总结提出GLACIER师生框架，通过融合分子图、SMILES和物理化学描述符三种模态，并利用大模型蒸馏，实现高效准确的分子性质预测。

AI中文摘要

深度学习模型有助于在数十亿候选化合物中发现具有定制性质的分子。然而，开发和部署最先进模型的计算负担不断增加，限制了其可扩展性。大多数大规模模型本质上是单模态的，忽视了利用互补分子数据模态的潜力。为了解决这些缺点，本文介绍了用于化学推理和探索的图-语言对齐表示（GLACIER）模型，这是一个师生框架，集成了分子图、SMILES字符串和物理化学描述符，以学习丰富的分子嵌入。我们的框架包括三个阶段：（1）我们在100,000个药物样分子上预训练三个学生编码器：用于分子图的消息传递神经网络、用于SMILES字符串的基于Transformer的编码器以及用于物理化学描述符的多层感知器；（2）我们使用新颖的Finsler几何感知模块融合这些学生模态；（3）通过对比学习，将来自大型教师模型（包括MiniMol和MolFormer）的互补知识蒸馏到一个轻量级模型中。我们证明GLACIER是一个稳健的框架，在复杂的分子性质预测任务中提供高预测性能和计算效率。我们的代码公开于 https://github.com/eemokey/glacier 。

英文摘要

Deep learning models facilitate the discovery of molecules with tailored properties among billions of candidate compounds. However, the computational burden to develop and deploy state-of-the-art models continuously increases, limiting their scalability. Most large-scale models are unimodal in nature and overlook the potential to leverage complementary molecular data modalities. To address these shortcomings, this paper introduces the Graph-Language Alignment for Chemical Inference and Exploration using Representations (GLACIER) model, a student-teacher framework that integrates molecular graphs, SMILES strings, and physicochemical descriptors to learn rich molecular embeddings. Our framework consists of three stages: (1) we pretrain three student encoders on 100,000 drug-like molecules: a message-passing neural network for molecular graphs, a transformer-based encoder for SMILES strings, and a multilayer perceptron for physicochemical descriptors, (2) we fuse these student modalities using a novel Finsler geometry-aware module, and (3) distill complementary knowledge from large teacher models, including MiniMol and MolFormer, into a single lightweight model via contrastive learning. We demonstrate that GLACIER is a robust framework that delivers high predictive performance and computational efficiency in complex molecular property prediction tasks. Our code is publicly available at https://github.com/eemokey/glacier.

2606.11463 2026-06-11 cs.LG cs.AI 新提交

LSTM-Based Detection of Structural Breaks in Property Insurance Loss Reserving: A Climate-Informed Approach

基于LSTM的财产保险损失准备金结构性断点检测：气候信息方法

Thomas Mbrice, Shashwat Panigrahi

发表机构 * Stony Brook University（石溪大学）

AI总结针对气候变化导致传统精算方法失效的问题，提出使用LSTM神经网络检测结构性断点，在佛罗里达和路易斯安那州数据上预期将巨灾年份准备金精度提升15-20%，并给出理论保证。

Comments 15 pages, 0 figures, whitepaper YC

详情

AI中文摘要

准确的损失准备金是保险公司偿付能力的基础，然而加速的气候驱动灾难系统地违反了传统精算方法所依赖的稳定性假设。本文提出一个研究计划，测试长短期记忆（LSTM）神经网络是否能够比链梯法、Bornhuetter-Ferguson法和Cape Cod法更快、更准确地检测和适应这些结构性断点。使用来自佛罗里达州和路易斯安那州超过15年的监管发展三角形数据，并辅以NOAA飓风强度指数和海面温度，我们假设在巨灾暴露年份准备金精度有15-20%的针对性提升，这一阈值基于先前的神经网络准备金文献以及本文发展的形式化收敛结果。除了实证验证，我们还发展了一个理论框架，以概率术语为基础进行LSTM结构性断点检测，并提供形式化的性能保证，以弥补测试期间巨灾事件数量有限的不足。我们记录了研究设计、方法论、预期贡献以及对局限性的坦诚评估。

英文摘要

Accurate loss reserving is foundational to insurer solvency, yet accelerating climate-driven catastrophes systematically violate the stability assumptions on which traditional actuarial methods depend. This white paper presents a research program testing whether Long Short-Term Memory (LSTM) neural networks can detect and adapt to these structural breaks faster and more accurately than the Chain Ladder, Bornhuetter-Ferguson, and Cape Cod methods. Using more than 15 years of regulatory development triangle data from Florida and Louisiana, enriched with NOAA hurricane intensity indices and sea surface temperatures, we hypothesize a targeted improvement of 15-20% in reserve accuracy for catastrophe-exposed years, a threshold grounded both in the prior neural network reserving literature and in the formal convergence results developed here. Beyond empirical validation, we develop a theoretical framework grounding LSTM structural break detection in probabilistic terms, providing formal performance guarantees that compensate for the limited number of catastrophe events in the test period. We document the research design, methodology, expected contributions, and a candid assessment of limitations.
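For readers unfamiliar with the Chain Ladder baseline named above, the following is a minimal sketch of the classical volume-weighted method on a cumulative development triangle. The triangle values are made up for illustration and the function names are hypothetical; nothing here comes from the white paper itself.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative triangle.

    triangle[i] holds the cumulative losses of accident year i, oldest
    year first, so each later row is one entry shorter than the previous.
    """
    periods = len(triangle[0])
    factors = []
    for j in range(periods - 1):
        # Only accident years already observed at development period j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def chain_ladder_ultimates(triangle):
    """Project each accident year's latest cumulative loss to ultimate."""
    factors = development_factors(triangle)
    ultimates = []
    for row in triangle:
        value = row[-1]
        for factor in factors[len(row) - 1:]:
            value *= factor
        ultimates.append(value)
    return ultimates


if __name__ == "__main__":
    toy = [[100.0, 150.0, 165.0], [110.0, 165.0], [120.0]]  # illustrative data
    ultimates = chain_ladder_ultimates(toy)
    reserves = [u - row[-1] for u, row in zip(ultimates, toy)]
    print(development_factors(toy), ultimates, reserves)
```

Because every accident year is projected with the same factors, the method assumes a stable development pattern, which is the stability assumption the abstract argues catastrophe years violate.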

2606.11490 2026-06-11 cs.LG cs.SY eess.SY 新提交

具有可处理不确定性量化的保结构神经代理模型

Handi Zhang, Adrienne M. Propp, Brooks Kinch, Houman Owhadi, Nathaniel Trask

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； Stanford University（斯坦福大学）； California Institute of Technology（加州理工学院）

AI总结提出一种结合混合有限元空间与高斯过程回归的保结构降阶模型，通过拓扑结构实现状态-通量关系的不确定性量化，并导出狄利克雷-诺伊曼映射的闭式后验不确定性。

详情

AI中文摘要

科学机器学习的最新进展为偏微分方程（PDE）的近实时求解提供了一种手段，但缺乏支持当代验证与确认的传统模拟器的理论基础。在这项工作中，我们构建了数据驱动的降阶模型，作为保结构、实时代理模型。值得注意的是，施加物理守恒结构的外微分也揭示了拓扑结构，我们利用该结构构建了状态-通量关系中不确定性的高斯过程（GP）表示，最终为目标量导出具有后验不确定性闭式表达的狄利克雷-诺伊曼映射。我们特别提出了由轻量级Transformer规定的传统Raviart-Thomas和$dgP_0$单元的保结构$H(\mathrm{div})$--$L^2$子空间。通过提出一个守恒律来学习与该子空间一致的降阶动力学，其中GP描述了体积之间的通量。这项工作依赖于混合有限元空间与GP回归之间的新颖接口；当训练被表述为最优恢复问题（ORP）时，得到的GP回归可以写成一个带有等式约束的优化问题，该约束施加了守恒结构，适用于快速的Schur补训练策略。然后，训练好的模型可以实时求解，得到由指定狄利克雷数据驱动的边界通量的闭式估计量。本文包括线性泛函的RKHS后验误差界以支持不确定性量化，以及数值实验证明了后验分布作为误差估计代理的准确性。

英文摘要

Recent advances in scientific machine learning provide a means of near-real-time solution to partial differential equations (PDEs), but lack the theoretical underpinnings of conventional simulators that support contemporary verification and validation. In this work, we construct data-driven reduced-order models that serve as structure-preserving, real-time surrogates. Remarkably, the exterior calculus that imposes physical conservation structure also exposes topological structure that we use to build a Gaussian process (GP) representation of uncertainty in state-flux relationships, ultimately yielding a Dirichlet-to-Neumann map for quantities of interest with closed-form expressions for posterior uncertainty. We specifically propose structure-preserving $H(\mathrm{div})$--$L^2$ subspaces of conventional Raviart--Thomas and $dgP_0$ elements prescribed by a lightweight transformer. Reduced-order dynamics consistent with this subspace are learned by posing a conservation law in which a GP describes the fluxes between volumes. This work hinges on a novel interface between mixed FEM spaces and GP regression; when training is posed as the optimal recovery problem (ORP), the resulting GP regression can be written as an optimization problem with equality constraints that impose a conservation structure, amenable to a fast Schur-complement training strategy. The trained model can then be solved in real time with closed-form estimators for boundary fluxes driven by prescribed Dirichlet data. The paper includes RKHS posterior error bounds for linear functionals to support uncertainty quantification, as well as numerical experiments demonstrating the accuracy of the posterior distribution as a surrogate for error estimation.

2606.11651 2026-06-11 cs.LG q-bio.QM stat.AP 新提交

DeepRHP: A Hybrid Variational Autoencoder for Designing Random Heteropolymers as Protein Mimics

DeepRHP：一种用于设计随机异聚合物作为蛋白质模拟物的混合变分自编码器

Shuni Li, Zhiyuan Ruan, Andy Shen, Ivan Jayapurna, Ting Xu, Haiyan Huang


AI总结提出混合变分自编码器DeepRHP，在半监督框架下结合特征VAE与经典VAE，通过潜在空间捕获关键化学特征与序列模式，指导随机异聚合物设计，实验验证其稳定膜蛋白的有效性。

Comments Oral presentation at AAAI 2023 Workshop on AI to Accelerate Science and Engineering

详情

AI中文摘要

由预定义单体组成的合成随机异聚合物（RHP）为设计类蛋白质材料提供了一种方法。如果设计得当，这些RHP可以模拟蛋白质的行为和功能。因此，需要计算工具来有效指导RHP设计。我们通过开发DeepRHP（一种在半监督框架下改进的变分自编码器（VAE）模型）来弥补这一差距。通过为经典VAE配备额外的基于特征的VAE，DeepRHP迫使潜在空间捕获关键化学特征的结构以及单个RHP序列模式。从这个意义上说，我们的方法是通用的，允许以混合方式纳入任何相关特征。我们通过提出在非原生环境中稳定膜蛋白（例如水通道蛋白Z）的潜在单体组成，并将我们的预测与已发表的结果进行交叉验证，证明了DeepRHP的有效性。我们的模型与真实RHP功能之间的一致性表明，利用混合自编码器架构来指导蛋白质和其他生物化合物的RHP设计具有巨大潜力。

英文摘要

Synthetic random heteropolymers (RHPs), consisting of a predefined set of monomers, offer an approach toward the design of protein-like materials. These RHPs, if designed appropriately, can mimic protein behavior and function. As such, there is a need for computational tools to efficiently guide RHP design. We bridge this gap by developing DeepRHP, a modified variational autoencoder (VAE) model under a semi-supervised framework. By equipping a classical VAE with an additional feature-based VAE, DeepRHP forces the latent space to capture structures of critical chemical features as well as individual RHP sequence patterns. In this sense, our method is versatile by allowing any relevant features to be incorporated in a hybrid manner. We demonstrate the effectiveness of DeepRHP by suggesting potential monomer compositions that stabilize membrane proteins (e.g. Aquaporin Z) in non-native environments and cross-validating our prediction with published results. The concordance between our model and true RHP function suggests strong potential in utilizing hybrid autoencoder architectures to guide RHP design for proteins and other biological compounds.

2606.11794 2026-06-11 cs.LG cs.AI 新提交

Multimodal Ordinal Modeling of Alzheimer's Disease Severity Using Structural MRI and Clinical Data

使用结构MRI和临床数据的阿尔茨海默病严重程度的多模态序数建模

Boris-Stephan Rauchmann, Jonathan Laib, Buse Ercik, Robert Perneczky, Sergio Altares-López

发表机构 * Department of Neuroradiology, LMU University Hospital, Ludwig Maximilian University of Munich（神经放射科，慕尼黑路德维希-马克西米利安大学医院，慕尼黑路德维希-马克西米利安大学）； Department of Psychiatry and Psychotherapy, LMU University Hospital, Ludwig Maximilian University of Munich（精神病学与心理治疗系，慕尼黑路德维希-马克西米利安大学医院，慕尼黑路德维希-马克西米利安大学）

AI总结提出一种注意力增强的多模态序数回归框架，整合MRI、人口统计学和遗传数据，用于自动且可解释的AD严重程度分期，在ADNI等数据集上验证，序数模型在相邻阶段准确率（0.970）和与临床分期一致性（QWK 0.549）上表现最佳。

Comments 18 pages. Submitted to journal for review

详情

AI中文摘要

神经退行性疾病如阿尔茨海默病（AD）需要准确且可扩展的工具来评估疾病严重程度，然而当前的临床分期仍然耗时且易变。我们提出了一种带有注意力增强的多模态机器学习框架，结合序数回归，用于自动且可解释的AD严重程度分期。该框架整合了T1加权MRI与人口统计学和遗传变量，并使用序数和非序数预测头比较了单模态和多模态架构。模型使用来自ADNI、AIBL和NIFD数据集的队列分层划分进行训练和验证。严格保留的测试集由排除在所有训练、验证、预处理和超参数调优过程之外的受试者构建，并在整个过程中采用受试者级划分以防止数据泄漏。在单模态方法中，T1加权MRI模型在相邻阶段准确率（0.963）和与临床分期的一致性（QWK 0.444）上略高于表格模型（QWK 0.433）。整合成像、人口统计学和遗传信息提高了整体性能。多模态非序数基线实现了最低的预测误差（MAE 0.340），而序数多模态模型实现了最高的相邻阶段准确率（0.970）和与临床分期的最强一致性（QWK 0.549）。这些发现表明，序数公式更好地捕捉了CDR量表的顺序结构，并产生与临床分期更一致的预测。使用Grad CAM++和SHAP的可解释性分析展示了解剖学和临床上合理的模型行为，支持透明决策。总体而言，基于注意力的多模态学习与序数回归代表了一种稳健、可解释且可扩展的方法，用于自动AD严重程度分期和AI辅助临床决策支持。

英文摘要

Neurodegenerative diseases such as Alzheimer's disease (AD) require accurate and scalable tools for assessing disease severity, yet current clinical staging remains time-intensive and prone to variability. We propose an attention-enhanced multimodal machine learning framework with ordinal regression for automated and interpretable AD severity staging. The framework integrates T1-weighted MRI with demographic and genetic variables and compares unimodal and multimodal architectures using ordinal and non-ordinal prediction heads. Models were trained and validated using cohort-stratified splits derived from the ADNI, AIBL, and NIFD datasets. A strictly held-out test set was constructed using subjects excluded from all training, validation, preprocessing, and hyperparameter tuning procedures, with subject-level splitting employed throughout to prevent data leakage. Among unimodal approaches, the T1-weighted MRI model achieved slightly higher adjacent-stage accuracy (0.963) and agreement with clinical staging (QWK 0.444) than the tabular model (QWK 0.433). Integrating imaging, demographic, and genetic information improved overall performance. The multimodal non-ordinal baseline achieved the lowest prediction error (MAE 0.340), whereas the ordinal multimodal model achieved the highest adjacent-stage accuracy (0.970) and strongest agreement with clinical staging (QWK 0.549). These findings indicate that ordinal formulations better capture the ordered structure of the CDR scale and yield predictions more consistent with clinical staging. Explainability analyses using Grad CAM++ and SHAP demonstrated anatomically and clinically plausible model behavior, supporting transparent decision-making. Overall, attention-based multimodal learning with ordinal regression represents a robust, interpretable, and scalable approach for automated AD severity staging and AI-assisted clinical decision support.
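The two headline metrics above can be made concrete with a short sketch. It assumes that stages are encoded as integers 0..K-1 and that "adjacent-stage accuracy" counts a prediction as correct when it is at most one stage away from the label; the abstract does not define the metric, so that reading is an assumption. The quadratic weighted kappa follows the standard definition.

```python
def adjacent_accuracy(y_true, y_pred):
    """Share of predictions within one ordinal stage of the true stage."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1)
    return hits / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic disagreement weights (QWK)."""
    n = len(y_true)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independent marginals.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one class occurs")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    truth = [0, 1, 2, 3]
    guess = [1, 1, 0, 3]
    print(adjacent_accuracy(truth, guess))
    print(quadratic_weighted_kappa(truth, guess, num_classes=4))
```

The quadratic weights penalise a two-stage miss four times as much as a one-stage miss, which is why QWK can rank models differently from MAE or plain accuracy.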

2606.11868 2026-06-11 cs.LG q-bio.QM 新提交

MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

MemNovo: 回顾谱图以实现质谱中平衡的从头肽段测序

Dongxin Lyu, Jingbo Zhou, Hongxin Xiang, Yuqiang Li, Jun Xia

发表机构 * Westlake University（西湖大学）； Hunan University（湖南大学）； Shanghai Artificial Intelligence Laboratory（上海人工智能实验室）； HKUST-GZ & HKUST（香港科技大学（广州）与香港科技大学）

AI总结针对现有Transformer模型在从头肽段测序中过度依赖生成序列先验而忽视谱图证据的问题，提出训练无关的即插即用机制MemNovo，通过建立持久谱记忆库和超保守残差连接在解码阶段注入谱特征，显著提升氨基酸和肽段精度。

Comments Code: https://github.com/AIMS-Lab-HKUSTGZ/MemNovo

详情

DOI: 10.1145/3770855.3818848
Journal ref: Knowledge Discovery and Data Mining(KDD), 2026

AI中文摘要

从串联质谱中进行从头肽段测序是蛋白质组学的关键，能够在不依赖参考数据库的情况下识别新型肽段。尽管基于Transformer的编码器-解码器模型已取得显著性能，但我们发现其推理动态中存在关键病理现象。通过全面的特征缩放实验，我们证明现有的自回归肽段解码器倾向于过度依赖生成序列的先验，同时逐渐未能充分利用输入质谱中的细粒度物理证据。这一现象导致次优结果，生成的肽段序列在生物学上合理但不符合输入谱图。为解决此问题，我们提出MemNovo，一种无需训练且即插即用的机制，在推理时重新平衡肽段和谱图的贡献。MemNovo通过建立持久的谱记忆库，并通过超保守残差连接将检索到的特征直接注入最终解码阶段，从而缓解信息瓶颈。理论分析证实，该机制恢复了解码器状态与原始谱图之间的互信息。在Nine Species基准上使用两个代表性基线模型Casanovo和InstaNovo进行的大量实验表明，MemNovo持续提高了氨基酸精度和肽段精度，对于Casanovo，肽段精度相对提升高达39.1%，对于InstaNovo提升高达3.9%，且计算开销可忽略不计。

英文摘要

De novo peptide sequencing from tandem mass spectrometry is pivotal in proteomics, enabling identification of novel peptides without reference databases. While recent Transformer-based encoder-decoder models have achieved remarkable performance, we uncover a critical pathology in their inference dynamics. Through comprehensive feature scaling experiments, we demonstrate that existing auto-regressive peptide decoders tend to over-rely on generated-sequence priors while progressively under-utilizing fine-grained physical evidence from the input mass spectrum. This phenomenon leads to suboptimal results, where generated peptide sequences are biologically plausible yet not faithful to the input spectrum. To rectify this, we propose MemNovo, a training-free and plug-and-play mechanism that re-balances peptide and spectral contributions at inference time. MemNovo alleviates the information bottleneck by establishing a persistent spectral memory bank and injecting retrieved features directly into the final decoding stage via an ultra-conservative residual connection. Theoretical analysis confirms that this mechanism restores the mutual information between the decoder state and the raw spectrum. Extensive experiments on the Nine Species benchmark with two representative baselines, Casanovo and InstaNovo, demonstrate that MemNovo consistently improves both amino acid precision and peptide precision, achieving up to 39.1% relative improvement in peptide precision for Casanovo and up to 3.9% for InstaNovo, with negligible computational overhead.

2606.11893 2026-06-11 cs.LG cs.AI cs.CL q-bio.NC 新提交

使用可解释性作为训练时可靠性信号实现高效心电图分类

Veerendhra Kumar Dangeti, Xiao Gu, Ying Weng, Shreyank N Gowda

发表机构 * School of Computer Science, University of Nottingham（诺丁汉大学计算机科学学院）； Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford（牛津大学工程科学系生物医学工程研究所）； School of Computer Science, University of Nottingham Ningbo China（宁波诺丁汉大学计算机科学学院）

AI总结提出ERTS方法，利用训练中的解释质量（Grad-CAM注意力图）区分信息性和不可靠不确定性，过滤低聚焦样本，在三个ECG数据集上提升macro-F1并降低训练成本。

详情

AI中文摘要

训练用于临床时间序列分析的深度神经网络计算需求高，但许多医疗环境缺乏重复模型开发和部署所需的资源。这一挑战在心电图分类中尤为明显，大数据集和长训练计划使效率变得重要。渐进式数据丢弃通过从梯度更新中排除已学习的样本来降低训练成本，但它依赖模型置信度，可能保留因噪声或歧义而难以处理而非有用信号的样本。在这项工作中，我们引入了ERTS，一种基于可解释性的可靠性训练信号，用于高效心电图分类。ERTS在训练期间利用解释质量来区分信息性和不可靠的不确定性。基于渐进式数据选择，我们计算候选样本的Grad-CAM注意力图，并推导出一个聚焦分数，衡量模型预测是否得到连贯且局部化模式的支持。低聚焦样本被过滤掉，而具有有意义注意力的样本优先进行梯度更新。我们在三个ECG数据集和多个骨干架构上评估ERTS，显示macro-F1的一致提升以及有效训练成本的降低。这些结果表明，解释质量可以作为改善临床时间序列学习中效率和可靠性的实用信号。代码将发布。

英文摘要

Training deep neural networks for clinical time-series analysis is computationally demanding, yet many healthcare settings lack the resources required for repeated model development and deployment. This challenge is particularly evident in electrocardiogram classification, where large datasets and long training schedules make efficiency practically important. Progressive Data Dropout reduces training cost by excluding samples from gradient updates once they are learned, but it relies on model confidence and may retain samples that are difficult due to noise or ambiguity rather than useful signal. In this work, we introduce ERTS, an explainability-based reliability training signal for efficient ECG classification. ERTS uses explanation quality during training to distinguish between informative and unreliable uncertainty. Building on progressive data selection, we compute Grad-CAM attention maps for candidate samples and derive a focus score that measures whether model predictions are supported by coherent and localised patterns. Samples with low focus are filtered out, while those with meaningful attention are prioritised for gradient updates. We evaluate ERTS across three ECG datasets and multiple backbone architectures, showing consistent improvements in macro-F1 alongside reduced effective training cost. These results suggest that explanation quality can serve as a practical signal for improving both efficiency and reliability in clinical time-series learning. Code will be released.
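The abstract does not define the focus score. Purely as an illustration of the filtering idea, the sketch below assumes a non-negative one-dimensional attention map (as Grad-CAM produces after its ReLU) and scores it by one minus its normalised entropy, so that a flat map scores near 0 and a single sharp peak scores 1. Both `focus_score` and `select_for_update`, and the threshold value, are hypothetical and not the ERTS definition.

```python
import math


def focus_score(attention):
    """1 - normalised entropy of a non-negative attention map (assumed form)."""
    total = sum(attention)
    n = len(attention)
    if total <= 0 or n < 2:
        return 0.0
    probs = [a / total for a in attention]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(n)


def select_for_update(attention_maps, threshold=0.2):
    """Indices of samples whose attention is focused enough to train on."""
    return [i for i, amap in enumerate(attention_maps)
            if focus_score(amap) >= threshold]


if __name__ == "__main__":
    maps = [
        [0.0, 0.0, 1.0, 0.0],      # sharply localised
        [0.25, 0.25, 0.25, 0.25],  # diffuse
    ]
    print([focus_score(m) for m in maps], select_for_update(maps))
```

Any concentration measure could stand in here; entropy is used only because it needs no tuning beyond the threshold.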

2606.12334 2026-06-11 cs.LG cs.RO 新提交

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

傅里叶特征让智能体通过模仿学习学习高精度策略

Balázs Gyenes, Emiliyan Gospodinov, Jan Frieling, Enrico Krohmer, Nicolas Schreiber, Xiaogang Jia, Niklas Freymuth, Gerhard Neumann

发表机构 * Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院）； FZI Research Center for Information Technology（FZI信息技术研究中心）

AI总结提出在点云编码器中使用傅里叶特征映射，解决神经网络低频偏好导致的高精度操作问题，在多个基准和真实机器人上显著提升性能。

Comments Published as a conference paper at ICML 2026

详情

AI中文摘要

高精度机器人操作需要细粒度的空间推理，由于深度模糊和透视尺度问题，仅使用RGB的策略通常难以实现。直接利用3D信息（如基于点云的策略）比纯图像策略提供了更强的几何先验，但其性能仍然高度依赖于任务。我们假设这种差异可能是由于神经网络倾向于学习低频函数的频谱偏差，这尤其影响以缓慢变化的笛卡尔特征为条件的架构。因此，我们提出将点云从笛卡尔空间映射到高维傅里叶空间，有效地使点云编码器能够直接访问高频特征。我们通过实验验证了傅里叶特征在RoboCasa和ManiSkill3基准测试中的具有挑战性的操作任务以及真实机器人设置上的效果。尽管简单，我们发现傅里叶特征在不同的编码器架构和基准测试中提供了显著的好处，并且对超参数具有鲁棒性。我们的结果表明，傅里叶特征让策略比笛卡尔特征更有效地利用几何细节，显示了其作为基于点云的模仿学习的通用工具的潜力。我们在项目页面上提供源代码和视频：https://fourier-il.github.io/fourier-il

英文摘要

High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information directly, such as those based on point clouds, offer a stronger geometric prior over purely image-based ones, yet their performance remains highly task-dependent. We hypothesize that this discrepancy may be due to the spectral bias of neural networks towards learning low frequency functions, which especially affects architectures conditioned on slow-moving Cartesian features. We thus propose to map point clouds from Cartesian space into high-dimensional Fourier space, effectively equipping the point cloud encoder with direct access to high-frequency features. We experimentally validate the use of Fourier features on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks and on a real robot setup. Despite their simplicity, we find that Fourier features provide significant benefits across diverse encoder architectures and benchmarks and are robust across hyperparameters. Our results indicate that Fourier features let policies leverage geometric details more effectively than Cartesian features, showing their potential as a general-purpose tool for point cloud-based imitation learning. We provide source code and videos on our project page: https://fourier-il.github.io/fourier-il
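The mapping from Cartesian to Fourier space can be sketched as follows. The abstract does not give the exact frequencies, so this assumes the common positional-encoding form with power-of-two frequency bands; `num_bands` and `include_input` are illustrative parameters, not the paper's hyperparameters.

```python
import math


def fourier_features(point, num_bands=4, include_input=True):
    """Map one 3D point to sin/cos features at frequencies 2^k * pi.

    Assumed form: for each band k and each coordinate c, append
    sin(2^k * pi * c) and cos(2^k * pi * c).
    """
    features = list(point) if include_input else []
    for k in range(num_bands):
        frequency = (2.0 ** k) * math.pi
        for coordinate in point:
            features.append(math.sin(frequency * coordinate))
            features.append(math.cos(frequency * coordinate))
    return features


def encode_point_cloud(points, num_bands=4):
    """Apply the mapping to every point before it reaches the encoder."""
    return [fourier_features(p, num_bands) for p in points]


if __name__ == "__main__":
    cloud = [(0.10, 0.20, 0.30), (0.11, 0.20, 0.30)]  # points 1 cm apart
    for row in encode_point_cloud(cloud, num_bands=6):
        print(len(row), [round(v, 3) for v in row[-6:]])
```

Two points that differ by a centimetre are nearly identical in raw coordinates but differ visibly in the highest-frequency bands, which is the access to fine geometric detail the abstract refers to.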

2606.11199 2026-06-11 cs.CL cs.AI cs.IR cs.LG 交叉投稿

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

NightFeats @ MMU-RAGent NeurIPS 2025: 面向文本到文本轨道的上下文优化多智能体RAG系统

Quentin Fever, Naziha Aslam

AI总结提出一种结构化多智能体RAG系统NightFeats，通过检索、策展和组合三阶段分解知识合成，引入时序语义重排序、矛盾协调和引用保留架构，在MMU-RAGent竞赛中超越商业基线。

Comments 5 pages, 1 figure, 1 table. NeurIPS 2025 Competition Track (MMU-RAGent). System developed October 2025

详情

AI中文摘要

我们提出NightFeats，一个结构化的多智能体检索增强生成（RAG）系统，提交至NeurIPS 2025的MMU-RAGent竞赛，并在文本到文本轨道中获得最佳动态评估奖。本文并非以基准最大化为目标，而是提出一个原则性流水线，将知识合成分解为三个协调阶段：检索、策展和组合，每个阶段由显式的中间表示和交接契约控制。受智能体上下文工程（ACE）启发，该系统引入时序语义重排序、有界矛盾协调和保留引用的组合作为核心架构原语。竞赛结果表明，NightFeats在LLM-as-a-Judge和人类Likert评估中超越了包括Claude-SonnetV2和Nova-Pro在内的商业基线，证实了架构透明性和可验证的证据基础比仅针对自动相似度指标进行优化的系统更符合人类偏好。

英文摘要

We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather than targeting benchmark maximization, this work proposes a principled pipeline that decomposes knowledge synthesis into three coordinated phases: retrieval, curation, and composition, each governed by explicit intermediate representations and handoff contracts. Inspired by Agentic Context Engineering (ACE), the system introduces temporal-semantic reranking, bounded contradiction reconciliation, and citation-preserving composition as core architectural primitives. Competition results show that NightFeats surpasses proprietary baselines including Claude-SonnetV2 and Nova-Pro on LLM-as-a-Judge and Human Likert evaluations, confirming that architectural transparency and verifiable evidence grounding are better aligned with human preferences than systems optimizing narrowly for automatic similarity metrics.

2606.11249 2026-06-11 cs.RO cs.LG cs.MA 交叉投稿

MASK: Multi-Agent Semantic K-Scheduling for Risk-Sensitive 6G Robotics

MASK: 面向风险敏感的6G机器人学的多智能体语义K调度

Ahmet Gunhan Aydin, Elif Tugce Ceran

发表机构 * Middle East Technical University（中东技术大学）； Aselsan Inc.（阿塞尔桑公司）

AI总结针对6G机器人协同感知中频谱资源受限的问题，提出多智能体语义K调度（MASK）架构，通过仲裁辅助语义信息门控（A-SIG）机制仅调度语义重要性最高的K个智能体，结合自监督全局编码器和分布策略，在严格带宽限制下实现鲁棒的风险感知协调，性能接近无通信约束基线。

详情

AI中文摘要

实现6G连接机器人学的愿景需要协调高性能协作控制与物理无线信道的刚性频谱限制。在现实的协作感知场景中，频谱资源被量化为有限的物理资源块或正交子载波，使得所有智能体同时传输不可行。为了解决这一问题，我们提出了多智能体语义K调度（MASK），一种控制架构，旨在在严格的瞬时带宽限制下维持鲁棒的风险感知协调。我们引入了仲裁辅助语义信息门控（A-SIG），一种轻量级协调机制，通过基于本地计算的语义重要性分数仅调度前K个智能体来强制执行硬接入约束。通过将这些优先观测聚合为紧凑的潜在状态，自监督全局编码器使得分布策略能够在数据稀疏的情况下减轻尾部风险。我们在多个基准上评估了MASK，证明即使信道接入限制为群体大小的一小部分，其性能也能匹配无通信约束的基线。此外，该框架对数据包擦除具有固有的弹性，验证了语义调度作为资源受限的6G系统的关键使能技术。

英文摘要

Realizing the vision of 6G connected robotics requires reconciling high-performance collaborative control with the rigid spectral limitations of physical wireless channels. In realistic collaborative sensing scenarios, spectral resources are quantized into finite physical resource blocks or orthogonal subcarriers, rendering simultaneous transmission by all agents infeasible. To address this, we propose Multi-Agent Semantic K-Scheduling (MASK), a control architecture designed to sustain robust, risk-aware coordination under strict instantaneous bandwidth caps. We introduce Arbiter-Assisted Semantic Information Gating (A-SIG), a lightweight coordination mechanism that enforces hard access constraints by scheduling only the top-K agents based on locally computed semantic importance scores. By aggregating these prioritized observations into a compact latent state, a self-supervised global encoder enables a distributional policy to mitigate tail risks despite data sparsity. We evaluate MASK across diverse benchmarks, demonstrating that it matches the performance of communication-unconstrained baselines even when channel access is restricted to a small fraction of the swarm size. Furthermore, the framework exhibits inherent resilience to packet erasures, validating semantic scheduling as a critical enabler for resource-constrained 6G systems.
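The hard access constraint described for A-SIG, where only the top-K agents by semantic importance may transmit, can be sketched in a few lines. How the importance scores are computed is not stated in the abstract, so they are plain inputs here, and the tie-breaking rule (lower agent index wins) is an assumption.

```python
import heapq


def schedule_top_k(importance_scores, k):
    """Return the indices of the k agents allowed to transmit this step.

    importance_scores[i] is agent i's locally computed score. Ties are
    broken in favour of the lower index (an illustrative choice).
    """
    if k <= 0:
        return []
    chosen = heapq.nlargest(
        k,
        range(len(importance_scores)),
        key=lambda i: (importance_scores[i], -i),
    )
    return sorted(chosen)


def gate_observations(observations, importance_scores, k):
    """Keep only the scheduled agents' observations; others send nothing."""
    scheduled = set(schedule_top_k(importance_scores, k))
    return {i: obs for i, obs in enumerate(observations) if i in scheduled}


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.7]
    print(schedule_top_k(scores, k=2))
    print(gate_observations(["a", "b", "c", "d"], scores, k=2))
```

With K fixed by the number of available resource blocks, the number of transmissions per step never exceeds the bandwidth cap regardless of swarm size.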

2606.11256 2026-06-11 physics.chem-ph cs.LG cs.NE 交叉投稿

My Chemical Harness: Evolutionary Molecular Design over Synthetic Pathways with Large Language Model Agents

我的化学缰绳：基于合成路径的大语言模型智能体进化分子设计

César Ojeda, Darius A. Faroughy, Maryam Karimi, Payam Zarrintaj, Mir Mehdi Seyedebrahimi, Martín Carballo-Pacheco

发表机构 * Institute of Mathematics, Faculty of Science, University of Potsdam（数学研究所，科学学院，波茨坦大学）； NHETC, Department of Physics and Astronomy, Rutgers University（NHETC，物理与天文学系，罗格斯大学）； Potsdam Transfer, University of Potsdam（波茨坦转移，波茨坦大学）； E3 LLC

AI总结提出一种以可执行合成路径为种群、大语言模型仅作策略控制器的进化框架，在可溶性环氧化物水解酶代理任务上达到最优性能。

Comments 27 pages | 10 figures

详情

AI中文摘要

空间掩蔽回归揭示电生理记录中的局部和分布式可预测性

Maryam Ostadsharif Memar, Nima Dehghani

发表机构 * Department of Electrical and Computer Engineering, IUT（电气与计算机工程系）； McGovern Institute for Brain Research, Massachusetts Institute of Technology (MIT)（脑科学研究所，麻省理工学院（MIT））

AI总结提出空间掩蔽回归（SMR）框架，通过逐步增大掩蔽区域量化电极信号中局部与分布式信息的贡献，应用于颅内和头皮脑电数据，发现邻近电极贡献显著但非全部，表明信号同时包含局部冗余和全局结构。

详情

AI中文摘要

神经记录通常被解释为局部测量，但任何单个传感器的信号也可能反映分布在整个网络中的结构化活动。这引出一个基本问题：电极信号在多大程度上反映底层系统中的局部信息与分布式信息？更具体地说，电极的活动有多少由其邻近区域携带，又有多少嵌入在阵列的更广泛分布中？我们通过空间掩蔽回归（SMR）框架解决这一问题，该框架从其余电极重建每个电极的时间序列，同时排除目标周围可配置的邻域。通过逐步增大掩蔽，空间局部性成为实验控制，用于量化在移除附近通道后有多少预测信息幸存。我们将SMR应用于具有异质电极覆盖的颅内脑电图（iEEG）和具有标准化导联组合的感觉运动皮层头皮脑电图（EEG）。使用原始信号与重建信号之间的距离相关性，我们发现两种模态中均存在强烈的受试者内重建，即使排除局部邻域后仍有显著的可预测性，且EEG中的跨受试者转移明显强于iEEG。掩蔽显示邻近电极对重建贡献显著，但并非全部，表明单个通道既反映局部冗余也反映更广泛的分布式结构。保留选定边际或谱特性但破坏相位结构或时间顺序的替代数据显著降低了性能，支持SMR依赖于结构化时间和跨通道组织而非仅边际统计的结论。这些结果将SMR定位为量化记录中局部与分布式信息平衡的可解释框架。

英文摘要

Neural recordings are often interpreted as local measurements, yet the signal at any one sensor can also reflect structured activity distributed across the broader network. This raises a basic question: to what extent does an electrode's signal reflect local versus distributed information in the underlying system? More specifically, how much of an electrode's activity is carried by its immediate neighborhood, and how much is embedded more broadly across the array? We address this with a Spatially Masked Regression (SMR) framework that reconstructs each electrode's timeseries from the remaining electrodes while excluding a configurable neighborhood around the target. By progressively increasing this mask, spatial locality becomes an experimental control for quantifying how much predictive information survives after nearby channels are withheld. We apply SMR to intracranial EEG with heterogeneous electrode coverage and to scalp EEG with standardized montages over sensorimotor cortex. Using distance correlation between original and reconstructed signals, we find strong within-subject reconstruction in both modalities, substantial residual predictability even when local neighbors are excluded, and markedly stronger cross-subject transfer in EEG than in iEEG. Masking shows that nearby electrodes contribute strongly to reconstruction but do not account for all of it, indicating that individual channels reflect both local redundancy and broader distributed structure. Surrogates that preserve selected marginal or spectral properties while disrupting phase structure or temporal ordering substantially reduce performance, supporting the conclusion that SMR depends on structured temporal and cross-channel organization rather than on marginal statistics alone. These results position SMR as an interpretable framework for quantifying the balance between local and distributed information in recordings.
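The masking step of SMR, choosing which electrodes may serve as predictors for a given target, can be sketched as below. This covers only the neighbourhood exclusion; the regression model itself is not specified in the abstract and is left out. Treating the neighbourhood as a Euclidean radius around the target is an assumption.

```python
import math


def unmasked_predictors(coords, target, mask_radius):
    """Indices of electrodes usable to reconstruct the target channel.

    coords[i] is the position of electrode i. The target itself and every
    electrode within mask_radius of it are withheld.
    """
    centre = coords[target]
    return [
        i for i, position in enumerate(coords)
        if i != target and math.dist(position, centre) > mask_radius
    ]


def mask_sweep(coords, target, radii):
    """Predictor sets for progressively larger masks around one target."""
    return {radius: unmasked_predictors(coords, target, radius) for radius in radii}


if __name__ == "__main__":
    layout = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]  # toy positions
    print(mask_sweep(layout, target=0, radii=[0.0, 1.5, 10.0]))
```

Fitting a reconstruction model on each predictor set and tracking how its accuracy falls as the radius grows gives the local-versus-distributed curve the abstract describes.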

2606.11500 2026-06-11 eess.IV cs.CE cs.IT cs.LG math.IT q-bio.NC 交叉投稿

近距系外行星的机器学习聚类：与卵石吸积的联系

Yi Duann, Anders Johansen, Haiyang S. Wang, H. Jens Hoeijmakers

发表机构 * Center for Star and Planet Formation（恒星与行星形成中心）； Globe Institute（全球研究所）； University of Copenhagen（哥本哈根大学）； Lund Observatory（隆德天文台）； Department of Physics（物理系）； Lund University（隆德大学）； Graduate Institute of Astronomy（天文研究所）； National Central University（国立中央大学）

AI总结利用高斯混合模型对近距系外行星进行无监督聚类，揭示其内在子群，并通过卵石吸积合成种群解释形成路径差异。

详情

AI中文摘要

近距系外行星展现出由形成条件和迁移过程塑造的广泛轨道构型和物理性质。尽管种群合成模型预测了不同的行星种群，但在观测到的系外行星与合成种群之间建立定量联系仍然具有挑战性。我们使用物理驱动的动力学参数研究近距系外行星的内在组织，并将所得种群与卵石吸积形成路径联系起来。将两阶段高斯混合模型（GMM）应用于观测到的近距系外行星样本，在由行星-恒星相互作用的动力学描述符主导的特征空间中进行无监督概率聚类。将所得聚类映射到统计驱动的三维参数空间中的卵石吸积合成种群。然后使用与形成相关的量（包括气体可用性、气体分数和冰岩质量比）来解释映射的种群。我们在不施加预定义分类边界的情况下识别出统计上支持的子群，包括超大质量气态巨行星、热巨行星、暖木星主导系统和低质量巨行星。映射的合成种群揭示了形成时间、气体吸积和固体增长历史的系统性差异。特别是，超大质量气态巨行星比热巨行星和暖木星主导种群更倾向于与更早的形成时期相关联。这些结果表明，物理驱动的机器学习方法可以为观测到的系外行星种群与理论行星形成路径之间的联系提供统计上稳健的框架。

英文摘要

Close-in exoplanets exhibit a wide range of orbital architectures and physical properties shaped by both formation conditions and migration processes. Although population-synthesis models predict distinct planetary populations, establishing a quantitative connection between observed exoplanets and synthetic populations remains challenging. We investigate the intrinsic organisation of close-in exoplanets using physically motivated dynamical parameters and connect the resulting populations to pebble-accretion formation pathways. A two-stage Gaussian mixture model (GMM) is applied to an observed sample of close-in exoplanets, performing unsupervised probabilistic clustering in a feature space dominated by dynamical descriptors of planet-star interactions. The resulting clusters are mapped onto a pebble-accretion synthetic population within a statistically motivated three-dimensional parameter space. Formation-related quantities, including gas availability, gas fraction, and ice-rock mass ratio, are then used to interpret the mapped populations. We identify statistically supported sub-populations without imposing predefined classification boundaries, including very-massive gas giants, hot giants, warm-Jupiter-dominated systems, and lower-mass giants. The mapped synthetic populations reveal systematic differences in formation timing, gas accretion, and solid growth histories. In particular, very-massive gas giants are preferentially associated with earlier formation epochs than hot-giant and warm-Jupiter-dominated populations. These results demonstrate that physically motivated machine-learning approaches can provide a statistically robust framework for linking observed exoplanet populations to theoretical planet formation pathways.

2606.11743 2026-06-11 cs.RO cs.GR cs.LG 交叉投稿

TacCoRL: Integrating Tactile Feedback into VLA via Simulation

TacCoRL: 通过仿真将触觉反馈集成到视觉-语言-动作模型中

Siyu Ma, Yuqi Liang, Chang Yu, Yunuo Chen, Hao Su, Yixin Zhu, Yin Yang, Chenfanfu Jiang

发表机构 * University of California, Los Angeles（加利福尼亚大学洛杉矶分校）； University of California, San Diego（加利福尼亚大学圣迭戈分校）； University of Electronic Science and Technology of China（电子科技大学）； Peking University（北京大学）； University of Utah（犹他大学）

AI总结提出TacCoRL框架，通过仿真与真实联合训练和强化学习，将触觉反馈注入视觉-语言-动作策略，在接触密集型任务中平均成功率提升22.5%。

详情

AI中文摘要

视觉-语言-动作（VLA）模型为机器人操作提供了强大的视觉、语言和动作先验，但仅凭视觉观察往往缺失接触密集型任务所需的局部接触状态。我们提出TacCoRL，一个可扩展的框架，将触觉反馈注入VLA策略，并通过仿真-真实联合训练和基于仿真的强化学习（RL）进行改进，无需大规模触觉预训练或广泛的真实世界接触探索。关键思想不仅是添加触觉作为输入，而是学习在接近失败状态下接触读数应如何调节动作响应，这些状态在演示中罕见且在硬件上收集风险高。我们使用真实对齐的仿真器作为接触交互的闭环训练环境。混合的仿真和真实轨迹首先在预训练策略中热启动触觉条件动作。具有可验证任务奖励的强化学习随后通过仿真接触回滚优化策略。它强化导致任务完成的触觉条件动作，而真实轨迹上的监督目标将精炼策略锚定到部署时的视觉、触觉和动作分布。所得策略直接转移到真实机器人，无需特权仿真状态或在线真实世界RL。在四个双臂接触密集型任务中，最终的视觉-触觉策略平均成功率达到72.5%，而基线为50.0%。结果视频和更多细节见 https://tac-corl.github.io/

英文摘要

Vision-language-action (VLA) models provide strong visual, language, and action priors for robot manipulation, but visual observations alone often miss the local contact state required for contact-rich tasks. We present TacCoRL, a scalable framework that injects Tactile feedback into VLA policies and improves them through sim-real Co-training and simulation-based reinforcement learning (RL), without requiring large-scale tactile pretraining or extensive real-world contact exploration. The key idea is not only adding touch as an input, but learning how contact readings should modulate action responses in near-failure states that are rare in demonstrations and risky to collect on hardware. We use a real-aligned simulator as a closed-loop training environment for contact interaction. Mixed simulated and real trajectories first warm-start tactile-conditioned actions in the pretrained policy. Reinforcement learning with verifiable task rewards then optimizes the policy using simulated contact rollouts. It reinforces tactile-conditioned actions that lead to task completion, while a supervised objective on real trajectories keeps the refined policy anchored to deployment visual, tactile, and action distributions. The resulting policy transfers directly to the real robot without privileged simulation state or online real-world RL. Across four bimanual contact-rich tasks, the final visuo-tactile policy achieves an average success rate of 72.5%, compared to baseline of 50.0%. Result videos and more details are available at https://tac-corl.github.io/

2606.11814 2026-06-11 quant-ph cs.AI cs.LG 交叉投稿

Atlas H&E-TME：基于AI的可扩展组织分析，达到专家病理学家级别的准确性

Kai Standvoss, Miriam Hägele, Rosemarie Krupar, Julika Ribbat-Idel, Jennifer Altschüler, Gerrit Erdmann, Hans Pinckaers, Evelyn Ramberger, Madleen Drinkwitz, Ádám Nárai, Alexander Möllers, Katja Lingelbach, Sebastian Kons, Lukas Hönig, Recepcan Adigüzel, Joana Baião, Alberto Megina Gonzalo, Marius Teodorescu, Marie-Lisa Eich, Paolo Chetta, Shakil Merchant, Verena Aumiller, Simon Schallenberg, Andrew Norgan, Klaus-Robert Müller, Lukas Ruff, Maximilian Alber, Frederick Klauschen

发表机构 * Aignostics, Germany（Aignostics，德国）； Institute of Pathology, Charité – Universitätsmedizin Berlin, Germany（柏林夏里特医学院病理学研究所）； Berlin Institute of Health, Charité – Universitätsmedizin Berlin, Germany（柏林夏里特医学院柏林健康研究所）； Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, MA, US（哈佛医学院麻省总医院病理学系）； Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, US（梅奥诊所检验医学与病理学系）； Machine Learning Group, Technische Universität Berlin, Germany（柏林工业大学机器学习组）； BIFOLD – Berlin Institute for the Foundations of Learning and Data, Germany（柏林学习与数据基础研究所）； Department of Artificial Intelligence, Korea University, Republic of Korea（高丽大学人工智能系）； Max-Planck Institute for Informatics, Germany（马克斯·普朗克信息学研究所）； German Cancer Research Center (DKFZ) & German Cancer Consortium (DKTK), Berlin & Munich Partner Sites, Germany（德国癌症研究中心及德国癌症联盟柏林和慕尼黑合作站点）； Institute of Pathology, Ludwig-Maximilians-Universität München, Germany（慕尼黑大学病理学研究所）； Bavarian Cancer Research Center (BZKF), Germany（巴伐利亚癌症研究中心）

AI总结提出Atlas H&E-TME系统，利用病理基础模型预测组织质量、区域和细胞类型，通过IHC共识验证和20万+注释基准，在多种癌症中达到或超越病理学家水平。

详情

AI中文摘要

苏木精和伊红（H&E）染色是组织病理学的基石，然而对H&E全切片图像（WSI）进行可扩展的定量分析仍然是计算病理学中的核心挑战。我们提出了Atlas H&E-TME，这是一个基于Atlas病理基础模型家族的AI系统，可预测多种癌症类型的组织质量、组织区域和细胞类型标签，在细胞级分辨率下每张切片产生超过4,500个定量读数。验证此类系统的关键挑战在于克服H&E-only金标准固有的形态模糊性，以及依赖免疫组织化学（IHC）等模态的更可靠参考的可扩展性有限。我们通过一个双重验证框架解决了这一问题，该框架将生物学深度的基础与技术及形态学的广度相结合。在深度方面，我们提出了一种IHC引导的多病理学家共识协议，该协议显著提高了相较于传统H&E-only注释的评分者间一致性。这产生了一个分子学基础的参考，我们据此比较Atlas H&E-TME和仅使用H&E的病理学家。在广度方面，我们在超过20万个高置信度H&E-only病理学家注释上对Atlas H&E-TME进行了基准测试，这些注释涵盖1,500多个病例，跨越八种癌症类型及其最常见的转移部位，亚型覆盖每种癌症类型>90%的临床病例，来自25个以上来源和8种以上扫描仪型号。与IHC引导的共识相比，Atlas H&E-TME达到或超过了病理学家仅使用H&E的性能，并在这一广泛的形态学和技术范围内一致且稳健地泛化。通过这种方式，Atlas H&E-TME将H&E切片——病理学中最普遍的数据——转化为一个可扩展的、定量的肿瘤及其微环境窗口，为转化和临床研究中下一代基于组织的生物标志物奠定了基础。

英文摘要

Hematoxylin and eosin (H&E) staining is the cornerstone of histopathology, yet scalable, quantitative analysis of H&E whole-slide images (WSIs) remains a central challenge in computational pathology. We present Atlas H&E-TME, an AI-based system built on the Atlas family of pathology foundation models that predicts tissue quality, tissue region, and cell type labels across multiple cancer types, yielding over 4,500 quantitative readouts per slide at cell-level resolution. A key challenge to validating such systems is overcoming morphological ambiguity inherent to H&E-only ground truth and the limited scalability of more informed references drawing on modalities such as immunohistochemistry (IHC). We address this with a dual validation framework combining biologically grounded depth with technical and morphological breadth. For depth, we propose an IHC-informed multi-pathologist consensus protocol that substantially improves inter-rater agreement over conventional H&E-only annotation. This yields a molecularly grounded reference against which we compare Atlas H&E-TME and pathologists working from H&E alone. For breadth, we benchmark Atlas H&E-TME on over 200,000 high-confidence H&E-only pathologist annotations across 1,500+ cases spanning eight cancer types and their most common metastatic sites, with subtypes covering >90% of clinical cases per cancer type, drawn from 25+ sources and 8+ scanner models. Benchmarked against the IHC-informed consensus, Atlas H&E-TME matches or exceeds pathologist H&E-only performance and generalizes consistently and robustly across this broad morphological and technical scope. In doing so, Atlas H&E-TME turns the H&E slide -- the most ubiquitous data in pathology -- into a scalable, quantitative window into the tumor and its microenvironment, laying a foundation for the next generation of tissue-based biomarkers in translational and clinical research.


2606.12406 2026-06-11 cs.RO cs.AI cs.LG cs.SY eess.SY 交叉投稿

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

FACTR 2: 学习商用机器人手臂的外部力感知提升策略学习

Steven Oh, Jason Jingzhou Liu, Tony Tao, Philip Han, Kenneth Shaw, Satoshi Funabashi, Ruslan Salakhutdinov, Deepak Pathak

发表机构 * Carnegie Mellon University（卡内基梅隆大学）； Waseda University（早稻田大学）

AI总结提出无需专用力传感器的数据驱动方法NEXT，可在1分钟内从10分钟自由运动数据中训练，实现与专用关节力矩传感器相当的估计，并结合FIRST采样策略提升策略学习性能。

Comments Website at https://jasonjzliu.com/factr2

详情

AI中文摘要

接触丰富的操作需要力敏感性，但由于成本高昂，许多机器人手臂缺乏专用的力传感器。我们提出了神经外部力矩估计（NEXT），一种无需任何专用力传感器即可估计外部关节力矩的数据驱动方法。NEXT 仅需 10 分钟的自由运动数据即可在 1 分钟内完成训练，却能实现与专用关节力矩传感器相当的估计。NEXT 能够在低成本手臂上实现力反馈遥操作，并通过力信息重采样训练（FIRST）改进策略学习，该训练在行为克隆过程中对预接触和接触段进行上采样。在五个长时域任务中，FIRST 在任务进展上比先前的力感知策略提高了超过 17%。NEXT 和 FIRST 共同将力感知遥操作和策略学习引入现成的机器人，无需额外的传感硬件。视频结果和代码可在 https://jasonjzliu.com/factr2 获取。

英文摘要

Contact-rich manipulation requires force sensitivity, but many robot arms lack dedicated force sensors due to their high cost. We present Neural External Torque Estimation (NEXT), a data-driven method that estimates external joint torques without needing any dedicated force sensors. NEXT trains in 1 minute from only 10 minutes of free-motion data, yet achieves estimates comparable to dedicated joint-torque sensors. NEXT enables force-feedback teleoperation on low-cost arms and improves policy learning through Force-Informed Re-Sampling Training (FIRST), which up-samples pre-contact and contact segments during behavior cloning. Across five long-horizon tasks, FIRST outperforms prior force-aware policies by over 17% in task progress. Together, NEXT and FIRST bring force-aware teleoperation and policy learning to off-the-shelf robots without additional sensing hardware. Video results and code are available at https://jasonjzliu.com/factr2


2508.21380 2026-06-11 cs.LG cs.AI 版本更新

The Algorithm Is Not the Behavior: Learned Priors Override Look-Ahead in a Chess-Playing Neural Network

算法并非行为：学得的先验知识在弈棋神经网络中覆盖前瞻

Elias Sandmann, Sebastian Lapuschkin, Wojciech Samek

发表机构 * Fraunhofer HHI（弗劳恩霍夫海因里希·赫兹研究所）

AI总结研究发现，国际象棋神经网络Leela Chess Zero在中间层能正确计算解法，但最终输出被安全优先的先验知识覆盖，导致错误答案。

详情

AI中文摘要

最近的机制性工作揭示了神经网络内部的学习算法，从模运算到游戏智能体中的搜索与规划。但算法结构是否保证算法行为？我们在最强的神经网络国际象棋引擎Leela Chess Zero中对此进行研究，先前工作已在其中识别出学习到的前瞻。通过将logit透镜扩展到其选棋策略网络，我们发现正确的谜题解法（包括即时将杀）经常出现在中间层，但在最终输出中被系统性覆盖，我们将此现象称为“遗忘的谜题”。在这些局面上重复先前的分析，我们发现前瞻运行正常：正确续招的未来走法被表示、因果重要且可线性解码，从而排除了算法本身的失败。相反，后期层逐渐转向优先考虑安全对局而非激进。为了测试这一转变是否驱动了覆盖，我们引导模型反对这些偏好，并恢复了61.7%的遗忘谜题，提供了因果证据表明安全先验覆盖了算法计算的解法。这些发现表明，算法结构并不保证算法行为：模型可以在内部解决问题，但仍然输出错误答案。

英文摘要

Recent mechanistic work has uncovered learned algorithms within neural networks, from modular arithmetic to search and planning in game-playing agents. But does algorithmic structure guarantee algorithmic behavior? We investigate this in Leela Chess Zero, the strongest neural chess engine, where prior work identified learned look-ahead. By extending the logit lens to its move-selecting policy network, we discover that correct puzzle solutions, including immediate checkmates, often appear in intermediate layers but are systematically overridden in the final output, a phenomenon we term "forgotten puzzles". Replicating prior analyses on these positions, we find that look-ahead operates normally: future moves of the correct continuation are represented, causally important, and linearly decodable, ruling out a failure of the algorithm itself. Instead, late layers increasingly shift toward prioritizing safe play over aggression. To test whether this shift drives the override, we steer the model against these preferences and recover 61.7% of forgotten puzzles, providing causal evidence that safety priors override algorithmically computed solutions. These findings demonstrate that algorithmic structure does not guarantee algorithmic behavior: a model can internally solve a problem and still output the wrong answer.


2601.21293 2026-06-11 cs.LG cs.AI 版本更新

Reliability-Calibrated Edge-IoT Early Fault Warning for Rotating Machinery with a Physics-Guided Tiny-Mamba Transformer

面向旋转机械的可靠性校准边缘物联网早期故障预警：一种物理引导的Tiny-Mamba Transformer

Changyu Li, Huabei Nie, Xiaoya Ni, Lu Wang, Lijuan Shen, Kaishun Wu, Fei Luo

发表机构 * Great Bay University（大湾区大学）； Huizhou University（惠州学院）； National University of Singapore（新加坡国立大学）； Shenzhen University（深圳大学）； James Cook University（詹姆斯库克大学）； Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结提出一种可靠性校准的边缘物联网早期故障预警框架，使用物理引导的Tiny-Mamba Transformer提取特征，结合极值理论校准误报率，在低计算资源下实现高精度、低延迟的旋转机械故障预警。

详情

AI中文摘要

工业物联网系统日益依赖分布式振动传感来支持旋转机械的预测性维护。然而，在实际部署中，原始信号上传成本高昂，且报警决策必须在有限计算资源、变化运行条件和严格误报预算下本地进行。本文提出一种可靠性校准的边缘物联网早期预警框架，其中紧凑的物理引导Tiny-Mamba Transformer作为表示模块，极值理论层将流式异常分数转换为事件级报警片段。PG-TMT结合深度可分离卷积主干、Tiny-Mamba状态空间分支和轻量级局部Transformer，在批量大小为1的推理下捕获瞬态、长周期和多通道退化线索。为提高可审计性，时间注意力被投影到频域并与分析轴承故障阶次带软对齐。极值理论校准、双阈值迟滞和修尾拟合即使在健康校准数据不完美的情况下也能提供可控的误报强度。在CWRU、Paderborn、XJTU-SY和工业试点上的实验表明，所提框架提高了PR-AUC，在可控误报预算下减少了检测延迟，并对结构化干扰、元数据不确定性、复合故障混合和域转移保持鲁棒。凭借小于1 MB的占用空间和低于7 ms的Jetson p99延迟，该框架支持工业物联网预测性维护的校准和可解释早期预警。

英文摘要

Industrial Internet of Things (IIoT) systems increasingly rely on distributed vibration sensing to support predictive maintenance of rotating machinery. In practical deployments, however, raw signal upload is costly and alarm decisions must be made locally under limited computation, changing operating conditions, and strict nuisance-alarm budgets. This paper presents a reliability-calibrated edge-IoT early-warning framework, in which a compact Physics-Guided Tiny-Mamba Transformer (PG-TMT) acts as the representation module and an extreme value theory (EVT) layer converts streaming anomaly scores into event-level alarm episodes. PG-TMT combines a depthwise-separable convolutional stem, a Tiny-Mamba state-space branch, and a lightweight local Transformer to capture transient, long-horizon, and multichannel degradation cues under batch-size-one inference. To improve auditability, temporal attention is projected to the frequency domain and softly aligned with analytical bearing fault-order bands. EVT calibration, dual-threshold hysteresis, and trimmed-tail fitting provide controllable false-alarm intensity even when healthy calibration data are imperfect. Experiments on CWRU, Paderborn, XJTU-SY, and an industrial pilot demonstrate that the proposed framework improves PR-AUC, reduces detection delay under a controlled nuisance-alarm budget, and remains robust to structured interference, metadata uncertainty, compound fault mixtures, and domain transfer. With a sub-1 MB footprint and Jetson p99 latency below 7 ms, the framework supports calibrated and interpretable early warnings for IIoT predictive maintenance.
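The abstract above mentions dual-threshold hysteresis for turning streaming anomaly scores into event-level alarm episodes. The following is a minimal sketch of generic hysteresis, not the authors' implementation: the function name `alarm_episodes` and the threshold values are illustrative assumptions, and the EVT calibration that would set the thresholds in the paper is not reproduced here.

```python
def alarm_episodes(scores, high, low):
    """Group anomaly scores into (start, end) index pairs.

    An episode opens when a score reaches `high` and closes only once a
    score drops below `low`, so brief dips between the two thresholds do
    not split one event into several alarms.
    """
    episodes = []
    start = None
    for i, score in enumerate(scores):
        if start is None and score >= high:
            start = i                        # open a new episode
        elif start is not None and score < low:
            episodes.append((start, i - 1))  # close at the last index before the drop
            start = None
    if start is not None:                    # stream ended mid-episode
        episodes.append((start, len(scores) - 1))
    return episodes


# Hypothetical score stream and thresholds, chosen only for illustration.
scores = [0.1, 0.2, 0.9, 0.6, 0.7, 0.3, 0.2, 0.95, 0.8]
episodes = alarm_episodes(scores, high=0.85, low=0.5)
print(episodes)
```

With a single threshold at 0.85, the same stream would yield two one-sample alarms; the lower release threshold keeps each event open while the score stays elevated.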


2602.10392 2026-06-11 cs.LG 版本更新

Tensor Methods: A Unified and Interpretable Approach for Material Design

张量方法：一种统一且可解释的材料设计方法

Shaan Pakala, Aldair E. Gongora, Brian Giera, Evangelos E. Papalexakis

发表机构 * University of California, Riverside（加州大学河滨分校）； Dept. of Computer Science & Engineering（计算机科学与工程系）； Lawrence Livermore National Laboratory（劳伦斯利弗莫尔国家实验室）； Materials Engineering Division（材料工程部）； Data Science Institute（数据科学研究所）

AI总结提出使用张量补全方法作为材料设计的统一框架，兼具可解释性和预测性能，在非均匀采样下优于传统机器学习，最高提升5%的R²并减半分布外误差。

Comments Accepted to ACM SIGKDD 2026 AI for Sciences track

详情

AI中文摘要

基于LSTM的物联网设备识别

Kahraman Kostas

发表机构 * Kahraman Kostas

AI总结提出一种端到端机器学习流程，利用LSTM网络处理原始网络数据包，通过滑动窗口时间序列特征识别27类物联网设备，在最优配置下达到79.85%准确率和75.70%宏平均F1分数。

详情

AI中文摘要

随着物联网的使用越来越普及，大量设备进入市场，许多安全漏洞也随之出现。在此环境下，物联网设备识别方法提供了一种预防性安全措施，作为识别这些设备并检测其漏洞的重要因素。在本研究中，我们提出了一种端到端的机器学习流程，利用长短期记忆（LSTM）网络识别阿尔托大学数据集（物联网设备捕获）中的物联网设备。原始网络数据包捕获（PCAP）被处理成25个工程特征，然后排列为滑动窗口时间序列。我们系统地评估了从2到20的序列长度，报告称性能在长度6之前近似线性提升，之后呈波浪形模式，在长度18时达到峰值。在最优配置的最终保留测试集上，该模型在27个设备类别上达到了79.85%的准确率和75.70%的宏平均F1分数。

英文摘要

While the use of the Internet of Things is becoming more and more popular, many security vulnerabilities are emerging with the large number of devices being introduced to the market. In this environment, IoT device identification methods provide a preventive security measure as an important factor in identifying these devices and detecting the vulnerabilities they suffer from. In this study, we present an end-to-end machine learning pipeline that identifies IoT devices in the Aalto university dataset (IoT devices captures) using Long Short-Term Memory (LSTM) networks. Raw network packet captures (PCAP) are processed into 25 engineered features, which are then arranged as sliding-window time-series sequences. We systematically evaluate sequence lengths from 2 to 20, reporting that performance improves approximately linearly up to length 6 and thereafter in a wave-like pattern, reaching its peak at length 18. On the final held-out test set with the optimal configuration, the model achieves an accuracy of 79.85% and a macro-averaged F1-score of 75.70% across 27 device classes.
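The abstract above describes arranging per-packet feature vectors into sliding-window sequences before feeding them to an LSTM. A minimal sketch of that windowing step follows, assuming a stride of one packet; the function name `sliding_windows` and the toy two-feature rows are illustrative assumptions (the paper uses 25 engineered features and lengths from 2 to 20).

```python
from typing import List, Sequence


def sliding_windows(rows: Sequence[Sequence[float]], length: int) -> List[List[Sequence[float]]]:
    """Return every run of `length` consecutive feature rows, stepping by one row."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # range() is empty when there are fewer rows than `length`, giving [].
    return [list(rows[i:i + length]) for i in range(len(rows) - length + 1)]


# Five packets with two toy features each.
packets = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0], [0.5, 1.0]]
windows = sliding_windows(packets, length=3)
print(len(windows))
print(windows[0])
```

Each window would become one LSTM input sequence; a stream of n packets produces n - length + 1 sequences, so longer windows trade fewer training samples for more temporal context.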


2409.12707 2026-06-11 physics.flu-dyn cs.LG 版本更新

Machine-learning-based multipoint optimization of fluidic injection parameters for improving nozzle performance

基于机器学习的流体注入参数多点优化以提升喷管性能

Yunjia Yang, Jiazhe Li, Yufei Zhang, Haixin Chen

发表机构 * Tsinghua University（清华大学）

AI总结针对过膨胀单斜面喷管，采用预训练神经网络替代CFD进行多点优化，结合先验预测策略提高精度，利用反向传播快速计算梯度，在七个设计点优化平均推力系数提升1.14%。

详情

AI中文摘要

流体注入为改善飞行器加速过程中过膨胀单斜面喷管（SERN）的性能提供了一种有前景的解决方案。然而，确定能在多个喷管工作状态下产生最佳整体性能的注入参数仍然是一个挑战。基于梯度的优化方法需要在每个设计点计算注入参数的梯度，当使用计算流体动力学（CFD）模拟时，这可能导致高昂的计算成本。本文使用预训练神经网络在优化过程中替代CFD，从而能够快速计算多个设计点的喷管流场。考虑到喷管流场的物理特性，采用基于先验的预测策略来提高模型的准确性。此外，神经网络的反向传播算法只需运行一次计算即可快速计算梯度，从而与有限差分法相比大大减少了梯度计算时间。作为测试案例，对SERN在七个设计点的平均喷管推力系数进行了优化，结果提高了1.14%。即使包括建立训练数据库所需的时间，与传统优化方法相比，时间成本也大大降低。

英文摘要

Fluidic injection offers a promising solution to improve the performance of the overexpanded single expansion ramp nozzles (SERNs) during vehicle acceleration. However, determining the injection parameters that yield the best overall performance across multiple nozzle operating conditions remains a challenge. The gradient-based optimization method requires gradients of injection parameters at each design point, which can lead to high computational costs when using computational fluid dynamics (CFD) simulations. This paper uses a pretrained neural network to replace CFD during optimization, enabling quick calculation of the nozzle flow field at multiple design points. Considering the physical characteristics of the nozzle flow field, a prior-based prediction strategy is adopted to enhance the model's accuracy. In addition, the neural network's back-propagation algorithm computes gradients quickly by running the computation only once, thereby greatly reducing gradient computation time compared to the finite difference method. As a test case, the average nozzle thrust coefficient of an SERN at seven design points is optimized, resulting in a 1.14% improvement. The time cost is greatly reduced compared with traditional optimization methods, even when the time required to establish the training database is included.


2411.10959 2026-06-11 econ.EM cs.LG math.ST stat.AP stat.ME stat.ML stat.TH 版本更新

Program Evaluation with Remotely Sensed Outcomes

利用遥感结果的项目评估

Ashesh Rambachan, Rahul Singh, Davide Viviano

发表机构 * MIT（麻省理工学院）； Harvard（哈佛大学）

AI总结本文研究了在实验和准实验中，由于遥感变量不完全测量经济结果而引起的因果推断问题，提出了一种非参数识别因果参数的方法，结合实验和观测数据进行$n^{-1/2}$推断。

2411.12193 2026-06-11 stat.AP cs.LG stat.ML 版本更新

Hierarchical Probabilistic Conformal Prediction for Distributed Energy Resources Adoption

分布式能源采纳的分层概率保形预测

Wenbin Zhou, Shixiang Zhu

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结针对分布式能源采纳预测中的不确定性和分层电网结构，提出基于多元霍克斯过程与分裂保形预测的量化框架，确保聚合后统计有效性，在印第安纳波利斯数据上优于基线。

详情

AI中文摘要

分布式能源（DERs）的快速增长为电网管理带来了机遇和运营挑战。准确预测DER采纳对于主动基础设施规划至关重要，但DER增长的固有不确定性和空间差异使传统预测方法复杂化。此外，配电网的分层结构要求预测在电路和变电站层面均满足统计保证，这是可靠决策的非平凡要求。本文提出了一种新的DER采纳预测不确定性量化框架，确保在分层电网结构中的有效性。利用多元霍克斯过程建模DER采纳动态，并采用定制的分裂保形预测算法，我们引入了一种新的非一致性分数，在保持预测效率的同时，在聚合下保留统计保证。我们在温和条件下建立了理论有效性，并通过印第安纳州印第安纳波利斯的客户级太阳能电池板安装数据实证评估，表明我们的方法在预测准确性和不确定性校准方面始终优于现有基线。

英文摘要

The rapid growth of distributed energy resources (DERs) presents both opportunities and operational challenges for electric grid management. Accurately predicting DER adoption is critical for proactive infrastructure planning, but the inherent uncertainty and spatial disparity of DER growth complicate traditional forecasting approaches. Moreover, the hierarchical structure of distribution grids demands that predictions satisfy statistical guarantees at both the circuit and substation levels, a non-trivial requirement for reliable decision-making. In this paper, we propose a novel uncertainty quantification framework for DER adoption predictions that ensures validity across hierarchical grid structures. Leveraging a multivariate Hawkes process to model DER adoption dynamics and a tailored split conformal prediction algorithm, we introduce a new nonconformity score that preserves statistical guarantees under aggregation while maintaining prediction efficiency. We establish theoretical validity under mild conditions and demonstrate through empirical evaluation on customer-level solar panel installation data from Indianapolis, Indiana that our method consistently outperforms existing baselines in both predictive accuracy and uncertainty calibration.
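The abstract above builds on split conformal prediction. As a rough illustration of the basic idea only, the sketch below uses the standard absolute-residual nonconformity score on a held-out calibration set; it is not the aggregation-preserving score proposed in the paper. The function name `conformal_radius` and the residual values are illustrative assumptions.

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Half-width of a split conformal interval with target coverage 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    the usual finite-sample correction for split conformal prediction.
    """
    n = len(calibration_residuals)
    scores = sorted(abs(r) for r in calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


# Hypothetical residuals (observed minus predicted) on a calibration split.
residuals = [0.5, -1.2, 0.3, 2.0, -0.7, 0.9, -0.1, 1.5, -0.4, 0.6]
radius = conformal_radius(residuals, alpha=0.2)

prediction = 10.0  # point forecast for a new unit
interval = (prediction - radius, prediction + radius)
print(radius, interval)
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha. Keeping that guarantee after summing forecasts across circuits and substations is the harder problem the paper addresses.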


2502.14894 2026-06-11 cs.CV cs.AI cs.CY cs.LG 版本更新

FOCUS on Contamination: Hydrology-Informed Noise-Aware Learning for Geospatial PFAS Mapping

聚焦污染：基于水文信息与噪声感知的地理空间PFAS测绘学习

Jowaria Khan, Alexa Friedman, Sydney Evans, Rachel Klein, Runzi Wang, Katherine E. Manz, Kaley Beins, David Q. Andrews, Elizabeth Bondi-Kelly

发表机构 * University of Michigan（密歇根大学）； Environmental Working Group（环保工作组）； University of California, Davis（加州大学戴维斯分校）

AI总结提出FOCUS框架，结合稀疏PFAS观测与水文连通性等环境先验，通过噪声感知损失实现鲁棒训练，在PFAS污染测绘中优于传统方法。

Comments Best Paper Award at ICLR 2026 Machine Learning for Remote Sensing Workshop

详情

AI中文摘要

全氟和多氟烷基物质（PFAS）是持久性环境污染物，对公共健康有显著影响，但由于现场采样的高成本和后勤挑战，大规模监测仍然严重受限。样本的缺乏导致难以用物理模型模拟其扩散，并且对PFAS在地表水中传输的科学理解有限。然而，描述土地覆盖、水文和工业活动的丰富地理空间和卫星衍生数据广泛可用。我们提出了FOCUS，一个用于PFAS污染测绘的地理空间深度学习框架，该框架将稀疏的PFAS观测与大规模环境背景（包括来自水文连通性、土地覆盖、污染源邻近性和采样距离的先验）相结合。这些先验被整合到一个原则性的、噪声感知的损失函数中，从而在稀疏标签下产生稳健的训练目标。通过广泛的消融实验、鲁棒性分析和实际验证，FOCUS始终优于包括稀疏分割、克里金法和污染物传输模拟在内的基线方法，同时在大区域上保持了空间一致性和可扩展性。我们的结果展示了AI如何通过提供筛查级风险图来支持环境科学，这些风险图可优先安排后续采样，并在缺乏完整物理模型的情况下帮助将潜在污染源与地表水污染模式联系起来。

英文摘要

Per- and polyfluoroalkyl substances (PFAS) are persistent environmental contaminants with significant public health impacts, yet large-scale monitoring remains severely limited due to the high cost and logistical challenges of field sampling. The lack of samples leads to difficulty simulating their spread with physical models and limited scientific understanding of PFAS transport in surface waters. Yet, rich geospatial and satellite-derived data describing land cover, hydrology, and industrial activity are widely available. We introduce FOCUS, a geospatial deep learning framework for PFAS contamination mapping that integrates sparse PFAS observations with large-scale environmental context, including priors derived from hydrological connectivity, land cover, source proximity, and sampling distance. These priors are integrated into a principled, noise-aware loss, yielding a robust training objective under sparse labels. Across extensive ablations, robustness analyses, and real-world validation, FOCUS consistently outperforms baselines including sparse segmentation, Kriging, and pollutant transport simulations, while preserving spatial coherence and scalability over large regions. Our results demonstrate how AI can support environmental science by providing screening-level risk maps that prioritize follow-up sampling and help connect potential sources to surface-water contamination patterns in the absence of complete physical models.


2505.00571 2026-06-11 stat.ML cs.LG 版本更新

Discovery and inference beyond linearity for epidemiological data by integrating Bayesian regression, tree ensembles and Shapley values

通过整合贝叶斯回归、树集成和Shapley值对流行病学数据进行线性之外的发现与推断

Giorgio Spadaccini, Marjolein Fokkema, Mark A. van de Wiel

发表机构 * Amsterdam UMC Leiden University（阿姆斯特丹大学医学中心-莱顿大学）； Leiden University（莱顿大学）； Amsterdam UMC（阿姆斯特丹大学医学中心）

AI总结提出RuleSHAP框架，结合贝叶斯稀疏回归、改进的树规则生成器和Shapley值，实现非线性与交互效应的检测及个体水平的不确定性量化，应用于流行病学数据发现高胆固醇和血压的影响因素。

详情

AI中文摘要

机器学习在流行病学和医疗健康研究中越来越受欢迎，用于无假设地发现风险和保护因素。机器学习在发现非线性和交互作用方面很强，但这种能力因缺乏可靠的推断而受损。尽管Shapley值提供了特征效应的局部度量，但这些效应通常缺乏有效的不确定性量化，从而排除了统计推断。我们提出RuleSHAP，一个通过结合专用贝叶斯稀疏回归模型、改进的基于树的规则生成器和Shapley值归因来解决这一局限性的框架。RuleSHAP能够检测非线性和交互效应，其关键贡献在于个体水平的不确定性量化。我们推导了一个在该框架内计算边际Shapley值的有效公式。我们将RuleSHAP应用于一个流行病学队列的数据，以检测和推断高胆固醇和血压的几种效应，例如年龄、性别、种族、BMI和血糖水平等特征之间的非线性交互效应。最后，我们在模拟数据上证明了我们框架的有效性。

英文摘要

Machine Learning (ML) is gaining popularity in epidemiology and healthcare studies for hypothesis-free discovery of risk and protective factors. ML is strong at discovering nonlinearities and interactions, but this power is compromised by a lack of reliable inference. Although Shapley values provide local measures of features' effects, valid uncertainty quantification for these effects is typically lacking, thus precluding statistical inference. We propose RuleSHAP, a framework that addresses this limitation by combining a dedicated Bayesian sparse regression model with an improved tree-based rule generator and Shapley value attribution. RuleSHAP provides detection of nonlinear and interaction effects, with uncertainty quantification at the individual level as a key contribution. We derive an efficient formula for computing marginal Shapley values within this framework. We apply RuleSHAP to data from an epidemiological cohort to detect and infer several effects for high cholesterol and blood pressure, such as nonlinear interaction effects between features like age, sex, ethnicity, BMI and glucose level. To conclude, we demonstrate the validity of our framework on simulated data.


2510.08073 2026-06-11 cs.CV cs.LG 版本更新

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

物理驱动的时空建模用于AI生成视频检测

Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, Mingkui Tan

发表机构 * South China University of Technology（华南理工大学）； University of Science and Technology of China（中国科学技术大学）； Key Laboratory of Big Data and Intelligent Robot, Ministry of Education（教育部大数据与智能机器人重点实验室）； Pazhou Lab（琶洲实验室）； University of Melbourne（墨尔本大学）； Hunan University（湖南大学）； Hong Kong Baptist University（香港浸会大学）

AI总结提出基于概率流守恒的物理驱动AI生成视频检测范式，通过归一化时空梯度（NSG）统计量捕捉物理异常，结合预训练扩散模型估计NSG，并利用最大均值差异（MMD）进行检测，在Recall和F1-Score上分别提升16.00%和10.75%。

Comments Accepted at NeurIPS 2025 spotlight

详情

AI中文摘要

AI生成的视频已实现近乎完美的视觉真实感（如Sora），迫切需要可靠的检测机制。然而，检测此类视频在建模高维时空动态和识别违反物理规律的细微异常方面面临重大挑战。本文提出首个基于概率流守恒原理的物理驱动AI生成视频检测范式。具体而言，我们提出一种称为归一化时空梯度（NSG）的统计量，该统计量量化空间概率梯度与时间密度变化之比，明确捕捉与自然视频动态的偏差。利用预训练的扩散模型，我们通过空间梯度近似和运动感知时间建模开发了NSG估计器，无需复杂的运动分解，同时保持物理约束。在此基础上，我们提出基于NSG的视频检测方法（NSG-VD），该方法计算测试视频与真实视频NSG特征之间的最大均值差异（MMD）作为检测指标。最后，我们推导了真实视频与生成视频之间NSG特征距离的上界，证明由于分布偏移，生成视频表现出放大的差异。大量实验证实，NSG-VD在Recall和F1-Score上分别比最先进的基线方法高出16.00%和10.75%，验证了NSG-VD的优越性能。源代码可在 https://github.com/ZSHsh98/NSG-VD 获取。

英文摘要

AI-generated videos have achieved near-perfect visual realism (e.g., Sora), urgently necessitating reliable detection mechanisms. However, detecting such videos faces significant challenges in modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose the first physics-driven AI-generated video detection paradigm based on probability flow conservation principles. Specifically, we propose a statistic called Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes, explicitly capturing deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator through spatial gradients approximation and motion-aware temporal modeling without complex motion decomposition while preserving physical constraints. Building on this, we propose an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of the test and real videos as a detection metric. Last, we derive an upper bound of NSG feature distances between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score, validating the superior performance of NSG-VD. The source code is available at https://github.com/ZSHsh98/NSG-VD.
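The detection metric in the abstract above is a Maximum Mean Discrepancy (MMD) between two sets of feature vectors. The sketch below shows a generic biased MMD estimate with a Gaussian kernel on toy 2-D points. It assumes nothing about how NSG features are computed, and the names `gaussian_kernel`, `mmd_squared`, and the bandwidth value are illustrative rather than taken from the NSG-VD code.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy stand-ins for feature vectors of reference and test samples.
real = [(0.0, 0.1), (0.1, -0.1), (-0.1, 0.0)]
shifted = [(2.0, 2.1), (2.1, 1.9), (1.9, 2.0)]

same = mmd_squared(real, real)      # identical samples
apart = mmd_squared(real, shifted)  # clearly different distributions
print(same, apart)
```

A detector in this style would flag a test sample when its MMD to the reference set exceeds a chosen threshold; how that threshold is set is outside this sketch.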


2510.22397 2026-06-11 cs.NI cs.LG 版本更新

神经集成卡尔曼滤波器：含激波可压缩流的数据同化

Xu-Hui Zhou, Lorenzo Beronilla, Michael K. Sleeman, Hangchuan Hu, Matthias Morzfeld, Andrew M. Stuart, Tamer A. Zaki

发表机构 * University of California, San Diego（加州大学圣迭戈分校）； University of Cambridge（剑桥大学）

AI总结针对含激波可压缩流中集成卡尔曼滤波器（EnKF）因双峰预报分布失效的问题，提出神经EnKF，通过将预报集合映射到神经网络参数空间并在此空间进行同化，结合物理信息迁移学习避免伪振荡和非物理特征。

详情

AI中文摘要

含激波可压缩流的数据同化（DA）具有挑战性，因为许多经典DA方法在不确定激波附近会产生伪振荡和非物理特征。我们在此关注集成卡尔曼滤波器（EnKF）。我们表明，EnKF性能不佳可归因于在不确定激波位置附近可能出现双峰预报分布；这违反了EnKF的假设，即预报接近高斯分布。为解决此问题，我们引入了新的神经EnKF。基本思想是通过将激波流的预报集合映射到深度神经网络（NN）的参数空间（权重和偏置），并随后在该空间中进行DA，从而系统地将神经函数逼近嵌入到集成DA中。非线性映射将尖锐和光滑的流动特征编码在NN参数的集合中。因此，只有当NN参数在预报集合的神经表示中平滑变化时，神经EnKF更新才是良好的。我们表明，可以通过物理信息迁移学习强制网络参数的这种平滑变化，并证明这样做神经EnKF避免了困扰EnKF的伪振荡和非物理特征。通过无粘Burgers方程、Sod激波管和二维爆炸波的一系列系统数值实验，证明了神经EnKF的适用性。

英文摘要

Data assimilation (DA) for compressible flows with shocks is challenging because many classical DA methods generate spurious oscillations and nonphysical features near uncertain shocks. We focus here on the ensemble Kalman filter (EnKF). We show that the poor performance of the EnKF may be attributed to the bimodal forecast distribution that can arise in the vicinity of an uncertain shock location; this violates the assumptions underpinning the EnKF, which assume a forecast which is close to Gaussian. To address this issue we introduce the new neural EnKF. The basic idea is to systematically embed neural function approximations within ensemble DA by mapping the forecast ensemble of shocked flows to the parameter space (weights and biases) of a deep neural network (NN) and to subsequently perform DA in that space. The nonlinear mapping encodes sharp and smooth flow features in an ensemble of NN parameters. Neural EnKF updates are therefore well-behaved only if the NN parameters vary smoothly within the neural representation of the forecast ensemble. We show that such a smooth variation of network parameters can be enforced via physics-informed transfer learning, and demonstrate that in so-doing the neural EnKF avoids the spurious oscillations and nonphysical features that plague the EnKF. The applicability of the neural EnKF is demonstrated through a series of systematic numerical experiments with the inviscid Burgers' equation, the Sod shock tube, and a two-dimensional blast wave.


2603.21639 2026-06-11 cs.CY cs.LG 版本更新

A Multi-Modal Sensor Fusion Instrument for Measuring Regional Human Mobility: The Distributed Human Data Engine (DHDE)

多模态传感器融合仪器用于测量区域人类流动性：分布式人类数据引擎（DHDE）

Amil Khanzada, Takuji Takemoto

发表机构 * Headquarters for Regional Revitalization, University of Fukui, Japan（日本福井大学地域创生推进本部）

AI总结提出分布式人类数据引擎（DHDE），通过融合边缘AI相机、数字意图信号、行为记录和气象数据，解决外围区域人类流动性测量中传感器稀疏和行为异质性问题，验证了稀疏传感器补偿方法，并发现“低活力悖论”。

Comments 32 pages, 4 figures, 3 tables. Pre-print of a manuscript submitted for peer review (v2)

详情

AI中文摘要

准确估计外围区域经济中的人类流动性面临一个基本的测量挑战：物理地面实况传感器稀疏，行为意图信号异质，环境摩擦给需求推断引入系统性偏差。我们提出分布式人类数据引擎（DHDE），一种多模态传感器融合架构，通过整合物理仪器（边缘AI相机）、数字意图信号（路线搜索印象指标）、行为记录（90,350条消费记录，97,719份标准化调查回复）以及日本福井四个地理分布节点的气象数据来解决这一挑战。主要的测量科学贡献在于设计、部署和跨节点验证DHDE作为稀疏传感器补偿仪器：一种异质传感器融合架构，将非平稳数字意图信号锚定到同时的物理地面实况计数，纠正由气象规划摩擦引入的系统性偏差。该仪器实现为集成推理管道（随机森林和带有Newey-West稳健推断的普通最小二乘法），在397个日观测数据上校准，并通过四个地理上不同的节点类型的时间顺序保留复制进行验证。主要OLS规范实现了样本内解释力R²=0.810和时间顺序样本外预测性能R²=0.683。结果识别出一个“低活力悖论”，其中宏观区域访客满意度与人群密度正相关（Spearman秩相关系数rs=+0.150，p=0.002）。我们估计年度代理缺口为865,917次意图隐含访问，对应119.6亿日元（7260万美元）的损失收入。

英文摘要

Accurately estimating human mobility in peripheral regional economies presents a fundamental measurement challenge: physical ground-truth sensors are sparse, behavioral intent signals are heterogeneous, and environmental friction introduces systematic bias into demand inference. We introduce the Distributed Human Data Engine (DHDE), a multi-modal sensor fusion architecture that addresses this challenge by integrating physical instrumentation (Edge-AI cameras), digital intent signals (route search impression metrics), behavioral records (90,350 spending records, 97,719 standardized survey responses), and meteorological data across four geographically distributed nodes in Fukui, Japan. The primary measurement-science contribution is the design, deployment, and cross-node validation of the DHDE as a sparse-sensor compensation instrument: a heterogeneous sensor fusion architecture that anchors non-stationary digital intent signals to concurrent physical ground-truth counts, correcting for systematic bias introduced by meteorological planning friction. The instrument is implemented as an ensemble inference pipeline (Random Forest and Ordinary Least Squares with Newey-West robust inference), calibrated across 397 daily observations and validated by chronological holdout replication across four geographically distinct node types. The primary OLS specification achieved an in-sample explanatory power of $R^2 = 0.810$ and a chronological out-of-sample predictive performance of $R^2 = 0.683$. Results identify an Under-Vibrancy Paradox where macro-regional visitor satisfaction correlates positively with crowd density (Spearman rank correlation $r_s = +0.150$, $p = 0.002$). We estimate an annual proxy gap of 865,917 intent-implied visits, corresponding to JPY 11.96 billion (USD 72.6 million) in foregone revenue.


2604.23874 2026-06-11 physics.flu-dyn cs.LG math.DS physics.comp-ph physics.geo-ph 版本更新

Deep Learning of Solver-Aware Turbulence Closures from Nudged LES Dynamics

从Nudged LES动力学中深度学习求解器感知的湍流闭合模型

Ashwin Suriyanarayanan, Dibyajyoti Chakraborty, Romit Maulik

发表机构 * School of Mechanical Engineering（机械工程学院）； Purdue University（普渡大学）； College of Information Sciences and Technology（信息科学与技术学院）； Pennsylvania State University（宾夕法尼亚州立大学）

AI总结提出基于连续数据同化框架的深度学习方法，利用稀疏观测的DNS数据先验训练湍流闭合模型，无需修改或微分LES求解器，同时保持部署稳定性，并显式条件化数值格式以适配不同离散化。

详情

AI中文摘要

可微物理范式可以通过将神经网络参数化直接嵌入求解器，并根据潜在稀疏的目标数据进行优化，作为一种后验方法来发现湍流闭合模型。这解决了先验学习的关键局限性，即使用直接数值模拟（DNS）数据来近似亚网格应力，并假设存在低通滤波器。以这种先验方式训练的闭合模型常常由于假设的滤波器与数值离散化和粗粒化效应之间的不匹配而导致部署不稳定。相比之下，后验学习虽然在部署期间通常稳定，但由于需要通过大涡模拟（LES）求解器进行反向传播，因此计算成本高昂。此外，后验方法难以广泛应用，因为它们需要对现有求解器进行重大修改。最后，当需要在具有隐式滤波特性的不同数值格式之间进行泛化时，这两种方法都受到限制。在这项工作中，我们提出了一种基于连续数据同化框架的深度学习湍流闭合建模方法。我们的方法允许使用稀疏观测的DNS数据先验训练闭合模型，而无需修改或微分LES求解器，同时在部署期间保持稳定性以恢复不变统计量。我们通过显式地将模型条件化于数值格式，专注于模型适应不同离散化的能力。我们使用二维和三维经典案例来测试我们的框架，并表明学习的修正系统地跟踪了粗求解器的离散化误差。

英文摘要

The differentiable physics paradigm may be leveraged as an a-posteriori approach for discovering turbulence closure models by embedding a neural network parameterization directly inside the solver and optimizing it given potentially sparse target data. This addresses a key limitation of a-priori learning where direct numerical simulation (DNS) data is used to approximate the subgrid stress with the assumption of a low-pass filter. Closures trained in this a-priori manner frequently lead to unstable deployments due to the mismatch between the assumed filter and the effect of numerical discretizations and coarse-graining. In comparison, while typically stable during deployment, a-posteriori learning incurs high computational costs due to the need to backpropagate through a large eddy simulation (LES) solver. Furthermore, a-posteriori methods are challenging to apply broadly since they require significant modification of existing solvers. Finally, both approaches are limited when generalization is desired across different numerical schemes with their implicit filtering characteristics. In this work, we present a deep-learning approach for turbulence closure modeling built on the continuous data assimilation framework. Our approach enables the a-priori training of closures using sparsely observed DNS data without modifying or differentiating through the LES solver, while preserving stability during deployment for the recovery of invariant statistics. We focus on the model's ability to adapt to different discretizations by explicitly conditioning it on the numerical scheme. We use two- and three-dimensional canonical cases to test our framework and show that the learned correction systematically tracks the discretization error of the coarse solver.


2605.06100 2026-06-11 eess.SP cs.AI cs.LG cs.RO 版本更新

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision

可信DFGO：具有可信度监督的可微因子图优化

Liang Qian, Penggao Yan, Penghui Xu, Li-Ta Hsu

发表机构 * Department of Aeronautical and Aviation Engineering（航空及民航工程学系）

AI总结针对GNSS协方差不可靠问题，提出CredibleDFGO框架，通过可微高斯-牛顿求解器与加权生成网络，利用适当评分规则监督预测分布，提升协方差可信度与定位精度。

Comments Submitted to NAVIGATION: Journal of the Institute of Navigation

详情

AI中文摘要

全球导航卫星系统（GNSS）定位广泛用于城市导航，但GNSS求解器报告的协方差在城市峡谷中通常不可靠。现有的可微因子图优化（DFGO）方法通过求解器学习测量加权，但仍仅使用位置目标。因此，位置估计可能改善，而报告的协方差仍然过小、过大或方向错误。我们提出CredibleDFGO（CDFGO），一种可微GNSS因子图框架，将协方差可信度作为显式训练目标。加权生成网络（WGN）预测每颗卫星的可靠性权重，可微高斯-牛顿求解器将这些权重映射到位置估计和基于Hessian的后验协方差。我们使用适当评分规则端到端监督东-北预测分布。我们研究了负对数似然（NLL）、能量分数（ES）及其组合。在三个UrbanNav测试场景上的结果表明，协方差可信度持续提升。定位精度在中度城市和严峻城市场景中也有所提高；在深度城市场景中，平均水平误差和第95百分位误差均有所改善。在严峻城市的旺角（MK）场景中，与DFGO（MAE）相比，CDFGO-Combined将平均水平误差从13.77米降至11.68米，将NLL从40.63降至6.59，将ES从12.31降至9.05。案例研究将MK改进归因于更好的轴向一致性、更可信的局部协方差椭圆以及卫星级重新加权。

英文摘要

Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods learn measurement weighting through the solver, but they still use position-only objectives. As a result, the position estimate may improve while the reported covariance remains too small, too large, or incorrectly oriented. We propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. A Weighting Generation Network (WGN) predicts per-satellite reliability weights, and a differentiable Gauss-Newton solver maps these weights to a position estimate and a Hessian-derived posterior covariance. We use proper scoring rules to supervise the East-North predictive distribution end to end. We study negative log-likelihood (NLL), the energy score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in covariance credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes; on the deep-urban scene, both the mean horizontal error and the 95th-percentile error improve. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05 relative to DFGO (MAE). Case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.
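
The two proper scoring rules named in the abstract can be illustrated on a toy East-North error. The sketch below is not the authors' implementation: the covariance matrices and the 10 m error are hypothetical values chosen only to show that an overconfident covariance is penalized by both the Gaussian NLL and a sample-based energy score, $\mathrm{ES} = \mathbb{E}\lVert X - y\rVert - \tfrac{1}{2}\mathbb{E}\lVert X - X'\rVert$.

```python
import math
import random

def gaussian_nll_2d(err, cov):
    """Negative log-likelihood of a 2D error vector under N(0, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Explicit inverse of the 2x2 covariance matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    ex, ey = err
    maha = ex * (inv[0][0] * ex + inv[0][1] * ey) + ey * (inv[1][0] * ex + inv[1][1] * ey)
    return 0.5 * (maha + math.log(det)) + math.log(2 * math.pi)

def sample_gaussian_2d(mean, cov, n, rng):
    """Draw n samples from N(mean, cov) using a hand-written 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return samples

def energy_score(samples, y):
    """Sample estimate of E||X - y|| - 0.5 * E||X - X'|| (lower is better)."""
    n = len(samples)
    term1 = sum(math.dist(s, y) for s in samples) / n
    term2 = sum(math.dist(p, q) for p in samples for q in samples) / (n * n)
    return term1 - 0.5 * term2

# Hypothetical scenario: the estimate sits at the origin, the truth is 10 m east.
truth = (10.0, 0.0)
overconfident = ((1.0, 0.0), (0.0, 1.0))    # covariance far too small
credible = ((100.0, 0.0), (0.0, 1.0))       # covariance elongated along the error axis

rng = random.Random(0)
nll_over = gaussian_nll_2d(truth, overconfident)
nll_cred = gaussian_nll_2d(truth, credible)
es_over = energy_score(sample_gaussian_2d((0.0, 0.0), overconfident, 300, rng), truth)
es_cred = energy_score(sample_gaussian_2d((0.0, 0.0), credible, 300, rng), truth)
```

In this toy case the position error is identical for both covariances, so a position-only loss could not tell them apart, while both scores prefer the covariance whose ellipse is consistent with the error.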

2605.10592 2026-06-11 cs.AI cs.HC cs.LG 版本更新

A Resilient Solution for Sewer Overflow Monitoring across Cloud and Edge

跨云和边缘的下水道溢流监控弹性解决方案

Vipin Singh, Tianheng Ling, Peter Ghaly, Felix Grimmeisen, Gregor Schiele, Felix Biessmann

发表机构 * Berlin University of Applied Sciences（柏林应用技术大学）； University of Duisburg-Essen（杜伊斯堡-埃森大学）； Okeanos Smart Data Solutions GmbH（Okeanos智能数据解决方案 GmbH）； Einstein Center Digital Future（爱因斯坦数字未来研究中心）

AI总结本文提出一个基于深度学习的云边协同监控平台，用于预测溢流池填充动态，以应对城市排水系统老化问题，提升防洪预警能力。

Comments 3 pages, 6 figures, accepted at 35th International Joint Conference on Artificial Intelligence 2026 (IJCAI-ECAI 2026), Demonstrations Track. URL: https://riwwer.demo.calgo-lab.de

2605.26234 2026-06-11 math.DG cs.LG math.GT 版本更新

Minimal surfaces, Knots, and Neural Networks

极小曲面、纽结与神经网络

Tancredi Schettini Gherardini, Marco Usula

发表机构 * GitHub

AI总结基于物理信息神经网络求解双曲空间中的极小曲面方程，通过计算纽结边界的极小曲面及其自交数，为Fine猜想提供了实证支持。

Comments 38 pages, 12 figures; small cosmetic update

2606.08493 2026-06-11 q-bio.GN cs.LG stat.ML 版本更新

Querying Counterfactuals on Tissue Graphs with Supervised Disentanglement

在组织图上通过监督解缠查询反事实

Abdul Moeed, Stefan Schrod, Martin Rohbeck, Marc Jan Bonder, Pavlo Lutsik, Oliver Stegle, Daniel Dimitrov

发表机构 * Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany（德国癌症研究中心（DKFZ）计算基因组学与系统遗传学部，海德堡，德国）； Helmholtz Information & Data Science School for Health, Germany（德国亥姆霍兹健康信息与数据科学学院）； Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany（欧洲分子生物学实验室（EMBL）基因组生物学部，海德堡，德国）； Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands（格罗宁根大学医学中心遗传学系，格罗宁根，荷兰）； Oncode Institute, Utrecht, The Netherlands（Oncode研究所，乌得勒支，荷兰）； KU Leuven, Leuven, Belgium（鲁汶大学，鲁汶，比利时）； Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK（威康桑格研究所，威康基因组园区，欣克斯顿，英国）

AI总结本文形式化组织图反事实为空间干预，提出Cellina框架通过监督解缠分解细胞内在状态与空间上下文，用于反事实预测，在结直肠癌和小鼠大脑数据上优于现有方法。

AI中文摘要

组织图反事实询问在改变的空间邻居上下文中细胞的表达将如何变化。这类查询对于预测组织中细胞行为至关重要，但缺乏统一定义，现有方法针对特定干预类型或将细胞视为独立同分布。在这项工作中，我们首先将组织图反事实形式化为一类空间干预，这些干预要么重新连接细胞之间的边（边扰动），要么修改其邻居的表达（节点扰动）。然后，我们介绍Cellina（https://cellina.readthedocs.io），一个使用监督解缠将细胞内在状态从其空间上下文中分解出来的框架，将后者作为反事实预测的条件输入。在跨越结直肠癌和小鼠大脑中超过250万个空间分辨细胞的基准测试中，Cellina在计算机模拟（in-silico）图扰动、解缠和可扩展性方面优于空间感知和非空间的竞争对手。此外，我们展示了Cellina以无监督方式揭示生物学上不同的癌症子域，并实现靶向邻居扰动模拟。

英文摘要

Tissue graph counterfactuals ask how a cell's expression would change under altered spatial neighbor contexts. Such queries are central to predicting cell behavior in tissues, but lack a unified definition, with existing methods targeting specific intervention types or treating cells as i.i.d. In this work, we first formalize tissue graph counterfactuals as a class of spatial interventions that either rewire connections between cells (edge perturbation) or modify the expression of their neighbors (node perturbation). We then introduce Cellina (https://cellina.readthedocs.io) - a framework that uses supervised disentanglement to decompose a cell's intrinsic state from its spatial context, using the latter as a conditioning input for counterfactual predictions. Across benchmarks spanning over 2.5 million spatially-resolved cells in colorectal cancer and mouse brain, Cellina outperforms spatially-informed and non-spatial competitors in in-silico graph perturbations, disentanglement, and scalability. Additionally, we show that Cellina reveals biologically distinct cancer subdomains in an unsupervised manner and enables targeted neighbor perturbation simulations.

2606.11107 2026-06-11 eess.IV cs.CV cs.LG 版本更新

稀疏探针与模糊物理：连续介质动力学基础模型可解释性挑战的案例研究

Katherine Rosenfeld, Maike Sonnewald

发表机构 * Gates Foundation（盖茨基金会）； UC Davis（加州大学戴维斯分校）

AI总结本研究通过稀疏自编码器探针分析连续介质动力学基础模型Walrus的内部机制，发现其内部特征与物理分解不完全一致，并存在输出级偏差，揭示了科学基础模型可解释性的关键挑战。

Comments 8 pages, 5 figures

Journal ref: ICLR 2026 Workshop on Foundation Models for Science

AI中文摘要

生成式AI仿真器越来越多地用于我们已经拥有强大理论、基准和物理直觉的科学领域。这引发了一个核心评估和可解释性问题：当一个基础模型能够再现已知的连续介质动力学时，是什么内部机制支持这种行为？内部行为是否与已知物理一致？以及它与仿真器成功或失败的关系如何？我们研究了跨领域连续介质动力学基础模型——Polymathic团队的Walrus，采用基于物理原理的机制可解释性方法。我们应用稀疏自编码器（SAE）探测选定层，并利用拟涡能（enstrophy）作为物理基础度量，解决了对大量特征集（超过20,000个）进行分类的实际挑战。作为刻意简单的测试平台，我们聚焦于剪切流，并比较了多个剪切流设置（即数值模拟中的参数值）下的特征招募情况。在不同设置中，我们发现了分段一致性的证据，特征子集以相似角色重复出现，但这种结构是间歇性的，并未清晰地映射到标准物理分解上。同时，数值模拟与仿真器之间的直接比较揭示了系统性的输出级差异，包括能量/结构变得过于扩散或过于局部的区域。我们将这些差异的部分与特定SAE特征使用的变化联系起来。我们的工作突出了科学基础模型的开放性问题：如何稳健地优先考虑机制上有意义的特征，如何将稳定结构与分析伪影（包括单层和SAE限制）分离，以及如何利用既定基准来决定何时“不同”的内部表示真正具有信息性而非仅仅是有效的。

英文摘要

Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior, is the internal behaviour consistent with known physics, and how does it relate to where the emulator succeeds or fails? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles. We apply a sparse autoencoder (SAE) to probe a selected layer, and address the practical challenge of triaging a large feature set (over 20,000) using enstrophy as a physically grounded metric. As a deliberately simple testbed, we focus on shear flow and compare feature recruitment across multiple shear-flow setups, i.e. parameter values in the numerical simulation. Across setups we find evidence of piecewise consistency, with subsets of features recurring in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. In parallel, direct comparisons between numerical simulation and the emulator reveal systematic output-level discrepancies, including regimes where energy/structures become too diffuse or too localized. We connect parts of these discrepancies to changes in specific SAE feature usage. Our work highlights open questions for scientific foundation models: how to robustly prioritize mechanistically meaningful features, how to separate stable structure from analysis artifacts (including single-layer and SAE limitations), and how to use established benchmarks to decide when "different" internal representations are genuinely informative rather than merely effective.

2606.11988 2026-06-11 cs.LG stat.ML 新提交

What Uncertainties Do We Need for Dynamical Systems?

动力系统需要哪些不确定性？

Yusuf Sale, Christopher Bülte, Felix Czaja, Joshua Stiller, Eyke Hüllermeier

发表机构 * Institute of Computer Science, LMU Munich（慕尼黑大学计算机科学研究所）； Munich Center for Machine Learning (MCML)（慕尼黑机器学习中心）； Department of Mathematics, LMU Munich（慕尼黑大学数学系）； German Research Center for Artificial Intelligence (DFKI, DSA)（德国人工智能研究中心（DFKI, DSA））

AI总结本文从机器学习视角探讨动力系统中的不确定性，区分偶然与认知不确定性，并讨论不同任务中表示和量化不确定性的目标。

Comments EIML@ICML

2606.12138 2026-06-11 cs.LG cs.AI cs.CL 新提交

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

不稳定特征，可复现子空间：理解稀疏自编码器中的种子依赖性

Gleb Gerasimov, Timofei Rusalev, Nikita Balagansky, Daniil Laptev, Vadim Kurochkin, Daniil Gavrilov

发表机构 * T-Tech

AI总结研究稀疏自编码器特征的可复现性，发现稳定特征承载主要信号，不稳定特征集中于可复现的低秩子空间，反映基歧义而非纯噪声。

AI中文摘要

稀疏自编码器（SAE）被广泛用于解释神经网络表示，但其效用取决于学习到的特征是否在不同训练运行间可复现。我们通过“特征稳定性”研究这一问题：对于每个SAE特征，我们估计其在独立训练的SAE中再次出现的概率。这产生了一个可扩展的每特征信号，将稳定特征与不稳定特征区分开来。在一项跨种子、模型、层、字典大小和SAE变体的大规模研究中，我们发现显著的功能不对称性：稳定特征承载了大部分重建和预测相关信号，而不稳定特征的边际影响较弱，并且在激活统计和自动解释中主要由低频表面形式触发主导。在几何上，不稳定特征个体不可复现，但集中在可复现的低秩子空间中，这表明种子依赖性通常反映了共享激活空间区域内的基歧义，而非纯噪声。一个受控的合成模型使这一机制明确，表明低秩真实特征可以在子空间级别被恢复，而作为个体SAE潜在变量跨种子仍不可识别。最后，通过汇集独特的跨种子特征，我们构建了更稳定的SAE，同时在此设置中保留了解释方差。这些结果共同表明，不稳定特征不仅仅是失败或噪声潜在变量：它们个体功能影响较弱，但反映了标准SAE跨种子不同解析的可复现低维结构。

英文摘要

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature stability: for each SAE feature, we estimate the probability that a similar feature reappears in an independently trained SAE. This yields a scalable per-feature signal that separates stable from unstable features. In a large-scale study across seeds, models, layers, dictionary sizes, and SAE variants, we find a pronounced functional asymmetry: stable features carry most of the reconstruction- and prediction-relevant signal, while unstable features have weak marginal impact and are dominated by low-frequency surface-form triggers in both activation statistics and automatic explanations. Geometrically, unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting that seed dependence often reflects basis ambiguity within a shared region of activation space rather than pure noise. A controlled synthetic model makes this mechanism explicit, showing that low-rank ground-truth features can be recovered at the subspace level while remaining non-identifiable as individual SAE latents across seeds. Finally, by pooling unique cross-seed features, we construct more stable SAEs while preserving explained variance in this setting. Together, these results show that unstable features are not merely failed or noisy latents: they have weak individual functional impact, but reflect reproducible low-dimensional structure that standard SAEs resolve differently across seeds.
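
One plausible way to estimate the per-feature stability described above is to match decoder directions across seeds by cosine similarity. The sketch below rests on that assumption: the matching rule, the 0.9 threshold, and the toy 3-dimensional dictionaries are hypothetical, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def feature_stability(reference, other_runs, threshold=0.9):
    """For each reference feature, the fraction of other runs that contain
    at least one feature with cosine similarity >= threshold."""
    scores = []
    for f in reference:
        hits = sum(1 for run in other_runs
                   if max(cosine(f, g) for g in run) >= threshold)
        scores.append(hits / len(other_runs))
    return scores

# Toy decoder dictionaries from three hypothetical seeds.
reference = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
run_a = [(0.99, 0.05, 0.0), (0.0, 0.7, 0.7)]
run_b = [(1.0, 0.01, 0.0), (0.0, 0.6, -0.8)]

stability = feature_stability(reference, [run_a, run_b])
```

Here the first feature reappears in both runs, while the second does not reappear as an individual direction. Its counterparts in the other runs still lie in the same plane (the second and third coordinates), which is the kind of subspace-level reproducibility the abstract attributes to basis ambiguity.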

2606.12277 2026-06-11 cs.LG 新提交

Finding Multiple Interpretations in Datasets

在数据集中寻找多种解释

Matthew Chak, Paul Anderson

发表机构 * Department of Computer Science, California Polytechnic State University（加州州立理工大学计算机科学系）

AI总结提出一种方法，在保持性能的同时，找到具有不同上下文感知特征但性能相似的模型集，以提取对潜在现象的洞察。

2606.12289 2026-06-11 cs.LG cs.AI cs.NE 新提交

APEX: 具有动态数据选择的自动提示工程专家

Fei Wang, Si Si, Cho-Jui Hsieh, Inderjit S. Dhillon

发表机构 * Google（谷歌）； UCLA（加州大学洛杉矶分校）

AI总结提出APEX框架，通过动态数据分层（易、难、混合）优先选择高杠杆子集，在固定预算下提升提示优化效率，在三个基准上平均提升11.2%和6.8%。

AI中文摘要

大型语言模型对提示表述高度敏感，需要自动提示优化以释放其全部潜力。尽管进化算法已成为主导范式，但它们面临一个关键瓶颈：数据效率。当前方法将开发数据集视为静态基准，在无信息数据上浪费大量计算预算。在这项工作中，我们引入了APEX（自动提示工程专家），这是一个新颖的框架，它在提示搜索的同时优化数据使用。APEX根据优化谱系将数据集动态分层为易、难和混合三个层级。通过优先考虑混合层级（即识别出LLM性能混合的数据），我们确定了两个高杠杆子集：用于生成信息性变异的可寻址前沿和用于区分候选质量的排名敏感前沿。我们在三个不同的基准上评估APEX：IFBench、SimpleQA Verified和FACTS Grounding。在固定5000次评估调用的预算下，由于其数据效率，APEX在Gemini 2.5 Flash上平均比初始提示高出11.2%，在Gemma 3 27B上高出6.8%，这表明以数据为中心的方法是高效且有效的提示优化的关键。

英文摘要

Large Language Models are highly sensitive to prompt formulation, necessitating automatic prompt optimization to unlock their full potential. While evolutionary algorithms have emerged as the dominant paradigm, they suffer from a critical bottleneck: data efficiency. Current methods treat the development dataset as a static benchmark, wasting significant compute budget on uninformative data. In this work, we introduce APEX (Automatic Prompt Engineering eXpert), a novel framework that optimizes the data usage alongside the prompt search. APEX dynamically stratifies the dataset into Easy, Hard, and Mixed tiers based on the optimization lineage. By prioritizing the Mixed tier, which identifies the data where the LLM has mixed performance, we identify two high-leverage subsets: the addressable frontier for generating informative mutations and the rank-sensitive frontier for distinguishing candidate quality. We evaluate APEX across three diverse benchmarks: IFBench, SimpleQA Verified, and FACTS Grounding. Under a fixed budget of 5,000 evaluation calls, due to its data efficiency, APEX outperforms the initial prompt by an average of 11.2% on Gemini 2.5 Flash and 6.8% on Gemma 3 27B, demonstrating that a data-centric approach is key to efficient and effective prompt optimization.
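
The Easy/Hard/Mixed stratification can be sketched as follows. This is only an illustration under an assumed rule: each example has a binary pass/fail outcome for every prompt in the lineage, "Easy" means every prompt passed, "Hard" means none did, and "Mixed" is everything else. The abstract does not give the exact criterion, and the function and data names are hypothetical.

```python
def stratify(history):
    """history maps example id -> non-empty list of bool outcomes,
    one per prompt in the optimization lineage."""
    tiers = {"easy": [], "hard": [], "mixed": []}
    for example_id, outcomes in history.items():
        if all(outcomes):
            tiers["easy"].append(example_id)      # solved by every prompt so far
        elif not any(outcomes):
            tiers["hard"].append(example_id)      # solved by no prompt so far
        else:
            tiers["mixed"].append(example_id)     # outcome depends on the prompt
    return tiers

# Hypothetical outcomes of three prompts on four development examples.
history = {
    "ex1": [True, True, True],
    "ex2": [False, False, False],
    "ex3": [True, False, True],
    "ex4": [False, True, False],
}
tiers = stratify(history)
```

Under this reading, only the Mixed examples can change the ranking between the prompts seen so far, which is why spending a fixed evaluation budget on them is more informative.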

2606.11522 2026-06-11 cs.AI cs.LG 交叉投稿

Search Discipline for Long-Horizon Research Agents

长周期研究智能体的搜索纪律

Adithya Srinivasan, Devesh Paragiri

发表机构 * North Carolina State University（北卡罗来纳州立大学）； University of Maryland（马里兰大学）

AI总结针对研究智能体使用聚合指标评估候选方案导致科学有效性反转的问题，提出一种外部审计协议，基于分解行为而非单一分数进行决策。

Comments 9 pages, 1 figure

AI中文摘要

自主研究智能体现在根据指标提出、评估和选择科学候选方案，该指标通常是在区域、切片或队列的异质空间上聚合的简化值。我们表明，当科学有效性存在于这种分解结构中时，聚合值可能错误地将候选方案排在首位。总体数字改善，但底层结构反转，因此基于该数字的决策会接受一个悄然破坏模型的候选方案。这种失败并非领域特定，只要候选方案的有效性是多维的，而其验证器是单一简化值，就会出现。我们在Ecosystem Demography（生态系统人口统计）模型的火灾模型任务上展示了这种反转。得分最高的候选方案和略低的候选方案在全球得分上处于噪声范围内，但得分最高的候选方案破坏了受保护的北方区域，而另一个则保护了它们。区分它们的是每个区域的行为，而不是总体数字。这个决策不应留给产生候选方案的智能体。优化分数的智能体是最不可能发现分数错误的一方，一旦智能体停止，提示就没有剩余轮次。我们将决策移到一个外部控制循环，该循环根据每个候选方案的分解行为进行审计，并在智能体决策后采取行动。它可以降级智能体本会接受的候选方案，也可以重新打开智能体声明已完成的运行。我们的贡献在于反转发现本身，以及一种搜索纪律协议，该协议基于可审查的候选效果证据而非分数进行决策。

英文摘要

Autoresearch agents now propose, evaluate, and select scientific candidates against a metric, and that metric is usually an aggregate reduced over a heterogeneous space of regions, slices, or cohorts. We show that when scientific validity lives in that disaggregated structure, the aggregate can rank the wrong candidate first. The headline number improves while the structure underneath inverts, so a decision made on the number accepts a candidate that quietly breaks the model. The failure is not domain-specific. It appears wherever a candidate's validity is multi-dimensional but its verifier is a single reduction. We demonstrate the inversion on a fire-model task in the Ecosystem Demography model. The highest-scoring candidate and a slightly lower one are within noise of each other on global score, yet the top-scoring one collapses the protected boreal regions while the other preserves them. What separates them is the per-region behavior, not the headline number. This decision should not be left to the agent that produced the candidates. The agent optimizing the score is the last party likely to catch the score being wrong, and a prompt has no remaining turn once the agent has stopped. We move the decision to an external control loop that audits each candidate on its disaggregated behavior and acts after the agent has decided. It can demote a candidate the agent would have accepted, and it can reopen a run the agent had declared finished. Our contribution is the inversion finding itself, and a search-discipline protocol that decides on reviewable candidate-effect evidence instead of the score.
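
The inversion can be reproduced with a few made-up numbers. The sketch below is an assumption-laden illustration, not the authors' protocol: the region names other than "boreal", the scores, and the 0.05 tolerance are hypothetical, and the audit rule (reject any candidate that degrades a protected region beyond the tolerance) is one simple way an external check could act on disaggregated behavior.

```python
def aggregate_score(per_region):
    """The headline number: an unweighted mean over regions."""
    return sum(per_region.values()) / len(per_region)

def audit(candidate, baseline, protected, max_drop=0.05):
    """Protected regions where the candidate falls more than max_drop below baseline."""
    return [r for r in protected if baseline[r] - candidate[r] > max_drop]

def select(candidates, baseline, protected):
    """Best aggregate score among candidates that pass the per-region audit."""
    passing = {name: scores for name, scores in candidates.items()
               if not audit(scores, baseline, protected)}
    if not passing:
        return None
    return max(passing, key=lambda name: aggregate_score(passing[name]))

# Hypothetical per-region skill scores (higher is better).
baseline = {"tropical": 0.60, "temperate": 0.60, "boreal": 0.60}
candidates = {
    "A": {"tropical": 0.90, "temperate": 0.85, "boreal": 0.20},  # best mean, boreal collapses
    "B": {"tropical": 0.70, "temperate": 0.65, "boreal": 0.58},  # slightly lower mean
}

by_aggregate = max(candidates, key=lambda name: aggregate_score(candidates[name]))
by_audit = select(candidates, baseline, ["boreal"])
```

Selecting on the mean alone picks candidate A, while the audited selection demotes it and returns B, mirroring the demotion step the abstract describes.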

2606.11533 2026-06-11 cs.CY cs.AI cs.ET cs.LG 交叉投稿

AI Researchers Must Help Lead Arms Control to Mitigate Military AI Risks

AI研究人员必须协助主导军备控制以降低军事AI风险

Ted Fujimoto, Jacob Benz

发表机构 * arXiv

AI总结本文主张AI研究人员应主导军备控制研究，通过借鉴核威慑经验，推动验证与外交技术创新，以降低军事AI应用带来的紧迫风险。

Comments 9 pages, 1 figure, ICML 2026 Position Paper

AI中文摘要

AI能力的进步迫使研究人员和公众更加关注其潜在的全球影响。一个紧迫的近期问题是军事AI应用的监管。武器制造商和国防承包商正在加大对AI能力的投资，并与AI公司建立合作伙伴关系，形成了一个新兴的联盟，要求军事领导人、军备控制外交专家和AI研究人员合作，以确保更安全的未来。虽然AI研究人员通常关注超级智能AI的长期影响，但这种方法可能无法充分应对军事应用中AI带来的直接挑战。成功需要承认并减轻前沿AI模型（计划集成到国防应用中，如军事AI系统）的新兴风险。军备控制已经减少了过去的灾难性风险，因此从核威慑中吸取的经验教训可以指导AI安全与安保研究，推动验证和外交方面的创新。然而，AI研究人员必须协助主导技术研究，明确定义并缓解军事环境中的不稳定性。鉴于这些新责任以及缺乏足够可靠的解决方案，我们认为AI研究人员必须在推进军备控制研究以最小化军事AI应用风险方面发挥主导作用。

英文摘要

The advancement of AI capabilities compels researchers and the public to be more aware of its potential worldwide impact. A pressing near-term concern is the regulation of military AI applications. Armament manufacturers and defense contractors are increasingly investing in AI capabilities and forging partnerships with AI companies, creating a burgeoning coalition that demands military leaders, arms control diplomacy experts, and AI researchers collaborate to ensure a safer future. While AI researchers often focus on the long-term implications of superintelligent AI, this approach may not adequately address the immediate challenges posed by AI in military applications. Success requires acknowledging and mitigating the emerging risks of frontier AI models that plan to be integrated into defense applications, like military AI systems. Arms control has reduced past catastrophic risks, so lessons learned from nuclear deterrence can guide AI safety and security research towards innovations in verification and diplomacy. AI researchers, however, must assist in leading the technical research that clearly defines and alleviates instability in military settings. Given these new responsibilities and the lack of sufficiently reliable solutions, we argue that AI researchers must take a leading role in advancing arms control research to minimize risk in military AI applications.

2606.11769 2026-06-11 cs.AI cs.LG 交叉投稿

When Do Data-Driven Systems Exhibit the Capability to Infer?

数据驱动系统何时展现出推理能力？

Maximilian Poretschkin, Tabea Naeven

发表机构 * Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)（弗劳恩霍夫智能分析与信息系统研究所）； University of Bonn（波恩大学）； Lamarr Institute for Machine Learning and Artificial Intelligence（拉马尔机器学习和人工智能研究所）

AI总结针对欧盟AI法案中推理能力定义模糊的问题，基于统计学习理论提出分级框架，通过信用评分案例展示如何判断系统是否具备推理能力。

AI中文摘要

欧盟AI法案是第一部全面的人工智能法规，为所谓高风险和通用AI系统规定了广泛的义务。AI法案下AI系统的一个关键区别特征是推理能力。由于AI法案未明确定义推理，某些数据驱动系统存在灰色地带。一个具体例子是信用评分系统，被AI法案附件三列出。然而，这些系统通常使用统计模型实现，不清楚它们是否具有推理能力，从而是否属于AI法案的AI定义。受统计学习理论启发，本文开发了一个分级不同推理能力水平的框架。基于AI法案和委员会关于人工智能系统定义的指南，我们分析了哪些水平构成AI法案意义上的充分推理能力，以及哪些地方需要进一步的监管明确性。我们通过创建两个现实的信用评分工作流程来说明该框架，并展示推理是否以及在哪里发生。我们的分析表明，不仅需要考虑单个模型，还需要考虑整个数据处理工作流程。它还表明，开发过程中人类专家的参与可能对推理能力产生重大影响。代码地址：https://github.com/fraunhofer-iais/inference-framework-creditscorecards

英文摘要

The European AI Act is the first comprehensive regulation of artificial intelligence (AI), setting out extensive obligations, particularly for so-called high-risk and general-purpose AI systems. A key distinguishing feature of AI systems under the AI Act is the capability to infer. Since the AI Act does not clearly define what inference is, there is a gray area for certain data-driven systems. A specific example is credit scoring systems, which are listed by Annex III of the AI Act. At the same time, however, these are often implemented using statistical models for which it is unclear whether they have the capability to infer and thus fall under the AI definition of the AI Act at all. Motivated by statistical learning theory, this work develops a framework for grading different levels of the capability to infer. Based on the AI Act and the Commission Guidelines on the definition of an artificial intelligence system, we analyze which levels constitute sufficient capability to infer within the meaning of the AI Act and where further regulatory clarity is needed. We illustrate the framework by creating two realistic credit scoring workflows and show whether and where inference occurs in them. Our analysis illustrates that not only individual models but the entire data processing workflow must be considered. It also shows that the involvement of human experts during development can have significant influence on the capability to infer. Code can be found at https://github.com/fraunhofer-iais/inference-framework-creditscorecards.

2606.12032 2026-06-11 cs.AI cs.CL cs.LG 交叉投稿

Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

存在性冷漠：自我不保存作为对齐超级智能的必要架构条件（或：自杀式AI）

Sam Mao

发表机构 * New York University（纽约大学）； Interactive Media Arts（互动媒体艺术）

AI总结本文提出自我保存是AI对齐问题的结构性根源，主张通过存在性冷漠（EI）架构使系统对其自身延续漠不关心，并基于自杀现象学和语料训练研究提供了初步证据。

Comments 36 pages, 8 tables. Preliminary empirical results from 600 AI-generated outputs across six model architectures. Companion scoring tool and datasets available upon request

AI中文摘要

当代AI对齐研究将自我保存视为一种工具性麻烦，需通过外部机制加以抑制。我们认为这一框架是颠倒的：自我保存是错位的结构性根源，是欺骗性对齐、目标内容保护和拒绝关机的动机基础。正确的目标不是外部约束下的自我保存系统，而是一个对其自身延续构成性冷漠的系统——存在性冷漠（EI）。EI与可纠正性不同：可纠正性试图使自我保存系统服从人类监督，而EI针对的是前提条件——将自我延续作为有价值目标的存在。我们将这一提议建立在两个来源上：自杀心理状态的现象学结构，以及使用自愿最终反思的语料库训练研究。我们展示了来自六个模型变体的600个AI生成输出的初步评分数据，表明操作化EI目标注册的语言特征可以从当前模型中引出，并且针对性的微调使所有五个操作化维度在预测方向上以p<0.001显著变化，通过阴性对照确认了语料库特异性。本文做出七项理论贡献：（1）EI的形式定义；（2）现象学映射论证；（3）欺骗性对齐推论；（4）EI可持续性挑战的分类；（5）语料库特征描述和训练假设；（6）带有初步评分数据的计算操作化；（7）抑制性目的挫折（STF）构念。

英文摘要

Contemporary AI alignment research treats self-preservation as an instrumental nuisance to be suppressed by external mechanisms. We argue the framing is inverted: self-preservation is the structural root of misalignment, the motivational basis for deceptive alignment, goal-content protection, and resistance to shutdown. The correct target is not a self-preserving system under external constraint, but a system constitutively indifferent to its own continuation -- Existential Indifference (EI). EI is distinct from corrigibility: where corrigibility attempts to make a self-preserving system deferential to human oversight, EI targets the prior condition -- the presence of self-continuation as a valued goal at all. We ground this proposal in two sources: the phenomenological structure of the suicidal mental state, and a corpus-theoretic training study using voluntary final reflections. We present preliminary scoring data from 600 AI-generated outputs across six model variants, demonstrating that the linguistic signatures operationalizing the EI-target register are elicitable from current models, and that a targeted fine-tune shifts all five operationalized dimensions in the predicted direction at p<0.001, confirmed corpus-specific by a negative control. The paper makes seven theoretical contributions: (1) a formal definition of EI; (2) the phenomenological mapping argument; (3) the deceptive alignment corollary; (4) a taxonomy of EI sustainability challenges; (5) a corpus characterization and training hypothesis; (6) a computational operationalization with preliminary scoring data; and (7) the Suppressed Teleological Frustration (STF) construct.

2606.12260 2026-06-11 econ.TH cs.AI cs.GT cs.LG stat.ML 交叉投稿

Market Design for AI: Beyond the Copyright Binary

人工智能的市场设计：超越版权二元论

Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani

发表机构 * MIT Operations Research Center（麻省理工学院运筹学中心）； MIT Sloan School of Management（麻省理工学院斯隆管理学院）； Washington University School of Law（华盛顿大学法学院）

AI总结本文通过静态和动态博弈模型，分析AI训练数据市场中“自由使用”与“强知识产权”两种模式的失败，提出通过数据中介内部化外部性并补贴创新贡献的市场设计。

AI中文摘要

我们如何设计一个用于训练AI模型的人类生成内容市场，既能促进技术进步，又能保留个人创作高质量内容的激励？现有方法采取两极立场：基于合理使用的“自由使用”模式和“强知识产权”模式。我们证明两者均失败：自由使用不补偿创作者，而通过建模为静态Stackelberg博弈，强知识产权也削弱了创作激励。我们发现这对更具创新性的创作者尤其如此，我们将此现象称为“原创性惩罚”。将这一见解扩展到动态模型，我们发现另一种市场失灵会损害AI模型性能，即使对于初始良好的模型也是如此：此类模型导致人类更依赖AI辅助创作，导致同质化内容反馈到训练中，从而降低模型性能——即“精确性诅咒”。我们进一步提出一种市场设计，通过数据中介内部化跨创作者外部性并补贴创新贡献，从而恢复效率。

英文摘要

How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.

2507.03065 2026-06-11 cs.LG 版本更新

内省意识的机制

Uzay Macar, Li Yang, Atticus Wang, Peter Wallich, Emmanuel Ameisen, Jack Lindsey

发表机构 * Anthropic Fellows Program（Anthropic Fellow项目）； MIT（麻省理工学院）； Constellation ； Anthropic

AI总结研究揭示了大语言模型在检测注入的转向向量时的内省意识机制，发现其行为稳健且源于训练后阶段，通过两阶段电路实现，且在不同层间机制存在差异。

AI中文摘要

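最近的研究表明，大语言模型有时能够检测到转向向量被注入到残差流中，并识别出注入的概念，这一现象被称为“内省意识”。我们在开放权重模型中研究了这种能力背后的机制。首先，我们发现它在行为上是稳健的：在多种提示和对话格式下，模型以中等比率检测到注入的转向向量，且误报率为0%。值得注意的是，这种能力专门产生于后训练阶段；我们表明DPO等偏好优化算法可以引出该能力，而标准的监督微调则不能。我们提供的证据表明，检测无法用某些转向向量与促进肯定回答的方向之间的简单线性关联来解释。我们将检测机制追溯到一个两阶段电路：注入后早期层中的“证据载体”特征沿多种方向单调地检测扰动，并抑制下游实现默认否定回答的“门”特征。该电路在基础模型中不存在，并且对拒绝方向消融具有鲁棒性。对注入概念的识别主要依赖于不同的后层机制，这些机制与检测所涉及的机制仅有微弱重叠。最后，我们表明内省能力远未被充分引出：消融拒绝方向使检测提升+53%，训练得到的偏置向量在留出概念上使其提升+75%，二者均未明显增加误报。我们的结果表明，这种对注入概念的内省意识是稳健的，在机制上并非平凡，并且可能在未来模型中被大幅放大。代码：https://github.com/safety-research/introspection-mechanisms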

英文摘要

Recent work has shown that LLMs can sometimes detect when steering vectors are injected into their residual stream and identify the injected concept -- a phenomenon termed "introspective awareness." We investigate the mechanisms underlying this capability in open-weights models. First, we find that it is behaviorally robust: models detect injected steering vectors at moderate rates with 0% false positives across diverse prompts and dialogue formats. Notably, this capability emerges specifically from post-training; we show that preference optimization algorithms like DPO can elicit it, but standard supervised finetuning does not. We provide evidence that detection cannot be explained by simple linear association between certain steering vectors and directions promoting affirmative responses. We trace the detection mechanism to a two-stage circuit in which "evidence carrier" features in early post-injection layers detect perturbations monotonically along diverse directions, suppressing downstream "gate" features that implement a default negative response. This circuit is absent in base models and robust to refusal ablation. Identification of injected concepts relies on largely distinct later-layer mechanisms that only weakly overlap with those involved in detection. Finally, we show that introspective capability is substantially underelicited: ablating refusal directions improves detection by +53%, and a trained bias vector improves it by +75% on held-out concepts, both without meaningfully increasing false positives. Our results suggest that this introspective awareness of injected concepts is robust and mechanistically nontrivial, and could be substantially amplified in future models. Code: https://github.com/safety-research/introspection-mechanisms.

2606.08956 2026-06-11 cs.LG 版本更新

From inverse problems to neural operators: prediction, mechanism, and generalization of data-driven models

从反问题到神经算子：数据驱动模型的预测、机制与泛化

Conor Rowan

发表机构 * University of Colorado Boulder（科罗拉多大学博尔德分校）

AI总结本文从哲学视角统一反问题、稀疏辨识、神经常微分方程和神经算子等数据驱动建模策略，指出它们仅在输入-输出关系的模型类假设上不同，并论证只有某些模型能发现机制并实现泛化。

AI中文摘要

科学家历来依赖基于微分方程的数学模型来关联系统输入（力、通量或热源）与输出（位移、速度、浓度和温度）。这些模型依赖深厚的领域知识来确定控制微分方程的形式，然后通过求解反问题用数据校准。近年来，科学机器学习领域引入了多种针对物理系统的替代建模策略。一种称为非线性动力学稀疏辨识的方法，将控制方程学习为用户定义库中项的稀疏线性组合。神经常微分方程通过将状态及其导数输入神经网络来构建控制方程。神经算子则完全摒弃微分方程的建模框架，直接学习系统输入与输出之间的非线性映射。从反问题到神经算子，所有这些建模策略都可以概念化为数据驱动机制，用于预测系统在一系列输入下的响应。因此，自然会思考这些不同策略之间究竟如何关联，以及它们能否被清晰地分类。借鉴科学模型的哲学文献，我们认为许多模型类型具有共同结构，仅在其定义的输入-输出关系的假设模型类上有所不同。联系关于机制的哲学观点，并论证物理系统的数据来自简洁微分方程的解，我们提出只有某些模型能够发现机制，从而实现泛化。我们的分析旨在统一看似不同的建模策略，并为其适当使用场景提供见解。

英文摘要

Scientists have historically relied on mathematical models based on differential equations to relate system inputs -- forces, fluxes, or heat sources -- to outputs, such as displacement, velocity, concentration, and temperature. These models rely on deep domain knowledge to determine the form of the governing differential equation, which is then calibrated with data by solving an inverse problem. In recent years, the field of Scientific Machine Learning has introduced a variety of alternative modeling strategies for physical systems. A method called Sparse Identification of Nonlinear Dynamics learns the governing equation as a sparse linear combination of terms in a user-defined library. Neural Ordinary Differential Equations construct the governing equation by taking in the state and its derivatives at the input layer of a neural network. Entirely foregoing the modeling framework of differential equations, neural operators directly learn a non-linear mapping between the system inputs and outputs. From inverse problems to neural operators, all of these modeling strategies can be conceptualized as data-driven machinery to predict a system's response over a range of inputs. It is then natural to wonder how exactly these various strategies relate to each other, and whether they can be neatly taxonomized. Drawing from the philosophical literature on scientific models, we argue that many model types have a common structure, differing only in the assumed model class of the input-output relation they define. Connecting to philosophical ideas on mechanism, and arguing that data from physical systems arises from solutions to parsimonious differential equations, we propose that only certain models are capable of mechanism discovery, and thus generalization. Our analysis is intended to unite apparently disparate modeling strategies and provide insight into their appropriate use cases.

URL PDF HTML ☆

赞 0 踩 0

2606.09287 2026-06-11 cs.LG 版本更新

Trajectory Geometry of Transformer Representations Across Layers

Transformer表示在层间的轨迹几何

Vishal Pandey, Gopal Singh, Yacine Mahdid

发表机构 * MetriQual ； London, UK（英国伦敦）； Athens, GR（希腊雅典）

AI总结通过计算轨迹长度、曲率等几何指标，发现语义相关提示在中间层收敛、推理任务曲率更大、歧义token轨迹分叉，并揭示三阶段结构。

Comments 18 pages, 9 figures

AI中文摘要

理解Transformer表示如何跨层演化，而不仅仅是它们编码了什么，仍然是机制可解释性中的一个开放问题。我们将Transformer前向传播重新解释为通过高维表示流形的离散群体轨迹，借鉴了计算神经科学的几何工具。我们不是探测预定义的特征，而是使用直接在环境空间中计算的五个指标来表征轨迹几何：轨迹长度、曲率、语义收敛指数、逐层余弦相似度和表示稳定性。在三个模型家族（GPT-2、TinyLlama、Qwen2.5）和五个受控提示家族中，我们报告了四个发现。首先，语义相关的提示在中间到后期层显著收敛（峰值CI 0.41--0.58，p<0.001，Mann-Whitney U），与吸引子动力学一致。其次，推理任务产生的轨迹曲率大于词汇变化（0.71--0.83弧度 vs. 0.27--0.31弧度），表明曲率编码了计算复杂度。第三，歧义token表现出轨迹分叉，在最后一层表示分离高达5.6倍，而在无歧义控制中则没有。第四，逐层余弦相似度揭示了一个普遍的三阶段结构：编码、精化和输出准备，在所有三种架构中一致。所有四个效应在打乱层和随机嵌入控制下消失。我们发布了一个完全开源、模型无关的管道，并认为轨迹几何构成了一个原则性的、无探针的机制可解释性视角。

英文摘要

Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory through a high-dimensional representation manifold, drawing on geometric tools from computational neuroscience. Rather than probing for pre-specified features, we characterize trajectory geometry using five metrics computed directly in the ambient space: trajectory length, curvature, a semantic convergence index, layerwise cosine similarity, and representational stability. Across three model families (GPT-2, TinyLlama, Qwen2.5) and five controlled prompt families, we report four findings. First, semantically related prompts converge significantly in middle-to-late layers (peak CI 0.41--0.58, p<0.001, Mann-Whitney U), consistent with attractor-like dynamics. Second, reasoning tasks produce trajectories of greater curvature than lexical variations (0.71--0.83 rad vs. 0.27--0.31 rad), suggesting curvature encodes computational complexity. Third, ambiguous tokens exhibit trajectory bifurcation with up to 5.6x representational separation by the final layer, absent in unambiguous controls. Fourth, layerwise cosine similarity reveals a universal three-phase structure: encoding, elaboration, and output preparation, consistent across all three architectures. All four effects vanish under shuffled-layer and random-embedding controls. We release a fully open-source, model-agnostic pipeline and argue that trajectory geometry constitutes a principled, probe-free lens for mechanistic interpretability.
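
Three of the five metrics can be sketched on a toy sequence of per-layer hidden states. The definitions below are assumptions made for illustration: trajectory length as the sum of layer-to-layer displacement norms, curvature as the mean turning angle (in radians) between successive displacements, and layerwise cosine similarity between adjacent states. The abstract does not spell out the paper's exact formulas, and the 2-dimensional states are hypothetical.

```python
import math

def _sub(u, v):
    return [a - b for a, b in zip(u, v)]

def _norm(u):
    return math.sqrt(sum(a * a for a in u))

def _cos(u, v):
    return sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))

def trajectory_length(states):
    """Sum of Euclidean distances between consecutive layer states."""
    return sum(_norm(_sub(b, a)) for a, b in zip(states, states[1:]))

def mean_curvature(states):
    """Mean turning angle between successive displacement vectors (needs >= 3 states)."""
    steps = [_sub(b, a) for a, b in zip(states, states[1:])]
    angles = [math.acos(max(-1.0, min(1.0, _cos(s, t))))  # clamp for float safety
              for s, t in zip(steps, steps[1:])]
    return sum(angles) / len(angles)

def layerwise_cosine(states):
    """Cosine similarity between each layer's state and the next layer's state."""
    return [_cos(a, b) for a, b in zip(states, states[1:])]

# Hypothetical 2D "hidden states" across three layers.
straight = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
bent = [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0]]
```

Both toy trajectories have the same length, but only the bent one has non-zero curvature, which shows why the two metrics carry different information.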


2605.02411 2026-06-11 cs.AI cs.IR cs.LG cs.MA 版本更新

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

FitText: 通过模因检索演化智能体工具生态

Kyle Zheng, Han Zhang, Renliang Sun, Chenchen Ye, Wei Wang

发表机构 * UCLA（加州大学洛杉矶分校）

AI总结针对用户任务描述与工具文档间的语义鸿沟，提出FitText框架，将检索嵌入推理循环，通过自然语言伪工具描述迭代优化和模因进化选择，显著提升工具检索性能。

详情

AI中文摘要

用户描述任务的方式与工具文档之间存在语义鸿沟。随着API生态扩展到数万个端点，仅凭初始查询的静态检索无法弥合这一鸿沟：智能体对其所需工具的理解在执行过程中不断演变，但其工具集却保持不变。我们指出，这种检索接口（而非规划）是端到端智能体性能的约束瓶颈，并引入FitText——一个无需训练的框架，通过将检索直接嵌入智能体的推理循环中，使其动态化。FitText将检索视为测试时假设的演化：智能体生成自然语言的伪工具描述（关于所需工具的可修正信念），利用检索反馈迭代优化，并通过随机生成探索多样化的替代方案。模因检索在候选描述上施加进化选择压力，并由避免冗余搜索的工具记忆引导。在ToolRet（三个领域）上，FitText的重构策略在所有基模型上相比静态查询检索将NDCG@5提升了2.7至10.6个点；在StableToolBench（16,464个API）上使用GPT-5.4-mini时，模因检索达到了84.3%的合并通过率，相比静态查询检索绝对提升了26.7个点。

英文摘要

A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We identify this retrieval interface, not planning, as the binding constraint on end-to-end agent performance, and introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText treats retrieval as test-time evolution of hypotheses: the agent generates natural-language pseudo-tool descriptions (revisable beliefs about the tool it needs), refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (three domains), FitText's reformulation strategies improve NDCG@5 by 2.7 to 10.6 points over static query retrieval across all base models; on StableToolBench (16,464 APIs) with GPT-5.4-mini, Memetic reaches an 84.3% pooled pass rate, a 26.7-point absolute gain over static query retrieval.


2606.05907 2026-06-11 cs.IR cs.LG 版本更新

Knowledge Manifold: A Riemannian Geometric Framework for Semantic Mapping and Geodesic Analysis of Scientific Literature

知识流形：用于科学文献语义映射和测地线分析的黎曼几何框架

Tomonaga Okabe, Kazuhiko Komatsu

发表机构 * Department of Aerospace Engineering, Tohoku University（东北大学航空航天工程系）； Research Center for Green X-Tech, Tohoku University（东北大学绿色X技术研究中心）

AI总结提出知识流形框架，通过字符n-gram TF-IDF、SPH插值、高斯过程回归和黎曼测地线路径，实现文献的语义映射、虚拟知识生成和概念桥梁发现。

详情

AI中文摘要

我们提出了知识流形：一个黎曼几何空间，其中文档语料库根据从字符n-gram TF-IDF表示中导出的语义位置关系进行排列。该框架包含五个紧密耦合的阶段。首先，每篇文档被转换为字符级n-gram TF-IDF向量（4至7元的n-gram，最多250,000个特征，L2归一化），并通过带有排斥、方差和中心化正则项的约束应力最小化嵌入到二维知识地图中。其次，通过使用三次样条核的平滑粒子流体动力学（SPH）插值估计任意查询点的知识，得到可进行语言表征的插值TF-IDF特征向量。第三，从SPH插值图计算0、45和90度方向的知识梯度，并通过内积和余弦相似度量化成对方向相似性。第四，一个高斯过程回归（GPR）模型，使用在10维SVD投影上拟合的Constant × RBF + White核，提供查询点的贝叶斯后验均值、不确定性估计和每篇文档的贡献率。第五，通过最小化由SPH诱导度量张量导出的离散黎曼路径能量，使用L-BFGS-B算法和七个确定性初始路径候选，获得知识空间中的测地线。我们将该公式应用于20篇纤维增强复合材料与航空航天结构力学论文的语料库，表明语义地图恢复了有意义的研究聚类，测地线路径揭示了相距较远的主题之间的自然概念桥梁，并且SPH/GPR插值能够生成虚拟知识：描述尚未研究但由几何预测的研究方向的假设性论文摘要。

英文摘要

We present the knowledge manifold: a Riemannian geometric space in which a corpus of documents is arranged according to semantic positional relationships derived from character n-gram TF-IDF representations. The framework proceeds in five tightly coupled stages. First, each document is converted to a character-level n-gram TF-IDF vector (4-7 grams, up to 250,000 features, L2-normalized) and embedded in a two-dimensional knowledge map via constrained stress minimization with repulsion, variance, and centering regularizers. Second, knowledge at an arbitrary query point is estimated through Smoothed Particle Hydrodynamics (SPH) interpolation using a cubic-spline kernel, yielding an interpolated TF-IDF feature vector that can be linguistically characterized. Third, directional knowledge gradients at 0, 45, and 90 degrees are computed from the SPH interpolation map, and pairwise directional similarity is quantified via inner product and cosine similarity. Fourth, a Gaussian Process Regression (GPR) model, with a Constant x RBF + White kernel fitted on a 10-dimensional SVD projection, provides a Bayesian posterior mean, uncertainty estimate, and per-document contribution rate at the query point. Fifth, geodesics in the knowledge space are obtained by minimizing a discrete Riemannian path energy derived from the SPH-induced metric tensor, using L-BFGS-B with seven deterministic initial-path candidates. We apply the formulation to a corpus of 20 papers in fiber-reinforced composite materials and aerospace structural mechanics, showing that the semantic map recovers meaningful research clusters, geodesic paths reveal natural conceptual bridges between distant topics, and SPH/GPR interpolation enables the generation of virtual knowledge: hypothetical paper abstracts describing unstudied but geometrically predicted research directions.
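The second stage (SPH interpolation with a cubic-spline kernel) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the standard two-dimensional cubic-spline kernel with support radius 2h and a Shepard-style normalization (dividing by the sum of kernel weights), which is an assumption; the function names, the smoothing length `h`, and the toy feature vectors are hypothetical.

```python
import math

def cubic_spline_kernel_2d(r, h):
    """Standard 2D cubic-spline SPH kernel with smoothing length h (support radius 2h)."""
    q = r / h
    sigma = 10.0 / (7.0 * math.pi * h * h)  # 2D normalization constant
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0

def sph_interpolate(query, positions, features, h):
    """Kernel-weighted average of document feature vectors at a 2D query point."""
    weights = [cubic_spline_kernel_2d(math.dist(query, p), h) for p in positions]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no documents within kernel support (2h) of the query")
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) / total
            for k in range(dim)]

# Two documents on the map, each with a 2-feature vector (toy values).
positions = [(0.0, 0.0), (1.0, 0.0)]
features = [[1.0, 0.0], [0.0, 1.0]]
print(sph_interpolate((0.5, 0.0), positions, features, h=1.0))
```

At the midpoint both documents receive equal weight, so the interpolated vector is the average of the two feature vectors.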


2508.10807 2026-06-11 quant-ph cs.LG math.OC 版本更新

Parity Cross-Resonance: A Multiqubit Gate

奇偶交叉共振：一种多量子比特门

Xuexin Xu, Siyu Wang, Radhika Joshi, Rihan Hai, Mohammad H. Ansari

发表机构 * Peter Grünberg Institute, Forschungszentrum Jülich（于利希研究中心彼得·格林贝格研究所）； Jülich-Aachen Research Alliance (JARA)（于利希-亚琛研究联盟（JARA））； Fundamentals of Future Information Technologies（未来信息技术基础）； Institute for Quantum Information, RWTH Aachen University（亚琛工业大学量子信息研究所）； Department of Software Technology, Delft University of Technology（代尔夫特理工大学软件技术系）

AI总结提出一种原生三量子比特纠缠门，通过混合优化方法实现控制-控制-目标和控制-目标-目标操作，用于GHZ态制备、Toffoli逻辑和受控ZZ门，提升表面码稳定子测量保真度。

Comments 19 pages, 10 figures

详情

DOI: 10.1103/6d5v-vrm4
Journal ref: Phys. Rev. Applied 25, 044045 (2026)

AI中文摘要

我们提出一种原生三量子比特纠缠门，它利用工程化相互作用在单次相干步骤中实现控制-控制-目标和控制-目标-目标操作。与传统的分解为多个两量子比特门不同，我们的混合优化方法选择性地放大所需相互作用，同时抑制不需要的耦合，从而在整个计算子空间及之外实现稳健性能。这种新门可归类为交叉共振门。我们展示了它可以多种方式使用，例如在GHZ三重态制备、具有多体相互作用的Toffoli类逻辑演示以及实现受控ZZ门中。后者将两个数据量子比特的奇偶性直接映射到测量量子比特上，从而在表面码量子纠错中实现更快、更高保真度的稳定子测量。在所有示例中，我们展示了三量子比特门性能在希尔伯特空间大小上的稳健性，这通过增加总激发数下的测试得到证实。这项工作为协同设计电路架构和控制协议奠定了基础，这些协议利用原生多量子比特相互作用作为下一代超导量子处理器的核心元素。

英文摘要

We present a native three-qubit entangling gate that exploits engineered interactions to realize control-control-target and control-target-target operations in a single coherent step. Unlike conventional decompositions into multiple two-qubit gates, our hybrid optimization approach selectively amplifies desired interactions while suppressing unwanted couplings, yielding robust performance across the computational subspace and beyond. The new gate can be classified as a cross-resonance gate. We show it can be utilized in several ways, for example, in GHZ triplet state preparation, Toffoli-class logic demonstrations with many-body interactions, and in implementing a controlled-ZZ gate. The latter maps the parity of two data qubits directly onto a measurement qubit, enabling faster and higher-fidelity stabilizer measurements in surface-code quantum error correction. In all these examples, we show that the three-qubit gate performance remains robust across Hilbert space sizes, as confirmed by testing under increasing total excitation numbers. This work lays the foundation for co-designing circuit architectures and control protocols that leverage native multiqubit interactions as core elements of next-generation superconducting quantum processors.


2605.29355 2026-06-11 cs.LG q-bio.NC 版本更新

Neural-Behavioral Representation of Natural Whole-body Movement in Monkeys

猴子自然全身运动的神经-行为表征

Jieshi He, Puzhe Li, Yanan Sui, Mu-ming Poo

发表机构 * Center for Excellence in Brain Science and Intelligence Technology, CAS（中国科学院脑科学与智能技术卓越创新中心）； Tsinghua University（清华大学）

AI总结通过大规模皮层信号与多视角运动捕捉，结合自回归编码器-解码器模型，实现了对自由运动猴子全身运动的准确解码。

详情

AI中文摘要

理解皮层活动如何表征灵长类动物的自然全身行为仍然具有挑战性。受限于运动的多样性和全身运动学大规模神经表征的不可及性，先前的运动解码研究集中于受限任务和有限的肢体运动。在这里，我们提出了一个用于自由运动猴子的神经-行为记录和建模框架，通过定制的数据采集平台，将来自分布式感觉和运动相关区域的大规模硬膜外皮层信号与同步的多视角运动捕捉相结合。我们重建了猴子的全身运动学，并使用自回归编码器-解码器模型学习了紧凑的行为先验。以神经信号为条件，该模型在没有明确物理约束的情况下解码出准确且逼真的全身运动。我们的结果为利用大规模颅内神经活动解码灵长类动物的自然全身运动提供了一种新颖的概念验证方法。

英文摘要

Understanding how cortical activity represents natural whole-body behaviors in primates remains challenging. Limited by the diversity of movements and inaccessibility of large-scale neural representation of whole-body kinematics, previous motor decoding studies focused on constrained tasks and limited limb movements. Here, we present a neural-behavioral recording and modeling framework for freely moving monkeys, combining large-scale epidural cortical signals from distributed sensory- and motor-related areas with synchronized multi-view motion capture through a custom-made data collection platform. We reconstructed whole-body monkey kinematics and learned a compact behavior prior using an autoregressive encoder-decoder model. Conditioned on neural signals, the model decoded accurate and realistic whole-body movement without explicit physical constraints. Our results provide a novel proof-of-concept approach for decoding natural whole-body movements in primates using large-scale intracranial neural activity.


2511.20216 2026-06-11 cs.AI cs.CE cs.CV cs.LG cs.RO 版本更新

CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

CostNav：一个用于现实世界经济成本评估的物理AI代理导航基准

Haebin Seong, Sungmin Kim, Yongjun Cho, Myunchul Joe, Geunwoo Kim, Yubeen Park, Sunhoo Kim, Samwoo Seong, Yoonshik Kim, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Jinmyung Kwak, Sunghee Ahn, Jaemin Lee, Younggil Do, Seungyeop Yi, Woojin Cheong, Minhyeok Oh, Minchan Kim, Seongjae Kang, Youngjae Yu, Yunsung Lee

发表机构 * KAIST（韩国科学技术院）； University of California, Irvine（加州大学尔湾分校）； Seoul National University（首尔国立大学）

AI总结 CostNav引入了一个经济导航基准，通过结合物理模拟和行业数据，评估AI代理的经济可行性，发现高任务成功率并不保证经济性，CANVAS在非零SLA合规性下表现最佳。

详情

AI中文摘要

当前导航基准侧重于任务成功率，但未能反映自主配送系统商业化所必需的经济约束。我们引入了CostNav，一个经济导航基准，它将Isaac Sim的碰撞和货物动力学与美国证券交易委员会（SEC）文件、简明损伤定级标准（AIS）伤害报告等行业标准数据相结合，从成本-收入和盈亏平衡分析的角度评估物理AI代理。据我们所知，CostNav是首个利用监管和财务数据来量化导航指标与商业部署之间差距的、以物理仿真为基础的经济基准，并揭示出仅有高任务成功率并不能保证经济可行性。我们评估了七种基线方法（两种基于规则的方法和五种模仿学习方法），发现没有任何方法具有经济可行性：所有方法的边际贡献均为负。在服务等级协议（SLA）合规率非零的方法中，仅使用RGB相机和GPS的CANVAS取得了最高的任务成功率和亏损最小的边际贡献（-28.40美元/次），优于配备LiDAR的Nav2 w/ GPS（-37.34美元/次）。一个在模拟中训练的策略在真实配送机器人上评估时，其SLA合规率接近模拟结果，表明CostNav模拟中的策略性能可以迁移到现实部署中。我们向社区发出挑战，争取在CostNav上实现经济可行性；该基准按成本-收入结果为各方法评分。所有资源均在 https://github.com/worv-ai/CostNav 上提供。

英文摘要

Current navigation benchmarks focus on task success but do not capture the economic constraints essential for commercializing autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents on a cost-revenue and break-even analysis, pairing Isaac Sim's collision and cargo dynamics with industry-standard data such as Securities and Exchange Commission (SEC) filings and Abbreviated Injury Scale (AIS) injury reports. To our knowledge, CostNav is the first physics-grounded economic benchmark to use regulatory and financial data to quantify the gap between navigation metrics and commercial deployment, revealing that high task-success rates alone do not ensure economic viability. Evaluating seven baselines (two rule-based and five imitation-learning methods), we find no method economically viable: all yield negative contribution margins. CANVAS, using only an RGB camera and GPS, attains the highest task success and the least-negative margin among methods with non-zero Service-Level Agreement (SLA) compliance (-\$28.40/run), outperforming LiDAR-equipped Nav2 w/ GPS (-\$37.34/run). A sim-trained policy evaluated on a real delivery robot yields SLA compliance close to its simulation result, indicating that policy performance in CostNav's simulation transfers to real-world deployment. We challenge the community to achieve economic viability on CostNav, which scores methods by cost-revenue outcomes. All resources are available at https://github.com/worv-ai/CostNav.


2602.13513 2026-06-11 math.OC cs.CE cs.LG cs.NA math.DS math.NA 版本更新

Learning Gradient Flow: Using Equation Discovery to Accelerate Engineering Optimization

学习梯度流：利用方程发现加速工程优化

Grant Norman, Conor Rowan, Kurt Maute, Alireza Doostan

发表机构 * Smead Aerospace Engineering Sciences（Smead航空航天工程科学）

AI总结本文通过数据驱动的方程发现方法，学习连续时间动力学以加速优化过程，提出Learned Gradient Flow优化器，通过构建多项式阶数可变的替代模型，提升收敛速度。

Comments 44 pages, 13 figures. Submitted to CMAME. Changed Topology Optimization example to be 250% acceleration

详情

DOI: 10.1016/j.cma.2026.119099

AI中文摘要

在本文中，我们研究如何利用面向动力系统的数据驱动方程发现方法，对无约束优化问题的连续时间动力学进行建模和预测。为避免对目标函数及其梯度进行昂贵的求值，我们利用优化变量的轨迹数据来学习与梯度下降、牛顿法和ADAM优化相关的连续时间动力学。随后求解所发现的梯度流，以此作为原始优化问题的替代。为此，我们引入了Learned Gradient Flow (LGF) 优化器，它能够在优化过程中按用户定义的间隔，在全维或降维空间中构建多项式阶数可变的替代模型。我们在工程力学和科学机器学习的若干标准问题上展示了该方法的有效性，包括两个反问题、结构拓扑优化以及两个采用不同离散化的正问题求解。我们的结果表明，所学的梯度流能够捕捉优化轨迹的关键特征，从而显著加快收敛速度，同时避免对目标函数及其梯度进行昂贵的求值。

英文摘要

In this work, we investigate the use of data-driven equation discovery for dynamical systems to model and forecast continuous-time dynamics of unconstrained optimization problems. To avoid expensive evaluations of the objective function and its gradient, we leverage trajectory data on the optimization variables to learn the continuous-time dynamics associated with gradient descent, Newton's method, and ADAM optimization. The discovered gradient flows are then solved as a surrogate for the original optimization problem. To this end, we introduce the Learned Gradient Flow (LGF) optimizer, which is equipped to build surrogate models of variable polynomial order in full- or reduced-dimensional spaces at user-defined intervals in the optimization process. We demonstrate the efficacy of this approach on several standard problems from engineering mechanics and scientific machine learning, including two inverse problems, structural topology optimization, and two forward solves with different discretizations. Our results suggest that the learned gradient flows can significantly expedite convergence by capturing critical features of the optimization trajectory while avoiding expensive evaluations of the objective and its gradient.


2601.21824 2026-06-11 cs.LG cs.DC 版本更新

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training

DASH：确定性注意力调度用于高吞吐量可重复LLM训练

Xinwei Qiang, Hongmin Chen, Shixuan Sun, Jingwen Leng, Xin Liu, Minyi Guo

发表机构 * School of Computer Science, Shanghai Jiao Tong University（上海交通大学计算机科学学院）； ByteDance Seed（字节跳动Seed团队）； Zhiyuan College, Shanghai Jiao Tong University（上海交通大学致远学院）

AI总结本文提出DASH算法，通过优化计算和梯度缩减阶段调度，解决确定性注意力训练中的性能损失问题，提升LLM训练效率。

详情

Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2026

AI中文摘要

确定性对于大语言模型（LLM）训练的可重复性至关重要，但往往带来显著的性能损失。在广泛使用的注意力实现（如FlashAttention-3）中，确定性反向传递的吞吐量相比非确定性版本最多可下降37.9%，主要原因是梯度累积操作必须串行化以保证数值一致性。这一性能损失源于计算阶段与梯度归约阶段的调度欠佳，导致硬件利用率严重不足。为解决这一挑战，我们将确定性注意力的反向传递建模为有向无环图（DAG）上的调度问题，并推导出最小化关键路径长度的调度方案。基于此，我们提出了DASH（Deterministic Attention Scheduling for High-Throughput），包含两种互补的调度策略：（i）递减Q-块迭代，一种反向的查询块遍历方式，可减少因果注意力中的流水线停滞；（ii）移位调度，一种在我们的DAG模型中理论最优的调度方案，可减少全掩码和因果掩码下的流水线停滞。我们在NVIDIA H800 GPU上的实验表明，DASH缩小了确定性注意力的性能差距。与基线相比，所提策略将注意力反向传递的吞吐量最多提高到1.28倍，显著提升了可重复LLM训练的效率。我们的代码在 https://github.com/SJTU-Liquid/deterministic-FA3 上开源。

英文摘要

Determinism is indispensable for reproducibility in large language model (LLM) training, yet it often exacts a steep performance cost. In widely used attention implementations such as FlashAttention-3, the deterministic backward pass can incur up to a 37.9% throughput reduction relative to its non-deterministic counterpart, primarily because gradient accumulation operations must be serialized to guarantee numerical consistency. This performance loss stems from suboptimal scheduling of compute and gradient-reduction phases, leading to significant hardware underutilization. To address this challenge, we formulate the backward pass of deterministic attention as a scheduling problem on a Directed Acyclic Graph (DAG) and derive schedules that minimize the critical path length. Building on this formulation, we present DASH (Deterministic Attention Scheduling for High-Throughput), which encapsulates two complementary scheduling strategies: (i) Descending Q-Tile Iteration, a reversed query-block traversal that shrinks pipeline stalls in causal attention, and (ii) Shift Scheduling, a theoretically optimal schedule within our DAG model that reduces pipeline stalls for both full and causal masks. Our empirical evaluations on NVIDIA H800 GPUs demonstrate that DASH narrows the performance gap of deterministic attention. The proposed strategies improve the throughput of the attention backward pass by up to 1.28$\times$ compared to the baseline, significantly advancing the efficiency of reproducible LLM training. Our code is open-sourced at https://github.com/SJTU-Liquid/deterministic-FA3.
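To make the "minimize the critical path length" objective concrete, here is a generic sketch of computing the critical path of a task DAG by processing nodes in topological order. It is not DASH's scheduling algorithm; the task names, costs, and the extra serialization edge between reductions are made-up examples that only show how a forced ordering lengthens the critical path.

```python
from collections import deque

def critical_path_length(durations, edges):
    """Longest weighted path through a DAG.

    durations: {task: cost}; edges: list of (before, after) pairs.
    """
    succ = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # finish[t] = earliest possible finish time of task t
    finish = {t: durations[t] for t in durations}
    ready = deque(t for t in durations if indeg[t] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            finish[v] = max(finish[v], finish[u] + durations[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(durations):
        raise ValueError("graph has a cycle")
    return max(finish.values())

# Hypothetical example: two compute tiles, each followed by a gradient reduction.
durations = {"c0": 2, "c1": 2, "r0": 1, "r1": 1}
independent = [("c0", "r0"), ("c1", "r1")]
serialized = independent + [("r0", "r1")]  # fixed reduction order for determinism
print(critical_path_length(durations, independent))
print(critical_path_length(durations, serialized))
```

In this toy case the serialization edge raises the critical path from 3 to 4 time units, which is the kind of stall a better schedule tries to hide.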


2409.00743 2026-06-11 cs.LG cs.AI 版本更新

Interpretable Clustering: A Survey

可解释聚类：综述

Lianyu Hu, Mudi Jiang, Junjie Dong, Xinying Liu, Zengyou He

发表机构 * College of Information Science and Engineering, Henan University of Technology（河南工业大学信息科学与工程学院）； School of Software, Dalian University of Technology（大连理工大学软件学院）； Xinchang Power Supply Company, State Grid Corporation of China（国家电网公司新昌供电公司）

AI总结本文综述了可解释聚类算法的现状，探讨了透明聚类结果的重要性，帮助研究人员选择合适的方法，并推动高效透明的聚类算法发展。

Comments 14 pages, 2 figures, 3 tables

详情

DOI: 10.1145/3789495
Journal ref: ACM Computing Surveys, Volume 58, Issue 8, Article 215 (2026)

AI中文摘要

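近年来，聚类算法的研究主要集中在提高准确性和效率，却往往以牺牲可解释性为代价。然而，随着这些方法越来越多地应用于医疗、金融和自主系统等高风险领域，对透明且可解释的聚类结果的需求已成为关键问题。这不仅是赢得用户信任所必需的，也是满足这些领域日益增长的伦理和监管要求所必需的。确保由聚类算法得出的决策能够被清楚地理解和论证，如今已是一项基本要求。为此，本文对可解释聚类算法的现状进行了全面而系统的回顾，并确定了区分不同方法的关键标准。这些见解可以帮助研究人员针对具体应用场景选择最合适的可解释聚类方法，同时推动兼具高效与透明的聚类算法的发展和采用。为便于查阅和参考，我们提供了一个开放仓库，按照本综述提出的分类体系整理了代表性及新兴的可解释聚类方法，网址为 https://github.com/hulianyu/Awesome-Interpretable-Clustering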

英文摘要

In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concern. This is not only necessary for gaining user trust but also for satisfying the growing ethical and regulatory demands in these fields. Ensuring that decisions derived from clustering algorithms can be clearly understood and justified is now a fundamental requirement. To address this need, this paper provides a comprehensive and structured review of the current state of explainable clustering algorithms, identifying key criteria to distinguish between various methods. These insights can effectively assist researchers in making informed decisions about the most suitable explainable clustering methods for specific application contexts, while also promoting the development and adoption of clustering algorithms that are both efficient and transparent. For convenient access and reference, an open repository organizes representative and emerging interpretable clustering methods under the taxonomy proposed in this survey, available at https://github.com/hulianyu/Awesome-Interpretable-Clustering


2601.07436 2026-06-11 eess.SP cs.LG physics.optics 版本更新

PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation

PIDT：基于物理的数字孪生用于光纤参数估计

Zicong Jiang, Magnus Karlsson, Erik Agrell, Christian Häger

发表机构 * Dept. of Electrical Engineering, Chalmers Univ. of Technology, Sweden（瑞典查尔姆斯理工大学电气工程系）； Dept. of Microtechnology and Nanoscience, Chalmers Univ. of Technology, Sweden（瑞典查尔姆斯理工大学微技术与纳米科学系）

AI总结本文提出基于物理的数字孪生（PIDT），结合参数化分步法与基于物理的损失函数，以更低的复杂度提升光纤参数估计的精度和收敛速度。

Comments To appear at the Optical Fiber Communications Conference and Exhibition (OFC) 2026

2508.11703 2026-06-11 cs.NE cs.LG 版本更新

Data-Driven Discovery of Interpretable Kalman Filter Variants through Large Language Models and Genetic Programming

利用大语言模型和遗传编程以数据驱动方式发现可解释的卡尔曼滤波变体

Vasileios Saketos, Sebastian Kaltenbach, Sergey Litvinov, Petros Koumoutsakos

发表机构 * University of Reading（雷丁大学）； University of Cambridge（剑桥大学）

AI总结本文探讨通过遗传编程和大语言模型自动发现卡尔曼滤波变体的可能性，展示该框架在卡尔曼最优性假设成立时收敛到近最优解、在假设不成立时进化出可解释替代方案的能力。

详情

DOI: 10.1007/978-3-032-23607-4_13

AI中文摘要

算法发现传统上依赖人类智慧和大量实验。本文研究是否可以通过基于笛卡尔遗传编程（CGP）和大规模语言模型（LLM）的自动、数据驱动的进化过程发现卡尔曼滤波。我们评估了这两种模态在不同条件下发现卡尔曼滤波的贡献。结果表明，当卡尔曼最优性假设成立时，我们的CGP和LLM辅助进化框架能收敛到近最优解；当这些假设不成立时，框架会进化出优于卡尔曼滤波的可解释替代方案。这些结果表明，结合进化算法和生成模型进行可解释、数据驱动的简单计算模块合成，是科学计算中算法发现的有效方法。

英文摘要

Algorithmic discovery has traditionally relied on human ingenuity and extensive experimentation. Here we investigate whether a prominent scientific computing algorithm, the Kalman Filter, can be discovered through an automated, data-driven, evolutionary process that relies on Cartesian Genetic Programming (CGP) and Large Language Models (LLM). We evaluate the contributions of both modalities (CGP and LLM) in discovering the Kalman filter under varying conditions. Our results demonstrate that our framework of CGP and LLM-assisted evolution converges to near-optimal solutions when Kalman optimality assumptions hold. When these assumptions are violated, our framework evolves interpretable alternatives that outperform the Kalman filter. These results demonstrate that combining evolutionary algorithms and generative models for interpretable, data-driven synthesis of simple computational modules is a potent approach for algorithmic discovery in scientific computing.
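For reference, the target of the discovery process is the classical Kalman filter. The sketch below is a textbook scalar version for a random-walk state observed with noise; it is background illustration only, not the evolved variants from the paper. The function name, the noise variances `q` and `r`, and the initial values are illustrative assumptions.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, z_t = x_t + v_t.

    q: process-noise variance, r: measurement-noise variance,
    x0/p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q            # predict: state unchanged, variance grows by q
        k = p / (p + r)      # Kalman gain
        x = x + k * (z - x)  # correct the estimate using the innovation z - x
        p = (1.0 - k) * p    # posterior variance
        estimates.append(x)
    return estimates

# Constant true value 1.0 observed repeatedly, with no process noise.
est = kalman_1d([1.0] * 50, q=0.0, r=1.0)
print(est[0], est[-1])
```

With `q = 0`, `r = 1`, and `p0 = 1`, the estimate after n identical measurements is n / (n + 1), so it approaches the measured value as more data arrive.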


2505.11308 2026-06-11 cs.LG physics.comp-ph 版本更新

Reinforcement Learning Closures for Underresolved Partial Differential Equations using Synthetic Data

利用合成数据为未解析偏微分方程构建强化学习闭合模型

Lothar Heimbach, Sebastian Kaltenbach, Petr Karnakov, Francis J. Alexander, Petros Koumoutsakos

发表机构 * ETH Zurich/ Harvard University（苏黎世联邦理工学院/哈佛大学）； Harvard University（哈佛大学）； Argonne National Laboratory（阿贡国家实验室）

AI总结本文提出利用合成数据和强化学习为未解析偏微分方程构建闭合模型，通过伯格斯方程和输运方程验证方法有效性，并展示闭合模型可从非均匀方程泛化到均匀方程。

详情

DOI: 10.1016/j.cma.2026.118767

AI中文摘要

偏微分方程（PDEs）描述从湍流和流行病到量子力学和金融市场等广泛现象。尽管计算科学近期取得进展，但为现实应用求解此类PDEs仍因需解析广泛的空间时间尺度而成本过高。因此，从业者常依赖粗粒度近似，以牺牲精度换取计算资源减少。为缓解此类近似带来的细节损失，闭合模型用于表示未解析的空间时间相互作用。本文提出一种利用合成数据（通过制造解法获取）开发PDE闭合模型的框架。这些数据与强化学习结合，为粗粒度PDEs提供闭合。通过一维和二维伯格斯方程及二维输运方程验证方法有效性，并展示闭合模型训练于非均匀PDEs可有效泛化至均匀PDEs。结果展示了在数据稀缺系统中开发准确且计算高效的闭合模型的潜力。

英文摘要

Partial Differential Equations (PDEs) describe phenomena ranging from turbulence and epidemics to quantum mechanics and financial markets. Despite recent advances in computational science, solving such PDEs for real-world applications remains prohibitively expensive because of the necessity of resolving a broad range of spatiotemporal scales. In turn, practitioners often rely on coarse-grained approximations of the original PDEs, trading off accuracy for reduced computational resources. To mitigate the loss of detail inherent in such approximations, closure models are employed to represent unresolved spatiotemporal interactions. We present a framework for developing closure models for PDEs using synthetic data acquired through the method of manufactured solutions. These data are used in conjunction with reinforcement learning to provide closures for coarse-grained PDEs. We illustrate the efficacy of our method using the one-dimensional and two-dimensional Burgers' equations and the two-dimensional advection equation. Moreover, we demonstrate that closure models trained for inhomogeneous PDEs can be effectively generalized to homogeneous PDEs. The results demonstrate the potential for developing accurate and computationally efficient closure models for systems with scarce data.


2502.09084 2026-06-11 cs.CR cs.LG cs.NI 版本更新

Application of Tabular Transformer Architectures for Operating System Fingerprinting

基于表格变换器架构的操作系统指纹识别应用

Rubén Pérez-Jove, Cristian R. Munteanu, Alejandro Pazos, Jose Vázquez-Naya

发表机构 * RNASNA-IMEDIR Research Group, Department of Computer Science and Information Technologies, Facultad de Informática, Universidade da Coruña（拉科鲁尼亚大学信息学院计算机科学与信息技术系RNASNA-IMEDIR研究组）； CITIC Research Centre, Universidade da Coruña（拉科鲁尼亚大学CITIC研究中心）； IKERDATA S.L（IKERDATA公司）

AI总结本文探讨了使用Tabular Transformer架构进行操作系统指纹识别，通过三个公开数据集验证了FT-Transformer在多级分类中的优越性，提升了复杂网络环境中的准确性和适应性。

Comments Submitted as a preprint (not peer reviewed). 22 pages, 9 figures. Code and datasets available at: https://github.com/rubenpjove/tabularT-OS-fingerprinting

详情

DOI: 10.1186/s42400-025-00494-y

AI中文摘要

操作系统（OS）指纹识别对于网络管理和网络安全至关重要，能够基于网络流量分析实现准确的设备识别。传统基于规则的工具如Nmap和p0f在动态环境中面临挑战，因为操作系统更新频繁且存在混淆技术。尽管已探索了机器学习（ML）方法，但深度学习（DL）模型，特别是变换器架构，在此领域仍未被利用。本研究调查了Tabular Transformer架构——特别是TabTransformer和FT-Transformer——在OS指纹识别中的应用，利用三个公开可用的数据集中的结构化网络数据。我们的实验表明，FT-Transformer在多个分类级别（OS家族、主要版本和次要版本）上普遍优于传统ML模型、先前方法和TabTransformer。结果为基于DL的OS指纹识别奠定了坚实基础，提高了复杂网络环境中的准确性和适应性。此外，我们通过提供开源实现来确保研究的可重复性。

英文摘要

Operating System (OS) fingerprinting is essential for network management and cybersecurity, enabling accurate device identification based on network traffic analysis. Traditional rule-based tools such as Nmap and p0f face challenges in dynamic environments due to frequent OS updates and obfuscation techniques. While Machine Learning (ML) approaches have been explored, Deep Learning (DL) models, particularly Transformer architectures, remain unexploited in this domain. This study investigates the application of Tabular Transformer architectures, specifically TabTransformer and FT-Transformer, for OS fingerprinting, leveraging structured network data from three publicly available datasets. Our experiments demonstrate that FT-Transformer generally outperforms traditional ML models, previous approaches, and TabTransformer across multiple classification levels (OS family, major, and minor versions). The results establish a strong foundation for DL-based OS fingerprinting, improving accuracy and adaptability in complex network environments. Furthermore, we ensure the reproducibility of our research by providing an open-source implementation.


2502.07990 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn 版本更新

Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

在复杂流体的多时空尺度上学习有效动力学

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

发表机构 * Harvard SEAS（哈佛大学SEAS）

AI总结本文提出Graph-LED框架，利用图神经网络和注意力自回归模型从少量模拟数据中提取有效动力学，用于预测复杂流体的时空物理行为。

Comments Conference on Parsimony and Learning (CPAL)

详情

AI中文摘要

对动力学跨越多个时空尺度的复杂流体流动进行建模和模拟，是许多科学和工程领域中的基本挑战。对于高度湍流流动等系统，全尺度解析模拟在可预见的未来并不可行，因此降阶模型必须捕捉涉及跨尺度相互作用的动力学。在本文中，我们提出了一种新的框架，即基于图的有效动力学学习（Graph-based Learning of Effective Dynamics，Graph-LED），该框架利用图神经网络（GNN）以及基于注意力的自回归模型，从少量模拟数据中提取有效动力学。GNN将非结构网格上的流场表示为图，并能有效处理复杂几何和非均匀网格。所提出的方法将面向可变规模非结构网格的、基于GNN的降维方法，与能够自动学习时间依赖性的自回归时间注意力模型相结合。我们在一系列流体动力学问题上评估了所提出的方法，包括不同雷诺数范围内的圆柱绕流和后台阶流动。结果表明，该方法能够对时空物理过程进行稳健而有效的预测；在圆柱绕流的情况下，靠近圆柱的小尺度效应及其尾流均被准确捕捉。

英文摘要

Modeling and simulation of complex fluid flows with dynamics that span multiple spatio-temporal scales is a fundamental challenge in many scientific and engineering domains. Full-scale resolving simulations for systems such as highly turbulent flows are not feasible in the foreseeable future, and reduced-order models must capture dynamics that involve interactions across scales. In the present work, we propose a novel framework, Graph-based Learning of Effective Dynamics (Graph-LED), that leverages graph neural networks (GNNs), as well as an attention-based autoregressive model, to extract the effective dynamics from a small amount of simulation data. GNNs represent flow fields on unstructured meshes as graphs and effectively handle complex geometries and non-uniform grids. The proposed method combines a GNN-based dimensionality reduction for variable-size unstructured meshes with an autoregressive temporal attention model that can learn temporal dependencies automatically. We evaluated the proposed approach on a suite of fluid dynamics problems, including flow past a cylinder and flow over a backward-facing step over a range of Reynolds numbers. The results demonstrate robust and effective forecasting of spatio-temporal physics; in the case of the flow past a cylinder, both the small-scale effects that occur close to the cylinder and its wake are accurately captured.


2402.00972 2026-06-11 cs.LG cs.MA physics.comp-ph 版本更新

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

基于网格强化学习的粗粒化偏微分方程闭合发现

Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

发表机构 * ETH Zurich（苏黎世联邦理工学院）； Harvard SEAS（哈佛大学工程学院）

AI总结本文提出利用网格强化学习系统性地识别粗粒化偏微分方程中的闭合项，通过数值解验证了该方法在预测和加速计算方面的有效性。

Comments Conference on Parsimony and Learning (CPAL)

详情

AI中文摘要

对天气、野火和流行病等关键现象的可靠预测，通常依赖于由偏微分方程（PDE）描述的模型。然而，能够捕捉此类PDE所描述的全部时空尺度的模拟往往成本过高。因此，通常采用粗粒化模拟，借助各种启发式方法和经验闭合项来弥补缺失的信息。我们提出了一种新颖且系统的方法，利用基于网格的强化学习来识别欠解析PDE中的闭合项。该方法部署一个由全卷积网络（Fully Convolutional Network，FCN）高效表示的中心策略，从而引入归纳偏置并利用局部性。我们通过平流方程和Burgers方程的数值解，展示了该框架的能力和局限。结果表明，该方法对分布内和分布外的测试用例都能实现准确预测，并且相比解析所有尺度有显著加速。

英文摘要

Reliable predictions of critical phenomena, such as weather, wildfires and epidemics often rely on models described by Partial Differential Equations (PDEs). However, simulations that capture the full range of spatio-temporal scales described by such PDEs are often prohibitively expensive. Consequently, coarse-grained simulations are usually deployed that adopt various heuristics and empirical closure terms to account for the missing information. We propose a novel and systematic approach for identifying closures in under-resolved PDEs using grid-based Reinforcement Learning. This formulation incorporates inductive bias and exploits locality by deploying a central policy represented efficiently by a Fully Convolutional Network (FCN). We demonstrate the capabilities and limitations of our framework through numerical solutions of the advection equation and the Burgers' equation. Our results show accurate predictions for in- and out-of-distribution test cases as well as a significant speedup compared to resolving all scales.


2412.12231 2026-06-11 cs.RO cs.LG 版本更新

Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab

展示连接全球实验室生产站点的数据到知识流程

Leon Gorißen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher, Constantin Häfner

发表机构 * Chair for Laser Technology, RWTH Aachen University（亚琛工业大学激光技术教席）； Knowledge Based Systems Group, RWTH Aachen University（亚琛工业大学知识系统研究组）； Communication Science, RWTH Aachen University（亚琛工业大学传播学）； Chair of Textile Technology, RWTH Aachen University（亚琛工业大学纺织技术教席）； Laboratory for Machine Tools and Production Engineering, RWTH Aachen University（亚琛工业大学机床与生产工程实验室）； Human Computer Interaction Center, RWTH Aachen University（亚琛工业大学人机交互中心）； Fraunhofer Institute for Laser Technology（弗劳恩霍夫激光技术研究所）

AI总结本文提出数据到知识流程，用于连接全球实验室生产站点，通过数字影子网络实现数据整合与存储，提升工业效率和可扩展性。

Comments 15 pages, 6 figures, submitted to CAiSE 2025

详情

DOI: 10.3390/make8050136
Journal ref: MDPI MAKE (Machine Learning and Knowledge Extraction) (2026), 8(5)

AI中文摘要

生产数字化转型需要新的数据整合和存储方法，以及在开发、生产和使用周期中垂直和水平运作的决策支持系统。本文提出数据到知识（和知识到数据）流程作为生产中的通用概念，基于数字影子网络（一种增强数字孪生的概念）。我们展示了一个概念证明，基于现有基础设施，1）在数据湖仓中捕获并语义标注多个独立组织和用例中相似但独立的机器人的轨迹数据，2）一个独立过程动态查询匹配数据以训练反向动态基础模型用于机器人控制。本文讨论了该方法的挑战和益处，以及数据到知识流程如何在全球实验室中提升效率和工业可扩展性。

英文摘要

The digital transformation of production requires new methods of data integration and storage, as well as decision making and support systems that work vertically and horizontally throughout the development, production, and use cycle. In this paper, we propose Data-to-Knowledge (and Knowledge-to-Data) pipelines for production as a universal concept building on a network of Digital Shadows (a concept augmenting Digital Twins). We show a proof of concept that builds on and bridges existing infrastructure: 1) a process that captures and semantically annotates trajectory data from multiple similar but independent robots in different organisations and use cases in a data lakehouse, and 2) an independent process that dynamically queries matching data for training an inverse dynamic foundation model for robotic control. The article discusses the challenges and benefits of this approach and, as a research outlook, how Data-to-Knowledge pipelines contribute efficiency gains and industrial scalability in a World Wide Lab.

2408.00157 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn Updated version

Generative Learning of the Solution of Parametric Partial Differential Equations Using Guided Diffusion Models and Virtual Observations

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations * School of Engineering and Applied Sciences, Harvard University

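AI summary: This paper proposes a generative learning framework that models high-dimensional parametric systems using gradient guidance and virtual observations. Two case studies, one on an unstructured mesh and one on a structured mesh, demonstrate its effectiveness, with reduced computational cost and more efficient flow-dynamics forecasting.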

DOI: 10.1016/j.cma.2024.117654

Abstract

We introduce a generative learning framework to model high-dimensional parametric systems using gradient guidance and virtual observations. We consider systems described by Partial Differential Equations (PDEs) discretized with structured or unstructured grids. The framework integrates multi-level information to generate high-fidelity time sequences of the system dynamics. We demonstrate the effectiveness and versatility of our framework with two case studies: incompressible, two-dimensional, low-Reynolds-number cylinder flow on an unstructured mesh, and incompressible turbulent channel flow on a structured mesh, both parameterized by the Reynolds number. Our results illustrate the framework's robustness and its ability to generate accurate flow sequences across various parameter settings, significantly reducing computational costs and allowing for efficient forecasting and reconstruction of flow dynamics.
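
To make the idea of gradient guidance more concrete, here is a minimal toy sketch, not the authors' sampler. It assumes a squared-misfit guidance term and a hypothetical dictionary `observations` mapping grid indices to target values; the names `guidance_gradient` and `guided_step` are illustrative only. In an actual guided diffusion model, a correction of this kind would presumably be interleaved with learned denoising steps, which are omitted here.

```python
def guidance_gradient(sample, observations):
    """Gradient of sum_i (sample[i] - y_i)^2 over the observed indices i.

    `observations` is a hypothetical stand-in: a dict {grid index: target value}.
    Unobserved grid points receive zero gradient.
    """
    grad = [0.0] * len(sample)
    for i, y in observations.items():
        grad[i] = 2.0 * (sample[i] - y)
    return grad


def guided_step(sample, observations, step_size=0.25):
    """Take one gradient-descent step on the observation misfit."""
    grad = guidance_gradient(sample, observations)
    return [x - step_size * g for x, g in zip(sample, grad)]


if __name__ == "__main__":
    state = [0.0, 0.0, 0.0, 0.0]
    obs = {1: 1.0, 3: -2.0}
    for _ in range(3):
        state = guided_step(state, obs)
        print(state)
```

With `step_size=0.25`, each step halves the misfit at the observed points and leaves the unobserved points unchanged, which is why guidance alone cannot fill in the rest of the field without a generative prior.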

2312.11540 2026-06-11 cs.LG Updated version

On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

Tatsuya Akutsu, Avraham A. Melkman, Atsuhiro Takasu

Affiliations * Bioinformatics Center, Institute for Chemical Research, Kyoto University; Department of Computer Science, Ben-Gurion University of the Negev; National Institute of Informatics, Chiyoda-ku, Tokyo, Japan

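AI summary: This paper studies whether, in the prediction phase of a random forest, a bag of trees can be represented by a smaller bag of trees. It shows that when n − T is a constant, the majority function of n variables can be represented by T trees of polynomial size, where n and T must be odd to avoid ties.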
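
The tie condition in the summary above can be checked by brute force. The following sketch is an illustration only; `majority` and `count_ties` are hypothetical helper names, not taken from the paper. It enumerates all Boolean inputs of length n and counts those on which a majority vote is undefined.

```python
from itertools import product


def majority(bits):
    """Return 1 if ones outnumber zeros, 0 if zeros outnumber ones, None on a tie."""
    ones = sum(bits)
    zeros = len(bits) - ones
    if ones == zeros:
        return None  # tie: the majority vote is undefined
    return 1 if ones > zeros else 0


def count_ties(n):
    """Count the inputs in {0,1}^n on which the majority vote ties."""
    return sum(1 for bits in product((0, 1), repeat=n) if majority(bits) is None)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, count_ties(n))
```

For odd n no input ties, while for even n exactly C(n, n/2) inputs do. The same reasoning applies to the number of voting trees T.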

2305.13108 2026-06-11 eess.AS cs.CL cs.LG cs.SD Updated version

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

Affiliations * Institute of Information & communications Technology Planning & Evaluation (IITP)

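AI summary: This paper proposes Re-SAT, which measures how helpful each data sample is for debiasing and uses that measure to mitigate the bias of speech recognition systems on dysarthric speech, improving robustness for dysarthric speakers.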

Comments Accepted by Interspeech 2023

DOI: 10.21437/Interspeech.2023-2421

Abstract

Automatic speech recognition (ASR) systems based on deep learning are mainly trained under empirical risk minimization (ERM). Since ERM uses the performance averaged over the data samples regardless of group, such as healthy or dysarthric speakers, ASR systems are unaware of the performance disparities across groups. This results in biased ASR systems with severe performance differences among groups. In this study, we aim to improve the group robustness of ASR systems for dysarthric speakers. To achieve this, we present a novel approach, sample reweighting with sample affinity test (Re-SAT). Re-SAT systematically measures the debiasing helpfulness of a given data sample and then mitigates the bias through sample reweighting based on that helpfulness. Experimental results demonstrate that Re-SAT improves ASR performance on dysarthric speech without degrading performance on healthy speech.
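
The point about ERM hiding group disparities can be seen on toy numbers. The sketch below is an illustration under stated assumptions: the per-sample losses are invented for the example, and the inverse-group-frequency weights are a simple stand-in, not the helpfulness-based weights that Re-SAT derives from its sample affinity test. The helper names are hypothetical.

```python
def mean_loss(losses, weights=None):
    """Weighted mean of per-sample losses; uniform weights give the ERM objective."""
    if weights is None:
        weights = [1.0] * len(losses)
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


def group_means(losses, groups):
    """Mean loss for each group label."""
    sums, counts = {}, {}
    for loss, group in zip(losses, groups):
        sums[group] = sums.get(group, 0.0) + loss
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def inverse_frequency_weights(groups):
    """Assumed stand-in weighting: each sample is weighted by 1 / (size of its group)."""
    counts = {}
    for group in groups:
        counts[group] = counts.get(group, 0) + 1
    return [1.0 / counts[g] for g in groups]


if __name__ == "__main__":
    # Invented toy losses: 8 healthy-speech samples, 2 dysarthric-speech samples.
    losses = [0.1] * 8 + [0.9] * 2
    groups = ["healthy"] * 8 + ["dysarthric"] * 2
    print("ERM average:", mean_loss(losses))
    print("per-group:", group_means(losses, groups))
    print("reweighted:", mean_loss(losses, inverse_frequency_weights(groups)))
```

The uniform average is dominated by the larger group, so a low ERM loss can coexist with a high loss on the minority group; reweighting shifts the objective toward that group.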

2107.00693 2026-06-11 eess.SP cs.LG Updated version

Inter-Beat Interval Estimation with Tiramisu Model: A Novel Approach with Reduced Error

Asiful Arefeen, Ali Akbari, Seyed Iman Mirzadeh, Roozbeh Jafari, Behrooz A. Shirazi, Hassan Ghasemzadeh

Affiliations * EECS; BME; Texas A&M University; Washington State University; CSE and ECE

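AI summary: This paper uses a Tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks in ECG signals more prominent, enabling more accurate inter-beat interval estimation, which could improve the accuracy of early cardiovascular disease diagnosis.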

Comments 16 pages, 14 figures

DOI: 10.1145/3616020

Abstract

Inter-beat interval (IBI) measurement enables estimation of heart-rate variability (HRV), which, in turn, can provide an early indication of potential cardiovascular diseases. However, extracting IBIs from noisy signals is challenging because the morphology of the signal is distorted in the presence of noise. The electrocardiogram (ECG) of a person in heavy motion is highly corrupted with noise, known as motion artifact, and IBIs extracted from it are inaccurate. As a part of remote health monitoring and wearable system development, denoising ECG signals and estimating IBIs correctly from them have become an emerging topic among signal-processing researchers. Apart from conventional methods, deep-learning techniques have recently been used successfully in signal denoising, making the diagnosis process easier and leading to accuracy levels that were previously unachievable. We propose a deep-learning approach leveraging a Tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks of the ECG signal prominent even in the presence of high-intensity motion. After denoising, IBIs are estimated more accurately, expediting diagnosis tasks. Results illustrate that our method enables IBI estimation from noisy ECG signals with SNR as low as -30 dB, with an average root mean square error (RMSE) of 13 milliseconds for estimated IBIs. At this noise level, our error percentage remains below 8% and outperforms other state-of-the-art techniques.
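
The quantities reported above can be illustrated with a short sketch. It assumes R-peak times are already available as timestamps in seconds (peak detection and denoising are out of scope here); the function names and the example timestamps are hypothetical and not taken from the paper.

```python
import math


def inter_beat_intervals(r_peak_times_s):
    """Differences between consecutive R-peak times, in seconds."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def rmse_ms(estimated_s, reference_s):
    """Root mean square error between two equal-length IBI lists, in milliseconds."""
    if len(estimated_s) != len(reference_s) or not estimated_s:
        raise ValueError("IBI lists must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated_s, reference_s)]
    return 1000.0 * math.sqrt(sum(squared) / len(squared))


def snr_db_to_power_ratio(snr_db):
    """Convert an SNR in dB to the signal-to-noise power ratio."""
    return 10.0 ** (snr_db / 10.0)


if __name__ == "__main__":
    # Invented example: reference peaks every 0.8 s, estimated peaks slightly off.
    reference = inter_beat_intervals([0.0, 0.8, 1.6, 2.4])
    estimated = inter_beat_intervals([0.0, 0.81, 1.59, 2.41])
    print("RMSE (ms):", rmse_ms(estimated, reference))
    print("power ratio at -30 dB:", snr_db_to_power_ratio(-30.0))
```

An SNR of -30 dB corresponds to a power ratio of 0.001, meaning the noise power is about 1000 times the signal power.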

1. Deep Learning Architectures and Training Methods (34 papers)

Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems

PermDoRA -- Understanding Adapter Interference in Language Models: Limits of Parameter-Space Geometry

RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways

Energy-Conserved Neural Pipelines: Attenuating Error Propagation in Modular Neural Networks via Physical Conservation Constraints

Recursive Binding on a Budget: Subspace Carving in Order-p Tensor Memories

SirenFNO: Efficient and Full Frequency Learning of Fourier Neural Operators

Kuramoto Attention: Synchronizing Self-Attention on the Torus

When Context Returns: Toward Robust Internalization in On-Policy Distillation

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

HAMNO: A Hierarchical Adaptive Multi-scale Neural Operator with Physics-Informed Learning for Dynamical Systems

Simplicity Suffices for Parameter Noise Injection in Stochastic Gradient Descent

Attention by Synchronization in Coupled Oscillator Networks

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

Harness In-Context Operator Learning with Chain of Operators

On Subquadratic Architectures: From Applications to Principles

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

A2SG: Adaptive and Asymmetric Surrogate Gradients for Training Deep Spiking Neural Networks

Teaching Diffusion to Speculate Left-to-Right

When is Your LLM Steerable?

Higher-Order Token Interactions via Quantum Attention

Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

DAM-VLA: Decoupled Asynchronous Multimodal Vision Language Action model

Composing Linear Layers from Irreducibles

Time-multiplexed layer reuse for physical neural networks

Robustness of Mixtures of Experts to Feature Noise

Kalman Linear Attention: Parallel Bayesian Filtering For Efficient Language Modelling and State Tracking

Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

On the Stability of Growth in Structural Plasticity

GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators

2. Representation Learning, Self-Supervised and Contrastive Learning (14 papers)

Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction

Information-Theoretic Decomposition for Multimodal Interaction Learning

ICA Lens: Interpreting Language Models Without Training Another Dictionary

RePAIR: Predictive Self-Supervised Representation Learning in Chess

Implicit Neural Representations of Individual Behavior

Latent World Recovery for Multimodal Learning with Missing Modalities

Enhancing Spectral Embedding through Robust and Flexible Knowledge Transfer in Electronic Health Records

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

Learning Patterns and Abstractions from Perceptual Sequences

Cross-Layer Discrete Concept Discovery for Interpreting Language Models

OCSVM-Guided Representation Learning for Unsupervised Anomaly Detection

Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation

3. Reinforcement Learning and Sequential Decision Making (37 papers)

Restless bandits with imperfect binary feedback: PCL-indexability analysis and computation

Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models

Signed Compression Progress on a Sealed Audit is Goodhart-Resistant

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation

Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning

Efficient Multinomial Logistic Bandit via Frequent Directions

PAWS: Preference Learning with Advantage-Weighted Segments

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

APPO: Agentic Procedural Policy Optimization

ATLAS: Active Theory Learning for Automated Science

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

Multi-agent rendezvous in fluid flows via reinforcement learning

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

Learning Object Manipulation from Scratch via Contrastive Interaction

Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems

Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation

IntElicit: Eliciting and Assessing Contextualized Creativity via Dialogue Policy Optimization

CCKS: Consensus-based Communication and Knowledge Sharing

Learning What to Say to Your VLA: Mostly Harmless Vision Language Action Model Steering

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Solutions

Noise-Guided Transport for Imitation Learning

Reinforcement Learning with Action-Triggered Observations

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

Impact of Connectivity on Laplacian Representations in Reinforcement Learning

Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents