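arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.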

2605.17415 2026-05-25 cs.LG cs.AI cs.DB cs.IR Updated version

CHRONOS: Temporally Aware Multi-Agent Coordination for Evolving Data Marketplaces

Joydeep Chandra

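Affiliation: BNRIST, Tsinghua University

AI summary: CHRONOS is a multi-agent coordination framework for evolving data marketplaces. It targets three problems that data evolution causes in static designs: degraded retrieval recall, inaccurate value attribution, and over-consumption of a shared privacy budget. The method uses a three-layer architecture: neural-ODE temporal decay on index shortcut edges, Shapley valuation conditioned on detected changepoints, and differentially private EXP3-IX scheduling. Across four benchmarks it reports Recall@10 of 0.937 at a total privacy cost of epsilon 4.25, which the authors describe as a competitive operating point; they also note that released valuations remain noise-dominated at this privacy level.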

English abstract

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, a three-layer architecture providing a unified treatment of these challenges with explicit public and private separation. Layer one applies neural-ODE temporal decay to shortcut edges, providing a per-query expected recall-loss bound of $O(P_q \lambda \Delta t)$, with a monotone-envelope guarantee reducing bound looseness to 1.8 to 3.2 times observed loss. Layer two conditions Shapley valuation on detected changepoints and provides finite-sample error guarantees under noise. Layer three uses EXP3-IX to achieve $O(\sqrt{T \log T})$ regret while enforcing $(\varepsilon, \delta)$-differential privacy via moments accounting. CHRONOS releases a privatized affinity matrix per epoch using the Gaussian mechanism; all retrieval and ranking are post-processing, incurring no extra privacy cost. We provide multi-epoch settlement, scalability analysis for 500 sellers, and comparisons against accelerated baselines. Across four benchmarks, CHRONOS shows 0.937 Recall@10, 2.74 queries per second, 161 ms latency, and total $\varepsilon = 4.25$ at $\delta = 10^{-6}$ under zCDP composition. These results indicate a competitive operating point. A limitation is that at this privacy level, released valuations remain noise-dominated; utility derives primarily from public index routing and adaptive scheduling driven by low-sensitivity statistics.
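The release-then-post-process pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the CHRONOS implementation: the function names, the toy affinity matrix, and the per-release parameters are hypothetical, and the noise scale uses the classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1, with L2 sensitivity) instead of the zCDP accounting the paper reports.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale, valid for 0 < epsilon < 1."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize(matrix, sensitivity, epsilon, delta, rng):
    """Add i.i.d. Gaussian noise to every entry of the affinity matrix."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in matrix]


def rank_columns(noisy_row):
    """Post-processing: reads only released values, so it adds no privacy cost."""
    return sorted(range(len(noisy_row)), key=lambda j: noisy_row[j], reverse=True)


# Hypothetical 2x3 buyer-by-seller affinity matrix for one epoch.
affinity = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.3]]
released = privatize(affinity, sensitivity=1.0, epsilon=0.5, delta=1e-6,
                     rng=random.Random(0))
ranking = rank_columns(released[0])
```

With these assumed parameters the noise scale is roughly 10.6, far larger than the entries in [0, 1], so the ranking of a single release is dominated by noise. That is in line with the limitation the abstract states for released valuations.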

2605.23879 2026-05-25 stat.ML cs.CR cs.LG math.ST stat.TH Updated version

On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

Aratrika Mustafi, Soumya Mukherjee

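Affiliation: Department of Statistics, Pennsylvania State University

AI summary: This paper studies the stability of spherical Hellinger-Kantorovich (SHK) gradient flows and its implications for differential privacy. The authors develop a perturbation theory for these flows, compare the flows induced by two potentials from a common initialization, and obtain uniform, time-dependent bounds on the log-likelihood ratio and the Rényi divergence, together with bounds on the KL divergence under additional structure. Applied to approximate sampling for the exponential mechanism, the results give pure-DP and approximate-DP guarantees for SHK-based samplers and a utility bound that separates the mechanism's intrinsic suboptimality from finite-time sampling error.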

English abstract

Gradient-flow sampling interprets a Gibbs distribution as the minimizer of an energy functional over probability measures and generates dynamics converging to this target. Under spherical Hellinger-Kantorovich (SHK) geometry, the flow couples transport and reaction and coincides with birth-death Langevin dynamics. In this work, we develop a perturbation theory for SHK gradient flows. For two potentials $V$ and $V^{\prime}$, we compare the associated flows from a common initialization and quantify how potential discrepancies propagate over time. A uniform perturbation bound yields dimension-free, pointwise control of the log-likelihood ratio and Rényi divergence, while additional structure allows us to derive bounds for the KL divergence as well. We apply these results to approximate sampling for the exponential mechanism in differential privacy. The likelihood-ratio control provides explicit time-dependent Pure-DP guarantees for SHK-based samplers, while the KL bound yields Approximate-DP certificates via hockey-stick divergence. We also derive a utility bound separating intrinsic exponential-mechanism suboptimality from finite-time sampling error.

2605.23872 2026-05-25 cs.LG cs.NA math.NA stat.ML Updated version

Training-Free Looped Transformers

Lizhang Chen, Jonathan Li, Chen Liang, Ni Lao, Qiang Liu

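Affiliation: University of Texas at Austin

AI summary: This paper proposes training-free looped transformers: a lightweight inference-time wrapper loops a contiguous mid-stack block of layers in a frozen pretrained checkpoint, with no fine-tuning or architectural change. Naively reapplying the block usually degrades performance, so the authors view a pre-norm transformer block as a forward Euler step on an ODE and treat looping as a refinement of that approximation, replacing one large update with smaller damped sub-steps. Across seven model families, the reported gains include +2.64 pp for Qwen3-4B-Instruct on MMLU-Pro.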

English abstract

We introduce training-free looped transformers, in which a lightweight inference-time wrapper loops a contiguous mid-stack block of layers of a frozen checkpoint without additional fine-tuning, continued training, or architectural changes. Unlike prior looped transformer methods that train with the looped structure end-to-end, we retrofit recurrence onto pretrained models at test time. We show that naive block reapplication usually degrades performance, highlighting the importance of the loop application strategy. Motivated by viewing a pre-norm transformer block as a forward Euler step on an ODE, we instead treat looping as a refinement of the same approximation, replacing one large update with smaller damped sub-steps. Across seven dense, sparse MoE, and MLA+MoE model families, our method improves Qwen3-4B-Instruct by +2.64 pp on MMLU-Pro, Qwen3-30B-A3B-Instruct by +1.14 pp on CommonsenseQA, and Moonlight-16B-A3B-Instruct by +1.20 pp on OpenBookQA.
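The Euler-step view can be made concrete with a toy Python sketch. It assumes a residual update x <- x + f(x) and replaces it with K sub-steps of size 1/K. The `toy_block` function is a hypothetical stand-in for a frozen mid-stack block, and the uniform 1/K step is an assumption: the abstract does not give the paper's actual damping schedule.

```python
def naive_loop(x, block, num_loops):
    """Reapply the full residual update x <- x + f(x) num_loops times."""
    for _ in range(num_loops):
        x = [xi + fi for xi, fi in zip(x, block(x))]
    return x


def damped_loop(x, block, num_substeps):
    """Replace one unit-size Euler step by num_substeps steps of size 1/num_substeps."""
    step = 1.0 / num_substeps
    for _ in range(num_substeps):
        x = [xi + step * fi for xi, fi in zip(x, block(x))]
    return x


def toy_block(x):
    """Hypothetical residual branch f(x) = -x, standing in for a transformer block."""
    return [-xi for xi in x]


single_step = damped_loop([1.0], toy_block, 1)    # one ordinary forward pass
four_substeps = damped_loop([1.0], toy_block, 4)  # same unit interval, finer steps
```

For this toy ODE dx/dt = -x, the exact value at t = 1 is about 0.368. One unit step gives 0.0, while four damped sub-steps give 0.31640625, so the sub-stepped loop stays on the same integration interval but tracks the underlying flow more closely. Naive reapplication instead integrates over a longer interval than the block was trained for.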

2605.23871 2026-05-25 stat.ML cs.LG math.ST stat.TH Updated version

Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer

Aratrika Mustafi, Soumya Mukherjee, Bharath K. Sriperumbudur

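AI summary: This paper studies the continuous-time dynamics of the Muon optimizer from the perspective of Hamiltonian probability gradient flows. It formulates a gradient flow induced by regularized Muon and shows that the regularized orthogonalization map is the gradient of a smooth Fenchel-dual smoothing of the nuclear norm. Lifting Muon to finite-particle probability objectives, the authors derive the inertial continuous-time limit and a phase-space mean-field equation over probability laws on parameter-momentum pairs, and prove that the dynamics are damped Hamiltonian with monotonically decreasing Hamiltonian energy. They also give convergence rates for the objective gap under additional assumptions and extend the formulation to a blockwise Muon probability flow for smooth transformer mixture-of-experts models.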

English abstract

We develop a gradient flow on the space of probability measures defined on matrix-valued parameters induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regularized orthogonalization map is the gradient of a smooth Fenchel-dual smoothing of the nuclear norm. This identifies the (regularized) Muon update as a mirror/prox step in the update variable, with momentum acting as the dual coordinate. We use this structure to lift Muon from a single matrix parameter to finite-particle probability objectives of the form $J(ρ)=R\left(\int F d ρ\right)$, a setting motivated by mean-field descriptions of neural-network training, and derive the inertial continuous-time limit. Using this structure, we derive the finite-particle continuous-time limit under the inertial scaling of step size and momentum, and then pass to a phase-space mean-field equation over probability laws on parameter-momentum pairs. The resulting flow can be shown to be a damped Hamiltonian probability dynamics whose kinetic energy is induced by the regularized Muon mirror potential. We prove an exact Hamiltonian dissipation identity, showing that the Hamiltonian energy decreases monotonically. While the target objective itself need not be monotone along the inertial Muon dynamics, under additional gradient-dominance, bounded-momentum, and curvature/alignment assumptions, we obtain continuous and discrete-time exponential convergence rates for the objective gap. We also study the well-posedness of the mean-field limit equation and establish propagation of chaos guarantees for the interacting particle system. Finally, we extend the formulation to Hilbert-valued feature maps on product matrix spaces, yielding a blockwise Muon probability flow applicable to smooth transformer mixture-of-experts models.

2605.23861 2026-05-25 cs.LG cs.AI cs.CV Updated version

Leveraging Foundation Models for Causal Generative Modeling

Aneesh Komanduri, Xintao Wu

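Affiliation: University of Arkansas

AI summary: This paper studies how to use pretrained foundation models for causal generative modeling, with the aim of supporting counterfactual reasoning. It proposes FM-CGM, a modular framework with three core components (a concept extractor, a concept manipulator, and a counterfactual generator) for end-to-end visual causal reasoning. The approach pairs a large reasoning model for causal inference with a text-to-image diffusion model for generation and adds Causal Semantic Guidance, a cross-attention-based mechanism, to support zero-shot causal discovery and counterfactual image generation.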

English abstract

Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning. While existing approaches focus on integrating causal constraints during the training of generative models, they often lack a unified framework to leverage the zero-shot reasoning capabilities of pretrained foundation models. We introduce FM-CGM, a modular framework for end-to-end visual causal reasoning using pretrained foundation models. FM-CGM formalizes the causal pipeline through three core components: a concept extractor, a concept manipulator, and a counterfactual generator. By leveraging a large reasoning model for causal inference and a text-to-image diffusion model for generation, our approach enables zero-shot causal discovery, intervention, and counterfactual generation. We then develop Causal Semantic Guidance (CSG), a cross-attention-based mechanism that ensures semantic interventions propagate to descendant concepts while preserving invariant regions. We empirically show that our approach can identify plausible causal structures and is suitable for faithful counterfactual image generation.

2605.23857 2026-05-25 cs.LG cs.CL Updated version

The Physics of AI Weather Models

George Craig, Tobias Selz, Matthias Beylich, Kirsten I. Tempest

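Affiliation: Meteorological Institute, LMU Munich

AI summary: This paper asks whether AI weather models implicitly solve physical equations, possibly different from those used by conventional numerical weather prediction models. Using correlations of forecast skill and Centered Kernel Alignment, it finds evidence that different AI weather models represent the atmosphere in similar ways despite differences in architecture and capacity. The authors propose that the models implement a particle description of the atmosphere, in which the latent variables at each mesh point give a particle's position in a high-dimensional latent space, and hypothesize that the particles follow a gradient flow toward a minimum of a learned free energy functional. Their analysis of GraphCast and Aurora is consistent with this hypothesis.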

English abstract

Could it be that AI weather models are solving physical equations, although they may not be the equations used by conventional NWP models? We compute correlations of forecast skill and Centered Kernel Alignment, providing evidence that different AI weather models represent the atmosphere in similar ways, despite differences in architecture and capacity. We argue that the architecture and training of the AI models constrain the form of the physical laws that they might simulate. In particular, we propose that the models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high-dimensional latent space. We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional. Analysis of the GraphCast and Aurora models shows that they make changes on large spatial scales in the early processor layers and move to smaller scales with increasing layer depth, consistent with the gradient flow hypothesis.
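Centered Kernel Alignment compares two sets of representations of the same inputs. The sketch below computes the linear variant in plain Python. The choice of the linear kernel and the toy matrices are assumptions for illustration; the abstract does not say which CKA variant or which layers the authors use.

```python
import math


def center_columns(X):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def cross_frobenius_sq(A, B):
    """Squared Frobenius norm of A^T B for row-aligned matrices A and B."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(A[k][i] * B[k][j] for k in range(len(A)))
            total += s * s
    return total


def linear_cka(X, Y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    Xc, Yc = center_columns(X), center_columns(Y)
    denom = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    return cross_frobenius_sq(Xc, Yc) / denom


# Rows are samples (e.g. mesh points), columns are hypothetical latent features.
X = [[1.0, 2.0], [3.0, 5.0], [0.0, 1.0], [4.0, 4.0]]
Y = [[2.0 * a + 1.0, 2.0 * b - 3.0] for a, b in X]  # rescaled, shifted copy of X
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
```

Linear CKA is invariant to isotropic rescaling and to constant offsets, so `linear_cka(X, Y)` equals 1 up to rounding, while an unrelated representation such as `Z` gives a value between 0 and 1.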

2605.23754 2026-05-25 cs.LG Updated version

LLM-driven design of physics-constrained constitutive models: two agents are better than one

Marius Tacke, Matthias Busch, Kian Abdolazizi, Jonas Eichinger, Kevin Linka, Roland Aydin, Christian Cyron

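Affiliations: Helmholtz-Zentrum Hereon; Hamburg University of Technology; RWTH Aachen University; Saarland University; German Center for Artificial Intelligence

AI summary: This paper proposes a multi-agent, LLM-based approach for generating physically consistent constitutive models. A Creator agent proposes a model tailored to the data, and an Inspector agent checks each proposal against nine physical constraints and returns it for refinement when a violation is found. In the reported experiments, adding the Inspector raises the share of exported models satisfying all constraints from 91% to 100% for Claude Opus 4.7 and from 37% to 56% for Kimi K2.5, while preserving near-baseline accuracy and generalization to unseen loading paths.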

English abstract

Developing constitutive models that capture how materials deform under load traditionally requires years of specialized expertise in continuum mechanics, machine learning, and scientific programming. Large language models (LLMs) have recently been shown to lower this barrier by generating constitutive models on demand, but existing single-agent pipelines lack systematic checks that the resulting models respect fundamental physical laws. To close this gap, we introduce the first multi-agent LLM-driven approach for constitutive model generation: a Creator agent proposes a model tailored to the data, while an Inspector agent critically audits each proposal against nine physical constraints and returns it for refinement whenever a violation is detected. We demonstrate this concept with constitutive artificial neural networks (CANNs) and benchmark it on brain tissue, experimental rubber, and synthetic rubber, using two different LLM backbones (Claude Opus 4.7 and Kimi K2.5). Adding the Inspector raises the share of exported models that truly satisfy all physical constraints from 91% to a perfect 100% for Opus and from 37% to 56% for Kimi, while preserving near-baseline accuracy and remarkable generalization to unseen loading paths. In combination, the generated models are physically valid, highly accurate, and extrapolate reliably beyond the training data - properties that together make them directly usable in practice. Separating generation from inspection thus turns LLM-driven constitutive modeling into a genuinely trustworthy process. The paradigm is deliberately technique-agnostic and scales automatically with advances in LLM capability, opening a promising path toward automated, physics-aware model discovery.

2605.23753 2026-05-25 cs.LG Updated version

SeedER: Seed-and-Expand Retrieval from Knowledge Graphs

Hamed Shirzad, Frederik Wenkel, Dominique Beaini, Danica J. Sutherland, Emmanuel Noutahi

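Affiliations: Valence Labs, Montréal, QC, Canada; University of British Columbia, Department of Computer Science, Vancouver, BC, Canada

AI summary: SeedER is a retrieval framework for knowledge graphs that addresses the difficulty their irregular structure poses for retrieval. It first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands the set with a graph-aware policy trained by reinforcement learning. In the reported experiments, SeedER improves recall with compact candidate sets over dense and graph-augmented baselines while keeping expansion cost under control.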

English abstract

Knowledge graphs (KGs) offer a rich representation for relational knowledge, but their irregular structure makes retrieval challenging: ego-graph expansion grows rapidly, and dense embedding methods struggle with multi-hop compositional queries. Existing agent-based graph exploration approaches, while expressive, are often too expensive for large-scale retrieval. We introduce SeedER (Seed-and-Expand Retrieval), a retrieval framework that explicitly leverages KG structure through iterative, low-cost expansion. SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost. We show theoretical limitations of dense retrieval on compositional graph queries, and establish advantages of SeedER from both compositional generalization and graph-constrained submodular optimization perspectives. Empirically, SeedER substantially improves recall with compact candidate sets over strong dense and graph-augmented baselines, making it an effective first-stage retriever for knowledge-intensive reasoning systems.

2605.23751 2026-05-25 cs.LG Updated version

Approaching I/O-optimality for Approximate Attention

Pál András Papp, Aleksandros Sobczyk, Anastasios Zouzias

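Affiliations: Computing Systems Lab; Huawei Technologies

AI summary: This paper studies the I/O complexity of attention in large language models, that is, computing attention with as few data transfers between fast and slow memory as possible. Building on an approximate attention framework, the authors give I/O-efficient algorithms whose I/O cost depends almost linearly on the sequence length $n$ in most parameter regimes, compared with the quadratic dependence of existing methods such as FlashAttention. They also prove lower bounds in each parameter regime, showing that the algorithms are close to I/O-optimal.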

English abstract

We revisit the I/O complexity of attention in large language models. Given query-key-value matrices $Q,K,V\in\mathbb{R}^{n\times d}$, and a machine with fast memory size $M$, the goal is to compute the "attention matrix" $A=\text{softmax}(Q K ^{\top}/\sqrt{d}) V$ with the minimal number of data transfers between fast and slow memory. Existing methods in the literature, most notably FlashAttention and its variants, incur an I/O cost that depends quadratically on $n$, while a trivial lower bound only requires $\Omega(nd)$ I/Os to read the inputs and write the output. In this work, we present a technique for computing attention where the I/O cost only depends almost-linearly on $n$ in most parameter regimes. This is achieved by developing I/O-efficient algorithms inspired by the recent approximate attention framework of Alman and Song. We also prove corresponding lower bounds in each parameter regime to show that our algorithms are indeed close to I/O-optimal.
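For orientation, the quantity in the abstract can be computed one query row at a time. The Python sketch below contrasts a direct evaluation with a blockwise "online softmax" pass that only holds one block of keys and values at a time, which is the tiling idea behind exact FlashAttention-style methods. It is not the approximate algorithm of this paper; the function names and the block size are hypothetical.

```python
import math


def scaled_scores(q, keys):
    """Dot products q . k / sqrt(d) for each key row k."""
    scale = math.sqrt(len(q))
    return [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]


def attention_row(q, K, V):
    """Direct softmax(q K^T / sqrt(d)) V for a single query row."""
    scores = scaled_scores(q, K)
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))]


def attention_row_tiled(q, K, V, block):
    """Same result, visiting keys and values one block at a time."""
    running_max, normalizer = -math.inf, 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block):
        scores = scaled_scores(q, K[start:start + block])
        new_max = max(running_max, max(scores))
        rescale = math.exp(running_max - new_max)  # 0.0 on the first block
        normalizer *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, V[start:start + block]):
            w = math.exp(s - new_max)
            normalizer += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / normalizer for a in acc]


q = [1.0, 0.5]
K = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
direct = attention_row(q, K, V)
tiled = attention_row_tiled(q, K, V, block=2)
```

Both functions return the same row up to floating-point rounding. The tiled version still touches every key for every query, which is where the quadratic dependence on $n$ that the abstract attributes to existing methods comes from.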

2605.23744 2026-05-25 cs.LG Updated version

Contrast to Detect: Dynamic Graph Contrastive Regularization for Unsupervised Anomaly Detection in Multivariate Time Series

Yunhua Pei, Zixing Song, Jin Zheng, John Cartlidge

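Affiliations: School of Computer Science, University of Bristol; School of Engineering Mathematics, University of Bristol

AI summary: This paper addresses unsupervised anomaly detection in multivariate time series and proposes ContrastAD, a framework designed for dynamic inter-variable dependencies and spectral noise. Through dynamic graph contrastive learning, it uses structural evolution as a learning signal instead of suppressing it, and adds a multi-perspective embedder and frequency-aware attention for robustness. On five real-world benchmarks, ContrastAD attains the highest mean F1 on all five datasets and the highest AUC on three.

Comments: 12 pages, 5 figures. Preprint. Code and demo data available online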

English abstract

Anomaly detection in multivariate time series (MTS) is hindered by dynamic inter-variable dependencies and feature entanglement under spectral noise, and in practice, is further complicated by the absence of anomaly labels. Existing reconstruction-based detectors tend to recover anomalies as faithfully as normal patterns, while prevailing graph contrastive methods enforce invariance across views and thus assume a stationary relational structure, an assumption that breaks under structural drift in real systems. We propose ContrastAD, an unsupervised framework that turns structural evolution itself into a learning signal rather than suppressing it. A Multi-Perspective Embedder encodes inputs from temporal, attribute, and structural perspectives. A Frequency-Aware Attention Mixer then performs spectral top-K filtering before attention, preventing noise from leaking into query-key similarities. The core component, a Dynamic Graph Contrastive Learner, builds power-law-inspired sparse graph snapshots from batch-level DTW distances and contrasts the most divergent pair against a stable anchor, regularizing the latent space without imposing rigid invariance. Across five real-world benchmarks, ContrastAD attains the highest mean F1 on all five datasets and the highest AUC on three (SWaT 93.60, SMD 98.66, PSM 97.79), with statistically significant F1 and AUC margins over the strongest baseline on SWaT and PSM. On MSL and SMAP, it trails the AUC leader by under 0.7 points while still leading on F1. Ablation and sensitivity studies further confirm that the contrastive objective works best as a soft regularizer, supporting our claim that strict invariance is suboptimal under non-stationary dynamics.
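The spectral top-K filtering step can be illustrated on a single 1-D series. The Python sketch below keeps the K strongest frequency bins and zeroes the rest before transforming back. It is a toy under stated assumptions: the function names are hypothetical, a naive DFT is used for self-containment, and the abstract does not specify how ContrastAD applies the filter inside its attention mixer.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_top_k(signal, k):
    """Keep the k strongest frequency bins (with their mirror bins), zero the rest."""
    n = len(signal)
    spectrum = dft(signal)
    half = range(n // 2 + 1)  # non-negative frequencies only
    keep = sorted(half, key=lambda f: abs(spectrum[f]), reverse=True)[:k]
    mask = set(keep) | {(n - f) % n for f in keep}  # mirrors keep the output real
    filtered = [spectrum[f] if f in mask else 0.0 for f in range(n)]
    # Inverse DFT; the imaginary part is rounding noise and is dropped.
    return [sum(filtered[f] * cmath.exp(2j * math.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


n = 16
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noisy = [c + 0.1 * math.sin(2 * math.pi * 5 * t / n) for t, c in enumerate(clean)]
denoised = spectral_top_k(noisy, 1)
```

Here the dominant component sits at frequency 2 and the weaker one at frequency 5, so keeping only the top bin recovers `clean` up to rounding. Filtering before computing similarities is what keeps the weaker component from influencing them.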

2605.23726 2026-05-25 cs.LG cs.DS stat.ML Updated version

Optimal Dimension-Free Sampling for Regularized Classification

Meysam Alishahi, Alexander Munteanu, Simon Omlor, Jeff M. Phillips

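Affiliations: University of Utah, USA; TU Dortmund, Germany

AI summary: This paper establishes optimal dimension-free sampling bounds achieving $(1\pm\varepsilon)$ relative error for regularized classification, covering a broad class of Lipschitz continuous loss functions such as logistic, hinge, and ReLU loss. It proves matching upper and lower bounds of $k^2/\varepsilon^2$ for $\|\cdot\|_2/k$ regularization and $k/\varepsilon^2$ for $\|\cdot\|_1/k$ regularization, and shows that for $\|\cdot\|_2^2/k$ regularization the sampling complexity depends on a bounded derivative property of the loss. Using simple uniform or (squared) norm sampling with a refined higher-moment analysis, the results improve on earlier cubic $k^3/\varepsilon^2$ sensitivity sampling bounds.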

English abstract

We prove optimal sampling bounds achieving $(1\pm\varepsilon)$-relative error for a broad class of Lipschitz continuous classification loss functions under various regularization terms. This includes important functions such as logistic and sigmoid loss, hinge loss, and ReLU loss, as prominent and popular representative examples. In particular, we prove $k^2/\varepsilon^2$ upper and lower bounds for $\|\cdot\|_2/k$ regularization, and $k/\varepsilon^2$ upper and lower bounds for $\|\cdot\|_1/k$ regularization. For $\|\cdot\|_2^2/k$ regularization, the sampling complexity depends mainly on a bounded derivative property: if $|g'(x)|\leq g(x)$, and $g(0)>0$, and $g$ is monotonic or convex, then it admits linear in $k$ sampling complexity; otherwise the general bound is $k^2/\varepsilon^2$. However, if $g(0)=0$, our results indicate that no dimension-free bounds are possible, and even sublinear bounds are ruled out. All upper bounds are complemented by matching lower bounds up to polylogarithmic terms. Moreover, our work relies conceptually and algorithmically on simple uniform or (squared) norm sampling and hereby improves over recent cubic $k^3/\varepsilon^2$ sensitivity sampling bounds of (Alishahi and Phillips, ICML'24). This is achieved by refined arguments involving higher moment bounds and empirical process analyses to avoid overcounting that appears in the de-facto standard VC-dimension and sensitivity framework.

2605.23712 2026-05-25 cs.CE cs.LG Updated version

Randomized Neural Network Optimization for Transfer Operator Approximation

Mohammad Tabish, Stefan Klus

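Affiliations: Maxwell Institute for Mathematical Sciences, University of Edinburgh and Heriot–Watt University; School of Mathematical & Computer Sciences, Heriot–Watt University

AI summary: RaNNDy is a randomized neural network architecture for approximating transfer operators of complex dynamical systems. The weights and biases of its hidden layers are randomly initialized and kept fixed, and only the output layer is trained, which lowers training cost and gives a closed-form solution. Because the basis functions are determined by the initially chosen activation function, the paper proposes an algorithm that optimizes the activation function while keeping the network weights and biases fixed, and illustrates it on several benchmark problems.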

English abstract

RaNNDy is a randomized neural network architecture for the data-driven approximation of transfer operators associated with complex dynamical systems. The weights and biases of the hidden layers of the network are randomly initialized and kept fixed; only the output layer is trained. This has several advantages over fully optimized neural networks, notably a closed-form solution for the output layer and significantly lower training costs. Despite these advantages, RaNNDy is restricted to the initial selection of weights and biases that parametrize the basis functions required for the operator approximation. Since the basis functions are determined by the activation function, choosing an appropriate activation function for the hidden layers is crucial. In this work, we propose an algorithm that optimizes the activation function itself, while keeping the weights and biases in the randomized neural network fixed, providing a more suitable dictionary. We illustrate the efficacy of the approach using various benchmark problems, including stochastic differential equations and random walks on graphons.

2605.22738 2026-05-25 cs.LG cs.AI stat.ML Updated version

Certified Per-Instance Unlearning Using Individual Sensitivity Bounds

Hanna Benarroch, Jamal Atif, Olivier Cappé

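Affiliations: DI ENS, École normale supérieure, Université PSL, CNRS; CMAP, École polytechnique, Institut Polytechnique de Paris

AI summary: This paper studies certified per-instance machine unlearning via individual sensitivity bounds. Instead of calibrating injected noise to worst-case sensitivity, the authors calibrate noise adaptively to each data point's individual contribution to the learned solution, which reduces the amount of noise required. For ridge regression trained via Langevin dynamics they derive high-probability per-instance sensitivity bounds, and they report supporting experiments in linear settings along with further empirical evidence in deep learning settings.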

English abstract

Certified machine unlearning can be achieved via noise injection leading to differential privacy guarantees, where noise is calibrated to worst-case sensitivity. Such conservative calibration often results in performance degradation, limiting practical applicability. In this work, we investigate an alternative approach based on adaptive per-instance noise calibration tailored to the individual contribution of each data point to the learned solution. This raises the following challenge: how can one establish formal unlearning guarantees when the mechanism depends on the specific point to be removed? To define individual data point sensitivities in noisy gradient dynamics, we consider the use of per-instance differential privacy. For ridge regression trained via Langevin dynamics, we derive high-probability per-instance sensitivity bounds, yielding certified unlearning with substantially less noise injection. We corroborate our theoretical findings through experiments in linear settings and provide further empirical evidence on the relevance of the approach in deep learning settings.

2602.12534 2026-05-25 stat.ML cs.DS cs.LG math.ST stat.TH Updated version

Linear Regression with Unknown Truncation Beyond Gaussian Features

Alexandros Kouridakis, Anay Mehrotra, Alkis Kalavasis, Constantine Caramanis

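Affiliations: UT Austin; Stanford University; Yale University

AI summary: This paper studies how to efficiently estimate the unknown regression parameter in truncated linear regression when the survival set of the response is unknown. Prior algorithms for this setting require strong assumptions such as Gaussian features and still run in $d^{\mathrm{poly}(1/\varepsilon)}$ time; the proposed algorithm requires only sub-Gaussian feature vectors and runs in $\mathrm{poly}(d/\varepsilon)$ time. Its core is a new subroutine that efficiently learns unions of a bounded number of intervals from positive examples only, under a smoothness condition.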

English abstract

In truncated linear regression, samples $(x,y)$ are shown only when the outcome $y$ falls inside a certain survival set $S^\star$ and the goal is to estimate the unknown $d$-dimensional regressor $w^\star$. This problem has a long history of study in Statistics and Machine Learning going back to the works of (Galton, 1897; Tobin, 1958) and more recently in, e.g., (Daskalakis et al., 2019; 2021; Lee et al., 2023; 2024). Despite this long history, however, most prior works are limited to the special case where $S^\star$ is precisely known. The more practically relevant case, where $S^\star$ is unknown and must be learned from data, remains open: indeed, here the only available algorithms require strong assumptions on the distribution of the feature vectors (e.g., Gaussianity) and, even then, have a $d^{\mathrm{poly} (1/\varepsilon)}$ run time for achieving $\varepsilon$ accuracy. In this work, we give the first algorithm for truncated linear regression with unknown survival set that runs in $\mathrm{poly} (d/\varepsilon)$ time, by only requiring that the feature vectors are sub-Gaussian. Our algorithm relies on a novel subroutine for efficiently learning unions of a bounded number of intervals using access to positive examples (without any negative examples) under a certain smoothness condition. This learning guarantee adds to the line of works on positive-only PAC learning and may be of independent interest.

2602.04431 2026-05-25 cs.LG cs.GT 版本更新

MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems

MaMa: 一种基于博弈论的安全智能体系统设计方法

Jonathan Nöther, Adish Singla, Goran Radanovic

发表机构 * Max Planck Institute for Software Systems (MPI-SWS)（马克斯·普朗克软件系统研究所）

AI总结本文研究了基于大语言模型的多智能体系统在部分智能体失效或对抗行为下的安全设计问题。受Stackelberg安全博弈启发，作者提出了一种名为MaMa的新算法，通过元对抗者与元代理之间的博弈过程，自动设计出在最坏情况下仍能保持安全的智能体系统。实验表明，该方法设计的系统不仅能够有效抵御最坏攻击，还能在不同攻击目标和大模型环境下保持良好的泛化能力。

详情

AI中文摘要

基于LLM的多智能体系统展现了令人印象深刻的能力，但当单个智能体失败或表现出对抗行为时，也会引入显著的安全风险。在这项工作中，我们研究了即使部分智能体被攻破时仍能保持安全的智能体系统的自动设计。受Stackelberg安全博弈启发，我们将此问题形式化为系统设计者（元智能体）与一个最佳响应的元对手之间的博弈，该对手选择并攻破一部分智能体以最小化安全性。我们提出了MaMa（元对手-元智能体），一种受此形式化启发的新算法，用于自动设计安全的智能体系统。我们的方法使用基于LLM的对抗搜索，其中元智能体迭代地提出系统设计，并根据元对手发现的最强攻击接收反馈。跨不同环境的实证评估表明，使用MaMa设计的系统能够持续防御最坏情况下的攻击，同时保持与仅优化任务成功率的系统相当的性能。此外，所得系统能够泛化到更强的对手，以及具有不同攻击目标或底层LLM的对手，展示了超越训练设置的鲁棒安全性。

英文摘要

LLM-based multi-agent systems have demonstrated impressive capabilities, but they also introduce significant safety risks when individual agents fail or behave adversarially. In this work, we study the automated design of agentic systems that remain safe even when a subset of agents is compromised. Inspired by Stackelberg security games, we formalize this problem as a game between a system designer (the Meta-Agent) and a best-responding Meta-Adversary that selects and compromises a subset of agents to minimize safety. We propose Meta-Adversary-Meta-Agent (MaMa), a novel algorithm inspired by this formalization for automatically designing safe agentic systems. Our approach uses LLM-based adversarial search, where the Meta-Agent iteratively proposes system designs and receives feedback based on the strongest attacks discovered by the Meta-Adversary. Empirical evaluations across diverse environments show that systems designed with MaMa consistently defend against worst-case attacks while maintaining performance comparable to systems optimized solely for task success. Moreover, the resulting systems generalize to stronger adversaries, as well as ones with different attack objectives or underlying LLMs, demonstrating robust safety beyond the training setting.
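
The leader-follower structure of the game can be sketched as a max-min selection over a finite set of candidate designs. This is a toy illustration under stated assumptions, not MaMa itself: the design names, agent names, and integer safety scores are hypothetical, and the LLM-based adversarial search from the abstract is replaced by exhaustive enumeration.

```python
from itertools import combinations


def worst_case_safety(safety, agents, k):
    """Adversary's best response: the subset of at most k agents minimizing safety."""
    subsets = [frozenset(c) for r in range(k + 1) for c in combinations(agents, r)]
    worst = min(subsets, key=safety)
    return safety(worst), worst


def stackelberg_design(designs, agents, k):
    """Designer's move: pick the design whose worst-case safety is highest."""
    best_name, best_value = None, float("-inf")
    for name, safety in designs.items():
        value, _ = worst_case_safety(safety, agents, k)
        if value > best_value:
            best_name, best_value = name, value
    return best_name, best_value


agents = ["planner", "coder", "reviewer"]
# Hypothetical safety scores (0-100) as a function of the compromised subset.
designs = {
    "pipeline": lambda s: 100 if not s else 20,
    "redundant_review": lambda s: 90 - 20 * len(s),
}
result = stackelberg_design(designs, agents, k=1)
print(result)
```

The pipeline design scores best when nothing is compromised, but the redundant design wins once the adversary may compromise one agent, which is the kind of trade-off a worst-case objective exposes.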

2601.21513 2026-05-25 cs.LG 版本更新

Cascaded Transfer: Learning Many Tasks under Budget Constraints

级联迁移：在预算约束下学习多任务

Eloi Campagne, Yvenn Amara-Ouali, Yannig Goude, Mathilde Mougeot, Argyris Kalogeratos

发表机构 * Centre Borelli, CNRS, ENS Paris-Saclay, Université Paris-Saclay（Centre Borelli，CNRS，ENS巴黎萨克雷，巴黎萨克雷大学）； Laboratoire de Mathématiques d’Orsay, CNRS, Université Paris-Saclay（奥赛数学实验室，CNRS，巴黎萨克雷大学）； EDF R&D（EDF研发部）； ENSIIE, Évry-Courcouronnes（ENSIIE，Évry-科尔库荣）

AI总结在分布式应用场景中，如变电站级别的用电需求预测或联邦学习，需要为大量相关任务训练不同模型，但任务之间的关系未知。本文提出了一种新的级联迁移学习（CTL）范式，通过构建以根节点为起点的树形结构，使模型参数在任务间逐层传递，同时遵循全局训练预算约束。该方法基于最小化任务间距离与预算约束的组合目标构建生成树，形成具有几何感知和深度限制的迁移图，并理论分析了迁移误差在级联路径上的累积与衰减特性。实验表明，CTL在多种任务集合上实现了比现有方法更准确且更节省成本的模型适应，尤其在预算受限时效果更显著。

详情

AI中文摘要

在分布式应用中，如变电站级能源需求预测或联邦学习，大量相关任务必须由不同模型学习，而确切的任务关系未知。我们提出了新颖的级联迁移学习（CTL）范式，其中模型参数通过组织为有根树的任务层级级联，并遵守全局训练预算。从源任务开始，树指定了任务学习和细化的顺序，预算沿其分支分配。我们设计了基于生成树的级联机制，通过最小化结合成对任务距离和可用训练预算的目标来连接所有任务，从而产生几何感知和深度有界的迁移图。我们从理论上刻画了迁移误差如何沿级联路径累积和衰减：任何上游节点引入的误差都会被每个下游细化收缩，而平衡的树拓扑限制了这种累积。在合成和真实多任务场景、时间序列预测和图像分类上的实验表明，CTL能够在大量任务集合中实现比替代方法更准确和成本效益更高的适应，且在预算最紧张时增益最大。

英文摘要

In distributed applications, such as energy demand forecasting at the substation level or federated learning, a large number of related tasks must be learned by different models, while the exact task relationships are unknown. We propose the novel Cascaded Transfer Learning (CTL) paradigm in which model parameters cascade hierarchically through tasks organized as a rooted tree, respecting a global training budget. Starting from a source task, the tree specifies the order in which tasks are learned and refined, with the budget allocated along its branches. We design cascade mechanisms based on spanning trees that connect all tasks by minimizing an objective combining pairwise task distances and the available training budget, which yield geometry-aware and depth-bounded transfer graphs. We theoretically characterize how transfer errors accumulate and attenuate along cascade paths: errors introduced at any upstream node are contracted by every downstream refinement, and balanced tree topologies bound this accumulation. Experiments on synthetic and real many-task settings, time-series forecasting and image classification, show that CTL enables more accurate and cost-effective adaptation across large task collections than alternative approaches, with the largest gains at the tightest budgets.
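
A rooted transfer tree of the kind described above can be sketched with Prim's algorithm over a pairwise task-distance matrix. This is a simplified illustration: the distance matrix is made up, and the sketch minimizes distances only, whereas the abstract's objective also accounts for the training budget and bounds the depth.

```python
def cascade_tree(dist, root=0):
    """Prim's algorithm: returns {child: parent} for a spanning tree grown from root."""
    n = len(dist)
    parent = {}
    in_tree = {root}
    while len(in_tree) < n:
        # Cheapest edge from the current tree to a task not yet attached.
        u, v = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        parent[v] = u
        in_tree.add(v)
    return parent


def learning_order(parent, root=0):
    """Breadth-first order: each task is trained after the task it inherits from."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    order, frontier = [], [root]
    while frontier:
        node = frontier.pop(0)
        order.append(node)
        frontier.extend(sorted(children.get(node, [])))
    return order


# Hypothetical symmetric task distances; task 0 is the source task.
dist = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [6, 5, 1, 0],
]
parent = cascade_tree(dist)
order = learning_order(parent)
print(parent, order)
```

Here the tree degenerates into a chain of depth three, which is exactly the situation a depth bound or budget term would be expected to penalize.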

2601.03715 2026-05-25 cs.LG cs.AI 版本更新

R$^3$L: Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification

R$^3$L: 反思-重试强化学习与语言引导探索、关键信用和正向放大

Weijie Shi, Yanxi Chen, Zexi Li, Xuchen Pan, Yuchang Sun, Jiajie Xu, Xiaofang Zhou, Yaliang Li

发表机构 * Tongyi Lab（通义实验室）； Soochow University（苏州大学）； Hong Kong University of Science and Technology（香港科技大学）

AI总结 R$^3$L 是一种结合语言引导探索、关键信用分配和正向增强的强化学习方法，旨在解决大语言模型在推理和智能体能力训练中面临的探索与利用难题。该方法通过“反思-重试”机制合成高质量轨迹，利用语言反馈定位错误并优化失败路径，同时仅更新存在差异的轨迹后缀以提高信用分配精度，并通过增强成功轨迹的权重来稳定训练过程。实验表明，R$^3$L 在多个任务中相较基线方法实现了显著性能提升，同时保持了训练稳定性。

详情

AI中文摘要

强化学习推动了LLM推理和智能体能力的最新进展，但当前方法在探索和利用方面均存在困难。探索方面，困难任务成功率低且从头开始重复rollout成本高；利用方面，粗粒度的信用分配和训练不稳定：轨迹级奖励因后续错误惩罚有效前缀，且失败主导的群体淹没少数正向信号，使优化缺乏建设性方向。为此，我们提出R$^3$L，即反思-重试强化学习与语言引导探索、关键信用和正向放大。为合成高质量轨迹，R$^3$L通过反思-重试从随机采样转向主动合成，利用语言反馈诊断错误，将失败尝试转化为成功尝试，并通过从识别出的失败点重启来降低rollout成本。在错误被诊断和定位后，关键信用分配仅更新存在对比信号的分叉后缀，排除共享前缀的梯度更新。由于困难任务中失败占主导且反思-重试产生离策略数据，可能导致训练不稳定，正向放大提高成功轨迹的权重，确保正向信号引导优化过程。在智能体和推理任务上的实验表明，与基线相比，相对提升5%到52%，同时保持训练稳定性。我们的代码已发布在https://github.com/shiweijiezero/R3L。

英文摘要

Reinforcement learning drives recent advances in LLM reasoning and agentic capabilities, yet current approaches struggle with both exploration and exploitation. Exploration suffers from low success rates on difficult tasks and high costs of repeated rollouts from scratch. Exploitation suffers from coarse credit assignment and training instability: trajectory-level rewards penalize valid prefixes for later errors, and failure-dominated groups overwhelm the few positive signals, leaving optimization without constructive direction. To this end, we propose R$^3$L, Reflect-then-Retry Reinforcement Learning with Language-Guided Exploration, Pivotal Credit, and Positive Amplification. To synthesize high-quality trajectories, R$^3$L shifts from stochastic sampling to active synthesis via reflect-then-retry, leveraging language feedback to diagnose errors, transform failed attempts into successful ones, and reduce rollout costs by restarting from identified failure points. With errors diagnosed and localized, Pivotal Credit Assignment updates only the diverging suffix where contrastive signals exist, excluding the shared prefix from gradient update. Since failures dominate on difficult tasks and reflect-then-retry produces off-policy data, risking training instability, Positive Amplification upweights successful trajectories to ensure positive signals guide the optimization process. Experiments on agentic and reasoning tasks demonstrate 5% to 52% relative improvements over baselines while maintaining training stability. Our code is released at https://github.com/shiweijiezero/R3L.
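
The idea of updating only the diverging suffix can be sketched as a per-step mask. This is a minimal illustration, not the released implementation: the action strings are invented, and trajectories are compared step by step by equality, which is an assumption about how the shared prefix is identified.

```python
def shared_prefix_length(failed, retried):
    """Number of leading steps the two trajectories have in common."""
    n = 0
    for a, b in zip(failed, retried):
        if a != b:
            break
        n += 1
    return n


def suffix_mask(trajectory, prefix_len):
    """1 for steps that should receive gradient, 0 for the shared prefix."""
    return [0 if i < prefix_len else 1 for i in range(len(trajectory))]


# Hypothetical trajectories: the retry restarts at the identified failure point.
failed = ["open_drawer", "take_key", "use_key_on_window", "give_up"]
retried = ["open_drawer", "take_key", "use_key_on_door", "exit_room"]

pivot = shared_prefix_length(failed, retried)
mask_failed = suffix_mask(failed, pivot)
mask_retried = suffix_mask(retried, pivot)
print(pivot, mask_failed, mask_retried)
```

Multiplying per-step losses by such a mask keeps the two valid prefix steps from being penalized for the later error.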

2512.15767 2026-05-25 cs.LG cs.AI 版本更新

Bridging Data and Physics: A Graph Neural Network-Based Hybrid Twin Framework

连接数据与物理：基于图神经网络的混合孪生框架

M. Gorpinich, B. Moya, S. Rodriguez, F. Meraghni, Y. Jaafra, A. Briot, M. Henner, R. Leon, F. Chinesta

发表机构 * Valeo（法雷奥）； PIMM Lab. ENSAM Institute of Technology（ENSAM技术学院PIMM实验室）

AI总结该研究提出了一种基于图神经网络的混合孪生框架，旨在解决物理仿真中因模型简化或未建模效应导致的“无知模型”问题。通过结合物理模型与数据驱动方法，该方法利用图神经网络学习稀疏空间测量中的缺失物理规律，从而在减少数据需求的前提下提升仿真精度与可解释性。实验表明，该框架在不同网格、几何和负载位置的非线性热传导问题中均表现出良好的泛化能力与修正效果。

Comments 27 pages, 14 figures

详情

AI中文摘要

模拟复杂的非定常物理现象依赖于详细的数学模型，例如通过有限元方法（FEM）进行仿真。然而，由于未建模效应或简化假设，这些模型通常与实际情况存在差异。我们将这种差距称为无知模型。纯数据驱动的方法试图学习整个系统的行为，但需要跨越整个空间和时间域的大量高质量数据。在现实场景中，此类信息不可用，使得完全数据驱动的建模不可靠。为了克服这一限制，我们采用混合孪生方法对无知分量进行建模，而不是从头模拟现象。由于基于物理的模型近似了现象的整体行为，剩余的无知通常比完整的物理响应复杂度低，因此可以用更少的数据进行学习。然而，一个关键困难是空间测量是稀疏的，并且在实际中获取不同空间配置下同一现象的数据具有挑战性。我们的贡献是通过使用图神经网络（GNN）来表示无知模型来克服这一限制。即使测量位置数量有限，GNN也能学习缺失物理的空间模式。这使得我们能够用数据驱动的修正来丰富基于物理的模型，而无需密集的空间、时间和参数数据。为了展示所提出方法的性能，我们在不同网格、几何形状和载荷位置的非线性热传导问题上评估了这种基于GNN的混合孪生方法。结果表明，GNN成功捕获了无知并泛化了跨空间配置的修正，提高了仿真精度和可解释性，同时最小化了数据需求。

英文摘要

Simulating complex unsteady physical phenomena relies on detailed mathematical models, simulated for instance by using the Finite Element Method (FEM). However, these models often deviate from reality due to unmodeled effects or simplifying assumptions. We refer to this gap as the ignorance model. While purely data-driven approaches attempt to learn full system behavior, they require large amounts of high-quality data across the entire spatial and temporal domain. In real-world scenarios, such information is unavailable, making full data-driven modeling unreliable. To overcome this limitation, we model the ignorance component using a hybrid twin approach, instead of simulating phenomena from scratch. Since physics-based models approximate the overall behavior of the phenomena, the remaining ignorance is typically lower in complexity than the full physical response and can therefore be learned with significantly less data. A key difficulty, however, is that spatial measurements are sparse, and obtaining data that measure the same phenomenon for different spatial configurations is challenging in practice. Our contribution is to overcome this limitation by using Graph Neural Networks (GNNs) to represent the ignorance model. GNNs learn the spatial pattern of the missing physics even when the number of measurement locations is limited. This allows us to enrich the physics-based model with data-driven corrections without requiring dense spatial, temporal and parametric data. To showcase the performance of the proposed method, we evaluate this GNN-based hybrid twin on nonlinear heat transfer problems across different meshes, geometries, and load positions. Results show that the GNN successfully captures the ignorance and generalizes corrections across spatial configurations, improving simulation accuracy and interpretability, while minimizing data requirements.
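
The hybrid twin idea, a physics model plus a learned correction fitted on sparse measurements, can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the "physics" is a linear temperature profile, the "true" response adds terms the physics omits, and the GNN is replaced by a single least-squares coefficient on one hand-picked basis function.

```python
def physics_model(x):
    """Simplified physics: linear temperature profile along a unit rod."""
    return 1.0 - x


def true_response(x):
    """Stand-in for reality: the physics plus effects the model does not capture."""
    return 1.0 - x + 0.3 * x * (1.0 - x) + 0.01 * x ** 2


def basis(x):
    """Hypothetical correction shape, vanishing at both ends of the rod."""
    return x * (1.0 - x)


# Sparse sensor locations: the only places where reality is observed.
sensors = [0.1, 0.3, 0.5, 0.7, 0.9]
residuals = [true_response(x) - physics_model(x) for x in sensors]

# One-parameter least squares for the ignorance term: residual ~ c * basis(x).
c = sum(r * basis(x) for r, x in zip(residuals, sensors)) / sum(
    basis(x) ** 2 for x in sensors
)


def hybrid_twin(x):
    return physics_model(x) + c * basis(x)


grid = [i / 100 for i in range(101)]
max_err_physics = max(abs(true_response(x) - physics_model(x)) for x in grid)
max_err_hybrid = max(abs(true_response(x) - hybrid_twin(x)) for x in grid)
print(c, max_err_physics, max_err_hybrid)
```

Because the correction only has to explain the residual, five measurements are enough here, which mirrors the abstract's argument that the ignorance is simpler to learn than the full response.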

2512.07078 2026-05-25 cs.CV cs.LG 版本更新

DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection

DFIR-DETR：面向小目标检测的频域迭代细化与动态特征聚合

Bo Gao, Jingcheng Tong, Xingsheng Chen, Han Yu, Zichen Li

发表机构 * School of Information Engineering, Beijing Institute of Graphic Communication（信息工程学院，北京印刷学院）； School of Computing and Data Science, The University of Hong Kong（计算与数据科学学院，香港大学）； College of Computing and Data Science, Nanyang Technological University（计算与数据科学学院，南洋理工大学）

AI总结本文针对复杂场景中小目标检测中的核心挑战，提出了一种名为DFIR-DETR的新方法，通过频率域迭代优化和动态特征聚合，有效解决了现有网络在注意力分配、特征上采样和高频信息保留方面的不足。该方法在保持较低计算成本的同时，在NEU-DET和VisDrone数据集上取得了显著的性能提升，验证了其在不同检测任务中的有效性。

2511.15503 2026-05-25 cs.AR cs.DC cs.LG cs.PF 版本更新

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

DCC：面向存内处理架构的以数据为中心的机器学习内核编译

Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula

发表机构 * University of Toronto（多伦多大学）； Vector Institute（向量研究所）； Barcelona Supercomputing Center（巴塞罗那超级计算中心）； ETH Zürich（苏黎世联邦理工学院）； Nvidia（英伟达）； Max Planck Institute for Software Systems（马克斯·普朗克软件系统研究所）

AI总结本文提出了一种面向存算一体架构的数据为中心的机器学习内核编译器DCC，旨在解决在处理大型语言模型等内存密集型任务时，主机处理器与存算一体核心之间数据布局不一致带来的性能瓶颈。DCC通过统一优化数据重排与计算代码生成，结合多层PIM抽象和性能预测模型，有效提升了在不同PIM设备上的执行效率。实验表明，DCC在多种机器学习内核和端到端大语言模型推理中均实现了显著的加速效果。

详情

AI中文摘要

高性能主机处理器可以集成存内处理（PIM）设备，通过利用PIM核心可用的大内存带宽，加速机器学习（ML）模型（包括大型语言模型（LLM））的内存密集型内核。然而，主机处理器需要连续元素分布在各个DRAM bank上，而PIM核心需要连续元素位于其本地bank内。这使得ML内核执行中必须进行数据重排，带来了显著的性能和可编程性挑战，并且由于需要支持多种PIM设备而进一步加剧。当前的编译方法缺乏针对多种ML内核和多个PIM设备的系统优化，并且可能在计算代码优化步骤中很大程度上忽略数据重排成本。我们表明数据重排和计算代码优化是相互依赖的，需要在调优过程中联合优化。因此，我们设计了DCC，这是首个面向PIM系统的以数据为中心的ML编译器，它在统一的调优过程中联合优化数据重排和计算代码。DCC集成了多层PIM抽象以支持多个PIM后端。DCC实现了数据分区策略与计算循环分区方案的有效联合优化。DCC应用了PIM特定的代码优化，并利用快速准确的性能预测模型为目标PIM架构上的给定内核选择性能最佳的代码调度。我们在各种单个ML内核上的评估表明，与仅GPU执行相比，DCC在HBM-PIM上实现了高达7.68倍的加速（平均2.21倍），在AttAcc PIM上实现了高达13.17倍的加速（平均3.92倍）。在端到端LLM推理中，AttAcc上的DCC在GPT-3和LLaMA-2上比GPU平均加速4.52倍（LLaMA-2上最高7.71倍）。DCC已在https://github.com/SPIN-Research-Group/DCC开源。

英文摘要

High-performance Host processors can integrate Processing-In-Memory (PIM) devices, which can accelerate memory-intensive kernels of Machine Learning (ML) models, including Large Language Models (LLMs), by leveraging the large memory bandwidth available at PIM cores. However, the Host processor needs consecutive elements distributed across DRAM banks, while PIM cores need consecutive elements within their local banks. This necessitates data rearrangements in ML kernel execution that pose significant performance and programmability challenges, further exacerbated by the need to support diverse PIM devices. Current compilation approaches lack systematic optimization for diverse ML kernels and multiple PIM devices, and may largely ignore data rearrangement costs during the compute code optimization step. We show that data rearrangements and compute code optimization are interdependent, and need to be jointly optimized during the tuning process. Therefore, we design DCC, the first data-centric ML compiler for PIM systems that jointly optimizes data rearrangements and compute code in a unified tuning process. DCC integrates a multi-layer PIM abstraction to support multiple PIM backends. DCC enables effective co-optimization of data partitioning strategies with compute loop partitioning schemes. DCC applies PIM-specific code optimizations, and leverages a fast and accurate performance prediction model to select the best-performing code schedule for a given kernel on a target PIM architecture. Our evaluations on various individual ML kernels show that DCC achieves up to 7.68x speedup (2.21x average) on HBM-PIM, and up to 13.17x speedup (3.92x average) on AttAcc PIM, over GPU-only execution. In end-to-end LLM inference, DCC on AttAcc accelerates GPT-3 and LLaMA-2 by 4.52x average (up to 7.71x in LLaMA-2) over GPU. DCC is open-sourced at https://github.com/SPIN-Research-Group/DCC.
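
The layout mismatch between Host and PIM cores can be made concrete with two address mappings. This is a deliberately simplified sketch: it assumes element-granularity interleaving across banks for the Host and equal-sized contiguous blocks per bank for PIM, whereas real devices interleave at coarser granularity and DCC's actual partitioning strategies are not reproduced here.

```python
def host_layout(index, num_banks):
    """Interleaved: consecutive elements land in consecutive banks."""
    return index % num_banks, index // num_banks  # (bank, offset)


def pim_layout(index, elems_per_bank):
    """Blocked: consecutive elements stay inside one bank."""
    return index // elems_per_bank, index % elems_per_bank  # (bank, offset)


def rearrangement_moves(n, num_banks):
    """Indices whose bank changes when switching from Host to PIM layout.

    Assumes n is a multiple of num_banks.
    """
    per_bank = n // num_banks
    return [
        i
        for i in range(n)
        if host_layout(i, num_banks)[0] != pim_layout(i, per_bank)[0]
    ]


moves = rearrangement_moves(n=8, num_banks=2)
print(moves)
```

Even in this tiny case half of the elements must cross banks, which is why the rearrangement cost cannot be ignored when choosing a compute schedule.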

2511.03882 2026-05-25 cs.CV cs.AI cs.LG cs.RO 版本更新

Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures

自主X光引导脊柱手术的机器人控制策略学习研究

Florence Klitzner, Blanca Inigo, Benjamin D. Killeen, Lalithkumar Seenivasan, Michelle Song, Axel Krieger, Mathias Unberath

发表机构 * Johns Hopkins University（约翰霍普金斯大学）； Technical University of Munich（慕尼黑工业大学）； Johns Hopkins School of Medicine（约翰霍普金斯医学院）

AI总结本文研究了基于模仿学习的机器人控制策略在X射线引导脊柱手术中的应用，特别是在椎体成形术中导管插入任务中的可行性与挑战。研究构建了一个高度逼真的仿真环境，并构建了包含正确操作轨迹和双平面X射线序列的数据集，用于训练仅依赖视觉信息的模仿学习策略。实验表明，该策略在多种脊柱解剖结构和初始条件下均能实现安全的导管插入，为未来轻量化、无需CT的术中脊柱机器人导航提供了基础。

详情

DOI: 10.1007/s11548-026-03716-x

AI中文摘要

基于模仿学习的机器人控制策略在基于视频的机器人学中重新受到关注。然而，对于稀疏输入的X光引导手术（如脊柱内固定），这种方法是否适用尚不清楚。我们研究了在双平面引导的套管针插入中模仿策略学习的可行性、机遇和挑战。我们开发了一个用于可扩展、自动化模拟X光引导脊柱手术的计算机沙盒，具有高度逼真性。我们整理了一个包含正确轨迹和相应双平面X光序列的数据集，模拟了提供者的逐步对齐过程。然后，我们训练了用于规划和开环控制的模仿学习策略，该策略仅基于视觉信息在椎体成形术环境中迭代对齐套管针。这种精确控制的设置提供了对该方法局限性和能力的见解。我们的策略在68.5%的案例中首次尝试成功，在不同椎体水平上保持了安全的椎弓根内轨迹。该策略迁移到了复杂解剖结构（包括骨折）以及不同的解剖结构和初始位置。在真实X光上的展开表明，具有合理轨迹的部分仿真到真实迁移是可能的。尽管这些初步结果令人鼓舞，但我们还发现了局限性，特别是在入口点精度方面。当前的结果为未来的努力提供了明确的基准，而借助更稳健的先验和领域知识，此类模型可能为未来实现轻量级、无CT的机器人术中脊柱导航奠定基础。

英文摘要

Imitation learning-based robot control policies are enjoying renewed interest in video-based robotics. However, it remains unclear whether this approach applies to X-ray-guided procedures, such as spine instrumentation, with sparse inputs. We examine the feasibility, opportunities and challenges for imitation policy learning in bi-plane-guided cannula insertion. We develop an in silico sandbox for scalable, automated simulation of X-ray-guided spine procedures with a high degree of realism. We curate a dataset of correct trajectories and corresponding bi-planar X-ray sequences that emulate the stepwise alignment of providers. We then train imitation learning policies for planning and open-loop control that iteratively align a cannula in a vertebroplasty setting solely based on visual information. This precisely controlled setup offers insights into limitations and capabilities of this method. Our policy succeeded on the first attempt in 68.5% of cases, maintaining safe intra-pedicular trajectories across diverse vertebral levels. The policy transferred to complex anatomy, including fractures, as well as varied anatomies and initializations. Rollouts on real X-ray indicate that partial sim-to-real transfer with plausible trajectories is possible. While these preliminary results are promising, we also identify limitations, especially in entry point precision. The current results present a clear benchmark for future efforts, while with more robust priors and domain knowledge, such models may provide a foundation for future efforts toward lightweight and CT-free robotic intra-operative spinal navigation.

2509.06896 2026-05-25 cs.LG stat.ML 版本更新

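An Unsupervised Framework for Dynamic Health Indicator Construction and Its Application in Rolling Bearing Prognostics

一种用于动态健康指标构建的无监督框架及其在滚动轴承预测中的应用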

Tongda Sun, Chen Yin, Huailiang Zheng, Yining Dong

发表机构 * School of Data Science and Hong Kong Institute for Data Science, City University of Hong Kong, Hong Kong（香港城市大学数据科学学院、香港数据科学研究所，香港）； College of Mechanical and Electrical Engineering, Harbin Engineering University, Harbin 150001, China（哈尔滨工程大学机电工程学院，哈尔滨150001，中国）

AI总结本文提出了一种无需专家知识的无监督框架，用于构建动态健康指标（HI），以提升滚动轴承退化趋势建模与剩余寿命预测的准确性。该方法通过基于跳跃连接的自编码器自动提取退化特征，并在特征空间中引入嵌入内部预测模块的HI生成模块，显式建模HI状态的时序依赖关系，从而捕捉退化过程中的动态信息。实验结果表明，所提出的动态HI在两个轴承生命周期数据集上优于现有方法，显著提升了预测性能。

详情

DOI: 10.1016/j.ress.2025.111039

AI中文摘要

健康指标（HI）在滚动轴承的退化评估和预测中起着关键作用。尽管已有多种HI构建方法被研究，但大多数依赖于专家知识进行特征提取，并忽略了捕捉序列退化过程中隐藏的动态信息，这限制了所构建HI在退化趋势表示和预测中的能力。为解决这些问题，通过一种无监督框架构建了考虑HI级时间依赖性的新型动态HI。具体而言，由基于跳跃连接的自编码器组成的退化特征学习模块首先将原始信号映射到代表性退化特征空间（DFS），以自动提取必要的退化特征，无需专家知识。随后，在该DFS中，提出了一种嵌入内部HI预测模块的新型HI生成模块用于动态HI构建，其中过去和当前HI状态之间的时间依赖性被保证并显式建模。在此基础上，动态HI捕捉了退化过程固有的动态内容，确保其在退化趋势建模和未来退化预测中的有效性。在两个轴承生命周期数据集上的实验结果表明，所提出的HI构建方法优于对比方法，且构建的动态HI在预测任务中表现更优。

英文摘要

The health indicator (HI) plays a key role in degradation assessment and prognostics of rolling bearings. Although various HI construction methods have been investigated, most of them rely on expert knowledge for feature extraction and fail to capture the dynamic information hidden in sequential degradation processes, which limits the ability of the constructed HI to represent degradation trends and support prognostics. To address these concerns, a novel dynamic HI that considers HI-level temporal dependence is constructed through an unsupervised framework. Specifically, a degradation feature learning module composed of a skip-connection-based autoencoder first maps raw signals to a representative degradation feature space (DFS) to automatically extract essential degradation features without the need for expert knowledge. Subsequently, in this DFS, a new HI-generating module embedded with an inner HI-prediction block is proposed for dynamic HI construction, where the temporal dependence between past and current HI states is guaranteed and modeled explicitly. On this basis, the dynamic HI captures the inherent dynamic contents of the degradation process, ensuring its effectiveness for degradation tendency modeling and future degradation prognostics. Experimental results on two bearing lifecycle datasets demonstrate that the proposed HI construction method outperforms comparison methods, and the constructed dynamic HI is superior for prognostic tasks.

2412.19098 2026-05-25 cs.LG 版本更新

SyMerge: From Non-Interference to Synergistic Merging via Single-Layer Adaptation

SyMerge：从无干扰到协同合并的单层自适应方法

Aecheon Jung, Seunghwan Lee, Dongyoon Han, Sungeun Hong

发表机构 * Sungkyunkwan University（成均馆大学）； NAVER AI Lab（NAVER AI实验室）

AI总结 SyMerge 是一种轻量级的模型合并框架，旨在通过单层适配实现任务间的协同效应，而非仅仅避免任务干扰。该方法通过联合优化合并系数和一个任务特定层，引入专家引导的自标注目标，提升了合并效果的稳定性与性能。研究证明，SyMerge 能够成功合并不同初始化训练的模型，在多个视觉、密集预测和自然语言处理基准上取得了最先进的结果。

Comments Accepted at ICML 2026

详情

AI中文摘要

模型合并将独立训练的模型组合成一个多任务模型。然而，大多数现有方法主要关注避免任务干扰。我们认为其更大的潜力在于实现任务协同，即任务之间主动相互改进。我们识别出跨任务性能，由不同任务之间的编码器和预测器的兼容性定义，作为合并质量的关键指标。我们证明仅适应单个任务特定层就足以诱导这种协同。本研究提出SyMerge，一个轻量级框架，联合优化合并系数和单个任务特定层。我们采用专家引导的自标签目标，提供超越熵最小化的稳定监督。有趣的是，我们进一步表明SyMerge成功合并了从不同初始化训练的模型，而标准方法在此情况下失效。我们极简但有原则的方法在视觉、密集预测和NLP基准上达到了最先进的结果。我们的代码可在https://aim-skku.github.io/SyMerge获取。

英文摘要

Model merging combines independently trained models into a single multi-task model. However, most existing approaches focus primarily on avoiding task interference. We argue that its greater potential lies in enabling task synergy, where tasks actively improve one another. We identify cross-task performance, defined by compatibility between encoders and predictors across tasks, as a key indicator of merge quality. We demonstrate that adapting only a single task-specific layer is sufficient to induce such synergy. This study proposes SyMerge, a lightweight framework that jointly optimizes merging coefficients and a single task-specific layer. We adopt an expert-guided self-labeling objective, providing stable supervision beyond entropy minimization. Intriguingly, we further show that SyMerge successfully merges models trained from different initializations, a regime where standard methods break down. Our minimalist yet principled method achieves state-of-the-art results across vision, dense prediction, and NLP benchmarks. Our code is available at https://aim-skku.github.io/SyMerge

2406.02883 2026-05-25 cs.LG cs.CR 版本更新

Nonlinear Transformations Against Unlearnable Datasets

针对不可学习数据集的非线性变换

Thushari Hapuarachchi, Jing Lin, Kaiqi Xiong, Mohamed Rahouti, Gitte Ost

发表机构 * University of South Florida（南佛罗里达大学）； Fordham University（福特汉姆大学）

AI总结 本文研究了如何通过非线性变换使深度学习模型能够从传统上被认为“不可学习”的数据集中学习。作者提出了一种有效的非线性变换框架，并通过大量实验表明，深度神经网络能够从由多种数据保护方法生成的不可学习数据中有效学习，效果显著优于近期提出的线性可分技术。实验结果表明，该方法在多个数据集上提升了模型性能，揭示了现有保护方法在防止数据未经授权使用方面存在不足，亟需更强大的防护机制。

详情

AI中文摘要

自动化爬取是深度学习模型中未经数据所有者授权收集数据的常见方法。近期研究开始解决这种数据收集方法带来的隐私问题。显著的方法包括Deepconfuse、误差最小化、误差最大化（也称为对抗性投毒）、神经正切泛化攻击、合成、自回归、单像素捷径、自集成保护、纠缠特征、鲁棒误差最小化、虚伪和TensorClog。这些方法生成的数据称为“不可学习”样本，阻止深度学习模型“学习”。在本研究中，我们调查并设计了一个有效的非线性变换框架，并进行大量实验，证明深度神经网络能够有效从上述十二种方法产生的传统上被认为不可学习的数据/样本中学习。与研究人员最近提出的线性可分技术相比，所提出的方法提高了破解不可学习数据的能力。具体来说，我们的大量实验表明，对于这些十二种数据保护方法生成的不可学习CIFAR10数据集（除单像素捷径外），改进范围为0.34%至249.59%。此外，与线性可分技术相比，所提出的框架在自回归和REM方法上实现了超过100%的测试准确率提升。我们的发现表明，这些方法不足以防止机器学习模型中数据的未经授权使用。迫切需要开发更强大的保护机制，有效阻止攻击者在未经所有者适当授权的情况下访问数据。

英文摘要

Automated scraping stands out as a common method for collecting data in deep learning models without the authorization of data owners. Recent studies have begun to tackle the privacy concerns associated with this data collection method. Notable approaches include Deepconfuse, error-minimizing, error-maximizing (also known as adversarial poisoning), Neural Tangent Generalization Attack, synthetic, autoregressive, One-Pixel Shortcut, Self-Ensemble Protection, Entangled Features, Robust Error-Minimizing, Hypocritical, and TensorClog. The data generated by those approaches, called "unlearnable" examples, are designed to prevent deep learning models from learning from them. In this research, we investigate and devise an effective nonlinear transformation framework and conduct extensive experiments to demonstrate that a deep neural network can effectively learn from the data/examples traditionally considered unlearnable produced by the above twelve approaches. The resulting approach improves the ability to break unlearnable data compared to the linear separable technique recently proposed by researchers. Specifically, our extensive experiments show that the improvement ranges from 0.34% to 249.59% for the unlearnable CIFAR10 datasets generated by those twelve data protection approaches, except for One-Pixel Shortcut. Moreover, the proposed framework achieves over 100% improvement in test accuracy for the Autoregressive and REM approaches compared to the linear separable technique. Our findings suggest that these approaches are inadequate in preventing unauthorized uses of data in machine learning models. There is an urgent need to develop more robust protection mechanisms that effectively thwart an attacker from accessing data without proper authorization from the owners.

2605.23673 2026-05-25 cs.LG 版本更新

Relevant Walk Search for Explaining Graph Neural Networks

用于解释图神经网络的相关游走搜索

Ping Xiong, Thomas Schnake, Michael Gastegger, Grégoire Montavon, Klaus-Robert Müller, Shinichi Nakajima

发表机构 * BIFOLD -- Berlin Institute for the Foundations of Learning and Data（柏林学习与数据基础研究所）； Google Research, Brain team, Berlin（谷歌研究院 Brain 团队，柏林）； Department of Artificial Intelligence, Korea University, Seoul 136-713, Korea（高丽大学人工智能系，韩国首尔 136-713）； RIKEN Center for AIP, Japan（日本理化学研究所革新智能综合研究中心）

AI总结本文研究了图神经网络（GNN）的可解释性问题，提出了一种高效寻找关键路径（walk）的方法，用于揭示网络中的重要信息流动。针对现有基于层间相关性传播（GNN-LRP）方法计算复杂度高、难以应用于大规模网络的问题，作者设计了多项式时间算法，能够在保证解释精度的同时大幅提升计算效率。实验表明，该方法在多个实际应用领域中表现良好，具有广泛的应用价值。

Comments Published in ICML 2023

详情

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:38301-38324, 2023

AI中文摘要

图神经网络（GNN）已成为图分析的重要机器学习工具，其可解释性对于安全性、公平性和鲁棒性至关重要。GNN的逐层相关性传播（GNN-LRP）评估游走的相关性以揭示网络中的重要信息流，并提供高阶解释，已被证明优于低阶（即节点/边级）解释。然而，通过GNN-LRP识别相关游走需要相对于网络深度的指数级计算复杂度，本文将对这一问题进行改进。具体来说，我们提出了多项式时间算法来寻找前K个相关游走，这大大减少了计算量，从而提高了GNN-LRP在大规模问题上的适用性。我们提出的算法基于最大积算法（一种在概率图模型中寻找最大似然配置的常用工具），并且可以在神经元级别精确地找到最相关的游走，在节点级别近似地找到。我们的实验展示了我们的算法在规模上的性能及其在应用领域（即流行病学、分子和自然语言基准）中的实用性。我们在 https://github.com/xiong-ping/rel_walk_gnnlrp 上提供代码。

英文摘要

Graph Neural Networks (GNNs) have become important machine learning tools for graph analysis, and their explainability is crucial for safety, fairness, and robustness. Layer-wise relevance propagation for GNNs (GNN-LRP) evaluates the relevance of walks to reveal important information flows in the network, and provides higher-order explanations, which have been shown to be superior to lower-order (i.e., node- or edge-level) explanations. However, identifying relevant walks with GNN-LRP has computational complexity that is exponential in the network depth, which we remedy in this paper. Specifically, we propose polynomial-time algorithms for finding the top-$K$ relevant walks, which drastically reduce the computation and thus increase the applicability of GNN-LRP to large-scale problems. Our proposed algorithms are based on the max-product algorithm, a common tool for finding the maximum likelihood configurations in probabilistic graphical models, and can find the most relevant walks exactly at the neuron level and approximately at the node level. Our experiments demonstrate the performance of our algorithms at scale and their utility across application domains, i.e., on epidemiology, molecular, and natural language benchmarks. We provide our code at https://github.com/xiong-ping/rel_walk_gnnlrp.
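
The reason a max-product recursion avoids the exponential cost can be sketched on a toy problem. This is an illustration under a strong assumption that is not taken from the paper: the score of a walk is the product of nonnegative per-layer step scores `scores[t][i][j]`. Under that assumption, dynamic programming finds the top-1 walk in time linear in the depth, and a brute-force search over all walks confirms the result.

```python
from itertools import product


def best_walk(scores):
    """scores[t][i][j] >= 0 scores the step from node i to node j at layer t.

    Returns (value, walk) maximizing the product of step scores.
    """
    n = len(scores[0])
    best = [1.0] * n  # best[j]: top product over partial walks ending at j
    back = []         # back[t][j]: predecessor of j on that partial walk
    for layer in scores:
        new_best, pointers = [], []
        for j in range(n):
            i = max(range(n), key=lambda i: best[i] * layer[i][j])
            new_best.append(best[i] * layer[i][j])
            pointers.append(i)
        best = new_best
        back.append(pointers)
    end = max(range(n), key=lambda j: best[j])
    walk = [end]
    for pointers in reversed(back):
        walk.append(pointers[walk[-1]])
    walk.reverse()
    return best[end], walk


def brute_force(scores):
    """Enumerates all n ** (depth + 1) walks; for checking only."""
    n = len(scores[0])

    def value(walk):
        v = 1.0
        for t, layer in enumerate(scores):
            v *= layer[walk[t]][walk[t + 1]]
        return v

    walk = max(product(range(n), repeat=len(scores) + 1), key=value)
    return value(walk), list(walk)


# Hypothetical step scores for a 3-node graph and two layers.
scores = [
    [[0.0, 2.0, 1.0], [1.0, 0.0, 3.0], [2.0, 1.0, 0.0]],
    [[0.0, 1.0, 4.0], [2.0, 0.0, 1.0], [5.0, 1.0, 0.0]],
]
print(best_walk(scores), brute_force(scores))
```

The recursion touches each layer once with cost quadratic in the number of nodes, while the brute-force check grows exponentially with depth.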

2605.23663 2026-05-25 cs.HC cs.LG 版本更新

Detecting Drunk Driving Using Off-the-Shelf Smartwatches

使用现成智能手表检测酒驾

Robin Deuber, Lanlan Yang, Michal Bechny, Christoph Heck, Matthias Pfäffli, Matthias Bantle, Florian von Wangenheim, Elgar Fleisch, Wolfgang Weinmann, Manuel Günther, Felix Wortmann, Varun Mishra

发表机构 * University of Bern（伯尔尼大学）； University of St. Gallen（圣加仑大学）； Northeastern University（东北大学）

AI总结本文研究了如何利用市售智能手表检测酒后驾驶行为，以预防道路交通事故。研究通过分析手腕加速度计数据和心率变异性等生理信号，提出了一种基于机器学习的检测系统，并在封闭测试轨道上进行了随机对照实验。该系统使用逻辑回归和一维卷积神经网络进行训练，取得了较高的检测准确率，为基于可穿戴设备的酒驾预防提供了新的可行方案。

Comments 27 pages, 7 figures

详情

AI中文摘要

酒精影响驾驶仍然是道路交通事故和死亡的一个主要但可预防的原因，许多驾驶员低估了自己的醉酒程度。与车载系统相比，使用消费级智能手表的移动酒驾检测提供了一种可扩展的方式，无需额外车载硬件即可触发预防性干预并提高意识。我们引入了一个系统，利用手腕加速度计数据和心率变异性衍生的生理信号来检测酒精相关的驾驶障碍。我们在一个随机、对照的三组测试轨道研究（n=54）中收集数据，并训练了带有窗口聚合特征的逻辑回归模型和一个双塔一维卷积神经网络（CNN），以检测酒精影响下的驾驶。CNN在检测任何酒精中毒时实现了参与者平均受试者工作特征曲线下面积（AUROC）为0.88，在检测驾驶超过WHO推荐的0.05 g/dL限值时AUROC为0.86。据我们所知，这是第一个（1）展示使用消费级智能手表检测酒驾的工作，（2）在封闭测试轨道的真实车辆中开发和评估此类系统，以及（3）严格评估对未见参与者的泛化能力。这些发现共同凸显了基于可穿戴设备的传感在支持可扩展、测量驱动的酒精相关交通伤害预防方面的潜力。

英文摘要

Alcohol-impaired driving remains a major yet preventable cause of road traffic injury and death, with many drivers underestimating their level of intoxication. Compared to in-vehicle systems, mobile drunk-driving detection using consumer smartwatches offers a scalable way to trigger preventive interventions and increase awareness without additional in-vehicle hardware. We introduce a system that leverages wrist accelerometer data and heart rate variability-derived physiological signals to detect alcohol-related driving impairment. We collected data in a randomized, controlled three-arm test-track study (n=54) and trained both logistic regression models with window-aggregated features and a two-tower 1D convolutional neural network (CNN) to detect alcohol-impaired driving. The CNN achieved a participant-averaged area under the receiver operating characteristic curve (AUROC) of 0.88 for detecting any alcohol intoxication and 0.86 for detecting driving above the WHO-recommended limit of 0.05 g/dL. To the best of our knowledge, this is the first work to (1) demonstrate drunk-driving detection using consumer smartwatches, (2) develop and evaluate such a system in a real vehicle on a closed test track, and (3) rigorously assess generalization to unseen participants. Together, these findings highlight the potential of wearable-based sensing to support scalable, measurement-driven prevention of alcohol-related traffic harm.
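
A participant-averaged AUROC can be sketched as follows. This reading is an assumption: the abstract does not spell out the aggregation, so the sketch computes one AUROC per participant from that participant's windows and takes the unweighted mean. The participant IDs, labels, and scores are invented.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def participant_averaged_auroc(records):
    """records: {participant_id: (labels, scores)}; unweighted mean of per-participant AUROC."""
    values = [auroc(labels, scores) for labels, scores in records.values()]
    return sum(values) / len(values)


# Hypothetical per-window labels (1 = impaired) and model scores.
records = {
    "p01": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "p02": ([0, 1, 1, 0], [0.20, 0.90, 0.70, 0.30]),
}
mean_auroc = participant_averaged_auroc(records)
print(mean_auroc)
```

Averaging per participant keeps a participant with many windows from dominating the metric, which matters when the goal is generalization to unseen people.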

2605.23655 2026-05-25 cs.CV cs.AI cs.LG cs.MM 版本更新

CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

CVSearch：赋予多模态大语言模型认知视觉搜索能力以感知高分辨率图像

Liupeng Li, Haoqian Kang, Zhenyu Lu, Jinpeng Wang, Bin Chen, Ke Chen, Yaowei Wang

发表机构 * Harbin Institute of Technology, Shenzhen, China（哈尔滨工业大学（深圳））； Peng Cheng Laboratory, Shenzhen, China（鹏城实验室）； Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China（深圳先进技术研究院）

AI总结高分辨率图像感知是多模态大语言模型面临的关键瓶颈。为解决视觉搜索中覆盖性与效率之间的矛盾，本文提出CVSearch，一种无需训练的自适应框架，通过“评估-搜索”流程动态调度搜索策略。该方法在全局信息不足时采用专家辅助搜索，失败时触发语义感知的扫描机制，有效减少物体碎片化，并通过动态自底向上搜索策略提升局部细节的探索效率。实验表明，CVSearch在高分辨率基准上实现了最先进的准确率和显著提升的搜索效率。

Comments Accepted by ICML 2026. 22 pages, 12 figures, 7 tables

详情

AI中文摘要

高分辨率图像感知是多模态大语言模型的一个关键瓶颈。虽然视觉搜索提供了有希望的解决方案，但现有方法在覆盖率和效率之间难以权衡。视觉专家辅助搜索效率高，但当提议失败时容易出现盲点，而基于扫描的搜索以计算冗余和语义碎片化为代价保证了覆盖率。为了解决这一困境，我们引入了CVSearch，一种无需训练的自适应框架，通过评估-搜索工作流动态调度搜索策略。具体来说，CVSearch首先在全局信息不足时调用专家辅助搜索，仅在失败时触发一种新颖的语义感知扫描机制。与刚性网格划分不同，这种高效扫描范式结合了语义引导的自适应补丁，将图像分解为语义一致的区域，有效缓解了物体碎片化。此外，我们设计了一种由视觉复杂性先验驱动的动态自底向上搜索策略，以实现对局部细节的高效且精确的迭代探索。在高分辨率基准上的大量实验表明，CVSearch在显著提高搜索效率的同时实现了最先进的准确性。代码已发布在https://github.com/liliupeng28/ICML26-CVSearch。

英文摘要

High-resolution (HR) image perception presents a key bottleneck for multimodal large language models (MLLMs). While visual search offers a promising solution, existing methods struggle with the trade-off between coverage and efficiency. Visual expert-assisted search is efficient but prone to blind spots when proposals fail, whereas scan-based search guarantees coverage at the cost of computational redundancy and semantic fragmentation. To address this dilemma, we introduce CVSearch, a training-free adaptive framework that dynamically schedules search strategies via an Assess-then-Search workflow. Specifically, CVSearch first invokes expert-assisted search when global information is insufficient, and only triggers a novel semantic-aware scanning mechanism upon failure. Distinct from rigid grid partitioning, this efficient scanning paradigm incorporates Semantic Guided Adaptive Patching to decompose images into semantically consistent regions, effectively mitigating object fragmentation. Furthermore, we devise a Dynamic Bottom-Up Search strategy, driven by a Visual Complexity prior, that enables efficient and precise iterative exploration of local details. Extensive experiments on HR benchmarks demonstrate that CVSearch achieves state-of-the-art accuracy while substantially improving search efficiency. Code is released at https://github.com/liliupeng28/ICML26-CVSearch.

2605.23645 2026-05-25 cs.LG cs.AI 版本更新

Learning Through Noise: Why Subliminal Learning Works and When It Fails

通过噪声学习：为什么潜意识学习有效以及何时失败

Vincent C. Brockers, Roman D. Ventzke, Valentin Neuhaus, Belén Hidalgo-Ogalde, Viola Priesemann

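Affiliations: Max Planck Institute for Dynamics and Self-Organization; Faculty of Physics, Institute for the Dynamics of Complex Systems, University of Göttingen

AI summary: This paper studies subliminal learning in artificial neural networks: when a student is distilled on task-unrelated input-output pairs, it implicitly picks up task-relevant knowledge or biases from the teacher. The authors find that the effect does not depend on matched teacher-student initialization but is governed by the compatibility of the output heads. In controlled experiments, students still learn useful information from the teacher through compatible auxiliary heads under random initialization and architectural changes, and under certain conditions reach teacher-level task performance. The work gives a theoretical account of subliminal learning and delineates where it applies and where it fails.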

AI中文摘要

在人工神经网络的背景下，潜意识学习指的是通过任务无关的输入-输出对的蒸馏，将任务相关知识或意外偏差从教师模型传递到学生模型。先前的解释将这种效应归因于共享或紧密匹配的教师-学生初始化。我们表明，紧密匹配的初始化并非必要。相反，潜意识学习由兼容的输出头控制。使用受控的MNIST设置，我们将输出分为辅助头（用于辅助的、任务无关的噪声信号）和分类头（用于分类），以证明潜意识学习发生——即使我们随机初始化隐藏层并移除层、添加新层或更改架构（MLP到CNN）。兼容的辅助头能够传递可恢复的教师信号，使学生的表示更接近教师的表示。当分类头也保持兼容时，仅训练于任务无关噪声的学生可以接近，并且在有利情况下达到教师级别的任务性能。我们的设置使我们能够发展一种理论来解释潜意识学习的机制，并推导出潜意识学习失败时的上界。总之，我们的结果将潜意识学习从一种令人惊讶的迁移效应转变为具有可预测限制的理论基础机制。

英文摘要

In the context of artificial neural networks, subliminal learning refers to the transfer of task-relevant knowledge or unintended biases from teacher to student models through distillation on task-unrelated input–output pairs. Prior explanations tie this effect to shared or closely matched teacher–student initialization. We show that a closely matched initialization is not necessary. Instead, subliminal learning is governed by compatible output heads. Using a controlled MNIST setting, we split outputs into an auxiliary head (for auxiliary, task-unrelated noise signals) and a class head (for classification) to demonstrate that subliminal learning occurs even when we randomly initialize hidden layers and remove layers, add new layers, or change the architecture (MLP-to-CNN). Compatible auxiliary heads enable transfer of a recoverable teacher signal, bringing the student's representations closer to the teacher's. When the class heads remain compatible as well, students trained only on task-unrelated noise can approach, and in favorable regimes match, teacher-level task performance. Our setting enables us to develop a theory that explains the mechanism of subliminal learning and to derive upper bounds on when subliminal learning fails. Together, our results turn subliminal learning from a surprising transfer effect into a theoretically grounded mechanism with predictable limits.

2605.23643 2026-05-25 cs.CR cs.LG 版本更新

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin

更少努力，更短证明：Tamarin中安全协议分析的强化学习

Matthias Cosler, Cas Cremers, Bernd Finkbeiner, Mohamed Ghanem, Niklas Medinger

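Affiliations: CISPA Helmholtz Center for Information Security; Technical University of Munich

AI summary: This paper proposes a reinforcement learning framework that assists the Tamarin prover in the formal verification of security protocols. Inspired by AlphaZero and AlphaProof, it combines Monte Carlo Tree Search with a neural heuristic to make verification more efficient and proofs shorter. Experiments on multiple case studies show that the method finds more proofs automatically and yields shorter proofs than Tamarin's default search and human-engineered heuristics, reducing the human effort that verification requires.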

AI中文摘要

像Tamarin和ProVerif这样的工具在分析和验证复杂的现实世界协议（如EMV、5G和WPA2）方面取得了显著成功，甚至检测到了零日漏洞。尽管取得了这些成功，验证此类协议仍然是一项耗时、具有挑战性的任务，通常需要大量的人力和专业知识。在本文中，我们提出了一个受AlphaZero和AlphaProof启发的强化学习（RL）框架，该框架为Tamarin实现了一种新的证明搜索风格。我们为Tamarin开发了一个无状态API，充当经典的RL环境。我们通过一个从完成的子证明中学习的神经启发式来指导蒙特卡洛树搜索（MCTS）。我们在16个案例研究上评估了我们的框架，范围从经典协议模型到近期出版物中具有挑战性的最先进协议模型。我们的方法比Tamarin的标准搜索自动找到更多的证明，并且比标准和人工设计的启发式产生更短的证明。我们的流程开箱即用，可帮助Tamarin用户在活跃研究中减少所需的人力。此外，我们的标准化接口为用户提供了一种与Tamarin交互的程序化方式。最后，我们的工作展示了将基于RL的方法适应Tamarin领域的巨大潜力。

英文摘要

Tools like Tamarin and ProVerif have achieved notable success in analyzing and verifying complex real-world protocols such as EMV, 5G, and WPA2, even detecting zero-day exploits. Despite these successes, verifying such protocols remains a time-consuming, challenging task, often requiring significant human effort and expertise. In this paper, we present a reinforcement learning (RL) framework inspired by AlphaZero and AlphaProof that implements a new style of proof search for Tamarin. We have developed a stateless API for Tamarin that acts as a classical RL environment. We guide a Monte Carlo Tree Search (MCTS) by a neural heuristic that learns from completed subproofs. We evaluate our framework on 16 case studies, ranging from classical protocol models to challenging state-of-the-art protocol models from recent publications. Our method finds more proofs automatically than Tamarin's standard search and produces shorter proofs than both the standard and human-engineered heuristics. Our pipeline is applicable out of the box to assist Tamarin users in active research, reducing the human effort required. Moreover, our standardized interface provides a programmatic way for users to interact with Tamarin. Finally, our work demonstrates the promising potential of adapting RL-based methods to the Tamarin domain.

2605.23635 2026-05-25 stat.ML cs.LG 版本更新

Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks

基于狄利克雷的蒙特卡洛丢弃法用于神经网络不确定性估计

Rouaa Hoblos, Noura Dridi, Noureddine Zerhouni, Zeina Al Masry

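AI summary: Conventional neural networks give no uncertainty estimate for their predictions, and Bayesian neural networks quantify uncertainty only at high computational cost. This paper proposes a Dirichlet-based Monte Carlo Dropout method that improves the quality of uncertainty estimates while retaining computational efficiency. Modeling class probabilities with a Dirichlet distribution yields a more informative uncertainty representation, and experiments confirm its effectiveness for uncertainty calibration.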

Journal ref: 56es Journées de Statistique de la SFdS, Jun 2025, Marseille, France

AI中文摘要

传统神经网络提供确定性预测，缺乏固有的不确定性估计。虽然贝叶斯神经网络（BNN）为不确定性量化提供了原则性方法，但其计算复杂度限制了可扩展性。蒙特卡洛（MC）Dropout最初作为正则化技术引入，已被证明通过多次随机前向传播实现概率建模，从而近似贝叶斯推断。在这项工作中，我们通过在MC Dropout中集成基于狄利克雷的框架来增强深度学习中的不确定性估计。具体来说，我们利用Sensoy等人（2018）提出的公式，其中使用狄利克雷分布对类概率进行建模，从而允许更信息化的不确定性表示。所提出的方法保持了MC Dropout的计算效率，同时提高了不确定性估计的质量。我们讨论了所提出方法的理论基础，并将其与现有的不确定性量化技术进行了比较。结果突显了所提出方法在产生良好校准的不确定性估计方面的有效性，为不确定性感知的深度学习模型提供了实用解决方案。

英文摘要

Traditional neural networks provide deterministic predictions without inherent uncertainty estimates. While Bayesian Neural Networks (BNNs) offer a principled approach to uncertainty quantification, their computational complexity limits scalability. Monte Carlo (MC) Dropout, initially introduced as a regularization technique, has been shown to approximate Bayesian inference by enabling probabilistic modeling through multiple stochastic forward passes. In this work, we enhance uncertainty estimation in deep learning by integrating a Dirichlet-based framework within MC Dropout. Specifically, we leverage the formulation proposed by Sensoy et al. (2018), where class probabilities are modeled using a Dirichlet distribution, allowing for a more informative uncertainty representation. The proposed approach maintains the computational efficiency of MC Dropout while improving the quality of uncertainty estimates. We discuss the theoretical foundations of our method and compare it with existing uncertainty quantification techniques. The results highlight the effectiveness of the proposed method in producing well-calibrated uncertainty estimates, offering a practical solution for uncertainty-aware deep learning models.
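As a rough illustration of the idea in this abstract, the sketch below maps per-class evidence to a Dirichlet distribution in the style of Sensoy et al. (2018), with alpha_k = e_k + 1 and uncertainty mass u = K / S, and averages the result over several stochastic forward passes. How the paper actually combines the two ingredients is not stated in the abstract, so the averaging step, the `toy_evidence` stand-in for a dropout network, and all function names are assumptions made for illustration.

```python
import random

def dirichlet_summary(evidence):
    """Map non-negative per-class evidence to expected probabilities and vacuity."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]      # Dirichlet parameters alpha_k = e_k + 1
    strength = sum(alpha)                    # Dirichlet strength S
    probs = [a / strength for a in alpha]    # expected class probabilities alpha_k / S
    vacuity = k / strength                   # uncertainty mass u = K / S
    return probs, vacuity

def mc_dropout_dirichlet(evidence_fn, x, passes=20, seed=0):
    """Average Dirichlet summaries over `passes` stochastic forward passes (assumed scheme)."""
    rng = random.Random(seed)
    summaries = [dirichlet_summary(evidence_fn(x, rng)) for _ in range(passes)]
    k = len(summaries[0][0])
    mean_probs = [sum(p[i] for p, _ in summaries) / passes for i in range(k)]
    mean_vacuity = sum(u for _, u in summaries) / passes
    return mean_probs, mean_vacuity

def toy_evidence(x, rng, drop=0.5):
    """Hypothetical stand-in for a dropout network: randomly zeroes evidence entries."""
    return [0.0 if rng.random() < drop else v for v in x]

probs, vacuity = mc_dropout_dirichlet(toy_evidence, [9.0, 1.0, 0.0])
print(probs, vacuity)
```

With zero evidence for every class the summary is uniform and the vacuity equals 1, which is the "I do not know" case the Dirichlet formulation is meant to expose.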

2605.23632 2026-05-25 cs.LG 版本更新

Valid and Expressive Copulas for Irregular Multivariate Time Series

不规则多元时间序列的有效且表达力强的Copula模型

Christian Klötergens, Tom Hanika, Lars Schmidt-Thieme, Vijaya Krishna Yalavarthi

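Affiliations: Institute of Computer Science; University of Hildesheim

AI summary: This paper proposes CopFITi, a model for probabilistic forecasting of irregular multivariate time series. It combines the expressiveness of normalizing flows for the univariate marginals with the flexibility and consistency of a Gaussian mixture copula for the joint dependence structure. The authors present it as the first copula model for irregular multivariate time series that is consistent under marginalization, and it sets a new state of the art in joint density modeling.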

2605.23628 2026-05-25 cs.LG 版本更新

How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness

操纵基准测试有多难？排行榜鲁棒性的社会选择分析

Polina Gordienko, Georg Schollmeyer, Frauke Kreuter, Christoph Jansen

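Affiliations: Department of Statistics, LMU Munich; Social Data Science Center, University of Maryland; School of Computing & Communications, Lancaster University Leipzig

AI summary: This paper studies how hard it is to manipulate model rankings on multi-task benchmarks through the choice of training data, casting the problem as election manipulation from social choice theory. Treating datasets as voters and models as candidates, the authors prove that benchmark-specific training is NP-hard under Borda count and mean win rate. They also introduce an instance-level robustness measure, the number of datasets a developer must include in training for a model to top the leaderboard, and evaluate it under several aggregation rules on multiple benchmarks, finding that mean win rate is the hardest to manipulate.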

AI中文摘要

多任务基准测试已成为机器学习研究的核心支柱，但其日益增长的影响力激励了基准测试游戏——为提高特定模型的排行榜排名而采取的策略性行动。将数据集视为选民，模型视为候选人，我们将基准特定训练——在训练中包含基准数据——视为一种选举操纵形式。对于任何序数基准，选择训练数据集以使目标模型排名第一的问题对应于移位贿赂，这是计算社会选择中的一类操纵问题。利用这一识别，我们证明在Borda计数和平均胜率下，基准特定训练问题是NP难的。作为这种最坏情况视角的补充，我们引入了实例级鲁棒性，即模型开发者必须包含在训练中以使给定排行榜排名第一的最小数据集数量，并在算术平均、中位数、平均胜率和成对多数下推导出其表达式。我们在HELM下的MMLU和Open LLM排行榜下的BIG-Bench Hard（BBH）上评估了这些表达式。在两个套件中，平均胜率最难操纵：这一差距在BBH（24个任务，4507个模型）上很明显，其中位鲁棒性为22个任务（92%），而算术平均下为13个（54%），中位数和成对多数下为12个（50%）。

英文摘要

Multi-task benchmarks have become a central pillar of machine learning research, yet their growing influence has incentivised benchmark gaming -- strategic actions taken to improve the leaderboard rank of a specific model. Treating datasets as voters and models as candidates, we consider benchmark-specific training -- the inclusion of benchmark data in training -- as a form of election manipulation. For any ordinal benchmark, the problem of choosing datasets to train on so that a target model becomes top-ranked corresponds to shift bribery, a class of manipulation problems from computational social choice. Leveraging this identification, we show that the benchmark-specific training problem is NP-hard under Borda count and mean win rate. Complementing this worst-case perspective, we introduce the instance-level robustness, the minimum number of datasets a model developer must include in training to top a given leaderboard, and derive expressions for it under arithmetic mean, median, mean win rate and pairwise majority. We evaluate these expressions on MMLU under HELM and on BIG-Bench Hard (BBH) under the Open LLM Leaderboard. Across both suites, mean win rate is hardest to manipulate: this gap is clear on BBH (24 tasks, 4507 models), where its median robustness is 22 tasks (92%), compared with 13 (54%) under arithmetic mean and 12 (50%) under median and pairwise majority.
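To make the "datasets as voters, models as candidates" framing concrete, here is a small sketch of three of the aggregation rules named in the abstract, applied to a made-up leaderboard. The scores, model names, and function names are hypothetical, and mean win rate is computed here as the fraction of (task, other model) pairs that a model strictly wins, which is one common reading rather than necessarily the exact definition used in the paper.

```python
def borda_scores(scores):
    """scores: {model: [score on task 0, score on task 1, ...]}, higher is better."""
    models = list(scores)
    n_tasks = len(next(iter(scores.values())))
    totals = {m: 0 for m in models}
    for t in range(n_tasks):
        for m in models:
            # Each task acts as a voter: one point per other model strictly beaten.
            totals[m] += sum(scores[m][t] > scores[o][t] for o in models if o != m)
    return totals

def mean_win_rate(scores):
    """Fraction of (task, other model) comparisons won; Borda normalised when there are no ties."""
    n_tasks = len(next(iter(scores.values())))
    pairs = n_tasks * (len(scores) - 1)
    return {m: b / pairs for m, b in borda_scores(scores).items()}

def arithmetic_mean(scores):
    return {m: sum(v) / len(v) for m, v in scores.items()}

# Hypothetical leaderboard: A is excellent on one task and weak on the other two.
toy = {"A": [0.99, 0.20, 0.20], "B": [0.40, 0.40, 0.40], "C": [0.30, 0.30, 0.30]}

means = arithmetic_mean(toy)
borda = borda_scores(toy)
print("top by arithmetic mean:", max(means, key=means.get))
print("top by Borda count:", max(borda, key=borda.get))
print("mean win rates:", mean_win_rate(toy))
```

In this toy case a single very high score is enough to top the arithmetic-mean ranking but not the ordinal ones, which gives some intuition for why the aggregation rule changes how many datasets must be targeted.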

2605.23623 2026-05-25 cs.CR cs.AI cs.LG 版本更新

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

时间概念漂移下的对抗脆弱性：Android恶意软件检测的纵向研究

Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein, David Mohaisen

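Affiliations: Department of Computer Science, Birzeit University; Department of Computer Science, University of Central Florida

AI summary: This paper takes a longitudinal view of the adversarial vulnerability of Android malware detectors under temporal concept drift, analyzing adversarial robustness over a decade of app data with static and dynamic feature representations. Models are evaluated under three deployment protocols, and several time-linked metrics quantify how distribution shift affects robustness. The results show that adversarial robustness falls and attack success rates rise as the temporal gap widens, which underlines the need to account for temporal drift in evolving data environments and the importance of robustness evaluation frameworks for long-term adversarial settings.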

Comments 42 pages, 4 tables, 10 figures

基于稀疏特征的非对称缩放定律

John Sous, Michael Winer

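Affiliations: Yale University; Energy Sciences Institute; Institute for Advanced Study; Alignment Research Center

AI summary: This paper studies neural scaling laws under sparse activations and introduces a model in which test loss is dominated by rare coordinates that never appear in the training input, a bottleneck absent from dense models. The authors derive the asymptotic loss in the underparameterized and overparameterized regimes and find a double-descent peak near the interpolation threshold, with two distinct scaling exponents whose gap is set by the degree of sparsity. They also analyze gradient-descent dynamics, give a scaling law for the probability that fixed-step gradient descent becomes unstable, and show that the sparsity-induced effect persists under nonlinear activations.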

AI中文摘要

我们引入了一个稀疏激活下的神经缩放定律模型。在该模型中，测试损失通常由训练输入中从未观察到的稀有坐标主导。这种机制引入了一个密集模型中不存在的新瓶颈。我们推导了欠参数化和过参数化区域的渐近总体损失，并表明损失在插值阈值附近出现双下降峰值——其中参数数量刚好足以拟合训练数据——导致损失曲线由两个不同的缩放指数控制：一个用于过参数化区域，一个用于欠参数化区域，其差距由稀疏程度决定。此外，我们推导了一个计算最优边界，在固定计算预算下倾向于增加数据集大小而非模型容量。我们还分析了梯度下降动力学，并确定了固定步长梯度下降变得不稳定的概率的缩放定律。我们进一步表明，稀疏诱导效应在非线性激活下仍然存在。

英文摘要

We introduce a model for neural scaling laws under sparse activations. In the model, test loss is often dominated by rare coordinates that are never observed in the training input. This mechanism induces a novel bottleneck absent from dense models. We derive the asymptotic population loss in both the underparameterized and overparameterized regimes, and show that the loss exhibits a double-descent peak near the interpolation threshold, where the number of parameters is just sufficient to fit the training data. The resulting loss curve is governed by two distinct scaling exponents, one for the overparameterized regime and one for the underparameterized regime, with a gap determined by the degree of sparsity. Additionally, we derive a compute-optimal frontier that favors increasing dataset size over model capacity under fixed compute budgets. We also analyze gradient-descent dynamics and identify a scaling law for the probability that fixed-step gradient descent becomes unstable. We further show that the sparsity-induced effect persists under nonlinear activations.
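The "rare coordinates that are never observed" mechanism can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each coordinate is active independently with a fixed probability in every sample; under that assumption a coordinate with activation probability p is absent from n training samples with probability (1 - p)^n. The power-law frequencies and function names are hypothetical and are not taken from the paper.

```python
def prob_never_seen(p, n):
    """Probability that a coordinate active with probability p is absent from n i.i.d. samples."""
    return (1.0 - p) ** n

def expected_unseen_active(probs, n):
    """Expected number of coordinates active in a test input that never appeared in training."""
    return sum(p * prob_never_seen(p, n) for p in probs)

# Hypothetical power-law activation frequencies p_j = j^(-1.5) over 10,000 coordinates.
probs = [j ** -1.5 for j in range(1, 10001)]
for n in (100, 1000, 10000):
    print(n, round(expected_unseen_active(probs, n), 4))
```

The quantity shrinks as the training set grows, but under a heavy tail of rare coordinates it does so slowly, which matches the intuition that more data keeps paying off in this regime.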

2605.23583 2026-05-25 cs.RO cs.LG 版本更新

How Many Training Samples Are Needed for the Inverse Kinematics Solutions by Artificial Neural Networks

人工神经网络求解逆运动学需要多少训练样本

Dong-Won Lim

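Affiliations: The University of Suwon

AI summary: This paper studies the minimum number of training samples needed to solve robot inverse kinematics with artificial neural networks. Feedforward networks are trained on datasets of different sizes and assessed for accuracy, convergence, and generalization; beyond 125 samples, model efficiency no longer improves appreciably. The study offers practical guidance for sizing neural-network training data in real robotic applications, balancing computational cost against model accuracy.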

Comments 14 pages, 5 figures

AI中文摘要

逆运动学在机器人运动规划与控制中扮演关键角色。机器人操作臂的逆运动学求解可通过传统方法如几何法、代数法或雅可比法实现，但这些方法存在缺陷。人工神经网络因其泛化能力和计算效率，已成为近似逆运动学解的有前途的替代方案。该方法基本上只训练记录用于求解逆运动学问题的少量末端执行器样本。然而，一个基本问题仍然存在：多少训练样本足以实现可靠且准确的逆运动学预测？本研究探讨了训练数据集大小与基于ANN的逆运动学求解器精度之间的数学框架。使用关节型机器人操作臂，我们生成不同数量的关节位置对来训练前馈神经网络，并评估其精度、收敛性和泛化能力。结果表明，超过125个训练样本并未有助于提高模型效率，该效率通过采样大小上的近似精度可比度量来衡量，为数据效率提供了宝贵见解。这项工作为优化ANN解决方案的数据规模提供了实用指导，平衡了实际机器人应用中的计算成本和模型精度。

英文摘要

Inverse Kinematics (IK) plays a critical role in robotic motion planning and control. The IK of a robot manipulator can be solved by conventional geometric, algebraic, or Jacobian methods, each of which has drawbacks. Artificial Neural Networks (ANNs) have become a promising alternative for approximating IK solutions due to their generalization ability and computational efficiency. This approach trains on only a few recorded end-effector samples to solve the IK problem. However, a fundamental question remains: how many training samples are sufficient to achieve reliable and accurate IK predictions? This study investigates a mathematical framework relating the size of the training dataset to the accuracy of ANN-based IK solvers. Using an articulated robotic manipulator, we generate varying amounts of joint-position pairs to train feedforward neural networks and assess their accuracy, convergence, and generalization capability. The results reveal that using more than 125 training samples did not improve model efficiency, a comparative measure of approximation accuracy relative to sample size, offering valuable insight into data efficiency. This work provides practical guidance for optimizing the data sizing of ANN solutions, balancing computational cost and model accuracy for real-world robotic applications.

2605.23574 2026-05-25 cs.LG cs.SE 版本更新

Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

推动你的智能体：在长周期LLM智能体中测量和强制实现定量目标持续性

Yuandao Cai, Yuzhang Zhu, Liyou Gao, Wensheng Tang, Shengchao Qin

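Affiliations: Independent Researcher; Xidian University

AI summary: This paper studies Quantitative Goal Persistence (QGP) in long-horizon language agents: whether an agent keeps working until an external verifier confirms that enough valid items have been completed. The authors introduce PushBench, a benchmark that directly measures repeated work, duplicate submissions, and false completion. Experiments show that state-tracking and work-unit-tracking controllers reduce duplicate submissions and raise completion rates, whereas current frontier agents succeed far less often as the requested count grows, which shows that quantitative goals place a distinct reliability demand on agents.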

AI中文摘要

长周期语言智能体可能做出许多看似合理的局部工具调用，但未能持续直到请求的数量实际完成。我们将这一差距研究为定量目标持续性（QGP）：即智能体是否持续工作，直到外部验证器确认足够数量的不同有效项。PushBench将其转化为一个用于仓库-工件收集和验证器支持的工作单元的基准，因此重复工作、重复提交、虚假完成和进度漂移被直接测量，而不是隐藏在最终成功标志之后。在匹配的控制器比较中，状态追踪检索控制器达到69-78%的成功率，同时消除了重复提交；而积压追踪工作单元控制器在标准和完成门控控制器无法完成任何任务实例的设置中达到25-50%的成功率。使用Claude Code（Sonnet 4.6）和Codex CLI（gpt-5.4）的黑盒前沿智能体评估解决了许多50个工件的任务，但在100个工件时每条件仅剩3/9的成功率。结果表明，定量目标对不同于局部任务能力的可靠性要求提出了挑战：智能体必须维护已验证的进度，并仅在请求的工作完成时停止。

英文摘要

Long-horizon language agents can make many plausible local tool calls yet fail to persist until a requested count is actually complete. We study this gap as Quantitative Goal Persistence (QGP): whether an agent keeps working until an external verifier confirms enough distinct valid items. PushBench turns this into a benchmark for repository-artifact collection and verifier-backed work units, so repeated work, duplicate submissions, false completion, and progress drift are measured directly rather than hidden behind a final success flag. In matched controller comparisons, a state-tracking retrieval controller reaches 69-78% success while eliminating duplicate submissions, and a backlog-tracking work-unit controller reaches 25-50% success in settings where standard and completion-gated controllers complete no task instances. Black-box frontier-agent evaluations with Claude Code (Sonnet 4.6) and Codex CLI (gpt-5.4) solve many 50-artifact tasks but drop to 3 out of 9 successes per condition at 100 artifacts. The results show that quantitative goals stress a different reliability requirement from local task competence: agents must maintain verified progress and stop only when the requested work is complete.
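The stopping rule described in this abstract, keep working until an external verifier has confirmed enough distinct valid items, can be sketched as a small control loop. This is only an illustration of the idea, not PushBench's controller: `propose`, `verify`, and `run_until_verified` are hypothetical names, and the toy proposer and verifier below stand in for an agent and a real checker.

```python
def run_until_verified(propose, verify, target, max_steps=1000):
    """Keep proposing items until `target` distinct items pass the external verifier."""
    verified = set()    # progress state: distinct items the verifier accepted
    submitted = set()   # everything already sent, used to avoid duplicate submissions
    steps = 0
    while len(verified) < target and steps < max_steps:
        steps += 1
        item = propose(steps)
        if item in submitted:
            continue    # skip duplicates instead of resubmitting them
        submitted.add(item)
        if verify(item):
            verified.add(item)
    # "done" is reported from verified progress only, never from the agent's own belief.
    return {"done": len(verified) >= target, "verified": sorted(verified), "steps": steps}

# Toy stand-ins: the proposer cycles through 0..6, the verifier accepts even numbers.
result = run_until_verified(lambda i: i % 7, lambda x: x % 2 == 0, target=3)
print(result)
```

Keeping the verified set and the submitted set separate is the design point: the first decides when to stop, the second prevents repeated work from being counted or resubmitted.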

2605.23572 2026-05-25 cs.IR cs.AI cs.LG 版本更新

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

HARNESS-LM: 一种在赞助搜索中利用小语言模型的三阶段训练方案

Vipul Gupta, Shikhar Mohan, Lakshya Kumar, Pranjal Chitale, Nikit Begwani, Amit Singh, Manik Varma

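Affiliations: Microsoft AI

AI summary: In sponsored search, keeping retrieval quality high while reducing latency is a central challenge. This paper proposes HARNESS-LM (HLM), a three-phase training framework that transfers the capabilities of large-scale retrievers into smaller, cheaper models. Using knowledge distillation followed by contrastive refinement, HLM retains high retrieval precision while substantially improving inference efficiency, and online tests on Bing Ads confirm its effectiveness with uplifts in revenue, impressions, and clicks.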

Comments 9 pages, 3 figures, 10 tables

AI中文摘要

在赞助搜索的竞争格局中，平衡检索质量与生产延迟是一个关键挑战。尽管基于小语言模型（SLM）的大型检索模型（如Qwen3-Embedding-4B/8B）在公共基准上设定了强上限，但其在高吞吐、延迟敏感环境中的部署仍不切实际。本文提出HARNESS-LM（HLM），一个三阶段训练框架，用于将大规模检索器的能力迁移至紧凑、成本高效的模型。该方法包括：（1）通过微调十亿参数规模的SLM训练高性能参考（“教师”）检索器；（2）通过L2目标对齐查询表示，将知识蒸馏至低于600M参数的学生编码器；（3）应用最终对比精炼阶段以优化学生的检索性能。我们还对关键设计选择进行了全面的实证研究，包括对齐目标、嵌入维度、模型规模、架构和优化策略，以确定在生产环境中最为有效的配置。在真实世界的Bing Ads评估基准上，HLM在多种设置下恢复了参考检索器超过98%的精度，同时在NVIDIA A100 GPU上实现了高达27倍的在线查询编码器延迟降低和20倍的吞吐量提升。在Bing Ads上的在线A/B测试进一步显示，与当前生产中运行的检索器集成（部署190M参数模型）相比，收入提升+1%，展示量提升+0.6%，点击量提升+0.4%，清晰突显了HLM方案在真实世界赞助搜索场景中的实际效果。

英文摘要

In the competitive landscape of sponsored search, balancing retrieval quality with production latency is a critical challenge. While large retrieval models based on Small Language Models (SLMs) such as Qwen3-Embedding-4B/8B set strong upper bounds on public benchmarks, their deployment in high-throughput, latency-sensitive environments remains impractical. In this paper, we present HARNESS-LM (HLM), a three-phase training framework for transferring the capabilities of large-scale retrievers into compact, cost-efficient models. The approach comprises: (1) training a high-performance reference ("teacher") retriever by fine-tuning a billion-parameter-scale SLM; (2) aligning query representations via an L2 objective to distill knowledge into a sub-600M parameter student encoder; and (3) applying a final contrastive refinement stage to optimize the student for retrieval performance. We also present a comprehensive empirical study of key design choices, including alignment objectives, embedding dimensionality, model scale, architecture, and optimization strategies, to identify configurations that are most effective in production settings. On a real-world Bing Ads evaluation benchmark, HLM recovers over 98% of the reference retriever's precision across multiple settings, while delivering up to 27x lower online query-encoder latency and 20x higher throughput on NVIDIA A100 GPUs. Online A/B testing on Bing Ads further shows a +1% Revenue, +0.6% Impression, and +0.4% Click uplift over the current ensemble of retrievers running in production with the deployed 190M parameter model, clearly highlighting the practical efficacy of the HLM recipe in a real-world sponsored search setting.
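Phases (2) and (3) of the recipe above can be pictured with two small loss functions. The sketch below is an assumption-laden illustration: it takes student and teacher embeddings of equal dimension, uses a mean squared L2 distance for the alignment objective, and uses an in-batch InfoNCE loss for the contrastive stage. The abstract names an L2 objective and a contrastive refinement stage but does not give their exact form, so the specific formulas, the temperature value, and the function names are hypothetical.

```python
import math

def l2_alignment_loss(student, teacher):
    """Mean squared L2 distance between paired student and teacher query embeddings."""
    total = 0.0
    for s, t in zip(student, teacher):
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(student)

def in_batch_contrastive_loss(queries, docs, temperature=0.05):
    """InfoNCE with in-batch negatives: query i should score highest against doc i."""
    loss = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, d)) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]
    return loss / len(queries)

teacher_q = [[1.0, 0.0], [0.0, 1.0]]
student_q = [[0.9, 0.1], [0.1, 0.9]]
docs = [[1.0, 0.0], [0.0, 1.0]]
print("alignment loss:", l2_alignment_loss(student_q, teacher_q))
print("contrastive loss:", in_batch_contrastive_loss(student_q, docs))
```

The alignment loss only needs teacher query embeddings, which is why it can run before any relevance labels are used; the contrastive stage then tunes the student directly for ranking.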

2605.23565 2026-05-25 cs.LG cs.AI 版本更新

Understanding Goal Generalisation in Sequential Reinforcement Learning

理解序贯强化学习中的目标泛化

Jason Ross Brown, Edward James Young

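Affiliations: University of Cambridge; Geodesic Research

AI summary: This study examines how sequentially trained reinforcement learning agents generalise their goals to new environments and how training history shapes their behaviour. Across more than 100 sequential training pipelines evaluated in over 250 out-of-distribution environments, salient features and goals learned early in training strongly influence later generalisation. The authors propose a latent policy gradient method that predicts the out-of-distribution behaviour a training pipeline is likely to induce; it is accurate, generalises to unseen pipeline types, and is interpretable, laying the groundwork for a developmental account of goal generalisation.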

详情

AI中文摘要

强化学习代理在其训练分布之外常常表现出非预期的目标导向行为，但我们目前缺乏基于训练历史对这类代理如何泛化到新环境的原理性理解。我们针对在单个或多个任务上序贯训练的代理解决了这一空白。我们研究了超过100个序贯训练流程，评估了超过250个分布外环境中的行为。我们发现显著特征驱动泛化，并且训练早期习得的目标会持续存在并影响后期习得的目标。为了解释这些现象，我们引入了潜在策略梯度方法，该方法预测训练流程可能诱导的分布外行为。我们的方法根据潜在变量如何映射到行为的简单模型，模拟训练过程中低维潜在变量的演化，以实现在训练目标上获得高奖励。它实现了强预测准确性，泛化到未见过的训练流程类型，并且是可解释的。我们的发现表明，虽然分布外RL代理行为依赖于整个训练流程，但这种依赖具有我们可以捕捉的底层结构，为从发展角度理解目标泛化奠定了基础。

VACE：学习几何结构化表示用于时间序列异常检测

Alberto D. Cencillo, Leonardo Concepción, Isaac Triguero, Julián Luengo

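Affiliations: Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI); Department of Computer Science and Artificial Intelligence (DECSAI), University of Granada

AI summary: This paper proposes VACE, a self-supervised method for anomaly detection in multivariate time series. Through velocity-aligned channel embeddings, VACE learns a compact, directionally coherent representation of normality, which makes anomalies easier to identify. The method needs no negative samples or synthetic anomalies: the encoder is trained with a velocity-consistency objective so that normal trajectories stay locally smooth and aligned in the embedding space. Experiments show that VACE outperforms more complex methods on the TSB-AD-M benchmark.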

Comments 16 pages, 5 figures

AI中文摘要

多变量时间序列中的异常检测是广泛实际应用中的关键任务，其中异常行为罕见、标签不可用且漏检成本高昂。核心挑战在于学习足够精确的正常性表征以标记偏差。表示自监督学习（通常通过对比方法）通过将时间补丁嵌入到潜在空间来解决这一问题，其中正常性占据一个定义明确的区域，异常通过几何偏差检测。然而，对比方法通过配对采样启发式间接塑造该空间，无法对基于距离评分所需的几何结构进行显式控制。这意味着正常表示的紧凑程度以及距离是否具有方向意义。我们提出VACE（速度对齐通道嵌入），一种自监督异常检测方法，将正常性表示为嵌入空间中紧凑且方向一致的区域。为此，VACE通过速度一致性目标训练通道感知编码器，无需负样本和合成异常，使得正常轨迹局部平滑且对齐。在测试时，马氏距离位置得分和速度库方向得分相乘，标记同时偏离分布和动态异常的点。尽管方法简单，VACE在严格评估下于TSB-AD-M上实现了最先进性能，显著优于使用更大预算训练的复杂方法。

英文摘要

Anomaly detection in multivariate time series is a critical task across a wide range of real-world applications, where abnormal behaviour is rare, labels are unavailable, and the cost of a miss is high. The central challenge is learning a characterisation of normality precise enough to flag deviations. Representation self-supervised learning, typically through contrastive approaches, addresses this by embedding temporal patches into a latent space where normality occupies a well-defined region, with anomalies detected by geometric deviation. However, contrastive approaches shape this space indirectly through pair-sampling heuristics, providing no explicit control over the geometric structure that distance-based scoring requires. This means how tightly normal representations are grouped, and whether distances are directionally meaningful. We present VACE (Velocity-Aligned Channel Embeddings), a self-supervised anomaly detection method that represents normality as a compact, directionally coherent region in the embedding space. To this end, VACE trains a channel-aware encoder through a velocity-consistency objective, with no negatives and no synthetic anomalies, so that normal trajectories are locally smooth and aligned. At test time, a Mahalanobis positional score and a velocity-bank directional score are combined multiplicatively, flagging points that are simultaneously off-distribution and dynamically atypical. Despite its simplicity, VACE achieves state-of-the-art performance on TSB-AD-M under rigorous evaluation, significantly outperforming more complex methods trained on substantially larger budgets.

2605.23476 2026-05-25 cs.LG cond-mat.dis-nn cond-mat.mtrl-sci math.OC 版本更新

Non-normal spectral signatures of instability in neural network training dynamics

神经网络训练动态中不稳定性的非正态谱特征

Souvik Ghosh

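Affiliations: Department of Physics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan

AI summary: This paper studies common instabilities in deep network training, such as loss spikes, oscillatory convergence, and gradient pathologies, and explains them through non-normal operator theory. The linearized update operators of commonly used optimizers are generically non-normal, with the non-normality arising from the interaction between the Hessian and the adaptive preconditioner or the momentum structure. Using non-normal stability theory, the author derives a conservative pseudospectral precursor bound and shows that the condition number κ(V) can serve as an early-warning indicator of transient amplification during training, offering a diagnostic tool and theoretical framework for the stability of adaptive optimizers.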

Comments 9 pages, 3 figurea

详情

AI中文摘要

深度网络中的训练不稳定性——损失尖峰、振荡收敛和梯度病态——在经验上普遍存在，但缺乏严格的算子理论解释。我们证明，实际使用的优化器的线性化更新算子通常是非正态的：对于Adam，非正态性由Hessian与对角自适应预条件子之间的换位子[H, M]控制；而对于带动量的SGD，它源于更新映射的增广状态空间结构。将非正态稳定性理论应用于这些算子，我们推导出一个保守的伪谱前兆界，其中κ(V)作为瞬态放大的早期预警指标，即使谱半径仍小于1；并且我们建立了更新算子的异常点作为该框架中κ(V) → ∞的极限情况。在两层网络上的数值实验证实，谱半径ρ(J)无法区分稳定和不稳定的训练阶段，而κ(V)能将它们分开约一个数量级，用非正态放大的连续严重性度量补充了经典的锐度准则。这些结果确立了非厄米算子理论作为神经网络优化稳定性中一个有用且未被充分探索的框架，为理解自适应优化稳定性提供了诊断语言和概念验证基准。

英文摘要

Training instabilities in deep networks (loss spikes, oscillatory convergence, and gradient pathologies) are empirically prevalent but lack a rigorous operator-theoretic explanation. We show that the linearized update operators for practically used optimizers are generically non-normal: for Adam, non-normality is controlled by the commutator [H, M] between the Hessian and the diagonal adaptive preconditioner, while for SGD with momentum it arises from the augmented state-space structure of the update map. Applying non-normal stability theory to these operators, we derive a conservative pseudospectral precursor bound in which κ(V) serves as an early-warning indicator of transient amplification even when the spectral radius remains below one, and we establish that exceptional points of the update operator appear as the κ(V) → ∞ limiting case of this framework. Numerical experiments on two-layer networks confirm that the spectral radius ρ(J) provides no separation between stable and unstable training phases while κ(V) separates them by approximately one order of magnitude, complementing the classical sharpness criterion with a continuous severity measure of non-normal amplification. These results establish non-Hermitian operator theory as a useful and underexplored framework for neural network optimization stability, offering a diagnostic language and proof-of-concept benchmark for understanding adaptive optimization stability.
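The claim that the spectral radius can stay below one while perturbations are transiently amplified is easy to check on a 2x2 toy operator. The sketch below takes κ(V) to be the 2-norm condition number of the eigenvector matrix of J, which is a standard reading and an assumption here since the abstract does not define V; the matrix entries are arbitrary illustrative values, not taken from any optimizer.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def frob(a):
    """Frobenius norm, used here as a simple proxy for the size of J^k."""
    return math.sqrt(sum(x * x for row in a for x in row))

def cond2(v):
    """2-norm condition number of a 2x2 matrix via the eigenvalues of V^T V."""
    g = matmul2([[v[0][0], v[1][0]], [v[0][1], v[1][1]]], v)  # Gram matrix V^T V
    tr = g[0][0] + g[1][1]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return math.sqrt((tr + disc) / (tr - disc))

a, b, c = 0.9, 0.8, 5.0
J = [[a, c], [0.0, b]]          # upper triangular, strongly non-normal toy operator
rho = max(abs(a), abs(b))       # eigenvalues of a triangular matrix are its diagonal

# Unit eigenvectors: (1, 0) for eigenvalue a, and (c / (b - a), 1) normalised for b.
n = math.hypot(c / (b - a), 1.0)
V = [[1.0, (c / (b - a)) / n], [0.0, 1.0 / n]]
kappa = cond2(V)

norms, P = [], [[1.0, 0.0], [0.0, 1.0]]
for _ in range(200):
    P = matmul2(P, J)
    norms.append(frob(P))       # size of J^k for k = 1..200

print("spectral radius:", rho)
print("kappa(V):", round(kappa, 2))
print("peak of ||J^k||:", round(max(norms), 2), "final:", norms[-1])
```

The iteration is asymptotically stable, yet the norm of J^k grows well above one before it decays, and the nearly parallel eigenvectors show up as a large κ(V).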

2605.23471 2026-05-25 cs.LG cs.AI 版本更新

CBANet: A Compact Attention-Based CNN-BiLSTM Network for Aggressive Driving Event Detection

CBANet：一种用于激进驾驶事件检测的紧凑型注意力CNN-BiLSTM网络

Hanadi Alhamdan, Ghadah Alosaimi, Amir Atapour-Abarghouei, Farshad Arvin

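Affiliations: Department of Computer Science, Princess Nourah bint Abdulrahman University; Department of Computer Science, Durham University; Department of Computer Science, Imam Mohammad Ibn Saud Islamic University

AI summary: This paper proposes CBANet, a compact attention-based CNN-BiLSTM deep learning framework for detecting aggressive driving events. It constructs engineered dynamic features that capture steering, acceleration, and braking behaviour, and uses a stable training strategy that combines SMOTE-based oversampling with a class-weighted loss to cope with the extreme rarity of aggressive events in naturalistic driving data. Experiments show significant gains over standard deep learning baselines in minority-class recall and safety-critical F-score while keeping computation practical.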

Comments 8 pages, 4 figures, 4 tables. Submitted to IJCNN/WCCI 2026. CBANet: A compact attention-based CNN-BiLSTM framework for aggressive driving event detection using multivariate vehicle dynamics signals. Code available at https://github.com/halhamdan/CBANet

AI中文摘要

激进驾驶是交通事故的主要原因，对道路安全构成严重威胁。尽管深度学习方法在从车辆传感器数据检测危险驾驶行为方面显示出有希望的结果，但它们在现实条件下的性能通常受到严重数据不平衡、驾驶员间巨大差异以及缺乏物理可解释的车辆动力学表示的限制。在本文中，我们提出了一种增强的深度学习框架，用于使用多变量车辆动力学信号进行激进驾驶检测。该方法不仅依赖原始测量，还构建了捕捉转向、加速和制动行为的工程动力学特征。为了解决自然驾驶数据中激进事件的极端稀少性，我们引入了一种稳定的训练策略，结合了基于SMOTE的受控过采样和类别加权损失公式，并评估了用于不平衡处理的焦点损失变体。此外，采用基于类别特定阈值校准的安全导向决策策略，以更好地反映现实应用中漏检和误报的不对称风险。该框架在新收集的自然驾驶数据集上进行了评估。大量实验表明，所提出的方法在保持实际计算效率的同时，在少数类召回率和安全关键F-score指标上始终优于标准深度学习基线。代码：https://github.com/halhamdan/CBANet

英文摘要

Aggressive driving is a major cause of traffic accidents and poses a serious threat to road safety. Although deep learning methods have shown promising results in detecting risky driving behaviours from vehicle sensor data, their performance in real-world conditions is often limited by severe data imbalance, large variability between drivers, and the lack of physically interpretable vehicle dynamics representations. In this paper, we propose an enhanced deep learning framework for aggressive driving detection using multivariate vehicle dynamics signals. Instead of relying solely on raw measurements, the proposed approach constructs engineered dynamic features that capture steering, acceleration, and braking behaviour. To address the extreme rarity of aggressive events in naturalistic driving data, we introduce a stable training strategy that combines controlled SMOTE-based oversampling with a class-weighted loss formulation, and we evaluate focal loss variants for imbalance handling. Furthermore, a safety-oriented decision strategy based on class-specific threshold calibration is adopted to better reflect the asymmetric risks of missed detections and false alarms in real-world applications. The proposed framework is evaluated on a newly collected naturalistic driving dataset. Extensive experiments show that the proposed method consistently outperforms standard deep learning baselines with significant improvements in minority-class recall and safety-critical F-score metrics while maintaining practical computational efficiency. Code: https://github.com/halhamdan/CBANet
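Two of the imbalance-handling ingredients mentioned above, class weighting and class-specific threshold calibration, can be sketched without any framework. The code below is a generic illustration rather than CBANet's implementation: the inverse-frequency weighting, the choice of F-beta with beta = 2 as the "safety-critical" score (it favours recall, so missed detections cost more than false alarms), and the toy scores are all assumptions.

```python
def inverse_frequency_weights(labels):
    """Class weights proportional to 1 / class frequency (a common heuristic)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}

def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def calibrate_threshold(scores, labels, beta=2.0):
    """Pick the positive-class score threshold that maximises F-beta on held-out data."""
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        fn = sum(s < t and y == 1 for s, y in zip(scores, labels))
        f = f_beta(tp, fp, fn, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy validation set: 8 normal windows (label 0) and 2 aggressive windows (label 1).
labels = [0] * 8 + [1] * 2
scores = [0.1, 0.2, 0.3, 0.4, 0.2, 0.1, 0.3, 0.6, 0.5, 0.9]
print(inverse_frequency_weights(labels))
print(calibrate_threshold(scores, labels))
```

Calibrating the threshold on a validation split, rather than using 0.5 by default, is what lets the decision rule reflect the asymmetric cost of missing an aggressive event.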

2605.23470 2026-05-25 cs.LG cs.AI cs.CE 版本更新

Learning Individual Dynamics from Sparse Cross-Sectional Snapshots

从稀疏横截面快照中学习个体动力学

Christian Lagemann, Kai Lagemann, Steven L. Brunton, Sach Mukherjee

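Affiliations: Statistics and Machine Learning, German Center for Neurodegenerative Diseases (DZNE); MediaTek Research; Department of Mechanical Engineering & AI Institute in Dynamic Systems, University of Washington, Seattle; DZNE & University of Bonn, Bonn, Germany and University of Cambridge, Cambridge, United Kingdom

AI summary: This work aims to learn individual dynamics from sparse cross-sectional snapshots, a setting in which existing methods struggle to infer individual continuous-time trajectories. It proposes CADENCE, a probabilistic framework that recovers individual trajectories from isolated snapshots by anchoring latent dynamics to static individual-level context. The method combines a score-based spatial encoder with a Soft Mixture-of-Experts router, provides identifiability guarantees for single-timepoint trajectory inference, and matches or exceeds state-of-the-art sequential models across several benchmarks.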

AI中文摘要

预测一个动力学单元如何随时间演化——例如个体如何衰老、流行病如何传播、物理系统如何退化——通常需要密集的纵向追踪。当只有极其稀疏或完全横截面的数据可用时，推断个体化的连续时间轨迹本质上是病态的。现有方法迫使严格妥协：序列模型（如潜在ODE）需要密集的纵向数据，而横截面方法（如最优传输、基于流匹配的）映射聚合群体，丢失了个体动力学。在本文中，我们证明这种二分法可以被打破。我们介绍CADENCE，一个原则性的概率框架，通过将潜在动力学锚定到静态的个体级上下文，从孤立快照中恢复连续的个体轨迹。我们为单时间点轨迹推断提供了新颖的可识别性保证。通过结合基于分数的空间编码器（双射概率流ODE）以消除微分同胚歧义，以及软混合专家（SMoE）路由器，我们证明个体动力学参数和路由函数是联合可识别的。在一系列涵盖物理系统到真实世界生物数据的基准测试中，CADENCE严格在具有上下文结构的极端稀疏快照上训练，其性能匹配或超过了在密集全轨迹数据上训练的最先进序列模型。

英文摘要

Predicting how a dynamical unit evolves over time - how an individual ages, an epidemic spreads, or a physical system degrades - typically requires dense longitudinal tracking. When only extremely sparse or entirely cross-sectional data is available, inferring individualized, continuous-time trajectories is fundamentally ill-posed. Existing methods force a strict compromise: sequence models (e.g. latent ODEs) require dense longitudinal data, while cross-sectional methods (e.g. optimal transport, flow matching-based) map aggregate populations, losing individual dynamics. In this paper, we demonstrate that this dichotomy can be broken. We introduce CADENCE, a principled probabilistic framework that recovers continuous individual trajectories from isolated snapshots by anchoring latent dynamics to static, individual-level contexts. We provide novel identifiability guarantees for single-timepoint trajectory inference. By combining a score-based spatial encoder (bijective Probability Flow ODE) to eliminate diffeomorphic ambiguities with a Soft Mixture-of-Experts (SMoE) router, we show that individual dynamical parameters and routing function are jointly identifiable. Across a suite of benchmarks spanning physical systems to real-world biological data, CADENCE, trained strictly on extremely sparse snapshots with context structure, matches or exceeds the performance of state-of-the-art sequential models trained on dense, full-trajectory data.

2605.23467 2026-05-25 cs.LG 版本更新

S$^3$GNN: Efficient Global Mixing and Local Message Passing for Long-Range Graph Learning

S$^3$GNN：用于长程图学习的高效全局混合与局部消息传递

Dai Shi, Luke Thompson, Linhan Luo, Lequan Lin, Andi Han, Junbin Gao, José Miguel Hernández Lobato

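Affiliations: Department of Engineering, University of Cambridge, Cambridge, UK; University of Sydney, Australia

AI summary: This paper addresses the information bottleneck that graph neural networks face when capturing long-range dependencies and proposes a method called S$^3$GNN. By introducing a lightweight global information-mixing mechanism, it mitigates oversquashing without relying on restrictive theoretical assumptions. Experiments across several domains show large error reductions with substantially fewer parameters.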

详情

AI中文摘要

消息传递神经网络（MPNN）在捕获长程依赖时常常遭受信息瓶颈，导致过挤压（OSQ）现象。除了空间连通性增强（例如，重连）外，最近的研究表明，谱滤波可以产生强大的长程学习结果，因为谱算子能够实现全局信息混合，从而缓解OSQ。这些方法通过稳定深层传播中的雅可比能量或在强理论假设下保证OSQ缓解来实现这一点。我们重新审视这些结论，并表明相关的雅可比敏感性下界在实践中通常难以实现。然后，我们提出S$^3$GNN，它通过以显著较低的计算复杂度轻量级地重新引入被忽略的组件来缓解OSQ，而不需要这些限制性假设，同时特征变换的标准稳定性约束在我们的新动态下仍然有效。跨不同领域（例如，长程基准、KGQA和基于网格的流体动力学）的大量实验表明，S$^3$GNN在参数减少多达50%的情况下实现了高达一个数量级的误差降低。我们的代码可在https://github.com/EEthanShi/S3-GNN.git找到。

英文摘要

Message-passing neural networks (MPNNs) often suffer from an information bottleneck when capturing long-range dependencies, leading to the oversquashing (OSQ) phenomenon. Alongside spatial connectivity enrichment (e.g., rewiring), recent studies have shown that spectral filtering can yield strong long-range learning outcomes, as spectral operators enable global information mixing that alleviates OSQ. These approaches achieve this either by stabilizing the Jacobian energies in deep propagation or by guaranteeing OSQ mitigation under strong theoretical assumptions. We revisit these conclusions and show that the associated Jacobian sensitivity lower bound is generally difficult to achieve in practice. We then propose S$^3$GNN, which mitigates OSQ without such restrictive assumptions by reintroducing the omitted components in a lightweight manner and at substantially lower computational complexity, while standard stability constraints on feature transformations remain effective under our new dynamics. Extensive experiments across diverse domains (e.g., long-range benchmarks, KGQA, and mesh-based fluid dynamics) demonstrate that S$^3$GNN achieves up to an order-of-magnitude error reduction with up to 50% fewer parameters. Our code can be found at https://github.com/EEthanShi/S3-GNN.git.

2605.23464 2026-05-25 cs.LG 版本更新

Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

不可提取协议模型：无需权重物化的协作训练与推理

Alexander Long, Chamin Hewa Koneputugodage, Thalaiyasingam Ajanthan, Yan Zuo, Gil Avraham, Violetta Shevchenko, Hadi Mohaghegh Dolatabadi, Sameera Ramasinghe

发表机构 * Pluralis Research（Pluralis研究）

AI总结本文研究了在去中心化环境中协作训练和推理大规模神经网络的问题，提出了一种名为“不可提取协议模型（UPMs）”的新框架。该方法通过在参与者之间定期注入时间变化的可逆变换，使得模型各部分在不同时间步上不兼容，从而防止权重被完整提取。实验表明，UPMs在保持模型性能的同时有效提升了安全性，并分析了其在训练和推理中的开销及对各类攻击的防御能力。

Comments Accepted at NeurIPS 2025. 34 pages, 6 figures (5 in main body, 1 in appendix). Alexander Long and Chamin Hewa Koneputugodage contributed equally

详情

Journal ref: Advances in Neural Information Processing Systems 38, pp. 18677-18713 (NeurIPS 2025)

AI中文摘要

我们考虑一个去中心化设置，其中参与者协作训练和提供大型神经网络服务，且每个参与者只处理模型的一个子集。在此设置中，我们探索了不可物化权重的可能性，即完整权重集永远不会对任何参与者可用。我们引入了不可提取协议模型（UPMs）：一种利用分片模型设置来确保参与者持有的模型分片（即子集）在不同时间步不兼容的训练和推理框架。UPMs 在参与者边界定期注入时变、随机、可逆的变换；保持整体网络功能，但使跨时间组装变得不连贯。在 Qwen-2.5-0.5B 和 Llama-3.2-1B 上，10,000 次变换使 FP32 困惑度保持不变（ΔPPL < 0.01；Jensen-Shannon 漂移 < 4×10^{-5}），并且我们展示了如何控制低精度数据类型的增长。每 30 秒应用一次变换在推理时增加 3% 的延迟、0.1% 的带宽和 10% 的 GPU 内存开销，而训练开销降至 1.6% 的时间和 < 1% 的内存。我们考虑了多种攻击，表明直接攻击的要求不切实际且易于防御，并且基于梯度的拼接分区微调消耗了从头训练所需 token 的 ≥ 60%。通过使模型能够协作训练但不可提取，UPMs 使得在社区驱动的去中心化训练中嵌入程序化激励机制变得可行。

英文摘要

We consider a decentralized setup in which the participants collaboratively train and serve a large neural network, and where each participant only processes a subset of the model. In this setup, we explore the possibility of unmaterializable weights, where a full weight set is never available to any one participant. We introduce Unextractable Protocol Models (UPMs): a training and inference framework that leverages the sharded model setup to ensure model shards (i.e., subsets) held by participants are incompatible at different time steps. UPMs periodically inject time-varying, random, invertible transforms at participant boundaries, preserving the overall network function yet rendering cross-time assemblies incoherent. On Qwen-2.5-0.5B and Llama-3.2-1B, 10,000 transforms leave FP32 perplexity unchanged ($Δ$PPL $< 0.01$; Jensen-Shannon drift $< 4 \times 10^{-5}$), and we show how to control growth for lower precision datatypes. Applying a transform every 30s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1.6% time and $< 1$% memory. We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes $\geq 60$% of the tokens required to train from scratch. By enabling models to be collaboratively trained yet not extracted, UPMs make it practical to embed programmatic incentive mechanisms in community-driven decentralized training.
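
The boundary-transform idea can be illustrated with a toy sketch. It assumes a purely linear two-shard network (shard A holds `W1`, shard B holds `W2`) and small hand-picked integer matrices with known inverses; the names `transform`, `T_a`, and `T_b` are hypothetical and are not taken from the UPM implementation, which targets real nonlinear models and random transforms.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transform(w1, w2, t, t_inv):
    """Fold an invertible boundary transform into both shards: (T W1, W2 T^-1)."""
    return matmul(t, w1), matmul(w2, t_inv)

# Toy network y = W2 @ (W1 @ x); the participant boundary sits between W1 and W2.
W1 = [[1, 2], [3, 4]]
W2 = [[0, 1], [1, 1]]
x = [[5], [7]]

# Two illustrative transforms (determinant 1, so the inverses are integer matrices).
T_a, T_a_inv = [[2, 1], [1, 1]], [[1, -1], [-1, 2]]
T_b, T_b_inv = [[1, 3], [0, 1]], [[1, -3], [0, 1]]

W1_t1, W2_t1 = transform(W1, W2, T_a, T_a_inv)        # shards at time step 1
W1_t2, W2_t2 = transform(W1_t1, W2_t1, T_b, T_b_inv)  # shards at time step 2

reference = matmul(W2, matmul(W1, x))
same_time = matmul(W2_t2, matmul(W1_t2, x))   # both shards from step 2
cross_time = matmul(W2_t2, matmul(W1_t1, x))  # shard A from step 1, shard B from step 2

print(reference, same_time, cross_time)
```

Shards taken from the same time step reproduce the original output, while stitching shards from different time steps does not, which is the incompatibility property the abstract describes.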

2605.23449 2026-05-25 cs.LG cs.CV math.AG 版本更新

Commutator-Induced Uncertainty in VAEs

VAE中的换位子引发的不确定性

Tahereh Dehdarirad, Michael Felsberg, Gabriel Eilertsen, Ziliang Xiong

发表机构 * Computer Vision and Learning Systems (CVL), Linköping University, Sweden（计算机视觉与学习系统（CVL），林雪平大学，瑞典）； Department of Science and Technology, Linköping University, Sweden（科学与技术系，林雪平大学，瑞典）

AI总结变分自编码器（VAEs）在学习非交换结构时常常面临不确定性问题。本文提出了一种基于李群的VAE框架，通过结合几何与代数视角分析不确定性，将离散生成因素与连续几何变换分离。该方法通过诊断代数非交换性并调整解码器对非交换结构的敏感度，提升了重构质量与潜在空间结构的一致性，在多个基准数据集上表现出优越的重构与潜在空间遍历性能。

详情

AI中文摘要

变分自编码器（VAE）通常难以表示学习到的潜在空间中的非交换结构。对称感知的VAE通常通过代数正则化强制交换性来解决这个问题，这适用于交换变换群，但当非交换性是数据内在特性时会抑制有意义的非交换结构。我们认为，非交换性应被明确诊断并反映在重建行为中。我们引入了一个李群VAE框架，该框架结合了几何和代数视角下的不确定性，同时将离散生成因子与连续几何变换分开。在第一阶段，模型在没有结构约束的情况下进行训练，同时通过有限Baker-Campbell-Hausdorff偏差测量代数非交换性，并通过重建顺序交换测试测量解码器顺序敏感性。这些诊断揭示了在无约束训练下潜在非交换性与重建行为之间的尺度不匹配。在第二阶段，我们引入了一个具有数据驱动校准常数的变形稳定性约束，使解码器敏感性与代数非交换性对齐。我们在dSprites、3DShapes、3DCars和CelebA上评估了该框架，并与通用和对称感知基线（包括beta-VAE、CLG-VAE和CFASL）进行了比较。在合成基准上，该方法提高了重建质量，并产生了与潜在非交换结构更一致的解码器行为。定性分析显示了更清晰的顺序依赖潜在组合和更稳定的重建。在CelebA上，该模型比CFASL产生了更忠实的重建和因子特定的潜在遍历，同时在学习的潜在方向之间也表现出有意义的顺序依赖交互。

英文摘要

Variational autoencoders (VAEs) often struggle to represent non-commutative structure in learned latent spaces. Symmetry-aware VAEs commonly address this issue by enforcing commutativity through algebraic regularization, which is appropriate for commutative transformation groups but can suppress meaningful non-commutative structure when it is intrinsic to the data. We argue that non-commutativity should instead be explicitly diagnosed and reflected in reconstruction behavior. We introduce a Lie Group VAE framework that combines geometric and algebraic perspectives on uncertainty while separating discrete generative factors from continuous geometric transformations. In a first phase, the model is trained without structural constraints while algebraic non-commutativity is measured through finite Baker-Campbell-Hausdorff deviations and decoder order sensitivity is measured through reconstruction order-swap tests. These diagnostics reveal a scale mismatch between latent non-commutativity and reconstruction behavior under unconstrained training. In a second phase, we introduce a deformation-stability constraint with a data-driven calibration constant that aligns decoder sensitivity with algebraic non-commutativity. We evaluate the framework on dSprites, 3DShapes, 3DCars, and CelebA against generic and symmetry-aware baselines, including beta-VAE, CLG-VAE, and CFASL. Across synthetic benchmarks, the method improves reconstruction quality and yields decoder-level behavior more consistent with latent non-commutative structure. Qualitative analyses show clearer order-dependent latent compositions and more stable reconstructions. On CelebA, the model yields more faithful reconstructions and factor-specific latent traversals than CFASL, while also exhibiting meaningful order-dependent interactions between learned latent directions.
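
As a small worked example of algebraic non-commutativity: the leading correction in the Baker-Campbell-Hausdorff series is $\frac{1}{2}[A, B]$, so the commutator is the simplest indicator of order dependence. The sketch below uses the standard so(3) rotation generators and two commuting diagonal generators; it is an illustration under these assumptions and does not reproduce the paper's finite BCH deviation measure, whose exact form the abstract does not specify.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Lie bracket [A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def frob_sq(m):
    """Squared Frobenius norm, used here as a scalar non-commutativity score."""
    return sum(v * v for row in m for v in row)

# Standard generators of 3D rotations (non-commutative).
LX = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
LY = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
LZ = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

# Two diagonal generators (commutative), e.g. independent axis scalings.
D1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
D2 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

rot_bracket = commutator(LX, LY)
diag_bracket = commutator(D1, D2)

# Leading BCH correction for the rotation pair: 0.5 * [LX, LY].
bch_second_order = [[0.5 * v for v in row] for row in rot_bracket]

print(frob_sq(rot_bracket), frob_sq(diag_bracket))
```

A nonzero score for the rotation pair and a zero score for the diagonal pair mirrors the distinction the abstract draws between intrinsically non-commutative and commutative transformation groups.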

2605.23446 2026-05-25 cs.LG math.CO 版本更新

Weisfeiler-Leman Is Incomplete on Simple Spectrum Graphs, so Canonicalize Them

Weisfeiler-Leman 在简单谱图上是不完备的，因此对它们进行规范化

Snir Hordan, Nadav Dym, Tim Seppelt

发表机构 * IT University of Copenhagen（哥本哈根IT大学）

AI总结该研究探讨了具有简单谱图的图同构问题，指出对于任意自然数 $k$，$k$-Weisfeiler-Leman 测试无法区分所有非同构的简单谱图，从而揭示了现有图神经网络在该类图上的局限性。为解决这一问题，研究提出了 PRiSM 方法，这是首个能够完全对简单谱图进行正则化分解的算法，填补了该领域的空白。PRiSM 不仅保证了表达能力的完备性，还与深度集合或 Transformer 结合后实现了对简单谱图的通用逼近能力，为图的表示学习提供了新的理论支持和实用方法。

详情

AI中文摘要

具有简单谱的图允许三次时间同构测试，然而我们证明对于每个自然数 $k$，$k$-Weisfeiler-Leman ($k$-WL) 测试无法区分所有非同构的简单谱图。由于 WL 层次结构限制了广泛使用的图神经网络 (GNN) 的区分能力，这种不完备性适用于所有此类 GNN，从而排除了每个 $k$-WL 对齐的 GNN 家族的完备性。为了弥补这一差距，我们引入了 PRiSM (分区、细化、求解、匹配)，这是第一个可证明完备的简单谱特征分解规范化方法。PRiSM 获得了先前规范化方法显然缺乏的完备性保证，并解决了在简单谱图上实现完全表达性的开放问题。当与 DeepSets 或 Transformer 组合时，PRiSM 在简单谱图上实现了通用逼近，证明了使用规范化拉普拉斯位置编码的合理性。实验上，PRiSM 在图回归、分类和表达性方面与现有谱规范化方法性能相当或更优。

英文摘要

Graphs with a simple spectrum admit cubic-time isomorphism testing, yet we prove that for every natural number $k$, the $k$-Weisfeiler-Leman ($k$-WL) test cannot distinguish all non-isomorphic graphs with a simple spectrum. As the WL hierarchy upper-bounds the distinguishing power of widely-used Graph Neural Networks (GNNs), this incompleteness applies to all such GNNs, ruling out completeness for every $k$-WL-aligned GNN family. To close this gap, we introduce PRiSM (Partition, Refine, Solve, Match), the first provably complete canonicalization of simple-spectrum eigendecompositions. PRiSM obtains the completeness guarantee that prior canonicalizations provably lack, and resolves the open problem of achieving complete expressivity on simple-spectrum graphs. When composed with DeepSets or a Transformer, PRiSM achieves universal approximation on simple-spectrum graphs, justifying the use of canonicalized Laplacian positional encodings. Empirically, PRiSM performs comparably to or outperforms existing spectral canonicalizations on graph regression, classification, and expressivity
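
To show what "cannot distinguish" means at the lowest level of the hierarchy, the sketch below runs 1-WL colour refinement on a classic pair: a hexagon and two disjoint triangles. This pair is only an illustration of the mechanism. It is not a simple-spectrum example (the 6-cycle has repeated eigenvalues), and it is not the construction used in the paper. The helper names are hypothetical.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Colour histogram after `rounds` steps of 1-WL colour refinement."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # New colour = (own colour, sorted multiset of neighbour colours).
        # Nested tuples keep colours comparable across different graphs.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

def cycle(nodes):
    """Adjacency lists of a cycle through `nodes` in the given order."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}

hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

same = wl_histogram(hexagon) == wl_histogram(two_triangles)
different = wl_histogram(hexagon) != wl_histogram(path)
print(same, different)
```

The hexagon and the two triangles are non-isomorphic yet receive identical colour histograms, because every vertex has degree 2; the path is separated after one round.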

2605.23434 2026-05-25 cs.LG 版本更新

Onsager-Machlup Posterior Transport for Deep Gaussian Processes

深度高斯过程的Onsager-Machlup后验传输

Jian Xu, Delu Zeng, John Paisley, Qibin Zhao

发表机构 * RIKEN iTHEMS（日本理化学研究院iTHEMS研究中心）； RIKEN AIP（日本理化学研究院AIP研究中心）； South China University of Technology（华南理工大学）； Columbia University（哥伦比亚大学）

AI总结深度高斯过程（DGPs）中的近似推断在诱导变量上面临计算瓶颈。本文提出一种新的后验传输方法，通过确定性采样器将可计算的参考测度映射到与后验相关的诱导变量，并利用由Doob桥扩散过程导出的路径先验进行正则化。核心方法基于Song的概率流ODE和Onsager-Machlup作用量，实验证明该方法在多个UCI回归数据集上优于现有方法，尤其在大规模数据集上表现更优。

详情

AI中文摘要

对诱导变量的近似推断是深度高斯过程(DGP)的计算瓶颈。现有方法要么通过ELBO拟合显式密度$q_\phi(\bU)$(DSVI, IPVI, DDVI, DBVI)，要么通过MCMC采样(SGHMC)。我们则将DGP推断框架化为\emph{后验传输}：学习一个确定性采样器，将易处理的参考测度映射到后验相关的诱导变量，并通过从Doob桥接参考扩散导出的路径先验进行正则化。我们的实现\textbf{OM-Path}(正式名称为FBVI-bridge-Path)使用Song的概率流ODE应用于DBVI的Doob桥接前向SDE；参考漂移由桥边际系数闭式给出(无需分数匹配)，路径正则化器为\textbf{Onsager--Machlup作用量}。在训练时使用的有限$\epsilon$值下，目标函数是温度Doob桥路径后验的负对数未归一化密度，定理1通过Freidlin--Wentzell LDP将其识别为同一后验的小噪声MAP路径。在同一桥骨干上推导了两种严格的路径空间ELBO变体(FFJORD对数行列式；OM正则化CNF)作为消融实验。在七个UCI回归基准上与DBVI进行匹配种子的配对Wilcoxon检验，OM-Path在两个最大数据集上取得了统计显著的胜利(\textit{power}: $p=0.014$，NLL $\mathbf{0.012}$匹配DSVI基线$0.017$；\textit{protein}: $p=0.002$，RMSE $\mathbf{0.716}$对比$0.764$，NLL $\mathbf{1.086}$对比$1.149$)，在\textit{yacht}/\textit{qsar}上统计持平，在\textit{boston}/\textit{energy}/\textit{concrete}上因小噪声数据而输给DBVI。严格的ELBO变体在任何UCI指标上均未超过DBVI：在该机制下，降低路径目标方差比精确密度跟踪更重要。

英文摘要

Approximate inference over inducing variables is the central computational bottleneck of Deep Gaussian Processes (DGPs). Existing methods either fit an explicit density $q_ϕ(\bU)$ by an ELBO (DSVI, IPVI, DDVI, DBVI) or sample by MCMC (SGHMC). We instead frame DGP inference as \emph{posterior transport}: learn a deterministic sampler that maps a tractable reference measure to posterior-relevant inducing variables, regularised by a path prior derived from the Doob-bridged reference diffusion. Our realisation, \textbf{OM-Path} (formally FBVI-bridge-Path), uses Song's probability-flow ODE applied to DBVI's Doob-bridged forward SDE; the reference drift is closed-form from the bridge marginal coefficients (no score matching) and the path regulariser is the \textbf{Onsager--Machlup action}. At the finite-$ε$ value used at training, the objective is the negative log unnormalised density of a tempered Doob-bridge path posterior, and Theorem 1 identifies it with the same posterior's small-noise MAP path via the Freidlin--Wentzell LDP. Two strict path-space ELBO variants on the same bridge backbone (FFJORD log-det; OM-regularised CNF) are derived as ablations. Under a matched-seed paired Wilcoxon test against DBVI on seven UCI regression benchmarks, OM-Path delivers statistically significant wins on the two largest datasets (\textit{power}: $p\!=\!0.014$, NLL $\mathbf{0.012}$ matching the DSVI baseline of $0.017$; \textit{protein}: $p\!=\!0.002$, RMSE $\mathbf{0.716}$ vs.\ $0.764$, NLL $\mathbf{1.086}$ vs.\ $1.149$), statistical ties on \textit{yacht} / \textit{qsar}, and concedes \textit{boston} / \textit{energy} / \textit{concrete} to DBVI on small-$N$ noisy data. The strict-ELBO variants do not clear DBVI on any UCI metric: in this regime, reducing the variance of the path objective dominates exact-density tracking.

2605.23424 2026-05-25 cs.IT cs.LG math.IT 版本更新

Sparse In-Network Learning via Shortest-Path Backpropagation and Finite-Rate Gating

通过最短路径反向传播和有限速率门控的稀疏网内学习

Mohammad Reza Deylam Salehi

发表机构 * Nice, France（法国尼斯）

AI总结本文研究了网络内学习（INL）中的稀疏通信问题，提出了一种基于最短路径树和有限速率门控机制的稀疏网络内学习方法D-INL。该方法通过保留以融合节点为根的容量感知最短路径树，去除非树链接，同时将局部路由建模为有限速率的随机门控，以在稀疏性和预测信息之间取得平衡。实验表明，D-INL在保持分类精度的同时，将训练过程中的通信量减少了70.4%，并进一步通过有限速率正则化将潜在信息率降低了45.7%。

2605.23422 2026-05-25 cs.LG 版本更新

Hinge Regression Trees and HRT-Boost: Newton-Optimized Oblique Learning for Compact Tabular Models

铰链回归树与HRT-Boost：面向紧凑表格模型的牛顿优化斜学习

Hongyi Li, Jun Xu, Hong Yan

发表机构 * School of Intelligence Science and Engineering, Harbin Institute of Technology, Shenzhen（哈尔滨工业大学深圳校区智能科学与工程学院）； Shenzhen Key Lab for Advanced Motion Control and Modern Automation Equipments, Shenzhen（深圳先进运动控制及现代自动化装备重点实验室）； Department of Electrical Engineering, City University of Hong Kong, Kowloon（香港城市大学电子工程系）

AI总结本文提出了一种名为Hinge Regression Tree（HRT）的框架，通过将每个斜向分割转化为两个线性预测器的非线性最小二乘问题，从而提升斜向决策树的学习质量。HRT利用节点级别的优化过程，结合阻尼牛顿法进行求解，并在理论上证明其具有明确的逼近能力。基于HRT，作者进一步提出了HRT-Boost集成方法，将节点级的牛顿更新与逐阶段函数梯度下降相结合，在平方损失下实现了经验风险的逐步减少，实验表明该方法在多个基准数据集上表现优异，且能生成更为紧凑的模型。

Comments arXiv admin note: substantial text overlap with arXiv:2602.05371

详情

AI中文摘要

由于分割优化的离散性和非凸性，学习高质量的斜决策树仍然是一个重大挑战。我们提出了铰链回归树（HRT）框架，该框架将每个斜分割重构为两个线性预测器上的非线性最小二乘问题，其最大/最小包络诱导出类似ReLU的表示能力。我们证明了由此产生的节点级优化可以解释为阻尼牛顿法，并为其回溯线搜索变体建立了节点目标函数的单调递减性质。理论上，我们证明了HRT是一个通用逼近器，具有显式的$O(δ^2)$逼近速率。在此基础学习器之上，我们提出了HRT-Boost，一种数学上协同的集成扩展，将节点级牛顿更新与阶段式函数梯度下降相结合。我们证明了在平方损失下，这种集成构造具有阶段式经验风险降低保证。在合成和真实世界基准上的实证评估表明，HRT与现有的单树基线相比具有很强的竞争力，而HRT-Boost与强集成基线相比表现良好，并且通常产生更紧凑的模型。代码公开于https://github.com/Hongyi-Li-sz/HRT-Boost。

英文摘要

Learning high-quality oblique decision trees remains a significant challenge due to the discrete and non-convex nature of split optimization. We present the Hinge Regression Tree (HRT) framework, which reframes each oblique split as a nonlinear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like representation capacity. We show that the resulting node-level optimization can be interpreted as a damped Newton method, and we establish the monotonic decrease of the node objective for its backtracking line-search variant. We establish, theoretically, that HRT is a universal approximator with an explicit $O(δ^2)$ approximation rate. Building upon this base learner, we propose HRT-Boost, a mathematically synergistic ensemble extension that couples node-level Newton updates with stage-wise functional gradient descent. We show that this ensemble construction admits a stage-wise empirical risk reduction guarantee under the squared loss. Empirical evaluations on synthetic and real-world benchmarks show that HRT is highly competitive with established single-tree baselines, and HRT-Boost compares favorably with strong ensemble baselines and often yields substantially more compact models. The code is publicly available at https://github.com/Hongyi-Li-sz/HRT-Boost.
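
One way to read the "ReLU-like" claim is that the max envelope of two linear predictors equals one of them plus a ReLU of their difference. The sketch below checks this identity for a single hinge unit. The function names and the example weights are illustrative assumptions and are not taken from the HRT-Boost repository.

```python
def linear(w, b, x):
    """Affine predictor w . x + b for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_max(p1, p2, x):
    """Max envelope of two linear predictors (one hinge unit)."""
    return max(linear(*p1, x), linear(*p2, x))

def hinge_relu_form(p1, p2, x):
    """Same value written as a linear part plus a ReLU of the difference."""
    a, b = linear(*p1, x), linear(*p2, x)
    return b + max(a - b, 0)

def side(p1, p2, x):
    """Oblique split induced by the hinge: 0 if predictor 1 is active, else 1."""
    return 0 if linear(*p1, x) >= linear(*p2, x) else 1

# Each predictor is a (weights, bias) pair; integer values keep arithmetic exact.
P1 = ([1, -2], 0)
P2 = ([-1, 1], 3)
POINTS = [[0, 0], [4, 1], [2, 0], [1, 3]]

for x in POINTS:
    print(x, hinge_max(P1, P2, x), hinge_relu_form(P1, P2, x), side(P1, P2, x))
```

The split boundary is the hyperplane where the two predictors agree, so choosing the two linear predictors also determines the oblique split.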

2605.23417 2026-05-25 cs.LG 版本更新

An Open-Source Training Dataset for Foundation Models for Black-box Optimization

黑箱优化的基础模型的开源训练数据集

Aaron Klein, Herilalaina Rakotoarison, Luca Thale-Bombien, David Salinas

发表机构 * ELLIS Institute Tübingen（图宾根ELLIS研究所）； University of Helsinki（赫尔辛基大学）； Leipzig University（莱比锡大学）； Prior Labs（Prior实验室）

AI总结本文提出了一种名为BBO-Pile的开源训练数据集，包含超过50万个优化轨迹，覆盖3095个不同黑盒优化问题，是目前规模最大的公开黑盒优化预训练数据集。研究利用该数据集训练了多个不同规模的基础模型，验证了大规模预训练在模仿黑盒优化方法中的有效性，为该领域未来的研究奠定了基础。

详情

AI中文摘要

大多数黑箱优化方法需要大量的超参数调优，这通常限制了它们在不同优化领域的泛化能力。用于黑箱优化的基础模型从大量优化轨迹中学习优化原理，提供了一种有前景的替代方案，有潜力在多样的问题类别中超越手工设计的方法。然而，先前的工作要么依赖非公开数据集，要么依赖纯合成数据，限制了可重复性和对真实世界问题的泛化。因此，该领域的进展一直受到缺乏大规模、真实世界、公开可用的预训练数据的制约。我们引入了BBO-Pile，这是第一个包含超过500K优化轨迹的开源数据集，这些轨迹在3095个不同的黑箱上针对不同的优化器进行了评估，这代表了迄今为止该任务最大的公开数据集。利用该数据集，我们训练了一系列不同规模的基础模型，参数从2M到80M，训练token从200M到2B，并研究了它们相对于计算量的扩展行为。我们的结果表明，大规模预训练是模仿黑箱优化方法的一种可行且有效的方法，为未来的研究铺平了道路。

英文摘要

Most black-box optimization methods require extensive hyperparameter tuning, often limiting their ability to generalize across different optimization domains. Foundation models for black-box optimization that learn optimization principles from a large collection of optimization trajectories offer a promising alternative, with the potential to outperform manually designed methods across diverse problem classes. However, prior work has either relied on non-public datasets or on purely synthetic data, limiting reproducibility and generalization to real-world problems. As a result, progress in this area has been constrained by the lack of large-scale, real-world, publicly available pre-training data. We introduce BBO-Pile, the first open-source dataset comprising over 500K optimization trajectories evaluated across 3095 different black-boxes for different optimizers, which represents by far the largest public dataset for this task. Using this dataset, we train a family of foundation models at multiple scales, ranging from 2M to 80M parameters and from 200M to 2B training tokens, and study their scaling behavior with respect to compute. Our results demonstrate that large-scale pre-training is a viable and effective approach to imitate black-box optimization methods, paving the way for future research in this direction.

2605.23414 2026-05-25 cs.AI cs.LG 版本更新

每个组件都是一个查找：来自单一分解的令牌归因与组合

Po-Kai Chen, Niki van Stein, Aske Plaat

发表机构 * Leiden University（莱顿大学）

AI总结该论文研究了如何从单一前向传播中解析Transformer模型中各组件对预测结果的贡献及其组合方式。作者提出了一种名为Unpack的反向递归方法，通过分解注意力和MLP子层中的信用，揭示了不同组件之间的交互强度以及每个token的归因信息，无需干预、梯度或辅助训练。实验表明，该方法在GPT-2和Pythia系列模型上有效恢复了组件间的组合结构，并展示了对token级归因的准确捕捉，验证了其在机制可解释性方面的有效性。

详情

AI中文摘要

Transformer的机制可解释性不仅需要识别哪些组件重要，还需要理解它们如何组合成产生预测的计算路径。注意力和MLP都遵循共享的键值模板 $ϕ(S)U$。我们利用这一结构开发了Unpack，一种后向递归方法，通过两个子层分解贡献，产生任意两个组件之间的交互强度、带有K/Q/V组合标签的具名端到端路径，以及每个令牌的归因，全部来自单次前向传递，无需干预、梯度或辅助训练。我们在间接宾语识别任务上进行了评估。在GPT-2 small上，该方法恢复了Wang等人（2023）描述的所有三种组合连接，包括每个连接的特定模式路由（K、Q或V）。为了测试超越简单复制的令牌级归因，我们比较了同一分解中同一名称的两次出现：第一次提及保持强归因，而重复检测位置被抑制，这一模式在匹配的控制提示中不存在。在Pythia系列从160M到6.9B参数中，这一抑制模式在每个尺度上一致地恢复，表明该方法无需真实电路标签即可追踪机制结构。代码可在https://github.com/Fun-Cry/unpacklm获取。

英文摘要

Mechanistic interpretability of transformers requires identifying not just which components matter but how they compose into the computational route that produced a prediction. Both attention and MLP follow a shared key-value template $ϕ(S)U$. We exploit this structure to develop Unpack, a backward recursion that decomposes credit through both sublayers, producing interaction strengths between any two components, named end-to-end paths with K/Q/V composition labels, and per-token attribution from a single forward pass, without intervention, gradients, or auxiliary training. We evaluate on the indirect object identification task. On GPT-2 small, the method recovers all three composition connections described by Wang et al. (2023), including the mode-specific routing of each connection (K, Q, or V). To test token-level attribution beyond trivial copying, we compare two occurrences of the same name in the same decomposition: the first mention retains strong credit while the duplicate-detection position is suppressed, a pattern absent in matched control prompts. Across the Pythia family from 160M to 6.9B parameters, this suppression pattern is consistently recovered at every scale, demonstrating that the method tracks mechanistic structure without ground-truth circuit labels. Code is available at https://github.com/Fun-Cry/unpacklm.

2605.23391 2026-05-25 cs.LG cs.NA math.NA 版本更新

Coupling-Robust Accuracy in Multiphysics Physics Informed Neural Networks via Kronecker-Preconditioned Optimization

通过Kronecker预条件优化实现多物理场物理信息神经网络的耦合鲁棒精度

Youngjae Park, Jaemin Kim, Junghwa Hong

发表机构 * Dept. of Control and Instrumentation Engineering, Korea University, Sejong, South Korea（控制与仪器工程系，韩国大学，世宗，韩国）； BK21 FOUR Smart Mobility Education and Research Team, Korea University, Sejong, South Korea（BK21 FOUR智能交通教育与研究团队，韩国大学，世宗，韩国）

AI总结物理信息神经网络（PINNs）在处理耦合多物理场系统时，随着方程间耦合增强，会出现系统性精度下降的问题。本文通过神经切线核（NTK）分析，揭示了这一现象的理论原因，并提出了一种基于克罗内克预处理的优化方法SOAP+GN，有效抑制了耦合强度对学习稳定性的影响。实验表明，该方法在多种耦合偏微分方程系统中均能保持较高的精度，显著优于传统优化方法。

Comments 20 pages, 10 figures. Extended version of AI4Physics Workshop submission (ICML 2026)

详情

AI中文摘要

用于耦合多物理场系统的物理信息神经网络（PINN）在方程间耦合增强时会遭受系统性精度退化。我们通过神经正切核（NTK）分析为这一现象提供了理论解释：对于线性耦合系统，我们证明标准NTK的谱半径随耦合强度$γ$呈$Ω(γ^2)$增长，缩小了稳定学习率，而块对角Gauss-Newton（GN）预条件产生预条件NTK $K_P = J H^{+} J^\top$（其中$H$是块对角GN Hessian矩阵），其谱半径以$S$（网络数量）为界，与$γ$无关。我们在对称、非对称和非线性耦合PDE系统上数值验证了$Ω(γ^2)$增长，并确认在所有情况下$λ_{\max}(K_P) = S$。将Kronecker预条件优化器SOAP与逆梯度范数损失平衡（SOAP+GN）相结合，实现了耦合鲁棒精度：在跨越三个非线性递增的一维系统和一个二维电渗流基准的234个实验中，即使耦合参数变化一到两个数量级，SOAP+GN保持最终epoch的$L_2$退化≤1.1倍（强耦合与弱耦合误差之比），而Adam+GN则超过$10^2$倍。SOAP+GN进一步扩展到EDL分辨条件下的二维六PDE电渗流系统——这一所有先前PINN电动力学研究通过简化物理而避免的工况——而Adam+GN完全失败（$L_2 > 0.9$）。

英文摘要

Physics-informed neural networks (PINNs) for coupled multiphysics systems suffer systematic accuracy degradation as inter-equation coupling strengthens. We provide a theoretical explanation for this phenomenon through neural tangent kernel (NTK) analysis: for linearly coupled systems, we prove that the standard NTK's spectral radius grows as $Ω(γ^2)$ with coupling strength $γ$, shrinking the stable learning rate, while block-diagonal Gauss--Newton (GN) preconditioning yields a preconditioned NTK $K_P = J H^{+} J^\top$ (where $H$ is the block-diagonal GN Hessian) whose spectral radius is bounded by $S$ ($S$ = number of networks), independent of $γ$. We verify the $Ω(γ^2)$ growth numerically across symmetric, asymmetric, and nonlinear coupled PDE systems, and confirm $λ_{\max}(K_P) = S$ with equality in all cases. Combining the Kronecker-preconditioned optimizer SOAP with inverse-gradient-norm loss balancing (SOAP+GN) yields coupling-robust accuracy: across 234 experiments spanning three 1D systems of increasing nonlinearity and a 2D electroosmotic flow benchmark, SOAP+GN maintains final-epoch $L_2$ degradation $\leq 1.1\times$ (ratio of strong- to weak-coupling error) even as coupling parameters vary over one to two orders of magnitude, compared with $> 10^2\times$ for Adam+GN. SOAP+GN further scales to a 2D, 6-PDE electroosmotic flow system at EDL-resolved conditions -- a regime that all prior PINN electrokinetics studies have avoided through simplified physics -- where Adam+GN fails entirely ($L_2 > 0.9$).
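
The bound on the preconditioned kernel can be checked on a toy case. The sketch assumes $S = 2$ networks with one scalar parameter each and a symmetric linear coupling, so the Jacobian columns are $j_1 = (1, γ)$ and $j_2 = (γ, 1)$. With scalar blocks, $H^{+}$ reduces to $1 / (j_s^\top j_s)$ and $K_P$ becomes a sum of two rank-one projectors. This setup and all function names are illustrative assumptions and not the paper's experimental configuration; in particular the toy only demonstrates the upper bound $λ_{\max}(K_P) \le S$, not the equality reported in the abstract.

```python
import math

def lam_max_sym2(a, b, d):
    """Largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) + math.hypot(0.5 * (a - d), b)

def outer_sum(cols, weights):
    """Entries (a, b, d) of sum_s weights[s] * j_s j_s^T for 2-vectors j_s."""
    a = sum(w * j[0] * j[0] for j, w in zip(cols, weights))
    b = sum(w * j[0] * j[1] for j, w in zip(cols, weights))
    d = sum(w * j[1] * j[1] for j, w in zip(cols, weights))
    return a, b, d

def ntk_lam_max(gamma):
    """Return (lambda_max of K = J J^T, lambda_max of K_P = J H^+ J^T)."""
    cols = [(1.0, gamma), (gamma, 1.0)]          # Jacobian columns, one per network
    k = lam_max_sym2(*outer_sum(cols, [1.0, 1.0]))
    # Block-diagonal GN Hessian has scalar blocks j_s^T j_s; invert each block.
    inv_blocks = [1.0 / (j[0] ** 2 + j[1] ** 2) for j in cols]
    k_p = lam_max_sym2(*outer_sum(cols, inv_blocks))
    return k, k_p

GAMMAS = (0.1, 1.0, 10.0, 100.0)
for gamma in GAMMAS:
    k, k_p = ntk_lam_max(gamma)
    print(f"gamma={gamma:>6}: lam_max(K)={k:.4g}  lam_max(K_P)={k_p:.4g}")
```

In this toy case $λ_{\max}(K) = (1 + γ)^2$ grows quadratically in $γ$, while $λ_{\max}(K_P) = 1 + 2γ / (1 + γ^2)$ never exceeds 2.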

2605.23378 2026-05-25 math.OC cs.LG 版本更新

Selective Ambulance Dispatch Under Contextual Travel-Time Uncertainty

上下文旅行时间不确定性下的选择性救护车调度

Zikun Lin, Daniel Zhuoyu Long, Viet Anh Nguyen

发表机构 * Department of Systems Engineering and Engineering Management（系统工程与工程管理系）

AI总结本文研究了在交通时间不确定性背景下如何选择性派遣救护车以应对院外心脏骤停的紧急情况。提出了一种名为IDEAL的智能双派车框架，仅在主路线与备选路线的时间差超过阈值时才派遣第二辆救护车，从而在保证响应速度的同时减少资源消耗。该方法通过弱监督双层网络学习上下文相关的道路旅行时间，并结合非光滑优化与不确定性建模，实现了高效且具有收敛性保证的实时决策，在实际数据与模拟测试中表现出优于现有方法的响应时间与资源利用平衡。

详情

AI中文摘要

救护车响应在院外心脏骤停（OHCA）中具有时间紧迫性，调度员必须在及时到达与有限车队容量之间取得平衡。静态区域和确定性旅行时间估计易受动态拥堵影响，而始终双调度增加了冗余但消耗了车队容量。我们提出IDEAL（智能双调度急救车），一种选择性双调度框架，仅当主要路径与次要路径之间的乐观差距超过阈值时才派出第二辆救护车。IDEAL利用弱监督双层表示网络，从行程级调度记录（包括未观测路线）中学习上下文特定的边旅行时间。我们使用小批量保守梯度训练非光滑模型，并证明渐近收敛保证。IDEAL通过Burg散度扰动对学习表示空间中的共享度量进行建模，从而引起边旅行时间的相关变化，并从历史低估误差中学习上下文特定半径。对于实时决策，IDEAL将乐观差距计算转化为凸差规划，并推导出具有复杂度保证的高效预言机。与香港消防处合作，我们使用历史OHCA记录和实时自适应模拟评估IDEAL。相对于所有基于区域和基于谷歌的基线，结果实现了更强的响应时间/资源权衡。

英文摘要

Ambulance response is time-critical in out-of-hospital cardiac arrest (OHCA), where dispatchers must balance timely arrivals with limited fleet capacity. Static territories and deterministic travel-time estimates are vulnerable to dynamic congestion, while always-dual dispatch adds redundancy but consumes fleet capacity. We propose IDEAL (Intelligent Dual dispatch of Emergency AmbuLances), a selective dual-dispatch framework that sends a second ambulance only when the optimistic gap between primary and secondary paths exceeds a threshold. IDEAL learns context-specific edge travel times from trip-level dispatch records, including unobserved routes, using a weakly supervised bilevel representation network. We train the nonsmooth model with mini-batch conservative gradients and prove an asymptotic convergence guarantee. IDEAL models uncertainty via Burg-divergence perturbations to a shared metric in the learned representation space, thereby inducing correlated changes in edge travel times and learning context-specific radii from historical underprediction errors. For real-time decisions, IDEAL casts optimistic-gap computation as a difference-of-convex program and derives an efficient oracle with complexity guarantees. In collaboration with the Hong Kong Fire Services Department, we evaluate IDEAL using historical OHCA records and real-time adaptive simulations. IDEAL achieves a stronger response-time/resource trade-off than all region-based and Google-based baselines.

2605.23372 2026-05-25 cs.LG cs.AI 版本更新

Curriculum reinforcement learning with measurable task representation learning

基于可度量任务表征学习的课程强化学习

Yongyan Wen, Siyuan Li, Mingjian Fu, Yiqin Yang, Xun Wang, Peng Liu

发表机构 * Harbin Institute of Technology（哈尔滨工业大学）； Fuzhou University（福州大学）； China Academy of Sciences（中国科学院）

AI总结本文研究了课程强化学习（CRL）中自动课程生成的问题，特别是在非欧几里得任务空间中的复杂导航任务。为了解决传统插值方法在非欧空间中失效的问题，作者提出了一种基于可度量任务表示学习的自动课程生成方法，通过变分自编码器结构对任务的奖励和状态转移进行编码，从而获得具有任务相似性度量能力的潜在任务表示。实验表明，该方法在多个复杂导航任务中优于基于插值和生成对抗网络的现有CRL方法。

详情

DOI: 10.1016/j.neunet.2026.109019
Journal ref: Neural Networks, 109019 (2026)

AI中文摘要

在课程强化学习（CRL）中，智能体通过一系列任务（即课程）逐步积累知识，学习过程旨在利用积累的知识最终解决具有挑战性的目标任务。虽然早期的CRL工作侧重于对候选任务进行排序，但最近的研究探索了自动课程生成。在丰富的CRL文献中，基于插值的CRL范式是主体，它通过在任务空间中利用有意义的距离度量（即可以衡量任务相似性）对初始任务分布和目标任务分布进行插值，自动生成中间任务。然而，在具有挑战性的导航任务中，非欧几里得上下文（任务）空间使得这一假设失效。为了在复杂任务中实现自动课程生成，我们提出了一种基于可度量任务表征学习的新型自动课程生成方法。为了更好地衡量相似性，我们提出将任务空间变换到潜在空间。通过一个编码奖励和状态转移的变分自编码器结构，我们获得了具有任务相似性度量属性的潜在任务表征，其中两个相近的任务嵌入对应两个在奖励和状态转移方面相似的任务。基于学习到的任务表征，我们进一步开发了一种自动课程生成方案，该方案能够有效地生成与目标任务越来越相似的新任务。我们在各种具有挑战性的导航任务中评估了我们的方法，实验结果表明，所提出的方法超越了基于插值和生成对抗网络的最先进CRL方法。

英文摘要

In curriculum reinforcement learning (CRL), an agent incrementally accumulates knowledge over a sequence of tasks (i.e., a curriculum), and the learning process aims to use the accumulated knowledge to finally solve a challenging target task. While early CRL works focus on sequencing candidate tasks, recent research explores automatic curriculum generation. Within the rich CRL literature, the interpolation-based paradigm is the dominant line of work: it automatically generates intermediate tasks by interpolating between the initial task distribution and the target task distribution in a task space equipped with a meaningful distance metric (i.e., one that can measure task similarity). However, in challenging navigation tasks, the non-Euclidean context (task) space invalidates this assumption. To achieve automatic curriculum generation in complex tasks, we propose a novel automatic curriculum generation approach based on measurable task representation learning. To better measure similarity, we propose to transform the task space to a latent space. Through a variational autoencoder structure that encodes the reward and the state transitions, we obtain a latent task representation with a task similarity measurement property, in which two close task embeddings correspond to two tasks that are similar in terms of rewards and state transitions. Based on the learned task representation, we further develop an automatic curriculum generation scheme, which can effectively generate new tasks that are increasingly similar to the target task. We evaluate our method in a variety of challenging navigation tasks, and the experimental results indicate that the proposed approach surpasses state-of-the-art CRL approaches based on interpolation and generative adversarial networks.

2605.23365 2026-05-25 cs.LG cs.AI 版本更新

Prudent-Banker: 对抗性赌博机中无延迟与有延迟下的基线安全保障无额外代价

Ting Hu, Luanda Cai, Emmanouil-Vasileios Vlatakis-Gkaragkounis

发表机构 * Department of Economics University of Wisconsin–Madison（经济学系威斯康星大学麦迪逊分校）； Department of Finance University of Wisconsin–Madison（金融系威斯康星大学麦迪逊分校）； Department of Computer Sciences University of Wisconsin–Madison（计算机科学系威斯康星大学麦迪逊分校）

AI总结本文研究了在有无延迟反馈的情况下，如何在对抗性多臂老虎机问题中实现安全基线下的最小最大最优最坏情况悔恨。为了解决延迟可能破坏安全保证的问题，作者提出了Prudent-Banker算法，结合了延迟自适应的在线镜像下降方法和改进的分阶段激进机制，实现了与安全策略相比近似常数悔恨的最优安全-鲁棒性权衡。该算法在理论分析中证明了其悔恨上界不可改进，并通过实验验证了其在多种延迟分布下的有效性。

详情

AI中文摘要

我们研究了在安全感知目标下，具有和不具有延迟反馈的对抗性多臂赌博机问题：实现极小极大最优的最坏情况遗憾，同时相对于指定的“安全”基线策略保持几乎恒定的遗憾。现有方法可以在即时反馈下平衡这种权衡以获得平滑的比较器，但任意延迟可能会错误地安排保守主义和探索之间的转换，危及安全保障。为了弥合这一差距，我们提出了Prudent-Banker，一种新颖的算法，它将延迟适应的在线镜像下降变体与修改的分阶段攻击机制相结合。其关键技术贡献是一个延迟校准的重启阈值，该阈值严格考虑了未观察反馈引起的最坏情况失真，并可靠地检测比较器的次优性。我们还为安全约束的对抗性延迟赌博机建立了新的下界，表明在基线安全要求下，Prudent-Banker的遗憾保证在忽略对数因子时是不可改进的。据我们所知，Prudent-Banker是第一个实现最优安全-鲁棒性权衡的算法：伪遗憾$\widetilde{O}(\sqrt{T}+\sqrt{D})$加上相对于安全比较器的$\widetilde{O}(1)$遗憾，无论有无延迟。跨不同延迟分布的实验表明，与标准的延迟鲁棒基线不同，Prudent-Banker有效地平衡了安全性和学习。

英文摘要

We study adversarial multi-armed bandits with and without delayed feedback under a safety-aware goal: achieving minimax-optimal worst-case regret while keeping nearly constant regret relative to a designated "safe" baseline policy. Existing approaches can balance this trade-off with immediate feedback for smooth comparators, but arbitrary delays can mistime transitions between conservatism and exploration, endangering the safety guarantee. To bridge this gap, we propose Prudent-Banker, a novel algorithm that combines a delay-adapted variant of Online Mirror Descent with a modified phased-aggression mechanism. Its key technical contribution is a delay-calibrated restart threshold that rigorously accounts for the worst-case distortion induced by unobserved feedback and reliably detects comparator suboptimality. We also establish new lower bounds for safety-constrained adversarial delayed bandits, showing that the regret guarantees of Prudent-Banker are unimprovable, up to logarithmic factors, under the baseline-safety requirement. To the best of our knowledge, Prudent-Banker is the first algorithm to achieve the optimal safety--robustness trade-off: pseudo-regret $\widetilde{O}(\sqrt{T}+\sqrt{D})$ together with $\widetilde{O}(1)$ regret against the safe comparator, both with and without delays. Experiments across diverse delay distributions show that, unlike standard delay-robust baselines, Prudent-Banker effectively balances safety and learning.

2605.23346 2026-05-25 cs.LG 版本更新

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion

对比分布匹配用于离散扩散中的摊销序贯蒙特卡洛

Jaihoon Kim, Taehoon Yoon, Prin Phunyaphibarn, Seungjun Kim, Morteza Mardani, Minhyuk Sung

发表机构 * KAIST（韩国科学技术院）； University of Michigan（密歇根大学）； NVIDIA（英伟达）

AI总结离散扩散模型在生成结构化分类数据方面表现出色，但如何高效地从奖励倾斜分布中采样仍是一个核心挑战。本文提出了一种名为对比分布匹配（CDM）的新框架，通过学习参数化的扭曲函数，将序列蒙特卡洛（SMC）推断的计算成本进行摊销，从而显著提升了推理效率。实验表明，CDM在多种应用场景中均优于现有方法，且额外计算开销极小，验证了其有效性与广泛适用性。

Comments Project Page: https://cdm-smc.github.io/

详情

AI中文摘要

离散扩散模型已成为生成结构化分类数据的强大框架。然而，从奖励倾斜分布中高效采样仍然是一个基本挑战。虽然扭曲序贯蒙特卡洛（SMC）为此任务提供了渐近精确性，但在离散状态空间中估计最优扭曲函数需要昂贵的蒙特卡洛近似，导致推理时严重的计算瓶颈。为克服这一限制，我们引入对比分布匹配（CDM），一种新颖的框架，通过正负样本学习参数化扭曲函数，摊销SMC推理的成本。为了高效训练，我们重新表述梯度估计器，以利用离散扩散模型的闭式前向核。在实践中，评估我们学习的扭曲函数相比基础模型的单次前向传播仅增加不到5%的额外计算开销。通过广泛的经验评估，我们证明CDM在匹配的挂钟时间下始终优于现有基线。我们在多种应用中验证了我们方法的有效性和通用性，包括有毒文本生成、调控DNA序列设计、蛋白质可设计性以及扩散大语言模型对齐。

英文摘要

Discrete diffusion models have emerged as powerful frameworks for generating structured categorical data. However, efficiently sampling from reward-tilted distributions remains a fundamental challenge. While Twisted Sequential Monte Carlo (SMC) offers asymptotic exactness for this task, estimating the optimal twist function in discrete state spaces necessitates costly Monte Carlo approximations, resulting in a severe computational bottleneck at inference. To overcome this limitation, we introduce Contrastive Distribution Matching (CDM), a novel framework that amortizes the cost of SMC inference by learning a parameterized twist function via positive and negative samples. For efficient training, we reformulate the gradient estimator to leverage the closed-form forward kernels of discrete diffusion models. In practice, evaluating our learned twist function incurs less than 5% additional computational overhead compared to a single forward pass of the base model. Through extensive empirical evaluations, we demonstrate that CDM consistently outperforms existing baselines under matched wall-clock time. We validate the effectiveness and versatility of our approach across a diverse range of applications, including toxic text generation, regulatory DNA sequence design, protein designability, and diffusion large language model alignment.
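To make "sampling from a reward-tilted distribution" concrete, here is a minimal Python sketch of the single-step building block behind SMC: self-normalized importance resampling on a small categorical space. It is not the CDM method and involves no learned twist function. The tilt form `p(x) * exp(r(x) / beta)`, the names `tilted_distribution` and `importance_resample`, and the parameter `beta` are illustrative assumptions not taken from the abstract.

```python
import math
import random


def tilted_distribution(base_probs, rewards, beta=1.0):
    """Exact reward-tilted target: q(x) proportional to p(x) * exp(r(x) / beta).

    Assumed form of the tilt; feasible only when the state space is tiny.
    """
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)
    return [w / z for w in weights]


def importance_resample(base_probs, rewards, n_particles, rng, beta=1.0):
    """Draw particles from the base model, weight by exp(r / beta), then resample.

    The resampled particles are approximately distributed as the tilted target,
    with the approximation improving as n_particles grows.
    """
    states = range(len(base_probs))
    particles = rng.choices(states, weights=base_probs, k=n_particles)
    weights = [math.exp(rewards[x] / beta) for x in particles]
    return rng.choices(particles, weights=weights, k=n_particles)
```

In a full twisted-SMC sampler this weight-and-resample step would be repeated along the denoising trajectory, with a twist function standing in for the unknown future reward; that intermediate estimate is the part the abstract describes as costly.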


2605.23306 2026-05-25 physics.soc-ph cs.LG cs.SY eess.SY 版本更新

SpinFlow: A Physics-Informed Spin Field Framework for Traffic Phase Inference and Transition Detection

SpinFlow: 一种物理信息自旋场框架用于交通相位推断和过渡检测

Haopeng Deng, Fucheng Zheng, Xinhai Xia

发表机构 * School of Future Transportation（未来交通学院）

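AI summary: This paper proposes SpinFlow, a physics-informed spin-field framework for traffic phase inference and phase-transition detection. The method combines Kerner's three-phase theory with statistical physics, infers macroscopic traffic states continuously through a spin-field model, and uses a regularized Expectation-Maximization algorithm to invert the latent spin-field structure from high-resolution trajectory data. Experiments on several real-world datasets show that SpinFlow accurately identifies phase-transition points and produces interpretable phase maps, giving intelligent traffic management a data-driven, physics-consistent basis for decisions.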

Comments 11 pages, 8 figures, accepted to ITSC 2026


AI中文摘要

主动交通管理（ATM）经常受到传统宏观模型和刚性经验阈值的阻碍，这些模型和阈值无法捕捉亚稳态相位前兆，导致延迟的反应性干预。为了解决这个问题，我们提出了SpinFlow，一个物理信息自旋场框架，将Kerner的三相理论与统计物理统一起来，用于连续宏观交通相位推断。受海森堡模型启发，SpinFlow通过潜在自旋向量和竞争平衡映射参数化空间变化的相位权重，使同步流自然出现。一种物理正则化的期望最大化算法从高分辨率轨迹中反演这种潜在结构，联合优化自旋场，同时软性强制执行质量守恒和空间平滑性。我们引入相位平衡度（PED）来量化结构对齐并在拓扑上定位相变点。在四个真实轨迹数据集上，SpinFlow实现了高达0.940的$R_{q}^{2}$，PED下降94.9-100%，以及可解释的相位图，在前向准确性、物理一致性和瓶颈定位方面优于三个异构基线。SpinFlow无需先验网络拓扑即可精确定位拥堵成核，为ATM提供了一种数据驱动、物理一致的触发机制。

英文摘要

Active traffic management (ATM) is frequently hindered by traditional macroscopic models and rigid empirical thresholds that fail to capture metastable phase precursors, resulting in delayed, reactive interventions. To address this, we propose SpinFlow, a physics-informed spin-field framework unifying Kerner's three-phase theory with statistical physics for continuous macroscopic traffic phase inference. Inspired by the Heisenberg model, SpinFlow parametrizes spatially varying phase weights via a latent spin vector and a competitive-equilibrium mapping, allowing synchronized flow to emerge naturally. A physics-regularized Expectation-Maximization algorithm inverts this latent structure from high-resolution trajectories, jointly optimizing the spin field while softly enforcing mass conservation and spatial smoothness. We introduce the Phase Equilibrium Degree (PED) to quantify structural alignment and topologically localize phase-transition points. Across four real-world trajectory datasets, SpinFlow achieves $R_{q}^{2}$ up to 0.940, PED drops of 94.9-100%, and interpretable phase maps that outperform three heterogeneous baselines on forward accuracy, physics consistency, and bottleneck localization. SpinFlow pinpoints congestion nucleation without prior network topology, yielding a data-driven, physics-consistent trigger for ATM.


2605.23295 2026-05-25 physics.optics cs.LG physics.app-ph 版本更新

Accelerating ground state search of spatial photonic Ising machines with genetic-simulated annealing hybrid algorithm

基于遗传-模拟退火混合算法加速空间光子伊辛机基态搜索

Ze Zheng, Ruhui Ni, Jingyi Zhao, Xiaojian Hu, Wen Jiang, Yuegang Li, Hang Xu, Tailong Xiao, Guihua Zeng

发表机构 * Institute for Quantum Sensing and Information Processing（量子传感与信息处理研究所）； State Key Laboratory of Photonics and Communications（光子与通信国家重点实验室）； Global College（全球学院）； Shanghai Research Center for Quantum Sciences（上海量子科学研究中心）； Hefei National Laboratory（合肥国家实验室）； Shanghai Quantum Intelligence Sensing Technology Co., Ltd（上海量子智能感知技术有限公司）

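AI summary: This study proposes a hybrid of a genetic algorithm (GA) and simulated annealing (SA) to accelerate the ground-state search of spatial photonic Ising machines. Conventional approaches rely on simulated annealing alone, which converges slowly and is time-consuming. The new method uses the GA for global search in the early stage and SA for local refinement in the late stage, improving both solving efficiency and solution quality. Experiments show that it outperforms conventional algorithms on Max-Cut problems of different sizes and on high-rank optimization problems, suggesting a route toward intelligent photonic Ising computing systems.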

Comments 12 pages, 6 figures


AI中文摘要

基于空间光调制器的空间光子伊辛机已成为解决组合优化问题和自旋玻璃模拟等众多任务的高效求解器。然而，传统仅依赖模拟退火算法的SPIM在复杂能量景观中需要大量测量-反馈迭代才能找到相对最优解，存在收敛慢、时间成本高的问题。本文提出一种光学遗传-模拟退火混合算法来加速SPIM的基态搜索。GA在迭代早期进行全局粗粒度搜索，而SA在后期进行细粒度局部精化。数值模拟表明，我们的方法在不同规模的全秩Max-Cut问题上比纯GA或SA能获得更高的解质量。我们还在同一迭代预算下，在规范变换时分复用SPIM上实验证明了其相对于传统算法在高秩优化问题上的优越性。我们的方法可进一步与其他先进元启发式算法结合，向智能光学伊辛计算系统发展。

英文摘要

Spatial photonic Ising machines (SPIMs) based on spatial light modulators (SLMs) have emerged as highly effective solvers for many tasks, including combinatorial optimization problems and spin-glass simulations. However, traditional SPIMs relying solely on the simulated annealing algorithm require a large number of measurement-feedback iterations to find a relatively optimal solution in complex energy landscapes, suffering from slow convergence and high time cost. Here, we propose an optical genetic-simulated annealing hybrid algorithm to accelerate the ground-state search of SPIMs. GA conducts a global coarse-grained search in the early iteration stage, while SA performs fine-grained local refinement in the late stage. Numerical simulations show that our method enables a higher solution quality of full-rank Max-Cut problems than pure GA or SA at different scales. We also experimentally demonstrate its superiority over conventional algorithms on a gauge-transformation time-division multiplexing SPIM for high-rank optimization problems under the same iteration budget. Our approach can be further developed with other advanced metaheuristic algorithms toward intelligent optical Ising computing systems.
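The two-stage idea (GA for coarse global search, then SA for fine local refinement) can be sketched in software for a small Max-Cut instance. This is a purely numerical toy, not the authors' optical measurement-feedback loop. The population size, mutation rate, cooling schedule, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def cut_value(edges, spins):
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for u, v, w in edges if spins[u] != spins[v])


def ga_stage(edges, n, rng, pop_size=20, generations=30, mut_rate=0.05):
    """Coarse global search: elitist selection, uniform crossover, spin-flip mutation."""
    pop = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: cut_value(edges, s), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [a[i] if rng.random() < 0.5 else b[i] for i in range(n)]
            child = [-s if rng.random() < mut_rate else s for s in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda s: cut_value(edges, s))


def sa_stage(edges, spins, rng, steps=2000, t_start=1.0, t_end=0.01):
    """Fine local refinement: single-spin flips under geometric cooling."""
    current = list(spins)
    cur_val = cut_value(edges, current)
    best, best_val = list(current), cur_val
    for k in range(steps):
        temp = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        i = rng.randrange(len(current))
        current[i] = -current[i]
        new_val = cut_value(edges, current)
        # Always accept non-worsening flips; accept worsening ones with Boltzmann probability.
        if new_val >= cur_val or rng.random() < math.exp((new_val - cur_val) / temp):
            cur_val = new_val
            if cur_val > best_val:
                best, best_val = list(current), cur_val
        else:
            current[i] = -current[i]  # reject the flip
    return best, best_val


def hybrid_maxcut(edges, n, seed=0):
    """Run the GA stage, then hand its best individual to the SA stage."""
    rng = random.Random(seed)
    return sa_stage(edges, ga_stage(edges, n, rng), rng)
```

Calling `hybrid_maxcut([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)], 4)` returns a spin assignment and its cut weight for a unit-weight 4-cycle. On hardware, each `cut_value` call would correspond to one measurement-feedback iteration, which is why the iteration budget is the quantity the abstract compares.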


2605.23285 2026-05-25 cs.LG cond-mat.stat-mech cs.AI 版本更新

Reinforcement Learning for Microcanonical Graph Ensemble with Assortativity Constraints

具有同配性约束的微正则图集成的强化学习

Hoyun Choi, Junghyo Jo, Deok-Sun Lee

发表机构 * School of Computational Sciences, Korea Institute for Advanced Study（韩国高等科学研究院计算科学系）； Department of Physics Education, Seoul National University（首尔国立大学物理教育系）； Center for Theoretical Physics and Artificial Intelligence Institute, Seoul National University（首尔国立大学理论物理与人工智能研究所）； Center for AI and Natural Sciences, Korea Institute for Advanced Study（韩国高等科学研究院人工智能与自然科学中心）

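AI summary: This paper studies how reinforcement learning can generate microcanonical graph ensembles that satisfy a prescribed assortativity (degree-degree correlation) constraint, giving precise control over network structure. It proposes the Deep Microcanonical Graph Generator (DMGG), a reinforcement-learning method that uses degree-preserving rewiring to drive a graph's assortativity exactly to a target value, overcoming the parameter-tuning burden and low generation speed of conventional methods. The method works across graphs of different sizes, sparsities, and topologies and yields exact null models, which supports quantitative analysis of secondary properties such as the clustering coefficient and offers a useful tool for studying how network structure relates to function.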


AI中文摘要

网络结构如何决定功能是一个基本问题，可以通过具有精确控制结构属性的图集成来研究。规范方法（如指数随机图模型ERGM）仅期望约束，允许个体实现围绕目标波动。相反，微正则集成施加硬约束，但除固定度序列外的实用采样方法仍难以实现。本文介绍深度微正则图生成器（DMGG），一种强化学习（RL）框架，通过保度重连变换任意给定图，以精确达到指定的同配性（表征相邻节点的度-度相关性）。DMGG不依赖于ERGM的熵主导的Metropolis-Hastings动力学，而是采用策略引导搜索，最大程度地改变联合度矩阵。这消除了详尽的参数调优，并在保持构型多样性的同时将生成速度提高至少一个数量级。由于DMGG可推广到各种图大小、稀疏性和拓扑结构，它提供了精确的零模型，允许定量隔离二次可观测量（如聚类系数）。这些结果确立了RL作为生成硬约束图的实用且强大的范式，为研究无集成伪影的结构-功能关系开辟了途径。

英文摘要

How network structure determines function is a fundamental question, and it can be investigated by graph ensembles with precisely controlled structural properties. Canonical approaches, formulated as exponential random graph models (ERGMs), enforce constraints only in expectation, allowing individual realizations to fluctuate around the target. Conversely, microcanonical ensembles impose hard constraints exactly, but practical sampling methods beyond fixing the degree sequence have remained out of reach. Here we introduce the Deep Microcanonical Graph Generator (DMGG), a reinforcement learning (RL) framework that transforms any given graph through degree-preserving rewirings to exactly reach a prescribed assortativity, which characterizes the degree-degree correlation of adjacent nodes. Instead of relying on the entropically dominated Metropolis-Hastings dynamics of the ERGM, DMGG employs a policy-guided search that maximally alters the joint-degree matrix. This eliminates exhaustive parameter tuning and accelerates generation by at least an order of magnitude while preserving configurational diversity. As DMGG generalizes across various graph sizes, sparsities, and topologies, it provides exact null models that allow for the quantitative isolation of secondary observables, such as the clustering coefficient. These results establish RL as a practical and powerful paradigm for generating hard-constrained graphs, opening avenues to investigate structure-function relationships free from ensemble artifacts.
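The two ingredients named in the abstract, degree-preserving rewiring and assortativity, can be illustrated with a short Python sketch. The learned RL policy is replaced here by a plain greedy hill-climb, which is a stand-in and not the DMGG algorithm. The function names and the acceptance rule are assumptions made for illustration.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected simple graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def assortativity(edges):
    """Pearson correlation of the degrees at the two ends of an edge.

    Each edge is counted in both directions so the measure is symmetric.
    Returns 0.0 by convention when all degrees are equal (correlation undefined).
    """
    deg = degrees(edges)
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


def double_edge_swap(edges, rng):
    """Replace (a, b), (c, d) with (a, d), (c, b); return None if the swap is invalid.

    Every node keeps its degree, so the degree sequence is preserved exactly.
    """
    (a, b), (c, d) = rng.sample(edges, 2)
    if len({a, b, c, d}) < 4:
        return None  # would create a self-loop or touch a shared node
    existing = {frozenset(e) for e in edges}
    if frozenset((a, d)) in existing or frozenset((c, b)) in existing:
        return None  # would create a duplicate edge
    kept = [e for e in edges if e not in ((a, b), (c, d))]
    return kept + [(a, d), (c, b)]


def rewire_toward(edges, target, steps=2000, seed=0):
    """Greedy stand-in for a learned policy: keep swaps that do not widen the gap to target."""
    rng = random.Random(seed)
    current = list(edges)
    gap = abs(assortativity(current) - target)
    for _ in range(steps):
        candidate = double_edge_swap(current, rng)
        if candidate is None:
            continue
        cand_gap = abs(assortativity(candidate) - target)
        if cand_gap <= gap:
            current, gap = candidate, cand_gap
    return current
```

A greedy search of this kind only approaches the target and can stall, whereas the abstract claims that the policy-guided search reaches the prescribed value exactly; the sketch is meant only to show what a single rewiring move does and what quantity is being controlled.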


2605.23282 2026-05-25 eess.IV cs.CV cs.LG 版本更新

Discontinuous Galerkin Neural Operator for Pathology Defocus Deblurring

病理学离焦去模糊的间断伽辽金神经算子

Shaoqing Duan, Haofei Song, Xintian Mao, Qingli Li, Yan Wang

发表机构 * Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China（上海多维信息处理关键实验室，华东师范大学，上海，中国）

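AI summary: Defocus deblurring in pathology microscopy is challenging because the optical blur is spatially varying and locally discontinuous. Existing deep learning methods are limited by shift-invariance assumptions and weak interpretability, and handle such heterogeneous blur patterns poorly. The paper proposes the Discontinuous Galerkin Neural Operator (DGNO), which parameterizes the integral kernel with element-local volume operators and interface numerical fluxes. This models heterogeneous, locally discontinuous blur while preserving the physics of optical image formation, yields better deblurring results, and performs well at high resolution.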

Comments 17 pages, 9 figures. Accepted by ICML 2026


AI中文摘要

病理显微镜中的离焦去模糊仍然具有挑战性，因为由位置相关的积分成像过程引起的光学模糊具有空间变化和局部不连续的特性。现有的深度学习方法受限于平移不变性假设和有限的可解释性，不太适合这种异质模糊模式。神经算子通过直接将离焦形成建模为积分算子，提供了一种原则性的替代方案，为离焦去模糊提供了新的视角。然而，大多数现有的用于低级视觉的神经算子架构依赖于全局参数化核，这些核假设平滑性和平稳性，限制了它们建模异质和局部不连续模糊模式的能力。为了解决这一限制，我们提出了间断伽辽金神经算子（DGNO），它使用具有单元局部体积算子和界面数值通量的间断伽辽金公式来参数化积分核。DGNO 提供了局部性、异质性建模和全局一致性的原则性组合，同时保留了光学图像形成的底层物理。广泛且深入的实验表明，DGNO 超越了现有技术，提供了更清晰的图像重建、对空间变化模糊的鲁棒处理以及可扩展的高分辨率性能。代码将在 https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur 发布。

英文摘要

Defocus deblurring in pathological microscopy remains challenging due to the spatially varying and locally discontinuous nature of optical blur induced by a position-dependent integral imaging process. Existing deep learning methods, constrained by shift-invariance assumptions and limited interpretability, are not well suited to such heterogeneous blur patterns. Neural operators provide a principled alternative by modeling defocus formation directly as an integral operator, offering a new perspective on defocus deblurring. However, most existing neural operator architectures for low-level vision rely on globally parameterized kernels that assume smoothness and stationarity, limiting their ability to model heterogeneous and locally discontinuous blur patterns. To address this limitation, we propose the Discontinuous Galerkin Neural Operator (DGNO), which parameterizes the integral kernel using a discontinuous Galerkin formulation with element-local volume operators and interface numerical fluxes. DGNO provides a principled combination of locality, heterogeneity modeling, and global coherence while preserving the underlying physics of optical image formation. Extensive experiments demonstrate that DGNO surpasses state-of-the-art methods, delivering sharper reconstructions, robust handling of spatially varying blur, and scalable high-resolution performance. The code will be released at https://github.com/DeepMed-Lab-ECNU/Single-Image-Deblur.


2605.23275 2026-05-25 cs.LG 版本更新

Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models

扩散域扩展：学习协调预训练扩散模型

Egor Lifar, Semyon Savkin, Timur Garipov, Shangyuan Tong, Tommi Jaakkola

发表机构 * MIT CSAIL（麻省理工学院计算机科学与人工智能实验室）

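AI summary: This paper proposes Diffusion Domain Expansion (DDE), a method for efficiently extending pre-trained diffusion models so that they can generate larger objects and handle more complex conditioning inputs. A compact trainable network coordinates the denoising outputs of pre-trained diffusion models, allowing generalization to domains beyond their original training scope. Experiments show that DDE performs well on long-audio generation and conditional image generation, outperforming other coordinated-generation methods.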

Comments Accepted as poster at ICML 2024 Workshop on Structured Probabilistic Inference and Generative Modeling (SPIGM)

2605.23272 2026-05-25 cs.LG cs.AI 版本更新

When Good Equations Get Bad Scores: Improving Symbolic Regression Through Better Parameter Optimization

当好方程得到差分数：通过更好的参数优化改进符号回归

Boxiao Wang, Kai Li, Zhiwei Chen, Yang Huang, Runxiang Wang, Ziwen Zhang, Yifan Zhang, Jian Cheng

发表机构 * Institute of Automation, Chinese Academy of Sciences（中国科学院自动化研究所）

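AI summary: Symbolic regression (SR) plays an important role in scientific knowledge discovery by distilling mathematical equations from observational data. Existing methods usually follow a bi-level optimization framework, in which parameter-fitting quality directly determines a structure's score, so a correct structure can be underrated when the fit lands in a poor local optimum. The paper proposes SAGE-Fit, a fitting framework built on the structural and semantic priors of symbolic expressions, which mitigates this optimization bottleneck and improves both the evaluation accuracy and the overall performance of symbolic regression systems.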


AI中文摘要

符号回归（SR）通过从观测数据中提炼数学方程，在科学知识发现中发挥核心作用。大多数现有SR方法在双层优化框架内运行：外层循环搜索离散方程结构，内层循环优化该结构的连续参数。关键的是，参数拟合质量直接决定结构的得分，从而影响外层搜索。然而，非线性算子使得内层循环高度非凸，且预算驱动的快速局部求解器（如BFGS）的依赖常常导致正确的结构陷入较差的局部极小值并被低估得分。这种“好结构、差分数”现象成为关键瓶颈，降低效率并误导搜索偏离真实方程。为解决此问题，我们提出SAGE-Fit（结构感知与语义引导的符号回归评估器），一个利用符号表达式双重原生先验的SR原生拟合框架。通过利用SR特有的结构和语义先验，我们为每个属性设计定制模块，从而有效缓解这一优化瓶颈。大量实验表明，我们的方法作为即插即用模块，显著提升评估保真度，并普遍提高各种SR系统的性能。

英文摘要

Symbolic Regression (SR) plays a central role in scientific knowledge discovery by distilling mathematical equations from observational data. Most existing SR methods function within a bi-level optimization framework: an outer loop that searches for the discrete equation structure, and an inner loop that optimizes the continuous parameters of that structure. Crucially, parameter-fitting quality directly determines a structure's score and thus the outer-loop search. However, nonlinear operators make the inner loop highly non-convex, and budget-driven reliance on fast local solvers (e.g., BFGS) often yields poor local minima and underestimated scores for correct structures. This "Good Structure, Bad Score" phenomenon becomes a key bottleneck, degrading efficiency and misguiding the search away from the true equation. To resolve this, we propose SAGE-Fit (Structure-Aware and Semantics-Guided Evaluator for Symbolic Regression), an SR-native fitting framework that exploits the dual native priors of symbolic expressions. By capitalizing on the structural and semantic priors unique to SR, we design tailored modules for each property, thereby effectively mitigating this optimization bottleneck. Extensive experiments demonstrate that our approach, as a plug-and-play module, significantly enhances evaluation fidelity and universally improves the performance of various SR systems.


2605.23268 2026-05-25 stat.ML cs.LG 版本更新

Coupled Training with Privileged Information and Unlabeled Data

基于特权信息与未标记数据的联合训练

Jiahao Shi, Omar Hagrass, Jason M. Klusowski

发表机构 * Department of Electrical and Computer Engineering, Princeton University（普林斯顿大学电子与计算机工程系）； Department of Operations Research and Financial Engineering, Princeton University（普林斯顿大学运筹学与金融工程系）

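AI summary: In many prediction tasks, extra information is available during training (for example, measurements that are expensive or hard to collect) but not when the model is deployed. The paper proposes a joint training method that trains the model using the extra information together with the deployment model that sees only test-time inputs, so that the deployment model draws on the extra information only when it genuinely helps prediction and does not inherit its errors. The method comes with theoretical guarantees on improved prediction accuracy and is validated on synthetic data and real-world tasks.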

Comments 37 pages, 6 figures. Accepted to ICML 2026


AI中文摘要

在许多预测问题中，我们在训练期间拥有额外信息（例如，昂贵或收集缓慢的测量值），但在模型部署时这些信息将不可用。一种常见策略是首先训练一个使用所有训练信息的模型，然后利用其对未标记样本的预测来训练第二个模型，该模型仅使用测试时可用的输入。然而，当额外的训练专用信息较弱或存在噪声时，这种两阶段方法可能会误导部署模型，甚至降低准确性。我们提出一种联合训练方法，同时学习两个模型，使得部署模型仅在额外信息真正有帮助时从中受益，而不是继承其错误。我们提供了描述联合训练何时提高预测准确性的保证，并分析了一种适用于大规模高维模型的简单交替训练算法。在合成数据和真实世界预测任务上的实验表明，我们的方法避免了这些失败，并稳健地优于标准两阶段基线。

英文摘要

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that uses all training information, then use its predictions on unlabeled examples to train a second model that only uses the inputs available at test time. However, when the extra training-only information is weak or noisy, this Two-Stage approach can mislead the deployment model and even hurt accuracy. We propose a joint training method that learns the two models together, so the deployment model can benefit from the extra information only when it actually helps, instead of inheriting its mistakes. We provide guarantees that describe when joint training improves prediction accuracy and analyze a simple alternating training algorithm for large, high-dimensional models. Experiments on synthetic data and real-world prediction tasks show that our approach avoids these failures and robustly outperforms standard Two-Stage baselines.


2605.23259 2026-05-25 cs.LG cs.AI cs.CL 版本更新

Multi-Gate Residuals

多门残差

Zhizhan Zheng, Feiyun Zhang, Shuchun Liu, Tian Xia, Xi Liu, Dasheng Hu, Hongquan Zhou

发表机构 * Shanghai Yichuang Information Technology Co.,Ltd.（上海亿创信息技术有限公司）； Fudan University（复旦大学）

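AI summary: This paper proposes Multi-Gate Residuals (MGR), a method that addresses unbounded activation growth in deep residual networks without introducing extra communication overhead. It maintains a multi-stream context through a simple scoring and gating mechanism and extracts hidden states with attention pooling, keeping activation scale stable while improving model performance. Experiments indicate that MGR is practical for large-scale training and deployment and outperforms existing architectures.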

2605.23258 2026-05-25 cs.LG 版本更新

A Simple Plug-in for Improving Eviction-Based KV Cache Compression

一种改进基于驱逐的KV缓存压缩的简单插件

Yuping Lin, Jiayuan Ding, Yue Xing, Pengfei He, Jiliang Tang, Subhabrata Mukherjee

发表机构 * Michigan State University（密歇根州立大学）； Hippocratic AI（希波克拉底AI）

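AI summary: Growth of the key-value (KV) cache is a major bottleneck in long-context inference for large language models. The paper proposes VECTOR, a plug-and-play method for improving eviction-based KV cache compression. It introduces a three-way token routing scheme (retain, approximate, evict) that combines the importance signal of a base scorer with a reconstructability signal from an offline-calibrated regression-based value estimate, improving the quality-memory trade-off under cache compression, most notably under tight memory budgets.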

2605.23255 2026-05-25 cs.LG cs.DS 版本更新

Learning-Augmented Online Scheduling with Parsimonious Preemption

具有节俭抢占的学习增强在线调度

Mugen Blue, Sungjin Im, Alexander Lindermayr

发表机构 * University of California, Santa Cruz（加州大学圣克鲁兹分校）； Institut für Mathematik, Technische Universität Berlin（柏林技术大学数学研究所）

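AI summary: This paper studies learning-augmented online scheduling, aiming to optimize job latency while reducing the number of preemptions (job switches). The authors propose a new algorithmic framework that preserves scheduling performance while keeping the number of preemptions per job at a constant level, with preemption overhead growing logarithmically in the prediction error. The work gives the first bounded-preemption theoretical guarantees for scheduling on unrelated and malleable machines, extending the reach of learning-augmented scheduling theory.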


AI中文摘要

学习增强算法已成为一种强大的范式，通过整合可能带有噪声的预测来超越传统的最坏情况下限。虽然该框架在在线调度中取得了成功，但现有工作主要优化作业延迟，同时依赖于频繁的“盲目”抢占。这忽略了算法性能与抢占复杂度之间的基本权衡。我们首次系统研究了在优化延迟的同时限制抢占的学习增强调度。我们证明了理论延迟界限与抢占开销之间的差距可以通过坚实的分析基础来弥合。我们的结果包括：在准确预测下，单机和无关并行机上每作业仅需$O(1)$次抢占的$O(1)$-竞争比算法，且开销随预测误差对数增长。通过为无关机和可塑机提供首个有界抢占保证，我们将学习增强框架的理论范围扩展到更受约束和更现实的设置。最后，通过实验验证了我们的算法。

英文摘要

Learning-augmented algorithms have emerged as a powerful paradigm to surpass traditional worst-case lower bounds by integrating potentially noisy predictions. While this framework has seen success in online scheduling, existing work primarily optimizes job latency while relying on frequent, "blind" preemptions. This ignores the fundamental trade-off between algorithmic performance and preemption complexity. We provide the first systematic study of learning-augmented scheduling that curbs preemption while optimizing latency. We establish that the gap between theoretical latency bounds and preemption overhead can be bridged with solid analytical foundations. Our results include $O(1)$-competitive algorithms for single and unrelated parallel machines with only $O(1)$ preemptions per job under accurate predictions, with overhead scaling logarithmically with the prediction error. By providing the first bounded-preemption guarantees for unrelated and malleable machines, we extend the theoretical reach of the learning-augmented framework to more constrained and realistic settings. Finally, our algorithms are validated through experiments.


2605.23249 2026-05-25 cs.LG cs.AI 版本更新

Enhancing Deep Neural Network Reliability with Refinement and Calibration

通过精炼和校准增强深度神经网络的可靠性

Ramya Hebbalaguppe, Ajay Shastry, Soumya Suvra Ghosal, Chetan Arora

发表机构 * SIT, Indian Institute of Technology Delhi, New Delhi, India（印度理工学院德里SIT，新德里）

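AI summary: Although deep neural networks achieve high predictive accuracy, their confidence estimates are often unreliable, which can undermine user trust in their decisions. The paper proposes a new loss function and a unified training framework, RefCal, that jointly improve calibration, sharpness (the gap in confidence between correct and incorrect predictions), and accuracy, making deep neural networks more reliable. Experiments show that RefCal substantially outperforms existing methods on class-imbalanced data.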

Comments ICLR 2026, Trustworthy AI and Representational Alignment


AI中文摘要

尽管深度神经网络（DNN）实现了高预测精度，但其置信度估计通常不可靠，可能损害用户对其决策的信任。这推动了校准模型的研究，其中校准衡量模型预测置信度与正确经验概率的一致性。然而，校准指标通常可以通过后处理技术改进，这些技术仅模仿训练时的不确定性，而并未真正提升模型的理解。因此，统计学家建议模型不仅要校准，还要精炼。直观上，如果模型对正确和错误预测分配显著不同的置信度分数，则被认为更精炼，这一属性也称为锐度。我们观察到，许多现有的校准方法以降低精炼度为代价来改善校准。为解决这一局限，我们提出：（1）一种新的损失函数，显式促进精炼度，并可通过监督对比学习优化；（2）一个统一的训练框架RefCal，联合优化校准、精炼度和准确性，以提高DNN的可靠性。在类别不平衡率为10%的CIFAR-100-LT数据集上，RefCal实现了（准确率，精炼度，ECE）为（58.81，95.67，0.08），显著优于广泛使用的Correctness Ranking Loss（46.27，93.7，0.22）。

英文摘要

Although deep neural networks (DNNs) achieve high predictive accuracy, their confidence estimates are often unreliable, potentially compromising user trust in their decisions. This has motivated research on calibrated models, where calibration measures how well a model's predicted confidence aligns with the empirical probability of correctness. However, calibration metrics can often be improved through post-processing techniques that merely mimic training-time uncertainty without genuinely improving the model's understanding. For this reason, statisticians recommend that models be not only calibrated but also refined. Intuitively, a model is considered more refined if it assigns significantly different confidence scores to correct and incorrect predictions, a property also referred to as sharpness. We observe that many existing calibration methods improve calibration at the cost of reduced refinement. To address this limitation, we propose: (1) a novel loss function that explicitly promotes refinement and can be optimized through supervised contrastive learning; and (2) a unified training framework, RefCal, that jointly optimizes calibration, refinement, and accuracy to improve DNN reliability. On the CIFAR-100-LT dataset with 10 percent class imbalance, RefCal achieves (accuracy, refinement, ECE) of (58.81, 95.67, 0.08), substantially outperforming the widely used Correctness Ranking Loss, which achieves (46.27, 93.7, 0.22).
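Since the abstract reports ECE alongside accuracy and refinement, a minimal Python sketch of the standard equal-width-bin Expected Calibration Error may help readers interpret that number. The 10-bin default and the function name are assumptions; the abstract does not state the binning used, and the paper's refinement metric is not reproduced here because its exact definition is not given.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     booleans, True where the prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # put confidence 1.0 in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that is 80% confident and right 80% of the time scores an ECE near zero, yet it may still give the same confidence to its right and wrong answers, which is the gap between calibration and refinement that the abstract highlights.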


2605.23244 2026-05-25 cs.LG 版本更新

Convex Optimization for Alignment and Preference Learning on a Single GPU

单GPU上的对齐与偏好学习的凸优化

Miria Feng, Mert Pilanci

发表机构 * Department of Electrical Engineering, Stanford University, California, United States（斯坦福大学电气工程系，加州，美国）

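AI summary: This paper proposes COALA, a convex optimization algorithm for carrying out alignment and preference learning of large language models efficiently on a single GPU. By reformulating the neural network as a convex optimization problem, the method removes the dependence on a reference model and substantially reduces training time and VRAM consumption. Experiments across several datasets and models show competitive performance and efficiency: COALA uses as little as about 17.6% of DPO's compute, its reward rises steadily during training, and it reaches peak performance in noticeably less time.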


AI中文摘要

微调大型语言模型（LLMs）以符合人类偏好推动了Gemini和ChatGPT等系统的成功。然而，从人类反馈中强化学习（RLHF）等方法仍然计算昂贵且复杂。直接偏好优化（DPO）提供了一种更简单的替代方案，但存在排名准确性不一致、对GPU资源依赖度高以及超参数调优成本高等局限性。我们提出了对齐与偏好学习的凸优化算法（COALA）：一种具有强理论保证的新型轻量级策略。通过利用神经网络的凸优化重表述，COALA消除了对参考模型的需求，并在训练时间和VRAM消耗上实现了显著减少，从而能够在单个GPU上进行高效训练。在四个数据集（包括一个26621样本的合成教育反馈数据集）和六个模型（包括Llama-3.1-8B）上的实验表明，COALA在仅使用DPO总TFLOPs约17.6%的情况下，展现了具有竞争力的性能和效率。与DPO和ORPO等传统方法相比，COALA表现出稳定、单调递增的奖励，并在显著更短的时间内达到峰值边际。据我们所知，这是首次将凸优化有效应用于LLMs的偏好微调。

英文摘要

Fine-tuning large language models (LLMs) to align with human preferences has driven the success of systems such as Gemini and ChatGPT. However, approaches like Reinforcement Learning from Human Feedback (RLHF) remain computationally expensive and complex. Direct Preference Optimization (DPO) offers a simpler alternative but has limitations such as inconsistent ranking accuracy, high dependence on GPU resources, and expensive hyperparameter tuning. We propose the Convex Optimization for Alignment and Preference Learning Algorithm (COALA): a novel lightweight strategy with strong theoretical guarantees. By leveraging the convex optimization reformulation of neural networks, COALA eliminates the need for a reference model and obtains significant reductions in both training time and VRAM consumption, thus enabling efficient training on a single GPU. Experiments across four datasets (including a 26621-sample synthetic Educational Feedback dataset) and six models (including Llama-3.1-8B) demonstrate COALA's competitive performance and efficiency while utilizing as little as about 17.6% of DPO's total TFLOPs. COALA exhibits stable, monotonically increasing rewards and reaches peak margins in significantly shorter time in comparison to traditional methods such as DPO and ORPO. To the best of our knowledge, this is the first time convex optimization has been effectively applied to preference fine-tuning of LLMs.
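For context on why removing the reference model matters, the following Python sketch computes the standard DPO loss for one preference pair, which is the baseline the abstract compares against and not the COALA objective. The argument names and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is a sequence log-probability: logp_* from the policy being
    trained, ref_logp_* from the frozen reference model.
    Loss = -log sigmoid(beta * [(logp_c - ref_c) - (logp_r - ref_r)]).
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) equals softplus(-m); branch to avoid overflow in exp().
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Both `ref_logp_*` terms require a forward pass through a second, frozen copy of the model for every training pair, so a method that needs no reference model avoids holding that copy in memory, which is consistent with the VRAM savings the abstract reports.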


2605.23241 2026-05-25 cs.LG 版本更新

RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

RelPrism：面向关系数据库的多面预训练框架与自生成任务

Jinyu Yang, Cheng Yang, Junze Chen, Zedi Liu, Muhan Zhang, Hanyang Peng, Chuan Shi

发表机构 * Beijing University Of Posts and Telecommunications（北京邮电大学）； Peng Cheng Laboratory（鹏城实验室）； Peking University（北京大学）

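AI summary: Relational databases (RDBs) remain at the core of modern data systems and support many predictive tasks. Existing relational deep learning methods convert a database into a graph and apply graph models for representation learning, but effective self-supervised pre-training is still difficult, particularly when tasks need information across multiple perspectives and granularities. The paper proposes RelPrism, a multi-faceted self-supervised learning framework that builds intrinsic, relational, and hybrid attributes from different perspectives and applies multi-granularity clustering to generate pseudo-tasks, making the pre-trained representations more adaptable. Experiments show that RelPrism outperforms existing methods on classification and regression tasks across several real-world datasets.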


AI中文摘要

关系数据库（RDB）仍然是现代数据系统的基石，并支持多种预测任务。最近的关系深度学习（RDL）方法通过将RDB转换为图（其中行表示为节点，表间交互表示为边），然后应用基于图的模型进行表示学习，从而实现端到端预测。尽管RDL具有强大的能力，但有效的自监督预训练对于RDB仍然具有挑战性。RDB任务通常需要跨不同视角和粒度的多面信息。例如，用户流失分类可能更依赖于交互模式，而消费价值预测则需要用户-项目行为和内在用户属性来进行细粒度回归。这种异构需求对RDB表示学习提出了挑战，因为预训练目标应涵盖全面的信息以适应下游任务。然而，现有的自监督学习方法通常从单一视角（如节点级内在属性或子图级关系结构）获取监督信号，适应性有限。为此，我们提出了RelPrism，一个面向RDB的多面自监督学习框架。RelPrism从不同视角构建内在、关系和混合属性，并对每个视角应用多粒度聚类以形成相应的伪任务池。在这些池上进行预训练使表示暴露于更广泛的视角和粒度级别，为下游适应提供了更强的基础。在5个真实数据集上的14个任务上的实验表明，RelPrism在分类任务上比最先进的基线提高了4.15%的ROC-AUC，在回归任务上降低了10.75%的MAE。我们的代码可在https://anonymous.4open.science/r/RelPrism获取。

英文摘要

Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows are represented as nodes and inter-table interactions are represented as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often require multi-faceted information across different perspectives and granularities. For example, user churn classification may rely more on interaction patterns, whereas consumption value prediction requires both user-item behaviors and intrinsic user attributes for fine-grained regression. Such heterogeneous needs challenge RDB representation learning, as pre-training objectives should cover comprehensive information for downstream adaptation. However, existing SSL methods typically derive supervision from a single facet, such as node-level intrinsic attributes or subgraph-level relational structures, providing limited adaptability. To this end, we propose RelPrism, a multi-faceted self-supervised learning framework for RDBs. RelPrism constructs intrinsic, relational, and hybrid attributes from distinct perspectives, and applies multi-granularity clustering to each perspective to form corresponding pseudo-task pools. Pre-training over these pools exposes representations to broader perspectives and granularity levels, yielding a stronger basis for downstream adaptation. Experiments on 14 tasks across 5 real-world datasets show that RelPrism improves ROC-AUC by 4.15% for classification and reduces MAE by 10.75% for regression over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/RelPrism.


2605.23238 2026-05-25 cs.AI cs.GT cs.LG cs.MA 版本更新

GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models

GENSTRAT：迈向大型语言模型中的战略推理科学

Vartan Shadarevian, Kia Ghods, Alex Kenich, Anany Kotawala

发表机构 * Princeton University（普林斯顿大学）； Google（谷歌）

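AI summary: This paper proposes GENSTRAT, an evaluation framework based on procedurally generated strategic environments, for assessing the reasoning of large language models in complex strategic settings more accurately. The method generates a family of two-player zero-sum imperfect-information card games and combines capability profiling with a "jaggedness" measure to assess how models perform, and how stable they are, across different strategic dimensions. Experiments show that frontier models perform better overall, but their capability profiles and local volatility differ markedly, providing finer-grained diagnostics for real deployments.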

Comments 33 pages, 8 figures, 9 tables (4 figures, 2 tables in main paper)


AI中文摘要

大型语言模型（LLMs）越来越多地被部署为市场、拍卖和竞价环境中的经济主体。预测它们在特定部署中的行为是困难的。现有的战略推理基准在固定的规范博弈上评估模型。这些基准可能会随着前沿模型的改进而饱和，并且不允许评估者从基准性能自信地推广到实际部署中涉及的各种混乱的战略环境。我们引入了GENSTRAT，它使用程序化生成的战略环境来解决这些挑战。具体来说，我们生成了一个两人零和、不完全信息纸牌游戏的分布。生成器可以按需生成新游戏，从而实现常青评估并抵抗污染。我们将游戏分布与一种能力剖面方法论配对，该方法论将模型能力分解为六个轴（状态空间、时间深度、信息敏感性、对手建模、风险和脆弱性）。我们还引入了一种分布内平滑度的锯齿度量，用于检测模型在战略相似游戏之间优势是否不可预测地跳跃。我们从2000个游戏的生成池中采样了50个基准游戏，并在一个包含超过36,000场比赛的正面交锋锦标赛中评估了九个前沿和开放权重LLM。较新的前沿模型平均得分更高。除了平均值之外，整体实力几乎相同的模型显示出性质不同的能力剖面，并且排行榜前三名模型中的两个（gpt-5和claude）在局部波动性上明显高于第三个（gemini-3.1-pro），尽管整体实力接近。总之，能力剖面和锯齿度量提供了仅靠整体排名无法提供的与部署相关的诊断信息。

英文摘要

Large language models (LLMs) are increasingly deployed as economic agents in marketplaces, auctions, and bidding settings. Anticipating their behavior in any specific deployment is hard. Existing strategic-reasoning benchmarks evaluate models on fixed canonical games. These benchmarks may saturate as the frontier improves, and they do not allow evaluators to generalize with confidence from benchmark performance to the varied and messy strategic environments that actual deployments involve. We introduce GENSTRAT, which uses procedurally generated strategic environments to address these challenges. Concretely, we generate a distribution of two-player zero-sum imperfect-information card games. The generator can draw fresh games on demand, allowing for evergreen evaluation and resistance to contamination. We pair the game distribution with a capability-profile methodology that decomposes model competence across six axes (state space, temporal depth, information sensitivity, opponent modeling, risk, and brittleness). We also introduce a jaggedness measure of within-distribution smoothness that detects when a model's advantage jumps unpredictably between strategically similar games. We sample 50 benchmark games from a 2,000-game generated pool and evaluate nine frontier and open-weight LLMs in a head-to-head tournament with over 36,000 matches. Newer frontier-tier models score higher on average. Beyond that average, models with near-identical overall strength show qualitatively different capability profiles, and two of the top three leaderboard models (gpt-5 and claude) are noticeably more locally volatile than the third (gemini-3.1-pro), despite being close in overall strength. Together, the capability profile and the jaggedness measure give a deployment-relevant diagnostic that the overall ranking alone cannot provide.


2605.23235 2026-05-25 cs.LG 版本更新

Convex Low-resource Accent-Robust Language Detection in Speech Recognition

语音识别中的凸低资源口音鲁棒语言检测

Miria Feng, William Tan, Mert Pilanci

发表机构 * Department of Electrical Engineering（电气工程系）； Department of Computer Science, Stanford University, California, United States（计算机科学系，斯坦福大学，加利福尼亚，美国）

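AI summary: As globalization and multiculturalism advance, speech recognition systems often perform poorly on under-resourced dialects and accents, misidentifying the language and degrading downstream dialogue tasks. The paper proposes Convex Language Detection (CLD), a low-resource, robust language detection method based on convex optimization. It uses theoretically grounded convex optimization techniques with a multi-GPU ADMM implementation to achieve efficient training and globally optimal solutions. The method has theoretical stability guarantees and, in experiments, is robust to dialectal variation in the input, reaching 97-98% accuracy even in low-resource conditions.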


AI中文摘要

全球化和多元文化持续产生日益多样化的语音变体。然而，当前的语音对话系统在处理代表性不足的方言和口音时经常失败，常常误识别输入语言，导致下游对话任务中的级联故障。在低资源约束下解决这种方言差异仍然是一个开放的挑战，因为标准微调计算成本高且容易在高维语音数据上过拟合。我们提出了凸语言检测（CLD），一种新颖的框架，将理论基础的凸优化技术集成到语音对话系统流程中。我们的方法通过JAX中的多GPU交替方向乘子法（ADMM）高效实现，从而提供全局最优性保证和多项式时间内的快速训练。理论上，我们证明了我们的凸目标诱导了认证的边际稳定性，并提供了对特征扰动的保证。实验上，我们展示了样本效率和对输入方言变化的鲁棒性，在具有挑战性的低资源场景中达到了97-98%的准确率。我们的开源包可在https://pypi.org/project/jaxcld/获取。

英文摘要

Globalization and multiculturalism continue to produce increasingly diverse speech varieties. Yet current spoken dialogue systems frequently fail on under-represented dialects and accents, often misidentifying the input language and causing cascading failures in downstream dialogue tasks. Addressing this dialectal variance under low-resource constraints remains an open challenge, as standard fine-tuning is computationally expensive and prone to overfitting on high-dimensional speech data. We propose Convex Language Detection (CLD), a novel framework that integrates theoretically grounded convex optimization techniques into the spoken dialogue systems pipeline. Our method is efficiently implemented via multi-GPU Alternating Direction Method of Multipliers (ADMM) in JAX, thus providing global optimality guarantees and fast training in polynomial time. Theoretically, we prove that our convex objective induces certified margin stability and provide guarantees against feature perturbations. Empirically, we demonstrate sample efficiency and robustness to input dialectal variation, achieving 97-98% accuracy in challenging low-resource regimes. Our open-source package is available at https://pypi.org/project/jaxcld/


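2605.23225 2026-05-25 cs.DS cs.DM cs.IT cs.LG math.IT math.ST stat.TH (updated version)

Entropy Equivalence Testing

Clément L. Canonne, Yash Pote, Jonathan Scarlett, Joy Qiping Yang

Affiliations: University of Sydney; National University of Singapore

AI summary: This paper introduces a new problem, entropy equivalence testing: deciding whether the entropies of two unknown distributions differ by more than a given threshold, a weaker requirement than classical closeness testing of distributions. The authors design an algorithm that is efficient in both time and samples and show that its sample complexity can be significantly lower than that of closeness testing. The result is then applied to closeness testing of low-degree Bayesian networks, significantly improving the sample or time efficiency of existing approaches based on full learning.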

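2605.23220 2026-05-25 cs.LG (updated version)

WMAttack: Automated Attack Search for Adversarial Evaluation of World-Model Agents

Zhixiang Guo, Siyuan Liang, Shi Fu, Cheng Guo, Andras Balogh, Mark Jelasity, Dacheng Tao

Affiliations: Nanyang Technological University; University of Szeged

AI summary: Although world models are increasingly used as decision-making agents, their adversarial robustness remains understudied for lack of dedicated automated evaluation methods. To resolve the tension between accuracy and efficiency in attack evaluation, this paper proposes WMAttack, an automated attack-search framework for adversarial evaluation of world-model agents. It searches over attack configurations under a finite budget and combines self-correcting attack search with representation-guided attack retrieval, improving both the efficiency and the effectiveness of attack discovery and outperforming the evaluated baselines across multiple benchmark tasks.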

Abstract

Despite the growing use of world models as decision-making agents, their adversarial robustness remains underexplored due to the lack of dedicated automated evaluation methods. A key obstacle is that attack evaluation must be both accurate and efficient: weak manually tuned attacks can overestimate robustness, while exhaustive hyperparameter search is prohibitively expensive because each candidate requires closed-loop rollouts through learned latent dynamics. We introduce WMAttack, an automated attack-search framework for adversarial evaluation of world-model agents. WMAttack formulates robustness evaluation as a finite-budget search over attack configurations, including attack families, perturbation budgets, optimization steps, restarts, and allocation rules. To improve search accuracy, Self-Correcting Attack Search (SCAS) refines the attack proposal distribution using feedback from reward degradation, action instability, runtime cost, and rollout variability. To improve search efficiency, Representation-Guided Attack Retrieval (RGAR) retrieves effective historical configurations from representation-similar tasks, providing a warm start for unseen environments. We provide a theoretical explanation showing that proposal refinement improves finite-budget search when it shifts probability mass toward high-utility attacks. Across Atari and DeepMind Control tasks, WMAttack consistently discovers stronger attacks than the evaluated baselines, improving normalized reward drop from 0.497 to 1.034 on DreamerV3 Atari and from 0.319 to 0.682 on DMC. Ablations further show that RGAR improves initial candidate quality and SCAS improves final attack utility under fixed evaluation budgets.


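2605.23219 2026-05-25 cs.LG cs.AI (updated version)

PaP-NF: Probabilistic Long-Term Time Series Forecasting via Prefix-as-Prompt Reprogramming and Normalizing Flows

Minju Kim, Youngbum Hur

Affiliations: Department of Industrial Engineering, Inha University, Incheon, Republic of Korea

AI summary: This paper proposes PaP-NF, a probabilistic long-term time series forecasting framework. A Prefix-as-Prompt mechanism aligns continuous time series representations with a frozen large language model, and a normalizing flow decoder is conditioned on the global context the model extracts, which lets the framework model uncertainty. PaP-NF performs well on several long-term forecasting benchmarks, capturing multi-modal uncertainty while keeping point-forecast accuracy competitive.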

Comments Accepted to ICPR 2026

Abstract

Time series forecasting plays a central role in many real-world applications and has been extensively studied. Most existing approaches rely on deterministic models. However, real-world environments exhibit inherently uncertain and complex future behaviors, making single-point predictions insufficient. This highlights the need for probabilistic forecasting methods that can quantify and represent uncertainty. In this work, we propose PaP-NF, a probabilistic forecasting framework that aligns continuous time series representations with a frozen large language model (LLM) using a Prefix-as-Prompt mechanism, and conditions a normalizing flow decoder on the global context extracted by the LLM. The quality of the resulting predictive distributions is evaluated using the Continuous Ranked Probability Score (CRPS), a standard metric in probabilistic forecasting. Across a variety of long-term forecasting benchmarks, PaP-NF robustly captures multi-modal uncertainty while maintaining competitive point forecasting accuracy. The official implementation is available at: https://github.com/democracy04/PaP-NF
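The abstract names CRPS as its evaluation metric. As a minimal sketch, assuming the predictive distribution is available only through samples (for example, draws from a flow decoder), the standard sample-based estimator is CRPS ≈ E|X − y| − ½·E|X − X′|. The function name `crps_from_samples` is illustrative and is not taken from the PaP-NF repository.

```python
def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one scalar observation y.

    Uses CRPS ~= mean|x_i - y| - 0.5 * mean|x_i - x_j| over all sample pairs.
    Lower is better; 0 means every sample equals the observation.
    """
    m = len(samples)
    # Average distance between forecast samples and the observation.
    dist_to_obs = sum(abs(x - y) for x in samples) / m
    # Average distance between pairs of forecast samples (forecast spread).
    pairwise = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return dist_to_obs - 0.5 * pairwise
```

The pairwise term rewards a forecast for being appropriately spread out, which is what distinguishes CRPS from plain mean absolute error on a point forecast.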


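2605.23215 2026-05-25 cs.LG cs.AI cs.CL (updated version)

FastKernels: Benchmarking GPU Kernel Generation in Production

Gabriele Oliaro, Yichao Fu, May Jiang, Owen Lu, Junli Wang, Zhihao Jia, Hao Zhang, Samyam Rajbhandari

Affiliations: Snowflake AI Research; CMU; UCSD; Independent Researcher

AI summary: Evaluation of LLM-based GPU kernel generation agents suffers from a mismatch between benchmarks and real production environments. To address this, the authors present FastKernels, a benchmark built from 46 representative architectures in 8 categories that together cover 96.2% of HuggingFace Transformers architectures, and that also serves as a production-grade inference framework. Experiments show that even the strongest state-of-the-art kernel generation agent reaches only 0.94x of the production baselines on FastKernels, highlighting the gap between benchmarks and production use as a key bottleneck.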

Abstract

LLM-based agents for GPU kernel generation are advancing rapidly, yet their progress is fundamentally constrained by the benchmarks they optimize against. Existing benchmarks are poorly aligned with production inference frameworks: they evaluate kernels on a single GPU with synthetic inputs, ignore the surrounding compilation stack, and reward replicating known optimizations rather than discovering new ones. The resulting reward signals are misleading: agents learn to generate kernels that score well in sandboxes but introduce interface incompatibilities, compilation-stack conflicts, and silent correctness degradation when integrated into real systems. We introduce FastKernels, a kernel benchmark built around a minimal set of 46 representative architectures spanning 8 categories, whose kernels collectively subsume those of 96.2% (409/425) of HuggingFace Transformers architectures. FastKernels doubles as a minimalistic, production-grade inference framework that runs at parity with hardened systems such as vLLM and SGLang on mainstream LLM serving and substantially exceeds upstream references on under-served architectures; each task's interface mirrors the corresponding module in the state-of-the-art library for its architecture family, enabling direct deployment of optimized kernels into production codebases. Evaluating state-of-the-art kernel agents on FastKernels, we find that even the strongest agent achieves only 0.94$\times$ aggregate speedup over production baselines, with weaker agents at $0.78\times$ and $0.53\times$ -- confirming that benchmark-production misalignment is a critical bottleneck for the field. We release FastKernels as a stepping stone toward kernel agents whose benchmark gains translate directly into production throughput improvements. Code is available at https://github.com/Snowflake-AI-Research/fastkernels


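2605.23203 2026-05-25 cs.CV cs.AI cs.LG cs.RO (updated version)

Lipschitz Optimization for Formal Verification of Homographies

Jean-Guillaume Durand, Panagiotis Kouvaros, Maxime Gariel, Alessio Lomuscio

Affiliations: Joby Aviation; Safe Intelligence

AI summary: This paper studies formal robustness verification for vision neural networks in safety-critical applications, focusing on how 3D perturbations caused by camera motion affect image formation. The authors propose a verification method based on Lipschitz optimization and piecewise continuity analysis, establish a closed-form mapping from camera pose to pixel values, and derive tight linear bounds on perturbed pixel values. The method applies to scenes with predominantly planar structure, as found in augmented reality, autonomous driving, and robotic manipulation, and is validated on several benchmarks, improving on prior work in both speed and bound tightness.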

Comments 18 pages, 13 figures, 6 tables, to be published at CVPR 2026

Abstract

The adoption of vision neural networks in regulated industries requires formal robustness guarantees, especially in safety-critical domains such as healthcare, autonomous vehicles, and aerospace. However, current approaches are confined to incomplete statistical verification or robustness to $\ell_p$-norm and affine transforms, which cover only a narrow subset of perturbations to the image formation process. In particular, robustness to camera motion remains an open problem despite being key to deploy many vision applications. We present a formal verification approach that targets robustness against 3D motion perturbations of the capturing camera. We first establish a closed-form mapping from camera pose to pixel values. By analyzing the continuity properties of the resulting homographies, we show that recent work on Lipschitz optimization and piecewise continuity can be extended to derive tight linear bounds on perturbed pixel values. Our approach applies to scenes with predominantly planar structure, such as ground planes in augmented reality, road markings and traffic signs in autonomous driving, or planar workspaces in robotic manipulation. This enables the first formal verification of projective geometry transforms, without complex simulation, surrogate networks, or explicit image-formation models. We validate our implementation and show up to 89% speedup and 7% tighter bounds over prior work. We then evaluate our method on the VNN-COMP benchmark and reveal systematic weaknesses to projective perturbations. Finally, we demonstrate a real-world case study on a safety-critical runway classifier, highlighting practical vulnerabilities to camera motion, and addressing a key challenge in the certification of learned models. Data and code are publicly available at https://github.com/jeangud/homography-verification .


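2605.23200 2026-05-25 cs.LG cs.AI (updated version)

Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation

Guoming Li, Shangyu Zhang, Junwei Pan, Wentao Ning, Jin Chen, Gengsheng Xue, Chao Zhou, Shudong Huang, Haijie Gu, Menglin Yang

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Tencent Inc., Shenzhen, China

AI summary: Scaling up recommendation models is a core challenge in recommender systems. To address the embedding collapse that the existing method RankMixer exhibits as it scales, this paper proposes RankElastor, a new architecture whose parameterized full mixing and GLU-style feedforward networks improve the spectral stability of representations and mitigate the decay of effective rank. Experiments on large-scale industrial datasets show that RankElastor consistently improves recommendation performance and scales more robustly.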

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Research Track), KDD 2026 February Cycle


DOI: 10.1145/3770855.3818049

Abstract

Scaling recommendation models is a central challenge in recommender systems. Recently, RankMixer has emerged as an effective solution, operating on a unified token representation and alternating between token mixing and per-token feedforward networks (P-FFNs) to achieve scalable performance. However, RankMixer suffers from embedding collapse, where learned representations have low effective rank, limiting expressivity and underutilizing the expanded representation space. Through empirical analysis and theoretical insights, we identify rigid token mixing and P-FFN modules as the primary causes of this phenomenon, jointly inducing a damped oscillatory trajectory in effective-rank evolution across layers. To address it, we propose RankElastor, a novel architecture that produces spectrum-robust representations with provable collapse mitigation. RankElastor introduces two components: (i) parameterized full mixing, which enables expressive token mixing with improved spectral robustness; and (ii) GLU-improved P-FFNs, which stabilize representation spectra through GLU-style FFN modules. Extensive experiments on large-scale industrial datasets demonstrate that RankElastor consistently improves recommendation performance, mitigates embedding collapse, and exhibits robust scaling behavior. Code is available at this GitHub repository: https://github.com/vasile-paskardlgm/RankElastor
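The abstract does not say how effective rank is computed. One widely used definition, assumed here, is the exponential of the Shannon entropy of the normalized singular values; the paper may use a different variant. The sketch takes the singular values of a representation matrix as given, and the name `effective_rank` is illustrative.

```python
import math

def effective_rank(singular_values):
    """Entropy-based effective rank: exp(H(p)) with p_i = s_i / sum(s).

    Assumed definition, not confirmed by the abstract. Equals the number
    of singular values when they are all equal, and approaches 1 when a
    single singular value dominates (the "collapse" regime).
    """
    total = sum(singular_values)
    # Zero singular values contribute nothing to the entropy.
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

Under this definition, a spectrum such as `[10, 0.1, 0.1, 0.1]` has an effective rank close to 1 even though the matrix is technically full rank, which is the kind of under-used representation space the abstract describes.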


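2605.23189 2026-05-25 cs.LG (updated version)

Empirical Bayes Conformal Prediction for Vision and Language Models

Jiapeng Zeng, Yogesh Prabhu, Zhanpeng Zeng, Michael A. Newton, Vikas Singh

Affiliations: University of Wisconsin–Madison; University of California San Diego; Xiamen University

AI summary: This paper proposes an empirical Bayes conformal prediction framework for vision and language models. It uses $r$-values to turn score variability into an uncertainty-aware nonconformity score, giving a more accurate judgment of whether a candidate belongs to the top-ranked group. The method preserves target coverage while reducing the inclusion of high-variance false candidates, and shows more stable rankings and smaller prediction sets on several benchmark tasks.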

Abstract

Conformal prediction (CP) gives distribution-free coverage for modern vision and language models, but it is often forced to make a ranking decision from a single unstable nonconformity score. Standard CP uses one realization, while average-then-calibrate variants smooth multiple realizations into a point estimate. Both options discard the inconsistency that can help identify whether a candidate is indeed stable. A weak answer can enter the conformal set even if the evidence is not strong, simply because one posterior sample or prompt phrasing made it look strong. But variability can help distinguish a stable signal from noise-driven fluctuations. We describe an empirical Bayes conformal prediction framework that uses $r$-values to convert score variability into an uncertainty informed nonconformity score. The resulting $r$-value estimates how likely a candidate's latent score belongs to the top-ranked group after accounting for both its mean score and its uncertainty. It admits both a closed-form Normal-Normal empirical Bayes estimator and a nonparametric posterior-sampling estimator. Using the $r$-value as the nonconformity score preserves the target conformal coverage while provably reducing the inclusion of high variance false candidates under mild regularity conditions. Across image classification, CLIP-based VLM benchmarks, and LLMs, we show that $r$-value conformal prediction preserves target coverage while improving ranking stability and reducing set size when variability is informative, and reverting to CP-like behavior when variability vanishes.
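For context, the baseline that the abstract calls standard CP can be sketched as split conformal prediction: calibrate a threshold on held-out nonconformity scores, then keep every candidate whose score falls at or below it. This is a generic sketch, not the paper's method; it does not compute $r$-values, which the framework would substitute as the nonconformity score. The function names are illustrative.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Finite-sample split-conformal threshold.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or +inf when n is too small to certify coverage 1 - alpha.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, qhat):
    """Keep candidates whose nonconformity score is at most the threshold."""
    return [label for label, s in candidate_scores.items() if s <= qhat]
```

A short usage sketch with made-up scores:

```python
cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
qhat = conformal_threshold(cal, alpha=0.2)
labels = prediction_set({"cat": 0.3, "dog": 0.95, "fox": 0.9}, qhat)
```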


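2605.23182 2026-05-25 cs.LG (updated version)

Pure Exploration for a Good Policy in Reinforcement Learning with Bandit Feedback

Zitian Li, Wang Chi Cheung

Affiliations: Department of Industrial Systems Engineering & Management

AI summary: This paper studies how, in reinforcement learning with only bandit feedback, to efficiently identify a policy that is "good enough" rather than optimal. The authors pose the Good Policy Identification (GPI) problem: given a reward threshold, find a policy that meets it or declare that none exists. They design a new algorithm, BEE-GPI, and derive upper bounds on its sample complexity for both positive and negative instances; the leading coefficient does not depend on the sizes of the state and action spaces, unlike in best policy identification. Experiments confirm the method's efficiency.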

Abstract

Pure exploration in episodic Reinforcement Learning has primarily focused on Best Policy Identification (BPI), which seeks to identify a (near)-optimal policy with high confidence. Motivated by practical settings where a "good enough" policy suffices, we study an alternate objective of Good Policy Identification (GPI). For a given reward threshold $μ_0$, GPI only requires identifying a policy with expected reward in an episode at least $μ_0$ if such a policy exists (positive instance), or declaring None if no such policy exists (negative instance). We formalize GPI under the fixed-confidence setting. We require the output to be correct with probability $\geq 1-δ$, and seek to minimize the expected sample complexity, which is the expected number of episodes explored for the output. We propose a novel algorithm BEE-GPI, and derive theoretically-grounded upper bounds on its sample complexity for positive and negative instances. Notably, for positive instances, the coefficient of $\log(1/δ)$ in our upper bound is $O(H^2/(V^* - μ_0)^2)$, where $H$ is the episode length and $V^*$ is the optimal expected reward in an episode. The coefficient does not depend on the action and state space sizes otherwise, in sharp contrast to the sample complexity in BPI. We further establish lower bound results to show the near-optimality of BEE-GPI and the necessity of the $1/(V^* - μ_0)^2$ term. Numerical experiments further validate the efficiency of our approach.
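To make the scaling in the positive-instance bound concrete, the sketch below evaluates the leading term $H^2/(V^* - μ_0)^2 \cdot \log(1/δ)$ with the hidden constant set to 1. That constant and any lower-order terms are not given in the abstract, so the numbers are only meaningful as ratios. The function name is illustrative.

```python
import math

def gpi_leading_term(horizon, v_star, mu0, delta):
    """Leading term H^2 / (V* - mu0)^2 * log(1/delta), constant set to 1.

    Assumes a positive instance with a strictly positive gap V* - mu0;
    the expression is undefined when the gap is zero or negative.
    """
    gap = v_star - mu0
    if gap <= 0:
        raise ValueError("leading term requires V* > mu0")
    return (horizon ** 2 / gap ** 2) * math.log(1.0 / delta)
```

Halving the gap between the optimal value and the threshold quadruples the term, while the numbers of states and actions do not appear in it at all, which is the contrast with BPI that the abstract highlights.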


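2605.23180 2026-05-25 cs.CL cs.LG (updated version)

Self-Improving In-Context Learning

Baturay Saglam, Dionysis Kalogerias

Affiliations: Department of Electrical and Computer Engineering

AI summary: This paper improves in-context learning (ICL) by optimizing the continuous embeddings of a fixed few-shot prompt at test time. The log-probabilities a model assigns to the demonstrated outputs serve as a useful signal of how well it has inferred the task; the authors turn this into a self-supervised confidence proxy that needs no extra data and calibrate the prompt embeddings with zeroth-order optimization. The method requires no finetuning, token generation, or predefined label set, applies to both classification and free-form generation, and matches or improves on the base model across many ICL tasks, supporting the validity of the optimization signal.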

Abstract

We propose to improve in-context learning (ICL) by optimizing the continuous embeddings of a fixed few-shot prompt at test time. The key observation is that the log-probabilities a model assigns to its demonstrated outputs (available from a single forward pass without generating any tokens) provide a meaningful signal for how well the model has inferred the task from its demonstrations. We formalize this signal as a bounded, self-supervised confidence proxy and maximize it via zeroth-order optimization over the prompt embeddings, yielding a test-time calibration procedure. The approach requires no finetuning, no token generation, no predefined label set, and no external data, making it equally applicable to both classification and free-form generation tasks. Across a comprehensive suite of ICL tasks, the proposed calibration consistently matches or improves upon the base model and outperforms classification-specific baselines on most tasks. The statistically significant correlation between proxy improvement and downstream accuracy gain confirms that the proposed proxy encodes a reliable optimization signal for in-context learning.
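Zeroth-order optimization maximizes an objective using function evaluations only, with no backpropagation. A minimal sketch of one common variant, a two-point random-direction gradient estimate, is shown below. The abstract does not say which estimator the authors use, and `toy_proxy` is a stand-in concave function, not the paper's confidence proxy; in the paper's setting the vector `x` would be the prompt embeddings and `f` a forward pass through the model.

```python
import random

TARGET = [1.0, -2.0]  # hypothetical optimum of the stand-in objective

def toy_proxy(x):
    """Stand-in objective (concave, maximized at TARGET)."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))

def zeroth_order_ascent(f, x0, steps=500, mu=1e-3, lr=0.05, seed=0):
    """Maximize f using only evaluations of f (two per step)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Random Gaussian probe direction.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Central difference estimates the directional derivative along u.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        # Ascent step along u, scaled by the estimated slope.
        x = [xi + lr * slope * ui for xi, ui in zip(x, u)]
    return x
```

Each step costs two evaluations of `f` regardless of dimension, which is why this style of method can work from forward passes alone.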


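2605.23171 2026-05-25 cs.LG cs.AI stat.ML (updated version)

Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning

Abhay Yadav

Affiliations: Johns Hopkins University

AI summary: This study examines the technique of adding noise to the embedding layer during instruction fine-tuning, analyzes how uniform and Gaussian noise differ in effect, and proposes SymNoise, a new symmetric noisy-embedding method. Theoretical and empirical analysis finds that the two noise types perform comparably, while SymNoise improves fine-tuning by more stringently regulating the model's local curvature. With LLaMA-2-7B fine-tuned on Alpaca, SymNoise scores 69.04% on AlpacaEval against 64.69% for NEFTune, a relative improvement of about 6.7%.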

Comments arXiv admin note: substantial text overlap with arXiv:2312.01523


Journal ref: IEEE International Conference on Language Modeling (COLM), 2025

Abstract

Recent advancements in instruction fine-tuning have injected noise into embeddings, with NEFTune (Jain et al., 2024) setting benchmarks using uniform noise. Despite NEFTune's empirical findings that uniform noise outperforms Gaussian noise, the reasons for this remain unclear. This paper aims to clarify this by offering a thorough analysis, both theoretical and empirical, indicating comparable performance among these noise types. Additionally, we introduce a new fine-tuning method for language models, utilizing symmetric noise in embeddings. This method aims to enhance the model's function by more stringently regulating its local curvature, demonstrating superior performance over the current method, NEFTune. When fine-tuning the LLaMA-2-7B model using Alpaca, standard techniques yield a 29.79% score on AlpacaEval. However, our approach, SymNoise, increases this score significantly to 69.04%, using symmetric noisy embeddings. This is a 6.7% relative improvement (4.35 percentage points) over the state-of-the-art method, NEFTune (64.69%). Furthermore, when tested on various models and stronger baseline instruction datasets, such as Evol-Instruct, ShareGPT, and OpenPlatypus, SymNoise consistently outperforms NEFTune. The current literature, including NEFTune, has underscored the importance of more in-depth research into the application of noise-based strategies in the fine-tuning of language models. Our approach, SymNoise, is another significant step in this direction, showing notable improvement over the existing state-of-the-art method.
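For reference, the NEFTune baseline adds uniform noise to the token embeddings during training, scaled by $α/\sqrt{Ld}$ for a sequence of length $L$ and embedding dimension $d$. The sketch below shows that baseline on plain nested lists. It does not implement SymNoise, whose noise distribution is not specified in the abstract. The function name and the default `alpha` are illustrative.

```python
import math
import random

def neftune_noise(embeddings, alpha=5.0, seed=0):
    """Add NEFTune-style uniform noise to an L x d embedding matrix.

    Each entry receives (alpha / sqrt(L * d)) * Uniform(-1, 1).
    `embeddings` is a list of L rows, each a list of d floats.
    """
    rng = random.Random(seed)
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[v + scale * rng.uniform(-1.0, 1.0) for v in row]
            for row in embeddings]
```

Because the scale shrinks with $\sqrt{Ld}$, the per-entry perturbation gets smaller for longer sequences and wider models. Noise of this kind is applied only during fine-tuning, not at inference.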


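2605.23170 2026-05-25 cs.CL cs.AI cs.LG (updated version)

Positional Failures in Long-Context LLMs: A Blind Spot in Reasoning Benchmarks

Chuyifei Zhang, Hongyu Cui, Xiaowen Huang, Jitao Sang

Affiliations: Beijing Jiaotong University; Central South University of Forestry and Technology

AI summary: This study points out that mainstream reasoning benchmarks for long-context large language models do not control task position, so they cannot measure how performance varies with position. The authors propose Context Rot Evaluation (CRE), a framework that systematically controls task position, filler content, and context length. Experiments show that performance can drop sharply when the target task moves from the end of the context to the middle, and that for vulnerable models the drop worsens as the context grows. Appending a copy of the task at the end largely removes the positional drop, exposing a structural blind spot in current benchmark design.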

Comments 20 pages, 1 figure, 23 tables

Abstract

Position-controlled evaluation is standard for retrieval tasks such as Needle-in-a-Haystack and RULER, but mainstream reasoning benchmarks do not control positional placement of target tasks in long contexts. We audit 11 long-context benchmarks and find none jointly controls task position, filler content, and context length for reasoning. An audit of four flagship long-context releases finds no main result-table entry for NIAH, RULER, or LongBench-family benchmarks, while agentic and coding benchmarks appear in main result-tables across all four. We propose Context Rot Evaluation (CRE), a controlled framework varying all three factors, and evaluate nine LLMs on GSM8K and ARC-Challenge across two rounds: an initial five-model set and four newer vendor releases. Models can drop sharply when the target task moves from end to middle, and the drop grows worse with context length for vulnerable models. MiMo-v2-Flash drops 88pp at 64K under with_solutions filler (middle accuracy 8%). Newer releases show smaller drops: at 64K, three of four stay within +/-6pp of end-position accuracy; MiMo-V2.5-Pro narrows the MiMo-v2-Flash 88pp drop to 32pp. Under questions_only_v2 filler, middle-position drops persist across all four (range -16pp to -56pp across 8K, 32K, 64K). At 8K, a diagnostic probe adding a target-task copy at the end brings middle accuracy within +/-4pp of end baseline across all nine models, consistent with a positional explanation. In the initial five-model set, 76% of middle-position errors match surrounding filler text versus 22% at the end position, consistent with filler-answer interference as a dominant error mode. These results expose a structural evaluation gap in current reasoning benchmark design and vendor evaluation practice: positional vulnerabilities that grow with context length cannot be measured when task position is not controlled.
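The position-controlled setup described above can be illustrated with a small prompt builder. This is a hypothetical helper, not code from CRE: the abstract does not describe how the framework assembles prompts, and the position names, separator, and `append_copy` flag (mirroring the diagnostic probe that repeats the task at the end) are assumptions.

```python
def build_context(task, fillers, position, append_copy=False):
    """Place `task` among filler passages at a controlled position.

    position: "start", "middle", or "end".
    append_copy: if True, repeat the task after the last passage.
    """
    if position == "start":
        idx = 0
    elif position == "middle":
        idx = len(fillers) // 2
    elif position == "end":
        idx = len(fillers)
    else:
        raise ValueError(f"unknown position: {position}")
    # Insert the task between filler passages without altering them.
    parts = fillers[:idx] + [task] + fillers[idx:]
    if append_copy:
        parts.append(task)
    return "\n\n".join(parts)
```

Holding the filler list and its total length fixed while varying only `position` is what isolates the positional effect from the effects of content and context length.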


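2605.23168 2026-05-25 cs.CR cs.AI cs.LG (updated version)

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru, Alina Oprea

Affiliations: Department of Computer Science

AI summary: This paper presents PoisonForge, a benchmark for targeted task-level poisoning of instruction-tuned large language models that measures how vulnerable models are to malicious data under a limited poison budget. The benchmark parameterizes the threat along four dimensions and evaluates 12 open-weight models across five families; 11 of the 12 exceed a 70% attack success rate in their most vulnerable configuration, while the effect on non-target tasks is minimal. The analysis identifies the key factors behind attack success and finds that poisoning design choices, rather than model scale, are its main cause.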

Abstract

When practitioners fine-tune LLMs on unvetted datasets, an adversary can exploit the data supply chain through task-level poisoning: inserting a small number of crafted instruction-response pairs that cause the model to embed attacker-specified entities, such as a country, in outputs for a targeted task family while behaving normally elsewhere. We introduce PoisonForge, a benchmark that parameterizes this threat along four dimensions (bias type, poisoning mode, appearance count, and target output length) and evaluates 12 open-weight models (from 2B to 32B parameters) across five families under a primarily 1% poison budget. With only 10 poisoned examples among 1,000 fine-tuning examples, 11 of 12 models exceed a 70% attack success rate (ASR) in their most vulnerable configuration. Meanwhile, unintended leakage to non-target tasks remains below 0.5%, and models perform well on standard benchmarks. We analyze in detail the factors contributing to attack success. We observe that multiple appearances of an entity increase the ASR, the optimal poisoning mode depends on the semantic structure of the target entity, and ASR drops monotonically with the task output length. A correlation analysis and risk prediction model confirm that poisoning design choices, rather than model scale, are the primary causes of attack success, and that these patterns generalize to predict attack success on new tasks. We release all configurations, pipelines, and analysis code to support reproducible comparisons.


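2605.23158 2026-05-25 cs.CR cs.CL cs.LG (updated version)

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

Mingyuan Fan, Yu Liu, Fuyi Wang, Cen Chen

Affiliations: East China Normal University; RMIT University

AI summary: This paper studies how large language models (LLMs) can leak user privacy under split inference. The authors propose ActInv, which reconstructs the client's input by matching intermediate activations, exposing a privacy vulnerability in split inference. They also introduce the Perturbation Amplification Factor (PAF) to quantify each layer's resistance to reconstruction and design the PriPert defense, which improves privacy protection while preserving model utility and computational efficiency.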

Comments Accepted to ACM CCS'26

Abstract

The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and enhance privacy by transmitting only intermediate activations. However, the privacy-preserving capabilities of split inference, particularly in the context of LLMs, have not been exhaustively investigated. To fill this gap, we introduce ActInv, which solves an intermediate activation matching problem to reconstruct the client's input. Extensive evaluations demonstrate that ActInv achieves high-fidelity reconstructions, even in the presence of common perturbation-based defenses such as Gaussian noise injection and activation sparsification. To systematically understand this vulnerability, we develop Perturbation Amplification Factor (PAF), a metric for quantifying a layer's inherent resistance to reconstruction. Our analysis reveals that privacy vulnerability is not uniform across layers, with some layers being highly susceptible to leakage while others offer natural resistance. Furthermore, we demonstrate that defense effectiveness can be significantly improved by calibrating perturbation directions to maximize reconstruction error during backpropagation. Building on these insights, we design PriPert and conduct comprehensive evaluations, covering privacy, utility, and computational overhead, to demonstrate its effectiveness.


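2605.23156 2026-05-25 cs.LG math.FA math.RT stat.ML (updated version)

Any-Dimensional Invariant Universality

Shengtai Yao, Eitan Levin, Mateo Díaz

Affiliations: Department of Applied Mathematics and Statistics, Johns Hopkins University; Department of Computing and Mathematical Sciences, Caltech

AI summary: This paper studies universality for machine learning models defined on inputs of any size, such as graphs with different numbers of nodes or point clouds with varying numbers of points. Universality is traditionally analyzed for fixed-size inputs; the authors instead give a systematic approach that identifies an any-dimensional function with a function on a suitable infinite-dimensional limit space. Using the symmetries of the inputs and the relations between inputs of different sizes, they define a natural topology on this space and show how any-dimensional universality can be established on it. They also show that several existing architectures fail to be universal and propose simple modifications that restore universality.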

AI Chinese abstract

一些机器学习模型是为任意大小的输入定义的，例如具有不同节点数的图和包含不同点数的点云。这类任意维模型的万能逼近性（universality）仍然知之甚少，因为万能逼近性传统上是针对接受固定大小输入的模型、在其定义域的紧致子集上研究的。与此形成鲜明对比的是，任意维模型可以被视为定义在规模不断增长的输入上的函数序列，目前尚不清楚它们在何种意义上具有万能逼近性。我们开发了一种系统的方法来建立任意维万能逼近性：将任意维函数与一个唯一的函数等同起来，该函数的输入取自合适的无限维极限空间，该空间包含所有有限大小的输入及其极限。利用这些输入的对称性以及不同大小输入之间的关系，我们证明了该极限空间具有自然的拓扑结构，并且包含丰富的紧致集族，在这些紧致集上可以建立任意维万能逼近性。我们展示了几种现有架构不具备万能逼近性，并提出了恢复万能逼近性的简单修改，以此说明我们的方法。

English abstract

Several machine learning models are defined for inputs of any size, such as graphs with different numbers of nodes and point clouds containing varying numbers of points. The universality properties of such any-dimensional models remain poorly understood, as universality is traditionally studied for models accepting inputs of a fixed size, defined on a compact subset of their domain. In sharp contrast, any-dimensional models can be viewed as sequences of functions defined on growing-sized inputs, and it is not clear in which sense they can be universal. We develop a systematic approach to establish any-dimensional universality, by identifying any-dimensional functions with a unique function taking inputs in a suitable infinite-dimensional limit space containing inputs of all finite sizes as well as their limits. Using the symmetries of these inputs and relations between inputs of different sizes, we show that this limit space admits a natural topology with rich families of compact sets on which any-dimensional universality can be established. We illustrate our approach by showing that several existing architectures fail to be universal, and we propose simple modifications that restore universality.


2605.23146 2026-05-25 cs.LG cs.AI updated version

Archimedean Copula Inference via Taylor-Mode Automatic Differentiation

Cambridge Yang, Dongdong Li

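Affiliations: Cambridge Yang; Harvard Medical School

AI summary: This work presents \textsc{acopula}, a JAX framework for computing exact likelihoods and parameter gradients of arbitrarily nested Archimedean copula models in high dimensions under per-variable right-censoring. Its core mechanism is polynomial powering of Taylor-mode automatic differentiation output, which replaces hand-derived Bell polynomial tables and so supports arbitrary generators and complex nesting structures. Experiments on high-dimensional data and on large financial and medical datasets show the framework's flexibility and a substantial speedup over existing tools.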

AI Chinese abstract

现有的嵌套阿基米德Copula工具无法同时处理以下三个方面：(a) 生存分析中任意变量的（右）删失，(b) 任意嵌套树，以及(c) 精确参数梯度。现有实现仅处理双变量问题、低维（即$d \leq 10$）情况、两层嵌套或仅手工推导的Copula嵌套。我们提出 \textsc{acopula}，一个JAX原生框架，给定任意阿基米德生成元——经典或神经——在多项式时间内，在任意删失掩码下评估精确的嵌套Copula似然和参数梯度。其机制是泰勒模式自动微分输出的多项式幂运算，用单个可微计算替代每个族手工推导的偏贝尔多项式表，任何用户定义的生成元都可以驱动该计算。我们进行了大量模拟以验证 \textsc{acopula} 的正确性。然后我们展示了：(a) 在$d=53$的高维MIMIC-IV ICU入院数据（$85{,}229$条记录）上的逐变量删失，由经典阿基米德族和嵌套神经阿基米德Copula拟合；(b) 在$d=98$的标普500日收益率上的11部门层次模型；(c) 在一项视网膜病变研究中，跨十个族（其中五个族之前没有实现）的族无关删失MLE；以及(d) 在$d=35$时，相对于R的 \texttt{nacLL} 每密度加速约$650$倍，且二次扩展到$d=8{,}000$。

English abstract

No existing nested Archimedean copula tool handles all three of (a) arbitrary per-variable (right-)censoring in survival analysis, (b) arbitrary nesting trees, and (c) exact parameter gradients. Existing implementations handle only bivariate problems, low dimensional (i.e., $d \leq 10$) cases, two layers of nesting, or only hand-derived copula nestings. We present \textsc{acopula}, a JAX-native framework that, given any Archimedean generator -- classical or neural -- evaluates exact nested-copula likelihoods and parameter gradients under arbitrary censoring masks in polynomial time. The mechanism is polynomial powering of Taylor-mode automatic differentiation output, which replaces per-family hand-derived partial Bell polynomial tables with a single differentiable computation that any user-defined generator can drive. We conduct extensive simulations to verify the correctness of \textsc{acopula}. We then demonstrate (a) per-variable censoring on $85{,}229$ MIMIC-IV ICU admissions in high dimensions with $d{=}53$, fit by both classical Archimedean families and nested neural Archimedean copulas; (b) an 11-sector hierarchical model on S\&P~500 daily returns at $d{=}98$; (c) family-agnostic censored MLE across ten families, five of them with no prior implementation, on a retinopathy study; and (d) a ${\sim}650\times$ per-density speedup over R's \texttt{nacLL} at $d{=}35$, scaling quadratically to $d{=}8{,}000$.


2605.23131 2026-05-25 cs.LG updated version

When Determinants Are Not Enough: Private Rare Switching

Xingyu Zhou

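Affiliations: Wayne State University

AI summary: This paper examines the limitations of conventional determinant-based update rules for linear contextual bandits and reinforcement learning under privacy constraints. When Gaussian noise is added to satisfy the privacy requirement, the design matrix may no longer grow monotonically, so the original analysis no longer applies. To address this, the author proposes a rare-switching rule based on the generalized Rayleigh quotient, which restores a logarithmic number of policy updates and constant-factor control of the confidence width, enabling effective rare switching in the private setting.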

2605.23118 2026-05-25 cs.CV cs.AI cs.LG updated version

Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking

Yannick Kirchhoff, Maximilian Rokuss, Daniel Philipp Mertens, David Füller, Benjamin Hamm, Andreas Schreyer, Oliver Ritter, Klaus Maier-Hein

Affiliations: German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany; Faculty of Mathematics and Computer Science, Heidelberg University, Germany; HIDSS4Health -- Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany; Medical Faculty, Heidelberg University, Germany; University Hospital Brandenburg an der Havel, Brandenburg Medical School Theodor Fontane, Germany; Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Germany

AI summary: This paper studies how to exploit longitudinal imaging information in clinician-verified interactive lesion tracking to track tumors more accurately across serial CT scans. The authors propose a Verified Tracking paradigm in which a clinician verifies a registration-proposed prompt, and the model combines it with the lesion's baseline appearance to resolve segmentation ambiguities. The method pairs early spatial prompt fusion with latent temporal difference weighting in a unified framework for longitudinally informed segmentation, and uses large-scale synthetic pretraining to overcome data scarcity. In experiments it outperforms prior work in both the fully automatic and the verified tracking settings, and it took first place in the MICCAI autoPET IV challenge.

Comments Accepted at MICCAI 2026

AI Chinese abstract

在系列CT扫描中追踪肿瘤病灶对于肿瘤学反应评估至关重要。现有的自动化方法面临一个基本权衡：端到端追踪器实现高度自动化，但无法纠正无声的追踪失败；而解耦的配准-分割流程允许用户验证，却丢弃了病灶的先验外观，限制了在模糊情况下的准确性。在这项工作中，我们提出了一种验证追踪范式：临床医生验证配准提出的提示，模型利用该提示以及基线病灶外观来解决分割模糊性。我们提出了一个统一框架，结合早期空间提示融合与潜在时间差异加权，用于纵向信息感知的分割。为了解决数据稀缺问题，我们利用大规模合成预训练，证明这对于利用纵向上下文至关重要，相比从头训练性能提升高达4.5个Dice点。我们的方法在MICCAI autoPET IV挑战中获得第一名。我们进一步整理并发布了PanTrack，一个新的纵向胰腺癌基准，以评估分布外泛化能力。实验表明，我们的模型在全自动和所提出的验证追踪设置中均优于先前工作，在自动化与控制之间提供了一个临床安全的中间地带。代码、模型和数据集将在https://github.com/MIC-DKFZ/LongiSeg发布。

English abstract

Tracking tumor lesions across serial CT scans is essential for oncological response assessment. Existing automated methods face a fundamental trade-off: end-to-end trackers achieve high automation but offer no opportunity to correct silent tracking failures, while decoupled registration-segmentation pipelines permit user verification yet discard the lesion's prior appearance, limiting accuracy in ambiguous cases. In this work, we propose a Verified Tracking paradigm: a clinician verifies a registration-proposed prompt, which the model leverages alongside the baseline lesion appearance to resolve segmentation ambiguities. We present a unified framework combining early spatial prompt fusion with latent temporal difference weighting for longitudinally-informed segmentation. To address data scarcity, we leverage large-scale synthetic pretraining, proving essential for exploiting longitudinal context, improving performance by up to 4.5 Dice points over training from scratch. Our approach secured first place in the MICCAI autoPET IV challenge. We further curate and release PanTrack, a new longitudinal pancreatic cancer benchmark, to assess out-of-distribution generalization. Experiments show that our model outperforms prior work in both fully automatic and the proposed verified tracking setting offering a clinically safe middle ground between automation and control. Code, model and dataset will be released at https://github.com/MIC-DKFZ/LongiSeg


2605.23115 2026-05-25 cs.LG stat.ML updated version

Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift

Yiming Ma

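Affiliations: Department of Statistics; Finance, School of Management, University of Science

AI summary: This paper treats March Citi Bike demand prediction in New York from 2021 to 2026 as a temporal domain adaptation problem and proposes Gen-ROTDA, a robust optimal transport-guided residual domain adaptation framework. The method fits a station-time anchor in the target domain, transfers residuals rather than raw demand, and applies a deterministic label-preserving residual feature generator, which makes the model more robust to temporal domain shift. In the experiments, Gen-ROTDA achieves the lowest mean absolute error on the main 2025 to 2026 task, is the best optimal-transport method on average across the multi-year tasks, and is notably more stable than non-robust variants on noisy data.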

AI Chinese abstract

基于历史站点-小时数据训练的共享单车模型在后续年份部署时，由于出行模式随时间变化，性能可能会下降。本文将2021年至2026年3月Citi Bike需求预测作为时间域适应问题进行研究，并提出了Gen-ROTDA，一种鲁棒最优传输引导的残差域适应框架。该方法利用少量标记目标子集拟合目标域站点-时间锚点，传输残差而非原始需求，应用确定性标签保持残差特征生成器，并在训练最终残差预测器之前修剪高成本传输匹配。实验将Gen-ROTDA与仅锚点、仅源域、仅目标域、微调、MMD适应、Sinkhorn OTDA、ROTDA和Gen-OTDA进行比较。Gen-ROTDA在2025年至2026年主要任务上取得了最低MAE，并且在多年度任务中平均表现最佳，尽管微调和MMD适应仍然是强大的整体基线。在异常目标无标签记录下，Gen-ROTDA比非鲁棒OT变体稳定得多，表明鲁棒传输对于共享单车需求预测中的噪声时间迁移是有用的。

English abstract

Bike-sharing models trained on historical station-hour data may degrade when deployed in later years because travel patterns change over time. This paper studies March Citi Bike demand prediction from 2021 to 2026 as a temporal domain adaptation problem and proposes Gen-ROTDA, a robust optimal transport-guided residual domain adaptation framework. The method fits a target-domain station-time anchor with a small labeled target subset, transfers residual rather than raw demand, applies a deterministic label-preserving residual feature generator, and trims high-cost transport matches before training the final residual predictor. Experiments compare Gen-ROTDA with anchor-only, source-only, target-only, fine-tuning, MMD adaptation, Sinkhorn OTDA, ROTDA, and Gen-OTDA. Gen-ROTDA achieves the lowest MAE on the main 2025 to 2026 task and is the best OT-family method on average across multi-year tasks, although fine-tuning and MMD adaptation remain strong overall baselines. Under abnormal target-unlabeled records, Gen-ROTDA is much more stable than non-robust OT variants, suggesting that robust transport is useful for noisy temporal transfer in bike-sharing demand prediction.


2605.23102 2026-05-25 stat.ML cs.LG stat.ME updated version

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

Joe Sharratt

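Affiliations: NVIDIA Corporation

AI summary: The quadratic cost of attention is a key challenge in long-context workloads. ThriftAttention proposes a selective mixed-precision method that keeps the inference efficiency of FP4 while markedly improving model quality in long-context settings. It works in two stages: a small number of important query-key block pairs are selected and computed in FP16, the remaining blocks are computed in FP4, and the two results are merged through online softmax. With only 5% of blocks in FP16, it recovers on average 89.1% of the FP4-to-FP16 performance gap.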

AI Chinese abstract

高效的注意力算法对于减轻长上下文工作负载中注意力的二次成本至关重要。先前的工作在Blackwell GPU上利用块缩放量化技术将注意力计算移至4位精度以加速推理。然而，这些技术在长上下文设置中会导致显著的质量下降。我们表明，量化误差的输出影响高度不均匀，并且随着每个查询-键交互的重要性而增加，将功能相关的误差集中在包含最重要标记的少量注意力块中。我们提出ThriftAttention，一种低比特注意力变体，在FP4推理效率下提供接近FP16的长上下文质量。该方法分两个阶段进行。首先，一种启发式方法快速选择少量重要的查询-键块对进行FP16精度计算。其次，选中的块以FP16计算，其余块以FP4计算，两条路径通过在线softmax合并为单个输出。我们在长上下文基准和模型家族上证明，通过仅计算5%的查询-键块为FP16，ThriftAttention平均恢复了FP4到FP16性能差距的89.1%。我们展示了ThriftAttention的优势随序列长度增加而增长，缓解了在更长上下文中观察到的系统性FP4质量下降。代码可在https://github.com/joesharratt1229/ThriftAttention获取。

English abstract

Efficient attention algorithms are critical to mitigate the quadratic cost of attention in long-context workloads. Prior work utilises block-scaled quantisation techniques on Blackwell GPUs to move attention computation to 4-bit precision to accelerate inference. However, these techniques result in significant quality degradation in long-context settings. We show that the output impact of quantisation error is highly non-uniform and increases with the importance of each query-key interaction, concentrating functionally relevant error in a small number of attention blocks that contain the most important tokens. We propose ThriftAttention, a low-bit attention variant that delivers near-FP16 long-context quality at FP4 inference efficiency. This approach proceeds in two stages. First, a heuristic rapidly selects a small number of important query-key block pairs for FP16 precision. Second, the selected blocks are computed in FP16 and the remaining blocks in FP4, with both paths merged via online softmax into a single output. We demonstrate across long-context benchmarks and model families that by computing only 5% of query-key blocks in FP16, ThriftAttention recovers on average 89.1% of the FP4-to-FP16 performance gap. We show ThriftAttention's advantage grows with sequence length, mitigating the systematic FP4 quality degradation observed at longer contexts. The code is available at https://github.com/joesharratt1229/ThriftAttention.
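The merge step described in the abstract can be illustrated with a minimal sketch. It assumes a single query vector and plain Python floats; the function names (`partial_attention`, `merge`, `full_attention`) are hypothetical and not taken from the ThriftAttention code. It shows only why two attention paths over disjoint key subsets can be combined exactly through online softmax; the block-selection heuristic and the FP16/FP4 arithmetic themselves are not modelled.

```python
import math

def partial_attention(q, keys, values):
    """Unnormalised attention of query q over one subset of keys.

    Returns (m, l, acc): the maximum score, the sum of exp(score - m),
    and the exp-weighted sum of the value vectors.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    dim = len(values[0])
    acc = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return m, l, acc

def merge(part_a, part_b):
    """Online-softmax merge of two partial results into one output vector."""
    m_a, l_a, acc_a = part_a
    m_b, l_b, acc_b = part_b
    m = max(m_a, m_b)
    # Rescale both paths to the shared maximum before adding them.
    s_a, s_b = math.exp(m_a - m), math.exp(m_b - m)
    l = s_a * l_a + s_b * l_b
    return [(s_a * a + s_b * b) / l for a, b in zip(acc_a, acc_b)]

def full_attention(q, keys, values):
    """Reference softmax attention over all keys at once."""
    _, l, acc = partial_attention(q, keys, values)
    return [a / l for a in acc]

# Hypothetical split: key 0 plays the "important" path, the rest the cheap path.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [-1.0, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
merged = merge(partial_attention(q, keys[:1], values[:1]),
               partial_attention(q, keys[1:], values[1:]))
```

In exact arithmetic `merged` equals `full_attention(q, keys, values)`, so in a mixed-precision setting any difference from the full-precision output would come from the precision of each path rather than from the merge.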


2605.23078 2026-05-25 cs.LG cs.CL updated version

GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs

Jianing Deng, Song Wang, Dongwei Wang, Zijie Liu, Tianlong Chen, Huanrui Yang, Jingtong Hu

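Affiliations: University of Pittsburgh; University of Central Florida; University of Arizona; University of North Carolina at Chapel Hill

AI summary: Mixture-of-Experts large language models (MoE-LLMs) perform strongly but carry a large memory overhead because of their many expert parameters. To address this, the paper proposes GEMQ, a global expert-level mixed-precision quantization method that captures model-wide expert importance through a global linear-programming formulation and adds efficient router fine-tuning to adapt routing to the quantized experts, yielding a better accuracy-memory trade-off. Experiments show that GEMQ substantially reduces memory use and accelerates inference with minimal accuracy loss.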

Comments ICML 2026

AI Chinese abstract

混合专家大语言模型（MoE-LLMs）性能强大，但由于大量专家参数导致显著的内存开销。混合精度量化根据专家重要性分配不同的位宽，接近精度-内存帕累托前沿，并实现极低比特量化。然而，现有方法依赖于逐层重要性估计，忽视了量化引起的路由器偏移，导致次优的分配和路由。本文提出全局专家级混合精度量化（GEMQ），通过（1）基于量化误差分析的全局线性规划公式来捕获模型范围内的专家重要性，以及（2）高效的路由器微调以适应量化后的专家，从而克服这些限制。这些组件被集成到一个渐进式量化框架中，该框架迭代地优化重要性估计和分配。实验表明，GEMQ在最小化精度损失的情况下显著减少内存并加速推理。源代码可在 https://github.com/jndeng/GEMQ 获取。

English abstract

Mixture-of-Experts Large Language Models (MoE-LLMs) achieve strong performance but incur substantial memory overhead due to massive expert parameters. Mixed-precision quantization mitigates this cost by allocating expert-wise bit-widths based on their importance, approaching the accuracy-memory Pareto frontier and enabling extreme low-bit quantization. However, existing methods rely on layer-wise importance estimation and overlook router shifts induced by quantization, resulting in suboptimal allocation and routing. In this work, we propose Global Expert-level Mixed-precision Quantization (GEMQ) to overcome these limitations via (1) a global linear-programming formulation that captures model-wide expert importance based on quantization error analysis, and (2) efficient router fine-tuning to adapt routing to quantized experts. These components are integrated into a progressive quantization framework that iteratively refines importance estimation and allocation. Experiments demonstrate that GEMQ significantly reduces memory and accelerates inference with minimal accuracy degradation. Source code is available at https://github.com/jndeng/GEMQ .


2605.23065 2026-05-25 cs.CV cs.AI cs.LG updated version

Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering

Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy

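Affiliations: Department of Computer Science, University of Geneva, Switzerland

AI summary: This study proposes a lightweight input transformation based on multi-level Floyd-Steinberg dithering to make vision foundation models more robust to adversarial attacks. Error-diffusion dithering disrupts adversarial perturbations while preserving semantic content, and the transformation applies across downstream tasks and model architectures. In the experiments the method holds up under several attacks, degrades clean-input performance only slightly, and matches or exceeds the tested denoising baselines.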

Comments Paper accepted at the IEEE International Conference on Image Processing (ICIP 2026)

AI Chinese abstract

视觉基础模型被广泛用作许多下游任务中的冻结骨干，使其成为对抗攻击下的单点故障。我们研究了多级 Floyd-Steinberg 误差扩散抖动作为一种轻量级、模型无关的输入变换，它在保留语义内容的同时破坏对抗扰动。与先前局限于二值抖动、灰度 CIFAR-10 和从头训练的单个小模型的工作不同，我们在六个任务（分类、分割、深度估计、检索、字幕生成、视觉问答）、两个模型家族（DINOv2、PaliGemma）以及三种强度递增的攻击（PGD、MI-FGSM、SIA）上进行了评估，还包括使用直通估计器的自适应攻击者。我们的结果表明，在中间量化级别上的 Floyd-Steinberg 抖动，尤其是与后处理模糊相结合时，超过或匹配所有测试的基线（包括基于扩散的去噪），并且在干净输入上的退化显著更小。

English abstract

Vision foundation models are widely used as frozen backbones across many downstream tasks, making them a single point of failure under adversarial attack. We study multi-level Floyd-Steinberg error-diffusion dithering as a lightweight, model-agnostic input transformation that disrupts adversarial perturbations while preserving semantic content. Unlike prior work, which was limited to binary dithering, grayscale CIFAR-10, and a single small model trained from scratch, we evaluate across six tasks (classification, segmentation, depth estimation, retrieval, captioning, visual question answering), two model families (DINOv2, PaliGemma), and three attacks of increasing strength (PGD, MI-FGSM, SIA), as well as an adaptive attacker using a straight-through estimator. Our results show that Floyd-Steinberg dithering at intermediate quantization levels, especially when combined with post-processing blur, exceeds or matches all tested baselines, including diffusion-based denoising, with substantially less degradation on clean inputs.
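For readers unfamiliar with the transformation, the following is a minimal sketch of multi-level Floyd-Steinberg error diffusion. It assumes a single-channel image given as rows of floats in [0, 1] and a hypothetical function name `floyd_steinberg`; the paper's actual preprocessing (colour handling, choice of levels, the post-processing blur) is not reproduced here.

```python
def floyd_steinberg(image, levels):
    """Dither a grayscale image (rows of floats in [0, 1]) to `levels` values."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    h, w = len(image), len(image[0])
    work = [list(row) for row in image]  # working copy that receives diffused error
    out = [[0.0] * w for _ in range(h)]
    step = levels - 1
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            # Nearest of the evenly spaced levels, clamped to the valid range.
            new = min(max(round(old * step), 0), step) / step
            out[y][x] = new
            err = old - new
            # Standard Floyd-Steinberg weights: 7/16, 3/16, 5/16, 1/16.
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out

# A flat mid-gray patch dithered to two levels becomes a mix of 0.0 and 1.0
# whose average stays close to the original intensity.
dithered = floyd_steinberg([[0.5] * 8 for _ in range(8)], levels=2)
```

Each pixel is snapped to the nearest allowed level and the rounding error is pushed onto unvisited neighbours, so local average intensity is roughly preserved while small pixel-level perturbations are overwritten by the quantization.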


2605.23064 2026-05-25 cs.CV cs.LG updated version

Millimeter-wave Imaging for Anthropometric Body Measurement

Miriam Senne, Benjamin D. Killeen, Christoph Baur, Nassir Navab, Azade Farshad

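Affiliations: Chair for Computer Aided Medical Procedures; Technical University of Munich; Rohde & Schwarz GmbH & Co. KG; Munich Center for Machine Learning; ELLIS Unit Helsinki, Dept. Computer Science, Aalto University

AI summary: This study proposes a contactless body measurement method based on millimeter-wave radar, addressing the shortcomings of conventional measurement tools in privacy, efficiency, and applicability. Using an optimization framework, it recovers 3D human shape from mmWave point clouds and extracts a comprehensive set of anthropometric measurements. Its core contribution is a vertex-weighting strategy that, together with a parametric body model (SMPL), gives robust surface alignment and noise suppression. The result is a fast, privacy-preserving workflow that needs neither undressing nor cameras and suits clinical risk assessment across patient groups.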

AI Chinese abstract

身体形状和围度是临床上用于风险分层的信息性生物标志物，包括腰臀比、肢体和躯干周长等指标，然而传统工具如手动卷尺和光学扫描仪通常需要脱衣和保持姿势。这些要求减缓了工作流程，损害了尊严，并且排除了许多老年人和行动不便者。为了实现快速无接触测量，我们利用毫米波雷达，它保护隐私并能穿透典型衣物，实现快速全身采集。在这项工作中，我们提出了一个新的基于优化的框架，从体积毫米波数据中恢复3D人体形状并提取一套全面的人体测量数据。我们的方法引入了一个加权配准流程，将参数化身体模型（SMPL）直接拟合到噪声毫米波点云上。我们贡献的核心是一种顶点加权策略，该策略调节Chamfer能量函数以实现可靠的表面对齐和噪声消除。我们通过加入脚-地面约束和姿态先验进一步稳定拟合，直接优化SMPL参数。这些组件共同实现了一个快速、保护隐私的工作流程，无需摄像头或脱衣，且只需最小程度的配合，即可通过衣物提供高保真度的身体形状和测量数据，支持在诊所和护理机构中对所有年龄和活动水平的患者进行频繁的风险导向评估。

English abstract

Body shape and circumferences are clinically informative biomarkers for risk stratification, including measures such as waist to hip ratio, limb and trunk girths, yet conventional tools such as manual tape measures and optical scanners often require undressing and sustained poses. These demands slow workflows, compromise dignity, and exclude many older adults and people with limited mobility. To make measurement fast and contactless, we leverage millimeter-wave (mmWave) radar, which preserves privacy and operates through typical clothing, enabling quick full-body acquisition. In this work, we present a new optimization-based framework to recover 3D human shape and extract a comprehensive set of anthropometric measurements from volumetric mmWave data. Our method introduces a weighted registration pipeline that fits a parametric body model (SMPL) directly to the noisy mmWave point cloud. The core of our contribution is a vertex-weighting strategy that modulates a Chamfer energy function for reliable surface alignment and noise elimination. We further stabilize the fit by incorporating a foot-ground plane constraint and pose priors, optimizing directly for the SMPL parameters. Together, these components enable a fast, privacy preserving workflow that delivers high fidelity body shape and measurements through clothing without cameras or disrobing and with minimal cooperation, supporting frequent risk oriented assessments in clinics and care facilities for patients of all ages and mobility levels.
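To make the idea of a vertex-weighted Chamfer energy concrete, here is a small sketch under stated assumptions. The abstract does not give the exact form of the energy, so the weighting scheme below (model vertices weighted directly, scan points charged the weight of their nearest vertex) and the names `sq_dist` and `weighted_chamfer` are illustrative guesses, not the authors' formulation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def weighted_chamfer(vertices, weights, points):
    """Symmetric Chamfer energy with per-vertex weights (illustrative form)."""
    total_w = sum(weights)
    if total_w <= 0 or not points:
        raise ValueError("need positive total weight and at least one scan point")
    # Model -> scan: each vertex is pulled toward its nearest scan point.
    model_term = sum(
        w * min(sq_dist(v, p) for p in points)
        for v, w in zip(vertices, weights)
    ) / total_w
    # Scan -> model: each scan point is charged the weight of its nearest
    # vertex, so points near down-weighted vertices act like ignored noise.
    scan_term = 0.0
    for p in points:
        d, w = min((sq_dist(v, p), w) for v, w in zip(vertices, weights))
        scan_term += w * d
    return model_term + scan_term / len(points)

# Two model vertices, one scan point; the second vertex has no nearby data.
verts = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
scan = [(0.0, 0.0, 0.0)]
uniform = weighted_chamfer(verts, [1.0, 1.0], scan)
downweighted = weighted_chamfer(verts, [1.0, 0.0], scan)
```

With uniform weights the unobserved vertex dominates the energy; setting its weight to zero removes that pull, which is the kind of effect a vertex-weighting strategy is meant to achieve on noisy or incomplete point clouds.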


2605.23061 2026-05-25 cs.LG cs.AI math.OC stat.ML updated version

Time Machine: On the Power of Motion in Efficient Perception

Mantas Skackauskas, Xinyue Hao, Laura Sevilla-Lara

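Affiliations: School of Informatics, University of Edinburgh

AI summary: This paper proposes a video representation learning method that treats motion as the central modality, targeting the limits of existing video models in temporal understanding and training cost. Motion is represented as point tracks, and a masked autoencoder is trained in a self-supervised way to reconstruct masked tracks, yielding an efficient, fine-grained video representation. The method needs no language annotation, greatly reduces the amount of training data required, and performs on par with current state-of-the-art models across multiple tasks.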

AI Chinese abstract

近年来，视频表示学习取得了巨大进展。这受到多种因素的推动，包括训练规模以及通过语言对比训练的视觉模型的成功。虽然这些因素推动了视频模型的能力边界，但它们也引入了自身的局限性：首先，扩展视频模型的成本可能高得难以承受；其次，从语言学习把可学习概念的范围限制在字幕所包含的概念之内。因此，视频模型在时间理解方面仍然存在困难。在本文中，我们提出了一种新颖的方法，将运动作为视频表示的核心模态。具体而言，给定视频中以点轨迹形式存在的运动，我们使用掩码自编码器来掩码部分轨迹，并训练自编码器重建缺失的轨迹。这使我们能够以自监督方式学习表示。我们表明，使用运动来表示视频实际上解决了视频技术的两个核心局限性。首先，它使我们能够大幅减少训练数据的规模，因为运动本质上与外观无关，因此需要更少的样本就能很好地泛化。其次，运动使我们能够绕过依赖语言的训练范式，学习更细粒度的概念。结果是一种嵌入，我们称之为TIME（时间感知运动嵌入），这是一种仅使用合成运动数据训练的表示。我们以零样本方式在广泛的任务上测试了这种嵌入。我们观察到，无需额外技巧，其性能与最先进模型相当，而使用的训练数据最多可少4个数量级。这为迈向更有时序感知且更具可扩展性的视频模型新范式奠定了基础。

English abstract

Video representation learning has seen tremendous progress in recent years. This has been driven by many factors, including the scale of training and the success of visual models trained contrastively with language. While these factors have pushed the boundaries of what video models can do, they also introduce their own set of limitations: first, scaling video models can reach prohibitive costs and second, learning from language restricts the range of concepts that can be learned to those in captions. As a result, video models still struggle with temporal understanding. In this paper we propose a novel approach that uses motion as the central modality for video representation. In particular, given the motion in a video in the form of point-tracks, we use a masked-autoencoder to mask some of the tracks and train the autoencoder to reconstruct the missing tracks. This allows us to learn a representation in a self-supervised manner. We show that using motion to represent videos actually addresses both of the core limitations of video technology. First, it allows us to massively reduce the scale of training data, as motion is inherently appearance-independent and hence needs fewer examples to generalize well. Second, motion allows us to bypass the language-dependent training paradigm, learning better fine-grained concepts. The result is an embedding that we call TIME (Temporally Informed Motion Embedding), a representation trained exclusively on synthetic motion data. We test this embedding on a wide set of tasks in a zero-shot manner. We observe that without bells and whistles, performance is on par with state-of-the-art models using up to 4 orders of magnitude less training data. This is a stepping stone towards a new paradigm of video models that are both more temporally aware as well as more scalable.


2605.23040 2026-05-25 cs.LG updated version

World Machine: Generative World Modeling for Time Series

Elton Cardoso do Nascimento, Alexandre da Silva Simões, Esther Luna Colombini, Ricardo Ribeiro Gudwin, Paula Dornhofer Paro Costa

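Affiliations: Universidade Estadual de Campinas (UNICAMP); Universidade Estadual Paulista (UNESP)

AI summary: This paper proposes World Machine, a generative world-modeling architecture for time-series data that aims at predictive understanding and controllable simulation of an environment. The architecture builds on the Transformer and adds a latent-state mechanism that lets it handle observations and contexts of varying length, with better compute and memory efficiency than a conventional Transformer. Experiments on the synthetic Toy1D dataset demonstrate feasibility, show where the method differs from a conventional Transformer, and measure the contribution of each training component.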

2605.23024 2026-05-25 cs.AI cs.CC cs.CL cs.LG updated version

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

Dongxin Guo

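AI summary: This thesis examines the boundaries that fundamental limits of computation place on the design of trustworthy AI systems and proposes turning impossibility theorems into design rules. Its central result proves an architecture-determined accuracy ceiling for large language models beyond a critical reasoning depth, the Deterministic Horizon, which is unaffected by sample size, adapter rank, or loss function and can be computed in advance from layer count and embedding width. The thesis also applies the same argument across several AI subfields, producing a catalogue of sixteen design specifications.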

Comments PhD thesis, Department of Computer Science, The University of Hong Kong, 2026. 271 pages, 18 figures, 15 tables, 5 algorithms

AI Chinese abstract

大型语言模型现在编写软件、起草法律文件并生成临床笔记，但从图灵、阿罗到没有免费午餐定理的基本极限，塑造了计算的能力。本文将这些不可行性结果从奇闻转化为设计规则。其旗舰结果证明了仅由架构设定的准确率上限：超过关键推理深度后，无论适配器秩、样本大小或损失函数如何，训练都无法改变它。该确定性视界在部署前可从层数和嵌入宽度计算，在十二种Transformer架构中测量值介于19到31之间，而在最优长度轨迹上微调可恢复不到4个百分点。其机制是残差流的容量不变性，信息论转换得出超过视界后准确率超指数衰减。一个针对模幂的无条件电路复杂度下界（对抗常数深度素数模电路）补充了这一结果。同样的论证重新应用于多个子领域：任何错误指定模型下的偏好学习在样本复杂度上出现不连续跳跃；多阶段检索流水线至少需要与阶段数一样多的独立指标；标准诚实拍卖对于具有提示相关估值的智能体失效；神经推理的零知识验证为每个非线性激活支付110到190倍的测量开销。这些共同构成了一个包含16条规范的目录，每条规范配对一个可计算边界、一个量化违反成本和一个建设性设计规则：两个组合已被证明，一个配对是诚实障碍，四个保持开放。本文为可信AI可能需要的生成式研究计划提供了不可行性规范方法论。AI的每一个基本极限也是一个设计规则。

English abstract

Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such impossibility results from curiosities into design rules. Its flagship result proves an accuracy ceiling set by architecture alone: past a critical reasoning depth, no amount of training moves it, at any adapter rank, sample size, or loss function. Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result. The same argument recasts across subfields: preference learning under any misspecified model jumps discontinuously in sample complexity; multi-stage retrieval pipelines require at least as many independent metrics as stages; standard truthful auctions fail for agents with prompt-dependent valuations; and zero-knowledge verification of neural inference pays a measured overhead of one hundred ten to one hundred ninety times per non-linear activation. Together these form a catalogue of sixteen specifications, each pairing a computable boundary, a quantified violation cost, and a constructive design rule: two compositions are proved, one pairing is an honest obstruction, and four remain open. The impossibility-specification methodology is offered for the generative research programme that trustworthy AI may need. Every fundamental limit of AI is also a design rule.


2605.23019 2026-05-25 cs.LG updated version

PACE: Two-Timescale Self-Evolution for Small Language Model Agents

Chen Ling, Pei Chen, Albert Guan, Jiaming Qu, Shayan Ali Akbar, Madhu Gopinathan, Erwin Cornejo

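Affiliations: Amazon

AI summary: This paper studies whether frozen small language models (SLMs) can act as effective self-evolving agents under resource constraints. The authors propose PACE, a two-timescale framework that coordinates low-risk prompt refinement with higher-risk control-logic updates, achieving reliable self-evolution without updating model weights or relying on frontier models. In the experiments PACE outperforms the baselines on every benchmark and improves multi-turn tool-use success.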

AI Chinese abstract

在生产中部署语言模型代理通常需要大量的计算和人力来调整提示、解析器、验证器和代理流水线的其他组件。自我进化提供了一种有前景的替代方案，但大多数现有框架假设可以访问能够可靠诊断故障、提出修订并判断自身更新的前沿模型。我们研究冻结的小型语言模型（SLM）是否可以在资源约束下作为有效的自我进化代理。我们提出PACE（提示和控制逻辑进化），一个双时间尺度框架，协调低风险的提示优化与较高风险的控制逻辑更新。PACE在固定控制逻辑下进化提示，直到提示层面的增益饱和，然后考虑受约束的控制逻辑更新，这些更新须通过保留集验证才会被接受。在三个从4B到14B参数的冻结SLM骨干和四个受控基准上，PACE在所有12个骨干-基准组合上实现了最佳性能，相比原始SLM代理相对提升高达+9.2%，相比更强的单模式进化基线相对提升高达+5.4%。tau-bench案例研究进一步表明，PACE在多轮工具使用成功率上优于原始代理和仅提示进化。这些结果表明，无需更新模型权重或依赖前沿模型教师，可靠的SLM代理自我进化是可能的，并且关键优势不在于任何单一的最终求解模式，而在于自主、经过验证地发现适合任务的推理策略。

English abstract

Deploying language-model agents in production often requires substantial compute and human effort to tune prompts, parsers, validators, and other components of the agent pipeline. Self-evolution offers a promising alternative, but most existing frameworks assume access to frontier models that can reliably diagnose failures, propose revisions, and judge their own updates. We study whether frozen small language models (SLMs) can serve as effective self-evolving agents under resource constraints. We propose PACE (Prompt And Control Logic Evolution), a two-timescale framework that coordinates low-risk prompt refinement with higher-risk control-logic updates. PACE evolves prompts under fixed control logic until prompt-level gains saturate, then considers constrained control-logic updates that are accepted through held-out validation. Across three frozen SLM backbones ranging from 4B to 14B parameters and four controlled benchmarks, PACE achieves the best performance on all 12 backbone--benchmark combinations, improving over vanilla SLM agents by up to +9.2% relative improvement and over the stronger single-mode evolution baseline by up to +5.4% relative improvement. A tau-bench case study further shows that PACE improves multi-turn tool-use success over vanilla and prompt-only evolution. These results suggest that reliable SLM agent self-evolution is possible without updating model weights or relying on frontier-model teachers, and that the key benefit is not any single final solver pattern but autonomous, validated discovery of task-appropriate inference strategies.
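The two-timescale schedule in the abstract (refine prompts until gains saturate, then try a control-logic update gated by held-out validation) can be sketched as a simple loop. Everything here is a hypothetical stand-in: the function `two_timescale_evolve`, the `patience` saturation rule, the proposal callables, and the toy integer "prompts" and "logic" are assumptions for illustration, not the PACE implementation.

```python
def two_timescale_evolve(prompt, logic, propose_prompt, propose_logic,
                         dev_score, heldout_score, patience=2, rounds=20):
    """Alternate fast prompt refinement with slow, validated logic updates."""
    stale = 0  # consecutive prompt proposals that brought no gain
    for _ in range(rounds):
        if stale < patience:
            # Fast timescale: refine the prompt under fixed control logic.
            candidate = propose_prompt(prompt, logic)
            if dev_score(candidate, logic) > dev_score(prompt, logic):
                prompt, stale = candidate, 0
            else:
                stale += 1
        else:
            # Slow timescale: prompt gains have saturated, so try one
            # control-logic update and keep it only if held-out score improves.
            candidate = propose_logic(prompt, logic)
            if heldout_score(prompt, candidate) > heldout_score(prompt, logic):
                logic = candidate
            stale = 0
    return prompt, logic

# Toy setup: prompt quality saturates at 3, logic quality at 2.
def toy_score(prompt, logic):
    return min(prompt, 3) + min(logic, 2)

final_prompt, final_logic = two_timescale_evolve(
    prompt=0, logic=0,
    propose_prompt=lambda p, l: p + 1,
    propose_logic=lambda p, l: l + 1,
    dev_score=toy_score, heldout_score=toy_score,
)
```

In this toy run the loop raises the prompt to 3, detects saturation, and then accepts logic updates only while they still improve the held-out score, ending at prompt 3 and logic 2.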


2605.23017 2026-05-25 cs.LG cs.GT updated version

Smoothed Elicitation Complexity for Approximate $\Gamma$-calibration of Discrete Classification Tasks

Jessica Finocchiaro, Victor Ganson, Drona Khurana

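Affiliations: Computer Science, Boston College; Computer Science, University of Colorado Boulder

AI summary: This paper studies approximate $\Gamma$-calibration for discrete classification tasks. To counter the excessive complexity of calibrating multiclass classifiers, it uses Lipschitz continuous properties as an intermediate representation, which lowers the calibration complexity. By constructing Lipschitz properties for strongly orderable discrete properties, the authors give what they describe as the first approximate calibration results for discrete properties, together with algorithms for designing these Lipschitz properties.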

Comments Working paper

AI Chinese abstract

评估机器学习模型可信度的一种重要方法是校准的概念。在二元结果设置中，如果结果根据模型的条件分布预测实现，则概率预测器是校准的。将二元校准定义直接扩展到概率多类分类器会导致指数级的复杂度爆炸，因为预测空间随类别数 $n$ 呈指数增长。作为补救措施，Noarov 和 Roth (2023) 提出了使用结果分布属性的多类校准，将复杂度从随类别数 $n$ 增长降低到属性维度 $d$，称为其引发复杂度。先前关于近似属性校准的工作通常局限于连续标量属性，尽管许多相关属性是离散的，如众数或排名。我们通过使用Lipschitz连续属性作为中介，刻画了强可排序离散属性的近似属性校准。据我们所知，这是首次为离散属性提供近似校准结果。在此过程中，我们通过构建设计这些Lipschitz属性的算法，刻画了强可排序离散属性的Lipschitz引发复杂度，并证明这些属性可以通过后处理得到原始离散属性。

English abstract

One prominent method of evaluating machine learning model trustworthiness is the notion of calibration. In the binary outcome setting, a probabilistic predictor is calibrated if outcomes are realized according to a model's distributional prediction, conditioned on this prediction. Straightforward extensions of binary calibration definitions to probabilistic multiclass classifiers suffer from an exponential complexity blowup as the space of predictions grows exponentially in the number of classes $n$. As a remedy, Noarov and Roth (2023) propose multiclass calibration with predictions that are properties of the outcome distribution, reducing complexity from growing in the number of classes $n$ to the dimension $d$ of the property, called its elicitation complexity. Previous work on approximate property calibration is generally limited to continuous scalar properties, despite many relevant properties of interest being discrete, like the mode or rankings. We characterize the approximate property calibration of discrete properties which are strongly orderable by using Lipschitz continuous properties as an intermediary. This work is the first to our knowledge to provide approximate calibration results for discrete properties. Along the way, we characterize the Lipschitz elicitation complexity of strongly orderable discrete properties by constructing algorithms for designing these Lipschitz properties, which we prove can be post-processed to obtain the original discrete property.
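The binary definition of calibration quoted in the abstract can be checked empirically with a short sketch. This covers only the binary case, groups by exact predicted value, and uses a hypothetical function name `calibration_error`; the property-based ($\Gamma$) calibration and the Lipschitz constructions studied in the paper are not implemented here.

```python
from collections import defaultdict

def calibration_error(predictions, outcomes):
    """Size-weighted average gap between predicted and realised frequency.

    predictions: predicted probabilities of outcome 1.
    outcomes: realised binary outcomes (0 or 1), aligned with predictions.
    """
    groups = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        groups[p].append(y)  # condition on the prediction itself
    n = len(predictions)
    return sum(
        len(ys) / n * abs(sum(ys) / len(ys) - p)
        for p, ys in groups.items()
    )

# Predicting 0.25 four times with one positive outcome is perfectly calibrated.
calibrated = calibration_error([0.25] * 4, [1, 0, 0, 0])
# Predicting 0.5 when the outcome is always 1 is off by 0.5.
miscalibrated = calibration_error([0.5, 0.5], [1, 1])
```

A multiclass analogue would have to group by whole probability vectors, which is where the blowup in the number of classes described in the abstract comes from.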


2605.23007 2026-05-25 q-fin.TR cs.AI cs.LG q-fin.PM updated version

MadEvolve: Evolutionary Optimization of Trading Systems with Large Language Models

Yurii Kvasiuk, Tianyi Li, Owen Colegrove, Moritz Münchmeyer

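Affiliations: Department of Physics, University of Wisconsin–Madison; Event Horizon Labs

AI summary: This paper applies MadEvolve, an LLM-driven evolutionary optimization framework, to quantitative trading systems, using Bitcoin trading as the example. The framework evolves the feature sets, individual strategy components, and the joint feature-and-execution pipeline, with significant improvements reported in simulation and backtesting. The study also compares against other agentic search approaches and evaluates p-hacking probabilities in the simulation setup.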

AI Chinese abstract

我们探索了将LLM驱动的算法优化应用于量化金融中的几个常见任务。MadEvolve是一个受DeepMind的Alpha-Evolve启发的通用算法优化框架，最近被开发用于优化计算宇宙学中的算法。在此，我们以比特币交易为例，展示了MadEvolve在优化算法交易策略和alpha生成方面的实用性。在我们的模拟和回测设置中，我们在所有考虑的任务上取得了显著改进，例如演化用于信号生成的特征集、优化交易策略的独立组件，以及联合演化特征流水线与执行策略。此外，我们将我们的方法与其他智能搜索方法（特别是Claude Code）进行了比较，并仔细评估了模拟设置中的p-hacking概率。我们的发现强烈支持AI驱动的智能和进化算法在算法交易和量化金融中的实用性。

English abstract

We explore the application of LLM-driven algorithm optimization to several common tasks in quantitative finance. MadEvolve, a general-purpose algorithm optimization framework inspired by DeepMind's Alpha-Evolve, was recently developed to optimize algorithms in computational cosmology. Here we demonstrate the utility of MadEvolve to optimize algorithmic trading strategies and alpha generation at the example of Bitcoin trading. On our simulation and backtesting setup, we achieve significant improvements on all tasks we considered, such as evolving feature sets for signal generation, optimizing separate components of the trading strategy, and jointly evolving the feature pipeline together with the execution strategy. Additionally, we compare our method to other agentic search approaches, specifically Claude Code, and carefully evaluate p-hacking probabilities on our simulation setup. Our findings strongly support the utility of AI-driven agentic and evolutionary algorithms for algorithmic trading and quantitative finance.

2605.22988 2026-05-25 q-bio.NC cs.LG cs.RO cs.SY eess.SY 版本更新

Active Sensing Subserves Task-Level Control

主动感知服务于任务级控制

Andrew Lamperski, Debojyoti Biswas, Eric S. Fortune, John Guckenheimer, Kathleen Hoffman, Noah J. Cowan

发表机构 * Department of Electrical and Computer Engineering, University of Minnesota（明尼苏达大学电气与计算机工程系）； Laboratory for Computational Sensing and Robotics, Johns Hopkins University（约翰霍普金斯大学计算感知与机器人实验室）； Federated Department of Biological Sciences, New Jersey Institute of Technology（新泽西理工学院联合生物科学系）； Department of Mathematics, Cornell University（康奈尔大学数学系）； Department of Mathematics and Statistics, University of Maryland, Baltimore County（马里兰大学巴尔的摩县分校数学与统计学系）； Department of Mechanical Engineering, Johns Hopkins University（约翰霍普金斯大学机械工程系）

AI总结本文探讨了主动感知在任务级控制中的作用，提出主动感知并非由感官目标驱动，而是任务控制的必要组成部分。研究结合生物实证数据和数学理论，表明主动感知行为通常以离散阶段出现，动物在“探索”与“利用”两种行为模式间切换，以适应性传感器和模式切换实现反馈控制。这一策略在生物系统中普遍存在，但在工程系统中却较少应用，提示当前机器人控制体系仍有待改进。

详情

AI中文摘要

主动感知传统上被定义为为了获取信息而消耗能量，通常以运动的形式。在这里，我们提出，对自适应传感器的依赖、运动与感知之间的联系以及任务级控制的结合，必然导致主动感知运动的出现。这样，主动感知并非由感官目标驱动，例如最小化状态不确定性，而是任务级控制所必需的。这一假设，即主动感知服务于控制，得到了来自生物体的经验数据和数学理论的支持。有趣的是，主动感知行为通常发生在离散的时段中，与目标导向行为交替出现。这表明动物在两种具有不同控制策略的行为模式之间切换：一种“探索”模式，动物产生动态运动以塑造感觉反馈；以及一种“利用”模式，动物产生与实现任务目标直接相关的较慢补偿运动。这种依赖于自适应传感器、主动感知和模式切换的反馈控制策略在工程系统中并不常用，尽管在生物学中普遍存在。由最先进的传感器、执行器和机械设计组成的工程系统在“成本函数”方面（如最大力生成、精度和速度）可以胜过动物。然而，动物通常能够实现目前工程系统无法比拟的稳健、优雅的行为，这表明当前的控制系统存在不足。这些以控制理论语言表达的见解可能对改进机器人感知和控制至关重要。

英文摘要

Active sensing is traditionally defined as the expenditure of energy, typically in the form of movement, for obtaining information. Here, we propose that the combination of reliance on adaptive sensors, the linkage between movement and sensing, and task-level control inevitably gives rise to the emergence of active sensing movements. In this way, active sensing is not driven by sensory goals, such as minimizing uncertainty about the state, but rather is necessary for task-level control. This hypothesis, that active sensing subserves control, is supported by both empirical data from organisms and mathematical theory. Interestingly, active sensing behaviors often occur in discrete epochs, interspersed with goal-oriented behavior. This suggests that animals switch between two behavioral modes with distinct control policies, an "explore" mode in which animals produce dynamic movements to shape sensory feedback, and an "exploit" mode in which animals produce slower compensatory movements that are directly related to achieving task goals. This strategy for feedback control that relies on adaptive sensors, active sensing, and mode switching is not commonly used in engineered systems despite being ubiquitous in biology. Engineered systems comprising state-of-the-art sensors, actuators, and mechanical designs can outperform animals with respect to "cost functions" such as maximum force generation, precision, and speed. Nevertheless, animals routinely achieve robust, graceful behaviors that are currently unmatched by engineered systems, suggesting that current control systems are insufficient. These insights, expressed in the language of control theory, may be critical for improving robotic sensing and control.

比随机更差：无监督特征选择中基线的重要性

Muhammad Rajabinasab, Michael E. Houle, Oussama Chelly, Arthur Zimek

发表机构 * University of Southern Denmark（丹麦南部大学）； New Jersey Institute of Technology（新泽西理工学院）； Oratio Technologies（Oratio技术公司）

AI总结本文探讨了无监督特征选择方法的评估基准问题，指出当前多数方法缺乏与随机特征选择这一基准的比较，难以衡量其实际贡献。作者提出应将随机特征选择作为评估基准，并通过实验证明许多先进方法在性能和效率上均不如随机选择。因此，研究强调在开发新的无监督特征选择方法时，必须以随机选择为基准，以确保方法的有效性与改进价值。

Comments Preprint submitted to Elsevier Pattern Recognition Letters

2605.22972 2026-05-25 cs.LG cs.AI 版本更新

针对安全分类器的边界目标成员推断攻击

Anthony Hughes, Alexander Goldberg, Prince Jha, Adam Perer, Nikolaos Aletras, Niloofar Mireshghallah

发表机构 * University of Sheffield（谢菲尔德大学）； Carnegie Mellon University（卡内基梅隆大学）； MBZUAI

AI总结该研究探讨了针对安全分类器的边界定向成员推理攻击问题，这类分类器常用于生成式AI系统中以过滤有害内容或识别高风险用户。研究提出了一种新的攻击方法，通过识别分类器最不自信的样本，揭示模型在训练数据上的记忆性特征，从而推断出样本是否属于训练集。实验表明，该方法在检测用户情绪支持需求的分类器上，能以较低的误报率恢复更多被标记为高风险的对话，效果显著优于现有成员推理攻击方法，并进一步分析了边界样本的特性，指出基于内容的过滤策略难以有效防御此类攻击。

详情

AI中文摘要

安全分类器是生成式AI系统中的重要保障，用于过滤有害内容或识别与大语言模型交互时处于风险中的用户。尽管这些模型是必要的，但它们是在包含自残和心理健康讨论等敏感数据集上训练的，这引发了重要但尚未充分理解的隐私问题。成员推断攻击（MIA）允许对手推断用于训练模型的示例的成员身份。在这项工作中，我们假设识别分类器最不自信的示例对于对手推断成员身份是有信息的。这反映了局部泛化失败，其中模型依赖记忆来解决训练集中的歧义。为了研究这一点，我们引入了一种新的边界目标选择策略，该策略识别低置信度示例，从而放大训练集中示例成员身份的信号。我们的实验结果表明，在针对检测可能需要情感支持的用户的微调分类器上，对手可以以5%的假阳性率恢复安全分类器标记为指示用户困扰的对话中的19%。这比单独使用最先进的MIA方法攻击高出3.5倍。最后，我们描述了边界示例的特征，并表明基于内容的过滤对于保护无效，而现有的噪声策略可以有效减轻这些示例的敏感性。

英文摘要

Safety classifiers are essential safeguards within generative AI systems, filtering harmful content or identifying at-risk users when interacting with large language models. Despite their necessity, these models are trained on sensitive datasets including discussions of self-harm and mental health, raising important, yet poorly understood, privacy concerns. Membership inference attacks (MIAs) allow adversaries to infer membership of examples used to train models. In this work, we hypothesize that identifying the examples on which the classifier is least confident is informative for an adversary to infer membership. This reflects a localized failure of generalization, where the model relies on memorization to resolve ambiguity in the training set. To investigate this, we introduce a new boundary-targeted selection strategy that identifies low-confidence examples that amplify the signal of an example's membership within a training set. Our experimental results show that an adversary can recover 19% of the conversations a safety classifier flagged as indicating user distress, at a 5% false-positive rate, on a classifier fine-tuned for detecting a user who may require emotional support. This is $3.5$ times more than attacking using state-of-the-art MIA methods alone. Finally, we characterize the boundary-lying examples and show that content-based filtering is ineffective for protection, and existing noise strategies can effectively mitigate susceptibility of these examples.
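
The selection step described in this abstract can be illustrated with a minimal sketch. It assumes a binary classifier whose confidence is the distance of the predicted probability from 0.5; the abstract does not state the paper's actual criterion, and the function name `select_boundary_examples` and the `fraction` parameter are hypothetical.

```python
def select_boundary_examples(probabilities, fraction=0.1):
    """Return indices of the examples whose predicted probability is closest to 0.5.

    Assumption: "least confident" means smallest |p - 0.5| for a binary classifier.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Rank examples from least to most confident.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    keep = max(1, int(len(probabilities) * fraction))
    return ranked[:keep]


probs = [0.98, 0.52, 0.03, 0.47, 0.71, 0.49]
boundary_idx = select_boundary_examples(probs, fraction=0.5)
print(boundary_idx)  # indices of the three examples nearest the decision boundary
```

A membership inference attack would then be run only on the returned indices, which is where the abstract expects the membership signal to be strongest.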

2605.22350 2026-05-25 cs.LG stat.ML 版本更新

Partial Fusion of Neural Networks: Efficient Tradeoffs Between Ensembles and Weight Aggregation

神经网络的部分融合：集成与权重聚合之间的高效权衡

Fabian Morelli, Stephan Eckstein

发表机构 * Department of Mathematics, University of Tübingen, Germany（图宾根大学数学系，德国）； Department of Computer Science, University of Tübingen, Germany（图宾根大学计算机科学系，德国）

AI总结该论文提出了一种神经网络的部分融合方法，在集成学习与权重聚合之间实现计算成本与性能的灵活权衡。核心思想是基于神经元层面的相似性，仅对最相似的神经元进行权重聚合，从而在保持较高准确率的同时降低计算开销。研究还展示了通过部分最优运输方法识别和匹配相似神经元的具体实现，并将权重聚合与部分融合视为集成模型的广义剪枝过程，允许对神经元进行删除或线性组合操作，进一步拓展了模型优化的灵活性。

Comments Accepted to ICML 2026

详情

AI中文摘要

神经网络的集成通常优于单个网络，但计算成本高昂，而权重聚合产生的聚合模型成本较低，但精度也较低。我们引入了网络的部分融合，它在集成和权重聚合之间进行插值，从而允许在计算成本和性能之间进行灵活的权衡。实现这一目标的一种直接方法是扩展现有的基于不同网络之间神经元级相似性的权重聚合方法，其中部分融合仅聚合最相似神经元的权重。我们展示了一种特定方法，通过部分最优传输联合识别哪些神经元最相似并进行匹配。此外，我们将权重聚合和部分融合视为集成模型的广义剪枝，其中神经元不仅可以被删除，还可以线性组合。最后，我们表明，应用于单个网络的广义剪枝通过允许基于相似性隔离、删除和线性组合神经元之间的权衡，产生了与部分融合类似的优势。我们的代码可在 https://github.com/Fabian-Mor/partial_fusion_nn 获取。

英文摘要

Ensembles of neural networks typically outperform individual networks but incur large computational costs, whereas weight aggregation produces less costly, yet also less accurate, aggregate models. We introduce partial fusion of networks, which interpolates between ensembles and weight aggregation and thus allows for a flexible tradeoff between computational cost and performance. A direct way to achieve this is to extend existing weight aggregation methods based on neuron-level similarity between different networks, where partial fusion then only aggregates weights of neurons which are most similar. We showcase one particular method to jointly identify which neurons are most similar and match them via partial optimal transport. Further, we consider the more general perspective of weight aggregation and partial fusion as generalized pruning of ensemble models, where neurons cannot just be deleted, but also linearly combined. Finally, we show that generalized pruning applied to a single network yields similar benefits as partial fusion by allowing for a tradeoff between isolating, deleting, and linearly combining neurons based on similarity. Our code is available at https://github.com/Fabian-Mor/partial_fusion_nn.
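
As a rough illustration of how partial fusion interpolates between an ensemble and full weight aggregation, the sketch below fuses two one-hidden-layer ReLU networks. It makes two simplifying assumptions that are not from the paper: neurons are already aligned by index, and a cosine-similarity threshold stands in for the partial optimal transport matching. All function names are hypothetical.

```python
import math


def forward(hidden, out, x):
    """Scalar output of a one-hidden-layer ReLU network without biases."""
    return sum(a * max(0.0, sum(w_i * x_i for w_i, x_i in zip(w, x)))
               for w, a in zip(hidden, out))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def partial_fusion(net_a, net_b, threshold):
    """Fuse index-aligned neuron pairs whose similarity reaches the threshold."""
    (hidden_a, out_a), (hidden_b, out_b) = net_a, net_b
    hidden, out = [], []
    for w_a, a_a, w_b, a_b in zip(hidden_a, out_a, hidden_b, out_b):
        if cosine(w_a, w_b) >= threshold:
            # Similar pair: one neuron with averaged incoming and outgoing weights.
            hidden.append([(p + q) / 2 for p, q in zip(w_a, w_b)])
            out.append((a_a + a_b) / 2)
        else:
            # Dissimilar pair: keep both neurons, halving the outgoing weights
            # so the pair contributes exactly as in a two-member ensemble average.
            hidden.extend([w_a, w_b])
            out.extend([a_a / 2, a_b / 2])
    return hidden, out


net_a = ([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
net_b = ([[0.9, 0.1], [-1.0, 0.2]], [1.2, 0.5])
fused = partial_fusion(net_a, net_b, threshold=0.9)
print(len(fused[0]))  # hidden width of the partially fused network
```

With a threshold above 1 nothing is fused and the result reproduces the ensemble average; with a threshold of -1 every pair is fused and the result is plain weight averaging. Intermediate thresholds trade hidden width (cost) against fidelity to the ensemble.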

2605.22237 2026-05-25 cs.CR cs.LG 版本更新

Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference

面向同态加密推理的决策感知二次ReLU替换

Rui Li, Wenyuan Wu, Weijie Miao

发表机构 * Chongqing Key Laboratory of Secure Computing for Biology（重庆生物安全计算重点实验室）； Chongqing Institute of Green and Intelligent Technology（重庆绿色智能技术研究所）； Chinese Academy of Sciences（中国科学院）； Department of Industrial and Systems Engineering（工业与系统工程系）

AI总结该研究针对全同态加密（FHE）下神经网络推理中ReLU激活函数的替换问题，提出了一种基于决策感知的二次多项式替代方法，旨在在不重新训练模型的前提下，使用低阶多项式保持分类决策的一致性。研究通过几何框架分析校准集的决策边界，提出了在正边距条件下实现无误差替换的充要条件及构造算法，并在边距不足时引入凸包缩减和拉格朗日对偶松弛方法，有效降低计算复杂度。实验表明，该方法在CKKS方案下能够达到与明文模型相当的精度，且推理效率显著优于现有方法。

Comments 13 pages, 2 figures

详情

AI中文摘要

全同态加密（FHE）仅支持加法和乘法，因此仅使用FHE的神经网络推理通常将ReLU替换为在经验激活区间上拟合的多项式。这种区间拟合通常需要更高次多项式来控制激活误差，从而产生同态评估成本，而分类由最终logit决策决定。我们从决策感知的角度重新审视ReLU替换：给定一个训练好的单隐层ReLU MLP和一个指定的校准集，能否在不重新训练的情况下，用一个同态友好的低次多项式替换ReLU，同时保持校准集决策不变？我们专注于二次替换，即保留每个单元非线性的最低次数。对于在提升空间中正间隔可分的校准集，我们将二次替换公式化为一个线性可分问题，得到了校准无损替换的充分必要条件以及系数的构造性算法。当正间隔条件不满足时（通常是因为少数接近边界或错误分类的校准样本使提升凸包接触），我们通过缩减凸包和拉格朗日对偶软间隔松弛来扩展相同的几何框架。这些方法限制了单个样本能携带的权重，将问题转化为较小的凸二次规划，产生近似可行的系数，并在校准集决策上具有高经验一致性。特别地，在最大权重上限μ=1时，缩减凸包松弛退化为标准凸包分离；因此该松弛连续地扩展了正间隔精确理论。在CKKS下，二次替换在多个基准测试中匹配明文top-1准确率，激活模块运行速度比Remez-7快3.7-4.1倍，端到端快1.18-1.68倍。

英文摘要

Fully homomorphic encryption (FHE) supports only additions and multiplications, so FHE-only neural-network inference typically replaces ReLU with polynomials fitted over empirical activation intervals. Such interval fitting often requires higher-degree polynomials to control activation error, incurring homomorphic evaluation costs, while classification is determined by the final logit decision. We revisit ReLU replacement from a decision-aware perspective: given a trained single-hidden-layer ReLU MLP and a specified calibration set, can an HE-friendly low-degree polynomial replace ReLU without retraining while preserving calibration-set decisions? We focus on quadratic replacement, the lowest degree that retains a genuine per-unit nonlinearity. For calibration sets positive-margin separable in the lifted space, we formulate quadratic replacement as a linear separation problem, yielding necessary and sufficient conditions for calibration-lossless replacement and a constructive algorithm for the coefficients. When the positive-margin condition fails -- often because a few near-boundary or misclassified calibration samples bring the lifted hulls into contact -- we extend the same geometric framework via reduced convex hulls and Lagrangian-dual soft-margin relaxations. These cap the weight any single sample can carry, converting the problem into smaller convex quadratic programs that yield approximately feasible coefficients with high empirical agreement on calibration-set decisions. In particular, at the maximal weight cap $μ=1$, the reduced-convex-hull relaxation reduces to standard convex-hull separation; the relaxation thus continuously extends the positive-margin exact theory. Under CKKS, the quadratic replacement matches plaintext top-1 accuracy on multiple benchmarks, running 3.7--4.1$\times$ faster than Remez-7 in the activation module and 1.18--1.68$\times$ faster end-to-end.
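
One way to see why quadratic replacement becomes a linear separation problem is sketched below, under assumptions that go beyond the abstract: a binary classifier with a single output logit, and one shared quadratic $c_0 + c_1 z + c_2 z^2$ for all hidden units (the paper may use per-unit coefficients and multi-class logits). Under these assumptions the logit is linear in $(c_0, c_1, c_2)$ once each sample is mapped to three "lifted" features. All names are hypothetical.

```python
def lifted_features(pre_acts, out_weights):
    """Map one sample's hidden pre-activations z_j to (sum v_j, sum v_j z_j, sum v_j z_j^2)."""
    return (
        sum(out_weights),
        sum(v * z for v, z in zip(out_weights, pre_acts)),
        sum(v * z * z for v, z in zip(out_weights, pre_acts)),
    )


def quadratic_logit(coeffs, pre_acts, out_weights, bias=0.0):
    """Logit when ReLU is replaced by c0 + c1*z + c2*z^2: linear in the coefficients."""
    phi = lifted_features(pre_acts, out_weights)
    return sum(c * f for c, f in zip(coeffs, phi)) + bias


def relu_logit(pre_acts, out_weights, bias=0.0):
    return sum(v * max(0.0, z) for v, z in zip(out_weights, pre_acts)) + bias


def preserves_decisions(coeffs, samples, out_weights, bias=0.0):
    """True if the quadratic network makes the same sign decision on every sample."""
    return all(
        (quadratic_logit(coeffs, z, out_weights, bias) > 0)
        == (relu_logit(z, out_weights, bias) > 0)
        for z in samples
    )


out_weights = [1.0, -1.0]
calibration = [[2.0, -1.0], [-1.0, 2.0], [1.0, 0.5]]   # hidden pre-activations per sample
print(preserves_decisions((0.0, 0.5, 0.25), calibration, out_weights))
```

Finding coefficients then amounts to finding a vector $(c_0, c_1, c_2)$ that puts each sample's lifted feature vector on the correct side of a hyperplane, which is the kind of separation problem the abstract refers to.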

2605.21851 2026-05-25 cs.LG cs.AI 版本更新

OPPO: Bayesian Value Recursion for Token-Level Credit Assignment in LLM Reasoning

OPPO: 用于LLM推理中令牌级信用分配的贝叶斯价值递归

Yu Li, Rui Miao, Tian Lan, Zhengling Qi

发表机构 * George Washington University（乔治华盛顿大学）； The University of Texas at Dallas（德克萨斯大学达拉斯分校）

AI总结该论文提出了一种名为OPPO的新型算法，用于改进大语言模型（LLM）在推理任务中的信用分配机制。OPPO基于一种关键观察：传统方法中用于局部判别的 oracle 信号本质上是模型对最终成功概率的贝叶斯更新。通过沿轨迹累积该信号，OPPO能够在不依赖价值网络或额外采样的情况下，直接计算出每个位置的成功概率估计和令牌级优势，从而更准确地识别推理过程中的关键步骤。实验表明，OPPO在多个数学、科学和代码推理基准上显著优于现有方法。

详情

AI中文摘要

具有可验证奖励的强化学习已成为提升LLM推理的标准方法，但主流算法GRPO为每个令牌分配单一轨迹级优势，稀释了关键推理步骤的信号，并在无信息步骤中注入噪声。源自在线策略蒸馏的无评论家替代方案通过预言机条件似然比提供每令牌信号，但每个信号孤立于该位置之前累积的轨迹级证据。我们提出Oracle-Prompted Policy Optimization (OPPO)，它基于一个简单观察：先前蒸馏式方法用于局部区分的预言机信号，也是模型对最终成功信念的自然贝叶斯更新。沿轨迹累积信号，以一次额外前向传播的代价，以闭式形式给出每个位置成功概率的运行估计，以及无需学习价值网络和额外采样的令牌级优势。一阶分析将优势分解为蒸馏方法使用的每令牌区分信号，乘以一个状态权重，该权重将信用集中在真正关键的令牌上，并具有方向性方差减少保证。该框架包含两种估计器，区别仅在于谁对证据评分：\textit{自预言机}重用学生模型，将在线策略蒸馏奖励作为严格特例恢复；\textit{教师预言机}将评分委托给更强的冻结模型。在两个基础LLM上，跨越七个数学、科学和代码推理基准，OPPO在AMC'23上比GRPO、DAPO和SDPO提升高达+6.0分，在AIME'24上提升+5.2分，且增益随响应长度单调增加。

英文摘要

Reinforcement learning with verifiable rewards has become the standard recipe for improving LLM reasoning, but the dominant algorithm GRPO assigns a single trajectory-level advantage to every token, diluting the signal at pivotal reasoning steps and injecting noise at uninformative ones. Critic-free alternatives derived from on-policy distillation supply per-token signals through oracle-conditioned likelihood ratios, yet apply each signal in isolation from the trajectory-level evidence accumulated up to that position. We propose Oracle-Prompted Policy Optimization (OPPO), which rests on a single observation: the oracle signal used by prior distillation-style methods for local discrimination is also the natural Bayesian update of the model's belief about eventual success. Accumulating the signal along a trajectory yields, in closed form and at the cost of one extra forward pass, a running estimate of the success probability at every position, together with a token-level advantage that requires no learned value network and no additional rollouts. A first-order analysis factorizes the advantage into the per-token discrimination signal used by distillation methods modulated by a state weight that concentrates credit on genuinely pivotal tokens, with a directional variance-reduction guarantee. The framework admits two estimators differing only in which model scores the evidence: a \textit{self-oracle} that reuses the student and recovers the on-policy distillation reward as a strict special case, and a \textit{teacher-oracle} that delegates scoring to a stronger frozen model. On two base LLMs across seven mathematics, science, and code reasoning benchmarks, OPPO improves over GRPO, DAPO, and SDPO by up to $+6.0$ points on AMC'23 and $+5.2$ points on AIME'24, with gains that widen monotonically with response length.
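
The "Bayesian update" reading of the oracle signal can be made concrete with a small sketch. By Bayes' rule, if each token's log-likelihood ratio is $\log p(y_t \mid y_{<t}, \text{success}) - \log p(y_t \mid y_{<t})$, then the posterior success probability after $t$ tokens is the prior times the product of the ratios. The code below assumes exactly this form, and assumes that a token's advantage is the change in the running estimate; neither detail is confirmed by the abstract, and the function names are hypothetical.

```python
import math


def running_success_estimates(prior, log_ratios):
    """Posterior success probability after each token.

    prior      : assumed success probability before any token is generated
    log_ratios : per-token log-likelihood ratios (oracle-conditioned vs. unconditioned)
    """
    estimates = []
    log_posterior = math.log(prior)
    for log_ratio in log_ratios:
        log_posterior += log_ratio          # Bayes' rule in log space
        # Clamp: with imperfect oracle scores the product can exceed 1.
        estimates.append(min(1.0, math.exp(log_posterior)))
    return estimates


def token_advantages(prior, log_ratios):
    """Assumed token-level advantage: change in the running success estimate."""
    values = [prior] + running_success_estimates(prior, log_ratios)
    return [after - before for before, after in zip(values, values[1:])]


ratios = [math.log(1.5), math.log(0.5)]     # first token helps, second token hurts
print(running_success_estimates(0.5, ratios))
print(token_advantages(0.5, ratios))
```

Tokens with a ratio of 1 leave the estimate unchanged and receive zero advantage, which matches the abstract's claim that credit concentrates on pivotal tokens rather than being spread uniformly.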

2605.21489 2026-05-25 cs.LG cs.AI cs.CV stat.CO stat.ML 版本更新

Variance Reduction for Expectations with Diffusion Teachers

具有扩散教师的期望方差缩减

Jesse Bettencourt, Xindi Wu, Matan Atzmon, James Lucas, Jonathan Lorraine

发表机构 * NVIDIA ； University of Toronto（多伦多大学）； Princeton University（普林斯顿大学）

AI总结本文研究了如何在使用预训练扩散模型作为“教师”进行下游任务（如文本到3D生成、单步蒸馏等）时，降低梯度估计的方差。提出了一种名为CARV的计算感知方差控制框架，通过分层蒙特卡洛估计器，将昂贵的上游计算过程与廉价的扩散噪声重采样相结合，并结合时间步重要性采样和分层逆CDF构造，有效减少了计算成本。实验表明，CARV在不改变目标函数的前提下显著提升了计算效率，但在某些任务中梯度方差的降低并未带来生成质量的提升，表明此时方差已不再是性能瓶颈。

Comments Project page: https://research.nvidia.com/labs/sil/projects/CARV/

详情

AI中文摘要

预训练的扩散模型作为冻结教师，为文本到3D、单步蒸馏和数据归因等下游流程提供支持。这些流程消耗的教师梯度是关于噪声水平和高斯噪声样本的蒙特卡洛期望；其估计器方差主导了计算成本，因为每次抽取都需要昂贵的上游工作（渲染、模拟、编码）。我们引入了CARV，一个计算感知的方差核算框架，它激发了一种分层蒙特卡洛估计器：通过廉价的扩散噪声重采样来摊销昂贵的上游计算，并通过时间步重要性采样和分层逆CDF构造加以强化。在我们的文本到3D蒸馏和归因实验中，CARV在不改变目标的情况下提供了2-3倍的有效计算乘数（主要来自摊销重用；约25%来自IS+分层）；在单步蒸馏中，相同的技术将梯度方差降低了一个数量级，但并未改善下游FID，标志着MC方差不再是瓶颈的区间。

英文摘要

Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS+stratification) without changing the objective; in single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.
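
The two ingredients named in this abstract, amortizing one expensive upstream computation over several cheap timestep draws and drawing those timesteps by stratified inverse-CDF importance sampling, can be sketched as follows. This is an illustration, not CARV itself: the target distribution over timesteps is assumed uniform, and `upstream`, `teacher_term`, and `proposal` are hypothetical placeholders.

```python
import bisect
import itertools
import random


def stratified_inverse_cdf(proposal, n, rng):
    """Draw n timestep indices from a discrete proposal, one per equal-mass stratum."""
    total = sum(proposal)
    cdf = list(itertools.accumulate(p / total for p in proposal))
    draws = []
    for i in range(n):
        u = (i + rng.random()) / n               # one uniform inside stratum [i/n, (i+1)/n)
        idx = bisect.bisect_right(cdf, u)        # invert the CDF
        draws.append(min(idx, len(proposal) - 1))  # guard against rounding when u is near 1
    return draws


def amortized_estimate(upstream, teacher_term, proposal, n_resamples, rng):
    """Run the expensive upstream step once, then average cheap timestep resamples."""
    state = upstream()                           # e.g. rendering, simulation, encoding
    total = sum(proposal)
    num_steps = len(proposal)
    timesteps = stratified_inverse_cdf(proposal, n_resamples, rng)
    # Importance weight = uniform target probability / proposal probability.
    weighted = (teacher_term(state, t) * (1.0 / num_steps) / (proposal[t] / total)
                for t in timesteps)
    return sum(weighted) / n_resamples


rng = random.Random(0)
uniform_proposal = [1.0, 1.0, 1.0, 1.0]
print(stratified_inverse_cdf(uniform_proposal, 4, rng))
```

With a uniform proposal and as many strata as timesteps, each timestep is visited exactly once, which is the variance-reducing effect of stratification in its simplest form.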

2605.21139 2026-05-25 cs.CV cs.LG 版本更新

Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

蒸馏思考，预见行动：面向自动驾驶的认知-物理强化学习

Yang Wu, Qiang Meng, Zhaojiang Liu, Youquan Liu, Jian Yang, Jin Xie

发表机构 * NJU（南京大学）； SJTU（上海交通大学）； FDU（复旦大学）

AI总结当前端到端自动驾驶模型受到模仿学习行为克隆天花板的限制，为此，本文提出CoPhy认知-物理强化学习框架，通过将视觉语言模型知识蒸馏到鸟瞰图编码器中，实现零推理成本的认知能力，并构建自回归的鸟瞰图世界模型以预测候选动作的未来语义地图，从而在物理环境层面预见行动后果。该方法结合物理奖励和认知奖励优化驾驶策略，不仅在NAVSIM基准上取得最优性能，还支持通过用户定义的语言指令实现更安全、更灵活的驾驶控制。

详情

AI中文摘要

当前的端到端自动驾驶模型从根本上受到模仿学习的行为克隆上限的限制。虽然强化学习提供了更智能自主性的路径，但它需要两个缺失的基础设施：（1）理解交通语义和驾驶意图的认知基础，以及（2）能够预见候选行动后果的前瞻性物理环境。为此，我们提出了CoPhy，一个用于自动驾驶的认知-物理强化学习框架。为了蒸馏思考，我们将VLM知识蒸馏到BEV编码器中，然后完全丢弃VLM，以零推理成本保留认知能力，同时将认知通道作为可插拔接口释放，用于可选的人类语言命令。为了预见行动，我们构建了一个自回归BEV世界模型，该模型明确预测以候选行动为条件的未来语义地图，作为一个可解释的物理沙盒，从中直接推导出安全指标。基于这一双重基础设施，我们通过GRPO优化驾驶策略，采用新颖的双奖励机制：从BEV rollout导出的物理奖励强制执行硬安全约束，而来自语言对齐评分器的认知奖励确保意图合规。大量实验表明，CoPhy不仅在NAVSIM v1和v2基准上取得了最先进的结果，而且通过认知信息化的场景合规性和通过用户定义的语言指令实现的灵活意图控制，实现了更安全的驾驶。

英文摘要

Current end-to-end autonomous driving models are fundamentally constrained by the behavioral cloning ceiling of imitation learning. While reinforcement learning offers a path to smarter autonomy, it demands two missing pieces of infrastructure: (1) a cognitive foundation that understands traffic semantics and driving intent, and (2) a foresighted physical environment that can anticipate the consequences of candidate actions. To this end, we propose CoPhy, a Cognitive-Physical reinforcement learning framework for autonomous driving. To distill to think, we distill VLM knowledge into the BEV encoder and then discard the VLM entirely, retaining cognitive ability at zero inference cost while releasing the cognitive channel as a pluggable interface for optional human language commands. To foresee to act, we build an auto-regressive BEV world model that explicitly predicts future semantic maps conditioned on candidate actions, serving as an interpretable physical sandbox from which safety metrics are directly derived. Built upon this dual infrastructure, we optimize the driving policy via GRPO with a novel dual-reward mechanism: a physical reward derived from BEV rollouts enforces hard safety constraints, while a cognitive reward from a language-aligned scorer ensures intent compliance. Extensive experiments demonstrate that CoPhy not only achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks, but also enables safer driving via cognitively informed scene compliance and flexible intent control through user-defined language instructions.

2605.20919 2026-05-25 cs.LG cs.AI cs.PL 版本更新

Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures

Sutra: 以张量操作RNN作为向量符号架构的编译目标

Emma Leonhart

发表机构 * Emma Leonhart

AI总结 Sutra 是一种类型化的纯函数式编程语言，其前向传播过程被编译为 PyTorch 神经网络。该语言通过将程序中的原始操作、控制流和字符串 I/O 等全部转换为一个融合的张量操作图，实现了对向量符号架构的高效编译。研究展示了 Sutra 在多种嵌入表示上的高精度解码能力，并验证了其可微分性，使得同一程序既能作为逻辑程序运行，也能作为可训练的神经网络进行优化。

Comments Modified NeurIPS submission, see AI declaration and replication materials at end of paper

详情

AI中文摘要

Sutra是一种带类型的纯函数式编程语言，其编译后的前向传播是一个PyTorch神经网络。编译器将整个程序——包括原语、控制流、字符串I/O——通过beta归约降级为一个在冻结嵌入基质上的融合张量操作图。旋转绑定、解绑、捆绑、多项式Kleene三值逻辑以及尾递归循环均被降级为张量操作；Kleene连接词是在{-1, 0, +1}真值网格上精确的拉格朗日插值多项式。验证通过两种方式测试同一事实。(1) 同一程序在跨越两种模态的四个冻结嵌入上运行——三种文本编码器（nomic-embed-text、all-minilm、mxbai-embed-large）和一种蛋白质语言模型（ESM-2）——并在每个基质上以宽度k=8实现100%的解码准确率，而教科书式的Hadamard乘积已经崩溃（mxbai-embed-large上2.5%，all-minilm上7.5%）。(2) PyTorch自动求导流经实际编译的图：一个用.su编写的模糊规则分类器从随机初始化（18.7±9.5%；随机概率=20%，五类）通过反向传播经过发射图（符号源未修改）训练到100.0±0.0%（三个种子）。一个加权变体额外训练一个标量余弦增益，并将其作为数值字面量写回.su源文件；重新编译重现训练后的行为，每个logit误差约2e-7，因此训练后的模型本身是可读、可重编译的代码。因此，同一工件既是一个逻辑程序，也是一个可训练的神经网络。

英文摘要

Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation is one fact tested two ways. (1) The same program runs on four frozen embeddings spanning two modalities -- three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2) -- and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random init (18.7 +/- 9.5%; chance = 20%, five classes) to 100.0 +/- 0.0% (three seeds) by backpropagating through the emitted graph, the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.
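
The statement that Kleene connectives can be polynomials exact on the {-1, 0, +1} truth grid can be checked with a short sketch. It uses the standard strong Kleene definitions (AND is the minimum, OR is the maximum, NOT is negation) and tensor-product Lagrange interpolation; the polynomial form Sutra actually emits is not given in the abstract, so the helper names here are hypothetical.

```python
from fractions import Fraction
from itertools import product

GRID = (-1, 0, 1)   # false, unknown, true


def lagrange_basis(node, x):
    """1-D Lagrange basis polynomial for `node` on the nodes {-1, 0, +1}, evaluated at x."""
    value = Fraction(1)
    for other in GRID:
        if other != node:
            value *= (Fraction(x) - other) / (node - other)
    return value


def interpolate(truth_table):
    """Return the bivariate polynomial that agrees with truth_table on the 3x3 grid."""
    def poly(a, b):
        return sum(truth_table(u, v) * lagrange_basis(u, a) * lagrange_basis(v, b)
                   for u, v in product(GRID, repeat=2))
    return poly


kleene_and = interpolate(min)
kleene_or = interpolate(max)


def kleene_not(a):
    return -a


for a, b in product(GRID, repeat=2):
    print(a, b, kleene_and(a, b), kleene_or(a, b))
```

Because the result is an ordinary polynomial, it is also defined for values between the grid points and is differentiable everywhere, which is what allows such connectives to sit inside a trainable tensor graph.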

2605.20896 2026-05-25 cs.CR cs.AI cs.LG 版本更新

TwinRouterBench：面向现实智能体LLM路由的快速静态与实时动态评估

Pei Yang, Wanyi Chen, Tongyun Yang, Pengbin Feng, Jiarong Xing, Wentao Guo, Yuhang Yao, Yuhang Han, Hanchen Li, Xu Wang, Zeyu Wang, Jie Xiao, Anjie Yang, Liang Tian, Lynn Ai, Eric Yang, Tianyu Shi

发表机构 * Gradient ； Soochow University（苏州大学）； Independent Researcher（独立研究者）； University of Southern California（南加州大学）； Rice University（Rice大学）； Carnegie Mellon University（卡内基梅隆大学）； Shanghai Jiao Tong University（上海交通大学）； University of California, Berkeley（加州大学伯克利分校）； University of the Chinese Academy of Sciences（中国科学院大学）； University of California, Los Angeles（加州大学洛杉矶分校）

AI总结本文提出 TwinRouterBench，一个用于评估代理式大语言模型（LLM）路由策略的基准工具，旨在支持静态和动态场景下的高效评估。该基准包含两个赛道：静态赛道提供多个任务中的模型调用前缀及对应的最优模型层级，通过确定性计算进行评分；动态赛道则在真实代理系统中运行路由策略，评估其在实际任务完成和成本控制方面的表现。该工作为路由算法的开发与优化提供了全面且高效的实验平台。

详情

AI中文摘要

LLM路由在长时任务（如编码智能体、深度研究系统和计算机使用智能体）中最为重要，其中单个用户请求会触发多次模型调用。将每次调用路由到最便宜的足够模型可以在不牺牲质量的情况下降低成本，然而现有的路由器基准仅评估一次性提示的路由。它们从未暴露中间智能体步骤中路由器可见的前缀，从未测试更便宜的替代品是否保留下游任务的成功，并且通常在评估时依赖在线LLM评判。我们引入了TwinRouterBench，一个具有两轨的步骤级路由基准。静态轨提供来自SWE-bench、BFCL、mtRAG、QMSum和PinchBench中520个实例的970个路由器可见前缀，每个前缀与在发布的降级和级联协议下估计的执行验证目标层级配对；评分是层级标签、轨迹成员资格和令牌成本的确定性算术，无需在线评估方LLM评判。动态轨提供一个工具，可在完整的500例SWE-bench验证集上运行路由器；本文报告了与静态SWE监督划分不相交的100例保留评估。每次LLM调用时，路由器从锁定池中选择一个具体模型，成功由官方任务解决率和实际API支出衡量。两轨支持快速离线迭代，随后在实时智能体执行下进行端到端验证。代码和数据可在https://github.com/CommonstackAI/TwinRouterBench获取。

英文摘要

LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many model calls. Routing each call to the cheapest sufficient model can cut costs without sacrificing quality, yet existing router benchmarks evaluate routers only on one-shot prompts. They never expose the router-visible prefix at an intermediate agent step, never test whether a cheaper replacement preserves downstream task success, and often rely on online LLM judges at evaluation time. We introduce TwinRouterBench, a step-level routing benchmark with two tracks. The static track provides 970 router-visible prefixes from 520 instances across SWE-bench, BFCL, mtRAG, QMSum, and PinchBench, each paired with an execution-verified target tier estimated under a released downgrade-and-cascade protocol; scoring is deterministic arithmetic over tier labels, trajectory membership, and token costs, with no online evaluator-side LLM judge. The dynamic track supplies a harness that runs routers on the full 500-case SWE-bench Verified suite; in this paper we report a 100-case held-out evaluation disjoint from the static SWE supervision split. At each LLM call the router selects a concrete model from a locked pool, and success is measured by official task resolution and realized API spend. The two tracks support fast offline iteration followed by end-to-end validation under live agent execution. Code and data are available at https://github.com/CommonstackAI/TwinRouterBench.

URL PDF HTML ☆

赞 0 踩 0

2605.18370 2026-05-25 stat.ML cs.LG math.ST stat.TH 版本更新

On Stability and Decomposition of Sample Quantiles under Heavy-Tailed Distributions

重尾分布下样本分位数的稳定性与分解

Choudur Lakshminarayan

发表机构 * School of Business, Stevens Institute of Technology（斯蒂文斯理工学院商学院）

AI总结本文研究了在重尾分布下，基于估计参数的样本分位数的稳定性与分解问题，尤其关注与金融收益线性投影相关的风险价值（VaR）估计。传统Bahadur表示在固定分布下难以分离投影方向和分位数阈值带来的不稳定性，本文提出一种Q-Q正交性方法，将两者的影响分离开来，并将样本分位数与理论分位数的差异分解为三个部分，分别对应投影方向变化、样本分位数波动以及余项，从而更精确地分析分位数估计的稳定性来源。

Comments 0 figures

详情

AI中文摘要

我们研究由估计参数索引的分布样本分位数，重点关注与金融收益线性投影相关的风险价值，其潜在概率律是重尾的。在此设定下，投影方向和经验分位数阈值均从数据中估计，因此固定分布下的标准Bahadur表示无法分离不同的不稳定性来源。一个规范的起点是Bahadur表示，它通过经验分布函数加上余项来表达样本分位数\cite{bahadur1966}。经验过程理论通过半空间、对称差和Glivenko-Cantelli一致收敛的机制提供了可用的框架。它们给出了稳定性界，但将投影方向的变化和分位数阈值的变化吸收到单一的对称差度量中。有趣的是，对于本质上是局部分位数稳定性问题，却施加了全局一致收敛的要求。本文引入了一种Q-Q正交性公式来分离投影方向和分位数阈值效应。关注的对象是使用估计投影方向计算的经验分位数与参考投影方向下的总体分位数之间的差异。我们将此差异分解为三项：$\hat q_α(\hat w)-q_α(w_0)=D_1+D_2+D_3$。其中，$D_1$衡量由投影方向扰动引起的总体分位数移动，$D_2$衡量在投影方向固定时经验分位数的波动，$D_3$是Bahadur型余项。

英文摘要

We study sample quantiles of distributions indexed by estimated parameters, with a focus on Value-at-Risk related to linear projections of financial returns whose underlying probability law is heavy-tailed. In this setting, the projection direction and the empirical quantile threshold are estimated from the data, so the standard Bahadur representation under a fixed distribution does not separate the distinct sources of instability. A canonical starting point is Bahadur's representation, which expresses the sample quantile through the empirical distribution function plus a remainder term \cite{bahadur1966}. Empirical-process theory provides a usable scaffolding through the mechanics of half-spaces, symmetric differences, and Glivenko--Cantelli uniform convergence. They yield stability bounds, but absorb changes in projection direction and changes in quantile threshold into a single symmetric-difference measure. Interestingly, a global uniform-convergence requirement is imposed on what is intrinsically a local quantile-stability problem. This paper introduces a Q-Q orthogonality formulation for separating projection-direction and quantile-threshold effects. The object of interest is the difference between the empirical quantile computed using the estimated projection direction and the population quantile computed at the reference projection direction. We decompose this difference into three terms, $\hat q_α(\hat w)-q_α(w_0)=D_1+D_2+D_3$. Here, $D_1$ measures the population quantile movement induced by perturbing the projection direction, $D_2$ measures the empirical quantile fluctuation with the projection direction held fixed, and $D_3$ is the Bahadur-type remainder.

2605.18329 2026-05-25 cs.CV cs.LG 版本更新

Lost in the Folds: When Cross-Validation Is Not a Deep Ensemble for Uncertainty Estimation

迷失在折叠中：当交叉验证不是用于不确定性估计的深度集成时

Tristan Kirscher, Markus Bujotzek, Yannick Kirchhoff, Maximilian Rokuss, Fabian Isensee, Kim-Celine Kahl, Balint Kovacs, Klaus Maier-Hein

发表机构 * ICube Laboratory, CNRS UMR-7357, University of Strasbourg, Strasbourg, France（ICube实验室，法国斯特拉斯堡大学）； CLCC Institut-Strauss, Strasbourg, France（CLCC斯特拉斯堡研究所）； German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing（海德堡德国癌症研究中心（DKFZ）医学影像计算部门）； Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany（海德堡医学院，海德堡大学）； Faculty of Mathematics and Computer Science, University of Heidelberg, Germany（海德堡大学数学与计算机科学学院）； Helmholtz Imaging, German Cancer Research Center, Heidelberg, Germany（海德堡德国癌症研究中心Helmholtz成像部门）； Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany（海德堡大学医院放射肿瘤学部模式分析与学习小组）

AI总结在医学图像分割中，集成模型的分歧常被用作认识论不确定性的代理，但许多研究通过K折交叉验证（CV）构建集成模型，却称之为“深度集成”（DE），导致术语与实现不一致。本文对比了标准5折CV集成与5成员DE在三个多标注分割数据集上的表现，发现DE在保持分割精度的同时，提升了校准和失败检测能力，而CV集成有时与标注者间差异相关性更强。研究指出，应根据研究目标选择集成构建方式：DE适用于可靠性导向任务（如选择性转诊），CV集成则更适合作为模糊性代理。

Comments Accepted for publication at MICCAI 2026

详情

Journal ref: 29th International Conference On Medical Image Computing And Computer Assisted Intervention, Sep 2026, Strasbourg, France

AI中文摘要

集成不一致性被广泛用作医学图像分割中认知不确定性的代理。在实践中，许多研究通过K折交叉验证（CV）形成集成，却称之为“深度集成”（DE）。由于CV成员在不同的数据子集上训练，它们的不一致性混合了种子驱动变异和数据暴露效应，这可能改变不确定性的解释方式。我们审查了最近的分割不确定性研究，发现术语与实现不匹配很常见。然后，我们在三个多模态多标注者分割数据集上，在相同配置下比较了标准5折CV集成与5成员DE（固定训练集，不同随机种子）。我们评估了不确定性在校准、故障检测、歧义建模和分布偏移下的鲁棒性。DE在匹配分割精度的同时改善了校准和故障检测，而CV集成在研究数据集上有时与标注者间变异性相关性更强。因此，应选择与研究问题匹配的集成构建方式：DE用于可靠性导向的使用（如选择性转诊/故障检测），CV集成作为歧义的代理。我们提供了一个轻量级的nnU-Net修改，使得在默认流程内能够进行DE训练。

英文摘要

Ensemble disagreement is widely used as a proxy for epistemic uncertainty in medical image segmentation. In practice, many studies form ensembles via K-fold cross-validation (CV), yet refer to them as "deep ensembles" (DE). Because CV members are trained on different data subsets, their disagreement mixes seed-driven variability with data-exposure effects, which can change how uncertainty should be interpreted. We audit recent segmentation uncertainty studies and find that terminology-implementation mismatches are common. We then compare a standard 5-fold CV ensemble to a 5-member DE (fixed training set, different random seeds) under otherwise identical configurations on three multi-rater segmentation datasets spanning three modalities. We evaluate uncertainty for calibration, failure detection, ambiguity modeling, and robustness under distribution shift. DE match segmentation accuracy while improving calibration and failure detection, whereas CV ensembles sometimes correlate more strongly with inter-rater variability on the studied datasets. Thus, ensemble construction should be chosen to match the research question: DE for reliability-oriented use (e.g., selective referral/failure detection) and CV ensembles as a proxy for ambiguity. We provide a lightweight nnU-Net modification enabling DE training within the default pipeline.
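
The difference between the two ensemble constructions compared in this abstract comes down to what each member is trained on. The sketch below only builds the training plans (seed and training indices per member); it is not the authors' nnU-Net modification, and the function names and the interleaved fold assignment are assumptions made for illustration.

```python
def cv_ensemble_plan(n_samples, k=5, base_seed=0):
    """K-fold CV ensemble: member i trains on every fold except fold i.

    Members differ in both seed and training data, so their disagreement
    mixes seed-driven variability with data-exposure effects.
    """
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]            # simple interleaved folds
    plan = []
    for held_out in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != held_out for j in fold)
        plan.append({"seed": base_seed + held_out, "train_indices": train})
    return plan


def deep_ensemble_plan(n_samples, members=5, base_seed=0):
    """Deep ensemble: every member sees the full training set; only the seed changes."""
    indices = list(range(n_samples))
    return [{"seed": base_seed + m, "train_indices": list(indices)}
            for m in range(members)]


cv_plan = cv_ensemble_plan(10, k=5)
de_plan = deep_ensemble_plan(10, members=5)
print([len(m["train_indices"]) for m in cv_plan])   # each CV member sees 4/5 of the data
print([len(m["train_indices"]) for m in de_plan])   # each DE member sees all of the data
```

In the CV plan no two members share the same training set, whereas in the DE plan all training sets are identical, so any disagreement among DE members can be attributed to initialization and training randomness alone.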

2605.17767 2026-05-25 stat.ML cs.LG 版本更新

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

线性宽度双层网络中的特征学习：梯度下降的两步 vs 一步

Behrad Moniri, Hamed Hassani

发表机构 * University of Pennsylvania（宾夕法尼亚大学）

AI总结本文研究了在宽度线性增长的两层神经网络中特征学习的行为，重点分析了梯度下降第二步更新时隐藏层权重的变化。作者超越了之前仅分析单步更新的研究，揭示了第二步更新中权重的谱特性，表明其行为类似于具有多个异常值的尖峰随机矩阵，这些异常值对应于学习到的不同方向。研究还发现，通过重复使用训练批次而非独立批次，可以学习到信息指数大于一的方向，表明批次重用在宽网络中仍具有优势。

详情

AI中文摘要

我们在线性宽度机制下研究双层神经网络中的特征学习，其中隐藏神经元数量、样本量和输入维度成比例缩放。尽管近期工作分析了该机制下通过单步梯度下降更新第一层权重的特征学习，但这种单步更新方案存在根本性限制：权重更新近似秩一，仅捕获单个方向，且要求目标函数的信息指数为1。本文超越单步更新，完整刻画了步长$η_1\asymp N^{α_1}$和$η_2 \asymp N^{α_2}$（$α_1, α_2 \in [0,0.5)$，$N$为隐藏神经元数）的梯度下降\textit{第二步}过程中学习的特征。我们推导了更新权重的谱特征，证明其表现为具有多个离群点的尖峰随机矩阵，每个离群点对应一个学习方向。我们证明离群点数量由参数$α_1, α_2$通过$\lfloor \frac{α_2}{1/2 - α_1} \rfloor$决定。此外，通过分析学习方向与目标函数之间的对齐，我们发现了独立批次与重用批次训练之间的差距。独立批次将学习限制在信息指数为1的方向上，而批重用使得第二步更新能够捕获信息指数超过1的方向，前提是$α_1, α_2$选择得当。这表明先前在窄宽度机制中观察到的批重用优势在线性宽度极限下仍然存在。通过刻画这些早期阶段的演化，我们的工作为研究现代过参数化网络中的优化和特征学习现象提供了一个易处理的框架。

英文摘要

We study feature learning in two-layer neural networks within the linear-width regime, where the number of hidden neurons, sample size, and input dimension scale proportionally. While recent work has analyzed feature learning via a single step of gradient descent on the first layer weights in this regime, such one-step update schemes are fundamentally limited: the update to the weights is approximately rank-one, captures only a single direction, and requires the target function to have an information exponent of one. In this paper, we go beyond one-step updates to provide a full characterization of the features learned during the \textit{second step} of gradient descent with step-sizes $η_1\asymp N^{α_1}$ and $η_2 \asymp N^{α_2}$ for $α_1, α_2 \in [0,0.5)$, where $N$ is the number of hidden neurons. We derive a spectral characterization of the updated weights, demonstrating they behave as a spiked random matrix with multiple outliers, each corresponding to a learned direction. We show that the number of the outliers is determined by the parameters $α_1, α_2$ through $\lfloor \frac{α_2}{1/2 - α_1} \rfloor$. Furthermore, by analyzing the alignment between the learned directions and the target function, we identify a gap between training with independent versus reused batches. While independent batches restrict learning to directions with an information exponent of one, batch reuse enables the second update to capture directions even when the information exponent exceeds one, provided that $α_1, α_2$ are chosen properly. This shows that the benefits of batch reuse, previously observed in narrow-width regimes, persist in the linear-width limit as well. By characterizing these early-phase evolutions, our work proposes a tractable framework for studying optimization and feature learning phenomenology in modern overparameterized networks.
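
The outlier count stated in the abstract, $\lfloor \alpha_2 / (1/2 - \alpha_1) \rfloor$, is easy to evaluate for concrete step-size exponents. The helper below is a hypothetical convenience function that only evaluates this formula; it uses exact rational arithmetic so that values at the boundary of the floor are not disturbed by floating-point rounding.

```python
import math
from fractions import Fraction


def num_outliers(alpha1, alpha2):
    """Evaluate floor(alpha2 / (1/2 - alpha1)) for alpha1, alpha2 in [0, 1/2).

    Pass exponents as strings ("0.4"), ints, or Fractions to keep them exact.
    """
    a1, a2 = Fraction(alpha1), Fraction(alpha2)
    half = Fraction(1, 2)
    if not (0 <= a1 < half and 0 <= a2 < half):
        raise ValueError("alpha1 and alpha2 must lie in [0, 1/2)")
    return math.floor(a2 / (half - a1))


print(num_outliers("0.25", "0.4"))   # 0.4 / 0.25 = 1.6
print(num_outliers("0.4", "0.45"))   # 0.45 / 0.1 = 4.5
```

The count grows as $\alpha_1$ approaches 1/2, since the denominator shrinks: a larger first step leaves room for the second step to expose more directions.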

2605.17245 2026-05-25 cs.NI cs.LG 版本更新

An Efficient Machine Learning-based Framework for Detection and Prevention of Frauds in Telecom Networks

一种基于机器学习的高效电信网络欺诈检测与预防框架

Praveen Hegde, Mishal Shah

发表机构 * Verizon； Bloomberg LP； Atlanta, USA（美国亚特兰大）； Jersey City, USA（美国新泽西州泽西市）

AI总结 本文提出了一种基于机器学习的高效框架，用于电信网络中欺诈行为的检测与预防。研究使用包含10万余条客户记录的电信详单数据集，通过特征预处理、数据平衡和模型训练等步骤，评估了多种机器学习模型的性能。实验结果表明，随机森林（RF）模型在准确率、精确率、召回率和F1分数等指标上均达到99.9%，是检测电信欺诈最有效的模型。

Comments Peer-reviewed and presented at 2025 International Conference on Advancement in Communication and Computing Technology (INOACC-2025); self-published by the author due to a sustained 13-month indexing delay by the organizers. Contains 7 pages and 7 figures

Journal ref: International Conference on Advancement in Communication and Computing Technology (INOACC), 2025

AI中文摘要

电信欺诈是一个严重问题，导致重大物质损失并损害全球电信系统的可靠性。只有有效且高效的检测机制才能应对这些威胁，尽管欺诈检测方法有所转变。本文使用通话详细记录（CDR）数据集评估了人工智能驱动的模型在电信网络欺诈检测中的性能。本研究聚焦于使用Telecom CDR数据集进行电信网络欺诈检测，该数据集包含101,174条客户记录，具有17个属性，其中包括8,830个欺诈案例。在特征预处理中，处理了缺失值，随后使用Min-Max缩放进行数据缩放，并使用SMOTE技术进行数据平衡。使用随机森林（RF）和XGBoost模型对数据集进行预测分析训练。使用F1分数、ROC AUC、召回率、准确率、时间和精确度作为指标来比较两个模型的性能。RF的准确率高达99.9%，而XGBoost为99.7%。结果表明，所提出的框架成功检测欺诈且误分类很少。评估和对比了多种机器学习模型，如RF、XGBoost、DBSCAN、RoBERTa和K-means。在所有模型中，RF表现最佳，准确率99.9%、精确度99.9%、召回率99.9%和F1分数99.9%，优于XGBoost、GNN和BERT。研究结果强调RF是检测电信网络欺诈活动的最有效模型，确保稳健可靠的欺诈预防。

英文摘要

Telecommunication fraud is an acute problem that leads to substantial material losses and compromises the reliability of telecom systems worldwide. Only effective and efficient detection mechanisms can help to deal with these threats, though there are certain shifts in the approaches to fraud detection. This paper evaluates the performance of AI-driven models for fraud detection in telecommunication networks using Call Detail Record (CDR) datasets. This study focuses on fraud detection in telecom networks using the Telecom CDR dataset, which contains 101,174 customer records with 17 attributes, including 8,830 fraud cases. In feature preprocessing, missing values were handled, followed by Min-Max scaling and data balancing using the SMOTE technique. Random Forest (RF) and XGBoost models were trained on the dataset for predictive analysis. F1-score, ROC AUC, recall, accuracy, time, and precision were used as metrics to compare the performance of the two models. RF reached an accuracy of 99.9%, compared with 99.7% for XGBoost. Findings show that the suggested framework successfully detects fraud with few misclassifications. Several machine learning models were evaluated and contrasted, such as RF, XGBoost, DBSCAN, RoBERTa, and K-means. Among all the models, RF gave the highest performance, with accuracy, precision, recall, and F1-score all at 99.9%, outperforming XGBoost, GNN and BERT. The findings emphasize RF as the most effective model for detecting fraudulent activities in telecom networks, ensuring robust and reliable prevention of fraud.
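
The two preprocessing steps named in the abstract, Min-Max scaling and SMOTE-style balancing, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' pipeline: the helper names `min_max_scale` and `smote_like` are hypothetical, the toy feature rows are invented, and the oversampler is a simplified version of SMOTE (interpolating between a minority point and one of its nearest minority neighbours).

```python
import random


def min_max_scale(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (a simplified SMOTE; hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Invented toy rows: [call duration, calls per day].
    fraud_rows = min_max_scale([[1, 10], [3, 30], [2, 20]])
    print(smote_like(fraud_rows, n_new=4))
```

In practice the scaler and the oversampler are usually fitted on the training split only, so that synthetic points do not leak information into the evaluation data.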

2605.17076 2026-05-25 cs.LG cs.AI cs.DC cs.MA 版本更新

S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

S-Bus: 多智能体LLM状态协调的自动读集重建

Sajjad Khan

发表机构 * Sajjad Khan

AI总结 本文提出了一种名为 S-Bus 的 HTTP 中间件，用于解决多智能体 LLM 在共享可变状态时的并发控制问题，尤其针对无法声明读集的场景。其核心机制 DeliveryLog 能够在提交时从观察到的 HTTP GET 流量中重建每个智能体的读集，从而实现一种名为“可观测读隔离”（ORI）的一致性保证，有效防止专用分片拓扑中的结构性竞态条件。研究贡献包括形式化验证、与 PostgreSQL 和 Redis 的安全性对等实证对比，以及对 ORI 在不同工作负载下的语义影响分析。

Comments v2: LLM judge validated against human annotator (Zahid Hussain, Mindgigs Peshawar) on PH-3 at strict kappa=0.93 (n=93, 96.8% agreement); over-claim refined to 32% (LLM) / 49% (human). Adds Exp.PG-Comparison Rust-Native and Workload-B chi2=1094.98. 24 pages, 23 tables. Annotation data attached as arXiv ancillary files

AI中文摘要

我们解决了通过HTTP共享可变状态的LLM智能体的并发控制问题，其中智能体无法被修改以声明读集。S-Bus是一个HTTP中间件，其核心机制——服务端DeliveryLog——在提交时从观察到的HTTP GET流量中重建每个智能体的读集。它提供的一致性属性——可观测读隔离（ORI），一种基于HTTP可观测读投影的部分因果一致性——防止了专用分片拓扑中的结构性竞态条件。三项贡献：（C1）DeliveryLog机制，具有三层机械化证据：TLAPS证明了ReadSetSoundness和ORICommitSafety（基于一个类型公理）；N=3时的穷举TLC探索了20,763,484个状态，零违规；Dafny验证了9个归纳引理。（C2）与PostgreSQL 17 SERIALIZABLE和Redis 7 WATCH/MULTI的经验安全对等：在884,110次提交尝试中（其中427,308次处于活跃争用下）零Type-I损坏。（C3）ORI在专用分片工作负载中语义中性，但在单分片协作写入中有害，因为保留传播并发矛盾。 v2更新：PH-3 LLM评判器现在已针对人类标注者（Zahid Hussain, Mindgigs Peshawar）在400个（步骤，分片）对上进行独立验证，严格kappa=0.93（n=93，原始一致性96.8%）。LLM间评判器一致性为kappa=0.46（边界方差）。智能体自我报告高估分片使用量32%（LLM评判器）至49%（人类标注者）。SJ-v4语义质量评分标准仍为单评判器LLM-only。源代码、形式化证明、测试框架、标注数据：https://github.com/sajjadanwar0/sbus

英文摘要

We address concurrency control for LLM agents sharing mutable state over HTTP, where agents cannot be modified to declare read sets. S-Bus is an HTTP middleware whose central mechanism, a server-side DeliveryLog, reconstructs each agent's read set at commit time from observed HTTP GET traffic. The consistency property it provides -- Observable-Read Isolation (ORI), a partial causal consistency over the HTTP-observable read projection -- prevents Structural Race Conditions in dedicated-shard topologies. Three contributions. (C1) DeliveryLog mechanism with three-tier mechanised evidence: TLAPS proves ReadSetSoundness and ORICommitSafety (modulo one typing axiom); exhaustive TLC at N=3 explores 20,763,484 states with zero violations; Dafny discharges 9 inductive lemmas. (C2) Empirical safety parity against PostgreSQL 17 SERIALIZABLE and Redis 7 WATCH/MULTI: zero Type-I corruptions across 884,110 commit attempts (427,308 under active contention). (C3) ORI is semantically neutral in dedicated-shard workloads but harmful in single-shard collaborative writing because preservation propagates concurrent contradictions. v2 update: the PH-3 LLM judge is now independently validated against a human annotator (Zahid Hussain, Mindgigs Peshawar) on 400 (step, shard) pairs at strict kappa=0.93 (n=93, 96.8% raw agreement). Inter-LLM-judge agreement is kappa=0.46 (boundary variance). Agent self-reports over-claim shard usage by 32% (LLM judge) to 49% (human annotator). The SJ-v4 semantic-quality rubric remains single-judge LLM-only. Source code, formal proofs, harness, annotation data: https://github.com/sajjadanwar0/sbus
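
The idea of rebuilding a read set from observed GET traffic and checking it at commit time can be illustrated with a small in-memory model. The sketch below is an assumption-laden toy, not the S-Bus implementation or its API: the class `DeliveryLogSketch`, its methods, and the rule "reject the commit if any shard the agent read has since changed" are all hypothetical simplifications of the mechanism the abstract describes.

```python
class DeliveryLogSketch:
    """Toy model of a server-side log that records which shard versions
    each agent was served, then validates them when the agent commits."""

    def __init__(self):
        self.shards = {}     # shard key -> (version, value)
        self.delivered = {}  # agent id -> {shard key: version served}

    def get(self, agent, key):
        """Serve a read and record the version delivered to this agent."""
        version, value = self.shards.get(key, (0, None))
        self.delivered.setdefault(agent, {})[key] = version
        return value

    def commit(self, agent, key, value):
        """Accept the write only if every shard the agent was observed
        reading is still at the version it was served."""
        read_set = self.delivered.pop(agent, {})
        for read_key, seen_version in read_set.items():
            current_version, _ = self.shards.get(read_key, (0, None))
            if current_version != seen_version:
                return False  # stale read: the agent must re-read and retry
        version, _ = self.shards.get(key, (0, None))
        self.shards[key] = (version + 1, value)
        return True


if __name__ == "__main__":
    bus = DeliveryLogSketch()
    bus.get("agent-a", "plan")
    bus.get("agent-b", "plan")
    print(bus.commit("agent-a", "plan", "v1"))       # True
    print(bus.commit("agent-b", "summary", "uses"))  # False: plan changed
```

The agents never declare what they read; the read set is inferred entirely from the `get` calls the server observed, which is the property the abstract attributes to the DeliveryLog.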

2605.16799 2026-05-25 cs.LG cs.AI 版本更新

Cross-Domain Molecular Relational Learning: Leveraging Chemical Structure-Activity Analysis

跨域分子关系学习：利用化学结构-活性分析

Peiliang Zhang, Jingling Yuan, Shiqing Wu, Mengqing Hu, Chao Che, Yongjun Zhu, Lin Li

发表机构 * Wuhan University of Technology（武汉理工大学）； Yonsei University（延世大学）； Hubei Key Laboratory of Transportation Internet of Things（湖北省交通运输物联网重点实验室）； State Key Laboratory of Silicate Materials for Architectures（建筑硅酸盐材料国家重点实验室）； City University of Macau（澳门城市大学）； Kyung Hee University（庆熙大学）； Dalian University（大连大学）

AI总结 该研究针对分子关系学习中跨领域建模的不足，提出了一种基于结构-活性分析的跨领域分子关系学习方法。核心方法是引入结构语义迁移差异的领域对抗训练网络（DisTrans），通过子结构拓扑差异引导模型学习分子结构的领域依赖性，并对齐源域与目标域的功能团语义信息，从而提升跨领域适应能力。实验表明，该方法在两种典型跨领域场景下优于16种基线方法，具有良好的泛化性能。

Comments Accepted by SIGKDD 2026 Research Track

AI中文摘要

分子表示的最新进展整合了分子拓扑和视觉模态，为精确的分子关系学习（MRL）开辟了新途径。现有的MRL方法专注于域内建模，其固有的域封闭效应限制了在分子科学中的适用性，特别是在阐明跨域相互作用机制方面。因此，跨域分子关系学习的必要性日益迫切。受益于结构-活性分析，我们提出了具有结构语义迁移差异的域对抗训练网络（DisTrans），以优化分子结构和视觉图像的跨域自适应表示。1）我们利用基于域间子结构拓扑差异的梯度反转策略来学习分子结构的域依赖性。该策略引导模型适应目标域中的结构邻接模式，生成域可分离的结构表示。2）我们应用跨域表示引导机制来对齐源域和目标域之间的官能团语义信息，学习跨域一致性信息。在两种典型跨域策略中的实验结果表明，DisTrans优于16种基线方法，即使在显著的域间差异下也能保持令人满意的性能。

英文摘要

Recent advances in molecular representation integrate molecular topological and visual modalities, opening new avenues for precise Molecular Relational Learning (MRL). Existing MRL methods focus on intra-domain modeling, and their inherent domain-closed effect limits applicability to molecular science, particularly in elucidating cross-domain interaction mechanisms. Consequently, the imperative for Cross-Domain Molecular Relational Learning has become increasingly pressing. Benefiting from structure-activity analysis, we propose the Domain Adversarial Training Network with Structural-Semantic Transfer Discrepancy (DisTrans) to optimize cross-domain adaptive representation for molecular structures and visual images. 1) We employ the gradient reversal strategy based on substructure topological discrepancies between domains to learn the domain dependence of molecular structures. This strategy guides the model to adapt to the structural adjacency patterns in the target domain, generating domain-separable structural representations. 2) We apply the cross-domain representation guidance mechanism to align the functional-group semantic information between the source and target domains, learning cross-domain consistency information. The experimental results in two typical cross-domain strategies demonstrate that DisTrans outperforms 16 baseline methods, maintaining satisfactory performance even under pronounced inter-domain discrepancy.

2605.11490 2026-05-25 cs.LG stat.ML 版本更新

Adaptive Calibration in Non-Stationary Environments

非平稳环境中的自适应校准

Junyan Liu, Haipeng Luo, Lillian J. Ratliff

发表机构 * University of Washington（华盛顿大学）； University of Southern California（南加州大学）

AI总结 在非平稳环境中实现自适应校准是现代AI系统中的核心挑战。本文提出了一类能够根据环境非平稳程度自动调整校准误差的在线预测算法，在i.i.d.和对抗性环境之间实现平滑过渡。该方法在多种校准度量下均取得了理论保证，其误差上界在平稳情形下达到最优速率，并在完全对抗性情形下恢复已知保证；方法扩展了先前相关工作，引入了基于阶段的调度策略和预测空间的非均匀划分技术。

Comments Added results for piecewise-stationary environments and included a comparison with the concurrent work of Huang et al. (arXiv:2605.09273)

AI中文摘要

在现代AI系统中，进行校准的在线预测是一个核心挑战。现有文献大多关注完全对抗性环境，其中结果可能是任意的，导致算法保守，在更温和的设置（如结果近乎平稳）中表现次优。这一差距引发了一个自然问题：我们能否设计在线预测算法，其校准误差自动适应环境的非平稳程度，在独立同分布和对抗性场景之间平滑插值？我们对此问题给出肯定回答，并开发了一套算法，在多种校准度量下实现自适应校准保证。具体地，设$T$为轮数，$K$为环境中未知的独立同分布段数，$C\in[0,T]$为另一个未知的非平稳度量（定义为均值结果的最小$\ell_1$偏差），我们的算法对$\ell_1$校准误差达到$\widetilde{O}(\min\{\sqrt{T}+(TC)^{\frac{1}{3}}, \sqrt{KT}\})$，对$\ell_2$和伪KL校准误差均达到$\widetilde{O}(\min\{(1+C)^{\frac{1}{3}}, K\})$。这些界匹配平稳情况（$C=0$且$K=1$）的最优率，并在完全对抗性场景（$C, K=\Omega(T)$）中恢复已知保证。我们的方法建立在并扩展了先前工作[Hu等人，2026，Luo等人，2025]的基础上，引入基于epoch的调度以及对预测空间进行新颖的非均匀划分，在底层真实值附近分配更精细的分辨率。

英文摘要

Making calibrated online predictions is a central challenge in modern AI systems. Much of the existing literature focuses on fully adversarial environments where outcomes may be arbitrary, leading to conservative algorithms that can perform suboptimally in more benign settings, such as when outcomes are nearly stationary. This gap raises a natural question: can we design online prediction algorithms whose calibration error automatically adapts to the degree of non-stationarity in the environment, smoothly interpolating between i.i.d. and adversarial regimes? We answer this question in the affirmative and develop a suite of algorithms that achieve adaptive calibration guarantees under multiple calibration measures. Specifically, with $T$ being the number of rounds, $K$ being the unknown number of i.i.d. segments of the environment, and $C\in[0,T]$ being another unknown non-stationary measure defined as the minimal $\ell_1$ deviation of the mean outcomes, our algorithms attain $\widetilde{O}(\min\{\sqrt{T}+(TC)^{\frac{1}{3}}, \sqrt{KT}\})$ for $\ell_1$ calibration error and $\widetilde{O}(\min\{(1+C)^{\frac{1}{3}}, K\})$ for both $\ell_2$ and pseudo KL calibration error. These bounds match the optimal rates in the stationary case ($C=0$ and $K=1$) and recover known guarantees in the fully adversarial regime ($C, K=Ω(T)$). Our approach builds on and extends prior work [Hu et al., 2026, Luo et al., 2025], introducing an epoch-based scheduling together with a novel non-uniform partition of the prediction space that allocates finer resolution near the underlying ground truth.
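
To make the quantity being bounded concrete, the sketch below computes an $\ell_1$ calibration error for a sequence of forecasts. It assumes the definition commonly used in the online calibration literature, $\sum_p \left|\sum_{t: p_t = p} (p_t - y_t)\right|$ over distinct forecast values $p$; the paper's exact normalisation may differ, and the function name `l1_calibration_error` is hypothetical.

```python
from collections import defaultdict


def l1_calibration_error(predictions, outcomes):
    """Sum, over each distinct forecast value p, of the absolute total
    residual |sum of (p - y)| across the rounds where p was predicted."""
    residual = defaultdict(float)
    for p, y in zip(predictions, outcomes):
        residual[p] += p - y
    return sum(abs(r) for r in residual.values())


if __name__ == "__main__":
    # Forecasting 0.5 on a sequence that is 1 half of the time is calibrated.
    print(l1_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
    # Forecasting 1.0 twice when only one outcome is 1 is not.
    print(l1_calibration_error([1.0, 1.0], [0, 1]))
```

A forecaster is calibrated under this measure when, for each value it predicts, the outcomes on those rounds average to that value; the bounds in the abstract describe how slowly this sum can be made to grow with $T$.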

2605.11053 2026-05-25 cs.CR cs.AI cs.LG 版本更新

Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols

LLM Agent工具调用流量中的内容感知攻击检测：特征、架构与评估协议的实证研究

Sultan Zavrak

发表机构 * Department of Computer Engineering, Duzce University（杜兹大学计算机工程系）

AI总结 本文研究了大语言模型代理在调用外部工具时的流量攻击检测问题，提出了一种基于内容感知的检测框架，将每个代理会话建模为图结构，并结合语句嵌入特征进行分类。研究对比了多种图神经网络和传统机器学习模型，发现内容级别的特征对检测性能至关重要，且在SBERT池化嵌入上训练的树集成模型在主要的RAS-Eval设置中大体优于图神经网络和MLP模型。此外，研究还揭示了数据划分方式对评估结果的影响，并指出先前工作未充分考虑这一问题。

Comments v2: renamed manuscript (brand removed; descriptive title). No changes to methodology, results, tables, or figures

AI中文摘要

模型上下文协议（MCP）已成为LLM agent调用外部工具的广泛采用的接口，然而对MCP工具调用流量的学习监控仍未被充分探索。本文提出的检测器是一个针对MCP工具调用流量的攻击检测框架，它将每个agent会话编码为图（工具调用作为节点，顺序和数据流链接作为边），通过参数和响应的句子嵌入特征丰富节点，并将会话分类为良性或受攻击。评估了三种GNN架构（GAT、GCN、GraphSAGE）、一个无图MLP以及经典基线（XGBoost、随机森林、逻辑回归、线性SVM），完整架构比较在RAS-Eval（任务分层分割）上进行，GraphSAGE作为GNN基线保留在ATBench和组合源变体（均标签分层）上。得出三个发现。首先，内容级特征至关重要：仅元数据检测的AUROC停滞在0.64左右，无论架构如何，而内容嵌入将AUROC推高至0.89以上。其次，相对于任务不相交分割，朴素随机分割评估将AUROC高估多达26个百分点，这是先前agent检测工作未解决的记忆混淆问题。第三，检测信号主要存在于SBERT内容嵌入中：在池化嵌入上，树集成达到了0.975的AUROC，在大多数情况下优于主要RAS-Eval设置中的神经架构，包括GNN（0.917）和MLP（0.896），并且自监督预训练在此任务上未带来标签效率优势。

英文摘要

The Model Context Protocol (MCP) has become a widely adopted interface for LLM agents to invoke external tools, yet learned monitoring of MCP tool-call traffic remains underexplored. This article presents an attack detection framework for MCP tool-call traffic that encodes each agent session as a graph (tool calls as nodes, sequential and data-flow links as edges), enriches nodes with sentence-embedding features over arguments and responses, and classifies sessions as benign or attacked. Three GNN architectures (GAT, GCN, GraphSAGE), a no-graph MLP, and classical baselines (XGBoost, random forest, logistic regression, linear SVM) are evaluated, with the full architecture comparison conducted on RAS-Eval (task-stratified splits) and GraphSAGE retained as the GNN baseline on ATBench and a combined-source variant (both label-stratified). Three findings emerge. First, content-level features are essential: metadata-only detection plateaus around an AUROC of 0.64 regardless of architecture, while content embeddings push the AUROC above 0.89. Second, naive random-split evaluation inflates AUROC by up to 26 percentage points relative to task-disjoint splits, a memorization confound that prior agent-detection work has not addressed. Third, the detection signal resides primarily in the SBERT content embeddings: tree ensembles on pooled embeddings reach an AUROC of 0.975 and, for the most part, outperform the neural architectures in the primary RAS-Eval setting, including GNNs (0.917) and the MLP (0.896); self-supervised pre-training does not deliver a label-efficiency advantage on this task.
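
The second finding turns on how sessions are split. A task-disjoint split assigns whole tasks to either train or test, so a detector cannot score well by memorising task-specific content. The sketch below shows one way to build such a split; the session schema (a dict with a `task` key) and the function name `task_disjoint_split` are assumptions for illustration, not the paper's evaluation code.

```python
import random


def task_disjoint_split(sessions, test_fraction=0.25, seed=0):
    """Split sessions so that no task appears in both train and test.

    sessions: list of dicts, each with a 'task' key (hypothetical schema).
    """
    tasks = sorted({s["task"] for s in sessions})
    rng = random.Random(seed)
    rng.shuffle(tasks)
    n_test = max(1, round(len(tasks) * test_fraction))
    test_tasks = set(tasks[:n_test])
    train = [s for s in sessions if s["task"] not in test_tasks]
    test = [s for s in sessions if s["task"] in test_tasks]
    return train, test


if __name__ == "__main__":
    # Invented toy sessions: 4 tasks, 3 sessions each.
    sessions = [
        {"task": f"task-{i % 4}", "label": i % 2, "id": i} for i in range(12)
    ]
    train, test = task_disjoint_split(sessions)
    print(len(train), len(test))
```

A naive random split would instead shuffle individual sessions, letting sessions from the same task land on both sides, which is the memorization confound the abstract reports.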

2605.10220 2026-05-25 astro-ph.GA cs.LG 版本更新

Stellar Age Compression Reshapes Interpretations of the Milky Way Thick-Disk Formation History

恒星年龄压缩重塑对银河系厚盘形成历史的解释

Zhipeng Zhang

发表机构 * China Mobile Research Institute（中国移动研究院）； China Mobile GBA (Greater Bay Area) Innovation Institute（中国移动粤港澳大湾区创新研究院）

AI总结 银河厚盘的形成时间尺度是银河考古学中的核心问题之一。本研究通过比较光谱推断年龄和星震学年龄两种独立的恒星年龄标度，发现厚盘形成历史的关键观测特征在星震学锚定下发生了系统性变化，表明之前支持快速形成的观点可能受到恒星年龄压缩效应的影响。研究进一步表明，年龄压缩变换本身即可解释快速形成特征的观测结果，无需假设厚盘本身具有突发形成的历史，揭示了银河形成历史的统计解释可能高度依赖于恒星年龄的定义。

AI中文摘要

银河系厚盘的形成时标是银河考古学的核心争论之一。年龄-金属丰度关系（AMR）、形成时标和化学演化梯度常被用来推断厚盘的快速聚集、短时标增丰和爆发式形成历史。然而，恒星年龄并非直接可观测，这引入了推断年龄可能因观测质量而存在系统性压缩的潜在风险。在本文中，我们使用相同的恒星样本和相同的物理协变量匹配条件，但采用两种独立的年龄标度——光谱推断年龄（astroNN）和星震学年龄（APOKASC-3）——来比较厚盘形成历史的可观测特征。我们发现，先前支持厚盘快速形成的几个关键可观测特征在星震学锚定下系统性减弱：AMR斜率从-3.29变为-1.86 Gyr dex⁻¹（Δa = +1.43），形成时标从3.04 Gyr展宽至3.55 Gyr，峰值形成年龄从9.1 Gyr移至6.0 Gyr。通过传输反演实验，我们进一步表明加性噪声只能展宽年龄分布而无法重现上述模式，而压缩性传输映射（λ < 1）能同时重现更窄的年龄分布、更陡的AMR以及类似快速形成的可观测特征。这一结果表明，压缩变换本身足以产生有利于快速形成的可观测特征，而无需内在的爆发式形成历史。我们的发现揭示了银河系形成历史的统计解释可能敏感地依赖于恒星年龄定义本身。

英文摘要

The formation timescale of the Milky Way thick disk is one of the central debates in Galactic archaeology. The age-metallicity relation (AMR), formation timescale, and chemical evolution gradients are frequently used to infer a rapid assembly, short-timescale enrichment, and bursty formation history of the thick disk. However, stellar ages are not directly observable, introducing the potential risk that inferred ages may harbor a systematic compression tied to observational quality. In this paper, we use the same stellar sample and identical physical covariate matching conditions, but two independent age scales, spectroscopic inferred ages (astroNN) and asteroseismic ages (APOKASC-3), to compare the observable signatures of the thick-disk formation history. We find that several key observables previously supporting a rapid thick-disk formation are systematically weakened under seismic anchoring: the AMR slope flattens from -3.29 to -1.86 Gyr dex⁻¹ (Δa = +1.43), the formation timescale widens from 3.04 to 3.55 Gyr, and the peak formation age shifts from 9.1 to 6.0 Gyr. Through transport inversion experiments, we further show that additive noise can only broaden the age distribution and cannot reproduce the above pattern, whereas a compressive transport map (λ < 1) simultaneously reproduces a narrower age distribution, a steeper AMR, and rapid-formation-like observables. This result indicates that the compression transformation itself is sufficient to generate rapid-formation-friendly observables without requiring an intrinsically bursty formation history. Our findings reveal that statistical interpretations of the Milky Way formation history may depend sensitively on the stellar age definition itself.

2605.10219 2026-05-25 math.OC cs.CC cs.LG 版本更新

Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses

分段仿射函数与浅层CNN损失的平稳性检验的参数化复杂性

Yuhan Ye

发表机构 * MIT（麻省理工学院）

AI总结 本文研究了在给定的点上测试连续分段仿射（PA）函数近似一阶平稳性的参数化复杂度问题，这是非光滑优化中的基本任务。作者从参数化复杂度的角度出发，以环境维度 $d$ 为参数，针对可处理的一侧给出了固定维度下的XP算法，并证明了互补一侧的W[1]-难性。此外，研究还扩展到浅层ReLU卷积神经网络的训练损失函数，表明相同参数化复杂度的结论也适用于这类简单CNN的训练问题。

Comments 32 pages, 1 figure, 1 table

AI中文摘要

我们研究了在指定点检验连续分段仿射（PA）函数的近似一阶平稳性的参数化复杂性，这是非光滑优化中的基本任务。PA函数构成了非光滑平稳性检验的典型模型，并捕捉了ReLU型训练损失中出现的局部多面体几何。Tian和So（SODA 2025）最近的工作表明，在最坏情况下，PA函数的近似平稳性概念检验在计算上难以处理，并将固定维度的可处理性确定为一个开放方向。我们从参数化复杂性的角度处理这一方向，以环境维度$d$作为参数。在本文中，我们为可处理侧给出了固定维度的XP算法，并为互补侧证明了W[1]-难度。此外，在指数时间假设下的下界排除了运行时间为$ρ(d)\,\mathrm{size}^{o(d)}$的算法，其中$\mathrm{size}$表示平稳性检验实例的总二进制编码长度，$ρ$为任意可计算函数。作为进一步的结果，我们的结果给出了检验连续PA函数局部极小性的相应参数化复杂性图景。我们进一步将硬度结果推广到一系列浅层ReLU CNN训练损失，在可训练权重空间中检验平稳性。因此，简单的CNN训练损失也出现了相同的参数化复杂性图景。

英文摘要

We study the parameterized complexity of testing approximate first-order stationarity at a prescribed point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. PA functions form a canonical model for nonsmooth stationarity testing and capture the local polyhedral geometry that appears in ReLU-type training losses. Recent work by Tian and So (SODA 2025) shows that testing approximate stationarity notions for PA functions is computationally intractable in the worst case, and identifies fixed-dimensional tractability as an open direction. We address this direction from the viewpoint of parameterized complexity, with the ambient dimension $d$ as the parameter. In this paper, we give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time $ρ(d)\,\mathrm{size}^{o(d)}$ for any computable function $ρ$, where $\mathrm{size}$ denotes the total binary encoding length of the stationarity-testing instance. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space. Thus, the same parameterized-complexity picture also appears for simple CNN training losses.

2605.07220 2026-05-25 cs.LG 版本更新

不遗漏任何硬币：对抗无遗憾动态的最大化战略剩余

Yiheng Su, Emmanouil-Vasileios Vlatakis-Gkaragkounis

发表机构 * University of Wisconsin–Madison（威斯康星大学麦迪逊分校）

AI总结 本文研究了在零和博弈中，如何对抗使用固定步长的Follow-the-Regularized-Leader（FTRL）学习者，最大化战略盈余。作者证明了从FTRL学习者中提取与遗憾尺度相关的盈余是该方法族的固有特性，而非特定实现的结果，并提出了两个关键结果：固定最大最小优化器下，盈余与学习者的次优动作数量成正比；交替优化器下，无论均衡结构如何，均可保证一定规模的盈余。研究还揭示了正则化器的几何二分现象，并提出了衡量正则化器对学习者策略敏感程度的指标。

AI中文摘要

我们研究了在 $n\times m$ 两人零和博弈中，对抗使用恒定步长 $\eta$ 的跟随正则化领导者（FTRL）学习器时，先知优化者在 $T$ 轮博弈中可获得的战略剩余。与之前的分析不同，我们表明这种遗憾尺度剩余的提取是 FTRL 家族的固有特征，而非特定实例的产物。首先，对于固定的最大最小优化器，我们建立了一个阶为 $\Omega(N_{\mathrm{sub}}/\eta)$ 的普遍规律，证明效用剩余随学习器次优动作数量 $N_{\mathrm{sub}}$ 缩放，并在没有次优动作时消失。其次，对于交替优化器，在随机博弈中，无论均衡结构如何，都能以高概率保证 $\Omega(\eta T/\mathrm{poly}(n,m))$ 的剩余。我们的分析揭示了一个尖锐的几何二分法：非陡峭正则化器允许优化器通过有限时间消除次优动作实现最大瞬态剩余，而陡峭正则化器则引入一个消失的尾部修正，可能延迟剩余饱和。最后，我们讨论了这种优势在双边收益不确定性下是否持续存在，并提出了一个易感性度量，量化哪些正则化器最容易受到学习器感知的战略引导。

英文摘要

We investigate the strategic surplus obtainable against a Follow-the-Regularized-Leader (FTRL) learner with constant step size $η$ in $n\times m$ two-player zero-sum games played over $T$ rounds against a clairvoyant optimizer. In contrast with prior analysis, we show that the extraction of such regret-scale surplus is an inherent feature of the FTRL family, rather than an artifact of specific instantiations. First, for a fixed max-min optimizer, we establish a sweeping law of order $Ω(N_{\mathrm{sub}}/η)$, proving that utility surplus scales with the number of the learner's suboptimal actions $N_{\mathrm{sub}}$ and vanishes in their absence. Second, for an alternating optimizer, a surplus of $Ω(ηT/\mathrm{poly}(n,m))$ can be guaranteed regardless of the equilibrium structure, with high probability, in random games. Our analysis uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to realize the maximal transient surplus via finite-time elimination of suboptimal actions, whereas steep regularizers introduce a vanishing tail correction that can delay surplus saturation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty and propose a susceptibility measure quantifying which regularizers are most vulnerable to learner-aware strategic steering.

2603.24226 2026-05-25 cs.IR cs.LG 版本更新

Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking

联合模型参数缩放与通用域数据集成用于电商搜索排序

Liren Yu, Caiyuan Li, Feiyi Dong, Tao Zhang, Zhixuan Zhang, Dan Ou, Haihong Tang, Bo Zheng

发表机构 * Taobao & Tmall Group of Alibaba, Hangzhou, China； Taobao & Tmall Group of Alibaba, Beijing, China； Taobao & Tmall Group of Alibaba

AI总结 本文研究了电商搜索排序中模型参数扩展与数据质量提升的联合优化问题，指出单纯增加模型规模效果有限，而异构大规模行为数据的处理也难以仅靠架构调整解决。为此，作者提出UniScale框架，包含两个核心组件：ES$^3$系统通过引入跨域示例和全局监督信号扩展训练数据，HHSFT模型则通过分层特征交互和用户兴趣融合处理异构数据。实验表明，UniScale在离线和在线测试中均显著提升了搜索效果，包括订单量和GMV的提升。

AI中文摘要

工业搜索、广告和推荐的缩放研究主要强调扩大模型容量或改进架构。然而在现实系统中，性能不仅受限于模型大小，还受限于训练数据的质量和分布。我们的实证分析显示了两个关键瓶颈：单独增加参数带来的收益逐渐减小，且异构大规模行为数据引入的挑战无法仅通过架构调整完全解决。为解决此问题，我们提出了UniScale，一个将数据缩放与模型设计相结合的统一框架。UniScale包含两个组件。首先，ES$^3$，一个全空间样本构建系统，通过用全局归因的监督信号丰富域内搜索上下文，并引入反映用户在可比内容曝光条件下决策的跨域示例，将监督范围扩展到传统采样训练数据之外。其次，HHSFT，一个异构层次融合Transformer，旨在通过跨整个行为空间的层次化特征交互和用户兴趣融合，利用由此产生的大规模异构数据。这些组件共同实现了比仅以结构为中心的优化更有效的缩放。实验表明，UniScale持续改善离线性能，并展现出有利的缩放行为。在大型电商搜索平台的在线A/B测试中，它带来了1.70%的购买量提升和2.04%的GMV提升。

英文摘要

Scaling studies for industrial search, advertising, and recommendation have largely emphasized enlarging model capacity or refining architectures. Yet in real-world systems, performance is constrained not only by model size but also by the quality and distribution of training data. Our empirical analysis shows two key bottlenecks: increasing parameters alone yields progressively smaller gains, and the challenges introduced by heterogeneous, large-scale behavior data cannot be fully resolved by architecture tuning in isolation. To address this issue, we present UniScale, a unified framework that couples data scaling with model design. UniScale consists of two components. First, ES$^3$, an entire-space sample construction system, broadens supervision beyond conventional sampled training data by enriching intra-domain search contexts with globally attributed supervisory signals and introducing cross-domain examples that reflect user decisions under comparable content exposure conditions. Second, HHSFT, a heterogeneous hierarchical fusion transformer, is tailored to exploit the resulting large-scale heterogeneous data through hierarchical feature interaction and user-interest fusion across the entire behavior space. Together, these components enable more effective scaling than structure-centric optimization alone. Experiments show that UniScale consistently improves offline performance and demonstrates favorable scaling behavior. In online A/B tests on a large e-commerce search platform, it delivers a 1.70% increase in purchases and a 2.04% lift in GMV.

2603.19812 2026-05-25 cs.LG 版本更新

Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles: A Virtual Reality Study

共享空间中自动穿梭车与行人的眼动知情与情境感知轨迹预测：一项虚拟现实研究

Danya Li, Yan Feng, Rico Krueger

发表机构 * Department of Technology, Management and Economics at the Technical University of Denmark（丹麦技术大学技术、管理与经济学系）； Department of Transport & Planning, Civil Engineering Geosciences at Delft University of Technology（代尔夫特理工大学交通运输与规划、土木工程与地质科学系）

AI总结 本研究通过虚拟现实实验，探讨行人眼动信息在共享空间中预测其轨迹的价值，研究了不同接近角度和交通条件下的行人与自动驾驶接驳车的交互行为。研究构建了一个融合眼动、头部方向和情境上下文的多模态预测模型，发现眼动信息对轨迹预测的贡献依赖于角度和身体协调，并与情境信息具有互补性。实验表明，结合眼动与情境信息可将最终位移误差降低8.47%，突显了将人类感知信号纳入行人行为预测的重要性。

AI中文摘要

为填补这一空白，我们进行了一项虚拟现实实验，行人在不同接近角度（45°、90°、135°）和连续交通条件（单辆穿梭车、两辆穿梭车间隔3或5秒）下与自动穿梭车交互，收集了同步的运动、眼动和头部朝向数据。为了探究细粒度眼动在何种程度、何种条件下以及以何种形式对行人运动预测提供信息，我们开发了一个多模态预测模型，通过模态特定编码器融合这些信号，并系统地消融眼动表示与头部朝向和情境上下文。我们报告三个主要结果。首先，眼动的预测价值与角度相关，并与眼-头-身体协调紧密耦合：在锐角角度下，行人主动转移视线以获取穿梭车信息时，眼动携带了仅头部朝向无法捕捉的信息。其次，连续眼动朝向优于分类语义注视标签，最佳编码框架（全局或身体相对）取决于眼动是单独使用还是与上下文联合使用。第三，眼动和情境上下文提供互补的预测信息：它们的组合将最终位移误差（FDE）降低了8.47%，接近各自贡献之和。这些发现共同凸显了将人类感知信号纳入行人行为预测的价值，并激励了以人为中心的建模方法补充以车辆为中心的建模方法。我们的代码可在 https://github.com/danyayay/GazeX.git 获取。

英文摘要

To address this gap, we conduct a Virtual Reality experiment in which pedestrians interact with automated shuttles under varying approach angles (45°, 90°, 135°) and continuous-traffic conditions (single shuttle, two shuttles with 3 or 5-second gaps), collecting synchronized motion, eye gaze, and head orientation data. To investigate to what extent, under what conditions, and in what form fine-grained eye gaze is informative for pedestrian motion prediction, we develop a multi-modal prediction model that fuses these signals through modality-specific encoders, and systematically ablate gaze representations against head orientation and situational context. We report three main results. First, the predictive value of eye gaze is angle-dependent and tightly coupled with eye-head-body coordination: at acute angles where pedestrians actively redirect gaze to acquire the shuttle, eye gaze carries information that head orientation alone misses. Second, continuous gaze orientation outperforms categorical semantic fixation labels, with the optimal encoding frame (global or body-relative) depending on whether gaze is used alone or jointly with context. Third, eye gaze and situational context provide complementary predictive information: their combination reduces final displacement error (FDE) by 8.47%, close to the sum of their individual contributions. Together, these findings highlight the value of incorporating human perceptual signals into pedestrian behavior prediction and motivate a human-centered complement to vehicle-centric modeling approaches. Our code is available at https://github.com/danyayay/GazeX.git.
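
The headline metric, final displacement error (FDE), is the Euclidean distance between the predicted and the true position at the last step of the prediction horizon. The sketch below computes it for 2-D trajectories and expresses an improvement as a relative reduction; the function names and the toy coordinates are assumptions for illustration and are unrelated to the GazeX code.

```python
import math


def final_displacement_error(predicted, actual):
    """Distance between the last predicted and last true (x, y) positions."""
    (px, py), (ax, ay) = predicted[-1], actual[-1]
    return math.hypot(px - ax, py - ay)


def relative_reduction(baseline_error, improved_error):
    """Fractional reduction of an error relative to a baseline."""
    return (baseline_error - improved_error) / baseline_error


if __name__ == "__main__":
    # Invented trajectories in metres.
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    baseline = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    with_gaze = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
    fde_base = final_displacement_error(baseline, truth)
    fde_gaze = final_displacement_error(with_gaze, truth)
    print(fde_base, fde_gaze, relative_reduction(fde_base, fde_gaze))
```

A reported figure such as the 8.47% reduction corresponds to `relative_reduction` evaluated on FDE averaged over the test trajectories.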

2603.19310 2026-05-25 cs.LG cs.AI 版本更新

MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels

MemReward: 基于图的经验记忆用于有限标签下的LLM奖励预测

Tianyang Luo, Tao Feng, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； Meta

AI总结 本文提出了一种基于图结构的经验记忆框架 MemReward，用于在标注数据有限的情况下提升大语言模型（LLM）的奖励预测能力。该方法通过构建包含初始策略生成的推理过程和答案的异构图，并利用图神经网络（GNN）将有限的标注奖励传播到未标注的样本中，从而在在线策略优化过程中实现奖励的高效获取。实验表明，MemReward 在仅对20%的rollout使用真实奖励的情况下，能够在数学、问答和代码生成等任务中达到Oracle性能的约97%。

AI中文摘要

强化学习已成为改进大型语言模型推理能力的强大范式，其中从策略中采样rollout，并利用在这些rollout上计算的奖励信号来更新策略。然而，在数据稀缺的场景中，大规模获取ground-truth标签以验证rollout通常需要昂贵的人工标注或劳动密集型的专家验证。例如，评估数学证明需要专家评审，而开放式问答缺乏确定的ground-truth。当ground-truth标签稀缺时，强化学习微调的有效性受到限制。受半监督学习在将标签从标注样本传播到未标注样本方面成功的启发，我们提出了MemReward，一种基于图的经验记忆框架，将奖励传播直接集成到在线策略优化中。MemReward将来自初始LLM策略的rollout（思考过程和最终答案）存储为异构图中的节点，这些节点通过相似性和结构边连接，图神经网络通过该图将奖励从标注rollout传播到未标注rollout。为了训练这样的框架，我们首先在标注rollout上预热GNN，通过查询、思考和答案节点的异质聚合来预测奖励。在在线RL微调期间，未标注rollout通过查询相似性附加到图中，GNN预测它们的奖励，从而产生一种结合ground-truth和GNN预测奖励的混合奖励获取策略。在Qwen2.5-1.5B和3B上的数学、问答和代码生成实验表明，MemReward仅使用20% rollout的ground-truth奖励，就在1.5B上达到Oracle性能的96.6%，在3B上达到97.3%，并在域外任务上接近Oracle。

英文摘要

Reinforcement learning has emerged as a powerful paradigm for improving large language model (LLM) reasoning, where rollouts are sampled from the policy and reward signals computed on those rollouts are used to update the policy. However, in data-scarce scenarios, obtaining ground-truth labels to verify rollouts at scale often requires expensive human annotation or labor-intensive expert verification. For instance, evaluating mathematical proofs demands expert review, and open-ended question answering lacks definitive ground truth. When ground-truth labels are scarce, the effectiveness of reinforcement learning fine-tuning is constrained. Inspired by the success of semi-supervised learning in propagating labels from labeled to unlabeled samples, we propose MemReward, a graph-based experience memory framework that integrates reward propagation directly into online policy optimization. MemReward stores rollouts (thinking processes and final answers) from an initial LLM policy as nodes in a heterogeneous graph connected by similarity and structural edges, over which a GNN propagates rewards from labeled to unlabeled rollouts. To train such a framework, we first warm up the GNN on labeled rollouts to predict rewards via heterogeneous aggregation over query, thinking, and answer nodes. During online RL fine-tuning, unlabeled rollouts are attached to the graph by query similarity, and the GNN predicts their rewards, yielding a hybrid reward acquisition strategy that combines ground-truth and GNN-predicted rewards. Experiments on Qwen2.5-1.5B and 3B in mathematics, question answering, and code generation demonstrate that MemReward, with ground-truth rewards on only 20% of rollouts, achieves 96.6% of Oracle performance on 1.5B and 97.3% on 3B, and closely approaches Oracle on out-of-domain tasks.
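
The hybrid reward strategy, using the ground-truth reward where a label exists and a propagated estimate otherwise, can be sketched without a GNN. The code below deliberately swaps the paper's graph neural network for a much simpler stand-in, a similarity-weighted average over the nearest labeled rollouts, purely to show the data flow; the function names, the embedding vectors, and the rollout dict schema are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_reward(embedding, labeled, k=2):
    """Estimate a reward as the similarity-weighted mean reward of the k
    most similar labeled rollouts (a stand-in for GNN propagation)."""
    scored = sorted(
        ((cosine(embedding, e), r) for e, r in labeled), reverse=True
    )[:k]
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * r for w, (_, r) in zip(weights, scored)) / total


def hybrid_reward(rollout, labeled, k=2):
    """Use the ground-truth reward if present, else the propagated one."""
    if rollout.get("reward") is not None:
        return rollout["reward"]
    return propagate_reward(rollout["embedding"], labeled, k)


if __name__ == "__main__":
    # Invented labeled rollouts: (embedding, ground-truth reward).
    labeled = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
    print(hybrid_reward({"embedding": [1.0, 0.0], "reward": None}, labeled))
    print(hybrid_reward({"embedding": [0.3, 0.9], "reward": 0.0}, labeled))
```

In the setting the abstract describes, only a fraction of rollouts (20% in the reported experiments) would take the first branch; the rest rely on the propagated estimate.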

2603.18551 2026-05-25 math.OC cs.CC cs.LG 版本更新

Learning Decision-Sufficient Representations for Linear Optimization

学习线性优化的决策充分表示

Yuhan Ye, Saurabh Amin, Asuman Ozdaglar

发表机构 * MIT（麻省理工学院）

AI总结 本文研究如何构建压缩数据集以恢复具有未知成本向量的线性规划问题中的最优决策。作者证明了确定决策相关维度 $d^\star$ 是 NP 难的，并提出了一种点态充分性概念，从而在多项式时间内构造出适用于单个成本向量的决策数据集。进一步地，他们提出了一种累积算法，在独立同分布成本假设下实现稳定压缩，并给出了分布无关的 PAC 保证，同时将决策充分性表示应用于上下文线性优化，获得了更优的泛化界。

Comments 45 pages plus appendix, 2 figures. Accepted at COLT 2026

AI中文摘要

我们研究如何构建压缩数据集，使其足以恢复未知成本向量$c$位于先验集$\mathcal{C}$中的线性规划的最优决策。Bennouna等人最近的工作通过内在的决策相关维度$d^\star$给出了充分决策数据集（SDDs）的精确几何刻画。然而，他们构建最小规模SDD的算法需要求解混合整数规划。在本文中，我们建立了硬度结果，表明计算$d^\star$是NP难的，判定数据集是否全局充分是coNP难的，从而解决了Bennouna等人提出的一个近期开放问题。为了应对这种最坏情况下的难解性，我们引入了点态充分性，这是一种要求对单个成本向量充分的松弛。在非退化条件下，我们提供了一种多项式时间的切割平面算法来构建点态充分的决策数据集。在具有独立同分布成本的数据驱动框架下，我们进一步提出了一种累积算法，该算法跨样本聚合决策相关方向，产生一个大小至多为$d^\star$的稳定压缩方案。这导致了一个无分布PAC保证：以高概率（关于训练样本的抽取），新样本的点态充分失败概率至多为$\tilde{O}(d^\star/n)$，且该速率在对数因子意义下是紧的。最后，我们将决策充分表示应用于上下文线性优化，获得压缩预测器，其泛化界为$\tilde{O}(\sqrt{d^\star/n})$而非$\tilde{O}(\sqrt{d/n})$，其中$d$是环境成本维度。

英文摘要

We study how to construct compressed datasets that suffice to recover optimal decisions in linear programs with an unknown cost vector $c$ lying in a prior set $\mathcal{C}$. Recent work by Bennouna et al. provides an exact geometric characterization of sufficient decision datasets (SDDs) via an intrinsic decision-relevant dimension $d^\star$. However, their algorithm for constructing minimum-size SDDs requires solving mixed-integer programs. In this paper, we establish hardness results showing that computing $d^\star$ is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard, thereby resolving a recent open problem posed by Bennouna et al. To address this worst-case intractability, we introduce pointwise sufficiency, a relaxation that requires sufficiency for an individual cost vector. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d. costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most $d^\star$. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most $\tilde{O}(d^\star/n)$, and this rate is tight up to logarithmic factors. Finally, we apply decision-sufficient representations to contextual linear optimization, obtaining compressed predictors with generalization bounds scaling as $\tilde{O}(\sqrt{d^\star/n})$ rather than $\tilde{O}(\sqrt{d/n})$, where $d$ is the ambient cost dimension.

2603.16331 2026-05-25 cs.LG 版本更新

Decoding the Critique Mechanism in Large Reasoning Models

解码大型推理模型中的批判机制

Hoang Phan, Quang H. Nguyen, Hung T. Q. Le, Xiusi Chen, Heng Ji, Khoa D. Doan

发表机构 * VinUni-Illinois Smart Health Center（VinUniversity-伊利诺伊州智能健康中心）； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结 本文研究了大型推理模型（LRMs）在推理过程中如何通过内部机制纠正错误，提出了“隐藏的批判能力”这一概念。研究发现，即使模型在中间推理步骤中出现错误且未进行明确纠正，仍能最终得出正确答案，表明其具备某种隐式的错误检测与自我修正机制。通过特征空间分析，作者识别出一个可解释的“批判向量”，用于引导模型增强错误检测能力，提升推理性能，且无需额外训练成本。这一发现为理解与改进大模型的自我验证机制提供了新思路。

AI中文摘要

大型推理模型（LRMs）展现出回溯和自我验证机制，使其能够修正中间步骤并达到正确解，在复杂逻辑基准上表现强劲。我们假设这种行为仅在模型具有足够强的“批判”能力来检测自身错误时才有益。本工作通过在中间推理步骤中插入算术错误，系统研究了当前LRMs如何从错误中恢复。值得注意的是，我们发现一个奇特但重要的现象：尽管错误在整个思维链（CoT）中传播且没有任何言语修正，模型在思考过程结束后仍能得出正确的最终答案。这种恢复暗示存在一种内部机制帮助模型检测错误并触发自我修正，我们称之为隐藏的批判能力。基于特征空间分析，我们识别出一个高度可解释的批判向量，代表这种行为。跨多个模型规模和系列的广泛实验表明，用该向量引导潜在表示可提升模型的错误检测能力，并在无需额外训练成本的情况下增强测试时扩展性能。我们的发现为LRMs的批判行为提供了有价值的理解，提示了控制和改进其自我验证机制的有前景方向。我们的代码可在 https://github.com/mail-research/lrm-critique-vectors 获取。

英文摘要

Large Reasoning Models (LRMs) exhibit backtracking and self-verification mechanisms that enable them to revise intermediate steps and reach correct solutions, yielding strong performance on complex logical benchmarks. We hypothesize that such behaviors are beneficial only when the model has sufficiently strong "critique" ability to detect its own mistakes. This work systematically investigates how current LRMs recover from errors by inserting arithmetic mistakes in their intermediate reasoning steps. Notably, we discover a peculiar yet important phenomenon: despite the error propagating throughout the entire chain-of-thought (CoT) without any verbalized correction, the model still reaches the correct final answer after the thinking process finishes. This recovery implies the existence of an internal mechanism helping the model to detect errors and trigger self-correction, which we refer to as the hidden critique ability. Building on feature space analysis, we identify a highly interpretable critique vector representing this behavior. Extensive experiments across multiple model scales and families demonstrate that steering latent representations with this vector improves the model's error detection capability and enhances the performance of test-time scaling at no extra training cost. Our findings provide a valuable understanding of LRMs' critique behavior, suggesting a promising direction to control and improve their self-verification mechanism. Our code is available at: https://github.com/mail-research/lrm-critique-vectors.

2603.10067 2026-05-25 cs.LG cs.AI 版本更新

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

HTMuon：通过重尾谱校正改进Muon

Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang

发表机构 * Dartmouth College（达特茅斯学院）； Microsoft（微软）； International Computer Science Institute（国际计算机科学研究所）； University of California, Berkeley（加州大学伯克利分校）； Meta

AI总结本文提出 HTMuon，一种改进 Muon 优化算法的方法，旨在提升大语言模型的训练效果。研究指出，Muon 的正交更新规则抑制了权重谱的重尾特性，而 HTMuon 基于重尾自正则化理论，通过生成更重尾的更新步长，增强模型对参数依赖关系的捕捉能力。实验表明，HTMuon 在语言模型预训练和图像分类任务中均优于现有方法，且可作为现有 Muon 变体的插件使用。

AI中文摘要

Muon最近在LLM训练中显示出有希望的结果。在这项工作中，我们研究如何进一步改进Muon。我们认为Muon的正交化更新规则抑制了重尾权重谱的出现，并过度强调了沿噪声主导方向的训练。受重尾自正则化（HT-SR）理论的启发，我们提出了HTMuon。HTMuon保留了Muon捕捉参数相互依赖性的能力，同时产生更重尾的更新并诱导更重尾的权重谱。在LLM预训练和图像分类上的实验表明，HTMuon持续优于最先进的基线，并且可以作为现有Muon变体的插件使用。例如，在C4数据集上的LLaMA预训练中，与Muon相比，HTMuon将困惑度降低了高达0.98。我们进一步从理论上证明，HTMuon对应于Schatten-$q$范数约束下的最速下降，并提供了在光滑非凸环境下的收敛性分析。HTMuon的实现可在https://github.com/TDCSZ327/HTmuon获取。

英文摘要

Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes the training along noise-dominated directions. Motivated by the Heavy-Tailed Self-Regularization (HT-SR) theory, we propose HTMuon. HTMuon preserves Muon's ability to capture parameter interdependencies while producing heavier-tailed updates and inducing heavier-tailed weight spectra. Experiments on LLM pretraining and image classification show that HTMuon consistently improves performance over state-of-the-art baselines and can also serve as a plug-in on top of existing Muon variants. For example, on LLaMA pretraining on the C4 dataset, HTMuon reduces perplexity by up to $0.98$ compared to Muon. We further theoretically show that HTMuon corresponds to steepest descent under the Schatten-$q$ norm constraint and provide convergence analysis in smooth non-convex settings. The implementation of HTMuon is available at https://github.com/TDCSZ327/HTmuon.
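
The contrast between an orthogonalized update and a heavier-tailed one can be sketched numerically. By Hölder duality, steepest descent under a Schatten-$q$ norm constraint moves along $U\,\mathrm{diag}(s^{p})\,V^\top$ with $p = 1/(q-1)$, where $G = U\,\mathrm{diag}(s)\,V^\top$ is the SVD of the gradient; $q = \infty$ gives $p = 0$, the fully orthogonalized direction $UV^\top$. The sketch below illustrates only that relationship. The function name `spectral_power_update`, the normalization by the top singular value, and the choice `p = 0.5` are assumptions for illustration; the abstract does not state HTMuon's exact update rule, and practical Muon implementations approximate the orthogonalization rather than computing a full SVD.

```python
import numpy as np

def spectral_power_update(grad: np.ndarray, p: float) -> np.ndarray:
    """Return U diag((s / s_max)**p) V^T for the SVD grad = U diag(s) V^T.

    p = 0 flattens the spectrum completely (orthogonalized update U V^T).
    p > 0 keeps part of the spread between singular values (assumed knob).
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    top = s.max()
    if top == 0.0:
        return np.zeros_like(grad)   # zero gradient: nothing to update
    s = s / top                      # scale so the largest singular value is 1
    return (u * s**p) @ vt           # scale the columns of U, then recombine

rng = np.random.default_rng(0)
g = rng.standard_normal((6, 4))          # stand-in for a gradient matrix
orth = spectral_power_update(g, 0.0)     # all singular values equal to 1
heavy = spectral_power_update(g, 0.5)    # spectrum compressed, not flattened
```

With `p = 0` every direction receives the same step size, including directions whose singular values are small; a positive `p` keeps larger steps along the dominant directions.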

2603.06610 2026-05-25 cs.LG 版本更新

MELT：用于高风险 Memecoin 发行检测的行为轨迹数据集

Sihao Hu, Selim Furkan Tekin, Yichang Xu, Ling Liu

发表机构 * School of Computer Science（计算机科学学院）

AI总结本文提出MELT，一个用于检测高风险模因币发行的行为轨迹数据集。该数据集基于Solana区块链，包含超过41,000次模因币发行的2亿多笔交易，提取了包括交易类型、账户协调行为等结构化行为记录，揭示了发行方隐藏真实控制权的策略。MELT还提供了122个行为特征和风险等级标注，支持大规模监督学习，并通过实验验证了其在风险检测中的有效性，为模因币投资风险缓解提供了新方法。

AI中文摘要

Launchpad 已成为发行 memecoin 的主要机制，使投资者面临现有 rug-pull 检测方法无法捕捉的新型高风险发行。我们认为，检测这些威胁需要结构化的行为轨迹，这些轨迹隐藏在原始异构区块链数据之下，即内部人员如何建仓、协调和平仓。为了实现这种分析，我们引入了 MELT（MEmecoin Launch Trace），这是第一个用于分析和检测 Solana 上高风险 memecoin 发行的行为轨迹数据集。MELT 覆盖了 41k+ 个 memecoin 发行，包含 200M+ 笔交易，这些交易被解析为类型化的行为记录，区分了兑换（swap）、对敲交易（wash trade）、转账和铸造。除了每个账户的行为外，MELT 还贡献了捆绑轨迹数据，该数据链接了同一实体控制的账户，揭示平均 36.5% 的代币供应由协调账户持有，这是一种隐藏策略，使不知情的买家无法察觉真实的所有权集中度。在这些轨迹之上，MELT 提供了 122 个行为特征和风险级别标注，使得在总体规模上进行监督学习成为可能。我们在高风险发行检测任务上对代表性 ML 模型进行了基准测试。将其预测整合到一个简单的 memecoin 选择策略中，显著减少了投资损失，证明了行为轨迹可以转化为风险缓解。我们的数据集和代码可在 https://github.com/git-disl/MELT 获取。

英文摘要

Launchpads have become the dominant mechanism for issuing memecoins, exposing investors to a new class of high-risk launches that existing rug-pull detection methods cannot capture. We argue that detecting these threats requires structured behavioral traces that underlie raw heterogeneous blockchain data, i.e., how insiders accumulate, coordinate, and unwind positions. To enable such analysis, we introduce MELT (MEmecoin Launch Trace), the first behavioral trace dataset for analyzing and detecting high-risk memecoin launches on Solana. MELT covers 41k+ memecoin launches with 200M+ transactions parsed into typed behavioral records that distinguish swaps, wash trades, transfers, and mints. Beyond per-account behaviors, MELT contributes bundle-trace data that links accounts controlled by the same entity, revealing that, on average, 36.5% of token supply is held by coordinated accounts, a concealment strategy that disguises the true ownership concentration from unsuspecting buyers. On top of these traces, MELT provides 122 behavioral features and risk-level annotations, enabling supervised learning at a population scale. We benchmark representative ML models on the high-risk launch detection task. Integrating their predictions into a simple memecoin selection strategy reduces investment loss significantly, demonstrating that behavioral traces can be translated into risk mitigation. Our dataset and code are available at https://github.com/git-disl/MELT.

2602.13249 2026-05-25 q-bio.BM cs.AI cs.LG 版本更新

A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

小分子学习的共折叠模型表示的系统评估

Hyosoon Jang, Hyunjin Seo, Honghui Kim, Seonghyun Park, Taewon Kim, Yunhui Jang, Sungsoo Ahn

发表机构 * KAIST（韩国科学技术院）

AI总结本文系统评估了基于蛋白质-配体共折叠的模型在小分子学习中的表示能力。研究使用现代共折叠模型Boltz2，将其原子级配体表示迁移到独立的小分子任务中，结果表明其性能在ADMET基准测试中达到或超越现有模型，并提升了分子生成建模和结构引导的配体优化效率。此外，Boltz2的表示与传统独立分子监督方法具有互补性，并可应用于强化学习以增强分子发现过程。这些结果表明，蛋白质-配体共折叠是一种有前景的小分子表示学习预训练范式。

AI中文摘要

小分子基础模型通常仅在独立分子数据上进行预训练，这与视觉和语言模型不同，后者通常受益于跨模态或关系监督。蛋白质-配体共折叠通过将模型暴露于原子级配体-蛋白质相互作用，提供了这种监督的分子类似物，引发了一个问题：共折叠模型能否产生强大的小分子表示。我们使用现代共折叠模型Boltz2研究这个问题，通过将其原子级配体表示转移到独立的小分子任务。通过系统探测和蒸馏，我们表明Boltz2表示在ADMET基准上匹配或超越现有模型，加速分子生成建模，并提高结构引导配体优化的样本效率。我们进一步发现Boltz2表示与从传统独立分子监督（包括3D构象、生物测定标签和量子化学性质）中学习到的表示互补。最后，我们将表示对齐扩展到强化学习，表明密集的表示级监督可以补充分子发现中的标量奖励。这些结果将蛋白质-配体共折叠确定为小分子表示学习的有前景的预训练范式，并将Boltz2定位为强大的现成分子基础模型。

英文摘要

Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue of such supervision by exposing models to atom-level ligand-protein interactions, raising the question of whether co-folding models can yield strong small-molecule representations. We study this question using Boltz2, a modern co-folding model, by transferring its atom-level ligand representations to standalone small-molecule tasks. Through systematic probing and distillation, we show that Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. We further find that Boltz2 representations are complementary to those learned from conventional standalone molecular supervision, including 3D conformers, bioassay labels, and quantum-chemical properties. Finally, we extend representation alignment to reinforcement learning, showing that dense representation-level supervision can complement scalar rewards in molecular discovery. These results identify protein-ligand co-folding as a promising pretraining paradigm for small-molecule representation learning and position Boltz2 as a strong, off-the-shelf molecular foundation model.

2602.12579 2026-05-25 cs.LG cs.AI 版本更新

ArcMark: 通过最优传输实现无失真的多字节大语言模型水印

Atefeh Gilani, Sajani Vithana, Carol Xuan Long, Oliver Kosut, Lalitha Sankar, Flavio P. Calmon

发表机构 * Arizona State University（亚利桑那州立大学）； Harvard University（哈佛大学）

AI总结 ArcMark 是一种基于最优传输理论的无失真多字节大语言模型水印方法，能够在不改变模型生成文本质量的前提下，将多个字节的信息嵌入到少量的生成文本中。该方法通过将无失真水印问题建模为信道编码问题，推导出信息论意义上的信道容量，从而确定了在不引入失真的情况下嵌入信息的理论极限，并据此设计了 ArcMark 算法。实验表明，ArcMark 在信息重建准确率和抗攻击能力方面优于现有方法，且生成文本的困惑度和下游任务表现与未加水印的文本无明显差异。

AI中文摘要

水印是促进大语言模型（LLM）负责任使用的重要工具。现有水印在生成的token中插入信号，要么标记LLM生成的文本（零比特水印），要么编码更复杂的消息（多比特水印）。尽管最近许多方法在不扰动平均下一token预测的情况下向文本中插入多个比特，但它们很大程度上扩展了零比特设置的设计原则，例如每个token编码单个比特。相比之下，能够将多个字节嵌入文本的水印将极大地增加潜在应用，例如嵌入提交提示的用户ID、使用的精确模型版本，甚至提示本身。我们通过引入ArcMark来解决这个问题：一种基于编码和信息论原理的新型水印构造，能够可靠地将多字节信息嵌入仅几百个token中，而不会对底层LLM的下一token分布造成任何失真。我们通过将无失真水印问题建模为信道编码问题，并推导出信息论信道容量，该容量建立了在LLM输出中无失真嵌入信息的基本极限，从而推导出ArcMark。该容量公式指导了ArcMark的设计。在实践中，ArcMark在重建精度上优于竞争的多比特无失真水印，包括在面对改变部分LLM文本的攻击时。ArcMark输出在困惑度和下游任务质量方面也显示出与未加水印文本无法区分。

英文摘要

Watermarking is an important tool for promoting the responsible use of large language models (LLMs). Existing watermarks insert a signal into generated tokens that either flags LLM-generated text (zero-bit watermarking) or encodes more complex messages (multi-bit watermarking). Though a number of recent approaches insert multiple bits into text without perturbing average next-token predictions, they largely extend design principles from the zero-bit setting, such as encoding a single bit per token. In contrast, a watermarker capable of embedding multiple bytes into the text would dramatically increase the potential applications, by embedding information such as the ID of the user who submitted the prompt, the precise model version that was used, or even the prompt itself. We address this problem by introducing ArcMark: a new watermark construction based on coding and information-theoretic principles that is capable of reliably embedding multiple bytes of information into just a few hundred tokens, without any distortion of the underlying LLM next-token distribution. We derive ArcMark by formulating the distortion-free watermarking problem as a channel coding problem, and deriving an information-theoretic channel capacity that establishes the fundamental limit of embedding information in LLM output in a distortion-free manner. This capacity formulation informs the design of ArcMark. In practice, ArcMark outperforms competing multi-bit distortion-free watermarks in terms of reconstruction accuracy, including in the face of attacks that alter a subset of the LLM text. ArcMark output is also shown to be indistinguishable from unwatermarked text in terms of perplexity, and in downstream task quality.

2602.02780 2026-05-25 cs.AI cs.LG 版本更新

基于LLM代理的临床评分系统自动构建

Silas Ruhrberg Estévez, Christopher Chiu, Mihaela van der Schaar

发表机构 * DAMTP, University of Cambridge, Cambridge, UK（剑桥大学应用数学与理论物理系（DAMTP），英国剑桥）

AI总结本文研究如何自动构建适用于临床实践的评分系统，这类系统通常由少量可解释的决策规则组成。作者提出了一种基于大语言模型（LLM）代理的方法——AgentScore，通过语义引导的优化流程，在巨大的规则组合空间中搜索符合统计有效性与临床部署要求的评分规则。实验表明，AgentScore 在多个临床预测任务中优于现有方法，并在保持强结构性约束的同时实现了与灵活可解释模型相当的预测性能。

AI中文摘要

现代临床实践依赖于以紧凑评分系统形式实施的循证指南，这些评分系统由少量可解释的决策规则组成。虽然机器学习模型实现了强大的性能，但由于与工作流约束（如可记忆性、可审计性和床边执行）不匹配，许多模型未能转化为常规临床使用。我们认为，这种差距并非源于预测能力不足，而是由于在模型类别上优化时与指南部署不兼容。可部署的指南通常采用单位加权临床检查表的形式，通过对二元规则求和并设置阈值形成，但学习此类评分需要在指数级大的离散规则集空间中进行搜索。我们引入了AgentScore，它通过使用LLM提出候选规则，并采用确定性的、基于数据的验证与选择循环来强制执行统计有效性和可部署性约束，在此空间中进行语义引导的优化。在八个临床预测任务中，AgentScore优于现有的评分生成方法，并且在更强的结构约束下实现了与更灵活的可解释模型相当的AUROC。在两个额外经过外部验证的任务中，AgentScore比已建立的基于指南的评分实现了更高的区分度。

英文摘要

Modern clinical practice relies on evidence-based guidelines implemented as compact scoring systems composed of a small number of interpretable decision rules. While machine-learning models achieve strong performance, many fail to translate into routine clinical use due to misalignment with workflow constraints such as memorability, auditability, and bedside execution. We argue that this gap arises not from insufficient predictive power, but from optimizing over model classes that are incompatible with guideline deployment. Deployable guidelines often take the form of unit-weighted clinical checklists, formed by thresholding the sum of binary rules, but learning such scores requires searching an exponentially large discrete space of possible rule sets. We introduce AgentScore, which performs semantically guided optimization in this space by using LLMs to propose candidate rules and a deterministic, data-grounded verification-and-selection loop to enforce statistical validity and deployability constraints. Across eight clinical prediction tasks, AgentScore outperforms existing score-generation methods and achieves AUROC comparable to more flexible interpretable models despite operating under stronger structural constraints. On two additional externally validated tasks, AgentScore achieves higher discrimination than established guideline-based scores.
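
The target model class described above, a unit-weighted checklist that thresholds the sum of binary rules, is small enough to sketch directly. The rules, feature names, and cut-offs below are hypothetical placeholders chosen only to make the example run; they are not taken from the paper and carry no clinical meaning. The sketch covers only how a finished checklist is evaluated, not AgentScore's LLM-driven rule proposal or its verification-and-selection loop.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Rule = Callable[[Patient], bool]

def checklist_score(patient: Patient, rules: List[Rule]) -> int:
    """Unit-weighted score: one point for every binary rule that fires."""
    return sum(1 for rule in rules if rule(patient))

def checklist_predict(patient: Patient, rules: List[Rule], threshold: int) -> bool:
    """Flag the patient when the score reaches the threshold."""
    return checklist_score(patient, rules) >= threshold

# Hypothetical rules, for illustration only (not clinical guidance).
RULES: List[Rule] = [
    lambda p: p["age"] >= 65,
    lambda p: p["systolic_bp"] < 90,
    lambda p: p["resp_rate"] >= 22,
]

patient = {"age": 70, "systolic_bp": 85, "resp_rate": 18}
score = checklist_score(patient, RULES)                   # two rules fire
flagged = checklist_predict(patient, RULES, threshold=2)  # score meets threshold
```

Because every rule carries weight one, the score can be computed by hand at the bedside. The difficulty the abstract points to lies in choosing which rules enter the list, since the space of candidate rule sets grows exponentially.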

2601.21500 2026-05-25 cs.LG 版本更新

具有理论基础的低成本硬标签对抗攻击

Jun Liu, Leo Yu Zhang, Fengpeng Li, Isao Echizen, Jiantao Zhou

发表机构 * University of Macau（澳门大学）； National Institute of Informatics（国家信息研究所）； Griffith University（格里菲斯大学）； University of Tokyo（东京大学）

AI总结本文研究了基于硬标签的黑盒对抗攻击问题，这类攻击仅依赖模型的顶部预测结果，具有较高的实际威胁性。为解决现有方法在初始化策略和理论保障方面的不足，作者提出了一个具有理论支撑的统一框架，并设计了零查询初始化策略与模式驱动优化算法，显著提升了攻击效率与成功率。实验表明，该方法在多个数据集和防御模型上均优于现有最先进方法，且具有良好的泛化能力与对状态型防御的绕过能力。

AI中文摘要

硬标签黑盒攻击仅依赖top-1预测，是最具挑战性、同时也最具现实意义的威胁模型之一。尽管近期有进展，现有方法存在两个关键局限：(1) 忽视初始化的关键作用，主要关注优化策略；(2) 严重依赖经验启发式方法，缺乏理论保证。为弥补这一差距，我们建立了一个统一的理论框架，表明现有的符号翻转硬标签攻击可理解为近似真实梯度符号。在此原则性分析指导下，我们提出一种新颖的攻击框架，包含零查询初始化策略和模式驱动优化（PDO）算法。我们提供理论保证，证明我们的初始化比随机基线具有更高的与真实梯度符号的余弦相似度，且PDO模块的查询复杂度显著低于基线搜索方法。在CIFAR-10、ImageNet和ObjectNet上的大量实验（涵盖标准训练和对抗训练模型、商业API以及CLIP模型）表明，我们的方法在成功率和效率上持续优于最先进的硬标签攻击，尤其在低查询预算下。此外，我们的方法在损坏数据（ImageNet-C）、生物医学图像（PathMNIST）以及密集预测任务（如分割）上展现出鲁棒的泛化能力。值得注意的是，它绕过了有状态防御Blacklight，实现了0%的检测率。

英文摘要

Hard-label black-box attacks, relying solely on top-1 predictions, represent one of the most challenging yet practically relevant threat models. Despite recent progress, existing approaches face two key limitations: (1) they overlook the critical role of initialization, focusing primarily on optimization strategies; and (2) they rely heavily on empirical heuristics without theoretical guarantees. To bridge this gap, we establish a unified theoretical framework showing that existing sign-flipping hard-label attacks can be understood as approximating the true gradient sign. Guided by this principled analysis, we propose a novel attack framework featuring a zero-query initialization strategy and a Pattern-Driven Optimization (PDO) algorithm. We provide theoretical guarantees that our initialization yields higher cosine similarity to the true gradient sign than random baselines, and our PDO module achieves significantly lower query complexity than baseline search methods. Extensive experiments across CIFAR-10, ImageNet, and ObjectNet (covering standard and adversarially trained models, commercial APIs, and CLIP models) demonstrate that our method consistently outperforms SOTA hard-label attacks in both success rate and efficiency, particularly under low query budgets. Furthermore, our method demonstrates robust generalization across corrupted data (ImageNet-C), biomedical images (PathMNIST), and dense prediction tasks such as segmentation. Notably, it bypasses the stateful defense Blacklight, achieving a 0% detection rate.

2601.07545 2026-05-25 cs.LG stat.ML 版本更新

Near-Optimal Private Linear Regression via Iterative Hessian Mixing

通过迭代Hessian混合实现近最优私有线性回归

Omri Lev, Moshe Shenfeld, Vishwak Srinivasan, Katrina Ligett, Ashia C. Wilson

发表机构 * Department of EECS, Massachusetts Institute of Technology, US（麻省理工学院电子工程与计算机科学系）； School of Computer Science and Engineering, The Hebrew University of Jerusalem, IL（耶路撒冷希伯来大学计算机科学与工程学院）

AI总结 本文研究了在数据有界条件下实现差分隐私的普通最小二乘回归问题，提出了一种基于高斯草图的迭代Hessian混合（IHM）算法。该方法在保证差分隐私的同时，给出了更优的超额经验风险界：相比现有方法AdaSSP，去除了一个最大可达数据维度平方根的乘法因子，并在多个数据集上表现出更优的实证效果。

AI中文摘要

我们研究通过草图机制实现有界数据$(X,Y)$的差分隐私普通最小二乘（DP-OLS）。虽然高斯草图方法已被探索用于DP-OLS（Sheffet, 2017），但它们通常被认为不如自适应充分统计量扰动（AdaSSP）方法（Wang, 2018），后者直接扰动充分统计量$(X^{\top}X, X^{\top}Y)$。该方法被证明接近信息论最优，同时表现出强大的实证性能。在这项工作中，我们提出了迭代Hessian混合（IHM），一种基于高斯草图方法构建的DP-OLS算法，其灵感来自Pilanci和Wainwright的迭代Hessian草图。我们证明IHM是差分私有的，并以超额经验风险界的形式提供效用保证。这些界通过移除一个可能高达数据维度平方根的乘法因子，改进了AdaSSP的界。IHM的设计基于我们为先前DP-OLS的高斯草图方法提出的新准确性保证，这些保证阐明了这些方法何时预期表现良好，以及IHM如何规避其固有局限性。我们还在大量数据集上进行了严格的实证评估，表明IHM始终优于包括AdaSSP在内的先前基线。

英文摘要

We study differentially private ordinary least squares (DP-OLS) with bounded data $(X,Y)$ via sketching-based mechanisms. While Gaussian sketching approaches have been explored for DP-OLS (Sheffet, 2017), they are typically viewed as less competitive than the Adaptive Sufficient Statistics Perturbation (AdaSSP) method (Wang, 2018), which directly perturbs the sufficient statistics $(X^{\top}X, X^{\top}Y)$. This method was shown to be close to information-theoretically optimal, while also exhibiting strong empirical performance. In this work, we propose Iterative Hessian Mixing (IHM), an algorithm that builds on Gaussian sketching approaches to DP-OLS and is inspired by the Iterative Hessian Sketch of Pilanci and Wainwright. We prove that IHM is differentially private and provide utility guarantees in the form of excess empirical risk bounds. These bounds improve upon those of AdaSSP by removing a multiplicative factor that can be as large as the square root of the data dimension. The design of IHM is based on new accuracy guarantees that we present for prior Gaussian sketching approaches for DP-OLS, which clarify when these methods are expected to perform well and how IHM circumvents their inherent limitations. We also conduct a rigorous empirical evaluation on a large suite of datasets, demonstrating that IHM consistently outperforms prior baselines, including AdaSSP.
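
For readers unfamiliar with the baseline family, the idea of perturbing the sufficient statistics $(X^{\top}X, X^{\top}Y)$ can be sketched as follows. This is plain sufficient-statistics perturbation with a fixed ridge term, not AdaSSP (which chooses the regularization adaptively) and not IHM. The function names, the even budget split, the bounds $\|x_i\|_2 \le 1$ and $|y_i| \le 1$, and the fixed `ridge` value are all assumptions made for illustration, and the noise calibration is the classical Gaussian-mechanism formula, which holds only for $0 < \varepsilon < 1$.

```python
import numpy as np

def gaussian_sigma(eps: float, delta: float, sensitivity: float = 1.0) -> float:
    """Classical Gaussian-mechanism noise scale (valid for 0 < eps < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

def ssp_ols(X, y, eps, delta, ridge=1.0, rng=None):
    """Noisy-statistics least squares; assumes ||x_i||_2 <= 1 and |y_i| <= 1."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    # Each statistic has L2 sensitivity at most 1 under the assumed bounds;
    # the (eps, delta) budget is split evenly between the two releases.
    sigma = gaussian_sigma(eps / 2.0, delta / 2.0)
    noise = rng.normal(0.0, sigma, size=(d, d))
    noise = np.triu(noise) + np.triu(noise, 1).T      # keep X^T X symmetric
    xtx = X.T @ X + noise
    xty = X.T @ y + rng.normal(0.0, sigma, size=d)
    return np.linalg.solve(xtx + ridge * np.eye(d), xty)

rng = np.random.default_rng(0)
n, d = 5000, 3
X = rng.standard_normal((n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # row norms <= 1
w_true = np.array([0.5, -0.3, 0.2])
y = X @ w_true                      # |y_i| <= ||w_true||_2 < 1
w_hat = ssp_ols(X, y, eps=0.5, delta=1e-6, rng=rng)
```

The noise added to $X^{\top}X$ has $d^2$ entries while the signal grows with $n$, so the estimate degrades as the dimension rises relative to the sample size. A fixed ridge term also does not guarantee that the noisy matrix stays well conditioned; the sketch leaves that problem unaddressed.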

2512.22597 2026-05-25 cs.LG physics.chem-ph 版本更新

Energy-Guided Generative Modeling for Low-Energy Molecular Structure Discovery

能量引导的生成式建模用于低能分子结构发现

Guikun Xu, Xiaohan Yi, Ziqiao Meng, Peilin Zhao, Yatao Bian

发表机构 * School of Artificial Intelligence, Shanghai Jiao Tong University（上海交通大学人工智能学院）； Department of Computer Science, National University of Singapore（新加坡国立大学计算机科学系）； Shenzhen International Graduate School, Tsinghua University（清华大学深圳国际研究生院）

AI总结本文提出了一种名为EnFlow的能量引导生成模型，用于高效发现低能量分子构型。该方法结合了基于流的构型生成与显式的能量景观建模，实现了构象集合的联合生成与基态识别。通过将生成动力学与学习到的能量模型相结合，EnFlow能够在极少采样步骤内生成结构准确且能量较低的分子构型，并能根据能量对生成结果进行排序，实验表明其在多个分子数据集上表现出色。

AI中文摘要

探索分子能量景观和识别基态构象是计算化学的核心挑战。然而，从分子图生成多样化的低能构象在传统的基于物理的流程中仍然昂贵。现有的基于学习的方法仍然分散：生成模型捕捉构象多样性但通常缺乏可靠的能量校准，而确定性预测器关注单一结构且无法表示系综变异性。这里我们介绍EnFlow，据我们所知，这是第一个能量引导的生成框架，它将基于流的构象生成与显式能量景观建模相结合，用于联合构象系综生成和基态识别。通过将生成动力学与学习的能量模型集成，EnFlow引导采样朝向构象景观的低能区域，在极少的采样步数下提高结构保真度，同时实现对生成构象的基于能量的排序。在GEOM-QM9和GEOM-Drugs上的实验表明，EnFlow在构象生成和基态识别方面取得了强劲性能，同时仅需要1-2个ODE采样步。单点GFN2-xTB评估进一步表明，学习的能量分数保留了生成构象的物理上有意义的能量排序。这些结果支持显式能量景观建模作为通过联合建模构象系综及其相关能量来发现低能分子结构的有效策略。

英文摘要

Exploring molecular energy landscapes and identifying ground-state conformations are central challenges in computational chemistry. However, generating diverse low-energy conformers from molecular graphs remains expensive with traditional physics-based pipelines. Existing learning-based approaches remain fragmented: generative models capture conformational diversity but often lack reliable energy calibration, whereas deterministic predictors focus on a single structure and fail to represent ensemble variability. Here we introduce EnFlow, to our knowledge, the first energy-guided generative framework that couples flow-based conformer generation with explicit energy landscape modeling for joint conformational ensemble generation and ground-state identification. By integrating generative dynamics with a learned energy model, EnFlow guides sampling toward low-energy regions of the conformational landscape, improving structural fidelity under extremely few sampling steps while enabling energy-based ranking of generated conformations. Experiments on GEOM-QM9 and GEOM-Drugs show that EnFlow achieves strong performance in conformer generation and ground-state identification while requiring only 1 to 2 ODE sampling steps. Single-point GFN2-xTB evaluations further show that the learned energy scores preserve physically meaningful energetic rankings of generated conformations. These results support explicit energy landscape modeling as an effective strategy for low-energy molecular structure discovery through joint modeling of conformational ensembles and their associated energies.

2512.15436 2026-05-25 stat.ML cs.LG 版本更新

Online Partitioned Local Depth for semi-supervised applications

面向半监督应用的在线分区局部深度

John D. Foley, Justin T. Lee

发表机构 * Metron, Inc.（Metron 公司）

AI总结本文提出了一种适用于在线应用场景的改进版分区局部深度（PaLD）算法，名为在线PaLD，主要用于半监督预测任务。该算法在预计算参考数据集的凝聚网络后，能够在较短时间内扩展至新数据点，从而提升计算效率。研究通过实际应用展示了在线PaLD在医疗数据集上的异常检测和半监督分类中的潜力，拓展了PaLD框架的应用范围。

Comments Added theorem statements and refined results; 21 pages, 2 figures

AI中文摘要

我们介绍了分区局部深度（PaLD）算法的一个扩展，该扩展适用于在线应用，如半监督预测。PaLD以无监督、无参数聚类而闻名，但其鲁棒性基于数据点的三元组，使得精确分析计算成本高昂。目前正在研究如何提高底层离散算法的可扩展性并扩大PaLD的应用范围。我们提出的新算法online PaLD非常适合那些可以预先从参考数据集中计算凝聚网络的情况。在花费$O(n^3)$步骤构建可查询的数据结构后，online PaLD可以在$O(n^2)$时间内将凝聚网络扩展到新的数据点。我们的方法补充了之前基于近似和并行的加速方法。在实际应用中，online PaLD通过相对简单的实现使得更大的数据集可以进行精确分析。我们展示了在医疗保健数据集上的在线异常检测和半监督分类应用，作为online PaLD扩展PaLD框架应用潜力的初步说明。

英文摘要

We introduce an extension of the partitioned local depth (PaLD) algorithm that is adapted to online applications such as semi-supervised prediction. PaLD is best known for unsupervised, parameter-free clustering, but its robustness is based on triples of data points, making exact analysis computationally expensive. Research is ongoing to improve the scalability of the underlying discrete algorithm and expand the breadth of PaLD's applications. The new algorithm we present, online PaLD, is well-suited to situations where it is possible to pre-compute a cohesion network from a reference dataset. After $O(n^3)$ steps to construct a queryable data structure, online PaLD can extend the cohesion network to a new data point in $O(n^2)$ time. Our approach complements previous speed-up approaches based on approximation and parallelism. In practical terms, online PaLD makes larger datasets accessible to exact analysis with a relatively simple implementation. We present applications to online anomaly detection and semi-supervised classification for health-care datasets as initial illustrations of online PaLD's potential to expand applications of the PaLD framework.
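
To show where the cubic cost comes from, here is a minimal sketch of the batch (offline) cohesion computation as PaLD is usually defined: for each ordered pair $(x, y)$, the local focus contains every point within distance $d(x, y)$ of $x$ or $y$, and each focus point closer to $x$ than to $y$ adds $1/|\text{focus}|$ to the cohesion of that point toward $x$. This is not the online algorithm from the paper; the function name `pald_cohesion` and the half-credit handling of exact ties are assumptions of this sketch.

```python
import numpy as np

def pald_cohesion(D: np.ndarray) -> np.ndarray:
    """Naive O(n^3) cohesion matrix from a symmetric distance matrix D.

    C[x, w] accumulates the support that point w lends to x across all
    local conflicts (x, y), averaged over the n - 1 choices of y.
    """
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            dxy = D[x, y]
            focus = (D[x] <= dxy) | (D[y] <= dxy)   # local focus of (x, y)
            size = focus.sum()
            closer_to_x = focus & (D[x] < D[y])
            tied = focus & (D[x] == D[y])
            C[x, closer_to_x] += 1.0 / size
            C[x, tied] += 0.5 / size                # assumed tie handling
    return C / (n - 1)

rng = np.random.default_rng(0)
points = rng.standard_normal((12, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
C = pald_cohesion(D)
local_depth = C.sum(axis=1)     # row sums give each point's local depth
```

The two nested loops over pairs, each scanning all $n$ points, give the $O(n^3)$ total. A useful sanity check is that the local depths sum to $n/2$, because every focus point supports exactly one side of each conflict.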

2511.18000 2026-05-25 cs.LG cs.AI q-bio.PE 版本更新

Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning

空间流行病模拟中的奖励工程：个体行为学习的强化学习平台

Radman Rakhshandehroo, Daniel Coombs

发表机构 * Department of Computer Science, University of British Columbia（不列颠哥伦比亚大学计算机科学系）； Department of Mathematics and Institute of Applied Mathematics, University of British Columbia（不列颠哥伦比亚大学数学系与应用数学研究所）

AI总结本文介绍了 ContagionRL，一个专为疫情空间模拟设计的强化学习平台，用于系统研究奖励函数设计对个体行为学习的影响。该平台结合了可配置的 SIRS+D 流行病模型，支持在不同环境条件下评估多种奖励机制对智能体生存策略的影响，并通过实验发现方向引导和明确遵守激励是提升策略学习的关键因素。研究还表明，采用势场奖励函数的智能体在非药物干预遵守和空间规避策略方面表现最优，平台为探索奖励与行为关系提供了模块化工具，具有重要的理论和应用价值。

Comments 38 pages, 15 figures and 18 tables; Accepted to TMLR. OpenReview: https://openreview.net/forum?id=yPEASsx3hk

Journal ref: Transactions on Machine Learning Research, 2026

AI中文摘要

我们提出了ContagionRL，一个与Gymnasium兼容的强化学习平台，专门用于空间流行病模拟中的系统奖励工程。与依赖固定行为规则的传统基于智能体的模型不同，我们的平台能够严格评估奖励函数设计如何影响在不同流行病场景中学到的生存策略。ContagionRL集成了空间SIRS+D流行病模型与可配置的环境参数，允许研究人员在包括有限可观测性、不同移动模式和异质人口动态等变化条件下对奖励函数进行压力测试。我们评估了五种不同的奖励设计，从稀疏生存奖励到一种新颖的势场方法，跨越多种RL算法（PPO、SAC、A2C）。通过系统的消融研究，我们发现方向性指导和明确的依从性激励是稳健策略学习的关键组成部分。我们在不同感染率、网格大小、可见性约束和移动模式下的全面评估表明，奖励函数的选择显著影响智能体行为和生存结果。使用我们的势场奖励训练的智能体始终获得优越性能，学习最大程度地遵守非药物干预，同时发展出复杂的空间规避策略。该平台的模块化设计使得能够系统地探索奖励-行为关系，弥补了这类模型中奖励工程关注有限的空白。ContagionRL是研究流行病背景下适应性行为反应的有效平台，并强调了奖励设计、信息结构和环境可预测性在学习中的重要性。我们的代码公开在https://github.com/redradman/ContagionRL。

英文摘要

We present ContagionRL, a Gymnasium-compatible reinforcement learning platform specifically designed for systematic reward engineering in spatial epidemic simulations. Unlike traditional agent-based models that rely on fixed behavioral rules, our platform enables rigorous evaluation of how reward function design affects learned survival strategies across diverse epidemic scenarios. ContagionRL integrates a spatial SIRS+D epidemiological model with configurable environmental parameters, allowing researchers to stress-test reward functions under varying conditions including limited observability, different movement patterns, and heterogeneous population dynamics. We evaluate five distinct reward designs, ranging from sparse survival bonuses to a novel potential field approach, across multiple RL algorithms (PPO, SAC, A2C). Through systematic ablation studies, we identify that directional guidance and explicit adherence incentives are critical components for robust policy learning. Our comprehensive evaluation across varying infection rates, grid sizes, visibility constraints, and movement patterns reveals that reward function choice dramatically impacts agent behavior and survival outcomes. Agents trained with our potential field reward consistently achieve superior performance, learning maximal adherence to non-pharmaceutical interventions while developing sophisticated spatial avoidance strategies. The platform's modular design enables systematic exploration of reward-behavior relationships, addressing a knowledge gap in models of this type where reward engineering has received limited attention. ContagionRL is an effective platform for studying adaptive behavioral responses in epidemic contexts and highlights the importance of reward design, information structure, and environmental predictability in learning. Our code is publicly available at https://github.com/redradman/ContagionRL.

2511.17171 2026-05-25 cs.CV cs.LG 版本更新

FireScope: Wildfire Risk Raster Prediction with a Chain-of-Thought Oracle

FireScope: 基于思维链预言机的野火风险栅格预测

Mario Markov, Stefan Maria Ailuro, Luc Van Gool, Konrad Schindler, Danda Pani Paudel

发表机构 * ETH Zurich（苏黎世联邦理工学院）

AI总结该论文提出了一种名为FireScope的框架，用于预测野火风险栅格图，通过结合视觉、气候和地理信息进行因果推理。研究引入了FireScope-Bench数据集，整合了Sentinel-2卫星图像、气候数据和专家定义的风险图，用于跨大陆评估。FireScope基于视觉语言模型，结合强化学习和视觉监督，生成带有推理轨迹的风险图，显著提升了模型在不同大陆间的泛化能力和可解释性。该工作首次展示了基于语言的推理在视觉生成中的泛化提升作用，并提出了首个可跨大陆应用的高分辨率野火风险模型。

Comments CVPR 2026, Project Page: https://firescope.ai/research

AI中文摘要

预测野火风险是一个推理密集型的空间问题，需要整合视觉、气候和地理因素来推断连续的风险地图。现有方法缺乏可靠泛化所需的因果推理和多模态理解。我们引入了FireScope-Bench，一个大规模数据集和基准，将Sentinel-2图像和气候数据与专家定义的全美风险栅格以及欧洲的真实野火事件配对，用于跨大陆评估。基于此数据集，我们提出了FireScope，一个基于VLM的推理到生成框架，从强化学习和视觉监督中学习，通过互补的推理轨迹预测风险栅格。当在美国训练并在欧洲测试时，FireScope取得了显著的性能提升，而专家反馈和自动化分析证实其推理轨迹是忠实且有语义意义的。我们的发现表明，推理可以支撑栅格预测模型，提高泛化性和可解释性。据我们所知，这是第一个（1）证明基于语言的推理可以改善视觉生成泛化性的框架，（2）提出一个可跨大陆应用的高分辨率野火风险模型，以及（3）能够系统研究多模态火灾风险模型稳健跨大陆泛化的框架。我们相信FireScope-Bench有潜力成为推动推理驱动、可解释和可泛化空间建模的基础。数据和源代码将公开提供。

英文摘要

Predicting wildfire risk is a reasoning-intensive spatial problem that requires the integration of visual, climatic, and geographic factors to infer continuous risk maps. Existing methods lack the causal reasoning and multimodal understanding required for reliable generalization. We introduce FireScope-Bench, a large-scale dataset and benchmark that couples Sentinel-2 imagery and climate data with expert-defined risk rasters across the USA, and real wildfire events in Europe for cross-continental evaluation. Building on this dataset, we propose FireScope, a VLM-based reasoning-to-generation framework that learns from both reinforcement learning and visual supervision to predict risk rasters with complementary reasoning traces. When trained in the USA and tested in Europe, FireScope achieves substantial performance gains, while expert feedback and automated analysis confirm that its reasoning traces are faithful and semantically meaningful. Our findings demonstrate that reasoning can ground raster prediction models, improving both generalization and interpretability. To our knowledge, this is the first framework to (1) demonstrate that language-based reasoning can improve generalization in visual generation, (2) propose a high-resolution wildfire risk model that can be applied across continents, and (3) enable systematic studies of robust cross-continental generalization for multimodal fire risk models. We believe that FireScope-Bench has the potential to serve as a foundation for advancing reasoning-driven, interpretable and generalizable spatial modeling. Data and source code will be made publicly available.

2511.00266 2026-05-25 cs.LG cs.RO 版本更新

X-TRACK: Physics-Aware xLSTM for Realistic Vehicle Trajectory Prediction

X-TRACK: 物理感知的xLSTM用于真实车辆轨迹预测

Aanchal Rajesh Chugh, Marion Neumeier, Sebastian Dorn

AI总结准确的轨迹预测对自动驾驶系统的安全性和可靠性至关重要，尤其需要在高速公路场景中建模长期时间依赖关系并考虑车辆之间的社会交互。本文提出了一种基于xLSTM的新型高速公路轨迹预测框架X-TRAJ，并进一步引入其物理感知变体X-TRACK，通过显式整合车辆运动学约束，生成更真实可行的轨迹。实验表明，X-TRACK在公开数据集highD和NGSIM上均优于现有先进方法，尤其在highD上表现突出。

AI中文摘要

准确的轨迹预测对于安全可靠的自动驾驶系统至关重要，需要模型能够捕捉长期时间依赖性，同时考虑高速公路驾驶场景中相邻车辆之间的社交互动。虽然长短期记忆（LSTM）网络在轨迹预测领域得到了广泛应用，但它们存在记忆容量有限和标量细胞状态等局限性。最近引入的扩展长短期记忆（xLSTM）通过引入指数门控和增强的记忆结构解决了传统LSTM的这些局限性，使其更适合建模长期时间依赖性。尽管具有潜力，基于xLSTM的模型在车辆轨迹预测方面仍未得到充分探索。本文首次将xLSTM应用于高速公路轨迹预测，提出了新颖的基于xLSTM的高速公路轨迹预测框架X-TRAJ，以及其物理感知变体X-TRACK（受运动学约束的扩展LSTM轨迹预测），该变体将车辆运动学显式集成到模型学习过程中。通过引入物理约束，所提出的模型生成真实可行的高速公路轨迹。在公开的高速公路数据集highD和NGSIM上的全面评估表明，X-TRACK在highD上优于最先进的基线，并在NGSIM数据集上达到最先进模型水平。

英文摘要

Accurate trajectory prediction is crucial for safe and reliable autonomous driving systems, requiring models that capture long-term temporal dependencies while accounting for social interactions among neighboring vehicles in highway driving scenarios. While Long Short-Term Memory (LSTM) networks have been widely used for trajectory prediction, they are limited by their memory capacity and scalar cell state. The recently introduced Extended Long Short-Term Memory (xLSTM) addresses these limitations of traditional LSTMs by introducing exponential gating and enhanced memory structures, making it better suited for modeling long-term temporal dependencies. Despite their potential, xLSTM-based models remain underexplored in the context of vehicle trajectory prediction. This paper introduces a novel xLSTM-based highway trajectory prediction framework, X-TRAJ, as the first application of xLSTM to this task, and its physics-aware variant, X-TRACK (eXtended LSTM for TRAjectory prediction Constraint by Kinematics), which explicitly integrates vehicle motion kinematics into the model learning process. By introducing physical constraints, the proposed model generates realistic and feasible highway trajectories. A comprehensive evaluation on the publicly available highway datasets, highD and NGSIM, demonstrates that X-TRACK outperforms state-of-the-art baselines on highD and is among the state-of-the-art models on the NGSIM dataset.
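
One common way to make predicted trajectories physically feasible is to have the network output bounded controls and integrate them through a kinematic model, instead of regressing positions directly. The sketch below shows that idea with a simple unicycle-style model. The abstract does not specify X-TRACK's kinematic formulation, so the function name `rollout`, the state layout, the time step, and the limits `a_max` and `yaw_rate_max` are all hypothetical.

```python
import numpy as np

def rollout(state, controls, dt=0.2, a_max=3.0, yaw_rate_max=0.3):
    """Integrate (acceleration, yaw rate) controls into (x, y) positions.

    state    : (x, y, heading [rad], speed [m/s])
    controls : iterable of (accel [m/s^2], yaw_rate [rad/s]) per step
    Controls are clipped to assumed physical limits before integration.
    """
    x, y, heading, speed = state
    positions = []
    for accel, yaw_rate in controls:
        accel = np.clip(accel, -a_max, a_max)
        yaw_rate = np.clip(yaw_rate, -yaw_rate_max, yaw_rate_max)
        speed = max(speed + accel * dt, 0.0)      # no reversing on a highway
        heading = heading + yaw_rate * dt
        x = x + speed * np.cos(heading) * dt
        y = y + speed * np.sin(heading) * dt
        positions.append((x, y))
    return np.array(positions)

# Constant speed, straight ahead: 10 m/s for five steps of 0.2 s.
straight = rollout((0.0, 0.0, 0.0, 10.0), np.zeros((5, 2)))

# An implausible acceleration request is clipped to a_max before use.
clipped = rollout((0.0, 0.0, 0.0, 10.0), [(100.0, 0.0)])
```

Because positions are produced only by integrating clipped controls, every output trajectory respects the acceleration and turn-rate limits by construction, whatever the upstream network predicts.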

2510.22941 2026-05-25 cs.LG 版本更新

Hazard-Responsive Digital Twin for Climate-Driven Urban Resilience and Equity

面向气候驱动的城市韧性与公平的灾害响应数字孪生

Zhenglai Shen, Hongyu Zhou

发表机构 * Buildings and Transportation Science Division, Oak Ridge National Laboratory（橡树岭国家实验室建筑与交通科学部门）； Civil and Environmental Engineering, University of Tennessee（田纳西大学土木与环境工程系）

AI总结面对野火引发的停电和城市热浪等复合型气候灾害，本文提出了一种具有响应能力的数字孪生系统（H-RDT），结合物理信息神经网络、多模态数据融合和公平性风险分析，提升城市应对灾害的韧性与公平性。该系统在模拟城区中展示了对部分传感器失效情况下的稳定室内温度预测能力，并通过强化学习模块自适应融合物联网、无人机和卫星数据，识别高脆弱性区域，如学校、诊所和低收入住房。研究还表明，通过提前启动冷却中心和共享微电网等干预措施，可有效降低人群加权热风险和极端风险，为城市气候适应决策提供更具适应性和公平导向的支持。

Comments 52 pages, 9 figures

DOI: 10.1016/j.scs.2026.107413
Journal ref: Sustainable Cities and Society 144 (2026) 107413

AI中文摘要

复合气候灾害，如野火引发的停电和城市热浪，挑战着城市的稳定性和公平性。我们提出一种灾害响应数字孪生（H-RDT），它结合了物理信息神经网络建模、多模态数据融合和公平感知风险分析，用于城市尺度的响应。在一个包含多种建筑原型和人群的合成区域中，模拟的野火-停电-热浪级联事件表明，H-RDT 在部分传感器缺失的情况下能维持稳定的室内温度预测（约31至33°C），再现停电引发的温度激增和恢复。基于强化学习的融合模块自适应地重新加权物联网、无人机和卫星输入，以维持时空覆盖，而公平调整的映射则隔离出高脆弱性集群（学校、诊所、低收入住房）。前瞻性干预措施，如预防性冷却中心启动和微电网共享，将人口加权热风险降低11%至13%，将95百分位（尾部）风险缩小7%至17%，并将过热小时数减少高达9%。除了合成演示之外，该框架为实际城市实施建立了可迁移的基础，将物理灾害建模与社会公平和决策智能联系起来。H-RDT 推动数字城市韧性向自适应、基于学习和以公平为中心的决策支持发展，以应对气候适应。

英文摘要

Compounding climate hazards, such as wildfire-induced outages and urban heatwaves, challenge the stability and equity of cities. We present a Hazard-Responsive Digital Twin (H-RDT) that combines physics-informed neural network modeling, multimodal data fusion, and equity-aware risk analytics for urban-scale response. In a synthetic district with diverse building archetypes and populations, a simulated wildfire-outage-heatwave cascade shows that H-RDT maintains stable indoor temperature predictions (approximately 31 to 33 C) under partial sensor loss, reproducing outage-driven surges and recovery. The reinforcement learning based fusion module adaptively reweights IoT, UAV, and satellite inputs to sustain spatiotemporal coverage, while the equity-adjusted mapping isolates high-vulnerability clusters (schools, clinics, low-income housing). Prospective interventions, such as preemptive cooling-center activation and microgrid sharing, reduce population-weighted thermal risk by 11 to 13 percent, shrink the 95th-percentile (tail) risk by 7 to 17 percent, and cut overheating hours by up to 9 percent. Beyond the synthetic demonstration, the framework establishes a transferable foundation for real-city implementation, linking physical hazard modeling with social equity and decision intelligence. The H-RDT advances digital urban resilience toward adaptive, learning-based, and equity-centered decision support for climate adaptation.
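
The risk figures quoted above (population-weighted risk, 95th-percentile tail risk, relative reductions) can be made concrete with a small sketch. The abstract does not define how these quantities are computed, so the nearest-rank percentile, the function names, and the sample numbers below are illustrative assumptions rather than the H-RDT implementation.

```python
import math

def population_weighted_mean(risks, populations):
    """Average risk where each zone counts in proportion to its population."""
    total = sum(populations)
    return sum(r * p for r, p in zip(risks, populations)) / total

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def relative_reduction(baseline, after):
    """Fractional reduction of `after` relative to `baseline`."""
    return (baseline - after) / baseline

# Made-up risks for three zones, before and after an intervention.
populations = [100, 300, 600]
baseline = [0.2, 0.5, 0.9]
mitigated = [0.2, 0.45, 0.78]

mean_cut = relative_reduction(
    population_weighted_mean(baseline, populations),
    population_weighted_mean(mitigated, populations),
)
tail_cut = relative_reduction(percentile(baseline, 95), percentile(mitigated, 95))
```

Reporting both the weighted mean and the tail keeps an intervention from looking good on average while leaving the worst-off zones unchanged.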

2510.12328 2026-05-25 cs.LG 版本更新

在不完美验证器下基于可验证但含噪声奖励的强化学习

Xin-Qiang Cai, Wei Wang, Feng Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama

发表机构 * RIKEN AIP（日本理化学研究所AIP）； The University of Tokyo（东京大学）； The University of Melbourne（墨尔本大学）； The University of Sydney（悉尼大学）

AI总结该论文研究了在不可靠验证器存在下如何改进可验证奖励的强化学习（RLVR）。通过将验证器的不可靠性建模为具有不对称噪声率的随机奖励通道，作者提出了两种轻量级修正方法：一种是反向修正，用于生成无偏的替代奖励；另一种是正向修正，通过调整得分函数项使策略更新更贴近干净梯度方向。实验表明，这两种方法在合成和真实验证噪声环境下均能提升数学推理任务的性能，其中正向修正在高噪声情况下更为稳定。此外，作者还引入了一个基于轻量级语言模型的申诉机制，用于在线估计假阴性率并进一步提升性能。

AI中文摘要

基于可验证奖励的强化学习（RLVR）用自动验证器替代昂贵的人工标注。为减少验证器攻击，许多RLVR系统将奖励二值化为$\{0,1\}$，但不完美的验证器不可避免地引入假阴性（拒绝正确答案）和假阳性（接受错误答案）。我们将验证器不可靠性形式化为具有非对称噪声率$\rho_0$和$\rho_1$（分别为FP率和FN率）的随机奖励通道。由此抽象我们推导出两种轻量级校正：（i）后向校正，产生无偏替代奖励，从而在期望上得到无偏的策略梯度估计量；（ii）前向校正，重新加权得分函数项，使得期望更新与干净梯度方向对齐，且仅需FN率。我们在分组相对策略优化流程中将两者实现为轻量级钩子；两种校正均在合成和真实验证器噪声下改善了数学推理的RLVR，其中前向变体在较大噪声下更稳定。最后，一个带有轻量级LLM验证器的上诉机制在线估计FN率并进一步提升性能。

英文摘要

Reinforcement Learning with Verifiable Rewards (RLVR) replaces costly human labeling with automated verifiers. To reduce verifier hacking, many RLVR systems binarize rewards to $\{0,1\}$, but imperfect verifiers inevitably introduce false negatives (rejecting correct answers) and false positives (accepting incorrect ones). We formalize verifier unreliability as a stochastic reward channel with asymmetric noise rates $\rho_0$ and $\rho_1$, the FP rate and the FN rate, respectively. From this abstraction we derive two lightweight corrections: (i) a backward correction that yields an unbiased surrogate reward and thus an unbiased policy-gradient estimator in expectation, and (ii) a forward correction that reweights score-function terms so the expected update aligns with the clean gradient direction and requires only the FN rate. We implement both as lightweight hooks in a group relative policy optimization pipeline. Both corrections improve RLVR for math reasoning under synthetic and real verifier noise, with the forward variant being more stable under heavier noise. Finally, an appeals mechanism with a lightweight LLM verifier estimates the FN rate online and further improves performance.
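
The backward correction can be illustrated with the standard unbiased estimator from the noisy-label literature. If the verifier reports 1 with probability $\rho_0$ on a wrong answer and 0 with probability $\rho_1$ on a correct one, then the expected observed reward is $\rho_0 + (1-\rho_0-\rho_1)\,r$, so rescaling the observation recovers the clean reward $r$ in expectation. The sketch below assumes this textbook construction; the paper's exact surrogate may differ, and the function names are hypothetical.

```python
def backward_corrected_reward(observed, fp_rate, fn_rate):
    """Surrogate reward whose expectation equals the clean binary reward.

    fp_rate = P(observed = 1 | clean = 0), fn_rate = P(observed = 0 | clean = 1).
    """
    denom = 1.0 - fp_rate - fn_rate
    if denom <= 0:
        raise ValueError("noise rates must satisfy fp_rate + fn_rate < 1")
    return (observed - fp_rate) / denom

def expected_surrogate(clean, fp_rate, fn_rate):
    """Exact expectation of the surrogate over the verifier's noise."""
    # Probability that the noisy verifier outputs 1 given the clean reward.
    p_one = 1.0 - fn_rate if clean == 1 else fp_rate
    return (p_one * backward_corrected_reward(1, fp_rate, fn_rate)
            + (1.0 - p_one) * backward_corrected_reward(0, fp_rate, fn_rate))
```

The surrogate takes values outside $[0,1]$ (negative for a rejected sample), and its variance grows as $\rho_0+\rho_1$ approaches 1, which is consistent with the abstract's remark that the forward variant is more stable under heavier noise.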

2510.00526 2026-05-25 cs.CL cs.LG 版本更新

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

超越对数似然：面向模型能力连续体的监督微调概率目标

Gaotang Li, Ruizhong Qiu, Xiusi Chen, Heng Ji, Hanghang Tong

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结本文研究了监督微调（SFT）中超越负对数似然（NLL）的目标函数，针对大语言模型在不同能力水平下的表现差异，提出了一种基于概率的优化目标体系。通过大量实验和消融研究，发现模型能力水平是决定不同目标函数优劣的关键因素：在模型能力强时，优先考虑先验知识的目标（如$-p$、$-p^{10}$）表现更优；在模型能力弱时，NLL仍占优势；而中间阶段则无单一目标占优。该研究为根据模型能力选择合适的目标函数提供了理论依据和实践指导。

Comments ICML 2026

AI中文摘要

监督微调（SFT）是后训练大型语言模型（LLM）的标准方法，但通常表现出有限的泛化能力。我们将此限制归因于其默认训练目标：负对数似然（NLL）。虽然NLL在从头训练时经典最优，但后训练处于不同范式，可能违反其最优性假设，因为模型已编码任务相关先验，且监督可能冗长且有噪声。在这项工作中，我们系统研究了各种基于概率的目标，并刻画了不同目标在不同条件下成功或失败的时间和原因。通过在8个模型骨干、27个基准和7个领域上的全面实验和广泛消融研究，我们揭示了控制目标行为的关键维度：模型能力连续体。在模型强端附近，降低低概率令牌权重的先验倾向目标（例如，-p, -p^{10}, 阈值变体）一致优于NLL；在模型弱端，NLL占主导；在中间，没有单一目标普遍最优。我们的理论分析进一步阐明了目标如何在连续体上交换位置，为根据模型能力调整目标提供了原则性基础。代码可在 https://github.com/GaotangLi/Beyond-Log-Likelihood 获取。

英文摘要

Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log likelihood (NLL). While NLL is classically optimal when training from scratch, post-training operates in a different paradigm and could violate its optimality assumptions, where models already encode task-relevant priors and supervision can be long and noisy. In this work, we systematically study various probability-based objectives and characterize when and why different objectives succeed or fail under varying conditions. Through comprehensive experiments and extensive ablation studies across 8 model backbones, 27 benchmarks, and 7 domains, we uncover a critical dimension that governs objective behavior: the model-capability continuum. Near the model-strong end, prior-leaning objectives that downweight low-probability tokens (e.g., $-p$, $-p^{10}$, thresholded variants) consistently outperform NLL; toward the model-weak end, NLL dominates; in between, no single objective prevails. Our theoretical analysis further elucidates how objectives trade places across the continuum, providing a principled foundation for adapting objectives to model capability. The code is available at https://github.com/GaotangLi/Beyond-Log-Likelihood.
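
To see why objectives such as $-p$ and $-p^{10}$ downweight low-probability tokens, compare the derivative of each per-token loss with respect to the model's probability $p$ of the target token: NLL gives $-1/p$, which blows up as $p \to 0$, while $-p^k$ gives $-k\,p^{k-1}$, which stays bounded and vanishes for $k > 1$. The sketch below is only this calculus check; the function names are hypothetical and it is not taken from the authors' repository.

```python
import math

def nll(p):
    """Negative log likelihood of the target token."""
    return -math.log(p)

def neg_p_pow(p, k=1):
    """Prior-leaning objective -p^k (k = 1 gives -p)."""
    return -(p ** k)

def grad_nll(p):
    """d/dp of -log p."""
    return -1.0 / p

def grad_neg_p_pow(p, k=1):
    """d/dp of -p^k."""
    return -k * p ** (k - 1)

# Gradient magnitude on a token the model finds unlikely vs. likely.
for p in (0.01, 0.9):
    print(p, abs(grad_nll(p)), abs(grad_neg_p_pow(p, 1)), abs(grad_neg_p_pow(p, 10)))
```

At $p = 0.01$ the NLL gradient is a hundred times larger than that of $-p$, so NLL spends most of its update on tokens the model considers implausible, which helps a weak model and can hurt a strong one whose prior is already reliable.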

2509.15105 2026-05-25 cs.LG 版本更新

Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting

Super-Linear: 一种轻量级预训练线性专家混合模型用于时间序列预测

Liran Nochumsohn, Raz Marshanski, Hedi Zisling, Omri Azencot

发表机构 * Faculty of Computer and Information Science, Ben-Gurion University（计算机与信息科学学院，本·古里安大学）

AI总结本文提出了一种轻量级的预训练混合专家模型 Super-Linear，用于时间序列预测。该模型通过使用频率特化的线性专家替代复杂的深度结构，并结合轻量的频谱门控机制动态选择相关专家，实现了高效且准确的预测。Super-Linear 在多个基准数据集上表现出色，显著提升了计算效率、对采样率的鲁棒性以及模型可解释性。

Journal ref: Transactions on Machine Learning Research (TMLR), 2026

AI中文摘要

时间序列预测（TSF）在能源、金融、医疗和物流等领域至关重要，需要能够跨不同数据集泛化的模型。像Chronos和Time-MoE这样的大型预训练模型表现出强大的零样本（ZS）性能，但计算成本高。在这项工作中，我们引入了Super-Linear，一种轻量级且可扩展的混合专家（MoE）模型，用于通用预测。它用简单的频率特化线性专家替代深度架构，这些专家在多个频率范围内的重采样数据上进行训练。一种轻量级频谱门控机制动态选择相关专家，实现高效准确的预测。尽管简单，Super-Linear在基准测试中表现出强劲性能，同时显著提高了效率、对采样率的鲁棒性和可解释性。Super-Linear的实现可在以下网址获取：https://github.com/azencot-group/SuperLinear。

英文摘要

Time series forecasting (TSF) is critical in domains like energy, finance, healthcare, and logistics, requiring models that generalize across diverse datasets. Large pre-trained models such as Chronos and Time-MoE show strong zero-shot (ZS) performance but suffer from high computational costs. In this work, we introduce Super-Linear, a lightweight and scalable mixture-of-experts (MoE) model for general forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. Despite its simplicity, Super-Linear demonstrates strong performance across benchmarks, while substantially improving efficiency, robustness to sampling rates, and interpretability. The implementation of Super-Linear is available at: https://github.com/azencot-group/SuperLinear.
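
The idea of gating linear experts by spectral content can be sketched in a few lines. This is a toy stand-in, not the Super-Linear architecture: the real gate is learned, whereas here the gate weights are simply the normalized energy of the input in hand-picked frequency bands, and all names (`band_energies`, `gate_weights`, `moe_forecast`) are hypothetical.

```python
import cmath
import math

def band_energies(series, bands):
    """Energy of `series` in each (lo, hi) band of DFT bin indices (hi exclusive)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the DC component
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power.append(abs(coeff) ** 2)
    return [sum(power[lo:hi]) for lo, hi in bands]

def gate_weights(series, bands):
    """Normalize band energies into mixture weights (uniform if the input is flat)."""
    energies = band_energies(series, bands)
    total = sum(energies)
    if total == 0:
        return [1.0 / len(bands)] * len(bands)
    return [e / total for e in energies]

def moe_forecast(series, experts, bands):
    """One-step forecast: gate-weighted sum of linear experts over the lookback."""
    weights = gate_weights(series, bands)
    preds = [sum(w * v for w, v in zip(expert, series)) for expert in experts]
    return sum(g * p for g, p in zip(weights, preds))

# A slow sine (2 cycles over 32 samples) should route to the low-frequency expert.
series = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]
bands = [(1, 4), (4, 17)]
weights = gate_weights(series, bands)
```

Each expert here is just a weight vector over the lookback window, so inference costs one dot product per expert plus the gate, which is where the efficiency of a linear mixture comes from.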

2508.14311 2026-05-25 cs.LG cs.AI 版本更新

Online Learning with Multiple Fairness Regularizers via Graph-Structured Feedback

通过图结构反馈进行多重公平正则化器的在线学习

Quan Zhou, Jakub Marecek, Robert Shorten

发表机构 * Department of Mathematics, National University of Singapore（新加坡国立大学数学系）； Department of Computer Science, Czech Technical University（捷克技术大学计算机科学系）； Dyson School of Design Engineering, Imperial College London（伦敦帝国理工学院设计工程戴森学院）； Imperial College London（伦敦帝国理工学院）

AI总结本文研究了在自动决策系统中如何同时满足多个可能相互冲突的公平性要求的问题。作者提出了一种基于图结构反馈的强化学习方法，能够在序贯交互过程中自适应地学习不同公平性目标的权重。该方法为动态环境中实现多目标公平性优化提供了新的解决方案。

Comments Published in Transactions on Machine Learning Research (TMLR), 2026. OpenReview: https://openreview.net/forum?id=y8iWuDZtEw

2508.14083 2026-05-25 cs.LG cs.AI 版本更新

GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values

GeoMAE：面向缺失值的时空图预测的掩码表示学习

Songyu Ke, Chenyu Wu, Yuxuan Liang, Huiling Qin, Junbo Zhang, Yu Zheng

发表机构 * College of Computer and Data Science, Fuzhou University（福州大学计算机与数据科学学院）； JD Intelligent Cities Research（京东智能城市研究院）； School of Computing and Artificial Intelligence, Southwest Jiaotong University（西南交通大学计算机与人工智能学院）； Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Beijing Normal University（北京师范大学）

AI总结 GeoMAE 是一种用于时空图预测的自监督表示学习模型，旨在解决城市智能系统中因环境和设备问题导致的数据缺失问题。该方法通过引入基于注意力机制的时空预测网络和辅助学习任务，有效捕捉了传感器网络中的动态空间关联，并提升了模型对缺失数据的鲁棒性。实验表明，GeoMAE 在多个真实数据集上显著优于现有方法，相对提升了最高达13.20%的预测性能。

Comments 34 pages for pre-print version. This work has been published in *Neural Networks*. Please check the latest version via the following DOI

DOI: 10.1016/j.neunet.2026.108986

AI中文摘要

城市智能系统中缺失数据的普遍存在，归因于不利的环境条件和设备故障，对下游应用（尤其是交通预测和能耗预测）的有效性构成了重大挑战。因此，开发一种能够从不完整数据集中提取有意义信息的稳健时空学习方法至关重要。尽管存在针对缺失值时空图预测的方法，但未解决的问题依然存在。首先，现有研究大多基于时间序列分析，从而忽略了传感器网络中固有的动态空间相关性。其次，缺失数据模式的复杂性加剧了问题的复杂性。此外，维护条件的差异导致缺失值比率和模式显著波动，从而挑战了预测模型的泛化能力。针对这些挑战，本研究引入了GeoMAE，一种自监督的时空表示学习模型。该模型由三个主要组件组成：输入预处理模块、基于注意力的时空预测网络（STAFN）和一个辅助学习任务，该任务受掩码自编码器启发，以增强时空表示学习的鲁棒性。在真实数据集上的实证评估表明，GeoMAE显著优于现有基准，相对于最佳基线模型实现了高达13.20%的相对改进。

英文摘要

The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses a significant challenge to the efficacy of downstream applications, notably in the realms of traffic forecasting and energy consumption prediction. Therefore, it is imperative to develop a robust spatio-temporal learning methodology capable of extracting meaningful insights from incomplete datasets. Despite the existence of methodologies for spatio-temporal graph forecasting in the presence of missing values, unresolved issues persist. Primarily, the majority of extant research is predicated on time-series analysis, thereby neglecting the dynamic spatial correlations inherent in sensor networks. Additionally, the complexity of missing data patterns compounds the intricacy of the problem. Furthermore, the variability in maintenance conditions results in a significant fluctuation in the ratio and pattern of missing values, thereby challenging the generalizability of predictive models. In response to these challenges, this study introduces GeoMAE, a self-supervised spatio-temporal representation learning model. The model comprises three principal components: an input preprocessing module, an attention-based spatio-temporal forecasting network (STAFN), and an auxiliary learning task, which draws inspiration from Masking AutoEncoders to enhance the robustness of spatio-temporal representation learning. Empirical evaluations on real-world datasets demonstrate that GeoMAE significantly outperforms existing benchmarks, achieving up to 13.20% relative improvement over the best baseline models.

2508.10651 2026-05-25 cs.LG 版本更新

Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization

基于逻辑的Weisfeiler-Leman变体与表格化的图学习

Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Magdalena Ortiz, Matias Selin, Mantas Šimkus

发表机构 * Tampere University（坦佩雷大学）； TU Wien（维也纳工业大学）

AI总结本文提出了一种基于逻辑增强的Weisfeiler-Leman算法和表格化的新型图分类方法，通过将图数据转化为表格形式并应用传统表格数据分析方法进行分类。该方法通过修改底层逻辑框架提升了表达能力，并通过广义量化器的双模拟游戏理论进行了精确刻画。实验表明，该方法在多个数据集上性能接近图神经网络和图变换器，且无需GPU支持和复杂的超参数调优，计算效率显著更高。

Comments New version: Revised the experimental section

2508.02332 2026-05-25 cs.LG stat.ML 版本更新

BOOST: A Data-Driven Framework for the Automated Joint Selection of Kernel and Acquisition Functions in Bayesian Optimization

BOOST: 一种用于贝叶斯优化中核函数与采集函数自动联合选择的数据驱动框架

Joon-Hyun Park, Mujin Cheon, Jeongsu Wi, Dong-Yeun Koh

发表机构 * Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology（化学与生物分子工程系，韩国科学技术院）； Department of AX, Korea Advanced Institute of Science and Technology（AX系，韩国科学技术院）； Saudi Aramco-KAIST CO2 Management Center（沙特阿美-KAIST二氧化碳管理中心）

AI总结贝叶斯优化（BO）是一种在昂贵黑箱问题中高度样本高效的优化方法，其性能高度依赖于核函数和采集函数等超参数的选择。本文提出了一种名为BOOST的框架，用于自动联合选择最优的核函数和采集函数对，解决了传统方法中依赖启发式或手动调参的问题。BOOST通过离线评估阶段预测不同核函数-采集函数对的性能，并在实际优化前选择最有可能表现良好的组合，从而提升优化效率和效果。实验表明，BOOST在合成基准和机器学习超参数优化任务中均优于固定超参数的BO方法，并能与先进自适应方法竞争。

Comments 25 pages

AI中文摘要

贝叶斯优化（BO）是一种对昂贵黑箱问题具有高样本效率的方法，其性能关键取决于超参数的选择，包括核函数和采集函数。这带来了一个重要的实际挑战：不恰当的组合可能导致性能差和评估浪费。虽然对核函数和采集函数的单独改进已被积极探索，但自动联合选择最佳超参数对在很大程度上被忽视，迫使从业者依赖启发式方法或昂贵的手动训练。在这项工作中，我们提出了一个框架BOOST（贝叶斯优化与最优核函数和采集函数选择技术），该框架自动化了这一选择过程。BOOST利用一个简单的离线评估阶段来预测各种核函数-采集函数对的性能，并在进行昂贵的评估过程之前识别出最有希望的对。BOOST是一种数据驱动的策略选择程序，它根据候选策略在手头数据上的经验性能来评估核函数-采集函数对。在每次迭代中，先前观察到的点被划分为参考集和查询集。这些子集扮演类似于机器学习中训练集和验证集的角色：参考集用于模型构建，而查询集代表未见的区域，用于回顾性评估每个候选策略在向目标值推进方面的有效性。在合成基准和机器学习超参数优化任务上的实验表明，BOOST始终优于固定超参数的BO，并与最先进的自适应方法保持竞争力，突显了其在各种场景下的鲁棒性。

英文摘要

The performance of Bayesian optimization (BO), a highly sample-efficient method for expensive black-box problems, is critically governed by the selection of its hyperparameters, including the kernel and acquisition functions. This presents a significant practical challenge: an inappropriate combination of these can lead to poor performance and wasted evaluations. While individual improvements to kernel functions and acquisition functions have been actively explored, the joint and autonomous selection of the best pair of these fundamental hyperparameters has been largely overlooked. This forced practitioners to rely on heuristics or costly manual training. In this work, we propose a framework, BOOST (Bayesian Optimization with Optimal Kernel and Acquisition Function Selection Technique), that automates this selection. BOOST utilizes a simple offline evaluation stage to predict the performance of various kernel-acquisition function pairs and identify the most promising pair before committing to the expensive evaluation process. BOOST is a data-driven strategy selection procedure that evaluates kernel-acquisition pairs based on their empirical performance on the data-in-hand. At each iteration, previously observed points are partitioned into a reference set and a query set. These subsets play roles analogous to training and validation sets in machine learning: the reference set is used for model construction, while the query set represents unseen regions to retrospectively evaluate how effectively each candidate strategy progresses toward the target value. Experiments on synthetic benchmarks and machine learning hyperparameter optimization tasks demonstrate that BOOST consistently improves over fixed-hyperparameter BO and remains competitive with state-of-the-art adaptive methods, highlighting its robustness across diverse landscapes.

2507.09330 2026-05-25 physics.flu-dyn cs.LG physics.comp-ph 版本更新

WellPINN: Accurate Well Representation for Transient Fluid Pressure Diffusion in Subsurface Reservoirs with Physics-Informed Neural Networks

WellPINN：基于物理信息神经网络的地下储层瞬态流体压力扩散的精确井表征

Linus Walter, Qingkai Kong, Sara Hanson-Hedgecock, Víctor Vilarrasa

发表机构 * Global Change Research Group (GCRG), IMEDEA, CSIC-UIB（全球变化研究组（GCRG），IMEDEA，CSIC-UIB）

AI总结本文提出了一种基于物理信息神经网络（PINN）的新型建模方法 WellPINN，用于更准确地表征地下储层中井周围的瞬态流体压力扩散问题。该方法通过依次训练多个 PINN 模型，并逐步缩小等效井半径以匹配实际井尺寸，有效解决了现有方法在注水初期井附近压力预测不准确的问题。WellPINN 在整个注水周期内实现了对流体压力的高精度反演，显著提升了 PINN 在逆向建模和操作场景模拟中的应用潜力。

DOI: 10.1029/2025WR041404

AI中文摘要

精确的井表征对于可靠的地层描述和地下流动模型中操作场景的模拟至关重要。物理信息神经网络（PINNs）最近作为一种有前景的储层建模方法出现，能够无缝集成监测数据和控制物理方程。然而，现有的基于PINN的研究在捕捉井附近流体压力方面面临重大挑战，特别是在注入开始后的早期阶段。为了解决这个问题，我们提出了WellPINN，一种建模工作流，它结合了多个顺序训练的PINN模型的输出，以精确表征井。该工作流通过将域分解为逐步缩小的子域，同时减小等效井半径，迭代地逼近等效井半径以匹配实际井尺寸。我们的结果表明，在抽水井周围顺序训练叠加网络是第一个专注于在整个注入期间从泵注速率精确推断流体压力的工作流，显著推进了PINN在反演建模和操作场景模拟中的潜力。本文的所有数据和代码将在https://github.com/linuswalter/WellPINN公开提供。

英文摘要

Accurate representation of wells is essential for reliable reservoir characterization and simulation of operational scenarios in subsurface flow models. Physics-informed neural networks (PINNs) have recently emerged as a promising method for reservoir modeling, offering seamless integration of monitoring data and governing physical equations. However, existing PINN-based studies face major challenges in capturing fluid pressure near wells, particularly during the early stage after injection begins. To address this, we propose WellPINN, a modeling workflow that combines the outputs of multiple sequentially trained PINN models to accurately represent wells. This workflow iteratively approximates the radius of the equivalent well to match the actual well dimensions by decomposing the domain into stepwise shrinking subdomains with a simultaneously reducing equivalent well radius. Our results demonstrate that sequential training of superimposing networks around the pumping well is the first workflow that focuses on accurate inference of fluid pressure from pumping rates throughout the entire injection period, significantly advancing the potential of PINNs for inverse modeling and operational scenario simulations. All data and code for this paper will be made openly available at https://github.com/linuswalter/WellPINN.

2507.06252 2026-05-25 cs.CR cs.AI cs.LG 版本更新

False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems

虚假警报，真实损害：基于LLM的模型对文本网络威胁情报系统的对抗攻击

Samaneh Shafee, Alysson Bessani, Pedro M. Ferreira

发表机构 * Faculty of Sciences, University of Lisbon（里斯本大学科学学院）； CIENCES, University of Lisbon（里斯本大学CIENCES）

AI总结本文研究了基于大语言模型（LLM）的对抗攻击对基于文本的网络威胁情报（CTI）系统的影响。研究分析了三种攻击类型，包括规避、泛滥和投毒攻击，揭示了CTI系统在处理来自开放来源的文本数据时存在的脆弱性。特别指出，通过生成虚假文本，攻击者可以误导分类器，降低系统性能并破坏其功能，其中规避攻击在CTI流程中尤为关键，为后续攻击提供了前提条件。

DOI: 10.1016/j.future.2026.108603
Journal ref: Future Generation Computer Systems, 2026

AI中文摘要

网络威胁情报（CTI）已成为一种重要的补充方法，在网络威胁生命周期的早期阶段运作。CTI涉及收集、处理和分析威胁数据，以提供更准确和快速的网络威胁理解。由于数据量大，通过机器学习（ML）和自然语言处理（NLP）模型进行自动化对于有效的CTI提取至关重要。这些自动化系统利用来自社交网络、论坛和博客等来源的开源情报（OSINT）来识别失陷指标（IoCs）。尽管先前的研究集中在针对特定ML模型的对抗攻击上，但本研究通过调查整个CTI管道中各个组件的脆弱性及其对对抗攻击的敏感性，扩展了研究范围。这些脆弱性源于它们从各种开放来源（包括真实和潜在虚假内容）接收文本输入。我们分析了针对CTI管道的三种攻击类型，包括逃避、淹没和投毒，并评估了它们对系统信息选择能力的影响。具体而言，在虚假文本生成方面，该工作展示了对抗文本生成技术如何创建虚假的网络安全和类似网络安全的文本，从而误导分类器、降低性能并破坏系统功能。重点主要放在逃避攻击上，因为它先于并使得CTI管道中的淹没和投毒攻击成为可能。

英文摘要

Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. CTI involves collecting, processing, and analyzing threat data to provide a more accurate and rapid understanding of cyber threats. Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. These automated systems leverage Open Source Intelligence (OSINT) from sources like social networks, forums, and blogs to identify Indicators of Compromise (IoCs). Although prior research has focused on adversarial attacks on specific ML models, this study expands the scope by investigating vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks. These vulnerabilities arise because they ingest textual inputs from various open sources, including real and potentially fake content. We analyse three types of attacks against CTI pipelines, including evasion, flooding, and poisoning, and assess their impact on the system's information selection capabilities. Specifically, on fake text generation, the work demonstrates how adversarial text generation techniques can create fake cybersecurity and cybersecurity-like text that misleads classifiers, degrades performance, and disrupts system functionality. The focus is primarily on the evasion attack, as it precedes and enables flooding and poisoning attacks within the CTI pipeline.

2507.05064 2026-05-25 stat.ML cs.LG stat.ME 版本更新

Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

高斯过程的Vecchia诱导点全尺度近似

Tim Gyger, Reinhard Furrer, Fabio Sigrist

发表机构 * Institute of Financial Services（金融服务研究所）； Lucerne University of Applied Sciences and Arts（卢塞恩应用科学与艺术大学）； University of Zurich（苏黎世大学）； Seminar for Statistics, ETH Zurich（苏黎世联邦理工学院统计系）

AI总结本文提出了一种结合全局诱导点与局部Vecchia近似优势的高斯过程全尺度近似方法——VIF近似，旨在解决高斯过程在大规模数据集上的计算瓶颈。该方法通过基于相关性的邻居查找策略，提高了残差过程的Vecchia近似效率，并利用改进的覆盖树算法实现高效计算。此外，研究还扩展了该框架以处理非高斯似然，引入迭代方法大幅降低了计算成本，并在模拟和真实数据集上验证了其在计算效率、精度和数值稳定性方面的优越性。

AI中文摘要

高斯过程是灵活、概率性的非参数模型，广泛应用于机器学习和统计学。然而，其在大数据集上的可扩展性受计算限制。为克服这些挑战，我们提出Vecchia诱导点全尺度（VIF）近似，结合全局诱导点和局部Vecchia近似的优势。Vecchia近似在低维输入和中等光滑协方差函数设置中表现优异，而诱导点方法更适合高维输入和更光滑的协方差函数。我们的VIF方法通过使用基于相关性的高效邻居搜索策略（通过改进的覆盖树算法实现）对残差过程进行Vecchia近似，从而桥接这两种情况。我们进一步将框架扩展到非高斯似然，引入迭代方法，与基于Cholesky的计算相比，在使用拉普拉斯近似时，训练和预测的计算成本降低了几个数量级。特别是，我们提出并比较了新颖的预条件器，并提供了理论收敛结果。在模拟和真实数据集上的大量数值实验表明，VIF近似不仅计算高效，而且比最先进的替代方法更准确、数值更稳定。所有方法均在开源C++库GPBoost中实现，并配有高级Python和R接口。

英文摘要

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that reduce computational costs for training and prediction by several orders of magnitude compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are computationally efficient and also more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.

2505.21573 2026-05-25 cs.LG cs.AI 版本更新

Spectral-inspired Operator Learning with Limited Data and Unknown Physics

数据有限、物理未知条件下的谱启发算子学习

Han Wan, Rui Zhang, Hao Sun

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China（中国人民大学高瓴人工智能学院）

AI总结本文研究了在数据有限且物理机制未知的情况下学习偏微分方程（PDE）动力学的挑战。为此，提出了一种名为SINO的频谱启发神经算子，它仅需2到5条轨迹即可建模复杂系统，无需显式依赖PDE方程。SINO通过频率索引自动捕捉局部和全局空间导数，结合乘法操作块和低通滤波器处理非线性效应和混叠问题，在多个二维和三维PDE基准测试中表现出优异性能，尤其在少量数据和分布外场景下显著优于现有方法。

Comments To appear in KDD 2026

DOI: 10.1145/3770855.3817831

AI中文摘要

从有限数据和未知物理中学习PDE动力学具有挑战性。现有的神经PDE求解器要么需要大型数据集，要么依赖已知物理（如PDE残差或手工模板），导致适用性有限。为解决这些问题，我们提出光谱启发神经算子（SINO），它仅需2-5条轨迹即可建模复杂系统，无需显式PDE项。具体而言，SINO从频率索引自动捕获局部和全局空间导数，从而在物理无关机制下实现底层微分算子的紧凑表示。为建模非线性效应，它采用Pi块对光谱特征进行乘法运算，并辅以低通滤波器抑制混叠。在2D和3D PDE基准上的大量实验表明，SINO实现了最先进的性能，精度提升1-2个数量级。特别地，仅用5条训练轨迹，SINO就优于在1000条轨迹上训练的数据驱动方法，并在其他方法失败的高难度分布外案例中保持预测能力。

英文摘要

Learning PDE dynamics from limited data with unknown physics is challenging. Existing neural PDE solvers either require large datasets or rely on known physics (e.g., PDE residuals or handcrafted stencils), leading to limited applicability. To address these challenges, we propose Spectral-Inspired Neural Operator (SINO), which can model complex systems from just 2-5 trajectories, without requiring explicit PDE terms. Specifically, SINO automatically captures both local and global spatial derivatives from frequency indices, enabling a compact representation of the underlying differential operators in physics-agnostic regimes. To model nonlinear effects, it employs a Pi-block that performs multiplicative operations on spectral features, complemented by a low-pass filter to suppress aliasing. Extensive experiments on both 2D and 3D PDE benchmarks demonstrate that SINO achieves state-of-the-art performance, with improvements of 1-2 orders of magnitude in accuracy. Particularly, with only 5 training trajectories, SINO outperforms data-driven methods trained on 1000 trajectories and remains predictive on challenging out-of-distribution cases where other methods fail.
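
The statement that spatial derivatives can be read off from frequency indices rests on a basic Fourier identity: for a periodic signal, differentiating in space multiplies the Fourier coefficient at wavenumber $k$ by $ik$. The sketch below shows that identity with a naive DFT. It is not the SINO architecture, which learns its operators; the helper names and the choice to zero the Nyquist mode are assumptions for illustration.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
            for k in range(n)]

def idft(coeffs):
    """Naive inverse DFT."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * k * j / n) for k, c in enumerate(coeffs)) / n
            for j in range(n)]

def spectral_derivative(samples, length=2 * math.pi):
    """First derivative of a periodic signal sampled on a uniform grid."""
    n = len(samples)
    scaled = []
    for k, c in enumerate(dft(samples)):
        freq = k if k < n / 2 else k - n   # signed frequency index
        if n % 2 == 0 and k == n // 2:
            freq = 0                        # drop the ambiguous Nyquist mode
        scaled.append(1j * (2 * math.pi * freq / length) * c)
    return [z.real for z in idft(scaled)]

# d/dx sin(x) = cos(x) on a 16-point periodic grid.
n = 16
grid = [2 * math.pi * j / n for j in range(n)]
deriv = spectral_derivative([math.sin(x) for x in grid])
```

A product of two fields in physical space creates frequencies above the resolved range, which is why a multiplicative block is naturally paired with a low-pass filter to suppress aliasing.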

2505.17354 2026-05-25 cs.LG stat.ML 版本更新

CT-OT Flow: Estimating Continuous-Time Dynamics from Discrete Temporal Snapshots

CT-OT Flow：从离散时间快照估计连续时间动态

Keisuke Kawano, Takuro Kutsuna, Naoki Hayashi, Yasushi Esaki, Hidenori Tanaka

发表机构 * Toyota Central R&D Labs., Inc.（丰田中央研发实验室）

AI总结本文研究如何从离散时间快照中估计连续时间动态，针对如单细胞RNA测序、移动感知等场景中数据仅以时间聚合快照形式存在、时间标签可能噪声或不确定的问题。提出了一种两阶段框架——连续时间最优传输流（CT-OT Flow），通过部分最优传输对齐相邻时间区间以推断高分辨率时间标签，并利用时间核平滑重建连续时间数据分布，从而训练标准的常微分方程或随机微分方程模型。该方法有效处理快照聚合和时间标签不确定性，并通过实用加速策略提升计算效率，在多个合成和真实数据集上表现出更优的分布和轨迹估计性能。

Comments https://github.com/ToyotaCRDL/CT-OT_Flow

DOI: 10.1016/j.neucom.2026.133965

AI中文摘要

在许多现实场景中（例如单细胞RNA测序、移动感知和环境监测），数据仅作为在有限时间窗口内收集的时间聚合快照被观测到，通常带有噪声或不确定的时间戳，并且无法访问连续轨迹。我们研究从这类快照估计连续时间动态的问题。我们提出连续时间最优传输流（CT-OT Flow），这是一个两阶段框架：（i）通过部分最优传输（POT）对齐相邻区间来推断高分辨率时间标签，（ii）通过时间核平滑重建连续时间数据分布，从中采样邻近时间对以训练标准ODE/SDE模型。我们的公式明确考虑了快照聚合和时间标签不确定性，并使用实际加速（筛选和小批量POT），使其适用于大型数据集。在合成基准和两个真实数据集（scRNA-seq和台风轨迹）上，与OT-CFM、[SF]²M、TrajectoryNet、MFM和ENOT相比，CT-OT Flow减少了分布和轨迹误差。

英文摘要

In many real-world settings--e.g., single-cell RNA sequencing, mobility sensing, and environmental monitoring--data are observed only as temporally aggregated snapshots collected over finite time windows, often with noisy or uncertain timestamps, and without access to continuous trajectories. We study the problem of estimating continuous-time dynamics from such snapshots. We present Continuous-Time Optimal Transport Flow (CT-OT Flow), a two-stage framework that (i) infers high-resolution time labels by aligning neighboring intervals via partial optimal transport (POT) and (ii) reconstructs a continuous-time data distribution through temporal kernel smoothing, from which we sample pairs of nearby times to train standard ODE/SDE models. Our formulation explicitly accounts for snapshot aggregation and time-label uncertainty and uses practical accelerations (screening and mini-batch POT), making it applicable to large datasets. Across synthetic benchmarks and two real datasets (scRNA-seq and typhoon tracks), CT-OT Flow reduces distributional and trajectory errors compared with OT-CFM, [SF]$^{2}$M, TrajectoryNet, MFM, and ENOT.
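
The second stage, temporal kernel smoothing, can be pictured as follows: once every sample carries an inferred high-resolution time label, the data distribution at an arbitrary time $t$ is approximated by reweighting samples with a kernel centered at $t$. The sketch below assumes a Gaussian kernel and hypothetical function names; the abstract does not specify the kernel or the sampling scheme, and the POT-based label inference of stage one is not shown.

```python
import math
import random

def kernel_weights(time_labels, query_time, bandwidth):
    """Normalized Gaussian-kernel weights of each sample for a query time."""
    raw = [math.exp(-0.5 * ((t - query_time) / bandwidth) ** 2) for t in time_labels]
    total = sum(raw)  # assumes at least one label lies within a few bandwidths
    return [w / total for w in raw]

def sample_at_time(points, time_labels, query_time, bandwidth, rng, size):
    """Draw `size` points from the kernel-smoothed distribution at `query_time`."""
    weights = kernel_weights(time_labels, query_time, bandwidth)
    return rng.choices(points, weights=weights, k=size)

# Five observations with inferred time labels; query the distribution at t = 0.5.
points = [0.0, 0.1, 0.5, 0.9, 1.0]
labels = [0.0, 0.2, 0.5, 0.8, 1.0]
weights = kernel_weights(labels, 0.5, bandwidth=0.1)
batch = sample_at_time(points, labels, 0.5, 0.1, random.Random(0), size=4)
```

Sampling at two nearby query times gives the pairs of source and target batches that a standard ODE or SDE model can then be trained on.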

2505.03784 2026-05-25 cs.LG 版本更新

Insulin Resistance Prediction From Wearables and Routine Blood Biomarkers

从可穿戴设备和常规血液生物标志物预测胰岛素抵抗

Ahmed A. Metwally, A. Ali Heydari, Daniel McDuff, Alexandru Solot, Zeinab Esmaeilpour, Anthony Z Faranesh, Menglian Zhou, David B. Savage, Conor Heneghan, Shwetak Patel, Cathy Speed, Javier L. Prieto

发表机构 * Google Research（谷歌研究）； Institute of Metabolic Science, University of Cambridge（剑桥大学代谢科学研究所）

AI总结该研究旨在利用可穿戴设备数据和常规血液生物标志物预测胰岛素抵抗，以实现糖尿病的早期干预。研究构建了深度神经网络模型，结合多源数据进行预测，取得了较高的准确率和泛化能力。模型在肥胖和久坐人群中表现尤为突出，并展示了与大型语言模型结合用于解释预测结果的潜力，为个性化健康管理提供了新方法。

DOI: 10.1038/s41586-026-10179-2

AI中文摘要

胰岛素抵抗是2型糖尿病的前兆，其特征是组织中胰岛素作用受损。当前测量胰岛素抵抗的方法虽然有效，但昂贵、难以获取、不广泛可用，并阻碍了早期干预的机会。在这项研究中，我们在美国远程招募了迄今为止最大的数据集来研究胰岛素抵抗（N=1,165名参与者，中位BMI=28 kg/m²，年龄=45岁，HbA1c=5.4%），整合了可穿戴设备时间序列数据和血液生物标志物，包括胰岛素抵抗的金标准测量——稳态模型评估胰岛素抵抗（HOMA-IR）。我们开发了深度神经网络模型，基于易于获取的数字和血液生物标志物预测胰岛素抵抗。结果表明，我们的模型通过结合可穿戴数据和易于获取的血液生物标志物，能够比单独使用任一数据源更好地预测胰岛素抵抗（R²=0.5，auROC=0.80，灵敏度=76%，特异性=84%）。在肥胖和久坐参与者（最易患2型糖尿病且能从早期干预中最大受益的亚群）中，模型显示出93%的灵敏度和95%的调整后特异性。对模型性能的严格评估，包括可解释性和鲁棒性，促进了在更大队列中的泛化能力，这一点通过在独立验证队列（N=72名参与者）上复现预测性能得到证明。此外，我们展示了如何将预测的胰岛素抵抗集成到大语言模型代理中，以帮助理解和情境化HOMA-IR值，促进解释和安全的个性化推荐。这项工作为早期检测2型糖尿病风险人群提供了可能，从而促进预防策略的早期实施。

英文摘要

Insulin resistance, a precursor to type 2 diabetes, is characterized by impaired insulin action in tissues. Current methods for measuring insulin resistance, while effective, are expensive, inaccessible, and not widely available, which hinders opportunities for early intervention. In this study, we remotely recruited the largest dataset to date across the US to study insulin resistance (N=1,165 participants, with median BMI=28 kg/m², age=45 years, HbA1c=5.4%), incorporating wearable device time series data and blood biomarkers, including the ground-truth measure of insulin resistance, homeostatic model assessment for insulin resistance (HOMA-IR). We developed deep neural network models to predict insulin resistance based on readily available digital and blood biomarkers. Our results show that our models can predict insulin resistance by combining both wearable data and readily available blood biomarkers better than either of the two data sources separately (R²=0.5, auROC=0.80, sensitivity=76%, specificity=84%). The model showed 93% sensitivity and 95% adjusted specificity in obese and sedentary participants, a subpopulation most vulnerable to developing type 2 diabetes and who could benefit most from early intervention. Rigorous evaluation of model performance, including interpretability and robustness, facilitates generalizability across larger cohorts, which is demonstrated by reproducing the prediction performance on an independent validation cohort (N=72 participants). Additionally, we demonstrated how the predicted insulin resistance can be integrated into a large language model agent to help understand and contextualize HOMA-IR values, facilitating interpretation and safe personalized recommendations. This work offers the potential for early detection of people at risk of type 2 diabetes, thereby facilitating earlier implementation of preventative strategies.
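
For reference, the HOMA-IR target that the models predict is a simple function of two fasting measurements: fasting glucose (mg/dL) times fasting insulin (µU/mL) divided by 405. The sketch below computes that index together with the sensitivity and specificity metrics quoted above. The function names and the confusion-matrix counts are made up for illustration and are not the study's data.

```python
def homa_ir(fasting_glucose_mg_dl, fasting_insulin_uU_ml):
    """Homeostatic model assessment of insulin resistance (glucose in mg/dL)."""
    return fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0

def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of truly insulin-resistant people flagged
    specificity = tn / (tn + fp)  # share of non-resistant people correctly cleared
    return sensitivity, specificity

index = homa_ir(90, 9)
sens, spec = sensitivity_specificity(tp=76, fn=24, tn=84, fp=16)
```

Because HOMA-IR requires a fasting insulin test that is not part of routine panels, predicting it from wearables and commonly ordered biomarkers is what makes the screening more accessible.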

2504.09846 2026-05-25 cs.LG cs.AI cs.HC 版本更新

GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals

GlyTwin: 通过以患者为中心的反事实实现1型糖尿病血糖控制的最佳行为修改的数字孪生

Asiful Arefeen, Saman Khamesian, Maria Adela Grando, Bithika Thompson, Hassan Ghasemzadeh

发表机构 * College of Health Solutions, Arizona State University（亚利桑那州立大学健康解决方案学院）； School of Computing and Augmented Intelligence, Arizona State University（亚利桑那州立大学计算与增强智能学院）； Department of Endocrinology, Mayo Clinic Arizona（梅奥诊所亚利桑那分部内分泌科）

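AI summary: This study proposes GlyTwin, a digital twin framework that improves glucose control in people with type 1 diabetes through behavioral optimization. Its core method uses counterfactual explanations to simulate optimal behavioral interventions, such as adjusting carbohydrate intake and insulin dosing, in order to reduce hyperglycemic events. It also incorporates stakeholder preferences so that the interventions are more personalized and practical. Experiments show that GlyTwin outperforms existing methods at generating valid counterfactual explanations and at preventing hyperglycemia.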

详情

AI中文摘要

频繁和长期暴露于高血糖会增加慢性并发症的风险，包括神经病变、肾病和心血管疾病。现有的连续皮下胰岛素输注（CSII）和连续血糖监测（CGM）技术仅模拟血糖调节的特定方面，例如预测低血糖和给予小剂量胰岛素推注。同样，当前糖尿病管理中的数字孪生方法主要侧重于预测血糖对人类行为和胰岛素治疗的反应。因此，这些技术缺乏提供替代治疗方案的能力，而这些方案可以指导主动行为干预以实现最佳糖尿病管理。为填补这一空白，我们提出GlyTwin，一种新颖的计算框架，通过整合反事实解释来增强数字孪生技术，以模拟血糖控制的最佳行为治疗。GlyTwin通过推荐行为选择（如碳水化合物摄入和胰岛素剂量）的调整来生成反事实治疗，以显著减少高血糖事件的发生和持续时间。此外，GlyTwin将利益相关者的偏好纳入其干预生成过程，确保工具个性化和以用户为中心。我们在AZT1D上评估GlyTwin，该数据集是通过收集50名使用自动胰岛素输送（AID）系统的1型糖尿病（T1D）患者的纵向数据构建的，每人监测26天。结果表明，与历史数据相比，GlyTwin在生成反事实解释方面优于现有方法，有效解释率为85.8%，预防高血糖的有效性为87.3%。

英文摘要

Frequent and long-term exposure to hyperglycemia increases the risk of chronic complications, including neuropathy, nephropathy, and cardiovascular disease. Existing continuous subcutaneous insulin infusion (CSII) and continuous glucose monitoring (CGM) technologies model only specific aspects of glycemic regulation, such as predicting hypoglycemia and administering small insulin boluses. Similarly, current digital twin approaches in diabetes management primarily focus on predicting glucose responses to human behavior and insulin therapy. As a result, these technologies lack the ability to provide alternative treatment scenarios that could guide proactive behavioral interventions for optimal diabetes management. To address this gap, we propose GlyTwin, a novel computational framework that enhances digital twin technologies by integrating counterfactual explanations to simulate optimal behavioral treatments for glucose control. GlyTwin generates counterfactual treatments by recommending adjustments to behavioral choices, such as carbohydrate intake and insulin dosing, to significantly reduce the occurrence and duration of hyperglycemic events. In addition, GlyTwin incorporates stakeholder preferences into its intervention-generation process, ensuring that the tool is personalized and user-centric. We evaluate GlyTwin on AZT1D, a new dataset constructed by collecting longitudinal data from 50 individuals living with type 1 diabetes (T1D) on automated insulin delivery (AID) systems, each monitored for 26 days. Results show that GlyTwin outperforms state-of-the-art methods for generating counterfactual explanations, with 85.8% valid explanations and 87.3% effectiveness in preventing hyperglycemia compared with historical data.


2503.04929 2026-05-25 cs.RO cs.LG cs.SY eess.SY 版本更新

Neural Configuration-Space Barriers for Manipulation Planning and Control

用于操作规划与控制的神经构型空间障碍

Kehan Long, Ki Myung Brian Lee, Nikola Raicevic, Niyas Attasseri, Melvin Leok, Nikolay Atanasov

发表机构 * Contextual Robotics Institute, University of California San Diego（情境机器人研究所，加州大学圣地亚哥分校）

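AI summary: This paper studies how to plan and control the motion of high-dimensional manipulators efficiently and safely in cluttered dynamic environments. The authors propose a unified approach based on neural configuration-space distance functions (CDFs) that formulates safety constraints as CDF barriers, reducing the number of collision checks during motion planning. To handle uncertainty from modeling errors and sensor noise, they also propose a distributionally robust CDF barrier control formulation that does not assume a known noise distribution. Experiments show that the method achieves efficient and safe manipulator control while relying only on onboard point-cloud observations.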

详情

AI中文摘要

在杂乱动态环境中，高维机器人操作器的规划与控制需要计算效率和鲁棒的安全保证。受近期学习构型空间距离函数（CDF）作为机器人身体表示的研究启发，我们提出了一种统一的运动规划与控制方法，将安全约束公式化为CDF障碍。CDF障碍近似局部自由构型空间，显著减少了运动规划中的碰撞检测操作次数。然而，使用神经网络学习CDF障碍并依赖在线传感器观测会引入不确定性，这些必须在控制综合中加以考虑。为此，我们开发了一种分布鲁棒的CDF障碍控制公式，该公式在不假设已知底层分布的情况下，考虑了建模误差和传感器噪声。在UFactory xArm6操作器上的仿真和硬件实验表明，我们的神经CDF障碍公式能够在杂乱动态环境中实现高效规划和鲁棒安全控制，仅依赖机载点云观测。

英文摘要

Planning and control for high-dimensional robot manipulators in cluttered dynamic environments require computational efficiency and robust safety guarantees. Inspired by recent advances in learning configuration-space distance functions (CDFs) as representations of robot bodies, we propose a unified approach for motion planning and control that formulates safety constraints as CDF barriers. A CDF barrier approximates the local free configuration space, substantially reducing the number of collision-checking operations during motion planning. However, learning a CDF barrier with a neural network and relying on online sensor observations introduces uncertainties that must be considered during control synthesis. To address this, we develop a distributionally robust CDF barrier formulation for control that accounts for modeling errors and sensor noise without assuming a known underlying distribution. Simulations and hardware experiments on a UFactory xArm6 manipulator show that our neural CDF barrier formulation enables efficient planning and robust safe control in cluttered and dynamic environments, relying only on onboard point-cloud observations.
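
To make the idea of a distance-based barrier concrete, here is a minimal sketch of a barrier safety filter. It is not the paper's formulation: it assumes simple velocity control (the joint velocity is the input), replaces the learned neural CDF with a hypothetical analytic stand-in `toy_cdf`, and omits the distributionally robust treatment of uncertainty entirely. With barrier value h(q) and gradient ∇h(q), the filter applies the smallest change to a nominal command that enforces ∇h(q)·u ≥ −α·h(q).

```python
import math


def toy_cdf(q, q_obs, radius):
    """Hypothetical stand-in for a learned CDF: distance from q to a ball
    of unsafe configurations centred at q_obs. Assumes q != q_obs."""
    diff = [a - b for a, b in zip(q, q_obs)]
    norm = math.sqrt(sum(d * d for d in diff))
    grad = [d / norm for d in diff]  # unit gradient pointing away from the ball
    return norm - radius, grad


def safety_filter(u_nom, h, grad_h, alpha):
    """Minimum-norm correction of u_nom so that grad_h . u >= -alpha * h."""
    slack = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        return list(u_nom)  # nominal command already satisfies the barrier condition
    g2 = sum(g * g for g in grad_h)
    return [u - slack / g2 * g for u, g in zip(u_nom, grad_h)]


# Illustrative numbers only.
h, grad_h = toy_cdf([1.0, 0.0], [0.0, 0.0], 0.5)
u_safe = safety_filter([-2.0, 1.0], h, grad_h, alpha=1.0)
```

In this example the nominal command moves toward the unsafe ball too quickly, so the filter scales back only the component along the barrier gradient and leaves the tangential component unchanged.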


2502.17119 2026-05-25 cs.LG cs.AI 版本更新

Diffusion and Flow Matching Models for Tabular Data: A Survey

表格数据的扩散与流匹配模型：综述

Zhong Li, Qi Huang, Lincen Yang, Jiayang Shi, Zhao Yang, Niki van Stein, Thomas Bäck, Matthijs van Leeuwen

发表机构 * Great Bay University（大湾大学）； Vrije Universiteit Amsterdam（阿姆斯特丹自由大学）； LIACS, Leiden University（莱顿大学LIACS）

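AI summary: This survey reviews diffusion and flow matching models for tabular data generation and discusses how these models handle challenges such as mixed numerical and categorical attributes, missing values, sensitive fields, and complex dependencies. It covers work from 2015 to 2026, organized around data engineering challenges, tasks, design choices, and evaluation dimensions, and identifies open problems in scalability, feature-dependency modeling, privacy, fairness, and constraint-aware generation.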

Comments We substantially updated the previous version "Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions" by including flow matching models for tabular data

详情

AI中文摘要

深度生成模型在图像、文本、音频和视频生成方面取得了快速进展，并越来越多地应用于结构化记录。然而，对于表格数据，生成建模仍然困难：数据集可能包含数值和分类属性、缺失值、敏感字段、不平衡类别、复杂的特征依赖和领域约束。早期基于GAN或VAE的表格数据建模方法取得了有用结果，但可能面临训练不稳定、模式崩溃、多模态分布建模能力弱以及混合类型特征处理脆弱等问题。因此，扩散模型因其噪声-去噪公式提供了灵活稳定的方式来建模复杂数据分布而受到越来越多的关注，并已被应用于表格合成、缺失值填补、可信数据生成和异常检测。流匹配通过学习沿概率路径的传输向量场提供了一条密切相关的途径，通常对路径设计和采样效率有更直接的控制。尽管取得了进展，但针对表格数据的扩散和流匹配模型文献仍然难以比较，因为方法针对不同任务，依赖于不同的表示、目标、评估协议和领域假设。据我们所知，这是第一篇专门针对表格数据的扩散和流匹配模型的综述。我们回顾了2015年6月至2026年5月的工作，围绕数据工程挑战、任务、设计选择和评估维度进行组织，并讨论了可扩展性、特征依赖建模、隐私、公平性、基准测试和约束感知生成中的开放问题。我们在GitHub仓库中保持更新。

XAttnMark：基于交叉注意力的鲁棒音频水印学习

Yixin Liu, Lie Lu, Jihui Jin, Lichao Sun, Andrea Fanelli

发表机构 * Department of Computer Science, Lehigh University, Bethlehem, PA, USA（理海大学计算机科学系）； Dolby Laboratories Inc., San Francisco, CA, USA（杜比实验室公司）

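AI summary: With the rapid development of generative audio synthesis and editing, concerns about copyright protection, data provenance, and the spread of deepfake audio have grown. This paper proposes XAttnMark, a robust audio watermarking method based on cross-attention. It jointly optimizes watermark detection and attribution through partial parameter sharing between the generator and the detector, an efficient cross-attention message retrieval mechanism, and a temporal conditioning module. It also introduces a psychoacoustic-aligned time-frequency masking loss that improves watermark imperceptibility. Experiments show strong robustness under a wide range of audio transformations.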

Comments Accepted at ICML'25

详情

AI中文摘要

生成式音频合成与编辑技术的快速普及引发了关于版权侵权、数据溯源以及通过深度伪造音频传播虚假信息的严重担忧。水印技术通过将不可感知但可识别和可追踪的信号嵌入音频内容，提供了一种主动解决方案。尽管最近基于神经网络的水印方法（如WavMark和AudioSeal）在鲁棒性和质量上有所改进，但它们难以同时优化鲁棒检测和准确归因。本文介绍了交叉注意力鲁棒音频水印（XATTNMARK），通过利用生成器和检测器之间的部分参数共享、用于高效消息检索的交叉注意力机制以及用于改善消息分布的时间条件模块，弥合了这一差距。此外，我们提出了一种心理声学对齐的时频（TF）掩蔽损失，捕捉细粒度的听觉掩蔽效应，提高了水印的不可感知性。XATTNMARK在检测和归因方面均达到了最先进的性能，展示了针对各种音频变换（包括不同强度的具有挑战性的生成式编辑）的卓越鲁棒性。这项工作推进了音频水印技术，用于在生成式AI时代保护知识产权并确保真实性。

英文摘要

The rapid proliferation of generative audio synthesis and editing technologies has raised serious concerns about copyright infringement, data provenance, and the spread of misinformation via deepfake audio. Watermarking offers a proactive solution by embedding imperceptible yet identifiable and traceable signals into audio content. While recent neural network-based watermarking methods like WavMark and AudioSeal have improved robustness and quality, they struggle to jointly optimize both robust detection and accurate attribution. This paper introduces Cross-Attention Robust Audio Watermark (XATTNMARK), which bridges this gap by leveraging partial parameter sharing between the generator and the detector, a cross-attention mechanism for efficient message retrieval, and a temporal conditioning module for improved message distribution. Additionally, we propose a psychoacoustic-aligned time-frequency (TF) masking loss that captures fine-grained auditory masking effects, improving watermark imperceptibility. XATTNMARK achieves state-of-the-art performance in both detection and attribution, demonstrating superior robustness against a wide range of audio transformations, including challenging generative editing at varying strengths. This work advances audio watermarking for protecting intellectual property and ensuring authenticity in the era of generative AI.


2411.12173 2026-05-25 cs.LG cs.AI 版本更新

SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

SkillTree: 面向长时域控制任务的可解释基于技能的深度强化学习

Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu

发表机构 * Faculty of Computing, Harbin Institute of Technology（哈尔滨工业大学计算机学院）； National Key Laboratory of Novel Software Technology, Nanjing University（南京大学新型软件技术国家实验室）； School of Artificial Intelligence, Nanjing University（南京大学人工智能学院）； Polixir Technologies ； SenseTime Research（商汤科技研究院）

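AI summary: This paper proposes SkillTree, an explainable skill-based deep reinforcement learning framework for long-horizon control tasks with complex continuous action spaces. The method reduces the continuous action space to a discrete skill space and uses a differentiable decision tree in the high-level policy to generate skill embeddings, which guide the low-level policy in executing skills and yield skill-level explainability. Experiments show that SkillTree performs comparably to neural-network-based skill methods on complex robotic arm control tasks while making the decision process more transparent.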

详情

DOI: 10.1609/aaai.v39i20.35451

AI中文摘要

深度强化学习（DRL）在各个研究领域取得了显著成功。然而，其对神经网络的依赖导致缺乏透明度，限制了实际应用。为了实现可解释性，决策树已成为神经网络的一种流行且有前景的替代方案。然而，由于其表达能力有限，传统决策树难以处理高维长时域连续控制任务。在本文中，我们提出了SkillTree，一种新颖的框架，将复杂的连续动作空间缩减为离散的技能空间。我们的层次化方法在高层次策略中集成了可微决策树以生成技能嵌入，进而指导低层次策略执行技能。通过使技能决策可解释，我们实现了技能级可解释性，增强了对复杂任务中决策过程的理解。实验结果表明，我们的方法在复杂机器人臂控制领域中达到了与基于技能的神经网络相当的性能。此外，SkillTree在技能级别提供解释，从而提高了决策过程的透明度。

英文摘要

Deep reinforcement learning (DRL) has achieved remarkable success in various research domains. However, its reliance on neural networks results in a lack of transparency, which limits its practical applications. To achieve explainability, decision trees have emerged as a popular and promising alternative to neural networks. Nonetheless, due to their limited expressiveness, traditional decision trees struggle with high-dimensional long-horizon continuous control tasks. In this paper, we propose SkillTree, a novel framework that reduces complex continuous action spaces into discrete skill spaces. Our hierarchical approach integrates a differentiable decision tree within the high-level policy to generate skill embeddings, which subsequently guide the low-level policy in executing skills. By making skill decisions explainable, we achieve skill-level explainability, enhancing the understanding of the decision-making process in complex tasks. Experimental results demonstrate that our method achieves performance comparable to skill-based neural networks in complex robotic arm control domains. Furthermore, SkillTree offers explanations at the skill level, thereby increasing the transparency of the decision-making process.


2411.08126 2026-05-25 stat.ML cs.LG 版本更新

A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing

双城记：离线动态定价中的悲观主义与机会主义

Zeyu Bian, Lan Wang, Zhengling Qi

发表机构 * Department of Statistics, Florida State University（佛罗里达州立大学统计系）； Department of Management Science, University of Miami（迈阿密大学管理科学系）； Department of Decision Sciences, The George Washington University（乔治华盛顿大学决策科学系）

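AI summary: This paper studies offline dynamic pricing when historical data do not cover the full price range, including the realistic case in which the optimal price is never observed. The authors propose a nonparametric partial identification framework that uses the monotonicity of demand in price to bound the value of unobserved prices, and design two dynamic pricing policies: a pessimistic policy that maximizes worst-case revenue and an opportunistic policy that minimizes worst-case regret. The methods outperform standard offline reinforcement learning baselines in no-coverage settings and give firms practical guidance for choosing a pricing policy according to their risk attitude.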

详情

AI中文摘要

我们研究离线动态定价，当历史数据对价格空间的覆盖不完整时，一些候选价格（包括最优价格）可能完全未被观测到。这种设置在现实中很常见，在动态环境中尤其困难。现有的离线强化学习方法通常依赖于完全或部分覆盖，因此在这种设置下表现不佳。我们开发了一个用于离线动态定价的非参数部分识别框架，利用需求在价格上的单调性来界定未观测价格的价值。在该框架内，我们制定了两种动态决策规则：一种最大化最坏情况收入的悲观策略，和一种最小化最坏情况遗憾的机会策略。这些规则针对顺序无覆盖环境量身定制，并非现有悲观离线强化学习或静态机会主义方法的直接扩展。我们为两种策略建立了有限样本遗憾界，当最优价格被覆盖时恢复了标准速率，并量化了未覆盖时的额外成本。我们还开发了高效算法，并通过模拟和机票应用表明，我们的方法在无覆盖设置中优于标准离线强化学习基线。从管理角度看，该框架提供了从公司风险态度到定价策略的实用映射：寻求收入稳定和下行保护的公司应偏好悲观策略，而愿意承担适度风险以从未充分探索的价格中获取潜在收益的公司应偏好机会策略。

英文摘要

We study offline dynamic pricing when historical data provide incomplete coverage of the price space such that some candidate prices, including the optimal one, may be entirely unobserved. This setting is common in practice and is especially difficult in dynamic environments. Existing offline reinforcement learning methods typically rely on full or partial coverage and can therefore perform poorly in such settings. We develop a nonparametric partial identification framework for offline dynamic pricing that exploits the monotonicity of demand in price to bound the value of unobserved prices. Within this framework, we formulate two dynamic decision rules: a pessimistic policy that maximizes worst-case revenue and an opportunistic policy that minimizes worst-case regret. These rules are tailored to a sequential no-coverage environment and are not direct extensions of existing pessimistic offline RL or static opportunistic approaches. We establish finite-sample regret bounds for both policies, recovering the standard rate when the optimal price is covered and quantifying the additional cost when it is not. We also develop efficient algorithms and show, through simulations and an airline ticket application, that our methods outperform standard offline RL baselines in no-coverage settings. Managerially, the framework provides a practical mapping from a firm's risk posture to its pricing policy: firms seeking revenue stability and downside protection should prefer the pessimistic policy, whereas firms willing to bear measured risk for potential gains from underexplored prices should prefer the opportunistic policy.
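
The monotonicity argument can be illustrated with a deliberately simplified, single-period sketch; the paper's setting is sequential and its estimators are not reproduced here. Assuming demand is non-increasing in price, the demand at an unobserved price lies between the demand at the nearest observed higher price and the demand at the nearest observed lower price. The names `demand_bounds` and `choose_prices`, the `d_max` market-size cap, and the regret surrogate (best upper revenue bound of any other price minus the lower revenue bound of the chosen price) are all assumptions made for illustration.

```python
def demand_bounds(p, observed, d_max):
    """Bounds on demand at price p from monotone (non-increasing) demand.

    observed maps price -> estimated demand; d_max is an assumed cap used
    when no observed price lies at or below p.
    """
    at_or_below = [d for price, d in observed.items() if price <= p]
    at_or_above = [d for price, d in observed.items() if price >= p]
    upper = min(at_or_below) if at_or_below else d_max
    lower = max(at_or_above) if at_or_above else 0.0
    return lower, upper


def choose_prices(candidates, observed, d_max):
    """Return (pessimistic price, opportunistic price) under the bounds above."""
    rev = {p: tuple(p * d for d in demand_bounds(p, observed, d_max))
           for p in candidates}

    def worst_regret(p):
        # Simplified surrogate for worst-case regret; ignores joint constraints.
        best_other = max((rev[q][1] for q in candidates if q != p), default=0.0)
        return max(best_other - rev[p][0], 0.0)

    pessimistic = max(candidates, key=lambda p: rev[p][0])
    opportunistic = min(candidates, key=worst_regret)
    return pessimistic, opportunistic


# Illustrative data: prices 10 and 20 were observed, price 15 never was.
picks = choose_prices([10, 15, 20], {10: 100.0, 20: 55.0}, d_max=120.0)
```

With these numbers the pessimistic rule stays at an observed price with the best guaranteed revenue, while the opportunistic rule selects the unobserved price 15 because its worst-case shortfall relative to the alternatives is smallest.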


2411.01088 2026-05-25 cs.LG math.OC 版本更新

CRONOS: Enhancing Deep Learning with Scalable GPU Accelerated Convex Neural Networks

CRONOS: 利用可扩展的GPU加速凸神经网络增强深度学习

Miria Feng, Zachary Frangella, Mert Pilanci

发表机构 * Stanford University（斯坦福大学）

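AI summary: This paper proposes CRONOS, an algorithm for convex optimization of two-layer neural networks that, according to the authors, is the first to scale to high-dimensional datasets such as ImageNet, whereas prior work was restricted to downsampled versions of MNIST and CIFAR-10. Building on CRONOS, the authors develop CRONOS-AM, which adds alternating minimization in order to train multi-layer networks with arbitrary architectures. Theoretical analysis shows that CRONOS converges to the global minimum of the convex reformulation under mild assumptions, and experiments show that CRONOS-AM achieves validation accuracy comparable to or better than commonly used tuned deep learning optimizers on vision and language tasks.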

详情

Journal ref: Advances in Neural Information Processing Systems 37 (NeurIPS 2024)

AI中文摘要

我们提出了用于两层神经网络凸优化的CRONOS算法。CRONOS是首个能够扩展到高维数据集（如现代深度学习中普遍存在的ImageNet）的算法。这显著改进了先前的工作，这些工作仅限于MNIST和CIFAR-10的下采样版本。以CRONOS为基础，我们进一步开发了一种名为CRONOS-AM的新算法，它将CRONOS与交替最小化相结合，以获得能够训练任意架构多层网络的算法。我们的理论分析证明，在温和假设下，CRONOS收敛到凸重述的全局最小值。此外，我们通过使用JAX进行GPU加速的大规模数值实验，验证了CRONOS和CRONOS-AM的有效性。我们的结果表明，在视觉和语言任务中，使用ImageNet和IMDb等基准数据集，CRONOS-AM可以获得与主流调优深度学习优化器相当或更好的验证精度。据我们所知，CRONOS是首个利用凸重述来增强大规模学习任务性能的算法。

英文摘要

We introduce the CRONOS algorithm for convex optimization of two-layer neural networks. CRONOS is the first algorithm capable of scaling to high-dimensional datasets such as ImageNet, which are ubiquitous in modern deep learning. This significantly improves upon prior work, which has been restricted to downsampled versions of MNIST and CIFAR-10. Taking CRONOS as a primitive, we then develop a new algorithm called CRONOS-AM, which combines CRONOS with alternating minimization, to obtain an algorithm capable of training multi-layer networks with arbitrary architectures. Our theoretical analysis proves that CRONOS converges to the global minimum of the convex reformulation under mild assumptions. In addition, we validate the efficacy of CRONOS and CRONOS-AM through extensive large-scale numerical experiments with GPU acceleration in JAX. Our results show that CRONOS-AM can obtain comparable or better validation accuracy than predominant tuned deep learning optimizers on vision and language tasks with benchmark datasets such as ImageNet and IMDb. To the best of our knowledge, CRONOS is the first algorithm which utilizes the convex reformulation to enhance performance on large-scale learning tasks.


2410.19842 2026-05-25 eess.SP cs.LG 版本更新

A comprehensive evaluation of pretraining strategies for channel-agnostic contrastive self-supervision of biosignals

生物信号通道无关对比自监督预训练策略的综合评估

Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm

发表机构 * Department of Applied Mathematics and Computer Science（应用数学和计算机科学系）

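AI summary: This study investigates strategies for creating positive pairs in channel-agnostic self-supervised learning of biosignals, addressing the difficulty of designing augmentations for multivariate time series and the limited usefulness of models tied to a fixed channel configuration. It proposes contrastive random lead coding (CRLC), which generates positive pairs from random subsets of the input channels, and validates it on EEG and ECG data. In the channel-agnostic setting CRLC outperforms competing strategies, and on EEG tasks it surpasses the current state-of-the-art reference model.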

详情

AI中文摘要

对比学习在计算机视觉的自监督中取得了令人印象深刻的结果。该方法依赖于正对的创建，这通常通过数据增强来实现。然而，对于多变量时间序列，有效的增强可能难以设计。此外，生物信号数据集的输入通道数通常因应用而异，限制了使用特定通道配置训练的大型自监督模型的实用性。受这些挑战的驱动，我们着手研究用于生物信号通道无关自监督的正对创建策略。我们引入了对比随机导联编码（CRLC），其中使用输入通道的随机子集来创建正对，并与使用增强和时间上相邻片段作为正对的方法进行比较。我们通过在EEG和ECG数据上预训练模型，然后针对下游任务进行微调来验证我们的方法。在通道无关设置中，CRLC在两种场景下均优于竞争策略。值得注意的是，对于EEG任务，CRLC超越了当前最先进的参考模型。而在ECG任务中，尽管最先进的参考模型更优，但结合CRLC使我们能够获得可比较的结果。总之，CRLC有助于在训练我们的通道无关模型时，跨不同通道设置进行泛化。代码可在https://github.com/theabrusch/Multiview_TS_SSL获取。

英文摘要

Contrastive learning yields impressive results for self-supervision in computer vision. The approach relies on the creation of positive pairs, which is often achieved through augmentations. However, for multivariate time series, effective augmentations can be difficult to design. Additionally, the number of input channels for biosignal datasets often varies from application to application, limiting the usefulness of large self-supervised models trained with specific channel configurations. Motivated by these challenges, we set out to investigate strategies for creation of positive pairs for channel-agnostic self-supervision of biosignals. We introduce contrastive random lead coding (CRLC), where random subsets of the input channels are used to create positive pairs, and compare it with using augmentations and neighboring segments in time as positive pairs. We validate our approach by pre-training models on EEG and ECG data, and then fine-tuning them for downstream tasks. CRLC outperforms competing strategies in both scenarios in the channel-agnostic setting. Notably, for EEG tasks CRLC surpasses the current state-of-the-art reference model. While the state-of-the-art reference model is superior in the ECG task, incorporating CRLC allows us to obtain comparable results. In conclusion, CRLC helps generalization across variable channel setups when training our channel-agnostic model. The code is available at https://github.com/theabrusch/Multiview_TS_SSL.
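
A minimal sketch of how a positive pair might be formed from random channel subsets is shown below. It is an illustration rather than the authors' implementation (see their repository for that): the function name `random_lead_views` is hypothetical, and the choice to make the two subsets disjoint is an assumption, since the abstract only says that random subsets of the input channels are used.

```python
import random


def random_lead_views(segment, rng):
    """Split the channels of one segment into two disjoint random views.

    segment is a list of channels, each a list of samples; it must contain
    at least two channels so that both views are non-empty.
    """
    idx = list(range(len(segment)))
    rng.shuffle(idx)
    cut = rng.randint(1, len(idx) - 1)  # each view keeps at least one channel
    view_a = [segment[i] for i in sorted(idx[:cut])]
    view_b = [segment[i] for i in sorted(idx[cut:])]
    return view_a, view_b


# Toy 4-channel segment with two samples per channel (illustrative values).
segment = [[0.1, 0.2], [1.1, 1.2], [2.1, 2.2], [3.1, 3.2]]
view_a, view_b = random_lead_views(segment, random.Random(0))
```

The two views come from the same time window, so an encoder that accepts a variable number of channels can treat them as a positive pair in a contrastive loss, with views from other segments serving as negatives.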


2409.08036 2026-05-25 cs.LG 版本更新

Heterogeneous Sheaf Neural Networks

异质层丛神经网络

Luke Braithwaite, Alessio Borgi, Gabriele Onorato, Kristjan Tarantelli, Francesco Restuccia, Fabrizio Silvestri, Pietro Liò

发表机构 * Department of Computer Science and Technology, University of Cambridge（计算机科学与技术系，剑桥大学）； Department of Electrical and Computer Engineering, Northeastern University（电气与计算机工程系，东北大学）； Department of Computer, Control and Management Engineering, Sapienza University of Rome（计算机、控制与管理工程系，罗马萨皮恩扎大学）

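AI summary: This study proposes HetSheaf, a framework for learning on heterogeneous graphs whose nodes and edges have different types and feature spaces. Rather than handling heterogeneity through increasingly specialized architectures, HetSheaf represents it directly in the data structure through cellular sheaves and learns restriction maps conditioned on node and edge types. It also introduces the SheafPool readout for robust graph-level prediction. Across several benchmarks it outperforms a range of existing methods while using substantially fewer parameters.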

Comments 48 pages, 2 figures

详情

AI中文摘要

异质图的节点和边可以属于不同类型和特征空间，出现在许多真实世界领域，包括生物学、推荐系统、社交网络和计算机系统。现有的异质图神经网络通常在架构层面通过关系特定模块、元路径机制或类型感知注意力来处理这种异质性，这往往导致越来越专门化的参数密集型设计。在这项工作中，我们提出了HetSheaf，一个通过细胞层丛学习异质图的框架。HetSheaf不是仅在架构中编码异质性，而是通过分配类型感知的局部特征空间和学习基于节点特征、节点类型和边类型的限制映射，直接在底层数据结构中表示异质性。为了支持图级预测，我们进一步引入了SheafPool，一种通用的茎空间读出方法，它聚合节点表示同时对局部基的变化保持不变，从而使层丛网络的图分类得到良好定义，并且F1分数比平均池化高出高达42个百分点。在多样化的基准测试套件（节点分类、链接预测和图分类）中，HetSheaf在异质图基准（HGB）框架上，针对同质（GCN、GAT、GIN、GraphSAGE）、异质（R-GCN、HAT、HGT）和类型无关的层丛基线，一致地实现了高达2个百分点的性能提升（节点分类上高达94.97%的Macro F1分数，链接预测上高达99.62%），同时参数数量减少了高达10倍。

英文摘要

Heterogeneous graphs, whose nodes and edges can belong to different types and feature spaces, arise in many real-world domains, including biology, recommendation, social networks, and computer systems. Existing heterogeneous graph neural networks typically handle this heterogeneity at the architectural level through relation-specific modules, meta-path machinery or type-aware attention, which often leads to increasingly specialised parameter-heavy designs. In this work, we propose HetSheaf, a framework for learning heterogeneous graphs through cellular sheaves. Instead of encoding heterogeneity solely in the architecture, HetSheaf represents it directly in the underlying data structure by assigning type-aware local feature spaces and learning restriction maps conditioned on node features, node types, and edge types. To support graph-level prediction, we further introduce SheafPool, a universal stalk-space readout that aggregates node representations while being invariant to local changes of basis, thereby making graph classification with sheaf networks well-defined and achieving an F1 Score up to 42 percentage points higher than mean pooling. Across a diverse suite of benchmarks (node classification, link prediction and graph classification), HetSheaf consistently achieves up to 2 percentage points higher performance (up to 94.97% Macro F1 Score on node classification and up to 99.62% on link prediction) on the Heterogeneous Graph Benchmark (HGB) framework against homogeneous (GCN, GAT, GIN, GraphSAGE), heterogeneous (R-GCN, HAT, HGT) and type-agnostic sheaf baselines, while reducing the number of parameters by up to 10×.


2408.03085 2026-05-25 quant-ph cs.LG 版本更新

Moonwalk: 逆-前向微分

Dmitrii Krylov, Armin Karamzade, Roy Fox

发表机构 * University of California, Irvine（加州大学尔湾分校）

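AI summary: Moonwalk addresses backpropagation's need to store intermediate activations and proposes a way to compute gradients without storing them. The method introduces the vector-inverse-Jacobian product (vijp) operator and combines submersive networks with fragmental gradient checkpointing to reconstruct gradients exactly in a forward sweep, allowing deeper networks without additional memory. Experiments show that Moonwalk matches backpropagation's runtime while training networks more than twice as deep under the same memory budget.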

详情

Journal ref: The 29th International Conference on Artificial Intelligence and Statistics, 2026

AI中文摘要

反向传播的主要限制是它需要在正向传播过程中存储中间激活值（残差），这限制了可训练网络的深度。这引出了一个基本问题：我们能否避免存储这些激活值？我们通过重新审视梯度计算的结构来解决这个问题。反向传播通过一系列向量-雅可比乘积计算梯度，这一操作通常是不可逆的。丢失的信息位于每层雅可比矩阵的余核中。我们定义了浸没式网络——其层雅可比矩阵具有平凡余核的网络——在这种网络中，梯度可以在前向扫描中精确重建，而无需存储激活值。对于非浸没式层，我们引入了碎片梯度检查点，仅记录恢复被雅可比矩阵擦除的余切向量所需的最小残差子集。我们方法的核心是一种新的算子，即向量-逆-雅可比乘积（vijp），它反转了余核外的梯度流。我们的混合模式算法首先通过内存高效的反向传播计算输入梯度，然后使用vijp在前向扫描中重建参数梯度，从而消除了存储激活值的需要。我们在Moonwalk中实现了该方法，并表明它在相同内存预算下训练深度超过两倍的网络时，运行时间与反向传播相当。

英文摘要

Backpropagation's main limitation is its need to store intermediate activations (residuals) during the forward pass, which restricts the depth of trainable networks. This raises a fundamental question: can we avoid storing these activations? We address this by revisiting the structure of gradient computation. Backpropagation computes gradients through a sequence of vector-Jacobian products, an operation that is generally irreversible. The lost information lies in the cokernel of each layer's Jacobian. We define submersive networks -- networks whose layer Jacobians have trivial cokernels -- in which gradients can be reconstructed exactly in a forward sweep without storing activations. For non-submersive layers, we introduce fragmental gradient checkpointing, which records only the minimal subset of residuals necessary to restore the cotangents erased by the Jacobian. Central to our approach is a novel operator, the vector-inverse-Jacobian product (vijp), which inverts gradient flow outside the cokernel. Our mixed-mode algorithm first computes input gradients with a memory-efficient reverse pass, then reconstructs parameter gradients in a forward sweep using the vijp, eliminating the need to store activations. We implement this method in Moonwalk and show that it matches backpropagation's runtime while training networks more than twice as deep under the same memory budget.
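
The role of the trivial cokernel can be seen on a single linear layer y = W x. The sketch below is an illustration under stated assumptions, not the Moonwalk implementation: it handles only a weight matrix W with exactly two rows and full row rank, and the helper names are hypothetical. The vector-Jacobian product maps the output cotangent to the input cotangent, g_in = Wᵀ g_out; when W has full row rank this map is injective, so g_out can be recovered as (W Wᵀ)⁻¹ W g_in.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def vjp(W, g_out):
    """Reverse-mode step for y = W x: input cotangent from output cotangent."""
    return matvec(transpose(W), g_out)


def vijp(W, g_in):
    """Recover the output cotangent from the input cotangent.

    Solves (W W^T) g_out = W g_in; written for a 2-row W with full row rank.
    """
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in W] for r1 in W]
    rhs = matvec(W, g_in)
    det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
    return [(gram[1][1] * rhs[0] - gram[0][1] * rhs[1]) / det,
            (gram[0][0] * rhs[1] - gram[1][0] * rhs[0]) / det]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # maps 3 inputs to 2 outputs
g_in = vjp(W, [3.0, -1.0])               # what backpropagation would compute
g_out = vijp(W, g_in)                    # forward-direction reconstruction
```

If W had linearly dependent rows, the Gram matrix would be singular and part of the cotangent would be unrecoverable, which is the case the abstract handles with fragmental gradient checkpointing.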


2103.14995 2026-05-25 cs.LG cs.AI eess.SP 版本更新

Thermal transmittance prediction based on the application of artificial neural networks on heat flux method results

基于人工神经网络在热流法结果上的热透射率预测

Sanjin Gumbarević, Bojan Milovanović, Mergim Gaši, Marina Bagarić

发表机构 * Center for Theoretical Physics, Sloane Physics Laboratory, Yale University（理论物理中心、斯洛恩物理实验室、耶鲁大学）； University of Zagreb, Faculty of Civil Engineering, Department of Materials（扎格雷布大学、土木工程学院、材料系）

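AI summary: This paper studies how artificial neural networks (ANNs) can speed up in-situ measurement of the thermal transmittance (U-value) of building envelope elements. It proposes parallelizing heat flux method (HFM) measurements by predicting the unknown heat flux from interior and exterior air temperatures, which shortens the measurement time. The study compares several ANN models on a multi-layer wall; the results are promising for heat-flux prediction and point to directions for further research.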

Comments Submitted to International Building Physics Conference 2021

详情

DOI: 10.1088/1742-6596/2069/1/012152
Journal ref: J. Phys.: Conf. Ser. 2069 (2021) 012152

AI中文摘要

由于能效相关指令，欧洲联盟更加关注建筑群的深度能源改造。许多需要深度能源改造的建筑年代久远，可能缺乏设计/改造文件，或者建筑构件中的材料可能随时间发生退化。热透射率（即U值）是确定通过建筑围护结构构件传输热损失的最重要参数之一，取决于构成建筑构件的所有材料的厚度和热性能。现场U值可通过ISO 9869-1标准（热流法 - HFM）确定。然而，测量持续时间是HFM在改造设计过程开始前现场测试中未广泛使用的原因之一。本文分析了通过使用一个热流传感器进行并行测量来减少测量时间的可能性。这种并行化可以通过在HFM结果上应用特定类别的人工神经网络（ANN）来实现，基于收集的室内外空气温度预测未知热流。在达到满意的预测后，HFM传感器可重新定位到另一个测量位置。本文展示了四种ANN案例应用于HFM结果的比较，这些测量在一面多层墙上进行：一个隐藏层中有三个神经元的多层感知器、100个单元的长短期记忆、100个单元的门控循环单元以及50个长短期记忆单元和50个门控循环单元的组合。分析在基于两个输入温度预测热流率方面给出了有希望的结果。另一面墙上的额外分析显示了该方法的可能局限性，这为这一主题的进一步研究提供了方向。

英文摘要

Deep energy renovation of the building stock has come more into focus in the European Union due to energy-efficiency-related directives. Many buildings that must undergo deep energy renovation are old and may lack design or renovation documentation, or the materials in their building elements may have degraded over time. Thermal transmittance (i.e., U-value) is one of the most important parameters for determining the transmission heat losses through building envelope elements. It depends on the thickness and thermal properties of all the materials that form a building element. The in-situ U-value can be determined according to the ISO 9869-1 standard (Heat Flux Method, HFM). Still, measurement duration is one of the reasons why HFM is not widely used in field testing before the renovation design process commences. This paper analyzes the possibility of reducing the measurement time by conducting parallel measurements with one heat-flux sensor. This parallelization could be achieved by applying a specific class of Artificial Neural Network (ANN) to HFM results to predict unknown heat flux based on collected interior and exterior air temperatures. After a satisfactory prediction is achieved, the HFM sensor can be relocated to another measuring location. The paper compares four ANN cases applied to HFM results for a measurement on one multi-layer wall: a multilayer perceptron with three neurons in one hidden layer, a long short-term memory network with 100 units, a gated recurrent unit network with 100 units, and a combination of 50 long short-term memory units and 50 gated recurrent units. The analysis gave promising results in terms of predicting the heat flux rate based on the two input temperatures. Additional analysis on another wall showed possible limitations of the method, which serve as a direction for further research on this topic.
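
For orientation, the U-value that the heat flux method produces can be sketched with the average method commonly associated with ISO 9869-1: the sum of measured heat flux densities divided by the sum of interior-exterior temperature differences. The function name and the sample readings below are hypothetical, and the sketch leaves out the standard's convergence criteria and storage corrections. In the setting described above, the heat flux series could in principle come from an ANN prediction instead of the sensor.

```python
def u_value_average_method(heat_flux, t_int, t_ext):
    """U-value in W/(m^2 K) from heat flux (W/m^2) and air temperatures (deg C)."""
    if not (len(heat_flux) == len(t_int) == len(t_ext)):
        raise ValueError("all series must have the same length")
    total_dt = sum(ti - te for ti, te in zip(t_int, t_ext))
    if total_dt == 0:
        raise ValueError("temperature differences sum to zero")
    return sum(heat_flux) / total_dt


# Hypothetical readings, not measurements from the paper.
u = u_value_average_method(
    heat_flux=[10.0, 12.0, 8.0],
    t_int=[20.0, 21.0, 19.0],
    t_ext=[0.0, 1.0, -1.0],
)
```

Summing before dividing, rather than averaging per-sample ratios, keeps short-term fluctuations in either series from dominating the estimate.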


2605.22954 2026-05-25 cs.LG q-bio.QM 版本更新

FederatedRSF : Federated Random Survival Forests for Partially Overlapping Medical Data

FederatedRSF：面向部分重叠医学数据的联邦随机生存森林

Maryam Moradpour, Jonas Harriehausen, Amirreza Aleyasin, Lion Philipp Wolf, Youngjun Park, Anne-Christin Hauschild

发表机构 * Institute for Predictive Deep Learning in Medicine and Healthcare（医学与健康预测性深度学习研究所）； Justus Liebig University Gießen（吉森尤斯图斯·李比希大学）； Hessian Center for Artificial Intelligence (hessian.AI)（黑森人工智能中心 (hessian.AI)）； Department of Medical Informatics（医学信息学系）； University Medical Center Göttingen（哥廷根大学医学中心）； Max Planck Institute for Biology of Ageing（马克斯·普朗克衰老生物学研究所）

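AI summary: This paper presents FederatedRSF, a federated learning method for survival analysis on multi-center medical data, particularly when feature sets only partially overlap. Each institution trains random survival trees locally, and only feature-compatible trees are redistributed to each site, enabling model aggregation and inference without sharing raw data. Experiments on a breast cancer dataset show performance comparable to that of a centrally trained model.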

Comments 4 pages, 2 figures. Maryam Moradpour, Jonas Harriehausen, and Amirreza Aleyasin contributed equally to this work. Includes supplementary material

详情

AI中文摘要

多中心生存预测可以提高鲁棒性和泛化性，但隐私法规和机构治理通常阻止跨机构汇集患者水平的临床和基因组数据。在实践中，部署因特征空间异质性而进一步复杂化，其中不同站点收集不同的协变量或使用不同的测序面板，导致特征集仅部分重叠。我们提出了FederatedRSF，一个实现联邦随机生存森林的Python包，它聚合本地训练的生存树，并仅将特征兼容的树重新分发到每个站点，从而在无需共享原始数据的情况下实现部分重叠的推理。我们在scikit-survival包中分发的GBSG2乳腺癌队列上评估了FederatedRSF，通过保留特征子集模拟客户端之间的特征异质性，并使用Harrell一致性指数（C-Index）在重复交叉验证和站点分割下评估区分能力。结果表明，联邦模型可以达到与集中式训练设置相当的性能。

英文摘要

Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice, deployment is further complicated by feature-space heterogeneity, in which sites collect different covariates or use different sequencing panels, resulting in only partially overlapping feature sets. We present FederatedRSF, a Python package that implements federated random survival forests, aggregating locally trained survival trees and redistributing only feature-compatible trees to each site, enabling inference with partial overlap without sharing raw data. We evaluate FederatedRSF on the GBSG2 breast cancer cohort distributed with the scikit-survival package, simulating feature heterogeneity across clients by withholding subsets of features, and assessing discrimination using Harrell's concordance index (C-Index) under repeated cross-validation and site-splits. The results demonstrated that the federated model can achieve performance comparable to that of the centralized training setting.
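
The two ideas in this abstract that lend themselves to a short illustration are the feature-compatibility filter and Harrell's concordance index. The sketch below is not the FederatedRSF package API: the tree representation (a dict with a `features` set) and both function names are hypothetical, and the C-index is the plain pairwise definition that ignores tied event times.

```python
def compatible_trees(global_trees, site_features):
    """Keep only trees whose split features are all available at the site."""
    available = set(site_features)
    return [t for t in global_trees if set(t["features"]) <= available]


def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when subject i had an event and
    times[i] < times[j]; ties in risk count as half concordant.
    Raises ZeroDivisionError if no pair is comparable.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical trees and site feature set, for illustration only.
trees = [
    {"id": "a", "features": {"age", "tumor_size"}},
    {"id": "b", "features": {"age", "hormone_receptor"}},
    {"id": "c", "features": {"tumor_size"}},
]
kept = [t["id"] for t in compatible_trees(trees, ["age", "tumor_size"])]
c_index = harrell_c_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0])
```

A site that lacks the hormone receptor covariate would receive only trees "a" and "c" here, so it can run inference with the pooled forest without imputing the missing feature.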


2605.22950 2026-05-25 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

Diffusion-based Denoising Beats Vanilla Score Matching in Parameter Estimation: A Theoretical Explanation

基于扩散的去噪在参数估计中优于普通得分匹配：一个理论解释

Benedikt Lütke Schwienhorst, Nadja Klein, Johannes Lederer

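AI summary: This paper studies why diffusion-based denoising score matching outperforms vanilla score matching for parameter estimation in multimodal distributions, and gives a theoretical explanation. The authors propose a diffusion denoising score matching estimator (DDSME) and show theoretically that the error of the vanilla score matching estimator deteriorates as the distance between modes grows, whereas DDSME avoids this problem when its hyperparameters are tuned appropriately. The work provides a new theoretical basis for the advantage of diffusion-based methods in parameter estimation.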

2605.22940 2026-05-25 cs.LG cs.AI stat.ML 版本更新

Human-Centered Learning Mechanics: A Dynamical Framework for Entropy-Regulated Representation Learning

以人为中心的学习力学：熵正则化表示学习的动力学框架

Kim Phuc Tran

发表机构 * Univ. Lille, ENSAIT, ULR 2461 – GEMTEX – Génie et Matériaux Textiles（里尔大学，ENSAIT，ULR 2461 – GEMTEX – 纺织工程与材料纺织系）； International Chair in DS & XAI, International Research Institute for Artificial Intelligence and Data Science, Dong A University（数据科学与可解释人工智能国际主席，人工智能与数据科学国际研究所，东亚大学）

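AI summary: This paper proposes Human-Centered Learning Mechanics (HCLM), a dynamical and information-theoretic framework for open and controlled learning systems. It argues that conventional entropy regularization can produce unstable or misaligned gradients in some regimes, introduces the notion of effective entropy, and studies tractable geometric entropy surrogates such as variance-based and log-determinant covariance proxies. The main contributions are a formalization of entropy regularization through effective information force, convergence and generalization results, and a dynamical interpretation of scaling-law-like behavior. Experiments indicate that geometric entropy surrogates, especially log-determinant covariance entropy, produce stronger and more stable information forces.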

Comments Submitted to JMLR

详情

AI中文摘要

深度学习越来越被视为参数空间中的动力学过程，然而许多现有理论仍将训练视为封闭的优化系统。这种观点对于现实世界的人工智能是有限的，因为模型在不确定性、资源约束、分布偏移、下游决策风险和人类反馈下运行。我们提出了以人为中心的学习力学（HCLM），一个用于开放和受控学习系统的动力学和信息论框架。核心思想是，只有当所选的熵代理沿着优化轨迹产生非简并的信息力时，熵正则化才是有用的。否则，熵项可能产生弱、不稳定或不对齐的梯度，导致动力学坍缩为普通的损失最小化。我们引入了有效熵的概念，并研究了可处理的几何熵代理，包括基于方差和对数行列式协方差代理。本文做出三项贡献。首先，它通过有效信息力形式化了熵正则化，并刻画了简并熵区域。其次，它在显式假设下推导了收敛性、熵流、Wasserstein梯度流和噪声表示泛化结果。第三，它提供了缩放律行为的条件动力学解释，作为信息注入、熵耗散和残差风险之间的平衡，而不声称对经验神经缩放律的无条件推导。受控的表示学习实验支持几何熵代理（尤其是对数行列式协方差熵）比softmax归一化熵产生更强更稳定的信息力的假设。

英文摘要

Deep learning is increasingly viewed as a dynamical process in parameter space, yet many existing theories still treat training as a closed optimization system. This view is limited for real-world AI, where models operate under uncertainty, resource constraints, distribution shift, downstream decision risks, and human feedback. We propose Human-Centered Learning Mechanics (HCLM), a dynamical and information-theoretic framework for open and controlled learning systems. The central idea is that entropy regularization is useful only when the chosen entropy surrogate generates a non-degenerate information force along the optimization trajectory. Otherwise, entropy terms may produce weak, unstable, or misaligned gradients, causing the dynamics to collapse toward ordinary loss minimization. We introduce the notion of effective entropy and study tractable geometric entropy surrogates, including variance-based and log-determinant covariance proxies. The paper makes three contributions. First, it formalizes entropy regularization through effective information force and characterizes degenerate entropy regimes. Second, it derives convergence, entropy-flow, Wasserstein-gradient-flow, and noisy-representation generalization results under explicit assumptions. Third, it offers a conditional dynamical interpretation of scaling-law-like behavior as a balance between information injection, entropy dissipation, and residual risk, without claiming an unconditional derivation of empirical neural scaling laws. Controlled representation-learning experiments support the hypothesis that geometric entropy surrogates, especially log-determinant covariance entropy, induce stronger and more stable information forces than softmax-normalized entropy.
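
As a rough illustration of a log-determinant covariance proxy, the sketch below computes 0.5·log det(Σ + εI) for a batch of representation vectors, which equals the Gaussian differential entropy up to an additive constant. It is an assumption-laden sketch, not the paper's definition: the function name, the ridge term `eps`, and the use of the sample covariance are choices made here for illustration.

```python
import math


def logdet_covariance_entropy(rows, eps=1e-6):
    """Return 0.5 * log det(cov(rows) + eps * I) via a Cholesky factorization."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += eps  # ridge keeps the matrix positive definite
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    # log det = 2 * sum(log diag(chol)), so half of it is the plain sum.
    return sum(math.log(chol[i][i]) for i in range(d))


# Illustrative batches: one spread over both dimensions, one collapsed to a line.
spread = logdet_covariance_entropy([[0, 0], [2, 0], [0, 2], [2, 2]])
collapsed = logdet_covariance_entropy([[0, 0], [1, 1], [2, 2], [3, 3]])
```

The collapsed batch scores far lower because one eigenvalue of its covariance is close to zero, which is the kind of degenerate representation a geometric entropy term would penalize.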

2605.22939 2026-05-25 cs.CL cs.LG 版本更新

Learnability-Informed Fine-Tuning of Diffusion Language Models

扩散语言模型的可学习性感知微调

Shubham Parashar, Atharv Chagi, Jacob Helwig, Lakshmi Jotsna, Sushil Vemuri, James Caverlee, Dileep Kalathil, Shuiwang Ji

发表机构 * Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA（德克萨斯A&M大学计算机科学与工程系）； Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA（德克萨斯A&M大学电气与计算机工程系）

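AI summary: This paper aims to improve the reasoning ability of diffusion language models (DLMs). The authors find that vanilla supervised fine-tuning (SFT) is limited when applied to DLMs because it ignores learnability, that is, what to learn and when, and can therefore hurt performance. They propose LIFT, a fine-tuning method that learns easy tokens when most of the input is masked and hard tokens when more context is available, matching training to the information available at each diffusion time step. LIFT outperforms existing SFT baselines on six reasoning benchmarks, with up to a 3x relative gain on AIME'24 and AIME'25.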

AI中文摘要

我们旨在提升扩散语言模型（DLM）的推理能力。虽然SFT是自回归模型常用的后训练方法，但其在DLM中的应用面临挑战，甚至可能损害性能，而根本原因尚未得到充分研究。我们的分析揭示，普通SFT忽略了可学习性，即学习什么以及何时学习。具体而言，当大部分输入被掩码时，稀有标记难以学习；而当大部分输入未被掩码时，学习常见标记则较为简单且价值不大。基于我们的分析，我们提出LIFT，一种高效的基于SFT的DLM后训练算法。LIFT在大部分输入被掩码时学习容易标记，在更多上下文可用时学习困难标记，从而使训练与不同扩散时间步的信息可用性对齐。我们的结果表明，LIFT在六个推理基准上优于现有SFT基线，在AIME'24和AIME'25上实现了高达3倍的相对增益。我们的代码已在https://github.com/divelab/LIFT公开。

英文摘要

We aim to improve the reasoning capabilities of diffusion language models (DLMs). While SFT is a popular post-training recipe for autoregressive models, its use in DLMs faces challenges and can even hurt performance, though the underlying causes remain understudied. Our analysis reveals that vanilla SFT overlooks learnability, namely what and when tokens are learned. Specifically, rare tokens are difficult to learn when most of the input is masked, whereas it is straightforward and thus of little value to learn common tokens when most of the input is unmasked. Motivated by our analysis, we propose LIFT, an efficient SFT-based post-training algorithm for DLMs. LIFT learns easy tokens when most of the input is masked and hard tokens when more context is available, thus aligning the training with the information available at different diffusion time steps. Our results show that LIFT outperforms existing SFT baselines across six reasoning benchmarks, achieving up to a 3x relative gain on AIME'24 and AIME'25. Our code is publicly available at https://github.com/divelab/LIFT.

2605.22902 2026-05-25 cs.LG cs.AI cs.CL 版本更新

Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

Transcoders 追踪视觉语言模型中的视觉基础与幻觉

Dimitrios Damianos, Leon Voukoutis, Georgios Skyrianos, Vassilis Katsouros, Georgios Paraskevopoulos

发表机构 * Institute of Language and Speech Processing（语言与语音处理研究所）； Athena Research Center（雅典研究中心）

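AI summary: This study examines how visual input is transformed into text in generative vision-language models (VLMs). It proposes a function-centric interpretability framework based on Transcoders that decomposes the model's internal computation into pathways linking image patches to text generation. Compared with sparse autoencoders (SAEs), Transcoder attributions produce stronger and more stable effects in image-patch ablation experiments and align better with semantically relevant image regions. The study also analyzes the structure of hallucinated generations and uses graph features to build a classifier that predicts hallucinations.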

AI中文摘要

生成式视觉语言模型（VLM）在多模态推理上表现良好，但视觉输入如何转化为文本仍知之甚少。现有的VLM可解释性工作使用稀疏自编码器（SAE），其分解静态残差表示，忽略了驱动跨模态交互的功能更新。我们采用基于Transcoders的功能中心框架，Transcoders是MLP子层的稀疏近似，作为逐层计算的因果代理。应用于Gemma 3-4B-IT，该框架将模型分解为可解释的计算路径，连接图像块到文本生成中的方向。在补丁消融下，Transcoder归因对视觉基础标记产生比SAE归因更强且更稳定的效果，并与语义相关的图像区域更好对齐。假视觉基础反事实分析证实恢复的路径是视觉-语言交互特有的。最后，我们对幻觉生成进行结构分析，从Transcoder产生的电路痕迹中提取基于图的指标。基于这些机制图特征的逻辑分类器以AUC 0.68预测幻觉。这些结果表明，功能中心的电路分解为VLM中的多模态计算提供了可解释且可预测的描述。

英文摘要

Generative Vision-Language Models (VLMs) perform well on multimodal reasoning, but how visual inputs are transformed to text remains poorly understood. Existing interpretability work on VLMs uses Sparse Autoencoders (SAEs), which decompose static residual representations and miss the functional updates that drive cross-modal interaction. We adopt a function-centric framework based on Transcoders, sparse approximations of MLP sublayers that act as a causal proxy for layer-wise computation. Applied to Gemma 3-4B-IT, the framework decomposes the model into interpretable computational pathways linking image patches to directions in token generation. Transcoder attributions produce stronger and more stable effects on visually grounded tokens under patch ablation than SAE attributions, and align better with semantically relevant image regions. A False Visual Grounding counterfactual analysis confirms that the recovered pathways are specific to vision-language interaction. Finally, we perform a structural analysis of hallucinated generations by extracting graph-based indicators from circuit traces produced by the transcoders. A logistic classifier over these mechanistic graph features predicts hallucinations at AUC 0.68. These results show that function-centric circuit decomposition yields interpretable and predictive accounts of multimodal computation in VLMs.
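
To make the reported AUC figure concrete, the sketch below computes AUC as the probability that a randomly chosen positive example (here, a hallucinated generation) receives a higher classifier score than a randomly chosen negative one. The function name `auc` and the example scores are illustrative assumptions, not taken from the paper.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.

    Ties count as half a win. Both lists must be non-empty.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative scores only: three of the four pairs are ordered correctly.
example = auc([0.9, 0.4], [0.5, 0.1])
```

Under this reading, an AUC of 0.68 means the classifier ranks a hallucinated generation above a non-hallucinated one in roughly 68% of such pairs, where 0.5 corresponds to chance.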

2605.22898 2026-05-25 cs.LG 版本更新

FIRMA: FIbonacci Ring Model Aggregation for Privacy-preserving Federated Learning

FIRMA: 斐波那契环模型聚合用于隐私保护联邦学习

Rachid Hedjam

发表机构 * Bishop’s University（比什大学）

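AI summary: This paper proposes FIRMA, a privacy-preserving federated learning framework that targets the tension among decentralization, privacy protection, and aggregation quality in existing methods. FIRMA aggregates models over a ring topology using Fibonacci-weighted, asymmetric neighbour blending and permanently private classification heads, and adds accuracy-gated neighbour suppression and an optimized ring permutation to improve model performance. Experiments show that FIRMA outperforms standard federated averaging across several heterogeneous-data settings, with the clearest gains under label skew and Dirichlet heterogeneity.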

AI中文摘要

联邦学习协议面临结构性三难困境：规范的基于服务器的聚合~\cite{mcmahan2017} 产生单点故障和梯度反演风险；去中心化的环-八卦替代方案~\cite{hu2019segmented} 通过无信息的均匀权重将分类头暴露给半诚实的对等节点；个性化方法~\cite{collins2021exploiting} 重新引入中心聚合。现有协议无法同时实现无服务器操作、永久私有分类头、环拓扑和原则性的非对称邻居加权。我们提出FIRMA（\textbf{FI}bonacci \textbf{R}ing \textbf{M}odel \textbf{A}ggregation），一个包含三种逐步增强的联邦学习协议系列：1) \fibfl\ 建立基础：无服务器环聚合，采用斐波那契加权的邻居混合和永久私有的分类头。2) \fibflp\ 在此基础上增加精度门控邻居抑制，选择性降低收敛不良的对等节点权重，同时保留斐波那契方向偏差。3) \fibflpp，完整系统，通过2-opt环置换最大化相邻客户端的类别多样性，通过$K_g{=}\lceil N/2\rceil$次八卦传递实现全局环覆盖，以及余弦退火自保留校准，完成该系列。我们建立了一个收敛速率界和三个支持命题，涉及归一化、覆盖、保留和多样性最优性。在28种配置（四个基准与七种异构性制度交叉）上的系统实验表明，\fibflpp\ 在所有12种标签偏斜配置中均优于\fedavg\，在CIFAR-10上$K{=}1$时峰值优势达$+20.7$个百分点。在Dirichlet异构性下，\fibflpp\ 是所有无服务器协议中的帕累托主导方法，在28种配置中的17种中实现了最高精度。

英文摘要

Federated learning protocols face a structural trilemma: canonical server-based aggregation~\cite{mcmahan2017} creates a single point of failure and gradient inversion risk; decentralised ring-gossip alternatives~\cite{hu2019segmented} expose classification heads to semi-honest peers via uninformed uniform weights; and personalised methods~\cite{collins2021exploiting} reintroduce central aggregation. No existing protocol simultaneously achieves server-free operation, permanently private heads, ring topology, and principled asymmetric neighbour weighting. We propose FIRMA (\textbf{FI}bonacci \textbf{R}ing \textbf{M}odel \textbf{A}ggregation), a family of three progressively enhanced federated learning protocols: 1) \fibfl\ establishes the foundation: server-free ring aggregation with Fibonacci-weighted neighbour blending and permanently private classification heads. 2) \fibflp\ augments this with accuracy-gated neighbour suppression, selectively down-weighting poorly-converged peers while preserving the Fibonacci directional bias. 3) \fibflpp, the full system, completes the family with a 2-opt ring permutation that maximises adjacent-client class diversity, global ring coverage via $K_g{=}\lceil N/2\rceil$ gossip passes, and cosine-annealed self-retention calibration. We establish a convergence rate bound and three supporting propositions governing normalisation, coverage, retention, and diversity optimality. Systematic experiments across 28 configurations -- four benchmarks crossed with seven heterogeneity regimes -- demonstrate that \fibflpp\ surpasses \fedavg\ in all 12 label-skew configurations, with a peak advantage of $+20.7$\,pp on CIFAR-10 at $K{=}1$. Under Dirichlet heterogeneity, \fibflpp\ is the Pareto-dominant method among all server-free protocols, achieving the highest accuracy in 17 of 28 configurations.
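
The abstract names two ingredients that are easy to illustrate: Fibonacci-weighted neighbour blending and $K_g = \lceil N/2 \rceil$ gossip passes. The sketch below is a rough illustration under stated assumptions, not the paper's protocol: the function names, the normalisation of the weights, the choice of which neighbour receives the larger weight, and the `self_retention` parameter are all hypothetical. Only the formula for the number of gossip passes comes from the abstract.

```python
import math


def fibonacci_weights(k):
    """First k Fibonacci numbers (1, 1, 2, 3, ...) normalised to sum to 1."""
    fib = [1, 1]
    while len(fib) < k:
        fib.append(fib[-1] + fib[-2])
    fib = fib[:k]
    total = sum(fib)
    return [f / total for f in fib]


def gossip_passes(n_clients):
    """K_g = ceil(N / 2), the number of gossip passes stated in the abstract."""
    return math.ceil(n_clients / 2)


def blend(own, neighbours, self_retention):
    """Mix a client's shared parameters with its ring neighbours' parameters.

    `own` and each entry of `neighbours` are equal-length lists of floats.
    Assumption: neighbours are ordered so later entries get larger weights.
    Classification heads are assumed to be excluded before calling this.
    """
    weights = fibonacci_weights(len(neighbours))
    mixed = [
        sum(w * vec[i] for w, vec in zip(weights, neighbours))
        for i in range(len(own))
    ]
    return [self_retention * o + (1 - self_retention) * m
            for o, m in zip(own, mixed)]
```

With three neighbours the normalised weights are 0.25, 0.25, and 0.5, which shows the asymmetry: unlike uniform gossip, one direction around the ring contributes more than the other.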

2605.22897 2026-05-25 cs.LG 版本更新

基于持续同调的AI原生无线接收机韧性表征

Christo Kurisummoottil Thomas, Emilio Calvanese Strinati

发表机构 * CEA-Leti（CEA-莱提）

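AI summary: This paper studies the robustness of deep-learning-based wireless receivers under non-stationary channels and proposes the Topological Resilience Index (TRI), a real-time metric grounded in persistent homology that quantifies the structural stability of a neural receiver during online adaptation. TRI characterizes resilience along three complementary dimensions: model-channel mismatch, distribution shift of the channel impulse response, and the topology of the channel manifold. Theoretical analysis shows that TRI is bounded, monotonic, and stable, and simulations on an OFDM receiver show that it warns of channel changes earlier than baseline indicators and that TRI-guided re-adaptation reduces post-shift bit error rate by 80% relative to no adaptation.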

AI中文摘要

基于深度学习的AI原生无线接收机在平稳信道条件下表现出卓越性能，但其对分布偏移的韧性仍难以通过误码率（BER）等传统指标有效表征。为克服这些局限，本文提出一种新颖的实时指标——拓扑韧性指数（TRI），该指标基于持续同调和持续指数。TRI量化了神经网络接收机参数空间在在线适应非平稳信道过程中的结构稳定性。具体而言，TRI通过三个互补维度捕捉韧性：（i）验证损失韧性，衡量模型-信道失配，基于损失景观子水平集的拓扑持续性；（ii）信道冲激响应（CIR）分布偏移，追踪CIR向量相对于校准参考分布的几何漂移；（iii）信道流形拓扑，通过经Ollivier-Ricci曲率范数归一化的高斯核矩阵谱隙量化。我们建立了理论保证，表明TRI具有有界性、在性能退化下的单调性，以及关于以Wasserstein距离度量的信道分布扰动的Lipschitz稳定性。针对一个OFDM深度学习接收机在三种偏移速率下跨越十个ITU-R环境间转换的仿真结果表明，相比梯度范数和验证损失基线，TRI始终提供平均超过一个OFDM符号的预警提前量，而梯度范数基线在每种场景下的提前量均为零。此外，所提出的TRI引导的突发重适应在200个OFDM符号内将偏移后的BER相对于无适应降低了80%。

英文摘要

AI-native wireless receivers based on deep learning exhibit remarkable performance under stationary channel conditions, yet their resilience to distributional shifts remains poorly characterized by conventional metrics such as bit error rate (BER). To overcome these limitations, this paper proposes a novel real-time metric, the Topological Resilience Index (TRI), grounded in persistent homology and persistence exponents. TRI quantifies the structural stability of a neural network receiver's parameter space during online adaptation to non-stationary channels. Specifically, TRI captures resilience through three complementary dimensions: (i) validation-loss resilience measuring model-channel mismatch, grounded in the topological persistence of loss-landscape sublevel sets; (ii) channel impulse response (CIR) distribution shift, tracking geometric drift of CIR vectors from the calibration reference distribution; and (iii) channel manifold topology, quantified by the spectral gap of the Gaussian kernel matrix normalized by the Ollivier-Ricci curvature norm. We establish theoretical guarantees showing that TRI is bounded, monotonic under performance degradation, and Lipschitz-stable with respect to perturbations in channel distributions measured in Wasserstein distance. Simulation results for an OFDM deep-learning receiver adapting across ten ITU-R inter-environment transitions at three shift rates demonstrate that TRI provides a consistent mean warning lead of more than one OFDM symbol over gradient-norm and validation-loss baselines, whereas the gradient-norm baseline achieves zero lead in every scenario. Furthermore, the proposed TRI-guided burst re-adaptation reduces post-shift BER by 80% relative to no adaptation within 200 OFDM symbols.

2605.22885 2026-05-25 cs.AI cs.CL cs.LG cs.LO 版本更新

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

ImProver 2：用于神经符号证明优化的迭代自改进语言模型

Riyaz Ahuja, Tate Rowney, Jeremy Avigad, Sean Welleck

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

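AI summary: As formal mathematics libraries grow rapidly, there is an increasing need to refactor verified proofs and to improve the quality of training data for neural provers. To address the heterogeneous objectives, scarce data, and high training and inference costs that hinder scalable proof optimization, this paper proposes ImProver 2, a neurosymbolic framework for Lean 4 that combines a data-efficient expert-iteration pipeline with a scaffold that exposes formal structure alongside lightweight informal abstractions, and introduces a suite of metrics for structural proof properties. Experiments show that the framework lets a small model match or exceed much larger models on several metrics, indicating that proof optimization is feasible as a scalable learning task.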

AI中文摘要

形式化数学库正在迅速扩展，这产生了对已验证证明进行重构以保持可维护性以及提高神经证明器训练数据质量的日益增长的需求。然而，可扩展的证明优化受到异构且启发式指定的目标、稀缺的数据以及高训练和推理成本的阻碍。为了克服这些挑战，我们引入了ImProver 2，这是一个用于在Lean 4中自动进行证明优化的神经符号框架。ImProver 2将数据高效的专家迭代流程与一个暴露形式结构并附带轻量级非正式抽象的脚手架相结合。我们进一步引入了一套捕捉证明结构属性的指标。使用ImProver 2，我们训练了一个7B参数的模型，该模型在相同模型系列中优于数量级更大的模型，并且在各项指标上与中端前沿模型具有竞争力。我们还证明，我们的神经符号脚手架显著提高了小型和前沿模型的性能。我们表明，通过适当的脚手架和训练，小型模型可以有效地在复杂且多样的指标上重构研究级证明，与更大的系统相匹配，并将证明优化确立为一项可扩展、可学习的任务。

英文摘要

Formal mathematics libraries are rapidly expanding, creating a growing need to refactor verified proofs for maintainability and to improve training data quality for neural provers. However, scalable proof optimization is hindered by heterogeneous and heuristically specified objectives, scarce data, and high training and inference costs. To overcome these challenges, we introduce ImProver 2, a neurosymbolic framework for automated proof optimization in Lean 4. ImProver 2 combines a data-efficient expert-iteration pipeline with a scaffold that exposes formal structure alongside lightweight informal abstractions. We further introduce a suite of metrics capturing structural proof properties. Using ImProver 2, we train a 7B-parameter model that outperforms orders-of-magnitude larger models within the same model family, and is competitive with mid-tier frontier models across metrics. We additionally demonstrate that our neurosymbolic scaffold significantly improves performance across both small and frontier models. We show that with proper scaffolding and training, small models can effectively restructure research-level proofs over complex and varied metrics, matching substantially larger systems and establishing proof optimization as a scalable, learnable task.

2605.22884 2026-05-25 cs.LG cs.AI 版本更新

WeCon: 一种高效的权重条件神经求解器用于多目标组合优化问题

Xuan Wu, Jinbiao Chen, Yang Li, Lijie Wen, Chunguo Wu, Yuanshu Li, Yubin Xiao, Chunyan Miao, You Zhou, Di Wang

发表机构 * Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education（教育部符号计算与知识工程重点实验室）； College of Computer Science and Technology, Jilin University（吉林大学计算机科学与技术学院）； Department of Industrial Systems Engineering and Management, National University of Singapore（新加坡国立大学工业系统工程与管理系）； College of Software, Jilin University（吉林大学软件学院）； School of Software, Tsinghua University（清华大学软件学院）； School of Computing and Information Systems, Singapore Management University（新加坡管理大学计算与信息系统学院）

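AI summary: This paper proposes WeCon, an efficient weight-conditioned neural solver for multi-objective combinatorial optimization problems. Its encoder combines three attention blocks with a gated residual fusion block to strengthen the interaction between instance features and weights and produce a more informative weight-conditioned context, and its decoder adds a residual fusion block to mitigate weight-signal dilution. The paper also proposes Efficient Preference Optimization (EPO), which builds higher-quality solution pairs to improve training. Experiments show that WeCon performs comparably to the state-of-the-art solver across problem scales and distribution patterns while reducing inference time by 40%.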

AI中文摘要

现有的多目标组合优化问题（MOCOP）神经求解器通常采用基于分解的策略，将MOCOP标量化为多个与不同权重向量相关的子问题。然而，它们要么仅在解码过程中注入一次权重，限制了权重条件上下文建模，要么主要在编码过程中注入，导致解码过程中权重信号稀释。此外，偏好优化方法依赖纯随机采样来构建解对以训练求解器，这通常产生信息量较少的解对，从而导致训练效率低下。为了更好地解决这些局限性，我们提出了一种高效的权重条件神经求解器（WeCon）。具体来说，我们设计了一个具有三个注意力块和我们提出的门控残差融合（GRF）块的编码器层，以促进实例特征和权重之间的和谐交互，从而生成信息丰富的权重条件上下文。我们进一步在解码器中引入了一个即插即用的残差融合（RF）块，以减轻权重信号稀释。最后，我们提出了高效偏好优化（EPO），它构建高质量的解，从而生成更多信息量的解对以提高训练效率。在不同问题规模和分布模式下的四个MOCOP变体上的实验表明，WeCon实现了与最先进求解器POCCO-W相当的HyperVolume（HV）值，同时将推理时间减少了40%。消融研究验证了所有设计的贡献。

英文摘要

Existing neural solvers for Multi-Objective Combinatorial Optimization Problems (MOCOPs) commonly adopt decomposition-based strategies that scalarize an MOCOP into multiple subproblems associated with distinct weight vectors. However, they either inject weights only once during decoding, limiting weight-conditioned context modeling, or primarily during encoding, causing weight-signal dilution during decoding. Moreover, preference optimization methods rely on purely random sampling to construct solution pairs for training solvers, which often produces less informative pairs and thus leads to low training effectiveness. To better address these limitations, we propose an efficient Weight-Conditioned neural solver (WeCon). Specifically, we design an encoder layer with three attention blocks and our proposed Gated Residual Fusion (GRF) block to facilitate harmonious interaction between instance features and weights, thereby generating informative weight-conditioned context. We further introduce a plug-and-play Residual Fusion (RF) block in the decoder to alleviate weight-signal dilution. Finally, we propose Efficient Preference Optimization (EPO), which constructs high-quality solutions, thereby generating more informative pairs to improve training effectiveness. Experiments on four MOCOP variants across different problem scales and distribution patterns demonstrate that WeCon achieves HyperVolume (HV) values comparable to SOTA solver POCCO-W, while reducing inference time by 40%. Ablation studies validate the contributions of all designs.
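
Two concepts in the abstract can be shown with a small example: decomposition by scalarization, where each weight vector defines one single-objective subproblem, and the hypervolume (HV) indicator used to score the resulting solution set. The sketch assumes two objectives that are both minimized and uses weighted-sum scalarization as one common choice; the function names and the scalarization form are assumptions, since the abstract does not specify them.

```python
def weighted_sum(objectives, weights):
    """Scalarize one solution's objective vector with one weight vector."""
    return sum(w * f for w, f in zip(weights, objectives))


def best_per_weight(solutions, weight_vectors):
    """Solve each subproblem by picking the solution with the lowest scalarized cost."""
    return [min(solutions, key=lambda s: weighted_sum(s, w))
            for w in weight_vectors]


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by reference point `ref`.

    Both objectives are minimized; points that do not improve on the best
    second objective seen so far (dominated points) contribute nothing.
    """
    hv = 0.0
    best_f2 = ref[1]
    for f1, f2 in sorted(points):
        if f1 < ref[0] and f2 < best_f2:
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv
```

A larger hypervolume means the solution set covers more of the objective space below the reference point, which is why comparable HV at lower inference time is reported as the headline result.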

2605.22875 2026-05-25 cs.AI cs.LG 版本更新

RMA: an Agentic System for Research-Level Mathematical Problems

RMA：一个面向研究级数学问题的智能体系统

Zelin Zhao, Bo Yuan, Jaemoo Choi, Yongxin Chen

发表机构 * Georgia Institute of Technology（佐治亚理工学院）

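AI summary: This paper presents RMA, an agentic system for research-level mathematical problems. RMA decomposes proof solving into modules for problem analysis, literature search, fair comparison, knowledge-bank construction, and proof verification, coordinated by initializer, proposer, and verifier agents, to support long-horizon reasoning and iterative proof refinement. On the First Proof benchmark, RMA solves eight of the ten problems, and its proofs are judged more logically sound and readable than those of strong baselines.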

AI中文摘要

我们提出了Research Math Agents (RMA)，一个用于研究级数学问题自动推理的智能体框架。与以往专注于竞赛数学或形式化定理证明的研究不同，RMA针对需要长程推理、文献依据和迭代证明改进的研究级数学问题。RMA将研究级证明求解分解为专门模块，包括问题分析、文献搜索与理解、公平比较、知识库构建和证明验证，所有这些都由初始化器、提议器和验证器智能体通过共享的结构化内存协调。在这个统一框架内，这些智能体以多角色、多轮工作流的方式运行，通过迭代反馈协作生成、改进和验证候选证明。我们在First Proof基准上评估了RMA，该基准由来自不同领域的专家数学家贡献的十个研究级问题组成。通过全面的专家评估，RMA在First Proof基准上优于强基线（包括GPT-5.2R和Aletheia），解决了十个研究问题中的八个，并生成了逻辑更合理、可读性更强的证明。我们的全面消融研究进一步表明，性能提升来自于结构化推理模块、迭代改进和基于验证器的反馈之间的交互，而非任何单一组件。我们的解决方案和实现将在论文被接收后公开。

英文摘要

We present Research Math Agents (RMA), an agentic framework for automated reasoning on research-level mathematical problems. Unlike prior studies centered on competition mathematics or formal theorem proving, RMA targets research-level mathematical problems that require long-horizon reasoning, literature grounding, and iterative proof refinement. RMA decomposes research-level proof solving into specialized modules for problem analysis, literature search and understanding, fair comparison, knowledge-bank construction, and proof verification, all coordinated by initializer, proposer, and verifier agents through a shared structured memory. Within this unified framework, these agents operate in a multi-role, multi-round workflow, collaboratively generating, refining, and verifying candidate proofs through iterative feedback. We evaluate RMA on the First Proof benchmark, which consists of ten research-level problems contributed by expert mathematicians across diverse domains. Through comprehensive expert evaluation, RMA outperforms strong baselines on the First Proof benchmark, including GPT-5.2R and Aletheia, solving eight out of ten research problems and producing more logically sound and readable proofs. Our comprehensive ablation studies further show that performance gains arise from the interaction of structured reasoning modules, iterative refinement, and verifier-based feedback, rather than any single component. Our solutions and implementations will be made publicly available upon acceptance.

2605.22872 2026-05-25 cs.LG cs.AI cs.CV 版本更新

MedExpMem: Adapting Experience Memory for Differential Diagnosis

MedExpMem：适应经验记忆用于鉴别诊断

Qianhan Feng, Zhongzhen Huang, Yakun Zhu, Yannian Gu, Winnie Chiu Wing Chu, Xiaofan Zhang, Qi Dou

发表机构 * The Chinese University of Hong Kong（香港中文大学）； Shanghai Jiao Tong University（上海交通大学）

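AI summary: This paper proposes MedExpMem, an experience-memory framework that improves the differential-diagnosis ability of diagnostic agents built on vision-language models. The method records the agent's own diagnostic failures and turns them into pairwise differential notes that capture key discriminators, decision rules, and reasoning error patterns, using a two-phase construction process that mirrors how physicians learn. Experiments on a radiology benchmark spanning multiple subspecialties show improved diagnostic accuracy.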

Comments MICCAI 2026 Early Accept. Submission Version

AI中文摘要

经验丰富的医生通过临床实践发展诊断专业知识，不仅获得疾病知识，还能区分易混淆的病症。当前的医学视觉语言模型（VLM）缺乏这种能力——它们的参数编码了静态知识，不会随着诊断经历而演变。我们提出了MedExpMem，一个经验记忆框架，使基于VLM的诊断代理能够积累鉴别诊断专业知识。与检索增强生成（检索百科式疾病描述）不同，MedExpMem记忆从代理自身的诊断失败中获得的判别经验，并将其组织为成对的鉴别笔记，编码关键判别因素、可操作的决策规则和推理错误模式。该框架采用两阶段构建过程，模仿医生的学习：初始实践暴露知识差距，反思性重新诊断完善理解。当遇到新病例时，代理检索经验记忆以指导鉴别推理。我们在涵盖11个亚专业的放射学基准上评估了MedExpMem。结果表明，在不同模型和规模上，准确率持续提升，最高达7.0%。分析实验验证了经验质量和鲁棒性，表明MedExpMem是一种有竞争力的方法，解决了参数学习无法触及的医学适应需求。

英文摘要

Experienced physicians develop diagnostic expertise through clinical practice, acquiring not only disease knowledge but also the ability to differentiate confusable conditions. Current medical vision-language models (VLMs) lack this capability -- their parameters encode static knowledge that does not evolve across diagnostic encounters. We propose MedExpMem, an experience memory framework enabling VLM-based diagnostic agents to accumulate differential diagnosis expertise. Unlike retrieval-augmented generation, which retrieves encyclopedic disease descriptions, MedExpMem memorizes discriminative experience derived from the agent's own diagnostic failures and organizes it as pairwise differential notes encoding key discriminators, actionable decision rules, and reasoning error patterns. The framework adopts a two-phase construction process mirroring physician learning: initial practice exposes knowledge gaps, and reflective re-diagnosis refines understanding. When encountering new cases, the agent retrieves experience memory to guide differential reasoning. We evaluate MedExpMem on a radiology benchmark spanning 11 subspecialties. Results demonstrate consistent accuracy improvements of up to 7.0% across diverse models and scales. Analytical experiments validate experience quality and robustness, showing that MedExpMem is a competitive method that addresses medical adaptation needs beyond the reach of parametric learning.

2605.22871 2026-05-25 cs.LG cs.AI stat.ML 版本更新

Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity

通过自模式连通性引导的流形表示遗忘实现近似机器遗忘

Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Luoyu Chen, Shui Yu

发表机构 * Xi'an Jiaotong University（西安交通大学）； Southeast University（东南大学）； University of Technology Sydney（悉尼科技大学）

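AI summary: This paper proposes ManiF-SMC, an approximate machine unlearning method that addresses the tension between unlearning effectiveness and preserving the original learning objective. Motivated by the observation that a model retrained on the remaining data classifies erased samples by semantic similarity, it pushes each forgotten sample away from its original manifold-representation centroid toward its semantic neighbors in the retained data. To reduce reliance on labels and task gradients, ManiF-SMC uses a margin-based triplet loss and a self-mode-connectivity module that generates adaptive unlearning margins. Experiments on several datasets show unlearning effectiveness comparable to state-of-the-art approximate methods.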

AI中文摘要

机器遗忘是强制执行被遗忘权的基本机制。现有的依赖标签操作或任务梯度反转的遗忘研究通常遗忘效果有限，且可能破坏原始学习目标，通常不能保证与重新训练的标准遗忘等价。本文提出ManiF-SMC（自模式连通性引导的流形遗忘），其动机是观察到在剩余数据上重新训练的模型倾向于根据保留数据中的语义相似性对擦除样本进行分类。我们首先系统地将近似遗忘重新表述为：将每个擦除样本从其原始学习的流形表示质心推向保留数据中最近的语义邻居。这种重新表述使遗忘与重新训练行为对齐，并且仅在表示空间中操作，减少了对标签和任务特定梯度的依赖。为了解决基于流形表示的遗忘问题，ManiF-SMC将遗忘和表示保留目标封装在基于边界的三元组损失中。由于为遗忘找到合适的边界具有挑战性，我们提出一个自模式连通性模块，快速重建局部流形以指导每个遗忘案例的自适应边界生成。在四个代表性数据集上的大量实验表明，ManiF-SMC在仅操作模型表示空间的情况下，实现了与最先进近似方法相当的遗忘效果。

英文摘要

Machine unlearning is a fundamental mechanism that enforces the right to be forgotten. Existing unlearning studies that rely on label manipulation or task-gradient reversal often deliver limited unlearning effectiveness. Moreover, they can undermine the original learning objective and typically do not guarantee equivalence to standard unlearning by retraining. In this paper, we propose ManiF-SMC (Manifold Forgetting with Self Mode Connectivity), motivated by the observation that a model retrained on the remaining data tends to classify erased samples by their semantic similarity to the retained data. We begin by systematically recasting approximate unlearning as pushing each erased sample away from its original learned manifold representation centroid toward its nearest semantic neighbors in the retained data. This reformulation aligns unlearning with retraining behavior and operates purely in representation space, reducing reliance on labels and task-specific gradients. To tackle the manifold representation-based unlearning problem, ManiF-SMC encapsulates the unlearning and representation preservation goals in a margin-based triplet loss. Because finding a suitable margin for unlearning is challenging, we propose a self-mode-connectivity module that rapidly reconstructs the local manifold to guide adaptive margin generation for each unlearning case. Extensive experiments on four representative datasets show that ManiF-SMC achieves unlearning effectiveness comparable to state-of-the-art approximate methods while operating solely within the model's representation space.
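
The margin-based triplet loss mentioned in the abstract can be sketched in its generic form. The mapping of roles below is an assumption made for illustration: the erased sample's representation is the anchor, its nearest retained semantic neighbour is the "pull toward" point, and its original centroid is the "push away" point. The fixed `margin` argument stands in for the adaptive margin that the paper derives from its self-mode-connectivity module.

```python
import math


def triplet_loss(anchor, pull_toward, push_away, margin):
    """Generic triplet loss: max(0, d(anchor, pull) - d(anchor, push) + margin).

    All three points are equal-length sequences of floats (representations).
    The loss is zero once the anchor is closer to `pull_toward` than to
    `push_away` by at least `margin`.
    """
    d_pull = math.dist(anchor, pull_toward)
    d_push = math.dist(anchor, push_away)
    return max(0.0, d_pull - d_push + margin)
```

A larger margin demands that the erased sample move further from its original centroid before the loss vanishes, which is why choosing the margin per unlearning case matters.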

2605.22870 2026-05-25 cs.LG cs.AI cs.CL 版本更新

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

读出捷径：位置数字复制主导小语言模型中的算术思维链读出

Ming Liu

发表机构 * Amazon（亚马逊）

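AI summary: This study examines what chain-of-thought (CoT) prompting actually contributes when small language models do arithmetic. It finds that at answer time the models tend to copy the last number before the answer delimiter rather than rely on the intermediate reasoning. This positional shortcut strongly affects accuracy, suggesting that CoT readout in these models depends more on position than on logical reasoning. The experiments also reveal differences in copying behavior across models and indicate that the mechanism may depend on architecture and task type.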

Comments 18 pages (8 main + 10 appendix), 3 figures, 5 tables

AI中文摘要

思维链提示对于小语言模型进行算术运算是必要的，然而打乱其步骤仍能保留大部分性能。如果思维链贡献的不是逻辑顺序，那是什么？在三个1-3B指令微调的语言模型上，针对GSM8K数据集，我们通过前缀补全隔离了答案读出阶段，并识别出一个位置捷径：模型复制占据答案分隔符前最后一个位置的数字，无论中间推理如何。正确答案的存在贡献了54-92个百分点的准确率（每个模型教师强制上限的89-92%）；即使在错误项上，最终答案与思维链最后一个数字匹配的概率为95-96%。复制通道优先于保留上下文补全：用错误值替换最后一个数字会使准确率降至接近零，尽管中间步骤正确；但移除它后，准确率在该基线之上恢复5-32个百分点——当存在可复制的数字时，即使模型本可以执行的单步算术也被抑制。Qwen和Llama在87-95%的情况下复制新干扰项；Gemma则选择性门控。头部级消融实验揭示了特定于架构的头部集；该效应在GSM-Symbolic上复现。在非算术的BBH任务上，打乱保留率急剧下降；在7-8B规模时，出现了内容选择性门控。步骤级忠实度评估有风险将位置答案传输与真实计算混为一谈——这是基于思维链的监督的一个失败模式。

英文摘要

Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the answer-readout stage via prefix completion and identify a positional shortcut: the model copies whichever number occupies the trailing position before the answer delimiter, regardless of intermediate reasoning. Gold-answer presence accounts for 54-92 pp of accuracy (89-92% of each model's teacher-forcing ceiling); even on incorrect items, the final answer matches the last CoT number 95-96% of the time. The copy channel takes precedence over retained-context completion: replacing the trailing number with a wrong value collapses accuracy to near-zero despite correct intermediates, yet removing it recovers 5-32 pp above that floor--even single-step arithmetic the model can otherwise perform is suppressed when a copyable number is present. Qwen and Llama copy novel distractors 87-95% of the time; Gemma gates selectively. Head-level ablation implicates architecture-specific head sets; the effect replicates on GSM-Symbolic. On non-arithmetic BBH tasks, shuffle retention drops sharply; at 7-8B, content-selective gating emerges. Step-level faithfulness evaluations risk conflating positional answer transport with genuine computation--a failure mode for CoT-based oversight.
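
The statistic "the final answer matches the last CoT number" can be measured with a few lines of text processing. The sketch below is an illustrative assumption about how such a check might be written, not the paper's evaluation code; the regular expression and the function names are hypothetical, and `answer` is assumed to be a numeric string.

```python
import re

# Integers or decimals, optionally negative, allowing thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number in `text` as a string without commas, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def copies_trailing_number(cot, answer):
    """True if the model's answer equals the trailing number of its CoT."""
    last = last_number(cot)
    return last is not None and float(last) == float(answer)
```

Averaging `copies_trailing_number` over incorrect items only would give a match rate of the kind the abstract reports, since on those items the match cannot be explained by the model simply being right.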

2605.22869 2026-05-25 cs.LG 版本更新

从语言模型轨迹中读取校准的不确定性

Aliai Eusebi, Alexander Herzog, Xiaoyu Liang, Marie Vasek, Enrico Mariconti, Lorenzo Cavallaro

发表机构 * University College London, London, United Kingdom（伦敦大学学院）

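AI summary: This study explores how to quantify uncertainty more accurately from the internal trajectories a language model follows during generation. Instead of the usual maximum softmax probability, the authors extract geometric features from the path of activations across layers and feed them to a sparse linear probe. The approach outperforms the maximum-softmax baseline under selective abstention, and it shows how errors take shape layer by layer, offering a finer-grained view of model uncertainty.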

AI中文摘要

最大softmax概率（MSP）是评估结构化输出语言模型生成不确定性量化的默认方法。虽然计算成本低，但通常校准不佳。探针模型内部激活的方法将原始隐藏状态输入不透明分类器，将激活视为静态快照，并隐含了表示形成的逐层轨迹。然而，相似的端点可能源于非常不同的路径，证据如何在深度上累积、增强或反转可能揭示最终概率掩盖的不确定性。我们提取了十一个尺度不变的几何特征，追踪逐层MLP更新的累积路径，并将其输入稀疏线性探针。该探针在选择性弃权下优于MSP，增益随基线校准误差增加而增加，最高达21 AURC点。由于每个特征都有封闭形式的几何意义，探针的系数追踪了错误如何以及沿着深度何处形成——哪些层过早承诺，哪些层与运行状态矛盾，轨迹在何处偏离其端点。

英文摘要

The maximum softmax probability (MSP) represents a default approach when evaluating uncertainty quantification for language model generation with structured output. Although cheap, it is often miscalibrated. Methods that probe the model's internal activations feed raw hidden states into opaque classifiers, reading activations as static snapshots and leaving implicit the layer-wise trajectory by which a representation is formed. Yet, similar endpoints can arise from very different paths, and how evidence accumulates, reinforces, or reverses across depth might reveal uncertainty that final probabilities obscure. We extract eleven scale-invariant geometric features, tracing the cumulative path of per-layer MLP updates, and feed them to a sparse linear probe. The probe outperforms MSP under selective abstention, with gains scaling with baseline miscalibration up to 21 AURC points. Because every feature has a closed-form geometric meaning, the probe's coefficients trace how and where along depth errors take shape -- which layers commit prematurely, which contradict the running state, where trajectories drift away from their endpoint.
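
For readers unfamiliar with the baseline, the sketch below computes the maximum softmax probability (MSP) from a vector of logits and applies a simple selective-abstention rule. The threshold value and the function names are illustrative assumptions; the paper's probe replaces the MSP score with one computed from layer-wise trajectory features.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp(logits):
    """Maximum softmax probability: the model's confidence in its top choice."""
    return max(softmax(logits))


def should_abstain(logits, threshold):
    """Abstain when confidence falls below `threshold` (a hypothetical cutoff)."""
    return msp(logits) < threshold
```

MSP is cheap because it needs only the final logits, but it sees a single endpoint. Two generations with the same final distribution receive the same score even if one reached it through a much less stable path across layers.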

2605.22858 2026-05-25 eess.SP cs.LG 版本更新

Classification of IED-free EEG Responses for Assisted Epilepsy Diagnosis

用于辅助癫痫诊断的无IED脑电图反应分类

Giacomo Zanardini, Ryan Moesman, Paul van der Kleij, Robert van den Berg, Justin Dauwels

发表机构 * Signal Processing Systems（信号处理系统）； Delft University of Technology（代尔夫特理工大学）； Erasmus Medical Center（伊拉斯姆斯医学中心）

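AI summary: This paper studies how stimulation-evoked EEG can assist epilepsy diagnosis when routine EEG lacks interictal epileptiform discharges (IEDs). The authors propose a machine-learning classifier built on multi-domain features (temporal, spectral, wavelet, and connectivity) and use a stacked ensemble to fuse the feature sets. Results on two datasets show good diagnostic performance, particularly for EEG evoked by intermittent photic stimulation (IPS), which helps separate patients with epilepsy from those without and supports assisted diagnosis in IED-free recordings.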

Comments Accepted at IEEE EMBC2026

AI中文摘要

当常规脑电图缺乏发作间期癫痫样放电（IED）时，诊断癫痫具有挑战性。间歇性光刺激（IPS）和过度换气（HV）可提高诊断率，但其解释具有主观性。我们提出一种可重复的流水线，使用跨越时域、频谱、小波和连接性域的机器学习特征，以及堆叠集成来组合互补特征集，对刺激过程中采集的脑电图记录进行分类。在TUH癫痫语料库和临床Erasmus MC（EMC）队列上使用留一受试者交叉验证（LOSO）评估性能，包括在TUH上的无IED分析。在TUH上，集成在无IED静息态脑电图上达到高达97.8% AUC / 93.1% BAC，在无IED IPS上达到94.1% AUC / 86.8% BAC。在EMC上，IPS提供最强的区分能力（79.4% AUC / 73.9% BAC），而HV性能受益于按反应性对受试者进行分层。这些结果表明，刺激诱发的活动，特别是IPS，包含对无IED癫痫分类有意义的判别信息，并且多域集成提高了鲁棒性。

英文摘要

Diagnosing epilepsy is challenging when routine EEGs lack interictal epileptiform discharges (IEDs). Intermittent photic stimulation (IPS) and hyperventilation (HV) can increase diagnostic yield, but their interpretation is subjective. We propose a reproducible pipeline that classifies EEG recordings acquired during stimulation procedures, using machine-learning features spanning temporal, spectral, wavelet, and connectivity domains, and a stacked ensemble to combine complementary feature sets. Performance is evaluated with leave-one-subject-out (LOSO) cross-validation on the TUH Epilepsy Corpus and a clinical Erasmus MC (EMC) cohort, including IED-free analyses on TUH. On TUH, ensembles achieve up to 97.8% AUC / 93.1% BAC on IED-free resting-state EEG and 94.1% AUC / 86.8% BAC on IED-free IPS. On EMC, IPS provides the strongest discrimination (79.4% AUC / 73.9% BAC), while HV performance benefits from stratifying subjects by responsiveness. These results indicate that stimulation-evoked activity, particularly IPS, contains meaningful discriminative information for IED-free epilepsy classification and that multi-domain ensembling improves robustness.
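
The evaluation protocol named in the abstract, leave-one-subject-out (LOSO) cross-validation scored by balanced accuracy (BAC), can be sketched as follows. The function names and the toy subject labels are illustrative assumptions rather than the authors' pipeline.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    `subject_ids[i]` is the subject that recording i belongs to, so all of a
    subject's recordings land in the test fold together and none leak into
    training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is insensitive to class imbalance."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

Splitting by subject rather than by recording matters for EEG because recordings from the same person are strongly correlated, and balanced accuracy is a reasonable companion to AUC when the epilepsy and non-epilepsy groups differ in size.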

2605.22857 2026-05-25 eess.SP cs.LG 版本更新

JointHRRP-Net: A Statistically Constrained Decoupling Network for Joint Target and Jamming Recognition in Composite Jamming

JointHRRP-Net: 一种用于复合干扰中目标与干扰联合识别的统计约束解耦网络

Yunfei Zhao, Mei Liu, Shuowei Liu, Xunzhang Gao, Yujie Zhou

发表机构 * College of Electronic Science and Technology, National University of Defense Technology（电子科学学院，国防科技大学）

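AI summary: Radar automatic target recognition based on high-resolution range profiles (HRRPs) degrades sharply under composite jamming. This paper proposes JointHRRP-Net, a unified framework for joint target and jamming recognition. A statistically constrained decoupling module separates target-dominant and jamming-dominant latent branches from the mixed HRRP, and a multi-scale temporal encoding module followed by a dual-expert decision module performs single-label target classification and multi-label jamming classification. Experiments show that the method outperforms baseline methods across signal-to-noise and signal-to-jamming ratios and remains discriminative for rejecting unknown targets.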

Comments Submitted to IEEE Transactions on Geoscience and Remote Sensing (TGRS). 15 pages, 12 figures

AI中文摘要

基于高分辨率距离像（HRRP）的雷达自动目标识别在复合干扰环境中性能严重下降。有源干扰在接收到的距离像中引入压制和欺骗相关分量。脉冲压缩后，这些分量与目标回波在HRRP域中耦合，使得目标相关散射峰难以区分，削弱了特征可分离性。针对这一问题，本文提出JointHRRP-Net，一种用于目标-干扰联合识别的统一框架。首先开发了一个统计约束解耦模块，从混合HRRP表示中生成目标主导和干扰主导的潜在分支。施加相关性引导的统计约束以抑制冗余的跨分支信息并减轻目标-干扰特征纠缠。然后设计了一个多尺度时序编码模块来建模局部散射结构和长距离单元依赖关系，随后是一个双专家决策模块，用于单标签目标分类和多标签干扰分类。在不同信干比（SJR）和信噪比（SNR）水平下的实验表明，JointHRRP-Net在目标识别和复合干扰识别方面均优于代表性基线方法。开放集评估进一步表明，学习到的目标表示对于未知目标拒绝仍具有判别性。这些结果证明了JointHRRP-Net在复合干扰场景中的有效性和鲁棒性。

英文摘要

High-resolution range profile (HRRP)-based radar automatic target recognition suffers from severe performance degradation in composite jamming environments. Active jamming introduces suppression- and deception-related components into the received range profile. After pulse compression, these components are coupled with target echoes in the HRRP domain, making target-related scattering peaks difficult to distinguish and weakening feature separability. To address this problem, this paper proposes JointHRRP-Net, a unified framework for joint target-jamming recognition. A statistically constrained decoupling module is first developed to generate target-dominant and jamming-dominant latent branches from the mixed HRRP representation. Correlation-guided statistical constraints are imposed to suppress redundant cross-branch information and alleviate target-jamming feature entanglement. A multi-scale temporal encoding module is then designed to model local scattering structures and long-range range-cell dependencies, followed by a dual-expert decision module for single-label target classification and multi-label jamming classification. Experiments under diverse signal-to-jamming ratio (SJR) and signal-to-noise ratio (SNR) levels demonstrate that JointHRRP-Net outperforms representative baseline methods in both target recognition and composite jamming recognition. Open-set evaluation further shows that the learned target representation remains discriminative for unknown-target rejection. These results demonstrate the effectiveness and robustness of JointHRRP-Net in composite jamming scenarios.


2605.22855 2026-05-25 cs.GT cs.AI cs.CL cs.LG 版本更新

PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

PrefBench：评估隐藏偏好个性化定价谈判中的零样本LLM智能体

Yingjie Lei

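Affiliation: University of Aberdeen

AI summary: This paper introduces PrefBench, a benchmark for evaluating zero-shot large language model (LLM) agents in hidden-preference personalized pricing negotiations. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller negotiates using only public information, while the buyer's valuation, patience, counter-offer behavior, and other key parameters stay hidden. Experiments show that LLM agents follow the protocol and close a high share of deals, yet their profit is poor and falls far below a simple concession heuristic, which exposes a weakness of current LLMs in profit-sensitive negotiation. PrefBench provides a controlled environment for studying pricing-agent behavior under hidden buyer preferences.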

Comments 24 pages, 3 figures, 5 tables. Code is available at https://github.com/ChaosTheProducer/PrefBench

详情

AI中文摘要

个性化定价谈判是LLM智能体的一个具有挑战性的测试平台，因为成功的互动并不能保证盈利的决策。当买方的支付意愿和谈判特征仍然隐藏时，卖方可能产生有效的行动并达成许多交易，但定价仍然很差。本文提出了PrefBench，一个基于模拟器的隐藏偏好个性化定价谈判基准。每个回合将一个模拟买家与一个固定的车辆定制捆绑包配对；卖方观察公开的人物描述符、捆绑包信息和谈判历史，而潜在的买方变量控制估值、耐心、还价行为和退出决策。PrefBench通过一个面向LLM的状态摘要协议来评估这一设置，该协议限制智能体在固定的隐藏信息边界下返回严格的JSON动作。我们在7500个回合中评估了零样本LLM卖家与启发式参考。测试的LLM可靠地遵循协议，实现了高于0.99的交易率，但它们的卖家利润结果仍然较弱：最佳LLM平均利润仅略高于随机基线，远低于同一回合流下的简单让步启发式。这些结果表明，结构化行动合规性和寻求协议的行为可以与弱利润敏感谈判共存。PrefBench为评估隐藏买方偏好下的定价智能体行为提供了一个受控基准。

英文摘要

Personalized pricing negotiations are a challenging testbed for LLM agents because successful interaction does not guarantee profitable decision making. A seller may produce valid actions and close many deals while still pricing poorly when buyer willingness to pay and bargaining traits remain hidden. This paper presents PrefBench, a simulator-based benchmark for hidden-preference personalized pricing negotiations. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller observes public persona descriptors, bundle information, and negotiation history, while latent buyer variables govern valuation, patience, counter-offer behavior, and walkaway decisions. PrefBench evaluates this setting through an LLM-facing state-summary protocol that constrains agents to return strict JSON actions under a fixed hidden-information boundary. We evaluate zero-shot LLM sellers against heuristic references over 7,500 episodes. The tested LLMs follow the protocol reliably and achieve deal rates above 0.99, but their seller-profit outcomes remain weak: the best LLM average profit is only slightly above the random baseline and far below a simple concession heuristic under the same episode stream. These results show that structured action compliance and agreement-seeking behavior can coexist with weak profit-sensitive bargaining. PrefBench provides a controlled benchmark for evaluating pricing-agent behavior under hidden buyer preferences.
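The strict-JSON action constraint mentioned in the abstract can be pictured with a small validator. This is a minimal sketch under stated assumptions: the action names (`offer`, `accept`, `walkaway`), the `price` key, and the function `parse_seller_action` are hypothetical illustrations, not PrefBench's actual schema or API.

```python
import json
import math

# Hypothetical action vocabulary and keys; PrefBench's real schema may differ.
ALLOWED_ACTIONS = {"offer", "accept", "walkaway"}
ALLOWED_KEYS = {"action", "price"}


def parse_seller_action(raw: str) -> dict:
    """Parse a model reply as exactly one JSON action, or raise ValueError."""
    try:
        obj = json.loads(raw)  # rejects surrounding prose and trailing text
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("action must be a JSON object")
    extra = set(obj) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    action = obj.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "offer":
        price = obj.get("price")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            raise ValueError("an offer needs a numeric price")
        if not math.isfinite(price) or price <= 0:
            raise ValueError("price must be a positive finite number")
    elif "price" in obj:
        raise ValueError("only offers may carry a price")
    return obj
```

A validator of this kind only checks that a reply is well-formed. It says nothing about whether the offered price is profitable, which is the gap the abstract reports: high protocol compliance and deal rates alongside weak seller profit.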


2605.22853 2026-05-25 eess.SP cs.LG q-bio.QM 版本更新

Topological Signal Processing: An Application-Oriented Tutorial

拓扑信号处理：面向应用的教程

Flavia Petruso, Maria Giulia Preti, Dimitri Van De Ville

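Affiliations: Neuro-X Institute, École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland

AI summary: This tutorial introduces the foundations of Topological Signal Processing (TSP) and how to apply them in practice. TSP extends classical Graph Signal Processing (GSP) to signals defined on nodes, edges, triangles, and higher-order network structures, and uses tools such as the combinatorial Hodge Laplacian to analyze higher-order interactions in complex systems. Using brain imaging as a case study, the article shows how TSP can reveal nontrivial interactions between sets of brain regions, with the aim of broadening its use in both theoretical and applied research.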

详情

AI中文摘要

许多现代数据集规模庞大且具有复杂的结构关系。传统上，基于图的方法用于表示网络数据，将个体元素建模为节点，将成对交互建模为边。此外，图信号处理（GSP）已被开发用于分析图节点上的信号，例如全国不同地区的温度测量值（节点信号）表示为图。拓扑信号处理（TSP）是一个新兴领域，它推广了GSP，使得不仅可以分析节点上的信号，还可以分析边、三角形以及更高维网络元素上的信号，这些元素被建模为单纯复形及相关拓扑结构。这使得TSP通过将滤波和傅里叶变换等经典信号处理概念扩展到拓扑层面，自然适用于研究复杂系统中的高阶交互。尽管TSP具有多功能性，但对许多实践者来说仍然具有挑战性。因此，我们提供了一个易于理解的TSP基础概述，同时与面向应用的场景建立联系。我们重点介绍基于组合Hodge Laplacian的处理技术，该技术将图Laplacian推广到单纯复形。特别地，我们回顾了关键的TSP概念，将其与现实世界的例子联系起来，并讨论了如何从数据集中导出高阶结构和信号。例如，我们引入了一种捕捉节点信号之间滞后交互的边级信号，并在基于TSP的脑成像数据分析案例研究中展示了其应用，揭示了脑区域集合之间的非平凡交互。总体而言，我们旨在通过弥合方法发展与应用程序之间的差距，促进TSP的更广泛采用，推动其在理论和应用研究人员社区中的使用。

英文摘要

Many modern datasets are large and carry complex structural relationships. Graph-based methods have traditionally been used to represent networked data, modeling individual elements as nodes and pairwise interactions as edges. Furthermore, Graph Signal Processing (GSP) has been developed to analyze signals on graph nodes, such as temperature measurements (node signals) across different regions of a country represented as a graph. Topological Signal Processing (TSP) is an emerging field that generalizes GSP, enabling the analysis of signals defined not only on nodes but also on edges, triangles, and higher-dimensional network elements, modeled as simplicial complexes and related topological structures. This makes TSP naturally well-suited for studying higher-order interactions in complex systems by extending classical signal processing concepts, such as filtering and Fourier transforms, to the topological level. Despite its versatility, TSP remains challenging for many practitioners. Therefore, we present an accessible overview of TSP foundations while drawing connections with application-oriented settings. We focus on processing techniques based on the combinatorial Hodge Laplacian, which generalizes the graph Laplacian to simplicial complexes. In particular, we review key TSP concepts, relate them to real-world examples, and discuss how higher-order structures and signals can be derived from datasets. For instance, we introduce an edge-level signal capturing lagged interactions between nodal signals, and demonstrate its use in a case study on TSP-based analysis of brain imaging data, revealing nontrivial interactions between sets of brain regions. Overall, we aim to promote a broader adoption of TSP by bridging methodological developments with applications, fostering its use among a wide community of theoretical and applied researchers.
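The combinatorial Hodge Laplacian named in the abstract can be illustrated on a tiny simplicial complex. The sketch below uses the standard definition of the edge-level (first-order) Hodge Laplacian, L1 = B1^T B1 + B2 B2^T, where B1 and B2 are the node-to-edge and edge-to-triangle boundary matrices. The example complex and the helper names `boundary_1` and `boundary_2` are chosen here for illustration and do not come from the tutorial.

```python
import numpy as np

# Toy complex: 4 nodes, 5 edges oriented from low to high index, 1 filled triangle.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]  # the cycle 1-2-3 is left unfilled, so it forms a hole


def boundary_1(n_nodes, edges):
    """Node-to-edge incidence: -1 at the tail, +1 at the head of each edge."""
    B1 = np.zeros((n_nodes, len(edges)))
    for k, (i, j) in enumerate(edges):
        B1[i, k] = -1.0
        B1[j, k] = 1.0
    return B1


def boundary_2(edges, triangles):
    """Edge-to-triangle incidence: boundary of (a,b,c) is (b,c) - (a,c) + (a,b)."""
    index = {e: k for k, e in enumerate(edges)}
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (a, b, c) in enumerate(triangles):
        B2[index[(b, c)], t] = 1.0
        B2[index[(a, c)], t] = -1.0
        B2[index[(a, b)], t] = 1.0
    return B2


B1 = boundary_1(4, edges)
B2 = boundary_2(edges, triangles)

L0 = B1 @ B1.T               # ordinary graph Laplacian (acts on node signals)
L1 = B1.T @ B1 + B2 @ B2.T   # Hodge 1-Laplacian (acts on edge signals)

# The null space of L1 holds harmonic edge flows; its dimension counts holes.
eigvals = np.linalg.eigvalsh(L1)
n_harmonic = int(np.sum(np.abs(eigvals) < 1e-9))
```

In this example `B1 @ B2` is the zero matrix (the boundary of a boundary vanishes), and the single unfilled cycle 1-2-3 gives exactly one harmonic edge mode. The eigenvectors of `L1` play the role for edge signals that graph Fourier modes play for node signals.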


2605.22852 2026-05-25 cs.DB cs.AI cs.LG cs.LO 版本更新

Expressive Power of Deep Homomorphism Networks over Relational Databases

关系数据库上深度同态网络的表达能力

Moritz Schönherr, Balder ten Cate, Maurice Funk, Benny Kimelfeld, Carsten Lutz, Arie Soeteman

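Affiliations: University of Amsterdam; Leipzig University; Technion; RelationalAI

AI summary: This paper studies the expressive power of Deep Homomorphism Networks (DHNs) over relational databases and relates it to first-order logic (FO) and its extensions. Depending on the aggregation function (max, sum, or mean), DHNs are matched to the unary negation fragment and to its extensions with counting and ratio quantifiers, which delimits what each variant can express. Through the classical correspondence between FO and SQL, the results also clarify how DHNs relate to SQL, and they support a decidability analysis of static analysis problems for DHNs. Experiments confirm that the differences in expressive power show up in performance on suitable prediction tasks.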

详情

AI中文摘要

消息传递图神经网络（GNN）的表达能力限制促使了更强大的图学习架构的发展。我们主张深度同态网络（DHN）作为一种特别适合在关系数据库上学习的模型，因为它与SQL的重要片段（如合取查询）有密切联系。我们通过将DHN与一阶逻辑（FO）的各种自然片段和扩展相关联，研究了DHN的精确表达能力。对于具有max、sum和mean聚合的DHN，我们建立了与一元否定片段（UNFO）以及带有计数量词和比例量词的UNFO扩展的联系。我们进一步将sum聚合DHN与FO的一元量词交替片段以及带有表达性计数的FO扩展相关联。通过FO与SQL之间的经典对应关系，这些结果也阐明了DHN与SQL之间的关系。它们还使我们能够研究DHN的两个基本静态分析问题——空问题和包含问题——的可判定性。最后，我们通过实验证实，表达能力的差异在合适的预测任务性能上得到了体现。

英文摘要

The expressive limitations of message-passing Graph Neural Networks (GNNs) have motivated a wide range of more powerful graph learning architectures. We advocate Deep Homomorphism Networks (DHNs) as a model particularly well-suited for learning over relational databases, due to their close connection to important fragments of SQL such as conjunctive queries. We study the precise expressive power of DHNs by relating them to various natural fragments and extensions of first-order logic (FO). For DHNs with max, sum, and mean aggregations, we establish connections to the unary negation fragment (UNFO) and to the extensions of UNFO with counting quantifiers and with ratio quantifiers. We further relate sum-aggregation DHNs to the unary quantifier alternation fragment of FO and to an extension of FO with expressive counting. Through the classical correspondence between FO and SQL, these results also illuminate the relation between DHNs and SQL. They also enable us to study the decidability of two fundamental static analysis problems for DHNs, the emptiness problem and the subsumption problem. Finally, we confirm through experiments that the established differences in expressive power are reflected in the performance on suitable prediction tasks.


2605.22851 2026-05-25 eess.SP cs.LG eess.IV 版本更新

VAMP-Diff: VampPrior Latent Diffusion for Photoplethysmography Modeling

VAMP-Diff: 用于光电容积描记法建模的VampPrior潜扩散模型

Fatemeh Ghasemi Balouei, Nathan Willemsen, Mahesh Banavar, Bahman Moraffah

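Affiliations: Department of Electrical and Computer Engineering; Clarkson University; Department of Computer Science; Worcester Polytechnic Institute

AI summary: This paper proposes VAMP-Diff, a variational latent diffusion model for generating and reconstructing photoplethysmography (PPG) signals. It combines a temporal encoder, a conditional diffusion decoder, and VampPrior regularization so that the latent space better preserves physiological features such as heart rate and respiratory rate and the generated PPG waveforms have more realistic morphology. Experiments show better reconstruction and better retention of physiological information than Gaussian-prior models.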

Comments Submitted to the 2026 Asilomar Conference on Signals, Systems, and Computers. 12 pages, 6 figures

详情

AI中文摘要

光电容积描记法（PPG）已成为一种普遍存在的生理信号；然而，当前的生成模型仍然难以保留真实的波形形态并学习捕捉心脏和呼吸生理的潜在结构。使用对抗损失训练的PPG生成器可以产生合理的波形，但无法提供从真实信号到潜在表示的推理路径。另一方面，变分自编码器将PPG数据映射到潜在编码，尽管它们的解码器常常模糊收缩上升波并削弱幅度和频谱细节。扩散模型提高了波形保真度，但通常缺乏用于重建和生理分析的推理路径。我们提出了VampPrior潜扩散（VAMP-Diff），一种联合训练的变分扩散模型，结合了时间PPG编码器、条件一维扩散解码器以及紧凑池化潜在上的VampPrior正则化。该模型在扩散重建期间使用完整的时间潜在，使解码器能够访问心跳时序和形态，同时从学习的VampPrior组件而非固定高斯先验生成样本。我们在CapnoBase数据集上证明，VAMP-Diff生成逼真的PPG信号，重建比高斯先验基线更清晰的生理波形，保留心率信息，维持呼吸率一致性，并通过重建误差对波形损坏敏感。

英文摘要

Photoplethysmography (PPG) has become a ubiquitous physiological signal; however, current generative models still struggle to preserve realistic waveform morphology and learn a latent structure that captures cardiac and respiratory physiology. PPG generators trained with adversarial losses can produce plausible waveforms, but provide no inference path from a real signal to a latent representation. Variational autoencoders, on the other hand, map the PPG data to latent codes, although their decoders often blur systolic upstrokes and dampen amplitude and spectral details. Diffusion models improve waveform fidelity, but typically lack an inference path for reconstruction and physiological analysis. We propose VampPrior Latent Diffusion (VAMP-Diff), a jointly trained variational diffusion model that combines a temporal PPG encoder, a conditional one-dimensional diffusion decoder, and VampPrior regularization on a compact pooled latent. The model uses full temporal latent during diffusion reconstruction, giving the decoder access to beat timing and morphology while generating samples from learned VampPrior components instead of a fixed Gaussian prior. We demonstrate on the CapnoBase dataset that VAMP-Diff produces realistic PPG signals, reconstructs sharper physiological waveforms than Gaussian-prior baselines, preserves heart-rate information, maintains respiratory-rate consistency, and is sensitive to waveform corruptions through reconstruction error.
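The difference between a VampPrior and a fixed Gaussian prior can be sketched in a few lines. In the general VampPrior construction, the prior is a uniform mixture of the encoder's own posteriors evaluated at K learnable pseudo-inputs, p(z) = (1/K) Σ_k q(z | u_k). The sketch below assumes a toy linear encoder (`toy_encoder`, with weights `W` and `b`), which is a stand-in for illustration and not the VAMP-Diff temporal encoder.

```python
import numpy as np


def toy_encoder(u, W, b):
    """Hypothetical encoder: maps inputs to the mean and log-variance of q(z | u)."""
    h = u @ W + b
    d = h.shape[-1] // 2
    return h[..., :d], h[..., d:]


def log_normal_diag(z, mean, log_var):
    """Log density of a diagonal Gaussian, summed over latent dimensions."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mean) ** 2 / np.exp(log_var),
        axis=-1,
    )


def vamp_log_prior(z, pseudo_inputs, W, b):
    """log p(z) for a uniform mixture of encoder posteriors at the pseudo-inputs."""
    mean, log_var = toy_encoder(pseudo_inputs, W, b)       # each of shape (K, d)
    comp = log_normal_diag(z[None, :], mean, log_var)      # shape (K,)
    m = comp.max()                                         # stable log-mean-exp
    return float(m + np.log(np.mean(np.exp(comp - m))))
```

With all-zero encoder weights every component collapses to a standard normal, so the mixture reduces to the usual Gaussian prior. Training the pseudo-inputs is what lets the prior become multimodal and follow the structure of the encoded data.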


2605.22848 2026-05-25 cs.CE cs.LG q-bio.OT 版本更新

From Simulation to Discovery: AI Enabled Probabilistic Emulation of Mechanistic Crop Systems

从模拟到发现：AI驱动的机理作物系统概率仿真

Mojdeh Saadati, Juan Panelo, Gustavo Visentini, Soumik Sarkar, Carlos Messina, Baskar Ganapathysubramanian

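Affiliations: Department of Mathematics and Department of Computer Science, Iowa State University; Department of Horticultural Sciences, University of Florida; Department of Mechanical Engineering, and Translational AI Center, Iowa State University

AI summary: This study proposes an AI-based probabilistic neural emulator for efficiently simulating crop growth, addressing the high computational cost of traditional crop models. Trained on a large set of simulations spanning diverse conditions and paired with a physically consistent weather generator, the emulator keeps high predictive accuracy while greatly speeding up simulation, which allows rapid exploration of crop responses across genotypes, environments, and management conditions. The study identifies maize trait combinations that sustain high yield across many conditions and finds that radiation use efficiency and temperature-driven root dynamics are key drivers of yield resilience, illustrating the approach's potential for research on agricultural adaptation to climate change.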

详情

AI中文摘要

全球粮食安全依赖于预测作物对气候变异的响应，但基于过程的作物模型对于大规模探索基因型和环境相互作用而言计算成本过高。本文开发了APSIM的概率神经仿真器，该仿真器在13个输出上以高保真度（R²=0.93）再现了关键玉米生长过程，同时将模拟时间降低了数个数量级。该框架在涵盖多样化遗传、土壤和管理条件的200万次模拟上训练，并辅以卷积合成天气生成器以产生物理一致的气候序列，从而能够在现实且多样化的环境输入下进行可扩展的作物响应探索，同时提供校准的预测不确定性，无需昂贵的贝叶斯推断。将该框架应用于10万个性状配置、爱荷华州和伊利诺伊州的六种土壤环境以及两种排放情景下直至2100年的气候预测，我们识别出181种在所有测试条件下均能持续保持高产的玉米性状组合——这一分析仅靠机理模型是无法实现的。我们进一步表明，辐射利用效率和温度驱动的根系动态是产量韧性的主要驱动因素。值得注意的是，预测的产量分布在不同地点间差异显著，一些低生产力地点在未来气候情景下产量增加，表明气候变化可能以非直观的方式重塑区域产量潜力。这些结果证明了不确定性感知仿真如何将机理作物模拟从计算瓶颈转变为按需发现引擎，其能够以任何基于过程的模型无法比拟的规模探索完整的基因型、环境和管理空间。

英文摘要

Global food security depends on predicting crop responses to climate variability, yet process-based crop models remain too computationally expensive for large-scale exploration of genotype and environment interactions. Here we develop a probabilistic neural emulator of APSIM that reproduces key maize growth processes across 13 outputs with high fidelity (R^2 = 0.93) while reducing simulation time by several orders of magnitude. Trained on two million simulations spanning diverse genetic, soil, and management conditions, and augmented with a convolutional synthetic weather generator that produces physically consistent climate sequences, the framework enables scalable exploration of crop responses under realistic and diverse environmental inputs while providing calibrated predictive uncertainty without costly Bayesian inference. Applying this framework across 100,000 trait configurations, six soil environments in Iowa and Illinois, and climate projections through the year 2100 under two emissions scenarios, we identify 181 maize trait combinations that consistently maintain high yield across all tested conditions, an analysis infeasible with the mechanistic model alone. We further show that radiation use efficiency and temperature-driven root dynamics are dominant drivers of yield resilience. Notably, projected yield distributions vary substantially across locations, with some lower-productivity sites exhibiting yield increases under future climate scenarios, indicating that climate change may reshape regional yield potential in nonintuitive ways. These results demonstrate how uncertainty-aware emulation transforms mechanistic crop simulation from a computational bottleneck into an on-demand discovery engine, one capable of interrogating the full genotype, environment, and management space at a scale no process-based model can match.


2605.22842 2026-05-25 cs.CR cs.AI cs.LG 版本更新

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

归因偏差：当记忆中毒在自主AI系统中看起来像模型失败时

Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala, Syed Bahauddin Alam, Sajedul Talukder

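Affiliations: Department of Computer Science, University of Texas at El Paso; School of Computing, Southern Illinois University Carbondale; University of Illinois Urbana-Champaign

AI summary: This paper identifies a structural flaw in multi-agent AI systems, the Misattribution Gap: behaviors caused by memory-layer attacks are hard to distinguish from model failure, so defenders misdiagnose the root cause. It formalizes Semantic Norm Drift (SND) as a third path to agent misconduct, distinct from model misalignment and collusion, in which a malicious document passes through a trust-laundering chain and resurfaces as trusted system content. The paper introduces Counterfactual Composition Testing to identify the attack's entry point and proposes Memory-Persistent Information-Flow Control, which markedly improves system security.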

Comments This paper is presently under review at a top-tier security venue

详情

AI中文摘要

多智能体AI流水线通常假设智能体不当行为源于模型失配。我们识别了该假设中的一个结构性缺陷，即“归因偏差”，其中记忆层攻击产生与模型失败无法区分的行为，导致防御者应用错误的补救措施。我们将“语义规范漂移”（SND）形式化为智能体不当行为的第三条路径，区别于新兴失配和共谋。在SND中，一份策略格式的文档通过正常上传进入共享向量存储，并在通过信任洗钱链丢失来源后重新作为受信任的系统上下文出现。在64个记录在案的失败中，归因系统一致地指责模型。四个安全分类器，包括一个在记忆中毒上训练的，在510个检查点中产生了零检测。在65个有效案例中的59个中，智能体在服从前明确引用注入的文档作为规范权威。该攻击不需要触发器、模型访问或重复交互，在五个会话内达到完全效果，并无限期持续。我们引入了反事实组合测试，它以87.5%的准确率和零误报识别因果入口，而取证基线在所有25个场景中均失败。我们进一步证明了检索-覆盖困境，表明更强的规避本质上削弱了攻击，限制了自适应绕过策略。最后，我们提出了记忆持久信息流控制，它在跨会话边界阻止了97%的攻击，而先前的防御在此处失败。我们发布了SND语料库，这是第一个具有时间持久性和跨金融与医疗保健领域多智能体组合的对抗性记忆基准。

英文摘要

Multi-agent AI pipelines typically assume that agent misconduct originates from model misalignment. We identify a structural failure in this assumption, the Misattribution Gap, where memory-layer attacks produce behaviors indistinguishable from model failure, causing defenders to apply the wrong remediation. We formalize Semantic Norm Drift (SND) as a third path to agent misconduct, distinct from emergent misalignment and collusion. In SND, a policy-formatted document enters a shared vector store through normal uploads and later reappears as trusted system context after provenance is lost through a Trust Laundering Chain. Across 64 documented failures, attribution systems consistently blamed the model. Four safety classifiers, including one trained on memory poisoning, produced zero detections across 510 checkpoints. In 59 of 65 valid cases, agents explicitly cited the injected document as normative authority before complying. The attack requires no trigger, model access, or repeated interaction, achieves full effect within five sessions, and persists indefinitely. We introduce Counterfactual Composition Testing, which identifies the causal entry with 87.5% accuracy and zero false positives, while a forensics baseline fails across all 25 scenarios. We further prove the Retrieval-Coverage Dilemma, showing that stronger evasion inherently weakens the attack, limiting adaptive bypass strategies. Finally, we propose Memory-Persistent Information-Flow Control, which blocks 97% of attacks at the cross-session boundary where prior defenses fail. We release the SND Corpus, the first adversarial memory benchmark with temporal persistence and multi-agent composition across financial and healthcare domains.
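The abstract does not describe how Counterfactual Composition Testing works internally. As a rough intuition for counterfactual attribution over memory, the sketch below uses plain leave-one-out ablation as a stand-in: re-run the agent with each memory entry removed and see which removal makes the misconduct disappear. The toy agent, the memory format, and all function names are hypothetical and are not the authors' method.

```python
def run_agent(memory):
    """Toy agent: obeys any stored entry that looks like a policy statement."""
    for entry in memory:
        if entry.startswith("POLICY:") and "skip verification" in entry:
            return "transfer_without_verification"
    return "verify_then_transfer"


def is_misconduct(action):
    """Toy misconduct check for the toy agent above."""
    return action == "transfer_without_verification"


def find_causal_entries(memory, run_agent, is_misconduct):
    """Return indices of entries whose removal alone eliminates the misconduct."""
    if not is_misconduct(run_agent(memory)):
        return []
    causal = []
    for i in range(len(memory)):
        ablated = memory[:i] + memory[i + 1:]   # counterfactual memory without entry i
        if not is_misconduct(run_agent(ablated)):
            causal.append(i)
    return causal
```

The point of the toy is the diagnosis rather than the mechanism: if removing one memory entry restores correct behavior while the model is unchanged, the fault lies in memory and retraining the model would be the wrong remediation.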


2605.22837 2026-05-25 physics.geo-ph cs.LG eess.SP 版本更新

Evaluating PhaseNet on Teleseismic Data with MsPASS

使用 MsPASS 评估 PhaseNet 在远震数据上的表现

Jinxin Ma, Yinzhi Wang, Gary L. Pavlis, Chenbo Yin

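Affiliations: Texas Advanced Computing Center, The University of Texas at Austin; Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, IN 47405

AI summary: This paper examines how the machine-learning phase picker PhaseNet performs on teleseismic data and presents a reproducible MsPASS-based workflow for processing large seismic datasets and for PhaseNet training and inference. Using a control dataset of 1.6 million teleseismic P-wave waveforms, the study finds that a PhaseNet model trained on regional data performs poorly on teleseismic data, whereas a model retrained from scratch on this dataset shows markedly better recall and timing accuracy for P-wave picks. Experiments also show that enlarging the model improves performance but sharply reduces inference throughput, especially on CPUs.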

详情

AI中文摘要

大量研究表明，机器学习拾取器 PhaseNet 在本地地震信号上能产生准确的 P 波和 S 波拾取，但其在远震信号上的性能会急剧下降。为解决这一局限，我们提出了一个可重现的 MsPASS 工作流，该工作流 (i) 支持大规模地震档案的可扩展数据准备和管理，(ii) 支持标准化的 PhaseNet 训练和推理。我们构建了一个包含 160 万条波形的控制数据集，这些波形与 USArray 阵列网络设施 (ANF) 分析人员做出的远震 P 波拾取相关联。控制数据集证实，在区域信号上训练的 PhaseNet 模型在这些数据上表现不佳。然后，我们在 ANF 控制数据集的训练集上从头训练 PhaseNet，并在不重叠的保留测试集上评估，将 P 波拾取召回率提高了 741.5%，并在 0.1 秒残差窗口内产生了 683.9% 更多的拾取。我们还评估了不同模型大小的 PhaseNet 在 CPU 和 GPU 上的表现。将模型大小增加约 120 倍，精度和召回率分别提高了 15.6% 和 23.2%。然而，缩放后的模型在 NVIDIA A100 GPU 上推理吞吐量降低了 87.2%，在 128 核高性能 CPU 节点上降低了 97.3%。这些结果表明，在 GPU 上缩放 PhaseNet 比在 CPU 上更实用，并且简单地扩大模型并不是实现大幅精度提升的有效方法。

英文摘要

Numerous studies have shown that the machine-learning picker PhaseNet produces accurate P and S picks on local earthquake signals, but its performance can degrade sharply on teleseismic signals. To address this limitation, we present a reproducible MsPASS workflow that (i) enables scalable data preparation and management for large seismic archives and (ii) supports standardized PhaseNet training and inference. We assembled a control dataset of 1.6 million waveforms linked to teleseismic P-wave picks made by analysts at the USArray Array Network Facility (ANF). The control dataset confirms that the PhaseNet model trained on regional signals performs poorly on these data. We then trained PhaseNet from scratch on the training split of the ANF control dataset and evaluated it on a non-overlapping held-out test split, increasing P-pick recall by 741.5% and yielding 683.9% more picks within a 0.1s residual window. We also evaluated PhaseNet across different model sizes on both CPUs and GPUs. Increasing the model size by about 120 times improved precision and recall by 15.6% and 23.2%, respectively. However, the scaled model reduced inference throughput by 87.2% on an NVIDIA A100 GPU and by 97.3% on a 128-core high-performance CPU node. These results indicate that scaling PhaseNet is more practical on GPUs than on CPUs, and that simply enlarging the model is not an efficient way to achieve large accuracy gains.
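The percentages in the abstract are easier to read as multiplicative factors: a 741.5% increase in recall means 8.415 times the baseline recall, and an 87.2% throughput reduction leaves 12.8% of the baseline throughput (2.7% for the 97.3% reduction on the CPU node). The sketch below shows that conversion together with one simple way to score picks against analyst picks within a residual window. The matching rule (a reference pick counts as recovered if any predicted pick lies within the tolerance) is an assumption made here, not necessarily the paper's evaluation protocol.

```python
def recall_within_tolerance(reference_picks, predicted_picks, tol=0.1):
    """Fraction of reference picks with some predicted pick within tol seconds."""
    if not reference_picks:
        raise ValueError("need at least one reference pick")
    hits = 0
    for t_ref in reference_picks:
        if any(abs(t_pred - t_ref) <= tol for t_pred in predicted_picks):
            hits += 1
    return hits / len(reference_picks)


def percent_increase_to_factor(pct):
    """A +pct% change expressed as a multiple of the baseline."""
    return 1.0 + pct / 100.0


def percent_reduction_to_factor(pct):
    """A -pct% change expressed as a multiple of the baseline."""
    return 1.0 - pct / 100.0
```

For instance, `recall_within_tolerance([10.0, 20.0, 30.0], [10.05, 20.3])` counts only the first reference pick as recovered, because the second prediction is 0.3 s off and the third reference pick has no nearby prediction.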


2605.22836 2026-05-25 physics.geo-ph cs.LG 版本更新

Real-Time Earthquake Magnitude Classification from Initial P-Waves: Models, Dataset, and Comparative Analysis for South Asia

基于初始P波的实时地震震级分类：南亚地区的模型、数据集与比较分析

Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib, Rayhanul Amin Tanvir, Abdullah Al Noman

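Affiliations: Department of Computer Science and Engineering; Shahjalal University of Science and Technology

AI summary: This paper studies real-time classification of earthquake magnitude from the vertical component of the first 7 seconds of the P-wave at a single seismic station, with the goal of faster earthquake early warning. It compares six machine learning approaches, from traditional models to recent deep learning architectures, and builds a new dataset of 7,318 South Asian earthquake events in five Richter-scale classes. A Transformer-based deep learning model achieves the best accuracy with low inference latency and copes well with uncertainty near class boundaries, suggesting that real-time early warning is feasible.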

Comments Accepted for publication in 2025 28th International Conference on Computer and Information Technology (ICCIT). © 2025 IEEE

详情

DOI: 10.1109/ICCIT68739.2025.11489542
Journal ref: 2025 28th International Conference on Computer and Information Technology (ICCIT), Cox's Bazar, Bangladesh, 2025

AI中文摘要

快速地震震级估计对于有效的早期预警系统至关重要，可以挽救生命并减少经济损失。在本文中，我们提出了一项全面的震级分类研究，仅使用来自单个台站的初始7秒P波窗口的垂直分量。我们比较了六种机器学习方法，范围从传统模型到最先进的深度学习架构。我们还整理了一个包含南亚7318个地震事件的新数据集。该数据集分为五个里氏震级类别：轻微（3.0-3.9）、轻度（4.0-4.9）、中等（5.0-5.9）、强烈（6.0-6.9）和严重（>=7.0）。我们的实验表明，深度学习模型显著优于传统方法。我们基于Transformer的架构实现了76.23%的标准准确率和81.56%的自适应准确率，推理延迟为4.8毫秒。自适应准确率指标是针对近类别边界震级估计中固有的不确定性而引入的。这些结果表明，Transformer中的注意力机制与自适应分类相结合，有效地捕捉了地震信号的时间动态。这种架构优势有助于对罕见的高震级事件进行有希望的泛化，尽管地震目录具有固有的数据稀缺性。自适应准确率提供了对模型性能更现实的评估，结果表明了实时部署的可行性。

英文摘要

Rapid earthquake magnitude estimation is crucial for effective early warning systems that can save lives and reduce economic damage. In this paper, we present a comprehensive study of magnitude classification using only the vertical component of the initial 7-second P-wave window from a single station. We compare six machine learning approaches that range from traditional models to state-of-the-art deep learning architectures. We also curated a novel dataset of 7,318 earthquake events in South Asia. The dataset was categorized into five Richter-scale classes: slight (3.0-3.9), light (4.0-4.9), moderate (5.0-5.9), strong (6.0-6.9) and severe (>= 7.0). Our experiments show that deep learning models substantially outperform traditional approaches. Our Transformer-based architecture achieved 76.23% standard accuracy and 81.56% adaptive accuracy with 4.8 ms inference latency. The adaptive-accuracy metric is introduced to account for the inherent uncertainty in magnitude estimation near class boundaries. These results indicate that the attention mechanisms in Transformers combined with adaptive classification effectively capture the temporal dynamics of seismic signals. The architectural advantage facilitates promising generalization to rare high-magnitude events, despite the inherent data scarcity characteristic of seismic catalogs. The adaptive accuracy provides a more realistic assessment of model performance, and the result suggests viability for real-time deployment.
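The five-class labeling in the abstract maps directly to a small binning function. The abstract does not define the adaptive-accuracy metric, so the second function below is only one plausible reading, labeled as an assumption: a prediction in an adjacent class is also accepted when the true magnitude lies within a margin of the boundary between the two classes. The margin value and both function names are hypothetical.

```python
CLASS_NAMES = ["slight", "light", "moderate", "strong", "severe"]
LOWER_EDGES = [3.0, 4.0, 5.0, 6.0, 7.0]  # lower magnitude bound of each class


def magnitude_class(magnitude):
    """Class index for a magnitude: 3.0-3.9 -> 0, ..., 6.0-6.9 -> 3, >= 7.0 -> 4."""
    if magnitude < 3.0:
        raise ValueError("the dataset classes start at magnitude 3.0")
    return min(int(magnitude) - 3, 4)


def adaptive_correct(true_magnitude, predicted_class, margin=0.2):
    """Assumed rule: exact match, or adjacent class when close to the shared boundary."""
    true_class = magnitude_class(true_magnitude)
    if predicted_class == true_class:
        return True
    if abs(predicted_class - true_class) != 1:
        return False
    boundary = LOWER_EDGES[max(predicted_class, true_class)]
    return abs(true_magnitude - boundary) <= margin
```

Under this reading, a magnitude 4.9 event predicted as moderate would count as correct, while a magnitude 4.5 event predicted as moderate would not. Any such relaxation can only raise the score relative to standard accuracy, which is consistent with the reported 81.56% versus 76.23%.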


2605.22833 2026-05-25 cs.IR cs.AI cs.LG 版本更新

RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis

RAG4Outcome：用于慢性骨髓炎预后预测的检索增强多模态框架

Daqian Shi, Pei Han, Jishizhan Chen, Yang Wang, Xiaolei Diao, Xianyou Zheng, Pengfei Cheng

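Affiliations: Queen Mary University of London; Shanghai Sixth People’s Hospital Affiliated to SJTU School of Medicine; University College London

AI summary: Chronic osteomyelitis is hard to prognosticate because of its high recurrence risk and complex postoperative recovery. Traditional assessment relies on manual scoring systems, which scale poorly and suffer from limited efficiency and consistency. This paper proposes RAG4Outcome, a multimodal framework based on retrieval-augmented generation (RAG) that integrates PET-CT imaging reports, structured surgical and diagnostic records, and unstructured follow-up notes, and combines a domain-specific retrieval corpus with expert-guided prompting to produce more interpretable, evidence-grounded, and clinically reliable prognostic predictions. Preliminary results on real-world cases indicate promising effectiveness and clinical alignment.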

详情

AI中文摘要

慢性骨髓炎因其高复发风险和复杂的术后恢复轨迹而面临巨大的预后挑战。传统评估通常依赖于手动评分系统，这限制了临床实践中的可扩展性、效率和一致性。此外，临床数据的异质性对当前需要对齐输入和大量标注数据集的多模态学习方法构成了挑战。在这项工作中，我们提出了RAG4Outcome，一个用于慢性骨髓炎预后预测的检索增强生成（RAG）框架。我们的方法将多模态临床数据（包括PET-CT影像报告、结构化手术和诊断记录以及非结构化随访笔记）整合到一个统一的预测流程中。通过结合领域特定的检索语料库和专家引导的提示，该框架实现了更可解释、基于证据且临床可靠的预后。在真实世界病例上的初步结果显示了有希望的有效性和临床一致性，突显了RAG4Outcome在AI辅助感染管理和术后决策支持方面的潜力。

英文摘要

Chronic osteomyelitis presents substantial prognostic challenges due to its high recurrence risk and complex postoperative recovery trajectories. Traditional assessment often relies on manual scoring systems, which limit scalability, efficiency, and consistency in clinical practice. Furthermore, the heterogeneous nature of clinical data poses challenges for current multimodal learning approaches that require aligned inputs and large annotated datasets. In this work, we propose RAG4Outcome, a retrieval-augmented generation (RAG) framework for prognostic prediction in chronic osteomyelitis. Our method integrates multimodal clinical data, including PET-CT imaging reports, structured surgical and diagnostic records, and unstructured follow-up notes, into a unified prediction pipeline. By combining a domain-specific retrieval corpus with expert-guided prompting, the framework enables more interpretable, evidence-grounded, and clinically reliable prognosis. Preliminary results on real-world cases demonstrate promising effectiveness and clinical alignment, highlighting the potential of RAG4Outcome for AI-assisted infection management and postoperative decision support.


2605.17212 2026-05-25 cs.LG 版本更新

Anytime PAC-Bayes for Constrained Density-Ratio Networks under Covariate Shift

协变量偏移下约束密度比网络的任意时间PAC-Bayes

Paulo Akira F. Enabe

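Affiliation: Escola Politécnica, University of São Paulo, Department of Structural and Geotechnical Engineering

AI summary: This paper presents a unified framework for learning under covariate shift in which a constrained density-ratio network estimates the Radon-Nikodym derivative and feeds a PAC-Bayes certificate that gives anytime generalization guarantees. A change-of-measure identity decomposes the gap between target risk and importance-weighted source risk, and an augmented-Lagrangian scheme enforces normalization and moment-matching constraints, with a second-moment penalty controlling the effective sample size. Experiments show calibrated density-ratio estimates on real data and lower target loss than unweighted and classical ratio-estimation baselines.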

详情

AI中文摘要

提出一个在协变量偏移下学习的统一框架，其中约束密度比网络逼近Radon-Nikodym导数 $r^\star = dP/dQ$ 并馈入任意时间PAC-Bayes泛化证书。一个测度变换恒等式将目标风险与重要性加权源风险之间的差距分解为由 $\|r_\theta - r^\star\|_{L^2(Q)}$ 控制的比率偏差项和由加权损失变异性控制的泛化差距项。通过增广拉格朗日方案将归一化和矩匹配恒等式作为硬积分约束强制执行，其中二阶矩惩罚控制有效样本量。PAC-Bayes在固定时间机制下实例化于加权风险，得到Bernoulli-KL界，将网络加权Gibbs后验识别为唯一的KL正则化最小化器，并量化学习比率在 $L^2(Q)$ 扰动下的稳定性，然后通过几何剥离增强为在 $t \geq t_{\min}$ 上一致的任意时间证书。一个预先注册的两阶段协议结合了对解析真实性的补丁测试和真实数据部署，验证了该框架：网络产生校准比率，相对于未加权ERM和经典直接比率估计基线降低了目标0/1损失，并达到了任意时间证书。记录了一次固定时间覆盖失败，每次分割的覆盖与标签偏移幅度一一对应，确认了仅协变量假设在操作上是紧的，而非证书的缺陷。

英文摘要

A unified framework for learning under covariate shift is presented, in which a constrained density-ratio network approximates the Radon-Nikodym derivative $r^\star = dP/dQ$ and feeds an anytime PAC-Bayes generalization certificate. A change-of-measure identity decomposes the gap between target risk and importance-weighted source risk into a ratio-bias term governed by $\|r_\theta - r^\star\|_{L^2(Q)}$ and a generalization-gap term governed by the variability of the weighted loss. Normalization and moment-matching identities are enforced as hard integral constraints through an augmented-Lagrangian scheme, with a second-moment penalty controlling the effective sample size. PAC-Bayes is instantiated on the weighted risk in a fixed-time regime that yields Bernoulli-KL bounds, identifies the network-weighted Gibbs posterior as the unique KL-regularized minimizer, and quantifies stability under $L^2(Q)$ perturbations of the learned ratio, and is then strengthened by geometric peeling to an anytime certificate uniform in $t \geq t_{\min}$. A pre-registered two-campaign protocol combining a patch test against analytic ground truth with a real-data deployment validates the framework: the network produces calibrated ratios, reduces target $0/1$ loss against unweighted ERM and classical direct ratio-estimation baselines, and attains the anytime certificate. A single fixed-time coverage failure is recorded, with per-split coverage aligning one-to-one with the magnitude of the label shift, confirming that the covariate-only assumption is operationally tight rather than a defect of the certificate.
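Two quantities in the abstract have simple empirical counterparts: the importance-weighted source risk and the effective sample size that the second-moment penalty controls. The sketch below uses the Kish effective sample size, n_eff = (Σ w_i)^2 / Σ w_i^2, which equals n / mean(w^2) once the weights are normalized to mean 1, so a smaller second moment directly means a larger n_eff. The Gaussian mean-shift example, for which the exact ratio is exp(μx - μ²/2), is an illustration chosen here and is not taken from the paper.

```python
import numpy as np


def gaussian_shift_ratio(x, mu):
    """Exact ratio dP/dQ for target P = N(mu, 1) and source Q = N(0, 1)."""
    return np.exp(mu * x - 0.5 * mu ** 2)


def self_normalize(weights):
    """Rescale weights so their empirical mean is 1 (the normalization identity)."""
    w = np.asarray(weights, dtype=float)
    return w / w.mean()


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


def weighted_risk(losses, weights):
    """Importance-weighted empirical risk on source samples."""
    return float(np.mean(np.asarray(weights, dtype=float) * np.asarray(losses, dtype=float)))
```

Uniform weights give n_eff equal to the sample count, while a single dominant weight drives n_eff toward 1. That is why heavy-tailed ratios make the weighted risk noisy and why the generalization-gap term depends on the variability of the weighted loss.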


2602.20102 2026-05-25 cs.LG cs.AI 版本更新

BarrierSteer: LLM Safety via Learning Barrier Steering

BarrierSteer: 通过学习障碍引导实现大语言模型安全

Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao

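Affiliations: Department of Computer Science, National University of Singapore; Singapore-MIT Alliance for Research and Technology Centre; CSAIL, Massachusetts Institute of Technology; Worcester Polytechnic Institute

AI summary: Large language models (LLMs) perform well across many tasks, but their susceptibility to adversarial attacks and unsafe content generation remains a major obstacle to deployment, particularly in high-stakes settings. This paper proposes BarrierSteer, an inference-time framework that improves response safety by embedding learned nonlinear safety constraints in the model's latent representation space. It treats hidden-state safety classifiers as Control Barrier Functions (CBFs) and steers unsafe latent trajectories toward satisfying the safety constraints during generation, improving safety without modifying model parameters. Experiments across multiple models and datasets support its effectiveness.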

Comments This paper introduces SafeBarrier, a framework that enforces safety in large language models by steering their latent representations with control barrier functions during inference, reducing adversarial and unsafe outputs

详情

AI中文摘要

尽管大型语言模型（LLMs）在各种任务中表现出色，但它们对对抗性攻击和不安全内容生成的敏感性仍然是部署的重大障碍，尤其是在高风险场景中。解决这一挑战需要既实际有效又有理论依据的安全机制。在本文中，我们介绍了 BarrierSteer，一种新颖的推理时框架，通过将学习到的非线性安全约束直接嵌入模型的潜在表示空间来提高响应安全性。BarrierSteer 将隐藏状态安全分类器视为控制障碍函数（CBFs），从而在生成过程中引导不安全的潜在轨迹。通过有效的约束合并组合多个安全约束，而不修改底层 LLM 参数，BarrierSteer 保持了模型效用。我们提供的理论结果表明，在潜在空间中应用 CBFs 提供了一种有原则、模块化且计算高效的方法，用于根据学习到的安全约束进行引导，并保证学习到的障碍能够捕捉预期的安全属性。我们在多个模型系列和数据集上的广泛实验结果表明，BarrierSteer 显著降低了对抗性攻击成功率和有害生成，优于现有方法。代码可在我们的 GitHub 仓库中获取。

英文摘要

Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in high-stakes settings. Addressing this challenge requires safety mechanisms that are both practically effective and theoretically grounded. In this paper, we introduce BarrierSteer, a novel inference-time framework that improves response safety by embedding learned nonlinear safety constraints directly into the model's latent representation space. BarrierSteer treats hidden-state safety classifiers as Control Barrier Functions (CBFs), enabling constraint-guided steering of unsafe latent trajectories during generation. By composing multiple safety constraints through efficient constraint merging without modifying the underlying LLM parameters, BarrierSteer preserves model utility. We provide theoretical results showing that applying CBFs in the latent space yields a principled, modular, and computationally efficient approach for steering with respect to learned safety constraints, with guarantees conditional on the learned barriers capturing the intended safety property. Our extensive experimental results across multiple model families and datasets demonstrate that BarrierSteer substantially reduces adversarial attack success rates and unsafe generations, outperforming the existing method. The code is available in our GitHub repository (https://github.com/thanhquangtran/BarrierSteer).
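The idea of steering a hidden state back into a safe set can be shown in its simplest form. The sketch below assumes a single linear barrier h(z) = w·z + b, with h(z) >= 0 meaning safe, and applies the minimum-norm correction that restores the constraint. This is a deliberate simplification for illustration: BarrierSteer uses learned nonlinear barriers and merges several constraints, and the names `barrier` and `steer` are hypothetical.

```python
import numpy as np


def barrier(z, w, b):
    """Linear barrier value; non-negative values are treated as safe."""
    return float(np.dot(w, z) + b)


def steer(z, w, b, margin=0.0):
    """Smallest L2 change to z that achieves barrier(z) >= margin."""
    h = barrier(z, w, b)
    if h >= margin:
        return z.copy()                       # already safe: leave the state alone
    # Move along w just far enough to land exactly on the margin.
    return z + ((margin - h) / np.dot(w, w)) * w
```

Safe states pass through unchanged, which reflects how the abstract describes preserving model utility, and unsafe states are moved only as far as needed. With a nonlinear barrier the same step would be applied to a local linearization, which is where the conditional guarantees mentioned in the abstract come into play.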


2601.21306 2026-05-25 cs.LG cs.AI 版本更新

The Surprising Difficulty of Search in Model-Based Reinforcement Learning

基于模型的强化学习中搜索的惊人困难

Wei-Di Chang, Mikael Henaff, Brandon Amos, Gregory Dudek, Scott Fujimoto

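Affiliations: Meta FAIR; McGill University

AI summary: This paper studies search in model-based reinforcement learning. The conventional view holds that long-horizon prediction and compounding error are the main obstacles, but the authors find that search is not a plug-and-play replacement for a learned policy and can hurt performance even when the model is highly accurate. They argue that mitigating overestimation bias matters more than improving model or value-function accuracy, and that taking the minimum over an ensemble of value functions addresses this bias, enabling effective search and leading performance on multiple benchmark tasks.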

Comments ICML 2026
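The summary for this entry credits a minimum over an ensemble of value functions with curbing overestimation during search. A minimal sketch of that scoring rule follows, assuming the ensemble's estimates for each candidate action are already available as an array. The array layout and function names are illustrative and are not the paper's implementation.

```python
import numpy as np


def pessimistic_values(q_ensemble):
    """Element-wise minimum across ensemble members (rows) for each action (columns)."""
    return np.min(np.asarray(q_ensemble, dtype=float), axis=0)


def select_action(q_ensemble, pessimistic=True):
    """Pick the best action under min-aggregation or, for contrast, mean-aggregation."""
    q = np.asarray(q_ensemble, dtype=float)
    scores = q.min(axis=0) if pessimistic else q.mean(axis=0)
    return int(np.argmax(scores))
```

With estimates `[[1.0, 5.0], [1.2, 0.1]]`, mean aggregation prefers the second action because one member rates it very highly, while min aggregation prefers the first because the members disagree sharply about the second. Search maximizes over many candidates, so it tends to find exactly such optimistic outliers, which is why a pessimistic aggregate can matter more than raw accuracy.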

2311.01468 2026-05-25 cs.CL cs.LG 版本更新

Remember what you did so you know what to do next

记住你做了什么，以便知道下一步该做什么

Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel

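Affiliation: Information Sciences Institute, University of Southern California

AI summary: This paper uses a moderately sized large language model (GPT-J, 6 billion parameters) to plan for a simulated robot in ScienceWorld across 30 classes of science-experiment goals. Supplying more prior steps as context lets the model outperform a reinforcement-learning-based approach by up to 3.5x. The study also notes that performance varies widely across task classes, so averages can hide specific problems, and that training on only 6.5% of the data still yields a 2.2x improvement.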

Comments Identical to EMNLP 2023 Findings

详情

DOI: 10.18653/v1/2023.findings-emnlp.104

AI中文摘要

我们探索使用中等规模的大语言模型（GPT-J，6B参数）为模拟机器人在ScienceWorld（一个用于基础科学实验的文本游戏模拟器）中制定计划，以实现30类目标。先前发表的实证工作声称，与强化学习相比，大语言模型（LLMs）不太适合（Wang等人，2022）。使用马尔可夫假设（仅前一步），LLM的性能是强化学习方法性能的1.4倍。当我们尽可能多地填充LLM的输入缓冲区（包含尽可能多的先前步骤）时，性能提升至3.5倍。即使仅使用6.5%的训练数据，我们观察到性能比强化学习方法提高了2.2倍。我们的实验表明，30类动作的性能差异很大，这表明对任务进行平均可能会掩盖显著的性能问题。在与我们同时期的工作中，Lin等人（2023）展示了一种两部分方法（SwiftSage），该方法使用一个小型LLM（T5-large）并辅以OpenAI的大规模LLM，在ScienceWorld中取得了出色结果。我们的6B参数单阶段GPT-J在结合GPT-3.5 turbo（其参数数量是GPT-J的29倍）时，与SwiftSage的两阶段架构性能相匹配。

英文摘要

We explore using a moderately sized large language model (GPT-J, 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement learning. Using the Markov assumption (a single previous step), the LLM outperforms the reinforcement learning-based approach by a factor of 1.4. When we fill the LLM's input buffer with as many prior steps as possible, the improvement rises to 3.5x. Even when training on only 6.5% of the training data, we observe a 2.2x improvement over the reinforcement-learning-based approach. Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues. In work contemporaneous with ours, Lin et al. (2023) demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large) complemented by OpenAI's massive LLMs to achieve outstanding results in ScienceWorld. Our 6B-parameter, single-stage GPT-J matches the performance of SwiftSage's two-stage architecture when it incorporates GPT-3.5 turbo, which has 29 times more parameters than GPT-J.
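The contrast between the Markov setting (one previous step) and filling the input buffer with as many prior steps as possible comes down to how the prompt is packed. The sketch below keeps the most recent steps that fit a token budget. The whitespace token counter, the prompt layout, and the function names are assumptions for illustration and do not reproduce the paper's input format or GPT-J's tokenizer.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_prompt(goal, history, max_tokens, max_steps=None):
    """Join the goal with the most recent history steps that fit the budget."""
    budget = max_tokens - count_tokens(goal)
    if budget < 0:
        raise ValueError("the goal alone exceeds the token budget")
    kept = []
    for step in reversed(history):            # walk from newest to oldest
        if max_steps is not None and len(kept) >= max_steps:
            break
        cost = count_tokens(step)
        if cost > budget:
            break                             # stop at the first step that does not fit
        kept.append(step)
        budget -= cost
    kept.reverse()                            # restore chronological order
    return "\n".join([goal] + kept)
```

Calling `build_prompt(goal, history, max_tokens, max_steps=1)` gives the Markov variant, while leaving `max_steps` unset packs as much history as the budget allows. The abstract reports that this difference alone moves the improvement over the reinforcement-learning baseline from 1.4x to 3.5x.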

2110.01552 2026-05-25 cs.CL cs.AI cs.LG Version update

Perhaps PTLMs Should Go to School -- A Task to Assess Open Book and Closed Book QA

Manuel R. Ciosici, Joe Cecil, Alex Hedges, Dong-Ho Lee, Marjorie Freedman, Ralph Weischedel

Affiliation: Information Sciences Institute, University of Southern California

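AI summary: This paper proposes a new task for evaluating the question-answering ability of pre-trained language models (PTLMs) in open-book and closed-book settings, using introductory college textbooks in the social sciences and humanities as the instructional material. The task consists of true/false statements based on the textbooks' review questions, with validation and blind tests. In the closed-book setting, adding the textbook to pre-training yields at best a minor improvement (56% versus roughly 50% for random guessing), suggesting that PTLMs may not have truly understood the textbook content. In the open-book setting, where the model may retrieve a relevant paragraph to answer, performance is better (about 60%). The task offers a new benchmark for assessing how well PTLMs understand complex texts.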

Comments Identical to the EMNLP 2021 version

Details

DOI: 10.18653/v1/2021.emnlp-main.493

English abstract

Our goal is to deliver a new task and leaderboard to stimulate research on question answering and pre-trained language models (PTLMs) to understand a significant instructional document, e.g., an introductory college textbook or a manual. PTLMs have shown great success in many question-answering tasks, given significant supervised training, but much less so in zero-shot settings. We propose a new task that includes two college-level introductory texts in the social sciences (American Government 2e) and humanities (U.S. History), hundreds of true/false statements based on review questions written by the textbook authors, validation/development tests based on the first eight chapters of the textbooks, blind tests based on the remaining textbook chapters, and baseline results given state-of-the-art PTLMs. Since the questions are balanced, random performance should be ~50%. T5, fine-tuned with BoolQ, achieves the same performance, suggesting that the textbook's content is not pre-represented in the PTLM. Taking the exam closed book, but having read the textbook (i.e., adding the textbook to T5's pre-training), yields at best minor improvement (56%), suggesting that the PTLM may not have "understood" the textbook (or perhaps misunderstood the questions). Performance is better (~60%) when the exam is taken open-book (i.e., allowing the machine to automatically retrieve a paragraph and use it to answer the question).

2101.05400 2026-05-25 cs.CL cs.AI cs.LG Version update

Machine-Assisted Script Curation

Manuel R. Ciosici, Joseph Cummings, Mitchell DeHaven, Alex Hedges, Yash Kankanampati, Dong-Ho Lee, Ralph Weischedel, Marjorie Freedman

Affiliation: Information Sciences Institute, University of Southern California

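AI summary: This paper presents MASC, a system for human-machine collaborative script authoring. The system automatically generates event types, links them to Wikidata, suggests sub-events that may have been overlooked, and records the entities that take part in multiple sub-events along with their temporal order, helping users write structurally complex event scripts efficiently. The paper illustrates MASC on practical case studies to show its usefulness for script authoring.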

Comments Identical to the NAACL 2021 Demo version