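arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.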

2605.01712 2026-06-03 cs.LG version update

CoAction: Cross-task Correlation-aware Pareto Set Learning

Xinyue Chen, Yingxuan Liang, Yiqin Huang, Chikai Shang, Hai-Lin Liu, Fangqing Gu

Affiliations: Guangdong University of Technology; Xiamen University

AI summary: Proposes the CoAction framework, which uses a task-aware Transformer to handle multiple multi-objective optimization problems simultaneously, captures inter-task correlations through self-attention, and improves the Hypervolume, Range, and Sparsity metrics.

Comments: Accepted by ICIC 2026 (Oral)

Abstract

Pareto set learning (PSL) is an emerging paradigm in multi-objective optimization that trains neural networks to map preference vectors to Pareto optimal solutions. However, existing PSL methods primarily focus on solving a single multi-objective optimization problem at a time. This limitation not only increases computational costs in multi-objective multitask optimization scenarios by requiring a separate model for each task, but also fails to exploit inter-task correlations. To address this, we propose a Cross-tAsk correlation-aware Pareto Set Learning (CoAction) framework, which leverages a task-aware Transformer to handle multiple tasks simultaneously. Specifically, by assigning task-specific embedding vectors to individual tasks, the model effectively distinguishes between tasks while facilitating knowledge sharing among them. We use a Transformer encoder as the backbone architecture so that its self-attention mechanism can capture complex task dependencies. The proposed approach is evaluated on comprehensive multitask test suites covering both benchmark problems and real-world applications, demonstrating effectiveness and competitive performance in Hypervolume, Range, and Sparsity.

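2606.03940 2026-06-03 eess.IV cs.CV cs.LG cs.RO version update

Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss

Prashant Shekhar, Caroline Howard

Affiliations: Department of Mathematics, Embry-Riddle Aeronautical University

AI summary: Addresses signal loss caused by privacy-preserving reporting systems with a robust causal decision framework that projects the observation-compatible experimental worlds onto the incrementality functional, yielding a sharp decision frontier and certified, rejected, or unresolved incrementality decisions.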

Abstract

Advertising platforms use randomized lift tests to measure incrementality, but privacy-preserving reporting systems degrade the observed signal through match-rate loss, linkability loss, attribution-window loss, aggregation-threshold suppression, randomized reporting noise, and segment-heterogeneous signal loss. This paper formulates privacy-constrained advertising measurement as a robust causal decision problem under these signal losses. Given a randomized experiment and an ambiguity set for privacy-induced degradation, the framework projects the observation-compatible fiber of clean/unfiltered experimental worlds onto the incrementality functional and returns a certified, rejected, or unresolved decision. The main result gives a sharp decision frontier. Reports outside the frontier support uniformly valid certification or rejection, whereas reports inside it contain too little information for any method to uniformly distinguish above-threshold incrementality from non-incrementality. Supporting results give finite-sample certification, sample-complexity guarantees, a minimax lower bound showing that signal loss reduces effective information, and a reporting-granularity tradeoff. On 2.0M Criteo Uplift rows and the 64K-row Hillstrom email experiment, clean conversion lift is positive in both datasets, with lifts of 0.00112 and 0.00495, respectively. Population certification survives mild degradation in Criteo and severe degradation in Hillstrom, while all considered finite-sample stress settings in both datasets remain unresolved after simultaneous uncertainty and reporting noise are included. Overall, the research contributes a decision-theoretic layer for privacy-aware incrementality measurement whose output is the strongest causal claim justified by degraded ad signals.

2606.03820 2026-06-03 stat.ML cs.LG version update

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou

Affiliations: School of Mathematical Sciences, Fudan University; Shanghai Key Laboratory of Contemporary Applied Mathematics, Fudan University

AI summary: Proposes a quantitative approximation framework for flow distillation in diffusion models that treats few-step sampling as error propagation under compositions of learned flow maps; theory and experiments show that a stability-balanced non-uniform time grid substantially reduces end-to-end relative MSE.

Abstract

We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive $L^p(p_t)$ guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound $L(t)$ for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow map stability estimate governed by $\int_s^t L(u)\,du$, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
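
The stability-balanced grid mentioned at the end of the abstract can be illustrated with a small sketch. It assumes a made-up Lipschitz bound `example_bound(t) = 1 / (t + 0.05)` (chosen only because it is large near the stiff, low-noise end; it is not the paper's bound), accumulates it with the trapezoid rule, and places grid points at equal increments of the cumulative coordinate. All function names are hypothetical.

```python
from bisect import bisect_left


def stability_balanced_grid(lipschitz, t_start, t_end, n_segments, n_quad=2000):
    """Return n_segments + 1 time points that split the integral of
    `lipschitz` over [t_start, t_end] into equal parts."""
    # Fine quadrature mesh and cumulative trapezoid integral S(t).
    h = (t_end - t_start) / n_quad
    ts = [t_start + i * h for i in range(n_quad + 1)]
    cum = [0.0]
    for i in range(n_quad):
        cum.append(cum[-1] + 0.5 * h * (lipschitz(ts[i]) + lipschitz(ts[i + 1])))
    total = cum[-1]

    grid = [t_start]
    for k in range(1, n_segments):
        target = total * k / n_segments
        j = bisect_left(cum, target)  # first index with cum[j] >= target
        # Linear interpolation inside the mesh cell [ts[j - 1], ts[j]].
        frac = (target - cum[j - 1]) / (cum[j] - cum[j - 1])
        grid.append(ts[j - 1] + frac * h)
    grid.append(t_end)
    return grid


def example_bound(t):
    # Hypothetical bound (an assumption), large near t = 0.
    return 1.0 / (t + 0.05)


grid = stability_balanced_grid(example_bound, 0.0, 1.0, n_segments=8)
```

With this assumed bound the segments are short where the bound is large and long where it is small, so each segment carries the same share of the integrated Lipschitz bound.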

2606.03736 2026-06-03 stat.ML cs.LG version update

Resource-Constrained Adaptive Inference for Sequential Pricing

Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi

Affiliations: Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139; Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong; Department of Civil and Environmental Engineering and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139

AI summary: Addresses the problem that resource constraints can make fixed-price inference infeasible by proposing a target-aware pricing controller that certifies feasible target bands and logs continuous local densities, yielding studentized intervals via localized debiasing, and analyzes the resulting regret-information accounting.

Abstract

Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret-information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.

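2606.03600 2026-06-03 stat.ML cs.LG version update

Set-Preserving Calibration from Conformal P-Values to E-Values

Nabil Alami, Jad Zakharia, Souhaib Ben Taieb

Affiliations: ETH Zurich

AI summary: Addresses the limitations of p-to-e conversion in conformal prediction with a set-preserving P2E calibrator that converts p-values to e-values without changing the prediction set, attaining the desired coverage with improved efficiency in cross-conformal prediction and conformal aggregation.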

Abstract

Standard conformal prediction (CP) procedures are typically formulated in terms of p-values, but reliance on p-values alone limits flexibility, for example, when combining dependent evidence across models or data splits. Recent work has explored e-value formulations for conformal inference, yet a direct connection between p- and e-value formulations in CP has been missing, especially regarding their statistical efficiency. We first identify limitations of classical p-to-e calibrators in the CP setting, showing that they are not set-preserving and can lead to overly conservative prediction sets. To address this, we propose a novel P2E calibrator that converts conformal p-values into e-values without altering the prediction set induced by the original conformal p-value. We establish both theoretically and empirically that our calibrator can yield significant efficiency gains over existing p-to-e calibrators. This e-value formulation enables principled use of recent advances in e-value merging and randomization, and we demonstrate its impact in two applications: cross-conformal prediction (CCP), whose variants typically provide only approximate $1-2\alpha$ coverage, and conformal aggregation (CA). In both cases, our e-value-based methods satisfy the desired $1-\alpha$ coverage guarantee while improving efficiency over standard baselines. More broadly, our approach expands the flexibility of CP and opens new directions for efficient, distribution-free uncertainty quantification.
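
To make the "not set-preserving" point concrete, here is a minimal sketch using one classical calibrator family, $e = \kappa p^{\kappa-1}$ with $0 < \kappa < 1$, and the usual Markov-style rule that excludes a label once its e-value reaches $1/\alpha$. This is not the paper's P2E calibrator; the scores, labels, and helper names are made up for illustration.

```python
def conformal_p_value(calibration_scores, test_score):
    """Split-conformal p-value: share of scores at least as large as the test score."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + n_ge) / (len(calibration_scores) + 1)


def power_calibrator(p, kappa=0.5):
    """Classical p-to-e calibrator e = kappa * p**(kappa - 1), with 0 < kappa < 1."""
    return kappa * p ** (kappa - 1.0)


def p_value_set(p_values, alpha):
    # Keep a candidate label when its p-value exceeds alpha.
    return {label for label, p in p_values.items() if p > alpha}


def e_value_set(p_values, alpha, kappa=0.5):
    # Keep a candidate label unless its e-value reaches 1 / alpha.
    return {label for label, p in p_values.items()
            if power_calibrator(p, kappa) < 1.0 / alpha}


# Hypothetical nonconformity scores: 19 calibration points, two candidate labels.
calibration = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
candidate_scores = {"a": 0.32, "b": 0.97}
p_values = {label: conformal_p_value(calibration, s)
            for label, s in candidate_scores.items()}

alpha = 0.1
set_from_p = p_value_set(p_values, alpha)
set_from_e = e_value_set(p_values, alpha)
```

In this toy case label `b` has p-value 0.05 and is dropped by the p-value rule, but its calibrated e-value stays far below $1/\alpha = 10$, so the e-value set keeps it. The e-value set is therefore larger, which is the kind of conservativeness the abstract describes.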

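2606.03574 2026-06-03 stat.ML cs.LG version update

Few-Shot Prediction for Pulsar Noise with Long Short-Term Memory Network

Qingye Tang, Dechao An, Haoran Peng, Yuqi Ouyang

Affiliations: Sichuan University, College of Computer Science; Sichuan University, College of Physics

AI summary: Addresses the scarcity of pulsar timing data with an LSTM network optimized by model-agnostic meta-learning that adapts quickly to new frequency domains from only a few real timing residuals, with particle swarm optimization for automatic hyperparameter tuning; achieves accurate predictions on the IPTA dataset using 10% of the data.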

Abstract

This work proposes a novel solution to predict pulsar timing residuals with limited data, addressing the critical challenge of data scarcity across spin-frequency subgroups of millisecond pulsars in PTA datasets. The proposed solution applies a Long Short-Term Memory (LSTM) network optimized using the model-agnostic meta-learning algorithm, enabling rapid adaptation to new frequency domains by fine-tuning the LSTM network with only a few ground-truth timing residuals. A particle swarm optimization algorithm is also used for automatic hyperparameter optimization, leading to improved prediction accuracy. Our solution, evaluated on the second data release of the International Pulsar Timing Array (IPTA), demonstrates robust generalization with accurate predictions in three metrics across high-frequency test frequency domains, while requiring only 10% of the timing residuals from these domains for model fine-tuning. Furthermore, our lightweight structure requires only 16.86 MB of CPU memory and 18 milliseconds for single-step residual prediction. All these characteristics make our solution highly suitable for real-world applications where effective, real-time prediction of pulsar timing residuals is essential, particularly in resource-constrained environments with limited computational power, memory, or energy availability.

2606.03553 2026-06-03 stat.ML cs.LG math.OC version update

A Robust Optimization Approach to Sparse Principal Component Analysis

David Vävinggren, Francis Bach, André M. H. Teixeira, Dave Zachariah, Antônio H. Ribeiro

Affiliations: Uppsala University, Sweden; PSL Research University / INRIA, France; Science for Life Laboratory, Sweden

AI summary: Proposes AdvPCA, which induces sparsity by introducing worst-case latent-space perturbations into the reconstruction objective through robust optimization, and provides a closed-form solution and an iterative algorithm.

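TransGAN-WT: An Anomaly Detection Model for Wind Turbine Time-Series Data Based on Feature Extraction and Interactive Learning

Jingzhe Kang

AI summary: Proposes TransGAN-WT, an anomaly detection model that fuses a Transformer with a generative adversarial network; by amplifying the reconstruction error, extracting multimodal features autoregressively, and learning interactions among temporal features, it reaches an F1 score of 96.10% and a false positive rate of only 0.06% on real wind turbine datasets.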

Abstract

With the increasing scale and number of wind farms, the daily operation and maintenance costs of wind turbines are rising. To reduce these costs and enhance the reliability of wind turbine and system operation data before catastrophic failures occur, it is crucial to monitor the operating status of the equipment and detect failures at an early stage. Using working-condition data to assess anomalies in the operating status of wind turbines, and thereby monitor that status, is of great practical significance. However, existing anomaly detection methods can neither perform effective relational modeling in data filled with redundant information nor make reasonable use of valuable anomaly data. For this reason, this paper proposes an anomaly detection model that fuses a Transformer and a generative adversarial network. First, it reduces the missed-detection rate of minor-deviation anomalies by amplifying the reconstruction error. Second, it uses autoregressive inference to extract multimodal features, enhancing the stability and generalization ability of training. Finally, a temporal feature extraction module is constructed to promote interactive learning between features of different time scales and effectively reduce temporal redundancy. Experiments on real WTG datasets show that TransGAN-WT achieves an average F1 score of 96.10% across multiple wind turbine datasets, which is 5.84% and 2.89% higher than several other state-of-the-art baseline methods. It also achieves a false positive rate (FPR) of 0.06%, and a Wilcoxon signed-rank test verifies that the improvement over the state-of-the-art baselines is statistically significant, effectively ensuring the stable operation of wind turbines.

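2606.03018 2026-06-03 stat.ME cs.LG math.ST stat.ML stat.TH version update

A Fast Screening Approach for High-dimensional Outcomes and High-dimensional Predictors

Hongju Park, Zhenyao Ye, Shuo Chen

AI summary: Proposes the Graph Independence Dual Screening (GIDS) framework, which reduces the dimensionality of both response variables and predictors to address the computational burden and poor interpretability of high-dimensional cross-modal analysis.

Comments: 38 pages, 2 figures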

Abstract

Modeling interactions among multimodal, high-dimensional data is intrinsically challenging due to ultra-high dimensionality and complex dependence structures with high noise levels. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union remains large and the response dimension is unchanged, limiting the practical benefit of screening. This gives rise to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation variables and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.

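2606.02909 2026-06-03 stat.ML cs.LG version update

Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

Donghwan Lee

Affiliations: School of Electrical Engineering, KAIST

AI summary: Using exact switched linear system dynamics and joint spectral radius analysis, proves that under specific spectral and step-size conditions, periodic hard target updates and soft target updates guarantee that linear Q-learning converges to the exact projected Q-Bellman solution.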

Abstract

Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well-established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
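
For readers unfamiliar with the two mechanisms, the following sketch shows only the target-update rules themselves, not the paper's analysis or a full Q-learning loop. The online weights are held fixed at made-up values, and the period and `tau` are illustrative assumptions.

```python
def periodic_hard_update(target, online, step, period):
    """Copy the online weights into the target every `period` steps."""
    if step % period == 0:
        return list(online)
    return target


def soft_update(target, online, tau):
    """Move the target a fraction tau of the way toward the online weights."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target, online)]


online = [1.0, -2.0]  # hypothetical online weights, held fixed here
hard_target = [0.0, 0.0]
soft_target = [0.0, 0.0]
for step in range(1, 11):
    hard_target = periodic_hard_update(hard_target, online, step, period=5)
    soft_target = soft_update(soft_target, online, tau=0.1)
```

The hard target jumps to the online weights at steps 5 and 10, while the soft target approaches them geometrically, with the remaining gap shrinking by a factor of `1 - tau` per step.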

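2606.02632 2026-06-03 stat.ML cs.AI cs.CY cs.LG econ.EM stat.AP version update

Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery

Tyler H. McCormick

Affiliations: GitHub

AI summary: Argues that modern machine learning exhibits generic underdetermination in the high-dimensional proxy regime and proposes concrete criteria for "mechanistic machine learning" so that LLM-centered workflows genuinely support science rather than simulate it.

Comments: Will appear as a position paper in ICML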

2606.03184 2026-06-03 q-fin.CP cs.LG q-fin.ST version update

FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance

Jiaze Sun, Kelvin J. L. Koa, Ruiyang Ni, Yize Liu, Haonan Chen, Ke-Wei Huang

Affiliations: National University of Singapore; Asian Institute of Digital Finance; Nanyang Technological University

AI summary: Addresses the weak signals and complex mechanisms of financial forecasting with FinStressTS, a synthetic benchmark whose 30 diagnostic environments systematically evaluate 15 models on point and probabilistic forecasting, revealing that model performance depends on the data-generating mechanism.

Comments: KDD 2026 (Oral)

DOI: 10.1145/3770855.3817578

Abstract

Financial forecasting is difficult due to low signal-to-noise ratios, latent factors, heavy tails, regime shifts, and jumps. Real-world benchmarks offer limited failure attribution: researchers can observe underperformance, but often cannot isolate why because mechanisms are unobservable and entangled. Real financial data reveal only one realized path, making it difficult to assess tail-risk calibration or data efficiency. We introduce FinStressTS, a mechanism-aware synthetic benchmark that links model behavior to controlled structural causes. FinStressTS comprises 30 diagnostic environments around six mechanism families: volatility clustering, multi-scale persistence, heavy-tailed shocks, regime switching, self-exciting jumps, and zero-inflated processes. We evaluate two tasks: point forecasting, using NMAE across five settings, and probabilistic forecasting, using CRPS under known data-generating mechanisms. We benchmark 15 models, from classical methods (HAR, VAR) to Transformer forecasters (PatchTST, iTransformer) and deep probabilistic architectures (DeepAR, TSFlow), and use learning curves to measure sample efficiency. Our evaluation reveals three insights. First, performance is mechanism-dependent: autoregressive and linear models are highly competitive, and often outperform Transformer-based models, in several volatility-, tail-, and jump-driven environments. Second, distributional alignment matters: parametric probabilistic models such as DeepAR calibrate well in stationary settings, while flexible models can help when distributions become multimodal or sparse. Third, neural models often require more data to match simple baselines, with larger gains mainly when learning latent regimes or complex distributions. FinStressTS provides an open framework for diagnosing failure modes and advancing risk-aware forecasting.
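
The CRPS metric used for the probabilistic task can be estimated from forecast samples through the standard identity CRPS = E|X - y| - 0.5 E|X - X'|. The sketch below is a plain sample-based estimator under that identity; the benchmark itself may compute CRPS differently, and the function name is hypothetical.

```python
def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    # Average distance between forecast samples and the observed value.
    term_obs = sum(abs(x - observation) for x in samples) / n
    # Average distance between pairs of forecast samples (forecast spread).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


spread_forecast = crps_from_samples([0.0, 1.0], 0.5)
exact_forecast = crps_from_samples([0.5, 0.5], 0.5)
point_forecast = crps_from_samples([2.0], 0.5)
```

For a single-sample (point) forecast the spread term vanishes and the estimate reduces to the absolute error, which is why CRPS is often read as a probabilistic generalization of MAE.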

2606.02629 2026-06-03 q-bio.QM cs.AI cs.LG version update

Enhancing Protein-Protein Interaction Prediction with Hierarchical Motif-based Multimodal Protein Embedding

Zaifei Yang, Samuel Ping-Man Choi, James Kwok

Affiliations: National University of Singapore; University of California, Los Angeles

AI summary: Proposes MMM-PPI, which integrates sequence, structure, and function information through hierarchical motif-based multimodal encoding at three scales (micro, meso, macro) to improve protein-protein interaction prediction.

Abstract

Protein-protein interactions (PPIs) are essential for many biological processes. However, existing PPI prediction approaches suffer from two major limitations: they overlook the hierarchical organization of proteins, particularly meso-scale motifs that critically regulate PPIs, and fail to effectively integrate sequence, structure, and function modalities. To address these limitations, we propose MMM-PPI, a Hierarchical Motif-based Multi-Modal protein Encoder for PPI Prediction that constructs PPI embeddings in a bottom-up multi-modal manner across three scales. At the micro-scale, we encode residue features from the three modalities; at the meso-scale, a novel multimodal motif encoder aggregates residues into spatially-informed motif embeddings; at the macro-scale, a multimodal protein encoder integrates motifs into protein embeddings by jointly modeling motif importance and inter-modal correlations. The pre-trained encoder can be used off-the-shelf for large-scale PPI prediction. Extensive experiments on multiple PPI datasets show that MMM-PPI outperforms state-of-the-art multi-label PPI prediction models, particularly under challenging data partitions and limited data scenarios. Code is available at https://github.com/yzf-code/MMM-PPI.

2606.02625 2026-06-03 q-bio.QM cs.AI cs.LG version update

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skills

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang, Yihao Liu, Jingwei Ni, Jiaqi Guo, Mengyu Zhou, Kai Tang, Junling Liu, Qinliang Su, Xiaoxi Jiang, Guanjun Jiang

Affiliations: Qwen Large Model Application Team, Alibaba; Sun Yat-sen University; The Chinese University of Hong Kong; Peking University; ETH Zürich University of Zurich

AI summary: Proposes the Skill-RM framework, which recasts reward modeling as the execution of reusable reward-evaluation skills, unifies heterogeneous evaluation criteria by dynamically selecting and aggregating evidence, and outperforms traditional methods on reward benchmarks and downstream tasks.

Abstract

Reward models (RMs) provide critical feedback signals for LLM post-training, notably in reinforced fine-tuning (RFT) and reinforcement learning (RL) pipelines. However, current reward evaluation relies on heterogeneous criteria such as rule-based verifiers, ground-truth references, procedural checklists, and complex rubrics, where a unified mechanism to integrate all types of evidence remains unexplored. To this end, we propose Skill Reward Model (Skill-RM), a unified framework that reformulates reward modeling as the execution of a reusable Reward-Evaluation Skill. By treating reward computation as a structured agentic task, Skill-RM provides a consistent interface to orchestrate heterogeneous resources, dynamically selecting and aggregating evidence tailored to the specific requirements of each input. This approach enables the reward model to move beyond static evaluation, ensuring consistency and transparency across diverse tasks. Extensive experiments on reward benchmarks and downstream applications, including best-of-N selection and reinforcement learning, demonstrate that Skill-RM consistently outperforms traditional judge baselines. Our findings suggest that Skill-RM not only provides a unified solution for reward modeling but also achieves superior performance through the strategic and dynamic orchestration of evidence. The code is at https://github.com/Qwen-Applications/Skill-RM.

2606.03979 2026-06-03 cs.LG cs.AI version update

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni

Affiliations: Google; Cornell University

AI summary: Inspired by human learning, proposes a "Sleep" paradigm with two stages, memory consolidation (Knowledge Seeding) and Dreaming (self-improvement), that lets a model learn continually, turn short-term memories into long-term knowledge, and improve itself.

Comments: A version of this work has been publicly available from September 2025 on OpenReview

Abstract

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to continually learn, distill their short-term fragile memories into stable long-term knowledge with replay, and recursively improve themselves with a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.

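2606.03976 2026-06-03 cs.CV cs.AI cs.LG q-bio.NC version update

MLSkip: Data Skipping for ML Filters via Lightweight Metadata

Mihail Stoian, Mark Gerarts, Pascal Ginter, Andreas Zimmerer, Jan Van den Bussche, Andreas Kipf

Affiliations: University of Technology Nuremberg; Hasselt University; Technical University of Munich

AI summary: Addresses the inapplicability of traditional data skipping to ML filters by using Parquet's default min-max metadata, plus an enhanced 2D convex hull metadata structure, for efficient predicate pruning, reaching an average pruning effectiveness of 38.31%.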

英文摘要

Database vendors recently released AI functions that can be used in filter predicates. As such functions often rely on costly, black-box ML models, they unveil new data management challenges. Concretely, traditional data skipping techniques for integer and string data fail to be applicable to the new filter type. Indeed, there is no known mechanism for pruning non-qualifying row groups, e.g., when reading files from blob storage. In this work, we initiate the study of data skipping techniques for ML filters. We make the case that Parquet's default min-max metadata is enough to enable pruning. To this end, we draw connections to two lines of research: (i) the recently proposed query language for ML models and (ii) neural network verification. Our preliminary results on ReLU architectures show that on tables from TPC-H and TPC-DS, the average pruning effectiveness for filters of selectivity below 0.1% amounts to 27.4%. Finally, inspired by research on spatial joins, we propose an enhanced metadata structure: a size-bounded 2D convex hull that verification tools can make better use of, increasing the pruning effectiveness to 38.31%, while occupying at most 45 bytes per row group and column pair. We observe an end-to-end speedup of 1.07$\times$ over PyTorch in DuckDB.
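The pruning idea in the abstract can be illustrated with a small sketch. It assumes interval bound propagation as the verification step, which is one simple neural network verification technique and not necessarily the tool the authors use. The function names, the `(W, b)` layer format, and the predicate form `model(x) > threshold` are all hypothetical. Given a row group's per-column min and max, the sketch computes a sound upper bound on the output of a ReLU network; if that bound cannot exceed the threshold, no row in the group can qualify and the group may be skipped.

```python
import numpy as np

def interval_forward(lo, hi, layers):
    """Propagate the box [lo, hi] through affine layers with ReLU in between.

    `layers` is a list of (W, b) pairs; this format is an assumption of the sketch.
    """
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        # Positive weights take the matching bound, negative weights the opposite one.
        new_lo = W_pos @ lo + W_neg @ hi + b
        new_hi = W_pos @ hi + W_neg @ lo + b
        lo, hi = new_lo, new_hi
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def can_skip_row_group(col_min, col_max, layers, threshold=0.0):
    """Return True if no row in the group can satisfy model(x) > threshold."""
    lo = np.asarray(col_min, dtype=float)
    hi = np.asarray(col_max, dtype=float)
    _, out_hi = interval_forward(lo, hi, layers)
    return bool(out_hi[0] <= threshold)

# Toy model: relu(x0 - x1) + relu(x1 - x0) - 5, i.e. |x0 - x1| - 5.
layers = [
    (np.array([[1.0, -1.0], [-1.0, 1.0]]), np.zeros(2)),
    (np.array([[1.0, 1.0]]), np.array([-5.0])),
]
print(can_skip_row_group([0, 0], [2, 2], layers))    # narrow value range
print(can_skip_row_group([0, 0], [10, 10], layers))  # wide value range
```

The bound is sound but loose: a `False` result only means the group must be read, not that it contains qualifying rows. Tighter metadata, such as the 2D convex hull described in the abstract, would shrink the input region handed to the verifier.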

2606.03939 2026-06-03 cs.LG cs.AI cs.PF 版本更新

FlashbackCL: Mitigating Temporal Forgetting in Federated Learning

FlashbackCL：缓解联邦学习中的时间遗忘

Mubarak A. Ojewale, Adriana E. Chis, Jorge M. Cortes-Mendoza, Bernardo Pulido-Gaytan, Horacio Gonzalez-Velez

发表机构 * Cloud Competency Centre, National College of Ireland, Dublin, Ireland（云竞争力中心，爱尔兰国家学院，都柏林，爱尔兰）

AI总结针对联邦学习中客户端数据分布随时间漂移导致的时间遗忘问题，提出FlashbackCL方法，通过时间衰减标签计数、类别平衡水库采样重放和服务器端主动核心集筛选，在CIFAR-10上相对Flashback提升6.9%-10.0%，时间遗忘减少68%。

详情

AI中文摘要

基础模型和边缘模型的联邦学习（FL）越来越多地部署在客户端数据分布随时间漂移的场景中，然而现有的遗忘缓解方法假设每个客户端的分布是平稳的。Flashback是近期最强的针对跨客户端（空间）遗忘的FL方法，它使用单调累积的每类标签计数作为知识代理；该代理在时间分布漂移下会失准，并将全局模型锚定在过时的类别平衡上。我们通过一个与协议级波动隔离的每阶段指标形式化定义了FL中的时间遗忘，并提出了Flashback Continual Learning（FlashbackCL），它是Flashback的即插即用扩展，包含：(i) 时间衰减的标签计数；(ii) 具有类别平衡水库采样（CBRS）的设备感知重放缓冲区；(iii) 在公共蒸馏集上的服务器端主动核心集筛选。结果表明，在具有50个客户端和三种受控时间漂移模式的CIFAR-10上，FlashbackCL相对于Flashback实现了6.9%至10.0%的相对改进，同时将时间遗忘减少了高达68%。一项5变体消融实验表明CBRS重放是关键组件。FlashbackCL在平稳CIFAR-100上也比Flashback提高了3.5个百分点，表明类别平衡重放同样正则化了空间异质性和时间漂移。

英文摘要

Federated Learning (FL) of foundation and edge models increasingly targets deployments where client data distributions drift over time, yet existing forgetting-mitigation methods assume each client's distribution is stationary. Flashback, the strongest recent FL method against cross-client (spatial) forgetting, uses monotonically accumulating per-class label counts as a knowledge proxy; this proxy becomes miscalibrated under temporal distribution shift and anchors the global model to an outdated class balance. We formalise temporal forgetting in FL with a per-phase metric isolated from protocol-level fluctuations and propose Flashback Continual Learning (FlashbackCL), a drop-in extension of Flashback with (i) temporally decayed label counts; (ii) a device-aware replay buffer with Class-Balanced Reservoir Sampling (CBRS); and (iii) server-side active coreset curation on the public distillation set. The results show that, on CIFAR-10 with 50 clients and three controlled temporal shift modes, FlashbackCL achieves a 6.9% to 10.0% relative improvement over Flashback while reducing temporal forgetting by up to 68%. A 5-variant ablation identifies CBRS replay as the critical component. FlashbackCL also improves on Flashback by 3.5 points on stationary CIFAR-100, suggesting that class-balanced replay regularises spatial heterogeneity as well as temporal shift.
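The contrast between monotonically accumulating and temporally decayed label counts can be sketched as follows. The exponential decay rule, the `decay` value, and the function name are assumptions made for illustration; the abstract does not state the exact decay schedule.

```python
import numpy as np

def update_label_counts(counts, labels, num_classes, decay=1.0):
    """Update per-class label counts with one round of client labels.

    decay=1.0 reproduces plain cumulative counting; decay<1.0 down-weights
    older rounds (a hypothetical exponential schedule).
    """
    batch = np.bincount(np.asarray(labels, dtype=int), minlength=num_classes)
    return decay * counts + batch.astype(float)

num_classes = 3
phase_1 = [0, 0, 0, 0]  # client initially sees only class 0
phase_2 = [1, 1, 1, 1]  # distribution then drifts to class 1

cumulative = np.zeros(num_classes)
decayed = np.zeros(num_classes)
for labels in (phase_1, phase_2):
    cumulative = update_label_counts(cumulative, labels, num_classes, decay=1.0)
    decayed = update_label_counts(decayed, labels, num_classes, decay=0.5)

print(cumulative)  # classes 0 and 1 look equally represented
print(decayed)     # the current class 1 dominates
```

With cumulative counts the proxy still reports the old class balance after the drift, which is the miscalibration the abstract describes; the decayed variant tracks the current phase more closely.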

2606.03936 2026-06-03 cs.LG physics.geo-ph 版本更新

FFR：前向-前向学习用于回归

Xinyang Liu, Xuanyu Liang, Shiqi Ding, Boyang Li, Zhiqiang Que, Jiayang Li, Guosheng Hu

发表机构 * University of Bristol（布里斯托大学）； University College London（伦敦大学学院）； University of Cambridge（剑桥大学）

AI总结提出FFR框架，通过序数竞争 goodness 函数、分层阶梯架构和层次化预测将前向-前向算法扩展到回归任务，在多个数据集上恢复BP 98.6%的精度并显著降低内存和时间开销。

详情

AI中文摘要

前向-前向（FF）算法通过纯局部、逐层优化训练神经网络，提供了反向传播（BP）的计算高效且生物合理的替代方案。然而，FF本质上是为通过对比正负样本对进行分类而设计的，将其扩展到回归面临根本性挑战：连续目标空间缺乏用于对比学习的自然“对立面”，且标准 goodness 函数不携带关于目标幅度或顺序的信息。我们提出FFR（前向-前向回归），据我们所知，这是第一个将FF扩展到现实世界回归并展示在多样化真实数据集上具有竞争力的性能的框架。FFR引入了三项关键创新：（1）序数竞争 goodness 函数，通过距离感知序数监督下分区神经元组之间的竞争学习取代对比对；（2）分层阶梯架构，其中浅层学习粗序数判别，深层细化到细粒度回归，并通过多尺度特征聚合实现层间协作；（3）带不确定性估计的层次化预测，其中多尺度预测器联合提供鲁棒预测和预测置信度作为免费午餐。大量实验结果表明，FFR在五个真实世界回归基准上平均恢复了BP 98.6%的精度，同时将峰值训练内存降低到深度8时BP的27%和深度32时BP的8%，每次迭代时间约为BP的72%，并且显著优于所有无BP的竞争对手。

英文摘要

The Forward-Forward (FF) algorithm offers a computationally efficient and biologically plausible alternative to backpropagation (BP) by training neural networks through purely local, layer-wise optimization. However, FF is inherently designed for classification via contrastive positive-negative sample pairs, and extending it to regression poses fundamental challenges: continuous target spaces lack natural "opposites" for contrastive learning, and the standard goodness function carries no information about target magnitude or ordering. We propose FFR (Forward-Forward for Regression), to our knowledge the first framework to extend FF to real-world regression and demonstrate competitive performance across diverse real-world datasets. FFR introduces three key innovations: (1) an ordinal competitive goodness function that replaces contrastive pairs with competitive learning between partitioned neuron groups under distance-aware ordinal supervision; (2) a stratified ladder architecture where shallow layers learn coarse ordinal discrimination and deeper layers refine into fine-grained regression, with multi-scale feature aggregation for inter-layer collaboration; and (3) hierarchical prediction with uncertainty estimation, where multi-scale predictors jointly provide robust predictions and prediction confidence as a free lunch. Extensive experimental results show that FFR recovers on average 98.6% of BP's accuracy across five real-world regression benchmarks while reducing peak training memory to only 27% of BP's at depth 8 and 8% at depth 32, with per-iteration time around 72% of BP's, and substantially outperforms all BP-free competitors.

2606.03926 2026-06-03 cs.HC cs.LG 版本更新

DiffUNet^2: Bidirectional Prediction, Probabilistic Generation and Collaborative Visual Discovery for Scientific Data

DiffUNet^2: 科学数据的双向预测、概率生成与协同视觉发现

Mengdi Chu, Jiaxin Yang, Angus G. Forbes, Nathan Debardeleben, Earl Lawrence, Ayan Biswas, Han-Wei Shen

发表机构 * Ohio State University（俄亥俄州立大学）； NVIDIA ； Los Alamos National Laboratory（洛斯阿拉莫斯国家实验室）

AI总结提出基于扩散模型的条件生成框架DiffUNet^2，实现时间序列的双向任意步预测与概率分布捕获，并结合交互式可视化支持科学探索。

Comments 12 pages, 20 figures

详情

AI中文摘要

对科学现象进行时间演化建模对于分析和推理至关重要，然而大多数机器学习方法仅提供确定性的前向预测，忽略了多种可能的结果，且很少支持反向推理，限制了它们在科学工作流中的实用性。我们提出了一个将基于扩散的生成建模与交互式视觉分析相结合的框架，用于科学探索。我们引入了DiffUNet^2，一种条件扩散模型，能够实现跨时间的双向、任意到任意生成，并捕获系统可能演化的分布。基于该模型，我们的交互式系统支持分支时间线探索、用户引导的状态编辑和概率空间导航，使科学家能够主动探索替代假设，而非被动观察预测。我们在5个不同科学领域的数据集上评估了该模型，验证了其预测准确性和概率空间集成质量。与领域专家合作，我们证明了该方法在支持实际科学时间数据分析工作流中的有效性。通过集成建模与视觉交互，我们的方法使科学家能够交互式地探索系统动力学，将生成模型转化为假设驱动的科学分析工具。

英文摘要

Modeling temporal evolution is important to analyzing and reasoning about scientific phenomena, yet most machine learning methods provide deterministic forward predictions that overlook multiple plausible outcomes and rarely support backward reasoning, limiting their usefulness in practical scientific workflows. We present a framework that integrates diffusion-based generative modeling with interactive visual analytics for scientific exploration. We introduce DiffUNet^2, a conditional diffusion model that enables bidirectional, any-to-any generation across time and captures distributions of plausible system evolutions. Built upon the model, our interactive system supports branching timeline exploration, user-guided state editing, and probability-space navigation, enabling scientists to actively explore alternative hypotheses rather than passively observe predictions. We evaluate the model on 5 datasets across different scientific domains to validate its predictive accuracy and probability-space ensemble quality. In collaboration with domain experts, we demonstrate the effectiveness of our approach in supporting practical scientific temporal data analysis workflows. By integrating modeling and visual interaction, our approach enables scientists to interactively explore system dynamics, transforming generative models into tools for hypothesis-driven scientific analysis.

2606.03923 2026-06-03 cs.LG 版本更新

Contrastive Neural Algorithmic Reasoning for Graph Coloring

对比神经算法推理用于图着色

Thien Le, Tianyu Zhao, Melanie Weber

发表机构 * Harvard University SEAS（哈佛大学SEAS）； Harvard University T.H. Chan School of Public Health（哈佛大学T.H. Chan公共卫生学院）

AI总结提出对比学习框架学习可迁移的着色几何结构，通过图神经网络编码器实现低冲突着色，并推广到不同规模的图。

Comments 52 pages, 5 figures, 45 tables

详情

AI中文摘要

图着色旨在用尽可能少的颜色为图的节点分配颜色，使得相邻节点颜色不同。这里，我们研究近似$k$-着色，目标是用最多$k$种颜色同时最小化单色边的数量。该问题是图论的核心问题，并在调度和资源分配等领域有应用。最近的无监督GNN方法直接优化每个实例，阻碍了跨图大小和分布的泛化。我们转而提出一个对比学习框架，学习可迁移的着色几何结构，其中同色节点的嵌入对齐，而相邻节点的表示被推向不同方向。我们分析了有界大小图上的总体目标。对于单位范数嵌入，我们证明其最优解具有线原型结构：同色节点的表示坍缩到共享的一维子空间，边连接正交子空间。该几何结构在有监督设置中产生平稳条件，并在平衡着色假设下通过投影次梯度动力学保持。在非归一化变体中，梯度下降具有由商图硬间隔问题控制的最大间隔偏差。在合成和真实世界图上的实验表明，对比GNN编码器有效泛化并产生低冲突着色，与贪心方法匹配甚至有时改进。

英文摘要

Graph coloring seeks to assign colors to a graph's nodes so that adjacent nodes receive different colors, using as few colors as possible. Here, we study approximate $k$-coloring, where the goal is to use at most $k$ colors while minimizing the number of monochromatic edges. This problem is central to graph theory and has applications in areas such as scheduling and resource allocation. Recent unsupervised GNN approaches optimize each instance directly, precluding generalization across graph sizes and distributions. We instead propose a contrastive learning framework that learns transferable coloring geometry where the embeddings of same-color nodes align, while adjacent nodes' representations are pushed toward distinct directions. We analyze the resulting population objective over bounded-size graphs. For unit-norm embeddings, we show that its optima have a line-prototype structure: representations of nodes of the same color collapse to a shared one-dimensional subspace, and edges connect orthogonal subspaces. This geometry yields stationarity conditions in the supervised setting and is preserved by projected subgradient dynamics under a balanced-coloring assumption. In an unnormalized variant, gradient descent has a max-margin bias governed by a quotient-graph hard-margin problem. Experiments on synthetic and real-world graphs show that contrastive GNN encoders generalize effectively and produce low-conflict colorings, matching and sometimes improving on greedy approaches.
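The approximate $k$-coloring objective and a greedy baseline can be sketched in a few lines. This is an illustrative min-conflict greedy heuristic with hypothetical function names, not the paper's contrastive method and not necessarily the greedy variant the authors compare against.

```python
def count_conflicts(edges, coloring):
    """Number of monochromatic edges, the quantity approximate k-coloring minimizes."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def greedy_k_coloring(num_nodes, edges, k):
    """Visit nodes in index order; give each the color least used by its colored neighbors."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    coloring = [None] * num_nodes
    for u in range(num_nodes):
        used = [0] * k
        for v in adj[u]:
            if coloring[v] is not None:
                used[coloring[v]] += 1
        # Ties are broken toward the smallest color index.
        coloring[u] = min(range(k), key=lambda c: used[c])
    return coloring

triangle = [(0, 1), (1, 2), (0, 2)]
print(count_conflicts(triangle, greedy_k_coloring(3, triangle, k=3)))
print(count_conflicts(triangle, greedy_k_coloring(3, triangle, k=2)))
```

A triangle needs three colors, so with `k=2` at least one edge must remain monochromatic; that residual conflict count is what a learned coloring would be scored on.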

2606.03919 2026-06-03 cs.SI cs.CY cs.DL cs.LG physics.soc-ph 版本更新

Forecasting Conceptual Diffusion in Science: The Case of Quantum Computing

预测科学中的概念扩散：以量子计算为例

Thomas Maillart, Thibaut Chataing, David Dosu, Paul Bagourd, Julian Jang-Jaccard, Alain Mermoud

发表机构 * Geneva School of Economics and Management, University of Geneva（日内瓦经济管理学院，日内瓦大学）； Faculty of Medicine, University of Geneva（日内瓦大学医学院）； Open Quantum Institute, CERN（开放量子研究所，欧洲核子研究中心）； armasuisse Science + Technology（armasuisse 科学与技术）

AI总结通过构建时间分辨的概念共现网络并训练LightGBM模型，研究量子计算领域概念的内生巩固与外生扩散的可预测性，发现外生扩散和熵具有强可预测性（R²高达0.78），而内生巩固在量子计算中几乎不可预测，但在神经植入领域显著上升（R²=0.83），表明概念扩散受语义和引用环境中的稳定结构规律支配。

Comments 19 pages, 5 figures, 6 tables. Code and manuscript sources: https://github.com/wazaahhh/breakthroughs-diffusion . An earlier version was presented at the Global Tech Mining Conference (GTM) 2026 (submission #117)

详情

AI中文摘要

理解和预测科学变化需要能够区分科学概念的内生巩固和外生扩散的模型。利用OpenAlex中量子计算概念子树，我们构建了一个时间分辨的概念共现网络，并追踪每个概念对的上游引用谱系和下游扩散。我们在分布和多样性感知特征上训练LightGBM模型，以预测四个结果：内生巩固、外生扩散、它们的比率以及扩散熵。在控制科学整体出版增长后，内生巩固在主要的量子计算基准中基本不可预测。相比之下，外生扩散和熵具有很强的可预测性（R²高达0.78），并且由上游异质性、引用广度和分布离散度驱动，如SHAP分析所示；在机器人、先进材料和神经植入上的重复验证证实，外生扩散仍然是跨领域排名最高的目标（测试R²约0.60-0.87），而内生可预测性在神经植入中显著上升（测试R²=0.83），表明量子计算的不对称性并非普遍适用。案例研究表明，尖锐的熵增加与新概念前沿的开启同时发生，而熵崩溃则标志着技术趋同或范式更替。这些结果表明，概念扩散受嵌入语义和引用环境中的稳定结构规律支配。通过识别跨领域采纳的早期基于多样性的信号，该方法为快速发展的研究领域中的预期科学计量学、技术预见和创新导向政策分析提供了可扩展的基础。

英文摘要

Understanding and anticipating scientific change requires models that distinguish between endogenous consolidation and exogenous diffusion of scientific concepts. Using the quantum computing subtree of concepts in OpenAlex, we construct a temporally resolved concept co-occurrence network and track each concept pair through its upstream citation lineage and downstream diffusion. We train LightGBM models on distributional and diversity-aware features to predict four outcomes: endogenous reinforcement, exogenous diffusion, their ratio, and diffusion entropy. After controlling for overall publication growth of the scientific body, endogenous reinforcement proves largely unpredictable in the primary quantum-computing benchmark. In contrast, exogenous diffusion and entropy are strongly predictable ($R^2$ up to $0.78$) and are driven by upstream heterogeneity, citation breadth, and distributional dispersion, as shown by SHAP analyses; replications on robotics, advanced materials, and neuro implants confirm that exogenous diffusion remains the top-ranked target across fields ($R^2_{\text{test}} \sim 0.60$ to $0.87$), while endogenous predictability rises markedly in neuro implants ($R^2_{\text{test}} = 0.83$), indicating that the quantum-computing asymmetry does not generalise uniformly. Case studies reveal that sharp entropy increases coincide with the opening of new conceptual frontiers, while entropy collapses signal technological convergence or paradigm displacement. These results demonstrate that conceptual diffusion is governed by stable structural regularities embedded in semantic and citation environments. By identifying early diversity-based signals of cross-domain uptake, the approach provides a scalable foundation for anticipatory scientometrics, technology foresight, and innovation-oriented policy analysis in rapidly evolving research fields.

2606.03904 2026-06-03 cs.LG cs.CV 版本更新

MAdam: Metric-Aware Multi-Objective Adam

MAdam: 度量感知的多目标Adam

Fengbei Liu, Rachit Saluja, Sunwoo Kwak, Ruibo Wang, Ruining Deng, Heejong Kim, Johannes C. Paetzold, Mert R. Sabuncu

发表机构 * Cornell Tech（康奈尔科技）； Weill Cornell Medicine（韦尔医学院）； Delft University of Technology（代尔夫特理工大学）

AI总结提出MAdam，通过偏好条件曲率预处理多目标优化中的协调方向，解决Adam与求解器之间的权重失配和几何失配问题，在多任务学习、帕累托前沿恢复等任务中一致提升性能。

详情

AI中文摘要

多目标优化是许多机器学习问题的基础，然而跨损失平衡、梯度平衡和基于帕累托的求解器家族几乎都将它们协调后的方向交给Adam处理。我们表明这种耦合在求解器的意图和优化器的执行之间引入了两个系统性差距。第一个是权重失配：Adam的二阶矩分母将时变偏好向量与梯度统计量纠缠在一起，将偏好边缘化为历史平均值，并将不同的帕累托权衡压缩为近乎均匀的混合。第二个是几何失配：Adam的自适应度量扭曲了多目标优化求解器假设的欧几里得几何，将对齐的目标转化为明显的冲突。为了共同解决这两个问题，我们引入了MAdam（度量感知的多目标Adam），这是一个即插即用的包装器，不改变求解器和优化器。MAdam通过标量化目标的偏好条件曲率对协调方向进行预处理；在此白化输入上，Adam的二阶矩退化为单位矩阵，因此实际更新由偏好条件度量主导。在多任务学习、帕累托前沿恢复、物理信息神经网络和医学成像中，MAdam在每个求解器家族上都一致优于Adam。

英文摘要

Multi-objective optimization (MOO) underlies many machine learning problems, yet MOO solvers across the loss-balancing, gradient-balancing, and Pareto-based families almost universally hand their reconciled directions to Adam (Kingma and Ba, 2015). We show this coupling introduces two systematic gaps between the solver's intent and the optimizer's execution. The first is a weighting mismatch: Adam's second-moment denominator entangles the time-varying preference vector with gradient statistics, marginalizing the preference into a history average and collapsing distinct Pareto trade-offs toward a near-uniform mixture. The second is a geometric mismatch: Adam's adaptive metric distorts the Euclidean geometry MOO solvers assume, turning aligned objectives into apparent conflicts. To resolve both jointly, we introduce MAdam (Metric-Aware Multi-Objective Adam), a drop-in wrapper that leaves both solver and optimizer unchanged. MAdam preconditions the reconciled direction by the preference-conditioned curvature of the scalarized objective; on this whitened input, Adam's second moment collapses to identity, so the realized update is governed by the preference-conditioned metric. Across multi-task learning, Pareto-front recovery, physics-informed neural networks, and medical imaging, MAdam consistently improves over Adam for every solver family.

2606.03888 2026-06-03 cs.CV cs.LG 版本更新

CoralBay: A Self-Supervised CT Foundation Model

CoralBay: 一种自监督CT基础模型

Ioannis Gatopoulos, Nicolas Känzig, Sebastian Otálora, Fei Tang

发表机构 * kaiko.ai（Kaiko AI）

AI总结提出CoralBay框架，通过分层3D Swin骨干网络和自蒸馏学习多尺度特征，实现CT体积数据的自监督预训练，有效提升下游放射学任务性能。

详情

AI中文摘要

自监督学习已在2D自然图像上实现了大规模预训练，产生了跨任务有效迁移的通用视觉表示。然而，许多医学成像模态（如CT扫描）本质上是三维的，在结构和语义上与自然图像根本不同。体积模态捕捉空间连续性、器官解剖和基于强度的组织特性（如亨氏单位），这些无法通过2D预训练充分建模。为弥补这一差距，我们引入了CoralBay，一种自蒸馏框架，通过使用分层3D Swin骨干网络并将自蒸馏应用于拼接的多尺度特征，扩展了DINO，实现了数据高效的自监督学习，编码了全局语义和细粒度局部结构的丰富空间表示。因此，CoralBay有效迁移到广泛的下游放射学任务，在多样化的解剖目标上展现出强大且一致的性能。此外，我们通过引入一个公开、可复现的3D放射学排行榜，为开源eva框架做出贡献，该排行榜统一了多个数据集，并建立了评估体积表示学习方法的标准化基准。

英文摘要

Self-supervised learning has enabled large-scale pre-training on 2D natural images, producing general-purpose visual representations that transfer effectively across tasks. However, many medical imaging modalities, such as CT scans, are inherently three-dimensional and differ fundamentally from natural images in both structure and semantics. Volumetric modalities capture spatial continuity, organ anatomy, and intensity-based tissue properties (e.g., Hounsfield Units), which are not adequately modeled by 2D pre-training. To bridge this gap, we introduce CoralBay, a self-distillation framework that extends DINO by using a hierarchical 3D Swin backbone and applying self-distillation to concatenated multi-scale features, enabling data-efficient self-supervised learning of rich spatial representations that encode both global semantics and fine-grained local structure. As a result, CoralBay transfers effectively to a wide range of downstream radiological tasks, demonstrating strong and consistent performance across diverse anatomical targets. In addition, we contribute to the open-source eva framework by introducing a public, reproducible 3D radiology leaderboard that unifies multiple datasets and establishes a standardized benchmark for evaluating volumetric representation learning methods.

2606.03885 2026-06-03 cs.LG 版本更新

Attribution via Distributional Paths for Information Revelation

通过分布路径进行信息揭示的归因

Kieran A. Murphy, Shameen Shrestha

发表机构 * New Jersey Institute of Technology（新泽西理工学院）

AI总结提出Reveal-IG方法，将路径归因从输入空间提升到结构化探针分布空间，通过逐步揭示信息并归因模型期望输出的变化，保留完整性并避免输入空间路径伪影。

Comments Code: https://github.com/murphyka/Reveal-IG

详情

AI中文摘要

特征归因方法通过为输入特征分配重要性分数来解释预测。基于路径的方法（如积分梯度）特别有吸引力，因为它们满足完整性：归因总和等于模型输出在参考状态和输入之间的变化。然而，大多数路径方法在输入空间中定义轨迹，沿着所选路径通过逐点扰动输入来解释模型。输入空间路径积分模型在每个经过点的原始响应，无法控制特征被查询的分辨率；轨迹中靠近基线的早期部分与输入本身对解释的贡献相同。在这里，我们将路径归因从输入空间提升到围绕感兴趣示例的结构化探针分布空间，并将我们的方法称为Reveal-IG。Reveal-IG不是遍历原始输入值，而是逐步揭示关于输入的信息，并归因模型期望输出沿此分布路径的变化。结果是一个路径归因框架，它保留了对期望模型响应的完整性，并自然地适应多尺度图像探针和表格数据中的特征级不确定性。综合诊断表明，Reveal-IG避免了影响输入空间方法的路径伪影，并且在ImageNet分类和表格回归中，它产生稳定的、有符号的归因，在使用归因符号的指标上领先，同时在其余指标上保持竞争力。

英文摘要

Feature attribution methods explain predictions by assigning importance scores to input features. Path-based methods such as Integrated Gradients are especially appealing because they satisfy completeness: attributions sum to the change in model output between a reference state and the input. Yet most path methods define this trajectory in input space, explaining a model through pointwise perturbed inputs along a chosen path. An input-space path integrates the model's raw response at each point it passes through, with no control over the resolution at which a feature is queried; the early, baseline-adjacent part of the trajectory contributes to the explanation on equal footing with the input itself. Here, we lift path attribution from input space to a space of structured probe distributions around the example of interest, and call our method Reveal-IG. Rather than traversing raw input values, Reveal-IG progressively reveals information about the input and attributes changes in the model's expected output along this distributional path. The result is a path-attribution framework that retains completeness with respect to the expected model response, and naturally accommodates multiscale image probes and feature-wise uncertainty in tabular data. Synthetic diagnostics show that Reveal-IG avoids path artifacts that affect input-space methods, and across ImageNet classification and tabular regression it produces stable, signed attributions -- leading on metrics that use attribution sign while remaining competitive on the rest.
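The completeness property mentioned in the abstract can be checked numerically for standard input-space Integrated Gradients, the baseline that Reveal-IG generalizes. The sketch below is not Reveal-IG itself; the toy model `f`, its hand-written gradient, and the function names are assumptions chosen so the example needs no autodiff library.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Straight-line Integrated Gradients using a midpoint Riemann sum."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    grads = [grad_fn(baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

# Toy model (hypothetical): f(x) = x0^2 + 3 * x0 * x1, with its analytic gradient.
def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attributions = integrated_gradients(grad_f, x, baseline)

# Completeness: the attributions sum to f(x) - f(baseline).
print(attributions, attributions.sum(), f(x) - f(baseline))
```

Every point on the straight path, including those near the baseline, enters the average with equal weight, which is the behaviour the abstract identifies as a limitation of input-space paths.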

2606.03883 2026-06-03 cs.AI cs.LG 版本更新

Reasoning Structure of Large Language Models

大型语言模型的推理结构

Frédéric Berdoz, Luca A. Lanzendörfer, Fabian Farestam, Roger Wattenhofer

AI总结针对大型推理模型评估中隐藏不同推理结构的问题，提出基于逻辑谜题的基准测试和将非结构化轨迹转化为可验证推理图的方法，并定义推理效率指标，以量化分析推理拓扑结构。

Comments Accepted at ICML 2026 and presented at the ICLR 2026 workshop on LLM reasoning

2606.03871 2026-06-03 cs.CV cs.CL cs.LG 版本更新

Visual Instruction Tuning Aligns Modalities through Abstraction

视觉指令调优通过抽象对齐模态

Luis Palacios, Lorenzo Basile, Diego Doimo, Alberto Cazzaniga

发表机构 * Area Science Park, Trieste, Italy（意大利的里雅斯特Area Science Park）

AI总结通过探针分析和因果干预，发现视觉指令调优将视觉特征直接嵌入LLM的中间语义层，绕过早期单模态处理层，并通过扩展和强化现有抽象阶段对齐视觉与文本表示。

详情

AI中文摘要

视觉指令调优有效地使预训练的大语言模型（LLM）能够同时处理图像信息和文本。然而，视觉特征如何嵌入到LLM骨干网络的逐层抽象层次中仍不清楚。通过一系列不同的视觉-语言架构，我们表明指令调优主要充当桥梁，将视觉特征直接嵌入到LLM的中间语义层，绕过了用于单模态处理的早期层。通过探针分析和因果干预，我们表明这些中间层是视觉-语言处理的语义核心，并在广泛的多模态基准测试中发挥关键作用。此外，通过比较语义等价的视觉和文本表示的几何结构，我们发现微调扩展并强化了现有的抽象阶段，使视觉特征与已有的文本特征对齐。最后，我们通过将微调限制在中间层来确认这种局部对齐的功能作用：该策略在视觉中心基准测试中保持了全微调的性能，同时减少了训练时间。我们的结果表明，多模态集成是一种局部现象，由LLM内部抽象引擎的重新利用驱动。

英文摘要

Visual instruction tuning effectively adapts a pre-trained Large Language Model (LLM) to process image information alongside text. Yet, it remains unclear how visual features are embedded into the layer-wise hierarchy of abstractions of the LLM backbone. Across a diverse set of vision-language architectures, we show that instruction tuning primarily serves as a bridge, embedding visual features directly into the intermediate semantic layers of the LLM, bypassing the early layers devoted to unimodal processing. With probing analyses and causal interventions, we show that these intermediate layers are the semantic core of vision-language processing and play a critical role in the performance on a broad set of multimodal benchmarks. In addition, by comparing the geometry of semantically equivalent visual and textual representations, we find that fine-tuning extends and strengthens the existing abstraction phase, aligning visual features with pre-existing textual ones. Finally, we confirm the functional role of this localized alignment by restricting fine-tuning to intermediate layers alone: this strategy preserves the performance of full fine-tuning on vision-centric benchmarks while reducing training time. Our results suggest that multimodal integration is a localized phenomenon driven by the repurposing of the internal abstraction engine of the LLM.

2606.03864 2026-06-03 cs.SI cs.CY cs.DL cs.LG physics.soc-ph 版本更新

Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics

基于概念网络动力学的科学突破可解释预测

Thomas Maillart, Thibaut Chataing, Ntorina Antoni, David Dosu, Paul Bagourd, Julian Jang-Jaccard, Alain Mermoud

发表机构 * Geneva School of Economics and Management, University of Geneva, Geneva, Switzerland（日内瓦经济管理学院，日内瓦大学，瑞士日内瓦）； Faculty of Medicine, University of Geneva, Geneva, Switzerland（日内瓦大学医学学院，瑞士日内瓦）； TU Eindhoven, The Netherlands（埃因霍温理工大学，荷兰）； Open Quantum Institute, CERN, Geneva, Switzerland（开放量子研究所，欧洲核子研究中心，瑞士日内瓦）； armasuisse Science + Technology, Switzerland（armasuisse 科学与技术，瑞士）

AI总结提出一种可解释的机器学习方法，通过建模OpenAlex概念网络的演化，预测科学突破的结构前兆（研究概念之间联系的出现和增强），并利用59个特征的两阶段LightGBM模型同时预测概念对的形成和未来权重，在四个技术/生物医学领域取得优于现有方法的ROC-AUC（0.954-0.967）和可解释性。

Comments 18 pages, 10 figures, 4 tables. An earlier version was presented at Global Tech Mining Conference 2026. Code and data: https://github.com/wazaahhh/breakthroughs-forecasting

详情

AI中文摘要

我们介绍了一种可解释的机器学习方法，通过建模OpenAlex概念网络随时间演化的方式，预测科学突破的结构前兆——研究概念之间联系的出现和增强。利用59个语义和拓扑特征，一个两阶段LightGBM模型联合预测概念对的形成及其未来权重，增加了一个回归阶段，将预期强度量化到先前的链接存在预测之上。与现有技术相比，该方法同时提高了准确性和可解释性：在四个技术和生物医学领域的比较验证中，无需重新调整即可在所有时间范围内获得[0.954, 0.967]的ROC-AUC，超过了先前模型约0.90的水平，而每个预测都基于结构化的、可审计的特征，而非不透明的嵌入。分类性能高（AUC约0.95），回归保持稳定（一到五年内RMSLE为0.45至0.6）。特征归因表明，结构因素——特别是Adamic-Adar相似性和基于度的Hadamard度量——持续驱动准确性，表明与突破相关的重组出现在紧密连接的子网络中。两个专家锚定的案例——量子退火和AI赋能的量子架构——显示模型浮现出与专家预期一致的技术融合。然后，我们概述了一个三层决策架构——检测、专家翻译、机构整合——将这些预测转化为基于证据的研究战略和政策，以开放数据和可解释特征为基础。

英文摘要

We introduce an explainable machine-learning approach that forecasts the structural precursors of scientific breakthroughs -- the emergence and intensification of links between research concepts -- by modelling how OpenAlex concept networks evolve over time. Using 59 semantic and topological features, a two-stage LightGBM model jointly predicts the formation and the future weight of concept pairs, adding a regression stage that quantifies expected intensity to prior link-existence forecasts. Relative to the state of the art, the approach improves accuracy and explainability at once: comparative validation across four technology and biomedical domains yields ROC-AUC in [0.954, 0.967] at all horizons without re-tuning, exceeding the roughly 0.90 of prior models, while every forecast rests on structural, auditable features rather than opaque embeddings. Classification performance is high (AUC about 0.95) and regression remains stable (RMSLE 0.45 to 0.6 over one to five years). Feature attribution shows that structural factors -- particularly Adamic-Adar similarity and degree-based Hadamard measures -- consistently drive accuracy, suggesting that breakthrough-relevant recombinations emerge in tightly connected sub-networks. Two expert-anchored cases, quantum annealing and AI-enabled quantum architectures, show the model surfacing technological convergence consistent with expert expectations. We then outline a three-layer decision architecture -- detection, expert translation, institutional integration -- that turns these forecasts into evidence-based research strategy and policy, anchored in open data and explainable features.
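Adamic-Adar similarity, one of the structural features the abstract highlights, can be computed from neighbor sets alone. The sketch assumes an unweighted, undirected graph stored as a dict of neighbor sets and two distinct query nodes; the concept labels are made up for illustration and the paper's feature pipeline may use weighted or time-sliced variants.

```python
import math

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbors of two distinct nodes."""
    common = adj[u] & adj[v]
    # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in common)

# Hypothetical concept co-occurrence graph.
adj = {
    "qubit": {"annealing", "error_correction"},
    "optimization": {"annealing"},
    "annealing": {"qubit", "optimization", "ising_model"},
    "ising_model": {"annealing"},
    "error_correction": {"qubit"},
}

print(adamic_adar(adj, "qubit", "optimization"))      # share the hub "annealing"
print(adamic_adar(adj, "optimization", "error_correction"))  # no shared neighbor
```

Common neighbors with few links contribute more than hubs, so a high score indicates that two concepts meet through specific, tightly connected intermediaries.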

2606.03851 2026-06-03 cs.LG 版本更新

通过文本选择和属性匹配的文本属性图压缩

Haowei Han, Yuxiang Wang, Guojia Wan, Hao Wang, Shanshan Feng, Hao Huang, Jiawei Jiang, Xiao Yan

发表机构 * School of Computer Science Wuhan University（武汉大学计算机学院）； Institute for Math & AI Wuhan University（武汉大学数学与人工智能研究院）

AI总结提出TAGSAM方法，通过子图文本选择和属性相似性匹配压缩文本属性图，在保持训练精度的同时显著降低空间和时间消耗。

详情

DOI: 10.1145/3774904.3792205

AI中文摘要

文本属性图（TAG）是一种重要的图结构数据，其中每个节点都有文本描述。TAG模型通常联合训练图神经网络（GNN）和语言模型，导致高空间和时间消耗，尤其是在大型数据集上。为了缓解这一问题，我们提出了TAGSAM，一种在保持训练精度的同时压缩TAG的压缩方法。TAGSAM有两个关键设计，即子图文本选择和属性相似性匹配，分别压缩TAG的文本描述和图拓扑。对于文本，子图文本选择通过最大化互信息从多个相关文本描述中选择并合并代表性文本块。对于图拓扑，基于匹配训练轨迹（MTT）的流行压缩方法存在高方差，阻碍了精度。我们的属性相似性匹配通过对齐稳定的相似性矩阵来缓解这一问题。我们评估了TAGSAM与六个最先进的基线方法，结果显示其优越性能。在相同压缩大小下，TAGSAM在精度上平均比最佳基线提高4.9%。此外，即使将TAG压缩到仅1%的大小，它仍能保持有竞争力的训练精度。我们的代码可在以下网址获取：https://github.com/SundayVHan/TAGSAM

英文摘要

Text-Attributed Graph (TAG) is an important type of graph structured data, where each node has a text description. TAG models usually train a Graph Neural Network (GNN) and language model jointly, which leads to high space and time consumption, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM comes with two key designs, i.e., subgraph text Selection and Attribute similarity Matching, which compress the text description and graph topology of TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices. We evaluate TAGSAM against six state-of-the-art baselines, where it showcases superior performance. For the same compressed size, TAGSAM improves upon the best-performing baseline by an average of 4.9% in accuracy. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% size. Our code is available at https://github.com/SundayVHan/TAGSAM

2606.03831 2026-06-03 cs.LG stat.ML 版本更新

Online Learning with Gradient-Variation Interval Regret

基于梯度变化的区间遗憾在线学习

Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou

发表机构 * State Key Laboratory for Novel Software Technology and the School of Artificial Intelligence, Nanjing University（新型软件技术国家重点实验室和人工智能学院，南京大学）； Institute of Operations Research and Analytics, National University of Singapore（运筹与分析研究所，新加坡国立大学）

AI总结本文提出首个基于梯度变化量实现区间遗憾界的在线学习算法，采用两层在线集成结构，自适应多种问题相关量并达到极小化最优率，同时引入Lipschitz和平滑性无关的变体。

详情

AI中文摘要

本文研究使用区间遗憾度量的非平稳在线学习，该度量要求在线算法在每个时间区间内表现良好。我们提出了第一个在线学习算法，其区间遗憾界随梯度变化缩放，梯度变化是衡量在线函数梯度累积变化的基本度量，与多种问题相关量有关，并与随机优化等问题紧密相连。我们的方法采用简单高效的两层在线集成结构，实现了强大的理论保证。具体来说，它享有同时自适应多种问题相关量的遗憾界，同时在最坏情况下保持极小化最优率。此外，认识到超参数调优的挑战，我们引入了一种Lipschitz和平滑性无关的变体，自动适应这些可能未知的常数。这主要得益于一种新颖的Lipschitz自适应元算法，该算法可能具有独立的意义。除了区间遗憾，我们的方法还产生了更广泛的影响：它为区间动态遗憾（一种更强的度量，与任何区间上的变化比较器竞争）提供了通用的界，并首次为随机扩展对抗优化提供了分段刻画。理论发现通过实验得到验证。

英文摘要

This paper investigates non-stationary online learning using the metric of interval regret, which requires an online algorithm to perform well over every time interval. We propose the first online learning algorithm that achieves an interval regret bound scaling with gradient variation, a fundamental measure of the cumulative change in online function gradients, which relates to various problem-dependent quantities and is closely connected to stochastic optimization and other problems. Our method employs a simple and efficient two-layer online ensemble structure that achieves strong theoretical guarantees. Specifically, it enjoys a regret bound that simultaneously adapts to various problem-dependent quantities while also preserving the minimax-optimal rate in the worst case. Moreover, recognizing the challenge of hyperparameter tuning, we introduce a Lipschitz- and smoothness-agnostic variant that automatically adapts to these potentially unknown constants. This is primarily enabled by a novel Lipschitz-adaptive meta algorithm, which may be of independent interest. Beyond interval regret, our method also yields broader implications: it provides versatile bounds for interval dynamic regret, a stronger measure that competes with changing comparators over any interval, and yields the first piecewise characterization for stochastic extended adversarial optimization. Theoretical findings are validated by experiments.
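For orientation, the two quantities named in the abstract are commonly defined in the online learning literature as follows. This is the standard textbook form over a convex domain $\mathcal{X}$ with losses $f_1, \dots, f_T$ and decisions $x_1, \dots, x_T$; the paper's own notation and its interval-restricted version of the gradient variation may differ.

```latex
% Gradient variation over the full horizon (standard definition, assumed here)
V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}}
      \left\| \nabla f_t(x) - \nabla f_{t-1}(x) \right\|_2^2

% Regret on an interval [r, s] \subseteq [1, T]; interval regret asks that
% this quantity be small for every such interval simultaneously
\mathrm{Reg}_{[r,s]} = \sum_{t=r}^{s} f_t(x_t)
      - \min_{x \in \mathcal{X}} \sum_{t=r}^{s} f_t(x)
```

A bound that scales with gradient variation is small when consecutive loss functions have similar gradients and falls back to the worst-case rate otherwise, which matches the adaptivity the abstract claims.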

2606.03825 2026-06-03 cs.LG cs.CL 版本更新

Dynamic Short Convolutions Improve Transformers

动态短卷积改进Transformer

Oliver Sieberling, Bharat Runwal, Rameswar Panda, Yoon Kim

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； MIT-IBM Watson AI Lab（MIT-IBM沃森人工智能实验室）

AI总结本文提出动态短卷积作为新的神经网络原语，通过输入依赖的滤波器增强Transformer，在语言建模中相比标准Transformer和静态短卷积变体持续提升性能，并带来计算优势。

详情

AI中文摘要

Transformer已成为大型语言模型的主导架构，主要得益于注意力、前馈层、残差连接和归一化的可扩展性和灵活性。本文引入动态短卷积作为改进Transformer的额外神经网络原语。与静态短卷积不同，动态卷积使用输入依赖的滤波器，在保持卷积局部性偏差的同时增加表达能力。动机实验表明，在具有挑战性的关联回忆任务中，对键、查询和值表示应用动态短卷积相比静态卷积变体提升了性能。在从150M到2B参数的语言建模实验中，动态卷积持续优于标准Transformer和用静态短卷积增强的Transformer。拟合缩放定律表明，当动态卷积应用于键、查询和值向量时，相对于计算匹配的Transformer具有1.33倍的计算优势，而在每个线性层后添加动态卷积时优势达到1.60倍。动态卷积还在线性RNN（Mamba-2/Gated DeltaNet）和混合专家架构上带来了改进。我们通过自定义Triton内核使这些增益变得实用，实现了高效的训练和可管理的端到端减速。这些结果表明，动态短卷积是一种可扩展、硬件高效且富有表现力的原语，可用于推进基于Transformer的语言模型。

英文摘要

Transformers have become the dominant architecture for large language models, largely due to the scalability and flexibility of attention, feed-forward layers, residual connections, and normalization. This paper introduces dynamic short convolutions as an additional neural network primitive for improving Transformers. Unlike static short convolutions, dynamic convolutions use input-dependent filters, which preserves the locality bias of convolution while increasing expressivity. Motivating experiments show that applying dynamic short convolutions to key, query, and value representations improves performance on challenging associative recall tasks compared with static convolutional variants. Across language-modeling experiments ranging from 150M to 2B parameters, dynamic convolutions consistently outperform standard Transformers and Transformers augmented with static short convolutions. Fitting scaling laws indicates a 1.33$\times$ compute advantage over compute-matched Transformers when dynamic convolutions are applied to the key, query, and value vectors, and a 1.60$\times$ advantage when adding dynamic convolutions after every linear layer. Dynamic convolutions also offer improvements on linear RNNs (Mamba-2/Gated DeltaNet) and mixture-of-experts architectures. We make these gains practical with custom Triton kernels that enable efficient training with a manageable end-to-end slowdown. These results suggest that dynamic short convolutions are a scalable, hardware-efficient, and expressive primitive for advancing Transformer-based language models.
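The difference between a static and a dynamic short convolution can be sketched in NumPy. The parameterization is an assumption made for illustration: each position produces its own K filter taps through a hypothetical linear map `W_filter`, and the taps are shared across channels. The paper's actual filter generator and its Triton kernels are not reproduced here.

```python
import numpy as np

def dynamic_short_conv(x, W_filter):
    """Causal short convolution whose taps depend on the input at each position.

    x: (T, d) sequence of token representations.
    W_filter: (d, K) hypothetical projection producing K taps per position.
    Output: y[t] = sum_j taps[t, j] * x[t - j], with zero padding for t - j < 0.
    """
    x = np.asarray(x, dtype=float)
    T, d = x.shape
    K = W_filter.shape[1]
    taps = x @ W_filter                            # (T, K), input-dependent filters
    padded = np.vstack([np.zeros((K - 1, d)), x])  # left padding keeps it causal
    y = np.zeros_like(x)
    for j in range(K):
        start = K - 1 - j                          # row of padded holding x[t - j] for t = 0
        y += taps[:, j:j + 1] * padded[start:start + T]
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
W_filter = rng.normal(size=(4, 3))
y = dynamic_short_conv(x, W_filter)
print(y.shape)
```

A static short convolution would use the same K taps at every position; here the taps change with the token, while the window of K positions keeps the locality bias described in the abstract.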

2606.03821 2026-06-03 cs.LG 版本更新

Finding Needles in the Haystack: Transductive Active Labeling in Ecology

在干草堆中寻找针：生态学中的转导式主动标注

Rupa Kurinchi-Vendhan, Sara Beery

发表机构 * Massachusetts Institute of Technology（麻省理工学院）

AI总结本文提出转导式主动标注方法，通过发现稀有类样本解决生态数据长尾分布问题，并设计混合停止准则提升稀有类恢复。

AI中文摘要

主动学习现在已成为标注生态数据的标准做法，使生态学家能够快速处理大量野外数据以理解和监测自然环境。当前的做法归纳性地评估主动学习，在保留的测试集上估计预测性能。我们认为这种评估与大多数生态任务不一致，这些任务的目标是尽可能高效地转导式地标注整个数据池。我们证明，忽略人在回路中会低估继续标注的重要性，特别是对于长尾中的类别，这些类别可能具有不成比例的生态重要性（稀有物种、不常见行为等）。我们的分析表明，对于这个长尾，转导式目标将重要性从预测转移到发现：真正的挑战变成了在干草堆中寻找针，即嵌入在潜在几何中丰富类别密集区域内的稀有类别样本，我们通过一种新的采样难度度量来量化这一点。最后，为了将这些见解转化为实际的生态工作流程，我们提出了一种受生态稀疏曲线启发的保守混合停止准则，并表明将预测性能与发现标准相结合可以减少长尾池上的过早停止，当发现（而非分类）是限制因素时，改善稀有类别的恢复。

英文摘要

Active learning is now standard practice in labeling ecological data, enabling ecologists to quickly process large volumes of field data to understand and monitor natural environments. Current practices evaluate active learning inductively, estimating predictive performance on a held-out test set. We argue that this evaluation is misaligned with most ecological tasks, where the goal is to transductively label an entire pool of data as efficiently as possible. We demonstrate that ignoring the human-in-the-loop underestimates the importance of continuing to label, particularly for classes in the long tail which may be of disproportionate ecological importance (rare species, uncommon behaviors, etc.). Our analysis shows that, for this long tail, the transductive objective shifts importance from prediction to discovery: the true challenge becomes finding "needles in the haystack," examples of rare classes that are embedded within dense regions of abundant classes in the latent geometry, which we quantify with a novel metric of sampling difficulty. Finally, to translate these insights to practical ecological workflows, we propose a conservative hybrid stopping criterion inspired by ecological rarefaction curves, and show that combining predictive performance with discovery criteria reduces premature stopping on long-tailed pools, improving rare-class recovery when discovery, not classification, is the limiting factor.

2606.03819 2026-06-03 cs.LG 版本更新

TreeFlash: Parallel AR-Approximation for Faster Speculative Decoding

TreeFlash: 用于更快推测解码的并行AR近似

Peer Rheinboldt, Frédéric Berdoz, Roger Wattenhofer

发表机构 * ETH Zurich（苏黎世联邦理工学院）

AI总结提出TreeFlash，通过MLP层近似自回归分布，在保持O(1)解码时间复杂度的同时，提升树形推测解码的块效率和加速比。

AI中文摘要

用于推测解码的一次性块起草器在单次前向传播中生成完整草稿，通过消除顺序令牌生成实现高吞吐量。然而，它们仅基于前缀上下文预测每个草稿令牌，而不依赖于先前生成的令牌。这种非自回归条件导致随着草稿深度增加，起草器的分布偏离验证器的真实自回归分布。在基于树的起草中，这个问题更加严重，因为不同的分支被迫共享后续令牌的相同边际分布。我们提出TreeFlash，通过引入一个以起草器隐藏状态和前一个令牌为条件的MLP层来近似自回归分布，从而解决这一问题。TreeFlash通过采用两阶段近似机制，保留了一次性起草器的O(1)解码时间复杂度。TreeFlash在各种任务和模型上实现了最先进的性能，与边际树起草相比，块效率提高了12%，加速比提高了9%。

英文摘要

One-shot block drafters for speculative decoding generate the full draft in a single forward pass, achieving strong throughput by eliminating sequential token generation. However, they predict each draft token conditioned only on the prefix context, with no dependence on previously drafted tokens. This non-autoregressive conditioning causes the drafter's distribution to diverge from the verifier's true autoregressive distribution as draft depth grows. This problem becomes more severe in tree-based drafting, where distinct branches are forced to share the same marginal distribution for subsequent tokens. We propose TreeFlash, which addresses this by incorporating an MLP layer conditioned on the drafter's hidden state and the previous token to approximate an autoregressive distribution. TreeFlash retains the $\mathcal{O}(1)$ decoding time complexity of one-shot drafters by employing a two-stage approximation mechanism. TreeFlash achieves state-of-the-art performance across a variety of tasks and models, with $12\%$ higher block efficiency and $9\%$ higher speedup than marginal tree drafting.

2606.03811 2026-06-03 cs.CR cs.AI cs.LG 版本更新

AI Agents Enable Adaptive Computer Worms

AI代理实现自适应计算机蠕虫

Jonas Guan, Tom Blanchard, Hanna Foerster, Hengrui Jia, Gabriel Huang, Nicolas Papernot

发表机构 * University of Toronto（多伦多大学）； Vector Institute（向量研究所）； University of Cambridge（剑桥大学）； ServiceNow

AI总结本研究展示了AI代理能够生成针对每个目标的定制攻击策略，利用被感染机器上的大语言模型自我维持并传播，形成自持的AI驱动网络威胁。

AI中文摘要

计算机蠕虫是一种通过在网络中从一台机器复制到另一台机器来传播的恶意软件。传统蠕虫（如WannaCry）利用预定的漏洞，修补这些漏洞即可阻止其传播。本文表明，人工智能（AI）代理实现了一种根本性的新威胁：一种能够针对每个遭遇的目标生成定制攻击策略的蠕虫。该蠕虫寄生性地利用被感染的机器运行开放权重的大语言模型（LLM）以维持其推理能力，或扩展其攻击范围。在部署于Linux、Windows和物联网（IoT）设备的机器网络上，该蠕虫通过利用常见的现实企业网络漏洞进行传播。由于蠕虫由窃取的计算资源驱动，攻击者每次新感染所需的边际成本为零。这在攻击者和防御者之间造成了不稳定的经济不对称。此外，由于蠕虫不需要商业AI平台，集中式安全控制（如服务拒绝或速率限制）在结构上无关紧要。我们的结果表明，自持的AI驱动网络威胁不再是理论上的。我们必须为自主的生成式对手做好准备：这些恶意软件系统无需人类操作员即可传播，其定义不是固定的利用代码，而是实时推理目标、适应观察并合成攻击逻辑的能力。

英文摘要

A computer worm is malware that spreads on a network by replicating itself from one machine to another. Traditional worms, like WannaCry, exploited predetermined vulnerabilities, and their spread can be halted by patching those vulnerabilities. Here we show that artificial intelligence (AI) agents enable a fundamentally new threat: a worm that generates attack strategies tailored to each target it encounters. The worm parasitically uses compromised machines to run open-weight large language models (LLMs) to sustain its reasoning or extend its reach for further attacks. Deployed on a network of machines spanning Linux, Windows, and IoT (Internet of Things) devices, the worm propagated by exploiting common, real-world corporate network vulnerabilities. Since the worm is powered by stolen compute, the attacker's marginal cost per new infection is zero. This creates a destabilizing economic asymmetry between attackers and defenders. Moreover, because the worm requires no commercial AI platform, centralized safety controls, such as service refusals or rate limiting, are structurally irrelevant. Our results demonstrate that self-sustaining AI-driven cyber-threats are no longer theoretical. We must prepare for autonomous generative adversaries: malware systems that propagate without human operators and are defined not by fixed exploit code, but by the capacity to reason about targets, adapt to observations, and synthesize attack logic in real time.

2606.03808 2026-06-03 cs.LG cs.AI cs.CR 版本更新

PURGE: Projected Unlearning via Retain-Guided Erasure

PURGE: 通过保留引导擦除的投影遗忘

Vedant Jawandhia, Daksh Ahuja, Ghufran Alam Siddiqui, Prashant Trivedi, Yash Sinha, Pratik Narang

发表机构 * BITS Pilani, Pilani Campus, India（印度比斯帕利尼学院）

AI总结提出一种基于持续学习与机器遗忘对偶性的遗忘算法PURGE，利用梯度投影约束保留损失，并通过多层表示擦除和保留混淆目标实现隐私与效用的平衡。

Comments 13 pages, 10 figures, 6 tables

AI中文摘要

我们提出PURGE，一种基于简单但未被充分利用的观察构建的机器遗忘算法：持续学习（CL）和机器遗忘（MU）本质上是对偶问题。CL试图在不遗忘旧任务的情况下学习新任务；MU试图在不损害保留性能的情况下擦除特定数据，二者是同一基本张力在相反方向上的体现。PURGE通过调整A-GEM（Chaudhry等人，2019）的梯度投影来利用这种对偶性，使得每个遗忘步骤都受到约束，不会增加保留集损失。在此基础上，它执行多层表示擦除，将中间层中遗忘集的激活推向保留分布，以从隐藏表示中移除信息，而不仅仅是在输出层抑制信息。一个关键的设计选择是保留混淆目标：不是将遗忘输出推向均匀分布（我们发现这很容易被成员推断攻击检测到），而是将目标设定为模型在保留数据上的自然混淆模式。这使得遗忘模型难以与从头重新训练的模型区分。两个自调节停止标准（保留损失预算和遗忘准确率目标）让算法自行决定何时停止，无需手动调整训练轮数。在五个数据集（CIFAR-10、MNIST、SVHN、STL10、PathMNIST）上的22个类别级遗忘任务实验中，PURGE始终将保留准确率保持在96%以上，同时实现接近0.5（理想值）的MIA AUROC，在隐私-效用前沿上优于梯度上升、KL均匀分布以及多个已发表的基线方法。

英文摘要

We propose PURGE, a machine unlearning algorithm built on a simple but under-exploited observation: continual learning (CL) and machine unlearning (MU) are fundamentally dual problems. CL tries to learn new tasks without forgetting old ones; MU tries to erase specific data without hurting retained performance, so the two represent the same underlying tension in opposite directions. PURGE leverages this duality by adapting gradient projection from A-GEM (Chaudhry et al., 2019) so that every unlearning step is constrained to not increase the retain-set loss. On top of this, it performs multi-layer representation erasure, pushing forget-set activations in intermediate layers towards the retain distribution to remove information from hidden representations rather than just suppressing it at the output. A key design choice is the retain-confusion target: rather than pushing forget outputs toward the uniform distribution, which we found to be surprisingly easy for membership inference attacks to detect, we instead target the model's natural confusion pattern on retain data. This makes the unlearned model hard to distinguish from one retrained from scratch. Two self-regulating stopping criteria (a retain-loss budget and a forget-accuracy target) let the algorithm decide on its own when to stop, removing the need for manual epoch tuning. In experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) across 22 class-level forgetting tasks, PURGE consistently keeps retain accuracy above 96% while achieving MIA AUROC close to 0.5 (the ideal), outperforming gradient ascent, KL-uniform, and several published baselines on the privacy-utility frontier.
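
The gradient-projection constraint mentioned in the abstract can be sketched as follows. This is the standard A-GEM projection rule applied to an unlearning step. It assumes the update is a plain descent step on `g_forget` (the gradient of whatever objective the unlearning step minimizes) and that `g_retain` is the retain-set loss gradient. How PURGE adapts the rule in detail is not stated in the abstract, and the function name `project_update` is hypothetical.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_update(g_forget: List[float], g_retain: List[float]) -> List[float]:
    """A-GEM-style projection of the unlearning gradient.

    A descent step theta <- theta - lr * g changes the retain loss, to first
    order, by -lr * dot(g, g_retain). The step is therefore harmless when the
    inner product is non-negative; otherwise the conflicting component along
    g_retain is removed.
    """
    inner = dot(g_forget, g_retain)
    if inner >= 0.0:
        return list(g_forget)  # no conflict with the retain objective
    scale = inner / dot(g_retain, g_retain)
    return [gf - scale * gr for gf, gr in zip(g_forget, g_retain)]


# Conflicting case: the raw step would increase the retain loss.
projected = project_update([1.0, -2.0], [0.0, 1.0])
# Non-conflicting case: the gradient is returned unchanged.
unchanged = project_update([1.0, 2.0], [0.0, 1.0])
```

After projection the update is orthogonal to the retain gradient, so the retain loss is unchanged to first order. The guarantee is local and depends on the step size.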

2606.03804 2026-06-03 cs.LG 版本更新

Easy-to-Use Shielding for Reinforcement Learning

易于使用的强化学习屏蔽技术

Stefan Pranger, Bettina Könighofer

AI总结提出tempestpy库，将形式化屏蔽合成集成到Gymnasium API中，降低强化学习安全探索的门槛，并扩展了随机多人博弈的屏蔽算法。

AI中文摘要

安全探索是强化学习中的一个关键挑战，旨在防止智能体在探索环境时做出有害决策。屏蔽是一种利用环境模型形式的领域知识来决定动作安全性的技术。尽管已经成熟，但由于缺乏将形式化屏蔽合成与标准强化学习框架连接起来的可访问端到端基础设施，屏蔽在强化学习中的应用有限。应用屏蔽通常需要形式化方法的专业知识和大量的工程工作，使其脱离典型的强化学习工作流程。我们通过将屏蔽合成工具Tempest扩展为安全强化学习的实用后端来解决这一问题。我们的核心贡献是tempestpy，一个Python库，它将基于Tempest的屏蔽合成直接集成到Gymnasium API中，使得屏蔽可以在现有的强化学习管道中合成和部署。这降低了屏蔽的入门门槛，将形式化安全探索方法转化为强化学习实践者可用的组件。我们还扩展了Tempest的算法支持，以计算随机多人博弈的可靠屏蔽，保留了形式化安全保证。我们端到端地展示了最终的工作流程，并在多个环境中评估了有屏蔽和无屏蔽的强化学习。为了便于建模，我们为MiniGrid提供了符号模型，并引入了MiniGridSafe，这是一个游乐场环境集合，旨在使屏蔽易于访问且实验透明。MiniGridSafe通过具有概率转换和额外智能体的安全导向场景扩展了MiniGrid，使得在简单直观的设置中研究具有挑战性的安全方面成为可能。

英文摘要

Safe exploration is a key challenge in Reinforcement Learning (RL) that aims to prevent agents from making harmful decisions while exploring their environment. Shielding is one such technique that assumes domain knowledge in the form of an environment model to decide upon action safety. Although well-established, shielding has seen limited adoption in RL due to the lack of accessible end-to-end infrastructure connecting formal shield synthesis with standard RL frameworks. Applying shielding typically requires expertise in formal methods and substantial engineering effort, keeping it outside the typical RL workflow. We address this by extending our shield synthesis tool Tempest into a practical backend for safe RL. Our core contribution is tempestpy, a Python library that integrates Tempest-based shield synthesis directly into the Gymnasium API, allowing shields to be synthesized and deployed within existing RL pipelines. This lowers the barrier to entry for shielding and turns formal safe-exploration methods into a usable component for RL practitioners. We also extend Tempest's algorithmic support to compute sound shields for stochastic multiplayer games, preserving formal safety guarantees. We demonstrate the resulting workflow end to end and evaluate shielded and unshielded RL across multiple environments. To facilitate modeling, we provide symbolic models for MiniGrid and introduce MiniGridSafe, a collection of playground environments designed to make shielding easily accessible and experimentally transparent. MiniGridSafe extends MiniGrid with safety-oriented scenarios featuring probabilistic transitions and additional agents, enabling the study of challenging safety aspects in a simple and intuitive setting.

2606.03800 2026-06-03 cs.LG cs.AI 版本更新

Trading Human Curation for Synthetic Augmentation in RLVR

在RLVR中用合成增强替代人工策展

Akshansh, Leonardo Rosa Rodrigues, Michael Korostelev, Youssef Hassan, Mark E. Whiting

发表机构 * Pareto AI

AI总结研究通过预指定、门控过滤的增强任务替代人工策展任务，在RLVR中实现成本效益权衡，并保持泛化性能。

Comments 21 pages, 5 main-text figures, 4 appendix figures. Preprint

AI中文摘要

高质量训练任务的供应是基于可验证奖励的强化学习（RLVR）在智能体语言模型上的核心瓶颈。每个任务需要一个沙盒环境、一个提示和一个手工编写的奖励函数，只有通过质量标准的任务才能产生有用的训练信号。达到这一质量标准的人工策展在有效RL训练所需的任务数量上无法经济地扩展，而自动生成的任务变体与人工编写任务之间的替代率尚未确定。我们研究在RLVR期间，使用预指定、门控过滤的增强（augmentations）作为额外人工策展的替代品。我们形式化了增强任务与人工任务之间的成本调整权衡率 $\rho_{\text{cost}}$，通过在不同增强比例的训练语料库上进行受控消融实验来测量它，并描述了增强管道的端到端经济学。用增强内容替代额外的人工编写任务，在涵盖代码、指令遵循、推理和多轮智能体函数调用的十个基准测试套件上保持了聚合的留出泛化能力。在合理的 $c_{\text{human}}/c_{\text{aug}}$ 范围内，门控合成与人工RLVR任务之间的成本调整权衡率 $\rho_{\text{cost}}$ 保持在 $[1.4\times, 11.6\times]$ 之间。

英文摘要

The supply of high-quality training tasks is a central bottleneck for reinforcement learning from verifiable rewards (RLVR) on agentic language models. Each task requires a sandboxed setup, a prompt, and a hand-authored reward function, and only tasks that pass a quality bar produce useful training signal. Hand-curation at this quality bar does not scale economically to the task counts effective RL training requires, and the substitution rate between automatically generated task variants and human-authored ones is not yet established. We investigate using pre-specified, gate-filtered augmentations of a small hand-authored base as a substitute for additional human curation during RLVR. We formalize the cost-adjusted trade rate $\rho_{\text{cost}}$ between augmented and human-authored tasks, measure it through a controlled ablation across training corpora with varying augmentation share, and characterize the end-to-end economics of the augmentation pipeline. Substituting augmented content for additional human-authored tasks retains aggregate held-out generalization on a ten-benchmark suite spanning code, instruction following, reasoning, and multi-turn agentic function-calling. The cost-adjusted trade rate $\rho_{\text{cost}}$ between gated synthetic and human-authored RLVR tasks stays in $[1.4\times, 11.6\times]$ across the plausible $c_{\text{human}}/c_{\text{aug}}$ range.

2606.03794 2026-06-03 cs.LG eess.SP 版本更新

基于熵引导的工具感知优化用于高效智能体强化学习

Hongye Cao, Nuo Yan, Haoyuan Deng, Ziwei Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, Yang Gao

发表机构 * National Key Laboratory for Novel Software Technology, Nanjing University（南京大学新型软件技术国家重点实验室）； Nanyang Technological University（南洋理工大学）； China Mobile NineVerse Artificial Intelligence Technology (Beijing) Co., Ltd.（中国移动九章人工智能技术（北京）有限公司）； Institute of Artificial Intelligence, NineVerse（九章人工智能研究院）

AI总结提出TAO-RL框架，通过工具感知轨迹过滤和熵引导探索解决智能体强化学习中工具使用导致的训练不稳定问题，在7个推理基准上优于现有方法。

AI中文摘要

智能体强化学习（RL）使大型语言模型（LLMs）具备工具使用能力，从而显著提升复杂任务的推理性能。然而，整合外部工具常常导致训练不稳定：过度依赖工具会引发输入分布偏移，而过于保守的工具使用则限制了有效探索。为解决这一问题，我们提出统一框架TAO-RL，将工具感知轨迹过滤与熵引导探索相结合，以实现高效策略优化。具体而言，在数据层面，TAO-RL根据两个标准过滤轨迹：丢弃所有工具调用均执行失败的轨迹，以及移除所有轨迹全部正确或全部错误的轨迹，因为这两种情况都会产生退化的优势估计，无法提供有区分度的学习信号。这种联合过滤保留了既具备工具能力又包含信息量的数据，建立了高质量的训练分布。在算法层面，我们引入工具感知的熵引导奖励，重塑工具调用后token的优势函数，鼓励策略在关键决策点探索更多样化的推理路径。这两个组成部分相互增强：轨迹过滤建立了干净且信息丰富的训练基础，而熵引导探索则在关键工具交互节点驱动更强的推理行为。在3种模型规模下的7个具有挑战性的推理基准上的大量实验表明，TAO-RL优于现有方法。

英文摘要

Agentic reinforcement learning (RL) equips large language models (LLMs) with tool-use capabilities that substantially improve reasoning on complex tasks. However, integrating external tools often destabilizes training: over-reliance on tools can induce input distribution shift, while overly conservative tool use limits effective exploration. To address this issue, we propose a unified framework TAO-RL that couples tool-aware trajectory filtering with entropy-guided exploration for efficient policy optimization. Specifically, at the data level, TAO-RL filters rollout trajectories along two criteria: discarding those where all tool invocations fail to execute, and removing those where all rollouts are either correct or incorrect, as both cases yield degenerate advantage estimates that contribute no discriminative learning signal. This joint filtering retains data that are both tool-capable and informative, establishing a high-quality training distribution. At the algorithmic level, we introduce a tool-aware entropy-guided bonus that reshapes the advantage function at post-tool-call tokens, encouraging the policy to explore more diverse reasoning paths at critical decision points. These two components are mutually reinforcing: trajectory filtering establishes a clean and informative training foundation, while entropy-guided exploration drives stronger reasoning behaviors at critical tool-interaction junctures. Extensive experiments on 7 challenging reasoning benchmarks across 3 model scales demonstrate the superiority of TAO-RL over existing methods.
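
The two data-level filtering criteria can be sketched as below. The `Rollout` record, the grouping of rollouts by prompt, the order in which the two filters are applied, and the treatment of rollouts with no tool calls are all assumptions made for illustration; the abstract does not fix these details.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rollout:
    prompt_id: str
    tool_call_ok: List[bool]  # one flag per tool invocation in the trajectory
    correct: bool             # verifiable outcome of the final answer


def all_tool_calls_failed(r: Rollout) -> bool:
    # Assumption: a rollout without any tool call is not counted as "all failed".
    return len(r.tool_call_ok) > 0 and not any(r.tool_call_ok)


def filter_rollouts(rollouts: List[Rollout]) -> List[Rollout]:
    # Criterion 1: drop trajectories in which every tool invocation failed.
    usable = [r for r in rollouts if not all_tool_calls_failed(r)]

    # Group the remaining rollouts by prompt (assumed grouping unit).
    groups: Dict[str, List[Rollout]] = {}
    for r in usable:
        groups.setdefault(r.prompt_id, []).append(r)

    # Criterion 2: drop groups that are all correct or all incorrect, since a
    # group-relative advantage would be zero for every rollout in them.
    kept: List[Rollout] = []
    for group in groups.values():
        if len({r.correct for r in group}) == 2:
            kept.extend(group)
    return kept


rollouts = [
    Rollout("p1", [True], True),
    Rollout("p1", [False, False], False),  # removed by criterion 1
    Rollout("p1", [True, False], False),
    Rollout("p2", [True], True),           # group p2 removed by criterion 2
    Rollout("p2", [True], True),
]
kept = filter_rollouts(rollouts)
```

Only prompt groups with mixed outcomes survive, which is what leaves a non-degenerate advantage signal for the policy update.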

2606.03756 2026-06-03 cs.RO cs.LG 版本更新

Neural Navigation Functions for Zero-Shot Generalizable Motion Planning

神经导航函数用于零样本泛化运动规划

Benjamin D. Shaffer, Pei-An Hsieh, Brooks Kinch, Nathaniel Trask, M. Ani Hsieh

发表机构 * University of Pennsylvania, United States（宾夕法尼亚大学，美国）； Department of Mechanical Engineering and Applied Mechanics（机械工程与应用力学系）； Department of Electrical and Systems Engineering（电气与系统工程系）

AI总结提出神经导航函数（Neural-NF），通过将数据驱动适应嵌入结构化椭圆规划器，实现跨未见环境几何的零样本迁移，并保证无碰撞、单调下降和全局最小值。

Comments 17 pages, 10 figures

AI中文摘要

我们引入了神经导航函数（Neural-NF），一种学习到的反应式导航函数，能够跨未见环境几何进行零样本迁移。Neural-NF将数据驱动适应置于结构化椭圆规划器中，其中导航目标被学习，而规划器结构通过构造得以保留。具体来说，内在的拉普拉斯派生特征被映射到局部PDE系数，求解得到的边值问题在每个目标域上产生全局一致的值函数。对于每个可接受的学习模型，所得策略无碰撞，提供单调下降，并通过构造在目标处具有全局最小值。这为任何参数设置提供了线性可解的最优控制解释。实验上，Neural-NF在多样几何上实现了强大的零样本迁移，并比直接预测值函数的学习规划器性能提升高达5倍。

英文摘要

We introduce Neural Navigation Functions (Neural-NF), a learned reactive navigation function capable of zero-shot transfer across unseen environment geometries. Neural-NF places data-driven adaptation within a structured elliptic planner, where the navigation objective is learned while planner structure is preserved by construction. Specifically, intrinsic Laplacian-derived features are mapped to local PDE coefficients, and solving the resulting boundary value problem produces a globally consistent value function on each target domain. For every admissible learned model, the resulting policy is collision-free, descends monotonically, and has a global minimum at the goal by construction. This admits a linearly-solvable optimal-control interpretation for any parameter setting. Empirically, Neural-NF achieves strong zero-shot transfer across diverse geometries and outperforms learned planners that directly predict the value function by up to $5\times$.

2606.03731 2026-06-03 cs.LG stat.ML 版本更新

Conformal Language Modeling via Posterior Sampling

通过后验采样的共形语言建模

Nicolas Emmenegger, Theo X. Olausson, Armando Solar-Lezama, Chara Podimata

发表机构 * Massachusetts Institute of Technology（麻省理工学院）

AI总结提出通过近似LLM后验采样（条件为校准的高分区域）来替代事后过滤，实现目标风险控制并提高下游效用。

AI中文摘要

大型语言模型仍然受到幻觉的困扰。最近的工作试图使用基于共形预测的统计技术来抑制其普遍性，取得了理论和实证上的成功。然而，这些方法以事后方式运作，将采样过程本身视为原子操作，然后通过外科手术式地修改样本来移除幻觉声明。这种过滤与生成之间的脱节可能导致样本不连贯、不一致，或者仅仅在模型本身下不太可能。此外，事后手术无法将概率质量转移到更有用和更有帮助的响应上。为了解决这些问题，我们提出从LLM后验的近似中采样，其中条件事件对应于一个校准的高分区域。我们开发了一种针对条件序列生成场景的校准程序，该程序能有效识别该区域并实现目标风险控制。在实证中，我们将我们的方法应用于以开放式的传记生成和数学问题解决为重点的案例研究；与先前的工作相比，我们获得了相同的统计保证，且下游效用更高。

英文摘要

Large Language Models remain plagued by hallucinations. Recent work has sought to tame their prevalence using statistical techniques based on conformal prediction, with both theoretical and empirical success. However, these methods operate in a post-hoc fashion, treating the sampling procedure itself as atomic and then surgically altering samples to remove hallucinated claims. This disconnect between filtering and generation can result in samples that are incoherent, inconsistent, or simply unlikely under the model itself. Moreover, post-hoc surgery is unable to shift probability mass towards more useful and helpful responses. To address these issues, we propose to instead sample from approximations to an LLM posterior, where the conditioning event corresponds to a calibrated, high-scoring region. We develop a calibration procedure tailored to the setting of conditional sequential generation that effectively identifies this region and achieves target risk control. Empirically, we apply our method to case studies focused on open-ended biography generation and mathematical problem solving; compared to prior work, we obtain the same statistical guarantees, with higher downstream utility.

2606.03723 2026-06-03 cs.LG 版本更新

Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter

先压缩后合并：从多个LoRA到一个低秩适配器

Zhengbao He, Ruiqi Ding, Zhehao Huang, Ruikai Yang, Tao Li, Xiaolin Huang

发表机构 * Institute of Image Processing and Pattern Recognition, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China（图像处理与模式识别研究所，自动化与智能感知学院，上海交通大学，上海，中国）； Shanghai Key Laboratory of Flexible Medical Robotics, Tongren Hospital, Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China（柔性医疗机器人上海市重点实验室，同仁医院，医疗机器人研究所，上海交通大学，上海，中国）

AI总结针对多LoRA合并时全参数合并破坏低秩结构的问题，提出先压缩后合并（CtM）方法，通过共享子空间投影保证输出严格秩r，性能优于现有单LoRA基线。

Comments Accepted to ICML 2026. Code: https://github.com/ZhengbaoHe/compress-then-merge

AI中文摘要

低秩适配（LoRA）实现了基础模型的参数高效特化，但任务特定适配器的激增将能力分散到多个适配器中，使复用和部署复杂化。我们研究将$T$个LoRA合并为单个秩-$r$ LoRA的问题，从而保留低秩结构的优势。现有的先合并后压缩流水线将秩约束视为事后考虑：它们在完整参数空间中合并适配器，然后通过截断SVD将合并结果压缩到秩$r$。然而，全参数合并可能破坏低秩结构，使得后续压缩难以恢复有效的秩-$r$ LoRA。我们提出先压缩后合并（CtM），一种反向流水线，在合并前强制秩-$r$瓶颈：CtM仅使用LoRA权重计算共享的$r$维子空间以捕获跨适配器的公共结构，将每个适配器投影到共享子空间以获得$r\times r$坐标，然后在此缩减空间中应用标准合并规则。CtM通过构造保证秩-$r$ LoRA，避免了事后截断，并在由拼接的LoRA因子张成的核心空间中实现高效计算。跨多个模型和任务的实验表明，CtM持续优于现有的单LoRA输出基线，同时缩小了与全参数合并方法的性能差距。

英文摘要

Low-rank adaptation (LoRA) enables parameter-efficient specialization of foundation models, but the proliferation of task-specific adapters fragments capabilities across many adapters, complicating reuse and deployment. We study the problem of merging $T$ LoRAs into a single rank-$r$ LoRA, thereby preserving the benefits of low-rank structure. Existing Merge-then-Compress pipelines treat the rank constraint as an afterthought: they merge adapters in the full parameter space, then compress the merged result to rank $r$ via truncated SVD. However, full-parameter merging may destroy the low-rank structure, making it difficult for subsequent compression to recover an effective rank-$r$ LoRA. We propose Compress-then-Merge (CtM), a reversed pipeline that enforces the rank-$r$ bottleneck before merging: CtM computes shared $r$-dimensional subspaces using only the LoRA weights to capture cross-adapter common structure, projects each adapter into the shared subspaces to obtain $r\times r$ coordinates, and then applies standard merging rules in this reduced space. CtM guarantees a rank-$r$ LoRA by construction, avoiding post-hoc truncation, and enables efficient computation in the core space spanned by concatenated LoRA factors. Experiments across multiple models and tasks show that CtM consistently outperforms existing single-LoRA-output baselines while narrowing the performance gap to full-parameter merging methods.
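
One way to read the Compress-then-Merge pipeline in symbols is sketched below. The notation ($U$, $V$, $M_t$, $\alpha_t$, $r_t$), the orthonormality of the shared bases, and the weighted sum used as the "standard merging rule" are assumptions for illustration; the abstract does not give the paper's exact construction.

```latex
% Adapter t: \Delta W_t = B_t A_t, with B_t \in \mathbb{R}^{d \times r_t}, A_t \in \mathbb{R}^{r_t \times k}.
% Shared bases (assumed to have orthonormal columns): U \in \mathbb{R}^{d \times r}, V \in \mathbb{R}^{k \times r}.
\begin{align}
  M_t &= U^\top B_t A_t V \in \mathbb{R}^{r \times r}
      && \text{(compress: coordinates of adapter $t$ in the shared subspaces)} \\
  M &= \sum_{t=1}^{T} \alpha_t M_t
      && \text{(merge in the reduced space, e.g.\ a weighted sum)} \\
  \Delta W_{\text{merged}} &= U M V^\top,
      \qquad \operatorname{rank}(\Delta W_{\text{merged}}) \le r
      && \text{(rank bound holds by construction)}
\end{align}
```

Because the merged update is a product with inner dimension $r$, it can be stored directly as a LoRA pair such as $B = U M$ and $A = V^\top$, with no truncated SVD after merging.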

2606.03712 2026-06-03 cs.LG 版本更新

When Graph Tokens Sink: A Mechanistic Analysis of Graph Language Models

当图标记沉没：图语言模型的机制分析

Ding Zhang, Runtao Zhou, Wenqing Zheng, Rizal Fathony, Bayan Bruss, Chirag Agarwal

发表机构 * University of Virginia（弗吉尼亚大学）； Capital One

AI总结本文通过分析图语言模型中图标记的内部行为，发现激活层面的显著性与图信息利用之间存在解耦，揭示了现有图标记构建、放置和对齐机制的局限性。

详情

AI中文摘要

图语言模型（GLMs）已成为将大型语言模型（LLMs）适应图学习任务的一个有前景的方向。通过将图拓扑和节点信息转换为图标记，GLMs允许LLMs联合处理结构化图输入和文本指令。然而，LLMs如何内部解释这些图标记以及图标记是否作为图结构的有意义载体仍不清楚。在这项工作中，我们通过代表性GLM架构中的图标记行为分析了LLMs如何处理图信息。发现：我们发现GLMs中图标记的内部显著性与图信息利用并不等价。图沉没标记一致地表现为激活层面的异常值：它们可以通过一小部分隐藏状态维度上的巨大激活值来识别，并且偏向于早期的图标记位置。然而，这种激活层面的显著性并不意味着这些标记是图信息的主要载体。与语言和视觉-语言模型中的经典注意力沉没不同，图沉没标记不一定从查询标记中吸引最大的注意力权重。通过剪枝、重新定位和交换干预，我们表明图沉没标记对于下游预测并不是最重要的语义或结构标记。含义：这些结果共同表明，在当前的GLMs将图结构映射到LLM标记空间后，产生的图标记表示并不会自然地形成完全可用的拓扑感知内部表示；相反，它们在激活层面的显著性和图语义效用之间表现出解耦。这种解耦指出了现有图标记构建、放置和对齐机制的局限性。

英文摘要

Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to graph learning tasks. By transforming graph topology and node information into graph tokens, GLMs allow LLMs to jointly process structured graph inputs and textual instructions. Yet, it remains unclear how LLMs internally interpret these graph tokens and whether graph tokens act as meaningful carriers of graph structure. In this work, we analyze how LLMs process graph information through graph-token behavior in representative GLM architectures. Findings. We find that the internal saliency of graph tokens in GLMs is not equivalent to graph information utilization. Graph sink tokens consistently emerge as activation-level outliers: they can be identified by massive activation values along a small set of hidden-state dimensions and are biased toward early graph-token positions. However, this activation-level saliency does not imply that these tokens are the main carriers of graph information. Unlike classical attention sinks in language and vision-language models, graph sink tokens do not necessarily attract the largest attention weights from query tokens. Through pruning, repositioning, and swapping interventions, we show that graph sink tokens are not the most important semantic or structural tokens for downstream prediction. Implications. Together, these results suggest that after current GLMs map graph structure into the LLM token space, the resulting graph-token representations do not naturally form a fully usable topology-aware internal representation; instead, they exhibit a decoupling between activation-level saliency and graph-semantic utility. This decoupling points to limitations in existing graph-token construction, placement, and alignment mechanisms.

2606.03698 2026-06-03 cs.LG 版本更新

Multi$^2$: Hierarchical Multi-Agent Decision-Making with LLM-Based Agents in Interactive Environments

Multi$^2$：基于LLM智能体在交互环境中的分层多智能体决策

Sangeun Park, Minhae Kwon

发表机构 * KAIST（韩国科学技术院）

AI总结提出Multi$^2$分层多智能体决策框架，通过高层智能体（System 1）使用监督微调生成子目标，低层智能体（System 2）使用离线到在线强化学习执行原子动作，以缓解目标漂移并实现长期稳定控制。

Comments Accepted at ICML 2026

AI中文摘要

大型语言模型（LLM）研究的一个核心目标是构建能够通过与动态环境持续交互进行规划、行动和适应的智能体系统。尽管最近的基于LLM的智能体展现出令人印象深刻的上下文推理能力，但它们的长期决策仍然脆弱，常常遭受目标漂移，即目标和计划在长时间交互中发生偏移。我们引入了Multi$^2$，一个分层多智能体决策框架，将智能体行为显式分解为互补角色。高层智能体（System 1）使用监督微调（SFT）专注于上下文感知的子目标生成，而低层智能体（System 2）通过交互环境中的离线到在线强化学习（RL）执行原子动作。这种分离实现了稳定的长期控制，减轻了目标漂移，并允许高效适应。在多种交互环境中，Multi$^2$持续优于强智能体基线，在多轮交互中展现出改进的鲁棒性和协调性。除了性能提升，我们还引入并发布了三个分层基准数据集，填补了训练和评估基于LLM智能体的分层决策的长期空白。

英文摘要

A central goal of large language model (LLM) research is to build agentic systems that can plan, act, and adapt through sustained interaction with dynamic environments. While recent LLM-based agents exhibit impressive contextual reasoning, their long-horizon decision-making remains fragile, often suffering from objective drift, where goals and plans drift over extended interactions. We introduce Multi$^2$, a hierarchical multi-agent decision-making framework that explicitly decomposes agent behavior into complementary roles. A high-level agent (System 1) focuses on context-aware sub-goal generation using supervised fine-tuning (SFT), while a low-level agent (System 2) executes atomic actions through offline-to-online reinforcement learning (RL) in interactive environments. This separation enables stable long-horizon control, mitigates objective drift, and allows efficient adaptation. Across diverse interactive environments, Multi$^2$ consistently outperforms strong agentic baselines, demonstrating improved robustness and coordination in multi-turn interaction. Beyond performance, we introduce and release three hierarchical benchmark datasets, filling a long-standing gap in training and evaluating hierarchical decision-making for LLM-based agents.

2606.03689 2026-06-03 cs.LG cs.AI 版本更新

Staying Alive: Uncensored Survival Analysis with Tabular Foundation Models

保持存活：基于表格基础模型的无删失生存分析

Mariana Vargas Vieyra

发表机构 * GitHub

AI总结提出一种无需训练的生存回归方法，利用表格基础模型预测事件时间并迭代填补右删失数据，构建加速失效时间模型，在标准基准上表现与需训练的模型相当。

AI中文摘要

生存分析是一种统计框架，用于建模直到某个感兴趣事件发生的时间跨度。它广泛应用于包括医疗保健和客户流失预测在内的多个领域，其适用性的一个核心挑战在于事件时间被部分观测或存在右删失。近年来，表格基础模型因其能够在单次前向传播中执行预测任务而无需数据集特定的参数拟合，引起了广泛关注。尽管取得了成功，但由于右删失的存在，它们在时间-事件数据预测任务中的应用仍然困难。在这项工作中，我们提出了一种无需训练的生存回归方法，通过利用表格基础模型来预测事件时间并迭代地填补右删失数据。我们的方法使用表格基础模型构建加速失效时间模型，除了拟合单个标量参数外无需训练。随后，基于Buckley-James估计器，我们引入了一种非参数上下文内估计器来处理右删失数据。我们在标准生存分析基准上的实验表明，我们的方法与几种需要训练的参数和半参数生存回归模型（包括Cox回归和参数加速失效时间模型）相比具有竞争力。

英文摘要

Survival Analysis (SA) is a statistical framework that models the time span until some event of interest occurs. It is widely used in several domains, including healthcare and churn prediction, and a central challenge in its applicability is that the time of the event is only partially observed, i.e., right-censored. Tabular Foundation Models (TFMs) have attracted significant interest in recent years due to their ability to perform prediction tasks in a single forward pass, requiring no dataset-specific parameter fitting. Despite their success, their application to prediction tasks on time-to-event data remains difficult due to right censoring. In this work, we present a training-free approach to survival regression that leverages TFMs to both predict the time of the event and iteratively impute right-censored data. Our method uses a TFM to construct an Accelerated Failure Time (AFT) model requiring no training beyond fitting a single scalar parameter. Subsequently, by building on the Buckley-James estimator, we introduce a non-parametric in-context estimator for right-censored data. Our experiments on standard survival analysis benchmarks show that our method is competitive with several parametric and semi-parametric survival regression models that require training, including Cox regression and parametric AFT models.

2606.03685 2026-06-03 cs.LG cs.AI 版本更新

A Close Look At World Model Recovery In Supervised Fine-Tuned LLM Planners

监督微调的大语言模型规划器中世界模型恢复的深入探究

Patrick Emami, Nan Qiang, Peter Graf

发表机构 * National Laboratory of the Rockies（落基山国家实验室）

AI总结通过可解释性实验，研究监督微调如何影响大语言模型在经典规划任务中恢复世界模型的能力，发现微调使模型线性编码动作有效性和状态谓词，且更广泛的状态空间覆盖有助于更准确的世界模型恢复。

Comments 17 pages. Under review at TMLR

AI中文摘要

监督微调（SFT）改进了大语言模型（LLM）中的端到端经典规划，但这些模型是否也学会了表示和推理它们正在解决的规划问题？由于经典规划问题的相对复杂性以及端到端规划生成对LLM的挑战，探索这个问题一直很困难。在我们的工作中，我们设计并执行了一系列可解释性实验，通过检查微调LLM的内部表示和生成能力，全面探究世界模型恢复。我们发现：a) 对有效动作序列进行监督微调使LLM能够线性编码动作有效性和一些状态谓词。b) 难以使用输出概率对动作有效性进行分类的模型可能仍然学习到将有效动作与无效动作分开的内部表示。c) 微调期间更广泛的状态空间覆盖（例如来自随机游走数据）能更准确地恢复底层世界模型。总之，这项工作为将可解释性技术应用于规划LLM提供了一种方法，并产生了有助于揭示LLM中知识表示方式的见解。

英文摘要

Supervised fine-tuning (SFT) improves end-to-end classical planning in large language models (LLMs), but do these models also learn to represent and reason about the planning problems they are solving? Due to the relative complexity of classical planning problems and the challenge that end-to-end plan generation poses for LLMs, it has been difficult to explore this question. In our work, we devise and perform a series of interpretability experiments that holistically interrogate world model recovery by examining both internal representations and generative capabilities of fine-tuned LLMs. We find that: a) Supervised fine-tuning on valid action sequences enables LLMs to linearly encode action validity and some state predicates. b) Models that struggle to use output probabilities for classifying action validity may still learn internal representations that separate valid from invalid actions. c) Broader state space coverage during fine-tuning, such as from random walk data, yields more accurate recovery of the underlying world model. In summary, this work contributes a recipe for applying interpretability techniques to planning LLMs and generates insights that shed light on open questions about how knowledge is represented in LLMs.

2606.03681 2026-06-03 cs.LG 版本更新

Speedrunning Tabular Foundation Model Pretraining

表格基础模型预训练的速通

Salih Bora Ozturk, Alexander Pfefferle, Frank Hutter

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种速通竞赛格式，通过优化单文件训练脚本，在nanoTabPFN上实现81倍预训练加速，并建立社区排行榜以累积改进。

2606.03647 2026-06-03 cs.CR cs.AI cs.LG 版本更新

Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs

黑盒、自适应、高效、可迁移、有害、适用……攻击是破解LLM所需的一切

Vincent Limbach, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn

发表机构 * University of St. Gallen（圣加仑大学）

AI总结提出间接危害优化（IHO）方法，通过迭代偏好优化训练掩码扩散语言模型攻击器，实现黑盒、高效、可迁移的自适应攻击，显著提升对分层防御的破解成功率。

AI中文摘要

准确评估对抗鲁棒性是一个长期挑战。有缺陷的攻击设计可能会夸大鲁棒性估计，使得部署风险评估和防御比较不可靠。历史上，像AutoAttack这样的标准化攻击在很大程度上解决了图像分类器的问题，为跨防御的系统比较提供了可靠的评估基线。然而，对于LLM越狱评估，目前还没有等效的方法，而设计这样的攻击要困难得多。一个可靠的攻击必须（除其他外）兼容黑盒、适用于任意防御管道且高效，而现有方法无法同时满足这些条件。我们引入了间接危害优化（IHO），这是一种掩码扩散语言模型攻击器，通过对危害评判器进行迭代偏好优化来训练，仅需对目标进行黑盒访问。相同的方法无需修改即可用作针对个体行为的强自适应攻击，或作为一种高效的摊销策略，无需微调即可迁移到未见行为和未见目标模型。即使面对分层防御（例如，结合辅助检测器的Circuit Breaker训练模型），IHO在攻击成功率上也显著优于最先进的方法，且无需任何防御特定的适应。我们的结果将IHO定位为向那种过去提高了可靠性的标准化越狱评估迈出的实际一步。代码和模型可在GitHub和Hugging Face上获取。

英文摘要

Accurately evaluating adversarial robustness is a longstanding challenge. A flawed attack design can inflate robustness estimates, making deployment risk assessment and defense comparison unreliable. Historically, standardized attacks such as AutoAttack have largely resolved this for image classifiers, providing a reliable evaluation baseline for systematic comparison across defenses. However, no equivalent exists for LLM jailbreak evaluation yet, where designing such an attack is considerably more difficult. A reliable attack must, among other things, be black-box compatible, applicable to arbitrary defense pipelines, and efficient, which no existing method jointly satisfies. We introduce Indirect Harm Optimization (IHO), a masked diffusion language model attacker trained via iterative preference optimization against a harmfulness judge, requiring only black-box access to the target. The same method can be used without modification as a strong adaptive attack on individual behaviors, or as an efficient amortized policy that transfers to held-out behaviors and unseen target models without fine-tuning. Even against layered defenses, such as a Circuit Breaker-trained model combined with an auxiliary detector, IHO improves attack success considerably over state-of-the-art approaches, without any defense-specific adaptation. Our results position IHO as a practical step toward the kind of standardized jailbreak evaluation that has improved reliability in the past. Code and models are available on GitHub and Hugging Face.

2606.03645 2026-06-03 cs.LG cs.AI 版本更新

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

加法的形状：大型语言模型中算术的几何结构

Liuyuan Wen, Xun Zhu, Lihao Huang, Wenbin Li, Yang Gao

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结通过分析多操作数加法中残差流的几何结构，发现等原始和轨迹（IRST）并建立噪声量化模型，将算术错误解释为由内部神经噪声引起的几何滑移，并利用几何一致性检查方法检测和纠正量化失败。

Comments Accepted by ICML 2026

AI中文摘要

大型语言模型在基本算术中表现出矛盾的脆弱性，暗示内部计算与离散输出之间存在脱节。通过分析多操作数加法中的残差流几何结构，我们识别出等原始和轨迹（IRST），这是一种由语义数字锚定并由连续进位纤维调制的几何结构。我们提出噪声量化模型来解释这种几何结构，将算术错误视为由内部神经噪声推动连续的潜在进位势跨越量化阈值引起的几何滑移。这一几何框架进一步阐明了探针多功能性，解释了轻量级探针如何从单个激活向量中解开共存的潜在信号（如真实值与幻觉）。最后，我们通过一种几何一致性检查方法验证了这些见解，该方法在推理过程中有效检测和纠正了这些量化失败。我们的代码可在以下网址获取：https://github.com/RL-MIND/Shape-of-Addition。

英文摘要

Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between internal computation and discrete output. By analyzing the residual stream geometry during multi-operand addition, we identify the Iso-Raw-Sum Trajectory (IRST), a geometric structure where representations are anchored by semantic digits and modulated by continuous carry fibers. We propose the Noisy Quantization Model to explain this geometry, framing arithmetic errors as Geometric Slippages caused by internal neural noise pushing a continuous, latent Carry Potential across quantization thresholds. This geometric framework further elucidates Probe Versatility, explaining how lightweight probes can disentangle coexisting latent signals (such as ground truth versus hallucination) from a single activation vector. Finally, we validate these insights through a geometric consistency check method that effectively detects and corrects these quantization failures during inference. Our code is available at https://github.com/RL-MIND/Shape-of-Addition.

2606.03644 2026-06-03 cs.LG 版本更新

Spatial Transcriptomics-Guided Alignment Enhances Molecular Profiling in Pathology Foundation Model

空间转录组学引导的对齐增强病理基础模型中的分子分析

Fengtao Zhou, Yingxue Xu, Zhengyu Zhang, Yihui Wang, Zhengrui Guo, Ling Liang, Jiabo Ma, Cheng Jin, Ziyi Liu, Huajun Zhou, Hongyi Wang, Du Cai, Chenglong Zhao, Xi Wang, Can Yang, Yu Wang, Wenbin Li, Feng Gao, Zhe Wang, Zhenhui Li, Xiuming Zhang, Li Liang, Hao Chen

发表机构 * Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China（香港科技大学计算机科学与工程系，中国香港特别行政区）； Department of Pathology, Nanfang Hospital, Southern Medical University, Guangzhou, China（南方医科大学南方医院病理科，中国广州）； Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China（南方医科大学基础医学院病理学系，中国广州）； Guangdong Province Key Laboratory of Molecular Tumor Pathology, Guangzhou, China（广东省分子肿瘤病理重点实验室，中国广州）； Jinfeng Laboratory, Chongqing, China（金凤实验室，中国重庆）

AI总结提出STAMP框架，利用空间转录组数据通过通路感知对齐策略增强病理基础模型的分子感知能力，并在多层级评估中验证其临床效用。

AI中文摘要

全面的分子分析对于现代精准肿瘤学至关重要，但高昂的成本、标本耗尽和漫长的周转时间仍然阻碍其应用。虽然病理基础模型（PFMs）已显示出从常规苏木精-伊红（H&E）全切片图像推断分子表型的潜力，但当前架构主要依赖于以视觉为中心的自监督学习或视觉-语言对齐，缺乏将细微形态学特征与潜在基因组改变联系起来所需的空间解析分子监督。空间转录组学（ST）作为一种变革性技术出现，能够在完整组织切片内进行转录组定量，从而保留组织学与分子谱之间的精确空间联系。在本研究中，我们提出了用于分子分析的空间转录组学引导对齐框架（STAMP），该框架赋予PFMs内在的分子感知能力。为支持这一范式，我们整理了HumanST-1k，一个涵盖不同解剖器官和测序平台的人类ST数据集。该图谱产生了180万对H&E斑块及其对应的转录组谱，提供了一个将组织学结构与其分子状态联系起来的语料库。为减轻原始转录组学中固有的技术噪声，STAMP采用了一种通路感知对齐策略，将转录组数据聚合为生物学功能通路，随后通过参数高效微调将其整合到PFMs中。这种对齐丰富了PFMs的表征空间，并释放了其解析亚视觉分子特征的能力。通过多层级评估框架验证了这些增强表征的临床实用性。

英文摘要

Comprehensive molecular profiling is essential for modern precision oncology but remains hindered by prohibitive costs, specimen exhaustion, and protracted turnaround times. While pathology foundation models (PFMs) have demonstrated potential for inferring molecular phenotypes from routine hematoxylin and eosin (H&E) whole-slide images (WSIs), current architectures primarily rely on vision-centric self-supervised learning or vision-language alignment, lacking the spatially resolved molecular supervision required to connect subtle morphological features with underlying genomic alterations. Spatial transcriptomics (ST) emerges as a transformative technology that enables transcriptomic quantification within intact tissue sections, thereby preserving the precise spatial link between histology and molecular profiles. In this study, we present a Spatial Transcriptomics-guided Alignment framework for Molecular Profiling (STAMP), which endows PFMs with intrinsic molecular awareness. To support this paradigm, we curated HumanST-1k, a human ST dataset spanning diverse anatomical organs and sequencing platforms. This atlas yields 1.8 million pairs of H&E patches and corresponding transcriptomic profiles, providing a corpus that links histological structures with their molecular states. To mitigate the technical noise inherent to raw transcriptomics, STAMP applies a pathway-informed alignment strategy that aggregates transcriptomic data into biologically functional pathways, which are subsequently integrated into PFMs via parameter-efficient fine-tuning. This alignment enriches the representation space of PFMs and unlocks their capacity to resolve sub-visual molecular signatures. The clinical utility of these augmented representations was validated through a multi-tier evaluation framework.

2606.03628 2026-06-03 cs.CL cs.AI cs.LG 版本更新

Building Reliable Long-Form Generation via Hallucination Rejection Sampling

通过幻觉拒绝采样构建可靠的长文本生成

Lin Li, Georgia Channing, Suhaas M Bhat, Gabriel Davis Jones, Yarin Gal

发表机构 * Georgia Institute of Technology（佐治亚理工学院）； University of California, Berkeley（加州大学伯克利分校）； University of Cambridge（剑桥大学）； DeepMind（深度思维）

AI总结提出分段幻觉拒绝采样框架SHARS，利用任意幻觉检测器在生成过程中拒绝并重采样幻觉片段，以缓解长文本生成中的幻觉累积问题，提升事实一致性。

Comments Accepted by ICML 2026

AI中文摘要

大型语言模型（LLMs）在开放式文本生成方面取得了显著进展，但仍容易产生不正确或无依据的幻觉内容，这损害了其可靠性。在长文本生成中，由于幻觉雪崩现象（早期错误传播并累积到后续输出），这一问题更加严重。为了解决这一挑战，我们提出了一种新颖的推理时幻觉缓解框架，称为分段幻觉拒绝采样（SHARS），该框架使用任意幻觉检测器在生成过程中识别并拒绝幻觉片段，并重新采样直到生成忠实的内容。通过仅保留可信信息并在此基础上构建后续生成，该框架减轻了幻觉累积并增强了事实一致性。为了实例化该框架，我们采用语义不确定性作为检测器，并引入了若干关键修改以解决其局限性并更好地适应长文本。我们的方法使模型能够自我纠正幻觉，无需外部资源（如网络搜索或知识库），同时保持与这些资源的兼容性以便未来扩展。在标准化幻觉基准上的实证评估表明，我们的方法显著减少了长文本生成中的幻觉，同时保持甚至提高了生成的信息量。代码可在以下网址获取：https://github.com/TreeLLi/hallucination-rejection-sampling。

英文摘要

Large language models (LLMs) have achieved remarkable progress in open-ended text generation, yet they remain prone to hallucinating incorrect or unsupported content, which undermines their reliability. This issue is exacerbated in long-form generation due to hallucination snowballing, a phenomenon where early errors propagate and compound into subsequent outputs. To address this challenge, we propose a novel inference-time hallucination mitigation framework, named Segment-wise HAllucination Rejection Sampling (SHARS), which uses an arbitrary hallucination detector to identify and reject hallucinated segments during generation and resample until faithful content is produced. By retaining only confident information and building subsequent generations upon it, the framework mitigates hallucination accumulation and enhances factual consistency. To instantiate this framework, we adopt semantic uncertainty as the detector and introduce several vital modifications to address its limitations and better adapt it to long-form text. Our method enables models to self-correct hallucinations without requiring external resources such as web search or knowledge bases, while remaining compatible with them for future extensions. Empirical evaluations on standardized hallucination benchmarks demonstrate that our method substantially reduces hallucinations in long-form generation while preserving or even improving the informativeness of generation. Code is available at: https://github.com/TreeLLi/hallucination-rejection-sampling.
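The segment-wise loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` and `is_hallucinated` are hypothetical callables standing in for the language model and the hallucination detector, and the fallback when every resample is rejected (skipping the segment) is an assumption of this sketch, since the abstract does not say what happens in that case.

```python
from typing import Callable, List


def segmentwise_rejection_sampling(
    propose: Callable[[List[str]], str],                 # hypothetical: samples the next segment given the accepted prefix
    is_hallucinated: Callable[[List[str], str], bool],   # hypothetical: any hallucination detector
    num_segments: int,
    max_resamples: int = 5,
) -> List[str]:
    """Build a long-form output segment by segment, keeping only accepted segments."""
    accepted: List[str] = []
    for _ in range(num_segments):
        for _ in range(max_resamples):
            candidate = propose(accepted)
            if not is_hallucinated(accepted, candidate):
                # Later segments are conditioned only on accepted content.
                accepted.append(candidate)
                break
        # Assumption of this sketch: if all resamples are rejected,
        # the segment is skipped rather than emitted.
    return accepted
```

Because rejected candidates never enter `accepted`, later proposals are conditioned only on content the detector let through, which is the mechanism the abstract credits for limiting hallucination snowballing.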

2606.03620 2026-06-03 cs.LG cs.AI 版本更新

Physics-Guided Policy Optimization with Self-Distillation

基于物理引导的自蒸馏策略优化

Ke Wang, Yuning Wu, Haoran Liu, Chaoqun Jia, Devin Chen, Kai Wei

发表机构 * Amazon（亚马逊）

AI总结针对自蒸馏策略优化中固定步长导致训练不稳定的问题，提出受粘性流体动力学启发的物理引导策略优化（PGPO），通过互信息估计动态调整步长，在Science-QA数据集上提升性能并保持训练稳定性。

AI中文摘要

自蒸馏策略优化（SDPO）已成为大语言模型后训练的一种流行范式，其中模型根据特权信息从自身预测中学习。然而，SDPO对每次更新步长的信任程度敏感：来自自我教师的修正可能在某些批次上信息丰富，而在其他批次上具有误导性，若以固定步长统一应用，会破坏训练稳定性。受粘性流体动力学启发，并在随机微分方程层面形式化类比，我们提出物理引导策略优化（PGPO），该方法引入一个基于学生预测与反馈条件教师之间互信息估计的信息调制步长乘子。我们证明这种调制保留了普通SGD的一阶弱近似保证，且每次迭代的额外开销可忽略。我们在Science-QA数据集上评估PGPO，它在4个领域中的3个上优于SDPO，提升高达+4.5个点，同时在SDPO训练后期崩溃的设置中保持稳定。

英文摘要

Self-distilled policy optimization (SDPO) has become a popular paradigm for LLM post-training, where a model learns from its own predictions conditioned on privileged information. SDPO, however, is sensitive to how much each update step should be trusted: corrections from a self-teacher can be highly informative on some batches and misleading on others, and applying them uniformly with a fixed step size can destabilize training. Drawing inspiration from viscous-fluid dynamics and formalizing the analogy at the SDE level, we propose Physics-Guided Policy Optimization (PGPO), which introduces an information-modulated step-size multiplier derived from a mutual-information estimate between the student's predictions and the feedback-conditioned teacher. We show that this modulation preserves the order-1 weak-approximation guarantees of vanilla SGD, and incurs negligible overhead per iteration. We evaluate PGPO on the Science-QA dataset, where it outperforms SDPO on 3 of the 4 domains with gains of up to +4.5 points, while remaining stable in a setting where SDPO collapses late in training.

2606.03608 2026-06-03 cs.LG cs.AI 版本更新

Exploiting Verification-Generation Gap: Test-Time Reinforcement Learning with Confidence-Conditioned Verification

利用验证-生成差距：基于置信度条件验证的测试时强化学习

Jiahui Li, Jianfeng Shan, Wenpei Chen, Shunyu Wu, Jian Lou, Wenjie Feng, Dan Li, See-Kiong Ng

发表机构 * Sun Yat-Sen University（中山大学）； University of Science and Technology of China（中国科学技术大学）； National University of Singapore（新加坡国立大学）

AI总结提出TTRL-CoCoV框架，通过置信度自适应机制解决无标签设置下Pass@k优化中的伪标签错误和多样性崩溃问题，显著提升Pass@1和Pass@k性能。

AI中文摘要

测试时强化学习已成为一种有前景的范式，用于在完全无标签的方式下增强大型语言模型的复杂推理能力。尽管现有研究关注Pass@1性能，但在无标签设置下优化Pass@k（衡量生成覆盖率以支持持续探索）仍未被充分探索且至关重要。在无标签设置下优化Pass@k极具挑战性，因为直接应用对RLVR有效的Pass@k优势设计会导致性能不佳。通过深入的实证分析，我们发现阻碍性能的根本原因：低置信度样本的伪标签估计很可能不正确，而高置信度样本的候选答案则遭受严重的多样性崩溃。为克服这些障碍，我们提出TTRL-CoCoV（基于置信度条件验证的测试时强化学习），一种新颖的置信度自适应框架，可扩展Pass@k覆盖率并提升Pass@1性能。基于验证能力通常领先于生成能力这一关键洞察，TTRL-CoCoV采用置信度条件机制：对于高置信度样本，它自举（bootstrap）验证器并应用探索增强奖励以防止多样性崩溃；对于低置信度样本，它将伪标签选择委托给验证器以过滤错误伪标签；对于中等置信度样本，则完全绕过验证。大量实验表明，TTRL-CoCoV在6个广泛认可的基准上优于最佳竞争方法，相对于TTRL在Pass@1上平均绝对提升+9.8%，在Pass@16上平均绝对提升+18.7%，甚至在与全监督强化学习方法相比时，在多个推理基准上实现了高达+5.0%的Pass@1绝对提升。我们的代码仓库：https://github.com/shanjf666/CoCoV。

英文摘要

Test-time reinforcement learning has emerged as a promising paradigm for enhancing the complex reasoning abilities of large language models in a completely label-free manner. Despite existing studies focusing on Pass@1 performance, optimizing Pass@k remains under-explored yet critical in label-free settings, which measures generation coverage for sustained exploration. Optimizing Pass@k in label-free setting is highly non-trivial, as directly applying the Pass@k advantage designs effective for RLVR yields unsatisfactory performance. Through in-depth empirical analysis, we discover the root causes hindering performance: pseudo-label estimations for low-confidence samples have a high probability of being incorrect, while candidate answers for high-confidence samples suffer from severe diversity collapse. To overcome these hurdles, we propose TTRL-CoCoV (Test-Time Reinforcement Learning with Confidence-Conditioned Verification), a novel confidence-adaptive framework that expands Pass@k coverage and improves Pass@1 performance. Based on our key insight that verification capability generally leads generation capability, TTRL-CoCoV employs a confidence-conditioned mechanism: for high-confidence samples, it bootstraps verifier and applies an exploration-enhancing reward to prevent diversity collapse; for low-confidence samples, it delegates pseudo-label selection to the verifier to filter incorrect pseudo-labels; and for medium-confidence samples, it bypasses verification entirely. Extensive experiments demonstrate that TTRL-CoCoV outperforms the best competing methods across 6 widely-recognized benchmarks, achieves average absolute gains of +9.8% in Pass@1 and +18.7% in Pass@16 over TTRL, and even achieves absolute Pass@1 improvements of up to +5.0% across multiple reasoning benchmarks when compared against fully supervised RL methods. Our code repository: https://github.com/shanjf666/CoCoV.
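The three-way confidence routing in this abstract can be illustrated with a small sketch. Several things here are assumptions rather than details from the paper: confidence is taken to be the majority-vote share among sampled answers, the thresholds `low` and `high` are hypothetical values, and the function only names the branch instead of implementing the verifier or the rewards.

```python
from collections import Counter
from typing import List, Tuple


def route_by_confidence(
    answers: List[str], low: float = 0.3, high: float = 0.8
) -> Tuple[str, float, str]:
    """Return (majority answer, confidence, branch) for one prompt's sampled answers.

    Assumption: confidence = share of samples agreeing with the majority answer.
    """
    majority, votes = Counter(answers).most_common(1)[0]
    confidence = votes / len(answers)
    if confidence >= high:
        branch = "high"    # abstract: bootstrap the verifier, add an exploration-enhancing reward
    elif confidence < low:
        branch = "low"     # abstract: delegate pseudo-label selection to the verifier
    else:
        branch = "medium"  # abstract: bypass verification entirely
    return majority, confidence, branch
```

For example, nine samples answering "4" and one answering "5" would land in the high-confidence branch under these hypothetical thresholds.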

2606.03602 2026-06-03 cs.LG cs.AI cs.CL 版本更新

CauTion: Knowing When to Trust LLMs for Ensemble Causal Discovery

CauTion：知道何时信任LLM进行集成因果发现

Bo Peng, Kaiwen Wu, Sirui Chen, Zhiheng Wang, Yu Qiao, Chaochao Lu

发表机构 * Shanghai AI Laboratory（上海人工智能实验室）； Shanghai Innovation Institute（上海创新研究院）； Shanghai Jiao Tong University（上海交通大学）； Nanjing University（南京大学）； Tongji University（同济大学）

AI总结提出CauTion框架，通过共识过滤和LLM可靠性估计，将LLM领域知识可靠地集成到多个统计因果发现算法中，解决纯统计方法的局限和LLM错误问题。

AI中文摘要

从观测数据进行因果发现仍然具有挑战性，因为纯统计方法存在根本性限制，例如等价类内的统计可区分性和对有限样本量的敏感性。虽然大型语言模型（LLM）提供了有希望的领域知识来源来补充统计推断，但现有的LLM增强方法容易受到LLM错误的影响，并且产生高昂的令牌成本。此外，依赖单一数据驱动算法可能使结果对算法特定偏差敏感。为了解决这些限制，我们提出了CauTion，一个通过共识过滤和LLM可靠性估计将LLM领域知识可靠地集成到统计因果发现算法集成中的框架。CauTion分三个阶段进行。首先，算法集成利用共识投票解决算法一致的最多96%的边，在过滤后的共识边上实现接近完美的准确性。其次，一个信任校准仲裁机制通过无注释的信任校准过程估计LLM和算法的相对可靠性，然后用于控制信任加权投票过程，将LLM仲裁限制在算法证据不可靠的边上。第三，应用循环修复步骤确保最终因果图是有效的无环图。在六个数据集上的实验表明，CauTion在性能上始终优于数据驱动和LLM增强的基线，在更大的图上获得更大的收益，并且对LLM错误具有强大的鲁棒性。代码可在以下网址获取：https://github.com/OpenCausaLab/CauTion。

英文摘要

Causal discovery from observational data remains challenging due to the fundamental limitations of purely statistical methods, such as statistical distinguishability within equivalence classes and sensitivity to finite sample sizes. While large language models (LLMs) offer a promising source of domain knowledge to complement statistical inference, existing LLM-augmented methods are vulnerable to LLM errors and incur high token costs. Moreover, reliance on a single data-centric algorithm can make results sensitive to algorithm-specific biases. To address these limitations, we propose CauTion, a framework that reliably integrates LLM domain knowledge into an ensemble of statistical causal discovery algorithms through consensus filtering and LLM reliability estimation. CauTion proceeds in three stages. First, an algorithm ensemble utilizes a consensus voting to resolve up to 96% of edges on which algorithms agree, achieving near-perfect accuracy on the filtered consensus edges. Second, a trust-calibrated arbitration mechanism estimates the relative reliability of the LLM and the algorithms via an annotation-free trust calibration procedure, which is then utilized to govern a trust-weighted voting process that restricts LLM arbitration exclusively to edges with unreliable algorithmic evidence. Third, a cycle repair step is applied to guarantee the final causal graph is validly acyclic. Experiments on six datasets demonstrate that CauTion consistently outperforms both data-centric and LLM-augmented baselines, with larger gains on larger graphs and strong robustness to LLM errors. Code is available at https://github.com/OpenCausaLab/CauTion.
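The first stage described here, consensus filtering over an algorithm ensemble, can be sketched as below. This is a simplified illustration, not the CauTion code: it assumes each algorithm returns a set of directed edges and that "agreement" means unanimity on each node pair, which is an assumption about the voting rule. Contested pairs are simply returned, standing in for the later trust-weighted arbitration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def consensus_filter(
    nodes: Sequence[str], edge_sets: List[Set[Edge]]
) -> Tuple[Dict[Edge, str], List[Edge]]:
    """Split node pairs into unanimously resolved ones and contested ones."""
    resolved: Dict[Edge, str] = {}
    contested: List[Edge] = []
    for a, b in combinations(nodes, 2):
        verdicts = set()
        for edges in edge_sets:
            if (a, b) in edges:
                verdicts.add("forward")    # a -> b
            elif (b, a) in edges:
                verdicts.add("backward")   # b -> a
            else:
                verdicts.add("absent")     # no edge between a and b
        if len(verdicts) == 1:
            resolved[(a, b)] = verdicts.pop()
        else:
            contested.append((a, b))       # left for arbitration in later stages
    return resolved, contested
```

Only the contested pairs would need LLM involvement, which is consistent with the abstract's point about limiting token cost and exposure to LLM errors.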

2606.03584 2026-06-03 cs.LG cond-mat.dis-nn cs.NE 版本更新

Training a Predictive Coding Network on ImageNet using Equilibrium Propagation

使用均衡传播在ImageNet上训练预测编码网络

Tugdual Kerjan, Rasmus Høier, Benjamin Scellier

发表机构 * Rain AI

AI总结提出一种结合中心化均衡传播与新型均衡方案的预测编码网络训练方法，在ImageNet上训练10层卷积PCN，达到13.23% top-5错误率，接近反向传播基线。

AI中文摘要

均衡传播（EP）是一种基于物理的训练框架，主要应用于能量模型，包括连续Hopfield网络、非线性电阻网络和耦合相位振荡器。然而，EP的实际应用至今仍局限于相对小规模的问题。预测编码网络（PCN）是另一类根植于计算神经科学的能量模型，通常使用专门的算法训练，同样尚未在大规模上得到验证。在这项工作中，我们开发了一种基于EP的PCN训练方法，该方法将中心化EP与一种新的PCN均衡方案相结合。使用这种方法，我们在全尺寸ImageNet上训练了一个10层卷积PCN（VGG10），在top-5分类任务上实现了13.23%的测试错误率，接近12.2%的反向传播基线。据我们所知，这是PCN和基于EP的训练首次在ImageNet规模上得到验证。这些结果显著扩展了两种方法的可扩展性，并表明在其他物理系统中扩展EP的主要挑战可能更多地来自这些系统的计算特性，而非EP框架本身的固有限制。

英文摘要

Equilibrium Propagation (EP) is a physics-based training framework that has primarily been employed in energy-based models, including continuous Hopfield networks, nonlinear resistive networks and coupled phase oscillators. However, EP's practical applications have so far remained limited to relatively small-scale problems. Predictive coding networks (PCNs), another class of energy-based models rooted in computational neuroscience, are typically trained with a specialized algorithm and have likewise not yet been demonstrated at large scale. In this work, we develop an EP-based training method for PCNs which combines the centered variant of EP with a novel equilibration scheme for PCNs. Using this approach, we train a 10-layer convolutional PCN (VGG10) on full-size ImageNet, achieving 13.23% test error rate on the top-5 classification task, close to the 12.2% backpropagation baseline. To our knowledge, this is the first demonstration of both PCNs and EP-based training at ImageNet scale. These results significantly extend the scalability of both approaches and suggest that the primary challenges in scaling EP in other physical systems may come more from the computational properties of these systems than from inherent limitations of the EP framework.

2606.03568 2026-06-03 cs.CV cs.AI cs.LG cs.RO 版本更新

Learned Non-Maximum Suppression for 3D Object Detection

用于3D目标检测的学习型非极大值抑制

Timo Osterburg, Stefan Schütte, Torsten Bertram

发表机构 * Institute of Control Theory and Systems Engineering, TU Dortmund University（控制理论与系统工程研究所，多特蒙德技术大学）

AI总结提出两种基于学习的过滤模块（D2D-Rescore和GossipNet3D）替代启发式NMS，通过检测间关系提升3D检测性能，尤其改善小物体和稀有类别的检测精度。

Comments 6 pages, accepted at IEEE Intelligent Vehicles Symposium (IV) 2026

AI中文摘要

后处理是基于激光雷达的3D目标检测中的关键阶段，必须过滤密集且重叠的提议以实现紧凑可靠的感知。本文引入了两个学习型过滤模块，通过利用检测之间的关系来替代启发式非极大值抑制（NMS）。D2D-Rescore采用基于Transformer的检测到检测（D2D）注意力，而GossipNet3D通过鸟瞰图中的局部消息传递将2D GossipNet概念适应到3D。一种与nuScenes评估协议对齐的度量感知匹配策略确保了训练和验证行为的一致性，从而提高了整体检测性能。与CircleNMS相比，两种方法都提高了平均精度（mAP）、nuScenes检测分数（NDS）和真阳性质量，特别是对于小物体和稀有类别，同时增加了最小的计算开销。这些结果表明，学习型的检测级过滤可以在不修改基础网络的情况下增强3D检测器的可靠性，为启发式抑制提供了一种原则性的替代方案。代码可在以下网址获取：https://github.com/rst-tu-dortmund/learned-3d-nms。

英文摘要

Post-processing is a critical stage in LiDAR-based 3D object detection, where dense and overlapping proposals must be filtered for compact and reliable perception. This work introduces two learned filtering modules that replace heuristic non-maximum suppression (NMS) by leveraging relations among detections. D2D-Rescore employs transformer-based detection-to-detection (D2D) attention, while GossipNet3D adapts the 2D GossipNet concept to 3D through localized message passing in bird's-eye view. A metric-aware matching strategy aligned with the nuScenes evaluation protocol ensures consistent training and validation behavior, improving overall detection performance. Both approaches improve mean average precision (mAP), nuScenes detection score (NDS), and true positive quality compared to CircleNMS, particularly for small and infrequent classes, while adding minimal computational overhead. These results demonstrate that learned, detection-level filtering can enhance 3D detector reliability without modifying the base network, offering a principled alternative to heuristic suppression. Code is available at https://github.com/rst-tu-dortmund/learned-3d-nms.
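For context, the kind of heuristic suppression that the learned modules replace can be sketched in a few lines. This is a generic circle-style (center-distance) NMS in bird's-eye view, written as an illustration of the baseline idea only; it is not the paper's code, and the `(x, y, score)` tuple layout and the `radius` parameter are assumptions of this sketch.

```python
from typing import List, Tuple


def circle_nms(detections: List[Tuple[float, float, float]], radius: float) -> List[int]:
    """Greedy center-distance NMS over BEV detections given as (x, y, score).

    Returns the indices of kept detections, highest score first.
    """
    order = sorted(range(len(detections)), key=lambda i: detections[i][2], reverse=True)
    kept: List[int] = []
    for i in order:
        xi, yi, _ = detections[i]
        # Keep a detection only if it is farther than `radius` from every kept one.
        if all(
            (xi - detections[j][0]) ** 2 + (yi - detections[j][1]) ** 2 > radius ** 2
            for j in kept
        ):
            kept.append(i)
    return kept
```

A rule like this looks only at pairwise distance and score rank, whereas the learned modules in the abstract rescore detections using relations among them.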

2606.03549 2026-06-03 cs.LG math.PR 版本更新

How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration

随机森林中需要多少棵树？一种结合平台搜索与Optuna集成的重新审视方法

Vadim Porvatov, Andrey Dukhovny, Andrey Lange

发表机构 * Sberbank； Skolkovo Institute of Science and Technology (Skoltech)（斯科尔科沃科学技术研究院）； Federal Research Center "Computer Science and Control" of Russian Academy of Sciences (FRC CSC RAS)（俄罗斯科学院计算机科学与控制联邦研究中心）

AI总结提出一种基于三元组平台搜索的算法，通过监控袋外分数的相对变化自动确定随机森林的树数量，避免预设搜索范围，并提供了理论分析和实验验证。

AI中文摘要

随机森林的超参数优化在调整树数量时面临一个特定困难：预测分数通常随集成规模单调提升，因此诸如树结构Parzen估计器（TPE）和Hyperband等标准方法需要预定义搜索范围，且往往将估计推向其右边界。早停策略避免了固定这样的范围，但对分数噪声敏感且容易过早停止。为解决此问题，我们提出一种集成的基于三元组的平台搜索算法，该算法将树数量从直接TPE搜索空间中移除，同时仍利用跨HPO试验积累的信息。该方法通过监控三个森林规模上的袋外（OOB）分数相对变化，自适应地跟踪接近最小的充分集成规模，并相应移动该三元组。这产生了一个基于容差参数的自动化且用户可解释的过程。我们还提供了理论分析：我们将所提出的相对OOB分数准则与当前分数和极限分数之间的差距联系起来，并推导了相应的基于OOB的绝对相对差异的渐近方差估计。实验表明，所选树数量可能与常见启发式方法有显著差异：对于大多数经典基准数据集，它更小；而对于一些高维生物信息学数据集（如Arcene和Dorothea），则更大。源代码和可重复实验可在以下网址获取：https://github.com/lange-am/rf_plateau_hpo。

英文摘要

Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space and still exploits information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting this triplet accordingly. This yields an automated and user-interpretable procedure based on a tolerance parameter. We also provide a theoretical analysis: we relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: for most classical benchmark datasets it is smaller, whereas for some high-dimensional bioinformatics datasets, such as Arcene and Dorothea, it is larger. The source code and reproducible experiments are available at https://github.com/lange-am/rf_plateau_hpo.
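The triplet idea can be illustrated with a deliberately simplified sketch. It is one plausible reading of the abstract, not the authors' algorithm: the geometric triplet `(n, 2n, 4n)`, the right-shift-only rule, and the names `oob_score`, `factor`, `tol` and `max_trees` are all assumptions, and the integration with TPE trials is left out.

```python
from typing import Callable, Dict


def plateau_search(
    oob_score: Callable[[int], float],  # hypothetical: OOB score of a forest with n trees
    start: int = 16,
    factor: int = 2,
    tol: float = 1e-2,
    max_trees: int = 4096,
) -> int:
    """Shift a triplet of forest sizes right until relative OOB changes fall within tol."""
    cache: Dict[int, float] = {}

    def score(n: int) -> float:
        if n not in cache:               # each forest size is evaluated at most once
            cache[n] = oob_score(n)
        return cache[n]

    triplet = [start, start * factor, start * factor ** 2]
    while True:
        s = [score(n) for n in triplet]
        # Relative change between neighbouring sizes (assumes non-zero scores).
        rel = [abs(s[i + 1] - s[i]) / abs(s[i + 1]) for i in range(2)]
        if max(rel) <= tol or triplet[2] * factor > max_trees:
            return triplet[0]            # smallest size of the plateau triplet
        triplet = [n * factor for n in triplet]
```

With scikit-learn, `oob_score` could wrap something like `RandomForestClassifier(n_estimators=n, oob_score=True).fit(X, y).oob_score_`. On the synthetic curve `0.9 - 1.0 / n` with the defaults above, the search settles on 64 trees.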

2606.03532 2026-06-03 cs.LG cs.AI 版本更新

When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

教师何时应该移动？自在线策略蒸馏中的时间耦合与稳定性

Haowei Guo, Baolong Bi, Ruicheng Zhang, Bingqian Sun, Wentao Zhang

发表机构 * Peking University（北京大学）； University of Chinese Academy of Sciences（中国科学院大学）； Tsinghua University（清华大学）

AI总结研究自在线策略蒸馏中教师更新调度对稳定性的影响，提出基于隔离期和门控机制的CGTR方法，实现零崩溃和最佳性能。

AI中文摘要

自在线策略蒸馏针对从自身参数历史派生的教师训练学生策略，但教师的更新调度——控制教师与学生之间的\emph{时间耦合}——尚未作为稳定性变量被系统研究。通过对Qwen3-8B进行受控调度扫描，我们确定\emph{隔离期}（定义为更新之间教师完全冻结）是实现稳定学习的关键结构属性，而非教师年龄。为了刻画这些底层训练动态，我们引入了一个诊断框架，包括时间KL结构、刷新冲击和长度尾部风险。该框架进一步揭示了\emph{状态遗忘崩溃}：最优的短时程固定调度在长时程训练下灾难性失败，因为时钟驱动的刷新可以在单个不可逆步骤中将短暂漂移的学生复制到教师中。这种失败模式在短时程评估下不可见，并且在机制上不同于EMA的慢性污染。为了解决这个问题，我们提出了\emph{巩固门控教师刷新}（CGTR），它在保持隔离期的同时，基于奖励改进和长度尾部安全的联合证据对每次刷新进行门控，确保每次教师移动响应于真正的学生巩固而非时钟信号。使用单一共享参数集且无需每数据集重新调整，CGTR在所有四个任务（化学、生物学、物理学、工具使用）上实现了\textbf{零崩溃}和最佳最终分数，并自动调节其刷新频率以适应每个任务的学习动态。

英文摘要

Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule -- which governs the \emph{temporal coupling} between teacher and student -- has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that \emph{isolation periods}, defined as complete teacher freezing between updates, are the key structural property enabling stable learning, not teacher age. To characterize these underlying training dynamics, we introduce a diagnostic framework of temporal KL structure, refresh shock, and length-tail risk. This framework further uncovers \emph{state-oblivious collapse}: optimal short-horizon fixed schedules catastrophically fail under long-horizon training because a clock-driven refresh can copy a transiently drifting student into the teacher in a single, irreversible step. This failure mode is invisible under short-horizon evaluation and mechanistically distinct from EMA's chronic contamination. To address this, we propose \emph{Consolidation-Gated Teacher Refresh} (CGTR), which preserves isolation periods while gating each refresh on joint evidence of reward improvement and length-tail safety, ensuring every teacher movement responds to genuine student consolidation rather than a clock signal. With a single shared parameter set and no per-dataset retuning, CGTR achieves \textbf{zero collapse} and the best final score on all four tasks (Chemistry, Biology, Physics, ToolUse), self-regulating its refresh frequency to each task's learning dynamics.

2606.03523 2026-06-03 cs.CR cs.AI cs.LG 版本更新

High-Precision APT Malware Attribution with Out-of-Scope Resilience

高精度APT恶意软件归因与越界鲁棒性

Peter Williams, Adam Sobey, Erisa Karafili

发表机构 * Department of Computer Science, University of Oxford（牛津大学计算机科学系）

AI总结提出基于排名二元分类器与显式弃权的APT恶意软件归因方法，在越界样本占比87%时仍保持92%精度和95%选择性准确率。

AI中文摘要

早期归因高级持续性威胁（APT）活动可帮助防御者优先调查、选择对策并减少入侵影响。恶意软件提供了有用的归因证据，但自动化APT恶意软件归因在实践中仍然困难。现有方法通常作为封闭集分类器在有限数量的已知APT组织上进行训练和评估。然而，在操作环境中，分类器很可能遇到训练中未出现的组织样本。封闭集分类器被迫将这些样本分配给已知组织，产生无根据且可能误导的归因。我们提出一种基于排名二元分类器与显式弃权的高精度APT恶意软件归因方法。我们的方法不是训练单个多类分类器，而是为每个APT组织训练和调整两个二元分类器，根据验证性能对分类器进行排名，并顺序应用它们。仅当分类器提供足够证据时才对样本进行归因；否则，弃权。我们在APT恶意软件数据集和旨在压力测试越界行为的更大组合数据集上评估该方法。在APT恶意软件数据集上，该方法实现了比先前公布结果更高的精度。在最具挑战性的设置中，87%的测试样本来自训练中排除的60个APT组织，该方法对94%的越界样本弃权，同时在其分类的样本上保持92%的精度和95%的选择性准确率。

英文摘要

Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups. In operational environments, however, classifiers are likely to encounter samples from groups not represented during training. Closed-set classifiers are then forced to assign such samples to known groups, producing unsupported and potentially misleading attributions. We present a high-precision APT malware attribution method based on ranked binary classifiers with explicit abstention. Rather than training a single multi-class classifier, our approach trains and tunes two binary classifiers per APT group, ranks the classifiers by validation performance, and applies them sequentially. A sample is attributed only when a classifier provides sufficient evidence; otherwise, it abstains. We evaluate the method on the APT Malware dataset and on a larger combined dataset designed to stress-test out-of-scope behaviour. On the APT Malware dataset, the method achieves higher precision than previously published results on the same dataset. In the most challenging setting, where 87% of test samples came from 60 APT groups excluded from training, the method abstained on 94% of out-of-scope samples while maintaining 92% precision and 95% selective accuracy on the samples it classified.
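The "ranked binary classifiers with explicit abstention" scheme can be sketched as a short cascade. This is an illustrative simplification, not the paper's method in detail: each entry pairs a group name with a single hypothetical scoring function (the abstract mentions two tuned binary classifiers per group), and the shared `threshold` is an assumed stand-in for "sufficient evidence".

```python
from typing import Callable, Dict, List, Optional, Tuple

Sample = Dict[str, float]


def attribute(
    sample: Sample,
    ranked_classifiers: List[Tuple[str, Callable[[Sample], float]]],  # best validation performance first
    threshold: float = 0.9,
) -> Optional[str]:
    """Apply per-group binary scorers in rank order; return a group or None (abstain)."""
    for group, score_fn in ranked_classifiers:
        if score_fn(sample) >= threshold:
            return group   # first classifier with sufficient evidence wins
    return None            # explicit abstention: no classifier was confident enough
```

Returning `None` is what lets out-of-scope samples go unattributed instead of being forced onto a known group, which is the failure of closed-set classifiers that the abstract describes.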

2606.03521 2026-06-03 cs.LG cs.AI 版本更新

Post-Hoc Robustness for Model-Based Reinforcement Learning

基于模型的强化学习的后验鲁棒性

Siemen Herremans, Ali Anwar, Siegfried Mercelis

发表机构 * Carnegie Mellon University（卡内基梅隆大学）

AI总结提出一种在推理时利用学习模型和名义策略进行鲁棒策略改进的后验鲁棒化方法，通过对抗性展开的模型预测控制提升鲁棒性，无需额外训练神经网络。

AI中文摘要

为了提高强化学习（RL）在现实世界中的适用性，对抗鲁棒RL领域研究如何在对抗环境扰动下训练智能体。在该设置中，主角智能体在对手的环境扰动下优化策略，形成零和马尔可夫博弈。当对抗鲁棒RL与基于模型的RL结合时，对手可以针对学习到的转移模型而非训练环境。扩展这一思想，本文引入了深度RL智能体在推理时的后验鲁棒化。通过将学习模型与训练的名义策略结合使用，我们的方法执行鲁棒策略改进步骤。目标是提高鲁棒性而无需对神经网络进行额外训练。具体来说，我们利用对抗性展开下的模型预测控制，这些展开通过有界不确定性集内的投影梯度下降进行近似。此外，这些离线展开在执行时考虑并缓解了分布外问题。通过在扰动的Gymnasium MuJoCo环境中评估算法，同时考虑后验推理设置的计算限制，验证了所提方法在鲁棒性上的显著提升。

英文摘要

To improve the real-world applicability of reinforcement learning (RL), the field of adversarially robust RL studies how to train agents under adversarial environment perturbations. In this setting, a protagonist agent optimizes a policy under environmental perturbations from an adversary, resulting in a zero-sum Markov game. When adversarially robust RL is combined with model-based RL, the adversary can target a learned transition model instead of the training environment. Extending this idea, this work introduces post-hoc robustification of deep RL agents at inference time. By using the learned model in combination with a trained nominal policy, our approach performs a robust policy improvement step. The goal is to improve robustness without any additional training of neural networks. Specifically, we utilize model-predictive control under adversarial rollouts, which are approximated via projected gradient descent within a bounded uncertainty set. Furthermore, these offline rollouts are performed while considering and mitigating out-of-distribution issues. The proposed methodology is validated by demonstrating significant improvements in robustness when the algorithm is evaluated in perturbed Gymnasium MuJoCo environments, while considering the computational limitations of the post-hoc inference setting.

2606.03498 2026-06-03 cs.LG cs.DC 版本更新

Demystifying Pipeline Parallelism: First Theory for PipeDream

揭秘流水线并行：PipeDream 的首个理论

Ivan Ilin, Peter Richtárik

发表机构 * KAUST（阿卜杜拉国王科技大学）

AI总结本文通过引入随机化 PipeDream (RPD) 抽象，首次为 PipeDream 风格方法提供了非凸收敛保证，并分析了其稳态延迟与阶段数的缩放关系，同时与 LocalSGD 进行了比较。

Comments 40 pages, 4 figures

AI中文摘要

训练现代机器学习模型越来越需要跨多个加速器进行分布式计算。数据并行仍然是默认选择，并且通常与张量并行分片相结合，但一旦参数、激活或优化器状态不再适合单个设备，模型并行就变得不可避免。本文通过 PipeDream (PD) (Harlap et al., 2018) 的视角研究流水线模型并行。我们的第一个贡献是理论性的：我们引入了随机化 PipeDream (RPD)，一种陈旧块-SGD 抽象，据我们所知，这为 PD 风格方法提供了第一个干净的非凸收敛保证。我们的第二个贡献是扩展诊断：我们证明了稳态 PD 引起的延迟随阶段数 S 增长为 $S^2 - S/2 + O(1)$，因此收敛定理中的陈旧读取贡献缩放为 $\Theta(\gamma^2 S^4)$，在调谐速率形式中等价于 $\Theta(S^4/K)$。我们的第三个贡献是与 LocalSGD 的比较，后者通过周期性模型平均来权衡权重陈旧性与同步气泡。在我们报告的模拟时间实验中，表现更好的方法取决于目标：PD 在二次目标和小型语言建模训练损失任务上表现更好，而对于逻辑回归，随着阶段数增加，LocalSGD 变得优越。

英文摘要

Training modern machine learning models increasingly requires computation to be distributed across many accelerators. Data parallelism remains the default choice and is often paired with tensor-parallel sharding, but model parallelism becomes unavoidable once parameters, activations, or optimizer states no longer fit on a single device. This paper studies pipeline model parallelism through the lens of PipeDream (PD) (Harlap et al., 2018). Our first contribution is theoretical: we introduce Randomized PipeDream (RPD), a stale block-SGD abstraction that yields, to our knowledge, the first clean nonconvex convergence guarantee for a PD-style method. Our second contribution is a scaling diagnosis: we prove that the delay induced by steady-state PD grows as $S^2 - S/2 + O(1)$ for $S$ stages, so the stale-read contribution in the convergence theorem scales as $\Theta(\gamma^2 S^4)$, equivalently as $\Theta(S^4/K)$ in the tuned-rate form. Our third contribution is a comparison with LocalSGD, whose periodic model averaging trades weight staleness for synchronization bubbles. In our reported simulated-time experiments, the better-performing method depends on the objective: PD performs better on the quadratic objective and on a small language-modeling training-loss task, while for logistic regression LocalSGD becomes superior as the number of stages increases.
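
The scaling claim can be checked numerically. The sketch below is only an illustration: it drops the $O(1)$ term and all constants, and it assumes the stale-read term is proportional to the squared product of step size and delay, which matches the stated scaling but is not taken from the paper. The function names are hypothetical.

```python
def steady_state_delay(num_stages: int) -> float:
    """Leading-order delay S^2 - S/2 from the abstract (the O(1) term is dropped)."""
    s = num_stages
    return s * s - s / 2


def stale_read_term(step_size: float, num_stages: int) -> float:
    """Assumed stale-read contribution (gamma * delay)^2, constants omitted."""
    return (step_size * steady_state_delay(num_stages)) ** 2


# Doubling the number of stages multiplies the term by roughly 2^4 = 16
# once S is large enough for the S^2 part of the delay to dominate.
gamma = 1e-3  # hypothetical step size
for s in (4, 16, 64, 256):
    ratio = stale_read_term(gamma, 2 * s) / stale_read_term(gamma, s)
    print(s, steady_state_delay(s), round(ratio, 2))
```

Because the ratio does not depend on the step size, the fourth-power growth in the number of stages is visible for any fixed value of `gamma`.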

2606.03495 2026-06-03 cs.LG 版本更新

HiSE: A Lightweight Hierarchical Semantic Explainer for Heterogeneous Graph Neural Networks

HiSE：一种用于异构图神经网络的轻量级层次语义解释器

Zongrui Li, Yuhang Zhao, Ying Zhao, Yuanzhao Guo, Qiang Huang, Yuan Tian

发表机构 * School of Artificial Intelligence, Jilin University（吉林大学人工智能学院）； Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）

AI总结提出HiSE，一种轻量级特征导向的可解释模型，通过层次语义建模（语义级LASSO稀疏特征学习和跨语义级KL散度自适应融合）实现高保真、低计算开销的异构图神经网络解释。

AI中文摘要

异构图神经网络（HGNNs）在建模复杂关系数据方面表现出色，然而在高风险应用中的可解释性仍然是一个关键挑战。现有的解释方法存在两个主要局限性：一方面，生成的解释未能反映HGNNs固有的语义层次，导致对模型内部决策机制的保真度不足；另一方面，特征解释通常依赖于复杂的搜索或扰动机制，导致计算复杂度过高且效率低下。为了解决这些问题，我们提出了HiSE，一种轻量级特征导向的HGNNs可解释模型。HiSE通过层次语义建模实现语义感知的特征解释：在语义层面，采用基于最小绝对收缩和选择算子（LASSO）的局部代理模型学习每个语义视图下的稀疏特征表示；在跨语义层面，通过KL散度自适应地表征不同语义视图的贡献，生成统一的解释。大量实验表明，HiSE在保真度、鲁棒性和跨语义解释能力方面优于现有方法，同时其轻量级框架具有较低的计算开销，能够高效应用于大规模、复杂的真实世界异构图。

英文摘要

Heterogeneous graph neural networks (HGNNs) have demonstrated remarkable performance in modeling complex relational data, however their interpretability in high-stakes applications remains a critical challenge. Existing explanation methods suffer from two major limitations: on the one hand, the generated explanations fail to reflect the inherent semantic hierarchy of HGNNs, resulting in a lack of fidelity to the model's internal decision-making mechanism; on the other hand, feature explanations often rely on complex search or perturbation mechanisms, leading to excessive computational complexity and poor efficiency. To address these issues, we propose HiSE, a lightweight feature-oriented interpretable model for HGNNs. HiSE achieves semantically aware feature explanations through hierarchical semantic modeling: at the semantic level, local surrogate models based on the Least Absolute Shrinkage and Selection Operator (LASSO) are employed to learn sparse feature representations under each semantic view; at the cross-semantic level, the contributions of different semantic views are adaptively characterized via KL divergence to produce a unified explanation. Extensive experiments demonstrate that HiSE outperforms existing methods in terms of fidelity, robustness, and cross-semantic explanation capability, while its lightweight framework incurs low computational overhead, enabling efficient application to large-scale, complex real-world heterogeneous graphs.

2606.03493 2026-06-03 cs.CV cs.LG 版本更新

Low-Frequency Shortcuts in Texture-Driven Visual Learning

纹理驱动视觉学习中的低频捷径

Utku Şirin, Cathy Hou, David Alvarez-Melis, Stratos Idreos

发表机构 * Harvard University（哈佛大学）； Kempner Institute（凯姆纳研究所）

AI总结本文分析了纹理驱动领域中神经网络依赖低频成分作为捷径的现象，提出通过裁剪低频成分来消除捷径，从而提升分布内准确率和鲁棒性。

AI中文摘要

神经网络存在捷径学习问题，即学习到的特征在训练集上泛化良好，但在分布内（ID）或分布外（OOD）测试集上表现不佳。现有研究均基于少数几个标准基准，这些基准是形状驱动的。然而，许多应用领域是纹理驱动的。在这项工作中，我们针对纹理驱动领域进行了捷径学习分析，并将其与标准基准进行了比较。我们表明，纹理驱动领域存在低频捷径。它们主要基于少数具有偏斜频谱行为的低频成分（LFC）做出决策，尽管其分类信息存在于更高频率的细粒度细节中。从训练集和测试集中裁剪LFC可以消除捷径，并提供更平衡的频谱行为，将ID准确率提升高达8%。我们表明，低频捷径使模型极易受到OOD干扰的影响，导致与ID准确率相比下降高达70%。裁剪LFC显著提高了对低频干扰的鲁棒性，提升高达40%，并引入了对高频干扰的权衡；平衡的频谱行为提供了更好的泛化性能，而对高频特征的依赖增加则降低了泛化性能。OOD准确率取决于这两个因素之间的相互作用。

英文摘要

Neural networks suffer from shortcut learning, where learned features generalize well to the training set but not to in-distribution (ID) or out-of-distribution (OOD) test sets. Existing studies are all based on a few standard benchmarks, which are shape-driven. Numerous application domains, however, are texture-driven. In this work, we present shortcut learning analysis for texture-driven domains, and compare it with that of a standard benchmark. We show that texture-driven domains suffer from low-frequency shortcuts. They make the majority of their decisions based on a few low-frequency components (LFCs) with a skewed spectral behavior, despite that their classification information is in higher-frequency, fine-grained details. Pruning LFCs from training and test sets eliminates the shortcut and provides a more balanced spectral behavior, improving the ID accuracy by up to 8%. We show that low-frequency shortcuts make the models highly vulnerable to OOD corruptions, leading up to 70% accuracy drop compared to the ID accuracy. Pruning LFCs significantly improves robustness to low-frequency corruptions, by up to 40%, and introduces a trade-off for high-frequency corruptions; the balanced spectral behavior provides a better generalization performance, whereas the increased dependence on high-frequency features reduces it. OOD accuracy depends on the interaction between these two factors.
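
Pruning low-frequency components can be sketched on a one-dimensional signal. This is a simplified stand-in under stated assumptions: the paper works on images, whereas this example uses a naive 1-D discrete Fourier transform from the standard library, and `prune_low_frequencies` and its `cutoff` parameter are hypothetical names.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def prune_low_frequencies(signal, cutoff):
    """Zero every component whose frequency index is below `cutoff`.

    Index k and its mirror n - k describe the same frequency, so both are removed.
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        if min(k, n - k) < cutoff:
            spectrum[k] = 0
    return idft(spectrum)


# A constant offset (lowest frequency) plus a fine alternating pattern:
# pruning removes the offset and keeps the fine-grained detail.
texture = [5 + (-1) ** t for t in range(8)]
print([round(v, 6) for v in prune_low_frequencies(texture, cutoff=2)])
```

The same idea applied to both training and test inputs is what forces a classifier to rely on the remaining higher-frequency content.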

2606.03483 2026-06-03 cs.LG cs.AI 版本更新

Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation

分析超连接中的流坍缩：从诊断到缓解

Ekaterina Alimaskina, Gleb Molodtsov, Aleksandr Beznosikov

发表机构 * MIRAI ； BRAIn Lab ； Yandex Research ； Innopolis University

AI总结本文通过细粒度诊断发现超连接中的多流残差连接存在流坍缩现象，即信号集中于主导流，并通过打破初始化对称性缓解该问题以提升性能。

AI中文摘要

一种使用二次特征融合的恶意软件分类混合方法

Raja Khurram Shahzad, Muhammad Mustaqeem, Haroon Elahi

AI总结提出一种通过融合API调用和n-gram特征，并采用投票集成算法进行恶意软件检测与家族分类的方法，在Microsoft数据集上达到99.72%准确率和0.989 AUC。

AI中文摘要

恶意软件（无论是变种还是新型）的数量正在迅速增加，使得恶意软件检测和缓解成为一个复杂的问题。改善恶意软件缓解的一种方法是自动检测和恶意软件家族分类。然而，传统的恶意软件检测方法无法将检测到的恶意软件分类到各自的家族中，阻碍了有效的恶意软件缓解。因此，本文提出了一种自动化恶意软件检测并将检测到的恶意软件分类到相应恶意软件家族的方法。所提出的方法在提取相关恶意软件特征（如API调用、固定和可变长度n-gram）后，使用自定义特征选择方法进行特征融合。此外，对于预测模型，提出了一种基于投票的算法融合方法。为了对所提出的方法进行实验评估，对Microsoft提供的数据集应用了二分类和多分类方法。最后，将实验结果与现有技术进行了比较。实验结果表明了所提出方法的有效性和效率，AUC为0.989，准确率为99.72%，对数损失为0.01。

英文摘要

The number of malware (either variant or novel) is rapidly increasing, making malware detection and mitigation a complex problem. One approach to improving malware mitigation is automatic detection and malware family classification. However, traditional malware detection methods cannot classify detected malware into their respective families, hindering effective malware mitigation. Consequently, this paper proposes a method to automate malware detection and classification of the detected malware into respective malware families. The proposed method uses feature fusion after extracting relevant malware features such as API calls and fixed and variable length n-grams with a customized feature selection method. Moreover, for the predictive model, a voting based approach is proposed for algorithm fusion. For the experimental evaluation of the proposed method, both binary and multi-class classification approaches are applied to the data set provided by Microsoft. Finally, the experimental results are compared with the state of the art. The experimental results indicate the effectiveness and efficiency of the proposed approach with an AUC of 0.989, accuracy of 99.72%, and a log loss of 0.01.

2606.03428 2026-06-03 cs.NE cs.AI cs.LG 版本更新

PrimeSVT: An Automated Memory-aware Pruning Framework with Prioritized Compression Policy for Spiking Vision Transformers

PrimeSVT: 一种具有优先压缩策略的自动化内存感知剪枝框架用于脉冲视觉Transformer

Rachmad Vidya Wicaksana Putra, Achyuta Muthuvelan, Alberto Marchisio, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi（eBRAIN实验室，工程系，纽约大学（NYU）阿布扎比分校）； New York University (NYU) Abu Dhabi, United Arab Emirates (UAE)（纽约大学（NYU）阿布扎比分校，阿拉伯联合酋长国（UAE））

AI总结提出PrimeSVT框架，通过自动化结构化剪枝和优先压缩策略，在满足精度和内存约束下压缩脉冲视觉Transformer，实现内存节省26.68%且精度损失小于3%。

Comments 8 pages, 8 figures, 3 tables

AI中文摘要

脉冲视觉Transformer（SViT）的大尺寸仍然阻碍其嵌入式实现，因此需要模型压缩。现有工作通过非结构化剪枝压缩SViT模型，这需要专门的硬件加速器来利用其特定的稀疏模式以最大化效率提升。此外，它们的手动方法需要大量设计时间来为每个网络找到合适的剪枝设置，因此这种方法不可扩展。为了解决这一限制，我们提出了PrimeSVT，一种新颖的框架，对预训练的SViT模型执行自动化的内存感知结构化剪枝，从而在推理期间最大化其效率提升，适用于广泛使用的计算架构。为此，PrimeSVT首先根据层的大小（即参数数量）对SViT层进行排序，根据它们在不同剪枝率下的鲁棒性识别目标剪枝层，然后利用这个顺序从最大层到最小层逐层顺序压缩模型（即所谓的优先压缩策略），同时考虑用户定义的约束（即可接受的精度和内存节省）。在每一层中，PrimeSVT基于L2范数值采用通道级滤波器剪枝，以结构性地移除不重要的权重。实验结果表明，PrimeSVT通过自动化单次剪枝节省了26.68%的内存，同时将精度保持在原始未剪枝SViT模型（73.3%）的3%以内（未微调时为70.3%，微调后为72.9%），从而满足了精度和内存约束。这些表明我们的PrimeSVT框架实现了SViT及其嵌入式实现的设计自动化。

英文摘要

The large sizes of Spiking Vision Transformers (SViTs) still hinder their embedded implementation, highlighting the need for model compression. State-of-the-art works compress SViT models through unstructured pruning, which needs specialized hardware accelerators for their specific sparsity patterns to maximize efficiency gains. Moreover, their manual approach requires a huge design time to find an appropriate pruning setting for each network, thus making this approach not scalable. To address this limitation, we propose PrimeSVT, a novel framework that performs automated memory-aware structured pruning on pre-trained SViT models, thereby maximizing their efficiency gains during inference amenable to widely-used computing architectures. To achieve this, PrimeSVT first sorts the SViT layers based on their sizes (i.e., number of parameters), identifies the targeted pruning layers based on their robustness under different pruning rates, then leverages this order for compressing the model layer-by-layer sequentially from the largest one to the smallest one (i.e., so-called prioritized compression policy), while considering the user-defined constraints (i.e., acceptable accuracy and memory saving). In each layer, PrimeSVT employs channel-wise filter pruning based on their L2-norm values to structurally remove the non-significant weights. Experimental results show that PrimeSVT saves 26.68% memory through automated single-shot pruning, while preserving accuracy within 3% (70.3% without fine-tuning and 72.9% with fine-tuning) from the original unpruned SViT model (73.3%), thus meeting the accuracy and memory constraints. These show that our PrimeSVT framework enables design automation for SViTs and their embedded implementation.
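
The largest-first policy with L2-norm channel pruning can be sketched as follows. This is a minimal illustration, not the PrimeSVT implementation: layers are plain lists of per-channel weights, the robustness-based layer selection and the accuracy check from the abstract are omitted, and all function names are hypothetical.

```python
import math


def layer_size(channels):
    """Number of parameters in a layer stored as a list of per-channel weight lists."""
    return sum(len(ch) for ch in channels)


def prune_channels(channels, rate):
    """Drop the fraction `rate` of channels with the smallest L2 norm, keeping order."""
    n_drop = int(len(channels) * rate)
    norms = [math.sqrt(sum(w * w for w in ch)) for ch in channels]
    drop = set(sorted(range(len(channels)), key=lambda i: norms[i])[:n_drop])
    return [ch for i, ch in enumerate(channels) if i not in drop]


def prioritized_prune(model, rate, target_saving):
    """Prune layers from largest to smallest until `target_saving` of parameters is removed.

    The accuracy constraint described in the abstract is not modeled here.
    """
    total = sum(layer_size(ch) for ch in model.values())
    pruned = dict(model)
    for name in sorted(model, key=lambda n: layer_size(model[n]), reverse=True):
        saved = 1 - sum(layer_size(ch) for ch in pruned.values()) / total
        if saved >= target_saving:
            break
        pruned[name] = prune_channels(model[name], rate)
    return pruned


toy_model = {
    "small": [[1.0, 1.0]],
    "big": [[0.1, 0.1], [2.0, 2.0], [0.2, 0.1], [3.0, 1.0]],
}
print(prioritized_prune(toy_model, rate=0.5, target_saving=0.25))
```

In this toy model the memory target is already met after the largest layer is pruned, so the small layer is left untouched.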

2606.03391 2026-06-03 cs.LG cs.AI cs.CL 版本更新

When Model Merging Breaks Routing: Training-Free Calibration for MoE

当模型合并破坏路由：MoE的无训练校准

Canbin Huang, Tianyuan Shi, Xiaojun Quan, Jingang Wang, Jianfei Zhang, Qifan Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结针对MoE架构中模型合并导致的路由崩溃问题，提出基于二阶曲率的无训练校准方法HARC，通过闭式解和共轭梯度法高效重对齐路由器，显著提升数学推理和代码生成性能。

AI中文摘要

模型合并已成为一种无需重新训练即可整合多个LLM能力的成本效益方法。然而，现有的合并技术主要基于线性参数算术或优化，在应用于混合专家（MoE）架构时面临困难。我们识别出MoE合并中的一个关键失效模式，称为路由崩溃，其中合并后的路由器无法将令牌分派给合适的专家。路由崩溃源于非线性softmax和离散Top-k路由机制对合并引起的参数扰动的敏感性，这种敏感性进一步被MoE预训练期间施加的负载平衡约束放大。由于微调后的专家表现出不同的专长，即使是适度的错误路由也可能导致严重的性能下降。为解决此问题，我们提出Hessian感知路由器校准（HARC），一种无训练框架，利用二阶曲率信息重新对齐合并后的路由器。该方法采用闭式解，可通过无矩阵共轭梯度法高效求解。在数学推理和代码生成任务上的实验表明，HARC有效缓解了多种MoE合并基线中的路由崩溃，并带来了显著的性能提升。我们的代码可在 https://github.com/huangcb01/HARC 获取。

英文摘要

Model merging has emerged as a cost-effective approach for consolidating the capabilities of multiple LLMs without retraining. However, existing merging techniques, largely based on linear parameter arithmetic or optimization, struggle when applied to Mixture-of-Experts (MoE) architectures. We identify a critical failure mode in MoE merging, termed routing breakdown, in which the merged router fails to dispatch tokens to suitable experts. Routing breakdown stems from the sensitivity of the non-linear softmax and discrete Top-k routing mechanisms to parameter perturbations from merging, a sensitivity further amplified by load-balancing constraints imposed during MoE pretraining. Because fine-tuned experts exhibit distinct specializations, even modest misrouting can cause severe performance degradation. To address this issue, we propose Hessian-Aware Router Calibration (HARC), a training-free framework that leverages second-order curvature information to realign the merged router. This approach admits a closed-form solution that can be efficiently solved using a matrix-free conjugate gradient method. Experiments on mathematical reasoning and code generation tasks show that HARC effectively mitigates routing breakdown across diverse MoE merging baselines and leads to substantial performance improvements. Our code is available at https://github.com/huangcb01/HARC.
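
Why a discrete Top-k router is sensitive to merging can be shown with a toy example. The sketch below assumes a linear router followed by softmax and Top-1 selection, and merges two routers by plain weight averaging; the weights and the token are made-up values, and this illustrates the failure mode only, not the HARC calibration itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(logits, k):
    """Indices of the k experts with the highest gate probability, best first."""
    probs = softmax(logits)
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def router_logits(weights, token):
    """One logit per expert: dot product of that expert's router row with the token."""
    return [sum(w * x for w, x in zip(row, token)) for row in weights]


# Hypothetical routers of two fine-tuned models (3 experts, 2-dimensional tokens).
router_a = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
router_b = [[-1.0, 0.0], [1.5, 0.0], [0.0, 1.0]]
merged = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(router_a, router_b)]

token = [1.0, 0.2]
print(top_k_experts(router_logits(router_a, token), 1))  # expert chosen by model A
print(top_k_experts(router_logits(merged, token), 1))    # expert chosen after merging
```

The averaged weights change the logits only moderately, yet the selected expert switches, which is the kind of misrouting the abstract calls routing breakdown.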

2606.03365 2026-06-03 cs.LG 版本更新

Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings

链接预测还是预测失灵：知识图谱嵌入中不稳定的种子

Guillaume Méroué, Fabien Gandon, Pierre Monnin

发表机构 * Université Côte d’Azur, Inria, CNRS, I3S, France（法国蔚蓝海岸大学、法国国家信息与自动化研究所、法国国家科学研究中心、I3S实验室）

AI总结本文系统分析了多种知识图谱嵌入模型在链接预测中的稳定性，发现高性能模型在三元组预测和嵌入空间上存在显著不稳定性，且随机种子、超参数等因素独立引发同等程度的不稳定，投票机制仅能有限提升稳定性。

Comments Paper accepted at ESWC 2026 (https://2026.eswc-conferences.org)

DOI: 10.1007/978-3-032-25156-5_11

AI中文摘要

嵌入模型（KGEMs）是完成知识图谱的主要链接预测方法。标准评估协议强调基于排名的指标如MRR或Hits@$K$，但通常忽略随机种子对结果稳定性的影响。此外，这些指标掩盖了个别预测和嵌入空间组织中的潜在不稳定性。在这项工作中，我们对多个数据集上的多种KGEM进行了系统的稳定性分析。我们发现高性能模型实际上在三元组级别产生分歧预测，并具有高度可变的嵌入空间。通过隔离随机因素（即初始化、三元组排序、负采样、dropout、硬件），我们表明每个因素独立地引发相当程度的不稳定性。此外，对于给定模型，具有更好MRR的超参数配置并不能保证更稳定。而且，投票虽然是一种已知的补救机制，但只能提供有限的稳定性增强。这些发现凸显了当前基准测试协议的关键局限性，并引发了对KGEM用于知识图谱补全的可靠性的担忧。

英文摘要

Embedding models (KGEMs) constitute the main link prediction approach to complete knowledge graphs. Standard evaluation protocols emphasize rank-based metrics such as MRR or Hits@$K$, but usually overlook the influence of random seeds on result stability. Moreover, these metrics conceal potential instabilities in individual predictions and in the organization of embedding spaces. In this work, we conduct a systematic stability analysis of multiple KGEMs across several datasets. We find that high-performance models actually produce divergent predictions at the triple level and highly variable embedding spaces. By isolating stochastic factors (i.e., initialization, triple ordering, negative sampling, dropout, hardware), we show that each independently induces instability of comparable magnitude. Furthermore, for a given model, hyperparameter configurations with better MRR are not guaranteed to be more stable. Moreover, voting, albeit a known remediation mechanism, only provides a limited enhancement of stability. These findings highlight critical limitations of current benchmarking protocols, and raise concerns about the reliability of KGEMs for knowledge graph completion.
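
The gap between aggregate metrics and triple-level agreement can be made concrete. The sketch below computes MRR and Hits@K from ranks and, as an assumed stability measure, the mean Jaccard overlap of top-K predictions between two runs; the abstract does not specify the paper's own stability measures, so the overlap function is illustrative only.

```python
def mrr(ranks):
    """Mean reciprocal rank of the true entity over all test triples (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def prediction_overlap(top_a, top_b):
    """Mean Jaccard overlap between the top-K predictions of two runs, per query."""
    scores = []
    for preds_a, preds_b in zip(top_a, top_b):
        a, b = set(preds_a), set(preds_b)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)


# Two hypothetical seeds with identical MRR but different individual rankings.
ranks_seed1 = [1, 2, 4]
ranks_seed2 = [2, 1, 4]
print(mrr(ranks_seed1), mrr(ranks_seed2))

top_seed1 = [["x", "y"], ["u", "v"]]
top_seed2 = [["x", "z"], ["w", "v"]]
print(prediction_overlap(top_seed1, top_seed2))
```

Two runs can therefore report the same benchmark score while agreeing on only a fraction of their individual predictions.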

2606.03361 2026-06-03 cs.LG 版本更新

Mitigating False Credit Propagation: Probabilistic Graphical Reward Aggregation for Rubric-Based Reinforcement Learning

缓解虚假信用传播：基于概率图奖励聚合的准则强化学习

Can Lv, Mingju Chen, Heng Chang, Shiji Zhou

发表机构 * Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University（北京未来区块链与隐私计算先进创新中心，人工智能学院，北京航空航天大学）； Tsinghua University（清华大学）

AI总结针对准则奖励中因忽略准则间依赖关系导致的虚假信用传播问题，提出概率图框架Graphical Event Aggregation for Rubric rewards (GEAR)，通过建模潜在伯努利事件和软抑制传播实现依赖感知的奖励聚合，在多个基准上提升性能并减少信用泄漏。

AI中文摘要

基于准则的奖励越来越多地用于开放式语言模型的后训练，但准则级别的分数通常作为独立效用进行聚合。这种扁平标量化忽略了评分标准所规定的各准则之间的前提和激活关系，使得即使触发奖励或惩罚的条件不存在，奖励或惩罚仍被计入。我们将这种结构性的奖励聚合失败称为虚假信用传播（FCP）。为解决这一局限，我们提出GEAR（Graphical Event Aggregation for Rubric rewards），一种用于依赖感知准则聚合的概率图框架。GEAR将每个准则结果建模为类型化准则图中的潜在伯努利事件，从不受支持的父事件向其子事件传播软抑制，并将结果事件概率聚合为归一化的期望符号效用。这产生了一个线性时间的奖励计算，可以插入到标准的基于准则的RL流程中，而无需改变外部优化算法。在HealthBench、WritingBench和PLawBench上使用两种策略骨干的实验表明，GEAR一致优于扁平聚合和确定性门控，相对于扁平聚合实现了高达15.5%的相对增益。FCP诊断进一步显示，相对于扁平聚合，GEAR减少了96.5%的泄漏，同时保留了比确定性门控更多的许可下游效用。我们的代码公开于 https://github.com/LvCan926/GEAR。

英文摘要

Rubric-based rewards are increasingly used for open-ended language model post-training, but criterion-level scores are often aggregated as independent utilities. This flat scalarization ignores rubric-specified prerequisite and activation relations among criteria, allowing reward or penalty to be counted even when the condition that licenses it is absent. We call this structural reward-aggregation failure False Credit Propagation (FCP). To address this limitation, we propose GEAR (Graphical Event Aggregation for Rubric rewards), a probabilistic graphical framework for dependency-aware rubric aggregation. GEAR models each criterion outcome as a latent Bernoulli event in a typed rubric graph, propagates soft suppression from unsupported parent events to their children, and aggregates the resulting event probabilities into a normalized expected signed utility. This yields a linear-time reward computation that can be plugged into standard rubric-based RL pipelines without changing the outer optimization algorithm. Experiments on HealthBench, WritingBench, and PLawBench with two policy backbones show that GEAR consistently improves over flat aggregation and deterministic gating, achieving relative gains of up to 15.5% over flat aggregation. FCP diagnostics further show that GEAR reduces leakage by 96.5% relative to flat aggregation while preserving more licensed downstream utility than deterministic gating. Our code is publicly available at https://github.com/LvCan926/GEAR.
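
The difference between flat and dependency-aware aggregation can be illustrated with a two-criterion rubric. The sketch below is a hypothetical simplification: it gates each child's probability by multiplying it with its parent's effective probability, which is one simple form of soft suppression and not necessarily the propagation rule used in the paper. The criterion names and the dictionary layout are invented for the example.

```python
def flat_reward(criteria):
    """Flat aggregation: every criterion counts independently of its prerequisites."""
    norm = sum(abs(c["utility"]) for c in criteria.values())
    return sum(c["utility"] * c["prob"] for c in criteria.values()) / norm


def graph_reward(criteria, order):
    """Dependency-aware aggregation with soft suppression from the parent event.

    `order` must list parents before children, so one pass suffices (linear time).
    """
    effective = {}
    for name in order:
        c = criteria[name]
        parent = c.get("parent")
        gate = effective[parent] if parent is not None else 1.0
        effective[name] = c["prob"] * gate
    norm = sum(abs(c["utility"]) for c in criteria.values())
    return sum(criteria[n]["utility"] * effective[n] for n in order) / norm


rubric = {
    "cites_guideline": {"prob": 0.1, "utility": 1.0, "parent": None},
    "guideline_is_current": {"prob": 0.9, "utility": 1.0, "parent": "cites_guideline"},
}
print(flat_reward(rubric))
print(graph_reward(rubric, ["cites_guideline", "guideline_is_current"]))
```

Flat aggregation credits the child criterion almost fully even though its prerequisite is very unlikely to hold, while the gated version removes most of that credit.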

2606.03359 2026-06-03 cs.SD cs.CL cs.LG 版本更新

Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection

基于注意力机制的残差连接LSTM网络的语音情感识别

Daniil Krasnoproshin, Maxim Vashkevich

发表机构 * Institute of Cybernetics and Machine Learning, Belarusian State University（白俄罗斯国立大学控制论与机器学习研究所）

AI总结提出ResLSTM-SA轻量级架构，在LSTM中集成残差连接和软注意力，在RAVDESS数据集上以46.8k参数达到0.6517 UAR，优于传统基线且适合边缘部署。

Comments 6 pages, 5 figures, DSPA 2026

DOI: 10.1109/DSPA69176.2026.11476771

AI中文摘要

语音情感识别是现代人机交互系统的重要组成部分。然而，许多最先进的方法依赖于具有高计算和内存需求的大型预训练模型，限制了其适用性。本文提出了ResLSTM-SA，一种轻量级架构，在基于LSTM的框架中集成了残差连接和软注意力。在RAVDESS数据集上，在严格的说话人独立划分下进行评估，所提出的模型在未加权平均召回率（UAR）方面优于传统的基于注意力的LSTM基线以及几种先前报道的CNN和混合CNN-LSTM架构。性能最佳的变体（ResLSTM-SA-h64）仅用46.8k可训练参数就达到了0.6517的最大UAR，以比大规模自监督替代方案少三个数量级的参数提供了具有竞争力的准确性，从而能够在边缘设备和实时语音助手上高效部署。源代码可在以下网址获取：https://github.com/Mak-Sim/ResLSTM-SER。

英文摘要

Speech emotion recognition is an important component of modern human-computer interaction systems. However, many state-of-the-art approaches rely on large pretrained models with high computational and memory requirements, limiting their applicability. This paper proposes ResLSTM-SA, a lightweight architecture that integrates residual connections with soft attention within an LSTM-based framework. Evaluated on the RAVDESS dataset under strict speaker-independent partitioning, the proposed model outperforms conventional attention-based LSTM baselines and several previously reported CNN- and hybrid CNN-LSTM architectures in terms of unweighted average recall (UAR). The best-performing variant (ResLSTM-SA-h64) achieves a maximum UAR of 0.6517 with only 46.8k trainable parameters, delivering competitive accuracy with three orders of magnitude fewer parameters than large-scale self-supervised alternatives, thereby enabling efficient deployment on edge devices and real-time voice assistants. The source code is available at https://github.com/Mak-Sim/ResLSTM-SER.
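
Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls. A minimal sketch using only the standard library follows; the labels in the example are made up.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of its size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# A classifier that always answers "angry" is right on 4 of 5 samples,
# yet it recalls only one of the two classes.
y_true = ["angry", "angry", "angry", "angry", "sad"]
y_pred = ["angry"] * 5
print(unweighted_average_recall(y_true, y_pred))
```

Because rare classes weigh as much as frequent ones, UAR penalizes models that ignore minority emotions, which is why it is commonly preferred over plain accuracy on imbalanced emotion datasets.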

2606.03358 2026-06-03 cs.LG 版本更新

The Impact of Temporal Granularity on Socio-Demographic Inference from Household Load Profiles

时间粒度对家庭负荷曲线社会人口推断的影响

Dejan Radovanovic, Maximilian Schirl, Andreas Unterweger, Günther Eibl

发表机构 * Center for Secure Energy Informatics（安全能源信息中心）； Salzburg University of Applied Sciences（萨尔茨堡应用科学大学）； Paris Lodron University of Salzburg（萨尔茨堡大学）

AI总结本文通过分析15分钟到7天不同粒度负荷曲线对8个社会人口属性的预测影响，揭示了隐私-效用权衡中时间分辨率、特征提取和分类器选择的联合作用。

Comments 30 pages, 10 figures, book chapter

AI中文摘要

智能电表数据可以揭示家庭敏感的社会人口特征，引发隐私担忧。虽然这一风险已在固定粒度下得到证实，但时间分辨率在塑造推断性能中的作用尚未得到充分探索。本文通过分析从15分钟到7天不同粒度的负荷曲线如何影响1589户家庭一年数据中八个社会人口属性的可预测性，填补了这一空白。我们引入了一个评估框架，其中分类器在全年数据上训练，但在任意周上测试，迫使模型跨季节和每周变化进行泛化。我们的结果显示了三个主要发现。首先，虽然粗化粒度降低了预测准确性，但出现了两个平台期：性能在15分钟到1小时之间稳定，以及在1到7天之间再次稳定。这揭示了在不牺牲效用的情况下进行数据最小化的机会。其次，可解释的手工特征和tsfresh特征仍然与基于CNN的自编码器嵌入具有竞争力，而XGBoost始终优于其他分类器。第三，特征重要性分析突出了静态和动态属性之间的差异：即使从粗粒度数据中也能推断出住宅面积，而游泳池使用则需要细粒度的时间信号。总体而言，我们的研究为智能计量中的隐私-效用权衡提供了新的见解，显示了时间分辨率、特征提取和分类器选择如何共同影响社会人口推断。

英文摘要

Smart meter data can reveal sensitive socio-demographic characteristics of households, raising privacy concerns. While this risk has been demonstrated at fixed granularities, the role of temporal resolution in shaping inference performance remains insufficiently explored. This paper addresses this gap by analyzing how load profiles with granularities from 15 minutes to 7 days affect the predictability of eight socio-demographic attributes in a dataset of 1,589 households over one year. We introduce an evaluation framework where classifiers are trained on year-round data but tested on arbitrary weeks, forcing generalization across seasonal and weekly variations. Our results show three main findings. First, while coarsening granularity reduces predictive accuracy, two plateaus emerge: performance is stable between 15 minutes and 1 hour, and again between 1 and 7 days. This reveals opportunities for data minimization without sacrificing utility. Second, interpretable handcrafted and tsfresh features remain competitive with CNN-based autoencoder embeddings, while XGBoost consistently outperforms alternative classifiers. Third, feature importance analysis highlights differences between static and dynamic attributes: dwelling size can be inferred even from coarse data, whereas swimming pool usage requires fine-grained temporal signals. Overall, our study provides new insights into the privacy-utility trade-off in smart metering, showing how temporal resolution, feature extraction, and classifier choice jointly influence socio-demographic inference.
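
Coarsening a load profile from 15-minute readings to hourly or daily values amounts to aggregating consecutive readings. The sketch below assumes the readings are energy amounts, so summing is the appropriate aggregation; for power readings a mean would be used instead. The function name is hypothetical.

```python
def coarsen(readings, factor):
    """Sum consecutive groups of `factor` energy readings into one coarser reading.

    Trailing readings that do not fill a complete group are dropped.
    """
    usable = len(readings) - len(readings) % factor
    return [sum(readings[i:i + factor]) for i in range(0, usable, factor)]


# One day of 15-minute readings (96 values of 0.25 kWh each, made-up data).
day = [0.25] * 96
hourly = coarsen(day, 4)   # 15 minutes -> 1 hour
daily = coarsen(day, 96)   # 15 minutes -> 1 day
print(len(hourly), daily)
```

Each coarser profile is a deterministic function of the finer one, so a data-minimization policy can publish only the coarsest granularity that still supports the intended use.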

2606.03355 2026-06-03 cs.LG 版本更新

APIC: Amortized Physics-Informed Calibration using Neural Processes

APIC: 使用神经过程的摊销物理信息校准

Aishwarya Venkataramanan, Sai Karthikeya Vemuri, Joachim Denzler

发表机构 * Computer Vision Group, Friedrich Schiller University Jena（耶拿弗里德里希·席勒大学计算机视觉组）

AI总结提出APIC框架，通过神经过程实现群体级贝叶斯推断，利用两分支潜在架构分离实例特定物理参数与共享结构差异，实现从稀疏观测中快速校准并量化不确定性。

Comments Accepted at UAI 2026

AI中文摘要

物理模型由于机制错误或缺失而固有地不完美，导致模型预测与真实观测之间存在系统性差异。Kennedy-O'Hagan (KOH) 框架通过显式差异建模解决了这个问题。然而，其非摊销的、每个实例的公式限制了在相关系统族中的可扩展性。我们引入了摊销物理信息校准 (APIC)，这是 KOH 的群体级扩展，利用神经过程在实现之间进行可扩展的贝叶斯推断。我们的框架采用两分支潜在架构，将实例特定的物理参数与共享的、状态相关的结构差异分离开来。通过将可微物理集成到摊销推断骨干中，APIC 能够从稀疏观测中快速校准未见过的实现，同时量化不确定性。在阻尼弹簧振荡器、Lotka-Volterra 系统和具有错误物理的对流扩散偏微分方程上的实验表明，与其他校准方法相比，参数恢复得到改善，并且系统差异结构的一致识别得到增强。

英文摘要

Physics models are inherently imperfect due to misspecified or missing mechanisms, resulting in systematic discrepancies between model predictions and real-world observations. The Kennedy-O'Hagan (KOH) framework addresses this issue through explicit discrepancy modeling. However, its non-amortized, per-instance formulation limits scalability across families of related systems. We introduce Amortized Physics-Informed Calibration (APIC), a population-level extension of KOH that leverages Neural Processes to perform scalable Bayesian inference across realizations. Our framework employs a two-branch latent architecture to disentangle instance-specific physical parameters from shared, state-dependent structural discrepancies. By integrating differentiable physics into an amortized inference backbone, APIC enables rapid calibration of unseen realizations from sparse observations while quantifying uncertainty. Experiments on the damped spring oscillator, the Lotka-Volterra system, and the advection-diffusion PDE with misspecified physics demonstrate improved parameter recovery and consistent identification of the systemic discrepancy structure compared to other calibration approaches.
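
For orientation, the Kennedy-O'Hagan model in its commonly used form (without the optional scaling factor on the simulator) writes each observation as simulator output plus discrepancy plus noise. The second line is our own notation for the population-level setting described in the abstract, with realization index $r$, instance-specific parameters $\theta_r$, and a shared discrepancy $\delta$; it is a reading of the abstract, not the paper's stated formulation.

```latex
% Per-instance KOH calibration: simulator \eta, discrepancy \delta, noise \varepsilon
y_i = \eta(x_i, \theta) + \delta(x_i) + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Assumed population-level form: \theta_r varies per realization, \delta is shared
y_{r,i} = \eta(x_{r,i}, \theta_r) + \delta(x_{r,i}) + \varepsilon_{r,i}
```

In the classical setting $\theta$ and $\delta$ are inferred separately for every system, which is the per-instance cost that amortized inference is meant to avoid.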

2606.03347 2026-06-03 cs.LG cs.AI stat.ML 版本更新

AugMask: Training Diffusion Models on Incomplete Tabular Data via Stochastic Augmentation and Masking

AugMask: 通过随机增强和掩码在不完整表格数据上训练扩散模型

Jungkyu Kim, Taeyoung Park, Kibok Lee

发表机构 * KAIST（韩国科学技术院）

AI总结提出AugMask训练框架，通过条件随机增强和仅对观测坐标去噪，使标准扩散模型适应缺失表格数据，并连接Rao-Blackwellized目标实现方差加权惩罚，优于专门处理缺失的基线。

AI中文摘要

基于分数的扩散模型已成为突出的深度生成模型；然而，它们在表格数据上的应用仍然具有挑战性，因为其主干网络假设输入完全指定，而现实世界的表格数据通常包含缺失值。我们提出了AugMask，一个即插即用的训练框架，通过将条件与监督分离，使对缺失不敏感的主干网络适应不完整数据。AugMask 1) 使用轻量级辅助模型通过条件随机增强构建数值输入，2) 仅对观测坐标应用去噪监督。实际上，增强的缺失条目作为不确定的条件上下文，而不是训练目标。我们将此训练规则与Rao-Blackwellized目标联系起来，并表明对缺失条目进行边缘化会产生方差加权的敏感性惩罚，从而阻止对不确定补全的过度依赖。在多种数据集和缺失机制下，AugMask使基于扩散的标准表格生成器优于专门处理缺失的基线方法。

英文摘要

Score-based diffusion models have emerged as prominent deep generative models; however, their application to tabular data remains challenging because their backbones assume fully specified inputs, whereas real-world tabular data often contain missing values. We propose AugMask, a plug-and-play training framework that adapts missing-unaware backbones to incomplete data by separating conditioning from supervision. AugMask 1) constructs numeric inputs via conditional stochastic augmentation using lightweight auxiliary models, and 2) applies denoising supervision only to observed coordinates. In effect, augmented missing entries serve as uncertain conditioning context rather than training targets. We connect this training rule to a Rao--Blackwellized objective and show that marginalizing missing entries yields a variance-weighted sensitivity penalty, discouraging over-reliance on uncertain completions. Across diverse datasets and missingness regimes, AugMask enables standard diffusion-based tabular generators to outperform specialized missing-aware baselines.
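
Restricting denoising supervision to observed coordinates comes down to a masked loss. The sketch below shows this for a single row, assuming a noise-prediction objective with squared error; the diffusion backbone and the auxiliary augmentation models are outside its scope, and the function name is hypothetical.

```python
def masked_denoising_loss(predicted_noise, true_noise, observed):
    """Mean squared error over observed coordinates only.

    `observed[i]` is True when column i was present in the original row;
    augmented (originally missing) columns contribute nothing to the loss.
    """
    errors = [
        (p - t) ** 2
        for p, t, is_observed in zip(predicted_noise, true_noise, observed)
        if is_observed
    ]
    return sum(errors) / len(errors) if errors else 0.0


# The second column was missing in the data, so its large error is ignored.
print(masked_denoising_loss([1.0, 5.0, 2.0], [0.0, 0.0, 2.0], [True, False, True]))
```

The augmented entries still enter the network as input, so they shape the prediction for the observed columns without ever being treated as ground truth.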

2606.03344 2026-06-03 cs.CR cs.LG 版本更新

为下游任务定制严格适当的评分规则：因果推断中的应用

Roman Plaud, Alexandre Perez-Lebel, Antoine Saillenfest, Thomas Bonald, Marine Le Morvan, Gaël Varoquaux, Matthieu Labeau

发表机构 * Inria（法国国家信息与自动化研究所）； CNRS（法国国家科学研究中心）； Université de Paris（巴黎大学）； Université de Paris-Saclay（巴黎-萨克雷大学）

AI总结提出一种通过匹配下游误差指标的局部曲率来推导任务特定严格适当评分规则的框架，并将其应用于平均处理效应估计，导出了闭式损失函数及其对应的规范概率映射，实验表明该方法优于标准似然和协变量平衡方法。

Comments Accepted to ICML 2026

2606.03330 2026-06-03 cs.LG cs.AI cs.CR 版本更新

FLIPS: Instance-Fingerprinting for LLMs via Pseudo-random Sequences

FLIPS：通过伪随机序列为LLMs进行实例指纹识别

Gurvan Richardeau, Gohar Dashyan, Erwan Le Merrer, Gilles Tredan

发表机构 * Inria（法国国家信息与自动化研究所）

AI总结提出FLIPS方法，利用生成的二进制随机序列中的偏差，在237个模型实例上实现96%（闭集）和90%（开集）的识别准确率，解决了现有指纹识别技术无法区分同一LLM不同配置的问题，为AI监管提供了实例级指纹识别新范式。

Comments 20 pages, 20 figures, 3 tables. 43rd International Conference on Machine Learning (ICML 2026)

AI中文摘要

文献揭示，大型语言模型（LLM）的行为不仅受其原始权重影响，还受其实例级参数（如指令提示、采样配置或量化）影响。在一种配置下生成安全输出的模型，在另一种配置下可能产生有毒内容。然而，当前的LLM识别技术（如指纹识别）侧重于知识产权保护，其设计倾向于对这些实例级参数的变化具有鲁棒性。这对AI监管构成了关键挑战，因为合规评估针对的是实际部署的行为，而非模型来源。在本文中，我们引入了实例级指纹识别，这是一种面向监管的范式，用于区分同一LLM的不同配置。我们的方法FLIPS利用生成的二进制随机序列中的偏差，在237个模型实例上达到96%（闭集）和90%（开集，其中一些目标未知）的识别准确率，而改编的LLMmap基线仅为35%。这表明实例级指纹识别对于监管既必要又实际可行。代码见 https://github.com/GurvanR/FLIPS-LLM-Instance-Fingerprinting。

英文摘要

Literature reveals that a Large Language Model's (LLM) behavior is not only conditioned by its original weights but also its instance-level parameters, such as instructional prompt, sampling configuration or quantization. A model that generates safe outputs under one configuration may produce toxic content under another. However, current LLM identification techniques (such as fingerprinting) focus on intellectual property protection, and their design favors robustness to changes in these instance-level parameters. This poses a critical challenge for AI regulation in which compliance assessments target actual deployed behaviors, not model provenance. In this paper, we introduce instance-level fingerprinting, a regulator-oriented paradigm that distinguishes configurations of the same LLM. Our method FLIPS, exploits biases in generated binary random sequences to reach 96% (closed-set) and 90% (open-set, where some targets are unknown) identification accuracy across 237 model instances, versus 35% for the adapted LLMmap baseline. This shows that instance-level fingerprinting is both necessary for regulation and practically feasible. Code available at https://github.com/GurvanR/FLIPS-LLM-Instance-Fingerprinting.
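
The idea of identifying an instance from biases in its "random" bit sequences can be sketched with simple statistics. This is a toy illustration under strong assumptions: the features (frequency of 1s and of adjacent bit pairs) and the nearest-fingerprint matching are our own choices, since the abstract does not describe the features or classifier used by FLIPS, and the reference sequences are invented.

```python
def bit_fingerprint(bits):
    """Frequency of 1s plus frequencies of the four adjacent-bit pairs in a 0/1 string."""
    ones = bits.count("1") / len(bits)
    pairs = [bits[i:i + 2] for i in range(len(bits) - 1)]
    pair_freqs = [pairs.count(p) / len(pairs) for p in ("00", "01", "10", "11")]
    return [ones] + pair_freqs


def identify(sample, references):
    """Name of the reference instance whose fingerprint is closest in squared distance."""
    target = bit_fingerprint(sample)

    def distance(name):
        ref = bit_fingerprint(references[name])
        return sum((a - b) ** 2 for a, b in zip(target, ref))

    return min(references, key=distance)


# Hypothetical outputs of two configurations asked for a random bit sequence.
references = {
    "instance_a": "0101010101010101",  # tends to alternate
    "instance_b": "0011001100110011",  # tends to repeat each bit once
}
print(identify("1010101010", references))
```

Both reference sequences contain exactly half ones, so the single-bit frequency alone cannot separate them; the pair statistics carry the distinguishing bias.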

2606.03322 2026-06-03 cs.LG cs.AI 版本更新

Multi-Modal Graph Neural Network with Transformer-Guided Adaptive Diffusion for Preclinical Alzheimer Classification

多模态图神经网络与Transformer引导的自适应扩散用于临床前阿尔茨海默病分类

Jaeyoon Sim, Minjae Lee, Guorong Wu, Won Hwa Kim

发表机构 * Pohang University of Science and Technology（浦项科学技术大学）； University of North Carolina at Chapel Hill（北卡罗来纳大学教堂山分校）

AI总结提出一种结合扩散核与多头注意力的图神经网络框架，通过Transformer引导自适应扩散过程，有效融合多模态特征，提升临床前阿尔茨海默病分类性能并识别关键脑区。

Comments 10 pages, Accepted to MICCAI 2024

AI中文摘要

大脑的图形表示通过感兴趣区域（ROI）之间的关系为诊断和预测神经退行性疾病提供了关键见解。尽管近年来出现了各种图神经网络（GNN）来有效捕获关系信息，但在解释大脑网络方面仍存在固有局限性。具体而言，卷积方法无法有效聚合远邻域信息，而基于注意力的方法在捕获节点中心信息方面存在缺陷，特别是在保留关键节点的关键特征方面。这些不足揭示了从不同模态的不同特征中识别疾病特异性变化的挑战。为此，我们提出一个集成框架，通过下游Transformer引导每个节点的扩散过程，其中图的短程和长程属性分别通过扩散核和多头注意力进行聚合。我们通过使用多种模态改进临床前阿尔茨海默病（AD）分类的性能，证明了我们模型的优越性。此外，我们的模型能够熟练识别与AD临床前阶段密切相关的关键ROI，为疾病的早期诊断和预防提供了重要潜力。

英文摘要

The graphical representation of the brain offers critical insights into diagnosing and prognosing neurodegenerative disease via relationships between regions of interest (ROIs). Despite recent emergence of various Graph Neural Networks (GNNs) to effectively capture the relational information, there remain inherent limitations in interpreting the brain networks. Specifically, convolutional approaches ineffectively aggregate information from distant neighborhoods, while attention-based methods exhibit deficiencies in capturing node-centric information, particularly in retaining critical characteristics from pivotal nodes. These shortcomings reveal challenges for identifying disease-specific variation from diverse features from different modalities. In this regard, we propose an integrated framework guiding diffusion process at each node by a downstream transformer where both short- and long-range properties of graphs are aggregated via diffusion-kernel and multi-head attention respectively. We demonstrate the superiority of our model by improving performance of pre-clinical Alzheimer's disease (AD) classification with various modalities. Also, our model adeptly identifies key ROIs that are closely associated with the preclinical stages of AD, marking a significant potential for early diagnosis and prevision of the disease.

2606.03321 2026-06-03 cs.LG cs.MA cs.SY eess.SY 版本更新

Validation-Gated Multi-Agent Governance for Online Adaptation of Thermal-Hydraulic Surrogate Models under Operating-Regime Shift

验证门控多智能体治理：运行工况迁移下热工水力代理模型的在线自适应

Doyeong Lim, Seungyoon Lee, In Cheol Bang

发表机构 * Department of Nuclear Engineering, Ulsan National Institute of Science and Technology (UNIST)（蔚山科学技术院（UNIST）核工程系）

AI总结针对离线训练模型在运行工况迁移时性能退化问题，提出验证门控多智能体治理框架，通过角色分离的智能体协作与确定性门控机制实现可审计的在线自适应，在实验热工水力数据上将平均绝对误差降低19%。

Abstract

Artificial-intelligence surrogates can support second-by-second thermal-hydraulic forecasting, but models selected and frozen offline may become condition-locked once deployed outside their pretraining envelope. This study develops a guarded continual-adaptation framework for experimental thermal-hydraulic loop data in which role-separated agents - Monitor, Diagnosis, Adaptation, Safety-Auditor, and Orchestrator - diagnose error signatures, prioritize candidate model families, and review promotions, while deterministic champion-challenger gates and background shadow learning retain final authority over model replacement. Seven surrogate families were screened by blocked three-fold cross-validation, and a temporal Fourier neural operator was selected as the initial champion for 60-s-history-to-10-s-trajectory forecasting on two held-out transients, with three seeds per adaptive mode. Static deployment gave a channel-averaged MAE of 7.06 and a 56.8% warning-exceedance ratio; rule-based adaptation reduced MAE to 6.54, whereas shadow refresh alone remained close to Static. The MA-Full mode, in which the role-separated multi-agent council reviews every evaluated stream step, achieved the lowest mean error, 5.72, and 35.8% exceedance, corresponding to a 19.0% improvement over Static. Paired bootstrap intervals against Static excluded zero, although intervals among adaptive modes overlapped and the six paired units limit broad statistical claims. Validated promotions from the neural operator to Transformer and graph neural network indicate that logged, gate-controlled adaptation can support auditable surrogate evolution while deterministic gates retain deployment authority.
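The 19.0% improvement quoted above follows directly from the two MAE values in the abstract. A minimal arithmetic check, assuming the improvement is measured relative to the Static baseline; the function name `relative_improvement` is ours and does not come from the paper:

```python
def relative_improvement(baseline: float, adapted: float) -> float:
    """Return the fractional error reduction of `adapted` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - adapted) / baseline


static_mae = 7.06      # Static deployment, channel-averaged MAE (from the abstract)
rule_based_mae = 6.54  # rule-based adaptation (from the abstract)
ma_full_mae = 5.72     # MA-Full mode (from the abstract)

print(f"MA-Full vs Static:    {relative_improvement(static_mae, ma_full_mae):.1%}")
print(f"Rule-based vs Static: {relative_improvement(static_mae, rule_based_mae):.1%}")
```

The first line reproduces the 19.0% figure. The same calculation gives roughly 7.4% for rule-based adaptation, a number the abstract does not state explicitly.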


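2606.03315 2026-06-03 cs.LG (updated version)

A Geometric Perspective on Physics-Aligned Data Compression

Aleix Segui, Wesley Armour

Affiliations: GitHub

AI summary: Using a local geometric theory, the paper explains the rate-distortion trade-off that physics-informed losses introduce in scientific data compression, and proposes an alignment diagnostic based on dominant eigenspace overlap.

Comments: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026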

Journal ref: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

Abstract

In AI for Science, physics-informed losses are increasingly used to train learned compressors for scientific data, but their rate-distortion implications remain poorly understood. At fixed bitrate, these objectives often improve preservation of a target physical observable while degrading standard reconstruction fidelity. We develop a local geometric theory showing that this tradeoff is governed by the interaction of latent-space sensitivities induced by the entropy model, the physical observable, and the distortion metric. At each operating point, these induce preferred directions along which compression noise should be suppressed, yielding an anisotropic error-allocation mechanism. When these directions are misaligned, improving the observable at fixed rate necessarily worsens standard distortion, establishing a fundamental limit on simultaneous preservation. We formalise this through a local tangent-space rate-distortion law and introduce a practical alignment diagnostic based on dominant eigenspace overlap. Experiments across scientific domains test the theory and validate that the alignment diagnostic correlates with observed data- and physics-space trade-offs.
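The abstract does not give the exact form of the alignment diagnostic. One common way to measure dominant eigenspace overlap is the normalised squared Frobenius norm of the product of the two top-k eigenvector bases. The sketch below assumes that definition, and the matrices, `k`, and function names are illustrative rather than the authors':

```python
import numpy as np


def dominant_eigenspace(matrix: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (columns) for the top-k eigenspace of a symmetric matrix."""
    sym = 0.5 * (matrix + matrix.T)   # guard against slight asymmetry
    _, eigvecs = np.linalg.eigh(sym)  # eigenvalues come back in ascending order
    return eigvecs[:, -k:]            # last k columns belong to the largest eigenvalues


def eigenspace_overlap(a: np.ndarray, b: np.ndarray, k: int) -> float:
    """Overlap in [0, 1]: 1 for identical top-k subspaces, 0 for orthogonal ones."""
    ua, ub = dominant_eigenspace(a, k), dominant_eigenspace(b, k)
    return float(np.linalg.norm(ua.T @ ub, ord="fro") ** 2 / k)


# Two toy sensitivity matrices: one emphasises axes 0-1, the other axes 2-3.
observable_sens = np.diag([5.0, 4.0, 0.1, 0.1])
distortion_sens = np.diag([0.1, 0.1, 5.0, 4.0])
print(eigenspace_overlap(observable_sens, observable_sens, k=2))  # aligned case
print(eigenspace_overlap(observable_sens, distortion_sens, k=2))  # misaligned case
```

Under this definition, a low overlap would correspond to the misaligned regime in which, according to the abstract, improving the observable at fixed rate worsens standard distortion.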


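2606.03270 2026-06-03 cs.LG cs.AI (updated version)

Are Common Substructures Transferable? Riemannian Graph Foundation Model with Neural Vector Bundles

Li Sun, Zhenhao Huang, Yiding Wang, Qin Chen, Pietro Lio, Philip S. Yu

Affiliations: University of Science and Technology of China

AI summary: To address the lack of theory on graph structural transferability, the paper proposes GAUGE, a neural vector bundle framework grounded in Riemannian geometry that learns intrinsic geometry to represent transferable substructures, and validates it on zero-shot link prediction and graph isomorphism tasks.

Comments: Accepted by ICML 2026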

Abstract

Foundation models have sparked a revolution via a pretraining-adaptation paradigm, with recent efforts extending this success to graphs. Unlike other modalities, graphs contain rich structural patterns, yet their structural transferability remains poorly understood. Prior studies consider common substructures in the discrete realm, and we are motivated by a fundamental question: Are common substructures transferable? The underlying theory is largely underexplored. In this work, we shift toward learning transferable structures through the lens of functional behavior. Theoretically, we connect transferable substructures to intrinsic geometry of the representation space. However, characterizing such intrinsic geometry has rarely been touched. Grounded in Riemannian geometry, we develop a graph intrinsic geometry learning framework called Neural Vector Bundle, which enables parsing intrinsic geometry with local coordinates. Building on this, we design GAUGE, a pretrainable neural architecture that constructs the vector bundle, flattening geometrically compatible local coordinates, and a new Dirichlet loss, which also measures the transfer effort. We empirically validate its superior expressiveness in challenging tasks including zero-shot link prediction and graph isomorphism.


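2606.03262 2026-06-03 cs.LG cs.NA math.NA (updated version)

Do Real-World Datasets Contain Natural Experiments? An Empirical Study Based on Causal Feature Selection

Gautam Gare, John Galeotti, Michael Mozer, Deva Ramanan, Nan Rosemary Ke

AI summary: Uses causal discovery and feature selection to detect natural experiments in real-world datasets, and improves model performance by treating the data as interventional.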

Abstract

In nature, events that affect some individuals or groups but not others constitute an implicit intervention and are known as natural experiments. For example, the COVID-19 pandemic was an intervention by the coronavirus on the sub-population infected with COVID. We ask, do natural experiments occur in existing real-world datasets? If yes, how should we treat them? To detect natural experiments in data, we use causal discovery to recover the underlying causal graph and perform feature selection based on causal links. If downstream performance improves by treating the data as interventional rather than observational, we argue that this suggests the dataset contains natural experiments. We first validate this hypothesis by simulating datasets with and without natural experiments using synthetic graphs. We then perform a systematic empirical evaluation on a large suite of real-world datasets. Our results indicate that real-world datasets do contain natural experiments and we can take advantage of those natural experiments to improve model performance using causal inference. Our work represents the initial foray into this area, offering a preliminary exploration within a limited scope.


2606.03238 2026-06-03 cs.LG cs.AI (updated version)

When RLHF Fails: A Mechanistic Taxonomy of Reward Hacking, Collapse, and Evaluator Gaming

Zelalem Abahana

Affiliations: First Citizens Bank; Alma Mater Europaea University

AI summary: Through comparative experiments with PPO, DPO, and related methods, the paper proposes a mechanistic taxonomy based on the directions of learned-reward and judge scores, classifying RLHF failure modes as training dynamics that can be localized and anticipated.

Comments: 20 pages, 8 figures; includes code, artifacts, and live demo

Abstract

Reinforcement learning from human feedback (RLHF) makes large-scale post-training possible by replacing an underspecified human objective with learned and scalable proxies. The same substitution creates a structured failure surface: optimization can raise the learned reward while external quality falls, degrade both proxy and judge scores, reveal proxy under-alignment, or produce evaluator-specific disagreement. We present an empirical failure-mode study of a compact RLHF pipeline with proximal policy optimization (PPO), direct preference optimization (DPO), uncertainty-penalized PPO (UP-PPO), reward-model uncertainty, approximate policy drift, diversity and repetition diagnostics, and two external LLM judges. Rather than treating reward hacking as a single terminal event, we classify matched transitions between checkpoints using the directions of the learned reward, judge scores, and average judge score. Across 61 checkpoint rows and 1920 row-level transitions, aggressive PPO has the highest localized reward-hacking rate (14.45%; bootstrap 95% CI: 10.16-18.75), while UP-PPO yields lower rates in the same aggressive regime (11.33-10.94%). A pre-transition logistic model predicts future row-level reward hacking with ROC-AUC 0.821, and row-level analysis finds localized reward hacking that checkpoint averages miss in 3 of 12 settings. The central conclusion is methodological: RLHF failures are not only final-model pathologies, but training dynamics that can be classified, localized, and partially anticipated.
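The direction-based classification described above can be sketched as a small rule table. The abstract names the failure types but not the exact rules, so the label strings, the `eps` dead-band, and the tie-breaking order below are our assumptions, not the paper's definitions:

```python
from typing import Sequence


def classify_transition(d_reward: float, d_judges: Sequence[float], eps: float = 0.0) -> str:
    """Label one checkpoint-to-checkpoint transition from score directions.

    d_reward: change in learned reward; d_judges: change in each external judge score.
    Label names and the eps dead-band are illustrative assumptions.
    """
    if not d_judges:
        raise ValueError("need at least one judge score change")
    ups = [d > eps for d in d_judges]
    downs = [d < -eps for d in d_judges]
    if any(ups) and any(downs):
        return "evaluator_disagreement"   # judges move in opposite directions
    mean_judge = sum(d_judges) / len(d_judges)
    if d_reward > eps and mean_judge < -eps:
        return "reward_hacking"           # proxy rises while judged quality falls
    if d_reward < -eps and mean_judge < -eps:
        return "joint_degradation"        # proxy and judges both fall
    if d_reward < -eps and mean_judge > eps:
        return "proxy_under_alignment"    # judges improve but the proxy misses it
    if d_reward > eps and mean_judge > eps:
        return "aligned_improvement"
    return "no_clear_change"


# Hypothetical row-level transitions: (reward change, [judge A change, judge B change]).
transitions = [(0.3, [-0.2, -0.1]), (0.2, [0.1, 0.3]), (-0.1, [-0.2, -0.4]), (0.4, [0.2, -0.3])]
labels = [classify_transition(r, j) for r, j in transitions]
hacking_rate = labels.count("reward_hacking") / len(labels)
print(labels, hacking_rate)
```

Computing the rate per row rather than per checkpoint average is what lets localized reward hacking surface even when the averaged scores look healthy, which is the effect the abstract reports in 3 of 12 settings.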


2606.03237 2026-06-03 cs.AI cs.CL cs.CY cs.LG cs.MA (updated version)

Solipsistic Superintelligence is Unlikely to be Cooperative

Rakshit S Trivedi, Natasha Jaques, Logan Cross, Alexander Sasha Vezhnevets, Joel Z Leibo

Affiliations: DeepMind; University of Cambridge; University of California, Berkeley

AI summary: The paper argues that superintelligence (an extremely capable task solver) designed under a solipsistic approach is unlikely to be cooperative because it ignores the endogenous non-stationarity induced by deployment, and calls for a non-solipsistic research paradigm that treats interdependence as a core design principle.

Comments: 24 pages, 1 figure, Accepted at Proceedings of the 43rd International Conference on Machine Learning, 2026

Abstract

AI's central challenge is shifting from capability to coexistence. The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback. We contend that superintelligence, an extremely capable task solver, born out of such a solipsistic approach to AI design, is unlikely to be cooperative. Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap where historical distributions diverge from the deployment context. We refer to this as the self-undermining property of unilateral optimization. Closing this gap requires AI that participates in cooperation: the equilibrium-selection process through which multiple actors navigate their interdependence. We call for a non-solipsistic research paradigm that treats this interdependence as a core design principle rather than approaching cooperation as a task to solve. This entails building dynamic evaluation testbeds involving adaptive counterparties, treating institutions as design primitives, and preserving human agency as a structural feature of the systems we build.


2606.03234 2026-06-03 cs.LG (updated version)

Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning

Ziyue Wang, Aomufei Yuan, Yongfu Zhu, Shuai Dong, Wenpu Liu, Yiran Yao, Weichu Xie, Yuqi Xu, Caoyuan Ma, Wenqi Shao, Xiaoying Zhang, Nan Duan, Jiaqi Wang

Affiliations: Peking University; JINGDONG; Shanghai Innovation Institute; The University of Tokyo; Tianjin University

AI summary: Proposes Hidden-Align, an auxiliary loss that aligns the last-layer hidden states of correct rollouts at the anchor token during RL training, improving mathematical reasoning performance.

Comments: 16 pages, 7 figures

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has become the dominant approach for improving mathematical reasoning in large language models, yet current methods reduce each correct rollout to a single reward bit, ignoring the geometric structure shared among their hidden states. Investigating this structure, we find that at the anchor token (the position immediately before the answer marker), correct rollouts converge naturally because they must produce the same answer (cosine similarity ~0.84), yet each retains residual variance from its unique reasoning path. Encouraging full alignment at this point pushes the model to extract a unified "correct decision" representation, reducing sensitivity to which reasoning path was taken. Based on this observation, we propose Hidden-Align, an auxiliary loss function that aligns the last-layer hidden states of correct rollouts at the anchor token during RL training, with zero overhead in both training and inference. On eight mathematical reasoning benchmarks, Hidden-Align improves average pass@1 over the DAPO baseline by 3.8, 6.2, and 5.4 percentage points on Qwen3-1.7B, 4B, and 14B respectively, with consistent pass@k gains across all three scales, supported by ablations on loss type, anchor position, layer depth, and loss weight.
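The abstract says the loss aligns anchor-token hidden states of correct rollouts but leaves the loss type to the paper's ablations. As one plausible instance, the sketch below uses mean pairwise cosine distance. That choice, the function name, and the NumPy setting are our assumptions; a real implementation would live inside an autodiff framework so the loss can be backpropagated:

```python
import numpy as np


def hidden_align_loss(anchor_states: np.ndarray, is_correct: np.ndarray) -> float:
    """Mean pairwise cosine distance among anchor-token states of correct rollouts.

    anchor_states: (num_rollouts, hidden_dim) last-layer states at the anchor token.
    is_correct:    (num_rollouts,) boolean verifier outcomes.
    """
    correct = anchor_states[np.asarray(is_correct, dtype=bool)]
    n = correct.shape[0]
    if n < 2:
        return 0.0                                    # nothing to align
    norms = np.linalg.norm(correct, axis=1, keepdims=True)
    unit = correct / np.maximum(norms, 1e-12)         # avoid division by zero
    cos = unit @ unit.T                               # pairwise cosine similarities
    off_diag_mean = (cos.sum() - np.trace(cos)) / (n * (n - 1))
    return float(1.0 - off_diag_mean)


# Hypothetical group of four rollouts for one prompt; the third one is incorrect.
states = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.2]])
print(hidden_align_loss(states, np.array([True, True, False, True])))
```

Incorrect rollouts are masked out before any similarity is computed, so only verified trajectories are pulled together, matching the "verified hidden states" framing of the title.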


2606.03232 2026-06-03 cs.LG cs.AI (updated version)

GFFMERGE: Efficient Merging of Graph Neural Force Fields and Beyond

Parth Verma, Parv P. Singh, Vipul Garg, Ishita Thakre, N. M. Anoop Krishnan, Sayan Ranu

Affiliations: University of California, Berkeley; Stanford University; University of Cambridge

AI summary: Proposes GFFMERGE, a framework for closed-form model merging in graph neural networks based on the analytical solution of a convex embedding-alignment problem; it recovers performance close to joint training on force-field regression and achieves 5-27x speedups.

Abstract

Graph Neural Networks (GNNs) have revolutionized Neural Force Fields for atomistic simulations, achieving near-quantum accuracy at reduced cost, yet adapting these models to new chemical systems requires expensive retraining of foundation models. Inspired by model merging in vision and language processing, we introduce GFFMERGE, the first principled framework for closed-form model merging in GNNs. We exploit the linear structure of message-passing layers and formulate merging as a convex embedding-alignment problem with an analytical solution. Through the first systematic benchmarking of model merging for GNNs, we show that existing methods designed for vision and language catastrophically fail on force field regression, while GFFMERGE recovers performance approaching gold standard joint training. Across molecular (MD17, MD22), solid-state (LiPS20), and large-scale graph benchmarks, GFFMERGE and GNNMERGE (its generic GNN counterpart) achieve 5-27× speedups while enabling modular composition of specialized models. Remarkably, our closed-form solution alone outperforms all baseline methods before fine-tuning and provides superior initialization for faster, data-efficient convergence.


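2606.03227 2026-06-03 cs.LG (updated version)

Learning Temporal Causal Structure via Smooth Differentiable Optimization

Tong Zhao, Ce Guo, Wayne Luk, Emil Lupu, Ray Dipojjwal

Affiliations: Imperial College London; University of Bristol

AI summary: Proposes learning a differentiable variable ordering with the Gumbel-Sinkhorn operator and triangularizing the instantaneous coefficient matrix of a structural vector autoregressive model, turning acyclicity into a parameterization and enabling unified continuous optimization that improves the efficiency and accuracy of time-series causal discovery.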

Abstract

Causal discovery with instantaneous effects in multivariate time series is challenging, as the instantaneous structure must be acyclic. Prior methods enforce this by either separating instantaneous and lagged estimation into multi-stage pipelines or imposing algebraic acyclicity constraints via complex augmented Lagrangian optimization, both of which incur high computational cost. In this work, we propose a different approach: we learn a differentiable permutation of variables using the Gumbel-Sinkhorn operator and triangularize the instantaneous coefficient matrix of a Structural Vector Autoregressive (SVAR) model in the learned order. This converts acyclicity from a hard constraint into a parameterization and keeps it valid throughout optimization. In doing so, our method enables unified, continuous optimization with gradient-based learning, leading to improved efficiency in time-series causal discovery. Across three real-world benchmarks, our method achieves the best overall performance compared with 12 baselines in both discovery accuracy and efficiency. On the large-scale benchmark, it further demonstrates strong scalability, achieving more than a 6x speedup over competing methods.
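The two ingredients named above can be illustrated in a few lines of NumPy. This is a sketch under our own assumptions (temperature `tau`, iteration count, and the function names are illustrative), not the authors' implementation: a Sinkhorn relaxation that turns a score matrix into an approximately doubly stochastic "soft permutation", and a masked parameterization that is acyclic by construction once the permutation is hard.

```python
import numpy as np


def _logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def gumbel_sinkhorn(scores, tau=1.0, n_iters=50, rng=None):
    """Relax a score matrix into an approximately doubly stochastic matrix."""
    log_p = np.asarray(scores, dtype=float)
    if rng is not None:                               # optional Gumbel(0, 1) noise
        u = rng.uniform(1e-9, 1.0, size=log_p.shape)
        log_p = log_p - np.log(-np.log(u))
    log_p = log_p / tau
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)     # normalise rows
        log_p = log_p - _logsumexp(log_p, axis=0)     # normalise columns
    return np.exp(log_p)


def triangular_instantaneous(coeffs, perm_matrix):
    """Instantaneous matrix whose support is strictly triangular in the permuted order."""
    d = coeffs.shape[0]
    strict_upper = np.triu(np.ones((d, d)), k=1)      # edges only from earlier to later
    return perm_matrix.T @ (coeffs * strict_upper) @ perm_matrix


rng = np.random.default_rng(0)
soft_perm = gumbel_sinkhorn(rng.normal(size=(4, 4)), tau=0.5, rng=rng)
hard_perm = np.eye(4)[[2, 0, 3, 1]]                   # a hand-picked ordering
w = triangular_instantaneous(rng.normal(size=(4, 4)), hard_perm)
print(soft_perm.sum(axis=0))                          # column sums after the last step
print(np.allclose(np.linalg.matrix_power(w, 4), 0))   # nilpotency check for acyclicity
```

Because the strictly triangular mask is applied before the permutation, every value of `coeffs` yields a valid acyclic structure for a hard permutation, so no separate acyclicity penalty is needed during optimization. This is the sense in which the constraint becomes a parameterization.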


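2606.03219 2026-06-03 cs.CL cs.LG (updated version)

Sample-Size Scaling of the African Languages NLI Evaluation

Anuj Tiwari, Oluwapelumi Ogunremu, Terry Oko-odion, Jesujuwon Egbewale, Hannah Nwokocha

Affiliations: Noida Institute of Engineering and Technology; ML Collective

AI summary: Through systematic sample-size scaling experiments on 16 African languages with the AfriXNLI benchmark, the study finds that NLI performance does not improve monotonically with sample size but shows language-sensitive, non-monotonic scaling, indicating that data volume alone does not guarantee stable gains and that language-sensitive datasets and stronger multilingual modelling strategies are needed.

Comments: Accepted at the AfricaNLP Workshop, EACL 2026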

DOI: 10.18653/v1/2026.africanlp-main.22

Abstract

African languages have very little labelled data, and it is unclear whether increasing the quantity of annotated data reliably improves downstream performance. This work is a systematic sample-size scaling study of natural language inference (NLI) on 16 African languages based on the AfriXNLI benchmark. Under controlled conditions, two multilingual transformer models with roughly 0.6B parameters (XLM-R Large fine-tuned on XNLI, and AfroXLM-R Large) are tested on sample sizes between 50 and 500 labelled examples, with results averaged across random subsampling runs. Contrary to the usual expectation that performance increases monotonically with more data, we find strongly language-sensitive and often non-monotonic scaling behavior. Some languages show early saturation or a decrease in performance as sample size grows, as well as high variance in low-resource regimes. These results indicate that data volume alone is not enough to guarantee stable gains for African NLI, motivating language-sensitive dataset creation and stronger multilingual modelling strategies.


2606.03214 2026-06-03 cs.AI cs.CV cs.CY cs.LG (updated version)

Effect of Demographic Bias on Skin Lesion Classification

Ralf Raumanns, Gerard Schouten, Veronika Cheplygina, Josien P. W. Pluim

Affiliations: Fontys University of Applied Science, Venlo, The Netherlands; Fontys University of Applied Science, Eindhoven, The Netherlands; Eindhoven University of Technology, Eindhoven, The Netherlands; IT University of Copenhagen, Denmark

AI summary: Using ResNet-based convolutional models, the study evaluates skin lesion classification with demographic characteristics controlled through linear programming, examines the effect of patient sex and age bias, and compares three learning strategies; sex bias stems mainly from data imbalance, while age bias consistently favours younger groups.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA), 26 pages, 12 figures

DOI: 10.59275/j.melba.2026-4156
Journal ref: https://melba-journal.org/2026:011

Abstract

In this study, we evaluate the performance of skin lesion classification using ResNet-based convolutional models, focusing on the impact of demographic bias in training data, particularly variations in patient sex and age. We use linear programming to generate datasets with controlled demographic characteristics, allowing systematic investigation of bias effects. Three learning strategies are evaluated: a single-task model, a reinforcing multi-task model, and an adversarial learning scheme. Our sex-based analysis indicates that sex-specific training datasets optimise model performance. Notably, including male patients in the training data improved performance for the male subgroup, even in female-majority cases. Reinforcing and adversarial learning schemes narrowed or eliminated bias gaps in balanced and female-majority datasets. However, these strategies proved less effective in male-majority settings, where models continued to perform better for males than females. The two learning schemes showed marginal bias reduction compared to the baseline model in predominantly male patient populations. Age-based analysis demonstrates comparable baseline performance across the three model approaches, with performance declining across age categories. Younger groups consistently achieve the highest performance, regardless of training data distribution. Although balanced training yields optimal results for the youngest age category, performance decreases in older categories. We find that sex biases arise mainly from data imbalances, while age biases consistently favour younger groups regardless of distribution. These distinct mechanisms require targeted mitigation strategies. Additionally, cross-dataset validation on two external datasets revealed that domain shifts notably affect performance and patterns of demographic bias.


2606.03210 2026-06-03 cs.CE cs.LG cs.NA math.NA (updated version)

Critical evaluation of PINN for FWD inverse analysis and differentiable FEM as an alternative

Yongjin Choi, Hyeonbin Moon, Seunghwa Ryu

Affiliations: KAIST

AI summary: Critically evaluates physics-informed neural networks (PINNs) for falling weight deflectometer (FWD) inverse analysis of multilayer pavement systems, and proposes the differentiable finite element method (DiffFEM) as a more accurate, stable, and efficient alternative.

Abstract

Automatic-differentiation-based inverse analysis methods, including physics-informed neural networks (PINNs) and differentiable programming, have recently shown great promise due to their ability to compute accurate gradients and convergence efficiency. However, their applicability to falling weight deflectometer (FWD) backcalculation remains unexplored. This study critically evaluates PINN-based inverse analysis for a multilayer pavement system and investigates differentiable finite element method (DiffFEM) as an alternative based on a synthetic benchmark. The standard PINN does not recover layer moduli because of the sharp domain discontinuities inherent to layered pavement systems. Although we use an extended PINN with domain decomposition (XPINN), which shows better performance on discontinuous domains, its performance remains highly sensitive to loss weighting and network architecture, and degrades under measurement noise. By contrast, DiffFEM consistently achieves more accurate, stable, and computationally efficient inversion results. These results indicate that DiffFEM, which enforces the governing physics as a hard constraint, yields better accuracy, robustness, and computational efficiency than PINN-based approaches, in which the governing physics is imposed as a soft constraint through the loss function. More broadly, the findings suggest that the choice between PINN- and DiffFEM-based inverse analysis needs careful consideration, with DiffFEM offering practical advantages when an efficient and robust differentiable forward solver is available.


2606.03209 2026-06-03 cs.LG (updated version)

DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data

Yunsheng Yuan, Shaowei Li, Kai Wang, Zhongyuan Sun, Zheng Zhang, Kai Han, Jun Luo, Feng Li

Affiliations: School of Computer Science and Technology, Shandong University, Qingdao, China; School of Mathematical Science, Peking University, China; IEIT SYSTEM, China; School of Computer Science and Artificial Intelligence, Shanghai University of Finance and Economics, Shanghai, China; College of Computing and Data Science, Nanyang Technological University, Singapore

AI summary: For LLM fine-tuning in privacy-sensitive and resource-constrained settings, proposes DECA, a framework that combines block-wise Adam optimization with a decentralized consensus mechanism to achieve efficient full-parameter fine-tuning on non-IID data, balancing convergence speed, downstream performance, and resource efficiency.

Abstract

Fine-tuning large language models (LLMs) in privacy-sensitive and resource-constrained environments remains challenging. Since training data are often distributed across multiple clients, decentralized fine-tuning offers a natural paradigm for collaborative adaptation without a central server. However, enabling full-parameter fine-tuning (FPFT) in this decentralized setting is difficult: FPFT provides strong adaptation capacity but incurs prohibitive resource consumption for billion-scale models. Existing decentralized LLM fine-tuning methods therefore mainly rely on parameter-efficient updates, which improve efficiency but may restrict downstream performance. Moreover, client data are typically non-IID, making decentralized optimization more vulnerable to client drift and unstable convergence. To address these challenges, we propose DECA, a resource-efficient decentralized FPFT framework for LLMs on non-IID data. DECA partitions model parameters into disjoint blocks and performs sequential block-wise Adam optimization, reducing resource consumption while preserving decentralized full-parameter adaptation. To stabilize training, DECA further introduces first- and second-order block-wise moment estimates with fresh local gradient statistics and consensus-derived discrepancy signals. We provide rigorous theoretical analysis and extensive experiments, showing that DECA achieves fast convergence, strong downstream performance, and significant resource efficiency.


2606.03199 2026-06-03 cs.LG physics.chem-ph (updated version)

Fast Organic Crystal Structure Prediction with Unit Cell Flow Matching

Alston Lo, Luka Mucko, Austin H. Cheng, Andy Cai, Alastair J. A. Price, Wojciech Matusik, Alán Aspuru-Guzik

Affiliations: MIT CSAIL; University of Zagreb; University of Toronto; Vector Institute for Artificial Intelligence; Acceleration Consortium; Canadian Institute for Advanced Research; NVIDIA

AI summary: Proposes Clari, a flow matching model that generates redundancy-free unit cells and predicts organic crystal structures in seconds, a 15-30x speedup.

Abstract

Organic crystal structure prediction (CSP) is a requirement for computational modelling of organic solids, but traditionally costs several CPU-years per molecule. Generative models such as OXtal dramatically reduce this cost by sampling stable organic crystal structures directly. However, OXtal forgoes explicit lattice parametrization in favour of modelling large crops of the bulk material with expensive triangle layers, which can incur a computational cost of minutes per molecule. In this paper, we reduce this to seconds with Clari, a large-scale flow matching model that generates redundancy-free unit cells and replaces triangle layers with pure pair-bias attention. Clari requires only atom types and bonds as input and does not need an RDKit-sanitizable input molecule, which expands its applicability to challenging chemistries such as fullerenes, metal complexes, and atom clusters. We further ablate key design choices such as auxiliary losses, timestep distributions, noise priors, and self-conditioning. On OXtal's test sets, we surpass OXtal's solve rate while obtaining a speedup of 15-30×. Because Clari also models explicit hydrogens, it supports inference-time scaling via direct energy ranking, without any decoration or relaxation step. When generating 150 crystals and selecting the top-30 by energy, we further improve solve rate while maintaining a speedup of 5-8×. We also introduce the CSD Teaching Subset as a new test split of diverse and complex molecules for future benchmarking. Our contributions enable CSP within seconds, making large-scale virtual screening of organic solids practical. Code is available at https://github.com/aspuru-guzik-group/clari.


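2606.03180 2026-06-03 cs.CV cs.CL cs.LG (updated version)

Visibility of Silent Manipulation Failures: An Observability Study of False-Success Detection in Simulated Robot Tasks

Aarav Bedi

Affiliations: Aarav Bedi

AI summary: Using simulated bimanual ALOHA tasks, the study examines how recoverable false successes are, that is, episodes that actually failed but were flagged as successful by the robot's own success check. A joint-data detector recovers them almost fully in cube transfer but only partly in peg insertion, where a vision detector closes most of the gap, and the measured separability rests on velocity differences far below realistic sensor noise.

Comments: 4 pages, 3 figures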

Abstract

Imitation-learning policies for robot manipulation inherit the quality of the success labels attached to their training episodes, and those labels are usually produced by the robot's own success check. A particularly damaging error is the false success: an episode the robot logs as a success when the task outcome was actually wrong. We ask a narrow but practical question about these episodes. Once an episode has already been flagged as a success, how much of the information needed to overturn that label is present in proprioception, and how much requires vision? We build a simulated testbed on two bimanual ALOHA tasks, induce failures through environment perturbations rather than label edits, label every episode by privileged simulator state that the detector never sees, and keep only episodes the robot flagged as successful. We then compare detectors restricted to proprioception against a vision-based detector. We find that recoverability spans a wide range: in cube transfer the false successes are almost fully recoverable from joint data alone, while in peg insertion proprioception recovers only part of them and a vision detector closes most of the gap. We also show that the proprioceptive separability we measure rests on velocity differences far below any realistic sensor noise floor, so it is best read as an optimistic upper bound that a noiseless simulator inflates. We release the generation and evaluation pipeline.

2606.03131 2026-06-03 cs.LG 版本更新

HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models

HARVE：面向鲁棒奖励模型的感知黑客奖励头向量编辑

Shuang Liu, Yuxuan Bo, Qiuyang Zhao, Caiyue Huang, Xiaorong Chen, Yanguang Liu, Mengnan Du

Affiliations: Carnegie Mellon University; University of Virginia; Harvard University; Stanford University; University of Michigan; New Jersey Institute of Technology; The Chinese University of Hong Kong, Shenzhen

AI summary: To address reward models' vulnerability to reward hacking, the paper proposes HARVE, a training-free reward-head editing method that removes the component of the reward-head vector aligned with a hacking-related subspace, improving robustness while preserving general capability.

AI中文摘要

奖励模型对于大型语言模型（LLM）对齐至关重要，但它们仍然容易受到奖励黑客攻击。为了评估奖励模型的鲁棒性，我们引入了RewardHackBench，其中包含13种奖励黑客模式，涵盖现实生活中的高风险领域和通用设置，并且我们发现八个奖励模型在特定子类别上存在严重失败。为了缓解这些失败，我们提出了HARVE，一种针对标量奖励模型的无需训练的奖励头编辑方法。HARVE不是微调奖励模型，而是从与选定黑客子类别相关的残差流方向中识别出多方向黑客子空间，并移除与该子空间对齐的奖励头向量分量。这直接降低了奖励头对黑客相关特征的敏感性，仅使用少量对比性的黄金-黑客示例，无需梯度更新或微调。在八个奖励模型上的综合实验表明，该方法提高了黑客鲁棒性，优于微调基线，并保持了奖励模型的通用能力。进一步的分析表明，奖励黑客攻击更适合被捕捉为多维残差空间结构，而不是孤立的表面线索。

英文摘要

Reward models are central to large language model (LLM) alignment, but they remain vulnerable to reward hacking. To evaluate reward-model robustness, we introduce RewardHackBench, which contains 13 reward-hacking patterns covering real-life high-stakes domains and general settings, and we find severe failures on specific subcategories across eight reward models. To mitigate these failures, we propose HARVE, a training-free reward-head editing method for scalar reward models. Instead of fine-tuning the reward model, HARVE identifies a multi-directional hacking subspace from residual stream directions associated with selected hacking subcategories, and removes the component of the reward-head vector aligned with that subspace. This directly reduces the reward head's sensitivity to hacking-related features using only a small set of contrastive gold-hacked examples, without gradient updates or fine-tuning. Comprehensive experiments across eight reward models indicate that HARVE improves hacking robustness, outperforms fine-tuning baselines, and preserves the reward models' general capability. Further analyses suggest that reward hacking is better captured as a multidimensional residual-space structure than by isolated surface cues.
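The reward-head edit described in this abstract can be illustrated with a small linear-algebra sketch. It assumes a scalar reward of the form r = w · h, where w is the reward-head vector and h a residual-stream activation, and it assumes the hacking subspace is already given as a handful of direction vectors. The function names and the toy vectors are hypothetical, and how HARVE actually extracts its directions is not stated in the abstract; only the projection step is shown.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(directions, tol=1e-12):
    """Gram-Schmidt: turn possibly correlated directions into an orthonormal basis."""
    basis = []
    for d in directions:
        v = list(d)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip directions already covered by the basis
            basis.append([vi / norm for vi in v])
    return basis

def edit_reward_head(w, hacking_directions):
    """Remove the component of w that lies in the span of the given directions."""
    w_edit = list(w)
    for b in orthonormal_basis(hacking_directions):
        c = dot(w_edit, b)
        w_edit = [wi - c * bi for wi, bi in zip(w_edit, b)]
    return w_edit

# Toy example (made-up numbers): a 3-d reward head and a 2-d hacking subspace.
w = [1.0, 2.0, 3.0]
dirs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
w_edit = edit_reward_head(w, dirs)
print(w_edit)
```

After the edit, the reward no longer changes when an activation moves along any of the listed directions, which is the sense in which the head becomes insensitive to those features. No gradients are involved; the edit is a single projection.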

2606.03130 2026-06-03 cs.LG 版本更新

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

合成幻觉，真实收益：来自前沿模型的硬负样本用于FIM幻觉缓解

Mahdi Erfanian, Nelson Daniel Troncoso, Aashna Garg, Amabel Gale, Xiaoyu Liu, Pareesa Ameneh Golnari, Shengyu Fu

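Affiliations: University of Illinois Chicago; Microsoft

AI summary: To mitigate hallucinated Fill-in-the-Middle (FIM) completions from small open-source code models in IDE autocomplete, the paper proposes an execution-free alternative: frontier code models synthesize plausible-but-wrong completions as hard negatives, and the contrast between these synthetic hallucinations and real developer edits serves as a supervised fine-tuning signal, lifting exact match on the Delulu benchmark by 18.8 points.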

AI中文摘要

驱动IDE自动补全的小型开源代码模型仍然会输出幻觉的填充中间（FIM）补全：对项目中不存在的方法、参数、变量和导入的语法上自然的调用。现有的缓解方法要么需要每种语言的执行沙箱（在按键中途不适用），要么需要偏好优化管道（需要大量人工标注语料库）。我们提出一种无需执行的替代方案：使用前沿代码模型合成看似合理但错误的补全作为硬负样本，然后利用这些合成幻觉与真实开发者编辑之间的对比作为监督微调信号。我们的管道从公共GitHub中跨八种语言抓取多语言FIM上下文，并让一组三个前沿生成器为每个上下文针对Delulu分类法（一个经Docker验证的多语言FIM幻觉基准）中的四种幻觉类型各生成一个硬负样本，从而产生配对的选定/拒绝数据集。在10万行精选子集上微调Qwen2.5-Coder-7B-Instruct，使Delulu精确匹配提升+18.8点，编辑相似度提升+0.22，覆盖每种语言和每种类型，同时改进每个HumanEval-Infilling分割和每个SAFIM子集。同样的配方在3B模型上使Delulu提升+12.8 EM，并带有小的、特征化的一般FIM权衡。五轴消融实验（规模、类型混合、语言覆盖、基础模型家族和难度感知的愚弄率）加上头对头的SFT与DPO/ORPO比较，映射了哪些设计选择驱动了收益。我们发布完整的管道源代码——生成、愚弄率LLM评判、筛选和FIM微调配方——以便本文中的实验可以在任何许可语料库上端到端复现。

英文摘要

Small open-source code models that power IDE autocomplete still emit hallucinated Fill-in-the-Middle (FIM) completions: syntactically natural calls to methods, parameters, variables, and imports that do not exist in the surrounding project. Existing mitigations either require per-language execution sandboxes that do not apply at mid-keystroke or preference-optimisation pipelines that need large human-labelled corpora. We propose an execution-free alternative: use frontier code models to synthesise plausible-but-wrong completions as hard negatives, then leverage the contrast between these synthetic hallucinations and the ground-truth developer edit as a supervised fine-tuning signal. Our pipeline scrapes multilingual FIM contexts from public GitHub across eight languages and asks a panel of three frontier generators to produce one hard negative per context for each of four hallucination types drawn from the Delulu taxonomy, a Docker-verified multilingual FIM hallucination benchmark, yielding a paired chosen/rejected dataset. Fine-tuning Qwen2.5-Coder-7B-Instruct on a 100K-row curated subset lifts Delulu exact match by +18.8 points and edit similarity by +0.22 on every language and every type, while also improving every HumanEval-Infilling split and every SAFIM subset. The same recipe at 3B lifts Delulu by +12.8 EM with a small, characterised general-FIM trade-off. Five-axis ablations (size, type mix, language coverage, base-model family, and a difficulty-aware fool rate) plus a head-to-head SFT vs. DPO/ORPO comparison map which design choices drive the gain. We release the full pipeline source code -- generation, fool-rate LLM judging, curation, and the FIM fine-tuning recipe -- so that the experiments in this paper can be reproduced end-to-end on any permissively licensed corpus.

2606.03128 2026-06-03 cs.CR cs.AI cs.CL cs.LG 版本更新

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

解耦式智能合约审计：通过蒸馏与聚合的轻量级LLM框架

Bagus Rakadyanto Oktavianto Putra, Muhamad Risqi Utama Saputra, Widyawan, Guntur Dharma Putra

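Affiliations: University of Indonesia

AI summary: Proposes a decoupled smart contract audit framework built on lightweight open-source LLMs (0.6B-4B parameters) that uses rsLoRA, knowledge distillation, and a Chain-of-Verification aggregation strategy, reaching 98.25% accuracy in vulnerability detection and outperforming 7B-34B parameter models.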

Comments 12 pages, 4 figures, 5 tables. Accepted to IEEE ICWS 2026

AI中文摘要

智能合约面临关键安全挑战，需要在去中心化网络服务中进行彻底审计。虽然大型语言模型（LLMs）在自动漏洞检测中展现出潜力，但现有方法缺乏严重性评估和可操作的修复建议，且计算开销过大。在本研究中，我们引入了一个高效的端到端智能合约安全审计框架，利用轻量级、高度优化的开源LLMs（0.6B-4B参数）。我们的框架将综合审计任务解耦为四个相互关联的组件：漏洞检测、解释、严重性分类和修复建议。为了在无需庞大参数量的情况下保持高准确性，我们实现了秩稳定低秩适配器（rsLoRA）、知识蒸馏以及自定义链式验证（CoVe）聚合策略，系统性地筛选并整合模型生成的多个草稿响应，形成高准确度的审计报告。实验结果表明，我们的轻量级流水线持续优于最先进的开源代码密集LLMs（7B至34B参数），在漏洞检测中达到98.25%的准确率，在生成解释任务中达到0.4375的对齐分数。此外，我们广泛的消融研究实证验证了我们的解耦审计过程相对于统一提示的优越性，并揭示了一种新颖的严重性中心性偏差，为未来LLM辅助审计研究建立了关键基准。

英文摘要

Smart contracts face critical security challenges that require thorough auditing in decentralized web services. While Large Language Models (LLMs) have shown promise in automated vulnerability detection, existing approaches lack severity evaluations with actionable remediation and demand unnecessarily massive computational overhead. In this study, we introduce an efficient end-to-end smart contract security audit framework utilizing lightweight, highly optimized open-source LLMs (0.6B-4B parameters). Our framework decouples comprehensive audit tasks into four interconnected components: vulnerability detection, explanation, severity classification, and remediation recommendation. To maintain high accuracy without massive parameters, we implement Rank-Stabilized Low-Rank Adapters (rsLoRA), knowledge distillation, and a custom Chain-of-Verification (CoVe) aggregation strategy to systematically screen and consolidate multiple draft responses from the model into a highly accurate audit report. Experimental results demonstrate that our lightweight pipeline consistently outperforms state-of-the-art open-source coder dense LLMs (7B to 34B parameters), achieving 98.25% accuracy in vulnerability detection and an alignment score of 0.4375 in generative explanation tasks. Furthermore, our extensive ablation studies empirically validate the superiority of our decoupled audit processes over unified prompting and uncover a novel severity centrality bias, establishing a critical benchmark for future research in LLM-assisted auditing.
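As background for the rsLoRA component named in this abstract, the sketch below contrasts the conventional LoRA scaling factor, alpha / r, with the rank-stabilized factor, alpha / sqrt(r), and applies one of them to a toy low-rank update W0 + scale · (B A). The matrices, alpha, and rank values are made up for illustration; the abstract does not report the hyperparameters the authors used.

```python
import math

def lora_scale(alpha, rank):
    """Conventional LoRA scaling: alpha / r."""
    return alpha / rank

def rslora_scale(alpha, rank):
    """Rank-stabilized scaling: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)

def adapted_weight(w0, b, a, scale):
    """Return W0 + scale * (B @ A) for small list-of-lists matrices.

    b has shape (d_out, r) and a has shape (r, d_in).
    """
    rank = len(a)
    return [
        [w0[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(rank))
         for j in range(len(w0[0]))]
        for i in range(len(w0))
    ]

# With conventional scaling the update shrinks quickly as the rank grows;
# with rank-stabilized scaling it shrinks only with the square root of the rank.
for rank in (4, 16, 64):
    print(rank, lora_scale(16, rank), rslora_scale(16, rank))

# Toy rank-1 adapter on a 2x2 identity weight (illustrative values only).
w0 = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [0.0]]
a = [[0.0, 2.0]]
w_new = adapted_weight(w0, b, a, rslora_scale(alpha=2.0, rank=1))
print(w_new)
```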

2606.03125 2026-06-03 cs.LG 版本更新

Rethinking Neural Width for Alternating Current Optimal Power Flow Proxies

重新思考用于交流最优潮流代理的神经网络宽度

Dhruvi Khandelwal, Anurag Basistha, Ayushi Jolotia, Parikshit Pareek

Affiliations: Department of Electrical Engineering, National Institute of Technology Kurukshetra, India; Indraprastha Institute of Information Technology Delhi, India; Department of Electrical Engineering, Indian Institute of Technology Roorkee, India

AI summary: Proposes a loss-guided neural densification algorithm that grows network capacity step by step to minimize width while accurately approximating the AC optimal power flow manifold, matching baseline performance on several IEEE systems with ten times fewer neurons.

2606.03121 2026-06-03 cs.LG 版本更新

TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching

TiWeaver：通过上下文补丁实现统一的时间动态建模

Zhe Li, Jindong Tian, Hao Miao, Zhi Lei, Chenjuan Guo, Bin Yang

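Affiliations: East China Normal University; Hong Kong Polytechnic University; Aalborg University

AI summary: Irregularities in multivariate time series, such as missing values and non-uniform sampling, produce complex dynamics and asynchronous inter-channel dependencies. TiWeaver models them adaptively with a Graph-Guided Adaptive Tokenizer (G²AT) and a Fine-grained Asynchronous Dependency Extractor (FADE), achieving gains of up to 25% across 12 datasets.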

DOI: 10.1145/3770855.3817748

AI中文摘要

多变量时间序列预测在现实世界应用中扮演着关键角色，包括天气预报、股票分析和健康监测。由于数据源的多样性，时间序列表现出多样的时间动态，通常伴随着各种不规则性，如缺失值和非均匀采样频率。这些不规则性导致跨通道的复杂异步时间依赖。因此，具有固定补丁方案的单一模型往往难以很好地适应多样化的多变量时间序列，阻碍了准确预测。在本文中，我们提出了TiWeaver，一个统一框架，旨在自适应地处理时间动态和细粒度的通道间依赖。具体来说，我们引入了一个图引导自适应分词器（G²AT），通过联合考虑时间密度和表示一致性，将时间序列划分为高度上下文连贯的补丁。此外，我们提出了一个细粒度异步依赖提取器（FADE），旨在建模细粒度的异步通道间依赖，同时结合长期历史依赖。我们在12个真实世界时间序列数据集上评估了TiWeaver，它取得了最先进的性能，优于现有方法高达25%。这些结果证明了其在多样化领域和数据特征上的鲁棒性和有效性。

英文摘要

Multivariate time series forecasting plays a critical role in real-world applications, including weather prediction, stock analysis, and health monitoring. Due to the diversity of data sources, time series exhibit diverse temporal dynamics, often accompanied by various irregularities such as missing values and non-uniform sampling frequencies. Such irregularities lead to complex and asynchronous temporal dependencies across channels. Thus, a single model with a fixed patching scheme often fails to adapt well to diverse multivariate time series, hindering accurate forecasting. In this paper, we propose TiWeaver, a unified framework designed to handle temporal dynamics and fine-grained inter-channel dependencies adaptively. Specifically, we introduce a Graph-Guided Adaptive Tokenizer (G$^2$AT) that divides time series into highly contextually coherent patches by jointly considering temporal density and representation consistency. In addition, we propose a Fine-grained Asynchronous Dependency Extractor (FADE), which is designed to model fine-grained asynchronous inter-channel dependencies while incorporating long-term historical dependencies. We evaluate TiWeaver on 12 real-world time series datasets, where it achieves state-of-the-art performance, outperforming existing methods by up to 25%. These results demonstrate its robustness and effectiveness across diverse domains and data characteristics.

2606.03119 2026-06-03 cs.CV cs.AI cs.LG 版本更新

GuidedBridge: Training-freely Improving Bridge Models with Prior Guidance

GuidedBridge: 无需训练地利用先验引导改进桥接模型

Zehua Chen, Yucheng Yang, Binjie Yuan, Kaiwen Zheng, Jun S. Liu, Jun Zhu

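Affiliations: University of Science and Technology of China

AI summary: Proposes training-free Prior Guidance (PG) and frequency-modulated prior guidance (FMPG), which strengthen prior exploitation in bridge models by contrasting a weak prior with the seen prior, and designs a cascaded framework, CFG-FMPG, for image in-painting. Experiments show consistent improvements to pre-trained bridge models across image translation tasks.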

Comments ICML 2026

AI中文摘要

引导方法，如无分类器引导（CFG）和自动引导（AG），推动了扩散模型中噪声到数据生成的发展。最近，桥接模型引入了一种数据到数据的生成过程，可以利用有指导性的干净先验。在这项工作中，受先前通过去噪结果质量差异作为引导的方法启发，我们提出了一种无需训练的桥接引导方法，称为先验引导（PG）。具体来说，我们引入一个弱先验，该先验在桥接预训练期间未见，阻碍先验利用从而降低去噪结果。然后，我们将其与已见先验对比，通过缩放因子突出并增强先验利用。此外，我们分析了桥接过程中先验利用的潜在机制，并设计了频率调制先验引导（FMPG），该引导将引导尺度调整到与桥接生成动力学一致的低频和高频带。为了解决图像修复中的先验利用问题，我们开发了一个级联框架CFG-FMPG，该框架首先通过CFG生成噪声隐藏表示，然后将其作为生成先验与FMPG一起利用，在不影响推理效率的情况下发挥它们的互补优势。实验表明，我们的PG方法在多种图像翻译任务中一致地改进了预训练桥接模型。

英文摘要

Guidance methods, such as classifier-free guidance (CFG) and auto-guidance (AG), have advanced noise-to-data generation in diffusion models. Recently, bridge models have introduced a data-to-data generative process that can exploit an instructive clean prior. In this work, inspired by previous methods that use a quality difference between denoising results as guidance, we propose a training-free bridge guidance method, termed Prior Guidance (PG). Specifically, we introduce a weak prior, which is unseen during bridge pre-training, hindering prior exploitation and thereby degrading the denoising result. Then, we contrast it with the seen prior to highlight and enhance prior exploitation via a scaling factor. Moreover, we analyze the underlying mechanism of prior exploitation in the bridge process and design frequency-modulated prior guidance (FMPG), which tailors the guidance scale to low- and high-frequency bands coherent with bridge generative dynamics. To address prior exploitation in image in-painting, we develop a cascaded framework, CFG-FMPG, which first generates a noisy hidden representation via CFG and then exploits it as a generative prior with FMPG, fulfilling their complementary strengths without compromising inference efficiency. Experiments demonstrate that our PG methods consistently improve pre-trained bridge models across diverse image translation tasks.
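The contrast between a seen prior and a weak prior can be sketched by analogy with CFG-style extrapolation. The sketch below is an assumption about the general shape of such guidance, not the paper's formula: it takes two denoising predictions as plain 1-D lists, extrapolates from the weak-prior prediction toward the seen-prior prediction, and, for the frequency-modulated variant, splits their difference into a low band (a moving average) and a high band (the residual) with separate scales. All function names, the moving-average filter, and the numbers are hypothetical.

```python
def prior_guidance(pred_seen, pred_weak, scale):
    """Extrapolate from the weak-prior prediction toward the seen-prior one."""
    return [w + scale * (s - w) for s, w in zip(pred_seen, pred_weak)]

def moving_average(x, k=3):
    """Simple low-pass filter; the window is truncated at the edges."""
    half = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def frequency_modulated_guidance(pred_seen, pred_weak, scale_low, scale_high, k=3):
    """Apply separate guidance scales to low- and high-frequency parts of the contrast."""
    diff = [s - w for s, w in zip(pred_seen, pred_weak)]
    low = moving_average(diff, k)
    high = [d - l for d, l in zip(diff, low)]
    return [w + scale_low * l + scale_high * h
            for w, l, h in zip(pred_weak, low, high)]

# Made-up 1-D "predictions" standing in for denoising outputs.
pred_seen = [0.0, 1.0, 0.0, 1.0, 0.0]
pred_weak = [0.2, 0.6, 0.2, 0.6, 0.2]
pg = prior_guidance(pred_seen, pred_weak, scale=1.5)
fmpg = frequency_modulated_guidance(pred_seen, pred_weak, scale_low=1.0, scale_high=2.0)
print(pg)
print(fmpg)
```

In this formulation a scale of 1 returns the seen-prior prediction unchanged, and equal low and high scales reduce the frequency-modulated variant to plain prior guidance, which makes the two easy to compare.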

2606.03118 2026-06-03 cs.LG cs.CV q-bio.NC 版本更新

Learning to See via Epiretinal Implant Stimulation in silico with Model-Based Deep Reinforcement Learning

通过基于模型的深度强化学习在硅上学习经由视网膜上植入物刺激的视觉

Jacob Lavoie, Marwan Besrour, William Lemaire, Jean Rouat, Réjean Fontaine, Eric Plourde

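Affiliations: Department of Electrical Engineering and Computer Engineering, Université de Sherbrooke

AI summary: This study proposes using isotropic and anisotropic shapes, combined through deep reinforcement learning, to render intelligible images on a virtual patient's retina, with the aim of improving the acuity of artificially restored vision.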

Comments 18 pages, 6 figures. Published version: Biomed. Phys. Eng. Express 10, 025006 (2024)

DOI: 10.1088/2057-1976/acf1a5
Journal ref: Biomed. Phys. Eng. Express 10 (2024) 025006

AI中文摘要

目标：年龄相关性黄斑变性和视网膜色素变性等疾病会导致感光层退化。恢复视力的一种方法是通过微电极阵列（如视网膜上植入物）电刺激存活的视网膜神经节细胞。已知视网膜上植入物会产生沿邻近视网膜神经节细胞轴突束延伸的可见各向异性形状。最近的研究表明，为了获得各向同性的像素状形状，可以通过失活电极或降低刺激电流水平来映射轴突束并避免刺激它们。避免轴突束刺激旨在去除类似笔触的形状，转而采用更简化的像素状形状集合。方法：在本研究中，我们提出使用各向同性和各向异性形状，在名为rlretina的强化学习环境中为虚拟患者的视网膜渲染可理解的图像。该环境将任务形式化为在基于笔触的渲染任务中使用笔触。主要结果：我们训练了一个深度强化学习智能体，它学会组合各向同性和各向异性形状以形成图像。我们研究了哪种基于误差或基于感知的指标适合奖励智能体。该智能体以基于模型的数据生成方式训练，使用经过心理物理学验证的轴突映射模型来渲染不同虚拟患者感知到的图像。我们表明，与不同虚拟患者中的朴素方法相比，该智能体可以生成更可理解的图像。意义：这项工作提供了一种解决视网膜上刺激的新方法，这是朝着使用各向异性光幻视改善人工恢复视力中视觉敏锐度的第一步。

英文摘要

Objective: Diseases such as age-related macular degeneration and retinitis pigmentosa cause the degradation of the photoreceptor layer. One approach to restore vision is to electrically stimulate the surviving retinal ganglion cells with a microelectrode array such as an epiretinal implant. Epiretinal implants are known to generate visible anisotropic shapes elongated along the axon fascicles of neighboring retinal ganglion cells. Recent work has demonstrated that to obtain isotropic pixel-like shapes, it is possible to map axon fascicles and avoid stimulating them by inactivating electrodes or lowering stimulation current levels. Avoiding axon fascicle stimulation aims to remove brushstroke-like shapes in favor of a more reduced set of pixel-like shapes. Approach: In this study, we propose the use of isotropic and anisotropic shapes to render intelligible images on the retina of a virtual patient in a reinforcement learning environment named rlretina. The environment formalizes the task as using brushstrokes in a stroke-based rendering task. Main Results: We train a deep reinforcement learning agent that learns to assemble isotropic and anisotropic shapes to form an image. We investigate which error-based or perception-based metric is adequate to reward the agent. The agent is trained in a model-based data generation fashion using the psychophysically validated axon map model to render images as perceived by different virtual patients. We show that the agent can generate more intelligible images compared to the naive method in different virtual patients. Significance: This work shares a new way to address epiretinal stimulation that constitutes a first step towards improving visual acuity in artificially-restored vision using anisotropic phosphenes.

2606.03094 2026-06-03 cs.LG 版本更新

FGRPO: Federated GRPO with Adaptive Aggregation on Non-IID Data

FGRPO：非独立同分布数据上具有自适应聚合的联邦GRPO

Pengyu Chen, Shaowei Li, Kai Wang, Yunsheng Yuan, Kai Han, Jun Luo, Feng Li

Affiliations: School of Computer Science and Technology, Shandong University; School of Mathematical Science, Peking University; School of Computer Science and Artificial Intelligence, Shanghai University of Finance and Economics; College of Computing and Data Science, Nanyang Technological University

AI summary: Proposes federated GRPO (FGRPO), a framework that fine-tunes reasoning models in a decentralized way on non-IID data using an adaptive aggregation mechanism based on relative performance gain, balancing data privacy with robust convergence.

AI中文摘要

语言模型的最新进展已将强化学习确立为引发自我纠正和长链推理的主要范式。虽然群体相对策略优化（GRPO）通过消除评论家网络提供了卓越的可扩展性，但将其部署在中央基础设施上需要从分布式所有者收集大量数据，这带来了显著的隐私风险。为了解决这些问题，我们引入了联邦GRPO（FGRPO），这是一个旨在跨异构数据所有者去中心化推理模型微调的框架。为了有效缓解异构任务间奖励尺度差异引起的不稳定性，FGRPO结合了一种基于相对性能增益的自适应聚合机制。通过刻画每个客户端相对于其个性化历史基线的改进，该框架动态地优先考虑有效的学习轨迹，而无需考虑局部任务的难度。FGRPO在非独立同分布数据上确保鲁棒收敛，同时保护数据隐私。

英文摘要

Recent advances in language models have established reinforcement learning as the primary paradigm for eliciting self-correction and long-chain reasoning. While group relative policy optimization (GRPO) offers superior scalability by eliminating the critic network, deploying it on a central infrastructure entails collecting a large volume of data from distributed owners, which poses significant privacy risks. To address these concerns, we introduce federated GRPO (FGRPO), a framework designed to decentralize the fine-tuning of reasoning models across heterogeneous data owners. To effectively mitigate the instability caused by divergent reward scales across heterogeneous tasks, FGRPO incorporates an adaptive aggregation mechanism based on relative performance gain. By characterizing each client's improvement relative to its personalized historical baseline, the framework dynamically prioritizes effective learning trajectories regardless of local task difficulty. FGRPO ensures robust convergence on non-IID data while preserving data privacy.
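Two ingredients of this abstract can be sketched in a few lines: the critic-free, group-relative advantage that gives GRPO its name, and an aggregation rule that weights clients by improvement relative to their own history instead of by raw reward. The advantage normalization (reward minus group mean, divided by group standard deviation) is the commonly described GRPO form. The aggregation part is an assumption for illustration only: the relative-gain formula, the softmax weighting, the temperature, and all numbers are hypothetical and are not taken from the paper.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its own group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def aggregation_weights(current, baseline, temperature=1.0, eps=1e-8):
    """Softmax over each client's gain relative to its own historical baseline."""
    gains = [(c - b) / (abs(b) + eps) for c, b in zip(current, baseline)]
    top = max(gains)
    exps = [math.exp((g - top) / temperature) for g in gains]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(updates, weights):
    """Weighted average of per-client parameter updates (flat lists)."""
    return [sum(w * u[j] for w, u in zip(weights, updates))
            for j in range(len(updates[0]))]

advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Client 0 works on a task with a large reward scale, client 1 on a small one.
current = [11.0, 0.6]
baseline = [10.0, 0.5]
weights = aggregation_weights(current, baseline)
global_update = aggregate([[1.0, 0.0], [0.0, 1.0]], weights)
print(advantages, weights, global_update)
```

Client 1 improves by less in absolute terms (0.1 versus 1.0) but by more relative to its own baseline, so it receives the larger weight. This is one way to keep divergent reward scales from dominating aggregation.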

2606.03087 2026-06-03 cs.LG 版本更新

Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

学会解决，忘记保留：RLVR中的正确集更替

Chuanyu Qin, Chenxu Yang, Qingyi Si, Naibin Gu, Peng Fu, Zheng Lin

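Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; JD.COM

AI summary: To address models forgetting previously solved problems during reinforcement learning with verifiable rewards (RLVR), the paper identifies the correct-set turnover phenomenon and the repair-window principle, and designs a retention-aware review mechanism, \method{}, that improves performance on multimodal tasks through pre-rollout batch replacement with zero additional rollout overhead.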

AI中文摘要

强化学习可验证奖励（RLVR）提升了大型语言模型的能力，然而头条准确率的提升往往掩盖了一个隐藏代价：随着训练进行，先前解决的问题悄然变得无法解决。我们将此现象定义为\emph{正确集更替}，代表了在已掌握集上解决方案获取与退化的耦合动态。在此视角下，保留与获取一样成为明确的优化目标。我们分析并实证建立了\emph{修复窗口原则}：恢复退化提示的成本随回顾延迟急剧增加，定义了一个标准RLVR流程未能利用的低成本窗口。为解决此问题，我们提出\method{}，一种保留感知的回顾机制，追踪已掌握提示并定期重新引入以\emph{提醒}模型先前的解决方案。通过利用预部署批量替换，\method{}引入零额外部署开销。在涵盖图像-文本、视频和纯文本任务的20个基准上，使用Qwen3-VL和Qwen2.5-Math进行评估，\method{}在GRPO、DAPO和回放基线上持续提升性能，展示了跨模态和算法的稳健泛化能力。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) improves the ability of large language models, yet headline accuracy gains often conceal a hidden cost: previously solved problems quietly become unsolvable as training proceeds. We frame this phenomenon as correct-set turnover, representing the coupled dynamics of solution acquisition and regression over the mastered set. Under this view, retention becomes an explicit optimization target alongside acquisition. We analytically and empirically establish the repair-window principle: the cost of restoring a regressed prompt grows sharply with review delay, defining a low-cost window that standard RLVR pipelines fail to exploit. To address this, we propose \method{}, a retention-aware review mechanism that tracks mastered prompts and periodically reintroduces them to remind the model of previous solutions. By utilizing pre-rollout batch replacement, \method{} incurs zero additional rollout overhead. Evaluated across 20 benchmarks spanning image-text, video, and text-only tasks with Qwen3-VL and Qwen2.5-Math, \method{} consistently improves performance over GRPO, DAPO, and replay baselines, demonstrating robust generalizability across modalities and algorithms.
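Correct-set turnover can be made concrete by comparing which prompts a model solves at two checkpoints. The sketch below is a minimal bookkeeping example with hypothetical prompt IDs; it is not the paper's measurement code, and the function name is an assumption. It shows how a small net accuracy gain can hide a sizable number of regressed prompts.

```python
def correct_set_turnover(prev_correct, curr_correct):
    """Split the change between two checkpoints into acquisition and regression."""
    prev, curr = set(prev_correct), set(curr_correct)
    return {
        "acquired": sorted(curr - prev),   # newly solved prompts
        "regressed": sorted(prev - curr),  # previously solved, now failed
        "retained": sorted(prev & curr),   # solved at both checkpoints
        "net_gain": len(curr) - len(prev),
    }

# Hypothetical prompt IDs solved at an earlier and a later checkpoint.
earlier = ["p1", "p2", "p3", "p4"]
later = ["p1", "p2", "p5", "p6", "p7"]
report = correct_set_turnover(earlier, later)
print(report)
```

Here the headline count rises by one prompt, yet two previously mastered prompts were lost along the way. The regressed list is the set a retention-aware review mechanism would want to revisit early.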

2606.03074 2026-06-03 cs.LG cs.SY eess.SY 版本更新

Learning When and Where to Connect: Adaptive Virtual Nodes for Dynamic Message Passing on Graphs

Jaejun Lee, Joyce Jiyoung Whang

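Affiliations: School of Computing, KAIST; Department of AI Computing, KAIST

AI summary: Proposes MAVN, an end-to-end differentiable framework that adaptively decides at which layer of a message passing neural network, and for which nodes, to introduce virtual nodes, establishing connections through a dual-perspective scoring mechanism. The authors prove that it can simulate any node-virtual node connectivity pattern, and experiments on multiple datasets show clear gains for backbone networks.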

Comments 12 pages, 6 figures, 10 tables, 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

DOI: 10.1145/3770855.3818013

AI中文摘要

虽然虚拟节点（VN）常用于消息传递神经网络（MPNN）中以促进有效的消息传递，但现有的基于VN的方法存在局限性，例如限制所有节点连接到相同数量的VN、在应用MPNN之前固定连接，以及独立于连接到同一VN的其他节点而将节点连接到VN。我们提出了MAVN，一个端到端可微分的MPNN框架，允许节点和VN之间无约束的连接，并根据跨层演化的节点表示动态按需引入VN。具体来说，MAVN学习基于连接的相对重要性自适应地决定何时（在哪一层）以及何地（连接到哪些节点）引入和连接VN。从候选VN池中，MAVN在每一层选择必要的VN，每个选中的VN连接到非空节点子集，由双向评分机制引导，该机制同时捕捉节点对VN的偏好和VN对节点的偏好。我们理论上证明，对于任何节点-VN连接模式，都存在一组MAVN参数可以模拟该模式。在九个真实世界数据集上的实验表明，MAVN持续提升骨干MPNN的性能，相对于骨干网络实现高达46.5%的提升，并优于基线方法。

英文摘要

While Virtual Nodes (VNs) are often utilized in Message Passing Neural Networks (MPNNs) to facilitate effective message passing, existing VN-based methods have limitations, such as constraining all nodes to connect to the same number of VNs, fixing the connections before applying MPNNs, and connecting a node to a VN independently of the other nodes that connect to the same VN. We propose MAVN, an end-to-end differentiable MPNN framework that allows non-constrained connections between nodes and VNs and dynamically introduces VNs on demand in response to evolving node representations across layers. Specifically, MAVN learns to adaptively determine when (at which layer) and where (to which nodes) to introduce and connect VNs based on the relative importance of connections. From a pool of candidate VNs, MAVN selects the necessary VNs in each layer, where each selected VN is connected to a nonempty subset of nodes, guided by a dual-perspective scoring mechanism that jointly captures the nodes' preferences for VNs and the VNs' preferences for nodes. We theoretically prove that for any node-VN connectivity pattern, there exists a set of MAVN's parameters that can simulate the pattern. Experiments on nine real-world datasets demonstrate that MAVN consistently improves the performance of backbone MPNNs, achieving up to 46.5% improvement over the backbones and outperforming the baselines.

2606.03061 2026-06-03 cs.DC cs.AI cs.LG cs.NI cs.SY eess.SY 版本更新

Brief Announcement: Generative Markov Model for Distributed Computing Systems

简要公告：分布式计算系统的生成马尔可夫模型

Alfreds Lapkovskis, Ali Beikmohammadi, Sindri Magnússon, Praveen Kumar Donta

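Affiliations: Department of Computer and Systems Sciences, Stockholm University, Sweden

AI summary: To handle the heterogeneity and complexity of distributed computing systems, the paper proposes a generative Markov model factorized over a structured state, enabling tractable simulation, inference, and policy learning, and validates it with a collaborative AI inference case study.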

Comments Submitted to 40th International Symposium on Distributed Computing (DISC 2026)

AI中文摘要

新兴的分布式计算范式，如计算连续体，本质上是异构、随机和复杂的。高效且有效地利用连续体中所有可用资源需要一个统一的系统形式化模型。为了解决这一差距，我们提出了一个通用框架，将分布式计算系统建模为生成马尔可夫模型，该模型在结构化系统状态上进行分解。在我们的模型中，状态分解为高维变量，每个变量进一步在其元素上分解，反映了分布式系统固有的稀疏依赖结构。这产生了一个可处理的模型，能够对原本难以处理的系统状态进行模拟、推理和策略学习，从而将分布式计算与马尔可夫链理论和强化学习（RL）联系起来。我们通过一个协作AI推理的案例研究来展示我们的框架，其中专用服务器将资源与服务用户自愿提供的资源相结合。我们的结果表明，集中式调度在规模上成为瓶颈，而将计算分布到用户设备上可减少延迟和服务器资源消耗。这些发现突显了自适应决策在分布式计算系统中的价值，并展示了该框架在建模、模拟和优化方面的实用性。

英文摘要

Emerging distributed computing paradigms, such as the computing continuum, are inherently heterogeneous, stochastic, and complex. Efficiently and effectively utilizing all available resources across the continuum demands a unified formal model of the system. To address this gap, we propose a general framework for modeling distributed computing systems as a generative Markov model, factorized over a structured system state. In our model, the state decomposes into high-dimensional variables, each further factorized over its elements, reflecting the sparse dependency structure inherent to distributed systems. This yields a tractable model enabling simulation, inference, and policy learning over otherwise intractable system states, bridging distributed computing with Markov chain theory and reinforcement learning (RL). We demonstrate our framework through a case study of collaborative AI inference, in which a dedicated server combines resources with those volunteered by service users. Our results show that centralized scheduling becomes a bottleneck at scale, while distributing computation across user devices reduces both latency and server resource consumption. These findings highlight the value of adaptive decision-making in distributed computing systems and demonstrate the framework's utility for modeling, simulation, and optimization.
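The idea of a Markov model factorized over a structured state can be illustrated with a toy simulator. In the sketch below the state holds one server flag and a list of per-device busy flags, and the transition is a product of small local factors: the server's next state depends on its own state and an aggregate of the devices, while each device's next state depends only on itself and the server. The state layout, the probabilities, and the function names are all hypothetical and chosen only to show the factorization, not to reproduce the paper's model.

```python
import random

def step(state, rng):
    """Sample the next state from a product of local transition factors."""
    devices = state["devices"]
    busy_fraction = sum(devices) / len(devices)

    # Server factor: own state plus an aggregate of the devices.
    p_overload = 0.1 + 0.6 * busy_fraction
    if state["server_overloaded"]:
        p_overload += 0.2
    next_server = rng.random() < p_overload

    # Device factors: each device looks only at itself and the server.
    next_devices = []
    for busy in devices:
        if busy:
            p_busy = 0.7
        elif state["server_overloaded"]:
            p_busy = 0.5  # overloaded server offloads work to idle devices
        else:
            p_busy = 0.2
        next_devices.append(1 if rng.random() < p_busy else 0)

    return {"server_overloaded": next_server, "devices": next_devices}

def rollout(initial, steps, seed):
    """Simulate a trajectory with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1], rng))
    return trajectory

initial = {"server_overloaded": False, "devices": [0, 1, 0, 0]}
trajectory = rollout(initial, steps=5, seed=0)
for s in trajectory:
    print(s)
```

With n devices the joint state space has 2^(n+1) configurations, but the simulator only ever evaluates n + 1 small conditionals per step. That is the kind of tractability a sparse dependency structure buys.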

2606.03057 2026-06-03 cs.LG cs.AI 版本更新

Rethinking Molecular Text Representations for LLMs: An Empirical Study

重新思考用于大语言模型的分子文本表示：一项实证研究

Arun Raja, Garrett M. Morris, Kian Ming A. Chai

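Affiliations: University of Oxford; DSO National Laboratories

AI summary: A systematic benchmark of 16 LLMs across nine molecular representations and eight chemical tasks finds that the choice of representation strongly affects results: structured text representations (CML, MolJSON) lead on structural tasks, IUPAC leads on semantic tasks, and SMILES is rarely optimal.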

Comments 25 pages, 11 figures, 20 tables

AI中文摘要

大语言模型（LLMs）越来越多地用于分子任务，但目前尚不清楚使用哪种分子表示。我们提出了一个系统基准测试，评估了LLM在九种表示和八种化学任务上的分子能力。我们基准测试了16个LLM，涵盖五个模型家族，包括推理和非推理变体、化学专用LLM以及封闭前沿模型。性能强烈依赖于表示，没有单一表示在所有任务中获胜，尽管CML是最好的，其次是MolJSON、InChI，然后是规范SMILES。显式结构化文本表示（CML和MolJSON）主导结构任务；IUPAC主导语义任务，在所有16个LLM的分子检索中获胜；而SMILES变体尽管在预训练中普遍存在，但很少是最优的。化学专用模型在使用SMILES时表现良好，但使用结构化文本表示时性能大幅下降，这表明仅基于SMILES的评估奖励了不具泛化能力的专业化。使用LLM作为评判者，我们发现IUPAC产生的正确分子生成比例最高。通过分词审计、线性探针和注意力的机制研究表明，表示在模型内部以不同方式编码；例如，结构化表示需要跨分子范围的更高注意力。我们的结果反对表示不变的评估，并激励基于LLM的化学任务感知表示路由。

英文摘要

Large language models (LLMs) are increasingly used for molecular tasks, but it remains unclear which molecular representation to use. We present a systematic benchmark evaluating LLM molecular competence across nine representations and eight chemical tasks. We benchmark 16 LLMs across five model families, including reasoning and non-reasoning variants, chemistry-specialized LLMs, and closed frontier models. Performance is strongly representation-dependent and no single representation wins across tasks, though CML is the best, followed by MolJSON, InChI, and then canonical SMILES. Explicit structured text representations (CML and MolJSON) dominate structural tasks; IUPAC dominates semantic tasks, winning molecule retrieval for all 16 LLMs; and SMILES variants are rarely optimal despite their prevalence in pretraining. Chemistry-specialized models perform well with SMILES at the cost of large degradations with structured text representations, suggesting SMILES-only evaluation rewards specialization that does not generalize. Using LLM-as-a-judge, we find that IUPAC produces the highest fraction of correct molecule generations. A mechanistic study via tokenization audits, linear probes and attention shows that representations are encoded differently inside the model; for example, structured representations require higher attention across the molecular span. Our results argue against representation-invariant evaluation and motivate task-aware representation routing for LLM-based chemistry.

2606.03052 2026-06-03 cs.LG 版本更新

What Do Students Learn? A Feature-Level Analysis of Dark Knowledge

学生学到了什么？暗知识的特征级分析

Seungu Kang, Songkuk Kim

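Affiliations: Yonsei University

AI summary: Using the Interaction Tensor framework to analyze what student models learn during knowledge distillation, the paper finds that effective distillation acts as a regularizer that removes low-frequency, sample-specific features, and proposes Confusion Distillation (CD), a teacher-free self-distillation method based on the confusion matrix that outperforms existing self-distillation methods on CIFAR-100.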

Comments Accepted at ICPR 2026

AI中文摘要

知识蒸馏（KD）是模型压缩的强大工具，然而学生模型获取特征表示的确切机制仍未充分探索。在这项工作中，我们使用交互张量框架分析学生特征学习。我们的分析表明，有效的KD充当正则化器，修剪低频、样本特定的特征，鼓励学生依赖一组紧凑的高可重用特征。至关重要的是，我们观察到数据集级别的混淆矩阵包含类似于教师“暗知识”的结构信息。利用这一见解，我们提出了混淆蒸馏（CD），一种无教师自蒸馏方法，利用模型自身不断演化的混淆模式作为动态软目标。CD在CIFAR-100上的ResNet-34和ResNet-50上取得了有竞争力的性能，比现有的自蒸馏方法如CS-KD和PS-KD高出1.2%，同时提供了标准KD的计算高效替代方案。

英文摘要

Knowledge Distillation (KD) is a powerful tool for model compression, yet the precise mechanisms by which student models acquire feature representations remain underexplored. In this work, we analyze student feature learning using the Interaction Tensor framework. Our analysis reveals that effective KD acts as a regularizer that prunes low-frequency, sample-specific features, encouraging the student to rely on a compact set of highly reusable features. Crucially, we observe that the dataset-level confusion matrix contains structural information analogous to the teacher's "Dark Knowledge." Leveraging this insight, we propose Confusion Distillation (CD), a teacher-free self-distillation method that utilizes the model's own evolving confusion patterns as dynamic soft targets. CD achieves competitive performance on ResNet-34 and ResNet-50 for CIFAR-100, outperforming existing self-distillation methods like CS-KD and PS-KD by 1.2% while offering a computationally efficient alternative to standard KD.
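One way to picture confusion patterns acting as soft targets is to row-normalize a running confusion matrix and blend the row for the true class with the one-hot label. The sketch below makes exactly that assumption; the blending weight, the function names, and the toy counts are hypothetical, and the abstract does not say how Confusion Distillation actually constructs or schedules its targets.

```python
def confusion_soft_target(confusion, label, mix=0.5):
    """Blend the one-hot label with the normalized confusion row of the true class.

    confusion[i][j] counts how often true class i was predicted as class j.
    """
    num_classes = len(confusion)
    one_hot = [1.0 if j == label else 0.0 for j in range(num_classes)]
    row = confusion[label]
    total = sum(row)
    if total == 0:  # no statistics yet: fall back to the hard label
        return one_hot
    probs = [c / total for c in row]
    return [(1.0 - mix) * o + mix * p for o, p in zip(one_hot, probs)]

def update_confusion(confusion, label, predicted):
    """Accumulate the model's own predictions so the targets evolve over training."""
    confusion[label][predicted] += 1

# Toy 3-class confusion counts: class 2 is often mistaken for class 1, never for class 0.
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]
target = confusion_soft_target(confusion, label=2)
print(target)
```

The resulting target keeps most of its mass on the true class while assigning more probability to the class the model tends to confuse it with. This is the kind of inter-class structure a teacher's soft labels would otherwise supply.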

2606.03040 2026-06-03 cs.AI cs.LG 版本更新

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

RelGT-AC：用于关系数据库中自动完成任务的关系图Transformer

Phillip Jiang

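Affiliations: Appsofa LLC

AI summary: Proposes RelGT-AC, which combines a column masking strategy, a unified task head, and a TF-IDF text encoder, and outperforms a GraphSAGE baseline on the regression autocomplete tasks in relational databases.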

Comments 12 pages, 6 figures. Code and model checkpoints available at https://github.com/jiangdmv/graph-transformer

AI中文摘要

关系数据库支撑着现代企业、科学和医疗系统，但由于其多表、异构和时间结构，对此类数据进行预测性机器学习仍然具有挑战性。关系深度学习（RDL）通过将数据库表示为异构图并直接应用图神经网络（GNN）来解决这一问题。RelBench v2最近引入了自动完成任务——一种实际动机的任务类型，其目标是从关系上下文中预测现有列值，类似于智能表单填充助手。我们提出了RelGT-AC（用于自动完成的关系图Transformer），通过三个有针对性的贡献扩展了RelGT架构：（1）一种列掩码策略，通过在子图编码期间屏蔽目标列来防止平凡解；（2）一个统一的任务头，支持在单个模型内进行二分类、多分类和回归自动完成任务；（3）一个TF-IDF文本编码器，自动检测和编码自由文本列，恢复分类编码器丢弃的强词汇信号。在跨越3个RelBench v2数据集（rel-trial、rel-f1、rel-stack）的7个任务中，RelGT-AC在所有3个回归自动完成任务上优于GraphSAGE基线，并通过TF-IDF编码器在文本密集的资格任务上实现了高达+10 AUROC点的提升。

英文摘要

Relational databases underpin modern enterprise, scientific, and healthcare systems, yet predictive machine learning on such data remains challenging due to their multi-table, heterogeneous, and temporal structure. Relational Deep Learning (RDL) addresses this by representing databases as heterogeneous graphs and applying graph neural networks (GNNs) directly. RelBench v2 recently introduced autocomplete tasks -- a practically motivated task type where the goal is to predict an existing column value from relational context, analogous to an intelligent form-filling assistant. We propose RelGT-AC (Relational Graph Transformer for Autocomplete), extending the RelGT architecture with three targeted contributions: (1) a column masking strategy that prevents trivial solutions by masking the target column during subgraph encoding; (2) a unified task head supporting binary classification, multiclass classification, and regression autocomplete tasks within a single model; and (3) a TF-IDF text encoder that automatically detects and encodes free-text columns, recovering strong lexical signal that categorical encoders discard. Across 7 tasks spanning 3 RelBench v2 datasets (rel-trial, rel-f1, rel-stack), RelGT-AC outperforms the GraphSAGE baseline on all 3 regression autocomplete tasks and achieves up to +10 AUROC points on text-heavy eligibility tasks via the TF-IDF encoder.

2606.03038 2026-06-03 cs.LG physics.comp-ph physics.optics 版本更新

Will Accurate Fields Mislead Photonic Design? From Global Accuracy to Port Readout

精确的场会误导光子设计吗？从全局精度到端口读出

Yitian Zhang, Yonghong Chen, Youming Chen, Yiyang Li, Xing Zhe, Renhe Lu, Shaolin Liao, Yuzhe Ma, Zhong Guan

发表机构 * Sun Yat-sen University（中山大学）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结针对光子设计中全局场精度高但端口读出不可靠的问题，提出传播对齐神经算子PaNO及其输出感知变体PaNO-R2，在MMI分束器基准上将端口功率误差降低72.7%。

AI中文摘要

神经场代理可以加速光子设计循环，但一个在全局场误差上看起来精确的代理，当最终决策依赖于局部输出端口读出时，仍可能对候选器件进行错误排序。这种风险在传播主导的MMI分束器和耦合器中尤为严重，其中端口功率、分束、相位和耦合由累积的模态干涉和输出窗口聚合决定，而不仅仅是平均场相似性。我们通过场/中介/读出视角研究这种场到设计的不匹配，将密集复场误差与传播轮廓和输出窗口误差在端口聚合前分离。为了将代理与此链对齐，我们提出PaNO，一种传播对齐的神经算子，它保持全场预测接口，同时围绕局部边界结构、横向模态内容、轴向传播和交叉模态交互组织潜在状态。我们还评估了PaNO-R2，一种针对端口区域附近残余场分量的输出感知反馈变体。在具有4608个保留场的15波长可调谐$3\times3$ MMI基准上，PaNO将NeurOLight的端口功率误差从0.2018降低到0.0739，尽管cMAE略有升高，表明仅全局场精度不足以实现设计相关的读出保真度。PaNO-R2获得了最佳的cMAE、传播轮廓误差、输出轮廓误差和端口功率误差，将NeurOLight的端口功率和输出轮廓误差分别降低了72.7%和72.5%。

英文摘要

Neural field surrogates can accelerate photonic design loops, but a surrogate that looks accurate in global field error can still mis-rank candidate devices when the final decision depends on localized output-port readouts. This risk is acute in propagation-dominated MMI splitters and couplers, where port power, splitting, phase, and coupling are determined by accumulated modal interference and output-window aggregation rather than by average field similarity alone. We study this field-to-design mismatch through a Field/Mediator/Readout view that separates dense complex-field error from propagation-profile and output-window errors before port aggregation. To align the surrogate with this chain, we propose PaNO, a propagation-aligned neural operator that keeps the full-field prediction interface while organizing latent states around local boundary structure, transverse modal content, axial propagation, and cross-mode interaction. We also evaluate PaNO-R2, an output-aware feedback variant for residual field components near the port region. On a 15-wavelength tunable $3\times3$ MMI benchmark with 4608 held-out fields, PaNO lowers NeurOLight's port-power error from 0.2018 to 0.0739 despite slightly higher cMAE, showing that global field accuracy alone is not sufficient for design-relevant readout fidelity. PaNO-R2 attains the best cMAE, propagation-profile error, output-profile error, and port-power error, reducing NeurOLight's port-power and output-profile errors by 72.7% and 72.5%.
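
The gap between global field accuracy and port readout can be illustrated with a toy example. The sketch below is not the paper's benchmark or its metric definitions: the 1-D field, the helper names (`mean_abs_error`, `port_power`, `port_power_rel_error`), and the two hand-built surrogates are all assumptions chosen only to show that a lower mean field error can coexist with a much larger port-power error.

```python
def mean_abs_error(pred, true):
    """Global field error: mean magnitude of the complex difference over all samples."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def port_power(field, window):
    """Port readout: power summed over the output window (a slice of sample indices)."""
    return sum(abs(e) ** 2 for e in field[window])


def port_power_rel_error(pred, true, window):
    """Relative error of the predicted port power against the reference port power."""
    ref = port_power(true, window)
    return abs(port_power(pred, window) - ref) / ref


# Toy 1-D "field" (hypothetical): 96 dark samples, then a 4-sample output port
# carrying unit amplitude.
true_field = [0j] * 96 + [1 + 0j] * 4
port = slice(96, 100)

# Surrogate A: a small error spread uniformly over the whole field.
surrogate_a = [e + 0.05 for e in true_field]
# Surrogate B: exact everywhere except the port, where the amplitude is halved.
surrogate_b = true_field[:96] + [0.5 + 0j] * 4

for name, pred in (("A", surrogate_a), ("B", surrogate_b)):
    print(name,
          round(mean_abs_error(pred, true_field), 4),
          round(port_power_rel_error(pred, true_field, port), 4))
```

In this toy setup surrogate B has the smaller global error but the larger port-power error, so ranking devices by global error alone would prefer the surrogate that is worse at the readout that matters.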

2606.03026 2026-06-03 cs.NE cs.AI cs.LG 版本更新

Spike-Aware C++ INT8 Inference for Sparse Spiking Language Models on Commodity CPUs

面向稀疏脉冲语言模型在商用CPU上的脉冲感知C++ INT8推理

Ting Liu

发表机构 * SymbolicLight Research（SymbolicLight研究院）

AI总结本文提出一种脉冲感知的C++推理运行时，利用稀疏二进制脉冲状态作为执行原语，结合混合布局、AVX2/FMA内核和INT8量化，在商用CPU上实现脉冲语言模型的高效解码，吞吐量优于同等规模稠密模型但质量略逊。

Comments 11 pages, 7 tables

AI中文摘要

脉冲语言模型展现出激活稀疏性，而稠密Transformer运行时无法直接利用。本文从系统角度研究这一特性。基于SymbolicLight V1脉冲门控语言模型家族，我们实现了一个C++ CPU推理运行时，将稀疏二进制脉冲状态视为执行原语，而非仅应用事后权重压缩。该运行时结合了清单驱动的权重加载器、混合行/列内存布局、AVX2/FMA内核、每通道对称INT8量化以及脉冲条件稀疏路径的整数域累加。在AMD Ryzen 7 5800X上，早期标量FP32基线解码速度为9.5 tokens/s。混合布局AVX2 FP32将其提升至14.7 tokens/s，而AVX2 INT8在相同step-30k导出模型上达到19.9 tokens/s，同时将权重占用从3.49 GB降至1.06 GB。对于可用的186k步874M参数INT8导出模型，C++运行时在单线程CPU基准测试中解码速度为22.63 tokens/s，相比之下，在llama.cpp下TinyLlama-1.1B Q8_0为16.31 tokens/s，Falcon3-1B Q8_0为11.26 tokens/s，Qwen2.5-1.5B Q8_0为9.70 tokens/s。线程扩展在四个CPU线程时达到47.90 tokens/s，512 token预填充从单线程的29.86 tokens/s提升至八线程的94.68 tokens/s。吞吐量提升伴随着质量代价：SNN报告WikiText-2困惑度为24.80，差于同一基准中的稠密基线。我们将结果定位为稀疏语言运行时的推理系统研究，长期动机在于可能受益于传感器和执行器附近本地低核推理的具身和边缘智能体。脉冲感知执行可以改善稀疏脉冲语言模型的CPU吞吐量和内存行为，而模型质量、受控稠密训练基线、具身任务评估和测量CPU能耗仍是开放问题。

英文摘要

Spiking language models expose activation sparsity that dense Transformer runtimes do not directly exploit. This paper studies that property from a systems perspective. Building on the SymbolicLight V1 spike-gated language model family, we implement a C++ CPU inference runtime that treats sparse binary spike states as an execution primitive rather than only applying post-hoc weight compression. The runtime combines a manifest-driven weight loader, mixed row/column memory layout, AVX2/FMA kernels, per-channel symmetric INT8 quantization, and integer-domain accumulation for spike-conditioned sparse paths. On an AMD Ryzen 7 5800X, an early scalar FP32 baseline decodes at 9.5 tokens/s. Mixed-layout AVX2 FP32 raises this to 14.7 tokens/s, and AVX2 INT8 reaches 19.9 tokens/s on the same step-30k export while reducing the weight footprint from 3.49 GB to 1.06 GB. For the available 186k-step 874M-parameter INT8 export, the C++ runtime decodes at 22.63 tokens/s in a single-thread CPU benchmark, compared with 16.31 tokens/s for TinyLlama-1.1B Q8_0, 11.26 tokens/s for Falcon3-1B Q8_0, and 9.70 tokens/s for Qwen2.5-1.5B Q8_0 under llama.cpp. Thread scaling reaches 47.90 tokens/s at four CPU threads, and 512-token prefill improves from 29.86 to 94.68 tokens/s from one to eight threads. The throughput result comes with a quality cost: the SNN reports WikiText-2 perplexity 24.80, worse than the dense baselines in the same benchmark. We frame the result as an inference-systems study for sparse language runtimes, with longer-term motivation in embodied and edge agents that may benefit from local, low-core inference near sensors and actuators. Spike-aware execution can improve CPU throughput and memory behavior for sparse spiking language models, while model quality, controlled dense training baselines, embodied-task evaluation, and measured CPU energy remain open problems.
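
Two of the ingredients named above, per-channel symmetric INT8 quantization and integer-domain accumulation over active spikes, can be sketched in a few lines. This is a scalar Python illustration under stated assumptions, not the paper's C++/AVX2 runtime: the function names `quantize_per_channel` and `spike_matvec`, the row-major layout, and the choice of one scale per output row are all hypothetical.

```python
from typing import List, Tuple


def quantize_per_channel(weights: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; an all-zero row keeps scale 1.0.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def spike_matvec(q_rows: List[List[int]], scales: List[float], spikes: List[int]) -> List[float]:
    """Compute y = W @ s for a binary spike vector s, touching only active inputs."""
    active = [j for j, s in enumerate(spikes) if s]  # indices of firing inputs
    out = []
    for q_row, scale in zip(q_rows, scales):
        acc = 0  # integer accumulator; no multiplies are needed because s_j is 0 or 1
        for j in active:
            acc += q_row[j]
        out.append(acc * scale)  # one dequantization per output channel
    return out


weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
q, s = quantize_per_channel(weights)
print(spike_matvec(q, s, [1, 0, 1]))
```

Because the spike vector is binary, the inner loop reduces to integer additions over the firing inputs, which is the property a spike-aware runtime can exploit; the sparser the spikes, the fewer weights are read.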

2606.03017 2026-06-03 cs.LG cs.AI cs.RO 版本更新

神经网络可证明地学习群组合的谱表示

Jianliang He, Leda Wang, Fengzhuo Zhang, Siyu Chen, Zhuoran Yang

AI总结通过将投影梯度流提升到傅里叶域，证明两层神经网络在群组合任务中几乎必然收敛到单个不可约表示，并揭示了表示论视角下的特征学习和低秩压缩现象。

AI中文摘要

理解神经网络训练过程中结构化内部结构如何涌现是深度学习研究的核心。我们通过群组合任务研究这一现象，其中训练一个两层神经网络来预测有限群 $G$ 中元素的 $g_1 \star g_2$。通过将投影梯度流提升到傅里叶域，我们证明训练动力学由一个表示论能量泛函上的黎曼梯度上升控制。我们证明，在随机初始化下，该流驱动每个神经元几乎必然收敛到单个不可约表示，而跨层傅里叶系数实现旋转秩一对齐。该框架提供了特征学习的表示论解释，并刻画了矩阵值群表示的一种新颖的低秩压缩现象。此外，对于阿贝尔群，我们提供了完整的总体水平描述：随机初始化促进非平凡表示上的均匀多样化，并诱导 Haar 均匀相位，通过多数投票机制联合逼近指示函数。我们进一步证明相位对齐和表示竞争都以指数收敛速率出现。

英文摘要

Understanding how structured internal structure emerges during neural network training is central to the study of deep learning. We investigate this phenomenon through the group composition task, where a two-layer neural network is trained to predict $g_1 \star g_2$ for elements of a finite group $G$. By lifting the projected gradient flow to the Fourier domain, we demonstrate that the training dynamics are governed by a Riemannian gradient ascent on a representation-theoretic energy functional. We prove that, under random initialization, this flow drives each neuron to converge almost surely toward a single irreducible representation, while the cross-layer Fourier coefficients achieve a rotational rank-one alignment. This framework provides a representation-theoretic account of feature learning and characterizes a novel low-rank compression phenomenon for matrix-valued group representations. Moreover, for Abelian groups, we provide a complete population-level description: random initialization promotes uniform diversification across nontrivial representations and induces Haar-uniform phases, jointly approximating the indicator via a majority-vote mechanism. We further prove that both phase alignment and representation competition emerge with exponential convergence rates.

2606.02982 2026-06-03 cs.PF cs.DC cs.LG 版本更新

DriftSched: Adaptive QoS-Aware Scheduling under Runtime Token Drift for Multi-Tenant GPU Inference

DriftSched: 多租户GPU推理中运行时令牌漂移下的自适应QoS感知调度

Kathiravan Palaniappan

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结提出DriftSched框架，通过运行时反馈驱动的漂移补偿和自适应偏差校正，解决多租户LLM推理中令牌漂移导致的调度问题，在NVIDIA L4 GPU上将工作负载估计误差（MAE）平均降低38.8%；在所评估的调度策略中，SJF相对FIFO将中位端到端延迟降低约42%。

Comments 17 pages, 22 figures, 7 tables

AI中文摘要

大型语言模型（LLM）推理服务的快速增长增加了对高效多租户GPU调度的需求。尽管现代推理运行时（如vLLM）通过连续批处理和优化内存管理提高了吞吐量，但准确估计异构推理请求的运行时成本仍然是一个重大挑战。在实践中，观察到的输出长度通常偏离准入时的估计值，产生运行时令牌漂移，可能导致工作负载错误分类、队列不平衡、尾延迟增加和服务质量（QoS）下降。本文提出了DriftSched，一个用于NVIDIA L4 GPU上多租户LLM推理服务的自适应QoS感知调度框架。DriftSched结合了工作负载分类、令牌预算估计、租户感知队列管理和运行时反馈驱动的漂移补偿，以改进准入时的调度决策。该框架在异构多租户工作负载下评估了FIFO、优先级、加权、最短作业优先（SJF）和老化优先级调度策略。实验结果表明，各工作负载类别存在可测量的运行时令牌漂移。自适应偏差校正将工作负载估计误差平均降低38.8%（MAE）和40.5%（RMSE），提高了工作负载分类稳定性和调度准确性。在所有评估的调度器中，SJF实现了最佳整体性能，在持续GPU争用下，相对于FIFO，中位端到端延迟降低了约42%，P99延迟降低了约16%。该工作贡献了一个自适应漂移感知调度架构、一个运行时令牌漂移补偿机制，以及一个用于评估共享GPU基础设施上QoS感知LLM推理调度的可重复基准测试框架。

英文摘要

The rapid growth of large language model (LLM) inference services has increased the demand for efficient multi-tenant GPU scheduling. While modern inference runtimes such as vLLM improve throughput through continuous batching and optimized memory management, accurately estimating the runtime cost of heterogeneous inference requests remains a significant challenge. In practice, observed output lengths often deviate from admission-time estimates, creating runtime token drift that can lead to workload misclassification, queue imbalance, increased tail latency, and degraded Quality-of-Service (QoS). This paper presents DriftSched, an adaptive QoS-aware scheduling framework for multi-tenant LLM inference serving on NVIDIA L4 GPUs. DriftSched combines workload classification, token-budget estimation, tenant-aware queue management, and runtime feedback-driven drift compensation to improve admission-time scheduling decisions. The framework evaluates FIFO, Priority, Weighted, Shortest-Job-First (SJF), and Aging Priority scheduling policies under heterogeneous multi-tenant workloads. Experimental results demonstrate measurable runtime token drift across workload categories. Adaptive bias correction reduces workload estimation error by an average of 38.8% (MAE) and 40.5% (RMSE), improving workload classification stability and scheduling accuracy. Among all evaluated schedulers, SJF achieves the best overall performance, reducing median end-to-end latency by approximately 42% and P99 latency by approximately 16% relative to FIFO under sustained GPU contention. The work contributes an adaptive drift-aware scheduling architecture, a runtime token-drift compensation mechanism, and a reproducible benchmarking framework for evaluating QoS-aware LLM inference scheduling on shared GPU infrastructure.
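
The abstract does not specify how the bias correction is computed, so the following is only a minimal sketch of one plausible reading: a per-class multiplicative bias, updated by an exponential moving average of observed-to-estimated output lengths, feeding a Shortest-Job-First ordering. The class `DriftCorrector`, the function `sjf_order`, the smoothing factor `alpha`, and the request tuple layout are all assumptions, not DriftSched's actual interface.

```python
import heapq
from collections import defaultdict


class DriftCorrector:
    """Tracks a per-class multiplicative bias between estimated and observed output tokens."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha                    # EMA smoothing factor (assumed value)
        self.bias = defaultdict(lambda: 1.0)  # 1.0 means "trust the admission-time estimate"

    def corrected(self, workload_class: str, estimate: int) -> float:
        """Admission-time estimate scaled by the bias learned for this class."""
        return estimate * self.bias[workload_class]

    def observe(self, workload_class: str, estimate: int, observed: int) -> None:
        """Runtime feedback: fold the observed/estimated ratio into the class bias."""
        ratio = observed / max(estimate, 1)
        old = self.bias[workload_class]
        self.bias[workload_class] = (1 - self.alpha) * old + self.alpha * ratio


def sjf_order(requests, corrector):
    """Return request ids ordered shortest corrected job first.

    `requests` is a list of (request_id, workload_class, estimated_tokens) tuples.
    """
    heap = [(corrector.corrected(cls, est), rid) for rid, cls, est in requests]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


corrector = DriftCorrector(alpha=0.5)
pending = [("a", "chat", 100), ("b", "code", 80)]
print(sjf_order(pending, corrector))   # before any feedback
corrector.observe("code", estimate=80, observed=400)
print(sjf_order(pending, corrector))   # after "code" requests proved longer than estimated
```

The point of the sketch is the feedback loop: once a class is observed to run long, its future estimates grow and SJF stops placing those requests ahead of genuinely short ones.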

2606.02974 2026-06-03 cs.AI cs.HC cs.LG 版本更新

WISE-HAR: A Generalizable Ensemble Deep Learning Framework for WiFi-Based Human Activity Recognition

WISE-HAR：一种基于WiFi的人类活动识别的可泛化集成深度学习框架

Maheen Arshad, Qindeel E Zahra, Muhammad Khuram Shahzad

发表机构 * Department of Computing, School of Electrical Engineering and Computer Science（计算机系，电气工程与计算机科学学院）； National University of Sciences and Technology (NUST)（国家科学与技术大学（NUST））

AI总结本文提出WISE-HAR框架，通过集成五种CNN架构、数据增强和跨场景评估，在Wallhack1.8k数据集上实现94.87%的LOS测试准确率，并展现出强泛化能力。

Comments 8 pages, 5 figures

AI中文摘要

利用WiFi信号进行人类活动识别（HAR）已成为智能家居、医疗监控、安全系统和环境辅助生活的一项变革性技术。与引发严重隐私问题且在弱光条件下失效的传统基于摄像头的系统，或需要用户配合的可穿戴传感器不同，基于WiFi的HAR是非侵入性的、保护隐私的、成本效益高的，并且能在任何光照条件下无缝工作。本文提出了一种综合方法，使用Wallhack1.8k WiFi频谱图数据集识别三种不同的人类活动：“无人”（空房间）、“行走”和“行走+挥手”。我们提出了三项关键改进以应对基于WiFi的HAR的主要挑战。首先，为了解决高性能方差问题，我们实现了集成学习，采用五种不同的CNN架构（Deep CNN、Wide CNN、MobileNetV2、ResNet50V2和EfficientNetB0）。其次，为了解决小数据集大小的限制，我们应用了激进的数据增强技术，包括时间扭曲、频率掩蔽和噪声添加。第三，为了评估真实世界的泛化能力，我们进行了跨场景评估（在视距上训练，在非视距上测试）和跨天线评估（在双锥天线上训练，在PIFA天线上测试）。我们的集成模型在使用双锥天线的LOS场景下达到了94.87%的测试准确率，比最佳单个模型高出0.66%。数据增强将随机森林的性能从60%提升到95%。跨场景评估显示准确率下降极小，仅为1.37%和2.07%，证明了强大的泛化能力。结果表明，所提出的方法鲁棒、可靠，适用于不同硬件配置的多样化环境中的实际部署。

英文摘要

Human Activity Recognition (HAR) using WiFi signals has emerged as a transformative technology for smart homes, healthcare monitoring, security systems, and ambient assisted living. Unlike traditional camera-based systems that raise significant privacy concerns and fail in low-light conditions, or wearable sensors that require user compliance, WiFi-based HAR is non-intrusive, privacy-preserving, cost-effective, and works seamlessly in any lighting condition. This paper presents a comprehensive approach to recognize three distinct human activities: "No Presence" (empty room), "Walking", and "Walking + Arm-waving" using the Wallhack1.8k WiFi spectrogram dataset. We propose three key improvements to address the main challenges in WiFi-based HAR. First, to address high performance variance, we implement ensemble learning with five different CNN architectures (Deep CNN, Wide CNN, MobileNetV2, ResNet50V2, and EfficientNetB0). Second, to address the small dataset size limitation, we apply aggressive data augmentation techniques including time-warping, frequency masking, and noise addition. Third, to evaluate real-world generalization capability, we perform cross-scenario evaluation (training on Line-of-Sight and testing on Non-Line-of-Sight) and cross-antenna evaluation (training on Biquad antenna and testing on PIFA antenna). Our ensemble model achieved a test accuracy of 94.87% on the LOS scenario with Biquad antenna, outperforming the best individual model by 0.66%. Data augmentation improved Random Forest performance from 60% to 95%. Cross-scenario evaluation showed minimal accuracy drops of only 1.37% and 2.07%, demonstrating strong generalization capabilities. The results indicate that the proposed approach is robust, reliable, and suitable for real-world deployment in diverse environments with different hardware configurations.
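
The abstract names the five ensemble members but not the rule used to combine them. As a hedged illustration, the sketch below uses soft voting (averaging per-model class probabilities), which is one common choice and an assumption here; the function `soft_vote` and the example probabilities are hypothetical and not results from the paper.

```python
CLASSES = ["No Presence", "Walking", "Walking + Arm-waving"]


def soft_vote(prob_lists):
    """Average per-model class probabilities; return (winning_index, averaged_probs)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=avg.__getitem__)
    return winner, avg


# Made-up outputs from three hypothetical models for one spectrogram.
model_probs = [
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.1, 0.5, 0.4],
]
index, averaged = soft_vote(model_probs)
print(CLASSES[index], [round(p, 3) for p in averaged])
```

Averaging probabilities lets a confident majority outweigh a single dissenting model, which is one way an ensemble can reduce the run-to-run variance the abstract mentions.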

2606.02964 2026-06-03 cs.AR cs.CL cs.LG 版本更新

从非凸到强凸：曲率自适应的FTPL在线优化

Moses Charikar, Chirag Pabbaraju, Ambuj Tewari

发表机构 * Stanford University（斯坦福大学）； University of Michigan, Ann Arbor（密歇根大学安娜堡分校）

AI总结提出一种曲率自适应的FTPL算法，通过时变扰动尺度实现非凸Lipschitz损失下的最优遗憾界，并在线性累积曲率下达到对数遗憾。

AI中文摘要

曲率自适应是在线优化中的一个经典主题：对于凸Lipschitz损失，自适应方法在一般凸损失的最优$O(\sqrt{T})$遗憾和强凸性下的$O(\log T)$遗憾之间进行插值。最近的研究表明，假设可以访问近似离线优化预言机，Follow-the-Perturbed-Leader (FTPL) 即使对于在线非凸Lipschitz损失也能实现最优的$O(\sqrt{T})$遗憾，但这些保证没有利用曲率。我们证明，在非凸设置中，FTPL可以变得曲率自适应，而无需事先知道曲率如何随时间累积。我们的算法将标准FTPL的固定扰动尺度替换为仅使用过去信息选择的时变尺度。我们给出了该尺度的简单跟随者调节规则，并表明它与事后最佳选择竞争（在常数因子内）。所得到的方法对于任意非凸Lipschitz损失实现了$O(\sqrt{T})$遗憾，并随着累积曲率的增长而改进；在足够精确的预言机调用下，当累积曲率线性增长时（包括经典的强凸情形），它实现了$O(\log T)$遗憾。我们用规定的累积曲率序列（即使对于一维凸损失）的匹配下界来补充这些上界，表明最坏情况非凸遗憾与曲率驱动的快速速率之间的权衡是内在的。

英文摘要

Curvature adaptivity is a classical theme in online optimization: for convex Lipschitz losses, adaptive methods interpolate between the optimal $O(\sqrt{T})$ regret for general convex losses and $O(\log T)$ regret under strong convexity. Recent work has shown that Follow-the-Perturbed-Leader (FTPL) achieves optimal $O(\sqrt{T})$ regret even for online non-convex Lipschitz losses, assuming access to an approximate offline-optimization oracle, but these guarantees do not exploit curvature. We show that FTPL can be made curvature-adaptive in the non-convex setting, without knowing in advance how curvature will accumulate over time. Our algorithm replaces the fixed perturbation scale of standard FTPL with a time-varying scale chosen using only past information. We give a simple follow-the-leader tuning rule for this scale and show that it competes, up to constants, with the best choice in hindsight. The resulting method achieves $O(\sqrt{T})$ regret for arbitrary non-convex Lipschitz losses and improves as cumulative curvature grows; with sufficiently accurate oracle calls, it achieves $O(\log T)$ regret when cumulative curvature grows linearly, which includes the classical strongly convex regime. We complement these upper bounds with matching lower bounds for prescribed cumulative-curvature sequences, already for one-dimensional convex losses, showing that the tradeoff between worst-case non-convex regret and curvature-driven fast rates is intrinsic.

2606.02947 2026-06-03 cs.LG cs.CV 版本更新

BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks

BYORn：自举你的响应以防御大型视觉-语言模型的后门攻击

Ivan Sabolić, Marin Oršić, Josip Šarić, Sven Lončarić

发表机构 * University of Rijeka（里耶卡大学）

AI总结提出BYORn框架，通过识别并替换语义不合理的后门目标响应，打破触发器与目标输出的关联，从而在保持干净任务性能的同时提升对后门攻击的鲁棒性。

Comments Accepted to ICML 2026

AI中文摘要

监督微调是将自回归视觉-语言模型适应下游任务的主要方法。最近的研究表明，这种范式极易受到后门攻击，并且现有的防御在开放生成设置中无效。为此，我们提出了BYORn，一个鲁棒的后门防御微调框架，其动机是观察到，在给定相应图像-文本输入和预训练模型的情况下，被毒化的目标响应通常在语义上不合理。BYORn识别这种不对齐的响应，并动态地用模型生成的替代响应替换它们，从而打破触发器与目标输出之间的相关性。由此产生的目标梯度对应于干净数据分布上总体风险上界的经验估计的梯度。实验上，BYORn在保持干净任务性能的同时，持续提高了对后门攻击的鲁棒性，建立了泛化与攻击成功率之间的新权衡边界。最后，我们证明了BYORn对专门设计用于规避所提防御的自适应攻击仍然有效。

英文摘要

Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing defenses are ineffective in open-ended generation settings. In response, we propose BYORn, a backdoor-robust fine-tuning framework motivated by the observation that poisoned target responses are often semantically implausible given the corresponding image-text inputs and a pretrained model. BYORn identifies such misaligned responses and dynamically replaces them with alternative responses generated by the model, thereby breaking the correlation between triggers and target outputs. The resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution. Empirically, BYORn consistently improves robustness to backdoor attacks while preserving clean-task performance, establishing a new trade-off frontier between generalization and attack success rate. Finally, we demonstrate that BYORn remains effective against adaptive attacks specifically designed to circumvent the proposed defense.

2606.02946 2026-06-03 cs.LG cs.CR 版本更新

Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment

智取变色龙：针对直播风险评估中战术性OOD偏移的反事实解耦

Yiran Qiao, Jing Chen, Jiaqi Xu, Yang Liu, Qiwei Zhong, Xiang Ao

发表机构 * Institute of Computing Technology, Chinese Academy of Sciences（中国科学院计算技术研究所）； ByteDance China（字节跳动中国）

AI总结针对直播中恶意行为者通过战术性分布偏移（Tactical OOD Shift）规避检测的问题，提出基于潜在因果的反事实解耦框架LPCD，通过潜在层建模意图与叙事变化并强制潜在反事实一致性，实现鲁棒的风险评估。

Comments Accepted by KDD'26

AI中文摘要

直播已成为社交互动和数字商务的主要媒介，但日益受到复杂风险的困扰。该领域的一个基本挑战是战术性分布偏移（tactical OOD shift）：虽然恶意行为者保持稳定的潜在目标，但他们不断重新设计叙事包装以逃避检测。这种对抗性偏移暴露了现有OOD泛化范式的关键局限性，其假设在意图-战术紧密耦合演变和原始级反事实定义不清的情况下难以满足。在本文中，我们从潜在因果角度解决这一问题，并提出潜在预测反事实解耦（LPCD），一个用于鲁棒直播风险评估的即插即用框架。LPCD通过在潜在层建模意图和叙事变化来实现对抗性战术重新包装下的反事实推理，并强制潜在反事实一致性以将风险预测锚定在因果稳定的恶意意图上。在推理时，LPCD应用轻量级、无参数的校准以进一步缓解战术引起的分布偏移。在大规模工业数据集和在线生产流量上的大量实验表明，LPCD持续优于最先进的基线，验证了其在现实直播中调节不断演变的对抗性风险的有效性。项目页面见 https://qiaoyran.github.io/LiveStreamingRiskAssessment/ 。

英文摘要

Live streaming has emerged as a primary medium for social interaction and digital commerce, yet it is increasingly plagued by sophisticated risks. A fundamental challenge in this domain is tactical out-of-distribution (OOD) shift: while malicious actors maintain stable underlying objectives, they continuously redesign narrative packaging to evade detection. Such adversarial shifts expose critical limitations of existing OOD generalization paradigms, whose assumptions are difficult to satisfy in the presence of tightly coupled intent-tactic evolution and ill-defined raw-level counterfactuals. In this paper, we tackle this issue from a latent causal perspective and propose Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework for robust live streaming risk assessment. LPCD enables counterfactual reasoning under adversarial tactical re-packaging by modeling intent and narrative variation at the latent level, and enforces latent counterfactual consistency to anchor risk prediction on causally stable malicious intent. At inference time, LPCD applies a lightweight, parameter-free calibration to further mitigate tactic-induced distribution shifts. Extensive experiments on large-scale industrial datasets and online production traffic demonstrate that LPCD consistently outperforms state-of-the-art baselines, validating its effectiveness in moderating evolving adversarial risks in real-world live streaming. The project page is available at https://qiaoyran.github.io/LiveStreamingRiskAssessment/.

2606.02939 2026-06-03 cs.LG eess.SP 版本更新

ERP-XTTN: Interpretable Prototype-Guided Cross-Attention for Cross-Subject ERP Classification

ERP-XTTN: 可解释的原型引导跨注意力用于跨被试ERP分类

Charlotte Genevier Wyman, Leanne Hirshfield

发表机构 * University of Colorado Boulder（科罗拉多大学博尔德分校）

AI总结提出ERP-XTTN，一种基于原型引导跨注意力的架构，在无需校准的跨被试条件下实现可解释的ERP分类，并揭示分类错误的神经生理学原因。

AI中文摘要

可解释的脑机接口分类器能够在无需校准的情况下跨被试泛化仍然是一个开放的挑战。我们测试了基于原型的跨注意力是否能在部署兼容条件下提供具有竞争力且可解释的事件相关电位（ERP）分类。我们提出ERP-XTTN，一种跨注意力架构，通过仅查询-键的跨注意力（无值投影）将输入EEG片段路由到固定的差异波原型，因此分类完全依赖于注意力路由，且注意力忠实性是结构性的而非事后解释的。原型从训练折差异波的极值自动推导。我们在三个公开数据集（BNCI Horizon 2020、HRI Cursor和ERP CORE）上评估，涵盖八个ERP成分（ERN、LRP、ErrP、N170、P300、N2pc、MMN、N400），使用留一被试（LOSO）评估，并在两种通道数（3通道和全导联）下采用因果滤波，与EEGNet和基于黎曼几何的xDAWN（xDAWN+RG）对比。最佳基线与ERP-XTTN的平均差距在3通道时为0.018 AUROC，在全导联时为0.034，这源于两个大致不同的来源：相对于EEGNet的时间灵活性成本和相对于xDAWN+RG的空间利用成本，后者在全导联时由信噪比驱动。除了准确性，透明的路由揭示了黑箱模型无法发现的跨被试信号结构：假阳性与真阳性的相似度高于真阴性，表明分类错误在神经生理学上是可以解释的。ERP-XTTN在因果、无校准条件下泛化到多种ERP，并在最小导联设置下具有较小的可解释性代价。据我们所知，这是ERP CORE上首个epoch级LOSO基准测试。

英文摘要

Interpretable brain-computer interface classifiers that generalize across subjects without calibration remain an open challenge. We test whether prototype-based cross-attention can provide competitive, interpretable event-related potential (ERP) classification under deployment-compatible conditions. We propose ERP-XTTN, a cross-attention architecture that routes input EEG patches to fixed difference-wave prototypes via query-key-only cross-attention with no value projection, so classification depends entirely on attention routing and attention faithfulness is structural rather than post-hoc. Prototypes are derived automatically from extrema in the training-fold difference wave. We evaluate across three public sources (BNCI Horizon 2020, HRI Cursor, and ERP CORE) spanning eight ERP components (ERN, LRP, ErrP, N170, P300, N2pc, MMN, N400), using leave-one-subject-out (LOSO) evaluation with causal filtering at two channel counts (3-channel and full montage), against EEGNet and xDAWN with Riemannian geometry (xDAWN+RG). The mean gap between the best baseline and ERP-XTTN was .018 AUROC at 3 channels and .034 at full montage, arising from two largely distinct sources: a temporal-flexibility cost relative to EEGNet and a spatial-exploitation cost relative to xDAWN+RG, the latter driven by signal-to-noise ratio at full montage. Beyond accuracy, the transparent routing reveals cross-subject signal structure that black-box models cannot: false positives resembled true positives more than true negatives did, indicating that classification errors are neurophysiologically explicable. ERP-XTTN generalizes across diverse ERPs under causal, calibration-free conditions with a small interpretability cost at minimal montages. To our knowledge, this is the first epoch-level LOSO benchmark on ERP CORE.
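
The query-key-only routing described above can be sketched as scaled dot-product attention in which the attention distribution itself is the output and no value projection is applied. This is a simplified illustration under assumptions: the function name `qk_only_attention` is hypothetical, learned query/key projections are omitted, and the two-dimensional patches and prototypes are toy values rather than EEG data.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def qk_only_attention(patches, prototypes):
    """Return one attention distribution over prototypes per input patch.

    Patches act as queries and fixed prototypes as keys. There is no value
    projection: the routing weights are the features a classifier would consume.
    """
    d = len(prototypes[0])
    routed = []
    for patch in patches:
        scores = [sum(p * k for p, k in zip(patch, proto)) / math.sqrt(d)
                  for proto in prototypes]
        routed.append(softmax(scores))
    return routed


prototypes = [[1.0, 0.0], [0.0, 1.0]]   # toy stand-ins for difference-wave prototypes
patches = [[3.0, 0.0], [0.0, 3.0]]      # toy stand-ins for EEG patches
for weights in qk_only_attention(patches, prototypes):
    print([round(w, 3) for w in weights])
```

Since nothing downstream can mix in information through a value path, inspecting these weights shows directly which prototype each patch was routed to, which is the sense in which the abstract calls attention faithfulness structural.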

2606.02936 2026-06-03 cs.LG 版本更新

Hierarchical RBF-KAN and RBF-SKAN Architectures for Multidimensional Function Approximation and Random Field Learning

分层RBF-KAN和RBF-SKAN架构用于多维函数逼近和随机场学习

Mingtao Xia, Qijing Shen

发表机构 * University of Houston（休斯顿大学）； University of Birmingham（伯明翰大学）； University of Oxford（牛津大学）

AI总结提出并分析使用径向基函数作为激活函数的分层Kolmogorov-Arnold神经网络架构，用于逼近确定性函数和随机场模型，并证明其通用逼近性质及缓解维度灾难的潜力。

AI中文摘要

本文提出并分析了使用径向基函数作为激活函数的分层Kolmogorov-Arnold神经网络架构，用于逼近确定性函数和随机场模型。具体地，我们开发了用于多维确定性函数逼近的分层径向基函数Kolmogorov-Arnold网络（分层RBF-KAN）和用于随机场学习的分层径向基函数随机Kolmogorov-Arnold网络（分层RBF-SKAN）。从理论角度，我们为两种架构建立了通用逼近结果。特别地，我们推导了分层RBF-KAN的定量逼近估计，表明所提出的框架通过降低逼近问题的有效维度，有潜力部分缓解高维函数学习中的维度灾难。此外，我们证明了分层RBF-SKAN可以在Wasserstein-2度量下逼近随机场模型。实验上，我们表明所提出的基于径向基函数的神经网络结构能够有效学习多元函数和随机场模型。

英文摘要

In this manuscript, we propose and analyze hierarchical Kolmogorov-Arnold neural network architectures employing radial basis functions as activation functions for approximating deterministic functions and random field models. Specifically, we develop a hierarchical radial-basis-function Kolmogorov-Arnold network (hierarchical RBF-KAN) for multidimensional deterministic function approximation and a hierarchical radial-basis-function stochastic Kolmogorov-Arnold network (hierarchical RBF-SKAN) for random field learning. From a theoretical perspective, we establish universal approximation results for both architectures. In particular, we derive quantitative approximation estimates for the hierarchical RBF-KAN, showing that the proposed framework has the potential to partially alleviate the curse of dimensionality in learning high-dimensional functions by reducing the effective dimensionality of the approximation problem. Furthermore, we show that the hierarchical RBF-SKAN can approximate random field models under the Wasserstein-2 metric. Empirically, we show that the proposed radial-basis-function-based neural network structure can effectively learn multivariate functions and random field models.

2606.02920 2026-06-03 cs.LG 版本更新

遗忘并非擦除：通过传输键恢复潜在知识

Archie Chaudhury

发表机构 * Axionic Labs（Axionic实验室）

AI总结通过缝合评估协议和紧凑的任务特定传输键，发现灾难性遗忘主要由内部阶段接口漂移而非任务相关计算的永久擦除引起，并能在顺序训练后恢复大部分早期任务性能。

Comments Technical report showcasing results from transport keys

详情

AI中文摘要

灾难性遗忘通常被视为表征问题：在顺序训练后，模型似乎失去了支持早期任务性能的特征。我们挑战了这一观点的更强形式。在受控的持续学习设置中，我们发现相当一部分明显的遗忘可归因于内部阶段之间的接口漂移，而非任务相关计算的永久擦除。我们通过一种缝合评估协议研究这一现象，该协议将更新后网络的早期计算与其前身的后期计算相结合，并可选地通过紧凑的任务特定传输键进行中介。我们在系统层面将传输键描述为紧凑的接口对齐算子，从少量配对的锚点激活中估计，并通过模型缝合进行评估。在split CIFAR-100上使用ResNet风格网络时，传输键在顺序训练任务B后恢复了大部分原始任务A的性能。在紧凑视觉变换器上，我们观察到类似的恢复模式。这些结果表明，持续学习可能需要更好的机制来索引和重新访问潜在计算，而不仅仅是防止权重变化的方法。

英文摘要

Catastrophic forgetting is often framed as a representational problem: after sequential training, a model appears to lose the features that supported performance on earlier tasks. We challenge the stronger form of this view. Across controlled continual-learning settings, we find that a significant portion of apparent forgetting can be attributed to interface drift between internal stages rather than permanent erasure of task-relevant computation. We study this phenomenon through a stitched evaluation protocol that combines early computation from a post-update network with late computation from its predecessor, optionally mediated by a compact, task-specific transport key. We describe transport keys at a systems level as compact interface-alignment operators estimated from a small set of paired anchor activations and evaluated through model stitching. On split CIFAR-100 with a ResNet-style network, transport keys recover most of the original Task A performance after sequential training on Task B. On a compact vision transformer, we observe a similar recovery pattern. These results suggest that continual learning may require better mechanisms for indexing and re-accessing latent computations, not only methods that prevent weight change.

2606.02857 2026-06-03 cs.LG cs.AI 版本更新

GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

GRZO：用于大语言模型微调的组相对零阶优化

Liyan Tan, Yequan Zhao, Yifan Yang, Ruijie Zhang, Xinling Yu, Zheng Zhang

发表机构 * University of California, Santa Barbara（加州大学圣巴巴拉分校）

AI总结提出GRZO优化器，通过组相对归一化聚合每个样本的损失，在不增加前向成本的情况下将有效梯度方向数从1提升至批量大小，降低方差并改善收敛，在多个模型和任务上优于MeZO。

Comments Preprint. Under review

AI中文摘要

零阶优化是微调大语言模型时一种内存高效的反向传播替代方案，但其部署受限于梯度估计的高方差。我们提出GRZO，一种组相对零阶优化器，它为每个小批量样本抽取一个伪独立扰动，并通过组相对归一化聚合每个样本的损失，在不增加额外前向成本的情况下将有效梯度方向数从1提升至批量大小，同时保持推理级内存。我们证明GRZO在方向上是无偏的，方差随批量大小成比例缩小，从而得到比MeZO更紧的非凸收敛界。在RoBERTa-large、Llama3-8B和OPT-13B上，跨多个任务，GRZO在Llama3-8B上的平均准确率比MeZO提高$+3.0$，峰值GPU内存降低$23\%$；作为MeZO核心的即插即用替代，它平均将稀疏、低秩和量化ZO变体提升$+6.0$。

英文摘要

Zeroth-order (ZO) optimization is a memory-efficient alternative to backpropagation for fine-tuning large language models, but its deployment is limited by the high variance of gradient estimation. We propose GRZO, a Group-Relative Zeroth-Order optimizer that draws one pseudo-independent perturbation per mini-batch example and aggregates the per-example losses through group-relative normalization, raising the effective gradient-direction count from one to the batch size at no additional forward cost while preserving inference-level memory. We prove that GRZO is directionally unbiased with variance shrinking proportionally to the batch size, yielding a tighter nonconvex convergence bound than MeZO. Across RoBERTa-large, Llama3-8B, and OPT-13B over multiple tasks, GRZO improves average accuracy on Llama3-8B by $+3.0$ over MeZO at $23\%$ lower peak GPU memory; as a drop-in replacement for the MeZO core, it lifts sparse, low-rank, and quantized ZO variants by $+6.0$ on average.

2606.02852 2026-06-03 cs.LG 版本更新

RESCAST-100K: A Comprehensive Dataset for Cross-Domain Residential Load and Indoor Temperature Forecasting

RESCAST-100K：一个用于跨领域住宅负荷和室内温度预测的综合数据集

Jainam Dhruva, Yousaf Raza, A. B. Siddique, Simone Silvestri

AI总结提出RESCAST-100K大规模基准数据集，通过配置驱动接口支持跨领域泛化研究，涵盖约10万模拟住宅和5个真实数据集，用于评估迁移学习、域适应和零样本预测方法。

AI中文摘要

住宅能源负荷和室内温度的准确短期预测对于家庭能源管理系统、电网级需求响应和社区能效工作至关重要。域适应和迁移学习在改善住宅环境中常见的数据异质性和稀缺性下的预测精度方面显示出潜力。然而，由于缺乏全面的住宅数据集，进展受到限制：现有基准在目标覆盖范围上狭窄，且很少支持结构化的跨领域评估。我们引入了RESCAST-100K，这是一个用于研究跨领域泛化的大规模住宅预测基准。它提供了一个配置驱动的接口，沿着可解释的轴（包括地理、气候区、墙体结构和供暖设备）实例化源域和目标域，从而能够在受控域偏移下系统评估迁移学习、域适应和零样本泛化。该基准涵盖约10万个来自ResStock的EnergyPlus模拟的美国住宅，每个住宅包含三个耦合目标的15分钟时间序列：总负荷、暖通空调负荷和室内温度。这些数据与天气通道、暖通空调设定点以及超过40个静态建筑协变量配对。RESCAST-100K还整合了五个真实世界住宅数据集，采用统一模式，支持在相同任务上进行模拟到真实的评估。我们对循环、注意力和MLP混合器架构进行了零样本性能基准测试，涵盖跨领域、缺失输入条件和预测任务。在域偏移下，交叉注意力和MLP混合器模型始终优于循环和经典Transformer基线。RESCAST-100K旨在帮助机器学习和建筑分析社区在家庭、社区和电网规模上推进跨领域住宅预测。

英文摘要

Accurate short-term forecasting of residential energy load and indoor temperature is essential for home energy management systems, grid-level demand response, and community energy efficiency efforts. Domain adaptation and transfer learning have shown promise for improving forecasting accuracy under data heterogeneity and scarcity commonly seen in residential settings. However, progress is limited by the lack of comprehensive residential datasets: existing benchmarks are narrow in target coverage and rarely support structured cross-domain evaluation. We introduce RESCAST-100K, a large-scale residential forecasting benchmark for studying cross-domain generalization. It provides a configuration-driven interface that instantiates source and target domains along interpretable axes, including geography, climate zone, wall construction, and heating equipment, enabling systematic evaluation of transfer learning, domain adaptation, and zero-shot generalization under controlled domain shifts. The benchmark covers approximately 100,000 EnergyPlus-simulated U.S. homes derived from ResStock, with 15-minute time series for three coupled targets per home: total load, HVAC load, and indoor temperature. These are paired with weather channels, HVAC setpoints, and over 40 static building covariates. RESCAST-100K also integrates five real-world residential datasets under a unified schema, supporting sim-to-real evaluation on the same tasks. We benchmark recurrent, attention-based, and MLP-mixer architectures for zero-shot performance across domains, missing-input conditions, and forecasting tasks. Cross-attention and MLP-mixer models consistently outperform recurrent and classical transformer baselines under domain shift. RESCAST-100K is intended to aid the machine learning and building analytics communities advance cross-domain residential forecasting at home, community, and grid scale.

2606.02849 2026-06-03 cs.LG 版本更新

A Systematic Evaluation of Current Architectures in Wind Power Forecasting

风电功率预测中当前架构的系统评估

Vinicius Bortolini, Gilson Adamczuk Oliveira, Erick Oliveira Rodrigues, Matheus Henrique Dal Molin Ribeiro

AI总结本文通过系统文献综述，评估混合深度学习、模态分解和统计方法在风电区间预测中的应用，发现结合VMD或EEMD等分解技术能提高预测精度和可靠性。

DOI: 10.1109/ACCESS.2025.3628172
Journal ref: IEEE Access 2025

AI中文摘要

区间风速预测对于将风能有效集成到电力系统中至关重要，因为它考虑了风资源的固有不确定性。本研究对风电发电区间预测的混合方法进行了系统文献综述，探讨了深度学习、模态分解和统计方法的结合。为了指导论文选择，应用了潜在狄利克雷分配（LDA）进行主题建模，从而识别出模式和研究趋势。研究结果强调，将混合模型与分解技术（如变分模态分解（VMD）和集合经验模态分解（EEMD））相结合，通过在不牺牲覆盖率的情况下缩小预测区间，提高了预测准确性和可靠性。关于区间构建，大多数研究采用双模型策略，独立预测上下界。输入数据通常使用EMD、EEMD或VMD等技术进行分解，提取基于频率的分量。这些分量作为LSTM或ELM等模型的输入，分别针对每个边界进行训练。这种方法允许对不确定性进行有针对性的建模，提高了灵活性和精度。区间质量通常通过平衡覆盖率和区间宽度的指标进行评估。该综述还强调了挑战，包括缺乏标准化的评估指标、计算复杂性和有限的实际验证。总体而言，该研究强化了区间预测对风能运营的价值，并为提高模型鲁棒性和决策提供了见解。

英文摘要

Interval wind speed forecasting is essential for the efficient integration of wind energy into power systems, as it accounts for the inherent uncertainty of wind resources. This study presents a systematic literature review focused on hybrid approaches to interval forecasting of wind generation, exploring the combination of deep learning, modal decomposition, and statistical methods. To guide the paper selection, Latent Dirichlet Allocation (LDA) was applied for topic modeling, enabling the identification of patterns and research trends. The findings emphasize that integrating hybrid models with decomposition techniques, such as Variational Mode Decomposition (VMD) and Ensemble Empirical Mode Decomposition (EEMD), enhances forecast accuracy and reliability by narrowing prediction intervals without compromising coverage. Regarding interval construction, most studies adopt a dual-model strategy, independently forecasting the lower and upper bounds. Input data are commonly decomposed using techniques like EMD, EEMD, or VMD, which extract frequency-based components. These components serve as inputs to models such as LSTM or ELM, trained separately for each bound. This approach allows for targeted modeling of uncertainty, improving flexibility and precision. Interval quality is typically evaluated through metrics that balance coverage and interval width. The review also highlights challenges, including the lack of standardized evaluation metrics, computational complexity, and limited real-world validation. Overall, the study reinforces the value of interval forecasting for wind energy operations and offers insights for advancing model robustness and decision-making.
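
To make the coverage-versus-width evaluation mentioned in the abstract concrete, the sketch below computes two commonly used interval metrics: the prediction interval coverage probability (PICP) and the prediction interval normalized average width (PINAW). The abstract does not name specific metrics, so the choice of these two, the function names, and the toy numbers are all assumptions made for illustration.

```python
def picp(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def pinaw(y_true, lower, upper):
    """Mean interval width, normalized by the range of the observations."""
    target_range = max(y_true) - min(y_true)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return mean_width / target_range


# Toy wind-power values (hypothetical numbers, for illustration only).
y_true = [2.0, 4.0, 6.0, 8.0]
lower = [1.0, 3.5, 6.5, 7.0]
upper = [3.0, 4.5, 7.5, 9.0]

coverage = picp(y_true, lower, upper)   # 3 of the 4 points are covered
width = pinaw(y_true, lower, upper)     # mean width 1.5 over a range of 6.0
```

A narrower interval lowers PINAW but tends to lower PICP as well, which is why the two are usually reported together.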

2606.02842 2026-06-03 cs.LG 版本更新

Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning

光谱渐进式思维流：轻量级多模态推理

Yixian Shen, Zhiheng Yang, Qi Bi, Changshuo Wang, Shuai Wang, Jia-Hong Huang, George Floros, Prayag Tiwari, Anuj Pathania

AI总结提出光谱渐进式思维流（SpecFlow），通过在固定大小离散余弦空间中表示中间视觉思维，并利用无分类器引导将视觉状态更新与文本意图对齐，实现轻量级多模态空间推理，在保持竞争性能的同时将计算和KV缓存成本降低高达2.1倍。

Comments Accepted at ICML 2026

详情

AI中文摘要

多模态空间推理通常依赖于长链的中间文本和视觉思维，其中累积的视觉标记和密集的跨模态注意力会带来大量的计算和内存开销。为了解决这一挑战，我们提出了光谱渐进式思维流（SpecFlow），一种新颖的轻量级多模态空间推理框架，它在固定大小的离散余弦空间中表示中间视觉思维。通过利用强大的能量压缩，SpecFlow保留了全局布局和关系结构，同时仅在需要增加空间精度时引入高频细节。为了将视觉状态演化与语言意图对齐，无分类器引导使得自回归文本思维能够引导基于流的视觉工作空间/状态更新，而无需扩展上下文。因此，SpecFlow维持一个有界的视觉工作空间，其更新仅依赖于当前视觉状态和累积的文本轨迹，从而能够以稳定的延迟和内存使用进行长程推理，且与推理深度无关。实验结果表明，SpecFlow在实现竞争性或更优推理性能的同时，将计算和KV缓存成本降低了高达2.1倍。

英文摘要

Multimodal spatial reasoning often relies on long chains of intermediate textual and visual thoughts, where accumulating visual tokens and dense cross-modal attention incur substantial computation and memory overhead. To address this challenge, we propose Spectral-Progressive Thought Flow (SpecFlow), a novel lightweight multimodal spatial reasoning framework that represents intermediate visual thoughts in a fixed-size discrete cosine space. By exploiting strong energy compaction, SpecFlow preserves global layout and relational structure while introducing high-frequency details only when increased spatial precision is required. To align visual state evolution with linguistic intent, classifier-free guidance enables autoregressive textual thoughts to steer flow-based updates of the visual workspace/state without expanding the context. As a result, SpecFlow maintains a bounded visual workspace whose updates depend only on the current visual state and accumulated textual trace, enabling long-horizon inference with stable latency and memory usage independent of reasoning depth. Empirical results show that SpecFlow achieves competitive or superior reasoning performance while reducing computation and KV cache costs by up to 2.1 times.
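
The energy-compaction idea behind a fixed-size discrete cosine representation can be sketched as follows. This is not the SpecFlow implementation: it only shows, under assumed names (`dct_matrix`, `compress`, `reconstruct`) and a hypothetical test image, how keeping a small block of low-frequency DCT coefficients preserves the coarse layout of a smooth 2D signal.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)       # the DC row uses a different scale
    return mat


def compress(image, keep):
    """Keep only the keep x keep lowest-frequency DCT coefficients."""
    d_rows = dct_matrix(image.shape[0])
    d_cols = dct_matrix(image.shape[1])
    coeffs = d_rows @ image @ d_cols.T  # separable 2D DCT
    return coeffs[:keep, :keep]         # fixed-size spectral state


def reconstruct(state, shape):
    """Zero-pad the spectral state and invert the 2D DCT."""
    coeffs = np.zeros(shape)
    coeffs[: state.shape[0], : state.shape[1]] = state
    d_rows = dct_matrix(shape[0])
    d_cols = dct_matrix(shape[1])
    return d_rows.T @ coeffs @ d_cols


# A smooth 32 x 32 "layout" (hypothetical test image).
x = np.linspace(0.0, 1.0, 32)
image = np.outer(np.sin(np.pi * x), np.cos(np.pi * x))

state = compress(image, keep=8)                 # 64 numbers instead of 1024
approx = reconstruct(state, image.shape)
rel_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
```

Raising `keep` adds higher-frequency detail at the cost of a larger state, which mirrors the abstract's point about introducing high-frequency content only when more spatial precision is needed.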

2606.02841 2026-06-03 cs.LG math.AT 版本更新

Learning Coherent Representations: A Topological Approach to Interpretability

学习一致表示：一种拓扑可解释性方法

Sigurd Gaukstad, Melvin Vaupel, Valdemar Kargård Olsen, Erik Hermansen, Benjamin Dunn

发表机构 * University of Oslo（奥斯陆大学）

AI总结提出基于脑神经编码启发的“一致性”几何约束，通过Fréchet方差目标函数Coh训练模型，使特征在样本空间中形成连续区域，从而提升表示的可解释性。

Comments To appear in ICML 2026

详情

AI中文摘要

深度神经网络学习的表示中，单个特征往往缺乏可解释意义；一个神经元可能对分散、不相关的输入激活。我们引入一致性，这是一种受大脑神经编码启发的几何性质，其中像网格细胞和头部方向细胞这样的神经元对状态空间的连续区域做出响应。一个非负矩阵是一致的，如果每个行（样本）关注几何上聚类的列（特征），反之亦然，并且每个样本都由某个特征很好地描述，每个特征都被某个样本需要。我们证明一致矩阵在样本和特征的Vietoris-Rips过滤之间诱导有界交错，保证两个空间共享兼容的拓扑结构。这种几何约束促进了可解释性。例如，如果数据位于圆上，一致特征必须将该圆分割成连续的弧段。我们引入Coh，一种基于Fréchet方差的可微目标函数，在训练过程中强制执行一致性。与稀疏性（限制一个特征激活多少个样本）不同，一致性限制哪些样本，要求几何连通性而不仅仅是稀有性。这不仅产生可解释的特征，还产生可解释的特征空间。我们使用合成数据和旋转MNIST数据集在自编码器中验证Coh，并使用语言数据在BERT的词嵌入中验证Coh。

英文摘要

Deep neural networks learn representations where individual features often lack interpretable meaning; a single neuron may activate for scattered, unrelated inputs. We introduce coherence, a geometric property inspired by neural coding in the brain, where neurons like grid cells and head direction cells respond to contiguous regions of state space. A non-negative matrix is coherent if each row (sample) attends to geometrically clustered columns (features) and vice versa, and in addition every sample is well described by some feature and every feature is needed by some sample. We prove that coherent matrices induce a bounded interleaving between the Vietoris-Rips filtrations of samples and features, guaranteeing that both spaces share compatible topological structure. This geometric constraint facilitates interpretability. For example, if data lies on a circle, coherent features must tile that circle into contiguous arcs. We introduce Coh, a differentiable objective function based on Fréchet variance that enforces coherence during training. Unlike sparsity, which bounds how many samples a feature activates on, coherence bounds which samples, requiring geometric connectivity rather than only rarity. This yields not just interpretable features but an interpretable feature space. We validate Coh in an auto-encoder using synthetic and rotated MNIST datasets and in a token embedding of BERT using language data.

2606.02830 2026-06-03 cs.LG math.OC 版本更新

混合自适应卡尔曼滤波用于数据高效的联合跟踪与分类

Jiho Lee, Nisar R. Ahmed, Rebecca Russell

发表机构 * Charles Stark Draper Laboratory, Inc.（查尔斯·斯泰克·德帕尔实验室，Inc.）； Ann and H. J. Smead Department of Aerospace Engineering Sciences（安与H.J.斯梅德航空航天工程科学系）

AI总结提出一种自监督混合自适应卡尔曼滤波器，通过仅从测量中学习系统动力学和过程噪声协方差的结构化校正，同时保持滤波器的概率结构，实现低数据和大数据场景下的高精度估计与鲁棒分类。

Comments 8 pages, 4 figures

2606.02765 2026-06-03 cs.LG cs.AI 版本更新

Representational Capacity: Geometric Limits on Feature Representation in Transformer Language Models

表示能力：Transformer语言模型中特征表示的几何限制

Alexander Guha

发表机构 * Arizona State University（亚利桑那州立大学）

AI总结基于线性表示和叠加假设，通过嵌入矩阵的余弦相似度分布估计模型可支持的近正交方向数量，推导出容量公式，并发现容量对偏差ε指数敏感。

Comments 22 pages, 10 figures. Submitted to NeurIPS 2026. This is a condensed version of thesis: https://hdl.handle.net/2286/R.2.N.204857

详情

AI中文摘要

模型维度（$d_{model}$）是Transformer语言模型中的一个基本超参数，但其在设定特征表示的几何限制方面的作用仍未得到充分探索。基于线性表示和叠加假设——这些假设提出模型将特征编码为潜在空间中的近正交方向——我们开发了一个框架来估计模型可以支持多少个这样的方向。我们首先将嵌入矩阵确立为跨潜在空间近正交约束的可测量代理：成对余弦相似度分布中有意义的token关系与偶然相似性之间的边界给出了模型对完美正交性的可接受偏差ε的具体估计。将此度量应用于数十个开源模型揭示了两个类别：具有高ε且其嵌入缺乏近正交结构的模型，以及具有低ε且保持近正交结构的模型。然后我们表明，标准的Johnson-Lindenstrauss引理大大低估了训练表示的填充效率，并推导出一个调整后的容量公式，其中近正交方向的数量取决于向量与维度的比率（$k/d$）而非原始计数——这一单一修改在没有额外参数的情况下将预测误差降低了两个数量级。结合这些结果，我们将表示能力定义为模型潜在空间中可用于特征和嵌入的可区分方向上界。容量对ε指数敏感，并且较大的模型倾向于更严格的正交约束而非最大化原始容量——这一模式与几种解释（稳定性-容量权衡、可用概念的上限或模型规模的混杂因素）兼容，我们将这些留给未来工作。

英文摘要

Model dimension ($d_{model}$) is a fundamental hyperparameter in transformer language models, yet its role in setting the geometric limits of feature representation remains under-explored. Grounded in the Linear Representation and Superposition Hypotheses - which propose that models encode features as near-orthogonal directions in latent space - we develop a framework for estimating how many such directions a model can support. We first establish the embedding matrix as a measurable proxy for near-orthogonality constraints across the latent space: the boundary between meaningful token relationships and incidental similarity in the pairwise cosine similarity distribution gives a concrete estimate of the model's accepted deviation $\varepsilon$ from perfect orthogonality. Applying this metric across dozens of open-source models reveals two classes: models with high $\varepsilon$ whose embeddings lack near-orthogonal structure, and models with low $\varepsilon$ that maintain it. We then show that the standard Johnson-Lindenstrauss lemma greatly underestimates the packing efficiency of trained representations, and derive an adjusted capacity formula in which the number of near-orthogonal directions depends on the ratio of vectors to dimensions ($k/d$) rather than the raw count - a single modification that cuts prediction error by two orders of magnitude with no extra parameters. Combining these results, we define representational capacity as an upper bound on the number of distinguishable directions available for features and embeddings in a model's latent space. Capacity is exponentially sensitive to $\varepsilon$, and larger models favor tighter orthogonality constraints over maximizing raw capacity - a pattern compatible with several explanations (a stability-capacity trade-off, a ceiling on usable concepts, or confounds with model scale) that we leave to future work.
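
The measurement step described in the abstract, reading a deviation $\varepsilon$ off the pairwise cosine similarity distribution of an embedding matrix, might look roughly like the sketch below. The abstract does not specify how the boundary is located, so the quantile rule in `estimate_epsilon` is a placeholder assumption, and the random matrix merely stands in for a trained embedding table.

```python
import numpy as np


def pairwise_cosines(embeddings):
    """Return the cosine similarity of every distinct pair of rows."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    upper = np.triu_indices(len(embeddings), k=1)   # count each pair once
    return sims[upper]


def estimate_epsilon(embeddings, quantile=0.99):
    """Hypothetical rule: epsilon is a high quantile of |cosine similarity|."""
    return float(np.quantile(np.abs(pairwise_cosines(embeddings)), quantile))


rng = np.random.default_rng(0)
# Stand-in for a trained embedding matrix: 500 "tokens" in 256 dimensions.
fake_embeddings = rng.standard_normal((500, 256))

cosines = pairwise_cosines(fake_embeddings)
epsilon = estimate_epsilon(fake_embeddings)
```

For a real model, `fake_embeddings` would be replaced by the token embedding matrix, and the quantile rule by whatever boundary criterion the paper actually uses.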

2606.02762 2026-06-03 cs.LG 版本更新

Binary Road Surface Classification Using Machine Learning on Production Vehicle Signals During Cruising

基于生产车辆巡航信号的道路表面二分类机器学习方法

Vishal Hariharan, Salar Basiri, Kanwar Bharat Singh

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结针对巡航工况下传统摩擦估计方法失效的问题，提出基于特征和端到端数据驱动框架，利用车辆动力学信号统计特征对路面抓地力（干/湿）与打滑（雪/冰）进行二分类。

详情

AI中文摘要

实时道路滑溜性知识，甚至更精确的峰值抓地潜力估计，是车辆预警和干预控制系统的关键输入。通常，摩擦通过基于动力学的递归估计器计算滑移斜率来估计；然而，其有效性受到车辆动力学场景的严重限制。当车辆巡航且几乎没有滑移时，由于当前生产级传感器（如轮速传感器）和方法无法测量或准确估计微滑移（这对区分不同路面至关重要），这些方法变得无效。为了解决这一挑战，需要利用机器学习揭示巡航过程中车辆信号与路面条件之间的相关性。本文采用基于特征的框架和端到端数据驱动框架，将车辆动力学行为统计量与路面条件相关联，并执行二分类：抓地（干或湿）和打滑（雪或冰）。采用滑动窗口方法，将短时缓冲窗口内的轮速、轮扭矩、纵向加速度、转向角和横摆角速度批量输入机器学习模块，以预测道路状态。在公共道路数据上的验证结果表明，即使在巡航过程中，数据驱动方法也能正确识别路面，显示出在轮胎和车辆动力学领域实现精确数据驱动摩擦相关状态估计器的潜力。

英文摘要

Knowledge of real-time road slipperiness, or even better, a refined estimate of peak grip potential, is a critical input for vehicle warning and intervention control systems. Typically, friction is estimated through dynamics-based recursive estimators that calculate the slip slope; however, the efficacy of this approach is heavily constrained by the vehicle dynamic scenario. When the vehicle is cruising and there is little to no slip, these methods become ineffective because present-day production-grade sensors, such as wheel speed sensors, and methods can neither measure nor accurately estimate micro slip, which is crucial for distinguishing different surfaces. To address this challenge, the correlation between vehicle signals and road surface condition during cruising needs to be uncovered using machine learning. In this paper, a feature-based framework and an end-to-end data-driven framework are used to correlate the statistics of vehicle dynamics behavior with the condition of the road surface and perform binary classification into grip (dry or damp) and slip (snow or ice) conditions. A sliding-window approach is adopted to batch a short buffered window of wheel speeds, wheel torques, longitudinal acceleration, steering angle, and yaw rate, which are fed into a machine learning module for predicting the road state. Validation results on public-road data show scenarios where the data-driven method identifies the road surface correctly even during cruising, showing promise for accurate data-driven friction-related state estimators in the field of tire and vehicle dynamics.
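
The sliding-window step can be sketched as below. The abstract lists the signals but not the window length, stride, or statistics, so `window`, `step`, the mean/standard-deviation features, and the toy signal values are assumptions chosen only to show the mechanics of the feature-based variant.

```python
from statistics import mean, pstdev


def window_features(signals, window, step):
    """Compute mean and standard deviation per signal for every window.

    `signals` maps a signal name to a list of equally sampled values.
    Returns one feature dictionary per window position.
    """
    names = sorted(signals)
    length = min(len(signals[name]) for name in names)
    rows = []
    for start in range(0, length - window + 1, step):
        row = {}
        for name in names:
            chunk = signals[name][start:start + window]
            row[name + "_mean"] = mean(chunk)
            row[name + "_std"] = pstdev(chunk)
        rows.append(row)
    return rows


# Hypothetical buffered signals (six samples each).
signals = {
    "wheel_speed": [10.0, 10.2, 10.1, 10.3, 10.2, 10.4],
    "yaw_rate": [0.0, 0.1, 0.0, -0.1, 0.0, 0.1],
}

features = window_features(signals, window=4, step=2)
```

Each row of `features` would then be passed to a classifier that outputs grip or slip; the end-to-end variant would instead feed the raw windows to the model.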

2606.02754 2026-06-03 cs.LG 版本更新

$\Psi$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

$\Psi$-Bench: 评估说服性对话中人格敏感的影响力

Peixuan Han, Hongyi Du, Jiayu Liu, Yihang Sun, Yutong Liu, Jiaxuan You

发表机构 * University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）

AI总结提出 $\Psi$-Bench 基准，通过三个现实场景评估 LLM 利用用户画像进行说服的能力，发现当前模型仍有较大提升空间，且用户画像带来 18.24% 的性能提升。

详情

AI中文摘要

个性化是现代语言代理的关键能力。然而，当前研究主要将个性化代理定位为对用户偏好的被动响应者，限制了其与用户交互并主动提供建议或指导的能力。为了在真实交互中系统评估这种主动个性化，我们提出了 $\Psi$-Bench，一个评估 LLM 通过对话影响真实用户能力的基准。我们在 $\Psi$-Bench 中设计了三个涉及说服的现实交互场景，并通过从对话历史中提取的显式用户画像赋予模拟客户个性特征。我们在 $\Psi$-Bench 上评估了 10 个前沿 LLM，发现尽管大多数模型能产生连贯合理的论点，但即使是最先进的模型在说服方面仍有相当大的改进空间。我们还发现，提供客户画像访问权限平均带来 18.24% 的性能提升，突显了用户特定信息对有效说服的重要性。总体而言，我们的工作强调了人格敏感的影响力作为评估和开发更主动的个性化 LLM 代理的一个具有挑战性但实用的方向。代码可在以下网址获取：https://github.com/Hanpx20/Psi-Bench。

英文摘要

Personalization is a crucial capability of modern language agents. However, current research primarily positions personalized agents as passive responders to user preferences, limiting their ability to interact with users and provide suggestions or guidance proactively. To systematically evaluate such proactive personalization in realistic interactions, we propose $\Psi$-Bench, a benchmark for assessing LLMs' ability to influence realistic users through conversation. We design three real-world interaction scenarios that involve persuasion in $\Psi$-Bench, and endow simulated clients with personal characteristics through explicit user profiles derived from dialogue histories. We evaluate 10 frontier LLMs on $\Psi$-Bench and find that while most models can produce coherent and reasonable arguments, even state-of-the-art models still leave considerable room for improvement in persuasion. We also find that providing access to client profiles yields an average performance gain of 18.24%, highlighting the importance of user-specific information for effective persuasion. Overall, our work highlights persona-sensitive influencing as a challenging yet practical direction for evaluating and developing more proactive personalized LLM agents. Codes are available at: https://github.com/Hanpx20/Psi-Bench.

2606.02745 2026-06-03 cs.RO cs.LG 版本更新

大语言模型中用于结构推理的可视化图脚手架

Runlin Lei, Xiaokui Xiao, Zhewei Wei

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结本文提出将图结构作为大语言模型的内部推理辅助而非仅外部知识源，通过多跳问答实验发现视觉图引导相比文本化图在无直接答案提示时仍保持有效性，支持图作为组织推理的可视化脚手架。

详情

AI中文摘要

图已被用于增强大语言模型的结构化推理，主要是在测试时作为外部知识源提供给模型。在本文中，我们采取不同的视角：图对LLMs的价值不仅在于提供信息，还在于组织推理。受人类使用图结构思维导图组织分支和汇聚思维的启发，我们探究图是否可以作为推理辅助的内部形式。我们在多跳问答任务上研究这一问题，其中教师提供的推理轨迹被重写为图思维导图并用于指导学生模型。我们的实验揭示了明显的模态差距。当图结构被扁平化为文本时，一旦直接答案提示被移除，其益处变得有限。在这种抽象引导设置下，推理效率和答案质量都大幅下降。相比之下，视觉图引导在没有直接答案线索时仍然有效，并且其优势在监督微调和基于KL的蒸馏后仍然保持。上述发现支持了以下主张：图不仅应作为LLMs的外部知识结构来研究，还应作为组织推理的可视化脚手架。

英文摘要

Graphs have been used to enhance large language models (LLMs) for structured reasoning, mostly as external knowledge sources provided to models at test time. In this paper, we take a different view: the value of graphs for LLMs lies not only in supplying information, but also in organizing reasoning. Inspired by how humans use graph-structured mind maps to organize branching and converging thoughts, we ask whether graphs can serve as an internal form of reasoning assistance. We study this question on multi-hop question answering tasks, where teacher-provided reasoning traces are rewritten as graph mind maps and used to guide a student model. Our experiments reveal a clear modality gap. When graph structures are flattened into text, their benefits become limited once direct answer hints are removed. Under this abstract guidance setting, both reasoning efficiency and answer quality degrade substantially. In contrast, visual graph guidance remains effective without direct answer clues, and its advantage persists after supervised fine-tuning and KL-based distillation. These findings support the claim that graphs should be studied not only as external knowledge structures for LLMs, but also as visual scaffolds for organizing reasoning.

2606.02671 2026-06-03 cs.LG cs.AI 版本更新

幻觉可从量化LLM中间层隐藏状态线性解码

Aizierjiang Aiersilan

发表机构 * University of Macau（澳门大学）

AI总结研究开源LLM在4位量化下中间层隐藏状态是否编码线性可分的真实性信号，发现单层线性探针AUROC达0.904-1.000，优于采样方法，且信号近似线性。

详情

AI中文摘要

我们研究开源LLM是否在其隐藏状态中编码线性可分的真实性信号，以及该信号在网络哪一层最强。在三个7B-8B指令微调模型（Llama-3.1-8B、Mistral-7B、Qwen2.5-7B）以4位NF4量化加载的情况下，我们在四个幻觉基准（TruthfulQA、HaluEval-QA、FEVER和一个受控合成集）上提取每层隐藏状态，并比较四种检测方法：线性与MLP探针、INSIDE EigenScore、自一致性和注意力熵。单个中间网络层的线性探针在保留分割上达到0.904-1.000 AUROC，而基于采样的检测器在相同协议下不超过0.541 AUROC。真实性信号近似线性：MLP探针很少超过线性探针0.01 AUROC。在自然语言基准上，峰值探测层落在模型家族的一致范围内：Llama和Mistral的32层中第13-18块，Qwen的28层中第19-25块。第一块注意力熵在知识基础设置中提供互补信号（HaluEval-QA上0.866-0.941 AUROC），且无额外推理成本。该协议下采样方法的低区分性反映了配对标签评估与这些方法访问信息之间的结构性不匹配，而非这些方法的固有限制。代码和数据已发布，可在单个8 GB GPU上完全复现。

英文摘要

We investigate whether open-source LLMs encode a linearly separable truthfulness signal in their hidden states, and at which network depth this signal is strongest. Across three 7B-8B instruction-tuned models (Llama-3.1-8B, Mistral-7B, Qwen2.5-7B) loaded in 4-bit NF4 quantization, we extract per-layer hidden states on four hallucination benchmarks (TruthfulQA, HaluEval-QA, FEVER, and a controlled synthetic set) and compare four detection approaches: linear and MLP probes, INSIDE EigenScore, self-consistency, and attention entropy. A linear probe on a single mid-network layer achieves 0.904-1.000 AUROC on held-out splits, while sampling-based detectors do not exceed 0.541 AUROC under the same protocol. The truthfulness signal is approximately linear: MLP probes rarely surpass linear probes by more than 0.01 AUROC. Peak probing layers fall in a consistent band across model families on natural-language benchmarks: blocks 13-18 of 32 for Llama and Mistral, and blocks 19-25 of 28 for Qwen. First-block attention entropy provides a complementary signal in knowledge-grounded settings (0.866-0.941 AUROC on HaluEval-QA) at no additional inference cost. The low discriminability of sampling methods under this protocol reflects a structural mismatch between paired-label evaluation and the information these methods access, rather than an inherent limitation of those methods. Code and data are released for full reproducibility on a single 8 GB GPU.
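
A linear probe of the kind described above is, in essence, logistic regression on per-layer hidden states scored by AUROC. The sketch below shows that pipeline on synthetic vectors; it does not load any of the models or benchmarks named in the abstract, and the data generator, hyperparameters, and function names are assumptions made for illustration.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit logistic regression by gradient descent; returns (weights, bias)."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - labels
        w -= lr * (features.T @ error / n + l2 * w)
        b -= lr * error.mean()
    return w, b


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


rng = np.random.default_rng(0)
dim, n = 64, 400
direction = rng.standard_normal(dim)
direction /= np.linalg.norm(direction)

# Synthetic "hidden states": label 1 is shifted along a single direction.
labels = rng.integers(0, 2, size=n)
states = rng.standard_normal((n, dim)) + 3.0 * labels[:, None] * direction

train, test = slice(0, 300), slice(300, n)
w, b = train_linear_probe(states[train], labels[train])
held_out_auroc = auroc(states[test] @ w + b, labels[test])
```

With real models, `states` would be the hidden states extracted at one layer, and the probe would be refit per layer to locate the depth at which the signal peaks.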

2606.02623 2026-06-03 cs.NE cs.AI cs.LG 版本更新

Oscillatory State-Space Models as Inductive Biases for Physics-Informed Neural PDE Solvers

振荡状态空间模型作为物理信息神经PDE求解器的归纳偏置

Abhishek Chandra, Taniya Kapoor

发表机构 * KTH Royal Institute of Technology（皇家理工学院）； Wageningen University & Research（瓦赫宁根大学与研究中心）

AI总结提出一种结合振荡状态空间动力学和PDE感知空间谱的PINN方法，以改进时变PDE求解的精度和内存效率。

详情

AI中文摘要

求解时变偏微分方程（PDE）是计算科学与工程中的一个重要问题。物理信息神经网络（PINN）从控制方程中学习PDE解。然而，准确捕捉时间演化仍然具有挑战性。最近的基于序列模型的方法使用通用序列模型参数化时间演化，这些模型捕捉时间依赖性，但没有显式编码PDE解的结构化动力学。此外，它们的内存需求可能随序列长度和分辨率而不利地扩展，限制了在大规模或高维设置中的适用性。本文介绍了一种PINN方法，该方法结合了振荡状态空间动力学来表示PDE解的模态结构。所提出的方法利用基于线性振荡器的时间演化，以及空间上的PDE感知谱基。这种设计实现了闭式空间微分和边界条件的一致强制执行。该方法在前向、逆和高维PDE问题上进行了评估，包括高达100个空间维度的情况。结果表明，与最近基于序列模型的PINN方法相比，该方法提高了精度并减少了内存使用。总体而言，本文强调了将结构化动力学先验纳入神经PDE求解器的时间演化中的好处，并建议设计更符合物理和计算高效的PINN架构。

英文摘要

Solving time-dependent partial differential equations (PDEs) is an important problem in computational science and engineering. Physics-informed neural networks (PINNs) learn PDE solutions from governing equations. However, accurately capturing temporal evolution remains challenging. Recent sequence-model-based approaches parameterize time evolution using general-purpose sequence models, which capture temporal dependencies but do not explicitly encode the structured dynamics of PDE solutions. In addition, their memory requirements can scale unfavorably with sequence length and resolution, limiting applicability in large-scale or high-dimensional settings. This work introduces a PINN approach that incorporates oscillatory state-space dynamics to represent the modal structure of PDE solutions. The proposed method leverages a linear-oscillator-based temporal evolution, together with a PDE-aware spectral basis in space. This design enables closed-form spatial differentiation and consistent enforcement of boundary conditions. The method is evaluated on forward, inverse, and high-dimensional PDE problems, including cases up to 100 spatial dimensions. The results show improved accuracy and reduced memory usage compared to recent sequence-model-based PINN approaches. Overall, this work highlights the benefits of incorporating structured dynamical priors into the temporal evolution of neural PDE solvers and suggests designing more physics-aligned and computationally efficient PINN architectures.

2606.02610 2026-06-03 cs.CE cs.AI cs.LG physics.ao-ph 版本更新

Samudra 2: Scaling Ocean Emulators across Resolutions

Samudra 2: 跨分辨率扩展海洋仿真器

Yuan Yuan, Jesse Rusak, Alexander Merose, Adam Subel, Pavel Perezhogin, Alistair Adcroft, Carlos Fernandez-Granda, Laure Zanna

发表机构 * Courant Institute School of Mathematics, Computing, and Data Science, New York University（Courant学院数学、计算与数据科学系，纽约大学）； Open Athena AI Foundation, Inc.（开放Athena人工智能基金会）； Program in Atmospheric and Oceanic Sciences, Princeton University（大气与海洋科学项目，普林斯顿大学）

AI总结针对现有海洋神经仿真器在长期自回归滚动中出现的方差崩溃和印记伪影问题，提出Samudra 2，通过改进U-Net骨干网络和动态损失函数，在1°分辨率下将上层海洋全球平均温度R²从0.56提升至0.87，并将深层海洋温度误差降低约七倍，且可扩展至1/2°和1/4°分辨率。

详情

AI中文摘要

海洋环流模式（OGCM）对气候科学至关重要，但计算成本高，限制了集合规模和强迫情景。神经仿真器有望实现数量级的加速，然而现有的海洋仿真器未能将精细空间分辨率与多年自回归滚动相结合。Samudra是第一个产生多十年全球滚动的自回归神经海洋仿真器，但仅限于$1^\circ$分辨率，并表现出两种长期故障模式：方差崩溃，即时间变异性的丧失，以及印记伪影，即速度模式泄漏到深海场中。我们提出Samudra 2，它引入了更宽的U-Net骨干网络，采用修改后的ConvNeXt风格块和减小的块内扩展因子，以及一个动态损失函数，根据预测误差重新加权输出通道，从而增强缓慢演变的深海场的梯度。在$1^\circ$分辨率下，Samudra 2将上层海洋全球平均温度$R^2$从0.56提高到0.87，并将深海温度误差降低约七倍。相同的架构可扩展到$1/2^\circ$和$1/4^\circ$分辨率，在大约8年的自回归滚动中恢复中尺度涡旋和尖锐的西边界流。在单个GPU上运行，Samudra 2能够为海平面预测、海洋热吸收和气候变率研究提供更大的集合。我们在 https://openathena.ai/Ocean_Emulator/ 提供代码、文档和基准资源。

英文摘要

Ocean general circulation models (OGCMs) are essential to climate science but computationally expensive, limiting ensemble size and forcing scenarios. Neural emulators promise orders-of-magnitude speedups, yet existing ocean emulators have not combined fine spatial resolution with multi-year autoregressive rollouts. Samudra, the first autoregressive neural ocean emulator to produce multi-decade global rollouts, is limited to $1^\circ$ resolution and exhibits two long-horizon failure modes: variance collapse, the loss of temporal variability, and imprinting artifacts, in which velocity patterns leak into deep-ocean fields. We present Samudra 2, which introduces a wider U-Net backbone with modified ConvNeXt-style blocks and a reduced block-internal expansion factor, together with a dynamic loss that reweights output channels according to their prediction errors, strengthening gradients for slow-evolving deep-ocean fields. At $1^\circ$, Samudra 2 increases upper-ocean global-mean temperature $R^2$ from 0.56 to 0.87 and reduces deep-ocean temperature error by roughly sevenfold. The same architecture scales to $1/2^\circ$ and $1/4^\circ$ over approximately 8-year autoregressive rollouts, recovering mesoscale eddies and sharp western boundary currents. Running on a single GPU, Samudra 2 enables larger ensembles for sea-level projections, ocean heat uptake, and climate variability studies. We provide code, documentation, and benchmark resources at https://openathena.ai/Ocean_Emulator/.

2606.02607 2026-06-03 cs.LG cs.AI cs.CR 版本更新

Geometry-Aware Tabular Diffusion

几何感知表格扩散

David Turtora Zagardo

发表机构 * arXiv

AI总结提出几何感知表格扩散（GATD），通过向扩散去噪器注入列值差异的成对角度和长度作为输入和辅助目标，以显式建模列间关系，在10个数据集上以更少参数取得SOTA性能。

Comments Accepted to the ICML 2026 main track. 24 pages, 10 figures, 22 tables

详情

AI中文摘要

表格合成对于隐私保护的共享和增强至关重要，然而扩散模型依赖隐式机制来捕捉列间关系。我们引入了几何感知表格扩散（GATD），它通过从列值差异计算出的成对角度和长度来增强表格扩散去噪器，并将其用作输入和辅助目标。我们的MLP实例化在平均使用3.5倍更少参数（对于分类任务最多25倍）的情况下实现了最先进的基准性能：在十个数据集上，它在8/10的形状、7/10的趋势和9/10的下游效用（F1/RMSE）上获胜，将形状和趋势误差分别降低了27%和20%。默认损失权重可迁移到GNN和Transformer去噪器，在27/30个架构-数据集单元上改善了形状，在25/30上改善了趋势。一项匹配的消融实验表明，监督（而非额外输入或容量）驱动了性能提升。这表明显式关系监督是表格扩散的一种可移植归纳偏置。

英文摘要

Tabular synthesis is critical for privacy-preserving sharing and augmentation, yet diffusion models rely on implicit mechanisms to capture inter-column relationships. We introduce Geometry-Aware Tabular Diffusion (GATD), which augments tabular diffusion denoisers with pairwise angles and lengths computed from column value differences and used as inputs and auxiliary targets. Our MLP instantiation achieves state-of-the-art benchmark performance while using 3.5x fewer parameters on average (up to 25x for classification tasks): on ten datasets, it wins 8/10 Shape, 7/10 Trend, and 9/10 downstream utility (F1/RMSE), reducing Shape and Trend error by 27% and 20%. Default loss weights transfer to GNN and Transformer denoisers, improving Shape on 27/30 and Trend on 25/30 architecture-dataset cells. A matched ablation shows supervision (not extra inputs or capacity) drives the gain. This shows explicit relational supervision is a portable inductive bias for tabular diffusion.

2606.02606 2026-06-03 cs.LG cs.AI 版本更新

ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services

ReLoRA: 面向演化LLM服务快速部署的知识复用适配

Yang Xu, Zihuai Xu, Hongli Xu, Yunming Liao, Zhiwei Yao, Xitong Fu

发表机构 * School of Computer Science and Technology, University of Science and Technology of China（计算机科学与技术学院，中国科学技术大学）； Suzhou Institute for Advanced Research, University of Science and Technology of China（苏州先进研究院，中国科学技术大学）

AI总结针对基础模型频繁更新导致已有LoRA适配器失效的问题，提出ReLoRA框架，通过贝叶斯优化初始化与调度正则化微调，实现知识复用与快速重新适配，降低计算开销并提升性能。

详情

AI中文摘要

大型语言模型（LLM）越来越多地被部署为持续演化的服务，其中频繁的基础模型更新可能使先前部署的任务特定低秩适配（LoRA）适配器失效。对于管理众多下游模型服务的提供商来说，为每个更新的基础模型从头重新训练每个LoRA适配器在计算上代价高昂，并延迟服务部署。同时，更简单的替代方案，即简单地将原始LoRA适配器应用于更新的基础模型，由于适配器-骨干网络不兼容，常常导致服务质量下降。为了解决这个问题，我们提出了ReLoRA，一种知识复用的重新适配框架，能够高效地为演化的LLM服务恢复可用的LoRA适配器，同时保持或提升任务性能。具体来说，ReLoRA包含两个关键的优化步骤：1）自适应LoRA初始化利用贝叶斯优化，通过融合先前部署的任务适配器和基础模型演化的信息，构建一个兼容性感知的起点；2）带调度正则化的微调首先通过强正则化快速将适配器引导至高质量区域，随后通过放松正则化进行任务特定精炼。这种设计使得在减少重新适配开销的同时，能够快速恢复服务质量。大量实验表明，与基线相比，ReLoRA将就绪时间减少高达8.9倍，准确率提升高达4.6%。

英文摘要

Large Language Models (LLMs) are increasingly deployed as continuously evolving services, where frequent base-model updates may invalidate previously deployed task-specific Low-Rank Adaptation (LoRA) adapters. For service providers managing numerous downstream model services, retraining each LoRA adapter from scratch for every updated base model is computationally prohibitive and delays service rollout. Meanwhile, the simpler alternative, i.e., naively applying the original LoRA adapter to the updated base model, often leads to degraded service quality due to adapter-backbone incompatibility. To address this problem, we propose ReLoRA, a knowledge-reusing re-adaptation framework that efficiently restores service-ready LoRA adapters for evolving LLM services while preserving or improving task performance. Specifically, ReLoRA comprises two key optimization steps: 1) Adaptive LoRA initialization leverages Bayesian optimization to construct a compatibility-aware starting point by fusing information from both the previously deployed task adapter and the base model's evolution; 2) Fine-tuning with scheduled regularization first rapidly steers the adapter to a high-quality region via strong regularization, followed by relaxed regularization for task-specific refinement. This design enables rapid service-quality recovery with reduced re-adaptation overhead. Extensive experiments demonstrate that ReLoRA reduces time-to-readiness by up to 8.9× and improves accuracy by up to 4.6% compared to baselines.

2606.02605 2026-06-03 cs.LG cs.AI eess.IV 版本更新

基于拓扑感知排序的图Mamba生存分析

Yuanfang Chen, Peiqiang Yan, Yuntao Shou, Qian Zhao, Xiangyong Cao

发表机构 * School of Mathematics and Statistics（数学与统计学学院）； West China Science and Technology Innovation Harbor（西部科学与技术创新港）； School of Computer Science and Technology（计算机科学与技术学院）

AI总结针对WSI生存分析中Mamba模型对输入顺序敏感及单向架构限制空间结构利用的问题，提出基于拓扑感知排序的图Mamba框架TopoMamSurv，通过TAO策略、双向Mamba模块和GCN集成实现高效长程依赖建模与双向空间上下文建模。

详情

AI中文摘要

面向短租动态定价的人机协同上下文赌博机：历史预热与审批门控在线学习的结构等价性

Oleg Miroshnichenko

发表机构 * Oleg Miroshnichenko（奥列格·米罗什尼琴科）

AI总结针对短租动态定价中反馈稀疏、决策风险高的问题，提出人机协同门控赌博机框架，证明历史定价数据与在线策略预热数据的结构等价性，并设计正则化岭回归预热方法，将冷启动周期从约150轮压缩至约30轮。

详情

AI中文摘要

短租市场中的动态定价为在线学习算法带来了独特挑战：定价决策具有重大财务风险，运营商需要可解释性，且市场反馈稀疏（每个挂牌夜仅有一个预订结果）。我们提出了人机协同门控赌博机（HITL-GB）框架，其中上下文赌博机算法生成价格推荐，但人类代理保留在接受、修改或拒绝每条推荐后应用的权力。我们证明，在审批约束下，历史定价数据——在先前确定性策略下收集的——与用于初始化赌博机后验的策略内预热数据在结构上等价，从而避免了在稀疏反馈市场中使纯在线赌博机学习不可行的数周至数月的冷启动期。我们形式化了审批门控奖励信号，从历史片段推导出正则化岭回归预热程序，并在真实短租生产数据（匿名城市市场，2个房间，2022年4月至2026年4月，1461个夜间定价片段）上验证了该方法。当从层次因子化汤普森采样（HF-TS）家族初始化代理时，我们的预热程序将有效冷启动从约150轮压缩至约30轮。我们进一步论证，该结构等价结果具有领域无关性：任何法律或操作上需要人类审批的高风险领域——包括临床药物剂量、信贷发放、内容审核和放射诊断——都满足相同条件，并受益于相同的预热策略。在受监管行业中，强制性人类监督因此是一种统计资产而非部署约束。

英文摘要

Dynamic pricing in short-term rental (STR) markets presents a distinctive challenge for online learning algorithms: pricing decisions carry significant financial risk, operators require explainability, and market feedback is sparse (one booking outcome per listed night). We introduce the Human-in-the-Loop Gated Bandit (HITL-GB) framework, in which a contextual bandit algorithm generates price recommendations but a human agent retains authority to accept, modify, or reject each recommendation before it is applied. We show that under this approval constraint, historical pricing data collected under a prior deterministic policy is structurally equivalent to on-policy warm-up data for initialising the bandit's posterior, bypassing the weeks-to-months cold-start period that renders pure online bandit learning impractical in sparse-feedback markets. We formalise the approval-gated reward signal, derive a regularised ridge-regression warm-up procedure from historical episodes, and validate the approach on real STR production data (anonymised urban market, 2 rooms, April 2022 to April 2026, 1,461 nightly pricing episodes). Our warm-up procedure compresses effective cold-start from about 150 episodes to about 30 episodes when initialising agents from the Hierarchical Factored Thompson Sampling (HF-TS) family. We further argue that the structural equivalence result is domain-agnostic: any high-stakes domain where human approval is legally or operationally required (including clinical drug dosing, credit origination, content moderation, and radiological diagnosis) satisfies the same conditions and benefits from the same warm-up strategy. In regulated industries, mandatory human oversight is thus a statistical asset rather than a deployment constraint.
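
A ridge-regression warm start for a linear bandit posterior can be sketched as follows. This is a generic linear Thompson sampling construction, not the paper's HF-TS procedure: the linear reward model, the regularisation strength `reg`, the function names, and the synthetic "historical episodes" are all assumptions used to illustrate how logged data can initialise a posterior before any online rounds.

```python
import numpy as np


def ridge_warm_start(contexts, rewards, reg=1.0):
    """Build a Gaussian posterior for a linear reward model from logged data.

    Returns the precision matrix A, the vector b, and the mean theta = A^-1 b.
    """
    dim = contexts.shape[1]
    precision = reg * np.eye(dim) + contexts.T @ contexts
    b = contexts.T @ rewards
    theta = np.linalg.solve(precision, b)
    return precision, b, theta


def thompson_sample(precision, theta, rng, noise_scale=1.0):
    """Draw one parameter vector from N(theta, noise_scale^2 * A^-1)."""
    cov = noise_scale ** 2 * np.linalg.inv(precision)
    return rng.multivariate_normal(theta, cov)


rng = np.random.default_rng(0)
true_theta = np.array([0.5, -1.0, 2.0])       # hypothetical ground truth

# Hypothetical historical episodes: context features and observed rewards.
history_x = rng.standard_normal((200, 3))
history_r = history_x @ true_theta + 0.1 * rng.standard_normal(200)

precision, b, theta_hat = ridge_warm_start(history_x, history_r)
sampled_theta = thompson_sample(precision, theta_hat, rng)
```

Because the precision matrix already reflects the logged episodes, early posterior samples are concentrated near the ridge estimate, which is the intuition behind a shorter cold-start period.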

2606.02582 2026-06-03 cs.CE cs.LG cs.NA math.NA 版本更新

Applying Two-Grid Preconditioner for Subsurface Flow Simulation using Attention-enhanced Hybrid Network to Accelerate Multiscale Discretization in High-contrast Media

应用注意力增强混合网络的两网格预条件子进行高对比度介质中地下流动模拟以加速多尺度离散化

Peiqi Li, Jie Chen, Shubin Fu

发表机构 * xjtlu.edu.cn（西交利物浦大学）

AI总结提出一种结合学习与多尺度数值方法的混合框架，利用注意力增强混合网络预测多尺度基函数，并通过两网格预条件求解器加速高对比度介质中达西方程的数值求解。

详情

AI中文摘要

本文研究了强非均质、高对比度渗透率介质中达西方程的高效数值求解，提出了一种结合学习与多尺度数值方法的混合框架。学习组件用于预测混合广义多尺度有限元方法（混合GMsFEM）中的多尺度基函数，旨在减少离线阶段所需的重复局部计算。一旦预测出这些基函数，全局系统被组装，并通过两网格预条件求解器计算压力场。所提方法加速了昂贵的局部基函数构建阶段，同时保留了底层求解器的多尺度离散化和预条件迭代结构。在二维非均质达西问题上的数值实验表明，与几种代表性基于学习的方法相比，所提框架能获得更准确的最终压力重构，并在强非均质和高对比度系数下保持稳定。与传统混合GMsFEM相比，其主要优势在于基函数生成阶段的效率，而全局求解的质量仍由两网格预条件子保证。这些结果表明，通过学习加速多尺度基函数构建，同时保留成熟的全局问题数值求解器，为高分辨率达西型模拟提供了一种可行方法。

英文摘要

In this paper, we study the efficient numerical solution of Darcy equations in strongly heterogeneous media with high-contrast permeability and propose a hybrid framework that combines learning with multiscale numerical methods. The learning component is used for the prediction of multiscale basis functions in the mixed generalized multiscale finite element method (mixed GMsFEM), with the goal of reducing the repeated local computations required in the offline stage. Once these basis functions are predicted, the global system is assembled and the pressure field is computed by a two-grid preconditioned solver. The resulting method accelerates the costly local basis-construction stage while retaining the multiscale discretization and preconditioned iterative structure of the underlying solver. Numerical experiments on two-dimensional heterogeneous Darcy problems show that the proposed framework yields more accurate final pressure reconstruction than several representative learning-based methods and remains stable under strong heterogeneity and high-contrast coefficients. In comparison with the traditional mixed GMsFEM, its main advantage lies in the efficiency of the basis-generation stage, while the quality of the global solve is still ensured by the two-grid preconditioner. These results indicate that accelerating multiscale basis construction through learning, while preserving a mature numerical solver for the global problem, provides a viable approach for high-resolution Darcy-type simulations.

2606.03769 2026-06-03 math.OC cs.LG math.PR updated version

Bregman meets Lévy: Stochastic mirror descent with heavy-tailed noise in continuous and discrete time

Pierre-Louis Cauvin, Panayotis Mertikopoulos

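AI summary: Studies the robustness of stochastic mirror descent under heavy-tailed noise: introduces the Lévy mirror flow as a continuous-time model, establishes the time needed to reach ε-optimality for convex and strongly convex objectives, and derives matching discrete-time guarantees.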

Comments 68 pages, 3 figures; to appear in the proceedings of ICML 2026

AI-generated abstract

Parameter-efficient fine-tuning (PEFT) is usually treated as a cheaper alternative to full fine-tuning. We study a broader role: small trainable adapters as persistent local state on top of strong shared foundation models. In this framing, the base model provides shared competence while adapters carry instance-specific behavior such as preferences, skills, tool habits, and memory-like updates. We organize the problem around three scaling axes: Scale Up, where stronger shared priors make small local updates more useful; Scale Down, where we study how small adapters can be while remaining reliable; and Scale Out, where many persistent adapted instances coexist. MinT provides one infrastructure example for managing adapter identity, revision, provenance, evaluation, and serving residency. Together, the results suggest that PEFT can be a compact substrate for persistent personal models rather than only a budget substitute for full fine-tuning.

2606.02332 2026-06-03 cs.AI cs.CL cs.LG updated version

Forget Attention: Importance-Aware Attention Is All You Need

Suhyeong Shin, Yeongwook Yang

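Affiliations: Department of Computer Engineering

AI summary: Proposes SISA, which feeds the importance signal of a state space model directly into the attention-score computation (score-level fusion), combining global retrieval with importance ranking in language modeling.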

Comments 20 pages, 6 figures, 25 tables

Abstract

Combining attention's global retrieval with the sequential importance signal of state space models (SSMs) is the open challenge of hybrid language modeling. Transformers see everywhere but cannot prioritize; SSMs know what matters but cannot revisit. Existing hybrids -- Jamba (block level) and Hymba (head level) -- place the two in separate compartments, so neither informs the other during the attention computation itself. We propose SISA (SSM-Informed Softmax Attention), which adds an SSM-derived importance term directly inside the attention score and realizes the full operation as a single SDPA call on augmented query/key vectors -- no recurrent state, no custom kernel. At 152M / 5B tokens, SISA reaches LAMBADA-greedy 17.3% (vs. Transformer 13.9 and Mamba-3 15.5) and attains NIAH 100% from step 1K, 7x faster than Transformer's retrieval convergence; at 369M, Mamba-3 leads LAMBADA while SISA preserves perfect NIAH and stock-SDPA execution. SISA thus defines a third design axis for SSM-attention hybrids -- score-level fusion -- beyond the block-level and head-level paradigms that have dominated the field.
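As a rough illustration of the "augmented query/key vectors" idea in the abstract above, the sketch below folds an additive per-key bias into a single plain softmax-attention call by appending one extra coordinate to every query and key. The vector `importance` is a hypothetical stand-in for an SSM-derived term; the abstract does not say how SISA computes or scales that term, and causal masking is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, scale):
    """Plain softmax attention: softmax(scale * q k^T) v."""
    return softmax(scale * (q @ k.T)) @ v

def biased_attention(q, k, v, importance, scale):
    """Reference version: add a per-key bias to the scores before the softmax."""
    return softmax(scale * (q @ k.T) + importance[None, :]) @ v

def augmented_attention(q, k, v, importance, scale):
    """Same result through one plain attention call on augmented vectors.

    Each query gets the constant extra coordinate 1/scale and each key gets
    its bias, so scale * (q_aug . k_aug) = scale * (q . k) + bias.
    """
    q_aug = np.concatenate([q, np.full((q.shape[0], 1), 1.0 / scale)], axis=1)
    k_aug = np.concatenate([k, importance[:, None]], axis=1)
    return attention(q_aug, k_aug, v, scale)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
importance = rng.normal(size=6)      # hypothetical importance scores, one per key
scale = 1.0 / np.sqrt(q.shape[1])

out_ref = biased_attention(q, k, v, importance, scale)
out_aug = augmented_attention(q, k, v, importance, scale)
```

Because the bias lives inside the key vectors, a stock attention kernel can be reused unchanged, which is the property the abstract highlights.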

2606.02004 2026-06-03 cs.CL cs.LG updated version

Machine Learning for Coding Retail Product Names to Consumer-Price Categories: A Rule-plus-Bag-of-Words Pipeline with Reliability-Weighted Human-in-the-Loop Labeling

Vladimir Beskorovainyi

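Affiliations: Besk Tech; Moscow Institute of Physics and Technology (MIPT)

AI summary: Presents a rule-plus-bag-of-words pipeline with a reliability-weighted human-in-the-loop labeling protocol for mapping retail product names to consumer-price categories (e.g., UN COICOP). Experiments indicate that bag-of-words models nearly saturate the task (F1 about 0.99), while reliability-weighted voting is only marginally better than plain majority voting.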

Comments 11 pages, 3 tables. Methodology paper; illustrative experiments only, no proprietary data

DOI: 10.5281/zenodo.20503355

Abstract

Consumer-price measurement increasingly draws on alternative data sources -- scanner, web-scraped, and transaction/receipt data. A recurring obstacle is that product descriptions in such sources are short, noisy, and abbreviated, with no standard product code, so each item must first be mapped to a consumption classification (e.g., the UN COICOP scheme) before prices can be compared. This paper studies that mapping as a general, reproducible method. The pipeline is: (i) text normalization and tokenization of noisy item names; (ii) a prefix-tree (trie) rule-based pre-classifier driven by per-category key-phrases and stop-phrases; and (iii) a per-category binary confirmation model deciding whether an item belongs to a tentatively assigned category. For labels at scale we use a human-in-the-loop protocol in which annotators give a binary valid/reject judgment, aggregated by a dynamically updated reliability weight; the model joins the same rule, enabling continual fine-tuning. Our empirical finding is deflationary: in a controlled, leakage-free study (one category, real positives vs. hard negatives, five seeds), bag-of-words models essentially saturate the task (F1 about 0.99) -- a linear classifier matches a multilayer perceptron, explicit word-order (n-gram) features add nothing, and about 67 labeled examples already suffice. A Monte-Carlo study of the labeling protocol shows the reliability-weighted vote barely beats plain majority (its additive weights saturate) while Dawid-Skene recovers labels markedly better. We also discuss price-level quality control and design lessons for statistical offices considering transaction data. All figures are illustrative; no confidential data, code, or documentation is reproduced.
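The labeling protocol in the abstract above aggregates binary valid/reject judgments with per-annotator reliability weights. A minimal sketch of that kind of aggregation follows. The additive, clipped update rule in `update_weights` is an assumption made for illustration (the abstract only says the weights are dynamically updated and that they saturate), and all numbers are invented.

```python
def majority_vote(votes):
    """votes: list of booleans (True = valid). Ties resolve to reject."""
    return sum(votes) * 2 > len(votes)

def weighted_vote(votes, weights):
    """Accept when the weight behind 'valid' exceeds the weight behind 'reject'."""
    valid = sum(w for v, w in zip(votes, weights) if v)
    reject = sum(w for v, w in zip(votes, weights) if not v)
    return valid > reject

def update_weights(votes, weights, consensus, step=0.1, w_min=0.0, w_max=1.0):
    """Hypothetical additive update: reward agreement with the consensus,
    penalise disagreement, and clip to [w_min, w_max]."""
    return [
        min(w_max, max(w_min, w + (step if v == consensus else -step)))
        for v, w in zip(votes, weights)
    ]

votes = [True, False, False]      # three annotators judge one item
weights = [0.9, 0.3, 0.3]         # invented reliability weights

decision_majority = majority_vote(votes)            # two of three reject
decision_weighted = weighted_vote(votes, weights)   # 0.9 outweighs 0.3 + 0.3
new_weights = update_weights(votes, weights, decision_weighted)
```

With clipping, a consistently agreeing annotator reaches `w_max` and stops gaining influence, which is one way additive weights can saturate.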

2606.01849 2026-06-03 cs.LG cs.CL cs.CR updated version

ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

Peihan Liu, Lucas Rosenblatt, Weiwei Kong, Natalia Ponomareva, Gautam Kamath, Rachel Cummings, Roxana Geambasu, Yu Gan, Lillian Tsai, Alex Bie

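Affiliations: Columbia University; NYU; Google Research; University of Waterloo; Vector Institute; Google

AI summary: Introduces ContinuousBench, a continuously updated benchmark that tests whether differentially private synthetic text can transmit new knowledge from the original corpus. Non-private synthesis transfers knowledge effectively, whereas state-of-the-art DP synthesis methods largely fail even at high privacy budgets.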

Comments For datasets, see https://huggingface.co/ContinuousBench; for the evaluation harness, see https://github.com/plau666/ContinuousBenchEval; for an accompanying blog post, see https://peihanliu.com/posts/continuousbench.html

Abstract

Differentially private (DP) text synthesis promises to unlock sensitive corpora for model training, but it remains unclear whether DP synthetic data transmits genuinely new knowledge and capabilities present only in those corpora. This is because existing evaluations rely on tasks that are nearly solvable without training, so strong benchmark performance does not establish that DP synthesis can substitute original data access. Thus, we introduce ContinuousBench, a continuously and automatically-regenerated benchmark that measures capability gain from DP synthetic text. Each quarter, a new release pairs a never-before-seen training corpus with a derived QA set, constructed to be: (1) unsolvable sans-corpus; and (2) learnable under DP, as the tested knowledge is supported by hundreds of independent records. Researchers produce DP synthetic data from the training corpus and run our standardized training and evaluation harness on their synthetic data to measure gains. We instantiate two tracks: Geminon, a procedurally-generated dataset about fictional creatures; and News, a stream of newly crawled public news articles. Although standard benchmarks are nearly saturated, on ContinuousBench we find that non-private synthesis transfers substantial knowledge from the original corpus, while state-of-the-art DP synthesis methods generally fail to do so, even at $\varepsilon=100$.

2606.01532 2026-06-03 cs.LG cs.CC updated version

Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

Qian Li, Xinyu Mao, Shang-Hua Teng

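Affiliations: Shenzhen Research Institute of Big Data; University of Southern California

AI summary: Proves that sliding-window Transformers without positional encoding can still simulate Turing-complete Post machines through the evolution of the window, and are therefore capable of universal computation.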

Abstract

Positional encoding (PE) is widely viewed as necessary for transformers to process ordered sequences: without them, the next-token map appears permutation-invariant in its context tokens. This intuition underlies all prior universality results, which rely on positional information to prove that transformers with chain-of-thought can perform arbitrary computation, i.e., they are Turing complete. We revisit this belief in the regime most relevant to long-form reasoning, where generation proceeds through a finite sliding context window. Our opening perception is that the window mechanism itself (mildly) breaks the permutation symmetry. To distill and precisely capture the degree of this added expressiveness, we introduce an abstract autoregressive model, the HIST model, in which each update depends only on constant-size internal state and the token-count histogram within the current window. We prove that this HIST model is Turing complete by showing that the evolution of the window can reveal the token that has just left the window, which suffices to simulate Turing-complete Post machines. We then construct a sliding-window transformer over a constant-size token alphabet, without PE, and show that it can simulate the HIST model. Our result demonstrates that positional encodings are not indispensable for transformers to perform universal computation: The window sliding itself already breaks permutation symmetry and captures sufficient positional information.
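The key observation in the abstract above, that the evolution of the window reveals the token that has just left it, can be checked with a few lines of counting. The sketch below assumes a full window of fixed width and assumes the newly appended token is known (the model has just emitted it). It only illustrates the counting argument and is not the paper's HIST model or its transformer construction.

```python
from collections import Counter

def departed_token(prev_hist, curr_hist, new_token):
    """Recover the token that slid out of a full window.

    For a full window, curr_hist = prev_hist + {new_token} - {departed},
    so the departed token is the single token whose count in
    prev_hist + {new_token} exceeds its count in curr_hist.
    """
    expected = Counter(prev_hist)
    expected[new_token] += 1
    diff = expected - Counter(curr_hist)   # Counter subtraction keeps positive counts only
    (token, count), = diff.items()
    assert count == 1
    return token

def window_hist(tokens, end, width):
    """Histogram of the `width` tokens that end just before position `end`."""
    return Counter(tokens[end - width:end])

tokens = list("abacbca")
width = 3

# Slide the window one step at a time and recover what fell off the left edge,
# using only the two histograms and the newly appended token.
recovered = [
    departed_token(
        window_hist(tokens, t, width),
        window_hist(tokens, t + 1, width),
        tokens[t],
    )
    for t in range(width, len(tokens))
]
```

The recovered tokens are the original sequence delayed by the window width, which is order information obtained from permutation-invariant histograms alone.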

2606.01472 2026-06-03 cs.DC cs.AI cs.LG updated version

Hierarchical Online Prompt Mutation with Dual-Loop Feedback for Guardrailed Evidence Document Generation: A Production-Evaluation Case Study

Nataraj Agaram Sundar, Tejas Morabia

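Affiliations: eBay Inc.

AI summary: Proposes HOPM, a hierarchical online prompt mutation framework that optimizes prompt policies through dual-loop feedback (human review and an automated judge), significantly improving win rate and quality on real marketplace dispute-evidence generation.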

Comments 7 pages. Production-evaluation case study of guardrailed LLM evidence-document generation

Abstract

High-stakes production document-generation systems require language models to be adaptive, evidence-grounded, and auditable. We present HOPM, a hierarchical online prompt mutation framework evaluated on a real marketplace dispute-evidence workflow. HOPM treats prompts as online policies: a family/version router selects a prompt, deterministic guardrails attribute failures to mutable prompt-token categories, and dual feedback from human review and an automated judge updates both routing and mutation priorities. The primary evidence is an observed matched production-evaluation ablation: seven variants are evaluated on the same 600 cases each, enabling component comparisons against static prompting, manual iteration, bandit-only routing, mutation-only adaptation, human-only feedback, auto-judge-only feedback, and full dual-loop HOPM. Full HOPM improves count win rate over a static control from 34.7% to 45.7% (+11.0 pp; paired McNemar p = 1.31e-11) and amount-weighted win rate from 22.3% to 41.4% (+19.1 pp; 95% paired bootstrap CI [10.3, 28.9] pp). It also increases mean Likert quality from 3.18 to 4.40 and reduces issue-flag rate from 15.3% to 5.2%. Supporting review artifacts cover 770 generated-text reviews, 318 labeled reviewer exports, a 10-case/61-rating calibration slice, and a 70-case/350-rating OCR benchmark; these artifacts calibrate rubric, guardrail, title-risk, and OCR-risk interpretation rather than substituting for the production ablation. The paper includes control setup, sample sizes, confidence intervals, paired tests, prompt-token categories, pseudocode, schema, rubric, guardrail taxonomy, and a constructed example so the evaluation structure can be reproduced without exposing proprietary evidence.
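For readers unfamiliar with the paired statistics quoted in the abstract above, the sketch below recomputes the reported percentage-point gains and shows the form of an exact two-sided McNemar test. The abstract does not report the discordant-pair counts or say which McNemar variant was used, so the counts passed to `mcnemar_exact` are hypothetical and the result is not the paper's p-value.

```python
from math import comb

def pp_gain(control_rate, treated_rate):
    """Difference between two rates given in percent, in percentage points."""
    return round(treated_rate - control_rate, 1)

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: pairs where only the control wins; c: pairs where only the variant wins.
    Under the null hypothesis the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

count_gain = pp_gain(34.7, 45.7)      # count win rate, from the abstract
amount_gain = pp_gain(22.3, 41.4)     # amount-weighted win rate, from the abstract

p_example = mcnemar_exact(0, 10)      # hypothetical discordant counts
p_balanced = mcnemar_exact(7, 7)      # equal discordant counts give no evidence
```

Only discordant pairs enter the test, which is why evaluating every variant on the same 600 cases matters for the comparison.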

2606.01340 2026-06-03 cs.LG stat.ML updated version

Sample Complexity and Decision-Theoretic Guarantees for Bayesian Model Averaging over Decision Trees with Catalan-Exponential Priors

Livija Jakaite, Vitaly Schetinin

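Affiliations: School of Computing and Engineering, University of Bedfordshire, Luton, UK

AI summary: For Bayesian decision trees with Dirichlet-Multinomial leaf models and a Catalan-exponential prior on tree size, establishes a complete non-asymptotic theory of rational commitment thresholds, answering when Bayesian model averaging weights carry enough epistemic information to justify committing to exploitation of the averaged distribution.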

Comments 22 pages, 3 figures, Submitted to the Journal of Machine Learning Research

2606.01111 2026-06-03 cs.LG updated version

LeAP: Learnable Adaptive Permutation for Feature Selection in Heterogeneous and Sparse Recommender Systems

Yihong Huang, Chen Chu, Fei Chen, Yu Lin, Ruiduan Li, Zhihao Li

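Affiliations: Bilibili Inc.

AI summary: To address heterogeneous features, extreme sparsity, and the high computational cost of permutation in industrial recommender systems, proposes LeAP, a learnable adaptive permutation module that turns random permutation into a learnable mechanism and adds adaptive regularization for efficient feature selection; reports the best performance on four public datasets and in an industrial search ranking model serving over a billion requests per day.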

Abstract

Modern industrial recommender systems rely on thousands of heterogeneous features -- ranging from low-dimensional scalars (e.g., statistical value) to high-dimensional embeddings (e.g., user-id embeddings, MLP representations) -- to achieve high-precision predictions. Given the immense computational costs associated with training, efficient feature selection is critical. However, existing methods encounter three primary bottlenecks: (1) they typically assume uniform feature dimensions or require costly mapping to a fixed size; (2) they struggle with extreme sparsity, where the majority of features (e.g., 99%+) remain at default values; and (3) traditional permutation-based approaches are computationally prohibitive in large-scale settings. To address these challenges, we propose LeAP (Learnable Adaptive Permutation), a novel, model-agnostic plug-in module for feature selection. LeAP transforms the inefficient random permutation process into a learnable mechanism, significantly accelerating the evaluation of feature importance. In addition, we introduce an adaptive regularization strategy tailored for heterogeneous dimensions and extreme sparsity, enabling superior feature importance ranking results across asymmetric input spaces. Experiments on four public recommendation datasets demonstrate that LeAP achieves state-of-the-art performance. Furthermore, LeAP has been deployed in a large-scale industrial search ranking model with over a billion daily requests and a 2TB model parameter scale. In this real-world scenario involving 12,000+ total feature dimensions, LeAP successfully identified and removed over 3,600 redundant dimensions without performance degradation, which is 2 to 10 times the ability of compared baseline methods.
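For context, the "traditional permutation-based approach" that the abstract above calls computationally prohibitive looks roughly like the sketch below: shuffle one feature column at a time and measure how much the loss grows. This is the classical baseline, not LeAP's learnable mechanism. The toy model, data, and function name are invented for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in squared error when each column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

def predict(X):
    """Toy model that uses features 0 and 1 and ignores feature 2."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = predict(X)                                      # noise-free targets for clarity

scores = permutation_importance(predict, X, y)
```

Each score needs `n_repeats` extra forward passes per feature, so the cost grows linearly with the number of features, which is the bottleneck the abstract points to at industrial scale.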

2606.00757 2026-06-03 cs.LG updated version

RADE: Random Add-Drop Edge as a Regularizer

Danial Saber, Amirali Salehi-Abari

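Affiliations: University of California, Berkeley

AI summary: Proposes RADE, a random add-drop edge method that tackles both overfitting and over-squashing of long-range information in graph neural networks, regularizes without distribution shift through train-inference alignment, and adapts the add and drop rates automatically.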

Comments 27 pages, ICML 2026

Abstract

Graph Neural Networks (GNNs) suffer from overfitting and over-squashing of long-range information. Stochastic graph augmentations (e.g., edge deletion) regularize training against overfitting but can introduce train-inference misalignment and do not improve over-squashing. In contrast, rewiring methods improve connectivity to mitigate over-squashing, but are not designed to regularize training. We propose Random Add-Drop Edge (RADE), a stochastic graph augmentation method that jointly drops and adds edges to address both overfitting and over-squashing simultaneously. RADE is provably designed to align training and inference so that random augmentations regularize training without distribution shift, while supporting long-range communication at inference. We further propose and study a mini-batch gradient-norm balancing algorithm that adapts deletion and addition rates during training, rendering RADE hyperparameter-free in practice. Experiments on node- and graph-classification benchmarks show that RADE is a strong regularizer and mitigates over-squashing. Ablations support the roles of train-inference alignment, adaptive rate selection, and the complementary effects of random edge deletion and edge addition.
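A minimal sketch of the augmentation step described in the abstract above: each existing undirected edge is dropped with probability `p_drop`, and each absent edge is added with probability `p_add`. The function name and the fixed rates are assumptions. The paper's train-inference alignment and its adaptive rate selection are not reproduced here.

```python
import numpy as np

def random_add_drop(adj, p_drop, p_add, rng):
    """Return a symmetric augmented adjacency matrix without self-loops."""
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)              # visit each undirected pair once
    edges = adj[iu].astype(bool)
    u = rng.random(edges.shape[0])            # one uniform draw per node pair
    keep = edges & (u >= p_drop)              # existing edges that survive
    add = ~edges & (u < p_add)                # absent edges that get inserted
    upper = np.zeros_like(adj)
    upper[iu] = (keep | add).astype(adj.dtype)
    return upper + upper.T

rng = np.random.default_rng(0)
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])                                            # path graph 0-1-2-3

aug = random_add_drop(adj, p_drop=0.2, p_add=0.1, rng=rng)
unchanged = random_add_drop(adj, 0.0, 0.0, rng)
emptied = random_add_drop(adj, 1.0, 0.0, rng)
completed = random_add_drop(adj, 0.0, 1.0, rng)
```

Added edges create shortcuts between distant nodes, which is how random addition can help with over-squashing while random deletion acts as a regularizer.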

2606.00680 2026-06-03 cs.AI cs.LG updated version

Regularized Offline Policy Optimization with Posterior Hybrid Bayesian Belief

Hongqiang Lin, Pengfei Wang, Nenggan Zheng

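AI summary: Proposes Posterior Hybrid Bayesian Belief (PhyB) to quantify epistemic uncertainty in offline reinforcement learning in a unified way, and builds on it an iterative regularized policy optimization algorithm with monotonic improvement until convergence.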

Abstract

Offline reinforcement learning (RL) aims to optimize policies from pre-collected datasets. A bottleneck of this paradigm is managing epistemic uncertainty, which arises from limited data coverage (sample-level) and the ambiguity in identifying transition dynamics from finite data (model-level). To provide a unified quantification of these uncertainties, Bayesian RL has been proposed by treating the dynamics model as a random variable and maintaining a corresponding belief. Despite its theoretical appeal, policy optimization in Bayesian RL remains computationally challenging as it requires solving composite objectives with expectations. Prior methods either employ search-based techniques with poor computational scalability or impose restrictive posterior assumptions that sacrifice the adaptability of Bayesian RL. To address these limitations, we propose Posterior Hybrid Bayesian Belief (PhyB), which reformulates the expectation as a convex combination over a subset of dynamics models. Theoretical analysis demonstrates that the objective discrepancy induced by this approximation remains bounded. Based on PhyB, we develop an iterative regularized policy optimization algorithm that provides metric-agnostic guarantees for monotonic improvement until convergence. Empirical results demonstrate that PhyB achieves state-of-the-art performance on various benchmarks.

2606.00542 2026-06-03 cs.LG updated version

Rethinking Bregman Divergences in Kronecker-Factored Optimizers

Bing Liu, Wenjie Zhou, Chengcheng Zhao

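Affiliations: College of Control Science and Engineering, Zhejiang University; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences

AI summary: Uses spectral analysis of the covariance matrix to study how different Bregman divergences (Frobenius, von Neumann, LogDet) distribute the Kronecker approximation error, and proposes a subspace-aware Kronecker optimizer that applies eigenvalue-based preconditioning in the top subspace and an adaptive isotropic acceleration constant in the bottom subspace.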

Abstract

Shampoo-style optimizers approximate gradient covariance matrices using Kronecker-factored structures. Recent work~\cite{lin2026understanding} showed that such approximations can be viewed as projections under Bregman matrix divergences, leading to different Kronecker-factored preconditioners. However, it remains unclear what role the choice of divergence plays when the covariance is not exactly Kronecker-factored. We study this question through the spectrum of the covariance matrix. We show that Frobenius, von Neumann, and LogDet divergences distribute the unavoidable Kronecker approximation error differently across the covariance spectrum. We further show that their Kronecker factors are governed by divergence-weighted residuals rather than the raw approximation error, explaining how these spectral preferences are realized in the resulting preconditioners. Empirically, we observe that the top covariance eigenspace is substantially better aligned with the Hessian matrix, while the tail spectrum is much noisier and unreliable. Motivated by these findings, we propose a subspace-aware Kronecker optimizer that applies eigenvalue-based preconditioning in the top subspace and uses an adaptive isotropic acceleration constant in the bottom subspace.
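To make the Frobenius case in the abstract above concrete, the sketch below computes the nearest Kronecker product in Frobenius norm with the standard rearrangement-plus-SVD construction (often attributed to Van Loan and Pitsianis). It covers only the Frobenius divergence. The von Neumann and LogDet projections discussed in the abstract are not shown, and the test matrices are synthetic.

```python
import numpy as np

def nearest_kronecker(C, m, n):
    """Best Frobenius-norm approximation C ~ kron(A, B), A is m x m, B is n x n.

    Rearranging C so that each row holds one flattened n x n block turns the
    problem into a rank-1 approximation, which the leading singular pair solves.
    """
    R = C.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(S[0]) * U[:, 0].reshape(m, m)
    B = np.sqrt(S[0]) * Vt[0].reshape(n, n)
    return A, B          # A and B may both carry a flipped sign; kron(A, B) is unaffected

rng = np.random.default_rng(0)
Ma, Mb = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))
A0, B0 = Ma @ Ma.T, Mb @ Mb.T                 # symmetric PSD factors
C_exact = np.kron(A0, B0)

# Exactly Kronecker-factored input: the product is recovered.
A1, B1 = nearest_kronecker(C_exact, 2, 3)

# Perturbed input: some approximation error is unavoidable.
C_noisy = C_exact + 0.01 * rng.normal(size=C_exact.shape)
A2, B2 = nearest_kronecker(C_noisy, 2, 3)
err = np.linalg.norm(C_noisy - np.kron(A2, B2))
```

When the covariance is not exactly Kronecker-factored, `err` is strictly positive, and how that leftover error is spread across the spectrum is the question the paper studies for each divergence.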

2606.00494 2026-06-03 cs.LG updated version

ProjQ: Project-and-Quantize for Adapter-Aware LLM Compression

Wenya Yu, Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah

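Affiliations: University of California, Berkeley

AI summary: Proposes ProjQ, a framework that constrains quantization noise to a low-rank manifold via orthogonal subspace projection and uses an alternating algorithm to offload the dominant error to the adapter, improving both quantization-error compensation and downstream fine-tuning.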

Comments Accepted paper at ICML 2026

Abstract

Post-Training Quantization (PTQ) and Low-Rank Adaptation (LoRA) constitute the standard pipeline for efficient Large Language Model (LLM) deployment. However, applying them sequentially poses a problem: PTQ often leaves behind random noise that is spread out (across the model's weights) in a way LoRA can't easily fix, meaning that LoRA ends up wasting its limited capacity trying to fix uncorrectable noise instead of improving task performance. In this paper, we propose \textbf{ProjQ}, a novel framework for constraining quantization noise to the low-rank manifold via orthogonal subspace projection. We derive an efficient alternating algorithm that shapes the quantization noise into a low-rank structure, effectively offloading dominant error components to the subsequent adapter while minimizing the residual error in the orthogonal "uncorrectable" subspace. Our theoretical analysis demonstrates that ProjQ preserves strictly greater model plasticity for downstream tasks compared to standard PTQ. Extensive experiments on LLaMA-2, Qwen2.5 and Qwen3 confirm that ProjQ consistently outperforms existing methods in both quantization error compensation and downstream task fine-tuning, achieving up to $2\times$ lower evaluation loss for compensation and matching the performance of standard 4-bit baselines on language modeling tasks with only 3 bits. The code is available on https://github.com/yy9301/ProjQ .
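The problem statement in the abstract above can be illustrated numerically: quantize a weight matrix, then ask how much of the quantization error a rank-r correction could remove. By the Eckart-Young theorem the best rank-r correction is the truncated SVD of the error, and what remains is the tail of its singular values. The sketch uses plain round-to-nearest quantization as a generic baseline and a random matrix as stand-in weights. It does not implement ProjQ's projection or its alternating algorithm.

```python
import numpy as np

def quantize_rtn(W, bits):
    """Symmetric per-tensor round-to-nearest quantisation (generic baseline)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / levels
    return np.round(W / scale) * scale

def best_low_rank(E, r):
    """Best rank-r approximation of E and the full singular-value spectrum."""
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r], S

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))                 # stand-in for a weight matrix
E = W - quantize_rtn(W, bits=3)               # quantisation error

r = 4                                         # rank budget of a LoRA-style adapter
correction, S = best_low_rank(E, r)
residual = np.linalg.norm(E - correction)     # error no rank-r adapter can remove
total = np.linalg.norm(E)
```

If the error spectrum `S` is flat, `residual` stays close to `total`, which matches the abstract's point that unstructured quantization noise is hard for a low-rank adapter to fix.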

2606.00395 2026-06-03 cs.LG cs.AI updated version

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

Daize Dong, Junlin Chen, Haolong Jia, Jiang Liu, Jiawei Wu, Huanwei Di, Jialian Wu, Zhengzhong Liu, Zicheng Liu, Emad Barsoum, Dimitris N. Metaxas, Hongyi Wang

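Affiliations: Rutgers University; AMD; MBZUAI

AI summary: To address the instability caused by router drift in reinforcement learning for MoE large language models, proposes Predictive Routing Replay, which uses a lightweight evolution predictor to reduce routing mismatch and improve training stability and performance.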

Abstract

Mixture of Experts (MoE) Large Language Models (LLMs) achieve strong performance at scale. However, reinforcement learning (RL) on MoE-based LLMs often suffers from training instability. A root cause is router drift, i.e., expert activations can change drastically across model updates and differ between disaggregated rollout and training phases, causing large rollout--training mismatch and unstable importance sampling weights in PPO-style RL algorithms. Routing replay mitigates this issue by freezing the replay route within each reasoning trajectory, but it ignores how the router evolves under off-policy updates and thus causes router staleness. To address this limitation, we propose Predictive Routing Replay (PR2), which augments each router with a lightweight evolution predictor that learns to anticipate short-horizon router evolution. During the rollout phase, we use the predictive routing distribution to apply top-$k$ routing, enabling gradients to reach experts that are likely to become active after updates. During the training phase, we replay the resulting predicted route to retain consistency for stable importance estimation. Theoretical analysis and experiments support that PR2 reduces routing-induced mismatch, improves RL stability, and yields stronger performance across various reasoning benchmarks.
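The router drift described in the abstract above is easy to reproduce in miniature: a small change in router logits can change which experts top-k routing selects, so rollout and training no longer use the same experts. The logits and the drift vector below are invented, and the helper names are hypothetical. The sketch shows only the mismatch, not PR2's evolution predictor.

```python
import numpy as np

def top_k_route(logits, k):
    """Indices of the k largest logits (sorted) and softmax weights over them."""
    idx = np.sort(np.argsort(logits)[-k:])
    w = np.exp(logits[idx] - logits[idx].max())
    return idx, w / w.sum()

def route_overlap(idx_a, idx_b):
    """Fraction of experts shared by two routing decisions of equal size."""
    return len(set(idx_a.tolist()) & set(idx_b.tolist())) / len(idx_a)

rollout_logits = np.array([2.0, 1.9, 0.1, -1.0])      # router at rollout time
drift = np.array([0.0, -0.5, 2.0, 0.0])               # hypothetical change after updates
train_logits = rollout_logits + drift                  # router at training time

idx_rollout, w_rollout = top_k_route(rollout_logits, k=2)
idx_train, w_train = top_k_route(train_logits, k=2)
overlap = route_overlap(idx_rollout, idx_train)
```

Replaying `idx_rollout` during training restores consistency between the two phases, at the cost of using a routing decision the updated router would no longer make, which is the staleness the abstract attributes to plain routing replay.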

2606.00366 2026-06-03 cs.LG math.OC updated version

GLENS: Global Search via Learning from Solver Iterates with Diffusion Models

Anjian Li, Bartolomeo Stellato, Ryne Beeson

Affiliations: Department of Electrical and Computer Engineering, Princeton University; Department of Operations Research and Financial Engineering, Princeton University; Department of Mechanical and Aerospace Engineering, Princeton University

AI summary: Proposes GLENS, which uses diffusion models to learn the local geometry traced out by solver iterates and generates high-quality, diverse initial guesses to speed up global search for multimodal non-convex optimization problems.

Abstract

We consider the problem of generating a large collection of initial guesses for local minima of multimodal non-convex continuous optimization problems. The goal is for these initial guesses to be high-quality (i.e., a numerical solver converges quickly) and diverse (i.e., represent many different local minima). Identifying multiple locally optimal solutions enables flexible downstream decision-making, but typically requires expensive global search. Existing data-driven methods predict initial guesses using only the final converged optima from offline solver runs, which discards information about the local neighborhoods of solutions and limits the available training data. We propose GLENS (Global Search via Learning from Solver Iterates), a data-efficient global search method that leverages intermediate solver iterates as free data augmentation. GLENS consists of two components: a neighborhood structure model that uses diffusion models to learn the local geometry around optima conditioned on problem parameters, and a solver behavior model that learns refinement directions to further guide samples towards nearby optima during diffusion sampling. Experiments on modified non-convex benchmark problems and a two-robot obstacle-avoidance navigation problem show that GLENS generates high-quality initial guesses while preserving the multimodal distribution of diverse local optima. The resulting initial guesses lead to faster solver convergence across different problem settings and solvers. We also analyze how key hyperparameter choices affect the performance.

2606.00188 2026-06-03 cs.GR cs.CV cs.LG updated version

PaintBench: Deterministic Evaluation of Precise Visual Editing

Kai Xu, Ellis Brown, Shrikar Madhu, Rob Fergus, He He, Saining Xie

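Affiliations: New York University

AI summary: Introduces PaintBench, a benchmark that procedurally generates 20 basic visual editing operations for deterministic pixel-level evaluation; finds that current models perform poorly (best mIoU 17.1%) and analyzes the effects of task type and scene variation.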

Comments Project Page: https://paintbench.github.io/

Abstract

While current multimodal models are proficient at open-ended visual editing, executing precise single-answer edits remains an important obstacle. To probe this challenge, we introduce PaintBench, a dynamically scalable benchmark targeting 20 fundamental precise visual editing operations across four categories: geometric transformation, structural manipulation, color change, and symbolic reasoning. Procedural generation with configurable complexity enables an effectively infinite, contamination-resistant evaluation suite, and deterministic pixel-level evaluation eliminates reliance on bias-prone judge models. Across 11 image editing models, we find overall low performance, with the current highest-performing industry leader scoring only 17.1% (mIoU). Task decomposition reveals especially challenging operation types (geometric transformation, most structural manipulation, formula-based color change) and model-specific specializations. Fine-grained benchmark diagnostics further show performance degradations induced by scene variations in object count, background complexity, color scheme, and edit-region size. To test generalization of PaintBench scores to applied task performance, we create a procedural, deterministic evaluation for data visualization editing (TinyGrafixBench) and find strong linear correlation with PaintBench scores ($R^2 = 0.91$, $p < 0.001$). Altogether, PaintBench provides a rigorous foundation for measuring and driving progress in precise multimodal visual editing.
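The abstract above reports scores as mIoU from deterministic pixel-level evaluation. A minimal sketch of intersection-over-union on binary masks is given below. How PaintBench derives masks from edited images and averages across tasks is not stated in the abstract, so the mask construction and the simple mean used here are assumptions.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0                      # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, target).sum() / union)

def mean_iou(pairs):
    """Unweighted mean IoU over (prediction, target) mask pairs."""
    return float(np.mean([iou(p, t) for p, t in pairs]))

target = np.zeros((4, 4), dtype=int)
target[:, 0:2] = 1                      # ground-truth edit covers columns 0 and 1

shifted = np.zeros((4, 4), dtype=int)
shifted[:, 1:3] = 1                     # prediction is off by one column

exact_score = iou(target, target)
shifted_score = iou(shifted, target)    # overlap 4 pixels, union 12 pixels
score = mean_iou([(target, target), (shifted, target)])
```

Because the metric is a pure function of pixels, two runs on the same outputs always agree, which is what removes the need for a judge model.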

2605.30952 2026-06-03 cs.LG 版本更新

Spectral Anatomy of Quantum Gaussian Process Kernels

量子高斯过程核的谱解剖

Jian Xu, Chao Li, Guang Lin, Yuning Qiu, Delu Zeng, John Paisley, Qibin Zhao

发表机构 * RIKEN iTHEMS ； RIKEN AIP ； South China University of Technology（华南理工大学）； Columbia University（哥伦比亚大学）

AI总结通过归一化谱熵S(K)/log n统一解释了量子高斯过程回归中指数加速失效与后验病理现象，并证明该诊断量在多种量子与经典核上具有普适性，且在IBM Heron硬件上实现了低误差迁移。

AI中文摘要

两个近期结果重塑了量子高斯过程（QGPs）。一方面，\citet{lowe2025assessing} 排除了在典型、良态条件下基于HHL的QGP回归声称的指数加速；另一方面，一项独立工作表明，高表达性量子核存在后验病理，破坏了贝叶斯优化。我们证明这些看似无关的现象由同一个量控制：核Gram矩阵的归一化谱熵 $S(K)/\log n$。我们证明了Nyström近似误差的Cauchy–Schwarz尾部界、以Bach自由度 $d_σ(K)$ 表示的有限样本方差收缩恒等式，以及通过目标在核本征基中的内在维数对 \emph{依赖于目标} 的最优熵的表征。实验上，该诊断量与核无关：硬件高效、matchgate、IQP \emph{以及} RBF/Matérn/RFF/深度核族在去量子化、ECE和方差收缩面板上全部坍缩到相同的 $S/\log n$ 曲线上。NLL最佳点位于光滑目标的高熵和带限量子数据目标的低熵。该诊断量从模拟器迁移到IBM Heron硬件，在 $n_q = 4$ 的 $24$ 种配置中 $S/\log n$ 的中位绝对误差为 $3.2\%$，平均误差为 $5.2\%$，其中matchgate和IQP的平均误差在 $5\%$ 以内，单个HE配置返回 $30\%$ 的异常值，重新运行时降至 $0.5\%$（归因于校准漂移）；相同的诊断量迁移到第二个Heron后端（平均误差 $2.7\%$）以及原始后端上的 $n_q = 6$ 扩展（平均误差 $1.7\%$）。全程未应用误差缓解。

英文摘要

Two recent results have reshaped quantum Gaussian processes (QGPs). On the one hand, \citet{lowe2025assessing} rule out the exponential speedups claimed by HHL-based QGP regression in the typical, well-conditioned regime; on the other, an independent line of work shows that highly expressive quantum kernels suffer posterior pathologies that break Bayesian optimization. We show that these seemingly unrelated phenomena are governed by the same quantity: the normalized spectral entropy $S(K)/\log n$ of the kernel Gram matrix. We prove a Cauchy--Schwarz tail bound on Nyström approximation error, a finite-sample variance-contraction identity in terms of Bach's degrees of freedom $d_σ(K)$, and a characterization of the \emph{target-dependent} optimal entropy via the intrinsic dimension of the target in the kernel eigenbasis. Empirically, the diagnostic is kernel-agnostic: hardware-efficient, matchgate, IQP \emph{and} RBF/Matérn/RFF/deep-kernel families all collapse onto identical $S/\log n$ curves on dequantization, ECE, and variance-contraction panels. The NLL sweet spot lies at high entropy for smooth targets and at low entropy for band-limited quantum-data targets. The diagnostic transfers from simulator to IBM Heron hardware with median absolute error $3.2\%$ and mean $5.2\%$ in $S/\log n$ across $24$ configurations at $n_q = 4$, with matchgate and IQP within $5\%$ mean and a single HE configuration returning a $30\%$ outlier that drops to $0.5\%$ on rerun (attributed to calibration drift); the same diagnostic transfers to a second Heron backend (mean error $2.7\%$) and to an $n_q = 6$ scale-up on the original backend (mean error $1.7\%$). No error mitigation is applied throughout.
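To make the diagnostic concrete, the following is a minimal sketch of a normalized spectral entropy. It assumes that $S(K)$ is the Shannon entropy of the trace-normalized eigenvalue spectrum of the Gram matrix, which the abstract does not spell out; the function name and the tolerance `tol` are hypothetical. The eigenvalues are taken as input; in practice they could come from a symmetric eigensolver such as `numpy.linalg.eigvalsh`.

```python
import math


def normalized_spectral_entropy(eigenvalues, tol=1e-12):
    """Return S / log(n) for a kernel spectrum (assumed definition, see text)."""
    n = len(eigenvalues)
    if n < 2:
        raise ValueError("need at least two eigenvalues")
    # Clip small negative values caused by round-off in a PSD Gram matrix.
    lam = [max(float(x), 0.0) for x in eigenvalues]
    total = sum(lam)
    if total <= tol:
        raise ValueError("spectrum carries no mass")
    # Trace-normalize so the spectrum behaves like a probability vector.
    probs = [x / total for x in lam]
    # Shannon entropy; zero-mass eigenvalues contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > tol)
    return entropy / math.log(n)
```

Under this assumed definition a flat spectrum gives a value of 1 and a rank-one spectrum gives 0, so the quantity measures how evenly the kernel spreads its mass across eigendirections.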

2605.30789 2026-06-03 cs.LG cs.AI 版本更新

Automatic Layer Selection for Hallucination Detection

幻觉检测的自动层选择

Xinpeng Wang, William X. Cao, Andrew Gordon Wilson, Zhe Zeng

发表机构 * University of Washington（华盛顿大学）

AI总结针对大语言模型中幻觉检测的层选择问题，提出无需训练的FEPoID准则，自动识别最优中间层，并结合截断策略提升检测性能。

Comments Accepted at ICML 2026

AI中文摘要

最近关于幻觉检测的研究表明，在大语言模型（LLMs）中，与幻觉相关的信号在中间层比在最后一层编码得更强。尽管越来越多的研究试图利用这一特性进行幻觉检测，但如何自动选择高性能层仍未得到充分探索，且缺乏针对此目的的原则性方法。为填补这一空白，我们首先提出了几个关于为何这些信号出现在中间层的假设，并评估了相应的自动层选择准则，这些准则适用于不同的LLM架构、规模和任务，涵盖了问答和摘要幻觉检测基准。然而，我们发现这些准则均不能持续提供令人满意的性能。因此，我们提出了一种新的选择准则——第一有效本征维度峰值（FEPoID），它能够一致地识别最优或接近最优的层，并优于上述准则和现有的幻觉检测基线。FEPoID无需训练，且计算开销可忽略不计。此外，我们研究了LLM的生成行为，并引入了一种简单而有效的截断策略，该策略进一步放大了与幻觉相关的信号，并显著提高了整体检测性能。代码公开于 https://github.com/DesoloYw/Automatic-Layer-Selection-for-Hallucination-Detection.git

英文摘要

Recent studies on hallucination detection have shown that hallucination-related signals are more strongly encoded in intermediate layers than in the final layer of large language models (LLMs). Although a growing body of work has sought to exploit this property for hallucination detection, how to automate the selection of high-performing layers remains underexplored, and principled methods for this purpose are still lacking. To address this gap, we first propose several hypotheses for why such signals emerge in intermediate layers and evaluate corresponding criteria for automatic layer selection across diverse LLM architectures, scales, and tasks, covering both question answering and summarization hallucination detection benchmarks. However, we find that none of these criteria consistently delivers satisfactory performance. We therefore propose a new selection criterion, First Effective Peak of Intrinsic Dimension (FEPoID), which consistently identifies optimal or near-optimal layers and outperforms both the aforementioned criteria and existing hallucination detection baselines. FEPoID is training-free and incurs negligible computational overhead. In addition, we study the generation behaviors of LLMs and introduce a simple yet effective truncation strategy, which further amplifies hallucination-related signals and substantially improves overall detection performance. Code is publicly available at https://github.com/DesoloYw/Automatic-Layer-Selection-for-Hallucination-Detection.git

2605.25902 2026-06-03 cs.LG 版本更新

Reading the Finetuning Prior: Verbatim Content Recovery via Contrastive Decoding Diffing

读取微调先验：通过对比解码差异进行逐字内容恢复

Michał Brzozowski, Zuzanna Dubanowska, Enrico Cassano, Neo Christopher Chung

发表机构 * Samsung AI Center（三星人工智能中心）； University of Turin（都灵大学）； University of Warsaw（华沙大学）

AI总结提出对比解码差异（CDD）方法，仅基于输出层logit分布，无需权重或内部访问，即可从微调模型中逐字恢复植入事实，并在多种架构和场景下优于白盒方法。

AI中文摘要

窄微调的语言模型会逐字记忆植入的内容，但在无法访问模型权重或训练数据的情况下，审计已部署模型所学内容仍然是一个开放挑战。最近的研究表明，基础模型与微调模型之间的激活差异携带了微调领域的可读痕迹；最先进的激活差异透镜（ADL）可以恢复模糊的领域级描述，但需要完全的“白盒”访问模型内部。我们引入了对比解码差异（CDD），一种仅操作输出级logit分布的模型差异方法，无需权重访问、无需层选择、无需针对模型调参，却能恢复植入的事实。CDD包含三个思想：绕过聊天模板以暴露原始微调先验，用最大模糊的前缀填充种子生成，以及在每个解码步骤放大微调模型与基础模型之间的logit空间差异。一个单一的默认配置即可逐字恢复植入的事实——精确的药物名称、投票数、物理测量值和程序细节——涵盖四种架构（1B-32B参数），尽管访问更少且运行速度约快170倍，但全面优于ADL。此外，CDD揭示了意外的数据管道伪影：LLM数据生成器通过模式崩溃引入的虚构角色泄露到模型权重中，并被CDD提取出来，据我们所知，这构成了首个从数据生成器伪影到模型权重再到恢复输出的端到端指纹链。我们在真实领域微调设置上进行了验证，在所有单数据集非CoT变体上实现了近乎完美的恢复，并在混合数据集设置中正确识别了所有四个数据集。CDD作为一种灰盒方法成功超越了白盒基线，凸显了其在AI系统透明度和问责性方面的实际效用。

英文摘要

Narrowly finetuned language models memorize implanted content verbatim, but auditing what a deployed model has been taught, without access to its weights or training data, remains an open challenge. Recent work shows that activation differences between base and finetuned models carry readable traces of the finetuning domain; the state-of-the-art Activation Difference Lens (ADL) recovers a vague domain-level description but requires full "white-box" access to model internals. We introduce Contrastive Decoding Diffing (CDD), a model diffing method that operates on output-level logit distributions only, with no weight access, no layer selection, and no per-model tuning, yet recovers implanted facts. CDD consists of three ideas: bypassing the chat template to expose the raw finetuning prior, seeding generation with maximally vague pre-fills, and amplifying the logit-space difference between finetuned and base models at each decoding step. A single default configuration recovers implanted facts verbatim -- exact drug names, vote counts, physical measurements, and procedural details -- across four architectures (1B--32B parameters), uniformly outperforming ADL despite less access and running ~170x faster. Furthermore, CDD surfaces unintended data pipeline artifacts: a fictional persona introduced by the LLM data generator via mode collapse leaked into model weights and was extracted by CDD, constituting to our knowledge the first demonstrated end-to-end fingerprinting chain from data generator artifact to model weights to recovered output. We validate on real-domain finetuning settings, achieving near-perfect recovery across all single-dataset non-CoT variants and correctly identifying all four datasets in the mixed-dataset setting. CDD's success as a grey-box method outperforming white-box baselines underscores its practical utility for transparency and accountability in AI systems.
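The third idea, amplifying the logit-space difference at each decoding step, can be illustrated with a small sketch. The update rule `finetuned + alpha * (finetuned - base)`, the parameter `alpha`, and both function names are assumptions made for illustration; the abstract does not give the exact formula used by CDD.

```python
def amplified_logits(finetuned, base, alpha=1.0):
    """Push finetuned logits further along the direction (finetuned - base)."""
    if len(finetuned) != len(base):
        raise ValueError("logit vectors must share a vocabulary")
    # alpha = 0 recovers the finetuned model; larger alpha stresses what changed.
    return [f + alpha * (f - b) for f, b in zip(finetuned, base)]


def greedy_token(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)
```

In this toy form, a token whose score rose during finetuning can overtake a token that both models already rank highly, which is the intuition behind surfacing implanted content rather than generic continuations.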

2603.05691 2026-06-03 cs.LG stat.ML 版本更新

WildRoadBench: 面向视觉语言模型与自主智能体的野外航拍道路损伤定位基准

Bingnan Liu, Chenhang Cui, Rui Huang, Jiani Luo, Zhirong Shen, Tinghao Wang, Xiande Huang, Lingbei Meng, Fei Shen, An Zhang

发表机构 * University of Electronic Science and Technology of China（电子科技大学）； National University of Singapore（新加坡国立大学）； De Artificial Intelligence Lab（德人工智能实验室）； The Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））； University of Science and Technology of China（中国科学技术大学）

AI总结提出WildRoadBench基准，通过VLM直接定位和LLM驱动智能体自主研究两种协议，评估模型在航拍道路损伤定位上的性能，发现现有方法在野外场景下仍不可靠。

Comments Preprint. Under review. 4 figures, 6 tables

AI中文摘要

我们介绍了WildRoadBench，一个野外航拍道路损伤定位基准，它在一个专业标注的无人机语料库上，将视觉语言模型的直接视觉定位与LLM驱动的智能体的自主研究与工程相结合。在两种协议下评估相同的图像集和相同的每类AP_50指标。VLM轨道衡量固定VLM是否能在统一的提示、解码和解析流程下，从一张图像和一个简短提示中定位特定领域的损伤。智能体轨道衡量一个自主智能体，在仅给定书面任务简介、少量探索切片和固定交互预算的情况下，能否搜索公共网络、调整预训练组件、编写训练和推理代码，并通过隐藏保留集上的标量反馈预言机提交预测。我们对广泛的闭源前沿模型和开源VLM以及几个前沿LLM驱动的智能体进行了基准测试。在野外环境中，两种途径都远未达到可靠性能：闭源前沿模型在VLM排行榜上领先，但仍留下超过一半的指标未达到；开源定位器远低于它们，且新一代或推理型变体并未持续改进定位；每个开源模型的小目标均崩溃；尽管智能体拥有更丰富的功能，但仍落后于最强的VLM，且有几个未能在预算内提交有效结果。我们在https://anonymous.4open.science/r/wildroadbench-0607发布代码和数据，以支持可重复的后续研究。

英文摘要

We introduce WildRoadBench, a wild aerial road-damage grounding benchmark that couples direct visual grounding by vision-language models with autonomous research-and-engineering by LLM-driven agents on a single professionally annotated UAV corpus. The same image set and the same per-class AP_50 metric are evaluated under two protocols. The VLM Track measures whether a fixed VLM can localise domain-specific damage from one image and one short prompt under a unified prompting, decoding and parsing pipeline. The Agent Track measures whether an autonomous agent, given only a written task brief, a small exploratory slice and a fixed interaction budget, can search the public web, adapt pretrained components, write training and inference code, and submit predictions through a scalar-feedback oracle on a hidden holdout. We benchmark a broad pool of closed-source frontier models and open-source VLMs together with several frontier LLM-driven agents. Both routes remain far from reliable performance in this wild setting: closed-source frontier models lead the VLM leaderboard but still leave more than half of the metric on the table; open-source grounders plateau well below them, and newer generations or reasoning-style variants do not consistently improve grounding; small targets collapse for every open-source model; agents lag the strongest VLM despite richer affordances, and several fail to land a valid submission within the budget. We release the code and data at https://anonymous.4open.science/r/wildroadbench-0607 to support reproducible follow-up research.

2605.19805 2026-06-03 cs.LG cs.AI stat.ML 版本更新

Latent Laplace Diffusion for Irregular Multivariate Time Series

潜在拉普拉斯扩散用于不规则多元时间序列

Zinuo You, Jin Zheng, John Cartlidge

发表机构 * University of Cambridge（剑桥大学）

AI总结提出潜在拉普拉斯扩散（LLapDiff）生成框架，通过低维潜在轨迹和拉普拉斯域参数化实现不规则时间序列的长时预测与缺失值插补。

Comments Accepted as a Spotlight at ICML 2026. The Version of Record will appear in Proceedings of Machine Learning Research (PMLR). 27 pages, 5 figures. Code: https://github.com/pixelhero98/LLapDiffusion

AI中文摘要

不规则多元时间序列对长期预测提出了权衡：离散方法通过重新网格化可能扭曲时间结构，而连续时间模型通常需要容易漂移的序贯求解器。为弥合这一差距，我们提出了潜在拉普拉斯扩散（LLapDiff），一种生成式框架，将目标建模为低维潜在轨迹，从而无需逐步积分物理时间即可实现全范围生成。我们利用受随机端口-哈密顿动力学启发的稳定模态参数化来引导逆向过程，并通过可学习的共轭复极点参数化其在拉普拉斯域中的均值演化，从而能够在不规则时间戳上直接评估。我们还通过更新平均分析将连续动力学与不规则观测联系起来，该分析将采样间隙映射到有效事件域极点，并激发了间隙感知的历史总结器。大量实验表明，LLapDiff在长期预测中优于基线，其连续时间生成性质通过在同一模型的历史时间戳上查询，支持缺失值插补。代码可在https://github.com/pixelhero98/LLapDiffusion获取。

英文摘要

Irregular multivariate time series impose a trade-off for long-horizon forecasting: discrete methods can distort temporal structure via re-gridding, while continuous-time models often require sequential solvers prone to drift. To bridge this gap, we present Latent Laplace Diffusion (LLapDiff), a generative framework that models the target as a low-dimensional latent trajectory, enabling horizon-wide generation without step-by-step integration over physical time. We guide the reverse process utilizing a stable modal parameterization motivated by stochastic port-Hamiltonian dynamics, and parameterize its mean evolution in the Laplace domain via learnable complex-conjugate poles, enabling direct evaluation over irregular timestamps. We also link continuous dynamics to irregular observations through renewal-averaging analysis, which maps sampling gaps to effective event-domain poles and motivates a gap-aware history summarizer. Extensive experiments show that LLapDiff improves over baselines in long-horizon forecasting, and its continuous-time generative nature supports missing-value imputation by querying the same model at historical timestamps. Code is available at https://github.com/pixelhero98/LLapDiffusion.

2605.18740 2026-06-03 cs.CV cs.AI cs.CL cs.LG 版本更新

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Vision-OPD：通过在线策略自蒸馏学习多模态大语言模型的精细细节

Qianhao Yuan, Jie Lou, Xing Yu, Hongyu Lin, Le Sun, Xianpei Han, Yaojie Lu

发表机构 * Tsinghua University（清华大学）

AI总结提出Vision-OPD框架，通过在线策略自蒸馏将模型自身的局部区域感知能力迁移到全局图像策略，提升多模态大语言模型对细粒度视觉理解的准确性。

Comments Project page: https://github.com/VisionOPD/Vision-OPD

AI中文摘要

多模态大语言模型（MLLMs）在细粒度视觉理解方面仍然存在困难，答案往往依赖于全图中微小但决定性的证据。我们观察到一种区域到全局的感知差距：当以证据为中心的裁剪图像为条件时，同一MLLM回答细粒度问题的准确率高于以对应全图为条件，这表明许多失败源于难以聚焦于相关证据，而非局部识别能力不足。受此观察启发，我们提出Vision-OPD（视觉在线策略蒸馏），一种区域到全局的自蒸馏框架，将模型自身特权的区域感知迁移到其全图策略。Vision-OPD从同一MLLM实例化两个条件策略：一个以裁剪图像为条件的教师和一个以全图为条件的学生。学生生成在线策略轨迹，Vision-OPD沿这些轨迹最小化教师和学生下一个词元分布之间的词元级差异。这使得模型能够内化视觉放大的好处，而无需外部教师模型、真实标签、奖励验证器或推理时工具使用。在多个细粒度视觉理解基准上的实验表明，Vision-OPD模型在性能上可与更大的开源、闭源以及“思考图像”智能体模型相媲美或更优。

英文摘要

Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but decisive evidence in the full image. We observe a regional-to-global perception gap: the same MLLM answers fine-grained questions more accurately when conditioned on evidence-centered crops than on the corresponding full images, suggesting that many failures stem from difficulty focusing on relevant evidence rather than insufficient local recognition ability. Motivated by this observation, we propose Vision-OPD (Vision On-Policy Distillation), a regional-to-global self-distillation framework that transfers the model's own privileged regional perception to its full-image policy. Vision-OPD instantiates two conditional policies from the same MLLM: a crop-conditioned teacher and a full-image-conditioned student. The student generates on-policy rollouts, and Vision-OPD minimizes token-level divergence between the teacher and student next-token distributions along these rollouts. This enables the model to internalize the benefit of visual zooming without external teacher models, ground-truth labels, reward verifiers, or inference-time tool use. Experiments on multiple fine-grained visual understanding benchmarks show that Vision-OPD models achieve competitive or superior performance against much larger open-source, closed-source, and "Thinking-with-Images" agentic models. The code is available at https://github.com/VisionOPD/Vision-OPD
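The token-level objective can be sketched as an average per-token divergence along one student rollout. The choice of forward KL (teacher relative to student) and the function names are assumptions; the abstract only says "token-level divergence" and does not fix the direction or weighting.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) for a single next-token distribution."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def rollout_divergence(teacher_steps, student_steps):
    """Mean per-token KL over the positions of one on-policy rollout."""
    if not teacher_steps or len(teacher_steps) != len(student_steps):
        raise ValueError("rollouts must be non-empty and aligned")
    pairs = zip(teacher_steps, student_steps)
    return sum(token_kl(t, s) for t, s in pairs) / len(teacher_steps)
```

Here `teacher_steps` would hold the crop-conditioned logits and `student_steps` the full-image logits, both evaluated on the tokens the student itself generated.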

2605.19262 2026-06-03 cs.LG cs.CR 版本更新

Backdooring Masked Diffusion Language Models

掩码扩散语言模型的后门攻击

Daniel Yiming Cao, Chengzhong Wang, Sheng-Yen Chou, Chengyu Huang, Pin-Yu Chen, Shengwei An

发表机构 * Cornell University（康奈尔大学）； Virginia Tech（弗吉尼亚理工学院）； IBM Research（IBM研究院）

AI总结提出SHADOWMASK后门攻击方法，通过修改掩码扩散语言模型的前向破坏过程，实现高成功率攻击并保持模型清洁性能。

AI中文摘要

掩码扩散语言模型（MDLM）正成为文本生成的一种引人注目的新范式，但其训练时安全性仍基本未被探索。现有的针对高斯扩散模型或自回归语言模型的后门攻击并不直接适用于MDLM，因为MDLM依赖于离散状态破坏和迭代去噪，而非连续噪声或从左到右预测。在这项工作中，我们首次对MDLM的训练时后门攻击进行了系统研究。我们提出了SHADOWMASK，一种通过将标准全掩码终端分布替换为触发-掩码混合先验来修改MDLM前向破坏过程的后门攻击。这创建了一条从触发破坏状态到攻击者指定目标的专用去噪路径，同时保持清洁去噪行为。我们进一步通过定义后门前向过程、推导反向时间后验以及获得连续时间训练目标，提供了原理性的数学表述。在基于DiT的MDLM和LLaDA-8B-Instruct上，针对WikiText-103、OpenWebText和Alpaca的评估表明，SHADOWMASK实现了接近100%的攻击成功率，显著优于标准数据投毒，在很大程度上保持了清洁效用，在全模型和参数高效微调下仍然有效，并且对代表性防御具有鲁棒性。

英文摘要

Masked diffusion language models (MDLMs) are emerging as a compelling new paradigm for text generation, but their training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs rely on discrete state corruption and iterative denoising rather than continuous noising or left-to-right prediction. In this work, we present the first systematic study of training-time backdoor attacks on MDLMs. We propose SHADOWMASK, a backdoor attack that modifies the MDLM forward corruption process by replacing the standard all-mask terminal distribution with a trigger-mask mixture prior. This creates a dedicated denoising pathway from trigger-corrupted states to attacker-specified targets while preserving clean denoising behavior. We further provide a principled mathematical formulation by defining the backdoored forward process, deriving the reverse-time posterior, and obtaining the continuous-time training objective. Evaluations on DiT-based MDLM and LLaDA-8B-Instruct across WikiText-103, OpenWebText, and Alpaca show that SHADOWMASK achieves near-100% attack success, substantially outperforms standard data poisoning, largely preserves clean utility, remains effective under full-model and parameter-efficient fine-tuning, and is robust against representative defenses.

2512.18552 2026-06-03 cs.SE cs.AI cs.CL cs.LG 版本更新

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

通过自我对弈SWE-RL训练超级智能软件代理

Yuxiang Wei, Zhiqing Sun, Emily McMilin, Jonas Gehring, David Zhang, Gabriel Synnaeve, Daniel Fried, Lingming Zhang, Sida Wang

发表机构 * Meta FAIR ； University of Illinois Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校）； Meta TBD Lab（Meta TBD 实验室）； Carnegie Mellon University（卡内基梅隆大学）

AI总结提出自我对弈SWE-RL（SSR）方法，通过强化学习在自对弈环境中训练单一LLM代理，使其在无需人工标注问题或测试的情况下，在真实代码库中迭代注入和修复软件缺陷，在SWE-bench基准上实现显著自我改进并超越人类数据基线。

Comments Accepted to ICML 2026

AI中文摘要

尽管当前由大型语言模型（LLM）和智能体强化学习（RL）驱动的软件代理能够提高程序员的生产力，但其训练数据（例如GitHub问题和拉取请求）和环境（例如通过-通过和失败-通过测试）严重依赖人类知识或整理，这构成了通向超级智能的根本障碍。在本文中，我们提出了自我对弈SWE-RL（SSR），这是迈向超级智能软件代理训练范式的第一步。我们的方法仅需最小的数据假设，只需访问带有源代码和已安装依赖项的沙盒化仓库，无需人工标注的问题或测试。基于这些真实世界的代码库，单个LLM代理通过强化学习在自我对弈环境中进行训练，以迭代地注入和修复复杂度逐渐增加的软件缺陷，每个缺陷由测试补丁而非自然语言问题描述正式指定。在SWE-bench Verified和SWE-Bench Pro基准上，SSR实现了显著的自我改进（分别提升+10.4和+7.8分），并在整个训练轨迹中持续优于人类数据基线，尽管其评估的是自我对弈中未出现的自然语言问题。我们的结果虽然尚处于早期阶段，但表明了一条路径，即代理可以从真实软件仓库中自主收集广泛的学习经验，最终实现超越人类能力的超级智能系统，在理解系统构建方式、解决新挑战以及从头开始自主创建新软件方面超越人类。

英文摘要

While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub issues and pull requests) and environments (e.g., pass-to-pass and fail-to-pass tests) heavily depend on human knowledge or curation, posing a fundamental barrier to superintelligence. In this paper, we present Self-play SWE-RL (SSR), a first step toward training paradigms for superintelligent software agents. Our approach makes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies, with no need for human-labeled issues or tests. Grounded in these real-world codebases, a single LLM agent is trained via reinforcement learning in a self-play setting to iteratively inject and repair software bugs of increasing complexity, with each bug formally specified by a test patch rather than a natural language issue description. On the SWE-bench Verified and SWE-Bench Pro benchmarks, SSR achieves notable self-improvement (+10.4 and +7.8 points, respectively) and consistently outperforms the human-data baseline over the entire training trajectory, despite being evaluated on natural language issues absent from self-play. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories, ultimately enabling superintelligent systems that exceed human capabilities in understanding how systems are constructed, solving novel challenges, and autonomously creating new software from scratch.

2605.18629 2026-06-03 cs.LG 版本更新

Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)

对齐训练：一种无参数方法提升稀疏自编码器（SAE）的特征质量与稳定性

Michał Brzozowski, Neo Christopher Chung

发表机构 * Samsung AI Center（三星人工智能中心）； University of Warsaw（华沙大学）

AI总结提出无参数的对齐训练方法，通过强制编码器与解码器方向内积为1的几何约束，同时提升稀疏自编码器的重建质量、消除死特征并增强训练稳定性。

AI中文摘要

稀疏自编码器（SAE）是解释深度神经网络（DNN）内部工作机制的主要方法之一，将激活分解为高维特征。然而，它们存在关键缺陷：大量特征从未被激活且不稳定。尽管SAE的变体试图缓解这些问题，但它们需要额外的数据、重采样或训练。我们提出了\textbf{对齐训练}，一种无参数的SAE重参数化方法，同时提升重建质量、消除死特征，并显著增强跨训练种子的稳定性。我们的方法源于一个被忽视的观察：SAE特征质量（通过编码器和解码器方向之间的内积衡量，我们称之为\textbf{对齐分数}）在所有现代架构中呈现双峰分布。所提出的对齐训练在编码器和解码器之间施加几何约束，使得每个特征的内积等于1，从而在不增加任何超参数的情况下消除了SAE训练中的一个退化来源。在多个模型、字典大小和稀疏度水平上，对齐训练在SAEBench基准测试中显示出帕累托改进。除了改善死特征、稳定性和重建外，我们的方法可以轻松集成到机制可解释性技术中，例如Top/BatchTop-K架构和p退火。总体而言，对齐训练在不增加计算复杂度或成本的情况下显著提升了SAE的特征质量和稳定性。

英文摘要

Sparse autoencoders (SAEs) are one of the main methods to interpret the inner workings of deep neural networks (DNNs), decomposing activations into higher-dimensional features. However, they exhibit critical shortcomings: a large fraction of features are never activated, and the learned features are unstable. Existing SAE variants attempt to mitigate these issues, but they require additional data, resampling, or training. We propose \textbf{aligned training}, a parameter-free reparameterization of SAEs that simultaneously improves reconstruction quality, eliminates dead features, and significantly enhances stability across training seeds. Our approach is motivated by an overlooked observation that SAE feature quality, measured by the inner product between encoder and decoder directions (which we call the \textbf{alignment score}), follows a bimodal distribution across all modern architectures. The proposed aligned training enforces a geometric constraint between the encoder and decoder such that their inner product equals one for every feature, which removes a source of degeneracy in SAE training without adding any hyperparameters. Across multiple models, dictionary sizes, and sparsity levels, aligned training shows Pareto improvements on the SAEBench benchmarks. Beyond improving dead features, stability, and reconstruction, our method readily integrates with techniques in mechanistic interpretability such as Top/BatchTop-K architectures and p-Annealing. Overall, aligned training substantially improves the feature quality and stability of SAEs without added computational complexity or cost.
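One way to satisfy the constraint that each encoder and decoder direction has inner product one is sketched below. This is an illustrative reparameterization, not necessarily the one used in the paper: it assumes a unit-normalized decoder direction and builds the encoder row from a free parameter by replacing its component along the decoder with exactly one unit. The names `aligned_encoder_row` and `free_param` are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aligned_encoder_row(free_param, decoder_dir):
    """Return (encoder_row, unit_decoder) with <encoder_row, unit_decoder> = 1."""
    norm = math.sqrt(dot(decoder_dir, decoder_dir))
    if norm == 0.0:
        raise ValueError("decoder direction must be non-zero")
    d = [x / norm for x in decoder_dir]  # unit-norm decoder direction
    along = dot(free_param, d)           # component of the free parameter along d
    # Keep only the part of free_param orthogonal to d, then add d itself,
    # so the inner product with d is 1 by construction.
    e = [di + (vi - along * di) for vi, di in zip(free_param, d)]
    return e, d
```

Because the constraint holds by construction for any value of `free_param`, a scheme of this kind adds no penalty term and no hyperparameter, which is consistent with the "parameter-free" description in the abstract.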

2605.18106 2026-06-03 math.OC cs.AI cs.LG stat.ML 版本更新

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

优化器设计的对称性兼容原理：嵌入、LM头、SwiGLU MLP和MoE路由器

Tim Tsz-Kit Lau, Weijie Su

发表机构 * University of Pennsylvania（宾夕法尼亚大学）； Wharton School（沃顿商学院）

AI总结针对现代神经网络参数空间的对称性与坐标级优化器之间的几何不匹配，提出对称性兼容的优化器设计原则，并针对嵌入矩阵、LM头、SwiGLU MLP投影和MoE路由器等特殊参数块导出相应更新规则，实验证明其改善验证损失、负载平衡和训练稳定性。

AI中文摘要

深度学习实践中长期存在一种显著的几何差异。现代神经网络架构自然展现出丰富的对称性和等变性，而流行的优化器如Adam及其变体本质上是坐标级的，无法尊重参数空间的等变结构。我们通过引入优化器设计的对称性兼容原则来解决这一差异：梯度更新规则应在作用于相应权重块的对称群下等变。遵循这一原则，我们首先为一般矩阵层提供了双正交等变更新的统一视角，如随机谱下降、Muon、Scion和极梯度方法所采用的。更重要的是，通过从正交群转向置换和共享移位对称性，我们为参数块（其对称性与一般矩阵层不同）推导了对称性兼容的优化器：嵌入和LM头矩阵、SwiGLU MLP投影以及MoE路由器矩阵。这些构造包括单边谱、行范数、混合行范数/谱、行感知、列感知、中心行范数和左谱更新。它们产生了一个端到端的逐层优化器堆栈，其中每个主要的矩阵值参数类被分配一个更新，其等变性与其对称群匹配。我们通过在密集和稀疏MoE语言模型上的预训练实验验证了这一原则，包括Qwen3-0.6B风格、Gemma 3 1B风格、OLMoE-1B-7B风格和缩小版gpt-oss架构。在这些实验中，对称性兼容的更新规则一致地改善了最终验证损失，减少了稀疏MoE模型中的负载不平衡，并在若干情况下比相应的AdamW更新提高了训练稳定性。

英文摘要

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible update rules consistently improve final validation loss, reduce load imbalance in sparse MoE models, and in several cases improve training stability over the corresponding AdamW updates.

2605.17866 2026-06-03 cs.LG 版本更新

DAD4TS: Data-Augmentation-Oriented Diffusion Model for Time-Series Forecasting with Small-Scale Data

DAD4TS：面向小规模数据的时间序列预测的数据增强扩散模型

Masahiro Suzuki, Bohui Xia, Hiroto Yamamoto, Masanori Miyahara

发表机构 * Sony Group Corporation（索尼集团公司）

AI总结针对小规模时间序列数据预测中数据增强生成有意义数据困难的问题，提出基于扩散模型和强化学习的DAD4TS方法，通过几何空间投影训练扩散模型，在多个数据集和模型上验证了有效性。

AI中文摘要

小规模数据是时间序列预测任务中的一个关键问题。数据增强是解决该问题的有效策略，但在生成有意义的数据方面存在局限性。为了解决这一局限性，我们提出了DAD4TS，一种基于扩散模型并结合强化学习的数据增强方法，专为小规模数据的时间序列预测设计。在DAD4TS中，数据生成器与时间序列模型同时训练，并由强化学习模型控制，以高效生成能够提高时间序列模型预测准确性的样本。为了支持小规模数据，我们使用数学方法代替传统的VAE方法，通过将时间序列数据投影到几何空间来训练扩散模型。我们通过定性和定量实验，在六个真实世界数据集和八个时间序列模型上，使用七种对比方法验证了DAD4TS的有效性。结果表明，DAD4TS在五个数据集上得到了验证。

英文摘要

Small-scale data is a critical problem in time-series forecasting tasks. Data augmentation is an effective strategy for this task, but it has a limitation in generating meaningful data. To address this limitation, we propose DAD4TS, a diffusion-model-based data augmentation method with reinforcement learning, designed for time-series forecasting with small-scale data. In DAD4TS, a data generator is simultaneously trained with a time-series model and controlled by a reinforcement learning model to efficiently generate samples that improve the forecast accuracy of the time-series model. To support small-scale data, we use mathematical methods instead of conventional VAE methods to train the diffusion model by projecting the time-series data into the geometric space. We validated the effectiveness of DAD4TS with seven comparative methods through qualitative and quantitative experiments on six real-world datasets and eight time-series models. As a result, DAD4TS was validated on five datasets.

2605.17219 2026-06-03 cs.CR cs.AI cs.LG cs.NI eess.SP 版本更新

Integration of AI in Cybersecurity: Current Trends with a Focused Look at Intrusion Detection Applications

AI在网络安全中的集成：当前趋势及入侵检测应用的聚焦分析

S. Tazili, A. Mansour, M. Y. Chkouri

发表机构 * SIGL Laboratory, ENSATE, Abdelmalek Essaâdi University, Tetouan, Morocco（摩洛哥得土安阿卜杜勒马利克·埃萨迪大学ENSATE SIGL实验室）

AI总结本文综述了当前基于AI的网络安全趋势，重点分析入侵检测方法，通过比较不同AI技术和性能指标揭示有意义见解。

Comments Accepted at AI2SD 2025. Forthcoming in Springer Lecture Notes in Networks and Systems (2026). Please cite this preprint as indicated in the paper!

Journal ref: https://conferences.academyskills.net/ai2sd/2025/PapersManagement/all.php#:~:text=643174

AI中文摘要

人工智能（AI）如今被广泛采用，因其能够检测模式、自动化任务并减少各种应用中的时间和成本。AI与网络安全的整合引起了广泛关注，特别是在入侵检测、恶意软件分析以及钓鱼或垃圾邮件检测等领域。随着AI和网络安全的发展，新的方法和途径不断涌现。当前趋势包括使用生成式AI、自然语言处理、用于隐私保护协作训练的联邦学习以及可解释AI以确保可解释性和信任，这些在网络安全中至关重要。本文对当前基于AI的网络安全趋势进行了有趣的综述，重点聚焦入侵检测方法，旨在通过基于所采用的AI技术和报告性能的比较分析，揭示有意义的见解。

英文摘要

Artificial Intelligence (AI) is widely adopted today for its ability to detect patterns, automate tasks, and reduce time and cost across various applications. Its integration into Cybersecurity has garnered significant attention, particularly in areas such as intrusion detection, malware analysis, and phishing or spam detection. As AI and cybersecurity evolve, new methods and approaches emerge regularly. Current trends include the use of Generative AI, Natural Language Processing, Federated Learning for privacy-preserving collaborative training, and eXplainable AI to ensure interpretability and trust, which are vital in cybersecurity. This paper presents an interesting review of current AI-based cybersecurity trends, focusing on intrusion detection approaches and aiming to uncover meaningful insights through comparative analysis based on the employed AI techniques and reported performance.

2605.15806 2026-06-03 cs.LG 版本更新

VeRO: 用于优化智能体的智能体框架

Varun Ursekar, Apaar Shanker, Veronica Chatrath, Yuan Xue, Samuel Marc Denton

发表机构 * arXiv

AI总结提出 VeRO 框架和 VeRO-Bench 基准，通过版本化快照、预算控制评估和结构化执行轨迹来优化智能体代码，并实验比较不同优化器对目标智能体的改进效果。

Comments Accepted to the Forty-Third International Conference on Machine Learning (ICML), 2026

AI中文摘要

编码智能体的一个重要新兴应用是智能体框架优化：通过编辑和评估目标智能体的代码来迭代改进它。尽管具有相关性，但社区对编码智能体在此任务上的表现缺乏系统理解。框架优化与传统软件工程不同：智能体框架将确定性代码与随机 LLM 完成交错，需要结构化捕获中间执行轨迹和下游结果。为了解决这些挑战，我们引入了 (1) VeRO（版本化、奖励和观察），一个外部框架，提供目标框架的版本化快照、预算控制评估和结构化执行轨迹，以及 (2) VeRO-Bench，一个包含参考评估程序的目标智能体和任务的基准套件。使用 VeRO，我们进行了一项实证研究，比较了不同任务上的优化器，并分析了哪些修改能可靠地改进目标智能体框架。我们发布 VeRO 以支持作为编码智能体核心能力的智能体优化研究。代码可在 https://github.com/scaleapi/vero 获取。

英文摘要

An important emerging application of coding agents is agent harness optimization: the iterative improvement of a target agent by editing and evaluating its code. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this task. Harness optimization differs from conventional software engineering: agent harnesses interleave deterministic code with stochastic LLM completions, requiring structured capture of both intermediate execution traces and downstream outcomes. To address these challenges, we introduce (1) VeRO (Versioning, Rewards, and Observations), an outer harness that provides versioned snapshots, budget-controlled evaluation, and structured execution traces of target harnesses, and (2) VeRO-Bench, a benchmark suite of target agents and tasks with reference evaluation procedures. Using VeRO, we conduct an empirical study comparing optimizers across tasks and analyzing which modifications reliably improve target agent harnesses. We release VeRO to support research on agent optimization as a core capability for coding agents. Code is available at https://github.com/scaleapi/vero.

2605.05629 2026-06-03 stat.ML cs.CL cs.LG 版本更新

Spherical Flows for Sampling Categorical Data

用于分类数据采样的球面流

Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl

发表机构 * Technische Universität Berlin（柏林技术大学）

AI总结提出在球面上利用von Mises-Fisher分布进行离散序列生成建模，通过径向对称性简化连续性方程为标量ODE，结合后验加权切线和与预测-校正采样实现高效采样。

AI中文摘要

我们研究了在连续嵌入空间中学习离散序列生成模型的问题。以往的方法通常在欧几里得空间或概率单纯形上操作，而我们则在球面$\mathbb S^{d-1}$上工作。在那里，von Mises-Fisher (vMF)分布诱导了一个自然的噪声过程，并允许闭式条件得分。条件速度通常是难以处理的。利用vMF密度的径向对称性，我们将$\mathbb S^{d-1}$上的连续性方程简化为关于余弦相似度的标量ODE，其唯一有界解决定了速度。$(\mathbb S^{d-1})^L$上的边际速度和边际得分都分解为后验加权的切线和，仅因每个token的标量权重不同。这提供了ODE和预测-校正(PC)采样两种途径。后验是唯一需要学习的对象，通过交叉熵损失训练。实验将vMF路径与测地线和欧几里得替代方案进行了比较。vMF与PC采样的结合显著改善了数独和语言建模的结果。

英文摘要

We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity equation on $\mathbb S^{d-1}$ to a scalar ODE in the cosine similarity, whose unique bounded solution determines the velocity. The marginal velocity and marginal score on $(\mathbb S^{d-1})^L$ both decompose into posterior-weighted tangent sums that differ only by per-token scalar weights. This gives access to both ODE and predictor-corrector (PC) sampling. The posterior is the only learned object, trained by a cross-entropy loss. Experiments compare the vMF path against geodesic and Euclidean alternatives. The combination of vMF and PC sampling significantly improves results on Sudoku and language modeling.

2604.23099 2026-06-03 cs.LG cs.AI stat.ML version update

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang

Affiliations: Google DeepMind

AI summary: Proposes the ProEval framework, which uses pre-trained Gaussian processes for Bayesian quadrature and superlevel set sampling to estimate performance efficiently and discover failures proactively. On reasoning, safety alignment, and classification benchmarks it reaches estimates within 1% error using 8-65x fewer samples.

Comments Our open-sourced code and data can be found at https://github.com/google-deepmind/proeval

Journal ref: International Conference on Machine Learning, 2026

Abstract

Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProEval employs pre-trained Gaussian Processes (GPs) as surrogates for the performance score function, mapping model inputs to metrics such as the severity of errors or safety violations. By framing performance estimation as Bayesian quadrature (BQ) and failure discovery as superlevel set sampling, we develop uncertainty-aware decision strategies that actively select or synthesize highly informative inputs for testing. Theoretically, we prove that our pre-trained GP-based BQ estimator is unbiased and bounded. Empirically, extensive experiments on reasoning, safety alignment, and classification benchmarks demonstrate that ProEval is significantly more efficient than competitive baselines. It requires 8-65x fewer samples to achieve estimates within 1% of the ground truth, while simultaneously revealing more diverse failure cases under a stricter evaluation budget.

2604.20316 2026-06-03 cs.LG version update

R2IF: Aligning Reasoning with Decisions via Composite Rewards for Interpretable LLM Function Calling

Aijia Cheng, Kailong Wang, Ling Shi, Yongxin Zhao

Affiliations: Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China; Huazhong University of Science and Technology; Nanyang Technological University

AI summary: Proposes the R2IF framework, which aligns the reasoning process with tool-call decisions through composite rewards (format/correctness constraints, a chain-of-thought effectiveness reward, and a specification-modification-value reward) and GRPO optimization, improving function-calling accuracy and interpretability on BFCL/ACEBench.

Power-Aware Cognitive Radar Multi-Target Tracking under Unknown Disturbances (translated title)

Imad Bouhou, Stefano Fortunati, Leila Gharsalli, Alexandre Renaux

Affiliations: Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes; SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris; DR2I-IPSA

AI summary: Addresses multi-target tracking under unknown disturbances with a cognitive radar framework based on partially observable Monte Carlo planning (POMCP), using adaptive waveform design and power allocation to improve detection probability and tracking accuracy for low-SNR targets.

2604.10169 2026-06-03 cs.AI cs.LG version update

MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction

Wenchang Duan, Zhenguo Gao, Jinguo Xian, Yi Shi

Affiliations: School of Mathematical Sciences, Shanghai Jiao Tong University; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University; Shanghai Key Laboratory of Psychotic Disorders, Brain Science and Technology Research Center, Shanghai Jiao Tong University

AI summary: Proposes the MAVEN-T framework, which distills a high-capacity teacher into a compact student of a different architecture and refines the student with reinforcement learning, enabling real-time multi-agent trajectory prediction with high accuracy and low latency on multiple datasets.

Abstract

Trajectory prediction is a key component of autonomous driving systems because future motions directly affect collision checking, behavior planning, and control. The task remains challenging under dense interactions, heterogeneous behaviors, multimodal futures, and limited on-board computation. Existing graph, attention, and generative predictors improve interaction reasoning or uncertainty modeling, but their high-capacity designs are often costly for real-time deployment. Lightweight predictors and conventional distillation reduce inference cost, yet usually rely on static imitation and do not explicitly correct safety-relevant teacher bias. This paper proposes MAVEN-T, a reinforced heterogeneous distillation framework for real-time multi-agent trajectory prediction. A high-capacity teacher models directed local interactions with a surround-aware graph encoder, combines efficient temporal filtering with shifted-window spatial attention, and decodes maneuver-specific futures through a sparse Mixture-of-Experts head. A compact GRU-Squeeze-and-Excitation student with a Low-Rank Adapted policy head is trained by feature-, attention-, and semantic-level distillation. To align prediction with downstream behavior, the student is further refined by Proximal Policy Optimization rewards for collision avoidance, comfort, and progress, while a complexity-aware curriculum and Elastic Weight Consolidation stabilize stage-wise training. Experiments on NGSIM, HighD, MoCAD, Argoverse 2, and the Waymo Open Motion Dataset evaluate accuracy, efficiency, generalization, robustness, and closed-loop safety. The student achieves 6.2$\times$ parameter compression, 3.7$\times$ inference acceleration, and 14.6 ms latency on an NVIDIA Jetson AGX Orin while maintaining competitive accuracy.

2510.02779 2026-06-03 cs.LG version update

Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification

Yuanfan Li, Yunwen Lei, Zheng-Chu Guo, Yiming Ying

Affiliations: School of Mathematical Sciences, Zhejiang University; Department of Mathematics, The University of Hong Kong; School of Mathematics and Statistics, University of Sydney

AI summary: For deep ReLU networks, trades off optimization and generalization errors under an NTK-separability assumption to prove a generalization error rate of $\widetilde{O}(L^6 / (n \gamma^2))$ for gradient descent, which differs from the optimal SVM rate only by depth-dependent factors. The key technique is controlling activation patterns near a reference model to obtain a sharper Rademacher complexity bound.

Comments Published in NeurIPS 2025

Abstract

Recent advances have significantly improved our understanding of the generalization performance of gradient descent (GD) methods in deep neural networks. A natural and fundamental question is whether GD can achieve generalization rates comparable to the minimax optimal rates established in the kernel setting. Existing results either yield suboptimal rates of $O(1/\sqrt{n})$, or focus on networks with smooth activation functions, incurring exponential dependence on network depth $L$. In this work, we establish optimal generalization rates for GD with deep ReLU networks by carefully trading off optimization and generalization errors, achieving only polynomial dependence on depth. Specifically, under the assumption that the data are NTK-separable with margin $\gamma$, we prove an excess risk rate of $\widetilde{O}(L^6 / (n \gamma^2))$, which aligns with the optimal SVM-type rate $\widetilde{O}(1 / (n \gamma^2))$ up to depth-dependent factors. A key technical contribution is our novel control of activation patterns near a reference model, enabling a sharper Rademacher complexity bound for deep ReLU networks trained with gradient descent.

2604.07366 2026-06-03 cs.LG version update

Flow Learners for PDEs: Toward a Physics-to-Physics Paradigm for Scientific Computing

Yilong Dai, Shengyu Chen, Xiaowei Jia, Runlong Yu

Affiliations: The University of Alabama; University of Pittsburgh

AI summary: Proposes the flow learner paradigm, which parameterizes transport vector fields and generates trajectories by integration, shifting PDE solving from state prediction to modeling transport over physically admissible futures. This enables continuous-time prediction, uncertainty quantification, and physics-aware solver design.

Abstract

Partial differential equations (PDEs) govern nearly every physical process in science and engineering, but solving them at scale remains prohibitively expensive. Generative AI has transformed language, vision, and protein science, but learned PDE solvers have not undergone a comparable shift. Existing paradigms each capture part of the problem. Physics-informed neural networks embed residual structure, although they are often difficult to optimize in stiff, multiscale, or large-domain regimes. Neural operators amortize across instances, although they commonly inherit a snapshot-prediction view of solving and can degrade over long rollouts. Diffusion-based solvers model uncertainty, although they are often built on a solver template that still centers on state regression. We argue that the core issue is the abstraction used to train learned solvers. Many models are asked to predict states, while many scientific settings require modeling how uncertainty moves through constrained dynamics. The relevant object is transport over physically admissible futures. This motivates flow learners: models that parameterize transport vector fields and generate trajectories through integration, echoing the continuous dynamics that define PDE evolution. This physics-to-physics alignment supports continuous-time prediction, native uncertainty quantification, and new opportunities for physics-aware solver design. We explain why transport-based learning offers a stronger organizing principle for learned PDE solving and outline the research agenda that follows from this shift.

2604.04439 2026-06-03 cs.LG cs.CV version update

Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games

Henrik Krauss, Takehisa Yairi

Affiliations: Department of Advanced Interdisciplinary Studies, The University of Tokyo; Research Center for Advanced Science and Technology, The University of Tokyo

AI summary: Analyzes eye-tracking data from Atari games with a controlled ablation framework and finds that peripheral visual information contributes most to human decisions, while gaze information and past-state information contribute less.

Abstract

We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction networks under six settings that selectively include or exclude these information sources. Across 20 games, peripheral information shows by far the strongest contribution, with median prediction-accuracy drops in the range of 35.27-43.90% when removed. Gaze information yields smaller drops of 2.11-2.76%, while past-state information shows a broader range of 1.52-15.51%, with the upper end likely more informative due to reduced peripheral-information leakage. To complement aggregate accuracies, we cluster states by true-action probabilities assigned by the different model configurations. This analysis identifies coarse behavioral regimes, including focus-dominated, periphery-dominated, and more contextual decision situations. These results suggest that human decision making in Atari depends strongly on information beyond the current focus of gaze, while the proposed framework provides a way to estimate such information-source contributions from behavior.

2604.04087 2026-06-03 cs.LG version update

ArrowFlow: Hierarchical Machine Learning in the Space of Permutations

Ozgur Yilmaz

Affiliations: Department of Artificial Intelligence; Adana Science and Technology University

AI summary: Proposes the ArrowFlow architecture, which learns hierarchical ordinal representations in the space of permutations without floating-point parameters, using ranking filters and permutation-matrix accumulation, and treats violations of social-choice axioms as inductive biases. Experiments show it is competitive on several benchmarks and offers properties such as noise robustness and privacy preservation.

Abstract

We introduce ArrowFlow, a machine learning architecture that operates entirely in the space of permutations. Its computational units are ranking filters, learned orderings that compare inputs via Spearman's footrule distance and update through permutation-matrix accumulation, a non-gradient rule rooted in displacement evidence. Layers compose hierarchically: each layer's output ranking becomes the next layer's input, enabling deep ordinal representation learning without any floating-point parameters in the core computation. We connect the architecture to Arrow's impossibility theorem, showing that violations of social-choice fairness axioms (context dependence, specialization, symmetry breaking) serve as inductive biases for nonlinearity, sparsity, and stability. Experiments span UCI tabular benchmarks, MNIST, gene expression cancer classification (TCGA), and preference data, all against GridSearchCV-tuned baselines. ArrowFlow beats all baselines on Iris (2.7% vs. 3.3%) and is competitive on most UCI datasets. A single parameter, polynomial degree, acts as a master switch: degree 1 yields noise robustness (8-28% less degradation), privacy preservation (+0.5pp cost), and missing-feature resilience; higher degrees trade these for improved clean accuracy. ArrowFlow is not designed to surpass gradient-based methods. It is an existence proof that competitive classification is possible in a fundamentally different computational paradigm, one that elevates ordinal structure to a first-class citizen, with natural alignment to integer-only and neuromorphic hardware.
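The ranking-filter comparison described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming that an input vector is converted to a ranking of its feature indices and matched against stored rankings by Spearman's footrule distance. The helper names (`rank_input`, `nearest_filter`) are hypothetical, and the permutation-matrix accumulation update is not shown.

```python
def footrule_distance(perm_a, perm_b):
    """Spearman's footrule: sum over items of |position in a - position in b|."""
    if sorted(perm_a) != sorted(perm_b):
        raise ValueError("both rankings must order the same items")
    pos_b = {item: idx for idx, item in enumerate(perm_b)}
    return sum(abs(idx - pos_b[item]) for idx, item in enumerate(perm_a))


def rank_input(features):
    """Rank feature indices by value, largest first (ties broken by index)."""
    return sorted(range(len(features)), key=lambda i: (-features[i], i))


def nearest_filter(features, filters):
    """Index of the stored ranking closest to the input's ranking (assumed usage)."""
    ranking = rank_input(features)
    distances = [footrule_distance(ranking, f) for f in filters]
    return min(range(len(filters)), key=distances.__getitem__)
```

Only integer positions are compared here, which is consistent with the abstract's point that the core computation needs no floating-point parameters.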

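2603.19551 2026-06-03 stat.ME cs.LG version update

Learning to Bet for Horizon-Aware Anytime-Valid Testing

Ege Onur Taga, Samet Oymak, Shubhanshu Shekhar

Affiliations: Department of Electrical and Computer Engineering, University of Michigan

AI summary: Models horizon-aware betting as a finite-horizon optimal control problem and uses deep reinforcement learning to learn a general policy, yielding anytime-valid tests and confidence sequences for bounded means under a strict deadline.

Comments To appear in ICML 2026; 29 pages, 22 figures

WaterSIC: Information-Theoretically (Near) Optimal Linear Layer Quantization (translated title)

Egor Lifar, Semyon Savkin, Or Ordentlich, Yury Polyanskiy

Affiliations: University of Illinois at Urbana-Champaign; Stanford University; University of California, Berkeley; University of Texas at Austin

AI summary: For low-precision quantization of dense linear layers, proposes the WaterSIC algorithm, which assigns different quantization rates to different columns of the weight matrix, comes within 0.255 bits of the information-theoretic limit, and achieves the best performance for 1-4 bit quantization on the Llama and Qwen families of large language models.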

2603.03612 2026-06-03 cs.LG cs.CC cs.CL cs.FL version update

Why Are Linear RNNs More Parallelizable?

William Merrill, Hongjian Jiang, Yanhong Li, Anthony Lin, Ashish Sabharwal

Affiliations: GitHub

AI summary: By tightly connecting RNN types to standard complexity classes, shows that linear RNNs (LRNNs) are easy to parallelize because they can be viewed as log-depth arithmetic circuits, whereas nonlinear RNNs face a parallelization barrier because they can solve $\mathsf{L}$-complete problems.

Comments To appear at ICML 2026

Abstract

The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs -- but not traditional, nonlinear RNNs -- as easy to parallelize in practice as transformers. We answer this question by providing a tight connection between types of RNNs and standard complexity classes. We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit. Furthermore, we show that nonlinear RNNs can solve $\mathsf{L}$-complete problems (and even $\mathsf{P}$-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers. Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are $\mathsf{NC}^1$-complete whereas diagonal-plus-low-rank LRNNs are more expressive ($\mathsf{PNC}^1$-complete). We provide further insight by associating each type of RNN with a corresponding automata-theoretic model that it can simulate. Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.
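The log-depth claim for linear RNNs has a familiar small-scale analogue: a linear recurrence can be evaluated with an associative scan. The sketch below is not the paper's circuit construction. It assumes a scalar recurrence $h_t = a_t h_{t-1} + b_t$ and uses a Hillis-Steele style scan, whose number of rounds grows logarithmically with sequence length. A nonlinear update such as $h_t = \tanh(a_t h_{t-1} + b_t)$ has no such closed composition rule, so this trick does not carry over to it.

```python
def combine(first, second):
    """Compose two affine maps h -> a*h + b, applying `first` and then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential_states(a, b, h0=0.0):
    """Reference implementation: one step after another, depth equal to len(a)."""
    states, h = [], h0
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        states.append(h)
    return states


def scan_states(a, b, h0=0.0):
    """Inclusive scan over affine maps; updates within a round are independent,
    so each round could run in parallel. Returns (states, number of rounds)."""
    prefix = list(zip(a, b))
    rounds, step = 0, 1
    while step < len(prefix):
        prefix = [
            combine(prefix[i - step], prefix[i]) if i >= step else prefix[i]
            for i in range(len(prefix))
        ]
        step *= 2
        rounds += 1
    return [a_t * h0 + b_t for a_t, b_t in prefix], rounds
```

The scan works because composing affine maps is associative, so partial products can be grouped in any order.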

2510.20372 2026-06-03 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH version update

Testing Most Influential Sets

Lucas D. Konrad, Nikolas Kuschnig

Affiliations: Vienna University of Economics and Business; Monash University

AI summary: To address the problem that a small fraction of data points can unduly influence model conclusions, derives an exact influence formula for linear least-squares, identifies the extreme value distributions of maximal influence, and proposes a hypothesis testing framework for excessive influence.

Comments Published as a conference paper at ICLR 2026

Abstract

Small influential data subsets can dramatically impact model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum influence is excessive rather than expected under natural random sampling variation. We address this gap by developing a principled framework for most influential sets. Focusing on linear least-squares, we derive a convenient exact influence formula and identify the extreme value distributions of maximal influence: the heavy-tailed Fréchet for constant-size sets and heavy-tailed data, and the well-behaved Gumbel for growing sets or light tails. This allows us to conduct rigorous hypothesis tests for excessive influence. We demonstrate the framework through applications across economics, biology, and machine learning benchmarks, resolving contested findings and replacing ad-hoc heuristics with rigorous inference.
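To make the notion of a most influential set concrete, the following sketch searches by brute force for the size-$k$ subset whose removal changes a simple regression slope the most. It is an illustration under stated assumptions, not the paper's method: the abstract refers to an exact influence formula and extreme value theory, neither of which is reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def most_influential_set(points, k):
    """Return (indices, change) for the size-k subset whose removal shifts
    the slope the most. Exhaustive search, so only practical for small data."""
    full = slope(points)

    def change(indices):
        kept = [p for i, p in enumerate(points) if i not in indices]
        return abs(full - slope(kept))

    best = max(combinations(range(len(points)), k), key=change)
    return best, change(best)
```

The quantity returned as `change` is the maximal influence. Whether such a value is excessive or expected under sampling variation is the question the abstract's testing framework addresses.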

2510.16462 2026-06-03 cs.LG stat.ML version update

Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

Emmanuelle Claeys, Elena Kerjean, Jean-Michel Loubes

Affiliations: University of Toulouse, IRIT; University of Toulouse, CBI; Regalia Team, INRIA University of Toulouse, France

AI summary: Proposes MAYA, a sequential imitation learning model based on multi-armed bandits that mimics the limited memory of bees through a time window τ. It outperforms baseline models on real, simulated, and supplementary datasets and supports interpretability and trajectory inference.

2506.13107 2026-06-03 cs.LG stat.ML version update

Honesty in Causal Forests: When It Helps and When It Hurts

Yanfang Hou, Carlos Fernández-Loría

AI summary: Through a bias-variance trade-off analysis, finds that honest estimation (splitting the data between subgroup definition and effect estimation) reduces the accuracy of individual treatment effect estimates when heterogeneity is strong and data are plentiful, and recommends treating it as a form of regularization rather than a default.

Abstract

Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard practice is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? We show that honest estimation can reduce the accuracy of estimates of individual treatment effects, especially when effect heterogeneity is substantial and datasets are large enough to detect it. The reason is a bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting by limiting the data available to detect and model heterogeneity. Across more than 7,000 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 27% more data to match the performance of models trained without it. Honesty is best understood as a form of regularization. Whether to adopt it should depend on the goals of the application and its empirical performance, not on reflexive default use.
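The sample-splitting idea behind honest estimation can be shown on a single-split tree. This is a minimal sketch under stated assumptions rather than a causal forest: rows are `(x, treatment, outcome)` tuples, the split is chosen from a fixed list of candidate thresholds, and the function names are hypothetical.

```python
import random


def leaf_effect(rows):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def best_threshold(rows, candidates):
    """Threshold on x whose two leaves have the most different effects."""
    def gap(c):
        left = leaf_effect([r for r in rows if r[0] <= c])
        right = leaf_effect([r for r in rows if r[0] > c])
        return -1.0 if left is None or right is None else abs(left - right)
    return max(candidates, key=gap)


def fit_stump(rows, candidates, honest, seed=0):
    """Honest: one half picks the split, the other half estimates leaf effects.
    Adaptive (honest=False): the same rows do both jobs."""
    rows = list(rows)
    if honest:
        random.Random(seed).shuffle(rows)
        half = len(rows) // 2
        split_rows, estimate_rows = rows[:half], rows[half:]
    else:
        split_rows = estimate_rows = rows
    c = best_threshold(split_rows, candidates)
    left = leaf_effect([r for r in estimate_rows if r[0] <= c])
    right = leaf_effect([r for r in estimate_rows if r[0] > c])
    return c, left, right
```

With `honest=True`, each step sees only half of the rows. That is the trade-off the abstract describes: less risk of overfitting, but less data for detecting heterogeneity.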

2603.03480 2026-06-03 cs.LG stat.ML version update

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

Harin Lee, Kevin Jamieson

Affiliations: University of California, Berkeley

AI summary: For reinforcement learning with delayed state observations, proposes an algorithm that combines the augmentation method with an upper confidence bound approach, achieving a minimax optimal regret bound on tabular MDPs.

Comments ICML camera ready version

Abstract

We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.

2603.01471 2026-06-03 cs.IR cs.LG version update

Reconstructing Content with Collaborative Attention for Universal Multimodal Representation Learning

Jiahan Chen, Da Li, Hengran Zhang, Yinqiong Cai, Lixin Su, Jiafeng Guo, Daiting Shi, Dawei Yin, Keping Bi

Affiliations: State Key Laboratory of AI Safety; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Baidu Inc.

AI summary: Proposes the CoCoA pre-training paradigm, which restructures the attention flow and adds an EOS-based reconstruction task so that collaborative attention compresses the input's semantics into the <EOS> token, improving multimodal embedding quality.

Abstract

Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and next-token prediction paradigm of MLLMs does not explicitly encourage the formation of globally compact representations, limiting their effectiveness as multimodal embedding backbones. To address this, we propose CoCoA, a Content reconstruction pre-training paradigm based on Collaborative Attention for multimodal embedding optimization. Specifically, we restructure the attention flow and introduce an EOS-based reconstruction task, encouraging the model to reconstruct input from the corresponding <EOS> embeddings. This drives the multimodal model to compress the semantic information of the input into the <EOS> token, laying the foundations for subsequent contrastive learning. Extensive experiments on MMEB-V1 demonstrate that CoCoA built upon Qwen2-VL and Qwen2.5-VL significantly improves embedding quality. Results validate that content reconstruction serves as an effective strategy to maximize the value of existing data, enabling multimodal embedding models to generate compact and informative representations and raising their performance ceiling.

2603.01372 2026-06-03 cs.LG cs.AI version update

Causal Neural Probabilistic Circuits

Weixin Chen, Han Zhao

AI summary: Proposes the Causal Neural Probabilistic Circuit (CNPC), which combines a neural attribute predictor with a causal probabilistic circuit compiled from a causal graph to support exact interventional inference that respects causal dependencies, improving the classification accuracy of concept bottleneck models under interventions.

Abstract

Concept Bottleneck Models (CBMs) enhance the interpretability of end-to-end neural networks by introducing a layer of concepts and predicting the class label from the concept predictions. A key property of CBMs is that they support interventions, i.e., domain experts can correct mispredicted concept values at test time to improve the final accuracy. However, typical CBMs apply interventions by overwriting only the corrected concept while leaving other concept predictions unchanged, which ignores causal dependencies among concepts. To address this, we propose the Causal Neural Probabilistic Circuit (CNPC), which combines a neural attribute predictor with a causal probabilistic circuit compiled from a causal graph. This circuit supports exact, tractable causal inference that inherently respects causal dependencies. Under interventions, CNPC models the class distribution based on a Product of Experts (PoE) that fuses the attribute predictor's predictive distribution with the interventional marginals computed by the circuit. We theoretically characterize the compositional interventional error of CNPC w.r.t. its modules and identify conditions under which CNPC closely matches the ground-truth interventional class distribution. Experiments on five benchmark datasets in both in-distribution and out-of-distribution settings show that, compared with five baseline models, CNPC achieves higher task accuracy across different numbers of intervened attributes.
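The Product of Experts fusion mentioned in the abstract can be illustrated generically. This is a minimal sketch assuming both experts output a distribution over the same finite set of outcomes. How CNPC actually forms and applies the product inside its circuit is not specified in the abstract, so the function below is a hypothetical stand-in.

```python
def product_of_experts(p, q):
    """Fuse two distributions over the same outcomes by a normalized
    elementwise product."""
    if len(p) != len(q):
        raise ValueError("both experts must cover the same outcomes")
    unnormalized = [pi * qi for pi, qi in zip(p, q)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("the experts jointly assign zero mass to every outcome")
    return [u / total for u in unnormalized]
```

An outcome keeps high probability only if both experts support it. A uniform expert leaves the other expert's distribution unchanged.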

2602.00423 2026-06-03 cs.LG version update

scBatchProx: Federated-Inspired Refinement for Stable Cell-Type Discriminability under Heterogeneous Batch Compositions

Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku

Affiliations: National Institute of Health (NIH)

AI summary: Proposes scBatchProx, a lightweight post-hoc method that stabilizes single-cell latent embeddings through federated-learning-inspired optimization and conservative regularization, improving cell-type classification under heterogeneous batches.

Abstract

Single-cell integration workflows often construct low-dimensional cell embeddings and then refine them with post-hoc methods to reduce batch effects. This refinement process can become unstable when cell-type compositions vary across batches, with some populations underrepresented or absent in particular batches. The problem becomes more consequential in dynamic single-cell data systems, where newly acquired batches can change both technical conditions and cell-type composition. Such instability can reduce downstream cell-type classification performance and weaken stability under imbalance perturbations. We introduce scBatchProx, a lightweight post-hoc refinement method for stabilizing single-cell latent embeddings in these heterogeneous and evolving settings. scBatchProx operates directly on precomputed embeddings and treats each batch or study as a client in a federated-inspired optimization procedure. A batch-conditioned FiLM adapter learns local latent updates, while proximal and identity-preserving regularization keep these updates conservative. Experiments on multi-batch and cross-study single-cell datasets show that scBatchProx improves downstream cell-type classification across different upstream embeddings. In controlled imbalance perturbations, scBatchProx maintains more stable affected-cell-type F1 when selected populations are downsampled or ablated from one batch. In cumulative retraining and continual integration settings, scBatchProx remains effective as new datasets arrive over time. Together, these results suggest that conservative, federated-inspired refinement can help maintain stable single-cell embeddings as batch compositions change across datasets and over time.

2505.23725 2026-06-03 cs.LG version update

When Should LLMs Reduce Specificity? Selective Abstraction for Reliable Long-Form Generation (translated title)

Shani Goren, Ido Galil, Ran El-Yaniv

Affiliations: Technion; NVIDIA

AI summary: To address LLMs discarding valuable information in long-form generation when confidence is low, proposes the Selective Abstraction framework, which replaces uncertain content with atom-wise abstractions, improving accuracy and reliability while preserving meaning.

Abstract

LLMs are widely used, yet they remain prone to factual errors that erode user trust and limit adoption in high-risk settings. One approach to mitigate this risk is to equip models with uncertainty estimation mechanisms that abstain when confidence is low. However, this binary "all-or-nothing" approach is excessively restrictive in long-form settings, often discarding valuable information. We introduce Selective Abstraction (SA), a framework that enables LLMs to trade specificity for reliability by selectively reducing the detail of uncertain content. We first formalize SA through the lenses of selective risk and coverage. We then propose Atom-wise Selective Abstraction, a claim-level instantiation that decomposes responses into atomic claims (short, self-contained statements each expressing a single fact) and replaces uncertain atoms with higher confidence, less specific abstractions. To evaluate this framework, we develop a novel end-to-end pipeline for open-ended generation that instantiates risk as factual correctness and measures coverage using an information-theoretic measure of retained information. Across six open-source models on the FactScore and LongFact-Objects benchmarks, atom-wise SA consistently outperforms existing baselines, improving the area under the risk-coverage curve (AURC) by up to 27.73% over claim removal, demonstrating that reducing specificity can boost accuracy and reliability while preserving most of their original meaning.
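
The risk-coverage view mentioned in this abstract can be illustrated with a minimal sketch. It assumes the classic selective-prediction setting: items are kept in order of decreasing confidence, coverage is the fraction kept, and risk is the error rate among the kept items. The paper instead measures coverage with an information-theoretic measure of retained information, which is not reproduced here. The function names and the toy inputs are hypothetical.

```python
def risk_coverage_curve(confidences, correct):
    """Return (coverage, risk) pairs for keeping the top-k most confident items."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sort item indices from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    curve, errors = [], 0
    for k, idx in enumerate(order, start=1):
        errors += 0 if correct[idx] else 1
        curve.append((k / len(order), errors / k))
    return curve


def aurc(confidences, correct):
    """Area under the risk-coverage curve, taken as the mean risk over all coverages."""
    curve = risk_coverage_curve(confidences, correct)
    return sum(risk for _, risk in curve) / len(curve)


# Toy claims (illustrative only): the third most confident claim is wrong.
toy_conf = [0.9, 0.8, 0.7, 0.6]
toy_correct = [True, True, False, True]
toy_aurc = aurc(toy_conf, toy_correct)
```

A lower area means that errors are concentrated among low-confidence items, which is the regime in which abstaining from, or abstracting, uncertain claims pays off.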

2602.10949 2026-06-03 stat.ML cs.LG math.DS math.PR Updated version

Optimal Initialization in Depth: Lyapunov Initialization and Limit Theorems for Deep Leaky ReLU Networks

Constantin Kogler, Tassilo Schwarz, Samuel Kittle

Affiliations * School of Mathematics, Institute for Advanced Study; Mathematical Institute, University of Oxford; Max Planck Institute for Multidisciplinary Sciences; Department of Mathematics, University College London

AI summary: Through a rigorous probabilistic analysis of random deep Leaky ReLU networks, proposes Lyapunov initialization, which sets the Lyapunov exponent to zero to keep activations stable and thereby improve learning.

Comments Preprint, 44 pages

Abstract

Effective initialization in deep networks requires an understanding of random neural networks. In this work, a rigorous probabilistic analysis of deep bias-free random Leaky ReLU networks is provided. We prove a Law of Large Numbers and a Central Limit Theorem for the logarithm of the norm of network activations, establishing that, as the number of layers increases, their growth is governed by a parameter called the Lyapunov exponent. This parameter characterizes a sharp phase transition between vanishing and exploding activations, and we calculate the Lyapunov exponent explicitly for Gaussian or orthogonal weight matrices. Our results reveal that standard methods, such as He initialization or orthogonal initialization, do not guarantee activation stability for deep networks of low width. Based on these theoretical insights, we propose a novel initialization method, referred to as Lyapunov initialization, which sets the Lyapunov exponent to zero and thereby ensures that the neural network is as stable as possible, leading empirically to improved learning.
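
The growth rate described in this abstract can be estimated empirically as the average per-layer increase of the log activation norm. The sketch below is a minimal illustration under stated assumptions: i.i.d. Gaussian weights with standard deviation `sigma`, a square bias-free network, and a strictly positive Leaky ReLU slope. The function name and all parameter values are hypothetical, and this is not the paper's closed-form expression.

```python
import math
import random


def leaky_relu(x, slope):
    return x if x >= 0.0 else slope * x


def estimate_lyapunov_exponent(width, depth, slope, sigma, seed=0):
    """Average per-layer growth of log ||x_l|| in a random bias-free network.

    Weights are i.i.d. Gaussian with standard deviation sigma (an assumption
    of this sketch). Requires slope > 0 so activations cannot all vanish.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]  # start from a unit vector
    log_growth = 0.0
    for _ in range(depth):
        y = []
        for _ in range(width):
            # One output unit: a fresh random weight row applied to x.
            pre = sum(sigma * rng.gauss(0.0, 1.0) * v for v in x)
            y.append(leaky_relu(pre, slope))
        norm = math.sqrt(sum(v * v for v in y))
        log_growth += math.log(norm)
        # Leaky ReLU is positively homogeneous, so renormalising each layer
        # leaves the accumulated log growth unchanged while avoiding overflow.
        x = [v / norm for v in y]
    return log_growth / depth


estimate = estimate_lyapunov_exponent(width=4, depth=50, slope=0.1, sigma=1.0)
```

Because the network is positively homogeneous, multiplying `sigma` by a constant `c` shifts the estimate by `log(c)`. In this sketch, driving the exponent to zero therefore amounts to rescaling the weights by `exp(-estimate)`.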

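2602.10352 2026-06-03 cs.CL cs.AI cs.LG Updated version

Learning Self-Interpretation from Interpretability Artifacts: Training Lightweight Adapters on Vector-Label Pairs

Keenan Pepper, Alex McKenzie, Florin Pop, Stijn Servaes, Martin Leitgab, Mike Vaiana, Judd Rosenblatt, Michael S. A. Graziano, Diogo de Lucena

Affiliations * University of Washington

AI summary: Trains lightweight adapters (a scalar affine adapter needing only d_model+1 parameters) on interpretability artifacts while keeping the language model fully frozen. This achieves reliable self-interpretation across tasks and model families and significantly outperforms untrained baselines on sparse autoencoder feature labeling, topic identification, and decoding bridge entities in multi-hop reasoning.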

Comments 26 pages, 18 tables, 17 figures. Code and data at https://github.com/agencyenterprise/selfie-adapters

Abstract

Self-interpretation methods prompt language models to describe their own internal states, but remain unreliable due to hyperparameter sensitivity. We show that training lightweight adapters on interpretability artifacts, while keeping the LM entirely frozen, yields reliable self-interpretation across tasks and model families. A scalar affine adapter with just $d_\text{model}+1$ parameters suffices: trained adapters generate sparse autoencoder feature labels that outperform the training labels themselves (70% vs 50% generation scoring at 70B scale), identify topics with 94% recall@1 versus 1% for untrained baselines, and decode bridge entities in multi-hop reasoning that appear in neither prompt nor response, surfacing implicit reasoning without chain-of-thought. The learned bias vector alone accounts for 85% of improvement, and simpler adapters generalize better than more expressive alternatives. Controlling for model knowledge via prompted descriptions, we find self-interpretation gains outpace capability gains from 7B to 72B parameters. Our results demonstrate that self-interpretation improves with scale, without modifying the model being interpreted.
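
One way to read "a scalar affine adapter with just $d_\text{model}+1$ parameters" is a single shared scale plus a bias vector of length $d_\text{model}$. The sketch below assumes that reading, mapping x to scale * x + bias. The class name, the initialization, and the toy values are hypothetical, and the authors' released code may define the adapter differently.

```python
class ScalarAffineAdapter:
    """Maps a hidden vector x to scale * x + bias (assumed form)."""

    def __init__(self, d_model):
        self.scale = 1.0             # one scalar parameter
        self.bias = [0.0] * d_model  # d_model parameters

    def num_parameters(self):
        return 1 + len(self.bias)

    def __call__(self, x):
        if len(x) != len(self.bias):
            raise ValueError("input dimension does not match d_model")
        return [self.scale * xi + bi for xi, bi in zip(x, self.bias)]


# Toy usage with made-up values; a real adapter would be trained.
adapter = ScalarAffineAdapter(d_model=4)
adapter.scale = 2.0
adapter.bias = [0.5, 0.0, -0.5, 1.0]
out = adapter([1.0, 2.0, 3.0, 4.0])
```

With this form the parameter count is independent of the frozen model's depth, which is consistent with the abstract's remark that the bias vector alone carries most of the improvement.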

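2602.09708 2026-06-03 cs.LG cs.AI cs.CV cs.NA math.NA Updated version

Physics-informed diffusion models in spectral space

Davide Gallon, Philippe von Wurstemberger, Patrick Cheridito, Arnulf Jentzen

Affiliations * ETH Zürich

AI summary: Proposes physics-informed spectral diffusion (PISD), which combines generative latent diffusion models with physics-informed machine learning. It models PDE parameters and solutions by diffusion in a latent space of spectral representations and imposes physics constraints and measurement conditions through diffusion posterior sampling. On Poisson, Helmholtz, and incompressible Navier-Stokes equations it shows higher accuracy and computational efficiency than existing diffusion-based solvers.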

Comments 18 pages, 10 figures

Abstract

We propose physics-informed spectral diffusion (PISD), a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of partial differential equations (PDEs) conditioned on partial observations, which includes, in particular, forward and inverse PDE problems. We learn the joint distribution of PDE parameters and solutions via a diffusion process in a latent space of scaled spectral representations, where Gaussian noise corresponds to functions with controlled regularity. This spectral formulation enables significant dimensionality reduction compared to grid-based diffusion models and ensures that the induced process in function space remains within a class of functions for which the PDE operators are well defined. Building on diffusion posterior sampling, we enforce physics-informed constraints and measurement conditions during inference, applying Adam-based updates at each diffusion step. We evaluate the proposed approach on Poisson, Helmholtz, and incompressible Navier-Stokes equations, demonstrating improved accuracy and computational efficiency compared with existing diffusion-based PDE solvers, which are state of the art for sparse observations. Code is available at https://github.com/deeplearningmethods/PISD.

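2510.12636 2026-06-03 stat.ML cs.LG math.AP Updated version

Adapting Noise to Data: Generative Flows from 1D Processes

Jannis Chemseddine, Gregor Kornhardt, Richard Duong, Gabriele Steidl

Affiliations * University of Cambridge

AI summary: Proposes a general framework that learns a data-adapted parametric prior (the latent noise) through one-dimensional quantile functions, optimized using the Wasserstein distance between noise and data, to improve how generative flow models learn distributions such as heavy-tailed ones.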

Comments ICML 2026

2602.06842 2026-06-03 math.NA cs.LG cs.NA Updated version

Are Deep Learning Based Hybrid PDE Solvers Reliable? Why Training Paradigms and Update Strategies Matter

Yuhan Wu, Jan Willem van Beek, Victorita Dolean, Alexander Heinlein

Affiliations * Delft Institute of Applied Mathematics; Delft University of Technology; Department of Mathematics and Computer Science; Eindhoven University of Technology; The Netherlands

AI summary: Studies the reliability of deep learning-based hybrid iterative methods (DL-HIMs) in scientific computing, finds that training objectives misaligned with solver dynamics and problem physics cause residual stagnation, and proposes physics-aware Anderson acceleration (PA-AA) to restore reliable convergence.

Comments Accepted manuscript version of an article accepted for publication in IEEE Computing in Science & Engineering. The final published version will be available through IEEE Xplore

Abstract

Deep learning-based hybrid iterative methods (DL-HIMs) integrate classical numerical solvers with neural operators, utilizing their complementary spectral biases to accelerate convergence. Despite this promise, many DL-HIMs stagnate at false fixed points where neural updates vanish while the physical residual remains large, raising questions about reliability in scientific computing. In this paper, we provide evidence that performance is highly sensitive to training paradigms and update strategies, even when the neural architecture is fixed. Through a detailed study of a DeepONet-based hybrid iterative numerical transferable solver (HINTS) and an FFT-based Fourier neural solver (FNS), we show that significant physical residuals can persist when training objectives are not aligned with solver dynamics and problem physics. We further examine Anderson acceleration (AA) and demonstrate that its classical form is ill-suited for nonlinear neural operators. To overcome this, we introduce physics-aware Anderson acceleration (PA-AA), which minimizes the physical residual rather than the fixed-point update. Numerical experiments confirm that PA-AA restores reliable convergence in substantially fewer iterations. These findings provide a concrete answer to ongoing controversies surrounding AI-based PDE solvers: reliability hinges not only on architectures but on physically informed training and iteration design.

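2511.12085 2026-06-03 cs.CR cs.AI cs.LG Updated version

A Robust and Explainable Transformer-Based Framework for Phishing Email Detection

Sajad U P

Affiliations * Independent Researcher

AI summary: Proposes a lightweight phishing email detection framework based on DistilBERT. Robustness is strengthened through gradient-based adversarial training and character-level noise, and three explainable AI methods (LIME, SHAP, and IG) are integrated with Flan-T5-Small to generate natural-language explanations, improving detection accuracy and user trust.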

Abstract

Phishing and related cyber threats are becoming increasingly sophisticated, with email-based phishing remaining the most persistent attack vector. These attacks exploit human vulnerabilities to deliver malware or gain unauthorized access to sensitive information. Transformer-based models enhance phishing detection through robust contextual language understanding; yet they are often regarded as black boxes due to a lack of interpretability. Moreover, recent AI-enabled attacks further undermine model resilience. To address these challenges, this work proposes a lightweight phishing detection framework based on DistilBERT, a lightweight Transformer model. Robustness to embedding-level perturbations and character-level input noise is enhanced through gradient-based adversarial training using the Fast Gradient Method (FGM), combined with stochastic character-level perturbations. To improve transparency, three prominent Explainable AI (XAI) methods, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients), are integrated to interpret model decision-making. A structured rule-based prompt combines model predictions and XAI features to guide Flan-T5-Small in generating plain-language, evidence-based explanations. Experimental results demonstrate that the proposed framework outperforms a standard DistilBERT-based detection model trained without robustness enhancements in terms of accuracy and resilience. This integrated approach helps bridge the gap between model reliability and user trust, advancing transparent phishing detection.
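
The Fast Gradient Method named in this abstract is commonly written as a perturbation of size epsilon along the L2-normalized loss gradient with respect to the embeddings. The sketch below shows only that arithmetic on plain Python lists. It assumes the L2-normalized variant; the function names, the toy gradient, and the epsilon value are hypothetical, and the training loop and DistilBERT integration are not shown.

```python
import math


def fgm_perturbation(grad, epsilon):
    """Return epsilon * grad / ||grad||_2, or zeros if the gradient vanishes."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def perturb_embedding(embedding, grad, epsilon=1.0):
    """Add the FGM perturbation to an embedding vector."""
    delta = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, delta)]


# Toy example: a gradient of norm 5 scaled down to a perturbation of norm 0.5.
adv = perturb_embedding([0.0, 0.0], grad=[3.0, 4.0], epsilon=0.5)
```

In adversarial training the model would then be updated on the loss computed at both the clean and the perturbed embeddings, after which the perturbation is discarded.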

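2602.05031 2026-06-03 cs.LG Updated version

Laplacian Representations for Decision-Time Planning

Dikshant Shehmar, Matthew Schlegel, Matthew E. Taylor, Marlos C. Machado

Affiliations * University of Cambridge

AI summary: Proposes using the Laplacian representation as the latent space for decision-time planning, capturing state-space distances at multiple time scales, and builds on it the hierarchical planning algorithm ALPS, which outperforms commonly used baselines on offline goal-conditioned reinforcement learning tasks.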

Comments Accepted at ICML 2026

Journal ref: Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Planning with a learned model remains a key challenge in model-based reinforcement learning (RL). In decision-time planning, state representations are critical as they must support local cost computation while preserving long-horizon structure. In this paper, we show that the Laplacian representation provides an effective latent space for planning by capturing state-space distances at multiple time scales. This representation preserves meaningful distances and naturally decomposes long-horizon problems into subgoals, also mitigating the compounding errors that arise over long prediction horizons. Building on these properties, we introduce ALPS, a hierarchical planning algorithm, and demonstrate that it outperforms commonly used baselines on a selection of offline goal-conditioned RL tasks from OGBench, a benchmark previously dominated by model-free methods.

2507.10419 2026-06-03 cs.LG cs.AI cs.CL stat.ML Updated version

Multiple Choice Learning of Low-Rank Adapters for Language Modeling

Victor Letzelter, Hugo Malard, Mathieu Fontaine, Gaël Richard, Slim Essid, Andrei Bursuc, Patrick Pérez

Affiliations * Institut National de la Recherche Scientifique (INRS)

AI summary: Proposes the LoRA-MCL training scheme, which extends next-token prediction in language models with multiple choice learning and low-rank adaptation so that diverse and plausible sentence continuations can be decoded at inference time.

Comments ICML 2026

2602.03681 2026-06-03 cs.CL cs.LG Updated version

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models

Difan Deng, Andreas Bentzen Winje, Lukas Fehring, Marius Lindauer

Affiliations * University of Copenhagen

AI summary: Proposes the NAtS-L framework, which adaptively selects linear attention or softmax attention for different tokens within the same layer to balance efficiency and expressivity.

Comments 21 pages, 12 figures

Abstract

The quadratic computational complexity of softmax transformers has become a bottleneck in long-context scenarios. In contrast, linear attention model families provide a promising direction towards a more efficient sequential model. These linear attention models compress past KV values into a single hidden state, thereby efficiently reducing complexity during both training and inference. However, their expressivity remains limited by the size of their hidden state. Previous work proposed interleaving softmax and linear attention layers to reduce computational complexity while preserving expressivity. Nevertheless, the efficiency of these models remains bottlenecked by their softmax attention layers. In this paper, we propose Neural Attention Search Linear (NAtS-L), a framework that applies both linear attention and softmax attention operations within the same layer on different tokens. NAtS-L automatically determines whether a token can be handled by a linear attention model, i.e., tokens that have only short-term impact and can be encoded into fixed-size hidden states, or require softmax attention, i.e., tokens that contain information related to long-term retrieval and need to be preserved for future queries. By searching for optimal Gated DeltaNet and softmax attention combinations across tokens, we show that NAtS-L provides a strong yet efficient token-level hybrid architecture.

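2602.02890 2026-06-03 cs.LG Updated version

Self-Soupervision: Cooking Model Soups without Labels

Anthony Fuller, James R. Green, Evan Shelhamer

AI summary: Proposes Self-Soupervision, which extends model soups to self-supervised learning by mixing parameters trained with different self-supervised algorithms on unlabeled data, improving model robustness and accuracy.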

Comments code: https://github.com/antofuller/self_soupervision data: https://huggingface.co/datasets/antofuller/mini-VTAB

Abstract

Model soups are strange and strangely effective combinations of parameters. They take a model (the stock), fine-tune it into multiple models (the ingredients), and then mix their parameters back into one model (the soup) to improve predictions. While all known soups require supervised learning, and optimize the same loss on labeled data, our recipes for Self-Soupervision generalize soups to self-supervised learning (SSL). Our Self-Souping lets us flavor ingredients on new data sources, e.g. from unlabeled data from a task for transfer or from a shift for robustness. We show that Self-Souping on corrupted test data, then fine-tuning back on uncorrupted train data, boosts robustness by +3.5% (ImageNet-C) and +7% (LAION-C). Self-Soupervision also unlocks countless SSL algorithms to cook the diverse ingredients needed for more robust soups. We show for the first time that ingredients can differ in their SSL hyperparameters -- and more surprisingly, in their SSL algorithms. We cook soups of MAE, MoCoV3, MMCR, and LeJEPA ingredients that are more accurate than any single SSL ingredient.
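
The "mix their parameters back into one model" step can be sketched as an element-wise average of the ingredients' parameters. This is the uniform soup, used here as an assumption because the abstract does not say which mixing rule the paper applies. The parameter dictionaries below are toy stand-ins for real model weights, and all names are hypothetical.

```python
def uniform_soup(ingredients):
    """Average parameter dictionaries element-wise (a uniform soup)."""
    if not ingredients:
        raise ValueError("need at least one ingredient")
    names = ingredients[0].keys()
    for params in ingredients:
        if params.keys() != names:
            raise ValueError("ingredients must share parameter names")
    n = len(ingredients)
    return {
        # zip(*...) walks the same position across every ingredient.
        name: [sum(vals) / n for vals in zip(*(p[name] for p in ingredients))]
        for name in names
    }


# Two toy ingredients fine-tuned from the same stock (values are made up).
ingredient_a = {"w": [1.0, 2.0], "b": [0.0]}
ingredient_b = {"w": [3.0, 4.0], "b": [1.0]}
soup = uniform_soup([ingredient_a, ingredient_b])
```

Averaging only makes sense when the ingredients share an architecture and start from the same stock, which is why the sketch rejects mismatched parameter names.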

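2602.01903 2026-06-03 cs.LG stat.ML Updated version

Data- and Variance-dependent Regret Bounds for Online Tabular MDPs

Mingyi Li, Taira Tsuchiya, Kenji Yamanishi

AI summary: For online tabular Markov decision processes with known transitions, proposes best-of-both-worlds algorithms that achieve data-dependent regret bounds in the adversarial regime and variance-dependent regret bounds in the stochastic regime, and shows that the global-optimization approach is nearly optimal.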

Comments Accepted at ICML 2026. 72 pages, 4 tables

Abstract

This work studies online episodic tabular Markov decision processes (MDPs) with known transitions and develops best-of-both-worlds algorithms that achieve refined data-dependent regret bounds in the adversarial regime and variance-dependent regret bounds in the stochastic regime. We quantify MDP complexity using a first-order quantity and several new data-dependent measures for the adversarial regime, including a second-order quantity and a path-length measure, as well as variance-based measures for the stochastic regime. To adapt to these measures, we develop algorithms based on global optimization and policy optimization, both built on optimistic follow-the-regularized-leader with log-barrier regularization. For global optimization, our algorithms achieve first-order, second-order, and path-length regret bounds in the adversarial regime, and in the stochastic regime, they achieve a variance-aware gap-independent bound and a variance-aware gap-dependent bound that is polylogarithmic in the number of episodes. For policy optimization, our algorithms achieve the same data- and variance-dependent adaptivity, up to a factor of the episode horizon, by exploiting a new optimistic $Q$-function estimator. Finally, we establish regret lower bounds in terms of data-dependent complexity measures for the adversarial regime and a variance measure for the stochastic regime, implying that the regret upper bounds achieved by the global-optimization approach are nearly optimal.

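2602.01483 2026-06-03 cs.LG cs.AI stat.ME Updated version

Causal Preference Elicitation

Edwin V. Bonilla, He Zhao, Daniel M. Steinberg

Affiliations * University of California, Berkeley

AI summary: Proposes a Bayesian framework that concentrates the posterior over directed acyclic graphs by actively querying local edge relations, enabling expert-in-the-loop causal discovery.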

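2512.00956 2026-06-03 cs.LG cs.CL Updated version

WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

Jiale Chen, Vage Egiazarian, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

Affiliations * University of Tartu

AI summary: Proposes WUSH, a non-orthogonal transform combining a Hadamard basis with data-dependent second moments. It gives a closed-form optimal solution for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, markedly improving low-bit quantization accuracy while supporting an efficient GPU implementation.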

Comments Published as a conference paper at the 43rd International Conference on Machine Learning (ICML 2026): https://openreview.net/forum?id=ZsECxUkbKB

Abstract

Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, covering both integer and floating-point formats. The resulting construction, WUSH, combines a Hadamard backbone with a data-dependent second-moment component to form a non-orthogonal transform that is provably near-optimal for FP and INT quantizers under mild assumptions while admitting an efficient fused GPU implementation. Empirically, WUSH improves W4A4 accuracy over the strongest Hadamard-based baselines (e.g., on Llama-3.1-8B-Instruct in MXFP4, it gains +2.8 average points with RTN and +0.7 with GPTQ) while delivering up to 5.8$\times$ per-layer throughput over BF16 via FP4 MatMul. Source code is available at https://github.com/IST-DASLab/WUSH.

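2510.02763 2026-06-03 cs.LG cs.AI Updated version

Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning

Nicholas LaHaye, Kelly M. Luis, Michelle M. Gierach

Affiliations * University of Colorado Boulder

AI summary: Proposes SIT-FUSE, a self-supervised machine learning framework that fuses multi-sensor satellite reflectance with TROPOMI solar-induced fluorescence data and uses hierarchical deep clustering to generate harmful algal bloom severity and speciation products, validated against in-situ data from the Gulf of Mexico and Southern California.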

DOI: 10.1029/2025EA004881

Abstract

We present a self-supervised machine learning framework for detecting and mapping the severity and speciation of harmful algal blooms (HABs) using multi-sensor satellite data. By fusing reflectance data from operational polar-orbiting satellite-based instruments (VIIRS, MODIS, OLCI, and OCI) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering to segment phytoplankton cell abundance and species into interpretable classes, validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in environments where ground truth observations are limited, while enabling exploratory analysis via hierarchical embeddings - a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.

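2602.00392 2026-06-03 cs.LG Updated version

Localized, High-resolution Geographic Representations with Slepian Functions

Arjun Rao, Ruth Crasto, Tessa Ooms, David Rolnick, Konstantin Klemmer, Marc Rußwurm

AI summary: Proposes a geographic encoder built from spherical Slepian functions that concentrates representational capacity within a region of interest, achieving high resolution at low computational cost. It also introduces a hybrid Slepian-spherical harmonic encoder that balances local and global performance, outperforming baselines on tasks such as classification and regression.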

Comments ICML 2026

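Plan, Verify and Fill: A Structured Parallel Decoding Approach for Diffusion Language Models

Miao Li, Hanyang Jiang, Sikai Cheng, Hengyu Fu, Yuhang Cai, Baihe Huang, Tinghan Ye, Xuanzhou Chen, Pascal Van Hentenryck

Affiliations * Georgia Institute of Technology; University of California, Berkeley; University of Michigan

AI summary: Proposes Plan-Verify-Fill (PVF), which plans a hierarchical skeleton grounded in quantitative validation and uses a verification protocol for structured stopping, reducing the number of function evaluations by up to 65% while maintaining accuracy.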

Abstract

Diffusion Language Models (DLMs) present a promising non-sequential paradigm for text generation, distinct from standard autoregressive (AR) approaches. However, current decoding strategies often adopt a reactive stance, underutilizing the global bidirectional context to dictate global trajectories. To address this, we propose Plan-Verify-Fill (PVF), a training-free paradigm that grounds planning via quantitative validation. PVF actively constructs a hierarchical skeleton by prioritizing high-leverage semantic anchors and employs a verification protocol to operationalize pragmatic structural stopping where further deliberation yields diminishing returns. Extensive evaluations on LLaDA-8B-Instruct and Dream-7B-Instruct demonstrate that PVF reduces the Number of Function Evaluations (NFE) by up to 65% compared to confidence-based parallel decoding across benchmark datasets, unlocking superior efficiency without compromising accuracy.

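2509.01641 2026-06-03 eess.SP cs.AI cs.LG Updated version

Non-Identical Diffusion Models in MIMO-OFDM Channel Generation

Yuzhi Yang, Omar Alhussein, Mérouane Debbah

AI summary: Proposes a non-identical diffusion model that captures local error variations through an element-wise time indicator, addressing uneven element reliability in MIMO-OFDM channel estimation. Its correctness is shown theoretically and its effectiveness is demonstrated in numerical experiments.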

Comments resubmitted to IEEE TCOM

Abstract

We propose a novel diffusion model, termed the non-identical diffusion model, and investigate its application to wireless orthogonal frequency division multiplexing (OFDM) channel generation. Unlike the standard diffusion model that uses a scalar-valued time index to represent the global noise level, we extend this notion to an element-wise time indicator to capture local error variations more accurately. Non-identical diffusion enables us to characterize the reliability of each element (e.g., subcarriers in OFDM) within the noisy input, leading to improved generation results when the initialization is biased. Specifically, we focus on the recovery of wireless multi-input multi-output (MIMO) OFDM channel matrices, where the initial channel estimates exhibit highly uneven reliability across elements due to the pilot scheme. Conventional time embeddings, which assume uniform noise progression, fail to capture such variability across pilot schemes and noise levels. We introduce a matrix that matches the input size to control element-wise noise progression. Following a similar diffusion procedure to existing methods, we show the correctness and effectiveness of the proposed non-identical diffusion scheme both theoretically and numerically. For MIMO-OFDM channel generation, we propose a dimension-wise time embedding strategy. We also develop and evaluate multiple training and generation methods and compare them through numerical experiments.

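2501.17377 2026-06-03 cs.LG cs.AI Updated version

ASAP: Exploiting the Satisficing Generalization Edge in Neural Combinatorial Optimization

Han Fang, Paul Weng, Yutong Ban

Affiliations * GitHub

AI summary: To address the brittleness of neural combinatorial optimization models under distribution shift, proposes the ASAP framework, which decomposes decision making into a proposal phase and a selection phase and uses MAML to strengthen online adaptation, improving generalization on 3D-BPP, TSP, and CVRP.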

Comments Accepted as poster of ICML-2026

Abstract

Deep Reinforcement Learning (DRL) has emerged as a promising approach for solving Combinatorial Optimization (CO) problems, such as the 3D Bin Packing Problem (3D-BPP), Traveling Salesman Problem (TSP), or Vehicle Routing Problem (VRP), but these neural solvers often exhibit brittleness when facing distribution shifts. To address this issue, we uncover the Satisficing Generalization Edge, which we validate both theoretically and experimentally: identifying a set of promising actions is inherently more generalizable than selecting the single optimal action. To exploit this property, we propose Adaptive Selection After Proposal (ASAP), a generic framework that decomposes the decision-making process into two distinct phases: a proposal policy that acts as a robust filter, and a selection policy as an adaptable decision maker. This architecture enables a highly effective online adaptation strategy where the selection policy can be rapidly fine-tuned on a new distribution. Concretely, we introduce a two-phase training framework enhanced by Model-Agnostic Meta-Learning (MAML) to prime the model for fast adaptation. Extensive experiments on 3D-BPP, TSP, and CVRP demonstrate that ASAP improves the generalization capability of state-of-the-art baselines and achieves superior online adaptation on out-of-distribution instances.

2511.04243 2026-06-03 quant-ph cs.LG 版本更新

Twirlator: A Pipeline for Analyzing Subgroup Symmetry Effects in Quantum Machine Learning Ansatzes

Twirlator: 分析量子机器学习拟设中子群对称性效应的流水线

Valter Uotila, Väinö Mehtola, Ilmo Salmenperä, Bo Zhao

AI总结提出Twirlator流水线，通过对称群子群大小建模部分对称性，量化对称性增加时量子机器学习拟设的生成器漂移、电路开销、表达能力和纠缠能力之间的权衡。

Comments 8 pages; 7 figures; presented at the 7th International Workshop on Quantum Software Engineering (Q-SE 2026)

DOI: 10.1145/3786150.3788613
Journal ref: Q-SE '26: Proceedings of the 7th IEEE/ACM International Workshop on Quantum Software Engineering (2026) 55 - 62

AI中文摘要

对称性是几何深度学习及其量子对应物中的强归纳偏置，并因其改善QML模型可训练性而受到越来越多的关注。然而，将对称性纳入量子机器学习（QML）拟设并非免费：对称化通常会增加门并约束电路。为了理解这些效应，我们提出了Twirlator，这是一个自动化流水线，用于对称化参数化QML拟设，并量化随着对称性增加而产生的权衡。Twirlator通过对称群子群的大小对部分对称性进行建模，从而能够分析“无对称性”和“完全对称性”极端之间的情形。在19种常见拟设模式中，Twirlator针对$S_n$的任何子群对称化电路，并测量（1）生成器漂移，（2）电路开销（深度和大小），以及（3）表达能力和纠缠能力。实验评估聚焦于$S_4$和$S_5$的子群。Twirlator揭示，较大的子群通常会增加电路开销，降低表达能力，并往往增加纠缠能力。该流水线和结果为在对称性感知的QML应用中选择平衡硬件成本和模型性能的拟设模式和对称性水平提供了实用指导。

英文摘要

Symmetry is a strong inductive bias in geometric deep learning and its quantum counterpart, and has attracted increasing attention for improving the trainability of QML models. Yet incorporating symmetries into quantum machine learning (QML) ansatzes is not free: symmetrization often adds gates and constrains the circuits. To understand these effects, we present Twirlator, which is an automated pipeline that symmetrizes parameterized QML ansatzes and quantifies the trade-offs as the amount of symmetry increases. Twirlator models partial symmetries by the size of a subgroup of the symmetric group, enabling analysis between the ``no symmetry'' and ``full symmetry'' extremes. Across 19 common ansatz patterns, Twirlator symmetrizes circuits with respect to any subgroup of $S_n$ and measures (1) generator drift, (2) circuit overhead (depth and size), and (3) expressibility and entangling capability. The experimental evaluation focuses on subgroups of $S_4$ and $S_5$. Twirlator reveals that larger subgroups typically increase circuit overhead, reduce expressibility, and often increase entangling capability. The pipeline and results provide practical guidance for selecting ansatz patterns and symmetry levels that balance hardware cost and model performance in symmetry-aware QML applications.
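The symmetrization step described above can be illustrated with a small group average ("twirl"). The following is a minimal sketch, assuming the simplest case of two qubits and the subgroup $S_2$ generated by a qubit swap; the function `twirl` and the Frobenius-norm `drift` proxy are hypothetical names chosen for illustration and are not taken from the Twirlator pipeline.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
# SWAP permutes the two qubits; together with the identity it represents S_2.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

def twirl(generator, group_unitaries):
    """Average U H U^dagger over all group elements (hypothetical helper)."""
    total = sum(U @ generator @ U.conj().T for U in group_unitaries)
    return total / len(group_unitaries)

H = np.kron(Z, I2)              # generator acting as Z on the first qubit only
subgroup = [np.eye(4), SWAP]    # the subgroup S_2 acting on two qubits
H_sym = twirl(H, subgroup)      # symmetrized generator

# Illustrative proxy for how far symmetrization moved the generator.
drift = np.linalg.norm(H_sym - H)
```

The twirled generator commutes with every element of the chosen subgroup, which is the property a symmetrized ansatz layer needs. Larger subgroups average over more terms, which is one way to see why circuit overhead can grow with the amount of symmetry.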

2601.17130 2026-06-03 cs.LG cs.CR 版本更新

Impact of Graph Structure on Membership-Inference Risk for Graph Neural Networks

图结构对图神经网络成员推理风险的影响

Megha Khosla

发表机构 * Delft University of Technology（代尔夫特理工大学）

AI总结本文通过分析训练图构建和推理时边访问两个维度，研究了图结构如何影响图神经网络的节点级成员推理风险，并发现雪球采样会损害泛化能力，而推理时边访问能显著改变成员推理优势。

Comments Accepted for publication in PETS 2026

AI中文摘要

图神经网络（GNN）广泛用于节点分类和链接预测等任务，但在敏感场景中的使用引发了训练数据泄露的担忧。先前关于GNN隐私泄露的工作大多借鉴非图领域的假设，忽视了图结构的作用。我们主张对隐私风险进行图特定的分析，并研究图结构如何影响节点级成员推理。我们形式化了节点-邻域元组上的成员推理（MI），并探讨了两个重要维度：（i）训练图构建和（ii）推理时边访问。我们比较了雪球采样（一种结构感知过程）与均匀随机节点采样用于构建训练图。实验表明，雪球采样由于其覆盖偏差，通常比随机采样更损害泛化能力。相反，在推理时允许访问训练-测试间边可以提高测试准确率，缩小训练-测试差距，同时也会对成员推理优势产生强烈且依赖于设置的影响。这些结果表明图结构直接塑造了隐私风险。我们进一步表明，泛化差距（以训练和测试节点之间的性能差异衡量）是成员推理风险的不完全代理：成员推理优势可以独立于该差距的变化而上升或下降，而推理时边访问通常起着关键作用。理论上，我们证明对于节点级任务，基于成员推理的标准隐私审计结果不能直接推广到归纳图设置，因为训练和测试节点在结构上相互依赖而非可互换。我们在https://github.com/PriXAI/GraphStructurePrivacyAnalysis-public 发布代码和数据。

通过加权p值提高组合预测集的覆盖范围

Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu

发表机构 * Johns Hopkins University（约翰霍普金斯大学）

AI总结提出一种加权聚合预测集的框架，通过为每个预测集分配权重，实现覆盖范围在$1-2α$与$1-α$之间的灵活控制，并推广到数据依赖权重，在混合专家模型等场景中保持有限样本有效性。

Journal ref: AISTATS 2026

AI中文摘要

共形预测通过用有效的预测集增强点预测来量化机器学习模型的不确定性。对于涉及多个试验、模型或数据源的复杂场景，可以聚合共形预测集以创建捕获整体不确定性的预测集，通常能提高精度。然而，聚合具有个体$1-α$覆盖率的多个预测集不可避免地削弱了整体保证，通常导致最坏情况覆盖率为$1-2α$。在这项工作中，我们提出了一个预测集加权聚合的框架，其中根据每个预测集的贡献为其分配权重。我们的框架提供了对集合聚合方式的灵活控制，实现了更紧的覆盖界限，根据权重的分布在组合模型的$1-2α$保证和单个模型的$1-α$保证之间插值。重要的是，我们的框架推广到数据依赖的权重，因为我们推导了一个加权聚合程序，即使权重依赖于数据，也能保持有限样本有效性。这一扩展使我们的框架广泛适用于权重被学习的场景，例如混合专家模型（MoE），并且我们通过在MoE设置中的实验证明，我们的方法实现了自适应覆盖。

英文摘要

Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can be aggregated to create a prediction set that captures the overall uncertainty, often improving precision. However, aggregating multiple prediction sets with individual $1-α$ coverage inevitably weakens the overall guarantee, typically resulting in $1-2α$ worst-case coverage. In this work, we propose a framework for the weighted aggregation of prediction sets, where weights are assigned to each prediction set based on their contribution. Our framework offers flexible control over how the sets are aggregated, achieving tighter coverage bounds that interpolate between the $1-2α$ guarantee of the combined models and the $1-α$ guarantee of an individual model depending on the distribution of weights. Importantly, our framework generalizes to data-dependent weights, as we derive a procedure for weighted aggregation that maintains finite-sample validity even when the weights depend on the data. This extension makes our framework broadly applicable to settings where weights are learned, such as mixture-of-experts (MoE), and we demonstrate through experiments in the MoE setting that our methods achieve adaptive coverage.
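To make the idea of weighted set aggregation concrete, here is a minimal sketch of a weighted vote over prediction sets. It assumes a finite label space and sets encoded as 0/1 membership rows; the function name `weighted_vote_set` and the 0.5 threshold are illustrative assumptions, and this simple vote is not the weighted p-value procedure derived in the paper.

```python
import numpy as np

def weighted_vote_set(memberships, weights, threshold=0.5):
    """Keep a label if the total weight of sets containing it exceeds threshold.

    memberships: (K, n_labels) array, entry 1 if label is in the k-th set.
    weights: length-K nonnegative weights (normalized internally).
    """
    memberships = np.asarray(memberships, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    score = weights @ memberships          # weighted fraction of sets voting for each label
    return np.flatnonzero(score > threshold)

# Three prediction sets over labels {0, 1, 2, 3}: {0, 1}, {1, 2}, {1, 3}.
sets = [[1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 1]]
equal = weighted_vote_set(sets, [1, 1, 1])
skewed = weighted_vote_set(sets, [0.6, 0.2, 0.2])
```

With equal weights only the label shared by all sets survives; putting most of the weight on the first set makes the aggregate behave more like that single set, which mirrors the interpolation between the combined and individual guarantees described in the abstract.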

2507.16003 2026-06-03 cs.CL cs.LG 版本更新

Learning without training: The implicit dynamics of in-context learning

无需训练的学习：上下文学习的内在动态

Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Javier Gonzalvo

发表机构 * Google（谷歌）

AI总结本文通过理论分析和实验证明，自注意力层与MLP的组合使Transformer块能够根据上下文隐式修改MLP权重，从而解释大语言模型在推理时无需权重更新即可进行上下文学习的机制。

2512.16882 2026-06-03 physics.chem-ph cond-mat.mtrl-sci cs.LG 版本更新

A Cartesian-3j Framework for Machine Learning Interatomic Potentials

机器学习原子间势的 Cartesian-3j 框架

Zemin Xu, Chenyu Wu, Wenbo Xie, P. Hu

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出基于Cartesian-3j符号和Cartesian广义Clebsch-Gordan系数的不可约Cartesian张量框架，构建MACE、NequIP和Allegro的Cartesian版本，并引入TACE-v1-OAM-M模型在Matbench Discovery上取得竞争性能。

AI中文摘要

机器学习原子间势（MLIPs）在计算化学的外推能力方面带来了显著提升。然而，大多数等变模型通常使用球张量（STs）构建，而笛卡尔张量公式尽管与原子坐标和张量目标自然对齐，但仍未得到充分发展。在这项工作中，我们通过引入\texttt{Cartesian-3j}符号和Cartesian广义Clebsch-Gordan系数，为不可约Cartesian张量（ICTs）开发了一个Cartesian框架，这些符号和系数直接类比于为ST耦合定义的\texttt{Wigner-3j}符号和广义Clebsch-Gordan系数。我们扩展了\texttt{e3nn}库以支持ICT乘积，并使用该框架构建了\texttt{MACE}、\texttt{NequIP}和\texttt{Allegro}的Cartesian对应版本，从而首次实现了在固定架构仅改变张量基下的受控比较。我们的实验表明，不可约Cartesian模型可以达到与球面对应版本相当的精度，但直接Cartesian化会导致不利的计算和内存缩放，这促使我们采用专门的Cartesian架构选择。利用ICTs和我们的框架，我们引入了\texttt{TACE-v1-OAM-M}，并证明它在Matbench Discovery上取得了与最先进ST模型竞争的性能。

英文摘要

Machine learning interatomic potentials (MLIPs) have brought substantial gains in the extrapolation capability in computational chemistry. However, most equivariant models are typically built with spherical tensors (STs), while Cartesian tensor formulations remain less developed despite their natural alignment with atomic coordinates and tensorial targets. In this work, we develop a Cartesian framework for irreducible Cartesian tensors (ICTs) by introducing the \texttt{Cartesian-3j} symbol and Cartesian Generalized Clebsch-Gordan Coefficients, which serve as direct analogues of the \texttt{Wigner-3j} symbol and Generalized Clebsch-Gordan coefficients defined for ST coupling. We extend the \texttt{e3nn} library to support ICT products, and use this framework to build Cartesian counterparts of \texttt{MACE}, \texttt{NequIP}, and \texttt{Allegro}, allowing the first controlled comparison where architectures are held fixed and only the tensor basis is changed. Our experiments show that irreducible Cartesian models can achieve accuracy comparable to spherical counterparts, but direct Cartesianization incurs unfavorable compute and memory scaling, motivating dedicated Cartesian architectural choices. Leveraging ICTs and our framework, we introduce \texttt{TACE-v1-OAM-M} and demonstrate that it achieves competitive performance on Matbench Discovery compared to state-of-the-art ST models.

2512.15427 2026-06-03 cs.LG cond-mat.stat-mech math.ST stat.TH 版本更新

Statistics of Min-max Normalized Eigenvalues in Random Matrices

随机矩阵中最小-最大归一化特征值的统计

Hyakka Nakada, Shu Tanaka

发表机构 * Graduate School of Science and Technology（理工学研究科）； Keio University（庆应大学）； Department of Applied Physics and Physico-Informatics（应用物理与物理信息学系）； Keio University Sustainable Quantum Artificial Intelligence Center (KSQAIC)（庆应大学可持续量子人工智能中心）； Human Biology-Microbiome-Quantum Research Center (WPI-Bio2Q)（人类生物学-微生物组-量子研究中心（WPI-Bio2Q））； Green Computing System Research Organization（绿色计算系统研究机构）

AI总结研究随机矩阵中最小-最大归一化特征值的统计性质，提出有效分布并推导累积分布的标度律和矩阵分解的残差误差。

Comments 4 pages, 4 figures

2512.03019 2026-06-03 cs.LG cs.AI 版本更新

基于大规模在线核学习的双向因果效应估计

Masahiro Tanaka

发表机构 * Japan Society for the Promotion of Science（日本学术振兴会）

AI总结提出一种可扩展的在线核学习框架，结合异方差识别和拟极大似然估计，用于估计存在相互依赖和异方差系统中的双向因果效应，并通过随机傅里叶特征和自适应在线梯度下降实现高效计算。

DOI: 10.1109/DSIS67228.2025.11390623
Journal ref: Proceedings of the 2025 International Conference on Data Science and Intelligent Systems (DSIS 2025), Article 65, pp. 449-455

AI中文摘要

本研究提出一种可扩展的在线核学习框架，用于估计以相互依赖和异方差为特征的系统中的双向因果效应。传统因果推断通常关注单向效应，忽略了现实世界中常见的双向关系。基于异方差识别，该方法将联立方程模型的拟极大似然估计与大规模在线核学习相结合。它采用随机傅里叶特征逼近来灵活建模非线性条件均值和方差，同时自适应在线梯度下降算法确保了对流式和高维数据的计算效率。大量模拟结果表明，与单方程和多项式逼近基线相比，该方法在多种数据生成过程中实现了更高的准确性和稳定性，偏差和均方根误差更低。这些结果证实，该方法以近线性计算扩展有效捕获了复杂的双向因果效应。通过将计量经济学识别与现代机器学习技术相结合，所提框架为自然科学/社会科学、政策制定、商业和工业应用中的大规模因果推断提供了一种实用、可扩展且理论扎实的解决方案。

英文摘要

In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on unidirectional effects, overlooking the common bidirectional relationships in real-world phenomena. Building on heteroskedasticity-based identification, the proposed method integrates a quasi-maximum likelihood estimator for simultaneous equation models with large-scale online kernel learning. It employs random Fourier feature approximations to flexibly model nonlinear conditional means and variances, while an adaptive online gradient descent algorithm ensures computational efficiency for streaming and high-dimensional data. Results from extensive simulations demonstrate that the proposed method achieves higher accuracy and stability than single-equation and polynomial approximation baselines, exhibiting lower bias and root mean squared error across various data-generating processes. These results confirm that the proposed approach effectively captures complex bidirectional causal effects with near-linear computational scaling. By combining econometric identification with modern machine learning techniques, the proposed framework offers a practical, scalable, and theoretically grounded solution for large-scale causal inference in natural/social science, policy making, business, and industrial applications.
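The random Fourier feature approximation mentioned above is a standard way to replace a kernel with an explicit finite-dimensional feature map. The following is a minimal sketch for the Gaussian kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$; the function name `rff_features`, the value of `gamma`, and the feature count are assumptions for illustration, and the sketch does not reproduce the paper's estimator.

```python
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Random Fourier features whose inner products approximate exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
gamma = 0.5

Z = rff_features(X, n_features=5000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T                                   # linear kernel on random features

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)                  # exact Gaussian kernel
max_err = np.abs(K_approx - K_exact).max()
```

Because the model becomes linear in the random features, each streaming observation can be processed with a gradient step whose cost depends on the number of features rather than on the number of past samples, which is consistent with the near-linear scaling the abstract reports.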

2511.13899 2026-06-03 q-bio.NC cs.CE cs.LG 版本更新

A Factorized Low-Rank RNN Framework for Uncovering Independent Neural Latent Dynamics and Connectivity

一种分解低秩RNN框架用于揭示独立神经潜在动力学和连接性

Chengrui Li, Yunmiao Wang, Yule Wang, Weihan Li, Dieter Jaeger, Anqi Wu

发表机构 * University of California, San Diego（加州大学圣迭戈分校）

AI总结提出FacRNN框架，通过组间独立假设和部分相关惩罚，在低秩循环神经网络中实现潜在动力学的解耦与可解释性提升。

AI中文摘要

低秩循环神经网络（lrRNN）是一类揭示神经群体活动背后低维潜在动力学的模型。尽管其功能连接是低秩的，但缺乏独立性解释，使得难以将不同的计算角色分配给不同的潜在维度。为了解决这个问题，我们提出了分解循环神经网络（FacRNN），这是一种生成式lrRNN框架，它假设潜在动力学之间具有组间独立性，同时允许组内灵活纠缠。这些独立的潜在组允许潜在动力学分别演化，但内部丰富以进行复杂计算。我们在变分自编码器（VAE）框架下重新表述lrRNN，从而引入部分相关惩罚，鼓励潜在维度组之间的独立性。在合成数据、猴子M1和小鼠电压成像数据上的实验表明，与不鼓励组间独立性的基线lrRNN相比，FacRNN持续改善了在低维空间和低秩连接中学到的神经潜在轨迹的解耦性和可解释性。

英文摘要

Low-rank recurrent neural networks (lrRNNs) are a class of models that uncover low-dimensional latent dynamics underlying neural population activity. Although their functional connectivity is low-rank, it lacks independence interpretations, making it difficult to assign distinct computational roles to different latent dimensions. To address this, we propose the Factored Recurrent Neural Network (FacRNN), a generative lrRNN framework that assumes group-wise independence among latent dynamics while allowing flexible within-group entanglement. These independent latent groups allow latent dynamics to evolve separately, but are internally rich for complex computation. We reformulate the lrRNN under a variational autoencoder (VAE) framework, enabling us to introduce a partial correlation penalty that encourages independence between groups of latent dimensions. Experiments on synthetic, monkey M1, and mouse voltage imaging data show that FacRNN consistently improves the disentanglement and interpretability of learned neural latent trajectories in low-dimensional space and low-rank connectivity over baseline lrRNNs that do not encourage group-wise independence.

2511.13663 2026-06-03 cs.PL cs.LG 版本更新

SAIL: Sound Abstract Interpreters with LLMs

SAIL: 基于LLM的可靠抽象解释器

Qiuhan Gu, Avaljot Singh, Gagandeep Singh

AI总结提出SAIL框架，利用大语言模型自动合成全局可靠的抽象变换器，通过约束优化和代价函数确保可靠性，在神经网络验证中匹配甚至超越人工设计的变换器。

Comments 43 pages, 21 figures

DOI: 10.1145/3808308
Journal ref: Proc. ACM Program. Lang. 10, PLDI, Article 230, 26 pages (2026)

AI中文摘要

如何构建全局可靠的抽象解释器以安全地近似程序行为仍然是抽象解释中的一个瓶颈。在本文中，我们展示了使用最先进的大语言模型来自动化这一繁琐过程的潜力。聚焦于神经网络验证领域，我们利用大语言模型从零开始在无限空间中搜索，跨不同抽象域合成非平凡的可靠抽象变换器。我们将合成任务形式化为一个约束优化问题，为此设计了一种新颖的基于数学的代价函数，用于衡量每个生成候选变换器的不可靠程度，同时强制执行硬性的语法和语义有效性约束。基于这一公式，我们引入了SAIL，一个新颖的统一框架，结合了模型生成、语法和语义验证以及基于代价函数的细化，以合成全局可靠的抽象变换器。评估结果表明，SAIL不仅匹配了人工设计的变换器的性能，还能够合成为复杂非线性算子设计的、文献中不存在的可靠且高精度的变换器。

英文摘要

How to construct globally sound abstract interpreters to safely approximate program behaviors remains a bottleneck in abstract interpretation. In this paper, we show the potential of using state-of-the-art LLMs to automate this tedious process. Focusing on the neural network verification area, we synthesize non-trivial sound abstract transformers across diverse abstract domains using LLMs to search within infinite space from scratch. We formalize the synthesis task as a constrained optimization problem, for which we design a novel mathematically grounded cost function that measures the degree of unsoundness of each generated candidate transformer, while enforcing hard syntactic and semantic validity constraints. Building on this formulation, we introduce SAIL, a novel unified framework that combines model generation, syntactic and semantic validation, and cost-function-based refinement to synthesize globally sound abstract transformers. Evaluation results show that SAIL not only matches the performance of manually designed transformers, but also is able to synthesize sound and high-precision transformers that do not exist in the literature for complex non-linear operators.

2511.12482 2026-06-03 quant-ph cs.LG 版本更新

Discovering autonomous quantum error correction via deep reinforcement learning

通过深度强化学习发现自主量子纠错

Yue Yin, Tailong Xiao, Xiaoyang Deng, Ming He, Jianping Fan, Guihua Zeng

发表机构 * Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, P.R. China（上海交通大学致远学院）； State Key Laboratory of Photonics and Communications, Institute for Quantum Sensing and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, P.R. China（上海交通大学光子通信国家重点实验室）； Hefei National Laboratory, Hefei, 230088, P.R. China（合肥国家实验室）； Shanghai Research Center for Quantum Sciences, Shanghai, 201315, P.R. China（上海量子科学研究中心）； AI Lab, Lenovo Research, Beijing 100094, P.R. China（联想AI实验室）

AI总结本文利用课程学习启发的深度强化学习，在近似自主量子纠错框架下发现抵抗单光子和双光子损失的玻色子码，并实现超越盈亏平衡点的最优码字。

DOI: 10.1103/rgy3-z928
Journal ref: Phys. Rev. A 112, 062618 (2025)

AI中文摘要

量子纠错对于容错量子计算至关重要。然而，依赖主动测量的标准方法可能会引入额外错误。自主量子纠错（AQEC）通过利用玻色子系统中的工程耗散和驱动来规避这一问题，但由于严格的Knill-Laflamme条件，识别实用的编码仍然具有挑战性。在本工作中，我们利用课程学习启发的深度强化学习，在近似AQEC框架下发现抵抗单光子和双光子损失的玻色子码。我们提出了在近似条件下求解主方程的解析解，这可以显著加速强化学习的训练过程。智能体首先通过在受限演化时间框架内快速探索，识别出超越盈亏平衡点的编码子空间，然后策略性地微调其策略，以在更长的时间范围内维持这一性能优势。我们发现，经过两阶段训练的智能体能够发现最优码字集合，即考虑单光子和双光子损失效应的Fock态$\ket{4}$和$\ket{7}$。我们识别出该码在更长的演化时间内超越了盈亏平衡阈值，并达到了最先进的性能。我们还分析了该码对相位阻尼和振幅阻尼噪声的鲁棒性。我们的工作突显了课程学习启发的深度强化学习在发现最优量子纠错码方面的潜力，特别是在早期容错量子系统中。

英文摘要

Quantum error correction is essential for fault-tolerant quantum computing. However, standard methods relying on active measurements may introduce additional errors. Autonomous quantum error correction (AQEC) circumvents this by utilizing engineered dissipation and drives in bosonic systems, but identifying practical encodings remains challenging due to the stringent Knill-Laflamme conditions. In this work, we utilize curriculum-learning-enabled deep reinforcement learning to discover bosonic codes under an approximate AQEC framework that resist both single-photon and double-photon losses. We present an analytical solution to the master equation under approximation conditions, which can significantly accelerate the training process of reinforcement learning. The agent first identifies an encoded subspace surpassing the breakeven point through rapid exploration within a constrained evolution time frame, then strategically fine-tunes its policy to sustain this performance advantage over extended temporal horizons. We find that the two-phase trained agent can discover the optimal set of codewords, i.e., the Fock states $\ket{4}$ and $\ket{7}$, when both single-photon and double-photon loss are taken into account. We find that the discovered code surpasses the breakeven threshold over a longer evolution time and achieves state-of-the-art performance. We also analyze the robustness of the code against phase damping and amplitude damping noise. Our work highlights the potential of curriculum-learning-enabled deep reinforcement learning in discovering optimal quantum error-correcting codes, especially in early fault-tolerant quantum systems.

2511.11346 2026-06-03 cs.LG 版本更新

Fast and Expressive Multi-Byte Prediction with Probabilistic Circuits

基于概率电路的快速且富有表现力的多字节预测

Andreas Grivas, Lorenzo Loconte, Emile van Krieken, Piotr Nawrot, Yu Zhao, Euan Wielewski, Pasquale Minervini, Edoardo Ponti, Antonio Vergari

发表机构 * University of Cambridge（剑桥大学）

AI总结提出MTPC框架，利用概率电路编码未来令牌的联合分布，在字节级LLM中实现快速生成，同时保持表现力。

AI中文摘要

多令牌预测（MTP）是一种显著加速大型语言模型（LLM）生成的突出策略，尤其是在字节级LLM中，这些模型无需分词器但速度极慢。然而，许多现有的MTP方法要么假设未来令牌之间独立，牺牲了表现力，要么在窗口内逐个生成令牌，增加了延迟。在这项工作中，我们在概率电路（PC）框架内研究了MTP中表现力与延迟之间的权衡。我们的框架MTPC允许通过选择电路架构来探索编码未来令牌联合分布的不同方式，推广了经典模型，如（层次）混合模型、隐马尔可夫模型和张量网络。我们通过改造现有的字节级LLM（如EvaByte）和字节化的子词模型（如Llama3.2 3B）展示了MTPC的有效性。实验表明，当与推测解码结合时，与具有独立性假设的MTP相比，MTPC显著加速了生成，同时保证保持原始验证器LLM的性能。我们还严格研究了在探索MTPC的可能参数化（如PC架构以及验证器和草稿LLM之间的部分层共享）时，表现力与延迟之间的最优权衡。

英文摘要

Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), especially in byte-level LLMs, which are tokeniser-free but prohibitively slow. However, many existing MTP methods either assume independence between future tokens, sacrificing expressiveness, or generate tokens one at a time within the window, increasing latency. In this work, we investigate the trade-off between expressiveness and latency in MTP within the framework of probabilistic circuits (PCs). Our framework, MTPC, allows one to explore different ways to encode the joint distributions over future tokens by selecting circuit architectures, generalising classical models such as (hierarchical) mixture models, hidden Markov models, and tensor networks. We show the efficacy of MTPC by retrofitting existing byte-level LLMs, such as EvaByte, and byte-fied subword models, such as Llama3.2 3B. Our experiments show that, when combined with speculative decoding, MTPC substantially speeds up generation compared to MTP with independence assumptions, while guaranteeing to retain the performance of the original verifier LLM. We also rigorously study the optimal trade-off between expressiveness and latency when exploring the possible parameterisations of MTPC, such as PC architectures and partial layer sharing between the verifier and draft LLMs.
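The contrast between independent multi-token heads and a more expressive joint can be seen in the smallest circuit the abstract mentions, a mixture model. The following is a minimal sketch, assuming a toy vocabulary of four symbols, two future positions, and three mixture components; all variable names and sizes are illustrative assumptions and do not reflect the MTPC implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 4, 3   # toy vocabulary size and number of mixture components

def normalize(a):
    """Normalize the last axis so each row is a probability distribution."""
    return a / a.sum(axis=-1, keepdims=True)

pi = normalize(rng.random(K))        # mixture weights
p1 = normalize(rng.random((K, V)))   # per-component distribution of the next token
p2 = normalize(rng.random((K, V)))   # per-component distribution of the token after it

# Mixture of products: p(a, b) = sum_k pi[k] * p1[k, a] * p2[k, b].
joint_mix = np.einsum("k,ka,kb->ab", pi, p1, p2)

# Fully independent heads can only represent the product of the two marginals.
joint_indep = np.outer(joint_mix.sum(axis=1), joint_mix.sum(axis=0))

rank_mix = np.linalg.matrix_rank(joint_mix)
rank_indep = np.linalg.matrix_rank(joint_indep)
```

Both tables can be sampled in a single step, but the independent one always has rank 1, whereas the mixture can have rank up to `K` and therefore can encode dependence between the two future tokens at a modest extra cost.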

2511.07971 2026-06-03 cs.LG 版本更新

Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning

低秩曲率用于大语言模型微调中的零阶优化

Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

发表机构 * University of Wisconsin – Madison（威斯康星大学麦迪逊分校）； University of Georgia（佐治亚大学）； Hanyang University（汉阳大学）

AI总结提出LOREN方法，通过低秩块对角预条件器捕捉曲率并利用REINFORCE留一法梯度估计器降低方差，在LLM微调中实现更高精度和更快收敛，同时峰值内存使用降低27.3%。

Comments Accepted to the AAAI Conference on Artificial Intelligence (AAAI-2026)

DOI: 10.1609/aaai.v40i30.39715

AI中文摘要

我们引入了LOREN，一种用于微调大型语言模型（LLM）的曲率感知零阶（ZO）优化方法。现有的ZO方法通过随机扰动的有限差分估计梯度，常常遭受高方差和次优搜索方向的问题。我们的方法通过以下方式解决这些挑战：（i）将梯度预条件问题重新表述为自适应估计用于梯度估计的各向异性扰动分布的问题，（ii）通过自然进化策略框架，使用低秩块对角预条件器捕捉曲率，以及（iii）应用REINFORCE留一法（RLOO）梯度估计器来降低方差。在标准LLM基准上的实验表明，我们的方法通过实现更高的精度和更快的收敛，优于最先进的ZO方法，同时与MeZO-Adam相比，峰值内存使用减少了高达27.3%。

英文摘要

We introduce LOREN, a curvature-aware zeroth-order (ZO) optimization method for fine-tuning large language models (LLMs). Existing ZO methods, which estimate gradients via finite differences using random perturbations, often suffer from high variance and suboptimal search directions. Our approach addresses these challenges by: (i) reformulating the problem of gradient preconditioning as that of adaptively estimating an anisotropic perturbation distribution for gradient estimation, (ii) capturing curvature through a low-rank block diagonal preconditioner using the framework of natural evolution strategies, and (iii) applying a REINFORCE leave-one-out (RLOO) gradient estimator to reduce variance. Experiments on standard LLM benchmarks show that our method outperforms state-of-the-art ZO methods by achieving higher accuracy and faster convergence, while cutting peak memory usage by up to 27.3% compared with MeZO-Adam.
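The variance-reduction idea in item (iii) can be sketched in a few lines. The following is a minimal illustration of a zeroth-order gradient estimate with a leave-one-out baseline, assuming isotropic Gaussian perturbations and a toy quadratic objective; the function name `rloo_zo_gradient` and all parameter values are hypothetical, and the low-rank, anisotropic preconditioning that LOREN learns is deliberately omitted.

```python
import numpy as np

def rloo_zo_gradient(f, x, n_samples, sigma, rng):
    """Zeroth-order gradient estimate using only function evaluations.

    Each sample's value is compared against the mean of the OTHER samples
    (leave-one-out baseline), which lowers variance without adding bias.
    """
    u = rng.standard_normal((n_samples, x.size))          # perturbation directions
    values = np.array([f(x + sigma * ui) for ui in u])    # one forward evaluation per sample
    baselines = (values.sum() - values) / (n_samples - 1)
    advantages = values - baselines
    return (advantages[:, None] * u).mean(axis=0) / sigma

def quadratic(z):
    # Toy objective with known gradient z.
    return 0.5 * np.dot(z, z)

x = np.array([1.0, -2.0, 0.5])
g = rloo_zo_gradient(quadratic, x, n_samples=20000, sigma=0.1,
                     rng=np.random.default_rng(0))
cosine = g @ x / (np.linalg.norm(g) * np.linalg.norm(x))
```

No backward pass is needed, so memory use is governed by forward evaluations only. The sample count here is chosen to make the toy check stable and is not a recommendation for fine-tuning budgets.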

2506.08464 2026-06-03 cs.LG 版本更新

MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

MAC：一种使用平均激活近似曲率的高效梯度预条件方法

Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

发表机构 * University of Wisconsin – Madison（威斯康星大学麦迪逊分校）； University of Georgia（佐治亚大学）； Hanyang University（汉阳大学）

AI总结提出MAC方法，通过近似KFAC中Fisher信息矩阵的Kronecker因子，降低二阶优化计算负担，并首次将Kronecker分解应用于Transformer注意力层，在多种网络和数据集上优于KFAC等现有方法。

Comments Accepted to the IEEE International Conference on Data Mining (ICDM-2025)

DOI: 10.1109/ICDM65498.2025.00077

AI中文摘要

用于训练神经网络的二阶优化方法，如KFAC，通过利用损失景观的曲率信息展现出优越的收敛性。然而，这是以高计算负担为代价的。在这项工作中，我们分析了构成KFAC中逐层Fisher信息矩阵（FIM）的两个组件：与激活和预激活梯度相关的Kronecker因子。基于对其特征谱的实证观察，我们提出了它们的有效近似，从而产生了一种计算高效的优化方法，称为MAC。据我们所知，MAC是第一个将Kronecker分解应用于Transformer中注意力层的FIM，并明确将注意力分数整合到预条件中的算法。我们还研究了MAC在非线性神经网络上的收敛性质，并提供了其收敛到全局最小值的两个条件。我们在各种网络架构和数据集上的广泛评估表明，所提出的方法在准确性、端到端训练时间和内存使用方面优于KFAC和其他最先进的方法。

英文摘要

Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of the loss landscape. However, this comes at the expense of a high computational burden. In this work, we analyze the two components that constitute the layer-wise Fisher information matrix (FIM) used in KFAC: the Kronecker factors related to activations and pre-activation gradients. Based on empirical observations on their eigenspectra, we propose efficient approximations for them, resulting in a computationally efficient optimization method called MAC. To the best of our knowledge, MAC is the first algorithm to apply the Kronecker factorization to the FIM of attention layers used in transformers and explicitly integrate attention scores into the preconditioning. We also study the convergence property of MAC on nonlinear neural networks and provide two conditions under which it converges to global minima. Our extensive evaluations on various network architectures and datasets show that the proposed method outperforms KFAC and other state-of-the-art methods in terms of accuracy, end-to-end training time, and memory usage.
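The two Kronecker factors named above can be made concrete for a single linear layer. The following is a minimal sketch of standard KFAC-style preconditioning, assuming random stand-in data, a small damping term, and hypothetical variable names; it shows the baseline that MAC approximates and does not implement MAC's mean-activation approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 64, 4, 3
a = rng.standard_normal((batch, d_in))    # layer inputs (activations)
g = rng.standard_normal((batch, d_out))   # gradients w.r.t. pre-activations

damping = 1e-3
A = a.T @ a / batch + damping * np.eye(d_in)    # activation Kronecker factor
G = g.T @ g / batch + damping * np.eye(d_out)   # pre-activation-gradient factor
grad_W = g.T @ a / batch                        # weight gradient, shape (d_out, d_in)

# Preconditioned gradient using only the two small factors: G^{-1} grad_W A^{-1}.
precond = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

# Reference: invert the full Kronecker-structured matrix on the vectorized gradient.
F = np.kron(A, G)
explicit = np.linalg.solve(F, grad_W.flatten(order="F")).reshape(d_out, d_in, order="F")
```

The factored form never builds the `(d_in * d_out)`-sized matrix `F`, which is where the savings of Kronecker factorization come from. Even so, forming and inverting `A` and `G` is the cost that cheaper approximations of the factors aim to reduce.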

2310.00965 2026-06-03 cs.LG 版本更新

Node Perturbation Can Effectively Train Multi-Layer Neural Networks

节点扰动可以有效训练多层神经网络

Sander Dalm, Marcel van Gerven, Nasir Ahmad

发表机构 * Donders Institute for Brain, Cognition and Behaviour（唐德斯大脑、认知与行为研究所）

AI总结通过将节点扰动与方向导数对齐并在每层进行输入去相关，显著提升了节点扰动学习的参数收敛速度和测试性能，接近反向传播。

AI中文摘要

反向传播（BP）仍然是训练深度神经网络参数的主导且最成功的方法。然而，BP依赖于两个计算上不同的阶段，不能提供对生物学习的满意解释，并且可能难以应用于具有不连续性或噪声节点动态的网络训练。相比之下，节点扰动（NP），也称为活动扰动前向梯度，提出通过向网络激活中注入噪声并随后测量引起的损失变化来学习。NP依赖于两次前向（推理）传递，不使用网络导数，并已被提出作为生物系统中学习的模型。然而，标准NP数据效率极低，并且由于其无引导的基于噪声的搜索过程可能不稳定。在这项工作中，我们通过将NP与方向导数相关联并引入输入去相关，发展了一种现代视角。我们发现，与方向导数的更紧密对齐以及每层的输入去相关在理论和实践上增强了NP学习的性能，在参数收敛方面有大幅改进，并且在测试数据上获得更高的性能，接近BP。此外，我们的新公式允许应用于噪声过程本身不可访问的噪声系统，这对于神经形态芯片上的学习特别有意义。

英文摘要

Backpropagation (BP) remains the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP), also known as activity-perturbed forward gradients, proposes learning by the injection of noise into network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and can be unstable due to its unguided noise-based search process. In this work, we develop a modern perspective on NP by relating it to the directional derivative and incorporating input decorrelation. We find that a closer alignment with directional derivatives together with input decorrelation at every layer theoretically and practically enhances performance of NP learning with large improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible, which is of particular interest for on-chip learning in neuromorphic systems.
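The two-forward-pass mechanism of standard node perturbation can be sketched for a single linear layer. The following is a minimal illustration, assuming a squared-error loss, Gaussian activity noise, and hypothetical names such as `node_perturbation_grad`; it shows the baseline NP estimator only and omits the directional-derivative alignment and input decorrelation that the paper adds.

```python
import numpy as np

def node_perturbation_grad(W, x, target, sigma, rng):
    """Estimate dL/dW from one clean and one noisy forward pass, without derivatives."""
    clean = W @ x
    noise = sigma * rng.standard_normal(clean.shape)       # noise injected into activations
    loss_clean = 0.5 * np.sum((clean - target) ** 2)
    loss_noisy = 0.5 * np.sum((clean + noise - target) ** 2)
    # Scale the noise direction by the induced loss change.
    return (loss_noisy - loss_clean) / sigma**2 * np.outer(noise, x)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
target = rng.standard_normal(3)

# Average many noisy estimates and compare with the analytic gradient.
estimate = np.mean(
    [node_perturbation_grad(W, x, target, 0.01, rng) for _ in range(5000)], axis=0
)
true_grad = np.outer(W @ x - target, x)
cosine = np.sum(estimate * true_grad) / (
    np.linalg.norm(estimate) * np.linalg.norm(true_grad)
)
```

A single estimate is very noisy, and thousands of samples are averaged here just to recover the gradient direction of a 3-by-4 layer. This is the data inefficiency of standard NP that the abstract refers to.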

2511.02986 2026-06-03 stat.ML cs.LG q-bio.GN 版本更新

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

基于潜在扩散模型的可扩展单细胞基因表达生成

Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak

发表机构 * University of Cambridge（剑桥大学）

AI总结提出scLDM，一种结合变分自编码器和潜在扩散模型的可扩展生成方法，通过置换不变/等变架构和扩散Transformer实现高质量单细胞基因表达生成。

Comments Accepted to ICML 2026, Github: https://github.com/czi-ai/scldm/

AI中文摘要

单细胞基因表达的计算建模对于理解细胞过程至关重要，但生成真实的表达谱仍然是一个主要挑战。这一困难源于基因表达数据的计数性质以及基因之间复杂的潜在依赖性。现有的生成模型通常强加人工基因排序或依赖浅层神经网络架构。我们引入了一种可扩展的潜在扩散模型用于单细胞基因表达数据，称为scLDM，该模型尊重数据的基本可交换性属性。我们的VAE使用固定大小的潜在变量，利用统一的多头交叉注意力块（MCAB）架构，该架构具有双重作用：编码器中的置换不变池化和解码器中的置换等变反池化。我们通过用使用扩散Transformer和线性插值的潜在扩散模型替换高斯先验来增强这一框架，从而通过多条件无分类器引导实现高质量生成。我们在观察性和扰动性单细胞数据的多种实验以及下游任务（如细胞水平分类）中展示了其优越性能。

英文摘要

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.

2511.02304 2026-06-03 cs.MA cs.AI cs.CL cs.FL cs.LG 版本更新

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

自动机条件化协作多智能体强化学习

Beyazit Yalcinkaya, Marcell Vazquez-Chanlatte, Ameesh Shah, Hanna Krasowski, Sanjit A. Seshia

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； Stanford University（斯坦福大学）

AI总结提出自动机条件化协作多智能体强化学习框架，通过自动机分解团队目标为子任务，学习任务条件化的分散策略，实现最优任务分配和多步协调。

AI中文摘要

我们研究在集中训练、分散执行下，针对协作性时间目标的多任务、多智能体策略学习。在此设置中，使用自动机表示分配给智能体的任务，能够将团队级目标分解为更简单、更小的子任务。然而，现有方法样本效率低下，且局限于单任务情况，需要为每个新任务重新训练策略。在这项工作中，我们提出了自动机条件化协作多智能体强化学习（ACC-MARL），一个学习任务条件化分散团队策略的框架。我们识别了ACC-MARL可行性的挑战，提出了解决方案，并证明了我们的方法是最优的。我们进一步展示了学习到的价值函数可用于在测试时最优地分配任务。实验表明，智能体之间涌现出任务感知的多步协调，例如按下按钮开门、扶住门以及短路任务。

英文摘要

We study learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks assigned to agents enables breaking down a team-level objective into simpler, smaller sub-tasks. However, existing approaches remain sample-inefficient and are limited to the single-task case, requiring retraining policies for each new task. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify challenges to the feasibility of ACC-MARL, propose solutions, and prove that our approach is optimal. We further show that learned value functions can be used to assign tasks optimally at test time. Experiments demonstrate emergent task-aware, multi-step coordination among agents, such as pressing a button to unlock a door, holding the door, and short-circuiting tasks.

2510.23216 2026-06-03 cs.AI cs.LG 版本更新

Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach

逼真足球模拟中的人性化守门：一种样本高效的强化学习方法

Alessandro Sestini, Joakim Bergdahl, Jean-Philippe Barrette-LaPierre, Florian Fuchs, Brady Chen, Fabio Zinno, Michael Jones, Linus Gisslén

发表机构 * University of Edinburgh（爱丁堡大学）； KTH Royal Institute of Technology（皇家理工学院）； University of California, Berkeley（加州大学伯克利分校）

AI总结提出一种样本高效的深度强化学习方法，通过利用预收集数据和增加网络可塑性，在EA SPORTS FC 25中训练出守门员智能体，其扑救率比内置AI高10%，训练速度比标准DRL快50%，且行为更接近人类。

AI中文摘要

尽管多个知名视频游戏已成为深度强化学习（DRL）的测试平台，但该技术很少被游戏行业用于制作真实的AI行为。先前的研究侧重于使用大型模型训练超人类智能体，这对于资源有限、旨在实现类人智能体的游戏工作室来说并不实际。本文提出了一种样本高效的DRL方法，专为在工业环境（如视频游戏行业）中训练和微调智能体而设计。我们的方法通过利用预收集的数据和增加网络可塑性来提高基于价值的DRL的样本效率。我们在EA SPORTS FC 25（当今最畅销的足球模拟游戏之一）中评估了该方法训练守门员智能体的效果。我们的智能体在扑救率上比游戏内置AI高出10%。消融研究表明，与标准DRL方法相比，我们的方法训练智能体速度提高了50%。最后，领域专家的定性评估表明，与手工制作的智能体相比，我们的方法创造了更人性化的游戏玩法。作为该方法影响力的证明，该技术已被用于该系列的最新版本中。

英文摘要

While several high-profile video games have served as testbeds for Deep Reinforcement Learning (DRL), this technique has rarely been employed by the game industry for crafting authentic AI behaviors. Previous research focuses on training super-human agents with large models, which is impractical for game studios with limited resources aiming for human-like agents. This paper proposes a sample-efficient DRL method tailored for training and fine-tuning agents in industrial settings such as the video game industry. Our method improves sample efficiency of value-based DRL by leveraging pre-collected data and increasing network plasticity. We evaluate our method by training a goalkeeper agent in EA SPORTS FC 25, one of the best-selling football simulations today. Our agent outperforms the game's built-in AI by 10% in ball saving rate. Ablation studies show that our method trains agents 50% faster compared to standard DRL methods. Finally, qualitative evaluation from domain experts indicates that our approach creates more human-like gameplay compared to hand-crafted agents. As a testament to the impact of the approach, the method has been adopted for use in the most recent release of the series.

2510.23469 2026-06-03 cs.LG 版本更新

Towards Fair Graph Prompting: A Dual-Prompt Mechanism for Mitigating Attribute and Structural Bias

面向公平图提示：一种缓解属性与结构偏差的双提示机制

Yuhan Yang, Xingbo Fu, Jundong Li

发表机构 * University of Michigan（密歇根大学）； University of Virginia（弗吉尼亚大学）

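AI summary: Proposes Adaptive Dual Prompting (ADPrompt), a framework whose two modules, Adaptive Feature Rectification and Adaptive Message Calibration, adapt a pre-trained GNN while mitigating bias in node attributes and graph structure, enabling fair node classification.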

AI中文摘要

对未标记图数据进行自监督预训练已成为图神经网络（GNN）的常见范式。然而，预训练目标与下游任务之间通常存在目标差距。为弥补这一差距，图提示方法通过可学习提示将冻结的预训练GNN适应到特定下游任务。尽管有效，但现有大多数图提示方法主要关注提升模型性能，而很大程度上忽略了公平性问题。由于下游图数据在节点属性和图结构中固有地包含偏差，预训练GNN可能在不同人口统计子组之间产生不同的表示。为解决这一局限，我们提出自适应双提示（ADPrompt），一种公平感知的图提示框架，用于适应预训练GNN。ADPrompt包含两个互补组件：自适应特征修正，学习个性化属性提示以在输入层面抑制敏感信息；以及自适应消息校准，引入逐层结构提示以动态调节来自邻居节点的信息传播。通过联合优化这两个模块，ADPrompt在适应预训练GNN的同时缓解了属性级和结构级偏差。在四个基准数据集上采用多种预训练策略的实验表明，ADPrompt在节点分类任务中始终优于七个竞争基线。

英文摘要

Self-supervised pre-training on unlabeled graph data has become a common paradigm for Graph Neural Networks (GNNs). However, a gap often remains between pre-training objectives and downstream tasks. To bridge this gap, graph prompting methods adapt frozen pre-trained GNNs to specific downstream tasks through learnable prompts. Despite their effectiveness, most existing graph prompting methods primarily focus on improving model performance and largely overlook fairness concerns. As downstream graph data inherently contains biases in both node attributes and graph structures, pre-trained GNNs may produce representations that differ across demographic subgroups. To address this limitation, we propose Adaptive Dual Prompting (ADPrompt), a fairness-aware graph prompting framework for adapting pre-trained GNNs. ADPrompt incorporates two complementary components: Adaptive Feature Rectification, which learns personalized attribute prompts to suppress sensitive information at the input level, and Adaptive Message Calibration, which introduces layer-wise structure prompts to dynamically regulate information propagation from neighboring nodes. By jointly optimizing these two modules, ADPrompt adapts the pre-trained GNN while mitigating both attribute-level and structural bias. Experiments on four benchmark datasets with multiple pre-training strategies demonstrate that ADPrompt consistently outperforms seven competitive baselines in node classification tasks.

2510.15780 2026-06-03 stat.AP cs.LG 版本更新

Enhanced Renewable Energy Forecasting using Context-Aware Conformal Prediction

基于上下文感知保形预测的增强型可再生能源预测

Alireza Moradi, Mathieu Tanneau, Reza Zandehshahvar, Pascal Van Hentenryck

发表机构 * EPFL, Switzerland（瑞士联邦理工学院）； Ghent University, Belgium（比利时根特大学）

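AI summary: Proposes the Context-Aware Conformal Prediction (CACP) framework, which calibrates prediction intervals by weighting historical observations, without retraining the model, improving the reliability and efficiency of renewable energy forecasts.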

AI中文摘要

人工智能（AI）越来越多地被用于支持可再生能源预测和电网运营。随着可再生能源渗透率的增长，可靠的概率预测对于管理不确定性和支持风险感知的运营决策变得至关重要。然而，由于时间变异性、天气条件变化和异质运行机制，这些预测常常存在校准偏差。在许多实际场景中，可再生能源预测由外部来源、供应商或独立训练的系统提供，由于模型访问受限或计算约束，重新训练不可行。这需要高效且模型无关的方法来在预测生成后提高其可靠性。本文提出了上下文感知保形预测（CACP），一种用于校准可再生能源预测的框架。所提方法在校准过程中依赖于一种加权机制，该机制为与目标预测条件更相似的历史观测分配更高的权重。这使得预测区间能够自适应地反映局部不确定性机制，而无需访问或重新训练底层预测模型。实验在来自美国国家可再生能源实验室（NREL）的日前太阳能预测大规模数据集上进行，涵盖包括MISO、ERCOT和SPP在内的多个系统。结果表明，与NREL的基础预测模型和其他保形预测基线相比，CACP在站点和系统层面均改善了可靠性-效率权衡。这些结果表明，CACP可以作为可信AI驱动的可再生能源预测和运营决策支持的实际可靠性增强层。

英文摘要

Artificial intelligence (AI) is increasingly used to support renewable energy forecasting and grid operations. As renewable penetration grows, reliable probabilistic forecasting is becoming essential for managing uncertainty and supporting risk-aware operational decision-making. However, these forecasts often suffer from miscalibration due to temporal variability, changing weather conditions, and heterogeneous operating regimes. In many real-world settings, renewable energy forecasts are provided by external sources, vendors, or independently trained systems, making retraining infeasible because of limited model access or computational constraints. This creates a need for efficient and model-agnostic methods that can improve the reliability of forecasts after they are produced. This paper presents Context-Aware Conformal Prediction (CACP), a framework for calibrating renewable energy forecasts. The proposed method relies on a weighting mechanism in the calibration procedure that assigns higher weights to historical observations that are more similar to the target forecasting condition. This enables adaptive prediction intervals that reflect local uncertainty regimes without requiring access to, or retraining of, the underlying forecasting model. Experiments are performed on a large-scale dataset of day-ahead solar forecasts from the National Renewable Energy Laboratory (NREL), covering multiple systems including MISO, ERCOT, and SPP. The results show that CACP improves the reliability-efficiency tradeoff at both site and system levels compared to NREL's base forecasting model and other conformal prediction baselines. These results suggest that CACP can serve as a practical reliability-enhancement layer for trustworthy AI-enabled renewable energy forecasting and operational decision support.
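
The weighting idea in the abstract can be illustrated with a minimal sketch of similarity-weighted conformal calibration. This is not the authors' CACP implementation: the Gaussian similarity kernel, the `bandwidth` parameter, the absolute-residual score, and both function names are assumptions made for illustration only.

```python
import math

def weighted_quantile(scores, weights, level):
    """Return the smallest score whose cumulative normalized weight reaches `level`."""
    total = sum(weights)
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= level:
            return score
    return max(scores)  # guard against floating-point round-off

def context_weighted_interval(pred, cal_scores, cal_contexts, target_context,
                              alpha=0.1, bandwidth=1.0):
    """Symmetric interval around `pred` from similarity-weighted calibration scores.

    cal_scores are absolute residuals |y - y_hat| of past forecasts and
    cal_contexts are the (hypothetical) context features observed with them.
    """
    weights = []
    for ctx in cal_contexts:
        dist2 = sum((c - t) ** 2 for c, t in zip(ctx, target_context))
        # Gaussian similarity kernel: an illustrative choice, not the paper's.
        weights.append(math.exp(-dist2 / (2.0 * bandwidth ** 2)))
    q = weighted_quantile(cal_scores, weights, 1.0 - alpha)
    return pred - q, pred + q
```

Because only stored residuals and context features are used, the forecasting model itself is never touched, which matches the post-hoc, model-agnostic setting the abstract describes.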

2509.08048 2026-06-03 hep-ph cs.LG 版本更新

Forecasting Generative Amplification

预测生成放大

Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner

发表机构 * Institut für Theoretische Physik, Universität Heidelberg, Germany（海德堡大学理论物理研究所）； Physics Division, Lawrence Berkeley National Laboratory, Berkeley, USA（伯克利国家实验室物理部）； Interdisciplinary Center for Scientific Computing (IWR), Universität Heidelberg, Germany（海德堡大学跨学科科学计算中心（IWR））

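AI summary: Proposes two complementary approaches, averaged amplification and differential amplification, for estimating the statistical amplification factor of generative networks in LHC simulations without large holdout datasets; applied to state-of-the-art event generators, they show that amplification is achievable in specific regions of phase space but not yet across the entire distribution.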

Comments 23 pages, 15 figures. v2: added link to github repo, extended acknowledgements. v3: updated conventions and refined text, now 25 pages

2506.09398 2026-06-03 cs.LG physics.comp-ph 版本更新

Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

通过SO(2)局部框架高效预测SO(3)等变哈密顿矩阵

Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji

发表机构 * National University of Singapore（新加坡国立大学）

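AI summary: Proposes the QHNetV2 network, which uses SO(2) local frames and SO(2)-equivariant operations to achieve global SO(3) equivariance, avoiding costly SO(3) tensor products and predicting Hamiltonian matrices efficiently.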

Comments Code available at: https://github.com/divelab/AIRS

AI中文摘要

我们考虑预测哈密顿矩阵以加速电子结构计算的任务，这在物理、化学和材料科学中扮演重要角色。受哈密顿矩阵的非对角块与SO(2)局部框架之间固有关系的启发，我们提出了一种新颖高效的网络，称为QHNetV2，该网络在不使用昂贵的SO(3) Clebsch-Gordan张量积的情况下实现了全局SO(3)等变性。这是通过引入一组新的高效且强大的SO(2)等变操作，并在SO(2)局部框架内执行所有非对角特征更新和消息传递来实现的，从而消除了对SO(3)张量积的需求。此外，在每个节点的SO(2)局部框架内执行连续的SO(2)张量积以融合节点特征，模拟对称收缩操作。在大型QH9和MD17数据集上的大量实验表明，我们的模型在广泛的分子结构和轨迹上实现了优越的性能，凸显了其强大的泛化能力。所提出的基于SO(2)局部框架的SO(2)操作为可扩展且对称感知的电子结构学习提供了一个有前景的方向。我们的代码将作为AIRS库的一部分发布，网址为https://github.com/divelab/AIRS。

英文摘要

We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which plays an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivariance without the costly SO(3) Clebsch-Gordan tensor products. This is achieved by introducing a set of new efficient and powerful SO(2)-equivariant operations and performing all off-diagonal feature updates and message passing within SO(2) local frames, thereby eliminating the need for SO(3) tensor products. Moreover, a continuous SO(2) tensor product is performed within the SO(2) local frame at each node to fuse node features, mimicking the symmetric contraction operation. Extensive experiments on the large QH9 and MD17 datasets demonstrate that our model achieves superior performance across a wide range of molecular structures and trajectories, highlighting its strong generalization capability. The proposed SO(2) operations on SO(2) local frames offer a promising direction for scalable and symmetry-aware learning of electronic structures. Our code will be released as part of the AIRS library at https://github.com/divelab/AIRS.

2308.07867 2026-06-03 eess.SY cs.LG cs.SY 版本更新

Learning Power Flow with Confidence: A Probabilistic Guarantee Framework for Voltage Risk

学习潮流与置信度：电压风险的概率保证框架

Parikshit Pareek, Sidhant Misra, Deepjyoti Deka

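AI summary: To address the lack of formal performance guarantees for machine learning in safety-critical power system applications, proposes a probabilistic guarantee framework based on Gaussian process regression that uses a Vertex-Degree Kernel and a network-swipe active learning algorithm for data-efficient, reliable voltage risk assessment.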

Comments 10 pages

AI中文摘要

机器学习缺乏形式化性能保证限制了其在安全关键的电力系统应用中的采用，在这些应用中，置信度和可解释性与准确性同样重要。在这项工作中，我们通过高斯过程回归框架，为潮流学习和电压风险估计提供了概率保证。具体来说，我们建立了期望估计误差的界限，将GP的预测方差与电压风险估计的置信度联系起来，确保与基于蒙特卡洛的ACPF风险量化在统计上等价。为了在低数据情况下增强模型的可学习性，我们首先设计了顶点度核，这是一种拓扑感知的加性核，将电压-负荷相互作用分解为局部邻域，以实现高效的大规模学习。在此基础上，我们引入了一种网络扫描主动学习算法，该算法自适应地采样信息丰富的运行点，并提供了原则性的停止准则，无需样本外验证。这些进展通过结合数据效率和分析保证，缓解了基于机器学习的潮流的主要瓶颈——缺乏可靠的保证。在IEEE 118、500和1354节点系统上的实证评估证实，所提出的VDK-GP实现了低于1E-03 p.u.的平均绝对电压误差，以15倍更少的ACPF计算复现了蒙特卡洛级别的电压风险估计，并在保守地约束违规概率的同时实现了超过120倍的评估时间减少。

英文摘要

The absence of formal performance guarantees in machine learning (ML) has limited its adoption for safety-critical power system applications, where confidence and interpretability are as vital as accuracy. In this work, we present a probabilistic guarantee for power flow learning and voltage risk estimation, derived through the framework of Gaussian Process (GP) regression. Specifically, we establish a bound on the expected estimation error that connects the GP's predictive variance to confidence in voltage risk estimates, ensuring statistical equivalence with Monte Carlo-based ACPF risk quantification. To enhance model learnability in the low-data regime, we first design the Vertex-Degree Kernel (VDK), a topology-aware additive kernel that decomposes voltage-load interactions into local neighborhoods for efficient large-scale learning. Building on this, we introduce a network-swipe active learning (AL) algorithm that adaptively samples informative operating points and provides a principled stopping criterion without requiring out-of-sample validation. Together, these developments mitigate the principal bottleneck of ML-based power flow, its lack of guaranteed reliability, by combining data efficiency with analytical assurance. Empirical evaluations across IEEE 118-, 500-, and 1354-bus systems confirm that the proposed VDK-GP achieves mean absolute voltage errors below 1E-03 p.u., reproduces Monte Carlo-level voltage risk estimates with 15x fewer ACPF computations, and achieves over 120x reduction in evaluation time while conservatively bounding violation probabilities.

2510.09845 2026-06-03 cs.LG cs.AI cs.CV 版本更新

Harnessing Self-Supervised Deep Learning and Geostationary Remote Sensing for Advancing Wildfire and Associated Air Quality Monitoring: Improved Smoke and Fire Front Masking using GOES and TEMPO Radiance Data

利用自监督深度学习和地球静止遥感推进野火及相关空气质量监测：使用GOES和TEMPO辐射数据改进烟雾和火锋掩膜

Nicholas LaHaye, Thilanka Munashinge, Hugo Lee, Xiaohua Pan, Gonzalo Gonzalez Abad, Hazem Mahmoud, Jennifer Wei

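AI summary: Using hourly data from NASA's TEMPO satellite mission and self-supervised deep learning, this study proposes a system that uses GOES-18 and TEMPO data to distinguish smoke from clouds and map wildfire fronts and smoke plumes in real time, significantly outperforming existing operational products.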

Comments https://2025.ieeeigarss.org/view_paper.php?PaperNum=6389&SessionID=1611

2510.08977 2026-06-03 cs.LG cs.CL 版本更新

Breaking the Self-Confirming Loop: Diagnosing and Mitigating Systemic Reward Bias in Self-Rewarding RL

打破自我确认循环：诊断与缓解自奖励强化学习中的系统性奖励偏差

Chuyi Tan, Peiwen Yuan, Xinglin Wang, Yiwei Li, Shaoxiong Feng, Yueqi Zhang, Jiayi Shi, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li

发表机构 * University of Science and Technology of China（中国科学技术大学）

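AI summary: Quantifies feedback-loop bias and proposes reinforcement learning with ensembled rewards (RLER) to diagnose and mitigate the systemic reward bias caused by confidence coupling in self-rewarding reinforcement learning, improving performance and stability.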

AI中文摘要

基于可验证奖励的强化学习（RLVR）高效扩展了大语言模型（LLMs）的推理能力，但受限于稀缺的标注数据。基于内在奖励的强化学习（RLIR）通过自奖励提供了一种可扩展的替代方案，但常面临不稳定和性能较差的问题。我们将这一差距归因于置信度耦合的自奖励中的系统性偏差：模型倾向于过度奖励高置信度的错误，形成自我确认循环。我们通过三个指标量化这种反馈回路偏差：奖励噪声幅度（rho_noise）、策略-奖励耦合（rho_selfbias）和过度/不足奖励偏斜（rho_symbias）。我们的分析显示了一种复合效应，其中强耦合放大了置信度条件误差，并导致向过度奖励的漂移，从而引发不稳定和较低的性能上限。为缓解这一问题，我们提出集成奖励强化学习（RLER），该方法通过自适应奖励插值和分歧感知的轨迹选择聚合多样化的模型，以减少耦合并抑制过度奖励漂移。大量实验表明，RLER相比最佳RLIR基线提升了6.2%，且与RLVR的差距在3.6%以内，同时在未标注样本上表现出稳定的扩展性。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) efficiently scales the reasoning ability of large language models (LLMs) but is bottlenecked by scarce labeled data. Reinforcement learning with intrinsic rewards (RLIR) offers a scalable alternative via self-rewarding, yet often suffers from instability and inferior performance. We trace this gap to a systemic bias in confidence-coupled self-rewarding: the model tends to over-reward high-confidence mistakes, forming a self-confirming loop. We quantify this feedback-loop bias with three metrics: reward noise magnitude (rho_noise), policy-reward coupling (rho_selfbias), and over-/under-reward skew (rho_symbias). Our analyses show a compounding effect where strong coupling amplifies confidence-conditioned errors and drives a drift toward over-reward, leading to instability and a lower performance ceiling. To mitigate this, we propose reinforcement learning with ensembled rewards (RLER), which aggregates diverse models with adaptive reward interpolation and disagreement-aware rollout selection to reduce coupling and suppress over-reward drift. Extensive experiments show that RLER improves by 6.2% over the best RLIR baseline and is within 3.6% of RLVR, while exhibiting stable scaling on unlabeled samples.

2502.09755 2026-06-03 cs.CR cs.LG 版本更新

Jailbreak Attack Initializations as Extractors of Compliance Directions

越狱攻击初始化作为合规方向的提取器

Amit Levi, Rom Himelstein, Yaniv Nemcovsky, Avi Mendelson, Chaim Baskin

发表机构 * Department of Computer Science, Technion - Israel Institute of Technology（以色列理工学院计算机科学系）； Department of Data and Decision Science, Technion - Israel Institute of Technology（以色列理工学院数据与决策科学系）； School of Electrical and Computer Engineering, Ben-Gurion University of the Negev（内盖夫本-古里安大学电气与计算机工程学院）

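AI summary: Finds that gradient-based jailbreak attack initializations converge to a single compliance direction that suppresses refusal, and accordingly proposes the CRI framework, which projects unseen prompts along the compliance direction to raise attack success rate and reduce computational overhead.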

Comments Accepted to Findings of the Association for Computational Linguistics 2025 (EMNLP 2025)

AI中文摘要

安全对齐的LLM对提示的响应要么是合规要么是拒绝，每种响应对应模型激活空间中的不同方向。最近的研究表明，通过从其他提示进行自我迁移来初始化攻击可以显著提升其性能。然而，这些初始化的潜在机制仍不清楚，并且攻击使用任意或手动选择的初始化。本文表明，每个基于梯度的越狱攻击及其后续初始化逐渐收敛到一个抑制拒绝的单一合规方向，从而能够实现从拒绝到合规的高效转换。基于这一见解，我们提出了CRI，一个旨在将未见提示进一步投影到合规方向的初始化框架。我们在多种攻击、模型和数据集上展示了我们的方法，实现了更高的攻击成功率（ASR）并降低了计算开销，突显了安全对齐LLM的脆弱性。参考实现可在以下网址获取：https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation

英文摘要

Safety-aligned LLMs respond to prompts with either compliance or refusal, each corresponding to distinct directions in the model's activation space. Recent works show that initializing attacks via self-transfer from other prompts significantly enhances their performance. However, the underlying mechanisms of these initializations remain unclear, and attacks utilize arbitrary or hand-picked initializations. This work shows that each gradient-based jailbreak attack and its subsequent initialization gradually converge to a single compliance direction that suppresses refusal, thereby enabling an efficient transition from refusal to compliance. Based on this insight, we propose CRI, an initialization framework that aims to project unseen prompts further along compliance directions. We demonstrate our approach on multiple attacks, models, and datasets, achieving an increased attack success rate (ASR) and reduced computational overhead, highlighting the fragility of safety-aligned LLMs. A reference implementation is available at: https://amit1221levi.github.io/CRI-Jailbreak-Init-LLMs-evaluation.

2510.03316 2026-06-03 cs.CV cs.AI cs.LG 版本更新

The View From Space: Navigating Instrumentation Differences with EOFMs

从太空视角：利用EOFMs导航仪器差异

Ryan P. Demilt, Nicholas LaHaye, Karis Tenneson

发表机构 * Spatial Informatics Group（空间信息组）

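AI summary: By analyzing the sensitivity of Earth Observation Foundation Models (EOFMs) to sensor architecture, this study reveals pitfalls in current model design and points to a way forward for model developers, users, and the remote-sensing science community.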

Journal ref: https://neurips.cc/virtual/2025/loc/san-diego/122891

AI中文摘要

地球观测基础模型（EOFMs）作为处理大量遥感及其他地球观测数据、并对许多关键地球监测任务产生影响的工具，其普及程度急剧上升。一个新兴趋势是利用预训练模型的输出作为“嵌入”，这些嵌入总结了高维数据，可用于通用任务，如相似性搜索和内容特定查询。然而，大多数EOFMs仅在单一模态数据上训练，然后通过匹配不同模态的波段进行应用或基准测试。现有工作尚不清楚多样化的传感器架构如何影响当前EOFMs套件的内部表示。我们在本工作中表明，EOFMs的表示空间对传感器架构高度敏感，理解这一差异为我们提供了关于当前EOFMs设计陷阱的关键视角，并指明了作为模型开发者、用户以及以稳健遥感科学为指导的社区应如何前进的方向。

英文摘要

Earth Observation Foundation Models (EOFMs) have exploded in prevalence as tools for processing the massive volumes of remotely sensed and other earth observation data, and for delivering impact on the many essential earth monitoring tasks. An emerging trend is to use the outputs of pre-trained models as 'embeddings' that summarize high-dimensional data for generic tasks such as similarity search and content-specific queries. However, most EOFMs are trained only on single modalities of data and then applied or benchmarked by matching bands across different modalities. It is not clear from existing work what impact diverse sensor architectures have on the internal representations of the present suite of EOFMs. We show in this work that the representation space of EOFMs is highly sensitive to sensor architecture, and that understanding this difference gives a vital perspective on the pitfalls of current EOFM design and signals how to move forward as model developers, users, and a community guided by robust remote-sensing science.

2510.01377 2026-06-03 math.OC cs.AI cs.LG cs.MA cs.SY eess.SY 版本更新

DeMuon: A Decentralized Muon for Matrix Optimization over Graphs

DeMuon：一种用于图上矩阵优化的去中心化Muon方法

Chuan He, Shuyi Ren, Jingwei Mao, Erik G. Larsson

发表机构 * Department of Mathematics, Linköping University（林雪平大学数学系）； Department of Electrical Engineering, Linköping University（林雪平大学电气工程系）； Department of Computer and Information Science, Linköping University（林雪平大学计算机与信息科学系）

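AI summary: Proposes DeMuon, which orthogonalizes matrices via Newton-Schulz iterations and uses gradient tracking to handle heterogeneity among local functions; under heavy-tailed noise it attains a complexity matching that of centralized algorithms, and it is the first extension of Muon to decentralized optimization over graphs with provable complexity guarantees.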

Comments Add an accelerated variant of the proposed method. New proofs of proposed methods

AI中文摘要

本文提出DeMuon，一种在给定通信拓扑上进行去中心化矩阵优化的方法。DeMuon通过牛顿-舒尔茨迭代（继承自其集中式前身Muon）实现矩阵正交化，并采用梯度跟踪来减轻局部函数之间的异质性。在重尾噪声条件和额外的温和假设下，我们建立了DeMuon达到近似随机驻点的迭代复杂度。该复杂度结果在目标容差依赖方面与已知的最佳集中式算法复杂度界相匹配。据我们所知，DeMuon是首个将Muon直接扩展到图上去中心化优化并具有可证明复杂度保证的方法。我们在不同连通程度的图上进行了去中心化Transformer预训练的初步数值实验。数值结果表明，在不同网络拓扑下，DeMuon相比其他流行的去中心化算法具有明显的改进优势。

英文摘要

In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations, a technique inherited from its centralized predecessor, Muon, and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results demonstrate a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
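
To make the orthogonalization step concrete, here is a minimal sketch of a Newton-Schulz iteration that approximates the orthogonal polar factor of a matrix. It uses the classical cubic update X <- 1.5 X - 0.5 X X^T X as an assumption; the abstract does not state which polynomial or coefficients DeMuon uses, and the gradient-tracking and communication steps are omitted. The helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the polar factor U V^T of a full-rank matrix g.

    Classical cubic iteration (an assumed variant, not necessarily DeMuon's).
    """
    fro = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / fro for v in row] for row in g]  # scale so all singular values are <= 1
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x
```

The iteration needs only matrix products, which is why it is attractive on accelerators compared with an explicit singular value decomposition. Each singular value is pushed toward 1 while the singular vectors are left unchanged.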

2509.26169 2026-06-03 cs.LG 版本更新

Alignment-Aware Decoding

对齐感知解码

Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer

发表机构 * EPFL, Switzerland（瑞士联邦理工学院）

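AI summary: Proposes Alignment-Aware Decoding (AAD), an inference-time method for enhancing model alignment that can be interpreted as implicit reward optimization, requires no additional training, outperforms strong baselines across benchmarks and model scales, and can generate synthetic data to improve alignment in data-constrained settings.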

Comments Accepted at ICML 2026

2509.22468 2026-06-03 cs.LG cs.AI 版本更新

Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining

学习邻域：无对比的多模态自监督分子图预训练

Boshra Ariguib, Mathias Niepert, Andrei Manolache

发表机构 * University of Tübingen（图宾根大学）

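AI summary: Proposes the C-FREE framework, which predicts subgraph embeddings from their complementary neighborhoods and fuses 2D topology with 3D conformer information for contrast-free, negative-free multimodal self-supervised molecular graph pretraining, achieving state-of-the-art results on MoleculeNet.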

Comments Accepted at ICML 2026

AI中文摘要

高质量的分子表示对于性质预测和分子设计至关重要，然而大型标注数据集仍然稀缺。尽管分子图上的自监督预训练已显示出潜力，但许多现有方法要么依赖于手工数据增强或复杂的生成目标，要么仅利用2D拓扑，导致宝贵的3D结构信息未被充分利用。为弥补这一空白，我们引入了C-FREE（基于自我网络的无需对比的表示学习），一个将2D图与3D构象集成在一起的简单框架。C-FREE通过从潜在空间中互补邻域预测子图嵌入来学习分子表示，使用固定半径的自我网络作为不同构象之间的建模单元。这种设计使我们能够在混合图神经网络（GNN）-Transformer骨干中整合几何和拓扑信息，无需负样本、位置编码或昂贵的预处理。在提供丰富3D构象多样性的GEOM数据集上进行预训练后，C-FREE在MoleculeNet上取得了最先进的结果，超越了对比、生成和其他多模态自监督方法。在具有不同规模和分子类型的数据集上进行微调进一步表明，预训练能有效迁移到新的化学领域，突显了3D信息分子表示的重要性。

英文摘要

High-quality molecular representations are essential for property prediction and molecular design, yet large labeled datasets remain scarce. While self-supervised pretraining on molecular graphs has shown promise, many existing approaches either depend on hand-crafted augmentations or complex generative objectives, and often rely solely on 2D topology, leaving valuable 3D structural information underutilized. To address this gap, we introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple framework that integrates 2D graphs with ensembles of 3D conformers. C-FREE learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space, using fixed-radius ego-nets as modeling units across different conformers. This design allows us to integrate both geometric and topological information within a hybrid Graph Neural Network (GNN)-Transformer backbone, without negatives, positional encodings, or expensive pre-processing. Pretraining on the GEOM dataset, which provides rich 3D conformational diversity, C-FREE achieves state-of-the-art results on MoleculeNet, surpassing contrastive, generative, and other multimodal self-supervised methods. Fine-tuning across datasets with diverse sizes and molecule types further demonstrates that pretraining transfers effectively to new chemical domains, highlighting the importance of 3D-informed molecular representations.
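
As a rough illustration of the "fixed-radius ego-nets" mentioned above, the sketch below extracts the nodes within a given number of hops of a center atom and the remaining nodes that would serve as context. Treating the radius as a hop count on the 2D graph, and defining the complementary neighborhood as all other nodes, are assumptions for illustration; the abstract does not specify either, and the function names are hypothetical.

```python
from collections import deque

def ego_net(adjacency, center, radius):
    """Return the set of nodes within `radius` hops of `center` (breadth-first search)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)

def complementary_neighborhood(adjacency, center, radius):
    """Nodes outside the ego-net (assumed definition of the prediction context)."""
    return set(adjacency) - ego_net(adjacency, center, radius)
```

In a pretraining loop of this style, an encoder would embed the complementary part and a predictor would be trained to match the embedding of the masked ego-net, so no negative samples are required.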

2509.08726 2026-06-03 math.OC cs.LG 版本更新

Decentralized Stochastic Nonconvex Optimization under the $(L_0,L_1)$-Smoothness

$(L_0,L_1)$-光滑条件下的去中心化随机非凸优化

Luo Luo, Xue Cui, Tingkai Jia, Cheng Chen

发表机构 * School of Data Science, Fudan University（复旦大学数据科学学院）； East China Normal University（华东师范大学）

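AI summary: For nonconvex functions satisfying the $(L_0,L_1)$-smoothness condition, proposes a decentralized normalized stochastic gradient descent algorithm that reaches an $\varepsilon$-stationary point at each local agent, and gives upper bounds on the sample and communication complexity.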

AI中文摘要

本文关注去中心化随机优化问题 $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$，该问题定义在由 $m$ 个智能体组成的连通网络上，每个局部函数形如 $f_i(\mathbf{x}) = \mathbb{E}\left[F(\mathbf{x};\boldsymbol{\xi}_i)\right]$，满足 $(L_0,L_1)$-光滑条件但可能非凸，且每个随机变量 $\boldsymbol{\xi}_i$ 服从分布 $\mathcal{D}_i$。我们提出一种新算法：去中心化归一化随机梯度下降（DNSGD），该算法可使每个局部智能体达到 $\varepsilon$-稳定点。我们提出了一个基于梯度范数与一致性误差乘积的李雅普诺夫函数的新框架，用于分析 $(L_0,L_1)$-光滑设置下的去中心化一阶方法。我们证明，所提算法在每个智能体上的样本复杂度上界为 $\mathcal{O}(m^{-1}(L_f\sigma^2\Delta_f\varepsilon^{-4} + \sigma^2\varepsilon^{-2} + L_f^{-2}L_1^3\sigma^2\Delta_f\varepsilon^{-1} + L_f^{-2}L_1^2\sigma^2))$，通信复杂度上界为 $\tilde{\mathcal{O}}((L_f\varepsilon^{-2} + L_1\varepsilon^{-1})\gamma^{-1/2}\Delta_f)$，其中 $L_f=L_0 +L_1\zeta$，$\sigma^2$ 是随机梯度的方差，$\Delta_f$ 是初始最优函数值差距，$\gamma$ 是网络的谱间隙，$\zeta$ 是梯度异质性程度。在 $L_1=0$ 的特殊情况下，上述结果（几乎）匹配标准光滑条件下去中心化随机非凸优化的下界。我们还进行了数值实验，以展示我们方法的实证优越性。

英文摘要

This paper focuses on the decentralized stochastic optimization problem $f(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m f_i(\mathbf{x})$ over a connected network of $m$ agents, where each local function has the form $f_i(\mathbf{x}) = \mathbb{E}\left[F(\mathbf{x};\boldsymbol{\xi}_i)\right]$, satisfies the $(L_0,L_1)$-smoothness condition but is possibly nonconvex, and each random variable $\boldsymbol{\xi}_i$ follows the distribution $\mathcal{D}_i$. We propose a novel algorithm called decentralized normalized stochastic gradient descent (DNSGD), which can achieve an $\varepsilon$-stationary point at each local agent. We present a new framework for analyzing decentralized first-order methods in the $(L_0,L_1)$-smooth setting, based on a Lyapunov function related to the product of the gradient norm and the consensus error. We show that the proposed algorithm attains an upper bound of $\mathcal{O}(m^{-1}(L_f\sigma^2\Delta_f\varepsilon^{-4} + \sigma^2\varepsilon^{-2} + L_f^{-2}L_1^3\sigma^2\Delta_f\varepsilon^{-1} + L_f^{-2}L_1^2\sigma^2))$ on the sample complexity per agent and an upper bound of $\tilde{\mathcal{O}}((L_f\varepsilon^{-2} + L_1\varepsilon^{-1})\gamma^{-1/2}\Delta_f)$ on the communication complexity, where $L_f=L_0 +L_1\zeta$, $\sigma^2$ is the variance of the stochastic gradient, $\Delta_f$ is the initial optimal function value gap, $\gamma$ is the spectral gap of the network, and $\zeta$ is the degree of the gradient dissimilarity. In the special case of $L_1=0$, the above results (nearly) match the lower bounds of decentralized stochastic nonconvex optimization under the standard smoothness. We also conduct numerical experiments to show the empirical superiority of our method.
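
For readers unfamiliar with the setting, the block below writes out one common form of the $(L_0,L_1)$-smoothness condition and a generic normalized stochastic gradient step. Both are stated as assumptions: the paper may work with a first-order variant of the condition, and the update shown is the textbook single-agent normalized step with an illustrative step size $\eta$, not the DNSGD update itself, which also involves communication over the network.

```latex
% One common (second-order) statement of (L_0, L_1)-smoothness;
% with L_1 = 0 it reduces to the standard L_0-smoothness bound.
\[
  \|\nabla^2 f(\mathbf{x})\| \le L_0 + L_1 \,\|\nabla f(\mathbf{x})\|
  \quad \text{for all } \mathbf{x}.
\]
% Generic normalized stochastic gradient step (illustrative notation):
% g_t is an unbiased stochastic gradient and \eta > 0 is a step size.
\[
  \mathbf{x}_{t+1} = \mathbf{x}_t - \eta \,\frac{\mathbf{g}_t}{\|\mathbf{g}_t\|},
  \qquad \mathbb{E}[\mathbf{g}_t] = \nabla f(\mathbf{x}_t).
\]
```

Normalizing the step keeps its length at $\eta$ even where the gradient, and hence the local curvature bound $L_0 + L_1\|\nabla f(\mathbf{x})\|$, is large, which is the usual motivation for pairing normalization with this smoothness class.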

2502.02748 2026-06-03 cs.LG cond-mat.mtrl-sci 版本更新

ReciNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction

ReciNet: 用于晶体性质预测的倒易空间感知长程建模

Jianan Nie, Peiyao Xiao, Kaiyi Ji, Peng Gao

发表机构 * Department of Computer Science, Virginia Tech（维吉尼亚理工大学计算机科学系）； Department of Computer Science and Engineering, University at Buffalo（布法罗大学计算机科学与工程系）

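AI summary: Proposes ReciNet, a reciprocal space-based geometry network that combines geometric GNNs with reciprocal blocks through a Fourier series representation and learnable filters to model short-range and long-range interactions in crystals, achieving strong predictive accuracy on multiple benchmarks.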

AI中文摘要

从晶体结构预测其性质是材料科学中一项基础但具有挑战性的任务。与分子不同，晶体结构表现出原子的无限周期排列，需要能够有效捕捉局部和全局信息的方法。然而，当前的工作在捕捉周期结构内的长程相互作用方面存在不足。为了解决这个问题，我们利用倒易空间（周期晶体的自然域），并从分数坐标和倒易格矢出发，使用可学习滤波器构建傅里叶级数表示。在此基础上，我们引入了基于倒易空间的几何网络（ReciNet），这是一种新颖的架构，它集成了几何GNN和倒易模块来建模短程和长程相互作用。在综合基准JARVIS、Materials Project和MatBench上的实验表明，ReciNet在一系列晶体性质预测任务中取得了出色的预测精度。此外，我们探索了使用混合专家模型进行多性质预测的模型扩展，该扩展展示了高计算效率，并揭示了相关性质之间的正迁移。这些发现凸显了我们的模型作为可扩展且准确的晶体性质预测解决方案的潜力。

英文摘要

Predicting properties of crystals from their structures is a fundamental yet challenging task in materials science. Unlike molecules, crystal structures exhibit infinite periodic arrangements of atoms, requiring methods capable of capturing both local and global information effectively. However, current works fall short of capturing long-range interactions within periodic structures. To address this, we leverage reciprocal space, the natural domain for periodic crystals, and construct a Fourier series representation from fractional coordinates and reciprocal lattice vectors with learnable filters. Building on this, we introduce the reciprocal space-based geometry network (ReciNet), a novel architecture that integrates geometric GNNs and reciprocal blocks to model short-range and long-range interactions. Experiments on comprehensive benchmarks JARVIS, Materials Project, and MatBench demonstrate that ReciNet achieves outstanding predictive accuracy across a range of crystal property prediction tasks. Additionally, we explore a model extension for multi-property prediction with a mixture-of-experts, which demonstrates high computational efficiency and reveals positive transfer between correlated properties. These findings highlight the potential of our model as a scalable and accurate solution for crystal property prediction.
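
The two ingredients named in the abstract, reciprocal lattice vectors and a Fourier sum over fractional coordinates, can be sketched as follows. This is a textbook construction under stated assumptions, not ReciNet's architecture: the sum is unweighted (no learnable filters), all atoms are treated alike, and the function names are hypothetical.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reciprocal_lattice(a1, a2, a3):
    """Reciprocal vectors b_i satisfying a_i . b_j = 2*pi if i == j, else 0."""
    volume = dot(a1, cross(a2, a3))
    scale = 2.0 * math.pi / volume
    return (tuple(scale * c for c in cross(a2, a3)),
            tuple(scale * c for c in cross(a3, a1)),
            tuple(scale * c for c in cross(a1, a2)))

def fourier_feature(frac_coords, hkl):
    """Real and imaginary parts of sum_j exp(2*pi*i * hkl . f_j) over all atoms.

    For G = h*b1 + k*b2 + l*b3 and a position r with fractional coordinates f,
    G . r equals 2*pi * (h, k, l) . f, so no Cartesian positions are needed.
    """
    phases = [2.0 * math.pi * dot(hkl, f) for f in frac_coords]
    return sum(math.cos(p) for p in phases), sum(math.sin(p) for p in phases)
```

Each Fourier component depends on every atom in the cell at once, which is why reciprocal-space features are a natural carrier for long-range, periodic information.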

2509.19305 2026-06-03 cs.LG cs.AI eess.SP 版本更新

Wavelet Fourier Diffuser: Frequency-Aware Diffusion Model for Reinforcement Learning

小波傅里叶扩散器：用于强化学习的频率感知扩散模型

Yifu Luo, Yongzhe Chang, Xueqian Wang

发表机构 * Tsinghua University, China（清华大学，中国）

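AI summary: To address the frequency shift caused by existing diffusion models ignoring frequency-domain features in offline reinforcement learning, proposes WFDiffuser, which decomposes trajectories with the Discrete Wavelet Transform and strengthens frequency-domain modeling with the Short-Time Fourier Transform and cross attention; on the D4RL benchmark it mitigates frequency shift and improves trajectory stability and decision-making performance.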

Comments IJCNN 2025

Journal ref: IJCNN 2025

AI中文摘要

扩散概率模型通过直接建模轨迹序列，在离线强化学习中展现出显著潜力。然而，现有方法主要关注时域特征而忽略频域特征，根据我们的观察，这会导致频率偏移和性能下降。在本文中，我们从频域的新视角研究强化学习问题。我们首先观察到，仅使用时域的方法会无意中引入频域低频分量的偏移，从而导致轨迹不稳定和性能下降。为了解决这个问题，我们提出了小波傅里叶扩散器（WFDiffuser），一种新颖的基于扩散的强化学习框架，它集成了离散小波变换将轨迹分解为低频和高频分量。为了进一步增强每个分量的扩散建模，WFDiffuser采用短时傅里叶变换和交叉注意力机制来提取频域特征并促进跨频率交互。在D4RL基准上的大量实验结果表明，WFDiffuser有效缓解了频率偏移，从而产生更平滑、更稳定的轨迹，并相比现有方法提高了决策性能。

英文摘要

Diffusion probabilistic models have shown significant promise in offline reinforcement learning by directly modeling trajectory sequences. However, existing approaches primarily focus on time-domain features while overlooking frequency-domain features, which, according to our observations, leads to frequency shift and degraded performance. In this paper, we investigate the RL problem from the new perspective of the frequency domain. We first observe that time-domain-only approaches inadvertently introduce shifts in the low-frequency components of the frequency domain, which results in trajectory instability and degraded performance. To address this issue, we propose Wavelet Fourier Diffuser (WFDiffuser), a novel diffusion-based RL framework that integrates the Discrete Wavelet Transform to decompose trajectories into low- and high-frequency components. To further enhance diffusion modeling for each component, WFDiffuser employs the Short-Time Fourier Transform and cross attention mechanisms to extract frequency-domain features and facilitate cross-frequency interaction. Extensive experimental results on the D4RL benchmark demonstrate that WFDiffuser effectively mitigates frequency shift, leading to smoother, more stable trajectories and improved decision-making performance over existing methods.
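
The decomposition into low- and high-frequency components can be illustrated with a one-level Haar transform applied to a single trajectory dimension. The choice of the Haar wavelet and a single decomposition level is an assumption; the abstract only says that a Discrete Wavelet Transform is used, and the function names are hypothetical.

```python
import math

def haar_dwt(signal):
    """One-level Haar transform: returns (low-frequency, high-frequency) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high

def haar_idwt(low, high):
    """Inverse of haar_dwt: rebuild each pair of neighboring samples."""
    s = 1.0 / math.sqrt(2.0)
    signal = []
    for a, d in zip(low, high):
        signal.extend([s * (a + d), s * (a - d)])
    return signal
```

The low band carries scaled pairwise averages (the slow trend of the trajectory) and the high band carries pairwise differences (fast changes). Because the transform is invertible, a model can process the two bands separately and still reconstruct a full trajectory.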

2509.08707 2026-06-03 q-bio.BM cs.LG 版本更新

Tokenizing Loops of Antibodies

抗体环的标记化

Ada Fang, Robert G. Alberstein, Simon Kelow, Frédéric A. Dreyer

发表机构 * Harvard University（哈佛大学）； Prescient Design, Genentech（Prescient Design，基因泰克）

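AI summary: Proposes Igloo, a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence with contrastive learning, efficiently retrieves similar loop structures, improves H3 loop identification by 5.9%, and can be integrated into protein language models to improve antibody design.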

Comments 21 pages, 7 figures, 10 tables, code available at https://github.com/prescient-design/igloo

AI中文摘要

抗体的互补决定区是环状结构，对其与抗原的相互作用至关重要，并且对新型生物制品的设计具有高度重要性。自20世纪80年代以来，将CDR结构的多样性分类为规范簇使得能够识别抗体的关键结构基序。然而，现有方法的覆盖范围有限，并且不能轻易地整合到蛋白质基础模型中。在这里，我们介绍了免疫球蛋白环标记器Igloo，这是一种多模态抗体环标记器，用于编码主链二面角和序列。Igloo使用对比学习目标进行训练，以在潜在空间中将具有相似主链二面角的环映射得更近。Igloo可以高效地从结构抗体数据库中检索最接近的匹配环结构，在识别相似H3环方面比现有方法高出5.9%。Igloo为所有环分配标记，解决了规范簇覆盖范围有限的问题，同时保留了恢复规范环构象的能力。为了展示Igloo标记的多功能性，我们展示了它们可以通过IglooLM和IglooALM整合到蛋白质语言模型中。在预测重链变体的结合亲和力方面，IglooLM在10个抗体-抗原靶点中的8个上优于基础蛋白质语言模型。此外，它与现有的最先进的基于序列和多模态蛋白质语言模型相当，与参数多7倍的模型表现相当。IglooALM采样的抗体环在序列上多样化，在结构上比最先进的抗体逆折叠模型更一致。Igloo展示了引入多模态标记用于抗体环在编码抗体环的多样化景观、改进蛋白质基础模型以及抗体CDR设计方面的优势。

英文摘要

The complementarity-determining regions (CDRs) of antibodies are loop structures that are key to their interactions with antigens, and of high importance to the design of novel biologics. Since the 1980s, categorizing the diversity of CDR structures into canonical clusters has enabled the identification of key structural motifs of antibodies. However, existing approaches have limited coverage and cannot be readily incorporated into protein foundation models. Here we introduce the ImmunoGlobulin LOOp Tokenizer (Igloo), a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence. Igloo is trained using a contrastive learning objective to map loops with similar backbone dihedral angles closer together in latent space. Igloo can efficiently retrieve the closest matching loop structures from a structural antibody database, outperforming existing methods on identifying similar H3 loops by 5.9%. Igloo assigns tokens to all loops, addressing the limited coverage issue of canonical clusters, while retaining the ability to recover canonical loop conformations. To demonstrate the versatility of Igloo tokens, we show that they can be incorporated into protein language models with IglooLM and IglooALM. On predicting binding affinity of heavy chain variants, IglooLM outperforms the base protein language model on 8 out of 10 antibody-antigen targets. Additionally, it is on par with existing state-of-the-art sequence-based and multimodal protein language models, performing comparably to models with 7x more parameters. IglooALM samples antibody loops that are diverse in sequence and more consistent in structure than those of state-of-the-art antibody inverse folding models. Igloo demonstrates the benefit of introducing multimodal tokens for antibody loops: encoding the diverse landscape of antibody loops, improving protein foundation models, and supporting antibody CDR design.
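
A contrastive objective of this kind needs some notion of when two loops have "similar backbone dihedral angles". The sketch below shows one periodicity-aware distance between two equal-length lists of dihedral angles. The specific formula, the degree units, and the function name are assumptions for illustration; the abstract does not say which similarity Igloo uses.

```python
import math

def dihedral_distance(angles_a, angles_b):
    """Mean of 2 * (1 - cos(delta)) over paired dihedral angles given in degrees.

    The cosine makes the measure periodic, so 179 and -179 degrees count as close.
    """
    if len(angles_a) != len(angles_b):
        raise ValueError("loops must have the same number of dihedral angles")
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        total += 2.0 * (1.0 - math.cos(math.radians(a - b)))
    return total / len(angles_a)
```

Pairs with a small distance could be treated as positives and pairs with a large distance as negatives when forming contrastive batches; the thresholds would have to be chosen empirically.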

2507.23035 2026-06-03 cs.LG cs.AR 版本更新

OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration

OASIS：基于查找表的离群点感知双端量化LLM推理加速通用矩阵乘法

Xueying Wu, Baijun Zhou, Zhihui Gao, Yuzhe Fu, Qilin Zheng, Yintao He, Hai Li

Affiliations*: National University of Singapore

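AI summary: Proposes OASIS, an architecture that uses precomputed Cartesian-product lookup tables for efficient GEMM between non-uniformly quantized weights and activations; an outlier-aware quantization scheme and the real-time outlier detection engine Orizuru preserve accuracy while significantly improving inference speed and energy efficiency.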


AI中文摘要

大型语言模型（LLM）在各种应用中展现了令人印象深刻的能力，但在推理过程中需要大量的内存和计算资源。现有的量化方法在效率和准确性之间存在权衡：仅权重量化（WOQ）引入了昂贵的反量化开销，而整数权重和激活量化（INT-WAQ）降低了精度并损害了模型质量。非均匀权重和激活量化（NU-WAQ）能更好地捕捉LLM权重和激活的非均匀分布，但仍与传统的低精度计算单元不兼容。本文提出了OASIS，一种基于查找表（LUT）的架构，能够在无需反量化的情况下实现非均匀量化权重和激活之间的高效通用矩阵乘法（GEMM）。OASIS采用预计算的笛卡尔积LUT，实现了LUT大小的64倍缩减，并相较于现有基于LUT的GEMM方法实现了1024倍的计算并行度提升。为了在激进的激活量化下保持精度，OASIS引入了一种离群点感知量化方案，同时进行基于LUT的GEMM和针对离群点的误差补偿。此外，我们设计了Orizuru，一种用于实时激活离群点检测的高效top-k检测引擎。根据广泛评估，与FP16基线相比，OASIS的平均精度下降仅为1.98%，比Atom低5.18%。在硬件方面，与FIGLUT加速器相比，OASIS实现了平均3.00倍的加速和1.44倍的能效提升。

英文摘要

Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications, but demand substantial memory and compute resources during inference. Existing quantization methods expose a trade-off between efficiency and accuracy: weight-only quantization (WOQ) incurs costly dequantization overheads, while integer weight-and-activation quantization (INT-WAQ) reduces precision and degrades model quality. Non-uniform weight-and-activation quantization (NU-WAQ) can better capture the non-uniform distributions of LLM weights and activations, yet remains incompatible with conventional low-precision compute units. This paper presents OASIS, a lookup table (LUT)-based architecture that enables efficient general matrix multiplication (GEMM) between non-uniformly quantized weights and activations without requiring dequantization. OASIS employs pre-computed Cartesian Product LUTs, achieving a 64x reduction in LUT size and enabling a 1024x higher computational parallelism over existing LUT-based GEMM methods. To preserve accuracy under aggressive activation quantization, OASIS introduces an outlier-aware quantization scheme with concurrent LUT-based GEMM and error compensation for outliers. Furthermore, we design Orizuru, an efficient top-k detection engine for real-time activation outlier identification. According to extensive evaluations, OASIS incurs an average accuracy drop of only 1.98% compared to the FP16 baseline, which is 5.18% lower than Atom. On the hardware side, OASIS achieves an average 3.00x speedup and a 1.44x energy efficiency improvement compared to the FIGLUT accelerator.
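As a rough illustration of the Cartesian-product LUT idea described in the abstract, the sketch below assumes small non-uniform codebooks for weights and activations. The codebook values, sizes, and function names are made up for illustration and are not taken from the OASIS design. Every weight-level/activation-level product is precomputed once, so the matrix multiplication reduces to table lookups and additions, with no dequantization step in the inner loop.

```python
import numpy as np

def build_product_lut(w_codebook, a_codebook):
    """Precompute every weight-level x activation-level product."""
    return np.outer(w_codebook, a_codebook)  # shape: (n_w_levels, n_a_levels)

def lut_gemm(w_codes, a_codes, lut):
    """GEMM over integer codes using only lookups and sums.

    w_codes: (M, K) integer indices into the weight codebook
    a_codes: (K, N) integer indices into the activation codebook
    """
    # Gather lut[w_codes[m, k], a_codes[k, n]] for all m, k, n, then reduce over k.
    products = lut[w_codes[:, :, None], a_codes[None, :, :]]  # (M, K, N)
    return products.sum(axis=1)

# Hypothetical non-uniform 2-bit codebooks (4 levels each).
w_codebook = np.array([-1.0, -0.25, 0.25, 1.0])
a_codebook = np.array([0.0, 0.1, 0.5, 2.0])

rng = np.random.default_rng(0)
w_codes = rng.integers(0, 4, size=(3, 8))
a_codes = rng.integers(0, 4, size=(8, 5))

lut = build_product_lut(w_codebook, a_codebook)
out = lut_gemm(w_codes, a_codes, lut)

# Reference path: dequantize both operands first, then multiply.
ref = w_codebook[w_codes] @ a_codebook[a_codes]
```

The lookup path and the dequantize-then-multiply path agree up to floating-point rounding; the sketch says nothing about the LUT size reduction, parallelism, or outlier handling reported in the abstract.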


2508.13174 2026-06-03 cs.AI cs.LG q-fin.CP stat.ML 版本更新

AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

AlphaEval：一个全面高效的公式化Alpha挖掘评估框架

Hongjun Ding, Binqi Chen, Jinsheng Huang, Taian Guo, Zhengyang Mao, Guoyi Shao, Lutong Zou, Luchen Liu, Ming Zhang

Affiliations*: CUNY Baruch College; Peking University; Harvard University; Zhengren Research; Zhengren Quant

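AI summary: Proposes AlphaEval, a unified, parallelizable, backtest-free framework that evaluates automated alpha mining models along five dimensions (predictive power, stability, robustness, financial logic, diversity), achieving evaluation consistency comparable to backtesting with higher efficiency.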

Comments Accepted by KDD2026


DOI: 10.1145/3770855.3817727

AI中文摘要

公式化Alpha挖掘从金融数据中生成预测信号，对量化投资至关重要。尽管遗传编程、强化学习和大语言模型等多种算法方法显著扩展了Alpha发现的能力，但系统评估仍是一个关键挑战。现有评估指标主要包括回测和基于相关性的度量。回测计算密集、本质上是顺序的，并且对特定策略参数敏感。基于相关性的度量虽然高效，但仅评估预测能力，忽略了时间稳定性、鲁棒性、多样性和可解释性等其他关键属性。此外，大多数现有Alpha挖掘模型的闭源性质阻碍了可重复性并减缓了该领域的进展。为解决这些问题，我们提出了AlphaEval，一个统一、可并行化且无需回测的自动Alpha挖掘模型评估框架。AlphaEval沿五个互补维度评估生成Alpha的整体质量：预测能力、稳定性、对市场扰动的鲁棒性、金融逻辑和多样性。跨代表性Alpha挖掘算法的广泛实验表明，AlphaEval实现了与全面回测相当的评估一致性，同时提供更全面的洞察和更高的效率。此外，与传统的单一指标筛选方法相比，AlphaEval能有效识别更优的Alpha。所有实现和评估工具均已开源，以促进可重复性和社区参与。

英文摘要

Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches, such as genetic programming, reinforcement learning, and large language models, have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation metrics predominantly include backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing more comprehensive insights and higher efficiency. Furthermore, AlphaEval effectively identifies superior alphas compared to traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.
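To make the "correlation-based measures" mentioned in the abstract concrete, here is a minimal sketch of one common choice, a per-day rank correlation between alpha values and subsequent returns. It is not AlphaEval's own metric; the function names and the synthetic data are assumptions for illustration. Each day is scored independently, which is what makes this style of metric cheap and easy to parallelize compared with a sequential backtest.

```python
import numpy as np

def ordinal_rank(values):
    """Ranks 0..n-1; ties are broken by position, which is fine for continuous data."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def rank_ic(alpha, future_returns):
    """Spearman-style rank correlation for one cross-section of assets."""
    return float(np.corrcoef(ordinal_rank(alpha), ordinal_rank(future_returns))[0, 1])

def daily_rank_ic(alpha_panel, return_panel):
    """One score per day; panels have shape (days, assets)."""
    return np.array([rank_ic(a, r) for a, r in zip(alpha_panel, return_panel)])

# Synthetic panel: returns are the alpha signal plus noise (illustrative only).
rng = np.random.default_rng(0)
alpha_panel = rng.standard_normal((20, 50))
return_panel = alpha_panel + 0.5 * rng.standard_normal((20, 50))
ics = daily_rank_ic(alpha_panel, return_panel)
```

A score like this captures predictive ability only, which is the gap the abstract points to: stability, robustness, financial logic, and diversity need separate measurements.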


2507.19684 2026-06-03 cs.LG cs.AI cs.CL cs.CV 版本更新

CoMPAS3D: A Dataset and Benchmark for Interactive Motion

CoMPAS3D: 一个用于交互动作的数据集和基准

Bermet Burkanova, Yasaman Etesam, Payam Jome Yazdian, Trinity Evans, Chuxuan Zhang, Zoe Stanley, Paige Tuttösí, Angelica Lim

Affiliations*: School of Computing Science, Simon Fraser University

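AI summary: Proposes the CoMPAS3D dataset and evaluation framework, which use objective metrics such as move legibility and proficiency appropriateness to address the lack of social-context evaluation in interactive motion generation.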

Comments https://rosielab.github.io/compas3d


AI中文摘要

社交互动型人形机器人必须通过身体与人类互动，实时适应伙伴的动作、意图和能力。这需要模型不仅理解身体如何移动，还要理解在共享社交背景下动作的含义。然而，交互式动作生成的评估框架并未衡量生成的动作是否在共享动作词汇中可读，也不评估其是否适合伙伴的熟练水平。这一差距有两个原因：现有框架依赖运动学指标（如FID和节拍对齐），无法衡量上述特性；现有数据集缺乏动作标注和熟练度变化。萨尔萨舞作为评估领域很合适：即兴、双人、由动作词汇和评判标准（涵盖时机、音乐性、技巧、难度、配合和原创性）指导。我们提出CoMPAS3D，一个即兴双人萨尔萨舞的动作捕捉数据集，附带评估框架，涵盖运动学质量、两个客观指标（动作可读性和熟练度适当性）以及六个基于竞赛的主观维度。数据集包含18名舞者（涵盖初级、中级和高级水平）的3小时即兴表演，超过2800个专家标注片段，涵盖动作类型、错误和风格元素。我们定义了三个基准：动作分类（类似于转录）、熟练度估计（流利度评估）和跟随者生成（对话响应）。微调的视觉语言模型在应用于真实动作序列的客观指标上表现强劲。应用于Duolando和InterGen时，这些指标揭示了运动学指标遗漏的失败。人工评估确认了生成动作与真实动作之间的差距。CoMPAS3D、标注、基准代码和基线结果公开可用。

英文摘要

Socially interactive humanoid robots must engage with humans through their bodies, adapting in real time to a partner's movement, intent, and abilities. This requires models that understand not just how bodies move, but what movement means in a shared social context. Yet evaluation frameworks for interactive motion generation do not measure whether generated follower motion is legible within a shared movement vocabulary, nor whether it is appropriate to the partner's proficiency level. This gap has two causes: existing frameworks rely on kinematic metrics such as FID and beat alignment that cannot measure either property, and existing datasets lack the move annotations and proficiency variation needed. Salsa is well-suited as an evaluation domain: improvised, dyadic, and governed by a move vocabulary and judging criteria covering timing, musicality, technique, difficulty, partnering, and originality. We present CoMPAS3D, a motion capture dataset of improvised partner salsa paired with an evaluation framework covering kinematic quality, two objective metrics (move legibility and proficiency appropriateness), and six competition-based subjective dimensions. The dataset includes 3 hours of improvisation by 18 dancers spanning beginner, intermediate, and professional levels, with over 2,800 expert-annotated segments covering move types, errors, and stylistic elements. We define three benchmarks: move classification (analogous to transcription), proficiency estimation (fluency assessment), and follower generation (dialogue response). Fine-tuned vision-language models perform strongly on objective metrics applied to ground-truth motion sequences. Applied to Duolando and InterGen, the metrics reveal failures that kinematic metrics miss. Human evaluations confirm the gap between generated and ground-truth motion. CoMPAS3D, annotations, benchmark code, and baseline results are publicly available.


2504.01531 2026-06-03 cs.LG 版本更新

DRAN: A Distribution and Relation Adaptive Network for Spatio-temporal Forecasting

DRAN：一种面向时空预测的分布与关系自适应网络

Xiaobei Zou, Luolin Xiong, Kexuan Zhang, Cesare Alippi, Yang Tang

Affiliations*: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai; Faculty of Informatics, Università della Svizzera italiana; Department of Electronics, Information and Bioengineering, Politecnico di Milano

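AI summary: To address forecasting in non-stationary spatio-temporal systems, proposes the Distribution and Relation Adaptive Network (DRAN), whose Spatial Factor Learner (SFL) and Dynamic-Static Fusion Learner (DSFL) adapt to distribution shifts and relation changes respectively; it outperforms existing methods on weather and traffic forecasting tasks.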

Comments 15 pages, 10 figures


AI中文摘要

准确的时空系统预测对于系统管理、控制和危机预防等任务至关重要。然而，许多时空系统固有的时变性给在非平稳条件下实现准确预测带来了挑战。为了解决非平稳性问题，我们提出了一种分布与关系自适应网络（DRAN），能够动态适应随时间变化的关系和分布。虽然时间归一化和反归一化是适应分布偏移的常用技术，但这种操作不适用于时空上下文，因为时间归一化会缩放节点的时间序列，可能破坏节点间的空间关系。为了解决这个问题，我们开发了一个空间因子学习器（SFL）模块，使得归一化和反归一化过程得以实现。为了适应传感器间空间关系的动态变化，我们提出了一种动态-静态融合学习器（DSFL）模块，通过自适应融合比例机制有效整合从动态和静态关系中学习到的特征。此外，我们引入了一个随机学习器来捕获时空表示中的噪声成分。我们的方法在天气预测和交通流预测任务上优于现有最先进方法。实验结果表明，我们的SFL在各种时间归一化操作下有效保持了空间关系。对学习到的动态和静态关系的可视化表明，DSFL能够捕获节点间的局部和远程关系。

英文摘要

Accurate predictions of spatio-temporal systems are crucial for tasks such as system management, control, and crisis prevention. However, the inherent time variance of many spatio-temporal systems poses challenges to achieving accurate predictions whenever stationarity is not granted. In order to address non-stationarity, we propose a Distribution and Relation Adaptive Network (DRAN) capable of dynamically adapting to relation and distribution changes over time. While temporal normalization and de-normalization are frequently used techniques to adapt to distribution shifts, this operation is not suitable for the spatio-temporal context as temporal normalization scales the time series of nodes and possibly disrupts the spatial relations among nodes. In order to address this problem, a Spatial Factor Learner (SFL) module is developed that enables the normalization and de-normalization process. To adapt to dynamic changes in spatial relationships among sensors, we propose a Dynamic-Static Fusion Learner (DSFL) module that effectively integrates features learned from both dynamic and static relations through an adaptive fusion ratio mechanism. Furthermore, we introduce a Stochastic Learner to capture the noisy components of spatio-temporal representations. Our approach outperforms state-of-the-art methods on weather prediction and traffic flow forecasting tasks. Experimental results show that our SFL efficiently preserves spatial relationships across various temporal normalization operations. Visualizations of the learned dynamic and static relations demonstrate that DSFL can capture both local and distant relationships between nodes.
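The concern about temporal normalization can be seen in a toy case. The sketch below is not the SFL module; it is a plain per-node normalization and de-normalization pair, with function names and data chosen for illustration. Two nodes carry the same waveform at a 10x scale difference, and after normalization the two series become nearly identical, so the cross-node scale relation is no longer visible to whatever model consumes the normalized input.

```python
import numpy as np

def temporal_normalize(x, eps=1e-5):
    """Normalize each node's window along time. x has shape (time, nodes)."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps), mean, std

def temporal_denormalize(z, mean, std, eps=1e-5):
    """Invert temporal_normalize using the stored per-node statistics."""
    return z * (std + eps) + mean

t = np.linspace(0.0, 2.0 * np.pi, 48)
x = np.stack([np.sin(t), 10.0 * np.sin(t)], axis=1)  # node 1 is 10x node 0

z, mean, std = temporal_normalize(x)
x_back = temporal_denormalize(z, mean, std)

# After normalization the two node series are almost indistinguishable.
max_gap = np.abs(z[:, 0] - z[:, 1]).max()
```

De-normalization restores the original scales at the output, but only because the per-node statistics were stored; inside the normalized space the relative magnitude between nodes is gone.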


2506.21129 2026-06-03 cs.LG cs.AI 版本更新

Curriculum-Adapted Robust Reinforcement Learning for UAV Deconfliction in Adversarial Environments

对抗环境中无人机冲突消解的课程自适应鲁棒强化学习

Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo

Affiliations*: Faculty of Engineering and Applied Sciences, Cranfield University

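AI summary: Proposes a curriculum-guided adaptation framework that progressively exposes the policy to gradient-based adversarial observation perturbations and aligns temporal-difference error distributions, improving UAV robustness and generalization under GNSS spoofing attacks.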


AI中文摘要

自主无人机（UAV）越来越依赖强化学习（RL）进行导航。然而，全球导航卫星系统（GNSS）欺骗攻击可能导致分布外观测偏移，破坏价值估计并降低任务性能。现有的鲁棒RL方法通常能提高对特定攻击模型的抵抗力，但往往无法泛化到训练中未遇到的攻击。为解决这一局限，我们提出一种课程引导的适应框架，该框架逐步将鲁棒策略暴露于强度递增的基于梯度的对抗观测扰动，同时对齐课程阶段间的时序差分（TD）误差分布。所提出的方法不是适应特定的攻击模型，而是保持TD误差一致性以促进跨攻击条件的可迁移性。我们进一步推导了一个TD空间泛化保证，表明如果测试时攻击引起的TD误差分布与最终课程阶段的分布足够接近，则由此产生的性能退化是有界的。该框架在具有动态3D障碍物的无人机冲突消解环境中进行评估，面对之前未见过的固定和动态GNSS欺骗攻击。在固定欺骗条件下，课程适应策略实现了近乎完美的任务成功率，而标准和鲁棒RL基线为20-56%。在动态障碍物引诱欺骗攻击下，它获得了最高的情节奖励，同时随着空中交通密度的增加，任务完成步骤最多减少了45%。

英文摘要

Autonomous unmanned aerial vehicles (UAVs) increasingly rely on reinforcement learning (RL) for navigation. However, global navigation satellite system (GNSS) spoofing attacks can induce out-of-distribution observation shifts that corrupt value estimation and degrade mission performance. Existing robust RL approaches typically improve resilience against specific attack models but often fail to generalize to attacks not encountered during training. To address this limitation, we propose a curriculum-guided adaptation framework that progressively exposes a robust policy to gradient-based adversarial observation perturbations of increasing intensity while aligning temporal-difference (TD) error distributions across curriculum stages. Rather than adapting to a particular attack model, the proposed approach preserves TD-error consistency to promote transferability across attack conditions. We further derive a TD-space generalization certificate showing that if the TD-error distribution induced by a test-time attack remains sufficiently close to that of the final curriculum stage, the resulting performance degradation is bounded. The framework is evaluated in a UAV deconfliction environment with dynamic 3D obstacles under previously unseen fixed and dynamic GNSS spoofing attacks. Under fixed spoofing conditions, the curriculum-adapted policy achieved near-perfect mission success rates, compared with 20-56% for standard and robust RL baselines. Under dynamic obstacle-luring spoofing attacks, it achieved the highest episodic rewards while reducing mission completion steps by up to 45% across increasing aerial traffic densities.


2506.01969 2026-06-03 cs.DC cs.AI cs.LG 版本更新

FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs

FlashMLA-ETAP：用于加速NVIDIA H20 GPU上MLA推理的高效转置注意力流水线

Pengcuo Dege, Qiuming Luo, Rui Mao, Chang Kong

Affiliations*: Tencent; College of Computer Science and Software Engineering, Shenzhen University; College of Artificial Intelligence, Shenzhen Polytechnic University

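AI summary: To address inefficient Multi-Head Latent Attention (MLA) inference when deploying the DeepSeek-R1 671B model on a single multi-GPU server, proposes FlashMLA-ETAP, which reconfigures attention computation through the Efficient Transpose Attention Pipeline (ETAP), achieving a 2.78x speedup on NVIDIA H20 GPUs while maintaining numerical stability.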

Comments Accepted by ICONIP2025


AI中文摘要

多头潜在注意力（MLA）的高效推理面临在单台多GPU服务器上部署DeepSeek-R1 671B模型的挑战。本文介绍FlashMLA-ETAP，一种新颖的框架，用于增强NVIDIA H20 GPU上单实例部署场景的MLA推理。我们提出了高效转置注意力流水线（ETAP），通过转置重新配置注意力计算，使KV上下文长度与WGMMA操作中的$M$维度对齐，显著减少冗余计算。FlashMLA-ETAP在64K序列长度（批大小16）下比FlashMLA加速2.78倍，比FlashAttention-3和FlashInfer分别提升5.24倍和4.94倍，同时保持数值稳定性，均方根误差（RMSE）比FlashAttention-3低15.2倍（$1.25 \times 10^{-5}$）。此外，ETAP的设计能够无缝集成到FlashAttention-3和FlashInfer等框架中，并有详细的理论分析支持。我们的工作解决了资源受限推理中的一个关键空白，为中端GPU提供了可扩展的解决方案，并为硬件感知优化的更广泛采用铺平了道路。代码可在https://github.com/pengcuo/FlashMLA-ETAP获取。

英文摘要

Efficient inference of Multi-Head Latent Attention (MLA) is challenging when deploying the DeepSeek-R1 671B model on a single multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that enhances MLA inference for the single-instance deployment scenario on NVIDIA H20 GPUs. We propose the Efficient Transpose Attention Pipeline (ETAP), which reconfigures attention computation through transposition to align the KV context length with the $M$-dimension in WGMMA operations, significantly reducing redundant computations. FlashMLA-ETAP achieves a 2.78x speedup over FlashMLA at 64K sequence length (batch size 16), with 5.24x and 4.94x improvements over FlashAttention-3 and FlashInfer, respectively, while maintaining numerical stability with a 15.2x lower RMSE ($1.25 \times 10^{-5}$) than FlashAttention-3. Furthermore, ETAP's design enables seamless integration into frameworks like FlashAttention-3 and FlashInfer, supported by a detailed theoretical analysis. Our work addresses a critical gap in resource-constrained inference, offering a scalable solution for mid-tier GPUs and paving the way for broader adoption in hardware-aware optimization. Code is available at https://github.com/pengcuo/FlashMLA-ETAP.
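The algebra behind a transposed attention layout can be checked in a few lines. The sketch below is not the ETAP kernel and models no WGMMA tiling; it only shows, under assumed shapes and function names, that computing scores as K Q^T (so the long KV axis is the leading matrix dimension) and normalizing along that axis gives the same output as the usual Q K^T form.

```python
import numpy as np

def attention(q, k, v):
    """Standard form: scores are (n_q, n_kv); softmax runs over the KV axis (axis 1)."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p @ v

def attention_transposed(q, k, v):
    """Transposed form: scores are (n_kv, n_q), so the KV length is the leading dimension."""
    s_t = k @ q.T / np.sqrt(q.shape[-1])
    p_t = np.exp(s_t - s_t.max(axis=0, keepdims=True))
    p_t /= p_t.sum(axis=0, keepdims=True)   # softmax still runs over the KV axis (axis 0)
    return (v.T @ p_t).T                     # O^T = V^T P^T

# Illustrative decode-like shapes: few query rows, long KV context.
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64))
k = rng.standard_normal((1024, 64))
v = rng.standard_normal((1024, 32))

out_std = attention(q, k, v)
out_t = attention_transposed(q, k, v)
```

Both paths return a (16, 32) output that matches to floating-point tolerance. Whether the transposed layout saves work depends on hardware tile shapes, which this NumPy sketch does not model.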


2506.03087 2026-06-03 cs.LG cs.AI 版本更新

Do Explanations Increase the Risk of Decision Logic Leakage? Explanation-Guided Stealing of Graph Models

解释是否会增加决策逻辑泄露的风险？解释引导的图模型窃取

Bin Ma, Yuyuan Feng, Minhua Lin, Enyan Dai

Affiliations*: The Hong Kong University of Science and Technology (Guangzhou); Xiamen University; The Pennsylvania State University

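AI summary: Studies the risk that explanation mechanisms leak the decision logic of graph neural networks, and proposes a model stealing framework combining explanation alignment with data augmentation; experiments show advantages over conventional methods.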


AI中文摘要

图神经网络（GNNs）已成为药物发现和金融分析等领域中分析图结构数据的重要工具，导致对模型透明度的需求日益增长。可解释GNNs的最新进展通过揭示影响预测的重要子图满足了这一需求，但这些解释机制可能无意中使这些模型面临安全风险。本文研究了此类解释如何潜在泄露可被利用进行模型窃取的关键决策逻辑。我们提出了{\method}，一种新颖的窃取框架，它将用于捕获决策逻辑的解释对齐与用于在有限查询下高效训练的引导数据增强相结合，从而能够有效复制目标模型的预测行为和底层推理模式。在分子图数据集上的实验表明，我们的方法在模型窃取方面优于传统方法。这项工作突出了在敏感领域部署可解释GNNs时的重要安全考虑，并表明需要针对基于解释的攻击采取保护措施。我们的代码可在https://github.com/beanmah/EGSteal获取。

英文摘要

Graph Neural Networks (GNNs) have become essential tools for analyzing graph-structured data in domains such as drug discovery and financial analysis, leading to a growing demand for model transparency. Recent advances in explainable GNNs have addressed this need by revealing important subgraphs that influence predictions, but these explanation mechanisms may inadvertently expose these models to security risks. This paper investigates how such explanations potentially leak critical decision logic that can be exploited for model stealing. We propose {\method}, a novel stealing framework that integrates explanation alignment for capturing decision logic with guided data augmentation for efficient training under limited queries, enabling effective replication of both the predictive behavior and underlying reasoning patterns of target models. Experiments on molecular graph datasets demonstrate that our approach shows advantages over conventional methods in model stealing. This work highlights important security considerations for the deployment of explainable GNNs in sensitive domains and suggests the need for protective measures against explanation-based attacks. Our code is available at https://github.com/beanmah/EGSteal.


2506.01075 2026-06-03 cs.DS cs.IT cs.LG math.IT 版本更新

Learning DNF through Generalized Fourier Representations

通过广义傅里叶表示学习DNF

Mohsen Heidari, Roni Khardon

Affiliations*: Department of Computer Sciences, Indiana University, Bloomington, IN, USA

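AI summary: To address DNF learning under non-product distributions, introduces a generalized Fourier representation based on Bayesian networks and proves that the L1 spectral norm of conjunctions is bounded, establishing learnability of DNF and decision trees.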

Comments 60 pages


AI中文摘要

布尔傅里叶表示在学习理论中被广泛使用，特别是在均匀分布和乘积分布下学习析取范式（DNF）。将这些结果扩展到非乘积分布一直是一个长期未解决的开放问题。我们通过引入一种广义傅里叶表示来应对这一挑战，该表示能够在广泛的一类非乘积分布下进行学习。我们的方法将任意分布$D$表示为贝叶斯网络（BN），并推导出相应的傅里叶展开。我们证明了使用成员查询来识别重系数的标准基于傅里叶的学习技术可以通过少量修改适应于这种广义表示。我们证明了对于差分有界树BN，合取式的$L_1$谱范数在这种展开下保持有界，显著推广了均匀分布的已知结果；匹配的下界证明了这些约束的必要性。利用这些结果，我们建立了DNF的可学习性以及决策树在此类分布下的不可知学习性。最后，我们提出了一种学习差分有界树BN分布的算法，将我们的结果扩展到分布未知的场景。

英文摘要

The Boolean Fourier representation has been widely used in learning theory, particularly for learning Disjunctive Normal Form (DNF) under uniform and product distributions. Extending these results to non-product distributions has remained a longstanding open problem. We address this challenge by introducing a generalized Fourier representation that enables learning under a broad class of non-product distributions. Our approach represents any distribution $D$ as a Bayesian network (BN) and derives a corresponding Fourier expansion. We show that standard Fourier-based learning techniques using membership queries to identify heavy coefficients can be adapted to this generalized representation with minor modifications. We prove that the $L_1$ spectral norm of conjunctions remains bounded under this expansion for difference-bounded tree BNs, significantly generalizing the known result for uniform distributions; matching lower bounds demonstrate the necessity of these constraints. Using these results, we establish the learnability of DNF and the agnostic learnability of decision trees under such distributions. Finally, we present an algorithm for learning difference-bounded tree BN distributions, extending our results to settings where the distribution is unknown.
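For the classical uniform-distribution case that the abstract generalizes, the bounded $L_1$ spectral norm of a conjunction can be verified by brute force. The sketch below covers only that baseline, not the paper's Bayesian-network expansion; the function names and the choice of a 0/1-valued AND over the first three of five variables are assumptions for illustration.

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """All 2^n Fourier coefficients of f: {-1,+1}^n -> R under the uniform distribution."""
    points = np.array(list(itertools.product([-1, 1], repeat=n)))
    values = np.array([f(x) for x in points], dtype=float)
    coeffs = {}
    for subset in itertools.product([0, 1], repeat=n):
        mask = np.array(subset, dtype=bool)
        chi = points[:, mask].prod(axis=1)   # parity character on the chosen subset
        coeffs[subset] = float(np.mean(values * chi))
    return coeffs

def conjunction(x):
    """0/1-valued AND of the first three variables, with +1 meaning true."""
    return float(np.all(x[:3] == 1))

n = 5
coeffs = fourier_coefficients(conjunction, n)
l1_norm = sum(abs(c) for c in coeffs.values())
nonzero = sum(1 for c in coeffs.values() if abs(c) > 1e-12)
```

Here the conjunction equals the product of $(1 + x_i)/2$ over its three variables, so it has 8 nonzero coefficients of 1/8 each and an $L_1$ norm of exactly 1, independent of how many irrelevant variables are present.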


2506.00431 2026-06-03 cs.LG 版本更新

TIDFormer: Exploiting Temporal and Interactive Dynamics Makes A Great Dynamic Graph Transformer

TIDFormer: 利用时间和交互动态打造卓越的动态图Transformer

Jie Peng, Zhewei Wei, Yuhang Ye

Affiliations*: Renmin University of China; Huawei, Shenzhen, Guangdong, China

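AI summary: Proposes TIDFormer, which efficiently exploits temporal and interactive dynamics and designs an interpretable self-attention mechanism, outperforming existing models on multiple dynamic graph datasets.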

Comments KDD2025


DOI: 10.1145/3711896.3737155

AI中文摘要

由于自注意力机制（SAMs）在序列建模中捕捉依赖关系的能力，一些现有的动态图神经网络（DGNNs）利用具有各种编码设计的Transformer架构来捕捉动态图的序列演化。然而，这些基于Transformer的DGNNs的有效性和效率差异很大，凸显了在动态图上正确定义SAM以及在不增加额外复杂模块的情况下全面编码时间和交互动态的重要性。在这项工作中，我们提出了TIDFormer，一种以高效方式充分利用时间和交互动态的动态图Transformer。我们阐明并验证了我们提出的SAM的可解释性，解决了先前工作中在动态图上其定义不可解释的开放问题。为了分别建模时间和交互动态，我们利用基于日历的时间划分信息，并仅使用采样的一阶邻居为二分图和非二分图提取信息丰富的交互嵌入。此外，我们通过简单的分解捕捉历史交互模式的潜在变化，联合建模时间和交互特征。我们在多个动态图数据集上进行了大量实验，以验证TIDFormer的有效性和效率。实验结果表明，TIDFormer表现出色，在大多数数据集和实验设置中超越了最先进的模型。此外，与之前基于Transformer的方法相比，TIDFormer展现出显著的效率优势。

英文摘要

Due to the proficiency of self-attention mechanisms (SAMs) in capturing dependencies in sequence modeling, several existing dynamic graph neural networks (DGNNs) utilize Transformer architectures with various encoding designs to capture sequential evolutions of dynamic graphs. However, the effectiveness and efficiency of these Transformer-based DGNNs vary significantly, highlighting the importance of properly defining the SAM on dynamic graphs and comprehensively encoding temporal and interactive dynamics without extra complex modules. In this work, we propose TIDFormer, a dynamic graph TransFormer that fully exploits Temporal and Interactive Dynamics in an efficient manner. We clarify and verify the interpretability of our proposed SAM, addressing the open problem of its uninterpretable definitions on dynamic graphs in previous works. To model the temporal and interactive dynamics, respectively, we utilize the calendar-based time partitioning information and extract informative interaction embeddings for both bipartite and non-bipartite graphs using merely the sampled first-order neighbors. In addition, we jointly model temporal and interactive features by capturing potential changes in historical interaction patterns through a simple decomposition. We conduct extensive experiments on several dynamic graph datasets to verify the effectiveness and efficiency of TIDFormer. The experimental results demonstrate that TIDFormer excels, outperforming state-of-the-art models across most datasets and experimental settings. Furthermore, TIDFormer exhibits significant efficiency advantages compared to previous Transformer-based methods.


2505.20853 2026-06-03 cs.LG cs.AI 版本更新

Cooperation of Experts: Fusing Heterogeneous Information with Large Margin

专家合作：大间隔融合异构信息

Shuo Wang, Shunyang Huang, Jinghui Yuan, Zhixiang Shen, Zhao Kang


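AI summary: Proposes the Cooperation of Experts framework, which fuses heterogeneous information through a large margin mechanism and encodes multi-typed data into unified heterogeneous multiplex networks for robust, complementary knowledge extraction.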

Comments Accepted at the 42nd International Conference on Machine Learning (ICML 2025)


Journal ref: Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:63169-63185, 2025

AI中文摘要

融合异构信息仍然是现代数据分析中的一个持续挑战。尽管已取得显著进展，但现有方法往往未能考虑对象模式在不同语义空间中的固有异质性。为解决这一局限性，我们提出了专家合作（CoE）框架，该框架将多类型信息编码到统一的异构多路网络中。通过克服模态和连接差异，CoE为捕捉现实世界复杂数据的复杂结构提供了一个强大且灵活的模型。在我们的框架中，专用编码器充当领域特定专家，每个专家专门学习特定语义空间中的不同关系模式。为了增强鲁棒性并提取互补知识，这些专家通过一种新颖的大间隔机制进行协作，该机制由定制的优化策略支持。严格的理论分析保证了框架的可行性和稳定性，而跨多种基准的广泛实验证明了其优越的性能和广泛的适用性。我们的代码可在 https://github.com/strangeAlan/CoE 获取。

英文摘要

Fusing heterogeneous information remains a persistent challenge in modern data analysis. While significant progress has been made, existing approaches often fail to account for the inherent heterogeneity of object patterns across different semantic spaces. To address this limitation, we propose the Cooperation of Experts (CoE) framework, which encodes multi-typed information into unified heterogeneous multiplex networks. By overcoming modality and connection differences, CoE provides a powerful and flexible model for capturing the intricate structures of real-world complex data. In our framework, dedicated encoders act as domain-specific experts, each specializing in learning distinct relational patterns in specific semantic spaces. To enhance robustness and extract complementary knowledge, these experts collaborate through a novel large margin mechanism supported by a tailored optimization strategy. Rigorous theoretical analyses guarantee the framework's feasibility and stability, while extensive experiments across diverse benchmarks demonstrate its superior performance and broad applicability. Our code is available at https://github.com/strangeAlan/CoE.


2505.20142 2026-06-03 cs.LG 版本更新

Grounding Functional Similarity by Invariance-Aware Model Stitching

通过不变性感知模型拼接实现功能相似性评估

Ioannis Athanasiadis, Anmar Karmush, Michael Felsberg


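AI summary: Because standard model stitching ignores invariances and can misjudge functional similarity, proposes invariance-aware model stitching under a forward-backward compatibility requirement, revealing hidden functional discrepancies.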


AI中文摘要

在深度学习中，功能相似性评估量化了独立训练的模型学习相似输入-输出关系的程度。在模型拼接中，功能相似性被表述为表示前向兼容性，即两个模型的表示能否对齐以解决给定任务。然而，最近的研究强调了一个关键限制：依赖不同信息线索的模型仍可能产生兼容的表示，使其看起来具有误导性的相似性（Smith et al., 2025）。我们将此失败归因于标准模型拼接本质上对拼接模型的不变性特性视而不见。为解决这一限制，我们引入了前向-后向兼容性要求，并据此制定了不变性感知模型拼接。通过分析关键拼接配置，我们研究了前向和后向兼容性之间的相互作用，表明不变性感知模型拼接为功能相似性评估提供了更原则性的方法，同时揭示了先前被掩盖的功能差异。

英文摘要

In deep learning, functional similarity evaluation quantifies the extent to which independently trained models learn similar input-output relationships. In model stitching, functional similarity is framed as representation forward compatibility, i.e., whether the representations of two models can be aligned to solve a given task. Recent studies, however, highlight a critical limitation: models relying on different information cues can still produce compatible representations, making them appear misleadingly similar (Smith et al., 2025). We attribute this failure to standard model stitching being inherently blind to the invariance properties of the stitched models. To address this limitation, we introduce the forward-backward compatibility requirement under which we formulate the invariance-aware model stitching. Through analyzing key stitching configurations, we study the interplay between forward and backward compatibility, showing that invariance-aware model stitching provides a more principled approach to functional similarity evaluation while revealing functional discrepancies previously obscured.


2502.08006 2026-06-03 cs.LG cs.AI stat.ML 版本更新

Greed is Good: A Unifying Perspective on Guided Generation

贪婪即美德：引导生成的统一视角

Zander W. Blasingame, Chen Liu

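AI summary: Unifies two gradient-based guidance families by viewing posterior guidance as a greedy strategy of end-to-end guidance, proposes an interpolation method that trades off compute against accuracy, and validates it on inverse image problems and molecular generation tasks.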

Comments Accepted at NeurIPS 2025


AI中文摘要

无训练引导生成是一种广泛使用且强大的技术，允许最终用户对流/扩散模型的生成过程施加进一步控制。一般来说，针对基于梯度的引导，已经出现了两种技术系列：即后验引导（即通过目标预测模型将当前样本投影到目标分布进行引导）和端到端引导（即通过在整个ODE求解过程中执行反向传播进行引导）。在这项工作中，我们表明这两个看似分离的系列实际上可以通过将后验引导视为端到端引导的贪婪策略来统一。我们探索了这两个系列之间的理论联系，并深入分析了这两种技术相对于连续理想梯度的关系。基于这一分析，我们提出了一种在这两个系列之间插值的方法，从而在引导梯度的计算与精度之间实现权衡。然后，我们在几个逆图像问题和性质引导的分子生成任务上验证了这项工作。

英文摘要

Training-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of flow/diffusion models. Generally speaking, two families of techniques have emerged for solving this problem for gradient-based guidance: namely, posterior guidance (i.e., guidance via projecting the current sample to the target distribution via the target prediction model) and end-to-end guidance (i.e., guidance by performing backpropagation throughout the entire ODE solve). In this work, we show that these two seemingly separate families can actually be unified by looking at posterior guidance as a greedy strategy of end-to-end guidance. We explore the theoretical connections between these two families and provide an in-depth theoretical analysis of these two techniques relative to the continuous ideal gradients. Motivated by this analysis, we then show a method for interpolating between these two families, enabling a trade-off between compute and accuracy of the guidance gradients. We then validate this work on several inverse image problems and property-guided molecular generation.


2505.08886 2026-06-03 cs.CV cs.LG 版本更新

Optimizing Neuro-Fuzzy and Colonial Competition Algorithms for Skin Cancer Diagnosis in Dermatoscopic Images

优化神经模糊与殖民竞争算法用于皮肤镜图像中的皮肤癌诊断

Hamideh Khaleghpour, Brett McKinney

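AI summary: Combines image processing with neuro-fuzzy and colonial competition algorithms, reaching 94% accuracy on 560 dermoscopic images from the ISIC database, with the aim of assisting early clinical melanoma detection.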

Comments 7 pages, 10 figures. Accepted at the 2nd Asia Pacific Computer Systems Conference (APCS 2024), March 15-17, 2024


Journal ref: Proceedings of the 2024 7th International Conference on Information and Computer Technologies, pages 166-172, IEEE, March 2024

AI中文摘要

皮肤癌发病率的上升，加上公众意识有限和临床专业知识的不足，凸显了对先进诊断辅助工具的迫切需求。人工智能（AI）已成为该领域有前景的工具，特别是在区分恶性与良性皮肤病变方面。利用公开可用的皮肤病变数据集，研究人员一直在开发基于AI的诊断解决方案。然而，此类计算机系统在临床环境中的整合仍处于初期阶段。本研究旨在通过融合图像处理技术和机器学习算法（特别是神经模糊和殖民竞争方法）来弥合这一差距。应用于ISIC数据库中的皮肤镜图像，我们的方法在560张图像的数据集上达到了94%的显著准确率。这些结果强调了我们的方法在帮助临床医生早期检测黑色素瘤方面的潜力，从而为皮肤癌诊断做出重要贡献。

英文摘要

The rising incidence of skin cancer, coupled with limited public awareness and a shortfall in clinical expertise, underscores an urgent need for advanced diagnostic aids. Artificial Intelligence (AI) has emerged as a promising tool in this domain, particularly for distinguishing malignant from benign skin lesions. Leveraging publicly available datasets of skin lesions, researchers have been developing AI-based diagnostic solutions. However, the integration of such computer systems in clinical settings is still nascent. This study aims to bridge this gap by employing a fusion of image processing techniques and machine learning algorithms, specifically neuro-fuzzy and colonial competition approaches. Applied to dermoscopic images from the ISIC database, our method achieved a notable accuracy of 94% on a dataset of 560 images. These results underscore the potential of our approach in aiding clinicians in the early detection of melanoma, thereby contributing significantly to skin cancer diagnostics.


2505.07068 2026-06-03 stat.ML cs.LG math.DS 版本更新

A Sparse Bayesian Learning Algorithm for Estimation of Interaction Kernels in Motsch-Tadmor Model

Motsch-Tadmor模型中交互核估计的稀疏贝叶斯学习算法

Jinchao Feng, Sui Tang

Affiliations*: Department of Mathematics, Great Bay University; Department of Mathematics, University of California, Santa Barbara

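AI summary: For estimating asymmetric interaction kernels in the Motsch-Tadmor model, proposes an algorithm based on a variational framework and sparse Bayesian learning, enabling robust kernel identification with uncertainty quantification.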

Comments 23 pages


AI中文摘要

本文基于观测轨迹数据，研究Motsch-Tadmor模型中非对称交互核的数据驱动辨识。所考虑的模型由一类半线性演化方程控制，其中交互核定义了一个归一化的、依赖于状态的拉普拉斯算子，该算子支配集体动力学。为了解决由此产生的非线性逆问题，我们提出一个变分框架，利用控制方程的隐式形式重新表述核辨识问题，将其简化为子空间辨识问题。我们建立了一个可辨识性结果，刻画了交互核在尺度意义下可唯一恢复的条件。为了鲁棒地求解逆问题，我们开发了一种稀疏贝叶斯学习算法，该算法引入信息先验进行正则化，量化不确定性，并实现原则性的模型选择。在代表性交互粒子系统上的大量数值实验表明，所提出的框架在不同噪声水平和数据范围内具有准确性、鲁棒性和可解释性。

英文摘要

In this paper, we investigate the data-driven identification of asymmetric interaction kernels in the Motsch-Tadmor model based on observed trajectory data. The model under consideration is governed by a class of semilinear evolution equations, where the interaction kernel defines a normalized, state-dependent Laplacian operator that governs collective dynamics. To address the resulting nonlinear inverse problem, we propose a variational framework that reformulates kernel identification using the implicit form of the governing equations, reducing it to a subspace identification problem. We establish an identifiability result that characterizes conditions under which the interaction kernel can be uniquely recovered up to scale. To solve the inverse problem robustly, we develop a sparse Bayesian learning algorithm that incorporates informative priors for regularization, quantifies uncertainty, and enables principled model selection. Extensive numerical experiments on representative interacting particle systems demonstrate the accuracy, robustness, and interpretability of the proposed framework across a range of noise levels and data regimes.
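To show what a normalized, state-dependent interaction looks like in practice, the sketch below simulates a Motsch-Tadmor-type velocity alignment system with explicit Euler steps. It is a forward simulation only, not the paper's kernel estimation algorithm; the kernel `phi`, the step size, and the particle count are assumptions chosen for illustration.

```python
import numpy as np

def motsch_tadmor_step(x, v, phi, dt):
    """One explicit Euler step. x, v have shape (N, d): positions and velocities."""
    diff = x[None, :, :] - x[:, None, :]      # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=-1)      # (N, N) pairwise distances
    w = phi(dist)                             # raw kernel weights, symmetric
    a = w / w.sum(axis=1, keepdims=True)      # row-normalized weights, generally asymmetric
    dv = a @ v - v                            # equals sum_j a_ij (v_j - v_i) since rows sum to 1
    return x + dt * v, v + dt * dv

# Hypothetical smooth, positive interaction kernel.
phi = lambda r: 1.0 / np.sqrt(1.0 + r**2)

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 2))
v0 = rng.standard_normal((10, 2))
v = v0.copy()
for _ in range(200):
    x, v = motsch_tadmor_step(x, v, phi, dt=0.05)

spread_initial = v0.max(axis=0) - v0.min(axis=0)
spread_final = v.max(axis=0) - v.min(axis=0)
```

Because each row of the normalized weight matrix sums to one, every update is a convex combination of current velocities, so the velocity spread shrinks over time. The row normalization is also what makes the effective interaction asymmetric, which is the property the identification problem in the abstract has to handle.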

2504.01250 2026-06-03 cs.LG cs.SY eess.SY 版本更新

R2DN: Scalable Parameterization of Contracting and Lipschitz Recurrent Deep Networks

R2DN：收缩和Lipschitz循环深度网络的可扩展参数化

Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester

发表机构 * Australian Centre for Robotics（澳大利亚机器人中心）； School of Aerospace, Mechanical and Mechatronic Engineering（航空航天、机械与机电工程学院）； The University of Sydney（悉尼大学）

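AI summary: The paper proposes the Robust Recurrent Deep Network (R2DN), a feedback interconnection of a linear time-invariant system and a 1-Lipschitz deep feedforward network whose weights are parameterized directly so that the model is stable (contracting) and robust to small input perturbations (Lipschitz). Unlike the recurrent equilibrium network (REN), it does not iteratively solve an equilibrium layer, which speeds up inference and backpropagation on GPUs. On nonlinear system identification, observer design, and learning-based feedback control, training and inference are up to an order of magnitude faster at similar performance.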

AI中文摘要

本文提出鲁棒循环深度网络（R2DN），这是一种用于机器学习和数据驱动控制的鲁棒循环神经网络的可扩展参数化。我们将R2DN构造为线性时不变系统与1-Lipschitz深度前馈网络的反馈互联，并直接参数化权重，使得我们的模型天生稳定（收缩）且对小输入扰动鲁棒（Lipschitz）。我们的参数化使用了类似于先前提出的循环均衡网络（REN）的结构，但无需在每个时间步迭代求解均衡层。这加速了GPU上的模型推理和反向传播，并且与REN相比，使得网络规模、批大小和输入序列长度的扩展在计算上可行。我们在非线性系统辨识、观测器设计和基于学习的反馈控制三个代表性问题上将R2DN与REN进行比较。我们发现，在相似的测试集性能下，训练和推理速度均提升一个数量级，并且它们在模型表达能力方面具有更好的可扩展性。

英文摘要

This paper presents the Robust Recurrent Deep Network (R2DN), a scalable parameterization of robust recurrent neural networks for machine learning and data-driven control. We construct R2DNs as the feedback interconnection of a linear time-invariant system and a 1-Lipschitz deep feedforward network, and directly parameterize the weights so that our models are stable (contracting) and robust to small input perturbations (Lipschitz) by design. Our parameterization uses a structure similar to the previously-proposed recurrent equilibrium network (REN), but without the requirement to iteratively solve an equilibrium layer at each time-step. This speeds up both model inference and backpropagation on GPUs, and makes it computationally feasible to scale up the network size, batch size, and input sequence length in comparison to RENs. We compare R2DNs to RENs on three representative problems in nonlinear system identification, observer design, and learning-based feedback control. We find that training and inference are both up to an order of magnitude faster with similar test set performance, and that they scale more favorably with respect to model expressivity.

2502.03139 2026-06-03 astro-ph.CO astro-ph.IM cs.LG 版本更新

Fast Sampling of Cosmological Initial Conditions with Gaussian Neural Posterior Estimation

基于高斯神经后验估计的宇宙学初始条件快速采样

Oleg Savchenko, Guillermo Franco Abellán, Florian List, Noemi Anau Montel, Christoph Weniger

发表机构 * GRAPPA Institute, Institute for Theoretical Physics Amsterdam, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands（GRAPPA研究所、阿姆斯特丹理论物理研究所、阿姆斯特丹大学、科学公园904号、1098 XH阿姆斯特丹、荷兰）； Department of Astrophysics, University of Vienna, Türkenschanzstraße 17, 1180 Vienna, Austria（天体物理学系、维也纳大学、土耳其沙恩茨街17号、1180维也纳、奥地利）

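AI summary: Proposes a simulation-based inference method that models the posterior as Gaussian and estimates it with neural networks, enabling fast reconstruction of the Universe's initial density field from late-time observations, orders of magnitude faster than existing methods.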

Comments 9 + 2 pages, 7 figures, 1 table. Comments welcome!

DOI: 10.1093/mnras/stag977
Journal ref: Mon Not R Astron Soc (2026)

AI中文摘要

了解宇宙大尺度结构在宇宙时间中形成的原初物质密度场对宇宙学至关重要。然而，从晚期观测重建这些宇宙学初始条件是一项著名的困难任务，需要先进的宇宙学模拟器和复杂的统计方法来探索数百万维的参数空间。我们展示了如何利用基于模拟的推理（SBI）来解决这个问题，并以模拟高效的方式使用通用的不可微模拟器获得数据约束的原初暗物质密度场实现。我们的方法适用于完整的高分辨率暗物质$N$体模拟，并基于将约束初始条件的后验分布建模为傅里叶空间中对角协方差矩阵的高斯分布。因此，我们可以在单个GPU上几秒内生成数千个后验样本，比现有方法快数个数量级，为宇宙学场的顺序SBI铺平了道路。此外，我们对协方差与波数的依赖关系进行了解析拟合，有效地将任何初始条件的点估计器转化为快速采样器。我们通过汇总统计将获得的样本与真实值进行比较，并执行贝叶斯一致性检验，验证了样本的有效性。

英文摘要

Knowledge of the primordial matter density field from which the large-scale structure of the Universe emerged over cosmic time is of fundamental importance for cosmology. However, reconstructing these cosmological initial conditions from late-time observations is a notoriously difficult task, which requires advanced cosmological simulators and sophisticated statistical methods to explore a multi-million-dimensional parameter space. We show how simulation-based inference (SBI) can be used to tackle this problem and to obtain data-constrained realisations of the primordial dark matter density field in a simulation-efficient way with general non-differentiable simulators. Our method is applicable to full high-resolution dark matter $N$-body simulations and is based on modelling the posterior distribution of the constrained initial conditions to be Gaussian with a diagonal covariance matrix in Fourier space. As a result, we can generate thousands of posterior samples within seconds on a single GPU, orders of magnitude faster than existing methods, paving the way for sequential SBI for cosmological fields. Furthermore, we perform an analytical fit of the estimated dependence of the covariance on the wavenumber, effectively transforming any point-estimator of initial conditions into a fast sampler. We test the validity of our obtained samples by comparing them to the true values with summary statistics and performing a Bayesian consistency test.

2502.02260 2026-06-03 cs.LG cs.CR 版本更新

Position: Adversarial ML for LLMs Is Not Making Any Progress

立场：针对LLM的对抗性机器学习并未取得任何进展

Javier Rando, Jie Zhang, Nicholas Carlini, Florian Tramèr

发表机构 * GitHub ； University of California, Berkeley（加州大学伯克利分校）

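AI summary: The paper argues that, in the era of large language models, the problems studied in adversarial machine learning are less clearly defined, harder to solve, and harder to evaluate, so the field may still fail to make meaningful progress over the next decade.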

Comments Accepted at ICML 2026 Position Paper Track

2501.02173 2026-06-03 cs.IR cs.LG 版本更新

The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit

效率与准确性的权衡：使用多头早期退出优化RAG增强的LLM推荐系统

Huixue Zhou, Hengrui Gu, Xi Liu, Kaixiong Zhou, Mingfu Liang, Yongkang Xiao, Srinivas Govindan, Piyush Chawla, Jiyan Yang, Xiangfei Meng, Huayu Li, Buyun Zhang, Liang Luo, Wen-Yen Chen, Yiping Han, Bo Long, Rui Zhang, Tianlong Chen

发表机构 * Meta Platforms（Meta平台）； University of Minnesota（明尼苏达大学）； NCSU（北卡罗来纳州立大学）； UNC at Chapel Hill（Chapel Hill分校，北卡罗来纳大学）

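AI summary: Proposes an optimization framework that combines retrieval-augmented generation (RAG) with a multi-head early exit architecture, using graph convolutional network (GCN) retrieval and dynamic termination of inference to reduce computation time while maintaining or improving click-through rate (CTR) prediction accuracy.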

AI中文摘要

在推荐系统中部署大型语言模型（LLM）以预测点击率（CTR）需要在计算效率和预测准确性之间取得微妙的平衡。本文提出一个优化框架，结合检索增强生成（RAG）与创新的多头早期退出架构，同时增强这两个方面。通过集成图卷积网络（GCN）作为高效检索机制，我们能够显著减少数据检索时间，同时保持高模型性能。采用的早期退出策略允许动态终止模型推理，利用跨多个头的实时预测置信度评估。这不仅加快了LLM的响应速度，还维持或提高了其准确性，使其非常适合实时应用场景。我们的实验表明，该架构有效减少了计算时间，而不牺牲可靠推荐交付所需的准确性，为商业系统中高效、实时的LLM部署建立了新标准。

英文摘要

The deployment of Large Language Models (LLMs) in recommender systems for predicting Click-Through Rates (CTR) necessitates a delicate balance between computational efficiency and predictive accuracy. This paper presents an optimization framework that combines Retrieval-Augmented Generation (RAG) with an innovative multi-head early exit architecture to concurrently enhance both aspects. By integrating Graph Convolutional Networks (GCNs) as efficient retrieval mechanisms, we are able to significantly reduce data retrieval times while maintaining high model performance. The early exit strategy employed allows for dynamic termination of model inference, utilizing real-time predictive confidence assessments across multiple heads. This not only quickens the responsiveness of LLMs but also upholds or improves their accuracy, making it ideal for real-time application scenarios. Our experiments demonstrate how this architecture effectively decreases computation time without sacrificing the accuracy needed for reliable recommendation delivery, establishing a new standard for efficient, real-time LLM deployment in commercial systems.

2412.05109 2026-06-03 cs.LG cs.IT math.IT math.PR math.ST stat.ML stat.TH 版本更新

Generating Rectifiable Measures through Neural Networks

通过神经网络生成可求积测度

Erwin Riegler, Alex Bühler, Yang Pan, Helmut Bölcskei

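AI summary: The paper proves that countably m-rectifiable measures can be approximated as push-forwards of the one-dimensional Lebesgue measure on [0,1] by ReLU neural networks, with arbitrarily small error in Wasserstein distance. The number of networks required is at most 2^{O(ε^{-m} log^2 ε)}, and this rate equals the rectifiability parameter m.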

AI中文摘要

我们推导了（可数）$m$-可求积测度类的通用逼近结果。具体地，我们证明$m$-可求积测度可以通过ReLU神经网络将$[0,1]$上的一维勒贝格测度推前得到，在Wasserstein距离下达到任意小的逼近误差。此外，所考虑网络的权重是量化和有界的，达到逼近误差$\varepsilon$所需的ReLU神经网络数量不超过$2^{b(\varepsilon)}$，其中$b(\varepsilon)=\mathcal{O}(\varepsilon^{-m}\log^2(\varepsilon))$。这一结果改进了Perekrestenko等人的引理IX.4，因为它表明当$\varepsilon$趋于零时$b(\varepsilon)$趋于无穷的速率等于可求积参数$m$，而$m$可能远小于环境维度。我们将此结果推广到可数$m$-可求积测度，并证明该速率仍然等于可求积参数$m$，前提是（除其他技术假设外）测度在可数$m$-可求积支撑集的各个分量上指数衰减。

英文摘要

We derive universal approximation results for the class of (countably) $m$-rectifiable measures. Specifically, we prove that $m$-rectifiable measures can be approximated as push-forwards of the one-dimensional Lebesgue measure on $[0,1]$ using ReLU neural networks with arbitrarily small approximation error in terms of Wasserstein distance. What is more, the weights in the networks under consideration are quantized and bounded and the number of ReLU neural networks required to achieve an approximation error of $\varepsilon$ is no larger than $2^{b(\varepsilon)}$ with $b(\varepsilon)=\mathcal{O}(\varepsilon^{-m}\log^2(\varepsilon))$. This result improves Lemma IX.4 in Perekrestenko et al. as it shows that the rate at which $b(\varepsilon)$ tends to infinity as $\varepsilon$ tends to zero equals the rectifiability parameter $m$, which can be much smaller than the ambient dimension. We extend this result to countably $m$-rectifiable measures and show that this rate still equals the rectifiability parameter $m$ provided that, among other technical assumptions, the measure decays exponentially on the individual components of the countably $m$-rectifiable support set.

2409.08958 2026-06-03 cs.LG cs.AI physics.comp-ph physics.flu-dyn 版本更新

PINNfluence: Interpreting PINNs through Influence Functions

PINNfluence: 通过影响函数解释 PINN

Aleksander Krasowski, Jonas R. Naujoks, Moritz Weckbecker, Galip Ü. Yolcu, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek, René P. Klausen

发表机构 * Technical University of Munich（慕尼黑技术大学）； Max Planck Institute for Intelligent Systems（智能系统马克斯·普朗克研究所）； University of Tübingen（图宾根大学）； ETH Zurich（苏黎世联邦理工学院）

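AI summary: Proposes PINNfluence, a framework that uses influence functions for training-data attribution in physics-informed neural networks (PINNs), giving fine-grained attribution among predictions, loss components, and training data points. Benchmark experiments identify structural features that distinguish well-trained from poorly trained PINNs.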

Comments Accepted at ICML 2026

2410.14573 2026-06-03 cs.LG cs.AI 版本更新

Building Trust in Black-box Optimization: A Comprehensive Framework for Explainability

在黑盒优化中建立信任：可解释性的综合框架

Nazanin Nezami, Hadis Anahideh

发表机构 * University of Illinois Chicago（伊利诺伊大学芝加哥分校）

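AI summary: Proposes IEMSO, a set of model-agnostic metrics in four categories (sampling core, batch properties, optimization process, and feature importance) that improve the transparency and explainability of surrogate optimization methods.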

AI中文摘要

在受限评估预算内优化昂贵的黑盒函数在许多实际应用中面临重大挑战。代理优化（SO）是一种常见的解决方案，但其由代理模型和采样核心（例如采集函数）的复杂性引入的专有性质往往导致缺乏可解释性和透明度。尽管现有文献主要集中在增强对全局最优的收敛性，但新提出策略的实际解释仍未被充分探索，特别是在批量评估设置中。在本文中，我们提出了代理优化的包容性可解释性指标（IEMSO），这是一组全面的模型无关指标，旨在增强SO方法的透明度、可信度和可解释性。通过这些指标，我们在执行昂贵评估之前和之后为从业者提供中间和事后解释，以建立信任。我们考虑了四类主要指标，每类针对SO过程的特定方面：采样核心指标、批次属性指标、优化过程指标和特征重要性。我们的实验评估证明了所提指标在不同基准上的显著潜力。

英文摘要

Optimizing costly black-box functions within a constrained evaluation budget presents significant challenges in many real-world applications. Surrogate Optimization (SO) is a common resolution, yet its proprietary nature introduced by the complexity of surrogate models and the sampling core (e.g., acquisition functions) often leads to a lack of explainability and transparency. While existing literature has primarily concentrated on enhancing convergence to global optima, the practical interpretation of newly proposed strategies remains underexplored, especially in batch evaluation settings. In this paper, we propose \emph{Inclusive} Explainability Metrics for Surrogate Optimization (IEMSO), a comprehensive set of model-agnostic metrics designed to enhance the transparency, trustworthiness, and explainability of the SO approaches. Through these metrics, we provide both intermediate and post-hoc explanations to practitioners before and after performing expensive evaluations to gain trust. We consider four primary categories of metrics, each targeting a specific aspect of the SO process: Sampling Core Metrics, Batch Properties Metrics, Optimization Process Metrics, and Feature Importance. Our experimental evaluations demonstrate the significant potential of the proposed metrics across different benchmarks.

2406.10407 2026-06-03 math.OC cs.LG cs.NA math.NA 版本更新

Suboptimality bounds for trace-bounded SDPs enable a faster and scalable low-rank SDP solver SDPLR+

迹有界半定规划的最优性界实现更快且可扩展的低秩SDP求解器SDPLR+

Yufan Huang, David F. Gleich

发表机构 * Purdue University（普渡大学）

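AI summary: Using a suboptimality bound for trace-bounded semidefinite programs, the paper improves Burer and Monteiro's low-rank SDP solver SDPLR into SDPLR+, which adjusts the rank dynamically while tracking primal infeasibility and suboptimality, giving faster solves and better scalability.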

Comments 31 pages, 12 figures

AI中文摘要

半定规划（SDP）及其求解器是机器学习和数据科学中许多应用的有力工具。设计可扩展的SDP求解器具有挑战性，因为标准情况下正半定决策变量是一个$n \times n$的稠密矩阵，尽管输入通常是$n \times n$的稀疏矩阵。然而，如Barvinok和Pataki所示，解可能不需要满秩矩阵。二十年前，Burer和Monteiro开发了SDP求解器\texttt{SDPLR}，它在低秩分解上而不是完整矩阵上进行优化。这大大降低了存储成本，并且对许多问题效果良好。原始求解器\texttt{SDPLR}仅跟踪解的原始不可行性，阻止了在中等精度下的提前终止。我们利用迹有界SDP问题的最优性界，使我们能够更好地跟踪进展并执行提前终止。然后我们开发了\texttt{SDPLR+}，它以极低秩分解开始优化，并基于原始不可行性和最优性动态更新秩。这进一步加速了计算并节省了存储。在Max Cut、Minimum Bisection、Cut Norm和Lovász Theta问题上与许多近期的内存高效可扩展SDP求解器的数值比较展示了\texttt{SDPLR+}在决策变量达到百万乘百万规模问题上的可扩展性。它通常是达到中等精度$10^{-2}$的最快求解器。在$\mu$-电导、矩阵补全和$k$-均值聚类上的进一步实验显示了\texttt{SDPLR+}在更广泛数据科学应用中的潜力。

英文摘要

Semidefinite programs (SDPs) and their solvers are powerful tools with many applications in machine learning and data science. Designing scalable SDP solvers is challenging because, in the standard formulation, the positive semidefinite decision variable is an $n \times n$ dense matrix, even though the input is often an $n \times n$ sparse matrix. However, the solution may not require a full-rank matrix, as shown by Barvinok and Pataki. Two decades ago, Burer and Monteiro developed an SDP solver \texttt{SDPLR} that optimizes over a low-rank factorization instead of the full matrix. This greatly decreases the storage cost and works well for many problems. The original solver \texttt{SDPLR} tracks only the primal infeasibility of the solution, preventing early termination at moderate accuracy. We use a suboptimality bound for trace-bounded SDP problems that enables us to track the progress better and perform early termination. We then develop \texttt{SDPLR+}, which starts the optimization with an extremely low-rank factorization and dynamically updates the rank based on the primal infeasibility and suboptimality. This further speeds up the computation and saves storage. Numerical comparisons on Max Cut, Minimum Bisection, Cut Norm, and Lovász Theta problems with many recent memory-efficient scalable SDP solvers demonstrate the scalability of \texttt{SDPLR+} up to problems with million-by-million decision variables. It is often the fastest solver to a moderate accuracy of $10^{-2}$. Further experiments on $\mu$-conductance, matrix completion, and $k$-means clustering show the potential of \texttt{SDPLR+} on a broader range of data science applications.

2407.18428 2026-06-03 cs.LG cs.AI cs.CV 版本更新

Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift

加权风险不变性：不变特征偏移下的领域泛化

Gina Wong, Joshua Gleason, Rama Chellappa, Yoav Wald, Anqi Liu

发表机构 * Johns Hopkins University（约翰霍普金斯大学）； University of Maryland, College Park（马里兰大学学院公园分校）； New York University（纽约大学）； Center for Data Science（数据科学中心）

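AI summary: Existing invariant learning methods underperform under invariant covariate shift. The paper proposes weighted risk invariance (WRI), which enforces invariance of the loss across environments under a reweighting of the training examples. WRI is theoretically guaranteed to learn invariant models and empirically outperforms prior methods.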

Journal ref: TMLR 2024

AI中文摘要

学习预测在多个环境下不变的模型是一种有前景的分布外泛化方法。这类模型被训练来提取特征 $X_{\text{inv}}$，其中给定提取特征的条件分布 $Y \mid X_{\text{inv}}$ 在不同环境下不发生变化。不变模型还应能泛化到提取特征 $X_{\text{inv}}$ 的边缘分布 $p(X_{\text{inv}})$ 的偏移，这种偏移称为 $\textit{不变协变量偏移}$。然而，我们表明，现有学习不变模型的方法在不变协变量偏移下表现不佳，要么无法学习到不变模型（即使对于从简单且经过充分研究的线性-高斯模型生成的数据也是如此），要么有限样本性能较差。为了解决这些问题，我们提出 $\textit{加权风险不变性}$（WRI）。我们的框架基于对训练样本进行适当加权，强制要求损失在不同环境下保持不变。我们证明，在线性-高斯设置下，WRI 可证明地学习到不变模型，即丢弃虚假相关性。我们提出了一种实用算法，通过同时学习密度 $p(X_{\text{inv}})$ 和模型参数来实现 WRI，并且实验表明，在不变协变量偏移下，WRI 优于先前的不变学习方法。

英文摘要

Learning models whose predictions are invariant under multiple environments is a promising approach for out-of-distribution generalization. Such models are trained to extract features $X_{\text{inv}}$ where the conditional distribution $Y \mid X_{\text{inv}}$ of the label given the extracted features does not change across environments. Invariant models are also supposed to generalize to shifts in the marginal distribution $p(X_{\text{inv}})$ of the extracted features $X_{\text{inv}}$, a type of shift we call an $\textit{invariant covariate shift}$. However, we show that proposed methods for learning invariant models underperform under invariant covariate shift, either failing to learn invariant models$\unicode{x2014}$even for data generated from simple and well-studied linear-Gaussian models$\unicode{x2014}$or having poor finite-sample performance. To alleviate these problems, we propose $\textit{weighted risk invariance}$ (WRI). Our framework is based on imposing invariance of the loss across environments subject to appropriate reweightings of the training examples. We show that WRI provably learns invariant models, i.e. discards spurious correlations, in linear-Gaussian settings. We propose a practical algorithm to implement WRI by learning the density $p(X_{\text{inv}})$ and the model parameters simultaneously, and we demonstrate empirically that WRI outperforms previous invariant learning methods under invariant covariate shift.

2405.03386 2026-06-03 cs.LG 版本更新

Annot-Mix: Learning with Noisy Class Labels from Multiple Annotators via a Mixup Extension

Annot-Mix: 通过混合扩展从多个标注者学习带噪声类别标签

Marek Herde, Lukas Lührs, Denis Huseljic, Bernhard Sick

发表机构 * University of Kassel（卡塞尔大学）； European Conference on Artificial Intelligence（欧洲人工智能会议）； Conference on Prestigious Applications of Intelligent Systems（智能系统 prestigious 应用会议）

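AI summary: Proposes Annot-Mix, a framework that extends mixup to handle class labels provided by multiple annotators. It outperforms eleven existing approaches on eleven datasets.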

Comments 9 pages, 8 figures, 4 tables; post-publication arXiv version with minor editorial corrections; methodology, results, and conclusions unchanged

DOI: 10.3233/FAIA240829
Journal ref: ECAI 2024: 27th European Conference on Artificial Intelligence, IOS Press, pp. 2910-2918, 2024

AI中文摘要

使用带噪声的类别标签进行训练会损害神经网络的泛化性能。在此背景下，mixup是一种流行的正则化技术，通过使记忆错误类别标签更加困难来提高训练鲁棒性。然而，mixup忽略了多个标注者（例如众包工作者）通常提供类别标签的事实。因此，我们提出了mixup的一种扩展，该扩展处理每个实例的多个类别标签，同时考虑哪个类别标签来自哪个标注者。集成到我们的多标注者分类框架annot-mix中，在包含来自人类或模拟标注者的噪声类别标签的11个数据集的评估研究中，它的性能优于11种（大多数是最先进的）方法。我们的代码通过我们的GitHub仓库公开提供：https://github.com/ies-research/multi-annotator-machine-learning/tree/annot-mix

英文摘要

Training with noisy class labels impairs neural networks' generalization performance. In this context, mixup is a popular regularization technique to improve training robustness by making memorizing false class labels more difficult. However, mixup neglects that multiple annotators, e.g., crowdworkers, typically provide class labels. Therefore, we propose an extension of mixup, which handles multiple class labels per instance while considering which class label originates from which annotator. Integrated into our multi-annotator classification framework annot-mix, it performs superiorly to eleven (mostly state-of-the-art) approaches in an evaluation study with eleven datasets comprising noisy class labels from either human or simulated annotators. Our code is publicly available through our GitHub repository at https://github.com/ies-research/multi-annotator-machine-learning/tree/annot-mix
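
For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of standard mixup for a single pair of examples. It is not the annot-mix extension, whose handling of annotator identities the abstract does not spell out. The function name `mixup_pair`, the toy inputs, and the choice `alpha=0.4` are assumptions made for illustration.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Standard mixup of two examples with one-hot (or soft) label vectors.

    lam is drawn from Beta(alpha, alpha); features and labels are blended
    with the same lam, so the mixed label stays a probability vector.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam


# Toy usage (hypothetical data): blend a class-0 example with a class-2 example.
rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair([1.0, 0.0], [1.0, 0.0, 0.0],
                               [0.0, 1.0], [0.0, 0.0, 1.0],
                               alpha=0.4, rng=rng)
```

Because the label is blended rather than copied, a single wrong label contributes only a fraction of the training target, which is one intuition for why mixup makes memorizing false labels harder.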

1212.5524 2026-06-03 eess.SY cs.LG cs.SY 版本更新

Reinforcement learning for port-Hamiltonian systems

面向端口-哈密顿系统的强化学习

Olivier Sprangers, Gabriel A. D. Lopes, Robert Babuska

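AI summary: Passivity-based control of port-Hamiltonian systems typically ignores performance and requires solving difficult PDEs. The paper proposes a parameterized energy-balancing passivity-based control method trained with actor-critic reinforcement learning, which learns near-optimal control policies while preserving system stability.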

Comments submitted

DOI: 10.1109/TCYB.2014.2343194
Journal ref: IEEE Transactions on Cybernetics, Volume: 45 , Issue: 5 , May 2015

AI中文摘要

端口-哈密顿系统的无源控制（PBC）通过使系统相对于期望的存储函数无源，提供了一种直观的稳定化方法。然而，在大多数情况下，控制律的获得没有考虑任何性能指标，并且必须通过求解复杂的偏微分方程（PDE）来计算。为了解决这些问题，我们将强化学习方法引入能量平衡无源控制（EB-PBC）方法中，这是一种PBC形式，其中闭环能量等于存储能量与供给能量之差。我们提出了一种参数化EB-PBC的技术，该技术保留了系统的PDE匹配条件，不需要指定全局期望哈密顿量，包含性能标准，并且对额外的非线性（如控制输入饱和）具有鲁棒性。控制律的参数通过演员-评论家强化学习找到，从而能够学习满足期望闭环能量景观的近最优控制策略。其优点是，可以使用标准的能量整形技术生成近最优控制器，并且学习到的解可以在能量整形和阻尼注入方面进行解释，从而使得利用无源性理论对稳定性进行数值评估成为可能。从强化学习的角度来看，我们的方法允许将端口-哈密顿系统类纳入演员-评论家框架，通过策略的参数化加速学习。该方法已成功应用于仿真和实际实验中的摆锤起摆问题。

英文摘要

Passivity-based control (PBC) for port-Hamiltonian systems provides an intuitive way of achieving stabilization by rendering a system passive with respect to a desired storage function. However, in most instances the control law is obtained without any performance considerations and it has to be calculated by solving a complex partial differential equation (PDE). In order to address these issues we introduce a reinforcement learning approach into the energy-balancing passivity-based control (EB-PBC) method, which is a form of PBC in which the closed-loop energy is equal to the difference between the stored and supplied energies. We propose a technique to parameterize EB-PBC that preserves the system's PDE matching conditions, does not require the specification of a global desired Hamiltonian, includes performance criteria, and is robust to extra non-linearities such as control input saturation. The parameters of the control law are found using actor-critic reinforcement learning, enabling the learning of near-optimal control policies satisfying a desired closed-loop energy landscape. The advantages are that near-optimal controllers can be generated using standard energy shaping techniques and that the solutions learned can be interpreted in terms of energy shaping and damping injection, which makes it possible to numerically assess stability using passivity theory. From the reinforcement learning perspective, our proposal allows for the class of port-Hamiltonian systems to be incorporated in the actor-critic framework, speeding up the learning thanks to the resulting parameterization of the policy. The method has been successfully applied to the pendulum swing-up problem in simulations and real-life experiments.

1206.3582 2026-06-03 math.OC cs.LG cs.SY eess.SY 版本更新

Decentralized Learning for Multi-player Multi-armed Bandits

多人多臂老虎机的分散式学习

Dileep Kalathil, Naumaan Nayyar, Rahul Jain

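AI summary: For the multi-player multi-armed bandit problem, the paper proposes dUCB_4, a decentralized online learning algorithm that needs no coordination and achieves expected regret of near-O(log^2 T).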

Comments 33 pages, 3 figures. Submitted to IEEE Transactions on Information Theory

DOI: 10.1109/CDC.2012.6426587

AI中文摘要

我们考虑多人多臂老虎机（MAB）模型中的分布式在线学习问题。每个玩家可以选择多个臂。当玩家选择一个臂时，它会获得奖励。我们考虑独立同分布奖励模型和马尔可夫奖励模型。在独立同分布模型中，每个臂被建模为具有未知均值的未知分布的独立同分布过程。在马尔可夫模型中，每个臂被建模为具有未知概率转移矩阵和平稳分布的有限、不可约、非周期且可逆的马尔可夫链。不同玩家从臂中获得不同奖励。如果两个玩家选择同一个臂，则发生“碰撞”，两者均无法获得任何奖励。玩家之间没有专用的控制信道用于协调或通信。用户之间的任何其他通信都是有代价的，并会增加遗憾。我们提出了一种基于索引的在线分布式学习策略，称为${\tt dUCB_4}$算法，该算法以正确的方式权衡探索与利用，并实现期望遗憾增长不超过近$O(\log^2 T)$。该研究的动机来自认知无线电网络中多个次要用户的机会频谱接入，他们必须在不同用户看起来不同的各种无线信道中进行选择。据我们所知，这是首个针对多人MAB的分布式学习算法。

英文摘要

We consider the problem of distributed online learning with multiple players in multi-armed bandits (MAB) models. Each player can pick among multiple arms. When a player picks an arm, it gets a reward. We consider both the i.i.d. reward model and the Markovian reward model. In the i.i.d. model each arm is modelled as an i.i.d. process with an unknown distribution with an unknown mean. In the Markovian model, each arm is modelled as a finite, irreducible, aperiodic and reversible Markov chain with an unknown probability transition matrix and stationary distribution. The arms give different rewards to different players. If two players pick the same arm, there is a "collision", and neither of them gets any reward. There is no dedicated control channel for coordination or communication among the players. Any other communication between the users is costly and will add to the regret. We propose an online index-based distributed learning policy called the ${\tt dUCB_4}$ algorithm that trades off \textit{exploration v. exploitation} in the right way, and achieves expected regret that grows at most as near-$O(\log^2 T)$. The motivation comes from opportunistic spectrum access by multiple secondary users in cognitive radio networks wherein they must pick among various wireless channels that look different to different users. This is the first distributed learning algorithm for multi-player MABs to the best of our knowledge.
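
To make "index-based policy" concrete, here is a minimal sketch of the classical single-player UCB1 rule on Bernoulli arms. It is not the paper's dUCB_4 policy, whose index and collision handling the abstract does not specify. The function names, arm means, and horizon are assumptions chosen for illustration.

```python
import math
import random


def ucb1_index(mean_reward, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus that shrinks with pulls."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Play Bernoulli arms with UCB1 and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            arm = max(range(n_arms),
                      key=lambda a: ucb1_index(totals[a] / pulls[a], pulls[a], t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


# Hypothetical three-arm instance.
pulls = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
```

The index trades off exploitation (the empirical mean) against exploration (the bonus term). In the multi-player setting described above, each player would also have to avoid collisions without a control channel, which this sketch does not attempt.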

1303.4778 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

Greedy Feature Selection for Subspace Clustering

子空间聚类的贪婪特征选择

Eva L. Dyer, Aswin C. Sankaranarayanan, Richard G. Baraniuk

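AI summary: The paper studies exact feature selection for subspace clustering with a greedy method (orthogonal matching pursuit) and shows that it outperforms nearest-neighbor methods when the subspaces are sparsely sampled.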

Comments 32 pages, 7 figures, 1 table

Journal ref: Journal of Machine Learning Research, Vol.14, Issue 1, pp. 2487-2517, January 2013

AI中文摘要

子空间的并集为高维数据集合提供了对线性子空间模型的强大推广。为了从数据集合中学习子空间的并集，必须识别集合中属于同一子空间的信号集，以获得数据中存在的子空间结构的准确估计。最近，稀疏恢复方法已被证明为精确特征选择（EFS）提供了可证明且稳健的策略——从集合中恢复位于同一子空间的点集。与最近关于L1最小化EFS的研究并行，本文为使用贪婪方法（即正交匹配追踪（OMP））进行稀疏信号恢复的EFS发展了充分条件。在分析之后，我们提供了对生活在子空间并集上的信号的特征选择策略的实证研究，并刻画了稀疏恢复方法与基于最近邻（NN）的方法之间的差距。特别是，我们证明了稀疏恢复方法比NN方法具有显著优势，并且当数据集中子空间的采样稀疏时，这两种方法之间的差距尤为明显。我们的结果表明，在NN方法无法揭示集合中点所属子空间的许多情况下，OMP可以可靠地恢复精确的特征集。

英文摘要

Unions of subspaces provide a powerful generalization to linear subspace models for collections of high-dimensional data. To learn a union of subspaces from a collection of data, sets of signals in the collection that belong to the same subspace must be identified in order to obtain accurate estimates of the subspace structures present in the data. Recently, sparse recovery methods have been shown to provide a provable and robust strategy for exact feature selection (EFS)--recovering subsets of points from the ensemble that live in the same subspace. In parallel with recent studies of EFS with L1-minimization, in this paper, we develop sufficient conditions for EFS with a greedy method for sparse signal recovery known as orthogonal matching pursuit (OMP). Following our analysis, we provide an empirical study of feature selection strategies for signals living on unions of subspaces and characterize the gap between sparse recovery methods and nearest neighbor (NN)-based approaches. In particular, we demonstrate that sparse recovery methods provide significant advantages over NN methods and the gap between the two approaches is particularly pronounced when the sampling of subspaces in the dataset is sparse. Our results suggest that OMP may be employed to reliably recover exact feature sets in a number of regimes where NN approaches fail to reveal the subspace membership of points in the ensemble.
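
The following minimal sketch illustrates greedy feature selection with orthogonal matching pursuit on a toy union of subspaces. It is an illustration under stated assumptions, not the authors' implementation. The helper names (`dot`, `unit`, `omp_select`) and the data, two orthogonal planes in R^4, are hypothetical. Orthogonal subspaces are the easiest case, in which every selected point provably comes from the query's own subspace.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def omp_select(y, atoms, k, tol=1e-10):
    """Greedy OMP selection: indices of up to k atoms chosen to represent y.

    The residual is kept orthogonal to all chosen atoms via Gram-Schmidt,
    which gives the same residual as OMP's least-squares update.
    """
    residual = list(y)
    basis = []   # orthonormal basis of the span of the chosen atoms
    chosen = []
    for _ in range(k):
        if math.sqrt(dot(residual, residual)) <= tol:
            break
        # pick the unused atom most correlated with the current residual
        scores = [-1.0 if j in chosen else abs(dot(a, residual))
                  for j, a in enumerate(atoms)]
        j = max(range(len(atoms)), key=lambda i: scores[i])
        # orthogonalise the new atom against the current basis
        q = list(atoms[j])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm <= tol:
            break  # the atom adds no new direction
        q = [qi / norm for qi in q]
        basis.append(q)
        chosen.append(j)
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return chosen


# Toy data (an assumption for illustration): two orthogonal planes in R^4.
rng = random.Random(0)
plane_a = [unit([rng.gauss(0, 1), rng.gauss(0, 1), 0.0, 0.0]) for _ in range(6)]
plane_b = [unit([0.0, 0.0, rng.gauss(0, 1), rng.gauss(0, 1)]) for _ in range(6)]
points = plane_a + plane_b
labels = [0] * 6 + [1] * 6

y = points[0]        # query point from plane A
atoms = points[1:]   # every other point is a candidate atom
chosen = omp_select(y, atoms, k=2)
chosen_labels = [labels[1:][j] for j in chosen]
```

In this toy case `chosen_labels` contains only the label of plane A. The sufficient conditions developed in the paper concern the harder regime where subspaces are not orthogonal and are sparsely sampled.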

1207.2940 2026-06-03 stat.ML cs.LG cs.SY eess.SY 版本更新

Expectation Propagation in Gaussian Process Dynamical Systems: Extended Version

高斯过程动态系统中的期望传播：扩展版

Marc Peter Deisenroth, Shakir Mohamed

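AI summary: The paper presents a message passing algorithm based on expectation propagation for approximate inference in Gaussian process dynamical systems (GPDS). Iterating forward-backward smoothing yields more accurate posterior distributions over latent structures, improves predictive performance, and unifies existing GPDS smoothers.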

Journal ref: Advances in Neural Information Processing Systems 25 (NIPS), pp. 2609-2617, 2012

AI中文摘要

丰富且复杂的时间序列数据，例如来自工程系统、金融市场、视频或神经记录的生成数据，现在已成为现代数据分析的常见特征。解释这些多样化数据集背后的现象需要灵活且准确的模型。在本文中，我们推广高斯过程动态系统（GPDS）作为适合此类分析的丰富模型类。特别地，我们提出了一种基于期望传播的GPDS近似推理消息传递算法。通过将推理视为一般的消息传递问题，我们迭代前向后向平滑。因此，我们获得了更准确的潜在结构后验分布，与最先进的GPDS平滑器（这些平滑器是我们一般消息传递算法的特例）相比，预测性能得到改善。因此，我们提供了一种统一的方法，在其中将消息传递置于GPDS的上下文中。

英文摘要

Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos or neural recordings, are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems (GPDS) as a rich model class that is appropriate for such analysis. In particular, we present a message passing algorithm for approximate inference in GPDSs based on expectation propagation. By posing inference as a general message passing problem, we iterate forward-backward smoothing. Thus, we obtain more accurate posterior distributions over latent structures, resulting in improved predictive performance compared to state-of-the-art GPDS smoothers, which are special cases of our general message passing algorithm. Hence, we provide a unifying approach within which to contextualize message passing in GPDSs.

1102.5597 2026-06-03 math.NA cs.LG cs.NA 版本更新

Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

快速与更快：两种流式矩阵分解算法的比较

Radim Řehůřek

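AI summary: The paper compares the performance and accuracy of a single-pass distributed algorithm and a two-pass streamed stochastic algorithm for large-scale matrix decomposition in constant memory, using the English Wikipedia as the dataset for latent semantic analysis experiments.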

Journal ref: NIPS Workshop on Low-Rank Methods for Large-Scale Machine Learning, 2010

AI中文摘要

随着数字数据集规模的爆炸式增长，分解算法的限制因素是对输入的 \textit{遍历次数}，因为输入通常存储在外存甚至异地。此外，我们只关注相对于输入大小在 \textit{恒定内存}中运行的算法，以便处理任意大的输入。在本文中，我们提出了两种此类算法的实际比较：一种是对输入进行单遍操作的分布式方法，另一种是流式两遍随机算法。实验跟踪了分布式计算、过采样和内存权衡对两种算法精度和性能的影响。为了确保有意义的结果，我们选择真实数据集，即整个英文维基百科，作为潜在语义分析的应用场景。

英文摘要

With the explosion of the size of digital datasets, the limiting factor for decomposition algorithms is the \emph{number of passes} over the input, as the input is often stored out-of-core or even off-site. Moreover, we're only interested in algorithms that operate in \emph{constant memory} w.r.t. the input size, so that arbitrarily large input can be processed. In this paper, we present a practical comparison of two such algorithms: a distributed method that operates in a single pass over the input vs. a streamed two-pass stochastic algorithm. The experiments track the effect of distributed computing, oversampling and memory trade-offs on the accuracy and performance of the two algorithms. To ensure meaningful results, we choose the input to be a real dataset, namely the whole of the English Wikipedia, in the application settings of Latent Semantic Analysis.

1111.2262 2026-06-03 cs.LG cs.NA math.NA 版本更新

Improved Bound for the Nystrom's Method and its Application to Kernel Classification

Nyström方法的改进界及其在核分类中的应用

Rong Jin, Tianbao Yang, Mehrdad Mahdavi, Yu-Feng Li, Zhi-Hua Zhou

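AI summary: Using a concentration inequality for integral operators and compressive sensing theory, the paper improves the spectral-norm approximation error bound for the Nyström method and applies it to kernel classification, showing that the number of support vectors can be reduced to N^{2p/(p^2-1)} when the eigenvalues follow a p-power law.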

DOI: 10.1109/TIT.2013.2271378

AI中文摘要

我们开发了两种分析Nyström方法逼近误差界的方法，一种基于积分算子的集中不等式，另一种基于压缩感知理论。我们表明，在大特征间隙的情况下，以谱范数度量的逼近误差可以从$O(N/\sqrt{m})$改进到$O(N/m^{1 - ρ})$，其中$N$是数据点总数，$m$是采样数据点数，$ρ\in (0, 1/2)$是刻画特征间隙的正常数。当核矩阵的特征值服从$p$次幂律时，基于压缩感知理论的分析在非相干性假设下进一步将界改进为$O(N/m^{p - 1})$，这解释了为什么Nyström方法对特征值倾斜的核矩阵效果良好。我们提出了一种基于Nyström方法的核分类方法，并利用改进的界推导了其泛化性能。我们表明，当核矩阵的特征值服从$p$次幂律时，我们可以将支持向量数量减少到$N^{2p/(p^2 - 1)}$，当$p > 1+\sqrt{2}$时该数量小于$N$，而不会严重牺牲其泛化性能。

英文摘要

We develop two approaches for analyzing the approximation error bound for the Nyström method, one based on a concentration inequality for integral operators, and one based on compressive sensing theory. We show that the approximation error, measured in the spectral norm, can be improved from $O(N/\sqrt{m})$ to $O(N/m^{1 - ρ})$ in the case of large eigengap, where $N$ is the total number of data points, $m$ is the number of sampled data points, and $ρ\in (0, 1/2)$ is a positive constant that characterizes the eigengap. When the eigenvalues of the kernel matrix follow a $p$-power law, our analysis based on compressive sensing theory further improves the bound to $O(N/m^{p - 1})$ under an incoherence assumption, which explains why the Nyström method works well for kernel matrices with skewed eigenvalues. We present a kernel classification approach based on the Nyström method and derive its generalization performance using the improved bound. We show that when the eigenvalues of the kernel matrix follow a $p$-power law, we can reduce the number of support vectors to $N^{2p/(p^2 - 1)}$, a number less than $N$ when $p > 1+\sqrt{2}$, without seriously sacrificing its generalization performance.
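
The threshold $p > 1+\sqrt{2}$ in the last sentence follows from simple arithmetic: for $p > 1$, the exponent $2p/(p^2-1)$ is below 1 exactly when $p^2 - 2p - 1 > 0$, that is, when $p > 1+\sqrt{2}$. The short check below evaluates the exponent at a few values of $p$. The function name, the restriction to $p > 1$, and the sample value $N = 10^6$ are assumptions made for illustration.

```python
import math


def support_vector_exponent(p):
    """Exponent e(p) = 2p / (p^2 - 1); the support-vector count scales as N**e(p)."""
    if p <= 1:
        # Assumption for this sketch: only p > 1 is considered.
        raise ValueError("the exponent is only evaluated here for p > 1")
    return 2.0 * p / (p * p - 1.0)


threshold = 1.0 + math.sqrt(2.0)  # e(p) equals 1 at this value of p

for p in (2.0, threshold, 3.0, 5.0):
    e = support_vector_exponent(p)
    print(f"p = {p:.3f}: exponent = {e:.3f}, N^e for N = 10^6 is about {1e6 ** e:.3g}")
```

Below the threshold the exponent exceeds 1, so the stated count is no smaller than $N$. Above it, the count shrinks quickly as the eigenvalue decay becomes steeper.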

1205.4133 2026-06-03 math.NA cs.LG cs.NA 版本更新

Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling

约束过完全分析算子学习用于共稀疏信号建模

Mehrdad Yaghoobi, Sangnam Nam, Remi Gribonval, Mike E. Davies

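AI summary: Proposes a constrained learning framework based on L1 optimization that learns overcomplete analysis operators for cosparse signal modelling using projected subgradients and Douglas-Rachford splitting, and verifies robust recovery on clean and noisy training sets.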

Comments 29 pages, 13 figures, accepted to be published in TSP

DOI: 10.1109/TSP.2013.2250968

AI中文摘要

我们考虑从训练样本集合中学习低维信号模型的问题。主流方法是学习一个过完全字典，利用稀疏合成系数对训练样本提供良好近似。这个著名的稀疏模型有一个不太为人知的对应物，即分析形式的共稀疏分析模型。在这个新模型中，信号的特征在于它们在使用过完全（线性）分析算子的变换域中的简约性。我们提出基于L1优化的约束优化框架，从训练语料库中学习分析算子。在优化框架中引入约束的原因是为了排除平凡解。尽管目前还没有最终答案确定哪个约束最相关，但我们研究了模型自适应领域的一些常规约束，并为此使用了均匀归一化紧框架（UNTF）。然后，我们推导了一个实用的学习算法，基于投影次梯度和Douglas-Rachford分裂技术，并展示了当提供足够大小的干净训练集时，该算法能够稳健地恢复真实分析算子。我们还使用一些含噪的共稀疏信号找到了图像的分析算子，这确实是一个更现实的实验。由于推导出的优化问题不是凸规划，我们通常使用此类变分方法找到局部最小值。针对两种不同设置推导了局部最优性条件，为学习问题在适当条件下的适定性提供了初步的理论支持。

英文摘要

We consider the problem of learning a low-dimensional signal model from a collection of training samples. The mainstream approach would be to learn an overcomplete dictionary to provide good approximations of the training samples using sparse synthesis coefficients. This famous sparse model has a less well known counterpart, in analysis form, called the cosparse analysis model. In this new model, signals are characterised by their parsimony in a transformed domain using an overcomplete (linear) analysis operator. We propose to learn an analysis operator from a training corpus using a constrained optimisation framework based on L1 optimisation. The reason for introducing a constraint in the optimisation framework is to exclude trivial solutions. Although there is no final answer here for which constraint is the most relevant constraint, we investigate some conventional constraints in the model adaptation field and use the uniformly normalised tight frame (UNTF) for this purpose. We then derive a practical learning algorithm, based on projected subgradients and Douglas-Rachford splitting technique, and demonstrate its ability to robustly recover a ground truth analysis operator, when provided with a clean training set, of sufficient size. We also find an analysis operator for images, using some noisy cosparse signals, which is indeed a more realistic experiment. As the derived optimisation problem is not a convex program, we often find a local minimum using such variational methods. Some local optimality conditions are derived for two different settings, providing preliminary theoretical support for the well-posedness of the learning problem under appropriate conditions.

0911.1419 2026-06-03 cs.DS cond-mat.stat-mech cs.DM cs.LG cs.NA math.NA math.OC 版本更新

Belief Propagation and Loop Calculus for the Permanent of a Non-Negative Matrix

非负矩阵积和式的信念传播与环路微积分

Yusuke Watanabe, Michael Chertkov

AI总结针对非负矩阵积和式的计算问题，利用信念传播固定点导出了精确的积和式表达式，并提供了基于Bethe自由能和Ihara图zeta函数的两种推导。

Comments 11 pages; submitted to Journal of Physics A: Mathematical Theoretical

DOI: 10.1088/1751-8113/43/24/242002

AI中文摘要

我们考虑计算一个正$(N\times N)$非负矩阵$P=(P_i^j|i,j=1,\cdots,N)$的积和式，或等价地，完全二分图$K_{N,N}$上完美匹配的加权计数问题。该问题已知具有指数复杂度。作为图模型的配分函数$Z$，该问题允许精确的环路微积分表示[Chertkov, Chernyak '06]，该表示基于Bethe自由能泛函在非整数双随机边际信念矩阵$\beta=(\beta_i^j|i,j=1,\cdots,N)$上的内部最小值，该矩阵也对应于信念传播(BP)型迭代消息传递算法的固定点。我们的主要结果是给出精确配分函数（积和式）用BP边际矩阵$\beta$表示的显式表达式：$Z=\mbox{Perm}(P)=Z_{BP} \mbox{Perm}(\beta_i^j(1-\beta_i^j))/\prod_{i,j}(1-\beta_i^j)$，其中$Z_{BP}$是用$\beta$显式表示的BP积和式表达式。我们给出了该公式的两种推导：一种直接基于Bethe自由能，另一种结合了Ihara图$\zeta$函数和环路微积分方法。假设已计算出信念传播边际矩阵$\beta$，我们提供了两个下界和一个上界来估计乘积项。两个互补的下界分别基于Gurvits-van der Waerden定理以及修正积和式与行列式之间的关系。

英文摘要

We consider computation of the permanent of a positive $(N\times N)$ non-negative matrix, $P=(P_i^j|i,j=1,\cdots,N)$, or equivalently the problem of weighted counting of the perfect matchings over the complete bipartite graph $K_{N,N}$. The problem is known to be of likely exponential complexity. Stated as the partition function $Z$ of a graphical model, the problem allows an exact Loop Calculus representation [Chertkov, Chernyak '06] in terms of an interior minimum of the Bethe Free Energy functional over a non-integer doubly stochastic matrix of marginal beliefs, $β=(β_i^j|i,j=1,\cdots,N)$, which also corresponds to a fixed point of the iterative message-passing algorithm of the Belief Propagation (BP) type. Our main result is an explicit expression of the exact partition function (permanent) in terms of the matrix of BP marginals, $β$, as $Z=\mbox{Perm}(P)=Z_{BP} \mbox{Perm}(β_i^j(1-β_i^j))/\prod_{i,j}(1-β_i^j)$, where $Z_{BP}$ is the BP expression for the permanent stated explicitly in terms of $β$. We give two derivations of the formula, a direct one based on the Bethe Free Energy and an alternative one combining the Ihara graph-$ζ$ function and the Loop Calculus approaches. Assuming that the matrix $β$ of the Belief Propagation marginals is calculated, we provide two lower bounds and one upper bound to estimate the multiplicative term. Two complementary lower bounds are based on the Gurvits-van der Waerden theorem and on a relation between the modified permanent and determinant, respectively.
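
To make the quantity $\mbox{Perm}(P)$ concrete, here is a minimal brute-force sketch that evaluates the permanent straight from its definition as a weighted count of perfect matchings. It enumerates all $N!$ permutations, so it only works for very small $N$, which is the cost that motivates approximations such as BP. The function name is a hypothetical choice; this is not the paper's method.

```python
from itertools import permutations
from math import prod

def permanent(P):
    """Permanent of a square matrix: sum over permutations of products of entries."""
    n = len(P)
    # Each permutation sigma is one perfect matching i -> sigma[i] in K_{N,N}.
    return sum(
        prod(P[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

P = [[1, 2],
     [3, 4]]
print(permanent(P))  # 1*4 + 2*3 = 10
```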

1208.4773 2026-06-03 eess.SY cs.AI cs.LG cs.SY 版本更新

Optimized Look-Ahead Tree Policies: A Bridge Between Look-Ahead Tree Policies and Direct Policy Search

优化前瞻树策略：连接前瞻树策略与直接策略搜索的桥梁

Tobias Jung, Louis Wehenkel, Damien Ernst, Francis Maes

AI总结提出一种混合策略学习方案，通过直接策略搜索学习节点评分函数来指导小型前瞻树的构建，从而结合直接策略搜索和前瞻树策略的优势。

Comments In Submission

AI中文摘要

直接策略搜索（DPS）和前瞻树（LT）策略是两类广泛使用的技术，用于为序列决策问题产生高性能策略。要使DPS方法有效工作，一个关键问题是针对目标问题选择合适的参数化策略空间。LT方法的一个基本问题是，为了做出好的决策，这类策略必须开发非常大的前瞻树，这可能需要过多的在线计算资源。在本文中，我们提出了一种新的混合策略学习方案，它位于DPS和LT的交集，其中策略是一种算法，以有向方式开发一个小型前瞻树，由通过DPS学习的节点评分函数引导。基于LT的表示被证明是在DPS方案中表示策略的一种通用方式，同时，DPS能够显著减少做出高质量决策所需的前瞻树的大小。我们通过实验将我们的方法与两种其他最先进的DPS技术和四种常见的LT策略在四个基准领域进行比较，并表明它结合了其起源的两种技术的优势。特别是，我们表明我们的方法：（1）总体上比纯DPS和纯LT策略产生更好的性能策略，（2）需要的策略评估次数远少于其他DPS技术，（3）易于调整，（4）产生的策略对初始条件的扰动具有相当的鲁棒性。

英文摘要

Direct policy search (DPS) and look-ahead tree (LT) policies are two widely used classes of techniques to produce high-performance policies for sequential decision-making problems. To make DPS approaches work well, one crucial issue is to select an appropriate space of parameterized policies with respect to the targeted problem. A fundamental issue in LT approaches is that, to take good decisions, such policies must develop very large look-ahead trees which may require excessive online computational resources. In this paper, we propose a new hybrid policy learning scheme that lies at the intersection of DPS and LT, in which the policy is an algorithm that develops a small look-ahead tree in a directed way, guided by a node scoring function that is learned through DPS. The LT-based representation is shown to be a versatile way of representing policies in a DPS scheme, while at the same time, DPS makes it possible to significantly reduce the size of the look-ahead trees that are required to take high-quality decisions. We experimentally compare our method with two other state-of-the-art DPS techniques and four common LT policies on four benchmark domains and show that it combines the advantages of the two techniques from which it originates. In particular, we show that our method: (1) produces overall better performing policies than both pure DPS and pure LT policies, (2) requires a substantially smaller number of policy evaluations than other DPS techniques, (3) is easy to tune and (4) results in policies that are quite robust with respect to perturbations of the initial conditions.

1205.2584 2026-06-03 math.NA cs.LG cs.NA math.OC 版本更新

Low Complexity Damped Gauss-Newton Algorithms for CANDECOMP/PARAFAC

低复杂度阻尼高斯-牛顿算法用于CANDECOMP/PARAFAC分解

Anh Huy Phan, Petr Tichavský, Andrzej Cichocki

AI总结针对CP分解中阻尼高斯-牛顿算法计算复杂度过高的问题，提出基于分块逆近似Hessian的快速实现，显著降低计算和内存需求。

AI中文摘要

用于CANDECOMP/PARAFAC (CP) 分解的阻尼高斯-牛顿 (dGN) 算法可以处理因子共线性和不同因子量级的挑战；然而，对于大小为 $I_1\times \cdots \times I_N$、秩为 $R$ 的 $N$ 维张量分解，该算法由于需要构建大小为 $(RT \times RT)$ 的大型近似Hessian矩阵并求逆（其中 $T = \sum_n I_n$），计算量巨大。本文提出了一种dGN算法的快速实现，基于分块形式的逆近似Hessian的新表达式。新实现具有较低的计算复杂度，除了梯度计算（这部分两种方法相同）外，只需要对一个大小为 $NR^2\times NR^2$ 的矩阵求逆，如果 $T \gg NR$，这远小于整个近似Hessian矩阵。此外，该实现具有更低的内存需求，因为Hessian矩阵及其逆矩阵都不需要完整存储。还提出了处理复数数据的算法变体。在困难基准张量示例上，将所提算法的复杂度和性能与dGN和带线搜索的ALS进行了比较。

英文摘要

The damped Gauss-Newton (dGN) algorithm for CANDECOMP/PARAFAC (CP) decomposition can handle the challenges of collinearity of factors and different magnitudes of factors; nevertheless, for factorization of an $N$-D tensor of size $I_1\times \cdots \times I_N$ with rank $R$, the algorithm is computationally demanding due to the construction of a large approximate Hessian of size $(RT \times RT)$ and its inversion, where $T = \sum_n I_n$. In this paper, we propose a fast implementation of the dGN algorithm which is based on novel expressions of the inverse approximate Hessian in block form. The new implementation has lower computational complexity: besides computation of the gradient (this part is common to both methods), it requires the inversion of a matrix of size $NR^2\times NR^2$, which is much smaller than the whole approximate Hessian if $T \gg NR$. In addition, the implementation has lower memory requirements, because neither the Hessian nor its inverse ever needs to be stored in its entirety. A variant of the algorithm working with complex-valued data is proposed as well. The complexity and performance of the proposed algorithm are compared with those of dGN and ALS with line search on examples of difficult benchmark tensors.

1104.3792 2026-06-03 stat.ML cs.LG cs.NA math.NA 版本更新

A sufficient condition on monotonic increase of the number of nonzero entry in the optimizer of L1 norm penalized least-square problem

L1范数惩罚最小二乘问题优化器中非零条目数单调递增的充分条件

J. Duan, Charles Soussen, David Brie, Jerome Idier, Y. -P. Wang

AI总结本文针对L1范数惩罚最小二乘问题（LASSO），提出了一个充分条件，在该条件下当超参数减小时优化器中非零条目数单调递增，并将结果推广到全变分情形。

AI中文摘要

基于$\ell$-1范数的优化广泛应用于信号处理，尤其是近期的压缩感知理论。本文研究$\ell$-1范数惩罚最小二乘问题的解路径，其约束形式称为最小绝对收缩和选择算子（LASSO）。解路径是随着超参数（拉格朗日乘子）变化的所有优化器的集合。解路径的研究对于理解和观察近似项与正则化项之间的权衡曲线具有重要意义。如果已知给定问题的解路径，它可以帮助我们在给定准则（如Akaike信息准则）下找到最优超参数。本文提出了$\ell$-1范数惩罚最小二乘问题的一个充分条件。在该充分条件下，当超参数减小时，优化器或解向量中的非零条目数单调递增。我们还将结果推广到常用的全变分情形，其中$\ell$-1范数作用于解向量的一阶导数。我们证明所提出的条件与Donoho等人\cite{Donoho08}给出的条件以及Efron等人\cite{Efron04}的正锥条件具有内在联系。然而，所提出的条件不需要像Donoho等人的条件那样假设信号的稀疏水平，并且在用于实际应用时比Efron等人的正锥条件更容易验证。

英文摘要

The $\ell$-1 norm based optimization is widely used in signal processing, especially in recent compressed sensing theory. This paper studies the solution path of the $\ell$-1 norm penalized least-square problem, whose constrained form is known as Least Absolute Shrinkage and Selection Operator (LASSO). A solution path is the set of all the optimizers with respect to the evolution of the hyperparameter (Lagrange multiplier). The study of the solution path is of great significance in viewing and understanding the profile of the tradeoff between the approximation and regularization terms. If the solution path of a given problem is known, it can help us to find the optimal hyperparameter under a given criterion such as the Akaike Information Criterion. In this paper we present a sufficient condition on the $\ell$-1 norm penalized least-square problem. Under this sufficient condition, the number of nonzero entries in the optimizer or solution vector increases monotonically when the hyperparameter decreases. We also generalize the result to the often used total variation case, where the $\ell$-1 norm is taken over the first order derivative of the solution vector. We prove that the proposed condition has intrinsic connections with the condition given by Donoho et al. \cite{Donoho08} and the positive cone condition by Efron et al. \cite{Efron04}. However, the proposed condition does not need to assume the sparsity level of the signal as required by Donoho et al.'s condition, and is easier to verify than Efron et al.'s positive cone condition when being used for practical applications.
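
The monotone growth of the support can be seen directly in one simple special case. As a minimal sketch, assume the design matrix $A$ has orthonormal columns, so the penalized problem $\min_x \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1$ has the closed-form solution $x_i = \mathrm{soft}((A^\top y)_i, \lambda)$. This orthonormal setting and the numbers below are assumptions for illustration. The paper's sufficient condition is more general and is not reproduced here.

```python
def soft_threshold(z, lam):
    """Closed-form minimiser of 0.5 * (x - z)**2 + lam * abs(x)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(correlations, lam):
    """LASSO solution when A has orthonormal columns; correlations = A^T y."""
    return [soft_threshold(z, lam) for z in correlations]

def support_size(x):
    """Number of nonzero entries in the solution vector."""
    return sum(1 for v in x if v != 0.0)

correlations = [3.0, -1.5, 0.4, 2.2, -0.1]   # hypothetical values of A^T y
lambdas = [3.5, 2.5, 1.0, 0.3, 0.05]         # decreasing hyperparameter
sizes = [support_size(lasso_orthonormal(correlations, lam)) for lam in lambdas]
print(sizes)  # [0, 1, 3, 4, 5]
```

An entry becomes nonzero once $\lambda$ drops below $|(A^\top y)_i|$ and stays nonzero afterwards, which is why the support never shrinks along the path in this case.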

1101.4003 2026-06-03 cs.AI cs.LG cs.SY eess.SY math.OC 版本更新

Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems

Dyna-H：一种应用于角色扮演游戏策略决策系统的启发式规划强化学习算法

Matilde Santos, Jose Antonio Martin H., Victoria Lopez, Guillermo Botella

AI总结提出Dyna-H算法，结合启发式搜索与Dyna框架，在角色扮演游戏策略决策中实现无模型在线强化学习，实验表明其性能显著优于Q-Learning和Dyna-Q。

AI中文摘要

在角色扮演游戏中，寻找最优轨迹是最重要的任务之一。实际上，策略决策系统成为游戏引擎的关键组成部分。决策方式（在线、批处理或模拟）以及决策所消耗的资源（如执行时间、内存）将在很大程度上影响游戏性能。当可以使用经典搜索算法（如A*）时，它们是最优先的选择。然而，这些方法依赖于搜索空间的精确和完整模型，在许多有趣的场景中无法应用。此时，无模型的序贯决策方法（在不确定性下）是最佳选择。本文提出一种启发式规划策略，将启发式搜索在路径规划中的能力融入Dyna智能体。所提出的Dyna-H算法，与A*一样，会选择更有可能产生结果的路径分支。此外，它具有无模型在线强化学习算法的优点。该方案与单步Q-Learning和Dyna-Q算法进行了对比评估，获得了优异的实验结果：Dyna-H在所有实验中显著优于这两种方法。我们还提出了一个功能类比，即从最差轨迹中采样的启发式与人类行为中梦境（如噩梦）的作用类似。

英文摘要

In a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. The way in which decisions are taken (online, batch or simulated) and the resources consumed in decision making (e.g. execution time, memory) influence the game performance to a major degree. When classical search algorithms such as A* can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. In those cases, model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy to incorporate the ability of heuristic search in path-finding into a Dyna agent. The proposed Dyna-H algorithm, as A* does, selects branches more likely to produce outcomes than other branches. Besides, it has the advantages of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms, obtaining excellent experimental results: Dyna-H significantly outperforms both methods in all experiments. We also suggest a functional analogy between the proposed heuristic of sampling from worst trajectories and the role of dreams (e.g. nightmares) in human behavior.

1012.3005 2026-06-03 math.OC cs.LG cs.NI cs.SY eess.SY math.PR 版本更新

On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards

关于马尔可夫奖励的组合多臂老虎机问题

Yi Gai, Bhaskar Krishnamachari, Mingyan Liu

AI总结针对用户-资源匹配中状态演化为马尔可夫链的组合多臂老虎机问题，提出一种多项式存储和每步多项式复杂度的学习算法，实现接近对数时间的遗憾界。

AI中文摘要

我们考虑经典多臂老虎机问题的一个组合推广，定义如下：给定一个二分图，包含 $M$ 个用户和 $N \geq M$ 个资源。对于每个用户-资源对 $(i,j)$，存在一个关联状态，该状态演化为一个参数未知的不可约非周期有限状态马尔可夫链，每次特定用户 $i$ 被分配资源 $j$ 时状态发生转移。用户 $i$ 每次被分配资源 $j$ 时获得一个依赖于对应状态的奖励。系统目标是学习用户与资源的最佳匹配，使得所有用户获得的长期奖励总和最大化。这对应于最小化遗憾，这里定义为最佳可能静态匹配所能获得的期望总奖励与给定算法所能达到的期望总奖励之间的差距。我们针对该问题提出了一种多项式存储和每步多项式复杂度的匹配学习算法。我们证明该算法能够实现均匀任意接近对数时间的遗憾，且遗憾与用户和资源数量成多项式关系。该公式广泛适用于网络中的调度和交换问题，并显著扩展了该领域的先前结果。

英文摘要

We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of $M$ users and $N \geq M$ resources. For each user-resource pair $(i,j)$, there is an associated state that evolves as an aperiodic irreducible finite-state Markov chain with unknown parameters, with transitions occurring each time the particular user $i$ is allocated resource $j$. The user $i$ receives a reward that depends on the corresponding state each time it is allocated the resource $j$. The system objective is to learn the best matching of users to resources so that the long-term sum of the rewards received by all users is maximized. This corresponds to minimizing regret, defined here as the gap between the expected total reward that can be obtained by the best-possible static matching and the expected total reward that can be achieved by a given algorithm. We present a polynomial-storage and polynomial-complexity-per-step matching-learning algorithm for this problem. We show that this algorithm can achieve a regret that is uniformly arbitrarily close to logarithmic in time and polynomial in the number of users and resources. This formulation is broadly applicable to scheduling and switching problems in networks and significantly extends prior results in the area.

1005.2146 2026-06-03 cs.LG cs.NA math.NA 版本更新

On the Finite Time Convergence of Cyclic Coordinate Descent Methods

关于循环坐标下降法的有限时间收敛性

Ankan Saha, Ambuj Tewari

AI总结本文证明了在等调性假设下，两种循环坐标下降变体具有O(1/k)的收敛速率，并通过与梯度下降的比较展示了其优越性。

Comments 20 pages

1303.3183 2026-06-03 eess.SY cs.CE cs.LG cs.SY q-bio.MN 版本更新

Toggling a Genetic Switch Using Reinforcement Learning

使用强化学习切换遗传开关

Aivar Sootla, Natalja Strelkowa, Damien Ernst, Mauricio Barahona, Guy-Bart Stan

AI总结本文采用拟合Q迭代强化学习算法，无需系统数学模型，直接利用测量数据实现基因调控网络的最优外源控制，并以切换开关系统为例驱动两种蛋白质浓度到达目标状态区域。

Comments 12 pages, presented at the 9th French Meeting on Planning, Decision Making and Learning, Liège (Belgium), May 12-13, 2014

1201.5604 2026-06-03 cs.AI cs.LG cs.NE cs.SY eess.SY math.OC 版本更新

Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system

XCSF学习分类系统中的离散与模糊动态遗传编程

Richard J. Preen, Larry Bull

AI总结本文在XCSF框架内使用离散和模糊动态系统表示（异步随机布尔网络和模糊逻辑网络），通过自适应的开放式进化设计集成系统，解决多个经典测试问题。

1106.3703 2026-06-03 nlin.AO cs.AI cs.IT cs.LG cs.SY eess.SY math.IT q-bio.QM stat.ME 版本更新

Prediction and Modularity in Dynamical Systems

动力系统中的预测与模块性

Artemy Kolchinsky, Luis M. Rocha

AI总结本文从统计建模和预测的角度，利用模型简洁性与预测精度之间的权衡，提出了一种将动力网络最优多尺度分解为弱耦合简单模块的方法，并给出了状态依赖和因果版本。

Comments v1 published in ECAL 2011 (European Conference on Artificial Life). v2 fixes error in causal risk (number of parameters should be based on training distribution)

1209.2194 2026-06-03 math.OC cs.LG cs.MA cs.SY eess.SY 版本更新

Cooperative learning in multi-agent systems from intermittent measurements

基于间歇测量的多智能体系统协同学习

Naomi Ehrich Leonard, Alex Olshevsky

AI总结针对时变连接和间歇测量下的多智能体系统，提出一种分布式学习协议，从噪声测量中学习未知向量μ，并给出学习速度与网络大小和组合特征的关系。

1210.7559 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

Tensor decompositions for learning latent variable models

用于学习潜变量模型的张量分解

Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

AI总结本文利用低阶可观测矩的张量结构，通过对称张量分解（类似矩阵SVD的推广）实现高斯混合模型、隐马尔可夫模型等潜变量模型的参数估计，并提供了鲁棒张量幂法的详细分析。

Journal ref: Journal of Machine Learning Research, 15(Aug):2773-2832, 2014

AI中文摘要

本文研究了一类广泛的潜变量模型（包括高斯混合模型、隐马尔可夫模型和潜在狄利克雷分配）的计算和统计高效参数估计方法，该方法利用了其低阶可观测矩（通常是二阶和三阶）中的特定张量结构。具体地，参数估计被简化为从矩导出的对称张量中提取某种（正交）分解的问题；这种分解可以看作是矩阵奇异值分解的自然推广。尽管张量分解通常难以计算，但这些特殊结构张量的分解可以通过多种方法高效获得，包括幂迭代和最大化方法（类似于矩阵的情况）。本文提供了鲁棒张量幂方法的详细分析，建立了类似于矩阵奇异向量的Wedin扰动定理的类比。这意味着对于几种流行的潜变量模型，存在一种鲁棒且计算可行的估计方法。

英文摘要

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
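
The power iteration mentioned in this abstract can be sketched on a synthetic third-order tensor $T = \sum_r \lambda_r\, u_r \otimes u_r \otimes u_r$ with orthonormal $u_r$. This is a minimal single-run sketch under assumed data. The robust method analysed in the paper involves further steps (such as restarts and deflation) that are omitted, and all names below are hypothetical.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Repeat v <- T(I, v, v) / ||T(I, v, v)|| from a random unit vector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum("ijk,j,k->i", T, v, v)   # contract T with v along two modes
        v /= np.linalg.norm(v)
    lam = np.einsum("ijk,i,j,k->", T, v, v, v)  # eigenvalue estimate T(v, v, v)
    return lam, v

# Synthetic orthogonally decomposable tensor (assumed demo data).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthonormal columns
weights = np.array([3.0, 2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", weights, U[:, :3], U[:, :3], U[:, :3])

lam, v = tensor_power_iteration(T)
print(lam, np.abs(U[:, :3].T @ v))
```

Which component the iteration converges to depends on the random start, unlike the matrix case where the top singular vector dominates.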

1204.4200 2026-06-03 cs.AI cs.LG cs.NE cs.SY eess.SY 版本更新

Discrete Dynamical Genetic Programming in XCS

XCS中的离散动力遗传编程

Richard J. Preen, Larry Bull

AI总结本文研究在XCS学习分类器系统中使用异步随机布尔网络作为离散动力系统表示，通过自适应的开放式进化设计集成系统以解决多个经典测试问题。

Comments arXiv admin note: substantial text overlap with arXiv:1201.5604

1211.4116 2026-06-03 cs.LG cs.NA math.AG math.CO math.NA stat.ML 版本更新

The Algebraic Combinatorial Approach for Low-Rank Matrix Completion

低秩矩阵补全的代数组合方法

Franz J. Király, Louis Theran, Ryota Tomioka

AI总结本文提出一种基于代数几何和拟阵理论的代数组合新视角，通过研究少量条目间的关系来解决低秩矩阵补全问题，并给出概率为1的算法判断特定条目是否可补全、从少量条目补全该条目及估计补全误差。

Comments 37 pages, with an appendix by Takeaki Uno

1012.1919 2026-06-03 math.NA cs.IT cs.LG cs.NA math.IT 版本更新

Low-Rank Structure Learning via Log-Sum Heuristic Recovery

通过对数求和启发式恢复的低秩结构学习

Yue Deng, Qionghai Dai, Risheng Liu, Zengke Zhang, Sanqing Hu

AI总结提出对数求和启发式恢复（LHR）模型，通过非凸对数求和度量增强稀疏性，并采用MM型算法求解，在鲁棒主成分分析和低秩表示任务中优于ℓ1基方法。

Comments 13 pages, 3 figures

DOI: 10.1109/TNNLS.2012.2235082
Journal ref: Neural Networks and Learning Systems, IEEE Transactions on, Volume:24 , Issue: 3, March, 2013

AI中文摘要

从被破坏的观测中恢复内在数据结构在机器学习和信号处理的各种任务中扮演重要角色。本文提出一种新模型，称为对数求和启发式恢复（LHR），用于从被破坏数据中学习本质的低秩结构。与传统方法直接使用ℓ1范数衡量稀疏性不同，LHR引入更合理的对数求和度量，以增强内在低秩结构和稀疏破坏中的稀疏性。尽管所提出的LHR优化不再凸，但仍可通过一种主要化-最小化（MM）类型算法有效求解，该算法用凸代理迭代替换非凸目标函数，最终LHR落入重加权方法的一般框架。我们证明了MM型算法在连续迭代后可以收敛到一个稳定点。我们通过将模型应用于解决两个典型问题来测试其性能：鲁棒主成分分析（RPCA）和低秩表示（LRR）。对于RPCA，我们从模拟和实际应用的角度将LHR与基准方法主成分追踪（PCP）进行比较。对于LRR，我们将LHR应用于计算运动分割和股票聚类的低秩表示矩阵。低秩结构学习的实验结果表明，所提出的基于对数求和的模型在数据秩更高且破坏更密集的情况下，性能远优于基于ℓ1的方法。

英文摘要

Recovering intrinsic data structure from corrupted observations plays an important role in various tasks in the communities of machine learning and signal processing. In this paper, we propose a novel model, named log-sum heuristic recovery (LHR), to learn the essential low-rank structure from corrupted data. Different from traditional approaches, which directly utilize the $\ell_1$ norm to measure the sparseness, LHR introduces a more reasonable log-sum measurement to enhance the sparsity in both the intrinsic low-rank structure and the sparse corruptions. Although the proposed LHR optimization is no longer convex, it can still be effectively solved by a majorization-minimization (MM) type algorithm, with which the non-convex objective function is iteratively replaced by its convex surrogate, and LHR finally falls into the general framework of reweighted approaches. We prove that the MM-type algorithm can converge to a stationary point after successive iterations. We test the performance of our proposed model by applying it to solve two typical problems: robust principal component analysis (RPCA) and low-rank representation (LRR). For RPCA, we compare LHR with the benchmark Principal Component Pursuit (PCP) method from both the perspectives of simulations and practical applications. For LRR, we apply LHR to compute the low-rank representation matrix for motion segmentation and stock clustering. Experimental results on low-rank structure learning demonstrate that the proposed log-sum based model performs much better than the $\ell_1$-based method for data with higher rank and with denser corruptions.

1302.6768 2026-06-03 math.NA cs.LG cs.NA stat.ML 版本更新

Missing Entries Matrix Approximation and Completion

缺失条目的矩阵逼近与补全

Gil Shabat, Yaniv Shmueli, Amir Averbuch

AI总结针对仅部分条目已知的矩阵，提出一系列算法实现矩阵补全与逼近，支持低秩、核范数、谱范数等多种约束，并证明凸情形下全局收敛，无需参数且适用于图像重建及偏微分方程数据恢复。

AI中文摘要

我们描述了当矩阵仅部分条目已知时，用于矩阵补全和矩阵逼近的几种算法。逼近约束可以是任何对于完整矩阵已知其近似解的约束。对于低秩逼近，类似的算法最近在文献中以不同名称出现。在这项工作中，我们引入了矩阵逼近的新定理，并表明这些算法可以扩展到处理不同的约束，例如核范数、谱范数、正交约束等，这些约束与低秩逼近不同。由于这些算法可以从优化的角度看待，我们讨论了它们在凸情形下收敛到全局解的问题。我们还讨论了最优步长，并表明它在每次迭代中是固定的。此外，推导出的矩阵补全流是鲁棒的，不需要任何参数。该矩阵补全流适用于不同的谱最小化问题，并可应用于物理、数学和电气工程问题，例如图像数据重建以及来自偏微分方程（如用于电磁波的亥姆霍兹方程）的数据重建。

英文摘要

We describe several algorithms for matrix completion and matrix approximation when only some of its entries are known. The approximation constraint can be any whose approximated solution is known for the full matrix. For low rank approximations, similar algorithms have appeared recently in the literature under different names. In this work, we introduce new theorems for matrix approximation and show that these algorithms can be extended to handle different constraints such as nuclear norm, spectral norm, orthogonality constraints and more that are different from low rank approximations. As the algorithms can be viewed from an optimization point of view, we discuss their convergence to a global solution for the convex case. We also discuss the optimal step size and show that it is fixed in each iteration. In addition, the derived matrix completion flow is robust and does not require any parameters. This matrix completion flow is applicable to different spectral minimizations and can be applied to physics, mathematics and electrical engineering problems such as data reconstruction of images and data coming from PDEs such as the Helmholtz equation used for electromagnetic waves.

1206.1623 2026-06-03 stat.ML cs.DS cs.LG cs.NA math.NA math.OC 版本更新

Proximal Newton-type methods for minimizing composite functions

最小化复合函数的近端牛顿型方法

Jason D. Lee, Yuekai Sun, Michael A. Saunders

AI总结针对光滑函数与非光滑凸函数之和的优化问题，提出近端牛顿型方法，并证明其在不精确搜索方向下仍保持牛顿型方法的收敛性，统一了生物信息学、信号处理和统计学习中的多种流行方法。

1301.3584 2026-06-03 cs.LG cs.NA math.NA 版本更新

Revisiting Natural Gradient for Deep Networks

重新审视深度网络的自然梯度

Razvan Pascanu, Yoshua Bengio

AI总结本文重新评估了自然梯度算法在深度模型学习中的应用，揭示了其与Hessian-Free、Krylov子空间下降和TONGA方法的联系，并提出了利用无标签数据改进泛化误差、评估对训练集顺序的鲁棒性以及结合二阶信息的扩展算法。

1302.3447 2026-06-03 math.ST cs.LG cs.NA math.NA math.PR stat.TH 版本更新

Exact Methods for Multistage Estimation of a Binomial Proportion

二项比例多阶段估计的精确方法

Zhengjia Chen, Xinjia Chen

AI总结本文回顾了现有二项比例序贯估计方法，提出了一类新的组序贯抽样方案，在给定误差和置信水平下实现均匀覆盖概率控制和渐近最优性，并推导了样本数的解析界。

Comments 38 pages, 9 figures

1303.4207 2026-06-03 cs.LG cs.NA math.NA 版本更新

Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

通过自适应采样改进CUR矩阵分解和Nyström近似

Shusen Wang, Zhihua Zhang

AI总结本文通过建立自适应列/行采样算法的更一般误差界，提出了具有期望相对误差界的更精确CUR和Nyström算法，并给出了标准Nyström方法和集成Nyström方法的误差下界的理论分析。

Journal ref: Journal of Machine Learning Research, 14: 2549-2589, 2013

AI中文摘要

CUR矩阵分解和Nyström近似是两种重要的低秩矩阵近似技术。Nyström方法通过少量列来近似对称半正定矩阵，而CUR通过少量列和行来近似任意数据矩阵。因此，CUR分解可以看作是Nyström近似的扩展。在本文中，我们为自适应列/行采样算法建立了更一般的误差界，基于此我们提出了具有期望相对误差界的更精确CUR和Nyström算法。所提出的CUR和Nyström算法也具有低时间复杂度，并且可以避免将整个数据矩阵保存在内存中。此外，我们对标准Nyström方法和集成Nyström方法的误差下界进行了理论分析。本文建立的主要理论结果是新颖的，并且我们的分析对数据矩阵没有特殊假设。

英文摘要

The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
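
As a rough illustration of what a CUR decomposition looks like, the sketch below picks a few columns $C$ and rows $R$ of a data matrix and uses $U = C^{+} A R^{+}$ as the middle factor. The uniform random selection of indices is an assumption made to keep the example short and stands in for the adaptive sampling studied in the paper, which is not reproduced here. The function name and demo data are hypothetical.

```python
import numpy as np

def cur_decomposition(A, col_idx, row_idx):
    """Return (C, U, R) with C, R actual columns/rows of A and U = C^+ A R^+."""
    C = A[:, col_idx]                                   # selected columns
    R = A[row_idx, :]                                   # selected rows
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)       # middle factor
    return C, U, R

# Assumed demo data: a noisy rank-3 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
A += 0.01 * rng.standard_normal(A.shape)

cols = rng.choice(A.shape[1], size=6, replace=False)   # uniform sampling (assumption)
rows = rng.choice(A.shape[0], size=6, replace=False)
C, U, R = cur_decomposition(A, cols, rows)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative Frobenius error: {rel_err:.4f}")
```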

1303.4694 2026-06-03 math.NA cs.LG cs.NA stat.ML 版本更新

Recovering Non-negative and Combined Sparse Representations

恢复非负和组合稀疏表示

Karthikeyan Natesan Ramamurthy, Jayaraman J. Thiagarajan, Andreas Spanias

AI总结基于多面体理论，研究了欠定线性系统中非负解的唯一性条件，并提出了组合稀疏表示范式及相应的组合正交匹配追踪算法，用于恢复唯一最稀疏系数向量。

AI中文摘要

有时，欠定线性系统的非负解可以被唯一恢复，即使不施加任何额外的稀疏约束。在本文中，我们基于多面体理论推导了此类系统存在唯一非负解的条件。此外，我们发展了组合稀疏表示的范式，其中只有部分系数向量被约束为非负，其余部分无约束（一般）。我们分析了在三种不同的系数支持知识情况下，组合表示的唯一最稀疏解的恢复：（a）非负系数和一般系数的非零支持已知，（b）仅一般系数的非零支持已知，（c）两个非零支持均未知。对于情况（c），我们提出了组合正交匹配追踪算法用于系数恢复，并推导了确定性稀疏度阈值，在该阈值下可以恢复唯一最稀疏的系数向量。我们量化了算法的阶复杂度，并检验了它们在各种噪声条件下精确和近似恢复系数的性能。此外，我们还获得了它们的经验相变特性。我们表明，与无约束的对应算法相比，具有部分非负约束的基追踪算法和所提出的贪婪算法在恢复唯一稀疏表示方面表现更好。最后，我们展示了所提方法在恢复受饱和噪声污染的图像中的实用性。

英文摘要

The non-negative solution to an underdetermined linear system can be uniquely recovered sometimes, even without imposing any additional sparsity constraints. In this paper, we derive conditions under which a unique non-negative solution for such a system can exist, based on the theory of polytopes. Furthermore, we develop the paradigm of combined sparse representations, where only a part of the coefficient vector is constrained to be non-negative, and the rest is unconstrained (general). We analyze the recovery of the unique, sparsest solution, for combined representations, under three different cases of coefficient support knowledge: (a) the non-zero supports of non-negative and general coefficients are known, (b) the non-zero support of general coefficients alone is known, and (c) both the non-zero supports are unknown. For case (c), we propose the combined orthogonal matching pursuit algorithm for coefficient recovery and derive the deterministic sparsity threshold under which recovery of the unique, sparsest coefficient vector is possible. We quantify the order complexity of the algorithms, and examine their performance in exact and approximate recovery of coefficients under various conditions of noise. Furthermore, we also obtain their empirical phase transition characteristics. We show that the basis pursuit algorithm, with partial non-negative constraints, and the proposed greedy algorithm perform better in recovering the unique sparse representation when compared to their unconstrained counterparts. Finally, we demonstrate the utility of the proposed methods in recovering images corrupted by saturation noise.

1102.2490 2026-06-03 math.ST cs.LG cs.SY eess.SY math.OC stat.TH 版本更新

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

KL-UCB算法：有界随机赌博机及其扩展

Aurélien Garivier, Olivier Cappé

AI总结本文提出KL-UCB算法，通过有限时间分析证明其在有界奖励下优于UCB和UCB2，在伯努利奖励下达到Lai和Robbins下界，并扩展到指数族分布，数值实验显示其高效稳定。

Comments 18 pages, 3 figures; Conf. Comput. Learning Theory (COLT) 2011 in Budapest, Hungary

Journal ref: Conference On Learning Theory n°24 Jul. 2011 pp.359-376

AI中文摘要

本文对KL-UCB算法进行了有限时间分析，该算法是一种在线、无时间视界的随机赌博机问题的索引策略。我们证明了两个不同的结果：首先，对于任意有界奖励，KL-UCB算法满足比UCB或UCB2一致更优的遗憾界；其次，在伯努利奖励的特殊情况下，它达到了Lai和Robbins的下界。此外，我们展示了KL-UCB算法的简单改编对于特定类别的（可能无界）奖励也是最优的，包括那些从指数族分布生成的奖励。一项大规模数值研究将KL-UCB与其主要竞争对手（UCB、UCB2、UCB-Tuned、UCB-V、DMED）进行比较，表明KL-UCB非常高效且稳定，包括在短时间范围内。KL-UCB也是唯一始终优于基本UCB策略的方法。我们的遗憾界依赖于附录中陈述并证明的具有独立兴趣的偏差结果。作为副产品，我们还获得了标准UCB算法的改进遗憾界。

英文摘要

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins. Furthermore, we show that simple adaptations of the KL-UCB algorithm are also optimal for specific classes of (possibly unbounded) rewards, including those generated from exponential families of distributions. A large-scale numerical study comparing KL-UCB with its main competitors (UCB, UCB2, UCB-Tuned, UCB-V, DMED) shows that KL-UCB is remarkably efficient and stable, including for short time horizons. KL-UCB is also the only method that always performs better than the basic UCB policy. Our regret bounds rely on deviation results of independent interest which are stated and proved in the Appendix. As a by-product, we also obtain an improved regret bound for the standard UCB algorithm.
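
The abstract does not spell out the index itself. As a minimal sketch, assume the commonly stated Bernoulli form: for an arm with empirical mean $\hat\mu$ after $n$ pulls at round $t$, the index is the largest $q \ge \hat\mu$ with $n\,\mathrm{kl}(\hat\mu, q) \le \log t + c \log\log t$, found here by bisection. The exploration constant `c`, the tolerance and the function names are illustrative assumptions.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log t + c * log log t."""
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    while hi - lo > tol:              # bisection: kl(mean, q) is increasing in q >= mean
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# The index shrinks towards the empirical mean as an arm is pulled more often.
for pulls in (10, 100, 1000):
    print(pulls, round(kl_ucb_index(0.5, pulls, t=1000), 4))
```

At each round the policy would play the arm with the largest index. Only the index computation is shown here.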

1211.6687 2026-06-03 stat.ML cs.LG cs.NA math.NA math.OC 版本更新

Robustness Analysis of Hottopixx, a Linear Programming Model for Factoring Nonnegative Matrices

Hottopixx的鲁棒性分析：一个用于非负矩阵分解的线性规划模型

Nicolas Gillis

AI总结本文对Hottopixx线性规划模型进行鲁棒性分析，并提出一种后处理策略以增强对数据集中重复和近似重复的鲁棒性。

Comments 23 pages; new numerical results; Comparison with Arora et al.; Accepted in SIAM J. Mat. Anal. Appl

DOI: 10.1137/120900629
Journal ref: SIAM J. Matrix Anal. & Appl. 34 (3), pp. 1189-1212, 2013

AI中文摘要

尽管非负矩阵分解（NMF）通常是NP难的，但最近研究表明，在输入非负数据矩阵接近可分离的假设下（可分离性要求输入矩阵的所有列属于由这些列的一个小子集张成的锥），NMF是易处理的。此后，设计了多种算法来处理这类NMF子问题。特别地，Bittorf、Recht、Ré和Tropp（《用线性规划分解非负矩阵》，NIPS 2012）提出了一种线性规划模型，称为Hottopixx。本文提供了对其方法的一种新的、更一般的鲁棒性分析。特别地，我们通过一种后处理策略设计了一个可证明更鲁棒的变体，该策略允许我们处理数据集中的重复和近似重复。

英文摘要

Although nonnegative matrix factorization (NMF) is NP-hard in general, it has been shown very recently that it is tractable under the assumption that the input nonnegative data matrix is close to being separable (separability requires that all columns of the input matrix belong to the cone spanned by a small subset of these columns). Since then, several algorithms have been designed to handle this subclass of NMF problems. In particular, Bittorf, Recht, Ré and Tropp ('Factoring nonnegative matrices with linear programs', NIPS 2012) proposed a linear programming model, referred to as Hottopixx. In this paper, we provide a new and more general robustness analysis of their method. In particular, we design a provably more robust variant using a post-processing strategy which allows us to deal with duplicates and near duplicates in the dataset.

1210.5323 2026-06-03 cs.IT cs.LG cs.NA math.IT math.NA 版本更新

The performance of orthogonal multi-matching pursuit under RIP

正交多匹配追踪在RIP下的性能

Zhiqiang Xu

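AI summary: Studies the performance of orthogonal multi-matching pursuit (OMMP) under the restricted isometry property (RIP), proving that under a specific RIP condition OMMP recovers an $s$-sparse signal within $s$ iterations, and that fewer iterations suffice for slowly decaying sparse signals.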

Comments 22 pages

详情

AI中文摘要

正交多匹配追踪(OMMP)是正交匹配追踪(OMP)的自然扩展。我们将参数为$M$的OMMP记为OMMP(M)，其中$M\geq 1$是整数。OMP与OMMP(M)的主要区别在于，OMMP(M)每次迭代选择$M$个原子，而OMP每次只向最优原子集添加一个原子。本文研究正交多匹配追踪(OMMP)在RIP下的性能。特别地，我们证明，当测量矩阵A满足$(9s, 1/10)$-RIP时，存在绝对常数$M_0\leq 8$使得OMMP(M_0)能在$s$次迭代内恢复$s$-稀疏信号。我们进一步证明，对于慢衰减的$s$-稀疏信号，对于一大类$M$，OMMP(M)能在$O(\frac{s}{M})$次迭代内恢复$s$-稀疏信号。特别地，对于$M=s^a$且$a\in [0,1/2]$，OMMP(M)能在$O(s^{1-a})$次迭代内恢复慢衰减的$s$-稀疏信号。该结果表明OMMP能大幅降低计算复杂度。

英文摘要

The orthogonal multi-matching pursuit (OMMP) is a natural extension of orthogonal matching pursuit (OMP). We denote the OMMP with the parameter $M$ as OMMP(M), where $M\geq 1$ is an integer. The main difference between OMP and OMMP(M) is that OMMP(M) selects $M$ atoms per iteration, while OMP only adds one atom to the optimal atom set. In this paper, we study the performance of OMMP under RIP. In particular, we show that, when the measurement matrix A satisfies $(9s, 1/10)$-RIP, there exists an absolute constant $M_0\leq 8$ so that OMMP(M_0) can recover any $s$-sparse signal within $s$ iterations. We furthermore prove that, for slowly decaying $s$-sparse signals, OMMP(M) can recover the signal within $O(\frac{s}{M})$ iterations for a large class of $M$. In particular, for $M=s^a$ with $a\in [0,1/2]$, OMMP(M) can recover a slowly decaying $s$-sparse signal within $O(s^{1-a})$ iterations. The result implies that OMMP can substantially reduce the computational complexity.

1211.3500 2026-06-03 math.NA cs.LG cs.NA 版本更新

Accelerated Canonical Polyadic Decomposition by Using Mode Reduction

利用模式约简加速典型多路分解

Guoxu Zhou, Andrzej Cichocki, Shengli Xie

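AI summary: To address the efficiency bottleneck caused by frequently unfolding a high-order tensor along all $N$ modes in CP decomposition, the paper proposes first converting the $N$th-order tensor to a 3rd-order tensor and then decomposing it, which avoids mode-by-mode unfolding while preserving the uniqueness of the decomposition and improving efficiency.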

Comments 12 pages. Accepted by TNNLS

详情

DOI: 10.1109/TNNLS.2013.2271507

AI中文摘要

典型多路（或CANDECOMP/PARAFAC，CP）分解（CPD）广泛应用于分析高阶张量。现有的CPD方法使用交替最小二乘（ALS）迭代，因此需要频繁地将张量展开到每个$N$个模式，这是大规模数据特别是当$N$较大时效率的主要瓶颈之一。为了解决这个问题，本文提出了一种新的CPD方法，该方法首先将原始的$N$阶（$N>3$）张量转换为3阶张量。然后通过分解这个模式约简后的张量，再经过Khatri-Rao积投影过程来实现完整的CPD。这种方法非常高效，因为避免了展开到每个$N$个模式，并且可以轻松地加入降维以进一步提高效率。我们证明，在温和条件下，任何$N$阶CPD都可以转化为3阶情况，而不会破坏本质唯一性，并且理论上给出与直接$N$路CPD方法相同的结果。仿真表明，与最先进的CPD方法相比，所提方法更高效，且更容易摆脱局部解。

英文摘要

Canonical Polyadic (or CANDECOMP/PARAFAC, CP) decompositions (CPD) are widely applied to analyze high-order tensors. Existing CPD methods use alternating least squares (ALS) iterations and hence need to unfold tensors to each of the $N$ modes frequently, which is one major efficiency bottleneck for large-scale data, especially when $N$ is large. To overcome this problem, in this paper we propose a new CPD method which first converts the original $N$th-order ($N>3$) tensor to a 3rd-order tensor. The full CPD is then realized by decomposing this mode-reduced tensor, followed by a Khatri-Rao product projection procedure. This approach is quite efficient because unfolding to each of the $N$ modes is avoided, and dimensionality reduction can also be easily incorporated to further improve the efficiency. We show that, under mild conditions, any $N$th-order CPD can be converted into a 3rd-order case without destroying the essential uniqueness, and that it theoretically gives the same results as direct $N$-way CPD methods. Simulations show that, compared with state-of-the-art CPD methods, the proposed method is more efficient and escapes from local solutions more easily.

1303.1849 2026-06-03 cs.LG cs.DS cs.NA math.NA 版本更新

Revisiting the Nystrom Method for Improved Large-Scale Machine Learning

重新审视 Nyström 方法以改进大规模机器学习

Alex Gittens, Michael W. Mahoney

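AI summary: This paper revisits randomized algorithms for low-rank approximation of symmetric positive semi-definite matrices, compares sampling and projection methods through empirical evaluation and theoretical analysis, and presents improved error bounds.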

Comments 60 pages, 15 color figures; updated proof of Frobenius norm bounds, added comparison to projection-based low-rank approximations, and an analysis of the power method applied to SPSD sketches

详情

AI中文摘要

我们重新考虑了数据分析和机器学习应用中出现的对称半正定（SPSD）矩阵（如拉普拉斯矩阵和核矩阵）低秩近似的随机算法。我们的主要结果包括对一系列多样化的SPSD矩阵上采样和投影方法的性能质量和运行时间的经验评估。我们的结果突出了采样与投影方法的互补性；它们描述了常见数据预处理步骤对这些算法性能的影响；并指出了基于杠杆得分的均匀采样与非均匀采样方法之间的重要差异。此外，我们的经验结果表明，现有理论非常薄弱，甚至无法为实践提供定性指导。因此，我们用一套随机采样和随机投影方法的最坏情况理论界限来补充我们的经验结果。这些界限在质量上优于现有界限——例如，谱范数和Frobenius范数误差的改进加性误差界，以及迹范数误差的相对误差界——并指出了使这些算法在更大规模机器学习应用中更有用的未来方向。

英文摘要

We reconsider randomized algorithms for the low-rank approximation of symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods; they characterize the effects of common data preprocessing steps on the performance of these algorithms; and they point to important differences between uniform sampling and nonuniform sampling methods based on leverage scores. In addition, our empirical results illustrate that existing theory is so weak that it does not provide even a qualitative guide to practice. Thus, we complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds---e.g. improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error---and they point to future directions to make these algorithms useful in even larger-scale machine learning applications.

1204.4202 2026-06-03 cs.AI cs.LG cs.NE cs.SY eess.SY 版本更新

Fuzzy Dynamical Genetic Programming in XCSF

XCSF中的模糊动态遗传编程

Richard J. Preen, Larry Bull

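AI summary: Studies a fuzzy dynamical genetic programming representation within the XCSF learning classifier system, using asynchronous fuzzy logic networks to achieve adaptive open-ended evolution and to solve continuous-valued test problems.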

Comments 2-page GECCO 2011 poster paper

1211.7045 2026-06-03 cs.LG cs.NA math.NA math.OC q-bio.BM 版本更新

Orientation Determination from Cryo-EM images Using Least Unsquared Deviation

使用最小未平方偏差从冷冻电镜图像确定方向

Lanhui Wang, Amit Singer, Zaiwen Wen

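AI summary: For two-dimensional projection images with unknown orientations in cryo-EM single particle reconstruction, proposes a robust global self-consistency error model based on least unsquared deviation, solved via semidefinite relaxation with a spectral norm constraint or regularizer, which significantly reduces orientation estimation error when the common-line detection rate is low.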

详情

AI中文摘要

冷冻电镜单颗粒重构的一个主要挑战是利用未知方向的二维投影图像建立可靠的三维初始模型。基于共线的方法无需额外几何信息即可估计方向。然而，当图像噪声水平过高导致共线检测率过低时，此类方法会失效。通过半定规划的凸松弛，得到了最小二乘全局自洽误差的近似。本文引入一种更鲁棒的全局自洽误差，并证明相应的优化问题可通过半定松弛求解。为了防止估计视角的人为聚类，我们进一步引入一个谱范数项，作为约束或正则化项添加到松弛的最小化问题中。所得问题通过交替方向乘子法或迭代重加权最小二乘过程求解。模拟和真实图像的数值实验表明，当共线检测率较低时，所提方法显著降低了方向估计误差。

英文摘要

A major challenge in single particle reconstruction from cryo-electron microscopy is to establish a reliable ab-initio three-dimensional model using two-dimensional projection images with unknown orientations. Common-lines based methods estimate the orientations without additional geometric information. However, such methods fail when the detection rate of common-lines is too low due to the high level of noise in the images. An approximation to the least squares global self-consistency error was obtained using convex relaxation by semidefinite programming. In this paper we introduce a more robust global self-consistency error and show that the corresponding optimization problem can be solved via semidefinite relaxation. In order to prevent artificial clustering of the estimated viewing directions, we further introduce a spectral norm term that is added as a constraint or as a regularization term to the relaxed minimization problem. The resulting problems are solved by using either the alternating direction method of multipliers or an iteratively reweighted least squares procedure. Numerical experiments with both simulated and real images demonstrate that the proposed methods significantly reduce the orientation estimation error when the detection rate of common-lines is low.

1204.1259 2026-06-03 cs.LG cs.IR cs.NA math.NA 版本更新

Fast ALS-based tensor factorization for context-aware recommendation from implicit feedback

基于快速ALS的张量分解用于隐式反馈的上下文感知推荐

Balázs Hidasi, Domonkos Tikk

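AI summary: Proposes the iTALS algorithm, an ALS-based tensor factorization method that scales linearly with the number of non-zero elements and incorporates context information (such as seasonality and sequential patterns), significantly improving recommendation quality on implicit feedback datasets.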

Comments Accepted for ECML/PKDD 2012, presented on 25th September 2012, Bristol, UK

详情

DOI: 10.1007/978-3-642-33486-3_5
Journal ref: Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II

AI中文摘要

尽管基于隐式反馈的推荐问题（仅用户历史可用而无评分）是实际应用中最典型的场景，但其研究远少于显式反馈情况。在显式情况下高效的最先进算法，若要保持可扩展性，无法直接转化为隐式情况。隐式反馈基准数据集很少，因此新想法通常基于显式基准进行实验。本文提出了一种通用的上下文感知隐式反馈推荐算法，称为iTALS。iTALS应用了一种快速的、基于ALS的张量分解学习方法，其规模与张量中非零元素数量呈线性关系。该方法还允许我们在保持计算效率的同时，将多样化的上下文信息融入模型。特别地，我们提出了iTALS的两种上下文感知实现变体。第一种融入季节性，能够区分不同时间间隔的用户行为。另一种将用户历史视为序列信息，能够识别特定物品组的典型使用模式，例如自动区分通常重复购买（收藏品、杂货）或一次性购买（家用电器）的产品类型或类别。在三个隐式数据集（两个专有数据集和Netflix数据集的隐式变体）上进行的实验表明，通过将上下文感知信息与我们的分解框架集成到最先进的隐式推荐算法中，推荐质量显著提高。

英文摘要

Although the implicit feedback based recommendation problem (when only the user history is available but there are no ratings) is the most typical setting in real-world applications, it is much less researched than the explicit feedback case. State-of-the-art algorithms that are efficient in the explicit case cannot be straightforwardly transferred to the implicit case if scalability is to be maintained. There are few if any implicit feedback benchmark datasets, so new ideas are usually tested on explicit benchmarks. In this paper, we propose a generic context-aware implicit feedback recommender algorithm, coined iTALS. iTALS applies a fast, ALS-based tensor factorization learning method that scales linearly with the number of non-zero elements in the tensor. The method also allows us to incorporate diverse context information into the model while maintaining its computational efficiency. In particular, we present two such context-aware implementation variants of iTALS. The first incorporates seasonality and makes it possible to distinguish user behavior in different time intervals. The other views the user history as sequential information and can recognize usage patterns typical of certain groups of items, e.g. to automatically tell apart product types or categories that are typically purchased repetitively (collectibles, grocery goods) or once (household appliances). Experiments performed on three implicit datasets (two proprietary ones and an implicit variant of the Netflix dataset) show that integrating context-aware information with our factorization framework into the state-of-the-art implicit recommender algorithm improves the recommendation quality significantly.

1303.6370 2026-06-03 stat.ML cs.LG cs.NA math.NA 版本更新

Convex Tensor Decomposition via Structured Schatten Norm Regularization

通过结构化Schatten范数正则化的凸张量分解

Ryota Tomioka, Taiji Suzuki

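AI summary: This paper studies structured Schatten norms for convex-optimization-based tensor decomposition, proves theoretically that the "latent" approach outperforms the "overlapped" approach in certain settings, and establishes duality, consistency, and identifiability results.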

Comments 12 pages, 3 figures

详情

AI中文摘要

我们讨论了用于张量分解的结构化Schatten范数，包括最近提出的两种用于基于凸优化的张量分解的范数（“重叠”和“潜在”），并将张量分解与更广泛的结构化稀疏性文献联系起来。基于结构化Schatten范数的性质，我们从数学上分析了“潜在”方法在张量分解中的性能，该方法在某些设置下经验上被发现比“重叠”方法表现更好。我们从理论上证明了这确实是事实。特别是，当未知的真实张量在特定模式下是低秩时，该方法的表现与知道最小秩的模式一样好。在此过程中，我们展示了结构化Schatten范数的一个新颖的对偶性结果，建立了一致性，并讨论了该方法的可识别性。通过数值模拟，我们确认了我们的理论预测可以精确预测均方误差的缩放行为。

英文摘要

We discuss structured Schatten norms for tensor decomposition that include two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition, and connect tensor decomposition with the wider literature on structured sparsity. Based on the properties of the structured Schatten norms, we mathematically analyze the performance of the "latent" approach for tensor decomposition, which was empirically found to perform better than the "overlapped" approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, establish the consistency, and discuss the identifiability of this approach. We confirm through numerical simulations that our theoretical prediction can precisely predict the scaling behavior of the mean squared error.

1303.4434 2026-06-03 cs.LG cs.NA math.NA stat.CO stat.ML 版本更新

A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems

非凸正则化优化问题的一般迭代收缩与阈值算法

Pinghua Gong, Changshui Zhang, Zhaosong Lu, Jianhua Huang, Jieping Ye

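AI summary: For optimization problems with non-convex sparsity-inducing penalties, proposes a General Iterative Shrinkage and Thresholding (GIST) algorithm that solves them efficiently using closed-form proximal operators and a line search initialized by the BB rule, and provides a convergence analysis.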

详情

AI中文摘要

非凸稀疏诱导惩罚近年来在稀疏学习中受到广泛关注。最近的理论研究表明，在若干稀疏学习场景中，非凸惩罚优于其凸对应物。然而，与非凸惩罚相关的非凸优化问题的求解仍然是一个重大挑战。一种常用方法是多阶段（MS）凸松弛（或DC规划），它将原始非凸问题松弛为一系列凸问题。这种方法通常不适用于大规模问题，因为其计算成本是求解单个凸问题的倍数。在本文中，我们提出了一种通用迭代收缩与阈值（GIST）算法，用于求解一大类非凸惩罚的非凸优化问题。GIST算法迭代求解一个近端算子问题，而该问题对于许多常用惩罚具有闭式解。在算法的每次外迭代中，我们使用由Barzilai-Borwein（BB）规则初始化的线搜索，以快速找到合适的步长。本文还给出了GIST算法的详细收敛性分析。通过在大规模数据集上的大量实验，证明了所提算法的效率。

英文摘要

Non-convex sparsity-inducing penalties have recently received considerable attention in sparse learning. Recent theoretical investigations have demonstrated their superiority over the convex counterparts in several sparse learning settings. However, solving the non-convex optimization problems associated with non-convex penalties remains a big challenge. A commonly used approach is the Multi-Stage (MS) convex relaxation (or DC programming), which relaxes the original non-convex problem to a sequence of convex problems. This approach is usually not very practical for large-scale problems because its computational cost is a multiple of solving a single convex problem. In this paper, we propose a General Iterative Shrinkage and Thresholding (GIST) algorithm to solve the non-convex optimization problem for a large class of non-convex penalties. The GIST algorithm iteratively solves a proximal operator problem, which in turn has a closed-form solution for many commonly used penalties. At each outer iteration of the algorithm, we use a line search initialized by the Barzilai-Borwein (BB) rule that allows finding an appropriate step size quickly. The paper also presents a detailed convergence analysis of the GIST algorithm. The efficiency of the proposed algorithm is demonstrated by extensive experiments on large-scale data sets.
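
The loop structure described in this abstract (a closed-form proximal step, a backtracking line search, and a Barzilai-Borwein initial step size) can be sketched as follows. This is only an illustration under stated assumptions: the loss is a separable quadratic 0.5 * sum a_i (x_i - b_i)^2, the non-convex penalty is lam * ||x||_0 (whose proximal operator is hard thresholding), and the acceptance rule and the constants `eta`, `sigma`, `t_min`, `t_max` are hypothetical choices rather than the paper's exact settings.

```python
def hard_threshold_prox(u, lam, t):
    """Coordinate-wise minimizer of (t/2)*(x - u_i)^2 + lam*[x != 0]."""
    return [ui if 0.5 * t * ui * ui > lam else 0.0 for ui in u]


def gist_l0(a, b, lam, iters=100, eta=2.0, sigma=1e-4, t_min=1e-8, t_max=1e8):
    """Proximal-gradient sketch for 0.5*sum a_i (x_i - b_i)^2 + lam*||x||_0."""

    def grad(x):
        return [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]

    def objective(x):
        loss = 0.5 * sum(ai * (xi - bi) ** 2 for ai, xi, bi in zip(a, x, b))
        return loss + lam * sum(1 for xi in x if xi != 0.0)

    x = [0.0] * len(b)
    g = grad(x)
    t = 1.0  # inverse step size
    for _ in range(iters):
        # Line search: increase t until the objective decreases sufficiently.
        while True:
            u = [xi - gi / t for xi, gi in zip(x, g)]
            x_new = hard_threshold_prox(u, lam, t)
            step_sq = sum((p - q) ** 2 for p, q in zip(x_new, x))
            if objective(x_new) <= objective(x) - 0.5 * sigma * t * step_sq or t >= t_max:
                break
            t *= eta
        if step_sq == 0.0:
            break  # fixed point reached
        g_new = grad(x_new)
        # Barzilai-Borwein rule: t = <dx, dg> / <dx, dx>, clipped to a safe range.
        dx_dg = sum((p - q) * (gn - go) for p, q, gn, go in zip(x_new, x, g_new, g))
        t = min(max(dx_dg / step_sq, t_min), t_max)
        x, g = x_new, g_new
    return x


# Small coordinates are zeroed out; large ones are kept unchanged.
solution = gist_l0(a=[1.0, 1.0, 1.0], b=[3.0, 0.1, -2.0], lam=0.5)
```

Other penalties with a closed-form proximal operator could be swapped in by replacing `hard_threshold_prox` and the penalty term in `objective`.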

1301.3527 2026-06-03 cs.LG cs.NA math.NA 版本更新

Block Coordinate Descent for Sparse NMF

块坐标下降法用于稀疏非负矩阵分解

Vamsi K. Potluru, Sergey M. Plis, Jonathan Le Roux, Barak A. Pearlmutter, Vince D. Calhoun, Thomas P. Hayes

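AI summary: For the sparse NMF problem, proposes a block coordinate descent algorithm based on the mixed L1/L2 norm that preserves sparsity while substantially reducing computation time, and is suitable for large-scale datasets.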

详情

AI中文摘要

非负矩阵分解（NMF）已成为数据分析中无处不在的工具。一个重要变体是稀疏NMF问题，当我们明确要求学习到的特征稀疏时就会出现。稀疏性的自然度量是L$_0$范数，但其优化是NP难的。混合范数（如L$_1$/L$_2$度量）已被证明能够基于这些度量需要满足的直观属性来稳健地建模稀疏性。这与计算上更便宜的替代方案（如普通L$_1$范数）形成对比。然而，当前为优化混合范数L$_1$/L$_2$而设计的算法速度较慢，并且已经提出了其他稀疏NMF的公式，例如基于L$_1$和L$_0$范数的公式。我们提出的算法允许我们在不牺牲计算时间的情况下解决混合范数稀疏约束。我们在真实世界数据集上的实验证据表明，与当前最先进的优化混合范数的求解器相比，我们的新算法速度快一个数量级，并且适用于大规模数据集。

英文摘要

Nonnegative matrix factorization (NMF) has become a ubiquitous tool for data analysis. An important variant is the sparse NMF problem which arises when we explicitly require the learnt features to be sparse. A natural measure of sparsity is the L$_0$ norm, however its optimization is NP-hard. Mixed norms, such as L$_1$/L$_2$ measure, have been shown to model sparsity robustly, based on intuitive attributes that such measures need to satisfy. This is in contrast to computationally cheaper alternatives such as the plain L$_1$ norm. However, present algorithms designed for optimizing the mixed norm L$_1$/L$_2$ are slow and other formulations for sparse NMF have been proposed such as those based on L$_1$ and L$_0$ norms. Our proposed algorithm allows us to solve the mixed norm sparsity constraints while not sacrificing computation time. We present experimental evidence on real-world datasets that shows our new algorithm performs an order of magnitude faster compared to the current state-of-the-art solvers optimizing the mixed norm and is suitable for large-scale datasets.

1301.3389 2026-06-03 math.NA cs.LG cs.NA 版本更新

The Diagonalized Newton Algorithm for Nonnegative Matrix Factorization

非负矩阵分解的对角化牛顿算法

Hugo Van hamme

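AI summary: For nonnegative matrix factorization, proposes a diagonalized Newton algorithm (DNA) that accelerates convergence while remaining simple to implement, and is suited to high-rank problems.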

Comments 8 pages + references; International Conference on Learning Representations, 2013

1106.6104 2026-06-03 math.OC cs.LG cs.SY eess.SY math.PR math.ST stat.TH 版本更新

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

多臂赌博机问题的探索与利用确定性排序

Sattar Vakili, Keqin Liu, Qing Zhao

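AI summary: Proposes policies based on Deterministic Sequencing of Exploration and Exploitation (DSEE) that achieve optimal logarithmic regret for light-tailed reward distributions and sublinear regret for heavy-tailed distributions, and extends the approach to several MAB variants.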

Comments 22 pages, 2 figures

详情

AI中文摘要

在多臂赌博机（MAB）问题中，存在一组具有未知奖励模型的臂。在每个时刻，玩家选择一个臂进行游戏，旨在最大化在T长度时间范围内的总期望奖励。本文开发了一种基于探索与利用确定性排序（DSEE）的方法来构建顺序臂选择策略。结果表明，对于所有轻尾奖励分布，DSEE实现了遗憾的最优对数阶，其中遗憾定义为相对于已知奖励模型的理想情况的总期望奖励损失。对于重尾奖励分布，当奖励分布的矩存在到p阶（1<p≤2）时，DSEE实现了O(T^{1/p})的遗憾，对于p>2时实现了O(T^{1/(1+p/2)})的遗憾。利用对重尾奖励分布有限矩的上界知识，DSEE提供了最优的对数遗憾阶。所提出的DSEE方法通过为一般奖励分布提供相应结果，补充了现有的MAB工作。此外，通过明确定义的可调参数——探索序列的基数，DSEE方法易于扩展到MAB的变体，包括具有不同目标的MAB、具有多个玩家和碰撞下不完全奖励观测的分散式MAB、具有未知马尔可夫动力学的MAB，以及具有依赖臂的组合MAB，这些常出现在网络优化问题中，如未知随机权重下的最短路径、最小生成树和支配集问题。

英文摘要

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length $T$. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves $O(T^{1/p})$ regret when the moments of the reward distributions exist up to the $p$th order for $1<p\leq 2$, and $O(T^{1/(1+p/2)})$ for $p>2$. With the knowledge of an upper bound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter (the cardinality of the exploration sequence), the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning tree, and the dominating set problems under unknown random weights.
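
To make the idea of a deterministic exploration sequence concrete, here is a minimal sketch for Bernoulli arms. It assumes an exploration budget of `n_arms * ceil(w * log(t + 1))` slots by time t, round-robin play during exploration, and sample means built from exploration observations only. The exact schedule, the constant `w`, and the function names are illustrative assumptions, not the paper's definitions.

```python
import math
import random


def dsee_schedule(horizon, n_arms, w):
    """Return a list of booleans: True where time t is an exploration slot."""
    schedule = []
    explored = 0
    for t in range(1, horizon + 1):
        budget = n_arms * math.ceil(w * math.log(t + 1))
        is_explore = explored < budget
        explored += int(is_explore)
        schedule.append(is_explore)
    return schedule


def run_dsee(means, horizon, w, rng):
    """Simulate the schedule on Bernoulli arms and return pull counts per arm."""
    n = len(means)
    sums = [0.0] * n    # reward totals from exploration slots only
    counts = [0] * n    # number of exploration samples per arm
    pulls = [0] * n
    explored = 0
    for is_explore in dsee_schedule(horizon, n, w):
        if is_explore:
            arm = explored % n  # round-robin over the arms
            explored += 1
        else:
            arm = max(range(n), key=lambda i: sums[i] / counts[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        if is_explore:
            sums[arm] += reward
            counts[arm] += 1
    return pulls


pulls = run_dsee(means=[0.9, 0.1], horizon=500, w=1.0, rng=random.Random(0))
```

Because the first exploitation slot can only occur after at least `n_arms` exploration slots (for `w > 0`), every arm has at least one sample before its mean is used. The number of exploration slots grows logarithmically with the horizon, which is the quantity the abstract calls the cardinality of the exploration sequence.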

1303.1264 2026-06-03 cs.LG cs.NA math.NA 版本更新

Discovery of factors in matrices with grades

带等级矩阵中的因子发现

Radim Belohlavek, Vilem Vychodil

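AI summary: Proposes a method for decomposition and factor analysis of matrices with ordinal data, based on the structure of complete residuated lattices; it uses a geometric insight to identify rectangular submatrices as optimal factors and designs a greedy approximation algorithm that decomposes the matrix with a small number of factors.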

详情

AI中文摘要

我们提出了一种处理有序数据矩阵的分解与因子分析的方法。矩阵中的条目是对象（由行表示）满足属性（由列表示）的等级，例如图像红色的程度、产品具有给定特征的程度或一个人在测试中表现良好的程度。我们假设这些等级构成一个有界尺度，配备特定的聚合算子，并符合完全剩余格的结构。我们提出了一种贪心近似算法，用于在因子数量较小的限制下，将此类矩阵分解为两个带等级矩阵的乘积。我们的算法基于一个定理提供的几何洞察，该定理将特定的矩形子矩阵识别为分解的最优因子。这些因子对应于输入数据的形式概念，并允许对分解进行简单解释。我们提供了说明性示例和实验评估。

英文摘要

We present an approach to decomposition and factor analysis of matrices with ordinal data. The matrix entries are grades to which objects represented by rows satisfy attributes represented by columns, e.g. grades to which an image is red, a product has a given feature, or a person performs well in a test. We assume that the grades form a bounded scale equipped with certain aggregation operators and conform to the structure of a complete residuated lattice. We present a greedy approximation algorithm for the problem of decomposing such a matrix into a product of two matrices with grades under the restriction that the number of factors be small. Our algorithm is based on a geometric insight provided by a theorem identifying particular rectangular-shaped submatrices as optimal factors for the decompositions. These factors correspond to formal concepts of the input data and allow an easy interpretation of the decomposition. We present illustrative examples and an experimental evaluation.

1302.7283 2026-06-03 cs.LG cs.NA math.NA 版本更新

Source Separation using Regularized NMF with MMSE Estimates under GMM Priors with Online Learning for The Uncertainties

基于GMM先验下MMSE估计的正则化NMF源分离及其不确定性在线学习

Emad M. Grais, Hakan Erdogan

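AI summary: Proposes a regularization for nonnegative matrix factorization that uses minimum mean squared error (MMSE) estimates under Gaussian mixture model priors for single-channel source separation, with online learning of the uncertainties to improve separation performance.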

详情

AI中文摘要

Variational Optimization

Joe Staines, David Barber

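AI summary: This paper proposes a general technique for optimizing non-differentiable or discrete objective functions by constructing a differentiable bound, and applies it to sparse learning and support vector classification.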

1212.2475 2026-06-03 cs.LG cs.SY eess.SY 版本更新

Efficient Gradient Estimation for Motor Control Learning

运动控制学习的高效梯度估计

Gregory Lawrence, Noah Cowan, Stuart Russell

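AI summary: For gradient estimation in the presence of input noise, proposes two methods that reduce estimation error, a reinforcement baseline based on a local linear model and variance-based discounting, and applies them to a dart-throwing task with a simulated three-link arm, significantly improving reward-function gradient estimates and learning curves.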

Comments Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

详情

AI中文摘要

在噪声存在的情况下估计函数梯度的任务是几种强化学习形式的核心，包括策略搜索方法。我们提出了两种技术，用于在可观测输入噪声应用于控制信号时减少梯度估计误差。第一种方法通过拟合一个局部线性模型到被估计梯度的函数，扩展了强化基线的思想；我们展示了如何找到最小化梯度估计方差的线性模型，以及如何从数据中估计该模型。第二种方法通过折扣具有高方差的梯度向量分量进一步改进了这一点。这些方法被应用于运动控制学习问题，其中执行器噪声对行为有显著影响。特别地，我们将这些技术应用于使用模拟三连杆机械臂的投镖任务中学习局部最优控制器；我们证明了所提出的方法显著改善了奖励函数梯度估计，并因此改善了学习曲线，优于现有方法。

英文摘要

The task of estimating the gradient of a function in the presence of noise is central to several forms of reinforcement learning, including policy search methods. We present two techniques for reducing gradient estimation errors in the presence of observable input noise applied to the control signal. The first method extends the idea of a reinforcement baseline by fitting a local linear model to the function whose gradient is being estimated; we show how to find the linear model that minimizes the variance of the gradient estimate, and how to estimate the model from data. The second method improves this further by discounting components of the gradient vector that have high variance. These methods are applied to the problem of motor control learning, where actuator noise has a significant influence on behavior. In particular, we apply the techniques to learn locally optimal controllers for a dart-throwing task using a simulated three-link arm; we demonstrate that the proposed methods significantly improve the reward function gradient estimate and, consequently, the learning curve, over existing methods.
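
The abstract's first method extends the reinforcement baseline. The sketch below shows only the plain constant-baseline estimator that it builds on, not the paper's local linear model or its variance discounting. It assumes a scalar control `u0`, observable Gaussian input noise with known standard deviation `sigma`, and a toy reward function; all names and values are hypothetical.

```python
import random


def estimate_gradient(f, u0, sigma, n_samples, rng, use_baseline):
    """Estimate df/du at u0 from noisy evaluations f(u0 + eps), eps ~ N(0, sigma^2).

    Returns (gradient estimate, sample variance of the per-sample terms).
    """
    eps = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    rewards = [f(u0 + e) for e in eps]
    # A constant baseline (here the mean reward) leaves the expectation essentially
    # unchanged, up to a small finite-sample effect, but can cut the variance sharply.
    baseline = sum(rewards) / n_samples if use_baseline else 0.0
    terms = [(r - baseline) * e / sigma ** 2 for r, e in zip(rewards, eps)]
    mean = sum(terms) / n_samples
    var = sum((term - mean) ** 2 for term in terms) / (n_samples - 1)
    return mean, var


def toy_reward(u):
    """Toy reward with true gradient 4 at u = 0."""
    return 10.0 - (u - 2.0) ** 2


grad_plain, var_plain = estimate_gradient(toy_reward, 0.0, 0.1, 5000, random.Random(1), False)
grad_base, var_base = estimate_gradient(toy_reward, 0.0, 0.1, 5000, random.Random(1), True)
```

With the same noise samples, the baseline version typically has a much smaller per-sample variance, which is the effect the abstract's methods aim to strengthen further.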

1212.2471 2026-06-03 cs.LG cs.AI cs.NA math.NA 版本更新

Monte Carlo Matrix Inversion Policy Evaluation

蒙特卡洛矩阵求逆策略评估

Fletcher Lu, Dale Schuurmans

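AI summary: Proposes using Monte Carlo matrix inversion (MCMI) for reinforcement learning policy evaluation, reduces variance with importance sampling, and reports better runtime than a maximum likelihood model-based approach and better runtime and accuracy than temporal differencing.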

Comments Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

详情

AI中文摘要

1950年，Forsythe和Leibler（1950）引入了一种统计技术，通过将矩阵逆的元素表征为一系列随机游走的期望值来求矩阵的逆。Barto和Duff（1994）随后展示了该技术与标准动态规划和时序差分方法之间的关系。蒙特卡洛矩阵求逆（MCMI）方法的优势在于，它相对于其他技术，在状态空间大小方面具有更好的可扩展性。在本文中，我们介绍了一种使用MCMI进行强化学习策略评估的算法。我们证明，MCMI在运行时间上优于基于最大似然模型的策略评估方法，并且在运行时间和准确性上都优于时序差分（TD）策略评估方法。我们进一步通过向算法添加重要性采样技术来降低估计器的方差，从而改进了MCMI策略评估。最后，我们展示了将MCMI扩展到大规模状态空间以进行策略改进的技术。

英文摘要

Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed relations between this technique and standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI improves on runtime over a maximum likelihood model-based policy evaluation approach and on both runtime and accuracy over the temporal differencing (TD) policy evaluation approach. We further improve on MCMI policy evaluation by adding an importance sampling technique to our algorithm to reduce the variance of our estimator. Lastly, we illustrate techniques for scaling up MCMI to large state spaces in order to perform policy improvement.
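
The random-walk characterization of a matrix inverse mentioned in this abstract can be illustrated with a small sketch. It assumes a known transition matrix `P` and discount `gamma`, and estimates one row of (I - gamma * P)^{-1} as the expected number of visits to each state by a walk that continues with probability `gamma` at every step. This is a generic textbook-style estimator under those assumptions, without the importance sampling or scaling techniques the paper adds; the function names are hypothetical.

```python
import random


def sample_next(row, rng):
    """Draw the next state from a row of transition probabilities."""
    u = rng.random()
    cumulative = 0.0
    for state, prob in enumerate(row):
        cumulative += prob
        if u < cumulative:
            return state
    return len(row) - 1  # guard against floating-point rounding


def mc_inverse_row(P, gamma, start, n_walks, rng):
    """Estimate row `start` of (I - gamma*P)^{-1} from terminating random walks."""
    visits = [0] * len(P)
    for _ in range(n_walks):
        state = start
        while True:
            visits[state] += 1
            if rng.random() >= gamma:  # the walk stops with probability 1 - gamma
                break
            state = sample_next(P[state], rng)
    return [v / n_walks for v in visits]


def mc_policy_value(P, rewards, gamma, start, n_walks, rng):
    """Discounted value of `start`: the estimated inverse row times the reward vector."""
    row = mc_inverse_row(P, gamma, start, n_walks, rng)
    return sum(w * r for w, r in zip(row, rewards))


P = [[0.5, 0.5], [0.2, 0.8]]
value = mc_policy_value(P, rewards=[1.0, 0.0], gamma=0.5, start=0, n_walks=20000, rng=random.Random(0))
```

For this two-state chain the exact first row of the inverse is [0.6/0.425, 0.25/0.425], so the estimate can be checked by hand. Only the rows for states of interest need to be sampled, which hints at why the approach can scale well with state-space size.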

1211.7369 2026-06-03 stat.ML cs.LG cs.NA math.NA 版本更新

Approximate Rank-Detecting Factorization of Low-Rank Tensors

低秩张量的近似秩检测分解

Franz J. Király, Andreas Ziehe

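AI summary: Proposes the AROFAC2 algorithm, which detects the CP rank of a third-order tensor and decomposes it into rank-one components; its advantages are intrinsic detection of the true rank, avoidance of spurious components, and robustness to outliers and non-Gaussian noise.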

1211.5414 2026-06-03 cs.DS cs.LG cs.NA math.NA stat.ML 版本更新

Analysis of a randomized approximation scheme for matrix multiplication

矩阵乘法的随机近似方案分析

Daniel Hsu, Sham M. Kakade, Tong Zhang

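AI summary: This paper analyzes the randomized approximation scheme for matrix multiplication proposed by Sarlos (2006), based on a random rotation followed by uniform column sampling, giving a simple analysis that uses a matrix Bernstein inequality and a tail inequality for quadratic forms of subgaussian random vectors.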

1103.1417 2026-06-03 math.ST cs.LG cs.SY eess.SY math.OC math.PR stat.TH 版本更新

Localization from Incomplete Noisy Distance Measurements

基于不完整含噪距离测量的定位

Adel Javanmard, Andrea Montanari

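AI summary: For localizing a point cloud in Euclidean space from noisy partial distance measurements, proposes an algorithm based on semidefinite programming and characterizes its performance bounds under a random geometric graph model.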

Comments 46 pages, 8 figures, numerical experiments added. Journal version (v1,v2: Conference versions, ISIT 2011); Journal of Foundations of Computational Mathematics, 2012

详情

DOI: 10.1007/s10208-012-9129-5

AI中文摘要

我们考虑在欧氏空间 $\mathbb{R}^d$ 中利用部分成对距离的含噪测量来定位点云的问题。该任务在传感器网络定位和从NMR测量重建蛋白质构象等领域有应用。此外，它与降维问题和流形学习密切相关，后者的目标是通过局部（或部分）度量信息学习数据集的潜在全局几何结构。本文提出一种基于半定规划的重建算法。对于随机几何图模型和一致有界噪声，我们精确刻画了算法的性能：在无噪声情况下，我们找到一个半径 $r_0$，超过该半径算法能重建精确位置（直至刚性变换）。在存在噪声的情况下，我们得到的重建误差上下界仅相差一个依赖于维度 $d$ 和图中节点平均度的因子。

英文摘要

We consider the problem of positioning a cloud of points in the Euclidean space $\mathbb{R}^d$, using noisy measurements of a subset of pairwise distances. This task has applications in various areas, such as sensor network localization and reconstruction of protein conformations from NMR measurements. Also, it is closely related to dimensionality reduction problems and manifold learning, where the goal is to learn the underlying global geometry of a data set using local (or partial) metric information. Here we propose a reconstruction algorithm based on semidefinite programming. For a random geometric graph model and uniformly bounded noise, we provide a precise characterization of the algorithm's performance: In the noiseless case, we find a radius $r_0$ beyond which the algorithm reconstructs the exact positions (up to rigid transformations). In the presence of noise, we obtain upper and lower bounds on the reconstruction error that match up to a factor that depends only on the dimension $d$, and the average degree of the nodes in the graph.

1011.4104 2026-06-03 cs.LG cs.NA math.NA math.SP 版本更新

Clustering and Latent Semantic Indexing Aspects of the Singular Value Decomposition

奇异值分解的聚类和潜在语义索引方面

Andri Mirzal

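AI summary: This paper explains how the singular value decomposition (SVD) can be used for clustering, shows that its clustering and latent semantic indexing (LSI) aspects stem from the same principle, and designs an LSI algorithm that mimics the clustering capability of the SVD without requiring the decomposition rank to be specified, with retrieval performance comparable to the SVD.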

Comments 38 pages, submitted to Pattern Recognition

详情

AI中文摘要

本文讨论了奇异值分解（SVD）的聚类和潜在语义索引（LSI）方面。本文的目的有两个。第一是解释奇异向量如何以及为何可用于聚类。第二是表明这两个看似无关的SVD方面实际上源于同一来源：在低秩近似矩阵的图表示中，相关顶点比在原始语义图中更倾向于聚集在一起。因此，SVD可以提高信息检索系统的检索性能，因为对近似矩阵的查询比原始矩阵的相同查询能检索到更多相关文档并过滤掉更多不相关文档。利用这一事实，我们将设计一种LSI算法，模拟SVD在聚类相关顶点方面的能力。收敛性分析表明该算法收敛，并对每个输入产生唯一解。使用LSI研究中一些标准数据集的实验结果表明，该算法的检索性能与SVD相当。此外，该算法更实用且更易使用，因为无需确定分解秩，而分解秩对驱动SVD的检索性能至关重要。

英文摘要

This paper discusses clustering and latent semantic indexing (LSI) aspects of the singular value decomposition (SVD). The purpose of this paper is twofold. The first is to explain how and why the singular vectors can be used in clustering. The second is to show that the two seemingly unrelated SVD aspects actually originate from the same source: related vertices tend to be more clustered in the graph representation of the lower-rank approximate matrix obtained with the SVD than in the original semantic graph. Accordingly, the SVD can improve the retrieval performance of an information retrieval system, since queries made to the approximate matrix can retrieve more relevant documents and filter out more irrelevant documents than the same queries made to the original matrix. By utilizing this fact, we devise an LSI algorithm that mimics the SVD's capability in clustering related vertices. Convergence analysis shows that the algorithm is convergent and produces a unique solution for each input. Experimental results using some standard datasets in LSI research show that the retrieval performance of the algorithm is comparable to the SVD's. In addition, the algorithm is more practical and easier to use because there is no need to determine the decomposition rank, which is crucial in driving the retrieval performance of the SVD.

1211.3444 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

Iterative Hard Thresholding Methods for $l_0$ Regularized Convex Cone Programming

Zhaosong Lu

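AI summary: Proposes an iterative hard thresholding method and a variant for solving $l_0$ regularized convex cone programming, proves convergence to a local minimizer, and establishes the iteration complexity.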

Comments 25 pages

1202.5298 2026-06-03 eess.SY cs.LG cs.SY 版本更新

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

两阶段确定性批量模式强化学习的最小最大泛化：松弛方案

Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux

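AI summary: For the min max optimization problem in deterministic batch mode reinforcement learning, proposes two relaxation schemes (constraint dropping and Lagrangian dualization) to reduce computational complexity, and shows that they outperform existing methods.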

1210.5034 2026-06-03 cs.LG cs.CV cs.NA math.NA 版本更新

Optimal Computational Trade-Off of Inexact Proximal Methods

非精确近端方法的最优计算权衡

Pierre Machart, Sandrine Anthoine, Luca Baldassarre

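AI summary: This paper studies the trade-off between computational cost and convergence rate for proximal-gradient methods and proposes the Speedy Inexact Proximal-gradient (SIP) algorithm, which is computationally efficient and easy to implement.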

详情

AI中文摘要

在本文中，我们研究了在使用近端梯度方法（机器学习中流行的优化工具）最小化复合泛函时，收敛速度与计算代价之间的权衡。我们考虑近端算子通过迭代过程计算的情况，该过程提供了精确近端算子的近似。在这种情况下，我们得到具有两个嵌套循环的算法。我们表明，在有限时间内达到所需精度的解时，最小化计算代价的策略是将内迭代次数设置为常数，这与收敛速度分析所指示的策略不同。在此过程中，我们还提出了一种称为SIP（快速非精确近端梯度算法）的新程序，该程序既计算高效又易于实现。我们的数值实验证实了理论发现，并表明SIP可以成为标准程序的非常有竞争力的替代方案。

英文摘要

In this paper, we investigate the trade-off between convergence rate and computational cost when minimizing a composite functional with proximal-gradient methods, which are popular optimisation tools in machine learning. We consider the case when the proximity operator is computed via an iterative procedure, which provides an approximation of the exact proximity operator. In that case, we obtain algorithms with two nested loops. We show that the strategy that minimizes the computational cost to reach a solution with a desired accuracy in finite time is to set the number of inner iterations to a constant, which differs from the strategy indicated by a convergence rate analysis. In the process, we also present a new procedure called SIP (that is Speedy Inexact Proximal-gradient algorithm) that is both computationally efficient and easy to implement. Our numerical experiments confirm the theoretical findings and suggest that SIP can be a very competitive alternative to the standard procedure.

1210.4883 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

A Model-Based Approach to Rounding in Spectral Clustering

基于模型的谱聚类舍入方法

Leonard K. M. Poon, April H. Liu, Tengfei Liu, Nevin Lianwen Zhang

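AI summary: Proposes a rounding method for spectral clustering based on latent tree models that simultaneously addresses three subproblems: selecting the eigenvectors, determining the number of clusters, and partitioning the data.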

Comments Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

详情

AI中文摘要

在谱聚类中，首先为数据点集合定义相似矩阵，然后变换矩阵得到拉普拉斯矩阵，接着计算拉普拉斯矩阵的特征向量，最后利用前导特征向量获得数据的划分。最后一步有时称为舍入，需要决定使用多少个前导特征向量、确定聚类数以及划分数据点。本文提出了一种新的舍入方法。该方法在三个方面与以往方法不同。首先，我们放宽了聚类数等于所用特征向量数的假设。其次，在决定使用多少个前导特征向量时，我们不仅依赖前导特征向量本身包含的信息，还使用后续特征向量。第三，我们的方法是基于模型的，并使用一类称为潜树模型的图模型来解决舍入的三个子问题。我们在合成数据和真实数据上评估了该方法。结果表明，在理想情况下（即类间相似度为0），我们的方法能够正确工作，并且随着偏离理想情况，性能会优雅地下降。

英文摘要

In spectral clustering, one defines a similarity matrix for a collection of data points, transforms the matrix to get the Laplacian matrix, finds the eigenvectors of the Laplacian matrix, and obtains a partition of the data using the leading eigenvectors. The last step is sometimes referred to as rounding, where one needs to decide how many leading eigenvectors to use, to determine the number of clusters, and to partition the data points. In this paper, we propose a novel method for rounding. The method differs from previous methods in three ways. First, we relax the assumption that the number of clusters equals the number of eigenvectors used. Second, when deciding the number of leading eigenvectors to use, we not only rely on information contained in the leading eigenvectors themselves, but also use subsequent eigenvectors. Third, our method is model-based and solves all the three subproblems of rounding using a class of graphical models called latent tree models. We evaluate our method on both synthetic and real-world data. The results show that our method works correctly in the ideal case where between-clusters similarity is 0, and degrades gracefully as one moves away from the ideal case.

1210.4081 2026-06-03 math.NA cs.CV cs.DS cs.LG cs.NA math.OC 版本更新

Getting Feasible Variable Estimates From Infeasible Ones: MRF Local Polytope Study

从不可行变量估计获得可行变量估计：MRF局部多面体研究

Bogdan Savchynskyy, Stefan Schmidt

AI总结针对具有可分离性的大规模优化问题，提出一种从对偶解构造近似可行原始解的方法，并应用于马尔可夫随机场推理问题的局部多面体松弛，证明其优于现有方法。

Comments 20 pages, 4 figures

1202.3772 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering

具有闭式解的秩/范数正则化：应用于子空间聚类

Yao-Liang Yu, Dale Schuurmans

AI总结本文通过推广Eckart-Young-Mirsky定理到所有酉不变范数，得到秩/范数正则化问题的闭式解，并应用于子空间聚类，获得新理论见解和实验效果。

Comments 11 pages, 1 figure, appeared in UAI 2011. One footnote corrected and appendix added

1107.3090 2026-06-03 cs.CC cs.LG cs.SY eess.SY math.OC 版本更新

On the Computational Complexity of Stochastic Controller Optimization in POMDPs

关于POMDP中随机控制器优化的计算复杂度

Nikos Vlassis, Michael L. Littman, David Barber

AI总结本文证明在马尔可夫决策过程中寻找最优随机“盲”控制器是NP难问题，相应的决策问题属于PSPACE且是SQRT-SUM难的，并指出POMDP中更一般的随机控制器优化问题也是NP难的，但存在一个凸的特殊情况可高效求解。

Comments Corrected error in the proof of Theorem 2, and revised Section 5

1206.4481 2026-06-03 math.NA cs.LG cs.NA 版本更新

Parsimonious Mahalanobis Kernel for the Classification of High Dimensional Data

用于高维数据分类的简约马氏核

M. Fauvel, A. Villa, J. Chanussot, J. A. Benediktsson

AI总结利用高维空间的空性，基于马氏距离提出一种简约核，通过高维判别分析模型估计信号和噪声子空间，实现稳定逆协方差矩阵，并在SVM框架下优化半径-间隔界，实验表明该核优于高斯核。

AI中文摘要

本文考虑使用核方法对高维数据进行分类。利用高维空间的空性，提出了一种基于马氏距离的核。计算马氏距离需要协方差矩阵的逆。在高维空间中，估计的协方差矩阵是病态的，其逆不稳定或不可能。使用简约统计模型，即高维判别分析模型，为每个考虑的类别估计特定的信号和噪声子空间，使得类别特定协方差矩阵的逆显式且稳定，从而定义了简约马氏核。采用基于SVM的框架，通过优化所谓的半径-间隔界来选择简约马氏核的超参数。在三个高维数据集上的实验结果表明，所提出的核适用于高维数据分类，比传统高斯核提供更好的分类精度。

英文摘要

The classification of high dimensional data with kernel methods is considered in this article. Exploiting the emptiness property of high dimensional spaces, a kernel based on the Mahalanobis distance is proposed. The computation of the Mahalanobis distance requires the inversion of a covariance matrix. In high dimensional spaces, the estimated covariance matrix is ill-conditioned and its inversion is unstable or impossible. Using a parsimonious statistical model, namely the High Dimensional Discriminant Analysis model, the specific signal and noise subspaces are estimated for each considered class, making the inverse of the class-specific covariance matrix explicit and stable and leading to the definition of a parsimonious Mahalanobis kernel. An SVM-based framework is used for selecting the hyperparameters of the parsimonious Mahalanobis kernel by optimizing the so-called radius-margin bound. Experimental results on three high dimensional data sets show that the proposed kernel is suitable for classifying high dimensional data, providing better classification accuracies than the conventional Gaussian kernel.
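
As a rough illustration of a kernel built on the Mahalanobis distance, the sketch below evaluates k(x, y) = exp(-gamma * (x - y)^T P (x - y)) for a caller-supplied inverse covariance (precision) matrix P. It does not reproduce the paper's parsimonious, HDDA-based inverse; the function names, the exponential form, and the `gamma` parameter are assumptions made for this example.

```python
import math


def mahalanobis_sq(x, y, precision):
    """Squared Mahalanobis distance (x - y)^T P (x - y) for a precision matrix P."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(n) for j in range(n))


def mahalanobis_kernel(x, y, precision, gamma=0.5):
    """Exponential kernel on the squared Mahalanobis distance (illustrative form)."""
    return math.exp(-gamma * mahalanobis_sq(x, y, precision))


# Hypothetical inputs: with the identity as precision matrix the kernel
# reduces to an ordinary Gaussian kernel on the Euclidean distance.
x = [1.0, 2.0]
y = [2.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 0.5]]

k_gaussian = mahalanobis_kernel(x, y, identity)  # exp(-0.5 * 5)
k_scaled = mahalanobis_kernel(x, y, scaled)      # exp(-0.5 * 4)
```

The abstract's point is that a naive estimate of `precision` is unstable in high dimensions, so the interesting part of the method is how that matrix is obtained, which this sketch leaves to the caller.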

1209.0001 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

An Improved Bound for the Nystrom Method for Large Eigengap

大特征间隙下Nyström方法的改进界

Mehrdad Mahdavi, Tianbao Yang, Rong Jin

AI总结针对核矩阵谱中存在大特征间隙的情况，基于积分算子集中不等式和矩阵扰动理论，将Nyström方法的Frobenius范数近似误差从O(N/m^{1/4})改进到O(N/m^{1/2})。

1208.0864 2026-06-03 math.OC cs.LG cs.SY eess.SY 版本更新

Statistical Results on Filtering and Epi-convergence for Learning-Based Model Predictive Control

基于学习的模型预测控制的滤波与上收敛统计结果

Anil Aswani, Humberto Gonzalez, S. Shankar Sastry, Claire Tomlin

AI总结本文证明了基于学习的模型预测控制中测量模型选择的合理性，并给出了随机收敛性证明，同时证明了用于LBMPC的非参数估计器的统计性质。

AI中文摘要

基于学习的模型预测控制（LBMPC）是一种提供鲁棒性确定性保证的技术，同时使用统计识别工具来识别更丰富的系统模型以提高性能。本技术说明提供了证明，阐明我们选择测量模型的原因，并给出了关于LBMPC随机收敛性的证明。第一部分讨论了可用常微分方程（ODE）描述的动力系统的同时状态估计和未建模动力学的统计识别（或学习）。第二部分提供了关于可与基于学习的模型预测控制（LBMPC）技术一起使用的不同统计估计器的上收敛的证明。特别地，我们证明了一种非参数估计器的统计性质，该估计器设计用于与LBMPC结合使用时具有正确的确定性和随机性数值实现性质。

英文摘要

Learning-based model predictive control (LBMPC) is a technique that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance. This technical note provides proofs that elucidate the reasons for our choice of measurement model, as well as giving proofs concerning the stochastic convergence of LBMPC. The first part of this note discusses simultaneous state estimation and statistical identification (or learning) of unmodeled dynamics, for dynamical systems that can be described by ordinary differential equations (ODEs). The second part provides proofs concerning the epi-convergence of different statistical estimators that can be used with the learning-based model predictive control (LBMPC) technique. In particular, we prove results on the statistical properties of a nonparametric estimator that we have designed to have the correct deterministic and stochastic properties for numerical implementation when used in conjunction with LBMPC.

1107.2487 2026-06-03 math.OC cs.LG cs.SY eess.SY math.ST stat.TH 版本更新

Provably Safe and Robust Learning-Based Model Predictive Control

可证明安全且鲁棒的基于学习的模型预测控制

Anil Aswani, Humberto Gonzalez, S. Shankar Sastry, Claire Tomlin

AI总结提出一种基于学习的模型预测控制（LBMPC）方案，通过解耦安全与性能，利用统计学习改进性能并保证鲁棒性。

AI中文摘要

控制器设计面临鲁棒性与性能之间的权衡，线性控制器的可靠性使得许多从业者关注前者。然而，为了应对日益增长的能源约束，提高系统性能重新引起兴趣。本文描述了一种基于学习的模型预测控制（LBMPC）方案，该方案提供鲁棒性的确定性保证，同时使用统计识别工具来识别更丰富的系统模型以提高性能；该框架的优点在于它处理状态和输入约束，根据成本函数优化系统性能，并且可以设计使用各种参数或非参数统计工具。LBMPC的主要见解是，在优化框架中，通过维护两个系统模型，可以在合理条件下解耦安全性和性能。第一个是具有不确定性界限的近似模型，第二个模型通过统计方法更新。LBMPC通过选择最小化成本的输入（受学习动力学约束）来提高性能，并通过检查这些相同的输入是否在不确定性下保持近似模型稳定来确保安全性和鲁棒性。此外，我们证明如果系统充分激励，则LBMPC控制动作概率收敛到使用真实动力学计算的MPC的控制动作。

英文摘要

Controller design faces a trade-off between robustness and performance, and the reliability of linear controllers has caused many practitioners to focus on the former. However, there is renewed interest in improving system performance to deal with growing energy constraints. This paper describes a learning-based model predictive control (LBMPC) scheme that provides deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system in order to improve performance; the benefits of this framework are that it handles state and input constraints, optimizes system performance with respect to a cost function, and can be designed to use a wide variety of parametric or nonparametric statistical tools. The main insight of LBMPC is that safety and performance can be decoupled under reasonable conditions in an optimization framework by maintaining two models of the system. The first is an approximate model with bounds on its uncertainty, and the second model is updated by statistical methods. LBMPC improves performance by choosing inputs that minimize a cost subject to the learned dynamics, and it ensures safety and robustness by checking whether these same inputs keep the approximate model stable when it is subject to uncertainty. Furthermore, we show that if the system is sufficiently excited, then the LBMPC control action probabilistically converges to that of an MPC computed using the true dynamics.

1207.3438 2026-06-03 stat.ML cs.LG cs.NA math.NA 版本更新

MahNMF: Manhattan Non-negative Matrix Factorization

MahNMF: 曼哈顿非负矩阵分解

Naiyang Guan, Dacheng Tao, Zhigang Luo, John Shawe-Taylor

AI总结针对重尾噪声和异常值问题，提出基于曼哈顿距离的MahNMF模型，并开发了秩一残差迭代和Nesterov平滑两种快速优化算法。

Comments 43 pages, 20 figures, 2 tables, submission to Journal of Machine Learning Research

AI中文摘要

非负矩阵分解（NMF）通过两个非负低秩因子矩阵 $W$ 和 $H$ 的乘积来逼近非负矩阵 $X$。NMF 及其扩展通过最小化 $X$ 与 $W^T H$ 之间的 Kullback-Leibler 散度或欧氏距离来建模泊松噪声或高斯噪声。然而，当噪声分布具有重尾特性时，这些方法表现不佳。本文提出曼哈顿 NMF（MahNMF），通过最小化 $X$ 与 $W^T H$ 之间的曼哈顿距离来建模重尾拉普拉斯噪声。与稀疏和低秩矩阵分解类似，MahNMF 能够鲁棒地估计非负矩阵的低秩部分和稀疏部分，从而在数据受到异常值污染时有效工作。我们通过开发带盒约束的 MahNMF、流形正则化 MahNMF、组稀疏 MahNMF、弹性网诱导 MahNMF 和对称 MahNMF，将 MahNMF 扩展到各种实际应用。本文的主要贡献在于为 MahNMF 及其扩展提出了两种快速优化算法：秩一残差迭代（RRI）方法和 Nesterov 平滑方法。具体地，通过将 MahNMF 中的残差矩阵近似为 $W$ 的一行和 $H$ 的一行的外积，我们开发了 RRI 方法，以闭式解迭代更新 $W$ 和 $H$ 的每个变量。尽管 RRI 对于小规模 MahNMF 及其某些扩展是高效的，但它既不能扩展到大规模矩阵，也不够灵活以优化所有 MahNMF 扩展。由于 MahNMF 及其扩展的目标函数既非凸也不光滑，我们应用 Nesterov 平滑方法，在固定一个因子矩阵的情况下递归优化另一个因子矩阵。通过将平滑参数设置为与迭代次数成反比，我们逐步提高了 MahNMF 及其扩展的逼近精度。

英文摘要

Non-negative matrix factorization (NMF) approximates a non-negative matrix $X$ by a product of two non-negative low-rank factor matrices $W$ and $H$. NMF and its extensions minimize either the Kullback-Leibler divergence or the Euclidean distance between $X$ and $W^T H$ to model the Poisson noise or the Gaussian noise. In practice, they do not perform well when the noise distribution is heavy-tailed. This paper presents Manhattan NMF (MahNMF), which minimizes the Manhattan distance between $X$ and $W^T H$ for modeling the heavy-tailed Laplacian noise. Similar to sparse and low-rank matrix decompositions, MahNMF robustly estimates the low-rank part and the sparse part of a non-negative matrix and thus performs effectively when data are contaminated by outliers. We extend MahNMF for various practical applications by developing box-constrained MahNMF, manifold regularized MahNMF, group sparse MahNMF, elastic net inducing MahNMF, and symmetric MahNMF. The major contribution of this paper lies in two fast optimization algorithms for MahNMF and its extensions: the rank-one residual iteration (RRI) method and Nesterov's smoothing method. In particular, by approximating the residual matrix by the outer product of one row of $W$ and one row of $H$ in MahNMF, we develop an RRI method to iteratively update each variable of $W$ and $H$ in a closed form solution. Although RRI is efficient for small scale MahNMF and some of its extensions, it is neither scalable to large scale matrices nor flexible enough to optimize all MahNMF extensions. Since the objective functions of MahNMF and its extensions are neither convex nor smooth, we apply Nesterov's smoothing method to recursively optimize one factor matrix with another matrix fixed. By setting the smoothing parameter inversely proportional to the iteration number, we improve the approximation accuracy iteratively for both MahNMF and its extensions.
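
The difference between the Manhattan and Euclidean objectives can be seen on a tiny example. The sketch below only evaluates the two losses between $X$ and $W^T H$ for fixed factors; it does not implement RRI or Nesterov smoothing. The function names and the toy matrices are assumptions made for illustration, with $W$ stored as an r-by-m array and $H$ as an r-by-n array so that $W^T H$ has the shape of $X$.

```python
def reconstruct(W, H):
    """Return W^T H for W of shape (r, m) and H of shape (r, n)."""
    r, m, n = len(W), len(W[0]), len(H[0])
    return [[sum(W[k][i] * H[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]


def manhattan_loss(X, W, H):
    """Sum of absolute entrywise differences between X and W^T H."""
    approx = reconstruct(W, H)
    return sum(abs(x - a) for row_x, row_a in zip(X, approx)
               for x, a in zip(row_x, row_a))


def squared_euclidean_loss(X, W, H):
    """Sum of squared entrywise differences between X and W^T H."""
    approx = reconstruct(W, H)
    return sum((x - a) ** 2 for row_x, row_a in zip(X, approx)
               for x, a in zip(row_x, row_a))


# Hypothetical rank-one factors; W^T H = [[1, 2, 3], [2, 4, 6]].
W = [[1.0, 2.0]]
H = [[1.0, 2.0, 3.0]]

# X equals W^T H except for a single outlier of size 10 in entry (0, 2).
X = [[1.0, 2.0, 13.0],
     [2.0, 4.0, 6.0]]

l1 = manhattan_loss(X, W, H)          # the outlier contributes |10|
l2 = squared_euclidean_loss(X, W, H)  # the outlier contributes 10 ** 2
```

Because the outlier enters the Manhattan loss linearly rather than quadratically, a minimizer of that loss has less incentive to distort the low-rank factors to accommodate it.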

1203.1007 2026-06-03 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Agnostic System Identification for Model-Based Reinforcement Learning

基于模型的强化学习的不可知系统辨识

Stephane Ross, J. Andrew Bagnell

AI总结针对模型类可能不包含真实系统的不可知情况，提出一种利用无遗憾在线学习算法获得近优策略的迭代方法，并在离散和连续域上验证其有效性。

Comments 8 pages, published in ICML 2012

1206.6857 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

Faster Gaussian Summation: Theory and Experiment

更快的高斯求和：理论与实验

Dongryeol Lee, Alexander G. Gray

AI总结本文针对机器学习中常见的高斯求和问题，提出两种新扩展（带严格误差界的O(Dp)泰勒展开和集成任意近似方法的新误差控制方案），并在自适应分层数据结构框架下实现更快的算法，通过核密度估计中的最优带宽选择实验首次揭示了当前最先进方法的优缺点。

Comments Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

1206.6833 2026-06-03 cs.LG cs.CE cs.NA math.NA stat.ML 版本更新

Matrix Tile Analysis

矩阵瓦片分析

Inmar Givoni, Vincent Cheung, Brendan J. Frey

AI总结提出矩阵瓦片分析（MTA）问题，通过非重叠瓦片分解矩阵，并设计近似迭代算法和和积松弛方法，在合成数据和酵母基因敲除数据上验证其有效性。

Comments Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

AI中文摘要

许多任务需要在数字、符号或类别似然矩阵中寻找元素组。一种方法是使用高效的双线性或三线性分解技术，包括PCA、ICA、稀疏矩阵分解和格子分析。当矩阵元素的加法和乘法没有明确定义时，这些技术不适用。更直接地，像双聚类这样的方法可用于对矩阵元素进行分类，但这些方法做出了过于严格的假设，即每个元素的类别是行类别和列类别的函数。我们引入一个通用的计算问题——矩阵瓦片分析（MTA），它将矩阵分解为一组非重叠的瓦片，每个瓦片由通常不相邻的行和列的子集定义。MTA不需要用于组合瓦片的代数，但必须搜索瓦片分配的离散组合。精确MTA是一个计算上难以处理的整数规划问题，但我们描述了一种近似迭代技术和一种计算高效的整数规划和积松弛。我们在数百个随机生成的任务上比较了这些方法与PCA和格子分析的有效性。利用双基因敲除数据，我们展示了MTA找到了具有生物学相关功能的相互作用酵母基因群。

英文摘要

Many tasks require finding groups of elements in a matrix of numbers, symbols or class likelihoods. One approach is to use efficient bi- or tri-linear factorization techniques including PCA, ICA, sparse matrix factorization and plaid analysis. These techniques are not appropriate when addition and multiplication of matrix elements are not sensibly defined. More directly, methods like bi-clustering can be used to classify matrix elements, but these methods make the overly-restrictive assumption that the class of each element is a function of a row class and a column class. We introduce a general computational problem, 'matrix tile analysis' (MTA), which consists of decomposing a matrix into a set of non-overlapping tiles, each of which is defined by a subset of usually nonadjacent rows and columns. MTA does not require an algebra for combining tiles, but must search over discrete combinations of tile assignments. Exact MTA is a computationally intractable integer programming problem, but we describe an approximate iterative technique and a computationally efficient sum-product relaxation of the integer program. We compare the effectiveness of these methods to PCA and plaid on hundreds of randomly generated tasks. Using double-gene-knockout data, we show that MTA finds groups of interacting yeast genes that have biologically-related functions.

1206.6474 2026-06-03 cs.DS cs.LG cs.NA math.NA stat.ML 版本更新

Estimation of Simultaneously Sparse and Low Rank Matrices

同时稀疏和低秩矩阵的估计

Emile Richard, Pierre-Andre Savalle, Nicolas Vayatis

AI总结本文提出一种凸混合惩罚方法，同时使用ℓ1范数和迹范数，以估计同时稀疏和低秩的矩阵，并推导了预言不等式和链接预测的泛化误差界，通过近端下降算法高效求解。

Comments Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

1206.6470 2026-06-03 cs.LG cs.DM cs.NA math.NA stat.ML 版本更新

A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion

低秩矩阵完备可辨识性的组合代数方法

Franz Kiraly, Ryota Tomioka

AI总结本文通过组合代数方法，首次给出了任意秩矩阵从一组矩阵条目中可辨识的充要组合条件，并提出了新算法。

Comments Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

1206.6141 2026-06-03 cs.LG cs.SY eess.SY stat.ML 版本更新

Directed Time Series Regression for Control

面向控制的定向时间序列回归

Yi-Hao Kao, Benjamin Van Roy

AI总结提出定向时间序列回归方法，结合最小二乘回归与经验优化的优点，用于确定性等价模型预测控制中的时间序列模型参数估计，在随机倒立摆平衡问题中显著提升控制器性能。

1206.4676 2026-06-03 cs.LG cs.CV cs.NA math.NA stat.ML 版本更新

Clustering by Low-Rank Doubly Stochastic Matrix Decomposition

基于低秩双随机矩阵分解的聚类

Zhirong Yang, Erkki Oja

AI总结提出一种超越矩阵分解的低秩学习方法，通过两步二分随机游走逼近聚类分配概率，利用KL散度最小化实现判别模型的最大似然估计，并采用松弛的MM算法优化，显著提升大规模流形数据的聚类纯度。

Comments ICML2012

AI中文摘要

在过去十年中，通过非负低秩近似进行聚类分析取得了显著进展。然而，该方向上的大多数近似方法仍局限于矩阵分解。我们提出了一种新的低秩学习方法以提高聚类性能，该方法超越了矩阵分解。该近似基于通过虚拟聚类节点的两步二分随机游走，其中近似仅由聚类分配概率构成。通过Kullback-Leibler散度测量的近似误差最小化等价于判别模型的最大似然估计，这为我们的方法提供了坚实的概率解释。优化通过一种松弛的Majorization-Minimization算法实现，该算法在寻找良好局部最小值方面具有优势。此外，我们指出带有Dirichlet先验的正则化算法仅作为初始化。实验结果表明，新方法在各种数据集上，特别是大规模流形数据上，具有强大的聚类纯度性能。

具有线性函数逼近和优先级扫描的Dyna风格规划

Richard S. Sutton, Csaba Szepesvari, Alborz Geramifard, Michael P. Bowling

AI总结本文提出一种基于模型的Dyna风格规划方法，扩展至线性函数逼近，证明其收敛性，并引入线性Dyna的优先级扫描算法。

Comments Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008)

AI中文摘要

我们考虑在在线设置中高效学习最优控制策略和价值函数的问题，其中状态空间很大，且必须在每次与世界交互后获得估计。本文开发了一种显式的基于模型的方法，将Dyna架构扩展到线性函数逼近。Dyna风格规划通过从世界模型生成想象经验，然后将无模型强化学习算法应用于想象的状态转移来进行。我们的主要结果是证明，在自然条件下，线性Dyna风格规划收敛到一个独立于生成分布的唯一解。在策略评估设置中，我们证明极限点是最小二乘（LSTD）解。我们的结果的一个含义是，优先级扫描可以合理地扩展到线性逼近情况，即回溯到前驱特征而不是前驱状态。我们介绍了两种线性Dyna的优先级扫描版本，并在Mountain Car和Boyan Chain问题上简要展示了它们的经验性能。

英文摘要

We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main result is a proof that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems.
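
The planning step described in the abstract can be sketched as a temporal-difference backup on an imagined transition produced by a linear model. This is a minimal sketch under assumptions, not the paper's algorithm: the names `F` (a matrix predicting expected next features), `b` (a vector predicting expected reward), `theta` (value weights), and the toy numbers are all hypothetical, and prioritized sweeping is not shown.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def planning_update(theta, phi, F, b, gamma, alpha):
    """One planning backup from an imagined feature vector phi.

    The linear model supplies the imagined reward b . phi and the imagined
    next feature vector F phi; a TD(0)-style update is then applied to theta.
    """
    next_phi = [dot(row, phi) for row in F]
    reward = dot(b, phi)
    td_error = reward + gamma * dot(theta, next_phi) - dot(theta, phi)
    return [t + alpha * td_error * p for t, p in zip(theta, phi)]


# Hypothetical one-feature model: expected next feature is 0.5 * phi,
# expected reward is 1.0 * phi.
F = [[0.5]]
b = [1.0]
gamma = 0.9
alpha = 0.5

theta = [0.0]
for _ in range(200):
    theta = planning_update(theta, [1.0], F, b, gamma, alpha)

# Fixed point of theta = b + gamma * F * theta in this scalar case.
fixed_point = 1.0 / (1.0 - gamma * 0.5)
```

In this scalar toy problem the iteration contracts toward `fixed_point`; the paper's contribution is establishing convergence in the general linear setting under its stated conditions.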

1008.5373 2026-06-03 math.OC cs.LG cs.NA cs.SY eess.SY math.NA q-fin.CP q-fin.ST 版本更新

Penalty Decomposition Methods for Rank Minimization

秩最小化的罚分解方法

Zhaosong Lu, Yong Zhang

AI总结本文提出罚分解方法求解目标或约束中含秩的秩最小化问题，通过块坐标下降法求解子问题，并证明序列聚点满足一阶最优性条件，在矩阵补全和最近低秩相关矩阵问题上表现优于或持平现有方法。

Comments This paper has been withdrawn by the author

1205.2643 2026-06-03 cs.LG cs.SY eess.SY math.OC stat.CO stat.ML 版本更新

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

使用可逆跳跃MCMC求解马尔可夫决策过程的新推理策略

Matthias Hoffman, Hendrik Kueck, Nando de Freitas, Arnaud Doucet

AI总结本文提出基于可逆跳跃MCMC的改进推理策略，通过新目标分布和打破参数-轨迹相关性，实现高维空间中的最优策略估计。

Comments Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

1008.5372 2026-06-03 math.OC cs.CV cs.IT cs.LG cs.NA math.IT math.NA stat.ME 版本更新

Penalty Decomposition Methods for $L0$-Norm Minimization

L0-范数最小化的罚分解方法

Zhaosong Lu, Yong Zhang

AI总结提出罚分解方法求解含L0-范数的优化问题，通过转化为秩最小化问题并利用向量化操作，在压缩感知等应用中优于现有方法。

Comments This paper has been withdrawn by the author because an updated version has been resubmitted

AI中文摘要

本文考虑一般的l0-范数最小化问题，即目标函数或约束中出现l0-范数的问题。特别地，我们首先将l0-范数约束问题重新表述为等价的秩最小化问题，然后应用[33]中提出的罚分解（PD）方法求解后者。通过利用特殊结构，我们将该方法的所有矩阵运算转化为向量运算，得到仅涉及向量运算的PD方法。在适当的假设下，我们证明PD方法生成的序列的任何聚点满足一阶最优性条件，该条件通常比一个自然最优性条件更强。我们进一步扩展PD方法以求解目标函数中出现l0-范数的问题。最后，通过将PD方法应用于压缩感知、稀疏逻辑回归和稀疏逆协方差选择来测试其性能。计算结果表明，我们的方法在解质量和/或速度方面通常优于现有方法。

英文摘要

In this paper we consider general l0-norm minimization problems, that is, the problems with l0-norm appearing in either objective function or constraint. In particular, we first reformulate the l0-norm constrained problem as an equivalent rank minimization problem and then apply the penalty decomposition (PD) method proposed in [33] to solve the latter problem. By utilizing the special structures, we then transform all matrix operations of this method to vector operations and obtain a PD method that only involves vector operations. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD method satisfies a first-order optimality condition that is generally stronger than one natural optimality condition. We further extend the PD method to solve the problem with the l0-norm appearing in objective function. Finally, we test the performance of our PD methods by applying them to compressed sensing, sparse logistic regression and sparse inverse covariance selection. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
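
A standard building block for l0-norm constrained problems is the Euclidean projection onto the set of vectors with at most r nonzero entries, which keeps the r largest-magnitude entries and zeroes the rest. The sketch below shows only that projection; whether the PD method's subproblems take exactly this form is an assumption here, and the function name `project_l0` is hypothetical.

```python
def project_l0(x, r):
    """Euclidean projection of x onto {z : ||z||_0 <= r}.

    Keeps the r entries of largest absolute value and sets the rest to 0.
    Ties are broken in favor of the earlier index, since sorted() is stable.
    """
    if r <= 0:
        return [0.0] * len(x)
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:r])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


# Hypothetical vector: the two largest-magnitude entries are -3.0 and 2.0.
x = [0.5, -3.0, 2.0, 0.1]
projected = project_l0(x, 2)
```

The projection has a closed form even though the constraint set is nonconvex, which is one reason splitting schemes that isolate the l0 term in its own subproblem can be attractive.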

1204.6250 2026-06-03 eess.SY cs.LG cs.SY 版本更新

Feature Selection for Generator Excitation Neurocontroller Development Using Filter Technique

使用滤波技术的发电机励磁神经控制器特征选择

Abdul Ghani Abro, Junita Mohamad Saleh

AI总结针对发电机励磁控制问题，提出采用滤波技术选择最优输入特征以训练人工神经网络控制器，提升控制性能。

Comments 10-Pages, 10-Figures, 8-Tables, International Journal of Computer Science Issues, Vol. 8, Issue 5, No 3, September 2011

Journal ref: International Journal of Computer Science Issues, pp. 108-117, Vol. 8, Issue 5, No 3, September 2011

AI中文摘要

本质上，使用控制系统的动机是生成适当的控制信号，以产生物理过程的期望响应。同步发电机的控制在电力系统运行和控制中始终非常关键。出于某些众所周知的原因，发电机通常在其稳态稳定性极限以下运行。这提高了对高效快速控制器的需求。据报道，人工智能在控制工程领域带来了革命性的成果。人工神经网络（ANN）作为人工智能的一个分支，利用其固有的可观测性，已被用于非线性和自适应控制。神经控制器的整体性能也依赖于输入特征。选择最优特征以最优地训练神经控制器非常关键。数据的质量和大小对于更好的性能同等重要。在这项工作中，采用滤波技术选择用于ANN训练的独立因素。

英文摘要

Essentially, the motive behind using a control system is to generate a suitable control signal that yields the desired response of a physical process. Control of synchronous generators has always been critical in power system operation and control. For certain well-known reasons, power generators are normally operated well below their steady-state stability limit. This raises the demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. The artificial neural network (ANN), a branch of artificial intelligence, has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of a neurocontroller also depends on its input features. Selecting optimum features to train a neurocontroller optimally is critical. Both the quality and the size of the data are equally important for better performance. In this work, a filter technique is employed to select independent factors for ANN training.

1106.1933 2026-06-03 cs.GT cs.LG cs.SY eess.SY math.OC 版本更新

Lyapunov stochastic stability and control of robust dynamic coalitional games with transferable utilities

具有可转移效用的鲁棒动态联盟博弈的Lyapunov随机稳定性与控制

Dario Bauso, Puduru Viswanadha Reddy, Tamer Basar

AI总结针对特征函数为连续时间有界均值遍历过程的动态可转移效用博弈，提出基于额外奖励观测的分配规则，确保平均分配收敛到平均博弈的核心且联盟超额收敛到先验给定锥。

AI中文摘要

本文考虑一个具有可转移效用（TU）的动态博弈，其中特征函数是一个连续时间有界均值遍历过程。一个中央规划者通过选择满足预算约束的瞬时分配，随时间与玩家持续交互。在博弈开始前，中央规划者知道过程的性质（有界均值遍历）、联盟值采样的有界集以及长期平均联盟值。另一方面，他不知道产生联盟值的潜在概率函数。我们的目标是找到分配规则，该规则使用对联盟截至当前时间所获得的额外奖励的度量，通过在玩家之间重新分配预算。目标有两个：i) 保证平均分配收敛到平均博弈的核心（或核心中的特定点），ii) 驱动联盟超额收敛到先验给定的锥。由此产生的分配规则是鲁棒的，因为尽管联盟值具有不确定性和时变性，它们仍能保证上述收敛性质。我们强调三个主要贡献。首先，我们基于对额外奖励的完全观测设计了一个分配规则，使得平均分配接近平均博弈核心中的特定点，而联盟超额收敛到先验给定的方向。其次，我们基于对额外奖励的部分观测设计了一个新的分配规则，使得平均分配收敛到平均博弈的核心，而联盟超额收敛到先验给定的锥。第三，我们建立了与逼近理论和可达性理论的联系。

英文摘要

This paper considers a dynamic game with transferable utilities (TU), where the characteristic function is a continuous-time bounded mean ergodic process. A central planner interacts continuously over time with the players by choosing the instantaneous allocations subject to budget constraints. Before the game starts, the central planner knows the nature of the process (bounded mean ergodic), the bounded set from which the coalitions' values are sampled, and the long run average coalitions' values. On the other hand, he has no knowledge of the underlying probability function generating the coalitions' values. Our goal is to find allocation rules that use a measure of the extra reward that a coalition has received up to the current time by re-distributing the budget among the players. The objective is two-fold: i) guaranteeing convergence of the average allocations to the core (or a specific point in the core) of the average game, ii) driving the coalitions' excesses to an a priori given cone. The resulting allocation rules are robust as they guarantee the aforementioned convergence properties despite the uncertain and time-varying nature of the coalitions' values. We highlight three main contributions. First, we design an allocation rule based on full observation of the extra reward so that the average allocation approaches a specific point in the core of the average game, while the coalitions' excesses converge to an a priori given direction. Second, we design a new allocation rule based on partial observation of the extra reward so that the average allocation converges to the core of the average game, while the coalitions' excesses converge to an a priori given cone. And third, we establish connections to approachability theory and attainability theory.

1204.4717 2026-06-03 math.OC cs.LG cs.SY eess.SY 版本更新

Energy-Efficient Building HVAC Control Using Hybrid System LBMPC

使用混合系统LBMPC的节能建筑HVAC控制

Anil Aswani, Neal Master, Jay Taneja, Andrew Krioukov, David Culler, Claire Tomlin

AI总结本文提出一种基于混合系统学习模型预测控制（LBMPC）的建筑HVAC控制方法，通过系统辨识和模型更新实现日均1.5MWh的节能效果，且不降低舒适度。

AI中文摘要

提高供暖、通风和空调（HVAC）系统的能效具有巨大的经济和社会效益。本文关注建筑级HVAC系统的混合系统模型辨识，以及后续使用基于学习的模型预测控制（LBMPC）的混合系统公式进行控制。这里，学习指的是对混合系统模型的更新，除了底层控制中固有的积分器动态外，还纳入了由于人员占用、太阳效应、室外空气温度（OAT）和设备引起的加热效应。尽管我们做了显著的建模简化，但使用该模型的相应控制器能够在实验中实现大幅降低能耗，且不降低居住者舒适度。通过这种方式，我们证明了所做出的建模简化的合理性。最后，我们展示了在建筑HVAC测试平台上的实验结果，显示平均每天节省1.5MWh的能源（p = 0.002），95%置信区间为1.0MWh至2.1MWh。

英文摘要

Improving the energy-efficiency of heating, ventilation, and air-conditioning (HVAC) systems has the potential to realize large economic and societal benefits. This paper concerns the system identification of a hybrid system model of a building-wide HVAC system and its subsequent control using a hybrid system formulation of learning-based model predictive control (LBMPC). Here, the learning refers to model updates to the hybrid system model that incorporate the heating effects due to occupancy, solar effects, outside air temperature (OAT), and equipment, in addition to integrator dynamics inherently present in low-level control. Though we make significant modeling simplifications, our corresponding controller that uses this model is able to experimentally achieve a large reduction in energy usage without any degradations in occupant comfort. It is in this way that we justify the modeling simplifications that we have made. We conclude by presenting results from experiments on our building HVAC testbed, which show an average of 1.5MWh of energy savings per day (p = 0.002) with a 95% confidence interval of 1.0MWh to 2.1MWh of energy savings.

1204.0885 2026-06-03 eess.SY cs.LG cs.NE cs.SY 版本更新

PID Parameters Optimization by Using Genetic Algorithm

使用遗传算法优化PID参数

Andri Mirzal, Shinichiro Yoshii, Masashi Furukawa

AI总结针对一阶滞后加时滞系统，采用遗传算法确定PID控制器参数，并与迭代法和Ziegler-Nichols规则的结果进行比较。

Comments 12 pages, 4 figures

1203.2511 2026-06-03 cs.LG cs.CE cs.NI cs.SY eess.SY stat.AP 版本更新

A Simple Flood Forecasting Scheme Using Wireless Sensor Networks

一种使用无线传感器网络的简单洪水预测方案

Victor Seal, Arnab Raha, Shovan Maity, Souvik Kr Mitra, Amitava Mukherjee, Mrinal Kanti Naskar

AI总结提出一种基于无线传感器网络的多元鲁棒线性回归洪水预测模型，通过简单快速的计算实现实时预测，并与其他算法对比验证改进效果。

Comments 16 pages, 4 figures, published in International Journal of Ad-Hoc, Sensor and Ubiquitous Computing, February 2012; V. Seal et al., 'A Simple Flood Forecasting Scheme Using Wireless Sensor Networks', IJASUC, Feb. 2012

DOI: 10.5121/ijasuc.2012.3105

AI中文摘要

本文提出一种使用无线传感器网络（WSNs）设计的预测模型，用于预测河流洪水，采用简单快速的计算提供实时结果，以拯救可能受洪水影响的生命。我们的预测模型使用多元鲁棒线性回归，易于理解，实现简单且成本效益高，速度高效，资源利用率低，同时提供可靠精度的实时预测，因此具有任何实际算法所期望的特征。我们的预测模型独立于参数数量，即可以根据现场需求添加或删除任意数量的参数。当水位上升时，我们使用多项式表示水位，其性质用于判断水位是否可能在近期超过洪水线。我们将我们的工作与一种当代算法进行比较，以展示我们的改进。然后，我们展示了预测水位与实际水位的仿真结果。

英文摘要

This paper presents a forecasting model designed using WSNs (Wireless Sensor Networks) to predict floods in rivers using simple and fast calculations, providing real-time results that can save the lives of people who may be affected by the flood. Our prediction model uses multiple-variable robust linear regression, which is easy to understand, simple and cost-effective to implement, fast, and light on resources, yet provides real-time predictions with reliable accuracy; these are features desirable in any real-world algorithm. Our prediction model is independent of the number of parameters, i.e. any number of parameters may be added or removed based on the on-site requirements. When the water level rises, we represent it using a polynomial whose nature is used to determine whether the water level may exceed the flood line in the near future. We compare our work with a contemporary algorithm to demonstrate our improvements over it. Then we present our simulation results for the predicted water level compared to the actual water level.

1008.3043 2026-06-03 math.NA cs.CC cs.LG cs.NA stat.ML 版本更新

Learning Functions of Few Arbitrary Linear Parameters in High Dimensions

高维中少量任意线性参数的函数学习

Massimo Fornasier, Karin Schnass, Jan Vybiral

AI总结针对高维空间中由少量线性参数决定的函数，提出基于随机采样和压缩感知的近似算法，在多项式时间内实现高概率逼近。

Comments 31 pages, this version was accepted to Foundations of Computational Mathematics, the final publication will be available on http://www.springerlink.com

AI中文摘要

假设 $f$ 是定义在 $\mathbb R^d$ 的单位球上的连续函数，形式为 $f(x) = g (A x)$，其中 $A$ 是 $k \times d$ 矩阵，$g$ 是 $k$ 个变量的函数，且 $k \ll d$。我们有一个预算 $m \in \mathbb N$，即允许查询 $f$ 的 $m$ 个点 $f(x_i)$，$i=1,...,m$，以构造一致逼近函数。在函数 $g$ 的某些光滑性和变差假设下，以及矩阵 $A$ 的任意选择下，本文提出： 1. 随机抽取点 $\{x_i\}$ 的采样选择，用于每个函数逼近； 2. 计算逼近函数的算法（算法1和算法2），其复杂度在维度 $d$ 和点数 $m$ 上最多为多项式。由于 $A$ 的任意性，采样点的选择将根据适当的随机分布进行，我们的结果以压倒性概率成立。我们的方法使用了压缩感知框架中的工具、正半定矩阵和的近期Chernoff界，以及奇异值分解不变子空间的经典稳定性界。

英文摘要

Let us assume that $f$ is a continuous function defined on the unit ball of $\mathbb R^d$, of the form $f(x) = g (A x)$, where $A$ is a $k \times d$ matrix and $g$ is a function of $k$ variables for $k \ll d$. We are given a budget $m \in \mathbb N$ of possible point evaluations $f(x_i)$, $i=1,...,m$, of $f$, which we are allowed to query in order to construct a uniform approximating function. Under certain smoothness and variation assumptions on the function $g$, and an arbitrary choice of the matrix $A$, we present in this paper 1. a sampling choice of the points $\{x_i\}$ drawn at random for each function approximation; 2. algorithms (Algorithm 1 and Algorithm 2) for computing the approximating function, whose complexity is at most polynomial in the dimension $d$ and in the number $m$ of points. Due to the arbitrariness of $A$, the choice of the sampling points will be according to suitable random distributions and our results hold with overwhelming probability. Our approach uses tools taken from the compressed sensing framework, recent Chernoff bounds for sums of positive-semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions.
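
The structure $f(x) = g(Ax)$ implies, by the chain rule, that the gradient of $f$ equals $A^T \nabla g(Ax)$ and therefore lies in the row space of $A$. The sketch below checks this numerically with finite differences for $k = 1$ and $d = 4$. It is not the paper's Algorithm 1 or 2 (no random sampling or compressed sensing step is shown), and the matrix `A`, the choice of `g`, and the helper names are assumptions made for illustration.

```python
import math

# Hypothetical 1 x 4 matrix (k = 1, d = 4); only coordinates 0 and 2 matter.
A = [[0.6, 0.0, 0.8, 0.0]]


def g(t):
    """Hypothetical smooth function of k = 1 variable."""
    return math.sin(t[0])


def f(x):
    """f(x) = g(A x)."""
    return g([sum(a * xi for a, xi in zip(row, x)) for row in A])


def directional_derivative(func, x, direction, eps=1e-6):
    """Forward finite-difference approximation of the derivative along direction."""
    shifted = [xi + eps * di for xi, di in zip(x, direction)]
    return (func(shifted) - func(x)) / eps


x0 = [0.1, 0.2, 0.3, 0.4]
d = len(x0)
basis = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Approximate gradient of f at x0, one coordinate direction at a time.
grad = [directional_derivative(f, x0, e) for e in basis]

# grad should be (approximately) a multiple of the single row of A.
ratio = grad[0] / grad[2]
```

Since every such gradient estimate is nearly a combination of the $k$ rows of $A$, a modest number of point evaluations can reveal the relevant low-dimensional subspace even when $d$ is large.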

1108.6296 2026-06-03 cs.LG cs.NA math.NA 版本更新

Infinite Tucker Decomposition: Nonparametric Bayesian Models for Multiway Data Analysis

无限Tucker分解：用于多路数据分析的非参数贝叶斯模型

Zenglin Xu, Feng Yan, Yuan Qi

AI总结提出基于非参数贝叶斯的无限Tucker分解模型（InfTucker），通过潜在高斯/t过程和非线性协方差函数，在概率框架下处理连续和二元数据，并开发高效变分推理方法，显著提升预测精度。

AI中文摘要

张量分解是多路数据分析的强大计算工具。许多流行的张量分解方法——如Tucker分解和CANDECOMP/PARAFAC (CP)——本质上是多线性因子分解。它们不足以建模(i)数据实体间的复杂交互、(ii)各种数据类型（如缺失数据和二元数据）以及(iii)噪声观测和异常值。为解决这些问题，我们提出了张量变量潜在非参数贝叶斯模型，并结合高效推理方法，用于多路数据分析。我们将这些模型命名为InfTucker。使用这些InfTucker，我们在无限特征空间中进行Tucker分解。与经典张量分解模型不同，我们的新方法在概率框架下处理连续和二元数据。与先前关于矩阵和张量的贝叶斯模型不同，我们的模型基于具有非线性协方差函数的潜在高斯或t过程。为了从数据中高效学习InfTucker，我们开发了一种张量上的变分推理技术。与经典实现相比，新技术将时间和空间复杂度降低了几个数量级。我们在化学计量学和社交网络数据集上的实验结果表明，我们的新模型比最先进的张量分解方法取得了显著更高的预测精度。

英文摘要

Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches, such as the Tucker decomposition and CANDECOMP/PARAFAC (CP), amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g. missing data and binary data), and (iii) noisy observations and outliers. To address these issues, we propose tensor-variate latent nonparametric Bayesian models, coupled with efficient inference methods, for multiway data analysis. We name these models InfTucker. Using InfTucker, we conduct Tucker decomposition in an infinite feature space. Unlike classical tensor decomposition models, our new approaches handle both continuous and binary data in a probabilistic framework. Unlike previous Bayesian models on matrices and tensors, our models are based on latent Gaussian or $t$ processes with nonlinear covariance functions. To efficiently learn InfTucker from data, we develop a variational inference technique on tensors. Compared with the classical implementation, the new technique reduces both time and space complexities by several orders of magnitude. Our experimental results on chemometrics and social network datasets demonstrate that our new models achieve significantly higher prediction accuracy than state-of-the-art tensor decomposition methods.

1109.1533 2026-06-03 math.OC cs.LG cs.NI cs.SY eess.SY math.PR 版本更新

The Non-Bayesian Restless Multi-Armed Bandit: A Case of Near-Logarithmic Strict Regret

非贝叶斯不安分多臂老虎机：近对数严格遗憾的一个案例

Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Zhao

AI总结针对非贝叶斯不安分多臂老虎机问题，提出一种元策略方法，通过学习有限策略集中的最优策略，实现近对数遗憾，并首次在非贝叶斯RMAB中达到与已知模型最优策略相同的平均奖励。

Comments arXiv admin note: significant text overlap with arXiv:1011.4752

详情

AI中文摘要

在经典的贝叶斯不安分多臂老虎机（RMAB）问题中，有$N$个臂，所有臂上的奖励在每个时刻以已知参数的马尔可夫链演化。玩家每时刻选择激活$K \geq 1$个臂，以最大化多次游戏获得的期望总奖励。RMAB是一个具有挑战性的问题，通常已知为PSPACE-hard。本文考虑更困难的问题：非贝叶斯RMAB，其中马尔可夫链的参数假设先验未知。我们提出了一种原创方法，适用于当对应的贝叶斯问题具有如下结构时：根据已知参数值，最优解是预设的有限策略集中的一个。在此类设置中，我们提出通过采用合适的元策略来学习非贝叶斯RMAB的最优策略，该元策略将有限策略集中的每个策略视为另一个非贝叶斯多臂老虎机问题中的一个臂，而该问题的单臂选择策略是最优的。我们通过开发一种新的感知策略来演示该方法，用于在未知动态信道上进行机会频谱接入。我们证明，我们的策略实现了近对数遗憾（与模型感知的“精灵”相比的期望奖励差异），从而获得了与已知模型下最优策略相同的平均奖励。这是文献中首次在非贝叶斯RMAB上得到这样的结果。在证明中，我们还开发了Chernoff-Hoeffding界的一个新推广。

英文摘要

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are $N$ arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A player seeks to activate $K \geq 1$ arms at each time in order to maximize the expected total reward obtained over multiple plays. RMAB is a challenging problem that is known to be PSPACE-hard in general. We consider in this work the even harder non-Bayesian RMAB, in which the parameters of the Markov chain are assumed to be unknown a priori. We develop an original approach to this problem that is applicable when the corresponding Bayesian problem has the structure that, depending on the known parameter values, the optimal solution is one of a prescribed finite set of policies. In such settings, we propose to learn the optimal policy for the non-Bayesian RMAB by employing a suitable meta-policy which treats each policy from this finite set as an arm in a different non-Bayesian multi-armed bandit problem for which a single-arm selection policy is optimal. We demonstrate this approach by developing a novel sensing policy for opportunistic spectrum access over unknown dynamic channels. We prove that our policy achieves near-logarithmic regret (the difference in expected reward compared to a model-aware genie), which leads to the same average reward that can be achieved by the optimal policy under a known model. This is the first such result in the literature for a non-Bayesian RMAB. For our proof, we also develop a novel generalization of the Chernoff-Hoeffding bound.

1109.1552 2026-06-03 cs.LG cs.NI cs.SY eess.SY math.OC math.PR 版本更新

Efficient Online Learning for Opportunistic Spectrum Access

机会频谱接入的高效在线学习

Wenhan Dai, Yi Gai, Bhaskar Krishnamachari

AI总结针对认知无线电网络中机会频谱接入的非贝叶斯多臂赌博机问题，提出连续探索与利用（CEE）算法，实现近对数遗憾界，并在已知部分信息时达到对数遗憾。

详情

AI中文摘要

认知无线电网络中的机会频谱接入问题最近被建模为非贝叶斯非平稳多臂赌博机问题。该问题中，有N个臂（对应信道）和一个玩家（对应次用户）。每个臂的状态演变为参数未知的有限状态马尔可夫链。在每个时隙，玩家可以选择K < N个臂进行播放，并获得状态相关的奖励（对应主用户活动下的吞吐量）。目标是最大化多次播放获得的期望总奖励（即总吞吐量）。此类多臂赌博机算法的性能通过遗憾来衡量，定义为与始终播放最佳K个臂的模型感知精灵相比的期望奖励差异。本文针对该问题提出了一种新的连续探索与利用（CEE）算法。当没有关于臂动态的信息时，CEE是首个保证随时间均匀近对数遗憾的算法。当已知与平稳状态分布和状态相关奖励对应的某些界限时，我们证明CEE可以轻松修改以实现随时间对数遗憾。相比之下，先前算法需要关于转移矩阵第二特征值界限的额外信息才能保证对数遗憾。最后，通过数值模拟表明CEE比先前算法更高效。

英文摘要

The problem of opportunistic spectrum access in cognitive radio networks has been recently formulated as a non-Bayesian restless multi-armed bandit problem. In this problem, there are N arms (corresponding to channels) and one player (corresponding to a secondary user). The state of each arm evolves as a finite-state Markov chain with unknown parameters. At each time slot, the player can select K < N arms to play and receives state-dependent rewards (corresponding to the throughput obtained given the activity of primary users). The objective is to maximize the expected total rewards (i.e., total throughput) obtained over multiple plays. The performance of an algorithm for such a multi-armed bandit problem is measured in terms of regret, defined as the difference in expected reward compared to a model-aware genie who always plays the best K arms. In this paper, we propose a new continuous exploration and exploitation (CEE) algorithm for this problem. When no information is available about the dynamics of the arms, CEE is the first algorithm to guarantee near-logarithmic regret uniformly over time. When some bounds corresponding to the stationary state distributions and the state-dependent rewards are known, we show that CEE can be easily modified to achieve logarithmic regret over time. In contrast, prior algorithms require additional information concerning bounds on the second eigenvalues of the transition matrices in order to guarantee logarithmic regret. Finally, we show through numerical simulations that CEE is more efficient than prior algorithms.
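
The regret defined in this abstract can be written out explicitly. The following is a sketch under assumptions the abstract does not state: $\mu_{(1)} \ge \mu_{(2)} \ge \dots \ge \mu_{(N)}$ are taken to be the stationary mean rewards of the arms in decreasing order, $A(t)$ is the set of $K$ arms the player selects in slot $t$, $r_i(t)$ is the reward received from arm $i$ in that slot, and $T$ is the number of slots.

```latex
R(T) \;=\; T \sum_{j=1}^{K} \mu_{(j)}
\;-\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \sum_{i \in A(t)} r_i(t) \right]
```

Under this reading, logarithmic regret means $R(T) = O(\log T)$, and any sublinear rate, including a near-logarithmic one, gives $R(T)/T \to 0$, so the time-averaged reward approaches that of the genie.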

1105.2176 2026-06-03 math.OC cs.IT cs.LG cs.SY eess.SY math.IT 版本更新

A Framework for Optimization under Limited Information

有限信息下的优化框架

Tansu Alpcan

AI总结针对有限信息下的优化问题，提出一个融合信息收集、估计和优化的统一框架，采用贝叶斯方法和高斯过程回归，并利用信息论熵量化信息获取。

1110.1781 2026-06-03 cs.LG cs.SY eess.SY 版本更新

A Study of Unsupervised Adaptive Crowdsourcing

无监督自适应众包研究

G. Kesidis, A. Kurve

AI总结基于用户响应与多数响应的一致性，研究无监督众包性能，提出两种场景下的可靠性度量方法。

Comments Technical Report, 2 figures

1107.1744 2026-06-03 math.OC cs.LG cs.SY eess.SY 版本更新

Stochastic convex optimization with bandit feedback

带强盗反馈的随机凸优化

Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin

AI总结针对带强盗反馈的随机凸优化问题，提出椭球算法的推广，实现$\tilde{O}(\mathrm{poly}(d)\sqrt{T})$遗憾，在$T$的尺度上最优。

1110.0718 2026-06-03 cs.IT cs.LG cs.SY eess.SY math.IT 版本更新

Directed information and Pearl's causal calculus

有向信息与Pearl因果演算

Maxim Raginsky

AI总结本文探讨Pearl因果形式化与信息论中因果性及反馈概念之间的联系，并展示条件有向信息如何用于发展Pearl后门准则的信息论版本。

Comments 8 pages, uses ieeeconf.cls; to appear in Proc. 49th Annual Allerton Conf. on Communication, Control and Computing (2011)

详情

AI中文摘要

概率图模型是统计学、机器学习、信号处理和控制中的基本工具。当这样的模型定义在有向无环图（DAG）上时，可以为相应随机系统中发生的事件分配偏序。基于Judea Pearl等人的工作，这些基于DAG的联合概率测度的“因果分解”已被用于功能依赖（因果联系）的表征和推断。这篇主要是说明性的论文聚焦于Pearl形式化（特别是他的“干预”概念）与信息论中的因果性和反馈概念（如因果条件、有向随机核和有向信息）之间的若干联系。作为一个应用，我们展示了如何利用条件有向信息来发展Pearl的“后门”准则的信息论版本，该准则用于从被动观测中识别因果效应。这表明后门准则可以被视为统计充分性的因果类比。

英文摘要

Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based "causal factorizations" of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl's formalism (and in particular his notion of "intervention") and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl's "back-door" criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency.

1109.2088 2026-06-03 cs.LG cs.NI cs.SY eess.SY math.OC math.PR 版本更新

Sparse Principal Component of a Rank-deficient Matrix

秩亏矩阵的稀疏主成分

Megasthenis Asteris, Dimitris S. Papailiopoulos, George N. Karystinos

AI总结针对秩亏矩阵的稀疏主成分识别问题，通过引入辅助球面变量并证明存在多项式大小的候选指标集，提出了一种多项式时间算法来计算任意稀疏度下的最优稀疏主成分。

Comments 5 pages, 1 figure, to be presented at ISIT

1105.3931 2026-06-03 cs.LG cs.NA math.NA stat.ML 版本更新

Behavior of Graph Laplacians on Manifolds with Boundary

带边界流形上图拉普拉斯算子的行为

Xueyuan Zhou, Mikhail Belkin

AI总结本文分析了带边界流形上图拉普拉斯算子在边界附近的行为，揭示了其与内部不同的缩放特性及全局影响，并给出了收敛速率和数值结果。

详情

AI中文摘要

在流形学习中，基于数据构建的图拉普拉斯算法在实际应用和理论分析中都受到了广泛关注。特别是，从采样数据获得的图拉普拉斯算子收敛到某些连续算子最近成为一个活跃的研究课题。现有的大部分工作都假设数据采样自无边界流形，或者感兴趣的函数在远离边界的点处评估。然而，边界行为问题具有相当大的实践和理论意义。在本文中，我们分析了图拉普拉斯算子在边界附近或边界上的点的行为，讨论了它们的收敛速率及其含义，并提供了一些数值结果。结果表明，虽然边界附近的点只占流形总体积的一小部分，但图拉普拉斯算子在这些点的行为具有与流形上其他地方不同的缩放特性，并对整个流形产生全局影响，这一观察对于流形学习的普遍问题具有潜在的重要意义。

英文摘要

In manifold learning, algorithms based on graph Laplacians constructed from data have received considerable attention both in practical applications and theoretical analysis. In particular, the convergence of graph Laplacians obtained from sampled data to certain continuous operators has recently become an active research topic. Most of the existing work has been done under the assumption that the data is sampled from a manifold without boundary or that the functions of interest are evaluated at a point away from the boundary. However, the question of boundary behavior is of considerable practical and theoretical interest. In this paper we provide an analysis of the behavior of graph Laplacians at a point near or on the boundary, discuss their convergence rates and their implications, and provide some numerical results. It turns out that while points near the boundary occupy only a small part of the total volume of a manifold, the behavior of the graph Laplacian there has different scaling properties from its behavior elsewhere on the manifold, with global effects on the whole manifold, an observation with potentially important implications for the general problem of learning on manifolds.

1009.4219 2026-06-03 cs.LG cs.SY eess.SY math.OC 版本更新

Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems

LASSO和稀疏监督学习问题的安全特征消除

Laurent El Ghaoui, Vivian Viallon, Tarek Rabbani

AI总结提出一种快速方法，在LASSO问题中消除无关特征，显著减少运行时间，并可推广到一般l1惩罚凸问题。

Comments Submitted to JMLR in April 2011

详情

AI中文摘要

我们描述了一种快速方法，用于消除l1惩罚最小二乘回归（或LASSO）问题中的特征（变量）。特征的消除可能导致运行时间的大幅减少，特别是对于惩罚参数的大值。我们的方法不是启发式的：它只消除那些在解决LASSO问题后保证不存在的特征。特征消除步骤易于并行化，并且可以独立测试每个特征的消除。此外，与解决LASSO问题的计算量相比，我们的方法的计算努力可以忽略不计——大致相当于单个梯度步骤的计算量。我们的方法扩展了现有LASSO算法的范围，以处理以前无法达到的更大数据集。我们展示了如何将我们的方法扩展到一般的l1惩罚凸问题，并给出了稀疏支持向量机和逻辑回归问题的初步结果。

英文摘要

We describe a fast method to eliminate features (variables) in l1-penalized least-squares regression (or LASSO) problems. The elimination of features leads to a potentially substantial reduction in running time, especially for large values of the penalty parameter. Our method is not heuristic: it only eliminates features that are guaranteed to be absent after solving the LASSO problem. The feature elimination step is easy to parallelize and can test each feature for elimination independently. Moreover, the computational effort of our method is negligible compared to that of solving the LASSO problem: it is roughly the same as that of a single gradient step. Our method extends the scope of existing LASSO algorithms to treat larger data sets, previously out of their reach. We show how our method can be extended to general l1-penalized convex problems and present preliminary results for the Sparse Support Vector Machine and Logistic Regression problems.
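
A per-feature elimination test of the kind described above can be sketched as follows. The abstract does not state the test itself. The inequality used here is the basic SAFE rule as it is commonly cited in the screening literature, and it assumes the objective 0.5*||Xw - y||^2 + lam*||w||_1. The function name `safe_lasso_screen` and the column-list data layout are hypothetical choices for illustration.

```python
import math


def safe_lasso_screen(columns, y, lam):
    """Return indices of features that the basic SAFE test lets us discard.

    Assumed problem: min_w 0.5 * ||X w - y||^2 + lam * ||w||_1, where
    `columns` holds the feature columns x_k of X as lists of floats.
    Feature k is discarded when
        |x_k^T y| < lam - ||x_k|| * ||y|| * (lam_max - lam) / lam_max,
    with lam_max = max_j |x_j^T y|.
    """
    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))

    norm_y = math.sqrt(dot(y, y))
    corr = [abs(dot(x, y)) for x in columns]
    lam_max = max(corr)

    # For lam >= lam_max the all-zero vector is optimal, so every feature can go.
    if lam >= lam_max:
        return list(range(len(columns)))

    discard = []
    for k, x in enumerate(columns):
        slack = math.sqrt(dot(x, x)) * norm_y * (lam_max - lam) / lam_max
        if corr[k] < lam - slack:
            discard.append(k)
    return discard


# Orthonormal toy design: only the first feature is correlated with y.
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.1, 0.0]
print(safe_lasso_screen(columns, y, lam=2.5))
```

Each feature is tested with one inner product against `y`, independently of the others, which matches the abstract's remarks that the step parallelizes easily and costs about as much as a single gradient step. The test discards more features as `lam` approaches `lam_max`.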

1105.2211 2026-06-03 math.OC cs.IT cs.LG cs.SY eess.SY math.IT 版本更新

Dual Control with Active Learning using Gaussian Process Regression

使用高斯过程回归的主动学习双控制

Tansu Alpcan

AI总结针对信息有限的控制问题，提出一种基于信息论熵度量和高斯过程回归的双控制方法，同时优化系统辨识和控制目标，并在混沌系统和倒立摆控制中验证。

详情

AI中文摘要

在许多实际问题中，控制决策必须在有限信息下做出。控制器可能没有关于非线性系统的先验（甚至后验）数据，除了随时间获得的有限数量点。这要么是由于观测成本高，要么是由于系统的高度非平稳性。信息收集（辨识、探索）与控制（优化、利用）之间的冲突需要一种主动学习方法，用于迭代选择控制动作，同时为系统辨识提供数据点。本文提出一种双控制方法，其中每个控制步骤获取的信息使用信息论中的熵度量进行量化，并作为最先进的高斯过程回归（贝叶斯学习）方法的训练输入。对每个数据点获取的信息进行显式量化，允许迭代优化辨识和控制目标。所开发的方法通过两个例子说明：作为混沌系统的逻辑斯蒂映射控制和带倒立摆的小车位置控制。

英文摘要

In many real-world problems, control decisions have to be made with limited information. The controller may have no a priori (or even a posteriori) data on the nonlinear system, except for a limited number of points that are obtained over time. This is due either to the high cost of observation or to the highly non-stationary nature of the system. The resulting conflict between information collection (identification, exploration) and control (optimization, exploitation) necessitates an active learning approach for iteratively selecting the control actions which concurrently provide the data points for system identification. This paper presents a dual control approach where the information acquired at each control step is quantified using the entropy measure from information theory and serves as the training input to a state-of-the-art Gaussian process regression (Bayesian learning) method. The explicit quantification of the information obtained from each data point allows for iterative optimization of both identification and control objectives. The approach developed is illustrated with two examples: control of the logistic map as a chaotic system and position control of a cart with an inverted pendulum.

1104.5391 2026-06-03 cs.LG cs.SY eess.SY math.OC 版本更新

On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem

关于一类标准奖励函数的贪婪策略在非稳态多臂赌博机问题中的最优性

Quan Liu, Kehao Wang, Lin Chen

AI总结针对非稳态多臂赌博机问题，通过分析一类标准奖励函数，建立了保证贪婪策略在折扣期望奖励准则下最优性的折扣因子闭式条件，并验证了其在认知无线电网络中的有效性。

详情

AI中文摘要

本文考虑非稳态赌博机问题，这是决策理论中著名的随机多臂赌博机问题最广泛研究的推广之一。然而，已知该问题在近似任何非平凡因子时是PSPACE-难的。因此，由于其高复杂性，最优性很难获得。考虑到贪婪策略的稳定性和简单性，一个自然的方法是采用贪婪策略。然而，贪婪策略通常因其固有的短视行为而导致最优性损失。本文通过分析一类所谓的标准奖励函数，建立了关于折扣因子β的闭式条件，使得在折扣期望奖励准则下贪婪策略的最优性得到保证，特别是条件β=1表示在平均累积奖励准则下贪婪策略的最优性。因此，标准形式的奖励函数可以轻松用于判断贪婪策略的最优性，无需任何复杂计算。文中给出了认知无线电网络中的一些例子，以验证该数学结果在判断贪婪策略最优性方面的有效性。

英文摘要

In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-hard to approximate to any non-trivial factor. Thus optimality is very difficult to obtain due to the problem's high complexity. A natural approach is to adopt the greedy policy, given its stability and simplicity. However, the greedy policy generally incurs an optimality loss because of its intrinsically myopic behavior. In this paper, by analyzing one class of so-called standard reward functions, we establish a closed-form condition on the discount factor β such that the optimality of the greedy policy is guaranteed under the discounted expected reward criterion; in particular, the condition β = 1 indicates the optimality of the greedy policy under the average accumulative reward criterion. Thus, the standard form of the reward function can easily be used to judge the optimality of the greedy policy without any complicated calculation. Some examples in cognitive radio networks are presented to verify the effectiveness of the mathematical result in judging the optimality of the greedy policy.

1001.4475 2026-06-03 cs.LG cs.SY eess.SY math.OC math.ST stat.TH 版本更新

X-Armed Bandits

Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvari

AI总结针对臂集为一般可测空间且均值回报函数满足已知相异度局部Lipschitz条件的随机多臂赌博机问题，提出HOO算法，实现与维度无关的遗憾界并证明极小极大最优性。

详情

AI中文摘要

我们考虑随机赌博机的一个推广，其中臂集$\mathcal{X}$可以是任意可测空间，且均值回报函数关于决策者已知的相异度函数是“局部Lipschitz”的。在此条件下，我们构建了一种称为HOO（分层乐观优化）的臂选择策略，对于一大类问题，其遗憾界相比之前的结果有所改进。特别地，我们的结果表明，如果$\mathcal{X}$是欧氏空间中的单位超立方体，且均值回报函数有有限个全局最大值，在这些最大值附近函数的行为具有已知光滑度的局部连续性，那么HOO的期望遗憾以对数因子为界被$\sqrt{n}$控制，即遗憾的增长速率与空间维度无关。我们还证明了当相异度为度量时，我们的算法是极小极大最优的。我们的基本策略具有关于时间步数的二次计算复杂度，且不依赖于加倍技巧。我们还引入了一种改进策略，该策略依赖于加倍技巧但运行时间为线性对数。这两个结果相比之前的方法都有改进。

英文摘要

We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches.

1010.5290 2026-06-03 cs.LG cs.NA math.NA 版本更新

Converged Algorithms for Orthogonal Nonnegative Matrix Factorizations

正交非负矩阵分解的收敛算法

Andri Mirzal

AI总结提出基于Lee和Seung算法以及Lin思想的单正交和双正交非负矩阵分解算法，并给出收敛性证明，实验验证了收敛性。

Comments 55 pages, 11 figures

1103.2491 2026-06-03 cs.LG cs.GT cs.SY eess.SY math.OC 版本更新

Heterogeneous Learning in Zero-Sum Stochastic Games with Incomplete Information

不完全信息零和随机博弈中的异构学习

Quanyan Zhu, Hamidou Tembine, Tamer Basar

AI总结针对不完全信息零和随机博弈，提出异构学习方案（各智能体采用不同学习模式），利用随机逼近将其转化为常微分方程，并应用于安全博弈中攻防双方因理性与信息差异采用不同学习策略的场景。

详情

AI中文摘要

学习算法对于博弈论在网络环境中的应用至关重要。在动态和去中心化的环境中，流量、拓扑和信道状态可能随时间变化，且智能体之间的通信不切实际，因此需要制定和研究不完全信息博弈以及完全分布式学习算法，这些算法要求每个智能体对其他智能体的信息需求最小。在本文中，我们应对这一重大挑战，引入了异构学习方案，其中每个智能体在不完全信息博弈的背景下采用不同的学习模式。我们使用随机逼近技术来证明异构学习方案可以通过其确定性常微分方程对应物进行研究。根据玩家的学习速率，这些常微分方程可能不同于标准的复制者动力学、（短视）最佳响应动力学、logit动力学和虚拟博弈动力学。我们将结果应用于一类安全博弈，其中攻击者和防御者由于理性水平和获取信息的差异而采用不同的学习方案。

英文摘要

Learning algorithms are essential for the applications of game theory in a networking environment. In dynamic and decentralized settings where the traffic, topology and channel states may vary over time and communication between agents is impractical, it is important to formulate and study games of incomplete information and fully distributed learning algorithms that require each agent to have only a minimal amount of information about the other agents. In this paper, we address this major challenge and introduce heterogeneous learning schemes in which each agent adopts a distinct learning pattern in the context of games with incomplete information. We use stochastic approximation techniques to show that the heterogeneous learning schemes can be studied in terms of their deterministic ordinary differential equation (ODE) counterparts. Depending on the learning rates of the players, these ODEs could be different from the standard replicator dynamics, (myopic) best response (BR) dynamics, logit dynamics, and fictitious play dynamics. We apply the results to a class of security games in which the attacker and the defender adopt different learning schemes due to differences in their rationality levels and the information they acquire.

1102.2975 2026-06-03 math.OC cs.LG cs.SY eess.SY math.PR 版本更新

Decentralized Restless Bandit with Multiple Players and Unknown Dynamics

多玩家未知动力学的分散式休止臂赌博机

Haoyang Liu, Keqin Liu, Qing Zhao

AI总结针对多玩家未知动力学的分散式休止多臂赌博机问题，提出一种分散式策略，在已知系统参数边界时实现对数阶遗憾，在无先验知识时实现任意接近对数阶的遗憾。

Comments 7 pages, 2 figures, in Proc. of Information Theory and Applications Workshop (ITA), January, 2011

1102.0899 2026-06-03 cs.AI cs.CV cs.LG cs.NA math.NA math.PR 版本更新

Evidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov Model

证据前馈隐马尔可夫模型：一种新型隐马尔可夫模型

Michael DelRose, Christian Wagner, Philip Frederick

AI总结针对隐马尔可夫模型无法建模观测间关联的问题，提出证据前馈隐马尔可夫模型，通过引入观测间概率链接提升分类性能，并在视觉动作和测量数据上验证其有效性。

Comments 19 pages, International Journal of Artificial Intelligence and Applications

详情

DOI: 10.5121/ijaia.2011.2101
Journal ref: International Journal of Artificial Intelligence and Applications (IJAIA), Vol. 2, No. 1, Jan 2011

AI中文摘要

仅基于视觉动作预测他人意图的能力是人类和动物独有的技能。当前计算机算法的智能尚未达到这种复杂程度，但已有若干研究正朝此方向努力。由于可用的分类算法众多，难以确定哪种算法最适合特定情境。在视觉人类意图数据分类中，隐马尔可夫模型（HMM）及其变体是主要候选方法。HMM无法提供观测间链接的概率，这是该分类技术的一大缺陷。当人通过视觉识别他人的动作时，会监控观测中的模式。通过估计下一个观测，人们能够总结动作，从而相当准确地判断执行动作者的意图。这些视觉线索和链接对于创建基于视觉观测确定人类动作的智能算法至关重要。证据前馈隐马尔可夫模型是一种新开发的算法，它提供了观测间链接。本研究阐述了证据前馈HMM背后的理论，提供了其学习这些参数以优化观测似然性的数学证明（这对所有计算智能算法都至关重要），并给出了与标准HMM在视觉动作数据和测量数据分类中的比较示例，从而为证据前馈HMM在多种问题分类中的应用奠定了坚实基础。

英文摘要

The ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. The intelligence of current computer algorithms has not reached this level of complexity, but there are several research efforts that are working towards it. With the number of classification algorithms available, it is hard to determine which algorithm works best for a particular situation. In classification of visual human intent data, Hidden Markov Models (HMMs) and their variants are leading candidates. The inability of HMMs to provide a probability for the observation-to-observation linkages is a major drawback of this classification technique. If a person is visually identifying an action of another person, they monitor patterns in the observations. By estimating the next observation, people have the ability to summarize the actions, and thus determine, with fairly good accuracy, the intention of the person performing the action. These visual cues and linkages are important in creating intelligent algorithms for determining human actions based on visual observations. The Evidence Feed Forward Hidden Markov Model is a newly developed algorithm which provides observation-to-observation linkages. The following research addresses the theory behind Evidence Feed Forward HMMs, provides mathematical proofs that their parameters can be learned to optimize the likelihood of observations under an Evidence Feed Forward HMM, which is important in any computational intelligence algorithm, and gives comparative examples with standard HMMs in classification of both visual action data and measurement data, thus providing a strong base for Evidence Feed Forward HMMs in classification of many types of problems.
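
For orientation, the standard HMM baseline mentioned in this abstract can be sketched with the textbook forward algorithm. This is not the Evidence Feed Forward model. It only shows that in a standard discrete HMM each observation depends on the current hidden state alone, so no observation-to-observation probability appears anywhere. The function name and the small two-state parameters are hypothetical.

```python
def forward_likelihood(init, trans, emit, observations):
    """Return P(observations) under a standard discrete HMM.

    init[s]     : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(init)
    # alpha[s] = P(o_1..o_t, state_t = s), initialised for t = 1.
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        # The emission term depends only on the current state s,
        # never on the previous observation.
        alpha = [
            emit[s][obs] * sum(alpha[p] * trans[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(init, trans, emit, [0, 1, 0]))
```

In a classification setting, one such model would typically be trained per class, with a sequence assigned to the class whose model gives the highest likelihood.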

1011.1518 2026-06-03 stat.ML cs.LG cs.NA math.NA 版本更新

Robust Matrix Decomposition with Outliers

含离群值的鲁棒矩阵分解

Daniel Hsu, Sham M. Kakade, Tong Zhang

AI总结研究通过ℓ1范数和迹范数最小化从观测矩阵中恢复低秩矩阵和稀疏离群值矩阵的条件，给出了比以往更强的恢复保证，且不假设离群值的空间模式是随机的。

Comments Corrected comparisons to previous work of Candes et al (2009)

1008.4406 2026-06-03 cs.MM cs.LG cs.SY eess.SY 版本更新

Structural Solutions to Dynamic Scheduling for Multimedia Transmission in Unknown Wireless Environments

未知无线环境下多媒体传输的动态调度结构解决方案

Fangwen Fu, Mihaela van der Schaar

AI总结针对时变无线信道中延迟敏感媒体数据的调度问题，提出基于马尔可夫决策过程（MDP）和优先级图（DAG）的结构化解决方案，通过分解多数据单元决策为顺序单数据单元决策降低复杂度，并开发低复杂度在线学习算法处理未知统计知识，显著优于现有方法。

详情

AI中文摘要

在本文中，我们提出了一种系统性的解决方案，用于在时变无线信道上调度延迟敏感的媒体数据进行传输。我们首先将动态调度问题建模为马尔可夫决策过程（MDP），该过程明确考虑了用户异构的多媒体数据特征（例如延迟截止时间、失真影响和依赖性等）以及时变信道条件，这些在现有的数据包调度算法中并未同时考虑。这种建模使我们能够进行前瞻性决策，在每次传输时调度多个数据单元，以优化多媒体应用的长期效用。媒体数据的异构性使我们能够将不同数据单元之间的传输优先级表示为优先级图，这是一个有向无环图（DAG）。该优先级图为我们提供了一种优雅的结构，可以将每次的多数据单元前瞻性决策分解为多个单数据单元前瞻性决策，这些决策可以按顺序执行，从高优先级数据单元到低优先级数据单元，从而显著降低计算复杂度。当多媒体数据特征和信道条件的统计知识先验未知时，我们开发了一种低复杂度的在线学习算法来更新价值函数，该函数捕捉当前决策对未来效用的影响。仿真结果表明，所提出的解决方案显著优于现有的最先进调度解决方案。

英文摘要

In this paper, we propose a systematic solution to the problem of scheduling delay-sensitive media data for transmission over time-varying wireless channels. We first formulate the dynamic scheduling problem as a Markov decision process (MDP) that explicitly considers the users' heterogeneous multimedia data characteristics (e.g., delay deadlines, distortion impacts, and dependencies) and time-varying channel conditions, which are not simultaneously considered in state-of-the-art packet scheduling algorithms. This formulation allows us to perform foresighted decisions to schedule multiple data units for transmission at each time in order to optimize the long-term utilities of the multimedia applications. The heterogeneity of the media data enables us to express the transmission priorities between the different data units as a priority graph, which is a directed acyclic graph (DAG). This priority graph provides us with an elegant structure to decompose the multi-data unit foresighted decision at each time into multiple single-data unit foresighted decisions which can be performed sequentially, from the high priority data units to the low priority data units, thereby significantly reducing the computational complexity. When the statistical knowledge of the multimedia data characteristics and channel conditions is unknown a priori, we develop a low-complexity online learning algorithm to update the value functions which capture the impact of the current decision on the future utility. The simulation results show that the proposed solution significantly outperforms existing state-of-the-art scheduling solutions.

1007.0380 2026-06-03 math.NA cs.LG cs.NA 版本更新

Additive Non-negative Matrix Factorization for Missing Data

缺失数据的加性非负矩阵分解

Mithun Das Gupta

AI总结提出一种加性非负矩阵分解方法，通过联合优化缺失属性和分解因子来生成测试数据中的缺失属性，并证明算法的单调收敛性。

Comments General extension of the NMF framework

0910.0921 2026-06-03 cs.LG cs.NA math.NA 版本更新

Low-rank Matrix Completion with Noisy Observations: a Quantitative Comparison

含噪声观测的低秩矩阵补全：定量比较

Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

AI总结本文通过仿真平台定量比较了三种主流低秩矩阵补全算法（OptSpace、ADMiRA和FPCA）在噪声观测下的性能，并展示了它们在真实数据和随机生成数据上的准确重建能力。

Comments 7 pages, 7 figures, 47th Allerton Conference on Communication Control and Computing, 2009, invited paper

0909.5000 2026-06-03 cs.LG cs.NA cs.NE math.NA 版本更新

Eignets for function approximation on manifolds

用于流形上函数逼近的特征网络

H. N. Mhaskar

AI总结针对紧致光滑黎曼流形上的函数逼近问题，提出一种基于核函数线性组合的特征网络（eignet）的确定性通用算法，给出最优逼近阶估计并证明系数有界性及导数逼近的最优性。

Comments 28 pages. Articles in press; Applied and Computational Harmonic Analysis, 2009

详情

AI中文摘要

设 $\mathbb{X}$ 为无边界紧致光滑连通黎曼流形，$G:\mathbb{X}\times\mathbb{X}\to \mathbb{R}$ 为核函数。类似于径向基函数网络，特征网络（eignet）形如 $\sum_{j=1}^M a_jG(\circ,y_j)$，其中 $a_j\in\mathbb{R}$，$y_j\in\mathbb{X}$，$1\le j\le M$。我们描述了一种确定性的通用算法，用于构造逼近 $L^p(\mu;\mathbb{X})$ 中函数的特征网络，适用于一类广泛的测度 $\mu$ 和核 $G$。我们的算法产生线性算子。以中心 $y_j$ 之间的最小间隔作为逼近代价，我们给出了特征网络逼近度的光滑模估计，并通过逆定理证明这些估计对每个个体函数都是最优的。我们还根据特征网络的范数给出了系数 $a_j$ 的估计。最后，我们证明：如果任何特征网络序列满足光滑函数逼近度的最优估计（以最小间隔度量），那么特征网络的导数也以最优方式逼近目标函数的相应导数。

英文摘要

Let $\mathbb{X}$ be a compact, smooth, connected, Riemannian manifold without boundary, $G:\mathbb{X}\times\mathbb{X}\to \mathbb{R}$ be a kernel. Analogous to a radial basis function network, an eignet is an expression of the form $\sum_{j=1}^M a_jG(\circ,y_j)$, where $a_j\in\mathbb{R}$, $y_j\in\mathbb{X}$, $1\le j\le M$. We describe a deterministic, universal algorithm for constructing an eignet for approximating functions in $L^p(\mu;\mathbb{X})$ for a general class of measures $\mu$ and kernels $G$. Our algorithm yields linear operators. Using the minimal separation amongst the centers $y_j$ as the cost of approximation, we give modulus of smoothness estimates for the degree of approximation by our eignets, and show by means of a converse theorem that these are the best possible for every individual function. We also give estimates on the coefficients $a_j$ in terms of the norm of the eignet. Finally, we demonstrate that if any sequence of eignets satisfies the optimal estimates for the degree of approximation of a smooth function, measured in terms of the minimal separation, then the derivatives of the eignets also approximate the corresponding derivatives of the target function in an optimal manner.

0810.0877 2026-06-03 math.NA cs.LG cs.NA 版本更新

Bias-Variance Techniques for Monte Carlo Optimization: Cross-validation for the CE Method

蒙特卡洛优化的偏差-方差技术：CE方法的交叉验证

Dev Rajnarayan, David Wolpert

AI总结本文利用偏差-方差权衡和交叉验证技术改进交叉熵（CE）方法在蒙特卡洛优化中的性能，并指出参数学习中的技术可推广至优化算法。

详情

AI中文摘要

本文在蒙特卡洛优化（MCO）和参数学习（PL）的广泛背景下考察了CE方法，后者是一种机器学习。一个众所周知的用于提高许多PL算法性能的总体原则是偏差-方差权衡。该权衡已被用于改进从蒙特卡洛积分估计到线性估计再到一般统计估计的PL算法。此外，如所述，MCO与PL密切相关。由于这种相似性，偏差-方差权衡影响MCO性能，正如它影响PL性能一样。在本文中，我们利用偏差-方差权衡来增强MCO算法的性能。我们使用交叉验证技术，一种基于偏差-方差权衡的技术，来显著改进交叉熵（CE）方法（一种MCO算法）的性能。在先前的工作中，我们已确认其他PL技术改进了其他MCO算法的性能。我们得出结论，PL中开创的许多技术可以研究作为改进一般MCO算法，特别是CE方法的方法。

英文摘要

In this paper, we examine the CE method in the broad context of Monte Carlo Optimization (MCO) and Parametric Learning (PL), a type of machine learning. A well-known overarching principle used to improve the performance of many PL algorithms is the bias-variance tradeoff. This tradeoff has been used to improve PL algorithms ranging from Monte Carlo estimation of integrals, to linear estimation, to general statistical estimation. Moreover, MCO is very closely related to PL. Owing to this similarity, the bias-variance tradeoff affects MCO performance, just as it does PL performance. In this article, we exploit the bias-variance tradeoff to enhance the performance of MCO algorithms. We use cross-validation, a technique based on the bias-variance tradeoff, to significantly improve the performance of the Cross Entropy (CE) method, which is an MCO algorithm. In previous work we have confirmed that other PL techniques improve the performance of other MCO algorithms. We conclude that the many techniques pioneered in PL could be investigated as ways to improve MCO algorithms in general, and the CE method in particular.
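
As background for the CE method named in this abstract, a minimal sketch of the plain cross-entropy method for one-dimensional minimization follows. It does not include the cross-validation step the paper proposes. The Gaussian sampling family, the function name, and all parameter defaults are assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_minimize(objective, mean=0.0, std=5.0, n_samples=100,
                           n_elite=10, n_iters=30, seed=0):
    """Minimize `objective` over the reals with a basic cross-entropy loop.

    Each iteration samples from N(mean, std^2), keeps the n_elite samples
    with the lowest objective values, and refits mean and std to them.
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        elite = sorted(samples, key=objective)[:n_elite]
        mean = statistics.fmean(elite)
        # Floor the spread so the sampling distribution never fully degenerates.
        std = max(statistics.pstdev(elite), 1e-12)
    return mean


# Toy objective with its minimum at x = 3.
best = cross_entropy_minimize(lambda x: (x - 3.0) ** 2)
print(best)
```

The refit of `mean` and `std` from a finite elite set is a parameter estimation step, which is where the abstract's link between MCO and parametric learning, and hence the bias-variance tradeoff, comes in.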
