arXivDaily arXiv每日学术速递 周一至周五更新
2506.08137 2026-06-02 cs.CV cs.AI

IGraSS: Learning to Identify Infrastructure Networks from Satellite Imagery by Iterative Graph-constrained Semantic Segmentation

IGraSS: 通过迭代图约束语义分割从卫星图像中识别基础设施网络

Oishee Bintey Hoque, Abhijin Adiga, Aniruddha Adiga, Siddharth Chaudhary, Madhav V. Marathe, S. S. Ravi, Kirti Rajagopalan, Amanda Wilson, Samarth Swarup

发表机构 * Biocomplexity Institute, University of Virginia(弗吉尼亚大学生物复杂性研究所) Department of Computer Science, University of Virginia(弗吉尼亚大学计算机科学系) Department Biomedical Systems Engineering, Washington State University(华盛顿州立大学生物医学系统工程系) Earth System Science Center, University of Alabama in Huntsville(阿拉巴马大学亨茨维尔分校地球系统科学中心)

AI总结 提出IGraSS迭代框架,结合语义分割与图约束优化,将不可达运河段从18%降至3%,并提升道路网络完整性。

AI中文摘要

精确的运河网络制图对于水资源管理(包括灌溉规划和基础设施维护)至关重要。最先进的基础设施制图语义分割模型(如道路)依赖于大规模、良好标注的遥感数据集。然而,不完整或不充分的真实标注会阻碍这些学习方法。许多基础设施网络具有图级属性,如可达性(运河)或连通性(道路),可用于改进现有真实标注。本文开发了一种新颖的迭代框架IGraSS,将结合RGB和额外模态(NDWI、DEM)的语义分割模块与基于图的真实标注精化模块相结合。分割模块处理卫星图像块,而精化模块将基础设施网络视为图,在整个数据上运行。实验表明,IGraSS将不可达运河段从约18%降至3%,并且使用精化后的真实标注进行训练显著改善了运河识别。IGraSS是一个鲁棒的框架,既可用于精化噪声真实标注,也可用于从遥感影像中绘制运河网络。我们还以道路网络为例,应用不同的图论约束来完善道路网络,证明了IGraSS的有效性和泛化能力。

英文摘要

Accurate canal network mapping is essential for water management, including irrigation planning and infrastructure maintenance. State-of-the-art semantic segmentation models for infrastructure mapping, such as roads, rely on large, well-annotated remote sensing datasets. However, incomplete or inadequate ground truth can hinder these learning approaches. Many infrastructure networks have graph-level properties, such as reachability to a source (canals) or connectivity (roads), that can be leveraged to improve existing ground truth. This paper develops a novel iterative framework, IGraSS, which combines a semantic segmentation module, incorporating RGB and additional modalities (NDWI, DEM), with a graph-based ground-truth refinement module. The segmentation module processes satellite imagery patches, while the refinement module operates on the entire dataset, viewing the infrastructure network as a graph. Experiments show that IGraSS reduces unreachable canal segments from around 18% to 3%, and training with refined ground truth significantly improves canal identification. IGraSS serves as a robust framework for both refining noisy ground truth and mapping canal networks from remote sensing imagery. We also demonstrate the effectiveness and generalizability of IGraSS using road networks as an example, applying a different graph-theoretic constraint to complete road networks.
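The reachability property mentioned in the abstract can be illustrated with a small sketch. It assumes canal segments are graph nodes, that an edge means two segments touch, and that the source nodes are known; the function name and the toy network are hypothetical and are not taken from the authors' code.

```python
from collections import deque

def unreachable_fraction(adjacency, sources):
    """Return the fraction of nodes not reachable from any source node."""
    seen = set(s for s in sources if s in adjacency)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return 1 - len(seen) / len(adjacency)

# Toy network: segments 0-3 connect to the source, 4-5 are cut off by a gap.
toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
frac_before = unreachable_fraction(toy, sources=[0])

# Bridging the gap between segments 3 and 4 makes every segment reachable.
toy[3].append(4)
toy[4].append(3)
frac_after = unreachable_fraction(toy, sources=[0])
```

A refinement step in this spirit would look for gaps whose closure lowers the unreachable fraction; how IGraSS chooses those gaps is not described in the abstract.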

2506.09035 2026-06-02 cs.CV

Princeton365: A Diverse Dataset with Accurate Camera Pose

Princeton365: 一个具有精确相机位姿的多样化数据集

Karhan Kayan, Stamatis Alexandropoulos, Rishabh Jain, Yiming Zuo, Erich Liang, Jia Deng

发表机构 * Princeton University(普林斯顿大学)

AI总结 提出Princeton365数据集,包含365个视频和精确相机位姿,通过校准板和360度相机的新颖真值采集框架弥合精度与多样性差距,并引入基于光流的尺度感知评估指标及新颖视图合成基准。

Comments Update v2: Match the ICCV 2025 camera-ready version. Fix typos

AI中文摘要

我们介绍了Princeton365,一个包含365个视频的大规模多样化数据集,具有精确的相机位姿。我们的数据集通过引入一种新颖的真值采集框架,利用校准板和360度相机,弥合了当前SLAM基准中精度与数据多样性之间的差距。我们收集了室内、室外和物体扫描视频,并同步输出单目和立体RGB视频以及IMU数据。我们进一步提出了一种基于相机位姿估计误差引起的光流的新场景尺度感知SLAM评估指标。与当前指标相比,我们的新指标允许跨场景比较SLAM方法的性能,而现有指标如平均轨迹误差(ATE)则不能,从而使研究人员能够分析其方法的失败模式。我们还提出了一个具有挑战性的新颖视图合成基准,涵盖了当前NVS基准未覆盖的情况,例如具有360度相机轨迹的完全非朗伯场景。请访问 https://princeton365.cs.princeton.edu 获取数据集、代码、视频和提交信息。

英文摘要

We introduce Princeton365, a large-scale diverse dataset of 365 videos with accurate camera pose. Our dataset bridges the gap between accuracy and data diversity in current SLAM benchmarks by introducing a novel ground truth collection framework that leverages calibration boards and a 360-camera. We collect indoor, outdoor, and object scanning videos with synchronized monocular and stereo RGB video outputs as well as IMU data. We further propose a new scene scale-aware evaluation metric for SLAM based on the optical flow induced by the camera pose estimation error. Unlike existing metrics such as Average Trajectory Error (ATE), our new metric allows the performance of SLAM methods to be compared across scenes, enabling researchers to analyze the failure modes of their methods. We also propose a challenging Novel View Synthesis benchmark that covers cases not covered by current NVS benchmarks, such as fully non-Lambertian scenes with 360-degree camera trajectories. Please visit https://princeton365.cs.princeton.edu for the dataset, code, videos, and submission.

2505.19489 2026-06-02 cs.AI cs.SE

Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults

驯服系统复杂性:揭秘软件工程代理在诊断Linux内核故障中的作用

Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou

发表机构 * Fudan University(复旦大学) Nanyang Technological University(南洋理工大学)

AI总结 针对Linux内核故障定位挑战,提出LinuxFLBench基准和LinuxFL$^+$增强框架,将现有LLM代理的文件级top-1准确率从41.6%提升7.2%-11.2%。

Comments Accepted to ACL 2026

AI中文摘要

Linux内核是一个关键系统,作为众多系统的基础。Linux内核中的错误可能导致严重后果,影响数十亿用户。故障定位(FL)旨在识别软件中的错误代码元素,在软件质量保证中起着至关重要的作用。虽然最近的LLM代理在SWE-bench等最新基准测试中取得了有希望的FL准确率,但目前尚不清楚这些方法在Linux内核中的表现如何,因为由于大规模代码库、有限的可观测性和多样的影响因素,FL在该领域更具挑战性。在本文中,我们介绍了LinuxFLBench,这是一个从真实世界Linux内核错误构建的FL基准。我们进行了一项实证研究,以评估最先进的LLM代理在Linux内核上的性能。我们的初步结果显示,现有代理在此任务上表现不佳,文件级最佳top-1准确率仅为41.6%。为应对这一挑战,我们提出了LinuxFL$^+$,一个旨在提高LLM代理在Linux内核中FL有效性的增强框架。LinuxFL$^+$以最小的成本显著提高了所有研究代理的FL准确率(例如,准确率提升7.2% - 11.2%)。

英文摘要

The Linux kernel is a critical system, serving as the foundation for numerous systems. Bugs in the Linux kernel can cause serious consequences, affecting billions of users. Fault localization (FL), which aims to identify the buggy code elements in software, plays an essential role in software quality assurance. While recent LLM agents have achieved promising FL accuracy on recent benchmarks like SWE-bench, it remains unclear how well these methods perform in the Linux kernel, where FL is much more challenging due to the large-scale code base, limited observability, and diverse impact factors. In this paper, we introduce LinuxFLBench, an FL benchmark constructed from real-world Linux kernel bugs. We conduct an empirical study to assess the performance of state-of-the-art LLM agents on the Linux kernel. Our initial results reveal that existing agents struggle with this task, achieving a best top-1 accuracy of only 41.6% at the file level. To address this challenge, we propose LinuxFL$^+$, an enhancement framework designed to improve the FL effectiveness of LLM agents for the Linux kernel. LinuxFL$^+$ substantially improves the FL accuracy of all studied agents (e.g., a 7.2% - 11.2% accuracy increase) at minimal cost.

2505.18113 2026-06-02 cs.LG math.OC

Beyond Discreteness: Sample Complexity Analysis of Straight-Through Estimator for 1-bit Quantization

超越离散性:1比特量化的直通估计器样本复杂度分析

Halyun Jeong, Jack Xin, Penghang Yin

发表机构 * Department of Mathematics and Statistics, University at Albany, SUNY(纽约州立大学阿尔巴尼分校数学与统计学系) Department of Mathematics, University of California, Irvine(加州大学欧文分校数学系)

AI总结 本文首次对神经网络量化中直通估计器(STE)的样本复杂度进行分析,通过研究具有二元权重和激活的两层神经网络的量化感知训练,推导出保证STE优化收敛到全局最小值的样本复杂度界,并发现标签噪声下STE梯度方法的循环逃逸与回归特性,以及STE在非高斯数据上失效但可通过归一化恢复有效性。

AI中文摘要

训练量化神经网络需要解决底层优化问题的非可微和离散性质。为应对这一挑战,直通估计器(STE)已成为最广泛采用的启发式方法,通过引入有偏但有效的替代梯度,允许通过离散操作进行反向传播。然而,其理论性质在很大程度上仍未被探索,现有少数分析通过假设无限训练数据来关注泛化误差。相比之下,本文首次在神经网络量化背景下对STE进行了样本复杂度分析。我们的理论结果强调了样本量在STE成功中的关键作用,这是现有研究缺失的关键见解。具体而言,通过分析具有二元权重和激活的两层神经网络的量化感知训练,我们推导出以数据维度表示的样本复杂度界,这些界保证了基于STE的优化在遍历和非遍历分析中收敛到全局最小值。此外,在存在标签噪声的情况下,我们证明了STE梯度方法的一个有趣循环性质,其中迭代反复逃离并返回到最优二元权重。最后,我们实验证明STE在一般非高斯数据上失败,但通过归一化可以恢复其有效性,这突显了其在有效量化中的实际重要性。

英文摘要

Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted heuristic, allowing backpropagation through discrete operations by introducing biased yet valid surrogate gradients. However, its theoretical properties remain largely unexplored, and the few existing analyses focus on the generalization error by assuming an infinite amount of training data. In contrast, this work presents the first sample complexity analysis of STE in the context of neural network quantization. Our theoretical results highlight the critical role of sample size in the success of STE, a key insight absent from existing studies. Specifically, by analyzing the quantization-aware training of a two-layer neural network with binary weights and activations, we derive sample complexity bounds in terms of the data dimensionality that guarantee the convergence of STE-based optimization to the global minimum for both ergodic and non-ergodic analyses. Moreover, in the presence of label noise, we prove an intriguing recurrence property of the STE-gradient method, where the iterates repeatedly escape from and return to the optimal binary weights. Finally, we empirically demonstrate that STE fails for general non-Gaussian data but that its effectiveness can be restored through normalization, underscoring the practical importance of normalization in effective quantization.
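The straight-through idea described above can be sketched on a deliberately tiny problem. The example below is an assumption-laden illustration, not the two-layer setting analyzed in the paper: it uses a single linear unit with 1-bit weights, a squared loss, a hypothetical binary teacher, and noiseless one-hot samples.

```python
def sign(v):
    """1-bit quantizer: map each entry to +1 or -1 (0 maps to +1)."""
    return [1.0 if x >= 0 else -1.0 for x in v]

def ste_step(w, x, y, lr=0.1):
    """One STE update for the squared loss 0.5 * (sign(w) . x - y)^2.

    The forward pass uses the quantized weights. The backward pass treats
    d sign(w) / d w as the identity, so the gradient with respect to the
    quantized weights is applied directly to the latent float weights.
    """
    q = sign(w)
    err = sum(qi * xi for qi, xi in zip(q, x)) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

# Toy run: recover a hypothetical binary teacher from noiseless samples.
teacher = [1.0, -1.0, 1.0]
samples = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = [-0.05, 0.05, -0.05]  # latent weights start with every sign wrong
for _ in range(5):
    for x in samples:
        y = sum(t * xi for t, xi in zip(teacher, x))
        w = ste_step(w, x, y)
recovered = sign(w)
```

Only the latent weights `w` are updated; the network itself always sees `sign(w)`, which is why the true gradient would be zero almost everywhere and a surrogate is needed.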

2503.24183 2026-06-02 cs.LG cs.MA

Scalable Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantee: A Constrained Mean-Field Reinforcement Learning Approach

可扩展的网约车再平衡方法:具有服务可及性保证的约束平均场强化学习

Matej Jusup, Kenan Zhang, Zhiyuan Hu, Barna Pásztor, Andreas Krause, Francesco Corman

发表机构 * ETH Zürich(苏黎世联邦理工学院) EPFL Lausanne(洛桑联邦理工学院)

AI总结 提出基于约束平均场强化学习的连续状态再平衡模型,在保证服务公平性的同时实现大规模网约车车队的高效协调与可扩展性。

Comments 34 pages, 15 figures

Journal ref
Transportation Research Part C: Emerging Technologies, Vol. 188, 105705 (2026)
AI中文摘要

Uber和Lyft等网约车服务的扩张通过移动应用提供灵活的按需出行,重塑了城市交通。尽管便利,这些平台面临重大运营挑战,尤其是车辆再平衡——即战略性地重新定位车队以解决供需的时空错配。再平衡不足会导致乘客等待时间延长和车辆利用率低下,还会引发公平性问题,如服务分布不均和司机收入差异。为解决这些问题,我们引入了具有连续再平衡动作的连续状态平均场控制(MFC)和平均场强化学习(MFRL)模型。MFC和MFRL通过车辆与车辆分布(而非单个车辆)的交互来建模每辆车的行为,从而提供可扩展的解决方案。这缓解了关于智能体数量的维度灾难,使得能够以显著降低的计算复杂度协调大型车队,并在车队规模变化时无需重新训练模型。为确保跨地理区域的公平服务可及性,我们将可及性约束整合到模型中,并推导出在高度满足乘客需求和公平覆盖车辆供应之间取得平衡的再平衡策略。使用深圳数据驱动模拟的广泛评估证明了我们方法的效率和鲁棒性。值得注意的是,该方法可扩展到数万辆车辆,训练时间与线性规划再平衡相当。此外,我们的策略有效探索了效率-公平帕累托前沿,在车队利用率、完成请求数和接驾距离等关键指标上优于传统基准,同时确保公平的服务可及性。

英文摘要

The expansion of ride-sourcing services such as Uber and Lyft has reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite their convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing: the strategic repositioning of a fleet of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing not only results in prolonged rider waiting times and inefficient vehicle utilization, but also leads to fairness issues, such as the inequitable distribution of service and disparities in driver income. To tackle these issues, we introduce continuous-state mean-field control (MFC) and mean-field reinforcement learning (MFRL) models with continuous repositioning actions. MFC and MFRL offer scalable solutions by modeling each vehicle's behavior through interaction with the vehicle distribution, rather than with individual vehicles. This mitigates the curse of dimensionality with respect to the number of agents, enabling coordination across large fleets with significantly reduced computational complexity and eliminating the need to retrain the model when fleet size changes. To ensure equitable service access across geographic regions, we integrate an accessibility constraint into the models and derive rebalancing policies that strike a balance between high fulfillment of rider demand and fair coverage of vehicle supply. Extensive evaluation using a data-driven simulation of Shenzhen demonstrates the efficiency and robustness of our approach. Remarkably, it scales to tens of thousands of vehicles, with training times comparable to linear programming rebalancing. In addition, our policies effectively explore the efficiency-equity Pareto front, outperforming conventional benchmarks across key metrics such as fleet utilization, fulfilled requests, and pickup distance, while ensuring equitable service access.

2505.13273 2026-06-02 cs.AI cs.LG

EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion

EMoE: 面向不确定性感知的文本到图像扩散的无训练专家分歧方法

Lucas Berry, Axel Brando, Wei-Di Chang, Juan Camilo Gamboa Higuera, David Meger

发表机构 * McGill University(麦吉尔大学) Barcelona Supercomputing Center (BSC)(巴塞罗那超级计算中心 (BSC)) Ideogram AI

AI总结 提出EMoE方法,通过预训练MoE扩散模型中早期MoE层的专家分歧,无需训练即可估计认知不确定性,用于提示风险诊断和生成质量排序。

AI中文摘要

大型文本到图像扩散模型很少在提示可能产生低质量生成时提供可靠信号,尤其是在训练数据未公开的情况下。我们研究预训练混合专家(MoE)扩散模型中的专家分歧是否可以作为认知不确定性的可靠估计。我们引入EMoE,一种无训练方法,在早期MoE层分离专家特定的计算路径,跨路径使用相同的初始噪声,并在第一步去噪后测量其潜在表示之间的方差。这提供了在完整图像生成之前的不确定性感知提示信号,无需辅助网络或训练扩散集成。在COCO和CC3M上,EMoE根据文本-图像对齐质量指标对提示进行排序,比扩散特定和基于路由的基线更一致。我们进一步将EMoE应用于多语言提示,并发现分歧和生成质量中存在系统的语言依赖性差异,包括共享词汇效应。这些结果使EMoE成为MoE文本到图像扩散模型中提示风险、模型覆盖和偏差分析的实用诊断工具。

英文摘要

Large text-to-image diffusion models rarely expose reliable signals of when a prompt is likely to produce a poorly aligned generation, especially when training data is undisclosed. We study whether expert disagreement inside pre-trained mixture-of-experts (MoE) diffusion models can serve as a reliable estimate for epistemic uncertainty. We introduce EMoE, a training-free method that separates expert-specific computation paths at an early MoE layer, uses the same initial noise across paths, and measures variance among their latent representations after the first denoising step. This provides an uncertainty-aware prompt signal before full image generation, without auxiliary networks or training diffusion ensembles. On COCO and CC3M, EMoE ranks prompts by text-image alignment quality metrics more consistently than diffusion-specific and router-based baselines. We further apply EMoE to multilingual prompts and find systematic language-dependent differences in disagreement and generation quality, including shared-vocabulary effects. These results position EMoE as a practical diagnostic tool for prompt risk, model coverage, and bias analysis in MoE text-to-image diffusion models.
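The disagreement signal described above amounts to a variance across expert-specific latents. The sketch below assumes each expert path yields one flat latent vector after the first denoising step and that the per-dimension population variances are averaged into a single score; both the aggregation and the function name are assumptions, not the authors' implementation.

```python
from statistics import pvariance

def expert_disagreement(latents):
    """Mean per-dimension variance across expert-specific latents.

    `latents` holds one flat latent vector per expert path, all produced
    from the same prompt and the same initial noise.
    """
    per_dim = [pvariance(values) for values in zip(*latents)]
    return sum(per_dim) / len(per_dim)

# Experts that agree give a score of zero; diverging experts give a larger one.
agree = expert_disagreement([[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]])
disagree = expert_disagreement([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

Prompts could then be ranked by this score, with higher values flagged as riskier.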

2503.03137 2026-06-02 cs.AI cs.LG cs.NE

Learning to Reduce Search Space for Generalizable Neural Routing Solver

学习减少搜索空间以实现泛化的神经路由求解器

Changliang Zhou, Xi Lin, Zhenkun Wang, Qingfu Zhang

发表机构 * School of Automation and Intelligent Manufacturing(自动化与智能制造学院) Southern University of Science and Technology(南方科技大学) School of Mathematics and Statistics(数学与统计学学院) Xi'an Jiaotong University(西安交通大学) Department of Computer Science(计算机科学系) City University of Hong Kong(香港城市大学)

AI总结 提出首个基于学习的动态搜索空间缩减框架L2R,通过自适应剪枝节点来高效求解大规模车辆路径问题,在千万节点规模上保持高质量解。

Comments accepted by SIGKDD 2026

AI中文摘要

构造性神经组合优化(NCO)通过直接学习构造近似最优解,为解决车辆路径问题(VRPs)提供了一种有前景的范式,从而减少了对算法设计专家知识的依赖。然而,由于高计算复杂度,将这些方法扩展到大规模实例仍然具有挑战性。虽然最近的动态搜索空间缩减(SSR)方法可以通过基于几何距离的剪枝提高推理效率,但它们通常难以处理具有非均匀分布的复杂实例,或者当最优解严重依赖于非空间约束时。为了解决这一关键问题,我们提出了学习减少(L2R),这是首个基于学习的动态SSR框架。L2R通过从问题特定特征中提取模式来学习自适应地优先考虑节点,从而在每一步剪枝搜索空间,实现高效且可扩展的解构造。大量实验表明,我们的L2R框架在不同的VRP变体上对不同问题规模和数据分布具有稳健的泛化能力。据我们所知,L2R是首个有效扩展到具有1000万个节点的VRP实例同时保持高质量解的神经求解器,这显著推动了NCO在泛化和可扩展性方面的前沿。我们的代码可在https://github.com/CIAM-Group/L2R获取。

英文摘要

Constructive neural combinatorial optimization (NCO) offers a promising paradigm for solving vehicle routing problems (VRPs) by directly learning to construct near-optimal solutions, thereby reducing reliance on expert knowledge for algorithm design. However, scaling these methods to large-scale instances remains challenging due to high computational complexity. While recent dynamic search space reduction (SSR) methods can improve inference efficiency through geometric distance-based pruning, they often struggle on complex instances with non-uniform distributions or when optimal solutions rely heavily on non-spatial constraints. To address this critical issue, we propose Learning to Reduce (L2R), the first learning-based dynamic SSR framework. L2R learns to adaptively prioritize nodes by extracting patterns from problem-specific features to prune the search space at each step, enabling efficient and scalable solution construction. Extensive experiments show that our L2R framework generalizes robustly to different problem scales and data distributions on various VRP variants. To the best of our knowledge, L2R is the first neural solver to effectively scale to VRP instances with 10 million nodes while maintaining high solution quality, which significantly pushes the frontier of NCO in terms of generalization and scalability. Our code is available at https://github.com/CIAM-Group/L2R.

2505.10882 2026-06-02 cs.LG stat.ML

Global Convergence of Adaptive Sensing for Principal Eigenvector Estimation

主特征向量估计的自适应传感的全局收敛性

Alex Saad-Falcon, Brighton Ancelin, Justin Romberg

发表机构 * School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA(电子与计算机工程学院,佐治亚理工学院,亚特兰大,GA,USA)

AI总结 本文分析Oja算法的一种压缩变体,利用每样本两个自适应测量估计协方差矩阵的主特征向量,证明了期望正弦平方误差的收敛速率并给出信息论下界,揭示了压缩带来的维度代价。

Comments Accepted at ICML 2026. 34 pages (9 main text + appendices), 4 figures, 2 tables. v2 (camera-ready) adds a matching information-theoretic lower bound and a non-adaptive lower-bound separation across three powers of d; substantially revised from v1

AI中文摘要

主成分分析经典地需要完整的$d$维样本,但在各种应用中,硬件限制每次采集只能获得少量标量测量。我们分析了Oja算法的一种压缩变体,用于估计数据协方差矩阵的主特征向量,每样本仅使用两个自适应测量。在每次迭代中,我们沿着当前估计方向进行一次测量,并在随机正交方向上进行一次测量。我们证明,经过$t$次迭代后,到真实特征向量的期望正弦平方误差为$\mathcal{O}(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$,其中$d$是环境维度,$\lambda_1, \lambda_2$是前导特征值,$\Delta = \lambda_1 - \lambda_2$是特征间隙。我们用一个匹配的信息论下界$\Omega(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$来补充这一结果——这是压缩特征向量估计的第一个下界——证明$d^2$因子(与完全观测的极小极大速率$\Theta(\lambda_1\lambda_2 d / (\Delta^2 t))$相比多了一个$d$因子)是压缩的基本代价,无法改进。相比之下,每次迭代两次测量的任何非自适应方案都会遭受$\Omega(\lambda_2^2 d^3 / (\Delta^2 t))$的误差,多了一个$d$的幂次。这通过$d$的三个幂次将完全观测PCA、自适应压缩PCA和非自适应压缩PCA区分开来。我们的分析处理了协方差具有非零尾部特征值的噪声设置,为无噪声情况之外的自适应压缩子空间跟踪提供了首个收敛性保证。

英文摘要

Principal component analysis classically requires full $d$-dimensional samples, yet in various applications hardware limits acquisition to a few scalar measurements per sample. We analyze a compressed variant of Oja's algorithm for estimating the principal eigenvector of the data covariance matrix using only two adaptive measurements per sample. At each iteration, we observe one measurement along the current estimate and one in a random orthogonal direction. We prove that after $t$ iterations, the expected sine-squared error to the true eigenvector is $\mathcal{O}(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$, where $d$ is the ambient dimension, $\lambda_1, \lambda_2$ are the leading eigenvalues, and $\Delta = \lambda_1 - \lambda_2$ is the eigengap. We complement this with a matching information-theoretic lower bound of $\Omega(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$ (the first for compressed eigenvector estimation), proving that the $d^2$ factor, an additional factor of $d$ compared to the fully-observed minimax rate $\Theta(\lambda_1\lambda_2 d / (\Delta^2 t))$, is the fundamental cost of compression and cannot be improved. In contrast, any non-adaptive scheme with two measurements per iteration suffers $\Omega(\lambda_2^2 d^3 / (\Delta^2 t))$, an additional power of $d$. This separates fully-observed, adaptive-compressed, and non-adaptive-compressed PCA across three powers of $d$. Our analysis handles the noisy setting where the covariance has nonzero trailing eigenvalues, providing the first convergence guarantee for adaptive compressed subspace tracking beyond the noiseless case.
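To make the separation across three powers of $d$ concrete, the rates quoted in the abstract can be evaluated numerically. This is only an arithmetic sketch: all hidden constants are set to 1 (an assumption), the non-adaptive figure is a lower bound rather than an achieved rate, and the parameter values are hypothetical.

```python
def error_rates(lam1, lam2, d, t):
    """Evaluate the three rates from the abstract with constants set to 1."""
    gap = lam1 - lam2
    full = lam1 * lam2 * d / (gap ** 2 * t)               # fully observed
    adaptive = lam1 * lam2 * d ** 2 / (gap ** 2 * t)      # two adaptive measurements
    non_adaptive = lam2 ** 2 * d ** 3 / (gap ** 2 * t)    # two non-adaptive measurements
    return full, adaptive, non_adaptive

full, adaptive, non_adaptive = error_rates(lam1=2.0, lam2=1.0, d=100, t=10**6)
```

With these values the adaptive rate is exactly $d = 100$ times the fully observed one, and the non-adaptive bound is larger again by a factor of $\lambda_2 d / \lambda_1 = 50$.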

2503.06473 2026-06-02 cs.CV cs.AI

Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals

通过剪枝冗余检索增强层注意力效率

Hanze Li, Yaosong Du, Zhibo Yao, Mengyao Zeng, Xiuqi Ge, Xiande Huang

发表机构 * De Artificial Intelligence Lab(德人工智能实验室)

AI总结 针对层注意力机制中相邻层权重冗余导致特征重复和训练效率低的问题,提出基于KL散度量化冗余并利用增强Beta分位数映射(EBQM)跳过冗余层的高效层注意力(ELA)架构,在图像分类和目标检测任务中训练时间减少30%且性能提升。

Comments 5 pages

AI中文摘要

越来越多的证据表明,层注意力机制增强了深度神经网络中层间的交互,显著推进了网络架构的发展。然而,现有的层注意力方法存在冗余问题,因为相邻层学习的注意力权重往往变得高度相似。这种冗余导致多个层提取几乎相同的特征,降低了模型的表示能力并增加了训练时间。为了解决这个问题,我们提出了一种新颖的方法,利用相邻层之间的Kullback-Leibler(KL)散度来量化冗余。此外,我们引入了一种增强Beta分位数映射(EBQM)方法,能够准确识别并跳过冗余层,从而保持模型稳定性。我们提出的高效层注意力(ELA)架构提高了训练效率和整体性能,在图像分类和目标检测等任务中实现了30%的训练时间减少,同时提升了性能。

英文摘要

Growing evidence suggests that layer attention mechanisms, which enhance interaction among layers in deep neural networks, have significantly advanced network architectures. However, existing layer attention methods suffer from redundancy, as attention weights learned by adjacent layers often become highly similar. This redundancy causes multiple layers to extract nearly identical features, reducing the model's representational capacity and increasing training time. To address this issue, we propose a novel approach to quantify redundancy by leveraging the Kullback-Leibler (KL) divergence between adjacent layers. Additionally, we introduce an Enhanced Beta Quantile Mapping (EBQM) method that accurately identifies and skips redundant layers while maintaining model stability. Our proposed Efficient Layer Attention (ELA) architecture improves both training efficiency and overall performance, achieving a 30% reduction in training time while enhancing performance in tasks such as image classification and object detection.
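The redundancy measure can be illustrated with a short sketch. It assumes each layer's attention weights form a discrete probability distribution and flags a layer as redundant when its KL divergence from the previous layer falls below a fixed threshold. The fixed threshold is a stand-in chosen for illustration; the abstract does not specify how EBQM sets the cutoff, and the direction of the divergence is likewise an assumption.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def redundant_layers(attn_weights, threshold):
    """Indices of layers whose attention weights nearly repeat the previous layer's."""
    return [
        i
        for i in range(1, len(attn_weights))
        if kl_divergence(attn_weights[i], attn_weights[i - 1]) < threshold
    ]

weights = [
    [0.70, 0.20, 0.10],
    [0.69, 0.21, 0.10],  # nearly identical to layer 0
    [0.10, 0.30, 0.60],  # clearly different
]
skip = redundant_layers(weights, threshold=0.01)
```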

2504.21427 2026-06-02 cs.LG cs.AI

MPEC: Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers

MPEC:通过集成基于聚类的分类器实现流形保持的脑电图分类

Shermin Shahbazi, Mohammad-Reza Nasiri, Majid Ramezani

发表机构 * Department of Electrical and Computer(电气与计算机系) Department of Computer Science and Engineering, Information Technology(计算机科学与工程系,信息科技)

AI总结 提出MPEC方法,通过协方差矩阵和RBF核的特征工程以及黎曼流形上的改进K-means聚类集成,解决EEG信号的非欧几里得流形结构问题,在BCI Competition IV数据集2a上取得显著提升。

Comments 7 pages, 3 figures

AI中文摘要

脑电图信号的准确分类对于脑机接口(BCI)和神经假体应用至关重要,然而许多现有方法未能考虑EEG数据的非欧几里得流形结构,导致性能欠佳。保留这种流形信息对于捕捉EEG信号的真实几何结构至关重要,但传统分类技术在很大程度上忽视了这一需求。为此,我们提出了MPEC(通过集成基于聚类的分类器实现流形保持的EEG分类),它引入了两项关键创新:(1)一个特征工程阶段,结合协方差矩阵和径向基函数(RBF)核来捕捉EEG通道之间的线性和非线性关系;(2)一个聚类阶段,采用针对黎曼流形空间定制的改进K-means算法,确保局部几何敏感性。通过集成多个基于聚类的分类器,MPEC取得了优越的结果,并在BCI Competition IV数据集2a上得到了显著改进的验证。

英文摘要

Accurate classification of EEG signals is crucial for brain-computer interfaces (BCIs) and neuroprosthetic applications, yet many existing methods fail to account for the non-Euclidean, manifold structure of EEG data, resulting in suboptimal performance. Preserving this manifold information is essential to capture the true geometry of EEG signals, but traditional classification techniques largely overlook this need. To this end, we propose MPEC (Manifold-Preserved EEG Classification via an Ensemble of Clustering-Based Classifiers), which introduces two key innovations: (1) a feature engineering phase that combines covariance matrices and Radial Basis Function (RBF) kernels to capture both linear and non-linear relationships among EEG channels, and (2) a clustering phase that employs a modified K-means algorithm tailored for the Riemannian manifold space, ensuring local geometric sensitivity. By ensembling multiple clustering-based classifiers, MPEC achieves superior results, validated by significant improvements on the BCI Competition IV dataset 2a.

2504.17471 2026-06-02 cs.LG cs.AI cs.DC

GRANITE: a Byzantine-Resilient Dynamic Gossip Learning Framework

GRANITE:一种拜占庭鲁棒的动态八卦学习框架

Yacine Belal, Mohamed Maouche, Sonia Ben Mokhtar

发表机构 * CEA, List, Université Paris-Saclay Palaiseau(CEA、List、巴黎-萨克雷大学帕莱索分校) INRIA, INSA Lyon, CITI, UR3720(INRIA、里昂INSA、CITI、UR3720) LIRIS, INSA Lyon, CNRS Lyon(LIRIS、里昂INSA、里昂CNRS)

AI总结 针对动态八卦学习中拜占庭节点通过毒化模型和操纵节点采样发起的双重攻击,提出GRANITE框架,通过累积节点标识知识并动态调整聚合阈值,实现鲁棒学习,理论证明拜占庭节点在局部邻域呈指数衰减,实验表明在30%拜占庭节点下精度接近非拜占庭场景,且通信成本降低9倍。

AI中文摘要

八卦学习是一种去中心化的学习范式,用户通过迭代地与少量邻居节点交换和聚合模型。最近的方法依赖于使用随机节点采样协议构建的动态通信图,这些协议已被证明可以加速收敛。然而,我们表明这些方法容易受到双重攻击:拜占庭节点可以毒化模型并操纵节点采样以放大其影响力。我们通过GRANITE框架应对这种组合威胁,该框架用于在存在拜占庭节点的稀疏动态图上进行鲁棒学习。GRANITE随时间累积关于遇到的节点标识的知识,并根据每个节点邻域中估计的拜占庭密度动态调整局部聚合阈值。我们证明,在GRANITE下,局部邻域中的拜占庭节点呈现指数衰减。我们进一步推导了GRANITE生成图的鲁棒性条件。实验结果表明,在30%拜占庭节点下,GRANITE的收敛精度在非拜占庭精度的5%以内,收敛速度更快,且通信成本降低高达9倍。

英文摘要

Gossip Learning (GL) is a decentralized learning paradigm where users iteratively exchange and aggregate models with a small set of neighboring peers. Recent approaches rely on dynamic communication graphs built using Random Peer Sampling (RPS) protocols, which have been proven to accelerate convergence. However, we show that these approaches are vulnerable to a dual attack: Byzantine nodes can poison models and manipulate peer sampling to amplify their influence. We address this combination of threats with GRANITE, a framework for robust learning over sparse, dynamic graphs in the presence of Byzantine nodes. GRANITE accumulates knowledge about encountered node identifiers over time and dynamically adjusts local aggregation thresholds based on the estimated Byzantine density in the neighborhood of each node. We demonstrate that under GRANITE, the Byzantine presence in local neighborhoods exhibits an exponential decay. We further derive the robustness conditions of the graphs generated by GRANITE. Empirically, our results indicate that GRANITE converges within 5% of non-Byzantine accuracy under 30% Byzantine nodes, offers faster convergence, and operates on graphs with up to 9x lower communication cost.

2406.09953 2026-06-02 cs.RO cs.AI

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

DAG-Plan:生成有向无环依赖图用于双臂协作规划

Zeyu Gao, Yao Mu, Jinye Qu, Mengkang Hu, Shijia Peng, Chengkai Hou, Lingyue Guo, Ping Luo, Shanghang Zhang, Yanfeng Lu

发表机构 * State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA)(多模态人工智能系统国家重点实验室,自动化研究所,中国科学院(CASIA)) School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)(中国科学院大学人工智能学院) School of Computer Science, Shanghai Jiao Tong University(上海交通大学计算机科学学院) State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University(多媒体信息处理国家重点实验室,北京大学计算机科学学院) Department of Computer Science, The University of Hong Kong(香港大学计算机科学系) OpenGVLab, Shanghai AI Laboratory(上海人工智能实验室,OpenGVLab)

AI总结 提出DAG-Plan框架,首次使用有向无环图作为双臂协调的核心表示,通过一次LLM解析生成结构化DAG,实现自适应并行执行,在双臂厨房基准测试中成功率提升48%,执行效率提升84.1%。

Comments ICRA 2026

AI中文摘要

双臂机器人有望提高效率,但需要规划具有非线性子任务依赖关系的复杂任务。当前使用大型语言模型(LLM)的方法存在根本性权衡:生成线性序列效率高但无法建模并行性和适应变化,而迭代查询具有适应性但过于缓慢且成本高昂。为弥合这一差距,我们引入DAG-Plan,一种新颖的任务规划框架,首次采用有向无环图(DAG)作为双臂协调的核心表示。关键洞察在于DAG天然捕获复杂的子任务依赖关系并明确揭示并行执行的机会。在该框架内,LLM仅被使用一次作为强大的语义解析器,将自然语言指令转换为结构化的DAG。在执行过程中,我们的系统基于实时环境观察动态地将候选节点分配给合适的机械臂,实现真正的自适应并行操作。在双臂厨房基准测试上的广泛评估表明,DAG-Plan的结构化方法从根本上优于现有范式。与单查询线性序列方法相比,通过稳健管理依赖关系,成功率提高了48%;与迭代查询方法相比,通过消除重复LLM调用的延迟,执行效率提高了84.1%。我们的工作表明,基于图的原则性表示是解锁高效可靠的基于LLM的复杂机器人系统规划的关键。更多演示和代码请访问 https://sites.google.com/view/dag-plan。

英文摘要

Dual-arm robots promise greater efficiency but require planning for complex tasks with nonlinear sub-task dependencies. Current methods using Large Language Models (LLMs) suffer from a fundamental trade-off: generating linear sequences is efficient but fails to model parallelism and adapt to changes, while iterative querying is adaptive but too slow and costly. To bridge this gap, we introduce DAG-Plan, a novel task planning framework that for the first time employs a Directed Acyclic Graph (DAG) as the central representation for dual-arm coordination. The key insight is that a DAG natively captures complex sub-task dependencies and explicitly reveals opportunities for parallel execution. Within this framework, an LLM is used only once as a powerful semantic parser to translate a natural language instruction into a structured DAG. During execution, our system dynamically assigns candidate nodes to the suitable arm based on real-time environmental observations, enabling truly adaptive and parallel operation. Extensive evaluation on a dual-arm kitchen benchmark shows that DAG-Plan's structured approach fundamentally outperforms existing paradigms. It achieves a 48% higher success rate than single-query linear sequence methods with dual arms by robustly managing dependencies, and an 84.1% higher execution efficiency than iterative querying methods by eliminating the latency of repeated LLM calls. Our work demonstrates that a principled, graph-based representation is the key to unlocking efficient and reliable LLM-based planning for complex robotic systems. More demos and code are available at https://sites.google.com/view/dag-plan.
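The way a DAG exposes parallelism can be shown with a minimal scheduling sketch. It assumes the DAG is given as a mapping from each sub-task to its prerequisites, and it assigns ready sub-tasks to arms in alphabetical order. That assignment rule, the sub-task names, and the `schedule` function are hypothetical; the system described above instead assigns nodes based on real-time observations.

```python
def schedule(deps, arms=("left", "right")):
    """Greedy step-by-step schedule of a task DAG on a fixed set of arms.

    `deps` maps each sub-task to the sub-tasks it depends on. At every
    step, sub-tasks whose dependencies are all finished are candidates,
    and at most one candidate is given to each arm.
    """
    done, plan = set(), []
    while len(done) < len(deps):
        ready = sorted(t for t in deps if t not in done and set(deps[t]) <= done)
        if not ready:
            raise ValueError("dependency cycle: not a DAG")
        step = dict(zip(arms, ready))  # extra candidates wait for the next step
        plan.append(step)
        done.update(step.values())
    return plan

deps = {
    "open_jar": [],
    "pick_spoon": [],
    "scoop": ["open_jar", "pick_spoon"],
    "stir": ["scoop"],
}
plan = schedule(deps)
```

Here the two independent sub-tasks run in the same step, so the four sub-tasks finish in three steps instead of the four a linear sequence would need.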

2504.04718 2026-06-02 cs.CL cs.AI

T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models

T1:小语言模型测试时计算扩展的工具集成验证

Minki Kang, Jongwon Jeong, Jaewoong Cho

发表机构 * KAIST(韩国科学技术院) University of Wisconsin-Madison(威斯康星大学麦迪逊分校) KRAFTON

AI总结 针对小语言模型在测试时扩展中验证能力不足的问题,提出T1框架,通过外部工具过滤候选输出后由小语言模型进行最终验证,显著提升验证准确率和测试时扩展性能。

Comments ICLR 2026

AI中文摘要

近期研究表明,测试时计算扩展能有效提升小语言模型(sLMs)的性能。然而,先前研究主要利用额外的大模型作为验证器进行测试时计算扩展,而sLMs自身的验证能力尚未被充分探索。本文研究sLMs在测试时扩展中能否可靠地验证输出候选。我们发现,即使从大验证器进行知识蒸馏,sLMs在需要记忆的任务(如数值计算和事实核查)上仍表现不佳。为解决这一局限,我们提出工具集成验证(T1),这是一个两阶段框架:首先用外部工具过滤候选,然后使用sLM进行最终验证,将记忆密集型步骤卸载到代码解释器等工具上。在T1框架内,我们证明卸载到外部工具可减轻sLMs的记忆负担,并提升测试时扩展性能。在MATH基准上的实验表明,采用T1的Llama-3.2 1B模型在测试时扩展下性能优于规模更大的Llama-3.1 8B模型。此外,T1提高了过程奖励模型(PRMs)和评论家模型的验证准确率。我们的发现凸显了工具集成在显著提升sLMs验证能力方面的潜力。

英文摘要

Recent studies have demonstrated that test-time compute scaling effectively improves the performance of small language models (sLMs). However, prior research has mainly examined test-time compute scaling with an additional larger model as a verifier, leaving verification by sLMs underexplored. In this work, we investigate whether sLMs can reliably verify the output candidates under test-time scaling. We find that even with knowledge distillation from larger verifiers, sLMs struggle with verification tasks requiring memorization, such as numerical calculations and fact-checking. To address this limitation, we propose Tool-integrated verification (T1), a two-stage framework that first filters candidates with external tools and then uses an sLM for final verification, offloading memorization-heavy steps to tools such as a code interpreter. Within T1, we prove that offloading to external tools reduces the memorization burden on sLMs and improves test-time scaling performance. Experiments on the MATH benchmark demonstrate that, with T1, a Llama-3.2 1B model under test-time scaling outperforms the significantly larger Llama-3.1 8B model. Moreover, T1 improves the verification accuracy of both process reward models (PRMs) and critic models. Our findings highlight the potential of tool integration to substantially improve the verification abilities of sLMs.
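The two-stage structure described above can be sketched as a filter followed by a scorer. In this illustration the "tool" simply recomputes a product in Python, and the small-model verifier is replaced by a precomputed `score` field; the candidate format and both function names are assumptions made for the example, not the framework's actual interface.

```python
def tool_filter(candidates):
    """Stage 1: keep candidates whose stated product survives recomputation."""
    return [c for c in candidates if c["a"] * c["b"] == c["answer"]]

def t1_select(candidates, verifier_score):
    """Stage 2: among tool-approved candidates, return the best-scored one."""
    survivors = tool_filter(candidates)
    if not survivors:
        return None
    return max(survivors, key=verifier_score)

candidates = [
    {"a": 17, "b": 23, "answer": 381, "score": 0.9},  # preferred by the verifier, wrong arithmetic
    {"a": 17, "b": 23, "answer": 391, "score": 0.6},  # correct arithmetic
]
best = t1_select(candidates, verifier_score=lambda c: c["score"])
```

Without the filter, the higher-scored but incorrect candidate would win, which is the failure mode that offloading the calculation to a tool is meant to remove.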

2503.05500 2026-06-02 cs.CL cs.AI

EuroBERT: Scaling Multilingual Encoders for European Languages

EuroBERT:面向欧洲语言的多语言编码器扩展

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André Martins, Ayoub Hammal, Caio Corro, Céline Hudelot, Emmanuel Malherbe, Etienne Malaboeuf, Fanny Jourdan, Gabriel Hautreux, João Alves, Kevin El Haddad, Manuel Faysse, Maxime Peyrard, Nuno M. Guerreiro, Patrick Fernandes, Ricardo Rei, Pierre Colombo

发表机构 * Artefact Research Center(Artefact研究中心) CNRS(法国国家科学研究中心) ISIA Lab(ISIA实验室) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出EuroBERT系列多语言编码器,通过整合生成式模型的最新进展,在检索、回归和分类等任务上超越现有模型,并原生支持长达8192个token的序列。

Comments 28 pages, 8 figures, 13 tables

AI中文摘要

用于检索、回归和分类的通用多语言向量表示传统上来自双向编码器模型。尽管应用广泛,但编码器最近被生成式仅解码器模型的进步所掩盖。然而,推动这一进展的许多创新并非解码器所独有。在本文中,我们通过这些进展的视角重新审视多语言编码器的发展,并介绍EuroBERT,一个覆盖欧洲及全球广泛使用语言的多语言编码器家族。我们的模型在包括多语言能力、数学和编码在内的多种任务上优于现有替代方案,并原生支持长达8192个token的序列。我们还研究了EuroBERT背后的设计决策,提供了关于数据集组成和训练流程的见解。我们公开发布EuroBERT模型,包括中间训练检查点以及我们的训练框架。

英文摘要

General-purpose multilingual vector representations, used in retrieval, regression and classification, are traditionally obtained from bidirectional encoder models. Despite their wide applicability, encoders have recently been overshadowed by advances in generative decoder-only models. However, many innovations driving this progress are not inherently tied to decoders. In this paper, we revisit the development of multilingual encoders through the lens of these advances, and introduce EuroBERT, a family of multilingual encoders covering European and widely spoken global languages. Our models outperform existing alternatives across a diverse range of tasks, spanning multilingual capabilities, mathematics, and coding, and natively support sequences of up to 8,192 tokens. We also examine the design decisions behind EuroBERT, offering insights into our dataset composition and training pipeline. We publicly release the EuroBERT models, including intermediate training checkpoints, together with our training framework.

2503.15639 2026-06-02 cs.CV cs.AI

A Lightweight Context-Driven Training-Free Network for Scene Text Segmentation and Recognition

一种轻量级上下文驱动的免训练网络用于场景文本分割与识别

Ritabrata Chakraborty, Shivakumara Palaiahnakote, Umapada Pal, Cheng-Lin Liu

Affiliations * CVPR Unit, Indian Statistical Institute, Kolkata, India; Manipal University Jaipur, India; University of Salford, UK; School of Artificial Intelligence, University of Chinese Academy of Sciences

AI summary Proposes a context-driven, training-free plug-and-play framework that achieves efficient scene text recognition through attention-based segmentation and semantic evaluation, with performance on par with the state of the art at lower resource cost.

Comments Accepted at ICDAR 2025 (ORAL). 21 pages, 8 figures, 7 tables

Details
AI Chinese abstract

现代场景文本识别系统通常依赖于大型端到端架构,这些架构需要大量训练,并且对于实时场景来说成本过高。在这种情况下,由于内存、计算资源和延迟的限制,部署重型模型变得不切实际。为了应对这些挑战,我们提出了一种新颖的、无需训练的即插即用框架,该框架利用预训练文本识别器的优势,同时最小化冗余计算。我们的方法使用基于上下文的理解,并引入了一个基于注意力的分割阶段,该阶段在像素级别细化候选文本区域,从而改进下游识别。我们不执行传统的文本检测(即特征图与源图像之间的块级比较),而是利用预训练的标题生成器来利用上下文信息,使框架能够直接从场景上下文生成单词预测。候选文本经过语义和词汇评估以获得最终分数。达到或超过预定义置信度阈值的预测绕过更重的端到端文本STR(场景文本识别)流程,确保更快的推理并减少不必要的计算。在公共基准上的实验表明,我们的范式实现了与最先进系统相当的性能,但所需资源大大减少。我们的代码可在此处找到:https://ritabrata04.github.io/Context-driven-STR/。

English abstract

Modern scene text recognition systems often depend on large end-to-end architectures that require extensive training and are prohibitively expensive for real-time scenarios. In such cases, the deployment of heavy models becomes impractical due to constraints on memory, computational resources, and latency. To address these challenges, we propose a novel, training-free plug-and-play framework that leverages the strengths of pre-trained text recognizers while minimizing redundant computations. Our approach uses context-based understanding and introduces an attention-based segmentation stage, which refines candidate text regions at the pixel level, improving downstream recognition. Instead of performing traditional text detection, which relies on a block-level comparison between the feature map and the source image, our method harnesses contextual information using pretrained captioners, allowing the framework to generate word predictions directly from scene context. Candidate texts are semantically and lexically evaluated to get a final score. Predictions that meet or exceed a pre-defined confidence threshold bypass the heavier end-to-end scene text recognition (STR) pipeline, ensuring faster inference and cutting down on unnecessary computations. Experiments on public benchmarks demonstrate that our paradigm achieves performance on par with state-of-the-art systems, yet requires substantially fewer resources. Our code can be found here: https://ritabrata04.github.io/Context-driven-STR/.

2503.15371 2026-06-02 cs.RO cs.LG

GIFT: Geometry-Induced Functional Transfer for Category-level Object Manipulation

GIFT: 几何诱导的功能迁移用于类别级物体操作

Cristiana de Farias, Luis Figueredo, Riddhiman Laha, Maxime Adjigble, Brahim Tamadazte, Rustam Stolkin, Sami Haddadin, Naresh Marturi

Affiliations * Extreme Robotics Laboratory, School of Metallurgy and Materials, University of Birmingham; Munich Institute of Robotics & Machine Intelligence, Technische Universität München (TUM); School of Computer Science, University of Nottingham; Sorbonne Université, ISIR, Paris, France

AI summary Proposes the GIFT framework, which uses functional maps and screw interpolation to transfer complex object manipulation skills from a single human demonstration to new objects without additional training.

Comments 8 pages, 6 figures. ICRA 2026

Details
AI Chinese abstract

在新环境中操作不熟悉物体对机器人来说具有挑战性,因为泛化能力有限。我们提出了一种新的技能迁移框架GIFT(几何诱导的功能迁移),使机器人能够从单次人类演示中迁移复杂的物体操作技能和约束。我们的方法通过关注以物体为中心的交互,从演示中推导几何表示,解决了技能获取和任务执行的挑战。利用功能映射(FMC)框架,我们高效地映射物体及其环境之间的交互函数,使机器人能够在具有相似拓扑或类别的物体之间复制任务操作,即使它们形状差异很大。此外,我们的方法结合了螺旋插值(ScLERP)来生成平滑、几何感知的机器人路径,确保迁移的技能遵循演示的任务约束。我们通过大量实验验证了该方法的有效性和适应性,展示了在多样化的真实环境中成功进行技能迁移和任务执行,无需额外训练。

English abstract

Robotic manipulation of unfamiliar objects in new environments is challenging due to limited generalisation capabilities. We propose a new skill transfer framework, GIFT (Geometry-Induced Functional Transfer), which enables a robot to transfer complex object manipulation skills and constraints from a single human demonstration. Our approach addresses the challenge of skill acquisition and task execution by deriving geometric representations from demonstrations focusing on object-centric interactions. By leveraging the Functional Maps (FMC) framework, we efficiently map interaction functions between objects and their environments, allowing the robot to replicate task operations across objects of similar topologies or categories, even when they have significantly different shapes. Additionally, our method incorporates screw interpolation (ScLERP) for generating smooth, geometrically-aware robot paths to ensure the transferred skills adhere to the demonstrated task constraints. We validate the effectiveness and adaptability of our approach through extensive experiments, demonstrating successful skill transfer and task execution in diverse real-world environments without requiring additional training.

2503.07154 2026-06-02 cs.LG cs.AI

Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

推理时缩放的思想可以有益于生成式预训练算法

Jiaming Song, Linqi Zhou

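Affiliations * Luma AI

AI summary This paper argues that the dichotomy between autoregressive and diffusion models is false, proposes designing training objectives starting from the inference procedure (sequence expansion and state refinement), and makes the case for putting the inference algorithm before the training objective.

Comments updated some new literature on flow maps and continuous LLMs

Details
AI Chinese abstract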

生成式预训练通常被框定在一个错误的二分法中:用于离散信号的自回归模型与用于连续信号的扩散模型。我们认为这种二分法是错误的,因为它混淆了模型家族、数据表示、训练目标和推理过程。自回归是一种通过归一化条件采样扩展序列的推理过程,而扩散是一种反复修正现有状态的细化过程。因此,更有用的对比不是自回归与扩散,而是用交叉熵学习的离散标记与用扩散风格目标学习的连续标记,以及用于从中采样的推理算法。从这个角度来看,算法进展应优先考虑推理时间效率的两个维度:序列扩展和状态细化。我们主张在训练目标之前设计推理过程,因为如果推理映射省略了必要参数或施加了错误分解,训练方法无法弥补。我们通过DDIM风格采样器的目标时间限制、多标记预测的联合分布限制,以及直接参数化长距离推理移动的最新流映射和少步蒸馏方法来说明这一原则。

English abstract

Generative pre-training is often framed through a false dichotomy between autoregressive models for discrete signals and diffusion models for continuous signals. We argue that the dichotomy is false because it conflates model family, data representation, training objective, and inference procedure. Autoregression is an inference procedure that expands a sequence through normalized conditional draws, while diffusion is a refinement procedure that repeatedly revises an existing state. The more useful contrast is therefore not autoregressive versus diffusion, but discrete tokens learned with cross-entropy versus continuous tokens learned with diffusion-style objectives, together with the inference algorithms used to sample from them. From this perspective, algorithmic progress should prioritize inference-time efficiency along two axes: sequence expansion and state refinement. We advocate designing the inference procedure before the training objective, because a training method cannot compensate for an inference map that omits necessary arguments or imposes an incorrect factorization. We illustrate this principle through a target-time limitation of DDIM-style samplers, a joint-distribution limitation of multi-token prediction, and recent flow-map and few-step distillation methods that directly parameterize long-range inference moves.

2503.07325 2026-06-02 cs.LG stat.ML

Non-vacuous Generalization Bounds for Deep Neural Networks without any modification to the trained models

无需对训练模型进行任何修改的深度神经网络非平凡泛化界

Khoat Than, Dat Phan

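Affiliations * Hanoi University of Science and Technology; VinBigdata Institute

AI summary Proposes a new class of data-dependent generalization bounds that apply directly to unmodified trained models. By decomposing the generalization error into a distributional complexity term and local model-behavior terms, it gives the first non-vacuous generalization guarantees for large, unmodified deep networks.

Details
AI Chinese abstract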

理解和认证现代深度神经网络的行为仍然是可靠机器学习中的一个基本挑战。我们引入了一类新的数据依赖泛化界,直接应用于训练模型,无需任何修改。特别地,我们提出了一个可精确计算的界,在所有评估的网络中(包括具有6亿参数的ImageNet规模模型)都是非平凡的。这是首次表明即使对于大型未修改的深度网络,也能实现有意义的泛化保证。我们的方法揭示了泛化由训练模型与数据分布几何之间的相互作用所支配。我们将泛化误差分解为两个可解释的组成部分:一个分布复杂度项,捕捉数据质量在输入空间中的分布;以及局部模型行为项,捕捉网络在单个区域内的行为。这种联合依赖识别出泛化差距出现的位置和原因。实验上,我们界的某些部分对真实测试误差具有高度预测性,并且当划分与内在数据几何对齐时,界会收紧,突出了数据依赖的局部正则性作为泛化的关键驱动因素。

English abstract

Understanding and certifying the behavior of modern deep neural networks remains a fundamental challenge in reliable machine learning. We introduce a new class of data-dependent generalization bounds that apply directly to trained models, without any modification. In particular, we present an exactly computable bound that is non-vacuous across all evaluated networks, including ImageNet-scale models with 600M parameters. This is the first work showing that meaningful generalization guarantees are achievable even for large, unaltered deep networks. Our approach reveals that generalization is governed by the interaction between the trained model and the geometry of the data distribution. We decompose the generalization error into two interpretable components: a distributional complexity term, capturing how the data mass is distributed across the input space, and local model-behavior terms, capturing the network's behavior within individual regions. This joint dependence identifies where and why generalization gaps arise. Empirically, some components of our bound are highly predictive of the true test error, and the bound tightens when the partition aligns with the intrinsic data geometry, highlighting data-dependent local regularity as a key driver of generalization.

2503.06136 2026-06-02 cs.CV cs.AI

GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation

GSV3D: 基于高斯溅射的几何蒸馏与稳定视频扩散用于单图像3D物体生成

Ye Tao, Jiawei Zhang, Yahao Shi, Dongqing Zou, Bin Zhou

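Affiliations * State Key Laboratory of Virtual Reality Technology and Systems, Beihang University; SenseTime Research; PBVR

AI summary Proposes a method that combines the implicit 3D reasoning ability of 2D diffusion models with Gaussian-splatting-based geometric distillation: a Gaussian Splatting Decoder converts SV3D latent outputs into an explicit 3D representation, yielding multi-view consistency and high-quality 3D generation.

Details
AI Chinese abstract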

基于图像的3D生成在机器人和游戏领域有广泛应用,其中高质量、多样化的输出和一致的3D表示至关重要。然而,现有方法存在局限性:3D扩散模型受限于数据集稀缺和缺乏强大的预训练先验,而基于2D扩散的方法则难以保证几何一致性。我们提出了一种方法,利用2D扩散模型的隐式3D推理能力,同时通过基于高斯溅射的几何蒸馏确保3D一致性。具体来说,所提出的高斯溅射解码器通过将SV3D潜变量输出转换为显式3D表示来强制3D一致性。与仅依赖隐式2D表示进行视频生成的SV3D不同,高斯溅射显式编码空间和外观属性,通过几何约束实现多视图一致性。这些约束纠正了视图不一致性,确保了稳健的几何一致性。因此,我们的方法同时生成高质量、多视图一致的图像和精确的3D模型,为基于单图像的3D生成提供了可扩展的解决方案,并弥合了2D扩散多样性与3D结构一致性之间的差距。实验结果表明,该方法在多个数据集上实现了最先进的多视图一致性和强泛化能力。代码将在接收后公开。

English abstract

Image-based 3D generation has vast applications in robotics and gaming, where high-quality, diverse outputs and consistent 3D representations are crucial. However, existing methods have limitations: 3D diffusion models are limited by dataset scarcity and the absence of strong pre-trained priors, while 2D diffusion-based approaches struggle with geometric consistency. We propose a method that leverages 2D diffusion models' implicit 3D reasoning ability while ensuring 3D consistency via Gaussian-splatting-based geometric distillation. Specifically, the proposed Gaussian Splatting Decoder enforces 3D consistency by transforming SV3D latent outputs into an explicit 3D representation. Unlike SV3D, which only relies on implicit 2D representations for video generation, Gaussian Splatting explicitly encodes spatial and appearance attributes, enabling multi-view consistency through geometric constraints. These constraints correct view inconsistencies, ensuring robust geometric consistency. As a result, our approach simultaneously generates high-quality, multi-view-consistent images and accurate 3D models, providing a scalable solution for single-image-based 3D generation and bridging the gap between 2D Diffusion diversity and 3D structural coherence. Experimental results demonstrate state-of-the-art multi-view consistency and strong generalization across diverse datasets. The code will be made publicly available upon acceptance.

2502.20016 2026-06-02 cs.LG

Position: Neglecting the Sustainability of AI is Fuelling a Global AI Arms Race

立场:忽视人工智能的可持续性正在助长全球人工智能军备竞赛

Pedram Bakhtiarifard, Pınar Tözün, Christian Igel, Raghavendra Selvan

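Affiliations * Department of Computer Science, University of Copenhagen, Denmark; Robotics Section, IT University of Copenhagen, Denmark

AI summary This paper argues that the current discourse on sustainable AI neglects the economic and social dimensions, and proposes curbing the global AI arms race by reconciling climate awareness with resource awareness and introducing the CARAML framework.

Comments Accepted to be presented at ICML 2026. Source code at https://github.com/saintslab/caraml

Details
AI Chinese abstract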

可持续性包含三个关键方面:经济、环境和社会。然而,关于可持续人工智能(AI)的新兴讨论主要集中在AI的环境可持续性上,忽视了经济和社会方面。实现真正可持续的AI需要解决其环境可持续性(强调减轻AI对气候的影响)与社会可持续性(依赖于公平获取AI开发资源)之间的张力。然而,这种提高可及性的推动往往忽视了扩大此类资源使用的环境成本。本立场论文认为,调和气候意识和资源意识对于实现真正可持续的AI至关重要,而忽视这些因素会助长全球AI军备竞赛。运用历史唯物主义的卡尔·马克思基础-上层建筑框架,我们分析了物质条件如何塑造当前的AI进展及其相关讨论。此外,我们引入了气候与资源感知机器学习(CARAML)框架,并提出了涵盖个人、社区、行业、政府和全球层面的可操作建议,以实现可持续的AI。

English abstract

Sustainability encompasses three key facets: economic, environmental, and social. However, the nascent discourse on sustainable artificial intelligence (AI) predominantly focuses on the environmental sustainability of AI, neglecting the economic and social aspects. Achieving truly sustainable AI necessitates addressing the tension between its environmental sustainability, which emphasises mitigating AI's climate impact, and its social sustainability, hinging on equitable access to AI development resources. This push for increased accessibility, however, often overlooks the environmental costs of expanding such resource usage. This position paper argues that reconciling climate awareness and resource awareness is essential to realising truly sustainable AI, and neglecting these factors fuels a global AI arms race. Applying Karl Marx's base-superstructure framework from historical materialism, we analyse how the material conditions are shaping the current AI progress and the discourse surrounding it. Further, we introduce the Climate and Resource Aware Machine Learning (CARAML) framework with actionable recommendations spanning individual, community, industry, government, and global levels to achieve sustainable AI.

2502.07617 2026-06-02 cs.CV

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

将视觉语言模型的预训练扩展到一千亿数据

Xiao Wang, Ibrahim Alabdulmohsin, Daniel Salz, Zhe Li, Keran Rong, Xiaohua Zhai

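Affiliations * Google DeepMind

AI summary This paper empirically studies the effect of scaling vision-language pre-training data to 100 billion examples: performance on traditional benchmarks saturates, but culturally diverse tasks and low-resource languages benefit substantially, and quality filtering may reduce cultural diversity.

Comments v2: CVPR Findings'26

Details
AI Chinese abstract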

我们提供了一个关于将视觉语言模型预训练扩展到前所未有规模——一千亿样本——潜力的实证研究。我们发现,在许多常见的西方中心分类和检索基准(如COCO Captions)上,模型性能在此规模下趋于饱和。然而,文化多样性任务从一千亿规模的网络数据中获得了更实质性的提升,这得益于其对长尾概念的覆盖。此外,我们分析了模型的多语言能力,并展示了在低资源语言上的提升。另外,我们观察到,通过使用如CLIP等质量过滤器减少预训练数据集的大小(通常用于提升性能)可能会无意中减少大规模数据集中所代表的文化多样性。我们的结果强调,虽然传统基准可能不会从将噪声原始网络数据扩展到一千亿样本中显著受益,但这一数据规模对于构建真正包容的多模态系统至关重要。

English abstract

We provide an empirical investigation of the potential of pre-training vision-language models on an unprecedented scale: 100 billion examples. We find that model performance tends to saturate at this scale on many common Western-centric classification and retrieval benchmarks, such as COCO Captions. Nevertheless, tasks of cultural diversity achieve more substantial gains from the 100-billion scale web data, thanks to its coverage of long-tail concepts. Furthermore, we analyze the model's multilinguality and show gains in low-resource languages as well. In addition, we observe that reducing the size of the pretraining dataset via quality filters such as CLIP, which are typically used to enhance performance, may inadvertently reduce the cultural diversity represented in large-scale datasets. Our results highlight that while traditional benchmarks may not benefit significantly from scaling noisy, raw web data to 100 billion examples, this data scale is vital for building truly inclusive multimodal systems.

2502.04512 2026-06-02 cs.AI

Safety Must Precede the Deployment of Open-Ended AI

安全必须优先于开放式AI的部署

Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz

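Affiliations * CISPA-Helmholtz Center of Information Security; MPI for Intelligent Systems, ELLIS Institute Tübingen, Tübingen AI Center

AI summary This paper argues that open-ended AI systems, because they autonomously and indefinitely generate novel behaviors, pose distinct safety challenges such as loss of predictability, emergent misalignment, and difficulty of control. These must be studied proactively before deployment, and the paper offers a taxonomy of challenges and research directions.

Comments Accepted to ICML'26

Details
AI Chinese abstract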

AI的进步在很大程度上由基础模型和好奇心驱动的学习共同推动,旨在提高能力和适应性。在此背景下,开放式(即AI智能体自主且无限地生成新行为、表示或解决方案)引起了越来越多的兴趣。这在自我进化智能体和长期发现的背景下变得相关。本文立场论文认为,开放式AI系统的定义特性引入了一类独特且未被充分探索的安全挑战,包括预测性丧失、新兴错位以及随着系统超出初始设计假设而难以维持有效控制,这些挑战必须被预先解决。这些挑战在性质上不同于与任务受限或静态模型相关的挑战,且不太可能仅通过现有安全框架解决,因此必须在大规模部署之前主动审视这些风险。论文提出了关键挑战的分类,讨论了研究机会,并呼吁采取协调行动以支持开放式AI的安全和负责任开发。

English abstract

AI advancements have been significantly driven by a combination of foundation models and curiosity-driven learning aimed at increasing capability and adaptability. Within this landscape, open-endedness, where AI agents autonomously and indefinitely generate novel behaviors, representations, or solutions, has gained increasing interest. This has become relevant in the context of self-evolving agents and long-horizon discovery. This position paper argues that the defining properties of open-ended AI systems introduce a distinct and underexplored class of safety challenges, including loss of predictability, emergent misalignment, and difficulties in maintaining effective control as systems evolve beyond their initial design assumptions, that must be addressed preemptively. These challenges differ qualitatively from those associated with task-bounded or static models and are unlikely to be addressed by existing safety frameworks alone, which is why these risks must be examined proactively, before large-scale deployment. The paper proposes a taxonomy for key challenges, discusses research opportunities, and calls for coordinated action to support the safe and responsible development of open-ended AI.

2112.11279 2026-06-02 cs.LG

Differential Parity: Relative Fairness Between Two Sets of Decisions

差分奇偶性:两组决策之间的相对公平性

Zhe Yu, Xiaoyin Xi, Pranam Prakash Shetty

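Affiliations * Rochester Institute of Technology

AI summary This paper proposes differential parity, which assesses relative fairness by testing whether the difference between two sets of decisions is independent of a sensitive attribute. It avoids the ambiguity of absolute fairness definitions; with a reference set it serves as a group fairness metric, and without one it reveals relative preference or bias.

Comments Accepted by JAIR

Details
AI Chinese abstract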

随着AI系统广泛应用于辅助人类决策过程,如人才招聘、学校录取和贷款审批,确保决策公平的需求日益增长。分析决策公平性的一个主要挑战是标准高度主观且依赖上下文——对于每个场景而言,绝对公平的含义并无共识。这并非说不同的公平标准经常相互冲突。为了绕过这个问题,本文旨在测试决策中的相对公平性。也就是说,我们不定义什么是“绝对”公平的决策,而是提出通过差分奇偶性——两组决策之间的差异应独立于某个敏感属性——来测试一组决策相对于另一组的相对公平性。这一提出的差分奇偶性公平概念具有以下优点:(1) 避免了绝对公平决策定义的模糊性和矛盾性;(2) 当存在参考集(真实标签或可靠公平决策)时,差分奇偶性可作为新的群体公平概念(类似于分离性和充分性,但有所不同);(3) 即使没有参考集,它也能揭示不同决策集之间的相对偏好或偏见。差分奇偶性的一个局限性是它要求被比较的两组决策针对相同的数据主体做出。为了克服这一局限性,我们提出利用机器学习模型来弥合针对不同数据做出的两组决策之间的差距,并估计差分奇偶性。

English abstract

With AI systems widely applied to assist humans in decision-making processes such as talent hiring, school admission, and loan approval, there is an increasing need to ensure that the decisions made are fair. One major challenge for analyzing fairness in decisions is that the standards are highly subjective and contextual -- there is no consensus for what absolute fairness means for every scenario. Moreover, different fairness standards often conflict with each other. To bypass this issue, this work aims to test relative fairness in decisions. That is, instead of defining what are "absolutely" fair decisions, we propose to test the relative fairness of one decision set against another with differential parity -- the difference between two sets of decisions should be independent of a certain sensitive attribute. This proposed notion of differential parity fairness has the following benefits: (1) it avoids the ambiguous and contradictory definition of what absolutely fair decisions are; (2) when a reference set (of ground truth or reliable fair decisions) is available, differential parity can serve as a new group fairness notion (similar to but different from separation and sufficiency); (3) even when no reference set is available, it reveals the relative preference or bias between different decision sets. One limitation for differential parity is that it requires the two sets of decisions under comparison to be made on the same data subjects. To overcome this limitation, we propose to utilize a machine learning model to bridge the gap between the two sets of decisions made on different data and estimate the differential parity.
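
As a rough illustration of the differential parity idea described in this abstract, the sketch below compares the mean difference between two decision sets across two groups of a binary sensitive attribute. This is a simplified proxy under stated assumptions: equal group means are weaker than full statistical independence, the function name and the toy data are hypothetical, and the abstract does not say which test statistic the paper actually uses.

```python
from statistics import mean


def differential_parity_gap(decisions_a, decisions_b, sensitive):
    """Gap in the mean decision difference (A - B) between groups 1 and 0.

    A value near zero is consistent with the difference between the two
    decision sets not depending on the sensitive attribute (mean-level only).
    """
    diffs = {0: [], 1: []}
    for a, b, s in zip(decisions_a, decisions_b, sensitive):
        diffs[s].append(a - b)
    return mean(diffs[1]) - mean(diffs[0])


# Hypothetical toy data: two decision makers judging the same eight subjects.
decisions_a = [1, 0, 1, 1, 0, 1, 0, 0]
decisions_b = [1, 0, 1, 0, 0, 1, 1, 1]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]

gap = differential_parity_gap(decisions_a, decisions_b, sensitive)
same = differential_parity_gap(decisions_a, decisions_a, sensitive)
```

Here the two decision sets must cover the same subjects, which matches the limitation the abstract points out; comparing a decision set with itself gives a gap of zero.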

2502.04646 2026-06-02 cs.LG cs.AI

Efficient Weighted Sampling via Score-based Generative Models

基于分数生成模型的高效加权采样

Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana

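Affiliations * The University of Texas at Austin

AI summary Proposes a training-free weighted sampling framework that uses a lightweight guidance approximation and an uncertainty-aware scheduler to sample efficiently and stably from pretrained score-based generative models, achieving 1.2× to 4.7× speedups in large-scale settings.

Comments 37 pages

Details
AI Chinese abstract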

加权采样——从与基概率密度函数和权重函数乘积成比例的概率密度函数中采样——是一种基础技术,在方差缩减、有偏采样、数据增强等领域有广泛应用。利用日益可用的预训练分数生成模型,我们提出了一种无需训练的加权采样框架,通过以原则性和计算高效的方式,用辅助引导项增强预训练基分数函数,来近似目标分布的逆向扩散过程。我们的方法基于两个关键组件:一个轻量级的引导近似,避免了分数函数和权重函数的高阶导数;以及一个不确定性感知调度器,基于近似误差的时间分析动态调整引导强度。这些组件共同实现了准确稳定的采样,无需依赖现有方法通常需要的基于粒子的重采样或Hessian评估。我们从合成设置到大规模设置(如Stable Diffusion XL)验证了方法的有效性,在该框架下,我们实现了1.2倍到4.7倍的加速,同时在任务性能上始终匹配或超越最先进的基线。这些结果使我们的方法成为生成应用中任务自适应、时间敏感采样的可扩展且推理高效的解决方案。

English abstract

Weighted sampling -- sampling from a probability density function (PDF) proportional to the product of a base PDF and a weight function -- is a fundamental technique with wide-ranging applications in variance reduction, biased sampling, data augmentation, and more. Leveraging the increasing availability of pretrained score-based generative models (SGMs), we propose a training-free weighted sampling framework that approximates the backward diffusion process of the target distribution by augmenting the pretrained base score function with an auxiliary guidance term, in a principled and computationally efficient manner. Our approach builds on two key components: a lightweight approximation of the guidance that avoids costly higher-order derivatives of both the score and weight functions, and an uncertainty-aware scheduler that dynamically adjusts the guidance strength based on a temporal analysis of approximation error. Together, these components enable accurate and stable sampling without relying on particle-based resampling or Hessian evaluations commonly required by existing methods. We validate the effectiveness of our method from synthetic to large-scale settings such as Stable Diffusion XL, where our framework achieves $1.2\times$ to $4.7\times$ speedups while consistently matching or outperforming state-of-the-art baselines in task performance. These results position our method as a scalable and inference-efficient solution for task-adaptive, time-sensitive sampling in generative applications.
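
The definition of weighted sampling in this abstract can be made concrete with a one-dimensional toy case. For a target density proportional to a base density times a weight, the score is the base score plus the gradient of the log weight. The sketch below assumes a standard normal base and the weight w(x) = exp(a x), so the target is a normal with mean a and unit variance, and it samples with plain unadjusted Langevin dynamics. It is not the paper's reverse-diffusion guidance or its scheduler; all function names are hypothetical.

```python
import math
import random


def base_score(x):
    # Score of the base density N(0, 1): d/dx log p(x) = -x
    return -x


def grad_log_weight(x, a):
    # Assumed weight w(x) = exp(a * x), so d/dx log w(x) = a
    return a


def langevin_weighted_samples(a, n_chains=1000, n_steps=300, step=0.05, seed=0):
    """Unadjusted Langevin dynamics driven by base score + weight guidance."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        x = rng.gauss(0.0, 1.0)  # start from the base distribution
        for _ in range(n_steps):
            drift = base_score(x) + grad_log_weight(x, a)
            x += step * drift + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = langevin_weighted_samples(a=1.5)
sample_mean = sum(samples) / len(samples)
```

The sample mean should land close to a = 1.5, since the weighted target here is N(1.5, 1). In a diffusion model the guidance term also depends on the noise level, which is where the approximation and scheduling discussed in the abstract come in.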

2111.03861 2026-06-02 cs.CV cs.AI cs.LG

What augmentations are sensitive to hyper-parameters and why?

哪些数据增强对超参数敏感以及为什么?

Ch Muhammad Awais, Imad Eddine Ibrahim Bekkouch

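Affiliations * Knowledge Representation Lab, Innopolis University; Sorbonne Center for Artificial Intelligence - SCAI, Sorbonne University

AI summary This study uses Local Surrogate (LIME) interpretation and linear regression coefficients to evaluate the sensitivity, consistency, and influence of different data augmentations with respect to model hyper-parameters, finding that some augmentations are highly sensitive to hyper-parameters while others are more robust and reliable.

Comments 10 pages, 17 figures

Details
Journal ref
Intelligent Computing: Proceedings of the 2022 Computing Conference
AI Chinese abstract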

我们对数据集应用增强以提高预测质量,并使最终模型对噪声数据和领域漂移更具鲁棒性。然而,问题仍然存在:这些增强在不同的超参数下表现如何?在本研究中,我们通过执行局部代理(LIME)解释来评估增强对模型超参数的敏感性、一致性和影响,当不同增强应用于机器学习模型时,解释超参数的影响。我们利用线性回归系数来加权每个增强。我们的研究证明,有些增强对超参数高度敏感,而其他增强则更具鲁棒性和可靠性。

English abstract

We apply augmentations to our dataset to enhance the quality of our predictions and make our final models more resilient to noisy data and domain drifts. Yet the question remains: how will these augmentations perform with different hyper-parameters? In this study, we evaluate the sensitivity of augmentations with regard to the model's hyper-parameters, along with their consistency and influence, by performing a Local Surrogate (LIME) interpretation of the impact of hyper-parameters when different augmentations are applied to a machine learning model. We use linear regression coefficients to weigh each augmentation. Our research shows that some augmentations are highly sensitive to hyper-parameters, while others are more resilient and reliable.

2501.12178 2026-06-02 cs.CV

Visualizing definitional divergence in high-dimensional data by manifold alignment: Application to 3D right ventricular strain computations

通过流形对齐可视化高维数据中的定义差异:应用于3D右心室应变计算

Maxime Di Folco, Gabriel Bernardino, Patrick Clarysse, Nicolas Duchateau

Affiliations * Univ Lyon, Université Claude Bernard Lyon 1, INSA-Lyon, CNRS, Inserm, CREATIS UMR 5220, U1294; Institute of Machine Learning in Biomedical Imaging, Helmholtz Center Munich, Germany; LTCI, Telecom Paris, Institut Polytechnique de Paris; DTIC, Universitat Pompeu Fabra, Barcelona, Spain; Institut Universitaire de France (IUF)

AI summary Proposes a representation-learning strategy that uses manifold alignment to match high-dimensional data arising from different definitions and reconstructs parametric maps to visualize definitional divergence, applied to right ventricular strain analysis.

Comments Accepted for publication in IEEE Transactions on Medical Imaging, DOI: 10.1109/TMI.2026.3698240. © 2026 IEEE. Personal use is permitted. For all other uses, permission must be obtained from IEEE

Details
AI Chinese abstract

医学影像研究通常依赖于每个受试者的单个样本,假设其能代表生理特征。然而,输入描述符定义或计算方式的变化(例如由于科学领域缺乏共识)可能对分析产生关键影响,但在实践中很少被考虑。本文提出一种基于表示学习的原创策略,用于估计反映这种定义差异对先前从医学图像中提取的特定生理描述符影响的参数图。我们将这些生理描述符的不同定义或计算视为不同的高维数据,可能具有异构类型。我们特别关注心肌变形(应变),其定义尚未达成共识。我们首先使用流形对齐来匹配与该描述符不同定义相关的潜在表示。然后,我们在潜在空间中制定合理的分布来表示描述符之间的定义差异,并从中重建高维参数图以可视化这种定义差异。由于缺乏针对该特定临床应用的适当真实数据,我们首先在玩具实验上演示该方法,然后扩展到从3D超声心动图图像序列获得的受试者右心室应变数据的评估,其中右心室内膜表面网格的每个点都有不同类型的应变可用。除了这一说明性应用外,我们的方法具有推广到其他考虑异构高维描述符的人群分析的潜力。

English abstract

Medical imaging studies often rely on a single sample per subject, assuming it is representative of their physiological traits. However, variations in how input descriptors are defined or computed (e.g. due to a lack of consensus in the scientific field) may have a crucial impact on the analysis, and are hardly considered in practice. In this paper, we propose an original strategy based on representation learning to estimate a parametric map reflecting the impact of such definitional differences on a given physiological descriptor, previously extracted from medical images. We consider the different definitions or computations of such physiological descriptors as different high-dimensional data, potentially of heterogeneous types. We specifically focus on myocardial deformation (strain), for which there is limited agreement on its definition. We first use manifold alignment to match the latent representations associated with the different definitions of this descriptor. Then, we formulate plausible distributions in the latent space to represent definitional divergence across descriptors, from which we reconstruct a high-dimensional parametric map to visualize such definitional divergence. Due to the lack of proper ground truth for this specific clinical application, we first demonstrate this methodology on toy experiments and then expand the evaluation on right ventricular strain data from subjects obtained from 3D echocardiographic image sequences, for which different types of strain are available at each point of the right ventricle endocardial surface mesh. Beyond this illustrative application, our methodology has the potential to be generalised to many other population analyses considering heterogeneous high-dimensional descriptors.

2501.08640 2026-06-02 cs.LG stat.ML

Quantum Reservoir Computing and Risk Bounds

量子储层计算与风险界

Naomi Mona Chmielewski, Nina Amini, Joseph Mikael

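Affiliations * EDF Lab, France

AI summary Uses the Rademacher complexity to bound the generalisation error in quantum reservoir computing and analyses how the bounds scale with the number of qubits.

Details
AI Chinese abstract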

我们提出了一种利用Rademacher复杂度来界定几类量子储层泛化误差的方法。我们给出了两个特定量子储层类别的具体参数依赖界。我们分析了泛化界随量子比特数增长的标度行为。将我们的结果应用于具有多项式读出函数的类别,我们发现风险界在训练样本数量上收敛。我们的界中对量子储层和读出参数的显式依赖可用于在一定程度上控制泛化误差。需要注意的是,这些界随量子比特数n呈指数增长。Rademacher复杂度的上界可应用于满足量子动力学和读出函数若干假设的其他储层类别。

English abstract

We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits. Applying our results to classes with polynomial readout functions, we find that the risk bounds converge in the number of training samples. The explicit dependence on the quantum reservoir and readout parameters in our bounds can be used to control the generalisation error to a certain extent. It should be noted that the bounds scale exponentially with the number of qubits n. The upper bounds on the Rademacher complexity can be applied to other reservoir classes that fulfill a few hypotheses on the quantum dynamics and the readout function.
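
For readers unfamiliar with the quantity named in this abstract, the sketch below estimates the empirical Rademacher complexity of a simple function class by Monte Carlo. It assumes norm-bounded linear readouts on fixed feature vectors, which stand in for reservoir outputs here; this is a generic textbook setting, not the paper's quantum reservoir classes or its polynomial readouts, and the function names are hypothetical.

```python
import math
import random


def empirical_rademacher_linear(features, norm_bound, n_draws=2000, seed=0):
    """Monte Carlo estimate for the class {x -> <w, x> : ||w||_2 <= norm_bound}."""
    rng = random.Random(seed)
    n = len(features)
    dim = len(features[0])
    total = 0.0
    for _ in range(n_draws):
        signed_sum = [0.0] * dim
        for x in features:
            sigma = rng.choice((-1.0, 1.0))  # one Rademacher sign per sample
            for j in range(dim):
                signed_sum[j] += sigma * x[j]
        # The supremum over the norm ball is attained by w parallel to signed_sum.
        total += norm_bound * math.sqrt(sum(v * v for v in signed_sum)) / n
    return total / n_draws


def jensen_upper_bound(features, norm_bound):
    """Standard bound: norm_bound * sqrt(sum_i ||x_i||^2) / n."""
    n = len(features)
    squared_norms = sum(sum(v * v for v in x) for x in features)
    return norm_bound * math.sqrt(squared_norms) / n


data_rng = random.Random(1)
features = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
estimate = empirical_rademacher_linear(features, norm_bound=1.0)
bound = jensen_upper_bound(features, norm_bound=1.0)
```

The closed-form bound shrinks like one over the square root of the sample size for bounded features, which is the kind of convergence in the number of training samples that the abstract refers to.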

2501.04424 2026-06-02 cs.AI cs.CL

NSA: Neuro-symbolic ARC Challenge

NSA: 神经符号 ARC 挑战

Paweł Batorski, Jannik Brinkmann, Paul Swoboda

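Affiliations * Heinrich Heine Universität Düsseldorf; University of Mannheim

AI summary Proposes a neuro-symbolic approach that combines transformer-based proposal generation with combinatorial search over a domain-specific language, surpassing existing state-of-the-art methods on the ARC evaluation set by 27%.

Details
Journal ref
ESANN 2026 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 99-104, 2026
AI Chinese abstract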

抽象与推理语料库 (ARC) 评估了机器学习模型和组合搜索方法都难以处理的通用推理能力。我们提出了一种神经符号方法,该方法结合了用于提案生成的 transformer 和使用领域特定语言的组合搜索。Transformer 通过提出有希望的搜索方向来缩小搜索空间,从而使组合搜索能够在短时间内找到实际解决方案。我们使用合成生成的数据预训练 transformer。在测试时,我们生成额外的任务特定训练任务并微调我们的模型。我们的结果在 ARC 评估集上比现有最优方法高出 27%,并且在 ARC 训练集上表现良好。我们在 https://github.com/Batorskq/NSA 公开了我们的代码和数据集。

English abstract

The Abstraction and Reasoning Corpus (ARC) evaluates general reasoning capabilities that are difficult for both machine learning models and combinatorial search methods. We propose a neuro-symbolic approach that combines a transformer for proposal generation with combinatorial search using a domain-specific language. The transformer narrows the search space by proposing promising search directions, which allows the combinatorial search to find the actual solution in a short time. We pre-train the transformer with synthetically generated data. At test time, we generate additional task-specific training tasks and fine-tune our model. Our results surpass comparable state of the art on the ARC evaluation set by 27% and compare favourably on the ARC train set. We make our code and dataset publicly available at https://github.com/Batorskq/NSA.
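
The interplay between proposals and search described in this abstract can be sketched on a toy scale. The example below assumes a hypothetical three-primitive grid DSL and hand-written proposal scores in place of a transformer's output; it enumerates compositions of primitives, trying higher-scored primitives first. None of the names come from the NSA code base.

```python
from itertools import product


# Hypothetical mini-DSL of grid transformations (grids are lists of rows).
def flip_h(grid):
    return [row[::-1] for row in grid]


def flip_v(grid):
    return [list(row) for row in grid[::-1]]


def transpose(grid):
    return [list(col) for col in zip(*grid)]


PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}


def run_program(names, grid):
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid


def search_program(examples, proposal_scores, max_depth=2):
    """Enumerate primitive sequences, visiting higher-scored primitives first."""
    ranked = sorted(PRIMITIVES, key=lambda name: -proposal_scores.get(name, 0.0))
    for depth in range(1, max_depth + 1):
        for names in product(ranked, repeat=depth):
            if all(run_program(names, inp) == out for inp, out in examples):
                return list(names)
    return None


# One input/output pair whose output is the input rotated by 180 degrees.
examples = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
scores = {"flip_h": 0.6, "flip_v": 0.3, "transpose": 0.1}  # stand-in for model proposals
program = search_program(examples, scores)
```

With these scores the search returns the sequence flip_h followed by flip_v. Ranking only changes the order of enumeration in this sketch; a learned proposal model could also prune candidates, which is how the search space would actually shrink.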

2412.19444 2026-06-02 cs.LG math.OC stat.ML

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

迈向简单且可证明的无参数自适应梯度方法

Yuanzhe Tao, Yifeng Liu, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

Affiliations * School of Mathematical Sciences, Peking University; Department of Computer Science, University of California, Los Angeles; School of Computing and Data Science, the University of Hong Kong; Bytedance Inc

AI summary Proposes AdaGrad++ and Adam++, two simple parameter-free adaptive gradient methods that achieve convergence guarantees comparable to AdaGrad and Adam without a preset learning rate.

Comments 45 pages, 19 figures, 3 tables

Details
AI Chinese abstract

诸如 AdaGrad 和 Adam 等优化算法通过在优化过程中动态调整学习率,显著推进了深度模型的训练。然而,学习率的临时调整带来了挑战并导致实际中的低效。为解决此问题,近期研究聚焦于开发无需学习率调整即可有效运行的“无参数”算法。尽管有这些努力,现有的 AdaGrad 和 Adam 无参数变体往往过于复杂且/或缺乏正式的收敛保证。在本文中,我们提出了 AdaGrad++ 和 Adam++,这是 AdaGrad 和 Adam 的新型简单无参数变体,具有收敛保证。我们证明 AdaGrad++ 在凸优化中无需预设学习率假设即可达到与 AdaGrad 相当的收敛速率。类似地,Adam++ 在不依赖任何学习率条件的情况下匹配 Adam 的收敛速率。跨多种深度学习任务的实验结果验证了 Adam++ 的竞争性能。

English abstract

Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad-hoc tuning of learning rates poses a challenge and leads to inefficiencies in practice. To address this issue, recent research has focused on developing ``parameter-free'' algorithms that operate effectively without the need for learning rate tuning. Despite these efforts, existing parameter-free variants of AdaGrad and Adam tend to be overly complex and/or lack formal convergence guarantees. In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates. Experimental results across various deep learning tasks validate the competitive performance of Adam++.
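
To show what a parameter-free variant is meant to remove, the sketch below implements the standard diagonal AdaGrad update, which still takes a hand-chosen `learning_rate`. It is not AdaGrad++ or Adam++, whose update rules are not given in this abstract; the quadratic test problem and all names are assumptions made for illustration.

```python
import math


def adagrad_minimize(grad_fn, x0, learning_rate, n_steps=200, eps=1e-8):
    """Standard diagonal AdaGrad: per-coordinate steps scaled by past gradients."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients per coordinate
    for _ in range(n_steps):
        g = grad_fn(x)
        for i in range(len(x)):
            accum[i] += g[i] * g[i]
            x[i] -= learning_rate * g[i] / (math.sqrt(accum[i]) + eps)
    return x


# Hypothetical convex test problem: f(x) = sum_i (x_i - target_i)^2
target = [3.0, -1.0]


def quadratic_grad(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]


solution = adagrad_minimize(quadratic_grad, x0=[0.0, 0.0], learning_rate=1.0)
```

The `learning_rate` argument is the quantity that normally has to be tuned per problem. A parameter-free method in the sense of the abstract would have to work without it.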

2412.19419 2026-06-02 cs.LG cs.AI

Introduction to Graph Neural Networks for Machine Learning Engineers

面向机器学习工程师的图神经网络导论

James H. Tanis, Chris Giannella, Adrian V. Mariano, Daoud Meerzaman

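Affiliations * The MITRE Corporation; National Cancer Institute

AI summary This survey introduces graph neural networks through the encoder-decoder framework and uses theory and experiments on homogeneous graphs to analyze their behavior under different training sizes and graph complexity, with a focus on oversmoothing and oversquashing.

Comments Author accepted manuscript. Title and metadata updated to match the published ACM Computing Surveys version. 73 pages, including references and supplementary material

Details
AI Chinese abstract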

图神经网络是专为节点或边带有属性的图设计的深度神经网络。由于其在广泛任务上的出色表现,文献中关于这些模型的研究论文数量正在快速增长。本综述通过编码器-解码器框架介绍图神经网络,并提供了一系列图分析任务的解码器示例。它利用理论和对同质图的大量实验,展示了图神经网络在不同训练规模和图复杂度下的行为,重点强调了过平滑和过挤压现象。

English abstract

Graph neural networks are deep neural networks designed for graphs with attributes attached to nodes or edges. The number of research papers in the literature concerning these models is growing rapidly due to their impressive performance on a broad range of tasks. This survey introduces graph neural networks through the encoder-decoder framework and provides examples of decoders for a range of graph analytic tasks. It uses theory and numerous experiments on homogeneous graphs to illustrate the behavior of graph neural networks under different training sizes and degrees of graph complexity, with an emphasis on oversmoothing and oversquashing.
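
The oversmoothing phenomenon mentioned in this abstract can be seen in a few lines. The sketch below assumes the simplest possible message-passing step, averaging each node's scalar feature with its neighbours on a 5-node cycle, with no learned weights or nonlinearities. Real GNN layers differ, and the sketch does not cover oversquashing.

```python
def mean_aggregate(features, neighbors):
    """One message-passing step: average a node's feature with its neighbours'."""
    updated = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs  # include a self-loop
        updated.append(sum(features[j] for j in group) / len(group))
    return updated


def spread_after(features, neighbors, n_layers):
    """Range (max - min) of node features after stacking n_layers steps."""
    for _ in range(n_layers):
        features = mean_aggregate(features, neighbors)
    return max(features) - min(features)


# Hypothetical 5-node cycle graph with one scalar feature per node.
neighbors = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
features = [0.0, 1.0, 2.0, 3.0, 4.0]

shallow = spread_after(features, neighbors, 1)
deep = spread_after(features, neighbors, 30)
```

After one layer the node features still differ noticeably. After thirty layers they are nearly identical, so a decoder working on these embeddings could no longer tell the nodes apart.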