arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.16593 2026-05-12 cs.RO

Scalable Inspection Planning via Flow-based Mixed Integer Linear Programming

Adir Morgan, Kiril Solovey, Oren Salzman

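Affiliations: Technion - Israel Institute of Technology, Haifa, Israel

AI summary: This paper studies path planning for a robot that must inspect a given set of points of interest (POIs), with the goal of finding the shortest robot path that completes the inspection. To manage the problem's complexity, the authors propose a network-flow-based Mixed Integer Linear Programming (MILP) approach that recasts the core constraints as a network flow model, and they design a specialized Branch-and-Cut solver, which markedly improves runtime and solution quality. Experiments show that the method scales well to large instances and substantially narrows the optimality gap.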

Abstract

Inspection planning is concerned with computing the shortest robot path to inspect a given set of points of interest (POIs) using the robot's sensors. This problem arises in a wide range of applications from manufacturing to medical robotics. To alleviate the problem's complexity, recent methods rely on sampling-based methods to obtain a more manageable (discrete) graph inspection planning (GIP) problem. Unfortunately, GIP still remains highly difficult to solve at scale as it requires simultaneously satisfying POI-coverage and path-connectivity constraints, giving rise to a challenging optimization problem, particularly at scales encountered in real-world scenarios. In this work, we present highly scalable Mixed Integer Linear Programming (MILP) solutions for GIP that significantly advance the state-of-the-art in both runtime and solution quality. Our key insight is a reformulation of the problem's core constraints as a network flow, which enables effective MILP models and a specialized Branch-and-Cut solver that exploits the combinatorial structure of flows. We evaluate our approach on medical and infrastructure benchmarks alongside large-scale synthetic instances. Across all scenarios, our method produces substantially tighter lower bounds than existing formulations, reducing optimality gaps by 30-50% on large instances. Furthermore, our solver demonstrates unprecedented scalability: it provides non-trivial solutions for problems with up to 15,000 vertices and thousands of POIs, where prior state-of-the-art methods typically exhaust memory or fail to provide any meaningful optimality guarantees.

2603.12275 2026-05-12 cs.CL cs.LG

GONE: Structural Knowledge Unlearning via Neighborhood-Expanded Distribution Shaping

Chahana Dahal, Ashutosh Balasubramaniam, Zuobin Xiong

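Affiliations: University of Nevada, Las Vegas; Indian Institute of Technology Guwahati

AI summary: This paper presents GONE, a benchmark for evaluating the unlearning of structured knowledge in large language models, addressing the weakness of existing methods on relational, multi-hop knowledge. The work introduces the knowledge-graph-based benchmark together with NEDS, a new framework that uses neighbor information from the graph structure to enforce a precise boundary between the forgotten fact and its semantic neighborhood, improving both unlearning efficacy and locality. Experiments report strong results for NEDS on GONE and other benchmarks, with high unlearning efficacy and locality.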

Abstract

Unlearning knowledge is a pressing and challenging task in Large Language Models (LLMs) because of their unprecedented capability to memorize and digest training data at scale, raising more significant issues regarding safety, privacy, and intellectual property. However, existing works, including parameter editing, fine-tuning, and distillation-based methods, are all focused on flat sentence-level data but overlook the relational, multi-hop, and reasoned knowledge in naturally structured data. In response to this gap, this paper introduces Graph Oblivion and Node Erasure (GONE), a benchmark for evaluating knowledge unlearning over structured knowledge graph (KG) facts in LLMs. This KG-based benchmark enables the disentanglement of three effects of unlearning: direct fact removal, reasoning-based leakage, and catastrophic forgetting. In addition, Neighborhood-Expanded Distribution Shaping (NEDS), a novel unlearning framework, is designed to leverage graph connectivity and identify anchor correlated neighbors, enforcing a precise decision boundary between the forgotten fact and its semantic neighborhood. Evaluations on LLaMA-3-8B and Mistral-7B across multiple knowledge editing and unlearning methods showcase NEDS's superior performance (1.000 on unlearning efficacy and 0.839 on locality) on GONE and other benchmarks. Code is available at https://anonymous.4open.science/r/GONE-4679/.

2603.11969 2026-05-12 cs.CV

AstroSplat: Physics-Based Gaussian Splatting for Rendering and Reconstruction of Small Celestial Bodies

Jennifer Nolan, Travis Driver, John Christian

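Affiliations: Georgia Institute of Technology

AI summary: This paper presents AstroSplat, a physics-based Gaussian splatting framework for rendering and reconstructing the surfaces of small celestial bodies such as asteroids. The method incorporates planetary reflectance models to explicitly model surface material properties and light-surface interaction, addressing the limited physical expressiveness of the conventional spherical-harmonic appearance parameterization. Experiments on real imagery from NASA's Dawn mission show better rendering quality and surface reconstruction accuracy.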

Comments 10 pages, 6 figures, conference

Abstract

Image-based surface reconstruction and characterization are crucial for missions to small celestial bodies (e.g., asteroids), as they inform mission planning, navigation, and scientific analysis. Recent advances in Gaussian splatting enable high-fidelity neural scene representations but typically rely on a spherical harmonic intensity parameterization that is strictly appearance-based and does not explicitly model material properties or light-surface interactions. We introduce AstroSplat, a physics-based Gaussian splatting framework that integrates planetary reflectance models to improve the autonomous reconstruction and photometric characterization of small-body surfaces from in-situ imagery. The proposed framework is validated on real imagery taken by NASA's Dawn mission, where we demonstrate superior rendering performance and surface reconstruction accuracy compared to the typical spherical harmonic parameterization.

2603.11566 2026-05-12 cs.CV

R4Det: 4D Radar-Camera Fusion for High-Performance 3D Object Detection

Zhongyu Xia, Yousen Tang, Yongtao Wang, Zhifeng Wang, Weijun Qin

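Affiliations: Wangxuan Institute of Computer Technology, Peking University; EBTech Co. Ltd

AI summary: This paper proposes R4Det, a 4D radar-camera fusion method for improving 3D object detection in autonomous driving. To address the weaknesses of existing methods in depth estimation, temporal fusion, and small-object detection, R4Det introduces a Panoramic Depth Fusion module to improve depth estimation, a Deformable Gated Temporal Fusion module that does not depend on the ego vehicle's pose, and an Instance-Guided Dynamic Refinement module to improve detection of small objects. Experiments show that R4Det achieves state-of-the-art 3D detection results on the TJ4DRadSet and VoD datasets.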

Comments Accepted to CVPR 2026

Abstract

The 4D radar-camera sensing configuration has gained increasing importance in autonomous driving. However, existing 3D object detection methods that fuse 4D radar and camera data confront several challenges. First, their absolute depth estimation module is not robust and accurate enough, leading to inaccurate 3D localization. Second, the performance of their temporal fusion module will degrade dramatically or even fail when the ego vehicle's pose is missing or inaccurate. Third, for some small objects, the sparse radar point clouds may completely fail to reflect from their surfaces. In such cases, detection must rely solely on visual unimodal priors. To address these limitations, we propose R4Det, which enhances depth estimation quality via the Panoramic Depth Fusion module, enabling mutual reinforcement between absolute and relative depth. For temporal fusion, we design a Deformable Gated Temporal Fusion module that does not rely on the ego vehicle's pose. In addition, we build an Instance-Guided Dynamic Refinement module that extracts semantic prototypes from 2D instance guidance. Experiments show that R4Det achieves state-of-the-art 3D object detection results on the TJ4DRadSet and VoD datasets. The source code and models will be released at https://github.com/VDIGPKU/R4Det.

2603.10126 2026-05-12 cs.RO cs.AI

AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models

Yutong Hu, Jan-Nico Zaech, Nikolay Nikolov, Yuanqi Yao, Sombit Dey, Giuliano Albanese, Renaud Detry, Luc Van Gool, Danda Paudel

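Affiliations: KU Leuven, Dept. Mechanical Engineering, Research unit Robotics, Automation and Mechatronics; KU Leuven, Dept. Electrical Engineering, Research unit Processing Speech and Images

AI summary: This paper proposes AR-VLA, a standalone autoregressive (AR) action expert that generates a continuous causal action sequence conditioned on refreshable vision-language prefixes. Unlike existing Vision-Language-Action (VLA) models and diffusion policies, the action expert keeps its own history in a long-lived memory and is inherently context-aware, which addresses the frequency mismatch between fast control and slow reasoning. Experiments show that AR-VLA matches or exceeds the task success rates of existing reactive VLAs while showing stronger history awareness and smoother action trajectories, providing a scalable structural foundation for training effective robot policies.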

Comments RSS 2026 accepted

Abstract

We propose a standalone autoregressive (AR) Action Expert that generates actions as a continuous causal sequence while conditioning on refreshable vision-language prefixes. In contrast to existing Vision-Language-Action (VLA) models and diffusion policies that reset temporal context with each new observation and predict actions reactively, our Action Expert maintains its own history through a long-lived memory and is inherently context-aware. This structure addresses the frequency mismatch between fast control and slow reasoning, enabling efficient independent pretraining of kinematic syntax and modular integration with heavy perception backbones, naturally ensuring spatio-temporally consistent action generation across frames. To synchronize these asynchronous hybrid V-L-A modalities, we utilize a re-anchoring mechanism that mathematically accounts for perception staleness during both training and inference. Experiments on simulated and real-robot manipulation tasks demonstrate that the proposed method can effectively replace traditional chunk-based action heads for both specialist and generalist policies. AR-VLA exhibits superior history awareness and substantially smoother action trajectories while maintaining or exceeding the task success rates of state-of-the-art reactive VLAs. Overall, our work introduces a scalable, context-aware action generation schema that provides a robust structural foundation for training effective robotic policies. Code and videos are available at https://arvla.insait.ai

2603.09970 2026-05-12 cs.CL

CREATE: Testing LLMs for Associative Creativity

Manya Wadhwa, Tiasa Singha Roy, Harvey Lederman, Junyi Jessy Li, Greg Durrett

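Affiliations: New York University; The University of Texas at Austin

AI summary: CREATE is a benchmark for evaluating the associative creativity of large language models. The task asks models to generate paths connecting concepts; paths should be highly specific and diverse, and a model scores higher the more high-quality paths it produces. The study finds that the strongest frontier models achieve higher creative utility, but the very large search space makes the benchmark hard to saturate, and thinking models are not always more effective even with high token budgets. CREATE offers a sandbox for developing methods that improve associative creativity.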

Abstract

A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making benchmark saturation difficult to achieve. Furthermore, our results illustrate that thinking models are not always more effective on our task, even with high token budgets. Recent approaches for creative prompting give some but limited additional improvement. CREATE provides a sandbox for developing new methods to improve models' capacity for associative creativity.

2603.09465 2026-05-12 cs.CV cs.AI

EvoDriveVLA: Evolving Driving VLA Models via Collaborative Perception-Planning Distillation

Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Zijian Wang, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang, Xianming Liu, Shuchang Zhou, Yang Wang, Shanghang Zhang

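Affiliations: State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; XPeng Motors

AI summary: This paper proposes EvoDriveVLA, a collaborative perception-planning distillation framework that targets two problems of Vision-Language-Action models in autonomous driving: degraded perception after the visual encoder is unfrozen, and instability in long-term planning. The method combines self-anchored perceptual constraints with future-informed trajectory optimization: a self-anchor teacher guides the student toward key regions, and a future-aware oracle teacher performs trajectory refinement with uncertainty modeling, improving both perception and planning. Experiments report strong performance on nuScenes and NAVSIM.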

Comments 19 pages, 5 figures, 5 tables

Abstract

Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and future-informed trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, future-informed trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to synthesize reasoning trajectories that model future evolutions, enabling the student model to internalize the future-aware insights of the teacher. EvoDriveVLA achieves SOTA performance in nuScenes open-loop evaluation and significantly enhances performance in NAVSIM closed-loop evaluation. Our code is available at: https://github.com/hey-cjj/EvoDriveVLA.

2603.08588 2026-05-12 cs.LG cs.AI

Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control

Riccardo De Monte, Matteo Cederle, Gian Antonio Susto

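Affiliations: Department of Information Engineering, University of Padova

AI summary: This paper studies how to adapt existing batch deep reinforcement learning methods to the streaming setting to meet the needs of resource-limited hardware. The authors propose two new streaming deep RL algorithms, S2AC and SDAC, which remain compatible with state-of-the-art batch RL methods while reaching performance comparable to existing streaming methods on standard benchmarks, without tedious hyperparameter tuning. The work also examines the batch-to-streaming transition and proposes an approach that preserves the performance of the pre-trained policy.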

Abstract

State-of-the-art deep reinforcement learning (RL) methods have achieved remarkable performance in continuous control tasks, yet their computational complexity is often incompatible with the constraints of resource-limited hardware, due to their reliance on replay buffers, batch updates, and target networks. The emerging paradigm of streaming deep RL addresses this limitation through purely online updates, achieving strong empirical performance on standard benchmarks. In this work, we propose two novel streaming deep RL algorithms, Streaming Soft Actor-Critic (S2AC) and Streaming Deterministic Actor-Critic (SDAC), explicitly designed to be compatible with state-of-the-art batch RL methods, making them particularly suitable for on-device finetuning applications such as Sim2Real transfer. Both algorithms achieve performance comparable to state-of-the-art streaming baselines on standard benchmarks without requiring tedious per-environment hyperparameter tuning. We further investigate the batch-to-streaming transition, showing that a naive transition does not guarantee preservation of pre-trained policy performance, and propose a principled approach to address this challenge.

2603.08065 2026-05-12 cs.LG cs.CL

Deterministic Differentiable Structured Pruning for Large Language Models

Weiyu Huang, Pengle Zhang, Xiaolu Zhang, Jun Zhou, Jun Zhu, Jianfei Chen

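Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China

AI summary: This work proposes Deterministic Differentiable Pruning (DDP), a structured pruning method for reducing the inference cost of large language models. Unlike prior methods that rely on stochastic hard-concrete relaxations, DDP directly optimizes a deterministic soft surrogate of the discrete l0 objective, which removes stochasticity, reduces train-test mismatch, and speeds up convergence. Experiments on several dense and MoE models show a performance loss as small as 1% on downstream tasks, better results than previous methods at 20% sparsity, and end-to-end inference speedups in realistic deployment settings.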

Comments Published at ICML 2026

Abstract

Structured pruning reduces LLM inference cost by removing low-importance architectural components. This can be viewed as learning a multiplicative gate for each component under an l0 sparsity constraint. Due to the discreteness of the l0 norm, prior work typically adopts stochastic hard-concrete relaxations to enable differentiable optimization; however, this stochasticity can introduce a train-test mismatch when sampled masks are discretized for deployment and restricts masks to a bounded, near-binary range. To address this, we propose Deterministic Differentiable Pruning (DDP), a mask-only optimization method that eliminates stochasticity by directly optimizing a deterministic soft surrogate of the discrete l0 objective. Compared with prior approaches, DDP offers greater expressiveness, reduced train-test mismatch, and faster convergence. We apply our method to several dense and MoE models, including Qwen3-32B and Qwen3-30B-A3B, achieving a performance loss as small as 1% on downstream tasks while outperforming previous methods at 20% sparsity. We further demonstrate end-to-end inference speedups in realistic deployment settings with vLLM.
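
The stochastic hard-concrete relaxation that this abstract contrasts against can be sketched as follows. This is a minimal illustration of the prior-work gate as it is commonly formulated, not DDP's deterministic surrogate, whose form the abstract does not give. The function names and the default constants (temperature 2/3, stretch interval [-0.1, 1.1]) are assumptions chosen for illustration.

```python
import math
import random


def hard_concrete_gate(log_alpha, rng, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
    """Sample one stochastic gate in [0, 1] (training-time behaviour)."""
    # Keep the uniform sample away from 0 and 1 so both logs are finite.
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)
    logit = (math.log(u) - math.log(1.0 - u) + log_alpha) / beta
    s = 1.0 / (1.0 + math.exp(-logit))
    # Stretch to (gamma, zeta), then clamp so exact 0 and 1 have nonzero mass.
    stretched = s * (zeta - gamma) + gamma
    return min(1.0, max(0.0, stretched))


def noise_free_gate(log_alpha, gamma=-0.1, zeta=1.1):
    """Gate with the sampling noise removed, as typically used at test time."""
    s = 1.0 / (1.0 + math.exp(-log_alpha))
    return min(1.0, max(0.0, s * (zeta - gamma) + gamma))


rng = random.Random(0)
samples = [hard_concrete_gate(0.0, rng) for _ in range(1000)]
print(min(samples), max(samples), noise_free_gate(0.0))
```

For the same gate parameter, the sampled training-time values spread over [0, 1] while the noise-free value is a single number; that gap is one way to picture the train-test mismatch the abstract describes.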

2603.04783 2026-05-12 cs.AI cs.CL

Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction

Xingwu Chen, Zhanqiu Zhang, Yiwen Guo, Difan Zou


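AI summary: Although large language models reason well in a single turn, their performance tends to drop in multi-turn interactions where information is revealed incrementally or needs updating. The root cause is "contextual inertia": models cling to their earlier reasoning path and ignore corrections in later inputs. The paper proposes RLSTA, a reinforcement learning method with single-turn anchors that uses the model's stronger single-turn capability as a stable reference to guide it in adjusting its reasoning during multi-turn interaction, breaking contextual inertia. Experiments show strong performance and cross-domain generalization, and the method remains effective without external verifiers.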

Abstract

While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to integrate new constraints, leading to a collapse in performance compared to their single-turn baselines. We term the root cause Contextual Inertia: a phenomenon where models rigidly adhere to previous reasoning traces. Even when users explicitly provide corrections or new data in later turns, the model ignores them, preferring to maintain consistency with its previous (incorrect) reasoning path. To address this, we introduce Reinforcement Learning with Single-Turn Anchors (RLSTA), a generalizable training approach designed to stabilize multi-turn interaction across diverse scenarios and domains. RLSTA leverages the model's superior single-turn capabilities as stable internal anchors to provide reward signals. By aligning multi-turn responses with these anchors, RLSTA empowers models to break contextual inertia and self-calibrate their reasoning based on the latest information. Experiments show that RLSTA significantly outperforms standard fine-tuning and abstention-based methods. Notably, our method exhibits strong cross-domain generalization (e.g., math to code) and proves effective even without external verifiers, highlighting its potential for general-domain applications. Code is available at https://github.com/Tencent/RLSTA.

2603.03756 2026-05-12 cs.LG cs.CE cs.CL

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Zonglin Yang, Lidong Bing

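Affiliations: MiroMind AI

AI summary: Although large language models show promise in scientific discovery, existing work focuses on inference or feedback-driven training and does not directly model the generative reasoning process $P(h|b)$. This paper proposes the MOOSE-Star framework, which uses decomposed subtasks, motivation-guided hierarchical search, and bounded composition to reduce complexity from exponential to logarithmic in the best case, enabling scalable training of $P(h|b)$. To support the framework, the authors also release TOMATO-Star, a dataset of 108,717 decomposed papers. Experiments show that MOOSE-Star keeps scaling with training data and inference budget, whereas direct sampling hits a complexity wall.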

Comments Accepted by ICML 2026

Abstract

While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training $P(h|b)$ is mathematically intractable due to the combinatorial complexity ($O(N^k)$) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework that enables tractable and scalable training of $P(h|b)$, while supporting more scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic ($O(\log N)$) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable logarithmic retrieval and prune irrelevant subspaces, and (3) utilizing bounded composition for robustness against retrieval noise. To facilitate this, we release TOMATO-Star, a dataset of 108,717 decomposed papers (38,400 GPU hours) for training. Empirically, MOOSE-Star scales continuously with training data and inference budget, whereas direct brute-force sampling hits a complexity wall.

2603.03239 2026-05-12 cs.CV

COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data

Miguel Espinosa, Eva Gmelich Meijling, Valerio Marsocci, Elliot J. Crowley, Mikolaj Czerkawski

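Affiliations: School of Engineering, University of Edinburgh; European Space Agency (ESA); Asterisk Labs

AI summary: This work proposes COP-GEN, a multimodal latent diffusion transformer for generating Copernicus Earth observation data that models the joint distribution of different sensors (optical, radar, elevation, and land cover) at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN supports flexible any-to-any conditional generation, including zero-shot modality translation without task-specific retraining. Experiments show that the model generates diverse, physically consistent observations while maintaining strong peak fidelity, and that it clearly outperforms existing methods on the benchmark the authors built.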

Abstract

Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover. Relationships between modalities are fundamental for data integration but are inherently non-injective: identical conditioning information can correspond to multiple physically plausible observations, and should be parametrised as conditional distributions. Deterministic models, by contrast, collapse toward conditional means and fail to represent the uncertainty and variability required for tasks such as data completion and cross-sensor translation. We introduce COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous EO modalities at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN enables flexible any-to-any conditional generation, including zero-shot modality translation without task-specific retraining. Experiments show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities. Qualitative and quantitative analyses demonstrate that the model captures meaningful cross-modal structure and adapts its output uncertainty as conditioning information increases. We release a stochastic benchmark built from multi-temporal Sentinel-2 observations that enables distribution-level comparison of generative EO models. On this benchmark, COP-GEN covers 90% of the real observation manifold and 63% of its per-band reflectance range, while the strongest competing method collapses to 2.8% and 18%, respectively. These results highlight the importance of stochastic generative modeling for EO and motivate evaluation protocols beyond single-reference, pointwise metrics. Website: https://miquel-espinosa.github.io/cop-gen

2603.01960 2026-05-12 cs.LG cs.AI

TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

Taimur Khan

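Affiliations: Helmholtz Centre for Environmental Research - UFZ

AI summary: TiledAttention is a scaled dot-product attention (SDPA) forward operator for NVIDIA GPUs, intended to speed up SDPA research. It follows the FlashAttention-style online-softmax formulation, is implemented with cuTile/TileIR, and allows the schedule to be modified from Python, balancing performance and customizability. Experiments show large speedups over standard eager attention paths, and the operator can be used directly in PyTorch workflows.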

Abstract

TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easier to modify than low-level CUDA templates while retaining realistic behavior via online softmax and tiled $K,V$ streaming. Algorithmically, TiledAttention follows the established FlashAttention-style online-softmax formulation; our novelty is the cuTile/TileIR implementation strategy, schedule-level modifiability, and reproducible benchmarking/profiling workflow. The approach is both performant and directly editable at the schedule level from Python (tile shapes, staging, shared-memory layout), enabling rapid, reproducible kernel research without template-heavy CUDA/CUTLASS rewrites. We benchmark TiledAttention on an NVIDIA DGX GB10 node with a reproducible harness and compare against PyTorch SDPA (auto-dispatch), explicit unfused baselines (torch_sdpa_math, standard eager attention), and forced backend probes (FlashAttention2, EfficientAttention, CuDNN Attention) across sequence length, head dimension, and precision (FP16/BF16). While production fused baselines remain stronger overall, TiledAttention delivers large speedups over standard eager attention paths and is available for direct use within PyTorch workflows, providing a practical balance between performance and customizability.
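
The online-softmax formulation with tiled K,V streaming that the abstract refers to can be illustrated in plain Python for a single query vector. This is a sketch of the general technique only: it is not the cuTile/TileIR kernel, and the function names and the `tile_size` parameter are assumptions introduced here.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d)) over all keys at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, tile_size=2):
    """Stream K/V in tiles, keeping a running max, normalizer and accumulator."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    normalizer = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile_size):
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new running max.
        correction = math.exp(running_max - new_max)
        normalizer *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            normalizer += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / normalizer for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.3, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, tile_size=2))
```

Both calls should print the same vector up to floating-point rounding. The tiled version never holds more than one tile of scores at a time, which is the property that lets a fused kernel avoid materializing the full attention matrix.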

2603.00541 2026-05-12 cs.LG stat.ML

Spectral Condition for $μ$P under Width-Depth Scaling

Chenyu Zheng, Rongzhen Wang, Xinyu Zhang, Chongxuan Li

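Affiliations: Gaoling School of AI, Renmin University of China

AI summary: As generative foundation models scale in both width and depth, stable feature learning and reliable hyperparameter transfer become challenging. This paper proposes a unified spectral framework for maximal update parameterization ($μ$P) under joint width-depth scaling, specifying how the norms of weights and their per-step updates should scale with width and depth, and revealing a transition from single-transformation ($k=1$) to multi-transformation ($k\geq 2$) residual blocks. The framework applies to a broad class of optimizers; experiments on GPT-2 style language models show stable feature learning and robust hyperparameter transfer, outperforming standard parameterization and $μ$P in the $k=1$ case.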

Comments 76 pages, 13 figures, 40 tables

Abstract

Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model sizes. While maximal update parameterization ($μ$P) has provided a principled solution to both problems for width scaling, existing extensions to the joint width-depth scaling regime remain fragmented, architecture- and optimizer-specific, and often rely on technically involved theories. In this work, we develop a simple and unified spectral framework for $μ$P under joint width-depth scaling. For deep residual networks whose residual blocks contain $k$ transformations, the framework specifies how the norms of weights and their per-step updates should scale with width and depth. It reveals a fundamental transition from $k=1$ to $k\geq 2$, unifying previously disparate $μ$P formulations and identifying the $k\geq 2$ case as more appropriate for practical architectures with multi-transformation branches such as Transformers. Building on this framework, we derive a general recipe for implementing $μ$P across a broad class of optimizers by mapping spectral constraints to concrete HP parameterizations, recovering existing results and extending them to additional optimizers. Finally, experiments on GPT-2 style language models show that the $μ$P formulation derived from the $k\geq 2$ case achieves stable feature learning and robust HP transfer under width-depth scaling, whereas standard parameterization and $μ$P in the $k=1$ case often fail to do so. These results support the practical effectiveness of the proposed spectral framework.

2602.23928 2026-05-12 cs.CL

The Astonishing Ability of Large Language Models to Parse Jabberwockified Language

Gary Lupyan, Senyi Yang

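Affiliations: Department of Psychology, University of Wisconsin-Madison

AI summary: This study shows that large language models have an astonishing ability to parse severely degraded English text. Given "Jabberwockified" text in which content words are randomly replaced by nonsense strings, the models can still recover conventional English sentences close to the original meaning. The results indicate that cues such as syntactic structure and closed-class words constrain word meaning far more than previously thought, and they offer insight into the mechanisms of language processing.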

Comments Submitted to the 2026 Annual Meeting of the Cognitive Science Society

Abstract

We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsense strings, e.g., "At the ghybe of the swuint, we are haiveed to Wourge Phrear-gwurr, who sproles into an ghitch flount with his crurp", can be translated to conventional English that is, in many cases, close to the original text, e.g., "At the start of the story, we meet a man, Chow, who moves into an apartment building with his wife." These results show that structural cues (e.g., morphosyntax, closed-class words) constrain lexical meaning to a much larger degree than imagined. Although the abilities of LLMs to make sense of "Jabberwockified" English are clearly superhuman, they are highly relevant to understanding linguistic structure and suggest that efficient language processing either in biological or artificial systems likely benefits from very tight integration between syntax, lexical semantics, and general world knowledge.
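
A rough sketch of the kind of substitution described above: content words are swapped for nonsense strings while closed-class words are left in place. The word list, the nonsense generator, and the function names are all assumptions made for illustration; the abstract's own example also keeps inflectional endings (for example "haiveed", "sproles"), which this simplified version does not attempt.

```python
import random
import re

# Hypothetical, deliberately small closed-class word list (an assumption).
FUNCTION_WORDS = {
    "a", "an", "the", "at", "of", "we", "are", "to", "who",
    "into", "with", "his", "and", "in", "is",
}


def nonsense_word(length, rng):
    """Build a pronounceable-looking string by alternating consonants and vowels."""
    consonants, vowels = "bcdfghjklmnprstvwz", "aeiou"
    return "".join(
        rng.choice(consonants if i % 2 == 0 else vowels)
        for i in range(max(length, 2))
    )


def jabberwockify(text, seed=0):
    """Replace every word outside FUNCTION_WORDS with a same-length nonsense word."""
    rng = random.Random(seed)

    def swap(match):
        word = match.group(0)
        if word.lower() in FUNCTION_WORDS:
            return word
        fake = nonsense_word(len(word), rng)
        return fake.capitalize() if word[0].isupper() else fake

    return re.sub(r"[A-Za-z]+", swap, text)


print(jabberwockify("We meet a man who moves into the building with his wife."))
```

The function words and punctuation survive unchanged, so the sentence frame stays readable even though every content word is gone; a fixed `seed` makes the output reproducible.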

2602.22953 2026-05-12 cs.AI

General Agent Evaluation

Elron Bandel, Asaf Yehudai, Lilach Eden, Yehoshua Sagron, Yotam Perlitz, Elad Venezian, Natalia Razinkov, Natan Ergas, Shlomit Shachor Ifergan, Segev Shlomov, Michal Jacovi, Leshem Choshen, Liat Ein-Dor, Yoav Katz, Michal Shmueli-Scheuer

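Affiliations: IBM Research; MIT

AI summary: This study systematically evaluates general-purpose agents across different protocols and unfamiliar environments, comparing tool-calling, MCP, code-generation, and CLI agent architectures. It contributes a unifying protocol and an evaluation harness, and builds the first open general agent leaderboard covering multiple backbone models and benchmarks. The experiments find that general agents adapt to different tasks without domain-specific customization, that architecture choice has a sizable effect on performance, and that open-weight models show clear weaknesses in generality.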

Comments Presented at the ICLR 2026 Workshop on Agents in the Wild

Abstract

General-purpose agents perform tasks in unfamiliar environments without domain-specific manual customization. Yet no study has systematically measured how agent architecture shapes performance across heterogeneous protocols and diverse unfamiliar environments. This is the first systematic study, comparing tool-calling, MCP, code-generation, and CLI agents on the same benchmarks with the same models. Two gaps blocked such a study: existing harnesses require per-benchmark wiring or fixed protocol classes (web for BrowserGym, CLI for Harbor), and benchmarks themselves expect human-authored prompts, context, and integration glue. To enable this study, we contribute (1) a unifying protocol that bridges existing benchmark and agent protocols; (2) an evaluation harness that surfaces any benchmark to any general-purpose agent and backbone model; and (3) the first Open General Agent Leaderboard of agent configurations, a full factorial over 5 agent architectures x 5 backbone LLMs (three closed-source, two open-weight) x 6 benchmarks spanning software engineering, customer service, deep research, and personal assistance. We find that (i) general agents adapt to every tested domain without per-domain customization; (ii) agent architecture choice swings results by up to 12pp within a single model, yet backbone model choice dominates overall performance; (iii) on 4 of 6 tested benchmarks, top general agents are indistinguishable from the leading heavily-customized domain-specific agents; (iv) open-weight models tested exhibit "generality sinks" absent from frontier closed-source models: they consistently collapse on specific agent architectures or benchmarks; (v) a behavioral failure analysis reveals architecture-distinctive error signatures that aggregate scoring cannot discriminate. Code, harness, leaderboard, and traces are at https://www.exgentic.ai.
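
To make the size of the full factorial concrete, the grid can be enumerated directly. The labels below are placeholders (assumptions), since the abstract gives only the counts and does not name every architecture, backbone, or benchmark.

```python
from itertools import product

# Placeholder labels (assumptions); only the counts come from the abstract.
architectures = [f"architecture_{i}" for i in range(1, 6)]
backbones = [f"backbone_{i}" for i in range(1, 6)]
benchmarks = [f"benchmark_{i}" for i in range(1, 7)]

agent_configs = list(product(architectures, backbones))
grid = list(product(architectures, backbones, benchmarks))

print(len(agent_configs), "agent configurations")
print(len(grid), "evaluation cells")
```

Under these counts there are 5 x 5 = 25 agent configurations and 25 x 6 = 150 configuration-benchmark cells.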

2602.22611 2026-05-12 cs.LG

Mitigating Membership Inference in Intermediate Representations with Differentially Private Training

Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen

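Affiliations: School of Information, Renmin University of China, Beijing, China; School of Computer Science; Big Data, Minjiang University, Fuzhou, Fujian, China

AI summary: In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for intermediate representations (IRs), which can leak training-set membership information and enable membership inference attacks (MIAs). This paper proposes LM-DP-SGD, a layer-wise differentially private training method that estimates each layer's MIA risk and allocates privacy protection accordingly, mitigating membership inference in intermediate representations while preserving model utility. Experiments show that, under the same privacy budget, the method reduces peak IR-level MIA risk and achieves a better privacy-utility trade-off.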

Abstract

In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
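
The baseline that the abstract starts from, per-example gradient clipping followed by Gaussian noise, can be sketched in a few lines. The optional `layer_weights` argument is a hypothetical stand-in for the layer reweighting idea: the abstract does not say how LM-DP-SGD computes or applies its weights, so this is an assumption for illustration and not the authors' method. Weights are assumed to lie in [0, 1] so that each example's contribution stays within the clipping norm.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng,
                    layer_weights=None):
    """Return a noised average gradient as one list of floats per layer.

    per_example_grads: list over examples; each example is a list of
    per-layer gradient vectors (lists of floats).
    layer_weights: hypothetical per-layer factors in [0, 1]; None is uniform.
    """
    template = per_example_grads[0]
    weights = layer_weights if layer_weights is not None else [1.0] * len(template)
    summed = [[0.0] * len(layer) for layer in template]
    for grads in per_example_grads:
        # Global L2 norm across all layers of this example's gradient.
        norm = math.sqrt(sum(g * g for layer in grads for g in layer))
        factor = min(1.0, clip_norm / (norm + 1e-12))
        for li, layer in enumerate(grads):
            for gi, g in enumerate(layer):
                summed[li][gi] += weights[li] * factor * g
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    return [[(value + rng.gauss(0.0, sigma)) / batch for value in layer]
            for layer in summed]


rng = random.Random(0)
batch_grads = [[[3.0, 4.0], [0.0]], [[0.1, 0.0], [0.2]]]
print(dp_sgd_gradient(batch_grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

With `noise_multiplier=0.0` the function reduces to plain clipped averaging, which is a convenient way to check the clipping step in isolation.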

2602.21307 2026-05-12 cs.LG

SymTorch: Symbolic Distillation of Neural Networks

Elizabeth S. Z. Tan, Adil Soubki, Miles Cranmer

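Affiliations * Department of Applied Mathematics and Theoretical Physics

AI summary The paper presents SymTorch, a symbolic distillation method that aims to reveal the mathematical functions learned by neural network components and to express them as interpretable closed-form expressions. Built on PySR, the method applies to a range of network architectures and is used for automated discovery of physical laws, improved model interpretability, and more efficient neural networks. The study demonstrates SymTorch's broad applicability and strong performance in symbolic regression, model interpretation, and resource optimization.

Details
English abstract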

What mathematical functions do neural network components learn? Symbolic distillation addresses this question by expressing neural network components with interpretable, closed-form mathematical expressions that expose the functional structure learned during training. We develop symbolic distillation as a systematic, architecture-agnostic methodology, and release our approach as the open-source SymTorch package - a PySR-powered library built natively for the PyTorch ecosystem. Applying this methodology across diverse architectures, we find that SymTorch is successful in the automated discovery of physical laws. Specifically, our approach (1) recovers pairwise interaction forces from graph neural networks trained on empirical $n$-body observations, (2) distills the exact closed-form PDE/ODE solutions of multiple physical systems, including the value of constants, from physics-informed neural networks trained on sparse data, and (3) uncovers the chaotic dynamics of the Lorenz system from high-dimensional data, ultimately outperforming the base neural network on downstream prediction tasks. We further demonstrate the utility of our framework for model interpretability by providing an optimized implementation of SLIME - a symbolic extension to the LIME explainability method. SLIME consistently outperforms LIME on predictive metrics across eight popular classification and regression benchmarks, while still providing an interpretable local symbolic model. Lastly, we investigate replacing transformer MLP layers with symbolic surrogates: replacing 1-7 layers with symbolic approximations yields 2-19% throughput improvements and up to 18.7% VRAM reduction, with the resulting hybrid models lying on the Pareto front of throughput versus perplexity among open-source LLMs of comparable scale.

2602.18866 2026-05-12 cs.LG stat.ML

$(α,β)$-Stability for Boosting Vector-Valued Prediction

Jian Qian, Shu Ge

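Affiliations * The University of Hong Kong; Independent Researcher

AI summary The paper studies boosting for vector-valued prediction and introduces $(α,β)$-stability by geometric median, a notion for analyzing how aggregation amplifies weak predictors into strong ones. The authors characterize this stability property under several natural divergences and, building on it, propose a generic boosting framework, \geomedboost, based on exponential reweighting and geometric-median aggregation. Under a weak learner condition, the framework guarantees exponential decay of the empirical divergence error, from which population error bounds follow.

Details
English abstract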

Despite the widespread use of boosting in structured prediction, a general theoretical understanding of aggregation beyond scalar prediction remains incomplete. We study vector-valued prediction under a target divergence and identify a geometric stability property under which aggregation amplifies weak guarantees into strong ones. We formalize this property as $(α,β)$-stability by geometric median and show how it supports a boosting framework based on exponential reweighting and geometric-median aggregation. For vector-valued prediction, we characterize this stability property under several natural divergences: $\ell_1$ and $\ell_2$ distances for unconstrained vector-valued prediction, and TV, Hellinger, and KL for density estimation over finite probability vectors. Building on these results, we propose a generic boosting framework \geomedboost. Under a weak learner condition and $(α,β)$-stability, we obtain exponential decay of the empirical divergence error, which then yields population guarantees through a generalization bound.
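
To illustrate the aggregation step only, here is a minimal sketch of computing a geometric median with Weiszfeld's iteration, a standard algorithm for this problem. The abstract does not say which algorithm the authors use, so the choice of Weiszfeld, the function name, and the iteration count are assumptions.

```python
import math


def geometric_median(points, iters=200, eps=1e-12):
    """Approximate the point minimizing the sum of Euclidean distances to `points`."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to the current
        # estimate; the clamp avoids division by zero at a data point.
        weights = [1.0 / max(math.dist(p, current), eps) for p in points]
        total = sum(weights)
        current = [sum(w * p[i] for w, p in zip(weights, points)) / total
                   for i in range(dim)]
    return current
```

Unlike the mean, the result is not dragged far by a minority of bad predictions: with three predictors at the origin and one at (100, 100), the mean is (25, 25) while the geometric median stays at the origin.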

2602.17546 2026-05-12 cs.CL cs.LG

Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning

Jyotin Goel, Souvik Maji, Pratik Mazumder

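Affiliations * Indian Institute of Technology Jodhpur

AI summary The paper studies how to keep the safety of language models from degrading during fine-tuning and proposes an adaptive regularization framework that adjusts regularization according to safety risk, improving safety while preserving utility. Safety risk during training is estimated in two ways: a judge-based critic that assigns high-level harm scores to training batches, and a lightweight classifier that predicts harmful intent from intermediate activations. Experiments show that the method lowers attack success rates across multiple models and attack scenarios without adding inference-time overhead.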

Comments Work in progress (48 pages)

Details
English abstract

Instruction-following language models are trained to be helpful and safe, yet their safety behavior can deteriorate under benign fine-tuning and worsen under adversarial updates. Existing defenses often offer limited protection or force a trade-off between safety and utility. We introduce a training framework that adapts regularization in response to safety risk, enabling models to remain aligned throughout fine-tuning. To estimate safety risk at training time, we explore two distinct approaches: a judge-based Safety Critic that assigns high-level harm scores to training batches, and an activation-based risk predictor built with a lightweight classifier trained on intermediate model activations to estimate harmful intent. Each approach provides a risk signal that is used to constrain updates deemed higher risk to remain close to a safe reference policy, while lower-risk updates proceed with standard training. We empirically verify that harmful intent signals are predictable from pre-generation activations and that judge scores provide effective high-recall safety guidance. Across multiple model families and attack scenarios, adaptive regularization with either risk estimation approach consistently lowers attack success rate compared to standard fine-tuning, preserves downstream performance, and adds no inference-time cost. This work demonstrates a principled mechanism for maintaining safety without sacrificing utility.
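
The idea of constraining only higher-risk updates can be sketched as a risk-dependent penalty toward a reference policy. This is a hedged illustration: the KL penalty, the threshold, the linear ramp, and all names below are assumptions, since the abstract does not specify the form of the regularizer or how the risk signal sets its strength.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(task_loss, policy_probs, reference_probs, risk,
                     threshold=0.5, max_coeff=1.0):
    """Add a penalty toward the reference policy that grows with the risk score.

    `risk` is a score in [0, 1] from some external estimator (hypothetical here).
    At or below `threshold` the update proceeds with the plain task loss.
    """
    if risk <= threshold:
        coeff = 0.0
    else:
        coeff = max_coeff * (risk - threshold) / (1.0 - threshold)
    return task_loss + coeff * kl_divergence(policy_probs, reference_probs)
```

For example, `regularized_loss(2.0, [0.9, 0.1], [0.5, 0.5], risk=0.2)` returns the unmodified task loss, while the same call with `risk=1.0` adds the full KL term.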

2602.17251 2026-05-12 cs.LG

SCOPE: Structured Prototype-Guided Adaptation for EEG Foundation Models with Limited Labels

Jingying Ma, Feng Wu, Yucheng Xing, Qika Lin, Tianyu Liu, Chenyu Liu, Ziyu Jia, Mengling Feng

Affiliations * Saw Swee Hock School of Public Health, National University of Singapore; Institute of Data Science, National University of Singapore; Guangzhou Research Translation and Innovation Institute, National University of Singapore; College of Computing and Data Science, Nanyang Technological University; Beijing Key Laboratory of Brainnetome and Brain-Computer Interface, Institute of Automation, Chinese Academy of Sciences; Brainnetome Center, Institute of Automation, Chinese Academy of Sciences

AI summary The paper studies how to adapt EEG foundation models (EFMs) effectively when only a few labeled samples are available. To address the miscalibration, prediction collapse, and representation drift that arise when EFMs are adapted with limited labels, it proposes SCOPE, a structured confidence-aware prototype-guided framework. The method constructs cohort-level external supervision and generates confidence-aware pseudo-labels to select reliable unlabeled samples, and it introduces a lightweight prototype adapter that keeps the EFM frozen to preserve its pretrained representations, achieving strong performance and efficiency across tasks and label ratios.

Details
English abstract

Electroencephalography (EEG) foundation models (EFMs) have shown strong potential for transferable representation learning, yet their adaptation in realistic settings remains challenging when only a few labeled subjects are available. We show that this challenge stems from a structural mismatch between noisy, limited supervision and the highly plastic parameter space of EFMs, reflected in three key failure modes: overconfident miscalibration, prediction collapse, and representation drift caused by unconstrained parameter updates. To address these challenges, we propose SCOPE, a Structured COnfidence-aware Prototype-guided framework for label-limited EFM adaptation. SCOPE first constructs cohort-level external supervision to provide persistent guidance and further derives confidence-aware pseudo-labels to select reliable unlabeled samples for adaptation. Building on the constructed external supervision, SCOPE introduces ProAdapter, a lightweight prototype-conditioned adapter that modulates frozen EFMs to preserve pretrained representations. Experiments across 50 label-limited adaptation settings, covering 6 EEG tasks, 5 EFM backbones, and 5%-50% training labeled-subject ratios, show that SCOPE consistently achieves strong performance and efficiency.
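
Selecting reliable unlabeled samples through confidence-aware pseudo-labels can be illustrated with a simple confidence threshold. This is a generic sketch rather than SCOPE's actual rule: the fixed threshold and the function name are assumptions, and the abstract does not say how confidence is computed.

```python
def select_pseudo_labels(probabilities, threshold=0.75):
    """Return (sample_index, predicted_class) for confidently predicted samples.

    `probabilities` holds one list of class probabilities per unlabeled sample.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected
```

Samples whose top class probability falls below the threshold are simply left out of the adaptation set in this sketch.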

2602.10868 2026-05-12 cs.LG

The Sample Complexity of Uniform Approximation for Multi-Dimensional CDFs and Fixed-Price Mechanisms

Matteo Castiglioni, Anna Lunghi, Alberto Marchesi

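Affiliations * Politecnico di Milano

AI summary The paper studies the sample complexity of learning a uniform approximation of a multi-dimensional cumulative distribution function (CDF) when only one-bit feedback is available. It finds that the sample complexity is nearly invariant to the dimension, which enters only through logarithmic terms. The result yields tight sample complexity bounds and new regret guarantees for learning fixed-price mechanisms in small markets.

Details
English abstract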

We study the sample complexity of learning a uniform approximation of an $n$-dimensional cumulative distribution function (CDF) within an error $ε > 0$, when observations are restricted to a minimal one-bit feedback. This serves as a counterpart to the multivariate DKW inequality under "full feedback", extending it to the setting of "bandit feedback". Our main result shows a near-dimensional-invariance in the sample complexity: we get a uniform $ε$-approximation with a sample complexity $\frac{1}{ε^3}\log\left(\frac{1}{ε}\right)^{\mathcal{O}(n)}$ over an arbitrarily fine grid, where the dimensionality $n$ only affects logarithmic terms. As direct corollaries, we provide tight sample complexity bounds and novel regret guarantees for learning fixed-price mechanisms in small markets, such as bilateral trade settings.

2602.09789 2026-05-12 cs.LG

When Less is More: The LLM Scaling Paradox in Context Compression

Ruishan Guo, Yibing Liu, Guoxin Ma, Yan Wang, Yueyang Zhang, Long Xia, Kecheng Chen, Zhiyuan Sun, Daiting Shi

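Affiliations * Baidu Inc.; Tsinghua University; Xi’an Jiaotong University; City University of Hong Kong

AI summary The paper studies a "size-fidelity paradox" that arises when large language models are scaled up for context compression: a larger compressor can lower reconstruction error yet reduce the faithfulness of the reconstructed content. It attributes the phenomenon mainly to two mechanisms, knowledge overwriting and semantic drift, and uses analyses of embedding geometry and reconstruction determinacy to show how large models organize memory across semantic subspaces, leading to ambiguous representations, overwriting, and weakened recovery. The findings complement existing evaluations of context compression and indicate that scaling laws may break down when the objective shifts from plausible generation to faithful preservation of the original information.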

Comments 22 pages, 7 figures, conference

Details
English abstract

Scaling up model parameters has long been a prevalent training paradigm driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compression in a compressor-decoder setup, we find a Size-Fidelity Paradox: increasing compressor size can lessen the faithfulness of reconstructed contexts even though reconstruction error decreases. Across 27 compressor setups spanning model families, scales, and compression rates, we attribute this paradox to two dominant factors: 1) knowledge overwriting: larger models increasingly replace source facts with their own prior beliefs, e.g., "the white strawberry" $\to$ "the red strawberry"; and 2) semantic drift: larger models tend to paraphrase or restructure content instead of reproducing it verbatim, e.g., "Alice hit Bob" $\to$ "Bob hit Alice". Interestingly, this paradox persists across varied settings, with mid-sized compressors often outperforming larger ones in faithful recovery. By analyzing the compressed memory via embedding geometry and reconstruction determinacy, we further reveal that compressors tend to organize memory across broader semantic subspaces, yielding more ambiguous representations prone to overwriting, drift, and weakened recovery. These findings complement existing evaluations of context compression and expose a breakdown of scaling laws when the objective shifts from plausible generation to faithful preservation.

2602.08617 2026-05-12 cs.LG

ERIS: Enhancing Privacy and Scalability in Federated Learning via Federated Shard Aggregation

Dario Fenoglio, Pasquale Polverino, Jacopo Quizi, Martin Gjoreski, Akash Dhasade, Marc Langheinrich

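Affiliations * Università della Svizzera italiana; Carnegie Mellon University

AI summary The paper presents ERIS, a federated learning framework that introduces Federated Shard Aggregation (FSA) to improve privacy while addressing scalability in large-model training. ERIS partitions client updates into non-overlapping shards and aggregates them in a distributed manner across multiple client-side aggregators, removing the central aggregation bottleneck, limiting the information available to any single observer, and reproducing the centralized federated learning update after reassembly. Experiments show that ERIS preserves model performance while reducing communication overhead and improving robustness to membership inference and reconstruction attacks.

Details
English abstract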

Scaling Federated Learning (FL) to billion-parameter models forces a challenging trade-off between privacy, scalability, and model utility. Existing solutions often tackle these challenges in isolation, sacrificing accuracy, relying on costly cryptographic tools, or introducing communication and optimization inefficiencies that affect convergence. We introduce ERIS, an FL framework centered on Federated Shard Aggregation (FSA), a novel mechanism that partitions each client update into non-overlapping shards whose aggregation is distributed across multiple client-side aggregators. FSA removes the central aggregation bottleneck, limits the information visible to any single observer, and preserves the centralized FL update after reassembly. ERIS can further readily integrate Distributed Shifted Compression (DSC) to reduce transmitted payloads and exposed coordinates. We prove that ERIS preserves convergence under standard assumptions and bounds mutual information leakage by the observable fraction of each update, decreasing with the number of client-side aggregators, and with the compression level when DSC is enabled. Experiments across image and text tasks, including large language models, show that ERIS achieves FedAvg-level utility while substantially reducing communication bottlenecks and improving robustness to membership inference and reconstruction attacks, without relying on heavy cryptography or utility-degrading perturbations.
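
The claim that sharded aggregation preserves the centralized update after reassembly can be checked on a toy example. The sketch below is an assumption-laden illustration, not the ERIS protocol: it uses contiguous shards, plain unweighted averaging, and hypothetical function names, and it omits compression and all networking.

```python
def split_into_shards(update, num_shards):
    """Partition a flat update into contiguous, non-overlapping shards."""
    size, remainder = divmod(len(update), num_shards)
    shards, start = [], 0
    for k in range(num_shards):
        end = start + size + (1 if k < remainder else 0)
        shards.append(update[start:end])
        start = end
    return shards


def sharded_average(client_updates, num_aggregators):
    """Aggregator k averages only shard k of every client; results are concatenated."""
    per_client = [split_into_shards(u, num_aggregators) for u in client_updates]
    n = len(client_updates)
    reassembled = []
    for k in range(num_aggregators):
        shard_len = len(per_client[0][k])
        reassembled.extend(
            sum(client[k][i] for client in per_client) / n for i in range(shard_len)
        )
    return reassembled


def central_average(client_updates):
    """Reference: a single server averages the full updates coordinate-wise."""
    n = len(client_updates)
    return [sum(column) / n for column in zip(*client_updates)]
```

Each aggregator in this sketch sees only a fraction of roughly 1/`num_aggregators` of any client's coordinates, yet the concatenated result matches `central_average` coordinate by coordinate.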

2602.07940 2026-05-12 cs.AI

MePo: Meta Post-Refinement for Rehearsal-Free General Continual Learning

Guanglong Sun, Hongwei Yan, Liyuan Wang, Zhiqi Kang, Shuang Cui, Hang Su, Jun Zhu, Yi Zhong

Affiliations * School of Life Sciences, IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences; Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, China; Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing, China; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France; Institute of Software Chinese Academy of Sciences, Beijing, China

AI summary To cope with uncertain changes in the external environment, intelligent systems need to learn continually from complex, dynamic environments and respond in real time, an ability known as general continual learning (GCL). Although pretrained models (PTMs) have markedly improved conventional continual learning, they remain limited in handling diverse, temporally mixed information within a single pass. The paper proposes Meta Post-Refinement (MePo), which constructs pseudo task sequences and a bi-level meta-learning framework to strengthen rehearsal-free continual learning with PTMs, and initializes a meta covariance matrix to make output alignment more robust. Experiments show significant gains on multiple GCL benchmarks.

Details
English abstract

To cope with uncertain changes of the external world, intelligent systems must continually learn from complex, evolving environments and respond in real time. This ability, collectively known as general continual learning (GCL), encapsulates practical challenges such as online datastreams and blurry task boundaries. Although leveraging pretrained models (PTMs) has greatly advanced conventional continual learning (CL), these methods remain limited in reconciling the diverse and temporally mixed information along a single pass, resulting in sub-optimal GCL performance. Inspired by meta-plasticity and reconstructive memory in neuroscience, we introduce here an innovative approach named Meta Post-Refinement (MePo) for PTMs-based GCL. This approach constructs pseudo task sequences from pretraining data and develops a bi-level meta-learning paradigm to refine the pretrained backbone, which serves as a prolonged pretraining phase but greatly facilitates rapid adaptation of representation learning to downstream GCL tasks. MePo further initializes a meta covariance matrix as the reference geometry of pretrained representation space, enabling GCL to exploit second-order statistics for robust output alignment. MePo serves as a plug-in strategy that achieves significant performance gains across a variety of GCL benchmarks and pretrained checkpoints in a rehearsal-free manner (e.g., 15.10%, 13.36%, and 12.56% on CIFAR-100, ImageNet-R, and CUB-200 under Sup-21/1K). Our source code is available at https://github.com/SunGL001/MePo.

2602.06550 2026-05-12 cs.LG cs.AI

Dynamics-Aligned Shared Hypernetworks for Contextual RL under Discontinuous Shifts

Jan Benad, Pradeep Kr. Banerjee, Frank Röder, Nihat Ay, Martin V. Butz, Manfred Eppe

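Affiliations * Institute for Data Science Foundations, TUHH, Germany; Santa Fe Institute, USA; Neuro-Cognitive Modeling Group, University of Tübingen, Germany

AI summary In contextual reinforcement learning, zero-shot generalization remains a core challenge when the latent context changes discontinuously and abruptly alters how actions affect the environment. The paper proposes DMA*-SH, a framework in which a shared hypernetwork trained solely on dynamics prediction generates adapter weights for the dynamics model, policy, and action-value function, introducing an inductive bias matched to discontinuous context shifts. Combined with input/output normalization and random input masking, which stabilize context inference, the method achieves better zero-shot generalization than existing methods on the newly introduced Actuator Inversion Benchmark.

Details
English abstract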

Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularly when the context is latent and must be inferred from data. A canonical failure mode arises when latent context discontinuously changes how actions affect the environment, requiring incompatible control responses across contexts. We propose DMA*-SH, a framework where a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights shared across the dynamics model, policy, and action-value function. This shared modulation imparts an inductive bias matched to discontinuous context-to-dynamics shifts, while input/output normalization and random input masking stabilize context inference, promoting directionally concentrated representations. We provide theoretical support via expressivity separation results for hypernetwork modulation, and a variance decomposition with policy-gradient variance bounds that formalize how within-mode compression improves learning under non-overlapping contexts. For evaluation, we introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate challenging context-to-dynamics interactions, including actuator inversion, actuator permutations, and weakly non-overlapping continuous dynamics. On AIB's held-out tasks, DMA*-SH achieves zero-shot generalization, outperforming domain randomization by 58.1% and surpassing a standard context-aware baseline by 11.5% on average.

2602.06527 2026-05-12 cs.AI

HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction

Shengxuan Qiu, Haochen Huang, Shuzhang Zhong, Pengfei Zuo, Meng Li

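Affiliations * Institute for Artificial Intelligence, Peking University, Beijing; Huawei; School of Integrated Circuits, Peking University, Beijing

AI summary The paper proposes HyPER, a method for balancing exploration and exploitation in large language model reasoning. HyPER dynamically controls the expansion and reduction of hypothesis paths to optimize reasoning under a fixed compute budget, improving accuracy while reducing compute consumption. The method requires no additional training and targets mixture-of-experts models. Experiments show that it significantly improves accuracy and lowers compute cost on multiple benchmarks.

Details
English abstract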

Scaling test-time compute with multi-path chain-of-thought improves reasoning accuracy, but its effectiveness depends critically on the exploration-exploitation trade-off. Existing approaches address this trade-off in rigid ways: tree-structured search hard-codes exploration through brittle expansion rules that interfere with post-trained reasoning, while parallel reasoning over-explores redundant hypothesis paths and relies on weak answer selection. Motivated by the observation that the optimal balance is phase-dependent and that correct and incorrect reasoning paths often diverge only at late stages, we reformulate test-time scaling as a dynamic expand-reduce control problem over a pool of hypotheses. We propose HyPER, a training-free online control policy for multi-path decoding in mixture-of-experts models that reallocates computation under a fixed budget using lightweight path statistics. HyPER consists of an online controller that transitions from exploration to exploitation as the hypothesis pool evolves, a token-level refinement mechanism that enables efficient generation-time exploitation without full-path resampling, and a length- and confidence-aware aggregation strategy for reliable answer-time exploitation. Experiments on four mixture-of-experts language models across diverse reasoning benchmarks show that HyPER consistently achieves a superior accuracy-compute trade-off, improving accuracy by 8 to 10 percent while reducing token usage by 25 to 40 percent.

2602.05391 2026-05-12 cs.CV

Efficient Dataset Distillation for Pre-Trained Self-Supervised Models via Statistical Flow Matching

Qianxin Xia, Jiawei Du, Xin Zhang, Yuhan Zhang, Jielei Wang, Guoming Lu

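Affiliations * University of Electronic Science

AI summary The paper studies how to distill datasets efficiently for pre-trained self-supervised models, producing a small synthetic dataset whose performance approaches that of the original. To avoid the high compute and memory overhead of earlier methods, the authors propose statistical flow matching, which optimizes synthetic images by aligning statistical flows from target class centers to non-target class centers in the original data, substantially reducing resource requirements. Experiments show that the method matches or improves on existing methods while using 10x less GPU memory and 4x less runtime. The authors also propose a classifier inheritance strategy that further improves efficiency and performance.

Details
English abstract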

Dataset distillation seeks to synthesize a highly compact dataset that achieves performance comparable to the original dataset on downstream tasks. For classification tasks that use pre-trained self-supervised models as backbones, previous linear gradient matching optimizes synthetic images by encouraging them to mimic the gradient updates induced by real images on the linear classifier. However, this batch-level formulation requires loading thousands of real images and applying multiple rounds of differentiable augmentations to synthetic images at each distillation step, leading to substantial computational and memory overhead. In this paper, we introduce statistical flow matching, a stable and efficient supervised learning framework that optimizes synthetic images by aligning constant statistical flows from target class centers to non-target class centers in the original data. Our approach loads raw statistics only once and performs a single augmentation pass on the synthetic data, achieving performance comparable to or better than the state-of-the-art methods with 10x lower GPU memory usage and 4x shorter runtime. Furthermore, we propose a classifier inheritance strategy that reuses the classifier trained on the original dataset for inference, requiring only an extremely lightweight linear projector and marginal storage while achieving substantial performance gains.

2602.04712 2026-05-12 cs.CV cs.AI eess.IV

SAR-RAG: ATR Visual Question Answering by Semantic Search, Retrieval, and MLLM Generation

David F. Ramirez, Tim Overman, Kristen Jaskie, Joe Marvin, Andreas Spanias

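Affiliations * SenSIP Center, School of ECEE, Arizona State University; Prime Solutions Group Inc

AI summary The paper presents SAR-RAG, a visual-context image-retrieval-augmented generation (ImageRAG) assisted AI method for automatic target recognition (ATR) in synthetic aperture radar (SAR) imagery. The method combines a multimodal large language model (MLLM) with a vector database of semantic embeddings and retrieves image examples of known target types to improve recognition accuracy for military vehicles in SAR images. Experiments show that SAR-RAG improves on the baseline MLLM in retrieval, classification, and dimension regression metrics.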

Comments Accepted to 2026 SPIE Defense + Security, Automatic Target Recognition XXXVI

Details
English abstract

We present a visual-context image-retrieval-augmented generation (ImageRAG)-assisted AI agent for automatic target recognition (ATR) of synthetic aperture radar (SAR) imagery. SAR is a remote sensing method used in defense and security applications to detect and monitor the positions of military vehicles, which may appear indistinguishable in images. Researchers have extensively studied SAR ATR to improve the differentiation and identification of vehicle types, characteristics, and measurements. Test examples can be compared with known vehicle target types to improve recognition tasks. New methods enhance the capabilities of neural networks, transformer attention, and multimodal large language models. An agentic AI method may be developed to utilize a defined set of tools, such as searching through a library of similar examples. Our proposed method, SAR Retrieval-Augmented Generation (SAR-RAG), combines a multimodal large language model (MLLM) with a vector database of semantic embeddings to support contextual search for image exemplars with known qualities. By recovering past image examples of known true target types, our SAR-RAG system can compare similar vehicle categories, thereby improving ATR prediction accuracy. We evaluate this through search and retrieval metrics, categorical classification accuracy, and numeric regression of vehicle dimensions. These metrics all show improvements when SAR-RAG is added to an MLLM baseline method as an attached ATR memory bank.
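
The contextual search over a library of embedded exemplars can be sketched as a nearest-neighbor lookup. This is an illustrative assumption rather than the system described: cosine similarity, the brute-force scan, the function names, and the example labels are all hypothetical, and a real vector database would replace the sorted list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_exemplars(query_embedding, library, k=2):
    """Return the labels of the k library entries most similar to the query.

    `library` is a list of (label, embedding) pairs with known ground-truth labels.
    """
    ranked = sorted(library,
                    key=lambda item: cosine_similarity(query_embedding, item[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]
```

The retrieved labels and their associated images would then be supplied to the MLLM as context alongside the query image.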

2602.04284 2026-05-12 cs.AI cs.LG

Agent-Omit: Adaptive Context Omission for Efficient LLM Agents

Yansong Ning, Jun Fang, Naiqiang Tan, Hao Liu

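Affiliations * AI Thrust, The Hong Kong University of Science; Didichuxing Co. Ltd

AI summary Efficiently managing an agent's context (such as thoughts and observations) during multi-turn agent-environment interaction is key to improving its performance. Existing methods typically treat the whole interaction trajectory uniformly, overlooking that the necessity of thoughts and the value of observations differ across turns. The paper therefore proposes Agent-Omit, a unified training framework that enables LLM agents to adaptively omit redundant thoughts and observations. Experiments show that the method achieves a strong balance of performance and efficiency on multiple benchmarks.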

Comments ICML 2026

Details
English abstract

Managing agent context (e.g., thought and observation) during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat entire interaction trajectories equally, overlooking that thought necessity and observation utility vary across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, including both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by KL-divergence. Experimental results on five agent benchmarks show that our constructed Agent-Omit-8B obtains performance comparable to seven frontier LLM agents and achieves the best effectiveness-efficiency trade-off among seven efficient LLM agent methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.