arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.00428 2026-06-02 cs.LG cs.AI cs.CL

Finer Parameter Steps for Low-Rank PEFT: A Controlled Study with CP Tensor Adapters

低秩PEFT的更细参数步长:基于CP张量适配器的控制研究

Xinjue Wang, Xiuheng Wang, Yejun Zhang, Sergiy A. Vorobyov, Esa Ollila, Zhi-Yong Wang

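Institutions: University of Science and Technology of China

AI summary: The paper uses fixed-component canonical polyadic (CP) tensor adapters to obtain finer parameter steps and studies how they affect the accuracy-budget trade-off of low-rank adapters. CP adapters fill the gaps between LoRA ranks, but the effect is task-dependent.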

Comments Accepted at the ICML 2026 Workshop on CoLoRAI

AI中文摘要

低秩适配器通常通过扫描少量秩进行比较,但秩也固定了参数预算的分辨率。对于一个$2048{\times}2048$的OPT注意力投影,增加LoRA的一个秩会存储$4096$个可训练标量,导致可行的低预算适配器大小之间存在较大间隙。本文探讨具有更细容量增量的张量化适配器是否会改变观察到的精度-预算权衡。我们通过固定组件的规范多路分解(CP)张量适配器来实例化这个问题。在$32{\times}64{\times}32{\times}64$的张量化下,一个归一化的CP组件每个投影存储$193$个可训练标量,比LoRA的一个秩步长小约21倍。我们在OPT-1.3B上,在匹配的目标模块、训练协议、数据上限和种子调度下,比较了CP适配器和LoRA在SST-2、RTE和BoolQ上的表现。CP训练稳定,并填补了LoRA秩之间的空白,但效果依赖于任务:SST-2早期达到低预算平台,BoolQ在略低于LoRA饱和之前受益于额外的CP组件,而RTE仍然偏好LoRA。因此,更细的参数步长有助于诊断PEFT预算敏感性,但它们本身并不能保证更好的精度-预算曲线。

英文摘要

Low-rank adapters are usually compared by sweeping a small set of ranks, but the rank also fixes the resolution of the parameter budget. For a $2048{\times}2048$ OPT attention projection, increasing LoRA by one rank stores $4096$ trainable scalars, leaving large gaps between feasible low-budget adapter sizes. This paper asks whether a tensorized adapter with finer capacity increments changes the observed accuracy--budget trade-off. We instantiate this question with fixed-component canonical polyadic (CP) tensor adapters. Under a $32{\times}64{\times}32{\times}64$ tensorization, one normalized CP component stores $193$ trainable scalars per projection, about $21$ times smaller than one LoRA rank step. We compare CP adapters and LoRA on OPT-1.3B across SST-2, RTE, and BoolQ under matched target modules, training protocol, data caps, and seed schedules. CP trains stably and fills the gaps between LoRA ranks, but the effect is task-dependent: SST-2 reaches an early low-budget plateau, BoolQ benefits from additional CP components before saturating slightly below LoRA, and RTE remains LoRA-favored. Finer parameter steps are therefore useful for diagnosing PEFT budget sensitivity, but they do not by themselves guarantee a better accuracy--budget curve.
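The step sizes quoted in the abstract can be reproduced by simple counting. The sketch below assumes that one LoRA rank adds one column and one row to the two low-rank factors, and that one normalized CP component stores one factor vector per tensor mode plus a single scalar weight. The second assumption is an inference that happens to match the reported 193; the abstract does not spell it out. Function names are illustrative.

```python
from math import prod


def lora_params_per_rank(d_out: int, d_in: int) -> int:
    """Trainable scalars added by one LoRA rank: d_out for B plus d_in for A."""
    return d_out + d_in


def cp_params_per_component(mode_sizes, normalized: bool = True) -> int:
    """One CP component: one factor vector per mode, plus one scalar weight
    when the factors are normalized (assumed parameterization)."""
    return sum(mode_sizes) + (1 if normalized else 0)


modes = (32, 64, 32, 64)
# The tensorization must cover every entry of the 2048 x 2048 projection.
assert prod(modes) == 2048 * 2048

lora_step = lora_params_per_rank(2048, 2048)
cp_step = cp_params_per_component(modes)
ratio = lora_step / cp_step  # how many CP steps fit inside one LoRA rank step
print(lora_step, cp_step, round(ratio, 1))
```

Under these assumptions, roughly 21 CP components fit between two consecutive LoRA ranks, which is the budget gap that the finer increments fill.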

2606.00427 2026-06-02 cs.LG

Topology-Aware State Abstraction with Tangle Cores for Markov Decision Processes

基于纠缠核的马尔可夫决策过程拓扑感知状态抽象

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

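Institutions: Department of Computer Science, Iowa State University; Department of Civil, Construction & Environmental Engineering, Iowa State University

AI summary: Proposes a tangle-core abstraction framework that builds overlapping state abstractions from graph tangles of empirical transition graphs, guarantees value preservation under an action-consistency condition, and reports favorable compression-return trade-offs against existing baselines in bottlenecked domains.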

AI中文摘要

强化学习中的状态抽象通常被形式化为基于奖励和转移相似性的状态划分。这排除了导航、图和分层决策问题中的常见结构模式:接口状态(如门、枢纽和瓶颈)自然参与多个区域。我们引入了\emph{纠缠核抽象},一种基于经验转移图的图纠缠的重叠状态抽象框架。该方法从一致定向的低阶分离中构建抽象状态,并通过隶属核而非硬划分来表示共享接口。我们在显式动作一致性条件下给出了诱导的重叠抽象MDP的价值保持保证,识别了内部同质性/边界泄漏误差分解,并证明了一个定量接口重叠结果,表明硬划分何时会引入可避免的边界误差。实验上,在瓶颈表格领域、程序生成迷宫和MiniGrid表示中,纠缠核抽象在压缩-回报权衡上优于奖励感知、学习、拓扑映射和图划分基线。我们还识别了一个清晰的失败机制,即转移拓扑无信息时,纠缠可预测地几乎没有益处。这些结果将图纠缠定位为具有共享接口结构的决策问题的有效拓扑感知抽象先验。

英文摘要

State abstraction in reinforcement learning is usually formulated as a partition of states based on reward and transition similarity. This excludes a common structural pattern in navigation, graph, and hierarchical decision problems: interface states such as doors, hubs, and bottlenecks naturally participate in more than one region. We introduce \emph{tangle-core abstraction}, an overlapping state-abstraction framework based on graph tangles of empirical transition graphs. The method constructs abstract states from consistently oriented low-order separations and represents shared interfaces through a membership kernel rather than a hard partition. We give value-preservation guarantees for the induced overlapping abstract MDP under an explicit action-consistency condition, identify an interior-homogeneity/boundary-leakage error decomposition, and prove a quantitative interface-overlap result showing when hard partitions incur an avoidable boundary error. Empirically, tangle-core abstractions achieve favorable compression--return tradeoffs against reward-aware, learned, topological-map, and graph-partitioning baselines across bottlenecked tabular domains, procedurally generated mazes, and MiniGrid representations. We also identify a clear failure regime in which transition topology is uninformative, where tangles predictably offer little benefit. These results position graph tangles as an effective topology-aware abstraction prior for decision problems with shared interface structure.

2606.00426 2026-06-02 cs.LG

Canonicalized Stable-List Replay for Private Federated Continual Learning over Language-Model Embeddings

规范化稳定列表回放:面向语言模型嵌入的私有联邦持续学习

Ibne Farabi Shihab, Abu Sa-Adat Mohamed Moon-Im Al Ahsan, Anuj Sharma

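Institutions: Department of Computer Science, Iowa State University; Department of Computer Science & Engineering, BRAC University; Department of Civil, Construction & Environmental Engineering, Iowa State University

AI summary: Addresses the problem that replay lists are unordered across clients under differential privacy in federated continual learning. Proposes Canonicalized Stable-List Replay (CSLR), which aligns client distributions using signatures of public anchor sentences and improves performance on several benchmarks.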

AI中文摘要

联邦持续学习(FCL)允许分布式客户端在不共享原始文本的情况下,将语言模型头部适应不断演变的NLP任务。在用户级差分隐私(DP)下,基于回放的持续学习面临一个结构性障碍:客户端只能发布候选回放摘要的小型噪声列表,且这些列表在客户端之间是无序的。我们引入了规范化稳定列表回放(CSLR),其中客户端在共享的句子嵌入空间上私有地生成候选回放分布,服务器使用公共锚句诱导的签名对齐它们。锚点提供聚合的可识别性,而不是额外的回放数据。我们证明,在可观测的锚签名间隔下,$O(\log(N/η)/p)$个锚点以至少$1-η$的概率区分$N$个候选列表元素,并给出了无序标签预言机模型的范围性无锚不可识别性结果。在持续分类、NER和对话基准的五个随机种子上,CSLR在报告的回放发布预算下,在$\eps=4$时,最终平均任务指标比最强的非CSLR DP基线提高了3.9-5.6个点,同时也优于匈牙利匹配和最优传输匹配。形式化隐私保证涵盖回放发布;端到端私有训练还需要与用于任务头更新的私有优化器组合。

英文摘要

Federated continual learning (FCL) lets distributed clients adapt language-model heads to evolving NLP tasks without sharing raw text. Under user-level differential privacy (DP), replay-based continual learning faces a structural obstacle: clients can release only small noisy lists of candidate replay summaries, and those lists are unordered across clients. We introduce Canonicalized Stable-List Replay (CSLR), where clients privately produce candidate replay distributions over a shared sentence-embedding space and the server aligns them using signatures induced by public anchor sentences. The anchors provide identifiability for aggregation rather than additional replay data. We prove that, under an observable anchor-signature margin, $O(\log(N/\eta)/p)$ anchors distinguish $N$ candidate list elements with probability at least $1-\eta$, and we give a scoped anchorless non-identifiability result for unordered-label oracle models. Across five seeds on continual classification, NER, and dialogue benchmarks, CSLR improves the final average task metric by 3.9--5.6 points over the strongest non-CSLR DP baseline at $\varepsilon=4$ under the reported replay-release budget, while also outperforming Hungarian and optimal-transport matchers. The formal privacy guarantee covers replay release; end-to-end private training additionally requires composition with a private optimizer for task-head updates.
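A union-bound calculation gives some intuition for the $O(\log(N/\eta)/p)$ anchor count. The sketch below is not the paper's proof. It assumes, hypothetically, that each anchor independently separates any fixed pair of list elements with probability at least $p$, so a pair stays unseparated after $m$ anchors with probability at most $(1-p)^m \le e^{-pm}$; the abstract does not define $p$, so this reading is an assumption.

```python
import math


def anchors_needed(n_items: int, eta: float, p: float) -> int:
    """Smallest m with C(n, 2) * exp(-p * m) <= eta (union bound over pairs).

    Assumes each anchor independently separates a fixed pair with
    probability at least p; this interpretation of p is hypothetical.
    """
    pairs = n_items * (n_items - 1) // 2
    if pairs == 0:
        return 0  # a single element needs no disambiguation
    return max(0, math.ceil(math.log(pairs / eta) / p))


m = anchors_needed(n_items=100, eta=0.05, p=0.2)
# Check the bound directly with the exact (1 - p)^m failure probability.
failure_bound = (100 * 99 // 2) * (1 - 0.2) ** m
print(m, failure_bound)
```

Because $\log\binom{N}{2} \le 2\log N$, the count grows like $\log(N/\eta)/p$ up to a constant factor, matching the order stated in the abstract.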

2606.00424 2026-06-02 cs.AI

Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

弱批评者造就强学习者:用于可扩展监督的在策略批评蒸馏

Can Jin, Jiakang Li, Rui Wu, Eddy Zhang, Dimitris N. Metaxas

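Institutions: University of Cambridge; University of California, Berkeley; UC Berkeley AI Lab

AI summary: Proposes on-policy critique distillation (OPCD), in which a weak model acts as a critic that supplies revision directions and critique-guided behavior is distilled into the strong model through adaptive self-teacher signals, improving strong models on reasoning and alignment benchmarks.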

AI中文摘要

随着大型语言模型变得更强,弱监督者可能无法为复杂输出提供可靠的标签、偏好或最终判断,限制了弱到强的泛化和可扩展监督。我们研究了一种更易处理的弱监督形式:使用弱模型作为批评者,而不是作为标注者或评判者。弱批评者不需要解决任务或选择正确答案,只需提供非误导性的修订方向,帮助强模型更好地利用自身知识。我们将这种设置称为*弱批评者强监督*。我们首先表明,弱批评可以在推理时改进冻结的强模型,并且批评质量是这种改进的关键。然后,我们提出渐进式在策略批评蒸馏(**OPCD**),它过滤高质量的批评,并通过自适应自教师信号将批评引导的行为蒸馏到强模型中。在推理和对齐基准上的实验表明,我们的方法在训练轮次中改进了强模型,为使用弱监督的可扩展监督提供了一条有效路径。

英文摘要

As large language models become stronger, weak supervisors may fail to provide reliable labels, preferences, or final judgments for complex outputs, limiting both weak-to-strong generalization and scalable oversight. We study a more tractable form of weak supervision: using a weak model as a critic rather than as a labeler or judge. Instead of solving the task or selecting the correct answer, the weak critic only needs to provide a non-misleading revision direction that helps the strong model better use its own knowledge. We call this setting *weak-critic strong oversight*. We first show that weak critiques can improve frozen strong models at inference time, and that critique quality is key to this improvement. We then propose progressive on-policy critique distillation (**OPCD**), which filters high-quality critiques and distills critic-guided behavior into the strong model through adaptive self-teacher signals. Experiments on reasoning and alignment benchmarks show that our method improves strong models over training epochs, suggesting an effective path for scalable oversight with weak supervision.

2606.00418 2026-06-02 cs.RO cs.HC

Literary Emotions in Motion: A Soft Robotics Installation for Tactile Storytelling

文学情感在运动中:用于触觉叙事的软体机器人装置

Carolina Silva-Plata, Abraham Villavicencio-Carmona, Miguel Silva Plata, Stefan Escaida, Ruben Fernandez

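Institutions: Department of Mechanical Engineering, University of Chile; Independent Researcher; Bolivian Catholic University; Institute of Engineering Sciences, University of O’Higgins

AI summary: Presents an interactive installation that maps semantic emotion analysis of narrative text to the variable stiffness of soft pneumatic modules, and uses a user study to evaluate how multisensory coupling of stiffness and LED intensity affects emotional perception.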

Comments 8 pages, 8 figures

Journal ref IEEE Robotics and Automation Magazine, 2026

AI中文摘要

软体机器人越来越多地在艺术语境中被探索,其中触觉交互为观众提供了超越视觉或听觉信号的具身参与。本作品展示了一个交互装置,将叙事文本的语义情感分析映射到软体气动模块的可变刚度。一个自然语言模型从预定义的六种情感中识别出两种主导情感,驱动七个六边形排列的软体执行器充气。中心执行器代表主要情感,而周围的执行器表达次要情感。我们开发并机械表征了称为软模块的硅胶执行器,其具有薄膜层,展示了这种形态控制如何扩展可实现的刚度范围,同时保持简单性和低成本制造。一项包含十名参与者的用户研究进一步评估了刚度和LED强度的多感官耦合如何影响情感感知。结果表明,伴随颜色变化的刚度调制可以支持软体机器人装置中具有情感意义和吸引力的触觉交互。

英文摘要

Soft robotics is increasingly explored in artistic contexts, where tactile interaction provides audiences with embodied engagement beyond visual or auditory signals. This work presents an interactive installation that maps semantic emotion analysis of narrative text into variable stiffness of soft pneumatic modules. A natural language model identifies two dominant emotions from a predefined set of six, driving the inflation of seven hexagonally arranged soft actuators. The central actuator represents the primary emotion, while the surrounding ones express the secondary. We develop and mechanically characterize silicone actuators, called soft modules, featuring a thin membrane layer, demonstrating how this morphological control expands the achievable stiffness range while preserving simplicity and low-cost fabrication. A user study with ten participants further evaluates how multisensory coupling of stiffness and LED intensity influences emotional perception. The results suggest that stiffness modulation accompanied by color change can support emotionally meaningful and engaging tactile interaction in soft robotic installations.

2606.00416 2026-06-02 cs.CV

4D Radar Meets LiDAR and Camera: Cooperative Perception under Adverse Weather

4D雷达与激光雷达和相机的结合:恶劣天气下的协同感知

Melih Yazgan, Iramm Hamdard, Qiyuan Wu, J. Marius Zoellner

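Institutions: FZI Research Center for Information Technology; Karlsruhe Institute of Technology

AI summary: To address the degradation of cameras and LiDAR in adverse weather, integrates 4D imaging radar as a robust modality and introduces a Doppler-guided spatial attention mechanism for multi-agent fusion, substantially improving the robustness of cooperative perception in fog and rain.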

Comments Accepted by CVPR - DriveX Workshop

AI中文摘要

协同感知对于自动驾驶至关重要,但在恶劣天气下,当相机和激光雷达性能下降时,其可靠性会受到影响。我们通过将4D成像雷达作为一种对天气鲁棒的模态集成到协同感知中,并引入多普勒引导的空间注意力机制用于多智能体融合,来解决这一挑战。我们的方法扩展了两种代表性骨干网络:一种是雷达-相机流水线,其中雷达替代激光雷达;另一种是激光雷达-雷达流水线,其中雷达补充激光雷达。为了支持评估,我们发布了雷达增强的基准数据集OPV2V-R和Adver-City-R,并加入了基于物理的激光雷达退化模拟。实验表明,在雾和雨条件下,该方法获得了显著的鲁棒性提升,特别是在雷达替代退化激光雷达时改进明显。在MAN TruckScenes上的额外验证证明了该方法在仿真之外的迁移能力。总体而言,我们的结果突出了4D成像雷达作为一种适用于全天候协同感知的鲁棒模态。数据集和代码可在以下网址获取:https://url.fzi.de/SlimComm。

英文摘要

Cooperative perception is important for autonomous driving but remains fragile when cameras and LiDAR degrade in adverse weather. We address this challenge by integrating 4D imaging radar as a weather-robust modality into collaborative perception and introducing a Doppler-guided spatial attention mechanism for multi-agent fusion. Our approach extends two representative backbones: a radar-camera pipeline where radar substitutes LiDAR, and a LiDAR-radar pipeline where radar complements LiDAR. To support evaluation, we release radar-augmented benchmarks, OPV2V-R and Adver-City-R, with physics-based LiDAR degradation. Experiments show strong robustness gains in fog and rain, including substantial improvements when radar replaces degraded LiDAR. Additional validation on MAN TruckScenes demonstrates transfer beyond simulation. Overall, our results highlight 4D imaging radar as a robust modality for all-weather collaborative perception. Dataset and code are available at: https://url.fzi.de/SlimComm.

2606.00414 2026-06-02 cs.LG

Auditing Near-Optimal Policies Can Be Exponentially Hard: Conditional Query Lower Bounds via Occupancy Rashomon Capacity

审计近最优策略可能是指数级困难的:通过占用Rashomon容量的条件查询下界

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Institutions: Department of Computer Science, Iowa State University; Department of Civil, Construction & Environmental Engineering, Iowa State University

AI summary: Introduces occupancy Rashomon capacity and proves that, when many near-optimal policies exist, auditing their behavioral differences can require exponentially many queries, with lower bounds for both exact and noisy queries.

AI中文摘要

当许多强化学习策略达到近最优回报时,事后审计员可能必须区分许多行为不同但回报等价的策略。我们通过Rashomon容量的占用度量类比来形式化这一现象:近最优占用区域的度量熵,相对于被审计的部署类别计算。由于占用度量仅识别到占用等价,我们在占用类别级别上制定审计,并区分精确局部查询预言机和噪声样本查询预言机。我们的主要精确查询结果是条件性的:如果被审计类别包含一个$2/H$-分离的近最优填充,其局部签名是$b$-稀疏的,那么精确局部查询审计需要$\Omega(M/b)$次查询;当填充实现部署类别容量且$b=O(1)$时,这变为$\Omega(2^{\Hopt^\cF(\eps)})$。我们给出了一个有限折扣隐藏分支MDP达到此界,并展示了精确贝叶斯成功律。对于噪声隐藏触发测试,我们证明了阶为$M/\beta$的混合下界,其中$\beta$是每样本KL信号,对于容量阶填充且$\beta=O(\rho^2\Delta^2)$,得到$\Omega(2^{\Hopt^\cF(\eps)}/(\rho^2\Delta^2))$。我们还提供了静态目标识别信息下界、一个转录兼容的预言机覆盖验证上界,以及一个规范占用正则化器,当存在可信参考占用时,其正则化审计容量会崩溃。受控基准将正稀疏签名实例与精确审计容易的高容量阴性对照区分开来,并将噪声触发律映射到后处理的连续控制和视觉RL审计体制。

英文摘要

When many reinforcement-learning policies achieve near-optimal return, a post-hoc auditor may have to distinguish among many behaviorally distinct but return-equivalent policies. We formalize this phenomenon through an occupancy-measure analogue of Rashomon capacity: the metric entropy of the near-optimal occupancy region, computed relative to an audited deployment class. Because occupancy measures identify behavior only up to occupancy equivalence, we formulate auditing at the occupancy-class level and distinguish exact local-query oracles from noisy sample-query oracles. Our main exact-query result is conditional: if the audited class contains a $2/H$-separated near-optimal packing whose local signatures are $b$-sparse, then exact local-query auditing requires $\Omega(M/b)$ queries; when the packing realizes deployment-class capacity and $b=O(1)$, this becomes $\Omega(2^{H_{\mathrm{opt}}^{\mathcal{F}}(\varepsilon)})$. We give a finite discounted hidden-branch MDP attaining this bound and show the exact Bayes success law. For noisy hidden-trigger testing, we prove a mixture lower bound of order $M/\beta$, where $\beta$ is the per-sample KL signal, yielding $\Omega(2^{H_{\mathrm{opt}}^{\mathcal{F}}(\varepsilon)}/(\rho^2\Delta^2))$ for capacity-order packings with $\beta=O(\rho^2\Delta^2)$. We also provide a static target-recognition information lower bound, a transcript-compatible oracle-cover verification upper bound, and a canonical occupancy regularizer whose regularized audited capacity collapses when a trusted reference occupancy is available. Controlled benchmarks distinguish positive sparse-signature instances from high-capacity negative controls where exact auditing is easy, and map the noisy-trigger law to post-processed continuous-control and visual-RL auditing regimes.

2606.00408 2026-06-02 cs.CL cs.AI cs.IR

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

掩盖过时观察帮助搜索智能体——直到适得其反:一个机制图谱及其机制

Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang, Pengcheng Jiang, Yu Zhang, Julian McAuley

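Institutions: UC San Diego; UC Berkeley; Texas A&M University; UIUC

AI summary: Through systematic experiments, finds that the effect of masking stale observations in long-horizon search agents follows an asymmetric inverted-U curve governed by the interaction between retriever recall and the model's implicit filtering capacity, and identifies the underlying token-for-turn trade-off.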

Comments 47 pages, 7 figures

AI中文摘要

长时域搜索智能体在多次工具调用中积累大量检索内容,使得上下文预算效率日益重要。一种最小干预措施是在轨迹推进过程中掩盖过时观察,但尚不清楚这种上下文管理何时有帮助及其原因。我们通过系统扫描不同智能体骨干(4B到284B参数)和三个检索器,在离线和在线智能搜索基准上研究观察掩盖。我们发现,当以无上下文管理时的模型准确率为横轴时,掩盖带来的准确率提升呈非对称倒U形:在弱检索器下出现平台期,在强检索器与中等容量模型相遇时达到峰值,在模型饱和时急剧下降。这种模式反映了检索器召回率与模型隐式过滤能力之间的交互,而非单一因素。机制上,掩盖实现了令牌-轮次权衡:它移除了模型基本不再关注的观察以及智能体很少重新打开的页面。当增加的轮次将失败转化为成功时,它们有帮助;但当掩盖移除了模型本会使用的证据时,它们会失败。因此,我们将上下文管理重新定义为一种依赖机制的干预,并为分析智能深度搜索中的上下文使用提供了整体视角。我们在此发布我们的框架和轨迹(https://github.com/i-DeepSearch/observation-masking)以支持未来研究。

英文摘要

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model's accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.
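To make the intervention concrete, the sketch below shows one simple way to mask stale observations in an agent transcript. It is an illustration under stated assumptions, not the released scaffold: the message format (dicts with `role` and `content`), the `"tool"` role name, the placeholder string, and the `keep_last` policy are all hypothetical.

```python
MASK = "[observation masked]"  # hypothetical placeholder text


def mask_stale_observations(messages, keep_last=2, placeholder=MASK):
    """Return a copy of the transcript in which every tool observation except
    the most recent `keep_last` ones has its content replaced by `placeholder`.

    The turn structure is preserved; only observation bodies are dropped.
    """
    obs_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # Guard keep_last == 0 explicitly: obs_idx[:-0] would be an empty slice.
    stale = set(obs_idx[:-keep_last]) if keep_last > 0 else set(obs_idx)
    return [
        {**m, "content": placeholder} if i in stale else dict(m)
        for i, m in enumerate(messages)
    ]
```

Masking frees context tokens that the agent can spend on additional turns, which is the token-for-turn trade-off described above. It costs accuracy exactly when a masked observation held evidence the model would otherwise have used.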

2606.00404 2026-06-02 cs.CV cs.LG

Rethinking Amortized Neural Representations for High-Resolution Terrain Elevation Data

重新思考高分辨率地形高程数据的摊销神经表示

Haoan Feng, Xin Xu, Leila De Floriani

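Institutions: University of Maryland, College Park

AI summary: For terrain elevation data, proposes HUVR+SIREN, a hypernetwork that replaces the coordinate decoder with a smooth, differentiable one. It achieves the best height and derivative fidelity on a unified benchmark and supports compression through post-training quantization.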

Comments 12 pages, 7 figures, 10 tables

AI中文摘要

隐式神经表示(INR)将信号建模为连续的坐标到值函数。对于地形高程数据,这支持解析导数、任意分辨率解码以及底层高度场的平滑表面模型。然而,为每个瓦片拟合和存储单独的INR无法扩展到大型地形数据集。摊销神经表示通过共享网络降低了这一成本:新瓦片被映射到紧凑的每瓦片载荷,共享解码器从中重建高度场。大多数此类方法是超网络,通过单次前向传递预测载荷,而其他方法则通过短时的每瓦片优化恢复载荷。这些方法主要针对自然图像开发,其在地形高度场上的适用性尚不清楚。我们在1米/像素的地形数据集上引入了受控基准,并在统一协议下评估了三种代表性方法。观察到明显的跨领域差距后,我们提出了HUVR+SIREN,这是一种超网络,它通过将坐标解码器替换为平滑、解析可微的解码器来适应最强的基准方法(HUVR)。它在基准上实现了最佳的高度和导数保真度,无需额外的每瓦片存储且解码成本更低,并且能够容忍激进的后训练量化而质量损失可忽略,从而形成了紧凑的地形神经格式。消融和诊断进一步确定了哪些设计选择可迁移到地形,并表明每瓦片瓶颈已接近其有用极限,剩下的差距在于共享超网络的架构设计。

英文摘要

Implicit neural representations (INRs) model a signal as a continuous coordinate-to-value function. For terrain elevation data, this supports analytic derivatives, arbitrary-resolution decoding, and a smooth surface model of the underlying heightfield. However, fitting and storing a separate INR for every tile does not scale to large terrain datasets. Amortized neural representations reduce this cost with a shared network: a new tile is mapped to a compact per-tile payload, and a shared decoder reconstructs the heightfield from it. Most such methods are hypernetworks that predict the payload in a single forward pass, while others recover it through a short per-tile optimization. These methods were developed primarily for natural images, and their suitability for terrain heightfields remains unclear. We introduce a controlled benchmark on a 1 m/pixel terrain dataset and evaluate three representative methods under a unified protocol. Observing a clear cross-domain gap, we propose HUVR+SIREN, a hypernetwork that adapts the strongest benchmarked method (HUVR) by replacing its coordinate decoder with a smooth, analytically differentiable one. It attains the best height and derivative fidelity on the benchmark with no additional per-tile storage and lower decode cost, and tolerates aggressive post-training quantization with negligible quality loss, giving a compact terrain neural format. Ablations and diagnostics further identify which design choices transfer to terrain and show that the per-tile bottleneck is already near its useful limit, leaving the remaining gap in the shared hypernetwork's architectural design.

2606.00400 2026-06-02 cs.LG

Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning

动态代理混合:将重放控制器从小模型迁移到大模型以进行持续指令微调

Ibne Farabi Shihab, Fariya Afrin, Anuj Sharma

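Institutions: Department of Computer Science, Iowa State University; Department of Computer Science, Kalinga Institute of Industrial Technology; Department of Civil, Construction & Environmental Engineering, Iowa State University

AI summary: Proposes PROXYMIX, a framework that learns a dynamic replay controller on a small proxy model and transfers the frozen controller to a large model, addressing the forgetting caused by fixed replay ratios in continual instruction tuning. On LLaMA-3-8B it improves average accuracy by 3.4 points, reduces forgetting by 3.5 points, and raises the safety score by 5.8 points.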

AI中文摘要

持续指令微调通过一系列新领域更新语言模型,但每次更新会逐渐侵蚀先前学到的能力和对齐行为。重放是标准的缓解方法,但固定重放比例本质上有限,因为最优混合比例随当前领域、训练阶段以及先前行为的脆弱性而变化。我们提出PROXYMIX框架,该框架在小代理模型上学习动态重放控制器,并将冻结的控制器迁移到更大的目标模型。控制器从未见过未来任务,而是从归一化的验证损失及其时间动态构建状态,生成当前任务和可访问重放缓冲区的掩码混合。我们的核心经验假设是遗忘镜像:即使绝对损失大小不同,任务脆弱性排名在不同模型规模上基本一致。在跨规模迁移控制器之前,我们通过实验验证了这一假设。在LLaMA-3-8B上跨越五个持续指令微调序列,PROXYMIX在平均准确率上提高了3.4个百分点,最终遗忘降低了3.5个百分点,安全性得分比最强的非神谕基线提高了5.8个百分点,策略学习成本比Oracle Target RL低约50倍。该框架在接口层面无泄漏且架构无关,我们还确定了代理假设失效的设置,突出了鲁棒部署的局限性。

英文摘要

Continual instruction tuning updates a language model through a sequence of new domains, yet each update can progressively erode previously learned capabilities and alignment behavior. Replay is the standard mitigation, but fixed replay ratios are inherently limited because the optimal mixture varies with the current domain, the training stage, and the evolving vulnerability of prior behaviors. We propose PROXYMIX, a framework that learns a dynamic replay controller on a small proxy model and transfers the frozen controller to a larger target. The controller never observes future tasks and constructs its state from normalized validation losses and their temporal dynamics, producing a masked mixture over the current task and accessible replay buffers. Our core empirical hypothesis is forgetting mirroring: task vulnerability rankings remain largely consistent across model scales even when absolute loss magnitudes differ. We validate this assumption empirically before transferring controllers across scales. On LLaMA-3-8B across five continual instruction tuning sequences, PROXYMIX improves average accuracy by 3.4 points, reduces final forgetting by 3.5 points, and raises safety score by 5.8 points over the strongest non-oracle baseline, at roughly 50x lower policy learning cost than Oracle Target RL. The framework is leakage free and architecture independent at the interface level, and we also identify settings where the proxy assumption breaks down, highlighting limitations for robust deployment.
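The controller interface described in the abstract (a state built from normalized validation losses and their temporal change, and a masked mixture over data sources) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the normalization by reference losses, the use of a softmax, and all function names are assumptions.

```python
import math


def controller_state(val_losses, prev_val_losses, ref_losses):
    """State = normalized validation losses followed by their change since the
    previous step. Dividing by per-task reference losses is an assumed
    normalization that removes absolute loss scale."""
    norm = [loss / ref for loss, ref in zip(val_losses, ref_losses)]
    prev = [loss / ref for loss, ref in zip(prev_val_losses, ref_losses)]
    return norm + [n - p for n, p in zip(norm, prev)]


def masked_mixture(logits, accessible):
    """Softmax over data sources, with inaccessible sources (for example
    buffers of tasks not yet seen) forced to weight zero."""
    if not any(accessible):
        raise ValueError("at least one source must be accessible")
    top = max(z for z, ok in zip(logits, accessible) if ok)
    exps = [math.exp(z - top) if ok else 0.0 for z, ok in zip(logits, accessible)]
    total = sum(exps)
    return [e / total for e in exps]


# Index 0 is the current task; indices 1 and 2 are replay buffers,
# of which only the first is accessible at this stage.
weights = masked_mixture([0.0, 0.0, 5.0], [True, True, False])
print(weights)
```

Working with normalized rather than absolute losses is what makes a controller trained on a small proxy plausible to reuse on a larger target, since the abstract's hypothesis concerns vulnerability rankings rather than loss magnitudes.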

2606.00399 2026-06-02 cs.LG

Multi-Objective Reference-Aligned Machine Unlearning

多目标参考对齐机器遗忘

Rasa Khosrowshahli, Stephen Asobiela, Beatrice Ombuki-Berman, Shahryar Rahnamayan

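Institutions: arXiv

AI summary: Proposes RAUL, a multi-objective framework that bounds the forgetting objective by aligning predictions on forgotten samples to a reference distribution (a uniform distribution or an empirical distribution from a held-out set), and solves the multi-objective optimization with Jacobian descent, yielding unlearning results close to full retraining.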

Comments Accepted as a short paper at Canadian AI 2026. Author version with an added framework overview figure for clarity

AI中文摘要

机器遗忘旨在移除特定训练样本的影响,同时保持模型的效用。现有的单目标方法,如梯度上升或随机重标,常常由于冲突的优化动态和无界的遗忘目标导致灾难性遗忘,使模型偏离其预训练知识。我们提出参考对齐遗忘(RAUL),一个多目标框架,通过将遗忘样本上的无界损失最大化替换为有界的KL对齐,使其预测对齐到代表未见数据的参考分布(可实例化为均匀分布或来自保留参考集的经验分布),从而约束遗忘目标并减少与保留目标的梯度冲突,联合优化遗忘和保留。通过雅可比下降解决由此产生的多目标优化(MOO)问题,该算法将多个梯度聚合到无冲突的方向。我们的结果表明,与完全重训练相比,RAUL实现了最接近的差距。

英文摘要

Machine unlearning aims to remove the influence of specific training samples while preserving the model's utility. Existing single-objective approaches, such as gradient ascent or random relabeling, often induce catastrophic forgetting due to conflicting optimization dynamics and unbounded forgetting objectives that cause the model to drift from its pre-trained knowledge. We propose Reference-Aligned UnLearning (RAUL), a multi-objective framework that jointly optimizes forgetting and retention. RAUL replaces unbounded loss maximization with a bounded KL alignment of predictions on forgotten samples toward a reference distribution representing unseen data, instantiated either as a uniform distribution or as an empirical distribution from a held-out reference set. This constrains the forgetting objective and reduces gradient conflict with retention. The resulting multi-objective optimization (MOO) problem is solved via Jacobian descent, which aggregates multiple gradients into a direction that does not conflict. Our results demonstrate that RAUL achieves the smallest gap to full retraining.
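The boundedness claim can be checked numerically for the uniform reference. The sketch below assumes the alignment term is $\mathrm{KL}(p \,\|\, u)$, with $p$ the model's predictive distribution and $u$ uniform over $K$ classes; the abstract does not state the direction of the KL term, so that choice is an assumption. In this direction the loss equals $\log K - H(p)$ and therefore never exceeds $\log K$, whereas a negated cross-entropy (gradient ascent) has no such limit.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def forget_loss_uniform(logits):
    """Assumed forgetting loss: KL from the prediction to the uniform reference."""
    p = softmax(logits)
    k = len(p)
    return kl_divergence(p, [1.0 / k] * k)


confident = forget_loss_uniform([50.0, 0.0, 0.0, 0.0])  # near one-hot prediction
aligned = forget_loss_uniform([0.0, 0.0, 0.0, 0.0])     # already uniform
print(confident, aligned, math.log(4))
```

Minimizing this term pushes predictions on forgotten samples toward the reference and stops at zero, so it cannot drive the model arbitrarily far from its pre-trained solution.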

2606.00397 2026-06-02 cs.RO cs.SY eess.SY

SoFiE: Soft Finger Exoskeleton for Intelligent Grasping

SoFiE: 用于智能抓取的软手指外骨骼

Magnus Malthe Sigsgaard Nielsen, Nicklas Nikolaj Grønvall, Xiaofeng Xiong, Saravana Prashanth Murali Babu

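Institutions: SDU Soft Robotics, SDU Biorobotics, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark (SDU)

AI summary: Presents SoFiE, a modular soft finger exoskeleton that uses 3D-printed flexible materials, tendon-driven actuation, and integrated tactile sensing to provide lightweight, low-profile grasp assistance and sensing.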

AI中文摘要

软体可穿戴机器人系统已成为辅助手部功能减退个体的有前景解决方案。本文提出SoFiE,一种模块化软手指外骨骼,旨在辅助抓取任务中的食指屈曲。该系统主要采用3D打印柔性材料制造,实现了轻量、低轮廓和模块化设计。驱动通过紧凑型直流电机驱动的肌腱机构实现,而被动伸展由柔性导电弹簧提供。该元件称为StretchSense,通过变形下的电阻变化也作为本体感受传感器。此外,引入了一种新颖的触觉传感方法MagSense,使用嵌入软指尖结构中的磁铁和磁力计对来估计接触力和物体柔顺性。该系统完全无线,并由嵌入式微控制器控制。此外,通过电机编码器反馈的驱动器级传感能够估计系统状态,为安全和自适应控制策略提供基础。实验验证表明,该系统能够提供可靠的姿态估计,区分不同刚度的材料,并在不同抓取任务中生成独特的传感器特征。本文详细介绍了所提出的外骨骼的设计、制造和传感概念,作为模块化、软体和辅助可穿戴机器人的概念验证。

英文摘要

Soft wearable robotic systems have emerged as a promising solution for assisting individuals with reduced hand function. This paper presents SoFiE, a modular soft finger exoskeleton designed to assist index-finger flexion during grasping tasks. The proposed system is primarily fabricated using 3D-printed flexible materials, enabling a lightweight, low-profile, and modular design. Actuation is achieved through a tendon-driven mechanism powered by a compact DC motor, while passive extension is provided by a compliant conductive spring. This element, termed StretchSense, also functions as a proprioceptive sensor by exhibiting resistance changes under deformation. Furthermore, a novel tactile sensing approach, MagSense, is introduced, using a magnet and magnetometer pair embedded in a soft fingertip structure to estimate contact force and object compliance. The system is fully untethered and controlled by an embedded microcontroller. In addition, actuator-level sensing through motor encoder feedback enables estimation of the system state, providing a foundation for safe and adaptive control strategies. Experimental validation demonstrates the capability of the system to provide reliable pose estimation, distinguish between materials with different stiffness, and generate distinct sensor signatures across different grasping tasks. This paper details the design, fabrication, and sensing concepts of the proposed exoskeleton as a proof of concept toward modular, soft, and assistive wearable robotics.

2606.00392 2026-06-02 cs.LG cs.AI

Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

通过约束策略优化实现检测器规避的LLM释义

Mingyi Wang, Zhuoer Shen, Yuheng Bu, Shaofeng Zou

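Institutions: School of ECEE, Arizona State University; Department of Computer Science, University of California, Santa Barbara

AI summary: Proposes DEPO, an algorithm that formulates detector-evasive LLM paraphrasing as a constrained Markov decision process and uses Lagrangian primal-dual reinforcement learning to evade detectors while preserving semantics.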

AI中文摘要

AI文本检测器易受释义和检测器引导的释义攻击,但现有规避方法缺乏对语义保持的精确控制。特别是,直接优化检测器规避会降低细粒度语义,而标量化奖励设计仅提供间接、权重敏感的规避-语义权衡控制。我们通过将检测器规避的LLM释义建模为约束马尔可夫决策过程来解决这一限制,其中检测器规避是主要目标,语义保持作为显式约束强制执行。我们提出检测器规避策略优化(DEPO),一种拉格朗日原始-对偶强化学习算法,具有新颖的GRPO风格组基策略更新。DEPO在训练期间自适应平衡语义保持和检测器规避,使策略能够在规定的语义保持区域内提高攻击成功率。在MAGE、M4、RAID和同行评审数据集上的实验,针对MAGE、RoBERTa、RADAR、Binoculars和Fast-DetectGPT检测器进行评估,表明DEPO在精确满足语义保持约束的同时实现了强大的检测器规避。DEPO还表现出跨领域、跨检测器和提示级别的鲁棒性。

英文摘要

AI-text detectors are vulnerable to paraphrasing and detector-guided paraphrasing attacks, but existing detector-evasion methods often lack precise control over semantic preservation. In particular, optimizing directly for detector evasion can degrade fine-grained semantics, whereas scalarized reward designs provide only indirect, weight-sensitive control over the evasion-semantics trade-off. We address this limitation by formulating detector-evasive LLM paraphrasing as a Constrained Markov Decision Process, where detector evasion is the primary objective and semantic preservation is enforced as an explicit constraint. We propose Detector Evasion Policy Optimization (DEPO), a Lagrangian primal-dual reinforcement learning algorithm with a novel GRPO-style group-based policy update. DEPO adaptively balances semantic preservation and detector evasion during training, enabling the policy to improve attack success within a prescribed semantic-preservation region. Experiments on MAGE, M4, RAID, and peer-review datasets, evaluated against MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors, show that DEPO achieves strong detector evasion while precisely satisfying the semantic preservation constraint. DEPO also exhibits cross-domain, cross-detector, and prompt-level robustness.
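A generic Lagrangian primal-dual loop with group-normalized advantages can be sketched as below. This is a textbook-style illustration, not DEPO itself: the reward shaping, the dual step rule, the threshold, and all names are assumptions, and the paper's GRPO-style update may differ in its details.

```python
from statistics import mean, pstdev


def lagrangian_rewards(evasion, semantic, lam, threshold):
    """Per-sample reward: evasion score plus lam times the constraint slack
    (semantic similarity minus the required threshold)."""
    return [e + lam * (s - threshold) for e, s in zip(evasion, semantic)]


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of
    paraphrases sampled for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_update(lam, semantic, threshold, lr):
    """Projected dual step: raise lam when the average semantic score falls
    below the threshold, lower it otherwise, and keep it non-negative."""
    violation = threshold - mean(semantic)
    return max(0.0, lam + lr * violation)


# One group of four sampled paraphrases (made-up scores for illustration).
evasion = [0.9, 0.4, 0.7, 0.2]
semantic = [0.70, 0.95, 0.85, 0.98]
lam = 1.0
adv = group_advantages(lagrangian_rewards(evasion, semantic, lam, threshold=0.9))
lam = dual_update(lam, semantic, threshold=0.9, lr=0.5)
print(adv, lam)
```

The multiplier adapts during training, so the balance between evasion and semantic preservation is set by the constraint level rather than by a hand-tuned reward weight, which is the contrast with scalarized rewards drawn in the abstract.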

2606.00390 2026-06-02 cs.CV cs.AI

Zamba2-VL Technical Report

Zamba2-VL 技术报告

Hassan Shapourian, Kasra Hejazi, Olabode M. Sule, Beren Millidge

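Institutions: University of California, Berkeley; University of Cambridge; University of Washington; University of Toronto

AI summary: Presents Zamba2-VL, a vision-language model built on the hybrid Zamba2 architecture, which is competitive with Transformer models on image-understanding and related benchmarks while lowering time-to-first-token by roughly an order of magnitude.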

Comments 16 pages, 2 figures

AI中文摘要

我们提出Zamba2-VL,这是一套基于Zamba2构建的视觉语言模型,Zamba2是一种混合语言模型架构,结合了Mamba2状态空间层和少量共享的Transformer块。在广泛的图像理解、推理、OCR、定位和计数基准测试中,Zamba2-VL与同等规模的主流基于Transformer的开源VLM(包括Molmo2、Qwen3-VL和InternVL3.5系列)具有竞争力,并且显著优于之前的基于SSM和混合的VLM,如VL-Mamba、Cobra和mmMamba。继承了其Zamba2骨干网络的近线性预填充计算和小的、近乎恒定的循环状态,Zamba2-VL在匹配参数规模下,首次令牌延迟(TTFT)比这些Transformer基线低大约一个数量级,在最适合设备和边缘部署的较小1.2B和2.7B规模上效率差距最为明显。我们发布了三个模型——1.2B、2.7B和7B——以及推理代码,网址为https://huggingface.co/collections/Zyphra/zamba2-vl。

英文摘要

We present Zamba2-VL, a suite of vision-language models built on Zamba2, a hybrid language-model architecture combining Mamba2 state-space layers with a small number of shared transformer blocks. Across a broad range of image understanding, reasoning, OCR, grounding, and counting benchmarks, Zamba2-VL is competitive with leading Transformer-based open-weight VLMs of comparable scale, including the Molmo2, Qwen3-VL, and InternVL3.5 families, and substantially outperforms prior SSM-based and hybrid VLMs such as VL-Mamba, Cobra, and mmMamba. Inheriting the near-linear prefill compute and small, near-constant recurrent state of its Zamba2 backbone, Zamba2-VL delivers roughly an order of magnitude lower time-to-first-token (TTFT) than these Transformer baselines at matched parameter scale, with the efficiency gap most pronounced at the smaller 1.2B and 2.7B scales most relevant to on-device and edge deployment. We release three models -- 1.2B, 2.7B, and 7B -- together with inference code at https://huggingface.co/collections/Zyphra/zamba2-vl.

2606.00386 2026-06-02 cs.CV

αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion

αDepth: 学习用于立体转换的单次软边界分解

Xiang Zhang, Yang Zhang, Lukas Mehl, Karlis Martins Briedis, Markus Gross, Christopher Schroers

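Institutions: ETH Zürich; DisneyResearch|Studios

AI summary: Proposes the αDepth representation, which uses a Circular Alpha Representation (CAR) to decompose soft boundaries into local layers, enabling high-fidelity stereo conversion without user intervention.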

详情
AI中文摘要

精确建模软边界(例如头发和散焦模糊)是立体转换中的一个基本挑战,因为前景和背景的模糊混合。现有的深度模型主要预测单层深度,导致软边界处的深度对应关系模糊。虽然抠图技术可以捕获用于分层建模的不透明度,但它们在具有多个目标的复杂场景中通常表现不佳,并且通常需要用户干预。本文介绍了αDepth,一种分层表示,用于分解软边界以实现高保真立体转换。具体来说,我们首先通过估计软边界处的分层颜色和深度值来解决混合颜色和深度模糊问题。考虑到复杂的多目标场景,我们设计了一种圆形Alpha表示(CAR),将范式从全局目标提取转变为局部边界分解。与先前仅限于单个前景/背景的抠图方法不同,CAR无需手动指导即可实现高效的场景级推理。大量评估表明,αDepth在立体转换中实现了最先进的性能,消除了软边界处的背景渗色和结构失真。

英文摘要

Accurately modeling soft boundaries, e.g., hair and defocus blur, is a fundamental challenge in stereo conversion due to the ambiguous blending of foreground and background. Existing depth models primarily predict single-layer depth, leading to ambiguity in depth correspondence at soft boundaries. While matting techniques can capture opacity for layered modeling, they often struggle in complex scenes with multiple targets and usually require user intervention. This paper introduces αDepth, a layered representation that decomposes soft boundaries for high-fidelity stereo conversion. Specifically, we first resolve mixed color and depth ambiguity by estimating layered color and depth values at soft boundaries. Considering complex multi-target scenes, we design a Circular Alpha Representation (CAR) that shifts the paradigm from global target extraction to local boundary decomposition. Unlike prior matting methods restricted to a single foreground/background, CAR enables efficient scene-level inference without manual guidance. Extensive evaluations demonstrate that αDepth achieves state-of-the-art performance in stereo conversion, eliminating background bleeding and structural distortions at soft boundaries.

2606.00383 2026-06-02 cs.RO cs.LG cs.SY eess.SY

Behavior Cloning of MPC for 3-DOF Robotic Manipulators

三自由度机械臂MPC的行为克隆

Theo Guegan, Dexter Wen Jie Teo

Affiliations * University of Waterloo; Université de Technologie de Compiègne; Nanyang Technological University; Polytechnique Montréal

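AI summary To address the heavy real-time computational burden of MPC, the paper uses behavior cloning to approximate the MPC policy with several learned architectures for real-time control of a 3-DOF manipulator, achieving a 3x reduction in inference latency and an 84.98% success rate under relaxed tolerances.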

Comments Accepted at the IEEE ICRA 2026 Workshop on Reinforcement Learning in the Era of Imitation Learning (RL4IL), 6 pages excluding references

详情
AI中文摘要

虽然模型预测控制(MPC)提供了强大的稳定性和鲁棒性,但它给实时系统带来了显著的计算负担。本文研究了行为克隆在近似MPC策略以实时控制三自由度机械臂中的应用。我们提出了一个结合逆运动学与MPC的基线控制器,并评估了从经典回归算法到深度学习模型(包括深度MLP和RNN)的神经网络架构,以推导计算高效的替代策略。我们分析了泛化能力、稳定性考虑以及不同架构选择固有的权衡。我们的实证研究采用了在线和离线评估,以评估在准确性、计算效率和对原始MPC策略的忠实度方面的性能。结果表明,行为克隆可以有效减少三自由度机械臂MPC策略的计算负担,在宽松容差下推理延迟降低3倍,成功率达到84.98%。值得注意的是,我们发现静态架构优于时间变体,证实了瞬时状态观测对此任务的充分性。然而,在严格容差下我们观察到精度差距,这表明虽然行为克隆捕获了全局最优轨迹,但需要进一步研究以最小化终端稳态误差。

英文摘要

While Model Predictive Control (MPC) provides strong stability and robustness, it imposes a significant computational burden on real-time systems. This paper investigates the application of Behavior Cloning to approximate MPC policies for the real-time control of a 3-degree-of-freedom robotic manipulator. We present a baseline controller combining Inverse Kinematics with MPC and evaluate learned models, ranging from classical regression algorithms to deep learning models including Deep MLPs and RNNs, to derive computationally efficient surrogate policies. We analyze generalization capabilities, stability considerations, and the trade-offs inherent in different architectural choices. Our empirical study employs both online and offline evaluations to assess performance regarding accuracy, computational efficiency, and fidelity to the original MPC policy. Our results demonstrate that Behavior Cloning can effectively reduce the computational burden of MPC policies for 3-DOF robotic manipulators, achieving a 3x reduction in inference latency with an 84.98% success rate under relaxed tolerances. Notably, we find that static architectures outperform temporal variants, confirming the sufficiency of instantaneous state observations for this task. However, we observe a precision gap under strict tolerances, which suggests that while Behavior Cloning captures the global optimal trajectory, further research is needed to minimize terminal steady-state error.

2606.00382 2026-06-02 cs.LG

CRMA: A Spectrally-Bounded Backbone for Modular Continual Fine-Tuning of LLMs

CRMA:用于大语言模型模块化持续微调的谱约束骨干

Kiran Nayudu, Aswini Nutakki, Sai Vinay Naidu, Ashwin Shanmugasundaram

Affiliations * ModelBrew AI

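AI summary Proposes CRMA, a residual adapter whose mixing matrix is kept doubly stochastic by Sinkhorn normalization, which structurally bounds its spectral norm and allows a shared substrate to keep training with positive backward transfer across tasks, without replay or distillation.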

Comments 38 pages, 10 figures. Patent-pending construction details deferred to companion technical report (in preparation)

详情
AI中文摘要

大语言模型的顺序微调面临两难选择:要么让共享基座持续学习并接受灾难性遗忘,要么在第一个任务后冻结它并放弃跨任务优化。每任务适配器方法(LoRAHub、AdapterFusion、PackNet、Progressive Networks)选择了后一条路径。我们提出CRMA(约束残差混合适配器),一种残差适配器,其内部混合矩阵M通过Sinkhorn归一化在每次前向传播时保持双随机,因此根据Birkhoff定理,||M||_2 <= 1由构造保证——这是一种结构约束,而非惩罚项。CRMA的谱约束骨干提供了一个持续训练的共享基座,这是早期模块化方法无法实现的,同时保留了它们的遗忘保证。在Mistral-7B上,跨越5个顺序领域和3个种子,基于CRMA骨干的模块化每任务LoRA将损失相对漂移从+42.96% ± 5.5(朴素顺序微调)降低到-0.17% ± 0.17,且每种子范围不重叠,并将先前任务的保留损失比匹配的冻结基座基线提高了1.99% ± 0.54。三个独立的实验设置(Mistral-7B 4领域受控消融、TinyLlama 3领域污染控制复现、Mistral-7B 7B跨领域探测)均显示出正向反向迁移——无需回放缓冲区、无需增加每任务内存、无需蒸馏。在Gemma-2-9B上的推理时消融证实CRMA介导了对顺序训练知识的访问:在相同权重和相同问题上,仅通过切换CRMA注入,结果从38/100提升到98/100。867次记录的训练步骤验证了||M||_2 = 1.0在float32精度内(最大偏差1.2×10^-7)。遗忘预防效果在1.1B-9.2B参数和四个架构系列中均成立。

英文摘要

Sequential fine-tuning of large language models forces a choice: let the shared substrate keep learning and accept catastrophic forgetting, or freeze it after task one and foreclose cross-task refinement. Per-task adapter methods (LoRAHub, AdapterFusion, PackNet, Progressive Networks) take the second path. We introduce CRMA (Constrained Residual Mixing Adapter), a residual adapter whose internal mixing matrix M is doubly-stochastic at every forward pass via Sinkhorn normalization, so by Birkhoff's theorem ||M||_2 <= 1 holds by construction -- a structural bound, not a penalty. CRMA's spectrally bounded backbone provides a continuously trained shared substrate that earlier modular methods could not, while preserving their forgetting guarantees. On Mistral-7B across 5 sequential domains and 3 seeds, modular per-task LoRA on a CRMA backbone reduces loss-relative drift from +42.96% +/- 5.5 (naive sequential fine-tuning) to -0.17% +/- 0.17, with disjoint per-seed ranges, and improves prior-task holdout loss by 1.99% +/- 0.54 over a matched frozen-substrate baseline. Three independent experimental setups (Mistral-7B 4-domain controlled ablation, TinyLlama 3-domain contamination-controlled replication, Mistral-7B cross-domain probes at 7B) all show positive backward transfer -- without replay buffers, without growing per-task memory, and without distillation. An inference-time ablation on Gemma-2-9B confirms CRMA mediates access to sequentially trained knowledge: 98/100 vs. 38/100 on the same weights and same questions with only CRMA injection toggled. 867 logged training steps verify ||M||_2 = 1.0 within float32 precision (max deviation 1.2 x 10^-7). The forgetting-prevention effect holds across 1.1B-9.2B parameters and four architecture families.
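As a rough illustration of the structural bound described in this abstract, the sketch below applies Sinkhorn normalization to a small matrix and checks that the result is doubly stochastic with spectral norm 1. The exponential parametrization, the iteration counts, the example logits, and the helper names `sinkhorn` and `spectral_norm` are assumptions made for illustration only; the abstract defers CRMA's actual construction to a companion report.

```python
import math


def sinkhorn(logits, iters=200):
    """Alternately normalise rows and columns of exp(logits) (assumed parametrization)."""
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]
    for _ in range(iters):
        # Row step: scale every row so that it sums to 1.
        for i in range(n):
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        # Column step: scale every column so that it sums to 1.
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def spectral_norm(m, iters=500):
    """Estimate the largest singular value by power iteration on M^T M."""
    n = len(m)
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iters):
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(m[i][j] * mv[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in mv))


# Hypothetical 4x4 logits standing in for learnable mixing parameters.
LOGITS = [
    [0.5, -1.0, 0.2, 1.3],
    [1.1, 0.0, -0.7, 0.4],
    [-0.3, 0.9, 1.5, -1.2],
    [0.0, 0.6, -0.5, 0.8],
]
M = sinkhorn(LOGITS)
```

Because a doubly stochastic matrix maps the all-ones vector to itself, its spectral norm is exactly 1 rather than merely at most 1, which is consistent with the logged value of 1.0 reported in the abstract.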

2606.00380 2026-06-02 cs.CV cs.AI

SUPREME: A Multi-GPU Framework for Reproducible Image Unlearning Method Evaluation

SUPREME: 一个用于可复现图像遗忘方法评估的多GPU框架

Petros Andreou, Jamie Lanyon, Axel Finke, Georgina Cosma

Affiliations * Department of Computer Science, School of Science, Loughborough University; School of Mathematics, Statistics and Physics, Newcastle University

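AI summary Proposes SUPREME, a framework that speeds up the evaluation of image-classification unlearning methods by distributing the work across multiple GPUs; it supports registering new methods and multiple precision modes.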

Comments 17 pages. Code available at https://github.com/pedroandreou/supreme-unlearning

详情
AI中文摘要

机器遗忘旨在从已训练模型中移除特定训练数据的影响,而无需从头重新训练。评估遗忘方法需要在多个种子下重复训练、遗忘和评估,计算成本高昂。据我们所知,现有的图像分类遗忘框架在单个GPU上运行,限制了在合理时间内可评估的种子数量。我们提出SUPREME,一个开源框架,将这些阶段分布到多个GPU上。SUPREME做出三项贡献:基于注册表的设计,用于添加新方法、指标、模型和场景;支持多种加速器和精度模式的多GPU架构;以及在Pins Face Recognition上使用ResNet18和ViT在十种种子下进行全类和随机样本遗忘的演示。该框架可在https://github.com/pedroandreou/supreme-unlearning获取。

英文摘要

Machine unlearning removes the influence of specific training data from a trained model without retraining it from scratch. Evaluating an unlearning method requires repeating training, unlearning, and evaluation across multiple seeds, which is computationally expensive. To our knowledge, existing image classification unlearning frameworks run on a single GPU, which limits how many seeds can be evaluated in reasonable time. We introduce SUPREME, an open-source framework that distributes these stages across multiple GPUs. SUPREME makes three contributions: a registry-based design for adding new methods, metrics, models, and scenarios; a multi-GPU architecture supporting multiple accelerators and precision modes; and a demonstration on Pins Face Recognition using ResNet18 and ViT under full-class and random-sample unlearning across ten seeds. The framework is available at https://github.com/pedroandreou/supreme-unlearning.

2606.00379 2026-06-02 cs.CV

Non-Learning Low-Light Stereo Vision

非学习低光立体视觉

Jason Wang, Lucas Nguyen, Hyunseung Eom, Wei Xu, Qi Guo

Affiliations * Department of Computer Sciences, Purdue University; Elmore Family School of Electrical and Computer Engineering, Purdue University

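AI summary Proposes a non-learning stereo framework that uses the Field of Junctions (FoJ) to extract coarse visual features and boundary-aware Semi-Global Matching (SGM) to estimate disparity from severely noisy images, producing sparse disparity maps that are more accurate than those of recent stereo algorithms on benchmark datasets.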

Comments Accepted to ICIP 2026. Code and data available at https://github.com/guo-research-group/nonlearning-lowlight-stereo

详情
AI中文摘要

我们提出了一种非学习立体框架,用于从严重噪声图像中估计视差。利用Field of Junctions (FoJ),它保留了在严重噪声下稳定的粗视觉特征用于构建代价体,同时丢弃与光子噪声不可分的精细纹理。由此产生的结构信息指导边界感知的半全局匹配(SGM),动态调整平滑惩罚以保留真实的视差不连续性。输出是稀疏视差图,在广泛使用的基准数据集上,在未掩蔽像素上比最近的立体算法更准确。

英文摘要

We present a non-learning stereo framework for disparity estimation from severely noisy images. Using the Field of Junctions (FoJ), it retains coarse visual features stable under severe noise for cost volume construction while discarding fine textures inseparable from photon noise. The resulting structural information guides boundary-aware Semi-Global Matching (SGM) that dynamically adapts smoothness penalties to preserve true disparity discontinuities. The output is a sparse disparity map more accurate than those of recent stereo algorithms over unmasked pixels on widely-used benchmark datasets.

2606.00377 2026-06-02 cs.CV

Score-Control for Hallucination Reduction in Diffusion Models

扩散模型中减少幻觉的分数控制

Mahesh Bhosale, Naresh Kumar Devulapally, Abdul Wasi, Chau Pham, Vishnu Suresh Lokhande, David Doermann

Affiliations * University at Buffalo

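AI summary To reduce hallucinations in diffusion models, the paper proposes Variance-Guided Score Modulation (VSM), which controls the score Jacobian and cuts hallucinations by up to about 25% while maintaining high fidelity and diversity.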

详情
AI中文摘要

扩散模型已成为现代生成式AI的支柱,推动了视觉、语言、音频及其他模态的进步。尽管取得了成功,但它们存在幻觉问题,即生成真实数据分布支撑集之外的不可信样本,这降低了可靠性和信任度。在这项工作中,我们首先通过实验证实了先前提出的假设,即分数平滑性导致图像生成扩散模型中的幻觉,并提供了基于密度的视角。我们进一步通过将幻觉概率质量与学习到的分数函数的利普希茨常数联系起来,形式化了这一概念。受此启发,我们引入了一种方差引导的分数调制(VSM)策略,该策略控制分数雅可比矩阵,从而降低分数平滑性并更好地逼近真实分数,进而减少幻觉。在合成和真实世界数据集上的实验结果表明,我们的方法在保持高保真度和多样性的同时,将幻觉降低了最多约25%,为更可靠的基于扩散的图像生成提供了原则性步骤。我们还提出了两个具有极端语义变化的基准数据集,用于系统性幻觉评估。代码和数据集公开于https://github.com/bhosalems/VSM。

英文摘要

Diffusion models have emerged as the backbone of modern generative AI, powering advances in vision, language, audio, and other modalities. Despite their success, they suffer from hallucinations: implausible samples that lie outside the support of the true data distribution and degrade reliability and trust. In this work, we first empirically confirm the previously proposed hypothesis that score smoothness causes hallucinations in image-generation diffusion models, and we provide a density-based perspective. We further formalize this notion by linking the hallucination probability mass to the Lipschitz constant of the learned score function. Motivated by this, we introduce a Variance-Guided Score Modulation (VSM) strategy that controls the score Jacobian, in turn reducing score smoothness and better approximating the ground-truth score, which decreases hallucinations. Empirical results on synthetic and real-world datasets demonstrate that our approach reduces hallucinations (up to ~25%) while maintaining high fidelity and diversity, providing a principled step toward more reliable diffusion-based image generation. We also propose two benchmark datasets with extreme semantic variation for systematic hallucination evaluation. Code and datasets are publicly available at https://github.com/bhosalems/VSM.

2606.00376 2026-06-02 cs.AI cs.CL cs.LG

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

确定性视界:当扩展推理失败时工具委托变得必要

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Affiliations * University of Science and Technology of China

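AI summary Using an Attention Bottleneck Theorem and the notion of a Deterministic Horizon, the paper argues that decoder-only attention has an information-theoretic capacity limit on deterministic state-tracking tasks, which degrades extended reasoning, and that tool delegation becomes necessary beyond a horizon of 19 to 31.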

Comments Accepted at ICML 2026. 4 figures. 51 pages including appendices

详情
AI中文摘要

扩展的思维链推理可能会在确定性状态追踪任务上降低性能,这不是由于偏好偏差,而是源于解码器-only注意力的信息论容量限制。我们建立了:(1) 注意力瓶颈定理及互补的可达性构造,将状态追踪容量界定为 $O(H \cdot \log(L/H) \cdot \sqrt{d_h})$;(2) 一个上下文相关的错误模型,导致超指数精度衰减;(3) 状态空间Jaccard度量,区分能力与偏好失败;(4) 确定性视界 $d^* \in [19, 31]$,超过该视界工具委托变得必要。在12个模型和8个任务领域(包括SWE-Bench、WebArena和SQL-Multi)中,工具集成推理始终优于神经思维链;在主要模型套件上,其准确率达到86-94%,而神经思维链仅为24-42%。在最优长度轨迹上进行微调仅带来<5%的提升,证实了架构上限,并且高跨模型相关性($r = 0.81$-$0.91$)表明这些失败是架构性的而非训练特定的。我们的结果为在代理系统中纯神经推理何时应让位于混合方法提供了原则性指导。

英文摘要

Extended chain-of-thought reasoning can degrade performance on deterministic state-tracking tasks, not because of preference biases but because of limits rooted in the information-theoretic capacity of decoder-only attention. We establish: (1) an Attention Bottleneck Theorem with a complementary achievability construction, bounding state-tracking capacity as $O(H \cdot \log(L/H) \cdot \sqrt{d_h})$; (2) a context-dependent error model yielding super-exponential accuracy decay; (3) the State-Space Jaccard metric distinguishing capability from preference failures; (4) a Deterministic Horizon $d^* \in [19, 31]$ beyond which tool delegation becomes necessary. Across 12 models and 8 task domains (including SWE-Bench, WebArena, and SQL-Multi), tool-integrated reasoning consistently outperforms neural chain-of-thought; on the primary model suite it reaches 86-94% accuracy versus 24-42% for neural chain-of-thought. Fine-tuning on optimal-length traces yields $<$5% improvement, confirming an architectural ceiling, and high cross-model correlation ($r = 0.81$-$0.91$) indicates these failures are architectural rather than training-specific. Our results provide principled guidance for when pure neural reasoning should yield to hybrid approaches in agentic systems.

2606.00374 2026-06-02 cs.RO

Constrained Whole-Body Tracking for Humanoid Robots

人形机器人的约束全身跟踪

Daniel Morton, Pranit Mohnot, Marco Pavone

Affiliations * Stanford University; NVIDIA Research

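AI summary Proposes ConstrainedMimic, a framework that combines operational space control with control barrier functions to enforce constraints in real time within RL tracking policies, for whole-body motion tracking and teleoperation of humanoid robots.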

详情
AI中文摘要

强化学习的最新进展已展示出人形机器人令人印象深刻的全身灵活性,但确保安全性和满足约束(特别是训练后指定的约束)仍然是一个挑战。为此,我们提出了 ConstrainedMimic,一个利用全身运动学和动力学在 RL 跟踪策略中实时执行约束的控制框架。通过整合操作空间控制和控制障碍函数(CBF)的原理,我们能够满足对运动学参考运动和底层动力学的任意运行时约束。在(模拟的)Unitree G1 上使用学习策略进行的全身运动跟踪和遥操作实验中,我们展示了碰撞避免(包括机器人身体和外部障碍物)、关节限制和质心稳定性约束。通过保持与当前接触模式和跟踪目标一致,我们在约束激活时最小化地限制了策略的能力。我们的方法完全可微,可在 CPU、GPU 和 TPU 上运行,并能以高达 300-500 Hz 的频率部署。所有软件将在发表后免费提供。

英文摘要

Recent advances in reinforcement learning (RL) have demonstrated impressive whole-body agility for humanoid robots, yet ensuring safety and satisfying constraints -- particularly those specified after training -- remains a challenge. Towards this goal, we present ConstrainedMimic, a control framework that leverages whole-body kinematics and dynamics for real-time constraint enforcement within RL tracking policies. By integrating principles from operational space control and control barrier functions (CBFs), we enable the satisfaction of arbitrary runtime constraints on both the kinematic reference motion and the underlying dynamics. In whole-body motion-tracking and teleoperation experiments on a (simulated) Unitree G1 with a learned policy, we demonstrate collision avoidance (both with the robot body and external obstacles), joint limits, and center of mass stability constraints. By remaining consistent with the current contact mode and tracking objectives, we minimally restrict the capabilities of the policy when constraints are active. Our method is fully differentiable, runs on CPU, GPU, and TPU, and can be deployed at up to 300-500 Hz. All software will be freely available upon publication.
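To make the control-barrier-function idea in this abstract concrete, here is a toy one-dimensional sketch of a joint-limit constraint. It assumes a single-integrator joint (velocity is commanded directly) and the barrier h(x) = x_max - x; the function names, gains, and time step are hypothetical, and the paper's framework works with whole-body dynamics rather than this simplified model.

```python
def cbf_filter(u_nom: float, x: float, x_max: float, alpha: float) -> float:
    """Return the velocity closest to u_nom that satisfies dh/dt + alpha * h >= 0.

    With h(x) = x_max - x and dx/dt = u, the condition reads
    -u + alpha * (x_max - x) >= 0, so the filter is a simple clamp.
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(u_nom=1.0, x0=0.0, x_max=1.0, alpha=5.0, dt=0.01, steps=1000):
    """Integrate the filtered command with forward Euler and record the positions."""
    x, traj = x0, [x0]
    for _ in range(steps):
        x += dt * cbf_filter(u_nom, x, x_max, alpha)
        traj.append(x)
    return traj


# The nominal command pushes toward the limit; the filter slows it down near x_max.
TRAJ = simulate()
```

Far from the limit the nominal command passes through unchanged, which mirrors the abstract's point that the policy is restricted only when a constraint is active.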

2606.00372 2026-06-02 cs.CV

LFA: Layer Feature Attention for Run-Time Introspection of 2D Object Detectors in Automated Driving

LFA:用于自动驾驶中2D目标检测器运行时自省的分层特征注意力

Mert Keser, Alois Knoll

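AI summary Proposes LFA, which uses an attention mechanism to aggregate features from multiple backbone layers, improving error prediction and interpretability for 2D object detectors in automated driving.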

详情
AI中文摘要

可靠的目标检测对于自动驾驶至关重要,然而即使是最先进的检测器也不可避免地会犯错误,从而危及安全。预测检测器失败的自省方法通过触发后备机制或提醒人类操作员,能够实现更安全的部署。然而,现有方法仅依赖最后一层特征或手工设计的统计量,丢弃了来自早期层的宝贵信息,这些信息捕捉了不同层次的视觉抽象。我们提出了分层特征注意力(LFA),一种轻量级的自省方法,通过注意力机制学习从多个骨干层聚合特征。我们的关键洞察是,检测错误在特征层次上表现不同:低层捕捉对检测小目标或被遮挡目标至关重要的细粒度细节,而高层编码用于场景理解的语义信息。LFA端到端地学习层重要性权重,从而既改进了错误预测,又实现了对哪些特征级别最能指示检测器失败的可解释分析。在KITTI和BDD100K上的大量实验表明,LFA实现了最先进的自省性能,在多种检测器架构上优于单层基线方法。

英文摘要

Reliable object detection is critical for automated driving, yet even state-of-the-art detectors inevitably make errors that can compromise safety. Introspection methods that predict detector failures enable safer deployment by triggering fallback mechanisms or alerting human operators. However, existing approaches rely solely on last-layer features or hand-crafted statistics, discarding valuable information from earlier layers that capture different levels of visual abstraction. We propose Layer Feature Attention (LFA), a lightweight introspection method that learns to aggregate features from multiple backbone layers through an attention mechanism. Our key insight is that detection errors manifest differently across feature hierarchies: low-level layers capture fine-grained details essential for detecting small or occluded objects, while high-level layers encode semantic information for scene understanding. LFA learns layer importance weights end-to-end, enabling both improved error prediction and interpretable analysis of which feature levels are most indicative of detector failures. Extensive experiments on KITTI and BDD100K demonstrate that LFA achieves state-of-the-art introspection performance, outperforming single-layer baselines across multiple detector architectures.

2606.00371 2026-06-02 cs.LG

How Much Orthogonalization Does Muon Need?

Muon 需要多少正交化?

Hua Huang

Affiliations * NVIDIA

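AI summary Studies how much orthogonalization the Muon optimizer needs, proposing cubic5, a low-cost variant built from five cubic Newton-Schulz steps, and showing that its training quality matches that of more accurate methods in the settings tested.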

详情
AI中文摘要

Muon 优化器通过将病态动量更新替换为近似半正交更新来改进神经网络训练。这引出一个实际问题:Muon 实际上需要多少正交化?我们使用直接为 Muon 的低精度奇异值带导出的松弛三次牛顿-舒尔茨调度来研究这个问题。与五次五次牛顿-舒尔茨迭代的十五次主导矩阵乘法相比,所得的五步三次构造使用十次主导矩阵乘法。三次调度并非旨在作为更精确的极分解求解器;相反,它是一种原则性的低成本变体,使我们能够探究极分解精度、谱整形和训练质量之间的关系。通过合成诊断、NanoGPT 消融实验以及混合 MoE/Mamba 模型的训练实验,我们发现训练质量并非由极分解精度单调决定:截断的 Polar Express、Muon-Jordan、三次牛顿-舒尔茨以及显式 FP32 SVD 极分解因子在 GPT-2 Small 上可达到几乎无法区分的最终损失,而 cubic5 在具有十亿到四十亿参数的混合 MoE/Mamba 模型上,其验证损失与 Muon-Jordan 五次更新相差约 $10^{-3}$。这些结果支持 cubic5 作为一种实用的低成本 Muon 正交化变体,并在测试的设置中提供了训练质量等同的实验证据。

英文摘要

Muon optimizers improve neural-network training by replacing ill-conditioned momentum updates with approximately semi-orthogonal updates. This motivates a practical question: how much orthogonalization does Muon actually require? We study this question using a relaxed cubic Newton--Schulz schedule derived directly for Muon's low precision singular value band. The resulting five-step cubic construction uses ten dominant matrix multiplications, compared with fifteen for five quintic Newton--Schulz iterations. The cubic schedule is not intended as a more accurate polar solver; instead, it is a principled low-cost variant that lets us probe the relation between polar accuracy, spectral shaping, and training quality. Across synthetic diagnostics, NanoGPT ablations, and training experiments on hybrid MoE/Mamba models, we find that training quality is not governed monotonically by polar-decomposition accuracy: truncated Polar Express, Muon-Jordan, cubic Newton--Schulz, and an explicit FP32 SVD polar factor can reach nearly indistinguishable final loss on GPT-2 Small, and cubic5 matches the Muon-Jordan quintic update within about $10^{-3}$ validation loss on hybrid MoE/Mamba models with one billion to four billion parameters. These results support cubic5 as a practical low-cost Muon orthogonalization variant, with empirical evidence of training-quality parity in the settings tested.
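The matrix-multiplication counts quoted in this abstract can be checked with a small sketch. The code below runs the classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X, which costs two matrix products per step, so five steps cost ten. The fixed coefficients 1.5 and -0.5 are the textbook choice and are an assumption here; the paper's relaxed cubic schedule uses its own coefficients, which the abstract does not list. The helper names and the example matrix are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cubic_newton_schulz(g, steps=5):
    """Push g toward a semi-orthogonal matrix; return (result, matmul count)."""
    # Scale by the Frobenius norm so every singular value lies in (0, 1].
    fro = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / fro for v in row] for row in g]
    n_matmuls = 0
    for _ in range(steps):
        a = matmul(x, transpose(x))   # X X^T
        ax = matmul(a, x)             # (X X^T) X
        n_matmuls += 2
        x = [[1.5 * x[i][j] - 0.5 * ax[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x, n_matmuls


def orth_error(x):
    """Frobenius norm of X X^T - I, computed without counted matmuls."""
    r, c = len(x), len(x[0])
    return math.sqrt(sum(
        (sum(x[i][k] * x[j][k] for k in range(c)) - (1.0 if i == j else 0.0)) ** 2
        for i in range(r) for j in range(r)))


G = [[3.0, 1.0, 0.0], [1.0, 2.0, 1.0]]  # toy 2x3 "momentum" matrix
X5, COUNT = cubic_newton_schulz(G, steps=5)
```

A quintic step needs one more product (for the squared Gram term), which gives the fifteen products for five quintic iterations mentioned in the abstract.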

2606.00367 2026-06-02 cs.LG cs.AI

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

长期决策问题中基于成对偏好的强化学习

Jonathan Colaço Carr, Prakash Panangaden, Doina Precup, Benjamin Van Roy

Affiliations * School of Computer Science, McGill University, Montreal, Quebec, Canada; Mila - Quebec AI Institute, Montreal, Quebec, Canada; Department of Electrical Engineering, Stanford University, Stanford, California, USA

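AI summary Reinforcement learning with pairwise preferences is inefficient on long-horizon problems and lacks optimality guarantees for Markov policies; the paper proposes the Markov decision contest model, proves that stationary Markov policies are optimal and that exact solution is in P, gives an iterative algorithm with sublinear convergence, and reports significantly better learning efficiency on high-dimensional long-horizon problems.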

详情
AI中文摘要

强化学习问题通常将目标定义为最大化标量奖励函数的期望值。但是,成对偏好通常比标量奖励更容易指定,并且它们表达了标量奖励无法表达的某些目标。因此,基于成对偏好的强化学习方法受到了越来越多的关注。不幸的是,这些方法在长时间跨度的任务中效率低下,并且缺乏关于马尔可夫策略相对于历史依赖策略的性能保证,而这连接了强化学习的理论与实践。因此,我们提出了“马尔可夫决策竞赛”(Markov decision contest)作为基于成对偏好的强化学习的新问题模型。我们证明了平稳马尔可夫策略在所有历史依赖策略中是最优的,精确求解马尔可夫决策竞赛属于P类问题,并且一个简单的迭代算法以亚线性速率收敛到最优策略。最后,在一组具有长时间跨度的高维决策问题中,我们展示了我们的近似算法在学习效率上显著优于先前的工作。

英文摘要

Reinforcement learning problems typically define the goal as maximizing the expected value of a scalar reward function. However, pairwise preferences are often easier to specify than scalar rewards, and they express certain goals that scalar rewards cannot. Methods for reinforcement learning with pairwise preferences have thus received growing interest. Unfortunately, these methods are inefficient in problems with long time horizons, and they lack guarantees on the performance of Markov policies relative to history-dependent policies, which bridge the theory and practice of reinforcement learning. We therefore propose the Markov decision contest as a new problem model for reinforcement learning with pairwise preferences. We prove that stationary Markov policies are optimal among all history-dependent policies, that solving a Markov decision contest exactly is in P, and that a simple iterative algorithm converges to an optimal policy at a sublinear rate. Lastly, in a set of high-dimensional decision problems with long time horizons, we show that our approximate algorithm is significantly more learning-efficient than prior work.

2606.00355 2026-06-02 cs.RO

FAIR^2 Drones: An AI-Ready Standard for Cross-Domain Wildlife Drone Datasets

FAIR^2 Drones:跨领域野生动物无人机数据集的AI就绪标准

Jenna Kline, Kilian Meier, Vandita Shukla, Edouard G. A. Rolland, Elena Iannino, Lucie Laporte-Devylder, Constanza Andrea Molina Catricheo, Blair Costelloe, Elizabeth Campolongo, Henrik S. Midtiby, Devis Tuia, Benjamin Risse, Ulrik P. S. Lundquist, Anders Lyhne Christensen, Fabio Remondino, Thomas Richardson, Tanya Berger-Wolf

Affiliations * The Ohio State University, Department of Computer Science and Engineering; School of Civil, Aerospace and Design Engineering, University of Bristol; 3D Optical Metrology (3DOM), Fondazione Bruno Kessler (FBK); Computer Vision and Machine Learning Systems Group, Institute for Geoinformatics, University of Muenster; Unmanned Aerial Systems Center, University of Southern Denmark; Department of Collective Behavior, Max Planck Institute of Animal Behavior; Department of Biology, University of Konstanz; Department of Biology, University of Southern Denmark

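AI summary Proposes the FAIR^2 Drones standard, which builds on existing FAIR and AI-ready data frameworks and adds platform metadata and annotation specifications so that drone datasets can support ecological analysis, robotics algorithm development, and computer vision benchmarking at the same time.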

详情
AI中文摘要

使用无人机收集动物生态数据需要大量的时间、专业知识和财务资源。然而,大多数现有数据集仅服务于单一研究社区,限制了跨学科重用。我们提出了一个统一的无人机数据集标准FAIR^2 Drones,该标准基于现有的FAIR和AI就绪数据框架,通过添加必要的平台元数据和标注规范,桥接了生态学、机器人和计算机视觉。我们的标准使数据集能够同时支持生态分析、机器人算法开发和计算机视觉基准测试。我们提供了开源验证工具、参考实现以及多模态扩展,将无人机图像与互补传感器(如相机陷阱、GPS和声学)连接起来。通过跨学科标准化元数据,该框架最大化了昂贵现场部署的科学投资回报,并加速了环境监测中的跨领域合作。

英文摘要

Animal ecology data collection using drones represents a substantial investment of time, expertise, and financial resources. Yet most existing datasets serve only a single research community, limiting interdisciplinary reuse. We propose a unified drone dataset standard, FAIR^2 Drones, that bridges ecology, robotics, and computer vision by building on existing FAIR and AI-ready data frameworks while adding essential platform metadata and annotation specifications. Our standard enables datasets to simultaneously support ecological analysis, robotics algorithm development, and computer vision benchmarking. We provide open-source validation tools, reference implementations, and multimodal extensions linking drone imagery with complementary sensors such as camera traps, GPS, and acoustics. By standardizing metadata across disciplines, this framework maximizes the scientific return on investment for costly field deployments and accelerates cross-domain collaboration in environmental monitoring.

2606.00352 2026-06-02 cs.CV cs.GR

HiGS: A Hierarchical Rendering Architecture for Real-Time 3D Gaussian Splatting

HiGS:一种用于实时三维高斯泼溅的分层渲染架构

Dawid Pająk, Martin Bisson, Rodolfo Lima

Affiliations * NVIDIA

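AI summary 3D Gaussian Splatting ties spatial partitioning and rasterization to one tile size although the two favor opposite sizes; Hierarchically Tiled Gaussian Splatting (HiGS) partitions over coarse macro-tiles and rasterizes over fine render tiles, rendering up to 15.8x faster while preserving exact alpha compositing.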

Comments Project Page: https://research.nvidia.com/labs/sil/projects/higs/

详情
AI中文摘要

3D高斯泼溅(3DGS)已成为在商用GPU上实现实时新视角合成的标准。其流程将空间分区和光栅化绑定到同一瓦片尺寸,但两者需求相反:分区(对高斯进行分箱和深度排序)随瓦片增大而成本降低,而光栅化随瓦片减小而成本降低。先前的加速工作降低了单个阶段的成本,但将两者锁定在单一尺度上,其中少数密集瓦片主导帧时间。我们提出分层瓦片高斯泼溅(HiGS),为每个阶段赋予独立尺度:分区在粗粒度宏瓦片上运行,而光栅化在宏瓦片内的细粒度渲染瓦片上运行。光栅化工作根据每个宏瓦片中的高斯数量分配,而非按瓦片分配,因此密集区域分布在多个并行单元上,而非串行通过一个单元。在测试场景中,HiGS比原始3DGS渲染速度快15.8倍,并且优于我们评估的所有其他光栅化器,同时保持精确的前后alpha合成。

英文摘要

3D Gaussian Splatting (3DGS) has become the standard for real-time novel view synthesis on commodity GPUs. Its pipeline ties spatial partitioning and rasterization to one tile size, yet the two pull in opposite directions: partitioning, which bins and depth-sorts gaussians, grows cheaper with larger tiles, while rasterization gets cheaper with smaller ones. Prior acceleration work reduces the cost of individual stages but keeps both locked to that single scale, where a few dense tiles dominate frame time. We present Hierarchically Tiled Gaussian Splatting (HiGS), which gives each its own scale: partitioning runs over coarse macro-tiles, while rasterization runs over the fine render tiles within them. Rasterization work is then issued in proportion to the gaussians in each macro-tile rather than per tile, so dense regions spread across many parallel units instead of serializing through one. Across tested scenes, HiGS renders up to 15.8x faster than the original 3DGS and outperforms every other rasterizer we evaluate, while preserving exact front-to-back alpha compositing.

2606.00350 2026-06-02 cs.LG cs.AI

Drift Q-Learning

Drift Q-Learning

Anas Houssaini, Mohamad H. Danesh, Amin Abyaneh, Scott Fujimoto, Hsiu-Chin Lin, David Meger

Affiliations * McGill University; Mila - Quebec AI Institute

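AI summary Proposes DriftQL, which combines a drift-based behavioral regularizer with Q-learning to avoid out-of-distribution actions in offline reinforcement learning; it generates actions in a single forward pass and outperforms diffusion and flow methods.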

详情
AI中文摘要

离线强化学习需要从固定数据中改进策略,同时避免具有不可靠价值估计的分布外动作。扩散和流策略通过建模行为分布来正则化强化学习目标以处理这种权衡,但它们需要迭代去噪、求解器集成,并且在更高效的变体中,推理时需要蒸馏或其他近似。我们提出DriftQL,它将基于漂移的行为正则化器与评论家驱动的策略改进相结合。价值信号将策略偏向数据支持的高价值区域,而吸引和排斥共同使生成的动作接近数据并防止坍缩到单一模式。DriftQL实现为具有统一训练目标的单一网络,并在单次前向传播中生成动作。在D4RL和OGBench上,DriftQL持续优于扩散和流方法,推进了最先进水平。在数据质量下降(基线明显挣扎)的情况下,DriftQL保持接近其干净数据性能,使其成为扩散和流方法的有前途的替代方案,同时保持确定性方法的简单性和效率。项目页面:https://driftql.github.io/

英文摘要

Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but they require iterative denoising, solver integrations, and in more efficient variants, distillation or other approximations at inference. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: https://driftql.github.io/

2606.00349 2026-06-02 cs.LG cs.AI cs.CE

(HB-ARFM) History-Bootstrapped Flow Matching for Inverse Boiling Reconstruction

(HB-ARFM) 基于历史引导的流匹配用于逆沸腾重建

Xianwei Zou, Sheikh Md Shakeel Hassan, Arthur Feeney, Aparna Chandramowlishwaran

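AI summary Proposes history-bootstrapped autoregressive flow matching, which uses conditional flow matching and autoregressive propagation for spatiotemporal inverse reconstruction under partial observation, and produces valid reconstructions of boiling dynamics where other models fail.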

Comments ICML 2026

详情
AI中文摘要

从部分观测中重建时空场是科学推理的基础,例如从卫星数据推断大气状态或从成像恢复流体状态。当观测不完整时,逆问题本质上是病态的:即使底层PDE动力学在全状态上是马尔可夫的,部分观测算子也会诱导出非马尔可夫的后验,无法从单个时间步解析。我们提出了一种历史引导自回归流匹配方法,用于部分可观测性下的时空逆重建。观测历史通过条件流匹配引导初始重建,减少歧义。然后自回归地应用相同的条件传输模型,以新观测和过去预测为条件,将重建向前传播。我们在沸腾动力学重建上评估该方法,从界面几何和运动恢复完整的速度和温度场。在两个不同观测稀疏性的逆任务中,HB-ARFM产生了物理和时间上有效的重建,而其他模型则失败。

英文摘要

Reconstructing spatiotemporal fields from partial observations is fundamental to scientific inference, from inferring atmospheric states from satellite data to recovering fluid states from imaging. When observations are incomplete, the inverse problem is fundamentally ill-posed: even when the underlying PDE dynamics are Markovian in the full state, partial observation operators induce a non-Markovian posterior that cannot be resolved from a single timestep. We propose history-bootstrapped autoregressive flow matching (HB-ARFM), a method for spatiotemporal inverse reconstruction under partial observability. Observation history bootstraps the initial reconstruction via conditional flow matching, reducing ambiguities. The same conditional transport model is then applied autoregressively, conditioning on both new observations and past predictions to propagate the reconstruction forward in time. We evaluate the method on boiling dynamics reconstruction, recovering full velocity and temperature fields from interface geometry and motion. Across two inverse tasks with varying observation sparsity, HB-ARFM produces physically and temporally valid reconstructions where other models fail.

2606.00345 2026-06-02 cs.LG

Longitudinal Multimodal Sensing of Physical Activity and Well-Being in Older Adults

老年人身体活动与福祉的纵向多模态感知

Flavio Di Martino, Mattia G. Campana, Marcello Magno, Lorenza Pratali, Franca Delmastro

Affiliations * IIT-CNR; IFC-CNR

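AI summary A longitudinal multimodal study (wearable sensing, behavioral monitoring, and clinical assessments) monitors 66 older adults in real-world conditions; observable behavioral targets are predicted well (macro-F1 65%), more abstract outcomes remain challenging, and historical features are the most informative predictors.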

详情
AI中文摘要

可穿戴和移动传感技术能够在现实环境中连续监测人类行为和健康。然而,纵向多模态数据中的预测建模仍然具有挑战性,特别是在针对复杂或临床衍生结果时。在这项工作中,我们展示了一项在现实条件下进行的纵向多模态研究,涉及66名老年人,结合了可穿戴传感、行为监测和临床评估。这一设置提供了研究长期、野外条件下代表性不足人群的难得机会。基于该数据集,我们研究了感知信号与目标变量之间的对齐如何影响跨健康相关任务的预测性能。我们设计了一个统一的评估框架,涵盖具有不同可观测性水平的任务,包括活动水平预测、睡眠时长估计和睡眠呼吸暂停严重程度分类。我们的结果揭示了明确的预测性梯度:高度可观察的行为目标实现了稳健的性能(macro-F1 65%),而更抽象的结果尽管相对于基线模型持续改进,但仍然具有挑战性。此外,通过可解释性分析,我们表明历史特征始终是最具信息量的预测因子,突显了纵向信息的核心作用。

英文摘要

Wearable and mobile sensing technologies enable continuous monitoring of human behavior and health in real-world settings. However, predictive modeling in longitudinal multimodal data remains challenging, particularly when targeting complex or clinically derived outcomes. In this work, we present a longitudinal multimodal study of 66 older adults conducted in real-world conditions and combining wearable sensing, behavioral monitoring, and clinical assessments. This setting provides a rare opportunity to study an underrepresented population in long-term, in-the-wild conditions. Building on this dataset, we investigate how the alignment between sensed signals and target variables affects predictive performance across health-related tasks. We design a unified evaluation framework spanning tasks with increasing levels of observability, including Activity Levels prediction, Sleep Duration estimation, and Sleep Apnea Severity classification. Our results reveal a clear gradient of predictability: highly observable behavioral targets achieve robust performance (macro-F1 65%), while more abstract outcomes remain challenging despite consistent improvements over baseline models. Moreover, through explainability analysis, we show that historical features consistently emerge as the most informative predictors, highlighting the central role of longitudinal information.
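For readers unfamiliar with the macro-F1 metric quoted in this abstract, the sketch below computes it from scratch: one F1 score per class, then an unweighted mean. The three activity-level labels and the toy predictions are invented for illustration and are not data from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that never appears in truth or prediction scores 0 by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six monitored days.
Y_TRUE = ["low", "low", "mid", "mid", "high", "high"]
Y_PRED = ["low", "mid", "mid", "mid", "high", "low"]
SCORE = macro_f1(Y_TRUE, Y_PRED)
```

Because every class contributes equally regardless of how often it occurs, macro-F1 penalizes models that do well only on the most common activity level.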