arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.25239 2026-05-26 cs.RO eess.SP

FusionCore: A 23-State Unscented Kalman Filter for IMU, Wheel Encoder, GPS, and Visual SLAM Fusion in ROS 2

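Manan Kharwar

Affiliations Independent Researcher

AI summary Presents FusionCore, an open-source ROS 2 sensor fusion package built on a 23-state Unscented Kalman Filter. With online estimation of the wheel encoder yaw rate bias, native ECEF handling of GPS, adaptive noise covariance, and VSLAM pose fusion, it achieves lower Absolute Trajectory Error than robot_localization on 10 of 12 NCLT sequences.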

Comments 8 pages, 4 figures, 2 tables. Source code: https://github.com/manankharwar/fusioncore (Apache 2.0)

Abstract

We present FusionCore, an open-source ROS 2 sensor fusion package that fuses IMU, wheel encoder odometry, GPS, and Visual SLAM pose into a single 100 Hz odometry stream using a 23-state Unscented Kalman Filter (UKF). The 23rd state is an online estimate of the wheel encoder's systematic yaw rate bias, identified through GPS heading cross-covariance and subtracted during GPS blackouts to reduce heading drift in coast mode. FusionCore also estimates gyroscope and accelerometer biases as explicit filter states, handles GPS natively in ECEF without a separate coordinate projection node, applies per-sensor Mahalanobis chi-squared outlier gating calibrated to measurement degrees of freedom, and adapts sensor noise covariance automatically from the innovation sequence. VSLAM pose fusion enables GPS-denied operation with any visual odometry or SLAM system, including automatic recovery from map reinitialization. We evaluate against robot_localization on twelve full-length sequences (55-92 min each) from the NCLT public dataset. FusionCore achieves lower Absolute Trajectory Error (ATE) on ten of twelve sequences, with improvements ranging from 1.2x to 22.2x on winning sequences. The robot_localization UKF diverges numerically on all twelve sequences. FusionCore is available at https://github.com/manankharwar/fusioncore under the Apache 2.0 license.
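
The per-sensor outlier gate mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 99% gate level, a small lookup table of chi-squared quantiles, and hypothetical names (`CHI2_99`, `mahalanobis_sq`, `passes_gate`); none of this is taken from the FusionCore source. Here `nu` is the innovation (measurement minus predicted measurement) and `S` its innovation covariance.

```python
import math

# 99% chi-squared quantiles by degrees of freedom (standard table values).
# The 99% level is an assumption; the abstract does not state the level used.
CHI2_99 = {1: 6.635, 2: 9.210, 3: 11.345, 6: 16.812}


def cholesky(S):
    """Lower-triangular L with S = L L^T for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mahalanobis_sq(nu, S):
    """Squared Mahalanobis distance nu^T S^-1 nu of the innovation nu."""
    L = cholesky(S)
    y = []  # forward substitution: solve L y = nu, then d^2 = y . y
    for i, v in enumerate(nu):
        y.append((v - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in y)


def passes_gate(nu, S):
    """Accept the measurement if d^2 is within the quantile for its dimension."""
    return mahalanobis_sq(nu, S) <= CHI2_99[len(nu)]


# A 3-DoF position innovation (metres) with a diagonal innovation covariance.
S_gps = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
print(passes_gate([2.0, 2.0, 3.0], S_gps))   # d^2 = 3.0  -> True
print(passes_gate([10.0, 0.0, 0.0], S_gps))  # d^2 = 25.0 -> False
```

Tying the threshold to the measurement dimension keeps the false-rejection rate comparable across sensors: a 1-DoF heading update and a 3-DoF position update are each rejected about 1% of the time under nominal noise.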

2605.25235 2026-05-26 cs.LG cs.AI math.OC

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies

Sohaib Lafifi

Affiliations Univ. Artois, UR 3926, Laboratoire de Génie Informatique et d'Automatique de l'Artois (LGI2A), Béthune, F-62400, France

AI summary Proposes an attribution method for neural combinatorial optimization policies that decomposes decisions via LP-relaxation duals, certifies counterfactuals with a CSP feasibility model, and bounds the size of PAC explanations with a Bonferroni-corrected Hoeffding sufficient-subset test.

Comments 4 pages, 1 figure, Reference implementation: https://github.com/sohaibafifi/neuro-co-cax (MIT)

Abstract

We give an attribution method for neural combinatorial-optimisation (CO) policies that (i) decomposes a decision by constraint families via LP-relaxation duals, (ii) certifies counterfactuals through a combinatorial feasibility model (implemented as a CSP feasibility-decision model), and (iii) bounds the size of a PAC-sufficient explanation with a Bonferroni-corrected Hoeffding sufficient-subset test along a greedy ordering. Across three CO problems and three seeds, our LP-anchored $\Lambda$-attribution matches the CF-derived signal at 96.5% on CVRPTW (n_cert=344) and 77.2% on the Orienteering Problem (n_cert=281) vs 75.0% and 35.2% for proxy gradient (paired diffs +0.215 and +0.420; McNemar exact $p \le 10^{-14}$). In the rank-aligned regime of the Flexible Job-Shop Scheduling Problem, both backends agree on every CSP-certified flip (n_cert=59), confirming the no-gain prediction. Bonferroni-PAC subsets average 5.0 nodes per step ($M=70$, $\varepsilon=\delta=0.2$, $k_{\max}=25$). Reference implementation: https://github.com/sohaibafifi/neuro-co-cax
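
One plausible reading of the Bonferroni-corrected Hoeffding test is sketched below. It assumes a one-sided Hoeffding bound, a failure budget of $\delta / k_{\max}$ per tested prefix, and acceptance when the lower confidence bound on the sufficiency rate reaches $1 - \varepsilon$. The function names and the success counts are hypothetical; only $M$, $\varepsilon$, $\delta$, and $k_{\max}$ come from the abstract.

```python
import math


def hoeffding_radius(m, delta, k_max):
    """One-sided Hoeffding radius at level delta / k_max (Bonferroni over k_max tests)."""
    return math.sqrt(math.log(k_max / delta) / (2.0 * m))


def smallest_sufficient_prefix(success_counts, m, eps, delta, k_max):
    """Return the first k whose lower confidence bound clears 1 - eps, else None.

    success_counts[k - 1] is the number of the m perturbed samples on which
    keeping only the first k items of the greedy ordering preserved the decision.
    """
    radius = hoeffding_radius(m, delta, k_max)
    for k, successes in enumerate(success_counts[:k_max], start=1):
        if successes / m - radius >= 1.0 - eps:
            return k
    return None


M, EPS, DELTA, K_MAX = 70, 0.2, 0.2, 25
print(round(hoeffding_radius(M, DELTA, K_MAX), 4))  # 0.1857

# Hypothetical success counts for prefixes of size 1..6.
counts = [41, 52, 60, 66, 70, 70]
print(smallest_sufficient_prefix(counts, M, EPS, DELTA, K_MAX))  # 5
```

Under these assumptions the radius is about 0.186, so with $M=70$ a prefix is accepted only when nearly all samples preserve the decision. The union bound over at most $k_{\max}$ tests keeps the overall failure probability below $\delta$.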

2605.25234 2026-05-26 cs.LG cs.AI stat.CO stat.ML

On the Epistemic Uncertainty of Overparametrized Neural Networks

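David Rügamer

Affiliations Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML)

AI summary Analyzes the epistemic uncertainty of overparametrized neural networks through the lens of non-identifiability, characterizes discrete and continuous sources of residual uncertainty, and validates the theory on one-hidden-layer ReLU networks.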

Comments Accepted at ICML 2026 (Main Track)

Abstract

Epistemic uncertainty is often viewed as a reducible uncertainty that vanishes with increasing data. This perspective implicitly assumes parameter identifiability and equates epistemic uncertainty with predictive variability. In overparametrized neural networks, however, model parameters are typically non-identifiable due to symmetries and redundant representations. As a consequence, substantial parameter uncertainty can persist even when the underlying function is fully identified. In this work, we analyze epistemic uncertainty through the lens of non-identifiability and characterize both discrete and continuous sources of residual uncertainty. Focusing on one-hidden-layer ReLU networks, we thoroughly analyze the resulting posterior structure and validate our theoretical insights through empirical studies.
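
Two textbook symmetries of one-hidden-layer ReLU networks illustrate what discrete and continuous non-identifiability can look like: permuting hidden units, and rescaling a unit's incoming weights by $c > 0$ while dividing its outgoing weight by $c$. The sketch below is a generic illustration with scalar input and made-up parameters, and is not claimed to be the paper's exact characterization.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def net(x, W, b, a):
    """One-hidden-layer ReLU network, scalar input: sum_i a[i] * relu(W[i] * x + b[i])."""
    return sum(ai * relu(wi * x + bi) for wi, bi, ai in zip(W, b, a))


random.seed(0)
W = [random.uniform(-1, 1) for _ in range(4)]
b = [random.uniform(-1, 1) for _ in range(4)]
a = [random.uniform(-1, 1) for _ in range(4)]

# Discrete symmetry: permuting hidden units leaves the function unchanged.
perm = [2, 0, 3, 1]
Wp, bp, ap = ([v[i] for i in perm] for v in (W, b, a))

# Continuous symmetry: relu(c * z) = c * relu(z) for c > 0, so scaling a unit's
# incoming weights by c and its outgoing weight by 1 / c also changes nothing.
c = [0.5, 2.0, 3.0, 10.0]
Ws = [ci * wi for ci, wi in zip(c, W)]
bs = [ci * bi for ci, bi in zip(c, b)]
as_ = [ai / ci for ci, ai in zip(c, a)]

xs = [-2.0, -0.3, 0.0, 0.7, 1.5]
max_gap = max(
    max(abs(net(x, W, b, a) - net(x, Wp, bp, ap)),
        abs(net(x, W, b, a) - net(x, Ws, bs, as_)))
    for x in xs
)
print(max_gap < 1e-12)  # True: three parameter vectors, one function
```

All three parameter settings define the same function, so data can never distinguish them. This is the sense in which parameter uncertainty can persist even when the function itself is fully identified.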

2605.25233 2026-05-26 cs.AI

Meta-Agent: From Task Descriptions to Verified Multi-Agent Systems

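Andy Xu, Yu-Wing Tai

Affiliations Dartmouth College

AI summary Proposes Meta-Agent, a two-phase framework that uses task planning, web search, code generation, and verification to automatically construct and execute reliable multi-agent systems from natural-language task descriptions, improving success rate, error recovery, and workflow stability on coding, contextual learning, and open-ended reasoning tasks.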

Abstract

AI agents are increasingly used to solve complex, multi-step tasks, but existing multi-agent frameworks remain brittle as workflows grow in scale and depth. Small errors at intermediate stages can propagate through agent interactions, while insufficient grounding and weak verification mechanisms further limit reliability. We present Meta-Agent, a two-phase framework that automatically constructs and executes specialized multi-agent systems from natural-language task descriptions. In the construction phase, a task planner decomposes a problem into a directed acyclic graph of agent specifications with explicit input/output contracts and verification criteria. A web search module grounds each specification with external evidence, and a code generation module produces system prompts and tool configurations. A construction-time verification stage then validates generated artifacts and triggers targeted regeneration when failures are detected. In the execution phase, a coordinator dispatches subtasks across the agent graph while execution-time verification gates intermediate outputs. We further introduce a three-level error attribution mechanism that distinguishes local, upstream, and structural failures, enabling targeted recovery strategies ranging from localized retries to partial re-execution and re-decomposition. We evaluate Meta-Agent across coding, contextual learning, and open-ended reasoning tasks. Experiments against strong multi-agent baselines and ablation studies demonstrate consistent improvements in task success rate, error recovery, and workflow stability. The results highlight the importance of tightly integrating planning, grounding, and verification for building reliable multi-agent systems.
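
The execution phase can be pictured with a toy sketch: agents are dispatched in dependency order over a DAG, each output passes a verification gate, and a failure level selects a recovery strategy. The agent names, the `RECOVERY` table, and the `run` function are hypothetical and stand in for components the abstract only describes at a high level. The sketch requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical agent specs: name -> upstream dependencies (the DAG edges).
AGENT_GRAPH = {
    "plan": set(),
    "search": {"plan"},
    "code": {"plan", "search"},
    "report": {"code"},
}

# Hypothetical mapping from the three failure levels to recovery strategies.
RECOVERY = {
    "local": "retry the failing agent",
    "upstream": "re-execute the faulty upstream agent and its dependents",
    "structural": "re-decompose the task",
}


def run(graph, execute, verify):
    """Dispatch agents in dependency order; gate every output with verify()."""
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: outputs[dep] for dep in graph[name]}
        result = execute(name, inputs)
        level = verify(name, result)  # None means the output passed the gate
        if level is not None:
            return outputs, f"{name}: {RECOVERY[level]}"
        outputs[name] = result
    return outputs, None


# Toy executor and verifier: the "code" agent fails with a local error.
outputs, action = run(
    AGENT_GRAPH,
    execute=lambda name, inputs: f"{name}-output",
    verify=lambda name, result: "local" if name == "code" else None,
)
print(sorted(outputs))  # ['plan', 'search']
print(action)           # code: retry the failing agent
```

Gating before an output is stored means a faulty intermediate result never reaches downstream agents, which is the propagation problem the abstract highlights.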

2605.25228 2026-05-26 cs.LG

A Blended Likelihood Approach for Achieving Fairness Using Naive Bayes

John Arthur Junior, Abdul Lateef Yussif, Maame G. Asante-Mensah, Charles R. Haruna, Sandro Amofa, Elliot Attipoe

Affiliations Department of Computer Science and Information Technology, University of Cape Coast

AI summary Proposes a fairness-aware Naive Bayes extension (BMNB) that balances fairness and accuracy through blended likelihood estimation and adaptive-threshold post-processing, reaching near-parity fairness metrics on several datasets.

Abstract

Concerns about algorithmic bias and fairness have increased as artificial intelligence has been incorporated into high-stakes decision-making. Traditional Naive Bayes classifiers, while efficient and interpretable, lack fairness-awareness mechanisms and perpetuate historical biases in sensitive domains such as hiring, credit scoring, and criminal justice. This study develops a fairness-aware extension of the Naive Bayes classifier that mitigates bias while maintaining computational efficiency. We propose the Bias Mitigating Naive Bayes (BMNB) classifier, integrating in-processing and post-processing interventions. The in-processing stage employs a blended likelihood approach combining group-specific and pooled likelihood estimates through a tunable blending parameter alpha to balance fairness and accuracy. The post-processing stage applies output calibration with adaptive thresholding to fine-tune group-specific decision boundaries. Experimental results indicate that BMNB attains Disparate Impact (DI) values of 1.000, 1.171, and 0.997 and Equal Opportunity Difference (EOD) values of -0.217, -0.226, and -0.053 on the Adult, ProPublica, and Framingham datasets, respectively, while maintaining computational efficiency. Ablation studies confirm that the combination of blended likelihood and adaptive thresholding yields superior performance compared to either technique in isolation.
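
The blending step and the two reported metrics can be sketched as follows. The blend is assumed to be a convex combination in which alpha weights the group-specific estimate; the abstract does not state the direction. Disparate Impact and Equal Opportunity Difference follow their usual definitions, and the toy labels are made up.

```python
def blended_likelihood(p_group, p_pooled, alpha):
    """Convex blend of a group-specific and a pooled likelihood estimate.

    alpha = 1 uses only the group-specific estimate and alpha = 0 only the
    pooled one; this direction is an assumption of the sketch.
    """
    return alpha * p_group + (1.0 - alpha) * p_pooled


def rate(values):
    return sum(values) / len(values)


def disparate_impact(y_pred, group):
    """P(y_pred = 1 | unprivileged) / P(y_pred = 1 | privileged); 1.0 is parity."""
    unpriv = [p for p, g in zip(y_pred, group) if g == 0]
    priv = [p for p, g in zip(y_pred, group) if g == 1]
    return rate(unpriv) / rate(priv)


def equal_opportunity_difference(y_true, y_pred, group):
    """True-positive rate of the unprivileged group minus that of the privileged group."""
    def tpr(g):
        return rate([p for t, p, s in zip(y_true, y_pred, group) if s == g and t == 1])
    return tpr(0) - tpr(1)


# Toy data: group 0 is treated as unprivileged, group 1 as privileged.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

print(blended_likelihood(0.25, 0.5, alpha=0.5))  # 0.375
print(disparate_impact(y_pred, group))           # 1.0
print(equal_opportunity_difference(y_true, y_pred, group))  # about -0.167
```

The toy data shows that the two metrics measure different things: selection rates are equal across groups (DI of 1.0) while true-positive rates still differ (negative EOD), the same pattern as the Adult result quoted in the abstract.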

2605.25226 2026-05-26 cs.CL

From Automation to Collaboration: Human-in-the-Loop Methods for Safe and Trustworthy NLP

Most. Sharmin Sultana Samu, MD. Tanvir Ahmed Seum, Md. Rakibul Islam

Affiliations Department of Computer Science and Engineering, BRAC University; Department of Electrical and Electronic Engineering, Rajshahi University of Engineering and Technology

AI summary Surveys human-in-the-loop methods in which human oversight supports auditing, robustness evaluation, data construction, and model steering to make NLP safer and more trustworthy, and identifies gaps in scalable probing, sustainable robustness benchmarks, low-resource settings, and governance of private systems.

Comments Preprint, manuscript under review

Abstract

Large language models are widely deployed in high-stakes NLP tasks, yet risks such as bias, hallucination, adversarial vulnerability and unreliable generalization remain. Probe-based auditing reveals inconsistencies in model behavior. Adversarial text generation uncovers robustness gaps, especially in lower-resourced languages with limited benchmarks. Enterprise text-to-SQL settings expose the difficulty of validating outputs over private and large-scale databases. Human supervision is essential for probe validation, adversarial verification and domain-specific annotation, but it is costly and hard to scale. This survey examines recent human-in-the-loop methods that shift NLP from automation toward collaboration for safety and trustworthiness. We review how human expertise supports auditing, robustness evaluation, data construction and model steering. Our findings highlight gaps in scalable probing, sustainable robustness benchmarks, low-resource settings and governance of private systems. We outline practical research directions for adaptive auditing, collaborative evaluation and accountable deployment.

2605.25220 2026-05-26 cs.CV cs.GR cs.RO

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

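Aviral Chharia, Fernando De la Torre

Affiliations Carnegie Mellon University

AI summary Proposes MVCHead, a method that learns 3D Gaussian head models directly from randomly sampled 2D images and achieves multi-view consistency through Hierarchical State Space blocks and an SE(3) Multi-view Critic, without multi-view data or 3D supervision.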

Comments CVPR 2026; Project Website: https://humansensinglab.github.io/MVCHead/

Journal ref CVPR, Denver, CO, USA, 2026, pp. 40163-40174

Abstract

High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS) that aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models. Project Page and code: https://humansensinglab.github.io/MVCHead/

2605.25216 2026-05-26 cs.RO

InvariantCloud: A Globally Invariant, Uniquely Indexed Point Cloud Framework for Robust 6-DoF Tactile Pose Tracking

Pengfei Ye, Yuxiang Ma, Yi Zhou, Wei Chen, Wenzhen Dong, Molong Duan

Affiliations Department of Mechanical and Aerospace Engineering, The Hong Kong University of Science and Technology; Department of Mechanical Engineering, Massachusetts Institute of Technology; Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong

AI summary Proposes the InvariantCloud framework, which exploits the global invariance of surface marker constellations on vision-based tactile sensors to estimate 6-DoF object pose through one-shot globally invariant point cloud registration, suppressing cumulative drift, estimating yaw rotation accurately, and showing high precision and robustness in long-sequence manipulation tasks.

Abstract

Recent advances in imitation learning and vision-language models highlight the need for high-fidelity tactile perception, with 6-DoF tactile object pose estimation providing a crucial foundation for precise robotic manipulation. We introduce InvariantCloud, a 6-DoF pose estimation framework that leverages the global invariance of surface marker constellations on vision-based tactile sensors. In contrast to recent approaches, our one-shot globally invariant point cloud registration suppresses cumulative drift and overcomes long-standing limitations in accurately estimating yaw (Z-axis) rotation. Experimental verifications show that InvariantCloud achieves superior yaw tracking accuracy and re-localization repeatability compared to existing benchmarks, demonstrating its precision and robustness in long-sequence manipulation tasks.

2605.25212 2026-05-26 cs.LG cs.SY eess.SY

Personalized Federated Learning by Energy-Efficient UAV Communications

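Shiqian Guo, Jianqing Liu, Beatriz Lorenzo

Affiliations Department of Computer Science, North Carolina State University; Department of Electrical and Computer Engineering, University of Massachusetts

AI summary To address data heterogeneity and energy consumption in UAV-assisted federated learning, proposes an architecture that separates a globally shared backbone from local personalization heads, together with a gradient-norm-based scheduling strategy that improves learning accuracy while reducing energy consumption.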

Abstract

Federated learning (FL) is an effective paradigm for enhancing the learning capability of edge devices while preserving data privacy. In geographically dispersed FL systems, such as sensor networks in remote areas, unmanned aerial vehicles (UAVs) can flexibly establish high-quality communication links to support parameter exchange. However, device heterogeneity and the limited battery capacity of UAVs pose significant challenges. Specifically, data heterogeneity slows convergence, while scheduling all devices for global collaboration incurs excessive communication and energy costs. To overcome these challenges, we adopt a strict separation between a globally shared backbone and permanently local personalization heads, thereby mitigating the impact of data heterogeneity. Furthermore, we propose a gradient-based scheduling strategy that jointly considers energy efficiency and learning performance. In each communication round, the backbone is updated only by the top-$\alpha$ devices ranked by gradient $\ell_{2}$-norm, ensuring that optimization focuses on the most informative updates. Simulation results demonstrate that the proposed scheme achieves higher learning accuracy than state-of-the-art approaches while significantly reducing UAV energy consumption.
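
The scheduling rule can be sketched in a few lines. The sketch assumes that alpha is a fraction of the devices and that the selected backbone gradients are simply averaged; the device names and gradient values are made up, and the energy term of the paper's strategy is not modelled here.

```python
import math


def l2_norm(grad):
    return math.sqrt(sum(g * g for g in grad))


def select_top_alpha(device_grads, alpha):
    """Return ids of the ceil(alpha * N) devices with the largest gradient l2 norm.

    Reading alpha as a fraction in (0, 1] is an assumption of this sketch.
    """
    k = max(1, math.ceil(alpha * len(device_grads)))
    ranked = sorted(device_grads, key=lambda d: l2_norm(device_grads[d]), reverse=True)
    return ranked[:k]


def aggregate_backbone(device_grads, selected):
    """Average only the selected devices' backbone gradients."""
    dim = len(next(iter(device_grads.values())))
    return [sum(device_grads[d][i] for d in selected) / len(selected) for i in range(dim)]


# Hypothetical backbone gradients reported by four devices.
grads = {
    "dev_a": [3.0, 4.0],    # norm 5.0
    "dev_b": [0.1, 0.2],
    "dev_c": [6.0, 8.0],    # norm 10.0
    "dev_d": [1.0, 0.0],
}
chosen = select_top_alpha(grads, alpha=0.5)
print(chosen)                             # ['dev_c', 'dev_a']
print(aggregate_backbone(grads, chosen))  # [4.5, 6.0]
```

Only backbone gradients appear in the exchange. The personalization heads stay on each device, so they never contribute to the communication cost.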

2605.25210 2026-05-26 cs.LG cs.AI stat.ML

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

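Ziheng Cheng, Yixiao Huang, Hanlin Zhu, Haoran Geng, Somayeh Sojoudi, Jitendra Malik, Pieter Abbeel, Xin Guo

Affiliations University of California, Berkeley

AI summary To address the high statistical cost that larger model capacity imposes on multi-objective learning with diffusion models, proposes a semi-supervised two-stage training method that uses unlabeled data for distillation through pseudo-samples, and proves that the required number of paired samples depends only on the complexity of the specialist models.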

Abstract

Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion policies. This naturally leads to a multi-objective learning (MOL) problem. A key challenge is that achieving good Pareto trade-offs can require a generalist model class with substantially larger capacity than what suffices for solving any individual task, thereby increasing statistical cost since sample complexity typically scales with the model complexity. To reconcile this, we develop a principled MOL framework for diffusion models with limited data: a semi-supervised regime where paired (labeled) samples are scarce, but (unlabeled) condition data are abundant. We propose a two-stage training procedure that first fits lightweight specialist models from limited paired data, and then distills them into a generalist model by generating pseudo-samples. We establish generalization bounds showing that the required number of paired samples only depends on the complexity of the specialist model classes. We further extend the theory to diffusion policies for sequential decision making to account for distribution shift in on-policy rollouts. Extensive experiments on robotic control and image restoration tasks are conducted to verify our theoretical results.

2605.25208 2026-05-26 cs.CL

They Are Not the Same: Direct Causes Are Not Grounded Emotion Explanations

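Zhuangzhuang Pan, Yan Xia, Chee Seng Chan

Affiliations Universiti Malaya, Malaysia; Suzhou University of Technology, China; VinUniversity, Vietnam

AI summary Through analysis of the IEMO-MECP dataset, finds that the binary classification proxy used in Emotion-Cause Pair Extraction (ECPE) reliably extracts only direct triggers and does not provide evidence-grounded emotion explanations: emo-context has no stable place at the binary boundary, and under shortcut pressure models favor convenient attributions over grounded explanations.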

Comments 25 pages, 11 figures, 24 tables. Preprint

Abstract

Emotion-Cause Pair Extraction (ECPE) was introduced to explain why an emotion occurs, but this goal is now often reduced to binary pair/non-pair prediction. This proxy is useful for direct-cause extraction, yet easy to over-read as evidence-grounded emotion explanation. We show that this interpretation is only partially valid. In IEMO-MECP, 90.9% of original positives remain emo-cause and 95.0% of original negatives remain non-pair, confirming that the binary ECPE task is largely preserved. The problem is that direct triggers alone do not constitute a grounded explanation. Emo-context, an utterance that helps interpret a target emotion without directly causing it, appears on both sides of the original boundary and is enriched near binary uncertainty, showing that the binary boundary has no stable place for such discourse evidence. Across evaluated ECPE models, direct triggers are recovered more reliably than contextual support. Under shortcut pressure, this imbalance becomes consequential. Binary-trained models assign higher pair scores to nearby lexically similar non-pair candidates than to evidence-supported but structurally harder emo-cause and emo-context pairs. Thus, pair scores can reward convenient attributions over grounded explanations. High binary ECPE performance indicates that a model can identify direct triggers; it does not indicate that the model has explained the emotion. Code is publicly available at https://github.com/panzhzh/ECPExsame.

2605.25204 2026-05-26 cs.CL

Clarification Is Not Enough: Post-Clarification Answering Remains the Bottleneck in Multi-Turn QA

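Jinyan Su, Jennifer Healey

Affiliations Cornell University; Adobe Research

AI summary Decomposes multi-turn question answering into a clarification policy and post-clarification answering. Experiments on the PACIFIC benchmark show that supervised fine-tuning rapidly improves the clarification policy, yet final answer accuracy remains substantially lower, indicating that understanding and correctly interpreting the user's response is the key bottleneck.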

Abstract

Pluralistic alignment requires systems to adapt to diverse user values, communication styles, and contextual assumptions. We believe that a foundational prerequisite for such alignment is enabling accurate preference elicitation from people when their intent is under-specified or ambiguous. We study the problem of preference elicitation in multi-turn question answering by decomposing it into two components: a clarification policy, which decides whether to ask a clarifying question or answer directly, and post-clarification answering, which produces the correct final answer once the missing information is provided. Using the PACIFIC benchmark, we show that supervised fine-tuning rapidly improves the clarification policy; however, final answer accuracy remains substantially lower even when the model takes the correct action. This gap indicates that understanding and correctly interpreting the user's response is the critical bottleneck in multi-turn question-answering systems.

2605.25203 2026-05-26 cs.LG cs.AI cs.LO

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

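Gorgi Pavlov

Affiliations Lehigh University

AI summary Applies the influence-adaptive Walsh geometry of a companion theory paper to extreme low-bit weight quantization, combining WHT rotation and column rescaling with a reconstruction-error quantizer, and reduces perplexity by 15-58% on several models.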

Comments 14 pages, no figures. Companion application paper to arXiv:2605.01637 (theory). Code and pinned eval stack: https://github.com/gogipav14/spectral-llm

Abstract

We apply the influence-adaptive Walsh geometry of a companion theory paper (arXiv:2605.01637) to extreme low-bit weight-only LLM quantization. The recipe is one math-invariant transformation: WHT-rotate each linear layer's weight matrix and rescale its columns by per-coordinate Walsh-basis activation energy before handing off to a reconstruction-error quantizer (Intel auto-round). This biases per-group integer rounding toward high-spectral-energy channels. On four pretrained decoder-only models from 135M to 1.5B parameters, BBT-spectral reduces wikitext-2 perplexity by 15-58% relative to vanilla auto-round at W2A16; we also report a TinyLlama-1.1B auxiliary data point. Three extensions transfer the recipe to families it failed on: a per-head PCA matrix-Gamma replacement of q_norm/k_norm for Qwen3 attention (PPL 136.76 -> 88.99 on Qwen3-0.6B); an SO(2) per-pair rotation that commutes with RoPE (PPL 36.93 -> 21.84 on Qwen2.5-1.5B); and an MoE-aware input-side absorption fix identified by architectural fuzzing of Laguna-style fused-expert layouts. A W2-vs-W4 ablation gives a deliberate negative control: the redistribution payoff falls within the +/-0.5 PPL noise floor at W4, consistent with the Schur-convexity intuition that the cost of unconcentrated influence vanishes as the noise budget shrinks. All quantized weights export to OpenVINO IR and run on Intel NPU + Arc dGPU + CPU with PPL invariant to device within +/-0.1. We do not claim a formal Boolean-to-real-valued transfer of the theory paper's majorization argument: the WHT activation energy used here is not the Boolean influence of the theory paper, the link is intuitive, and the contribution is engineering value rather than a transferred theorem. Head-to-head benchmarks against SpinQuant, QuaRot, QuIP-sharp, AQLM, OmniQuant, and ButterflyQuant at matched calibration are the main future-work item.
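
The "math-invariant" property of the rotate-and-rescale step can be checked numerically on a toy layer. The sketch assumes an orthonormal Sylvester-type Walsh-Hadamard matrix and applies the inverse transform to the activations; the scale vector `d` is a made-up stand-in for the per-coordinate Walsh-basis activation energy, and no quantizer is involved.

```python
import math


def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix of size n (n must be a power of two)."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in H]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


n = 4
H = hadamard(n)
W = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.0, -0.5, 3.0]]  # toy 2 x 4 weight matrix
x = [1.0, -2.0, 0.5, 4.0]                            # toy activation vector
d = [2.0, 1.0, 0.5, 4.0]  # hypothetical per-coordinate scales

# Rotate and rescale the weight columns: W' = W H diag(d).
W_rot = [[v * s for v, s in zip(row, d)] for row in matmul(W, H)]
# Apply the inverse transform to the activations: x' = diag(1/d) H x.
x_rot = [v / s for v, s in zip(matvec(H, x), d)]

y_ref = matvec(W, x)
y_new = matvec(W_rot, x_rot)
print(max(abs(p - q) for p, q in zip(y_ref, y_new)) < 1e-9)  # True
```

Because this H is symmetric and orthogonal, H H equals the identity and the layer output is unchanged in exact arithmetic. The transform only matters once `W_rot` is rounded to a low-bit grid, where the rescaling changes which coordinates absorb the rounding error.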

2605.25198 2026-05-26 cs.LG cs.AI

Hide to Guide: Learning via Semantic Masking

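Ruitao Liu, Qinghao Hu, Alex Hu, Yecheng Wu, Shang Yang, Luke J. Huang, Zhuoyang Zhang, Han Cai, Song Han

Affiliations MIT; NVIDIA

AI summary Proposes Semantic Masked Expert Policy Optimization (SMEPO), which masks reward-relevant semantic spans in expert traces so that hard problems become a fill-in-the-blank process, improving the exploration efficiency of reinforcement learning on reasoning-intensive tasks.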

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a powerful paradigm for improving language models on reasoning-intensive tasks, but its effectiveness is often limited by exploration. For example, models often fail on hard problems, leaving little useful reward signal. External expert traces offer a natural source of guidance, yet they may also expose reward-relevant content along the critical path to the verifier target, such as final answers, intermediate values, executable implementations, or answer-related entities. This content can create an unintended reward hacking channel, allowing the policy to obtain reward by copying the trace rather than learning the underlying reasoning or agentic behavior. Existing guided-RL methods reduce this risk by using partial trajectories, but they mainly control how much expert information is shown heuristically rather than which parts should be hidden. To this end, we propose Semantic Masked Expert Policy Optimization (SMEPO), a fine-grained semantic masking strategy for expert-guided RLVR. Instead of truncating traces coarsely or revealing them unchanged, SMEPO masks reward-relevant semantic spans along the critical path while preserving the expert's decomposition, plan, and procedural structure. This turns hard problems from reasoning from scratch into a fill-in-the-blank process: the policy can follow the expert's problem-solving route, but must still reconstruct the missing values, code, or entities by itself. SMEPO is simple to apply and requires no changes to the reward function or RL objective. Across diverse domains, including math, code, and agentic search, SMEPO improves accuracy by up to 3.2 points over GRPO and reduces training time by up to 4.2x. The code is available at https://github.com/mit-han-lab/SMEPO.

2605.25194 2026-05-26 cs.LG

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

Dongpeng Zhang, Ke Ma, Yangbangyan Jiang, Gaozheng Pei, Longtao Huang, Qianqian Xu, Qingming Huang

Affiliations School of Advanced Interdisciplinary Sciences, UCAS; School of Electronic, Electrical and Communication Engineering, UCAS; State Key Laboratory of AI Safety, Institute of Computing Technology, CAS; Alibaba Group; School of Computer Science and Technology, UCAS; Beijing Academy of Artificial Intelligence; Key Laboratory of Big Data Mining and Knowledge Management, UCAS

AI summary To defend multimodal large language models against visual prompt injection attacks, proposes Gradient Token Masking (GTM), which localizes critical image tokens through gradient analysis and neutralizes them by masking, reducing attack success rates to near zero with minimal computational overhead.

详情
AI中文摘要

对抗性图像通过提示注入对多模态大语言模型构成严重安全威胁。现有防御缺乏对底层机制的原则性理解,且难以平衡效率和防御效用。在这项工作中,我们表明成功的对抗攻击并非均匀依赖整个图像,而是依赖于一小部分关键图像令牌。基于这一见解,我们提出梯度令牌掩码(GTM),通过梯度分析定位这些令牌并通过掩码中和它们。我们发现,当攻击保留预测令牌时,基于第一个生成令牌输出概率的归因会失败。为克服这一点,GTM利用隐藏状态梯度范数分数进行对抗输入下的生成影响归因。我们证明其排名与完整对抗损失梯度的排名一致,为精确定位提供了理论保证。我们的方法仅需一次前向-反向传播即可识别并清零少量高分令牌,有效破坏对抗攻击路径。在提示注入和多模态越狱攻击上的大量实验表明,我们的方法将攻击成功率(ASR)降至接近零,同时以可忽略的计算开销保持模型效用。

英文摘要

Adversarial images pose a severe security threat to multimodal large language models through prompt injection. Existing defenses largely lack a principled understanding of the underlying mechanisms and struggle to balance efficiency and defense utility. In this work, we show that successful adversarial attacks do not rely on the entire image uniformly but instead depend on a small subset of critical image tokens. Based on this insight, we propose Gradient Token Masking (GTM), which localizes these tokens via gradient analysis and neutralizes them through masking. We find that attribution based on the first generated token's output probability fails when attacks preserve the predicted token. To overcome this, GTM utilizes the Hidden-State Gradient Norm score for generation-influence attribution under adversarial inputs. We prove that its ranking is consistent with that of the full adversarial loss gradient, providing a theoretical guarantee for accurate localization. Our method requires only a single forward-backward pass to identify and zero out a small number of high-scoring tokens, effectively disrupting the adversarial attack path. Extensive experiments on prompt injection and multimodal jailbreak attacks demonstrate that our approach reduces attack success rates (ASR) to near zero while preserving model utility with negligible computational overhead.
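The localize-then-neutralize step can be sketched as follows. This is a minimal illustration, assuming the per-token hidden-state gradients from one forward-backward pass are already available as an array; the function name `mask_top_tokens`, the use of an L2 norm as the score, and the parameter `k` are assumptions, not details confirmed by the abstract.

```python
import numpy as np

def mask_top_tokens(image_tokens, hidden_grads, k):
    """Zero out the k image tokens with the largest gradient norm.

    image_tokens: (n_tokens, d) array of image token embeddings.
    hidden_grads: (n_tokens, d_h) array of gradients with respect to each
                  token's hidden state (assumed to be precomputed).
    Returns the masked tokens and the sorted indices that were zeroed.
    """
    scores = np.linalg.norm(hidden_grads, axis=1)  # one score per token
    top = np.argsort(scores)[::-1][:k]             # highest-scoring tokens
    masked = image_tokens.copy()
    masked[top] = 0.0
    return masked, np.sort(top)
```

Only the selected tokens are altered, which matches the abstract's point that a small subset of tokens carries the attack.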

2605.25191 2026-05-26 cs.CV

Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference

在推理时将图像引导注入文本条件扩散模型

Agata Żywot, Iason Skylitsis, Thijmen Nijdam, Zoe Tzifa-Kratira, Derck Prinzhorn, Konrad Szewczyk, Aritra Bhowmik

发表机构 * University of Amsterdam(阿姆斯特丹大学)

AI总结 提出视觉概念融合(VCF),一种无需重新训练即可在推理时同时以图像和文本为条件进行双重引导的方法,通过对齐CLIP图像特征与文本嵌入空间实现视觉概念注入。

详情
AI中文摘要

像Stable Diffusion这样的文本到图像扩散模型可以从文本生成高质量图像,但缺乏在推理时无需重新训练即可注入视觉引导(例如草图、风格)的方法。现有方法要么需要计算昂贵的微调,要么依赖于可能造成与文本提示语义不对齐的风格迁移技术。我们引入了视觉概念融合(VCF),这是第一种在推理时无需任何概念特定训练即可同时对图像和文本提示进行双重条件化的方法。VCF通过将CLIP图像特征与文本嵌入空间对齐,实现了将视觉概念注入Stable Diffusion。VCF由三个组件组成:(1)一个轻量级对齐器,使用InfoNCE和交叉注意力重建损失将图像标记映射到文本嵌入流形;(2)一种保留文本和视觉语义的融合策略;(3)一个可选的提示-噪声优化(PNO)模块,用于测试时细化。我们的实验表明,VCF成功地从参考图像中转移了包括风格、构图和调色板在内的视觉属性,同时保持了对提示的遵循。定量结果显示文本对齐(CLIP分数)和视觉对应(LPIPS)之间存在权衡,VCF在参考保真度方面优于基线。

英文摘要

Text-to-image diffusion models like Stable Diffusion generate high-quality images from text, but lack a way to inject visual guidance (e.g. sketches, styles) at inference without retraining. Existing methods either require computationally expensive fine-tuning or rely on style transfer techniques that risk semantic misalignment with textual prompts. We introduce Visual Concept Fusion (VCF), the first method offering dual conditioning on both an image and text prompt at inference time without any concept-specific training. VCF enables visual concept injection into Stable Diffusion by aligning CLIP image features with the text embedding space. VCF consists of three components: (1) a lightweight aligner that maps image tokens to the text embedding manifold using InfoNCE and cross-attention reconstruction losses, (2) a fusion strategy that preserves both textual and visual semantics, and (3) an optional Prompt-Noise Optimization (PNO) module for test-time refinement. Our experiments demonstrate that VCF successfully transfers visual attributes including style, composition, and color palette from reference images while maintaining prompt adherence. Quantitative results show a trade-off between text alignment (CLIP score) and visual correspondence (LPIPS), with VCF outperforming baselines in reference fidelity.

2605.25189 2026-05-26 cs.LG cs.CL

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

方向对齐缓解语言模型强化学习中的奖励黑客问题

Wenlong Deng, Jiaji Huang, Kaan Ozkara, Yushu Li, Christos Thrampoulidis, Xiaoxiao Li, Youngsuk Park

发表机构 * University of British Columbia(不列颠哥伦比亚大学) Vector Institute(向量研究所) Amazon(亚马逊)

AI总结 通过分析强化学习更新的几何结构,发现奖励黑客源于优化偏离稳定低维学习轨迹,提出可信方向投影方法约束梯度在干净参考子空间内,延迟捷径利用并保持任务性能。

详情
AI中文摘要

当模型通过利用捷径而非解决预期任务来改进代理奖励时,就会出现奖励黑客问题。我们通过语言模型中强化学习更新的几何结构来研究这种失败模式,并认为当优化偏离稳定的低维学习轨迹时,黑客行为就会出现。我们通过参数更新的主导奇异方向分析了这种漂移,并表明奖励黑客运行比干净运行表现出显著更大的方向变化。基于这一观察,我们引入了可信方向投影,它约束梯度保持在干净参考子空间内。在数学推理的奖励黑客实验中,所提出的方法延迟了捷径利用并更好地保持了任务性能。

英文摘要

Reward hacking arises when a model improves a proxy reward by exploiting shortcuts rather than solving the intended task. We study this failure mode through the geometry of reinforcement learning updates in language models and argue that hacking emerges when optimization drifts away from a stable low-dimensional learning trajectory. We analyze this drift through dominant singular directions of parameter updates and show that reward-hacking runs exhibit substantially larger directional change than clean runs. Motivated by this observation, we introduce trusted-direction projection, which constrains gradients to remain within a clean reference subspace. Across reward-hacking experiments on mathematical reasoning, the proposed approach delays shortcut exploitation and better preserves task performance.
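One way to read "trusted-direction projection" is as an orthogonal projection of each gradient onto the span of the dominant singular directions of updates from a clean run. The sketch below follows that reading; the function names, the flattened-parameter layout, and the choice of `rank` are assumptions made for illustration.

```python
import numpy as np

def trusted_basis(clean_updates, rank):
    """Top-`rank` right-singular directions of clean reference updates.

    clean_updates: (n_updates, n_params) array, one flattened update per row.
    Returns a (rank, n_params) array with orthonormal rows.
    """
    _, _, vt = np.linalg.svd(clean_updates, full_matrices=False)
    return vt[:rank]

def project_gradient(grad, basis):
    """Keep only the component of `grad` that lies in the trusted subspace."""
    return basis.T @ (basis @ grad)
```

Components of the gradient outside the reference subspace are dropped, which is the mechanism by which drift toward shortcut directions would be suppressed under this reading.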

2605.25188 2026-05-26 cs.AI

DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

DarkForest: 少说话,多智能体LLM更高精度

Yi Li, Songtao Wei, Dongming Jiang, Zhichun Guo, Qiannan Li, Bingzhe Li

发表机构 * University of Texas at Dallas(德克萨斯大学达拉斯分校) Independent Researcher(独立研究者) University of California, Davis(加州大学戴维斯分校)

AI总结 提出DarkForest框架,通过保持智能体独立、结构化解析响应并基于信念分布协调,减少通信开销和错误传播,在六个推理基准上实现领先质量并大幅降低令牌消耗。

详情
AI中文摘要

多智能体LLM系统通过组合多个智能体的输出来改进推理,但交互密集型方法可能导致错误传播和高通信开销。当智能体交换原始响应或推理轨迹时,不正确的中间推理可能被采纳和放大,导致自信但错误的共识;多轮通信也增加了令牌消耗、延迟和推理成本。在本文中,我们提出了一种名为DarkForest的受控通信协调框架。DarkForest首先保持智能体独立,因此每个智能体在不看到其他智能体输出的情况下产生答案。然后,它将原始响应解析为结构化候选记录,将语义等价的候选记录分组为聚类,并使用智能体可靠性、置信度、解析质量、支持模式可靠性和独立性校正来估计这些聚类上的校准信念分布。协调器仅从该信念状态接收策略允许的证据,并进行受控通信。在六个推理基准上的实验表明,DarkForest实现了领先的整体质量,在基准指标上比最强基线最多提高30.7%,并且与通信密集型基线相比,令牌消耗减少了高达6.5倍。

英文摘要

Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error propagation and high communication overhead. When agents exchange raw responses or reasoning traces, incorrect intermediate reasoning may be adopted and amplified, leading to confident but wrong consensus; multi-round communication also increases token consumption, latency, and inference cost. In this paper, we propose a controlled-communication coordination framework named DarkForest. DarkForest first keeps agents independent, so each agent produces an answer without seeing the others' outputs. It then parses the raw responses into structured candidate records, groups semantically equivalent candidates into clusters, and estimates a calibrated belief distribution over these clusters using agent reliability, confidence, parse quality, support-pattern reliability, and independence corrections. A coordinator then receives only policy-permitted evidence from this belief state, so communication stays controlled. Experiments on six reasoning benchmarks show that DarkForest achieves leading overall quality, improves over the strongest baseline by up to 30.7% on benchmark metrics, and reduces token consumption by up to 6.5x compared with communication-heavy baselines.
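The cluster-then-estimate step can be sketched in a few lines. This is a simplified illustration: string normalization stands in for semantic-equivalence clustering, and the weight is just reliability times confidence, whereas the abstract lists further factors (parse quality, support-pattern reliability, independence corrections). The record fields and the function name are hypothetical.

```python
from collections import defaultdict

def belief_over_clusters(records):
    """Return a normalized belief distribution over answer clusters.

    records: list of dicts with keys 'answer', 'reliability', 'confidence'.
    Assumption: answers that match after stripping and lowercasing belong to
    the same cluster.
    """
    weights = defaultdict(float)
    for r in records:
        key = r["answer"].strip().lower()
        weights[key] += r["reliability"] * r["confidence"]
    total = sum(weights.values())
    if total == 0:
        return {}
    return {k: w / total for k, w in weights.items()}
```

Because agents never see each other's raw outputs here, a wrong answer can only gain weight through independent support, not through persuasion.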

2605.25186 2026-05-26 cs.CL cs.AI

By Their Fruits You Will Know Them: Comparing Formalizations of Law by the Decisions They Encode

凭其果实,你们将认识它们:通过编码的决策比较法律的形式化

Julius Vernie, Matthias Grabmair

发表机构 * Technical University of Munich(慕尼黑工业大学)

AI总结 提出一种系统方法,通过SAT求解器枚举不同形式化在边缘案例上的分歧,并转化为具体事实场景,以比较同一法律条款的不同形式化,应用于九个前沿LLM生成的十个欧盟条款形式化,发现行为分歧与结构一致性基本不相关。

Comments 23 pages, 17 figures, submitted to EMNLP PROC 2026

详情
AI中文摘要

将法律条款形式化有望实现机器可访问的法律和自动化法律推理,而最近的LLM使得直接从法规文本生成这种形式化变得诱人。然而,任何形式化都会做出隐含的解释选择,其后果难以预料,尤其是当LLM是作者时。我们提出了一种方法,通过它们在个别案例上的推理,系统地比较同一法律条款的不同形式化。给定一个条款的多个形式化,我们在节点级别匹配它们,从匹配中为每对推导出一个共享接口,并使用SAT求解器枚举任意两个形式化存在分歧的边缘案例。然后将选定的边缘案例转化为具体的事实场景,供法律专家检查并采取行动。我们将该方法应用于九个前沿LLM生成的十个欧盟条款的形式化。我们发现,形式化之间的行为分歧与其结构一致性基本不相关,并且口头化的案例揭示了定性的不同分歧类型,包括反映法律评论中真实争议的分歧。

英文摘要

Formalizing legal provisions promises machine-accessible law and automated legal reasoning, and recent LLMs make it tempting to generate such formalizations directly from statutory text. However, any formalization makes implicit interpretive choices whose consequences are hard to anticipate, especially if an LLM is the author. We present a method for systematically comparing different formalizations of the same legal provision by their inferences on individual cases. Given multiple formalizations of a provision, we match them at the node level, derive a shared interface for each pair from the matching, and use a SAT solver to enumerate the edge cases on which any two formalizations disagree. Selected edge cases are then verbalized into concrete factual scenarios that a legal expert can examine and act on. We apply our method to formalizations of ten EU provisions generated by nine frontier LLMs. We find that behavioral divergence between formalizations is essentially uncorrelated with their structural agreement and that the verbalized cases reveal qualitatively distinct types of disagreement, including divergences that mirror genuine controversies in the legal commentary.
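The disagreement search can be illustrated on a toy scale. The sketch below swaps the SAT solver for brute-force enumeration over a small shared interface, which only works for a handful of Boolean variables; the two example formalizations and their variable names are invented for illustration and do not come from the paper.

```python
from itertools import product

def disagreements(f, g, variables):
    """List every truth assignment over `variables` on which f and g differ.

    f, g: callables taking a dict {variable_name: bool} and returning bool.
    Brute force is a stand-in for the SAT-based enumeration in the abstract.
    """
    cases = []
    for values in product([False, True], repeat=len(variables)):
        case = dict(zip(variables, values))
        if f(case) != g(case):
            cases.append(case)
    return cases

# Two hypothetical readings of the same provision.
def strict(c):
    return c["consent"] and c["adult"]

def lenient(c):
    return c["consent"] and (c["adult"] or c["guardian_approval"])

if __name__ == "__main__":
    print(disagreements(strict, lenient, ["consent", "adult", "guardian_approval"]))
```

Each returned assignment is an edge case that could then be verbalized into a concrete scenario for expert review.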

2605.25181 2026-05-26 cs.AI

SpecAlign: A Semantic Alignment Framework for SystemVerilog Assertion Generation

SpecAlign: 一种用于 SystemVerilog 断言生成的语义对齐框架

Jaime Rafael Imperial, Hao Zheng

发表机构 * University of South Florida(南佛罗里达大学)

AI总结 提出 SpecAlign 框架,通过基于蕴含的分类和自一致性投票机制,评估并改进 LLM 生成的 SVA 与自然语言规范之间的语义对齐,无需黄金 RTL。

详情
AI中文摘要

现有的大语言模型(LLM)方法在生成 SystemVerilog 断言(SVA)时主要关注语法有效性和形式验证结果,而生成的断言与自然语言规范之间的语义对齐仍然难以量化。因此,在缺乏黄金 RTL 的情况下,幻觉或未对齐的 SVA 会降低信心并增加调试工作。本文提出了 SpecAlign,一个用于语义评估和优化 LLM 生成的 SVA 的框架。SpecAlign 引入了两个迭代对齐循环,通过基于蕴含的分类来评估自然语言属性和 SVA 是否符合设计规范。我们通过链式思维提示生成多个推理路径,并通过自一致性投票机制聚合它们,从而改进对齐决策。对未对齐的断言进行分析以生成可操作的反馈用于优化。我们进一步定义了一个定量对齐分数来衡量迭代过程中的语义一致性。实验结果表明,SpecAlign 能够有效检测语义不一致性,并在不依赖黄金 RTL 的情况下改进断言对齐,为传统形式验证评估指标提供了可扩展的补充。

英文摘要

Existing Large Language Model (LLM) approaches to SystemVerilog Assertion (SVA) generation primarily focus on syntactic validity and formal verification outcomes, while semantic alignment between generated assertions and natural language specifications remains difficult to quantify. As a result, hallucinated or misaligned SVAs can reduce confidence and increase debugging efforts in the absence of golden RTL. This paper presents SpecAlign, a framework for semantic evaluation and refinement of LLM-generated SVAs. SpecAlign introduces two iterative alignment loops that assess both natural language properties and SVAs against the design specification using entailment-based classification. We improve alignment decisions by generating multiple reasoning paths using chain-of-thought prompting and aggregating them via a self-consistency voting mechanism. Misaligned assertions are analyzed to generate actionable feedback for refinement. We further define a quantitative alignment score to measure semantic consistency across iterations. Experimental results demonstrate that SpecAlign effectively detects semantic inconsistencies and improves assertion alignment without relying on golden RTL, providing a scalable complement to traditional formal verification evaluation metrics.

2605.25179 2026-05-26 cs.CL

Locality Matters for Training-Free Audio Token Compression in Audio-Language Models

局部性对音频-语言模型中免训练音频令牌压缩的重要性

Jiale Luo, Xiaoyu Liang, Haoji Hu

发表机构 * Zhejiang University(浙江大学)

AI总结 提出局部时间二分图合并(LTBM)方法,通过显式时间窗口约束合并相似邻近音频令牌,实现免训练的编码器空间压缩,并验证了局部性归纳偏置在音频令牌压缩中的任务依赖性优势。

Comments Preprint. 8 pages main text, 10 pages total

详情
AI中文摘要

音频-语言模型(ALMs)越来越多地用于音频字幕生成、问答和开放式音频理解,但当音频输入表示为长前缀令牌序列时,其推理成本仍然很高。这些音频前缀消耗上下文预算,增加内存使用,并在资源受限或延迟敏感的环境中使部署更加困难。现有的免训练音频令牌缩减方法主要依赖于固定池化或基于分数的剪枝。固定池化是内容无关的,而基于分数的剪枝可以保留孤立的显著令牌但丢弃附近的声学上下文。我们提出局部时间二分图合并(LTBM),一种免训练的编码器空间压缩方法,在显式时间窗口约束下合并相似的邻近音频令牌。除了引入LTBM,我们还使用受控的全局合并变体来隔离时间局部性本身是否是音频令牌压缩的有用归纳偏置。在AudioCaps、Clotho和MMAU上使用Qwen2-Audio进行的实验显示了任务依赖的局部性效应:在几种压缩设置下,尤其是更强压缩下,局部感知合并更有利于字幕生成,而全局匹配在多项选择音频理解中更具竞争力。在Audio Flamingo 3上的跨骨干验证进一步支持了在适度和激进压缩下局部感知合并的字幕生成优势。

英文摘要

Audio-language models (ALMs) are increasingly used for audio captioning, question answering, and open-ended audio understanding, but their inference cost remains high when audio inputs are represented as long prefix-token sequences. These audio prefixes consume context budget, increase memory usage, and make deployment harder in resource-constrained or latency-sensitive settings. Existing training-free audio-token reduction methods mainly rely on fixed pooling or score-based pruning. Fixed pooling is content-agnostic, while score-based pruning can preserve isolated salient tokens but discard nearby acoustic context. We propose Local Temporal Bipartite Merging (LTBM), a training-free encoder-space compression method that merges similar nearby audio tokens under an explicit temporal window constraint. Beyond introducing LTBM, we use a controlled Global Merge variant to isolate whether temporal locality itself is a useful inductive bias for audio-token compression. Experiments on AudioCaps, Clotho, and MMAU with Qwen2-Audio show evidence of a task-dependent locality effect: locality-aware merging is more favorable for captioning at several compression settings, especially under stronger compression, while global matching is more competitive for multiple-choice audio understanding. A cross-backbone validation on Audio Flamingo 3 further supports the captioning-side advantage of locality-aware merging under moderate and aggressive compression.
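A windowed bipartite merge can be sketched as below. The abstract does not give the exact procedure, so this follows a common token-merging recipe as an assumption: tokens at even positions form one set, tokens at odd positions the other, each even-position token is matched to its most similar odd-position token within `window` steps, and the `r` best-matched pairs are averaged. All names and the even/odd split are illustrative.

```python
import numpy as np

def local_bipartite_merge(tokens, r, window):
    """Merge r even-index tokens into a similar odd-index neighbor.

    tokens: (n, d) float array in temporal order, n >= 2, no zero rows.
    window: maximum allowed index distance between merged tokens (>= 1).
    Returns the reduced sequence, still in temporal order.
    """
    n = len(tokens)
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    a_idx, b_idx = np.arange(0, n, 2), np.arange(1, n, 2)
    sim = unit[a_idx] @ unit[b_idx].T                  # cosine similarity
    too_far = np.abs(a_idx[:, None] - b_idx[None, :]) > window
    sim[too_far] = -np.inf                             # temporal locality
    best_b = sim.argmax(axis=1)
    best_sim = sim.max(axis=1)
    merge_a = np.argsort(best_sim)[::-1][:r]           # most similar pairs

    sums = tokens.astype(float).copy()
    counts = np.ones(n)
    keep = np.ones(n, dtype=bool)
    for i in merge_a:
        src, dst = a_idx[i], b_idx[best_b[i]]
        sums[dst] += tokens[src]                       # accumulate for mean
        counts[dst] += 1
        keep[src] = False
    return (sums / counts[:, None])[keep]
```

Setting `window` to the sequence length removes the locality constraint, which roughly corresponds to the Global Merge control described in the abstract.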

2605.25175 2026-05-26 cs.CV

Discrepancy Minimization Improves Cross-Hospital Robustness in Digital Pathology

差异最小化提升数字病理学中的跨医院鲁棒性

Ben Vardi, Dana Schonberger, Yuval Friedmann, Zohar Yakhini, Iris Barshack, Alexander Loebel, Ariel Shamir

发表机构 * Reichman University, Herzliya, Israel(以色列赖希曼大学) Institute of Pathology, Sheba Medical Center, Ramat-Gan, Israel(以色列舍巴医疗中心病理研究所) Technion - Israel Institute of Technology, Haifa, Israel(以色列理工学院)

AI总结 通过局部最大均值差异(LMMD)微调病理基础模型,在域适应和域泛化设置下提升跨医院鲁棒性。

详情
AI中文摘要

病理基础模型(PFMs)近年来快速发展,支持为多种组织病理学任务训练分类器。然而,它们在医院间的鲁棒性仍然有限:当在一个医院的数据上训练分类器并在另一个目标医院评估时,性能通常会下降。我们通过使用局部最大均值差异(LMMD)目标微调PFMs来解决这一挑战,该目标适用于两种设置:域适应(有未标记的目标医院数据可用)和域泛化(目标医院数据完全不可用)。在补丁和切片级别的实验表明,在多个PFMs和任务上均有一致的改进。

英文摘要

Pathology foundation models (PFMs) have advanced rapidly in recent years and support training classifiers for a range of histopathology tasks. However, their robustness across hospitals remains limited: performance often degrades when a classifier is trained on data from one hospital and evaluated on a different target hospital. We address this challenge by fine-tuning PFMs with a local maximum mean discrepancy (LMMD) objective that applies to two settings: domain adaptation, where unlabeled target-hospital data is available, and domain generalization, where no target-hospital data is available. Experiments at both the patch and slide levels show consistent improvements across multiple PFMs and tasks.
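For readers unfamiliar with discrepancy objectives, the sketch below computes a plain (global) squared maximum mean discrepancy between two feature sets with an RBF kernel. It is not the LMMD objective itself: the abstract does not give the formula, and local variants presumably add class-level weighting. The kernel choice, `gamma`, and the function names are assumptions.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Pairwise RBF kernel between rows of x (n, d) and rows of y (m, d)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two feature sets."""
    return (rbf_kernel(source, source, gamma).mean()
            + rbf_kernel(target, target, gamma).mean()
            - 2.0 * rbf_kernel(source, target, gamma).mean())
```

Minimizing such a term during fine-tuning pushes features from different hospitals toward the same distribution, which is the intuition behind the robustness gains reported above.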

2605.25169 2026-05-26 cs.LG stat.ME stat.ML

Learning Treatment Effects during Resource Allocation via Priority-Queue Randomization

资源分配中通过优先级队列随机化学习处理效应

JungHo Lee, Johnna Sundberg, Pim Welle, Bryan Wilder

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Allegheny County Department of Human Services(阿勒格尼县人类服务部)

AI总结 提出优先级队列随机化实验设计框架,在优先服务高需求个体的同时识别因果效应,并优化队列分配以平衡统计效率与优先级。

详情
AI中文摘要

公共服务项目通常在对其效益不确定的情况下分配有限资源,因此需要随机化来支持可信评估。然而在实践中,申请人通常进入等待名单,资源通过分层优先级队列优先分配给被认为需求更高的个体,这使得直接随机化变得困难。受此启发,我们开发了一个实验设计框架,用于在学习处理效应的同时优先治疗最需要帮助的个体,其中新申请人根据其评估的风险评分被随机分配到优先级队列。然后,在预算允许的情况下,按优先级顺序跨队列提供治疗,并在队列内按先到先得原则提供。我们的贡献有两方面。首先,我们描述了在这种优先级队列分配下哪些因果效应被识别。当到达是外生时,处理是条件随机化的,因此标准估计量被识别;当到达是内生时,队列随机化反而为处理提供了工具变量,识别出由排队过程引起的局部处理效应。其次,我们开发了优化的队列分配设计,以在统计效率与优先考虑高需求申请人之间进行权衡。在此过程中,我们表明,尽管设计导致的处理分配存在依赖性,但通常的独立同分布效率界限仍然是合理的设计目标。我们使用美国一个大县的住房分配项目的数据来说明所提出的设计。

英文摘要

Public service programs often allocate limited resources under uncertainty about their benefits, creating a need for randomization to support credible evaluation. In practice, however, applicants commonly enter waitlists where resources are prioritized toward individuals judged to have higher need through tiered priority queues, making direct randomization difficult. Motivated by this, we develop an experimental design framework for learning treatment effects while still treating those most in need: incoming applicants are randomized into priority queues based on their assessed risk scores. Treatments are then provided across queues in priority order, and first-in-first-out within each queue, as budget becomes available. Our contributions are two-fold. First, we characterize which causal effects are identified under this priority-queue allocation. When arrivals are exogenous, treatments are conditionally randomized, and hence standard estimands are identified; when arrivals are endogenous, queue randomization instead provides an instrument for treatment, identifying local treatment effects induced by the queuing process. Second, we develop optimized queue-assignment designs that trade off statistical efficiency against prioritizing higher-need applicants. We show in the process that, despite the dependence in treatment assignments induced by the design, the usual i.i.d. efficiency bounds remain well-justified design objectives. We illustrate the proposed designs using data from a housing allocation program in a large U.S. county.
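The allocation mechanism can be sketched with two queues. This is an illustrative toy, not the paper's design: the two-queue setup, the risk cutoff, and the assignment probabilities are all made-up values, and the function names are hypothetical.

```python
import random

def assign_queue(risk_score, rng, p_high_if_risky=0.8,
                 p_high_otherwise=0.2, cutoff=0.5):
    """Randomize an applicant into the 'high' or 'low' priority queue.

    The probability of the high-priority queue depends on the risk score,
    so higher-need applicants are favored but assignment stays random.
    """
    p_high = p_high_if_risky if risk_score >= cutoff else p_high_otherwise
    return "high" if rng.random() < p_high else "low"

def serve(applicants, budget):
    """Treat up to `budget` applicants: high queue first, FIFO within queue.

    applicants: list of (arrival_time, queue) tuples.
    """
    order = sorted(applicants, key=lambda a: (a[1] != "high", a[0]))
    return order[:budget]
```

Because queue membership is random given the risk score, comparing outcomes across queues at the same score is what makes treatment effects learnable in this setup.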

2605.25166 2026-05-26 cs.LG cs.AI

AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

AME-TS:基于锚定的混合专家模型用于时间序列预测

Rui Wang, Renhao Xue, Ray Razi, Huan Song, Hannah R. Marlowe

发表机构 * Amazon Web Services(亚马逊网络服务)

AI总结 提出AME-TS,一种结构引导的稀疏时间序列基础模型,通过轻量级预测器估计序列级描述符并生成专家软结构先验,实现专家路由与可解释时间结构对齐,在GIFT-Eval基准上实现精度-效率权衡,并在M5微调中展现更稳定的专家专业化。

详情
AI中文摘要

时间序列预测模型通过大型Transformer骨干不断扩展规模,但大多数现有方法通过共享密集计算路径处理所有序列,尽管时间结构存在显著异质性。混合专家模型(MoE)通过条件计算提供了一种自然替代方案,但标准MoE路由导致专家专业化识别弱且在下游适应中常不稳定。我们提出AME-TS,一种结构引导的稀疏时间序列基础模型,将专家路由与可解释的时间结构对齐。AME-TS首先使用轻量级预测器估计序列级描述符,包括可预测性、季节性、趋势和稀疏性,并将其映射为专家上的软结构先验。该序列级先验在训练期间指导令牌级路由,鼓励结构对齐的专业化。在GIFT-Eval基准上,AME-TS在不同模型规模下提供了强大的精度-效率权衡:在小型模型规模上显著优于现有时间序列基础模型,在较大规模上与最强模型保持竞争力,同时通过稀疏路由激活显著更少的参数。我们进一步表明,在M5数据集微调期间,AME-TS学习了比标准MoE更可解释的路由几何和更稳定的专家专业化。这些结果表明,结构感知路由是实现稀疏专家模型在时间序列预测中优势的有效且可靠方式。

英文摘要

Time series forecasting models are increasingly scaled through large Transformer backbones, yet most existing approaches process all series through a shared dense computation path despite substantial heterogeneity in temporal structure. Mixture-of-Experts (MoE) offers a natural alternative by enabling conditional computation, but standard MoE routing leaves expert specialization weakly identified and often unstable during downstream adaptation. We propose AME-TS, a structure-guided sparse time series foundation model that aligns expert routing with interpretable temporal structure. AME-TS first uses a lightweight regime predictor to estimate series-level descriptors, including forecastability, seasonality, trend, and sparsity, and maps them to a soft structural prior over experts. This series-level prior guides token-level routing during training, encouraging structure-aligned specialization. On the GIFT-Eval benchmark, AME-TS delivers a strong accuracy-efficiency tradeoff across model scales: it substantially outperforms existing time series foundation models at small model scales and remains competitive with the strongest models at larger scales, while activating substantially fewer parameters through sparse routing. We further show that AME-TS learns more interpretable routing geometry and substantially more stable expert specialization than standard MoE during fine-tuning on the M5 dataset. These results suggest that structure-aware routing is an effective and reliable way to realize the benefits of sparse expert models for time series forecasting.

2605.25163 2026-05-26 cs.CV cs.AI

K-U-KAN: Koopman-Enhanced U-KAN for 3D Dental Reconstruction from a Single Panoramic X-ray Radiograph

K-U-KAN: 基于Koopman增强的U-KAN用于单张全景X射线片的三维牙齿重建

Bikram Keshari Parida, Abhijit Sen, Wonsang You

发表机构 * Artificial Intelligence & Image Processing Lab., Department of Information & Communication Engineering, Sun Moon University, Asan-Si, South Korea; Department of Physics and Engineering Physics, Tulane University, New Orleans, LA, USA

AI总结 提出K-U-KAN三阶段流水线,结合Kolmogorov-Arnold网络、Koopman算子与U-KAN,从单张全景X射线高效重建三维牙齿结构,提升感知质量并缩短训练时间。

Comments 24 pages, 9 figures

详情
AI中文摘要

全景X射线将三维颌骨压缩为二维条带;我们的目标是干净且快速地恢复缺失的深度。现有的隐式神经表示能渲染逼真的体积,但训练缓慢,对采样和位置编码敏感,且实际成本高。纯CNN基线效率高,但难以处理牙弓的长程几何,模糊了精细的釉质-牙本质边界,且可解释性差。我们提出K-U-KAN,一个三阶段流水线:(i) 使用Kolmogorov-Arnold网络将二维特征提升为深度感知的可观测变量,(ii) 通过Koopman令牌块以稳定的、相位感知的线性演化推进这些可观测变量,(iii) 将预测的深度区间放置在焦槽射线上,然后由轻量级3D注意力U-KAN细化体积。这种物理(Beer-Lambert图像形成)、几何(马蹄形焦槽)和学习线性动力学的结合,在批量大小为1的原生射线强度上产生了清晰的解剖结构、更少的伪影和鲁棒的行为。在保留数据上,K-U-KAN在信号和结构指标上与Transformer/隐式基线相当,显著提高了感知质量,并且训练时间大约减半——使单视图全景X射线到锥形束CT重建在临床流程中更加实用。

英文摘要

A panoramic X-ray compresses a 3D jaw into a 2D strip; we aim to recover the missing depth cleanly and fast. Existing implicit neural representations render realistic volumes but are slow to train, sensitive to sampling and positional encodings, and costly in practice. Pure CNN baselines are efficient yet struggle with the dental arch's long-range geometry, blur fine enamel-dentin boundaries, and offer little interpretability. We present K-U-KAN, a three-stage pipeline that (i) lifts 2D features into depth-aware observables with Kolmogorov-Arnold Networks, (ii) advances these observables by a stable, phase-aware linear evolution via a Koopman token block, and (iii) places the predicted depth bins onto focal-trough rays before a lightweight 3D attention U-KAN refines the volume. This marriage of physics (Beer-Lambert image formation), geometry (horseshoe focal trough), and learned linear dynamics yields sharp anatomy, fewer artifacts, and robust behavior on native radiographic intensities with batch size one. On held-out data, K-U-KAN matches transformer/implicit baselines on signal and structure metrics, clearly improves perceptual quality, and trains in roughly half the time, making single-view PX $\to$ CBCT reconstruction more practical for clinical pipelines.

2605.25162 2026-05-26 cs.CL cs.AI

STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media

STREAM:一个以数据为中心的框架,用于从流媒体中挖掘高价值任务导向对话

Liang Xue, Haoyu Liu, Cheng Wang, Pengyu Chen, Haozhuo Zheng, Yang Liu

发表机构 * Harbin Institute of Technology(哈尔滨工业大学) Byering Technology(伯英技术)

AI总结 提出STREAM框架,利用流媒体数据合成大规模多领域任务导向对话数据集StreamDial,通过角色构建和对话蓝图结合RAG生成高质量对话,解决数据稀缺问题。

详情
AI中文摘要

垂直领域的大语言模型受到复杂、特定领域任务导向对话稀缺的瓶颈。现有的数据获取管道面临持续的三难困境:专家标注昂贵,真实服务对话受隐私和商业限制,静态语料库很快过时。我们提出Stream,一个以数据为中心的框架,利用公开可用的流媒体(直播和短视频)大规模合成高价值服务对话。Stream从嘈杂的流中挖掘真实的交互信号,并通过将基于角色的个性构建与对话蓝图构建相结合来合成对话;它进一步采用检索增强生成(RAG)来支持知识感知的响应。基于Stream,我们发布了StreamDial,一个覆盖汽车、餐厅和酒店的大规模多领域数据集。StreamDial总共包含87,498个对话会话和1,497,320轮次,平均每个会话17.11轮,各领域规模相当。每个会话被组织为结构化四元组⟨P_u, P_a, B, H⟩,将对话历史与明确的用户/代理角色和对话蓝图配对,捕捉真实服务行为,如需求挖掘、约束冲突、协商和恢复。使用自动评估和下游任务的评估表明,StreamDial在强基线上提高了内在对话质量,使用StreamDial训练的模型在多个骨干网络上改进了对话状态跟踪;我们进一步报告了完整的人工评估集,并在受控训练预算下在Qwen3-8B上实现了令人鼓舞的多语言迁移。数据发布在https://github.com/hitxueliang/DialogDataSetBySTREAM。

英文摘要

Large language models for vertical domains are bottlenecked by the scarcity of complex, domain-specific task-oriented dialogues. Existing data acquisition pipelines face a persistent trilemma: expert annotation is expensive, real-world service conversations are constrained by privacy and commercial restrictions, and static corpora quickly become stale. We propose Stream, a data-centric framework that leverages publicly available streaming media (live streams and short videos) to synthesize high-value service dialogues at scale. Stream mines authentic interaction signals from noisy streams and synthesizes conversations by integrating role-grounded persona construction with Conversational Blueprint construction; it further adopts retrieval-augmented generation (RAG) to support knowledge-aware responses. Based on Stream, we release StreamDial, a large-scale multi-domain dataset covering Automotive, Restaurant, and Hotel. StreamDial contains 87,498 dialogue sessions and 1,497,320 turns in total, with an average of 17.11 turns per session and a comparable scale across domains. Each session is organized as a structured quadruplet $\langle P_u, P_a, B, H \rangle$ that pairs dialogue history with explicit user/agent personas and a Conversational Blueprint, capturing realistic service behaviors such as requirement mining, constraint conflicts, negotiation, and recovery. Evaluations with automatic judges and downstream tasks show that StreamDial improves intrinsic dialogue quality over strong baselines, and models trained with StreamDial improve Dialogue State Tracking across backbones; we further report a completed human-evaluation set and encouraging multilingual transfer on Qwen3-8B under a controlled training budget. The data is released at https://github.com/hitxueliang/DialogDataSetBySTREAM.

2605.25156 2026-05-26 cs.LG cs.AI

Abduction-Deduction Entanglement: Domain Generalization via Representation Transplants

溯因-演绎纠缠:通过表示移植实现领域泛化

Kasra Jalaldoust, Elias Bareinboim

发表机构 * Columbia University(哥伦比亚大学)

AI总结 本文提出一种基于表示移植的方法,通过参数化溯因-演绎纠缠中的非可识别性,在源分布约束下搜索目标分布空间,实现领域泛化中的最优目标预测。

详情
AI中文摘要

在源分布下训练的预测模型通常无法很好地泛化到不同的目标分布。对未见数据分布的有效推断必须依赖于生成源数据和目标数据的某些因果机制的不变性,然而这些结构不变性仅从源数据中是无法识别的。在关于数据的温和因果假设下,我们表明目标中的最优预测实际上部分可由源分布识别。该结果基于一个简单的观察:在任何领域中,最优预测可以分解为我们称之为溯因映射和演绎映射的一对映射,其中溯因映射从观测变量推断某些未观测变量(可能是混杂因素),演绎映射使用观测和推断的量来预测标签。大量源数据的使用固定了最优预测,从而约束了产生它的有效溯因-演绎组合——这种非可识别性我们称之为溯因-演绎纠缠。为了利用这一点,我们使用所谓的表示移植来参数化受约束的族,表示移植是表示空间中的一种特定线性变换,它在保留演绎成分的同时操纵表示的溯因内容。生成标签的因果机制的不变性意味着源和目标之间存在不变的演绎映射。因此,我们可以通过参数化移植来搜索合理的目标分布空间。我们在一个学习器-对手博弈中使用该方案,在理想优化下,该博弈可证明终止于学习器具有极小极大最优目标预测。评估验证了理论,表明该方法在领域泛化基准测试中具有竞争力。

英文摘要

Prediction models trained under the source distribution often do not generalize well to a different target distribution. A valid inference about an unseen data distribution must be anchored by the invariance of certain causal mechanisms that generate the source and target data; however, these structural invariances are non-identifiable from the source data alone. Under mild causal assumptions about the data, we show that the optimal prediction in the target is in fact partially identifiable from the source distribution. The result rests on a simple observation: in any domain, the optimal prediction can be factorized into a pair of maps that we call the abduction and deduction maps, where the abduction map infers some unobserved variables (possibly confounders) from the observed variables, and the deduction map predicts the label using both the observed and the inferred quantities. Access to large source data pins down the optimal prediction and thus constrains the valid abduction-deduction ensembles that produce it -- a non-identifiability that we call the abduction-deduction entanglement. To leverage this, we parameterize the constrained family using what we call a representation transplant, which is a specific linear transformation in the representation space that manipulates the abduction content of the representation while retaining the deduction component. Invariance of the causal mechanism generating the label implies the existence of an invariant deduction map between source and target. Thus, we can search the space of plausible target distributions via a parametric transplant. We use this scheme in a learner-adversary game that, under idealized optimization, provably terminates with the learner having the minimax-optimal target prediction. Evaluations verify the theory, showing that the method is competitive on domain generalization (DG) benchmarks.

2605.25151 2026-05-26 cs.AI cs.CE

Representation Without Control: Testing the Realization Effect in Language Models

无控制的表征:测试语言模型中的实现效应

Ciarán Walsh, Emilio Barkett

发表机构 * Columbia University(哥伦比亚大学)

AI总结 通过提示行为、线性读出和因果控制三个层面,测试语言模型是否表现出类似人类的实现效应,发现潜在读出成功但因果控制无效,表明三者不自动共存。

详情
AI中文摘要

大型语言模型越来越多地被用作行为模拟器,但其输出何时反映类似人类的认知机制而非提示敏感的表面模式仍不清楚。我们通过实现效应研究这一问题,这是行为经济学中一个特征明确的发现,即风险承担在纸面收益与实现收益及损失后存在系统性差异。我们在三个层面评估LLM行为:仅提示的行为敏感性、内部表征的线性读出以及通过激活引导的因果控制。仅提示结果显示系统的条件敏感性,但方向模式未复现人类实现效应的预测。Gemma的残差流在第18层包含一个线性可解码的实现状态信号,该信号可泛化到未见过的提示。然而,沿此方向引导并未可靠地改变下游风险选择,这一零结果在正尺度和负符号对称运行中均成立。行为敏感性、潜在读出和因果控制是三个不同的属性,它们不会自动共存,成功的潜在读出不足以证明模型在下游决策中行为上依赖于该表征。

英文摘要

Large language models are increasingly used as behavioral simulators, but it remains unclear when their outputs reflect human-like cognitive mechanisms rather than prompt-sensitive surface patterns. We study this question through the realization effect, a well-characterized finding in behavioral economics in which risk-taking differs systematically after paper versus realized gains and losses. We evaluate LLM behavior at three levels: prompt-only behavioral sensitivity, linear readout of internal representations, and causal control via activation steering. Prompt-only results show systematic condition sensitivity, but the directional pattern does not reproduce human realization-effect predictions. Gemma's residual stream contains a linearly decodable realization-status signal at layer 18 that generalizes to held-out prompts. Steering along this direction does not, however, reliably shift downstream risk choices, a null result that holds across positive scales and in a negative sign-symmetry run. Behavioral sensitivity, latent readout, and causal control are three distinct properties that do not automatically co-occur, and successful latent readout is insufficient evidence that a model behaviorally relies on a representation during downstream decision-making.

2605.25141 2026-05-26 cs.CL cs.AI

LLM-Agent-Based Renewable Energy Forecasting Using Edge and IoT Data: A Review of Solar, Wind, Weather, and Grid-Aware Decision Support

基于LLM Agent的利用边缘和物联网数据的可再生能源预测:太阳能、风能、天气和电网感知决策支持综述

Pavan Manjunath, Thomas Pruefer

发表机构 * Independent Researcher(独立研究员)

AI总结 本文综述了如何利用大语言模型代理整合异构传感器流、天气API数据、历史发电记录和电网约束,形成统一的决策支持工作流,以增强可再生能源预测。

详情
AI中文摘要

可再生能源发电的可靠预测是电网稳定性、能源交易、电池调度和碳感知运营规划的基础要求。太阳能和风能资源本质上是间歇性的,其输出随云量、风速、大气湍流、季节模式和局部地形而波动。物联网和边缘设备的普及,包括智能电表、逆变器、风速计、日射强度计、气象站和电网接口传感器,创造了前所未有的实时运行数据量,而传统的预测流程难以充分利用这些数据。本综述研究了大语言模型代理如何通过将异构传感器流、天气API数据、历史发电记录、电网约束和上下文推理整合到统一的决策支持工作流中,来增强可再生能源预测。我们调查了经典预测方法(统计时间序列模型、深度学习架构、物理混合方法)以及新兴的用于解释、不确定性沟通和操作员指导的LLM代理框架。提出了一个六层分类法,涵盖数据采集、预处理、特征工程、模型推理、不确定性估计和自然语言报告。综述识别了十二个开放挑战,包括实时部署、分布偏移下的模型漂移、不确定性量化、LLM代理中的幻觉控制、边缘硬件的互操作性以及与能源管理系统的集成。论文最后建议了一个研究议程,重点关注开放基准、物理信息LLM基础以及联邦预测架构。

英文摘要

Reliable forecasting of renewable energy generation is a foundational requirement for grid stability, energy trading, battery scheduling, and carbon-aware operational planning. Solar and wind resources are inherently intermittent: their output fluctuates with cloud cover, wind speed, atmospheric turbulence, seasonal patterns, and local terrain. The proliferation of IoT and edge devices, spanning smart meters, inverters, anemometers, pyranometers, weather stations, and grid-interface sensors, has created an unprecedented volume of real-time operational data that conventional forecasting pipelines are ill-equipped to exploit fully. This review investigates how large language model (LLM) agents can enhance renewable energy forecasting by integrating heterogeneous sensor streams, weather API data, historical generation records, grid constraints, and contextual reasoning into unified decision-support workflows. We survey classical forecasting methods (statistical time-series models, deep learning architectures, physics-hybrid approaches) and emerging LLM agent frameworks for explanation, uncertainty communication, and operator guidance. A six-layer taxonomy is proposed, covering data acquisition, preprocessing, feature engineering, model inference, uncertainty estimation, and natural-language reporting. The review identifies twelve open challenges, spanning real-time deployment, model drift under distribution shift, uncertainty quantification, hallucination control in LLM agents, interoperability of edge hardware, and integration with energy management systems. The paper concludes by recommending a research agenda centred on open benchmarks, physics-informed LLM grounding, and federated forecasting architectures.

2605.25135 2026-05-26 cs.LG cs.AI

ASTRO: Adaptive Spatio-Temporal Reinforcement Optimization for GNN Powered Anomaly Detection in Cyber Physical Systems

ASTRO: 用于信息物理系统中基于GNN的异常检测的自适应时空强化优化

Rai Ali Yar, Umaisa Lail, Anwar Shah

Affiliations * Department of Computer Science, FAST NUCES; Department of Information Technology, Riphah International University

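AI summary Proposes the ASTRO framework, which combines a deep Q-network with graph neural networks, temporal modeling, and multi-head attention, and uses reinforcement learning to optimize the detection threshold dynamically. It achieves high F1 scores on the SWaT and WADI datasets and outperforms existing methods.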

Details
AI Chinese abstract

工业物联网环境中的异常检测对于保护工业控制系统和信息物理系统免受运行时虚假数据注入和其他恶意攻击至关重要。传感器网络和互连控制回路日益复杂,使得识别隐藏在高维和时间依赖信号中的异常行为变得困难。为解决这些挑战,本文介绍了自适应时空强化优化ASTRO,一种新颖的异常检测框架,开创性地使用强化学习进行动态阈值优化。通过将深度Q网络与图神经网络、时间建模和多头注意力机制相结合,ASTRO不断调整其决策边界以提高检测精度。GNN组件建模传感器之间的空间关系,时间模型捕获时间序列依赖性,注意力层突出显示最具信息量的时间步。模型生成连续异常分数,通过自适应阈值转换为二元决策,该阈值通过深度Q网络优化。ASTRO方法在两个真实工业基准测试:安全水处理和水分配数据集上进行了评估。所提模型在SWaT上取得了卓越性能,F1分数为0.990。此外,在高度复杂的127个终端设备的WADI数据集上,它获得了0.788的F1分数,比最先进的基线高出近14%。多次运行的结果证实了其一致的泛化能力和稳定性。这些实验表明,ASTRO框架是增强大规模信息物理基础设施的高度实用和可扩展的方法。

English abstract

Anomaly detection in Industrial Internet of Things (IIoT) environments is essential to protect Industrial Control Systems (ICS) and Cyber-Physical Systems (CPS) from run-time false data injection and other malicious attacks. The increasing complexity of sensor networks and interconnected control loops makes it difficult to identify anomalous behavior hidden within high-dimensional, time-dependent signals. To address these challenges, this article introduces Adaptive Spatio-Temporal Reinforcement Optimization (ASTRO), a novel anomaly detection framework that pioneers the use of reinforcement learning for dynamic threshold optimization. By integrating a Deep Q-Network (DQN) with Graph Neural Networks (GNNs), temporal modelling, and a multi-head attention mechanism, ASTRO continuously adapts its decision boundaries to improve detection accuracy. The GNN component models the spatial relations among sensors, the temporal model captures time-series dependencies, and the attention layer highlights the most informative time steps. The model generates continuous anomaly scores, which are transformed into binary decisions using an adaptive threshold optimized by the DQN. ASTRO is evaluated on two real-world industrial benchmarks: the Secure Water Treatment (SWaT) and Water Distribution (WADI) datasets. The proposed model achieves exceptional performance on SWaT, with an F1 score of 0.990. Moreover, on the highly complex WADI dataset, which has 127 end devices, it achieves an F1 score of 0.788, outperforming state-of-the-art baselines by nearly 14%. Results across multiple runs confirm consistent generalization and stability. These experiments demonstrate that the ASTRO framework is a highly practical and scalable method for strengthening large-scale cyber-physical infrastructure.
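
The thresholding step described in the abstract above (continuous anomaly scores turned into binary decisions, then judged by F1) can be illustrated with a minimal sketch. This is not the authors' method: the abstract says the threshold is optimized by a DQN, whereas the sketch below swaps in a plain grid search over candidate thresholds purely to show how the choice of threshold changes the F1 score. The function names (`binarize`, `f1_score`, `best_threshold`) and the toy scores and labels are hypothetical and do not come from SWaT or WADI.

```python
from typing import List, Sequence, Tuple


def binarize(scores: Sequence[float], threshold: float) -> List[int]:
    """Flag a time step as anomalous (1) when its score exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Grid-search stand-in for a learned threshold policy (assumption, not a DQN).

    Returns the first candidate that reaches the highest F1, with that F1.
    """
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_score(labels, binarize(scores, t))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data invented for illustration only.
scores = [0.05, 0.10, 0.20, 0.85, 0.90, 0.15, 0.60, 0.08]
labels = [0, 0, 0, 1, 1, 0, 1, 0]
candidates = [0.1, 0.3, 0.5, 0.7]

threshold, f1 = best_threshold(scores, labels, candidates)
print(f"chosen threshold={threshold}, F1={f1:.3f}")
```

On this toy data a threshold that is too low produces false positives and one that is too high misses a true anomaly, which is the trade-off an adaptive threshold is meant to navigate. A fixed grid search needs labels up front; a learned policy such as the one the abstract describes would instead adjust the threshold over time.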