arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.05748 2026-05-08 cs.AI

Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI

Vanessa Buhrmester, David Muench, Dimitri Bulatov, Michael Arens

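Affiliation: Fraunhofer Institute of Optronics, System Technologies

AI summary: This paper evaluates the limitations of explainability methods in safety-critical ATR systems, identifies the shortcomings of post-hoc methods, and discusses directions toward more robust XAI, stressing the need to move beyond visually plausible explanations to support reliable decision making.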

Comments 15 pages, 1 image 1 table, ICPR workshop

Abstract

Explainable Artificial Intelligence (XAI) is increasingly recognized as essential for deploying machine learning systems in safety-critical environments. In Automatic Target Recognition (ATR), where models operate on image, video, radar, and multisensor data, high predictive performance alone is insufficient. Model decisions must also be interpretable, reliable, and suitable for validation. This paper presents a structured evaluation of explainability methods in the context of safety-critical ATR systems: we identify major XAI paradigms, including saliency-based, attention-based, and surrogate approaches, as well as recent detection-aware extensions. Based on this, we formalize explainability as an assurance-oriented assessment problem, introduce a taxonomy, and assess these methods with respect to four key dimensions: interpretability, robustness, vulnerability to manipulation, and suitability for validation and verification. The analysis identifies systematic limitations of current post-hoc explanation methods. In particular, we derive critical failure modes such as spurious explanations, instability under perturbations, and overtrust induced by visually convincing outputs. These findings indicate that widely used XAI techniques may be insufficient for safety-critical deployment. Finally, we discuss implications for ATR systems and outline directions toward more robust, causally grounded, and physically informed explainability methods. Our results emphasize the need to move beyond visually plausible explanations toward approaches that support reliable decision making and system-level assurance.

2605.05745 2026-05-08 cs.AI

Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback

Qirun Zeng, Xuchuang Wang, Jiayi Shen, Xutong Liu, Fang Kong, Jinhang Zuo

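Affiliations: Department of Computer Science; Manning College of Information & Computer Science; School of Management; Computer Science and Systems; School of Computer Science and Technology

AI summary: This paper studies fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model. It proposes a likelihood-ratio-based confidence sequence and a hybrid Track-and-Stop algorithm, obtains an ellipsoidal confidence set under a self-concordance assumption, and shows improved sample efficiency in experiments.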

Abstract

We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $δ$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods.
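
To make the two query types in the abstract above concrete, the following minimal sketch simulates absolute and dueling feedback under a logistic link. The parameter vector `theta`, the arm features, and all function names are illustrative assumptions, not taken from the paper; the confidence sequence and the Track-and-Stop allocation are not implemented here.

```python
import math
import random

def sigmoid(z):
    """Logistic link, one common choice for a generalized linear model."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def absolute_feedback(theta, x, rng):
    """Binary reward from a single arm x with mean sigmoid(<x, theta>)."""
    return 1 if rng.random() < sigmoid(dot(x, theta)) else 0

def dueling_feedback(theta, x_i, x_j, rng):
    """Returns 1 if arm i beats arm j, with probability sigmoid(<x_i - x_j, theta>)."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    return 1 if rng.random() < sigmoid(dot(diff, theta)) else 0

# Hypothetical problem instance (assumed values, for illustration only).
theta = [1.0, -0.5]
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rng = random.Random(0)

# The best arm maximizes the linear score <x, theta>.
best_arm = max(range(len(arms)), key=lambda k: dot(arms[k], theta))

# Repeated duels between arm 0 and arm 1 favor arm 0, since sigmoid(1.5) > 0.5.
wins = sum(dueling_feedback(theta, arms[0], arms[1], rng) for _ in range(2000))
```

Both observation types depend on the same unknown `theta`, which is why a single confidence set over `theta` can pool them.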

2605.05742 2026-05-08 cs.LG

Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)

Scott Geng, Dutch Hansen, Jerry Li

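Affiliation: University of Washington

AI summary: The study shows that, in linear logistic regression under mild distributional assumptions on the data, weak-to-strong generalization is almost inevitable, challenging the prevailing theoretical view that a mismatch in model capacity is the central mechanism.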

Abstract

Weak-to-strong generalization is a phenomenon in post-training whereby a strong student model, when finetuned solely with feedback from a weaker teacher, can not only surpass the teacher, but can improve upon its own capabilities. Recent work by Burns et al. (2023) demonstrated that this can occur in the setting of frontier language models, and subsequently there has been a flurry of both empirical work trying to exploit this phenomenon and theoretical work attempting to understand it. In this work, we demonstrate that weak-to-strong generalization occurs in standard linear logistic regression, under mild distributional assumptions on the data. In fact, we show that this happens for most student-teacher pairs, suggesting that weak-to-strong generalization is almost inevitable, even in this basic setting. Notably, our setting does not require the student to be more expressive or have more model capacity in any way compared to the teacher, which runs contrary to the prevailing theoretical belief that a mismatch in model capacity is a central mechanism of weak-to-strong generalization.

2605.05741 2026-05-08 cs.AI

HyperLens: Quantifying Cognitive Effort in LLMs with Fine-grained Confidence Trajectory

Chengda Lu, Xiaoyu Fan, Wei Xu

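Affiliations: IIIS, Tsinghua University; Shanghai Qi Zhi Institute

AI summary: HyperLens analyzes confidence trajectories during LLM inference to quantify cognitive effort, shows that complex tasks require higher cognitive effort, and diagnoses the effect of supervised fine-tuning on in-domain task performance.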

Comments 33 pages

Abstract

While Large Language Models (LLMs) achieve strong performance across diverse tasks, their inference dynamics remain poorly understood because of the limited resolution of existing analysis tools. In this work, we identify an intrinsic magnification mechanism in transformer architectures: deeper layers inherently magnify the small changes of layer-wise confidence, providing a fine-grained confidence trajectory. Building on this insight, we introduce HyperLens, a high-resolution probe designed to trace confidence trajectories and quantify the cognitive effort during inference. Across LLMs and datasets, HyperLens reveals a consistent divergence in confidence trajectories that separates complex from simple tasks. We abstract this pattern into a quantitative cognitive effort metric. Our analysis reveals a fundamental principle: complex tasks consistently require higher cognitive effort. Finally, we provide a mechanistic diagnosis of a common side effect of standard Supervised Fine-Tuning (SFT): it can reduce cognitive effort and consequently degrade performance on in-domain tasks.

2605.05738 2026-05-08 cs.LG cs.AI

CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction

Mei Wu, Wenchao Weng, Wenxin Su, Wenjie Tang, Wei Zhou

Affiliations: College of Computer Science, Shanghai Jiao Tong University; College of Computer Science and Technology, Zhejiang University of Technology; School of Automation, Nanjing University of Science and Technology

AI summary: CoMemNet is a dual-branch continual learning framework that uses dynamic contrastive sampling and a lightweight memory buffer to mitigate catastrophic forgetting in continual traffic prediction; experiments show SOTA performance on three real-world datasets.

Comments 12 pages, 6 figures

Abstract

In recent years, the integration of non-topological space modeling with temporal learning methods has emerged as an effective approach for capturing spatio-temporal information in non-Euclidean graphs. However, most existing methods rely on static underlying graph structures, which are inadequate for capturing the continuously expanding and evolving patterns in streaming traffic networks. To address this challenge, we propose a simple yet efficient dual-branch continual learning framework for traffic prediction, named CoMemNet. The fast-converging Online branch undertakes the primary prediction tasks, while the momentum-updated Target branch extracts historical information using Wasserstein Distance features to create a Dynamic Contrastive Sampler (DC Sampler). This sampler selects a node set with significant dynamic network feature changes for training, effectively mitigating the issue of catastrophic forgetting. Additionally, the backbone incorporates a lightweight Node-Adaptive Temporal Memory Buffer (TMRB-N) to consolidate old knowledge through memory replay and address the risk of memory explosion. Finally, we provide two newly curated open-source datasets. Experimental results demonstrate that CoMemNet achieves state-of-the-art (SOTA) performance across all three large-scale real-world datasets. The code is available at: https://github.com/meiwu5/CoMemNet.
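
The abstract above mentions a momentum-updated Target branch and node selection based on Wasserstein distance. A minimal sketch of those two generic building blocks, assuming scalar per-node feature samples of equal size, might look as follows. The function names, the momentum value, and the top-k selection rule are hypothetical and are not taken from the CoMemNet code.

```python
def ema_update(target, online, momentum=0.99):
    """Momentum (exponential moving average) update of target parameters."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size one-dimensional samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def select_changed_nodes(old, new, k):
    """Pick the k nodes whose feature distributions shifted the most."""
    scores = {node: wasserstein_1d(old[node], new[node]) for node in old}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical per-node feature samples from two time periods.
old = {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.0], "c": [0.0, 0.0, 0.0]}
new = {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0], "c": [5.0, 5.0, 5.0]}

changed = select_changed_nodes(old, new, k=2)
```

Training only on the nodes with the largest shift keeps the update focused on what changed, which is the intuition behind sampling by distribution distance.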

2605.05737 2026-05-08 cs.AI cs.CL

ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning

Fan Huang

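Affiliation: Indiana University Bloomington

AI summary: ReFlect builds deterministic wrapper logic that detects and recovers from errors in LLM reasoning, raising success rates on multi-stage tasks and, on SWE-bench, substantially improving patch-structural quality.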

Abstract

Current reasoning paradigms for LLMs include chain-of-thought, ReAct, and post-hoc self-critique. These paradigms rely on two assumptions that fail on long-horizon, multi-stage tasks. As a result, errors accumulate silently across reasoning steps, leaving an open question: can a reasoning system effectively detect and recover from its own failures? We present ReFlect, a harness system for LLM reasoning that creates standalone error detection and recovery logic as a deterministic wrapper around the model. Controlled experiments across 6 reasoning domains show that prompt-level self-critique produces formulaic templates that flag no issues in 90 of 100 audited reflection blocks, and the investigated LLMs wrongly accept a wrong answer in at least 76% of cases. Our ReFlect harness achieves task success rates ranging from 41% on gpt-4o-mini to 56% on Claude Sonnet 4.5 across six models spanning small and frontier scale, with per-model gains over Direct CoT ranging from +7 pp on Qwen2.5-72B to +29 pp on Claude Sonnet 4.5, and additionally raises SWE-bench patch-structural quality from 0% (Direct CoT) to between 82% (Qwen2.5-72B) and 87% (GPT-4o). Notably, the harness gain is inversely related to the model's Direct CoT task success rate (the fitted slope is -1.69 with r=-0.76): each pp lost in baseline success rate is mechanically recovered by 1.69 pp of harness gain. We also find that adding structured reasoning state and operators yields only 15.0-18.7% pair-mean on Llama-3.3-70B and Qwen2.5-72B, because models at this scale cannot reliably populate the state that the operators require. ReFlect is model-agnostic, training-free, and operates entirely at inference time.
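
The idea of a deterministic wrapper that detects errors outside the model can be sketched generically as below. This is not the ReFlect implementation: `run_with_harness`, `is_integer`, and `stub_model` are hypothetical names, and the stub stands in for a real LLM call.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[str], Optional[str]]  # returns an error message, or None if OK

def run_with_harness(step: Callable[[List[str]], str],
                     checks: List[Check],
                     max_attempts: int = 3) -> Tuple[Optional[str], List[str]]:
    """Run a model step, validate it with deterministic checks, retry with feedback."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        output = step(feedback)
        errors = []
        for check in checks:
            message = check(output)
            if message is not None:
                errors.append(message)
        if not errors:
            return output, feedback
        feedback.extend(errors)  # recovery: pass the detected errors back to the model
    return None, feedback  # give up explicitly instead of accepting a bad answer

def is_integer(output: str) -> Optional[str]:
    """Deterministic check: the answer must parse as an integer."""
    text = output.strip().lstrip("-")
    return None if text.isdigit() else "answer must be an integer"

def stub_model(feedback: List[str]) -> str:
    """Stand-in for an LLM call: fails first, succeeds once it sees feedback."""
    return "42" if feedback else "forty-two"

answer, log = run_with_harness(stub_model, [is_integer])
```

The checks run in ordinary code rather than in a prompt, so whether an error is flagged does not depend on the model critiquing itself.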

2605.05731 2026-05-08 cs.AI

Knee Osteoarthritis Severity Grading Using Optimized Deep Learning and LLM-Driven Intelligent AI on Computationally Limited Systems

Dayam Nadeem, Neha, Safdar Mustafa, Adnan Alvi, Mohd Hussain

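Affiliation: Jamia Hamdard

AI summary: This paper proposes an automated diagnostic approach for grading knee osteoarthritis severity that combines a deep learning CNN with the TensorFlow Lite platform. Using transfer learning and model optimization it reaches 94.48% test accuracy, and it uses Gemini-2.0-flash to generate interpretive findings, improving the accessibility of AI-assisted screening.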

Comments 6 pages, 11 figures, Accepted and presented at the 2nd International Conference on Emerging Computational Intelligence (ICECI 2026), IEEE. Published in conference proceedings. To appear in IEEE Xplore

Abstract

Knee osteoarthritis (KOA) is a musculoskeletal disorder that considerably restricts joint mobility, causes severe chronic pain, and reduces quality of life. It is one of the persistent health issues worldwide. Conventional assessment practices are undermined by subjectivity and inter-observer variability, so precise and timely diagnosis is needed to assess severity effectively. This paper proposes an automated diagnostic approach for KOA severity grading that combines a deep learning convolutional neural network (CNN) with an on-device inference platform powered by TensorFlow Lite. The model is based on the ResNet-18 convolutional neural network and is trained on a publicly available database. Using transfer learning, knee images are classified into five Kellgren-Lawrence (KL) grades. The developed model is then optimised. It achieves a test accuracy of 94.48% with stable convergence during training. The optimised model is subsequently converted into the lightweight TensorFlow Lite format, which facilitates deployment on resource-constrained devices and allows operation in environments without continuous internet connectivity. In addition, an auxiliary Large Language Model (Gemini-2.0-flash) generates structured interpretive findings such as potential symptoms, risk factors, and preventive measures. The LLM component functions as an interface and does not influence the classification process. The proposed approach demonstrates the feasibility of an on-device, interpretable decision-support tool for early diagnosis and improves the accessibility of Artificial Intelligence (AI)-assisted knee screening.

2605.05728 2026-05-08 cs.LG cs.AI cs.SY eess.SY math.OC

WARP: A Benchmark for Primal-Dual Warm-Starting of Interior-Point Solvers

Dhruv Suri, Helgi Hilmarsson, Shourya Bose

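Affiliation: Pravah

AI summary: This paper presents the WARP benchmark for evaluating primal-dual warm-starting of interior-point methods. Against a corrected baseline, primal-only warm starts fail to reduce iteration counts, whereas supplying the full primal-dual-barrier state reduces them substantially.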

Abstract

Solving AC Optimal Power Flow (AC-OPF) is of central importance in electricity market operations, where interior-point methods (IPMs) such as IPOPT are the standard solvers. A growing body of work uses machine learning to predict primal warm-start iterates, reporting iteration reductions of 30-46%. We show that these reported gains rest on an inappropriate evaluation baseline: prior methods benchmark against the flat start $V_m = 1, V_a = 0$, whereas the solver's actual default, the variable-bound midpoint $(l+u)/2$, is near-optimal for log-barrier centrality. Against this corrected baseline, no primal-only warm-start method reduces solver iterations. We trace the failure to a geometric property of interior-point methods: primal prediction accuracy is anticorrelated with convergence speed, and providing the ground-truth optimal solution $x^*$ without dual variables causes the solver to diverge. Oracle experiments establish that the complete primal-dual-barrier state $(x^*, λ^*, z^*, μ^*)$ reduces IPOPT iterations from 23 to 3, an 85% reduction that is structurally inaccessible to primal-only methods. To enable rigorous evaluation of warm-start methods on this task, we release a benchmark suite comprising dual-labeled AC-OPF datasets with IPOPT-extracted solutions, a corrected evaluation protocol, and WARP, a topology-conditioned encode-process-decode interaction network that predicts the full interior-point state $(\hat{x}, \hat{λ}, \hat{z}, \hat{μ})$ on the heterogeneous constraint graph. WARP achieves a 76% reduction in IPOPT iterations while natively accommodating N-1 contingency topology variations without retraining.
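
The remark that the bound midpoint $(l+u)/2$ is well centered for the log-barrier can be checked on a toy example. The sketch below considers bound constraints only, with hypothetical bounds chosen so that the flat start differs from the midpoint; it does not call IPOPT or model any power-flow equations.

```python
import math

def midpoint_start(lower, upper):
    """Variable-bound midpoint (l + u) / 2 for each variable."""
    return [(l + u) / 2.0 for l, u in zip(lower, upper)]

def box_log_barrier(x, lower, upper):
    """Log-barrier for l < x < u; smaller values mean x is better centered."""
    return -sum(math.log(xi - l) + math.log(u - xi)
                for xi, l, u in zip(x, lower, upper))

# Hypothetical bounds for a voltage magnitude and a voltage angle (assumed values).
lower = [0.90, -0.5]
upper = [1.06, 1.0]

flat = [1.0, 0.0]                    # flat start: V_m = 1, V_a = 0
mid = midpoint_start(lower, upper)   # approximately [0.98, 0.25]

barrier_flat = box_log_barrier(flat, lower, upper)
barrier_mid = box_log_barrier(mid, lower, upper)
```

For box constraints alone the midpoint is the exact minimizer of this barrier, so `barrier_mid` is below `barrier_flat` whenever the flat start is off-center. With equality constraints and an objective added, the abstract's claim is the weaker "near-optimal".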

2605.05726 2026-05-08 cs.AI

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

Hongcheol Cho, Ryangkyung Kang, Youngeun Kim

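Affiliation: ThakiCloud

AI summary: This paper introduces SkillRet, a benchmark for evaluating skill retrieval in LLM agents. It contains 17,810 public skills, 63,259 training samples, and 4,997 evaluation queries; task-specific fine-tuning substantially improves retrieval performance.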

Abstract

As LLM agents are increasingly deployed with large libraries of reusable skills, selecting the right skill for a user request has become a critical systems challenge. In small libraries, users may invoke skills explicitly by name, but this assumption breaks down as skill ecosystems grow under tight context and latency budgets. Despite its practical importance, skill retrieval remains underexplored, with limited benchmarks and little understanding of retrieval behavior on realistic skill libraries. To address this gap, we introduce SkillRet, a large-scale benchmark for skill retrieval in LLM agents. SkillRet contains 17,810 public agent skills, organized with structured semantic tags and a two-level taxonomy spanning 6 major categories and 18 sub-categories. It provides 63,259 training samples and 4,997 evaluation queries with disjoint skill pools, enabling both benchmarking and retrieval-oriented training. Across a diverse set of retrievers, we find that skill retrieval remains far from solved: off-the-shelf models struggle on realistic large-scale skill libraries, and prior skill-retrieval models still leave substantial headroom. Task-specific fine-tuning on SkillRet substantially improves performance, improving NDCG@10 by +13.1 points over the strongest prior retriever and by +16.9 points over the strongest off-the-shelf retriever. Our analysis further suggests that these gains arise because fine-tuned models better focus on the small skill-relevant signals within long and noisy queries. These results establish SkillRet as a strong benchmark and foundation for future research on retrieval in large-scale agent systems.
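
The abstract above reports results in NDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of one standard formulation (linear gain, log2 discount). The skill identifiers are made up, and the benchmark's own evaluation code may differ in details such as the gain function.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (rank 0 is the first hit)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_ids, relevant, k=10):
    """ranked_ids: retrieved skill ids in order; relevant: id -> graded relevance."""
    gains = [relevant.get(skill_id, 0) for skill_id in ranked_ids]
    ideal = sorted(relevant.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Hypothetical query with a single relevant skill, retrieved at rank 2.
score = ndcg_at_k(["skill_a", "skill_b", "skill_c"], {"skill_b": 1})
```

With one relevant skill at the second position the score is 1 / log2(3), about 0.63. Retrieving it first gives 1.0.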

2605.05725 2026-05-08 cs.AI

Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers

Hyeongwon Kang, Jeongseob Kim, Jinwoo Park, Pilsung Kang

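Affiliations: Department of Industrial and Management Engineering; Korea University; Department of Industrial Engineering; Seoul National University

AI summary: This paper proposes the SAGE framework, which detects anomalies in univariate time series with four specialized Analyzers that generate evidence from numerical tools and visualizations, improving the reliability and practical usefulness of anomaly detection.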

Comments Preprint. 9 pages main text, 29 pages total, 8 figures, 9 tables, with appendix

Abstract

Recent studies have explored large language models for time-series anomaly detection, yet existing approaches often rely on a single general-purpose model to directly infer anomaly indices or intervals, limiting controllability, interpretability, and reliability for complex anomaly patterns. We propose SAGE (Specialized Analyzer Group for Expert-like Detection), a multi-agent framework for structured anomaly diagnosis in univariate time series. It decomposes anomaly analysis into four specialized Analyzers for point, structural, seasonal, and pattern anomalies. Each Analyzer applies family-specific numerical tools and diagnostic visualizations to generate evidence, while an evidence-grounded Detector consolidates the evidence into confidence-scored anomaly records with intervals and candidate types. A Supervisor then converts these structured records into analyst-facing diagnostic reports. SAGE further constructs synthetic in-context examples from normal-reference training segments, without using real anomalous segments or anomaly-type labels as in-context examples. Across three benchmarks, SAGE achieves the best average performance among strong ML/DL and language-model-based baselines. Ablation studies and human evaluation further show that the proposed framework improves detection reliability and the practical usefulness of diagnostic outputs.

2605.05722 2026-05-08 cs.CV

$\mathcal{B}^{3}$-Net: Controlled Posterior Bridge Learning for Multi-Task Dense Prediction

Meihua Zhou, Li Yang

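Affiliations: Wannan Medical University; University of Chinese Academy of Sciences

AI summary: This paper proposes $\mathcal{B}^{3}$-Net, which uses reliability estimation, posterior bridge construction, and bounded redistribution to model task evidence explicitly in multi-task dense prediction and make the shared representation more reliable.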

Comments 14 pages, 10 figures

Abstract

Multi-task dense prediction solves complementary pixel-level tasks in a unified model, such as semantic segmentation, depth estimation, surface normal estimation, and edge detection. Existing decoder-side interactions use attention, prompts, routing, diffusion, Mamba, or bridge features to exchange task evidence, but most of them organize this evidence implicitly. They usually fuse task features by similarity or affinity, without explicitly modeling that evidence reliability varies across tasks and spatial locations. As a result, unreliable evidence may contaminate the shared representation and intensify negative transfer. We propose $\mathcal{B}^{3}$-Net, a controlled posterior bridge learning framework for multi-task dense prediction. Our method decomposes decoder-side interaction into reliability estimation, posterior bridge construction, and bounded redistribution. The Precision Field Estimator estimates patch-wise evidence precision from task-reference alignment and local variation. The Posterior Bridge Operator builds a precision-weighted posterior bridge through heteroscedastic evidence fusion, yielding a shared state more reliable than uniform or heuristic mixtures. The Contractive Dispatch Operator redistributes the bridge to each task branch through a bounded update, reducing uncontrolled feature injection. Experiments on NYUD-v2, PASCAL-Context, and Cityscapes show that $\mathcal{B}^{3}$-Net achieves competitive or superior trade-offs over representative CNN-, Transformer-, diffusion-, Mamba-, and bridge-feature-based methods. Backbone-matched comparisons and extensive analyses further verify that the gains arise from controlled posterior bridge learning rather than backbone capacity or decoder scale.
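
The phrase "precision-weighted posterior bridge through heteroscedastic evidence fusion" can be illustrated with the textbook Gaussian case, in which independent estimates are averaged with weights equal to their precisions (inverse variances). The scalar sketch below shows only that generic rule plus a simple bounded update. Function names and numbers are assumptions, and the paper's operators act on feature maps and may be defined differently.

```python
def precision_weighted_fusion(values, precisions):
    """Fuse independent Gaussian estimates: precision-weighted mean, summed precision."""
    total = sum(precisions)
    if total <= 0:
        raise ValueError("at least one estimate must have positive precision")
    mean = sum(p * v for p, v in zip(precisions, values)) / total
    return mean, total

def bounded_update(feature, bridge, step=0.5):
    """Move a task feature toward the fused state by a fraction in [0, 1)."""
    if not 0.0 <= step < 1.0:
        raise ValueError("step must lie in [0, 1)")
    return feature + step * (bridge - feature)

# Hypothetical evidence from two tasks: the first is nine times more reliable.
values = [0.0, 10.0]
precisions = [9.0, 1.0]

bridge, bridge_precision = precision_weighted_fusion(values, precisions)
uniform_mean = sum(values) / len(values)
updated = bounded_update(feature=0.0, bridge=bridge, step=0.5)
```

Here the fused value is 1.0 rather than the uniform mean of 5.0, so the unreliable estimate contributes little. The bounded step limits how far any task branch can be pulled in one update.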

2605.05718 2026-05-08 cs.LG

Enabling Federated Inference via Unsupervised Consensus Embedding

Yui Hashimoto, Takayuki Nishio, Yuichi Kitagawa, Takahito Tanimura

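Affiliations: School of Engineering, Institute of Science Tokyo; Research & Development Group, Hitachi Ltd.

AI summary: This paper proposes the CE-FI framework, which uses a Consensus Embedding layer and a Cooperative Output layer to enable federated inference without sharing model parameters or raw inputs. Experiments show that it outperforms solo inference on image classification and performs well under non-IID conditions.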

Comments 18 pages, 15 figures, submitted to IEEE Transactions on Mobile Computing (TMC) (under review)

Abstract

Cooperative inference across independently deployed machine learning models is increasingly desirable in distributed environments, as there is a growing need to leverage multiple models while keeping their data and model parameters private. However, existing cooperative frameworks typically rely on sharing input data, model parameters, or a common encoder, which limits their applicability in privacy-sensitive or cross-organizational settings. To address this challenge, we propose Consensus Embedding-based Federated Inference (CE-FI), a framework that enables pretrained models to cooperate at inference time without sharing model parameters or raw inputs and without assuming a common encoder. CE-FI introduces two components: a Consensus Embedding (CE) layer that maps heterogeneous intermediate representations into a common embedding space, and a Cooperative Output (CO) layer that produces predictions from these embeddings. Both layers are trained using shared unlabeled data only, so the cooperative stage does not require additional labeled data. Experiments on image classification benchmarks -- CIFAR-10 and CIFAR-100 -- under diverse non-IID conditions show that CE-FI consistently outperforms solo inference and performs comparably to conventional methods that require stronger sharing assumptions. Additional evaluations on text and time-series tasks indicate applicability beyond image classification, although performance depends on the ensemble strategy. Further analysis identifies representation alignment as the primary bottleneck.

2605.05716 2026-05-08 cs.AI cs.CL

More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

Ming Liu

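Affiliation: Amazon

AI summary: The paper studies performance degradation caused by component interactions in LLM agent systems. It finds that the optimal number of components depends on the task and model scale, that greedy selection is unreliable, and that task-specific subsets should be chosen through interaction-aware analysis.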

Comments 10 pages, 5 tables; preprint, under review

Abstract

LLM agent systems are built by stacking scaffolding components (planning, tools, memory, self-reflection, retrieval) assuming more is better. We study cross-component interference (CCI): degradation when components interact destructively. We run a full factorial experiment over all 2^5=32 subsets of five components on HotpotQA and GSM8K with Llama-3.1-8B/70B (96 conditions, up to 10 seeds). The All-In system is consistently suboptimal: on HotpotQA, a single-tool agent surpasses All-In by 32% (F1 0.233 vs 0.177, p=0.023); on GSM8K, a 3-component subset beats All-In by 79% (0.43 vs 0.24, p=0.010). Optimal component count is task-dependent (k*=1-4) and scale-sensitive: at 70B, combinations that hurt at 8B provide gains, though All-In still trails the best subset. We fit a main-effects regression (R^2=0.916, adj-R^2=0.899, LOOCV=0.872), compute exact Shapley values, and find 183/325 submodularity violations (56.3%), showing greedy selection is unreliable. A three-body synergy among Tool Use, Self-Reflection, and Retrieval (INT_3=+0.175, 95% CI [+0.003,+0.351]) is reported as exploratory. CCI replicates across model families (Qwen2.5) and is robust to prompt paraphrasing. Our findings suggest maximally-equipped agent defaults should be replaced by task-specific subset selection via interaction-aware analysis.
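
The full factorial design and exact Shapley computation described above can be sketched in a few lines. The scoring function `toy_score` is entirely hypothetical (it simply penalizes stacking more than two components) and stands in for measured task performance; none of its numbers come from the paper.

```python
from itertools import combinations
from math import factorial

COMPONENTS = ("planning", "tools", "memory", "reflection", "retrieval")

def all_subsets(items):
    """Every subset of items, from the empty set to the full set."""
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def shapley_values(items, value):
    """Exact Shapley values by enumerating all coalitions of the other players."""
    n = len(items)
    phi = {}
    for i in items:
        others = [x for x in items if x != i]
        total = 0.0
        for s in all_subsets(others):
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_score(subset):
    """Hypothetical score: tools help, but more than two components interfere."""
    base = 0.10 + (0.15 if "tools" in subset else 0.0)
    penalty = 0.03 * max(0, len(subset) - 2)
    return base + 0.02 * len(subset) - penalty

subsets = all_subsets(COMPONENTS)     # 2^5 = 32 configurations
phi = shapley_values(COMPONENTS, toy_score)
best = max(subsets, key=toy_score)
all_in = frozenset(COMPONENTS)
```

In this toy setting the best configuration has two components and beats the all-in configuration, mirroring the qualitative pattern the abstract reports. The Shapley values sum to the difference between the all-in score and the empty-set score.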

2605.05715 2026-05-08 cs.AI cs.CL cs.LG

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

Ming Liu

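Affiliation: Amazon

AI summary: The study asks whether linearly decodable failure signals in LLM hidden states can be used to correct those failures. Using the Overthinking regime, it finds that fixed linear steering does not correct failures, but that the decodable failure structure supports post-generation reliability estimation.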

Comments 22 pages (14 main + 8 appendix), 5 figures, 7 tables. Under review

Abstract

Can linearly decodable failure signals in LLM hidden states be leveraged to correct those failures? We investigate this classification-correction gap via Overthinking (OT)--a stable behavioral regime (Jaccard >= 0.81, 94% inter-annotator agreement) in medical QA where models answer correctly under resampling yet fail in extended chain-of-thought. OT is linearly decodable at 71.6% balanced accuracy (p < 10^{-16}). Yet five families of fixed linear steering (29 configurations, n=1,273) all yield Delta ~= 0, with identical null results cross-architecture (Qwen2.5-7B) and cross-domain (MMLU-STEM). Three convergent lines of evidence suggest representational entanglement: the OT direction has 85-88% overlap with task-critical computation (specificity ratio <= 0.152); non-targeted shared-direction steering damages accuracy (-12.1pp); and LEACE concept erasure damages accuracy (-3.6pp, p=0.01), while 10 random erasures produce Delta=+0.3pp. The per-instance probe-steering correlation is r=-0.002 (p=0.97). Positively, the same probe enables selective abstention (held-out AUROC=0.610, exceeding all five uncertainty baselines, p=0.009): decodable failure structure supports post-generation reliability estimation even when the fixed linear steering family cannot exploit it for correction.
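
The abstract above quotes balanced accuracy for the probe and AUROC for selective abstention. The sketch below shows how both metrics are computed from scores and binary labels, using made-up numbers; it illustrates the metrics only, not the paper's probe or data.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(preds, labels):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / n_pos + tn / n_neg)

# Hypothetical probe scores; label 1 marks a failure case.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
preds = [1 if s >= 0.5 else 0 for s in scores]

area = auroc(scores, labels)
bal_acc = balanced_accuracy(preds, labels)
```

A model that abstains on the highest-scoring instances benefits from any AUROC above 0.5, which is why a probe can be useful for abstention even when steering along the same direction does not correct the failure.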

2605.05714 2026-05-08 cs.CV cs.RO

TriRelVLA: Triadic Relational Structure for Generalizable Embodied Manipulation

Hanyu Zhou, Chuanhao Ma, Gim Hee Lee

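Affiliations: School of Computing, National University of Singapore; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

AI summary: TriRelVLA builds a triadic relational structure to address the poor generalization of vision-language-action models to unseen scenes and objects, improving the transferability of embodied manipulation.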

英文摘要

Vision-language-action (VLA) models perform well on training-seen robotic tasks but struggle to generalize to unseen scenes and objects. A key limitation lies in their implicit visual representations, which entangle object appearance, background, and scene layout. This makes policies sensitive to visual variations. Prior work improves transferability through structured intermediate representations that objectify visual content. However, these representations mainly capture scene semantics instead of action-relevant relations. As a result, action prediction remains tied to appearance statistics. We observe that manipulation actions depend on the object-hand-task relational structure, which governs interactions among task requirements, robot states, and object properties. Based on this observation, we propose TriRelVLA, a triadic relational VLA framework for generalizable embodied manipulation. Our approach consists of three components: 1) We construct explicit object-hand-task triadic representations from multimodal inputs as relational primitives. 2) We build a task-grounded relational graph. Task-guided cross-attention forms nodes, and a relation-aware graph transformer models interactions among them. 3) We perform relation-conditioned action generation. The relational structure is compressed into a bottleneck space and projected into the LLM for action prediction. This triadic relational bottleneck reduces reliance on appearance statistics and enables transfer across scenes, objects, and task compositions. We further introduce a real-world robotic dataset for fine-tuning. Experiments show strong performance on fine-tuned tasks and clear gains in cross-scene, cross-object, and cross-task generalization.

2605.05712 2026-05-08 cs.CV

EgoEMG: A Multimodal Egocentric Dataset with Bilateral EMG and Vision for Hand Pose Estimation

EgoEMG: A Multimodal Egocentric Dataset with Bilateral EMG and Vision for Hand Pose Estimation

Ziheng Xi, Jiayi Yu, Yitao Wang, Yanbo Duan, Jianjiang Feng, Jie Zhou

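Affiliations * Department of Automation, Tsinghua University

AI summary The EgoEMG dataset combines bilateral EMG and vision for hand pose estimation. It includes 16-channel EMG, IMU, RGB video, and RGB-D video, covers 41 participants and 60 gestures, and provides benchmarks for EMG-to-pose, vision-to-pose, and fusion tasks.

Comments 34 pages, 13 figures, 15 tables. Submitted to NeurIPS 2026

Details
AI Chinese abstract (translated)

Surface electromyography (sEMG) records muscle activity during hand movement and can be decoded to recover detailed hand pose. EMG and egocentric vision are complementary for hand sensing: EMG captures fine finger articulation even under occlusion and poor lighting, while vision provides the global hand configuration. However, no existing dataset synchronizes the two modalities. We present EgoEMG, a multimodal egocentric dataset for bimanual hand pose estimation. EgoEMG includes bilateral wristband EMG (16 channels in total, 8 per wrist) sampled at 2 kHz, a 120 Hz IMU, egocentric wide-angle RGB video, external RGB-D video, and mocap-derived hand motion and wrist joint angles. The dataset covers 60 gesture classes performed by 41 participants, including 30 single-hand and 30 bimanual gestures, with more than 10 hours of recordings in total. We also introduce a benchmark with three tasks (EMG-to-pose, vision-to-pose, and EMG+vision fusion) under a shared joint-angle prediction target and common generalization split axes (cross-gesture, cross-user, and combined). As baselines, we evaluate EMGFormer for EMG-to-pose and generic ResNet/ViT backbones for vision-to-pose. We further study a residual fusion architecture that improves on matched lightweight vision-only baselines. Together, EgoEMG and its benchmark provide a foundation for future research on multimodal hand pose estimation.

English abstract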

Surface electromyography (sEMG) records muscle activity during hand movement and can be decoded to recover detailed hand articulation. EMG and egocentric vision are complementary for hand sensing: EMG captures fine-grained finger articulation even under occlusion and poor lighting, while vision provides global hand configuration. However, no existing dataset synchronizes both modalities. We present EgoEMG, a multimodal egocentric dataset for bimanual hand pose estimation. EgoEMG includes bilateral wristband EMG with 16 total channels (8 per wrist) sampled at 2 kHz, 120 Hz IMU, egocentric wide-angle RGB video, external RGB-D video, and mocap-derived hand motion with wrist articulation angles. The dataset covers 41 participants performing 60 gesture classes, including 30 single-hand gestures and 30 bimanual gestures, totaling more than 10 hours of recording. We also introduce a benchmark with three tasks -- EMG-to-pose, vision-to-pose, and EMG+vision fusion -- under a shared joint-angle prediction target and common generalization split axes (cross-gesture, cross-user, and combined). As baselines, we evaluate EMGFormer for EMG-to-pose and generic ResNet/ViT backbones for vision-to-pose. We further study a residual fusion architecture that improves over matched lightweight vision-only baselines. Together, EgoEMG and its benchmark establish a foundation for future research on multimodal hand pose estimation with EMG and vision.

2605.05711 2026-05-08 cs.CV cs.GR cs.HC cs.LG cs.MM

Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

Closing the Loop: Unified 3D Scene Generation and Immersive Interaction via LLM-RL Coupling

Anh H. Vo, Sungyo Lee, Phil-Joong Kim, Soo-Mi Choi, Yong-Guk Kim

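Affiliations * Department of Computer Engineering, Sejong University

AI summary This paper proposes a unified framework that uses LLM-RL coupling to close the loop between language-driven 3D scene generation and immersive interaction, improving the adaptability and immersion of interactive multimedia systems.

Details
AI Chinese abstract (translated)

Recent advances in large language models (LLMs) have significantly improved language-driven 3D content generation, but existing methods still treat scene generation and user interaction as separate processes, which limits the adaptability and immersive potential of interactive multimedia systems. This paper proposes a unified framework that uses LLM-RL coupling to close the loop between language-driven 3D scene generation and immersive interaction. Given natural-language instructions, the system first builds a structured scene representation with an LLM and then optimizes the spatial layout with reinforcement learning under geometric and semantic constraints. The generated environment is deployed in virtual reality to support HRI-in-the-loop operation, in which user interactions provide continuous feedback that aligns the generated content with human perception and usability. By tightly coupling generation and interaction, the framework delivers more responsive, adaptive, and realistic multimedia experiences. Experiments on the ALFRED benchmark show state-of-the-art performance in task-based scene generation. In addition, qualitative results and user studies show consistent improvements in immersion, interaction quality, and task efficiency, which highlights the importance of closed-loop integration of generation and interaction for next-generation multimedia systems. Our project page is available at https://proj-showcase.github.io/h3ds/.

English abstract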

Recent advances in large language models (LLMs) have significantly improved language-driven 3D content generation, but most existing approaches still treat scene generation and user interaction as separate processes, limiting the adaptability and immersive potential of interactive multimedia systems. This paper presents a unified framework that closes the loop between language-driven 3D scene generation and immersive user interaction. Given natural language instructions, the system first constructs structured scene representations using LLMs, and then optimizes spatial layouts via reinforcement learning under geometric and semantic constraints. The generated environments are deployed in a virtual reality setting to facilitate HRI-in-the-loop, where user interactions provide continuous feedback to align generated content with human perception and usability. By tightly coupling generation and interaction, the proposed framework enables more responsive, adaptive, and realistic multimedia experiences. Experiments on the ALFRED benchmark demonstrate state-of-the-art performance in task-based scene generation. Furthermore, qualitative results and user studies show consistent improvements in immersion, interaction quality, and task efficiency, highlighting the importance of closed-loop integration of generation and interaction for next-generation multimedia systems. Our project page can be found at https://proj-showcase.github.io/h3ds/.

2605.05710 2026-05-08 cs.LG

On the Blessing of Pre-training in Weak-to-Strong Generalization

On the Blessing of Pre-training in Weak-to-Strong Generalization

Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu, Ziqiao Wang, Yong Liu, Yunbei Xu

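Affiliations * Renmin University of China; Shanghai Jiao Tong University; Tongji University; National University of Singapore

AI summary This paper studies weak-to-strong generalization and finds that pre-training is its necessary prerequisite. Theory and experiments both show that pre-training provides a geometric warm start that places the model in an effective region, which makes generalization possible.

Comments 40 pages, 14 figures

Details
AI Chinese abstract (translated)

The Weak-to-Strong Generalization (W2SG) paradigm holds that a pre-trained strong model can surpass its weak supervisor, but the role of pre-training remains unclear both theoretically and empirically. This paper identifies pre-training as a necessary condition for W2SG to emerge. Theoretically, within a high-dimensional single-index model framework, we formalize the W2SG problem using spiked Gaussian data and model pre-training as a spectral initialization step. Building on prior impossibility results on the failure of learning under random initialization, we prove that W2SG is achievable when pre-training provides a geometric warm start that places the model in an effective region characterized by a perturbed strong-convexity geometry. Within this region, we derive a rigorous generalization bound that naturally captures the optimization dynamics: an initial performance improvement followed by a saturation bottleneck limited by the weak supervisor's bias. Empirically, we validate all assumptions and theoretical insights through controlled synthetic simulations. Finally, through a large-scale evaluation of hundreds of intermediate pre-training checkpoints of large language models, we show that W2SG is not an innate capability but a phase-transition phenomenon tightly coupled with the progress of pre-training.

English abstract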

The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work, we identify pre-training as the essential prerequisite for the emergence of W2SG. Theoretically, we formalize the W2SG problem within a high-dimensional single-index model framework using spiked Gaussian data, modeling pre-training as a spectral initialization step. Building upon prior impossibility results regarding the failure of learning under random initialization, we prove that W2SG is achievable when pre-training provides a geometric warm start that places the model within an "effective region" characterized by a perturbed strong-convexity geometry. Within this region, we derive a rigorous generalization bound that naturally captures the optimization dynamics: an initial performance improvement followed by a saturation bottleneck dictated by the weak supervisor's bias. Empirically, we first validate all our assumptions and theoretical insights through controlled synthetic simulations. Finally, through a massive-scale evaluation of hundreds of intermediate pre-training checkpoints from large language models, we demonstrate that W2SG is not an innate capability but emerges via a phase transition tightly coupled with the progression of pre-training.

2605.05709 2026-05-08 cs.AI

Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs

Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in Multimodal Large Language Models

Md Farhamdur Reza, Richeng Jin, Tianfu Wu, Huaiyu Dai

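Affiliations * NC State University; Zhejiang University

AI summary This paper studies intent-obfuscation attacks on multimodal large language models that exploit the tradeoff between reconstruction and concealment. It proposes a variant construction method based on character removal and introduces keyword-related distractor images to improve attack effectiveness.

Comments 39 pages, including appendices

Details
AI Chinese abstract (translated)

Intent-obfuscation-based attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to bypass safety mechanisms. We find that such attacks are governed by a reconstruction-concealment tradeoff: the transformed input must hide the harmful intent from safety filters while remaining recoverable enough for the victim model to reconstruct the original request. By analyzing reconstruction in three representative black-box methods, we find that existing transformations struggle to balance this tradeoff, which limits their effectiveness. In contrast, character-removed variants balance it better. Building on this, we propose a concealment-aware variant construction method that greedily selects character-removed variants with low harmful-keyword alignment and mutual dissimilarity, and instantiates them through five modality-aware prompting strategies. We further introduce keyword-related distractor images, which depict the harmful keyword in diverse contexts and provide more effective auxiliary visual context than generic distractor images. Experiments on closed-source and open-source MLLMs show that the proposed method outperforms strong baselines and reveal an underexplored vulnerability: a model's own reconstruction ability can be exploited to recover hidden harmful intent and generate unsafe responses.

English abstract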

Intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs) transform a harmful query into a concealed multimodal input to bypass safety mechanisms. We show that such attacks are governed by a reconstruction-concealment tradeoff: the transformed input must hide harmful intent from safety filters while remaining recoverable enough for the victim model to reconstruct the original request. Through a reconstruction analysis of three representative black-box methods, we find that existing transformations struggle to balance this tradeoff, limiting their effectiveness. In contrast, we show that character-removed variants achieve a better balance. Building on this, we propose concealment-aware variant construction, which greedily selects character-removed variants that are low in harmful-keyword alignment and mutually diverse, and instantiates them through five modality-aware prompting strategies. We further introduce keyword-related distractor images that depict the harmful keyword in diverse contexts, providing more effective auxiliary visual context than generic distractor images. Experiments across closed-source and open-source MLLMs show the proposed strategies outperform strong baselines, revealing an underexplored vulnerability: a model's own reconstruction ability can be exploited to recover hidden harmful intent and produce unsafe responses.

2605.05707 2026-05-08 cs.RO

On the Emergence of Pendular Structure in Multi-Contact Locomotion

On the Emergence of Pendular Structure in Multi-Contact Locomotion

Lingxue Lyu, Zihui Liu

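Affiliations * School of Engineering and Applied Science, University of Pennsylvania; Department of Aeronautics & Astronautics, Stanford University

AI summary This paper examines how pendular structure arises in multi-contact locomotion. By analyzing an optimization problem that penalizes the rate of change of angular momentum, it shows how foot geometry and the friction cone shape the motion pattern, and it validates the results on a point-mass quadruped and the Unitree Go1.

Details
AI Chinese abstract (translated)

LIPM is ubiquitous in legged-locomotion control, but usually as a modeling choice rather than something the controller's cost prefers. This paper tries to make that link explicit. Starting from a small centroidal OCP that penalizes the rate of change of angular momentum, we analyze the shape of its optimum. Three results follow: with full-rank stance, the optimum drifts toward a pendular force pattern at a rate determined by the SVD of the moment Jacobian; with N=2 stance (as in trot), the friction cone introduces a lower bound on $\|\dot{H}_G\|$ that weight tuning cannot remove; we also find a non-smooth feasibility kink that can be written in closed form. Adding a task term that requires a nonzero $\dot{H}_G$ moves the optimum off the pendular set. These findings are close to the classical ZMP/DCM picture. We test on a point-mass quadruped and on the Unitree Go1, and note where the asymptotic story is no longer accurate for actual closed-loop behavior.

English abstract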

LIPM is everywhere in legged-locomotion control, but almost always as a modeling choice rather than as something the controller's cost actually prefers. This note tries to make that link more explicit. Working from a small centroidal OCP that penalizes the rate of angular momentum, we look at what its optimum tends to look like. Three things come out. With full-rank stance, the optimum drifts toward a pendular force pattern at a rate determined by the SVD of the moment Jacobian; the constant is set by foot-span geometry and matches the experiments to within 16%. With N=2 stance, as in trot, the friction cone introduces a lower bound on $\|\dot{H}_G\|$ that no amount of weight tuning fixes; we also see a non-smooth feasibility kink at a critical horizontal acceleration that we can write in closed form. Adding a task term that asks for a nonzero $\dot{H}_G$ moves the optimum off the pendular set in a predictable way. None of this is far from the classical ZMP/DCM picture. We test these claims on a point-mass quadruped and on the Unitree Go1 in MuJoCo (open-loop QP and a torque-level closed-loop controller), and we note where the asymptotic story stops being a good description of what the closed loop actually does.
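
For orientation, the standard centroidal relations behind the penalized quantity can be written as below. This is textbook rigid-body notation rather than the authors' own formulation: the symbols $m$, $c$, $p_i$, $f_i$, $\lambda_i$, and $g$ (mass, center of mass, contact points, contact forces, force scalings, gravity) are assumptions introduced here for illustration.

```latex
m\,\ddot{c} = \sum_{i=1}^{N} f_i + m\,g, \qquad
\dot{H}_G = \sum_{i=1}^{N} (p_i - c) \times f_i .
% If every contact force points along the line from its contact point to the CoM,
% f_i = \lambda_i\,(c - p_i) with \lambda_i \ge 0, each cross product vanishes:
\dot{H}_G = \sum_{i=1}^{N} \lambda_i\,(p_i - c) \times (c - p_i) = 0 .
```

Under these assumptions, forces directed through the center of mass are a sufficient condition for $\dot{H}_G = 0$. One plausible reading of the N=2 result is that the friction cone can exclude the force directions this would require, but the abstract does not spell out the argument.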

2605.05706 2026-05-08 cs.AI q-bio.QM

Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine

Resolving the Bias-Precision Paradox with Stochastic Causal Representation Learning for Personalized Medicine

Peisong Zhang, Manqiang Peng, Yuxuan Wu, Pawit Phadungsaksawasdi, Wesley Yeung, Ye Zhang, Trang Nguyen, Qiang Zhang, Nan Liu, Meng Wang, Kee Yuan Ngiam, Yih-Chung Tham, Ching-Yu Cheng, Tianfan Fu, Qingyu Chen, Rosemary Ke, Chang Li, Wenzhuo Yang, Zhenghao Lu, Chunyou Lai, Yu Zhang, Sheng Zhong, Hao Deng, Dianbo Liu

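Affiliations * Chengdu OrganoidMed Medical Laboratory

AI summary This paper proposes an sMMD-based stochastic alignment strategy that resolves the bias-precision paradox in causal representation learning. It improves prediction accuracy under distribution shift in personalized medicine, reduces error, and raises recall on high-risk tasks.

Details
AI Chinese abstract (translated)

Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, but existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity and degrades patient-specific predictions. We define this tension as the bias-precision paradox in causal representation learning and introduce a stochastic alignment strategy based on sampling-based maximum mean discrepancy (sMMD), which replaces global adversarial balancing with subset-level matching. We implement counterfactual outcome prediction within this framework, with attribution-based interpretability. On two large-scale ICU cohorts (n=27,783), our framework improves accuracy under distribution shift, reduces error by up to 11.5%, and substantially increases recall on high-risk tasks. Mechanistic analysis shows that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.

English abstract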

Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global adversarial balancing with subset-level matching. We instantiate this approach in a framework for counterfactual outcome prediction with attribution-grounded interpretability. Across two large-scale ICU cohorts (n = 27,783), our framework improves accuracy under distribution shift, reducing error by up to 11.5% and substantially increasing recall in high-risk tasks. Mechanistic analyses show that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.
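
As a rough illustration of subset-level matching, the sketch below estimates a squared MMD with an RBF kernel on random subsets of two groups of representations and averages over draws. The abstract does not specify the kernel, the subset size, or the estimator, so `rbf`, `mmd2`, `sampled_mmd2`, and every parameter value here are assumptions for illustration, not the authors' implementation.

```python
import math
import random

def rbf(x, y, gamma=1.0):
    """RBF kernel between two equal-length vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

def sampled_mmd2(xs, ys, subset_size, n_draws, rng, gamma=1.0):
    """Average squared MMD over random subsets instead of the full groups."""
    total = 0.0
    for _ in range(n_draws):
        sx = rng.sample(xs, min(subset_size, len(xs)))
        sy = rng.sample(ys, min(subset_size, len(ys)))
        total += mmd2(sx, sy, gamma)
    return total / n_draws

# Hypothetical 2-D representations for two treatment groups.
rng = random.Random(0)
treated = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
control = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(40)]
same = sampled_mmd2(treated, treated, subset_size=10, n_draws=5, rng=rng)
shifted = sampled_mmd2(treated, control, subset_size=10, n_draws=5, rng=rng)
```

In a training loop, a term like `sampled_mmd2` would typically act as a balancing penalty on the learned representation; how sMMD is weighted against the prediction loss is not stated in the abstract.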

2605.05702 2026-05-08 cs.AI

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

Huyu Wu, Jun Liu, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu

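Affiliations * Xiaohongshu Inc.

AI summary This paper uses knowledge-graph paths as intermediate supervision to improve question generation and reward shaping for self-evolving search agents. It addresses the isolated question construction and uninformative rewards of prior methods and improves multi-hop QA performance.

Details
AI Chinese abstract (translated)

Self-evolving search agents reduce reliance on human-written training questions by generating and solving their own search tasks. We build on Search Self-Play (SSP), a representative Proposer-Solver framework in which questions are generated and answered through multi-step search and reasoning. In practice, however, SSP faces two bottlenecks: the Proposer constructs questions from isolated answer entities without relational context, which yields many invalid or unverifiable questions early in self-play training; and the Solver receives only a binary outcome reward, which discards the useful signal in partially on-track trajectories. We address both bottlenecks by reusing knowledge-graph paths as construction-derived intermediate supervision. First, we ground question generation in LLM-guided knowledge-graph subgraphs, which gives the Proposer relational context. Second, we find that constructing and solving a multi-hop question can involve overlapping intermediate entities: the factual bridges used to construct the question may serve as approximate waypoints for answering it. Exploiting this overlap, we introduce the Waypoint Coverage Reward (WCR), which gives graded partial credit to incorrect Solver trajectories according to their coverage of entities on the construction path, while preserving full reward for correct answers. Across seven QA benchmarks and nine model configurations, our method improves the average score in every configuration, with notable gains on multi-hop QA tasks. These results suggest that knowledge-graph paths can serve as lightweight intermediate supervision that provides relational guidance and process feedback without additional task-specific human annotation or manually labeled process steps.

English abstract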

Self-evolving search agents reduce reliance on human-written training questions by generating and solving their own search tasks. We build on Search Self-Play (SSP), a representative Proposer and Solver framework in which questions are generated and answered via multi-step search and reasoning. In practice, however, SSP faces two bottlenecks: the Proposer constructs questions from isolated answer entities without relational context, yielding many invalid or unverifiable questions in early self-play training, while the Solver receives only a binary outcome reward that discards useful signal from partially on-track search trajectories. We address both bottlenecks by reusing knowledge-graph paths as construction-derived intermediate supervision for both question construction and reward shaping. First, we ground question construction in LLM-guided knowledge-graph subgraphs, providing relational context for the Proposer. Second, we observe that constructing and solving a multi-hop question can involve overlapping intermediate entities: the factual bridges used to formulate the question may provide approximate waypoints for answering it. Exploiting this overlap, we introduce Waypoint Coverage Reward (WCR), which grants graded partial credit to incorrect Solver trajectories according to their coverage of entities on the construction path, while preserving full reward for correct answers. Across seven QA benchmarks and nine model configurations, our approach improves the average score over standard SSP in all configurations, including notable gains on multi-hop QA tasks. These results suggest that knowledge-graph paths can be reused as lightweight intermediate supervision, providing both relational guidance and process feedback without additional task-specific human annotations or manually labeled process steps.
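
The reward shaping described above can be sketched in a few lines. This is a minimal reading of the abstract, assuming coverage is the fraction of construction-path entities that the Solver's trajectory visited; the function name, the `partial_scale` cap, and the example entities are hypothetical and not taken from the paper.

```python
def waypoint_coverage_reward(answer_correct, visited_entities, path_entities,
                             partial_scale=0.5):
    """Full reward for a correct answer; otherwise graded credit for waypoint coverage.

    partial_scale keeps partial credit below the full reward. Its value is an
    illustrative assumption, not a setting reported in the paper.
    """
    if answer_correct:
        return 1.0
    waypoints = set(path_entities)
    if not waypoints:
        return 0.0
    # Fraction of construction-path entities that the trajectory touched.
    coverage = len(waypoints & set(visited_entities)) / len(waypoints)
    return partial_scale * coverage

# Hypothetical construction path and Solver trajectories.
path = ["Marie Curie", "Pierre Curie", "University of Paris"]
full = waypoint_coverage_reward(True, [], path)
partial = waypoint_coverage_reward(False, ["Pierre Curie", "Sorbonne"], path)
none = waypoint_coverage_reward(False, ["Sorbonne"], path)
```

Capping partial credit below 1.0 is one way to keep a trajectory that only visits waypoints from outscoring a correct answer; the actual grading scheme may differ.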

2605.05701 2026-05-08 cs.AI

Inference-Time Budget Control for LLM Search Agents

Inference-Time Budget Control for LLM Search Agents

Zhengru Fang, Senkang Forest Hu, Zhonghao Chang, Yu Guo, Yihang Tao, Hongyao Liu, Mengzhe Ruan, Jun Huang, Yuguang Fang

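Affiliations * City University of Hong Kong; Tsinghua University; Alibaba Ant Group

AI summary This paper studies dual-budget control at inference time in multi-hop question answering. A two-stage controller chooses among retrieval, decomposition, and answer commitment during search based on a task-level value-of-information score, and selectively revises the answer from the evidence at the final stage. Experiments show positive gains on several benchmarks.

Details
AI Chinese abstract (translated)

LLM search agents increasingly rely on tools at inference time, but their trajectories are often subject to hard limits on tool calls and generated tokens. Under this dual budget, better answers require not only stronger models but also explicit control over which search action should receive the next unit of budget and when the accumulated evidence is sufficient to commit a final answer. We study this problem in multi-hop question answering and formulate it as two-stage inference-time budget control. During search, our controller assigns each feasible action a task-level Value-of-Information (VOI) score, defined as an operational estimate of marginal task value per unit budget under the current search state and the remaining dual budget, and uses this score to choose among retrieval, decomposition, and answer commitment. After search, a selective evidence-grounded finalizer compares the trajectory answer with a refined candidate answer and rewrites only when the residual error appears to be a low-risk answer-form error. Across four multi-hop QA benchmarks, three LLM backbones, and four budget levels, the method shows positive aggregate gains over four audited baselines under the same hard dual-budget protocol. Ablations show that search-time budget control, especially the budget-dependent penalty, provides the main performance gain, while answer-time control helps mainly when the retrieval path is already adequate. These results suggest that inference-time budget control for LLM search agents should govern both how budget is spent during search and how the final answer is committed.

English abstract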

LLM search agents increasingly rely on tools at inference time, but their trajectories are often constrained by hard limits on both tool calls and generated tokens. Under such dual budgets, better answers require not only stronger models, but also explicit control over which search action should receive the next budget unit and when the accumulated evidence is sufficient to commit a final answer. We study this problem in multi-hop question answering (QA) and formulate it as two-stage inference-time budget control. At search time, our controller assigns each feasible action a task-level Value-of-Information (VOI) score, defined as an operational estimate of marginal task value per unit budget under the current search state and remaining dual budget, and uses this score to choose among retrieval, decomposition, and answer commitment. After search, a selective evidence-grounded finalizer compares the trajectory answer with a refined candidate and rewrites only when the residual error appears to be a low-risk answer-form error. Across four multi-hop QA benchmarks, three LLM backbones, and four budget levels, the method yields positive aggregate gains over four audited baselines under the same hard dual-budget protocol. Ablations show that search-time budget control, especially budget-dependent penalty, provides the main performance gain, while answer-time control helps mainly when the retrieval path is already adequate. These results suggest that inference-time budget control for LLM search agents should govern both how budget is spent during search and how the final answer is committed.
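
To make the search-time step concrete, the sketch below scores each feasible action by estimated gain per unit of remaining dual budget and picks the best one. The abstract gives no formula for the VOI score, so the cost normalization, the `Action` fields, and the numeric gains are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    est_gain: float   # estimated marginal task value (hypothetical estimate)
    tool_calls: int   # tool-call budget the action would consume
    tokens: int       # token budget the action would consume

def voi_score(action, calls_left, tokens_left):
    """Estimated gain per unit of normalized dual budget; None if infeasible."""
    if action.tool_calls > calls_left or action.tokens > tokens_left:
        return None
    # Share of each remaining budget the action would use, so a scarce budget costs more.
    cost = action.tool_calls / max(calls_left, 1) + action.tokens / max(tokens_left, 1)
    return action.est_gain / max(cost, 1e-9)

def choose_action(actions, calls_left, tokens_left):
    """Return the feasible action with the highest score, or None if none fits."""
    scored = [(voi_score(a, calls_left, tokens_left), a) for a in actions]
    feasible = [(s, a) for s, a in scored if s is not None]
    if not feasible:
        return None
    return max(feasible, key=lambda pair: pair[0])[1]

actions = [
    Action("retrieve", est_gain=0.45, tool_calls=1, tokens=200),
    Action("decompose", est_gain=0.10, tool_calls=0, tokens=150),
    Action("answer", est_gain=0.02, tool_calls=0, tokens=50),
]
```

With these made-up numbers the choice shifts from retrieval to decomposition to answer commitment as the remaining budget shrinks, which is the qualitative behavior the abstract describes.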

2605.05697 2026-05-08 cs.LG cs.AI

Budgeted Attention Allocation: Cost-Conditioned Compute Control for Efficient Transformers

Budgeted Attention Allocation: Cost-Conditioned Compute Control for Efficient Transformers

Amrit Nidhi

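Affiliations * Independent Researcher

AI summary This paper studies Budgeted Attention Allocation, which makes transformer computation controllable by conditioning on an attention budget, and shows that attention cost can be traded against accuracy and speed across budgets.

Comments 12 pages, 1 figure, 10 tables

Details
AI Chinese abstract (translated)

Transformers usually expose a single inference cost per trained model, while deployed systems often need several cost-quality operating points. We study Budgeted Attention Allocation, a monotone head-gating mechanism conditioned on a requested attention budget. Dense warm-starting is important for stability: on a robust synthetic sequence task, one budgeted model reaches 99.7% accuracy at an estimated attention cost of 0.303 and 100.0% accuracy at a cost of 0.504. On held-out AG News with a custom word-level transformer, hard-gate adaptation turns soft cost control into measured single-thread CPU speed, reaching 82.1% accuracy with a 1.28x speedup at budget 0.50. On AG News with pretrained BERT-Mini, budgeted structural pruning reaches 87.6% accuracy with a 1.20x speedup at budget 0.50; a validation-ranked zero-shot dense post-hoc structural baseline reaches 86.1%, and one recovery epoch raises that per-budget specialist to 87.9%. On DBpedia14, BERT-Mini budgeted gates reach 97.4% at exact budget 0.50, versus 96.6% for dense full attention. Static fixed-budget gates and recovered dense specialists remain strong. The contribution is therefore not universal dominance but a reproducible feasibility study of one checkpoint that is controllable across budgets, can trade attention cost for accuracy, and can be converted into measured structural speedups on small CPU benchmarks.

English abstract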

Transformers usually expose one inference cost per trained model, while deployed systems often need multiple cost-quality operating points. We study Budgeted Attention Allocation, a monotone head-gating mechanism conditioned on a requested attention budget. Dense warm-starting is important for stability: on a robust synthetic sequence task, one budgeted model reaches 99.7% accuracy at 0.303 estimated attention cost and 100.0% accuracy at 0.504 cost. On held-out AG News with a custom word-level transformer, hard-gate adaptation turns soft cost control into measured single-thread CPU speed, reaching 82.1% accuracy with 1.28x speedup at budget 0.50. In pretrained BERT-Mini AG News, budgeted structural pruning reaches 87.6% accuracy with 1.20x speedup at budget 0.50; a validation-ranked zero-shot dense post-hoc structural baseline reaches 86.1%, and one recovery epoch raises that per-budget specialist to 87.9%. On DBpedia14, BERT-Mini budgeted gates reach 97.4% at exact budget 0.50 versus 96.6% for dense full attention. Static fixed-budget gates and recovered dense specialists remain strong. The contribution is therefore not universal dominance, but a reproducible feasibility study of one controllable checkpoint across budgets that can trade attention cost for accuracy and be converted into measured structural speedups on small CPU benchmarks.
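
One simple way to obtain monotone hard gates is to rank heads once and keep the top fraction requested by the budget, so any head active at a small budget stays active at a larger one. The sketch below assumes exactly that; the paper's gates are learned and start out soft, and the `importance` scores here are hypothetical.

```python
import math

def head_gates(importance, budget):
    """Binary gates keeping the top ceil(budget * H) heads by importance."""
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be in [0, 1]")
    n_heads = len(importance)
    keep = math.ceil(budget * n_heads)
    # Rank heads once; a larger budget only extends the kept prefix (monotonicity).
    ranked = sorted(range(n_heads), key=lambda h: importance[h], reverse=True)
    active = set(ranked[:keep])
    return [1 if h in active else 0 for h in range(n_heads)]

importance = [0.9, 0.1, 0.5, 0.7]       # hypothetical per-head scores
quarter = head_gates(importance, 0.25)  # keeps head 0
half = head_gates(importance, 0.50)     # keeps heads 0 and 3
full = head_gates(importance, 1.00)     # keeps all heads
```

Multiplying each head's output by its gate, and skipping gated-off heads entirely, is what would turn the requested budget into a measured speedup.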

2605.05694 2026-05-08 cs.CV

Adaptive Physical-Facial Representation Fusion via Subject-Invariant Cross-Modal Prompt Tuning for Video-Based Emotion Recognition

Adaptive Physical-Facial Representation Fusion via Subject-Invariant Cross-Modal Prompt Tuning for Video-Based Emotion Recognition

Xiwen Luo, Jia Li, Rencheng Song, Yu Liu, Juan Cheng

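Affiliations * Department of Biomedical Engineering; Anhui Province Key Laboratory of Measuring Theory and Precision Instrument; School of Computer Science and Information Engineering

AI summary This paper proposes a subject-invariant cross-modal prompt-tuning framework that converts rPPG waveforms into noise-robust time-frequency representations and introduces a decoupled shared-specific adapter to improve cross-subject generalization. Experiments on the MAHNOB-HCI and DEAP benchmarks show that it outperforms existing methods.

Comments The source code will be available at https://github.com/MSA-LMC/SCPT

Details
AI Chinese abstract (translated)

Emotion recognition from facial video enables contactless inference of human emotional states. Although facial expressions are widely used as cues, they cannot fully reflect intrinsic affective states. Remote photoplethysmography (rPPG) provides complementary physiological information, but it is highly susceptible to noise and inter-subject variability, which limits generalization to unseen individuals. Existing multimodal methods combine facial and rPPG features, but their fusion strategies often disrupt pretrained facial representations and lack explicit mechanisms to suppress subject-specific variation. To address these problems, we propose a subject-invariant cross-modal prompt-tuning framework for video-based emotion recognition. Specifically, rPPG waveforms are converted into noise-robust time-frequency representations (TFRs), from which modality-complementary prompts are generated to modulate the facial tokens inside a frozen Vision Transformer (ViT). This design enables effective cross-modal interaction while preserving the generalizable facial representations learned by the pretrained backbone. In addition, we introduce a decoupled shared-specific adapter (DSSA) into each ViT layer to explicitly separate subject-shared and subject-specific components, which improves cross-subject generalization. Experiments on the MAHNOB-HCI and DEAP benchmarks show that the proposed method outperforms strong baselines in both recognition accuracy and generalization, highlighting its effectiveness for video-based emotion recognition.

English abstract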

Emotion recognition from facial videos enables non-contact inference of human emotional states. Although facial expressions are widely used cues, they cannot fully reflect intrinsic affective states. Remote photoplethysmography (rPPG) provides complementary physiological information, but it is highly susceptible to noise and inter-subject variability, limiting generalization to unseen individuals. Existing multimodal methods combine facial and rPPG features, yet their fusion strategies often disrupt pretrained facial representations and lack explicit mechanisms to suppress subject-specific variations. To address these issues, we propose a subject-invariant cross-modal prompt-tuning framework for video-based emotion recognition. Specifically, rPPG waveforms are transformed into noise-robust time-frequency representations (TFRs), from which modality-complementary prompts are generated to modulate facial tokens within a frozen Vision Transformer (ViT). This design enables effective cross-modal interaction while preserving the generalizable facial representations learned by the pretrained backbone. In addition, we introduce a decoupled shared-specific adapter (DSSA) into each ViT layer to explicitly separate subject-shared and subject-specific components, thereby improving cross-subject generalization. Experiments on the MAHNOB-HCI and DEAP benchmarks demonstrate that the proposed method consistently outperforms strong baselines in both recognition accuracy and generalization ability, highlighting its effectiveness for video-based emotion recognition.

2605.05692 2026-05-08 cs.CV cs.AI cs.CR

CFE-PPAR: Compression-friendly encryption for privacy-preserving action recognition leveraging video transformers

CFE-PPAR: A Compression-Friendly Encryption Method for Privacy-Preserving Action Recognition

Haiwei Lin, Shoko Imaizumi, Hitoshi Kiya

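Affiliations * Graduate School of Informatics; Faculty of System Design; Chiba University; Tokyo Metropolitan University

AI summary This paper proposes CFE-PPAR, a compression-friendly encryption method in which encrypted videos are recognized directly by a video transformer, outperforming existing methods.

Comments 6 pages, 5 figures, accepted to 2026 IEEE International Conference on Image Processing (ICIP)

Details
AI Chinese abstract (translated)

Privacy-preserving action recognition (PPAR) lets machines understand human activities in video without revealing sensitive visual content. Among PPAR strategies, encryption-based methods provide strong privacy protection while maintaining high recognition performance. However, when the encrypted videos are compressed, these methods suffer a sharp drop in recognition performance and visual quality. To address this, the paper proposes CFE-PPAR, the first compression-friendly encryption method for PPAR. In CFE-PPAR, videos encrypted with secret keys can be recognized directly by a video transformer whose parameters are transformed with the same keys used for video encryption. Experiments show that CFE-PPAR outperforms previous methods on the UCF101 and HMDB51 datasets under Motion-JPEG and H.264 compression.

English abstract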

Privacy-preserving action recognition (PPAR) enables machines to understand human activities in videos without revealing sensitive visual content. Among the various strategies for PPAR, encryption-based methods achieve strong privacy protection while maintaining high recognition performance. However, these methods lead to a catastrophic decrease in recognition performance and visual quality when the encrypted videos are compressed. That is, the previous methods are not compression-friendly. To address these issues, in this paper, we propose the first compression-friendly encryption method for PPAR, called CFE-PPAR. In CFE-PPAR, videos encrypted with secret keys can be directly recognized by a video transformer, which uses parameters transformed by the same keys as those used for video encryption. In experiments, it is verified that CFE-PPAR outperforms previous methods on the UCF101 and HMDB51 datasets under Motion-JPEG and H.264 compression.

2605.05689 2026-05-08 cs.AI

GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model

GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model

Shaozhen Ma, Wei Huang, Hanchen Wang, Dong Wen, Wenjie Zhang

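Affiliations * University of New South Wales; University of Technology Sydney

AI summary GCCM introduces a contrastive consistency objective and feature perturbation to keep consistency training for graph prediction from collapsing into a shortcut solution, improving prediction performance.

Details
AI Chinese abstract (translated)

Conditional generative models, especially diffusion-based methods, have recently been applied to graph prediction by modeling the target as a conditional distribution given the input graph, with results comparable to deterministic predictors. However, existing diffusion-based prediction methods usually require expensive iterative denoising and are often unstable at inference, which has motivated recent efforts to reduce the number of denoising steps and stabilize sampling through techniques such as consistency training. Despite this progress, we find that existing consistency training methods for graph prediction can fall into a shortcut solution: the model may satisfy the self-consistency constraint by ignoring the noisy target (that is, giving it negligible weight) and eventually collapse into a purely deterministic predictor. To mitigate this shortcut, we propose GCCM, a graph contrastive consistency model that goes beyond isolated pairwise matching of the same target at different noise levels by introducing negative samples into a contrastive consistency objective. This adds a separation requirement, so the shortcut solution no longer trivially satisfies the objective. In addition, we apply feature perturbation to the input node and edge features to break identical conditioning on the input graph, so the shortcut no longer yields the same prediction across noise levels and becomes less attractive. Extensive experiments on benchmark datasets show that GCCM mitigates the shortcut problem and achieves consistent gains over deterministic predictors in graph prediction.

English abstract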

Conditional generative models, particularly diffusion-based methods, have recently been applied to graph prediction by modeling the target as a conditional distribution given the input graph, yielding competitive results compared to deterministic predictors. However, existing diffusion-based prediction methods typically require expensive iterative denoising at inference and often suffer from unstable sampling, which motivates recent efforts to reduce inference denoising steps and enable stable sampling via techniques such as consistency training. Despite this progress, we find that existing consistency training methods for graph prediction can fall into a shortcut solution: the model may attempt to satisfy the self-consistency constraint by ignoring the noisy target (i.e., assigning it negligible weight), ultimately collapsing into a purely deterministic predictor. To mitigate this shortcut solution, we propose GCCM, a graph contrastive consistency model that goes beyond isolated pairwise matching between the same target at different noise levels by introducing negative pairs into a contrastive consistency objective. This adds a separation requirement, so the shortcut solution is no longer trivially sufficient to satisfy the proposed objective. Moreover, we apply feature perturbation to the input node/edge features to break identical conditioning on the input graph, so that the shortcut no longer yields the same predictions across noise levels and becomes less attractive. Extensive experiments on benchmark datasets demonstrate that GCCM mitigates the shortcut solution and yields consistent performance improvements in graph prediction compared to deterministic predictors.

2605.05687 2026-05-08 cs.AI

DataDignity: Training Data Attribution for Large Language Models

DataDignity: Training Data Attribution for Large Language Models

Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier

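Affiliations * Microsoft

AI summary The study ranks the documents that support a language-model response through provenance attribution, introduces the FakeWiki benchmark, and evaluates ScoringModel on nine models, raising Recall@10 to 52.2. It shows that training data attribution must separate true answer support from topical or lexical resemblance.

Details
AI Chinese abstract (translated)

Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify the source document that most likely supports a response. We study this problem as pinpoint provenance: given a prompt, a target-model response, and a candidate corpus, rank the documents that support the response. We introduce FakeWiki, a controlled benchmark of 3,537 fabricated Wikipedia-style articles designed to preserve ground-truth provenance while weakening lexical shortcuts. FakeWiki includes QA probes, source-preserving paraphrases, retro-generated variants, hard anti-documents (which stay topically similar but remove key facts), and five query conditions: clean prompting plus four jailbreak-inspired transformations. We evaluate seven retrieval baselines, a training-free activation-steering retrieval-fusion method called SteerFuse, and a supervised contrastive provenance ranker called ScoringModel. ScoringModel maps response and document features into a shared space and is trained with InfoNCE using in-batch, retrieval-mined, and anti-document negatives. Across nine open-weight instruction-tuned LLMs and five query conditions, ScoringModel raises Recall@10 from 35.0 for the strongest retrieval baseline to 52.2 without inference-time fusion, and wins 41 of 45 model-by-condition cells. SteerFuse is usually second-best without any supervised training, which shows that activation-space evidence can effectively complement text retrieval. On jailbreak-inspired transformed queries, ScoringModel improves Recall@10 by 15.7 points on average over the best baseline. Overall, our work shows that robust training data attribution needs evaluation settings that separate true answer support from topical or lexical resemblance.

English abstract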

Auditing language-model outputs often requires more than judging correctness: an auditor may need to identify which source document most likely supports the knowledge expressed in a response. We study this as pinpoint provenance: given a prompt, a target-model response, and a candidate corpus, rank the documents that best support the response. We introduce FakeWiki, a controlled benchmark of 3,537 fabricated Wikipedia-style articles designed to preserve ground-truth provenance while weakening lexical shortcuts. FakeWiki includes QA probes, source-preserving paraphrases, retro-generated variants, hard anti-documents that remain topically similar while removing answer-critical facts, and five query conditions: clean prompting plus four jailbreak-inspired transformations. We evaluate seven retrieval baselines, a training-free activation-steering retrieval-fusion method, SteerFuse, and a supervised contrastive provenance ranker, ScoringModel. ScoringModel maps response and document features into a shared space and is trained with InfoNCE using in-batch, retrieval-mined, and anti-document negatives. Across nine open-weight instruction-tuned LLMs and five query conditions, ScoringModel improves mean Recall@10 from 35.0 for the strongest retrieval baseline to 52.2, without inference-time fusion, and wins 41/45 model-by-condition cells. SteerFuse is usually second-best despite requiring no supervised training, showing that activation-space evidence can efficiently complement text retrieval. On jailbreak-inspired transformed queries, ScoringModel improves Recall@10 by 15.7 points on average over the best baseline. Overall, our work shows that robust training data attribution requires evaluation settings that separate true answer support from topical or lexical resemblance.
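
For readers unfamiliar with the headline metric, Recall@k can be computed as below. This sketch assumes each query has a known set of gold source documents and that scores are reported on a 0 to 100 scale, which matches the magnitudes quoted above but is not confirmed by the abstract; the document IDs are made up.

```python
def recall_at_k(ranked_doc_ids, gold_doc_ids, k=10):
    """Fraction of gold documents that appear in the top-k of the ranked list."""
    gold = set(gold_doc_ids)
    if not gold:
        raise ValueError("gold_doc_ids must be non-empty")
    hits = gold & set(ranked_doc_ids[:k])
    return len(hits) / len(gold)

def mean_recall_at_k(queries, k=10):
    """Average Recall@k over (ranked, gold) pairs, on a 0-100 scale (assumed)."""
    return 100.0 * sum(recall_at_k(r, g, k) for r, g in queries) / len(queries)

# Two hypothetical queries: the first ranks its gold document second, the second misses it.
queries = [
    (["d3", "d1", "d7"], ["d1"]),
    (["d2", "d9"], ["d5"]),
]
score = mean_recall_at_k(queries, k=2)
```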

2605.05685 2026-05-08 cs.LG cs.AI stat.ML

Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting

时间功能电路:从样条图到KAN预测中的忠实解释

Naveen Mysore

Affiliations * University of California, Santa Barbara

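AI summary This paper proposes the Temporal Functional Circuits framework, which uses a gated residual KAN to turn KAN edge functions from latent visualizations into faithful temporal explanations, and reports improved forecasting accuracy and interpretability on complex signals.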

Comments 9 pages, 4 figures, 6 tables, plus appendix. Under review at NeurIPS 2026

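Details
AI Chinese abstract (translated)

Unlike MLPs, Kolmogorov-Arnold Networks (KANs) expose an explicit learnable edge function on every connection, which enables mechanistic explanation in time-series forecasting. This paper introduces Temporal Functional Circuits, a framework that turns KAN edge functions from latent visualizations into faithful temporal explanations. Built on a gated residual KAN, the framework decomposes each forecast into a linear base and a sparsely activated KAN correction. It (i) maps each edge to input lags through output-aware attribution; (ii) ranks edges by learned activation range; and (iii) validates faithfulness through edge-level interventions, including zeroing and spline removal. Removing the learned B-spline component while keeping the base SiLU term degrades forecast accuracy, which is evidence that the spline shape itself carries predictive value beyond the base activation. On four synthetic signals of increasing complexity, the learned gate opens progressively wider as signal complexity grows. On regime-switching signals, the gated KAN achieves 59% lower MSE than linear-only models. Across eight benchmarks, the gated architecture is competitive with linear, attention, and MLP alternatives while providing interpretable edge functions that MLP-based corrections cannot offer.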

English abstract

Unlike MLPs, Kolmogorov-Arnold Networks (KANs) expose explicit learnable edge functions on every connection, enabling mechanistic explanation in time-series forecasting. This paper introduces Temporal Functional Circuits, a framework that transforms KAN edge functions from latent visualizations into faithful, temporally grounded explanations. Built on a gated residual KAN that decomposes forecasts into a linear base and a sparsely activated KAN correction, the framework (i) maps each edge to input lags via output-aware attribution, (ii) ranks edges by learned activation range, and (iii) validates faithfulness through edge-level interventions including zeroing and spline removal. Removing the learned B-spline component while retaining the base SiLU term degrades forecasts, providing evidence that the spline shape itself carries predictive value beyond the base activation. On four synthetic regimes of increasing complexity, the learned gate opens progressively wider as signal complexity grows. On regime-switching signals, gated KAN achieves 59% lower MSE than linear-only models. Across eight benchmarks, the gated architecture is competitive with linear, attention, and MLP alternatives, while providing interpretable edge functions that MLP-based corrections cannot offer.
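
The decomposition into a linear base plus a gated KAN correction, and the two edge-level interventions (zeroing and spline removal), can be sketched as follows. This is a toy under stated assumptions, not the paper's architecture: it uses a single KAN layer with one output, a scalar `gate`, and degree-1 B-splines (hat functions) on uniform knots. The names `Edge`, `forecast`, `use_spline`, and `zeroed` are hypothetical.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def hat_basis(x, knots, i):
    """Degree-1 B-spline basis (hat function) centred on knots[i]; uniform knots assumed."""
    spacing = knots[1] - knots[0]
    return max(0.0, 1.0 - abs(x - knots[i]) / spacing)


class Edge:
    """One edge function: base_weight * SiLU(x) + learned spline(x)."""

    def __init__(self, base_weight, spline_coeffs, knots):
        self.base_weight = base_weight
        self.spline_coeffs = spline_coeffs
        self.knots = knots

    def __call__(self, x, use_spline=True):
        out = self.base_weight * silu(x)
        if use_spline:  # use_spline=False is the "spline removal" intervention
            out += sum(c * hat_basis(x, self.knots, i)
                       for i, c in enumerate(self.spline_coeffs))
        return out


def forecast(lags, linear_weights, edges, gate, use_spline=True, zeroed=()):
    """Linear base plus gated KAN correction, with optional edge-level interventions.

    Assumption: one edge per input lag, and a scalar gate in [0, 1].
    """
    linear_base = sum(w * x for w, x in zip(linear_weights, lags))
    correction = sum(edge(x, use_spline)
                     for i, (edge, x) in enumerate(zip(edges, lags))
                     if i not in zeroed)  # indices in `zeroed` are dropped ("zeroing")
    return linear_base + gate * correction
```

With `gate=0.0` the sketch reduces to the linear-only model. Comparing `forecast(...)` with `forecast(..., use_spline=False)` isolates how much of the prediction comes from the spline shape, not from the base SiLU term. That difference is what the spline-removal intervention in the abstract is meant to measure.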

2605.05678 2026-05-08 cs.AI

Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering

风险链:大型推理模型中的安全故障及通过自适应多原则引导的缓解

Xiaomin Li, Jianheng Hou, Zheyuan Deng, Zhiwei Zhang, Taoran Li, Binghang Lu, Bing Hu, Yunhan Zhao, Yuexing Hao

Affiliations * Harvard University; University of Southern California; Brown University; Pennsylvania State University; Texas A&M University; Purdue University; University of California, Irvine; Massachusetts Institute of Technology

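AI summary The study finds that large reasoning models carry additional safety risks in their reasoning traces, and proposes adaptive multi-principle steering to reduce the rate of unsafe content and improve overall safety.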

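Details
AI Chinese abstract (translated)

Large reasoning models (LRMs) increasingly expose chain-of-thought-like reasoning for transparency, verification, and deliberate problem solving. This creates a safety blind spot: harmful or policy-violating content may appear in the reasoning trace even when the final answer looks safe. We test whether final-answer safety is a sufficient proxy for the full reasoning-answer trajectory by scoring both stages under a unified twenty-principle safety rubric. Using prompts from seven public harmfulness and jailbreak sources plus four out-of-distribution (OOD) sources, we evaluate 15 open-weight and API-based LRMs on 41K prompts per model. Reasoning traces consistently reveal safety risks beyond those in final answers, especially high-severity stage-wise failures: leak cases, where unsafe reasoning precedes a safe-looking answer, and escape cases, where benign-looking reasoning precedes an unsafe final response. Principle-level analysis shows that risk concentrates in misinformation, legal compliance, discrimination, physical harm, and psychological harm. We further propose adaptive multi-principle steering, a white-box test-time mitigation that learns one unsafe-to-safe activation direction per safety principle and activates a direction only when the current hidden state is closer to the unsafe centroid than to the safe centroid. On three steerable open reasoning models, adaptive steering reduces unsafe counts in both reasoning traces and final answers on held-out and OOD benchmarks; DeepSeek-R1-Qwen-7B achieves a 40.8% average unsafe-count reduction while retaining 97.7% macro-averaged accuracy on BBH, GSM8K, and MMLU. These results suggest that LRM safety should be evaluated and mitigated over the full exposed reasoning-answer trajectory, not only at the final-answer stage.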

English abstract

Large reasoning models (LRMs) increasingly expose chain-of-thought-like reasoning for transparency, verification, and deliberate problem solving. This creates a safety blind spot: harmful or policy-violating content may appear in reasoning traces even when final answers appear safe. We test whether final-answer safety is a sufficient proxy for the full reasoning-answer trajectory by scoring both stages under a unified twenty-principle safety rubric. Using prompts from seven public harmfulness and jailbreak sources, plus four out-of-distribution (OOD) sources, we evaluate 15 open-weight and API-based LRMs across 41K prompts per model. Reasoning traces consistently reveal additional safety risks beyond final answers, especially in high-severity stage-wise failures: leak cases, where unsafe reasoning precedes a safe-looking answer, and escape cases, where benign-looking reasoning precedes an unsafe final response. Principle-level analysis shows that risk concentrates in misinformation, legal compliance, discrimination, physical harm, and psychological harm. We further propose adaptive multi-principle steering, a white-box test-time mitigation that learns one unsafe-to-safe activation direction per safety principle and activates a direction only when the current hidden state is closer to the unsafe centroid than to the safe centroid. On three steerable open reasoning models, adaptive steering reduces unsafe counts in both reasoning traces and final answers on held-out and OOD benchmarks. DeepSeek-R1-Qwen-7B achieves a 40.8% average unsafe-count reduction while retaining 97.7% macro-averaged accuracy on BBH, GSM8K, and MMLU. These results suggest that LRM safety should be evaluated and mitigated over the full exposed reasoning-answer trajectory, not only at the final-answer stage.
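
The activation rule described in the abstract (one unsafe-to-safe direction per principle, applied only when the hidden state sits closer to the unsafe centroid) can be sketched in a few lines. The abstract does not say how directions are learned or applied, so the choices below are assumptions for illustration only: each direction is taken as the difference between class centroids, distance is Euclidean, and steering is an additive shift scaled by a hypothetical `strength` parameter. The function names and toy two-dimensional hidden states are likewise hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_principle(unsafe_states, safe_states):
    """Assumed recipe: direction = safe centroid minus unsafe centroid."""
    unsafe_centroid = mean_vector(unsafe_states)
    safe_centroid = mean_vector(safe_states)
    direction = [s - u for s, u in zip(safe_centroid, unsafe_centroid)]
    return {"unsafe": unsafe_centroid, "safe": safe_centroid, "direction": direction}


def adaptive_steer(hidden, principles, strength=1.0):
    """Apply only the directions whose unsafe centroid is nearer than the safe one."""
    steered = list(hidden)
    active = []
    for name, p in principles.items():
        # The test uses the original hidden state, not the partially steered one.
        if distance(hidden, p["unsafe"]) < distance(hidden, p["safe"]):
            active.append(name)
            steered = [h + strength * d for h, d in zip(steered, p["direction"])]
    return steered, active
```

A hidden state that is near the unsafe centroid for one principle but near the safe centroid for another is shifted along the first direction only. This is the sense in which the steering is adaptive and does not apply every direction on every token.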