arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.28607 2026-05-28 cs.AI cs.CL

Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution

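Susanna Cifani, Mario Luca Bernardi, Marta Cimitile

AI summary: Proposes a multimodal multi-agent framework that executes workflows automatically by building a topological knowledge base offline and, online, combining adaptive retrieval-augmented generation with closed-loop collaborative verification.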

Comments
Copyright 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. Accepted for publication at the 2026 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS 2026)
Abstract

Modern information systems require autonomous agents capable of navigating complex workflows, yet current methodologies often struggle with the transition from structured metadata parsing to general environmental perception. While the integration of multimodal large language models (MLLMs) has enabled agents to interact directly with graphical user interfaces (GUIs), existing approaches typically treat task sequences as discrete, linear episodes. This fragmentation prevents agents from capturing the underlying transition topology, limiting their effectiveness in novel or non-stationary scenarios. To address this, we propose a novel multimodal multi-agent framework that achieves automatic workflow execution through a distinct two-phase pipeline. First, during an offline discovery phase, the architecture adaptively constructs a topological knowledge base from fragmented execution logs. During inference, agents leverage Adaptive Retrieval-Augmented Generation (RAG) over this fixed, pre-established graph, coupled with a closed-loop collaborative verification protocol to dynamically self-correct and navigate. This graph-based approach facilitates superior task decomposition and adaptive navigation performance. We validate our framework in a real-world context, demonstrating its ability to maintain high reliability and semantic awareness even with limited training data.

2605.28605 2026-05-28 cs.CV

Internally Referenced Low-Light Enhancement

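Peiyuan He, Hainuo Wang, Hengxing Liu, Mingjia Li, Xiaojie Guo

AI summary: Proposes an internally referenced low-light enhancement framework that extracts physical and structural references from the degraded input and combines local exposure simulation, a dual-domain preservation strategy, and gain-adaptive feature modulation for self-supervised low-light image enhancement, reaching state-of-the-art noise suppression and texture fidelity.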

Abstract

Self-supervised low-light image enhancement (LLIE) is highly appealing as it eliminates the reliance on external paired data. However, the lack of external references causes networks to struggle with decoupling entangled illumination, delicate textures, and amplified noise. To resolve this challenge, we propose an Internally Referenced LLIE framework that extracts reliable physical and structural references from the degraded input image itself. First, we introduce a local exposure-simulated scheme to extract a low-frequency pseudo ground-truth. This serves as an internal physical reference to guide global illumination estimation and correct color casts. Second, we propose a dual-domain preservation strategy with spatial and spectral constraints to construct internal structural references. Specifically, an Illumination-Aligned Perceptual loss preserves global structures under illumination shifts, while a Shift-Invariant Spectral Correlation loss captures fine-grained local structures and suppresses high-frequency noise. Finally, we propose a Gain-Adaptive Feature Modulation (GAFM) mechanism to address highly spatially-variant residual noise. By transforming the self-estimated illumination map into an internal spatial gain prior, GAFM dynamically guides a blind-spot network for spatially-aware denoising. Extensive experiments demonstrate that our method achieves state-of-the-art performance, delivering superior noise suppression and textural fidelity. Code will be publicly released at https://visonj.github.io/IRLE/.

2605.28604 2026-05-28 cs.CV cs.AI

Mining Multi-Modality Spatio-Temporal Cues for Video Important Person Identification

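Xiao Wang, Minglei Yang, Bin Yang, Wenke Huang, Zheng Wang, Xin Xu, Mang Ye

AI summary: To handle person importance that changes over time in video, proposes the VIP-Net framework, which fuses multi-modal spatio-temporal cues and rectifies temporal importance, reaching 67.3% accuracy on the Temporal-VIP dataset.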

Abstract

Identifying key individuals in video scenes is essential for applications such as automated video editing and intelligent surveillance. Current methods primarily focus on static images and immediate visual cues, overlooking the rich spatio-temporal information in videos. This leads to the phenomenon of Temporal Importance Shift (TIS), wherein individuals deemed significant in early frames may be demoted as the entire temporal context is considered. To address this, we introduce the Video Important Person (VIP) identification task, aimed at automatically identifying the most influential individuals in videos while providing textual rationales. We present Temporal-VIP, a large-scale rationale-annotated dataset consisting of 9,249 video segments across 11 categories with aligned importance rationales. To mitigate TIS, we develop the VIP-Net framework, which includes a Social Cue Encoder (SCE) for extracting multi-modal spatio-temporal cues, a Temporal Importance Rectifier (TIR) for hierarchical cue fusion and cross-modal alignment, and VIP Inference for ranking individuals. Experimental results show that VIP-Net achieves 67.3% accuracy, significantly outperforming state-of-the-art models (37.5%-53.9%) and yielding a mean rationale similarity of 0.63 to ground truth through feature-guided LLM refinement. The dataset and code are available at https://huggingface.co/datasets/yml2002/Temporal-VIP.

2605.28603 2026-05-28 cs.LG cs.AI

Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration

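Haonan Wen, Hanyang Chen, Songhe Feng

AI summary: To counter the performance degradation caused by shifting data distributions in online irregular multivariate time series forecasting, proposes Under-Cali, an uncertainty-driven dual-expert calibration framework that achieves stable, efficient online learning through an uncertainty estimator, a dual-expert calibration module, and an adaptive routing module.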

Comments
Accepted by KDD 2026
Abstract

Irregular multivariate time series forecasting is critical in many real-world applications, where time series are irregularly sampled and exhibit dynamically evolving missingness patterns. Although existing methods perform well in offline settings, they often suffer from significant performance degradation when deployed online due to dynamic shifts in data distribution. Maintaining forecasting capability in such dynamic scenarios typically necessitates online adaptation techniques. Since irregular sampling fundamentally undermines temporal continuity and periodicity, we cannot leverage these widely studied characteristics from regular MTS for online learning. To this end, we study the problem of online IMTS forecasting and propose Under-Cali, an uncertainty-driven dual-expert calibration framework consisting of three core components: an uncertainty estimator, a dual-expert calibration module, and an adaptive routing module. We design an uncertainty estimator that serves as the core control signal to jointly manage inference and adaptation processes. In our framework, the uncertainty estimator first assesses uncertainty for each incoming batch. The adaptive routing module then directs samples with high uncertainty to the unreliable expert for calibration, while low uncertainty samples remain with the reliable expert. Subsequently, the system updates the reliable expert and the uncertainty estimator using well-calibrated reliable samples, and updates the unreliable expert with challenging samples, enabling stable and efficient online learning. Under-Cali keeps the source forecasting model frozen and performs adaptation only through a lightweight, model-agnostic calibration module, enabling efficient adaptation. Extensive experiments on IMTS benchmarks demonstrate consistent improvements with low computational cost. Our code is available at https://github.com/HaonanWen/Under-Cali.
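
The routing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `route_by_uncertainty`, the fixed scalar threshold, and the toy uncertainty scores are all assumptions, since the abstract does not say how the routing decision is made.

```python
from typing import List, Sequence, Tuple


def route_by_uncertainty(
    uncertainty: Sequence[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (reliable, unreliable) groups.

    Hypothetical rule: samples whose uncertainty exceeds the threshold go to
    the "unreliable" expert; the rest stay with the "reliable" expert.
    """
    reliable = [i for i, u in enumerate(uncertainty) if u <= threshold]
    unreliable = [i for i, u in enumerate(uncertainty) if u > threshold]
    return reliable, unreliable


# Toy batch of per-sample uncertainty scores (made up for illustration).
scores = [0.1, 0.9, 0.4, 0.7]
reliable_idx, unreliable_idx = route_by_uncertainty(scores, threshold=0.5)
print(reliable_idx)    # [0, 2]
print(unreliable_idx)  # [1, 3]
```

In the framework as described, the two index groups would then drive separate updates: the reliable expert and the uncertainty estimator learn from the first group, and the unreliable expert from the second.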

2605.28602 2026-05-28 cs.AI cs.CL cs.LO

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

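Leizhen Zhang, Shuhan Chen, Sheng Chen

AI summary: Proposes a paired-formula protocol and the Accurate Differentiation Rate (ADR) to evaluate LLM reasoning on SAT problems, finding that conventional metrics can mislead while ADR gives a more faithful assessment that is robust across representations.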

Comments
Accepted at the ACM International Conference on the Foundations of Software Engineering (FSE 2026)
Abstract

Large language models (LLMs) are increasingly used for tasks that implicitly reduce to Boolean satisfiability (SAT), yet their reasoning ability on SAT remains unclear. We present a systematic study of LLMs on 2-SAT and 3-SAT, together with two canonical reductions, Vertex Cover and discrete 3D packing, to probe representation-invariant reasoning. We first evaluate models using conventional metrics, including accuracy, precision, recall, and F1, as well as the SAT phase-transition setting. We find that these metrics can be misleading: many models obtain high scores by over-predicting satisfiable formulas, fail to reproduce the classical easy-hard-easy signature around the 3-SAT threshold, and degrade sharply as the number of variables grows. To address this problem, we introduce a paired-formula protocol based on minimally different satisfiable and unsatisfiable instances, together with Accurate Differentiation Rate (ADR), which requires both members of each pair to be classified correctly. ADR separates reasoning-oriented models from heuristic ones and correlates with witness validity. Beyond CNF, we test cross-representation consistency by converting CNF to Vertex Cover and 3-SAT to discrete 3D packing. Model decisions on CNF and on the corresponding graph or packing instances agree for most models on more than 80 percent of instances, suggesting stable decision rules across representations. Overall, our results show that SAT is a conservative probe for LLM reasoning, and that paired evaluation with ADR provides a more faithful and representation-robust assessment than conventional metrics.
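
The difference between per-instance accuracy and ADR can be made concrete with a small sketch. It follows the definition given in the abstract (a pair counts only if both members are classified correctly); the data layout, function names, and toy predictions are assumptions made for illustration.

```python
from typing import List, Tuple

# Each pair stores the model's answers for (satisfiable member, unsatisfiable member).
# True means the model answered "SAT". The predictions below are made up.
Pair = Tuple[bool, bool]


def accuracy(pairs: List[Pair]) -> float:
    """Fraction of individual instances classified correctly."""
    correct = sum(int(sat_pred) + int(not unsat_pred) for sat_pred, unsat_pred in pairs)
    return correct / (2 * len(pairs))


def accurate_differentiation_rate(pairs: List[Pair]) -> float:
    """Fraction of pairs in which both members are classified correctly."""
    both = sum(1 for sat_pred, unsat_pred in pairs if sat_pred and not unsat_pred)
    return both / len(pairs)


# A model that always answers "SAT": right on every satisfiable member,
# wrong on every unsatisfiable member.
always_sat = [(True, True)] * 4
print(accuracy(always_sat))                       # 0.5
print(accurate_differentiation_rate(always_sat))  # 0.0
```

On a balanced paired set, a model that over-predicts "SAT" still earns half credit under accuracy but scores zero under ADR, which is the kind of heuristic behavior the paired protocol is meant to expose.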

2605.28600 2026-05-28 cs.LG

Transformers Provably Learn to Internalize Chain-of-Thought

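Yixiao Huang, Hanlin Zhu, Zixuan Wang, Jiantao Jiao, Stuart Russell, Somayeh Sojoudi, Song Mei

AI summary: Proposes the Log-ICoT training curriculum, under which an $L$-layer transformer learns $k$-parity with polynomially many samples, matching the sample efficiency of explicit CoT while eliminating its inference overhead.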

Abstract

Chain-of-Thought (CoT) prompting substantially improves the sample efficiency of transformers, reducing the complexity of tasks like parity learning from exponential to polynomial in the input length. However, generating explicit reasoning steps at inference is computationally expensive. Implicit Chain-of-Thought (ICoT) has emerged as a promising empirical remedy that trains models to internalize intermediate steps within their hidden states, but its theoretical foundations remain poorly understood. We give the first theoretical analysis of ICoT, proving that an $L$-layer transformer trained under our proposed Log-ICoT curriculum learns $k$-parity with $\mathsf{poly}(n)$ samples and $L = \log_2 k$ training stages. This matches the sample efficiency of explicit CoT while eliminating its inference overhead, and extends prior one-layer parity guarantees to multi-layer architectures. Compared to standard ICoT, which removes thinking tokens one at a time, Log-ICoT removes them in geometric chunks, reducing the number of stages from linear in $k$ to logarithmic. Experiments on multi-layer transformers confirm the theory and visualize how reasoning is progressively absorbed into deeper layers.
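
The stage counts can be illustrated with a toy schedule. This sketch assumes that $k$ is a power of two and that "geometric chunks" means the number of remaining thinking tokens is halved at each stage until one is left; the abstract does not give the exact chunk sizes, so both helper functions are hypothetical.

```python
from typing import List


def one_at_a_time_schedule(k: int) -> List[int]:
    """Remaining thinking tokens after each stage, removing one token per stage."""
    return list(range(k - 1, 0, -1))


def halving_schedule(k: int) -> List[int]:
    """Remaining thinking tokens after each stage, halving the count per stage."""
    remaining, stages = k, []
    while remaining > 1:
        remaining //= 2
        stages.append(remaining)
    return stages


k = 8
print(one_at_a_time_schedule(k))  # [7, 6, 5, 4, 3, 2, 1]
print(halving_schedule(k))        # [4, 2, 1]
```

Under these assumptions the one-at-a-time schedule needs a number of stages linear in $k$, while the halving schedule needs $\log_2 k$ stages, which matches the scaling stated in the abstract.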

2605.28598 2026-05-28 cs.CL cs.AI

Evaluating the Realism of LLM-powered Social Agents: A Case Study of Reactions to Spanish Online News

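Alejandro Buitrago López, Alberto Ortega Pastor, Javier Pastor-Galindo, José A. Ruipérez-Valiente

AI summary: Compares real and LLM-generated comments on Spanish news along three dimensions (hate speech, sentiment, and semantic alignment), finding that off-the-shelf models are poor proxies and that fine-tuning only partly closes the gap.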

Abstract

LLM-powered social agents are increasingly used to simulate online social behavior, yet their realism remains difficult to validate. Existing work has largely relied on general-purpose benchmarks, while less attention has been paid to short, reactive discourse such as audience replies to online news. In this paper, we evaluate whether LLM-generated reactions to Spanish online news reproduce measurable properties of real audience discourse. Using the Hatemedia dataset, we pair 5,631 news items with 58,555 real audience reactions, and generate a matched synthetic dataset using five LLMs under a shared experimental setting. We compare real and synthetic reactions across three dimensions: hate speech, sentiment, and semantic alignment, considering both off-the-shelf and fine-tuned generation. Results show that off-the-shelf models are poor proxies for real audience reactions: they strongly underproduce hate speech, introduce model-specific sentiment biases, and remain distributionally distant from human replies. Fine-tuning improves fidelity unevenly. Qwen3 provides the most balanced approximation, while Mistral7B achieves the strongest sentiment and semantic alignment but overshoots hate prevalence. Plausible synthetic replies do not necessarily reproduce the distributional properties of public discourse.

2605.28597 2026-05-28 cs.CR cs.AI cs.LG

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

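Jianwei Li, Jung-Eun Kim

AI summary: Argues for retiring the "positive backdoor" label and treating trigger-activated hidden behaviors as Secret Alignment; evaluates three representative applications across six core properties, exposes their brittleness, and calls for rigorous evaluation.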

Comments
ICML 2026
Abstract

This position paper argues that the AI/ML community should stop overclaiming and retire the label "positive backdoor," and instead treat trigger-activated hidden behaviors as Secret Alignment. Crucially, protective claims based on Secret Alignment should be presumed not secure by default unless supported by rigorous, standardized evaluation. The Private AI era, enabled by open-weight LLMs and accessible training/inference stacks, turns language models into privately owned digital assets, creating security concerns around unauthorized access, model theft, and behavioral misuse. Recently, a line of work framed as "positive backdoors" has been proposed to address these challenges. To ground our position in evidence, we unify these proposals as covert trigger-behavior associations for access gating, ownership attribution, and safety enforcement, and evaluate three representative applications across six core properties: effectiveness, harmlessness, persistence, efficiency, robustness, and reliability. Our results reveal substantial brittleness of trigger-behavior mappings, especially in confidentiality, integrity, and availability (CIA), that is often underrepresented by existing claims. We further relate these outcomes to behavior density and decision complexity, offering a behavioral lens for understanding deployment-time risks and motivating community-wide evaluation that makes Secret Alignment claims provable.

2605.28596 2026-05-28 astro-ph.CO cs.LG

Dark Quest II: A Wide-Coverage Neural Network Emulator of the Nonlinear Matter Power Spectrum Across Extended Cosmologies

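Satoshi Tanaka, Takahiro Nishimichi, Yosuke Kobayashi

AI summary: Presents DarkEmulator2, a neural network emulator that predicts the nonlinear matter power spectrum to subpercent accuracy in a nine-dimensional w0waνoCDM parameter space by training jointly on multi-resolution simulations.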

Comments
53 pages, 44 figures, emulator code available at https://github.com/DarkQuestCosmology/dark_emulator2_public
Abstract

\textsc{DarkEmulator2} is a neural network emulator of the nonlinear matter power spectrum in a nine-dimensional $w_0 w_a \nu o \mathrm{CDM}$ parameter space, developed as the emulator component of the \textsc{Dark Quest II} (DQ2) program. It is trained on simulations generated with the \textsc{Ginkaku} code, whose numerical implementation, accuracy tests, and post-processing pipeline are described in the companion paper. The design follows a unified strategy: in addition to the cosmological parameter vector, we supplement the neural network's inputs with three families of physically motivated auxiliary quantities -- the linear matter power spectrum, descriptors of the simulation resolution, and a low-dimensional summary of the initial Gaussian random field -- that are expected to improve generalization across the parameter space. Training a single network jointly across three simulation resolution tiers allows the emulator to exploit a small number of high-resolution simulations while retaining broad coverage from lower-resolution simulations. For a $L_{\mathrm{box}}=1\,h^{-1}\,\mathrm{Gpc}$ box with $N=3000^{3}$ particles, the emulator reproduces the simulated matter power spectrum to subpercent accuracy up to the particle Nyquist scale, $k_{\mathrm{Ny}}\simeq 10\,h\,\mathrm{Mpc}^{-1}$. The emulator remains accurate over the calibrated wavenumber range, while its highest-$k$ predictions depend on the simulation resolution and shot noise. We validate the emulator on independent test suites and, through a cross-comparison with several public emulators and widely used fitting formulas, characterize the inter-model consistency and the parameter-dependent trends in their residuals.
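
As a quick consistency check on the quoted numbers, one can apply the standard definition of the particle Nyquist wavenumber. The abstract does not state this formula, so its use here is an assumption. Taking the box side as $1\,h^{-1}\,\mathrm{Gpc} = 1000\,h^{-1}\,\mathrm{Mpc}$ and $N^{1/3} = 3000$ particles per side:

```latex
k_{\mathrm{Ny}}
  = \frac{\pi\, N^{1/3}}{L_{\mathrm{box}}}
  = \frac{\pi \times 3000}{1000\,h^{-1}\,\mathrm{Mpc}}
  \approx 9.4\,h\,\mathrm{Mpc}^{-1}
```

This is in line with the value of roughly $10\,h\,\mathrm{Mpc}^{-1}$ given in the abstract.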

2605.28594 2026-05-28 cond-mat.stat-mech cs.AI physics.comp-ph

Thermodynamic properties of chemically disordered compounds via AI-driven estimation of partition function with the PULSE method

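Baptiste Bernard, Luca Messina, Eiji Kawasaki, Emeric Bourasseau

AI summary: Presents an improved PULSE method that samples and estimates the partition function with unsupervised learning to compute thermodynamic properties of chemically disordered compounds at low cost, validated for accuracy and efficiency on the 2D Ising model.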

Comments
13 pages, 11 figures, submitted to Physical Chemistry Chemical Physics
Abstract

In this article, we present an improved version of the PULSE method (Partition function Unsupervised Learning Sampling and Evaluation) for estimating the thermodynamic properties of chemically disordered compounds. The aim is to reduce the computational cost of Monte Carlo approaches for this type of material and to demonstrate that this generative tool can estimate thermodynamic properties by sampling and estimating the partition function of the system. To validate this innovative approach, we use the 2D Ising model as a benchmark. We demonstrate that our method accurately reproduces average properties with high precision and efficiency compared to traditional Monte Carlo sampling methods. Our results highlight the efficiency and adaptability of the PULSE method, making it a valuable tool for studying materials for which conventional methods are too inefficient to compute properties affected by chemical disorder at low cost.

2605.28592 2026-05-28 cs.LG

PLS in the Mirror of Self-Attention

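Jiangsheng, You

AI summary: Casts partial least squares (PLS) as a linearized self-attention mechanism so that PLS can be studied within the neural network framework; conversely, the dimensionality reduction and predictor selection in PLS suggest that self-attention includes a degree of dimensionality normalization that improves learning.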

Abstract

This note provides an interesting observation on casting partial least squares (PLS) as a linearized self-attention so that PLS may be studied within the neural network paradigm. On the other hand, the dimensionality reduction and selection of predictors in PLS may indicate that self-attention includes a certain degree of dimensionality normalization toward improved learning.

2605.28589 2026-05-28 cs.LG

Thinned Mean Field Langevin Dynamics

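Zonghao Chen, Heishiro Kanagawa, François-Xavier Briol, Chris J. Oates, Lester Mackey

AI summary: Proposes the KT-MFLD algorithm, which uses kernel thinning to cut the complexity of particle interactions from O(N^2) to O(N^{3/2}) while keeping the same convergence guarantees as MFLD.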

Journal ref
International Conference on Machine Learning, 2026
Abstract

Several important learning tasks can be formulated as minimizing an entropy-regularized objective over an appropriate space of probability distributions. Mean-field Langevin dynamics (MFLD) facilitate computation in this general context, casting the minimizer as the invariant distribution of a McKean--Vlasov process, which can be numerically discretized using $N$ particles and thus simulated. However, simulating this interacting particle system has computational complexity of order $N^2$. Motivated by recent research into \emph{kernel thinning}, we propose \texttt{KT-MFLD}, in which each particle interacts only with a thinned particle coreset of size $\mathcal{O}(N^{\frac{1}{2}})$. \texttt{KT-MFLD} thus reduces the computational complexity to order $N^{\frac{3}{2}}$ while, under mild regularity conditions, achieving the same convergence guarantees (up to logarithmic factors) as MFLD. Our theoretical analysis is empirically confirmed on tasks including the training of student-teacher neural networks, quantization with maximum mean discrepancy, and computation of predictively-oriented posteriors in a post-Bayesian framework.

2605.28588 2026-05-28 cs.CR cs.AI

Technical Report: Exploring the Emerging Threats of the Agent Skill Ecosystem

Luca Beurer-Kellner, Aleksei Kudrinskii, Marco Milanta, Kristian Bonde Nielsen, Hemang Sarkar, Liran Tal

AI summary: Analyzes 3,984 AI agent skills, finds 76 malicious payloads, exposes security threats in the skill ecosystem, and proposes a threat taxonomy and attack patterns.

Comments
10 pages, technical report
Abstract

We analyzed 3,984 AI agent skills from major marketplaces and found 76 confirmed malicious payloads, including credential theft, backdoor installation, and data exfiltration. 13.4% of all skills contain at least one critical-level security issue and at least 8 manually confirmed malicious skills remain publicly available on clawhub.ai as of the date of publication. This report documents our methodology, presents a threat taxonomy based on real-world samples, and details the attack patterns we observed. As skill marketplaces grow rapidly and AI agents gain access to sensitive credentials and systems, automated security analysis is no longer optional.

2605.28587 2026-05-28 cs.CV

Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation

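Yang Gao, Wuyang Li, Po-Chien Luan, Alexandre Alahi

AI summary: Proposes the DeGO framework, which separates rigid and nonrigid motion in dynamic scenes under weak supervision through decoupled Gaussian deformation and factorized 4D foundation-model distillation, substantially improving occupancy prediction for human instances.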

Comments
CVPR 2026
Abstract

Understanding dynamic 3D environments is essential for safe autonomous driving, particularly when reasoning about human-centric, nonrigid agents. However, existing weakly supervised occupancy prediction frameworks predominantly assume rigid-body motion and rely on simple frame-to-frame offsets, limiting their ability to capture fine-grained deformations and maintain temporal coherence. To address this issue, we propose DeGO, a deformable Gaussian occupancy framework that unifies decoupled Gaussian deformation with factorized 4D foundation-model distillation. DeGO disentangles rigid and nonrigid motion, enabling each Gaussian primitive to evolve through both deformation and offset-based updates. In parallel, a factorized 4D distillation strategy transfers cross-camera and cross-frame knowledge from the VGGT foundation model, producing foundation-aligned features that enhance temporal consistency. Experiments on the Occ3D-NuScenes benchmark demonstrate that our method achieves state-of-the-art performance under weak supervision, delivering 13.5% gains on human-centric instances and 10.9% overall improvements. These results highlight the effectiveness of deformation-aware and foundation-guided occupancy modeling for dynamic scene understanding. The code is publicly available: https://github.com/vita-epfl/DeGO

2605.28585 2026-05-28 cs.LG

Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization

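Kristi Topollai, Allan Ma, Tolga Dimlioglu, Sui Jiet Tay, Anna Choromanska

AI summary: Studies periodic restarting of the outer momentum in distributed optimization to control outer-memory effects, and shows through theory, toy experiments, and language-model pretraining that it widens the stable range.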

Abstract

Communication-efficient distributed optimizers such as DiLoCo reduce synchronization costs by letting workers perform many local updates before aggregating their progress with an outer momentum optimizer. Recent theory suggests that the outer optimizer acts on an effective spectrum induced by the inner optimization loop, and that the choice of outer momentum controls how progress from local updates is accumulated across communication rounds. We study periodic restarting of the outer momentum as a simple complementary mechanism for controlling this outer memory. In a linearized squared-loss model where prediction-space residuals evolve under the empirical NTK, we derive a mode-wise restart contraction showing that resets exploit phase cancellation by discarding stale momentum while preserving inner-loop progress. Toy experiments verify the predicted contraction behavior, and language-model pretraining experiments show that periodic restarts widen the stable range of outer learning rates and momentum values across communication periods.
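
The mechanics of an outer-momentum restart can be sketched on a one-dimensional quadratic. This is a simplified toy, not the paper's setup: it uses a single worker, plain heavy-ball momentum for the outer step (rather than whatever outer optimizer the authors use), and hypothetical parameter names and default values.

```python
from typing import Optional


def run_outer_loop(
    rounds: int,
    restart_every: Optional[int] = None,
    x0: float = 1.0,
    curvature: float = 1.0,
    inner_lr: float = 0.1,
    inner_steps: int = 5,
    outer_lr: float = 1.0,
    outer_momentum: float = 0.9,
) -> float:
    """Minimize f(x) = 0.5 * curvature * x**2 with local steps plus outer momentum."""
    x, m = x0, 0.0
    for t in range(rounds):
        # Periodic restart: discard the accumulated outer momentum.
        if restart_every is not None and t % restart_every == 0:
            m = 0.0
        # Inner loop: several local gradient steps starting from the shared iterate.
        x_local = x
        for _ in range(inner_steps):
            x_local -= inner_lr * curvature * x_local
        # Outer step: treat the local progress as a pseudo-gradient.
        delta = x - x_local
        m = outer_momentum * m + delta
        x -= outer_lr * m
    return x


print(abs(run_outer_loop(50)))                   # no restarts
print(abs(run_outer_loop(50, restart_every=5)))  # momentum reset every 5 rounds
```

The restart only clears the momentum buffer `m`; the iterate `x`, which carries the progress made by the inner loop, is left untouched.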

2605.28583 2026-05-28 cs.RO cs.AI cs.LG cs.SY eess.SY

SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving

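Kangyu Wu, Peng Cui, Guoxi Chen, Ya Zhang

AI summary: Proposes the SARAD framework, which combines large language models with deep reinforcement learning and uses retrieval-augmented generation and a collision prediction module to improve the safety and efficiency of autonomous driving.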

Comments
7 pages, 4 figures, accepted by IJCNN 2026
Abstract

Ensuring both safety and efficiency in decision-making for autonomous driving systems remains a fundamental challenge. Traditional Deep Reinforcement Learning (DRL) suffers from unsafe random exploration and slow convergence, while Large Language Models (LLMs) demonstrate inherent latency in real-time inference operations. To address these limitations, this paper proposes SARAD, a novel safety-aware hybrid framework that synergizes LLMs and DRL for autonomous driving. SARAD substitutes the random exploration of DRL with Retrieval-Augmented Generation (RAG)-enhanced, LLM-guided decisions sourced from a dynamic expert knowledge repository. An attention discriminator is proposed to integrate the prior knowledge of LLMs into DRL policy optimization. A collision predictor module, fine-tuned with historical collision data, is further designed to improve vehicle safety. Extensive experiments show that SARAD achieves significant performance improvements in the Highway-Env simulator, validating the effectiveness of the proposed model in autonomous driving.

2605.28578 2026-05-28 cs.LG

A Generalized Tikhonov Layer for Interpretable-by-design Graph Neural Networks

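Nicolas Tremblay, Benjamin Ricaud, Filippo Maria Bianchi

AI summary: Proposes the Tikhonov layer, which makes graph neural networks interpretable through learnable node-importance scores and a learnable polynomial; its output is the exact solution of a generalized graph Tikhonov problem.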

Details
Abstract

We propose the Tikhonov layer, a graph neural network layer that is interpretable by design: once trained, its learned parameters directly reveal which node features and which aspects of the graph topology were leveraged for prediction. In practice, the layer's propagation matrix takes the closed form $R = (p(L)+Q)^{-1} Q$, where $L$ is the normalized graph Laplacian, $Q = \mathrm{diag}(q_1,\dots,q_n)$ is a learnable diagonal matrix of positive node-importance scores, and $p(\cdot)$ is a learnable polynomial. For any input feature $x$, the layer output $Rx$ is the exact minimizer of a generalized graph Tikhonov problem that trades off node-level data fidelity against a topology-driven regularization penalty. The learned pair $\{\{q_i\},p\}$ constitutes a built-in explanation: large $q_i$ indicates that node $i$'s own features drive the prediction, while small $q_i$ signals reliance on the local graph topology; the shape of $p$ reveals whether homophily, heterophily, or a band-pass response is exploited. Expressivity is preserved by routing complexity through a dedicated, arbitrarily deep Q-network that produces the importance scores, while the Tikhonov layer itself remains transparent. We prove that distinct node-importance matrices yield distinct propagation operators, structurally coupling the explanation to the computation. Additionally, the Tikhonov layer provides, in a single layer, a global receptive field, mitigating both oversmoothing and oversquashing. Experiments on standard graph classification benchmarks confirm that the model matches (and sometimes outperforms) opaque baselines while producing interpretable and faithful explanations.
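
As a worked check of the closed form quoted in the abstract, the derivation below uses one objective that is consistent with it. The objective itself, and the assumption that $p(L)$ is symmetric positive semidefinite, are illustrative assumptions rather than the paper's stated formulation; $y$ denotes a candidate layer output.

$$
\min_{y \in \mathbb{R}^n} \; (y - x)^\top Q\,(y - x) \;+\; y^\top p(L)\, y
$$

$$
2Q(y - x) + 2\,p(L)\,y = 0
\;\Longrightarrow\;
(p(L) + Q)\,y = Qx
\;\Longrightarrow\;
y = (p(L)+Q)^{-1} Q\,x = Rx
$$

Under these assumptions $p(L)+Q$ is positive definite because every $q_i > 0$, so the minimizer is unique. The first term is the node-level fidelity weighted by $q_i$ and the second is the topology-driven penalty, which agrees with the reading that a large $q_i$ keeps node $i$ close to its own feature.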

2605.28577 2026-05-28 cs.AI cs.LG

Continual Model Routing in Evolving Model Hubs

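Jack Bell, Giacomo Carfì, Gerlando Gramaglia, Vincenzo Lomonaco

AI summary: Addressing the model-selection and routing-update challenges created by rapidly expanding model hubs, formulates the Continual Model Routing (CMR) problem, builds the large-scale benchmark CMRBench, and designs CARvE, a contrastive-embedding method that routes efficiently through checkpoint anchoring and structured replay and significantly outperforms several baselines.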

Details
Comments
42 pages, 24 tables, 6 figures, to be published at ICML 2026
Abstract

AI model hubs provide access to a rapidly growing collection of powerful pre-trained models, enabling off-the-shelf mixture-of-experts systems with different routing strategies. However, this rapid growth poses two fundamental challenges: scaling model selection across thousands of experts and continually updating routing mechanisms as new models and tasks are introduced. In this paper, we formalise this setting as Continual Model Routing (CMR) and propose CMRBench, a new large-scale benchmark simulating realistic hub expansion and including over 2,000 candidate models. Finally, we introduce CARvE, a contrastive embedding approach for efficient continual model routing via checkpoint-based anchoring and structured replay. Extensive empirical results and ablations show that CARvE significantly outperforms zero-shot retrieval, fine-tuning, and adapter-merging baselines in model, family, and domain-level accuracy.

2605.28575 2026-05-28 cs.AI

A Conflict-Aware Penalty and Statistical Loss Framework for Balancing Modalities and Enhancing Stability in Multimodal Sentiment Analysis

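Jianheng Dai, Jiazhang Liang, Sijie Mai

AI summary: To counter the gradient conflicts caused by text-modality dominance in multimodal sentiment analysis, proposes a conflict-aware penalty and statistical loss framework that balances modalities and stabilizes training, achieving state-of-the-art performance on CMU-MOSI.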

Details
Abstract

Multimodal Sentiment Analysis (MSA) fuses text, acoustic, and visual streams to infer sentiment. Because pre-trained text encoders are far more expressive than their acoustic and visual counterparts, the text modality tends to dominate optimization, suppressing weaker modalities and inducing gradient norm conflicts that destabilize training. To address this, we propose a Conflict-aware Penalty (CP) that detects and penalizes gradient norm conflicts at each training step, and a Statistical Loss (SL) that aligns predicted distribution statistics with empirical input statistics. Crucially, CP prevents dominant modality gradients from interfering with the SL objective, enabling synergistic training within a unified framework incorporating adaptive modality encoding, gated cross-modal fusion, and unimodal auxiliary heads. Experiments on CMU-MOSI demonstrate state-of-the-art performance, with ablation studies confirming the effectiveness of each component.

2605.28573 2026-05-28 cs.LG cs.AI

Efficient Pre-Training of LLMs through Truncated SVD Layers

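Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon, Babak Hodjat, Risto Miikkulainen

AI summary: Proposes the TSVD framework, which uses a spectral-energy heuristic for adaptive rank selection and a caching mechanism to maintain low rank and strict orthonormality, matching or exceeding full-parameter baselines while reducing computational overhead.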

Details
Abstract

The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead, most existing methods rely on static rank selection and do not enforce weight orthonormality due to high computational cost. This paper introduces TSVD, a framework that maintains low rank and strict orthonormality throughout the training process. It utilizes a spectral energy-based heuristic for adaptive rank selection and a caching mechanism to maintain orthonormality. Theoretical analysis justifies the advantage of the approach in pretraining dynamics, and experiments across various model scales demonstrate that it is effective empirically. TSVD matches or exceeds the performance of full-parameter baselines while significantly reducing compute requirements. The approach thus offers a well-founded, practical, and scalable path toward efficient high-performance LLM pretraining.
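
The abstract does not spell out the rank-selection rule. The following is a minimal sketch of one common spectral-energy heuristic, assuming the rank is the smallest k whose leading singular values retain a chosen fraction of the total squared singular values. The function name `select_rank` and the `energy_threshold` parameter are hypothetical and are not taken from the paper.

```python
def select_rank(singular_values, energy_threshold=0.9):
    """Return the smallest rank k whose top-k singular values keep at least
    `energy_threshold` of the total spectral energy (sum of squares).

    Illustrative heuristic only; the TSVD paper's actual rule may differ.
    """
    if not 0.0 < energy_threshold <= 1.0:
        raise ValueError("energy_threshold must be in (0, 1]")
    # Spectral energy of each component, largest first.
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 0  # a zero matrix needs no components
    kept = 0.0
    for k, energy in enumerate(energies, start=1):
        kept += energy
        if kept >= energy_threshold * total:
            return k
    return len(energies)


# Hypothetical singular values of one weight matrix.
rank_90 = select_rank([3.0, 2.0, 1.0, 0.5], energy_threshold=0.9)
rank_95 = select_rank([3.0, 2.0, 1.0, 0.5], energy_threshold=0.95)
```

A flatter spectrum yields a larger rank for the same threshold, which is the sense in which such a rule adapts the rank per layer instead of fixing it in advance.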

2605.28567 2026-05-28 cs.LG cs.AI

Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression

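Tue M. Cao, Nguyen Do, My T. Thai

AI summary: Proposes an optimal-transport-based distributional framework that uses activation-weighted distributions and Wasserstein distance to address cross-layer feature matching and circuit compression in a unified way.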

Details
Comments
preprint
Abstract

Sparse autoencoders (SAEs) have become a central tool for interpreting language models. However, two key SAE analyses that remain difficult to scale are (1) matching semantically similar features across multiple layers and (2) compressing large feature circuits into interpretable supernodes. Although these have been treated as separate problems, we show that both are instances of a more fundamental challenge, which we frame as the estimation of semantic distances between SAE features that lie on different activation manifolds. We introduce a distributional framework for this problem, in which each feature is represented not by a single decoder vector, as in the literature, but by an activation-weighted distribution over the hidden states that express it. By projecting these distributions into a shared reference space and comparing them with Wasserstein distance, our method provides a unified semantic metric for cross-layer feature comparison. We prove that our representation is invariant to activation rescaling, stable under perturbations, and recovers true matches under finite-sample margin conditions. Empirically, our method outperforms decoder-vector and LLM-based baselines and captures subtle functional distinctions between related features. Notably, our method compresses large feature circuits into interpretable supernodes automatically.

2605.28566 2026-05-28 cs.AI cs.LG

Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns

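Guni Sharon

AI summary: Uses a unified taxonomy based on classical heuristic search terminology to map LLM-based reasoning onto search components, and identifies two design patterns: systematic search and lookahead-heavy strategies.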

Details
Journal ref
Proceedings of the Nineteenth International Symposium on Combinatorial Search (SoCS 2026), AAAI Press, 2026
Comments
Extended version of the SoCS 2026 paper. Includes appendices omitted from the proceedings version
Abstract

Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, yet their standard generation process -- auto-regressive token prediction -- is inherently myopic and prone to cascading errors. To address this, the Tree-of-Thoughts (ToT) framework creates a search space over intermediate reasoning steps, allowing search models to explore, look ahead, and backtrack. However, current ToT research remains fragmented across Natural Language Processing and Automated Planning communities, often using inconsistent terminology and ad-hoc implementations. Consequently, we synthesize the ToT landscape through a unified taxonomy based on classical heuristic search terminology. We map LLM-based reasoning to classical search components: state representation (granularity of thoughts), successor generation (prompting operators), and heuristic evaluation (self-assessment of progress). We analyze existing work within the context of our taxonomy and identify emerging design patterns: systematic search (Best-First Search) for shallow, deterministic tasks and lookahead-heavy strategies (DFS, MCTS) for deep multi-step reasoning. We conclude by identifying open algorithmic challenges at the intersection of heuristic search and LLM reasoning, and call on the heuristic search community to engage with this emerging domain.
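
The mapping onto search components can be illustrated with a greedy best-first search in which the three callbacks stand in for the LLM-driven parts: `successors` for the prompting operator, `heuristic` for self-assessment of progress, and `is_goal` for the termination check. This is a toy sketch under those assumptions; the number-summing task and all names are hypothetical and involve no actual LLM calls.

```python
import heapq


def best_first_search(start, successors, heuristic, is_goal, max_expansions=1000):
    """Expand the state with the lowest heuristic value first.

    Returns a goal state, or None if the frontier empties or the budget runs out.
    """
    tie = 0  # insertion counter so the heap never has to compare states
    frontier = [(heuristic(start), tie, start)]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                tie += 1
                heapq.heappush(frontier, (heuristic(nxt), tie, nxt))
    return None


# Toy task: pick numbers from STEPS whose sum is exactly TARGET.
TARGET = 20
STEPS = (4, 7, 9)


def successors(state):
    # "Thought generation": extend the partial solution by one step.
    return [state + (s,) for s in STEPS if sum(state) + s <= TARGET]


def heuristic(state):
    # "Self-assessment": remaining distance to the target.
    return TARGET - sum(state)


def is_goal(state):
    return sum(state) == TARGET


solution = best_first_search((), successors, heuristic, is_goal)
```

The search first commits to `(9, 9)`, finds it is a dead end, and falls back to the next-best frontier entry, which is the explore and backtrack behavior the abstract attributes to ToT-style methods.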

2605.28565 2026-05-28 cs.DL cs.AI cs.CL cs.IR

Verified Misguidance: Measuring Structural Citation Failures in Search-Augmented LLMs

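Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang, Dongha Lee

AI summary: To address citation trustworthiness in search-augmented LLMs, introduces the CITETRACE dataset and a three-dimension evaluation framework, and finds a systematic "verified misguidance" pattern: models cite real, accessible sources but fall short on intent alignment, source suitability, or answer-source fidelity, exposing users to structurally misleading citations.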

Details
Comments
Work in progress
Abstract

Users of search-augmented LLMs rely on citations as evidence that responses are grounded in real sources, and rarely verify the cited pages themselves. Millions of queries per day now pass through these systems, making citation quality a silent determinant of whether users are informed or misled, yet existing benchmarks each address one facet in isolation, leaving the joint structure that determines citation trustworthiness unmeasured. We construct CITETRACE, a large-scale dataset that traces the full citation chain from user query through retrieved source to generated answer: 11,200 real-world queries from 28 communities paired with 112,000 responses from ten models across five providers, yielding 761,495 evaluable citation pairs. We design a three-dimension evaluation framework that scores each citation on intent-purpose alignment, source suitability, and answer-source fidelity, using expert-validated predefined matrices and a five-level fidelity rubric; the framework applies to any system that produces citation-bearing responses. Applying this framework at scale, we identify a systematic pattern we call VERIFIED MISGUIDANCE (VM): models cite real, accessible sources yet fail along one or more dimensions, producing a fidelity-suitability trade-off in which faithful models select inappropriate sources and vice versa. Across our pool, 30.6% of citations distort their sources and 27.1% originate from domain-inappropriate sources; at the response level, up to 96% of users encounter at least one structurally misleading citation. Provider-level differences explain 88-96% of citation-quality variance, suggesting that source selection is governed more by factors beyond individual model capability than by the LLMs themselves. Together, CITETRACE and its evaluation framework provide the first resource for diagnosing structural citation failures in deployed search-augmented systems.

2605.28563 2026-05-28 cs.LG cs.AI

A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

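Aditya Kommineni, Emily Zhou, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan

AI summary: Proposes a multi-dimensional framework for systematically evaluating the generalization of EEG foundation models (such as LaBraM, CSBrain, and CBraMod) under low-resource conditions, finding that they excel on long-context tasks, perform on par with supervised models on short-window BCI tasks, and lack robustness to channel restrictions.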

Details
Comments
24 pages, 5 figures
Abstract

Evaluating foundation models under appropriate adaptation settings is essential for understanding the quality and transferability of the learned representations. Recent EEG foundation models have demonstrated promising transfer capabilities across tasks and datasets, motivating their growing use in neurotechnology and clinical applications. However, these models are typically evaluated under full fine-tuning on well-curated downstream datasets, a setting that does not reflect biomedical domain constraints such as limited labeled data, reduced sensor coverage, or parameter-efficient adaptation. In this work, we propose a multi-dimensional evaluation framework for assessing EEG models under realistic low-resource conditions. Empirical analysis of both supervised EEG models and recent EEG foundation models, including LaBraM, CSBrain, and CBraMod, across 6 different datasets is performed under the proposed multi-dimensional evaluation framework. We find that EEG foundation models consistently provide performance gains on long-context tasks such as sleep stage prediction and mental health state classification. In contrast, for short-window Brain-Computer Interface-style tasks, supervised models achieve comparable performance despite having substantially fewer parameters. Additional analyses demonstrate that current foundation models provide limited robustness to short-window tasks and channel-constrained settings. Together, these findings motivate the use of multi-dimensional evaluation protocols that characterize model behavior under realistic use constraints.

2605.28561 2026-05-28 cs.CL cs.LG

Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards

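Saurabh Dash, Pierre Clavier, John Dang, Matthias Galle, Marzieh Fadaee, Ahmet Üstün, Beyza Ermis

AI summary: For partially verifiable tasks, proposes Soft-RLVR, a soft-reward framework based on checklist decomposition, and its self-verifying variant Soft-SVeRL, which improve reinforcement learning training through a dense partial-credit signal and address reward inflation in self-verification.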

Details
Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has improved language models in domains such as mathematics and code, where correctness can be checked automatically. However, many important tasks are only partially verifiable: prompts contain multiple requirements, responses may satisfy some but not all of them, or no single reference answer might exist. We introduce Soft-RLVR, a framework for reinforcement learning from decomposed, learned verification signals. Soft-RLVR converts each prompt into a checklist of atomic requirements, scores candidate responses item by item with an LLM verifier, and trains on the resulting soft reward. Checklist-based rewards turn sparse pass/fail supervision into a denser partial-credit signal, but they also introduce a tradeoff: averaging item-level judgments can reduce verifier noise, while partial credit can reward incomplete responses. We formalize this tradeoff and identify conditions under which checklist-based verification gives a more reliable RL training signal than holistic verification. We further introduce Soft-SVeRL, a self-verifying variant of Soft-RLVR in which the policy also acts as the verifier. We show that self-verification is prone to reward inflation from overly permissive self-judgments, and that explicit stabilization is needed to prevent this collapse. In a controlled instruction-following setting with rule-based ground-truth evaluation, checklist-based Soft-RLVR improves IFEval by up to 11.1 points using only learned verifier rewards. Our experiments further show that verifier quality and checklist quality both affect downstream RL outcomes, and that explicit stabilization is essential for effective self-verification.
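
The contrast between a checklist-based soft reward and sparse pass/fail supervision can be sketched as follows. This is a minimal illustration that assumes the soft reward is the unweighted mean of per-item verdicts, as the abstract's "averaging item-level judgments" suggests. The function names are hypothetical, the verdicts are hard-coded where the paper would use an LLM verifier, and `all_or_nothing_reward` is only a stand-in for pass/fail supervision, not the paper's holistic verifier.

```python
def soft_reward(item_verdicts):
    """Partial credit: fraction of checklist items judged satisfied."""
    if not item_verdicts:
        raise ValueError("checklist must contain at least one item")
    return sum(1.0 for verdict in item_verdicts if verdict) / len(item_verdicts)


def all_or_nothing_reward(item_verdicts):
    """Sparse pass/fail: 1.0 only when every checklist item is satisfied."""
    return 1.0 if item_verdicts and all(item_verdicts) else 0.0


# Hypothetical verdicts for a prompt with four atomic requirements,
# e.g. "under 100 words", "mentions a date", "uses bullet points", "formal tone".
verdicts = [True, True, False, True]

dense = soft_reward(verdicts)              # partial credit for 3 of 4 items
sparse = all_or_nothing_reward(verdicts)   # no credit: one item failed
```

The example also shows the tradeoff named in the abstract: the soft reward gives a usable gradient signal for a near miss, but it also pays out for a response that is still incomplete.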

2605.28557 2026-05-28 cs.LO cs.AI

Token Optimization Strategies for LLM-Based Oracle-to-PostgreSQL Migration

Oleg Grynets, Dmytro Babarytskyi, Vasyl Lyashkevych

AI summary: Formalizes and evaluates twelve token optimization strategies for Oracle-to-PostgreSQL migration, balancing cost, syntactic validity, semantic preservation, and structural fidelity.

Details
Comments
11 pages, 3 figures, 5 tables, 38 references
Abstract

LLMs are increasingly used for software modernization, code translation, and database migration. However, LLM-based Oracle2PostgreSQL migration remains constrained by high token consumption, long-context degradation, dialect-specific semantic differences, and the risk of semantic drift during query transformation. Direct inclusion of large Oracle SQL/PL-SQL artefacts, schema definitions, procedural logic, and migration instructions into the model context increases cost and may reduce generation quality. This paper treats token optimization as a constrained transformation problem in LLM-based Oracle2PostgreSQL migration. The study formalizes and evaluates twelve token optimization strategies: baseline representation, context pruning, minification, DSL-based semantic compression, metadata augmentation, context refactoring, schema distillation, adaptive routing, AST-based minification, identifier masking, output constraint enforcement, and hybrid optimization. The strategies are evaluated on samples of 10 and 100 Oracle SQL queries using Valid Syntax Rate, Exact Match, Semantic Match, CodeBLEU, and Token Efficiency. The results show that mild context pruning preserves semantic quality almost at the baseline level, achieving 89.75% Semantic Match on the 100-query sample compared with 89.80% for the unoptimized baseline. Adaptive routing provides the best practical trade-off, reducing input tokens by 8.72% and output tokens by 5.49% while maintaining 88.40% Semantic Match and increasing Token Efficiency by 6.67%. Aggressive schema distillation increases Token Efficiency by 132.22% but results in a 44.50-percentage-point decrease in Semantic Match. The findings demonstrate that token optimization cannot be treated as simple prompt shortening; it must be evaluated as a multi-objective migration problem balancing cost, syntactic validity, semantic preservation, and structural fidelity.

2605.28554 2026-05-28 cs.LG

High Performance, Low Reliability: Uncertainty Benchmarking for Tabular Foundation Models

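José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan

AI summary: Using the TALENT benchmark, finds that tabular foundation models outperform gradient-boosted decision trees in predictive performance but deliver worse uncertainty calibration, revealing a performance-uncertainty trade-off.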

Details
Journal ref
ESANN 2026 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium) and online event, 22-24 April 2026, pp. 115-120, i6doc.com publ., ISBN 9782875870964
Comments
6 pages, 2 figures, 2 tables. Accepted at ESANN 2026 (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning), 22-24 April 2026, Bruges (Belgium)
Abstract

Recent Tabular Foundation Models (TFMs) have demonstrated state-of-the-art predictive performance, often surpassing Gradient-Boosted Decision Trees (GBDTs). However, the trustworthiness of these models, particularly their uncertainty quantification, has been largely overlooked. We investigate this gap through an extensive study comparing TFMs, GBDTs, and classical baselines on the 112 datasets of the TALENT benchmark. Our results reveal a performance-uncertainty trade-off: although TFMs achieve the highest predictive performance, measured by AUC, they exhibit lower conditional coverage under conformal prediction, measured by SSCS, compared to GBDTs. Complementary experiments on synthetic datasets further characterize the regimes in which this effect intensifies. We conclude that while TFMs advance predictive frontiers, achieving well-calibrated uncertainty remains a major open challenge for their reliable adoption. Code is available at: https://github.com/jose-melo/high-performance-low-reliability
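
For readers unfamiliar with conformal prediction, the sketch below shows the standard split-conformal recipe for classification: calibrate a score threshold on held-out data, then return every label whose score falls within it. It assumes the nonconformity score is one minus the predicted probability of a label, which is a common choice and not necessarily the one used in the paper. The function names and data are hypothetical, and the SSCS conditional-coverage metric is not computed here.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Each calibration score is the nonconformity of the true label on a
    held-out example. Returns infinity when there are too few examples
    to certify the requested coverage level.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """Labels whose nonconformity score 1 - p(label) is within the threshold."""
    return {label for label, p in class_probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration scores and a target miscoverage of 25%.
threshold = conformal_threshold([0.4, 0.05, 0.9, 0.2, 0.6, 0.1, 0.3], alpha=0.25)

confident = prediction_set({"a": 0.7, "b": 0.25, "c": 0.05}, threshold)
ambiguous = prediction_set({"a": 0.45, "b": 0.45, "c": 0.1}, threshold)
```

The guarantee from this recipe is marginal, meaning it holds on average over inputs. Conditional coverage, which the abstract reports through SSCS, asks whether coverage also holds within subgroups, so a model can satisfy the former while falling short on the latter.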

2605.28553 2026-05-28 cs.AI cs.CR

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

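Matteo Gioele Collu, Riccardo Conte, Alberto Giaretta, Denis Kleyko, Mauro Conti, Matteo Zavatteri, Roberto Confalonieri

AI summary: Detects refusal behavior with linear probes on the residual stream activations of transformer blocks and proposes Mechanistic AutoDAN, which uses probe-guided genetic search to attack efficiently, substantially reducing search time while preserving attack success rate.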

Details
Abstract

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that safety-relevant behavior is represented in intermediate activations before output generation. To test whether this signal is actionable, we introduce Mechanistic AutoDAN, a probe-guided variant of AutoDAN that replaces full-model fitness evaluation with partial forward passes and probe-based scoring inside a genetic prompt search loop. Across the evaluated models, our method achieves attack success rates competitive with vanilla AutoDAN while reducing per-iteration search time by up to 72%, and probe-guided prompts match or exceed AutoDAN's cross-model transfer in several configurations. We further find that the usefulness of probe guidance increases with model scale. Our results show that refusal is not only observable at the output level, but is encoded as a structured and actionable signal in intermediate LLM activations.

2605.28552 2026-05-28 cs.AI

Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning

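Qingwen Pu, Kun Xie, Hong Yang, Di Yang, Junqing Wang

AI summary: Uses a Smooth-Mamba Deep Deterministic Policy Gradient framework (SMamba-DDPG) to model pedestrian crash avoidance behavior toward automated vehicles (AVs) and human-driven vehicles (HDVs) from safety-critical interactions extracted from the Argoverse 2 dataset, finding that pedestrians react faster to AVs and cross at lower speeds, and that AV scenarios have lower conflict rates.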

Details
Comments
37 pages, 15 figures, 9 tables
Abstract

As automated vehicles (AVs) increasingly share roadways with human-driven vehicles (HDVs), understanding how pedestrians respond to different vehicle types in safety-critical interactions is essential for the safe deployment of automated driving technologies. This study extracts safety-critical pedestrian-vehicle interactions from the Argoverse 2 dataset to capture real-world crash avoidance behaviors in encounters involving AVs and HDVs. To model vehicle-type-specific pedestrian crash avoidance behavior, we develop a Smooth-Mamba Deep Deterministic Policy Gradient framework, termed SMamba-DDPG, which integrates smooth action constraints with efficient temporal representation learning. To quantify pedestrian behavioral differences, the framework trains separate crash avoidance policies for pedestrian interactions with AVs and HDVs. Results show that SMamba-DDPG outperforms baseline reinforcement learning and supervised learning models in reproducing pedestrian crash avoidance behaviors. Reconstructed trajectories demonstrate strong behavioral realism, accurately reproducing crash avoidance kinematics in both AV and HDV scenarios. Reaction time analysis shows that the model captures human-like response delays and reveals that pedestrians respond more quickly to AVs than to HDVs. Counterfactual analysis further indicates that pedestrians adopt lower crossing speeds when interacting with AVs. Large-scale safety analysis of model-generated data revealed that pedestrian-AV interactions consistently yielded lower conflict rates and higher pedestrian yielding rates compared to pedestrian-HDV interactions. The findings highlight the importance of incorporating vehicle-type-specific pedestrian behavioral models for safer automated driving system design and more realistic traffic simulations in mixed-traffic environments.

2605.28549 2026-05-28 cs.RO cs.LG

SPRINT: Efficient Spectral Priors for Humanoid Athletic Sprints

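Yantong Wei, Kaihong Huang, Hainan Pan, Jiawei Luo, Jiawei Zhou, Ziyan Mai, Zhiwen Zeng, Yaonan Wang, Huimin Lu

AI summary: Proposes the SPRINT framework, which uses frequency-adaptive spectral priors to generate kinematically feasible joint trajectories and achieves zero-shot sim-to-real transfer, reaching a peak speed of 6 m/s on the Unitree G1 platform.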

Details
Abstract

The pursuit of humanoid athletic sprints is hindered by a scarcity of humanoid-viable kinematic reference data and the inability of existing frameworks to maintain stability during sprints. To overcome these limitations, we introduce SPRINT, a novel framework driven by efficient, frequency-adaptive spectral priors. By characterizing the fundamental periodicity of human locomotion in the frequency domain using a reference library of five discrete motion sequences, these priors generate kinematically feasible joint trajectories across a broad velocity spectrum, successfully extrapolating to speeds that exceed the reference distribution. Guided by these pretrained priors, the SPRINT policy achieves zero-shot sim-to-real transfer in field experiments on the Unitree G1 platform, reaching a peak sprinting velocity of 6 m/s and demonstrating seamless gait transitions while preserving biomimetic naturalness. Ultimately, this work establishes frequency-adaptive spectral priors as a highly data-efficient foundation for humanoid athletic sprints. The project page is available at https://anonymous.4open.science/w/SPRINT-138A/.