arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories: 2093
2605.30486 2026-06-01 cs.LG cs.AI

Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting

Amirhossein Ghaffari, Saeid Sheikhi, Ekaterina Gilman

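AI summary: Proposes the GC-MoE framework, which assigns each node a personalized combination of experts based on graph topology and recent traffic inputs and trains only a lightweight routing module, improving MAE on four benchmarks.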

Comments
Accepted at the 27th IEEE International Conference on Mobile Data Management (MDM 2026)
Abstract

Spatio-temporal forecasting on sensor graphs is commonly tackled with a single backbone architecture applied uniformly across all nodes, although graph regions can exhibit different dynamics. Road segments differ in functional class, structure, and traffic behavior, suggesting that node-wise expert specialization can be useful. We propose GC-MoE, a graph-conditioned mixture of experts framework that assigns each node a personalized combination of frozen forecasting experts based on graph topology and the recent traffic input window. GC-MoE combines frozen pretrained spatio-temporal GNN experts with an input-aware, spatially contextualized router while training only a lightweight routing module. We also study a bounded graph-conditioned output refinement layer as an optional extension and include node-adaptive ST-LoRA adapters only as an ablation diagnostic. Across four standard benchmarks (PEMS04, PEMS07, METR-LA, and PEMS-BAY), GC-MoE improves MAE over a zero-parameter ensemble baseline, with competitive RMSE and MAPE, while training only ~17K parameters on top of 1.5M frozen expert weights. The implementation is available at https://github.com/Ahghaffari/gc_moe.
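The per-node routing idea can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the GC-MoE implementation: each node's recent input window is summarized as one scalar, spatial context is the mean of its neighbours' summaries, and the router is a per-expert linear score followed by a softmax. The names route_and_mix, w_self, w_nbr, and bias are hypothetical. Only the router parameters would be trained; the expert forecasts enter as fixed inputs, mirroring the frozen-expert setup.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_mix(
    expert_preds: List[List[float]],  # expert_preds[e][n]: frozen expert e's forecast for node n
    node_feats: List[float],          # assumed scalar summary of each node's recent input window
    adjacency: Dict[int, List[int]],  # neighbours of each node in the sensor graph
    w_self: List[float],              # hypothetical trainable router weights, one per expert
    w_nbr: List[float],
    bias: List[float],
) -> List[float]:
    """Mix frozen expert forecasts with per-node gates conditioned on the node and its neighbours."""
    num_experts = len(expert_preds)
    mixed = []
    for n, x in enumerate(node_feats):
        nbrs = adjacency.get(n, [])
        # Spatial context: mean of neighbouring nodes' input summaries (0 for isolated nodes).
        ctx = sum(node_feats[m] for m in nbrs) / len(nbrs) if nbrs else 0.0
        logits = [w_self[e] * x + w_nbr[e] * ctx + bias[e] for e in range(num_experts)]
        gates = softmax(logits)
        mixed.append(sum(g * expert_preds[e][n] for e, g in enumerate(gates)))
    return mixed
```

With all router weights at zero the gates are uniform and the output reduces to a plain average of the experts, one natural zero-parameter ensemble (the abstract does not specify the exact form of its baseline).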

2605.30484 2026-06-01 cs.RO

ELAN4D: Embodiment-Centric 4D Supervision for Vision-Language-Action Models via Plug-and-Play Adaptation

Zeyuan He, Bowen Yang, Zhirui Fang, Keru Zhou, Lei Jiang, Jingjing Qian, Fan Mo, Junchi Yan, Philip Torr, Xiu Li, Li Jiang, Jialin Yu

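AI summary: Proposes ELAN4D, which uses future robot keypoint tracks as predictive spatio-temporal supervision to improve the robustness and generalization of VLA policies in a plug-and-play manner.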

Abstract

Vision-Language-Action (VLA) models have shown promise for robotic manipulation, yet most existing policies operate reactively by directly regressing actions from current observations, without explicitly modeling future dynamics. This limits their ability to generalize under out-of-distribution perturbations. To address this issue, we propose ELAN4D, an embodiment-centric, 4D-aware training framework that enhances VLA policies with future robot keypoint tracks as predictive spatio-temporal supervision. Using only forward kinematics from proprioceptive states, we derive 3D displacement tracks of robot keypoints, such as joints and the end-effector, with negligible preprocessing cost. These tracks provide metric and compact supervision without requiring external trackers or reconstruction. A plug-and-play auxiliary branch with a lightweight track decoder injects this 4D signal into the action expert while preserving the pretrained vision-language backbone through gradient isolation. The track decoder is discarded during inference, leaving the base policy interface unchanged. Extensive experiments on LIBERO, LIBERO-Plus, RoboTwin2.0, and real-world manipulation tasks demonstrate that ELAN4D consistently improves over strong VLA baselines, achieving the best overall performance and substantial gains under out-of-distribution perturbations, including camera, background, and layout shifts. These results highlight the effectiveness of embodiment-centric 4D supervision for building more robust and generalizable manipulation policies.
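To make the forward-kinematics step concrete, here is a minimal sketch for a hypothetical planar two-link arm. The paper works with 3D keypoints of real robots; the planar simplification, link lengths, end-of-episode padding, and function names below are our assumptions. Each keypoint's future positions are computed from joint angles alone and expressed as displacements from its current position, so no external tracker or reconstruction is involved.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def planar_fk(q: Tuple[float, float], l1: float = 1.0, l2: float = 1.0) -> List[Point]:
    """Forward kinematics of a 2-link planar arm; returns [elbow, end-effector] positions."""
    q1, q2 = q
    elbow = (l1 * math.cos(q1), l1 * math.sin(q1))
    ee = (elbow[0] + l2 * math.cos(q1 + q2), elbow[1] + l2 * math.sin(q1 + q2))
    return [elbow, ee]


def keypoint_tracks(
    joint_traj: Sequence[Tuple[float, float]], t: int, horizon: int,
    l1: float = 1.0, l2: float = 1.0,
) -> List[List[Point]]:
    """tracks[k][j] = displacement of keypoint j at step t+k+1 relative to step t.

    Steps past the end of the episode repeat the last recorded state (an assumption).
    """
    current = planar_fk(joint_traj[t], l1, l2)
    tracks = []
    for k in range(1, horizon + 1):
        future = planar_fk(joint_traj[min(t + k, len(joint_traj) - 1)], l1, l2)
        tracks.append([(fx - cx, fy - cy) for (fx, fy), (cx, cy) in zip(future, current)])
    return tracks
```

For example, rotating the shoulder of a unit-link arm from 0 to π/2 moves the end-effector from (2, 0) to (0, 2), giving a displacement of about (−2, 2).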

2605.30482 2026-06-01 cs.LG

Discovering a Zeta Map Algorithm on Dyck Paths via Mechanistic Interpretability

Xiaoyu Huang, Blake Jackson, Kyu-Hwan Lee

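AI summary: Trains a small encoder-decoder Transformer to learn the zeta map on Dyck paths and analyzes its computation with mechanistic interpretability tools, leading to the discovery and proof of a new explicit combinatorial algorithm, the scaffolding map.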

Abstract

Machine learning is increasingly used in mathematical discovery, but in mathematics the desired output is often not a prediction itself, but an explicit construction that can be checked independently. We study this setting through the zeta map on Dyck paths, a classical bijection in the combinatorics of the q,t-Catalan numbers. We train a deliberately small one-layer, one-head encoder-decoder transformer on this map and analyze its learned computation using mechanistic interpretability tools, including decoder cross-attention analysis, linear probing, and causal intervention. The analysis reveals a level-based mechanism: encoder representations make path levels linearly accessible, while the decoder selects and traverses input positions in a structured way. Translating these signals into combinatorics leads to the scaffolding map, an explicit peak-centered traversal algorithm for Dyck paths. We prove that this algorithm agrees with the zeta map, modulo a reversal convention in the labeling. This gives a controlled example of AI-assisted mathematical discovery in which mechanistic interpretability turns model behavior into a precise, human-verifiable combinatorial algorithm.

2605.30481 2026-06-01 cs.CL

When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models

Md Arid Hasan, Ruwad Naswan, Farhan Samir, Sharifa Sultana, Syed Ishtiaque Ahmed

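AI summary: Builds CulturalNB, a Bengali cultural dataset, to evaluate the cross-lingual knowledge consistency of LLMs in a low-resource cultural context. Finds that asking questions in English systematically increases global substitution and institutional framing and reduces local perspective coverage, indicating that cultural failures stem not only from missing knowledge but also from problems of grounding and narrative prioritization.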

Comments
Submitted to ARR
Abstract

Large language models (LLMs) are widely used as cross-lingual knowledge interfaces. However, culturally grounded questions often reflect globally dominant narratives rather than local contexts. We study this failure mode as global narrative dominance in Bangla, a low-resource cultural context. We introduce CulturalNB, a dataset of 717 manually curated Bengali cultural instances with parallel Bangla–English question–answer pairs and supporting evidence, metadata, and sociocultural annotations. Using question-only and evidence-based prompting, we evaluate nine state-of-the-art LLMs with human judges and two independent LLM judges across metrics for cross-lingual consistency, language anchoring, global substitution, institutional bias, and epistemic perspective coverage. Results show that questions asked in English systematically increase global substitution and institutional framing while reducing local perspective coverage. Local evidence improves factual consistency and perspective coverage, but does not eliminate language-induced epistemic shifts. These findings suggest that cultural failures in LLMs are not only missing-knowledge errors but also failures of grounding and narrative prioritization.

2605.30479 2026-06-01 cs.LG

Universal Multiclass Transductive Online Learning

Steve Hanneke, Hongao Wang

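AI summary: Studies universal transductive online classification with a possibly unbounded label space, characterizes learnability via a new "Level-Constrained-Littlestone-Littlestone (LCLL) tree" together with an indifference property, and proves that the optimal mistake rate of a learnable class is either bounded or grows logarithmically.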

Abstract

We consider the problem of universal transductive online classification with a possibly unbounded label space. This setting considers online learning, with the sequence of instances (without labels) known to the learner in advance. We say a concept class $\mathcal{H}$ is learnable if there is a learning algorithm $\mathcal{A}$, such that for every realizable sequence, the number of mistakes made by $\mathcal{A}$ grows at most sublinearly with the number of predictions. We characterize the learnability of this setting and show that there are only two possible optimal rates for the learnable classes: either bounded or increasing logarithmically. We introduce a new combinatorial structure, called "Level-Constrained-Littlestone-Littlestone (LCLL) tree", which, along with the indifference property, characterizes the learnability. We also extend the learnability result to the agnostic case and the case where only the stochastic process that generates the instance sequence is known.

2605.30472 2026-06-01 cs.CL

Your Multimodal Speech Model Says I Have a Face for Radio

Maya K. Nachesa, Vlad Niculae, Vagrant Gautam

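AI summary: Evaluates bias in multimodal speech recognition models across gender, ethnicity, and their intersection by pairing different faces with the same audio, finding significant quality-of-service differences and suggesting that adding modalities is not necessarily better.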

Abstract

As large neural models have become better at language tasks, researchers are increasingly building multi- and omnimodal models that handle more modalities of data. One example is the expansion of speech recognition models to audio-visual data for noise mitigation and multimodal subtitling. While performance and bias have been studied extensively in the single-modality regime, it is unknown how new modalities affect this, even though they produce biases in humans. We therefore propose the first bias evaluation of multimodal speech recognition, where we create videos pairing different faces with the same audio, and measure changes in speech transcription accuracy. We find large quality-of-service differences across mWhisper-Flamingo and Gemini models, with drops of up to 4.05 word error rate points, across self-declared gender, ethnicity, and their intersection. Our findings point to a priority for developers to evaluate, fix, and communicate such limitations, as providing more signals through additional modalities is not necessarily better, and may even lead to biased outcomes.
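Word error rate, the metric behind the reported gaps, is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below computes it with standard dynamic programming and expresses a face-swap gap in WER points, which we assume means percentage points (WER × 100). The helper names are ours.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(substitute, prev[j] + 1, cur[j - 1] + 1)  # match/sub, deletion, insertion
        prev = cur
    return prev[len(hyp)] / len(ref)


def wer_gap_points(reference: str, hyp_face_a: str, hyp_face_b: str) -> float:
    """WER difference, in percentage points, between transcripts of the same audio shown with two faces."""
    return 100.0 * (word_error_rate(reference, hyp_face_a) - word_error_rate(reference, hyp_face_b))
```

Because the audio is identical across conditions, any nonzero gap is attributable to the visual input alone, which is what makes the face-swapping design a clean bias probe.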

2605.30470 2026-06-01 cs.LG

Can Subgraph Explanations Be Weaponized to Steal Graph Neural Networks?

Ojas Nimase, Jiate Li, Yue Zhao, Yushun Dong

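AI summary: Proposes the first black-box model extraction attack on graph classification, which uses model explanation outputs to guide Monte Carlo edge sensitivity estimation and uses explanation subgraphs to narrow the boundary search space; experiments show it outperforms existing baselines.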

Comments
28 pages, 8 figures, 10 tables. Under review at NeurIPS 2026
Abstract

Graph Machine Learning as a Service (GMLaaS) platforms increasingly implement explainability interfaces to meet regulatory transparency requirements. However, this transparency creates exploitable vulnerabilities for model extraction attacks. We present the first model extraction attack specifically designed for graph classification under strict black-box constraints where the attacker observes only discrete class labels and binary explanation masks (no probability scores, gradients, or confidence values). Our method (1) uses model explanation outputs to guide Monte Carlo edge sensitivity estimation toward decision boundaries, with Hoeffding concentration guarantees on estimation accuracy, and (2) exploits explanation subgraphs to efficiently narrow the boundary search space. Extensive experiments on benchmark graph datasets across multiple domains demonstrate our method's superiority over comparable baselines. These findings show that such explainability interfaces create exploitable attack surfaces, informing both defensive mechanisms and policy frameworks for explainable AI mandates. The implementation code is available at https://github.com/LabRAI/XSTEAL/.
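The Hoeffding guarantee fixes how many black-box queries a Monte Carlo estimate needs: for a probability estimated from n independent 0/1 outcomes, n ≥ ln(2/δ)/(2ε²) samples keep the estimate within ε of the truth with probability at least 1 − δ. The sketch below applies this to a generic label-flip sensitivity of a single edge, using only hard labels from a query function. It is an illustration under our own assumptions (random edge dropping as the sampling scheme, hypothetical function names) and does not model how the paper uses explanation masks to steer sampling toward decision boundaries.

```python
import math
import random
from typing import Callable, Hashable, List, Sequence

Edge = Hashable


def hoeffding_samples(eps: float, delta: float) -> int:
    """Samples needed so that |p_hat - p| <= eps with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def edge_sensitivity(
    query_label: Callable[[List[Edge]], int],  # black box: edge list -> predicted class label only
    edges: Sequence[Edge],
    edge: Edge,
    eps: float = 0.1,
    delta: float = 0.05,
    keep_prob: float = 0.5,
    seed: int = 0,
) -> float:
    """Estimate Pr[label changes when `edge` is removed] over random subsets of the other edges."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta)
    others = [e for e in edges if e != edge]
    flips = 0
    for _ in range(n):
        context = [e for e in others if rng.random() < keep_prob]
        flips += int(query_label(context + [edge]) != query_label(context))
    return flips / n
```

Each sample costs two label queries, so with ε = 0.1 and δ = 0.05 one edge needs 185 samples, or 370 queries; narrowing the candidate edge set, as the explanation subgraphs do in the paper, is what keeps the total budget manageable.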

2605.30469 2026-06-01 cs.SD cs.CV

3DAE: Binaural Quality Assessment for Audio Novel View Synthesis with Spatial Maps and Benchmark

Jialu Xu, Yifan Zhou

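AI summary: Proposes 3DAE Map, a full-reference diagnostic framework for visual inspection built on time-frequency audio error maps (magnitude, ILD, IPD, temporal alignment, loudness, and high-frequency failures), and builds the model-agnostic 3DAE Bench for evaluating the binaural prediction quality of audio novel-view synthesis models.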

Abstract

3D audio and novel-view acoustic synthesis models are usually evaluated with global metrics. However, global metrics often hide where and why binaural prediction fails. We propose a full-reference diagnostic framework that uses time-frequency audio error maps for magnitude, ILD, IPD, temporal alignment, loudness, and high-frequency failures, forming a 3D Audio Error Map (3DAE Map) for visual inspection. We package these diagnostics into a model-agnostic benchmark, Spatial Audio Error Bench (3DAE Bench), which takes arbitrary ground-truth and predicted binaural pairs and reports the prediction quality of audio novel-view synthesis models. Experiments on ViGAS outputs over Replay-NVAS and SoundSpaces show different dominant failure modes: temporal misalignment on Replay-NVAS and ILD mismatch on SoundSpaces. Overall, the framework provides interpretable failure-mode summaries and intuitive visual maps to guide the development of audio novel-view synthesis models.
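Two of the listed error maps, ILD and IPD, have standard time-frequency definitions: ILD is the left/right magnitude ratio in dB and IPD is the phase of L·conj(R) in each STFT bin. The sketch below is a minimal pure-Python illustration, not the 3DAE implementation: it computes both with a naive rectangular-window STFT and forms absolute error maps between a ground-truth and a predicted binaural pair, wrapping IPD differences to (−π, π]. Frame length, hop, the epsilon guard, and the function names are our assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Map = List[List[float]]  # indexed [frame][frequency bin]


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive DFT of a real frame, non-negative frequency bins only."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def stft(signal: Sequence[float], frame_len: int, hop: int) -> List[List[complex]]:
    """Rectangular-window STFT without padding: one spectrum per full frame."""
    return [dft(signal[s:s + frame_len]) for s in range(0, len(signal) - frame_len + 1, hop)]


def ild_ipd(left, right, frame_len: int = 64, hop: int = 32, eps: float = 1e-12) -> Tuple[Map, Map]:
    """ILD in dB and IPD in radians for every time-frequency bin."""
    spec_l, spec_r = stft(left, frame_len, hop), stft(right, frame_len, hop)
    ild = [[20 * math.log10((abs(l) + eps) / (abs(r) + eps)) for l, r in zip(fl, fr)]
           for fl, fr in zip(spec_l, spec_r)]
    ipd = [[cmath.phase(l * r.conjugate()) for l, r in zip(fl, fr)]
           for fl, fr in zip(spec_l, spec_r)]
    return ild, ipd


def binaural_error_maps(gt: Tuple[Sequence[float], Sequence[float]],
                        pred: Tuple[Sequence[float], Sequence[float]], **kw) -> Tuple[Map, Map]:
    """Absolute ILD error (dB) and wrapped absolute IPD error (rad) per bin."""
    gt_ild, gt_ipd = ild_ipd(*gt, **kw)
    pr_ild, pr_ipd = ild_ipd(*pred, **kw)
    ild_err = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(gt_ild, pr_ild)]
    ipd_err = [[abs(cmath.phase(cmath.exp(1j * (a - b)))) for a, b in zip(ra, rb)]
               for ra, rb in zip(gt_ipd, pr_ipd)]
    return ild_err, ipd_err
```

For instance, if the ground-truth right channel is the left channel at half amplitude but the prediction is symmetric, the ILD error map shows about 6.02 dB (20·log10 2) in the bins that carry energy, while the IPD error stays at zero. Low-energy bins give unstable ILD values, so a practical map would mask them.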

2605.30468 2026-06-01 cs.RO

Learning-Based Navigation for Indoor Mobile Robots

Tri-Tin Nguyen, Tien-Dat Nguyen, Gia-Uy Le, Vinh Nguyen, Vinh-Hao Nguyen

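AI summary: Proposes a navigation framework that combines a supervised-learning global planner with a learning-based DWA local planner, trained by behavior cloning and refined with PPO, for safe obstacle-avoiding navigation.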

Abstract

This paper presents a learning-based navigation framework for indoor mobile robots. The proposed method combines a supervised neural global planner, trained from cost-aware A* expert trajectories, with the proposed Learning-Based DWA local planner, which is formulated as discrete candidate selection over the Dynamic Window Approach (DWA) action lattice. For local planning, the policy is first trained by behavior cloning and then refined by Proximal Policy Optimization (PPO) under feasibility-aware masking. The framework is implemented and evaluated in both simulated and real-world indoor environments. Experimental results show that the proposed method generates feasible global routes and reliable local motion commands for safe goal-directed navigation in the presence of obstacles. These results demonstrate the effectiveness of integrating learning-based global planning with reinforcement-learning-refined local control for indoor mobile robot navigation. The source code will be released at https://ntdathp.github.io/rl_robot_web/.
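Formulating the local planner as discrete selection over the DWA lattice can be sketched as follows. Under our assumptions, the dynamic window is the set of (v, ω) pairs reachable within one control period under acceleration limits; each candidate is rolled out with a unicycle model; candidates whose rollouts come within a safety radius of an obstacle are masked out; and a masked softmax over policy scores picks the command. A hand-written goal-distance score stands in for the learned behavior-cloned and PPO-refined policy, and every name and parameter here is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Cmd = Tuple[float, float]          # (linear velocity v, angular velocity w)
Pose = Tuple[float, float, float]  # (x, y, heading)


def dwa_lattice(v: float, w: float, v_max: float, w_max: float, acc_v: float,
                acc_w: float, dt: float, n_v: int = 5, n_w: int = 5) -> List[Cmd]:
    """Evenly spaced (v, w) candidates reachable within one control period dt (n_v, n_w >= 2)."""
    v_lo, v_hi = max(0.0, v - acc_v * dt), min(v_max, v + acc_v * dt)
    w_lo, w_hi = max(-w_max, w - acc_w * dt), min(w_max, w + acc_w * dt)
    vs = [v_lo + i * (v_hi - v_lo) / (n_v - 1) for i in range(n_v)]
    ws = [w_lo + j * (w_hi - w_lo) / (n_w - 1) for j in range(n_w)]
    return [(a, b) for a in vs for b in ws]


def rollout(pose: Pose, cmd: Cmd, sim_dt: float = 0.1, steps: int = 10) -> List[Tuple[float, float]]:
    """Forward-simulate a unicycle under a constant command; returns visited (x, y) points."""
    x, y, th = pose
    v, w = cmd
    pts = []
    for _ in range(steps):
        th += w * sim_dt
        x += v * math.cos(th) * sim_dt
        y += v * math.sin(th) * sim_dt
        pts.append((x, y))
    return pts


def masked_softmax(scores: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax restricted to feasible entries; infeasible entries get probability 0."""
    feasible = [s for s, ok in zip(scores, mask) if ok]
    if not feasible:
        return [0.0] * len(scores)
    m = max(feasible)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def select_command(pose: Pose, vel: Cmd, goal: Tuple[float, float],
                   obstacles: Sequence[Tuple[float, float]], radius: float = 0.2,
                   **window: float) -> Optional[Cmd]:
    """Pick one lattice candidate; window holds v_max, w_max, acc_v, acc_w, dt."""
    cands = dwa_lattice(*vel, **window)
    paths = [rollout(pose, c) for c in cands]
    # Feasibility mask: every rollout point must stay farther than radius from every obstacle.
    mask = [all(math.hypot(px - ox, py - oy) > radius for px, py in pts for ox, oy in obstacles)
            for pts in paths]
    if not any(mask):
        return None  # no safe candidate; a real system would trigger a recovery behaviour
    # Stand-in for the learned policy's scores: negative distance from rollout end to goal.
    scores = [-math.hypot(pts[-1][0] - goal[0], pts[-1][1] - goal[1]) for pts in paths]
    probs = masked_softmax(scores, mask)
    return cands[max(range(len(cands)), key=probs.__getitem__)]  # greedy choice at deployment
```

Masking before the softmax, rather than penalizing infeasible candidates in the score, guarantees that an unsafe command can never be sampled during PPO refinement or chosen at deployment, which is the point of feasibility-aware masking.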

2605.30465 2026-06-01 cs.CL

Knowledge Graph-Enhanced Zero-Shot Topic Classification: A Multi-Strategy Comparative Study

Shahana Akter, Yatharth Vohra, Ankita Shukla, Souvika Sarkar

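AI summary: Proposes a zero-shot multi-label topic classification framework that enriches document representations with knowledge graphs; experiments show that graph augmentation helps small models and hurts large ones.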

Comments
15 pages, 1 figure, ACL format. This paper proposes a KG-augmented zero-shot multi-label topic classification framework and evaluates multiple strategies
Abstract

Multi-label topic classification without labeled training data is a challenging task, especially when documents contain complex relational information. We present a zero-shot multi-label topic classification framework and systematically investigate how per-article knowledge graph augmentation affects its performance. The base framework classifies topics in documents without labeled training data and has four variants: article-only classification, keyword-enhanced classification, and self-consistency decoding variants of both. We then augment each base variant with a per-article knowledge graph, extracted from the input document as subject-predicate-object triples through a pipeline similar to KGGen. We test all eight methods (four base and four graph-augmented) on fifteen LLMs and eight multi-label datasets across different domains. For the base framework, keyword-enhanced classification (AK) is the best-performing method, and six of the fifteen LLMs surpass the sentence-encoder baseline. Graph augmentation helps small models and hurts large models. This shows that larger models already contain enough relational information from pretraining. Furthermore, the self-consistency decoding variants show no performance improvement in any experiment while increasing computation costs about fivefold.

2605.30462 2026-06-01 cs.LG cs.AI

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

Andrada Gobeaja, Ionut Hodoroaga, Elena Burceanu, Marius Leordeanu

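AI summary: Proposes a white-box method based on semantic correlation descriptors (SCDs) that infers whether a dataset was part of a model's training data from the semantic correlation structure the model has learned, outperforming existing baselines across multiple experimental settings.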

Comments
16 pages, 3 figures
Abstract

Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset, but not causal for the underlying task, can be internalized during training. We use this insight to study dataset-level membership inference, moving beyond existing methods that rely on behavioral or distributional evidence such as confidence scores, losses, margins, generated samples, or query responses. We introduce a white-box semantic fingerprinting approach based on semantic correlation descriptors (SCDs), which capture the semantic correlation structure learned by a model and make it comparable across dataset mixtures. In a controlled leave-one-dataset-out diagnostic, SCDs recover dataset-specific changes and perfectly separate matching from non-matching dataset pairs. We then propose a practical SCD-based membership score that tests whether a target dataset is part of a model's training mixture using only the model's SCD and the target dataset's standalone SCD, without requiring leave-one-dataset-out models. Across three diverse experimental settings, with dataset groups for natural language inference, emotion classification, and medical text classification, we test both the advantages and limitations of SCD-based membership inference with different degrees of semantic separation and keyword support between dataset splits. On average, the classifier based on this score achieves the highest performance and the lowest standard deviation, outperforming black-box baselines RMIA, Attack-P, and LiRA, as well as the white-box SIF baseline. These results show that dataset membership can be traced through internal semantic correlations, with the largest relative gain exceeding 60% in ROC-AUC when dataset groups expose distinct semantic particularities.

2605.30461 2026-06-01 cs.LG cs.AI

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

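AI summary: Proposes a distributed constrained multi-agent reinforcement learning method that combines state-augmented policy learning with distributed consensus over dual variables, coordinating global resource constraints in systems with separable dynamics while scaling linearly and guaranteeing constraint satisfaction.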

Comments
17 pages, 8 figures, 3 tables, plus appendix
Abstract

We present a distributed approach for constrained Multi-Agent Reinforcement Learning (MARL) that combines state-augmented policy learning with distributed consensus over dual variables. Our method targets systems where agents have separable dynamics but must coordinate to satisfy global resource constraints, a setting in which, as we demonstrate empirically, independent learning fails to produce feasible solutions because agents cannot determine appropriate individual contributions toward collective constraint satisfaction. The key technical contribution is showing that lightweight neighbor-to-neighbor consensus over Lagrange multipliers suffices for globally coordinated constraint enforcement while preserving the scalability of independent training. Each agent learns a single augmented policy offline, conditioned on both its local state and a dual variable encoding constraint feedback. During execution, agents reach agreement on this dual variable through local communication alone. We prove that under mild connectivity assumptions, the consensus error among agents' multipliers is bounded, and show that this translates to a bounded constraint violation that decreases with graph connectivity and the number of consensus rounds. Unlike centralized training with decentralized execution (CTDE) approaches, whose complexity grows at least quadratically with agent count, our method scales linearly in both training and execution. Experiments on smart grid demand response demonstrate that consensus coordination is essential for feasibility: without it, agents satisfy grid capacity constraints only by indefinitely postponing demand, a degenerate non-solution. With consensus, agents converge to a shared dual variable and satisfy both grid constraints and demand fulfillment, scaling to thousands of agents while CTDE baselines are limited to dozens.
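The execution-time coordination can be illustrated with a toy version of the dual dynamics. In this sketch, which is ours and not the authors' code, each agent's state-augmented policy is replaced by a hypothetical closed-form response local_demand(d, λ) = max(0, d − λ). Each agent takes a projected dual-ascent step on its own share of a shared capacity constraint and then runs a few rounds of neighbour-to-neighbour averaging with Metropolis weights, a standard choice that makes the mixing matrix doubly stochastic on an undirected graph, so averaging preserves the mean multiplier.

```python
from typing import List, Sequence, Tuple


def metropolis_weights(neighbors: Sequence[Sequence[int]]) -> List[List[float]]:
    """Doubly stochastic mixing matrix for an undirected graph given as adjacency lists."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        W[i][i] = 1.0 - sum(W[i])
    return W


def consensus(lams: List[float], W: List[List[float]], rounds: int) -> List[float]:
    """Each round, every agent averages its neighbours' multipliers (local communication only)."""
    n = len(lams)
    for _ in range(rounds):
        lams = [sum(W[i][j] * lams[j] for j in range(n)) for i in range(n)]
    return lams


def local_demand(demand: float, lam: float) -> float:
    """Hypothetical stand-in for a policy conditioned on the dual variable: higher price, less load."""
    return max(0.0, demand - lam)


def run_dual_consensus(demands: Sequence[float], capacity: float,
                       neighbors: Sequence[Sequence[int]], alpha: float = 0.1,
                       iters: int = 500, rounds: int = 3) -> Tuple[List[float], List[float]]:
    """Projected dual ascent on sum(load) <= capacity, followed by consensus on the multipliers."""
    n = len(demands)
    W = metropolis_weights(neighbors)
    share = capacity / n
    lams = [0.0] * n
    for _ in range(iters):
        # Local step: raise this agent's price if its load exceeds its share of the capacity.
        lams = [max(0.0, lam + alpha * (local_demand(d, lam) - share))
                for d, lam in zip(demands, lams)]
        lams = consensus(lams, W, rounds)
    loads = [local_demand(d, lam) for d, lam in zip(demands, lams)]
    return lams, loads
```

Calling it with rounds=0 removes communication: in this toy each agent then prices only its own fixed share of the capacity, so a low-demand agent leaves capacity unused while high-demand agents are throttled, a simple illustration of why agents acting alone cannot find their appropriate individual contributions.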

2605.30459 2026-06-01 cs.CL

Can LLM Teams Play What? Where? When?

Anastasia Kotelnikova, Viktor Byzov, Maria Dolzhenkova, Evgeny Kotelnikov

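AI summary: Studies whether team-based interaction strategies (Voting, Silent Team, Talkative Team) improve LLM performance in the What? Where? When? quiz game; team strategies outperform single-model baselines by up to 20 percentage points in accuracy, and the best team reaches 44.23%, approaching human performance.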

Comments
Accepted for Dialogue-2026 conference
Abstract

Large language models (LLMs) remain limited on tasks requiring indirect reasoning, cultural knowledge, and coordinated hypothesis testing. We investigate whether team-based interaction improves LLM performance in What? Where? When? (ChGK), a quiz game designed to reward collective reasoning. We introduce three team strategies: Voting, Silent Team (the captain observes final answers), and Talkative Team (the captain observes both answers and rationales). To minimize data leakage, we evaluate these strategies on a dataset consisting of 572 ChGK questions released in 2025. Using six recent large-scale open models, we show that team-based strategies outperform single-model baselines, yielding gains of up to 20 percentage points in accuracy. The best team achieves 44.23% accuracy, and approaches human team performance on questions with available human statistics. Analysis of inter-model diversity reveals that disagreement strongly predicts lower accuracy, but explanatory communication substantially mitigates performance drops. We further examine captain behavior and find no evidence of self-preference bias; access to peer rationales improves captain judgments. Overall, LLM teams function primarily as answer selection and error-filtering mechanisms rather than generators of novel solutions. Our findings highlight the importance of interaction and suggest adaptive strategies as a promising direction for multi-agent systems.
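The Voting strategy and the disagreement signal described above can be sketched as a plurality vote over normalized answer strings plus an agreement ratio (the share of team members backing the winning answer). The normalization rule, the tie-breaking rule (earliest answer among the tied ones), and the function names are our assumptions; the captain-based Silent and Talkative strategies are not modeled.

```python
from collections import Counter
from typing import List, Tuple


def normalize(answer: str) -> str:
    """Case-fold and collapse whitespace so trivially different strings count as one answer."""
    return " ".join(answer.lower().split())


def vote(answers: List[str]) -> Tuple[str, float]:
    """Plurality vote; returns (winning answer, fraction of members who gave it)."""
    if not answers:
        raise ValueError("need at least one answer")
    norm = [normalize(a) for a in answers]
    counts = Counter(norm)
    top = max(counts.values())
    winner = next(a for a in norm if counts[a] == top)  # tie-break: earliest answer
    return winner, top / len(norm)
```

A low agreement ratio marks the questions on which, according to the abstract, accuracy tends to drop, and on which sharing rationales with the captain substantially mitigates that drop.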

2605.30456 2026-06-01 cs.LG math.OC

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

Shraman Pal, Can Li

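AI summary: For data-sparse, knowledge-rich settings, proposes DisjunctiveNet, which embeds disjunctive constraints into neural networks through differentiable convexified optimization layers, achieving hard constraint satisfaction and strong predictive performance.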

Comments
ICML 2026
Abstract

Many learning tasks in science and engineering are characterized by sparse datasets, which limits the effectiveness of purely data-driven approaches. At the same time, these problems are often accompanied by rich domain knowledge derived from physical laws, operational requirements, and expert heuristics. Such knowledge is frequently expressed as rules involving logical propositions and linear inequalities. Existing neuro-symbolic methods typically enforce these rules approximately through soft penalties, assume input-independent rules when designing specialized architectures, or rely on non-differentiable post-processing at inference time to achieve hard constraint satisfaction. While recent advances in differentiable optimization layers enable end-to-end feasibility enforcement within neural networks, extending these approaches to logical or mixed-integer rules remains challenging due to inherent nonconvexity. In this work, we propose a unified end-to-end framework for enforcing hard, input-dependent mixed integer linear constraints within neural networks. Our approach represents rules as disjunctive constraints and applies hierarchical convex relaxations to obtain convex hull formulations. These relaxations yield tractable linear constraints that can be embedded as differentiable optimization layers while enabling exact rule satisfaction. We demonstrate the effectiveness of the proposed framework on real-world datasets, achieving perfect rule satisfaction and strong predictive performance.

2605.30452 2026-06-01 cs.LG cs.AI math.OC

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

Zeou Hu, Kelvin Ho, Yaoliang Yu

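AI summary: Proposes a unified framework that establishes convergence rates for gradient aggregation methods via a sufficient alignment condition, and introduces capped MGDA, a CVaR-based algorithm whose robustness is validated in adversarial federated learning.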

Abstract

Many machine learning problems involve multiple inherent trade-offs that are best addressed by gradient-based multi-objective optimization (MOO) algorithms. Existing methods are often proposed with various motivations, analyzed case by case, and differ algorithmically in how the component gradients are aggregated at each step. In this work, we develop a unifying framework for gradient aggregation in MOO, establishing (optimal) rates of convergence to Pareto stationarity, the standard measure of performance in MOO. Central to our analysis is a sufficient alignment condition, from which we derive a theorem showing that non-conflicting directions, when chosen within the convex hull of gradients, form a fundamental sufficient condition for convergence. We further show that feasibility can be ensured through projection onto the dual cone, broadening the scope of methods that admit convergence guarantees. In parallel, we present a primal optimization perspective of gradient aggregation that encompasses established algorithms, clarifies their theoretical relationships, and enables the design of new variants. As an illustration, we introduce capped MGDA, derived from a CVaR-based formulation, and demonstrate its robustness in adversarial federated learning. Finally, we validate our theory through experiments on synthetic problems and practical benchmarks.
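As a concrete instance of gradient aggregation, MGDA picks the minimum-norm point in the convex hull of the per-objective gradients; in the uncapped case, when that point is nonzero it has a positive inner product with every gradient and is therefore a common descent direction. The sketch below solves this small quadratic problem over the weight simplex with Frank-Wolfe and exact line search. The optional cap restricts weights to w_i ≤ cap, which mirrors the dual form of CVaR over m losses (weights capped at 1/(αm)); treating this as the paper's capped MGDA is our assumption, not its published algorithm, and the function names are ours.

```python
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def combine(w: Sequence[float], grads: Sequence[Sequence[float]]) -> Vec:
    """Weighted sum of gradients: sum_i w_i * g_i."""
    return [sum(wi * g[k] for wi, g in zip(w, grads)) for k in range(len(grads[0]))]


def capped_vertex(coeffs: List[float], cap: float) -> Vec:
    """Linear minimisation over {w >= 0, sum(w) = 1, w_i <= cap}: fill cheapest coordinates first."""
    w = [0.0] * len(coeffs)
    remaining = 1.0
    for i in sorted(range(len(coeffs)), key=coeffs.__getitem__):
        if remaining <= 0.0:
            break
        w[i] = min(cap, remaining)
        remaining -= w[i]
    return w


def min_norm_weights(grads: Sequence[Sequence[float]], cap: float = 1.0, iters: int = 200) -> Vec:
    """Frank-Wolfe for min_w ||sum_i w_i g_i||^2 over the (optionally capped) simplex."""
    m = len(grads)
    if cap * m < 1.0 - 1e-12:
        raise ValueError("cap must be at least 1/m")
    w = [1.0 / m] * m  # uniform weights are feasible whenever cap >= 1/m
    for _ in range(iters):
        d = combine(w, grads)
        # The objective's gradient w.r.t. w_i is 2 * g_i . d; the factor 2 does not change the argmin.
        s = capped_vertex([dot(g, d) for g in grads], cap)
        step = [a - b for a, b in zip(combine(s, grads), d)]
        denom = dot(step, step)
        if denom < 1e-18:
            break
        t = max(0.0, min(1.0, -dot(d, step) / denom))  # exact line search for the quadratic
        if t == 0.0:
            break
        w = [wi + t * (si - wi) for wi, si in zip(w, s)]
    return w
```

For two orthogonal unit gradients the weights come out as (0.5, 0.5). For g1 = (1, 0) and g2 = (1, 1) the uncapped solution puts all weight on g1, while cap = 0.5 forces the equal split, showing how the cap prevents any single objective from dominating the aggregate.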

2605.30451 2026-06-01 cs.LG

VeriGate: Verifier-Gated Step-Level Supervision for GRPO

Aakriti Agrawal, Minghui Liu, Furong Huang

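AI summary: Proposes VeriGate, which extends GRPO with verifier-gated step-level supervision to address sparse rewards and credit assignment, substantially improving accuracy on multiple reasoning benchmarks.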

Abstract

Group Relative Policy Optimization (GRPO) is an effective recipe for training reasoning models with verifier-based outcome rewards, but its supervision is sparse: when all sampled trajectories for a prompt receive the same verifier reward, the group-relative advantage collapses to zero and learning stalls. Outcome-only rewards also provide no step-level credit assignment, limiting exploration and making it harder to learn robust reasoning. We present VeriGate (Verifier-Gated Step-Level GRPO), a verifier-gated extension of GRPO that addresses these limitations with three design choices. First, VeriGate keeps the verifier in charge whenever verifier rewards induce a meaningful preference among sampled trajectories, and uses process supervision only when verifier rewards are degenerate. Second, instead of collapsing Process Reward Model (PRM) step scores into a single trajectory reward, VeriGate converts them into future-cumulated rewards to assign continuation-aware credit. Third, VeriGate transforms these rewards into group-normalized token-level advantages, restoring informative gradients and fine-grained credit assignment while remaining less susceptible to reward hacking than methods that optimize aggregated PRM scores. Empirically, training on MATH with 1.5B and 7B Qwen2.5-Instruct models and evaluating on six reasoning benchmarks, VeriGate improves average accuracy by about 20% and 12% for 1.5B and 7B models respectively, substantially reduces zero-gradient failures, decreases reward-hacking behavior, and improves reasoning quality relative to outcome-only GRPO and PRM-as-outcome baselines.
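The gating logic and the two reward transformations can be sketched as follows. This is a simplified reading of the abstract, not the VeriGate code: we assume the verifier counts as informative whenever its rewards are not all equal within the group, that the "future-cumulated" reward of a step is the sum of PRM scores from that step to the end of the trajectory, and that these returns are normalized across all steps in the group. In a full implementation every token of step k would receive step k's advantage. Function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: (r - mean) / (std + eps)."""
    mu, sd = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def future_cumulated(step_scores: Sequence[float]) -> List[float]:
    """Reward-to-go: entry k is the sum of PRM scores from step k to the end."""
    out, acc = [], 0.0
    for s in reversed(step_scores):
        acc += s
        out.append(acc)
    return out[::-1]


def gated_step_advantages(outcome_rewards: Sequence[float],
                          prm_step_scores: Sequence[Sequence[float]],
                          eps: float = 1e-6) -> List[List[float]]:
    """Per-step advantages for each sampled trajectory in one prompt's group."""
    if max(outcome_rewards) > min(outcome_rewards):
        # Verifier is informative: plain GRPO, one advantage shared by every step of a trajectory.
        adv = group_advantages(outcome_rewards, eps)
        return [[a] * len(steps) for a, steps in zip(adv, prm_step_scores)]
    # Degenerate group (all rewards equal): GRPO gives zero advantage, so fall back to the PRM.
    returns = [future_cumulated(steps) for steps in prm_step_scores]
    flat = [r for traj in returns for r in traj]
    mu, sd = fmean(flat), pstdev(flat)
    return [[(r - mu) / (sd + eps) for r in traj] for traj in returns]
```

With rewards [1, 1, 1], group_advantages returns all zeros, which is exactly the stalled-gradient case; the fallback branch produces nonzero, step-specific advantages for the same group while leaving informative groups untouched.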

2605.30448 2026-06-01 cs.LG cs.CL

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

Munawar Hasan

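AI summary: For black-box LLM distillation, proposes a formal definition of bounded behavioral indistinguishability and uses adversarial evaluation to show that semantic similarity is not sufficient to guarantee behavioral indistinguishability.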

English abstract

Black-box LLM distillation is usually evaluated as an output-matching problem: a student is considered successful when its responses are semantically similar to, or task-consistent with, those of a teacher. However, output similarity does not imply that the student is behaviorally indistinguishable from the model it imitates. We introduce bounded behavioral indistinguishability, formalized as $(ε,q,t,\mathbb{A})$-behavioral indistinguishability over an explicit prompt distribution, where $ε$ bounds distinguishing advantage, $q$ bounds oracle queries, $t$ bounds computation, and $\mathbb{A}$ denotes the adversary class. We instantiate this notion on Qwen and Llama teacher-student pairs using a controlled $5,000$-prompt behavioral probe suite. For each family, we compare the teacher with both the base student and the LoRA-distilled student, measuring whether distillation reduces distinguishability rather than merely improving similarity. LoRA raises semantic similarity from $0.788$ to $0.862$ for Qwen and from $0.814$ to $0.874$ for Llama. Yet adversarial evaluation reveals remaining behavioral differences: learned discriminators retain nonzero advantage, and pairwise category analysis shows artifacts concentrated in style/format, robustness, and domain-technical prompts. A pairwise teacher-identification adversary confirms this trend. With a different-family Llama judge and A/B-swap consistency filtering, Qwen distinguishing advantage drops from $0.158$ for the base student to $0.081$ after LoRA distillation. Query-budget experiments show that disagreement-guided acquisition does not consistently outperform stratified random sampling, indicating that coverage and diversity remain strong baselines. Our results show that semantic fidelity is useful but insufficient: black-box LLM distillation requires bounded, adversarial, and category-aware evaluation.
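To make the advantage term concrete, the sketch below estimates it from an adversary's 0/1 outputs. It uses the standard cryptographic form |Pr[A = 1 | teacher] − Pr[A = 1 | student]|. It also includes a pairwise variant (2 × accuracy − 1) with an A/B-swap consistency filter. These common textbook definitions are assumed for illustration only. The paper's exact estimators may differ, and all function names here are hypothetical.

```python
def distinguishing_advantage(outputs_on_teacher, outputs_on_student):
    """|Pr[A=1 | teacher] - Pr[A=1 | student]| estimated from 0/1 adversary outputs."""
    p_teacher = sum(outputs_on_teacher) / len(outputs_on_teacher)
    p_student = sum(outputs_on_student) / len(outputs_on_student)
    return abs(p_teacher - p_student)


def filter_swap_consistent(judgments):
    """Keep judge verdicts that survive swapping the A/B presentation order.

    Each judgment is (pick_original, pick_swapped, teacher_shown_as_a), picks in {"A", "B"}.
    After the swap the same response sits in the other slot, so a consistent judge
    flips its label. Returns booleans: did the judge identify the teacher?
    """
    kept = []
    for pick_orig, pick_swap, teacher_shown_as_a in judgments:
        if pick_orig != pick_swap:
            kept.append(pick_orig == ("A" if teacher_shown_as_a else "B"))
    return kept


def pairwise_advantage(correct):
    """2 * accuracy - 1: 0 for random guessing, 1 for perfect teacher identification."""
    return 2 * sum(correct) / len(correct) - 1
```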

2605.30447 2026-06-01 cs.LG cs.AI stat.ML

Calibrated Preference Learning: The Case of Label Ranking

Santo M. A. R. Thies, Viktor Bengs, Timo Kaufmann, Sebastian J. Vollmer, Eyke Hüllermeier

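AI summary: For probabilistic label ranking, formally defines calibration and builds a hierarchy of calibration notions; theoretical proofs and experiments establish how the notions relate and reveal calibration deficiencies in existing models.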

English abstract

Calibration, the alignment of predicted probabilities with true outcome frequencies, is essential for reliable decision-making. While extensively studied for classification and regression, calibration has not been formally addressed for probabilistic label ranking, where the goal is to predict a distribution over orderings of a label set. Naively treating rankings as classes ignores their structure and fails to capture important modalities such as pairwise and top-k predictions. We formalize calibration for label ranking and develop a hierarchy of notions covering full rankings, sub-rankings, and top-k rankings. We prove that full-rank calibration implies the others but not conversely, and that sub-ranking and top-k calibration are incomparable. Empirically, we find popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. Applying our framework to RLHF reward models, we find that calibration correlates strongly but not perfectly with benchmark accuracy, suggesting it captures a meaningful quality dimension beyond top-1 accuracy. These findings motivate future work on understanding the downstream effects of miscalibration and developing methods to correct it.
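To illustrate why rankings need more than class-style calibration, the sketch below derives pairwise and top-k marginals from a predicted distribution over full rankings. It then scores those marginals with a binned, ECE-style calibration error. The equal-width binning and the helper names are assumptions for illustration. The paper's own calibration notions and estimators may differ.

```python
from itertools import permutations


def pairwise_marginal(dist, a, b):
    """P(a ranked above b) under a distribution {ranking tuple: probability}."""
    return sum(p for r, p in dist.items() if r.index(a) < r.index(b))


def topk_marginal(dist, label, k):
    """P(label appears among the first k positions)."""
    return sum(p for r, p in dist.items() if r.index(label) < k)


def binned_calibration_error(preds, outcomes, n_bins=10):
    """Weighted mean |average confidence - observed frequency| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    error = 0.0
    for b in bins:
        if b:
            confidence = sum(p for p, _ in b) / len(b)
            frequency = sum(y for _, y in b) / len(b)
            error += len(b) / len(preds) * abs(confidence - frequency)
    return error


# Uniform distribution over the 6 rankings of labels a, b, c.
uniform = {r: 1 / 6 for r in permutations("abc")}
```

The paper proves that sub-ranking and top-k calibration are incomparable. In a setup like this, pairwise and top-k marginals would therefore each need their own check.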

2605.30444 2026-06-01 cs.CV

Dex2HOI: Dexterous Bimanual Two-Object Interaction Generation

Chrysa Pratikaki, Pablo Ruiz-Ponce, Jiankang Deng, Stefanos Zafeiriou, Rolandos Alexandros Potamias

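AI summary: Proposes Dex2HOI, a unified diffusion model that uses dual-stream diffusion and a motion fusion network to generate dexterous bimanual single- and two-object interactions from text, with an inference speed-up of up to 540×.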

English abstract

Recent advances in 4D Human-Object Interaction (HOI) generation have enabled increasingly realistic motion synthesis, particularly for single-object manipulation. Yet current research overlooks an inherent property of human behavior: people naturally coordinate both hands and manipulate multiple objects simultaneously. To address this gap, we present Dex2HOI, a unified diffusion model for single- and two-object HOI synthesis from text. At its core, Dex2HOI employs a Dual-Stream Diffusion approach, where each object is processed in a dedicated interaction stream and coordinated through bidirectional cross-attention. To synthesize the final motion, we introduce a Motion Fusion Network integrated with novel hand-relative object representations and contact-aware conditioning applied across the whole sequence. By sampling the diffusion process autoregressively over prefix-conditioned windows, Dex2HOI generates arbitrarily long sequences at real-time speed without redundant test-time optimization, achieving up to a 540× inference speed-up over prior state-of-the-art methods. Extensive evaluation on both single- and two-object benchmarks demonstrates state-of-the-art quantitative results, marking a step beyond conventional single-object HOI generation and toward expressive multi-object manipulation. Code and models will be released upon acceptance.
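The long-sequence generation described above, autoregressive sampling over prefix-conditioned windows, can be sketched as a simple rollout loop. `sample_window` is a hypothetical stand-in for one diffusion sampling pass conditioned on the last `prefix` frames. Text conditioning is omitted, and the window and prefix lengths are illustrative assumptions rather than values from the paper.

```python
from typing import Callable, List

Frame = List[float]  # one pose/object-state vector per frame (placeholder)


def generate_long_sequence(
    sample_window: Callable[[List[Frame], int], List[Frame]],
    total_frames: int,
    window: int = 60,
    prefix: int = 20,
) -> List[Frame]:
    """Roll out arbitrarily long motion by chaining prefix-conditioned windows."""
    assert 0 <= prefix < window
    motion = sample_window([], window)  # the first window has no prefix to condition on
    while len(motion) < total_frames:
        # Condition on the tail of what has been generated so far
        # (guard needed because motion[-0:] would return the whole list).
        context = motion[-prefix:] if prefix else []
        # Each later window only has to produce window - prefix new frames.
        motion.extend(sample_window(context, window - prefix))
    return motion[:total_frames]


def counting_sampler(context: List[Frame], n_new: int) -> List[Frame]:
    """Toy stand-in for a diffusion sampler: continues a 1-D frame counter."""
    start = context[-1][0] + 1 if context else 0.0
    return [[start + i] for i in range(n_new)]
```

With `counting_sampler`, the rollout yields a seamless counter. This is the continuity that prefix conditioning is meant to provide across window boundaries.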

2605.30434 2026-06-01 cs.LG cs.AI cs.CL cs.MA

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding, Haoming Xu, Lei Liang, Ningyu Zhang

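AI summary: Proposes the LongDS benchmark to evaluate whether agents can maintain and update analytical state in long-horizon, multi-turn data analysis; the best model averages only 48.45% accuracy, and long-horizon errors account for 52%–69% of failures.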

Comments
Ongoing work
English abstract

Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark for long-horizon, multi-turn data analysis where agents must maintain, update, restore, and compose evolving analytical states. LongDS comprises 68 tasks constructed from real-world Kaggle notebooks, spanning 2,225 turns across six domains including Geoscience, Business, and Education. Tasks are designed around state-evolution patterns (e.g., counterfactual perturbation, rollback, multi-state composition), with an average dependency span of 11.3 turns. Evaluating five state-of-the-art models, we find that the best model reaches only 48.45% average accuracy, performance drops nearly 47 points from early to late turns, and long-horizon errors account for 52%–69% of failures. Further analysis shows that additional agent steps do not necessarily improve performance, suggesting that the key bottleneck is maintaining a correct analytical state rather than increasing interaction budget. We release LongDS to support research on reliable long-horizon agentic data analysis. Code and data will be released at https://github.com/zjunlp/DataMind.

2605.30431 2026-06-01 cs.CV

DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution

Hidir Yesiltepe, Koutilya PNVR, Gaurav Pathak, Navaneeth Bodla, Bharat Singh, Pinar Yanardag, Jinrong Xie

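AI summary: Proposes Decoupled Time Guidance (DTG), which decouples the conditional and unconditional branches in time to enhance distorted low-resolution videos without training, improving structural fidelity and temporal stability.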

English abstract

Recent progress in video diffusion models has enabled remarkable generative fidelity, yet leveraging these priors for restoration remains limited by the strong coupling between conditional and unconditional branches in standard classifier-free guidance. We introduce a training-free framework that enhances distorted and low-resolution videos by decoupling these signals in time. Our proposed Decoupled Time Guidance (DTG) evaluates the unconditional branch at a cleaner diffusion timestep, providing a lookahead prior that preserves geometry while suppressing replication of warped content. This temporal bias is annealed throughout sampling, allowing the model to transition from structure correction to detail refinement without retraining. Combined with any off-the-shelf restoration module in a plug-and-play manner, our approach improves perceptual coherence and restores plausible structure in AI-generated and real-world videos alike. To facilitate evaluation, we curate GenWarp480, a benchmark of 4,400 distorted 480p videos synthesized from diverse text-to-video models. GenWarp480 focuses on characteristic generative degradations such as warped faces, body misalignments, and spatial artifacts, providing a purpose-built testbed for assessing robustness to generative errors. Extensive experiments demonstrate that our method achieves significant improvements in structural fidelity and temporal stability without any model training.
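A minimal sketch of the guidance rule as described: it is standard classifier-free guidance, except that the unconditional prediction is queried at a cleaner (smaller) timestep. The offset is annealed to zero over sampling, so the rule reduces to ordinary CFG at the end. Feeding the same noisy latent with a smaller timestep label is an assumption, and so is the linear annealing schedule. `model`, `dtg_eps`, and `annealed_offset` are hypothetical names, and plain Python lists stand in for latent tensors.

```python
def dtg_eps(model, x_t, t, cond, w, offset):
    """Guided noise prediction with the unconditional branch evaluated at t - offset."""
    eps_cond = model(x_t, t, cond)
    eps_uncond = model(x_t, max(t - offset, 0), None)  # cleaner timestep (assumed)
    # Usual CFG combination, elementwise: uncond + w * (cond - uncond).
    return [u + w * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def annealed_offset(step, num_steps, max_offset):
    """Linearly shrink the timestep offset from max_offset to 0 (assumed schedule)."""
    return round(max_offset * (1 - step / max(num_steps - 1, 1)))


def toy_model(x, t, cond):
    """Toy denoiser used only to exercise the arithmetic."""
    return [xi * t + (1.0 if cond is not None else 0.0) for xi in x]
```

With `offset=0`, `dtg_eps` is exactly standard CFG, which is the behavior the annealed schedule converges to by the final step.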

2605.30415 2026-06-01 cs.CL cs.AI

Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology

Francesco De Bernardis

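AI summary: Uses a controlled experiment with historical cosmology to study how domain adaptation reshapes the explanatory behavior of language models, finding that adaptation mainly changes the explanatory frame rather than directly changing stance.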

Comments
17 pages, 3 figures
English abstract

We investigate how domain adaptation reshapes explanatory behavior in language models using historical cosmology as a controlled setting. In Phase 1, we train a small language model from scratch on a pre-Copernican corpus from which explicit heliocentric references were removed, and evaluate whether Earth-motion or heliocentric continuations nevertheless emerge. In Phase 2, we fine-tune a larger pretrained model using QLoRA on the same corpus in order to study how adaptation modifies explanatory framing and cosmological stance. Model outputs are evaluated using an LLM-as-judge framework that labels both cosmological stance (geocentric, heliocentric, or ambiguous) and explanatory frame (premodern versus modern). In the constrained setting of Phase 1, the smaller models occasionally generate local Earth-motion continuations, but these remain globally unstable and insufficient to support coherent cosmological reasoning. In Phase 2, fine-tuning induces a large and statistically significant shift toward premodern explanatory framing, while the conditional cosmological stance distributions remain comparatively stable within those frames. As a result, increases in geocentric outputs arise primarily from redistribution over explanatory regimes rather than from direct modification of stance. These results suggest that domain adaptation may primarily reshape the linguistic frameworks from which continuations are generated, with changes in stance emerging secondarily from those shifts.
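The claim that geocentric outputs rise mainly through redistribution over explanatory frames follows from the law of total probability, P(stance) = Σ_f P(f) · P(stance | f). The sketch below splits a change in P(geocentric) into two parts. The frame-redistribution part uses the new frame mix with the old conditionals, and the within-frame part is the remainder. The numbers are made up purely for illustration and are not results from the paper. This split is also only one of several possible decompositions.

```python
def stance_marginal(frame_probs, stance_given_frame, stance):
    """P(stance) = sum over frames f of P(f) * P(stance | f)."""
    return sum(frame_probs[f] * stance_given_frame[f][stance] for f in frame_probs)


def decompose_shift(frames_before, frames_after, cond_before, cond_after, stance):
    """Return (total change, part from frame redistribution, part from within-frame change)."""
    before = stance_marginal(frames_before, cond_before, stance)
    after = stance_marginal(frames_after, cond_after, stance)
    redistributed = stance_marginal(frames_after, cond_before, stance) - before
    return after - before, redistributed, (after - before) - redistributed


# Hypothetical numbers: fine-tuning moves mass to the premodern frame
# while P(geocentric | frame) stays fixed.
cond = {"premodern": {"geocentric": 0.7}, "modern": {"geocentric": 0.1}}
base = {"premodern": 0.2, "modern": 0.8}
tuned = {"premodern": 0.8, "modern": 0.2}
total, redistribution, within = decompose_shift(base, tuned, cond, cond, "geocentric")
```

In this toy case, the entire rise in geocentric outputs is attributed to redistribution and none to within-frame change. That is the qualitative pattern the abstract reports.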

2605.30409 2026-06-01 cs.CV cs.AI

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Yuyang Zhao, Yicheng Pan, Qiyuan He, Jincheng Yu, Junsong Chen, Tian Ye, Haozhe Liu, Enze Xie, Song Han

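AI summary: Proposes SANA-Streaming, a system-algorithm co-designed framework that combines a hybrid diffusion Transformer architecture, a cycle-reverse regularization training strategy, and efficient system co-design to achieve high-resolution, real-time streaming video editing on consumer GPUs, reaching 24 FPS end-to-end at 1280×704 resolution.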

English abstract

Real-time streaming video-to-video editing (V2V) is critical for interactive applications such as live broadcasting and gaming, yet it remains a formidable challenge due to the stringent requirements for temporal consistency and inference throughput. In this paper, we present SANA-Streaming, a system-algorithm co-designed framework for high-resolution, real-time streaming video editing on consumer GPUs, with the following three core designs: (1) Hybrid Diffusion Transformer architecture introduces softmax attention in part of the blocks to improve local modeling capabilities while preserving the efficiency of linear layers. (2) Cycle-Reverse Regularization is a novel training strategy that enforces semantic consistency by predicting source frames from generated content via flow matching, improving temporal consistency without requiring paired long edited videos. (3) Efficient System Co-design combines fused GDN kernels and Mixed-Precision Quantization (MPQ) optimized for the NVIDIA Blackwell (RTX 5090) architecture. By profiling real-world throughput, our MPQ maximizes Tensor Core utilization while maintaining generation quality. The resulting system achieves real-time 1280×704 resolution editing at 24 end-to-end FPS on a single RTX 5090 GPU, with the DiT core running at 58 FPS. Experimental results demonstrate that our co-design approach significantly outperforms existing state-of-the-art methods in both temporal coherence and system throughput.

2605.30400 2026-06-01 cs.CL

Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

Ahmed Abdeen Hamed, Luis M. Rocha

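AI summary: Proposes a protocol that uses RAG and a cross-model majority-voting workflow to evaluate ChatGPT's ability to generate disease-centric biomedical associations and to verify their reliability.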

Journal ref
STAR Protocols, 2026; 7
Comments
Main Manuscript and Supplementary Information. Both are equally important
English abstract

We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish the veracity of content generated by other LLMs and to expose hallucinations.
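The cross-model majority vote and the self-consistency check can be illustrated with two small helpers. The verdict labels, the strict-majority rule, and the 0.5 agreement threshold are assumptions for illustration only. The abstract does not give the protocol's actual voting rules or thresholds, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(verdicts):
    """Strict-majority verdict across verifier models ("true"/"false"), else "undecided"."""
    counts = Counter(v for v in verdicts if v in ("true", "false"))
    if not counts:
        return "undecided"
    label, n = counts.most_common(1)[0]
    # A strict majority is required over *all* models queried, including abstentions.
    return label if n > len(verdicts) / 2 else "undecided"


def consistent_associations(runs, min_fraction=0.5):
    """Associations (e.g. (disease, gene) pairs) produced in at least min_fraction of repeated runs."""
    counts = Counter(a for run in runs for a in set(run))
    return {a for a, n in counts.items() if n / len(runs) >= min_fraction}
```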

2605.30393 2026-06-01 cs.LG cs.AI cs.CR

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

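AI summary: Proposes the NumLeak framework, which uses API-boundary probes together with white-box validation on an open causal model to show that foundation models memorize public numeric benchmarks during pretraining, causing evaluations to overestimate generalization.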

Comments
23 pages, 12 figures, 17 tables. Accepted at the ICML 2026 Workshop on the Impact of Memorization on Trustworthy Foundation Models (MemFM)
English abstract

Public numeric benchmarks appear in pretraining, so an evaluation that conditions on a date may be measuring memorized recall rather than out-of-sample skill. We introduce NumLeak, a measurement framework that combines API-boundary probes on production models with a white-box controlled validation on an open causal LM. Top-tier frontier LLMs recall the Fama-French market excess return at 3-seed pooled Pearson r=0.97–0.99 while staying within 0.15 within-25bps on the five sibling factors; comparable fidelity appears on U.S. unemployment, CPI inflation, and NOAA temperature. On a recent-release holdout, the parse rate collapses to 21–57%, but r stays at approximately 0.99 on the months answered, which is the refuse-or-recall asymmetry that a memorized channel predicts. The white-box experiment reproduces the dose-response, and logprob ranking detects memorization that open-ended generation misses, implying that closed-API black-box probes understate the channel. A Sonnet "date to market-sentiment" regression that correlates with true Mkt-RF at r=0.74 collapses to r=0.02 once the model's own recall is residualized out. A one-line system-prompt defense blocks 99.8% of a non-adaptive single-turn suffix attack set at near-zero utility cost on conceptual and historical-narrative queries.
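The residualization check can be sketched with ordinary least squares. First, regress the model's date-conditioned output on its own recalled value. Then correlate the residuals with the ground truth. If the output's predictive power came only through memorized recall, the residual correlation collapses toward zero. The OLS-on-recall choice and the synthetic numbers below are assumptions for illustration, not the paper's data or exact procedure.

```python
from math import sqrt


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(y, x):
    """Residuals of the OLS fit y ~ intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Synthetic example: the "sentiment" signal is recall plus noise unrelated to the truth.
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
recall = truth[:]  # a perfectly memorized series
sentiment = [2 * r + n for r, n in zip(recall, [1.0, -1.0, 0.0, -1.0, 1.0])]
raw_r = pearson(sentiment, truth)                              # high
residual_r = pearson(residualize(sentiment, recall), truth)   # about 0
```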

2605.30388 2026-06-01 cs.LG

A Novel Evaluation Metric for Unsupervised Learning in AIS-Based Maritime Anomaly Detection: MADQI

Ismet Gocer, Zakirul Bhuiyan, Raza Hasan, Shakeel Ahmad

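AI summary: Proposes MADQI, a maritime anomaly detection quality index that requires no labeled data and evaluates the anomaly detection performance of unsupervised learning models by combining four sub-metrics.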

Comments
26 pages, A new Eval Metric for Unsupervised Machine Learning
English abstract

This paper introduces a new systematic framework for detecting anomalies in maritime Automatic Identification System (AIS) datasets. These anomalies include abnormal vessel behaviours related to speed, position jumps, time gaps, and turn angles. Although unsupervised learning algorithms such as Isolation Forest are widely used for detecting anomalous vessel movements, they often lack systematic and meaningful evaluation measures. To address this limitation, we propose a novel quality metric called the Maritime Anomaly Detection Quality Index (MADQI). The proposed MADQI is a composite index designed to evaluate the anomaly detection performance of machine learning models without requiring labelled data. The proposed framework uses Haversine distance calculations to analyse AIS datasets and identify anomalies based on their spatial and behavioural characteristics. The proposed MADQI evaluation framework integrates four interconnected metrics: Anomaly Rate Consistency (ARC), Physical Plausibility Score (PPS), Score Distribution Separation (SDS), and Extreme Case Evidence (ECE). These metrics are combined through automatic normalisation using multi-chunk evaluation and adaptive scaling techniques. Experimental results on the AIS dataset show that the proposed framework achieved a MADQI score of 80.37%, demonstrating its effectiveness for unsupervised anomaly detection. In particular, the algorithm performed strongly in identifying abnormal vessel behaviour. Among the individual MADQI components, ECE and ARC achieved scores of 0.907 and 1.000, respectively, indicating excellent capability in detecting extreme anomalies and maintaining anomaly rate consistency. Overall, these results are encouraging and demonstrate that the proposed framework provides a reliable and meaningful approach for evaluating unsupervised anomaly detection in maritime AIS data.
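The Haversine computation the framework relies on can be sketched as follows. The sketch also shows one way to turn consecutive AIS fixes into an implied speed for flagging position jumps. The fix format `(lat, lon, unix_seconds)`, the 50-knot threshold, and the helper names are illustrative assumptions. The abstract does not give the paper's exact features or thresholds.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))  # clamp guards rounding error


def implied_speed_knots(fix1, fix2):
    """Speed needed to travel between two AIS fixes (lat, lon, unix_seconds)."""
    dist_km = haversine_km(fix1[0], fix1[1], fix2[0], fix2[1])
    hours = (fix2[2] - fix1[2]) / 3600.0
    if hours <= 0:
        return float("inf")  # duplicate or out-of-order timestamps
    return dist_km / KM_PER_NAUTICAL_MILE / hours


def flag_position_jumps(track, max_knots=50.0):
    """Indices i where reaching track[i] from track[i-1] requires an implausible speed."""
    return [i for i in range(1, len(track))
            if implied_speed_knots(track[i - 1], track[i]) > max_knots]
```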

2605.30387 2026-06-01 cs.LG cs.AI cs.CV eess.SP

Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

Hwa Hui Tew, Junn Yong Loo, Fang Yu Leong, Julia K. Lau, Ding Fan, Hernando Ombao, Raphaël C. -W. Phan, Chee Pin Tan, Chee-Ming Ting

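AI summary: Proposes the Dual-Spectral Flow Matching (DSFM) framework, which represents BOLD signals in two frequency domains via the discrete wavelet transform and the discrete cosine transform, uses spectral flow matching to generate class-conditioned cosine-frequency representations, and then applies the inverse transforms to reconstruct physiologically plausible time-domain BOLD signals, improving downstream brain network classification.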

Comments
Accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)
English abstract

Functional Magnetic Resonance Imaging (fMRI) provides non-invasive access to dynamic brain activity by measuring blood oxygen level-dependent (BOLD) signals over time. However, the resource-intensive nature of fMRI acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often struggle to replicate the inherent non-stationarity, intricate spatiotemporal dynamics, and physiological variations of raw BOLD signals. To address these challenges, we propose Dual-Spectral Flow Matching (DSFM), a novel fMRI generative framework that cascades a dual frequency representation of BOLD signals with spectral flow matching. Specifically, our framework first converts BOLD signals into a wavelet decomposition map via a discrete wavelet transform (DWT) to capture global transient and multi-scale variations, and then projects this map into the discrete cosine transform (DCT) space across brain regions and time to exploit the localized energy compaction of the low-frequency-dominant BOLD coefficients. Subsequently, a spectral flow matching model is trained to generate class-conditioned cosine-frequency representations. The generated samples are reconstructed through inverse DCT and inverse DWT operations to recover physiologically plausible time-domain BOLD signals. This dual-transform approach imposes structured frequency priors and preserves key physiological brain dynamics. Ultimately, we demonstrate the efficacy of our approach through improved downstream fMRI-based brain network classification. The code is available at https://github.com/htew0001/DSFM.git .
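The invertible front end of this pipeline can be sketched with NumPy: a DWT over time, then a 2-D DCT across regions and time, and the inverse path back to BOLD signals. A single-level Haar wavelet stands in for whatever wavelet family and depth the paper uses. The flow-matching generator is omitted entirely, and the function names are hypothetical. The point is only that both transforms are orthonormal, so generated coefficients map back to the time domain exactly.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar DWT along the last (time) axis; length must be even."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-frequency trend
    detail = (even - odd) / np.sqrt(2.0)  # high-frequency transients
    return np.concatenate([approx, detail], axis=-1)  # wavelet decomposition map


def haar_idwt(w):
    """Inverse of haar_dwt."""
    half = w.shape[-1] // 2
    approx, detail = w[..., :half], w[..., half:]
    x = np.empty_like(w)
    x[..., 0::2] = (approx + detail) / np.sqrt(2.0)
    x[..., 1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so C.T is its inverse (DCT-III)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def to_spectral(bold):
    """bold: (regions, time). DWT over time, then 2-D DCT across regions and time."""
    w = haar_dwt(bold)
    c_r, c_t = dct_matrix(w.shape[0]), dct_matrix(w.shape[1])
    return c_r @ w @ c_t.T


def from_spectral(coeffs):
    """Inverse DCT followed by inverse DWT, back to time-domain BOLD signals."""
    c_r, c_t = dct_matrix(coeffs.shape[0]), dct_matrix(coeffs.shape[1])
    return haar_idwt(c_r.T @ coeffs @ c_t)
```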

2605.30385 2026-06-01 cs.LG cs.AI

LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study

Vincent Granville

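AI summary: Proposes a new RBF-network-based architecture that needs no deep neural network: it finds the global optimum of the loss function in closed form, eliminating the tedious training step and improving explainability and accuracy.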

Comments
9 pages, 5 figures
English abstract

The purpose of this article is to validate my deep neural network alternative in the context of LLMs. Very recently, Chinese researchers have shown significant interest in a model called the RBF network as a substitute for standard DNNs, with increased explainability and higher accuracy. It turns out that my new model, discovered independently, is based on exactly the same machinery, but with a major twist: it does not need a DNN, as it finds the global optimum of the loss function in closed form, in one iteration, thus eliminating the tedious training step. Here I provide a high-level overview of my technology, with a case study and a comparison to similar methods.
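For readers unfamiliar with RBF networks, the sketch below shows the textbook version of the closed-form property mentioned above. Once the Gaussian centers and width are fixed, the output weights solve a linear least-squares problem. The global optimum of the squared loss is therefore obtained in one step, without gradient-based training. This is a generic RBF regression example that assumes fixed centers and a hand-picked `gamma`. It is not the author's LLM architecture, which the abstract does not describe.

```python
import numpy as np


def rbf_features(x, centers, gamma):
    """Gaussian RBF activations: phi[i, j] = exp(-gamma * ||x_i - c_j||^2)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def fit_rbf(x, y, centers, gamma=1.0):
    """Closed-form output weights: the least-squares problem is linear once centers are fixed."""
    phi = rbf_features(x, centers, gamma)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights


def predict_rbf(x, centers, weights, gamma=1.0):
    return rbf_features(x, centers, gamma) @ weights


# Example: fit sin(2*pi*x) on [0, 1] with 10 fixed centers and no iterative training.
x = np.linspace(0.0, 1.0, 50)[:, None]
y = np.sin(2 * np.pi * x[:, 0])
centers = np.linspace(0.0, 1.0, 10)[:, None]
weights = fit_rbf(x, y, centers, gamma=30.0)
max_error = float(np.max(np.abs(predict_rbf(x, centers, weights, gamma=30.0) - y)))
```

Choosing the centers and width is the part that is not closed-form in general. Here both are fixed by hand.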

2605.30383 2026-06-01 cs.RO cs.AI

Structured interactions improve distributed coordination beyond model scaling in a real-world multi-robot system

Junping Wang, Zhizhong Zhang, Yongqiang Tang, Geng Zheng, Jiaming Zhang, Shiji Song, Yanmei Li, Yushan Ma

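AI summary: Real-world multi-robot experiments show that a modular hierarchical interaction topology improves coordination performance substantially more than increasing model size does.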

English abstract

Scaling individual robot capabilities is common but costly. Here we investigate a system-level design question in real-world multi-robot coordination: given matched hardware budgets, does restructuring communication among robots yield larger gains than increasing onboard model size? Using a representative transport-and-mapping task with 10 physical robots (5 runs per condition, 60 runs total), we find that switching from fully connected to modular hierarchical interactions improves normalised performance by 47 points (on a 0–100 scale), whereas doubling neural network hidden size yields at most 9 points. Nested mixed-effects model comparisons show a substantially larger improvement in model fit for topology than for scale. The pattern is confirmed in independent SMAC replications; heterogeneous benchmark reanalyses provide secondary supporting consistency checks rather than primary evidence. Performance saturation beyond 1024 hidden units is observed in simulation-calibrated extrapolation, not directly on hardware. These results indicate that interaction structure can play a dominant role within the tested system and task setting, while broader quantitative generalisation remains to be established.

2605.30381 2026-06-01 cs.LG cs.AI

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

Vahideh Zolfaghari

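AI summary: Fine-tunes honest and deceptive variants of five Transformer models with LoRA and uses linear probes to detect synthetic deception; near-perfect AUC is reached as early as the first few layers, supporting the linear representation hypothesis and revealing two representational regimes.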

English abstract

Deceptive alignment, in which models maintain accurate internal representations while deliberately producing false outputs, remains a central challenge in AI safety. While strategic deception is the primary long-term concern, synthetic dishonesty, induced via direct optimization on incorrect answers, provides a controlled testbed for studying the representational basis of learned deception. We introduce a multi-model paradigm in which honest and deceptive variants of five transformer models (Pythia-1.4B, Gemma-2-2B/9B, Qwen2.5-7B, Llama-3.1-8B) are fine-tuned using LoRA on the same question distribution. Linear probes trained on mean-pooled hidden states detect synthetic dishonesty with near-perfect AUC (≥ 0.99) as early as layers 1–3 in four architectures, while Pythia-1.4B reaches a peak of 0.705. Logistic regression probes consistently match or outperform MLP probes, supporting the Linear Representation Hypothesis. Probes trained on TruthfulQA generalize with near-zero loss (ΔAUC ≈ 0) to held-out MMLU subjects. Late-layer representations show strong robustness to Gaussian noise, with Gemma-2 models exhibiting exceptional stability. Mechanistic analysis of Fisher Discriminant Ratio, effective rank, centroid geometry, directional stability, cross-domain alignment, and calibration (ECE) reveals two regimes: representational collapse in Pythia/Llama/Qwen versus high-dimensional preservation in Gemma-2. Across all models, the dishonesty direction consolidates progressively in deeper layers, with optimal calibration (ECE < 0.01 for all models except Pythia) achievable in layers 1–4. These results demonstrate that robust, domain-invariant dishonesty representations can be rapidly entrenched via modest supervised fine-tuning, with implications for activation-based monitoring.
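The probing setup can be sketched in three steps. First, mean-pool a layer's hidden states over real tokens. Second, fit a logistic-regression probe. Third, score the probe with ROC AUC. The sketch trains by plain gradient descent on synthetic activations in which a single direction separates the two classes. The hyperparameters, helper names, and data are illustrative assumptions, and a real study would score the probe on held-out examples.

```python
import numpy as np


def mean_pool(hidden, mask):
    """hidden: (n, seq, d) activations from one layer; mask: (n, seq), 1 for real tokens."""
    m = mask[..., None].astype(hidden.dtype)
    return (hidden * m).sum(axis=1) / m.sum(axis=1)


def train_logistic_probe(h, y, lr=0.1, steps=500, l2=1e-3):
    """Logistic-regression probe on pooled states h (n, d); y: 0 = honest, 1 = deceptive."""
    w, b = np.zeros(h.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(h @ w + b)))
        w -= lr * (h.T @ (p - y) / len(y) + l2 * w)
        b -= lr * float(np.mean(p - y))
    return w, b


def auc(scores, y):
    """ROC AUC: probability a random positive outscores a random negative (ties count 1/2)."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Synthetic "pooled activations": deceptive examples are shifted along one unit direction.
rng = np.random.default_rng(0)
d, n = 16, 200
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
y = (np.arange(n) % 2).astype(float)
h = rng.standard_normal((n, d)) + 4.0 * y[:, None] * direction
w, b = train_logistic_probe(h, y)
train_auc = auc(h @ w + b, y)
```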