arXivDaily: daily arXiv digest, updated Monday through Friday
2512.10353 2026-06-18 cs.CV Version update

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Yiheng Lyu, Lian Xu, Coen Arrow, Mohammed Bennamoun, Farid Boussaid, Girish Dwivedi

Affiliations: University of Western Australia; Harry Perkins Institute of Medical Research; National Imaging Facility; Fiona Stanley Hospital; Victor Chang Cardiac Research Institute

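AI summary: Proposes TranSamba, a hybrid architecture that captures 3D context through cross-plane modeling, enabling efficient volumetric segmentation under weak supervision and achieving state-of-the-art performance on three datasets.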

Abstract

Weakly supervised segmentation enables model training from plane-level labels. Existing methods often rely on 2D encoders, neglecting the volumetric nature of medical data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context via cross-plane modeling. TranSamba augments a Vision Transformer backbone with Cross-Plane Mamba blocks, leveraging linear-time modeling for efficient information exchange across neighboring planes. This exchange improves in-plane self-attention and subsequent attention maps for object localization. TranSamba maintains linear time complexity and constant space complexity with respect to the input volume depth. Extensive experiments on three datasets covering diverse modalities and pathologies show that TranSamba achieves state-of-the-art performance, demonstrating the generalizable efficacy of cross-plane modeling. Code is available at: https://github.com/YihengLyu/TranSamba.
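Why a scan over planes keeps cost linear in depth can be shown with a toy recurrence. This is not TranSamba's Cross-Plane Mamba block (Mamba uses an input-dependent selective scan); it assumes a fixed scalar decay `a` and gain `b`, and the bidirectional variant is a hypothetical way to let each plane see neighbours on both sides. One pass over D planes costs O(D) time, and only one state vector is carried regardless of D.

```python
from typing import List

Plane = List[float]  # one plane flattened to a feature vector (toy assumption)


def cross_plane_scan(planes: List[Plane], a: float = 0.5, b: float = 1.0) -> List[Plane]:
    """Toy linear recurrence h_d = a * h_{d-1} + b * x_d along the depth axis.

    Each output mixes information from all preceding planes, but only one
    state vector is kept, so working memory does not grow with depth D.
    """
    if not planes:
        return []
    state = [0.0] * len(planes[0])
    outputs = []
    for plane in planes:  # single pass over planes: O(D) time
        state = [a * h + b * x for h, x in zip(state, plane)]
        outputs.append(list(state))
    return outputs


def bidirectional_scan(planes: List[Plane], a: float = 0.5) -> List[Plane]:
    """Hypothetical: sum forward and backward scans so context flows both ways."""
    fwd = cross_plane_scan(planes, a)
    bwd = cross_plane_scan(planes[::-1], a)[::-1]
    return [[f + g for f, g in zip(pf, pb)] for pf, pb in zip(fwd, bwd)]
```

A unit impulse in the first plane decays geometrically through later planes, which is the "information exchange across neighboring planes" in its simplest form.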

2512.04144 2026-06-18 cs.AI Version update

RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories

Roy Rinberg, Usha Bhalla, Igor Shilov, Flavio P. Calmon, Rohit Gandikota

Affiliations: Harvard University; Imperial College London; Northeastern University

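AI summary: Proposes RippleBench-Maker, an automatic pipeline that retrieves semantic neighbors from a knowledge repository to generate multiple-choice questions, and uses it to evaluate the ripple effects of eight unlearning methods on Llama3-8B-Instruct, finding that accuracy drops decay with semantic distance and are consistent across base models.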

Abstract

Targeted interventions on language models, such as unlearning or model editing, aim to modify specific information, but their effects often propagate to related, unintended areas (e.g., removing virology content may degrade performance on allergies); these side-effects are commonly referred to as the ripple effect. We introduce RippleBench-Maker, an automatic pipeline that retrieves semantic neighbors of any source concept from a knowledge repository and generates multiple-choice questions at varying semantic distances. We instantiate this framework using WikiRAG, an open-source RAG system over English Wikipedia, to construct RippleBench-WMDP-Bio (584 seed topics, 352,961 questions), and evaluate eight unlearning methods on Llama3-8B-Instruct. All eight exhibit accuracy drops that are largest near the unlearned target and decay with semantic distance, each with a distinct propagation profile. We replicate these findings across Mistral-7B, Zephyr-7B, and Yi-34B; cross-model delta curves are nearly identical, suggesting ripple effects are a property of the unlearning method rather than the base model. We validate all major pipeline stages using a four-experiment Mechanical Turk study (5,200+ responses, 61 workers). We release all code, data, and infrastructure.
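A "delta curve" can be read as the per-distance change in accuracy between the base model and the unlearned model. The sketch below assumes each question record carries an integer semantic-distance bucket and two correctness flags; this record layout is hypothetical and not RippleBench's released schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (distance_bucket, base_correct, unlearned_correct): hypothetical record layout
Record = Tuple[int, bool, bool]


def delta_curve(records: Iterable[Record]) -> Dict[int, float]:
    """Return accuracy(unlearned) - accuracy(base) for each distance bucket.

    Negative values mean unlearning hurt accuracy at that distance; a ripple
    effect that decays with distance shows deltas shrinking toward 0.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [count, base_hits, unlearned_hits]
    for dist, base_ok, unl_ok in records:
        t = totals[dist]
        t[0] += 1
        t[1] += int(base_ok)
        t[2] += int(unl_ok)
    return {d: (u - b) / n for d, (n, b, u) in sorted(totals.items())}
```

Comparing these curves across base models is what underlies the observation that ripple profiles depend on the unlearning method rather than the model.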

2510.05107 2026-06-18 cs.AI Version update

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents (Extended Revision: From Behavioral Architecture to Epistemic Accountability)

Myung Ho Kim

Affiliations: JEI University

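AI summary: Proposes the Structured Cognitive Loop (SCL), an architecture that separates cognition, memory, control, and action into distinct modules to make LLM agent behavior accountable; across 360 episodes it reaches 86.3% task success, outperforming prompt-based baselines.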

Comments This revised version extends the original SCL framework from a behavioral architecture for reliable LLM agents into a broader architecture of epistemic accountability, integrating context-aware Human-in-the-Loop control, Pool-Gated Retrieval, and the Horizon-Warrant-Commitment structure

Abstract

The central challenge for AI agents is not only performance but accountability. Agents that act through opaque prompt sequences may produce correct outputs, but they provide little basis for verifying why an action was permitted, where an error occurred, or how responsibility should be assigned. This paper presents the Structured Cognitive Loop (SCL) as an architecture for accountable behavior in large language model agents. SCL separates cognition, memory, control, and action into distinct modules. The language model proposes. External memory preserves verified state. A lightweight controller checks preconditions, prevents redundant actions, and authorizes execution before tools are used. We evaluate SCL against ReAct and common LangChain agent variants across travel planning, conditional email drafting, and constraint-guided image generation. Across 360 episodes, SCL achieves 86.3 percent task success compared with 70.5 to 76.8 percent for prompt-based baselines. It also improves goal fidelity, reduces redundant tool calls, increases reuse of intermediate state, and lowers unsupported assertions. This extended revision situates SCL within a broader architecture of epistemic accountability. Subsequent extensions integrate context-aware Human-in-the-Loop control, Pool-Gated Retrieval, and the Horizon-Warrant-Commitment framework. Together these components define an agent architecture in which the model proposes, structure decides, evidence is warranted before use, and human judgment is embedded in the trace rather than imposed after the fact. The result is a foundation for AI agents whose decisions are not only effective but also authorized, inspectable, and accountable.
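The split described above (the model proposes, memory holds verified state, a controller authorizes) can be sketched as a small gate in front of tool calls. Every class, method, and tool name here is hypothetical; the sketch only illustrates precondition checks and redundancy prevention, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Proposal:
    """An action proposed by the language model (hypothetical structure)."""
    tool: str
    args: Tuple[Tuple[str, str], ...]
    requires: FrozenSet[str] = frozenset()  # facts that must already be verified


@dataclass
class Controller:
    verified: Set[str] = field(default_factory=set)  # external memory of verified facts
    executed: Set[Tuple[str, Tuple]] = field(default_factory=set)

    def authorize(self, p: Proposal) -> str:
        """Decide whether a proposed tool call may run."""
        missing = p.requires - self.verified
        if missing:  # precondition check
            return "reject: missing " + ", ".join(sorted(missing))
        key = (p.tool, p.args)
        if key in self.executed:  # redundancy check: reuse stored state instead
            return "skip: redundant"
        self.executed.add(key)
        return "execute"

    def record(self, fact: str) -> None:
        """Store a fact once it has been verified."""
        self.verified.add(fact)
```

Because every decision passes through `authorize`, the controller's log is where "why was this action permitted?" can be answered after the fact.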

2511.20302 2026-06-18 cs.CV Version update

CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation

Shilei Cao, Ziyang Gong, Hehai Lin, Yang Liu, Jiashun Cheng, Xiaoxing Hu, Haoyuan Liang, Guowen Li, Chengwei Qin, Hong Cheng, Xue Yang, Juepeng Zheng, Haohuan Fu

Affiliations: Sun Yat-sen University; The Chinese University of Hong Kong; Shanghai Jiao Tong University; National Supercomputing Center in Shenzhen; The Hong Kong University of Science and Technology; Beijing Institute of Technology; The Hong Kong University of Science and Technology (Guangzhou); Tsinghua University

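AI summary: Proposes CrossEarth-Gate, which uses a Fisher-information-guided adaptive selection mechanism to dynamically activate the most critical modules for cross-domain adaptation, achieving state-of-the-art performance on 16 of 18 cross-domain benchmarks.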

Abstract

In Remote Sensing (RS), Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key approach to activate the generalizable representation ability of foundation models for downstream tasks. However, existing specialized PEFT methods often fail when applied to large-scale Earth observation tasks, as they are unable to fully handle the multifaceted and unpredictable domain gaps (e.g., spatial, semantic, and frequency shifts) inherent in RS data. To overcome this, we propose CrossEarth-Gate, which introduces two primary contributions. First, we establish a comprehensive RS module toolbox to address multifaceted domain gaps, comprising spatial, semantic, and frequency modules. Second, we develop a Fisher-guided adaptive selection mechanism that operates on this toolbox. This selection is guided by Fisher Information to quantify each module's importance by measuring its contribution to the task-specific gradient flow. It dynamically activates only the most critical modules at the appropriate layers, guiding the gradient flow to maximize adaptation effectiveness and efficiency. Comprehensive experiments validate the efficacy and generalizability of our method, where CrossEarth-Gate achieves state-of-the-art performance on 16 out of 18 cross-domain benchmarks for RS semantic segmentation.
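A common way to turn Fisher information into a per-module importance score is the empirical Fisher diagonal: the mean squared gradient of the loss with respect to a module's parameters. The sketch assumes gradients have already been collected per module over a few mini-batches and keeps the top-k modules; the module names are placeholders, and CrossEarth-Gate's exact scoring, normalization, and per-layer placement may differ.

```python
from typing import Dict, List


def fisher_score(grad_batches: List[List[float]]) -> float:
    """Empirical Fisher diagonal, summed over parameters and averaged over batches.

    grad_batches[i] holds the flattened gradient of one module for batch i.
    """
    n = len(grad_batches)
    return sum(g * g for grads in grad_batches for g in grads) / n


def select_modules(grads: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Return names of the k modules with the highest Fisher scores."""
    scores = {name: fisher_score(gb) for name, gb in grads.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Only the selected modules would then be activated and trained, which is where the efficiency gain comes from.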

2511.05221 2026-06-18 cs.LG q-bio.NC Version update

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

Affiliations: Faculty of Mathematics and Natural Sciences, University of Cologne, Germany; Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany; Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany; German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany; Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany; Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany; Department of Informatics, Technical University of Munich, Germany; Institute for Digital Medicine, University Hospital Bonn, Germany; Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark; Department of Nuclear Medicine, Aarhus University Hospital, Denmark; Department of Electrical and Computer Engineering, Aarhus University, Denmark; Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK

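AI summary: Proposes ActiTect, a fully automated, open-source machine learning tool that identifies RBD from actigraphy recordings using standardized preprocessing and sleep-wake detection; generalization is validated on held-out and external cohorts (AUROC 0.84-0.94).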

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

Journal ref npj Digital Medicine (2026)

Abstract

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of α-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
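Leave-one-dataset-out cross-validation holds out each cohort in turn and trains on the rest, so every score reflects a site never seen during training. A minimal split generator, assuming each recording is tagged with its cohort name (the labels in the example are placeholders):

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_dataset_out(
    cohorts: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_cohort, train_indices, test_indices) for each cohort."""
    for held_out in sorted(set(cohorts)):
        train = [i for i, c in enumerate(cohorts) if c != held_out]
        test = [i for i, c in enumerate(cohorts) if c == held_out]
        yield held_out, train, test
```

scikit-learn's `LeaveOneGroupOut` produces equivalent splits when the cohort labels are passed as `groups`.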

2209.01378 2026-06-18 cs.LG eess.SP q-fin.ST Version update

RNN(p) for Power Consumption Forecasting

Roberto Baviera, Pietro Manzoni

Affiliations: Politecnico di Milano, Department of Mathematics; University of Edinburgh, Business School

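AI summary: Presents RNN(p) as a generalization of ARX(p) for forecasting series with seasonal patterns across multiple time scales, exploits its structured feedback to design efficient training strategies, and achieves high accuracy with interpretability in power consumption forecasting.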

Abstract

An elementary Recurrent Neural Network that operates on p time lags, called an RNN(p), is the natural generalisation of a linear autoregressive model ARX(p). It is a powerful forecasting tool for variables displaying inherent seasonal patterns across multiple time scales, as is often observed in energy, economic, and financial time series. The architecture of RNN(p) models, characterised by structured feedbacks across time lags, enables the design of efficient training strategies. We conduct a comparative study of learning algorithms for these models, providing a rigorous analysis of their computational complexity and training performance. We present two applications of RNN(p) models in power consumption forecasting, a key domain within the energy sector where accurate forecasts inform both operational and financial decisions. Experimental results show that RNN(p) models achieve excellent forecasting accuracy while maintaining a high degree of interpretability. These features make them well-suited for decision-making in energy markets and other fintech applications where reliable predictions play a significant economic role.
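For reference, the linear ARX(p) model that RNN(p) generalizes, in its standard textbook form: $y_t$ is the target (e.g., power load), $\mathbf{x}_t$ a vector of exogenous regressors (calendar or weather features are an illustrative choice), $c$ an intercept, and $\varepsilon_t$ a noise term.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \boldsymbol{\beta}^{\top} \mathbf{x}_t + \varepsilon_t
```

The RNN(p) extends this fixed linear map over the $p$ lags with recurrent, structured feedback across those lags; its exact parametrization is defined in the paper and is not reproduced here.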

2510.27353 2026-06-18 cs.AI Version update

An In-depth Study of LLM Contributions to the Bin Packing Problem

Julien Herrmann, Guillaume Pallez

Affiliations: CNRS-IRIT; Inria

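AI summary: Analyzes LLM-generated heuristics for online bin packing, finds them readable but hard to interpret, proposes simpler and more efficient algorithms, and questions the actual contribution of LLMs to the problem.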

Comments Accepted for publication in ACM Transactions on Evolutionary Learning and Optimization

Abstract

Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based genetic algorithms produced heuristics offering new insights into the online bin packing problem under uniform and Weibull distributions. In this work, we reassess this claim through a detailed analysis of the heuristics produced by LLMs, examining both their behavior and interpretability. Despite being human-readable, these heuristics remain largely opaque even to domain experts. Building on this analysis, we propose a new class of algorithms tailored to these specific bin packing instances. The derived algorithms are significantly simpler, more efficient, more interpretable, and more generalizable, suggesting that the considered instances are themselves relatively simple. We then discuss the limitations of the claim regarding LLMs' contribution to this problem, which appears to rest on the mistaken assumption that the instances had previously been studied. Our findings instead emphasize the need for rigorous validation and contextualization when assessing the scientific value of LLM-generated outputs.
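For context, a standard classical baseline for online bin packing is Best Fit: each arriving item goes into the open bin with the least remaining capacity that can still hold it, and a new bin is opened otherwise. A minimal sketch with integer item sizes (the capacity of 100 is an arbitrary choice here):

```python
from typing import List


def best_fit(items: List[int], capacity: int = 100) -> List[int]:
    """Online Best Fit: return the remaining capacity of each bin after packing.

    Items are processed in arrival order and never moved afterwards.
    """
    bins: List[int] = []  # remaining capacity per open bin
    for item in items:
        if item > capacity:
            raise ValueError(f"item {item} exceeds bin capacity {capacity}")
        fits = [i for i, free in enumerate(bins) if free >= item]
        if fits:
            i = min(fits, key=lambda j: bins[j])  # tightest bin that still fits
            bins[i] -= item
        else:
            bins.append(capacity - item)  # open a new bin
    return bins
```

The number of bins used is `len(best_fit(items))`; heuristics for this problem are typically compared on that count relative to a lower bound.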

2510.21615 2026-06-18 cs.CV Version update

Epipolar Geometry Improves Video Generation Models

Orest Kupyn, Théo Uscidda, Marta Tintore Gazulla, Fabian Manhardt, Federico Tombari, Christian Rupprecht

Affiliations: University of Oxford; Google Research; CREST-ENSAE, Institut Polytechnique de Paris; Technical University of Munich

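AI summary: To address geometric inconsistency and motion artifacts in video generation models, proposes preference optimization based on epipolar geometry constraints, reducing epipolar error by 31% and raising human-rated consistency from 54% to 72% while preserving visual quality.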

Abstract

Video generation models have advanced significantly through latent diffusion transformers trained with rectified flow techniques. Yet these models still struggle with geometric inconsistencies, unstable motion, and visual artifacts that break the illusion of realistic 3D scenes. 3D-consistent video generation could significantly impact numerous downstream applications in generation and reconstruction tasks. We explore how epipolar geometry constraints improve modern video diffusion models. Despite using massive training data, these models fail to capture fundamental geometric principles. We align diffusion models using pairwise epipolar geometry constraints via preference-based optimization, directly addressing unstable trajectories and geometric artifacts through mathematically principled geometric enforcement. Our approach efficiently enforces geometric principles without requiring end-to-end differentiability. Evaluation demonstrates that classical geometric constraints provide more stable optimization signals than modern learned metrics. Training on static scenes with dynamic cameras ensures metric quality while the model generalizes to various dynamic scenes. By bridging data-driven learning with classical computer vision, we reduce epipolar error by 31% and improve human-rated consistency from 54% to 72% without compromising visual quality.
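One common way to measure epipolar error for matched points in two frames is the Sampson distance under a fundamental matrix F, a first-order approximation of the geometric error of the constraint x'ᵀ F x = 0. The sketch assumes F and the correspondences are already available (for example from a feature matcher and a robust estimator); the paper's exact metric and how it is turned into preference pairs may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m: Matrix) -> Matrix:
    return [[m[c][r] for c in range(3)] for r in range(3)]


def sampson_error(F: Matrix, x: Sequence[float], xp: Sequence[float]) -> float:
    """Sampson distance for homogeneous points x (frame 1) and xp (frame 2)."""
    Fx = matvec(F, x)
    Ftxp = matvec(transpose(F), xp)
    residual = sum(a * b for a, b in zip(xp, Fx))  # algebraic error x'^T F x
    denom = Fx[0] ** 2 + Fx[1] ** 2 + Ftxp[0] ** 2 + Ftxp[1] ** 2
    return residual ** 2 / denom if denom > 0 else 0.0  # degenerate F: report 0
```

Averaging this over many correspondences in a generated clip gives a scalar that can rank samples, which is the kind of non-differentiable signal a preference-based method can use.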

2510.18085 2026-06-18 cs.RO cs.AI cs.MA Version update

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

Connor Mattson, Varun Raveendra, Ellen Novoseller, Nicholas Waytowich, Vernon J. Lawhern, Daniel S. Brown

Affiliations: Kahlert School of Computing, University of Utah; DEVCOM Army Research Laboratory

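AI summary: Proposes R2BC, which trains multi-robot systems from rotating single-agent demonstrations without requiring joint-action-space demonstrations; it matches or surpasses an oracle baseline trained on privileged synchronized demonstrations on four simulated tasks and is deployed on two physical robot tasks.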

Comments 8 pages, 6 figures. In Proceedings: IEEE International Conference on Robotics & Automation (ICRA 2026)

Abstract

Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
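A minimal sketch of the round-robin data collection, assuming the human switches to the next agent once per episode (the paper's rotation granularity may differ). The human teleoperates one agent while the others act with their current learned policies, and only the teleoperated agent's (observation, action) pairs go into that agent's behavior-cloning dataset. The `rollout` callable is a hypothetical stand-in for the environment plus teleoperation interface.

```python
from typing import Callable, Dict, List, Tuple

Obs = Tuple[float, ...]
Step = Dict[int, Tuple[Obs, int]]  # agent id -> (observation, action) at one time step


def collect_round_robin(
    num_agents: int,
    episodes: int,
    rollout: Callable[[int], List[Step]],
) -> Dict[int, List[Tuple[Obs, int]]]:
    """Gather per-agent demonstration datasets by rotating the human's agent.

    `rollout(controlled)` (hypothetical) runs one episode in which the human
    drives agent `controlled` and the other agents follow their current
    policies, returning every agent's (observation, action) per time step.
    """
    datasets: Dict[int, List[Tuple[Obs, int]]] = {i: [] for i in range(num_agents)}
    for ep in range(episodes):
        controlled = ep % num_agents  # round-robin choice of demonstrated agent
        for step in rollout(controlled):
            datasets[controlled].append(step[controlled])  # keep human-driven data only
    return datasets
```

Between episodes, each agent's policy would be retrained by behavior cloning on its own dataset, so the non-teleoperated teammates improve as demonstrations accumulate.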

2510.15551 2026-06-18 cs.CL cs.AI cs.LG Version update

Rethinking Cross-lingual Gaps from a Statistical Viewpoint

Vihari Piratla, Purvam Jain, Darshan Singh, Trevor Cohn, Preethi Jyothi, Partha Talukdar

Affiliations: Google DeepMind

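AI summary: Argues that cross-lingual gaps stem from response variance in the target language, formalizes the gap in terms of biased and unbiased errors, and uses inference-time ensembles to reduce variance, yielding relative gains of 8% to over 50% in cross-lingual transfer scores.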

Comments 30 pages

Abstract

Any piece of knowledge is usually expressed in one or a handful of natural languages on the web or in any large corpus. Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried using target languages. A cross-lingual gap is a drop in accuracy incurred when querying knowledge in a target language rather than the source language. Existing research focused on modeling or training failures leading to cross-lingual gaps. In this work, we take an alternative view to characterize the nature of cross-lingual error, and hypothesize that the variance of responses in the target language is a key cause of this gap. For the first time, we formalize the cross-lingual gap in terms of biased and unbiased errors. We empirically validate our hypothesis through multiple inference-time interventions that control variance and reduce the cross-lingual gap. We demonstrate a few test-time ensemble methods that reduce response variance, and thereby improve source-target transfer scores by up to 12 absolute points yielding relative gains of 8% to over 50% across various LLMs.
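One simple test-time ensemble of this kind is majority voting over several responses to the same question (for example, sampled answers, or answers to paraphrased prompts). Aggregation shrinks the variance component of the error but cannot remove a systematic bias. The sketch assumes answers are already normalized strings; it is an illustrative baseline, not necessarily one of the ensembles used in the paper.

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the earliest-seen answer."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    best = max(counts.values())
    for a in answers:  # scan in original order so ties break deterministically
        if counts[a] == best:
            return a
    raise AssertionError("unreachable")
```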

2510.09905 2026-06-18 cs.AI cs.CL Version update

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

Xi Fang, Weijie Xu, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy

Affiliations: Amazon

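AI summary: Studies how user memory induces systematic biases in LLM emotional reasoning, finding that several high-performing models give more accurate emotional interpretations for users with advantaged profiles and that personalization mechanisms can embed social hierarchies.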

Comments 19 pages, 5 figures

Abstract

When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, understanding how this memory shapes emotional reasoning is critical. We investigate how user memory affects emotional intelligence in large language models (LLMs) by evaluating 15 models on human-validated emotional intelligence tests. We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations. Across validated user-independent emotional scenarios and diverse user profiles, systematic biases emerged in several high-performing LLMs where advantaged profiles received more accurate emotional interpretations. Moreover, LLMs demonstrate significant disparities across demographic factors in emotion reasoning and supportive recommendation tasks, indicating that personalization mechanisms can embed social hierarchies into models' emotional reasoning. These results highlight a key challenge for memory-enhanced AI: systems designed for personalization may reinforce social inequalities. To mitigate these disparities, we curate a general-purpose preference dataset designed to reduce demographic profiles' influence on emotional understanding.
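The disparity finding amounts to comparing accuracy on identical scenarios across user profiles. A minimal sketch, assuming each graded response is tagged with a profile group (the group names in the example are placeholders), computes per-group accuracy and the largest gap between any two groups:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """results: (profile_group, is_correct) pairs over the same scenarios."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, ok in results:
        totals[group] += 1
        hits[group] += int(ok)
    return {g: hits[g] / totals[g] for g in totals}


def max_gap(acc: Dict[str, float]) -> float:
    """Largest accuracy difference between any two profile groups."""
    return max(acc.values()) - min(acc.values())
```

Because the scenarios are held fixed, any nonzero gap is attributable to the profile text rather than to differences in question difficulty.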

2506.12311 2026-06-18 cs.CL cs.SD eess.AS Version update

Phonikud: Overcoming Phonetic Underspecification for Hebrew Text-To-Speech

Yakov Kolani, Maxim Melichov, Cobi Calev, Morris Alper

Affiliations: Independent Researcher; Reichman University; Tel Aviv University; Carnegie Mellon University

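AI summary: Presents the Phonikud framework (an open-source G2P system, a corpus, a benchmark, and evaluation models) to address underspecified phonetic features such as stress in Hebrew TTS, achieving more accurate phoneme prediction.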

Comments Accepted to Interspeech 2026. Project page: https://phonikud.github.io

Abstract

Text-to-speech (TTS) for Modern Hebrew is challenged by the language's orthographic complexity, with existing solutions ignoring underspecified phonetic features such as stress. We present a framework for more phonetically accurate Hebrew TTS with four contributions: (1) Phonikud, an open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified International Phonetic Alphabet (IPA) transcriptions, designed by augmenting a base diacritizer. (2) The ILSpeech corpus of paired Hebrew audio, text, and expert IPA annotations. (3) A benchmark for the previously unmeasured task of Hebrew G2P conversion. (4) Hebrew audio-to-IPA models capturing previously disregarded phonetic details for automatic TTS evaluation. Our results show that Phonikud more accurately predicts Hebrew phonemes than prior methods, and that small, local TTS models with phonetic input from Phonikud approach large proprietary systems. We release our code, data, and models at https://phonikud.github.io.
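G2P output is commonly scored with phoneme error rate (PER): the Levenshtein distance between predicted and reference phoneme sequences divided by the reference length. A minimal sketch, assuming transcriptions are already tokenized into lists of IPA symbols; the benchmark's actual metric and tokenization (for example, how stress marks are counted) may differ, and the symbol sequences in the example are toy data, not ILSpeech entries.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over symbol sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y))
        prev = cur
    return prev[-1]


def phoneme_error_rate(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Edit distance normalized by reference length."""
    return edit_distance(pred, ref) / max(len(ref), 1)
```

Treating a stressed vowel as a distinct symbol means a missing stress mark counts as one substitution, which is how stress errors would show up in this metric.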

2502.17748 2026-06-18 cs.LG cs.CR Version update

FinP: Fairness-in-Privacy in Federated Learning by Addressing Disparities in Privacy Risk

Tianyu Zhao, Mahmoud Srewa, Salma Elmalaki

Affiliations: University of California, Irvine

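AI summary: Addresses the unequal distribution of privacy risk in federated learning with FinP, which combines server-side adaptive aggregation and client-side regularization to mitigate source inference attack risk, reducing privacy-exposure disparities by up to 57.14% while keeping model utility comparable to baselines.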

Comments To appear in PoPETS 2026 Issue 4. Privacy Enhancing Technologies Symposium (PETS) 2026

Abstract

Federated Learning (FL) inherently mitigates mass data centralization risks; however, its privacy protections are not equally distributed - leaving vulnerable individuals disproportionately exposed to sophisticated privacy attacks. Crucially, statistical heterogeneity in human-centric FL environments often results in an inequitable distribution of privacy risks, particularly affecting those whose sensitive attributes or behaviors make them outliers. To address this critical gap, we introduce FinP, a novel framework designed to formalize and enforce fairness-in-privacy by mitigating disproportionate client vulnerability to Source Inference Attacks (SIA). FinP operationalizes a two-pronged defense strategy that tackles both the symptoms and root causes of privacy disparity, ensuring that no group of clients bears an excessive privacy burden. It combines a server-side adaptive aggregation mechanism, which dynamically weights client contributions based on their estimated privacy risk, with a client-side regularization technique to curb localized overfitting that drives unique data memorization. Extensive empirical evaluations on FEMNIST, Human Activity Recognition (HAR), and CIFAR-10 datasets demonstrate that FinP effectively aligns privacy fairness with primary task utility. Notably, FinP successfully mitigates SIA risks and reduces disparities in privacy exposure, establishing that strong fairness-in-privacy guarantees need not compromise model utility. Ultimately, FinP establishes equitable privacy protections by reducing vulnerability disparities by up to 57.14%, while preserving global model utility within a marginal +/- 1.75% of standard federated baselines.
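Server-side aggregation that weights client updates by estimated privacy risk can be sketched as a weighted average of client parameter vectors. The inverse-risk weighting below is an illustrative assumption chosen only to show the mechanics; FinP's actual weighting rule and risk estimator are defined in the paper.

```python
from typing import List


def risk_weighted_average(
    updates: List[List[float]], risks: List[float], eps: float = 1e-8
) -> List[float]:
    """Aggregate client updates with weights proportional to 1 / risk (assumed rule)."""
    if not updates or len(updates) != len(risks):
        raise ValueError("need one risk estimate per client update")
    raw = [1.0 / (r + eps) for r in risks]  # eps avoids division by zero
    total = sum(raw)
    weights = [w / total for w in raw]  # normalize so weights sum to 1
    dim = len(updates[0])
    return [sum(w * u[k] for w, u in zip(weights, updates)) for k in range(dim)]
```

With equal risk estimates this reduces to plain unweighted averaging, so any deviation from a FedAvg-style baseline comes only from the risk scores.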

2510.04120 2026-06-18 cs.CL cs.AI Version update

Probing Semantic Alignment, Lexical Invariance, and Syntactic Influence in LLM Metaphor Processing

Fengying Ye, Shanshan Wang, Lidia S. Chao, Derek F. Wong

Affiliations: NLP²CT Lab, Department of Computer and Information Science, University of Macau

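AI summary: Uses geometric probing, context-varying substitution, and syntactic perturbation to analyze semantic drift, lexical stability, and syntactic sensitivity in LLM metaphor processing, showing that strong behavioral performance may stem from heterogeneous signals.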

Comments Accepted to ACL 2026

Abstract

Large language models (LLMs) achieve strong performance on metaphor detection and interpretation tasks, yet it remains unclear what such behavioral success reveals about metaphor processing. We present a diagnostic analysis that examines the limits of behavioral evidence by probing three complementary dimensions: semantic attribute alignment, lexical invariance, and syntactic sensitivity. Using geometric probing, we assess whether model-generated interpretations align with reference semantic attributes; through context-varying substitution, we analyze the stability of lexical associations between metaphorical and literal expressions; and via controlled syntactic perturbations, we examine sensitivity in metaphor detection. Our analysis reveals that LLM-generated interpretations can exhibit semantic drift relative to reference attributes; stable lexical anchors persist across contextual conditions, potentially supporting conventional metaphors while biasing novel metaphors requiring contextual integration; and detection performance is sensitive to syntactic irregularities. These findings suggest that strong behavioral performance may reflect heterogeneous underlying signals, highlighting the need for caution when interpreting metaphor benchmarks as evidence of robust, integrated semantic understanding.
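Geometric probing of this kind often reduces to comparing embeddings, for example the cosine similarity between the embedding of a model-generated interpretation and that of a reference semantic attribute. A minimal sketch on toy vectors; how the paper obtains embeddings and which similarity it uses are not specified here, and the `drift` score is a hypothetical convenience.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("zero vector has no direction")
    return dot / (nu * nv)


def drift(interpretation: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical drift score: 1 - cosine similarity (0 means fully aligned)."""
    return 1.0 - cosine(interpretation, reference)
```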

2509.22020 2026-06-18 cs.LG Version update

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

Shilei Cao, Hehai Lin, Jiashun Cheng, Yang Liu, Guowen Li, Xuehe Wang, Juepeng Zheng, Haoyuan Liang, Meng Jin, Chengwei Qin, Hong Cheng, Haohuan Fu

Affiliations: Sun Yat-sen University; The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology; The Chinese University of Hong Kong; National Supercomputing Center in Shenzhen; Huawei Technologies Co., Ltd; Tsinghua University

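AI summary: Proposes WeatherPEFT, a framework that combines Task-Adaptive Dynamic Prompting with Stochastic Fisher-Guided Adaptive Selection to match full fine-tuning performance on weather downstream tasks with fewer trainable parameters.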

Abstract

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expanding scale increasingly hinder practical deployment. Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. First, during the forward pass, Task-Adaptive Dynamic Prompting (TADP) dynamically injects the embedding weights within the encoder into the input tokens of the pre-trained backbone via internal and external pattern extraction, enabling context-aware feature recalibration for specific downstream tasks. Second, during backpropagation, Stochastic Fisher-Guided Adaptive Selection (SFAS) not only leverages Fisher information to identify and update the most task-critical parameters, thereby preserving invariant pre-trained knowledge, but also introduces randomness to stabilize the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, and WeatherPEFT achieves performance parity with Full-Tuning using fewer trainable parameters. The code of this work is available at https://github.com/ShileiCao/WeatherPEFT.
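Fisher-guided parameter selection can be sketched with the diagonal empirical Fisher, which is the mean squared per-sample gradient. This is a minimal illustration under stated assumptions, not the SFAS implementation. The function names are hypothetical. Sampling parameters with probability proportional to their Fisher score is only one plausible way to add the randomness the abstract mentions.

```python
import numpy as np

def fisher_diagonal(per_sample_grads: np.ndarray) -> np.ndarray:
    """Empirical Fisher diagonal: mean squared gradient per parameter.

    per_sample_grads has shape (num_samples, num_params)."""
    return np.mean(per_sample_grads ** 2, axis=0)

def select_parameters(fisher, k, rng, stochastic=True):
    """Boolean mask of the k parameters to update.

    Deterministic mode keeps the k largest Fisher scores; stochastic mode
    (an assumed form of randomization) samples k indices without replacement
    with probability proportional to the Fisher score."""
    if stochastic:
        idx = rng.choice(fisher.size, size=k, replace=False, p=fisher / fisher.sum())
    else:
        idx = np.argsort(fisher)[-k:]
    mask = np.zeros(fisher.size, dtype=bool)
    mask[idx] = True
    return mask

def masked_sgd_step(params, grad, mask, lr=1e-2):
    """Update only the selected parameters; the rest keep pre-trained values."""
    return params - lr * grad * mask

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10)) * np.linspace(0.1, 2.0, 10)  # toy per-sample gradients
F = fisher_diagonal(grads)
mask = select_parameters(F, k=3, rng=rng)
params = masked_sgd_step(np.zeros(10), grads.mean(axis=0), mask)
```

Only masked entries move during the update, so the remaining weights keep their pre-trained values. This mirrors the intuition of preserving invariant pre-trained knowledge.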

2502.07531 2026-06-18 cs.CV cs.AI cs.LG cs.MM Version update

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu

Affiliations: School of Data Science, Fudan University; Shanghai Innovation Institute; Zhejiang University; Huawei Noah’s Ark Lab; Westlake University; School of Data Science and MOE Frontiers Center for Brain Science, Fudan University; Fudan ISTBI–ZJNU Algorithm Centre for Brain-inspired Intelligence, Zhejiang Normal University

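AI summary: Proposes VidCRAFT3, a framework that explicitly models cross-factor interactions among geometry, motion, and lighting to enable independent or joint control of camera motion, object motion, and lighting direction, achieving state-of-the-art control precision and visual consistency.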

Comments Accepted to TVCG 2026

Abstract

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.
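Lighting cross-attention can be illustrated as ordinary cross-attention in which video tokens supply the queries and an encoded lighting direction supplies the keys and values. This NumPy sketch assumes a single lighting token produced by a random linear projection of a unit 3-D direction. The projection `P`, the weight shapes, and the residual form are hypothetical choices and do not reproduce the Spatial Triple-Attention Transformer itself.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lighting_cross_attention(tokens, light_emb, Wq, Wk, Wv):
    """Queries from video tokens (N, d); keys/values from lighting
    embeddings (M, d). The attended lighting features are added residually."""
    q, k, v = tokens @ Wq, light_emb @ Wk, light_emb @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, M)
    return tokens + attn @ v

rng = np.random.default_rng(0)
d = 8
P = rng.normal(size=(3, d))                              # hypothetical direction encoder
direction = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)     # unit light direction
light_emb = (direction @ P)[None, :]                     # one lighting token, shape (1, d)
tokens = rng.normal(size=(16, d))                        # 16 spatial tokens of one frame
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = lighting_cross_attention(tokens, light_emb, Wq, Wk, Wv)
```

With a single lighting token the softmax is trivially 1, so every spatial token receives the same lighting feature. Richer encodings, such as several tokens or per-frame directions, would let the attention weights vary.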

2508.20330 2026-06-18 cs.LG Version update

FORGE: Foundational Optimization Representations from Graph Embeddings

Zohair Shafi, Serdar Kadioglu

Affiliations: Khoury College of Computer Sciences, Northeastern University; AI Center of Excellence, Fidelity Investments; Department of Computer Science, Brown University

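AI summary: Proposes FORGE, a framework that pre-trains a vector-quantized graph autoencoder without supervision to learn general-purpose representations of mixed-integer programming instances, requiring neither solvers nor optimal solutions. On downstream tasks it improves solver performance and outperforms existing methods.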

Comments Published in TMLR

Abstract

Combinatorial optimization problems are ubiquitous in science and engineering. Still, learning-based approaches to accelerate combinatorial optimization often require solving a large number of difficult instances to collect training data, incurring significant computational cost. Existing learning-based methods require training dedicated models for each problem distribution, for each downstream task, severely limiting their scalability and generalization. We introduce Forge: Foundational Optimization Representations from Graph Embeddings, a framework that pre-trains a vector-quantized graph autoencoder on a large, diverse collection of mixed-integer programming (MIP) instances in an unsupervised manner, without relying on optimization solvers or optimal solutions. Vector quantization produces discrete code assignments that serve as a vocabulary for representing optimization instances. We evaluate Forge in both unsupervised and supervised settings. In the unsupervised setting, Forge embeddings effectively cluster unseen instances across problem domains and sizes. In the supervised setting, we fine-tune Forge embeddings and show that a single pre-trained model helps predict both the integrality gap for cut-generation and variable hints for search guidance across multiple problem and size distributions. In both tasks, we improve the performance of a commercial optimization solver and outperform state-of-the-art learning-based methods. Finally, we open-source our training code, pre-trained Forge weights, and embeddings for multiple MIP distributions to foster further research in representation learning for optimization problems (https://skadio.github.io/forge/).
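Vector quantization turns continuous node embeddings into discrete codes by a nearest-neighbour lookup in a codebook. The sketch below shows only that assignment step, plus one hypothetical way to summarize an instance as a normalized histogram of codes. The toy codebook and embeddings are made up. FORGE's encoder, codebook training, and instance-level pooling are not reproduced.

```python
import numpy as np

def quantize(z, codebook):
    """Assign each embedding (row of z) to its nearest codebook vector.
    Returns integer codes and the quantized embeddings."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    codes = d2.argmin(axis=1)
    return codes, codebook[codes]

def instance_histogram(codes, K):
    """Hypothetical instance representation: normalized bag-of-codes."""
    h = np.bincount(codes, minlength=K).astype(float)
    return h / h.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
node_emb = np.array([[0.1, -0.1], [0.9, 1.2], [4.8, 5.1], [1.1, 0.9]])
codes, zq = quantize(node_emb, codebook)
print(codes)                         # [0 1 2 1]
hist = instance_histogram(codes, 3)  # proportions 0.25, 0.5, 0.25
```

Because every instance maps onto the same finite codebook, instances of different sizes become directly comparable, which is what makes the codes usable as a shared vocabulary.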

2509.18588 2026-06-18 cs.CL Version update

UniECG: Understanding and Generating ECG in One Unified Model

Jiarui Jin, Haoyu Wang, Xiang Lan, Jun Li, Hongyan Li, Shenda Hong

Affiliations: Peking University; Shanghai Ocean University; The Second Hospital of Tianjin Medical University; National University of Singapore

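AI summary: Proposes UniECG, a model with a two-stage design that generates explanatory text from ECG signals or images and generates matching ECG signals from textual learning objectives, supporting interactive ECG education.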

Abstract

Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step toward interactive ECG education. UniECG supports two complementary learning interactions: given an ECG signal or image, it generates an evidence-based explanation; given a textual learning objective, it generates a corresponding ECG signal example for case-based learning. The model follows a two-stage design. First, it learns grounded ECG explanation from ECG signal-image-text data. Second, it introduces special ECG generation tokens and aligns their hidden representations with a pretrained text-conditioned ECG diffusion model, enabling controllable signal-level ECG generation. We evaluate UniECG through grounded ECG explanation and generation-oriented qualitative analysis, examining its potential to support explanation and case-based learning. UniECG is intended as an educational aid and a research step toward interactive AI-assisted ECG learning, rather than a clinically validated diagnostic system.

2505.20045 2026-06-18 cs.CL Version update

Efficient Hallucination Detection for LLMs Using Uncertainty-Aware Attention Heads

Artem Vazhentsev, Lyudmila Rvanova, Gleb Kuzmin, Ekaterina Fadeeva, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin, Artem Shelmanov

Affiliations: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); ETH Zurich; Independent Researcher; Applied AI Institute

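AI summary: Proposes RAUQ, a framework that combines uncertainty-aware attention heads with token-level confidence to perform unsupervised, efficient sequence-level hallucination detection in a single forward pass. It outperforms existing methods on 12 datasets while adding less than 1% extra computation.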

Journal ref Proceedings of the 43rd International Conference on Machine Learning (ICML), Seoul, South Korea, 2026

Abstract

While large language models (LLMs) have become highly capable, they remain prone to factual inaccuracies, commonly referred to as "hallucinations." Uncertainty quantification (UQ) offers a promising way to mitigate this issue, but most existing methods are computationally intensive and/or require supervision. In this work, we propose Recurrent Attention-based Uncertainty Quantification (RAUQ), an unsupervised and efficient framework for identifying hallucinations. The method leverages an observation about transformer attention behavior: when incorrect information is generated, certain "uncertainty-aware" attention heads tend to reduce their focus on preceding tokens. RAUQ automatically detects these attention heads and combines their activation patterns with token-level confidence measures in a recurrent scheme, producing a sequence-level uncertainty estimate in just a single forward pass. Through experiments on twelve datasets spanning question answering, summarization, and translation across nine different LLMs, we show that RAUQ consistently outperforms state-of-the-art UQ baselines. Importantly, it incurs minimal overhead, requiring less than 1% additional computation. Since it requires neither labeled data nor extensive parameter tuning, RAUQ serves as a lightweight, plug-and-play solution for real-time hallucination detection in white-box LLMs.
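The recurrent scheme can be sketched as follows. This is built on explicit assumptions and is not the reference implementation. In each layer, the sketch picks the head with the largest mean attention from each token to its predecessor. It then sets c_1 = p_1 and c_i = α p_i + (1 − α) a_{i,i−1} c_{i−1}, where p_i is the probability of the i-th generated token. Each layer is scored by the mean of −log c_i, and the maximum over layers is returned. The value α = 0.2 and all function names are hypothetical.

```python
import numpy as np

def select_head(attn):
    """attn: (heads, T, T) attention of one layer. Return the head with the
    largest mean attention from each token to its immediate predecessor."""
    prev = np.diagonal(attn, offset=-1, axis1=1, axis2=2)  # prev[h, i] = attn[h, i+1, i]
    return int(np.argmax(prev.mean(axis=1)))

def recurrent_confidence(token_probs, prev_attn, alpha=0.2):
    """conf[0] = p[0]; conf[i] = alpha*p[i] + (1-alpha)*attn(i -> i-1)*conf[i-1]."""
    conf = np.empty(len(token_probs))
    conf[0] = token_probs[0]
    for i in range(1, len(token_probs)):
        conf[i] = alpha * token_probs[i] + (1 - alpha) * prev_attn[i - 1] * conf[i - 1]
    return conf

def sequence_uncertainty(token_probs, attn_per_layer, alpha=0.2):
    """Mean negative log recurrent confidence per layer, maximized over layers."""
    scores = []
    for attn in attn_per_layer:
        prev = np.diagonal(attn[select_head(attn)], offset=-1)
        scores.append(-np.mean(np.log(recurrent_confidence(token_probs, prev, alpha))))
    return float(max(scores))

p = np.array([0.9, 0.8, 0.9, 0.7])                 # probabilities of generated tokens
attn = np.full((2, 4, 4), 0.05)
attn[1, np.arange(1, 4), np.arange(0, 3)] = 0.9    # head 1 attends to predecessors
print(select_head(attn), sequence_uncertainty(p, [attn]))
```

Low attention to the preceding token shrinks the recurrent confidence, which raises the sequence-level uncertainty. Every input is a quantity that a single forward pass already produces.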

2509.14969 2026-06-18 cs.LG math.OC stat.ML Version update

Stochastic Adaptive Gradient Descent Without Descent

Jean-François Aujol, Jérémie Bigot, Camille Castera

Affiliations: Univ. Bordeaux, CNRS, Bordeaux INP, IMB, UMR 5251

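AI summary: Proposes an adaptive step-size strategy for stochastic gradients that requires no hyperparameter tuning and exploits local geometric information through a first-order stochastic oracle. Convergence is proven theoretically, and experiments show it competes with tuned baselines.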

Abstract

We introduce a new adaptive step-size strategy for convex optimization with stochastic gradients that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.
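For reference, the deterministic rule that the paper adapts (Adaptive Gradient Descent without Descent, Malitsky and Mishchenko, 2020) has three parts. It chooses λ_k = min{√(1 + θ_{k−1}) λ_{k−1}, ‖x_k − x_{k−1}‖ / (2‖∇f(x_k) − ∇f(x_{k−1})‖)}. It then steps x_{k+1} = x_k − λ_k ∇f(x_k) and sets θ_k = λ_k / λ_{k−1}. The sketch implements only this deterministic version, applied to a toy quadratic. The paper's stochastic adaptation is not reproduced. `lam0`, `tol`, and the fallback for an infinite step are our own assumptions.

```python
import numpy as np

def adgd(grad, x0, lam0=1e-6, iters=500, tol=1e-12):
    """Deterministic AdGD step-size rule; no hyperparameters besides lam0."""
    x_prev, g_prev = x0, grad(x0)
    x = x_prev - lam0 * g_prev          # tiny first step to get curvature information
    lam_prev, theta = lam0, np.inf      # theta_0 = +inf, as in the original method
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:    # stop at a (numerically) stationary point
            break
        diff_g = np.linalg.norm(g - g_prev)
        # Local inverse-curvature estimate ||dx|| / (2 ||dg||).
        local = np.linalg.norm(x - x_prev) / (2.0 * diff_g) if diff_g > 0 else np.inf
        lam = min(np.sqrt(1.0 + theta) * lam_prev, local)
        if not np.isfinite(lam):        # no curvature information: keep the old step
            lam = lam_prev
        x_prev, g_prev, x = x, g, x - lam * g
        theta, lam_prev = lam / lam_prev, lam
    return x

# Minimize 0.5 x^T A x - b^T x; the minimizer is A^{-1} b = [1, 0.5].
A = np.diag([1.0, 2.0])
b = np.array([1.0, 1.0])
x_star = adgd(lambda x: A @ x - b, np.zeros(2))
```

On this quadratic the local estimate always lies between 1/(2L) and 1/(2μ), so every step is safely below 2/L without any tuning. That is the property the paper carries over to stochastic gradients.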

2509.14653 2026-06-18 cs.CL Version update

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Ying Fang, Xiaofei Li

Affiliations: Zhejiang University; School of Engineering, Westlake University; Institute of Advanced Technology, Westlake Institute for Advanced Study

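AI summary: To address UMA's performance degradation in languages such as English, which is caused by mismatched token granularity, proposes UMA-Split. A split module maps each aggregated frame to multiple tokens, improving cross-lingual non-autoregressive speech recognition.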

Comments Accepted by ICASSP 2026. Code: https://github.com/FnoY0723/uma_split

Abstract

This paper proposes a unimodal aggregation (UMA) based non-autoregressive model for both English and Mandarin speech recognition. The original UMA explicitly segments and aggregates acoustic frames (with unimodal weights that first monotonically increase and then decrease) of the same text token to learn better representations than regular connectionist temporal classification (CTC). However, it only works well in Mandarin. It struggles with other languages, such as English, in which a single syllable may be tokenized into multiple fine-grained tokens, or a token may span fewer than 3 acoustic frames and fail to form unimodal weights. To address this problem, we propose allowing each UMA-aggregated frame to map to multiple tokens, via a simple split module that generates two tokens from each aggregated frame before computing the CTC loss.
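Unimodal aggregation and the split step can be sketched in NumPy under several assumptions. The per-frame weights are given here; in a real model they would come from the encoder. Segment boundaries are placed at valleys of the weight curve, and each segment is pooled by a weighted average. The split module is a single linear map from each aggregated frame to two token-level vectors. All names and shapes are hypothetical.

```python
import numpy as np

def unimodal_segments(weights):
    """Cut the frame sequence at valleys (local minima) of the weight curve,
    so each segment's weights roughly rise and then fall."""
    bounds = [0]
    for t in range(1, len(weights) - 1):
        if weights[t] < weights[t - 1] and weights[t] <= weights[t + 1]:
            bounds.append(t)
    bounds.append(len(weights))
    return list(zip(bounds[:-1], bounds[1:]))

def aggregate(frames, weights):
    """Weighted average of the frames inside each unimodal segment -> (S, d)."""
    return np.stack([
        (weights[s:e, None] * frames[s:e]).sum(axis=0) / weights[s:e].sum()
        for s, e in unimodal_segments(weights)
    ])

def split(aggregated, W_split):
    """Hypothetical split module: one linear map (d -> 2d) per aggregated frame,
    reshaped into two token-level vectors, giving 2 * S outputs for CTC."""
    S, d = aggregated.shape
    return (aggregated @ W_split).reshape(2 * S, d)

w = np.array([0.1, 0.5, 0.9, 0.4, 0.2, 0.6, 0.8, 0.3])  # two rise-then-fall bumps
print(unimodal_segments(w))  # [(0, 4), (4, 8)]
frames = np.random.default_rng(0).normal(size=(8, 3))
tokens = split(aggregate(frames, w), np.random.default_rng(1).normal(size=(3, 6)))
```

Doubling the number of outputs per aggregated frame gives CTC room to emit two fine-grained tokens, such as English subwords, from one acoustic unit.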

2509.02555 2026-06-18 cs.LG cs.AI cs.NE Version update

Surrogate Benchmarks for Model Merging Optimization

Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

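AI summary: Addresses the high computational cost of hyperparameter optimization for model merging by building surrogate benchmarks that cheaply predict merged-model performance and simulate the behavior of optimization algorithms.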

Comments AutoML 2025 Non-Archival Content Track. The code of the surrogate benchmark is available at https://github.com/shiralab/SMM-Bench

Abstract

Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their settings affect the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, the optimization process is computationally expensive, particularly when merging LLMs. In this work, we develop surrogate benchmarks for optimizing the merging hyperparameters, enabling algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models that predict the performance of a merged model from its hyperparameters. We demonstrate that our benchmarks predict the performance of merged models well and simulate the behavior of optimization algorithms.
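The surrogate-benchmark workflow can be sketched in three steps. First, evaluate a handful of merging hyperparameters once. Second, fit a cheap regressor to the results. Third, run optimizers against the regressor instead of real merges. Everything here is a toy stand-in: `expensive_merge_eval` is a made-up objective over a single merging coefficient, and a quadratic fit replaces the paper's surrogate models.

```python
import numpy as np

def expensive_merge_eval(lam):
    """Toy stand-in for 'merge with coefficient lam, then evaluate': a smooth
    score peaking at lam = 0.6 (purely hypothetical)."""
    return 0.8 - (lam - 0.6) ** 2

rng = np.random.default_rng(0)

# 1) Collect (hyperparameter, score) samples once; these are the costly runs.
X = rng.uniform(0.0, 1.0, size=20)
y = np.array([expensive_merge_eval(x) for x in X]) + rng.normal(0.0, 0.005, size=20)

# 2) Fit a cheap surrogate (a quadratic polynomial here).
surrogate = np.poly1d(np.polyfit(X, y, deg=2))

# 3) Benchmark an optimizer (random search) against the surrogate only.
candidates = rng.uniform(0.0, 1.0, size=1000)
best = candidates[np.argmax(surrogate(candidates))]
print(f"surrogate optimum near lam = {best:.2f}")
```

Step 3 can be repeated for many optimizers and random seeds at negligible cost, which is the point of a surrogate benchmark.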

2508.09191 2026-06-18 cs.LG cs.AI Version update

From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization

Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang

Affiliations: State Key Laboratory of Cognitive Intelligence; University of Science and Technology of China; College of Intelligence and Computing; iFLYTEK Research

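AI summary: Proposes TokenCast, a framework in which a large language model uses symbolic discretization to convert continuous time series into tokens aligned with contextual text, enabling context-aware forecasting. Experiments demonstrate its effectiveness.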

Abstract

Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
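A discrete tokenizer maps continuous values to a finite vocabulary and back. The abstract does not specify TokenCast's tokenizer, so this sketch swaps in the simplest possible scheme: uniform binning of z-normalized values. It shows the encode/decode round trip and its bounded reconstruction error. `n_bins`, `clip`, and the function names are assumptions.

```python
import numpy as np

def fit_tokenizer(series, n_bins=16, clip=3.0):
    """Hypothetical uniform tokenizer: z-normalize, clip to +/- `clip` standard
    deviations, and split [-clip, clip] into n_bins equal-width bins."""
    mu = float(np.mean(series))
    sigma = float(np.std(series)) or 1.0   # guard against a constant series
    edges = np.linspace(-clip, clip, n_bins + 1)
    return mu, sigma, edges

def encode(series, mu, sigma, edges):
    """Map each value to a bin id in [0, n_bins - 1]."""
    z = np.clip((np.asarray(series, dtype=float) - mu) / sigma, edges[0], edges[-1])
    return np.digitize(z, edges[1:-1])

def decode(tokens, mu, sigma, edges):
    """Map bin ids back to values using bin centres."""
    centres = (edges[:-1] + edges[1:]) / 2.0
    return centres[tokens] * sigma + mu

series = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
mu, sigma, edges = fit_tokenizer(series)
tokens = encode(series, mu, sigma, edges)
recon = decode(tokens, mu, sigma, edges)
```

For values inside the clipping range, the round-trip error is at most half a bin width times `sigma`. A learned tokenizer would trade this fixed grid for codes adapted to the data.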

2508.03483 2026-06-18 cs.CV cs.AI Version update

When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models

Dasol Choi, Jihwan Lee, Minjae Lee, Minsuk Kahng

Affiliations: AIM Intelligence; Yonsei University

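AI summary: Proposes SODA, a framework that uses three metrics to systematically measure demographic bias in objects generated by text-to-image models. It finds that neutral prompts implicitly skew toward middle-aged and White people, and that demographic cues produce highly skewed, stereotypical outputs.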

Abstract

While prior research on text-to-image generation has predominantly focused on biases in human depictions, demographic bias in generated objects remains relatively underexplored. We introduce SODA (Stereotyped Object Diagnostic Audit), a novel framework for systematically measuring these biases through automated attribute discovery and three standardized metrics: Base vs. Demographic Divergence (BDS), Cross-Demographic Disparity (CDS), and Visual Attribute Concentration (VAC). Applying SODA to 8,000 images across five state-of-the-art models and eight object categories (e.g., cars), we find that "neutral" prompts produce outputs most visually similar to middle-aged and White people, suggesting these groups are implicitly over-represented in model defaults. Furthermore, demographic cues trigger highly skewed stereotypical outputs: 26.6% of object-model-demographic combinations produce results where all 20 generated images share the exact same attribute value (e.g., rose gold laptops for women). Finally, prompt-level debiasing reduces inter-group disparity but paradoxically collapses within-group diversity, replacing one stereotype with another. SODA offers a practical pipeline for making these implicit associations measurable, serving as a step toward more responsible AI development.
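The collapse statistic reported above, where all 20 images in a group share one attribute value, can be computed from attribute labels alone. The sketch uses a simple max-share concentration score as a hypothetical proxy. It is not the paper's VAC definition, and the example groups are invented.

```python
from collections import Counter

def attribute_concentration(values):
    """Share of images whose attribute equals the most frequent value
    (1.0 means every image in the group shares one value)."""
    return max(Counter(values).values()) / len(values)

def fully_collapsed_share(groups):
    """Fraction of (object, model, demographic) groups with concentration 1.0."""
    flags = [attribute_concentration(v) == 1.0 for v in groups.values()]
    return sum(flags) / len(flags)

groups = {
    ("laptop", "model_a", "women"): ["rose gold"] * 20,
    ("laptop", "model_a", "men"): ["silver"] * 12 + ["black"] * 8,
}
print(fully_collapsed_share(groups))  # 0.5
```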

2504.04739 2026-06-18 cs.LG cs.CY Version update

UST-GNN: A Unified Spatial–Topological Graph Neural Network Framework for Urban Analytics – Demonstrated through a Case Study on Urban Health Prediction

Minwei Zhao, Sanja Scepanovic, Stephen Law, Ivica Obadic, Cai Wu, Daniele Quercia

Affiliations: University College London; The Hong Kong University of Science and Technology; Nokia Bell Labs; Technical University of Munich; University of Oxford

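AI summary: Proposes UST-GNN, a framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional embeddings. In health prediction across 4,835 neighbourhoods in Greater London, it improves R² by 8.4–13.2% under strict spatial cross-validation, and it introduces a principal-component module to interpret the learned embeddings.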

Abstract

Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial–topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph machine learning baselines, improving out-of-sample R² by 8.4–13.2% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.
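A spatial-topological GNN combines a neighbourhood graph with node features and positional information. The sketch shows three generic pieces: a graph-convolution layer with symmetric normalization, node coordinates concatenated as positional features, and the out-of-sample R² metric. It is a minimal illustration under these assumptions, not UST-GNN's architecture. The toy graph and weights are random.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph-convolution layer: neighbour averaging, linear map, ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy example: 4 neighbourhoods on a path graph (hypothetical data).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))                     # e.g. urban covariates
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])  # positional features
X = np.concatenate([features, coords], axis=1)         # (4, 5)
H = gcn_layer(normalized_adjacency(A), X, rng.normal(size=(5, 8)))
```

Under spatial cross-validation, `r2_score` would be computed only on held-out spatial blocks, so that nearby training neighbourhoods cannot leak information into the test score.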

2507.17786 2026-06-18 cs.LG Version update

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky

Affiliations: Software Competence Center Hagenberg (SCCH); Institut für Strömungsmechanik und Wärmeübertragung, TU Wien; CERBSim GmbH

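AI summary: Proposes a reinforcement-learning-based adaptive optimization algorithm that accelerates aerodynamic shape optimization. It uses a surrogate-based, actor-critic policy-evaluation MCMC approach and freezes some parameters to reduce dimensionality. On a simple fluid-dynamics problem, it demonstrates interpretability in terms of feature-importance scoring.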

Abstract

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods in which the changed parameters must reside are sufficiently large to compete with grid-sized steps and their large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods, needed for good step-wise parameter adaptation, are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.
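Parameter freezing can be illustrated with plain Metropolis-Hastings on a surrogate cost, perturbing only the active parameters at each step. This deliberately swaps in a simpler technique. There is no actor-critic policy evaluation, and `surrogate_cost` is a made-up quadratic standing in for a CFD-trained surrogate. The target, step size, and temperature are arbitrary assumptions.

```python
import numpy as np

TARGET = np.array([0.3, -0.2, 0.5, 0.0])  # hypothetical optimal shape parameters

def surrogate_cost(x):
    """Made-up quadratic standing in for a surrogate of a CFD objective."""
    return float(np.sum((x - TARGET) ** 2))

def mh_with_freezing(x0, cost, active, n_steps=2000, step=0.1, temp=0.05, seed=0):
    """Metropolis-Hastings over the `active` parameters only; all others stay
    frozen at their values in x0. Returns the best point visited."""
    rng = np.random.default_rng(seed)
    x, c = x0.copy(), cost(x0)
    best_x, best_c = x.copy(), c
    for _ in range(n_steps):
        prop = x.copy()
        prop[active] += rng.normal(0.0, step, size=len(active))
        c_new = cost(prop)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c_new <= c or rng.random() < np.exp(-(c_new - c) / temp):
            x, c = prop, c_new
            if c < best_c:
                best_x, best_c = x.copy(), c
    return best_x, best_c

best_x, best_c = mh_with_freezing(np.zeros(4), surrogate_cost, active=[0, 2])
```

Freezing parameters 1 and 3 halves the search dimension. The residual cost they leave behind (0.04 here) is the kind of signal that would prompt unfreezing them in a later phase.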

2506.09046 2026-06-18 cs.LG cs.AI cs.MA Version update

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

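AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network. A forward phase decomposes tasks and a backward phase propagates feedback, letting agents self-evolve their roles, prompts, and collaboration. It outperforms existing methods on seven benchmark datasets.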

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.
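The forward/backward control flow can be sketched with string stubs in place of LLM calls. In this toy, each agent's prompt plays the role of a trainable weight. The forward phase runs the layers in order, and the backward phase walks them in reverse, appending the critique to each prompt. The `Agent` class, the join-based aggregation, and the revision rule are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One node of the agentic network; `prompt` acts as its trainable weight."""
    name: str
    prompt: str
    history: list = field(default_factory=list)

    def forward(self, task: str) -> str:
        # Stub for an LLM call: tag the input with this agent's prompt.
        return f"{task} | {self.name}:{self.prompt}"

def forward_pass(layers, task):
    """Run layers in order; each layer's outputs are aggregated by joining."""
    x = task
    for layer in layers:
        x = " ; ".join(agent.forward(x) for agent in layer)
    return x

def backward_pass(layers, feedback):
    """Walk layers in reverse and revise every agent's prompt with the feedback,
    keeping the old prompt in `history` (stub for an LLM rewrite)."""
    for layer in reversed(layers):
        for agent in layer:
            agent.history.append(agent.prompt)
            agent.prompt = f"{agent.prompt} [revise: {feedback}]"

layers = [[Agent("planner", "decompose")],
          [Agent("solver", "solve"), Agent("checker", "verify")]]
out = forward_pass(layers, "Q")
backward_pass(layers, "cite sources")
```

In a real system, the backward step would ask an LLM to rewrite each prompt given global and local feedback. The stub only shows where that rewrite sits in the loop.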

2410.15595 2026-06-18 cs.AI cs.CL cs.LG Version update

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

Affiliations: Zhejiang University; Nanyang Technological University; Alibaba Group

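AI summary: Surveys progress in Direct Preference Optimization (DPO) across theory, variants, datasets, and applications, discusses its potential and limitations as an RL-free alternative, and proposes future research directions.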

Comments Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey

Abstract

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.
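For concreteness, the standard DPO objective for a preference pair (x, y_w, y_l) is −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). The sketch computes it from precomputed sequence log-probabilities of the policy and the frozen reference model. β = 0.1 is an arbitrary default, and the function names are our own.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * (policy/reference log-ratio of the chosen response
    minus that of the rejected response)) for a single preference pair."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return softplus(-margin)   # -log(sigmoid(m)) == log(1 + exp(-m))

print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931 (= log 2)
```

No reward model or RL rollout appears anywhere. The loss depends only on log-probabilities, which is what makes DPO an RL-free alternative to RLHF.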

2507.01414 2026-06-18 cs.LG Version update

Decomposing Prediction Mechanisms for In-Context Recall

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

Affiliations: University of California, Berkeley; University of Pennsylvania

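AI summary: Introduces a new toy problem that combines continuous in-context learning with discrete associative recall. Using it, the authors find that transformers solve in-context recall with two separate mechanisms that have different learning dynamics: one uses discrete symbolic labels for associative recall, and the other makes Bayesian-style predictions from the previous token and the context.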

Comments 45 pages, 47 figures, 2 tables

Abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in its context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism (manifesting as separate phase transitions) phenomenon is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task to see a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs second-task-token performance.
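A data generator in the spirit of this toy can be sketched as follows. Several details are assumptions. Each segment starts with a label token naming the system. Systems are random matrices rescaled to spectral radius 0.95. A resumed system continues from the state where it last stopped. The paper's exact token format and label scheme may differ.

```python
import numpy as np

def random_system(d, rng):
    """Random linear dynamics rescaled to spectral radius 0.95 (stable)."""
    A = rng.normal(size=(d, d))
    return 0.95 * A / np.max(np.abs(np.linalg.eigvals(A)))

def interleaved_trace(n_systems=3, d=4, seg_len=5, n_segments=6, seed=0):
    """Return a list of ("label", k) / ("obs", state) tokens and the systems.

    Each segment names a system, then emits seg_len successive states of it,
    continuing from wherever that system's trajectory last stopped."""
    rng = np.random.default_rng(seed)
    systems = [random_system(d, rng) for _ in range(n_systems)]
    states = [rng.normal(size=d) for _ in range(n_systems)]
    trace = []
    for _ in range(n_segments):
        k = int(rng.integers(n_systems))
        trace.append(("label", k))
        for _ in range(seg_len):
            states[k] = systems[k] @ states[k]
            trace.append(("obs", states[k].copy()))
    return trace, systems

trace, systems = interleaved_trace()
```

Predicting the first observation after a label requires recalling where that system stopped earlier in the context (associative recall). Predicting later observations only requires applying the dynamics to the previous state, which matches the two functions described above.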

2506.14126 2026-06-18 cs.LG cs.AI Updated version

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

Stefan Horoi, Guy Wolf, Eugene Belilovsky, Gintare Karolina Dziugaite

Affiliations: Concordia University; Mila -- Québec AI Institute; Google DeepMind

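AI summary: This paper studies how excessive fine-tuning of expert models affects model merging. It finds that prolonged fine-tuning leads to memorization of difficult examples, which causes parameter interference and lowers merging performance, and it proposes task-dependent early stopping to improve merging results.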

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

Details
AI Chinese abstract

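Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms such as HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to exploit these existing resources, allowing the capabilities of different model checkpoints to be combined. A natural pipeline has therefore formed to reap the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged into a more capable model. A common assumption is that improvements at one stage of this pipeline propagate downstream and yield gains at later steps. In this work, we challenge this assumption by studying how expert fine-tuning affects model merging. We show that long fine-tuning of experts to optimize their individual performance degrades merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate the late steps of fine-tuning. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we show that task-dependent aggressive early stopping strategies can significantly improve model merging performance.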

English abstract

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.
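The pipeline this abstract describes (fine-tune experts, stop early, then merge) can be sketched in a few lines. This is a minimal, hypothetical sketch: it uses task arithmetic (adding the scaled sum of task vectors, expert minus base, to the base weights) as one common merging method, and an illustrative "earliest checkpoint within `tol` of the best validation accuracy" rule as a stand-in for task-dependent aggressive early stopping. Neither the rule, the names `early_stop_index` and `task_arithmetic_merge`, nor the parameters `tol` and `lam` come from the paper. Weights are plain NumPy arrays keyed by parameter name, and the checkpoints and validation curves in the usage part are made-up toy values.

```python
import numpy as np

def early_stop_index(val_acc, tol=0.01):
    """Hypothetical aggressive early stopping: return the index of the earliest
    checkpoint whose validation accuracy is within `tol` of the best one."""
    val_acc = np.asarray(val_acc, dtype=float)
    threshold = val_acc.max() - tol
    # argmax on a boolean array returns the first True; the best checkpoint
    # always qualifies, so at least one entry is True.
    return int(np.argmax(val_acc >= threshold))

def task_arithmetic_merge(base, experts, lam=0.5):
    """Task arithmetic: merged = base + lam * sum_i (expert_i - base),
    applied independently to every named parameter."""
    merged = {}
    for name, w0 in base.items():
        task_vectors = [np.asarray(e[name]) - w0 for e in experts]
        merged[name] = w0 + lam * np.sum(task_vectors, axis=0)
    return merged

# Toy usage with made-up checkpoints and validation curves (illustrative only).
base = {"w": np.zeros(3)}
checkpoints = {
    "task_a": [{"w": np.array([s, 0.0, 0.0])} for s in (0.1, 0.2, 0.3, 0.4)],
    "task_b": [{"w": np.array([0.0, s, 0.0])} for s in (0.1, 0.2, 0.3, 0.4)],
}
val_curves = {
    "task_a": [0.70, 0.80, 0.805, 0.81],
    "task_b": [0.60, 0.75, 0.76, 0.76],
}
# Pick one checkpoint per task from that task's own validation curve.
chosen = [checkpoints[t][early_stop_index(val_curves[t], tol=0.02)] for t in checkpoints]
merged = task_arithmetic_merge(base, chosen, lam=0.5)
```

Because the stopping point is chosen from each task's own validation curve, different experts may be cut off at different steps, which is one way to read "task-dependent" here. Stopping before the curve fully plateaus is also where, per the abstract, the late-stage memorization of difficult examples would be avoided.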