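arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.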

2512.10353 2026-06-18 cs.CV Version update

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Yiheng Lyu, Lian Xu, Coen Arrow, Mohammed Bennamoun, Farid Boussaid, Girish Dwivedi

Affiliations: University of Western Australia; Harry Perkins Institute of Medical Research; National Imaging Facility; Fiona Stanley Hospital; Victor Chang Cardiac Research Institute

AI summary: Proposes TranSamba, a hybrid architecture that captures 3D context through cross-plane modeling and achieves efficient volumetric segmentation under weak supervision, with state-of-the-art results on three datasets.

2512.04144 2026-06-18 cs.AI Version update

RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories

Roy Rinberg, Usha Bhalla, Igor Shilov, Flavio P. Calmon, Rohit Gandikota

Affiliations: Harvard University; Imperial College London; Northeastern University

AI summary: Proposes RippleBench-Maker, an automatic pipeline that retrieves semantic neighbors from a knowledge repository to generate multiple-choice questions, and evaluates the ripple effects of eight unlearning methods on Llama3-8B-Instruct; accuracy drops decay with semantic distance and are consistent across models.

Abstract

Targeted interventions on language models, such as unlearning or model editing, aim to modify specific information, but their effects often propagate to related, unintended areas (e.g., removing virology content may degrade performance on allergies); these side-effects are commonly referred to as the ripple effect. We introduce RippleBench-Maker, an automatic pipeline that retrieves semantic neighbors of any source concept from a knowledge repository and generates multiple-choice questions at varying semantic distances. We instantiate this framework using WikiRAG, an open-source RAG system over English Wikipedia, to construct RippleBench-WMDP-Bio (584 seed topics, 352,961 questions), and evaluate eight unlearning methods on Llama3-8B-Instruct. All eight exhibit accuracy drops that are largest near the unlearned target and decay with semantic distance, each with a distinct propagation profile. We replicate these findings across Mistral-7B, Zephyr-7B, and Yi-34B; cross-model delta curves are nearly identical, suggesting ripple effects are a property of the unlearning method rather than the base model. We validate all major pipeline stages using a four-experiment Mechanical Turk study (5,200+ responses, 61 workers). We release all code, data, and infrastructure.
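Illustrative sketch (not from the paper): the "delta curve" described above can be read as the change in multiple-choice accuracy, unlearned model minus base model, grouped by semantic distance from the unlearned target. The record layout, the integer distance buckets, and the toy data below are assumptions made for this example only.

```python
from collections import defaultdict


def delta_by_distance(records):
    """Accuracy change (unlearned minus base) per semantic-distance bucket.

    `records` is an iterable of (distance, base_correct, unlearned_correct),
    a hypothetical layout chosen for this sketch.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [question count, base hits, unlearned hits]
    for distance, base_ok, unlearned_ok in records:
        bucket = totals[distance]
        bucket[0] += 1
        bucket[1] += int(base_ok)
        bucket[2] += int(unlearned_ok)
    return {d: (u - b) / n for d, (n, b, u) in sorted(totals.items())}


# Toy data: distance 0 is the unlearned topic itself, larger is further away.
records = [
    (0, True, False), (0, True, False), (0, True, True), (0, False, False),
    (1, True, True), (1, True, False), (1, True, True), (1, False, False),
    (2, True, True), (2, True, True), (2, False, False), (2, True, True),
]
curve = delta_by_distance(records)
print(curve)  # {0: -0.5, 1: -0.25, 2: 0.0}
```

In this toy case the drop is largest at distance 0 and vanishes by distance 2, which is the qualitative shape the abstract reports.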


2510.05107 2026-06-18 cs.AI Version update

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents (Extended Revision: From Behavioral Architecture to Epistemic Accountability)

Myung Ho Kim

Affiliations: JEI University

AI summary: Proposes the Structured Cognitive Loop (SCL) architecture, which separates cognition, memory, control, and action into modules to make LLM agent behavior accountable; it reaches an 86.3% task success rate over 360 episodes, outperforming baseline methods.

Comments This revised version extends the original SCL framework from a behavioral architecture for reliable LLM agents into a broader architecture of epistemic accountability, integrating context-aware Human-in-the-Loop control, Pool-Gated Retrieval, and the Horizon-Warrant-Commitment structure

Abstract

The central challenge for AI agents is not only performance but accountability. Agents that act through opaque prompt sequences may produce correct outputs, but they provide little basis for verifying why an action was permitted, where an error occurred, or how responsibility should be assigned. This paper presents the Structured Cognitive Loop as an architecture for accountable behavior in large language model agents. SCL separates cognition, memory, control, and action into distinct modules. The language model proposes. External memory preserves verified state. A lightweight controller checks preconditions, prevents redundant actions, and authorizes execution before tools are used. We evaluate SCL against ReAct and common LangChain agent variants across travel planning, conditional email drafting, and constraint guided image generation. Across 360 episodes, SCL achieves 86.3 percent task success compared with 70.5 to 76.8 percent for prompt based baselines. It also improves goal fidelity, reduces redundant tool calls, increases reuse of intermediate state, and lowers unsupported assertions. This extended revision situates SCL within a broader architecture of epistemic accountability. Subsequent extensions integrate context aware Human in the Loop control, Pool Gated Retrieval, and the Horizon Warrant Commitment framework. Together these components define an agent architecture in which the model proposes, structure decides, evidence is warranted before use, and human judgment is embedded in the trace rather than imposed after the fact. The result is a foundation for AI agents whose decisions are not only effective but also authorized, inspectable, and accountable.
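Illustrative sketch (not the paper's implementation): the division of labor described above, where the model proposes, memory holds verified state, and a controller checks preconditions, blocks redundant actions, and authorizes tool use, might look roughly like this. All names (`Memory`, `controller_step`, the toy tools, and the hard-coded proposals standing in for LLM output) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """External memory: verified state plus an inspectable decision trace."""
    state: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)


def controller_step(proposal, memory, tools, preconditions):
    """Authorize or reject one proposed action and record the decision."""
    name, key = proposal["tool"], proposal["key"]
    if key in memory.state:
        decision = "skipped_redundant"          # result already verified
    elif not all(req in memory.state for req in preconditions.get(name, [])):
        decision = "rejected_precondition"      # required state is missing
    else:
        memory.state[key] = tools[name](memory.state)  # authorized execution
        decision = "executed"
    memory.trace.append((name, key, decision))
    return decision


# Toy tools and preconditions (assumed for the example).
tools = {
    "search_flights": lambda state: ["F1", "F2"],
    "book_flight": lambda state: f"booked {state['flights'][0]}",
}
preconditions = {"book_flight": ["flights"]}

# Hard-coded proposals standing in for what a language model would suggest.
proposals = [
    {"tool": "book_flight", "key": "booking"},
    {"tool": "search_flights", "key": "flights"},
    {"tool": "search_flights", "key": "flights"},
    {"tool": "book_flight", "key": "booking"},
]
memory = Memory()
decisions = [controller_step(p, memory, tools, preconditions) for p in proposals]
print(decisions)
```

The trace in `memory.trace` records why each action was or was not permitted, which is the kind of inspectability the abstract argues for.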


2511.20302 2026-06-18 cs.CV Version update

CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation

Shilei Cao, Ziyang Gong, Hehai Lin, Yang Liu, Jiashun Cheng, Xiaoxing Hu, Haoyuan Liang, Guowen Li, Chengwei Qin, Hong Cheng, Xue Yang, Juepeng Zheng, Haohuan Fu

Affiliations: Sun Yat-sen University; The Chinese University of Hong Kong; Shanghai Jiao Tong University; National Supercomputing Center in Shenzhen; The Hong Kong University of Science and Technology; Beijing Institute of Technology; The Hong Kong University of Science and Technology (Guangzhou); Tsinghua University

AI summary: Proposes CrossEarth-Gate, which uses a Fisher-information-guided adaptive module selection mechanism to dynamically activate the most critical cross-domain modules, achieving state-of-the-art performance on 16 of 18 cross-domain benchmarks.

Abstract

In Remote Sensing (RS), Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key approach to activate the generalizable representation ability of foundation models for downstream tasks. However, existing specialized PEFT methods often fail when applied to large-scale Earth observation tasks, as they are unable to fully handle the multifaceted and unpredictable domain gaps (e.g., spatial, semantic, and frequency shifts) inherent in RS data. To overcome this, we propose CrossEarth-Gate, which introduces two primary contributions. First, we establish a comprehensive RS module toolbox to address multifaceted domain gaps, comprising spatial, semantic, and frequency modules. Second, we develop a Fisher-guided adaptive selection mechanism that operates on this toolbox. This selection is guided by Fisher Information to quantify each module's importance by measuring its contribution to the task-specific gradient flow. It dynamically activates only the most critical modules at the appropriate layers, guiding the gradient flow to maximize adaptation effectiveness and efficiency. Comprehensive experiments validate the efficacy and generalizability of our method, where CrossEarth-Gate achieves state-of-the-art performance on 16 out of 18 cross-domain benchmarks for RS semantic segmentation.
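Illustrative sketch (not the authors' code): one common way to turn Fisher information into a module ranking is the empirical Fisher, i.e. the mean squared gradient of the task loss over a module's parameters, followed by keeping the top-k modules. Whether CrossEarth-Gate uses exactly this estimator is an assumption here, and the module names and gradient values are made up.

```python
import numpy as np


def fisher_scores(grads_per_module):
    """Empirical-Fisher proxy: mean squared per-sample gradient for each module."""
    return {name: float(np.mean(np.square(g))) for name, g in grads_per_module.items()}


def select_modules(scores, k):
    """Names of the k modules with the largest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy per-sample gradients with shape (batch, parameters); values are invented.
grads = {
    "layer3.spatial": np.full((8, 16), 1.0),
    "layer3.semantic": np.full((8, 16), 0.1),
    "layer7.frequency": np.full((8, 16), -0.5),
    "layer7.spatial": np.full((8, 16), 0.01),
}
active = select_modules(fisher_scores(grads), k=2)
print(active)  # ['layer3.spatial', 'layer7.frequency']
```

Squaring makes the score independent of gradient sign, so a module with consistently large negative gradients still ranks as important.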


2511.05221 2026-06-18 cs.LG q-bio.NC Version update

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

David Bertram, Anja Ophey, Sinah Röttgen, Konstantin Kufer, Gereon R. Fink, Elke Kalbe, Clint Hansen, Walter Maetzler, Maximilian Kapsecker, Lara M. Reimer, Stephan Jonas, Andreas T. Damgaard, Natasha B. Bertelsen, Casper Skjaerbaek, Per Borghammer, Karolien Groenewald, Pietro-Luca Ratti, Michele T. Hu, Noémie Moreau, Michael Sommerauer, Katarzyna Bozek

Affiliations: Faculty of Mathematics and Natural Sciences, University of Cologne, Germany; Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Medical Psychology | Neuropsychology and Gender Studies, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Cognitive Neuroscience, Institute for Neuroscience and Medicine, INM-3, Research Center Juelich, Germany; Department of Neurology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Germany; Center of Neurology, Department of Parkinson, Sleep and Movement Disorders, University Hospital Bonn, University of Bonn, Germany; German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany; Cluster of Excellence for Aging and Aging-Associated Diseases (CECAD), University of Cologne, Germany; Department of Neurology, University Medical Center Schleswig-Holstein, Campus Kiel and Kiel University, Germany; Department of Informatics, Technical University of Munich, Germany; Institute for Digital Medicine, University Hospital Bonn, Germany; Lundbeck Foundation Parkinson’s Disease Research Center (PACE), Aarhus University, Denmark; Department of Nuclear Medicine, Aarhus University Hospital, Denmark; Department of Electrical and Computer Engineering, Aarhus University, Denmark; Oxford Parkinson’s Disease Centre and Division of Neurology, Nuffield Department of Clinical Neurosciences, University of Oxford, UK

AI summary: Proposes ActiTect, a fully automated open-source machine learning tool that identifies RBD from actigraphy recordings using standardized preprocessing and sleep-wake detection; generalization is validated on several independent cohorts (AUROC 0.84-0.94).

Comments 37 pages including Supplementary Information, 4 core figures, 1 supplementary figure. (v2: fixed a typo in Table 3 and made minor text edits; v3: post review)

Journal ref npj Digital Medicine (2026)

DOI: 10.1038/s41746-026-02738-8

Abstract

Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of α-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.
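Illustrative sketch (not part of ActiTect): the two evaluation ideas named in the abstract can be written down in a few lines. AUROC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and leave-one-dataset-out validation holds out each cohort in turn. The labels, scores, and cohort names below are invented for the example.

```python
def auroc(labels, scores):
    """AUROC as a pairwise ranking probability; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_dataset_out(cohorts):
    """Yield (training cohorts, held-out cohort) for every cohort in turn."""
    for held_out in cohorts:
        yield [c for c in cohorts if c != held_out], held_out


# Toy example: 3 RBD cases (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(round(auroc(labels, scores), 4))  # 0.9167 (11 of 12 pairs ranked correctly)

splits = list(leave_one_dataset_out(["local", "external_A", "external_B"]))
```

The quadratic pairwise loop is fine at cohort sizes like those in the abstract; a rank-based formula would be preferable for large samples.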


2209.01378 2026-06-18 cs.LG eess.SP q-fin.ST Version update

RNN(p) for Power Consumption Forecasting

Roberto Baviera, Pietro Manzoni

Affiliations: Politecnico di Milano, Department of Mathematics; University of Edinburgh, Business School

AI summary: Presents RNN(p) as a generalization of ARX(p) for forecasting seasonal patterns across multiple time scales; its structured feedback across lags enables efficient training strategies, giving accurate and interpretable power consumption forecasts.

Abstract

An elementary Recurrent Neural Network that operates on p time lags, called an RNN(p), is the natural generalisation of a linear autoregressive model ARX(p). It is a powerful forecasting tool for variables displaying inherent seasonal patterns across multiple time scales, as is often observed in energy, economic, and financial time series. The architecture of RNN(p) models, characterised by structured feedbacks across time lags, enables the design of efficient training strategies. We conduct a comparative study of learning algorithms for these models, providing a rigorous analysis of their computational complexity and training performance. We present two applications of RNN(p) models in power consumption forecasting, a key domain within the energy sector where accurate forecasts inform both operational and financial decisions. Experimental results show that RNN(p) models achieve excellent forecasting accuracy while maintaining a high degree of interpretability. These features make them well-suited for decision-making in energy markets and other fintech applications where reliable predictions play a significant economic role.
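Illustrative sketch (an assumed parameterization, not necessarily the paper's): one way to read "RNN(p) generalizes ARX(p)" is that a recurrent state fed back from its last p lags, passed through a nonlinearity, collapses to the linear ARX(p) recursion when the state is scalar and the activation is the identity. The function names and the exact form of the update are assumptions for this example.

```python
import numpy as np


def arx_p(y_hist, x, a, b, c):
    """ARX(p): y_t = c + sum_i a[i] * y_{t-1-i} + b . x_t."""
    return c + float(np.dot(a, y_hist)) + float(np.dot(b, x))


def rnn_p_step(h_hist, x, W, U, bias, activation=np.tanh):
    """Assumed RNN(p) update: h_t = act(sum_i W[i] @ h_{t-1-i} + U @ x_t + bias)."""
    pre = sum(W_i @ h_i for W_i, h_i in zip(W, h_hist)) + U @ x + bias
    return activation(pre)


# ARX(2) with one exogenous input.
a, b, c = np.array([0.5, 0.2]), np.array([0.1]), 0.05
y_hist, x = np.array([2.0, 1.0]), np.array([3.0])
y_arx = arx_p(y_hist, x, a, b, c)

# The same model written as an RNN(2) with a 1-dimensional state and identity activation.
W = [np.array([[a_i]]) for a_i in a]
h_hist = [np.array([y]) for y in y_hist]
h = rnn_p_step(h_hist, x, W, U=b[None, :], bias=np.array([c]), activation=lambda z: z)
print(y_arx, h[0])  # both approximately 1.55
```

Swapping the identity for `np.tanh` and widening the state gives the nonlinear case, at the cost of the closed-form interpretation of the lag coefficients.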


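2510.27353 2026-06-18 cs.AI Version update

An In-depth Study of LLM Contributions to the Bin Packing Problem

Julien Herrmann, Guillaume Pallez

Affiliations: CNRS-IRIT; Inria

AI summary: Analyzes LLM-generated heuristics and finds them readable yet hard to interpret, then proposes simpler and more efficient new algorithms, calling into question the actual contribution of LLMs to the bin packing problem.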

Comments Accepted for publication in ACM Transactions on Evolutionary Learning and Optimization

DOI: 10.1145/3821574

Abstract

Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based genetic algorithms produced heuristics offering new insights into the online bin packing problem under uniform and Weibull distributions. In this work, we reassess this claim through a detailed analysis of the heuristics produced by LLMs, examining both their behavior and interpretability. Despite being human-readable, these heuristics remain largely opaque even to domain experts. Building on this analysis, we propose a new class of algorithms tailored to these specific bin packing instances. The derived algorithms are significantly simpler, more efficient, more interpretable, and more generalizable, suggesting that the considered instances are themselves relatively simple. We then discuss the limitations of the claim regarding LLMs' contribution to this problem, which appears to rest on the mistaken assumption that the instances had previously been studied. Our findings instead emphasize the need for rigorous validation and contextualization when assessing the scientific value of LLM-generated outputs.
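Illustrative sketch for context: in online bin packing, items arrive one at a time and each must be placed immediately. The classical Best Fit rule below is a standard baseline for this setting; it is neither one of the LLM-generated heuristics nor one of the new algorithms proposed in the paper. The item sizes and capacity are arbitrary.

```python
def best_fit(items, capacity):
    """Online Best Fit: put each item in the fullest open bin that still has room."""
    residual = []    # remaining capacity of each open bin
    assignment = []  # bin index chosen for each item, in arrival order
    for size in items:
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        candidates = [i for i, r in enumerate(residual) if r >= size]
        if candidates:
            target = min(candidates, key=lambda i: residual[i])
        else:
            residual.append(capacity)  # open a new bin
            target = len(residual) - 1
        residual[target] -= size
        assignment.append(target)
    return assignment, residual


assignment, residual = best_fit([5, 7, 5, 2, 4, 2, 5], capacity=10)
print(assignment)  # [0, 1, 0, 1, 2, 2, 3]
print(residual)    # [0, 1, 4, 5]
```

A heuristic in this family is fully described by how it scores the candidate bins, which is why short, interpretable scoring rules are a natural yardstick for the generated ones.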


2510.21615 2026-06-18 cs.CV Version update

Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models

Shilei Cao, Hehai Lin, Jiashun Cheng, Yang Liu, Guowen Li, Xuehe Wang, Juepeng Zheng, Haoyuan Liang, Meng Jin, Chengwei Qin, Hong Cheng, Haohuan Fu

Affiliations: Sun Yat-sen University; The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology; The Chinese University of Hong Kong; National Supercomputing Center in Shenzhen; Huawei Technologies Co., Ltd; Tsinghua University

AI summary: Proposes the WeatherPEFT framework, which combines Task-Adaptive Dynamic Prompting with Stochastic Fisher-Guided Adaptive Selection to match full fine-tuning on weather downstream tasks with fewer trainable parameters.

Abstract

While recent advances in machine learning have equipped Weather Foundation Models (WFMs) with substantial generalization capabilities across diverse downstream tasks, the escalating computational requirements associated with their expanding scale increasingly hinder practical deployment. Current Parameter-Efficient Fine-Tuning (PEFT) methods, designed for vision or language tasks, fail to address the unique challenges of weather downstream tasks, such as variable heterogeneity, resolution diversity, and spatiotemporal coverage variations, leading to suboptimal performance when applied to WFMs. To bridge this gap, we introduce WeatherPEFT, a novel PEFT framework for WFMs incorporating two synergistic innovations. First, during the forward pass, Task-Adaptive Dynamic Prompting (TADP) dynamically injects the embedding weights within the encoder to the input tokens of the pre-trained backbone via internal and external pattern extraction, enabling context-aware feature recalibration for specific downstream tasks. Furthermore, during backpropagation, Stochastic Fisher-Guided Adaptive Selection (SFAS) not only leverages Fisher information to identify and update the most task-critical parameters, thereby preserving invariant pre-trained knowledge, but also introduces randomness to stabilize the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, and WeatherPEFT achieves performance parity with Full-Tuning using fewer trainable parameters. The code of this work is available at https://github.com/ShileiCao/WeatherPEFT.


2502.07531 2026-06-18 cs.CV cs.AI cs.LG cs.MM Version update

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu

Affiliations: School of Data Science, Fudan University; Shanghai Innovation Institute; Zhejiang University; Huawei Noah’s Ark Lab; Westlake University; School of Data Science and MOE Frontiers Center for Brain Science, Fudan University; Fudan ISTBI–ZJNU Algorithm Centre for Brain-inspired Intelligence, Zhejiang Normal University

AI summary: Proposes the VidCRAFT3 framework, which explicitly models cross-factor interactions among geometry, motion, and illumination to enable independent or joint control of camera motion, object motion, and lighting direction, achieving state-of-the-art control precision and visual coherence.

Comments Accepted to TVCG 2026

Abstract

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.


2508.20330 2026-06-18 cs.LG Version update

FORGE: Foundational Optimization Representations from Graph Embeddings

Zohair Shafi, Serdar Kadioglu

Affiliations: Khoury College of Computer Science, Northeastern University; AI Center of Excellence, Fidelity Investments; Department of Computer Science, Brown University

AI summary: Proposes the FORGE framework, which pre-trains a vector-quantized graph autoencoder without supervision to learn general representations of mixed-integer programming instances, with no need for solvers or optimal solutions; on downstream tasks it improves solver performance and outperforms existing methods.

Comments Published in TMLR

Abstract

Combinatorial optimization problems are ubiquitous in science and engineering. Still, learning-based approaches to accelerate combinatorial optimization often require solving a large number of difficult instances to collect training data, incurring significant computational cost. Existing learning-based methods require training dedicated models for each problem distribution and each downstream task, severely limiting their scalability and generalization. We introduce Forge: Foundational Optimization Representations from Graph Embeddings, a framework that pre-trains a vector-quantized graph autoencoder on a large, diverse collection of mixed-integer programming (MIP) instances in an unsupervised manner, without relying on optimization solvers or optimal solutions. Vector quantization produces discrete code assignments that serve as a vocabulary for representing optimization instances. We evaluate Forge in both unsupervised and supervised settings. In the unsupervised setting, Forge embeddings effectively cluster unseen instances across problem domains and sizes. In the supervised setting, we fine-tune Forge embeddings and show that a single pre-trained model helps predict both the integrality gap for cut-generation and variable hints for search guidance across multiple problem and size distributions. In both tasks, we improve the performance of a commercial optimization solver and outperform state-of-the-art learning-based methods. Finally, we open-source our training code, pre-trained Forge weights, and embeddings for multiple MIP distributions to foster further research in representation learning for optimization problems: https://skadio.github.io/forge/
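Illustrative sketch (not the Forge code): the vector-quantization step mentioned above can be pictured as mapping each node embedding to the index of its nearest codebook vector, so that an instance becomes a sequence of discrete codes. The tiny codebook, the embeddings, and the code-count histogram used as an instance-level summary are all assumptions made for this example.

```python
import numpy as np


def quantize(embeddings, codebook):
    """Assign each embedding row to its nearest codebook vector (squared L2 distance)."""
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = d2.argmin(axis=1)
    return codes, codebook[codes]


# Toy codebook with 3 entries and 4 node embeddings in 2 dimensions.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
nodes = np.array([[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [1.1, 0.1]])

codes, quantized = quantize(nodes, codebook)
histogram = np.bincount(codes, minlength=len(codebook))  # hypothetical instance summary
print(codes.tolist())      # [0, 1, 2, 1]
print(histogram.tolist())  # [1, 2, 1]
```

In a trained autoencoder the codebook is learned jointly with the encoder; here it is fixed by hand purely to show the assignment step.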


2509.18588 2026-06-18 cs.CL Version update

UniECG: Understanding and Generating ECG in One Unified Model

Jiarui Jin, Haoyu Wang, Xiang Lan, Jun Li, Hongyan Li, Shenda Hong

Affiliations: Peking University; Shanghai Ocean University; The Second Hospital of Tianjin Medical University; National University of Singapore

AI summary: Proposes the UniECG model, whose two-stage design generates explanatory text from ECG signals or images and generates ECG signals from textual objectives, supporting interactive ECG education.

Abstract

Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step toward interactive ECG education. UniECG supports two complementary learning interactions: given an ECG signal or image, it generates an evidence-based explanation; given a textual learning objective, it generates a corresponding ECG signal example for case-based learning. The model follows a two-stage design. First, it learns grounded ECG explanation from ECG signal-image-text data. Second, it introduces special ECG generation tokens and aligns their hidden representations with a pretrained text-conditioned ECG diffusion model, enabling controllable signal-level ECG generation. We evaluate UniECG through grounded ECG explanation and generation-oriented qualitative analysis, examining its potential to support explanation and case-based learning. UniECG is intended as an educational aid and a research step toward interactive AI-assisted ECG learning, rather than a clinically validated diagnostic system.


2505.20045 2026-06-18 cs.CL Version update

Efficient Hallucination Detection for LLMs Using Uncertainty-Aware Attention Heads

Artem Vazhentsev, Lyudmila Rvanova, Gleb Kuzmin, Ekaterina Fadeeva, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin, Artem Shelmanov

Affiliations: Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); ETH Zurich; Independent Researcher; Applied AI Institute

AI summary: Proposes the RAUQ framework, which combines uncertainty-aware attention heads with token-level confidence to perform unsupervised, efficient sequence-level hallucination detection in a single forward pass; it outperforms existing methods on 12 datasets with less than 1% additional computation.

Journal ref Proceedings of the 43rd International Conference on Machine Learning (ICML), Seoul, South Korea, 2026

Abstract

While large language models (LLMs) have become highly capable, they remain prone to factual inaccuracies, commonly referred to as "hallucinations." Uncertainty quantification (UQ) offers a promising way to mitigate this issue, but most existing methods are computationally intensive and/or require supervision. In this work, we propose Recurrent Attention-based Uncertainty Quantification (RAUQ), an unsupervised and efficient framework for identifying hallucinations. The method leverages an observation about transformer attention behavior: when incorrect information is generated, certain "uncertainty-aware" attention heads tend to reduce their focus on preceding tokens. RAUQ automatically detects these attention heads and combines their activation patterns with token-level confidence measures in a recurrent scheme, producing a sequence-level uncertainty estimate in just a single forward pass. Through experiments on twelve datasets spanning question answering, summarization, and translation across nine different LLMs, we show that RAUQ consistently outperforms state-of-the-art UQ baselines. Importantly, it incurs minimal overhead, requiring less than 1% additional computation. Since it requires neither labeled data nor extensive parameter tuning, RAUQ serves as a lightweight, plug-and-play solution for real-time hallucination detection in white-box LLMs.
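Illustrative sketch (an assumed form, not the paper's exact formula): the abstract describes combining attention paid to preceding tokens with token-level confidence "in a recurrent scheme". One plausible reading is a running confidence that mixes each token's probability with the previous confidence, weighted by how much a selected head attends to the previous token. The mixing weight `alpha`, the update rule, and the mean negative log aggregation are all assumptions for this example.

```python
import math


def recurrent_uncertainty(token_probs, prev_attention, alpha=0.5):
    """Sequence-level uncertainty from token probabilities and attention to the previous token.

    Assumed recurrence: c_1 = p_1, c_i = alpha * p_i + (1 - alpha) * a_i * c_{i-1}.
    `prev_attention[0]` is unused because the first token has no predecessor.
    """
    conf = token_probs[0]
    confidences = [conf]
    for p, a in zip(token_probs[1:], prev_attention[1:]):
        conf = alpha * p + (1.0 - alpha) * a * conf
        confidences.append(conf)
    return -sum(math.log(c) for c in confidences) / len(confidences)


# Same token probabilities, but the head stops attending to the previous token at step 3.
probs = [0.9, 0.8, 0.7]
u_focused = recurrent_uncertainty(probs, [0.0, 0.9, 0.9])
u_distracted = recurrent_uncertainty(probs, [0.0, 0.9, 0.2])
print(u_focused < u_distracted)  # True
```

Both inputs are by-products of ordinary generation, which is consistent with the single-forward-pass cost the abstract emphasizes.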


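2509.14969 2026-06-18 cs.LG math.OC stat.ML Version update

Stochastic Adaptive Gradient Descent Without Descent

Jean-François Aujol, Jérémie Bigot, Camille Castera

Affiliations: Univ. Bordeaux, CNRS, Bordeaux INP, IMB, UMR 5251

AI summary: Proposes an adaptive step-size strategy for stochastic gradient descent that requires no hyperparameter tuning and uses local geometric information from a first-order stochastic oracle; convergence is proved theoretically, and experiments show it is competitive with tuned baselines.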

2509.14653 2026-06-18 cs.CL Version update

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Ying Fang, Xiaofei Li

Affiliations: Zhejiang University; School of Engineering, Westlake University; Institute of Advanced Technology, Westlake Institute for Advanced Study

AI summary: UMA degrades on languages such as English because of a token granularity mismatch; UMA-Split adds a split module that lets each aggregated frame map to multiple tokens, improving the cross-lingual performance of non-autoregressive speech recognition.

Comments Accepted by ICASSP 2026. Code: https://github.com/FnoY0723/uma_split

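2509.02555 2026-06-18 cs.LG cs.AI cs.NE Version update

Surrogate Benchmarks for Model Merging Optimization

Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

AI summary: Hyperparameter optimization for model merging is computationally expensive; this work builds surrogate benchmarks that predict merged-model performance at low cost and simulate the behavior of optimization algorithms.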

Comments AutoML 2025 Non-Archival Content Track. The code of the surrogate benchmark is available at https://github.com/shiralab/SMM-Bench

2508.09191 2026-06-18 cs.LG cs.AI Version update

From Values to Tokens: An LLM-Driven Framework for Context-aware Time Series Forecasting via Symbolic Discretization

Xiaoyu Tao, Shilong Zhang, Mingyue Cheng, Daoyu Wang, Tingyue Pan, Bokai Pan, Changqing Zhang, Shijin Wang

Affiliations: State Key Laboratory of Cognitive Intelligence; University of Science and Technology of China; College of Intelligence and Computing; iFLYTEK Research

AI summary: Proposes the TokenCast framework, which uses a large language model and symbolic discretization to convert continuous time series into tokens aligned with contextual text, enabling context-aware forecasting; experiments demonstrate its effectiveness.

详情

AI中文摘要

时间序列预测在能源、医疗和金融等关键应用领域支持决策中起着重要作用。尽管近期取得了进展，但由于将历史数值序列与通常包含非结构化文本数据的上下文特征整合的挑战，预测精度仍然有限。为了解决这一挑战，我们提出了TokenCast，一个由大语言模型（LLM）驱动的框架，利用基于语言的符号表示作为上下文感知时间序列预测的统一中介。具体来说，TokenCast采用离散分词器将连续数值序列转化为时间标记，实现与基于语言输入的结构对齐。为了有效弥合模态之间的语义差距，时间和上下文标记通过预训练的LLM嵌入到共享表示空间中，并通过生成目标进一步优化。基于这一统一语义空间，对齐的LLM随后以监督方式进行微调，以预测未来的时间标记，然后解码回原始数值空间。在真实世界数据集上的大量实验证明了我们框架的有效性，并突显了其作为上下文感知时间序列预测生成框架的潜力。代码可从此https URL获取。

英文摘要

Time series forecasting plays a vital role in supporting decision-making across a wide range of critical applications, including energy, healthcare, and finance. Despite recent advances, forecasting accuracy remains limited due to the challenge of integrating historical numerical sequences with contextual features, which often comprise unstructured textual data. To address this challenge, we propose TokenCast, a large language model (LLM) driven framework that leverages language-based symbolic representations as a unified intermediary for context-aware time series forecasting. Specifically, TokenCast employs a discrete tokenizer to transform continuous numerical sequences into temporal tokens, enabling structural alignment with language-based inputs. To effectively bridge the semantic gap between modalities, both temporal and contextual tokens are embedded into a shared representation space via a pre-trained LLM, further optimized with generative objectives. Building upon this unified semantic space, the aligned LLM is subsequently fine-tuned in a supervised manner to predict future temporal tokens, which are then decoded back into the original numerical space. Extensive experiments on real-world datasets demonstrate the effectiveness of our framework and highlight its potential as a generative framework for context-aware time series forecasting. The code is available at https://github.com/Xiaoyu-Tao/TokenCast.
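The round trip from continuous values to discrete tokens and back can be sketched with plain uniform binning. This is only a stand-in: the abstract does not specify how TokenCast's discrete tokenizer works, so the functions `fit_bins`, `encode`, and `decode` and the equal-width binning scheme are illustrative assumptions.

```python
def fit_bins(values, num_tokens):
    """Return (lowest value, bin width) for equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_tokens
    if width == 0.0:
        width = 1.0  # constant series: any positive width works
    return lo, width

def encode(values, lo, width, num_tokens):
    """Map each value to the index of its bin, clamped to the vocabulary."""
    tokens = []
    for v in values:
        idx = int((v - lo) / width)
        tokens.append(min(max(idx, 0), num_tokens - 1))
    return tokens

def decode(tokens, lo, width):
    """Map each token back to the centre of its bin."""
    return [lo + (t + 0.5) * width for t in tokens]
```

For in-range values the reconstruction error of this scheme is at most half a bin width, which makes the trade-off between vocabulary size and numerical precision explicit.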

2508.03483 2026-06-18 cs.CV cs.AI Updated version

When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models

Dasol Choi, Jihwan Lee, Minjae Lee, Minsuk Kahng

Affiliations: AIM Intelligence; Yonsei University

AI summary: Proposes SODA, a framework that uses three metrics to systematically measure demographic bias in objects generated by text-to-image models; finds that neutral prompts implicitly skew toward middle-aged and White people and that demographic cues produce highly skewed, stereotyped outputs.

Abstract

While prior research on text-to-image generation has predominantly focused on biases in human depictions, demographic bias in generated objects remains relatively underexplored. We introduce SODA (Stereotyped Object Diagnostic Audit), a novel framework for systematically measuring these biases through automated attribute discovery and three standardized metrics: Base vs. Demographic Divergence (BDS), Cross-Demographic Disparity (CDS), and Visual Attribute Concentration (VAC). Applying SODA to 8,000 images across five state-of-the-art models and eight object categories (e.g., cars), we find that "neutral" prompts produce outputs most visually similar to middle-aged and White people, suggesting these groups are implicitly over-represented in model defaults. Furthermore, demographic cues trigger highly skewed stereotypical outputs: 26.6% of object-model-demographic combinations produce results where all 20 generated images share the exact same attribute value (e.g., rose gold laptops for women). Finally, prompt-level debiasing reduces inter-group disparity but paradoxically collapses within-group diversity, replacing one stereotype with another. SODA offers a practical pipeline for making these implicit associations measurable, serving as a step toward more responsible AI development.
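The "all 20 generated images share the exact same attribute value" statistic can be reproduced in spirit with a simple concentration measure. The sketch below is an assumption, not the paper's definition of VAC: it takes concentration to be the share of images carrying the most common attribute value, and the group keys and attribute strings are made up for illustration.

```python
from collections import Counter

def attribute_concentration(values):
    """Share of images carrying the most common attribute value (1.0 = all identical)."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)

def fully_collapsed_share(groups):
    """Fraction of groups whose images all share a single attribute value."""
    collapsed = sum(
        1 for values in groups.values() if attribute_concentration(values) == 1.0
    )
    return collapsed / len(groups)

# Hypothetical audit data: (object, model, demographic) -> attribute per image.
example_groups = {
    ("laptop", "model_a", "women"): ["rose gold"] * 20,
    ("laptop", "model_a", "men"): ["silver"] * 12 + ["black"] * 8,
}
```

Calling `fully_collapsed_share(example_groups)` on this made-up data reports that half of the groups have collapsed onto a single value.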

2504.04739 2026-06-18 cs.LG cs.CY Updated version

UST-GNN: A Unified Spatial--Topological Graph Neural Network Framework for Urban Analytics--Demonstrated through a Case Study on Urban Health Prediction

Minwei Zhao, Sanja Scepanovic, Stephen Law, Ivica Obadic, Cai Wu, Daniele Quercia

Affiliations: University College London; The Hong Kong University of Science; Nokia Bell Labs; Technical University of Munich; University of Oxford

AI summary: Proposes UST-GNN, a framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional embeddings; for health prediction across 4,835 Greater London neighbourhoods it improves R² by 8.4-13.2% under strict spatial cross-validation, and it adds a principal-component module to interpret the embeddings.

Abstract

Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial-topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph machine learning baselines, improving out-of-sample R² by 8.4-13.2% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.
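Spatial cross-validation keeps nearby neighbourhoods in the same fold, so that spatial autocorrelation cannot leak information from training to test data. The abstract does not say which scheme the authors use; the sketch below assumes a simple grid-block assignment, and `spatial_block_folds`, `block_size`, and the coordinate units are hypothetical.

```python
import math

def spatial_block_folds(coords, block_size, n_folds):
    """Assign each (x, y) point to a fold via the grid block it falls into.

    Points in the same block always share a fold; blocks are dealt out to
    folds round-robin in sorted order.
    """
    blocks = [
        (math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords
    ]
    block_to_fold = {
        block: i % n_folds for i, block in enumerate(sorted(set(blocks)))
    }
    return [block_to_fold[block] for block in blocks]
```

A model is then trained on all folds but one and scored on the held-out fold, which gives a stricter out-of-sample estimate than a random split of neighbouring areas.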

2507.17786 2026-06-18 cs.LG Updated version

Reinforcement Learning for Accelerated Aerodynamic Shape Optimisation

Florian Sobieczky, Alfredo Lopez, Erika Dudkin, Christopher Lackner, Matthias Hochsteger, Bernhard Scheichl, Helmut Sobieczky

Affiliations: Software Competence Center Hagenberg (SCCH); Institut für Strömungsmechanik und Wärmeübertragung, TU Wien; CERBSim GmbH

AI summary: Proposes an RL-based adaptive optimization algorithm that uses a surrogate-based, actor-critic policy-evaluation MCMC approach and freezes some parameters to reduce dimensionality, accelerating aerodynamic shape optimization; a simple fluid-dynamics problem demonstrates interpretation via feature importance.

Abstract

We introduce a reinforcement learning (RL) based adaptive optimization algorithm for aerodynamic shape optimization focused on dimensionality reduction. The form in which RL is applied here is that of a surrogate-based, actor-critic policy evaluation MCMC approach allowing for temporal 'freezing' of some of the parameters to be optimized. The goals are to minimize computational effort, and to use the observed optimization results for interpretation of the discovered extrema in terms of their role in achieving the desired flow-field. By a sequence of local optimized parameter changes around intermediate CFD simulations acting as ground truth, it is possible to speed up the global optimization if (a) the local neighbourhoods of the parameters in which the changed parameters must reside are sufficiently large to compete with the grid-sized steps and its large number of simulations, and (b) the estimates of the rewards and costs on these neighbourhoods necessary for a good step-wise parameter adaption are sufficiently accurate. We give an example of a simple fluid-dynamical problem on which the method allows interpretation in the sense of a feature importance scoring.

2506.09046 2026-06-18 cs.LG cs.AI cs.MA Updated version

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network; a forward phase decomposes tasks and a backward phase propagates feedback so that agents self-evolve their roles, prompts, and coordination, outperforming existing methods on seven benchmark datasets.

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

2410.15595 2026-06-18 cs.AI cs.CL cs.LG Updated version

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Zongrui Li, Ruirui Lei, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

Affiliations: Zhejiang University; Nanyang Technological University; Alibaba Group

AI summary: Surveys progress on Direct Preference Optimization (DPO) across theory, variants, datasets, and applications, notes its potential and limitations as an RL-free alternative, and proposes future research directions.

Comments: Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey

DOI: 10.1109/TPAMI.2026.3704314

Abstract

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.
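For readers unfamiliar with the objective being surveyed, the standard per-pair DPO loss can be written in a few lines. The abstract itself does not state the formula; the sketch below follows the commonly cited form, with sequence log-probabilities passed in as plain floats, and the argument names and the default `beta` are illustrative choices.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's log-probability relative to the rejected one. No reward model or RL rollout appears anywhere, which is the sense in which DPO is RL-free.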

2507.01414 2026-06-18 cs.LG Updated version

Decomposing Prediction Mechanisms for In-Context Recall

Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai

Affiliations: University of California, Berkeley; University of Pennsylvania

AI summary: Using a new toy problem that combines continuous in-context learning with discrete associative recall, finds that transformers solve in-context recall with two separate mechanisms that have different learning dynamics: one uses discrete symbolic labels for associative recall, the other makes Bayesian-style predictions from the previous token and the context.

Comments: 45 pages, 47 figures, 2 tables

Abstract

We introduce a new family of toy problems that combine features of linear-regression-style continuous in-context learning (ICL) with discrete associative recall. We pretrain transformer models on sample traces from this toy, specifically symbolically-labeled interleaved state observations from randomly drawn linear deterministic dynamical systems. We study whether the transformer models can recall the state of a sequence previously seen in their context when prompted to do so with the corresponding in-context label. Taking a closer look at this task, it becomes clear that the model must perform two functions: (1) identify which system's state should be recalled and apply that system to its last seen state, and (2) continue to apply the correct system to predict the subsequent states. Training dynamics reveal that the first capability emerges well into a model's training. Surprisingly, the second capability, of continuing the prediction of a resumed sequence, develops much earlier. Via out-of-distribution experiments, and a mechanistic analysis on model weights via edge pruning, we find that next-token prediction for this toy problem involves at least two separate mechanisms. One mechanism uses the discrete symbolic labels to do the associative recall required to predict the start of a resumption of a previously seen sequence. The second mechanism, which is largely agnostic to the discrete symbolic labels, performs a "Bayesian-style" prediction based on the previous token and the context. These two mechanisms have different learning dynamics. To confirm that this multi-mechanism phenomenon (manifesting as separate phase transitions) is not just an artifact of our toy setting, we used OLMo training checkpoints on an ICL translation task to see a similar phenomenon: a decisive gap in the emergence of first-task-token performance vs second-task-token performance.

2506.14126 2026-06-18 cs.LG cs.AI Updated version

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

Stefan Horoi, Guy Wolf, Eugene Belilovsky, Gintare Karolina Dziugaite

Affiliations: Concordia University; Mila -- Québec AI Institute; Google DeepMind

AI summary: Studies how overtraining expert models affects model merging; finds that long fine-tuning leads to memorization of difficult examples, which causes parameter interference and degrades merging performance, and shows that task-dependent early stopping improves merging.

Comments: Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

Abstract

Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.
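The merging step of the pipeline can be illustrated with the simplest possible scheme, a weighted average of parameters. The abstract does not name the merging methods the authors evaluate, so this is an assumed stand-in; `merge_by_averaging` and the representation of a checkpoint as a dict of float lists are hypothetical.

```python
def merge_by_averaging(checkpoints, weights=None):
    """Weighted average of checkpoints given as dicts: name -> list of floats.

    All checkpoints must share the same parameter names and shapes.
    With weights=None every checkpoint contributes equally.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    merged = {}
    for name, first in checkpoints[0].items():
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints))
            for i in range(len(first))
        ]
    return merged
```

Under a scheme like this, parameters that two experts push in conflicting directions partly cancel out, which is one simple way to picture the parameter interference the abstract refers to.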
