arXivDaily: arXiv daily academic digest, updated Monday through Friday
2512.03818 2026-06-19 cs.CL Version update

Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology

Kylie L. Anglin, Stephanie Milan, Brittney Hernandez, Claudia Ventura

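Affiliations: Department of Educational Psychology, Neag School of Education, University of Connecticut; Department of Psychological Sciences, College of Liberal Arts and Sciences, University of Connecticut

AI summary: This study proposes an empirical framework for optimizing, through prompt engineering, how well large language models identify constructs in psychological texts. Experiments evaluating five prompting strategies find that the construct definition and task framing matter most, and that a few-shot approach combining codebook-guided selection with automatic prompt engineering comes closest to expert judgment.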

Comments 22 pages, 2 figures

Details
Abstract

Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output (here, the category assigned to a text) depends heavily on the wording of the prompt. While literature on prompt engineering is expanding, few studies focus on classification tasks, and even fewer address domains like psychology, where constructs have precise, theory-driven definitions that may not be well represented in pre-training data. We present an empirical framework for optimizing LLM performance for identifying constructs in texts via prompt engineering. We experimentally evaluate five prompting strategies (codebook-guided empirical prompt selection, automatic prompt engineering, persona prompting, chain-of-thought reasoning, and explanatory prompting) with zero-shot and few-shot classification. We find that persona, chain-of-thought, and explanations do not fully address performance loss accompanying a badly worded prompt. Instead, the most influential features of a prompt are the construct definition, task framing, and, to a lesser extent, the examples provided. Across three constructs and two models, the classifications most aligned with expert judgments resulted from a few-shot prompt combining codebook-guided empirical prompt selection with automatic prompt engineering. Based on our findings, we recommend that researchers generate and evaluate as many prompt variants as feasible, whether human-crafted, automatically generated, or ideally both, and select prompts and examples based on empirical performance in a training dataset, validating the final approach in a holdout set. This procedure offers a practical, systematic, and theory-driven method for optimizing LLM prompts in settings where alignment with expert judgment is critical.
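The recommended procedure (score many prompt variants on training data, keep the best, then validate it on a holdout set) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: `toy_classify` is a hypothetical stand-in for an LLM call, the prompts and texts are made up, and plain accuracy stands in for whatever agreement metric a study would actually use.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, int]  # (text, expert-assigned label)
Classifier = Callable[[str, str], int]  # (prompt, text) -> predicted label


def accuracy(classify: Classifier, prompt: str, data: Sequence[Example]) -> float:
    """Share of texts whose predicted label matches the expert label."""
    hits = sum(1 for text, label in data if classify(prompt, text) == label)
    return hits / len(data)


def select_prompt(classify, prompts, train, holdout):
    """Pick the prompt that scores best on train, then report its holdout score."""
    best = max(prompts, key=lambda p: accuracy(classify, p, train))
    return best, accuracy(classify, best, holdout)


def toy_classify(prompt: str, text: str) -> int:
    # Hypothetical stand-in for an LLM call: flags a text when it contains
    # the keyword that ends the prompt.
    keyword = prompt.split(":")[-1].strip()
    return int(keyword in text.lower())


# Made-up prompt variants and labeled texts, for illustration only.
prompts = ["Flag texts that mention: sad", "Flag texts that mention: worry"]
train = [("I worry a lot", 1), ("Nice day", 0), ("so sad today", 0), ("constant worry", 1)]
holdout = [("worry again", 1), ("sunny", 0)]
best_prompt, holdout_accuracy = select_prompt(toy_classify, prompts, train, holdout)
```

The holdout score is computed only once, for the selected prompt, so it is not inflated by the selection step itself.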

2510.21290 2026-06-19 math.NA cs.NA Version update

A Variational Framework for the Complexity of PDE Solutions

Juan Esteban Suarez Cardona, Holger Boche, Gitta Kutyniok

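AI summary: Proposes a framework based on least-squares variational formulations and gradient flows for analyzing the computability and complexity of PDE solutions from an optimization perspective, and establishes sufficient conditions for polynomial-time approximation and for complexity blowup.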

Details
Abstract

Partial Differential Equations (PDEs) are fundamental mathematical models for describing physical phenomena, yet most PDEs of practical interest require numerical approximations. The feasibility of such methods is constrained by existing computational models. Since digital computers are the primary realizations of numerical computations, and Turing machines define their theoretical limits, computability of PDE solutions is of fundamental significance. It provides a rigorous framework to distinguish equations that are effectively solvable from those that encode undecidable or non-computable behavior. Once computability is established, complexity theory quantifies the resources required to approximate PDE solutions. In this work, we present a novel framework based on least-squares variational formulations and associated gradient flows to analyze the computability and complexity of PDE solutions from an optimization perspective. Our approach approximates PDE solution operators via discrete gradient flows, linking PDE properties, such as coercivity, ellipticity, and convexity, to solution complexity. Within this setting, we characterize representation- and discretization-dependent sufficient conditions for regimes where PDEs admit polynomial-time approximations, as well as regimes exhibiting complexity blowup, where polynomial-time input data produce solutions with super-polynomial complexity. In summary, this paper develops a variational framework for analyzing computability and computational complexity of PDE solution classes. The results show how PDE structure and solution regularity influence their complexity, by establishing sufficient conditions for computability and complexity bounds. Beyond the theoretical characterization, the framework provides guidelines for effective numerical methods and contributes to understanding the limitations of digital computation for PDE problems.
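To make the idea of approximating a PDE solution by a discrete gradient flow of a least-squares functional concrete, here is a minimal sketch. Everything specific in it is an assumption chosen for illustration and not taken from the paper: the model problem ($-u'' = 1$ on $(0,1)$ with zero boundary values), the finite-difference discretization, the explicit Euler time stepping, and the step size.

```python
def apply_A(u, h):
    """Apply the finite-difference operator for -u'' with zero boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h**2)
    return out


def gradient_flow(f, h, step, iters):
    """Explicit Euler steps on the gradient flow of J(u) = 0.5 * ||A u - f||^2."""
    u = [0.0] * len(f)
    for _ in range(iters):
        residual = [a - b for a, b in zip(apply_A(u, h), f)]
        grad = apply_A(residual, h)  # A is symmetric, so A^T r equals A r
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u


n = 4                      # interior grid points (illustrative choice)
h = 1.0 / (n + 1)          # grid spacing
xs = [(i + 1) * h for i in range(n)]
u = gradient_flow([1.0] * n, h, step=2e-4, iters=5000)
exact = [x * (1.0 - x) / 2.0 for x in xs]  # solves -u'' = 1 with u(0) = u(1) = 0
```

The step size must stay below 2 divided by the largest eigenvalue of $A^\top A$ for the iteration to remain stable; this is one place where conditioning of the discretized operator translates directly into iteration cost.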

2512.00560 2026-06-19 cs.SE Version update

SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models

Jinyu Cai, Jialong Li, Nianyu Li, Zhenyu Mao, Mingyue Zhang, Kenji Tei

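AI summary: Proposes SAGE, a framework that uses LLM-guided reinforcement learning to generate test suites automatically, prunes them with semantic multi-objective optimization, and prioritizes tests through semantic analysis of update logs, achieving efficient regression testing in Overcooked Plus and Minecraft.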

Comments This paper has been accepted by Automated Software Engineering journal

Details
Abstract

The rapid iteration cycles of modern live-service games make regression testing indispensable for maintaining quality and stability. However, existing regression testing approaches face critical limitations, especially in common gray-box settings where full source code access is unavailable: they heavily rely on manual effort for test case construction, struggle to maintain growing suites plagued by redundancy, and lack efficient mechanisms for prioritizing relevant tests. These challenges result in excessive testing costs, limited automation, and insufficient bug detection. To address these issues, we propose SAGE, a semantic-aware regression testing framework for gray-box game environments. SAGE systematically addresses the core challenges of test generation, maintenance, and selection. It employs LLM-guided reinforcement learning for efficient, goal-oriented exploration to automatically generate a diverse foundational test suite. Subsequently, it applies a semantic-based multi-objective optimization to refine this suite into a compact, high-value subset by balancing cost, coverage, and rarity. Finally, it leverages LLM-based semantic analysis of update logs to prioritize test cases most relevant to version changes, enabling efficient adaptation across iterations. We evaluate SAGE on two representative environments, Overcooked Plus and Minecraft, comparing against both automated baselines and human-recorded test cases. Across all environments, SAGE achieves superior bug detection with significantly lower execution cost, while demonstrating strong adaptability to version updates.

2503.02636 2026-06-19 q-bio.NC cs.AI Version update

A Deep Generative Model for Resting-State EEG Synthesis and Transferable Representation Learning

Yeganeh Farahzadi, Morteza Ansarinia, Zoltan Kekecs

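Affiliations: Institute of Psychology, Eötvös Loránd University; Doctoral School of Psychology, Eötvös Loránd University; Department of Behavioural and Cognitive Sciences, University of Luxembourg

AI summary: Proposes REST-GAN, a framework that combines adversarial training with self-supervised reconstruction to synthesize resting-state EEG from raw time-domain signals and learn transferable representations, performing well on spectral, connectivity, and classification evaluations.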

Details
Abstract

Resting-state EEG provides a non-invasive view of spontaneous brain activity, but extracting meaningful patterns is often limited by scarce high-quality data and reliance on manually engineered features. Generative adversarial networks (GANs) can synthesize neural signals and learn transferable representations directly from raw data, a dual capability that remains underexplored in EEG research. Here, we introduce REST-GAN, a GAN-based framework for resting-state EEG that combines adversarial training with an auxiliary self-supervised reconstruction objective to support signal synthesis and unsupervised feature extraction. Although trained only on raw time-domain signals, without explicit frequency-domain or sensor-topographic supervision, the generated time series reproduced key temporal, spectral, and connectivity properties of real EEG. In band-power feature space, generated samples showed high precision and recall across eyes-open and eyes-closed conditions (EO: 0.91/0.67; EC: 0.87/0.65), while group-average spectral coherence matrices showed low mean absolute differences from real data across frequency bands (~0.01-0.03). The representations learned by the model's critic transferred to independent resting-state demographic classification tasks, outperforming models trained directly on raw EEG and showing competitive performance relative to a recent EEG foundation model, while requiring substantially less training data and computational resources. These findings highlight a computationally efficient, architecture-driven strategy in which generative models serve not only as EEG signal generators, but also as unsupervised feature extractors. This approach may support more data-efficient EEG analysis while reducing reliance on manual feature engineering. The implementation code for REST-GAN is available at: https://github.com/Yeganehfrh/REST-GAN.

2511.22283 2026-06-19 cs.LG Version update

The Hidden Cost of Approximation in Online Mirror Descent

Ofir Schlisselberg, Uri Sherman, Tomer Koren, Yishay Mansour

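Affiliations: Tel Aviv University; Google Research

AI summary: Studies the robustness of online mirror descent (OMD) to approximation errors and finds that error tolerance is closely tied to regularizer smoothness: uniformly smooth regularizers admit a tight bound, negative entropy over the simplex requires exponentially small errors, and log-barrier and Tsallis regularizers remain robust with only polynomially small errors.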

Details
Abstract

Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are defined as solutions to optimization subproblems which, oftentimes, can be solved only approximately, leading to an inexact version of the algorithm. Nonetheless, existing OMD analyses typically assume an idealized error-free setting, thereby limiting our understanding of performance guarantees that should be expected in practice. In this work we initiate a systematic study of inexact OMD, and uncover an intricate relation between regularizer smoothness and robustness to approximation errors. When the regularizer is uniformly smooth, we establish a tight bound on the excess regret due to errors. Then, for barrier regularizers over the simplex and its subsets, we identify a sharp separation: negative entropy requires exponentially small errors to avoid linear regret, whereas log-barrier and Tsallis regularizers remain robust even when the errors are only polynomial. Finally, we show that when the losses are stochastic and the domain is the simplex, negative entropy regains robustness, but this property does not extend to all subsets, where exponentially small errors are again necessary to avoid suboptimal regret.
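For orientation, the idealized error-free OMD step with the negative-entropy regularizer over the full simplex has a well-known closed form (the multiplicative-weights update), sketched below. This shows only the exact baseline that the paper's inexact analysis departs from; the loss sequence, step size `eta`, and function names are illustrative assumptions, and no approximation error is modeled here.

```python
import math


def omd_entropy_step(x, grad, eta):
    """Exact OMD step on the simplex with negative entropy as regularizer."""
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def run_omd(losses, eta):
    """Play OMD against linear losses; return the last iterate and the regret."""
    d = len(losses[0])
    x = [1.0 / d] * d  # start at the uniform distribution
    incurred = 0.0
    for g in losses:
        incurred += sum(xi * gi for xi, gi in zip(x, g))
        x = omd_entropy_step(x, g, eta)
    # Regret is measured against the best fixed coordinate in hindsight.
    best_fixed = min(sum(g[i] for g in losses) for i in range(d))
    return x, incurred - best_fixed


# Illustrative loss sequence: the second coordinate is always the best choice.
losses = [[1.0, 0.0, 0.5]] * 50
final_x, regret = run_omd(losses, eta=0.5)
```

Over subsets of the simplex the subproblem generally has no such closed form and must be solved numerically, which is the kind of setting where the approximation errors studied in the abstract arise.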

2511.18288 2026-06-19 cs.SE Version update

Can Large Language Models Reason About Complex Execution Paths? An Empirical Study on Python

Wenhan Wang, Kaibo Liu, Zeyu Sun, An Ran Chen, Ge Li, Gang Huang, Lei Ma

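AI summary: An empirical study of whether large language models can reason about Python execution paths, built on test case generation and bug classification tasks; it finds that LLMs can improve path coverage, but models with stronger reasoning do not necessarily outperform weaker ones.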

Comments Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM)

Details
Abstract

Execution path reasoning is a key step towards program semantics understanding. It is crucial for generating test cases that cover certain branches/paths, or detecting bugs that are triggered by some paths without actually executing the program. Traditionally, execution path reasoning can be achieved by symbolic execution techniques, but existing SMT-based symbolic execution approaches struggle with complex data structures and external API calls. This challenge is even more pronounced in languages with highly flexible syntax, such as Python, resulting in a lack of widely adopted tools for reasoning on execution paths. Therefore, reasoning about execution paths with AI-based approaches becomes a promising direction. In this paper, we investigate the feasibility of adopting large language models (LLMs) for execution path reasoning on Python, where traditional path-based symbolic execution tools are unavailable. We conduct an empirical study on two types of path reasoning tasks: generation tasks for test case generation and classification tasks for bug detection. We build new evaluation pipelines and benchmarks from both competition-level programs and real-world repositories. Our results show that state-of-the-art LLMs can perform correct reasoning on execution paths and improve test coverage on real-world software, though models with stronger reasoning abilities do not always outperform weaker ones. These findings highlight the potential of utilizing LLMs as a complementary heuristic for path-aware code reasoning, especially in programming languages lacking mature symbolic execution tools. We have released our benchmark and evaluation scripts at https://github.com/jacobwwh/llm-path-study.

2511.17625 2026-06-19 cs.MA cs.GT Version update

Iterative Negotiation and Oversight: A Case Study in Decentralized Air Traffic Management

Jaehan Im, John-Paul Clarke, Ufuk Topcu, David Fridovich-Keil

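AI summary: Proposes a regulated decentralized negotiation framework that reaches consensus through a trading auction and adds a taxation-like oversight mechanism to steer system efficiency and fairness; theory guarantees finite-time termination, and a case study validates the framework in decentralized air traffic management.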

Details
Abstract

Achieving consensus among self-interested agents remains challenging in decentralized multi-agent systems, where agents often have conflicting preferences. Existing coordination methods enable agents to reach consensus without a centralized coordinator, but do not provide formal guarantees on system-level objectives such as efficiency or fairness. To address this limitation, we propose a regulated decentralized negotiation framework that augments a decentralized negotiation mechanism with limited regulatory oversight. The framework builds upon the trading auction for consensus, enabling self-interested agents with conflicting preferences to negotiate through asset trading while avoiding direct disclosure of private asset valuations. We introduce an oversight mechanism, which implements a taxation-like intervention that guides decentralized negotiation toward system-efficient and equitable outcomes while also regulating how fast the framework converges. We establish theoretical guarantees of finite-time termination and derive bounds linking system efficiency and convergence rate to the level of regulatory intervention. A case study based on the collaborative trajectory options program, a rerouting initiative in U.S. air traffic management, demonstrates that the framework can reliably achieve consensus among self-interested airspace sector managers, and reveals how the level of regulatory intervention regulates the relationship between system efficiency and convergence speed. Taken together, the theoretical and experimental results indicate that the proposed framework provides a mechanism for regulated decentralized coordination that preserves noncooperative final selection while safeguarding system-level objectives.

2508.04424 2026-06-19 cs.CV Version update

Composed Object Retrieval: Object-level Retrieval via Composed Expressions

Tong Wang, Guanyu Yang, Nian Liu, Zongyan Han, Jinxing Zhou, Salman Khan, Fahad Shahbaz Khan

Affiliations: Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education, Jiangsu, China; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE

AI summary: Proposes the Composed Object Retrieval (COR) task, which performs object-level retrieval by composing a reference object, its mask, and a retrieval text, and builds the COR125K benchmark and the CORE model, which significantly outperforms existing methods.

Details
Abstract

Retrieving fine-grained visual content based on user intent remains a challenge in multimodal systems. Although current Composed Image Retrieval (CIR) methods combine reference images with retrieval texts, they are constrained to image-level matching and cannot localize specific objects. To this end, we propose Composed Object Retrieval (COR), a new object-level retrieval task that retrieves target object(s) from candidate objects in a target image and grounds the retrieved result with pixel-level masks. Given a reference object, its mask, a target image, and a retrieval text describing the desired modification, COR requires models to perform composed visual-textual reasoning rather than relying on explicit category names. This setting introduces several challenges, including fine-grained compositional matching, negative-object filtering under visually similar distractors, and flexible single- or multi-object retrieval. We construct COR125K, the first large-scale COR benchmark, containing 125,541 retrieval triplets across 408 categories with base/novel splits for evaluating category-level generalization. We also present CORE, a unified end-to-end model that integrates reference region encoding, adaptive vision-text interaction, and region-level contrastive learning to align composed representations with target objects while suppressing background and distractors. Extensive experiments demonstrate that CORE significantly outperforms existing CIR-based pipelines and strong baselines in both base and novel categories, establishing a simple and effective foundation for fine-grained object-level multimodal retrieval. Code will be released publicly at https://github.com/wangtong627/COR.

2511.14280 2026-06-19 eess.SY cs.SY math.OC Version update

A graph-informed regret metric for optimal distributed control

Daniele Martinelli, Andrea Martin, Giancarlo Ferrari-Trecate, Luca Furieri

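AI summary: Proposes spatial regret, a metric for the worst-case performance gap between a distributed controller and an oracle controller with additional sensing information, and designs distributed controllers around it; a convex reformulation admits a finite-dimensional approximation, and power-system simulations show effective mitigation of localized disturbances.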

Details
Abstract

We consider the optimal control of large-scale systems using distributed controllers whose network topology mirrors the coupling graph between subsystems. In this work, we introduce spatial regret, a graph-informed metric measuring the worst-case performance gap between a distributed controller and an oracle with access to additional sensor information. The oracle's graph is a user-specified augmentation of the information graph, yielding a benchmark policy that penalizes disturbances for which additional sensing would improve performance. Minimizing spatial regret yields distributed controllers that respect the nominal information graph and emulate the oracle's response to disturbances characteristic of large-scale networks, such as localized perturbations. We show that minimizing spatial regret admits a convex reformulation as an infinite program with a finite-dimensional approximation. To scale to large networks, we derive an upper bound on the spatial regret that can be efficiently minimized in a distributed way. Numerical experiments on power-system models show that the resulting controllers mitigate localized disturbances more effectively than those based on classical metrics.

2509.11951 2026-06-19 math.NA cs.NA math.AP Version update

X-ray imaging from nonlinear waves: numerical reconstruction of a cubic nonlinearity

Suvi Anttila, Markus Harju, Teemu Tyni

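AI summary: For the inverse boundary value problem of a nonlinear wave equation in 2+1 dimensions, proposes a direct numerical reconstruction method based on the Radon transform, with spectral regularization to stabilize numerical differentiation, recovering the potential from boundary measurements.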

Comments 26 pages, 10 figures. Revised version based on peer-review feedback with improvements to Theorem 1, an addition of Theorem 2, and an additional figure in the time-dependent case

Details
Abstract

We study an inverse boundary value problem for the nonlinear wave equation in $2 + 1$ dimensions. The objective is to recover an unknown potential $q(x, t)$ from the associated Dirichlet-to-Neumann map using real-valued waves. We propose a direct numerical reconstruction method for the Radon transform of $q$, which can then be inverted using standard X-ray tomography techniques to determine $q$. Our implementation introduces a spectral regularization procedure to stabilize the numerical differentiation step required in the reconstruction, improving robustness with respect to noise in the boundary data. We give rigorous justification and optimal stability estimates for the regularized spectral differentiation of noisy measurements, which may be of independent interest. Numerical experiments demonstrate the feasibility of recovering potentials from boundary measurements of nonlinear waves and illustrate the advantages of the Radon-based reconstruction.
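Why regularization helps numerical differentiation of noisy data can be seen in a small sketch. The version below uses one common form of spectral regularization, namely discarding Fourier modes above a cutoff before differentiating; the paper's own procedure may differ, and the test signal, noise model, grid size, and cutoff values are all assumptions made for illustration.

```python
import cmath
import math


def spectral_derivative(samples, cutoff):
    """Differentiate 2*pi-periodic samples, discarding Fourier modes above `cutoff`."""
    n = len(samples)
    deriv = [0.0] * n
    for k in range(-cutoff, cutoff + 1):
        # k-th Fourier coefficient, computed with a direct DFT sum.
        coeff = sum(
            s * cmath.exp(-1j * k * 2.0 * math.pi * j / n)
            for j, s in enumerate(samples)
        ) / n
        for j in range(n):
            # Differentiation multiplies the k-th mode by i*k.
            deriv[j] += (1j * k * coeff * cmath.exp(1j * k * 2.0 * math.pi * j / n)).real
    return deriv


n = 64
xs = [2.0 * math.pi * j / n for j in range(n)]
# Smooth signal sin(x) plus a small high-frequency perturbation standing in for noise.
noisy = [math.sin(x) + 0.01 * math.sin(20.0 * x) for x in xs]

regularized = spectral_derivative(noisy, cutoff=5)    # keeps modes |k| <= 5
unregularized = spectral_derivative(noisy, cutoff=31)  # keeps every resolved mode

err_regularized = max(abs(d - math.cos(x)) for d, x in zip(regularized, xs))
err_unregularized = max(abs(d - math.cos(x)) for d, x in zip(unregularized, xs))
```

Differentiation scales mode $k$ by $k$, so a perturbation of amplitude 0.01 at frequency 20 becomes an error of amplitude 0.2 in the derivative unless that mode is filtered out.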

2511.04260 2026-06-19 cs.CV cs.AI Version update

Proto-LeakNet: Towards Signal-Leak Aware Attribution in Synthetic Human Face Imagery

Claudio Giusti, Luca Guarnera, Sebastiano Battiato

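Affiliations: Department of Mathematics and Computer Science, University of Catania

AI summary: Proposes Proto-LeakNet, which exploits signal-leak traces in diffusion models and combines closed-set classification with density-based open-set evaluation for interpretable generator attribution; trained only on closed-set data, it remains effective on unseen generators.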

Comments 44 pages, 27 figures, 11 tables

Details
Abstract

The growing sophistication of synthetic image and deepfake generation models has turned source attribution and authenticity verification into a critical challenge for modern computer vision systems. Recent studies suggest that diffusion pipelines unintentionally imprint persistent statistical traces, known as signal-leaks, within their outputs, particularly in latent representations. Building on this observation, we propose Proto-LeakNet, a signal-leak-aware and interpretable attribution framework that integrates closed-set classification with a density-based open-set evaluation on the learned embeddings, enabling analysis of unseen generators without retraining. Acting in the latent domain of diffusion models, our method re-simulates partial forward diffusion to expose residual generator-specific cues. A temporal attention encoder aggregates multi-step latent features, while a feature-weighted prototype head structures the embedding space and enables transparent attribution. Trained solely on closed data and achieving a Macro AUC of 98.13%, Proto-LeakNet learns a latent geometry that remains robust under post-processing, surpassing state-of-the-art methods, and achieves strong separability both between real images and known generators, and between known and unseen ones. The codebase is available at the following link: https://github.com/claudiunderthehood/Proto-LeakNet.

2511.09480 2026-06-19 math.CO cs.DM Version update

Enumeration in the lattice of $q$-decreasing words

Jean-Luc Baril, Nathanaël Hassler, Sergey Kirgizov

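AI summary: Proves that $q$-decreasing words form a lattice under the componentwise order, enumerates the join-irreducible elements for $q>0$, counts coverings, intervals, and meet-irreducible elements for positive rational $q$, and analyzes asymptotic behavior.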

Comments 22 pages, 1 figure

Details
Abstract

We prove that the poset of $q$-decreasing words equipped with the componentwise order forms a lattice. We enumerate the join-irreducible elements for arbitrary $q>0$, and for any positive rational number $q$, we determine the number of coverings, intervals and meet-irreducible elements. The latter present the same structure as words over an alphabet of $2\lceil q\rceil+1$ letters avoiding $\lceil q\rceil^2+2\lceil q\rceil-1$ consecutive patterns of length 2. Furthermore, we analyze the asymptotic behavior of several of these quantities.

2510.18784 2026-06-19 cs.LG Version update

CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training

Soroush Tabesh, Mher Safaryan, Andrei Panferov, Alexandra Volkova, Dan Alistarh

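Affiliations: Anonymous Authors

AI summary: Proposes CAGE, which improves the straight-through estimator with a curvature-aware correction term that balances loss minimization against quantization constraints, offers convergence guarantees in the smooth non-convex setting, and significantly improves the accuracy of low-bit quantization-aware training.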

Comments Accepted at MLSys 2026 (Oral). To appear in Proceedings of Machine Learning and Systems 8

Journal ref Proceedings of Machine Learning and Systems 8 (MLSys 2026)

Details
Abstract

Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly efficient implementation that leverages Adam statistics. CAGE significantly improves upon the prior state-of-the-art methods in terms of accuracy, for similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights-and-activations (W3A3) matches the accuracy achieved at 4 bits (W4A4) with the prior best method. The official implementation can be found at https://github.com/IST-DASLab/CAGE.
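For readers unfamiliar with the baseline that CAGE builds on, the plain straight-through estimator can be sketched on a one-parameter toy problem. This is not CAGE: the curvature-aware correction is not shown, and the uniform quantization grid, the data, the learning rate, and the function names are all assumptions made for illustration.

```python
def quantize(w, step=0.25):
    """Round a latent weight to the nearest point on a uniform grid."""
    return round(w / step) * step


def ste_train(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x with a quantized weight, using the straight-through estimator."""
    w = 0.0  # full-precision latent weight
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            wq = quantize(w)                 # forward pass uses the quantized weight
            grad = 2.0 * (wq * x - y) * x    # gradient of the squared error w.r.t. wq
            w -= lr * grad                   # STE: treat d(wq)/d(w) as 1
    return w, quantize(w)


# Toy data whose best weight (0.75) lies exactly on the quantization grid.
xs = [1.0, 2.0, 3.0]
ys = [0.75 * x for x in xs]
latent, quantized = ste_train(xs, ys)
```

When the best weight falls between grid points, this plain update tends to oscillate across the rounding boundary, which is one reason corrections to the STE gradient are an active research topic.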

2507.23534 2026-06-19 cs.LG cs.CV Version update

Continual Learning with Support Boundary Experience Blending

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

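Affiliations: National Taiwan University

AI summary: Proposes the Experience Blending framework, which generates support boundary data with differential-privacy-inspired noise and trains jointly on exemplars and boundary data to regularize decision boundaries, improving continual-learning accuracy on several datasets.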

Details
Abstract

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained with sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 14%, and 2%, respectively.

2510.27285 2026-06-19 cs.CV cs.CR 版本更新

Rethinking Robust Adversarial Concept Erasure in Diffusion Models

重新思考扩散模型中的鲁棒对抗性概念擦除

Qinghong Yin, Yu Tian, Heming Yang, Xiang Chen, Xianlin Zhang, Yue Ming, Xueming Li, Yue Zhang

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua University(计算机科学与技术系,人工智能研究院,清华大学) University of Chinese Academy of Sciences(中国科学院大学) Nanjing University of Aeronautics and Astronautics(南京航空航天大学)

AI总结 针对扩散模型中概念擦除的对抗训练忽视概念语义导致拟合不足的问题,提出语义引导的鲁棒对抗概念擦除方法S-GRACE,显著提升擦除性能26%并减少90%训练时间。

详情
AI中文摘要

概念擦除旨在选择性地遗忘扩散模型(DMs)中的不良内容,以降低敏感内容生成的风险。作为概念擦除的一种新范式,现有方法大多采用对抗训练来识别和抑制目标概念,从而减少敏感输出的可能性。然而,这些方法常常忽视对抗训练在DMs中的特异性,导致仅能部分缓解。在这项工作中,我们从概念空间的角度调查并量化了这种特异性,即对抗样本能否真正拟合目标概念空间?我们观察到现有方法在生成对抗样本时忽视了概念语义的作用,导致对概念空间的拟合效果不佳。这种忽视导致了以下问题:1)当对抗样本较少时,它们无法全面覆盖目标概念;2)反之,它们会破坏其他目标概念空间。受这些发现分析的启发,我们引入了S-GRACE(语义引导的鲁棒对抗概念擦除),它优雅地利用概念空间内的语义引导来生成对抗样本并执行擦除训练。使用七种最先进方法和三种对抗提示生成策略在各种DM遗忘场景下进行的实验表明,S-GRACE显著提高了擦除性能26%,更好地保留了非目标概念,并将训练时间减少了90%。我们的代码可在 https://github.com/Qhong-522/S-GRACE 获取。

英文摘要

Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. As a novel paradigm in concept erasure, most existing methods employ adversarial training to identify and suppress target concepts, thus reducing the likelihood of sensitive outputs. However, these methods often neglect the specificity of adversarial training in DMs, resulting in only partial mitigation. In this work, we investigate and quantify this specificity from the perspective of concept space, i.e., can adversarial samples truly fit the target concept space? We observe that existing methods neglect the role of conceptual semantics when generating adversarial samples, resulting in ineffective fitting of concept spaces. This oversight leads to the following issues: 1) when there are few adversarial samples, they fail to comprehensively cover the object concept; 2) conversely, they will disrupt other target concept spaces. Motivated by the analysis of these findings, we introduce S-GRACE (Semantics-Guided Robust Adversarial Concept Erasure), which gracefully leverages semantic guidance within the concept space to generate adversarial samples and perform erasure training. Experiments conducted with seven state-of-the-art methods and three adversarial prompt generation strategies across various DM unlearning scenarios demonstrate that S-GRACE significantly improves erasure performance by 26%, better preserves non-target concepts, and reduces training time by 90%. Our code is available at https://github.com/Qhong-522/S-GRACE.

2509.15822 2026-06-19 stat.ML cs.LG math.PR math.ST stat.TH 版本更新

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

具有多于 $\sqrt{n}$ 个社区的随机块模型的相变

Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen

发表机构 * Institut für Mathematik – Universität Potsdam, Potsdam, Germany(波茨坦大学数学研究所,德国波茨坦) Laboratoire de Mathématiques d’Orsay, Université Paris-Saclay, CNRS, France(奥赛数学实验室,巴黎-萨克雷大学,法国 CNRS) INRAE, Institut Agro, MISTEA, Univ. Montpellier, France(国家农业研究院,蒙彼利埃大学,法国)

AI总结 本文证明在随机块模型中,当社区数 $K\geq \sqrt{n}$ 时,低度多项式在 Chin 等人提出的阈值以下无法恢复社区,而通过计数特定子图可在多项式时间内实现恢复,支持了新相变阈值的猜想。

详情
AI中文摘要

统计物理的预测表明,在随机块模型(SBM)中,当社区数 $K$ 固定时,社区恢复在 Kesten-Stigum (KS) 阈值以上(且仅在其以上)可以在多项式时间内实现。这一猜想催生了丰富的文献,证明在 KS 阈值以上的 SBM 中,非平凡社区恢复确实是可能的。只要 $K\ll \sqrt{n}$(其中 $n$ 是观测图中的节点数),KS 阈值以下低度多项式(LDP)的失败也被证明。当 $K\geq \sqrt{n}$ 时,Chin 等人(2025)最近证明,在稀疏机制中,通过计数非回溯路径,可以在 KS 阈值以下的多项式时间内实现社区恢复。这一突破使他们提出了多社区机制 $K\geq \sqrt{n}$ 的新阈值。在这项工作中,我们为他们的猜想提供了证据:(1) 我们证明,对于任意图密度,LDP 无法在 Chin 等人(2025)提出的阈值以下恢复社区;(2) 我们证明,在所提出的阈值以上,不仅是在 Chin 等人(2025)考虑的稀疏机制中,而且在适度稀疏机制中,通过计数受 LDP 分析启发的某些特定子图,可以在多项式时间内实现社区恢复。特别地,计数长度为 $\log(n)$ 的自避路径(这与基于非回溯算子的谱算法密切相关)仅在稀疏机制中是最优的。在更密集的机制中,必须考虑基于循环放大的更复杂子图。

英文摘要

Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) with a fixed number $K$ of communities is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold. Failure of low-degree polynomials (LDP) below the KS threshold was also proven, as long as $K\ll \sqrt{n}$, where $n$ is the number of nodes in the observed graph. When $K\geq \sqrt{n}$, Chin et al. (2025) recently proved that, in a sparse regime, community recovery in polynomial time is possible below the KS threshold by counting non-backtracking paths. This breakthrough led them to postulate a new threshold for the many-communities regime $K\geq \sqrt{n}$. In this work, we provide evidence supporting their conjecture: (1) we prove that, for any graph density, LDP fail to recover communities below the threshold postulated by Chin et al. (2025); (2) we prove that community recovery is possible in polynomial time above the postulated threshold, not only in the sparse regime considered in Chin et al. (2025), but also in moderately sparse regimes, by counting occurrences of some specific motifs inspired by the LDP analysis. In particular, counting self-avoiding paths of length $\log(n)$, which is closely related to spectral algorithms based on the Non-Backtracking operator, is optimal only in the sparse regime. More complex motifs based on the blow-up of a cycle must be considered in denser regimes.
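
To make the counted object concrete, here is a minimal sketch of counting self-avoiding paths between two nodes. It is a brute-force enumeration for small graphs only, not the estimator analysed in the paper; the function name `count_self_avoiding_paths` and the adjacency-dict input format are assumptions made for this illustration.

```python
def count_self_avoiding_paths(adj, start, end, length):
    """Count paths with exactly `length` edges from `start` to `end`
    that never revisit a node (self-avoiding paths).

    `adj` maps each node to an iterable of its neighbours.
    Brute force: the running time grows exponentially with `length`.
    """
    def extend(node, visited, remaining):
        if remaining == 0:
            return 1 if node == end else 0
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                total += extend(nxt, visited | {nxt}, remaining - 1)
        return total

    return extend(start, frozenset([start]), length)


# A 4-cycle 0-1-2-3-0 as a toy graph.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
two_step = count_self_avoiding_paths(cycle, 0, 2, 2)  # 0-1-2 and 0-3-2
```

In a community-recovery setting one would typically compare such counts between node pairs, on the intuition that pairs in the same community are joined by more short paths; how the counts are aggregated and thresholded is specified in the paper, not here.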

2511.04514 2026-06-19 cs.LG 版本更新

Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

图像分类器深度集成在数据偏移下的线性模式连通性

C. Hepburn, T. Zielke, A. P. Raulf

发表机构 * Institute for AI Safety & Security(人工智能安全与安保研究所)

AI总结 实验研究数据偏移下线性模式连通性(LMC)的条件,发现小学习率和大批量可减轻其影响,并揭示LMC在训练效率与集成多样性间的权衡。

Comments 17 pages, 22 figures

详情
AI中文摘要

线性模式连通性(LMC)现象将深度学习的多个方面联系起来,包括噪声随机梯度下的训练稳定性、局部最小值(盆地)的平滑性和泛化性、采样模型的相似性和功能多样性,以及架构对数据处理的影响。在这项工作中,我们实验研究了数据偏移下的LMC,并确定了减轻其影响的条件。我们将数据偏移解释为随机梯度噪声的额外来源,可以通过小学习率和大批量来减少。这些参数影响模型是收敛到相同的局部最小值,还是收敛到损失景观中具有不同平滑性和泛化性的区域。尽管通过LMC采样的模型往往比收敛到不同盆地的模型更频繁地犯相似错误,但LMC的好处在于平衡训练效率与从更大、更多样化的集成中获得的收益。代码和补充材料可从 https://github.com/DLR-KI/LMC 获取。本工作已提交给IEEE考虑发表。版权可能随时转移,此后此版本可能不再可访问。

英文摘要

The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles. Code and supplementary materials are available at https://github.com/DLR-KI/LMC. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
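
A common way to test for linear mode connectivity is to evaluate the loss along the straight line between two parameter vectors and measure how far it rises above the interpolated endpoint losses. The sketch below illustrates that check on a toy one-dimensional loss; the function names, the flat-list parameter format, and this particular barrier definition are assumptions for illustration and may differ from the criterion used in the paper.

```python
def interpolate(theta_a, theta_b, alpha):
    """Point on the line segment between two parameter vectors."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss_fn, theta_a, theta_b, steps=11):
    """Largest excess of the loss on the segment over the linear
    interpolation of the two endpoint losses. A barrier near zero
    suggests the two solutions are linearly mode connected."""
    end_a, end_b = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        baseline = (1 - alpha) * end_a + alpha * end_b
        excess = loss_fn(interpolate(theta_a, theta_b, alpha)) - baseline
        barrier = max(barrier, excess)
    return barrier


# Toy losses: a single basin and a double well with minima at -1 and +1.
single_basin = lambda theta: sum(t * t for t in theta)
double_well = lambda theta: sum((t * t - 1) ** 2 for t in theta)

same_basin = loss_barrier(single_basin, [-1.0], [1.0])
different_basins = loss_barrier(double_well, [-1.0], [1.0])
```

For real networks the same loop would run over flattened weights and a held-out loss, which is considerably more expensive than this toy.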

2510.27568 2026-06-19 cs.AI cs.CL 版本更新

SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning

SIGMA: 搜索增强的按需知识集成用于智能体数学推理

Ali Asgarov, Umid Suleymanov, Aadyant Khatri

AI总结 提出SIGMA框架,通过多智能体独立推理、定向搜索和协调机制,实现上下文敏感的知识集成,在MATH500等基准上提升7.4%的绝对性能。

Comments AAAI 2026 LMReasoning

详情
AI中文摘要

解决数学推理问题不仅需要准确访问相关知识,还需要仔细的多步骤思考。然而,当前的检索增强模型通常依赖单一视角,遵循僵化的搜索策略,并且难以有效结合来自多个来源的信息。我们提出了SIGMA(搜索增强的按需知识集成用于智能体数学推理),这是一个统一框架,通过协调机制编排专门智能体独立推理、执行定向搜索并综合发现。每个智能体生成假设段落以优化其分析视角的检索,确保知识集成既上下文敏感又计算高效。在MATH500、AIME和博士级科学问答GPQA等具有挑战性的基准测试中,SIGMA持续优于开源和闭源系统,实现了7.4%的绝对性能提升。我们的结果表明,多智能体按需知识集成显著提高了推理准确性和效率,为复杂、知识密集型问题解决提供了可扩展的方法。代码将在发表后公开。

英文摘要

Solving mathematical reasoning problems requires not only accurate access to relevant knowledge but also careful, multi-step thinking. However, current retrieval-augmented models often rely on a single perspective, follow inflexible search strategies, and struggle to effectively combine information from multiple sources. We introduce SIGMA (Search-Augmented On-Demand Knowledge Integration for AGentic Mathematical reAsoning), a unified framework that orchestrates specialized agents to independently reason, perform targeted searches, and synthesize findings through a moderator mechanism. Each agent generates hypothetical passages to optimize retrieval for its analytic perspective, ensuring knowledge integration is both context-sensitive and computation-efficient. When evaluated on challenging benchmarks such as MATH500, AIME, and PhD-level science QA GPQA, SIGMA consistently outperforms both open- and closed-source systems, achieving an absolute performance improvement of 7.4%. Our results demonstrate that multi-agent, on-demand knowledge integration significantly enhances both reasoning accuracy and efficiency, offering a scalable approach for complex, knowledge-intensive problem-solving. We will release the code upon publication.

2510.24399 2026-06-19 cs.CV cs.RO 版本更新

GenTrack: A New Generation of Multi-Object Tracking

GenTrack:新一代多目标跟踪

Toan Van Nguyen, Rasmus G. K. Christiansen, Dirk Kraft, Leon Bodenhagen

发表机构 * SDU Robotics, University of Southern Denmark(SDU机器人实验室,南丹麦大学)

AI总结 提出GenTrack多目标跟踪方法,采用随机与确定性混合策略,结合粒子群优化与社会交互,在弱检测器、遮挡等场景下有效维持目标身份一致性并减少ID切换。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

本文介绍了一种新颖的多目标跟踪(MOT)方法,称为GenTrack,其主要贡献包括:第一,一种混合跟踪方法,采用随机和确定性方式,以鲁棒地处理未知且时变的目标数量,特别是在维持目标身份(ID)一致性和管理非线性动态方面;第二,利用粒子群优化(PSO)和一些提出的适应度度量,引导随机粒子朝向其目标分布模式,从而即使在弱且噪声大的目标检测器下也能实现有效跟踪;第三,整合目标间的社会交互,以增强PSO引导的粒子,并改进强(匹配)和弱(未匹配)轨迹的连续更新,从而减少ID切换和轨迹丢失,尤其是在遮挡期间;第四,基于GenTrack重新定义的视觉MOT基线,结合了基于空间一致性、外观、检测置信度、轨迹惩罚和社会分数的综合状态与观测模型,以实现系统且高效的目标更新;第五,首个公开可用的最小依赖源代码参考实现,包含三种变体,包括GenTrack Simple、Strengthen和Super,便于灵活重新实现。实验结果表明,与最先进的跟踪器相比,GenTrack在标准基准和现实场景中提供了优越的性能,并集成了基线实现以进行公平比较。还讨论了未来工作的潜在方向。所提方法和比较跟踪器的源代码参考实现已在GitHub上提供:https://github.com/SDU-VelKoTek/GenTrack

英文摘要

This paper introduces a novel multi-object tracking (MOT) method, dubbed GenTrack, whose main contributions are: first, a hybrid tracking approach that combines stochastic and deterministic components to robustly handle unknown and time-varying numbers of targets, particularly in maintaining target identity (ID) consistency and managing nonlinear dynamics; second, the use of particle swarm optimization (PSO) with proposed fitness measures to guide stochastic particles toward their target distribution modes, enabling effective tracking even with weak and noisy object detectors; third, the integration of social interactions among targets to enhance PSO-guided particles and to improve continuous updates of both strong (matched) and weak (unmatched) tracks, thereby reducing ID switches and track loss, especially during occlusions; fourth, a redefined GenTrack-based visual MOT baseline incorporating a comprehensive state and observation model based on space consistency, appearance, detection confidence, track penalties, and social scores for systematic and efficient target updates; and fifth, the first publicly available source-code reference implementation with minimal dependencies, featuring three variants (GenTrack Simple, Strengthen, and Super) to facilitate flexible reimplementation. Experimental results show that GenTrack provides superior performance on standard benchmarks and real-world scenarios compared to state-of-the-art trackers, with integrated implementations of baselines for fair comparison. Potential directions for future work are also discussed. The source-code reference implementations of both the proposed method and the compared trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack

2510.18383 2026-06-19 cs.CL cs.AI 版本更新

MENTOR: Reinforcement Learning via Flexible Teacher-Optimized Rewards for Tool-Use Distillation

MENTOR: 通过灵活的教师优化奖励进行工具使用蒸馏的强化学习

ChangSu Choi, Hoyun Song, Dongyeon Kim, WooHyeon Jung, Minkyung Cho, Sunjin Park, NohHyeob Bae, Seona Yu, KyungTae Lim

发表机构 * Seoul National University of Science and Technology(首尔科学技术大学) Korea Advanced Institute of Science and Technology(韩国科学技术院) LG CNS

AI总结 提出MENTOR方法,通过灵活的教师优化奖励结构,平衡行为对齐与下游性能,提升小模型在工具使用任务中的域外泛化能力。

详情
AI中文摘要

将大型语言模型(LLMs)的工具使用能力蒸馏到小型语言模型(SLMs)中对其实际应用至关重要。主要方法监督微调(SFT)由于与静态教师轨迹的刚性对齐,导致域外(OOD)泛化性能较差。虽然强化学习(RL)提供了一种替代方案,但SLMs的能力限制带来了严峻的困境:稀疏的结果奖励提供的指导不足,而严格的轨迹匹配施加了过于严格的约束。为了弥合这一能力驱动的差距,我们提出了MENTOR,它引入了一种灵活且过程感知的奖励结构。MENTOR不强制执行刚性复制,而是利用教师的参考来指导工具使用行为,平衡行为对齐与下游性能。在可控可执行工具基准上的大量实验表明,与SFT和严格RL基线相比,MENTOR提高了OOD工具使用性能。我们的研究结果表明,在可验证的工具使用环境中,灵活的工具使用对齐比严格的轨迹复制为开发适应性小模型提供了更有效的方法。

英文摘要

Distilling the tool-use capabilities of large language models (LLMs) into small language models (SLMs) is essential for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor out-of-domain (OOD) generalization due to its rigid alignment with static teacher trajectories. While reinforcement learning (RL) offers an alternative, the capacity limitations of SLMs pose a severe dilemma: sparse outcome rewards provide insufficient guidance, whereas strict trajectory matching imposes overly restrictive constraints. To bridge this capacity-driven gap, we propose MENTOR, which introduces a flexible yet process-aware reward structure. Instead of enforcing rigid replication, MENTOR uses the teacher's reference to guide tool-use behavior, balancing behavioral alignment with downstream performance. Extensive experiments on controlled executable-tool benchmarks demonstrate that MENTOR improves OOD tool-use performance compared to SFT and strict RL baselines. Our findings suggest that within verifiable tool-use environments, flexible tool-use alignment offers a more effective approach than strict trajectory replication for developing adaptable small models.

2510.21978 2026-06-19 cs.LG cs.AI 版本更新

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

超越推理增益:缓解大型推理模型中的通用能力遗忘

Hoang Phan, Xianjun Yang, Yuanshun Yao, Jingyu Zhang, Shengjie Bi, Xiaocheng Tang, Madian Khabsa, Lijuan Liu, Deren Lei

发表机构 * Meta Superintelligence Labs(Meta超智能实验室) New York University(纽约大学) Johns Hopkins University(约翰霍普金斯大学)

AI总结 针对强化学习训练导致推理模型遗忘基础能力的问题,提出RECAP重放策略,通过动态目标重加权在线调整训练重点,在保持通用能力的同时提升推理性能。

详情
AI中文摘要

基于可验证奖励的强化学习(RLVR)在数学和多模态推理方面取得了显著进展,并已成为当代语言和视觉-语言模型的标准后训练范式。然而,RLVR方法引入了能力退化的重大风险,即模型在长时间训练后,若未采用正则化策略,会遗忘基础技能。我们通过实验证实了这一担忧,观察到开源推理模型在感知和忠实性等核心能力上出现性能下降。虽然施加KL散度等正则化项有助于防止偏离基础模型,但这些项是在当前任务上计算的,因此不能保证保留更广泛的知识。同时,跨异构领域的经验回放使得决定每个目标应获得多少训练权重变得困难。为解决这一问题,我们提出RECAP——一种具有动态目标重加权的重放策略,用于通用知识保留。我们的重加权机制利用短期收敛和不稳定信号在线自适应,将后训练焦点从饱和目标转移到表现不佳或不稳定的目标。我们的方法是端到端的,可直接应用于现有RLVR流程,无需训练额外模型或进行繁重调优。在Qwen2.5-VL-3B和Qwen2.5-VL-7B上的广泛实验证明了我们方法的有效性,该方法不仅保留了通用能力,还通过实现任务内奖励的更灵活权衡提升了推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning and has become a standard post-training paradigm for contemporary language and vision-language models. However, the RLVR recipe introduces a significant risk of capability regression, in which models forget foundational skills after prolonged training without employing regularization strategies. We empirically confirm this concern, observing that open-source reasoning models suffer performance degradation on core capabilities such as perception and faithfulness. While imposing regularization terms like KL divergence can help prevent deviation from the base model, these terms are computed on the current task and therefore do not guarantee preservation of broader knowledge. Meanwhile, commonly used experience replay across heterogeneous domains makes it nontrivial to decide how much training emphasis each objective should receive. To address this, we propose RECAP, a replay strategy with dynamic objective reweighting for general knowledge preservation. Our reweighting mechanism adapts online using short-horizon signals of convergence and instability, shifting the post-training focus away from saturated objectives and toward underperforming or volatile ones. Our method is end-to-end and readily applicable to existing RLVR pipelines without training additional models or heavy tuning. Extensive experiments on benchmarks using Qwen2.5-VL-3B and Qwen2.5-VL-7B demonstrate the effectiveness of our method, which not only preserves general capabilities but also improves reasoning by enabling more flexible trade-offs among in-task rewards.

2510.21546 2026-06-19 eess.SY cs.SY 版本更新

Auction-Based Responsibility Allocation for Scalable Decentralized Safety Filters in Cooperative Multi-Agent Collision Avoidance

基于拍卖的责任分配用于可扩展的去中心化安全滤波器在多智能体协同避碰中

Johannes Autenrieb, Mark Spiller

AI总结 提出基于高阶控制屏障函数和拍卖责任分配的可扩展去中心化安全滤波器,通过非对称分配约束减少计算负荷,实现多智能体协同避碰。

Comments 6 pages, 3 figures, accepted for presentation at the IFAC World Congress 2026

详情
AI中文摘要

本文提出了一种基于高阶控制屏障函数(HOCBFs)和拍卖式责任分配的可扩展去中心化多智能体系统安全滤波器。虽然去中心化HOCBF公式在输入约束下保证了成对安全性,但随着智能体数量增加,它们面临可行性和可扩展性挑战。每个智能体必须评估越来越多的成对约束,增加了不可行的风险,并难以满足实时要求。为了解决这个问题,我们引入了一种基于拍卖的分配方案,该方案基于局部控制努力估计,在邻居之间非对称地分配约束执行。由此产生的有向责任图保证了完全的安全覆盖,同时减少了冗余约束和每个智能体的计算负荷。仿真结果证实了在各种网络规模和交互密度下的安全高效协调。

英文摘要

This paper proposes a scalable decentralized safety filter for multi-agent systems based on high-order control barrier functions (HOCBFs) and auction-based responsibility allocation. While decentralized HOCBF formulations ensure pairwise safety under input bounds, they face feasibility and scalability challenges as the number of agents grows. Each agent must evaluate an increasing number of pairwise constraints, raising the risk of infeasibility and making it difficult to meet real-time requirements. To address this, we introduce an auction-based allocation scheme that distributes constraint enforcement asymmetrically among neighbors based on local control effort estimates. The resulting directed responsibility graph guarantees full safety coverage while reducing redundant constraints and per-agent computational load. Simulation results confirm safe and efficient coordination across a range of network sizes and interaction densities.
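
The idea of assigning each pairwise safety constraint to one of the two agents can be sketched as below. The abstract does not state the auction rule, so everything specific here is an assumption for illustration: each agent has a single scalar effort estimate, the agent with the lower estimate takes the constraint, and ties are broken by agent id.

```python
def allocate_responsibility(neighbor_pairs, effort):
    """Assign each pairwise constraint (i, j) to exactly one agent.

    Hypothetical rule: the agent with the lower estimated control
    effort is responsible; ties are broken by the smaller agent id.
    Returns a dict mapping each pair to its responsible agent,
    i.e. a directed responsibility graph.
    """
    responsible = {}
    for i, j in neighbor_pairs:
        responsible[(i, j)] = min((i, j), key=lambda a: (effort[a], a))
    return responsible


def constraints_per_agent(responsible):
    """Number of constraints each agent must enforce in its own filter."""
    load = {}
    for agent in responsible.values():
        load[agent] = load.get(agent, 0) + 1
    return load


pairs = [(0, 1), (1, 2), (0, 2)]
effort = {0: 0.2, 1: 0.5, 2: 0.1}
assignment = allocate_responsibility(pairs, effort)
load = constraints_per_agent(assignment)
```

Every pair is covered by exactly one agent, so three agents enforce three constraints in total instead of the six that a symmetric scheme would evaluate. Whether one-sided enforcement is enough for safety depends on the HOCBF conditions in the paper, which this sketch does not model.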

2510.20454 2026-06-19 cs.LG 版本更新

Capturing Intransitive Dominance in Tennis Forecasting: A Graph Neural Network Approach

网球预测中非传递性优势的捕捉:一种图神经网络方法

Lawrence Clegg, John Cartlidge

发表机构 * School of Engineering Mathematics and Technology, University of Bristol(布里斯托大学工程数学与技术学院)

AI总结 针对网球中常见的非传递性优势(A胜B,B胜C,C胜A),提出图神经网络模型,通过时间有向图建模历史比赛结果,捕捉被传递性评级系统忽略的预测信号,与加权Elo结合后显著提升预测性能。

Comments 41 pages, 7 figures. Major revision reframing the paper from betting-market inefficiency toward intransitivity analysis, forecast complementarity, and robustness. Added forecast-encompassing tests, new intransitivity measures, robustness analyses, and expanded appendices

详情
AI中文摘要

非传递性球员优势(即球员A击败B,B击败C,但C击败A)在竞技网球中很常见。然而,很少有已知的尝试将其纳入预测方法中。我们通过一种图神经网络方法来解决这个问题,该方法通过时间有向图显式建模这些非传递性关系,其中球员作为节点,他们的历史比赛结果作为有向边。我们的模型(准确率65.7%,Brier分数0.214)与加权Elo等已建立的评级系统相比具有竞争力。尽管它在无条件准确性上没有超越基线,但一项预测包含测试表明它携带了互补信息。组合预测显著优于加权Elo,并且有迹象表明,在我们的模型针对的非传递性对决中,增益增长更强烈。因此,基于图的球员交互表示捕捉了传递性评级系统丢弃的预测信号,即使在没有共同对手的球员之间也是如此。

英文摘要

Intransitive player dominance, where player A beats B, B beats C, but C beats A, is common in competitive tennis. Yet, there are few known attempts to incorporate it within forecasting methods. We address this problem with a graph neural network approach that explicitly models these intransitive relationships through temporal directed graphs, with players as nodes and their historical match outcomes as directed edges. Our model (65.7% accuracy, 0.214 Brier score) forecasts competitively with established rating systems such as Weighted Elo. Although it does not improve on the baseline in unconditional accuracy, a forecast-encompassing test shows that it carries complementary information. A combined forecast significantly outperforms Weighted Elo, and there is some indication that the gain grows more strongly on the intransitive matchups our model targets. A graph-based representation of player interactions thus captures a forecasting signal that transitive rating systems discard, even between players who share no common opponents.
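
Two quantities in this abstract are easy to compute on toy data: the number of intransitive (cyclic) player triads in a directed match graph, and the Brier score of a probabilistic forecast. The sketch below assumes a simplified input in which each pair of players has at most one recorded net winner; the function names are hypothetical, and the paper's own intransitivity measures may be defined differently.

```python
from itertools import combinations


def count_cyclic_triads(beats):
    """Count player triples forming a cycle (A beats B, B beats C, C beats A).

    `beats` is a set of (winner, loser) pairs, with at most one
    direction recorded per pair of players.
    """
    players = sorted({p for pair in beats for p in pair})
    cycles = 0
    for a, b, c in combinations(players, 3):
        forward = (a, b) in beats and (b, c) in beats and (c, a) in beats
        backward = (b, a) in beats and (c, b) in beats and (a, c) in beats
        if forward or backward:
            cycles += 1
    return cycles


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


results = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D")}
n_cycles = count_cyclic_triads(results)
coin_flip = brier_score([0.5, 0.5], [1, 0])
```

A forecast that always predicts 0.5 scores 0.25, which gives a reference point for the reported Brier score of 0.214.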

2510.19893 2026-06-19 cs.LG 版本更新

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

EQPO: 面向临床推理的公平群体相对策略优化

Shiqi Dai, Wei Dai, Jiaee Cheong, Paul Pu Liang

发表机构 * MIT(麻省理工学院) Harvard University(哈佛大学)

AI总结 提出EQPO分层强化学习方法,通过自适应重加权样本促进异质临床人群的均衡学习,在7个诊断基准上降低F1标准差43.9%,缩小预测公平差距27.2%。

Comments Accepted as Oral on NeurIPS 2025 GenAI4Health Workshop

详情
AI中文摘要

医疗AI系统展示了令人印象深刻的诊断性能,但它们在不同人口统计群体之间通常表现出不均匀的准确性,使代表性不足的人群处于不利地位。尽管多模态推理基础模型推动了临床诊断的发展,基于强化学习的后训练倾向于吸收并放大多数主导训练语料中存在的偏见。我们提出公平群体相对策略优化(EQPO),一种分层强化学习方法,通过根据子群表示、任务难度和数据来源自适应地重新加权样本,鼓励跨异质临床人群的平衡学习。由于人口统计注释在真实临床数据中经常缺失,EQPO还在不可用时应用无监督聚类来恢复潜在子群。在覆盖5种模态(X射线、CT、皮肤镜、乳腺X线摄影、超声)的7个诊断基准上,EQPO在QoQ-Med3-8B上相比原始GRPO将F1标准差降低43.9%,最大跨群体F1差距降低42.7%,并在MedGemma-4B上将预测公平差距缩小27.2%(相比有偏减轻的RL基线),同时即使没有任何人口统计标签也将F1提高12.5%。检查训练轨迹显示,EQPO在优化过程中稳步提高公平性,而基线方法的公平性随训练进行而下降,并且发现的隐式群体保持稳定并与掩蔽的人口统计属性对齐。我们进一步发布了EquiMedGemma-4B和EquiQoQ-Med3-8B,这两种具有公平意识的临床VLLM在显著缩小人口统计差距的同时达到了最先进的准确性。

英文摘要

Medical AI systems have demonstrated impressive diagnostic performance, yet they routinely show uneven accuracy across demographic groups, disadvantaging underrepresented populations. Although multimodal reasoning foundation models have pushed clinical diagnosis forward, reinforcement learning-based post-training tends to absorb and magnify the biases present in majority-dominated training corpora. We propose Equitable Group Relative Policy Optimization (EQPO), a hierarchical reinforcement learning method that encourages balanced learning across heterogeneous clinical populations by adaptively reweighting samples according to subgroup representation, task difficulty, and data source. As demographic annotations are frequently missing in real-world clinical data, EQPO additionally applies unsupervised clustering to recover latent subpopulations when they are unavailable. On 7 diagnostic benchmarks covering 5 modalities (X-ray, CT, dermoscopy, mammography, ultrasound), EQPO reduces F1 standard deviation by 43.9% and the maximum cross-group F1 gap by 42.7% on QoQ-Med3-8B over vanilla GRPO, and narrows predictive parity gaps by 27.2% on MedGemma-4B over bias-mitigated RL baselines while raising F1 by 12.5% even without any demographic labels. Examining the training trajectory shows that EQPO steadily improves fairness over the course of optimization, in contrast to baseline methods whose fairness degrades as training proceeds, and the discovered implicit groups remain stable and align with masked demographic attributes. We further release EquiMedGemma-4B and EquiQoQ-Med3-8B, equitability-aware clinical VLLMs that attain state-of-the-art accuracy with markedly smaller demographic gaps.
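
The two dispersion metrics quoted above, F1 standard deviation and maximum cross-group F1 gap, can be computed from per-group confusion counts as in the sketch below. The choice of binary F1 and of the population standard deviation are assumptions, since the abstract does not state them, and the group names and counts are made up for illustration.

```python
from statistics import pstdev


def f1_score(tp, fp, fn):
    """Binary F1 from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def group_f1_dispersion(counts):
    """Per-group F1, its population standard deviation across groups,
    and the gap between the best and worst group.

    `counts` maps a group name to a (tp, fp, fn) tuple.
    """
    scores = {group: f1_score(*c) for group, c in counts.items()}
    values = list(scores.values())
    return scores, pstdev(values), max(values) - min(values)


# Made-up counts for two hypothetical subgroups.
toy_counts = {"group_1": (8, 2, 2), "group_2": (5, 5, 5)}
scores, f1_std, f1_gap = group_f1_dispersion(toy_counts)
```

A fairness-oriented method aims to shrink both `f1_std` and `f1_gap` without lowering the per-group scores themselves.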

2510.16311 2026-06-19 cs.LG 版本更新

Toward General Digraph Contrastive Learning: A Dual Spatial Perspective

面向一般有向图对比学习:双空间视角

Zhengyu Wu, Daohan Su, Yang Zhang, Xunkai Li, Rong-Hua Li, Guoren Wang

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

AI总结 提出S2-DiGCL框架,从复数域和实数域双空间视角对有向图进行对比学习,通过磁拉普拉斯自适应调制和路径子图增强,在节点分类和链接预测任务上分别提升4.41%和4.34%。

详情
AI中文摘要

图对比学习(GCL)已成为一种从图中提取一致表示而无需标签信息的强大工具。然而,现有方法主要关注无向图,忽略了在实际网络(如社交网络和推荐系统)中基础且不可或缺的关键方向信息。本文提出了S2-DiGCL,一种新颖的框架,强调从复数域和实数域视角对有向图进行对比学习的空间洞察。从复数域视角,S2-DiGCL在磁拉普拉斯中引入个性化扰动,以自适应地调制边相位和方向语义。从实数域视角,它采用基于路径的子图增强策略,捕捉细粒度的局部不对称性和拓扑依赖性。通过联合利用这两个互补的空间视图,S2-DiGCL构建了高质量的正负样本,从而实现更通用和鲁棒的有向图对比学习。在7个真实有向图数据集上的大量实验证明了我们方法的优越性,在监督和无监督设置下,节点分类和链接预测分别实现了4.41%和4.34%的性能提升,达到了最先进水平。

英文摘要

Graph Contrastive Learning (GCL) has emerged as a powerful tool for extracting consistent representations from graphs, independent of labeled information. However, existing methods predominantly focus on undirected graphs, disregarding the pivotal directional information that is fundamental and indispensable in real-world networks (e.g., social networks and recommendations). In this paper, we introduce S2-DiGCL, a novel framework that emphasizes spatial insights from complex-domain and real-domain perspectives for directed graph (digraph) contrastive learning. From the complex-domain perspective, S2-DiGCL introduces personalized perturbations into the magnetic Laplacian to adaptively modulate edge phases and directional semantics. From the real-domain perspective, it employs a path-based subgraph augmentation strategy to capture fine-grained local asymmetries and topological dependencies. By jointly leveraging these two complementary spatial views, S2-DiGCL constructs high-quality positive and negative samples, leading to more general and robust digraph contrastive learning. Extensive experiments on 7 real-world digraph datasets demonstrate the superiority of our approach, achieving SOTA performance with 4.41% improvement in node classification and 4.34% in link prediction under both supervised and unsupervised settings.
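
For readers unfamiliar with the magnetic Laplacian, the sketch below builds the standard unnormalized version for a small digraph: edge direction is encoded as a complex phase, with $\Theta = 2\pi q\,(A - A^\top)$, $H = A_s \odot e^{i\Theta}$, $A_s = (A + A^\top)/2$, and $L = D_s - H$. This is the textbook construction only; the personalized perturbations proposed in the paper are not reproduced, and the function name and the default $q = 0.25$ are assumptions for illustration.

```python
import cmath
import math


def magnetic_laplacian(adj, q=0.25):
    """Unnormalized magnetic Laplacian of a directed graph.

    `adj` is a square 0/1 adjacency matrix (list of lists), with
    adj[u][v] = 1 for an edge u -> v. The charge parameter `q`
    controls how strongly direction is encoded in the phase.
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            sym = 0.5 * (adj[u][v] + adj[v][u])          # symmetrized weight
            phase = 2 * math.pi * q * (adj[u][v] - adj[v][u])
            degree += sym
            lap[u][v] = -sym * cmath.exp(1j * phase)      # minus H[u][v]
        lap[u][u] += degree                               # plus D_s[u][u]
    return lap


single_edge = [[0, 1], [0, 0]]            # one directed edge 0 -> 1
lap = magnetic_laplacian(single_edge)     # direction shows up in the phase
undirected_view = magnetic_laplacian(single_edge, q=0.0)
```

The matrix is Hermitian, so its eigenvalues are real, and setting `q = 0` recovers the ordinary Laplacian of the symmetrized graph, which discards direction entirely.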

2510.12307 2026-06-19 math.NA cs.NA 版本更新

Fully mixed virtual element schemes for a new model of steady-state poroelastic stress-assisted diffusion in the brain

脑稳态孔隙弹性应力辅助扩散新模型的完全混合虚拟元方案

Isaac Bermudez, Bryan Gomez-Vargas, Kent-Andre Mardal, Andres E. Rubiano, Ricardo Ruiz-Baier

AI总结 提出完全混合虚拟元法求解线性孔隙弹性与应力依赖非线性扩散的耦合问题,通过解耦不动点策略证明解存在性,并建立先验误差估计,数值实验验证了最优收敛性和参数鲁棒性。

详情
AI中文摘要

我们提出了一种完全混合虚拟元方法,用于数值逼近线性孔隙弹性方程(使用Hellinger--Reissner原理,具有总孔隙弹性应力的强对称性)与应力改变溶质扩散(其中扩散通量依赖于孔隙弹性应力并非线性依赖于浓度梯度)之间的耦合。由于非线性耦合,与非线性扩散子问题相关的函数空间是Banach型的。为了处理这种结构,通过解耦的不动点策略建立了连续和离散问题的可解性。线性孔隙弹性部分使用扰动鞍点问题理论进行分析,而非线性扩散问题则依赖于单调全局算子的经典Minty--Browder定理。通过Schauder不动点定理严格证明了完全耦合系统解的存在性。此外,我们为离散方案建立了严格的先验误差估计,成功处理了强交叉耦合的非线性。这些发现得到了计算证据的支持,表明该公式在实践中渐近地恢复了最优收敛速度。作为关键贡献,数值方案及其基础分析在孔隙力学参数方面被证明是鲁棒的。最后,给出了几个数值例子,以说明所提出方案在脑多物理场背景下溶质输运研究中的特性和适用性。

英文摘要

We propose a fully mixed virtual element method for the numerical approximation of the coupling between linear poroelasticity equations with strong symmetry of total poroelastic stress (using the Hellinger--Reissner principle) and stress-altered solute diffusion (where diffusive flux depends on the poroelastic stress and nonlinearly on the concentration gradient). Because of the nonlinear coupling, the function spaces associated with the nonlinear diffusion sub-problem are of Banach type. To handle this structure, the solvability of both the continuous and discrete problems is established through a decoupled fixed-point strategy. The linear poroelasticity component is analysed using the theory for perturbed saddle-point problems, whereas the nonlinear diffusion problem relies on the classical Minty--Browder theorem for monotone global operators. The existence of solutions for the fully coupled system is rigorously proven via Schauder's fixed-point theorem. Additionally, we establish rigorous a priori error estimates for the discrete scheme, successfully handling the strongly cross-coupled nonlinearities. These findings are supported by computational evidence, demonstrating that the formulation asymptotically recovers optimal convergence rates in practice. As a key contribution, both the numerical scheme and its underlying analysis prove to be robust with respect to the poromechanical parameters. Finally, several numerical examples are presented to illustrate the properties and applicability of the proposed scheme in the study of solute transport in the context of brain multiphysics.

2510.08807 2026-06-19 cs.RO cs.LG 版本更新

Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation

Humanoid Everyday:面向开放世界人形机器人操作的综合机器人数据集

Zhenyu Zhao, Hongyi Jing, Xiawei Liu, Jiageng Mao, Abha Jha, Hanwen Yang, Rong Xue, Sergey Zakharov, Vitor Guizilini, Yue Wang

发表机构 * University of Southern California(南加州大学) Toyota Research Institute(丰田研究院)

AI总结 提出Humanoid Everyday数据集,包含10.3k轨迹、260个任务的多模态数据,用于人形机器人灵巧操作、人机交互和移动操作研究,并配套云评估平台。

详情
AI中文摘要

从运动到灵巧操作,人形机器人在展示复杂的全身能力方面取得了显著进展。然而,当前大多数机器人学习数据集和基准主要关注固定机器人臂,少数现有人形数据集要么局限于固定环境,要么任务多样性有限,通常缺乏人机交互和下肢运动。此外,缺乏用于在人形数据上对基于学习的策略进行基准测试的标准化评估平台。在这项工作中,我们提出了Humanoid Everyday,一个大规模且多样化的人形操作数据集,其特点是涉及灵巧物体操作、人机交互、运动集成动作等广泛的任务多样性。利用高效的人工监督遥操作流水线,Humanoid Everyday聚合了高质量的多模态感官数据,包括RGB、深度、LiDAR和触觉输入,以及自然语言注释,包含10.3k条轨迹和超过300万帧数据,涵盖7个大类共260个任务。此外,我们对数据集上的代表性策略学习方法进行了分析,提供了它们在不同任务类别中的优势和局限性的见解。为了标准化评估,我们引入了一个基于云的评估平台,允许研究人员在我们的受控环境中无缝部署他们的策略并接收性能反馈。通过发布Humanoid Everyday以及我们的策略学习分析和标准化的基于云的评估平台,我们旨在推进通用人形操作的研究,并为现实世界中更有能力和具身化的机器人代理奠定基础。我们的数据集、数据收集代码和云评估网站在我们的项目网站上公开发布。

英文摘要

From locomotion to dextrous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks mainly focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dextrous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data across 260 tasks in 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.

2509.13972 2026-06-19 cs.RO 版本更新

BIM Informed Visual SLAM for Construction Environments

BIM 引导的视觉 SLAM 在建筑环境中的应用

Asier Bikandi-Noya, Miguel Fernandez-Cortizas, Muhammad Shaheer, Ali Tourani, Holger Voos, Jose Luis Sanchez-Lopez

发表机构 * Automation and Robotics Research Group, Interdisciplinary Centre for Security, Reliability, and Trust (SnT), University of Luxembourg(自动化与机器人研究组,安全、可靠与信任跨学科研究中心(SnT),卢森堡大学)

AI总结 针对建筑环境中视觉SLAM轨迹漂移问题,提出利用建筑信息模型(BIM)的结构先验增强RGB-D SLAM系统,通过墙面对应与几何约束优化减少漂移,提升全局一致性,实验显示轨迹误差降低25.23%,地图精度提升7.14%。

Comments 9 pages, 7 tables, 4 figures

详情
AI中文摘要

监测建筑施工现场需要将计划设计与实际建造状态进行比较,而同步定位与地图构建(SLAM)技术可以实时估计实际状态。然而,视觉SLAM在建筑环境中容易产生轨迹漂移,生成的地图在几何上与实际环境不准确。为解决这一局限,我们利用从建筑信息模型(BIM)导出的结构先验增强现有的RGB-D SLAM系统。该系统将检测到的墙面与BIM中的对应墙面关联,并将这些对应关系作为几何约束加入后端优化,从而减少漂移并增强全局一致性。所提方法实时运行,并在多个真实建筑工地上验证,与最先进的基线相比,平均轨迹误差降低25.23%,地图精度提升7.14%。鲁棒性分析进一步表明,该方法对不完整的BIM数据以及计划模型与实际环境之间的几何差异具有韧性。

英文摘要

Monitoring building construction sites requires comparing the as-planned design with the as-built state, which can be estimated in real time using Simultaneous Localization and Mapping (SLAM) techniques. However, visual SLAM is prone to trajectory drift in construction environments, producing maps that are geometrically inconsistent with the actual environment. To address this limitation, we augment an existing RGB-D SLAM system with structural priors derived from the Building Information Model (BIM). The system associates detected walls with their BIM counterparts and includes these correspondences as geometric constraints in the back-end optimization, reducing drift and enhancing global consistency. The proposed method operates in real time and is validated on multiple real construction sites, achieving an average trajectory error reduction of 25.23% and a 7.14% improvement in map accuracy over state-of-the-art baselines. Robustness analyses further demonstrate resilience to incomplete BIM data and geometric discrepancies between as-planned models and the as-built environment.

2510.00831 2026-06-19 cs.AI cs.LG eess.SP version update

Controlled Comparison of Machine Learning Models for Fault Classification and Localization in Power System Protection

电力系统保护中故障分类与定位的机器学习模型受控比较

Julian Oelhaf, Georg Kordowich, Changhun Kim, Paula Andrea Pérez-Toro, Christian Bergler, Andreas Maier, Johann Jäger, Siming Bayer

Affiliations * Department of Electrical Engineering, Media and Computer Science, Ostbayerische Technische Hochschule Amberg-Weiden

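AI summary On a common electromagnetic transient dataset with 10-50 ms decision windows, the study compares machine learning models for fault classification and localization. The best classifiers reach F1 > 0.98 at 10 ms, and the localization error of the top models stabilizes at about 10% of line length.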

Comments Accepted at IEEE PES Innovative Smart Grid Technologies Europe 2026 (ISGT Europe 2026). Pre-camera-ready author version; final proceedings version may differ

Details
AI Chinese abstract

现代电力系统因逆变器基和分布式能源的集成而日益复杂,挑战了传统保护方案的可靠性,并推动了机器学习在保护任务中的应用。然而,由于不同研究中的数据集、传感假设和决策时域各异,已发表的结果往往难以比较。本文在相同的传感、时序和验证条件下,基于公共电磁暂态数据集,使用10-50ms的决策窗口以反映保护相关时间尺度,对故障分类(FC)和故障定位(FL)的机器学习模型进行了受控比较。对于FC,性能最佳的非线性模型在10ms时F1分数已超过0.98,而低容量模型在较短时域下性能下降,但随窗口延长而改善,表明相关故障类型信息在最早暂态中已存在。对于FL,顶级模型在所有评估时域下达到约10%归一化线路长度的稳定定位误差,而较弱模型形成明显分离的第二性能层级。线路解析分析显示,定位精度随电网段变化,表明存在拓扑依赖的难度而非仅时间上下文不足。这些发现为比较两个信息需求根本不同的保护任务中的机器学习模型提供了受控参考。

English abstract

The increasing complexity of modern power systems, driven by the integration of inverter-based and distributed energy resources, challenges the reliability of conventional protection schemes and motivates the use of machine learning for protection tasks. However, published results are often difficult to compare because datasets, sensing assumptions, and decision horizons vary across studies. This paper presents a controlled comparison of machine learning models for fault classification (FC) and fault localization (FL) under identical sensing, timing, and validation conditions on a common electromagnetic transient dataset, using decision windows of 10-50 ms to reflect protection-relevant time scales. For FC, the best-performing nonlinear models already achieve F1 scores above 0.98 at 10 ms, while lower-capacity models degrade at shorter horizons but improve with longer windows, indicating that relevant fault-type information is already present in the earliest transient. For FL, the top-performing models reach a stable localization error of about 10% of normalized line length across all evaluated horizons, while weaker models form a clearly separated second performance tier. Line-resolved analysis shows that localization accuracy varies across grid segments, indicating topology-dependent difficulty rather than insufficient temporal context alone. These findings provide a controlled reference for comparing machine learning models across two protection tasks with fundamentally different information requirements.
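
The two headline metrics in this abstract, an F1 score for fault classification and a localization error expressed as a fraction of line length, can be made concrete with a small sketch. The abstract does not say which F1 averaging is used or how the error is normalized in detail, so macro averaging, the function names, and the fault labels below are assumptions for illustration only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def normalized_localization_error(true_pos_km, pred_pos_km, line_length_km):
    """Absolute fault-position error as a fraction of the line length."""
    return abs(true_pos_km - pred_pos_km) / line_length_km

# Hypothetical fault-type labels: phase-to-ground, phase-to-phase, three-phase.
y_true = ["AG", "AG", "BC", "ABC"]
y_pred = ["AG", "BC", "BC", "ABC"]
f1 = macro_f1(y_true, y_pred)

# A 2.5 km miss on a 25 km line corresponds to a 10% normalized error.
err = normalized_localization_error(12.0, 14.5, 25.0)
```

Normalizing by line length makes errors comparable across lines of different lengths, which matters for the line-resolved analysis the abstract mentions.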

2509.25148 2026-06-19 cs.AI version update

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

AAPA:用于大型语言模型后训练的对抗锚定偏好对齐

Faqiang Qian, Kang An, Weikun Zhang, Ziliang Wang, Xuhui Zheng, Liangjian Wen, Yong Dai, Mengya Gao, Yichao Wu

Affiliations * Southwest University of Finance and Economics

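AI summary Proposes the AAPA framework, which uses a fixed lightweight discriminator to adversarially anchor policy outputs to expert responses at the sentence level. It augments post-training objectives such as SFT and GRPO and consistently improves performance on instruction-following benchmarks.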

Details
AI Chinese abstract

大型语言模型的后训练对齐通常结合了专家演示上的监督微调(SFT)和来自偏好或可验证反馈的强化学习(RL)。SFT提供了有用的行为锚点,但可能过拟合静态演示,而RL鼓励探索但可能偏离专家行为或利用不完美的奖励。我们提出AAPA(对抗锚定偏好对齐),这是一个插件式框架,通过句子级对抗锚定信号增强现有的后训练目标。AAPA使用固定的轻量判别器将策略生成结果与离线预收集的专家响应进行比较,因此在策略优化期间既不需要在线教师推理,也不需要判别器协同训练。相同的锚定项可以添加到SFT、GRPO和CHORD中,同时保留其原始训练流程。在指令遵循基准上的实验表明,AAPA在不同模型规模上一致地改善了相应的基础目标。特别是,分阶段的AAPA配置在Qwen3-0.6B上比强GRPO基线提高了5.77%,在Qwen3-4B上提高了3.75%。对响应长度、对数概率分布和判别器变体的进一步分析表明,对抗锚定为偏好优化提供了稳定的语义基础信号。代码可在 https://github.com/IsFaqq/AAPA 获取。

English abstract

Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77% on Qwen3-0.6B and 3.75% on Qwen3-4B. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at https://github.com/IsFaqq/AAPA.
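
One way to picture an anchoring term being added to a GRPO-style objective is sketched below. The abstract does not give the actual form of the AAPA term, so everything here is an assumption: the discriminator is stood in for by precomputed scores in (0, 1), the anchoring bonus is taken to be `beta * log(score)`, and the names `anchored_rewards` and `group_relative_advantages` are hypothetical. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO recipe.

```python
import math
from statistics import mean, pstdev

def anchored_rewards(task_rewards, disc_scores, beta=0.1):
    """Add a log-score bonus from a frozen discriminator to each task reward.

    disc_scores are assumed to be probabilities in (0, 1) that a rollout
    looks like an expert response; beta is a hypothetical weight.
    """
    return [r + beta * math.log(max(s, 1e-8))
            for r, s in zip(task_rewards, disc_scores)]

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt: two solve the task, two do not.
task_rewards = [1.0, 1.0, 0.0, 0.0]
disc_scores = [0.9, 0.5, 0.9, 0.5]  # stand-in discriminator outputs

rewards = anchored_rewards(task_rewards, disc_scores)
advantages = group_relative_advantages(rewards)
```

In this toy setup the anchoring bonus breaks ties between rollouts with equal task reward, favoring the ones the discriminator rates as more expert-like, without changing the rest of the training loop.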