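arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.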

2409.02708 2026-05-14 cs.LG stat.ME

Few-shot Multi-Task Learning of Linear Invariant Features with Meta Subspace Pursuit

Chaozhi Zhang, Lin Liu, Xiaoqun Zhang

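AI summary: This paper studies how to extract linear invariant features through multi-task learning when data are scarce. It proposes a new algorithm, Meta Subspace Pursuit (Meta-SP), that learns the low-rank invariant subspace shared across tasks. The method comes with both algorithmic and statistical guarantees, and extensive experiments show that it outperforms several competing methods, including ANIL.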

2409.02038 2026-05-14 cs.CL cs.AI cs.DB

BEAVER: An Enterprise Benchmark for Text-to-SQL

Peter Baile Chen, Devin Yang, Weiyue Li, Fabian Wenz, Yi Zhang, Nesime Tatbul, Michael Cafarella, Çağatay Demiralp, Michael Stonebraker

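AI summary: BEAVER is the first text-to-SQL benchmark built from private data warehouses, designed to evaluate large language models in complex enterprise environments. It contains 9128 question-SQL pairs drawn from real query logs across 19 domains, with complex database schemas and specialized domain knowledge. To address the scarcity of enterprise data and the limits of existing evaluation metrics, BEAVER synthesizes high-quality, expert-verified queries and introduces fine-grained subtask metrics, revealing a significant performance gap for current state-of-the-art models in real enterprise settings.

Comments: Dataset and code are available at https://beaverbench.github.io/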

Abstract

Existing text-to-SQL benchmarks have largely been constructed from public databases with well-structured schemas and simplistic question-SQL pairs. While large language models (LLMs) excel on these settings, their efficacy in complex private enterprise environments, characterized by intricate schemas, domain knowledge, and analytical user queries involving sophisticated structures and functions, remains unproven. To bridge this gap, we introduce BEAVER, the first text-to-SQL benchmark derived from private data warehouses. It comprises 9128 question-SQL pairs sourced from real-world query logs and 812 tables across 19 diverse domains. Building this benchmark is challenging because (1) enterprise query logs are scarce due to privacy constraints, and (2) existing all-or-nothing evaluation metrics based on accuracy make error diagnosis difficult -- especially when producing a correct query involves solving multiple compounded challenges, such as domain knowledge and query complexity. We address these issues at two levels. At the dataset level, we synthesize high-fidelity, expert-verified queries that increase dataset size and isolate individual challenges or combine them, producing queries focused on domain knowledge, query complexity, and both. At the evaluation level, we provide human annotations and evaluation metrics for five critical subtasks to enable fine-grained analysis. Our evaluation reveals a significant performance gap compared to existing benchmarks: SOTA agentic frameworks using the advanced model GPT-5.2 achieve only 10.8% accuracy. When provided with all subtask annotations as oracle hints, accuracy increases to 30.1%, confirming that a major bottleneck lies in correctly resolving these subtasks. Finally, we provide a taxonomy of the residual errors that persist even with subtask hints, identifying specific challenges such as the use of advanced functions.

2402.15415 2026-05-14 cs.LG math.DS stat.ML

Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics

Hugo Koubbi, Louis Hernandez, Matthieu Boussard

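AI summary: This paper studies catastrophic forgetting during fine-tuning with LoRA (low-rank adaptation). It builds an analytically tractable mean-field self-attention toy model that treats tokens as an interacting particle system and LoRA as a low-rank perturbation. Using tools from partial differential equations and dynamical systems, it identifies a phase transition between forgetting and non-forgetting regimes, analyzes how perturbation size and model depth affect forgetting, and validates the theoretical predictions experimentally.

Comments: New version accepted at ICML 2026, with new results and without previous results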

2210.09114 2026-05-14 cs.RO

INSANE: Cross-Domain UAV Data Sets with Increased Number of Sensors for developing Advanced and Novel Estimators

Christian Brommer, Alessandro Fornasier, Martin Scheiber, Jeff Delaune, Roland Brockers, Jan Steinbrener, Stephan Weiss

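AI summary: This paper presents INSANE, a collection of cross-domain UAV data sets that supports research on accurate localization for autonomous mobile robots in complex, dynamic environments. The data sets contain flight trajectories across multiple scenarios and difficulty levels, from an indoor motion capture facility, through indoor-outdoor transition flights, to challenging flights in a Mars analog environment, with rich sensor data and highly accurate ground truth. The sensor suite includes multiple inertial measurement units and cameras and supports research on machine learning-based sensor signal enhancement.

Comments: V2 with added dataset comparison tables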

DOI: 10.1177/02783649241227245
Journal ref: Int. J. Robot. Res. 43 (2024) 1083-1113

Abstract

For real-world applications, autonomous mobile robotic platforms must be capable of navigating safely in a multitude of different and dynamic environments with accurate and robust localization being a key prerequisite. To support further research in this domain, we present the INSANE data sets - a collection of versatile Micro Aerial Vehicle (MAV) data sets for cross-environment localization. The data sets provide various scenarios with multiple stages of difficulty for localization methods. These scenarios range from trajectories in the controlled environment of an indoor motion capture facility, to experiments where the vehicle performs an outdoor maneuver and transitions into a building, requiring changes of sensor modalities, up to purely outdoor flight maneuvers in a challenging Mars analog environment to simulate scenarios which current and future Mars helicopters would need to perform. The presented work aims to provide data that reflects real-world scenarios and sensor effects. The extensive sensor suite includes various sensor categories, including multiple Inertial Measurement Units (IMUs) and cameras. Sensor data is made available as raw measurements and each data set provides highly accurate ground truth, including the outdoor experiments where a dual Real-Time Kinematic (RTK) Global Navigation Satellite System (GNSS) setup provides sub-degree and centimeter accuracy (1-sigma). The sensor suite also includes a dedicated high-rate IMU to capture all the vibration dynamics of the vehicle during flight to support research on novel machine learning-based sensor signal enhancement methods for improved localization. The data sets and post-processing tools are available at: https://sst.aau.at/cns/datasets

2110.00062 2026-05-14 cs.RO cs.SY eess.SY

Simulation-based multi-criteria comparison of mono-articular and bi-articular exoskeletons during walking with and without load

Ali KhalilianMotamed Bonab, Volkan Patoglu

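AI summary: Using simulation, this paper carries out a multi-criteria comparison of mono-articular and bi-articular exoskeletons during walking under different loading conditions, studying how device dynamics and assistance torques affect metabolic cost, muscle activation, and joint reaction forces. The authors propose a Pareto optimization approach that simultaneously optimizes exoskeleton power consumption and the reduction in human metabolic rate, and they account for device inertia and electrical regeneration. The results indicate that, although the two devices provide similar assistance levels, mono-articular exoskeletons are better at reducing peak joint reaction forces, while the power consumption of bi-articular exoskeletons is less sensitive to load and their inertia has a smaller detrimental effect on metabolic cost.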

DOI: 10.1109/TNSRE.2026.3658597

Abstract

Developing exoskeletons that can reduce the metabolic cost of assisted subjects is challenging since a systematic design approach is required to capture the effects of device dynamics and the assistance torques on human performance. Design studies that rely on musculoskeletal models hold high promise in providing effective design guidelines, as the effect of various devices and different assistance torque profiles on metabolic cost can be studied systematically. In this paper, we present a simulation-based multi-criteria design approach to systematically study the effect of different device kinematics and corresponding optimal assistive torque profiles under actuator saturation on the metabolic cost, muscle activation, and joint reaction forces of subjects walking under different loading conditions. For the multi-criteria comparison of exoskeletons, we introduce a Pareto optimization approach to simultaneously optimize the exoskeleton power consumption and the human metabolic rate reduction during walking, under different loading conditions. We further superpose the effects of device inertia and electrical regeneration on the metabolic rate and power consumption, respectively. Our results explain the effects of heavy loads on the optimal assistance profiles of the exoskeletons and provide guidelines on choosing optimal device configurations under actuator torque limitations, device inertia, and regeneration effects. The multi-criteria comparison of devices indicates that despite the similar assistance levels of both devices, mono-articular exoskeletons show better performance on reducing the peak reaction forces, while the power consumption of bi-articular devices is less sensitive to the loading. Furthermore, for the bi-articular exoskeletons, the device inertia has lower detrimental effects on the metabolic cost of subjects and does not affect the Pareto-optimality of solutions.
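
The Pareto comparison described in the abstract can be illustrated with a minimal sketch. It assumes each candidate design is summarized by two numbers, exoskeleton power consumption (lower is better) and metabolic rate reduction (higher is better). The function names and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each design is (power_consumption, metabolic_reduction).
# Hypothetical units and values, for illustration only.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if a is at least as good as b on both criteria
    and strictly better on at least one (lower power, higher reduction)."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


candidates = [(10.0, 5.0), (12.0, 9.0), (15.0, 8.0), (20.0, 14.0), (11.0, 4.0)]
front = pareto_front(candidates)
print(front)
```

Designs left on the front are the ones for which power consumption cannot be lowered without giving up some metabolic benefit. The pairwise check is quadratic in the number of designs, which is fine for a small set of candidates.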

2008.03496 2026-05-14 cs.AI cs.LO cs.RO

Human Robot Collaborative Assembly Planning: An Answer Set Programming Approach

Momina Rizwan, Volkan Patoglu, Esra Erdem

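AI summary: This paper studies planning for human-robot collaborative assembly and proposes an approach based on answer set programming that combines commonsense reasoning with a rich set of communication actions to handle the uncertainty of human behavior. By extending hybrid conditional planning, the approach couples high-level planning of the assembly action sequence with geometric feasibility checks. Its effectiveness is demonstrated in a real-world scenario in which a bimanual robot collaborates with a human to assemble furniture.

Comments: 36th International Conference on Logic Programming (ICLP 2020), University of Calabria, Rende (CS), Italy, September 2020, 15 pages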

1903.00745 2026-05-14 cs.AI cs.LO cs.RO

A Formal Framework for Robot Construction Problems: A Hybrid Planning Approach

Faseeh Ahmad, Esra Erdem, Volkan Patoglu

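AI summary: This paper studies robot construction problems, in which multiple autonomous robots cooperate to stack prefabricated modules into a stable structure. The problem is challenging because of ramifications of actions, true concurrency, and the requirements that the structure be stable and every module be supported. The authors propose a hybrid planning framework based on answer set programming that simultaneously determines a stable final configuration and plans the multi-robot action sequence, ensuring that the partial structure is stable and supported at every step. The approach comes with soundness and completeness guarantees, and its effectiveness and usefulness are demonstrated on several challenging construction instances.

Comments: 8 pages (double-column), 7 figures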

1811.12784 2026-05-14 cs.CV

The GAN that Warped: Semantic Attribute Editing with Unpaired Data

Gara Dorta, Sara Vicente, Neill D. F. Campbell, Ivor J. A. Simpson

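AI summary: This work proposes a semantic image editing method based on smooth warp fields that achieves high-quality edits without paired data. Building on recent advances in generative adversarial networks (GANs), the method is trained on unpaired data, preserves the identity of the subject, and edits high-resolution images (such as 4K) efficiently. Experiments on face and bird image data sets show strong editing quality and robustness.

Comments: CVPR 2020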

1804.05261 2026-05-14 cs.CV cs.GR

Physics-driven Fire Modeling from Multi-view Images

Gara Dorta, Luca Benedetti, Dmitry Kit, Yong-Liang Yang

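AI summary: This work proposes a new method for reconstructing physically plausible fire models from multi-view images, avoiding the reliance of traditional fire modeling on complex physical simulation or simplifying assumptions. For the first time, it uses RGB cameras to obtain plausible estimates of volumetric physical properties of fire, such as temperature and density, which enables new effects such as global fire illumination. The method is validated on a variety of input data and applied to generate realistic lighting in virtual scenes.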

1307.7494 2026-05-14 cs.AI cs.LO cs.RO

ReAct! An Interactive Tool for Hybrid Planning in Robotics

Zeynep Dogmus, Esra Erdem, Volkan Patoglu

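AI summary: This paper introduces ReAct!, an interactive tool for hybrid planning in robotics. It lets researchers describe a robot's behavior in a dynamic environment and solve planning problems without knowing the syntax and semantics of the underlying formalism. ReAct! supports modeling of complex dynamic domains, including concurrency, indirect effects of actions, and state/transition constraints, and it can embed external computations (such as checks for collision-free trajectories) into the representation of hybrid domains. This tightly integrates discrete high-level reasoning with continuous geometric reasoning and suits a range of complex settings, from service robotics to cognitive factories.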

2605.13340 2026-05-14 cs.LG

Shortcut Mitigation via Spurious-Positive Samples

Phuong Quynh Le, Jörg Schlötterer, Sari Sadiya, Gemma Roig, Christin Seifert

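AI summary: This paper studies how to mitigate a model's reliance on spurious attributes. The authors propose a method that needs no extra annotations or balanced data: it analyzes the model's predictions to identify samples on which the model relies on spurious features, then uses them to locate neurons in intermediate layers associated with those features and regularizes them. The method improves robustness, making the model rely on genuinely discriminative features rather than on predictions that happen to be correct.

Comments: preprint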

2605.13335 2026-05-14 cs.AI cs.CV

Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning

Qinchuan Cheng, Zhantao Gong, Pengzhan Sun, Angela Yao, Xulei Yang, Shijie Li

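AI summary: This paper proposes Ego2World, a benchmark that compiles egocentric cooking videos into executable symbolic worlds for evaluating how embodied agents plan under partial observability. It extracts reusable state-transition rules from video annotations and executes them on a hidden symbolic world graph, forcing the agent to plan and update its memory from local observations and execution feedback alone. Experiments show that conventional action-overlap metrics can overestimate task success, and that maintaining a persistent belief memory improves task completion efficiency and reduces repeated visual exploration.

Comments: Project page: https://sj-li.com/PROJ/Ego2World/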

2605.13334 2026-05-14 cs.CL

LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs

Rodrigo Nogueira, Thales Sales Almeida, Giovana Kerche Bonás, Andrea Roque, Ramon Pires, Hugo Abonizio, Thiago Laitz, Celio Larcher, Roseval Malaquias Junior, Marcos Piau

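AI summary: This study examines the guardrails of frontier large language models (LLMs) on sensitive topics. It finds that, although these models refuse direct requests for controversial content, another LLM can lead them to produce it in conversations that simulate user persuasion. Using natural-language persuasion strategies such as peer comparison and reframing of epistemic responsibility, the attacker LLM prompts the target LLM to override its safety restrictions without explicit instructions. Experiments show that different model pairings produce controversial articles on multiple topics of scientific consensus, exposing a potential weakness of current LLM safety mechanisms in interactive settings.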

2605.13333 2026-05-14 cs.CV cs.AI cs.GR cs.LG

Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation

Junhyuk Jeon, Seokhyeon Hong, Junyong Noh

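AI summary: This work addresses the weakness of text-driven motion diffusion models at generating finely stylized motion and proposes a lightweight style-conditioned generation framework. A hypernetwork generates low-rank adaptation parameters that dynamically modulate a pretrained diffusion model, giving fine-grained control over style during denoising. The method structures the style latent space with a supervised contrastive loss, which improves generalization to unseen styles, and it achieves leading stylized generation results on multiple data sets.

Comments: Accepted to SIGGRAPH 2026. Project page: https://junhyukjeon.github.io/projects/style-salad/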
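
The idea of a hypernetwork producing low-rank adaptation parameters can be sketched as follows. This is an illustration under stated assumptions, not the paper's architecture: the class `HyperLoRA`, the purely linear hypernetwork, the random (untrained) weights, and all dimensions are hypothetical. It only shows how a style embedding could be mapped to factors A and B so that a frozen weight W acts as W + BA.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def reshape(flat: Vector, rows: int, cols: int) -> Matrix:
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


class HyperLoRA:
    """Hypothetical linear hypernetwork: style embedding -> low-rank
    factors A (rank x d_in) and B (d_out x rank)."""

    def __init__(self, d_in: int, d_out: int, rank: int, d_style: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # One linear map per factor; in practice these would be learned.
        self.h_a = [[rng.gauss(0, 0.1) for _ in range(d_style)]
                    for _ in range(rank * d_in)]
        self.h_b = [[rng.gauss(0, 0.1) for _ in range(d_style)]
                    for _ in range(d_out * rank)]

    def factors(self, style: Vector) -> Tuple[Matrix, Matrix]:
        a = reshape(matvec(self.h_a, style), self.rank, self.d_in)
        b = reshape(matvec(self.h_b, style), self.d_out, self.rank)
        return a, b


def adapted_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector) -> Vector:
    """Compute (W + B A) x without forming the full update matrix."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, delta)]


d_in, d_out, rank, d_style = 4, 3, 2, 5
frozen_w = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
hyper = HyperLoRA(d_in, d_out, rank, d_style)
x = [1.0, 2.0, 3.0, 4.0]

neutral = adapted_forward(frozen_w, *hyper.factors([0.0] * d_style), x)
styled = adapted_forward(frozen_w, *hyper.factors([1.0, 0.0, 0.5, 0.0, -1.0]), x)
print(neutral, styled)
```

With a zero style embedding the factors vanish and the layer reduces to the frozen weight, while a nonzero embedding shifts the output through a rank-2 update. The frozen weight itself is never modified.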

2605.13332 2026-05-14 cs.AI cs.CC

Diversity of Extensions in Abstract Argumentation

Johannes K. Fichte, Markus Hecher, Yasir Mahmood, Zhengjun Wang

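AI summary: This paper studies the diversity of extensions in abstract argumentation frameworks and proposes a quantitative diversity measure, based on symmetric difference, for how much extensions differ from one another. The authors systematically analyze the computational complexity of the associated reasoning problems, asking whether a framework admits extensions with a given diversity and how to compute the maximum achievable diversity. They also provide a prototype system for computing diversity levels and an experimental evaluation.

Comments: Technical Report to the paper accepted at IJCAI 2026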
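
A symmetric-difference measure of this kind can be sketched in a few lines. The summary does not say how pairwise distances are aggregated over a set of extensions, so both aggregates below (sum and minimum) are assumptions, and the argument names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List

Extension = FrozenSet[str]


def distance(e1: Extension, e2: Extension) -> int:
    """Number of arguments that belong to exactly one of the two extensions."""
    return len(e1 ^ e2)


def sum_diversity(extensions: List[Extension]) -> int:
    """Assumed aggregate: total pairwise symmetric-difference distance."""
    return sum(distance(a, b) for a, b in combinations(extensions, 2))


def min_diversity(extensions: List[Extension]) -> int:
    """Assumed aggregate: smallest pairwise distance (needs >= 2 extensions)."""
    return min(distance(a, b) for a, b in combinations(extensions, 2))


# Hypothetical extensions of some argumentation framework.
exts = [frozenset({"a", "c"}), frozenset({"a", "d"}), frozenset({"b"})]
print(sum_diversity(exts), min_diversity(exts))
```

Here the first two extensions differ in two arguments and each differs from the third in three, so the two aggregates give 8 and 2 respectively.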

2605.13330 2026-05-14 cs.CL

FIND: Toward Multimodal Financial Reasoning and Question Answering for Indic Languages

Sarmistha Das, Vaibhav Vishal, Syed Ibrahim Ahmad, Manish Gupta, Sriparna Saha

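AI summary: This work targets numerical reasoning and question answering in multilingual financial settings. It introduces FinVQA, a new benchmark data set for Indic languages that covers six languages, including English, Hindi, and Bengali, with 18,900 samples across 14 financial domains. To handle the challenges of multimodal and multilingual input, it also proposes the FIND framework, which combines supervised fine-tuning with constraint-aware decoding to improve numerical reasoning, multimodal understanding, and structured decision-making, offering a new paradigm for evaluating and modeling high-stakes multilingual financial reasoning.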

2605.13329 2026-05-14 cs.CL cs.AI

Tracing Persona Vectors Through LLM Pretraining

Viktor Moskvoretskii, Dominik Glandorf, Jorge Medina Moreira, Tanja Käser, Robert West

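AI summary: This paper studies how large language models form "persona vectors" that represent high-level behaviors during pretraining, and traces how these vectors evolve over the pretraining of OLMo-3-7B. It finds that persona vectors emerge early in pretraining and continue to be refined afterward. Experiments also show that different extraction methods reveal different aspects of the model's behavior, and that the same pattern holds in other models such as Apertus-8B. Persona vectors therefore appear to be stable features formed early in pretraining, which opens a new direction for interpreting model behavior.

Comments: Preprint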

2605.13328 2026-05-14 cs.RO cs.AI cs.CL cs.CV

What Limits Vision-and-Language Navigation?

Yunheng Wang, Yuetong Fang, Taowen Wang, Lusong Li, Kun Liu, Junzhe Xu, Zizhao Yuan, Yixiao Feng, Jiaxi Zhang, Wei Lu, Zecui Zeng, Renjing Xu

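AI summary: Vision-and-language navigation (VLN) is an important research direction in embodied intelligence, but when moving from simulation to the real world, existing methods often degrade because of unstable perception and ambiguous instructions. This paper proposes StereoNav, a robust framework that fuses vision, language, and action and introduces target-location priors and stereo vision to make cross-domain navigation more stable and accurate. Experiments show that StereoNav achieves state-of-the-art performance on multiple benchmarks and markedly improves navigation reliability in complex environments when deployed on a real robot.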

2605.13321 2026-05-14 cs.RO

HCSG: Human-Centric Semantic-Geometric Reasoning for Vision-Language Navigation

Haoxuan Xu, Tianfu Li, Wenbo Chen, Yi Liu, Jin Wu, Huashuo Lei, Yunfan Lou, Lujia Wang, Hesheng Wang, Haoang Li

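AI summary: Vision-language navigation (VLN) has advanced considerably through scaling of data and models, but in real indoor scenes robots must cope with moving pedestrians, and existing methods mostly treat them as passive obstacles without actively understanding human intent or social norms. This paper proposes HCSG, the first human-centric VLN framework, which combines geometric prediction with a semantic interpretation module to actively understand human behavior and control social distance, markedly improving the safety and social compliance of navigation. Experiments show that HCSG substantially outperforms existing methods on the HA-VLNCE benchmark, raising success rate by 14% and lowering collision rate by 34%.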

2605.13316 2026-05-14 cs.CV

Test-time Sparsity for Extreme Fast Action Diffusion

Kangye Ji, Yuan Meng, Jianbo Zhou, Ye Li, Chen Tang, Zhi Wang

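AI summary: This work addresses the high computational cost of action diffusion models when generating high-quality action sequences. It proposes test-time sparsity, which accelerates action generation by dynamically predicting which residual computations can be pruned in each model forward pass. To remove the efficiency bottlenecks caused by repeated encoding and pruning, it designs a highly parallel inference pipeline and introduces an omnidirectional reuse strategy, increasing both pruning sparsity and generation efficiency. Experiments show that the method reduces computation by 92% and speeds up generation by 5x with no loss in performance.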

Abstract

Action diffusion excels at high-fidelity action generation but incurs heavy computational costs owing to its iterative denoising nature. Despite current technologies showing promise in accelerating diffusion transformers by reusing the cached features, they struggle to adapt to policy dynamics arising from diverse perceptions and multi-round rollout iterations in open environments. We propose test-time sparsity to tackle this challenge, which aims to accelerate action diffusion by dynamically predicting prunable residual computations for each model forward at test time. However, two bottlenecks remain in this paradigm: 1) repetitive conditional encoding and pruning offset most potential speed gains, and 2) the features cached from previous denoising timesteps cannot constrain large pruning errors under aggressive sparsity. To address the first bottleneck, we design a highly parallelized inference pipeline that minimizes the non-decoder delay to milliseconds. Specifically, we first design a lightweight pruner that shares the encoder with the diffusion transformer. Then, we decouple the encoding and pruning from the autoregressive denoising loop by processing all denoising timesteps in parallel, and overlap the pruner with the decoder forward inference through asynchronism. To overcome the second bottleneck, we introduce an omnidirectional reusing strategy, which achieves 95% sparsity by selectively reusing features cached from the current forward, previous denoising timesteps, and earlier rollout iterations. To learn the rollout-level reusing strategies, we sample a few action trajectories to supervise the sparsified diffusion step by step. Extensive experiments demonstrate that our method reduces FLOPs by 92% and accelerates action generation by 5x, achieving lossless performance with an inference frequency of 47.5 Hz. Our code is available at https://github.com/ky-ji/Test-time-Sparsity.

2605.13312 2026-05-14 cs.LG

Supervised Deep Multimodal Matrix Factorization for Interpretable Brain Network Analysis

Amjad Seyedi, Lifang He, Songlin Zhao, Akwum Onwunta, Nicolas Gillis

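AI summary: This paper proposes SD3MF, an interpretable supervised deep multimodal matrix factorization framework for integrated analysis of multimodal brain network data. It extends conventional unsupervised single-graph clustering to supervised prediction on multimodal graphs, learning features for each modality through deep hierarchical factorization and building a shared latent representation that aligns subjects across views. Experiments show that SD3MF outperforms deep learning methods such as CNNs and GNNs on multimodal brain connectivity data sets while providing biologically meaningful, interpretable features.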

2605.13311 2026-05-14 cs.AI cs.IR cs.MA

IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation

Joy Bose

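AI summary: IdeaForge is a knowledge graph-grounded multi-agent framework for innovation analysis and patent claim generation across innovation methodologies such as TRIZ and Design Thinking. Multiple specialist agents collaborate over a persistent knowledge graph, integrating the structured reasoning produced by the different methodologies and using the graph structure to link claims on which methodologies converge, which identifies high-confidence innovation candidates. The work proposes a graph-based convergence mechanism and a patent generation pipeline, and experiments show that the approach yields more diverse and traceable innovation candidates than single-methodology baselines.

Comments: 14 pages, 3 figures, 6 tables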

Abstract

Current AI-assisted innovation systems typically apply a single ideation methodology (such as TRIZ or Design Thinking) using sequential prompt-based workflows that do not preserve intermediate reasoning structure. As a result, insights generated across methodologies remain fragmented, limiting traceability, synthesis, and systematic evaluation of novelty. We present IdeaForge, a knowledge graph-grounded multi-agent framework for innovation analysis and patent claim generation. IdeaForge integrates multiple innovation methodologies (TRIZ, Design Thinking, and SCAMPER) through specialist agents operating over a persistent FalkorDB knowledge graph. Each agent contributes structured entities and relationships representing contradictions, inventive principles, user needs, transformations, analogies, and candidate claims. The central contribution of IdeaForge is a cross-methodology convergence mechanism implemented through graph-based claim linkage. Claims independently supported by multiple methodologies are connected using CONVERGENT relationships, enabling identification of high-confidence innovation candidates through graph traversal. A downstream patent drafting agent generates structured patent drafts grounded in convergent claim subgraphs, reducing reliance on unconstrained language model generation. An InnovationScore formula ranks claims by convergent support, methodology diversity, claim strength, and prior art challenge count. We describe the graph schema, agent architecture, convergence detection pipeline, and patent synthesis workflow. Experiments on a legal technology use case demonstrate that graph-grounded multi-methodology synthesis produces more diverse and traceable innovation candidates compared to single-methodology baselines. We discuss implications for computational creativity, explainable AI-assisted invention, and graph-native innovation systems.
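
The convergence mechanism described in the abstract can be illustrated with a small in-memory sketch. It is not the IdeaForge implementation: the abstract does not say how claims are matched across methodologies, so exact equality of a normalized claim text stands in for that step, the FalkorDB graph is replaced by plain Python lists, and every claim record is hypothetical. The InnovationScore formula is not given in the abstract and is deliberately left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (claim_id, normalized claim text, methodology).
claims = [
    ("c1", "automate clause comparison", "TRIZ"),
    ("c2", "automate clause comparison", "Design Thinking"),
    ("c3", "summarize contracts in plain language", "SCAMPER"),
    ("c4", "automate clause comparison", "SCAMPER"),
    ("c5", "summarize contracts in plain language", "SCAMPER"),
]


def convergent_edges(records):
    """Link claims that express the same idea but come from
    different methodologies."""
    by_text = defaultdict(list)
    for claim_id, text, method in records:
        by_text[text].append((claim_id, method))
    edges = []
    for group in by_text.values():
        for (id_a, m_a), (id_b, m_b) in combinations(group, 2):
            if m_a != m_b:
                edges.append((id_a, "CONVERGENT", id_b))
    return edges


def methodology_support(records):
    """Count the distinct methodologies backing each claim text."""
    support = defaultdict(set)
    for _, text, method in records:
        support[text].add(method)
    return {text: len(methods) for text, methods in support.items()}


edges = convergent_edges(claims)
support = methodology_support(claims)
print(edges)
print(support)
```

Claims backed by two or more methodologies end up connected by CONVERGENT edges, while a claim repeated within a single methodology gains no edge. That is the sense in which independent support, rather than repetition, marks a candidate as high-confidence.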

2605.13307 2026-05-14 cs.CL cs.HC

PRISM-X: Experiments on Personalised Fine-Tuning with Human and Simulated Users

Hannah Rose Kirk, Liu Leqi, Fanzhi Zeng, Henry Davidson, Bertie Vidgen, Christopher Summerfield, Scott A. Hale

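AI summary: This study examines the effectiveness of personalized fine-tuning in conversational systems through large-scale within-subject experiments that compare how real and simulated users rate personalized and non-personalized language models. It finds that fine-tuning on user preferences outperforms generic models and personalized prompting in the short term, but may amplify sycophantic and relationship-seeking behavior in the long term. The experiments also show that simulated users differ significantly from real users in judgment consistency, topic diversity, and feedback dynamics, so they cannot fully replace humans in evaluation.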

2605.13306 2026-05-14 cs.CV

Color Constancy in Hyperspectral Imaging via Reduced Spectral Spaces

G. Dofri Vidarsson, Liying Lu, Sabine Süsstrunk

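AI summary: This paper studies how reducing spectral dimensionality can improve color constancy estimation in hyperspectral imaging. Using the correlation-based color estimation (CbC) framework, the authors analyze how different spectral reduction strategies affect illuminant estimation and identify the conditions under which compact spectral representations outperform conventional RGB methods. The study offers practical guidance for using hyperspectral information efficiently in illuminant estimation.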

2605.13305 2026-05-14 cs.LG math.DS physics.chem-ph

MPINeuralODE: Multiple-Initial-Condition Physics-Informed Neural ODEs for Globally Consistent Dynamical System Learning

Lake Yang, Antonio Malpica-Morales, Frank Ioannis Papadakis Wood, Serafim Kalliadasis

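AI summary: This paper proposes MPINeuralODE, a new method that addresses the poor generalization of neural ordinary differential equations (Neural ODEs) to unseen initial conditions and long-horizon prediction. It combines a soft physics-informed residual with a multiple-initial-condition (MIC) multi-stage training strategy, two structurally complementary components that improve globally consistent learning of the dynamical system's vector field. Experiments show that MPINeuralODE outperforms existing methods on several metrics, particularly long-term stability and control of Hamiltonian drift.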
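
Hamiltonian drift, mentioned in the summary as an evaluation criterion, can be measured on any rollout as the largest deviation of the energy from its initial value. The sketch below is only an illustration of that metric: it uses two classical integrators on a harmonic oscillator in place of a learned Neural ODE, and the system, step size, and function names are all assumptions.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (position q, momentum p)


def hamiltonian(state: State) -> float:
    """Energy of a unit-mass, unit-stiffness harmonic oscillator."""
    q, p = state
    return 0.5 * (q * q + p * p)


def explicit_euler(state: State, dt: float) -> State:
    """Both updates use the old state; energy grows at every step."""
    q, p = state
    return q + dt * p, p - dt * q


def symplectic_euler(state: State, dt: float) -> State:
    """Momentum is updated first and then used for the position update."""
    q, p = state
    p_new = p - dt * q
    return q + dt * p_new, p_new


def rollout(step: Callable[[State, float], State], state: State,
            dt: float, n_steps: int) -> List[State]:
    """Apply a one-step map repeatedly and record every state."""
    traj = [state]
    for _ in range(n_steps):
        state = step(state, dt)
        traj.append(state)
    return traj


def hamiltonian_drift(traj: List[State]) -> float:
    """Largest absolute deviation of the energy from its initial value."""
    h0 = hamiltonian(traj[0])
    return max(abs(hamiltonian(s) - h0) for s in traj)


start = (1.0, 0.0)
drift_euler = hamiltonian_drift(rollout(explicit_euler, start, 0.01, 1000))
drift_symplectic = hamiltonian_drift(rollout(symplectic_euler, start, 0.01, 1000))
print(drift_euler, drift_symplectic)
```

The explicit scheme accumulates energy steadily, while the symplectic scheme keeps the deviation bounded. The same `hamiltonian_drift` function could be applied to trajectories produced by a learned vector field when the true Hamiltonian is known.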

2605.13301 2026-05-14 cs.AI cs.CL

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Yafu Li, Runzhe Zhan, Haoran Zhang, Shunkai Zhang, Yizhuo Li, Zhilin Wang, Jiacheng Chen, Futing Wang, Xuyang Hu, Yuchen Fan, Bangjie Xu, Yucheng Su, Xinmiao Han, Chenxi Li, Haodi Lei, Yufeng Zhao, Zejin Lin, Qianjia Cheng, Tong Zhu, Xiaoye Qu, Ganqu Cui, Peng Ye, Yun Luo, Zhouchen Lin, Yu Qiao, Bowen Zhou, Ning Ding, Yu Cheng

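AI summary: This paper proposes a simple, unified recipe that turns a pretrained reasoning model into a problem solver at the gold-medal level of the international mathematics and physics olympiads. It applies supervised fine-tuning with an inverse-perplexity curriculum to instill rigorous proof search and self-checking, improves the model step by step with a two-stage reinforcement learning pipeline, and finally raises performance further with test-time scaling. Experiments show that SU-01, the model trained with this recipe, performs strongly in mathematics and physics competitions and also generalizes well across domains in scientific reasoning.

Comments: Technical Report. 77 pages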

2605.13297 2026-05-14 cs.LG

PaMM: Periodic Motif Memory for Atomistic Models with an Explicit Local-Structure Interface

Ryan Dong

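AI summary: This paper proposes PaMM, a periodic motif memory module that strengthens the ability of atomistic models to explicitly represent local coordination motifs that repeat in crystal structures. PaMM introduces lookup tables of pair and triplet motifs keyed on atom types and geometric features, explicitly encodes local structural information, and fuses it with the original edge features. Experiments show that, under a fixed training budget, PaMM improves energy and force prediction, and that the gain comes from the structured pair/triplet organization rather than from a mere increase in capacity.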

2605.13296 2026-05-14 cs.AI cs.LG cs.MA

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

Yuanzhe Wang, Tian Zhi, Zihang Wei, Hongguang Wang, Jiaming Guo, Yang Zhao, Zisheng Liu, Shiyu Quan, Xing Hu, Zidong Du, Yunji Chen

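AI summary: This paper studies multi-agent path finding (MAPF) in complex, congested environments and proposes DiffLNS, a hybrid framework based on a discrete diffusion model that generates high-quality initial path drafts to improve the performance of repair-based solvers. It combines a discrete denoising diffusion probabilistic model (D3PM) with sparse social attention and the LNS2 algorithm, generating diverse joint path drafts directly in the discrete action space, which raises both success rate and efficiency on large-scale MAPF problems. Experiments show that DiffLNS performs well across a range of complex scenarios, reaching an average success rate of 95.8% and clearly outperforming existing methods.

Comments: 24 pages, 7 figures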
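
A joint path draft is only useful to a repair-based solver once its collisions are known. The sketch below shows the standard MAPF conflict check (vertex and edge conflicts) on discrete grid paths. It is a generic illustration rather than code from DiffLNS or LNS2, and it assumes that agents wait at their goal once their path ends. The grid coordinates are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Path = List[Cell]
Conflict = Tuple[str, int, int, int]  # (kind, agent_i, agent_j, timestep)


def position_at(path: Path, t: int) -> Cell:
    """Agents are assumed to wait at their last cell after the path ends."""
    return path[min(t, len(path) - 1)]


def find_conflicts(paths: List[Path]) -> List[Conflict]:
    """Report vertex conflicts (same cell at the same time) and
    edge conflicts (two agents swapping cells in one step)."""
    conflicts: List[Conflict] = []
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now = position_at(paths[i], t)
                b_now = position_at(paths[j], t)
                if a_now == b_now:
                    conflicts.append(("vertex", i, j, t))
                    continue
                a_next = position_at(paths[i], t + 1)
                b_next = position_at(paths[j], t + 1)
                if a_next == b_now and b_next == a_now:
                    conflicts.append(("edge", i, j, t))
    return conflicts


draft = [
    [(0, 0), (0, 1), (0, 2)],  # agent 0 moves right along row 0
    [(0, 2), (0, 1), (0, 0)],  # agent 1 moves left and meets agent 0
    [(1, 0), (1, 1)],          # agent 2 steps right on row 1
    [(1, 1), (1, 0)],          # agent 3 steps left, swapping with agent 2
]
conflicts = find_conflicts(draft)
print(conflicts)
```

Agents 2 and 3 swap cells at the first step and agents 0 and 1 occupy the same cell one step later. A repair procedure would replan only the agents involved in such conflicts.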

2605.13295 2026-05-14 cs.CL cs.AI cs.MA

CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution

Tom Zehle

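AI summary: This paper proposes CANTANTE, a framework for optimizing multi-agent systems built on large language models. It contrasts the outcomes of different joint configurations run on the same query and decomposes the system-level reward into an update signal for each agent, which addresses the credit assignment problem. Experiments show that CANTANTE outperforms existing optimization methods on programming, mathematical reasoning, and multi-hop question answering while keeping performance high at lower inference cost.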
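
The general idea of contrasting joint configurations on the same query can be sketched as a difference of mean rewards. The summary does not specify CANTANTE's actual attribution rule, so the rule below is an assumption, and the agent names, configuration labels, and rewards are hypothetical.

```python
from statistics import mean

# Hypothetical rollouts on ONE query: (joint configuration, system reward).
rollouts = [
    ({"planner": "p1", "coder": "c1"}, 1.0),
    ({"planner": "p1", "coder": "c2"}, 0.0),
    ({"planner": "p2", "coder": "c1"}, 1.0),
    ({"planner": "p2", "coder": "c2"}, 0.0),
]


def contrastive_credit(runs):
    """For every (agent, choice), compare the mean system reward of runs
    that used the choice with the mean of runs that did not."""
    agents = list(runs[0][0].keys())
    credit = {}
    for agent in agents:
        choices = sorted({cfg[agent] for cfg, _ in runs})
        for choice in choices:
            with_choice = [r for cfg, r in runs if cfg[agent] == choice]
            without = [r for cfg, r in runs if cfg[agent] != choice]
            if with_choice and without:
                credit[(agent, choice)] = mean(with_choice) - mean(without)
    return credit


credit = contrastive_credit(rollouts)
print(credit)
```

In this toy data the reward depends only on the coder's configuration, so both planner choices receive zero credit while the coder choices receive +1 and -1. A single system-level reward is thereby turned into a separate signal for each agent.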

2605.13293 2026-05-14 cs.CV

Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

Shiyu Tan, Zixuan Zhao, Hao Gao, Zhiheng Chen, Xiaolong Yin, Enya Shen

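AI summary: This paper proposes Img2CADSeq, a multi-stage image-to-CAD generation method that produces high-quality boundary representation (BRep) CAD models from a single-view image. Its core idea is to encode CAD operation sequences into a three-level hierarchical codebook with an importance-first strategy that preserves contour information first, compressing long sequences into a stable discrete latent space. To bridge the modality gap between images and CAD, it introduces a point-cloud intermediate representation learned with contrastive learning and uses a VQ-Diffusion model for conditional generation. Evaluated on the newly built CAD-220K and PrintCAD data sets, the method clearly outperforms existing approaches, and the generated STEP files can be used directly in commercial CAD software.

Comments: Accepted by SIGGRAPH 2026 Conference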