arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.12576 2026-06-12 cs.CL New submission

Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures

Ishani Mondal, Javad Baghirov, Jordan Boyd-Graber

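AI summary: Proposes MINARD, a pipeline that generates narrated videos from a figure and its paper via region decomposition, and releases the FigTalk benchmark; it outperforms existing approaches in automatic and human evaluation.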

Comments
Webpage: this https URL

Abstract

Scientific figures compress complex pipelines into a single canvas, yet understanding them requires paper-grounded, step-by-step narration aligned with visual highlights, a capability missing from current video generation systems and benchmarks. To address this, we introduce paper-grounded figure-to-video generation: generating narrated, region-grounded walkthrough videos from a figure and its paper. We propose MINARD (Multimodal Interpretation of Narrated Architecture via Region Decomposition), a pipeline that generates paper-grounded narrations and sequentially grounds them to figure regions. We also release FigTalk, a benchmark with new sequential and component-level grounding metrics. On FigTalk, MINARD generates humanlike, paper-faithful narrations and outperforms existing approaches on narration-conditioned spatial grounding of figures in both automatic and human evaluation.

2606.12505 2026-06-12 cs.LG cs.AI New submission

Boosting Direct Preference Optimization with Penalization

Pengwei Sun

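AI summary: Proposes DPOP, which adds to the DPO loss a gated penalty on the reference model's greedy responses, activated only when the current policy assigns the preferred response a lower probability than the rejected one; it significantly improves win rate on AlpacaEval 2.0.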

Comments
Accepted at ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

Abstract

Offline preference optimization has become a practical substitute for reinforcement learning from human feedback, but pairwise objectives such as Direct Preference Optimization (DPO) and its variants use only the chosen and rejected responses stored in a static dataset. This leaves a useful signal unused: the response that the reference model itself would generate for the same prompt. We propose Direct Preference Optimization with Penalization (DPOP), a simple extension of DPO that augments the base preference loss with a gated penalty on reference-greedy responses. DPOP activates this penalty only when the current policy still assigns a lower likelihood to the preferred response than to the rejected response. On AlpacaEval 2.0, DPOP improves length-controlled win rate over DPO, SimPO, and AlphaDPO on both Llama-3-8b-it and Gemma-2-9b-it, achieving relative gains of 5.3% and 4.4% over baselines on the two models, respectively. Ablations further show that a SimNPO-style length-normalized penalty is stronger than NPO and token-level unlikelihood in this setting.
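
The gating rule described in this abstract can be illustrated with a minimal Python sketch. The DPO term follows the standard formulation; the penalty term, its length normalization, and the names `lam`, `gamma`, `logp_greedy`, and `len_greedy` are assumptions made for illustration, since the abstract does not give the exact DPOP loss.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pi_w: float, pi_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-likelihoods of the
    chosen (w) and rejected (l) responses under the policy and the reference."""
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -log_sigmoid(margin)


def dpop_loss_sketch(pi_w: float, pi_l: float, ref_w: float, ref_l: float,
                     logp_greedy: float, len_greedy: int,
                     beta: float = 0.1, lam: float = 1.0,
                     gamma: float = 0.0) -> float:
    """Hypothetical gated-penalty loss: DPO plus a penalty on the policy's
    log-likelihood of the reference model's greedy response."""
    base = dpo_loss(pi_w, pi_l, ref_w, ref_l, beta)
    # Gate: active only while the policy still ranks the rejected response
    # above the preferred one (the condition stated in the abstract).
    gate = 1.0 if pi_w < pi_l else 0.0
    # Assumed SimNPO-style penalty: length-normalized log-likelihood of the
    # reference-greedy response, pushed down through a log-sigmoid.
    penalty = -log_sigmoid(-beta * logp_greedy / len_greedy - gamma)
    return base + lam * gate * penalty
```

With this sketch, a pair that the policy already ranks correctly contributes only the plain DPO loss, while a mis-ranked pair contributes a strictly larger loss because the penalty term is always positive.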

2606.12500 2026-06-12 cs.LG cs.AI New submission

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

Xian Liu, Carlo G. Prato, Gustav Markkula

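AI summary: Replaces conventional rule-based behaviour models with machine learning behaviour models in traffic microsimulation and predicts crash frequency by analysing simulated conflicts with Extreme Value Theory; validation at five signalised intersections in Leeds, UK, shows that the ML model improves prediction accuracy without site-specific calibration.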

Abstract

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.

2606.12498 2026-06-12 cs.CR cs.LG New submission

From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

Zhenqian Zhu, Yamin Hu, Yiya Diao, Weixiang Li, Haodong Li, Wenjian Luo

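AI summary: Proposes the Linear Feature Path Minimization (LFPM) framework, which uses Cross-Task Linearity to optimize an anti-backdoor task vector in feature space, suppressing backdoors in model merging while preserving clean-task performance.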

Abstract

Model merging (MM) has gained significant attention as a cost-effective approach to integrate multiple task-specific models into a unified model. However, recent work reveals that MM is highly susceptible to backdoor attacks. Existing defenses based on task arithmetic often fail to eliminate backdoors without substantially degrading clean-task performance, owing to their reliance on direct parameter-space editing. To address this gap, we propose Linear Feature Path Minimization (LFPM), a backdoor mitigation framework for model merging, which introduces an anti-backdoor task vector into the backdoored merged model. Unlike prior approaches, LFPM formulates the backdoor robustness of the merged model from a unified feature-space perspective under the Cross-Task Linearity (CTL) framework, which leverages the approximate linearity of features across tasks. This perspective guides the optimization of the anti-backdoor task to suppress backdoors while preserving clean-task performance. Furthermore, we introduce an effective optimization mechanism based on gradient accumulation and loss path-integral, ensuring robust backdoor suppression along the interpolation path. Extensive experiments demonstrate that LFPM consistently exhibits strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.
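
For readers unfamiliar with task arithmetic, the parameter-space operations this abstract builds on can be sketched as follows. This is a toy illustration on flat lists of floats, not the LFPM method: how the anti-backdoor vector is optimized in feature space is not described in the abstract, so `anti_backdoor` below is a hypothetical placeholder, and the function names and the `scale` and `alpha` parameters are assumptions.

```python
from typing import List

Weights = List[float]


def task_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Task vector: fine-tuned weights minus pretrained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def merge(pretrained: Weights, vectors: List[Weights],
          scale: float = 1.0) -> Weights:
    """Task-arithmetic merge: pretrained weights plus scaled task vectors."""
    merged = list(pretrained)
    for vec in vectors:
        merged = [m + scale * v for m, v in zip(merged, vec)]
    return merged


def apply_vector(weights: Weights, vector: Weights,
                 alpha: float = 1.0) -> Weights:
    """Add one more (for example, anti-backdoor) vector to a merged model."""
    return [w + alpha * v for w, v in zip(weights, vector)]


pretrained = [0.0, 1.0]
tv_a = task_vector([1.0, 1.0], pretrained)   # task A moved weight 0
tv_b = task_vector([0.0, 3.0], pretrained)   # task B moved weight 1
merged = merge(pretrained, [tv_a, tv_b], scale=0.5)

# Hypothetical anti-backdoor vector; in the paper it would come from an
# optimization procedure that this sketch does not reproduce.
anti_backdoor = [-0.5, 0.0]
repaired = apply_vector(merged, anti_backdoor)
```

The point of the sketch is only that the defense is expressed as one more vector added to the merged weights, which is why it composes with ordinary merging.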

2606.12483 2026-06-12 cs.LG New submission

Scalable anomaly detection via a univariate Christoffel function

Florian Grivet (CNES, LAAS-DISCO, Comue de Toulouse), Didier Henrion (LAAS-POP), Jean-Bernard Lasserre (TSE-R, LAAS-POP), Louise Travé-Massuyès (LAAS-DISCO, Comue de Toulouse)

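AI summary: Christoffel function methods are hard to apply to high-dimensional data because the matrix size grows exponentially with dimension; the paper proposes a univariate Christoffel function (UCF) based on the squared distance between the query point and the support points, which outperforms 14 baselines in Average Precision on the ADBench benchmark.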

Abstract

Anomaly detection plays a critical role in identifying unusual patterns across domains such as fraud detection, network intrusion, and system fault diagnosis. Recently, Christoffel function-based methods, rooted in polynomial optimization, have emerged as promising alternatives to deep learning due to their strong mathematical foundations and computational frugality. However, their practical applicability is hindered by the need to invert a matrix whose size grows exponentially with the data dimension, rendering the method intractable even for moderate-dimensional datasets. This paper addresses the dimensionality limitations of Christoffel function-based anomaly detection while preserving its key theoretical properties, i.e., the on-off support dichotomy behavior and the accurate support shape capture. We introduce UCF, a univariate Christoffel function which is based on the squared distance between the query point and the support points. Extensive experiments on the ADBench benchmark demonstrate that UCF consistently outperforms 14 state-of-the-art baselines in terms of Average Precision. By resolving the scalability bottleneck of the Christoffel Function, this work expands the toolkit of anomaly detection methods with a robust, theoretically grounded, and universally applicable approach.
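
The scalability bottleneck mentioned in this abstract can be made concrete with a short Python sketch. It assumes the usual construction in which the matrix to invert is indexed by all monomials of total degree at most `degree` in `n_dims` variables; the function names are hypothetical, and how UCF uses its univariate basis is not specified in the abstract.

```python
import math


def multivariate_basis_size(n_dims: int, degree: int) -> int:
    """Number of monomials of total degree <= degree in n_dims variables,
    i.e. the side length of the matrix in the multivariate setting."""
    return math.comb(n_dims + degree, degree)


def univariate_basis_size(degree: int) -> int:
    """A univariate polynomial basis of a given degree has degree + 1
    monomials, independent of the data dimension."""
    return degree + 1


if __name__ == "__main__":
    degree = 4
    for n_dims in (2, 10, 50, 100):
        print(n_dims,
              multivariate_basis_size(n_dims, degree),
              univariate_basis_size(degree))
```

Running the loop shows the multivariate matrix side growing rapidly with the number of features while the univariate side stays fixed, which is the motivation for working with a scalar quantity such as a squared distance.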

2606.12474 2026-06-12 cs.MA cs.AI cs.CR New submission

SAIGuard: Communication-State Simulation for Proactive Defense of LLM Multi-Agent Systems

Ruxue Shi, Yili Wang, Mengnan Du, Qinggang Zhang, Rui Miao, Yixin Liu, Xin Wang

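AI summary: Proposes SAIGuard, a proactive defense framework that detects and sanitizes risky messages through communication-state simulation, lowering attack success rates while preserving system utility.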

Abstract

LLM-based multi-agent systems (MAS) solve complex tasks through inter-agent collaboration, but their communication-driven nature also allows security risks to spread across agents and trigger system-wide failures. Existing MAS defenses mainly follow a reactive paradigm after execution by detecting and isolating harmful agents, which may cause irreversible damage and degrade collaborative utility. To address this, we propose a proactive defense framework for MAS security, namely a Simulation-aware Interception Guard (SAIGuard). SAIGuard performs communication-state simulation over the MAS interaction graph, estimates the impact of incoming messages on local agent states and the global MAS state, and detects risky messages via reconstruction deviations from benign communication patterns. Instead of isolating agents, SAIGuard sanitizes or regenerates suspicious messages before they propagate into the system. Experiments across diverse topologies and attack scenarios show that SAIGuard reduces attack success rates while maintaining MAS utility, outperforming reactive defenses.

2606.12441 2026-06-12 cs.CY cs.AI cs.HC New submission

Generativism: Toward a Learning Theory for the Age of Generative Artificial Intelligence

Shan Li, Juan Zheng

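AI summary: Critically examines the limitations of four major learning theories (behaviorism, cognitivism, constructivism, and connectivism) in the age of generative AI and proposes a new learning theory, Generativism, which emphasizes the co-construction of knowledge through human-AI collaboration.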

Abstract

The four dominant learning theories of behaviorism, cognitivism, constructivism, and connectivism show significant conceptual limitations as generative artificial intelligence (AI) proliferates in educational settings. These frameworks were formulated before the emergence of AI systems capable of generating, synthesizing, and reasoning about knowledge. This article critically examines each learning theory and identifies assumptions challenged by generative AI's affordances. Drawing on research in distributed cognition, extended mind, human-AI collaboration, AI literacy, cognitive offloading, and metacognition, the article proposes Generativism as a learning theory for the generative AI age. Generativism posits that learning increasingly occurs through the iterative co-construction of knowledge between human learners and AI systems. The proposed framework is organized around four principles: epistemic partnership, distributed agency, generative literacy, and adaptive metacognition. The framework offers a foundation for rethinking instructional design, learning, assessment, and expertise development in contexts where generative AI plays an integral role in cognition.

2606.12437 2026-06-12 cs.CY cs.AI New submission

Algorithmic Constitutionalism

Oren Perez, Nurit Wimer

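AI summary: In response to the risks of AI's growing encroachment on social life, proposes an "algorithmic constitutionalism" framework built on a layered architecture, algorithmic meta-reasoning, and correction through deliberation; applies it to Facebook's content moderation and analyzes its tension with societal constitutionalism and its implications for the European Digital Services Act.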

Abstract

The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

2606.12429 2026-06-12 cs.CY cs.AI New submission

Muse Spark Safety & Preparedness Report

Cristina Menghini, Peter Ney, Hamza Kwisaba, Zifan (Sail) Wang, Miles Turpin, Felix Binder, Jean-Christophe Testud, Aidan Boyd, Nathaniel Li, Ivan Evtimov, Klaudia Krawiecka, Arman Zharmagambetov, Jeremy Kritz, Alexander R. Fabbri, Daniel Song, Jinpeng Miao, Joonas Hjelt, Meghna Ramani, Leona Lan, Reza Aghajani, Joanna Bitton, Mahesh Pasupuleti, Devin Norder, Khalid El-Arini, Paridhi Singh, Vítor Albiero, Sahana CB, Rashnil Chaturvedi, Elahe Dabir, Edoardo Debenedetti, Jim Gust, Ziwen Han, Kat He, Sean Hendryx, Lifeng Jin, Polina Kirichenko, Sandra Lefdal, Kenneth Li, Asad Liaqat, Inna Lin, Despoina Magka, Neal Mangaokar, Ishita Mediratta, Zach Miller, Smitha Milli, Niloofar Mireshghallah, Saba Nazir, Hung Nguyen, Maximilian Nickel, Kelvin Niu, Kerem Oktar, Bhargavi Paranjape, Parth Pathak, Maya Pavlova, Emmanuel Ramirez, David Renardy, Candace Ross, Yasha Sheynin, Claudia Shi, Shivam Singhal, Evangelia Spiliopoulou, Rakshith Sharma Srinivasa, Jamelle Watson-Daniels, Spencer Whitman, Adina Williams, Chen Xing, Andy Zou, Tommy Ma, Siqi Deng, James Beldock, Prashant Ratanchandani, Kate Plawiak, Taesung Lee, Ryan Victory, Lindsay Hundley, Rachad Alao, Himaghna Bhattacharjee, Jianfeng Chi, Gary Frost, Pegah Ghahremani, Niki Howe, Yuheng Huang, Saeed Jahed, Hannah Korevaar, Trang Le, Zhe Liu, Jinghong Luo, Qin Lyu, Nina Mehrabi, Abraham Montilla, Chirag Nagpal, Cyrus Nikolaidis, Rajvardhan Oak, Manoj Ravi, Vidya Sarma, Aman Shankar, Alana Shine, Eric Michael Smith, Mariana Tandon

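AI summary: Meta releases the Muse Spark large language model and evaluates its safety in catastrophic risk domains (chemical and biological, cybersecurity, and loss of control); multi-layered mitigations reduce risk to an acceptable level, and the model is released as the underlying model of Meta AI.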

Comments
159 pages, 57 figures

Abstract

Muse Spark is the latest large language model developed by Meta. In this report, we first present evaluations for catastrophic risk domains under Meta's Advanced AI Scaling Framework, along with the evidence that informed our launch decision. We then discuss additional considerations, such as Muse Spark's broader content safety and behavioral profile, that are relevant to overall safety but fall outside the catastrophic risk domains governed by the Framework. Our preparedness results covering Chemical and Biological, Cybersecurity, and Loss of Control risks assess Muse Spark's deployment within Meta AI as presenting acceptable levels of residual risks under our Advanced AI Scaling Framework. We conducted a broad set of evaluations targeting dual-use and high-risk capabilities across these catastrophic risk domains. Those evaluations identified elevated risks prior to mitigations, with Chemical and Biological capabilities assessed as likely reaching the "high risk" category under the Advanced AI Scaling Framework before safeguards were applied. We have implemented a multi-layered set of mitigations that address the identified risks, and Muse Spark demonstrates state-of-the-art refusal across a range of benchmarks related to hazardous workflows in chemistry and biology. We therefore release Muse Spark as the underlying model of Meta AI.

2606.12424 2026-06-12 cs.CY cs.AI cs.HC New submission

AI-Automation Tooling in Computer Engineering Education: Mixed-Methods TAM/UTAUT Evidence for a General Acceptance Attitude

Aung Pyae

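AI summary: A mixed-methods study of undergraduates' acceptance of AI automation tooling (the n8n platform) finds that six TAM/UTAUT constructs collapse into a single general acceptance factor, with Performance Expectancy the strongest and Hedonic Motivation the weakest, providing a theoretical basis for curricular integration.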

Abstract

As generative AI and low-code workflow platforms become routine in software practice, a key educational question is whether the next generation of computer engineers will accept these tools as useful, usable, and worthy of sustained engagement. This paper reports a mixed-methods, cross-sectional study of undergraduate computer engineering students' acceptance of AI automation tooling, instantiated through the open-source platform n8n across three identically scripted workshops in Thailand (n = 103). A 12-item, five-point Likert instrument mapped to six TAM/UTAUT constructs - Performance Expectancy (PE), Effort Expectancy (EE), Behavioral Intention (BI), Self-Efficacy (SE), Hedonic Motivation (HM), and Output Quality (OQ) - was complemented by inductive thematic analysis of open-ended feedback. Analyses combined ordinal reliability estimation, bootstrap confidence intervals, non-parametric tests, multiple-comparison-controlled correlations, polychoric dimensionality diagnostics, a common-method-bias check, and between-session comparisons. Acceptance was favorable across all six constructs with large effect sizes, with PE emerging as the strongest construct and HM as the weakest. Dimensionality diagnostics further revealed that canonical TAM/UTAUT sub-facets collapsed into a single general acceptance factor in this short-form post-workshop context, a finding with important methodological and theoretical implications. Qualitative themes converged with the quantitative profile regarding usefulness and enthusiasm but diverged on output quality, revealing a small yet articulate reliability-skeptical minority. The findings support the curricular adoption of AI automation tooling in undergraduate computing education and identify three theory-grounded instructional levers: instruction-sequencing scaffolds, self-efficacy supports, and trust-calibration interventions.

2606.12423 2026-06-12 cs.CY cs.AI New submission

The Challenges of Balancing AI Compliance and Technological Innovations in Critical Sectors: A Systematic Literature Review

Ayush Enkhtaivan, Chinazunwa Uwaoma

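AI summary: Through a systematic literature review, identifies three challenges (fragmented regulations, excessive compliance burdens for SMEs, and misaligned governance models) and proposes strategies including risk-tiered regulation, compliance by design, and explainable AI.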

Comments
11 pages, 7 figures, Hawaii International Conference on System Sciences

Abstract

The rapid integration of artificial intelligence (AI) into critical infrastructure, including healthcare, finance, energy, and defense, offers transformative benefits but also conflicts with evolving regulatory and governance frameworks. This paper presents a systematic literature review (SLR) to examine the challenges of balancing AI compliance and technological innovation across critical infrastructure sectors. The review follows established SLR guidelines to extract and synthesize insights from peer-reviewed articles, reports, and institutional sources published between 2020 and 2025. The study identifies three interrelated challenges: fragmented regulations, excessive compliance burdens for small and medium-sized enterprises (SMEs), and misaligned governance models. To address these challenges, the study highlights practical governance strategies, including risk-tiered regulation, compliance by design, and explainable AI, to support scalable and trustworthy AI deployment in critical sectors. Key contributions include a concise mapping of core AI-governance challenges and a conceptual diagram illustrating their overlap, as well as actionable strategies for policymakers and practitioners to harmonize oversight with innovation.

2606.12418 2026-06-12 cs.CY cs.AI New submission

Divination by Prompt: LLM-Mediated Xuanxue on Chinese Social Media

Chuang Li, Lixuan Wang, Yuqi Chen, Ze Hong

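AI summary: Studies the use of LLMs for divination on Chinese social media, using mixed methods to analyze user motivations, collaborative prompt refinement, and perceived efficacy, and identifies similarities to and differences from traditional divination.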

Abstract

The rapid proliferation of large language models (LLMs) has produced a striking cultural practice: using conversational AI for divination. This paper offers one of the first systematic studies of LLM-mediated divination in the context of Xuanxue, an internet-native umbrella term for mystical and spiritual practices on Chinese social media. Using a mixed-methods design, we analyze 23,000+ posts and comments from Xiaohongshu and conduct 32 semi-structured interviews with users and professional diviners. Users primarily consult LLMs about pragmatic concerns (romantic relationships, careers, exams, and in-game gacha draws) via two intersecting pathways: trend-driven curiosity enabled by viral visibility and zero-cost access, and event-driven anxiety under conditions of uncertainty. A defining feature is collaborative prompt refinement, which turns users into active prompt engineers. Among commenters expressing a clear stance, perceived efficacy skews positive, with "accuracy" often justified through biographical fit and retrospective confirmation, consistent with the Barnum effect and confirmation bias. Users also develop verification practices such as repeated trials and cross-model comparison. Professional diviners, by contrast, portray LLMs as lacking the "spiritual power" required for genuine divination, reflecting both ontological commitments and economic boundary-work. We also show how participants navigate tensions between scientific and metaphysical frames when interpreting AI-generated readings. Situating these findings in anthropological and cognitive-evolutionary theories of divination, we argue that LLM divination preserves core functions of traditional practice while introducing scalability, repeatability, and prompt-driven co-production that reshape how divinatory authority is constructed and evaluated.

2606.12413 2026-06-12 cs.CY cs.AI cs.CE cs.CL cs.SE New submission

AI SciBrief as a Gateway to Research: A Framework for Onboarding Students into New Research Areas

Andrei Lazarev, Dmitrii Sedov

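AI summary: Proposes a framework that uses AI SciBrief, a large language model platform, to automatically generate digests of scientific trends, helping students overcome information overload and move more quickly from information searching to knowledge creation.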

Comments
This is the version of the article accepted for publication in TELE 2025 after peer review. The final, published version is available at IEEE Xplore: this https URL

Abstract

Students at all levels of higher education face a significant barrier in the form of information overload, which often paralyzes the initial stages of the research process and suppresses motivation. In response, this article introduces a pedagogical framework that leverages AI SciBrief, a platform powered by a Large Language Model (LLM) designed to automatically generate digests of scientific trends. We describe how this multidisciplinary tool, with initial coverage in finance, medicine, and education, can be integrated into the curriculum to overcome this "entry barrier." The framework provides concrete methodologies for utilizing these digests to facilitate topic selection for term papers, accelerate literature reviews for dissertations, and enable postgraduate students to continuously monitor emerging trends. We conclude that AI SciBrief functions as a "gateway to research," effectively reducing students' cognitive load and empowering them to transition more rapidly from information searching to knowledge creation.

2606.13634 2026-06-12 cs.CL math.CT New submission

Operads for compositional reasoning in LLMs

Nathaniel Bottman, Kyle Richardson

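AI summary: Proposes operads as a mathematical framework for question decomposition, defines the questions operad Q, interprets QA models as algebras over Q, and introduces an operadic consistency measure that experiments show is strongly correlated with accuracy.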

Abstract

Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical foundation. In this paper, we propose operads, mathematical structures that model many-in, one-out operations and compositions thereof, as a natural framework for describing question decomposition. We define the questions operad $Q$, in which operations correspond to question templates and composition corresponds to substitution of sub-answers, and show how QA models can be interpreted as algebras over $Q$. Beyond reframing existing practice, this operadic perspective points toward new methods, in particular a notion of operadic consistency, which measures whether a QA model's answers agree across the partial collapses of a question decomposition tree. Empirical evaluation of operadic consistency is reported in our companion paper (Bottman, Liu, and Richardson, 2026), which finds it strongly correlated with accuracy across twelve LLMs and four multi-hop QA datasets and outperforming standard temperature-based self-consistency baselines. We argue that operads are the natural mathematical home for question decomposition, and that invariants such as operadic consistency open new directions for analyzing and improving the reliability of multi-step reasoning.
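
The idea of checking whether a QA model's answers agree across ways of collapsing a decomposition can be illustrated with a toy Python sketch. It covers only a two-node tree (one template with one sub-question), uses a dictionary lookup as a stand-in for a QA model, and all names (`compose`, `answer_by_steps`, `is_consistent`, `GOOD_MODEL`, `BAD_MODEL`) are hypothetical; the paper's formal definition of operadic consistency is not reproduced here.

```python
from typing import Callable, Dict, Optional

Model = Callable[[str], Optional[str]]


def compose(outer: str, inner: str) -> str:
    """Substitute a sub-question into the slot of a question template,
    producing the single uncollapsed (multi-hop) question."""
    return outer.format(f"the answer to '{inner}'")


def answer_by_steps(model: Model, outer: str, inner: str) -> Optional[str]:
    """Answer the sub-question first, substitute its answer into the
    template, then answer the resulting simpler question."""
    sub_answer = model(inner)
    return model(outer.format(sub_answer))


def is_consistent(model: Model, outer: str, inner: str) -> bool:
    """True when the direct answer and the step-by-step answer agree."""
    return model(compose(outer, inner)) == answer_by_steps(model, outer, inner)


INNER = "Which country is the Eiffel Tower in?"
OUTER = "What is the capital of {}?"

# Toy "models": lookup tables from question text to answer.
GOOD_MODEL: Dict[str, str] = {
    INNER: "France",
    "What is the capital of France?": "Paris",
    compose(OUTER, INNER): "Paris",
}
# Same sub-answers, but a different answer to the multi-hop question.
BAD_MODEL: Dict[str, str] = {**GOOD_MODEL, compose(OUTER, INNER): "Lyon"}

print(is_consistent(GOOD_MODEL.get, OUTER, INNER))
print(is_consistent(BAD_MODEL.get, OUTER, INNER))
```

Note that the check needs no gold answer: it only compares the model with itself, which is what makes a consistency measure usable as a label-free reliability signal.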

2606.13092 2026-06-12 cs.LG cs.RO math.DS New submission

Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

Hongbo Wang

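AI summary: For equivariant latent world models, proposes a computable multi-step certificate of the predictable horizon, proving that T-step rollout error is constant over each symmetry orbit and stratified by the Lyapunov spectrum, and that the certificate is exclusive to equivariant models.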

Comments
23 pages (9 main + appendices). Code: this https URL

英文摘要

Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel-by-channel by the predictor's Lyapunov spectrum, $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$. The horizon is two-sided -- a matching lower bound makes approximate equivariance provably horizon-limited -- and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2{=}0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate acts, a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot -- with zero calibration data. The same read-out, unchanged, audits public pretrained world models training-free: TD-MPC2 checkpoints land on the certificate's own scope taxonomy -- calibrated where strongly expansive (ratio 0.94-1.02), optimistic where weakly expansive, correctly abstaining where contracting -- a map a deployed monitor replicates cell-by-cell, out-of-sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameters. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum -- the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.
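The per-channel horizon scaling $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$ quoted above can be turned into numbers with a short sketch. This is only an illustration of the scaling law: the function name `predictable_horizons` is hypothetical, the constant hidden by the asymptotic "$\sim$" is assumed to be 1, and returning `None` for non-expanding channels is our own stand-in for the "abstaining where contracting" behaviour the abstract mentions, not the paper's certificate.

```python
import math

def predictable_horizons(lyapunov_exponents, epsilon):
    """Return log(1/epsilon)/lambda_j per channel (illustrative only).

    Channels with a non-positive exponent do not amplify errors, so the
    scaling law gives no finite horizon; we return None for them
    (an assumption made for this sketch).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    budget = math.log(1.0 / epsilon)
    return [budget / lam if lam > 0.0 else None for lam in lyapunov_exponents]

# Hypothetical spectrum: two expanding channels and one contracting channel.
horizons = predictable_horizons([1.5, 0.5, -0.2], epsilon=1e-3)
```

Faster-expanding channels (larger $\lambda_j$) get shorter horizons, which is the channel-by-channel stratification described in the abstract.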

2606.12691 2026-06-12 cs.LG cs.AI eess.SY math.OC stat.ML 新提交

Two-Layer Linear Auto-Regressive Models Estimate Latent States

两层线性自回归模型估计潜在状态

Yahya Sattar, Sunmook Choi, Leo Maynard-Zhang, Yassir Jedra, Maryam Fazel, Sarah Dean

AI总结 本文证明两层线性自回归模型通过经验风险最小化训练时,能近似卡尔曼滤波,恢复潜在状态估计,并提供有限样本保证。

详情
Comments
ICML 2026
AI中文摘要

自回归模型已成为处理序列数据(从语言到视频)的强大工具。理解这些模型如何以及为何学习潜在表示仍然是一个开放的理论问题。在这项工作中,我们证明,当在部分观测的线性动力系统的数据上通过经验风险最小化训练时,两层线性自回归模型自然学会近似卡尔曼滤波。特别地,我们表明,学习到的隐藏表示与最优(卡尔曼)滤波器产生的状态估计一致,仅相差一个相似变换,尽管模型没有关于底层动力学或状态的显式知识。该结果基于三个主要见解。首先,我们建立卡尔曼滤波器可以被具有有界截断误差的自回归模型很好地近似。其次,我们表明,尽管非凸性,两层优化景观是良性的,即所有驻点要么是严格鞍点,要么是全局最小值。最后,作为我们的主要贡献,我们提供了关于预测误差、参数估计误差和潜在状态恢复的有限样本保证。数值模拟支持理论结果,并表明自回归模型的潜在表示恢复了状态估计。

英文摘要

Auto-regressive models have emerged as powerful tools for sequential data, from language to video. Understanding how and why these models learn latent representations remains an open theoretical question. In this work, we demonstrate that when trained by empirical risk minimization on data from partially observed linear dynamical systems, two-layer linear auto-regressive models naturally learn to approximate Kalman filtering. In particular, we show that the learned hidden representation coincides, up to a similarity transformation, with the state estimates produced by the optimal (Kalman) filter, even though the model has no explicit knowledge of the underlying dynamics or state. The result follows from three main insights. First, we establish that the Kalman filter is well approximated by an auto-regressive model with bounded truncation error. Second, we show that despite non-convexity, the two-layer optimization landscape is benign, i.e., all stationary points are either strict saddles or global minima. Finally, as our main contribution, we provide finite-sample guarantees on prediction error, parameter estimation error, and latent state recovery. Numerical simulations support the theoretical results and demonstrate that the latent representations of auto-regressive models recover state estimates.
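The first insight, that a Kalman filter is well approximated by a truncated auto-regressive model, can be checked numerically on a toy case. The sketch below is not the paper's construction: it assumes a scalar system $x_{t+1}=ax_t+w_t$, $y_t=cx_t+v_t$, uses the steady-state predictor gain from the first step, and all function names are hypothetical. Unrolling the predictor recursion gives $\hat y_{t+1}=\sum_{k\ge 0} c\,(a-Lc)^k L\, y_{t-k}$, so cutting the sum at `order` terms leaves a geometrically small tail.

```python
import random

def steady_state_gain(a, c, q, r, iterations=500):
    """Iterate the scalar Riccati recursion, then return the predictor gain L."""
    p = q
    for _ in range(iterations):
        p = a * a * p - (a * c * p) ** 2 / (c * c * p + r) + q
    return a * c * p / (c * c * p + r)

def kalman_predictions(ys, a, c, gain):
    """One-step-ahead output predictions from the steady-state Kalman predictor."""
    x_hat, preds = 0.0, []
    for y in ys:
        x_hat = (a - gain * c) * x_hat + gain * y
        preds.append(c * x_hat)
    return preds

def ar_predictions(ys, a, c, gain, order):
    """The same predictions from an AR model truncated to `order` past outputs."""
    weights = [c * (a - gain * c) ** k * gain for k in range(order)]
    return [
        sum(w * ys[t - k] for k, w in enumerate(weights) if t - k >= 0)
        for t in range(len(ys))
    ]

def simulate_outputs(a, c, q, r, steps, seed=0):
    """Sample outputs of the scalar system with Gaussian noise (variances q, r)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(steps):
        ys.append(c * x + rng.gauss(0.0, r ** 0.5))
        x = a * x + rng.gauss(0.0, q ** 0.5)
    return ys

def max_gap(u, v):
    """Largest absolute difference between two prediction sequences."""
    return max(abs(p - s) for p, s in zip(u, v))
```

Comparing `kalman_predictions` with `ar_predictions` for increasing `order` shows the gap shrinking like $|a-Lc|^{\text{order}}$, which is the bounded truncation error in this toy setting.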

2606.13380 2026-06-12 quant-ph cs.AI 新提交

An LLM System for Autonomous Variational Quantum Circuit Design

用于自主变分量子电路设计的大语言模型系统

Kenya Sakka, Wataru Mizukami, Kosuke Mitarai

AI总结 提出一个基于大语言模型的自主代理框架,通过迭代设计量子电路,在量子特征映射和变分量子本征求解器任务中取得优于或媲美现有方法的性能。

详情
Comments
63 pages, 19 figures, 3 tables
AI中文摘要

高性能量子电路的设计在很大程度上仍然依赖于人类专家。我们引入了一个自主代理框架,该框架利用大语言模型在明确的设计约束下进行迭代量子电路设计。我们的系统集成了七个组件:探索、生成、讨论、验证、存储、评估和审查。这些组件形成了一个闭环工作流,结合了基于网络的知识获取、基于文献的批评、可执行代码生成和实验反馈。我们在两个任务上评估了该框架:用于量子机器学习的量子特征映射构建和用于量子化学中变分量子本征求解器应用的拟设生成。在图像分类基准测试中,生成的最佳特征映射优于代表性的量子特征映射,并且当扩展到更大的量子比特数时,超过了经典的径向基函数核。在七个分子的基态能量估计中,生成的拟设达到了与广泛使用的化学启发式和硬件高效构造相竞争的精度,同时满足施加的缩放约束。这些结果确立了由大语言模型驱动的代理系统作为自动化量子电路设计的可行范式,并展示了人工智能系统如何跨科学领域参与迭代科学优化工作流。

英文摘要

The design of high-performing quantum circuits remains largely dependent on human expertise. We introduce an autonomous agentic framework that employs large language models (LLMs) to conduct iterative quantum circuit design under explicit design constraints. Our system integrates seven components: Exploration, Generation, Discussion, Validation, Storage, Evaluation, and Review. These components form a closed-loop workflow that combines web-based knowledge acquisition, literature-grounded critique, executable code generation, and experimental feedback. We evaluate the framework on two tasks: quantum feature map construction for quantum machine learning and ansatz generation for variational quantum eigensolver applications in quantum chemistry. In image classification benchmarks, the best generated feature map outperforms representative quantum feature maps and, when scaled to larger qubit counts, surpasses the classical radial basis function kernel. In molecular ground state estimation across seven molecules, the generated ansatz attains accuracy competitive with widely used chemically inspired and hardware-efficient constructions while satisfying the imposed scaling constraints. These results establish LLM-driven agentic systems as a viable paradigm for automated quantum circuit design and illustrate how AI systems can participate in iterative scientific optimization workflows across scientific domains.

2606.13341 2026-06-12 cs.CV cs.AI physics.med-ph 新提交

Dual-Domain Equivariant Generative Adversarial Network for Multimodal CT-PET Synthesis

双域等变生成对抗网络用于多模态CT-PET合成

Gabriel Steele, Alzahra Altalib, Alessandro Perelli

AI总结 提出双域等变生成对抗网络(DDE-GAN),联合空间与频域学习并融入旋转等变性,实现高保真多模态CT-PET图像合成。

详情
Comments
4 pages, 3 figures, 1 table, 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI)
AI中文摘要

我们提出了一种用于多模态CT-PET图像合成的双域等变生成对抗网络(DDE-GAN)。传统的基于GAN的方法通常仅在空间域中操作,忽略了几何一致性,导致结构保真度有限。DDE-GAN通过联合学习空间域和频率(傅里叶)域,捕捉互补的解剖和频谱信息,解决了这些挑战。此外,嵌入在CT和PET测量物理中的旋转等变性被整合到生成器和判别器的损失中,以确保在旋转下的一致响应,从而提高解剖准确性。一种分层双域训练策略通过多阶段损失函数强制实现域内和域间一致性。在HECKTOR 2022 CT-PET数据集上的评估表明,DDE-GAN在CT-PET图像合成中取得了优于基线模型的合成质量。结果表明,将双域学习与几何等变性相结合,显著增强了多模态图像合成的准确性和鲁棒性,为PET补全和数据增强等实际应用提供了可能。

英文摘要

We present a Dual-Domain Equivariant Generative Adversarial Network (DDE-GAN) for multimodal CT-PET image synthesis. Traditional GAN-based approaches often operate solely in the spatial domain and ignore geometric consistency, resulting in limited structural fidelity. DDE-GAN addresses these challenges by jointly learning from both spatial and frequency (Fourier) domains, capturing complementary anatomical and spectral information. Furthermore, the rotational equivariance embedded in the physics of the CT and PET measurements is integrated into the losses of both the generator and the discriminator to ensure consistent responses under rotations, improving anatomical accuracy. A hierarchical dual-domain training strategy enforces intra- and inter-domain consistency through multi-stage loss functions. Evaluated on the HECKTOR 2022 CT-PET dataset, DDE-GAN achieves superior synthesis quality over baseline models for CT-PET image synthesis. The results demonstrate that combining dual-domain learning with geometric equivariance substantially enhances multimodal image synthesis accuracy and robustness, enabling practical applications in PET completion and data augmentation.
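To make "consistent responses under rotations" concrete, one generic way to measure rotational equivariance is to compare "rotate, then map" against "map, then rotate". The sketch below is an assumption-laden illustration, not DDE-GAN's loss: it uses 90-degree rotations of small 2-D grids, a mean squared gap, and two toy stand-in "models"; every name in it is hypothetical.

```python
def rot90(grid):
    """Rotate a 2-D list of numbers by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def equivariance_gap(model, image):
    """Mean squared difference between model(rot(x)) and rot(model(x))."""
    rotate_then_map = model(rot90(image))
    map_then_rotate = rot90(model(image))
    squared = [
        (u - v) ** 2
        for row_u, row_v in zip(rotate_then_map, map_then_rotate)
        for u, v in zip(row_u, row_v)
    ]
    return sum(squared) / len(squared)

def square_pixels(image):
    """Pointwise map: commutes with any rearrangement of pixels."""
    return [[v * v for v in row] for row in image]

def horizontal_difference(image):
    """Circular left-neighbour difference: direction-dependent, not equivariant."""
    return [[row[j] - row[j - 1] for j in range(len(row))] for row in image]

toy_image = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
```

A gap of zero means the map commutes with the rotation on that input; a penalty of this kind could be added to a training loss, though how the paper weights or formulates its term is not stated in the abstract.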

2606.13568 2026-06-12 cs.LG math-ph 新提交

Adjusted Cup-Product Neural Layer

调整杯积神经层

Snigdha Chandan Khilar

AI总结 提出调整杯积神经层,通过硬连线杯积与高规范理论调整项,实现规范不变读出,并证明调整系数是唯一信号源。

详情
AI中文摘要

物理和几何中的许多重要可观测量是上链的杯积。本文引入了调整杯积神经层。这是一种神经原语,硬连线了杯积与来自高规范理论的调整项。这创建了一个设计上规范不变的读出。他们的主要理论结果表明,在闭链上,输出完全依赖于调整系数。将该系数设为零,无论其他参数如何,输出完全消失。因此,调整是规范不变信号的唯一来源。他们证明该可观测量是一个非零二次型,并且在一个和两个规范变换下精确不变。

英文摘要

Many important observables in physics and geometry are cup products of cochains. This paper introduces the adjusted cup-product neural layer, a neural primitive that hard-wires the cup product together with an adjustment term from higher gauge theory. This creates a readout that is gauge-invariant by design. The main theoretical result shows that on a closed cycle the output depends entirely on the adjustment coefficient: setting this coefficient to zero removes the output completely, regardless of the other parameters. Thus the adjustment is the only source of gauge-invariant signal. The authors prove that this observable is a nonzero quadratic form and is exactly invariant under one and two gauge transformations.

2606.12368 2026-06-12 cs.CV 新提交

DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

DepthMaster: 统一透视与全景图像的单目深度估计

Pengfei Wang, Shihao Wang, Liyi Chen, Zhiyuan Ma, Guowen Zhang, Lei Zhang

AI总结 提出DepthMaster统一框架,通过将全景图分解为重叠透视块并引入对应一致性损失和虚拟投影相机几何先验,解决透视与全景深度估计的几何差异和数据稀缺问题,在13个数据集上实现零样本最优性能。

详情
AI中文摘要

虽然单目深度估计取得了显著进展,但对于窄视场(FoV)透视图像和$360^\circ$全景图像实现通用的度量深度估计仍然是一个未解决的挑战。现有方法通常针对特定相机类型设计,难以在多样化场景中生成准确的度量深度。这一限制源于两个关键挑战:透视相机与全景相机之间的固有几何差异,以及带有度量标注的全景训练数据的稀缺性。在这项工作中,我们引入了DepthMaster,一个统一的度量深度估计框架。我们不采用专门网络来学习球形畸变,而是通过将全景图像分解为重叠的透视块来重新表述问题。关键的是,与先前依赖临时架构修改来处理边界的基于投影的方法不同,我们引入了一种新颖的对应一致性损失(CCL),并注入虚拟投影相机作为几何先验,从而能够无缝拼接这些块,同时避免专用算子并保持主干与标准Transformer设计高度兼容。该策略通过将所有输入统一为规范透视表示来解决几何差异,并通过直接从大量透视数据集中解锁强大的度量先验来有效规避数据稀缺问题。在仅包含一个全景数据集的混合数据集上训练后,DepthMaster在13个多样化数据集上实现了最先进的零样本性能,不仅在透视和全景领域超越了通用方法,还领先于领先的专家模型。

英文摘要

While monocular depth estimation has achieved significant progress, achieving generalized metric depth estimation for both narrow field-of-view (FoV) perspectives and $360^\circ$ panoramas remains an unsolved challenge. Existing methods are often tailored to specific camera types and struggle to produce accurate metric depth that generalizes across diverse settings. This limitation stems from two key challenges: the inherent geometric discrepancy between perspective and panoramic cameras, and the scarcity of panoramic training data with metric annotations. In this work, we introduce DepthMaster, a unified metric depth estimation framework. Rather than employing specialized networks to learn spherical distortions, we reformulate the problem by decomposing panoramic images into overlapping perspective patches. Crucially, distinct from prior projection-based methods that rely on ad-hoc architectural modifications to handle boundaries, we introduce a novel Correspondence Consistency Loss (CCL) and inject virtual projection cameras as geometric priors, allowing us to seamlessly stitch the patches while avoiding specialized operators and keeping the backbone largely compatible with standard Transformer designs. This strategy also resolves the geometric differences by unifying all inputs into a canonical perspective representation, and effectively circumvents data scarcity by directly unlocking powerful metric priors from vast perspective datasets. Trained on a mixed dataset that contains only one panorama dataset, DepthMaster achieves state-of-the-art zero-shot performance on 13 diverse datasets, outperforming not only universal methods but also leading specialist models in both perspective and panoramic domains.

2606.12040 2026-06-12 cs.AI cs.GR 新提交

A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design

一种用于自动混凝土护栏设计的轻量级多智能体框架

Wanting Wang, Xiye Ma, Yuyang He, Minghui Cheng, Ran Cao

AI总结 提出基于AutoGen的“生成-评估-优化”闭环多智能体框架,实现混凝土护栏自动设计,准确率超98%,且8B参数轻量模型可优于631B旗舰模型。

详情
AI中文摘要

钢筋混凝土公路护栏的设计是一个安全关键过程,需要严格遵守AASHTO-LRFD桥梁设计指南等监管规定。当前的工程实践严重依赖手动、迭代和启发式计算来满足复杂的非线性材料和力学约束。尽管大型语言模型(LLMs)表现出强大的生成能力,但它们在结构工程中的直接应用仍受到幻觉风险和物理基础不足的限制。为了解决这些挑战,本研究提出了一种新颖的“生成-评估-优化”闭环框架,利用AutoGen的多智能体编排能力实现混凝土护栏的自动设计。实验结果表明,所提出的智能体框架实现了超过98%的设计准确率,显著优于独立的通用LLMs。更重要的是,研究揭示了设计性能不一定与模型规模相关,8B参数的轻量级模型可以胜过无约束的631B参数旗舰模型。这一发现凸显了在降低计算成本的同时提高AI辅助工程工具在工业应用中的可及性的潜力。所提出的多智能体设计框架的源代码可在项目GitHub仓库中获取:this https URL。关键词:结构工程;多智能体系统;大型语言模型;混凝土护栏设计;AutoGen;设计自动化。

英文摘要

The design of reinforced concrete highway barriers is a safety-critical process that requires strict compliance with regulatory provisions such as the AASHTO-LRFD bridge design guidelines. Current engineering practice relies heavily on manual, iterative, and heuristic calculations to satisfy complex nonlinear material and mechanics constraints. Although Large Language Models (LLMs) demonstrate strong generative capabilities, their direct application to structural engineering remains limited by hallucination risks and insufficient physical grounding. To address these challenges, this study proposes a novel "generation-evaluation-optimization" closed-loop framework for automated concrete barrier design using the multi-agent orchestration capabilities of AutoGen. Experimental results demonstrate that the proposed agentic framework achieves over 98% design accuracy, significantly outperforming standalone general-purpose LLMs. More importantly, the study reveals that design performance is not necessarily correlated with model scale: within the framework, an 8B-parameter lightweight model can outperform unconstrained 631B-parameter flagship models. This finding highlights the potential to substantially reduce computational costs while improving the accessibility of AI-assisted engineering tools for industry applications. The source code for the proposed multi-agent design framework is available at the project GitHub repository: this https URL. Keywords: Structural Engineering; Multi-Agent Systems; Large Language Models; Concrete Barrier Design; AutoGen; Design Automation.

2606.12025 2026-06-12 cs.AI 新提交

Human-Enhanced Loop Modeling (HELM): Agent-Based Finite Element Modeling of Concrete Bridge Barriers

人类增强循环建模(HELM):基于智能体的混凝土桥梁护栏有限元建模

Quankai Wang, Yulin Xie, Tongfei Yang, Minghui Cheng, Ran Cao

AI总结 提出HELM框架,通过人机协作将有限元建模分解为可验证的检查点,在MASH TL-4和TL-5条件下将自主建模成功率从20%提升至75%。

详情
AI中文摘要

对桥梁护栏等安全关键基础设施进行有限元(FE)建模需要高保真非线性动态分析,然而当前的FE建模过程仍然劳动密集且缺乏自动化。本文提出了人类增强循环建模(HELM)框架,这是一种协作式人机协议,将长序列有限元建模分解为几何生成、边界条件定义和材料分配等离散的、可视觉验证的检查点。该框架通过一个包含20个案例的钢筋混凝土桥梁护栏矩阵在MASH TL-4和TL-5侧向荷载条件下进行演示,将专用智能体与两种广泛使用的商业FE软件(即ANSYS和LS-PrePost)对接。实验结果表明,HELM将基线自主建模成功率从20%提高到75%,其中几何和边界条件任务的智能体级通过率大约翻倍。误差分析显示,空间推理和代数逻辑限制构成了主要的失败模式,突显了结构化人在回路干预对建模自动化的价值。完整的智能体设计代码和提示已开源,可访问:此 https URL。

英文摘要

Finite element (FE) modeling of safety-critical infrastructure such as bridge barriers requires high-fidelity nonlinear dynamic analysis, yet the current FE modeling process remains labor-intensive and lacks automation. This paper presents the Human-Enhanced Loop Modeling (HELM) framework, a collaborative human-agent protocol that decomposes long-sequence finite element modeling into discrete, visually verifiable checkpoints across geometry generation, boundary condition definition, and material assignment. The framework is demonstrated through a 20-case matrix of reinforced concrete bridge barriers under MASH TL-4 and TL-5 lateral loading conditions, interfacing specialized agents with two widely used commercial FE software packages, ANSYS and LS-PrePost. Experimental results show that HELM improves the baseline autonomous modeling success rate from 20% to 75%, with agent-level pass rates for geometry and boundary condition tasks approximately doubling. Error analysis reveals that spatial reasoning and algebraic logic limitations constitute the primary failure modes, underscoring the value of structured human-in-the-loop intervention for modeling automation. The complete agent design code and prompts are open-sourced and can be accessed at: this https URL.

2606.11240 2026-06-12 physics.comp-ph cond-mat.str-el cs.LG quant-ph 新提交

Physically Constrained Ensemble Gaussian Process Modelling for Expensive Quantum Systems with Heteroskedastic Noise

物理约束集成高斯过程建模用于具有异方差噪声的昂贵量子系统

Arpan Biswas, Sutirtha Paul, Joseph Agada, Matthias Thamm, Adrian Del Maestro

AI总结 提出物理约束集成高斯过程框架,通过加权惩罚和数值积分集成多个GP代理,高效建模含异方差噪声的量子系统,在Bose-Hubbard模型和纳米孔硅酸盐量子液体模拟中实现更准确且物理合理的预测。

详情
Comments
14 pages, 6 figures in main text, 2 figures in Supp materials
AI中文摘要

精确建模量子多体系统通常需要计算昂贵的模拟,如密度矩阵重正化群(DMRG)或量子蒙特卡洛(QMC)计算。这些方法虽然精确,但会带来显著的时间和资源限制,限制了它们在详尽参数探索中的应用。此外,这些昂贵模拟在大的未知参数空间内可能包含可变误差,需要量化和传播。因此,需要预测建模来准确估计稀疏采样数据(具有异方差噪声)的函数空间,同时保持估计的物理相关性。为此,我们提出了物理约束集成高斯过程(pc-EGP)框架,旨在物理一致性约束下高效建模复杂且含噪声的量子系统。该方法首先将物理约束作为用户控制的加权惩罚项,施加到高斯过程(GP)代理的数据驱动损失函数中。然后,通过数值求积方法训练一组这样的GP模型,其中多个不同节点上的GP通过求积加权平均进行集成。我们首先在合成生成数据上演示该框架,然后应用于量子系统。在第一个案例研究中,我们利用Bose-Hubbard模型的DMRG模拟来预测控制超流-莫特绝缘体转变的临界相互作用参数Uc。在第二个案例研究中,我们展示了该方法在QMC模拟上的应用,模拟限制在纳米孔硅酸盐内的量子液体,目标是优化化学环境以实现一维超流。与传统GP相比,pc-EGP在准确性和物理有意义的预测之间实现了更好的平衡。

英文摘要

Accurate modeling of quantum many-body systems often requires computationally expensive simulations such as Density Matrix Renormalization Group (DMRG) or Quantum Monte Carlo (QMC) calculations. These methods, while precise, impose significant time and resource constraints, limiting their use in exhaustive parameter exploration. Moreover, these expensive simulations can contain variable errors over the large unknown parameter space, which need to be quantified and propagated. Predictive modelling is therefore required to estimate the functional space accurately from sparsely sampled data with heteroskedastic noise, while preserving the physical relevance of the estimate. To this end, we present a Physically Constrained Ensemble Gaussian Process (pc-EGP) framework designed to efficiently model complex and noisy quantum systems under physical consistency constraints. The proposed method first enforces physical constraints as a user-controlled weighted penalty added to the data-driven loss function of the Gaussian Process (GP) surrogates. An ensemble of such GP models is then trained on simulations with variable noise via a numerical quadrature method, in which the GPs at the different nodes are combined as a quadrature-weighted average. We first demonstrate the framework on synthetically generated data before applying it to quantum systems. In the first case study, we leverage DMRG simulations of the Bose-Hubbard model to predict the critical interaction parameter Uc governing the superfluid-to-Mott-insulator transition. In the second case study, we demonstrate our method on QMC simulations of a quantum liquid confined inside a nanoporous silicate, with the goal of optimizing a chemical environment to realize a one-dimensional superfluid. Compared to a conventional GP, pc-EGP achieves a better balance between accuracy and physically meaningful predictions.

2606.11104 2026-06-12 cs.LG math.CA stat.ML 新提交

Limitations of Learning Tanh Neural Networks with Finite Precision

有限精度下学习Tanh神经网络的局限性

Philipp Grohs, Matěj Trödler

AI总结 基于有限精度计算和L^p精度保证,通过构造尖锐局部化bump函数,证明自适应随机算法在L^p范数下收敛速度不超过蒙特卡洛率O(m^{-1/p}),除非采样预算随网络参数和架构指数增长。

详情
AI中文摘要

我们研究了在有限精度计算和$L^p$精度保证下,从点评估中学习$\tanh$神经网络的局限性,建立在Berner、Grohs和Voigtländer(2023)的工作基础上。我们的方法基于通过迭代$\tanh$激活函数新颖构造的尖锐局部化bump函数。利用这一机制,我们证明,在有限精度设置下,基于$m$个样本的自适应随机算法在$L^p$范数下无法达到比蒙特卡洛率$O(m^{-1/p})$更高的收敛速度,除非采样预算随网络参数和架构的大小指数增长。结果揭示了有限精度对包含局部化bump函数的类别可学习性施加的基本限制,将先前针对ReLU网络的结果推广到了$\tanh$设置。

英文摘要

We investigate limitations of learning $\tanh$ neural networks from point evaluations under finite-precision computations and $L^p$ accuracy guarantees, building on Berner, Grohs, and Voigtländer (2023). Our approach is based on a novel construction of sharply localized bump functions via iterated $\tanh$ activations. Using this mechanism, we show that, in a finite-precision setting, no adaptive randomized algorithm based on $m$ samples can achieve a convergence rate higher than the Monte Carlo rate $O(m^{-1/p})$ in the $L^p$ norm, unless the sampling budget grows exponentially with the size of the network parameters and architecture. The results reveal fundamental limitations imposed by finite precision on the learnability of classes containing localized bump functions, extending previous results for ReLU networks to the $\tanh$ setting.

2606.10931 2026-06-12 cs.CL 新提交

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

一个样本就能带偏所有:单次GRPO打破对齐

Naihao Deng, Yilun Zhu, Naichen Shi, Clayton Scott, Rada Mihalcea

AI总结 研究发现,仅用单个有偏样本进行一步GRPO训练就能诱导大语言模型产生系统性偏见,且刻板印象推理泛化到多种属性、类别和基准测试,揭示了对齐机制的关键脆弱性。

详情
AI中文摘要

警告:本文包含若干有毒和冒犯性言论。现代大语言模型通常通过大规模后训练进行对齐,以确保公平和可靠的行为。在本工作中,我们研究了通过群体相对策略优化(GRPO)打破这些防护栏的容易程度。我们表明,在单个有偏样本上进行一次GRPO训练就足以诱导系统性偏见,且基于刻板印象的推理会泛化到不同属性、类别和基准测试中。我们进一步发现,模型基于初始产生有偏输出的可能性而表现出不同的易感性。我们的结果揭示了后训练中的一个关键脆弱性:对齐可以被单个样本覆盖。

英文摘要

Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training to ensure fair and reliable behavior. In this work, we investigate how easily such guardrails can be broken by Group Relative Policy Optimization (GRPO). We show that one-shot GRPO training on a single biased example is sufficient to induce systematic bias, with stereotype-driven reasoning generalizing across attributes, categories, and benchmarks. We further find that models differ in their susceptibility based on the initial likelihood of producing biased outputs. Our results reveal a critical vulnerability in post-training: alignment can be overridden by a single example.

2606.10200 2026-06-12 cs.CV cs.AI cs.LG 新提交

An Improved Generative Adversarial Network for Micro-Resistivity Imaging Logging Restoration

一种改进的生成对抗网络用于微电阻率成像测井恢复

Ahmed Faizul Haque, S.M. Riaz Rahman Antu, Saif Ahmed, Asadullah Hil Galib, Souvik Pramanik, Mohammad Ashrafuzzaman Khan, Mohammad Abdul Qayum, Mohsin Sajjad

AI总结 提出基于改进GAN的成像测井图像恢复方法,通过FCN生成网络、深度可分离卷积残差块、Inception模块及多尺度特征提取与空间注意力机制,结合全局与局部判别网络,有效恢复缺失区域,结构相似性达0.903。

详情
Comments
Mistakes in citations and references. Further we want to submit in conference with improved experiments and results
AI中文摘要

本文提出了一种改进的基于GAN的成像测井图像恢复方法,用于解决微电阻率成像测井图像部分缺失的问题。该方法采用FCN作为生成网络基础设施,并添加深度可分离卷积残差块以学习和保留更有效的像素与语义信息;添加Inception模块以增加网络的多尺度感知场并减少参数数量;添加多尺度特征提取模块和空间注意力残差块,结合通道注意力机制与残差块实现多尺度特征提取。设计了全局判别网络和局部判别网络,通过相互对抗与生成网络逐步提高恢复部分与整体图像之间的内容和语义结构一致性。实验结果表明,测试集中五组不同大小缺失区域的成像测井图像的平均结构相似性度量为0.903,相比其他类似方法提高了约0.3。研究表明,该方法可用于微电阻率成像测井图像的恢复,在语义结构一致性和纹理细节方面有良好改善,从而为保障微电阻率成像测井图像后续解释的顺利进行提供了一种新的深度学习方法。

英文摘要

This paper presents an improved GAN-based restoration method for micro-resistivity imaging logging images with partially missing regions. The method uses a fully convolutional network (FCN) as the backbone of the generator and adds three components: a depthwise separable convolutional residual block, which learns and retains more effective pixel and semantic information; an Inception module, which enlarges the network's multi-scale receptive field while reducing its parameter count; and a multi-scale feature extraction module with a spatial attention residual block, which combines the channel attention mechanism with the residual block to achieve multi-scale feature extraction. A global discriminator and a local discriminator are trained adversarially against the generator to gradually improve the coherence of content and semantic structure between the restored regions and the whole image. In the experiments, the average structural similarity over the five test-set groups of imaging logging images with different sizes of missing regions is 0.903, an improvement of about 0.3 over other similar methods. The results indicate that the method can restore micro-resistivity imaging log images with improved semantic structural coherence and texture detail, providing a new deep learning approach that supports the subsequent interpretation of such images.

2606.10642 2026-06-12 cs.LG physics.ao-ph 新提交

PhysMetrics.Weather: An Evaluation Framework for Physical Consistency in ML Weather Models

PhysMetrics.Weather: 机器学习天气模型中物理一致性的评估框架

Emma Kasteleyn, Timo Maier, Axel Lauer, Veronika Eyring, Pierre Gentine, Ana Lucic

AI总结 提出PhysMetrics.Weather评估框架,通过守恒、谱和动力学三类指标量化MLWP模型的物理真实性,指导物理信息架构开发并评估其运行可靠性。

详情
Comments
Preprint
AI中文摘要

机器学习天气预测(MLWP)模型以传统基于物理方法所需计算成本的一小部分实现了令人印象深刻的预测性能。然而,它们主要是(1)数据驱动的,并且(2)使用逐像素误差指标(例如RMSE)进行评估,因此无法保证其预测与已知物理定律一致。我们介绍了PhysMetrics.Weather,这是一个评估框架,通过三类指标(守恒、谱和动力学)评估MLWP模型的物理真实性。通过量化物理真实性,该工具指导物理信息架构的开发,并帮助评估MLWP模型是否可用于运行。我们的框架可在GitHub上获取:此 https URL。

英文摘要

Machine learning weather prediction (MLWP) models have achieved impressive forecasting performance at a small fraction of the computational cost required by traditional physics-based methods. However, they are primarily (1) data-driven and (2) evaluated using pixel-wise error metrics (e.g., RMSE), so there are no guarantees that their forecasts are consistent with known physical laws. We introduce PhysMetrics.Weather, an evaluation framework that assesses the physical realism of MLWP models across three types of metrics: conservation, spectral, and dynamical. By quantifying physical realism, this tool guides the development of physics-informed architectures and helps evaluate whether MLWP models are reliable for operational use. Our framework is available on GitHub at this https URL.

2606.10069 2026-06-12 cs.LG physics.geo-ph 新提交

Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability

利用地震统计特征和VQ-VAE提升时空地震活动可预测性

Wei Quan, Denise Gorse

AI总结 本文在先前基于XGBoost和地震统计特征的研究基础上,将预测从全区域扩展到局部区域,并引入基于VQ-VAE模型从二维地震图提取的新特征,提升了局部地震预测性能。

详情
Comments
Title updated from "Spatiotemporal Seismic Hazard Assessment Using VQ-VAE and Seismic Statistical Features" to "Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability" in v2 to better reflect the focus of the paper. The content is unchanged apart from the title and minor copyediting
AI中文摘要

在本文中,我们基于先前的一项研究,该研究使用XGBoost以及日本和智利的地震目录数据证明,一组60个地震统计特征(SSFs)比tsfresh包中的428个通用时间序列特征具有更大的预测价值。我们在此以两种关键方式扩展了先前的工作,重点使用日本的数据,因为需要大数据集来训练深度学习(自编码器)模型。首先,我们从全区域预测(针对每个候选事件,考虑未来15天内区域内任何地方发生M≥5.0事件的可能性)转向局部预测,其中特征计算区域和预测区域都限制在候选事件周围半径24公里的圆内,并且我们表明性能仍然优秀,与先前同一区域的全局研究相似。其次,我们将基于一维(目录)数据的这套经过验证的SSFs与基于二维地震图的新特征相结合,该特征通过训练VQ-VAE模型以输出此类地图,并识别其误差度量与局部地壳应力积累的关系。我们表明,尽管仅基于SSFs的局部预测可以单独有效,测试AUC值与先前日本全局研究中的值一样高,但包含新的原生空间VQ-VAE衍生特征(通过SHAP分析排名最高)可以提升性能,并且似乎几乎完全取代了传统计算的b值在特征使用中的位置。

英文摘要

In this paper we build upon a previous study in which we demonstrated, using XGBoost and earthquake catalogue data from Japan and Chile, that a set of 60 seismic statistical features (SSFs) had much greater predictive value than a set of 428 generic time series features from the tsfresh package. We here extend this previous work in two key ways, focusing on data from Japan as a large dataset is necessary in order to allow for the training of a deep learning (autoencoder) model. First, we move from whole-region prediction (considering, for each candidate event, the likelihood of an event M $\geq$ 5.0 anywhere in the region in the next 15 days) to localised predictions in which both the region of feature computation and the region of prediction are restricted to a circle of radius 24 km around the candidate event, and we show that performance remains excellent, similar to our previous whole-region study for the same area. Second, we here couple this proven set of SSFs, based on one-dimensional (catalogue) data, with a novel feature based on two-dimensional seismic maps, obtained by training a VQ-VAE model to reproduce such maps as output and identifying a measure of its error in doing so with a localised build-up of crustal stress. We show that while localised prediction based on SSFs can be effective alone, with test AUC values as high as those obtained in the case of Japan in our previous whole-region study, the inclusion of the new natively-spatial VQ-VAE-derived feature, top-ranked by SHAP analysis, can enhance performance and additionally appears to near-wholly replace the traditionally-computed $b$-value in terms of feature usage.

2606.09073 2026-06-12 cs.LG cs.AI cs.CL 新提交

A Unifying Lens on Reward Uncertainty in RLHF

RLHF中奖励不确定性的统一视角

Ely Hahami, Yoel Zimmermann, Ray Zhou, Jack Benarroch Jedlicki

AI总结 本文提出使用分布奖励模型统一RLHF中的悲观主义方法,通过闭式有效奖励公式连接现有启发式方法,并揭示其隐含假设。

详情
AI中文摘要

基于人类反馈的强化学习(RLHF)受限于奖励破解,即策略利用代理奖励模型(RM)中的错误,产生高RM分数而缺乏真正的质量提升。一种自然的缓解方法是悲观主义:在RM不确定的区域惩罚奖励。然而,标准标量RM没有提供原则性的不确定性概念。我们认为正确的对象是分布奖励模型$p(r\mid x,y)$。在贝叶斯推断或KL分布鲁棒优化(KL-DRO)视角下,KL正则化的RLHF目标具有闭式有效奖励$\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$。悲观分支统一了RM集成聚合的先前启发式方法:均值聚合、最坏情况优化(WCO)和不确定性加权优化(UWO)都作为该单一表达式的极限或截断出现。这也澄清了每个现有规则的隐含假设。

英文摘要

Reinforcement learning from human feedback (RLHF) is bottlenecked by reward hacking, where the policy exploits errors in a proxy reward model (RM) and produces high RM scores without genuine quality gains. A natural mitigation is pessimism: lowering rewards in regions where the RM is uncertain. However, standard scalar RMs provide no principled notion of uncertainty. We argue that the right object is a distributional reward model $p(r\mid x,y)$. Under either a Bayesian inference or a KL-distributionally robust optimization (KL-DRO) lens, the KL-regularized RLHF objective admits a closed-form effective reward $\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$. The pessimistic branch unifies the prior heuristics for RM ensemble aggregation: mean aggregation, worst-case optimization (WCO), and uncertainty-weighted optimization (UWO) all emerge as limits or truncations of this single expression. This also clarifies the implicit assumptions of each existing rule.
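The closed-form effective reward can be evaluated directly on samples from an RM ensemble. The sketch below is our reading of the abstract, not the authors' code: it assumes the pessimistic branch is $-\beta\log\mathbb{E}_p[e^{-r/\beta}]$, estimates the expectation by a plain average over ensemble scores, and the function name is hypothetical. Under that assumption, a cumulant expansion gives $\tilde r \approx \mu - \sigma^2/(2\beta)$ for large $\beta$ (a mean-minus-variance rule), $\tilde r \to \mu$ as $\beta\to\infty$, and $\tilde r \to \min r$ as $\beta\to 0$, which is how we interpret "limits or truncations".

```python
import math

def pessimistic_reward(samples, beta):
    """Evaluate -beta * log(mean(exp(-r / beta))) with a stable log-mean-exp."""
    scaled = [-r / beta for r in samples]
    shift = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_mean_exp = shift + math.log(
        sum(math.exp(v - shift) for v in scaled) / len(scaled)
    )
    return -beta * log_mean_exp

# Hypothetical ensemble scores for one (prompt, response) pair.
ensemble_scores = [1.0, 2.0, 4.0]
mean_score = sum(ensemble_scores) / len(ensemble_scores)
variance = sum((r - mean_score) ** 2 for r in ensemble_scores) / len(ensemble_scores)

near_mean = pessimistic_reward(ensemble_scores, beta=1e4)    # close to the mean
near_worst = pessimistic_reward(ensemble_scores, beta=1e-3)  # close to the minimum
mid_beta = pessimistic_reward(ensemble_scores, beta=50.0)    # close to mean - var/(2*beta)
```

The single parameter $\beta$ therefore interpolates between trusting the ensemble average and trusting only its most pessimistic member.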

2606.08436 2026-06-12 cs.CV 新提交

CACR: Reinforcing Temporal Answer Grounding in Instructional Video via Candidate-Aware Causal Reasoning

通过候选感知因果推理增强教学视频中的时间答案定位

Muge Qi, Rong Fu, Pengbin Feng, Xianda Li, Yu Cai, Yifu Guo, Shizhe Zhang, Simon James Fong, Lei Ma, Bin Li

AI总结 提出候选感知因果推理框架,通过视觉-语言预训练候选选择和基于GRPO的时序逻辑推理,解决教学视频中复杂问题理解和长视频片段定位挑战,在六个基准上取得最优mIoU。

详情
AI中文摘要

教学视频中的时间答案定位任务旨在定位响应自然语言查询的精确视频片段,对于直接视频答案检索日益重要。由于需要理解语义复杂的问题并解决未修剪视频与短目标时刻之间的显著长度不匹配,该任务仍然具有挑战性。现有方法通常对无关内容敏感或视觉推理能力不足。为了解决这些局限性,我们提出了候选感知因果推理框架。我们的方法首先采用基于视觉-语言预训练的候选选择算法高效生成K个候选片段,然后应用由拒绝奖励机制增强并通过组相对策略优化优化的时序逻辑推理模块进行稳健推理。在六个基准上的大量实验表明,我们的方法在平均交并比方面达到了最先进的性能,为长视频中基于推理的检索提供了新视角。

英文摘要

The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segments that respond to natural language queries, is increasingly important for direct video answer retrieval. This task remains challenging due to the need to comprehend semantically complex questions and to address the significant length mismatch between untrimmed videos and short target moments. Existing methods often suffer from sensitivity to irrelevant content or insufficient visual reasoning capabilities. To tackle these limitations, we propose a Candidate-Aware Causal Reasoning (CACR) framework. Our approach first employs a Visual-Language Pre-training based Candidate Selection (VBCS) algorithm to efficiently generate K candidate segments, then applies a temporal logic reasoning module enhanced by a rejection reward mechanism and optimized via Group Relative Policy Optimization (GRPO) for robust inference. Extensive experiments on six benchmarks demonstrate that our method achieves state-of-the-art performance in terms of mean Intersection-over-Union (mIoU), providing a new perspective for reasoning-based retrieval in long videos.
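For readers unfamiliar with the reported metric, mean Intersection-over-Union for temporal grounding is commonly computed from predicted and ground-truth `(start, end)` spans as below. This is a generic sketch of the standard definition with hypothetical function names; the exact evaluation protocol used on the six benchmarks is not given in the abstract.

```python
def temporal_iou(predicted, target):
    """IoU of two time spans given as (start, end) tuples in seconds."""
    intersection = max(0.0, min(predicted[1], target[1]) - max(predicted[0], target[0]))
    union = (predicted[1] - predicted[0]) + (target[1] - target[0]) - intersection
    return intersection / union if union > 0 else 0.0

def mean_iou(predictions, targets):
    """Average temporal IoU over paired predictions and ground-truth spans."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need equally sized, non-empty lists of spans")
    scores = [temporal_iou(p, t) for p, t in zip(predictions, targets)]
    return sum(scores) / len(scores)
```

A prediction that misses the target entirely scores 0 and an exact match scores 1, so short target moments in long untrimmed videos are penalized heavily for even small boundary errors.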