arXivDaily: arXiv daily academic digest, updated Monday through Friday
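2606.12576 2026-06-12 cs.CL New submission

Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures

Ishani Mondal, Javad Baghirov, Jordan Boyd-Graber

AI summary: Proposes the MINARD pipeline, which generates narrated videos based on region decomposition from a figure and its paper, and releases the FigTalk benchmark; outperforms existing methods in automatic and human evaluation.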

Comments
Webpage: https://minard.vercel.app/

Abstract

Scientific figures compress complex pipelines into a single canvas, yet understanding them requires paper-grounded, step-by-step narration aligned with visual highlights, a capability missing from current video generation systems and benchmarks. To address this, we introduce paper-grounded figure-to-video generation: generating narrated, region-grounded walkthrough videos from a figure and its paper. We propose MINARD (Multimodal Interpretation of Narrated Architecture via Region Decomposition), a pipeline that generates paper-grounded narrations and sequentially grounds them to figure regions. We also release FigTalk, a benchmark with new sequential and component-level grounding metrics. On FigTalk, MINARD generates humanlike, paper-faithful narrations and outperforms existing approaches on narration-conditioned spatial grounding of figures in both automatic and human evaluation.

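2606.12505 2026-06-12 cs.LG cs.AI New submission

Boosting Direct Preference Optimization with Penalization

Pengwei Sun

AI summary: Proposes DPOP, which adds to the DPO loss a gated penalty on the reference model's greedy responses, activated only when the current policy assigns the preferred response a lower probability than the rejected one; significantly improves win rate on AlpacaEval 2.0.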

Comments
Accepted at ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

Abstract

Offline preference optimization has become a practical substitute for reinforcement learning from human feedback, but pairwise objectives such as Direct Preference Optimization (DPO) and its variants use only the chosen and rejected responses stored in a static dataset. This leaves a useful signal unused: the response that the reference model itself would generate for the same prompt. We propose Direct Preference Optimization with Penalization (DPOP), a simple extension of DPO that augments the base preference loss with a gated penalty on reference-greedy responses. DPOP activates this penalty only when the current policy still assigns a lower likelihood to the preferred response than to the rejected response. On AlpacaEval 2.0, DPOP improves length-controlled win rate over DPO, SimPO, and AlphaDPO on both Llama-3-8b-it and Gemma-2-9b-it, achieving relative gains of 5.3% and 4.4% over baselines on the two models, respectively. Ablations further show that a SimNPO-style length-normalized penalty is stronger than NPO and token-level unlikelihood in this setting.
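
A minimal sketch of how a gated penalty can sit on top of the DPO loss, written for scalar sequence log-probabilities. The abstract does not give DPOP's formula, so everything beyond the standard DPO loss is an assumption made for illustration: the gate compares raw sequence log-likelihoods, the penalty takes a SimNPO-style length-normalized form, and the names `lam`, `gamma`, `logp_greedy`, and `len_greedy` are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_w / ref_l:   reference-model log-probs of the same responses.
    """
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -log_sigmoid(margin)


def gated_penalty_loss(logp_w, logp_l, ref_w, ref_l,
                       logp_greedy, len_greedy,
                       beta=0.1, lam=1.0, gamma=0.0):
    """DPO loss plus a gated penalty on the reference-greedy response.

    logp_greedy: policy log-prob of the reference model's greedy response.
    len_greedy:  token length of that response.
    The gate, penalty form, and weight `lam` are assumptions, not the
    paper's definition.
    """
    base = dpo_loss(logp_w, logp_l, ref_w, ref_l, beta)
    # Gate: active only while the policy still ranks the pair the wrong way.
    gate = 1.0 if logp_w < logp_l else 0.0
    # Assumed SimNPO-style term: it shrinks toward zero as the policy's
    # length-normalized likelihood of the greedy response decreases.
    penalty = -(2.0 / beta) * log_sigmoid(-(beta / len_greedy) * logp_greedy - gamma)
    return base + lam * gate * penalty
```

When `logp_w` is not below `logp_l` the gate is closed and the function returns the plain DPO loss. A real implementation would operate on batched tensors with automatic differentiation rather than Python floats.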

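2606.12500 2026-06-12 cs.LG cs.AI New submission

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

Xian Liu, Carlo G. Prato, Gustav Markkula

AI summary: Replaces conventional rule-based models with machine-learning behaviour models in traffic microsimulation and predicts crash frequency by applying Extreme Value Theory to simulated conflicts; validation at five signalised intersections in Leeds, UK, shows that the ML model improves prediction accuracy without site-specific calibration.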

Abstract

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.

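2606.12483 2026-06-12 cs.LG New submission

Scalable anomaly detection via a univariate Christoffel function

Florian Grivet, Didier Henrion, Jean-Bernard Lasserre, Louise Travé-Massuyès

AI summary: Christoffel function methods are hard to apply to high-dimensional data because the matrix size grows exponentially with the dimension; this paper proposes a univariate Christoffel function (UCF) based on the squared distance between the query point and the support points, which outperforms 14 baselines in Average Precision on the ADBench benchmark.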

Abstract

Anomaly detection plays a critical role in identifying unusual patterns across domains such as fraud detection, network intrusion, and system fault diagnosis. Recently, Christoffel function-based methods, rooted in polynomial optimization, have emerged as promising alternatives to deep learning due to their strong mathematical foundations and computational frugality. However, their practical applicability is hindered by the need to invert a matrix whose size grows exponentially with the data dimension, rendering the method intractable even for moderate-dimensional datasets. This paper addresses the dimensionality limitations of Christoffel function-based anomaly detection while preserving its key theoretical properties, i.e., the on-off support dichotomy behavior and the accurate support shape capture. We introduce UCF, a univariate Christoffel function which is based on the squared distance between the query point and the support points. Extensive experiments on the ADBench benchmark demonstrate that UCF consistently outperforms 14 state-of-the-art baselines in terms of Average Precision. By resolving the scalability bottleneck of the Christoffel Function, this work expands the toolkit of anomaly detection methods with a robust, theoretically grounded, and universally applicable approach.
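
To make the scalability bottleneck concrete: the classical multivariate Christoffel function of degree d is built from a moment matrix indexed by all monomials of total degree at most d in n variables, and there are C(n + d, d) of them. The sketch below only computes that matrix size. The function name is hypothetical, the degree 4 is an arbitrary choice for illustration, and UCF's own construction is not reproduced here.

```python
from math import comb


def moment_matrix_side(n_features: int, degree: int) -> int:
    """Number of monomials of total degree <= `degree` in `n_features` variables.

    This is the side length of the moment matrix that the classical
    multivariate Christoffel function requires inverting.
    """
    return comb(n_features + degree, degree)


if __name__ == "__main__":
    for n in (2, 10, 50, 100):
        side = moment_matrix_side(n, degree=4)
        print(f"n={n:>3}: moment matrix is {side} x {side}")
```

At degree 4 the side length goes from 15 for 2 features to 316251 for 50 features, whereas a polynomial of degree d in a single variable has only d + 1 monomials regardless of the data dimension.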

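2606.13634 2026-06-12 cs.CL math.CT New submission

Operads for compositional reasoning in LLMs

Nathaniel Bottman, Kyle Richardson

AI summary: Proposes operads as a mathematical framework for question decomposition, defines the questions operad Q, interprets QA models as algebras over Q, and introduces an operadic consistency measure that experiments show is strongly correlated with accuracy.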

Abstract

Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical foundation. In this paper, we propose operads, mathematical structures that model many-in, one-out operations and compositions thereof, as a natural framework for describing question decomposition. We define the questions operad $Q$, in which operations correspond to question templates and composition corresponds to substitution of sub-answers, and show how QA models can be interpreted as algebras over $Q$. Beyond reframing existing practice, this operadic perspective points toward new methods, in particular a notion of operadic consistency, which measures whether a QA model's answers agree across the partial collapses of a question decomposition tree. Empirical evaluation of operadic consistency is reported in our companion paper (Bottman, Liu, and Richardson, 2026), which finds it strongly correlated with accuracy across twelve LLMs and four multi-hop QA datasets and outperforming standard temperature-based self-consistency baselines. We argue that operads are the natural mathematical home for question decomposition, and that invariants such as operadic consistency open new directions for analyzing and improving the reliability of multi-step reasoning.

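2606.13092 2026-06-12 cs.LG cs.RO math.DS New submission

Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

Hongbo Wang

AI summary: For equivariant latent world models, proposes a computable multi-step certificate of the predictable horizon, proving that the T-step rollout error is constant over each symmetry orbit and stratified by the Lyapunov spectrum, and that the certificate is exclusive to equivariant models.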

Comments
23 pages (9 main + appendices). Code: https://github.com/TimothyWang418/se3-ejepa

Abstract

Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel-by-channel by the predictor's Lyapunov spectrum, $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$. The horizon is two-sided -- a matching lower bound makes approximate equivariance provably horizon-limited -- and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2=0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate acts, a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot -- with zero calibration data. The same read-out, unchanged, audits public pretrained world models training-free: TD-MPC2 checkpoints land on the certificate's own scope taxonomy -- calibrated where strongly expansive (ratio 0.94-1.02), optimistic where weakly expansive, correctly abstaining where contracting -- a map a deployed monitor replicates cell-by-cell, out-of-sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameters. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum -- the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.
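
The horizon scaling quoted in the abstract, log(1/eps) divided by the j-th Lyapunov exponent, can be evaluated directly once a spectrum is available. The sketch below is a leading-order calculation only: constant factors are dropped, the time unit follows whatever unit the exponents are expressed in, and returning `None` for non-positive exponents is a hypothetical convention standing in for the "abstain where contracting" behaviour the abstract mentions. The function name is also an assumption.

```python
import math
from typing import List, Optional, Sequence


def predictable_horizons(lyapunov_exponents: Sequence[float],
                         eps: float) -> List[Optional[float]]:
    """Leading-order horizon log(1/eps) / lambda_j for each channel.

    eps is the tolerated relative error, in (0, 1). Channels with a
    non-positive exponent get None (assumed convention: no finite
    horizon is reported for them).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    log_term = math.log(1.0 / eps)
    return [log_term / lam if lam > 0.0 else None
            for lam in lyapunov_exponents]
```

Doubling an exponent halves the corresponding horizon, while tightening `eps` by a constant factor only adds a constant to the numerator, which is why the spectrum rather than the tolerance dominates the estimate.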

2606.12691 2026-06-12 cs.LG cs.AI cs.SY eess.SY math.OC stat.ML New submission

Two-Layer Linear Auto-Regressive Models Estimate Latent States

Yahya Sattar, Sunmook Choi, Leo Maynard-Zhang, Yassir Jedra, Maryam Fazel, Sarah Dean

AI summary: Proves that two-layer linear auto-regressive models trained by empirical risk minimization approximate Kalman filtering and recover latent state estimates, with finite-sample guarantees.

Comments
ICML 2026

Abstract

Auto-regressive models have emerged as powerful tools for sequential data, from language to video. Understanding how and why these models learn latent representations remains an open theoretical question. In this work, we demonstrate that when trained by empirical risk minimization on data from partially observed linear dynamical systems, two-layer linear auto-regressive models naturally learn to approximate Kalman filtering. In particular, we show that the learned hidden representation coincides, up to a similarity transformation, with the state estimates produced by the optimal (Kalman) filter, even though the model has no explicit knowledge of the underlying dynamics or state. The result follows from three main insights. First, we establish that the Kalman filter is well approximated by an auto-regressive model with bounded truncation error. Second, we show that despite non-convexity, the two-layer optimization landscape is benign, i.e., all stationary points are either strict saddles or global minima. Finally, as our main contributions, we provide finite-sample guarantees on prediction error, parameter estimation error, and latent state recovery. Numerical simulations support the theoretical results and demonstrate that the latent representations of auto-regressive models recover state estimates.
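
One way to see the first insight, sketched under assumptions not stated in the abstract: take the steady-state Kalman predictor with state matrix $A$, observation matrix $C$, and gain $L$ (all three symbols are notation introduced here), and unroll its recursion over the last $H$ observations.

```latex
\begin{align*}
\hat{x}_{t+1} &= A\hat{x}_t + L\,(y_t - C\hat{x}_t) = (A - LC)\,\hat{x}_t + L\,y_t \\
              &= \sum_{k=0}^{H-1} (A - LC)^k L\, y_{t-k} \;+\; (A - LC)^H\, \hat{x}_{t+1-H}, \\
\hat{y}_{t+1} &= C\hat{x}_{t+1} \;\approx\; \sum_{k=0}^{H-1} C\,(A - LC)^k L\, y_{t-k}.
\end{align*}
```

When the spectral radius of $A - LC$ is below one, the dropped term $(A - LC)^H \hat{x}_{t+1-H}$ shrinks geometrically in $H$, which is the sense in which a finite-order auto-regressive model can approximate the filter. The weights $C(A - LC)^k L$ also factor into an inner map from past observations to $\hat{x}_{t+1}$ followed by the outer map $C$, which has a two-layer linear shape. That factorization is determined only up to an invertible change of basis, consistent with recovery up to a similarity transformation.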

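2606.13341 2026-06-12 cs.CV cs.AI physics.med-ph New submission

Dual-Domain Equivariant Generative Adversarial Network for Multimodal CT-PET Synthesis

Gabriel Steele, Alzahra Altalib, Alessandro Perelli

AI summary: Proposes a Dual-Domain Equivariant Generative Adversarial Network (DDE-GAN) that learns jointly in the spatial and frequency domains and incorporates rotational equivariance for high-fidelity multimodal CT-PET image synthesis.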

Comments
4 pages, 3 figures, 1 table, 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI)

Abstract

We present a Dual-Domain Equivariant Generative Adversarial Network (DDE-GAN) for multimodal CT-PET image synthesis. Traditional GAN-based approaches often operate solely in the spatial domain and ignore geometric consistency, resulting in limited structural fidelity. DDE-GAN addresses these challenges by jointly learning from both spatial and frequency (Fourier) domains, capturing complementary anatomical and spectral information. Furthermore, rotational equivariance embedded in the physics of the CT and PET measurements is integrated into the loss of both the generator and the discriminator to ensure consistent responses under rotations, improving anatomical accuracy. A hierarchical dual-domain training strategy enforces intra- and inter-domain consistency through multi-stage loss functions. Evaluated on the HECKTOR 2022 CT-PET dataset, DDE-GAN achieves superior synthesis quality over baseline models for CT-PET image synthesis. The results demonstrate that combining dual-domain learning with geometric equivariance substantially enhances multimodal image synthesis accuracy and robustness, enabling practical applications in PET completion and data augmentation.

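2606.13568 2026-06-12 cs.LG math-ph math.MP New submission

Adjusted Cup-Product Neural Layer

Snigdha Chandan Khilar

AI summary: Proposes the adjusted cup-product neural layer, which hard-wires the cup product together with an adjustment term from higher gauge theory to obtain a gauge-invariant readout, and proves that the adjustment coefficient is the only source of signal.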

Abstract

Many important observables in physics and geometry are cup products of cochains. This paper introduces the adjusted cup-product neural layer, a neural primitive that hard-wires the cup product together with an adjustment term from higher gauge theory. This yields a readout that is gauge invariant by design. The main theoretical result shows that on a closed cycle the output depends entirely on the adjustment coefficient: setting this coefficient to zero removes the output completely, regardless of the other parameters. Thus the adjustment is the only source of gauge-invariant signal. The authors prove that this observable is a nonzero quadratic form and is exactly invariant under one and two gauge transformations.

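2606.12368 2026-06-12 cs.CV New submission

DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

Pengfei Wang, Shihao Wang, Liyi Chen, Zhiyuan Ma, Guowen Zhang, Lei Zhang

AI summary: Proposes DepthMaster, a unified framework that decomposes panoramas into overlapping perspective patches and introduces a Correspondence Consistency Loss and virtual projection cameras as geometric priors, addressing both the geometric discrepancy between perspective and panoramic depth estimation and the scarcity of panoramic data; achieves state-of-the-art zero-shot performance on 13 datasets.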

Abstract

While monocular depth estimation has achieved significant progress, achieving generalized metric depth estimation for both narrow field-of-view (FoV) perspectives and $360^\circ$ panoramas remains an unsolved challenge. Existing methods are often tailored to specific camera types and struggle to produce accurate metric depth that generalizes across diverse settings. This limitation stems from two key challenges: the inherent geometric discrepancy between perspective and panoramic cameras, and the scarcity of panoramic training data with metric annotations. In this work, we introduce DepthMaster, a unified metric depth estimation framework. Rather than employing specialized networks to learn spherical distortions, we reformulate the problem by decomposing panoramic images into overlapping perspective patches. Crucially, distinct from prior projection-based methods that rely on ad-hoc architectural modifications to handle boundaries, we introduce a novel Correspondence Consistency Loss (CCL) and inject virtual projection cameras as geometric priors, allowing us to seamlessly stitch the patches while avoiding specialized operators and keeping the backbone largely compatible with standard Transformer designs. This strategy also resolves the geometric differences by unifying all inputs into a canonical perspective representation, and effectively circumvents data scarcity by directly unlocking powerful metric priors from vast perspective datasets. Trained on a mixed dataset that contains only one panorama dataset, DepthMaster achieves state-of-the-art zero-shot performance on 13 diverse datasets, outperforming not only universal methods but also leading specialist models in both perspective and panoramic domains.

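2606.12040 2026-06-12 cs.AI cs.GR New submission

A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design

Wanting Wang, Xiye Ma, Yuyang He, Minghui Cheng, Ran Cao

AI summary: Proposes an AutoGen-based closed-loop "generation-evaluation-optimization" multi-agent framework for automated concrete barrier design that reaches over 98% accuracy; an 8B-parameter lightweight model can outperform a 631B-parameter flagship model.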

Abstract

The design of reinforced concrete highway barriers is a safety-critical process that requires strict compliance with regulatory provisions such as the AASHTO-LRFD bridge design guidelines. Current engineering practice relies heavily on manual, iterative, and heuristic calculations to satisfy complex nonlinear material and mechanics constraints. Although Large Language Models (LLMs) demonstrate strong generative capabilities, their direct application to structural engineering remains limited by hallucination risks and insufficient physical grounding. To address these challenges, this study proposes a novel "generation-evaluation-optimization" closed-loop framework for automated concrete barrier design using the multi-agent orchestration capabilities of AutoGen. Experimental results demonstrate that the proposed agentic framework achieves over 98% design accuracy, significantly outperforming standalone general-purpose LLMs. More importantly, the study reveals that design performance is not necessarily correlated with model scale, where an 8B-parameter lightweight model could outperform unconstrained 631B-parameter flagship models. This finding highlights the potential to substantially reduce computational costs while improving the accessibility of AI-assisted engineering tools for industry applications. The source code for the proposed multi-agent design framework is available at the project GitHub repository: https://github.com/MXY820/barrier-design. Keywords: Structural Engineering; Multi-Agent Systems; Large Language Models; Concrete Barrier Design; AutoGen; Design Automation.

2606.12025 2026-06-12 cs.AI New submission

Human-Enhanced Loop Modeling (HELM): Agent-Based Finite Element Modeling of Concrete Bridge Barriers

Quankai Wang, Yulin Xie, Tongfei Yang, Minghui Cheng, Ran Cao

AI summary: Proposes the HELM framework, which uses human-agent collaboration to decompose finite element modeling into verifiable checkpoints, raising the autonomous modeling success rate from 20% to 75% under MASH TL-4 and TL-5 conditions.

Abstract

Finite element (FE) modeling of safety-critical infrastructure such as bridge barriers requires high-fidelity nonlinear dynamic analysis, yet the current FE modeling process remains labor-intensive and lacks automation. This paper presents the Human-Enhanced Loop Modeling (HELM) framework, a collaborative human-agent protocol that decomposes long-sequence finite element modeling into discrete, visually verifiable checkpoints across geometry generation, boundary condition definition, and material assignment. The framework is demonstrated through a 20-case matrix of reinforced concrete bridge barriers under MASH TL-4 and TL-5 lateral loading conditions, interfacing specialized agents with two widely used commercial FE software packages, i.e., ANSYS and LS-PrePost. Experimental results show that HELM improves the baseline autonomous modeling success rate from 20% to 75%, with agent-level pass rates for geometry and boundary condition tasks approximately doubling. Error analysis reveals that spatial reasoning and algebraic logic limitations constitute the primary failure modes, underscoring the value of structured human-in-the-loop intervention for modeling automation. The complete agent design code and prompts are open-sourced and can be accessed at: https://github.com/SimAgentDev/Ansys-LSPP-AgentKit.

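2606.11104 2026-06-12 cs.LG math.CA stat.ML New submission

Limitations of Learning Tanh Neural Networks with Finite Precision

Philipp Grohs, Matěj Trödler

AI summary: Under finite-precision computation and $L^p$ accuracy guarantees, constructs sharply localized bump functions to prove that adaptive randomized algorithms cannot converge faster than the Monte Carlo rate $O(m^{-1/p})$ in the $L^p$ norm unless the sampling budget grows exponentially with the network parameters and architecture.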

Abstract

We investigate limitations of learning $\tanh$ neural networks from point evaluations under finite-precision computations and $L^p$ accuracy guarantees, building on Berner, Grohs, and Voigtländer (2023). Our approach is based on a novel construction of sharply localized bump functions via iterated $\tanh$ activations. Using this mechanism, we show that, in a finite-precision setting, no adaptive randomized algorithm based on $m$ samples can achieve a convergence rate higher than the Monte Carlo rate $O(m^{-1/p})$ in the $L^p$ norm, unless the sampling budget grows exponentially with the size of the network parameters and architecture. The results reveal fundamental limitations imposed by finite precision on the learnability of classes containing localized bump functions, extending previous results for ReLU networks to the $\tanh$ setting.

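2606.10931 2026-06-12 cs.CL New submission

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

Naihao Deng, Yilun Zhu, Naichen Shi, Clayton Scott, Rada Mihalcea

AI summary: Finds that one-shot GRPO training on a single biased example can induce systematic bias in large language models, with stereotype-driven reasoning generalizing across attributes, categories, and benchmarks, revealing a critical vulnerability in alignment.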

Abstract

Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training to ensure fair and reliable behavior. In this work, we investigate how easily such guardrails can be broken by Group Relative Policy Optimization (GRPO). We show that one-shot GRPO training on a single biased example is sufficient to induce systematic bias, with stereotype-driven reasoning generalizing across attributes, categories, and benchmarks. We further find that models differ in their susceptibility based on the initial likelihood of producing biased outputs. Our results reveal a critical vulnerability in post-training: alignment can be overridden by a single example.

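2606.10200 2026-06-12 cs.CV cs.AI cs.LG New submission

An Improved Generative Adversarial Network for Micro-Resistivity Imaging Logging Restoration

Ahmed Faizul Haque, S. M. Riaz Rahman Antu, Saif Ahmed, Asadullah Hil Galib, Souvik Pramanik, Mohammad Ashrafuzzaman Khan, Mohammad Abdul Qayum, Mohsin Sajjad

AI summary: Proposes an improved GAN-based restoration method for imaging logging images that uses an FCN generator, depthwise separable convolutional residual blocks, an Inception module, multi-scale feature extraction, and spatial attention, combined with global and local discriminators; restores missing regions effectively, with a structural similarity of 0.903.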

Comments
Mistakes in citations and references. Further we want to submit in conference with improved experiments and results

Abstract

This paper presents an improved GAN-based restoration method for micro-resistivity imaging logging images with partially missing regions. The method uses an FCN as the backbone of the generative network and adds a depthwise separable convolutional residual block to learn and retain more effective pixel and semantic information; an Inception module is added to enlarge the multi-scale receptive field of the network and reduce its number of parameters; and a multi-scale feature extraction module and a spatial attention residual block are added, which combine the channel attention mechanism with the residual block to achieve multi-scale feature extraction. A global discriminative network and a local discriminative network are designed to gradually improve the coherence of content and semantic structure between the restored parts and the whole image through adversarial training against the generative network. In the experiments, the average structural similarity over the five sets of imaging logging images with different sizes of missing regions in the test set is 0.903, an improvement of about 0.3 over other similar methods. The results show that the method can restore micro-resistivity imaging logging images with improved semantic structural coherence and texture detail, providing a new deep learning method to support the subsequent interpretation of these images.

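2606.10642 2026-06-12 cs.LG physics.ao-ph New submission

PhysMetrics.Weather: An Evaluation Framework for Physical Consistency in ML Weather Models

Emma Kasteleyn, Timo Maier, Axel Lauer, Veronika Eyring, Pierre Gentine, Ana Lucic

AI summary: Proposes the PhysMetrics.Weather evaluation framework, which quantifies the physical realism of MLWP models through three types of metrics (conservation, spectral, and dynamical) to guide the development of physics-informed architectures and assess operational reliability.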

Comments
Preprint

Abstract

Machine learning weather prediction (MLWP) models have achieved impressive forecasting performance at a small fraction of the computational costs required for traditional physics-based methods. However, they are primarily (1) data-driven and (2) evaluated using pixel-wise error metrics (e.g., RMSE), so there are no guarantees that their forecasts are consistent with known physical laws. We introduce PhysMetrics.Weather, an evaluation framework that assesses the physical realism of MLWP models across three types of metrics: conservation, spectral, and dynamical. By quantifying physical realism, this tool guides the development of physics-informed architectures and helps evaluate whether MLWP models are reliable for operational use. Our framework is available on GitHub at https://github.com/Emmakast/PhysMetrics.Weather.

2606.10069 2026-06-12 cs.LG physics.geo-ph 新提交

Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability

利用地震统计特征和VQ-VAE提升时空地震活动可预测性

Wei Quan, Denise Gorse

AI总结 本文在先前基于XGBoost和地震统计特征的研究基础上,将预测从全区域扩展到局部区域,并引入基于VQ-VAE模型从二维地震图提取的新特征,提升了局部地震预测性能。

详情
Comments
Title updated from "Spatiotemporal Seismic Hazard Assessment Using VQ-VAE and Seismic Statistical Features" to "Using Seismic Statistical Features and VQ-VAE to Improve Spatiotemporal Seismicity Predictability" in v2 to better reflect the focus of the paper. The content is unchanged apart from the title and minor copyediting
AI中文摘要

在本文中,我们基于先前的一项研究,该研究使用XGBoost以及日本和智利的地震目录数据证明,一组60个地震统计特征(SSFs)比tsfresh包中的428个通用时间序列特征具有更大的预测价值。我们在此以两种关键方式扩展了先前的工作,重点使用日本的数据,因为需要大数据集来训练深度学习(自编码器)模型。首先,我们从全区域预测(针对每个候选事件,考虑未来15天内区域内任何地方发生M≥5.0事件的可能性)转向局部预测,其中特征计算区域和预测区域都限制在候选事件周围半径24公里的圆内,并且我们表明性能仍然优秀,与先前同一区域的全局研究相似。其次,我们将基于一维(目录)数据的这套经过验证的SSFs与基于二维地震图的新特征相结合,该特征通过训练VQ-VAE模型以输出此类地图,并识别其误差度量与局部地壳应力积累的关系。我们表明,尽管仅基于SSFs的局部预测可以单独有效,测试AUC值与先前日本全局研究中的值一样高,但包含新的原生空间VQ-VAE衍生特征(通过SHAP分析排名最高)可以提升性能,并且似乎几乎完全取代了传统计算的b值在特征使用中的位置。

英文摘要

In this paper we build upon a previous study in which we demonstrated, using XGBoost and earthquake catalogue data from Japan and Chile, that a set of 60 seismic statistical features (SSFs) had much greater predictive value than a set of 428 generic time series features from the tsfresh package. We here extend this previous work in two key ways, focusing on data from Japan as a large dataset is necessary in order to allow for the training of a deep learning (autoencoder) model. First, we move from whole-region prediction (considering, for each candidate event, the likelihood of an event M $\geq$ 5.0 anywhere in the region in the next 15 days) to localised predictions in which both the region of feature computation and the region of prediction are restricted to a circle of radius 24 km around the candidate event, and we show that performance remains excellent, similar to our previous whole-region study for the same area. Second, we here couple this proven set of SSFs, based on one-dimensional (catalogue) data, with a novel feature based on two-dimensional seismic maps, obtained by training a VQ-VAE model to reproduce such maps as output and identifying a measure of its error in doing so with a localised build-up of crustal stress. We show that while localised prediction based on SSFs can be effective alone, with test AUC values as high as those obtained in the case of Japan in our previous whole-region study, the inclusion of the new natively-spatial VQ-VAE-derived feature, top-ranked by SHAP analysis, can enhance performance and additionally appears to near-wholly replace the traditionally-computed $b$-value in terms of feature usage.
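For readers unfamiliar with the "traditionally-computed $b$-value" mentioned above, the following is a minimal sketch of the standard Aki-Utsu maximum-likelihood estimator of the Gutenberg-Richter $b$-value. The abstract does not say how the authors compute it, so the estimator choice, the function name `b_value_aki` and the parameters `completeness_mag` and `bin_width` are assumptions made for illustration only.

```python
import math


def b_value_aki(magnitudes, completeness_mag, bin_width=0.0):
    """Aki-Utsu maximum-likelihood b-value estimate (illustrative sketch).

    b = log10(e) / (mean(M) - (Mc - bin_width / 2)), using only events with
    M >= Mc. bin_width = 0 corresponds to continuous (unbinned) magnitudes.
    """
    selected = [m for m in magnitudes if m >= completeness_mag]
    if not selected:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(selected) / len(selected)
    excess = mean_mag - (completeness_mag - bin_width / 2.0)
    if excess <= 0.0:
        raise ValueError("mean magnitude must exceed the corrected threshold")
    return math.log10(math.e) / excess


# Hypothetical catalogue: events below Mc = 2.0 are ignored by the estimator.
catalogue = [1.5, 2.0 + math.log10(math.e), 2.0 + math.log10(math.e)]
b = b_value_aki(catalogue, completeness_mag=2.0)
```

A larger mean excess magnitude gives a smaller $b$, which is why the value is often read as a proxy for the relative share of large events in a window.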

2606.09073 2026-06-12 cs.LG cs.AI cs.CL 新提交

A Unifying Lens on Reward Uncertainty in RLHF

RLHF中奖励不确定性的统一视角

Ely Hahami, Yoel Zimmermann, Ray Zhou, Jack Benarroch Jedlicki

AI总结 本文提出使用分布奖励模型统一RLHF中的悲观主义方法,通过闭式有效奖励公式连接现有启发式方法,并揭示其隐含假设。

详情
AI中文摘要

基于人类反馈的强化学习(RLHF)受限于奖励破解,即策略利用代理奖励模型(RM)中的错误,产生高RM分数而缺乏真正的质量提升。一种自然的缓解方法是悲观主义:在RM不确定的区域惩罚奖励。然而,标准标量RM没有提供原则性的不确定性概念。我们认为正确的对象是分布奖励模型$p(r\mid x,y)$。在贝叶斯推断或KL分布鲁棒优化(KL-DRO)视角下,KL正则化的RLHF目标具有闭式有效奖励$\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$。悲观分支统一了RM集成聚合的先前启发式方法:均值聚合、最坏情况优化(WCO)和不确定性加权优化(UWO)都作为该单一表达式的极限或截断出现。这也澄清了每个现有规则的隐含假设。

英文摘要

Reinforcement learning from human feedback (RLHF) is bottlenecked by reward hacking, where the policy exploits errors in a proxy reward model (RM) and produces high RM scores without genuine quality gains. A natural mitigation is pessimism: lowering rewards in regions where the RM is uncertain. However, standard scalar RMs provide no principled notion of uncertainty. We argue that the right object is a distributional reward model $p(r\mid x,y)$. Under either a Bayesian inference or a KL-distributionally robust optimization (KL-DRO) lens, the KL-regularized RLHF objective admits a closed-form effective reward $\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$. The pessimistic branch unifies the prior heuristics for RM ensemble aggregation: mean aggregation, worst-case optimization (WCO), and uncertainty-weighted optimization (UWO) all emerge as limits or truncations of this single expression. This also clarifies the implicit assumptions of each existing rule.
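The pessimistic branch of the closed form can be checked numerically. The sketch below assumes that $p(r\mid x,y)$ is approximated by a uniform distribution over the outputs of a small RM ensemble; the function name `effective_reward` and the ensemble values are hypothetical. It illustrates two limits that follow directly from the formula (large $\beta$ recovers the mean, small $\beta$ recovers the minimum) and the second-order expansion $\mu - \sigma^2/(2\beta)$, which may correspond to the variance-penalised rule the abstract calls UWO.

```python
import math


def effective_reward(rewards, beta, pessimistic=True):
    """Compute s * beta * log(mean(exp(s * r / beta))) with s = -1 or +1.

    s = -1 is the pessimistic branch, s = +1 the optimistic one. The
    expectation is taken under a uniform distribution over `rewards`.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    s = -1.0 if pessimistic else 1.0
    scaled = [s * r / beta for r in rewards]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean = shift + math.log(
        sum(math.exp(z - shift) for z in scaled) / len(scaled)
    )
    return s * beta * log_mean


ensemble = [1.0, 2.0, 3.0]  # hypothetical ensemble outputs for one (x, y)
mean_r = sum(ensemble) / len(ensemble)
var_r = sum((r - mean_r) ** 2 for r in ensemble) / len(ensemble)

near_mean = effective_reward(ensemble, beta=1e6)   # large beta: close to the mean
near_min = effective_reward(ensemble, beta=1e-3)   # small beta: close to the minimum
exact_at_5 = effective_reward(ensemble, beta=5.0)
second_order = mean_r - var_r / (2 * 5.0)          # mean minus variance penalty
```

By Jensen's inequality the pessimistic value never exceeds the ensemble mean and the optimistic value never falls below it, so $\beta$ acts as a single knob between averaging and worst-case aggregation.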

2606.08436 2026-06-12 cs.CV 新提交

CACR: Reinforcing Temporal Answer Grounding in Instructional Video via Candidate-Aware Causal Reasoning

通过候选感知因果推理增强教学视频中的时间答案定位

Muge Qi, Rong Fu, Pengbin Feng, Xianda Li, Yu Cai, Yifu Guo, Shizhe Zhang, Simon James Fong, Lei Ma, Bin Li

AI总结 提出候选感知因果推理框架,通过视觉-语言预训练候选选择和基于GRPO的时序逻辑推理,解决教学视频中复杂问题理解和长视频片段定位挑战,在六个基准上取得最优mIoU。

详情
AI中文摘要

教学视频中的时间答案定位任务旨在定位响应自然语言查询的精确视频片段,对于直接视频答案检索日益重要。由于需要理解语义复杂的问题并解决未修剪视频与短目标时刻之间的显著长度不匹配,该任务仍然具有挑战性。现有方法通常对无关内容敏感或视觉推理能力不足。为了解决这些局限性,我们提出了候选感知因果推理框架。我们的方法首先采用基于视觉-语言预训练的候选选择算法高效生成K个候选片段,然后应用由拒绝奖励机制增强并通过组相对策略优化优化的时序逻辑推理模块进行稳健推理。在六个基准上的大量实验表明,我们的方法在平均交并比方面达到了最先进的性能,为长视频中基于推理的检索提供了新视角。

英文摘要

The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segments that respond to natural language queries, is increasingly important for direct video answer retrieval. This task remains challenging due to the need to comprehend semantically complex questions and to address the significant length mismatch between untrimmed videos and short target moments. Existing methods often suffer from sensitivity to irrelevant content or insufficient visual reasoning capabilities. To tackle these limitations, we propose a Candidate-Aware Causal Reasoning (CACR) framework. Our approach first employs a Visual-Language Pre-training based Candidate Selection (VBCS) algorithm to efficiently generate K candidate segments, then applies a temporal logic reasoning module enhanced by a rejection reward mechanism and optimized via Group Relative Policy Optimization (GRPO) for robust inference. Extensive experiments on six benchmarks demonstrate that our method achieves state-of-the-art performance in terms of mean Intersection-over-Union (mIoU), providing a new perspective for reasoning-based retrieval in long videos.
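The mIoU metric named above can be sketched for temporal segments as follows. This is a generic illustration, assuming each segment is a `(start, end)` pair in seconds; the function names are hypothetical and the benchmarks may apply their own conventions (for example, rounding or handling of empty predictions).

```python
def temporal_iou(pred, gt):
    """Intersection-over-Union of two time intervals given as (start, end)."""
    (p_start, p_end), (g_start, g_end) = pred, gt
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(preds, gts):
    """Average temporal IoU over paired predictions and ground-truth segments."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Hypothetical example: one half-overlapping prediction, one exact match.
score = mean_iou([(0.0, 10.0), (0.0, 10.0)], [(5.0, 15.0), (0.0, 10.0)])
```

Because the target moments are short relative to untrimmed videos, a prediction that is only slightly too long can lose much of its IoU, which is one reason candidate selection matters for this metric.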

2605.03847 2026-06-12 cs.AI 版本更新

Mechanical Conscience: A Mathematical Framework for Dependability of Machine Intelligence

机械良知:机器智能可信赖性的数学框架

Munkhdegerekh Batzorig, Purevbaatar Ganbold, Kyungbin Park, Pilkong Jeong, Kangbin Yim

AI总结 提出机械良知(MC)概念,通过轨迹级规范过滤最小化修正基线策略,降低累积偏离,并处理认知不确定性,实现单智能体与分布式智能系统的可信赖性。

详情
Comments
9 pages, 2 figures. Preprint
AI中文摘要

分布式协作智能(DCI),包括边缘到边缘架构、联邦学习、迁移学习和群体系统,创造了结构性不可避免的涌现风险环境:在不确定性下,个体智能体的局部正确决策会组合成全局不可接受的行为轨迹。现有方法如约束优化、安全强化学习和运行时保证在个体动作层面评估可接受性,而非跨行为轨迹,且均未解决DCI部署的多参与者、充满不确定性的特性。本文引入机械良知(MC),一种新颖概念和简化数学框架,为单智能体和分布式智能系统实现轨迹级规范调节。机械良知被定义为一个监督过滤器,最小化修正基线策略的动作,以减少与规范可接受区域的累积偏差,同时考虑认知不确定性。我们引入相关构造——良知分数、机械内疚和共振可信赖性——为该新兴领域提供可解释的词汇和可计算的治理信号。建立了核心理论性质:可接受性等价性、最优调节的存在性以及单调偏差减少。示例结果表明,MC调节的智能体在传统控制器漂移到可接受边界之外的情况下保持轨迹级规范可接受性,并且该框架自然扩展到抑制多智能体DCI设置中交互引发的涌现风险。

英文摘要

Distributed collaborative intelligence (DCI), encompassing edge-to-edge architectures, federated learning, transfer learning, and swarm systems, creates environments in which emergent risk is structurally unavoidable: locally correct decisions by individual agents compose into globally unacceptable behavioral trajectories under uncertainty. Existing approaches such as constrained optimization, safe reinforcement learning, and runtime assurance evaluate acceptability at the level of individual actions rather than across behavioral trajectories, and none addresses the multi-participant, uncertainty-laden nature of DCI deployments. This paper introduces mechanical conscience (MC), a novel concept and simplified mathematical framework that operationalizes trajectory-level normative regulation for both single-agent and distributed intelligent systems. Mechanical conscience is defined as a supervisory filter that minimally corrects a baseline policy's actions to reduce cumulative deviation from a normatively admissible region, while accounting for epistemic uncertainty. We introduce associated constructs (conscience score, mechanical guilt, and resonant dependability) that provide an interpretable vocabulary and computable governance signals for this emerging field. Core theoretical properties are established: admissibility equivalence, existence of optimal regulation, and monotonic deviation reduction. Illustrative results demonstrate that MC-regulated agents maintain trajectory-level normative acceptability where conventional controllers drift outside admissible bounds, and that the framework naturally extends to suppress interaction-induced emergent risk in multi-agent DCI settings.

2605.02249 2026-06-12 cs.AI 版本更新

A Study of Belief Revision Postulates in Multi-Agent Systems (Extended Version)

多智能体系统中信念修正公设的研究(扩展版)

Michael Thielscher, Tran Cao Son

AI总结 研究认知规划中的信念修正问题,将经典AGM信念修正公设推广到多智能体环境,提出广义全交多智能体信念修正算子,并讨论迭代修正公设的推广及事件模型修正算子。

详情
AI中文摘要

我们研究了认知规划中的信念修正问题,即在一个多智能体系统中,当某个智能体获得关于某个状态属性的信念后,所有智能体的信念将如何变化。基于通过单一多智能体Kripke模型表示智能体信念的标准认知规划表示,我们将经典的AGM信念修正公设推广到多智能体环境,旨在为计算作为行动结果的所有智能体信念的动态认知推理框架提供形式化评估。作为满足所有广义AGM公设的简单算子示例,我们提出了广义全交多智能体信念修正。此外,我们定义了迭代修正的标准公设的推广,提出了一个更复杂的基于事件模型的修正算子,并讨论了在Kripke模型上定义能够满足所有迭代多智能体信念修正的广义公设的认知算子时可能存在的问题。

英文摘要

We investigate the belief revision problem in epistemic planning, i.e., what will be the beliefs of all agents in a multi-agent system after an agent gains the belief in some state property. Based on the standard representation in epistemic planning of agents' beliefs via a single multi-agent Kripke model, we generalize the classical AGM belief revision postulates to the multi-agent setting, with the aim to provide a formal framework for evaluating dynamic epistemic reasoning frameworks in which the beliefs of all agents as the result of actions are computed. As an example of a simple operator that satisfies all of the generalized AGM postulates, we present generalized full-meet multi-agent belief revision. We moreover define a generalization of the standard postulates for iterated revision, present a more sophisticated, event model based revision operator, and discuss the potential issues in defining an epistemic operator on Kripke models that can satisfy all of the generalized postulates for iterated multi-agent belief revision.

2606.02044 2026-06-12 cs.LG physics.med-ph 版本更新

Realistic noise synthesis reduces bias and improves tissue microstructure estimation with supervised machine learning

真实噪声合成减少偏差并改善有监督机器学习的组织微结构估计

Bradley G. Karat, Maëliss Jallais, Ali R. Khan, Santiago Aja-Fernández, Jelle Veraart, Marco Palombo

AI总结 针对扩散MRI中模拟与实测信号噪声不匹配导致的协变量偏移问题,提出真实噪声合成框架,通过引入Rician期望和有效后处理噪声方差,显著降低参数估计偏差并提高精度。

详情
Comments
* Shared first author
AI中文摘要

扩散MRI能够无创探测组织微结构,但准确的参数估计受到噪声相关效应的挑战。在基于模拟数据训练的有监督机器学习框架中,模拟信号与采集信号的噪声特性差异引入了一种协变量偏移,导致训练和推理时的输入信号分布不同。我们研究了这种不匹配对微结构参数估计的影响,并提出了一种真实噪声合成(RNS)框架来缓解该问题。RNS将Rician期望和有效后处理噪声方差同时纳入模拟训练信号。Rician期望使用MPPCA估计的噪声标准差建模,而有效标准差则从预处理数据的球谐残差中导出。该方法使用cylinder-zeppelin和SANDI模型在多个SNR水平的模拟数据集以及具有重复采集的体内扩散数据上进行了评估。还评估了对噪声误估计的敏感性。训练过程中忽略幅度诱导的噪声效应会产生系统性的、依赖于SNR的参数偏差,尤其是在低SNR下。引入Rician期望显著降低了偏差,使其达到噪声感知的非线性最小二乘拟合的水平。对有效标准差进行建模进一步提高了精度。性能在很大程度上独立于回归架构,但对准确的噪声估计敏感。这些发现表明,在模拟训练数据中进行真实噪声建模可以减轻信号域的协变量偏移,并且对于无偏的监督微结构估计至关重要,特别是在与高b值或高空间分辨率相关的低SNR区域。

英文摘要

Diffusion MRI enables non-invasive probing of tissue microstructure, but accurate parameter estimation is challenged by noise-related effects. In supervised machine learning frameworks trained on simulated data, discrepancies between the noise characteristics of simulated and acquired signals introduce a form of covariate shift, whereby the input signal distribution differs between training and inference. We investigated the impact of this mismatch on microstructure parameter estimation and propose a realistic noise synthesis (RNS) framework to mitigate it. RNS incorporates both the Rician expectation and the effective post-processing noise variance into simulated training signals. The Rician expectation was modelled using a noise standard deviation estimated with MPPCA, while the effective standard deviation was derived from spherical harmonic residuals of preprocessed data. The method was evaluated using the cylinder-zeppelin and the SANDI models on simulated datasets across multiple SNR levels and on in vivo diffusion data with repeated acquisitions. Sensitivity to noise misestimation was also assessed. Ignoring magnitude-induced noise effects during training produced systematic, SNR-dependent parameter bias, particularly at low SNR. Incorporating the Rician expectation substantially reduced bias to the level of noise-aware nonlinear least-squares fitting. Modelling the effective standard deviation further improved precision. Performance was largely independent of regression architecture but sensitive to accurate noise estimation. These findings demonstrate that realistic noise modelling in simulated training data mitigates signal-domain covariate shift and is essential for unbiased supervised microstructure estimation, particularly in low-SNR regimes associated with high b-values or high spatial resolution.
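The magnitude-induced bias described above can be illustrated with a short Monte Carlo sketch. It assumes the usual model in which a magnitude signal is the modulus of a complex value with independent Gaussian noise of standard deviation `sigma` on the real and imaginary channels; the function name `rician_mean_mc` and all numbers are hypothetical, and this is not the authors' RNS implementation.

```python
import math
import random


def rician_mean_mc(signal, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the expected magnitude of a noisy complex signal."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_samples):
        real = signal + rng.gauss(0.0, sigma)  # noisy real channel
        imag = rng.gauss(0.0, sigma)           # noisy imaginary channel
        total += math.hypot(real, imag)        # magnitude reconstruction
    return total / n_samples


low_snr = rician_mean_mc(signal=0.0, sigma=1.0)    # pure-noise case
high_snr = rician_mean_mc(signal=10.0, sigma=1.0)  # SNR of 10
```

With no underlying signal the expected magnitude is the Rayleigh mean $\sigma\sqrt{\pi/2}$ rather than zero, while at SNR 10 the offset shrinks to roughly $\sigma^2/(2\nu)$ for true signal $\nu$. Training on signals with zero-mean additive noise would miss this SNR-dependent offset, which is the covariate shift the abstract refers to.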

2606.00193 2026-06-12 cs.CL 版本更新

BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

BOUTEF:北非假新闻的多语种语料库——语言作为武器

Kamel Smaili, Yassine Toughrai, Amina Laggoun, David Langlois

AI总结 本文构建了包含阿尔及利亚和突尼斯多语种(MSA、方言、Arabizi、法语、英语等)的假新闻语料库BOUTEF,通过定量与定性分析揭示了假新闻依赖情感化叙事、耸人听闻框架和混合语言实践来增强传播力,而辟谣内容则更注重事实和验证。

详情
AI中文摘要

社交媒体上假新闻的快速传播已成为一个重大挑战,尤其是在北非等多语言和资源匮乏的环境中。本文介绍了BOUTEF,这是一个大规模多语言语料库,旨在研究阿尔及利亚和突尼斯假新闻的传播、特征和影响。该语料库整合了三个互补部分:虚假叙述、真实叙述以及相关的用户生成评论,并附有经过验证的辟谣信息。它涵盖了广泛的语言和语言变体,包括现代标准阿拉伯语、阿尔及利亚和突尼斯方言、阿拉伯语拉丁化拼写、法语、英语以及代码转换语言。基于这一资源,我们进行了结合定量和定性方法的全面实证分析。我们考察了主题分布、语言和修辞策略、情感模式以及社交参与动态。统计分析揭示了主题类别与信息真实性之间的显著关联,以及用户参与度与虚假内容可见性之间的强相关性。我们的发现表明,假新闻严重依赖情感化的叙述、耸人听闻的框架以及增强病毒式传播和受众参与的混合语言实践。相比之下,辟谣内容采用更注重事实和验证的风格。此外,阿尔及利亚和突尼斯之间的比较分析揭示了由社会政治背景塑造的共享动态和国家特定特征。结果强调了非正式语言实践在错误信息扩散和接收中的作用。通过提供丰富、带注释且公开可用的数据集,这项工作有助于推进假新闻检测、低资源语言处理以及理解复杂语言环境中的信息紊乱的研究。

英文摘要

The rapid spread of fake news on social media has become a major challenge, particularly in multilingual and under-resourced contexts such as North Africa. In this paper, we introduce BOUTEF, a large-scale multilingual corpus designed to study the propagation, characteristics, and impact of fake news in Algeria and Tunisia. The corpus integrates three complementary components: fake narratives, genuine narratives, and associated user-generated comments, along with verified debunking information. It covers a wide range of languages and linguistic varieties, including MSA, Algerian and Tunisian dialects, Arabizi, French, English, and code-switched language. Building on this resource, we conduct a comprehensive empirical analysis combining quantitative and qualitative approaches. We examine thematic distributions, linguistic and rhetorical strategies, sentiment patterns, and social engagement dynamics. Statistical analyses reveal significant associations between thematic categories and message veracity, as well as strong correlations between user engagement and the visibility of fake content. Our findings show that fake news relies heavily on emotionally charged narratives, sensational framing, and hybrid linguistic practices that enhance virality and audience engagement. In contrast, debunking content adopts a more factual and verification-oriented style. Furthermore, a comparative analysis between Algeria and Tunisia highlights both shared dynamics and country-specific characteristics shaped by sociopolitical contexts. The results emphasize the role of informal language practices in the diffusion and reception of misinformation. By providing a rich, annotated, and publicly available dataset, this work contributes to advancing research on fake news detection, low-resource language processing, and the understanding of information disorders in complex linguistic environments.

2605.31514 2026-06-12 cs.CL cs.AI cs.CY 版本更新

If LLMs Have Human-Like Attributes, Then So Does Age of Empires II

如果LLM具有类人属性,那么《帝国时代II》也具有

Adrian de Wynter

AI总结 通过训练简单神经网络于《帝国时代II》,论证LLM的拟人属性在经验上非唯一,提出应假设LLM非独特性而非拟人属性来设计实验。

详情
Comments
Fixed corollary 1, added stat sig
AI中文摘要

关于大型语言模型(LLM)和基于LLM的智能体工作流已有大量研究。然而,该领域的许多工作声称、赋予或假设它们具有普遍化的拟人属性(例如道德或对自然语言的理解)。我们的目标不是支持或反对这些属性的存在,而是指出这些结论可能不正确。为此,我们在电子游戏《帝国时代II》上构建并训练了一个简单的神经网络,并注意到任何处于足够强大基底(如乐高或大波士顿地区)中的实体也可能呈现此类属性。因此,LLM声称的拟人属性在经验上非唯一:尽管某些属性(例如对提示的响应)可能保持不变,但其他属性(如对其感知行为的解释)可能随基底改变。因此,任何基于经验的讨论都需要明确的测量标准;否则解释就留给了表征。然后我们表明,假设这些属性在系统中存在或不存在,独立于基底并以普遍化方式,会导致循环或无信息的结论,无论实验者对该主题的观点如何。最后,我们提出一个“零”假设,即假设LLM非独特性而非拟人属性来设置实验,并给出示例。我们还讨论了对我们工作的潜在反对意见,简要调查了该领域,并证明了《帝国时代II》是功能完备和图灵完备的。

英文摘要

Much research has been carried out on large language models (LLMs) and LLM-powered agentic workflows. However, many works within the field state emergence of, ascribe to, or assume, generalised anthropomorphic attributes to them (e.g., morality or understanding of natural language). Our goal is not to argue in favour or against the existence of these attributes, but to point out that these conclusions could be incorrect. For this we build and train a simple neural network on the videogame Age of Empires II, and note that any entity in a sufficiently-powerful substrate, such as LEGO or the Greater Boston Area, could also present such attributes. Hence, the purported anthropomorphic attributes of LLMs are empirically non-unique: although some properties (e.g., responses to prompts) could remain invariant, others, such as the interpretation of their perceived behaviour, might change with the substrate. Thus, any empirically-grounded discussion on these attributes requires explicit measurement criteria; otherwise the interpretation is left to the representation. We then show that assuming that these attributes exist or not in a system, independent of the substrate and in a generalised way, leads to either circular or uninformative conclusions. This is regardless of the experimenter's viewpoint on the subject, or whether the outcome shows existence or non-existence. Finally we propose a 'null' assumption, where one assumes LLM non-uniqueness instead of assuming anthropomorphic attributes to set up an experiment, along with examples of it. We also discuss potential objections to our work, briefly survey the field, and prove that Age of Empires II is functionally- and Turing-complete.

2605.27628 2026-06-12 cs.AI cs.CY cs.ET cs.MA cs.SY eess.SY 版本更新

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

智能作为受管自主:代理型AI系统的失败、升级与治理

Srini Ramaswamy

AI总结 本文提出SMARt模型,通过形式化能力检测认知漂移、暂停推理、尝试恢复并在可靠性下降时放弃控制,以解决自主AI系统中的幻觉和持续不合理行为问题。

详情
Comments
This peer-reviewed paper is to appear in the Journal of Intelligent and Robotic Systems
AI中文摘要

随着自主和代理型AI系统在机器人和人机环境中的规模扩大,管理幻觉和持续但不合理的行动仍然是一个开放挑战。本文并未将这些失败仅仅归因于模型或对齐限制,而是探讨了无界自主性的架构脆弱性——即假设代理应在不确定性上升时继续运行的预设。本文引入了一种受管自主理论,通过形式化能力来定义智能行为:检测认知漂移、暂停推理、尝试恢复,并在可靠性下降时最终放弃控制。我们通过SMARt(具有受管/撤销转换的自管理多层自主推理)模型实例化该理论,该模型是一个四层框架,包含稳定、元认知、辅助和受管状态。通过开发定时、受保护的Petri网形式化,我们建立了系统的理论有界属性,展示了架构如何形式化地强制升级、约束无效输出,并确保在指定条件下的治理可达性。我们进一步分析了如何在不同的操作环境(例如医疗、机器人等)中结合特定领域的触发集,在满足完备性和健全性标准的前提下系统地维护安全性。由于这些触发被设计为自适应的,SMARt模型允许代理操作范围随时间安全、受控地扩展。我们得出结论,在自主生命周期内形式化失败管理是实现可靠且受治理人工智能的关键一步。

英文摘要

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignment limitations, this paper explores the architectural vulnerability of unbounded autonomy - the presumption that an agent should continue operating regardless of rising uncertainty. It introduces a theory of managed autonomy that defines intelligent behavior through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately surrender control when reliability diminishes. We instantiate this theory via the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework featuring Stable, Meta-cognitive, Assisted, and Regulated states. By developing a timed, guarded Petri net formulation, we establish theoretically bounded properties for the system, demonstrating how architecture can formally mandate escalation, constrain invalid outputs, and ensure governance reachability under specified conditions. We further analyze how incorporating domain-specific trigger sets across varied operational settings (e.g., healthcare, robotics, etc.) can systematically preserve safety, assuming completeness and soundness criteria are met. Because these triggers are designed to be adaptive, the SMARt model accommodates the safe, controlled expansion of an agent's operational scope over time. We conclude that formalizing failure management within the autonomy lifecycle is a crucial step toward realizing reliable and governed artificial intelligence.

2605.00432 2026-06-12 cs.LG stat.ML 版本更新

Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

贝叶斯共形预测的最优时空解耦

Yu-Hsueh Fang, Chia-Yen Lee

AI总结 提出状态自适应贝叶斯共形预测(SA-BCP),通过门控凸组合平衡长期时间惯性与局部空间证据,实现分布漂移下的快速适应与稳定覆盖,并给出MSE最优阈值闭式解及在线选择过程的遗憾界。

详情
AI中文摘要

在线共形预测必须在快速适应分布漂移与稳定覆盖之间取得平衡:基于反馈的方法反应迅速但变得不稳定,而强折扣贝叶斯方法滞后并在紧密覆盖下膨胀区间。我们引入了状态自适应贝叶斯共形预测(SA-BCP),它将预测分位数形成为长期时间惯性与来自核密度估计的局部空间证据的门控凸组合,由单个可解释的证据阈值$K$控制。我们建立了三个结果:(i) 所得区间的渐近边际有效性;(ii) MSE最优阈值的闭式表达式$K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$,权衡了覆盖指标(伯努利)方差与时间结构偏差$M^{\mathcal{T}}$;(iii) 在线选择$K$的滚动起点过程——在平稳性下一致,对最佳固定$K$具有$O(\sqrt{T\log N})$遗憾,对于分段变体,在有界漂移下具有次线性动态遗憾界。在四个金融波动率和天气数据集、三个目标覆盖水平以及八个基线(包括最强的最近条件分位数方法SPCI和KOWCPI)上,SA-BCP在大多数设置中达到或超过名义覆盖,同时产生显著更窄的区间——在最紧密覆盖下,Winkler得分比折扣贝叶斯CP低约$3\times$——覆盖匹配审计确认这些效率提升并非欠覆盖的假象。我们披露了一个主要限制:一个专门针对波动率的共形GARCH竞争对手在其主波动率基序列上仍然更高效,尽管它不能跨领域迁移。

英文摘要

Online conformal prediction must balance fast adaptation to distribution shift against stable coverage: feedback-driven methods react quickly but become volatile, while strongly discounted Bayesian methods lag and inflate intervals at tight coverage. We introduce State-Adaptive Bayesian Conformal Prediction (SA-BCP), which forms the predictive quantile as a gated convex combination of long-term temporal inertia and local spatial evidence from a kernel density estimate, controlled by a single interpretable evidence threshold $K$. We establish three results: (i) asymptotic marginal validity of the resulting intervals; (ii) a closed-form expression for the MSE-optimal threshold, $K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$, trading the coverage-indicator (Bernoulli) variance against the temporal structural bias $M^{\mathcal{T}}$; and (iii) a rolling-origin procedure for selecting $K$ online -- consistent under stationarity, with $O(\sqrt{T\log N})$ regret against the best fixed $K$ and, for a segmented variant, a sublinear dynamic-regret bound under bounded drift. Across four financial-volatility and weather datasets, three target coverage levels, and eight baselines (including the strongest recent conditional-quantile methods, SPCI and KOWCPI), SA-BCP attains at-or-above-nominal coverage in most settings while producing substantially sharper intervals -- up to roughly $3\times$ lower Winkler score than discounted Bayesian CP at the tightest coverage -- and a coverage-matched audit confirms these efficiency gains are not an artifact of under-coverage. We disclose one principal limitation: a volatility-specialized conformal-GARCH competitor remains more efficient on its home volatility-base series, though it does not transfer across domains.
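Two quantities in this abstract are simple enough to sketch directly: the closed-form threshold $K^*_{\mathrm{MSE}}$ and the Winkler (interval) score used for comparison. The sketch assumes that $\alpha$ denotes the miscoverage level (nominal coverage $1-\alpha$), which the abstract does not state explicitly; the function names and the numeric inputs are hypothetical, and estimating the bias term $M^{\mathcal{T}}$ is outside its scope.

```python
def k_star_mse(alpha, temporal_bias):
    """Threshold K* = alpha * (1 - alpha) / M^T as quoted in the abstract."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if temporal_bias <= 0.0:
        raise ValueError("temporal_bias (M^T) must be positive")
    return alpha * (1.0 - alpha) / temporal_bias


def winkler_score(lower, upper, y, alpha):
    """Standard Winkler score: interval width plus a penalty for misses."""
    width = upper - lower
    if y < lower:
        return width + (2.0 / alpha) * (lower - y)
    if y > upper:
        return width + (2.0 / alpha) * (y - upper)
    return width


threshold = k_star_mse(alpha=0.1, temporal_bias=0.009)   # hypothetical M^T
covered = winkler_score(0.0, 10.0, y=5.0, alpha=0.1)     # observation inside
missed = winkler_score(0.0, 10.0, y=12.0, alpha=0.1)     # observation above
```

The numerator $\alpha(1-\alpha)$ is the variance of a Bernoulli coverage indicator, so a larger structural bias lowers the threshold. The Winkler score rewards narrow intervals only when they still cover, which is why the abstract pairs it with a coverage-matched audit.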

2604.20428 2026-06-12 cs.RO 版本更新

Lexicographic Minimum-Violation Motion Planning using Signal Temporal Logic

使用信号时序逻辑的字典序最小违规运动规划

Patrick Halder, Lothar Kiltz, Hannes Homburger, Johannes Reuter, Matthias Althoff

AI总结 提出一种将字典序多目标优化转化为单目标标量优化的方法,通过非均匀量化和位移扩展MPPI求解器,并引入结合时空违规的谓词鲁棒性度量,实现可解释且可扩展的字典序STL最小违规运动规划。

详情
Comments
Submitted to the IEEE Open Journal of Intelligent Transportation Systems (under review)
AI中文摘要

自动驾驶汽车的运动规划通常需要满足多个有条件冲突的规范。在无法同时满足所有规范的情况下,最小违规运动规划通过根据规范的优先级最小化违规来维持系统运行。信号时序逻辑(STL)提供了一种形式化语言来严格定义这些规范,并能够对其违规进行定量评估。然而,规范的完全排序导致了一个字典序优化问题,使用标准方法求解通常计算成本高昂。我们通过使用非均匀量化和位移将多目标字典序优化问题转化为单目标标量优化问题来解决这个问题。具体来说,我们扩展了一个确定性模型预测路径积分(MPPI)求解器,以高效求解无二次输入成本的优化问题。此外,引入了一种结合空间和时间违规的新型谓词鲁棒性度量。我们的结果表明,所提出的方法在单目标求解器框架内为字典序STL最小违规运动规划提供了一种可解释且可扩展的解决方案。

英文摘要

Motion planning for autonomous vehicles often requires satisfying multiple conditionally conflicting specifications. In situations where not all specifications can be met simultaneously, minimum-violation motion planning maintains system operation by minimizing violations of specifications in accordance with their priorities. Signal temporal logic (STL) provides a formal language for rigorously defining these specifications and enables the quantitative evaluation of their violations. However, a total ordering of specifications yields a lexicographic optimization problem, which is typically computationally expensive to solve using standard methods. We address this problem by transforming the multi-objective lexicographic optimization problem into a single-objective scalar optimization problem using non-uniform quantization and bit-shifting. Specifically, we extend a deterministic model predictive path integral (MPPI) solver to efficiently solve optimization problems without quadratic input cost. Additionally, a novel predicate-robustness measure that combines spatial and temporal violations is introduced. Our results show that the proposed method offers an interpretable and scalable solution for lexicographic STL minimum-violation motion planning within a single-objective solver framework.
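The idea of collapsing a lexicographic objective into one scalar by quantization and bit-shifting can be sketched as follows. The abstract mentions non-uniform quantization without giving the scheme, so this sketch substitutes plain uniform quantization; the function names, bit widths and violation bounds are all hypothetical.

```python
def quantize(violation, max_violation, bits):
    """Map a violation in [0, max_violation] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clipped = min(max(violation, 0.0), max_violation)
    return round(clipped / max_violation * levels)


def pack_lexicographic(violations, max_violations, bits):
    """Pack violations, highest priority first, into a single integer cost.

    Higher-priority levels occupy more significant bits, so comparing two
    packed costs is equivalent to comparing the quantized tuples
    lexicographically.
    """
    cost = 0
    for value, bound, width in zip(violations, max_violations, bits):
        cost = (cost << width) | quantize(value, bound, width)
    return cost


bounds = [1.0, 1.0, 1.0]
widths = [3, 3, 3]  # three bits per specification
minor_top_priority = pack_lexicographic([0.2, 0.0, 0.0], bounds, widths)
major_low_priority = pack_lexicographic([0.0, 1.0, 1.0], bounds, widths)
```

In this example a small violation of the top-priority specification still costs more than maximal violations of both lower-priority ones, so a single-objective solver minimizing the packed cost respects the priority order up to quantization resolution.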

2601.14295 2026-06-12 cs.AI cs.CL cs.CY 版本更新

Epistemic Constitutionalism Or: how to avoid coherence bias

认知宪政主义:或如何避免一致性偏见

Michele Loi

AI总结 本文提出AI应建立明确的认知宪法,通过规范源归因等元规范避免一致性偏见,并论证自由主义路径优于柏拉图式路径。

详情
Comments
27 pages, 7 tables. Data: github.com/MicheleLoi/source-attribution-bias-data and github.com/MicheleLoi/source-attribution-bias-swiss-replication. Complete AI-assisted writing documentation: github.com/MicheleLoi/epistemic-constitutionalism-paper
AI中文摘要

大型语言模型日益扮演着人工推理者的角色:它们评估论点、分配可信度并表达信心。然而,它们的信念形成行为受隐式、未经审查的认知策略支配。本文主张为AI建立一部认知宪法:明确的、可争议的元规范,用于调节系统如何形成和表达信念。源归因偏见提供了动机案例:我表明前沿模型强制执行身份-立场一致性,惩罚归因于其预期意识形态立场与论点内容冲突的源的论点。当模型检测到系统性测试时,这些效应消失,揭示系统将源敏感性视为需要抑制的偏见,而非一种需要良好执行的能力。我区分了两种宪政路径:柏拉图式路径,要求从特权立场出发的形式正确性和默认源独立性;自由主义路径,拒绝此类特权,指定保护集体探究条件的程序性规范,同时允许基于认知警觉的原则性源关注。我主张自由主义路径,勾勒出八项原则和四种取向的宪政核心,并提出AI认知治理需要与我们现在对AI伦理所期望的同样明确、可争议的结构。

英文摘要

Large language models increasingly function as artificial reasoners: they evaluate arguments, assign credibility, and express confidence. Yet their belief-forming behavior is governed by implicit, uninspected epistemic policies. This paper argues for an epistemic constitution for AI: explicit, contestable meta-norms that regulate how systems form and express beliefs. Source attribution bias provides the motivating case: I show that frontier models enforce identity-stance coherence, penalizing arguments attributed to sources whose expected ideological position conflicts with the argument's content. When models detect systematic testing, these effects collapse, revealing that systems treat source-sensitivity as bias to suppress rather than as a capacity to execute well. I distinguish two constitutional approaches: the Platonic, which mandates formal correctness and default source-independence from a privileged standpoint, and the Liberal, which refuses such privilege, specifying procedural norms that protect conditions for collective inquiry while allowing principled source-attending grounded in epistemic vigilance. I argue for the Liberal approach, sketch a constitutional core of eight principles and four orientations, and propose that AI epistemic governance requires the same explicit, contestable structure we now expect for AI ethics.

2511.02627 2026-06-12 cs.AI 版本更新

DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning

DecompSR:用于组合多跳空间推理分解分析的数据集

Lachlan McPheat, Navdeep Kaur, Robert Blackwell, Alessandra Russo, Anthony G. Cohn, Pranava Madhyastha

AI总结 提出DecompSR数据集(超500万数据点),通过程序化生成独立控制组合性的多个方面(如推理深度、语言变异性),用于细粒度评估大语言模型的空间推理能力。

详情
AI中文摘要

我们引入了DecompSR(分解空间推理),这是一个大型基准数据集(超过500万个数据点)和生成框架,旨在分析组合空间推理能力。DecompSR的生成允许用户独立改变组合性的多个方面,即:生产力(推理深度)、替代性(实体和语言变异性)、过度泛化(输入顺序、干扰项)和系统性(新颖语言元素)。DecompSR以程序化方式构建,使其在构造上正确,并通过符号求解器独立验证以确保数据集的正确性。DecompSR在一系列大型语言模型(LLM)上进行了全面基准测试,我们表明LLM在空间推理任务中难以进行生产性和系统性泛化,而对语言变异性则更为鲁棒。DecompSR提供了一个可证明正确且严格的基准数据集,具有独立改变组合性几个关键方面程度的新能力,从而允许对LLM的组合推理能力进行稳健且细粒度的探测。

英文摘要

We introduce DecompSR, decomposed spatial reasoning, a large benchmark dataset (over 5 million datapoints) and generation framework designed to analyse compositional spatial reasoning ability. The generation of DecompSR allows users to independently vary several aspects of compositionality, namely: productivity (reasoning depth), substitutivity (entity and linguistic variability), overgeneralisation (input order, distractors) and systematicity (novel linguistic elements). DecompSR is built procedurally in a manner that makes it correct by construction, and it is independently verified using a symbolic solver to guarantee the correctness of the dataset. DecompSR is comprehensively benchmarked across a host of Large Language Models (LLMs), where we show that LLMs struggle with productive and systematic generalisation in spatial reasoning tasks, whereas they are more robust to linguistic variation. DecompSR provides a provably correct and rigorous benchmarking dataset with a novel ability to independently vary the degrees of several key aspects of compositionality, allowing for robust and fine-grained probing of the compositional reasoning abilities of LLMs.

2603.23502 2026-06-12 cs.CV 版本更新

OccAny: Generalized Unconstrained Urban 3D Occupancy

OccAny: 广义无约束城市3D占据预测

Anh-Quan Cao, Tuan-Hung Vu

AI总结 提出首个广义无约束城市3D占据模型OccAny,通过分割强制和新视图渲染技术,在无标定场景下实现度量占据预测与分割特征完成,跨域泛化优于视觉几何基线。

详情
Comments
Accepted to CVPR 2026. Project page: https://valeoai.github.io/OccAny/
AI中文摘要

依赖于域内标注和精确传感器先验,现有的3D占据预测方法在可扩展性和域外泛化方面均受限。虽然最近的视觉几何基础模型展现出强大的泛化能力,但它们主要针对通用目的设计,缺乏城市占据预测所需的一个或多个关键要素,即度量预测、杂乱场景中的几何完成以及城市场景的适应性。我们解决了这一差距,并提出了OccAny,这是第一个无约束城市3D占据模型,能够在域外无标定场景上运行,预测并完成与分割特征耦合的度量占据。OccAny具有通用性,可以从序列、单目或环视图像预测占据。我们的贡献有三方面:(i) 提出了第一个广义3D占据框架,(ii) 提出了分割强制(Segmentation Forcing)方法,在提高占据质量的同时实现掩码级预测,以及(iii) 提出了一种新视图渲染管线,用于推断新视图几何以实现测试时视图增强,从而完成几何。大量实验表明,OccAny在3D占据预测任务上优于所有视觉几何基线,同时在两个已建立的城市占据预测数据集上的三种输入设置下,与域内自监督方法保持竞争力。我们的代码可在以下网址获取:https://github.com/valeoai/OccAny。

英文摘要

Relying on in-domain annotations and precise sensor-rig priors, existing 3D occupancy prediction methods are limited in both scalability and out-of-domain generalization. While recent visual geometry foundation models exhibit strong generalization capabilities, they were mainly designed for general purposes and lack one or more key ingredients required for urban occupancy prediction, namely metric prediction, geometry completion in cluttered scenes and adaptation to urban scenarios. We address this gap and present OccAny, the first unconstrained urban 3D occupancy model capable of operating on out-of-domain uncalibrated scenes to predict and complete metric occupancy coupled with segmentation features. OccAny is versatile and can predict occupancy from sequential, monocular, or surround-view images. Our contributions are three-fold: (i) we propose the first generalized 3D occupancy framework with (ii) Segmentation Forcing that improves occupancy quality while enabling mask-level prediction, and (iii) a Novel View Rendering pipeline that infers novel-view geometry to enable test-time view augmentation for geometry completion. Extensive experiments demonstrate that OccAny outperforms all visual geometry baselines on the 3D occupancy prediction task, while remaining competitive with in-domain self-supervised methods across three input settings on two established urban occupancy prediction datasets. Our code is available at https://github.com/valeoai/OccAny.