arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2605.26115 2026-05-26 cs.CV

TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Weijie Wang, Zimu Li, Jinchuan Shi, Zeyu Zhang, Botao Ye, Marc Pollefeys, Donny Y. Chen, Bohan Zhuang

AI summary: Proposes TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives, predicts them directly from sparse-view images, and exports simulation-ready mesh scenes.

Comments
Project Page: https://lhmd.top/trisplat, Code: https://github.com/ziplab/TriSplat

Abstract

Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.
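
The normal-and-frame step described in the abstract can be illustrated with a small sketch. This is not the TriSplat implementation: the finite-difference normal, the helper-axis frame construction, and the function names `normal_at` and `local_frame` are all assumptions made for illustration, and the image-conditioned refinement head is omitted. The sign of the normal depends on the image-axis convention chosen here.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)

def normal_at(points, r, c):
    """Geometry normal at pixel (r, c) of an H x W grid of 3D points.

    Hypothetical construction: cross product of forward differences
    along the two image axes.
    """
    du = sub(points[r][c + 1], points[r][c])  # step along image columns
    dv = sub(points[r + 1][c], points[r][c])  # step along image rows
    return normalize(cross(du, dv))

def local_frame(n):
    """Orthonormal right-handed frame (t, b, n) around a unit normal n.

    The helper axis is chosen to avoid being nearly parallel to n,
    which keeps the cross product well conditioned.
    """
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, n))
    b = cross(n, t)
    return t, b, n

# Toy point map: a flat patch in the z = 0 plane.
patch = [[(float(c), float(r), 0.0) for c in range(3)] for r in range(3)]
n = normal_at(patch, 1, 1)
t, b, _ = local_frame(n)
```

Deriving the frame from the normal, instead of regressing an unconstrained rotation, means the triangle orientation is tied to the predicted geometry by construction, which is the property the abstract emphasizes.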

2605.26113 2026-05-26 cs.RO cs.CV

AnyScene: Towards Highly Controllable Driving Scene Generation at Anywhere and Beyond

Haiming Zhang, Junfei Zhou, Feng Jiang, Jingzhong Li, Zhenglong Guo, Penglin Dai, Jifeng Dai, Yan Xie, Benjin Zhu

AI summary: Proposes AnyScene, a framework that uses a Spatial-Temporal Occupancy Diffusion Transformer and a Geometry-Grounded View Expansion module to generate semantic occupancy sequences and reference-free multi-view driving videos from BEV layouts, with precise controllability and long-horizon generation.

Comments
Work in progress. Project page: https://mind-omni.github.io/

Abstract

Generating high-fidelity and controllable synthetic data is critical for advancing end-to-end autonomous driving, particularly for addressing the long tail of rare safety-critical scenarios. Existing occupancy-guided methods typically rely on shallow conditioning mechanisms and reference-frame-dependent video synthesis, which limits fine-grained controllability from arbitrary BEV layouts and restricts their applicability for scalable simulation. In this paper, we propose AnyScene, a unified occupancy-centric framework for driving scene generation. AnyScene generates semantic occupancy sequences from BEV layouts through a Spatial-Temporal Occupancy Diffusion Transformer that jointly tokenizes BEV and occupancy features in an autoregressive manner. This design enables precise controllability from cross-dataset and user-defined BEV inputs while naturally supporting long-horizon generation. Building upon the generated occupancy, a Geometry-Grounded View Expansion module treats occupancy as the canonical spatial representation and synthesizes temporally consistent multi-view driving videos in a reference-free and autoregressive fashion, supporting flexible camera configurations at inference time. Extensive experiments demonstrate that AnyScene achieves state-of-the-art performance in both occupancy and video generation. It exhibits strong generalization to unseen and customized layouts, and provides measurable benefits for downstream tasks such as sparse-view 3D reconstruction.

2605.26112 2026-05-26 cs.AI cs.LG

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

Shangding Gu

AI summary: Argues that the next bottleneck in agentic AI is system scaling rather than model scaling alone, that is, designing auditable, persistent, modular, and verifiable architectures (the "harness"). Studies three bottlenecks (context governance, trustworthy memory, and dynamic skill routing) to turn model capability into long-horizon task execution.

Abstract

This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiable architectures around foundation models. We refer to this shift as scaling the harness: treating the structured execution layer around a foundation model as a first-class object of design, evaluation, and optimization. Although recent large language models enable agents to use tools, retrieve information, maintain memory, and execute long-horizon workflows, evaluation remains largely model-centric, often reducing agents to final-task success while treating memory, retrieval, tool use, orchestration, verification, and governance as secondary implementation details. This framing is increasingly inadequate because agent performance emerges from the interaction among the foundation model, memory substrate, context constructor, skill-routing layer, orchestration loop, and verification-and-governance layer. Together, these components form the agent harness, which translates model capability into long-horizon agent behavior. We study scaling the harness through three core bottlenecks: context governance, trustworthy memory, and dynamic skill routing, together with the orchestration and governance mechanisms that coordinate and constrain them. We further outline a research agenda for harness-level benchmarks that go beyond one-shot task success to measure trajectory quality, memory hygiene, context efficiency, communication fidelity, verification cost, and safe evolution over time. To make the discussion concrete, we develop CheetahClaws: https://github.com/SafeRL-Lab/cheetahclaws, a Python-native reference harness, and compare it with Claude Code and OpenClaw. Our main claim is that future progress in agentic AI will depend as much on system design as on stronger foundation models.

2605.26111 2026-05-26 cs.CV cs.AI cs.GR cs.LG cs.MM

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation

Shuhong Zheng, Aashish Kumar Misraa, Yu-Teng Li, Yu-Jhe Li, Igor Gilitschenski

AI summary: Proposes combining a multimodal large language model with VAE-based identity conditioning. A Dual Layer Aggregation module and a multi-stage denoising strategy balance multimodal understanding and identity preservation in subject-driven image generation, outperforming existing methods.

Comments
33 pages, 18 figures, Project Page: https://zsh2000.github.io/squeeze-mllm-subject-gen/

Abstract

Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instructions. Existing approaches often encode text and reference images separately, which limits cross-modal reasoning and causes copy-paste artifacts. Recent frameworks that connect multimodal models and diffusion models improve instruction following, but largely overlook identity preservation. To address these limitations, we condition diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, and augment this conditioning with VAE-based identity conditioning. A novel Dual Layer Aggregation (DLA) module is designed to aggregate multi-level MLLM features for optimal conditioning, and a multi-stage denoising strategy is applied to progressively balance the semantic information from the MLLM and the fine-detail identity from the VAE during inference. Extensive experiments demonstrate that our approach harmonizes multimodal understanding with identity preservation, mitigates copy-paste issues, and achieves superior human-preference performance on subject-driven image generation. Our project website is available at https://zsh2000.github.io/squeeze-mllm-subject-gen/.

2605.26110 2026-05-26 cs.LG cs.CL cs.CV

Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

Jun-Tao Tang, Yu-Cheng Shi, Zhen-Hao Xie, Da-Wei Zhou

AI summary: Addresses engineering bottlenecks in multimodal continual instruction tuning with Prism, a plug-in codebase whose lightweight plugin registration mechanism separates algorithm development from the backbone implementation and supports large-scale training pipelines for reproducible, scalable experiments.

Comments
Code is available at https://github.com/LAMDA-CL/Prism

Abstract

Multimodal Large Language Models (MLLMs) achieve versatility by reformulating diverse tasks into a unified instruction-following framework via instruction tuning. However, real-world deployment requires continuous adaptation to emerging tasks, motivating Multimodal Continual Instruction Tuning (MCIT). Despite its growing importance, current MCIT research is hindered by severe engineering bottlenecks. Existing methods are typically implemented by directly modifying the base MLLM codebase, which imposes substantial implementation overhead and yields method-specific architectures that severely limit code reuse and fair comparison. To address this, we introduce Prism, a plug-in reproducible codebase specifically designed for scalable MCIT research. It separates algorithmic development from the backbone implementation via a lightweight plugin registration mechanism, enabling new strategies to be integrated as independent plugins without modifying the underlying MLLM codebase, thereby eliminating structural fragmentation and accelerating method development. Prism natively supports widely used large-scale training pipelines, thereby enabling reproducible and scalable MCIT experimentation. Code is available at https://github.com/LAMDA-CL/Prism.
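
A plugin registration mechanism of the kind the abstract describes is often built as a decorator-based registry. The sketch below shows that general pattern only; it is not Prism's API. The names `PLUGINS`, `register`, `build_strategy`, and the two strategy classes are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical global registry mapping a strategy name to its class.
PLUGINS: Dict[str, Callable] = {}

def register(name: str):
    """Class decorator that records a strategy under `name`."""
    def decorator(cls):
        if name in PLUGINS:
            raise ValueError(f"plugin {name!r} is already registered")
        PLUGINS[name] = cls
        return cls
    return decorator

@register("replay")
class ReplayStrategy:
    """Illustrative continual-learning strategy (placeholder logic)."""
    def before_task(self, task_id: int) -> str:
        return f"replay: preparing buffer for task {task_id}"

@register("noop")
class NoOpStrategy:
    """Baseline that does nothing before each task."""
    def before_task(self, task_id: int) -> str:
        return f"noop: task {task_id}"

def build_strategy(name: str):
    """Look up a strategy by name; the training loop never imports it directly."""
    return PLUGINS[name]()

strategy = build_strategy("replay")
message = strategy.before_task(2)
```

Because the training loop only sees the registry, a new strategy can be added in its own file without touching backbone code, which is the separation the abstract argues for.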

2605.26109 2026-05-26 cs.CV

Helix4D: Complex 4D Mesh Generation

Jiraphon Yenphraphai, Jianqi Chen, Jian Wang, Gordon Qian, Sergey Tulyakov, Rameen Abdal, Raymond A. Yeh, Peter Wonka, Chaoyang Wang

AI summary: Proposes Helix4D, which extends Trellis2 from image-to-3D to video-conditioned 4D dynamic mesh generation using sliding-window cross-frame attention and a 4D temporal encoding, targeting complex topology changes, transparent materials, thin structures, and inner surfaces.

Comments
Project page: https://snap-research.github.io/helix4d/

Abstract

Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic mesh generation framework that inherits the expressive representation of Trellis2 and adapts it from image-to-3D to video-conditioned 4D generation. Our design arises from two key questions: (a) how to enable Trellis2's frame-local attention to share information across frames while preserving its pretrained quality on rare cases such as transparent objects and inner surfaces, and (b) how to inject temporal information into a purely 3D positional encoding without breaking pretrained capabilities. We address (a) with sliding-window cross-frame attention anchored on the first frame. The first frame is generated by the base Trellis2 model and injected into our model, letting it inherit Trellis2's quality in rare cases through cross-frame attention. We address (b) with a 4D temporal encoding that repurposes redundant low-frequency spatial RoPE bands for time, extending the encoding from 3D to 4D with no additional parameters. Extensive experiments show the effectiveness of Helix4D for high-quality dynamic mesh generation on ActionBench and our own challenging complex dynamics set.

2605.26107 2026-05-26 math.PR cs.PF

Radial Extremality for LRU Caching and the Fill–Holst Conjecture

Christopher D. Long

AI summary: Proves that, in the independent reference model, the uniform popularity vector is the unique global minimizer of the LRU cache hit rate and that the hit rate strictly increases along rays from the uniform vector, establishing the radial part of the Fill–Holst Schur-concavity conjecture for the move-to-front rule.

Comments
13 pages, 0 figures

Abstract

For the independent reference model with popularity vector $p\in\Delta_N^\circ$, let $H_C(p)$ denote the exact stationary hit rate of an LRU cache of capacity $C$. We prove that, for every $1\le C<N$, the uniform popularity vector is the unique global minimizer of $H_C$ on the interior simplex. More sharply, along every nonconstant segment from the uniform vector to an interior point, the LRU hit rate is strictly increasing. The proof uses the standard exponential-age representation of the stationary LRU cache and gives an explicit positive pair-square formula for the radial derivative. Equivalently, for the move-to-front rule, the stationary search-cost distribution improves strictly in the usual stochastic order along every nonconstant ray away from uniform. This proves the radial restriction of the Fill–Holst Schur-concavity conjecture for move-to-front search-cost tails. In particular, all LRU miss probabilities and all nonconstant nondecreasing stack-depth costs decrease strictly along such rays. The result is radial rather than Schur-convex: full majorization monotonicity for LRU is known to fail, and the proof identifies the special positivity that survives on rays from the uniform vector.
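
The statement can be checked numerically on a tiny instance. The sketch below does not use the paper's exponential-age representation; it swaps in brute-force enumeration, assuming the standard product-form stationary law of the move-to-front order (items appear in size-biased order, and the cache holds the first $C$ of them). The function names `lru_hit_rate` and `along_segment` are made up for this example.

```python
from itertools import permutations

def lru_hit_rate(p, capacity):
    """Exact stationary LRU hit rate under the independent reference model.

    Enumerates every ordered cache state (most recent first). Its
    stationary probability is the size-biased product
    p[i1] * p[i2] / (1 - p[i1]) * ..., and a request hits with
    probability equal to the total popularity of the cached items.
    Cost grows like N! / (N - C)!, so this is for small N only.
    """
    total = 0.0
    for state in permutations(range(len(p)), capacity):
        prob = 1.0
        used = 0.0  # popularity mass already placed in the cache
        for i in state:
            prob *= p[i] / (1.0 - used)
            used += p[i]
        total += prob * used
    return total

def along_segment(q, t):
    """Point at parameter t on the segment from the uniform vector to q."""
    n = len(q)
    return [(1.0 - t) / n + t * qi for qi in q]

q = [0.4, 0.3, 0.2, 0.1]
uniform_rate = lru_hit_rate([0.25] * 4, 2)  # equals C / N for uniform p
rates = [lru_hit_rate(along_segment(q, t), 2) for t in (0.0, 0.5, 1.0)]
```

For the uniform vector every cached item is equally likely to be requested, so the hit rate is $C/N$; the values in `rates` illustrate the claimed increase along one segment and are a sanity check, not a proof.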

2605.26106 2026-05-26 cs.LG

Looped Diffusion Language Models

Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park, Dongmin Park

AI summary: Proposes LoopMDM, which selectively loops the early-middle transformer layers to obtain a depth-scaling effect at training time without adding parameters and to scale inference-time compute by varying the loop count, improving the training efficiency and performance of masked diffusion models.

Comments
23 pages

Abstract

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that selectively looping the early-middle transformer layers significantly improves both training efficiency and model performance in MDMs. We call this approach LoopMDM (Looped Masked Diffusion Model), which brings two key benefits: looping layers at training time yields a depth-scaling effect without adding parameters, while varying the number of loops at inference time enables flexible compute scaling. Despite the simplicity, the results are striking: across multiple pre-training corpora, LoopMDM matches the performance of same-size MDMs with up to 3.3× fewer training FLOPs, while its final performance exceeds theirs on various reasoning benchmarks, including gains of up to 8.5 points on GSM8K. It even surpasses deeper non-looped MDMs trained with comparable per-step compute, indicating that selective looping is more effective than naive depth scaling. Furthermore, LoopMDM can scale inference-time compute by increasing the number of loops. Adaptively adjusting the number of loops throughout the sampling process yields additional gains in compute efficiency while maintaining performance. Lastly, with attention analysis, we provide evidence that looping is effective in MDMs by promoting interactions among masked positions. Our code and weights will be publicly released.
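
The looping idea can be sketched with plain functions standing in for transformer layers. This is an illustration under assumptions, not the LoopMDM code: the block boundaries `loop_start` and `loop_end`, the toy layers, and both function names are hypothetical, and the abstract does not say exactly which layers are looped or how loop counts are chosen.

```python
def forward(x, layers, loop_start, loop_end, num_loops):
    """Run a layer stack, repeating layers[loop_start:loop_end] num_loops times.

    The repeated block reuses the same layer objects on every pass,
    so no parameters are added.
    """
    for layer in layers[:loop_start]:
        x = layer(x)
    for _ in range(num_loops):
        for layer in layers[loop_start:loop_end]:
            x = layer(x)
    for layer in layers[loop_end:]:
        x = layer(x)
    return x

def effective_depth(num_layers, loop_start, loop_end, num_loops):
    """Number of layer applications in one forward pass."""
    return num_layers + (num_loops - 1) * (loop_end - loop_start)

# Toy "layers" acting on a number instead of a token sequence.
toy_layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

once = forward(0, toy_layers, loop_start=1, loop_end=2, num_loops=1)
thrice = forward(0, toy_layers, loop_start=1, loop_end=2, num_loops=3)
depth = effective_depth(len(toy_layers), 1, 2, 3)
```

Changing `num_loops` at call time changes the compute per forward pass while the set of layers stays fixed, which mirrors the inference-time scaling the abstract describes.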

2605.26105 2026-05-26 cs.CV

On-Policy Adversarial Flow Distillation for Autoregressive Video Generation

Yang Luo, Shengju Qian, Xiaohang Tang, Zirui Zhu, Yong Liu, Xin Wang, Yang You

AI summary: Proposes the Adversarial Flow Distillation (AFD) framework, which uses on-policy adversarial feedback and forward-process flow matching to distill heterogeneous black-box teacher models into autoregressive students.

Abstract

Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student must learn under its own rollout distribution, whereas practical teachers may expose only prompt-conditioned completed videos and may differ in architecture, capacity, temporal design, and sampling schedule. This interface makes supervised fine-tuning off-policy, score-based distillation inapplicable, and direct adversarial imitation too sparse for denoising-time credit assignment. We propose Adversarial Flow Distillation (AFD), an on-policy framework for heterogeneous black-box video distillation. AFD queries the teacher and rolls out the current student on the same prompts, trains a prompt-paired Bradley-Terry discriminator to estimate clean-sample teacher-student discrepancy, and converts the resulting on-policy advantage into forward-process flow-matching updates on the student's own noised states. Thus, AFD provides dense velocity-field supervision while requiring no teacher scores, latents, denoising trajectories, step alignment, or reverse-chain reinforcement learning. Experiments across two causal AR student families show that AFD consistently improves motion- and physics-sensitive generation while preserving general video quality, and ablations validate the importance of adaptive on-policy feedback and forward-process credit assignment. The method requires only clean teacher videos and student rollouts, providing a practical route for distilling proprietary or heterogeneous video generators into efficient autoregressive students.
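
The prompt-paired Bradley-Terry discriminator can be illustrated with the standard Bradley-Terry preference model, in which the probability that the teacher sample is preferred is the sigmoid of the score difference. The sketch covers only that pairwise objective. Treating the scores as scalar discriminator outputs for one prompt is an assumption, the function names are hypothetical, and the conversion of the resulting advantage into flow-matching updates is not shown because the abstract does not specify it.

```python
import math

def bt_loss(score_teacher, score_student):
    """Negative log-likelihood that the teacher sample is preferred.

    Equals log(1 + exp(-d)) with d = score_teacher - score_student,
    written in a form that avoids overflow for large |d|.
    """
    d = score_teacher - score_student
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))

def bt_prob(score_teacher, score_student):
    """Bradley-Terry probability sigmoid(score_teacher - score_student)."""
    return math.exp(-bt_loss(score_teacher, score_student))

tie_prob = bt_prob(0.0, 0.0)        # equal scores: no preference
clear_win = bt_loss(2.0, 0.0)       # discriminator ranks teacher higher
clear_miss = bt_loss(0.0, 2.0)      # discriminator ranks student higher
```

Because both samples share a prompt, the loss depends only on the score gap within the pair, so prompt difficulty cancels out of the comparison.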

2605.26104 2026-05-26 cs.CV

EVIDENT: Routing MLLM Adaptation through Entity-Grounded Visual Evidence for Cross-Domain Video Temporal Grounding

Geo Ahn, Jiwook Han, Youngrae Kim, Joonseok Lee, Jinwoo Choi

AI summary: To address the performance drop under domain shift in video temporal grounding, proposes EVIDENT, which uses an Entity Bottleneck Adapter, an Entity-Binding Distillation loss, and an Entity-to-eVidence gating mechanism to exploit the entity attention of pre-trained MLLMs for parameter-efficient, cross-domain-robust temporal grounding.

Abstract

Fine-tuning MLLMs for Video Temporal Grounding (VTG) often improves in-domain performance but degrades sharply under domain shift. In this work, we find that this failure is primarily driven not just by unseen query concepts, but by visual domain shift, which prevents the model from coupling its learned temporal localization knowledge with its inherent entity-attention capability. To address this, we introduce EVIDENT, a parameter-efficient adaptation framework that anchors temporal grounding in the inherent entity-attention of pre-trained MLLMs by routing VTG adaptation through explicit visual entity evidence. EVIDENT consists of three components: (i) an Entity Bottleneck Adapter that transforms dense visual tokens into compact entity-level slots, (ii) an Entity-Binding Distillation loss that instills objectness priors into the semantically unstructured MLLM visual space, guiding each slot to bind to a coherent entity, and (iii) an Entity-to-eVidence gating mechanism that leverages the captured entities as evidence, steering the model to localize moments containing query-relevant entities. Together, these components enable VTG fine-tuning to rely on entity-grounded evidence rather than brittle dataset shortcuts. Experiments on cross-domain VTG benchmarks show that EVIDENT consistently improves out-of-domain robustness while preserving competitive in-domain performance with modest parameter overhead. These results suggest that entity-level grounding is an effective inductive bias for generalizable temporal localization.

2605.26100 2026-05-26 cs.SE cs.AI

Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models

Bar Weiss, Antonio Abu-Nassar, Adi Sosnovich, Karen Yorav

AI summary: Proposes a two-stage pipeline that uses large language models for taxonomy-based labeling of changes in code patches, capturing structural relationships and semantic attributes to improve code review efficiency.

Comments
13 pages, 6 figures

Abstract

Code review is a critical practice in software engineering, yet the growing scale and frequency of code patches in modern projects, together with the widespread adoption of AI code assistants, make manual review increasingly challenging. Identifying the types of changes within a patch, such as renames, moves, or logic modifications, can substantially improve review efficiency by enabling prioritization, filtering, and automation. However, existing LLM-based approaches to code review have largely focused on summarization and comment generation, leaving structured code reviews underexplored. In this paper, we present a systematic study of using large language models (LLMs) for taxonomy-based labeling of code changes in a code patch. We introduce a two-stage pipeline that assigns labels to diff hunks and then refines them to capture structural relationships and semantic attributes, such as rename propagation and type changes. Our approach employs few-shot prompting to produce language-agnostic and customizable labels, without the engineering overhead of traditional static-analysis pipelines. We evaluate four LLMs across multiple context configurations on a manually curated benchmark of natural and synthetic patches. Our best configuration achieves up to $84\%$ recall and $81\%$ precision, with high accuracy in extracting relational and attribute metadata. These results suggest that LLM-based labeling can effectively complement static analysis by enabling flexible, multilingual, and automation-friendly code review workflows.

2605.26097 2026-05-26 cs.LG

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov, Andrew Gordon Wilson

AI summary: Studies forgetting in language models, finding that self-generated samples serve as effective replay data that nearly eliminates forgetting, and showing how capacity limits and low learning rates affect forgetting.

Abstract

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast, language models can sample from their own training distribution, and we show that these self-generated samples serve as effective replay data, nearly eliminating forgetting. We find that forgetting nonetheless persists when the model has little remaining capacity: models pretrained close to saturation cannot absorb new information without overwriting prior knowledge. When capacity is not the limiting factor, low learning rates reduce forgetting but require substantially more training steps. Replay breaks this tradeoff, enabling fast, high-learning-rate finetuning without forgetting.

2605.26096 2026-05-26 quant-ph cond-mat.other cs.CC

Rounding Almost Commuting Hamiltonians

Islam Faisal, Anand Natarajan, Alexander Poremba

AI summary: For almost commuting 2-local qubit Hamiltonians, proposes a locality-preserving algorithmic rounding technique that approximates them by commuting Hamiltonians, with applications to Gibbs sampling and fast Hamiltonian simulation.

Comments
41 pages

Abstract

Commuting Hamiltonians lie at the boundary between classical constraint satisfaction and quantum many-body physics, exhibiting rich quantum structure while remaining more tractable than general noncommuting models. In contrast, physical Hamiltonians are rarely exactly commuting, which naturally motivates the study of almost commuting Hamiltonians. Despite their relevance, the implications of approximate commutation are only poorly understood. In this work, we show how to efficiently approximate any almost commuting $2$-local qubit Hamiltonian by a commuting one: we give a locality-preserving algorithmic rounding technique that maps any $2$-local Hamiltonian $H=\sum_{i=1}^m h_i$ with $\|[h_i,h_j]\| \leq \epsilon$ to a nearby Hamiltonian $\hat{H}$ whose terms pairwise commute, and which is within overall distance $\|H-\hat{H}\| = O(m\,\epsilon^{1/6})$. As a consequence, we show that $\delta$-approximations to the ground energy for $\epsilon$-almost commuting $2$-local qubit Hamiltonians lie in $\mathsf{NP}$ when $\delta\gg m\epsilon^{1/6}$, extending the classical containment well beyond the commuting setting. Finally, we present two applications of our rounding framework: Gibbs sampling and fast Hamiltonian simulation for almost commuting systems.
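
As a reading aid, the step from the rounding bound to the ground-energy approximation can be sketched as follows. This is a hedged reconstruction, not the paper's proof: $c$ is a placeholder for the unspecified constant in the $O(\cdot)$ bound, $\lambda_{\min}$ denotes the ground energy, and the only tool used is Weyl's inequality for Hermitian matrices.

$$
\left|\lambda_{\min}(H) - \lambda_{\min}(\hat{H})\right| \;\le\; \|H - \hat{H}\| \;\le\; c\, m\, \epsilon^{1/6}.
$$

Under these assumptions, any estimate of $\lambda_{\min}(\hat{H})$ accurate to within $\delta - c\,m\,\epsilon^{1/6}$ is a $\delta$-approximation of $\lambda_{\min}(H)$, which leaves room to spare once $\delta \gg m\epsilon^{1/6}$. The containment in $\mathsf{NP}$ is then inherited from the commuting Hamiltonian $\hat{H}$, as the abstract indicates.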

2605.26095 2026-05-26 cs.CV

Pixel-Level Pavement Distress Assessment Using Instance Segmentation

Logan Dewick, Bibesh Pyakurel, Kong Pheng Yang, Nazim Choudhury, M. G. Sarwar Murshed

AI summary: Proposes a pavement distress analysis system based on Mask R-CNN instance segmentation that segments cracks and potholes on a custom dataset, and evaluates it on field pavement imagery.

Comments
7 pages, 6 figures

Abstract

Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a vision-based pavement distress analysis system based on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were considered under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the 2.170% ground-truth crack-area fraction. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the dataset, reaching 27.5% precision and 20.7% recall on the validation protocol. The results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.
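
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F1 score from the stated precision and recall and the relative gap between predicted and ground-truth crack-area fractions. The YOLO F1 value is derived here from the reported precision and recall and is not a number given in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Mask R-CNN with ResNet-101 FPN backbone, values in percent from the abstract.
mask_rcnn_f1 = f1_score(84.23, 90.04)

# CSPDarknet53-based YOLO detector: F1 derived here, not reported in the source.
yolo_f1 = f1_score(27.5, 20.7)

# Relative error of the aggregate predicted crack-area fraction (2.164%)
# against the ground-truth fraction (2.170%).
area_rel_error = abs(2.164 - 2.170) / 2.170
```

Rounded to two decimals, the recomputed Mask R-CNN F1 agrees with the reported 87.04%, and the crack-area fractions differ by roughly 0.28% in relative terms.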

2605.26093 2026-05-26 cs.LG stat.ML

Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty

Jinwoo Go, Xiaoning Qian, Byung-Jun Yoon

AI summary: Proposes the GoBOED framework, which combines a variational posterior surrogate with a differentiable convex decision layer to optimize experimental designs directly for downstream decision quality, and shows theoretically that its gradients are insensitive to decision-irrelevant parameter directions.

Abstract

Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, reducing parameter uncertainty does not necessarily improve downstream decisions, as only specific parameter directions relevant to the objective truly matter. We propose GoBOED, a goal-driven BOED framework that directly optimizes experimental designs for a specified decision-making objective. GoBOED combines an amortized variational posterior surrogate with a differentiable convex decision layer, enabling gradient-based design optimization that is fully decision-focused. We theoretically show that GoBOED gradients are insensitive to parameter directions irrelevant to the decision objective, providing a formal justification for why goal-driven design achieves equivalent decision quality over a wider set of experimental designs than information-gain maximization. Empirically, across source localization, epidemic management, and pharmacokinetic control, GoBOED identifies designs that better align with downstream decision objectives and reveals that near-optimal design windows are substantially wider than those predicted by goal-agnostic BOED approaches.

2605.26090 2026-05-26 math.NA cs.NA quant-ph

Quantum Domain Decomposition for Preconditioning the Finite Element Method

量子域分解用于有限元方法预条件

Elise Fressart, Michel Nowak, Nicole Spillane

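AI summary: Studies quantum domain decomposition as a preconditioner for the finite-element-discretized Poisson problem, derives the complexity of the quantum linear system solver from upper bounds on the block-encoding parameters, and uses the BPX preconditioner as the local solver.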

详情
AI中文摘要

即使量子线性求解器相比经典求解器能提供显著加速,其性能仍取决于某些相同参数。特别是,待求逆矩阵的条件数是一个决定性参数。一个众所周知的经典(现也有量子)补救措施是通过预乘一个矩阵$H$来对线性系统$A x = b$进行预条件,使得$HA$的条件数远小于$A$的条件数。本文中,我们关注一类称为域分解的预条件子。首先,我们证明应用量子域分解是可行的。我们给出了由有限元方法离散并经两级加性Schwarz预条件子(最基本的域分解技术之一)预条件的泊松问题的块编码参数上界。从这些上界,我们推导出量子线性系统求解器的复杂度。其次,我们通过应用[Deiml和Peterseim, \textit{Math. Comput.}, 2025]关于Bramble–Pasciak–Xu (BPX)预条件子的近期工作,专注于域分解预条件子中特定局部求解器的选择。最后,我们提供了算子实现的具体细节。

英文摘要

Even in cases where quantum linear solvers provide significant speedup compared to their classical counterparts, their performance depends on some of the same parameters. In particular, the condition number of the matrix which is to be inverted is a decisive parameter. A well known classical, and now quantum, remedy is to precondition the linear system $A x = b$ by premultiplying it by a matrix $H$ in such a way that the condition number of $HA$ is significantly smaller than the condition number of $A$. In this work, we focus on a family of preconditioners called domain decomposition. First, we prove that it is feasible to apply quantum domain decomposition. We provide upper bounds for the block-encoding parameters of the Poisson problem discretized by the finite element method and preconditioned by the two-level Additive Schwarz preconditioner (one of the most fundamental domain decomposition techniques). From these bounds, we deduce the complexity of the quantum linear system solver. Second, we focus on a particular choice of local solver within the domain decomposition preconditioner by applying recent work by [Deiml and Peterseim, \textit{Math. Comput.}, 2025] on the Bramble--Pasciak--Xu (BPX) preconditioner. Finally, we provide details on how the operators are implemented.
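
The effect of the preconditioner on the condition number can be illustrated classically. The sketch below builds a two-level additive Schwarz preconditioner $H$ for a 1D P1 finite element Poisson matrix $A$ and compares the condition numbers of $A$ and $HA$. The mesh size, the eight subdomains, the overlap width, and all function names are assumptions made for illustration; the quantum block-encoding studied in the paper is not modeled.

```python
import numpy as np

n = 63                      # interior fine nodes, mesh size h = 1/(n+1)
h = 1.0 / (n + 1)
# P1 stiffness matrix for -u'' = f on (0, 1) with zero Dirichlet data.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h


def restriction(indices, n):
    """Boolean restriction matrix selecting the given fine-grid nodes."""
    R = np.zeros((len(indices), n))
    R[np.arange(len(indices)), indices] = 1.0
    return R


# Eight overlapping subdomains, each extended by two nodes on both sides.
n_sub, overlap = 8, 2
blocks = np.array_split(np.arange(n), n_sub)
local_R = [
    restriction(np.arange(max(b[0] - overlap, 0), min(b[-1] + overlap, n - 1) + 1), n)
    for b in blocks
]

# Coarse space: hat functions on a grid eight times coarser, sampled at fine nodes.
ratio = (n + 1) // n_sub
fine = np.arange(1, n + 1)
coarse_nodes = np.arange(ratio, n + 1, ratio)
R0 = np.maximum(0.0, 1.0 - np.abs(fine[None, :] - coarse_nodes[:, None]) / ratio)


def additive_schwarz(A, restrictions):
    """H = sum_i R_i^T (R_i A R_i^T)^{-1} R_i."""
    H = np.zeros_like(A)
    for R in restrictions:
        H += R.T @ np.linalg.solve(R @ A @ R.T, R)
    return H


def spectral_condition(M):
    """Ratio of extreme eigenvalues; the spectra here are real and positive."""
    eig = np.linalg.eigvals(M).real
    return eig.max() / eig.min()


H = additive_schwarz(A, [R0] + local_R)
cond_A = spectral_condition(A)
cond_HA = spectral_condition(H @ A)
print(f"cond(A) = {cond_A:.1f}, cond(HA) = {cond_HA:.1f}")
```

Dropping `R0` from the list gives the one-level method, which makes it easy to see what the coarse space contributes as the number of subdomains grows.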

2605.26087 2026-05-26 stat.ML cs.LG

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

DiscoverPhysics: 基准测试LLMs的即用型科学思维

Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, Pavel Izmailov, Carolina Cuesta-Lázaro

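AI summary: Introduces DiscoverPhysics, an interactive benchmark in which LLM agents explore simulated worlds whose physical laws deviate from reality, evaluating their ability to design experiments, revise hypotheses, and discover physical laws.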

详情
AI中文摘要

前沿LLM现在在广泛的物理评估中表现强劲,但很难区分真正的推理与对已知科学的回忆。我们引入了DiscoverPhysics,一个交互式基准,要求LLM代理发现一个模拟世界的运动定律,该世界的物理故意偏离我们自己的世界。我们构建了22个世界,分别由屏蔽重力、分数幂重力、多物种耦合、隐藏暗物质样粒子、非坐标无关物理以及时变相互作用等支配。每个世界由N体模拟器按需生成,代理提出多轮实验,观察原始轨迹数据,最终提交对世界物理的自然语言解释以及推断定律的Python实现。由于解决一个世界需要代理设计信息性实验并修正其假设,该基准探测了在实验历史之上的长程推理。我们沿着两个互补轴评估提交:保留粒子的轨迹MSE和LLM评判的解释分数,该分数遵循专家编写的评估每个世界概念理解的规则。在11个前沿模型中,我们发现最强的代理仅通过一半的世界,并且在那些必须揭示潜在结构的世界中持续失败。开源模型在设计信息性实验和从数据中提取结论的能力方面明显落后于商业模型。我们进一步发现,良好的预测准确性并不能保证高质量的解释,并且概念理解依赖于通过精心选择的实验进行假设修正。

英文摘要

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own. We construct 22 worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, non-coordinate-free physics, and time-varying interactions. Each world is generated on demand by an N-body simulator, for which the agent proposes several rounds of experiments, observes raw trajectory data, and ultimately submits both a natural-language explanation of the world's physics and a Python implementation of the inferred law. Because solving a world requires the agent to design informative experiments and revise its hypotheses, the benchmark probes long-horizon reasoning over an experimental history. We evaluate submissions along two complementary axes: trajectory MSE on held-out particles and an LLM-judged explanation score following an expert-written rubric assessing conceptual understanding of each world. Across eleven frontier models, we find that the strongest agents pass only half of the worlds and consistently fail on those where latent structure must be uncovered. Open-source models lag substantially behind commercial models, both in designing informative experiments and in extracting conclusions from the data. We further find that good predictive accuracy does not guarantee high explanation quality and that conceptual understanding depends on hypothesis refinement through well-chosen experiments.
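
To make the trajectory-MSE axis concrete, here is a minimal sketch of scoring a candidate force law against observed trajectories in a fractional-power gravity world. The single test particle, the exponent 1.5, the initial conditions, and the function names are assumptions for illustration and are not taken from the benchmark's simulator.

```python
import math


def simulate(power, steps=200, dt=0.01):
    """Velocity-Verlet orbit of a test particle around a unit mass at the origin,
    with attraction magnitude 1 / r**power (power = 2 is Newtonian gravity)."""
    def accel(x, y):
        r = math.hypot(x, y)
        return -x / r ** (power + 1), -y / r ** (power + 1)

    x, y, vx, vy = 1.0, 0.0, 0.0, 0.8
    ax, ay = accel(x, y)
    path = []
    for _ in range(steps):
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        ax_new, ay_new = accel(x, y)
        vx += 0.5 * (ax + ax_new) * dt
        vy += 0.5 * (ay + ay_new) * dt
        ax, ay = ax_new, ay_new
        path.append((x, y))
    return path


def trajectory_mse(a, b):
    """Mean squared position error between two trajectories of equal length."""
    return sum((xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(a, b)) / len(a)


observed = simulate(power=1.5)  # stands in for the hidden world
mse_newton = trajectory_mse(observed, simulate(power=2.0))
mse_fractional = trajectory_mse(observed, simulate(power=1.5))
print(f"Newtonian guess: {mse_newton:.3e}, fractional guess: {mse_fractional:.3e}")
```

Both laws give the same force at unit distance, so an experiment that keeps the particle near that radius would barely separate them; this is one reason experiment design matters for the agent.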

2605.26086 2026-05-26 cs.AI

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Claw-Anything: 对更广泛访问用户数字世界的始终在线个人助手的基准测试

Yusong Lin, Xinyuan Liang, Haiyang Wang, Qipeng Gu, Siqi Cheng, Jiangui Chen, Shuzhe Wu, Feiyang Pan, Lue Fan, Sanyuan Zhao, Dandan Tu

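AI summary: Introduces the Claw-Anything benchmark, which evaluates large language model agents in an always-on setting by expanding context along three dimensions (long-horizon activity histories, interdependent backend services, and GUI and CLI interaction across multiple devices); GPT-5.5 reaches only 34.5% pass@1, and a released automated data-generation pipeline improves the base model by 23.7%.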

详情
AI中文摘要

大型语言模型代理越来越被设想为始终在线的个人助手,能够访问用户数字世界中任何相关的内容。然而,当前系统仅在该世界的狭窄片段上运行,限制了上下文敏感推理和有效协助。现有基准测试同样仅提供部分用户状态,因此无法捕捉在这种广泛、始终在线环境下的性能。为填补这一空白,我们引入了Claw-Anything,一个沿三个维度扩展代理上下文的基准测试:长期活动历史、相互依赖的后端服务以及跨多设备的集成GUI和CLI交互。为实例化这一设置,我们通过多轮事件注入模拟数月的用户活动,产生复杂的世界状态和现实噪声,包括无关事件和冲突信号。代理必须在丰富的上下文环境中进行推理,同时对此类噪声保持鲁棒性。这种扩展范围还使得能够评估主动协助,要求代理预测用户需求并提供及时建议。实验表明,GPT-5.5仅达到34.5%的pass@1,显著低于先前的基准测试,凸显了当前代理能力与始终在线个人协助需求之间的差距。除基准测试外,我们还发布了一个自动化数据生成管道,该管道产生了2000个训练环境,并将基线模型提升了23.7%,展示了可扩展数据基础设施的实用性。

英文摘要

Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet current systems operate over only narrow slices of that world, limiting context-sensitive reasoning and effective assistance. Existing benchmarks similarly provide only partial user state and therefore fail to capture performance in such a broad, always-on setting. To address this gap, we introduce Claw-Anything, a benchmark that expands agent context along three dimensions: long-horizon activity histories, interdependent backend services, and integrated GUI and CLI interaction across multiple devices. To instantiate this setting, we simulate months of user activity through multi-round event injection, producing complex world states and realistic noise, including irrelevant events and conflicting signals. Agents must reason over rich contextual environments while remaining robust to such noise. This expanded scope also enables the evaluation of proactive assistance, requiring agents to anticipate user needs and deliver timely recommendations. Experiments show that GPT-5.5 achieves only 34.5% pass@1, substantially below prior benchmarks, underscoring a gap between current agent capabilities and the demands of always-on personal assistance. Alongside the benchmark, we release an automated data-generation pipeline that yields 2,000 training environments and improves the base model by 23.7%, demonstrating the utility of scalable data infrastructure.
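
The abstract does not say how pass@1 is computed. A common convention, assumed here, is the unbiased pass@k estimator averaged over tasks, which for k = 1 reduces to the fraction of passing attempts. The function name and the per-task counts below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n attempts on a task, c of which passed."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical (attempts, passes) counts for three tasks.
per_task = [(10, 3), (10, 0), (10, 7)]
benchmark_pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in per_task) / len(per_task)
print(f"pass@1 = {benchmark_pass_at_1:.3f}")
```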

2605.26084 2026-05-26 cs.CY

What is 'undone computer science'?

什么是“未完成的计算机科学”?

Chantal Enguehard, Guillaume Munch-Maccagnoni, Alberto Naibo

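AI summary: Brings the social-science concept of 'undone science' to computer science, examining important research questions that are neglected or left unfunded because of the discipline's structure, paradigms, and related factors.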

详情
Journal ref
Philosophia Scientiæ, 30(2), 2026, pp. 5-16
Comments
Translation of "Qu'est-ce que la science informatique non faite ?" In: Philosophia Scientiæ, 30(2): Undone Computer Science. June 2026, pp. 5-16, doi: 10.4000/16952. Translated by Richard Dickinson (Inist-CNRS)
AI中文摘要

“未完成的科学”概念出现在2010年代,源于社会科学中社会运动研究与科学技术研究的交叉领域。它指的是那些值得探索却被忽视、忽略或未获资助的研究问题。本期特刊旨在将该概念应用于计算机科学,通过考察该学科的组织方式(包括其社会学、经济学和政治维度)以及塑造它的范式,是否使得识别对其发展和构想至关重要的认识论和伦理问题成为可能。

英文摘要

The concept of 'undone science' emerged in the 2010s in research in social sciences at the intersection of studies on social movements and of science and technology studies. It refers to research questions that are neglected, ignored, or left unfunded, even though they deserve to be explored. The aim of this special issue is to apply this concept to computer science, by examining whether the way this discipline is structured (including its sociological, economic, and political dimensions), as well as the paradigms that shape it, make it possible to identify epistemological and ethical questions that are crucial for its development and conception.

2605.26081 2026-05-26 cs.AI

VeriTrace: Evolving Mental Models for Deep Research Agents

VeriTrace:深度研究智能体的心智模型演化

Haolang Zhao, Yunbo Long, Lukas Beckenbauer, Alexandra Brintrup

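AI summary: To address the information uncertainty faced by deep research agents, proposes VeriTrace, a cognitive-graph framework that evolves the agent's mental model through explicit feedback loops (interpretive update, deviation feedback, and schema revision), achieving significant improvements on DeepResearch Bench and DeepConsult.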

详情
AI中文摘要

深度研究智能体面临广阔、相互依赖且普遍不确定的信息。现有系统探索了演化中的中间表示应如何呈现,但将其演化留给LLM的隐式推理。没有显式调节,中间层容易被混合质量信息污染,并沿其依赖关系传播错误,因此模型规模往往最终替代了缺失的调节。我们认为,智能体的心智模型应通过持续将任务理解与现实对齐的显式反馈来演化,并识别出三种调节循环:解释更新、偏差反馈和模式修正。我们在VeriTrace中实现了这一点,这是一个显式实现这三种循环的认知图框架。使用匹配的Qwen3.5-27B骨干网络,VeriTrace在DeepResearch Bench (DRB) Insight上比最强匹配基线提高4.22个百分点(总体提高1.49个百分点),在DeepConsult上总体胜率提高5.9个百分点。使用Config-DeepSeek,它在DRB上取得了最强的可复现开源结果。

英文摘要

Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations should look like, but leave their evolution to the LLM's implicit reasoning. Without explicit regulation, the intermediate layer is easily contaminated by mixed-quality information and propagates errors along its dependencies, so model scale often ends up substituting for absent regulation. We argue that an agent's mental model should instead evolve through explicit feedback that continuously aligns task understanding with reality, and identify three regulatory loops: interpretive update, deviation feedback, and schema revision. We realise this in VeriTrace, a cognitive-graph framework that explicitly implements the three loops. Using matched Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on DeepResearch Bench (DRB) Insight (1.49 pp Overall) and by 5.9 pp Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DRB.

2605.26076 2026-05-26 cs.CE

AI-Powered Sustainable Finance: An Integrative Taxonomy and Framework of AI Applications for Sustainable Investment Decision-Making

AI驱动的可持续金融:面向可持续投资决策的AI应用综合分类与框架

Eduardo C. Garrido-Merchán, Esther Vaquero Lafuente, Elisa Aracil

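AI summary: Proposes a taxonomy of AI applications spanning machine learning, natural language processing, and optimization algorithms for ESG score prediction, controversy detection, portfolio management, and sustainability report analysis, and builds a framework for AI-powered sustainable finance aimed at overcoming ESG data barriers.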

详情
AI中文摘要

人工智能融入可持续金融代表了环境、社会和治理因素如何被分析、预测并纳入投资决策的变革性范式转变。本文综述了适用于可持续投资决策的AI方法综合分类法,根据其底层算法及其对ESG相关金融流程的影响对方法进行分类。所提出的AI分类法包括机器学习范式——包括监督学习、无监督学习和强化学习——以及自然语言处理技术和优化算法,考察它们在ESG评分预测、争议检测、投资组合管理和可持续性报告分析中的具体应用。通过综合近期文献的研究结果,构建了一个AI驱动的可持续金融框架,该框架识别了克服ESG数据障碍的技术应用。

英文摘要

The integration of Artificial Intelligence into sustainable finance represents a transformative paradigm shift in how Environmental, Social, and Governance factors are analyzed, predicted, and incorporated into investment decisions. This review provides a comprehensive taxonomy of AI approaches applicable to sustainable investment decision-making, categorizing methodologies based on their underlying algorithms and their impact on ESG-related financial processes. The proposed AI Taxonomy includes machine learning paradigms -- including supervised, unsupervised, and reinforcement learning -- as well as natural language processing techniques and optimization algorithms, examining their specific applications in ESG score prediction, controversy detection, portfolio management, and sustainability report analysis. By synthesizing findings from the recent literature, a framework emerges on AI-powered sustainable finance that identifies technological applications to overcome ESG data barriers.

2605.26074 2026-05-26 cs.CL cs.AI q-fin.GN

StakeBench: Evaluating Language Understanding Grounded in Market Commitment

StakeBench: 评估基于市场承诺的语言理解

Yunhua Pei, Jingyu Hu, Yiwei Shi, Hongnan Ma, Weiru Liu, John Cartlidge

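AI summary: Proposes StakeBench, a framework that links market comments to verifiable trading records, derives supervision signals automatically from market behavior, and evaluates how well language models understand market commitment.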

详情
Comments
21 pages, 2 figures, 20 tables. Preprint. Dataset and evaluation code included
AI中文摘要

现有的金融自然语言处理基准通常依赖外部观察者提供的标签,衡量语言如何被感知而非说话者在市场中承诺了什么。我们引入StakeBench,一个基于市场承诺的语言理解评估框架。StakeBench将来自2261个已结算市场的560,876条评论与Polymarket和Manifold上可验证的头寸、行动和市场赔率记录相关联。监督信号来自可观察的市场行为。头寸方向、评论后交易行动和市场赔率轨迹取代了人工标注。四个诊断任务测试模型是否检测到市场承诺、识别揭示的方向、预测未来行动以及执行集体赔率预测。三个承诺感知指标衡量与揭示偏好而非感知情绪的一致性。有效性审计和明确的解释边界有助于区分可观察的承诺信号与潜在信念和因果市场赔率影响。在15个LLM、18个主题和平台设置中,模型部分恢复了头寸方向信号,定向准确率从0.506到0.599,但在后续任务中出现结构性失败。15个模型中有10个在未来行动预测中崩溃为一到两个行动标签,且没有模型在集体赔率预测中持续优于朴素赔率方向基线。模型规模与性能不相关,金融领域微调不改善揭示方向识别,平台激励强烈影响高阶结果。StakeBench在CC-BY 4.0许可下附带评估代码和数据集。

英文摘要

Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what speakers have committed to in the market. We introduce StakeBench, an evaluation framework for language understanding grounded in market commitment. StakeBench links 560,876 comments from 2,261 resolved markets to verified position, action, and market-odds records across Polymarket and Manifold. Supervision is derived from observable market behavior. Position sides, post-comment trading actions, and market-odds trajectories replace human annotation. Four diagnostic tasks test whether models detect market commitment, identify the revealed side, anticipate future action, and perform collective odds projection. Three commitment-aware metrics measure alignment with revealed preferences rather than perceived sentiment. Validity audits and explicit interpretation boundaries help distinguish observable commitment signals from latent belief and causal market-odds impact. Across 15 LLMs and 18 topics and platform settings, models partially recover position-side signals, with Directed Accuracy from 0.506 to 0.599, but show structural failures on later tasks. Ten of the fifteen models collapse to one or two action labels in future action anticipation, and no model consistently improves on the naive odds-direction baseline in collective odds projection. Model scale is not correlated with performance, finance-domain tuning does not improve revealed-side identification, and platform incentives strongly shape higher-order results. StakeBench is packaged with evaluation code and dataset under CC-BY 4.0.

2605.26072 2026-05-26 cs.LG

Active Query Synthesis for Preference Learning

用于偏好学习的主动查询合成

Namrata Nadagouda, Nauman Ahad, Maegan Tucker, Mark A. Davenport

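AI summary: To address unreliable query feedback and the computational bottleneck of pool-based evaluation in preference learning, proposes Info-Synth, an active query synthesis framework that maximizes mutual information in a continuous space, extends it with two finite-pool query strategies, and validates it on synthetic data, text summarization, and robot control tasks.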

详情
Comments
27 pages, 12 figures
AI中文摘要

用户偏好的高效学习对于许多现代决策系统至关重要,但通常需要昂贵的标注数据。主动学习降低了这一成本,然而由于基于池的评估,标准方法计算开销大。此外,大多数方法假设所有查询反馈同样可靠,忽略了几乎相同或完全不同的项目之间的成对查询会产生模糊、低置信度的响应。为了解决反馈可靠性问题,我们引入了一种新颖的置信度感知响应模型,明确考虑了这些模糊比较。为了克服基于池评估的计算瓶颈,我们提出了一个主动查询合成框架Info-Synth,通过在连续空间内最大化基于互信息的目标来生成最优查询。此外,我们提出了两种策略,Pair M-dist和Pair Opt-dist,将Info-Synth扩展到即使限制在有限查询池中也能选择有效查询。我们通过合成偏好学习、受限文本摘要数据集以及模拟移动机器人的主观连续空间控制器增益调优,展示了我们框架的通用性和性能。

英文摘要

Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address the issue of feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose an active query synthesis framework, Info-Synth, that generates optimal queries by maximizing a mutual-information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.
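
A mutual-information objective for a pairwise query can be estimated from posterior samples as the entropy of the averaged response minus the average response entropy. The sketch below assumes a plain logistic response model with a linear utility; it is not the paper's confidence-aware model, it only compares two fixed queries rather than synthesizing one, and all names are illustrative.

```python
import math
import random


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) response, clipped for numerical safety."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def preference_prob(w, a, b):
    """Logistic probability that item a is preferred to item b under utility w."""
    score = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1.0 / (1.0 + math.exp(-score))


def mutual_information(posterior_samples, a, b):
    """Monte Carlo estimate of I(response; w) for the query (a, b)."""
    probs = [preference_prob(w, a, b) for w in posterior_samples]
    mean_p = sum(probs) / len(probs)
    return binary_entropy(mean_p) - sum(binary_entropy(p) for p in probs) / len(probs)


rng = random.Random(0)
samples = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)) for _ in range(500)]
mi_informative = mutual_information(samples, (1.0, 0.0), (-1.0, 0.0))
mi_near_identical = mutual_information(samples, (0.01, 0.0), (0.0, 0.0))
print(f"well-separated pair: {mi_informative:.3f} nats")
print(f"near-identical pair: {mi_near_identical:.6f} nats")
```

Under this simple model a near-identical pair is uninformative because every posterior sample predicts a coin flip. It does not capture the low-confidence responses for entirely dissimilar items that the abstract also mentions; that requires a richer response model.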

2605.26070 2026-05-26 cs.CL

WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification

WhoSaidIt:面向文本的多语言说话人属性分类的人机协作标注

Lingyu Gao, Will Monroe, David Smith, Meghan Jemison, Jackie Lee

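AI summary: Proposes a human-LLM collaborative re-annotation framework that stabilizes multilingual speaker-attribute labels through iterative interaction and disagreement-focused sampling, builds the WhoSaidIt dataset, and evaluates LLM performance.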

详情
Comments
16 pages in total
AI中文摘要

从文本中标注说话人属性本质上是模糊的,尤其是在多语言环境中,人口统计和社会线索是隐含的且因文化而异。我们提出了一种人类-大语言模型(LLM)协作重标注框架,用于在实际资源限制下稳定多语言说话人属性标签。从嘈杂语料库开始,我们通过专家迭代交互利用LLM揭示重复出现的标注理由,并应用分歧聚焦采样进行针对性重标注。使用该框架,我们构建了WhoSaidIt,一个涵盖九个说话人属性标签的多语言数据集。我们量化了原始标注与修订标注之间的差异,对近期LLM进行了基准测试,并分析了显式理由对模型行为的影响。我们的结果揭示了标注决策中的显著跨语言差异,并展示了LLM在说话人属性分类中的优势与局限性。

英文摘要

Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and culturally variable. We propose a human-large language model (LLM) collaborative re-annotation framework for stabilizing multilingual speaker-attribute labels under practical resource constraints. Starting from a noisy corpus, we use LLMs to surface recurring annotation rationales through iterative interaction with experts, and apply disagreement-focused sampling for targeted re-annotation. Using this framework, we construct WhoSaidIt, a multilingual dataset covering nine speaker-attribute labels. We quantify divergence between original and revised annotations, benchmark recent LLMs, and analyze the effect of explicit rationales on model behavior. Our results reveal substantial cross-lingual differences in annotation decisions and demonstrate both the strengths and limitations of LLMs in speaker-attribute classification.

2605.26067 2026-05-26 cs.LG cs.AI

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

条件KRR:将无惩罚特征注入核方法及其在核阈值处理中的应用

Rustem Takhanov, Zhenisbek Assylbekov

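AI summary: Analyzes the statistical properties of conditional KRR by reducing it to KRR with a residual kernel, and identifies the conditions under which it outperforms standard KRR in the principal-eigenfunction and random-feature settings.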

详情
Comments
Accepted to ICML 2026
AI中文摘要

条件正定(CPD)核是相对于函数类$\mathcal{F}$定义的。众所周知,这样的核$K$与其原生空间(类似于RKHS定义)相关联,进而产生一种学习方法——称为条件核岭回归(条件KRR),因其与KRR的类比而得名——其中估计的回归函数通过其原生空间范数的平方进行惩罚。该方法之所以引人关注,是因为它可以被视为经典线性回归(由$\mathcal{F}$指定特征),随后对目标变量的残差(未解释)部分应用标准KRR。这类方法最近引起了越来越多的关注。 我们通过将其行为简化为带有另一个固定核(称为残差核)的KRR来研究该方法的统计性质。我们的主要理论结果表明,这种简化确实是可能的,代价是期望测试风险中增加一个由$\mathcal{O}(1/\sqrt{N})$界定的额外项,其中$N$是样本量,隐藏常数依赖于类$\mathcal{F}$和输入分布。 这种简化使我们能够分析在$K$是正定的且$\mathcal{F}$由$K$的Mercer分解中的前$k$个主特征函数给出的情况下的条件KRR。我们还考虑了$\mathcal{F}$由来自$K$的随机特征表示的$k$个随机特征组成的设置。事实证明,这两种设置密切相关。我们的理论分析和实验都证实,只要回归函数的$\mathcal{F}$分量比残差部分更显著,条件KRR在这些情况下优于标准KRR。

英文摘要

Conditionally positive definite (CPD) kernels are defined with respect to a function class $\mathcal{F}$. It is well known that such a kernel $K$ is associated with its native space (defined analogously to an RKHS), which in turn gives rise to a learning method -- called conditional kernel ridge regression (conditional KRR) due to its analogy with KRR -- where the estimated regression function is penalized by the square of its native space norm. This method is of interest because it can be viewed as classical linear regression, with features specified by $\mathcal{F}$, followed by the application of standard KRR to the residual (unexplained) component of the target variable. Methods of this type have recently attracted increasing attention. We study the statistical properties of this method by reducing its behavior to that of KRR with another fixed kernel, called the residual kernel. Our main theoretical result shows that such a reduction is indeed possible, at the cost of an additional term in the expected test risk, bounded by $\mathcal{O}(1/\sqrt{N})$, where $N$ is the sample size and the hidden constant depends on the class $\mathcal{F}$ and the input distribution. This reduction enables us to analyze conditional KRR in the case where $K$ is positive definite and $\mathcal{F}$ is given by the first $k$ principal eigenfunctions in the Mercer decomposition of $K$. We also consider the setting where $\mathcal{F}$ consists of $k$ random features from a random feature representation of $K$. It turns out that these two settings are closely related. Both our theoretical analysis and experiments confirm that conditional KRR outperforms standard KRR in these cases whenever the $\mathcal{F}$-component of the regression function is more pronounced than the residual part.
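
The "unpenalized features plus penalized kernel part" structure can be sketched with the standard saddle-point system used for smoothing with conditionally positive definite kernels: $(K + \lambda I)\alpha + P\beta = y$ with the side condition $P^\top \alpha = 0$. Whether the paper uses exactly this scaling of $\lambda$ is an assumption, as are the Gaussian kernel, the feature class $\mathcal{F} = \mathrm{span}\{1, x\}$, and the function name.

```python
import numpy as np


def conditional_krr_fit(K, P, y, lam):
    """Solve [[K + lam*I, P], [P^T, 0]] [alpha; beta] = [y; 0].

    alpha weights the (penalized) kernel part, beta the unpenalized features in P.
    """
    n, k = P.shape
    system = np.block([[K + lam * np.eye(n), P],
                       [P.T, np.zeros((k, k))]])
    rhs = np.concatenate([y, np.zeros(k)])
    solution = np.linalg.solve(system, rhs)
    return solution[:n], solution[n:]


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, size=40))
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2 ** 2)  # Gaussian kernel matrix
P = np.column_stack([np.ones_like(x), x])                     # features: 1 and x
y = 1.0 + 2.0 * x + 0.3 * np.sin(6.0 * x)

alpha, beta = conditional_krr_fit(K, P, y, lam=1e-2)
fitted = K @ alpha + P @ beta

# A target lying entirely in span(F) is reproduced without any kernel part.
alpha_lin, beta_lin = conditional_krr_fit(K, P, P @ np.array([1.0, 2.0]), lam=1e-2)
```

The second fit shows what "unpenalized" means in practice: a purely linear target yields `alpha_lin` equal to zero up to rounding and `beta_lin` equal to the true coefficients, regardless of the regularization strength.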

2605.26066 2026-05-26 cs.CE cs.ET

The Evolution of Digital Twins from Reactive to Agentic Systems

数字孪生从反应式系统到代理式系统的演进

Omer San, Adil Rasheed, Eda Bozdemir, Jun Deng

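AI summary: Discusses how digital twins are evolving from reactive systems into self-learning, autonomous agentic systems, and stresses the importance of interoperability, standardization, and cross-sector integration of artificial intelligence and advanced computational reasoning.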

详情
Journal ref
Nat Comput Sci 6, 6-10 (2026)
AI中文摘要

数字孪生正在演变为自学习、自主的系统,连接模型、数据和人类交互。实现其全部潜力取决于互操作性、标准化以及人工智能和高级计算推理跨领域的集成。

英文摘要

Digital twins are evolving into self-learning, autonomous systems that link models, data, and human interaction. Realizing their full potential depends on interoperability, standardization, and the integration of artificial intelligence and advanced computational reasoning across sectors.

2605.26062 2026-05-26 cs.GR cs.CV

Look Both Ways Before You Cross: Lifting Cross Fields From 2D Visual Priors

过马路前左右看:从2D视觉先验中提取交叉场

Dale Decatur, Jacob Serfaty, Oded Stein, Amir Vaxman, Rana Hanocka

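AI summary: Proposes CrossLift, which uses text-to-image priors to extract directional signals from 2D images, back-projects them onto the mesh surface, and aggregates them with two smooth interpolations to produce semantically aligned cross fields and quad meshes.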

详情
Comments
Project page at: https://crosslift.github.io/
AI中文摘要

我们提出了CrossLift,一种由图像中的视觉特征引导的网格交叉场计算技术。我们利用强大的文本到图像先验,这些先验能够合成特征对齐的二维四边形网格图像。我们将此信号提取为2D图像中明确的逐像素方向,然后将其反投影到网格表面。我们通过在网格表面上执行两次平滑插值(首先在每个视图内,然后在多个视图之间)来聚合这些候选表面方向。我们在每次插值中为候选方向提出基于置信度的自定义权重,这使我们能够解决同一面上的候选方向之间的冲突,并将我们的场平滑插值到被遮挡的面。我们的方法是模块化的,可以与许多不同的2D视觉先验一起使用。我们展示了在纹理对齐四边形网格以及使用粗略的用户绘制线条作为信号的交互式交叉场设计中的额外应用。我们在多种有机和机械形状上展示了CrossLift的有效性,并生成了与现有方法相比具有优越语义对齐的四边形网格。项目页面:https://crosslift.github.io/

英文摘要

We present CrossLift, a technique for computing cross fields on meshes guided by visual features in images. We leverage powerful text-to-image priors that are capable of synthesizing images of feature-aligned quad meshes in 2D. We extract this signal as explicit per-pixel directions in the 2D images, which we then back-project to the mesh surface. We aggregate these candidate surface directions by performing two smooth interpolations on the mesh surface (first within each view and second across multiple views). We propose custom confidence-based weights for the candidate directions in each interpolation that allow us to resolve conflicts between candidates on the same face and smoothly interpolate our field to occluded faces. Our method is modular and can be used with many different 2D visual priors. We show additional applications to texture-aligned quad meshing as well as interactive cross-field design using coarse, user-drawn lines as signal. We demonstrate the effectiveness of CrossLift on a diverse set of both organic and mechanical shapes and produce quad meshes that exhibit superior semantic alignment as compared to existing methods. Project page at: https://crosslift.github.io/
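
Resolving conflicting candidate directions on one face requires an average that respects the fourfold symmetry of a cross. A common device, assumed here rather than taken from the paper, is to encode each angle $\theta$ as $e^{4i\theta}$, take a confidence-weighted sum, and decode by dividing the argument by four. The candidate angles and weights below are hypothetical, and the paper's smooth interpolation over the mesh is not modeled.

```python
import cmath
import math


def average_cross_direction(angles, weights):
    """Confidence-weighted average of cross directions given as angles in radians.

    A cross is unchanged by a 90-degree rotation, so each angle is encoded as
    exp(4i * theta) before summing and decoded by dividing the phase by 4.
    """
    total = sum(w * cmath.exp(4j * theta) for theta, w in zip(angles, weights))
    return (cmath.phase(total) / 4.0) % (math.pi / 2.0)


# Two views report the same cross (10 and 100 degrees differ by a quarter turn);
# a third, low-confidence view disagrees.
candidates = [math.radians(10), math.radians(100), math.radians(50)]
confidences = [1.0, 1.0, 0.1]
theta = average_cross_direction(candidates, confidences)
print(f"averaged cross angle: {math.degrees(theta):.2f} degrees")
```

Averaging the raw angles instead would give about 53 degrees, close to the worst possible answer for this cross, which is why the fourfold encoding is used.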

2605.26061 2026-05-26 cs.LG cs.AI

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning

神经元随机注意力电路(NSAC)用于概率表示学习

Waleed Razzaq, Yun-Bo Zhao

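AI summary: Proposes NSAC, a biologically inspired continuous-time attention architecture that induces a Gaussian distribution over logits via an Ornstein-Uhlenbeck stochastic differential equation with NCP gating, yielding probabilistic outputs and uncertainty quantification.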

详情
AI中文摘要

连续时间表示学习中不确定性估计的可靠量化仍处于初级阶段,尤其是在连续时间注意力架构中。我们引入了神经元随机注意力电路(NSAC),这是一种新颖的受生物学启发的连续时间注意力架构,它将注意力logit计算重新表述为Ornstein-Uhlenbeck随机微分方程的解,该方程由来自重新利用的秀丽隐杆线虫神经元电路策略(NCP)布线机制的输入依赖的非线性互连门调制。它在logits上诱导高斯分布,通过注意力权重上的逻辑正态分布传播原则性的随机性,从而产生概率输出。一个结合高斯负对数似然与认知分离正则化器的两项目标函数强制更高的预测方差,并能够联合量化偶然不确定性和认知不确定性。实验上,我们在多种学习任务中实现了NSAC,包括:(i) 不规则连续时间函数逼近;(ii) 多元回归;(iii) 长程预测;(iv) 工业4.0;以及(v) 自动驾驶车辆的车道保持。我们观察到,NSAC在准确性上与多个基线保持竞争力,产生合理校准的不确定性估计,同时在神经元细胞级别具有可解释性。

英文摘要

Reliable quantification of uncertainty estimates in continuous-time (CT) representation learning remains nascent, particularly within CT attention architectures. We introduce the Neuronal Stochastic Attention Circuit (NSAC), a novel biologically-inspired CT attention architecture that reformulates attention logit computation as the solution of an Ornstein-Uhlenbeck stochastic differential equation modulated by input-dependent, nonlinear interlinked gates derived from the repurposed wiring mechanism of C. elegans Neuronal Circuit Policies (NCPs). It induces a Gaussian distribution over logits that propagates principled stochasticity through a logistic-normal distribution over attention weights to yield probabilistic outputs. A two-term objective function combining Gaussian negative log-likelihood with an epistemic-separation regularizer enforces higher predictive variance and enables joint quantification of aleatoric and epistemic uncertainty. Empirically, we implement NSAC in a diverse set of learning tasks including: (i) irregular CT function approximation; (ii) multivariate regression; (iii) long-range forecasting; (iv) Industry 4.0; and (v) the lane-keeping of autonomous vehicles. We observe that NSAC remains competitive against several baselines in terms of accuracy and produces reasonably well-calibrated uncertainty estimates while being interpretable at the neuronal cell level.
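
The chain from an Ornstein-Uhlenbeck process to Gaussian logits to logistic-normal attention weights can be sketched with the closed-form OU marginal. Here the drift rate, target, and noise level are fixed constants chosen for illustration; in the architecture described above they would be produced by input-dependent gates, which this sketch does not model.

```python
import math
import random


def ou_marginal(z0, mu, theta, sigma, t):
    """Mean and variance at time t of dz = theta*(mu - z) dt + sigma dW, z(0) = z0."""
    decay = math.exp(-theta * t)
    mean = mu + (z0 - mu) * decay
    var = sigma ** 2 / (2.0 * theta) * (1.0 - decay ** 2)
    return mean, var


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_attention(z0s, mus, theta, sigma, t, rng):
    """Draw Gaussian logits from the OU marginal and map them to attention weights."""
    logits = []
    for z0, mu in zip(z0s, mus):
        mean, var = ou_marginal(z0, mu, theta, sigma, t)
        logits.append(rng.gauss(mean, math.sqrt(var)))
    return softmax(logits)


rng = random.Random(0)
draws = [sample_attention([0.0, 0.0, 0.0], [2.0, 0.0, -1.0], 1.5, 0.5, 1.0, rng)
         for _ in range(1000)]
mean_weights = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
print("mean attention weights:", [round(w, 3) for w in mean_weights])
```

Because each draw is a softmax of a Gaussian vector, the weights follow a logistic-normal distribution, and their spread across draws gives a simple handle on predictive uncertainty.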

2605.26059 2026-05-26 physics.flu-dyn cs.LG

Accelerating Bayesian inverse design in computational fluid dynamics using neural operators

利用神经算子加速计算流体力学中的贝叶斯逆向设计

Bipin Tiwari, Omer San

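AI summary: Embeds a neural operator surrogate in the MCMC sampling loop for Bayesian inverse design in computational fluid dynamics, achieving a speedup of more than three orders of magnitude while preserving posterior structure.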

详情
Journal ref
Mach. Learn. Comput. Sci. Eng 2, 14 (2026)
AI中文摘要

贝叶斯逆向设计提供了一个原则性框架,用于从稀疏流场观测中推断空气动力学几何形状并量化不确定性。然而,其在计算流体力学(CFD)中的实际应用受到基于梯度的马尔可夫链蒙特卡洛(MCMC)采样所需重复高保真模拟成本的严重限制。虽然通常提出代理模型来降低这一成本,但它们对后验几何和不确定性(尤其是激波主导流)的影响仍知之甚少。在这项工作中,我们证明神经算子代理可以直接嵌入MCMC推断循环中,同时保持后验结构。通过准一维喷管流的全贝叶斯逆公式,我们证明几何参数化在可辨识性和后验条件中起决定性作用,其中三次B样条产生稳定且物理意义明确的不确定性估计。基于该公式,在No-U-Turn采样器中用CFD生成数据训练的深度算子网络替代CFD求解器,同时保持似然模型、先验和采样配置不变。在从稀疏到完全观测的范围内,基于代理的推断再现了CFD参考的后验几何和不确定性趋势。由于代理集成,总推断时间减少到一秒以下,对应超过三个数量级的加速。此外,直接逆神经算子作为逆向设计的确定性替代方案被研究,无需后验采样即可实现单次几何重建。这些结果表明,神经算子加速的贝叶斯推断能够为空气动力学应用实现实用的、不确定性感知的逆向设计工作流程。

英文摘要

Bayesian inverse design provides a principled framework for inferring aerodynamic geometries from sparse flow observations while quantifying uncertainty. However, its practical use in computational fluid dynamics (CFD) is severely limited by the cost of repeated high-fidelity simulations required for gradient-based Markov chain Monte Carlo (MCMC) sampling. While surrogate models are commonly proposed to reduce this cost, their effect on posterior geometry and uncertainty, especially for shock-dominated flows, remains poorly understood. In this work, we demonstrate that neural operator surrogates can be embedded directly within the MCMC inference loop while preserving posterior structure. Using a fully Bayesian inverse formulation of quasi-one-dimensional nozzle flow, we demonstrate that geometry parameterization plays a decisive role in identifiability and posterior conditioning, with cubic B-splines yielding stable and physically meaningful uncertainty estimates. Building on this formulation, a Deep Operator Network trained on CFD-generated data is substituted for the CFD solver within a No-U-Turn Sampler, while keeping the likelihood model, priors, and sampling configuration unchanged. Across sparse to fully observed regimes, surrogate-based inference reproduces the posterior geometry and uncertainty trends of the CFD reference. As a result of surrogate integration, total inference time is reduced to under one second, corresponding to a speedup exceeding three orders of magnitude. In addition, a direct inverse neural operator is examined as a deterministic alternative for inverse design, enabling single-shot geometry reconstruction without posterior sampling. These results demonstrate that neural operator-accelerated Bayesian inference enables practical, uncertainty-aware inverse design workflows for aerodynamic applications.

2605.26054 2026-05-26 math.NA cs.NA

A Fully Discrete Energy-Based Discontinuous Galerkin Method for Variable-Order Time-Fractional Wave Equations

变阶时间分数阶波动方程的全离散基于能量的间断Galerkin方法

Lu Zhang

AI总结 针对变阶时间分数阶波动方程,提出并分析了一种全离散基于能量的间断Galerkin方法,通过累积权重变差估计建立了能量稳定性和二阶时间收敛性。

详情
AI中文摘要

变阶时间分数阶波动方程为具有演化记忆效应和异常时间动态的波动现象提供了灵活模型。其数值逼近具有挑战性,因为变阶分数阶导数产生时间依赖的历史权重,因此缺乏常阶分数阶算子的标准时间平移不变卷积结构。本文针对具有Caputo型变阶时间分数阶导数的波动方程,开发并分析了一种全离散基于能量的间断Galerkin(DG)方法。方程被重新表述为简化的一阶时间系统,在空间上通过基于能量的DG方法离散,并在时间上通过在每个时间区间内特殊选取的点处使用变阶Caputo导数的二阶近似推进。主要分析新颖之处是变阶记忆权重的累积权重变差估计,仅要求变阶$α:[0,T] \rightarrow (0,1)$是Lipschitz连续的。基于该估计,我们建立了全离散格式的能量稳定性,并推导了二阶时间收敛性以及能量范数空间误差估计。该分析在一般仿射单纯形或张量积网格上给出次优收敛,在附加笛卡尔和通量假设下给出最优收敛。一维和二维数值实验验证了理论结果。

英文摘要

Variable-order time-fractional wave equations provide a flexible model for wave phenomena with evolving memory effects and anomalous temporal dynamics. Their numerical approximation is challenging because the variable-order fractional derivative generates time-dependent history weights and therefore lacks the standard time-translation-invariant convolution structure of constant-order fractional operators. In this paper, we develop and analyze a fully discrete energy-based discontinuous Galerkin (DG) method for wave equations with a Caputo-type variable-order time-fractional derivative. The equation is reformulated as a reduced first-order-in-time system, discretized in space by an energy-based DG method, and advanced in time using a second-order approximation of the variable-order Caputo derivative at a specially chosen point in each time interval. The main analytical novelty is a cumulative weight-variation estimate for the variable-order memory weights, which requires only that the variable order $α:[0,T] \rightarrow (0,1)$ be Lipschitz continuous. Based on this estimate, we establish energy stability of the fully discrete scheme and derive second-order temporal convergence together with energy-norm spatial error estimates. The analysis gives suboptimal convergence on general affine simplicial or tensor-product meshes and optimal convergence under additional Cartesian and flux assumptions. Numerical experiments in one and two dimensions validate the theoretical findings.