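arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.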

2605.15199 2026-05-15 cs.CV cs.AI

EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

Ruozhen He, Meng Wei, Ziyan Yang, Vicente Ordonez

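AI summary: EntityBench is a benchmark for evaluating entity consistency in multi-shot video generation. It contains 140 episodes (2,491 shots in total) drawn from real narrative media, spans scenarios at different difficulty levels, and explicitly tracks the continuity of characters, objects, and locations across shots. The benchmark introduces a three-part evaluation suite covering per-shot quality, prompt alignment, and cross-shot consistency, with a "fidelity gate" that ensures only accurately rendered entities count toward the cross-shot score. The study also proposes EntityMem, a memory-augmented generation method that stores a visual reference for each entity before generation and markedly improves cross-shot entity consistency.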

2605.15198 2026-05-15 cs.CV cs.AI cs.CL

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

Ziyu Guo, Rain Liu, Xinyan Chen, Pheng-Ann Heng

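AI summary: This work proposes ATLAS, a visual reasoning framework that addresses the computational overhead and limited task generalization of prior methods. ATLAS uses a single discrete "functional token" as both an agentic operation and a latent visual reasoning unit; it requires no visual supervision and is compatible with standard training pipelines. The authors also introduce LA-GRPO to stabilize training, and experiments show that ATLAS performs strongly on several benchmarks while remaining efficient and interpretable.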

Comments: Project Page: https://atlas-oneword.github.io Code: https://github.com/ZiyuGuo99/ATLAS

Abstract

Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising direction in the field. A straightforward approach is to directly generate images via unified models during reasoning, but this is computationally expensive and architecturally non-trivial. Recent alternatives include agentic reasoning through code or tool calls, and latent reasoning with learnable hidden embeddings. However, agentic methods incur context-switching latency from external execution, while latent methods lack task generalization and are difficult to train with autoregressive parallelization. To combine their strengths while mitigating their limitations, we propose ATLAS, a framework in which a single discrete 'word', termed as a functional token, serves both as an agentic operation and a latent visual reasoning unit. Each functional token is associated with an internalized visual operation, yet requires no visual supervision and remains a standard token in the tokenizer vocabulary, which can be generated via next-token prediction. This design avoids verbose intermediate visual content generation, while preserving compatibility with the vanilla scalable SFT and RL training, without architectural or methodological modifications. To further address the sparsity of functional tokens during RL, we introduce Latent-Anchored GRPO (LA-GRPO), which stabilizes the training by anchoring functional tokens with a statically weighted auxiliary objective, providing stronger gradient updates. Extensive experiments and analyses demonstrate that ATLAS achieves superior performance on challenging benchmarks while maintaining clear interpretability. We hope ATLAS offers a new paradigm inspiring future visual reasoning research.

2605.15196 2026-05-15 cs.CV cs.LG

RefDecoder: Enhancing Visual Generation with Conditional Video Decoding

Xiang Fan, Yuheng Wang, Bohan Fang, Zhongzheng Ren, Ranjay Krishna

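AI summary: This paper proposes RefDecoder, a reference-conditioned video decoder that improves detail fidelity and structural consistency in visual generation. It injects a high-fidelity reference image signal during decoding: a reference attention mechanism encodes the reference image into high-dimensional features and processes them jointly with the denoised video latents. Experiments show that RefDecoder substantially improves PSNR on several benchmarks and can be plugged into existing video generation systems without additional fine-tuning, improving subject consistency, background consistency, and overall quality.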

2605.15195 2026-05-15 cs.CV

VGGT-$Ω$

Jianyuan Wang, Minghao Chen, Shangzhan Zhang, Nikita Karaev, Johannes Schönberger, Patrick Labatut, Piotr Bojanowski, David Novotny, Andrea Vedaldi, Christian Rupprecht

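AI summary: This paper presents VGGT-$Ω$, an improved feed-forward reconstruction model that raises reconstruction accuracy and efficiency for both static and dynamic scenes. By simplifying the architecture and introducing registers and a self-supervised learning protocol, VGGT-$Ω$ greatly reduces GPU memory use while improving performance, and it achieves strong results on multiple benchmarks, for example improving camera estimation accuracy on Sintel by 77%. The study also shows that the learned registers can support spatial understanding in vision-language-action models.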

Comments: CVPR 2026 (Oral)

Abstract

Recent feed-forward reconstruction models, such as VGGT, have proven competitive with traditional optimization-based reconstructors while also providing geometry-aware features useful for other tasks. Here, we show that the quality of these models scales predictably with model and data size. We do so by introducing VGGT-$Ω$, which substantially improves reconstruction accuracy, efficiency, and capabilities for both static and dynamic scenes. To enable training this model at an unprecedented scale, we introduce architectural changes that improve training efficiency, a high-quality data annotation pipeline that supports dynamic scenes, and a self-supervised learning protocol. We simplify VGGT's architecture by using a single dense prediction head with multi-task supervision and removing the expensive high-resolution convolutional layers. We also use registers to aggregate scene information into a compact representation and introduce register attention, which restricts inter-frame information exchange to these registers, in part replacing global attention. In this way, during training, VGGT-$Ω$ uses only about 30% of the GPU memory of its predecessor, allowing us to train with 15x more supervised data than prior work and to leverage vast amounts of unlabeled video data. VGGT-$Ω$ achieves strong results for reconstruction of static and dynamic scenes across multiple benchmarks, for example, improving over the previous best camera estimation accuracy on Sintel by 77%. We also show that the learned registers can improve vision-language-action models and support alignment with language, suggesting that reconstruction can be a powerful and scalable proxy task for spatial understanding. Project Page: http://vggt-omega.github.io/

2605.15193 2026-05-15 cs.CV

Aligning Latent Geometry for Spherical Flow Matching in Image Generation

Tuna Han Salih Meral, Kaan Oktay, Hidir Yesiltepe, Adil Kaan Akan, Pinar Yanardag

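AI summary: This work targets latent flow matching for image generation and proposes improving generation quality by aligning the geometry of the latent space. The authors observe that conventional methods transport Gaussian noise to the VAE latent space along Euclidean paths, but these paths do not stay on the thin spherical shell where the latents concentrate. They decompose latents into radial and angular components and find that perceptual and semantic information is carried mainly by direction. They therefore project data latents onto a fixed-radius sphere and replace the conventional interpolation with spherical linear interpolation, so that the generation path stays on the sphere throughout, which markedly improves image quality.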
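
The contrast between a Euclidean path and a spherical one can be illustrated with a minimal sketch. This is not the paper's implementation: the dimension, the radius `sqrt(dim)`, and the helper names `project_to_sphere` and `slerp` are assumptions chosen for illustration.

```python
import numpy as np

def project_to_sphere(z, radius):
    """Rescale a vector to a fixed norm, keeping only its direction."""
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    return radius * z / norms

def slerp(x0, x1, t):
    """Spherical linear interpolation between two vectors of equal norm."""
    cos_omega = np.dot(x0, x1) / (np.linalg.norm(x0) * np.linalg.norm(x1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < 1e-8:  # nearly parallel: linear interpolation is accurate enough
        return (1.0 - t) * x0 + t * x1
    return (np.sin((1.0 - t) * omega) * x0 + np.sin(t * omega) * x1) / np.sin(omega)

rng = np.random.default_rng(0)
dim = 64                      # hypothetical latent dimension
radius = np.sqrt(dim)         # hypothetical fixed radius
noise = project_to_sphere(rng.standard_normal(dim), radius)
latent = project_to_sphere(rng.standard_normal(dim), radius)

midpoint_slerp = slerp(noise, latent, 0.5)     # stays on the sphere
midpoint_linear = 0.5 * noise + 0.5 * latent   # cuts through the interior

print(np.linalg.norm(midpoint_slerp), np.linalg.norm(midpoint_linear), radius)
```

The linear midpoint has a smaller norm than either endpoint, which is the off-shell behavior the summary describes; the slerp midpoint keeps the endpoint norm.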

2605.15190 2026-05-15 cs.CV

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Yanzuo Lu, Ronglai Zuo, Jiankang Deng

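AI summary: This paper proposes RAVEN, a real-time autoregressive video extrapolation network that generates future video chunks in real time from previously generated content. To counter the long-horizon quality degradation caused by the mismatch between training-time and inference-time history distributions, RAVEN recasts the self-rollout during training as an interleaved sequence of clean history endpoints and noisy denoising states, aligning training-time attention with inference-time extrapolation. The paper also introduces CM-GRPO, which is based on conditional Gaussian transitions and optimizes the consistency sampling steps with online reinforcement learning. Experiments show that RAVEN outperforms existing causal video distillation methods on several evaluation metrics.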

2605.15188 2026-05-15 cs.LG cs.AI cs.CL

FutureSim: Replaying World Events to Evaluate Adaptive Agents

Shashwat Goel, Nikhil Chandak, Arvindh Arun, Ameya Prabhu, Steffen Staab, Moritz Hardt, Maksym Andriushchenko, Jonas Geiping

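AI summary: This paper presents FutureSim, a benchmark platform for evaluating how well adaptive AI agents predict real-world events. It replays real news events in chronological order and tests agents on predicting events that occur after their knowledge cutoff. Experiments show that current frontier agents achieve generally low prediction accuracy in March, with the best reaching only 25%, which points to significant remaining challenges in long-term adaptation and reasoning under uncertainty. FutureSim offers a realistic testbed for research on long-term adaptation, search, memory, and uncertainty reasoning.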

2605.15187 2026-05-15 cs.CV cs.GR cs.RO

Articraft: An Agentic System for Scalable Articulated 3D Asset Generation

Matt Zhou, Ruining Li, Xiaoyang Lyu, Zhaomou Song, Zhening Huang, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi, Shangzhe Wu

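AI summary: This paper proposes Articraft, an agentic system for generating articulated 3D assets at scale. It recasts generation as program writing and has a large language model write the code, sidestepping the current lack of large, diverse datasets. Articraft provides a dedicated programming interface and validation mechanisms so that the language model can efficiently produce code covering part definitions, geometry composition, joint setup, and validation tests, yielding high-quality articulated 3D assets. Experiments show that it outperforms state-of-the-art generation tools in quality. The authors also use it to build a high-quality dataset of 10,000 samples spanning 245 object categories, for training and for applications such as robot simulation and virtual reality.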

2605.15185 2026-05-15 cs.CV cs.AI

Quantitative Video World Model Evaluation for Geometric-Consistency

Jiaxin Wu, Yihao Pi, Yinling Zhang, Yuheng Li, Xueyan Zou

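AI summary: This paper proposes PDI-Bench, a quantitative evaluation framework for detecting geometric inconsistency in generated videos. It obtains object-centric observations through segmentation and point tracking, lifts them to 3D with monocular reconstruction, and computes projective-geometry residuals along three failure dimensions: scale-depth alignment, 3D motion consistency, and structural rigidity. The study also builds PDI-Dataset for systematically evaluating the geometric properties of generated videos, and it exposes shortcomings of existing generative models in physical plausibility.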

2605.15184 2026-05-15 cs.CL

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

Sahil Sen, Akhil Kasturi, Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah

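AI summary: This paper studies how retrieval strategies (such as grep and vector retrieval) interact with the agent harness and the tool-calling style in agentic search systems. In two experiments, the authors compare grep and vector retrieval under different ways of presenting tool results and in the presence of distractors. They find that grep performs better in most cases, but that overall results depend heavily on the agent harness and tool-calling style used. The study provides empirical evidence for choosing retrieval strategies in agentic search systems.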

2605.15183 2026-05-15 cs.LG

When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

ML Nissen Gonzalez, Melwina Albuquerque, Laurence Wroe, Jacob Meyer Cohen, Logan Riggs Smith, Thomas Dooms

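AI summary: This paper studies how to decide whether two neural networks implement the same computational mechanism and proposes a tensor-based similarity measure. The measure is invariant to weight-space symmetries, captures global functional equivalence, and accounts for cross-layer mechanisms. Compared with existing methods, it tracks functional training dynamics more precisely, and it turns measuring similarity and verifying faithfulness into algebraic problems, improving the accuracy and reliability of mechanistic interpretability.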

2605.15182 2026-05-15 cs.CV

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

Yifan Wang, Tong He

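AI summary: This paper proposes Warp-as-History, which aims to produce videos that follow a controllable camera trajectory with no extra training, or with fine-tuning on as little as a single training video. The method converts camera-induced image warps into pseudo-history, combines them with target-frame positional alignment and visible-token selection, and feeds them directly into the video generation model to steer it along the specified camera path. Experiments show good camera-trajectory following in the zero-shot setting, and lightweight fine-tuning further improves video quality and motion consistency.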

2605.15181 2026-05-15 cs.CV

From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing

Anirudh Sundara Rajan, Krishna Kumar Singh, Yong Jae Lee

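AI summary: This work addresses abstract, multi-step instructions in open-ended image editing and proposes a framework that tightly couples planning and execution. Its core components are a planner that produces atomic decomposition steps, an orchestrator that selects editing tools and regions, and a reward based on vision-language judgments that guides the editing process. The orchestrator is optimized through reward-driven execution, and successful trajectories are fed back to the planner, resulting in more coherent and reliable edits.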

2605.15179 2026-05-15 cs.LG cs.AI physics.comp-ph

Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

Ellwil Sharma, Arastu Sharma

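AI summary: This paper studies how to eliminate negative transfer in multi-physics foundation models, that is, the gradient conflict and unstable optimization that arise when disparate partial differential equation (PDE) systems are trained together. The authors propose Shodh-MoE, a sparsely activated mixture-of-experts (MoE) architecture. A physics-informed autoencoder produces compressed physical latents, and a soft-semantic router assigns localized latent patches from different physical mechanisms to specialized expert subnetworks, enabling efficient and stable multi-physics modeling. Experiments show that the method preserves mass conservation while substantially improving prediction accuracy across physical regimes.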

Comments: 5 pages, 4 figures

Abstract

Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operators. In particular, broadband open-channel fluid dynamics and boundary-dominated porous media flows impose incompatible spectral and geometric demands on a single dense parameter path. We introduce Shodh-MoE, a sparse-activated latent transformer architecture for multi-physics transport. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder with an intra-tokenizer Helmholtz-style velocity parameterization, restricting decoded states to divergence-free velocity manifolds. The model guarantees exact mass conservation, achieving a physically verifiable velocity divergence of ~2.8 x 10^-10 (evaluated post-hoc in FP64) on 128^3 grids. A Top-1 soft-semantic router dynamically assigns localized latent patches to expert subnetworks, enabling specialized parameter paths for distinct physical mechanisms while preserving shared experts for universal symmetries. In a 20,000-step distributed pretraining run over mixed three-dimensional physical tensors, routing telemetry shows autonomous domain bifurcation: held-out validation tokens from the open-channel domain route exclusively to Expert 0, while porous-media tokens route exclusively to Expert 1. The model converges simultaneously across both regimes, achieving latent validation MSEs of 2.46 x 10^-5 and 9.76 x 10^-6, and decoded physical MSEs of 2.48 x 10^-6 and 1.76 x 10^-6. These results support sparse expert routing as a practical architectural mechanism for mitigating multi-physics interference in universal neural operators.
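
Top-1 expert routing, as mentioned in the abstract, can be sketched in a few lines. This is a generic illustration under stated assumptions, not Shodh-MoE itself: the experts are plain linear maps, the shared experts are omitted, and the names `top1_route` and `moe_forward` and all sizes are hypothetical.

```python
import numpy as np

def top1_route(tokens, router_weights):
    """Pick, for each token, the expert with the highest softmax router score.

    tokens: (num_tokens, dim); router_weights: (dim, num_experts).
    Returns the chosen expert index and its gate value for every token.
    """
    logits = tokens @ router_weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    expert_idx = probs.argmax(axis=1)
    gate = probs[np.arange(len(tokens)), expert_idx]
    return expert_idx, gate

def moe_forward(tokens, router_weights, expert_weights):
    """Run only the selected expert on each token and scale by its gate."""
    expert_idx, gate = top1_route(tokens, router_weights)
    out = np.zeros_like(tokens)
    for e, w in enumerate(expert_weights):
        mask = expert_idx == e
        out[mask] = gate[mask, None] * (tokens[mask] @ w)
    return out, expert_idx, gate

rng = np.random.default_rng(0)
dim, num_experts = 8, 2       # hypothetical sizes
tokens = rng.standard_normal((6, dim))
router_weights = rng.standard_normal((dim, num_experts))
expert_weights = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]

out, expert_idx, gate = moe_forward(tokens, router_weights, expert_weights)
print(expert_idx)
```

Because each token activates a single expert, tokens from different regimes can follow separate parameter paths, which is the property the routing telemetry in the abstract reports.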

2605.15178 2026-05-15 cs.CV

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

Haoyi Zhu, Haozhe Liu, Yuyang Zhao, Tian Ye, Junsong Chen, Jincheng Yu, Tong He, Song Han, Enze Xie

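AI summary: This paper proposes SANA-WM, an efficient world model that generates high-fidelity 720p video at the one-minute scale with precise camera control. Through core designs including hybrid linear attention, dual-branch camera control, a two-stage generation pipeline, and a robust annotation pipeline, the model substantially improves training and inference efficiency while maintaining visual quality. Experiments show that SANA-WM requires less data, training time, and hardware than existing open-source models, and that it achieves higher action-following accuracy and generation throughput on a one-minute world-modeling benchmark.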

2605.15174 2026-05-15 quant-ph cond-mat.stat-mech cs.IT math-ph math.IT math.MP

Universal quantum resource distillation via composite generalised quantum Stein's lemma

Ludovico Lami, Bartosz Regula, Ryuji Takagi

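AI summary: This paper studies universal quantum resource distillation and shows that the optimal distillation rate can be achieved without exact knowledge of the input state, demonstrating the robustness of quantum resource distillation. The core technique extends the generalised quantum Stein's lemma to a composite hypothesis-testing setting built from i.i.d. copies of an unknown state. The result provides theoretical support for tasks such as entanglement distillation and shows that the optimal rate is given by the regularised relative entropy of entanglement.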
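
For reference, the regularised relative entropy of entanglement named in the summary has the following standard textbook definition. The notation ($\mathrm{SEP}$ for the set of separable states, $D$ for the quantum relative entropy) is an assumption and may differ from the paper's.

```latex
D(\rho \,\|\, \sigma) = \operatorname{Tr}\,\rho\,(\log\rho - \log\sigma), \qquad
E_R(\rho) = \min_{\sigma \in \mathrm{SEP}} D(\rho \,\|\, \sigma), \qquad
E_R^{\infty}(\rho) = \lim_{n \to \infty} \frac{1}{n}\, E_R\!\left(\rho^{\otimes n}\right).
```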

2605.15173 2026-05-15 cs.DS cs.DB

Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs

Quinten De Man, Gilvir Gill, Michael A. Bender, Laxman Dhulipala, David Tench

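AI summary: This paper studies efficient dynamic graph connectivity on sparse graphs and proposes a hybrid graph sketching approach: it distinguishes the sparse periphery from the dense core and sketches only the dense part, substantially reducing space while preserving performance. Key ingredients are BalloonSketch, a new algorithm that sharply reduces per-vertex sketch size, and HybridSCALE, a system that achieves space efficiency across graphs of different densities. On real graph data, the method saves up to 97% of the space used by a conventional lossless approach.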

Abstract

Dynamic connectivity is a fundamental dynamic graph problem, and recent algorithmic breakthroughs on dynamic graph sketching have reshaped what is theoretically possible: by encoding the graph as per-vertex linear sketches, these algorithms solve dynamic connectivity in only $\Theta(V \log^2 V)$ space, independent of the number of edges, outperforming lossless $\Theta(V+E)$-space structures that grow as the graph becomes denser. Prior to this work, no practical dynamic connectivity algorithm had been able to translate these theoretical breakthroughs into space savings on real-world graphs. The main obstacle is that per-vertex sketches cost thousands of bytes per vertex, so sketching only pays off once the graph becomes extremely dense. We observe that sparse real-world graphs are often not uniformly sparse: these graphs can contain dense cores on a small subset of vertices that account for a large fraction of edges. We exploit this structure via hybrid sketching: sketch only the dense core, and store the sparse periphery losslessly. We design new hybrid algorithms for fully-dynamic and semi-streaming connectivity with space $O(\min\{V+E, V \log V \log(2+E/V)\})$ w.h.p., simultaneously matching the lossless bound on sparse graphs, the sketching bound on dense graphs, and improving on both in an intermediate regime. A key component is BalloonSketch, a new $\ell_0$-sampler reducing per-vertex sketch sizes by up to 8x. We implement HybridSCALE, a modular system treating the lossless and sketch-based components as subroutines. HybridSCALE is the first sketch-based dynamic connectivity system to save space on common real-world graphs. Compared to the state-of-the-art lossless baseline, HybridSCALE saves up to 15% space on sparse graphs (average degree < 100), up to 92% on intermediate density graphs (average degree ~ 100-1000), and up to 97% on dense graphs (average degree > 1000).
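
To get a feel for the three asymptotic bounds in the abstract, the sketch below evaluates them numerically. It is only an illustration: constants hidden by the asymptotic notation are set to 1 and logarithms are taken base 2 (both assumptions), so the numbers show regimes rather than real memory use.

```python
import math

def lossless_space(v, e):
    """Theta(V + E): store every vertex and edge explicitly."""
    return v + e

def sketch_space(v):
    """Theta(V log^2 V): per-vertex linear sketches, independent of E."""
    return v * math.log2(v) ** 2

def hybrid_space(v, e):
    """O(min{V + E, V log V log(2 + E/V)}): the hybrid bound from the abstract."""
    return min(v + e, v * math.log2(v) * math.log2(2 + e / v))

v = 10**6
for avg_degree in (2, 100, 10_000):
    e = v * avg_degree
    print(avg_degree, lossless_space(v, e), round(sketch_space(v)), round(hybrid_space(v, e)))
```

With these simplifications, the hybrid bound coincides with the lossless one at low average degree and falls below both alternatives at high average degree.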

2605.15172 2026-05-15 cs.CR cs.CL

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma, Ahmed Salem

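AI summary: This paper proposes MetaBackdoor, a new backdoor attack that uses positional encoding in large language models as the trigger, activating the backdoor without modifying the input text. The study finds that position-based triggers can reliably activate stealthy backdoor behavior, causing the model to leak sensitive information or take malicious actions once a length condition is met. The work broadens the threat model for LLM backdoors, exposes positional encoding as a previously overlooked attack surface, and poses new challenges for defense design.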

Abstract

Backdoor attacks pose a serious security threat to large language models (LLMs), which are increasingly deployed as general-purpose assistants in safety- and privacy-critical applications. Existing LLM backdoors rely primarily on content-based triggers, requiring explicit modification of the input text. In this work, we show that this assumption is unnecessary and limiting. We introduce MetaBackdoor, a new class of backdoor attacks that exploits positional information as the trigger, without modifying textual content. Our key insight is that Transformer-based LLMs necessarily encode token positions to process ordered sequences. As a result, length-correlated positional structure is reflected in the model's internal computation and can be used as an effective non-content trigger signal. We demonstrate that even a simple length-based positional trigger is sufficient to activate stealthy backdoors. Unlike prior attacks, MetaBackdoor operates on visibly and semantically clean inputs and enables qualitatively new capabilities. We show that a backdoored LLM can be induced to disclose sensitive internal information, including proprietary system prompts, once a length condition is satisfied. We further demonstrate a self-activation scenario, where normal multi-turn interaction can move the conversation context into the trigger region and induce malicious tool-call behavior without attacker-supplied trigger text. In addition, MetaBackdoor is orthogonal to content-based backdoors and can be composed with them to create more precise and harder-to-detect activation conditions. Our results expand the threat model of LLM backdoors by revealing positional encoding as a previously overlooked attack surface. This challenges defenses that focus on detecting suspicious text and highlights the need for new defense strategies that explicitly account for positional triggers in modern LLM architectures.

2605.15171 2026-05-15 cs.CV cs.AI cs.LG

Evidential Reasoning Advances Interpretable Real-World Disease Screening

Chenyu Lian, Hong-Yu Zhou, Jing Qin

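AI summary: This paper proposes EviScreen, an interpretable disease screening framework based on evidential reasoning, aimed at the interpretability and performance shortcomings of current medical image screening models. It retrieves region-level evidence from historical cases and combines it with dual knowledge bases for retrospective explanation, improving transparency and diagnostic accuracy. Anomaly maps produced by contrastive retrieval further improve localization interpretability. Experiments show strong performance on real-world disease screening benchmarks, with notably higher specificity at clinical recall levels.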

2605.15168 2026-05-15 cs.CL cs.AI cs.LG stat.ML

Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim, Jeremy C. Weiss

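AI summary: This study addresses the complementary temporal information in clinical text and structured electronic health records (EHR) and proposes a retrieval-augmented multimodal alignment framework for reconstructing more precise clinical timelines. The method extracts key events from text to build a temporal scaffold, then calibrates it with temporal information from structured data to improve timestamp accuracy. Experiments show improved temporal concordance across the evaluated models while event match rates are preserved, demonstrating the benefit of multimodal alignment for clinical trajectory reconstruction.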

Comments: Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim (authors contributed equally)

Abstract

Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and contextually complete descriptions of a patient's course, they often lack temporal precision and contain ambiguous event timing. Conversely, structured electronic health record (EHR) data provides precise temporal anchors but misses a substantial portion of clinically meaningful events. We introduce a retrieval-augmented multimodal alignment framework that bridges this gap to improve the temporal precision of absolute clinical timelines extracted from text. Our approach formulates timeline reconstruction as a graph-based multistep process: it first extracts central anchor events from narratives to build an initial temporal scaffold, places non-central events relative to this backbone, and then calibrates the timeline using retrieved structured EHR rows as external temporal evidence. Evaluated using instruction-tuned large language models on the i2m4 benchmark spanning MIMIC-III and MIMIC-IV, our multimodal pipeline consistently improves absolute timestamp accuracy (AULTC) and improves temporal concordance across nearly all evaluated models over unimodal text-only reconstruction, without compromising event match rates. Furthermore, our empirical gap analysis reveals that 34.8% of text-derived events are entirely absent from tabular records, demonstrating that aligning these modalities can produce a more temporally faithful and clinically informative reconstruction of patient trajectories than either source alone.

2605.15167 2026-05-15 cs.CV

Does Synthetic Layered Design Data Benefit Layered Design Decomposition?

Kam Man Wu, Haolin Yang, Qingyu Chen, Yihu Tang, Jingye Chen, Qifeng Chen

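AI summary: This paper investigates whether purely synthetic layered data can improve graphic design decomposition. Building on the state-of-the-art CLD framework, the authors construct the synthetic dataset SynLayers and use vision-language models to generate textual supervision and automate inference inputs. They find that purely synthetic data can outperform existing non-scalable datasets, that performance keeps improving as the data scale grows, and that synthetic data allows balanced layer-count distributions. The study provides a scalable synthetic-data foundation for layered design editing systems.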

Comments: 22 pages, 10 figures. Code is available at https://github.com/YangHaolin0526/SynLayers

Abstract

Recent advances in image generation have made it easy to produce high-quality images. However, these outputs are inherently flattened, entangling foreground elements, background, and text within a fixed canvas. As a result, flexible post-generation editing remains challenging, revealing a clear last-mile gap toward practical usability. Existing approaches either rely on scarce proprietary layered assets or construct partially synthetic data from limited structural priors. However, both strategies face fundamental challenges in scalability. In this work, we investigate whether pure synthetic layered data can improve graphic design decomposition. We make the assumption that, in graphic design, effective decomposition does not require modeling inter-layer dependencies as precisely as in natural-image composition, since design elements are often intentionally arranged as modular and semantically separable components. Concretely, we conduct a data-centric study based on the CLD baseline, which is a state-of-the-art layer decomposition framework. Based on the baseline, we construct our own synthetic dataset, SynLayers, generate textual supervision using vision language models, and automate inference inputs with VLM-predicted bounding boxes. Our study reveals three key findings: (1) even training with purely synthetic data can outperform non-scalable alternatives such as the widely used PrismLayersPro dataset, demonstrating its viability as a scalable and effective substitute; (2) performance consistently improves with increased training data scale, while gains begin to saturate at around 50K samples; and (3) synthetic data enables balanced control over layer-count distributions, avoiding the layer-count imbalance commonly observed in real-world datasets. We hope this data-centric study encourages broader adoption of synthetic data as a practical foundation for layered design editing systems.

2605.15164 2026-05-15 cs.LG cs.AI

Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands

Pratinav Seth, Vinay Kumar Sankarapu

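AI summary: This position paper argues that current behavioural assurance methods cannot meet the safety verification demands of AI governance frameworks. Governance frameworks require verifying properties such as hidden objectives, resistance to loss of control, and catastrophic capability thresholds, yet existing methods only observe model outputs and cannot verify latent representations or long-horizon behaviour. The paper introduces the notion of an "audit gap" to describe the mismatch between verification demands and technical capability, and recommends a technical shift that includes limiting the weight given to behavioural evidence in legal texts and introducing mechanistic verification.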

2605.15163 2026-05-15 cs.LO

Automating Bitvector and Finite Field Equivalence Proofs in Lean

Elizaveta Pertseva, Valentin Robert, Clark Barrett, James Parker

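AI summary: This work tackles the difficulty of proving quantifier-free statements that mix bitvector and finite field operations, which arise when verifying circuit encodings for zero-knowledge proofs, and proposes BitModEq, a new Lean tactic. The tactic uses range lemmas and case analysis to translate finite field reasoning into bitvector reasoning, then applies bit-blasting. On benchmarks from zero-knowledge proof arithmetization it outperforms existing SMT solvers, solving 19% more cases.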

2605.15155 2026-05-15 cs.LG cs.AI cs.CL

Self-Distilled Agentic Reinforcement Learning

Zhengxi Lu, Zhiyuan Yao, Zhuowen Han, Zi-Han Wang, Jinyang Wu, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

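AI summary: This paper studies how to improve reinforcement learning (RL) for large language model agents on multi-turn tasks. To address the overly sparse supervision that conventional RL provides on long-horizon tasks, the authors propose Self-Distilled Agentic Reinforcement Learning (SDAR), which adds dense token-level guidance from a teacher branch as an auxiliary objective alongside the main RL optimization. SDAR uses a gating mechanism that strengthens distillation on positive tokens endorsed by the teacher and softly attenuates the teacher's negative rejections, which significantly improves performance on multiple benchmarks and avoids the instability of conventional methods.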

2605.15154 2026-05-15 stat.ML cs.LG

RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution

Lanxin Xiang, Liang Shi, Youhui Ye, Boyu Jiang, Dawei Zhou, Feng Guo

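AI summary: This paper proposes RoSHAP, a distributional framework and robust metric for more stable feature attribution. Building on SHAP values, it models the distribution of feature attribution scores using bootstrap resampling and kernel density estimation, and proves that under mild regularity conditions the aggregated value is asymptotically Gaussian, which reduces the computational cost of distribution estimation. RoSHAP improves the stability of feature rankings, outperforms conventional single-run attribution in simulated and real-data experiments, and reaches predictive performance comparable to the full-feature model with fewer features.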
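
The idea of treating an attribution score as a distribution rather than a single number can be sketched with a bootstrap over a toy linear model. This is not RoSHAP: the kernel density step and the paper's metric are omitted, the data are synthetic, and the helper names are hypothetical. The sketch relies on the known closed form for a linear model with features treated as independent, where the SHAP value of feature j at x is coef_j * (x_j - mean_j).

```python
import numpy as np

def linear_shap_importance(X, y):
    """Mean absolute SHAP value per feature for a least-squares linear model."""
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
    return np.abs(Xc * coef).mean(axis=0)

def bootstrap_importance(X, y, n_boot, rng):
    """Recompute the importance scores on resampled copies of the data."""
    n = len(y)
    scores = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        scores[b] = linear_shap_importance(X[idx], y[idx])
    return scores

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
# Feature 0 is strong, feature 1 is weak, feature 2 is pure noise.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(200)

scores = bootstrap_importance(X, y, n_boot=200, rng=rng)
mean_score = scores.mean(axis=0)
std_score = scores.std(axis=0)
print(mean_score, std_score)
```

Ranking features by the bootstrap mean while reporting the spread gives a more stable picture than a single attribution run, which is the motivation the summary describes.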

2605.15150 2026-05-15 quant-ph cond-mat.str-el cs.CC hep-th

Extensive long-range magic in non-Abelian topological orders

Yuzhen Zhang, Isaac H. Kim, Yimu Bao, Sagar Vijay

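AI summary: This paper studies the extensive long-range magic present in low-energy states of non-Abelian topological orders and proves that this magic cannot be removed by constant-depth local unitary circuits. It proposes a new resource-theoretic perspective for characterizing topological order and further reveals its complexity through a theorem that forbids stabilizer states, even after a constant-depth local unitary, from approximating the low-energy states of non-Abelian string-net models. The paper also shows that ground states and low-energy states of higher-dimensional quantum double models must exhibit extensive long-range magic if they host excitations with nontrivial fusion spaces.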

2605.15144 2026-05-15 cs.LO math.HO math.LO

Guises and Perspectives: An Intentional and Hyperintensional Sketch

Juan J. Colomina-Alminana

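AI summary: Building on the work of Héctor-Neri Castañeda, this paper constructs an intensional logic centered on "guises" (intentionally laden sets of properties) to study the internal structure of relations. The logic combines a Leibnizian intensional semantics, intentional operators, and a modal layer for possibility and necessity, and it handles hyperintensional phenomena such as substitution failure in intentional contexts and self-referential expressions. The study argues that relations are not external causal links but internal perspectival structures of subject and object encoded by guises.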

2605.15143 2026-05-15 cs.LO cs.PL

Complete Local Reasoning About Parameterized Programs Over Topologies

Ruotong Cheng, Azadeh Farzan

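AI summary: This paper studies algorithmic safety verification for infinite-state parameterized concurrent programs over complex communication topologies, with the goal of automatically generating a universally quantified inductive invariant as the correctness proof. Under reasonable assumptions, the problem reduces to a compositional verification scheme in which verifying the parameterized program becomes a set of local proofs. The authors propose a verification algorithm, implement it as a tool, and demonstrate its effectiveness at proving safety of parameterized programs on benchmarks with a variety of topologies.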

2605.15138 2026-05-15 cs.LG cs.CL cs.ET

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

Saisab Sadhu, Pratinav Seth, Vinay Kumar Sankarapu

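AI summary: This paper studies the permanence of machine unlearning in quantized language models. It points out that conventional methods evaluate unlearning at full precision, which does not reflect deployments where the model is quantized first. The study finds that 4-bit quantization can weaken or even reverse unlearning, because the parameter updates are much smaller than the quantization bin width and therefore fail to change the quantized model. The authors propose MANSU, which combines causal circuit attribution with constrained projection to achieve meaningful forgetting and structural deletion, and introduce the CAD metric for verification. Experiments show strong results across several models and tasks.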
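
The claim that small updates vanish under quantization can be checked with a toy experiment. The sketch below assumes plain uniform 4-bit quantization over a fixed range of [-1, 1]; real quantizers typically use per-group scales, and the update sizes here are hypothetical rather than taken from the paper.

```python
import numpy as np

def quantize_4bit(w, w_min=-1.0, w_max=1.0):
    """Uniform 4-bit quantization: map each weight to one of 16 integer codes."""
    step = (w_max - w_min) / 15  # 16 levels leave 15 intervals
    codes = np.clip(np.round((w - w_min) / step), 0, 15).astype(int)
    return codes, step

rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=1000)
codes_before, step = quantize_4bit(weights)

# An update far smaller than the bin width versus one of a full bin width.
small = weights + 1e-3 * rng.standard_normal(weights.shape)
large = weights + step

frac_small = np.mean(quantize_4bit(small)[0] != codes_before)
frac_large = np.mean(quantize_4bit(large)[0] != codes_before)
print(step, frac_small, frac_large)
```

Almost none of the codes move under the small update, so the quantized model is nearly identical to the one before unlearning, while an update comparable to the bin width changes most codes.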

2605.15135 2026-05-15 eess.SP cs.IT math.IT

Deep Mixture of Experts Network for Resource Optimization in Aerial-Terrestrial CF-mMIMO Systems under URLLC

Donggen Li, Chong Huang, Jingfu Li, Pei Xiao, Wenjiang Feng, Dusit Niyato, Zhu Han

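AI summary: This paper studies resource allocation for hybrid aerial-terrestrial cell-free massive MIMO (CF-mMIMO) systems under ultra-reliable low-latency communication (URLLC). To handle channel aging caused by high mobility, the authors propose a Transformer-based channel prediction network (CP-Net) and design a deep mixture-of-experts network (MoE-Net) for uplink power allocation, with a weighted gating network (WT-Net) that adaptively combines the expert models. The method improves communication performance and resource efficiency under URLLC constraints.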

Comments: 15 pages, accepted for publication in IEEE Transactions on Wireless Communications

Abstract

As a critical component of sixth-generation (6G) wireless networks, ultra-reliable and low-latency communication (URLLC) is expected to support real-time and reliable information exchange in low-altitude environments. However, achieving URLLC often incurs significant resource overhead, including increased bandwidth consumption, higher transmit power, and denser access point (AP) deployment, which pose significant challenges to both spectral efficiency (SE) and energy efficiency (EE). Besides, existing iterative optimization algorithms are computationally intensive and struggle to meet the latency requirements of URLLC. To address these challenges, we propose a hybrid aerial-terrestrial cell-free massive MIMO (CF-mMIMO) network to support diverse services, along with a channel prediction network and a deep mixture of experts (MoE) network for uplink optimization. First, we design a channel prediction network (CP-Net) to mitigate channel aging caused by high-mobility user equipment (UE). CP-Net employs three Transformer-based sub-networks for aged channel state information (CSI) prediction, while a channel quality-aware loss function is introduced to improve the prediction accuracy of weak links. Based on the predicted CSI, we develop a deep MoE network (MoE-Net) for power allocation comprising three expert models targeting different objectives. Then, we introduce a weighted gating network (WT-Net) to learn an efficient adaptive combination of expert outputs. The proposed framework better captures heterogeneous UE requirements and improves communication performance under URLLC constraints. Numerical results demonstrate the effectiveness of the proposed method.
