arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.12012 2026-05-19 cs.AI

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters

Virgill van der Meer, Julien Rossi

Affiliations: Municipality of Amsterdam; University of Amsterdam

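AI summary: This paper presents LegalCheck, a system that uses Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG) to draft municipal legal response letters automatically. It improves the efficiency of legal departments facing staff shortages, rising case volumes, and compliance pressure, while preserving legal consistency and accuracy.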

Comments: Accepted at ICAIL 2026 as a short paper

Abstract

Public-sector legal departments in the Netherlands face acute staff shortages, increased case volumes, and increased pressure to meet regulatory compliance. This paper presents LegalCheck, a novel system that addresses these challenges by automating the drafting of objection response letters through a combination of Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG). Using a large language model (LLM) alongside curated legal knowledge bases, LegalCheck performs retrieval of relevant laws and precedents, and uses controlled prompting to incorporate both external knowledge and case-specific details into a coherent draft. An expert-in-the-loop review ensures that each generated letter is legally sound and contextually appropriate. In a real-world deployment within the Municipality of Amsterdam, LegalCheck produced near-final advice letters in minutes rather than hours, while maintaining high legal consistency and factual accuracy. The output is based on actual regulations and prior cases, providing explainable outputs that captured the vast majority of required legal reasoning (often 80% to 100% of essential content). Legal professionals found that the system reduced their workload and ensured a consistent application of legal standards, without replacing human judgment. These results demonstrate substantial efficiency gains, improved legal consistency, and positive user acceptance. More broadly, this work illustrates how responsible AI can be deployed in the legal domain by augmenting LLMs with domain knowledge and governance mechanisms.

2605.12000 2026-05-19 cs.LG

Split the Differences, Pool the Rest: Provably Efficient Multi-Objective Imitation

Ziyad Sheebaelhamd, Luca Viano, Volkan Cevher, Claire Vernade

Affiliations: University of Tübingen; EPFL; University of Technology Nuremberg

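AI summary: This paper studies multi-objective imitation learning: recovering policies on the Pareto front of a Multi-Objective Markov Decision Process (MOMDP) from demonstrations by multiple Pareto-optimal experts. Conventional imitation methods cannot handle this setting, because naively aggregating conflicting expert trajectories can yield dominated policies. The authors introduce Multi-Output Augmented Behavioral Cloning (MA-BC), which systematically partitions divergent expert data while pooling state-action pairs that show no behavior conflict. They prove that MA-BC converges to Pareto-optimal policies at a faster statistical rate than learners that treat each expert dataset separately, establish a new lower bound for multi-objective imitation learning showing that MA-BC is minimax optimal, and validate the algorithm empirically on diverse discrete environments and a continuous Linear Quadratic Regulator (LQR) control task.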

Abstract

This work investigates multi-objective imitation learning: the problem of recovering policies that lie on the Pareto front given demonstrations from multiple Pareto-optimal experts in a Multi-Objective Markov Decision Process (MOMDP). Standard imitation approaches are ill-equipped for this regime, as naively aggregating conflicting expert trajectories can result in dominated policies. To address this, we introduce Multi-Output Augmented Behavioral Cloning (MA-BC), an algorithm that systematically partitions divergent expert data while pooling state-action pairs where no behavior conflict is observed. Theoretically, we prove that MA-BC converges to Pareto-optimal policies at a faster statistical rate than any learner that considers each expert dataset independently. Furthermore, we establish a novel lower bound for multi-objective imitation learning, demonstrating that MA-BC is minimax optimal. Finally, we empirically validate our algorithm across diverse discrete environments and, guided by our theoretical insights, extend and evaluate MA-BC on a continuous Linear Quadratic Regulator (LQR) control task.
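
The partition-and-pool idea can be illustrated with a minimal tabular sketch. It is not the authors' MA-BC implementation: the function names `pool_and_partition` and `policy_for` are hypothetical, and the conflict test (experts disagree on their majority action in a state) is an assumption chosen for simplicity.

```python
from collections import Counter, defaultdict


def pool_and_partition(datasets):
    """datasets: one list of (state, action) pairs per expert.

    Returns (pooled, split):
      pooled[state]        -> action counts merged across experts (no conflict seen)
      split[expert][state] -> that expert's own action counts (conflict seen)
    """
    # Count the actions each expert took in each state it visited.
    per_expert = []
    for data in datasets:
        counts = defaultdict(Counter)
        for state, action in data:
            counts[state][action] += 1
        per_expert.append(counts)

    states = set().union(*(c.keys() for c in per_expert))
    pooled = {}
    split = [dict() for _ in datasets]
    for s in states:
        visitors = [k for k, c in enumerate(per_expert) if s in c]
        top_actions = {per_expert[k][s].most_common(1)[0][0] for k in visitors}
        if len(top_actions) == 1:
            # No behavior conflict observed: merge samples from every expert.
            merged = Counter()
            for k in visitors:
                merged.update(per_expert[k][s])
            pooled[s] = merged
        else:
            # Experts diverge here: keep each expert's data separate.
            for k in visitors:
                split[k][s] = per_expert[k][s]
    return pooled, split


def policy_for(expert, pooled, split):
    """Behavioral cloning by majority vote: pooled data plus the expert's own split data."""
    policy = {s: c.most_common(1)[0][0] for s, c in pooled.items()}
    policy.update({s: c.most_common(1)[0][0] for s, c in split[expert].items()})
    return policy
```

In this sketch each cloned policy sees more samples in the agreed-upon states than it would from its own expert alone, which is the intuition behind the faster statistical rate claimed in the abstract.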

2605.11881 2026-05-19 cs.CV

Learning Subspace-Preserving Sparse Attention Graphs from Heterogeneous Multiview Data

Jie Chen, Yuanbiao Gou, Chuanbin Liu, Zhu Wang, Xi Peng

Affiliations: College of Computer Science; School of Economics and Management; China University of Petroleum (Beijing); School of Artificial Intelligence

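AI summary: This paper proposes SAGL, a sparse attention graph learning method that learns subspace-preserving sparse attention graphs from heterogeneous multiview data to achieve semantic alignment across heterogeneous views. Its core components are a bilinear attention factorization and a dynamic sparsity gating mechanism, combined with α-entmax to generate the sparse attention graphs. Theoretical analysis and experiments support its effectiveness.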

Comments: 18 pages

Abstract

The high-dimensional features extracted from large-scale unlabeled data via various pretrained models with diverse architectures are referred to as heterogeneous multiview data. Most existing unsupervised transfer learning methods fail to faithfully recover intrinsic subspace structures when exploiting complementary information across multiple views. Therefore, a fundamental challenge involves constructing sparse similarity graphs that preserve these underlying subspace structures for achieving semantic alignment across heterogeneous views. In this paper, we propose a sparse attention graph learning (SAGL) method that learns subspace-preserving sparse attention graphs from heterogeneous multiview data. Specifically, we introduce a bilinear attention factorization scheme to capture asymmetric similarities among the high-dimensional features, which breaks the symmetry bottleneck that is inherent in the traditional representation learning techniques. A dynamic sparsity gating mechanism then predicts a feature-specific compression factor for adaptively controlling the topological contributions of neighbors. Furthermore, we employ a structured sparse projection via $α$-entmax to generate subspace-preserving sparse attention graphs for individual views. SAGL leverages these view-specific graphs to conduct sparse information aggregation, yielding discriminative representations for multiview learning tasks. In addition, we provide a rigorous theoretical analysis that bridges differentiable sparse attention and probability simplex constraints. Extensive experiments conducted on multiple benchmark datasets demonstrate that SAGL consistently outperforms the state-of-the-art unsupervised transfer learning approaches.
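
To make the sparse-attention step concrete, here is a minimal NumPy sketch of a bilinear, asymmetric score matrix followed by a sparse projection onto the probability simplex. It uses sparsemax, which is the α = 2 case of α-entmax. The weight matrices `W_q` and `W_k` and the function `sparse_attention_graph` are hypothetical names, and the sparsity gating described in the abstract is omitted.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of a score vector onto the probability simplex
    (alpha-entmax with alpha = 2). Low scores receive exactly zero weight."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum   # always a prefix of the sorted scores
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


def sparse_attention_graph(X, W_q, W_k):
    """Row-stochastic sparse graph from bilinear scores.

    scores[i, j] = (x_i W_q) . (x_j W_k), which is asymmetric when W_q != W_k.
    """
    scores = (X @ W_q) @ (X @ W_k).T
    return np.stack([sparsemax(row) for row in scores])
```

Each row of the result is a probability distribution over neighbors, so the simplex constraint mentioned in the abstract holds by construction.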

2605.11710 2026-05-19 cs.LG cs.CV

Unlocking Compositional Generalization in Continual Few-Shot Learning

Phu-Quy Nguyen-Lam, Phu-Hoa Pham, Dao Sy Duy Minh, Chi-Nguyen Tran, Huynh Trung Kiet, Long Tran-Thanh

Affiliations: Faculty of Information Technology, University of Science, Vietnam National University; Department of Computer Science, University of Warwick

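AI summary: This paper proposes a new paradigm for continual few-shot learning that strictly decouples representation learning from compositional inference, enabling efficient generalization to novel concepts and achieving the best performance on multiple benchmarks.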

Comments: 10 pages

Abstract

Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either collapse scenes into global embeddings, or train with part-level matching objectives that tie representations too closely to seen patterns, leaving them unable to generalize to truly novel concepts. In this paper, we identify this fundamental structural conflict and pioneer a new paradigm that strictly decouples representation learning from compositional inference. Leveraging the inherent patch-level semantic geometry of self-supervised Vision Transformers (ViTs), our framework employs a dual-phase strategy. During training, slot representations are optimized entirely toward holistic class identity, preserving highly generalizable, object-level geometries. At inference, preserved slots are dynamically composed to match novel scenes. We demonstrate that this paradigm offers dual structural benefits: The frozen backbone naturally prevents representation drift, while our lightweight, holistic optimization preserves the features' capacity for novel-concept transfer. Extensive experiments validate this approach, achieving state-of-the-art unseen-concept generalization and minimal forgetting across standard continual learning benchmarks.

2605.11654 2026-05-19 cs.CV cs.AI cs.RO

Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery

Chi-Nguyen Tran, Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Phu-Hoa Pham, Long Tran-Thanh

Affiliations: Faculty of Information Technology, University of Science, Vietnam National University; Department of Computer Science, University of Warwick

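AI summary: This paper proposes SkyPart, a lightweight swappable head for patch-based vision transformers that performs explicit part grouping over the patch grid. SkyPart has four theory-grounded components: (i) learnable prototypes that compete for patch tokens via single-pass cosine assignment; (ii) altitude-conditioned linear modulation applied during training, which makes the retrieval embedding altitude-free at inference; (iii) a graph-attention readout over active prototypes; and (iv) a Kendall uncertainty-weighted multi-objective loss whose stationary points are Pareto-stationary. At 26.95M parameters and 22.14 GFLOPs, SkyPart is the smallest among top-performing methods and sets a new state of the art on SUES-200, University-1652, and DenseUAV. Its advantage over the strongest baseline widens under the ten-condition WeatherPrompt corruption benchmark.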

Comments: 37 pages, 7 figures, 6 tables

Abstract

Cross-view geo-localization (CVGL), which matches an oblique drone view to a geo-referenced satellite tile, has emerged as a key alternative for autonomous drone navigation when GNSS signals are jammed, spoofed, or unavailable. Despite strong recent progress, three limitations persist: (1) global-descriptor designs compress the patch grid into a single vector without separating layout from texture across the view gap; (2) altitude-related scale variation is retained in the learned embedding rather than marginalized; and (3) multi-objective training relies on hand-tuned scalars over losses on incompatible gradient scales. We propose SkyPart, a lightweight swappable head for patch-based vision transformers (ViTs) that institutes explicit part grouping over the patch grid. SkyPart has four theory-grounded components: (i) learnable prototypes competing for patch tokens via single-pass cosine assignment; (ii) altitude-conditioned linear modulation applied only during training, making the retrieval embedding altitude-free at inference; (iii) a graph-attention readout over active prototypes; and (iv) a Kendall uncertainty-weighted multi-objective loss whose stationary points are Pareto-stationary. At 26.95M parameters and 22.14 GFLOPs, SkyPart is the smallest among top-performing methods and sets a new state of the art on SUES-200, University-1652, and DenseUAV under a single-pass, no-re-ranking, no-TTA protocol. Its advantage over the strongest baseline widens under the ten-condition WeatherPrompt corruption benchmark.
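
Two of the four components can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not SkyPart's code: `cosine_assign` uses a softmax over prototypes with a hypothetical `temperature`, and `uncertainty_weighted_loss` uses the common log-variance form of Kendall-style weighting, 0.5·exp(-s_i)·L_i + 0.5·s_i, since the abstract does not spell out the exact variant.

```python
import numpy as np


def cosine_assign(patches, prototypes, temperature=0.1):
    """Single-pass soft assignment of patch tokens to prototypes by cosine similarity.

    patches: (N, D), prototypes: (K, D).
    Returns (weights, parts): weights is (N, K) with rows summing to 1,
    parts is (K, D), the assignment-weighted mean of the patches per prototype.
    """
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = (p @ c.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # prototypes compete per patch
    parts = weights.T @ patches / (weights.sum(axis=0)[:, None] + 1e-8)
    return weights, parts


def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses L_i with learnable log-variances s_i."""
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(0.5 * np.exp(-log_vars) * losses + 0.5 * log_vars))
```

For fixed losses, setting the derivative with respect to s_i to zero gives s_i = log L_i, so each loss ends up scaled by roughly its own magnitude and no hand-tuned scalar is required.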

2605.11617 2026-05-19 cs.LG math.ST stat.TH

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh

Affiliations: Faculty of Information and Technology, University of Science, Vietnam National University Ho Chi Minh City; Department of Computer Science, University of Warwick

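AI summary: This paper proposes MIST, which addresses the reliability of streaming decision trees in online class-incremental learning through three integrated components: a McDiarmid confidence radius, a Bayesian inheritance protocol, and KLL quantile sketches. Together they improve robustness on non-Gaussian geometry.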

Comments: 9 pages of main text, 5 figures

Abstract

Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates, enjoy bounded memory, and have static decision boundaries. Despite these advantages, they still fail in online class-incremental learning because of two coupled miscalibrations: (i) their split criterion grows unreliable as the class count $K$ expands, and (ii) no knowledge is transferred at split time. Both failures share a common root: the range of Information Gain intrinsically scales with $\log_2 K$. Consequently, any Hoeffding-style confidence radius derived from it must grow with the class count, which makes a $K$-independent split criterion structurally impossible and removes the potential benefits of applying streaming decision trees to continual learning. To fix this issue, we present MIST (McDiarmid Incremental Streaming Tree), which resolves both failures through three integrated components: (i) a tight, $K$-independent McDiarmid confidence radius for Gini splitting that acts as a structural regulariser; (ii) a Bayesian inheritance protocol that projects parent statistics to child nodes via truncated-Gaussian moments, with variance reduction guarantees strongest precisely when splitting is most conservative; and (iii) per-leaf KLL quantile sketches that support both continuous threshold evaluation and geometry-adaptive leaf prediction from a single data structure. On standard and stress-test tabular streams, MIST is competitive with global parametric methods on near-Gaussian benchmarks and uniquely robust on non-Gaussian geometry where SOTA benchmarks collapse.
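
The dependence on the class count is easy to see numerically. The sketch below evaluates the standard Hoeffding radius used by classic streaming trees, sqrt(R² ln(1/δ) / (2n)), with the range R set to log2 K for Information Gain. The function names and the default `delta` are illustrative assumptions, and the paper's McDiarmid radius for Gini is not reproduced here because the abstract does not give its constants.

```python
import math


def hoeffding_radius(value_range, n, delta):
    """Hoeffding confidence radius for the mean of n samples of a statistic
    whose values span `value_range`, at confidence level 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def info_gain_radius(num_classes, n, delta=1e-7):
    """Radius for Information Gain, whose range is log2(K) for K classes."""
    return hoeffding_radius(math.log2(num_classes), n, delta)


if __name__ == "__main__":
    # The radius at a fixed sample count grows as classes are added.
    for k in (2, 4, 16, 256):
        print(k, round(info_gain_radius(k, n=1000), 4))
```

Because the radius is proportional to log2 K, reaching the same radius with 4 classes takes about 4 times as many samples as with 2, and with 16 classes about 16 times as many, which is the unreliability the abstract attributes to Information Gain splits.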

2605.11365 2026-05-19 cs.AI cs.LG stat.ML

Causal Bias Detection in Generative Artificial Intelligence

Drago Plecko

Affiliations: Department of Statistics & Data Science

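AI summary: This paper studies causal fairness in generative AI. It derives new causal decomposition results that quantify the fairness impact of different causal pathways and of replacing real-world mechanisms with those of the generative model, and demonstrates the method by analyzing race and gender bias in large language models.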

Abstract

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.

2605.10843 2026-05-19 cs.CL cs.AI cs.CY

Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

Huynh Trung Kiet, Dao Sy Duy Minh, Tuan Nguyen, Chi-Nguyen Tran, Phu-Hoa Pham, Nguyen Lam Phu Quy, The Anh Han, Long Tran-Thanh

Affiliations: Faculty of Information and Technology, University of Science, Vietnam National University; Department of Computer Science, University of Warwick; School of Computing, Engineering and Digital Technologies, Teesside University

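AI summary: This paper proposes DISCA, which calibrates on persona disagreement to reduce the cultural misalignment of large language models on MultiTP without changing model weights, offering a scalable alternative for serving global moral preferences.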

Comments: 57 pages, 1 figure, 6 MultiTP moral dimensions

Abstract

Large language models increasingly mediate decisions that turn on moral judgement, yet a growing body of evidence shows that their implicit preferences are not culturally neutral. Existing cultural alignment methods either require per-country preference data and fine-tuning budgets or assume white-box access to model internals that commercial APIs do not expose. In this work, we focus on this realistic black-box, public-data-only regime and observe that within-country sociodemographic disagreement, not consensus, is the primary steering signal. We introduce DISCA (Disagreement-Informed Steering for Cultural Alignment), an inference-time method that instantiates each country as a panel of World-Values-Survey-grounded persona agents and converts their disagreement into a bounded, loss-averse logit correction. Across 20 countries and 7 open-weight backbones (2B-70B), DISCA reduces cultural misalignment on MultiTP by 10-24% on the six backbones >=3.8B, and 2-7% on open-ended scenarios, without changing any weights. Our results suggest that inference-time calibration is a scalable alternative to fine-tuning for serving the long tail of global moral preferences.

2605.10503 2026-05-19 cs.AI

SLASH the Sink: Sharpening Structural Attention Inside LLMs

Yiming Liu, Bin Lu, Xinbing Wang, Chenghu Zhou, Meng Jin

Affiliations: Shanghai Jiao Tong University; Institute of Geographical Science and Natural Resources Research; Chinese Academy of Sciences

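AI summary: This paper studies the internal mechanisms of large language models and finds that they spontaneously reconstruct graph topology, but that the attention sink dilutes this structural understanding. It proposes SLASH, a plug-and-play attention redistribution that strengthens internal structural understanding. Experiments on pure graph tasks and molecular prediction show significant performance gains.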

Abstract

Large Language Models (LLMs) show remarkable semantic understanding but often struggle with structural understanding when processing graph topologies in a serialized format. Existing solutions rely on training external graph-based adapters or fine-tuning, which incur high costs and lose generalizability. In this work, we investigate the internal mechanisms of LLMs and present a critical finding: LLMs spontaneously reconstruct the graph's topology internally, evidenced by a distinct "sawtooth" pattern in their attention maps that structurally aligns with the "token-level adjacency matrix". However, this intrinsic structural understanding is diluted by the attention sink. We theoretically formalize this dilution as a representation bottleneck, stemming from a fundamental conflict: the model's anisotropic bias, essential for language tasks, suppresses the topology-aware local aggregation required for graph reasoning. To address this, we propose a training-free solution, named StructuraL Attention SHarpening (SLASH), which amplifies this internal structural understanding via a plug-and-play attention redistribution. Experiments on pure graph tasks and molecular prediction validate that SLASH delivers significant and consistent performance gains across diverse LLMs.
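
The abstract does not specify SLASH's redistribution rule, so the following is only a generic sketch of what a training-free attention redistribution could look like: a fraction of each row's attention on the sink token is moved onto the remaining tokens in proportion to the attention they already receive. The function `redistribute_sink` and its `sink` and `strength` parameters are hypothetical.

```python
import numpy as np


def redistribute_sink(attn, sink=0, strength=0.5):
    """Move `strength` of each row's sink attention onto the other tokens.

    attn: (T, T) row-stochastic attention matrix.
    Rows that attend only to the sink are left unchanged, and entries that
    were zero (for example, causally masked positions) stay zero.
    """
    out = np.asarray(attn, dtype=float).copy()
    sink_mass = out[:, sink].copy()
    rest_mass = out.sum(axis=1) - sink_mass
    has_rest = rest_mass > 0
    moved = np.where(has_rest, strength * sink_mass, 0.0)
    out[:, sink] = sink_mass - moved
    # Scale the non-sink entries so that every row still sums to its original total.
    scale = np.ones_like(rest_mass)
    scale[has_rest] = 1.0 + moved[has_rest] / rest_mass[has_rest]
    others = np.arange(out.shape[1]) != sink
    out[:, others] *= scale[:, None]
    return out
```

Proportional rescaling keeps the relative ordering of the non-sink tokens, so any structure already present in the attention map is amplified rather than replaced.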

2605.06933 2026-05-19 cs.LG cs.CR cs.MA

MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security

Sepideh Avizheh, Tushin Mallick, Alina Oprea, Cristina Nita-Rotaru, Reihaneh Safavi-Naini

Affiliations: University of Calgary; Northeastern University

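AI summary: This paper presents MAGIQ, a framework that uses novel, efficient, quantum-resistant cryptographic protocols for policy definition and enforcement in multi-agent AI systems. It targets the security of agent communication and access-control policies and provides traceable accountability.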

Abstract

Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment of agentic AI systems and advancements in quantum computing. With respect to agentic AI systems, one of the most critical problems is creating secure governing architectures that ensure agents follow their owners' communication and interaction policies and can be held accountable for the messages they exchange with other agents. With respect to quantum computing, existing systems must be retrofitted and new cryptographic mechanisms must be designed to ensure long-term security and quantum resistance. In fact, NIST recommends that standard public-key cryptographic algorithms, including RSA, Diffie-Hellman (DH), and elliptic-curve constructions (ECC), be deprecated starting in 2030 and disallowed after 2035. In this paper, we present MAGIQ, a framework for policy definition and enforcement in multi-agent AI systems using novel, highly efficient, quantum-resistant cryptographic protocols with proven security guarantees. MAGIQ (i) allows users to define rich communication and access-control policy budgets for agent-to-agent sessions and tasks, including global budgets for one-to-many agent sessions; (ii) enforces such policies using post-quantum cryptographic primitives; (iii) supports session-based enforcement of policies for agent-to-agent and one-to-many agent sessions; and (iv) provides accountability of agents to their users through message attribution. We formally model and prove the correctness and security of the system using the Universal Composability (UC) framework. We evaluate the computation and communication overhead of our framework and compare it with the state-of-the-art agentic AI framework SAGA. MAGIQ is a first step toward post-quantum-secure solutions for agentic AI systems.

2604.26793 2026-05-19 cs.LG eess.SP

Super-resolution Multi-signal Direction-of-Arrival Estimation by Hankel-structured Sensing and Decomposition

Georgios I. Orfanidis, Dimitris A. Pados, George Sklivanitis, Elizabeth S. Bentley

Affiliations: Center for Connected Autonomy and AI; Dept. of Electrical Engineering and Computer Science; Florida Atlantic University; Air Force Research Laboratory; AFRL/RI

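AI summary: This paper proposes a new framework, based on Hankel-structured sensing and data matrix decomposition of arbitrary rank, for rapid super-resolution multi-signal direction-of-arrival estimation. The L2-norm estimator is maximum-likelihood optimal in white Gaussian noise and the L1-norm estimator in i.i.d. isotropic Laplace noise. Extensive simulations confirm strong super-resolution capability and higher resolution probability.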

Abstract

Motivated by sensing modalities in modern autonomous systems that involve hardware-constrained spatial sampling over large arrays with limited coherence time, we develop a novel framework for rapid super-resolution multi-signal direction-of-arrival (DoA) estimation based on Hankel-structured sensing and data matrix decomposition of arbitrary rank, under both the $L_2$-norm and $L_1$-norm formulations. The resulting $L_2$-norm estimator is shown to be maximum-likelihood optimal in white Gaussian noise. The $L_1$-norm estimator is shown to be maximum-likelihood optimal in independent, identically distributed (i.i.d.) isotropic Laplace noise, offering broad robustness to impulsive interference and corrupted measurements commonly encountered in practice. Extensive simulations demonstrate that the proposed methods exhibit powerful super-resolution capabilities, requiring significantly lower SNR and achieving substantially higher resolution probability than recent competing approaches.
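
Why a Hankel structure helps can be shown with a noiseless toy example. For a uniform linear array, a snapshot containing r plane waves is a sum of r complex exponentials, and the Hankel matrix built from it has rank r. The sketch below builds that matrix, takes its dominant subspace by SVD (the L2-optimal low-rank fit), and reads the angles off the shift invariance of that subspace, an ESPRIT-style step that stands in for the paper's estimators, which the abstract does not detail. All function names and the half-wavelength `spacing` are assumptions.

```python
import numpy as np


def simulate_snapshot(angles_deg, amplitudes, num_sensors, spacing=0.5):
    """Noiseless single snapshot of a uniform linear array (spacing in wavelengths)."""
    n = np.arange(num_sensors)[:, None]
    phase = 2j * np.pi * spacing * np.sin(np.radians(angles_deg))[None, :]
    return (np.exp(n * phase) * np.asarray(amplitudes)[None, :]).sum(axis=1)


def hankel_from_snapshot(x, rows):
    """Hankel matrix H[i, j] = x[i + j] with the given number of rows."""
    x = np.asarray(x)
    cols = x.size - rows + 1
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    return x[idx]


def estimate_doas(x, num_sources, rows, spacing=0.5):
    """Angles in degrees from the shift invariance of the dominant Hankel subspace."""
    H = hankel_from_snapshot(x, rows)
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :num_sources]
    # Dropping the first row equals dropping the last row times a matrix whose
    # eigenvalues are exp(2j * pi * spacing * sin(theta_k)).
    Psi = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)[0]
    phases = np.angle(np.linalg.eigvals(Psi))
    return np.sort(np.degrees(np.arcsin(phases / (2 * np.pi * spacing))))
```

With noise the Hankel matrix becomes full rank, and the choice of norm for the low-rank fit is where the L2 and L1 formulations in the abstract differ.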

2604.26450 2026-05-19 cs.RO

Reactive Motion Generation via Phase-varying Neural Potential Functions

Ahmet Tekden, Dimitrios Kanoulas, Aude Billard, Yasemin Bekiroglu

Affiliations: Chalmers AI Research Center (CHAIR); Chalmers Gender Initiative for Excellence (Genie); Wallenberg AI, Autonomous Systems and Software Program (WASP); University College London; Ecole Polytechnique Federale de Lausanne (EPFL)

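AI summary: This paper proposes a motion-generation framework based on Phase-varying Neural Potential Functions (PNPF), which conditions the potential function on a phase variable estimated directly from state progression. The approach generalizes more effectively across point-to-point, periodic, and full 6D motion tasks, and is more robust on trajectories with intersections and under external disturbances.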

Comments: Accepted by IEEE Robotics and Automation Letters (RAL)

Abstract

Dynamical systems (DS) methods for Learning-from-Demonstration (LfD) provide stable, continuous policies from few demonstrations. First-order dynamical systems (DS) are effective for many point-to-point and periodic tasks, as long as a unique velocity is defined for each state. For tasks with intersections (e.g., drawing an "8"), extensions such as second-order dynamics or phase variables are often used. However, by incorporating velocity, second-order models become sensitive to disturbances near intersections, as velocity is used to disambiguate motion direction. Moreover, this disambiguation may fail when nearly identical position-velocity pairs correspond to different onward motions. In contrast, phase-based methods rely on open-loop time or phase variables, which limit their ability to recover after perturbations. We introduce Phase-varying Neural Potential Functions (PNPF), an LfD framework that conditions a potential function on a phase variable which is estimated directly from state progression, rather than on open-loop temporal inputs. This phase variable allows the system to handle state revisits, while the learned potential function generates local vector fields for reactive and stable control. PNPF generalizes effectively across point-to-point, periodic, and full 6D motion tasks, outperforms existing baselines on trajectories with intersections, and demonstrates robust performance in real-time robotic manipulation under external disturbances.

2604.25850 2026-05-19 cs.CL cs.SE

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Jiahang Lin, Shichun Liu, Chengjun Pan, Lizhi Lin, Shihan Dou, Zhiheng Xi, Xuanjing Huang, Hang Yan, Zhenhua Han, Tao Gui, Yu-Gang Jiang

Affiliations: Fudan University; Peking University; Shanghai Qiji Zhifeng Co., Ltd

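AI summary: This paper proposes an observability-driven approach to agentic harness engineering that addresses the challenges of harness engineering through three pillars, making harness evolution automatic and controllable. Experiments show strong performance on multiple benchmarks.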

Abstract

Harnesses are now central to coding-agent performance, mediating how models interact with tools and execution environments. Yet harness engineering remains a manual craft, because automating it faces a heterogeneous action space across editable components, voluminous trajectories that bury actionable signal, and edits whose effect is hard to attribute. We introduce Agentic Harness Engineering (AHE), a closed loop that addresses these challenges through three matched observability pillars: (1) component observability gives every editable harness component a file-level representation so the action space is explicit and revertible; (2) experience observability distills millions of raw trajectory tokens into a layered, drill-down evidence corpus that an evolving agent can actually consume; and (3) decision observability pairs every edit with a self-declared prediction, later verified against the next round's task-level outcomes. Together, these pillars turn every edit into a falsifiable contract, so harness evolution proceeds autonomously without collapsing into trial-and-error. Empirically, ten AHE iterations lift pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, surpassing the human-designed harness Codex-CLI (71.9%) and the self-evolving baselines ACE and TF-GRPO. The frozen harness transfers without re-evolution: on SWE-bench-verified it tops aggregate success at 12% fewer tokens than the seed, and on Terminal-Bench 2 it yields +5.1 to +10.1pp cross-family gains across three alternate model families, indicating the evolved components encode general engineering experience rather than benchmark-specific tuning. Ablations localize the gain to tools, middleware, and long-term memory rather than the system prompt, suggesting factual harness structure transfers while prose-level strategy does not.

2604.15950 2026-05-19 cs.LG

TwinTrack: Post-hoc Multi-Rater Calibration for Medical Image Segmentation

TwinTrack: 医学图像分割的后验多评分者校准

Tristan Kirscher, Alexandra Ertl, Klaus Maier-Hein, Xavier Coubez, Philippe Meyer, Sylvain Faisan

发表机构 * ICube Laboratory, CNRS UMR-7357, University of Strasbourg(ICube实验室,CNRS UMR-7357,斯特拉斯堡大学) CLCC Institut Strauss(CLCC斯特拉斯堡研究所) German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing(海德堡德国癌症研究中心(DKFZ),医学图像计算部门) Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital(放射肿瘤科,海德堡大学医院模式分析与学习小组) Medical Faculty Heidelberg, Heidelberg University(海德堡医学系,海德堡大学)

AI总结 针对医学图像分割中多评分者不确定性建模问题,TwinTrack通过后验校准方法将集成分割概率校准到经验人类响应(MHR),提高概率校准和可解释性。

Comments Accepted for publication at MIDL 2026

详情
AI中文摘要

胰腺导管腺癌(PDAC)在增强CT中的分割本质上具有歧义性:专家之间的分歧反映的是真正的不确定性而非标注噪声。标准深度学习方法假设存在单一真实情况,产生概率输出,但在这种歧义下可能校准不良且难以解释。我们提出TwinTrack框架,通过后验校准将集成分割概率对齐到经验平均人类响应(MHR),即将某一体素标记为肿瘤的专家标注者比例。校准后的概率可直接解释为标注者分配肿瘤标签的预期比例,明确建模评分者分歧。所提出的后验校准过程简单,仅需少量多评分者校准数据集。在MICCAI 2025 CURVAS-PDACVI多评分者基准测试中,该方法在校准指标上始终优于标准方法。

英文摘要

Pancreatic ductal adenocarcinoma (PDAC) segmentation on contrast-enhanced CT is inherently ambiguous: inter-rater disagreement among experts reflects genuine uncertainty rather than annotation noise. Standard deep learning approaches assume a single ground truth, producing probabilistic outputs that can be poorly calibrated and difficult to interpret under such ambiguity. We present TwinTrack, a framework that addresses this gap through post-hoc calibration of ensemble segmentation probabilities to the empirical mean human response (MHR), that is, the fraction of expert annotators labeling a voxel as tumor. Calibrated probabilities are thus directly interpretable as the expected proportion of annotators assigning the tumor label, explicitly modeling inter-rater disagreement. The proposed post-hoc calibration procedure is simple and requires only a small multi-rater calibration set. It consistently improves calibration metrics over standard approaches when evaluated on the MICCAI 2025 CURVAS-PDACVI multi-rater benchmark.
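The mean human response described above can be illustrated with a small sketch. The abstract does not say which calibrator TwinTrack uses, so the histogram-binning step below is an assumption chosen for simplicity, and all function names are hypothetical.

```python
def mean_human_response(rater_masks):
    """Fraction of raters labeling each voxel as tumor.

    rater_masks: list of equally long 0/1 lists, one per rater (flattened voxels).
    """
    n_raters = len(rater_masks)
    return [sum(votes) / n_raters for votes in zip(*rater_masks)]


def fit_binned_calibrator(probs, mhr, n_bins=5):
    """Histogram binning (an assumed calibrator, not necessarily the paper's):
    map each probability bin to the mean MHR observed in that bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, target in zip(probs, mhr):
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += target
        counts[b] += 1
    # Empty bins fall back to the bin midpoint.
    return [sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def calibrate(probs, table):
    """Replace each probability by the calibrated value of its bin."""
    n_bins = len(table)
    return [table[min(int(p * n_bins), n_bins - 1)] for p in probs]


# Three raters, four voxels (toy data).
masks = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
mhr = mean_human_response(masks)
ensemble_probs = [0.95, 0.90, 0.15, 0.05]
table = fit_binned_calibrator(ensemble_probs, mhr, n_bins=2)
calibrated = calibrate(ensemble_probs, table)
```

After calibration, each output is read as the expected share of annotators who would mark the voxel as tumor, which is the interpretation the abstract gives.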

2604.15929 2026-05-19 cs.CL

MUSCAT: MUltilingual, SCientific ConversATion Benchmark

MUSCAT:多语言、科学对话基准

Supriti Sinhamahapatra, Thai-Binh Nguyen, Yiğit Oğuz, Enes Ugan, Jan Niehues, Alexander Waibel

发表机构 * Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出MUSCAT基准,用于评估自动语音识别系统在处理多语言混合输入、特定词汇和语言切换等挑战方面的性能,揭示了当前最先进的ASR系统仍面临开放性挑战。

详情
AI中文摘要

多语言语音技术的目标是促进说不同语言的人之间的无缝交流,使人们感觉像每个人都是多语言使用者。为了实现这一目标,语音技术需要解决几个挑战:处理混合多语言输入、特定词汇和语言切换。然而,目前尚无基准数据集来评估这种情境。我们提出一个新的基准来评估当前的自动语音识别(ASR)系统是否能够处理这些挑战。该基准由多个说话人之间的双语科学论文讨论组成,每个说话人用不同的语言交流。我们提供了一个标准的评估框架,超越词错误率(WER),使不同语言的ASR性能能够一致比较。实验结果表明,所提出的数据集对最先进的ASR系统仍是一个开放性挑战。该数据集可在https://huggingface.co/datasets/goodpiku/muscat-eval上获取。关键词:多语言、语音识别、音频分割、说话人辨识

英文摘要

The goal of multilingual speech technology is to facilitate seamless communication between individuals speaking different languages, creating the experience as though everyone were a multilingual speaker. To create this experience, speech technology needs to address several challenges: handling mixed multilingual input, specific vocabulary, and code-switching. However, there is currently no dataset benchmarking this situation. We propose a new benchmark to evaluate whether current Automatic Speech Recognition (ASR) systems are able to handle these challenges. The benchmark consists of bilingual discussions on scientific papers between multiple speakers, each conversing in a different language. We provide a standard evaluation framework that goes beyond Word Error Rate (WER), enabling consistent comparison of ASR performance across languages. Experimental results demonstrate that the proposed dataset is still an open challenge for state-of-the-art ASR systems. The dataset is available at https://huggingface.co/datasets/goodpiku/muscat-eval. Keywords: multilingual, speech recognition, audio segmentation, speaker diarization
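For reference, the Word Error Rate that the benchmark's framework extends is the word-level edit distance divided by the reference length. The sketch below shows only this baseline metric, with whitespace tokenization as a simplifying assumption. It is not the MUSCAT evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the model handles code switching",
                      "the model handle code switching well")
```

Whitespace tokenization does not carry over to languages without word delimiters, which is one reason a cross-language comparison needs more than raw WER.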

2604.12253 2026-05-19 cs.AI

A Scoping Review of Large Language Model-Based Pedagogical Agents

基于大语言模型的教育代理的综述

Shan Li, Juan Zheng

发表机构 * Department of Education and Human Services, College of Education, Lehigh University(里海大学教育学院教育与人类服务系) Department of Community and Global Health, College of Health, Lehigh University(里海大学健康学院社区与全球健康系)

AI总结 本文综述了大语言模型在教育环境中的应用,探讨了教育代理的设计维度、发展趋势及研究空白,为未来研究提供指导。

详情
AI中文摘要

本综述根据PRISMA-ScR指南,分析了2022年11月至2025年1月期间五个主要数据库中的52项研究,探讨了基于大语言模型(LLM)的教育代理在K-12教育、高等教育和非正式学习环境中的多样性。研究识别出四个关键设计维度:交互方式(反应型 vs. 主动型)、领域范围(领域专用 vs. 通用)、角色复杂性(单一角色 vs. 多角色)以及系统集成(独立 vs. 集成)。新兴趋势包括多代理系统模拟自然学习环境、虚拟学生模拟用于代理评估、与沉浸式技术的整合以及与学习分析的结合。本文还讨论了隐私、准确性和学生自主性等重要研究空白和伦理问题。

英文摘要

This scoping review examines the emerging field of Large Language Model (LLM)-based pedagogical agents in educational settings. While traditional pedagogical agents have been extensively studied, the integration of LLMs represents a transformative advancement with unprecedented capabilities in natural language understanding, reasoning, and adaptation. Following PRISMA-ScR guidelines, we analyzed 52 studies across five major databases from November 2022 to January 2025. Our findings reveal diverse LLM-based agents spanning K-12, higher education, and informal learning contexts across multiple subject domains. We identified four key design dimensions characterizing these agents: interaction approach (reactive vs. proactive), domain scope (domain-specific vs. general-purpose), role complexity (single-role vs. multi-role), and system integration (standalone vs. integrated). Emerging trends include multi-agent systems that simulate naturalistic learning environments, virtual student simulation for agent evaluation, integration with immersive technologies, and combinations with learning analytics. We also discuss significant research gaps and ethical considerations regarding privacy, accuracy, and student autonomy. This review provides researchers and practitioners with a comprehensive understanding of LLM-based pedagogical agents while identifying crucial areas for future development in this rapidly evolving field.

2604.08874 2026-05-19 cs.LG cs.AI

A Mathematical Framework for Temporal Modeling and Counterfactual Policy Simulation of Student Dropout

面向学生退学的时序建模与反事实政策模拟的数学框架

Rafael da Silva, Jeff Eicher, Gregory Longo

发表机构 * Applied Data Science Program(应用数据科学项目) Eastern University(东部大学)

AI总结 本文提出了一种结合反事实政策模拟层的时序建模框架,用于分析高等教育学生退学问题,通过LMS参与数据和行政退学记录进行建模,采用时间到事件结局的方式,并通过惩罚性、类别平衡逻辑回归进行每周风险建模,展示了模型在训练和测试集上的高AUC表现,并通过消融分析验证了时间参与信号的重要性。

Comments Approx. 20 pages, 9 figures. Code and reproducibility package available at https://github.com/rafa-rodriguess/TCM-Student-Dropout. This work introduces a temporal survival framework with counterfactual policy simulation.

详情
AI中文摘要

本研究提出了一种针对高等教育学生退学问题的时序建模框架,结合反事实政策模拟层,利用LMS参与数据和行政退学记录进行建模。退学被定义为在入学层面的时间到事件结局;通过在人-时期行上进行惩罚性、类别平衡逻辑回归,对每周风险进行离散时间建模。在晚期事件时间验证下,模型在训练集和测试集上分别达到0.8350和0.8405的行级AUC,整体校准可接受但最高风险分箱支持稀疏。消融分析表明性能对特征集组成敏感,突显了时间参与信号的作用。一个基于场景的政策层在显式的触发/计划合同下产生生存对比ΔS(T):正对比仅出现在冲击分支(T_policy=18:0.0102,0.0260,0.0819),而机制感知分支为负(ΔS_mech(18)=-0.0078,ΔS_mech(38)=-0.0134)。按性别的子组分析通过bootstrap方法量化了场景诱导的生存差距;对比方向稳定但幅度较小。结果未被因果识别;它们展示了在观察数据限制下,该框架进行内部结构场景比较的能力。

英文摘要

This study proposes a temporal modeling framework with a counterfactual policy-simulation layer for student dropout in higher education, using LMS engagement data and administrative withdrawal records. Dropout is operationalized as a time-to-event outcome at the enrollment level; weekly risk is modeled in discrete time via penalized, class-balanced logistic regression over person-period rows. Under a late-event temporal holdout, the model attains row-level AUCs of 0.8350 (train) and 0.8405 (test), with aggregate calibration acceptable but sparsely supported in the highest-risk bins. Ablation analyses indicate performance is sensitive to feature set composition, underscoring the role of temporal engagement signals. A scenario-indexed policy layer produces survival contrasts $\Delta S(T)$ under an explicit trigger/schedule contract: positive contrasts are confined to the shock branch ($T_{\rm policy}=18$: 0.0102, 0.0260, 0.0819), while the mechanism-aware branch is negative ($\Delta S_{\rm mech}(18)=-0.0078$, $\Delta S_{\rm mech}(38)=-0.0134$). A subgroup analysis by gender quantifies scenario-induced survival gaps via bootstrap; contrasts are directionally stable but small. Results are not causally identified; they demonstrate the framework's capacity for internal structural scenario comparison under observational data constraints.
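To make the survival contrast concrete: in a discrete-time hazard model, survival to week T is the product of (1 - h_t) over weeks 1..T. The sketch below assumes the contrast is defined as policy survival minus baseline survival, a sign convention the abstract does not state. The hazard values and the halving intervention are invented for illustration.

```python
def survival_curve(hazards):
    """Discrete-time survival: S(T) = product over t <= T of (1 - h_t)."""
    curve = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve


def survival_contrast(baseline_hazards, policy_hazards, horizon):
    """Assumed convention: Delta S(T) = S_policy(T) - S_baseline(T), T counted from week 1."""
    s_base = survival_curve(baseline_hazards)[horizon - 1]
    s_policy = survival_curve(policy_hazards)[horizon - 1]
    return s_policy - s_base


# Hypothetical scenario: a constant 2% weekly hazard, halved from week 11 onward.
baseline = [0.02] * 20
policy = baseline[:10] + [h * 0.5 for h in baseline[10:]]
delta_18 = survival_contrast(baseline, policy, horizon=18)
```

A positive value means more students are still enrolled at the horizon under the scenario than under the baseline. As the abstract stresses, such contrasts compare model scenarios and are not causal estimates.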

2604.07292 2026-05-19 cs.LG

Graph Neural ODE Digital Twins for Control-Oriented Reactor Thermal-Hydraulic Forecasting Under Partial Observability

基于图神经ODE数字孪生的部分可观测条件下面向控制的反应堆热工水力预测

Akzhol Almukhametov, Doyeong Lim, Rui Hu, Yang Liu

发表机构 * Department of Nuclear Engineering, Texas A&M University, College Station, TX 77843, USA(德克萨斯A&M大学核工程系,学院站,TX 77843,美国) Argonne National Laboratory, Nuclear Science and Engineering Division, USA(阿贡国家实验室,核科学与工程部,美国)

AI总结 本文提出了一种结合物理信息的图神经网络与神经普通微分方程(GNN-ODE)的模型,用于在部分可观测性下实现反应堆热力学状态的准确预测,该模型在预测精度、毫秒级推理速度和对部分可观测性的鲁棒性方面均表现出色。

详情
AI中文摘要

先进的反应堆实时监督控制需要准确预测整个系统的热力学状态,包括物理传感器不可用的位置。为满足这一需求,需要结合预测精度、毫秒级推理速度以及对部分可观测性的鲁棒性的替代模型。在本文中,我们提出了一种结合物理信息的图神经网络与神经普通微分方程(GNN-ODE)来同时解决这三个要求。我们将整个系统表示为一个有向传感器图,其边通过流/热传递感知的消息传递编码液压连接性,并通过受控的神经ODE在连续时间推进潜在动态。拓扑引导的缺失节点初始化器在运行开始时重建未仪器化状态;预测然后完全自回归进行。GNN-ODE替代模型在系统动态预测中取得了令人满意的成果。在测试模拟瞬态中,替代模型在60秒时对未仪器化节点的平均MAE为0.91 K,在300秒时为2.18 K,对于缺失节点状态重建,$R^2$达到0.995。在单个GPU上推理速度大约是模拟时间的105倍,使64成员的集合运行成为可能,用于不确定性量化。为了评估仿真到现实的转移,我们使用逐层判别微调将预训练的替代模型适应到实验设施数据上,仅使用30个训练序列。学习的流依赖热传递缩放恢复了与已确立相关性一致的雷诺数指数,表明了超越轨迹拟合的构成学习。该模型跟踪了陡峭的功率变化瞬态,并在未仪器化位置产生了准确的轨迹。

英文摘要

Real-time supervisory control of advanced reactors requires accurate forecasting of plant-wide thermal-hydraulic states, including locations where physical sensors are unavailable. Meeting this need calls for surrogate models that combine predictive fidelity, millisecond-scale inference, and robustness to partial observability. In this work, we present a physics-informed message-passing Graph Neural Network coupled with a Neural Ordinary Differential Equation (GNN-ODE) to address all three requirements simultaneously. We represent the whole system as a directed sensor graph whose edges encode hydraulic connectivity through flow/heat transfer-aware message passing, and we advance the latent dynamics in continuous time via a controlled Neural ODE. A topology-guided missing-node initializer reconstructs uninstrumented states at rollout start; prediction then proceeds fully autoregressively. The GNN-ODE surrogate achieves satisfactory results for the system dynamics prediction. On held-out simulation transients, the surrogate achieves an average MAE of 0.91 K at 60 s and 2.18 K at 300 s for uninstrumented nodes, with $R^2$ up to 0.995 for missing-node state reconstruction. Inference runs approximately 105 times faster than simulated time on a single GPU, enabling 64-member ensemble rollouts for uncertainty quantification. To assess sim-to-real transfer, we adapt the pretrained surrogate to experimental facility data using layerwise discriminative fine-tuning with only 30 training sequences. The learned flow-dependent heat-transfer scaling recovers a Reynolds-number exponent consistent with established correlations, indicating constitutive learning beyond trajectory fitting. The model tracks a steep power change transient and produces accurate trajectories at uninstrumented locations.

2604.03212 2026-05-19 cs.CV

ProtoFlow: Mitigating Forgetting in Class-Incremental Remote Sensing Segmentation via Low-Curvature Prototype Flow

ProtoFlow: 通过低曲率原型流缓解类别增量遥感分割中的遗忘

Jiekai Wu, Rong Fu, Chuangqi Li, Zijian Zhang, Guangxin Wu, Hao Zhang, Shiyin Lin, Jianyuan Ni, Yang Li, Dongxu Zhang, Amir H. Gandomi, Simon Fong, Pengbin Feng

发表机构 * Faculty of Health Data Science, Juntendo University(顺天堂大学健康数据科学学院) The Institute of Collaborative Innovation, University of Macau(澳门大学协同创新研究所) Department of Information and Computing Sciences, Faculty of Science, Utrecht University(乌得勒支大学科学学院信息与计算科学系) Department of Computer and Information Science, University of Pennsylvania(宾夕法尼亚大学计算机与信息科学系) School of Computer Science, University of Chinese Academy of Sciences(中国科学院大学计算机科学学院) Department of Computer & Information Science & Engineering, University of Florida(佛罗里达大学计算机与信息科学与工程系) Department of Computer Science, Juniata College(朱尼塔学院计算机科学系) National Engineering Research Center for Beijing Biochip Technology(生物芯片北京国家工程研究中心) CapitalBio Corporation(博奥生物) Faculty of Engineering & Information Technology, University of Technology Sydney(悉尼科技大学工程与信息技术学院) University Research and Innovation Center (EKIK), Obuda University(欧布达大学研究与创新中心(EKIK)) Faculty of Science and Technology, University of Macau(澳门大学科学与技术学院) Department of Mathematics, University of Southern California(南加州大学数学系)

AI总结 本文提出ProtoFlow,一种时间感知的原型动态框架,通过将类别原型建模为轨迹并学习其演变,以缓解遥感分割中的遗忘问题,实验表明其在多个基准上取得了显著提升。

详情
AI中文摘要

遥感分割在实际部署中本质上是连续的:新的语义类别不断出现,且获取条件随季节、城市和传感器而变化。尽管取得了进展,许多增量方法仍将训练步骤视为孤立的更新,导致表示漂移和遗忘控制不足。我们提出了ProtoFlow,一种时间感知的原型动态框架,将类别原型建模为轨迹,并通过显式的时间向量场学习其演变。通过联合强制低曲率运动和类间分离,ProtoFlow在增量学习过程中稳定了原型几何。在标准的类别和领域增量遥感基准上的实验表明,ProtoFlow在mIoUall上比强大的基线模型提高了1.5-2.0个百分点,并减少了遗忘。这些结果表明,显式建模时间原型演变是一种实用且可解释的策略,用于鲁棒的连续遥感分割。开源代码:https://github.com/dudududke/protoflow.

英文摘要

Remote sensing segmentation in real deployment is inherently continual: new semantic categories emerge, and acquisition conditions shift across seasons, cities, and sensors. Despite recent progress, many incremental approaches still treat training steps as isolated updates, which leaves representation drift and forgetting insufficiently controlled. We present ProtoFlow, a time-aware prototype dynamics framework that models class prototypes as trajectories and learns their evolution with an explicit temporal vector field. By jointly enforcing low-curvature motion and inter-class separation, ProtoFlow stabilizes prototype geometry throughout incremental learning. Experiments on standard class- and domain-incremental remote sensing benchmarks show consistent gains over strong baselines, including up to 1.5-2.0 points improvement in mIoUall, together with reduced forgetting. These results suggest that explicitly modeling temporal prototype evolution is a practical and interpretable strategy for robust continual remote sensing segmentation. Open-source code: https://github.com/dudududke/protoflow.
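One simple way to read "low-curvature motion" is as a penalty on the second finite difference of a prototype's position across incremental steps. The sketch below illustrates that reading only. The abstract does not give ProtoFlow's actual loss, so the function and its form are assumptions.

```python
def curvature_penalty(trajectory):
    """Sum of squared second differences along a prototype trajectory.

    trajectory: list of equal-length coordinate tuples, one per incremental step.
    A straight, evenly spaced path gives 0; sharp turns give large values.
    """
    penalty = 0.0
    for prev, curr, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        penalty += sum((a - 2 * b + c) ** 2 for a, b, c in zip(prev, curr, nxt))
    return penalty


straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
bent = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
straight_cost = curvature_penalty(straight)
bent_cost = curvature_penalty(bent)
```

Under this reading, a prototype that drifts steadily in one direction is not penalized, while one that reverses direction between steps is.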

2604.02184 2026-05-19 cs.LG

Neural-network methods for two-dimensional finite-source reflector design

用于二维有限源反射器设计的神经网络方法

Roel Hacking, Lisa Kusch, Koondanibha Mitra, Martijn Anthonissen, Wilbert IJzerman

发表机构 * Eindhoven University of Technology(埃因霍温理工大学) Signify(Signify公司)

AI总结 本文提出了一种基于神经网络的二维有限源反射器设计方法,通过直接变量变换损失和基于网格的损失函数优化反射器高度,实现了高精度的远场分布控制,并在多个基准测试中展示了比传统反卷积方法更高的精度和速度。

Comments 25 pages, 12 figures, 2 tables. Submitted to Machine Learning: Science and Technology

详情
AI中文摘要

我们解决了将有限扩展光源发出的光转换为指定远场分布的二维反射器设计的逆问题。反射器高度由神经网络表示,并通过两个目标函数进行优化:一个基于闭式逆射线映射的直接变量变换损失,以及一个将目标单元映射回光源的基于网格的损失,后者适用于不连续光源。通过自动微分计算梯度,并使用稳健的拟牛顿方法进行最小化。作为基线,我们采用了一种基于简化有限源近似的反卷积流程:从通量平衡中恢复一维单调映射,通过积分因子ODE求解转换为反射器,并嵌入修改后的Van Cittert迭代中,结合非负性裁剪和射线追踪反馈。四个基准测试涵盖连续和不连续光源以及最小高度约束,精度通过射线追踪的归一化平均绝对误差测量。在两个主要基准测试中,神经方法在单块NVIDIA RTX 4090 GPU上于几秒钟内达到约2e-5和5e-5的误差,相比之下,反卷积基线在数百秒后仍为4e-3和5e-2。结果表明,神经方法在精度和速度上均优于传统方法,同时仍支持实际的高度约束。我们还讨论了通过迭代校正方案扩展到旋转对称和全三维反射器设计的可能性。

英文摘要

We address the inverse problem of designing two-dimensional reflectors that transform light from a finite, extended source into a prescribed far-field distribution. The reflector height is represented by a neural network and optimized with two objective functions: a direct change-of-variables loss based on the closed-form inverse ray map, and a mesh-based loss that maps target cells back to the source and remains usable for discontinuous sources. Gradients are computed by automatic differentiation and minimized with a robust quasi-Newton method. As a baseline, we adapt a deconvolution pipeline built on a simplified finite-source approximation: a one-dimensional monotone map is recovered from flux balance, converted to a reflector by an integrating-factor ODE solve, and embedded in a modified Van Cittert iteration with nonnegativity clipping and ray-traced feedback. Across four benchmarks, covering continuous and discontinuous sources and minimum-height constraints, accuracy is measured by ray-traced normalized mean absolute error. On the two main benchmarks, the neural method reaches errors of about 2e-5 and 5e-5 within a few seconds on one NVIDIA RTX 4090 GPU, compared with 4e-3 and 5e-2 for the deconvolution baseline after several hundred seconds. The results show that the neural formulation is both more accurate and substantially faster, while still supporting practical height constraints. We also discuss extensions to rotationally symmetric and full three-dimensional reflector design through iterative correction schemes.

2604.02060 2026-05-19 cs.CV cs.RO

CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

CompassAD: 功能竞争物体中意图驱动的3D可供性定位

Jingliang Li, Jindou Jia, Tuo An, Chuhao Zhou, Xiangyu Chen, Shilin Shan, Boyu Ma, Bofan Lyu, Gen Li, Jianfei Yang

发表机构 * MARS Lab, Nanyang Technological University, Singapore(MARS实验室,南洋理工大学,新加坡)

AI总结 该研究提出了一种新的3D affordance设定,即意图驱动的可混淆地标,旨在预测多物体点云中正确物体的每点affordance掩码,基于隐含的自然语言意图。通过构建CompassAD基准,该研究展示了在具有隐含意图的多物体组合中的先进结果,并在机器人机械臂上验证了其在真实世界抓取中的有效性。

详情
AI中文摘要

当被告知要“切蛋糕”时,机器人必须在附近的剪刀之上选择刀,尽管两个物体都提供相同的切割功能。在真实世界场景中,多个物体可能具有相同的affordance,但只有一个是给定任务上下文下的合适对象。我们称这种情况为混淆对。然而,现有的3D affordance方法大多回避了这一挑战,通过评估孤立的单个物体,通常伴有查询中提供的显式类别名称。我们正式提出了意图驱动的可混淆affordance地标,这是一种新的3D affordance设定,要求在多物体点云中预测正确物体的每点affordance掩码,基于隐含的自然语言意图。为了研究这个问题,我们构建了CompassAD,第一个专注于隐含意图的多物体组合基准。它包含30个混淆物体对,覆盖16种affordance类型,6,422个组合,以及88K+个查询-回答对。此外,我们提出了CompassNet,一个包含两个专门模块的框架,专为该任务定制。实例受限的交叉注入(ICI)在物体边界内约束语言-几何对齐,以防止跨物体语义泄漏。双级对比细化(BCR)在几何组和点级别上强制执行区分,使目标和可混淆表面之间的区别更加清晰。广泛的实验表明,在已见和未见查询上均取得了最先进的结果,并在机器人机械臂上的部署证实了其在真实世界抓取中的有效性。

英文摘要

When told to "cut the cake," a robot must choose the knife over nearby scissors, despite both objects affording the same cutting function. In real-world scenes, multiple objects may share identical affordances, yet only one is appropriate under the given task context. We call such cases confusing pairs. However, existing 3D affordance methods largely sidestep this challenge by evaluating isolated single objects, often with explicit category names provided in the query. We formalize Intent-Driven Confusable Affordance Grounding, a new 3D affordance setting that requires predicting a per-point affordance mask on the correct object within a multi-object point cloud, conditioned on implicit natural language intent. To study this problem, we construct CompassAD, the first benchmark centered on implicit intent in confusing multi-object compositions. It comprises 30 confusing object pairs spanning 16 affordance types, 6,422 compositions, and 88K+ query-answer pairs. Furthermore, we propose CompassNet, a framework that incorporates two dedicated modules tailored to this task. Instance-bounded Cross Injection (ICI) constrains language-geometry alignment within object boundaries to prevent cross-object semantic leakage. Bi-level Contrastive Refinement (BCR) enforces discrimination at both geometric-group and point levels, sharpening distinctions between target and confusable surfaces. Extensive experiments demonstrate state-of-the-art results on both seen and unseen queries, and deployment on a robotic manipulator confirms effective transfer to real-world grasping in confusing multi-object compositions.

2604.00634 2026-05-19 cs.RO cs.CV

LiPS: Lightweight Panoptic Segmentation for Resource-Constrained Robotics

LiPS: 为资源受限机器人设计的轻量级全景分割

Calvin Galagain, Martyna Poreba, François Goulette, Cyrill Stachniss

发表机构 * Université Paris-Saclay, CEA LIST(巴黎-萨克雷大学,CEA LIST) U2IS, ENSTA, Institut Polytechnique de Paris(U2IS、ENSTA、巴黎理工学院) University of Bonn, Center for Robotics(波恩大学,机器人中心)

AI总结 本文提出LiPS,一种轻量级全景分割方法,通过简化特征提取和融合路径,在保持查询基于解码的同时,显著降低计算需求,实现与更重模型相当的精度和更高的吞吐量。

Comments Accepted to IEEE International Conference on Image Processing (ICIP) 2026, Paper #2070

详情
AI中文摘要

全景分割是机器人感知的关键使能器,因为它将语义理解与对象级推理统一起来。然而,随着最新模型复杂性的增加,它们不再适合在资源受限的平台上部署,如移动机器人。我们提出了一种名为LiPS的新方法,通过轻量级设计保留查询基于解码,同时引入流线型的特征提取和融合路径,旨在在大幅降低计算需求的同时提供强大的全景分割性能。在标准基准上的评估表明,LiPS在精度上与更重的基线相当,同时提供高达4.5倍的吞吐量(每秒帧数),并需要几乎6.8倍更少的计算。这种效率使LiPS成为现代全景模型与现实世界机器人应用之间的重要桥梁。

英文摘要

Panoptic segmentation is a key enabler for robotic perception, as it unifies semantic understanding with object-level reasoning. However, the increasing complexity of state-of-the-art models makes them unsuitable for deployment on resource-constrained platforms such as mobile robots. We propose a novel approach called LiPS that addresses the challenge of efficient-to-compute panoptic segmentation with a lightweight design that retains query-based decoding while introducing a streamlined feature extraction and fusion pathway. It aims at providing a strong panoptic segmentation performance while substantially lowering the computational demands. Evaluations on standard benchmarks demonstrate that LiPS attains accuracy comparable to much heavier baselines, while providing up to 4.5 times higher throughput, measured in frames per second, and requiring nearly 6.8 times fewer computations. This efficiency makes LiPS a highly relevant bridge between modern panoptic models and real-world robotic applications.

2603.29868 2026-05-19 cs.AI cs.LO

Spatiotemporal Robustness of Temporal Logic Tasks using Multi-Objective Reasoning

基于多目标推理的时序逻辑任务时空鲁棒性

Oliver Schön, Lars Lindemann

发表机构 * Automatic Control Laboratory, ETH Zürich(自动化控制实验室,苏黎世联邦理工学院)

AI总结 本文研究了通过多目标推理处理时序逻辑任务的时空鲁棒性,提出了一种新的时空鲁棒性定义,能够同时考虑空间和时间扰动,并展示了其在多智能体机器人、智慧城市和空中交通管制等交互系统中的应用。

Comments 30 pages, 6 figures, to be published at the 38th International Conference on Computer Aided Verification 2026

详情
AI中文摘要

自主系统的可靠性依赖于其鲁棒性,即在不确定性下满足目标的能力。本文研究了在离散时间信号上评估的时序逻辑规范的时空鲁棒性。现有工作提出了鲁棒语义,能够捕捉不仅布尔可满足性,还包括从不可满足性距离的几何距离,对应于给定信号的可接受空间扰动。相比之下,我们提出了时空鲁棒性(STR),它同时捕捉可接受的空间和时间扰动。这一概念对于交互系统,如多智能体机器人、智慧城市和空中交通管制尤其具有信息量。我们将STR定义为一个多目标推理问题,通过空间和时间扰动的偏序关系形式化。这种视角有两个关键优势:(1)STR可以被解释为一个帕累托最优集,该集描述了所有可接受的时空扰动;(2)STR可以通过多目标优化工具进行计算。为克服计算挑战,我们提出了适用于STR的鲁棒语义,这些语义在适当的意义下是准确的,同时计算上是可行的。最后,我们使用这些鲁棒语义提出了STR的监控算法。据我们所知,这是首次通过多目标推理处理多维鲁棒性的工作。

英文摘要

The reliability of autonomous systems depends on their robustness, i.e., their ability to meet their objectives under uncertainty. In this paper, we study spatiotemporal robustness of temporal logic specifications evaluated over discrete-time signals. Existing work has proposed robust semantics that capture not only Boolean satisfiability, but also the geometric distance from unsatisfiability, corresponding to admissible spatial perturbations of a given signal. In contrast, we propose spatiotemporal robustness (STR), which captures admissible spatial and temporal perturbations jointly. This notion is particularly informative for interacting systems, such as multi-agent robotics, smart cities, and air traffic control. We define STR as a multi-objective reasoning problem, formalized via a partial order over spatial and temporal perturbations. This perspective has two key advantages: (1) STR can be interpreted as a Pareto-optimal set that characterizes all admissible spatiotemporal perturbations, and (2) STR can be computed using tools from multi-objective optimization. To navigate computational challenges, we propose robust semantics for STR that are sound in the sense of suitably under-approximating STR while being computationally tractable. Finally, we present monitoring algorithms for STR using these robust semantics. To the best of our knowledge, this is the first work to deal with robustness across multiple dimensions via multi-objective reasoning.
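The Pareto-optimal reading of STR can be illustrated on a finite set of candidate perturbation pairs. In the sketch below each pair is (spatial magnitude, temporal shift), and a pair is kept if no other admissible pair is at least as large in both coordinates. The candidate values are invented, and the paper's semantics and monitoring algorithms are not reproduced here.

```python
def dominates(a, b):
    """a dominates b if a is at least as large in both coordinates and differs from b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_front(points):
    """Return the non-dominated points under the componentwise partial order."""
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


# Hypothetical admissible (spatial, temporal) perturbation pairs for one signal.
admissible = [(0.5, 0), (0.4, 1), (0.4, 2), (0.1, 3), (0.3, 1), (0.1, 2)]
front = pareto_front(admissible)
```

The front shows the trade-off the abstract describes: tolerating a larger time shift generally leaves less room for spatial deviation, so no single scalar summarizes robustness.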

2603.26720 2026-05-19 cs.RO cs.AI

SutureFormer: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space

SutureFormer: 通过像素空间中的目标引导离线强化学习学习手术轨迹

Huanrong Liu, Chunlin Tian, Tongyu Jia, Tailai Zhou, Qin Liu, Yu Gao, Yutong Ban, Yun Gu, Guy Rosman, Xin Ma, Qingbiao Li

发表机构 * University of Macau(澳门大学) The Chinese PLA General Hospital(中国人民解放军总医院) Duke University(杜克大学) Shanghai Jiao Tong University(上海交通大学)

AI总结 本文提出SutureFormer,一种基于目标引导的离线强化学习框架,通过稀疏标注到密集奖励信号的插值,有效学习手术针轨迹预测,减少平均位移误差58.6%。

详情
AI中文摘要

从内窥镜视频预测手术针轨迹对于机器人辅助缝合至关重要,能够实现预见性规划、实时引导和更安全的运动执行。现有直接从视觉观测学习运动分布的方法往往忽视相邻运动步骤之间的序列依赖性。此外,稀疏路径点标注通常无法提供足够的监督,进一步增加了监督或模仿学习方法的难度。为了解决这些挑战,我们将基于图像的针轨迹预测表述为一个序列决策问题,在其中将针尖视为一个在像素空间中逐步移动的智能体。这种表述自然捕捉了针运动的连续性,并能够显式建模在时间上物理上合理的像素级状态转换。从这个角度来看,我们提出SutureFormer,一种目标引导的离线强化学习框架,通过三次样条插值将稀疏标注转换为密集奖励信号,鼓励策略在利用有限专家指导的同时探索合理的未来运动路径。SutureFormer 使用观察编码器编码可变长度片段,以捕捉局部空间线索和长距离时间动态,并通过由离散方向和连续幅度组成的动作自回归地预测未来路径点。为了实现从专家演示中稳定离线策略优化,我们采用保守Q学习与行为克隆正则化。在一个新的肾伤口缝合数据集(包含来自50名患者的1,158条轨迹)上进行的实验表明,与最强基线相比,SutureFormer将平均位移误差减少了58.6%,证明了将针轨迹预测建模为像素级序列动作学习的有效性。

英文摘要

Predicting surgical needle trajectories from endoscopic video is critical for robot-assisted suturing, enabling anticipatory planning, real-time guidance, and safer motion execution. Existing methods that directly learn motion distributions from visual observations tend to overlook the sequential dependency among adjacent motion steps. Moreover, sparse waypoint annotations often fail to provide sufficient supervision, further increasing the difficulty of supervised or imitation learning methods. To address these challenges, we formulate image-based needle trajectory prediction as a sequential decision-making problem, in which the needle tip is treated as an agent that moves step by step in pixel space. This formulation naturally captures the continuity of needle motion and enables the explicit modeling of physically plausible pixel-wise state transitions over time. From this perspective, we propose SutureFormer, a goal-conditioned offline reinforcement learning framework that converts sparse annotations into dense reward signals via cubic spline interpolation, encouraging the policy to exploit limited expert guidance while exploring plausible future motion paths. SutureFormer encodes variable-length clips using an observation encoder to capture both local spatial cues and long-range temporal dynamics, and autoregressively predicts future waypoints through actions composed of discrete directions and continuous magnitudes. To enable stable offline policy optimization from expert demonstrations, we adopt Conservative Q-Learning with Behavioral Cloning regularization. Experiments on a new kidney wound suturing dataset containing 1,158 trajectories from 50 patients show that SutureFormer reduces Average Displacement Error by 58.6% compared with the strongest baseline, demonstrating the effectiveness of modeling needle trajectory prediction as pixel-level sequential action learning.
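The action format and the reported metric can be sketched in a few lines. The eight-direction table below is an assumption (the abstract only says directions are discrete), and the waypoints are made up. Average Displacement Error is computed in its usual form, the mean Euclidean distance between predicted and reference waypoints.

```python
import math

# Assumed discretization: eight compass directions in pixel space.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def rollout(start, actions):
    """Turn (direction index, magnitude) actions into a sequence of waypoints."""
    x, y = start
    path = []
    for direction, magnitude in actions:
        dx, dy = DIRECTIONS[direction]
        norm = math.hypot(dx, dy)  # normalize so magnitude is the step length
        x += magnitude * dx / norm
        y += magnitude * dy / norm
        path.append((x, y))
    return path


def average_displacement_error(predicted, target):
    """Mean Euclidean distance between matching predicted and target waypoints."""
    if len(predicted) != len(target) or not target:
        raise ValueError("paths must be non-empty and of equal length")
    total = sum(math.hypot(p[0] - t[0], p[1] - t[1])
                for p, t in zip(predicted, target))
    return total / len(target)


predicted = rollout((0.0, 0.0), [(0, 3.0), (2, 4.0)])
ade = average_displacement_error(predicted, [(3.0, 0.0), (3.0, 1.0)])
```

Because each waypoint is built from the previous one, an error in an early action carries forward, which is why treating prediction as a sequential decision problem matters.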

2603.20216 2026-05-19 cs.CL cs.AI cs.LG

Locally Coherent Parallel Decoding in Diffusion Language Models

扩散语言模型中的局部相干并行解码

Michael Hersche, Nicolas Menet, Ronan Tanios, Abbas Rahimi

发表机构 * IBM Research - Zurich(IBM瑞士研究实验室)

AI总结 本文提出CoDiLA方法,通过引入小型辅助自回归模型来解决扩散语言模型在并行解码中的相干性问题,从而在代码生成任务中实现更高的准确性和速度。

Comments Accepted at ICML 2026

详情
AI中文摘要

扩散语言模型(DLMs)作为一种有前景的替代自回归(AR)模型,提供了亚线性生成延迟和双向能力,这在代码生成和编辑中尤为吸引人。在离散DLMs中实现亚线性延迟需要并行预测多个token。然而,标准DLMs从条件边缘分布独立采样token,无法捕捉同时生成token之间的联合依赖关系。因此,它们常常导致语法不一致并破坏多token结构。在本工作中,我们引入CoDiLA(Coherent Diffusion with Local Autoregression),一种方法,通过引入小型辅助AR模型来解决并行采样与局部依赖建模之间的矛盾。该方法将局部解码委托给一个小型辅助AR模型,该模型在扩散潜变量上进行操作。这种设计允许并行生成,同时在块内确保序列的有效性,并保持核心DLM能力,包括跨块的双向建模。我们证明使用高度紧凑的辅助AR模型(例如,0.6B参数)可以有效消除相干性伪影,在代码生成基准中建立了一个新的帕累托前沿。

英文摘要

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing. Achieving sub-linear latency in discrete DLMs requires predicting multiple tokens in parallel. However, standard DLMs sample tokens independently from conditional marginal distributions, failing to capture the joint dependencies among concurrently generated tokens. As a result, they often lead to syntactic inconsistencies and break multi-token structures. In this work, we introduce CoDiLA (Coherent Diffusion with Local Autoregression), a method that reconciles parallel sampling with local dependency modeling. Rather than forcing the DLM to resolve fine-grained syntax, CoDiLA delegates local decoding to a small, auxiliary AR model operating on the diffusion latents. This design allows for parallel generation while ensuring sequential validity within a block and maintaining core DLM capabilities, including bidirectional modeling across blocks. We demonstrate that using a highly compact auxiliary AR model (e.g., 0.6B parameters) effectively eliminates coherence artifacts, establishing a new Pareto frontier for accuracy and speed in code generation benchmarks.

2603.17751 2026-05-19 cs.RO cs.SY eess.SY

Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow

多源人机协同数字孪生测试平台用于混合交通流中的连接与自动驾驶车辆

Jianghong Dong, Chunying Yang, Mengchi Cai, Chaoyi Chen, Qing Xu, Jianqiang Wang, Jiawei Wang, Keqiang Li

发表机构 * School of Vehicle and Mobility, Tsinghua University(清华大学车辆与移动性学院) Department of Civil and Environmental Engineering, University of Michigan(密歇根大学土木与环境工程系)

AI总结 本文提出了一种多源人机协同混合云控制测试平台(MSH-MCCT),用于在混合交通环境中测试连接与自动驾驶车辆(CAVs)与人类驾驶车辆(HDVs)之间的复杂交互,通过混合数字孪生概念结合混合现实与数字孪生,提升实验灵活性和可扩展性。

Journal ref 2026 in Journal of Intelligent and Connected Vehicles

详情
AI中文摘要

在新兴的混合交通环境中,连接与自动驾驶车辆(CAVs)必须与周围的人类驾驶车辆(HDVs)进行交互。本文介绍MSH-MCCT(多源人机协同混合云控制测试平台),一种新的CAV测试平台,能够捕捉各种CAVs和HDVs之间的复杂交互。利用混合数字孪生概念,该概念结合了混合现实与数字孪生,MSH-MCCT整合了物理、虚拟和混合平台,以及多源控制输入。通过混合平台的连接,MSH-MCCT允许人类驾驶员和CAV算法在多个视野范围内同时操作物理和虚拟车辆。特别地,该测试平台促进了物理和虚拟CAVs与HDVs的共存和实时交互,显著提高了实验的灵活性和可扩展性。在混合交通中的车辆编队实验展示了MSH-MCCT通过不同保真度的驾驶模拟器进行多源真实人类驾驶员闭环CAV测试的潜力。实验视频可在我们的项目网站上获得:https://dongjh20.github.io/MSH-MCCT。

英文摘要

In the emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. Particularly, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing the experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. The videos for the experiments are available at our project website: https://dongjh20.github.io/MSH-MCCT.

2603.17577 2026-05-19 cs.LG cs.AI stat.ML

Identifying Latent Actions and Dynamics from Offline Data via Demonstrator Diversity

通过示范多样性从离线数据中识别潜在动作和动态

Felix Schur

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 本文研究了在不观察动作的情况下从离线轨迹中恢复潜在动作和环境动态的问题,通过示范多样性假设,证明了在满足特定条件时,潜在转移和示范策略可以被唯一确定,从而为从离线强化学习数据中学习潜在动作和动态提供了新的方法。

详情
AI中文摘要

在动作未被观察的情况下,能否从离线轨迹中恢复潜在动作和环境动态?我们研究了在轨迹无动作但带有示范者身份标签的设置中这一问题。我们假设每个示范者遵循不同的策略,而环境动态在所有示范者之间是共享的,身份仅通过所选动作影响下一个观测。在这些假设下,条件下一个观测分布 $p(o_{t+1}\mid o_t,e)$ 是潜在动作条件化转移核的混合,具有示范者特定的混合权重。我们证明,这导致每个状态的可观测条件分布具有列随机非负矩阵分解。通过充分分散的策略多样性和秩条件,我们证明潜在转移和示范策略在潜在动作标签的排列下是可识别的。通过Gram行列式最小体积准则,我们将结果扩展到连续观测空间,并证明在连接的状态空间上转移映射的连续性将局部排列模糊性提升为单一全局排列。少量标记的动作数据足以消除最终的模糊性。这些结果确立了示范多样性作为从离线强化学习数据中学习潜在动作和动态的原理性可识别性来源。

英文摘要

Can latent actions and environment dynamics be recovered from offline trajectories when actions are never observed? We study this question in a setting where trajectories are action-free but tagged with demonstrator identity. We assume that each demonstrator follows a distinct policy, while the environment dynamics are shared across demonstrators and identity affects the next observation only through the chosen action. Under these assumptions, the conditional next-observation distribution $p(o_{t+1}\mid o_t,e)$ is a mixture of latent action-conditioned transition kernels with demonstrator-specific mixing weights. We show that this induces, for each state, a column-stochastic nonnegative matrix factorization of the observable conditional distribution. Using sufficiently scattered policy diversity and rank conditions, we prove that the latent transitions and demonstrator policies are identifiable up to permutation of the latent action labels. We extend the result to continuous observation spaces via a Gram-determinant minimum-volume criterion, and show that continuity of the transition map over a connected state space upgrades local permutation ambiguities to a single global permutation. A small amount of labeled action data then suffices to fix this final ambiguity. These results establish demonstrator diversity as a principled source of identifiability for learning latent actions and dynamics from offline RL data.
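The mixture structure can be checked numerically for a single state. With a transition matrix T whose columns are p(o' | o, a) and a policy matrix Pi whose columns are each demonstrator's action distribution, the observable conditional is P = T Pi. The numbers below are invented. The sketch only demonstrates the factorization and its permutation ambiguity, not the identifiability proof.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def column_sums(m):
    return [sum(row[j] for row in m) for j in range(len(m[0]))]


# Columns of T: p(o' | o, a) for two latent actions over three next observations.
T = [[0.7, 0.1],
     [0.2, 0.3],
     [0.1, 0.6]]
# Columns of Pi: pi_e(a | o) for three demonstrators.
Pi = [[0.9, 0.5, 0.2],
      [0.1, 0.5, 0.8]]

P = matmul(T, Pi)  # observable p(o' | o, e), one column per demonstrator

# Relabeling the latent actions (swap T's columns and Pi's rows) leaves P unchanged.
T_perm = [row[::-1] for row in T]
Pi_perm = Pi[::-1]
P_perm = matmul(T_perm, Pi_perm)
```

Since both labelings yield the same observable matrix, the factors can be recovered at best up to a permutation of action labels, which matches the form of the identifiability claim in the abstract.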

2603.14936 2026-05-19 cs.CV

Bridging the Intention-Expression Gap: Aligning Multi-Dimensional Preferences via Hierarchical Relevance Feedback in Text-to-Image Diffusion

Wenxi Wang, Hongbin Liu, Mingqian Li, Junyan Yuan, Junqi Zhang

Affiliations: Tongji University

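AI summary: This paper proposes a Hierarchical Relevance Feedback-Driven (HRFD) framework that aligns multi-dimensional features in text-to-image diffusion models, narrowing the gap between what users intend and what they can express and improving the identification of multi-dimensional preferences.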

English abstract

Users often possess a clear visual intent but struggle to articulate it precisely in language. This intention-expression gap makes aligning generated images with latent visual preferences a fundamental challenge in text-to-image diffusion models. Existing methods either require model training, sacrificing flexibility, or rely on textual feedback, imposing a heavy cognitive burden. Although recent training-free methods use click-based binary preference feedback to reduce user effort, they force Foundation Models (FMs) to infer preferences at the semantic level. When faced with multi-dimensional preferences, FMs suffer from inference overload and fail to identify exact preferred feature values under conflicting user signals. Consequently, a flexible framework for multi-dimensional feature alignment remains absent. To address this, we propose a Hierarchical Relevance Feedback-Driven (HRFD) framework. Recognizing that multiple features struggle to converge simultaneously, HRFD organizes them into a three-tier hierarchy and adapts relevance feedback to enforce coarse-to-fine convergence, minimizing cognitive load. To bypass FM inference overload, HRFD decouples the process into independent single-feature preference inference tasks. Furthermore, to overcome FMs' failure in identifying preferred values, HRFD employs statistical inference to quantify the distribution divergence of features between "liked" and "disliked" image sets, achieving robust and transparent preference measurement. Crucially, HRFD operates entirely within the external text space, remaining strictly training-free and model-agnostic. Extensive experiments demonstrate that HRFD effectively captures the user's true visual intent, significantly outperforming baseline approaches.
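To make the idea of comparing feature distributions between "liked" and "disliked" image sets concrete, here is a minimal sketch for a single categorical feature. The abstract does not specify which statistic HRFD uses, so the smoothed log-ratio score and the Jensen-Shannon divergence below are stand-in assumptions, and the feature values (`warm`, `cool`, `neutral`) and function names are hypothetical.

```python
from collections import Counter
import math


def value_distribution(labels, vocabulary, alpha=1.0):
    """Additively smoothed frequency of each feature value in one image set."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = {v: 0.5 * (p[v] + q[v]) for v in p}

    def kl(a, b):
        return sum(a[v] * math.log2(a[v] / b[v]) for v in a if a[v] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def infer_preference(liked, disliked, vocabulary):
    """Return the value most over-represented among liked images,
    plus how strongly the two sets differ on this feature."""
    p = value_distribution(liked, vocabulary)
    q = value_distribution(disliked, vocabulary)
    scores = {v: math.log(p[v] / q[v]) for v in vocabulary}
    preferred = max(scores, key=scores.get)
    return preferred, js_divergence(p, q)


# Hypothetical per-image values of one feature, e.g. colour tone.
vocab = ["warm", "cool", "neutral"]
liked = ["warm", "warm", "warm", "neutral"]
disliked = ["cool", "cool", "neutral", "cool"]
preferred, divergence = infer_preference(liked, disliked, vocab)
```

Treating each feature independently in this way mirrors the decoupling into single-feature inference tasks that the abstract describes; a low divergence would suggest the user has expressed no clear preference on that feature yet.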

2603.14371 2026-05-19 cs.RO cs.AI

OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism

Xiangyu Li, Huaizhi Tang, Xin Ding, Weijun Wang, Ting Cao, Yunxin Liu

Affiliations: Institute for AI Industry Research (AIR); Department of Electronic Engineering; University of Science and Technology of China

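AI summary: This paper proposes OxyGen, a unified KV cache management approach that improves VLA inference efficiency under multi-task parallelism. Cross-task KV sharing and cross-frame continuous batching reduce redundant computation and resource contention, yielding higher language throughput and action frequency on-device.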

Comments: Preprint

English abstract

Embodied AI agents increasingly require parallel execution of multiple tasks, such as manipulation, conversation, and memory construction, from shared observations under distinct time constraints. Recent Mixture-of-Transformers (MoT) Vision-Language-Action Models (VLAs) architecturally support such heterogeneous outputs, yet existing inference systems fail to achieve efficient multi-task parallelism for on-device deployment because of redundant computation and resource contention. We identify isolated KV cache management as the root cause. To address this, we propose unified KV cache management, an inference design that treats the KV cache as a first-class shared resource across tasks and over time. This abstraction enables two key optimizations: cross-task KV sharing eliminates redundant prefill of shared observations, while cross-frame continuous batching decouples variable-length language decoding from fixed-rate action generation across control cycles. We implement this design for $\pi_{0.5}$, a popular MoT VLA, and evaluate it on both NVIDIA GeForce RTX 4090 and Jetson AGX Thor, two representative platforms for on-device VLA inference. OxyGen achieves up to 3.7$\times$ speedup over isolated execution, delivering over 200 tokens/s language throughput and 70 Hz action frequency simultaneously without degrading action quality, and we further validate the gains on a real humanoid robot with on-board Jetson AGX Thor.
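The cross-task KV sharing idea can be sketched with a toy cache. This is an illustration under stated assumptions, not OxyGen's implementation: `SharedKVCache`, `toy_prefill`, and the two task functions are hypothetical names, and the "KV cache" here is a plain list of number pairs rather than real attention tensors. The point is only that the prefill over a shared observation runs once per frame and is then reused by every task.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

KV = List[Tuple[float, float]]  # toy stand-in for per-token (key, value) pairs


@dataclass
class SharedKVCache:
    """Caches the prefill result per frame so all tasks reuse it."""
    prefill_fn: Callable[[Tuple[int, ...]], KV]
    store: Dict[int, KV] = field(default_factory=dict)
    prefill_calls: int = 0

    def get(self, frame_id: int, observation_tokens: Tuple[int, ...]) -> KV:
        # Run the (expensive) prefill only on the first request for a frame.
        if frame_id not in self.store:
            self.store[frame_id] = self.prefill_fn(observation_tokens)
            self.prefill_calls += 1
        return self.store[frame_id]

    def evict(self, frame_id: int) -> None:
        # Drop a frame once no task needs it any more.
        self.store.pop(frame_id, None)


def toy_prefill(tokens: Tuple[int, ...]) -> KV:
    """Hypothetical prefill: one (key, value) pair per observation token."""
    return [(float(t), float(t) * 2.0) for t in tokens]


def action_task(kv: KV) -> float:
    """Hypothetical action head that reads the shared keys."""
    return sum(k for k, _ in kv)


def language_task(kv: KV) -> float:
    """Hypothetical language head that reads the shared values."""
    return sum(v for _, v in kv)


cache = SharedKVCache(prefill_fn=toy_prefill)
observation = (1, 2, 3)
action_out = action_task(cache.get(0, observation))
language_out = language_task(cache.get(0, observation))
```

With isolated caches each task would run its own prefill; here `cache.prefill_calls` stays at 1 for the frame even though two tasks consumed it. Cross-frame continuous batching, the second optimization named in the abstract, is not modeled in this sketch.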

2603.13708 2026-05-19 cs.CV

RSEdit: Text-Guided Image Editing for Remote Sensing

Chen Zhenyuan, Zhang Zechuan, Zhang Feng

Affiliations: School of Earth Sciences, Zhejiang University; Zhejiang Provincial Key Laboratory of Geographic Information Science; Key Laboratory of Spatio-temporal Information and Intelligent Services (LSIIS), Ministry of Natural Resources of the People’s Republic of China; ReLER, CCAI, Zhejiang University

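AI summary: This paper proposes RSEdit, a family of generative models for text-guided remote sensing image editing. By studying conditioning strategies for building editing models from text-to-image models, it produces instruction-faithful edits while preserving geospatial structure.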

Comments: Accepted by IEEE GRSL

English abstract

In this paper, we explore text-guided image editing in the remote sensing domain using generative modeling. We propose RSEdit, a collection of models from U-Net to DiT with various configurations. Specifically, we present the first comprehensive study of conditioning strategies for building image editing models from off-the-shelf text-to-image ones. Our experiments show that RSEdit achieves the best instruction-faithful edits while preserving geospatial structure. We release the code at https://github.com/Bili-Sakura/RSEdit-Preview and checkpoints at https://huggingface.co/collections/BiliSakura/rsedit.