arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.25191 2026-05-26 cs.CV

Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference

在推理时将图像引导注入文本条件扩散模型

Agata Żywot, Iason Skylitsis, Thijmen Nijdam, Zoe Tzifa-Kratira, Derck Prinzhorn, Konrad Szewczyk, Aritra Bhowmik

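AI summary: Proposes Visual Concept Fusion (VCF), a method that conditions on both an image and a text prompt at inference time without retraining, injecting visual concepts by aligning CLIP image features with the text embedding space.

AI Chinese abstract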

像Stable Diffusion这样的文本到图像扩散模型可以从文本生成高质量图像,但缺乏在推理时无需重新训练即可注入视觉引导(例如草图、风格)的方法。现有方法要么需要计算昂贵的微调,要么依赖于可能造成与文本提示语义不对齐的风格迁移技术。我们引入了视觉概念融合(VCF),这是第一种在推理时无需任何概念特定训练即可同时对图像和文本提示进行双重条件化的方法。VCF通过将CLIP图像特征与文本嵌入空间对齐,实现了将视觉概念注入Stable Diffusion。VCF由三个组件组成:(1)一个轻量级对齐器,使用InfoNCE和交叉注意力重建损失将图像标记映射到文本嵌入流形;(2)一种保留文本和视觉语义的融合策略;(3)一个可选的提示-噪声优化(PNO)模块,用于测试时细化。我们的实验表明,VCF成功地从参考图像中转移了包括风格、构图和调色板在内的视觉属性,同时保持了对提示的遵循。定量结果显示文本对齐(CLIP分数)和视觉对应(LPIPS)之间存在权衡,VCF在参考保真度方面优于基线。

English abstract

Text-to-image diffusion models like Stable Diffusion generate high-quality images from text, but lack a way to inject visual guidance (e.g. sketches, styles) at inference without retraining. Existing methods either require computationally expensive fine-tuning or rely on style transfer techniques that risk semantic misalignment with textual prompts. We introduce Visual Concept Fusion (VCF), the first method offering dual conditioning on both an image and text prompt at inference time without any concept-specific training. VCF enables visual concept injection into Stable Diffusion by aligning CLIP image features with the text embedding space. VCF consists of three components: (1) a lightweight aligner that maps image tokens to the text embedding manifold using InfoNCE and cross-attention reconstruction losses, (2) a fusion strategy that preserves both textual and visual semantics, and (3) an optional Prompt-Noise Optimization (PNO) module for test-time refinement. Our experiments demonstrate that VCF successfully transfers visual attributes including style, composition, and color palette from reference images while maintaining prompt adherence. Quantitative results show a trade-off between text alignment (CLIP score) and visual correspondence (LPIPS), with VCF outperforming baselines in reference fidelity.
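
The abstract names InfoNCE as one of the aligner's training losses but gives no formula. As a rough illustration only, the sketch below computes a standard one-directional InfoNCE loss from a square image-to-text similarity matrix. The function name `info_nce`, the diagonal-positive layout, and the temperature value are assumptions made for this example, not details taken from the paper.

```python
import math

def info_nce(sim, temperature=0.07):
    """Mean InfoNCE loss for a square similarity matrix.

    sim[i][j] is the similarity between image i and text j. The matching
    (positive) pair for row i is assumed to sit on the diagonal; every
    other entry in the row acts as a negative.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the positive pair.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

uniform = info_nce([[0.0, 0.0], [0.0, 0.0]])   # no signal: loss equals log(2)
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]])   # positives dominate: loss near 0
```

With uninformative similarities the loss equals the log of the batch size, and it falls toward zero as the diagonal entries pull ahead of the off-diagonal ones.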

2605.25189 2026-05-26 cs.LG cs.CL

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

方向对齐缓解语言模型强化学习中的奖励黑客问题

Wenlong Deng, Jiaji Huang, Kaan Ozkara, Yushu Li, Christos Thrampoulidis, Xiaoxiao Li, Youngsuk Park

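AI summary: Analyzes the geometry of reinforcement learning updates and finds that reward hacking arises when optimization drifts away from a stable low-dimensional learning trajectory; proposes trusted-direction projection, which constrains gradients to a clean reference subspace, delaying shortcut exploitation while preserving task performance.

AI Chinese abstract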

当模型通过利用捷径而非解决预期任务来改进代理奖励时,就会出现奖励黑客问题。我们通过语言模型中强化学习更新的几何结构来研究这种失败模式,并认为当优化偏离稳定的低维学习轨迹时,黑客行为就会出现。我们通过参数更新的主导奇异方向分析了这种漂移,并表明奖励黑客运行比干净运行表现出显著更大的方向变化。基于这一观察,我们引入了可信方向投影,它约束梯度保持在干净参考子空间内。在数学推理的奖励黑客实验中,所提出的方法延迟了捷径利用并更好地保持了任务性能。

English abstract

Reward hacking arises when a model improves a proxy reward by exploiting shortcuts rather than solving the intended task. We study this failure mode through the geometry of reinforcement learning updates in language models and argue that hacking emerges when optimization drifts away from a stable low-dimensional learning trajectory. We analyze this drift through dominant singular directions of parameter updates and show that reward-hacking runs exhibit substantially larger directional change than clean runs. Motivated by this observation, we introduce trusted-direction projection, which constrains gradients to remain within a clean reference subspace. Across reward-hacking experiments on mathematical reasoning, the proposed approach delays shortcut exploitation and better preserves task performance.
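
Constraining a gradient to a reference subspace can be pictured as an orthogonal projection. The sketch below is a minimal illustration under that assumption: the reference directions are simply handed in as vectors, whereas the paper derives them from dominant singular directions of clean-run updates. The helper names `orthonormalize` and `project` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: turn reference directions into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already inside the span
            basis.append([wi / norm for wi in w])
    return basis

def project(gradient, basis):
    """Keep only the component of `gradient` that lies in span(basis)."""
    out = [0.0] * len(gradient)
    for b in basis:
        c = dot(gradient, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

# Assumed "clean" reference directions spanning the x-y plane.
basis = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
projected = project([1.0, 2.0, 3.0], basis)  # the z component is discarded
```

Whatever part of the gradient points outside the trusted subspace is dropped, which is one simple way to read "constrains gradients to remain within a clean reference subspace".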

2605.25188 2026-05-26 cs.AI

DarkForest: Less Talk, Higher Accuracy for Multi-Agent LLMs

DarkForest: 少说话,多智能体LLM更高精度

Yi Li, Songtao Wei, Dongming Jiang, Zhichun Guo, Qiannan Li, Bingzhe Li

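AI summary: Proposes the DarkForest framework, which keeps agents independent, parses responses into structured records, and coordinates over a belief distribution to reduce communication overhead and error propagation, achieving leading quality on six reasoning benchmarks with substantially lower token consumption.

AI Chinese abstract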

多智能体LLM系统通过组合多个智能体的输出来改进推理,但交互密集型方法可能导致错误传播和高通信开销。当智能体交换原始响应或推理轨迹时,不正确的中间推理可能被采纳和放大,导致自信但错误的共识;多轮通信也增加了令牌消耗、延迟和推理成本。在本文中,我们提出了一种名为DarkForest的受控通信协调框架。DarkForest首先保持智能体独立,因此每个智能体在不看到其他智能体输出的情况下产生答案。然后,它将原始响应解析为结构化候选记录,将语义等价的候选记录分组为聚类,并使用智能体可靠性、置信度、解析质量、支持模式可靠性和独立性校正来估计这些聚类上的校准信念分布。协调器仅从该信念状态接收策略允许的证据,并进行受控通信。在六个推理基准上的实验表明,DarkForest实现了领先的整体质量,在基准指标上比最强基线提高了30.7%,并且与通信密集型基线相比,令牌消耗减少了高达6.5倍。

English abstract

Multi-agent LLM systems improve reasoning by combining outputs from multiple agents, but interaction-heavy methods can introduce error propagation and high communication overhead. When agents exchange raw responses or reasoning traces, incorrect intermediate reasoning may be adopted and amplified, leading to confident but wrong consensus; multi-round communication also increases token consumption, latency, and inference cost. In this paper, we propose a controlled-communication coordination framework named DarkForest. DarkForest first keeps agents independent, so each agent produces an answer without seeing the others' outputs. It then parses the raw responses into structured candidate records, groups semantically equivalent candidates into clusters, and estimates a calibrated belief distribution over these clusters using agent reliability, confidence, parse quality, support-pattern reliability, and independence corrections. A coordinator receives only policy-permitted evidence from this belief state with controlled communication. Experiments on six reasoning benchmarks show that DarkForest achieves leading overall quality, improves on the strongest baseline by up to 30.7% on benchmark metrics, and reduces token consumption by up to 6.5× compared with communication-heavy baselines.

2605.25186 2026-05-26 cs.CL cs.AI

By Their Fruits You Will Know Them: Comparing Formalizations of Law by the Decisions They Encode

凭其果实,你们将认识它们:通过编码的决策比较法律的形式化

Julius Vernie, Matthias Grabmair

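AI summary: Proposes a systematic method for comparing formalizations of the same legal provision: a SAT solver enumerates the edge cases on which formalizations disagree, and these are turned into concrete factual scenarios. Applied to formalizations of ten EU provisions generated by nine frontier LLMs, it finds that behavioral divergence is essentially uncorrelated with structural agreement.

Comments 23 pages, 17 figures, submitted to EMNLP PROC 2026

AI Chinese abstract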

将法律条款形式化有望实现机器可访问的法律和自动化法律推理,而最近的LLM使得直接从法规文本生成这种形式化变得诱人。然而,任何形式化都会做出隐含的解释选择,其后果难以预料,尤其是当LLM是作者时。我们提出了一种方法,通过它们在个别案例上的推理,系统地比较同一法律条款的不同形式化。给定一个条款的多个形式化,我们在节点级别匹配它们,从匹配中为每对推导出一个共享接口,并使用SAT求解器枚举任意两个形式化存在分歧的边缘案例。然后将选定的边缘案例转化为具体的事实场景,供法律专家检查并采取行动。我们将该方法应用于九个前沿LLM生成的十个欧盟条款的形式化。我们发现,形式化之间的行为分歧与其结构一致性基本不相关,并且口头化的案例揭示了定性的不同分歧类型,包括反映法律评论中真实争议的分歧。

English abstract

Formalizing legal provisions promises machine-accessible law and automated legal reasoning, and recent LLMs make it tempting to generate such formalizations directly from statutory text. However, any formalization makes implicit interpretive choices whose consequences are hard to anticipate, especially if an LLM is the author. We present a method for systematically comparing different formalizations of the same legal provision by their inferences on individual cases. Given multiple formalizations of a provision, we match them at the node level, derive a shared interface for each pair from the matching, and use a SAT solver to enumerate the edge cases on which any two formalizations disagree. Selected edge cases are then verbalized into concrete factual scenarios that a legal expert can examine and act on. We apply our method to formalizations of ten EU provisions generated by nine frontier LLMs. We find that behavioral divergence between formalizations is essentially uncorrelated with their structural agreement and that the verbalized cases reveal qualitatively distinct types of disagreement, including divergences that mirror genuine controversies in the legal commentary.
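
The idea of enumerating edge cases on which two formalizations disagree can be shown on a toy scale. The sketch below swaps the SAT solver for brute-force enumeration over a shared set of Boolean facts, which is only feasible for a handful of variables. The two "readings" and their fact names are invented for illustration and do not come from the paper.

```python
from itertools import product

def disagreements(f, g, variables):
    """Yield every truth assignment on which formalizations f and g differ."""
    for values in product([False, True], repeat=len(variables)):
        case = dict(zip(variables, values))
        if f(case) != g(case):
            yield case

# Two hypothetical readings of the same toy provision.
def reading_a(c):
    return c["is_controller"] and (c["has_consent"] or c["has_contract"])

def reading_b(c):
    return c["is_controller"] and c["has_consent"]

shared_interface = ["is_controller", "has_consent", "has_contract"]
edge_cases = list(disagreements(reading_a, reading_b, shared_interface))
```

Each assignment in `edge_cases` is a candidate factual scenario that could be written out in prose for a legal expert. A real SAT solver would find the same assignments by solving the formula "f XOR g" without visiting all 2^n cases.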

2605.25181 2026-05-26 cs.AI

SpecAlign: A Semantic Alignment Framework for SystemVerilog Assertion Generation

SpecAlign: 一种用于 SystemVerilog 断言生成的语义对齐框架

Jaime Rafael Imperial, Hao Zheng

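AI summary: Proposes the SpecAlign framework, which uses entailment-based classification and a self-consistency voting mechanism to evaluate and improve the semantic alignment between LLM-generated SVAs and natural language specifications, without requiring golden RTL.

AI Chinese abstract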

现有的大语言模型(LLM)方法在生成 SystemVerilog 断言(SVA)时主要关注语法有效性和形式验证结果,而生成的断言与自然语言规范之间的语义对齐仍然难以量化。因此,在缺乏黄金 RTL 的情况下,幻觉或未对齐的 SVA 会降低信心并增加调试工作。本文提出了 SpecAlign,一个用于语义评估和优化 LLM 生成的 SVA 的框架。SpecAlign 引入了两个迭代对齐循环,通过基于蕴含的分类来评估自然语言属性和 SVA 是否符合设计规范。我们通过链式思维提示生成多个推理路径,并通过自一致性投票机制聚合它们,从而改进对齐决策。对未对齐的断言进行分析以生成可操作的反馈用于优化。我们进一步定义了一个定量对齐分数来衡量迭代过程中的语义一致性。实验结果表明,SpecAlign 能够有效检测语义不一致性,并在不依赖黄金 RTL 的情况下改进断言对齐,为传统形式验证评估指标提供了可扩展的补充。

English abstract

Existing Large Language Model (LLM) approaches to SystemVerilog Assertion (SVA) generation primarily focus on syntactic validity and formal verification outcomes, while semantic alignment between generated assertions and natural language specifications remains difficult to quantify. As a result, hallucinated or misaligned SVAs can reduce confidence and increase debugging efforts in the absence of golden RTL. This paper presents SpecAlign, a framework for semantic evaluation and refinement of LLM-generated SVAs. SpecAlign introduces two iterative alignment loops that assess both natural language properties and SVAs against the design specification using entailment-based classification. We improve alignment decisions by generating multiple reasoning paths using chain-of-thought prompting and aggregating them via a self-consistency voting mechanism. Misaligned assertions are analyzed to generate actionable feedback for refinement. We further define a quantitative alignment score to measure semantic consistency across iterations. Experimental results demonstrate that SpecAlign effectively detects semantic inconsistencies and improves assertion alignment without relying on golden RTL, providing a scalable complement to traditional formal verification evaluation metrics.
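
Self-consistency voting reduces to a majority vote over the labels produced by several sampled reasoning paths. The sketch below assumes each path has already been reduced to a single entailment label. The label strings and the function name `self_consistency_vote` are placeholders, not the paper's interface.

```python
from collections import Counter

def self_consistency_vote(labels):
    """Return (winning label, vote share) over sampled reasoning paths."""
    if not labels:
        raise ValueError("need at least one sampled label")
    counts = Counter(labels)
    # On a tie, most_common keeps the label that was seen first.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

# Hypothetical labels from five chain-of-thought samples for one assertion.
paths = ["entailed", "entailed", "contradicted", "entailed", "neutral"]
decision, share = self_consistency_vote(paths)
```

The vote share offers a crude confidence signal: a narrow majority might be a reason to sample more paths or to flag the assertion for review.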

2605.25179 2026-05-26 cs.CL

Locality Matters for Training-Free Audio Token Compression in Audio-Language Models

局部性对音频-语言模型中免训练音频令牌压缩的重要性

Jiale Luo, Xiaoyu Liang, Haoji Hu

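AI summary: Proposes Local Temporal Bipartite Merging (LTBM), which merges similar nearby audio tokens under an explicit temporal window constraint for training-free encoder-space compression, and shows that the benefit of a locality inductive bias in audio-token compression is task-dependent.

Comments Preprint. 8 pages main text, 10 pages total

AI Chinese abstract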

音频-语言模型(ALMs)越来越多地用于音频字幕生成、问答和开放式音频理解,但当音频输入表示为长前缀令牌序列时,其推理成本仍然很高。这些音频前缀消耗上下文预算,增加内存使用,并在资源受限或延迟敏感的环境中使部署更加困难。现有的免训练音频令牌缩减方法主要依赖于固定池化或基于分数的剪枝。固定池化是内容无关的,而基于分数的剪枝可以保留孤立的显著令牌但丢弃附近的声学上下文。我们提出局部时间二分图合并(LTBM),一种免训练的编码器空间压缩方法,在显式时间窗口约束下合并相似的邻近音频令牌。除了引入LTBM,我们还使用受控的全局合并变体来隔离时间局部性本身是否是音频令牌压缩的有用归纳偏置。在AudioCaps、Clotho和MMAU上使用Qwen2-Audio进行的实验显示了任务依赖的局部性效应:在几种压缩设置下,尤其是更强压缩下,局部感知合并更有利于字幕生成,而全局匹配在多项选择音频理解中更具竞争力。在Audio Flamingo 3上的跨骨干验证进一步支持了在适度和激进压缩下局部感知合并的字幕生成优势。

English abstract

Audio-language models (ALMs) are increasingly used for audio captioning, question answering, and open-ended audio understanding, but their inference cost remains high when audio inputs are represented as long prefix-token sequences. These audio prefixes consume context budget, increase memory usage, and make deployment harder in resource-constrained or latency-sensitive settings. Existing training-free audio-token reduction methods mainly rely on fixed pooling or score-based pruning. Fixed pooling is content-agnostic, while score-based pruning can preserve isolated salient tokens but discard nearby acoustic context. We propose Local Temporal Bipartite Merging (LTBM), a training-free encoder-space compression method that merges similar nearby audio tokens under an explicit temporal window constraint. Beyond introducing LTBM, we use a controlled Global Merge variant to isolate whether temporal locality itself is a useful inductive bias for audio-token compression. Experiments on AudioCaps, Clotho, and MMAU with Qwen2-Audio show evidence of a task-dependent locality effect: locality-aware merging is more favorable for captioning at several compression settings, especially under stronger compression, while global matching is more competitive for multiple-choice audio understanding. A cross-backbone validation on Audio Flamingo 3 further supports the captioning-side advantage of locality-aware merging under moderate and aggressive compression.
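
The abstract does not spell out the merging procedure, so the following is only a guess at its general shape. It assumes a bipartite split into even- and odd-indexed tokens, cosine similarity, and mean pooling of merged tokens; all three are assumptions of this sketch, as are the names `local_merge`, `r`, and `window`.

```python
def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0

def local_merge(tokens, r, window):
    """Merge up to r even-indexed tokens into their most similar odd-indexed
    neighbor at most `window` positions away; return the shorter sequence."""
    candidates = []
    for i in range(0, len(tokens), 2):
        best = None
        for j in range(1, len(tokens), 2):
            if abs(i - j) <= window:  # the temporal locality constraint
                s = cosine(tokens[i], tokens[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best is not None:
            candidates.append(best)
    candidates.sort(reverse=True)  # most similar pairs first
    groups = {j: [tokens[j]] for j in range(1, len(tokens), 2)}
    merged = set()
    for _, i, j in candidates[:r]:
        groups[j].append(tokens[i])
        merged.add(i)
    out = []
    for k, tok in enumerate(tokens):
        if k in merged:
            continue
        members = groups.get(k, [tok])
        out.append([sum(m[d] for m in members) / len(members)
                    for d in range(len(tok))])
    return out

frames = [[1, 0], [1, 0], [0, 1], [0, 1]]
compressed = local_merge(frames, r=1, window=1)
```

Setting `window` to the sequence length removes the locality constraint, which loosely corresponds to the Global Merge control described in the abstract.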

2605.25175 2026-05-26 cs.CV

Discrepancy Minimization Improves Cross-Hospital Robustness in Digital Pathology

差异最小化提升数字病理学中的跨医院鲁棒性

Ben Vardi, Dana Schonberger, Yuval Friedmann, Zohar Yakhini, Iris Barshack, Alexander Loebel, Ariel Shamir

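AI summary: Fine-tunes pathology foundation models with a local maximum mean discrepancy (LMMD) objective, improving cross-hospital robustness in both domain adaptation and domain generalization settings.

AI Chinese abstract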

病理基础模型(PFMs)近年来快速发展,支持为多种组织病理学任务训练分类器。然而,它们在医院间的鲁棒性仍然有限:当在一个医院的数据上训练分类器并在另一个目标医院评估时,性能通常会下降。我们通过使用局部最大均值差异(LMMD)目标微调PFMs来解决这一挑战,该目标适用于两种设置:域适应(有未标记的目标医院数据可用)和域泛化(目标医院数据完全不可用)。在补丁和切片级别的实验表明,在多个PFMs和任务上均有一致的改进。

English abstract

Pathology foundation models (PFMs) have advanced rapidly in recent years and support training classifiers for a range of histopathology tasks. However, their robustness across hospitals remains limited: performance often degrades when training a classifier on data from one hospital and evaluating it on another target hospital. We address this challenge by fine-tuning PFMs with a local maximum mean discrepancy (LMMD) objective that applies to two settings: domain adaptation, where unlabeled target-hospital data is available, and domain generalization, where target-hospital data is unavailable at all. Experiments at both the patch- and slide-level show consistent improvements across multiple PFMs and tasks.
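
To give a feel for the discrepancy being minimized, the sketch below estimates the plain (global) squared maximum mean discrepancy between two sets of feature vectors with an RBF kernel. This is a simplification: the local variant named in the abstract is usually understood to compare class-conditional distributions, which is not modeled here. The kernel choice and the `gamma` value are assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_kernel(xs, ys, gamma):
    return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two feature sets."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))

# Toy embeddings standing in for features from two hospitals.
hospital_a = [[0.0, 0.0], [1.0, 0.0]]
hospital_b = [[5.0, 5.0], [6.0, 5.0]]
same_site = mmd2(hospital_a, hospital_a)
cross_site = mmd2(hospital_a, hospital_b)
```

Used as an extra loss term, a quantity like `cross_site` would push the encoder toward features whose distributions look alike across sites.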

2605.25170 2026-05-26 cs.LG cs.AI cs.ET cs.RO

Grow-Prune-Freeze Networks: Adaptive & Continual Learning Technique for Olfactory Navigation

生长-剪枝-冻结网络:用于嗅觉导航的自适应与持续学习技术

Kordel K. France, Ovidiu Daescu

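AI summary: Proposes the Grow-Prune-Freeze (GPF) network framework, which learns continually by dynamically adjusting the layers of the policy network and reaches a 94% success rate on turbulent plume navigation, with experiments suggesting it may generalize to other machine learning tasks.

AI Chinese abstract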

嗅觉训练数据分散在非标准化的数据集中,限制了构建代表性世界模型的能力。嗅觉导航是一项高度动态和非平稳的任务,受益于实时持续学习。我们引入了一种名为生长-剪枝-冻结(GPF)网络的自适应框架,使智能体能够通过生长、剪枝和冻结其策略的早期层来持续学习,以应对世界复杂性。将GPF基于非线性随机矩阵理论,我们展示了Pennington & Worth(2017)的工作可以从单隐藏层扩展到n层持续学习模型,并且网络权重的特征值组成在添加连续层时得以保持。我们展示了基于期望SARSA的GPF在湍流羽流导航上实现了94%的成功率——这是一个部分可观测、非平稳的任务,代表了激发机器人自适应学习的“大世界”挑战——并提供了将GPF应用于其他世界模型的支撑方法。进一步的实验表明,GPF可能很好地推广到其他机器学习任务,如Atari中的强化学习、图像分类和自回归语言模型。我们开源所有代码和数据,以鼓励对嗅觉机器人技术的改进和更多研究。

English abstract

Training data for olfaction is scattered through disparate, non-standardized datasets that limit the ability to build representative world models. Olfactory navigation is a highly dynamic and non-stationary task that benefits from real-time continual learning. We introduce an adaptive framework called Grow-Prune-Freeze (GPF) networks that enable an agent to continually learn through growing, pruning, and freezing early layers of its policy in response to world complexity. Grounding GPFs in non-linear random matrix theory, we show that the work of Pennington & Worah (2017) can be extended from single hidden layers to n-layer continual-learning models, and that eigenvalue composition of network weights is preserved as successive layers are added. We show that GPFs based on Expected SARSA achieve a 94% success rate on turbulent plume navigation, a partially observable, non-stationary task representative of the "big world" challenges that motivate adaptive learning in robotics, and provide supporting methodology for applying GPFs in other world models. Further experiments provide evidence that GPFs may generalize well to other machine learning tasks such as reinforcement learning in Atari, image classification, and autoregressive language models. We open-source all code and data to encourage improvements on and more research in olfactory robotics.

2605.25169 2026-05-26 cs.LG stat.ME stat.ML

Learning Treatment Effects during Resource Allocation via Priority-Queue Randomization

资源分配中通过优先级队列随机化学习处理效应

JungHo Lee, Johnna Sundberg, Pim Welle, Bryan Wilder

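AI summary: Proposes a priority-queue randomization framework for experimental design that identifies causal effects while still prioritizing higher-need individuals, and optimizes queue assignment to balance statistical efficiency against prioritization.

AI Chinese abstract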

公共服务项目通常在对其效益不确定的情况下分配有限资源,因此需要随机化来支持可信评估。然而在实践中,申请人通常进入等待名单,资源通过分层优先级队列优先分配给被认为需求更高的个体,这使得直接随机化变得困难。受此启发,我们开发了一个实验设计框架,用于在学习处理效应的同时优先治疗最需要帮助的个体,其中新申请人根据其评估的风险评分被随机分配到优先级队列。然后,在预算允许的情况下,按优先级顺序跨队列提供治疗,并在队列内按先到先得原则提供。我们的贡献有两方面。首先,我们描述了在这种优先级队列分配下哪些因果效应被识别。当到达是外生时,处理是条件随机化的,因此标准估计量被识别;当到达是内生时,队列随机化反而为处理提供了工具变量,识别出由排队过程引起的局部处理效应。其次,我们开发了优化的队列分配设计,以在统计效率与优先考虑高需求申请人之间进行权衡。在此过程中,我们表明,尽管设计导致的处理分配存在依赖性,但通常的独立同分布效率界限仍然是合理的设计目标。我们使用美国一个大县的住房分配项目的数据来说明所提出的设计。

English abstract

Public service programs often allocate limited resources under uncertainty about their benefits, creating a need for randomization to support credible evaluation. In practice, however, applicants commonly enter waitlists where resources are prioritized toward individuals judged to have higher need through tiered priority queues, making direct randomization difficult. Motivated by this, we develop an experimental design framework for learning treatment effects while treating those most in need where incoming applicants are randomized into priority queues based on their assessed risk scores. Treatments are then provided across queues in priority order and first-in-first-out within queue as budget becomes available. Our contributions are two-fold. First, we characterize what causal effects are identified under this priority-queue allocation. When arrivals are exogenous, treatments are conditionally randomized, and hence standard estimands are identified; when arrivals are endogenous, queue randomization instead provides an instrument for treatment, identifying local treatment effects induced by the queuing process. Second, we develop optimized queue-assignment designs that trade off statistical efficiency against prioritizing higher-need applicants. We show in the process that, despite dependence in treatment assignments induced by the design, usual iid efficiency bounds remain well-justified design objectives. We illustrate the proposed designs using data from a housing allocation program in a large U.S. county.

2605.25166 2026-05-26 cs.LG cs.AI

AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

AME-TS:基于锚定的混合专家模型用于时间序列预测

Rui Wang, Renhao Xue, Ray Razi, Huan Song, Hannah R. Marlowe

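AI summary: Proposes AME-TS, a structure-guided sparse time series foundation model in which a lightweight predictor estimates series-level descriptors and produces a soft structural prior over experts, aligning expert routing with interpretable temporal structure; it achieves a strong accuracy-efficiency tradeoff on the GIFT-Eval benchmark and more stable expert specialization when fine-tuned on M5.

AI Chinese abstract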

时间序列预测模型通过大型Transformer骨干不断扩展规模,但大多数现有方法通过共享密集计算路径处理所有序列,尽管时间结构存在显著异质性。混合专家模型(MoE)通过条件计算提供了一种自然替代方案,但标准MoE路由导致专家专业化识别弱且在下游适应中常不稳定。我们提出AME-TS,一种结构引导的稀疏时间序列基础模型,将专家路由与可解释的时间结构对齐。AME-TS首先使用轻量级预测器估计序列级描述符,包括可预测性、季节性、趋势和稀疏性,并将其映射为专家上的软结构先验。该序列级先验在训练期间指导令牌级路由,鼓励结构对齐的专业化。在GIFT-Eval基准上,AME-TS在不同模型规模下提供了强大的精度-效率权衡:在小型模型规模上显著优于现有时间序列基础模型,在较大规模上与最强模型保持竞争力,同时通过稀疏路由激活显著更少的参数。我们进一步表明,在M5数据集微调期间,AME-TS学习了比标准MoE更可解释的路由几何和更稳定的专家专业化。这些结果表明,结构感知路由是实现稀疏专家模型在时间序列预测中优势的有效且可靠方式。

English abstract

Time series forecasting models are increasingly scaled through large Transformer backbones, yet most existing approaches process all series through a shared dense computation path despite substantial heterogeneity in temporal structure. Mixture-of-Experts (MoE) offers a natural alternative by enabling conditional computation, but standard MoE routing leaves expert specialization weakly identified and often unstable during downstream adaptation. We propose AME-TS, a structure-guided sparse time series foundation model that aligns expert routing with interpretable temporal structure. AME-TS first uses a lightweight regime predictor to estimate series-level descriptors, including forecastability, seasonality, trend, and sparsity, and maps them to a soft structural prior over experts. This series-level prior guides token-level routing during training, encouraging structure-aligned specialization. On the GIFT-Eval benchmark, AME-TS delivers a strong accuracy-efficiency tradeoff across model scales: it substantially outperforms existing time series foundation models at small model scales and remains competitive with the strongest models at larger scales, while activating substantially fewer parameters through sparse routing. We further show that AME-TS learns more interpretable routing geometry and substantially more stable expert specialization than standard MoE during fine-tuning on the M5 dataset. These results suggest that structure-aware routing is an effective and reliable way to realize the benefits of sparse expert models for time series forecasting.

2605.25163 2026-05-26 cs.CV cs.AI

K-U-KAN: Koopman-Enhanced U-KAN for 3D Dental Reconstruction from a Single Panoramic X-ray Radiograph

K-U-KAN: 基于Koopman增强的U-KAN用于单张全景X射线片的三维牙齿重建

Bikram Keshari Parida, Abhijit Sen, Wonsang You

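AI summary: Proposes K-U-KAN, a three-stage pipeline combining Kolmogorov-Arnold Networks, a Koopman operator block, and U-KAN to reconstruct 3D dental structure efficiently from a single panoramic X-ray, improving perceptual quality and shortening training time.

Comments 24 pages, 9 figures

AI Chinese abstract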

全景X射线将三维颌骨压缩为二维条带;我们的目标是干净且快速地恢复缺失的深度。现有的隐式神经表示能渲染逼真的体积,但训练缓慢,对采样和位置编码敏感,且实际成本高。纯CNN基线效率高,但难以处理牙弓的长程几何,模糊了精细的釉质-牙本质边界,且可解释性差。我们提出K-U-KAN,一个三阶段流水线:(i) 使用Kolmogorov-Arnold网络将二维特征提升为深度感知的可观测变量,(ii) 通过Koopman令牌块以稳定的、相位感知的线性演化推进这些可观测变量,(iii) 将预测的深度区间放置在焦槽射线上,然后由轻量级3D注意力U-KAN细化体积。这种物理(Beer-Lambert图像形成)、几何(马蹄形焦槽)和学习线性动力学的结合,在批量大小为1的原生射线强度上产生了清晰的解剖结构、更少的伪影和鲁棒的行为。在保留数据上,K-U-KAN在信号和结构指标上与Transformer/隐式基线相当,显著提高了感知质量,并且训练时间大约减半——使单视图全景X射线到锥形束CT重建在临床流程中更加实用。

English abstract

A panoramic X-ray compresses a 3D jaw into a 2D strip; we aim to recover the missing depth cleanly and fast. Existing implicit neural representations render realistic volumes but are slow to train, sensitive to sampling and positional encodings, and costly in practice. Pure CNN baselines are efficient yet struggle with the dental arch's long-range geometry, blur fine enamel-dentin boundaries, and offer little interpretability. We present K-U-KAN, a three-stage pipeline that (i) lifts 2D features into depth-aware observables with Kolmogorov-Arnold Networks, (ii) advances these observables by a stable, phase-aware linear evolution via a Koopman token block, and (iii) places the predicted depth bins onto focal-trough rays before a lightweight 3D attention U-KAN refines the volume. This marriage of physics (Beer-Lambert image formation), geometry (horseshoe focal trough), and learned linear dynamics yields sharp anatomy, fewer artifacts, and robust behavior on native radiographic intensities with batch size one. On held-out data, K-U-KAN matches transformer/implicit baselines on signal and structure metrics, clearly improves perceptual quality, and trains in roughly half the time, making single-view PX-to-CBCT reconstruction more practical for clinical pipelines.

2605.25162 2026-05-26 cs.CL cs.AI

STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media

STREAM:一个以数据为中心的框架,用于从流媒体中挖掘高价值任务导向对话

Liang Xue, Haoyu Liu, Cheng Wang, Pengyu Chen, Haozhuo Zheng, Yang Liu

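AI summary: Proposes the STREAM framework, which uses streaming-media data to synthesize StreamDial, a large-scale multi-domain task-oriented dialogue dataset; it generates high-quality dialogues through persona construction and conversational blueprints combined with RAG, addressing data scarcity.

AI Chinese abstract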

垂直领域的大语言模型受到复杂、特定领域任务导向对话稀缺的瓶颈。现有的数据获取管道面临持续的三难困境:专家标注昂贵,真实服务对话受隐私和商业限制,静态语料库很快过时。我们提出Stream,一个以数据为中心的框架,利用公开可用的流媒体(直播和短视频)大规模合成高价值服务对话。Stream从嘈杂的流中挖掘真实的交互信号,并通过将基于角色的个性构建与对话蓝图构建相结合来合成对话;它进一步采用检索增强生成(RAG)来支持知识感知的响应。基于Stream,我们发布了StreamDial,一个覆盖汽车、餐厅和酒店的大规模多领域数据集。StreamDial总共包含87,498个对话会话和1,497,320轮次,平均每个会话17.11轮,各领域规模相当。每个会话被组织为结构化四元组⟨P_u, P_a, B, H⟩,将对话历史与明确的用户/代理角色和对话蓝图配对,捕捉真实服务行为,如需求挖掘、约束冲突、协商和恢复。使用自动评估和下游任务的评估表明,StreamDial在强基线上提高了内在对话质量,使用StreamDial训练的模型在多个骨干网络上改进了对话状态跟踪;我们进一步报告了完整的人工评估集,并在受控训练预算下在Qwen3-8B上实现了令人鼓舞的多语言迁移。数据发布在https://github.com/hitxueliang/DialogDataSetBySTREAM。

English abstract

Large language models for vertical domains are bottlenecked by the scarcity of complex, domain-specific task-oriented dialogues. Existing data acquisition pipelines face a persistent trilemma: expert annotation is expensive, real-world service conversations are constrained by privacy and commercial restrictions, and static corpora quickly become temporally stale. We propose Stream, a data-centric framework that leverages publicly available streaming media (live streams and short videos) to synthesize high-value service dialogues at scale. Stream mines authentic interaction signals from noisy streams and synthesizes conversations by integrating role-grounded persona construction with Conversational Blueprint construction; it further adopts retrieval-augmented generation (RAG) to support knowledge-aware responses. Based on Stream, we release StreamDial, a large-scale multi-domain dataset covering Automotive, Restaurant, and Hotel. StreamDial contains 87,498 dialogue sessions and 1,497,320 turns in total, with an average of 17.11 turns per session and a comparable scale across domains. Each session is organized as a structured quadruplet $\langle P_u, P_a, B, H \rangle$ that pairs dialogue history with explicit user/agent personas and a Conversational Blueprint, capturing realistic service behaviors such as requirement mining, constraint conflicts, negotiation, and recovery. Evaluations with automatic judges and downstream tasks show that StreamDial improves intrinsic dialogue quality over strong baselines, and models trained with StreamDial improve Dialogue State Tracking across backbones; we further report a completed human-evaluation set and encouraging multilingual transfer on Qwen3-8B under a controlled training budget. The data is released in https://github.com/hitxueliang/DialogDataSetBySTREAM.

2605.25160 2026-05-26 cs.AI

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

SimuWoB: 模拟真实世界移动应用以实现快速且保真的GUI智能体基准测试

Guohong Liu, Jialei Ye, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li

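AI summary: To close the gap between existing mobile GUI agent benchmarks and real-world applications, proposes SimuWoB, a fully synthetic benchmark whose robust virtual environment generation framework synthesizes high-fidelity tasks and environments with automatically provided valid rewards, enabling efficient and reproducible evaluation of complex long-horizon interactions.

AI Chinese abstract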

由大型语言模型驱动的移动GUI智能体发展迅速,迫切需要真实且全面的评估。现有基准测试优先考虑可重复性,但通常局限于开源应用或文件操作任务,因为在实际应用中构建奖励困难,导致基准设置与现实使用之间存在差距。此外,大多数基准测试侧重于基本定位和导航,对复杂长程交互的覆盖有限。为解决这些局限性,我们引入了SimuWoB,一个全合成的移动GUI智能体基准测试,包含120个涵盖不同类型和难度级别的挑战性任务。我们构建了一个鲁棒的虚拟环境生成框架,合成高保真任务和环境,并为每个任务自动提供有效奖励。每个环境都部署为可通过URL访问的无后端网页,实现高效且可重复的评估。我们对几个最先进的移动GUI智能体进行了全面实验。平均成功率仅为27.92%,在长程任务上降至17.82%,揭示了当前智能体在复杂场景下的显著弱点。与真实世界样本任务的评估结果比较表明,基于我们合成环境的智能体评估具有良好的泛化性。我们进一步提供了关键能力维度的诊断见解,并讨论了对未来移动GUI智能体开发的启示。

English abstract

Mobile GUI agents powered by large language models have progressed rapidly, creating an urgent need for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are often limited to open-source apps or file-operation tasks because of the difficulty of constructing rewards on real applications, leaving a gap between benchmark settings and real-world usage. Moreover, most benchmarks focus on basic grounding and navigation, with limited coverage of complex, long-horizon interactions. To address these limitations, we introduce SimuWoB, a fully synthetic benchmark for mobile GUI agents with 120 challenging tasks spanning diverse types and difficulty levels. We build a robust virtual environment generation framework that synthesizes high-fidelity tasks and environments, and automatically provides valid rewards for each task. Each environment is deployed as a backend-free webpage accessible via URL, enabling efficient and reproducible evaluation. We conduct comprehensive experiments on several state-of-the-art mobile GUI agents. The average success rate is only 27.92%, dropping to 17.82% on long-horizon tasks, which reveals substantial weaknesses in current agents under complex scenarios. A comparison with evaluation results on sampled real-world tasks demonstrates that agent assessments based on our synthetic environments generalize well. We further provide diagnostic insights across key capability dimensions and discuss implications for future mobile GUI agent development.

2605.25156 2026-05-26 cs.LG cs.AI

Abduction-Deduction Entanglement: Domain Generalization via Representation Transplants

溯因-演绎纠缠:通过表示移植实现领域泛化

Kasra Jalaldoust, Elias Bareinboim

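AI summary: Proposes a representation-transplant method that parameterizes the non-identifiability in abduction-deduction entanglement and searches the space of target distributions under constraints from the source distribution, yielding optimal target prediction for domain generalization.

AI Chinese abstract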

在源分布下训练的预测模型通常无法很好地泛化到不同的目标分布。对未见数据分布的有效推断必须依赖于生成源数据和目标数据的某些因果机制的不变性,然而这些结构不变性仅从源数据中是无法识别的。在关于数据的温和因果假设下,我们表明目标中的最优预测实际上部分可由源分布识别。该结果基于一个简单的观察:在任何领域中,最优预测可以分解为我们称之为溯因映射和演绎映射的一对映射,其中溯因映射从观测变量推断某些未观测变量(可能是混杂因素),演绎映射使用观测和推断的量来预测标签。大量源数据的使用固定了最优预测,从而约束了产生它的有效溯因-演绎组合——这种非可识别性我们称之为溯因-演绎纠缠。为了利用这一点,我们使用所谓的表示移植来参数化受约束的族,表示移植是表示空间中的一种特定线性变换,它在保留演绎成分的同时操纵表示的溯因内容。生成标签的因果机制的不变性意味着源和目标之间存在不变的演绎映射。因此,我们可以通过参数化移植来搜索合理的目标分布空间。我们在一个学习器-对手博弈中使用该方案,在理想优化下,该博弈可证明终止于学习器具有极小极大最优目标预测。评估验证了理论,表明该方法在领域泛化基准测试中具有竞争力。

English abstract

Prediction models trained under the source distribution do not generalize well to a different target distribution. A valid inference about an unseen data distribution must be anchored by the invariance of certain causal mechanisms that generate the source and target data; however, these structural invariances are non-identifiable from the source data alone. Under mild causal assumptions about the data, we show that the optimal prediction in the target is in fact partially identifiable by the source distribution. The result rests on a simple observation: in any domain, the optimal prediction can be factorized into what we call a pair of abduction and deduction maps, where the abduction map makes inference about some unobserved variables (possibly confounders) from the observed variables and the deduction map predicts the label using both the observed and inferred quantities. Access to large source data pins down the optimal prediction and thus constrains the valid abduction-deduction ensembles that produce it, a non-identifiability that we call the abduction-deduction entanglement. To leverage this, we parameterize the constrained family using what we call a representation transplant, which is a specific linear transformation in the representation space that manipulates the abduction content of the representation while retaining the deduction component. Invariance of the causal mechanism generating the label implies the existence of an invariant deduction map between source and target. Thus, we can search the space of plausible target distributions via a parametric transplant. We use this scheme in a learner-adversary game that, under an idealized optimization, provably terminates with the learner having the minimax-optimal target prediction. Evaluations verify the theory, showing that the method is competitive in domain generalization (DG) benchmarks.

2605.25151 2026-05-26 cs.AI cs.CE

Representation Without Control: Testing the Realization Effect in Language Models

无控制的表征:测试语言模型中的实现效应

Ciarán Walsh, Emilio Barkett

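AI summary: Tests whether language models exhibit a human-like realization effect at three levels (prompted behavior, linear readout, and causal control); latent readout succeeds but causal control does not, showing that the three do not automatically co-occur.

AI Chinese abstract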

大型语言模型越来越多地被用作行为模拟器,但其输出何时反映类似人类的认知机制而非提示敏感的表面模式仍不清楚。我们通过实现效应研究这一问题,这是行为经济学中一个特征明确的发现,即风险承担在纸面收益与实现收益及损失后存在系统性差异。我们在三个层面评估LLM行为:仅提示的行为敏感性、内部表征的线性读出以及通过激活引导的因果控制。仅提示结果显示系统的条件敏感性,但方向模式未复现人类实现效应的预测。Gemma的残差流在第18层包含一个线性可解码的实现状态信号,该信号可泛化到未见过的提示。然而,沿此方向引导并未可靠地改变下游风险选择,这一零结果在正尺度和负符号对称运行中均成立。行为敏感性、潜在读出和因果控制是三个不同的属性,它们不会自动共存,成功的潜在读出不足以证明模型在下游决策中行为上依赖于该表征。

English abstract

Large language models are increasingly used as behavioral simulators, but it remains unclear when their outputs reflect human-like cognitive mechanisms rather than prompt-sensitive surface patterns. We study this question through the realization effect, a well-characterized finding in behavioral economics in which risk-taking differs systematically after paper versus realized gains and losses. We evaluate LLM behavior at three levels: prompt-only behavioral sensitivity, linear readout of internal representations, and causal control via activation steering. Prompt-only results show systematic condition sensitivity, but the directional pattern does not reproduce human realization-effect predictions. Gemma's residual stream contains a linearly decodable realization-status signal at layer 18 that generalizes to held-out prompts. Steering along this direction does not, however, reliably shift downstream risk choices, a null result that holds across positive scales and in a negative sign-symmetry run. Behavioral sensitivity, latent readout, and causal control are three distinct properties that do not automatically co-occur, and successful latent readout is insufficient evidence that a model behaviorally relies on a representation during downstream decision-making.

2605.25141 2026-05-26 cs.CL cs.AI

LLM Agent-Based Renewable Energy Forecasting Using Edge and IoT Data: A Review of Solar, Wind, Weather, and Grid-Aware Decision Support

基于LLM Agent的利用边缘和物联网数据的可再生能源预测:太阳能、风能、天气和电网感知决策支持综述

Pavan Manjunath, Thomas Pruefer

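AI summary: Reviews how large language model agents can integrate heterogeneous sensor streams, weather API data, historical generation records, and grid constraints into unified decision-support workflows to enhance renewable energy forecasting.

AI Chinese abstract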

可再生能源发电的可靠预测是电网稳定性、能源交易、电池调度和碳感知运营规划的基础要求。太阳能和风能资源本质上是间歇性的,其输出随云量、风速、大气湍流、季节模式和局部地形而波动。物联网和边缘设备的普及,包括智能电表、逆变器、风速计、日射强度计、气象站和电网接口传感器,创造了前所未有的实时运行数据量,而传统的预测流程难以充分利用这些数据。本综述研究了大语言模型代理如何通过将异构传感器流、天气API数据、历史发电记录、电网约束和上下文推理整合到统一的决策支持工作流中,来增强可再生能源预测。我们调查了经典预测方法(统计时间序列模型、深度学习架构、物理混合方法)以及新兴的用于解释、不确定性沟通和操作员指导的LLM代理框架。提出了一个六层分类法,涵盖数据采集、预处理、特征工程、模型推理、不确定性估计和自然语言报告。综述识别了十二个开放挑战,包括实时部署、分布偏移下的模型漂移、不确定性量化、LLM代理中的幻觉控制、边缘硬件的互操作性以及与能源管理系统的集成。论文最后建议了一个研究议程,重点关注开放基准、物理信息LLM基础以及联邦预测架构。

英文摘要

Reliable forecasting of renewable energy generation is a foundational requirement for grid stability, energy trading, battery scheduling, and carbon-aware operational planning. Solar and wind resources are inherently intermittent: their output fluctuates with cloud cover, wind speed, atmospheric turbulence, seasonal patterns, and local terrain. The proliferation of IoT and edge devices, spanning smart meters, inverters, anemometers, pyranometers, weather stations, and grid-interface sensors, has created an unprecedented volume of real-time operational data that conventional forecasting pipelines are ill-equipped to exploit fully. This review investigates how large language model (LLM) agents can enhance renewable energy forecasting by integrating heterogeneous sensor streams, weather API data, historical generation records, grid constraints, and contextual reasoning into unified decision-support workflows. We survey classical forecasting methods (statistical time series models, deep learning architectures, physics-hybrid approaches) and emerging LLM agent frameworks for explanation, uncertainty communication, and operator guidance. A six-layer taxonomy is proposed, covering data acquisition, preprocessing, feature engineering, model inference, uncertainty estimation, and natural language reporting. The review identifies twelve open challenges spanning real-time deployment, model drift under distribution shift, uncertainty quantification, hallucination control in LLM agents, interoperability of edge hardware, and integration with energy management systems. The paper concludes by recommending a research agenda centred on open benchmarks, physics-informed LLM grounding, and federated forecasting architectures.

2605.25135 2026-05-26 cs.LG cs.AI

ASTRO: Adaptive Spatio-Temporal Reinforcement Optimization for GNN Powered Anomaly Detection in Cyber Physical Systems

ASTRO: 用于信息物理系统中基于GNN的异常检测的自适应时空强化优化

Rai Ali Yar, Umaisa Lail, Anwar Shah

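AI Summary Proposes the ASTRO framework, which combines a Deep Q-Network with graph neural networks, temporal modelling, and multi-head attention and uses reinforcement learning to optimize the detection threshold dynamically; it reaches high F1 scores on the SWaT and WADI datasets and outperforms existing methods.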

详情
AI中文摘要

工业物联网环境中的异常检测对于保护工业控制系统和信息物理系统免受运行时虚假数据注入和其他恶意攻击至关重要。传感器网络和互连控制回路日益复杂,使得识别隐藏在高维和时间依赖信号中的异常行为变得困难。为解决这些挑战,本文介绍了自适应时空强化优化ASTRO,一种新颖的异常检测框架,开创性地使用强化学习进行动态阈值优化。通过将深度Q网络与图神经网络、时间建模和多头注意力机制相结合,ASTRO不断调整其决策边界以提高检测精度。GNN组件建模传感器之间的空间关系,时间模型捕获时间序列依赖性,注意力层突出显示最具信息量的时间步。模型生成连续异常分数,通过自适应阈值转换为二元决策,该阈值通过深度Q网络优化。ASTRO方法在两个真实工业基准测试:安全水处理和水分配数据集上进行了评估。所提模型在SWaT上取得了卓越性能,F1分数为0.990。此外,在高度复杂的127个终端设备的WADI数据集上,它获得了0.788的F1分数,比最先进的基线高出近14%。多次运行的结果证实了其一致的泛化能力和稳定性。这些实验表明,ASTRO框架是增强大规模信息物理基础设施的高度实用和可扩展的方法。

英文摘要

Anomaly detection in Industrial Internet of Things (IIoT) environments is essential to protect Industrial Control Systems (ICS) and Cyber-Physical Systems (CPS) from run-time false data injection and other malicious attacks. The increasing complexity of sensor networks and interconnected control loops makes it difficult to identify anomalous behavior hidden within high-dimensional and time-dependent signals. To address these challenges, this article introduces Adaptive Spatio-Temporal Reinforcement Optimization (ASTRO), a novel anomaly detection framework that pioneers the use of reinforcement learning for dynamic threshold optimization. By integrating a Deep Q-Network (DQN) with Graph Neural Networks (GNNs), temporal modelling, and a Multi-Head Attention mechanism, ASTRO continuously adapts its decision boundaries to improve detection accuracy. The GNN component models the spatial relations among sensors, the temporal model captures time series dependencies, and the attention layer highlights the most informative time steps. The model generates continuous anomaly scores, which are transformed into binary decisions using an adaptive threshold optimized via the DQN. The ASTRO approach is evaluated on two real-world industrial benchmarks: the Secure Water Treatment (SWaT) and Water Distribution (WADI) datasets. The proposed model achieves exceptional performance on SWaT, with an F1 score of 0.990. Moreover, on the highly complex WADI dataset, which has 127 end devices, it achieves an F1 score of 0.788, outperforming state-of-the-art baselines by nearly 14%. Results across multiple runs confirm consistent generalization and stability. These experiments demonstrate that the ASTRO framework is a highly practical and scalable method for strengthening large-scale cyber-physical infrastructures.

2605.25133 2026-05-26 cs.AI cs.CL

Trust but Verify: Prover-Verifier Deliberation for Selective LLM Prediction

信任但验证:面向选择性LLM预测的证明者-验证者审议

João Sedoc, Baotong Zhang, Dean Foster

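AI Summary Proposes a prover-verifier deliberation protocol grounded in interactive proof theory that enables selective prediction through structured confidence verdicts, yielding a high-confidence precision gap of about 30 percentage points on GPQA Diamond.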

详情
AI中文摘要

可靠地知道语言模型何时正确几乎与正确本身同样重要。我们引入证明者-验证者审议(PVD),这是一种基于交互式证明理论的推理时协议,作为选择性预测的机制:该协议同时产生答案和结构化置信度判定,允许系统报告高置信度答案,同时在不明确的情况下弃权。在每个对话中,证明者通过可检查的子主张捍卫候选答案,而验证者发出有针对性的挑战并返回Accept、Challenge或Reject。由于冻结的语言模型是在噪声信道上运行的不完美的证明者和验证者,形式上的可靠性和完备性保证并不适用;相反,我们通过其覆盖-精确率行为来经验性地描述该协议。我们的主要实验使用Claude Sonnet 4.6作为证明者,Claude Haiku 4.5作为验证者,在GPQA Diamond上进行。没有答案修订即被接受的问题,我们称为Accept + No Change (ANC),作为高置信度子集报告;我们通过其精确率和覆盖来评估该子集。ANC将可靠答案与不可靠答案分开,与非ANC补集相比产生约30个百分点的HC-Prec差距。使用GPT和Gemini配对的鲁棒性实验表明,高HC-Prec可以跨模型系列转移,而验证者的严格性和领域能力在很大程度上决定了选择差距的大小。在Humanity's Last Exam上,较弱的证明者-验证者配对可能使ANC信号崩溃或反转,这说明了当验证者在其有效区域外操作时的实际失败模式。与自一致性、通用自一致性、多智能体辩论和Reflexion的比较表明,证明者-验证者审议为选择性预测提供了独特的论点可辩护性信号。

英文摘要

Reliably knowing when a language model is correct is almost as important as being correct. We introduce prover-verifier deliberation (PVD), an inference-time protocol grounded in interactive proof theory, as a mechanism for selective prediction: the protocol produces both an answer and a structured confidence verdict, allowing a system to report high-confidence answers while abstaining on uncertain cases. In each dialogue, a prover defends a candidate answer through checkable sub-claims while a verifier issues targeted challenges and returns Accept, Challenge, or Reject. Because frozen language models are imperfect provers and verifiers operating over a noisy channel, formal soundness and completeness guarantees do not transfer; instead, we characterize the protocol empirically through its coverage-precision behavior. Our main experiment uses Claude Sonnet 4.6 as prover and Claude Haiku 4.5 as verifier on GPQA Diamond. Questions accepted with no answer revision, which we call Accept + No Change (ANC), are reported as the high-confidence subset; we evaluate this subset by its precision and coverage. ANC separates reliable from unreliable answers, yielding a ~30 pp HC-Prec gap over the non-ANC complement. Robustness experiments with GPT and Gemini pairings show that high HC-Prec can transfer across model families, while verifier strictness and domain competence largely determine the size of the selection gap. On Humanity's Last Exam, weaker prover-verifier pairings can collapse or invert the ANC signal, illustrating a practical failure mode when the verifier operates outside its effective region. Comparisons with self-consistency, universal self-consistency, multi-agent debate, and Reflexion suggest that prover-verifier deliberation supplies a distinct argument-defensibility signal for selective prediction.
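
The coverage and precision of the ANC subset described above can be computed from per-question dialogue records. The following is a minimal sketch, not the authors' evaluation code: the `Dialogue` record, its field names, and the `anc_metrics` helper are hypothetical, and it assumes each dialogue ends with one final verdict, a flag for whether the answer was revised, and a ground-truth correctness label.

```python
from dataclasses import dataclass


@dataclass
class Dialogue:
    # Hypothetical record of one prover-verifier dialogue.
    verdict: str          # final verifier verdict: "Accept", "Challenge", or "Reject"
    answer_changed: bool  # True if the prover revised its answer during the dialogue
    correct: bool         # ground-truth correctness of the final answer


def is_anc(d: Dialogue) -> bool:
    """Accept + No Change: accepted without any answer revision."""
    return d.verdict == "Accept" and not d.answer_changed


def precision(items):
    """Fraction of correct answers; NaN for an empty subset."""
    return sum(d.correct for d in items) / len(items) if items else float("nan")


def anc_metrics(dialogues):
    """Return (coverage, high-confidence precision, complement precision)."""
    anc = [d for d in dialogues if is_anc(d)]
    rest = [d for d in dialogues if not is_anc(d)]
    coverage = len(anc) / len(dialogues)
    return coverage, precision(anc), precision(rest)
```

The difference between the second and third return values corresponds to the selection gap that the abstract reports in percentage points.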

2605.25129 2026-05-26 cs.LG

Blocked Gibbs meets Diffusion Transformers: Unsupervised Learning for Constraint Optimization

分块吉布斯采样遇上扩散Transformer:约束优化的无监督学习

Yudong W. Xu, Wenhao Li, Xiaoyu Wang, Scott Sanner, Elias B. Khalil

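AI Summary Proposes the Blocked Gibbs Diffusion Transformer (BloGDiT), which replaces standard joint Gaussian denoising with blocked Gaussian denoising to support the large edits to variable subsets that constraint optimization requires, matching or outperforming existing methods on Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut.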

详情
AI中文摘要

扩散模型在学习解决约束优化问题方面显示出潜力。然而,它们大多局限于二元变量问题,并依赖图神经网络,阻碍了其应用于更广泛的问题,例如具有一般离散变量或需要全局而非局部推理的约束结构的问题。我们研究了使用扩散Transformer来解决上述局限性。朴素实现表现不佳,因为标准扩散过程与约束求解之间存在根本性不匹配:前者对所有变量进行微小、渐进的去噪,而后者需要大幅改变特定的变量子集以实现可行性或最优性。我们的方法,分块吉布斯扩散Transformer(BloGDiT),是第一个通过用分块高斯去噪替代标准联合高斯去噪来解决这一局限性的方法。BloGDiT使用迭代块重采样,并随时间退火块大小,以促进变量块内的大规模、有针对性的编辑。在数独、图着色、最大独立集和MaxCut上,BloGDiT匹配或超越了现有方法,表明分块吉布斯式扩散为基于Transformer的约束满足和优化提供了高度有效的归纳偏置。

英文摘要

Diffusion models have shown promise in learning to solve constraint optimization problems. However, they are mostly restricted to problems with binary variables and rely on graph neural networks, hindering their application to a broader range of problems such as those with general discrete variables or constraint structures that necessitate global rather than local reasoning. We investigate the use of Diffusion Transformers to address the aforementioned limitations. A naive implementation performs poorly due to a fundamental mismatch between the standard diffusion process and constraint solving: while the former applies small, incremental denoising across all variables, the latter requires substantially altering specific subsets of variables to attain feasibility or optimality. Our method, Blocked Gibbs Diffusion Transformer (BloGDiT), is the first to address this limitation by replacing standard joint Gaussian denoising with blocked Gaussian denoising. BloGDiT uses iterative block resampling and anneals the block size over time to facilitate large, targeted edits within a block of variables. Across Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut, BloGDiT matches or outperforms existing methods, demonstrating that blocked Gibbs-style diffusion provides a highly effective inductive bias for Transformer-based constraint satisfaction and optimization.

2605.25127 2026-05-26 cs.CV cs.LG

PQDT: Pseudo-Query Dual Transformer for Robust Point Cloud Restoration

PQDT: 伪查询双Transformer用于鲁棒点云修复

Haoqing Wu, Alexa Nawotki, Jochen Garcke

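AI Summary Proposes a unified 3D restoration network built on a Pseudo-Query module within a Transformer backbone; a two-stage geometric translation enhances structural clarity and local detail, surpassing existing methods under diverse degradation scenarios.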

Comments To be published in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

详情
AI中文摘要

点云是计算机视觉中一种基本的3D表示,支持广泛的感知任务。然而,由于传感器限制或遮挡,真实世界的点云常常遭受不完整、噪声、离群点和密度不规则等退化。从这种退化数据中恢复干净且详细的形状对于下游应用至关重要。尽管现有的基于学习方法在完成或去噪等单个任务上取得了进展,但它们通常依赖于全局瓶颈特征,这会丢失细粒度几何信息,并且对变化的输入质量敏感。我们提出一个统一的3D修复网络,直接以点云作为输入,并在多种退化场景下自适应地重建高质量几何。我们方法的核心是一个伪查询模块,在Transformer主干网络中实现,它将几何变换重新表述为两个协作阶段,以增强结构清晰度、鲁棒性和局部细节保留。在精心设计的基准测试上的大量实验表明,我们的方法在通用3D修复中超越了最先进的性能。它有效处理了完成、变形和去噪退化的复杂组合。通过这项工作,我们提供了一个新颖的、统一的、仅基于点的主干网络,用于鲁棒的3D修复,从而实现更通用的3D感知。

英文摘要

Point clouds are a fundamental 3D representation in computer vision, enabling a wide range of perception tasks. However, real-world point clouds often suffer from degradations such as incompleteness, noise, outliers, and irregular density, caused by sensor limitations or occlusions. Recovering clean and detailed shapes from such degraded data is crucial for downstream applications. While existing learning-based methods achieve progress on individual tasks like completion or denoising, they typically rely on global bottleneck features, which lose fine-grained geometry and remain sensitive to varying input quality. We propose a unified 3D restoration network that directly takes point clouds as input and adaptively reconstructs high-quality geometry under diverse degradation scenarios. At the core of our approach is a Pseudo-Query module, implemented within a Transformer backbone, which reformulates geometric translation into two cooperative stages to enhance structural clarity, robustness, and local detail preservation. Extensive experiments on curated benchmarks demonstrate that our approach surpasses state-of-the-art performance in general 3D restoration. It effectively handles complex combinations of completion, deformation, and denoising degradations. With this work, we provide a novel unified, point-only backbone for robust 3D restoration, enabling more versatile 3D perception.

2605.25124 2026-05-26 cs.LG

Optimizing Multidimensional Scaling in Gini Metric Spaces

在基尼度量空间中优化多维缩放

Cassandra Mussard, Stéphane Mussard

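AI Summary Proposes the Gini Multidimensional Scaling (Gini MDS) framework, whose Gini pseudo-distance based on values and ranks outperforms Euclidean MDS on data with noise and outliers, with a PyTorch implementation providing GPU acceleration.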

详情
AI中文摘要

基尼多维缩放(Gini MDS)框架扩展了欧几里得多维缩放。我们引入了一种基于值和秩的基尼伪距离,该距离依赖于一个可微调的超参数。这种伪距离允许灵活探索潜在配置,从而实现与观测相异度最佳匹配的嵌入。Gini MDS被证明对噪声和异常值具有鲁棒性,使其非常适合实际应用。我们在16个带有异常值的UCI数据集和带有噪声的MNIST图像上进行了实验,表明Gini MDS在噪声数据上优于欧几里得MDS。最后,与sklearn库的标准MDS相比,基于张量的PyTorch实现提供了GPU加速和高效计算。

英文摘要

The Gini Multidimensional Scaling (Gini MDS) framework extends Euclidean multidimensional scaling. We introduce a Gini pseudo-distance based on values and their ranks that depends on a tunable hyperparameter. This pseudo-distance allows flexible exploration of latent configurations, enabling embeddings that best match observed dissimilarities. Gini MDS is shown to be robust to noise and outliers, making it well-suited for real-world applications. We provide experiments on 16 UCI datasets with outliers and on MNIST images with noise to show that Gini MDS outperforms Euclidean MDS on noisy data. Finally, a tensor-based implementation in PyTorch provides GPU acceleration and efficient computation compared to the standard MDS of the sklearn library.

2605.25123 2026-05-26 cs.LG cs.AI cs.CL cs.CV stat.ML

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

扩散模型的推理时对齐:基于信任区域迭代扭曲序贯蒙特卡洛方法

Weixin Wang, Yu Yang, Wei Deng, Pan Xu

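AI Summary Proposes the Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC) framework, which iteratively learns twisting functions to improve inference-time alignment of diffusion models and outperforms existing methods on text generation and text-to-image generation.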

Comments 34 pages, 6 figures, and 7 tables

详情
AI中文摘要

我们研究基于扩散的生成模型的推理时对齐,旨在引导基础模型产生高奖励输出而不更新其权重。最近的基于序贯蒙特卡洛(SMC)的引导方法以原则性的方式近似奖励倾斜的目标分布,但其提议仍主要依赖于基础采样器。由于奖励信息主要通过粒子重加权和重采样在传播后使用,这些方法可能需要大量粒子预算,并遭受权重退化和高方差估计的问题。降低方差和提高粒子效率的一种方法是迭代学习提供前瞻指导的扭曲函数,如扭曲SMC。然而,现有的可学习扭曲方法主要针对经典序贯推理开发,当应用于具有高维状态空间和终端、噪声或黑盒奖励的扩散对齐时可能不稳定。我们提出信任区域迭代扭曲序贯蒙特卡洛(TRI-TSMC),一种用于在基于SMC的推理时对齐中学习扭曲函数的信任区域框架。每次迭代在路径空间中计算精确的KL约束更新,通过温度重要性重加权得到闭式解,并通过加权最大似然将该目标投影回参数化扭曲族。理论上,我们形式化了最优扭曲函数的值函数解释,并表明它产生零方差采样器。我们证明信任区域更新沿着护航路径朝向目标分布,加权最大似然更新是前向KL投影,并且该路径降低了残差重要性权重方差。实验上,在匹配的推理时预算下,TRI-TSMC在离散扩散文本生成和文本到图像生成上改进了主要对齐目标。

英文摘要

We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate reward-tilted target distributions in a principled way, but their proposals remain largely tied to the base sampler. Since reward information is mainly used after propagation through particle reweighting and resampling, these methods can require large particle budgets and suffer from weight degeneracy and high-variance estimates. One way to reduce variance and improve particle efficiency is to iteratively learn twisting functions that provide look-ahead guidance, as in twisted SMC. However, existing learnable twisting methods are developed mainly for classical sequential inference and can be unstable when applied to diffusion-based alignment with high-dimensional state spaces and terminal, noisy, or black-box rewards. We propose Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a trust-region framework for learning twisting functions in SMC-based inference-time alignment. Each iteration computes an exact KL-constrained update in path space, which admits a closed-form solution by tempered importance reweighting, and projects this target back to the parameterized twisted family by weighted maximum likelihood. Theoretically, we formalize the value-function interpretation of the optimal twisting function and show that it yields a zero-variance sampler. We prove that the trust-region update follows an escort path toward the target distribution, that the weighted maximum-likelihood update is a forward-KL projection, and that the path reduces residual importance-weight variance. Empirically, TRI-TSMC improves primary alignment objectives on discrete diffusion text generation and text-to-image generation under matched inference-time budgets.
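
To see why a KL-constrained update can admit a closed form through tempered importance reweighting, consider one standard formulation. This is an illustrative assumption: the abstract does not state the exact objective or the direction of the KL terms. The symbols $q_k$ (current path distribution), $\pi$ (reward-tilted target), $\delta$ (trust-region radius), and $\beta_k$ (tempering exponent) are introduced here for illustration only.

```latex
\begin{aligned}
q_{k+1} &= \arg\min_{q}\ \mathrm{KL}(q \,\|\, \pi)
  \quad \text{s.t.} \quad \mathrm{KL}(q \,\|\, q_k) \le \delta, \\
q_{k+1}(x) &\propto q_k(x)\left(\frac{\pi(x)}{q_k(x)}\right)^{\beta_k}
  = q_k(x)^{1-\beta_k}\,\pi(x)^{\beta_k}, \qquad \beta_k \in (0, 1].
\end{aligned}
```

With Lagrange multiplier $\lambda \ge 0$ on the constraint, stationarity gives $\beta_k = 1/(1+\lambda)$. The factor $(\pi/q_k)^{\beta_k}$ is an importance weight raised to a power of at most one, which flattens the weights, and the family $q_k^{1-\beta}\pi^{\beta}$ traces a geometric path from $q_k$ to $\pi$.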

2605.25120 2026-05-26 cs.CL cs.AI cs.HC

Evidence-Linked Radiology Reporting: A Human-Supervised Reference Architecture for Structured Imaging Intelligence

证据关联放射学报告:面向结构化成像智能的人机协同参考架构

Houman Kazemzadeh, Kamyar Naderi

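AI Summary Proposes a human-supervised, evidence-linked reference architecture that combines exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and interoperability based on standards such as DICOM and HL7 FHIR, turning radiology reports from free text into a structured intelligence layer that supports reviewed reporting, longitudinal comparison, clinical data reuse, and system integration.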

Comments Technical report, 27 pages, 2 figures, 12 tables, 1 listing; reference architecture paper; does not report clinical outcomes or validated diagnostic performance

详情
AI中文摘要

放射学报告仍然是向临床团队传达成像结果的主要机制。然而,这些报告背后的大量结构化信息,包括测量值、图像证据、既往比较、病灶标识、不确定性和术语,通常仍被禁锢在自由文本中,或分散在图像存档与通信系统、放射信息系统、报告工作站、工作表、高级可视化工具和电子健康记录中。本文提出一种人机协同、证据关联的结构化放射学报告参考架构。该框架结合了特定检查模板、语音到结构处理、测量与分割捕获、受控AI辅助起草,以及基于DICOM、DICOM结构化报告、DICOM分割、HL7 FHIR、RadLex、SNOMED CT、LOINC和UCUM的标准化互操作性。该系统并非作为自主报告生成器,而是作为企业成像的结构化智能层,支持审阅报告、纵向比较、临床数据重用、治理,以及与PACS、RIS、EHR、分析和注册工作流的集成。本文还讨论了针对AI辅助放射学报告系统的模态特定部署考虑、临床安全风险、验证要求、网络安全、隐私、质量管理和监管边界。

英文摘要

Radiology reports remain the primary mechanism by which imaging findings are communicated to clinical teams. However, much of the structured information behind these reports, including measurements, image evidence, prior comparisons, lesion identity, uncertainty, and terminology, often remains trapped in free text or fragmented across picture archiving and communication systems, radiology information systems, reporting workstations, worksheets, advanced visualization tools, and electronic health records. This paper proposes a human-supervised, evidence-linked reference architecture for structured radiology reporting. The framework combines exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and standards-based interoperability using DICOM, DICOM Structured Reporting, DICOM Segmentation, HL7 FHIR, RadLex, SNOMED CT, LOINC, and UCUM. The system is positioned not as an autonomous report generator, but as a structured intelligence layer for enterprise imaging that supports reviewed reporting, longitudinal comparison, clinical data reuse, governance, and integration with PACS, RIS, EHR, analytics, and registry workflows. The paper also discusses modality-specific deployment considerations, clinical safety risks, validation requirements, cybersecurity, privacy, quality management, and regulatory boundaries for AI-assisted radiology reporting systems.

2605.25119 2026-05-26 cs.CV cs.AI cs.LG

Trust-Aware Joint Feature-Prediction Discrepancy for Robust Domain Adaptation

信任感知的联合特征-预测差异用于鲁棒域适应

Xi Ding, Lei Wang, Syuan-Hao Li, Yongsheng Gao

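AI Summary Proposes a trust-aware domain adaptation framework in which the Joint Feature-Prediction Discrepancy (JFPD) combines uncertainty-aware trust and semantic-alignment trust to give a reliability-aware estimate of domain discrepancy and improve adaptation performance.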

Comments Research report

详情
AI中文摘要

域适应旨在减轻标记源域与未标记或稀疏标记目标域之间分布偏移导致的性能下降。大多数现有方法在特征空间或预测空间中估计域差异。然而,这些单一视角策略忽略了域偏移下的一个关键问题:用于对齐的信号可靠性。实际上,学习到的表示和语义预测都可能变得不可靠,平等对待所有目标样本可能导致误导性对齐和次优迁移。我们引入了信任感知域适应,这是一个原则性框架,通过特征和预测信号的可靠性来建模域差异。我们方法的核心是联合特征-预测差异(JFPD),这是一个统一公式,联合捕捉表示散度和预测散度,并通过样本特定信任加权它们的贡献。信任通过两种互补机制量化:不确定性信任,从预测熵导出以抑制不可靠预测;语义对齐信任,从特征空间中的原型相似性计算以强调良好对齐的表示。通过优先考虑自信且语义一致的样本,同时降低噪声或模糊样本的权重,JFPD提供了域差异的可靠性感知估计。我们进一步将JFPD集成到训练目标中,引导适应朝向目标域的可靠区域。在标准基准上的实验表明,所提出的框架始终实现优越的适应性能,并产生与目标域误差相关的差异估计。这项工作首次解决了在域适应中建模特征与预测之间交互信任的重要性。

英文摘要

Domain adaptation aims to mitigate performance degradation caused by distribution shifts between a labeled source domain and an unlabeled or sparsely labeled target domain. Most existing approaches estimate domain discrepancy either in feature space or in prediction space. However, these single-perspective strategies overlook a critical problem under domain shift: the reliability of the signals used for alignment. In practice, both learned representations and semantic predictions may become unreliable, and treating all target samples equally can lead to misleading alignment and suboptimal transfer. We introduce trust-aware domain adaptation, a principled framework that models domain discrepancy through the reliability of feature and prediction signals. Central to our approach is the Joint Feature-Prediction Discrepancy (JFPD), a unified formulation that jointly captures representation divergence and prediction divergence while weighting their contributions by sample-specific trust. Trust is quantified via two complementary mechanisms: uncertainty-aware trust, derived from prediction entropy to suppress unreliable predictions, and semantic-alignment trust, computed from prototype similarity in feature space to emphasize well-aligned representations. By prioritizing confident and semantically consistent samples while down-weighting noisy or ambiguous ones, JFPD provides a reliability-aware estimate of domain discrepancy. We further integrate JFPD into a training objective that guides adaptation toward trustworthy regions of the target domain. Experiments on standard benchmarks demonstrate that the proposed framework consistently achieves superior adaptation performance and yields discrepancy estimates that correlate with target-domain error. This work addresses, for the first time, the importance of modeling trust in the interaction between features and predictions for domain adaptation.
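
The two trust signals described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's formulation: the abstract only says that uncertainty-aware trust is derived from prediction entropy and semantic-alignment trust from prototype similarity. The normalization, the use of cosine similarity, the product of the two trusts, and all function names below are assumptions.

```python
import math


def uncertainty_trust(probs):
    """Assumed form: 1 - normalized entropy (1.0 for one-hot, 0.0 for uniform)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def semantic_trust(feature, prototypes):
    """Assumed form: highest cosine similarity to any class prototype, floored at 0."""
    return max(0.0, max(cosine(feature, p) for p in prototypes))


def trust_weighted_discrepancy(feat_div, pred_div, trust_u, trust_s):
    """Trust-weighted mean of per-sample feature + prediction divergences."""
    weights = [u * s for u, s in zip(trust_u, trust_s)]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * (f + p) for w, f, p in zip(weights, feat_div, pred_div)) / total
```

Under these assumptions, a target sample with a near-uniform prediction or a feature far from every prototype contributes almost nothing to the discrepancy estimate.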

2605.25115 2026-05-26 cs.LG cs.AI cs.CE physics.app-ph

Courant: a State-Adaptive Perceiver-Based Neural Surrogate with Local Support and Interpretable Field Decomposition

Courant:一种具有局部支持和可解释场分解的状态自适应感知器神经代理模型

Anuj Kumar, Josiah Bjorgaard, Nikolaos Bouklas, Matteo Salvador, Alexander Lavin

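AI Summary Proposes Courant, a Perceiver-based encoder-processor-decoder surrogate model whose state-adapted latent queries and lightweight decoder yield local support and an interpretable field decomposition akin to adaptive hp-refinement, with competitive accuracy on steady and transient simulation benchmarks.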

详情
AI中文摘要

我们引入“Courant”,一种基于感知器的编码器-处理器-解码器代理模型,其潜在特征在物理空间中表现出自适应专门化和局部支持,实现了类似于自适应hp细化方案的功能,这是传统数值求解器和科学机器学习中非常期望的属性。所提出的架构结合了共享随机傅里叶特征坐标嵌入、状态自适应潜在查询和轻量解码器。Courant使用稳态或瞬态模拟数据进行端到端训练,仅使用物理空间中的标准L_2预测损失,在基准测试上达到竞争性精度。我们证明Courant的归纳偏差产生了设计上可解释的潜在变量:它们在模拟域中发展出多尺度几何专门化,并在时间相关情况下跟踪相干结构,类似于随时间演化的空间基函数,从而允许对模拟场进行紧凑的、几何锚定的、单位划分式的分解。

英文摘要

We introduce "Courant", a Perceiver-based encoder-processor-decoder surrogate model that has latent features exhibiting adaptive specialization and local support in the physical space, enabling functionality akin to an adaptive hp-refinement scheme, an attribute that is highly desirable in traditional numerical solvers and scientific machine learning broadly. The proposed architecture combines a shared random Fourier feature coordinate embedding, state-adapted latent queries, and a light-weight decoder. Courant is trained end-to-end with steady or transient simulation data and only a standard L_2 prediction loss in the physical space, achieving competitive accuracy on benchmarks. We demonstrate that Courant's inductive biases yield latents that are interpretable by design: they develop multiscale geometric specialization in the simulation domain and track coherent structures in the time-dependent case, acting analogously to time-evolving spatial basis functions and allowing for decoding a compact, geometry-anchored, partition-of-unity-like decomposition of the simulated field.

2605.25111 2026-05-26 cs.LG

Revisiting Pre-Propagation GNNs: Robust Diffusion Operators and Hidden-State Re-Propagation

重新审视预传播图神经网络:鲁棒扩散算子与隐状态再传播

Zichao Yue, Zhiru Zhang

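AI Summary Proposes robust graph diffusion operators and a few-shot hidden-state re-propagation scheme that let pre-propagation GNNs match the accuracy of message-passing GNNs while maintaining training efficiency.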

详情
AI中文摘要

预传播图神经网络(PPGNNs)将节点特征传播与变换解耦:图扩散作为预处理一次性执行,训练简化为每个节点的密集变换。这种设计使得小批量训练无需节点间依赖,避免了重复的稀疏矩阵-矩阵乘法,并更好地适配针对密集计算优化的现代加速器。然而,其表达能力仍不明确,实验结果表明PPGNNs与对应的消息传递图神经网络在常用图基准(尤其是异配图)上存在差距。本文提出一套用于预处理的鲁棒图扩散算子和训练过程中的少量隐状态再传播方案。我们的方法提高了PPGNNs的验证和测试准确率,使其在保持训练效率的同时匹配消息传递图神经网络的精度。

英文摘要

Pre-propagation graph neural networks (PPGNNs) decouple node feature propagation from transformation: graph diffusion is performed once as preprocessing, and training reduces to dense per-node transformations. This design enables mini-batch training without inter-node dependencies, avoids repeated sparse matrix--matrix multiplications, and better matches modern accelerators optimized for dense compute. However, their expressivity remains unclear, and empirical results show a gap between PPGNNs and their message-passing counterparts on commonly used graph benchmarks, especially heterophilic ones. In this paper, we propose a suite of robust graph diffusion operators for preprocessing and a few-shot hidden-state re-propagation scheme during training. Our methods improve the validation and test accuracy of PPGNNs, enabling them to match the accuracy of message-passing GNNs while maintaining training efficiency.
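
The decoupling described above, in which diffusion runs once as preprocessing, can be illustrated with the familiar symmetric-normalized adjacency operator. This is a generic sketch of pre-propagation in pure Python, not the robust operators or the re-propagation scheme proposed in the paper; the function names and the dense list-of-lists representation are assumptions chosen for readability.

```python
def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a dense adjacency matrix."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def pre_propagate(adj, x, hops):
    """Return [X, A_hat X, A_hat^2 X, ...], computed once before training."""
    a_hat = normalize_adj(adj)
    feats = [x]
    for _ in range(hops):
        feats.append(matmul(a_hat, feats[-1]))
    return feats
```

After this step each node's row in the returned matrices is an independent training example, so a per-node model can be trained on mini-batches with no further sparse products.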

2605.25110 2026-05-26 cs.CV cs.AI cs.LG

Uncertainty-DTW for Sequences and Visual Tokens

Uncertainty-DTW 用于序列和视觉标记

Lei Wang, Syuan-Hao Li, Yongsheng Gao, Piotr Koniusz

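AI Summary Proposes uncertainty-aware Dynamic Time Warping (uDTW), which achieves robust alignment through heteroscedastic uncertainty modeling and maximum likelihood estimation, generalizes to sets of visual tokens, and improves on existing methods across several domains.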

Comments Research report

详情
AI中文摘要

对齐结构化数据是计算机视觉和机器学习中的一个基本问题,支撑着时间序列分析、人类动作识别和视觉表示学习等任务。现有的对齐方法,包括动态时间规整(DTW)及其可微变体,依赖于确定性相似度度量,因此对异质和噪声特征敏感。在这项工作中,我们引入了不确定性感知对齐,这是一个概率框架,用异方差不确定性建模成对对应关系,并沿对齐路径执行结构化匹配。我们的公式,不确定性-DTW(uDTW),为每个对应分配一个正态分布,并通过最大似然估计目标参数化每条对齐路径,该目标包括(i)一个精度加权匹配项,抑制不可靠特征,以及(ii)一个对数方差正则化,防止退化解。这产生了一个概率对齐机制,对噪声具有鲁棒性且可解释,因为不确定性直接反映了匹配的可靠性。我们进一步将该框架从时间序列推广到标记化的视觉表示,从而能够对视觉标记集进行结构化匹配。学习到的不确定性可以解释为反向注意力:语义相关区域表现出低不确定性并主导对齐,而模糊/噪声区域具有高不确定性。这提供了对齐、注意力和不确定性建模之间的联系。我们在不同领域评估了所提出的框架。结果表明,与最先进的方法相比,该方法持续改进,并且学习到的不确定性与语义重要性相关。这些发现将不确定性感知对齐确立为一个通用、鲁棒且可解释的框架,用于从结构化数据中学习。

英文摘要

Aligning structured data is a fundamental problem in computer vision and machine learning, underlying tasks such as time series analysis, human action recognition, and visual representation learning. Existing alignment methods, including Dynamic Time Warping (DTW) and its differentiable variants, rely on deterministic similarity measures and are therefore sensitive to heterogeneous and noisy features. In this work, we introduce uncertainty-aware alignment, a probabilistic framework that models pairwise correspondences with heteroscedastic uncertainty and performs structured matching along alignment paths. Our formulation, uncertainty-DTW (uDTW), assigns each correspondence a Normal distribution and parametrizes each alignment path by a Maximum Likelihood Estimate objective consisting of (i) a precision-weighted matching term that suppresses unreliable features, and (ii) a log-variance regularization that prevents degenerate solutions. This yields a probabilistic alignment mechanism that is robust to noise and interpretable, as uncertainty directly reflects the reliability of matches. We further generalize this framework from temporal sequences to tokenized visual representations, enabling structured matching over sets of visual tokens. The learned uncertainty can be interpreted as a reverse-attention: semantically relevant regions exhibit low uncertainty and dominate the alignment, while ambiguous/noisy regions have high uncertainty. This provides a connection between alignment, attention, and uncertainty modeling. We evaluate the proposed framework across diverse domains. The results demonstrate consistent improvements over state-of-the-art methods and show that learned uncertainty correlates with semantic importance. These findings establish uncertainty-aware alignment as a general, robust, and interpretable framework for learning from structured data.
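
The two terms of the objective described above, a precision-weighted match and a log-variance penalty, can be sketched with a classic dynamic-programming recursion. This is a simplified illustration under stated assumptions: it uses a hard minimum rather than a differentiable one, takes the per-pair variances `sigma2` as a given input rather than predicting them, and folds both terms into a single per-cell cost. The function name `udtw_cost` is hypothetical.

```python
import math


def udtw_cost(x, y, sigma2):
    """DTW over per-pair cost ||x_i - y_j||^2 / sigma2[i][j] + log(sigma2[i][j])."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x[i - 1], y[j - 1]))
            var = sigma2[i - 1][j - 1]
            # Precision-weighted match plus the log-variance regularizer.
            cost = sq_dist / var + math.log(var)
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

With all variances equal to 1 the cost reduces to ordinary squared-distance DTW. Raising a pair's variance discounts its mismatch but pays a logarithmic penalty, which is what keeps the variances from growing without bound.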

2605.25109 2026-05-26 cs.RO

Soft Pneumatic Actuators for Soft Robotics: A Motion-Based Review of Actuation Mechanisms and Performance Trade-offs

软体机器人中的软体气动执行器:基于运动的驱动机制与性能权衡综述

Mohammed Abboodi

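AI Summary Reviews design strategies for soft pneumatic actuators by motion class (linear, bending, twisting, omnidirectional), analyzes how structural features affect motion output, force generation, air demand, and other performance measures, and discusses the key conditions for selecting and comparing actuators.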

详情
AI中文摘要

软体气动执行器在软体机器人中广泛应用,因为它们能够产生大运动,同时保持足够的柔顺性,以安全地与物体、环境和人体交互。然而,它们的性能并不仅仅由压力决定。相反,响应取决于执行器的构建方式,包括腔室的形状、增强材料的放置、褶皱的使用、材料刚度以及引导其变形的约束。随着文献的扩展,确定哪种机制最适合特定应用以及哪些报告的结果可以在研究之间进行比较变得更加困难。本文根据用于生成四类运动(直线、弯曲、扭转和全向驱动)的设计策略来审视软体气动执行器。对于每一类,它分析了定义变形路径的结构特征,包括编织角、褶皱几何形状、纤维取向、腔室排列、结构不对称性和内部约束层。然后讨论了设计选择如何影响运动输出、力产生、空气需求、可重复性、耐久性、制造难度和机器人集成。本文进一步确定了在选择或比较执行器时必须考虑的关键条件,包括压力、负载条件、执行器尺寸、气动供应和滞后。这种方法有助于解释为什么具有相似运动输出的执行器在设计要求、气动需求和实际适用性上可能存在显著差异。它还突出了在可穿戴、生物医学和移动机器人应用中实现紧凑、高效、可重复和可部署的软体气动系统所需的设计优先级。

英文摘要

Soft pneumatic actuators are widely used in soft robotics because they can produce large motions while remaining compliant enough to interact safely with objects, environments, and the human body. However, their performance is not solely determined by pressure. Instead, the response depends on the way the actuator is built, including the shape of its chambers, the placement of reinforcements, the use of folds, material stiffness, and the constraints that guide its deformation. As the literature has expanded, it has become more difficult to determine which mechanism is most suitable for a given application and which reported results can be compared across studies. This review examines soft pneumatic actuators according to the design strategies used to generate four motion classes: linear, bending, twisting, and omnidirectional actuation. For each class, it analyzes the structural features that define the deformation path, including braid angle, fold geometry, fiber orientation, chamber arrangement, structural asymmetry, and internal constraint layers. It then discusses how design choices affect motion output, force generation, air demand, repeatability, durability, fabrication difficulty, and robotic integration. The review further identifies key conditions that must be considered when selecting or comparing actuators, including pressure, loading condition, actuator size, pneumatic supply, and hysteresis. This approach helps explain why actuators with similar motion outputs may differ substantially in design requirements, pneumatic demand, and practical suitability. It also highlights the design priorities needed for compact, efficient, repeatable, and deployable soft pneumatic systems in wearable, biomedical, and mobile robotic applications.

2605.25107 2026-05-26 cs.LG cs.AI cs.NA math.NA

Leveraging Gauge Freedom for Learning Non-Gradient Population Dynamics of Stochastic Systems

利用规范自由度学习随机系统的非梯度种群动力学

Jules Berman, Tobias Blickhan, Benjamin Peherstorfer

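AI Summary To address the restriction of existing population dynamics inference to gradient flows, proposes the Non-Gradient Inference Flows (NGIF) algorithm, which uses a weak formulation of the continuity equation to parameterize general vector fields and choose selection criteria other than minimal kinetic energy, improving distributional accuracy and better capturing non-potential transport on low- and high-dimensional physics problems.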

详情
AI中文摘要

现有的种群动力学推断工作通常关注由标量势的梯度向量场产生的流。在所有与种群动力学兼容的容许流中,梯度流在特定意义下是最优的:它们最小化动能。基于不同准则选择场对应于确定种群动力学时的规范自由度,我们在本文中利用了这一点。我们提出了非梯度推断流(NGIF),一种使用连续性方程弱形式推断非梯度种群动力学的算法。这使我们能够参数化一般向量场,并选择超出最小动能的其他选择准则。我们在各种低维和高维物理问题上证明,这种更一般的方法提高了相对于梯度受限基线的分布精度,并更好地捕捉了非势输运。

英文摘要

Existing work on population dynamics inference often focuses on flows arising from vector fields that are the gradients of scalar potentials. Among all admissible flows that are compatible with the population dynamics, gradient flows are optimal in a specific sense: they minimize kinetic energy. The selection of fields based on different criteria corresponds to a gauge freedom when determining population dynamics, which we leverage in this work. We propose Non-Gradient Inference Flows (NGIF), an algorithm to infer non-gradient population dynamics using a weak formulation of the continuity equation. This allows us to parameterize general vector fields and choose other selection criteria beyond minimal kinetic energy. We demonstrate on a variety of low- and high-dimensional physics problems that this more general approach improves distributional accuracy over gradient-restricted baselines and better captures non-potential transport.
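
For readers less familiar with the terms, the continuity equation, its weak form, and the gauge freedom mentioned above can be written as follows. This is the textbook statement rather than the paper's specific loss; the symbols $\rho_t$ (population density), $v_t$ (vector field), $w_t$ (a gauge perturbation), and $\varphi$ (test function) are notation assumed here.

```latex
\begin{aligned}
&\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \\
&\frac{d}{dt}\,\mathbb{E}_{x \sim \rho_t}\big[\varphi(x)\big]
  = \mathbb{E}_{x \sim \rho_t}\big[\nabla \varphi(x) \cdot v_t(x)\big]
  \quad \text{for all smooth, compactly supported test functions } \varphi, \\
&\nabla \cdot (\rho_t w_t) = 0 \ \Longrightarrow\ v_t + w_t \text{ transports the same } \rho_t .
\end{aligned}
```

The weak form follows from integration by parts and involves only expectations over samples, so it can be estimated from population snapshots without evaluating the density. The last line is the gauge freedom: the marginals $\rho_t$ fix $v_t$ only up to such fields $w_t$, and the gradient field is the member of this family with minimal kinetic energy $\int \lVert v_t \rVert^2 \rho_t \, dx$.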

2605.25095 2026-05-26 cs.AI cs.LG math.OC

RECTOR: Priority-Aware Rule-Based Reranking for Compliance-Aware Autonomous Driving Trajectory Selection

RECTOR: 基于优先级规则的合规感知自动驾驶轨迹选择重排序

Hadi Hajieghrary, Benedikt Walter, Chaitanya Shinde, Paul Schmitt, Miguel Hurtado

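AI Summary Proposes RECTOR, a post-generation reranking layer that scores candidate trajectories against a tiered rulebook (Safety > Legal > Road > Comfort) using differentiable proxies and a scene-conditioned applicability mechanism, then selects with a deterministic ε-lexicographic rule, cutting the Safety+Legal violation rate from 28.58% to 20.42% without retraining the predictor.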

详情
AI中文摘要

自动驾驶堆栈必须从多模态候选集中选择一条轨迹;仅凭模型置信度选择会忽略安全、交通法规和舒适性约束。我们提出RECTOR(规则强制约束轨迹编排器),一种后生成重排序层,通过差异化代理和场景条件适用性机制,根据分层规则手册(安全>法律>道路>舒适)对候选轨迹进行评分,然后采用确定性ε-词典序规则进行选择,该规则通过构造保持跨层优先级——无需重新训练预测器。在Waymo开放运动数据集validation_interactive划分(43,219个增强实例,K=6)上,根据协议B(28条规则代理目录,oracle适用性),与同一候选集上仅基于置信度的选择相比,规则感知选择将安全+法律违规从28.58%降至20.42%,总违规从40.32%降至32.41%。在该基准上,均匀加权求和基线匹配了二元合规性——经验提升来自规则感知排序,而词典序保证是任何权重校准无法复制的结构性差异因素。在对抗性置信度破坏下,仅置信度选择在100%的场景中失败,而两种规则感知选择器在约96%的场景中拒绝了注入的模式。所有数据均为代理评估器结果(非安全认证),开环,5秒时域,美国规则,验证集划分。

英文摘要

Autonomous driving stacks must pick one trajectory from a multi-modal candidate set; choosing by model confidence ignores safety, traffic-law, and comfort constraints. We present RECTOR (Rule-Enforced Constrained Trajectory Orchestrator), a post-generation reranking layer that scores candidates against a tiered rulebook (Safety ≻ Legal ≻ Road ≻ Comfort) via differentiable proxies and a scene-conditioned applicability mechanism, then selects with a deterministic ε-lexicographic rule that preserves cross-tier priority by construction, without retraining the predictor. On the Waymo Open Motion Dataset validation_interactive split (43,219 augmented instances, K=6), under Protocol B (28-rule proxy catalog, oracle applicability), rule-aware selection cuts Safety+Legal violations from 28.58% to 20.42% and Total from 40.32% to 32.41% versus confidence-only on the same candidates. A uniform-weight weighted-sum baseline matches binary compliance on this benchmark: the empirical lift comes from rule-aware ranking, while the lexicographic guarantee is the structural differentiator no weight calibration can replicate. Under adversarial confidence corruption, confidence-only selection fails in 100% of scenarios while both rule-aware selectors reject the injected mode in ~96%. All figures are proxy-evaluator results (not a safety certificate), open-loop, 5 s horizon, U.S. rules, validation split.
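
A lexicographic selection with tolerance, of the kind described above, can be sketched as successive filtering by tier. This is a hypothetical illustration: the dictionary layout, the tier keys, the default `eps`, and the final tie-break by confidence are all assumptions, and the abstract does not specify how RECTOR aggregates rule scores within a tier.

```python
# Tier order from the abstract: Safety > Legal > Road > Comfort.
TIERS = ("safety", "legal", "road", "comfort")


def select_trajectory(candidates, eps=1e-3):
    """Return the index of the chosen candidate.

    Each candidate is a dict with one violation score per tier (lower is
    better) and a model "confidence". A candidate survives a tier only if
    its score is within eps of the best surviving score on that tier, so a
    lower tier can never override a higher one by more than eps.
    """
    survivors = list(range(len(candidates)))
    for tier in TIERS:
        best = min(candidates[i][tier] for i in survivors)
        survivors = [i for i in survivors if candidates[i][tier] <= best + eps]
    # Assumed tie-break: highest model confidence among the survivors.
    return max(survivors, key=lambda i: candidates[i]["confidence"])
```

Unlike a weighted sum, no choice of scores on lower tiers can compensate for a higher-tier violation larger than `eps`, which is the structural property the abstract attributes to the lexicographic rule.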