arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.25091 2026-05-26 cs.AI

Evolutionary Enhanced Multi-Agent Reinforcement Learning for Cooperative Air Combat

Chengwei Li, Junlin Liu, Yang Gao

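AI summary: To address the low exploration efficiency, low sample utilization, and poor policy generalization of existing MARL methods in multi-aircraft cooperative air combat, the paper proposes ACE-MAPPO, a hybrid learning framework that combines evolutionary algorithms with MAPPO and improves performance through genetic soft updates, evolutionary prioritized trajectory replay, and adversarial evolutionary curriculum learning.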

Abstract

As modern air combat evolves toward beyond-visual-range (BVR) multi-aircraft cooperative engagements, autonomous decision-making for unmanned combat aerial vehicles (UCAVs) faces significant challenges due to high-dimensional state spaces, discrete action commands, and strongly adversarial dynamic environments. To overcome the limitations of existing multi-agent reinforcement learning (MARL) methods in such settings, namely insufficient exploration efficiency, low sample utilization, and poor policy generalization, we propose Adversarial Curriculum and Evolutionary-enhanced Multi-agent Proximal Policy Optimization (ACE-MAPPO), a hybrid learning framework that integrates evolutionary algorithms with MAPPO. Specifically, a genetic soft update mechanism is introduced to enhance population diversity and mitigate convergence to local optima. An evolutionary-augmented prioritized trajectory replay strategy is further employed to improve the utilization of sparse high-value samples. In addition, an adversarial evolutionary curriculum learning mechanism is designed to enable adaptive training with progressively increasing difficulty. Extensive experimental results demonstrate that the proposed method outperforms MAPPO and other baseline algorithms in terms of training stability, convergence speed, and win rate, validating its effectiveness in multi-aircraft cooperative air combat scenarios.
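
The abstract names a genetic soft update but does not give its rule. As a minimal sketch of one way such a mechanism could look, the Python below blends every non-elite parameter vector toward the fittest member and then applies Gaussian mutation. The blend factor `tau`, the mutation scale `sigma`, and both function names are hypothetical, not taken from the paper.

```python
import random

def soft_update(params, elite, tau):
    """Blend each parameter toward the elite: (1 - tau) * p + tau * e."""
    return [(1.0 - tau) * p + tau * e for p, e in zip(params, elite)]

def genetic_soft_update(population, fitness, tau=0.1, sigma=0.0, rng=None):
    """Pull every non-elite member toward the fittest one, then mutate."""
    rng = rng or random.Random(0)
    best = max(range(len(population)), key=lambda i: fitness[i])
    elite = population[best]
    updated = []
    for i, member in enumerate(population):
        if i == best:
            updated.append(list(elite))  # the elite is copied unchanged
            continue
        blended = soft_update(member, elite, tau)
        # Gaussian mutation keeps the population diverse (sigma = 0 disables it).
        updated.append([w + rng.gauss(0.0, sigma) for w in blended])
    return updated
```

A small `tau` keeps members close to their own gradient-trained weights, while a larger one trades diversity for faster propagation of the elite.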

2605.25077 2026-05-26 cs.CV

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

Bohai Gu, Taiyi Wu, Yueyang Yuan, Jian Liu, Xiaocheng Lu, Dazhao Du, Jie Zhang, Jinxiang Lai, Shuai Yang, Xiaotong Zhao, Alan Zhao, Song Guo

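AI summary: Proposes WorldCraft, a framework that uses a trajectory-control pipeline (NWT, SP-LoRA, TASP) to extend interactive video world models from camera navigation to object-level trajectory manipulation, so that object motion along user-specified paths coexists with camera navigation.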

Comments Project page: https://nevsdev.github.io/WorldCraft/

Abstract

Recent video-based world models have made pixel-space environments interactive at the camera level: users can navigate viewpoints while the model generates coherent visual continuations. Yet their action spaces remain incomplete: users can move the camera, but cannot act on individual objects. Since real-world interaction is inherently object-centric, such models remain closer to passive scene observers than truly manipulable environments. We present WorldCraft, a framework that expands interactive video world models from camera navigation to object-level trajectory actions. Given a user click and a sketched path, WorldCraft generates future frames in which the selected object follows the prescribed trajectory while the camera continues to navigate the scene. WorldCraft achieves this through a trajectory-centric control pipeline: First, Normalized World Trajectory (NWT) represents user-drawn motion in a camera-invariant world coordinate system and dynamically re-projects it under the current camera pose, separating object motion from camera-induced screen-space displacement; Spatial-Pathway LoRA (SP-LoRA) then injects this world-space signal through the model's spatial-control pathway, adding object manipulation capability while preserving the pretrained camera controller; finally, Trajectory-Anchored State Persistence (TASP) treats the world trajectory as a persistent spatial state and refreshes autoregressive memory after trajectory-conditioned generation, allowing moved objects to reappear at their updated positions after leaving the camera view. Experiments show that WorldCraft enables accurate object control, preserves the video-based world model's camera fidelity under camera-only evaluation, and maintains object state across long autoregressive rollouts with off-camera excursions.
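
The re-projection step described for NWT can be illustrated with a plain pinhole camera. This is a sketch under stated assumptions, not the paper's implementation: the trajectory is taken to be a list of 3D world points, the camera pose is a world-to-camera rotation `R_wc` and translation `t_wc`, and the intrinsics (`focal`, `cx`, `cy`) are made-up defaults.

```python
def project(point_w, R_wc, t_wc, focal, cx, cy):
    """Project a world point into pixels with a pinhole camera.

    R_wc (3x3) and t_wc (length 3) map world coordinates to camera coordinates.
    Returns None when the point lies behind the camera.
    """
    x, y, z = (sum(R_wc[i][j] * point_w[j] for j in range(3)) + t_wc[i]
               for i in range(3))
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)

def reproject_trajectory(traj_w, R_wc, t_wc, focal=500.0, cx=320.0, cy=240.0):
    """Re-project a fixed world-space trajectory under the current camera pose."""
    return [project(p, R_wc, t_wc, focal, cx, cy) for p in traj_w]
```

Because the trajectory stays fixed in world coordinates, only the pose arguments change from frame to frame, which is what separates object motion from camera-induced screen-space displacement.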

2605.25063 2026-05-26 cs.LG cond-mat.mtrl-sci

Reinforcement Learning for Laser Additive Manufacturing Scan-Order Optimisation: A Bilevel Proxy--FEA Diagnostic Framework for Reward and World-Model Diagnosis

Xian Wu, Haoran Li, Dongbin Zhao, Ruiyao Zhang, Yuanqi Chu, Bin Wang

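AI summary: Proposes a bilevel Proxy-FEA diagnostic framework that uses lightweight proxies and sparse finite-element simulations to diagnose reward and world-model fidelity problems when reinforcement learning is applied to scan-order optimisation in laser additive manufacturing.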

Comments 31 pages, 7 figures, 3 tables

Abstract

Reinforcement learning offers a promising approach for scan-order optimisation in laser additive manufacturing, where sequential scan decisions critically influence thermal accumulation, residual stress, distortion, and final part quality. A central challenge in applying RL to this domain lies in reward and world-model fidelity: full finite-element analysis is computationally prohibitive for dense in-the-loop evaluation, while cheap thermo-inspired proxy metrics, though efficient, may capture only partial aspects of the true thermo-mechanical objectives. This paper investigates a bilevel Proxy--FEA diagnostic framework for reward and world-model diagnosis in reinforcement-learning-guided scan-order optimisation. The lower level employs lightweight scan-path and thermo-inspired proxies for rapid candidate generation and preliminary policy-side screening, while the upper level utilises sparse Abaqus FEA simulations to provide simulation-based reference labels. The framework is examined on a simplified whole-track heating LDED32 stripe benchmark comprising ten representative scan strategies. Final-cooling residual Mises stress, U3 vertical distortion, and PEEQ plasticity metrics reveal an observed stress--distortion trade-off rather than a single monotonic quality objective. Within the evaluated set, the center_out strategy emerges as a robust compromise candidate, while raster_left_to_right and edge_in form opposing endpoints of the trade-off. Proxy--FEA alignment analysis shows that current cheap path-based metrics predominantly capture distortion-related (U3) behaviour and exhibit only weak correlation with the sparse FEA reference labels. These findings highlight that proxy-only reward designs risk misalignment in future RL training and underscore the value of sparse FEA reference signals for diagnostic-guided reward and world-model refinement prior to large-scale policy optimisation.
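
A proxy-to-reference alignment check of the kind described above is commonly done with a rank correlation. The sketch below computes Spearman's coefficient in plain Python. Whether the paper uses Spearman, Pearson, or another statistic is not stated in the abstract, so the choice of statistic and the function names are assumptions.

```python
def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(proxy, reference):
    """Rank correlation between cheap proxy scores and reference labels."""
    return pearson(ranks(proxy), ranks(reference))
```

A coefficient near zero for a given FEA metric would indicate that a reward built on that proxy alone ranks scan strategies differently from the simulation.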

2605.25061 2026-05-26 cs.LG cs.AI

GL-LFGNN: A Global-Local Dual-branch Causal Graph Neural Network Based on Liang-Kleeman Information Flow for EEG Emotion Recognition

Ziyi Wang, Dongyang Kuang

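AI summary: Proposes GL-LFGNN, which builds directed causal graphs from Liang-Kleeman information flow theory and integrates whole-brain and regional connectivity through a global-local dual-branch architecture, reaching high emotion-recognition accuracy on the MEEG dataset with few parameters.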

Comments 10 pages, 3 figures

Abstract

EEG-based emotion recognition holds significant promise for objective diagnosis of mood disorders. Graph neural networks (GNNs) have emerged as the dominant paradigm for modeling inter-channel dependencies in EEG, yet existing approaches rely on symmetric adjacency matrices derived from spatial proximity or functional correlations that fundamentally capture statistical associations rather than directed causal influences, which conflicts with the inherently asymmetric, causally-driven nature of neural information flow. To bridge this gap, we propose GL-LFGNN, a Global-Local Dual-branch Causal Graph Neural Network grounded in Liang-Kleeman information flow theory. Unlike Granger causality that merely assesses temporal precedence, our approach rigorously quantifies causal strength from a dynamical systems perspective, yielding neurophysiologically interpretable directed graphs. A dual-branch architecture further integrates whole-brain connectivity with region-specific processing aligned to established functional neuroanatomy. On the MEEG dataset, GL-LFGNN achieves 86.17% (Arousal) and 86.71% (Valence) accuracy with only 37K parameters -- approximately 10% of the current state-of-the-art -- demonstrating that principled causal modeling can simultaneously enhance interpretability, generalization, and computational efficiency. Code will be released.
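
For readers unfamiliar with Liang-Kleeman information flow, the sketch below implements the bivariate maximum-likelihood estimator from Liang's formalism, which assumes a linear model and uses sample covariances with a forward-difference derivative. How GL-LFGNN turns such estimates into an adjacency matrix is not described in the abstract; the toy system in `simulate` and all names here are illustrative assumptions.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def liang_flow(target, source, dt=1.0):
    """Estimate the information flow from `source` to `target`."""
    # Forward-difference approximation of d(target)/dt.
    d1 = [(target[t + 1] - target[t]) / dt for t in range(len(target) - 1)]
    x1, x2 = target[:-1], source[:-1]
    c11, c22, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1d1, c2d1 = cov(x1, d1), cov(x2, d1)
    num = c11 * c12 * c2d1 - c12 ** 2 * c1d1
    den = c11 ** 2 * c22 - c11 * c12 ** 2
    return num / den

def simulate(n=5000, seed=0):
    """Toy system: x2 evolves on its own, x1 is driven by x2."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for _ in range(n - 1):
        x1.append(0.5 * x1[-1] + 0.5 * x2[-1] + 0.1 * rng.gauss(0, 1))
        x2.append(0.8 * x2[-1] + rng.gauss(0, 1))
    return x1, x2
```

Unlike a correlation, the estimate is asymmetric: on the toy system, `liang_flow(x1, x2)` is clearly nonzero while `liang_flow(x2, x1)` stays close to zero, which is the property that yields a directed graph.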

2605.25052 2026-05-26 cs.CL

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

Yoav Gur-Arieh, Ana Marasović, Mor Geva

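AI summary: Because chain-of-thought faithfulness metrics have lacked validation against ground-truth labels, the authors build BonaFide, a dataset with ground-truth faithfulness labels, and systematically evaluate existing metrics, finding that most perform near chance, are biased, and are computationally expensive.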

Abstract

Chains of thought (CoTs) have become central in interpreting and auditing behaviors of large language models. Yet growing evidence suggests that these traces often fail to faithfully represent the computations behind a model's predictions. Several faithfulness metrics have been proposed, but whether they indeed measure faithfulness remains unknown. Answering this requires ground-truth labels, which are hard to obtain since internal computations are not directly observable. Consequently, most works proposing metrics report only absolute scores or comparisons to prior metrics, and the few existing benchmarks rely on proxies like plausibility or importance, properties orthogonal to faithfulness that can mislead about whether a CoT can be trusted. We address this challenge by constructing tasks whose outputs reveal which intermediate computations must have produced them, and developing an automated labeling pipeline that yields ground-truth faithfulness labels at both the step and CoT level. Building on this methodology, we present BonaFide, a benchmark of 3,066 labeled CoTs across 13 tasks and 10 models, and use it to conduct the first systematic evaluation of prominent faithfulness metrics. Our experiments show that most metrics perform near chance, exhibit strong prediction biases, and degrade on longer CoTs. The best metric reaches only 0.70 AUROC at the CoT level and another reaches 0.59 at the step level; neither transfers across settings, and both entail prohibitively high computational cost. Our results expose fundamental gaps in current faithfulness evaluation and call for the development of more reliable and efficient metrics.
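
To make the reported AUROC values concrete: AUROC is the probability that a randomly chosen faithful CoT receives a higher metric score than a randomly chosen unfaithful one, so 0.5 corresponds to chance. The sketch below computes it by direct pairwise comparison; the label convention (1 for faithful, 0 for unfaithful) is an assumption made for illustration.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a benchmark of a few thousand items; a rank-based formulation would scale better.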

2605.25045 2026-05-26 cs.AI

AION: Next-Generation Tasks and Practical Harness for Time Series

Tianxiang Zhan, Xiaobao Song, Tong Guan, Shirui Pan, Ming Jin

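AI summary: As time series research shifts toward realistic tasks that combine prediction, contextual reasoning, tool use, and structured decision support, the paper proposes the AION harness, which uses temporal grounding, knowledge-grounded reasoning, and reliability mechanisms (such as post-experiment analysis and layered review) to produce more detailed process traces and more review steps.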

Comments Project page and code are available at https://github.com/ztxtech/aion

Abstract

Time series research is moving beyond fixed forecasting benchmarks toward realistic tasks that combine prediction, contextual reasoning, tool use, and structured decision support. Most benchmarks are built around clean data and short evaluation loops; agents alone may miss temporal constraints, evidence checks, or review before finalizing outputs. We first formalize next-generation time series tasks as three-component tuples consisting of a task file, a workspace, and a validation interface. We then present AION, a time series harness built from six component groups: agents, skills, rules, memory, evaluation, and protocols. In this harness, we use three design principles: temporal grounding, temporal knowledge-grounded reasoning, and reliability mechanisms such as post-experiment analysis and layered review. A Kaggle Store Sales case study shows that the harness produces more detailed process traces, more artifacts, and more review steps than the same base agent operating in OpenCode direct build mode. Taken together, these results argue for a paradigm shift from fixed tasks to realistic ones under real-world constraints.
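
The three-component task tuple (task file, workspace, validation interface) could be represented roughly as follows. This is a hypothetical data structure for illustration only; the field names and the `has_forecast` validator are not taken from the AION repository.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class TimeSeriesTask:
    task_file: str                     # path to the task description
    workspace: str                     # directory the agent may read and write
    validate: Callable[[dict], bool]   # validation interface for the final output

def has_forecast(output: dict) -> bool:
    """Toy validator: accept any output that carries a non-empty forecast list."""
    forecast = output.get("forecast")
    return isinstance(forecast, list) and len(forecast) > 0
```

Keeping validation as a callable on the task itself means a harness can check an agent's final output without knowing anything task-specific.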

2605.25044 2026-05-26 cs.RO

X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models

Boyu Li, Chaoyi Xu, Haoqi Yuan, Xinrun Xu, Börje F. Karlsson, Dongbin Zhao, Haoran Li, Zongqing Lu

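AI summary: To address the challenge of learning universal policies from cross-embodied data, proposes X-DiffVLA, which uses diffusion models and Embodiment Forcing to transfer knowledge across heterogeneous end-effectors, with improvements of 15.3% and 12.5% on RoboCasa and Isaac Gym, respectively.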

Abstract

Learning universal policies from cross-embodied data remains a fundamental challenge in robotics. Although Vision-Language-Action (VLA) models are pre-trained on large and diverse datasets, they typically rely on embodiment-specific fine-tuning to achieve strong performance in downstream tasks. This requirement severely limits their generalization capability and restricts knowledge transfer across embodiments performing similar tasks. To overcome these limitations, we focus on cross-embodied settings with shared robotic bases and heterogeneous end-effectors, and propose X-DiffVLA, a diffusion-based VLA model featuring a unified cross-embodied action head. X-DiffVLA can leverage the generative strengths of diffusion models to capture both the diversity and latent correlations in cross-embodied datasets. Specifically, we introduce Embodiment Forcing, a classifier-free guidance technique to implicitly steer action generation toward embodiment-specific functional components, capturing fine-grained structural nuances without explicit supervision. In addition, a Morphological Tree Diffusion approach is designed to strengthen behavioral correlations across diverse end-effectors, maximizing the transferability of heterogeneous demonstrations. Experimental results across RoboCasa and Isaac Gym, covering different embodiments from grippers to dexterous hands, show that X-DiffVLA achieves state-of-the-art performance, with improvements of 15.3% and 12.5%, respectively. Real-world evaluations further validate the robustness of the proposed framework and its effectiveness in scalable cross-embodied policy learning.
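
Embodiment Forcing is described as a classifier-free guidance technique. The standard classifier-free guidance combination, on which such a method would presumably build, is sketched below. Treating the embodiment as the conditioning signal is an assumption drawn from the abstract's wording, and the function name is hypothetical.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `guidance_scale` equal to 0 the unconditional prediction is returned, 1 returns the conditional prediction, and values above 1 extrapolate further toward the condition.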

2605.25042 2026-05-26 cs.CV

Unbiased Diffusion Variational Inversion via Principled Posterior Matching

Weimin Bai, Yuxuan Gu, Yifei Wang, Weijian Luo, He Sun

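AI summary: Proposes Principled Posterior Matching (PPM), which optimizes the KL divergence exactly (via an integral of the Fisher divergence) to address mode collapse and unreliable uncertainty quantification in inverse problems; it unifies variational and amortized inference and achieves high-fidelity reconstruction and calibrated uncertainty estimates on inpainting, super-resolution fluorescence microscopy, and radio interferometric imaging.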

Abstract

Existing score-based methods for inverse problems often resort to approximate minimization of the KL divergence between the inversion distribution and the Bayesian posterior. Such an approximation leads to severe mode collapse and unreliable uncertainty quantification. In this paper, we propose Principled Posterior Matching (PPM), a framework that returns to the fundamentals of variational inference. Instead of relying on heuristic approximations, we rigorously formulate the exact optimization of the KL divergence via the integration of Fisher divergence. We derive a tractable, equivalent gradient form of this integral, enabling precise optimization without the biases introduced by prior approximations. Our analysis clearly reveals that the mode collapse in previous methods stems directly from this approximation gap. Supported by our theoretical solution, PPM unifies two complementary paradigms: (1) In variational inference, PPM adopts mass-covering divergences that significantly improve the inversion diversity and uncertainty quantification; (2) In amortized inference, it enables the training of an efficient reconstruction network for rapid, single-step reconstruction. Furthermore, our formulation naturally extends to a broader family of divergence measures by generalizing the integral of the Fisher divergence. We validate PPM across challenging computational imaging tasks, including inpainting, super-resolution fluorescence microscopy, and radio interferometric black-hole imaging. In all experiments, PPM achieves superior reconstruction fidelity, faithful multimodal posterior recovery, and well-calibrated uncertainty estimates, establishing a robust framework for scientific imaging.
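
The abstract does not state the integral itself. A well-known identity of this kind from the score-based diffusion literature is given below as background, not as the paper's own derivation. It assumes that the inversion distribution $q_0$ and the posterior $p_0$ are both pushed through the same forward SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$, giving marginals $q_t$ and $p_t$; the symbols $q$, $p$, $f$, $g$ and $T$ are notation chosen here.

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{KL}(q_t \,\|\, p_t)
  = -\tfrac{1}{2}\, g(t)^2\, \mathrm{F}(q_t \,\|\, p_t),
\qquad
\mathrm{F}(q_t \,\|\, p_t)
  = \mathbb{E}_{x \sim q_t}\!\left[
      \bigl\lVert \nabla_x \log q_t(x) - \nabla_x \log p_t(x) \bigr\rVert^2
    \right]

\mathrm{KL}(q_0 \,\|\, p_0)
  = \mathrm{KL}(q_T \,\|\, p_T)
  + \tfrac{1}{2} \int_0^T g(t)^2\, \mathrm{F}(q_t \,\|\, p_t)\, \mathrm{d}t
```

When $T$ is large enough that both marginals have converged to the same noise distribution, the first term on the right vanishes and the KL divergence equals the weighted time integral of Fisher divergences.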

2605.25041 2026-05-26 cs.RO

RAMBA: 4D Radar Mapping by Bundle Adjustment

Jianzhu Huai, Yiwen Chen, Binliang Wang

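AI summary: Proposes RAMBA, a bundle-adjustment framework that jointly refines radar frame states using covariance-weighted geometric residuals, IMU preintegration factors, and radar ego-velocity constraints to produce globally consistent 4D radar maps.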

Comments 5 pages, 2 figures, to be presented at ISPRS 2026 Thematic Session 10 on Radar Perception

Abstract

4D radar is increasingly attractive for robotic mapping because it provides range, azimuth, elevation, and Doppler measurements while remaining robust in adverse visibility conditions. Although recent radar and radar--inertial odometry methods have achieved promising online state estimation performance, offline global map refinement for 4D radar remains underexplored. This paper presents RAMBA, a radar bundle-adjustment framework for globally consistent 4D radar mapping. Given initial poses and radar frames from a radar--inertial odometry front-end, RAMBA jointly refines radar frame states using covariance-weighted geometric residuals, IMU preintegration factors, and radar ego-velocity constraints. The geometric residuals extend pairwise GICP to a multi-frame optimization by forming voxel-based correspondences across selected frames and weighting each residual with point covariances. To improve robustness against drift and revisits, RAMBA enforces temporal consistency during correspondence formation while explicitly supporting loop-closure constraints. Experiments on the ColoRadar and SNAIL Radar datasets show that RAMBA improves map consistency and usually enhances trajectory accuracy over radar--inertial odometry and pose-graph optimization baselines.
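
A radar ego-velocity constraint is typically built on the static-world Doppler model: for a stationary target seen along unit direction d, the measured radial speed equals the negative projection of the sensor velocity onto d. The residual below is a minimal sketch of that model. The sign convention (positive radial speed means the target is receding) and the function name are assumptions, and RAMBA's actual factor may differ.

```python
def doppler_residual(direction, radial_speed, ego_velocity):
    """Residual of the static-world Doppler model: v_r + d . v_ego = 0.

    `direction` points from the sensor to the target in the sensor frame and
    is normalized here; `ego_velocity` is expressed in the same frame.
    """
    norm = sum(c * c for c in direction) ** 0.5
    d = [c / norm for c in direction]
    return radial_speed + sum(di * vi for di, vi in zip(d, ego_velocity))
```

Each static radar return contributes one scalar residual of this form, so a single frame with many returns constrains the full 3D velocity.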

2605.25039 2026-05-26 cs.CV

AstroRAG -- A Pagerank-Based Retrieval-Augmented Generation Pipeline for Question Answering in Astronomy

Zhifeng Wang, Jason Jingshi Li, Kaihao Zhang, Ramesh Sankaranarayana

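AI summary: Proposes AstroRAG, a PageRank-based retrieval-augmented generation pipeline whose two-stage retrieval (MMR followed by PR re-ranking) selects a compact, mutually supportive context under a strict token budget; it is training-free and privacy-preserving, and on an astronomy QA benchmark it brings Mistral-7B to 79.49% accuracy and F1-score, nearly doubling performance.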

Comments Accepted to IEEE CAI 2026

详情
AI中文摘要

大型语言模型(LLMs)在自然语言处理中表现出强大的性能,但仅依赖参数化知识时常常产生事实性错误。检索增强生成(RAG)通过将响应基于外部证据来减轻这些错误,然而传统的检索-转储方法经常引入无关上下文,从而降低答案质量。在这项工作中,我们提出了AstroRAG——一种基于PageRank的检索增强生成(RAG)管道,适用于天文学中的问答。该系统在Elasticsearch中执行token感知的分块和每个实例的临时索引,然后执行两阶段检索:(i)最大边际相关性(MMR)以获得一个小的、多样化的候选集,以及(ii)在相似性图上进行读者驱动的PageRank(PR)重排序,以在严格的token预算下识别紧凑、互支持的上下文。我们的设计无需训练、保护隐私且可重复,因为每个实例通过临时索引处理以防止跨任务泄漏。我们在用于天文学QA的AstroQA基准上评估了该管道,并在所有难度级别上展示了有竞争力的性能。特别是,RAG增强的Mistral-7B实现了 extbf{79.49\%的准确率}和 extbf{79.49\%的F1分数},几乎是非RAG对应版本性能的两倍。这些结果突显了严格检索和精炼在提升领域特定推理方面的有效性,为将RAG扩展到其他科学领域奠定了坚实基础。

英文摘要

Large language models (LLMs) demonstrate strong performance in natural language processing but often generate factual errors when relying solely on parametric knowledge. Retrieval-Augmented Generation (RAG) mitigates these errors by grounding responses in external evidence, yet conventional retrieve-and-dump approaches frequently introduce irrelevant context that degrades answer quality. In this work, we present AstroRAG -- a PageRank-based retrieval-augmented generation (RAG) pipeline adapted for question answering in astronomy. The system performs token-aware chunking and per-instance, ephemeral indexing in Elasticsearch, then executes a two-stage retrieval: (i) Maximal Marginal Relevance (MMR) to obtain a small, diverse candidate set and (ii) a reader-driven PageRank (PR) re-ranking on a similarity graph to identify a compact, mutually supportive context under a strict token budget. Our design is training-free, privacy-preserving, and reproducible, as each instance is processed through transient indexing to prevent cross-task leakage. We evaluate the pipeline on the AstroQA benchmark for astronomy QA, and demonstrate competitive performance across all difficulty levels. In particular, the RAG-enhanced Mistral-7B achieves 79.49% accuracy and 79.49% F1-score, nearly doubling the performance of its non-RAG counterpart. These results highlight the effectiveness of disciplined retrieval and refinement in boosting domain-specific reasoning, establishing a robust foundation for extending RAG to other scientific fields.
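
The two retrieval stages can be sketched in plain Python as greedy MMR followed by power-iteration PageRank on a similarity graph. This is an illustration under assumptions: embeddings are plain lists compared by cosine similarity, the trade-off `lam` and damping factor are arbitrary defaults, and ordinary PageRank stands in for the reader-driven variant, whose details the abstract does not give.

```python
def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def mmr(query, docs, k, lam=0.7):
    """Greedy Maximal Marginal Relevance: relevance minus redundancy."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            return lam * cosine(query, docs[i]) - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

def pagerank(sim, damping=0.85, iters=50):
    """Power iteration on a row-normalized similarity graph without self-loops."""
    n = len(sim)
    w = [[sim[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out[i] > 0:
                    new[j] += damping * rank[i] * w[i][j] / out[i]
                else:
                    # Isolated chunk: spread its rank uniformly.
                    new[j] += damping * rank[i] / n
        rank = new
    return rank
```

In this arrangement MMR narrows the pool to a diverse candidate set, and PageRank then favors the chunks that many other candidates are similar to, which is one reading of "mutually supportive".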

2605.25038 2026-05-26 cs.CL cs.LG cs.SE

TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis

Festus Kahunla

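AI summary: Proposes the TRACE dataset, in which a deterministic taxonomy-driven generator creates 2,999 synthetic examples covering teaching-program generation and multi-session behavioral interpretation, addressing the fact that real ABA data is privacy-protected and cannot be released.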

Comments 11 pages, 3 tables. Dataset: https://huggingface.co/datasets/PomboLabs/TRACE ; code: https://github.com/Pombo-Labs/TRACE

Abstract

Applied Behavior Analysis (ABA) is a clinical discipline whose documentation (teaching programs and multi-session behavioral logs) is formulaic and high-volume, yet real session data is HIPAA-protected and bound by professional confidentiality rules, which blocks the release of a training corpus. We present TRACE (Taxonomy-Referenced ABA Clinical Examples), a 2,999-example synthetic instruction-tuning dataset covering two ABA tasks: teaching-program generation across Discrete Trial Training, Natural Environment Teaching, and Task Analysis; and multi-session behavioral interpretation across twelve trajectory patterns and thirteen target behaviors. Every example is produced by a deterministic taxonomy-driven generator grounded in the canonical ABA literature, and every example carries complete sampling provenance: the exact taxonomy cells that produced it. The dataset is released under CC BY-NC 4.0 for data and MIT for code, with stratified train (2,549), validation (149), test (281), and sanity (20) splits. TRACE is a research artifact and has not been clinically validated.
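
A deterministic, taxonomy-driven generator with per-example provenance can be sketched as follows. The two-axis `TAXONOMY`, the prompt template, and the record layout are invented for illustration and do not reproduce the TRACE generator or its taxonomy; only the three teaching methods are taken from the abstract.

```python
import random
from itertools import product

# Hypothetical mini-taxonomy; the real TRACE taxonomy is not reproduced here.
TAXONOMY = {
    "method": ["Discrete Trial Training", "Natural Environment Teaching",
               "Task Analysis"],
    "skill": ["requesting", "matching", "handwashing"],
}

def generate(n, seed=0):
    """Sample n taxonomy cells with a seeded RNG and attach provenance."""
    rng = random.Random(seed)
    cells = list(product(*TAXONOMY.values()))
    examples = []
    for idx in range(n):
        cell = dict(zip(TAXONOMY.keys(), rng.choice(cells)))
        examples.append({
            "id": idx,
            "prompt": f"Write a {cell['method']} program for {cell['skill']}.",
            # Provenance records exactly which cell and seed produced the example.
            "provenance": {"seed": seed, "cell": cell},
        })
    return examples
```

Because the only source of randomness is the seeded generator, rerunning with the same seed reproduces the dataset exactly, and the provenance field lets any example be traced back to its taxonomy cell.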

2605.25036 2026-05-26 cs.CL cs.AI

Language Bias in LVLMs: From In-Depth Analysis to Simple and Effective Mitigation

Yangneng Chen, Jing Li

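AI summary: Systematically studies language bias in large vision-language models, traces it to modality misalignment during training, and proposes two simple and effective mitigations: Language Bias Regularization (LBR) and Language Bias Penalty (LBP).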

Comments Accepted by ICML 2026

Abstract

Large Vision-Language Models (LVLMs) extend large language models with visual understanding, but remain vulnerable to hallucination, where outputs are fluent yet inconsistent with images. Recent studies link this issue to language bias, the tendency of LVLMs to over-rely on text while neglecting visual inputs. Yet most analyses remain empirical and do not uncover its underlying cause. In this paper, we provide a systematic study of language bias and identify its root in modality misalignment during training. Our analysis shows that both Visual Instruction Tuning (VIT) and Direct Preference Optimization (DPO) often prioritize textual improvements, which may cause LVLMs to overly lean toward language modeling rather than balanced multimodal understanding. To address this, we propose two simple yet effective methods: Language Bias Regularization (LBR), which mitigates language bias through regularization during instruction tuning, and Language Bias Penalty (LBP), which penalizes language bias in the DPO training process. Extensive experiments across diverse models and benchmarks demonstrate the effectiveness of our approach. LBR consistently improves performance on over ten general benchmarks, while LBP significantly reduces hallucination and improves trustworthiness. Together, these methods not only mitigate language bias but also advance the overall alignment of LVLMs, all without introducing any additional data or auxiliary models. Our code is publicly available at https://github.com/lab-klc/LVLM-Language-Bias.

2605.25030 2026-05-26 cs.LG

MimirRAG: A Multi-Agent RAG Framework for Financial Data Retrieval with Metadata Integration

Magnus Samuelsen, Wilmer Nyström, Somnath Mazumdar, Mansoor Hussain, Mikkel Strange

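AI summary: Proposes MimirRAG, a multi-agent RAG framework that combines metadata integration, table-aware chunking, and an agentic workflow to reach 89.3% accuracy on financial data retrieval, outperforming the baselines.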

Abstract

Retrieval-augmented generation (RAG) systems offer a promising approach to reduce hallucinations and improve answer accuracy in large language models (LLMs), a requirement for reliable financial analysis, where answers must be grounded in verifiable evidence from filings rather than generated from model priors. However, designing RAG systems that extract meaningful insights from mixed financial documents and integrate into analyst workflows remains challenging. This paper introduces MimirRAG (Metadata-Integrated Multi-Agent Information Retrieval), a multi-agent RAG system developed iteratively to address these challenges. MimirRAG features a modular pipeline encompassing structure-preserving parsing of PDF filings, table-aware chunking, metadata extraction, agent-based retrieval with query planning and hybrid search, validation, and context-aware generation with numerical reasoning support. Our ablation study identifies three key technical enablers for effective financial RAG: metadata integration, table-aware chunking, and an agentic workflow. MimirRAG was evaluated quantitatively using FinanceBench and qualitatively through expert validation with four financial analysts. The system achieved 89.3% accuracy on FinanceBench, outperforming the original benchmark baselines. Expert feedback highlighted that successful deployment also requires calibrated trust, comprehensive data integration, and user personalization. We conclude that combining multi-agent RAG architecture with human-centric design principles can improve the extraction of meaningful insights in financial analysis.
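
Table-aware chunking can be pictured as a chunker that treats each table as an indivisible block. The sketch below assumes parsed filings arrive as text lines in which table rows start with `|`; that convention, the `max_lines` budget, and the function name are hypothetical and not MimirRAG's actual parser output.

```python
def table_aware_chunks(lines, max_lines=4):
    """Group lines into chunks of at most max_lines, never splitting a table.

    A table is assumed to be a run of consecutive lines starting with '|'.
    A table longer than max_lines becomes a single oversized chunk.
    """
    blocks, current_table = [], []
    for line in lines:
        if line.lstrip().startswith("|"):
            current_table.append(line)
            continue
        if current_table:
            blocks.append(current_table)
            current_table = []
        blocks.append([line])
    if current_table:
        blocks.append(current_table)

    chunks, chunk = [], []
    for block in blocks:
        # Start a new chunk when adding this block would exceed the budget.
        if chunk and len(chunk) + len(block) > max_lines:
            chunks.append(chunk)
            chunk = []
        chunk.extend(block)
    if chunk:
        chunks.append(chunk)
    return chunks
```

Keeping header and data rows in the same chunk matters for numerical questions, since a retrieved row without its column headers is hard to interpret.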

2605.25024 2026-05-26 cs.CV

DA-UCT: Self-Supervised Domain-Adaptive Ultrasound Computed Tomography for Rapid Musculoskeletal Sound Speed Reconstruction

Tianyu Liu, Heyu Ma, Aiduo Wang, Peiwen Li, Boyi Li, Ying Li, Dan Li, Chengcheng Liu, Dean Ta

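AI summary: Proposes the SDA-UCT framework, which uses self-supervised domain adaptation and an attention-enhanced network for rapid, high-resolution musculoskeletal ultrasound computed tomography reconstruction, substantially increasing speed while maintaining high quality.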

Abstract

Ultrasound computed tomography (UCT) via full waveform inversion (FWI) enables high-resolution quantitative imaging for tissue characterization and disease diagnosis. However, UCT suffers from a large computational burden and severe convergence issues due to highly nonlinear optimization. Deep learning can accelerate UCT reconstruction, but supervised training requires large-scale labeled datasets that are difficult to obtain in vivo. To address these limitations, we propose SDA-UCT, a two-stage self-supervised domain-adaptive framework for rapid and accurate UCT imaging of musculoskeletal tissues. SDA-UCT employs an attention-enhanced network (AttUCT) pre-trained on simulation datasets and transfers to in-vivo data via physics-informed self-supervised learning, effectively bridging the simulation-to-real domain gap. A Low-Rank Adaptation (LoRA) mechanism is integrated to enable efficient adaptation across diverse clinical scenarios. Results showed that AttUCT achieved high-quality speed-of-sound (SOS) reconstruction for a simulated human forearm with a PSNR of 29.23 dB and SSIM of 0.928, outperforming conventional FWI and existing deep learning methods. Validated on in-vivo data, SDA-UCT successfully reconstructed SOS images revealing complex anatomical structures (skin, fat, muscle, tendon, bone and bone marrow) of the human forearm, in high concordance with MRI references. The LoRA mechanism, adjusting only 3% of parameters, achieved performance comparable to full fine-tuning. The rapid reconstruction (5 ms per frame) enables real-time 3D visualization, achieving a five-orders-of-magnitude improvement over traditional FWI. This work represents the first self-supervised domain-adaptive deep learning framework for rapid, high-resolution in-vivo UCT imaging, showing potential for musculoskeletal disease diagnosis.
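
Why LoRA trains so few parameters follows from simple counting: a frozen weight matrix W of shape d_out x d_in is augmented by a low-rank product B A, and only B and A are trained. The sketch below shows the count and the forward pass on plain Python lists. The layer sizes in the usage note are illustrative and are not the network dimensions used in the paper.

```python
def lora_trainable_fraction(d_in, d_out, rank):
    """Fraction of trainable weights when W (d_out x d_in) is frozen and only
    the factors B (d_out x rank) and A (rank x d_in) are trained."""
    full = d_in * d_out
    low_rank = rank * (d_in + d_out)
    return low_rank / full

def lora_forward(x, W, A, B, scale=1.0):
    """Compute y = W x + scale * B (A x) for matrices given as nested lists."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]
```

For example, `lora_trainable_fraction(1024, 1024, 16)` gives 0.03125, so a rank-16 adapter on a 1024 x 1024 layer trains about 3% as many weights as the layer itself.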

2605.25022 2026-05-26 cs.CV cs.AI

D3S2: Diffusion-Guided Dataset Distillation for Semantic Segmentation

D3S2: 扩散引导的语义分割数据集蒸馏

Wenjie Zheng, Haoji Hu, Jiali Lu, Xingze Zou, Jing Wang

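AI summary: To address long-tailed class imbalance, pixel-level alignment, and high computational cost in dataset distillation for semantic segmentation, the authors propose D3S2, a two-stage framework that builds a compact training set through class-balanced mask selection and diffusion-guided image synthesis, substantially improving segmentation performance at extremely low compression rates.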

详情
AI中文摘要

数据集蒸馏旨在将大规模数据集压缩为紧凑的合成集,同时保持训练效果。然而,现有研究主要关注图像分类,而语义分割等密集预测任务尚未充分探索。本文识别了分割数据集蒸馏的三个关键挑战:(i) 长尾类别不平衡,(ii) 图像与密集标签之间严格的像素级对齐需求,以及(iii) 使用复杂模型优化高分辨率数据的高计算成本。为应对这些挑战,我们提出D3S2,一种扩散引导的语义分割数据集蒸馏框架。我们的方法采用两阶段设计。在类别平衡掩码选择中,我们通过优先考虑低表示类别的贪婪策略构建代表性掩码集。在扩散引导图像合成中,我们使用预训练的布局到图像扩散模型生成以所选掩码为条件的图像,自然确保空间对齐。为进一步增强合成数据的训练效用,我们引入具有两个互补目标的引导扩散采样:用于像素级对齐的分割一致性损失,以及用于对齐跨层每类特征统计的类级特征匹配损失。大量实验证明了D3S2的优越性。值得注意的是,在1%的极低压缩率下,我们的方法在ADE20K和COCO-Stuff上使用Mask2Former (Swin-S)分别达到24.99%和35.49%的mIoU,比随机选择分别高出9.34%和5.70%。

英文摘要

Dataset distillation (DD) aims to compress large-scale datasets into compact synthetic sets while preserving training efficacy. However, existing studies mainly focus on image classification, leaving dense prediction tasks such as semantic segmentation largely underexplored. In this work, we identify three key challenges for segmentation DD: (i) long-tailed class imbalance, (ii) the need for strict pixel-wise alignment between images and dense labels, and (iii) the high computational cost of optimizing high-resolution data with complex models. To address these challenges, we propose D3S2, a Diffusion-guided Dataset Distillation framework for Semantic Segmentation. Our method adopts a two-stage design. In Class-Balanced Mask Selection, we construct a representative mask set via a greedy strategy that prioritizes underrepresented classes. In Diffusion-Guided Image Synthesis, we employ a pretrained layout-to-image diffusion model to generate images conditioned on the selected masks, naturally ensuring spatial alignment. To further enhance the training utility of synthesized data, we introduce guided diffusion sampling with two complementary objectives: a segmentation-consistency loss for pixel-level alignment, and a class-wise feature matching loss for aligning per-class feature statistics across layers. Extensive experiments demonstrate the superiority of D3S2. Notably, at an extremely low compression rate of 1%, our method achieves 24.99% and 35.49% mIoU on ADE20K and COCO-Stuff with Mask2Former (Swin-S), outperforming random selection by 9.34% and 5.70%, respectively.

2605.25020 2026-05-26 cs.AI cs.CL

Privacy-Preserving Local Language Models for Longitudinal Data Retrieval in Chronic Dermatologic Disease: Implementation in Pemphigus Patients

慢性皮肤病纵向数据检索中的隐私保护本地语言模型:在天疱疮患者中的实施

Abdurrahim Yilmaz, Ayşe Esra Koku Aksu, Duygu Yamen, Vefa Asli Erdemir, Mehmet Salih Gurel, Gulsum Gencoglan, Joram M. Posma, Burak Temelkuran

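AI summary: This study evaluates whether a locally deployed, privacy-preserving small language model (SLM) can retrieve structured clinical features and generate longitudinal summaries from the long-term follow-up records of pemphigus patients. The SLM reached a mean accuracy of 82.25% on feature retrieval, and physicians rated the AI-generated summaries highly for quality, clinical accuracy, and usefulness.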

详情
AI中文摘要

慢性皮肤病如天疱疮需要长期随访,产生大量纵向临床文档,在常规就诊期间难以全面审查,增加了临床医生的工作量以及遗漏关键历史信息的风险。我们评估了本地部署的隐私保护小型语言模型(SLM)是否能够从长期皮肤科随访记录中检索结构化临床特征并生成纵向摘要。在这项回顾性病例系列研究中,30名天疱疮患者贡献了541份就诊记录,汇总为完整的纵向记录(89,336词);由两位皮肤科专家标注了56个临床相关特征。本地部署的SLM(Qwen3 4B Thinking 2507)对每份完整记录进行查询,以检索56个特征并生成一份最终报告摘要。在1,680个特征检索任务中,平均准确率为82.25%。皮肤科医生对AI生成摘要的整体质量(8.23-8.47)、临床准确性(7.93-8.20)和实用性(8.47-8.50)评分较高,评估者间无显著差异,且在53.3%的评估中总体偏好AI摘要。这些发现表明,隐私保护的本地部署SLM可以优于医学专家,并可靠地生成有临床意义的纵向摘要。在适当监督下,SLM可以支持临床决策。

英文摘要

Chronic dermatologic diseases such as pemphigus require long-term follow-up, generating extensive longitudinal clinical documentation that is difficult to review comprehensively during routine visits, which increases clinician workload as well as the risk of missing critical historical information. We evaluated whether a locally deployed, privacy-preserving small language model (SLM) could retrieve structured clinical features and generate longitudinal summaries from long-term dermatology follow-up records. In this retrospective case series, thirty pemphigus patients contributed 541 visit notes that were aggregated into full longitudinal records (89,336 words); 56 clinically relevant features were annotated by two expert dermatologists. The locally deployed SLM (Qwen3 4B Thinking 2507) was queried with each complete record to retrieve the 56 features and generate one final report summary. Across 1,680 feature retrieval tasks, mean accuracy was 82.25%. Dermatologists' ratings of AI-generated summaries were high for overall quality (8.23-8.47), clinical accuracy (7.93-8.20), and usefulness (8.47-8.50), with no significant inter-evaluator differences and an overall preference for AI summaries in 53.3% of evaluations. These findings suggest that privacy-preserving, locally deployed SLMs can outperform medical experts and reliably generate clinically meaningful longitudinal summaries. SLMs may support clinical decision-making when integrated with appropriate oversight.

2605.25014 2026-05-26 cs.CV

Stop Denoising Your Blurs

停止去噪你的模糊

Sasidhar Parvathireddy, Vamsidhar Saraswathula, Rama Krishna Gorthi

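AI summary: Proposes ConvDiff, a framework that replaces additive noise with convolution to build a blur degradation trajectory, enabling diffusion-based image deblurring and bridging the gap between the mathematics of blur and the design of diffusion algorithms.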

Comments Accepted at IEEE International Conference on Image Processing (ICIP) 2026. 7 pages, 3 figures

详情
AI中文摘要

近年来,扩散模型在图像恢复任务中取得了显著性能。其核心机制依赖于在加性噪声操作之前对退化先验的受限假设。然而,模糊模型作为最广泛研究的退化形式之一,违反了这一假设,因为它本质上基于卷积而非加法。在本文中,我们引入了ConvDiff,一种新颖的基于扩散的框架,该框架用卷积替代加法操作,用于图像去模糊任务。在前向过程中,我们利用卷积的频域特性,从清晰图像到其模糊对应物构建有意义的轨迹,而不是用加性噪声逐步破坏图像。虽然当前工作针对高斯模糊实例化了该框架(其中频域分解产生闭式且物理有效的中间状态),但从模糊算子构建退化轨迹的基本原则自然扩展到其他模糊族。该公式弥合了模糊的数学原理与基于扩散的恢复算法的迭代设计之间的差距,从而实现了更物理基础且有效的图像恢复模型。

英文摘要

In recent times, diffusion models have achieved remarkable performance in image restoration tasks. Their core mechanism relies on the restricted presumption of degradation prior to the additive noise operation. However, the blur model, one of the most widely studied degradation formulations, violates this assumption, as it is inherently based on convolution rather than addition. In this paper, we introduce ConvDiff, a novel diffusion-based framework that substitutes the additive operation with convolution for the task of image deblurring. In the forward process, we construct a meaningful trajectory from the clean image to its blurred counterpart by exploiting the frequency-domain characteristics of convolution, rather than progressively corrupting the image with additive noise. While the current work instantiates this framework for Gaussian blur, where frequency-domain decomposition yields closed-form and physically valid intermediate states, the underlying principle of constructing degradation trajectories from the blur operator extends naturally to other blur families. This formulation bridges the gap between the mathematical principles of blurring and the iterative design of diffusion-based restoration algorithms, enabling more physically grounded and effective image restoration models.
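
The closed-form intermediate states mentioned for Gaussian blur can be illustrated with a small sketch. This is not the ConvDiff construction, which the abstract does not spell out. It only shows the standard fact that a fractional power of a Gaussian transfer function is again a Gaussian transfer function, so every point on a clean-to-blurred path is itself a valid Gaussian blur. The function names, the square-root schedule, and the sample values are assumptions made for illustration.

```python
import math

def gaussian_transfer(freq, sigma):
    """Frequency response of a Gaussian blur kernel with standard deviation sigma."""
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * freq ** 2)

def intermediate_sigma(sigma_final, step, num_steps):
    """Blur width at an intermediate step of a hypothetical clean-to-blurred path."""
    return sigma_final * math.sqrt(step / num_steps)

def trajectory_response(freq, sigma_final, step, num_steps):
    """Fractional power of the full transfer function: H(f) ** (step / num_steps)."""
    return gaussian_transfer(freq, sigma_final) ** (step / num_steps)

# Hypothetical settings: final blur width, number of steps, one sample frequency.
sigma_final, num_steps, freq = 3.0, 10, 0.05
for step in (0, 5, 10):
    direct = gaussian_transfer(freq, intermediate_sigma(sigma_final, step, num_steps))
    fractional = trajectory_response(freq, sigma_final, step, num_steps)
    print(step, round(direct, 6), round(fractional, 6))
```

The two printed columns agree at every step: step 0 leaves the image untouched and the last step applies the full blur.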

2605.25012 2026-05-26 cs.CV

Learning from Semantic Dictionaries: Discriminative Codebook Contrastive Learning for Unified Visual Representation and Generation

从语义字典中学习:面向统一视觉表示与生成的判别式码本对比学习

Imanol G. Estepa, Jesús M Rodríguez-de-Vera, Bhalaji Nagarajan, Petia Radeva

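AI summary: Proposes LEASE, a framework with a paired generative-discriminative codebook design that jointly optimizes a masked reconstruction loss and a codebook contrastive loss in a discrete token space, unifying visual representation and generation and achieving state-of-the-art performance on ImageNet-1K.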

Comments Accepted at CVPR'26

详情
AI中文摘要

判别式和生成式视觉模型在各自领域表现出色,但在语义上存在错位,阻碍了统一视觉学习的进展。我们提出LEASE(从语义字典中学习),一种自监督框架,通过配对生成-判别码本设计弥合这一差距。LEASE完全在通过一次性预计算步骤产生的离散标记空间中运行,无需数据增强、教师模型或在线分词器即可高效训练。LEASE整合了两个互补目标:捕获细粒度生成细节的掩码标记重建损失,以及通过自适应质心加权将编码器特征与判别语义对齐的码本对比损失。这种双重监督产生了一个统一潜在空间,同时支持高质量生成和强大的表示学习。在ImageNet-1K上,LEASE实现了最先进的统一性能,在线性探测(相比MAGE和Sorcen提升高达+1.7%)、无条件生成(相比MAGE FID降低1.26,IS提升10.19)、少样本学习(相比Sorcen平均提升+0.56%)、迁移学习(相比MAGE和Sorcen平均提升+0.75%)以及鲁棒性基准(相比MAGE和Sorcen平均提升+5.86%和+4.25%)上均优于先前的VQGAN方法如MAGE和Sorcen。它还能与领域专用的对比和生成模型竞争,同时超越先前的MIM方法。无监督的LEASE模型还可以通过在其学习表示基础上构建扩展到条件生成,与专用基线相比具有竞争力。总体而言,LEASE为联合理解和生成视觉内容的通用视觉模型提供了高效且有效的一步。

英文摘要

Discriminative and generative vision models excel in their respective domains but remain semantically misaligned, hindering progress toward unified visual learning. We introduce LEASE (LEArning from SEmantic Dictionaries), a self-supervised framework that bridges this gap using a paired generative-discriminative codebook design. LEASE operates entirely in a discrete token space produced through a one-time precomputation step, enabling efficient training without data augmentations, teacher models, or online tokenizers. LEASE integrates two complementary objectives: a masked token reconstruction loss that captures fine-grained generative detail, and a codebook contrast loss that aligns encoder features with discriminative semantics via adaptive centroid weighting. This dual supervision yields a unified latent space that supports both high-quality generation and strong representation learning. On ImageNet-1K, LEASE achieves state-of-the-art unified performance, outperforming prior VQGAN-based methods such as MAGE and Sorcen across linear probing (up to +1.7%), unconditional generation (-1.26 FID and +10.19 IS relative to MAGE), few-shot learning (+0.56% on average against Sorcen), transfer (+0.75% average improvement against MAGE and Sorcen), and robustness benchmarks (+5.86% and +4.25% average improvement against MAGE and Sorcen, respectively). It also competes favorably with domain-specialized contrastive and generative models while surpassing previous MIM methods. The unsupervised LEASE model can also be extended to conditional generation by building upon its learned representations, proving competitive with specialized baselines. Overall, LEASE provides an efficient and effective step toward general-purpose vision models that jointly understand and generate visual content.

2605.25011 2026-05-26 cs.LG

A perspective on fluid mechanical environments for challenges in reinforcement learning

强化学习挑战中的流体力学环境视角

Shruti Mishra, Michael Chang, Vamsi Spandan, Shmuel M. Rubinstein

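AI summary: Proposes canonical fluid mechanics problems as testbeds for reinforcement learning, using the design of state spaces, action spaces, and reward functions in environments with nonlinear instabilities to foster agents that interact efficiently with high-dimensional, dynamic environments.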

详情
AI中文摘要

我们考虑开发能够高效与高维、演化环境交互的智能体所面临的挑战,旨在实现与开放世界交互的实际强化学习智能体,这些智能体仅能观察并影响世界的一小部分。我们认为,经典流体力学问题及其模拟为这类方法的开发提供了一个引人注目的测试平台。这些问题出现在非线性不稳定性中,其中小扰动可能增长并改变系统的动力学。非线性不稳定性代表了若干具有工业应用的开放科学挑战——液体射流的液滴破碎、两种流体界面的混合,以及海洋中异常高的怪浪的出现。在这些设置中,智能体可以利用跨变化动力学的保留表示来高效学习。 我们提出了两个智能体与流体力学环境交互的问题描述,并描述了这些智能体的状态空间、动作空间和奖励函数。对于这些示例,我们指定了环境的非平稳方面以及保留的不变性。我们注意到Dedalus和JAX-CFD是可用于开发强化学习方法的开源模拟器(Burns等人,2016;Kochkov等人,2021)。我们通过创建在Dedalus模拟的静态环境中学习导航的强化学习智能体,展示了Dedalus在环境生成中的使用。这为未来开发能够有意义地与代表自然和工业流动中科学挑战的模拟环境交互的强化学习智能体奠定了基础。

英文摘要

We consider the challenge of developing agents that efficiently interact with high-dimensional, evolving environments, towards a view of practical reinforcement learning (RL) agents interacting with open worlds, of which they witness and affect only a small part. We argue that canonical fluid mechanics problems, and their simulations, present a compelling testbed for the development of such methods. These problems arise in nonlinear instabilities, where small disturbances can grow to transform the dynamics of a system. Nonlinear instabilities represent several open scientific challenges with industrial applications -- the droplet breakup of a liquid jet, mixing at an interface between two fluids, and the appearance of unusually tall rogue waves in the ocean. In these settings, agents may leverage preserved representations across the changing dynamics to learn efficiently. We present two problem descriptions of agents interacting with a fluid mechanical environment, and describe the state and action spaces, and reward functions, for these agents. For these examples, we specify the aspects of the environment which are nonstationary and the preserved invariances. We note Dedalus and JAX-CFD as open-source simulators that can be used for the development of reinforcement learning methods (Burns et al., 2016; Kochkov et al., 2021). We demonstrate the use of Dedalus for environment generation by creating RL agents that learn to navigate in a stationary environment that is simulated using Dedalus. This sets the stage for future development of RL agents that learn to meaningfully interact with simulated environments that represent scientific challenges in natural and industrial flows.

2605.25009 2026-05-26 cs.CV

ClueAegis: Heuristic-to-Reasoning Cognitive-skill Learning for Unified Evidence-based Synthetic Image Detection

ClueAegis:面向统一基于证据的合成图像检测的启发式到推理认知技能学习

Huangsen Cao, Hongkang Chu, Yuxi Li, Ying Zhang, Chen Li, Jing Lyu, Yongwei Wang, Yu Zhao, Fei Wu

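AI summary: To address the lack of structured forensic reasoning in existing synthetic image detection methods, the authors propose ClueAegis, a heuristic-to-reasoning cognitive-skill learning framework whose two-stage agentic pipeline performs skill selection followed by evidence-guided reasoning, achieving state-of-the-art performance in cross-domain generalization and robustness.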

详情
AI中文摘要

生成模型的快速发展使合成图像越来越逼真,挑战了可靠的检测。现有方法通常局限于端到端分类或单一推理,因此无法建模结构化的取证推理和异构视觉证据。我们从认知角度重新审视合成图像检测,提出了一种启发式到推理的认知技能学习框架,用于基于证据的取证分析。给定输入图像,我们的框架首先提取启发式感知线索,选择最优取证技能,然后执行技能条件推理以进行证据提取和决策。为支持这一范式,我们引入了ClueAegis-Bench,它将合成图像检测分解为显式标注的取证认知技能,以实现超越二分类的结构化评估。基于该基准,我们提出了ClueAegis(面向统一基于证据的合成图像检测的认知技能学习),一个两阶段智能体框架,执行启发式技能选择,然后通过技能条件工具链进行证据引导推理。该设计将合成图像检测重新表述为一个可配置的多技能推理过程,桥接了感知、技能选择和取证推理。大量实验表明,ClueAegis在提升跨域泛化和鲁棒性的同时实现了最先进的性能。它还提供了透明的推理轨迹和结构化的取证证据,为传统的端到端检测器提供了更可解释的替代方案。

英文摘要

The rapid advancement of generative models has made synthetic images increasingly realistic, challenging reliable detection. Existing methods are often limited to end-to-end classification or monolithic reasoning, and thus fail to model structured forensic reasoning and heterogeneous visual evidence. We revisit synthetic image detection from a cognitive perspective and propose a Heuristic-to-Reasoning cognitive skill learning framework for evidence-based forensic analysis. Given an input image, our framework first extracts heuristic perceptual clues, selects the optimal forensic skill, and then performs skill-conditioned reasoning for evidence extraction and decision making. To support this paradigm, we introduce ClueAegis-Bench, which decomposes synthetic image detection into explicitly annotated forensic cognitive skills for structured evaluation beyond binary classification. Based on this benchmark, we propose ClueAegis (Cognitive-skill Learning for Unified Evidence-based Synthetic Image Detection), a two-stage agentic framework that conducts heuristic skill selection followed by evidence-guided reasoning through skill-conditioned toolchains. This design reformulates synthetic image detection as a configurable multi-skill reasoning process that bridges perception, skill selection, and forensic reasoning. Extensive experiments show that ClueAegis achieves state-of-the-art performance while improving cross-domain generalization and robustness. It also provides transparent reasoning trajectories and structured forensic evidence, offering a more explainable alternative to conventional end-to-end detectors.

2605.25005 2026-05-26 cs.RO

Stiffness Optimization for Concentrated Bending in Magnetically Actuated Catheters: Maintaining Steerability under Gradient Stiffness

磁驱动导管集中弯曲的刚度优化:在梯度刚度下保持可操控性

Jiewen Tan, Junnan Xue, Shing Shin Cheng, Shuang Song, Erli Lyu, Jiaole Wang

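AI summary: To address the trade-off between pushability and proximally concentrated bending in magnetically actuated soft catheters, the authors propose a stiffness-optimized multi-segment magnetically actuated catheter (SO-MAC). A decoupled steering-advancement mechanism and a gradient-stiffness architecture keep bending about a stable proximal pivot during advancement, while the distal section passively self-straightens to transmit propulsion.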

详情
AI中文摘要

对于磁驱动软导管,实现高效的推送性(推进力传递)和近端集中弯曲以保持可操控性具有挑战性:较高的轴向/弯曲刚度可改善力传递但降低可操控性,而较低的刚度可实现大的近端集中弯曲,但在压缩推送载荷下增加扭结/屈曲风险。为了解决这一权衡,我们提出了一种刚度优化的多段磁驱动导管(SO-MAC),它集成了解耦的转向-推进机构与梯度刚度架构。SO-MAC在推进过程中将弯曲集中在稳定的近端枢轴周围,而远端部分通过优化的刚度分布和弹簧骨架的弹性恢复抵抗摩擦引起的扭结/屈曲,被动自直以传递推进力。在$0{-}180^{\circ}$的组合转向和推进过程中,枢轴保持稳定,远端尖端几乎直线地向目标方向推进。直径为1.5 mm的SO-MAC在其10 mm尖端处实现了高达$180^{\circ}$的转向,弯曲半径为3 mm,平均形状误差为$1.39 \pm 0.56$ mm,转向枢轴误差为$0.35 \pm 0.10$ mm。在支气管模型中的视觉反馈控制进一步验证了通过高度弯曲的分叉路径的鲁棒导航。

英文摘要

Achieving both efficient pushability (propulsion transmission) and proximally concentrated bending for steerability is challenging for magnetically actuated soft catheters: higher axial/bending stiffness improves force transmission but reduces steerability, whereas lower stiffness enables large, proximally concentrated bending yet increases kinking/buckling risk under compressive push loads. To address this trade-off, we propose a stiffness-optimized multi-segment magnetically actuated catheter (SO-MAC) that integrates a decoupled steering-advancement mechanism with a gradient-stiffness architecture. The SO-MAC concentrates bending about a stable proximal pivot during advancement while the distal section passively self-straightens to transmit propulsion, aided by the optimized stiffness distribution and elastic recovery of the spring backbone against friction-induced kinking/buckling. Over $0{-}180^{\circ}$ combined steering and advancement, the pivot remained stable and the distal tip advanced near-straight toward the target direction. A 1.5 mm-diameter SO-MAC achieved up to $180^{\circ}$ steering with a 3 mm bending radius at its 10 mm tip, with an average shape error of $1.39 \pm 0.56$ mm and a steering-pivot error of $0.35 \pm 0.10$ mm. Visual feedback control in a bronchial phantom further confirmed robust navigation through highly curved, bifurcating paths.

2605.25004 2026-05-26 cs.LG cs.AI

Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data

基于多源数据的都市尺度弹性可信交通流推断

Qishen Zhou, Yifan Zhang, Michail A. Makridis, Anastasios Kouvelas, Yibing Wang, Simon Hu

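AI summary: Proposes the Task-Aware Attentive Neural Process (TA-ANP), a unified probabilistic framework that fuses floating car data with sparse fixed-detector data to infer global traffic states with high accuracy and trustworthy uncertainty quantification, achieving the best performance on a metropolis-scale dataset.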

Comments The paper has been submitted to Elsevier for possible publication

详情
AI中文摘要

从稀疏观测中以高精度和可信的不确定性量化推断网络级交通状态对于智能交通系统至关重要,但由于问题的欠定性、传感网络的多方面干扰以及多个推断子任务在联合建模时的固有冲突,这仍然具有挑战性。我们提出了任务感知注意力神经过程(TA-ANP),这是一个统一的概率框架,通过融合浮动车数据(FCD)和稀疏的固定检测器测量,实现弹性且可信的全局交通状态推断(GTSI)。通过将GTSI视为一个随机过程,TA-ANP利用神经过程的元学习特性,无需重新训练即可快速适应传感配置的变化。引入了一个具有不同时空归纳偏置的任务感知多查询注意力模块,以联合处理三个GTSI子任务,同时减轻跨任务干扰。对于不确定性量化,我们将神经过程与蒙特卡洛丢弃法相结合,以捕获偶然不确定性和认知不确定性。为了支持都市尺度评估,我们构建了都市多源交通数据集(MMTD),该数据集整合了固定环路传感器测量、FCD统计数据和OpenStreetMap道路网络数据,覆盖了包含2371个路段的城市网络。在MMTD上的实验表明,TA-ANP在确定性和概率性指标下的所有子任务中均达到了最先进的性能。由此产生的良好校准的不确定性使得能够以更少的传感器部署实现更高效的固定传感器布局。在“损坏-修复-新增”传感生命周期下,TA-ANP在干扰吸收、性能恢复和对未见传感配置的适应性方面表现出卓越的弹性。

英文摘要

Inferring network-wide traffic states from sparse observations with high accuracy and trustworthy uncertainty quantification is essential for intelligent transportation systems, yet it remains challenging due to the underdetermined nature of the problem, multifaceted disturbances in sensing networks, and the inherent conflicts among multiple inference sub-tasks when modeled jointly. We propose the Task-Aware Attentive Neural Process (TA-ANP), a unified probabilistic framework for resilient and trustworthy global traffic state inference (GTSI) by fusing floating car data (FCD) with sparse fixed-detector measurements. By casting GTSI as a stochastic process, TA-ANP leverages the meta-learning properties of neural processes to adapt rapidly to changes in sensing configurations without retraining. A task-aware multi-query attention module with distinct spatiotemporal inductive biases is introduced to jointly handle three GTSI sub-tasks, while mitigating cross-task interference. For uncertainty quantification, we combine neural processes with Monte Carlo Dropout to capture both aleatoric and epistemic uncertainty. To support metropolis-scale evaluation, we construct the Metropolitan Multi-Source Traffic Dataset (MMTD), integrating fixed-loop sensor measurements, FCD statistics, and OpenStreetMap road-network data over an urban network of 2,371 road segments. Experiments on MMTD show that TA-ANP achieves state-of-the-art performance across all sub-tasks under deterministic and probabilistic metrics. The resulting well-calibrated uncertainties enable more efficient fixed-sensor placement with fewer sensor deployments. Under a Damage-Repair-Addition sensing lifecycle, TA-ANP demonstrates superior resilience in terms of disturbance absorption, performance recovery, and adaptability to unseen sensing configurations.

2605.25001 2026-05-26 cs.LG

Mitigating Gradient Pathology in PINNs through Aligned Constraint

通过对齐约束缓解PINN中的梯度病理

Yichen Luo, Peiyu Zhu, Dongxiao Hu, Jia Wang, Tailin Wu, Dapeng Lan, Yu Liu, Zhibo Pang

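AI summary: To address the local optima caused by gradient conflicts when training physics-informed neural networks, the authors propose a constraint-aligned loss with manifold lifting, which reformulates zeroth-order terms as aligned constraints and introduces a delay factor, significantly improving numerical stability and efficiency.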

详情
Journal ref
Forty-Third International Conference on Machine Learning (ICML 2026)
AI中文摘要

虽然物理信息神经网络(PINN)在求解偏微分方程(PDE)方面功能强大,但其训练常常因梯度病理而瘫痪。来自PDE残差和边界约束的梯度相互对抗,使模型陷入局部最小值。当前的解决方案,如自适应加权或硬约束,要么无法从根本上解决这种病态条件,要么仅限于简单几何形状。在本研究中,我们从损失景观和优化动态的角度系统分析了这种梯度病理的可能原因。基于所得结论,我们提出了带有流形提升的约束对齐损失(CAML)。通过将所有零阶项重新表述为对齐约束,我们的方法有效缓解了梯度冲突。此外,我们引入了一个延迟因子来帮助优化器跳过高曲率区域。实验表明,我们的CAML在高度复杂的PINN问题中显著增强了数值稳定性和效率。我们的代码已在https://github.com/YichenLuo-0/CAML上开源。

英文摘要

While Physics-Informed Neural Networks (PINNs) are powerful for solving Partial Differential Equations (PDEs), their training is often paralyzed by gradient pathology. The gradients from the PDE residuals and boundary constraints oppose each other, trapping the model in local minima. Current solutions, such as adaptive weighting or hard constraints, either fail to fundamentally resolve this ill-conditioning or are limited to simple geometries. In this study, we systematically analyze the possible causes of this gradient pathology from the perspectives of loss landscapes and optimization dynamics. Based on the obtained conclusion, we propose Constraint-Aligned loss with Manifold Lifting (CAML). By reformulating all zeroth-order terms into aligned constraints, our method effectively mitigates gradient conflicts. In addition, we introduce a delay factor to help the optimizer skip the high-curvature area. Experiments demonstrate that our CAML significantly enhances numerical stability and efficiency in highly complex PINN problems. Our code is open-sourced on https://github.com/YichenLuo-0/CAML.

2605.24998 2026-05-26 cs.CL

Better, Faster: Harnessing Self-Improvement in Large Reasoning Models

更好、更快:利用大型推理模型的自我改进

Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Leszek Rutkowski, Dacheng Tao

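AI summary: To address data imbalance and overthinking in self-improvement training, the authors propose HSIR, which improves reasoning performance and efficiency through verify-then-exit sampling and an intrinsic diversity score.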

Comments Accepted by ICML2026

详情
AI中文摘要

自我改进训练使大型推理模型(LRMs)能够在没有外部监督的情况下,通过自我生成推理轨迹作为训练数据来改进自身。然而,我们发现这种方法在复杂推理任务中常常表现不佳,甚至导致模型崩溃。通过一系列初步分析,我们揭示了两个问题:(1)数据不平衡,即大多数训练样本简单,但具有挑战性且关键的样本稀缺;(2)过度思考,即许多带有冗余推理步骤的不理想样本被用于自我训练。为此,我们提出HSIR,通过两种简单而有效的方法有效利用大型推理模型的自我改进。具体而言,HSIR引入了一种验证后退出采样策略,通过高效收集困难查询的更准确解决方案来缓解数据不平衡,并设计了内在多样性评分来量化过度思考并过滤掉不理想的解决方案。我们将HSIR应用于各种后训练范式,其中进一步提出了H-GRPO,一种增强的GRPO算法,利用内在多样性作为外部奖励,通过强化学习鼓励简洁且多样化的推理。大量结果表明,HSIR不仅有效提升了推理性能,即平均性能提升高达10.9%,而且通过减少高达42.4%的相对推理开销,显著提高了推理效率。

英文摘要

Self-improvement training enables large reasoning models (LRMs) to improve themselves by self-generating reasoning trajectories as training data without external supervision. However, we find that this method often falls short in complex reasoning tasks and even leads to model collapse. Through a series of preliminary analyses, we reveal two problems: (1) data imbalance, where most training samples are simple, but the challenging yet crucial samples are scarce; (2) overthinking, where many undesired samples with redundant reasoning steps are used for self-training. To this end, we propose HSIR, which effectively Harnesses Self-Improvement in large Reasoning models via two simple-yet-effective approaches. Specifically, HSIR introduces a verify-then-exit sampling strategy to mitigate data imbalance by efficiently collecting more accurate solutions for difficult queries, and designs an Intrinsic Diversity score to quantify overthinking and filter out the undesired solutions. We apply HSIR to various post-training paradigms, among which we further propose H-GRPO, an enhanced GRPO algorithm that leverages the intrinsic diversity as an external reward to encourage concise and diverse reasoning via reinforcement learning. Extensive results show that HSIR not only effectively enhances reasoning performance, bringing average performance gains of up to +10.9%, but also significantly improves reasoning efficiency by reducing relative inference overhead by up to 42.4%.

2605.24996 2026-05-26 cs.CL

Exploring Profiles of Cognitive Distortions Associated with Mental Health Disorders

探索与心理健康障碍相关的认知扭曲特征

Alina Anikejeva, Kairit Sirts

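AI summary: Using an n-gram-based method and a fine-tuned Transformer model, this study analyzes cognitive distortions in Reddit data from nine self-reported mental health groups and a control group. The mental health groups show higher levels of distortion, and distortion patterns are similar across groups, suggesting that simple lexical methods can be used to explore group-level trends in large-scale mental health text.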

Comments CLPsych 2026

详情
AI中文摘要

认知扭曲,即扭曲的思维模式,在计算心理健康研究中受到越来越多的关注。尽管它们与许多(如果不是全部)心理健康障碍相关,但现有研究主要关注抑郁症。在这项工作中,我们探索了多种心理健康状况下的扭曲特征。我们分析了一个大型Reddit数据集,包含来自九个自我报告心理健康群体以及一个对照组的帖子,使用基于n-gram的方法和微调Transformer模型来检测认知扭曲。心理健康群体(无论是合并还是单独检查)与对照组相比,表现出更高的认知扭曲发生率,效应大小从小到中等。在比较不同状况下的扭曲特征时,我们观察到大致相似的模式,尽管某些群体整体上表现出比其他群体更高的扭曲水平。这些发现表明,相对简单的词汇方法可用于大规模心理健康文本数据中群体趋势的探索性分析。

英文摘要

Cognitive distortions, distorted patterns of thinking, have been increasingly studied in computational mental health research. Although they are related to many, if not all, mental health disorders, most existing studies focus primarily on depression. In this work, we explore distortion profiles across multiple mental health conditions. We analyzed a large Reddit-based dataset containing posts from nine self-reported mental health groups as well as a control group using both an n-gram-based method and a fine-tuned transformer model for detecting cognitive distortions. Mental health groups, both when pooled together and when examined individually, showed higher prevalence of cognitive distortions compared to the control group, with the effect sizes ranging from small to moderate. When comparing distortion profiles across conditions, we observed largely similar patterns, although some groups exhibited overall higher levels of distortions than others. These findings suggest that relatively simple lexical approaches can be useful for exploratory analyses of group-level trends in large-scale mental health text data.

2605.24993 2026-05-26 cs.AI cs.CV

NeurIPS: Neuro-anatomical Inductive Priors for Sphere-based Brain Decoding

NeurIPS: 基于球面的脑解码的神经解剖学归纳先验

Sijin Yu, Zijiao Chen, Zhenyu Yang, Zihao Tan, Jiakun Xu, Zhongliang Liu, Shengxian Chen, Wenxuan Wu, Xiangmin Xu, Xin Zhang

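AI summary: Proposes the NeurIPS framework, which turns anatomical variation into an inductive prior through a selective ROI spherical tokenizer and a structure-guided mixture of experts, achieving state-of-the-art performance among surface decoders on the Natural Scenes Dataset and significantly improving training efficiency.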

Comments International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

当前的fMRI解码器面临性能-保真度权衡,其中高效的ID编码器优于几何保真的表面模型。我们认为这部分是由于低效的表面分词化以及未能将解剖学用作预测信号。我们提出NeurIPS,一个通过将解剖变异从干扰因素重新定义为强大的归纳先验来改进表面解码的框架。NeurIPS结合了两项创新:用于高效几何编码的选择性ROI球形分词器(SRST),以及使用皮层特征显式建模个体解剖的结构引导专家混合模型(SG-MoE)。在自然场景数据集上,NeurIPS为表面解码器建立了新的最先进水平,并实现了与强1D基线相当的性能。这是以空前的效率实现的,因为模型收敛速度显著加快(10个epoch对比600个epoch)。这种效率使得仅使用20%的数据即可快速适应新受试者,并确保随着训练队列扩大而稳健扩展。消融实验提供了因果证据,表明这些收益源于模型使用皮层特征,而非记忆受试者ID。通过利用解剖先验,NeurIPS为稳健、可泛化的脑解码提供了一条有原则且可扩展的路径。

英文摘要

Current fMRI decoders face a performance-fidelity trade-off where efficient ID encoders outperform geometrically faithful surface-based models. We argue this is partly driven by inefficient surface tokenization and the failure to use anatomy as a predictive signal. We present NeurIPS, a framework that improves surface-based decoding by reframing anatomical variation from a nuisance to a powerful inductive prior. NeurIPS unites two innovations: a Selective ROI Spherical Tokenizer (SRST) for efficient geometric encoding, and a Structure-Guided Mixture of Experts (SG-MoE) that explicitly models individual anatomy using cortical features. On the Natural Scenes Dataset, NeurIPS establishes a new state-of-the-art for surface decoders and achieves performance comparable to strong 1D baselines. This is achieved with unprecedented efficiency, as the model converges dramatically faster (10 vs. 600 epochs). This efficiency enables rapid adaptation to new subjects using only 20% of data and ensures robust scalability as the training cohort is expanded. Ablations provide causal evidence that these gains are driven by the model's use of cortical features, not by memorizing subject IDs. By leveraging anatomical priors, NeurIPS provides a principled and scalable path toward robust, generalizable brain decoding.

2605.24989 2026-05-26 cs.LG cs.AI cs.IR

Selective Test-Time Compute Scaling for Click-Through Rate Prediction via Uncertainty-Triggered Feature Path Exploration

基于不确定性触发的特征路径探索的点击率预测选择性测试时计算扩展

Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

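AI summary: To address the uncertainty caused by sparse training data in click-through rate prediction, the authors propose UTTSI, a training-free, model-agnostic framework. A dual-signal estimator distinguishes epistemic uncertainty from aleatoric ambiguity, and uncertain instances receive adaptive feature filtering and stochastic feature-path exploration. Average overhead is about 2.8 times the base model cost with worst-case latency unchanged, and both offline experiments and an online A/B test show significant gains.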

Comments 12 pages, 4 Figures, 3 Tables

详情
AI中文摘要

扩展测试时计算对语言模型已被证明非常有效,然而这一机会在工业点击率(CTR)预测中仍未得到充分探索。CTR模型存在一个根本的不对称性:训练中充分表示的特征组合产生自信的预测,而稀疏观察到的特征组合则产生不可靠的输出。现有的训练阶段解决方案(如自适应门控)学习一个固定的选择函数,但受限于相同的稀疏性,在部署时无法提供针对每个实例的补救措施。我们提出UTTSI(不确定性触发的测试时选择性推理),一个无需训练、模型无关的框架,将推理深度按比例扩展到每个实例的不确定性。一个结合模型logit置信度和数据级频率先验的双信号估计器区分认知不确定性和偶然模糊性。每个实例都经过自适应特征过滤以去除不可靠的嵌入;不确定的实例额外接受随机特征路径探索,其预测通过一致性加权集成进行聚合。自信的实例完全绕过探索,保持平均开销约为基础模型成本的2.8倍,最坏情况延迟不变。在四个数据集和三种骨干架构上的实验表明,与所有训练阶段基线相比,取得了持续且统计显著的增益。为期七天的在线A/B测试进一步证实了5.3%的相对CTR提升(p < 0.01),确立了选择性测试时计算分配作为CTR预测训练阶段进展的实用补充。

英文摘要

Scaling test-time compute has proven highly effective for language models, yet this opportunity remains largely unexplored for industrial Click-Through Rate (CTR) prediction. CTR models suffer from a fundamental asymmetry: feature combinations well-represented in training yield confident predictions, while sparsely observed ones produce unreliable outputs. Existing training-phase solutions such as adaptive gating learn a fixed selection function subject to the same sparsity, offering no per-instance recourse at deployment. We propose UTTSI (Uncertainty-Triggered Test-Time Selective Inference), a training-free model-agnostic framework that scales inference depth proportionally to per-instance uncertainty. A dual-signal estimator combining model logit confidence with a data-level frequency prior distinguishes epistemic uncertainty from aleatoric ambiguity. Every instance undergoes adaptive feature filtering to remove unreliable embeddings; uncertain instances additionally receive stochastic feature-path explorations whose predictions are aggregated via consistency-weighted ensembling. Confident instances bypass exploration entirely, keeping average overhead at approximately $2.8\times$ base model cost with worst-case latency unchanged. Experiments on four datasets with three backbone architectures demonstrate consistent, statistically significant gains over all training-phase baselines. A seven-day online A/B test further confirms a 5.3% relative CTR gain ($p < 0.01$), establishing selective test-time compute allocation as a practical complement to training-phase advances for CTR prediction.

2605.24985 2026-05-26 cs.RO cs.LG physics.comp-ph

Learning, locomotion, and navigation of soft synthetic snakes in three-dimensional, heterogeneous environments

软体合成蛇在三维异质环境中的学习、运动与导航

Xiaotian Zhang, Ali Albazroun, Tixian Wang, Songyuan Cui, Prashant G. Mehta, Mattia Gazzola

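AI summary: Proposes a reinforcement learning framework built on bio-inspired actuation and sensing models that enables soft synthetic snakes to navigate unstructured 3D terrain autonomously, with robustness validated in high-fidelity environments.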

Comments 14 pages, 5 figures

详情
AI中文摘要

无肢陆地动物表现出卓越的运动多样性和控制能力,目前尚无法被工程对应物所超越。在这里,我们引入了一个计算框架,使软体合成蛇能够导航非结构化的、异质的三维地形。我们的方法基于仿生驱动和感知模型,这些模型降低了高自由度连续体固有的控制复杂性。这些模型被集成到强化学习架构中,以推导出穿越环境的策略。训练首先在简化的同质地形中进行,以学习运动基元。然后,这些基元被组合成针对复杂地形的自适应策略。我们通过将蛇部署在从真实世界成像重建的高保真三维环境中来展示鲁棒性,实现了可靠的导航。总体而言,这项工作为自然地形中连续系统的控制提供了一个物理真实的仿真平台和实用见解。

英文摘要

Limbless terrestrial animals exhibit exceptional locomotor versatility and control, currently unmatched by engineered counterparts. Here, we introduce a computational framework that enables soft synthetic snakes to navigate unstructured, heterogeneous 3D terrains. Our approach is grounded in bio-inspired actuation and sensing models that reduce the control complexity inherent to high-degree-of-freedom, continuum bodies. These models are integrated into a reinforcement learning architecture to derive environment-traversing policies. Training first occurs in simplified, homogeneous terrains to learn locomotion primitives. These are then composed into adaptive strategies for complex landscapes. We demonstrate robustness by deploying a snake in high-fidelity 3D environments reconstructed from real-world imaging, achieving reliable navigation. Overall, this work provides a physically-realistic simulation platform and practical insights for the control of continuum systems in natural terrains.

2605.24983 2026-05-26 cs.LG

Benchmarking non-conformity score functions in conformal prediction

共形预测中非一致性评分函数的基准测试

Sol Erika Boman

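AI summary Surveys the properties of non-conformity score functions in conformal prediction, proposes original modifications and an evaluation method, and experimentally compares the performance of different functions in balanced and imbalanced class settings.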

Comments 3 tables, 1 supplementary table, 1 supplementary figure

Details
AI Chinese abstract

共形预测是机器学习分类中模型校准的一种有用且多功能的替代方法。它将单类预测替换为预测集,保证预测集包含真实类别的\textit{先验}概率大于或等于预指定的比率。预测集的大小和有用性在很大程度上取决于非一致性评分函数的选择。科学文献中包含许多非一致性评分函数的例子,但缺乏研究其性质和有效性的工作。在本文中,我们概述了非一致性评分函数的性质。我们给出了现有文献中的非一致性评分函数示例,并引入了原始修改。我们提出了一种评估共形预测器预测集大小的原始方法,并用它来比较非一致性评分函数。我们还研究了不同非一致性评分函数在类别不平衡设置下用于类别条件共形预测的有效性。

English abstract

Conformal prediction is a useful and versatile alternative to model calibration in machine learning classification. It replaces single-class prediction with prediction sets, guaranteeing that the \textit{a priori} probability of the prediction sets containing the true class is larger than or equal to a pre-specified rate. The size and usefulness of the prediction sets rely heavily on the choice of the non-conformity score function. The scientific literature contains many examples of non-conformity score functions, but there is an absence of studies examining their properties and effectiveness. In this paper, we give an overview of properties of non-conformity score functions. We give examples of non-conformity score functions in the existing literature and introduce original modifications. We introduce an original method of evaluating the prediction set sizes of conformal predictors and use it to provide a comparison between non-conformity score functions. We also examine the efficacy of different non-conformity score functions for class-conditional conformal prediction in a setting with imbalanced classes.
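
For readers unfamiliar with the setup, the role of the non-conformity score can be sketched with standard split conformal classification. This is a generic textbook construction, not the paper's benchmark code. The score used here (one minus the probability assigned to a class) is only one common choice and is not necessarily among the functions the paper studies. The function names are chosen for illustration.

```python
import math


def score(probs, label):
    """Non-conformity score: one minus the probability assigned to `label`."""
    return 1.0 - probs[label]


def calibrate(cal_probs, cal_labels, alpha):
    """Conformal threshold for miscoverage level `alpha`.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    which is the usual finite-sample correction.
    """
    scores = sorted(score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")   # too few calibration points: every class is kept
    return scores[k - 1]


def prediction_set(probs, threshold):
    """All classes whose score does not exceed the calibrated threshold."""
    return [c for c in range(len(probs)) if score(probs, c) <= threshold]
```

If calibration and test points are exchangeable, the returned set contains the true class with probability at least `1 - alpha`. Swapping in a different `score` function changes the size of the sets but not this guarantee, which is why the choice of score matters for usefulness rather than validity.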

2605.24981 2026-05-26 cs.CL cs.LG

Large Language Model Selection with Limited Annotations

有限标注下的大语言模型选择

Yavuz Durmazkeser, Patrik Okanovic, Andreas Kirsch, Torsten Hoefler, Nezihe Merve Gürel

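AI summary Proposes the SELECT-LLM framework, which uses a query selection rule based on expected information gain to efficiently identify the best large language model under limited annotations, significantly reducing annotation cost.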

Comments 33 pages, 5 figures, 4 tables

Details
AI Chinese abstract

为给定任务选择大语言模型(LLM)需要比较许多强候选模型,然而标准评估依赖于固定评估集上的昂贵标注。为解决这一挑战,我们开发了SELECT-LLM,这是第一个用于主动模型选择LLM的框架。SELECT-LLM旨在找到一组查询,其标注对于识别给定任务的最佳LLM最具信息量。为此,我们引入了一种基于期望信息增益的查询选择规则,该规则通过候选模型输出之间的成对相似性计算。由于该规则仅使用生成的模型响应,SELECT-LLM可以在不假设候选模型架构或访问模型权重的情况下应用。这使得它适用于开源权重和黑盒LLM。我们在23个数据集、156个评估模型、多样化的任务族和多个文本评估指标上评估了SELECT-LLM。在所有实验中,SELECT-LLM在每个设置中都优于最强基线,最佳模型选择的标注成本降低高达81.8%,近最佳模型选择的标注成本降低高达84.78%。

English abstract

Choosing a Large Language Model (LLM) for a given task requires comparing many strong candidates, yet standard evaluation relies on costly annotations over fixed evaluation sets. To address this challenge, we develop SELECT-LLM, the first framework for active model selection of LLMs. SELECT-LLM aims to find a small set of queries whose annotations are most informative for identifying the best LLM for a given task. To this end, we introduce a query selection rule based on expected information gain, computed from pairwise similarities between candidate model outputs. Because this rule only uses generated model responses, SELECT-LLM can be applied across candidate models without assumptions about their architecture or access to model weights. This makes it suitable for both open-weight and black-box LLMs. We evaluate SELECT-LLM across 23 datasets, 156 evaluated models, diverse task families, and multiple text evaluation metrics. Across all experiments, SELECT-LLM improves over the strongest baseline in every setting, with annotation cost reductions up to 81.8% for best model selection and up to 84.78% for near-best model selection.
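
The intuition behind selecting queries from model outputs alone can be sketched as follows. This is not SELECT-LLM's expected-information-gain rule, which the abstract does not specify. It swaps in a much simpler disagreement heuristic: queries on which the candidate models' responses differ most are sent for annotation first. Token-level Jaccard similarity and all function names here are assumptions for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-set Jaccard similarity between two response strings."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def disagreement(outputs):
    """One minus the mean pairwise similarity of the candidate responses."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_queries(responses, budget):
    """Pick the `budget` queries whose responses disagree the most.

    `responses` maps a query id to the list of candidate model outputs.
    """
    ranked = sorted(responses, key=lambda q: disagreement(responses[q]), reverse=True)
    return ranked[:budget]
```

Like the rule described in the abstract, this heuristic needs only generated responses, so it applies equally to open-weight and black-box models. Unlike an information-gain criterion, it does not account for what earlier annotations have already revealed about which model is best.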