arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.01895 2026-06-02 cs.CV cs.AI

Collaborative Space Object Detection with Multi-Satellite Viewpoints in LEO Constellations

Xingyu Qu, Wenxuan Zhang, Peng Hu

AI summary: To address the challenges of space object detection in LEO constellations, the paper proposes fusing multi-view observations within a deep learning framework and feeding the multi-view data to YOLO-based detectors; experiments show that multi-view fusion significantly improves detection accuracy.

Abstract

With the growing number of satellites in low Earth orbit (LEO) constellations, the near-Earth space environment has become increasingly congested, making space object detection (SOD) a pressing challenge for space safety and sustainability. To mitigate collision risks and ensure the continuity of space operations, SOD systems must deliver fast and accurate detection under stringent onboard constraints. In this paper, we investigate the potential of multi-viewpoint observation fusion within a deep learning (DL) framework to enhance SOD performance. We design a practical multi-view pipeline and several input representations for feeding multi-view data into YOLO-based detectors. Our experiments show that using multi-view inputs is feasible in most cases and typically produces better results for mAP50 and mAP50-95. For example, with the YOLOv9-m model, moving from the single-view setting to a three-view fused RGB setting increases mAP50 from 0.638 to 0.732 and mAP50-95 from 0.227 to 0.276. Compared with the single-view setting, the best three-view grayscale configuration improves mAP50 by 36.3% and mAP50-95 by 46.5%. These findings establish multi-view fusion as a viable and effective strategy for SOD, with broad implications for space situational awareness in LEO constellation deployments.
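
The YOLOv9-m scores quoted above can be converted into relative gains with a few lines of arithmetic. The following is a minimal sketch; the helper name `relative_gain` is illustrative and does not come from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0

# Scores quoted in the abstract for YOLOv9-m:
# single view versus the three-view fused RGB setting.
map50_gain = relative_gain(0.638, 0.732)
map50_95_gain = relative_gain(0.227, 0.276)

print(f"mAP50 relative gain:    {map50_gain:.1f}%")
print(f"mAP50-95 relative gain: {map50_95_gain:.1f}%")
```

These RGB gains come out to roughly 15% and 22%. The larger 36.3% and 46.5% figures refer to the best three-view grayscale configuration, whose raw scores the abstract does not list, so they cannot be re-derived here.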

2606.01894 2026-06-02 cs.AI

Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations

Deyu Zhuang, Peiliang Gong, Yang Shao, Liyuan Shu, Qi Zhu, Xiaoli Li, Daoqiang Zhang

AI summary: Proposes the PC-MambaSDE framework, which uses a mask-aware continuous Mamba encoder and a physics-guided latent SDE to address physically implausible remaining-useful-life predictions under irregular observations.

Abstract

Accurate Remaining Useful Life prediction is critical for industrial predictive maintenance. However, real-world deployment is challenging due to the irregular nature of sensor observations, characterized by asynchronous sampling, burst missingness, and temporal jitter. Compounding this issue, purely data-driven models often generate physically implausible degradation trajectories that violate the irreversible nature of damage accumulation. To address this, we propose PC-MambaSDE, a unified continuous-time framework for robust RUL prediction under irregular observations. Specifically, we design a Mask-Aware Continuous Mamba Encoder that explicitly leverages observation masks to extract context-rich control signals. Furthermore, we introduce a Physics-Guided Latent SDE with parametrically rectified hybrid drift, superimposing a global physical bias to enforce monotonic degradation even amid severe observation gaps. Additionally, we formulate RUL prediction as a boundary value problem via a Terminal Degradation Penalty, which decouples a Health Index dimension and applies a penalty loss to guide trajectories toward the failure state. Theoretically, we prove that our variational objective is mathematically equivalent to minimizing the KL divergence via Girsanov's theorem, and we guarantee the global asymptotic stability of the learned dynamics through Lyapunov analysis. To enable rigorous evaluation, we develop a Hybrid Irregularity Generation Scheme that simulates realistic industrial imperfections. Extensive experiments on public benchmarks demonstrate that PC-MambaSDE significantly outperforms state-of-the-art methods, particularly under extreme observation scarcity, validating the efficacy of embedding physical priors into continuous-time latent dynamics.

2606.01892 2026-06-02 cs.CV

Adversarial Attacks on Robot Localization Systems via Deep Feature Perturbation

Zhenyu Li, Tianyi Shang

AI summary: Proposes an adversarial attack framework based on a Lightweight Product Quantization Network (LPQN) that perturbs query feature encodings to mislead retrieval in visual localization systems, exposing the vulnerability of deep-learning localization pipelines.

Comments
11 pages

Abstract

Robot localization systems are critical for autonomous navigation and safety. Adversarial perturbations can mislead these systems, resulting in mislocalization, navigation errors, or unsafe interactions, especially in mission-critical scenarios. This paper investigates the vulnerability of deep learning based localization pipelines to adversarial attacks. We propose a novel framework for generating adversarial queries that specifically target Product Quantization (PQ) in visual localization systems. Our method employs a Lightweight Product Quantization Network (LPQN) to perturb query feature encodings, misleading the retrieval process by returning semantically irrelevant database entries. Adversarial queries are generated via a two-phase procedure: a forward pass that perturbs feature distributions and a backward pass that refines the perturbation through optimization. The lightweight design of LPQN allows the creation of subtle yet highly effective perturbations with minimal computational overhead. Extensive experiments in both controlled and real-world robotic environments demonstrate that our approach substantially degrades PQN performance, exposing critical vulnerabilities in practical applications.
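
To make the attack surface concrete, here is a minimal sketch of plain product quantization, not the paper's LPQN. It shows how a small shift in a query feature can flip one sub-vector's codeword index and therefore change what a PQ index retrieves. The codebooks, vectors, and the function name `pq_encode` are toy assumptions chosen for illustration.

```python
from typing import List, Sequence

def pq_encode(vec: Sequence[float],
              codebooks: List[List[Sequence[float]]]) -> List[int]:
    """Encode `vec` as one codeword index per sub-vector.

    Each sub-vector is assigned to its nearest codeword (squared L2 distance)
    in the codebook of the corresponding sub-space.
    """
    num_subspaces = len(codebooks)
    if len(vec) % num_subspaces != 0:
        raise ValueError("vector length must be divisible by the number of sub-spaces")
    sub_dim = len(vec) // num_subspaces
    codes = []
    for i, book in enumerate(codebooks):
        sub = vec[i * sub_dim:(i + 1) * sub_dim]
        dists = [sum((a - b) ** 2 for a, b in zip(sub, codeword)) for codeword in book]
        codes.append(dists.index(min(dists)))
    return codes

# Toy codebooks (hypothetical values): 2 sub-spaces with 2 codewords each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]

clean_query = [0.4, 0.4, 0.1, 0.9]
perturbed_query = [0.6, 0.6, 0.1, 0.9]  # small shift in the first sub-vector only

clean_codes = pq_encode(clean_query, codebooks)
perturbed_codes = pq_encode(perturbed_query, codebooks)
print(clean_codes, perturbed_codes)
```

In this toy setup the first code flips while the second stays the same, so the perturbed query would be matched against a different region of the database.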

2606.01891 2026-06-02 cs.GR cs.LG

MidSurfNet: Learnable Face Pairing and Interference Implicit Fields for Generalized Mid-surface Abstraction

Li Ye, Xinhang Zhou, Xingyu Yang, Ruofeng Tong, Hailong Li, Peng Du, Min Tang

AI summary: Proposes MidSurfNet, a framework that uses a learnable face pairing module and an interference implicit field to handle mid-surface abstraction of thin-walled CAD models in complex cases such as multiple wall thicknesses, self-matching faces, and non-center offsets, reaching 87.32% face pairing accuracy.

Comments
20 pages, 12 figures, 5 tables

Abstract

Mid-surface abstraction is essential for finite element analysis of thin-walled CAD models. Existing face pairing-based methods rely on handcrafted geometric heuristics, yet real-world industrial models frequently exhibit multi-wall-thickness regions, self-matching face configurations, and demand for non-center offset surfaces--scenarios where rule-based approaches consistently fail. We present MidSurfNet, a learning-augmented framework that addresses these limitations through two novel components: (1) a neural face pairing module that learns to predict face pair confidence from geometric and topological features, handling complex pairing scenarios beyond rule-based methods; and (2) an interference implicit field that represents mid-surfaces as the interference of two signed distance functions, enabling generalized offset control for flexible positioning in downstream CAE/FEA-oriented workflows. We construct a large-scale mid-surface dataset containing over 1,500 manually annotated CAD models. Experiments demonstrate that MidSurfNet achieves 87.32% face pairing accuracy and successfully handles multi-wall-thickness (61.90% completion) and self-matching (52.94% completion) scenarios that confound all existing methods. Furthermore, MidSurfNet provides a learning-based approach to generalized mid-surface abstraction with arbitrary offset control for CAE-oriented applications.

2606.01890 2026-06-02 cs.LG

Segment-driven Structural Induction and Semantic Alignment for Heterogeneous Tabular Representation

Woojun Jung, Susik Yoon

AI summary: Proposes NAVI, a framework that uses Masked Segment Modeling and Entropy-driven Segment Alignment to learn representations of heterogeneous tables through segment-level structural induction and semantic alignment.

Abstract

Real-world domains often contain heterogeneous tables whose headers vary while their underlying attribute semantics are shared, making it difficult to induce domain-specialized semantics from table-local evidence alone. Existing encoders model parts of this problem, but often underuse column-level value distributions and apply uniform objectives across attributes with different semantic roles. We propose NAVI, a segment-centric pretraining framework that treats each header-value pair as the unit for aggregating schema-level structural evidence and column-level distributional evidence. We realize this design through Masked Segment Modeling and Entropy-driven Segment Alignment, which jointly enforce structured header-value coupling and semantic alignment across stable and instance-specific attributes. Experiments on heterogeneous in-domain tables show improved reconstruction, semantic consistency, and downstream utility across evaluation settings overall.

2606.01886 2026-06-02 cs.AI cs.CE

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Maksym Chikita, Dmytro Kyrylenko, Sofiia Pidturkina, Julia Stadnyk

AI summary: Proposes the interaction-native knowledge harness (InKH) architecture, which absorbs complexity into the system through passive knowledge injection, temporal graph memory, and invalidation of stale knowledge; on financial LLM agent tasks it substantially reduces latency, token cost, and stale-knowledge usage while improving task quality and traceability.

Comments
17 pages, 3 figures

Abstract

Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeatedly restate goals, risk preferences, portfolio context, past judgments, and shifting market assumptions, while the agent answers, retrieves, acts, and forgets. In finance, this is not just inconvenient. In tasks such as market analysis, copy-trading review, and trade preparation, forgotten context and stale memory can create latency, repeated errors, weak auditability, and unsafe decisions. We propose the interaction-native knowledge harness (InKH), an architecture for financial LLM agents that absorbs complexity into the system. InKH converts user, market, portfolio, and tool events into structured operational knowledge. It uses passive knowledge injection to assemble a bounded working context buffer before the main model step, temporal graph memory for low-latency retrieval, a wiki audit surface for human-readable governance, and background extraction with maturity, decay, and write-time invalidation. We evaluate InKH on a reproducible controlled synthetic benchmark with 24 random seeds, 4 rounds, 80 episodes per round, and 6 baselines, producing 46,080 baseline-conditioned evaluations. InKH achieves mean task quality of 0.815 at 900 ms latency. Compared with agent-driven wiki-walk memory, it reduces latency by 82.95 percent, token cost by 82.29 percent, and stale-knowledge usage by 96.58 percent, while improving quality by 0.108 and traceability by 0.461. Compared with a temporal-graph system without invalidation, it improves quality by 0.050 and reduces stale-memory usage by 96.58 percent with comparable serving cost. The results support a design thesis for financial AI: adoption happens when complexity is absorbed by the system rather than transferred to the user. The benchmark validates architecture-level behavior, not live trading performance.
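
The evaluation count stated above follows directly from the benchmark dimensions, and a short check makes this explicit. This is a sketch under one labeled assumption: the wiki-walk latency is inferred by treating the 900 ms figure as InKH's own latency and the 82.95 percent reduction as relative to that baseline. The abstract does not report the baseline latency directly.

```python
# Benchmark dimensions quoted in the abstract.
seeds = 24
rounds = 4
episodes_per_round = 80
baselines = 6

total_evaluations = seeds * rounds * episodes_per_round * baselines
print(total_evaluations)

# Assumption (not stated in the abstract): 900 ms is InKH's latency and the
# 82.95 percent reduction is measured against the wiki-walk baseline.
inkh_latency_ms = 900.0
latency_reduction = 0.8295
implied_wiki_walk_latency_ms = inkh_latency_ms / (1.0 - latency_reduction)
print(f"{implied_wiki_walk_latency_ms:.0f} ms")
```

Under that assumption the wiki-walk baseline would sit at a little over five seconds per step.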

2606.01885 2026-06-02 cs.CV

Divide and Conquer: Reliable Multi-View Evidential Learning for Deepfake Detection

Xiaolu Kang, Zhongyuan Wang, Jikang Cheng, Baojin Huang, Zhanhe Lei, Gang Wu, Qin Zou, Qian Wang

AI summary: Proposes Divide-and-Conquer Multi-View Evidential Learning (DiCoME), which decouples semantic and artifact features via Geometric View Purification and fuses the views with uncertainty-aware evidential learning, improving the generalization and reliability of deepfake detection.

Comments
Accepted to ICML 2026

Abstract

With the evolution of generative models, deepfakes have achieved near-perfect semantic realism, leaving forensic traces only in subtle structural anomalies. However, existing single-view paradigms often fail to generalize, as dominant semantic features overwhelm subtle artifact cues within entangled representations. This imbalance leads to overconfident yet brittle predictions -- a phenomenon we term the Semantic Masking Effect. To address this challenge, we propose a reliable framework called Divide-and-Conquer Multi-View Evidential Learning (DiCoME) for Deepfake Detection. In the "Divide" phase, we employ Geometric View Purification to decompose the entangled representation space through principled geometric projection. This process suppresses semantic interference within artifact-sensitive representations, forming the foundation for decorrelated yet complementary semantic and artifact views. In the "Conquer" phase, we leverage Uncertainty-Aware Evidential Learning to synthesize these distinct views. By explicitly modeling the "epistemic conflict" between semantic and artifact cues, this mechanism provides calibrated uncertainty estimates instead of forcing rigid deterministic decisions. Extensive experiments across multiple benchmarks demonstrate that our method consistently outperforms existing approaches in generalization performance, while providing reliable uncertainty estimation for trustworthy deepfake detection. Code is available at https://github.com/kxl0825/DiCoME.git.

2606.01883 2026-06-02 cs.LG cs.CV

Beyond the Simplex: Balanced Prototype Geometry for Scorer-Agnostic Open-Set Recognition

Mayank Sharma, Rohit Kumar Mourya

AI summary: Develops a theory of balanced equal-norm prototype geometry that gives a unified analysis of open-set recognition across embedding dimensions, and shows that performance depends on the scoring rule rather than on the simplex structure.

Comments
20 pages, 2 figures, 6 tables

Abstract

Open-set recognition (OSR) requires a classifier to reject inputs from unseen classes, which is essential in safety-critical settings such as medical imaging. Simplex-based methods, which fix class prototypes at the vertices of a regular simplex and then reject via a distance-ratio score, perform well empirically but lack theoretical justification, and existing analysis applies only when the embedding dimension d is at least C-1, which is the regime in which a regular simplex exists. We give a theoretical account of simplex-ratio OSR that holds in every embedding dimension, including d < C-1. Our analysis centers on balanced equal-norm codes: prototype configurations with equal lengths and zero sum, which exist for all d >= 2 and include the regular simplex as a special case. For these codes we show that an auxiliary squared ratio score has sublevel sets that are exact unions of Euclidean balls, which in turn bracket the acceptance region of the operational score; and we prove a sharp dichotomy: the prototypes attain one-distance symmetry, behaving like a regular simplex, if and only if d >= C-1, with controlled degradation governed by an explicit defect parameter below that threshold. We further show the false-acceptance rate decays exponentially in d under natural isotropy assumptions, and that the operational score is globally Lipschitz with compact acceptance regions. Empirically, we study balanced prototype geometry as both an analytic tool and a representation-learning prior, rather than as a stand-alone state-of-the-art detector. Across CIFAR and MedMNIST open-set splits, the geometry provides useful structure, but OSR performance remains strongly dependent on the scoring rule: raw ratio scores typically underperform nearest-neighbor and logit-based alternatives.
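
One simple way to see a balanced equal-norm code in d = 2 is to place C unit vectors at equal angles on a circle. The sketch below uses that construction, which is an illustrative choice and not necessarily the one used in the paper. It checks the two defining properties (equal norm, zero sum) and counts distinct pairwise distances, which reflects the d >= C-1 dichotomy: with d = 2, only C <= 3 gives a single distance.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]

def circle_code(num_classes: int) -> List[Point]:
    """C unit vectors equally spaced on the unit circle (illustrative construction)."""
    return [
        (math.cos(2 * math.pi * k / num_classes), math.sin(2 * math.pi * k / num_classes))
        for k in range(num_classes)
    ]

def distinct_distances(prototypes: List[Point], ndigits: int = 9) -> List[float]:
    """Sorted set of pairwise Euclidean distances, rounded to absorb float noise."""
    return sorted({round(math.dist(p, q), ndigits) for p, q in combinations(prototypes, 2)})

for num_classes in (3, 5):
    protos = circle_code(num_classes)
    sum_x = sum(p[0] for p in protos)
    sum_y = sum(p[1] for p in protos)
    norms = [math.hypot(*p) for p in protos]
    print(
        f"C={num_classes}: sum=({sum_x:.1e}, {sum_y:.1e}), "
        f"norms all 1: {all(abs(n - 1.0) < 1e-12 for n in norms)}, "
        f"distinct distances: {len(distinct_distances(protos))}"
    )
```

With C = 3 the points form an equilateral triangle (a regular simplex in the plane), while C = 5 is still balanced and equal-norm but has two distinct pairwise distances.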

2606.01879 2026-06-02 cs.CL

CultureForest: Understanding and Evaluating Cultural Norm Grounded Reasoning in LLMs

Yangfan Ye, Xiaocheng Feng, Jialong Tang, Xiayu Cao, Zihan Zhang, Xiachong Feng, Baosong Yang, Bing Qin

AI summary: To address the gap left by prior work that treats cultural intelligence only as knowledge acquisition and overlooks its use in realistic scenarios, proposes the CultureForest benchmark, which evaluates models on reasoning tasks grounded in atomic norms; top-tier models degrade substantially in open-ended generation, revealing a bottleneck in reasoning ability.

Abstract

Existing research largely reduces cultural intelligence in LLMs to a knowledge-level problem, overlooking whether models can effectively utilize their acquired knowledge in realistic scenarios. To bridge this gap, we introduce CultureForest, a benchmark for Cultural Norm Grounded Reasoning. Each question is grounded in a small set of atomic norms, enabling verifiable and attributable evaluation. CultureForest comprises 5,378 examples across 8 domains and 53 countries/regions, and supports a progressive evaluation from multiple-choice to open-ended generation. Extensive experiments reveal that even top-tier models degrade substantially in open-ended settings, accompanied by pronounced cross-region disparities. Through targeted analysis, we uncover several consistent patterns: (1) test-time reasoning yields limited gains and may exacerbate inequity; (2) models exhibit highly shared regional preference structures; (3) model responses are markedly conservative, especially under stricter cultural constraints; and (4) by disentangling cultural knowledge acquisition from cultural reasoning, we show that while LLMs possess substantial cultural knowledge, their performance is further bottlenecked by its effective use. These findings point to a necessary shift from knowledge-centric evaluation toward measuring knowledge-grounded reasoning.

2606.01873 2026-06-02 cs.LG

G2LoRA: Gradient Orthogonal Low-Rank Adaptation Framework for Graph Continual Learning on Text-Attributed Graphs

Yuhan Wang, Yibo Ding, Yutong Ye, Mufan Zhao, Wenbo Zhang, Ruijie Wang, Jianxin Li

AI summary: To address catastrophic forgetting of LLM-as-Aligner models in continual learning on text-attributed graphs, proposes G2LoRA, which uses a unified graph-text alignment objective, category-aware gradient projection, and gradient magnitude modulation to enable positive transfer across tasks and mitigate cross-modal drift.

Comments
Accepted by KDD 2026

Abstract

LLM-as-Aligner has emerged as a prevalent pre-training paradigm for Text-Attributed Graphs (TAGs), aligning graph and text modalities into a shared embedding space via CLIP-style contrastive learning. While effective on individual downstream tasks, we observe severe catastrophic forgetting when such models are sequentially fine-tuned on streaming tasks. Although parameter-efficient fine-tuning alleviates forgetting to some extent, it remains insufficient to resolve task interference and ineffective knowledge transfer. In this work, we study graph continual learning for LLM-as-Aligner models on TAGs, with the goal of mitigating interference while promoting positive transfer across tasks. This setting introduces two fundamental challenges: (1) heterogeneous downstream tasks induce shifting optimization objectives, hindering unified fine-tuning; and (2) graph and text encoders exhibit different sensitivities to adaptation, making uncoordinated updates prone to misalignment. To address these challenges, we propose G2LoRA, a continual learning framework for TAGs. G2LoRA unifies node-, link-, and graph-level tasks under a single graph-text alignment objective, and enables consistent optimization across domain/class/task incremental modes. To reduce task interference while encouraging positive transfer, G2LoRA performs category-aware gradient projection in structured subspaces, resolving conflicting updates and enabling conditional backward transfer to balance forward and backward knowledge flow. To further prevent cross-modal drift, G2LoRA introduces gradient magnitude modulation to coordinate update rates between graph and text encoders. Extensive experiments on benchmark datasets demonstrate that G2LoRA consistently outperforms strong baselines across different backbone architectures, achieving superior continual performance and transferability.
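
The abstract does not spell out how the category-aware projection works, so the sketch below shows only the generic idea behind gradient projection in continual learning: removing from a new-task gradient the components that lie in a subspace associated with earlier tasks. It is plain orthogonal projection, not G2LoRA's method, and the names `project_out` and `old_task_basis` are hypothetical.

```python
from typing import List, Sequence

def project_out(grad: Sequence[float], basis: List[Sequence[float]]) -> List[float]:
    """Remove from `grad` its components along each vector of an orthonormal `basis`.

    The result is orthogonal to every basis vector, so a step along it does not
    move the parameters within the protected subspace.
    """
    out = list(grad)
    for u in basis:
        coeff = sum(g * x for g, x in zip(out, u))
        out = [g - coeff * x for g, x in zip(out, u)]
    return out

# Hypothetical protected direction from an earlier task (must be orthonormal).
old_task_basis = [(1.0, 0.0, 0.0)]
new_task_grad = [3.0, 4.0, 5.0]

safe_grad = project_out(new_task_grad, old_task_basis)
print(safe_grad)
```

The design choice in any such scheme is which subspace to protect; the abstract indicates that G2LoRA makes this choice per category and additionally allows conditional backward transfer, which this sketch does not model.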

2606.01871 2026-06-02 cs.CV

Deep Learning for Generating Computational PIN-4 Immunohistochemistry Staining from Prostate Biopsy H&E Images

Vietbao Tran, Pratik Shah

AI summary: Uses a conditional generative adversarial network (cGAN) to synthesize PIN-4 IHC staining from H&E images, achieving direct spatial correspondence and favorable results in pathologist review.

Abstract

Immunohistochemistry (IHC) is frequently used to resolve diagnostically ambiguous prostate cancer biopsy findings on hematoxylin and eosin (H&E)-stained tissue. However, PIN-4 IHC staining is typically performed on adjacent tissue sections, limiting direct spatial comparison between the H&E morphology and the corresponding immunophenotypic signal. A paired, registered H&E/PIN-4 dataset was constructed from routine clinical prostate biopsy whole-slide images (WSIs), and a conditional generative adversarial network (cGAN) was trained to synthesize PIN-4 staining patterns directly from native H&E image patches. The final dataset comprised 172 paired WSIs from 93 patients and 27,298 registered 1024x1024 patch pairs, spanning adenocarcinoma-positive and benign cases with representation across age, race, and ethnicity groups. The model was evaluated on a held-out test set of 1,814 patch pairs from 17 WSIs, achieving a mean peak signal-to-noise ratio (PSNR) of 21.88 dB, structural similarity index measure (SSIM) of 0.667, Pearson correlation coefficient (PCC) of 0.684, and learned perceptual image patch similarity (LPIPS) of 0.417. Qualitative review by a board-certified pathologist showed that generated images captured diagnostically relevant PIN-4 staining patterns, including AMACR/racemase expression and basal-cell-associated staining, while preserving spatial correspondence with the source H&E morphology. Accuracy of synthesis varied across morphologically complex regions, including high-grade carcinoma and intraductal carcinoma. These results support the feasibility of supervised PIN-4 synthesis from routinely acquired brightfield H&E prostate biopsy images. The approach enables direct interpretation of predicted PIN-4 marker patterns in the context of the source prostate H&E architecture, addressing a current spatial limitation of conventional adjacent-section IHC.
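
For readers who want to interpret the reported PSNR, the sketch below implements the standard definition, PSNR = 10 log10(MAX^2 / MSE), on flat pixel sequences and inverts it to get an implied root-mean-square error. It assumes 8-bit intensities (MAX = 255), which the abstract does not state, and the function names are illustrative.

```python
import math
from typing import Sequence

def psnr(reference: Sequence[float], generated: Sequence[float],
         max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_val ** 2 / mse)

def rmse_from_psnr(psnr_db: float, max_val: float = 255.0) -> float:
    """Invert the PSNR formula to recover the root-mean-square pixel error."""
    return max_val / (10.0 ** (psnr_db / 20.0))

# Toy example: every pixel is off by 10 gray levels, so MSE = 100.
toy_psnr = psnr([0, 0, 0, 0], [10, 10, 10, 10])

# Reported mean PSNR from the abstract, under the 8-bit assumption.
implied_rmse = rmse_from_psnr(21.88)
print(f"toy PSNR: {toy_psnr:.2f} dB, implied RMSE at 21.88 dB: {implied_rmse:.1f}")
```

Under the 8-bit assumption, 21.88 dB corresponds to an average per-pixel error of about 20 gray levels. Note that a mean of per-patch PSNR values does not map exactly to a single global RMSE.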

2606.01868 2026-06-02 cs.LG

Task-Induced Representational Invariances Depend on Learning Objective in Deep RL

Manu Srinath Halvagal, Sebastian Lee, SueYeon Chung

AI summary: Analyzes deep RL representations through MDP reduction theory and finds that the value-based method (DQN) learns representations invariant to MDP homomorphism symmetries, while the policy-gradient method (PPO) learns representations invariant to action symmetries; these differences affect transfer learning and appear in LLMs in a prompt-dependent manner.

Abstract

Reinforcement Learning (RL) has long served as a model for goal-directed animal behavior in neuroscience. Modern deep RL has shown remarkable success across many domains, further strengthening this connection. The ability to learn abstract representations of high-dimensional state spaces underlies much of this success. However, theoretical understanding of these learned representations remains limited, hindering direct comparisons between models and animal learning. We address this gap by analyzing deep RL representations through the lens of MDP reduction theory. Investigating canonical RL algorithms in a navigation task, we find that even when performance is comparable, the value-based method (DQN) learns representations that are invariant to MDP homomorphism symmetries, while the policy-gradient method (PPO) learns representations invariant to action symmetries. These differences emerge consistently across domains, have downstream consequences for transfer learning, and appear in LLMs in a prompt-dependent manner. Our findings provide a principled approach to comparing learned representations across RL algorithms, with demonstrated practical implications and possible insights for neural coding in the brain.

2606.01865 2026-06-02 cs.RO

Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections

Zhaoting Li, Gang Chen, Javier Alonso-Mora, Cosimo Della Santina, Jens Kober

AI summary: Proposes Set-Supervised Diffusion Policy (SDP), which uses contrastive action-chunk data from human corrections to construct a set of desired action chunks for training diffusion policies, mitigating distributional shift and improving robustness.

Abstract

Diffusion policies have recently emerged as a powerful framework for robotic manipulation. However, like other behavior cloning methods, they remain vulnerable to distributional shift, often requiring human-in-the-loop interventions to correct failures during deployment. These interactions naturally provide paired supervision in the form of the robot's undesired actions and the human teacher's corrective actions. Yet existing data aggregation pipelines and standard behavior cloning losses largely ignore this negative signal from undesired actions, leading to overfitting to the teacher's actions and an increasing reliance on costly expert data. To address this limitation, we propose Set-Supervised Diffusion Policy (SDP), a novel learning framework that utilizes contrastive action-chunk data to train diffusion policies from human corrections. From paired positive and negative action-chunks, SDP constructs a set of desired action-chunks and designs a training pipeline that encourages the diffusion policy to align with the set. Through extensive experiments across multiple robotic manipulation tasks, we demonstrate that SDP consistently improves policy performance, with particularly strong gains in robustness to noisy data. Moreover, SDP induces high-quality aggregated datasets, enabling more efficient and reliable policy learning from human-in-the-loop corrections. Our code is available at https://set-supervised-diffusion-policy.github.io/.

2606.01863 2026-06-02 cs.LG math-ph math.MP

Continual Learning as a Multiphase Moving-Boundary Problem

Snigdha Chandan Khilar

AI summary: Inspired by the physics of melting, proposes Stefan-CL, which treats consolidated knowledge as a solid phase and unused capacity as a liquid phase and regulates boundary movement through a latent-heat control, achieving continual learning with near-zero forgetting without storing raw data.

Abstract

Continual learning struggles to balance retaining past knowledge with absorbing new tasks. Stefan-CL elegantly resolves this stability-plasticity dilemma through the physics of melting. It frames consolidated knowledge as a protected "solid" and unused capacity as an adaptable "liquid." As the network learns, this boundary expands, governed by a "latent heat" tuning dial. By mathematically freezing the learned interior, Stefan-CL cuts forgetting to near zero, matching memory-heavy baselines without storing raw data, forging a beautiful, physics-grounded path for AI.

2606.01862 2026-06-02 cs.MA cs.AI cs.NI

RadioMaster: Multi-Agent System for Autonomous Radio Signal Generation

Jiazhen Lei, Tianze Cao, Yuxin Sha, Sihan Wang, Bingbing Wang, Fengyuan Zhu, Zeming Yang, Xiaohua Tian

AI summary: Proposes RadioMaster, a fully autonomous multi-agent framework built on three pillars (RadioWiki, RadioAgent, and RadioEmulator) that turns user intent into real wireless signals, addressing the failure of existing models at radio signal generation caused by missing domain knowledge and insensitivity to hardware constraints.

Abstract

Translating user intents into physical radio signals represents the critical yet notoriously tedious final step in wireless prototyping, as it requires intricate knowledge of physical layer details and presents immense implementation challenges. Large Language Models (LLMs) and multi-agent systems have revolutionized conventional software engineering, raising the compelling question of whether they can resolve these formidable difficulties. However, our investigations reveal that current models experience significant limitations and fail to accomplish this task when applied to radio signal generation. This performance degradation primarily stems from severe domain ignorance and a fundamental insensitivity to physical hardware constraints. To bridge this gap, we introduce RadioMaster, a fully autonomous multi-agent framework designed to seamlessly translate user input into real-world wireless emissions. RadioMaster operates on three synergistic pillars: RadioWiki for domain-specific knowledge retrieval, RadioAgent for collaborative I/Q sample generation alongside hardware configuration, and RadioEmulator for closed-loop physical layer verification. Furthermore, we construct RadioBench, the first comprehensive benchmark tailored specifically for the radio signal generation domain. Extensive real-world evaluations demonstrate that RadioMaster significantly outperforms state-of-the-art (SOTA) baselines regarding configuration viability and signal fidelity.

2606.01861 2026-06-02 cs.LG

A Theoretical Framework for Self-Play Theorem Proving Algorithms

自我对弈定理证明算法的理论框架

Thomas Chen, Zhiyuan Li

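AI summary: Presents a theoretical framework that models the set of theorems as a graph and introduces a reversible random walk and a diversity measure to analyze the self-improvement capability of self-play algorithms for theorem proving.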

详情
AI中文摘要

自我对弈是一种使模型能够自我改进的训练算法,最近在利用大型语言模型进行形式定理证明方面显示出有希望的实证结果。(Dong & Ma, 2025) 将自我对弈实例化为两个协作智能体:证明者(证明定理)和猜想者(生成新定理作为证明者的课程)。在本文中,我们提供了一个理论框架,用于理解自我对弈算法在定理证明中的自我改进能力。首先,我们将定理集形式化为一个图,节点为定理,边连接具有相似语义的定理对。我们引入一组原始假设,刻画训练过的证明者的保证以及猜想者如何访问图的结构。其次,我们证明,如果底层定理图是良好连接的,那么一个基于可逆随机游走的猜想算法的证明者-猜想者系统足以指数级增长已证明定理的集合。第三,受自我对弈算法在实证中遇到的一个问题(猜想者倾向于生成人为复杂且非基础的定理)的启发,我们提出了一个由猜想者生成的定理训练分布的多样性度量,以及一种改进的猜想算法,该算法通过计算定理图中相邻定理之间的扩散相似性来局部最大化该多样性度量。最后,我们描述了一种通过对比学习将节点嵌入欧几里得空间,然后计算嵌入之间的内积来计算扩散相似性的方法。

英文摘要

Self-play, a type of training algorithm that enables a model to self-improve, has recently shown promising empirical results in the context of formal theorem proving using Large Language Models (LLMs). Dong & Ma (2025) instantiate self-play with two cooperating agents: a prover, which proves theorems, and a conjecturer, which generates new theorems as a curriculum to the prover. In this paper, we provide a theoretical framework for understanding the self-improvement capabilities of self-play algorithms for theorem proving. First, we formalize the set of theorems as a graph, with nodes as theorems and edges between pairs of theorems with similar semantics. We introduce a set of primitive assumptions that characterize the guarantees of a trained prover and how a conjecturer can access the structure of the graph. Second, we show that if the underlying graph of theorems is well-connected, then a prover-conjecturer system, where the conjecturing algorithm is based on a reversible random walk, is sufficient to grow the set of proved theorems exponentially. Third, motivated by an issue encountered empirically by self-play algorithms, where the conjecturer tends to generate artificially complex and non-fundamental theorems, we propose a diversity measure for a training distribution of theorems generated by a conjecturer and an improved conjecturing algorithm that locally maximizes this diversity measure, by computing the diffusion similarity between neighboring theorems in the theorem graph. Finally, we describe a method to compute the diffusion similarity by using contrastive learning to embed nodes into Euclidean space and then computing the inner product between embeddings.

2606.01858 2026-06-02 cs.CV

Polaris: Scaling Up Instruction-Guided Image Generation Towards Millions of Personalized Style Needs

Polaris: 将指令引导的图像生成扩展到数百万个性化风格需求

Zhi-Kai Chen, Jun-Peng Jiang, Jun-Jie Tao, De-Chuan Zhan, Han-Jia Ye

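AI summary: Proposes Polaris, an intelligent retrieval framework that indexes and retrieves from over 6,500 checkpoints and 75,000 adapters, automatically selecting and integrating the most relevant model components to deliver scalable, controllable, and well-aligned instruction-driven image generation without additional training.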

详情
AI中文摘要

用户越来越期望图像生成模型能够快速适应高度多样化和个性化的需求,例如生成具有独特风格或特征的图像。传统方法依赖于微调,成本高昂且难以扩展。为了应对这些限制,社区积累了一个不断增长的微调模块和适配器库,其中每个组件针对特定的生成需求,并共同作为处理新需求的基础。这自然引出一个问题:与其重复训练新模型,我们能否系统地利用这个不断扩展的生态系统来更好地满足用户指令?为此,我们提出了Polaris,一个智能检索框架,根据用户的指令自动从模型库中选择和集成合适的模型。关键见解是,利用如此庞大和异构的库不仅需要在数千个候选中找到最相关的模块,还需要将它们有效地对齐以进行指令驱动的生成和编辑。Polaris通过索引超过6500个检查点和75000个适配器,并根据用户的输入和指令检索最相关的组件来解决这一挑战。通过这种方式,它提供了可扩展、可控且良好对齐的生成——无需任何额外训练。

英文摘要

Users increasingly expect image generation models to quickly adapt to highly diverse and personalized requirements, such as producing images with distinctive styles or characteristics. Traditional approaches rely on fine-tuning, which is costly and difficult to scale. To cope with these limitations, the community has accumulated a growing library of fine-tuned modules and adapters, where each component targets specific generation needs and collectively serves as a foundation for handling new demands. This naturally raises a question: instead of repeatedly training new models, can we systematically exploit this expanding ecosystem to better fulfill user instructions? To this end, we present Polaris, an intelligent retrieval framework that automatically selects and integrates suitable models from the model library based on a user's instructions. The key insight is that harnessing such a massive and heterogeneous pool requires not only finding the most relevant modules among thousands of candidates, but also aligning them effectively for instruction-driven generation and editing. Polaris addresses this challenge by indexing over 6,500 checkpoints and 75,000 adapters, and retrieving the most relevant components given a user's input and instruction. In doing so, it delivers scalable, controllable, and well-aligned generation -- without any additional training.

2606.01856 2026-06-02 cs.DC cs.AI

Boosting Multimodal Federated Learning via Chained Modality Optimization

通过链式模态优化提升多模态联邦学习

Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang, Changsheng Xu

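AI summary: To address modality competition, which leads to suboptimal global models in multimodal federated learning, proposes the FedMChain framework, which uses phase-wise optimization, an error-compensated regularizer, and sparse sign-guided aggregation to improve predictive performance and reduce communication overhead.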

详情
AI中文摘要

多模态联邦学习(MMFL)能够在具有异构数据和模态可用性的分散客户端之间实现隐私保护的协作学习。然而,现有大多数MMFL方法将多模态训练视为联合优化问题,忽略了一个关键瓶颈:模态竞争,即主导模态抑制较弱模态,导致全局模型次优。为解决这一问题,我们提出FedMChain,一个平衡的MMFL框架,将联邦多模态训练结构化为一系列模态阶段。这种分阶段设计为每个模态在多模态客户端上提供了专用的局部优化窗口,以缓解模态竞争,并通过误差补偿正则化器进一步促进跨模态互补性。在服务器端,我们采用稀疏符号引导聚合策略,利用方向符号一致性进行稳健的模态内聚合,避免破坏性平均,并支持较少的同步频率以降低通信开销。在多模态基准上的大量实验表明,FedMChain在需要比基线更少通信频率的同时,持续提高了预测性能。

英文摘要

Multimodal Federated Learning (MMFL) enables privacy-preserving collaborative learning across decentralized clients with heterogeneous data and modality availability. However, most existing MMFL methods cast multimodal training as a joint optimization problem, overlooking a key bottleneck: modality competition, where dominant modalities suppress weaker ones and lead to suboptimal global models. To address this, we propose FedMChain, a balanced MMFL framework that structures federated multimodal training as a chain of modality-wise phases. This phase-wise design gives each modality a dedicated local optimization window on multimodal clients to mitigate modality competition, and further promotes cross-modal complementarity via an error-compensated regularizer. On the server side, we employ a sparse sign-guided aggregation strategy that leverages directional sign agreement for robust intra-modality aggregation, avoids destructive averaging, and supports less frequent synchronization to reduce communication overhead. Extensive experiments on multimodal benchmarks demonstrate that FedMChain consistently improves predictive performance while requiring less frequent communication than baselines.

2606.01850 2026-06-02 cs.AI

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

压缩是否保留不确定性?基于共形预测的量化和稀疏大语言模型统一基准

Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang, Junhao Dong, Jingling Yuan

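AI summary: Using conformal prediction, benchmarks 12 LLMs under various compression configurations across five NLP tasks, finding that compression frequently decouples accuracy from uncertainty, that larger models absorb compression-induced uncertainty better than smaller ones, and that uncertainty inflation is often threshold-like rather than gradual.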

详情
AI中文摘要

模型压缩技术如量化和剪枝被广泛用于降低大语言模型(LLMs)的部署成本,现有评估几乎只关注准确率保持。然而,在安全关键应用中,模型可靠量化自身不确定性的能力同样重要。我们问:压缩是否保留了这种能力?为回答此问题,我们在五个NLP任务上、多种压缩配置下对12个LLM进行基准测试,使用共形预测提供严格、无分布的不确定性度量。实验揭示:(I) 压缩经常解耦准确率与不确定性;(II) 大模型吸收压缩引起的不确定性的能力远强于小模型;(III) 不确定性膨胀常呈阈值状而非渐进。这些结果表明,仅基于准确率的评估不足以评估压缩LLM的部署就绪度,不确定性感知的基准测试应成为模型压缩流程的标准组成部分。

英文摘要

Model compression techniques such as quantization and pruning are widely used to reduce the deployment cost of large language models (LLMs), with existing evaluations focusing almost exclusively on accuracy preservation. However, in safety-critical applications, a model's ability to reliably quantify its own uncertainty is equally important. We ask: does compression preserve this ability? To answer this question, we benchmark 12 LLMs under various compression configurations across five NLP tasks, using conformal prediction to provide a rigorous, distribution-free measure of uncertainty. Our experiments reveal that: (I) compression frequently decouples accuracy from uncertainty; (II) larger models absorb compression-induced uncertainty far more effectively than smaller ones; and (III) uncertainty inflation is often threshold-like rather than gradual. These results suggest that accuracy-only evaluation is insufficient for assessing the deployment readiness of compressed LLMs, and that uncertainty-aware benchmarking should be a standard component of model compression pipelines.
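
The set-size notion of uncertainty mentioned in this abstract can be sketched with split conformal prediction. This is a minimal illustration, assuming a multiple-choice task and the common score s = 1 - p(true option); the function names and the choice of score are assumptions for illustration, not the benchmark's actual code.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data (split conformal).

    cal_probs: (n, num_options) predicted probabilities.
    cal_labels: (n,) integer index of the correct option.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return 1.0  # too few calibration points: every option is included
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean mask of options whose score 1 - p is within the threshold."""
    return (1.0 - test_probs) <= qhat

def mean_set_size(sets):
    """Average number of options per prediction set (larger = more uncertain)."""
    return sets.sum(axis=1).mean()
```

Under this sketch, a compressed model whose accuracy is unchanged but whose probabilities are flatter would produce larger prediction sets at the same coverage level, which is the kind of decoupling the abstract describes.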

2606.01848 2026-06-02 cs.CV

RescueBench: Can Embodied Agents Save Lives in the Wild?

RescueBench: 具身智能体能否在野外拯救生命?

Kui Wu, Beiyu Guo, Hao Chen, ShuHang Xu, Yuling Li, Yongdan Zeng, Zhoujun Li, Yizhou Wang, Fangwei Zhong

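AI summary: Presents RescueBench, a photo-realistic diagnostic benchmark structured as a four-stage pipeline, for evaluating the exploration, memory, and interaction abilities of embodied agents in search-and-rescue tasks and for revealing how exploration and memory failures propagate.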

详情
AI中文摘要

搜索与救援(SAR)要求具身智能体在多模态不确定性下探索陌生环境,执行多阶段交互,并在长时域内检索空间记忆。现有基准通常孤立评估这些能力,当它们必须在现实工作流中组合时,失败如何叠加尚不清楚。我们提出 RescueBench,一个逼真的诊断基准,将 SAR 实例化为四阶段流水线:多模态探索、目标救援、记忆引导返回和最终交接。通过将顺序任务组合与阶段级评估相结合,RescueBench 能够分析探索和记忆失败如何在具身救援工作流中传播。它包含五个渐进难度级别,在环境复杂性、线索模糊性和空间层次上有所不同,并配有自动化的情节生成和标注流水线,用于可扩展的评估和训练。我们评估了七个基线、一个 oracle 参考和人类玩家,结果显示没有基线能在最大难度下完成全部任务。阶段级诊断将自主探索识别为主要失败模式,空间记忆为第二个独立瓶颈,表明当前拓扑视觉语言导航或基于地图的方法无法解决这些局限。代码见 https://github.com/wukui-muc/RescueBench。

英文摘要

Search-and-rescue (SAR) requires embodied agents to explore unfamiliar environments under multimodal uncertainty, perform multi-stage interactions, and retrieve spatial memory over long horizons. Existing benchmarks typically evaluate these capabilities in isolation, leaving unclear how failures compound when they must be composed in realistic workflows. We introduce RescueBench, a photo-realistic diagnostic benchmark that instantiates SAR as a four-stage pipeline: multimodal exploration, target rescue, memory-guided return, and final handoff. By combining sequential task composition with stage-level evaluation, RescueBench enables analysis of how exploration and memory failures propagate through embodied rescue workflows. It contains five progressive difficulty levels that vary in environmental complexity, clue ambiguity, and spatial hierarchy, along with an automatic episode generation and annotation pipeline for scalable evaluation and training. We evaluate seven baselines, an oracle reference, and human players, showing that no baseline completes the full task at the greatest difficulty. Stage-level diagnosis identifies autonomous exploration as the dominant failure mode and spatial memory as a second, independent bottleneck, suggesting that these limitations are not resolved by current topological visual-language navigation or map-based methods. Code is available at https://github.com/wukui-muc/RescueBench

2606.01847 2026-06-02 cs.RO cs.LG

The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

我们说的谎言:通过切空间上的分数匹配纠正视觉-语言-动作策略中的欧几里得谬误

Bing-Cheng Chuang, I-Hsuan Chu, Bor-Jiun Lin, YuanFu Yang, Min Sun, Chun-Yi Lee

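AI summary: Targets the Euclidean Fallacy that arises when diffusion-based vision-language-action policies represent SE(3) poses as flat R^12 vectors. Proposes the Lie Diffuser Actor (LDA) framework, which injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map, eliminating manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality; on CALVIN ABC→D, average task length rises from 3.27 to 3.51.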

详情
Comments
ICML 2026 Accepted
AI中文摘要

基于扩散的视觉-语言-动作策略在机器人操作中取得了显著成功,但犯了一个我们称之为$\textbf{欧几里得谬误}$的基本几何错误:将SE(3)位姿表示为平坦的$\mathbb{R}^{12}$向量。这种近似导致(1)违反SO(3)约束的流形漂移,(2)坐标变换下等变性的破坏,以及(3)具有过高运动学代价的非测地线轨迹。我们提出$\textbf{Lie Diffuser Actor (LDA)}$,一个本质上在SE(3)上运行的扩散框架。我们的方法通过左不变SDE注入噪声,在切空间中预测分数,并通过指数映射回缩样本。这种表述通过构造消除了流形漂移,同时保证了坐标框架等变性和测地线最优性。在CALVIN ABC$\rightarrow$D上,LDA将平均任务长度从$3.27$提升到$3.51$($+7.3\%$)。我们进一步在真实机器人上验证了该方法,结果表明我们的方法在大多数任务上优于基线。

英文摘要

Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet commit a fundamental geometric error we term the $\textbf{Euclidean Fallacy}$: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors. This approximation induces (1) manifold drift violating SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce $\textbf{Lie Diffuser Actor (LDA)}$, a diffusion framework operating intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC$\rightarrow$D, LDA improves average task length from $3.27$ to $3.51$ ($+7.3\%$). We further validate our method on a real robot, where it outperforms the baseline on the majority of tasks.
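
The manifold-drift point can be illustrated on the rotation part alone. The sketch below is an assumption-laden toy, not the LDA implementation: it compares adding Gaussian noise to the flat entries of a rotation matrix with sampling noise in the tangent space and applying it through the exponential map (Rodrigues' formula), in the style of one geometric Euler-Maruyama step of a left-invariant SDE. The noise scale 0.1 and all function names are illustrative.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from so(3) to SO(3)."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero for a valid rotation."""
    return np.linalg.norm(R.T @ R - np.eye(3))

rng = np.random.default_rng(0)
R = exp_so3(np.array([0.3, -0.2, 0.5]))

# Euclidean noising: perturb the nine matrix entries directly.
R_euclid = R + 0.1 * rng.standard_normal((3, 3))

# Lie-group noising: perturb in the tangent space, then map back with exp.
R_lie = R @ exp_so3(0.1 * rng.standard_normal(3))

print("Euclidean drift:", orthogonality_error(R_euclid))
print("Lie-group drift:", orthogonality_error(R_lie))
```

The Euclidean sample is no longer orthogonal, while the tangent-space sample remains a rotation up to floating-point error.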

2606.01846 2026-06-02 cs.LG

Mos-Gen: A Generative Molecular Framework for Mosquito Insecticide Design

Mos-Gen:用于蚊虫杀虫剂设计的生成式分子框架

Lina Wang, Yaning Cui

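AI summary: Proposes the Mos-Gen framework, which couples the pretrained molecular representation model Uni-Mol with a variational autoencoder for de novo generation of disulfide-containing allicin derivatives as mosquito insecticides; in experimental validation, the hit rate among predicted positives reached 78%.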

详情
AI中文摘要

蚊媒传染病每年在全球造成超过70万人死亡。传统化学杀虫剂的长期使用已导致严重的抗药性问题,迫切需要开发新型、高效且生态可持续的替代品。虽然该领域现有的人工智能方法主要集中于活性预测和分类,但在从头生成新型分子骨架方面存在关键空白。在本研究中,我们提出了Mos-Gen,一种基序感知的生成式协作框架,将预训练分子表示模型Uni-Mol与变分自编码器(VAE)相结合,专门用于设计含二硫键的大蒜素衍生物作为蚊虫杀虫剂。在生成的候选分子中,选择了14种化合物——包括9个预测阳性和5个预测阴性——进行化学合成和实验验证。预测阳性中的命中率达到78%,而预测阴性均未表现出杀蚊活性。这些实验结果充分验证了Mos-Gen框架的高精度筛选能力。

英文摘要

Mosquito-borne infectious diseases cause more than 700,000 deaths worldwide each year. The long-term use of conventional chemical insecticides has induced serious resistance problems, creating an urgent need to develop novel, highly effective, and ecologically sustainable alternatives. While existing artificial intelligence approaches in this domain have focused primarily on activity prediction and classification, they leave a critical gap in the de novo generation of novel molecular scaffolds. In this study, we propose Mos-Gen, a motif-aware generative collaborative framework that couples the pretrained molecular representation model Uni-Mol with a variational autoencoder (VAE), specifically tailored for the design of disulfide-containing allicin derivatives as mosquito insecticides. Among the generated candidates, fourteen compounds -- comprising nine predicted positives and five predicted negatives -- were selected for chemical synthesis and experimental validation. The hit rate among the predicted positives reached 78%, whereas none of the predicted negatives exhibited mosquitocidal activity. These experimental results fully validated the high-precision screening capability of the Mos-Gen framework.

2606.01845 2026-06-02 cs.CL cs.AI

Unveiling the Limits of Large Language Models in Inferring Pragmatic Meaning from Non-Verbal Responses

揭示大型语言模型在推断非语言回应中的语用意义的局限性

Sugyeong Eo, Heuiseok Lim

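AI summary: Presents the first systematic evaluation of the ability of large language models (LLMs) to infer pragmatic meaning from dialogue consisting solely of non-verbal responses, finding that accuracy drops by up to 60 percentage points compared with verbal responses and that in-context learning facilitates pragmatic inference.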

详情
AI中文摘要

尽管大型语言模型(LLMs)在语用语言理解方面取得了显著进展,但先前的研究主要集中在其对语言行为的理解上。然而,非语言行为仍然是人类交流的基本组成部分,特别是当故意单独使用以传达间接意义时。在这项工作中,我们首次系统评估了LLMs从仅包含非语言回应的对话中推断语用意义的能力。我们探讨了三个研究问题:(1)LLMs能否识别通过非语言回应传达的间接意图?(2)LLMs何时以及如何未能捕捉非语言意图?(3)我们如何提高LLMs解释非语言意图的能力?通过评估,我们观察到LLMs难以从非语言回应中推断出潜在意义,准确率相比语言回应下降高达60个百分点。进一步的广泛分析揭示了LLMs在解释非语言行为时的行为模式,并表明上下文学习有助于语用推理。

英文摘要

Although large language models (LLMs) have shown considerable progress in pragmatic language understanding, prior research has focused mainly on their comprehension of verbal behavior. Nonetheless, non-verbal behavior remains a fundamental component of human communication, especially when deliberately utilized in isolation to convey indirect meanings. In this work, we present the first systematic evaluation of LLMs' ability to infer pragmatic meaning in dialogue consisting solely of non-verbal responses. We explore three research questions: (1) Can LLMs recognize indirect intent conveyed through non-verbal responses? (2) When and how do LLMs fail to capture non-verbal intent? (3) How can we improve LLMs' ability to interpret non-verbal intent? Through the evaluation, we observe that LLMs struggle to infer underlying meaning from non-verbal responses, with accuracy dropping by up to 60 percentage points compared to verbal ones. Further extensive analysis reveals a behavioral pattern in LLMs' interpretations of non-verbal behavior and demonstrates that in-context learning facilitates pragmatic inference.

2606.01843 2026-06-02 cs.CV cs.AI

Suppressing Forgery-Specific Shortcuts for Generalizable Deepfake Detection

抑制伪造特定捷径以实现可泛化的深度伪造检测

Yihui Wang, Yonghui Yang, Jilong Liu, Fengbin Zhu, Le Wu, Tat-Seng Chua

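AI summary: Proposes the Shortcut Subspace Suppression (S^3) framework, which explicitly characterizes and suppresses method-specific shortcuts via subspace modeling to improve the cross-method generalization of deepfake detection.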

详情
AI中文摘要

深度伪造检测在跨伪造方法泛化方面表现不佳,因为现有模型倾向于依赖虚假的方法特定捷径,这些捷径无法迁移到未见过的篡改操作。尽管近期方法试图改进泛化性,但它们缺乏明确的机制来识别和抑制学习表示中的此类捷径。在这项工作中,我们提出了捷径子空间抑制(S^3)框架,通过子空间建模显式表征并抑制方法特定捷径。我们的关键洞察是,区分不同伪造方法的变体捕获了方法特定的伪影,因此可作为方法特定捷径的有效代理。为此,我们训练一个轻量级线性探针进行伪造方法分类,并执行奇异值分解(SVD)以提取主导的捷径子空间。基于此公式,我们开发了两种互补策略来减少对捷径的依赖。在训练期间,我们软性抑制特征表示中的捷径子空间,鼓励模型依赖更可泛化的线索进行真/假判别。在推理时,我们引入一个无需训练的对应方法,衰减与识别出的捷径方向对齐的神经元,从而实现即插即用的泛化增强,并提高可解释性。在多个基准上的大量实验表明,我们的方法显著改善了跨方法泛化,同时保持了强大的域内性能。代码将在论文被接收后发布。

英文摘要

Deepfake detection suffers from poor generalization across forgery methods, as existing models tend to rely on spurious method-specific shortcuts that fail to transfer to unseen manipulations. While recent approaches attempt to improve generalization, they lack an explicit mechanism to identify and suppress such shortcuts in learned representations. In this work, we propose the Shortcut Subspace Suppression (S^3) framework, which explicitly characterizes and suppresses method-specific shortcuts via subspace modeling. Our key insight is that variations distinguishing different forgery methods capture method-specific artifacts and thus serve as an effective proxy for method-specific shortcuts. To this end, we train a lightweight linear probe for forgery method classification and perform Singular Value Decomposition (SVD) to extract the dominant shortcut subspace. Building on this formulation, we develop two complementary strategies to reduce shortcut reliance. During training, we softly suppress the shortcut subspace in feature representations, encouraging the model to rely on more generalizable cues for real/fake discrimination. At inference time, we introduce a training-free counterpart that attenuates neurons aligned with the identified shortcut directions, enabling plug-and-play generalization enhancement with improved interpretability. Extensive experiments on multiple benchmarks demonstrate that our method significantly improves cross-method generalization while maintaining strong in-domain performance. The code will be released upon acceptance of the submission.
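
The probe-then-SVD step can be sketched as follows. This is a hedged toy version: it assumes the linear probe has already been trained and is given as a weight matrix of shape (number of forgery methods, feature dimension), and that suppression means subtracting a scaled projection onto the top-k right singular vectors. The names `shortcut_subspace` and `suppress`, and the parameters `k` and `strength`, are hypothetical and not taken from the paper.

```python
import numpy as np

def shortcut_subspace(probe_weights, k):
    """Top-k right singular vectors of the probe weights (methods x dim).

    These are the feature-space directions the method classifier relies on
    most, used here as a proxy for method-specific shortcuts.
    """
    _, _, vt = np.linalg.svd(probe_weights, full_matrices=False)
    return vt[:k]  # rows are orthonormal directions

def suppress(features, basis, strength=1.0):
    """Attenuate the component of each feature inside the shortcut subspace.

    strength=1.0 removes the component entirely; values in (0, 1) give a
    soft suppression.
    """
    projection = features @ basis.T @ basis
    return features - strength * projection

rng = np.random.default_rng(0)
probe_weights = rng.standard_normal((4, 16))   # 4 forgery methods, 16-d features
features = rng.standard_normal((5, 16))

basis = shortcut_subspace(probe_weights, k=2)
cleaned = suppress(features, basis, strength=1.0)
softened = suppress(features, basis, strength=0.5)
```

With full strength the cleaned features carry no component along the selected directions, so a linear method classifier restricted to those directions can no longer read them.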

2606.01840 2026-06-02 cs.AI

Evaluation of Baseline Methods for IDD-based SSD External Memory Search

基于IDD的SSD外部内存搜索的基线方法评估

Yuki Suzuki, Alex Fukunaga

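AI summary: Evaluates the performance of simple baseline methods for A* based on immediate duplicate detection (IDD) in SSD external-memory search, and analyzes the effect of OS-level page caches.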

详情
Comments
accepted to The 19th International Symposium on Combinatorial Search (SoCS2026)
AI中文摘要

许多困难的搜索问题无法仅使用RAM通过A*等算法解决。先前的工作提出了使用容量远大于RAM的外部内存(如SSD和HDD)的搜索算法,但先前的工作主要集中在延迟重复检测方法以及复杂的即时重复检测(IDD)方法上,而相对简单的IDD方法尚未得到系统研究。此外,操作系统级管理及加速外部内存访问的机制(如页面缓存)的影响也未被研究。本文通过评估和分析基于IDD的A*的简单基线方法的性能,填补了文献中的这些空白。

英文摘要

Many difficult search problems cannot be solved by algorithms such as A* using only RAM. Search algorithms which use external memory such as SSDs and HDDs with much higher capacity than RAM have been proposed in previous work, but previous work has focused on delayed duplicate detection approaches, as well as complex immediate duplicate detection (IDD) methods, and relatively simple methods for IDD have not been systematically studied. In addition, the effect of OS-level mechanisms for managing and speeding up accesses to external memory, such as page caches, has not been studied. This paper addresses these gaps in the literature by evaluating and analyzing the performance of simple baseline approaches for IDD-based A*.
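
For readers unfamiliar with the term, immediate duplicate detection means that each successor is looked up in the duplicate table as soon as it is generated, rather than in a later batch pass. The sketch below shows the idea with an in-memory dictionary standing in for the table; in the setting the abstract describes, that table would live on an SSD. The function name and interface are illustrative assumptions.

```python
import heapq

def astar_idd(start, goal, neighbors, h):
    """A* with immediate duplicate detection.

    neighbors(s) yields (successor, edge_cost) pairs; h is the heuristic.
    Returns the cost of a cheapest path, or None if the goal is unreachable.
    """
    best_g = {start: 0}                  # duplicate-detection table
    frontier = [(h(start), 0, start)]    # entries are (f, g, state)
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best_g[state]:
            continue                     # stale entry superseded by a cheaper path
        for succ, cost in neighbors(state):
            g2 = g + cost
            # Immediate check: consult the table at generation time.
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ))
    return None

def grid_neighbors(size):
    """4-connected unit-cost moves on a size x size grid."""
    def neighbors(p):
        x, y = p
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                yield (nx, ny), 1
    return neighbors
```

Every table lookup in this loop would be a random access to external memory in the SSD setting, which is why page caching at the OS level can matter for performance.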

2606.01839 2026-06-02 cs.DC cs.AR cs.LG

Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving

观察而非预测:面向智能体服务的对话级解耦调度

Jianru Ding, Ryien Hosseini, Pouya Mahdi Gholami, Mingyuan Xiang, Henry Hoffmann

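AI summary: Proposes raising the scheduling unit from a single turn to the whole conversation, exploiting the observable two-phase structure of a conversation (a compute-bound first turn followed by a memory-bound tail) to achieve prediction-free disaggregated scheduling that substantially reduces latency and improves energy efficiency.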

详情
AI中文摘要

基于LLM的智能体通过多轮依赖推理和工具调用来解决用户任务,产生的工作负载在任务到达时总成本未知。现有的多轮系统以轮次为调度单元,逐轮决定是否将预填充与解码解耦。该决策依赖于该轮的解码长度、工具行为和KV增长,这些量在调度器必须行动时不可观察,迫使系统进行预测。我们表明这种对预测的依赖是由调度单元而非工作负载强加的。将调度单元从轮次提升到对话,将轮次级的不规则性转化为稳定的两阶段结构:1) 计算密集的首轮预填充,随后是2) 长尾内存密集阶段。因此,以对话为调度单元,放置问题简化为读取首轮输入长度和每解码器KV占用率,两者均可直接观察。我们在ConServe中实例化这一原则,它将首轮预填充路由到高吞吐预填充器,精确传输KV缓存一次,并将对话固定到单个解码器处理其整个尾部,无需学习解码侧成本模型。与每轮预测基线相比,ConServe将p95首次有效令牌时间(对话首个用户可见输出的延迟)降低51.08%,能效提升7.51%,同时保持最后一轮的TBT和SLO;将两阶段映射到异构GPU层级可进一步增加22.75%的能效。

英文摘要

LLM-based agents resolve a user task through many turns of dependent inference and tool calls, producing a workload whose total cost is unknown when the task arrives. Existing multi-turn systems keep the turn as the scheduling unit and decide, turn by turn, whether to disaggregate prefill from decode. That decision rests on the turn's decode length, tool behavior, and KV growth, quantities that are not observable when the scheduler must act, forcing the system to predict them. We show this dependence on prediction is imposed by the scheduling unit, not the workload. Raising the scheduling unit from the turn to the conversation converts turn-level irregularity into a stable, two-phase structure: 1) a compute-bound turn-1 prefill followed by 2) a long, memory-bound tail. Thus, with the conversation as the scheduling unit, placement reduces to reading the first-turn input length and per-decoder KV occupancy, both directly observable. We instantiate this principle in ConServe, which routes the first-turn prefill to a high-throughput prefiller, transfers the KV cache exactly once, and pins the conversation to a single decoder for its entire tail, with no learned model of decode-side cost. Against a per-turn prediction baseline, ConServe reduces p95 time-to-first-effective-token (the latency of a conversation's first user-visible output) by 51.08% and improves energy efficiency by 7.51% while preserving last-turn TBT and SLOs; mapping the two phases onto heterogeneous GPU tiers adds a further 22.75% in energy efficiency.
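
A rough sketch of conversation-level placement from observable quantities only. Everything here is hypothetical: the class names, the least-occupancy policy, and the token-based capacity model are assumptions made for illustration, and the sketch covers only decoder pinning, not prefill routing or KV transfer. It is not ConServe's actual scheduler.

```python
from dataclasses import dataclass

@dataclass
class Decoder:
    name: str
    kv_capacity: int   # tokens of KV cache the decoder can hold
    kv_used: int = 0   # tokens currently occupied

class ConversationScheduler:
    """Pins each conversation to one decoder for its whole lifetime."""

    def __init__(self, decoders):
        self.decoders = decoders
        self.pinned = {}  # conversation id -> Decoder

    def place(self, conv_id, first_turn_tokens):
        # Later turns reuse the decoder chosen at the first turn.
        if conv_id in self.pinned:
            return self.pinned[conv_id]
        # Only observable inputs: first-turn length and current KV occupancy.
        candidates = [d for d in self.decoders
                      if d.kv_capacity - d.kv_used >= first_turn_tokens]
        if not candidates:
            raise RuntimeError("no decoder has enough free KV cache")
        chosen = min(candidates, key=lambda d: d.kv_used / d.kv_capacity)
        chosen.kv_used += first_turn_tokens
        self.pinned[conv_id] = chosen
        return chosen
```

The point of the sketch is that no decode-length or tool-behavior prediction appears anywhere in `place`; the decision is made once per conversation.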

2606.01838 2026-06-02 cs.CL cs.AI cs.LG

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

LayerRoute: 基于LoRA微调的输入条件自适应层跳过方法用于智能语言模型

Prateek Kumar Sikdar

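AI summary: Proposes LayerRoute, which adds a lightweight router and LoRA adapters to each transformer block to adaptively skip layers according to input type (tool call or planning/reasoning), achieving a 12.91% skip differential and improved quality with only 0.22% additional trainable parameters.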

详情
Comments
10 pages, 3 figures, 4 tables
AI中文摘要

智能语言模型系统交替使用两种结构不同的步骤类型:结构化工具调用(短、确定性、低困惑度)和开放式规划/推理步骤(长、复杂、高困惑度)。尽管存在这种异质性,当前的推理系统对每个步骤应用相同的计算量。我们引入LayerRoute,一个轻量级适配器,学习基于每个输入有选择地跳过Transformer块。LayerRoute为Qwen2.5-0.5B-Instruct中的24个Transformer块中的每一个增加:(1)一个每层路由器(约897个参数,Linear(896,1)),通过直通估计器输出硬二值门;(2)在Q/K/V/O注意力投影上的LoRA适配器(秩8,约1.08M参数)。骨干权重保持冻结。在智能体数据(Hermes、Glaive、GSM8K、Turing)上进行单次端到端训练,并加入门正则化项,迫使系统发现每个输入类型下哪些块是可跳过的。经过3000步(在A100 40GB上6.4分钟),LayerRoute实现了12.91%的跳过差异:工具调用跳过15.25%的FLOPs,而规划步骤仅跳过2.34%,仅使用1.10M可训练参数(占494M骨干的0.22%)。由于LoRA适配,质量相比基础模型有所提升,工具调用上的困惑度差为-1.29,规划步骤上为-1.30。

英文摘要

Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems apply identical compute to every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (~897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable per input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a 12.91% skip differential: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with perplexity delta of -1.29 on tool calls and -1.30 on planning.
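
The parameter counts quoted in this abstract can be checked with a few lines of arithmetic. The hidden size 896, 24 blocks, rank 8, and the 494M backbone come from the abstract; the key/value projection width of 128 is an assumption (grouped-query attention with 2 KV heads of dimension 64), chosen because it reproduces the quoted ~1.08M LoRA parameters.

```python
hidden = 896           # from Linear(896, 1) in the abstract
layers = 24            # transformer blocks in Qwen2.5-0.5B-Instruct
rank = 8               # LoRA rank
kv_dim = 128           # ASSUMPTION: 2 KV heads x 64 dims (grouped-query attention)
backbone = 494e6       # backbone size quoted in the abstract

def lora_params(d_in, d_out, r):
    """A LoRA adapter adds two matrices: (d_in x r) and (r x d_out)."""
    return r * (d_in + d_out)

# One Linear(896, 1) router per block: 896 weights plus 1 bias.
router_total = layers * (hidden + 1)

# LoRA on Q and O (896 -> 896) and on K and V (896 -> kv_dim) in every block.
lora_total = layers * (2 * lora_params(hidden, hidden, rank)
                       + 2 * lora_params(hidden, kv_dim, rank))

trainable = router_total + lora_total
fraction_pct = 100 * trainable / backbone
skip_differential = 15.25 - 2.34   # tool-call skip rate minus planning skip rate

print(router_total, lora_total, trainable)
print(round(fraction_pct, 2), round(skip_differential, 2))
```

Under these assumptions the routers contribute about 21.5K parameters and the adapters about 1.08M, which together match the quoted 1.10M total and the 0.22% fraction.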

2606.01834 2026-06-02 cs.CV cs.AI

Physics-Guided Attention in a Lightweight TCN for Efficient WiFi CSI-Based Human Activity Recognition

轻量级TCN中的物理引导注意力用于高效基于WiFi CSI的人体活动识别

Chinthaka Ranasingha, Tharindu Fernando, Sridha Sridharan, Clinton Fookes, Harshala Gammulle

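AI summary: Proposes a compact TCN framework that explicitly introduces motion-aware inductive biases through Doppler-energy-guided temporal attention and variance-driven channel attention, outperforming deeper baselines while reducing parameter count and computational cost.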

详情
AI中文摘要

基于WiFi信道状态信息(CSI)的人体动作识别(HAR)因其非接触、低成本及保护隐私的特性而受到越来越多的关注。然而,现有的基于学习的方法主要依赖深度、计算密集的架构来隐式地从CSI测量中捕捉运动动态,从而增加了模型复杂度并降低了效率。相反,我们认为,结合针对CSI信号物理特性的适当归纳偏置能够实现更高效和有效的学习。在这项工作中,我们提出一个紧凑的基于时间卷积网络(TCN)的框架,将运动感知的归纳偏置显式地融入特征学习。具体地,我们在特征空间中引入多普勒能量引导的时间注意力机制以强调运动显著的时间段,以及一个方差驱动的通道注意力模块,根据时间运动统计自适应地加权信息子载波。通过整合这些领域特定的先验知识,所提模型在不增加架构深度的情况下有效捕捉运动动态。在多个基准数据集上的大量实验表明,我们的方法相比更深的基线模型取得了优越的性能,同时显著减少了参数数量和计算成本。

英文摘要

Human Action Recognition (HAR) using WiFi Channel State Information (CSI) has gained increasing attention due to its non-contact, low-cost, and privacy-preserving nature. However, existing learning-based approaches largely rely on deep, computationally intensive architectures to implicitly capture motion dynamics from CSI measurements, thereby increasing model complexity and reducing efficiency. Instead, we argue that incorporating appropriate inductive biases tailored to the physical characteristics of CSI signals enables more efficient and effective learning. In this work, we propose a compact temporal convolutional network (TCN)-based framework that explicitly incorporates motion-aware inductive biases into feature learning. Specifically, we introduce a Doppler-energy-guided temporal attention mechanism in feature space to emphasize motion-salient time segments, and a variance-driven channel attention module to weight informative subcarriers based on temporal motion statistics adaptively. By integrating these domain-specific priors, the proposed model effectively captures motion dynamics without increasing architectural depth. Extensive experiments on multiple benchmark datasets demonstrate that our approach achieves superior performance compared to deeper baselines, while significantly reducing parameter count and computational cost.

2606.01833 2026-06-02 cs.LG cs.AI

Learning Implicit Bias in Generative Spaces for Accelerating Protein Dynamics Emulation

学习生成空间中的隐式偏置以加速蛋白质动力学仿真

Kaihui Cheng, Zhiqiang Cai, Wenkai Xiang, Zhihang Hu, Siyu Zhu, Tzuhsiung Yang, Yuan Qi

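AI summary: Proposes introducing an implicit history-dependent bias in the generative space of a pretrained generative emulator, combining distance-weighted score estimation with environment-support regularization and a re-projection step that preserves structural validity, markedly improving sampling diversity and the speed of covering rare states.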

详情
AI中文摘要

蛋白质动力学生成式仿真器能够以分子动力学一小部分成本生成合理的轨迹,但它们继承了训练分布,在长期外推下倾向于重访已知状态而非到达稀有状态。受经典增强采样启发,我们在预训练仿真器的生成空间中引入隐式历史依赖偏置。具体来说,一个历史感知的分数估计器向冻结的仿真器添加距离加权偏置,引导逆时采样远离先前生成的结构,并通过环境支持项进行正则化。为在长时间尺度下保持结构有效性,一个基于分数的精化步骤利用冻结仿真器将漂移的样本重新投影到数据流形上。实验表明,该方法(i)在DynamicPDB-80上将多样性提升35%;(ii)在12个零样本快速折叠蛋白质上,单独使用学习到的偏置达到无偏仿真器覆盖的速度最高快约15倍,与精化结合后覆盖速度最高快约37倍,同时覆盖的低能态数量多约3倍。代码即将发布。

英文摘要

Generative emulators of protein dynamics produce plausible trajectories at a fraction of the cost of molecular dynamics, but they inherit their training distribution and tend to revisit known states rather than reach rare ones under long-horizon extrapolation. Inspired by classical enhanced sampling, we introduce an implicit, history-dependent bias in the generative space of a pretrained emulator. Specifically, a history-aware score estimator augments the frozen emulator with a distance-weighted bias that steers reverse-time sampling away from previously generated structures, regularized by an environment-support term. To preserve structural validity at long horizons, a score-based refinement step re-projects drifted samples onto the data manifold using the frozen emulator. Our experiments demonstrate that the method (i) raises diversity by $35\%$ on DynamicPDB-80; (ii) on $12$ zero-shot Fast-Folding proteins, the learned bias alone reaches the unbiased emulator's coverage up to ${\sim}15\times$ faster, and pairing it with refinement reaches the coverage up to ${\sim}37\times$ faster while covering ${\sim}3\times$ as many low-energy states. Code will be released soon.

2606.01830 2026-06-02 cs.AI

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

CAPF:基于信用衰减特权反馈引导搜索智能体轨迹生成

Bin Chen, Xinye Liao, Yiming Liu, Xin Liao, Chonghan Liu

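AI summary: To address the difficulty search agents have learning from sparse outcome rewards, proposes CAPF, which uses verifier-side information at training time to repair zero-reward trajectories into positive-reward ones and attenuates the associated credit to suit deployment without privileged feedback, raising Qwen3-4B's average exact-match score from 44.7% to 48.5% on seven open-domain QA benchmarks.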

详情
AI中文摘要

最近的LLM搜索智能体使用带可验证奖励的强化学习(RLVR)从结果奖励中学习搜索增强推理。在困难问题上,这些智能体很少采样到端到端成功的轨迹,导致仅基于结果的RLVR只有少量正奖励轨迹。我们认为,改善此类问题的学习需要在训练期间提供额外指导,而RLVR已经包含了可以提供这种指导的验证器侧信息。这些信息可以识别智能体提交答案中的错误或遗漏,并引导轨迹内的修正。我们提出了一种训练时机制,称为**信用衰减特权反馈**(CAPF),该机制通过在训练期间进行特权反馈调用,使验证器侧信息可用。CAPF允许策略将零奖励尝试修复为正奖励修复轨迹,并衰减对反馈调用和早期动作的信用,以适应没有此调用的部署。实证研究表明,在七个开放域问答基准上,CAPF将Qwen3-4B的平均精确匹配分数从仅结果RLVR的44.7%提升至48.5%。

英文摘要

Recent LLM search agents use reinforcement learning with verifiable rewards (RLVR) to learn search-augmented reasoning from outcome rewards. On hard problems, these agents rarely sample end-to-end successful rollouts, leaving outcome-only RLVR with few positive-reward trajectories. We argue that improving learning on such problems requires additional guidance during training, and RLVR already contains verifier-side information that can provide it. This information can identify errors or omissions in the agent's submitted answer and guide revision within the rollout. We propose a training-time mechanism called \textbf{Credit-Attenuated Privileged Feedback} (CAPF), which makes this verifier-side information available through a Privileged Feedback call during training. CAPF lets the policy revise zero-reward attempts into positive-reward repair trajectories and attenuates credit for the feedback call and earlier actions to accommodate deployment without this call. Empirically, CAPF improves Qwen3-4B's average exact-match score from 44.7% under outcome-only RLVR to 48.5% on seven open-domain QA benchmarks.