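arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.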

2606.01334 2026-06-02 cs.CV

HOLA: Holistic Multi-Modal Alignment for Open-Set 3D Recognition

Koby Aharonov, Oren Shrout, Ayellet Tal

Affiliation: Technion – Israel Institute of Technology

AI summary: Proposes HOLA, which uses a decoupled multi-positive contrastive loss to align each point cloud with multi-view images and text descriptions, achieving holistic multi-modal alignment for open-set 3D recognition and state-of-the-art zero-shot performance on long-tail benchmarks.

Abstract

Open-set 3D recognition requires models that generalize to rare or unseen categories. Recent approaches address this by distilling language-vision knowledge into 3D encoders, typically relying on heavy 2D ViTs and aligning each point cloud with a single image or caption, thus anchoring representations to partial views. We propose aligning each point cloud with multiple images and textual descriptions to capture a more holistic understanding of 3D objects. To realize this idea, it is essential to design a loss function capable of jointly aligning a 3D instance with multiple matched signals, multi-view images and multiple texts, while separating positive aggregation from negative competition. We introduce such a function, termed the decoupled multi-positive contrastive loss. Our formulation enhances the loss's hardness-aware focus on challenging negatives, avoiding the "spotlight crowding" that occurs when many positives share the same softmax with all the negatives. Complementing this, we present a lightweight text adapter applied only to web captions, reducing the domain gap to curated annotations and enabling effective use of large-scale unsupervised text. Our model demonstrates state-of-the-art open-vocabulary performance on long-tail benchmarks, yielding substantial zero-shot improvements while sustaining high frame rates.
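
The abstract does not give the loss formula, so the following is only a minimal sketch of one plausible reading of "separating positive aggregation from negative competition". The function names, the temperature `tau`, and the exact form of both losses are assumptions for illustration, not the authors' definition. In the shared version every positive sits in one softmax with all other positives and all negatives; in the decoupled version each positive competes only with the negatives, and the per-positive terms are then averaged.

```python
import numpy as np

def shared_softmax_loss(pos_sims, neg_sims, tau=0.07):
    """Multi-positive InfoNCE: all positives and negatives share one softmax."""
    logits = np.concatenate([pos_sims, neg_sims]) / tau
    log_denom = np.logaddexp.reduce(logits)
    return float(np.mean(log_denom - pos_sims / tau))

def decoupled_multi_positive_loss(pos_sims, neg_sims, tau=0.07):
    """Hypothetical decoupled variant: each positive is contrasted only
    against the negatives, then the per-positive terms are averaged."""
    neg_lse = np.logaddexp.reduce(neg_sims / tau)
    per_pos_log_denom = np.logaddexp(pos_sims / tau, neg_lse)
    return float(np.mean(per_pos_log_denom - pos_sims / tau))
```

Under this reading, adding more matched views or captions inflates the shared loss for an already well-aligned positive, while the decoupled term for that positive is unchanged, so the gradient stays focused on hard negatives.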

2606.01332 2026-06-02 cs.RO

S2M-Trek: From Single to Multi-Sphere Transport via Per-Frame Deep Sets on a Wheel-Legged Robot

Zong Chen, Xuebin Li, Jinpeng Xiao, Shaoyang Li, Ben Liu, Min Li, Zhouping Yin, Yiqun Li

Affiliations: School of Mechanical Science and Engineering, Huazhong University of Science and Technology; School of Mathematics, Harbin Institute of Technology

AI summary: Addresses the dynamic manipulation problem of transporting multiple free-rolling spheres simultaneously on the back of a wheel-legged quadruped robot. Proposes a Per-Frame Deep Sets (PFDS) encoder whose per-frame permutation-invariant pooling resolves the permutation-symmetry mismatch of history-concatenation encoders, achieving 100% no-drop transport with five spheres.

Abstract

We study the problem of scaling dynamic loco-manipulation from a single free-rolling sphere to multiple spheres transported simultaneously on the back of a wheel-legged quadruped, without fences, grippers, or mechanical stops. Multiple identical free-rolling spheres form an unordered set with no persistent identity: their ordering may change independently at each history frame, creating a per-frame permutation symmetry that standard history-concatenation set encoders do not explicitly enforce; these encoders impose only a shared, diagonal permutation symmetry over the full history. We show that this symmetry mismatch leads to a concrete failure mode in curriculum-based reinforcement learning. Within the same PPO training budget, flat MLPs and branch-wise encoders plateau at or below the two-sphere stage, while a history-concatenation Deep Sets baseline (HCDS) fails to progress past the two-sphere stage in our runs unless ball-to-slot assignments are randomised during training, suggesting that it exploits slot indices as a curriculum shortcut rather than learning identity-free multi-sphere dynamics. We propose Per-Frame Deep Sets (PFDS), which performs permutation-invariant pooling within each history frame before temporal readout; we prove that PFDS is $\Gframe$-invariant and universally approximates continuous $\Gframe$-invariant policies. A $2 \times 2$ ablation over encoder architecture and slot randomisation separates the architectural and data-augmentation pathways, and PFDS reaches the five-sphere stage with 100% no-drop transport in simulation across all five random seeds. We further distill the PFDS teacher into TactSet via DAgger, replacing privileged sphere-state observations with a $16 \times 16$ Boolean union contact map, yielding a compact and naturally $\Gframe$-invariant tactile representation.
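
The difference between the two symmetries can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's network: the single-layer `tanh` feature maps, the linear readouts, and all weight shapes are assumptions. The per-frame encoder sums over spheres inside each frame before the temporal readout, while the history-concatenation baseline first stacks each slot's history and only then sums over slots.

```python
import numpy as np

def per_frame_deep_sets(history, w_phi, w_rho):
    """history has shape (T, N, D): N sphere states in each of T frames.
    Pools over spheres within every frame, then reads out over time."""
    per_sphere = np.tanh(history @ w_phi)   # (T, N, H) per-sphere features
    pooled = per_sphere.sum(axis=1)         # (T, H) order-free within each frame
    return pooled.reshape(-1) @ w_rho       # linear temporal readout

def history_concat_deep_sets(history, w_phi_cat, w_rho_cat):
    """Baseline: concatenates each slot's history, then pools over slots.
    Invariant only when the same permutation is applied to every frame."""
    num_frames, num_spheres, dim = history.shape
    per_slot = history.transpose(1, 0, 2).reshape(num_spheres, num_frames * dim)
    return np.tanh(per_slot @ w_phi_cat).sum(axis=0) @ w_rho_cat
```

Shuffling the spheres with a different permutation in each frame leaves the first encoder's output unchanged but generally changes the second's, which is the mismatch the abstract describes.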

2606.01329 2026-06-02 cs.LG q-bio.BM

Conditioned free-energy density of proteins using unbalanced solutions to constraint satisfaction problems

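Pratik Worah, Subhash Khot, Srinivasa Varadhan

Affiliation: CIMS, NYU

AI summary: Reduces the log-partition function (free energy) of a conditioned non-uniform Curie-Weiss spin Hamiltonian to an unbalanced $2 \to 1$ norm computation, designs a polynomial-time SDP algorithm for it, and applies the method to the ubiquitin protein to explore the free-energy landscape and identify flexible regions.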
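
For readers unfamiliar with the quantity, the plain $2 \to 1$ operator norm of a matrix $A$ is $\max_{\|x\|_2 \le 1} \|Ax\|_1$, which equals $\max_{s \in \{\pm 1\}^m} \|A^\top s\|_2$. The sketch below computes it by brute force over sign patterns. It assumes the standard operator-norm definition; the "unbalanced" variant in the paper may add constraints that the summary does not state, and this exponential-time enumeration is not the paper's SDP algorithm.

```python
import itertools
import numpy as np

def two_to_one_norm(matrix):
    """max ||A x||_1 over ||x||_2 <= 1, by enumerating all 2^m sign
    vectors s and taking the largest ||A^T s||_2. Small matrices only."""
    best = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=matrix.shape[0]):
        best = max(best, float(np.linalg.norm(matrix.T @ np.array(signs))))
    return best
```

For the 2-by-2 identity this gives the square root of 2, attained at the unit vector with equal coordinates.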

2606.01323 2026-06-02 cs.CL cs.AI

DiffuSent: Towards a Unified Diffusion Framework for Aspect-Based Sentiment Analysis

Shu Long, Yanglei Gan, Xuchuan Zhou

Affiliations: University of Electronic Science and Technology of China; Southwest Petroleum University; Southwest Minzu University

AI summary: Proposes DiffuSent, a non-autoregressive diffusion framework that unifies all aspect-based sentiment analysis subtasks as boundary denoising diffusion processes and uses a contrastive denoising training strategy to resolve duplicate predictions. It outperforms existing generative and span-based systems across 28 settings and achieves up to 181 times faster inference.

Abstract

Aspect-Based Sentiment Analysis (ABSA) encompasses seven distinct subtasks, each focusing on different extracted elements. Despite the proven success of generative models in unified aspect sentiment analysis, existing approaches often rely on auto-regressive token-by-token generation without grasping the whole information of the aspect and opinion terms, resulting in boundary insensitivity, particularly in the context of multi-word aspect and opinion terms. To address these issues, we present DiffuSent, a non-auto-regressive diffusion framework that systematically formulates all ABSA subtasks as boundary denoising diffusion processes, progressively refining boundaries over noisy states. Furthermore, we introduce a contrastive denoising training strategy that effectively addresses duplicate predictions with subtle variations introduced by the diffusion process. Extensive experiments across 28 settings (7 subtasks x 4 datasets) demonstrate that DiffuSent delivers consistent improvements over the strongest generative and span-based systems. DiffuSent exhibits notable gains on multi-word triplets, achieving an average improvement of +2.48 F1, and maintains robust extraction accuracy in sentences containing multiple sentiment triplets. Moreover, the non-auto-regressive decoding enables substantial efficiency benefits, reaching up to 181 times faster inference than auto-regressive generative baselines.

2606.01322 2026-06-02 cs.CL cs.AI

TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

Victor Akinode, Senyu Li, Wassim Hamidouche, Waqas Zamir, Inbal Becker-Reshef, David Ifeoluwa Adelani

Affiliations: Mila - Quebec AI Institute; McGill University; Microsoft AI for Good Research Lab; Canada CIFAR AI Chair

AI summary: To address the lack of safety evaluation of large language models on low-resource African languages, proposes the TUKABENCH benchmark, which uses four settings (direct translation, culturally adapted translation, human-curated prompts, and code-switched prompts) to assess how language, cultural grounding, and prompt evasiveness affect model safety. Prompting in African languages lowers refusal rates; a Deflection label and human validation are introduced to address model comprehension failures and judge reliability.

Comments: Under review

Abstract

Safety evaluation of Large Language Models (LLMs) remains heavily English-centric, leaving Low-Resource Languages (LRLs), particularly African ones, critically underexplored. We introduce TUKABENCH, a jailbreak benchmark for seven African languages that extends JailbreakBench (JBB) beyond direct translation through four settings: human translation of JBB prompts, English adaptation to African contexts followed by human translation, human-curated prompts validated through interactions with GPT-5.2, and code-switched prompts combining English and African languages, isolating the effect of language, cultural grounding, and prompt evasiveness on model safety. Across closed and open models, prompting in African languages reduces refusal relative to English, with culturally adapted prompts leading to the least refusal. The evaluation also surfaces two structural limitations: model comprehension failures and reduced LLM-as-a-judge reliability in LRLs. To capture the first, we introduce Deflection alongside Refused and Jailbroken; to assess the second, we validate outputs with human annotations, showing that judge-human agreement drops in lower-resource languages and less commonly supported scripts.

2606.01315 2026-06-02 cs.CV

DeblurNVS: Geometric Latent Diffusion for Novel View Synthesis from Sparse Motion-Blurred Images

Changyue Shi, Wangbo Yu, Chaoran Feng, Li Yuan

Affiliations: School of AI for Science, Peking University Shenzhen Graduate School; School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School

AI summary: Proposes DeblurNVS, a framework that uses geometric latent diffusion to synthesize high-fidelity novel views directly from sparse motion-blurred images, without per-scene optimization.

Abstract

Novel view synthesis (NVS) is a fundamental problem in computer vision and graphics. Recent advances in neural radiance fields (NeRF), 3D Gaussian Splatting (3DGS), and generative view synthesis have substantially improved its quality. Yet most methods still rely on clean observations, where image structures and cross-view geometric cues are well preserved. Motion blur breaks this assumption by corrupting local details and weakening multi-view correspondences. Such blur commonly arises from camera shake, scene motion, or finite exposure in practical capture. Blur-aware NVS methods address this degradation by modeling image formation, but their reliance on costly per-scene optimization limits efficient and generalizable sparse-view synthesis. To address this, we propose DeblurNVS, a novel framework for synthesizing high-fidelity novel views directly from sparse motion-blurred images, without requiring per-scene optimization. DeblurNVS restores the intermediate geometric representations needed for multi-view reasoning, enabling blurred inputs to recover reliable structure and correspondence cues. The restored representations are then combined with target camera information to synthesize the target-view representation and reconstruct a sharp RGB novel view. To enable large-scale training, we construct a motion-blurred NVS dataset from DL3DV-10K using interpolation-based finite-exposure blur synthesis. Extensive experiments demonstrate that DeblurNVS outperforms existing baselines on synthetic motion-blur benchmarks and generalizes to real motion-blurred scenes, producing perceptually sharper and structurally more stable novel views while avoiding costly per-scene optimization. Project page: https://github.com/PKU-YuanGroup/DeblurNVS.

2606.01314 2026-06-02 cs.AI

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian, Qifan Wang, Chen Wu, Lei He

Affiliations: Shanghai Jiao Tong University; Eastern Institute of Technology; University of Science and Technology of China; Southeast University; Ningbo Institute of Digital Twin

AI summary: Proposes SkillSmith, a framework that co-evolves skills and tools through a unified proposal space and a Lotka-Volterra ecological utility model, significantly improving performance on multiple benchmarks.

Abstract

Recent self-evolving agents have shown that skills can be discovered, refined, and accumulated through execution. However, existing skill-evolution frameworks typically assume a fixed tool layer and evaluate each skill independently, limiting their ability to repair tool-level failures or reason about interactions among skills. We propose SkillSmith, a synergy-aware skill-tool co-evolution framework. SkillSmith introduces a unified proposal space in which reflection produces atomic bundles that jointly modify skills and tools, allowing tools to be wrapped, edited, composed, split, or retired when skill evolution identifies a reusable capability gap. To guide this joint search, SkillSmith maintains an ecological utility model inspired by Lotka-Volterra dynamics, where an interaction matrix estimated from execution traces captures pairwise complementarity and conflict among skills and provides pressure signals for retrieval, mutation prioritization, and retirement. Furthermore, SkillSmith records anti-patterns, including failure signatures, causal attributions, and remedies, to accelerate diagnosis and veto proposals that repeat known mistakes. Experiments on three benchmarks, including WildClawBench, and five Qwen3.5 model scales show that SkillSmith consistently outperforms strong baselines, with gains that amplify as task complexity and multi-skill co-activation increase.

2606.01313 2026-06-02 cs.RO cs.AI

PSG-Nav: Probabilistic Scene Graph Navigation via Multiverse Decision Making

Rufeng Chen, Yue Chang, Xiaqiang Tang, Hechang Chen, Sihong Xie

Affiliation: Tsinghua University

AI summary: Proposes PSG-Nav, which builds a 3D probabilistic scene graph and uses Multiverse Decision to sample the most likely world settings from the joint distribution, handling perception uncertainty in open-vocabulary navigation. It also introduces an Evidential Experience Calibrator for online lifelong adaptation and achieves state-of-the-art results on multiple benchmarks.

Comments: 21 pages, 7 figures. ICML 2026

Abstract

Open-vocabulary navigation requires embodied agents to manage significant perception uncertainty stemming from semantic ambiguity and model errors. However, most existing works settle for locally optimal deterministic approaches, which preclude complex navigation decision-making over multiple composite possibilities that are critical for globally better solutions. In this paper, we propose Probabilistic Scene Graph Navigation (PSG-Nav), which constructs a 3D Probabilistic Scene Graph that uses full semantic categorical distributions to account for perception uncertainty. To efficiently use the local distributions to compose and reason about the optimal navigation landmarks, we propose Multiverse Decision, which samples multiple most likely world settings from the joint distribution and evaluates navigation landmarks based on the compatibility between landmarks and multiverses. To mitigate false positives due to epistemic uncertainty in open-vocabulary navigation, we introduce the Evidential Experience Calibrator, which enables online lifelong adaptation by cross-validating detections against memories of past successes and failures. Extensive experiments on the widely used benchmarks MP3D, HM3D, and HSSD demonstrate that PSG-Nav establishes new state-of-the-art results, achieving Success Rates of 66.1%, 44.8%, and 67.9%, respectively. Code is available at: https://psg-nav.github.io/
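
The idea of scoring landmarks against sampled worlds can be illustrated with a toy example. Everything here is an assumption made for the sketch: nodes are treated as independent categorical variables, worlds are drawn by plain sampling rather than by selecting the most likely joint assignments, and "compatibility" is reduced to how often a node carries the target label. The paper's actual procedure is likely richer.

```python
import numpy as np

def sample_worlds(node_probs, num_worlds, rng):
    """Draw joint labelings ("worlds") of the scene graph. node_probs has
    shape (num_nodes, num_classes); nodes are sampled independently."""
    num_nodes, num_classes = node_probs.shape
    worlds = np.empty((num_worlds, num_nodes), dtype=int)
    for node in range(num_nodes):
        worlds[:, node] = rng.choice(num_classes, size=num_worlds, p=node_probs[node])
    return worlds

def landmark_scores(worlds, target_class):
    """Fraction of sampled worlds in which each node has the target label."""
    return (worlds == target_class).mean(axis=0)
```

Keeping the full categorical distribution per node, rather than committing to the top label, lets a node that is only moderately likely to be the target still receive a nonzero score.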

2606.01311 2026-06-02 cs.CL cs.AI cs.LG cs.MA

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

Zhuoyun Yu, Xin Xie, Wuguannan Yao, Chenxi Wang, Lei Liang, Xiang Qi, Shumin Deng

Affiliations: Zhejiang University; Ant Digital Technologies, Ant Group; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph

AI summary: Proposes SkillAdaptor, a training-free step-level skill adaptation framework that uses explicit failure attribution and targeted updates to improve LLM agents on long-horizon interactive tasks.

Comments: Work in progress

Abstract

Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-level feedback, which makes failure attribution coarse and often produces unstable or overly broad revisions. We propose SkillAdaptor, a training-free step-level skill adaptation framework with explicit failure attribution that can plug into OpenClaw-class agent harnesses. Given a failed trajectory, SkillAdaptor identifies a first actionable fault step, links responsibility to candidate skills, and applies targeted updates under explicit acceptance checks while keeping the backbone frozen. We evaluate on WebShop, PinchBench, and Claw-Eval with Kimi-K2.5, GLM-5, and GPT-5.2. SkillAdaptor improves over no-skill and skill-adaptation baselines on all three suites, with the largest single-metric improvements of +1.5 points on PinchBench Avg Score%, +1.8 on Claw-Eval Avg Score, and +1.7 on WebShop success rate. These results indicate that step-level attribution supports more stable and auditable training-free skill maintenance. The code will be released at https://github.com/zjunlp/SkillAdaptor.

2606.01306 2026-06-02 cs.LG cs.IR

FAiT: Frequency-Aware Inverted Transformer for Multivariate Time Series Forecasting

Peng He, Yao Liu, Yanglei Gan, Run Lin, Yuxiang Cai, Qiao Liu

Affiliation: University of Electronic Science and Technology of China

AI summary: Proposes FAiT, which uses an inverted attention mechanism and dynamic temporal-frequency modulation to address Transformers' neglect of high-frequency signals and time-varying spectral characteristics in multivariate time series forecasting.

Abstract

While Transformer-based architectures have established themselves as a dominant paradigm in Multivariate Time Series Forecasting (MTSF), their core self-attention mechanism inherently functions as a low-pass filter, systematically smoothing out high-frequency signals vital for sharp local changes. Recent advancements have increasingly incorporated frequency-domain operations to address this bias; however, most existing designs rely on fixed spectral bases and apply sequence-wise (uniform) modulation, implicitly assuming a time-invariant frequency response. This overlooks a key property of real-world series: their spectral characteristics often evolve over time, making uniform modulation insufficient for capturing fine-grained temporal dynamics. To tackle these limitations, we propose FAiT, a Frequency-Aware inverted Transformer. Specifically, FAiT rectifies the spectral bias internally through Inverted Attention, which interprets the attention map as a learnable low-pass operator and constructs a dedicated complementary high-pass branch by inverting the attention matrix to recover attenuated transient signals. Furthermore, FAiT introduces Dynamic Temporal-Frequency Modulation (DTFM), which synthesizes instance-conditioned weights to adaptively re-calibrate the energy of spectral sub-bands, enabling fine-grained control over evolving multi-scale patterns. Extensive experiments on widely used benchmarks demonstrate that FAiT consistently outperforms state-of-the-art Transformer-based and frequency-enhanced baselines, while maintaining computational efficiency.
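
The low-pass reading of attention, and one simple way to build a complementary high-pass branch, can be sketched as follows. The abstract does not say how the attention matrix is inverted, so taking the complement `I - A` is an assumption of this sketch, as are the function names. Because each row of a softmax attention matrix sums to one, `A @ v` is a weighted average (smoothing), while each row of `I - A` sums to zero and therefore removes that average.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def inverted_attention(q, k, v):
    """Returns (low_pass, high_pass) outputs. The high-pass branch uses
    the complement I - A, a hypothetical form of 'inverting' attention."""
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # rows sum to 1
    complement = np.eye(attn.shape[0]) - attn       # rows sum to 0
    return attn @ v, complement @ v
```

With this choice the two branches sum back to the input values, so the high-pass branch carries exactly the detail that averaging removed.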

2606.01302 2026-06-02 cs.LG

Structure and Scale in Simplicial Sequence Modelling

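Matthew Farrugia-Roberts

Affiliation: Department of Computer Science, University of Oxford

AI summary: Trains small Transformers to predict the outputs of hidden Markov models and finds a correlation between performance scaling patterns and internal representations, offering preliminary evidence of a link between behavioral scaling laws and emergent mechanisms.

Comments: HiLD 2026: 4th Workshop on High-dimensional Learning Dynamics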

2606.01301 2026-06-02 cs.CL

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

Yiming Liao, Zeno Franco, Jose Eduardo Lizarraga Mazaba, Keke Chen

Affiliations: University of Maryland, Baltimore County; Medical College of Wisconsin

AI summary: Proposes the Med-HEAL framework, which builds a hallucination dataset from clinical data and mitigates hallucinations in medical LLMs through self-critique and retrieval-augmented in-context learning strategies. Experiments show that self-critique improves accuracy for most of the models tested.

Comments: 12 pages, 5 figures. Preprint full version of an accepted ACM-BCB 2026 short paper

Abstract

Hallucinations in medical large language models (LLMs) pose serious risks for clinical decision support, particularly when models must reason over complex electronic health records (EHRs). However, existing benchmarks often lack a realistic clinical context and provide limited insight into how hallucinations can be mitigated in practice. We introduce Med-HEAL, a framework for systematically identifying, analyzing, and mitigating hallucinations in medical LLMs using clinically grounded data. Building on the EHRNoteQA benchmark derived from MIMIC-IV discharge summaries, we construct a hallucination dataset by evaluating BioMistral-7B on open-ended clinical question answering tasks. Model outputs are labeled through a dual evaluation pipeline that combines LLM-as-a-Judge assessment (GPT-4o) with human auditing by medical student reviewers, producing correctness judgments and annotations of reasoning errors via a custom web-based evaluation system. We then leverage this dataset to investigate mitigation strategies: a self-critique pipeline, in which the test model reviews its own answers to detect potential errors and regenerates responses for flagged cases, and retrieval-augmented in-context learning (RA-ICL), which exposes the model to hallucinated and corrected examples. Experiments across five open-source LLMs (BioMistral, Llama-3.1, DeepSeek, Qwen2.5, and Qwen3) show that the self-critique strategy improves accuracy for three of five models (p < 0.05) without requiring parameter updates. Med-HEAL provides both a reusable hallucination dataset and a practical framework for studying and mitigating hallucinations in medical LLMs, supporting safer deployment of AI systems in clinical environments. Our code and data are publicly available at https://github.com/yimingliao-blad/med-heal.git.
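
The control flow of the self-critique strategy described in the abstract (answer, review, regenerate only when flagged) can be sketched as below. The function name, the callable signatures, and the `max_rounds` parameter are hypothetical; the real prompts and model calls live in the authors' repository and are not reproduced here.

```python
from typing import Callable, Optional

def self_critique_answer(
    question: str,
    record: str,
    generate: Callable[[str, str, Optional[str]], str],
    critique: Callable[[str, str, str], Optional[str]],
    max_rounds: int = 1,
) -> str:
    """Answer a question, let the same model review the answer, and
    regenerate only when the review flags a potential error.
    `generate` and `critique` are placeholders for model calls."""
    answer = generate(question, record, None)
    for _ in range(max_rounds):
        flagged_issue = critique(question, record, answer)
        if flagged_issue is None:  # nothing flagged: keep the answer
            break
        answer = generate(question, record, flagged_issue)
    return answer
```

Because only flagged cases are regenerated, answers the model already judges correct pass through untouched, and no parameters are updated at any point.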

2606.01300 2026-06-02 cs.LG cs.AI

ChronosAD: Leveraging Time Series Foundation Models for Accurate Anomaly Detection

Uzair Khan, Luigi Capogrosso, Francesco Biondani, Michele Magno, Franco Fummi, Francesco Setti, Marco Cristani

Affiliations: PR Veneto FESR 2021-2027; Action 1.1.1; DGR 792; CUP D19J24000810007

AI summary: Proposes the ChronosAD architecture, which extracts features with a time series foundation model and refines them with a BiLSTM and multi-head attention for anomaly detection that is robust across domains, improving average AUC by 4.72% and AP by 6.60% on 11 benchmarks.

Comments: Accepted at the 24th IEEE International Conference on Industrial Informatics (INDIN) 2026

Abstract

Time series anomaly detection is a crucial task in various domains, including finance, healthcare, and industry. However, existing methods often struggle to generalize across different datasets, especially when anomalies are subtle or context-dependent. To solve this issue, we introduce ChronosAD, a novel architecture for anomaly detection that uses a time series foundation model as a feature extractor. Specifically, it employs a two-stage pipeline: first, it uses the foundation model to extract embeddings for each time series in a zero-shot manner. Then, a custom-developed Temporal Block, composed of Bidirectional Long Short-Term Memory (BiLSTM) and Multi-Head Attention, refines these embeddings to capture temporal dependencies and highlight salient patterns. Unlike previous approaches, our model requires minimal task-specific tuning and demonstrates robust generalization across a wide range of domains, including industrial, medical, cyber-physical, and automotive systems. Extensive experiments on 11 benchmarks show that ChronosAD outperforms existing methods by 4.72% in AUC and 6.60% in AP on average. The source code is available at https://github.com/intelligolabs/ChronosAD.

2606.01298 2026-06-02 cs.CL

Challenger at MultiPRIDE: Is It Hate Speech or Reclaimed?

Hadi Bayrami Asl Tekanlou, Mahdi Bakhtiyarzadeh, Jafar Razmara

Affiliation: University of Tabriz

AI summary: To address the difficulty of distinguishing hate speech from reclaimed language, proposes an interpretable method that combines semantic embeddings, label-noise filtering (Cleanlab with logistic regression), and an MLP classifier, achieving robust performance under limited resources.

Comments: 9 pages, 2 figures, Published in EVALITA 2026, CEUR Workshop Proceedings Vol. 4195

Journal ref: CEUR Workshop Proceedings, Vol. 4195, 2026

Abstract

The spread of hate speech has become increasingly harmful in modern digital environments, particularly on social networking platforms. While recent advances have shown promising results in automatic hate speech detection, a key challenge remains: distinguishing genuine hate speech from reclaimed language. Accurate labeling is difficult due to the nuanced and context-dependent nature of reclaimed expressions. In this paper, we present a simple and interpretable approach for distinguishing hate speech from reclaimed language, developed for the MultiPRIDE Shared Task. Our method generates dense semantic text embeddings and incorporates a label-noise filtering stage using Cleanlab with logistic regression, followed by a Multi-layer Perceptron (MLP) neural network for final classification. The system is designed to operate under limited computational resources while maintaining strong performance. We evaluate our approach using precision, recall, and F1-score, including macro-averaged metrics. Experimental results demonstrate robust performance despite extreme class imbalance in the dataset. Overall, the findings highlight the potential for further improvements through larger embedding models and more advanced preprocessing techniques while preserving interpretability.

2606.01294 2026-06-02 cs.CL cs.LG

Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

Dong Le, Thong Nguyen, Cong-Duy Nguyen, Anh Tuan Luu

Affiliations: Nanyang Technological University; National University of Singapore; VinUniversity

AI summary: To address the weaknesses of linear attention on in-context retrieval and long-context tasks, proposes the Curvature-Conditioned Query (CCQ) mechanism, which builds a local quadratic model from a second-order Taylor expansion and contracts the query vector using the running key covariance. CCQ modifies only the read step, is compatible with existing linear-attention backbones, and improves perplexity, zero-shot downstream accuracy, retrieval, and long-context performance.

Comments: 19 pages

Abstract

Linear attention reduces the quadratic cost of softmax attention by maintaining a recurrent fast-weight state, but it consistently lags on in-context retrieval and long-context tasks. Existing remedies act on the write side of memory through gating, delta updates, or kernel feature maps, but the read step is left unchanged: every past key contributes additively to the output, so useful targets are diluted by the bulk of stored vectors. We borrow one specific piece of softmax's geometry to construct a cheap read-time contraction of the query. A second-order Taylor expansion of the softmax log-partition at the isotropic-attention point gives a local quadratic model whose curvature coincides with the running key covariance, a quantity that can be maintained with the same recurrent/chunkwise mechanism as the linear-attention state. The associated linear operator contracts the query along the high-density directions of memory before it reads the state. We call this mechanism Curvature-Conditioned Query (CCQ). CCQ modifies only the read step and is composable with any linear-attention backbone. Attached to GLA and Gated DeltaNet, it improves perplexity, zero-shot downstream accuracy, S-NIAH retrieval at and beyond the training context, length-extrapolation perplexity from 4K to 20K, and LongBench accuracy, at small extra cost.
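
The claim that the curvature of the softmax log-partition coincides with the key covariance can be checked numerically. The sketch below assumes that the isotropic-attention point is the zero query (uniform attention weights), which is my reading of the abstract. It compares a finite-difference Hessian of log sum_i exp(k_i . q) at q = 0 with the empirical key covariance. The `contract_query` helper, including its (I + strength * cov)^{-1} form and the `strength` parameter, is a hypothetical illustration of contracting the query along high-variance key directions, not the paper's operator.

```python
import numpy as np

def log_partition(query, keys):
    """log sum_i exp(k_i . q), the softmax log-partition over stored keys."""
    return float(np.logaddexp.reduce(keys @ query))

def key_covariance(keys):
    """Empirical key covariance under uniform weights (divides by n)."""
    centered = keys - keys.mean(axis=0)
    return centered.T @ centered / len(keys)

def numerical_hessian(f, x, eps=1e-3):
    """Central-difference Hessian of a scalar function f at x."""
    dim = len(x)
    basis = np.eye(dim) * eps
    hess = np.zeros((dim, dim))
    for i in range(dim):
        for j in range(dim):
            hess[i, j] = (
                f(x + basis[i] + basis[j]) - f(x + basis[i] - basis[j])
                - f(x - basis[i] + basis[j]) + f(x - basis[i] - basis[j])
            ) / (4 * eps ** 2)
    return hess

def contract_query(query, cov, strength=1.0):
    """Hypothetical contraction: solve (I + strength * cov) q' = q."""
    return np.linalg.solve(np.eye(len(query)) + strength * cov, query)
```

Since the covariance is a sum of outer products, it can in principle be accumulated with the same kind of recurrent update as the linear-attention state, which is the property the abstract relies on.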


2606.01292 2026-06-02 cs.LG cs.AI

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

什么造就了一个强模型？高维线性回归中知识迁移的统一谱分析

Wendao Wu, Fangqing Zhang, Haihan Zhang, Cong Fang

发表机构 * Department of Computer Science（计算机科学系）； Cranberry-Lemon University（Cranberry-Lemon 大学）； Department of Computational Neuroscience（计算神经科学系）； University of the Witwatersrand（威特沃特斯兰德大学）

AI总结本文通过高维线性回归中SGD动力学的统一谱分析，揭示了知识蒸馏中的谱视界扩展和弱到强泛化中的谱去噪两种机制，统一解释了不同知识迁移范式的有效性。

详情

AI中文摘要

师生知识迁移在现代机器学习中无处不在，从通过知识蒸馏进行的经典模型压缩到弱到强泛化这一新兴现象。尽管现有研究提供了孤立见解，但缺乏一个统一的理论框架来解释知识迁移在这些不同范式中的有效性。在这项工作中，我们建立了高维线性回归中SGD动力学的统一谱分析，阐明了知识迁移在看似不同的范式中的效率。我们通过两种不同机制来刻画知识迁移效率：知识蒸馏中的谱视界扩展，使得能够捕获统计上不可及的高频信号；以及弱到强泛化中的谱去噪，其中学生充当优化噪声的滤波器。我们的框架统一了这些现象，揭示了迁移的有效性由隐式正则化与谱上异质的学习速度之间的相互作用所支配。

英文摘要

Teacher-Student Knowledge Transfer (KT) is ubiquitous in modern machine learning, ranging from classical model compression via Knowledge Distillation (KD) to the emergent phenomenon of Weak-to-Strong (W2S) generalization. While existing studies offer isolated insights, a unified theoretical framework explaining the efficacy of KT across these disparate regimes remains lacking. In this work, we establish a unified spectral analysis of SGD dynamics in high-dimensional linear regression, elucidating the efficiency of KT across seemingly disparate regimes. We characterize KT efficiency through two distinct mechanisms: Spectral Horizon Expansion in KD, which enables the capture of statistically inaccessible high-frequency signals, and Spectral Denoising in W2S, where the student acts as a filter for optimization noise. Our framework unifies these phenomena, revealing that the efficacy of transfer is governed by the interplay between implicit regularization and heterogeneous spectral learning speeds over the spectrum.
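The phrase "heterogeneous spectral learning speeds" can be made concrete with a textbook simplification. The sketch below assumes full-batch gradient descent on the population least-squares loss, which is an illustration only and omits the SGD noise terms the paper analyses. Here $H = \sum_i \lambda_i u_i u_i^\top$ is the feature covariance, $\eta$ the step size with $0 < \eta\lambda_i < 1$, and $w^\star$ the minimiser.

```latex
w_{t+1} = w_t - \eta H \left(w_t - w^\star\right)
\;\Longrightarrow\;
u_i^\top \left(w_t - w^\star\right) = \left(1 - \eta \lambda_i\right)^{t} \, u_i^\top \left(w_0 - w^\star\right)
```

Directions with large $\lambda_i$ are fitted within a few steps, while directions with small $\lambda_i$ are learned slowly, so a finite training budget acts as an implicit regulariser on the tail of the spectrum.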


2606.01289 2026-06-02 cs.LG

Feature to Dynamics: Feature-space to Autoregression strategy for Zero-shot Time Series Forecasting

从特征到动态：面向零样本时间序列预测的特征空间到自回归策略

Yifan Wu, Junjie Wu, Kai Wu, Xiaoyu Zhang, Jian Lou

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出FSA框架，通过从可解释特征空间到自回归策略空间的映射，在零样本单变量时间序列预测中引入显式归纳偏置，分离全局趋势、周期成分和局部动态，以更少数据假设实现跨域泛化，在受控实验中优于Transformer架构。

详情

AI中文摘要

零样本时间序列预测旨在预测未见序列的未来值，要求模型泛化超出训练分布的时间动态。虽然近期的基础模型通过大规模预训练实现了强大的域内性能，但其有效性通常依赖于广泛的数据覆盖和隐式模式记忆，当数据稀缺或源域与目标域不重叠时，这可能会限制泛化能力。在这项工作中，我们提出了FSA，一种用于受控零样本单变量预测的特征到策略框架。FSA不直接在观测空间中对原始序列建模，而是学习从可解释特征空间到自回归策略空间的结构化映射。这种设计引入了显式归纳偏置，将全局趋势、周期成分和局部时间动态分离，使模型能够以更少的数据假设捕获可迁移的时间序列结构。实验结果表明，在相同的预训练数据、训练协议和可比较的参数预算下，FSA在我们的受控零样本设置中优于基于Transformer的架构。

英文摘要

Zero-shot time series forecasting aims to predict future values for previously unseen series, requiring models to generalize temporal dynamics beyond the training distribution. While recent foundation models achieve strong in-domain performance through large-scale pretraining, their effectiveness often relies on broad data coverage and implicit pattern memorization, which can limit generalization when data are scarce or source and target domains are disjoint. In this work, we propose FSA, a feature-to-strategy framework for controlled zero-shot univariate forecasting. Instead of directly modeling raw sequences in the observation space, FSA learns a structured mapping from an interpretable feature space to an autoregressive strategy space. This design introduces explicit inductive biases that disentangle global trends, periodic components, and local temporal dynamics, enabling the model to capture transferable time-series structure with fewer data assumptions. Empirical results show that, under identical pretraining data, training protocol, and comparable parameter budgets, FSA outperforms Transformer-based architectures in our controlled zero-shot setting.


2606.01287 2026-06-02 cs.CV cs.AI

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

超越视觉记忆：潜在视觉推理的机制诊断

Garvin Guo, Yu Chen, Xiang Wang, Shuai Li, Xinpei Zhao, Huaxing Liu, Shuai Dong

发表机构 * Amap, Alibaba Group（阿里巴巴集团高德地图）； Shanghai Innovation Institute（上海创新研究院）

AI总结通过分解潜在令牌为三个可测试组件，发现边界标记和格式而非潜在槽贡献了主要性能提升，揭示了潜在视觉推理的真正机制。

详情

AI中文摘要

最近的潜在视觉推理方法通过在多模态语言模型中插入连续潜在令牌取得了显著提升。这些提升通常归因于令牌编码了视觉证据；然而，最近的分析揭示了一个悖论：令牌与图像关联松散，对答案贡献甚微。关键的是，这些分析将潜在令牌视为一个整体，掩盖了提升的真正来源。因此，我们将潜在令牌分解为三个可测试组件：潜在槽、边界标记和格式，并在有利条件下开发了一种最先进的方法作为探针。在六个方法-阶段设置和四个感知密集型基准测试中，潜在槽未能通过视觉记忆解释的所有预测。引人注目的是，在几种设置中，仅保留边界标记即可保留78%至100%的提升，而模型在潜在位置比在答案位置更窄地关注图像。因此，提升来自边界标记、格式以及这种注意力模式，而非潜在槽。每种方法如何利用这一机制取决于其训练监督：在匹配的准确率下，机制仍可能显著不同。因此，潜在视觉推理不仅需要根据准确率评估，还需要根据模型实际依赖的内容进行评估。

英文摘要

Recent latent visual reasoning methods achieve substantial gains by inserting continuous latent tokens into multimodal language models. These gains are commonly attributed to the tokens encoding visual evidence; recent analyses, however, reveal a paradox: the tokens are loosely tied to the image and contribute little to the answer. Critically, these analyses treat latent tokens as a single unit, obscuring the true source of the gains. We therefore decompose latent tokens into three testable components: latent slots, boundary markers, and format, and develop a state-of-the-art method as a probe under favorable conditions. Across six method-stage settings and four perception-heavy benchmarks, latent slots fail every prediction of the visual-memory account. Strikingly, retaining only the boundary markers preserves 78 to 100% of the gain in several settings, while the model attends to the image more narrowly at latent positions than at answer positions. The gain therefore comes from boundary markers, format, and this attention pattern, not from latent slots. How each method engages this mechanism depends on its training supervision: at matched accuracy, mechanisms can still differ markedly. Latent visual reasoning thus needs evaluation not only by accuracy but by what the model actually relies on.


2606.01285 2026-06-02 cs.CV cs.AI

Knowledge-Intensive Video Generation

知识密集型视频生成

Chenxu Wang, Mingda Chen

发表机构 * Fudan University（复旦大学）； Shanghai Jiao Tong University（上海交通大学）

AI总结针对文本到视频生成在事实性和实用性方面的不足，提出知识密集型视频生成（KIVI）任务，构建KIVI-Bench基准和自动评估指标，实验表明现有模型在视觉属性、操作过程和信息呈现上落后于人类。

2606.01283 2026-06-02 cs.LG

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks

AdaKernel: 为时空图神经网络学习自适应核参数

Zhongyue Zhang, Guangyin Jin, Yuxuan Liang, Suwan Yin, Yuankai Wu

发表机构 * Sichuan University（四川大学）； PLA Academy of Military Science（中国人民解放军军事科学院）； The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））

AI总结针对图神经网络中固定核参数导致模型容量受限的问题，提出AdaKernel方法，通过结构保持策略学习自适应核参数，在数据稀疏场景下优于固定先验和全隐式图结构方法。

Comments 17 pages, 15 figures, including appendix

详情

AI中文摘要

建模空间依赖性是使用图神经网络（GNN）进行时空数据分析的核心。传统方法依赖于具有预定义参数的基于距离的核，这限制了模型容量。尽管通用自适应机制（如图注意力网络）提供了灵活性，但它们通常无法捕捉潜在的几何结构，在数据稀疏场景下表现不如基于距离的模型。针对这一问题，我们重新审视核参数化问题，并从理论上证明，错误指定的核参数会在GNN中引入不可避免的近似误差。为了克服这一困难，我们提出AdaKernel，一种简单而有效的方法，在神经网络内学习自适应核参数。与从头学习图结构的方法不同，AdaKernel采用结构保持策略，优化物理相互作用的尺度而非丢弃它们。在克里金插值、数据填补和预测上的大量实验表明，AdaKernel持续改进各种GNN架构，并优于模型无关的自适应基线，验证了准确学习的核参数优于固定先验和完全隐式图结构。

英文摘要

Modeling spatial dependencies is central to spatiotemporal data analysis using Graph Neural Networks (GNNs). Traditional methods rely on distance-based kernels with predefined parameters, which restricts model capacity. Although generic adaptive mechanisms (e.g., Graph Attention Networks) offer flexibility, they often fail to capture the underlying geometric structure, performing worse than distance-based models in data-sparse scenarios. Addressing this, we revisit the kernel parameterization problem and theoretically prove that misspecified kernel parameters introduce unavoidable approximation errors in GNNs. To overcome this, we propose AdaKernel, a simple yet effective approach that learns adaptive kernel parameters within the neural network. Unlike methods that learn graph structures from scratch, AdaKernel adopts a structure-preserving strategy that optimizes the scale of physical interactions rather than discarding them. Extensive experiments on Kriging, Imputation, and Forecasting demonstrate that AdaKernel consistently improves various GNN architectures and outperforms model-agnostic adaptive baselines, validating that accurately learned kernel parameters are superior to both fixed priors and fully latent graph structures.
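To illustrate why the kernel parameter matters, here is a minimal sketch of a distance-based adjacency matrix. The Gaussian kernel form, the function names, and the toy distances are assumptions for illustration; the abstract only says "distance-based kernels" and does not specify AdaKernel's parameterization.

```python
import math


def gaussian_kernel_adjacency(dist, sigma, threshold=0.0):
    """Build a weighted adjacency matrix from pairwise distances.

    Assumption: a Gaussian kernel w_ij = exp(-(d_ij / sigma)^2); the kernel
    family is hypothetical and not taken from the paper.
    """
    n = len(dist)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            w = math.exp(-((dist[i][j] / sigma) ** 2))
            adj[i][j] = w if w >= threshold else 0.0
    return adj


def d_weight_d_sigma(d, sigma):
    """Derivative of exp(-(d / sigma)^2) with respect to sigma.

    This is the gradient signal a learner could use to adapt sigma.
    """
    return math.exp(-((d / sigma) ** 2)) * 2.0 * d * d / sigma ** 3


# Three sensors on a line, one distance unit apart (toy data).
dist = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
narrow = gaussian_kernel_adjacency(dist, sigma=0.5)  # neighbours nearly disconnected
wide = gaussian_kernel_adjacency(dist, sigma=2.0)    # neighbours strongly connected
```

The same physical layout yields very different graphs depending on `sigma`, which is the quantity a fixed prior freezes and an adaptive method would learn.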


2606.01282 2026-06-02 cs.CV cs.CY cs.LG

KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation

KG-FairDiff: 知识图谱引导的提示词精炼用于人口统计公平的文本到图像生成

Farbod Davoodi, Seyed Reza Tavakoli Shiyadeh, Pooria Safaei, Sana Harighi, Parsa Gholami, Amirali Amini, Kimia Vanaei, Emad Firoozi, Parham Abed Azad, Babak Khalaj, Siavash Ahmadi, Amir Hossein Payberah, Mohammad Hossein Rohban, Soheil Kolouri, Ali Diba

发表机构 * University of Science and Technology of China（中国科学技术大学）； Sharif University of Technology（谢里夫理工大学）； Iran University of Science and Technology（伊朗科学技术大学）

AI总结提出KG-FairDiff框架，通过知识图谱引导的提示词精炼，在推理时优化公平性损失，减少文本到图像生成中的性别、种族、年龄等人口统计偏差，同时保持语义保真度。

详情

AI中文摘要

文本到图像（TTI）系统现已成为新闻、教育、广告和公共传播的日常基础设施，它们从训练数据中继承的人口统计和文化刻板印象（将女性、有色人种、老年人和非西方文化描绘为代表性不足或漫画化）在部署规模上成为人口层面的危害。现有的缓解措施要么需要昂贵的重新训练，这对于主导消费产品的闭源骨干网络不可行，要么依赖于忽略文化背景的固定人口统计模板。我们提出了KG-FairDiff，一个模型无关、推理时框架，将公平感知的提示词精炼形式化为一个约束优化问题，并将其实现为一个闭环流水线：一个包含约1200个文化和偏见相关三元组的知识图谱检索结构化上下文，一个LLM改写器提出精炼，一个验证器仅接受那些减少基于散度的公平性损失同时保持用户原始意图语义保真度的提示词。我们证明了精炼循环的有限终止界限，贡献了一个数学上一致的评估套件，将Bias-P/Bias-W与目标分布的散度以及ENS与KL散度联系起来，并审计了八个广泛部署的骨干生成器。KG-FairDiff显著减少了性别、种族、年龄和交叉差异，同时保持了提示词语义，为更公平的生成式AI提供了一条实用、可部署的路径。

英文摘要

Text-to-Image (TTI) systems are now everyday infrastructure for journalism, education, advertising, and public communication, and the demographic and cultural stereotypes they inherit from training data (rendering women, people of colour, older adults, and non-Western cultures as under-represented or caricatured) become a population-level harm at deployment scale. Existing mitigations either require costly retraining, infeasible for the closed-source backbones that dominate consumer products, or rely on fixed demographic templates that ignore cultural context. We present KG-FairDiff, a model-agnostic, inference-time framework that formalises fairness-aware prompt refinement as a constrained optimisation problem and operationalises it as a closed-loop pipeline: a knowledge graph of ~1,200 culture- and bias-related triples retrieves structured context, an LLM rewriter proposes refinements, and a validator accepts only prompts that reduce a divergence-based fairness loss while preserving semantic fidelity to the user's original intent. We prove a finite-termination bound for the refinement loop, contribute a mathematically consistent evaluation suite linking Bias-P/Bias-W to divergence from target distributions and ENS to KL divergence, and audit eight widely-deployed backbone generators. KG-FairDiff substantially reduces gender, race, age, and intersectional disparities while preserving prompt semantics, offering a practical, deployment-ready route to more equitable generative AI.
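The validator's acceptance rule described above can be sketched as follows. Everything here is hypothetical: the use of KL divergence as the fairness loss, the function names, and the 0.8 similarity threshold are illustrative assumptions, not the authors' implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def accept_refinement(current_dist, candidate_dist, target_dist,
                      similarity, min_similarity=0.8):
    """Accept a rewritten prompt only if fairness improves and meaning is kept.

    Hypothetical interface: the *_dist arguments are observed demographic
    frequencies of generated images; `similarity` is a semantic score in
    [0, 1] between the original and rewritten prompt. The 0.8 threshold is
    an arbitrary placeholder.
    """
    loss_before = kl_divergence(current_dist, target_dist)
    loss_after = kl_divergence(candidate_dist, target_dist)
    return loss_after < loss_before and similarity >= min_similarity


target = [0.5, 0.5]      # toy target: balanced binary attribute
current = [0.9, 0.1]     # skewed output of the original prompt
candidate = [0.6, 0.4]   # output after a proposed rewrite
accepted = accept_refinement(current, candidate, target, similarity=0.92)
```

Because each accepted step must strictly lower the loss, a loop built on this rule cannot cycle, which is in the spirit of the finite-termination bound the abstract mentions.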


2606.01281 2026-06-02 cs.LG cs.AI

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

RLVR 无需无效样本：面向 LLM 推理的群体优先级离策略优化

Yixiu Mao, Yun Qu, Qi Wang, Heming Zou, Xiangyang Ji

发表机构 * Department of Automation, Tsinghua University（清华大学自动化系）

AI总结针对强化学习中无效样本导致学习信号不足的问题，提出群体优先级离策略优化（POPO），通过优先级群体重放和解耦重要性采样，在不增加额外采样开销的情况下提升推理性能。

详情

AI中文摘要

基于可验证奖励的强化学习（RLVR）已成为增强大型语言模型（LLMs）推理能力的强大范式。然而，其有效性受到无效训练数据普遍存在的严重阻碍：许多采样提示产生的响应群体要么完全正确，要么完全错误，导致奖励零方差和学习信号有限。最近的先进方法通过大量LLM rollout来过滤无效样本以解决此问题，但代价是相当大的计算开销。替代方法，包括预测性采样和轨迹重放，旨在提高数据效率，但往往仍不充分，并可能引入额外问题，如系统性偏差或次优约束。为解决这些局限性，我们提出了群体优先级离策略优化（POPO），一个简单而有效的框架，无需额外rollout开销即可充分利用有效训练批次。POPO包含两个关键组件：优先级群体重放和解耦离策略优化。前者通过基于近因的重放机制，联合考虑样本质量和离策略程度，用有效的离策略群体替换无效的在策略群体。为进一步缩小离策略差距，POPO采用解耦重要性采样来校正离策略偏差，同时在一致的信任区域约束下保持稳定的策略更新。在包括数学、规划和视觉几何在内的多种推理任务上的实证评估表明，POPO显著加速了RL微调，并在显著减少rollout的情况下实现了强大的推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, its effectiveness is substantially hindered by the prevalence of ineffective training data: many sampled prompts yield response groups that are either entirely correct or entirely incorrect, resulting in zero-variance rewards and limited learning signals. Recent state-of-the-art methods address this issue through extensive LLM rollouts to filter ineffective samples, but at the cost of considerable computational overhead. Alternative approaches, including predictive sampling and trajectory replay, aim to improve data efficiency but often remain insufficient and may introduce additional issues such as systematic bias or suboptimal constraints. To address these limitations, we propose Group Prioritized Off-Policy Optimization (POPO), a simple yet effective framework that fully exploits effective training batches without additional rollout overhead. POPO comprises two key components: prioritized group replay and decoupled off-policy optimization. The former replaces ineffective on-policy groups with effective off-policy groups via a recency-based replay mechanism that jointly considers sample quality and the degree of off-policiness. To further mitigate the off-policy gap, POPO employs decoupled importance sampling to correct off-policy bias while maintaining stable policy updates under consistent trust-region constraints. Empirical evaluations across diverse reasoning tasks, including mathematics, planning, and visual geometry, demonstrate that POPO substantially accelerates RL finetuning and achieves strong reasoning performance with significantly fewer rollouts.
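The zero-variance problem and the replay idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the GRPO-style group normalization is assumed (the abstract does not name the advantage estimator), groups are represented only by their reward lists, and recency is the sole priority, whereas the paper also weighs sample quality and applies decoupled importance sampling.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std, or all zeros if std == 0.

    Assumption: a GRPO-style normalization, used here only for illustration.
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0.0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def is_effective(rewards):
    """A group carries learning signal only if its rewards are not all equal."""
    return len(set(rewards)) > 1


def fill_batch(on_policy_groups, replay_groups):
    """Swap ineffective on-policy groups for recent effective replayed ones.

    Hypothetical simplification: replay_groups is ordered oldest to newest
    and the newest effective group is used first.
    """
    replay = [g for g in replay_groups if is_effective(g)]
    batch = []
    for group in on_policy_groups:
        if is_effective(group) or not replay:
            batch.append(group)
        else:
            batch.append(replay.pop())  # newest effective replayed group
    return batch


all_correct = [1, 1, 1, 1]   # every response verified correct: no signal
mixed = [1, 0, 1, 0]         # mixed outcomes: useful signal
batch = fill_batch([all_correct, mixed], replay_groups=[[0, 1, 0, 0], [0, 0, 0, 0]])
```

The all-correct group yields zero advantages and therefore zero policy gradient, which is why replacing it with a stored mixed-outcome group recovers a full batch without new rollouts.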


2606.01280 2026-06-02 cs.CV

Event-Based Vision in Space: Applications, Trends, and Future Directions

太空中的事件视觉：应用、趋势与未来方向

Luigi Capogrosso, Pietro Bonazzi, Michele Magno

发表机构 * Interdisciplinary Transformation University of Austria（奥地利跨学科转型大学）； ETH Zurich（苏黎世联邦理工学院）

AI总结本文综述了事件视觉传感器在太空领域的应用，通过分类四个主要领域（大气与高速观测、环境监测与变化检测、操作支持与星上处理、地理空间建模与预测分析），指出神经形态工程是解决现代遥感与可持续太空探索关键瓶颈的范式转变。

Comments Accepted at the XXIV Annual Conference on Sensors and Microsystems (AISEM) 2026

详情

AI中文摘要

地球观测（EO）正经历由新型传感技术部署驱动的重大变革。传统的基于帧的光学传感器在具有挑战性的轨道环境中常受运动模糊、高功耗和极端数据冗余的困扰。相比之下，事件传感器（也称为神经形态相机）提供了一种仿生异步方法。通过仅捕获局部光照变化，它们提供微秒级时间分辨率、极高动态范围和卓越能效。尽管这些传感器的使用正从地面系统迅速扩展到轨道平台，但围绕其太空应用的科学文献仍然高度分散。为弥合这一差距，本文对太空领域事件视觉的最新技术进行了全面综述。基于检索到的文献，我们引入了一个围绕四个主要领域构建的分类体系：1）大气与高速观测；2）环境监测与变化检测；3）操作支持与星上处理；4）地理空间建模与预测分析。因此，本综述强调，神经形态工程远不止是一种补充成像技术；它是一种范式转变，可直接用于解决现代遥感和可持续太空探索中的关键瓶颈。

英文摘要

Earth Observation (EO) is undergoing a significant transformation driven by the deployment of novel sensing technologies. Traditional frame-based optical sensors often struggle with motion blur, high power consumption, and extreme data redundancy in challenging orbital environments. In contrast, event-based sensors, also known as neuromorphic cameras, offer a bio-inspired asynchronous approach. By capturing only local illumination changes, they provide microsecond temporal resolution, an extremely high dynamic range, and exceptional energy efficiency. Although the use of these sensors is rapidly expanding from terrestrial systems to orbital platforms, the scientific literature surrounding their space-based applications remains heavily fragmented. To bridge this gap, this article presents a comprehensive review of the state-of-the-art in event-based vision in the space domain. Based on the retrieved literature, we introduce a taxonomy structured around four primary domains: 1) atmospheric and high-speed observation; 2) environmental monitoring and change detection; 3) operational support and onboard processing; and 4) geospatial modeling and predictive analysis. As a result, this survey highlights that neuromorphic engineering is far more than a supplementary imaging technique; it is a paradigm shift that can be used to directly address critical bottlenecks in modern remote sensing and sustainable space exploration.
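The statement that event sensors capture only local illumination changes can be illustrated with the commonly used idealized pixel model, in which an event fires whenever log-intensity moves by a contrast threshold. The sketch below is an assumption-laden toy: a single pixel, noiseless samples, and a threshold of 0.2 chosen arbitrarily; it is not taken from the survey.

```python
import math


def events_from_samples(intensities, contrast_threshold=0.2):
    """Emit (sample_index, polarity) events for one idealized pixel.

    An event fires each time the log-intensity differs from the reference
    level (set at the last event) by at least `contrast_threshold`.
    """
    events = []
    ref = math.log(intensities[0])
    for idx, value in enumerate(intensities[1:], start=1):
        delta = math.log(value) - ref
        while abs(delta) >= contrast_threshold:
            polarity = 1 if delta > 0 else -1
            events.append((idx, polarity))
            ref += polarity * contrast_threshold
            delta = math.log(value) - ref
    return events


static_scene = events_from_samples([100.0] * 5)              # nothing changes
brightening = events_from_samples([100.0, 100.0, 200.0])     # intensity doubles
```

A static scene produces no output at all, which is the source of the low data redundancy the abstract refers to, while a sudden change produces a short burst of events.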


2606.01279 2026-06-02 cs.AI

ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment

ANDES：用于自主指令对齐的智能体原生数据演化合成工具

Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang, Hengyi Feng, Wentao Zhang

发表机构 * Peking University（北京大学）； Sichuan University, Chengdu（四川大学，成都）

AI总结提出ANDES框架，通过自进化世界树路由和可操作诊断报告，将数据生成重构为即插即用的智能体技能，使基础较弱的智能体在严格计算约束下实现自动对齐，在PostTrainBench上取得最先进性能。

详情

AI中文摘要

AI智能体正越来越多地被用于自动化AI研究本身，特别是将基础大语言模型转化为对齐助手的关键后训练阶段。然而，最近的评估显示，即使是最前沿的智能体也难以完成这一任务。虽然后训练的成功根本上依赖于获取高质量数据，但依赖智能体从开放网络中自主策划目标训练数据集带来了严峻挑战。在嘈杂的网络环境中执行搜索、过滤和平衡数据的长期任务，常常超出智能体有限的上下文能力，最终导致数据集质量下降和下游训练性能次优。为弥补这一差距，我们引入了Andes（智能体原生数据演化合成），这是一个将数据生成重新构想为即插即用的智能体技能的框架。Andes不强迫智能体从头设计复杂的数据收集策略，而是提供一个智能抽象层。通过利用自演化的世界树路由机制和可操作的诊断报告，它允许训练智能体通过交互式闭环界面动态引导数据合成。我们证明，在严格的计算约束下，为基础较弱的智能体配备Andes可以改善自动对齐，在PostTrainBench上取得最先进的性能，并实现稳健的跨任务泛化。我们的项目可在https://github.com/zzy1127/ANDES获取。

英文摘要

AI agents are increasingly being tasked with automating AI research itself, particularly the critical post-training phase that transforms base LLMs into aligned assistants. However, recent evaluations reveal that even frontier agents struggle to perform this task. While the success of post-training fundamentally relies on acquiring high-quality data, relying on agents to autonomously curate targeted training datasets from the open web introduces severe challenges. Executing the long-horizon tasks of searching, filtering, and balancing data within noisy web environments frequently overwhelms an agent's limited context, ultimately leading to degraded dataset quality and suboptimal downstream training performance. To bridge this gap, we introduce Andes (Agent Native Data Evolving Synthesis), a framework that reimagines data generation as a plug-and-play agent skill. Rather than forcing agents to devise complex data-gathering strategies from scratch, Andes provides an intelligent abstraction layer. By leveraging a self-evolving World Tree routing mechanism and actionable diagnostic reports, it allows trainer agents to dynamically steer data synthesis through an interactive, closed-loop interface. We demonstrate that under strict compute constraints, equipping foundationally weaker agents with Andes improves automated alignment, securing state-of-the-art performance on PostTrainBench and robust cross-task generalization. Our project is available at https://github.com/zzy1127/ANDES.


2606.01277 2026-06-02 cs.RO cs.AI cs.CV cs.SY eess.IV eess.SY

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

DeepIPCv3: 面向突发行人穿越避让的事件感知多模态传感器融合

Oskar Natan, Andi Dharmawan, Aufaclav Zatu Kusuma Frisky, Jazi Eko Istiyanto, Jun Miura

发表机构 * Department of Computer Science and Electronics, Universitas Gadjah Mada（加查马达大学计算机科学与电子系）； Department of Computer Science and Engineering, Toyohashi University of Technology（丰桥技术科学大学计算机科学与工程系）

AI总结提出DeepIPCv3框架，通过Transformer交叉模态注意力融合LiDAR点云与DVS事件流，实现突发行人穿越场景下的高反应性避让，在自定义多模态数据集上达到最优轨迹与控制精度。

详情

AI中文摘要

当前的端到端自动驾驶系统主要依赖基于帧的传感器，这类传感器在高度动态的突发行人穿越场景中存在固有的感知延迟和运动模糊问题。为解决这一关键安全漏洞，我们提出DeepIPCv3，一种新颖的多模态自主导航框架，它将LiDAR点云的密集3D空间几何与动态视觉传感器（DVS）的微秒级异步事件流协同融合。我们引入了一种受Transformer启发的交叉模态注意力机制，以动态关联这些不同模态，使网络能够即时优先处理高速动态更新，同时不牺牲场景结构感知。融合后的潜在表示通过一个混合策略网络映射到安全的局部路径点和可执行控制命令，该网络结合了启发式轨迹跟踪与直接神经预测。由于在真实场景中测试这些突发穿越场景存在严重物理风险，该框架使用在光照良好的正午和具有挑战性的傍晚条件下收集的自定义多模态数据集进行严格离线评估。广泛的对比和消融研究表明，DeepIPCv3达到了最先进的预测性能。通过有效消除曝光失败和运动模糊，所提出的LiDAR与DVS融合实现了最低的轨迹和控制命令误差，使得无论环境光照如何，都能实现高反应性、数学上有界的规避机动。为支持未来研究，我们将代码发布到GitHub仓库：https://github.com/oskarnatan/DeepIPCv3。

英文摘要

Current end-to-end autonomous driving systems predominantly rely on frame-based sensors, which suffer from inherent perception latency and motion blur during highly dynamic encounters, specifically sudden pedestrian crossings. To address this critical safety vulnerability, we propose DeepIPCv3, a novel multi-modal autonomous navigation framework that synergizes the dense 3D spatial geometry of LiDAR point clouds with the microsecond-level asynchronous event streams of a Dynamic Vision Sensor (DVS). We introduce a Transformer-inspired cross-modal attention mechanism to dynamically correlate these distinct modalities, allowing the network to instantaneously prioritize high-speed dynamic updates without sacrificing structural scene awareness. The fused latent representations are then mapped to safe local waypoints and executable control commands via a hybrid policy network that blends heuristic trajectory tracking with direct neural predictions. Due to the severe physical risks associated with live testing of these sudden crossing scenarios, the framework is rigorously evaluated offline using a custom multi-modal dataset collected across both well-illuminated noon and challenging evening conditions. Extensive comparative and ablation studies demonstrate that DeepIPCv3 achieves state-of-the-art predictive performance. By effectively eliminating exposure failures and motion blur, the proposed LiDAR and DVS fusion yields the lowest trajectory and control command errors, enabling highly reactive, mathematically bounded evasive maneuvers regardless of ambient illumination. To support future research, we will release the codes to our GitHub repo at https://github.com/oskarnatan/DeepIPCv3.


2606.01276 2026-06-02 cs.CL

Worlds Within Words: Translating Culture in Ancient Chinese Texts with Multi-Agent Coordination

词中世界：基于多智能体协调的古代汉语文本文化翻译

Xiaoqi He, Kaixin Lan, Mu You, Tao Fang, Lidia S. Chao, Derek F. Wong

发表机构 * NLP2CT Lab, Department of Computer and Information Science, University of Macau（澳门大学计算机与信息科学系自然语言处理与中葡机器翻译实验室）； Institute of International Language Services Studies, Macau Millennium College（澳门中西创新学院国际语言服务研究院）

AI总结针对古代汉语文本中文化负载词的翻译难题，提出多智能体文化感知翻译框架MACAT，通过选择性显化策略动态识别文化短语并注入简洁解释，在中医经典和《论语》上优于基线模型。

Comments The preprint manuscript is 20 pages long and is currently under review

详情

AI中文摘要

基于大语言模型的机器翻译推动了跨文化交流，但在处理古代汉语文本中的文化负载词时仍面临挑战。挑战不仅在于词汇对齐，还在于决定何时以及如何向缺乏相关背景的读者显化文化依赖知识。直译常保留表面形式但缺失深层概念，而过度显化则损害简洁性和可读性。为解决此问题，我们将文化负载词翻译定义为选择性显化任务，并提出MACAT，一个多智能体文化感知翻译框架，动态识别文化显著短语并在必要时注入简洁的解释性知识。MACAT进一步包含一个质量感知重排序模块用于候选选择，以及一个多轮评估智能体，从术语精确性、可读性、忠实度、文化保留和文化显化方面评估翻译。在中医经典和《论语》上的实验表明，在统一的GPT-5.4评估设置下，MACAT在100篇中医文档和《论语》20章子集上始终优于骨干模型和通用机器翻译基线。

英文摘要

Large language model (LLM)-based machine translation has advanced cross-cultural communication, yet it still struggles with culture-loaded words (CLWs) in ancient Chinese texts. The challenge extends beyond lexical alignment to deciding when and how culture-dependent knowledge should be explicated for readers lacking relevant background. Literal translation often preserves surface forms while missing underlying concepts, whereas over-explicitation harms conciseness and readability. To address this problem, we formulate CLW translation as a selective explicitation task and propose MACAT, a Multi-Agent Culture-Aware Translation framework that dynamically identifies culturally salient phrases and injects concise explanatory knowledge when necessary. MACAT further incorporates a quality-aware reranking module for candidate selection and a multi-round evaluation agent that assesses translations across terminological precision, readability, fidelity, cultural preservation, and cultural explicitation. Experiments on traditional Chinese medicine (TCM) classics and the Analects show that, under a unified GPT-5.4 evaluation setting, MACAT consistently outperforms both the backbone model and general-purpose MT baselines on 100 TCM documents and a 20-chapter subset of the Analects.


2606.01273 2026-06-02 cs.LG

GLIDE: Graph-guided Leap Inference for Diffusion Estimation of Spatio-Temporal Point Processes

GLIDE: 面向时空点过程扩散估计的图引导跳跃推理

Guanyu Zhou, Yao Liu, Yanglei Gan, Yuxiang Cai, Peng He, Run Lin, Yuxiang Liu, Qiao Liu

发表机构 * University of Electronic Science and Technology of China（电子科技大学）

AI总结提出GLIDE框架，利用多尺度历史图编码和双流架构作为条件，结合先验引导的跳跃推理机制，实现高效且准确的时空点过程下一个事件建模与预测。

详情

AI中文摘要

时空点过程（STPPs）为连续时间和空间中的异步事件建模提供了原则性框架。最近的扩散方法通过建模复杂条件分布，为确定性预测提供了灵活的替代方案，但其在STPPs中的应用仍面临挑战：从纯噪声中反向采样成本高昂，且稀疏空间域中弱结构约束可能导致概率质量定位不佳。我们提出 GLIDE（图引导跳跃推理扩散估计），一种用于STPPs中下一个事件建模的条件扩散框架。GLIDE将历史事件组织成多尺度历史图，并通过双流架构编码时间演化和空间拓扑，为双分支扩散去噪器提供结构化条件上下文。它进一步引入先验引导的跳跃推理机制，其中轻量级均值预测器提供确定性锚点，反向过程从中间扩散步骤而非纯高斯噪声开始。在多个真实世界数据集上的实验表明，GLIDE改进了分布拟合和下一个事件预测，其中空间方面的提升最大。结果还表明，先验引导的跳跃推理大幅降低了反向采样成本，同时保留了扩散模型的随机生成能力。

英文摘要

Spatio-temporal point processes (STPPs) provide a principled framework for modeling asynchronous events in continuous time and space. Recent diffusion-based approaches offer a flexible alternative to deterministic prediction by modeling complex conditional distributions, but their application to STPPs remains challenging: reverse sampling from pure noise is costly, and weak structural constraints in sparse spatial domains can lead to poorly localized probability mass. We propose GLIDE (Graph-guided Leap Inference for Diffusion Estimation), a conditional diffusion framework for next-event modeling in STPPs. GLIDE organizes historical events into a multi-scale historical graph and encodes temporal evolution and spatial topology through a dual-stream architecture, yielding a structured conditioning context for a dual-branch diffusion denoiser. It further introduces a prior-guided leap inference mechanism, in which a lightweight mean predictor provides a deterministic anchor and the reverse process starts from an intermediate diffusion step instead of from pure Gaussian noise. Experiments on multiple real-world datasets show that GLIDE improves both distribution fitting and next-event prediction, with the largest gains appearing on the spatial side. The results also indicate that prior-guided leap inference substantially reduces reverse-sampling cost while preserving the stochastic generation capability of diffusion models.
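Starting the reverse process from an intermediate step can be sketched with the standard DDPM forward-noising formula. The linear noise schedule, the step counts, the anchor values, and the function names below are assumptions for illustration; the abstract does not state how GLIDE chooses the starting step or the schedule.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def leap_start(anchor, t_start, num_steps, rng):
    """Noise a deterministic anchor up to step t_start.

    Uses x_t = sqrt(ab) * mu + sqrt(1 - ab) * eps with eps ~ N(0, 1),
    where ab is the cumulative alpha product at t_start.
    """
    ab = alpha_bar(t_start, num_steps)
    return [math.sqrt(ab) * m + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for m in anchor]


rng = random.Random(0)
anchor = [0.3, -1.2]  # hypothetical mean-predictor output for an event location
x_start = leap_start(anchor, t_start=200, num_steps=1000, rng=rng)
# A reverse loop would now run 200 denoising steps instead of 1000.
```

The sample stays stochastic because noise is still injected around the anchor, while the number of denoising steps shrinks in proportion to how early the chain is entered.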


2606.01271 2026-06-02 cs.CV

Exploiting In-Sensor Computing for Energy-Efficient Earth Observation

利用传感器内计算实现节能地球观测

Luigi Capogrosso, Pietro Bonazzi, Loris Hoxhaj, Michele Magno

发表机构 * Interdisciplinary Transformation University of Austria（奥地利跨学科转型大学）； ETH Zurich（苏黎世联邦理工学院）； University of Verona（维罗纳大学）

AI总结针对卫星数据下行带宽瓶颈，提出基于TinyML和索尼IMX500传感器的传感器内计算框架，在8MB约束下达到96.68%精度和42.26 GMAC/J能效。

Comments Accepted at the XXIV Annual Conference on Sensors and Microsystems (AISEM) 2026

详情

AI中文摘要

卫星产业的快速增长推动了地理空间数据获取的大幅增加，凸显了一个关键瓶颈：收集的传感器数据量与地面站有限的可用下行带宽之间的严重不匹配。虽然星载计算通过在轨预处理数据帮助解决了这一问题，但本文通过引入传感器内计算框架进一步推进了这一范式。我们通过将TinyML技术与索尼IMX500智能视觉传感器集成，提出了一种针对严格计算约束优化的端到端地球观测流水线。具体来说，我们的方法将处理直接转移到传感器级别，减轻了主嵌入式设备的计算负担，并有效减少了噪声或无关数据的下行传输。我们在EuroSAT数据集上评估了几种高效的卷积神经网络，即SqueezeNet、ShuffleNetV2和MCUNetV1。实验结果表明，尽管部署在IMX500平台上需要优化，我们的模型在其8 MB约束内保持了具有竞争力的96.68%准确率。具体来说，模型达到平均处理吞吐量17.40 FPS，延迟27.43 ms。此外，我们的系统配置文件表现出高能效，每次推理的低能耗为14.19 mJ，能效评级为42.26 GMAC/J，证明了其在传感器内部署的可行性。

英文摘要

The rapid growth of the satellite industry has driven a significant increase in geospatial data acquisition, highlighting a critical bottleneck: the severe disparity between the volume of collected sensor data and the limited downlink bandwidth available to ground stations. While On-Board Computing (OBC) has helped address this by pre-processing data in orbit, this article further advances the paradigm by introducing an in-sensor computing framework. We present an optimized end-to-end Earth Observation (EO) pipeline tailored for strict computational constraints by integrating TinyML techniques with the Sony IMX500 Intelligent Vision Sensor. Specifically, our approach shifts processing directly to the sensor level, offloading the computation from the primary embedded device, and effectively mitigating the downlink transmission of noisy or irrelevant data. We evaluated several efficient Convolutional Neural Networks (ConvNets), i.e., SqueezeNet, ShuffleNetV2, and MCUNetV1, on the EuroSAT dataset. Experimental results show that, despite the optimizations required for deployment on the IMX500 platform, our models maintain a competitive 96.68% accuracy while operating within its 8 MB constraints. Specifically, the models reach an average processing throughput of 17.40 FPS with a latency of 27.43 ms. Furthermore, our system profile exhibits high energy efficiency, with a low energy footprint of 14.19 mJ per inference and an efficiency rating of 42.26 GMAC/J, demonstrating its viability for in-sensor deployment.
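The reported figures can be cross-checked with a little arithmetic. The relations used below are assumptions: that the efficiency rating equals MACs per inference divided by energy per inference, and that the sensor sustains the reported throughput continuously. The derived values are back-of-the-envelope estimates, not numbers from the paper.

```python
# Values quoted in the abstract.
energy_per_inference_j = 14.19e-3   # 14.19 mJ per inference
efficiency_mac_per_j = 42.26e9      # 42.26 GMAC/J
throughput_fps = 17.40              # average frames per second
latency_s = 27.43e-3                # 27.43 ms

# Implied compute per inference, assuming efficiency = MACs / energy.
macs_per_inference = efficiency_mac_per_j * energy_per_inference_j

# Implied average inference power at sustained throughput.
avg_inference_power_w = energy_per_inference_j * throughput_fps

# Frame rate that the latency alone would allow.
latency_bound_fps = 1.0 / latency_s
```

Under these assumptions the workload is roughly 0.6 GMAC per inference at about a quarter of a watt. The measured 17.40 FPS sits well below the 36 FPS that the latency alone would permit, which suggests that the end-to-end pipeline includes stages beyond the measured inference latency.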


2606.01265 2026-06-02 cs.LG cs.AI

PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery

PALTO：面向垂直供电的Tri-Gate FinFET设计优化的物理信息主动学习

Ayoub Sadeghi, Leonid Popryho, Inna Partin-Vaisband

发表机构 * University of Illinois Chicago（伊利诺伊大学芝加哥分校）； Center for Heterogeneous Integration of Micro Electronic Systems（微电子异构集成中心）； Joint University Microelectronics Program (JUMP) 2.0（联合大学微电子计划（JUMP）2.0）； Semiconductor Research Corporation (SRC)（半导体研究公司（SRC））； Defense Advanced Research Project Agency (DARPA)（国防高级研究计划局（DARPA））

AI总结提出物理信息主动学习框架，高效探索GaN tri-gate FinFET的高维设计空间，优化关键结构参数（如GaN-to-AlGaN厚度比），发现两种优化器件，其中D1在300-fin配置下驱动电流和开关效率优于D2。

详情

AI中文摘要

本文展示了机器学习驱动优化在垂直供电系统中设计特定应用的GaN三栅极FinFET的有效性。传统的基于TCAD的方法计算量大，且不足以导航先进GaN器件的高维非线性设计空间。为此，采用物理信息主动学习框架智能引导仿真，在保持精度的同时加速收敛。这种ML引导的方法通过高效探索关键结构参数——尤其是GaN-to-AlGaN厚度比（器件设计中长期争论的焦点）——来发现最优配置。通过系统探索关键结构参数，确定了两种具有激进缩放的栅漏长度的优化器件。单鳍多通道仿真表明，相对于AlGaN势垒具有更薄GaN沟道的器件D2实现了更高的驱动电流。然而，在300鳍配置中，器件D1以0.49欧姆导通电阻提供3.3A电流，性能约为D2的2倍，尽管寄生参数略高。两种器件均工作在常关模式。基于特定应用品质因数，器件D1达到5 pC·欧姆，开关效率比D2高2倍，而两种设计在不同性能指标上均优于工业基准。

英文摘要

This paper demonstrates the effectiveness of machine learning-driven optimization for designing application-specific GaN tri-gate FinFETs in vertical power delivery systems. Conventional TCAD-based approaches are computationally intensive and insufficient for navigating the high-dimensional, nonlinear design space of advanced GaN devices. To address this, a physics-informed active learning framework is used to intelligently guide simulations, accelerating convergence while preserving accuracy. This ML-guided approach enables the discovery of optimal configurations by efficiently exploring key structural parameters, most notably the GaN-to-AlGaN thickness ratio, a long-standing focus of debate in device design. By systematically exploring key structural parameters, two optimized devices with aggressively scaled gate-to-drain lengths are identified. Single-fin, multi-channel simulations show that device D2, with a thinner GaN channel relative to the AlGaN barrier, achieves higher drive current. However, in a 300-fin configuration, device D1 outperforms device D2 by delivering 3.3 A at 0.49 ohm on-resistance (approximately 2× better) despite slightly higher parasitics. Both devices operate in a normally-off mode. Based on an application-specific figure of merit, device D1 achieves 5 pC·ohm, demonstrating 2× greater switching efficiency than device D2, while both designs outperform industrial benchmarks from different performance standpoints.


2606.01260 2026-06-02 cs.CL cs.AI

IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages

IndoBias：印尼语言中大语言模型偏见评估的双轨文化基准

Ikhlasul Akmal Hanif, Muhammad Falensi Azmi, Filbert Aurelian Tjiaranata, Eryawan Presma Yulianrifat, Fajri Koto

发表机构 * Mohamed bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）； Universitas Indonesia（印度尼西亚大学）； Independent Researcher（独立研究员）

AI总结提出IndoBias基准，通过深度和广度双轨评估，发现现有LLM在印尼语和三种地方语言中表现出显著偏见，且预训练数据来源和语言多样性影响偏见程度。

详情

AI中文摘要

尽管印尼拥有超过1300个民族和700种土著语言，但大语言模型中的偏见尚未得到充分研究，从而在其独特广阔、多语言和多样化的社会文化背景下，评估代表性公平性和本地化刻板印象存在关键空白。为解决此问题，我们引入IndoBias作为文化基础的偏见基准，评估LLM在印尼语和三种地方语言（爪哇语、巽他语和望加锡语）中的偏见。IndoBias具有双视角评估轨道：深度导向（使用对比对）和广度导向（基于生成），后者基于社会科学框架（SPI、O*NET和WGI）。我们的结果表明，现有LLM——尤其是解码器模型——对印尼语中的原型句子表现出强烈偏见，而地方语言在意识形态和宗教类别下遭受更高偏见。我们还发现，当使用各种地方实体提示时，LLM响应表现出非均匀的刻板印象极性。最后，我们发现，在印尼语中，Common Crawl文本在预训练期间引入的偏见比人工审核的文章文本（如维基百科、新闻）更多，而将地方语言引入预训练通常会增加偏见。这项工作强调了在特定文化背景下研究偏见的重要性。警告：本文包含可能具有冒犯性、有害性或偏见性的示例数据。

Abstract

Although Indonesia is home to more than 1300 ethnic groups and 700 indigenous languages, bias in Large Language Models has not been fully studied there, leaving a critical gap in evaluating representational fairness and localized stereotypes within its uniquely vast, multilingual, and diverse sociocultural landscape. To address this, we introduce IndoBias, a culturally grounded bias benchmark for assessing LLM bias in Indonesian and three local languages: Javanese, Sundanese, and Makasar. IndoBias features two evaluation tracks: a depth-oriented track based on contrastive pairs and a breadth-oriented, generation-based track grounded in social science frameworks (SPI, O*NET, and WGI). Our results show that existing LLMs, particularly decoder models, exhibit strong bias towards prototypical sentences in Indonesian, while local languages suffer higher bias under the Ideology and Religion category. We also find that LLM responses exhibit a non-uniform Stereotype Polarity when prompted with various local entities. Finally, we find that, in Indonesian, Common Crawl texts introduce more bias during pretraining than human-reviewed article texts (e.g., Wikipedia, News), whereas introducing local languages to pretraining generally increases bias. This work highlights the importance of studying bias in culture-specific contexts. Warning: This paper contains example data that may be offensive, harmful, or biased.



HOLA: Holistic Multi-Modal Alignment for Open-Set 3D Recognition

S2M-Trek: From Single to Multi-Sphere Transport via Per-Frame Deep Sets on a Wheel-Legged Robot

Conditioned free-energy density of proteins using unbalanced solutions to constraint satisfaction problems

DiffuSent: Towards a Unified Diffusion Framework for Aspect-Based Sentiment Analysis

TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

DeblurNVS: Geometric Latent Diffusion for Novel View Synthesis from Sparse Motion-Blurred Images

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

PSG-Nav: Probabilistic Scene Graph Navigation via Multiverse Decision Making

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

FAiT: Frequency-Aware Inverted Transformer for Multivariate Time Series Forecasting

Structure and Scale in Simplicial Sequence Modelling

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

ChronosAD: Leveraging Time Series Foundation Models for Accurate Anomaly Detection

Challenger at MultiPRIDE: Is It Hate Speech or Reclaimed?

Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

Feature to Dynamics: Feature-space to Autoregression strategy for Zero-shot Time Series Forecasting

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

Knowledge-Intensive Video Generation

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks

KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

Event-Based Vision in Space: Applications, Trends, and Future Directions

ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

Worlds Within Words: Translating Culture in Ancient Chinese Texts with Multi-Agent Coordination

GLIDE: Graph-guided Leap Inference for Diffusion Estimation of Spatio-Temporal Point Processes

Exploiting In-Sensor Computing for Energy-Efficient Earth Observation

PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery

IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages