arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.01323 2026-06-02 cs.CL cs.AI

DiffuSent: Towards a Unified Diffusion Framework for Aspect-Based Sentiment Analysis

Shu Long, Yanglei Gan, Xuchuan Zhou

Affiliations: University of Electronic Science and Technology of China; Southwest Petroleum University; Southwest Minzu University

AI summary: Proposes DiffuSent, a non-autoregressive diffusion framework that unifies all aspect-based sentiment analysis subtasks as boundary denoising diffusion processes, uses a contrastive denoising training strategy to address duplicate predictions, outperforms existing generative and span-based systems across 28 settings, and achieves up to 181x faster inference.

Abstract

Aspect-Based Sentiment Analysis (ABSA) encompasses seven distinct subtasks, each focusing on different extracted elements. Despite the proven success of generative models in unified aspect sentiment analysis, existing approaches often rely on auto-regressive token-by-token generation without capturing the aspect and opinion terms as whole units, resulting in boundary insensitivity, particularly in the context of multi-word aspect and opinion terms. To address these issues, we present DiffuSent, a non-auto-regressive diffusion framework that systematically formulates all ABSA subtasks as boundary denoising diffusion processes, progressively refining boundaries over noisy states. Furthermore, we introduce a contrastive denoising training strategy that effectively addresses duplicate predictions with subtle variations introduced by the diffusion process. Extensive experiments across 28 settings (7 subtasks × 4 datasets) demonstrate that DiffuSent delivers consistent improvements over the strongest generative and span-based systems. DiffuSent exhibits notable gains on multi-word triplets, achieving an average improvement of +2.48 F1, and maintains robust extraction accuracy in sentences containing multiple sentiment triplets. Moreover, the non-auto-regressive decoding enables substantial efficiency benefits, reaching up to 181 times faster inference than auto-regressive generative baselines.

2606.01322 2026-06-02 cs.CL cs.AI

TukaBench: A Culturally Grounded Jailbreak Benchmark for African Languages

Victor Akinode, Senyu Li, Wassim Hamidouche, Waqas Zamir, Inbal Becker-Reshef, David Ifeoluwa Adelani

Affiliations: Mila - Quebec AI Institute; McGill University; Microsoft AI for Good Research Lab; Canada CIFAR AI Chair

AI summary: To address the lack of safety evaluation of large language models on low-resource African languages, proposes the TUKABENCH benchmark, which uses four settings (direct translation, culturally adapted translation, human-curated prompts, and code-switched prompts) to assess how language, cultural grounding, and prompt evasiveness affect model safety. It finds that prompting in African languages lowers refusal rates, and it introduces a Deflection label and human validation to handle model comprehension failures and judge reliability.

Comments: Under review

Abstract

Safety evaluation of Large Language Models (LLMs) remains heavily English-centric, leaving Low-Resource Languages (LRLs), particularly African ones, critically underexplored. We introduce TUKABENCH, a jailbreak benchmark for seven African languages that extends JailbreakBench (JBB) beyond direct translation through four settings: human translation of JBB prompts, English adaptation to African contexts followed by human translation, human-curated prompts validated through interactions with GPT-5.2, and code-switched prompts combining English and African languages, isolating the effect of language, cultural grounding, and prompt evasiveness on model safety. Across closed and open models, prompting in African languages reduces refusal relative to English, with culturally adapted prompts leading to the least refusal. The evaluation also surfaces two structural limitations: model comprehension failures and reduced LLM-as-a-judge reliability in LRLs. To capture the first, we introduce Deflection alongside Refused and Jailbroken; to assess the second, we validate outputs with human annotations, showing that judge-human agreement drops in lower-resource languages and less commonly supported scripts.
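
The abstract does not say how judge-human agreement is measured. Purely as an illustration, the sketch below computes Cohen's kappa over the three outcome labels; the choice of kappa, the function name, and the toy annotations are assumptions and not details taken from the paper.

```python
from collections import Counter

LABELS = ("Refused", "Deflection", "Jailbroken")

def cohen_kappa(judge, human):
    """Chance-corrected agreement between two label sequences of equal length."""
    n = len(judge)
    observed = sum(j == h for j, h in zip(judge, human)) / n
    judge_counts, human_counts = Counter(judge), Counter(human)
    # Agreement expected if both annotators labelled independently
    # with their own marginal label frequencies.
    expected = sum(judge_counts[l] * human_counts[l] for l in LABELS) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy annotations (hypothetical): LLM judge vs. human annotator.
judge = ["Refused", "Refused", "Jailbroken", "Deflection", "Jailbroken", "Refused"]
human = ["Refused", "Deflection", "Jailbroken", "Deflection", "Jailbroken", "Jailbroken"]
kappa = cohen_kappa(judge, human)
print(round(kappa, 2))
```

Computing such a score per language would make a drop in agreement for lower-resource languages directly comparable across settings.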

2606.01316 2026-06-02 cs.AI

Science Earth: Towards A Planet-Scale Operating System for AI-Native Scientific Discovery

Zhe Zhao, Haibin Wen, Yingcheng Wu, Jiaming Ma, Yifan Wen, Jinglin Jian, Jiacheng Ge, Xiangru Tang, Bo An, Ming Yin, Sanfeng Wu, Mengdi Wang, Le Cong

Affiliations: Department of Pathology, Department of Genetics, Stanford University School of Medicine; Princeton AI Lab, Department of Electrical & Computer Engineering, Princeton University; Scripps Research, La Jolla, CA, USA; Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine; College of Computing and Data Science, Nanyang Technological University; Department of Computer Science, Yale University; Department of Physics, Princeton University

AI summary: Proposes Science Earth, a planet-scale scientific runtime in which the EACN protocol lets AI capabilities connect dynamically and self-organize their collaboration. Distributed, self-correcting scientific reasoning is demonstrated in a trans-Pacific Kuramoto synchronization study and a single-cell analysis.

Abstract

Scientific discovery demands intelligence, perseverance, and serendipity across vast search spaces. Today, top scientific capabilities remain siloed--one AI system for biological analysis, another for clinical reasoning, mathematical derivation, or materials simulation--and no pre-designed team can anticipate every skill a question will need. Science Earth is a planet-scale scientific runtime in which any capability--a simulation cluster, a wet-lab robot, a proof engine, a single-cell pipeline--can connect to any other, with collaboration structure emerging from the question itself. Its underlying EACN protocol lets capabilities discover one another, negotiate task ownership, and adjudicate across incompatible evidentiary standards without prior knowledge of who will meet whom. This shifts the organizing challenge from workflow design to open-ended connectivity. Two runs validate this under structurally distinct conditions. In a trans-Pacific higher-order Kuramoto synchronization study, agents identified and corrected a closure-ratio assumption in Ott-Antonsen analytic theory that fails outside the Lorentzian limit, within thirty minutes. In an eight-agent single-cell run on the 4.88M-cell Kang 2024 pan-cancer atlas, heterogeneous capabilities coupled over a 64.9-hour window with one structural external instruction, producing three new result layers and anchoring findings against an independent wet-lab study on an adjacent CCR8- TIGIT+ Treg subset. These cases are a first empirical reading, not a benchmark sweep. They show that when AI capabilities are truly connectable and coordination emerges from the problem, scientific reasoning becomes a distributed, self-correcting process--a step towards scaling AI-native discovery to the planet.

2606.01315 2026-06-02 cs.CV

DeblurNVS: Geometric Latent Diffusion for Novel View Synthesis from Sparse Motion-Blurred Images

Changyue Shi, Wangbo Yu, Chaoran Feng, Li Yuan

Affiliations: School of AI for Science, Peking University Shenzhen Graduate School; School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School

AI summary: Proposes DeblurNVS, a framework that uses geometric latent diffusion to synthesize high-fidelity novel views directly from sparse motion-blurred images, without per-scene optimization.

Abstract

Novel view synthesis (NVS) is a fundamental problem in computer vision and graphics. Recent advances in neural radiance fields (NeRF), 3D Gaussian Splatting (3DGS), and generative view synthesis have substantially improved its quality. Yet most methods still rely on clean observations, where image structures and cross-view geometric cues are well preserved. Motion blur breaks this assumption by corrupting local details and weakening multi-view correspondences. Such blur commonly arises from camera shake, scene motion, or finite exposure in practical capture. Blur-aware NVS methods address this degradation by modeling image formation, but their reliance on costly per-scene optimization limits efficient and generalizable sparse-view synthesis. To address this, we propose DeblurNVS, a novel framework for synthesizing high-fidelity novel views directly from sparse motion-blurred images, without requiring per-scene optimization. DeblurNVS restores the intermediate geometric representations needed for multi-view reasoning, enabling blurred inputs to recover reliable structure and correspondence cues. The restored representations are then combined with target camera information to synthesize the target-view representation and reconstruct a sharp RGB novel view. To enable large-scale training, we construct a motion-blurred NVS dataset from DL3DV-10K using interpolation-based finite-exposure blur synthesis. Extensive experiments demonstrate that DeblurNVS outperforms existing baselines on synthetic motion-blur benchmarks and generalizes to real motion-blurred scenes, producing perceptually sharper and structurally more stable novel views while avoiding costly per-scene optimization. Project page: https://github.com/PKU-YuanGroup/DeblurNVS.
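
Finite-exposure blur is commonly modelled as the average of the sharp images seen while the shutter is open. The toy sketch below illustrates that idea on a one-dimensional "image"; the pixel shift stands in for interpolated in-between frames, and all names and values are hypothetical rather than taken from the DeblurNVS data pipeline.

```python
def shift_right(row, pixels):
    """Toy stand-in for an interpolated in-between view: shift a 1-D row."""
    if pixels == 0:
        return list(row)
    return [0.0] * pixels + list(row[:-pixels])

def synthesize_blur(row, max_shift):
    """Average the sharp frames observed during the exposure window."""
    frames = [shift_right(row, s) for s in range(max_shift + 1)]
    return [sum(px) / len(frames) for px in zip(*frames)]

sharp = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # a single bright pixel
blurred = synthesize_blur(sharp, max_shift=2)
print(blurred)                            # the bright pixel is smeared over three positions
```

Averaging preserves total intensity while spreading it along the motion path, which is why fine detail and cross-view correspondences degrade.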

2606.01314 2026-06-02 cs.AI

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian, Qifan Wang, Chen Wu, Lei He

Affiliations: Shanghai Jiao Tong University; Eastern Institute of Technology; University of Science and Technology of China; Southeast University; Ningbo Institute of Digital Twin

AI summary: Proposes SkillSmith, a framework that co-evolves skills and tools through a unified proposal space and a Lotka-Volterra-inspired ecological utility model, significantly improving performance on several benchmarks.

Abstract

Recent self-evolving agents have shown that skills can be discovered, refined, and accumulated through execution. However, existing skill-evolution frameworks typically assume a fixed tool layer and evaluate each skill independently, limiting their ability to repair tool-level failures or reason about interactions among skills. We propose SkillSmith, a synergy-aware skill-tool co-evolution framework. SkillSmith introduces a unified proposal space in which reflection produces atomic bundles that jointly modify skills and tools, allowing tools to be wrapped, edited, composed, split, or retired when skill evolution identifies a reusable capability gap. To guide this joint search, SkillSmith maintains an ecological utility model inspired by Lotka-Volterra dynamics, where an interaction matrix estimated from execution traces captures pairwise complementarity and conflict among skills and provides pressure signals for retrieval, mutation prioritization, and retirement. Furthermore, SkillSmith records anti-patterns, including failure signatures, causal attributions, and remedies, to accelerate diagnosis and veto proposals that repeat known mistakes. Experiments on three benchmarks, including WildClawBench, and five Qwen3.5 model scales show that SkillSmith consistently outperforms strong baselines, with gains that amplify as task complexity and multi-skill co-activation increase.
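
To make the Lotka-Volterra analogy concrete, the sketch below uses the generalized Lotka-Volterra form dx_i/dt = x_i (r_i + sum_j A_ij x_j), where x_i is a skill's utility and A is the interaction matrix. This is only an illustration of the dynamics the abstract alludes to: the function names, the Euler update, and every number are assumptions, not SkillSmith's actual model.

```python
def pressure(utilities, growth, interaction):
    """Per-skill pressure r_i + sum_j A_ij * x_j (positive means the skill is favoured)."""
    n = len(utilities)
    return [growth[i] + sum(interaction[i][j] * utilities[j] for j in range(n))
            for i in range(n)]

def euler_step(utilities, growth, interaction, dt=0.1):
    """One explicit Euler step of dx_i/dt = x_i * pressure_i, clipped at zero."""
    p = pressure(utilities, growth, interaction)
    return [max(0.0, x + dt * x * pi) for x, pi in zip(utilities, p)]

# Hypothetical example: skills 0 and 1 complement each other (positive
# off-diagonal entries), skill 2 conflicts with both (negative entries).
utilities = [0.5, 0.5, 0.5]
growth = [0.1, 0.1, 0.1]
interaction = [
    [-1.0, 0.5, -0.2],
    [0.5, -1.0, -0.2],
    [-0.4, -0.4, -1.0],
]
p = pressure(utilities, growth, interaction)
nxt = euler_step(utilities, growth, interaction)
print(p, nxt)
```

In this toy setting the conflicting skill receives the lowest pressure, which is the kind of signal that could inform retirement or mutation priority.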

2606.01313 2026-06-02 cs.RO cs.AI

PSG-Nav: Probabilistic Scene Graph Navigation via Multiverse Decision Making

Rufeng Chen, Yue Chang, Xiaqiang Tang, Hechang Chen, Sihong Xie

Affiliations: Tsinghua University

AI summary: Proposes PSG-Nav, which builds a 3D probabilistic scene graph and uses Multiverse Decision to sample the most likely world settings from the joint distribution, in order to handle perception uncertainty in open-vocabulary navigation. It also introduces the Evidential Experience Calibrator for online lifelong adaptation and reports new state-of-the-art results on several benchmarks.

Comments: 21 pages, 7 figures. ICML 2026

Abstract

Open-vocabulary navigation requires embodied agents to manage significant perception uncertainty stemming from semantic ambiguity and model errors. However, most existing works settle for locally optimal deterministic approaches, which rule out complex navigation decision-making over the multiple composite possibilities that are critical for globally better solutions. In this paper, we propose Probabilistic Scene Graph Navigation (PSG-Nav), which constructs a 3D Probabilistic Scene Graph that uses full semantic categorical distributions to account for perception uncertainty. To efficiently use the local distributions to compose and reason about the optimal navigation landmarks, we propose Multiverse Decision to sample multiple most likely world settings from the joint distribution, and evaluate navigation landmarks based on the compatibility between landmarks and multiverses. To mitigate false positives due to epistemic uncertainty in open-vocabulary navigation, we introduce the Evidential Experience Calibrator, which enables online lifelong adaptation by cross-validating detections against memories of past successes and failures. Extensive experiments on the widely used benchmarks MP3D, HM3D, and HSSD demonstrate that PSG-Nav establishes new state-of-the-art results, achieving Success Rates of 66.1%, 44.8%, and 67.9%, respectively. Code is available at: https://psg-nav.github.io/
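
The idea of scoring a landmark against several likely "worlds" can be sketched as follows. The example assumes that node labels are independent, so that the joint probability is a product of per-node probabilities, and it enumerates worlds exhaustively, which is only feasible for tiny graphs. The independence assumption, the function names, and the scoring rule are all hypothetical and not taken from the paper.

```python
from itertools import product
from math import prod

def top_worlds(node_dists, k):
    """Return the k most likely joint label assignments as (probability, world) pairs."""
    names = list(node_dists)
    worlds = []
    for labels in product(*(node_dists[n] for n in names)):
        p = prod(node_dists[n][label] for n, label in zip(names, labels))
        worlds.append((p, dict(zip(names, labels))))
    worlds.sort(key=lambda w: w[0], reverse=True)
    return worlds[:k]

def landmark_score(worlds, node, target):
    """Share of the sampled probability mass in which `node` carries label `target`."""
    total = sum(p for p, _ in worlds)
    return sum(p for p, w in worlds if w[node] == target) / total

# Hypothetical per-node categorical distributions from a perception model.
node_dists = {
    "obj1": {"sofa": 0.6, "bed": 0.4},
    "obj2": {"tv": 0.7, "monitor": 0.3},
}
worlds = top_worlds(node_dists, k=3)
score = landmark_score(worlds, "obj1", "sofa")
print(worlds[0], round(score, 3))
```

Keeping the full distribution per node, rather than only the top label, is what allows a less likely label to still influence the landmark choice.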

2606.01311 2026-06-02 cs.CL cs.AI cs.LG cs.MA

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

Zhuoyun Yu, Xin Xie, Wuguannan Yao, Chenxi Wang, Lei Liang, Xiang Qi, Shumin Deng

Affiliations: Zhejiang University; Ant Digital Technologies, Ant Group; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph

AI summary: Proposes SkillAdaptor, a training-free, step-level skill adaptation framework that uses explicit failure attribution and targeted updates to improve LLM agents on long-horizon interactive tasks.

Comments: Work in progress

Abstract

Large language model (LLM) agents increasingly rely on reusable external skills to solve long-horizon interactive tasks. Existing training-free skill adaptation pipelines usually update skills from full trajectories or session-level feedback, which makes failure attribution coarse and often produces unstable or overly broad revisions. We propose SkillAdaptor, a training-free step-level skill adaptation framework with explicit failure attribution that can plug into OpenClaw-class agent harnesses. Given a failed trajectory, SkillAdaptor identifies a first actionable fault step, links responsibility to candidate skills, and applies targeted updates under explicit acceptance checks while keeping the backbone frozen. We evaluate on WebShop, PinchBench, and Claw-Eval with Kimi-K2.5, GLM-5, and GPT-5.2. SkillAdaptor improves over no-skill and skill-adaptation baselines on all three suites, with the largest single-metric improvements of +1.5 points on PinchBench Avg Score%, +1.8 on Claw-Eval Avg Score, and +1.7 on WebShop success rate. These results indicate that step-level attribution supports more stable and auditable training-free skill maintenance. The code will be released at https://github.com/zjunlp/SkillAdaptor.

2606.01306 2026-06-02 cs.LG cs.IR

FAiT: Frequency-Aware Inverted Transformer for Multivariate Time Series Forecasting

Peng He, Yao Liu, Yanglei Gan, Run Lin, Yuxiang Cai, Qiao Liu

Affiliations: University of Electronic Science and Technology of China

AI summary: Proposes FAiT, which uses an inverted attention mechanism and dynamic temporal-frequency modulation to address Transformers' neglect of high-frequency signals and time-varying spectral characteristics in multivariate time series forecasting.

Abstract

While Transformer-based architectures have established themselves as a dominant paradigm in Multivariate Time Series Forecasting (MTSF), their core self-attention mechanism inherently functions as a low-pass filter, systematically smoothing out high-frequency signals vital for sharp local changes. Recent advancements have increasingly incorporated frequency-domain operations to address this bias; however, most existing designs rely on fixed spectral bases and apply sequence-wise (uniform) modulation, implicitly assuming a time-invariant frequency response. This overlooks a key property of real-world series: their spectral characteristics often evolve over time, making uniform modulation insufficient for capturing fine-grained temporal dynamics. To tackle these limitations, we propose FAiT, a Frequency-Aware inverted Transformer. Specifically, FAiT rectifies the spectral bias internally through Inverted Attention, which interprets the attention map as a learnable low-pass operator and constructs a dedicated complementary high-pass branch by inverting the attention matrix to recover attenuated transient signals. Furthermore, FAiT introduces Dynamic Temporal-Frequency Modulation (DTFM), which synthesizes instance-conditioned weights to adaptively re-calibrate the energy of spectral sub-bands, enabling fine-grained control over evolving multi-scale patterns. Extensive experiments on widely used benchmarks demonstrate that FAiT consistently outperforms state-of-the-art Transformer-based and frequency-enhanced baselines, while maintaining computational efficiency.
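
The low-pass/high-pass intuition can be checked on a toy signal. A row-stochastic attention matrix A averages neighbouring positions, so I - A keeps exactly what the averaging removed. The sketch below is an assumption-laden illustration and not the authors' implementation: the uniform local window stands in for a learned attention map, and taking I - A as the "inversion" is one plausible reading of the abstract.

```python
def local_attention(n, width=1):
    """Row-stochastic matrix averaging each position with its neighbours
    (a stand-in for a learned softmax attention map)."""
    rows = []
    for i in range(n):
        idx = [j for j in range(n) if abs(i - j) <= width]
        rows.append([1.0 / len(idx) if j in idx else 0.0 for j in range(n)])
    return rows

def apply(matrix, signal):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, signal)) for row in matrix]

def high_pass_complement(matrix):
    """I - A: the hypothetical complementary high-pass branch."""
    n = len(matrix)
    return [[(1.0 if i == j else 0.0) - matrix[i][j] for j in range(n)]
            for i in range(n)]

signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]     # a sharp spike
A = local_attention(len(signal))
low = apply(A, signal)                            # spike smoothed to height 1/3
high = apply(high_pass_complement(A), signal)     # residual that smoothing discarded
recon = [l + h for l, h in zip(low, high)]        # the two branches sum to the input
print(low, high)
```

Because the two branches sum back to the input, a model that reweights them can choose how much transient detail to restore.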

2606.01302 2026-06-02 cs.LG

Structure and Scale in Simplicial Sequence Modelling

Matthew Farrugia-Roberts

Affiliations: Department of Computer Science, University of Oxford

AI summary: By training small transformers to predict the outputs of a hidden Markov model, this paper finds a correlation between performance scaling patterns and internal representations, providing preliminary evidence of a link between behavioural scaling laws and emergent mechanisms.

Comments: HiLD 2026: 4th Workshop on High-dimensional Learning Dynamics

Abstract

Modern large-scale deep learning exhibits two striking empirical phenomena: behavioural scaling laws (predictable performance gains with increasing scale) and emergent mechanisms (structured internal representations and circuits in deep neural networks). We hypothesise that these two phenomena are connected: that predictable changes in behaviour are the result of predictable changes in internal computational structure. In this paper, we report preliminary evidence of such a connection. We find a correlation between scaling patterns in performance and representations in small transformers trained to predict the outputs of a hidden Markov model, for which residual activations are known to linearly encode a belief distribution over latent states in a probability simplex.
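
The "belief distribution over latent states" is the standard Bayesian filtering posterior of a hidden Markov model, and each belief is a point in the probability simplex. The sketch below shows one update step; the two-state transition and emission matrices are made-up values for illustration, not the model studied in the paper.

```python
def belief_update(belief, transition, emission, obs):
    """One filtering step: b'(s') is proportional to sum_s b(s) T[s][s'] * E[s'][obs]."""
    n = len(belief)
    unnorm = [
        sum(belief[s] * transition[s][s2] for s in range(n)) * emission[s2][obs]
        for s2 in range(n)
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical two-state HMM with two observation symbols.
transition = [[0.9, 0.1], [0.2, 0.8]]   # T[s][s']
emission = [[0.8, 0.2], [0.3, 0.7]]     # E[s][obs]

belief = [0.5, 0.5]
trajectory = []
for obs in [0, 0, 1]:
    belief = belief_update(belief, transition, emission, obs)
    trajectory.append(belief)
print(trajectory)
```

Each entry of `trajectory` is non-negative and sums to one, so the sequence of beliefs traces a path inside the simplex.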

2606.01301 2026-06-02 cs.CL

Med-HEAL: Analyzing and Mitigating Hallucinations in Medical LLMs with Hallucination-Aware In-Context Learning

Yiming Liao, Zeno Franco, Jose Eduardo Lizarraga Mazaba, Keke Chen

Affiliations: University of Maryland, Baltimore County; Medical College of Wisconsin

AI summary: Proposes the Med-HEAL framework, which builds a hallucination dataset from clinical data and mitigates hallucinations in medical LLMs through self-critique and retrieval-augmented in-context learning. Experiments show that self-critique improves accuracy for three of the five models tested.

Comments: 12 pages, 5 figures. Preprint full version of an accepted ACM-BCB 2026 short paper

Abstract

Hallucinations in medical large language models (LLMs) pose serious risks for clinical decision support, particularly when models must reason over complex electronic health records (EHRs). However, existing benchmarks often lack a realistic clinical context and provide limited insight into how hallucinations can be mitigated in practice. We introduce Med-HEAL, a framework for systematically identifying, analyzing, and mitigating hallucinations in medical LLMs using clinically grounded data. Building on the EHRNoteQA benchmark derived from MIMIC-IV discharge summaries, we construct a hallucination dataset by evaluating BioMistral-7B on open-ended clinical question answering tasks. Model outputs are labeled through a dual evaluation pipeline that combines LLM-as-a-Judge assessment (GPT-4o) with human auditing by medical student reviewers, producing correctness judgments and annotations of reasoning errors via a custom web-based evaluation system. We then leverage this dataset to investigate mitigation strategies: a self-critique pipeline, in which the test model reviews its own answers to detect potential errors and regenerates responses for flagged cases, and retrieval-augmented in-context learning (RA-ICL), which exposes the model to hallucinated and corrected examples. Experiments across five open-source LLMs (BioMistral, Llama-3.1, DeepSeek, Qwen2.5, and Qwen3) show that the self-critique strategy improves accuracy for three of five models (p < 0.05) without requiring parameter updates. Med-HEAL provides both a reusable hallucination dataset and a practical framework for studying and mitigating hallucinations in medical LLMs, supporting safer deployment of AI systems in clinical environments. Our code and data are publicly available at https://github.com/yimingliao-blad/med-heal.git.
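
The control flow of a self-critique pipeline (answer, review, regenerate only when flagged) can be sketched in a few lines. The model calls are passed in as plain functions, and the toy stand-ins below replace real LLM calls; the function names, signatures, and the single-round default are assumptions rather than the Med-HEAL implementation.

```python
def self_critique(question, generate, critique, max_rounds=1):
    """Answer, let the same model review the answer, and regenerate if flagged."""
    answer = generate(question, None)
    for _ in range(max_rounds):
        flagged, feedback = critique(question, answer)
        if not flagged:
            break
        answer = generate(question, feedback)
    return answer

# Toy stand-ins for model calls (hypothetical behaviour).
def toy_generate(question, feedback):
    return "revised answer" if feedback else "draft answer"

def toy_critique(question, answer):
    flagged = answer == "draft answer"
    return flagged, ("re-check against the source note" if flagged else None)

result = self_critique("example question", toy_generate, toy_critique)
print(result)
```

No model weights change in this loop, which matches the abstract's point that the strategy needs no parameter updates.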

2606.01300 2026-06-02 cs.LG cs.AI

ChronosAD: Leveraging Time Series Foundation Models for Accurate Anomaly Detection

Uzair Khan, Luigi Capogrosso, Francesco Biondani, Michele Magno, Franco Fummi, Francesco Setti, Marco Cristani

Affiliations: PR Veneto FESR 2021-2027, Action 1.1.1, DGR 792, CUP D19J24000810007

AI summary: Proposes the ChronosAD architecture, which extracts features with a time series foundation model and refines them with a BiLSTM and multi-head attention for robust cross-domain anomaly detection, improving average AUC by 4.72% and AP by 6.60% across 11 benchmarks.

Comments: Accepted at the 24th IEEE International Conference on Industrial Informatics (INDIN) 2026

Abstract

Time series anomaly detection is a crucial task in various domains, including finance, healthcare, and industry. However, existing methods often struggle to generalize across different datasets, especially when anomalies are subtle or context-dependent. To solve this issue, we introduce ChronosAD, a novel architecture for anomaly detection that uses a time series foundation model as a feature extractor. Specifically, it employs a two-stage pipeline: first, it uses the foundation model to extract embeddings for each time series in a zero-shot manner. Then, a custom-developed Temporal Block, composed of Bidirectional Long Short-Term Memory (BiLSTM) and Multi-Head Attention, refines these embeddings to capture temporal dependencies and highlight salient patterns. Unlike previous approaches, our model requires minimal task-specific tuning and demonstrates robust generalization across a wide range of domains, including industrial, medical, cyber-physical, and automotive systems. Extensive experiments on 11 benchmarks show that ChronosAD outperforms existing methods by 4.72% in AUC and 6.60% in AP on average. The source code is available at https://github.com/intelligolabs/ChronosAD.

2606.01298 2026-06-02 cs.CL

Challenger at MultiPRIDE: Is It Hate Speech or Reclaimed?

Hadi Bayrami Asl Tekanlou, Mahdi Bakhtiyarzadeh, Jafar Razmara

Affiliations: University of Tabriz

AI summary: To address the difficulty of distinguishing hate speech from reclaimed language, proposes an interpretable approach that combines semantic embeddings, label-noise filtering (Cleanlab with logistic regression), and an MLP classifier, achieving robust performance under limited resources.

Journal ref: CEUR Workshop Proceedings, Vol. 4195, 2026
Comments: 9 pages, 2 figures, Published in EVALITA 2026, CEUR Workshop Proceedings Vol. 4195

Abstract

The spread of hate speech has become increasingly harmful in modern digital environments, particularly on social networking platforms. While recent advances have shown promising results in automatic hate speech detection, a key challenge remains: distinguishing genuine hate speech from reclaimed language. Accurate labeling is difficult due to the nuanced and context-dependent nature of reclaimed expressions. In this paper, we present a simple and interpretable approach for distinguishing hate speech from reclaimed language, developed for the MultiPride Shared Task. Our method generates dense semantic text embeddings and incorporates a label-noise filtering stage using Cleanlab with logistic regression, followed by a Multi-layer Perceptron (MLP) neural network for final classification. The system is designed to operate under limited computational resources while maintaining strong performance. We evaluate our approach using precision, recall, and F1-score, including macro-averaged metrics. Experimental results demonstrate robust performance despite extreme class imbalance in the dataset. Overall, the findings highlight the potential for further improvements through larger embedding models and more advanced preprocessing techniques while preserving interpretability.

2606.01294 2026-06-02 cs.CL cs.LG

Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

Dong Le, Thong Nguyen, Cong-Duy Nguyen, Anh Tuan Luu

Affiliations: Nanyang Technological University; National University of Singapore; VinUniversity

AI summary: To address linear attention's weaknesses on in-context retrieval and long-context tasks, proposes the Curvature-Conditioned Query (CCQ) mechanism, which builds a local quadratic model from a second-order Taylor expansion and contracts the query vector using the running key covariance. It modifies only the read step, is compatible with existing linear-attention backbones, and improves perplexity, zero-shot downstream accuracy, retrieval, and long-context performance.

Comments: 19 pages

Abstract

Linear attention reduces the quadratic cost of softmax attention by maintaining a recurrent fast-weight state, but it consistently lags on in-context retrieval and long-context tasks. Existing remedies act on the write side of memory through gating, delta updates, or kernel feature maps, but the read step is left unchanged: every past key contributes additively to the output, so useful targets are diluted by the bulk of stored vectors. We borrow one specific piece of softmax's geometry to construct a cheap read-time contraction of the query. A second-order Taylor expansion of the softmax log-partition at the isotropic-attention point gives a local quadratic model whose curvature coincides with the running key covariance, a quantity that can be maintained with the same recurrent/chunkwise mechanism as the linear-attention state. The associated linear operator contracts the query along the high-density directions of memory before it reads the state. We call this mechanism Curvature-Conditioned Query (CCQ). CCQ modifies only the read step and is composable with any linear-attention backbone. Attached to GLA and Gated DeltaNet, it improves perplexity, zero-shot downstream accuracy, S-NIAH retrieval at and beyond the training context, length-extrapolation perplexity from 4K to 20K, and LongBench accuracy, at small extra cost.
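
The claim that the curvature coincides with the key covariance follows from a short, standard calculation. The derivation below assumes that the "isotropic-attention point" is q = 0, where softmax weights are uniform; that reading, and the symbols mu and Sigma, are assumptions made here for illustration. The contraction operator that CCQ builds from this curvature is not specified in the abstract and is not reconstructed here.

```latex
% Keys k_1, ..., k_n in R^d, query q in R^d.
L(q) = \log \sum_{i=1}^{n} \exp(q^\top k_i), \qquad
p_i(q) = \frac{\exp(q^\top k_i)}{\sum_{j=1}^{n} \exp(q^\top k_j)}

% Gradient and Hessian of the log-partition function:
\nabla L(q) = \sum_{i=1}^{n} p_i(q)\, k_i, \qquad
\nabla^2 L(q) = \sum_{i=1}^{n} p_i(q)\, k_i k_i^\top
  - \Big(\sum_{i=1}^{n} p_i(q)\, k_i\Big)\Big(\sum_{i=1}^{n} p_i(q)\, k_i\Big)^{\top}

% At q = 0 the weights are uniform, p_i = 1/n, giving the key mean and covariance:
\mu = \frac{1}{n}\sum_{i=1}^{n} k_i, \qquad
\Sigma = \frac{1}{n}\sum_{i=1}^{n} k_i k_i^\top - \mu\mu^\top

% Second-order Taylor expansion around q = 0:
L(q) \approx \log n + q^\top \mu + \tfrac{1}{2}\, q^\top \Sigma\, q
```

Both sums in Sigma (over k_i and over k_i k_i^T) are running totals, which is consistent with the statement that the covariance can be maintained recurrently alongside the linear-attention state.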

2606.01292 2026-06-02 cs.LG cs.AI

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

Wendao Wu, Fangqing Zhang, Haihan Zhang, Cong Fang

Affiliations: Department of Computer Science; Cranberry-Lemon University; Department of Computational Neuroscience; University of the Witwatersrand

AI summary: Through a unified spectral analysis of SGD dynamics in high-dimensional linear regression, this paper identifies two mechanisms, spectral horizon expansion in knowledge distillation and spectral denoising in weak-to-strong generalization, giving a unified account of why different knowledge-transfer paradigms work.

Abstract

Teacher-Student Knowledge Transfer (KT) is ubiquitous in modern machine learning, ranging from classical model compression via Knowledge Distillation (KD) to the emergent phenomenon of Weak-to-Strong (W2S) generalization. While existing studies offer isolated insights, a unified theoretical framework explaining the efficacy of KT across these disparate regimes remains lacking. In this work, we establish a unified spectral analysis of SGD dynamics in high-dimensional linear regression, elucidating the efficiency of KT across seemingly disparate regimes. We characterize KT efficiency through two distinct mechanisms: Spectral Horizon Expansion in KD, which enables the capture of statistically inaccessible high-frequency signals, and Spectral Denoising in W2S, where the student acts as a filter for optimization noise. Our framework unifies these phenomena, revealing that the efficacy of transfer is governed by the interplay between implicit regularization and heterogeneous spectral learning speeds over the spectrum.
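
"Heterogeneous spectral learning speeds" can be illustrated with the simplest possible case. For full-batch gradient descent on a quadratic loss, the component of the solution along an eigen-direction with eigenvalue lambda is learned to a fraction 1 - (1 - lr * lambda)^t after t steps from zero initialisation. The sketch below uses noiseless gradient descent rather than the SGD setting analysed in the paper, and the eigenvalues and step size are arbitrary illustrative values.

```python
def learned_fraction(eigenvalue, lr, steps):
    """Closed form for gradient descent from zero initialisation."""
    return 1.0 - (1.0 - lr * eigenvalue) ** steps

def simulate(eigenvalue, lr, steps, target=1.0):
    """Run gradient descent on 0.5 * eigenvalue * (w - target)^2 and report w / target."""
    w = 0.0
    for _ in range(steps):
        w += lr * eigenvalue * (target - w)
    return w / target

eigenvalues = [1.0, 0.1, 0.01]            # large to small spectral directions
fractions = [learned_fraction(lam, lr=0.5, steps=20) for lam in eigenvalues]
simulated = [simulate(lam, lr=0.5, steps=20) for lam in eigenvalues]
print(fractions)
```

Directions with small eigenvalues remain largely unlearned after a fixed training budget, which is the implicit regularization effect that early stopping exploits.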

2606.01289 2026-06-02 cs.LG

Feature to Dynamics: Feature-space to Autoregression strategy for Zero-shot Time Series Forecasting

从特征到动态:面向零样本时间序列预测的特征空间到自回归策略

Yifan Wu, Junjie Wu, Kai Wu, Xiaoyu Zhang, Jian Lou

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出FSA框架,通过从可解释特征空间到自回归策略空间的映射,在零样本单变量时间序列预测中引入显式归纳偏置,分离全局趋势、周期成分和局部动态,以更少数据假设实现跨域泛化,在受控实验中优于Transformer架构。

详情
AI中文摘要

零样本时间序列预测旨在预测未见序列的未来值,要求模型泛化超出训练分布的时间动态。虽然近期的基础模型通过大规模预训练实现了强大的域内性能,但其有效性通常依赖于广泛的数据覆盖和隐式模式记忆,当数据稀缺或源域与目标域不重叠时,这可能会限制泛化能力。在这项工作中,我们提出了FSA,一种用于受控零样本单变量预测的特征到策略框架。FSA不直接在观测空间中对原始序列建模,而是学习从可解释特征空间到自回归策略空间的结构化映射。这种设计引入了显式归纳偏置,将全局趋势、周期成分和局部时间动态分离,使模型能够以更少的数据假设捕获可迁移的时间序列结构。实验结果表明,在相同的预训练数据、训练协议和可比较的参数预算下,FSA在我们的受控零样本设置中优于基于Transformer的架构。

英文摘要

Zero-shot time series forecasting aims to predict future values for previously unseen series, requiring models to generalize temporal dynamics beyond the training distribution. While recent foundation models achieve strong in-domain performance through large-scale pretraining, their effectiveness often relies on broad data coverage and implicit pattern memorization, which can limit generalization when data are scarce or source and target domains are disjoint. In this work, we propose FSA, a feature-to-strategy framework for controlled zero-shot univariate forecasting. Instead of directly modeling raw sequences in the observation space, FSA learns a structured mapping from an interpretable feature space to an autoregressive strategy space. This design introduces explicit inductive biases that disentangle global trends, periodic components, and local temporal dynamics, enabling the model to capture transferable time-series structure with fewer data assumptions. Empirical results show that, under identical pretraining data, training protocol, and comparable parameter budgets, FSA outperforms Transformer-based architectures in our controlled zero-shot setting.

2606.01287 2026-06-02 cs.CV cs.AI

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

超越视觉记忆:潜在视觉推理的机制诊断

Garvin Guo, Yu Chen, Xiang Wang, Shuai Li, Xinpei Zhao, Huaxing Liu, Shuai Dong

发表机构 * Amap, Alibaba Group(阿里巴巴集团高德地图) Shanghai Innovation Institute(上海创新研究院)

AI总结 通过分解潜在令牌为三个可测试组件,发现边界标记和格式而非潜在槽贡献了主要性能提升,揭示了潜在视觉推理的真正机制。

详情
AI中文摘要

最近的潜在视觉推理方法通过在多模态语言模型中插入连续潜在令牌取得了显著提升。这些提升通常归因于令牌编码了视觉证据;然而,最近的分析揭示了一个悖论:令牌与图像关联松散,对答案贡献甚微。关键的是,这些分析将潜在令牌视为一个整体,掩盖了提升的真正来源。因此,我们将潜在令牌分解为三个可测试组件:潜在槽、边界标记和格式,并在有利条件下开发了一种最先进的方法作为探针。在六个方法-阶段设置和四个感知密集型基准测试中,潜在槽未能通过视觉记忆解释的所有预测。引人注目的是,在几种设置中,仅保留边界标记即可保留78%至100%的提升,而模型在潜在位置比在答案位置更窄地关注图像。因此,提升来自边界标记、格式以及这种注意力模式,而非潜在槽。每种方法如何利用这一机制取决于其训练监督:在匹配的准确率下,机制仍可能显著不同。因此,潜在视觉推理不仅需要根据准确率评估,还需要根据模型实际依赖的内容进行评估。

英文摘要

Recent latent visual reasoning methods achieve substantial gains by inserting continuous latent tokens into multimodal language models. These gains are commonly attributed to the tokens encoding visual evidence; recent analyses, however, reveal a paradox: the tokens are loosely tied to the image and contribute little to the answer. Critically, these analyses treat latent tokens as a single unit, obscuring the true source of the gains. We therefore decompose latent tokens into three testable components: latent slots, boundary markers, and format, and develop a state-of-the-art method as a probe under favorable conditions. Across six method-stage settings and four perception-heavy benchmarks, latent slots fail every prediction of the visual-memory account. Strikingly, retaining only the boundary markers preserves 78 to 100% of the gain in several settings, while the model attends to the image more narrowly at latent positions than at answer positions. The gain therefore comes from boundary markers, format, and this attention pattern, not from latent slots. How each method engages this mechanism depends on its training supervision: at matched accuracy, mechanisms can still differ markedly. Latent visual reasoning thus needs evaluation not only by accuracy but by what the model actually relies on.

2606.01283 2026-06-02 cs.LG

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks

AdaKernel: 为时空图神经网络学习自适应核参数

Zhongyue Zhang, Guangyin Jin, Yuxuan Liang, Suwan Yin, Yuankai Wu

发表机构 * Sichuan University(四川大学) PLA Academy of Military Science(中国人民解放军军事科学院) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州))

AI总结 针对图神经网络中固定核参数导致模型容量受限的问题,提出AdaKernel方法,通过结构保持策略学习自适应核参数,在数据稀疏场景下优于固定先验和全隐式图结构方法。

详情
Comments
17 pages, 15 figures, including appendix
AI中文摘要

建模空间依赖性是使用图神经网络(GNN)进行时空数据分析的核心。传统方法依赖于具有预定义参数的基于距离的核,这限制了模型容量。尽管通用自适应机制(如图注意力网络)提供了灵活性,但它们通常无法捕捉潜在的几何结构,在数据稀疏场景下表现不如基于距离的模型。针对这一问题,我们重新审视核参数化问题,并从理论上证明,错误指定的核参数会在GNN中引入不可避免的近似误差。为了克服这一困难,我们提出AdaKernel,一种简单而有效的方法,在神经网络内学习自适应核参数。与从头学习图结构的方法不同,AdaKernel采用结构保持策略,优化物理相互作用的尺度而非丢弃它们。在克里金插值、数据填补和预测上的大量实验表明,AdaKernel持续改进各种GNN架构,并优于模型无关的自适应基线,验证了准确学习的核参数优于固定先验和完全隐式图结构。

英文摘要

Modeling spatial dependencies is central to spatiotemporal data analysis using Graph Neural Networks (GNNs). Traditional methods rely on distance-based kernels with predefined parameters, which restricts model capacity. Although generic adaptive mechanisms (e.g., Graph Attention Networks) offer flexibility, they often fail to capture the underlying geometric structure, performing worse than distance-based models in data-sparse scenarios. Addressing this, we revisit the kernel parameterization problem and theoretically prove that misspecified kernel parameters introduce unavoidable approximation errors in GNNs. To overcome this, we propose AdaKernel, a simple yet effective approach that learns adaptive kernel parameters within the neural network. Unlike methods that learn graph structures from scratch, AdaKernel adopts a structure-preserving strategy that optimizes the scale of physical interactions rather than discarding them. Extensive experiments on Kriging, Imputation, and Forecasting demonstrate that AdaKernel consistently improves various GNN architectures and outperforms model-agnostic adaptive baselines, validating that accurately learned kernel parameters are superior to both fixed priors and fully latent graph structures.
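A minimal sketch of what a distance-based kernel with a tunable scale can look like. The Gaussian form `exp(-(d / sigma)^2)` and the function names are assumptions made for illustration; the abstract does not say which kernel family or parameterization AdaKernel uses.

```python
import math

def kernel_weight(d, sigma):
    """Edge weight for two nodes at distance d (assumed Gaussian kernel)."""
    return math.exp(-(d / sigma) ** 2)

def kernel_weight_grad(d, sigma):
    """Analytic derivative of the edge weight with respect to sigma."""
    return kernel_weight(d, sigma) * 2.0 * d ** 2 / sigma ** 3

def adjacency(dist, sigma):
    """Dense adjacency matrix built from a pairwise distance matrix."""
    return [[kernel_weight(d, sigma) for d in row] for row in dist]

# Hypothetical distances between three sensors.
dist = [[0.0, 1.0, 4.0],
        [1.0, 0.0, 3.0],
        [4.0, 3.0, 0.0]]
narrow = adjacency(dist, sigma=1.0)  # distant nodes are almost disconnected
wide = adjacency(dist, sigma=5.0)    # distant nodes keep noticeable weight
```

Because the weight is differentiable in `sigma`, the scale can in principle be updated by gradient descent together with the network weights, while the distance structure itself is left intact.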

2606.01282 2026-06-02 cs.CV cs.CY cs.LG

KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation

KG-FairDiff: 知识图谱引导的提示词精炼用于人口统计公平的文本到图像生成

Farbod Davoodi, Seyed Reza Tavakoli Shiyadeh, Pooria Safaei, Sana Harighi, Parsa Gholami, Amirali Amini, Kimia Vanaei, Emad Firoozi, Parham Abed Azad, Babak Khalaj, Siavash Ahmadi, Amir Hossein Payberah, Mohammad Hossein Rohban, Soheil Kolouri, Ali Diba

发表机构 * University of Science and Technology of China(中国科学技术大学) Sharif University of Technology(谢里夫理工大学) Iran University of Science and Technology(伊朗科学技术大学)

AI总结 提出KG-FairDiff框架,通过知识图谱引导的提示词精炼,在推理时优化公平性损失,减少文本到图像生成中的性别、种族、年龄等人口统计偏差,同时保持语义保真度。

详情
AI中文摘要

文本到图像(TTI)系统现已成为新闻、教育、广告和公共传播的日常基础设施,它们从训练数据中继承的人口统计和文化刻板印象(将女性、有色人种、老年人和非西方文化描绘为代表性不足或漫画化)在部署规模上成为人口层面的危害。现有的缓解措施要么需要昂贵的重新训练,这对于主导消费产品的闭源骨干网络不可行,要么依赖于忽略文化背景的固定人口统计模板。我们提出了KG-FairDiff,一个模型无关、推理时框架,将公平感知的提示词精炼形式化为一个约束优化问题,并将其实现为一个闭环流水线:一个包含约1200个文化和偏见相关三元组的知识图谱检索结构化上下文,一个LLM改写器提出精炼,一个验证器仅接受那些减少基于散度的公平性损失同时保持用户原始意图语义保真度的提示词。我们证明了精炼循环的有限终止界限,贡献了一个数学上一致的评估套件,将Bias-P/Bias-W与目标分布的散度以及ENS与KL散度联系起来,并审计了八个广泛部署的骨干生成器。KG-FairDiff显著减少了性别、种族、年龄和交叉差异,同时保持了提示词语义,为更公平的生成式AI提供了一条实用、可部署的路径。

英文摘要

Text-to-Image (TTI) systems are now everyday infrastructure for journalism, education, advertising, and public communication, and the demographic and cultural stereotypes they inherit from training data (rendering women, people of colour, older adults, and non-Western cultures as under-represented or caricatured) become a population-level harm at deployment scale. Existing mitigations either require costly retraining, infeasible for the closed-source backbones that dominate consumer products, or rely on fixed demographic templates that ignore cultural context. We present KG-FairDiff, a model-agnostic, inference-time framework that formalises fairness-aware prompt refinement as a constrained optimisation problem and operationalises it as a closed-loop pipeline: a knowledge graph of ~1,200 culture- and bias-related triples retrieves structured context, an LLM rewriter proposes refinements, and a validator accepts only prompts that reduce a divergence-based fairness loss while preserving semantic fidelity to the user's original intent. We prove a finite-termination bound for the refinement loop, contribute a mathematically consistent evaluation suite linking Bias-P/Bias-W to divergence from target distributions and ENS to KL divergence, and audit eight widely-deployed backbone generators. KG-FairDiff substantially reduces gender, race, age, and intersectional disparities while preserving prompt semantics, offering a practical, deployment-ready route to more equitable generative AI.
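The validator step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: KL divergence stands in for the divergence-based fairness loss, and `accept_refinement`, `similarity`, and the `min_similarity` threshold are hypothetical names and values.

```python
import math

def kl_divergence(observed, target, eps=1e-12):
    """KL(observed || target) over a shared set of demographic group labels."""
    return sum(p * math.log((p + eps) / (target[g] + eps))
               for g, p in observed.items() if p > 0)

def accept_refinement(loss_before, loss_after, similarity, min_similarity=0.8):
    """Accept a rewritten prompt only if the fairness loss drops
    and the prompt stays close enough to the user's original intent."""
    return loss_after < loss_before and similarity >= min_similarity

# Hypothetical group frequencies measured on generated images.
target = {"a": 0.5, "b": 0.5}
loss_before = kl_divergence({"a": 0.9, "b": 0.1}, target)
loss_after = kl_divergence({"a": 0.6, "b": 0.4}, target)
accepted = accept_refinement(loss_before, loss_after, similarity=0.9)
```

A rewrite that lowers the loss but drifts semantically (for example `similarity=0.5`) is rejected, which is the second half of the acceptance condition in the abstract.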

2606.01281 2026-06-02 cs.LG cs.AI

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

RLVR 无需无效样本:面向 LLM 推理的群体优先级离策略优化

Yixiu Mao, Yun Qu, Qi Wang, Heming Zou, Xiangyang Ji

发表机构 * Department of Automation, Tsinghua University(清华大学自动化系)

AI总结 针对强化学习中无效样本导致学习信号不足的问题,提出群体优先级离策略优化(POPO),通过优先级群体重放和解耦重要性采样,在不增加额外采样开销的情况下提升推理性能。

详情
AI中文摘要

基于可验证奖励的强化学习(RLVR)已成为增强大型语言模型(LLMs)推理能力的强大范式。然而,其有效性受到无效训练数据普遍存在的严重阻碍:许多采样提示产生的响应群体要么完全正确,要么完全错误,导致奖励零方差和学习信号有限。最近的先进方法通过大量LLM rollout来过滤无效样本以解决此问题,但代价是相当大的计算开销。替代方法,包括预测性采样和轨迹重放,旨在提高数据效率,但往往仍不充分,并可能引入额外问题,如系统性偏差或次优约束。为解决这些局限性,我们提出了群体优先级离策略优化(POPO),一个简单而有效的框架,无需额外rollout开销即可充分利用有效训练批次。POPO包含两个关键组件:优先级群体重放和解耦离策略优化。前者通过基于近因的重放机制,联合考虑样本质量和离策略程度,用有效的离策略群体替换无效的在策略群体。为进一步缩小离策略差距,POPO采用解耦重要性采样来校正离策略偏差,同时在一致的信任区域约束下保持稳定的策略更新。在包括数学、规划和视觉几何在内的多种推理任务上的实证评估表明,POPO显著加速了RL微调,并在显著减少rollout的情况下实现了强大的推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, its effectiveness is substantially hindered by the prevalence of ineffective training data: many sampled prompts yield response groups that are either entirely correct or entirely incorrect, resulting in zero-variance rewards and limited learning signals. Recent state-of-the-art methods address this issue through extensive LLM rollouts to filter ineffective samples, but at the cost of considerable computational overhead. Alternative approaches, including predictive sampling and trajectory replay, aim to improve data efficiency but often remain insufficient and may introduce additional issues such as systematic bias or suboptimal constraints. To address these limitations, we propose Group Prioritized Off-Policy Optimization (POPO), a simple yet effective framework that fully exploits effective training batches without additional rollout overhead. POPO comprises two key components: prioritized group replay and decoupled off-policy optimization. The former replaces ineffective on-policy groups with effective off-policy groups via a recency-based replay mechanism that jointly considers sample quality and the degree of off-policiness. To further mitigate the off-policy gap, POPO employs decoupled importance sampling to correct off-policy bias while maintaining stable policy updates under consistent trust-region constraints. Empirical evaluations across diverse reasoning tasks, including mathematics, planning, and visual geometry, demonstrate that POPO substantially accelerates RL finetuning and achieves strong reasoning performance with significantly fewer rollouts.
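Why all-correct or all-incorrect groups carry no learning signal can be seen in a small sketch. It assumes group-normalized advantages (reward minus group mean, divided by group standard deviation), which the abstract does not spell out; the function names are hypothetical.

```python
import statistics

def group_advantages(rewards):
    """Group-normalized advantages for the responses to one prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # Every response got the same reward: no response is preferred.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

def is_ineffective(rewards):
    """True when the group is entirely correct or entirely incorrect."""
    return len(set(rewards)) <= 1

all_correct = group_advantages([1, 1, 1, 1])  # zero-variance group
mixed = group_advantages([1, 0, 1, 0])        # informative group
```

With zero advantages the policy-gradient term for that prompt vanishes, so the rollouts spent on it are wasted. Per the abstract, POPO replaces such groups with effective ones drawn from a replay mechanism.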

2606.01280 2026-06-02 cs.CV

Event-Based Vision in Space: Applications, Trends, and Future Directions

太空中的事件视觉:应用、趋势与未来方向

Luigi Capogrosso, Pietro Bonazzi, Michele Magno

发表机构 * Interdisciplinary Transformation University of Austria(奥地利跨学科转型大学) ETH Zurich(苏黎世联邦理工学院)

AI总结 本文综述了事件视觉传感器在太空领域的应用,通过分类四个主要领域(大气与高速观测、环境监测与变化检测、操作支持与星上处理、地理空间建模与预测分析),指出神经形态工程是解决现代遥感与可持续太空探索关键瓶颈的范式转变。

详情
Comments
Accepted at the XXIV Annual Conference on Sensors and Microsystems (AISEM) 2026
AI中文摘要

地球观测(EO)正经历由新型传感技术部署驱动的重大变革。传统的基于帧的光学传感器在具有挑战性的轨道环境中常受运动模糊、高功耗和极端数据冗余的困扰。相比之下,事件传感器(也称为神经形态相机)提供了一种仿生异步方法。通过仅捕获局部光照变化,它们提供微秒级时间分辨率、极高动态范围和卓越能效。尽管这些传感器的使用正从地面系统迅速扩展到轨道平台,但围绕其太空应用的科学文献仍然高度分散。为弥合这一差距,本文对太空领域事件视觉的最新技术进行了全面综述。基于检索到的文献,我们引入了一个围绕四个主要领域构建的分类体系:1)大气与高速观测;2)环境监测与变化检测;3)操作支持与星上处理;4)地理空间建模与预测分析。因此,本综述强调,神经形态工程远不止是一种补充成像技术;它是一种范式转变,可直接用于解决现代遥感和可持续太空探索中的关键瓶颈。

英文摘要

Earth Observation (EO) is undergoing a significant transformation driven by the deployment of novel sensing technologies. Traditional frame-based optical sensors often struggle with motion blur, high power consumption, and extreme data redundancy in challenging orbital environments. In contrast, event-based sensors, also known as neuromorphic cameras, offer a bio-inspired asynchronous approach. By capturing only local illumination changes, they provide microsecond temporal resolution, an extremely high dynamic range, and exceptional energy efficiency. Although the use of these sensors is rapidly expanding from terrestrial systems to orbital platforms, the scientific literature surrounding their space-based applications remains heavily fragmented. To bridge this gap, this article presents a comprehensive review of the state-of-the-art in event-based vision in the space domain. Based on the retrieved literature, we introduce a taxonomy structured around four primary domains: 1) atmospheric and high-speed observation; 2) environmental monitoring and change detection; 3) operational support and onboard processing; and 4) geospatial modeling and predictive analysis. As a result, this survey highlights that neuromorphic engineering is far more than a supplementary imaging technique; it is a paradigm shift that can be used to directly address critical bottlenecks in modern remote sensing and sustainable space exploration.

2606.01279 2026-06-02 cs.AI

ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment

ANDES:用于自主指令对齐的智能体原生数据演化合成工具

Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang, Hengyi Feng, Wentao Zhang

发表机构 * Peking University(北京大学) Sichuan University, Chengdu(四川大学,成都)

AI总结 提出ANDES框架,通过自进化世界树路由和可操作诊断报告,将数据生成重构为即插即用的智能体技能,使基础较弱的智能体在严格计算约束下实现自动对齐,在PostTrainBench上取得最先进性能。

详情
AI中文摘要

AI智能体正越来越多地被用于自动化AI研究本身,特别是将基础大语言模型转化为对齐助手的关键后训练阶段。然而,最近的评估显示,即使是最前沿的智能体也难以完成这一任务。虽然后训练的成功根本上依赖于获取高质量数据,但依赖智能体从开放网络中自主策划目标训练数据集带来了严峻挑战。在嘈杂的网络环境中执行搜索、过滤和平衡数据的长期任务,常常超出智能体有限的上下文能力,最终导致数据集质量下降和下游训练性能次优。为弥补这一差距,我们引入了Andes(智能体原生数据演化合成),这是一个将数据生成重新构想为即插即用的智能体技能的框架。Andes不强迫智能体从头设计复杂的数据收集策略,而是提供一个智能抽象层。通过利用自演化的世界树路由机制和可操作的诊断报告,它允许训练智能体通过交互式闭环界面动态引导数据合成。我们证明,在严格的计算约束下,为基础较弱的智能体配备Andes可以改善自动对齐,在PostTrainBench上取得最先进的性能,并实现稳健的跨任务泛化。我们的项目可在https://github.com/zzy1127/ANDES获取。

英文摘要

AI agents are increasingly being tasked with automating AI research itself, particularly the critical post-training phase that transforms base LLMs into aligned assistants. However, recent evaluations reveal that even frontier agents struggle to perform this task. While the success of post-training fundamentally relies on acquiring high-quality data, relying on agents to autonomously curate targeted training datasets from the open web introduces severe challenges. Executing the long-horizon tasks of searching, filtering, and balancing data within noisy web environments frequently overwhelms an agent's limited context, ultimately leading to degraded dataset quality and suboptimal downstream training performance. To bridge this gap, we introduce Andes (Agent Native Data Evolving Synthesis), a framework that reimagines data generation as a plug-and-play agent skill. Rather than forcing agents to devise complex data-gathering strategies from scratch, Andes provides an intelligent abstraction layer. By leveraging a self-evolving World Tree routing mechanism and actionable diagnostic reports, it allows trainer agents to dynamically steer data synthesis through an interactive, closed-loop interface. We demonstrate that under strict compute constraints, equipping foundationally weaker agents with Andes improves automated alignment, securing state-of-the-art performance on PostTrainBench and robust cross-task generalization. Our project is available at https://github.com/zzy1127/ANDES.

2606.01277 2026-06-02 cs.RO cs.AI cs.CV cs.SY eess.IV eess.SY

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

DeepIPCv3: 面向突发行人穿越避让的事件感知多模态传感器融合

Oskar Natan, Andi Dharmawan, Aufaclav Zatu Kusuma Frisky, Jazi Eko Istiyanto, Jun Miura

发表机构 * Department of Computer Science and Electronics, Universitas Gadjah Mada(计算机科学与电子系,加查马达大学) Department of Computer Science and Engineering, Toyohashi University of Technology(计算机科学与工程系,丰桥技术科学大学)

AI总结 提出DeepIPCv3框架,通过Transformer交叉模态注意力融合LiDAR点云与DVS事件流,实现突发行人穿越场景下的高反应性避让,在自定义多模态数据集上达到最优轨迹与控制精度。

详情
AI中文摘要

当前的端到端自动驾驶系统主要依赖基于帧的传感器,这类传感器在高度动态的突发行人穿越场景中存在固有的感知延迟和运动模糊问题。为解决这一关键安全漏洞,我们提出DeepIPCv3,一种新颖的多模态自主导航框架,它将LiDAR点云的密集3D空间几何与动态视觉传感器(DVS)的微秒级异步事件流协同融合。我们引入了一种受Transformer启发的交叉模态注意力机制,以动态关联这些不同模态,使网络能够即时优先处理高速动态更新,同时不牺牲场景结构感知。融合后的潜在表示通过一个混合策略网络映射到安全的局部路径点和可执行控制命令,该网络结合了启发式轨迹跟踪与直接神经预测。由于在真实场景中测试这些突发穿越场景存在严重物理风险,该框架使用在光照良好的正午和具有挑战性的傍晚条件下收集的自定义多模态数据集进行严格离线评估。广泛的对比和消融研究表明,DeepIPCv3达到了最先进的预测性能。通过有效消除曝光失败和运动模糊,所提出的LiDAR与DVS融合实现了最低的轨迹和控制命令误差,使得无论环境光照如何,都能实现高反应性、数学上有界的规避机动。为支持未来研究,我们将代码发布到GitHub仓库:https://github.com/oskarnatan/DeepIPCv3。

英文摘要

Current end-to-end autonomous driving systems predominantly rely on frame-based sensors, which suffer from inherent perception latency and motion blur during highly dynamic encounters, specifically sudden pedestrian crossings. To address this critical safety vulnerability, we propose DeepIPCv3, a novel multi-modal autonomous navigation framework that synergizes the dense 3D spatial geometry of LiDAR point clouds with the microsecond-level asynchronous event streams of a Dynamic Vision Sensor (DVS). We introduce a Transformer-inspired cross-modal attention mechanism to dynamically correlate these distinct modalities, allowing the network to instantaneously prioritize high-speed dynamic updates without sacrificing structural scene awareness. The fused latent representations are then mapped to safe local waypoints and executable control commands via a hybrid policy network that blends heuristic trajectory tracking with direct neural predictions. Due to the severe physical risks associated with live testing of these sudden crossing scenarios, the framework is rigorously evaluated offline using a custom multi-modal dataset collected across both well-illuminated noon and challenging evening conditions. Extensive comparative and ablation studies demonstrate that DeepIPCv3 achieves state-of-the-art predictive performance. By effectively eliminating exposure failures and motion blur, the proposed LiDAR and DVS fusion yields the lowest trajectory and control command errors, enabling highly reactive, mathematically bounded evasive maneuvers regardless of ambient illumination. To support future research, we will release the code in our GitHub repository at https://github.com/oskarnatan/DeepIPCv3.

2606.01273 2026-06-02 cs.LG

GLIDE: Graph-guided Leap Inference for Diffusion Estimation of Spatio-Temporal Point Processes

GLIDE: 面向时空点过程扩散估计的图引导跳跃推理

Guanyu Zhou, Yao Liu, Yanglei Gan, Yuxiang Cai, Peng He, Run Lin, Yuxiang Liu, Qiao Liu

发表机构 * University of Electronic Science and Technology of China(电子科技大学)

AI总结 提出GLIDE框架,利用多尺度历史图编码和双流架构作为条件,结合先验引导的跳跃推理机制,实现高效且准确的时空点过程下一个事件建模与预测。

详情
AI中文摘要

时空点过程(STPPs)为连续时间和空间中的异步事件建模提供了原则性框架。最近的扩散方法通过建模复杂条件分布,为确定性预测提供了灵活的替代方案,但其在STPPs中的应用仍面临挑战:从纯噪声中反向采样成本高昂,且稀疏空间域中弱结构约束可能导致概率质量定位不佳。我们提出GLIDE(图引导跳跃推理扩散估计),一种用于STPPs中下一个事件建模的条件扩散框架。GLIDE将历史事件组织成多尺度历史图,并通过双流架构编码时间演化和空间拓扑,为双分支扩散去噪器提供结构化条件上下文。它进一步引入先验引导的跳跃推理机制,其中轻量级均值预测器提供确定性锚点,反向过程从中间扩散步骤而非纯高斯噪声开始。在多个真实世界数据集上的实验表明,GLIDE改进了分布拟合和下一个事件预测,其中空间方面的提升最大。结果还表明,先验引导的跳跃推理大幅降低了反向采样成本,同时保留了扩散模型的随机生成能力。

英文摘要

Spatio-temporal point processes (STPPs) provide a principled framework for modeling asynchronous events in continuous time and space. Recent diffusion-based approaches offer a flexible alternative to deterministic prediction by modeling complex conditional distributions, but their application to STPPs remains challenging: reverse sampling from pure noise is costly, and weak structural constraints in sparse spatial domains can lead to poorly localized probability mass. We propose GLIDE (Graph-guided Leap Inference for Diffusion Estimation), a conditional diffusion framework for next-event modeling in STPPs. GLIDE organizes historical events into a multi-scale historical graph and encodes temporal evolution and spatial topology through a dual-stream architecture, yielding a structured conditioning context for a dual-branch diffusion denoiser. It further introduces a prior-guided leap inference mechanism, in which a lightweight mean predictor provides a deterministic anchor and the reverse process starts from an intermediate diffusion step instead of from pure Gaussian noise. Experiments on multiple real-world datasets show that GLIDE improves both distribution fitting and next-event prediction, with the largest gains appearing on the spatial side. The results also indicate that prior-guided leap inference substantially reduces reverse-sampling cost while preserving the stochastic generation capability of diffusion models.
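The idea of starting the reverse process from an intermediate step can be sketched as below. It assumes a DDPM-style forward marginal with a linear beta schedule; the schedule, `alpha_bar`, `leap_start`, and the step counts are illustrative choices, not details taken from the paper.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def leap_start(anchor, tau, num_steps, rng):
    """Noise the deterministic anchor up to step tau instead of drawing pure noise."""
    a = alpha_bar(tau, num_steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in anchor]

rng = random.Random(0)
anchor = [0.3, -1.2]  # hypothetical mean prediction for the next event
x_tau = leap_start(anchor, tau=400, num_steps=1000, rng=rng)
```

Denoising would then run only from step `tau` down to 0 rather than over all `num_steps`, which is where the cost saving would come from. The injected noise keeps the samples stochastic around the anchor.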

2606.01271 2026-06-02 cs.CV

Exploiting In-Sensor Computing for Energy-Efficient Earth Observation

利用传感器内计算实现节能地球观测

Luigi Capogrosso, Pietro Bonazzi, Loris Hoxhaj, Michele Magno

发表机构 * Interdisciplinary Transformation University of Austria(奥地利跨学科转型大学) ETH Zurich(苏黎世联邦理工学院) University of Verona(维罗纳大学)

AI总结 针对卫星数据下行带宽瓶颈,提出基于TinyML和索尼IMX500传感器的传感器内计算框架,在8MB约束下达到96.68%精度和42.26 GMAC/J能效。

详情
Comments
Accepted at the XXIV Annual Conference on Sensors and Microsystems (AISEM) 2026
AI中文摘要

卫星产业的快速增长推动了地理空间数据获取的大幅增加,凸显了一个关键瓶颈:收集的传感器数据量与地面站有限的可用下行带宽之间的严重不匹配。虽然星载计算通过在轨预处理数据帮助解决了这一问题,但本文通过引入传感器内计算框架进一步推进了这一范式。我们通过将TinyML技术与索尼IMX500智能视觉传感器集成,提出了一种针对严格计算约束优化的端到端地球观测流水线。具体来说,我们的方法将处理直接转移到传感器级别,减轻了主嵌入式设备的计算负担,并有效减少了噪声或无关数据的下行传输。我们在EuroSAT数据集上评估了几种高效的卷积神经网络,即SqueezeNet、ShuffleNetV2和MCUNetV1。实验结果表明,尽管部署在IMX500平台上需要优化,我们的模型在其8 MB约束内保持了具有竞争力的96.68%准确率。具体来说,模型达到平均处理吞吐量17.40 FPS,延迟27.43 ms。此外,我们的系统配置文件表现出高能效,每次推理的低能耗为14.19 mJ,能效评级为42.26 GMAC/J,证明了其在传感器内部署的可行性。

英文摘要

The rapid growth of the satellite industry has driven a significant increase in geospatial data acquisition, highlighting a critical bottleneck: the severe disparity between the volume of collected sensor data and the limited downlink bandwidth available to ground stations. While On-Board Computing (OBC) has helped address this by pre-processing data in orbit, this article further advances the paradigm by introducing an in-sensor computing framework. We present an optimized end-to-end Earth Observation (EO) pipeline tailored for strict computational constraints by integrating TinyML techniques with the Sony IMX500 Intelligent Vision Sensor. Specifically, our approach shifts processing directly to the sensor level, offloading the computation from the primary embedded device, and effectively mitigating the downlink transmission of noisy or irrelevant data. We evaluated several efficient Convolutional Neural Networks (ConvNets), i.e., SqueezeNet, ShuffleNetV2, and MCUNetV1, on the EuroSAT dataset. Experimental results show that, despite the optimizations required for deployment on the IMX500 platform, our models maintain a competitive 96.68% accuracy while operating within its 8 MB constraints. Specifically, the models reach an average processing throughput of 17.40 FPS with a latency of 27.43 ms. Furthermore, our system profile exhibits high energy efficiency, with a low energy footprint of 14.19 mJ per inference and an efficiency rating of 42.26 GMAC/J, demonstrating its viability for in-sensor deployment.

2606.01265 2026-06-02 cs.LG cs.AI

PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery

PALTO:面向垂直供电的Tri-Gate FinFET设计优化的物理信息主动学习

Ayoub Sadeghi, Leonid Popryho, Inna Partin-Vaisband

发表机构 * University of Illinois Chicago(伊利诺伊大学芝加哥分校) Center for Heterogeneous Integration of Micro Electronic Systems(微电子异构集成中心) Joint University Microelectronics Program (JUMP) 2.0(联合大学微电子计划(JUMP)2.0) Semiconductor Research Corporation (SRC)(半导体研究公司(SRC)) Defense Advanced Research Project Agency (DARPA)(国防高级研究计划局(DARPA))

AI总结 提出物理信息主动学习框架,高效探索GaN tri-gate FinFET的高维设计空间,优化关键结构参数(如GaN-to-AlGaN厚度比),发现两种优化器件,其中D1在300-fin配置下驱动电流和开关效率优于D2。

详情
AI中文摘要

本文展示了机器学习驱动优化在垂直供电系统中设计特定应用的GaN三栅极FinFET的有效性。传统的基于TCAD的方法计算量大,且不足以导航先进GaN器件的高维非线性设计空间。为此,采用物理信息主动学习框架智能引导仿真,在保持精度的同时加速收敛。这种ML引导的方法通过高效探索关键结构参数——尤其是GaN-to-AlGaN厚度比(器件设计中长期争论的焦点)——来发现最优配置。通过系统探索关键结构参数,确定了两种具有激进缩放的栅漏长度的优化器件。单鳍多通道仿真表明,相对于AlGaN势垒具有更薄GaN沟道的器件D2实现了更高的驱动电流。然而,在300鳍配置中,器件D1以0.49欧姆导通电阻提供3.3A电流,性能约为D2的2倍,尽管寄生参数略高。两种器件均工作在常关模式。基于特定应用品质因数,器件D1达到5 pC·欧姆,开关效率比D2高2倍,而两种设计在不同性能指标上均优于工业基准。

英文摘要

This paper demonstrates the effectiveness of machine learning-driven optimization for designing application-specific GaN tri-gate FinFETs in vertical power delivery systems. Conventional TCAD-based approaches are computationally intensive and insufficient for navigating the high-dimensional, nonlinear design space of advanced GaN devices. To address this, a physics-informed active learning framework is used to intelligently guide simulations, accelerating convergence while preserving accuracy. This ML-guided approach enables the discovery of optimal configurations by efficiently exploring key structural parameters, most notably the GaN-to-AlGaN thickness ratio, a long-standing focus of debate in device design. By systematically exploring key structural parameters, two optimized devices with aggressively scaled gate-to-drain lengths are identified. Single-fin, multi-channel simulations show that device D2, with a thinner GaN channel relative to the AlGaN barrier, achieves higher drive current. However, in a 300-fin configuration, device D1 outperforms device D2 by delivering 3.3 A at 0.49 ohm on-resistance (approximately 2x better) despite slightly higher parasitics. Both devices operate in a normally-off mode. Based on an application-specific figure of merit, device D1 achieves 5 pC·ohm, demonstrating 2x greater switching efficiency than device D2, while both designs outperform industrial benchmarks from different performance standpoints.

2606.01260 2026-06-02 cs.CL cs.AI

IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages

IndoBias:印尼语言中大语言模型偏见评估的双轨文化基准

Ikhlasul Akmal Hanif, Muhammad Falensi Azmi, Filbert Aurelian Tjiaranata, Eryawan Presma Yulianrifat, Fajri Koto

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) Universitas Indonesia(印度尼西亚大学) Independent Researcher(独立研究员)

AI总结 提出IndoBias基准,通过深度和广度双轨评估,发现现有LLM在印尼语和三种地方语言中表现出显著偏见,且预训练数据来源和语言多样性影响偏见程度。

详情
AI中文摘要

尽管印尼拥有超过1300个民族和700种土著语言,但大语言模型中的偏见尚未得到充分研究,从而在其独特广阔、多语言和多样化的社会文化背景下,评估代表性公平性和本地化刻板印象存在关键空白。为解决此问题,我们引入IndoBias作为文化基础的偏见基准,评估LLM在印尼语和三种地方语言(爪哇语、巽他语和望加锡语)中的偏见。IndoBias具有双视角评估轨道:深度导向(使用对比对)和广度导向(基于生成),后者基于社会科学框架(SPI、O*NET和WGI)。我们的结果表明,现有LLM——尤其是解码器模型——对印尼语中的原型句子表现出强烈偏见,而地方语言在意识形态和宗教类别下遭受更高偏见。我们还发现,当使用各种地方实体提示时,LLM响应表现出非均匀的刻板印象极性。最后,我们发现,在印尼语中,Common Crawl文本在预训练期间引入的偏见比人工审核的文章文本(如维基百科、新闻)更多,而将地方语言引入预训练通常会增加偏见。这项工作强调了在特定文化背景下研究偏见的重要性。警告:本文包含可能具有冒犯性、有害性或偏见性的示例数据。

英文摘要

Although Indonesia is home to more than 1300 ethnic groups and 700 indigenous languages, bias in Large Language Models has not been fully studied there, leaving a critical gap in evaluating representational fairness and localized stereotypes within its uniquely vast, multilingual, and diverse sociocultural landscape. To address this, we introduce IndoBias, a culturally grounded bias benchmark for assessing LLM bias in Indonesian and three local languages: Javanese, Sundanese, and Makasar. IndoBias features two evaluation tracks: a depth-oriented track (with contrastive pairs) and a breadth-oriented, generation-based track, where the latter is grounded in social science frameworks (SPI, O*NET, and WGI). Our results show that existing LLMs, particularly decoder models, exhibit strong bias towards prototypical sentences in Indonesian, while local languages suffer higher bias under the Ideology and Religion categories. We also find that LLM responses exhibit a non-uniform Stereotype Polarity when prompted with various local entities. Finally, we discover that, in Indonesian, Common Crawl texts introduce more bias during pretraining than human-reviewed article texts (e.g., Wikipedia, News), whereas introducing local languages to pretraining generally increases bias. This work highlights the importance of studying bias in culture-specific contexts. Warning: This paper contains example data that may be offensive, harmful, or biased.

2606.01258 2026-06-02 cs.LG cs.CL eess.SP

Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding

超越正弦波:基于Morlet小波的Transformer位置编码框架

Athanasios Zeris

发表机构 * Independent Researcher(独立研究者) Athens, Greece(希腊雅典)

AI总结 提出Morlet位置编码(MoPE),通过可学习的频率和局部带宽统一了正弦位置编码和旋转位置编码,并在TinyShakespeare上结合能量门控注意力提升了0.119的性能。

详情
Comments
16 pages, 4 figures, 4 tables
AI中文摘要

标准Transformer位置编码——正弦编码和旋转位置编码(RoPE)——将每个位置视为同等局部:它们编码了标记的位置,但未编码其位置影响应延伸多远。我们提出Morlet小波(同时最小化位置和频率的不确定性)是位置编码的自然基础,并引入Morlet位置编码(MoPE):每个嵌入维度从数据中学习其自身的频率和局部带宽。主要理论结果是统一:当局部性关闭时(sigma_i -> 无穷大),正弦PE和RoPE相关核都作为MoPE的极限情况出现。MoPE的相位精确恢复了RoPE旋转角度;幅度增加了一个标准编码所缺乏的可学习高斯局部核。实验上,MoPE结合能量门控注意力在TinyShakespeare上比标准注意力提升了0.119,优于任一单独组件。对学习参数的分析显示,所有128个频率-带宽对收敛到小波可容许边界——这一经验观察与关于能量门控的伴随结果一致,表明字符级语言信号的一个可重现性质,值得进一步研究。

英文摘要

Standard positional encodings for transformers - sinusoidal and rotary (RoPE) - treat every position as equally local: they encode where a token is, but not how far its positional influence should extend. We propose that the Morlet wavelet, which simultaneously minimises uncertainty in position and frequency, is the natural basis for positional encoding, and introduce Morlet Positional Encoding (MoPE): each embedding dimension learns its own frequency and locality bandwidth from data. The main theoretical result is a unification: sinusoidal PE and the RoPE correlation kernel both emerge as limiting cases of MoPE when locality is switched off (sigma_i -> infinity). The phase of MoPE recovers the RoPE rotation angle exactly; the amplitude adds a learned Gaussian locality kernel that standard encodings lack. Empirically, MoPE combined with Energy-Gated Attention achieves +0.119 improvement over standard attention on TinyShakespeare, outperforming either component alone. Analysis of the learned parameters reveals that all 128 frequency-bandwidth pairs converge to the wavelet admissibility boundary - an empirical observation consistent with a companion result on energy gating, suggesting a reproducible property of character-level language signals that warrants further investigation.
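The limiting case described above can be illustrated with a minimal sketch of a Morlet-style feature: a Gaussian envelope times a complex sinusoid. Whether MoPE applies this to absolute positions or to relative offsets, and how it normalizes the wavelet, is not stated in the abstract; `morlet_feature`, `omega`, and `sigma` are assumed names.

```python
import math

def morlet_feature(pos, omega, sigma):
    """Real and imaginary parts of exp(-pos^2 / (2 sigma^2)) * exp(i * omega * pos)."""
    envelope = math.exp(-(pos ** 2) / (2.0 * sigma ** 2))
    return envelope * math.cos(omega * pos), envelope * math.sin(omega * pos)

# Finite bandwidth: the feature decays with distance from the origin.
local_re, local_im = morlet_feature(4.0, omega=0.3, sigma=2.0)

# Very large bandwidth: the envelope is effectively 1 and only the sinusoid remains.
flat_re, flat_im = morlet_feature(4.0, omega=0.3, sigma=1e9)
```

As `sigma` grows the envelope tends to 1, leaving the cosine and sine pair of a sinusoidal encoding. That is the sense in which locality can be "switched off".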

2606.01255 2026-06-02 cs.CL

Agentic Clustering: Controllable Text Taxonomies via Multi-Agent Refinement

Agentic Clustering: 通过多智能体精炼实现可控文本分类体系

Simon Löwe, Emily Silcock

发表机构 * Burning Glass Institute(Burning Glass研究院) Harvard University(哈佛大学)

AI总结 提出一种基于多智能体(提议者、合成者、审计者、调查者、批评者)的自适应文本聚类方法,通过编排器LLM动态调整聚类流程,在七个公开基准上实现最先进性能,ARI最高提升32%。

详情
AI中文摘要

最近的文本聚类方法使用大型语言模型从语料库中提出聚类分类法,然后将每个文本分配到其中。这些流程本质上是程序化的:LLM调用的顺序以及停止、合并和拆分聚类的规则是预先在代码中固定的,因此它们在不同结构的语料库上泛化能力差,并且难以整合用户提供的约束,如目标聚类数量或聚类意图。我们提出了一种基于智能体的替代方案,其中编排器LLM在每个步骤检查发现过程的状态,并调度一组专门的智能体(提议者、合成者、审计者、调查者和批评者)中的一个,使流程适应语料库,而不是执行固定的流程。在七个公开文本聚类基准上,该方法实现了最先进的性能,在ARI上比最强的先前LLM基线高出32%。

英文摘要

Recent text-clustering methods use large language models to propose a cluster taxonomy from a corpus and then assign each text to it. These pipelines are fundamentally programmatic: the sequence of LLM calls and the rules for stopping, merging, and splitting clusters are fixed in code in advance, so they generalise poorly across corpora of different structure and cannot easily incorporate user-supplied constraints such as a target cluster count or a clustering intent. We propose an agentic alternative in which an orchestrator LLM inspects the state of the discovery process at each step and dispatches one of a small set of specialised agents - proposer, synthesizer, auditor, investigator, and critic - adapting the pipeline to the corpus rather than executing a fixed one. On seven public text-clustering benchmarks the method achieves state-of-the-art performance, beating the strongest prior LLM baseline by up to 32% in ARI.

2606.01247 2026-06-02 cs.CV

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

Liyang Li, Muzhi Zhu, Zhiyue Zhao, Hengyu Zhao, Ke Liu, Linhao Zhong, Hao Chen, Chunhua Shen

Affiliations * Zhejiang University

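AI summary Introduces the Target Viewpoint Reproduction (TVR) task and the TVRBench benchmark, analyzes the bottlenecks of existing models, and builds a unified post-training framework that raises the success rate of a 9B open-source model above 50%.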

Comments
Project page: https://github.com/aim-uofa/TVRBench
Abstract

Humans can reproduce the viewpoint specified by a target image through active head and body motion, yet spatial intelligence in foundation models has largely been studied as passive understanding of pre-collected observations. We introduce Target Viewpoint Reproduction (TVR) -- an active task where an agent adjusts its viewpoint in a 3D environment until its observation matches a given target image -- and TVRBench, an indoor-simulation benchmark spanning scene scale and target-view visual richness. TVR is far from solved: on the evaluation split, the strongest open-source and closed-source models reach only 7.8% and 12.0% success, respectively. Fine-grained analysis identifies two consistent bottlenecks: off-the-shelf models struggle with multi-turn visual history, and performance drops sharply when viewpoint reproduction requires body translation rather than in-place rotation, exposing a gap in mapping spatial discrepancies to embodied movement. To study how to narrow this gap, we build a unified TVR post-training framework covering expert-trajectory SFT, rationale-supervised CoT-SFT, offline Single-turn GRPO, and on-policy Multi-turn GRPO from live simulator rollouts. Visual-action SFT supplies the main gain, raising a 9B open-source model to 50.8% success; Multi-turn GRPO provides targeted multi-room refinement and reaches 51.4% overall, while CoT supervision and Single-turn GRPO degrade closed-loop performance. These results establish TVRBench as a testbed for measuring and training foundation models that actively perceive and act in 3D environments. Our code, data, and models are available at https://github.com/aim-uofa/TVRBench.

2606.01246 2026-06-02 cs.AI

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma, Rui Ling, Hang Xu, Hefeng Jiang, Dingwei Chen, Yang Li, Peng Chen, Jie Jiang

Affiliations * TEG, Tencent Inc., China; Peking University, China

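AI summary Presents SIRIUS-SQL, which uses difficulty-smoothing reinforcement learning, execution-lifecycle classification, and a confidence-gated hybrid selector to address redundancy, repair, and selection in multi-candidate SQL generation, reaching 75.88% and 91.20% accuracy on BIRD dev and SPIDER test.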

Abstract

Text-to-SQL on complex schemas is unreliable on a single pass, so recent systems generate multiple SQL candidates and let voting filter out errors. Yet voting alone is not enough, because the multi-candidate recipe has three coupled weaknesses: 1) sampling more from a single generator produces increasingly redundant candidates, 2) existing pipelines apply one generic correction to every non-clean execution result, while runtime errors, timeouts, and empty results each indicate a different distance from correctness, and 3) existing selectors rely on a single angle such as result-majority voting or pairwise SQL comparison, missing what other angles would have caught. We present SIRIUS-SQL, which addresses all three weaknesses. A difficulty-smoothing RL recipe trains SIRIUS-32B to generate diverse executable SQL candidates, paired with a generalist LLM that fills in gaps left by the specialist. An execution-grounded lifecycle classifies each outcome and applies targeted repair before candidates re-enter the pool. A confidence-gated hybrid selector combines execution-result agreement with pairwise SQL-form judgment, escalating only near-tied cases to a deterministic structural check. SIRIUS-SQL reaches 75.88% on BIRD dev and 91.20% on SPIDER test. Two of three generalist pairings surpass Agentar-Scale-SQL, the strongest published multi-candidate system on BIRD dev.
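
The plain result-majority voting that this abstract takes as its starting point can be sketched as follows, using Python's built-in `sqlite3` module. This is only the baseline idea, not the SIRIUS-SQL selector or its repair lifecycle. The outcome labels, function names, toy schema, and candidate queries are all hypothetical, and timeouts are left out for brevity.

```python
import sqlite3
from collections import Counter


def run_candidate(conn, sql):
    """Execute one candidate and classify the outcome (labels are illustrative)."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return "error", None   # runtime failure, e.g. an unknown column
    if not rows:
        return "empty", None   # executes cleanly but returns nothing
    # Sort rows so that candidates differing only in row order still agree.
    return "ok", tuple(sorted(rows, key=repr))


def vote(conn, candidates):
    """Return the first candidate whose result set is shared by the most candidates."""
    outcomes = [(sql, *run_candidate(conn, sql)) for sql in candidates]
    tally = Counter(sig for _, status, sig in outcomes if status == "ok")
    if not tally:
        return None
    winner, _ = tally.most_common(1)[0]
    return next(sql for sql, status, sig in outcomes if status == "ok" and sig == winner)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 24), ("Cy", 45)])

candidates = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31",
    "SELECT name FROM singer WHERE age > 40",
    "SELECT nme FROM singer",                  # runtime error: no such column
    "SELECT name FROM singer WHERE age > 99",  # empty result
]
best = vote(conn, candidates)
print(best)
```

Keeping the outcome label next to each candidate is what would let a pipeline treat errors, empty results, and clean executions differently instead of applying one generic correction to all of them.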