arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.24418 2026-05-26 cs.LG

ChainLearn: A Blockchain-Based Capacity-Aware Framework for Federated Ensemble Learning

Karan Sharma, Aditya Tripathi, Rahul Mishra, Tapas Kumar Maiti

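AI summary: To address the failure of standard federated learning when hospitals have heterogeneous compute resources, the paper proposes capacity-aware coordination: a blockchain separates on-chain policy from off-chain learning, each hospital is assigned a suitable architecture, and predictions are combined by weighted ensemble, reducing communication overhead while maintaining competitive accuracy and calibration error.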

Comments 10 pages, 7 figures, 11 tables. IEEE conference format. Code: https://github.com/EdddTri/ChainLearn

Abstract

Federated learning is used in medical imaging where privacy prohibits centralizing data. Standard federated algorithms assume homogeneous hardware, identical architectures, and centralized aggregation, which fails when hospitals have unequal compute resources. We propose capacity-aware coordination: measure each hospital's throughput, assign capacity-appropriate architectures (MobileNetV3-Small, EfficientNet-B0, ResNet-50), and combine predictions via weighted ensemble. Weak and strong hospitals can participate without forcing uniform architectures. We separate on-chain policy from off-chain learning. A Solidity contract stores hospital registration, benchmark hashes, metrics, and weights. Hospitals train locally and submit only hashes and scalars (not parameters). Weighted ensemble inference is computed off-chain. Experiments on PneumoniaMNIST and DermaMNIST (5 seeds, 3 non-IID levels) show our method achieves lower or equal calibration error versus equal-weight ensemble and competitive accuracy versus FedAvg, FedProx, and FedMD. Communication overhead is 224 bytes per round, a reduction of over 912,000x compared to FedAvg.
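The weighted ensemble step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `weighted_ensemble` and the idea of normalizing raw weights to sum to one are assumptions, and the weights in a real deployment would come from the contract state rather than being passed in by hand.

```python
def weighted_ensemble(prob_lists, weights):
    """Combine per-model class probabilities using normalized weights.

    prob_lists: one probability vector per hospital model (hypothetical input).
    weights:    one non-negative weight per model (assumed to be read off-chain).
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_lists[0])
    combined = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for k, p in enumerate(probs):
            # Each model contributes in proportion to its normalized weight.
            combined[k] += (w / total) * p
    return combined
```

With equal weights this reduces to the equal-weight ensemble that the abstract uses as a comparison point.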

2605.24417 2026-05-26 cs.LG

LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots

Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov

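AI summary: Proposes the LLMTabBench benchmark to systematically evaluate, for tabular classification under data-scarce conditions, how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples) and how performance scales with data complexity.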

Abstract

Supervised classification for tabular data remains a core machine learning task, yet its reliance on large labeled datasets limits applicability in data-scarce domains. For such few-shot scenarios, specialized methods like TabPFN - a state-of-the-art Prior-Data Fitted Network - have set a high standard by leveraging large-scale synthetic pretraining, though they still require a context of labeled examples to function. In contrast, Large Language Models (LLMs) could offer a more flexible alternative via zero- and few-shot in-context learning directly from task descriptions, but their performance on tabular data remains inconsistent and poorly understood. We introduce LLMTabBench, a benchmark designed to systematically evaluate LLMs for tabular classification under data-scarce conditions. LLMTabBench explicitly probes (i) how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples), and (ii) how model performance scales with increasing data complexity, using both real-world and controlled synthetic datasets. Our findings include: (1) LLMs are highly competitive in zero-shot settings and can outperform alternative models, even when those models have access to few-shot examples; (2) incorporating additional few-shot examples can conflict with LLM prior knowledge, limiting or even degrading performance; and (3) there is a data complexity threshold beyond which LLMs' performance declines and few-shot examples become less effective. Together, these findings reveal fundamental constraints of in-context learning for tabular data and provide practical guidance for deploying LLMs in low-data regimes.

2605.24416 2026-05-26 cs.LG

Synheart Capacity: A Theory-Driven Physiological Representation of Cognitive Capacity Dynamics from Wearable Signals

Yisak Debele, Henok Ademtew, Israel Goytom

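AI summary: Proposes a theory-driven multimodal learning framework that uses dual-stream encoding of cardiac and electrodermal signals to model capacity-related cognitive state as a two-dimensional physiological representation of resource allocation (effort) and overload (stress), achieving cross-individual generalization on the SWELL-KW dataset and distinguishing between cognitive regimes.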

Abstract

Human cognitive performance is constrained by limited mental resources, yet continuous computational estimation of cognitive capacity dynamics remains an open challenge. We propose a theory-driven multimodal learning framework that models capacity-related cognitive state as a two-dimensional physiological representation defined by voluntary resource allocation (mental effort) and overload-related strain (stress). The proposed architecture combines dual-stream encoding of cardiac (IBI/HRV) and electrodermal (EDA) signals with late fusion and task-specific output heads that independently estimate probabilistic effort and stress states. Evaluation on the SWELL-KW dataset using strict leave-one-subject-out cross-validation demonstrates cross-individual generalization (stress: 70.0% balanced accuracy; effort: 72.2%), with significant gains from multimodal integration and theory-guided supervision. Rather than collapsing physiological dynamics into a single workload label, the proposed effort-stress state space enables structured differentiation between distinct cognitive regimes, including productive engagement and overload-related strain. Predicted state trajectories exhibit significant demand-sensitive shifts under controlled workload manipulations, with effort and stress responding differentially across interruption and time-pressure conditions. These results suggest that physiologically grounded multidimensional state representations may provide a foundation for adaptive systems capable of continuous capacity-aware monitoring and human-centered interaction.

2605.24414 2026-05-26 cs.AI

JT-SAFE-V2: Safety-by-Design Foundation Model with World-Context Data

Junlan Feng, Fanyu Meng, Chong Long, Pengyu Cong, Duqing Wang, Yan Zheng, Yuyao Zhang, Xuanchang Gao, Ye Yuan, Yunfei Ma, Zhijie Ren, Fan Yang, Na Wu, Di Jin, Chao Deng

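AI summary: Proposes the JT-Safe-V2 large language model, which jointly optimizes general intelligence and safety-by-design through world-knowledge pre-training, high-certainty training, and safety-strengthening post-training, and introduces the Safe-MoMA framework to reduce inference cost; it reaches state-of-the-art performance on general intelligence and safety benchmarks.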

Abstract

We introduce JT-Safe-V2, a large language model designed to advance the safety and trustworthiness of foundation models, extending our previous JT-Safe model toward a more comprehensive safety-by-design paradigm. JT-Safe-V2 emphasizes the joint optimization of general intelligence and safety-by-design through several key innovations: enriching pre-training data with contextual world knowledge, high-certainty pre-training procedures, and safety-strengthening post-training mechanisms for enterprise-oriented agentic capabilities. Building on these safety-enhanced foundation models, we propose Safe-MoMA (Safe Mixture of Models and Agents), a framework that enables traceable and efficient inference through the orchestrated deployment of multiple models and agents. Extensive evaluations demonstrate that JT-Safe-V2 achieves state-of-the-art performance across both general intelligence and safety benchmarks. Moreover, Safe-MoMA reduces inference costs by more than 30% compared to using the largest standalone model baseline while maintaining comparable performance. To facilitate future research on safety-by-design foundation models, we publicly release the post-trained JT-Safe-V2-35B model checkpoint.

2605.24411 2026-05-26 cs.AI cs.LG

The Model Is Not the Product: A Dual-Pillar Architecture for Local-First Psychological Coaching

Alexander Mihalcea

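AI summary: Presents Psych LM, an iOS application built on a local-first architecture that achieves the practical effect of a near-infinite context window through an automated memory corpus and retrieval-augmented generation, providing reliable context-aware psychological coaching on mobile devices.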

Comments 10 pages, 3 figures

Abstract

Existing language model applications struggle to meet the demand for emotionally oriented support, primarily due to their inability to maintain deep, persistent context across sessions. This report introduces Psych LM, an iOS application that validates the thesis that, for such applications, the surrounding architecture is paramount. Psych LM runs a local, on-device language model within a purpose-built, local-first runtime designed for behavioral and life-coaching applications. The system achieves the practical effect of a near-infinite context window through an automated, user-inspectable memory corpus that converts conversations into structured memory cards, including facts, goals, and events, and dynamically injects them into the prompt via semantic and vector search. As such, the system can be defined as an active-learning, retrieval-augmented generative, on-device architecture. This architecture delivers four primary contributions: a local-first design where privacy is a core property; a detailed description of the memory corpus for persistent context of key user information; a deterministic orchestration layer that provides a stable behavioral spine independent of the model's internal state; and a benchmark framework focused on evaluating the integrated system's reliability under realistic operating conditions. The R&D process confirms that complex, context-aware interaction can be reliably achieved under the strict constraints of a mobile environment by prioritizing architectural control and resource management over simple model size.

2605.24410 2026-05-26 cs.AI

Advancing Graph Few-Shot Learning via In-Context Learning

Renchu Guan, Yajun Wang, Chunli Guo, Bowen Cao, Fausto Giunchiglia, Wei Pang, Yonghao Liu, Xiaoyue Feng

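AI summary: Proposes the VISION model, which reframes graph few-shot learning as a fine-tuning-free sequence reasoning problem, uses an unsupervised task generator to build pseudo-tasks from unlabeled data, and fuses local topology with global task dependencies through a context-aware network for efficient inference.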

Comments KDD26

Abstract

Graph few-shot learning, which aims to classify nodes from novel classes with only a few labeled examples, is a widely studied problem in graph learning. However, existing methods often face two key limitations. First, the predominant graph few-shot learning paradigm relies on supervised tasks, failing to leverage the vast number of unlabeled nodes in the graph. Second, many approaches require complex task adaptation or fine-tuning during inference, limiting their efficiency and applicability. Inspired by the powerful in-context learning capabilities of large language models, we propose a novel model named VISION for adVancIng graph few-Shot learning via In-cOntext LearNing to address these challenges. Our model reframes graph few-shot learning as a fine-tuning-free sequence reasoning problem. At its core is a context-aware network that initializes nodes with role embeddings and employs a dual-context fusion module to synergistically integrate local topological structures and global task-level dependencies. This allows our model to dynamically generate class-aware representations for the query set conditioned on the support set context in a single forward pass. To effectively train our model, we introduce an unsupervised task generator that creates structure-adaptive features and constructs diverse pseudo-tasks from abundant unlabeled data. Our method unifies unsupervised meta-learning with graph in-context learning, achieving efficient inference. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our model. Our public code can be found

2605.24406 2026-05-26 cs.LG

A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation

Erfan Haghighat Damavandi, Davide Papurello, Mahdi Alibeigi, Armin Keshavarz, Simone Canevarolo, Marco Condo

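AI summary: Proposes a unified Python framework based on deep reinforcement learning with the PPO algorithm that uses Hierarchical Flow Logic and an enthalpy-based economizer for energy-saving AHU control, improving temperature stability and energy efficiency while keeping CO2 concentration within limits.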

Comments 10 pages, 7 figures

Abstract

Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.
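The idea of overriding agent actions that would push CO2 above 1000 ppm can be illustrated with a small sketch. It assumes a well-mixed single-zone mass balance advanced by one explicit Euler step; the function names, the 420 ppm outdoor concentration, and all parameters are hypothetical and are not taken from the paper's environment.

```python
def next_co2(c_ppm, flow, gen, volume, dt, c_out=420.0):
    """One explicit Euler step of a well-mixed single-zone CO2 balance.

    c_ppm: indoor CO2 (ppm); flow: outdoor air flow (m^3/s);
    gen: occupant CO2 generation (m^3/s); volume: zone volume (m^3);
    dt: step length (s); c_out: assumed outdoor CO2 (ppm).
    """
    dc_dt = (gen * 1e6 + flow * (c_out - c_ppm)) / volume
    return c_ppm + dt * dc_dt


def override_flow(agent_flow, c_ppm, gen, volume, dt, flow_max,
                  limit=1000.0, c_out=420.0):
    """Keep the agent's flow if it respects the limit, otherwise raise it."""
    if next_co2(c_ppm, agent_flow, gen, volume, dt, c_out) <= limit:
        return agent_flow  # agent action accepted unchanged
    if c_ppm <= c_out:
        return flow_max  # dilution cannot help; fall back to maximum flow
    # Smallest flow that lands exactly on the limit after one step.
    required = (gen * 1e6 - (limit - c_ppm) * volume / dt) / (c_ppm - c_out)
    return min(max(required, agent_flow), flow_max)
```

The override only raises ventilation, so the learned policy keeps full control whenever its own action already satisfies the air-quality limit.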

2605.24405 2026-05-26 cs.LG cs.AI

Generative OOD-regularized Model-based Policy Optimization

Aysin Tumay, Jiahe Huang, Elise Jortberg, Rose Yu

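AI summary: Proposes the GORMPO algorithm, which uses generative density estimation to restrict policy updates to high-density regions of a sparse state-action space, addressing out-of-distribution actions in offline reinforcement learning, and outperforms baselines on a real-world medical dataset and offline RL datasets.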

Abstract

We study sequential decision-making with offline reinforcement learning (RL). Traditional offline RL policies may result in out-of-distribution (OOD) actions when training relies only on sparse offline representations. To ensure safe offline policies in a sparse state-action space, we explore how density estimation models can be integrated into model-based RL methods to avoid the OOD regions. Generative models are capable of explicitly modeling the density in sparse state-action spaces. Building on this, we introduce Generative OOD-regularized Model-based Policy Optimization (GORMPO), a density-regularized offline RL algorithm that uses generative density modeling to restrict policy updates to high-density areas of the dataset. Furthermore, we examine whether better OOD detection corresponds to better model-based offline policies. We compare (1) the OOD detection capabilities of various density estimators and (2) their performance within the GORMPO framework on a real-world medical dataset and sparse offline RL datasets. We theoretically guarantee GORMPO's performance under mild assumptions. Empirically, GORMPO outperforms state-of-the-art baselines by 17% on a real-world medical dataset and enhances the base model on the offline RL datasets. Our empirical findings show that better OOD detection generally results in improved policies in environments with stable dynamics, while conservative penalties with poor density estimation are favored when dynamics are uncertain.
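One generic way to restrict a policy to high-density regions is to subtract a penalty from the reward whenever the estimated log-density of a state-action pair falls below a threshold. The sketch below shows that idea only; it is not GORMPO's actual objective, and the 1-D Gaussian stands in for the generative density model. All names, the threshold, and the penalty form are assumptions.

```python
import math


def fit_gaussian(samples):
    """Fit a 1-D Gaussian (stand-in for a generative density model)."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, var


def log_density(x, mean, var):
    """Log-density of x under the fitted Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def penalized_reward(reward, logp, threshold, coef=1.0):
    """Subtract a penalty that grows as logp drops below the threshold."""
    return reward - coef * max(0.0, threshold - logp)


# Hypothetical offline data and two query points.
mean, var = fit_gaussian([0.0, 1.0, 2.0, 3.0, 4.0])
in_dist = penalized_reward(1.0, log_density(2.0, mean, var), threshold=-3.0)
out_dist = penalized_reward(1.0, log_density(10.0, mean, var), threshold=-3.0)
```

Here the in-distribution point keeps its reward while the far-away point is penalized, which is the qualitative behavior a density regularizer is meant to produce.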

2605.24403 2026-05-26 cs.CV

Artiverse: A Diverse and Physically Grounded Dataset for Articulated Objects

Denys Iliash, Jiayi Liu, Egor Fokin, Qirui Wu, Ali Mahdavi-Amiri, Manolis Savva, Angel X. Chang

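AI summary: Presents the Artiverse dataset of 5.4K high-quality articulated 3D objects, annotated efficiently through a semi-automated pipeline combining few-shot segmentation, geometric reasoning, and multi-stage human verification, and demonstrates its value for part mobility analysis, articulated object generation, and physics-based interaction.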

Comments CVPR camera-ready version

Abstract

We present Artiverse, a diverse and physically grounded dataset of high-quality articulated 3D objects designed for realistic functional modeling and simulation. Artiverse contains 5.4K human-authored objects across a broad range of 88 categories, aggregated from multiple 3D static repositories. Objects are annotated with functional parts, interior structures, realistic kinematic relationships and articulated joints including multi-DoF joints, and physical attributes such as metric scale, material, and mass. We develop a semi-automated annotation pipeline that combines few-shot segmentation, geometric reasoning, and multi-stage human verification to achieve high-quality and efficient annotation, reducing manual annotation time by over 30%. We demonstrate the value of Artiverse on tasks of part mobility analysis, articulated object generation, and physics-based interaction. Artiverse provides a data resource to advance functional understanding for articulated objects.

2605.24402 2026-05-26 cs.CV

Dual Prototype-Conditioned Diffusion Model for Scalable Multi-Class Unsupervised Anomaly Detection in Large Category Spaces

Yaoxuan Feng, Yuxin Li, Weijiang Lv, Zixuan Zhao, Yubiao Wang, Wenchao Chen, Bo Chen, Hongwei Liu

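AI summary: Proposes DPDiff-AD, a method that models heterogeneous normal distributions with local and global prototypes and uses diffusion-based reconstruction for scalable multi-class anomaly detection.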

Abstract

Multi-class anomaly detection aims to build unified models across diverse product categories. However, as the number of categories grows, its performance often degrades due to increasingly complex and heterogeneous normal distributions. To address this challenge, we propose DPDiff-AD, a Dual Prototype-conditioned Diffusion model for large-scale multi-class Anomaly Detection. DPDiff-AD models heterogeneous normal distributions through complementary local and global prototypes. Local prototypes capture representative fine-grained structural patterns via nearest-prototype aggregation, while global prototypes regulate holistic feature geometry through optimal transport regularization. Together, these dual-scale representations define a structured normality space. This space is refined through diffusion-based reconstruction conditioned on both local and global prototypes via prototype-aware attention. By jointly leveraging dual prototypes during generation, DPDiff-AD achieves precise normality modeling, preserves structured separability as category cardinality grows, and enables scalable anomaly discrimination. Extensive experiments across five benchmarks demonstrate the effectiveness and scalability of DPDiff-AD. On the 160-category large-scale dataset, it improves image- and pixel-level AUROC by 5.3 and 2.9 points over the previous state-of-the-art method Dinomaly+, while maintaining stable performance as category cardinality increases.

2605.24398 2026-05-26 cs.CV cs.AI cs.GR

VectorArk: Learning Practical Image Vectorization with Rounded Polygon Representation

Tarun Gehlaut, Difan Liu, Charu Bansal, Krutik Malani, Souymodip Chakraborty, Ankit Phogat, Matthew Fisher, Vineet Batra

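AI summary: Proposes the VectorArk model, which uses a rounded polygon representation and a degradation model for robust, practical image vectorization, achieving superior geometric completeness and artifact suppression across multiple datasets.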

Comments CVPR 2026. Project page: https://vectorark.github.io/

Abstract

Recent vision-language model (VLM)-based approaches have achieved impressive results on image vectorization tasks. However, they are typically evaluated on synthetic benchmarks, where clean SVGs are rasterized at high resolution and then re-vectorized. As a result, these methods generalize poorly to real-world scenarios, such as images with unknown rasterization methods or those generated by text-to-image models. We introduce VectorArk, a new VLM-based model designed for robust and practical image vectorization. VectorArk employs a novel rounded polygon representation that simplifies the learning process while naturally producing smooth, visually appealing primitives. We also propose a degradation model that enhances robustness across diverse and imperfect inputs. Our experiments show that, in contrast to previous methods, VectorArk achieves superior geometric completeness and artifact suppression across multiple datasets, with comprehensive ablations validating the contribution of each component.

2605.24396 2026-05-26 cs.AI

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

Jingchu Gai, Guanning Zeng, Christina Baek, Chen Wu, J. Zico Kolter, Andrej Risteski, Aditi Raghunathan

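AI summary: To address logical leaps and premature confidence in the long chains of thought of large language models, proposes progressive confidence shaping, a reinforcement learning objective that needs no external labels or reward models; it rewards gradual confidence growth and penalizes early commitment, significantly improving reasoning accuracy and quality.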

Abstract

Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving reasoning quality directly would require process reward models, but the step-level annotations needed to train them are expensive and scarce. We find such a signal in how the model's confidence evolves during reasoning: premature confidence, the tendency to commit to an answer early and use the remaining tokens to rationalize it, strongly predicts flawed reasoning across tasks and model scales. We exploit this in progressive confidence shaping, a reinforcement learning objective that trains models to update their confidence as they reason rather than commit early, rewarding gradual confidence growth and penalizing early commitment, with no external labels or reward models. The method improves accuracy and reasoning quality from 1.5B to 8B parameters across arithmetic (Countdown), math (DAPO, AIME), and science (ScienceQA): on Countdown, accuracy improves 3.2x (+42.0pp) and flawed reasoning drops 48pp; on AIME, Pass@64 improves 6.6pp. Consistent with this mechanism, the method also improves faithfulness: on a safety benchmark, our models more transparently surface misleading content in their reasoning traces rather than concealing it. Controlled experiments reveal that the problem and its remedy scale together: premature confidence grows with model size and task difficulty, and so do the gains from addressing it.

2605.24395 2026-05-26 cs.LG

AvAtar: Learning to Align via Active Optimal Transport

Qi Yu, Ruizhong Qiu, Zhichen Zeng, My T. Thai, Huan Liu, Hanghang Tong

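AI summary: Proposes the AvAtar framework, an active learning strategy that quantifies the informativeness of candidates through their gradient-based impact under entropy-regularized optimal transport, solved efficiently with the adjoint-state method, to improve alignment performance.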

Comments Published as a conference paper at ICML 2026

Abstract

Alignment plays a fundamental role in many machine learning problems, such as multi-network analysis, multimodal learning, and point cloud registration. Recent works increasingly leverage optimal transport (OT) for distributional alignment, whose effectiveness largely depends on sparse supervision that is hard or costly to obtain in practice. Existing works, however, largely overlook how to actively acquire high-quality supervision to improve their alignment performance under OT frameworks. In this paper, we propose a principled active alignment framework for optimal transport alignment called AvAtar. We quantify the informativeness of a candidate by measuring its gradient-based impact on the global alignment result, computed as the gradient propagation from the global alignment result to all possible supervisions of the candidate through the entropy-regularized OT formulation. While differentiating through OT is challenging given its constrained nature, we leverage the adjoint-state method to reformulate the computation to a linear system solvable by the conjugate gradient method with linear complexity and guaranteed convergence. By encoding the global alignment result via effective utility functions, AvAtar is applicable to general alignment problems under the OT framework. Extensive experiments on three representative alignment tasks demonstrate the effectiveness, scalability, and generalizability of the proposed AvAtar.
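For readers unfamiliar with the entropy-regularized OT formulation that the abstract builds on, the standard Sinkhorn iteration is sketched below in plain Python. This covers only the forward transport plan; it does not implement AvAtar's adjoint-state gradient computation or its active selection rule. The function name and default parameters are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan via Sinkhorn scaling.

    cost: n x m cost matrix (list of lists); a, b: source/target marginals.
    eps:  regularization strength (smaller means closer to unregularized OT).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
```

In this toy problem most of the mass stays on the zero-cost diagonal, and the rows and columns of `plan` sum to the prescribed marginals.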

2605.24394 2026-05-26 cs.RO

RoboHitch: Learning Visual Affordance from Disordered Keypoints for Hitch Knots Tying

Jiahui Zuo, Boyang Zhang, Fumin Zhang

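AI summary: Proposes the RoboHitch framework, which learns hitch knot tying from human demonstrations using disordered 3D keypoints and RGB images, fusing features from a dynamic graph autoencoder and a convolutional autoencoder to predict pick and place affordances and tie knots under occlusion.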

Abstract

Robotic manipulation of deformable linear objects (DLOs) presents significant challenges due to complex dynamics and frequent self-occlusions. Existing robotic knot tying methods typically rely on precise topological state tracking with ordered keypoints and explicit edge connectivity. This reliance makes them prone to failures due to tracking drift and topology mismatch caused by repeated bending and crossings during knot formation. To address these limitations, we introduce RoboHitch, a novel framework that learns to perform hitch knot tying from human demonstrations using only disordered 3D keypoints and RGB images. This eliminates the need for explicit topological order, allowing for more flexible manipulation. Our method employs a dynamic Graph Autoencoder to extract geometric features from untracked keypoints, complemented by a Convolutional Autoencoder that captures essential visual context. A bidirectional cross-attention mechanism then fuses these modalities to jointly predict pick and place affordances, facilitating implicit reasoning about the rope's state and enabling knot tying under occlusion. Real-world experiments demonstrate the effectiveness and generalizability of our approach, successfully completing hitch knots in scenarios with self-occlusions.

2605.24390 2026-05-26 cs.LG

Learning Laplacian Eigenspace with Mass-Aware Neural Operators on Point Clouds

Zherui Yang, Tao Du, Ligang Liu

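AI summary: Proposes the Neural Eigenspace Operator (NEO), which predicts a stable, invariant low-frequency subspace rather than eigenvectors and combines a mass-aware neural operator with Rayleigh-Ritz refinement for fast spectral decomposition of the Laplace-Beltrami operator on point clouds.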

Abstract

The eigendecomposition of the Laplace-Beltrami Operator (LBO) is fundamental to geometric analysis, yet computing its low-frequency eigenmodes remains a significant bottleneck due to the high cost of iterative solvers on large-scale data. To amortize this cost, we introduce the Neural Eigenspace Operator (NEO), a feed-forward framework designed to predict the spectrum directly from point clouds. Crucially, NEO circumvents the ill-posed nature of standard eigenvector regression, which suffers from intrinsic sign flips and rotation ambiguities, by learning the stable, invariant low-frequency subspace instead. Specifically, the network predicts a redundant set of basis functions whose span robustly covers the target eigenspace, allowing for the recovery of accurate eigenpairs via a lightweight Rayleigh-Ritz refinement. To handle irregular sampling, we propose a mass-aware neural operator that incorporates per-point area weights into attention-based aggregation, improving robustness to non-uniform densities and enabling zero-shot generalization across resolutions. Our approach achieves near-linear runtime scaling and substantial wall-clock speedups over iterative solvers at comparable accuracy, and exhibits strong zero-shot transfer to high-resolution point clouds. The resulting eigenpairs support standard spectral geometry tasks, while the raw basis functions provide effective point-wise features for downstream learning. Code: https://github.com/Adversarr/NEO.
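The Rayleigh-Ritz refinement mentioned above is a standard projection step and can be written compactly. The notation here is an assumption for illustration and is not taken from the paper: $B \in \mathbb{R}^{n \times k}$ collects the $k$ predicted basis functions as columns, $L$ is a discrete Laplacian (stiffness) matrix on the $n$ points, and $M$ is the mass matrix of per-point area weights.

```latex
\begin{aligned}
\tilde{L} &= B^{\top} L B, \qquad \tilde{M} = B^{\top} M B, \\
\tilde{L}\, y_i &= \lambda_i\, \tilde{M}\, y_i, \qquad i = 1, \dots, k, \\
\phi_i &= B\, y_i .
\end{aligned}
```

Solving this small $k \times k$ generalized eigenproblem gives Ritz values $\lambda_i$ and Ritz vectors $\phi_i$. When the span of $B$ contains the target eigenspace, these coincide with the true eigenpairs regardless of how the basis is signed or rotated, which is why predicting a subspace sidesteps the ambiguities of direct eigenvector regression.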

2605.24381 2026-05-26 cs.LG cs.AI stat.AP stat.ML

Assessing the Operational Viability of Foundation Models for Time Series Forecasting

Kavin Soni, Debanshu Das, Vamshi Guduguntla

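AI summary Compares foundation models with supervised methods across four operational regimes and proposes a Complexity Router that uses empirical features to balance accuracy and efficiency.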

Comments 21 pages, 8 figures. Code available at https://github.com/kavin-soni/timeseries-zeroshot-eval

Abstract

Time series forecasting drives operational decisions in areas like finance, transportation, and energy. While supervised learning approaches achieve strong performance, they require domain-specific training, feature engineering, and ongoing maintenance. Large-scale foundation models have recently emerged as a zero-shot alternative, avoiding task-specific training much like LLMs. In this work, we evaluate foundation models against standard supervised approaches. Rather than focusing solely on aggregate accuracy, we analyze performance across four operational regimes: periodic human-centric systems, physically constrained processes, stochastic financial markets, and heterogeneous demand forecasting. Our results characterize optimal deployment areas. Foundation models perform well in domains with transferable periodic structures and are efficient for cold-start or long-tail scenarios. Conversely, supervised specialists maintain higher precision in systems governed by strict physical constraints. In financial domains, newer foundation models are rapidly closing the performance gap with supervised specialists. We further quantify trade-offs in inference latency, data drift adaptability, and deployment constraints. Finally, we propose a Complexity Router that assigns each series to the optimal model class using empirical features. We demonstrate that this selective routing achieves higher accuracy and significantly lower inference costs compared to deploying a universal foundation model, providing a practical framework for balancing generalization and efficiency.

2605.24375 2026-05-26 cs.AI

Distilling Game Code World Model Generation into Lightweight Large Language Models

Tyrone Serapio, Arjun Prakash, Haoyang Xu, Kevin Wang, Amy Greenwald

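AI summary Studies distilling game code world model generation into small models through post-training, using supervised fine-tuning and reinforcement learning with verifiable rewards to improve the syntactic correctness and rule adherence of generated code.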

Abstract

Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of automatically constructing environments for AI agents. Recent work on Code World Models (CWMs) demonstrates that LLMs can translate game rules into Python implementations compatible with solvers like Monte Carlo Tree Search. We study this problem in game settings, where generated environments must implement rules, legal actions, state transitions, observations, and rewards. We refer to these game-specific executable models as Game Code World Models (GameCWMs). However, current approaches to generating code world models rely on frontier models and inference-time refinement loops, limiting accessibility and scalability. This work investigates whether GameCWM generation capabilities can be distilled into smaller models through post-training. We introduce: (1) a curated dataset of 30 games spanning perfect and imperfect information games, (2) a verification framework that evaluates generated code against structural and semantic game properties, and (3) a post-training pipeline combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR). We experiment with Qwen2.5-3B-Instruct and find that SFT can increase syntactic correctness, while RLVR can improve execution-level adherence to game rules, thereby improving Qwen's ability to generate valid GameCWMs in both perfect and imperfect information games. Overall, our pipeline makes Qwen2.5-3B-Instruct more capable of generating valid GameCWMs, thereby offering a scalable path toward automatic environment generation from natural language.

2605.24371 2026-05-26 cs.CV cs.CL

SliceWorld: A Predictive and Controllable World-State Model for CT Report Generation

Yuanhe Tian, Yan Song

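AI summary Proposes SliceWorld, a world-state framework that encodes CT slice sequences into factor-aware latent states for future-slice prediction, lesion-factor intervention, and LLM-based report generation, improving NLG metrics and clinical evaluation on M3D-Cap and CT-RATE.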

Comments 18 pages, 5 figures

Abstract

CT report generation (CTRG) requires models to summarize three-dimensional anatomical context and pathological findings from hundreds of axial slices. Existing methods typically learn a direct image-to-text mapping, providing limited mechanisms for modeling how CT evidence evolves across slices or how reports respond to controlled changes in latent lesion-related factors. We propose SliceWorld, a CT-specific world-state framework that treats an axial CT scan as an ordered sequence along the z-axis. SliceWorld encodes prefix CT evidence into factor-aware latent states containing anatomy, lesion, and uncertainty components, and projects these states into world tokens used for multi-step future-slice feature prediction, lesion-factor intervention, and LLM-based report generation. The model is first pretrained on CT slice sequences with predictive, factor-aware, and counterfactual objectives, and is then fine-tuned on paired CT-report data. Experiments on M3D-Cap and CT-RATE show that SliceWorld improves natural language generation metrics and clinically oriented automatic evaluation. Further analyses demonstrate multi-horizon future-slice prediction, measurable factor alignment, reduced-slice robustness, and selective lesion-sensitive report modulation.

2605.24370 2026-05-26 cs.LG q-bio.QM

GEESE: Genotype-aware End-to-End Spatio-temporal Embedding for Behavioral Phenotyping

Yiran Ding, Yuen Gao, Chunqi Qian, Zijun Cui

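AI summary Proposes GEESE, a framework that uses a pretrained time series foundation model to learn behavioral representations directly from 3D pose dynamics without hand-crafted features; it surpasses traditional methods on three autism-associated genetic models and is accompanied by the interactive tool HONK.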

Abstract

Behavioral phenotyping of genetic animal models currently requires labor-intensive manual feature engineering that limits reproducibility and scalability. We present GEESE, an end-to-end deep learning framework that learns behavioral representations directly from 3D pose dynamics without hand-crafted features. Using a pretrained time series foundation model, we encode movement sequences into a behavioral manifold that supports both behavior classification and genotype prediction. Evaluated across three autism-associated genetic models (CNTNAP2, CHD8, FMR1), our deep learning approach surpasses hand-crafted feature baselines in both tasks, revealing that learned representations capture genotype-specific behavioral signatures. The framework generalizes across genetic backgrounds, and an all-cohort model identifies both genetic background and genotype from movement patterns alone. We further provide HONK, an interactive intelligent tool enabling researchers without programming expertise to perform behavioral phenotyping from pose data through natural language interaction.

2605.24367 2026-05-26 cs.CV cs.LG

Gaussian Rank-Based Neighborhood Degree for Graph Neural Networks in Image Classification

Rafael Mendonça Duarte, Jean Roberto Ponciano, Lucas Pascotti Valem

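AI summary Proposes GRaNDe (Gaussian Rank-based Neighborhood Degree), which improves degree normalization in graph neural networks by combining neighborhood ranking with Gaussian distance weighting, yielding consistent accuracy gains on five public image classification datasets.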

Abstract

The exponential growth of data has intensified the gap between the availability of unlabeled data and the high cost of manual annotation. Graph Neural Networks (GNNs) have emerged as a promising solution, as they exploit relational structures and learn from both labeled and unlabeled data, performing semi-supervised learning. A crucial component of many of these models is degree-based normalization, which influences message propagation but typically assumes uniform importance among neighboring nodes. In image classification, graphs are usually constructed from feature similarity, where treating all neighbors equally may overlook important variations in relevance. Motivated by this gap, we propose GRaNDe (Gaussian Rank-based Neighborhood Degree). This novel degree measure integrates neighborhood ranking with Gaussian distance weighting to better capture node importance. Experiments on five public image classification datasets show consistent accuracy improvements and competitive or superior results compared to state-of-the-art methods.

2605.24366 2026-05-26 cs.CL cs.LG

Structure-Aware RAG: Structured Retrieval Augmented Generation from Noisy Data for Conversational Agents

Kaiqiao Han, LuAn Tang, Renliang Sun, Peng Yuan, Wei Cheng, Haoyu Wang, Wei Wang, Yizhou Sun, Haifeng Chen

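AI summary Proposes Structure-aware Retrieval Augmented Generation (SA-RAG), which uses tables as an intermediate structured representation to reduce noise while preserving essential information; combined with a quality-aware table metadata generation framework and optimization methods, it significantly outperforms existing RAG baselines on noisy real-world datasets.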

Abstract

Large Language Models (LLMs) have been widely adopted in conversational applications. However, their reliance on parametric knowledge limits reliability in real-world scenarios that require dynamic or domain-specific information. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge during generation, but existing text-based and graph-based RAG methods often struggle with noisy or irrelevant contexts. In this work, we propose Structure-aware Retrieval Augmented Generation (SA-RAG), which uses tables as an intermediate structured representation to provide a compact and controllable interface that reduces noise while preserving essential information. We introduce a quality-aware table metadata generation framework that models metadata normalization and effectiveness, improving metadata quality and downstream performance. Furthermore, we explore both training-free and training-based table generation methods. Generation validation and direct preference optimization further improve table quality while maintaining semantic and structural consistency. Experiments on two noisy real-world datasets show that SA-RAG significantly outperforms existing RAG baselines. Our code is publicly available.

2605.24357 2026-05-26 cs.LG

Refined Analysis of Entropy-Regularized Actor-Critic

Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

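AI summary Gives a refined analysis of the critic's role in actor-critic for entropy-regularized, finite, discounted environments: an exact critic yields strong variance reduction, so actor-critic with stochastic gradients matches the sample complexity of deterministic policy gradient; variance reduction and fast convergence are preserved when the critic error is sufficiently small, underscoring the importance of accurate critic estimation.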

Abstract

In this paper, we study the role of the critic in actor-critic for entropy-regularized, finite, discounted environments. We establish that, when the critic is exact, using it as a baseline is a variance-reduction method in a strong sense. In this case, actor-critic with stochastic gradients matches the sample complexity of deterministic policy gradient, reaching an $\epsilon$-optimal regularized value with $\tilde{O}(\log(1/\epsilon))$ samples. In practice, the critic is learned alongside the actor: the variance of the actor update is then influenced by the critic's variance and bias. Specifically, when the critic has a sufficiently small error, the variance reduction and rapid convergence are preserved. This suggests learning the critic first and keeping it up to date after each actor update, underscoring the crucial role of accurate critic estimation in actor-critic methods.
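
The baseline idea in this abstract can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not the paper's setting or proof: it uses a one-state problem with a softmax policy, omits entropy regularization, and takes the exact policy value as the "critic". The names `score_estimator_stats` and `compare_baselines` are hypothetical. It computes, by exact enumeration, the mean and total variance of the score-function gradient estimator with and without the baseline.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def score_estimator_stats(logits, rewards, baseline):
    """Exact mean and total variance of g(a) = grad log pi(a) * (r(a) - b)
    for a ~ pi, where the gradient is taken with respect to the logits."""
    pi = softmax(logits)
    n = len(pi)
    samples = []
    for a in range(n):
        # Gradient of log softmax at action a: one-hot(a) - pi.
        grad_log = [(1.0 if i == a else 0.0) - pi[i] for i in range(n)]
        samples.append([g * (rewards[a] - baseline) for g in grad_log])
    mean = [sum(pi[a] * samples[a][i] for a in range(n)) for i in range(n)]
    var = sum(
        pi[a] * sum((samples[a][i] - mean[i]) ** 2 for i in range(n))
        for a in range(n)
    )
    return mean, var


def compare_baselines(logits, rewards):
    """Return (mean, variance) without a baseline and with the exact value."""
    pi = softmax(logits)
    value = sum(p * r for p, r in zip(pi, rewards))  # exact critic, one state
    no_baseline = score_estimator_stats(logits, rewards, 0.0)
    with_critic = score_estimator_stats(logits, rewards, value)
    return no_baseline, with_critic
```

For example, `compare_baselines([0.0, 0.5, -0.5], [10.0, 11.0, 12.0])` gives the same mean gradient in both cases, while the variance with the value baseline is far smaller, because the baseline removes the large common offset in the rewards.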

2605.24354 2026-05-26 cs.CV

SparseWorld: Enhancing End-to-End Autonomous Driving via World Models with Sparse Scene Representation

Ruoyu Wang, Jingke Wang, Yukai Ma, Yuehao Huang, Shuangming Lei, Guanglin Xu, Aixue Ye, Yong Liu

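AI summary Proposes SparseWorld, a lightweight world model built on sparse scene representation that autoregressively forecasts future map elements and surrounding agents and uses the predictions to refine downstream motion prediction and trajectory planning, achieving a 0.05% collision rate and state-of-the-art open-loop planning metrics on nuScenes.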

Abstract

Recently, world models have made significant progress in enhancing end-to-end driving systems through both future situation forecasting and improved scene understanding. However, existing driving world models are typically built upon dense scene representations, causing high computational costs and redundant information. In this paper, we present SparseWorld, a lightweight world model that focuses on predicting only the critical layout of the scene, enabling efficient future forecasting for end-to-end driving systems. SparseWorld first performs autoregressive rollout to forecast future map elements and surrounding agents, enabling the model to learn how driving scenarios evolve over time. It then leverages these predicted futures to refine downstream motion prediction and trajectory planning. Specifically, we propose a Sparse Dreamer that anticipates future instances in the latent space through joint temporal and spatial attention. By interacting with predicted future instances, the motion planner captures more accurate motion patterns and generates more informed and safety-aware trajectories. Extensive experiments demonstrate that SparseWorld significantly reduces collision risk and achieves state-of-the-art performance on the open-loop planning metrics of the nuScenes dataset with a collision rate of 0.05%. Moreover, it substantially outperforms the baseline method in closed-loop planning metrics on the Bench2Drive benchmark. Supplementary material is available at the project page: https://wryzju.github.io/SparseWorld/.

2605.24353 2026-05-26 cs.CV q-bio.OT

ViViD-5K: Vineyard vision dataset for field-based berry detection and segmentation and grape cluster closure estimation

Xiangzhi Tong, Chengrui Zhang, Mac Flaherty, Andre Matteo Garcia, Dominic Gorman, Jonathan Jaramillo, Justine E. Vanden Heuvel, Yu Jiang

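AI summary Presents the large-scale vineyard image dataset ViViD-5K and the two-stage visual pipeline GrapeSAM for automated, objective estimation of grape cluster closure.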

Abstract

Cluster closure, defined as the progressive filling of gaps between the berries in a grape bunch, is a key trait in vineyard management, impacting disease risk. However, traditional visual scoring methods are labor-intensive, subjective, and lack temporal resolution. Existing datasets rarely support fine-grained berry-level analysis, limiting the development of robust deep learning models. In this work, we present ViViD-5K, a large-scale in-field Vineyard Vision Dataset containing 5,000 images with dense annotations, including over 648,000 berry centroids and cluster segmentation masks spanning 13 grape varieties. Building on this dataset, we introduce GrapeSAM, a two-stage visual pipeline that combines point-based berry localization with prompt-based segmentation using Segment Anything, followed by transformer-based cluster segmentation. The pipeline enables automated, in-field estimation of cluster closure with minimal supervision. Quantitative results demonstrate strong segmentation and counting accuracy across diverse conditions, while visualizations confirm robustness on both in-domain and out-of-domain samples. This work provides a scalable and objective alternative to manual compactness scoring and supports high-throughput grape phenotyping with enhanced spatial detail.

2605.24352 2026-05-26 cs.AI

Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration

Adnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo

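AI summary Proposes Partner-Aware Skill Discovery (PASD), a framework that learns skills conditioned on partner behavior through a contrastive intrinsic reward, mitigating shortcut learning and improving the robustness and adaptability of human-AI collaboration.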

Abstract

Multi-agent collaboration, especially in human-AI teaming, requires agents that can adapt to novel partners with diverse and dynamic behaviors. Conventional Deep Hierarchical Reinforcement Learning (DHRL) methods focus on agent-centric rewards and overlook partner behavior, leading to shortcut learning, where skills exploit spurious information instead of adapting to partners' dynamic behaviors. This limitation undermines agents' ability to adapt and coordinate effectively with novel partners. We introduce Partner-Aware Skill Discovery (PASD), a DHRL framework that learns skills conditioned on partner behavior. PASD introduces a contrastive intrinsic reward to capture patterns emerging from partner interactions, aligning skill representations across similar partners while maintaining discriminability across diverse strategies. By structuring the skill space based on partner interactions, this approach mitigates shortcut learning and promotes behavioral consistency, enabling robust and adaptive coordination. We extensively evaluate PASD in the Overcooked-AI benchmark with a diverse population of partners characterized by varying skill levels and play styles. We further evaluate the approach with human proxy models trained from human-human gameplay trajectories. PASD consistently outperforms existing population-based and hierarchical baselines, demonstrating transferable skill learning that generalizes across a wide range of partner behaviors. Analysis of learned skill representations shows that PASD adapts effectively to diverse partner behaviors, highlighting its robustness in human-AI collaboration.

2605.24351 2026-05-26 cs.CL

How Much Structure Do LLMs Need? Evaluating LLMs for Bibliometric Cluster Description

Abraham Camelo-Guerrero, Jairo Diaz-Rodriguez

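AI summary Evaluates, across six pipelines with different levels of evidence and structure, whether bibliometric structure improves LLM-assisted generation of cluster descriptions, and finds that a hybrid workflow works best: algorithms provide auditable structure and LLMs generate readable descriptions.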

Abstract

Large language models (LLMs) can support scientific literature synthesis, but remain prone to hallucinated references, uneven coverage, and weakly grounded thematic organization. We evaluate whether bibliometric structure improves LLM-assisted synthesis by comparing six pipelines for generating cluster descriptions under different levels of evidence and structure. Using 100 published bibliometric analyses, we reconstruct Scopus corpora, extract human-written cluster descriptions, and assess outputs by human alignment, semantic coverage, clustering quality, graph quality, and reference grounding. Results show that LLMs produce descriptions semantically close to human-written ones, but are unreliable when asked to infer bibliometric structure from scratch. Performance improves when bibliometric algorithms define the clusters and the LLM interprets them. Overall, LLM-assisted bibliometric synthesis is most promising as a hybrid workflow in which algorithms provide auditable structure and LLMs generate readable descriptions.

2605.24350 2026-05-26 cs.RO cs.HC

PACT: Proactive Asking for Continual Task Assistance in Human-Robot Collaboration

Chengbo He, Sheng Li, Chenyang Ma, Bochao Zou, Li Sun, Jiansheng Chen, Junliang Xing, Yuanchun Shi, Huimin Ma

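AI summary Proposes PACT, a framework that uses reinforcement learning to decide under partial observations when to proactively ask the user for clarification, progressively improving assistance accuracy and clarification utility in cross-day human-robot collaboration.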

Abstract

Robotic assistants in long-term human-robot collaboration need to assist users under partial observations while leveraging cross-day interaction history. However, human traits and routines are often unknown at the beginning of collaboration, making passive infer-then-act assistance ineffective and inefficient. To address this challenge, we study a cross-day proactive asking setting for continual task assistance and propose PACT (Proactive Asking for Continual Task Assistance), an ask-or-act framework that determines whether clarification should be sought before taking action. PACT leverages current observations together with accumulated interaction history to evaluate contextual sufficiency, enabling the robot to provide more reliable assistance and progressively adapt to the user over time. We implement its primary learned instantiation using reinforcement learning and evaluate alternative instantiations under the same framework. To assess such behavior, we further introduce a clarification utility metric that quantifies the trade-off between assistance accuracy and the frequency of clarification requests. Experiments in multi-day embodied collaboration scenarios demonstrate that, compared with passive inference baselines, PACT consistently improves both assistance accuracy and clarification utility, highlighting the importance of proactive asking in continual human-robot collaboration.

2605.24345 2026-05-26 cs.LG

Evolving Robustness-Exploration Trade-off in Online Reinforcement Learning via Quantile Bayesian Risk MDPs

Meichen Song, Yuhao Wang, Enlu Zhou

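AI summary Proposes an adaptive algorithm based on quantile Bayesian risk MDPs that dynamically adjusts the quantile level to balance early robustness against later exploration, and proves sublinear Bayesian regret bounds.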

Abstract

In online reinforcement learning, data scarcity creates epistemic uncertainty that makes robustness important early in learning, whereas sufficient exploration is needed to learn the true-environment optimal policy. We study this time-varying robustness-exploration trade-off through a quantile Bayesian risk-aware Markov decision process (BR-MDP), in which the quantile level controls how posterior uncertainty enters the Bellman backup. We characterize this control through an asymptotic normality result for the difference between the quantile BR-MDP value and the value in the true environment. The result implies that upper/lower-tail quantiles induce optimism/pessimism towards epistemic uncertainty, and the magnitude of the optimism/pessimism decreases as data accumulate. Building on this characterization, we propose an online Bayesian risk-aware algorithm with an adaptive quantile schedule that emphasizes robustness early and gradually encourages exploration of less-visited state-action pairs. We establish sublinear Bayesian regret bounds with respect to both the true optimal value and the optimal BR-MDP robust value. Numerical experiments demonstrate strong performance in both exploration-demanding and exploration-costly environments.
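
How a quantile level can control the way posterior uncertainty enters a Bellman backup is sketched below for a single state-action pair. This is a simplified illustration under stated assumptions, not the paper's algorithm: the posterior over transition probabilities is assumed to be Dirichlet, the reward and next-state values are assumed known and fixed, the quantile is estimated from posterior samples, and the names `sample_dirichlet`, `empirical_quantile`, and `quantile_backup` are hypothetical.

```python
import random


def sample_dirichlet(counts, rng):
    """Draw one transition vector from Dirichlet(counts) via gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_quantile(values, alpha):
    """Order-statistic estimate of the alpha-quantile, alpha in [0, 1]."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, int(alpha * len(ordered))))
    return ordered[idx]


def quantile_backup(reward, counts, next_values, alpha,
                    gamma=0.9, n_samples=2000, seed=0):
    """Quantile over the posterior of the one-step target r + gamma * E_p[V].

    counts:      positive Dirichlet parameters (prior plus observed counts)
    next_values: value estimate for each possible next state
    alpha:       low alpha is pessimistic, high alpha is optimistic
    """
    rng = random.Random(seed)
    targets = []
    for _ in range(n_samples):
        p = sample_dirichlet(counts, rng)
        expected_next = sum(pi * v for pi, v in zip(p, next_values))
        targets.append(reward + gamma * expected_next)
    return empirical_quantile(targets, alpha)
```

With few observations, for instance `counts=[2, 2, 2]`, the backups at `alpha=0.1` and `alpha=0.9` differ noticeably. With `counts=[200, 200, 200]` the posterior concentrates and the two backups move close together, which mirrors the statement that the optimism or pessimism shrinks as data accumulate.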

2605.24344 2026-05-26 cs.CL

Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes

Weiming Wang, Junyu Lu, Han Wang, Xiaokun Zhang, Zewen Bai, Bo Xu, Liang Yang, Hongfei Lin

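AI summary To address cultural-context dependence and semantic ambiguity in Chinese harmful meme detection, builds Ex-ToxiCN-MM, the first Chinese harmful meme explanation dataset, and proposes RIKE, an attribution analysis framework with an Attribution Knowledge Enhancement module and a Relative Intent Reasoning module, which surpasses mainstream baseline models on the attribution task.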

Comments 10 pages, 4 figures

Abstract

Research on harmful meme detection has garnered significant attention, resulting in the development of numerous datasets and methods. However, progress in detecting Chinese harmful memes lags considerably, primarily due to two challenges: first, accurately assessing a meme's harmfulness depends heavily on understanding deep cultural context; second, many memes are semantically ambiguous, making harmfulness highly subjective. To address these issues, we focus on the interpretable detection of Chinese harmful memes by constructing the first Chinese harmful meme explanation dataset, Ex-ToxiCN-MM. This dataset offers opposing interpretations, categorized as "harmful" and "non-harmful", for each meme, aiming to rigorously evaluate a model's ability to discern and comprehend ambiguous, culturally grounded content. We build a specialized knowledge base of Chinese cultural concepts and offensive vocabulary, C-HarmKB, to supply models with essential prior knowledge. To address the ambiguity and lack of background knowledge in meme attribution, we develop a comprehensive attribution analysis framework, RIKE, which includes an Attribution Knowledge Enhancement module (AKE) and a Relative Intent Reasoning module (RIR). Extensive quantitative and qualitative experiments demonstrate that our method outperforms mainstream baseline models across multiple metrics in the task of attributing harmful memes in Chinese. The code, Ex-ToxiCN-MM dataset, and Chinese Harmful Semantic Knowledge Base (C-HarmKB) involved in this study have been open-sourced at https://github.com/wimiw123/Ex-ToxiCN-MM.

2605.24343 2026-05-26 cs.AI

Adaptive Human-AI Coordination via Hierarchical Action Disentanglement

Adnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo

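AI summary Proposes Intrinsic Action Disentanglement (IAD), a deep hierarchical reinforcement learning framework that learns partner-aware low-level action sequences and uses an intrinsic reward to encourage action disentanglement, enabling adaptive coordination with diverse partners.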

Abstract

Human-AI collaboration requires agents that can adapt to diverse partner behaviors and skill levels while remaining robust to unseen partners. Existing methods often collapse to a single dominant behavior or learn poorly aligned skills, limiting effective coordination. We propose Intrinsic Action Disentanglement (IAD), a deep hierarchical reinforcement learning (DHRL) framework that learns distinct, partner-aware low-level action sequences conditioned on high-level latent skills. IAD introduces an intrinsic reward that explicitly encourages disentangled action distributions of the agent's low-level policy across skills, yielding an interpretable mapping between high-level decisions and partner-specific behavioral responses. By capturing temporally extended interaction patterns, IAD enables flexible adaptation to heterogeneous partner dynamics under distributional shift. We evaluate IAD in the Overcooked-AI domain across multiple layouts and diverse partner settings, including unseen simulated partners, a human-proxy model trained on human-human gameplay, and real human partners. Results show that IAD consistently outperforms strong baselines and achieves more reliable, adaptive coordination across all settings.