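arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.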

2605.24418 2026-05-26 cs.LG

ChainLearn: A Blockchain-Based Capacity-Aware Framework for Federated Ensemble Learning

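Karan Sharma, Aditya Tripathi, Rahul Mishra, Tapas Kumar Maiti

AI summary: To address the failure of standard federated learning when hospitals have heterogeneous compute resources, the paper proposes capacity-aware coordination that uses a blockchain to separate on-chain policy from off-chain learning, assigns each hospital a suitable architecture, and combines predictions by weighted ensemble, reducing communication overhead while maintaining competitive accuracy and calibration error.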

Comments 10 pages, 7 figures, 11 tables. IEEE conference format. Code: https://github.com/EdddTri/ChainLearn

Abstract

Federated learning is used in medical imaging where privacy prohibits centralizing data. Standard federated algorithms assume homogeneous hardware, identical architectures, and centralized aggregation, which fails when hospitals have unequal compute resources. We propose capacity-aware coordination: measure each hospital's throughput, assign capacity-appropriate architectures (MobileNetV3-Small, EfficientNet-B0, ResNet-50), and combine predictions via weighted ensemble. Weak and strong hospitals can participate without forcing uniform architectures. We separate on-chain policy from off-chain learning. A Solidity contract stores hospital registration, benchmark hashes, metrics, and weights. Hospitals train locally and submit only hashes and scalars (not parameters). Weighted ensemble inference is computed off-chain. Experiments on PneumoniaMNIST and DermaMNIST (5 seeds, 3 non-IID levels) show our method achieves lower or equal calibration error versus equal-weight ensemble and competitive accuracy versus FedAvg, FedProx, and FedMD. Communication overhead is 224 bytes per round, a reduction of over 912,000x compared to FedAvg.
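The weighted ensemble step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `weighted_ensemble`, the example probabilities, and the weight values are assumptions made for this sketch, and how the paper actually derives its weights is not stated here.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model class probabilities using normalized weights."""
    if not probabilities or len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * len(probabilities[0])
    for probs, weight in zip(probabilities, weights):
        for k, p in enumerate(probs):
            combined[k] += (weight / total) * p
    return combined


# Hypothetical softmax outputs of three hospitals' models for one image.
hospital_probs = [
    [0.60, 0.40],  # e.g. a site running MobileNetV3-Small
    [0.30, 0.70],  # e.g. a site running EfficientNet-B0
    [0.20, 0.80],  # e.g. a site running ResNet-50
]
hospital_weights = [1.0, 2.0, 2.0]  # illustrative values, not from the paper

ensemble = weighted_ensemble(hospital_probs, hospital_weights)
predicted_class = max(range(len(ensemble)), key=ensemble.__getitem__)
```

Because only the scalar weights need to be shared for this step, a design like this is consistent with the abstract's point that hospitals submit hashes and scalars rather than model parameters.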

2605.24417 2026-05-26 cs.LG

LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots

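Daria Grushina, Kseniia Kuvshinova, Alina Kostromina, Aziz Temirkhanov, Mile Mitrovic, Dmitry Simakov

AI summary: Proposes the LLMTabBench benchmark, which systematically evaluates how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples) in tabular classification under data-scarce conditions, and how performance scales with data complexity.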

Abstract

Supervised classification for tabular data remains a core machine learning task, yet its reliance on large labeled datasets limits applicability in data-scarce domains. For such few-shot scenarios, specialized methods like TabPFN - a state-of-the-art Prior-Data Fitted Network - have set a high standard by leveraging large-scale synthetic pretraining, though they still require a context of labeled examples to function. In contrast, Large Language Models (LLMs) could offer a more flexible alternative via zero- and few-shot in-context learning directly from task descriptions, but their performance on tabular data remains inconsistent and poorly understood. We introduce LLMTabBench, a benchmark designed to systematically evaluate LLMs for tabular classification under data-scarce conditions. LLMTabBench explicitly probes (i) how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples), and (ii) how model performance scales with increasing data complexity, using both real-world and controlled synthetic datasets. Our findings include: (1) LLMs are highly competitive in zero-shot settings and can outperform alternative models, even when those models have access to few-shot examples; (2) incorporating additional few-shot examples can conflict with LLM prior knowledge, limiting or even degrading performance; and (3) there is a data complexity threshold beyond which LLMs' performance declines and few-shot examples become less effective. Together, these findings reveal fundamental constraints of in-context learning for tabular data and provide practical guidance for deploying LLMs in low-data regimes.

2605.24416 2026-05-26 cs.LG

Synheart Capacity: A Theory-Driven Physiological Representation of Cognitive Capacity Dynamics from Wearable Signals

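Yisak Debele, Henok Ademtew, Israel Goytom

AI summary: Proposes a theory-driven multimodal learning framework that uses dual-stream encoding of cardiac and electrodermal signals to model cognitive capacity state as a two-dimensional physiological representation of resource allocation (effort) and overload (stress); it generalizes across individuals on the SWELL-KW dataset and distinguishes different cognitive states.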

Abstract

Human cognitive performance is constrained by limited mental resources, yet continuous computational estimation of cognitive capacity dynamics remains an open challenge. We propose a theory-driven multimodal learning framework that models capacity-related cognitive state as a two-dimensional physiological representation defined by voluntary resource allocation (mental effort) and overload-related strain (stress). The proposed architecture combines dual-stream encoding of cardiac (IBI/HRV) and electrodermal (EDA) signals with late fusion and task-specific output heads that independently estimate probabilistic effort and stress states. Evaluation on the SWELL-KW dataset using strict leave-one-subject-out cross-validation demonstrates cross-individual generalization (stress: 70.0% balanced accuracy; effort: 72.2%), with significant gains from multimodal integration and theory-guided supervision. Rather than collapsing physiological dynamics into a single workload label, the proposed effort-stress state-space enables structured differentiation between distinct cognitive regimes, including productive engagement and overload-related strain. Predicted state trajectories exhibit significant demand-sensitive shifts under controlled workload manipulations, with effort and stress responding differentially across interruption and time-pressure conditions. These results suggest that physiologically grounded multidimensional state representations may provide a foundation for adaptive systems capable of continuous capacity-aware monitoring and human-centered interaction.
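The evaluation protocol named in the abstract, balanced accuracy under leave-one-subject-out cross-validation, is commonly computed as sketched below. The helper names and the toy labels are assumptions for illustration only; they are not taken from the paper or from the SWELL-KW dataset.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for s in sorted(set(subject_ids)):
        train = [i for i, sid in enumerate(subject_ids) if sid != s]
        test = [i for i, sid in enumerate(subject_ids) if sid == s]
        yield s, train, test


# Toy labels: class 0 has recall 3/4, class 1 has recall 1/2.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
score = balanced_accuracy(y_true, y_pred)

# Toy subject assignment: every subject is held out exactly once.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subject_ids))
```

Holding out all windows of one subject at a time is what makes the reported scores a measure of cross-individual generalization rather than within-subject fit.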

2605.24414 2026-05-26 cs.AI

JT-SAFE-V2: Safety-by-Design Foundation Model with World-Context Data

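Junlan Feng, Fanyu Meng, Chong Long, Pengyu Cong, Duqing Wang, Yan Zheng, Yuyao Zhang, Xuanchang Gao, Ye Yuan, Yunfei Ma, Zhijie Ren, Fan Yang, Na Wu, Di Jin, Chao Deng

AI summary: Proposes the JT-Safe-V2 large language model, which jointly optimizes general intelligence and safety-by-design through world-knowledge pre-training, high-certainty training, and safety-strengthening post-training, and introduces the Safe-MoMA framework to reduce inference cost; it reaches state-of-the-art performance on general intelligence and safety benchmarks.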

Abstract

We introduce JT-Safe-V2, a large language model designed to advance the safety and trustworthiness of foundation models, extending our previous JT-Safe model toward a more comprehensive safety-by-design paradigm. JT-Safe-V2 emphasizes the joint optimization of general intelligence and safety-by-design through several key innovations: enriching pre-training data with contextual world knowledge, high-certainty pre-training procedures, and safety-strengthening post-training mechanisms for enterprise-oriented agentic capabilities. Building on these safety-enhanced foundation models, we propose Safe-MoMA (Safe Mixture of Models and Agents), a framework that enables traceable and efficient inference through the orchestrated deployment of multiple models and agents. Extensive evaluations demonstrate that JT-Safe-V2 achieves state-of-the-art performance across both general intelligence and safety benchmarks. Moreover, Safe-MoMA reduces inference costs by more than 30% compared to using the largest standalone model baseline while maintaining comparable performance. To facilitate future research on safety-by-design foundation models, we publicly release the post-trained JT-Safe-V2-35B model checkpoint.

2605.24411 2026-05-26 cs.AI cs.LG

The Model Is Not the Product: A Dual-Pillar Architecture for Local-First Psychological Coaching

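Alexander Mihalcea

AI summary: This paper presents Psych LM, an iOS application built on a local-first architecture that achieves a near-infinite context window through an automated memory corpus and retrieval-augmented generation, providing reliable context-aware psychological coaching on mobile devices.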

Comments 10 pages, 3 figures

Abstract

Existing language model applications struggle to meet the demand for emotionally oriented support, primarily due to their inability to maintain deep, persistent context across sessions. This report introduces Psych LM, an iOS application that validates the thesis that, for such applications, the surrounding architecture is paramount. Psych LM runs a local, on-device language model within a purpose-built, local-first runtime designed for behavioral and life-coaching applications. The system achieves the practical effect of a near-infinite context window through an automated, user-inspectable memory corpus that converts conversations into structured memory cards, including facts, goals, and events, and dynamically injects them into the prompt via semantic and vector search. As such, the system can be defined as an active-learning, retrieval-augmented generative, on-device architecture. This architecture delivers four primary contributions: a local-first design where privacy is a core property; a detailed description of the memory corpus for persistent context of key user information; a deterministic orchestration layer that provides a stable behavioral spine independent of the model's internal state; and a benchmark framework focused on evaluating the integrated system's reliability under realistic operating conditions. The R&D process confirms that complex, context-aware interaction can be reliably achieved under the strict constraints of a mobile environment by prioritizing architectural control and resource management over simple model size.

2605.24410 2026-05-26 cs.AI

Advancing Graph Few-Shot Learning via In-Context Learning

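Renchu Guan, Yajun Wang, Chunli Guo, Bowen Cao, Fausto Giunchiglia, Wei Pang, Yonghao Liu, Xiaoyue Feng

AI summary: Proposes the VISION model, which reframes graph few-shot learning as a fine-tuning-free sequence reasoning problem, uses an unsupervised task generator to build pseudo-tasks from unlabeled data, and fuses local topology with global task dependencies through a context-aware network for efficient inference.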

Comments KDD26

DOI: 10.1145/3770855.3817797

Abstract

Graph few-shot learning, which aims to classify nodes from novel classes with only a few labeled examples, is a widely studied problem in graph learning. However, existing methods often face two key limitations. First, the predominant graph few-shot learning paradigm relies on supervised tasks, failing to leverage the vast number of unlabeled nodes in the graph. Second, many approaches require complex task adaptation or fine-tuning during inference, limiting their efficiency and applicability. Inspired by the powerful in-context learning capabilities of large language models, we propose a novel model named VISION for adVancIng graph few-Shot learning via In-cOntext LearNing to address these challenges. Our model reframes graph few-shot learning as a fine-tuning-free sequence reasoning problem. At its core is a context-aware network that initializes nodes with role embeddings and employs a dual-context fusion module to synergistically integrate local topological structures and global task-level dependencies. This allows our model to dynamically generate class-aware representations for the query set conditioned on the support set context in a single forward pass. To effectively train our model, we introduce an unsupervised task generator that creates structure-adaptive features and constructs diverse pseudo-tasks from abundant unlabeled data. Our method unifies unsupervised meta-learning with graph in-context learning, achieving efficient inference. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our model. Our public code can be found

2605.24406 2026-05-26 cs.LG

A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation

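Erfan Haghighat Damavandi, Davide Papurello, Mahdi Alibeigi, Armin Keshavarz, Simone Canevarolo, Marco Condo

AI summary: This paper presents a unified Python framework based on deep reinforcement learning and the PPO algorithm that achieves energy-saving AHU control through Hierarchical Flow Logic and an enthalpy-based economizer, improving temperature stability and energy efficiency while keeping CO2 concentration within limits.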

Comments 10 pages, 7 figures

Abstract

Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.
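One way to picture the CO2 override described in the abstract is the sketch below. It assumes a well-mixed single-zone CO2 mass balance stepped with forward Euler and the simplest possible override (switch to full airflow); the paper's actual "Hierarchical Flow Logic" is not specified here, and all function names, parameters, and zone values are hypothetical.

```python
CO2_LIMIT_PPM = 1000.0  # threshold stated in the abstract


def next_co2_ppm(co2_ppm, airflow_m3s, *, volume_m3, gen_m3s, outdoor_ppm, dt_s):
    """One forward-Euler step of a well-mixed single-zone CO2 mass balance."""
    # Occupant generation (m3/s of CO2, converted to ppm*m3/s) plus
    # exchange with outdoor air, divided by the zone volume.
    dcdt = (gen_m3s * 1e6 + airflow_m3s * (outdoor_ppm - co2_ppm)) / volume_m3
    return co2_ppm + dt_s * dcdt


def safe_airflow(agent_airflow_m3s, co2_ppm, max_airflow_m3s, **zone):
    """Keep the agent's airflow unless it would push CO2 above the limit."""
    if next_co2_ppm(co2_ppm, agent_airflow_m3s, **zone) <= CO2_LIMIT_PPM:
        return agent_airflow_m3s
    return max_airflow_m3s  # assumed override: ventilate at full flow


# Illustrative zone: 300 m3 room, roughly 20 occupants, 60 s control step.
zone = dict(volume_m3=300.0, gen_m3s=1e-4, outdoor_ppm=420.0, dt_s=60.0)

overridden = safe_airflow(0.0, 995.0, 1.0, **zone)  # agent asks for no airflow
accepted = safe_airflow(0.2, 600.0, 1.0, **zone)    # agent's choice is safe
```

Placing the check outside the policy means the constraint holds regardless of what the learned agent outputs, which is the role the abstract attributes to the override.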

2605.24405 2026-05-26 cs.LG cs.AI

Generative OOD-regularized Model-based Policy Optimization

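Aysin Tumay, Jiahe Huang, Elise Jortberg, Rose Yu

AI summary: Proposes the GORMPO algorithm, which uses generative density estimation to restrict policy updates to high-density regions of a sparse state-action space, addressing out-of-distribution actions in offline reinforcement learning; it outperforms baselines on a real-world medical dataset and offline RL datasets.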

Abstract

We study sequential decision-making with offline reinforcement learning (RL). Traditional offline RL policies may result in out-of-distribution (OOD) actions when training relies only on sparse offline representations. To ensure safe offline policies in a sparse state-action space, we explore how density estimation models can be integrated into model-based RL methods to avoid the OOD regions. Generative models are capable of explicitly modeling the density in sparse state-action spaces. Building on this, we introduce Generative OOD-regularized Model-based Policy Optimization (GORMPO), a density-regularized offline RL algorithm that uses generative density modeling to restrict policy updates to high-density areas of the dataset. Furthermore, we examine whether better OOD detection corresponds to better model-based offline policies. We compare (1) the OOD detection capabilities of various density estimators and (2) their performance within the GORMPO framework on a real-world medical dataset and sparse offline RL datasets. We theoretically guarantee GORMPO's performance under mild assumptions. Empirically, GORMPO outperforms state-of-the-art baselines by 17% on a real-world medical dataset and enhances the base model on the offline RL datasets. Our empirical findings show that better OOD detection generally results in improved policies in environments with stable dynamics, while conservative penalties with poor density estimation are favored when dynamics are uncertain.
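The general idea of density regularization can be illustrated with a generic reward penalty for low-density state-action pairs. This is only a sketch under stated assumptions: the penalty form, the threshold, and the diagonal Gaussian standing in for a learned generative density model are all hypothetical and are not taken from the GORMPO paper.

```python
import math


def gaussian_log_density(x, mean, std):
    """Log-density of a diagonal Gaussian (stand-in for a learned density model)."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for xi, m, s in zip(x, mean, std)
    )


def penalized_reward(reward, log_density, threshold, penalty_weight):
    """Subtract a penalty proportional to how far log-density falls below a threshold."""
    shortfall = max(0.0, threshold - log_density)
    return reward - penalty_weight * shortfall


mean, std = [0.0, 0.0], [1.0, 1.0]   # assumed fit to the offline dataset
in_dist = [0.1, -0.2]                # state-action pair near the data
out_dist = [4.0, 4.0]                # state-action pair far from the data

r_in = penalized_reward(1.0, gaussian_log_density(in_dist, mean, std), -5.0, 0.1)
r_out = penalized_reward(1.0, gaussian_log_density(out_dist, mean, std), -5.0, 0.1)
```

With a penalty of this kind, model rollouts that drift into sparsely covered regions become less attractive to the policy, which matches the abstract's goal of keeping updates in high-density areas of the dataset.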

2605.24403 2026-05-26 cs.CV

Artiverse: A Diverse and Physically Grounded Dataset for Articulated Objects

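Denys Iliash, Jiayi Liu, Egor Fokin, Qirui Wu, Ali Mahdavi-Amiri, Manolis Savva, Angel X. Chang

AI summary: Presents the Artiverse dataset of 5.4K high-quality articulated 3D objects, annotated efficiently with a semi-automated pipeline that combines few-shot segmentation, geometric reasoning, and multi-stage human verification, and demonstrates its value for part mobility analysis, articulated object generation, and physics-based interaction.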

Comments CVPR camera-ready version

Abstract

We present Artiverse, a diverse and physically grounded dataset of high-quality articulated 3D objects designed for realistic functional modeling and simulation. Artiverse contains 5.4K human-authored objects across a broad range of 88 categories, aggregated from multiple 3D static repositories. Objects are annotated with functional parts, interior structures, realistic kinematic relationships and articulated joints including multi-DoF joints, and physical attributes such as metric scale, material, and mass. We develop a semi-automated annotation pipeline that combines few-shot segmentation, geometric reasoning, and multi-stage human verification to achieve high-quality and efficient annotation, reducing manual annotation time by over 30%. We demonstrate the value of Artiverse on tasks of part mobility analysis, articulated object generation, and physics-based interaction. Artiverse provides a data resource to advance functional understanding for articulated objects.

2605.24402 2026-05-26 cs.CV

Dual Prototype-Conditioned Diffusion Model for Scalable Multi-Class Unsupervised Anomaly Detection in Large Category Spaces

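Yaoxuan Feng, Yuxin Li, Weijiang Lv, Zixuan Zhao, Yubiao Wang, Wenchao Chen, Bo Chen, Hongwei Liu

AI summary: Proposes DPDiff-AD, a method that models heterogeneous normal distributions with local and global prototypes and uses diffusion-based reconstruction for scalable multi-class anomaly detection.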

Abstract

Multi-class anomaly detection aims to build unified models across diverse product categories. However, as the number of categories grows, its performance often degrades due to increasingly complex and heterogeneous normal distributions. To address this challenge, we propose DPDiff-AD, a Dual Prototype-conditioned Diffusion model for large-scale multi-class Anomaly Detection. DPDiff-AD models heterogeneous normal distributions through complementary local and global prototypes. Local prototypes capture representative fine-grained structural patterns via nearest-prototype aggregation, while global prototypes regulate holistic feature geometry through optimal transport regularization. Together, these dual-scale representations define a structured normality space. This space is refined through diffusion-based reconstruction conditioned on both local and global prototypes via prototype-aware attention. By jointly leveraging dual prototypes during generation, DPDiff-AD achieves precise normality modeling, preserves structured separability as category cardinality grows, and enables scalable anomaly discrimination. Extensive experiments across five benchmarks demonstrate the effectiveness and scalability of DPDiff-AD. On the 160-category large-scale dataset, it improves image- and pixel-level AUROC by 5.3 and 2.9 points over the previous state-of-the-art method Dinomaly+, while maintaining stable performance as category cardinality increases.

2605.24398 2026-05-26 cs.CV cs.AI cs.GR

VectorArk: Learning Practical Image Vectorization with Rounded Polygon Representation

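Tarun Gehlaut, Difan Liu, Charu Bansal, Krutik Malani, Souymodip Chakraborty, Ankit Phogat, Matthew Fisher, Vineet Batra

AI summary: Proposes the VectorArk model, which uses a rounded polygon representation and a degradation model for robust, practical image vectorization, achieving superior geometric completeness and artifact suppression across multiple datasets.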

Comments CVPR 2026. Project page: https://vectorark.github.io/

Abstract

Recent vision-language model (VLM)-based approaches have achieved impressive results on image vectorization tasks. However, they are typically evaluated on synthetic benchmarks, where clean SVGs are rasterized at high resolution and then re-vectorized. As a result, these methods generalize poorly to real-world scenarios, such as images with unknown rasterization methods or those generated by text-to-image models. We introduce VectorArk, a new VLM-based model designed for robust and practical image vectorization. VectorArk employs a novel rounded polygon representation that simplifies the learning process while naturally producing smooth, visually appealing primitives. We also propose a degradation model that enhances robustness across diverse and imperfect inputs. Our experiments show that, in contrast to previous methods, VectorArk achieves superior geometric completeness and artifact suppression across multiple datasets, with comprehensive ablations validating the contribution of each component.

2605.24396 2026-05-26 cs.AI

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

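Jingchu Gai, Guanning Zeng, Christina Baek, Chen Wu, J. Zico Kolter, Andrej Risteski, Aditi Raghunathan

AI summary: To address logical leaps and premature confidence in the long chains of thought of large language models, the paper proposes progressive confidence shaping, a reinforcement learning objective that needs no external labels or reward models; by rewarding gradual confidence growth and penalizing early commitment, it substantially improves reasoning accuracy and quality.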

Abstract

Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustified leaps, limiting the gains from additional test-time compute. Improving reasoning quality directly would require process reward models, but the step-level annotations needed to train them are expensive and scarce. We find such a signal in how the model's confidence evolves during reasoning: premature confidence, the tendency to commit to an answer early and use the remaining tokens to rationalize it, strongly predicts flawed reasoning across tasks and model scales. We exploit this in progressive confidence shaping, a reinforcement learning objective that trains models to update their confidence as they reason rather than commit early, rewarding gradual confidence growth and penalizing early commitment, with no external labels or reward models. The method improves accuracy and reasoning quality from 1.5B to 8B parameters across arithmetic (Countdown), math (DAPO, AIME), and science (ScienceQA): on Countdown, accuracy improves 3.2x (+42.0pp) and flawed reasoning drops 48pp; on AIME, Pass@64 improves 6.6pp. Consistent with this mechanism, the method also improves faithfulness: on a safety benchmark, our models more transparently surface misleading content in their reasoning traces rather than concealing it. Controlled experiments reveal that the problem and its remedy scale together: premature confidence grows with model size and task difficulty, and so do the gains from addressing it.

2605.24395 2026-05-26 cs.LG

AvAtar: Learning to Align via Active Optimal Transport

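Qi Yu, Ruizhong Qiu, Zhichen Zeng, My T. Thai, Huan Liu, Hanghang Tong

AI summary: Proposes the AvAtar framework, an active learning strategy that quantifies the informativeness of candidate points by their gradient-based impact through entropy-regularized optimal transport, solved efficiently with the adjoint-state method, to improve alignment performance.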

Comments Published as a conference paper at ICML 2026

Abstract

Alignment plays a fundamental role in many machine learning problems, such as multi-network analysis, multimodal learning, and point cloud registration. Recent works increasingly leverage optimal transport (OT) for distributional alignment, whose effectiveness largely depends on sparse supervision that is hard or costly to obtain in practice. Existing works, however, largely overlook how to actively acquire high-quality supervision to improve their alignment performance under OT frameworks. In this paper, we propose a principled active alignment framework for optimal transport alignment called AvAtar. We quantify the informativeness of a candidate by measuring its gradient-based impact on the global alignment result, computed as the gradient propagation from the global alignment result to all possible supervisions of the candidate through the entropy-regularized OT formulation. While differentiating through OT is challenging given its constrained nature, we leverage the adjoint-state method to reformulate the computation to a linear system solvable by the conjugate gradient method with linear complexity and guaranteed convergence. By encoding the global alignment result via effective utility functions, AvAtar is applicable to general alignment problems under the OT framework. Extensive experiments on three representative alignment tasks demonstrate the effectiveness, scalability, and generalizability of the proposed AvAtar.
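The entropy-regularized OT formulation that the abstract builds on is usually solved with Sinkhorn iterations, sketched below in plain Python. This shows only the standard forward solve; it does not implement AvAtar's candidate scoring or its adjoint-state gradient computation, and the cost matrix, marginals, and regularization strength are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, epsilon, n_iters=200):
    """Transport plan for entropy-regularized OT via Sinkhorn scaling."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / epsilon).
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


cost = [[0.0, 1.0], [1.0, 0.0]]   # toy pairwise alignment costs
a = [0.7, 0.3]                    # source marginal
b = [0.4, 0.6]                    # target marginal
plan = sinkhorn(cost, a, b, epsilon=0.5)

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

Each entry `plan[i][j]` can be read as a soft alignment score between source item `i` and target item `j`; smaller `epsilon` gives sharper plans but typically needs more iterations to converge.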

2605.24394 2026-05-26 cs.RO

RoboHitch: Learning Visual Affordance from Disordered Keypoints for Hitch Knots Tying

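Jiahui Zuo, Boyang Zhang, Fumin Zhang

AI summary: Proposes the RoboHitch framework, which learns hitch knot tying from human demonstrations using disordered 3D keypoints and RGB images; it fuses features from a dynamic graph autoencoder and a convolutional autoencoder to predict pick and place affordances, enabling knot tying under occlusion.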

Abstract

Robotic manipulation of deformable linear objects (DLOs) presents significant challenges due to complex dynamics and frequent self-occlusions. Existing robotic knot tying methods typically rely on precise topological state tracking with ordered keypoints and explicit edge connectivity. This reliance makes them prone to failures due to tracking drift and topology mismatch caused by repeated bending and crossings during knot formation. To address these limitations, we introduce RoboHitch, a novel framework that learns to perform hitch knot tying from human demonstrations using only disordered 3D keypoints and RGB images. This eliminates the need for explicit topological order, allowing for more flexible manipulation. Our method employs a dynamic Graph Autoencoder to extract geometric features from untracked keypoints, complemented by a Convolutional Autoencoder that captures essential visual context. A bidirectional cross-attention mechanism then fuses these modalities to jointly predict pick and place affordances, facilitating implicit reasoning about the rope's state and enabling knot tying under occlusion. Real-world experiments demonstrate the effectiveness and generalizability of our approach, successfully completing hitch knots in scenarios with self-occlusions.

2605.24390 2026-05-26 cs.LG

Learning Laplacian Eigenspace with Mass-Aware Neural Operators on Point Clouds

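Zherui Yang, Tao Du, Ligang Liu

AI summary: Proposes the Neural Eigenspace Operator (NEO), which predicts a stable, invariant low-frequency subspace rather than eigenvectors and combines a mass-aware neural operator with Rayleigh-Ritz refinement for fast spectral decomposition of the Laplace-Beltrami operator on point clouds.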

DOI: 10.1145/3799902.3811185

Abstract

The eigendecomposition of the Laplace-Beltrami Operator (LBO) is fundamental to geometric analysis, yet computing its low-frequency eigenmodes remains a significant bottleneck due to the high cost of iterative solvers on large-scale data. To amortize this cost, we introduce the Neural Eigenspace Operator (NEO), a feed-forward framework designed to predict the spectrum directly from point clouds. Crucially, NEO circumvents the ill-posed nature of standard eigenvector regression, which suffers from intrinsic sign flips and rotation ambiguities, by learning the stable, invariant low-frequency subspace instead. Specifically, the network predicts a redundant set of basis functions whose span robustly covers the target eigenspace, allowing for the recovery of accurate eigenpairs via a lightweight Rayleigh-Ritz refinement. To handle irregular sampling, we propose a mass-aware neural operator that incorporates per-point area weights into attention-based aggregation, improving robustness to non-uniform densities and enabling zero-shot generalization across resolutions. Our approach achieves near-linear runtime scaling and substantial wall-clock speedups over iterative solvers at comparable accuracy, and exhibits strong zero-shot transfer to high-resolution point clouds. The resulting eigenpairs support standard spectral geometry tasks, while the raw basis functions provide effective point-wise features for downstream learning. Code: https://github.com/Adversarr/NEO.

2605.24381 2026-05-26 cs.LG cs.AI stat.AP stat.ML

Assessing the Operational Viability of Foundation Models for Time Series Forecasting

Kavin Soni, Debanshu Das, Vamshi Guduguntla

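AI summary: Compares foundation models with supervised learning methods across four operational regimes and proposes a Complexity Router based on empirical features to balance accuracy and efficiency.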

Comments 21 pages, 8 Figures, Code available at [https://github.com/kavin-soni/timeseries-zeroshot-eval]

Abstract

Time series forecasting drives operational decisions in areas like finance, transportation, and energy. While supervised learning approaches achieve strong performance, they require domain-specific training, feature engineering, and ongoing maintenance. Large-scale foundation models have recently emerged as a zero-shot alternative, avoiding task-specific training much like LLMs. In this work, we evaluate foundation models against standard supervised approaches. Rather than focusing solely on aggregate accuracy, we analyze performance across four operational regimes: periodic human-centric systems, physically constrained processes, stochastic financial markets, and heterogeneous demand forecasting. Our results characterize optimal deployment areas. Foundation models perform well in domains with transferable periodic structures and are efficient for cold-start or long-tail scenarios. Conversely, supervised specialists maintain higher precision in systems governed by strict physical constraints. In financial domains, newer foundation models are rapidly closing the performance gap with supervised specialists. We further quantify trade-offs in inference latency, data drift adaptability, and deployment constraints. Finally, we propose a Complexity Router that assigns each series to the optimal model class using empirical features. We demonstrate that this selective routing achieves higher accuracy and significantly lower inference costs compared to deploying a universal foundation model, providing a practical framework for balancing generalization and efficiency.

2605.24375 2026-05-26 cs.AI

Distilling Game Code World Model Generation into Lightweight Large Language Models

Tyrone Serapio, Arjun Prakash, Haoyang Xu, Kevin Wang, Amy Greenwald

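AI summary: Studies distilling game code world model generation into small models through post-training, using supervised fine-tuning and reinforcement learning with verifiable rewards to improve the syntactic correctness and rule adherence of generated code.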

Abstract

Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of automatically constructing environments for AI agents. Recent work on Code World Models (CWMs) demonstrates that LLMs can translate game rules into Python implementations compatible with solvers like Monte Carlo Tree Search. We study this problem in game settings, where generated environments must implement rules, legal actions, state transitions, observations, and rewards. We refer to these game-specific executable models as Game Code World Models (GameCWMs). However, current approaches to generating code world models rely on frontier models and inference-time refinement loops, limiting accessibility and scalability. This work investigates whether GameCWM generation capabilities can be distilled into smaller models through post-training. We introduce: (1) a curated dataset of 30 games spanning perfect and imperfect information games, (2) a verification framework that evaluates generated code against structural and semantic game properties, and (3) a post-training pipeline combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR). We experiment with Qwen2.5-3B-Instruct and find that SFT can increase syntactic correctness, while RLVR can improve execution-level adherence to game rules, thereby improving Qwen's ability to generate valid GameCWMs in both perfect and imperfect information games. Overall, our pipeline makes Qwen2.5-3B-Instruct more capable of generating valid GameCWMs, thereby offering a scalable path toward automatic environment generation from natural language.
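
To make the idea of a game code world model concrete, here is a minimal sketch of what such an environment might look like for tic-tac-toe, together with a simple rollout-based structural check. The class name, method names, and the `rollout_check` helper are hypothetical choices for illustration. They are not the interface, dataset format, or verification framework used in the paper.

```python
import random

class TicTacToeCWM:
    """Toy executable game model; the method names are illustrative assumptions."""
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def initial_state(self):
        # State is (board, player to move); players are +1 and -1, 0 is empty.
        return (tuple([0] * 9), 1)

    def winner(self, board):
        for a, b, c in self.LINES:
            if board[a] != 0 and board[a] == board[b] == board[c]:
                return board[a]
        return 0

    def is_terminal(self, state):
        board, _ = state
        return self.winner(board) != 0 or all(board)

    def legal_actions(self, state):
        if self.is_terminal(state):
            return []
        return [i for i, v in enumerate(state[0]) if v == 0]

    def step(self, state, action):
        # State transition: place the current player's mark, then switch player.
        if action not in self.legal_actions(state):
            raise ValueError(f"illegal action: {action}")
        board, player = state
        return (board[:action] + (player,) + board[action + 1:], -player)

    def observation(self, state):
        return state[0]  # perfect information: the full board is visible

    def reward(self, state):
        return self.winner(state[0])  # from the perspective of player +1


def rollout_check(model, episodes=200, seed=0):
    """Random rollouts asserting basic structural properties of the model."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state, steps = model.initial_state(), 0
        while not model.is_terminal(state):
            actions = model.legal_actions(state)
            if not actions:
                return False          # non-terminal state with no legal move
            state = model.step(state, rng.choice(actions))
            steps += 1
            if steps > 9:
                return False          # game must end within nine moves
        if model.reward(state) not in (-1, 0, 1) or model.legal_actions(state):
            return False
    return True
```

A check of this kind only probes execution-level properties (termination, legal moves, reward range). Verifying that the rules match a natural-language description would require game-specific tests.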

2605.24371 2026-05-26 cs.CV cs.CL

SliceWorld: A Predictive and Controllable World-State Model for CT Report Generation

Yuanhe Tian, Yan Song

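AI summary: Proposes SliceWorld, a world-state framework that encodes CT slice sequences into factor-aware latent states to support future-slice prediction, lesion-factor intervention, and LLM-based report generation, improving NLG metrics and clinically oriented evaluation on M3D-Cap and CT-RATE.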

Comments 18 pages, 5 figures

Abstract

CT report generation (CTRG) requires models to summarize three-dimensional anatomical context and pathological findings from hundreds of axial slices. Existing methods typically learn a direct image-to-text mapping, providing limited mechanisms for modeling how CT evidence evolves across slices or how reports respond to controlled changes in latent lesion-related factors. We propose SliceWorld, a CT-specific world-state framework that treats an axial CT scan as an ordered sequence along the z-axis. SliceWorld encodes prefix CT evidence into factor-aware latent states containing anatomy, lesion, and uncertainty components, and projects these states into world tokens used for multi-step future-slice feature prediction, lesion-factor intervention, and LLM-based report generation. The model is first pretrained on CT slice sequences with predictive, factor-aware, and counterfactual objectives, and is then fine-tuned on paired CT-report data. Experiments on M3D-Cap and CT-RATE show that SliceWorld improves natural language generation metrics and clinically oriented automatic evaluation. Further analyses demonstrate multi-horizon future-slice prediction, measurable factor alignment, reduced-slice robustness, and selective lesion-sensitive report modulation.

2605.24370 2026-05-26 cs.LG q-bio.QM

GEESE: Genotype-aware End-to-End Spatio-temporal Embedding for Behavioral Phenotyping

Yiran Ding, Yuen Gao, Chunqi Qian, Zijun Cui

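AI summary: Proposes GEESE, a framework that uses a pretrained time series foundation model to learn behavioral representations directly from 3D pose dynamics without hand-crafted features, surpasses traditional methods on three autism-associated genetic models, and comes with HONK, an interactive tool.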

Abstract

Behavioral phenotyping of genetic animal models currently requires labor-intensive manual feature engineering that limits reproducibility and scalability. We present GEESE, an end-to-end deep learning framework that learns behavioral representations directly from 3D pose dynamics without hand-crafted features. Using a pretrained time series foundation model, we encode movement sequences into a behavioral manifold that supports both behavior classification and genotype prediction. Evaluated across three autism-associated genetic models (CNTNAP2, CHD8, FMR1), our deep learning approach surpasses hand-crafted feature baselines in both tasks, revealing that learned representations capture genotype-specific behavioral signatures. The framework generalizes across genetic backgrounds, and an all-cohort model identifies both genetic background and genotype from movement patterns alone. We further provide HONK, an interactive intelligent tool enabling researchers without programming expertise to perform behavioral phenotyping from pose data through natural language interaction.

2605.24367 2026-05-26 cs.CV cs.LG

Gaussian Rank-Based Neighborhood Degree for Graph Neural Networks in Image Classification

Rafael Mendonça Duarte, Jean Roberto Ponciano, Lucas Pascotti Valem

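AI summary: Proposes GRaNDe (Gaussian Rank-based Neighborhood Degree), which improves degree normalization in graph neural networks by combining neighborhood ranking with Gaussian distance weighting, yielding consistent accuracy gains on five public image classification datasets.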

Abstract

The exponential growth of data has intensified the gap between the availability of unlabeled data and the high cost of manual annotation. Graph Neural Networks (GNNs) have emerged as a promising solution, as they exploit relational structures and learn from both labeled and unlabeled data, performing semi-supervised learning. A crucial component of many of these models is degree-based normalization, which influences message propagation but typically assumes uniform importance among neighboring nodes. In image classification, graphs are usually constructed from feature similarity, where treating all neighbors equally may overlook important variations in relevance. Motivated by this gap, we propose GRaNDe (Gaussian Rank-based Neighborhood Degree). This novel degree measure integrates neighborhood ranking with Gaussian distance weighting to better capture node importance. Experiments on five public image classification datasets show consistent accuracy improvements and competitive or superior results compared to state-of-the-art methods.
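
For context, the degree-based normalization the abstract refers to is commonly the symmetric form D^{-1/2} W D^{-1/2} used in GCN-style propagation. The sketch below contrasts uniform edge weights with ordinary Gaussian distance weights on a three-node toy graph, to show how a weighted degree changes how much each neighbor contributes. It does not implement GRaNDe, whose exact definition is not given in the abstract. The toy graph, the distances, and the helper names are assumptions.

```python
import math

def sym_normalize(W):
    """Return D^{-1/2} W D^{-1/2}, where D holds the row sums of W."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def gaussian_weights(dist, adj, sigma=1.0):
    """Weight each existing edge by exp(-d^2 / (2 sigma^2)); self-loops get 1."""
    n = len(adj)
    return [[math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2)) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Toy graph with self-loops: node 0 is linked to node 1 (close) and node 2 (far).
adj = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 1]]
dist = [[0.0, 0.5, 2.0],
        [0.5, 0.0, 0.0],
        [2.0, 0.0, 0.0]]

uniform = sym_normalize([[float(a) for a in row] for row in adj])
weighted = sym_normalize(gaussian_weights(dist, adj))
```

With uniform weights, node 0 receives equal contributions from both neighbors. With Gaussian weights, the closer neighbor dominates, which is the kind of variation in relevance that uniform degrees ignore.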

2605.24366 2026-05-26 cs.CL cs.LG

Structure-Aware RAG: Structured Retrieval Augmented Generation from Noisy Data for Conversational Agents

Kaiqiao Han, LuAn Tang, Renliang Sun, Peng Yuan, Wei Cheng, Haoyu Wang, Wei Wang, Yizhou Sun, Haifeng Chen

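AI summary: Proposes Structure-Aware Retrieval-Augmented Generation (SA-RAG), which uses tables as an intermediate structured representation to reduce noise while preserving key information, combined with a quality-aware table metadata generation framework and optimization method, and significantly outperforms existing RAG baselines on noisy real-world datasets.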

Distinguishing Right from Wrong in Debate: Attribution Analysis of Chinese Harmful Memes

Weiming Wang, Junyu Lu, Han Wang, Xiaokun Zhang, Zewen Bai, Bo Xu, Liang Yang, Hongfei Lin

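AI summary: To address the dependence on cultural context and the semantic ambiguity in Chinese harmful meme detection, constructs Ex-ToxiCN-MM, the first Chinese harmful meme explanation dataset, and proposes RIKE, an attribution analysis framework with an Attribution Knowledge Enhancement module and a Relative Intent Reasoning module, which outperforms mainstream baselines on the attribution task.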

Comments 10 pages, 4 figures

Abstract

Research on harmful meme detection has garnered significant attention, resulting in the development of numerous datasets and methods. However, progress in detecting Chinese harmful memes lags considerably, primarily due to two challenges: first, accurately assessing a meme's harmfulness depends heavily on understanding deep cultural context; second, many memes are semantically ambiguous, making harmfulness highly subjective. To address these issues, we focus on the interpretable detection of Chinese harmful memes by constructing the first Chinese harmful meme explanation dataset, Ex-ToxiCN-MM. This dataset offers two opposing interpretations for each meme, "harmful" and "non-harmful", aiming to rigorously evaluate a model's ability to discern and comprehend ambiguous, culturally grounded content. We built a specialized knowledge base of Chinese cultural concepts and offensive vocabulary (C-HarmKB) to supply models with essential prior knowledge. To address the ambiguity and lack of background knowledge in meme attribution, we have developed a comprehensive attribution analysis framework, RIKE, which includes an Attribution Knowledge Enhancement module (AKE) and a Relative Intent Reasoning module (RIR). Extensive quantitative and qualitative experiments demonstrate that our method outperforms mainstream baseline models across multiple metrics in the task of attributing harmful memes in Chinese. The code, Ex-ToxiCN-MM dataset, and Chinese Harmful Semantic Knowledge Base (C-HarmKB) involved in this study have been open-sourced at https://github.com/wimiw123/Ex-ToxiCN-MM

2605.24343 2026-05-26 cs.AI

Adaptive Human-AI Coordination via Hierarchical Action Disentanglement

Adnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo

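AI summary: Proposes Intrinsic Action Disentanglement (IAD), a deep hierarchical reinforcement learning framework that learns partner-aware low-level action sequences and uses an intrinsic reward to encourage action disentanglement, enabling adaptive coordination with diverse partners.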

Abstract

Human-AI collaboration requires agents that can adapt to diverse partner behaviors and skill levels while remaining robust to unseen partners. Existing methods often collapse to a single dominant behavior or learn poorly aligned skills, limiting effective coordination. We propose Intrinsic Action Disentanglement (IAD), a deep hierarchical reinforcement learning (DHRL) framework that learns distinct, partner-aware low-level action sequences conditioned on high-level latent skills. IAD introduces an intrinsic reward that explicitly encourages disentangled action distributions of the agent's low-level policy across skills, yielding an interpretable mapping between high-level decisions and partner-specific behavioral responses. By capturing temporally extended interaction patterns, IAD enables flexible adaptation to heterogeneous partner dynamics under distributional shift. We evaluate IAD in the Overcooked-AI domain across multiple layouts and diverse partner settings, including unseen simulated partners, a human-proxy model trained on human-human gameplay, and real human partners. Results show that IAD consistently outperforms strong baselines and achieves more reliable, adaptive coordination across all settings.
