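arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.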

2508.01253 2026-05-27 cs.CV

ODOV: Benchmark the Open-Domain Open-Vocabulary Object Detection

Yupeng Zhang, Ruize Han, Fangnan Zhou, Wei Feng, Liang Wan

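AI summary: To address domain shift and category shift occurring simultaneously in real-world scenes, the paper proposes the open-domain open-vocabulary object detection task, builds the OD-LVIS benchmark dataset, and designs a VLM-based baseline that improves detection performance through a domain-agnostic category prompt and a domain projection and grafting module.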

Abstract

Existing studies typically investigate domain shift and category shift as independent problems; however, in real-world scenarios, the two types of shift often occur simultaneously and interact, leading to significant degradation in detection performance. To address this, we propose and systematically study a novel problem, Open-Domain Open-Vocabulary (ODOV) object detection, which aims to evaluate a model's ability to adapt to compound domain and category shifts in real-world environments. We construct a new benchmark, OD-LVIS, which contains 46,949 images spanning 15 diverse real-world scenarios and 1,203 categories, for assessing object detection performance. Furthermore, we propose a novel ODOV detection baseline that fully leverages VLM's powerful multi-modal alignment capabilities and introduces two key mechanisms to enhance both category and domain generalization. One is the Domain-Agnostic Category Prompt (DAPmt), which strengthens category semantics while attenuating domain representations, enabling pure category representation. The other is the Domain Projection and Grafting (DP&G) module, which incorporates domain-specific features from input images, allowing the model to dynamically generalize across diverse open domains. These two components enable the model to maintain effective detection performance under simultaneous category and domain variations in real-world scenarios. We provide extensive benchmark evaluations for the proposed ODOV detection task and report experimental results. These results validate the soundness of the ODOV task, the practicality of the OD-LVIS dataset, and the superiority of the method.

2507.13762 2026-05-27 cs.LG q-bio.BM

MolPIF: A Parameter Interpolation Flow Model for Molecule Generation

Yaowei Jin, Junjie Wang, Yufan Tang, Wenkai Xiang, Duanhua Cao, Dan Teng, Zhehuan Fan, Jiacheng Xiong, Xia Sheng, Chuanlong Zeng, Duo An, Mingyue Zheng, Shuangjia Zheng, Qian Shi

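AI summary: Proposes MolPIF, a parameter interpolation flow model that unifies the generation of continuous coordinates and discrete atom types by interpolating distributions in parameter space, and outperforms baseline methods on the CrossDocked2020 dataset.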

Comments Accepted to Bioinformatics

Normal Patch Retinex Robust Algorithm for White Balancing in Digital Microscopy

Radoslaw Roszczyk, Artur Krupa, Izabella Antoniuk

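AI summary: Proposes a fully automatic white balance algorithm based on Normal Patch Retinex for correcting colour images from digital microscopes; experiments show that it outperforms classical algorithms.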

DOI: 10.22630/MGV.2020.29.1.5
Journal ref: Vol. 29 No. 1/4 (2020)

Abstract

The acquisition of accurately coloured, balanced images in an optical microscope can be a challenge even for experienced microscope operators. This article presents an entirely automatic mechanism for balancing the white level that allows the correction of the microscopic colour images adequately. The results of the algorithm have been confirmed experimentally on a set of two hundred microscopic images. The images contained scans of three microscopic specimens commonly used in pathomorphology. Also, the results achieved were compared with other commonly used white balance algorithms in digital photography. The algorithm applied in this work is more effective than the classical algorithms used in colour photography for microscopic images stained with hematoxylin-phloxine-saffron and for immunohistochemical staining images.
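
For readers unfamiliar with the "classical algorithms used in colour photography" that the abstract compares against, the sketch below shows one well-known example, gray-world white balance. It is not the Normal Patch Retinex method of the paper; the function name `gray_world` and the toy two-pixel image are assumptions made purely for illustration.

```python
def gray_world(pixels):
    """Gray-world white balance on a list of (R, G, B) tuples.

    Classical baseline only (not the paper's algorithm): each channel is
    scaled so that its mean matches the mean over all three channels.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    # Guard against an all-zero channel, which would divide by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]


# Toy image with a reddish cast: the red channel is systematically too strong.
image = [(200.0, 100.0, 100.0), (100.0, 50.0, 50.0)]
balanced = gray_world(image)
```

Gray-world assumes the scene averages to a neutral gray, which stained microscope slides often violate; that is one plausible reason a dedicated algorithm could do better on such images.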

2506.23149 2026-05-27 cs.CL

AlignEvoSkill: Towards Knowledge-Aware and Task-Aligned Agent Skill Evolution

Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng

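AI summary: Proposes the AlignEvoSkill framework, which jointly models knowledge coverage and task alignment: it identifies knowledge tags from failed trajectories, retrieves and adapts candidate skills, and then filters for high-quality skills using knowledge-coverage and task-alignment scores. Across 3 benchmarks and 4 LLM backbones it achieves a 34.7% relative improvement, setting a new SOTA in skill evolution at lower cost.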

Abstract

Reusable skills play a key role in improving LLM-based agents, but existing skill-evolution methods often fail to ensure that evolved skills both cover the knowledge required by the task and remain aligned with the target task. As a result, evolved skills could be incomplete or irrelevant. To address this limitation, we propose AlignEvoSkill, a skill-evolution framework that jointly models knowledge coverage and task alignment. Given failed task trajectories, AlignEvoSkill first identifies task-relevant knowledge tags, retrieves complementary prior skills, and adapts them into candidate skills that address missing knowledge. It then selects high-quality candidates using a joint filtering criterion based on knowledge-coverage and task-alignment scores. Experiments on 3 benchmarks with 4 LLM backbones show that AlignEvoSkill achieves a 34.7% relative gain over the non-evolution baseline and sets a new SOTA in skill evolution at lower cost.
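
The joint filtering step can be pictured with a small sketch. The abstract does not specify how the two scores are computed or combined, so everything here is an assumption for illustration: the `Candidate` type, the two thresholds, and the product used for ranking are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    coverage: float   # hypothetical knowledge-coverage score in [0, 1]
    alignment: float  # hypothetical task-alignment score in [0, 1]


def joint_filter(candidates, min_coverage=0.5, min_alignment=0.5):
    """Keep candidates that pass BOTH thresholds, best combined score first.

    Assumed rule: a candidate must clear each threshold separately, so a
    skill that covers the knowledge but drifts off-task is still rejected.
    """
    kept = [c for c in candidates
            if c.coverage >= min_coverage and c.alignment >= min_alignment]
    return sorted(kept, key=lambda c: c.coverage * c.alignment, reverse=True)


pool = [
    Candidate("covers-but-off-task", coverage=0.9, alignment=0.2),
    Candidate("balanced", coverage=0.7, alignment=0.8),
    Candidate("on-task-but-shallow", coverage=0.3, alignment=0.9),
    Candidate("strong", coverage=0.9, alignment=0.9),
]
selected = [c.name for c in joint_filter(pool)]
```

Requiring both scores to pass, rather than thresholding a single sum, mirrors the abstract's point that an evolved skill can fail by being either incomplete or irrelevant.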

2506.21443 2026-05-27 cs.CL cs.AI

Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection

Ali Şenol, Garima Agrawal, Huan Liu

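AI summary: Proposes a domain knowledge-enhanced LLM framework that integrates structured domain knowledge with a drift detection unit to achieve high-accuracy detection of fraudulent conversations and classification of concept drift.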

DOI: 10.3390/electronics15030534

Abstract

Detecting deceptive conversations on dynamic platforms is increasingly difficult due to evolving language patterns and Concept Drift (CD), i.e., semantic or topical shifts that alter the context or intent of interactions over time. These shifts can obscure malicious intent or mimic normal dialogue, making accurate classification challenging. While Large Language Models (LLMs) show strong performance in natural language tasks, they often struggle with contextual ambiguity and hallucinations in risk-sensitive scenarios. To address these challenges, we present a Domain Knowledge (DK)-Enhanced LLM framework that integrates pretrained LLMs with structured, task-specific insights to perform fraud and concept drift detection. The proposed architecture consists of three main components: (1) a DK-LLM module to detect fake or deceptive conversations; (2) a drift detection unit (OCDD) to determine whether a semantic shift has occurred; and (3) a second DK-LLM module to classify the drift as either benign or fraudulent. We first validate the value of domain knowledge using a fake review dataset and then apply our full framework to SEConvo, a multiturn dialogue dataset that includes various types of fraud and spam attacks. Results show that our system detects fake conversations with high accuracy and effectively classifies the nature of drift. Guided by structured prompts, the LLaMA-based implementation achieves 98% classification accuracy. Comparative studies against zero-shot baselines demonstrate that incorporating domain knowledge and drift awareness significantly improves performance, interpretability, and robustness in high-stakes NLP applications.
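
The three-component architecture can be sketched as a small pipeline. The abstract does not say how the components are chained, so the control flow below (drift is classified only when the detector fires) is an assumption, and the three `toy_*` functions are hypothetical keyword stand-ins for the DK-LLM modules and the OCDD unit, not real models.

```python
def run_pipeline(turns, detect_fake, detect_drift, classify_drift):
    """Chain the three components over a list of conversation turns.

    Assumed wiring: (1) fake detection always runs, (2) drift detection
    always runs, (3) drift classification runs only if drift was found.
    """
    result = {"fake": detect_fake(turns), "drift": False, "drift_type": None}
    if detect_drift(turns):
        result["drift"] = True
        result["drift_type"] = classify_drift(turns)
    return result


# Hypothetical stand-ins; a real system would call an LLM or a detector here.
def toy_detect_fake(turns):
    return any("gift card" in t.lower() for t in turns)


def toy_detect_drift(turns):
    # Crude proxy for a semantic shift: first and last turn share no words.
    first = set(turns[0].lower().split())
    last = set(turns[-1].lower().split())
    return len(first & last) == 0


def toy_classify_drift(turns):
    return "fraudulent" if toy_detect_fake(turns) else "benign"


conversation = ["Hello, about your order", "Please buy a gift card now"]
report = run_pipeline(conversation, toy_detect_fake,
                      toy_detect_drift, toy_classify_drift)
```

Passing the components in as callables keeps the wiring separate from the models, so each stage could be swapped for a prompted LLM without changing the pipeline.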

2506.17633 2026-05-27 cs.CV cs.AI

Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection

Xiang Fang, Arvind Easwaran, Blaise Genest

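AI summary: For few-shot out-of-distribution detection, proposes the Adaptive Multi-prompt Contrastive Network (AMCN), which uses CLIP to learn text prompts together with inter-class and intra-class distributions, adapting the ID-OOD separation boundary.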

Comments Published in ICML 2025

Abstract

Out-of-distribution (OOD) detection attempts to distinguish outlier samples to prevent models trained on the in-distribution (ID) dataset from producing unavailable outputs. Most OOD detection methods require many IID samples for training, which seriously limits their real-world applications. To this end, we target a challenging setting: few-shot OOD detection, where only a few labeled ID samples are available. Therefore, few-shot OOD detection is much more challenging than the traditional OOD detection setting. Previous few-shot OOD detection works ignore the distinct diversity between different classes. In this paper, we propose a novel network: Adaptive Multi-prompt Contrastive Network (AMCN), which adapts the ID-OOD separation boundary by learning inter- and intra-class distribution. To compensate for the absence of OOD and scarcity of ID image samples, we leverage CLIP, connecting text with images, engineering learnable ID and OOD textual prompts. Specifically, we first generate adaptive prompts (learnable ID prompts, label-fixed OOD prompts and label-adaptive OOD prompts). Then, we generate an adaptive class boundary for each class by introducing a class-wise threshold. Finally, we propose a prompt-guided ID-OOD separation module to control the margin between ID and OOD prompts. Experimental results show that AMCN outperforms other state-of-the-art works.
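
The idea of a class-wise threshold can be illustrated in a few lines. This is not AMCN itself, which learns prompts and boundaries with CLIP; the function `detect_ood`, the similarity scores, and the threshold values are all hypothetical and only show why a per-class boundary behaves differently from a single global one.

```python
def detect_ood(scores, thresholds):
    """Return (predicted_class, is_ood) using a per-class threshold.

    scores:     {class_name: similarity between the image and that class}
    thresholds: {class_name: minimum similarity to accept the class}
    The sample is flagged as OOD when even its best-matching class does
    not clear that class's own threshold.
    """
    best = max(scores, key=scores.get)
    return best, scores[best] < thresholds[best]


# Assumed values: "cat" is a tight class, "dog" a more diverse one.
thresholds = {"cat": 0.70, "dog": 0.25}

sample_a = detect_ood({"cat": 0.62, "dog": 0.30}, thresholds)
sample_b = detect_ood({"cat": 0.20, "dog": 0.30}, thresholds)
```

With a single global threshold of, say, 0.5, the second sample would be rejected as OOD even though a similarity of 0.30 is typical for the more diverse class; per-class boundaries avoid that.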

2506.11253 2026-05-27 cs.CV cs.LG

Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models

Yuwen Tan, Boqing Gong

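AI summary: This paper proposes lifting data-tracing machine unlearning to knowledge-tracing for foundation models, in order to handle diverse unlearning requests and come closer to how humans forget, and illustrates the paradigm with a vision-language model case study.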

Comments Accepted to TMLR

Abstract

Machine unlearning removes certain training data points and their influence from AI models (e.g., when a data owner revokes their consent to allow models to learn from the data). In this position paper, we propose to lift data-tracing machine unlearning to knowledge-tracing for foundation models (FMs). We support this position based on practical needs and insights from cognitive studies. Practically, tracing data cannot meet the diverse unlearning requests for FMs, which may be from regulators, enterprise users, product teams, etc., who have no access to FMs' massive training data. Instead, it is convenient for these parties to issue an unlearning request about the knowledge or capability FMs (should not) possess. Cognitively, knowledge-tracing unlearning aligns with how the human brain forgets more closely than tracing individual training data points does. We further discuss the nontrivial challenges in the knowledge-tracing machine unlearning paradigm. Finally, we provide a concrete case study about a vision-language FM to illustrate how an unlearner might instantiate the knowledge-tracing machine unlearning paradigm. Code is available at: https://1yuwen.github.io/Knowledge-Tracing-MU-Page.

2506.10225 2026-05-27 cs.SD cs.AI eess.AS

Genre Controlled Music Generation via Activation Steering

Swathi Narashiman, Pranay Mathur, Dipanshu Panda, Jayden Koshy Joe, Harshith M R, Anish Veerakumar, Aniruddh Krishna, Keerthiharan A

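AI summary: Proposes an inference-time intervention on the autoregressive generative model MusicGen that uses linear probe weights to steer the residual stream, enabling fine-grained genre control.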
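
Activation steering with a linear probe can be sketched on plain vectors. The listing gives no implementation details, so this is a generic illustration under stated assumptions: the function `steer`, the unit-normalization of the probe direction, and the strength `alpha` are hypothetical, and no MusicGen API is used.

```python
def steer(hidden, probe_weight, alpha):
    """Shift a hidden-state vector along a probe direction.

    hidden:       one residual-stream activation, as a list of floats
    probe_weight: weight vector of a linear probe trained to detect a genre
    alpha:        steering strength (assumed hyperparameter)
    The probe direction is normalized so alpha is the size of the shift.
    """
    norm = sum(w * w for w in probe_weight) ** 0.5
    if norm == 0:
        return list(hidden)
    return [h + alpha * w / norm for h, w in zip(hidden, probe_weight)]


hidden_state = [1.0, 0.0]
genre_probe = [3.0, 4.0]          # toy probe weights with norm 5
steered = steer(hidden_state, genre_probe, alpha=5.0)
```

In a real model the same addition would be applied to the activations of a chosen layer at every generation step; which layer and strength work well is an empirical question.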

2506.07813 2026-05-27 cs.CV cs.AI

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Andrea Ceni, Alessio Gravina, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schonlieb, Moshe Eliasof

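AI summary: Proposes MP-SSM, which embeds the core computation of modern state-space models into message-passing neural networks to achieve efficient, permutation-equivariant, long-range information propagation on static and temporal graphs, and uses an exact sensitivity analysis to characterize information-flow issues in the deep regime.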

Abstract

The recent success of State-Space Models (SSMs) in sequence modeling has motivated their adaptation to graph learning, giving rise to Graph State-Space Models (GSSMs). However, existing GSSMs operate by applying SSM modules to sequences extracted from graphs, often compromising core properties such as permutation equivariance, message-passing compatibility, and computational efficiency. In this paper, we introduce a new perspective by embedding the key principles of modern SSM computation directly into the Message-Passing Neural Network framework, resulting in a unified methodology for both static and temporal graphs. Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing. Crucially, MP-SSM enables an exact sensitivity analysis, which we use to theoretically characterize information flow and evaluate issues like vanishing gradients and over-squashing in the deep regime. Furthermore, our design choices allow for a highly optimized parallel implementation akin to modern SSMs. We validate MP-SSM across a wide range of tasks, including node classification, graph property prediction, long-range benchmarks, and spatiotemporal forecasting, demonstrating both its versatility and strong empirical performance.

2505.18603 2026-05-27 cs.AI cs.CV

Yes, Q-learning Helps Offline In-Context RL

Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov

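AI summary: This paper integrates RL objectives into an offline in-context reinforcement learning framework. Experiments on more than 150 datasets show that directly optimizing RL objectives improves performance by about 30% on average over Algorithm Distillation, and that conservatism in value learning brings further gains.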

Abstract

Existing offline in-context reinforcement learning (ICRL) methods have predominantly relied on supervised training objectives, which are known to have limitations in offline RL settings. In this study, we explore the integration of RL objectives within an offline ICRL framework. Through experiments on more than 150 GridWorld and MuJoCo environment-derived datasets, we demonstrate that optimizing RL objectives directly improves performance by approximately 30% on average compared to widely adopted Algorithm Distillation (AD), across various dataset coverages, structures, expertise levels, and environmental complexities. Furthermore, in the challenging XLand-MiniGrid environment, RL objectives doubled the performance of AD. Our results also reveal that the addition of conservatism during value learning brings additional improvements in almost all settings tested. Our findings emphasize the importance of aligning ICRL learning objectives with the RL reward-maximization goal, and demonstrate that offline RL is a promising direction for advancing ICRL.
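
To make "optimizing RL objectives" on a fixed dataset concrete, here is a minimal tabular Q-learning sketch. It is not the paper's in-context method and it omits the conservatism term mentioned in the abstract; the function name, the toy two-state dataset, and the hyperparameters are assumptions for illustration.

```python
from collections import defaultdict


def offline_q_learning(dataset, n_actions, gamma=0.9, lr=0.5, epochs=200):
    """Tabular Q-learning over logged transitions, with no new interaction.

    dataset: list of (state, action, reward, next_state, done) tuples.
    Unseen state-action pairs keep their initial value of 0.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next, done in dataset:
            bootstrap = 0.0 if done else max(q[(s_next, b)] for b in range(n_actions))
            q[(s, a)] += lr * (r + gamma * bootstrap - q[(s, a)])
    return q


# Toy log: action 1 moves from state 0 to state 1, where it earns reward 1.
logged = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 1, True),
    (0, 0, 0.0, 0, False),
]
q_values = offline_q_learning(logged, n_actions=2)
greedy_at_start = max(range(2), key=lambda a: q_values[(0, a)])
```

A supervised objective would simply imitate the logged actions, whereas the update above propagates reward backwards through the data, which is the distinction the abstract draws.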

2505.02974 2026-05-27 cs.LG

PLAID: A Unified Data Model for Machine Learning on Heterogeneous Physics Simulations

Fabien Casenave, Xavier Roynard, Brian Staber, Alexandre Devaux-Rivière, William Piat, Michele Alessandro Bucci, Nissrine Akkari, Abbas Kabalan, Xuan Minh Vuong Nguyen, Luca Saverio, Raphaël Carpintero Perez, Anthony Kalaydjian, Samy Fouché, Thierry Gonon, Ghassan Najjar, Thomas Daniel, Emmanuel Menier, Matthieu Nastorg, Giovanni Catalani, Christian Rey

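AI summary: Proposes PLAID, a unified data layer that standardizes heterogeneous physics simulation data, and releases six benchmark datasets to address the lack of large-scale, diverse datasets for machine learning surrogate models.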

Comments Presented at EuRIPS 2025 and accepted at the AI4Physics Workshop @ ICML 2026

Abstract

Machine learning-based surrogate models have emerged as a powerful tool to accelerate simulation-driven scientific workflows, but their adoption is limited by the lack of large-scale, diverse, and standardized datasets for physics-based simulations. Existing benchmarks often focus on narrow domains or rely on simplified data models, and fail to capture the heterogeneity arising from variable geometries, meshes, and topologies, which is critical for assessing generalization in realistic settings. We introduce PLAID (Physics-Learning AI Data model), a unified and extensible data layer for heterogeneous physics simulations. It preserves the full complexity of simulation data while enabling efficient and scalable machine learning workflows, together with a library for dataset construction and manipulation (https://github.com/PLAID-lib/plaid). We release six datasets covering structural mechanics and computational fluid dynamics, designed to reflect realistic industrial scenarios and provide standardized benchmarks. The framework includes reproducible evaluation protocols and is integrated with Hugging Face to enable open, community-driven benchmarking with active user participation (https://huggingface.co/PLAIDcompetitions).

2503.21510 2026-05-27 cs.LG cs.CV stat.ML

An uncertainty-aware Bayesian framework for machine learning classification models: A case study in land cover classification

Samuel Bilson, Miles McCrory, Anna Pustogvar

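AI summary: Proposes a Bayesian framework for generative classification models that accounts for input measurement uncertainty, validated with a Bayesian quadratic discriminant analysis model on land cover datasets; the model outperforms random forests and neural networks in interpretability, uncertainty modelling, and computational efficiency.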

Comments 38 pages, 16 figures

Abstract

Ensuring that predictions of machine learning (ML) classification models are accompanied by uncertainty estimates is one of the main pillars of trustworthy AI. Current research in uncertainty quantification focuses mainly on epistemic uncertainty of the ML model, but rarely takes account of input measurement uncertainty, which is vital for traceability in metrology. In this work we propose a Bayesian framework for generative ML classification models that takes account of input measurement uncertainty. We take the specific case of a Bayesian quadratic discriminant analysis (BQDA) model, and apply it to metrological land cover datasets from Copernicus Sentinel-2 from 2020 and 2021. We benchmark the performance of the model against more popular classification models used in land cover maps such as random forests and neural networks. To validate and assess the generalisability of such a model, we also run simulations over synthetic classification data, varying distribution type and strength of the input measurement noise. We find for both real and synthetic data, the BQDA model presented is more trustworthy, in the sense that it is more interpretable, explicitly models the input measurement uncertainty, and maintains predictive performance of class probability outputs across datasets over different domains and sizes, whilst also being more computationally efficient.
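
One simple way to see how input measurement uncertainty can enter a generative classifier is sketched below. It is a one-dimensional illustration under stated assumptions, not the paper's BQDA formulation: each class is assumed Gaussian, the measurement noise is assumed Gaussian and independent of the class, and the class parameters are invented. Under those assumptions the noise variance simply adds to each class variance.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def class_posteriors(x, noise_var, classes):
    """Posterior class probabilities for a noisy scalar measurement x.

    classes: {label: (prior, mean, variance)} with Gaussian class densities.
    Assumption: x = true_value + Gaussian noise of variance noise_var, so the
    density of x under each class has variance (class variance + noise_var).
    """
    joint = {label: prior * gaussian_pdf(x, mean, var + noise_var)
             for label, (prior, mean, var) in classes.items()}
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


# Hypothetical land cover classes described by one spectral feature.
land_cover = {"water": (0.5, 0.0, 1.0), "forest": (0.5, 4.0, 1.0)}

precise = class_posteriors(1.0, noise_var=0.0, classes=land_cover)
noisy = class_posteriors(1.0, noise_var=9.0, classes=land_cover)
```

The same measurement yields a less confident posterior when the sensor is noisier, which is the qualitative behaviour one would want from a classifier that models input uncertainty explicitly.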

2504.08540 2026-05-27 cs.CV

Enhancing Physical Human-Robot Interaction: Recognizing Digits via Intrinsic Robot Tactile Sensing

Teresa Sinico, Giovanni Boschetti, Pedro Neto

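AI summary: Uses the built-in torque sensors of a collaborative robot to record joint torques and end-effector forces while a person writes digits on a touchpad, achieves online digit recognition with 94% accuracy using a bidirectional LSTM network, and demonstrates its application potential in a fruit delivery task.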

DOI: 10.1109/IECON58223.2025.11221741

Abstract

Physical human-robot interaction (pHRI) remains a key challenge for achieving intuitive and safe interaction with robots. Current advancements often rely on external tactile sensors as an interface, which increases the complexity of robotic systems. In this study, we leverage the intrinsic tactile sensing capabilities of collaborative robots to recognize digits drawn by humans on an uninstrumented touchpad mounted to the robot's flange. We propose a dataset of robot joint torque signals along with corresponding end-effector (EEF) forces and moments, captured from the robot's integrated torque sensors in each joint, as users draw handwritten digits (0-9) on the touchpad. The pHRI-DIGI-TACT dataset was collected from different users to capture natural variations in handwriting. To enhance classification robustness, we developed a data augmentation technique to account for reversed and rotated digit inputs. A Bidirectional Long Short-Term Memory (Bi-LSTM) network, leveraging the spatiotemporal nature of the data, performs online digit classification with an overall accuracy of 94% across various test scenarios, including those involving users who did not participate in training the system. This methodology is implemented on a real robot in a fruit delivery task, demonstrating its potential to assist individuals in everyday life. Dataset and video demonstrations are available at: https://TS-Robotics.github.io/pHRI-DIGI/.

2503.14359 2026-05-27 cs.CV

ImViD: Immersive Volumetric Videos for Enhanced VR Engagement

Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu

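AI summary: Proposes ImViD, a multi-view, multi-modal dataset that supports capturing complete scenes while on the move, and provides a benchmark and reconstruction pipeline for 6-DoF multi-modal immersive VR experiences.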

Comments CVPR 2025 Highlight; Fix NSFC ID

Abstract

User engagement is greatly enhanced by fully immersive multi-modal experiences that combine visual and auditory stimuli. Consequently, the next frontier in VR/AR technologies lies in immersive volumetric videos with complete scene capture, a large 6-DoF interaction space, multi-modal feedback, and high-resolution, high-frame-rate content. To stimulate the reconstruction of immersive volumetric videos, we introduce ImViD, a multi-view, multi-modal dataset featuring complete space-oriented data capture and various indoor/outdoor scenarios. Our capture rig supports multi-view video-audio capture while on the move, a capability absent in existing datasets, significantly enhancing the completeness, flexibility, and efficiency of data capture. The captured multi-view videos (with synchronized audio) are in 5K resolution at 60 FPS, last from 1 to 5 minutes, and include rich foreground-background elements and complex dynamics. We benchmark existing methods using our dataset and establish a base pipeline for constructing immersive volumetric videos from multi-view audiovisual inputs for 6-DoF multi-modal immersive VR experiences. The benchmark and the reconstruction and interaction results demonstrate the effectiveness of our dataset and baseline method, which we believe will stimulate future research on immersive volumetric video production.
