Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language

Xiang Fang, Wanlong Fang, Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Renfu Li, Zichuan Xu, Lixing Chen, Panpan Zheng, Yu Cheng

Affiliations: Huazhong University of Science and Technology; Peking University; Zhejiang Gongshang University; Dalian University of Technology; Shanghai Jiao Tong University; Xinjiang University; The Chinese University of Hong Kong

AI summary: To address wrong retrievals caused by irrelevant queries in open-set video moment retrieval, proposes OpenVMR, a normalizing-flow-based open-set video moment retrieval model that retrieves moments precisely for in-distribution queries and rejects out-of-distribution queries.

Comments Published in ACM MM 2024

Abstract

Video Moment Retrieval (VMR) aims to retrieve the specific moment corresponding to a sentence query from an untrimmed video. Although recent works have made remarkable progress on this task, they are implicitly rooted in the closed-set assumption that all given queries are video-relevant. (In this paper, we treat a "video-relevant query" as an "in-distribution (ID) query" and a "video-irrelevant query" as an "out-of-distribution (OOD) query".) Given an OOD query in open-set scenarios, they still use it and retrieve a wrong moment, which might lead to irrecoverable losses in high-risk scenarios, e.g., criminal activity detection. To this end, we explore a brand-new VMR setting termed Open-Set Video Moment Retrieval (OS-VMR), where we should not only retrieve the precise moments based on ID queries, but also reject OOD queries. In this paper, we make the first attempt to step toward OS-VMR and propose a novel model, OpenVMR, which first distinguishes ID and OOD queries based on the normalizing flow technique, and then conducts moment retrieval based on ID queries. Specifically, we first learn the ID distribution by constructing a normalizing flow, and assume the ID query distribution obeys a multivariate Gaussian distribution. Then, we introduce an uncertainty score to search for the ID-OOD separating boundary. After that, we refine the ID-OOD boundary by pulling together ID query features. Besides, video-query matching and frame-query matching are designed for coarse-grained and fine-grained cross-modal interaction, respectively. Finally, a positive-unlabeled learning module is introduced for moment retrieval. Experimental results on three VMR datasets show the effectiveness of our OpenVMR.
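The ID/OOD separation step can be illustrated with a minimal sketch. It is not the authors' model: the normalizing flow is replaced by an identity map, so a diagonal Gaussian is fitted directly to toy query features, the uncertainty score is taken to be the negative log-likelihood, and the boundary is a quantile of the ID scores. All function names, the toy features, and the `keep` parameter are assumptions made for illustration.

```python
import math

def fit_diag_gaussian(features):
    """Per-dimension mean and variance of ID feature vectors (assumed stand-in for a flow's latent)."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in features) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def uncertainty(x, mean, var):
    """Negative log-likelihood under the diagonal Gaussian; higher means more OOD-like."""
    return sum(0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def pick_threshold(id_scores, keep=0.95):
    """Boundary chosen so that roughly a `keep` fraction of ID queries is accepted."""
    ordered = sorted(id_scores)
    idx = min(len(ordered) - 1, int(keep * len(ordered)))
    return ordered[idx]

# Toy 2-D "query features" standing in for real ID query embeddings.
id_feats = [[0.1, -0.2], [0.0, 0.1], [-0.1, 0.0], [0.2, 0.1], [-0.2, -0.1]]
mean, var = fit_diag_gaussian(id_feats)
tau = pick_threshold([uncertainty(f, mean, var) for f in id_feats])

def is_ood(x):
    """Reject a query whose uncertainty score exceeds the boundary."""
    return uncertainty(x, mean, var) > tau
```

In this toy setup a query far from the ID cluster, such as `[5.0, 5.0]`, is rejected, while one near the cluster centre is passed on to retrieval.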

2605.29807 2026-05-29 cs.CL cs.AI cs.LG

Data filtering methods for training language models

Egor Shevchenko, Elena Bruches

Affiliations: Novosibirsk State University; A. P. Ershov Institute of Informatics Systems SB RAS

AI summary: Compares two automatic label-error detection methods, Confident Learning and Dataset Cartography, on Russian text classification tasks; finds that their effectiveness depends on dataset characteristics, with Confident Learning significantly improving F1-macro on small, high-noise datasets.

Comments AINL-2026

Abstract

Data quality is a critical factor in the effectiveness of machine learning models. Label errors, present even in widely used benchmarks, introduce noise into training data and reduce model generalization. In this work, we conduct a comparative analysis of two automatic label error detection methods - Confident Learning and Dataset Cartography - on three Russian text classification corpora of varying size, number of classes, and domain: ru_emotion_e-culture (49,123 examples, emotion classification), RuCoLA (8,524 examples, linguistic acceptability), and TERRa (2,337 examples, textual entailment recognition). We use the pre-trained rubert-base-cased model fine-tuned on each corpus. To verify the meaningfulness of filtering, we conduct control experiments with random removal of an equivalent number of examples. Results show that the effectiveness of both methods depends strongly on dataset characteristics: on large corpora with low noise levels, filtering does not improve performance, while on small datasets with high noise, Confident Learning achieves a significant F1-macro improvement. Dataset Cartography demonstrates more conservative behavior, removing fewer examples. Across all corpora, targeted removal by both methods outperforms random removal, confirming the meaningfulness of the approaches.
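To make the filtering idea concrete, here is a simplified sketch in the spirit of Confident Learning. It is not the paper's pipeline: it assumes out-of-sample predicted probabilities are already available, uses the mean self-confidence of each class as that class's threshold, and flags an example when its most confident above-threshold class differs from the given label. The function names and toy data are assumptions.

```python
def class_thresholds(probs, labels, num_classes):
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    return [sums[k] / counts[k] if counts[k] else 1.0 for k in range(num_classes)]

def suspected_label_errors(probs, labels, num_classes):
    """Indices whose confidently predicted class disagrees with the given label."""
    t = class_thresholds(probs, labels, num_classes)
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [k for k in range(num_classes) if p[k] >= t[k]]
        if confident:
            guess = max(confident, key=lambda k: p[k])
            if guess != y:
                flagged.append(i)
    return flagged

# Toy binary task: the last example is labeled 1 but predicted as class 0 with high confidence.
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.95, 0.05]]
labels = [0, 0, 1, 1, 1]
to_remove = suspected_label_errors(probs, labels, num_classes=2)
```

A random-removal control like the one described in the abstract would drop `len(to_remove)` randomly chosen examples instead and compare the resulting scores.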

2605.29803 2026-05-29 cs.LG

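Low-Magnification SEM May Suffice: Interpretable Deep Learning for Multi-Scale Fracture-Cause Classification of Zirconia-Toughened Alumina

Julian Schmid, Pawel Astankow, Tom Vater, Julius Beck, Robert Cichon, Danny Krautz

Affiliations: CeramTec GmbH; School of Life Sciences, University of Applied Sciences

AI summary: Proposes an interpretable vision-transformer workflow that automatically classifies fracture causes in alumina matrix composite implants from low-magnification SEM images, reaching accuracy comparable to that at high magnification.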

Abstract

Reliable identification of fracture origins in alumina matrix composite hip and knee implants is critical for quality assurance and patient safety, yet current fractographic workflows are time-consuming, partly subjective, and reliant on high-magnification scanning electron microscopy (SEM). We present an interpretable vision-transformer (ViT) workflow for automated classification of fracture causes in an alumina matrix composite (BIOLOX delta, CeramTec GmbH) widely used in total joint replacements. A dataset of 8,493 SEM images (50x-10,000x) was curated from five years of in-production burst and proof tests and annotated into three defect categories defined along the manufacturing chain: green body, hard machining, and material defects. Under severe class imbalance, the fine-tuned ViT reached an accuracy of 0.907 and a macro-F1 of 0.888 in stratified five-fold cross-validation, with a two-stage perceptual-hash/SSIM leakage audit confirming negligible specimen overlap. Notably, performance at low magnification (50x) was comparable to that at high magnification (1k-10kx), indicating that macro-scale features - mirror geometry and hackle line fields - already encode sufficient diagnostic signal. Grad-CAM attributions consistently localised on canonical fractographic cues (mirrors, hackles, pores, machining marks), aligning with established fractographic criteria. Together, these results position interpretable ViTs as a complementary tool for ceramic-implant quality assurance, enabling low-magnification pre-screening and reducing reliance on time-intensive high-magnification inspection.
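Because the abstract reports both accuracy and macro-F1 under severe class imbalance, a short worked example may help show why the two can diverge. The sketch below uses the three defect categories named in the abstract, but the label counts and predictions are invented toy data, not results from the paper.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

classes = ["green body", "hard machining", "material"]
# Toy imbalanced labels; a classifier that always predicts the majority class.
y_true = ["green body"] * 8 + ["hard machining"] + ["material"]
y_pred = ["green body"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred, classes)
```

Here the majority-class predictor scores 0.8 accuracy but a macro-F1 below 0.3, which is why a macro-F1 close to the accuracy indicates that minority classes are also being recognised.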

2605.29797 2026-05-29 cs.CL

Metric-Dependent Annotation Saturation for Learning from Label Distributions

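Guneet Kohli

Affiliations: Apple

AI summary: Studies how the number of annotators needed when learning from label distributions depends on the evaluation metric; finds that entropy correlation needs 20-50 annotators to converge while KL divergence saturates at about 10 annotators, and that soft labels outperform label smoothing.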

Comments 16 pages, 3 figures, 14 tables

Abstract

When annotators disagree on a label, the disagreement itself carries signal -- and the number of annotators needed to capture it depends on the evaluation metric. We fine-tune NLI models on label distributions subsampled from ChaosNLI, a dataset providing 100 independent annotator judgments per item, and identify metric-dependent saturation. In our 3-class NLI setting, entropy correlation -- whether the model identifies which items elicit disagreement -- requires N ~ 20-50 annotators to converge, while distributional match (KL divergence) saturates by N ~ 10 (87-95% of improvement across five model seeds). This finding rests on a prior observation: soft labels carry item-specific signal that label smoothing cannot replicate. Across five smoothing intensities, entropy correlation clusters at r ~ 0.45-0.49, while soft labels reach r = 0.643 (p < 0.001); per-item analysis traces this gap to smoothing's inability to distinguish ambiguous items from clear ones. The soft-label advantage replicates across two architectures (DeBERTa, RoBERTa), a non-NLI-pretrained baseline, and an exploratory cross-domain evaluation on content safety. These results suggest that annotation budgets should be informed by the target evaluation metric rather than set uniformly.
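The contrast between soft labels and label smoothing can be sketched in a few lines. This is an illustration only: the vote counts are invented rather than drawn from ChaosNLI, and the helper names and the smoothing strength `eps` are assumptions.

```python
import math
import random
from collections import Counter

CLASSES = ("entailment", "neutral", "contradiction")

def soft_label(votes):
    """Empirical label distribution from annotator votes."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in CLASSES]

def smoothed_label(majority, eps):
    """Label smoothing: identical for every item that shares the same majority class."""
    k = len(CLASSES)
    return [(1 - eps) + eps / k if c == majority else eps / k for c in CLASSES]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q, floor=1e-9):
    """KL(p || q), with q floored to avoid division by zero on small subsamples."""
    return sum(x * math.log(x / max(y, floor)) for x, y in zip(p, q) if x > 0)

# Two toy items with 100 votes each: one clear, one ambiguous, same majority class.
clear = ["entailment"] * 95 + ["neutral"] * 5
ambiguous = ["entailment"] * 40 + ["neutral"] * 35 + ["contradiction"] * 25

h_soft_clear = entropy(soft_label(clear))
h_soft_ambiguous = entropy(soft_label(ambiguous))
h_smooth_clear = entropy(smoothed_label("entailment", eps=0.1))
h_smooth_ambiguous = entropy(smoothed_label("entailment", eps=0.1))

# Subsample N annotators and measure how far the estimate is from the full distribution.
rng = random.Random(0)
gap_at_10 = kl(soft_label(ambiguous), soft_label(rng.sample(ambiguous, 10)))
```

Smoothing assigns both items the same target, so their entropies coincide, whereas the soft labels keep the ambiguous item's higher entropy. Repeating the subsampling for several values of N would trace out a saturation curve of the kind the abstract describes.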

2605.29795 2026-05-29 cs.AI

MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains

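Ashutosh Ojha, Vinay Aggarwal, Ashutosh Srivastava, Siddharth Yedlapati, Yaman K Singla, Jitendra Ajmera

Affiliations: Adobe, Media & Data Science Research Lab

AI summary: Proposes MEMENTO, a framework that uses the web as a learning signal through an Adaptive Exploration Tree and dual-channel memory, significantly improving performance in low-data professional domains (sales automation and legal research).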

Abstract

Real-world tasks often lack large labeled datasets, motivating extensive work on learning in low-data regimes. However, existing approaches such as few-shot prompting, instruction tuning, and synthetic data generation continue to treat labeled or pseudo-labeled data as the primary learning signal. In contrast, human practitioners acquire expertise through repeated, self-directed interaction with the open web, progressively refining both domain knowledge and search strategies. We propose MEMENTO, a framework that treats the web as a learning signal rather than a stateless retrieval interface. MEMENTO operates at two levels: within each session, it conducts iterative web exploration via an Adaptive Exploration Tree (AET) that decomposes tasks into evolving questions and reflects on intermediate findings; across sessions, it accumulates experience through dual-channel memory, separating declarative knowledge (facts) from procedural knowledge (search strategies). This design enables agents to learn reusable research strategies and domain expertise from trajectories of web interaction without additional model training. We evaluate MEMENTO on two low-data professional domains: sales automation and legal research. Our empirical results show consistent improvements in performance over ReAct-based baselines (+25.6% on sales automation and +36.5% on legal research), demonstrating that the web can serve as a scalable learning source for acquiring task-specific expertise in data-scarce settings.

2605.29794 2026-05-29 cs.AI

SkillsInjector: Dynamic Skill Context Construction for LLM Agents

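Yanchao Li, Wanhao Liu, Ben Gao, Jiaqing Xie, Zhehong Ai, Na Zou, Yuqiang Li, Tianfan Fu

Affiliations: Nanjing University; Shanghai AI Lab

AI summary: To address the performance degradation caused by static skill injection, proposes SkillsInjector, a two-stage adaptive method in which a context planner learns skill preferences and sets an adaptive budget and a set-aware renderer optimizes how descriptions are presented, improving on three benchmarks by 3.9, 6.1, and 7.3 percentage points, respectively.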

Abstract

LLM agents now draw on growing skill libraries to handle complex tasks. However, injecting more skills does not always improve task completion and can even degrade it. Existing methods still treat skill injection as a static step, selecting skills with fixed criteria, fixing the budget in advance, and leaving descriptions unchanged. We argue that this static treatment can undermine the utility of skills, because which skills are exposed, how many are included, and how they are presented all affect downstream performance. We propose SkillsInjector, a two-stage adaptive method that jointly addresses these decisions. First, a context planner learns execution-grounded skill preferences and admits an adaptive number of skills for each task. A set-aware renderer then tailors how selected descriptions are presented relative to their co-injected neighbors. Across tau2-bench, SkillsBench, and ALFWorld, SkillsInjector achieves the highest score, improving over the strongest baseline by 3.9, 6.1, and 7.3 percentage points, respectively. Ablation studies show that skill selection, adaptive budgeting, and set-aware rendering each contribute to the gain. These results show that skill-augmented agents benefit from optimizing the injected context itself. Code will be released upon publication.

2605.29793 2026-05-29 cs.CV

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language

Xiang Fang, Daizong Liu, Wanlong Fang, Pan Zhou, Zichuan Xu, Wenzheng Xu, Junyang Chen, Renfu Li

Affiliations: Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology; Peking University; Henan University; Dalian University of Technology; Sichuan University; Shenzhen University; Huazhong University of Science and Technology

AI summary: Proposes SpotVMR, which uses a learnable clip search model and low-cost semantic indexing features to efficiently trim query-relevant video clips, and serves as a plug-and-play module that improves the efficiency and performance of existing VMR methods.

Comments Published in AAAI 2024

Abstract

Given an untrimmed video and a sentence query, video moment retrieval using language (VMR) aims to locate a target query-relevant moment. Since untrimmed videos are long, almost all existing VMR methods first sparsely down-sample each untrimmed video into multiple fixed-length video clips and then conduct multi-modal interactions between the query feature and expensive clip features for reasoning, which is infeasible for long real-world videos that span hours. Because the video is down-sampled into fixed-length clips, some query-related frames may be filtered out, which blurs the specific boundary of the target moment and takes adjacent irrelevant frames as new boundaries, easily leading to cross-modal misalignment and introducing both boundary bias and reasoning bias. To this end, in this paper, we propose an efficient approach, SpotVMR, to trim the query-relevant clip. Besides, our proposed SpotVMR can serve as a plug-and-play module, which improves the efficiency of state-of-the-art VMR methods while maintaining good retrieval performance. In particular, we first design a novel clip search model that learns to identify promising video regions to search, conditioned on the language query. Then, we introduce a set of low-cost semantic indexing features to capture the context of objects and interactions that suggest where to search for the query-relevant moment. Also, a distillation loss is utilized to address the optimization issues arising from end-to-end joint training of the clip selector and the VMR model. Extensive experiments on three challenging datasets demonstrate its effectiveness.

2605.29791 2026-05-29 cs.CL

ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation

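Yutong Yang, Chenxi Miao, Weikang Li, Yunfang Wu

Affiliations: Peking University; Baidu Inc

AI summary: Proposes ActTraitBench, a framework that uses human data to establish one-to-one mappings between psychometric facets and behavioral paradigms and calibrates LLM score distributions via quantile mapping; it reveals a knowledge-decision gap between LLM self-reports and behavioral decisions and introduces the CoCA intervention to mitigate it.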

Abstract

While Large Language Models (LLMs) can convincingly simulate personas in explicit self-reports, they often deviate in implicit behavioral decisions, revealing a substantial Knowledge-Decision Gap ($G_{\text{KD}}$). Existing benchmarks struggle to measure this asymmetry due to limited construct validity, multi-dimensional entanglement, and distributional biases in LLM-based evaluation. To address these issues, we propose ActTraitBench, a human-grounded evaluation framework for measuring personality consistency in LLMs. Grounded in empirical human data, ActTraitBench establishes one-to-one mappings between psychometric facets and behavioral paradigms, and applies a Distributional Calibration via Quantile Mapping procedure to align LLM-judge score distributions with human norms. Experiments on 14 mainstream LLMs reveal a pervasive knowledge-decision asymmetry, where larger and more capable models often exhibit stronger behavioral divergence despite highly consistent self-reports. To mitigate this gap, we further introduce the Chain of Cognitive Alignment (CoCA), a plug-and-play inference-time intervention that improves alignment in reasoning-capable frontier models while exposing clear capability limitations in smaller architectures.
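The calibration step mentioned above relies on quantile mapping, which can be sketched generically as follows. This is a plain empirical quantile mapping under stated assumptions, not the paper's exact procedure: the function name, the reference score lists, and the tie handling are all hypothetical.

```python
from bisect import bisect_right

def quantile_map(score, judge_scores, human_scores):
    """Map a judge score to the human score that sits at the same empirical quantile."""
    judge = sorted(judge_scores)
    human = sorted(human_scores)
    rank = bisect_right(judge, score)            # number of judge scores <= score
    # Integer ceiling of rank * len(human) / len(judge), then converted to a 0-based index.
    idx = -(-rank * len(human) // len(judge)) - 1
    idx = min(len(human) - 1, max(0, idx))
    return human[idx]

# Toy reference samples: the judge's scores bunch up at the top of a 1-10 scale,
# while human scores are spread evenly.
judge_ref = [6, 7, 7, 8, 8, 8, 9, 9, 10, 10]
human_ref = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

calibrated = [quantile_map(s, judge_ref, human_ref) for s in (6, 8, 10)]
```

A judge score of 8 sits at the 60th percentile of the judge's own distribution, so it is mapped to the human score at that percentile rather than being read at face value.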

2605.29788 2026-05-29 cs.AI cs.LG

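Joint Angle Estimation with a Customized Wristband Based on Online Incremental Learning

Shuo Wang, Xiaobin Chen, Xiaoming Tao

Affiliations: Research Institute for Intelligent Wearable Systems, The Hong Kong Polytechnic University

AI summary: Proposes a customized wristband system based on online incremental learning that estimates wrist joint angle with a two-stage method (online learning to update the model, then estimation with the model), adapting to data drift across wearing configurations with an error of about 15 degrees.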

Abstract

Intelligent wearable technology plays an increasingly important role in human-computer interaction, motion, and health monitoring. To ensure comfort and practicality of use, one common form for motion monitoring is to utilize soft wearable sensors. However, many research applications regarding wearable sensors are simplistic and difficult to adapt to different situations. This study proposes a system for estimating the angle of the wrist joint using a customized wristband based on an online incremental learning approach. It is a two-stage estimation method: the first stage updates the model based on the wearer's wrist movement characteristics using online learning, integrating real-time data from an IMU as ground truth. The second stage uses the updated model to estimate the wrist joint angle with the wristband alone. In other words, model training is completed during data acquisition, allowing the trained model to be used for subsequent angle estimation. This method offers advantages in adapting to data drift caused by variations in testing configurations, such as the left and right wrists of the same subject, deviations in the wearing position on the same wrist, and even differences among subjects. The results indicate that the sensors exhibit good performance under strain variations, and the wrist joint trajectory estimation of the proposed system has an error of approximately 15 degrees in different scenarios.
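The two-stage scheme can be sketched with a small online learner. The abstract does not specify the model, so this sketch assumes a linear model updated by stochastic gradient steps, with a synthetic two-channel "wristband" signal and a synthetic "IMU" angle as ground truth. The class name, learning rate, and data generator are hypothetical.

```python
import random

class OnlineLinearModel:
    """Linear regressor updated one sample at a time (least-mean-squares rule)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, target):
        """One gradient step on the squared error for a single (x, target) pair."""
        err = self.predict(x) - target
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err

def imu_angle(x):
    """Synthetic ground-truth angle in degrees; stands in for the IMU reading."""
    return 60.0 * x[0] - 30.0 * x[1] + 5.0

rng = random.Random(0)
model = OnlineLinearModel(n_features=2)

# Stage 1: while data is being acquired, update the model with IMU angles as targets.
for _ in range(5000):
    x = [rng.random(), rng.random()]
    model.update(x, imu_angle(x))

# Stage 2: estimate the angle from the wristband signal alone.
estimate = model.predict([0.5, 0.5])
```

Re-running stage 1 after the wristband is moved or worn by another person would let the same update rule absorb the resulting data drift, which is the adaptation property the abstract emphasises.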

2605.29768 2026-05-29 cs.AI

From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks

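Du Yin, Hao Xue, Arian Prabowo, Shuang Ao, Flora Salim

Affiliations: University of New South Wales

AI summary: Addressing the fixed-sensor-set assumption of existing traffic forecasting benchmarks, introduces the XXLTraffic dataset spanning up to 27 years and its sensor-evolving version EvoXXLTraffic, defines a yearly streaming forecasting protocol, and evaluates a range of baselines, finding that the ultra-large evolving dataset is closer to reality and that many existing SOTA methods break down.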

Comments Under Review

Abstract

Existing traffic forecasting benchmarks assume a fixed sensor set, but real road-sensor networks grow continuously as the road network changes year by year. We introduce the XXLTraffic dataset family, which spans up to 27 years of California PeMS and Transport for NSW data. The fixed-sensor subsets of XXLTraffic support extremely long forecasting with multi-year gaps and standard hourly / daily long-horizon forecasting. We extend it to EvoXXLTraffic, a sensor-evolving reorganization that exposes per-year active sensors, yearly traffic-flow matrices, and yearly graph snapshots across nine PeMS districts, with growth ratios ranging from +305% to over +10,000%. We define a yearly streaming forecasting protocol on EvoXXLTraffic in which each calendar year is a continual task, and benchmark a wide range of representative baselines drawn from static spatio-temporal GNNs, naïve online schemes, evolving-graph continual methods, and retrieval / test-time methods. We find that our ultra-large evolutionary dataset better reflects the real world, and many state-of-the-art (SOTA) results no longer work. Our dataset complements existing benchmarks by enabling more realistic forecasting under ultra-long evolutionary road networks.

2605.29766 2026-05-29 cs.RO

MARS Policy: Multimodality Only When It Matters

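Jindou Jia, Tuo An, Yuxuan Hu, Gen Li, Jingliang Li, Bohan Hou, Xiangyu Chen, Jiaqi Bai, Bofan Lyu, Jianfei Yang

Affiliations: MARS Lab, Nanyang Technological University

AI summary: Proposes the MARS policy, which adaptively introduces stochasticity when needed and uses deterministic learning in single-modal phases, balancing the multimodal capability of generative policies against the efficiency of deterministic models and improving success rate and inference speed on simulated and real-world tasks.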

Comments 13 figures, 17 pages

Abstract

Imitation learning has become a cornerstone for solving complex robotic manipulation tasks. In particular, multimodality, which enables robots to capture diverse yet valid behavioral patterns, has driven the rapid emergence of generative policies as a dominant paradigm in robot learning. However, achieving such multimodality typically relies on stochastic noise initialization and iterative denoising procedures, resulting in substantial training complexity and low inference efficiency. Meanwhile, not all phases of a robotic task inherently require behavioral diversity. Motivated by this insight, we propose the Modality-Adaptive Robot Sampling (MARS) policy, which adaptively invokes tailored stochasticity only when it is truly beneficial, while reverting to an efficient deterministic learning during single-modal phases. In other words, the proper amount of noise is injected only at the proper time. By selectively activating multimodal generation, MARS policy bridges the gap between the multimodal capability of generative policies and the superior training and inference efficiency of deterministic models. Empirical studies across 8 simulated and 4 real-world tasks demonstrate that MARS exhibits robust multimodal expressivity and high efficiency, with a 16.67% success rate improvement and an 83.20% inference latency reduction in real-world tests. Counterintuitively, MARS also outpaces deterministic policies in training efficiency on near-deterministic tasks by more effectively modeling nuanced action diversity.
