2605.18137 2026-05-28 cs.CV

Encoder-Free Human Motion Understanding via Structured Motion Descriptions

Yao Zhang, Zhuchenyang Liu, Thomas Ploetz, Yu Xiao

Affiliations: Aalto University; Georgia Institute of Technology

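AI summary: Proposes Structured Motion Description (SMD), which converts joint position sequences into structured natural-language descriptions so that large language models can reason about motion directly without a dedicated encoder, surpassing existing methods on motion question answering and captioning.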

Abstract

The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answering and captioning, have not fully exploited these capabilities. Existing LLM-based methods typically learn motion-language alignment through dedicated encoders that project motion features into the LLM's embedding space, and so remain constrained by cross-modal representation and alignment. Inspired by biomechanical analysis, where joint angles and body-part kinematics have long served as a precise descriptive language for human movement, we propose Structured Motion Description (SMD), a rule-based, deterministic approach that converts joint position sequences into structured natural language descriptions of joint angles, body part movements, and global trajectory. By representing motion as text, SMD enables LLMs to apply their pretrained knowledge of body parts, spatial directions, and movement semantics directly to motion reasoning, without requiring learned encoders or alignment modules. We show that this approach achieves state-of-the-art results on both motion question answering (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and motion captioning (R@1 of 0.584, CIDEr of 53.16 on HumanML3D), surpassing all prior methods. SMD additionally offers practical benefits: the same text input works across different LLMs with only lightweight LoRA adaptation (validated on 8 LLMs from 6 model families), and its human-readable representation enables interpretable attention analysis over motion descriptions. Code, data, and pretrained LoRA adapters are available at https://yaozhang182.github.io/motion-smd/.
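
To make the idea of turning joint positions into text concrete, here is a minimal sketch. It is not the authors' rule set: the function names, the angle thresholds, and the output wording are all hypothetical, chosen only to illustrate a deterministic, rule-based mapping from three 3D joint positions to a short description of one joint angle.

```python
import math

def joint_angle_deg(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("joint positions must be distinct")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def describe_elbow(shoulder, elbow, wrist, side="right"):
    """Render one elbow angle as text (thresholds are illustrative assumptions)."""
    angle = joint_angle_deg(shoulder, elbow, wrist)
    if angle >= 150:
        state = "nearly straight"
    elif angle >= 100:
        state = "moderately bent"
    else:
        state = "sharply bent"
    return f"{side} elbow: {angle:.0f} degrees ({state})"

# A fully extended arm: shoulder, elbow, and wrist lie on one line.
print(describe_elbow((0, 2, 0), (0, 1, 0), (0, 0, 0)))
```

Applying such rules per joint and per frame window would yield a text sequence that any LLM can read as ordinary input, which is the property the abstract relies on.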

2605.28145 2026-05-28 cs.AI cs.LG

Adaptive Reservoir Computing for Multi-Scenario Chaotic System Forecasting

Shadmehr Zaregarizi, Khashayar Yavari

Affiliations: Politecnico di Torino

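AI summary: Proposes an adaptive reservoir computing framework that uses four tailored strategies (exact state synchronization, histogram-guided candidate selection, multi-seed search, and sequential multi-sequence training) to score 74.91 across the 12 tasks of the CTF-4-Science Lorenz benchmark, showing that the approach is both efficient and competitive.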

Comments 4 pages, 2 figures

Abstract

We present an adaptive reservoir computing framework for the CTF-4-Science Lorenz benchmark, which evaluates machine learning models across twelve distinct tasks spanning five qualitatively different scenarios: baseline forecasting, noisy signal reconstruction, forecasting under noise, few-shot learning, and parametric generalization. Rather than applying a uniform inference strategy, we tailor the training and prediction procedure of Echo State Networks (ESNs) to the specific demands of each evaluation scenario. Our key contributions are fourfold: (1) exact reservoir state synchronization that eliminates warmup approximation error in short-time prediction; (2) histogram-guided candidate selection that directly optimizes the long-time ergodic evaluation metric; (3) multi-seed reservoir search for few-shot regimes with severely limited training data; and (4) sequential multi-sequence training that resolves state-distribution mismatch in parametric generalization tasks. The proposed framework achieves a score of 74.91 on the public benchmark leaderboard, demonstrating that carefully adapted reservoir computing constitutes a competitive and computationally efficient approach for diverse chaotic system modeling challenges.
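
For readers unfamiliar with Echo State Networks, the generic textbook form is sketched below. The abstract does not give the authors' exact formulation, so the symbols here are assumptions: $\mathbf{u}_t$ is the input, $\mathbf{x}_t$ the reservoir state, $W$ and $W_{\text{in}}$ fixed random matrices, $W_{\text{out}}$ the trained readout, and $\lambda$ a ridge regularization strength.

```latex
\begin{align}
  \mathbf{x}_{t+1} &= \tanh\!\left( W \mathbf{x}_t + W_{\text{in}} \mathbf{u}_{t+1} \right), \\
  \hat{\mathbf{y}}_t &= W_{\text{out}} \mathbf{x}_t, \\
  W_{\text{out}} &= Y X^{\top} \left( X X^{\top} + \lambda I \right)^{-1},
\end{align}
```

Here $X$ and $Y$ stack the collected reservoir states and the target outputs column by column. Only $W_{\text{out}}$ is fitted, by ridge regression, which is what makes the approach computationally cheap. The reservoir state must match the driving signal before prediction starts, which is the issue the abstract's state synchronization contribution addresses.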

2605.28144 2026-05-28 cs.AI

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

Yi Wang, Haojie Lu, Zhaofan Zhang, Li Chen, Sihong Xie

Affiliations: The Hong Kong University of Science and Technology (Guangzhou)

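AI summary: Proposes a hierarchical task decomposition method combined with MCTS-guided Group Relative Policy Optimization (M-GRPO). By improving intermediate-state selection and planning ability, it significantly improves LLM performance on spatial tasks such as navigation, planning, and strategy games.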

Comments 8 pages

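Abstract (translated from the AI-generated Chinese abstract)

LLMs excel at general language understanding and reasoning. However, they consistently underperform on spatial reasoning, which severely limits their applications, particularly in embodied intelligence. Inspired by the success of hierarchical reinforcement learning, this paper introduces a novel hierarchical task decomposition method for LLM spatial reasoning. Our approach guides LLMs to decompose complex tasks into manageable subtasks by identifying key intermediate states and generating simplified sub-environments. However, we find that LLMs, lacking sufficient spatial prior knowledge, often fail to derive optimal intermediate states, which leads to suboptimal task decomposition. To address this limitation and strengthen their planning capabilities, we propose MCTS-guided Group Relative Policy Optimization (M-GRPO), in which we reformulate the UCT formula by incorporating the LLM's prior predictive probabilities and its epistemic uncertainty. In addition, we implement a finer-grained advantage function that enables the model to learn optimal path planning. Experimental results show that our method significantly improves LLM performance on spatial tasks, including navigation, planning, and strategy games, achieving state-of-the-art results. This work paves the way for applying LLMs in the real world.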
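
As background for the UCT reformulation mentioned in the abstract, the standard UCT selection score is shown below, followed by the well-known prior-weighted PUCT variant used in AlphaGo-style search. The abstract does not state the paper's modified formula, so neither expression should be read as the authors' M-GRPO rule. The symbols are the usual ones and are assumptions here: $Q(s,a)$ is the mean value of action $a$ in state $s$, $N(s)$ and $N(s,a)$ are visit counts, $P(s,a)$ is a prior probability, and $c$ is an exploration constant.

```latex
\begin{align}
  \mathrm{UCT}(s,a)  &= Q(s,a) + c \sqrt{\frac{\ln N(s)}{N(s,a)}}, \\
  \mathrm{PUCT}(s,a) &= Q(s,a) + c \, P(s,a) \, \frac{\sqrt{N(s)}}{1 + N(s,a)}.
\end{align}
```

In both cases the second term is an exploration bonus that shrinks as an action is visited more often. A prior from an LLM would enter through a term like $P(s,a)$.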

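Which Pretraining Paradigm Better Supports Spatial Intelligence? An Empirical Comparison of Vision-Language Models and Video Generation Models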

Haozhan Shen, Tiancheng Zhao, Kangjia Zhao, Jianwei Yin

Affiliations: Zhejiang University; Om AI Research; Binjiang Institute of Zhejiang University

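AI summary: Through a frozen-feature probing study, this paper systematically compares vision-language models (VLMs) and video generation models (VGMs) along three dimensions of spatial intelligence (semantic labeling, instance grouping, and 3D geometry prediction), finding that the two are complementary and that simple fusion improves overall performance.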

Comments Code is here: https://github.com/om-ai-lab/Probing-VLM-VGM

STR Robot: Sim-to-Real Design of an Autonomous Mobile Robot

Vinh Nguyen, Gia-Uy Le, Tien-Dat Nguyen, Tri-Tin Nguyen, Vinh-Hao Nguyen

Affiliations: Faculty of Electrical and Electronic Engineering, Ho Chi Minh City University of Technology, VNU-HCM

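AI summary: This paper presents a sim-to-real implementation approach for an autonomous mobile robot built on an existing mechanical platform, focusing on the onboard control, self-localization, and autonomous navigation systems, and validates its feasibility through simulation and experiments.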