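Vision and Robotics

3D Vision

3D reconstruction, NeRF, Gaussian Splatting, point clouds, and spatial intelligence.

2 papers indexed for today/the current date. Signal sources: cs.CV, cs.GR, cs.RO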

2606.20547 2026-06-19 cs.LG cs.CV cs.GR cs.RO math.DG New submission 70%

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

Przemyslaw Musialski

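Affiliation: New Jersey Institute of Technology

Topic match: Other 3D vision. Attention mechanism on Lie groups, applicable to 3D transformations.

AI summary: Proposes Lie-Algebra Attention, which defines tokens as elements of a matrix Lie group and uses the Lie-algebra norm of the relative pose as the attention score. It needs no learned kernel or representation-theoretic tools and applies to non-compact, non-abelian groups such as the affine full-frame groups.

Comments: preprint, 19 pages, 3 figures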

Abstract

We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ -- a bare transformation, with no feature payload and no external action $\rho(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie group elements: their score is the closed-form algebra norm of the relative pose rather than a learned kernel, and it reaches the affine full-frame groups that every irrep- or surjective-exp-based method must exclude. We call it Lie-Algebra Attention. Once tokens are group elements, the rest follows with none of the usual representation-theoretic machinery. The relative geometry of a pair is canonical, $g_i^{-1} g_j$, so the pairwise invariant $w_{ij} = \log(g_i^{-1} g_j)$ is intrinsic rather than designed; equivariance under the diagonal $G$-action is tautological, and the cocycle condition holds automatically. The attention score is the negative squared algebra norm, $s_{ij} = -\|\log(g_i^{-1} g_j)\|_\lambda^2/\tau$: the canonical proximity kernel under a block-weighted Frobenius inner product, with no irreducible representations, spherical harmonics, Clebsch-Gordan products, or learned kernel. The construction applies to any matrix Lie group on a chosen logarithm chart containing the relative poses, including the non-compact non-abelian affine groups with scale and shear that no vector-token attention method reaches: neither the irrep tradition nor surjective-exp methods. Three sequence-completion experiments, on SE(2), SO(3), and Aff(2), bear this out: the closed-form score matches a learned MLP kernel on the same invariant and outperforms it on SE(2), using 50 to 80x fewer score parameters, while a vector-token baseline breaks invariance by five to twelve orders of magnitude.
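
The score formula in the abstract can be made concrete with a small sketch. The NumPy example below is an illustration under stated assumptions, not the author's implementation: it covers SE(2) only, uses the closed-form SE(2) logarithm on the principal chart (relative angles kept well inside $(-\pi, \pi)$), and assumes a two-block weighting with hypothetical parameters `lam_rot` and `lam_trans` for the block-weighted Frobenius norm. All function names are invented for this example.

```python
import numpy as np

def se2(theta, x, y):
    """Homogeneous 3x3 matrix of an SE(2) element (rotation theta, translation x, y)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def se2_log(g):
    """Closed-form log of an SE(2) element, returned as coordinates (theta, u)."""
    theta = np.arctan2(g[1, 0], g[0, 0])  # principal chart: theta in (-pi, pi]
    t = g[:2, 2]
    if abs(theta) < 1e-9:
        return theta, t.copy()  # V(theta) tends to the identity as theta -> 0
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta
    V = np.array([[a, -b], [b, a]])  # translation = V @ u under the exponential map
    return theta, np.linalg.solve(V, t)

def lie_algebra_scores(tokens, lam_rot=1.0, lam_trans=1.0, tau=1.0):
    """s_ij = -||log(g_i^{-1} g_j)||_lambda^2 / tau with assumed block weights."""
    n = len(tokens)
    scores = np.zeros((n, n))
    for i in range(n):
        gi_inv = np.linalg.inv(tokens[i])
        for j in range(n):
            theta, u = se2_log(gi_inv @ tokens[j])
            # Squared Frobenius norm of the block [[0, -theta], [theta, 0]] is 2 * theta^2.
            sq_norm = lam_rot * 2.0 * theta**2 + lam_trans * float(u @ u)
            scores[i, j] = -sq_norm / tau
    return scores

def softmax_rows(s):
    """Row-wise softmax turning scores into attention weights."""
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
tokens = [se2(rng.uniform(-0.5, 0.5), *rng.normal(size=2)) for _ in range(5)]
h = se2(0.3, 2.0, -1.0)  # arbitrary group element applied to every token

scores = lie_algebra_scores(tokens)
scores_moved = lie_algebra_scores([h @ g for g in tokens])
attn = softmax_rows(scores)
max_invariance_gap = float(np.abs(scores - scores_moved).max())
```

Because $(h g_i)^{-1}(h g_j) = g_i^{-1} g_j$, moving every token by the same `h` should leave the scores unchanged up to floating-point error, which is what `max_invariance_gap` measures.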

2606.20549 2026-06-19 cs.RO New submission 60%

Generating Robot Hands from Human Demonstrations

Sha Yi, Nicklas Hansen, Xueqian Bai, Carmelo Sferrazza, Michael T. Tolley, Xiaolong Wang

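Affiliations: University of California San Diego; Amazon Frontier AI & Robotics

Topic match: Other 3D vision. Involves fingertip motion data and inverse-kinematics matching.

AI summary: Proposes a data-driven framework that uses more than 4 million frames of human fingertip motion from everyday manipulation and matches fingertip positions through inverse kinematics to optimize the design of tree-structured robot hands. It produces a general-purpose 6-DoF hand and lower-DoF task-specific hands, and trains a reinforcement-learning agent to accelerate the design search.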

Abstract

Robot learning has advanced rapidly in learning control, but learning the physical body of a robot remains much more difficult because jointly searching over design and control creates a very large combinatorial problem. Here, we present a data-driven framework for generating robot hands from human demonstrations. Instead of learning a complex controller together with each candidate design, we generate robot hand designs using the same simple control policy used after fabrication: matching fingertip positions through inverse kinematics. Using more than 4 million frames of human fingertip motion from everyday manipulation, our algorithm optimizes tree-structured robot hands to reproduce desired target motions. The framework produced both a 6-degree-of-freedom (DoF) general-purpose hand and lower-DoF task-specific hands with spatial four-bar mimic joints. To accelerate the search over designs, we trained a reinforcement-learning (RL) actor to propose good hand designs and joint angles, reducing search time from hours to minutes. We fabricated the mechanisms directly as one-piece articulated structures with print-in-place joints. In real-world experiments, the 6-DoF hand achieved highly accurate teleoperated fingertip tracking better than available commercial robot hands, whereas the specialized 3-DoF hands reproduced structured human and synthetic trajectories with reduced mechanical complexity. These results showed that large-scale human motion data can be used not only to train robot controllers but also as a reference for optimizing and generating the physical embodiment of robots.
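
The idea of scoring a candidate design by how well simple inverse kinematics tracks target fingertip positions can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a single planar serial finger instead of a tree-structured hand, a synthetic arc of targets instead of human motion data, and a damped-least-squares IK solver chosen for this example. All names (`fingertip`, `solve_ik`, `design_error`) and the link lengths are hypothetical.

```python
import numpy as np

def fingertip(q, lengths):
    """Forward kinematics of a planar serial finger: joint angles -> fingertip (x, y)."""
    angles = np.cumsum(q)
    return np.array([np.sum(lengths * np.cos(angles)),
                     np.sum(lengths * np.sin(angles))])

def jacobian(q, lengths):
    """2 x n Jacobian of the fingertip position with respect to the joint angles."""
    angles = np.cumsum(q)
    J = np.zeros((2, len(q)))
    for k in range(len(q)):
        # Joint k rotates every link from k onward.
        J[0, k] = -np.sum(lengths[k:] * np.sin(angles[k:]))
        J[1, k] = np.sum(lengths[k:] * np.cos(angles[k:]))
    return J

def solve_ik(target, lengths, q0, damping=0.1, iters=200):
    """Damped-least-squares IK: iterate dq = J^T (J J^T + damping^2 I)^-1 err."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - fingertip(q, lengths)
        J = jacobian(q, lengths)
        q += J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return q

def design_error(lengths, targets):
    """Mean fingertip tracking error of one candidate design over all target frames."""
    lengths = np.asarray(lengths, dtype=float)
    q = np.array([0.2] + [1.0] * (len(lengths) - 1))  # bent initial pose
    errors = []
    for p in targets:
        q = solve_ik(p, lengths, q)  # warm start from the previous frame
        errors.append(np.linalg.norm(p - fingertip(q, lengths)))
    return float(np.mean(errors))

# Synthetic target trajectory: an arc of radius 1.4 (arbitrary length units).
arc = np.linspace(0.3, 1.2, 8)
targets = np.stack([1.4 * np.cos(arc), 1.4 * np.sin(arc)], axis=1)

err_long = design_error([1.0, 0.8], targets)   # total reach 1.8, arc is reachable
err_short = design_error([0.6, 0.4], targets)  # total reach 1.0, arc is out of reach
```

A design search in this toy setting would compare values such as `err_long` and `err_short` across candidates and keep the design with the lowest tracking error. The short finger cannot reach the arc, so its error stays at least 0.4 no matter how the IK iterates.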
