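arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

Robotics / Embodied AI

Robotics, embodied AI, robot learning, manipulation, navigation, and embodied world models.

Entries for today / current date: 5. Signal sources: cs.RO, cs.AI, cs.CV, cs.LG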
2606.18664 2026-06-18 cs.SD cs.AI New submission 80%

NeuralMUSIC: A Hybrid Neural-Subspace Framework for Robot Sound Source Localization

Yizhuo Yang, Junqiao Fan, Shenghai Yuan, Lihua Xie

Affiliations: School of Electrical and Electronic Engineering, Nanyang Technological University

Topic match: Other robotics: hybrid framework for robot sound source localization

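AI summary: Proposes NeuralMUSIC, a hybrid framework that pairs a neural network estimating the spatial covariance matrix with the classical MUSIC subspace method, and uses frequency attention fusion and self-supervised learning to improve the robustness and cross-domain generalization of robot sound source localization.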

Abstract

Reliable sound source localization is fundamental to robot audition, enabling autonomous robots to perceive spatial cues and operate effectively in dynamic environments. Classical methods such as Multiple Signal Classification (MUSIC) offer strong theoretical foundations but degrade under low signal-to-noise ratios. While deep learning-based approaches achieve promising performance, they often struggle with limited generalization across conditions. To address these challenges, we propose NeuralMUSIC, a hybrid neural-subspace framework for robotic sound source localization. Specifically, a neural network first estimates the spatial covariance matrix from multichannel microphone observations. The predicted covariance is then integrated into a classical MUSIC pipeline with eigenvalue decomposition (EVD) and pseudo-spectrum computation, followed by a Frequency Attention Fusion (FAF) module to produce the final DOA estimates. To improve data efficiency, we further introduce a Self-supervised Spatial Correlation Learning (SSCL) strategy that leverages unlabeled acoustic data to capture spatial structure. Extensive experiments across different robotic tasks demonstrate that NeuralMUSIC achieves competitive localization accuracy while exhibiting improved robustness and cross-domain generalization.
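
The classical half of the pipeline described above (covariance, EVD, pseudo-spectrum) can be sketched as follows. This is a minimal narrowband illustration, not the authors' implementation: the uniform linear array, half-wavelength spacing, single source, and the function names `steering_vector` and `music_pseudospectrum` are all assumptions made for the example. The covariance here is a plain sample covariance from synthetic data; in NeuralMUSIC a neural network would predict it instead, and the FAF module that fuses frequency bins is not modeled.

```python
import numpy as np


def steering_vector(theta_rad, n_mics, spacing_wavelengths=0.5):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    k = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta_rad))


def music_pseudospectrum(cov, n_sources, grid_rad, spacing_wavelengths=0.5):
    """Classical narrowband MUSIC pseudo-spectrum from a covariance matrix."""
    n_mics = cov.shape[0]
    # EVD of the Hermitian covariance; eigh returns eigenvalues in ascending order,
    # so the first (n_mics - n_sources) eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    noise_subspace = eigvecs[:, : n_mics - n_sources]
    spectrum = np.empty(len(grid_rad))
    for i, theta in enumerate(grid_rad):
        a = steering_vector(theta, n_mics, spacing_wavelengths)
        proj = noise_subspace.conj().T @ a
        # Peaks appear where the steering vector is orthogonal to the noise subspace.
        spectrum[i] = 1.0 / max(np.real(np.vdot(proj, proj)), 1e-12)
    return spectrum


# Synthetic single-source scene (illustrative values only).
rng = np.random.default_rng(0)
n_mics, n_snapshots = 6, 400
true_doa = np.deg2rad(20.0)
signal = rng.standard_normal(n_snapshots) + 1j * rng.standard_normal(n_snapshots)
noise = 0.1 * (
    rng.standard_normal((n_mics, n_snapshots))
    + 1j * rng.standard_normal((n_mics, n_snapshots))
)
x = np.outer(steering_vector(true_doa, n_mics), signal) + noise

# Sample covariance; a learned estimator would replace this step.
cov = x @ x.conj().T / n_snapshots

grid = np.deg2rad(np.arange(-90.0, 90.5, 0.5))
spectrum = music_pseudospectrum(cov, n_sources=1, grid_rad=grid)
estimated_doa_deg = float(np.rad2deg(grid[np.argmax(spectrum)]))
print(f"estimated DOA: {estimated_doa_deg:.1f} deg")
```

Keeping the EVD and pseudo-spectrum steps classical means only the covariance estimate is learned, which is one plausible reading of why the hybrid design might generalize better than an end-to-end network.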

2601.07052 2026-06-18 cs.RO Version update 80%

RSLCPP -- Deterministic Simulations Using ROS 2

Simon Sagmeister, Marcel Weinmann, Phillip Pitschi, Markus Lienkamp

Affiliations: Technical University of Munich, Germany; School of Engineering & Design, Department of Mobility Systems Engineering, Institute of Automotive Technology; School of Engineering & Design, Department of Engineering Physics and Computation, Institute of Automatic Control

Topic match: Other robotics: deterministic simulation with ROS 2 for robot development

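AI summary: Addresses the non-reproducible simulation results caused by the asynchronous, multi-process design of ROS by proposing the RSLCPP library, which achieves cross-platform reproducible simulation through deterministic callback execution, usually without changes to existing node code.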

Comments: Accepted for publication in 'IEEE Robotics and Automation Practice'

Abstract

Simulation is crucial in real-world robotics, offering safe, scalable, and efficient environments for developing a variety of robotic applications. While the Robot Operating System (ROS) has been widely adopted as the backbone of these robotic applications in both academia and industry, its asynchronous, multi-process design complicates reproducibility, especially across varying hardware platforms. Deterministic callback execution cannot be guaranteed when computation times and communication delays vary. This lack of reproducibility complicates scientific benchmarking and continuous integration, where consistent results are essential. To address this, we present a methodology to create deterministic simulations using ROS 2 nodes. Our ROS Simulation Library for C++ (RSLCPP) implements this approach, enabling existing nodes to be combined into a simulation routine that yields reproducible results, usually without requiring any source code changes. We demonstrate that our approach produces identical results across various CPUs and architectures when testing both a synthetic benchmark and a real-world robotics system. RSLCPP is open-sourced at https://github.com/TUMFTM/rslcpp.
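
To make the idea of deterministic callback execution concrete, here is a small plain-Python sketch. It does not use ROS 2 or the RSLCPP API, and the abstract does not say how RSLCPP orders callbacks; the `DeterministicExecutor` class, the integer tick clock, and the tie-break by node name are assumptions chosen for illustration. The point is only that when execution order depends on simulated time and a fixed tie-break rather than on wall-clock timing, the resulting trace is identical from run to run.

```python
import heapq


class DeterministicExecutor:
    """Runs callbacks in simulated-time order with a fixed tie-break (hypothetical)."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # unique sequence number; keeps callbacks out of comparisons
        self.now = 0
        self.log = []

    def schedule(self, time, node_name, callback):
        # Ordering key: simulated time first, then node name, then insertion order.
        heapq.heappush(self._queue, (time, node_name, self._counter, callback))
        self._counter += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            time, node_name, _, callback = heapq.heappop(self._queue)
            self.now = time
            self.log.append((time, node_name))
            callback(self)


def make_timer(name, period):
    """Build a periodic callback that reschedules itself in simulated time."""

    def tick(executor):
        executor.schedule(executor.now + period, name, tick)

    return tick


def simulate(registration_order):
    executor = DeterministicExecutor()
    periods = {"controller": 2, "sensor": 1}  # periods in integer ticks
    for name in registration_order:
        executor.schedule(0, name, make_timer(name, periods[name]))
    executor.run(until=4)
    return executor.log


# The trace does not depend on the order in which nodes were registered.
log_a = simulate(["sensor", "controller"])
log_b = simulate(["controller", "sensor"])
print(log_a == log_b)
```

Integer ticks are used instead of floating-point seconds so that repeated additions cannot introduce platform-dependent rounding into the ordering key.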

2606.18688 2026-06-18 cs.LG cs.AI New submission 70%

Dual-Channel Grounded World Modeling (DCGWM): Structural Prevention of Objective Interference Collapse via Heterogeneous External Grounding with Inward-Only Gradient Flow

Akshay Hazare

Affiliations: Independent Researcher

Topic match: Other robotics: world model representation learning, dual-channel grounding

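AI summary: Proposes Dual-Channel Grounded World Modeling (DCGWM), which uses a partitioned latent space and inward-only gradient flow to structurally prevent the Objective Interference Collapse that arises when Joint Embedding Predictive Architectures are grounded against multiple objectives.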

Comments: Position paper. Experimental validation in progress.

Abstract

Joint Embedding Predictive Architectures (JEPAs) are a leading approach to world model representation learning. We identify a failure mode in JEPA-based world models grounded against two qualitatively distinct external signals: physical dynamics (sparse, high-magnitude, constraint-satisfying gradient corrections) and social-behavioral dynamics (diffuse, distribution-matching corrections). We term this Objective Interference Collapse (OIC): we argue that joint learning in a shared latent space causes the dominant channel to systematically collapse the subordinate channel's representational subspace, in a manner not resolvable by loss weighting alone. We propose Dual-Channel Grounded World Modeling (DCGWM), designed to structurally prevent OIC through a partitioned latent space (physical subspace Z_p, behavioral subspace Z_b) with inward-only gradient flow. A Physical Grounding Channel updates only Z_p via VICReg-style alignment to physical measurements; a Social-Behavioral Grounding Channel updates only Z_b via alignment to trajectories from an emergent multi-agent simulation. An Inter-Channel Interface Module couples the subspaces at the task level without cross-subspace gradients. An Asymmetric Grounding Adherence Loss penalizes rollout drift with a hard hinge for physical violations and a soft KL for behavioral divergence. A Generative Rendering Layer is architecturally isolated from the latent world model. We present three theoretical results: the partition removes the gradient-interference pathway implicated in OIC; each grounded subspace inherits anti-collapse guarantees from its alignment objective; and generative isolation is necessary under a stated assumption on the generative objective's geometry. This manuscript establishes the problem formulation and architecture; experimental validation is ongoing and will be reported in a future revision.
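
The asymmetry in the Asymmetric Grounding Adherence Loss (a hard hinge on physical violations, a soft KL on behavioral divergence) can be illustrated with a short sketch. The abstract gives no formula, so everything below is an assumption: the signed violation convention, the KL direction, the weights `w_phys` and `w_behav`, and the function names are hypothetical, and the numbers are toy inputs.

```python
import math


def hinge_penalty(violation, margin=0.0):
    """Hard hinge: zero while the constraint holds, linear once it is violated."""
    return max(0.0, violation - margin)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


def asymmetric_adherence_loss(
    physical_violations, rollout_behavior, reference_behavior, w_phys=1.0, w_behav=1.0
):
    """Sum of a hinge term (physical) and a KL term (behavioral); assumed form."""
    # Signed violations: values <= 0 mean the physical constraint is satisfied.
    physical_term = sum(hinge_penalty(v) for v in physical_violations)
    # Assumed direction: rollout distribution measured against the reference.
    behavioral_term = kl_divergence(rollout_behavior, reference_behavior)
    return w_phys * physical_term + w_behav * behavioral_term


# Toy inputs: one constraint violated by 0.3, slight behavioral mismatch.
loss = asymmetric_adherence_loss(
    physical_violations=[-0.2, 0.0, 0.3],
    rollout_behavior=[0.5, 0.3, 0.2],
    reference_behavior=[0.4, 0.4, 0.2],
)

# A rollout that satisfies every constraint and matches the reference costs nothing.
zero_loss = asymmetric_adherence_loss([-0.1, 0.0], [0.5, 0.5], [0.5, 0.5])
print(loss, zero_loss)
```

The hinge ignores slack entirely and reacts only to violations, while the KL term grows smoothly with any distributional mismatch, which mirrors the sparse versus diffuse character the abstract attributes to the two grounding signals.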

2606.18532 2026-06-18 cs.CR cs.AI cs.RO cs.SE New submission 60%

AI Sandboxes: A Threat Model, Taxonomy, and Measurement Framework

Inderjeet Singh, Haitham Mahmoud, Andrés Murillo

Affiliations: Fujitsu Research of Europe

Topic match: Other robotics: concerns physical AI and embodied autonomous systems

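AI summary: Proposes a threat model, taxonomy, and measurement framework for AI sandboxes; formalizes the sandbox boundary and a weakest-link rule, defines a cyber-physical threat model, and instantiates the framework on three case studies.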

Comments: 50 pages, 8 figures, 10 tables

Abstract

AI systems are increasingly evaluated in bounded environments that combine isolation, simulation, instrumentation, supervision, and evidence capture. For physical AI, AIoT, and cyber-physical systems, this shift is not a matter of terminology: the system under test may sense, decide, actuate, communicate, and fail through physical processes, networked devices, and human operators. This article develops an assurance-oriented account of AI sandboxes as controlled environments for testing, evaluation, verification, and validation across digital AI, embodied autonomy, and cyber-physical deployments. We formalize the sandbox boundary and a weakest-link rule for composing per-dimension evidence into a bounded deployment claim; separate major sandbox archetypes; define a cyber-physical threat model that includes attacks on the assurance apparatus itself; and introduce a measurement framework spanning fidelity, controllability, observability, containment, reproducibility, and governance artifacts, instantiated on three worked case studies of real sandboxes. The resulting threat model, taxonomy, and measurement framework clarify what a sandbox can validly test, which risks it can contain, and what forms of evidence it can support for safety, security, and regulatory assurance.
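
One common reading of a weakest-link rule is that the overall claim can be no stronger than the weakest per-dimension evidence. The sketch below takes that reading, using the six dimensions named in the abstract. The article's actual formalization is not given here, so the [0, 1] scoring scale, the use of `min`, the function name `weakest_link`, and the example scores are all assumptions for illustration.

```python
# The six measurement dimensions listed in the abstract.
DIMENSIONS = (
    "fidelity",
    "controllability",
    "observability",
    "containment",
    "reproducibility",
    "governance",
)


def weakest_link(scores):
    """Return the limiting dimension and the bound it places on the overall claim.

    `scores` maps each dimension to an evidence strength in [0, 1]; this scale
    is hypothetical and not taken from the article.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        # A dimension with no evidence cannot be silently skipped.
        raise ValueError(f"missing evidence for: {missing}")
    limiting = min(DIMENSIONS, key=lambda d: scores[d])
    return limiting, scores[limiting]


# Hypothetical sandbox assessment: strong simulation, weak containment.
example_scores = {
    "fidelity": 0.9,
    "controllability": 0.8,
    "observability": 0.7,
    "containment": 0.4,
    "reproducibility": 0.85,
    "governance": 0.6,
}
limiting_dimension, claim_bound = weakest_link(example_scores)
print(limiting_dimension, claim_bound)
```

Under this reading, raising fidelity further would not strengthen the deployment claim at all; only improving containment would.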

2501.06348 2026-06-18 cs.HC cs.RO Version update 60%

Why Automate This? Exploring Correlations Between Desire for Robotic Automation, Invested Time and Well-Being

Ruchira Ray, Leona Pang, Sanjana Srivastava, Li Fei-Fei, Samantha Shorey, Roberto Martín-Martín

Affiliations: University of Texas at Austin; Stanford University; University of Pittsburgh

Topic match: Other robotics: explores how robot automation preferences correlate with time spent and well-being

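AI summary: Using BEHAVIOR-1K and other datasets, this study finds that time spent on an activity is not a strong predictor of automation preference, whereas happiness and pain are the strongest indicators, and it reveals differences by gender and income level.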

Comments: 26 pages, 14 figures

Abstract

Understanding the motivations underlying the human inclination to automate tasks is vital for developing robots that fit seamlessly into daily life. Accordingly, we ask: are individuals more inclined to automate activities based on the time they consume or the feelings experienced while performing them? This study explores these preferences and whether they vary across social groups, specifically gender category and income level. Leveraging data from the BEHAVIOR-1K dataset, the American Time-Use Survey, and the American Time-Use Survey Well-Being Module, we investigate the relationship between the desire for robot automation, time spent, and associated feelings: Happiness, Meaningfulness, Sadness, Painfulness, Stressfulness, or Tiredness. Our key findings show that, despite common assumptions, time spent on activities does not strongly predict automation preferences; instead, happiness and pain are the strongest indicators. We also identify differences by gender and economic level: Women prefer to automate stressful activities, whereas men prefer to automate those that make them unhappy; mid-income individuals prioritize automating less enjoyable and meaningful activities, while low and high-income show no significant correlations. We hope our research helps motivate the design of robots that align with user priorities, moving domestic robotics toward more socially relevant solutions. All data and an interactive tool are publicly available at https://robin-lab.cs.utexas.edu/why-automate-this/.
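
The kind of analysis described above, relating a per-activity automation desire score to time spent and to a reported feeling, can be sketched with a plain Pearson correlation. The abstract does not state which correlation measure the authors use, so Pearson is an assumption here, as are the variable names. The numbers are made-up toy values chosen only to exercise the function; they are not taken from BEHAVIOR-1K, the American Time-Use Survey, or the study's results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy per-activity values (hypothetical, one entry per activity).
automation_desire = [1, 2, 3, 4, 5]
happiness = [5, 4, 3, 2, 1]
hours_spent = [2, 1, 3, 1, 2]

r_happiness = pearson(automation_desire, happiness)
r_time = pearson(automation_desire, hours_spent)
print(r_happiness, r_time)
```

Running the same function separately on each subgroup (for example by gender or income bracket) is one simple way to compare how the relationships differ across groups.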