arXivDaily: daily arXiv digest, updated Monday through Friday
All subject categories: 21516
1811.04584 2026-06-04 cs.AI cs.SY eess.SY

Navigating Assistance System for Quadcopter with Deep Reinforcement Learning

Tung-Cheng Wu, Shau-Yin Tseng, Chin-Feng Lai, Chia-Yu Ho, Ying-Hsun Lai

Affiliations: National Cheng Kung University; Research Laboratories; Industrial Technology Research Institute; Department of Computer Science; Information Engineering; National Taitung University

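AI summary: This paper proposes a deep-reinforcement-learning navigation assistance system for quadcopter obstacle avoidance. Two functional modules handle path navigation and collision avoidance respectively, and experiments show a collision rate of 14% after 500 flights.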

Comments conference

Abstract

In this paper, we present a deep reinforcement learning method that lets a quadcopter bypass obstacles on its flight path. In past studies, the algorithm controls only the quadcopter's forward direction. In this letter, we use two functions to control the quadcopter. One is the quadcopter navigation function, which calculates coordinate points and finds the straight path to the goal. The other is the collision avoidance function, which is implemented with a deep Q-network model. Both functions output a rotation angle in degrees, and the agent combines the two outputs to change direction. Besides, the deep Q-network can also make the quadcopter fly up and down to bypass obstacles and arrive at the goal. Our experimental results show that the collision rate is 14% after 500 flights. Based on this work, we will train more complex perception models and transfer them to a real quadcopter.
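The abstract describes two functions that each output a rotation, which the agent then combines. A minimal sketch of the navigation half follows, assuming a planar heading measured counter-clockwise from the +x axis. The additive, clipped fusion rule, the 30-degree turn limit, and the discrete DQN rotation set are hypothetical illustrations, not the authors' design.

```python
import math

# Hypothetical discrete action set for the collision-avoidance DQN (degrees).
DQN_ROTATIONS = (-30.0, -15.0, 0.0, 15.0, 30.0)


def navigation_rotation(pos, yaw_deg, goal):
    """Signed rotation (degrees) that points the heading straight at the goal.

    pos, goal: (x, y) positions in the horizontal plane.
    yaw_deg: current heading, counter-clockwise from the +x axis.
    """
    bearing = math.degrees(math.atan2(goal[1] - pos[1], goal[0] - pos[0]))
    # Wrap into [-180, 180) so the quadcopter turns the short way round.
    return (bearing - yaw_deg + 180.0) % 360.0 - 180.0


def combined_rotation(nav_deg, dqn_deg, max_turn=30.0):
    """Hypothetical fusion rule: sum both rotations, then clip to a turn limit."""
    return max(-max_turn, min(max_turn, nav_deg + dqn_deg))


# Usage: the goal lies at a 45-degree bearing (a left turn), while the DQN
# picks its hardest right turn to dodge an obstacle.
nav = navigation_rotation((0.0, 0.0), 0.0, (10.0, 10.0))
turn = combined_rotation(nav, DQN_ROTATIONS[0])
```

Summing lets the avoidance term override the goal-seeking term only as much as the turn limit allows. A learned or weighted combination would be an equally plausible reading of "combine both outputs".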

1811.04006 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Reachability-based safe learning for optimal control problem

Stanislav Fedorov, Antonio Candelieri

Affiliations: Department of Computer Science, Systems and Communication, University of Milano Bicocca

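AI summary: This paper proposes a safe learning method that combines a partially known state-space model of the system with unknown dynamics treated as an additive bounded disturbance. Within a safe set, the method selects optimal actions to reach the target set, and during learning it updates the disturbance estimate to make the optimal control solution more robust.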

Abstract

In this work, we seek an approach to integrate safety into the learning process that relies on a partly known state-space model of the system and regards the unknown dynamics as an additive bounded disturbance. We introduce a framework for safely learning a control strategy for a given system with an additive disturbance. On the basis of the known part of the model, a safe set is obtained in which the system can learn safely, and the algorithm can choose optimal actions for pursuing the target set as long as the safety-preserving condition is satisfied. After some learning episodes, the disturbance can be updated based on real-world data. To this end, Gaussian process regression is conducted on the collected disturbance samples. Since real-world dynamics are not stationary (for example, friction or conductivity changes with temperature), we expect this to yield a more robust solution to the optimal control problem. To evaluate the approach described above, we choose an inverted pendulum as a benchmark model. The proposed algorithm manages to learn a policy that does not violate the pre-specified safety constraints. Performance improves when an exploration setup is incorporated to make sure that an optimal policy is learned everywhere in the safe set. Finally, we outline some promising directions for future research beyond the scope of this paper.
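The disturbance-update step amounts to regressing model residuals on the state. The sketch below uses a plain zero-mean GP with a squared-exponential kernel, written directly in NumPy. The friction-like residual, the kernel hyperparameters, and the "mean plus two standard deviations" bound are all assumptions for illustration; the paper's exact kernel and bounding rule are not given in the abstract.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(x_test, x_test, **kernel_args))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical disturbance samples: measured minus model-predicted angular
# acceleration of the pendulum, as a function of angular velocity.
rng = np.random.default_rng(0)
omega = np.linspace(-1.0, 1.0, 20)
residual = -0.3 * np.tanh(2.0 * omega) + 0.01 * rng.standard_normal(20)

query = np.array([-0.8, 0.5, 0.8, 3.0])
mean, std = gp_posterior(omega, residual, query)
# One conservative choice of disturbance bound for the safe-set computation.
bound = np.abs(mean) + 2.0 * std
```

The predictive standard deviation grows away from the data (here at 3.0). A bound built from it therefore shrinks the safe set where the model has seen no samples.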

1811.03853 2026-06-04 cs.LG cs.AI cs.SY eess.SY

Sample-Efficient Policy Learning based on Completely Behavior Cloning

Qiming Zou, Ling Wang, Ke Lu, Yu Li

Affiliations: Department of Computer Science and Technology, Harbin Institute of Technology, China; Department of Management Science and Engineering, Anhui University of Technology, China

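AI summary: This paper proposes PLCBC, a policy initialization algorithm based on completely behavior cloning. It converts a model predictive controller into a piecewise affine function and expresses that function with a neural network, achieving training-free complete cloning that improves the efficiency and convergence of policy learning.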

Abstract

Direct policy search is one of the most important algorithms in reinforcement learning. However, learning from scratch needs a large amount of experience data and is prone to poor local optima. In addition, a partially trained policy tends to perform actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy helps the agent learn in the high-reward region of the state space and converge faster and to better policies.
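The central claim is that a piecewise affine control law can be written exactly as a ReLU network, so cloning it needs no training. The toy illustration below assumes a hypothetical saturated linear feedback u = clip(Kx, -1, 1) with three affine pieces, not a law produced by multi-parametric programming. The identity clip(z, lo, hi) = lo + relu(z - lo) - relu(z - hi) gives the network weights directly.

```python
def relu(z):
    return max(0.0, z)


# Hypothetical feedback gain and input bounds for a 2-state system.
K = (-0.8, -1.5)
U_MIN, U_MAX = -1.0, 1.0


def pwa_control(x):
    """Saturated linear feedback u = clip(K x): three affine pieces."""
    z = sum(k * xi for k, xi in zip(K, x))
    return min(U_MAX, max(U_MIN, z))


def relu_network_control(x):
    """The same PWA law as a one-hidden-layer ReLU network, with no training.

    Hidden units: h1 = relu(Kx - U_MIN), h2 = relu(Kx - U_MAX).
    Output:       u  = U_MIN + h1 - h2.
    """
    z = sum(k * xi for k, xi in zip(K, x))
    h1 = relu(z - U_MIN)
    h2 = relu(z - U_MAX)
    return U_MIN + h1 - h2


grid = [(a / 4.0, b / 4.0) for a in range(-8, 9) for b in range(-8, 9)]
```

Explicit MPC laws for real systems usually have many more regions and need larger networks. This sketch only shows why the conversion can be exact.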

1806.06498 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Conditional Affordance Learning for Driving in Urban Environments

Axel Sauer, Nikolay Savinov, Andreas Geiger

Affiliations: Chair of Robotics Science and System Intelligence, Technical University of Munich

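AI summary: This paper proposes a direct perception approach that maps video input, together with high-level directional commands, to intermediate representations suitable for autonomous navigation in complex urban environments. It achieves better goal-directed navigation than existing reinforcement learning and conditional imitation learning methods. It is also the first to handle traffic lights and speed signs using only image-level labels, significantly reducing traffic accidents in simulation.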

Comments Accepted for Conference on Robot Learning (CoRL) 2018

Abstract

Most existing approaches to autonomous driving fall into one of two categories: modular pipelines, which build an extensive model of the environment, and imitation learning approaches, which map images directly to control outputs. A recently proposed third paradigm, direct perception, aims to combine the advantages of both by using a neural network to learn appropriate low-dimensional intermediate representations. However, existing direct perception approaches are restricted to simple highway situations, lacking the ability to navigate intersections, stop at traffic lights or respect speed limits. In this work, we propose a direct perception approach which maps video input to intermediate representations suitable for autonomous navigation in complex urban environments given high-level directional inputs. Compared to state-of-the-art reinforcement and conditional imitation learning approaches, we achieve an improvement of up to 68% in goal-directed navigation on the challenging CARLA simulation benchmark. In addition, our approach is the first to handle traffic lights and speed signs by using image-level labels only, as well as smooth car-following, resulting in a significant reduction of traffic accidents in simulation.

1811.00641 2026-06-04 cs.LG cs.CL cs.NA math.NA stat.ML

Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon

Affiliations: Amazon Alexa AI; Amazon Search Technologies; University of Texas at Austin

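AI summary: This paper proposes an online embedding compression method. It uses low-rank matrix factorization to compress the word embedding layer during training, relieving the memory bottleneck of NLP models, and re-trains on the downstream task to recover accuracy. Experiments show 90% compression on sentence classification tasks, outperforming alternatives such as fixed-point quantization.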

Comments Accepted in Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)

Abstract

Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain accuracy loss without introducing additional latency compared to fixed-point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
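Replacing a V x d embedding matrix with factors of shape V x r and r x d cuts the parameter count from V·d to r·(V + d). The sketch below initializes the factors with a truncated SVD of an existing embedding. That initialization is our assumption; the abstract only says the factorization is applied during training and followed by re-training. With hypothetical sizes V = 10,000 and d = 300, rank r = 29 already removes just over 90% of the parameters.

```python
import numpy as np


def factorize_embedding(E, rank):
    """Split a V x d embedding matrix into A (V x r) and B (r x d) via truncated SVD."""
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    # Split the singular values evenly between the two factors.
    return U[:, :rank] * sqrt_s, sqrt_s[:, None] * Vt[:rank]


def compression_ratio(vocab, dim, rank):
    """Fraction of embedding parameters removed by the factorization."""
    return 1.0 - rank * (vocab + dim) / (vocab * dim)


def embed(token_ids, A, B):
    """Look up factorized embeddings: rows of A projected through B."""
    return A[token_ids] @ B
```

During re-training, A and B would be treated as ordinary trainable layers, so inference costs one extra small matrix multiply per token.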

1811.00426 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Improving the Modularity of AUV Control Systems using Behaviour Trees

Christopher Iliffe Sprague, Özer Özkahraman, Andrea Munafo, Rachel Marlow, Alexander Phillips, Petter Ögren

Affiliations: Robotics, Perception and Learning Lab; Royal Institute of Technology; National Oceanography Centre

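AI summary: This paper shows how behaviour trees can be used to design modular, versatile, and robust control architectures for mission-critical systems, specifically autonomous underwater vehicles (AUVs). It highlights robustness for system safety, versatility for executing many kinds of missions, and the role of modularity in combining robustness with versatility.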

Comments Submitted to 2018 IEEE OES Autonomous Underwater Vehicle Symposium

Abstract

In this paper, we show how behaviour trees (BTs) can be used to design modular, versatile, and robust control architectures for mission-critical systems. In particular, we show this in the context of autonomous underwater vehicles (AUVs). Robustness, in terms of system safety, is important since manual recovery of AUVs is often extremely difficult. Furthermore, versatility is important to be able to execute many different kinds of missions. Finally, modularity is needed to achieve a combination of robustness and versatility, as the complexity of a versatile system needs to be encapsulated in modules in order to create a simple overall structure that enables robustness analysis. The proposed design is illustrated using a typical AUV mission.
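The modularity argument rests on BTs being composed from a few node types with fixed tick semantics. The minimal sketch below uses the standard Sequence and Fallback nodes. The two-branch mission (a battery safety check with a "surface" fallback, followed by a survey) and the 0.2 battery threshold are hypothetical and not taken from the paper.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Condition:
    def __init__(self, name, check):
        self.name, self.check = name, check

    def tick(self, bb):
        return Status.SUCCESS if self.check(bb) else Status.FAILURE


class Action:
    def __init__(self, name, run):
        self.name, self.run = name, run

    def tick(self, bb):
        return self.run(bb)


class Sequence:
    """Ticks children in order; returns the first status that is not SUCCESS."""
    def __init__(self, *children):
        self.children = children

    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; returns the first status that is not FAILURE."""
    def __init__(self, *children):
        self.children = children

    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


def log_action(name):
    """Hypothetical long-running action: records itself on the blackboard."""
    def run(bb):
        bb.setdefault("log", []).append(name)
        return Status.RUNNING
    return run


# Safety subtree first: the survey only runs while the battery check passes.
mission = Sequence(
    Fallback(Condition("battery_ok", lambda bb: bb["battery"] > 0.2),
             Action("surface", log_action("surface"))),
    Action("survey", log_action("survey")),
)
```

The safety Fallback is a self-contained module. It could be prepended unchanged to any other mission subtree, which is the kind of encapsulation the abstract argues makes robustness analysis tractable.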

1810.13087 2026-06-04 cs.RO cs.FL cs.SY eess.SY math.OC

Multirobot Coordination with Counting Temporal Logics

Yunus Emre Sahin, Petter Nilsson, Necmiye Ozay

Affiliations: Department of Electrical Engineering and Computer Science, University of Michigan; Department of Mechanical and Civil Engineering, California Institute of Technology

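AI summary: This paper proposes a multirobot coordination method based on counting temporal logics. An optimization-based algorithm generates trajectories that are guaranteed to satisfy a given logic formula when executed synchronously. When all robots have identical dynamics, planning under counting linear temporal logic (cLTL) can be solved more efficiently. The paper also discusses preserving satisfaction of the specification under asynchronous execution and generating robust trajectories.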

Comments Under submission for a journal

Abstract

In many multirobot applications, planning trajectories in a way that guarantees that the collective behavior of the robots satisfies a certain high-level specification is crucial. Motivated by this problem, we introduce counting temporal logics: formal languages that enable concise expression of multirobot task specifications over possibly infinite horizons. We first introduce a general logic called counting linear temporal logic plus (cLTL+), and propose an optimization-based method that generates individual trajectories such that satisfaction of a given cLTL+ formula is guaranteed when these trajectories are synchronously executed. We then introduce a fragment of cLTL+, called counting linear temporal logic (cLTL), and show that a solution to the planning problem with cLTL constraints can be obtained more efficiently if all robots have identical dynamics. In the second part of the paper, we relax the synchrony assumption and discuss how to generate trajectories that can be asynchronously executed while preserving the satisfaction of the desired cLTL+ specification. In particular, we show that when the asynchrony between robots is bounded, the method presented in this paper can be modified to generate robust trajectories. We demonstrate these ideas with an experiment and provide numerical results that showcase the scalability of the method.
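Counting specifications constrain how many robots satisfy a proposition, not which ones. The sketch below evaluates two such properties, "always at least m robots in a region" and "eventually at least m robots in a region", on finite, synchronously executed trajectories. It is only a checker on a finite trace with hypothetical cell labels. The paper's method instead synthesizes trajectories through an optimization encoding and handles infinite horizons.

```python
def count_in(states, region):
    """Number of robots whose current cell lies in region."""
    return sum(1 for s in states if s in region)


def snapshots(trajectories):
    """Joint states at each time step, assuming synchronous execution."""
    return list(zip(*trajectories))


def always_at_least(trajectories, region, m):
    """Finite-horizon check of G(at least m robots in region)."""
    return all(count_in(snap, region) >= m for snap in snapshots(trajectories))


def eventually_at_least(trajectories, region, m):
    """Finite-horizon check of F(at least m robots in region)."""
    return any(count_in(snap, region) >= m for snap in snapshots(trajectories))


# Hypothetical grid cells visited by three robots over three time steps.
robots = [["A", "A", "C"], ["B", "A", "A"], ["C", "C", "A"]]
```

Relaxing synchrony breaks the `zip` in `snapshots`: robots no longer reach time step t together, which is why the paper needs the additional robustness argument for bounded asynchrony.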

1810.13072 2026-06-04 cs.AI cs.RO cs.SY eess.SY

Formal Verification of Neural Network Controlled Autonomous Systems

Xiaowu Sun, Haitham Khedr, Yasser Shoukry

Affiliations: Department of Electrical and Computer Engineering, University of Maryland, College Park

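AI summary: This paper studies the core problem of formally verifying the safety of an autonomous robot whose neural network controller processes LiDAR images. It constructs a finite state abstraction and uses reachability analysis to compute safe initial conditions. A polynomial-time algorithm partitions the workspace and computes the corresponding affine imaging functions, and an SMC encoding is used to analyze the network's behavior. Numerical simulations demonstrate the algorithms' efficiency.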

Abstract

In this paper, we consider the problem of formally verifying the safety of an autonomous robot equipped with a Neural Network (NN) controller that processes LiDAR images to produce control actions. Given a workspace that is characterized by a set of polytopic obstacles, our objective is to compute the set of safe initial conditions such that a robot trajectory starting from these initial conditions is guaranteed to avoid the obstacles. Our approach is to construct a finite state abstraction of the system and use standard reachability analysis over the finite state abstraction to compute the set of the safe initial states. The first technical problem in computing the finite state abstraction is to mathematically model the imaging function that maps the robot position to the LiDAR image. To that end, we introduce the notion of imaging-adapted sets as partitions of the workspace in which the imaging function is guaranteed to be affine. We develop a polynomial-time algorithm to partition the workspace into imaging-adapted sets along with computing the corresponding affine imaging functions. Given this workspace partitioning, discrete-time linear dynamics of the robot, and a pre-trained NN controller with Rectified Linear Unit (ReLU) nonlinearity, the second technical challenge is to analyze the behavior of the neural network. To that end, we utilize a Satisfiability Modulo Convex (SMC) encoding to enumerate all the possible segments of different ReLUs. SMC solvers then use a Boolean satisfiability solver and a convex programming solver to decompose the problem into smaller subproblems. To accelerate this process, we develop a pre-processing algorithm that can rapidly prune the space of feasible ReLU segments. Finally, we demonstrate the efficiency of the proposed algorithms using numerical simulations with increasing complexity of the neural network controller.
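The reason ReLU "segments" matter: once the on/off pattern of every ReLU is fixed, the network is an affine map on the region where that pattern holds, so each pattern can be reasoned about with convex constraints. The sketch below makes this concrete for a hypothetical 2-3-1 network by brute-force sampling of a grid. It is not the paper's SMC encoding, which explores patterns symbolically and prunes infeasible ones.

```python
# Hypothetical 2-input, 3-hidden-unit, 1-output ReLU controller.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.5, 0.3]]
b1 = [0.1, -0.2, 0.4]
W2 = [[1.0, -2.0, 0.5]]
b2 = [0.05]


def preact(x):
    """Hidden-layer pre-activations W1 x + b1."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def relu_net(x):
    h = [max(0.0, z) for z in preact(x)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]


def activation_pattern(x):
    """Which ReLUs are active (pre-activation > 0) at input x."""
    return tuple(z > 0.0 for z in preact(x))


def affine_piece(pattern):
    """(A, c) such that relu_net(x) == A x + c wherever this pattern holds."""
    n_in, n_hidden = len(W1[0]), len(W1)
    A = [[sum(W2[i][k] * W1[k][j] for k in range(n_hidden) if pattern[k])
          for j in range(n_in)] for i in range(len(W2))]
    c = [sum(W2[i][k] * b1[k] for k in range(n_hidden) if pattern[k]) + b2[i]
         for i in range(len(W2))]
    return A, c


def eval_affine(A, c, x):
    return [sum(a * xi for a, xi in zip(row, x)) + ci for row, ci in zip(A, c)]


# Distinct activation patterns met on a grid over [-2, 2]^2 (at most 2**3).
grid = [(i / 10.0, j / 10.0) for i in range(-20, 21) for j in range(-20, 21)]
patterns_seen = {activation_pattern(x) for x in grid}
```

Each pattern corresponds to a polyhedral region of inputs. The number of patterns grows quickly with network width, which is why pruning infeasible segments pays off.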

1706.08932 2026-06-04 cs.RO cs.SY eess.SY

Iterative Sequential Action Control for Stable, Model-Based Control of Nonlinear Systems

Emmanouil Tzorakoleftherakis, Todd Murphey

Affiliations: Neuroscience and Robotics Laboratory (NxR)

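AI summary: This paper proposes iterative Sequential Action Control (iSAC) for controlling nonlinear systems. The method iteratively updates constant control values between time steps to achieve closed-loop asymptotic stability, and examines the effect of asymptotically decaying disturbances on system trajectories.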

Journal ref
IEEE Transactions on Automatic Control, 2018

Abstract

This paper presents iterative Sequential Action Control (iSAC), a receding horizon approach for control of nonlinear systems. The iSAC method has a closed-form open-loop solution, which is iteratively updated between time steps by introducing constant control values applied for short duration. Application of a contractive constraint on the cost is shown to lead to closed-loop asymptotic stability under mild assumptions. The effect of asymptotically decaying disturbances on system trajectories is also examined. To demonstrate the applicability of iSAC to a variety of systems and conditions, we employ five different systems, including a 13-dimensional quaternion-based quadrotor. Each system is tested in different scenarios, ranging from feasible and infeasible trajectory tracking, to setpoint stabilization, with or without the presence of external disturbances. Finally, limitations of this work are discussed.

1810.12429 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

Affiliations: The University of Texas at Austin; Google Brain

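AI summary: This paper proposes a new off-policy estimation method that applies importance sampling directly to stationary state-visitation distributions, avoiding the exploding variance of existing estimators. Its core contribution is a new way to estimate the density ratio of two stationary distributions, with a closed-form solution derived for the RKHS case.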

Comments 21 pages, 5 figures, NIPS 2018 (spotlight)

Abstract

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from an excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the case of RKHS. We support our method with both theoretical and empirical analyses.
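The contrast in the abstract can be shown in a few lines. A trajectory-wise IS weight is a product of per-step ratios π(a|s)/π0(a|s), so it grows or shrinks exponentially with the horizon. The stationary approach instead weights each transition by w(s)·π(a|s)/π0(a|s), where w(s) = d_π(s)/d_π0(s). In the sketch below, w is assumed to be given; the paper's actual contribution, learning w with a mini-max loss, is not shown. The one-state, two-action example is hypothetical.

```python
def trajectory_is_weight(traj, pi, pi0):
    """Classic trajectory-wise IS weight: a product of per-step action ratios."""
    weight = 1.0
    for s, a, _ in traj:
        weight *= pi(a, s) / pi0(a, s)
    return weight


def stationary_is_estimate(transitions, w, pi, pi0):
    """Self-normalized average-reward estimate using stationary density ratios.

    transitions: (state, action, reward) samples collected under pi0.
    w(s): assumed-known ratio d_pi(s) / d_pi0(s) of stationary state distributions.
    """
    num = den = 0.0
    for s, a, r in transitions:
        weight = w(s) * pi(a, s) / pi0(a, s)
        num += weight * r
        den += weight
    return num / den


# Hypothetical single-state problem: reward equals the chosen action (0 or 1).
pi = lambda a, s: 0.9 if a == 1 else 0.1    # target policy
pi0 = lambda a, s: 0.5                      # uniform behavior policy
data = [(0, 0, 0.0), (0, 1, 1.0)] * 50
estimate = stationary_is_estimate(data, lambda s: 1.0, pi, pi0)
long_weight = trajectory_is_weight([(0, 1, 1.0)] * 50, pi, pi0)
```

Here the stationary estimate recovers the target policy's expected reward of 0.9, while a single 50-step trajectory weight is already above 10^12.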

1802.00285 2026-06-04 cs.CV cs.RO cs.SY eess.SY

Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Zhang-Wei Hong, Chen Yu-Ming, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Hsuan-Kung Yang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Yueh-Chuan Chang, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Chun-Yi Lee

Affiliations: Elsa Lab; Department of Computer Science; National Tsing Hua University

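AI summary: This paper proposes a modular architecture that combines a perception module with a control policy module, using semantic image segmentation as the meta representation to tackle the virtual-to-real transfer problem. It demonstrates superior performance on obstacle avoidance and target following tasks.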

Comments 7 pages, accepted by IJCAI-18

Abstract

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image to semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and learns faster than they do. We also present a detailed analysis of a variety of configurations, and validate the transferability of our modular architecture.

1810.09729 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Design Challenges of Multi-UAV Systems in Cyber-Physical Applications: A Comprehensive Survey, and Future Directions

Reza Shakeri, Mohammed Ali Al-Garadi, Ahmed Badawy, Amr Mohamed, Tamer Khattab, Abdulla Al-Ali, Khaled A. Harras, Mohsen Guizani

Affiliations: Carnegie Mellon University Qatar Campus

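AI summary: This paper surveys the key design challenges of multi-UAV systems in cyber-physical applications. It covers core approaches such as coverage and tracking of targets and infrastructure objects, energy-efficient navigation, and machine-learning-based image analysis, and presents advanced algorithms and future research directions for fine-grained cyber-physical applications.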

Abstract

Unmanned Aerial Vehicles (UAVs) have grown rapidly in recent years, facilitating a wide range of innovative applications that can fundamentally change the way cyber-physical systems (CPSs) are designed. CPSs are a modern generation of systems with synergic cooperation between computational and physical potentials that can interact with humans through several new mechanisms. The main advantages of using UAVs in CPS applications are their exceptional features, including their mobility, dynamism, effortless deployment, adaptive altitude, agility, adjustability, and effective appraisal of real-world functions anytime and anywhere. Furthermore, from the technology perspective, UAVs are predicted to be a vital element of the development of advanced CPSs. Therefore, in this survey, we aim to pinpoint the most fundamental and important design challenges of multi-UAV systems for CPS applications. We highlight key and versatile aspects that span the coverage and tracking of targets and infrastructure objects, energy-efficient navigation, and image analysis using machine learning for fine-grained CPS applications. Key prototypes and testbeds are also investigated to show how these practical technologies can facilitate CPS applications. We present and propose state-of-the-art algorithms to address design challenges with both quantitative and qualitative methods, and map these challenges to important CPS applications to draw insightful conclusions on the challenges of each application. Finally, we summarize potential new directions and ideas that could shape future research in these areas.

1706.05104 2026-06-04 cs.RO cs.ET cs.SY eess.SY

Personal Food Computer: A new device for controlled-environment agriculture

Eduardo Castelló Ferrer, Jake Rye, Gordon Brander, Tim Savas, Douglas Chambers, Hildreth England, Caleb Harper

Affiliations: MIT Media Lab

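AI summary: This paper presents the OpenAg Personal Food Computer (PFC), a low-cost desktop platform intended as a tool for plant phenology researchers, hobbyists, makers, and K-12 teachers, supporting collective data sharing and plant growth analysis.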

Comments 9 pages, 11 figures, Accepted at the 2017 Future Technologies Conference (FTC)

Abstract

Due to their interdisciplinary nature, devices for controlled-environment agriculture have the potential to become ideal tools not only for conducting research on plant phenology but also for creating curricula in a wide range of disciplines. Controlled-environment devices are increasing their functionalities as well as improving their accessibility. Traditionally, building one of these devices from scratch requires knowledge in fields such as mechanical engineering, digital electronics, programming, and energy management. However, the requirements of an effective controlled-environment device for personal use bring new constraints and challenges. This paper presents the OpenAg Personal Food Computer (PFC), a low-cost, desktop-size platform that targets not only plant phenology researchers but also hobbyists, makers, and teachers from elementary to high-school levels (K-12). The PFC is completely open-source, and it is intended to become a tool that can be used for collective data sharing and plant growth analysis. Thanks to its modular design, the PFC can be used in a large spectrum of activities.

1810.09365 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning

Guillaume Devineau, Philip Polack, Florent Altché, Fabien Moutarde

Affiliations: Center for Robotics, MINES ParisTech; PSL Research University

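AI summary: This paper investigates the ability of deep neural networks to capture key characteristics of vehicle dynamics and to perform coupled longitudinal and lateral control. Two artificial neural networks, a multi-layer perceptron and a convolutional neural network, are trained on a dataset of high-fidelity vehicle dynamics simulations. They are then evaluated on a challenging test track and compared with conventional decoupled controllers.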

Comments Published in the IEEE 2018 International Conference on Intelligent Transportation Systems (ITSC 2018)

Abstract

This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics, and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, control inputs are chosen as the steering angle of the front wheels and the applied torque on each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track, shifting between long straight lines and tight curves. A comparison to conventional decoupled controllers on the same track is also provided.

1703.09971 2026-06-04 cs.CV cs.NA math.DS math.NA

A Geometric Framework for Stochastic Shape Analysis

Alexis Arnaudon, Darryl D. Holm, Stefan Sommer

Affiliations: Department of Mathematics, Imperial College; Department of Computer Science (DIKU), University of Copenhagen

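AI summary: This paper proposes a geometric framework based on stochastic diffeomorphisms for analyzing the evolution of shapes, images, and landmarks. It studies the stochastic evolution through the Fokker-Planck equation and numerical simulations, and proposes two methods for parameter inference.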

Abstract

We introduce a stochastic model of diffeomorphisms, whose action on a variety of data types descends to stochastic evolution of shapes, images and landmarks. The stochasticity is introduced in the vector field which transports the data in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework for shape analysis and image registration. The stochasticity thereby models errors or uncertainties of the flow in following the prescribed deformation velocity. The approach is illustrated in the example of finite dimensional landmark manifolds, whose stochastic evolution is studied both via the Fokker-Planck equation and by numerical simulations. We derive two approaches for inferring parameters of the stochastic model from landmark configurations observed at discrete time points. The first of the two approaches matches moments of the Fokker-Planck equation to sample moments of the data, while the second approach employs an Expectation-Maximisation based algorithm using a Monte Carlo bridge sampling scheme to optimise the data likelihood. We derive and numerically test the ability of the two approaches to infer the spatial correlation length of the underlying noise.

1506.02438 2026-06-04 cs.LG cs.RO cs.SY eess.SY

High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

Affiliations: Department of Electrical Engineering and Computer Science, University of California, Berkeley

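AI summary: This paper proposes generalized advantage estimation, which reduces the variance of policy gradient estimates to address the sample requirements of high-dimensional continuous control. It uses trust region optimization to improve stability and convergence, enabling efficient policy learning on challenging 3D locomotion tasks.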

Abstract

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(lambda). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D locomotion tasks, learning running gaits for bipedal and quadrupedal simulated robots, and learning a policy for getting the biped to stand up from starting out lying on the ground. In contrast to a body of prior work that uses hand-crafted policy representations, our neural network policies map directly from raw kinematics to joint torques. Our algorithm is fully model-free, and the amount of simulated experience required for the learning tasks on 3D bipeds corresponds to 1-2 weeks of real time.
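The exponentially-weighted advantage estimator mentioned above is usually written in terms of TD residuals, δ_t = r_t + γV(s_{t+1}) - V(s_t), and A_t = Σ_l (γλ)^l δ_{t+l}. It can be computed in one backward pass over a trajectory. The sketch below is a minimal single-trajectory version. Batching, episode-termination masks, and the trust region update are left out.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    values has len(rewards) + 1 entries; the last is the bootstrap value
    V(s_T), which should be 0.0 if the episode terminated.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum of residuals
        advantages[t] = running
    return advantages
```

The two extremes show the bias-variance trade-off. With lam=0, each advantage is the one-step TD residual (low variance, biased if V is wrong). With lam=1, it is the Monte Carlo return minus the baseline (unbiased, high variance).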

1810.06175 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

An Optimal Control Approach to Sequential Machine Teaching

Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu

Affiliations: University of Wisconsin–Madison

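AI summary: This paper proposes an optimal-control approach to sequential machine teaching. It casts the search for the shortest training sequence that drives a learning algorithm to a target model as a time-optimal control problem. A case study shows that the resulting sequences can outperform existing heuristics.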

Abstract

Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.
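The control view can be made concrete: the learner's weight is the state, each training example is the control input, and one gradient step is the dynamics. The sketch below uses a one-dimensional least-squares learner. In place of the Pontryagin-based analysis, it runs a brute-force search over a small, hypothetical finite set of examples. The search finds the shortest sequence only within that discretized control set. It illustrates the time-optimal formulation and is not the paper's method.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Example = Tuple[float, float]  # a teaching example (x, y)


def gd_step(w: float, ex: Example, eta: float) -> float:
    """Learner dynamics: one gradient step on the loss 0.5 * (w * x - y) ** 2."""
    x, y = ex
    return w - eta * (w * x - y) * x


def shortest_teaching_sequence(w0: float, target: float,
                               controls: Sequence[Example],
                               eta: float = 0.5, tol: float = 1e-9,
                               max_len: int = 6) -> Optional[List[Example]]:
    """Return a shortest example sequence driving w0 to within tol of target.

    Sequences are enumerated by increasing length, so the first hit is
    time-optimal with respect to the finite control set (the cost grows as
    len(controls) ** max_len, so this only suits toy problems).
    """
    if abs(w0 - target) <= tol:
        return []
    for length in range(1, max_len + 1):
        for seq in product(controls, repeat=length):
            w = w0
            for ex in seq:
                w = gd_step(w, ex, eta)
            if abs(w - target) <= tol:
                return list(seq)
    return None  # not reachable within max_len steps
```

Consider `controls = [(x, y) for x in (0.5, 1.0) for y in (-1.0, 0.0, 1.0)]` and `eta=0.5`, starting from `w0=0.0`. For the target `0.75`, no single example suffices, and the search returns `[(1.0, 1.0), (1.0, 1.0)]`.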

1810.05683 2026-06-04 cs.RO cs.SY eess.SY

Long-Duration Autonomy for Small Rotorcraft UAS including Recharging

Christian Brommer, Danylo Malyuta, Daniel Hentzen, Roland Brockers

Affiliations: Autonomous Controls Laboratory, University of Washington; Jet Propulsion Laboratory, California Institute of Technology

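AI summary: This work presents a fully autonomous small rotorcraft UAS that performs long-term observation missions without human intervention. It combines full platform autonomy with vision-based precision landing for automated energy replenishment. Experiments demonstrate up to 11 hours of fully autonomous operation in indoor and outdoor environments.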

Comments 7 pages

Abstract

Many unmanned aerial vehicle surveillance and monitoring applications require observations at precise locations over long periods of time, ideally days or weeks at a time (e.g. ecosystem monitoring), which has been impractical due to limited endurance and the requirement of humans in the loop for operation. To overcome these limitations, we propose a fully autonomous small rotorcraft UAS that is capable of performing repeated sorties for long-term observation missions without any human intervention. We address two key technologies that are critical for such a system: full platform autonomy including emergency response to enable mission execution independently from human operators, and the ability to perform vision-based precision landing on a recharging station for automated energy replenishment. Experimental results of up to 11 hours of fully autonomous operation in indoor and outdoor environments illustrate the capability of our system.

1710.06647 2026-06-04 cs.CV cs.NA math.NA

Image Restoration by Iterative Denoising and Backward Projections

Tom Tirer, Raja Giryes

Affiliations: School of Electrical Engineering, Tel Aviv University

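AI summary: This paper proposes an alternative way to solve inverse problems with off-the-shelf denoisers. It transforms a typical cost function into a closely related new optimization problem and introduces an efficient minimization scheme with an automatic parameter-tuning mechanism. The result requires less parameter tuning while remaining competitive on image inpainting and deblurring.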

Comments To appear in IEEE Transactions on Image Processing

Abstract

Inverse problems appear in many applications, such as image deblurring and inpainting. The common approach to address them is to design a specific algorithm for each problem. The Plug-and-Play (P&P) framework, which has been recently introduced, allows solving general inverse problems by leveraging the impressive capabilities of existing denoising algorithms. While this fresh strategy has found many applications, a burdensome parameter tuning is often required in order to obtain high-quality results. In this work, we propose an alternative method for solving inverse problems using off-the-shelf denoisers, which requires less parameter tuning. First, we transform a typical cost function, composed of fidelity and prior terms, into a closely related, novel optimization problem. Then, we propose an efficient minimization scheme with a plug-and-play property, i.e., the prior term is handled solely by a denoising operation. Finally, we present an automatic tuning mechanism to set the method's parameters. We provide a theoretical analysis of the method, and empirically demonstrate its competitiveness with task-specific techniques and the P&P approach for image inpainting and deblurring.
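For inpainting, assume the degradation operator H is a sampling mask. Then H^+ y zero-fills the missing samples, and (I - H^+ H) keeps only the missing positions. An alternation of denoising and backward projection therefore reduces to two steps: denoise the current estimate, then restore the measured samples. The sketch below makes several simplifications. It uses a 1D signal and a boolean observation mask, with a toy box-filter "denoiser" standing in for an off-the-shelf denoiser. The function names are hypothetical, the iteration count is fixed, and the paper's automatic parameter tuning is omitted.

```python
import numpy as np


def box_denoise(z: np.ndarray, width: int = 3) -> np.ndarray:
    """Toy 'denoiser': moving average with edge padding (stand-in for a real
    off-the-shelf denoiser). `width` is assumed odd so the length is preserved."""
    pad = width // 2
    padded = np.pad(z, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")


def idbp_inpaint(y: np.ndarray, mask: np.ndarray, denoise=box_denoise,
                 iters: int = 100) -> np.ndarray:
    """Alternate denoising and backward projection for a sampling-mask operator.

    y    : measured signal (entries where mask is False are ignored)
    mask : boolean array, True where the sample was observed
    """
    # H^+ y: measurements in place, zeros in the holes.
    y_k = np.where(mask, y, 0.0)
    x_hat = y_k
    for _ in range(iters):
        x_hat = denoise(y_k)             # denoising step
        # Backward projection H^+ y + (I - H^+ H) x_hat:
        # keep measured samples, take denoised values elsewhere.
        y_k = np.where(mask, y, x_hat)
    return x_hat
```

The denoiser is passed in as a function, so any other denoiser with the same array-in/array-out shape can be swapped in without changing the loop. That mirrors the plug-and-play property the abstract describes.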

1807.09904 2026-06-04 cs.RO cs.LG cs.SY eess.SY

A Data-Efficient Approach to Precise and Controlled Pushing

Maria Bauza, Francois R. Hogan, Alberto Rodriguez

Affiliations: Department of Mechanical Engineering, Massachusetts Institute of Technology

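AI summary: This paper proposes a data-efficient approach to controlling a complex mechanical system (pushing) with a dynamics model learned from data. It performs complex pushing trajectories with only 10 training data points.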

Comments Maria Bauza and Francois R. Hogan contributed equally to this work. 10 pages, 5 figures

Journal ref
CoRL 2018

Abstract

Decades of research in control theory have shown that simple controllers, when provided with timely feedback, can control complex systems. Pushing is an example of a complex mechanical system that is difficult to model accurately due to unknown system parameters such as coefficients of friction and pressure distributions. In this paper, we explore the data-complexity required for controlling, rather than modeling, such a system. Results show that a model-based control approach, where the dynamical model is learned from data, is capable of performing complex pushing trajectories with a minimal amount of training data (10 data points). The dynamics of pushing interactions are modeled using a Gaussian process (GP) and are leveraged within a model predictive control approach that linearizes the GP and imposes actuator and task constraints for a planar manipulation task.
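A Gaussian process gives both a prediction and its uncertainty from very few samples. Its posterior mean can also be differentiated analytically, which is what a linearizing model predictive controller needs. The numpy sketch below assumes a scalar input and a squared-exponential kernel with hand-picked, fixed hyperparameters. It is a toy regression example, not the paper's pushing-dynamics model, and the class name `GP1D` is hypothetical.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, ell: float, sf: float) -> np.ndarray:
    """Squared-exponential kernel matrix between 1D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return sf ** 2 * np.exp(-0.5 * d ** 2 / ell ** 2)


class GP1D:
    """Exact GP regression on scalar inputs (hyperparameters fixed, not learned)."""

    def __init__(self, X, y, ell=0.5, sf=1.0, noise=1e-2):
        self.X, self.ell, self.sf = np.asarray(X, float), ell, sf
        K = rbf(self.X, self.X, ell, sf) + noise ** 2 * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed through the Cholesky factor K = L L^T
        self.alpha = np.linalg.solve(self.L.T,
                                     np.linalg.solve(self.L, np.asarray(y, float)))

    def predict(self, xs):
        """Posterior mean and variance of the latent function at inputs xs."""
        xs = np.asarray(xs, float)
        Ks = rbf(xs, self.X, self.ell, self.sf)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.sf ** 2 - np.sum(v ** 2, axis=0)
        return mean, var

    def mean_grad(self, x: float) -> float:
        """d(mean)/dx at scalar x: the local linearization a linearizing MPC would use."""
        ks = rbf(np.array([x]), self.X, self.ell, self.sf)[0]
        dks = -ks * (x - self.X) / self.ell ** 2
        return float(dks @ self.alpha)
```

With 10 training points, the posterior variance is small near the data and returns to the prior variance `sf ** 2` far from it. A controller can use that signal to stay where the learned model is trustworthy.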

1803.01940 2026-06-04 cs.RO cs.SY eess.SY

Tactile Regrasp: Grasp Adjustments via Simulated Tactile Transformations

Francois R. Hogan, Maria Bauza, Oleguer Canal, Elliott Donlon, Alberto Rodriguez

Affiliations: Massachusetts Institute of Technology

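AI summary: This paper proposes a regrasp control policy that uses tactile sensing to plan local grasp adjustments. The policy virtually searches over local transformations of tactile measurements that improve grasp quality. First, a deep convolutional neural network trained on over 2800 grasps serves as a tactile-based grasp quality metric. Each grasp's quality is a continuous value between 0 and 1, determined experimentally by measuring its resistance to external perturbations. Second, rigid-body transformations of the tactile measurements simulate the imprints associated with robot motions relative to the initial grasp. These imprints are scored by the learned network, and the regrasp action that maximizes grasp quality is selected. The network predicts grasp outcomes with an average accuracy of 85% on known objects and 75% on a cross-validation set of 12 objects. The regrasp policy improves the success rate of grasp actions by an average relative increase of 70% on a test set of 8 objects.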

Comments Francois R. Hogan and Maria Bauza contributed equally to this work. 8 pages, 7 figures

Journal ref
IROS 2018

Abstract

This paper presents a novel regrasp control policy that makes use of tactile sensing to plan local grasp adjustments. Our approach determines regrasp actions by virtually searching for local transformations of tactile measurements that improve the quality of the grasp. First, we construct a tactile-based grasp quality metric using a deep convolutional neural network trained on over 2800 grasps. The quality of each grasp, a continuous value between 0 and 1, is determined experimentally by measuring its resistance to external perturbations. Second, we simulate the tactile imprints associated with robot motions relative to the initial grasp by performing rigid-body transformations of the given tactile measurements. The newly generated tactile imprints are evaluated with the learned grasp quality network and the regrasp action is chosen to maximize the grasp quality. Results show that the grasp quality network can predict the outcome of grasps with an average accuracy of 85% on known objects and 75% on a cross validation set of 12 objects. The regrasp control policy improves the success rate of grasp actions by an average relative increase of 70% on a test set of 8 objects.
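The regrasp search works in three steps: transform the current tactile imprint as if the gripper had moved, score each transformed imprint, and pick the best motion. The sketch below simplifies this under stated assumptions. It considers integer pixel translations only, whereas the abstract describes general rigid-body transformations. A hypothetical `centroid_quality` score stands in for the learned CNN grasp-quality network; it rewards a pressure centroid close to the sensor center.

```python
import numpy as np


def shift_image(img: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Translate a 2D tactile image by dx columns (right) and dy rows (down), zero-filling."""
    h, w = img.shape
    if abs(dx) >= w or abs(dy) >= h:
        return np.zeros_like(img)
    out = np.zeros_like(img)
    src_rows = slice(max(0, -dy), h - max(0, dy))
    dst_rows = slice(max(0, dy), h - max(0, -dy))
    src_cols = slice(max(0, -dx), w - max(0, dx))
    dst_cols = slice(max(0, dx), w - max(0, -dx))
    out[dst_rows, dst_cols] = img[src_rows, src_cols]
    return out


def centroid_quality(img: np.ndarray) -> float:
    """Hypothetical stand-in for a learned grasp-quality network:
    higher (closer to 0) when the pressure centroid is near the sensor center."""
    total = img.sum()
    if total <= 0:
        return -np.inf
    rows, cols = np.indices(img.shape)
    cy = (rows * img).sum() / total
    cx = (cols * img).sum() / total
    center_y = (img.shape[0] - 1) / 2
    center_x = (img.shape[1] - 1) / 2
    return -float(np.hypot(cy - center_y, cx - center_x))


def choose_regrasp(img, quality=centroid_quality, max_shift=3):
    """Score every candidate translation of the imprint; return (score, dx, dy) of the best."""
    best = None
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            score = quality(shift_image(img, dx, dy))
            if best is None or score > best[0]:
                best = (score, dx, dy)
    return best
```

Because `quality` is a parameter, a trained classifier could replace the stand-in score without changing the search loop.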

1802.04205 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Efficient Hierarchical Robot Motion Planning Under Uncertainty and Hybrid Dynamics

Ajinkya Jain, Scott Niekum

Affiliations: Department of Mechanical Engineering; Department of Computer Science; University of Texas at Austin, USA

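AI summary: This paper proposes a hierarchical POMDP planner that generates cost-optimized motion plans for hybrid dynamics models under uncertainty. The planner decomposes nonlinear dynamics into a discrete set of local dynamics models and sequences visits to those models to reduce state uncertainty effectively.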

Comments 2nd Conference on Robot Learning (CoRL 2018), Zurich, Switzerland

Abstract

Noisy observations coupled with nonlinear dynamics pose one of the biggest challenges in robot motion planning. By decomposing nonlinear dynamics into a discrete set of local dynamics models, hybrid dynamics provide a natural way to model nonlinear dynamics, especially in systems with sudden discontinuities in dynamics due to factors such as contacts. We propose a hierarchical POMDP planner that develops cost-optimized motion plans for hybrid dynamics models. The hierarchical planner first develops a high-level motion plan to sequence the local dynamics models to be visited and then converts it into a detailed continuous state plan. This hierarchical planning approach results in a decomposition of the POMDP planning problem into smaller sub-parts that can be solved with significantly lower computational costs. The ability to sequence the visitation of local dynamics models also provides a powerful way to leverage the hybrid dynamics to reduce state uncertainty. We evaluate the proposed planner on a navigation task in the simulated domain and on an assembly task with a robotic manipulator, showing that our approach can solve tasks having high observation noise and nonlinear dynamics effectively with significantly lower computational costs compared to direct planning approaches.

1810.03074 2026-06-04 cs.RO cs.SY eess.SY

Hierarchical Optimization for Whole-Body Control of Wheeled Inverted Pendulum Humanoids

Munzir Zafar, Seth Hutchinson, Evangelos A. Theodorou

Affiliations: Institute of Robotics and Intelligent Machines

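AI summary: This paper proposes a whole-body control framework for wheeled inverted pendulum (WIP) humanoid robots. A hierarchical optimization executes multiple tasks simultaneously at specified priorities while respecting joint angle and torque limits.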

Abstract

In this paper, we present a whole-body control framework for Wheeled Inverted Pendulum (WIP) Humanoids. WIP Humanoids are redundant manipulators dynamically balancing themselves on wheels. Characterized by several degrees of freedom, they have the ability to perform several tasks simultaneously, such as balancing, maintaining a body pose, controlling the gaze, lifting a load or maintaining end-effector configuration in operational space. The problem of whole-body control is to enable simultaneous performance of these tasks with optimal participation of all degrees of freedom at specified priorities for each objective. The control also has to obey angle and torque limits on each joint. The proposed approach is hierarchical, with a low-level controller for manipulating the body joints and a high-level controller that defines center of mass (CoM) targets for the low-level controller to control the zero dynamics of the system driving the wheels. The low-level controller plans over shorter horizons while considering more complete dynamics of the system, while the high-level controller plans over a longer horizon based on an approximate model of the robot for computational efficiency.
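The classical null-space projection for two velocity-level tasks illustrates what "at specified priorities" means: the secondary task is pursued only within the null space of the primary one. This is a swapped-in textbook technique shown for intuition, not the paper's hierarchical optimization. It ignores the balancing dynamics and the joint angle and torque limits that the paper enforces. The Jacobians in the usage example are made up.

```python
import numpy as np


def prioritized_velocities(J1, dx1, J2, dx2):
    """Joint velocities that track task 1 (if feasible) and task 2 as well as
    possible without disturbing task 1 (classical null-space projection)."""
    J1_pinv = np.linalg.pinv(J1)
    dq1 = J1_pinv @ dx1
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1   # projector onto the null space of J1
    # Correct the task-2 residual using only motions that leave task 1 unaffected.
    dq2 = N1 @ np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ dq1)
    return dq1 + dq2
```

Consider the 3-DOF example `J1 = [[1, 0, 0]]`, `J2 = [[1, 1, 0]]`, with targets `dx1 = [1]` and `dx2 = [3]`. Both tasks are met with `dq = [1, 2, 0]`. If the two tasks conflict, the primary task still holds and the secondary one is only partially satisfied.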

1810.00527 2026-06-04 cs.RO cs.SY eess.SY

Safe Adaptive Switching among Dynamical Movement Primitives: Application to 3D Limit-Cycle Walkers

Sushant Veer, Ioannis Poulakakis

Affiliations: Department of Mechanical Engineering, University of Delaware

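AI summary: This paper proposes a framework for safely switching among dynamical movement primitives to generate robot motion plans that remain safe to execute under external disturbances. The framework is applied to a 3D limit-cycle walking robot that adapts to persistent external forces.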

Abstract

Complex motions for robots are frequently generated by switching among a collection of individual movement primitives. We use this approach to formulate robot motion plans as sequences of primitives to be executed one after the other. When dealing with dynamical movement primitives, besides accomplishing the high-level objective, planners must also reason about the effect of the plan's execution on the safety of the platform. This task becomes more daunting in the presence of disturbances, such as external forces. To alleviate this issue, we present a framework that builds on rigorous control-theoretic tools to generate safely-executable motion plans for externally excited robotic systems. Our framework is illustrated on a 3D limit-cycle gait bipedal robot that adapts its walking pattern to persistent external forcing.

1809.09261 2026-06-04 cs.AI cs.LG cs.SY eess.SY

Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting

Aleksandra Faust, James B. Aimone, Conrad D. James, Lydia Tapia

Affiliations: Google Brain, Mountain View, CA, USA; Sandia National Labs, Albuquerque, NM, USA

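AI summary: This paper models computation as a feedback control problem and solves the resulting sequential decision-making problem with reinforcement learning. A case study on array sorting shows how this resilient-computing approach overcomes some limitations of standard procedural programming, namely lack of resilience to errors and early program termination.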

Comments 11 pages, accepted to CDC 2018. This version includes additional evaluations.

Abstract

Robots and autonomous agents often complete goal-based tasks with limited resources, relying on imperfect models and sensor measurements. In particular, reinforcement learning (RL) and feedback control can be used to help a robot achieve a goal. Taking advantage of this body of work, this paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural language programming: resilience to errors and early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. Computation then becomes a sequential decision making problem, solved with reinforcement learning (RL), and analyzed with Lyapunov stability theory to assess the agent's resilience and progression to the goal. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.
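Sorting makes the framing concrete: computation is a trajectory, and the number of inversions serves as a Lyapunov-style progress measure that reaches zero exactly when the array is sorted. The sketch below is not the paper's learned RL agent. It uses a hypothetical random adjacent-swap policy driven by a possibly faulty comparator. An exact inversion count monitors progress and decides when to stop, which is an assumption for illustration.

```python
import random
from typing import List, Tuple


def inversions(a: List[int]) -> int:
    """Lyapunov-style measure: number of out-of-order pairs (0 iff sorted)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def sort_with_feedback(a: List[int], p_fault: float = 0.0, seed: int = 0,
                       max_steps: int = 20000) -> Tuple[List[int], List[int]]:
    """Repeatedly pick a random adjacent pair and swap it if a (possibly faulty)
    comparator reports it out of order. Returns the final array and the
    inversion count after every step."""
    rng = random.Random(seed)
    a = list(a)
    trace = [inversions(a)]
    for _ in range(max_steps):
        if trace[-1] == 0:          # goal state reached
            break
        i = rng.randrange(len(a) - 1)
        out_of_order = a[i] > a[i + 1]
        if rng.random() < p_fault:  # faulty comparator flips its answer
            out_of_order = not out_of_order
        if out_of_order:
            a[i], a[i + 1] = a[i + 1], a[i]
        trace.append(inversions(a))
    return a, trace
```

With `p_fault=0`, every swap removes exactly one inversion, so the trace never increases. With a faulty comparator, individual steps can increase the count. For a small array like the 8-element example in the tests, however, the count still drifts down to zero.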

1809.08819 2026-06-04 cs.RO cs.SY eess.SY

Oscillation Damping Control of Pendulum-like Manipulation Platform using Moving Masses

Min Jun Kim, Jianjie Lin, Konstantin Kondak, Dongheui Lee, Christian Ott

Affiliations: Institute of Robotics and Mechatronics, German Aerospace Center (DLR), Wessling, Germany; Chair of Automatic Control Engineering, Technical University of Munich (TUM), Munich, Germany; Fortiss Institute, Munich, Germany

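AI summary: This paper proposes damping the oscillations of a pendulum-like hanging platform that carries a robotic manipulator by installing moving masses on the platform. Properly designing the reference accelerations of the moving masses achieves asymptotic stability of both the platform and the moving masses, despite the challenges posed by underactuation.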

Comments IFAC Symposium on Robot Control (SYROCO) 2018

Abstract

This paper presents an approach to damp out the oscillatory motion of the pendulum-like hanging platform on which a robotic manipulator is mounted. To this end, moving masses were installed on top of the platform. In this paper, asymptotic stability of the platform (which implies oscillation damping) is achieved by properly designing the reference accelerations of the moving masses. A main feature of this work is that we can achieve asymptotic stability of not only the platform, but also the moving masses, which may be challenging due to the underactuated nature of the system. The proposed scheme is validated through simulation studies.

1809.06401 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Hidden Markov Model Estimation-Based Q-learning for Partially Observable Markov Decision Process

Hyung-Jin Yoon, Donghwan Lee, Naira Hovakimyan

Affiliations: Department of Industrial and Enterprise Systems Engineering

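AI summary: This paper proposes an online Q-learning algorithm based on hidden Markov model (HMM) estimation for partially observable Markov decision processes (POMDPs). The algorithm estimates the POMDP parameters and the Q function concurrently, and the paper proves its convergence.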

Abstract

The objective is to study an on-line Hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision process (POMDP) on finite state and action sets. When the full state observation is available, Q-learning finds the optimal action-value function given the current action (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate the POMDP estimation as an HMM estimation problem and propose a recursive algorithm to estimate both the POMDP parameter and Q function concurrently. Also, we show that the POMDP estimation converges to a set of stationary points for the maximum likelihood estimate, and the Q function estimation converges to a fixed point that satisfies the Bellman optimality equation weighted on the invariant distribution of the state belief determined by the HMM estimation process.
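Two ingredients of such an approach can be sketched compactly. An HMM forward (belief) update turns actions and observations into a state belief, and a Q-learning update then operates on belief-weighted Q values. The sketch assumes the transition tensor `T[a]` and observation matrix `O` are already known; the paper estimates them online, which is omitted here. The linear-in-belief Q representation and the function names are illustrative assumptions.

```python
import numpy as np


def belief_update(b, a, o, T, O):
    """HMM forward step: b'(s') is proportional to O[s', o] * sum_s T[a][s, s'] * b(s).

    T[a][s, s'] = P(s' | s, a) and O[s', o] = P(o | s') are assumed known.
    """
    predicted = T[a].T @ b
    unnormalized = O[:, o] * predicted
    return unnormalized / unnormalized.sum()


def q_update(q, b, a, r, b_next, gamma=0.9, lr=0.1):
    """Q-learning step on Q(b, a) = sum_s b(s) q[s, a]; returns an updated copy."""
    q = q.copy()
    target = r + gamma * np.max(b_next @ q)   # greedy value at the next belief
    td_error = target - b @ q[:, a]
    q[:, a] += lr * td_error * b              # credit each state by its belief weight
    return q
```

Consider a fully observable setting (`O` equal to the identity). Each belief is then one-hot, and `q_update` reduces to ordinary tabular Q-learning on the observed state.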

1602.01891 2026-06-04 cs.RO cs.MA cs.SY eess.SY math.OC

Distributed Estimation of State and Parameters in Multi-Agent Cooperative Load Manipulation

Antonio Franchi, Antonio Petitti, Alessandro Rizzo

Affiliations: CNRS, LAAS

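AI summary: This paper proposes two distributed methods for estimating the kinematic parameters, dynamic parameters, and kinematic state of an unknown planar body manipulated by multiple agents. The methods build on rigid-body kinematics and dynamics, nonlinear observation theory, and consensus algorithms. They require only that each agent can exert a 2D wrench on the load and measure the velocity of its contact point, and that the communication graph is connected. Observability analysis and convergence proofs are provided. The first method assumes constant parameters, while the second handles time-varying parameters and can run in parallel with any task-oriented control law. When no control law is given, a distributed and safe control strategy satisfying the observability condition is proposed. Realistic Monte Carlo simulations demonstrate the effectiveness and robustness of the estimation strategy.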

Comments Accepted for publication to the IEEE Transactions on Control of Network Systems

Abstract

We present two distributed methods for the estimation of the kinematic parameters, the dynamic parameters, and the kinematic state of an unknown planar body manipulated by a decentralized multi-agent system. The proposed approaches rely on the rigid body kinematics and dynamics, on nonlinear observation theory, and on consensus algorithms. The only three requirements are that each agent can exert a 2D wrench on the load, that it can measure the velocity of its contact point, and that the communication graph is connected. Both theoretical nonlinear observability analysis and convergence proofs are provided. The first method assumes constant parameters while the second one can deal with time-varying parameters and can be applied in parallel to any task-oriented control law. For the cases in which a control law is not provided, we propose a distributed and safe control strategy satisfying the observability condition. The effectiveness and robustness of the estimation strategy are showcased by means of realistic Monte Carlo simulations.
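The standard discrete-time average-consensus iteration illustrates the consensus building block that such distributed estimators rely on. Each agent repeatedly nudges its local value toward those of its communication neighbors. The sketch assumes an undirected, connected graph and a step size `eps` smaller than 1 / (maximum degree). The local values stand in for per-agent estimates, for example of the load's mass, and are hypothetical.

```python
import numpy as np


def average_consensus(values, neighbors, eps, iters=500):
    """Synchronous update x_i <- x_i + eps * sum_{j in N(i)} (x_j - x_i).

    neighbors: dict mapping agent index -> list of neighbor indices (undirected graph).
    Each agent only uses values from its own neighbors.
    """
    x = np.asarray(values, dtype=float).copy()
    for _ in range(iters):
        x = x + eps * np.array([sum(x[j] - x[i] for j in neighbors[i])
                                for i in range(len(x))])
    return x
```

Every exchange is symmetric, so the sum of the values is preserved. On a connected graph, all agents therefore converge to the initial average. On the 4-agent path graph used in the tests, values `[1, 2, 3, 10]` converge to `4.0`.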

1809.08022 2026-06-04 cs.RO cs.SY eess.SY

The Urban Last Mile Problem: Autonomous Drone Delivery to Your Balcony

Gino Brunner, Bence Szebedy, Simon Tanner, Roger Wattenhofer

Affiliations: Computer Engineering and Networks Laboratory

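AI summary: This paper proposes autonomous last-mile delivery in urban environments using an off-the-shelf drone. The drone flies to the approximate location using GPS, then uses visual navigation to find a precise, non-centralized drop-off point such as a balcony or porch. The code is open-sourced to support future research.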

Abstract

Drone delivery has been a hot topic in the industry in the past few years. However, existing approaches either focus on rural areas or rely on centralized drop-off locations from where the last mile delivery is performed. In this paper we tackle the problem of autonomous last mile delivery in urban environments using an off-the-shelf drone. We build a prototype system that is able to fly to the approximate delivery location using GPS and then find the exact drop-off location using visual navigation. The drop-off location could, e.g., be on a balcony or porch, and simply needs to be indicated by a visual marker on the wall or window. We test our system components in simulated environments, including the visual navigation and collision avoidance. Finally, we deploy our drone in a real-world environment and show how it can find the drop-off point on a balcony. To stimulate future research in this topic we open source our code.

1803.02238 2026-06-04 cs.RO cs.CL cs.SY eess.SY

Precise but Natural Specification for Robot Tasks

Ivan Gavran, Brendon Boldt, Eva Darulova, Rupak Majumdar

Affiliations: Max Planck Institute for Software Systems, Germany

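AI summary: Flipper provides a natural-language interface for specifying high-level robot tasks by combining a formal core language with a semantic parser. It gives immediate visual feedback and lets users extend the language through naturalization, which makes task descriptions easier to write.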

Abstract

We present Flipper, a natural language interface for describing high-level task specifications for robots that are compiled into robot actions. Flipper starts with a formal core language for task planning that allows expressing rich temporal specifications and uses a semantic parser to provide a natural language interface. Flipper provides immediate visual feedback by executing an automatically constructed plan of the task in a graphical user interface. This allows the user to resolve potentially ambiguous interpretations. Flipper extends itself via naturalization: its users can add definitions for utterances, from which Flipper induces new rules and adds them to the core language, gradually growing a more and more natural task specification language. Flipper improves the naturalization by generalizing the definitions provided by users. Unlike other task-specification systems, Flipper enables natural language interactions while maintaining the expressive power and formal precision of a programming language. We show through an initial user study that natural language interactions and generalization can considerably ease the description of tasks. Moreover, over time, users employ more and more concepts outside of the initial core language. Such extensions are available to the Flipper community, and users can use concepts that others have defined.