1904.02341 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Online Risk-Bounded Motion Planning for Autonomous Vehicles in Dynamic Environments

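Xin Huang, Sungkweon Hong, Andreas Hofmann, Brian C. Williams

Affiliations * MIT Computer Science and Artificial Intelligence Laboratory

AI summary This paper proposes an online risk-bounded motion planning method that combines an intent recognition algorithm with a POMDP solver to generate safe and efficient motion plans; it outperforms baseline methods in challenging scenarios such as unprotected left turns and lane changes.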

Comments Accepted at ICAPS'19. 10 pages, 6 figures, 1 table

Abstract

A crucial challenge to efficient and robust motion planning for autonomous vehicles is understanding the intentions of the surrounding agents. Ignoring the intentions of the other agents in dynamic environments can lead to risky or over-conservative plans. In this work, we model the motion planning problem as a partially observable Markov decision process (POMDP) and propose an online system that combines an intent recognition algorithm and a POMDP solver to generate risk-bounded plans for the ego vehicle navigating with a number of dynamic agent vehicles. The intent recognition algorithm predicts the probabilistic hybrid motion states of each agent vehicle over a finite horizon using Bayesian filtering and a library of pre-learned maneuver motion models. We update the POMDP model with the intent recognition results in real time and solve it using a heuristic search algorithm which produces policies with upper-bound guarantees on the probability of near colliding with other dynamic agents. We demonstrate that our system is able to generate better motion plans in terms of efficiency and safety in a number of challenging environments including unprotected intersection left turns and lane changes as compared to the baseline methods.
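
The Bayesian filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a discrete set of maneuver hypotheses and a one-dimensional Gaussian observation model per maneuver; the maneuver names, the `MODELS` parameters, and the observations are hypothetical and are not taken from the paper. The paper's hybrid motion states and pre-learned motion models are richer than this, and the prediction (transition) step is omitted here.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as the observation likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_intent_belief(belief, observed_accel, models):
    """One Bayes update: posterior is proportional to likelihood times prior."""
    unnormalized = {
        m: belief[m] * gaussian_pdf(observed_accel, *models[m]) for m in belief
    }
    total = sum(unnormalized.values())
    return {m: p / total for m, p in unnormalized.items()}


# Hypothetical maneuver library: (mean, std) of longitudinal acceleration in m/s^2.
MODELS = {"yield": (-2.0, 1.0), "proceed": (0.5, 1.0)}

belief = {"yield": 0.5, "proceed": 0.5}  # uniform prior over intents
for accel in [-1.8, -2.3, -1.5]:  # made-up observations of a braking vehicle
    belief = update_intent_belief(belief, accel, MODELS)
```

After three braking observations the belief concentrates on the "yield" hypothesis; a planner could then weight its collision-risk estimate by this belief.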

1904.01214 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Enhancement of Energy-Based Swing-Up Controller via Entropy Search

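Chang Sik Lee, Dong Eui Chang

Affiliations * School of Electrical Engineering, KAIST, Daejeon, Korea.

AI summary This paper uses Entropy Search, a Bayesian optimization method, to improve an energy-based controller for swinging up a rotary inverted pendulum (the Furuta pendulum); experiments show that the resulting controller outperforms a nominal controller across various initial conditions.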

Comments 6 pages, 2019 Asian Control Conference

Abstract

An energy based approach for stabilizing a mechanical system has offered a simple yet powerful control scheme. However, since it does not impose such strong constraints on parameter space of the controller, finding appropriate parameter values for an optimal controller is known to be hard. This paper intends to generate an optimal energy-based controller for swinging up a rotary inverted pendulum, also known as the Furuta pendulum, by applying the Bayesian optimization called Entropy Search. Simulations and experiments show that the optimal controller has an improved performance compared to a nominal controller for various initial conditions.

1707.09198 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.OC

Data-Driven Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era

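Chao Ning, Fengqi You

Affiliations * Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University

AI summary This paper proposes a data-driven stochastic robust optimization framework that builds a bi-level optimization structure on a data-driven uncertainty model, combining two-stage stochastic programming with adaptive robust optimization to handle optimization under uncertainty in the big data era.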

Journal ref
Computers & Chemical Engineering, Volume 111, Pages 115-133, 4 March 2018

Abstract

A novel data-driven stochastic robust optimization (DDSRO) framework is proposed for optimization under uncertainty leveraging labeled multi-class uncertainty data. Uncertainty data in large datasets are often collected from various conditions, which are encoded by class labels. Machine learning methods including Dirichlet process mixture model and maximum likelihood estimation are employed for uncertainty modeling. A DDSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different data classes; adaptive robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A decomposition-based algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on process network design and planning are presented to demonstrate the applicability of the proposed framework and algorithm.

1903.09749 2026-06-04 cs.RO cs.SY eess.SY math.DS math.OC

Passivity guaranteed stiffness control with multiple frequency band specifications for a cable-driven series elastic actuator

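Ningbo Yu, Wulin Zou, Yubo Sun

Affiliations * Tianjin Key Laboratory of Intelligent Robotics, Nankai University

AI summary This paper addresses stiffness control of a cable-driven series elastic actuator with an approach based on $H_\infty$ synthesis: constraints on passivity, actuator limitation, disturbance attenuation, and noise rejection are imposed within specific frequency bands, improving the accuracy and robustness of stiffness control.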

Comments 10 pages, already published in Mechanical Systems and Signal Processing

Abstract

Impedance control and specifically stiffness control are widely applied for physical human-robot interaction. The series elastic actuator (SEA) provides inherent compliance, safety and further benefits. This paper aims to improve the stiffness control performance of a cable-driven SEA. Existing impedance controllers were designed within the full frequency domain, though human-robot interaction commonly falls in the low frequency range. We enhance the stiffness rendering performance under formulated constraints of passivity, actuator limitation, disturbance attenuation, noise rejection at their specific frequency ranges. Firstly, we reformulate this multiple frequency-band optimization problem into the $H_\infty$ synthesis framework. Then, the performance goals are quantitatively characterized by respective restricted frequency-domain specifications as norm bounds. Further, a structured controller is directly synthesized to satisfy all the competing performance requirements. Both simulation and experimental results showed that the produced controller enabled good interaction performance for each desired stiffness varying from 0 to 1 times of the physical spring constant. Compared with the passivity-based PID method, the proposed $H_\infty$ synthesis method achieved more accurate and robust stiffness control performance with guaranteed passivity.

1904.00035 2026-06-04 cs.RO cs.LG cs.SY eess.SY stat.ML

Autonomous Highway Driving using Deep Reinforcement Learning

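Subramanya Nageshrao, Eric Tseng, Dimitar Filev

Affiliations * Ford Greenfield Labs; Ford Research and Innovation Center

AI summary This paper proposes a reinforcement learning method in which an autonomous vehicle learns to make decisions in complex, changing environments by interacting directly with simulated traffic. It addresses the shortcomings of rule-based decision makers and of pre-designed cost functions optimized in real time, and improves learning efficiency and safety.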

Abstract

The operational space of an autonomous vehicle (AV) can be diverse and vary significantly. This may lead to a scenario that was not postulated in the design phase. Due to this, formulating a rule-based decision maker for selecting maneuvers may not be ideal. Similarly, it may not be effective to design an a priori cost function and then solve the optimal control problem in real time. In order to address these issues and to avoid peculiar behaviors when encountering unforeseen scenarios, we propose a reinforcement learning (RL) based method, where the ego car, i.e., an autonomous vehicle, learns to make decisions by directly interacting with simulated traffic. The decision maker for the AV is implemented as a deep neural network providing an action choice for a given system state. In a critical application such as driving, an RL agent without an explicit notion of safety may not converge, or it may need an extremely large number of samples before finding a reliable policy. To best address the issue, this paper incorporates reinforcement learning with an additional short horizon safety check (SC). In a critical scenario, the safety check will also provide an alternate safe action to the agent, if one exists. This leads to two novel contributions. First, it generalizes the states that could lead to undesirable "near-misses" or "collisions". Second, inclusion of the safety check can provide a safe and stable training environment. This significantly enhances learning efficiency without inhibiting meaningful exploration to ensure safe and optimal learned behavior. We demonstrate the performance of the developed algorithm in a highway driving scenario where the trained AV encounters varying traffic density in a highway setting.
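
The short horizon safety check described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: a one-dimensional car-following model with a constant-speed lead vehicle, a hypothetical 2-second horizon and 5-metre minimum gap, and a hypothetical list of fallback accelerations. None of these values or function names come from the paper.

```python
def is_safe(gap_m, closing_speed_mps, accel_mps2,
            horizon_s=2.0, min_gap_m=5.0, dt=0.1):
    """Roll a constant-acceleration model forward over the horizon and
    report whether the gap to the lead vehicle stays at or above min_gap_m."""
    gap, closing = gap_m, closing_speed_mps
    for _ in range(int(round(horizon_s / dt))):
        closing += accel_mps2 * dt  # ego acceleration raises the closing speed
        gap -= closing * dt
        if gap < min_gap_m:
            return False
    return True


def safe_action(proposed, gap_m, closing_speed_mps,
                fallbacks=(0.0, -2.0, -4.0)):
    """Return the proposed acceleration if it passes the check, otherwise the
    first fallback that passes, otherwise None (no safe action found)."""
    for accel in (proposed, *fallbacks):
        if is_safe(gap_m, closing_speed_mps, accel):
            return accel
    return None
```

During training, the learning agent's chosen action would be passed through `safe_action` before being applied, so exploration continues but actions that fail the check are replaced by an alternative when one exists.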

1903.11204 2026-06-04 cs.RO cs.SY eess.SY math.OC

Priority Maps for Surveillance and Intervention of Wildfires and other Spreading Processes

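Vera L. J. Somers, Ian R. Manchester

Affiliations * Australian Centre for Field Robotics; University of Sydney

AI summary This paper proposes a method for generating priority maps for surveillance of, or intervention in, dynamic spreading processes such as wildfires. The method exploits properties of positive systems, in particular the separable structure of value functions, to provide scalable algorithms. Examples with 16 and 1000 nodes show how the method responds to changes in the system dynamics, and the method is combined with a travelling salesman problem for UAV path planning.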

Comments Accepted for ICRA 2019

Abstract

Unmanned Aerial Vehicle (UAV) path planning algorithms often assume a knowledge reward function or priority map, indicating the most important areas to visit. In this paper we propose a method to create priority maps for monitoring or intervention of dynamic spreading processes such as wildfires. The presented optimization framework utilizes the properties of positive systems, in particular the separable structure of value (cost-to-go) functions, to provide scalable algorithms for surveillance and intervention. We present results obtained for a 16 and 1000 node example and convey how the priority map responds to changes in the dynamics of the system. The larger example of 1000 nodes, representing a fictional landscape, shows how the method can integrate bushfire spreading dynamics, landscape and wind conditions. Finally, we give an example of combining the proposed method with a travelling salesman problem for UAV path planning for wildfire intervention.

1903.09748 2026-06-04 cs.RO cs.SY eess.SY math.DS math.OC

Impedance control of a cable-driven SEA with mixed $H_2/H_\infty$ synthesis

Ningbo Yu, Wulin Zou

Affiliations * Institute of Robotics and Automatic Information Systems, Nankai University; Tianjin Key Laboratory of Intelligent Robotics, Nankai University

AI summary This paper proposes an impedance control method for a cable-driven series elastic actuator, based on mixed $H_2/H_\infty$ synthesis and relaxed passivity, for physical human-robot interaction.

Comments 11 pages, already published in Assembly Automation

Journal ref
Assembly Automation, Vol. 37, Issue 3, pp. 296-303, 2017

Abstract

Purpose: This paper presents an impedance control method with mixed $H_2/H_\infty$ synthesis and relaxed passivity for a cable-driven series elastic actuator to be applied for physical human-robot interaction. Design/methodology/approach: To shape the system's impedance to match a desired dynamic model, the impedance control problem was reformulated into an impedance matching structure. The desired competing performance requirements as well as constraints from the physical system can be characterized with weighting functions for respective signals. Considering the frequency properties of human movements, the passivity constraint for stable human-robot interaction, which is required on the entire frequency spectrum and may bring conservative solutions, has been relaxed in such a way that it only restrains the low frequency band. Thus, impedance control became a mixed $H_2/H_\infty$ synthesis problem, and a dynamic output feedback controller can be obtained. Findings: The proposed impedance control strategy has been tested for various desired impedance with both simulation and experiments on the cable-driven series elastic actuator platform. The actual interaction torque tracked well the desired torque within the desired norm bounds, and the control input was regulated below the motor velocity limit. The closed loop system can guarantee relaxed passivity at low frequency. Both simulation and experimental results have validated the feasibility and efficacy of the proposed method. Originality/value: This impedance control strategy with mixed $H_2/H_\infty$ synthesis and relaxed passivity provides a novel, effective and less conservative method for physical human-robot interaction control.

1903.09673 2026-06-04 cs.RO cs.SY eess.SY math.DS math.OC

Compliance Shaping for Control of Strength Amplification Exoskeletons with Elastic Cuffs

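Gray Cortright Thomas, Jeremiah M. Coholich, Luis Sentis

Affiliations * Human Centered Robotics Lab in the University of Texas at Austin

AI summary This paper proposes a double compliance shaping method: a spring placed in series with the force-sensitive cuff allows the exoskeleton's compliance behavior to be designed for stability and robustness at high amplification ratios. It also introduces a feedback controller and a gain tuning method, and validates the approach on a single-degree-of-freedom elbow exoskeleton.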

Comments 8 pages, 9 figures, conference

Abstract

Exoskeletons which amplify the strength of their operators can enable heavy-duty manipulation of unknown objects. However, this type of behavior is difficult to accomplish; it requires the exoskeleton to sense and amplify the operator's interaction forces while remaining stable. But, the goals of amplification and robust stability when connected to the operator fundamentally conflict. As a solution, we introduce a design with a spring in series with the force sensitive cuff. This allows us to design an exoskeleton compliance behavior which is nominally passive, even with high amplification ratios. In practice, time delay and discrete time filters prevent our strategy from actually achieving passivity, but the designed compliance still makes the exoskeleton more robust to spring-like human behaviors. Our exoskeleton is actuated by a series elastic actuator (SEA), which introduces another spring into the system. We show that shaping the cuff compliance for the exoskeleton can be made into approximately the same problem as shaping the spring compliance of an SEA. We therefore introduce a feedback controller and gain tuning method which takes advantage of an existing compliance shaping technique for SEAs. We call our strategy the "double compliance shaping" method. With large amplification ratios, this controller tends to amplify nonlinear transmission friction effects, so we additionally propose a "transmission disturbance observer" to mitigate this drawback. Our methods are validated on a single-degree-of-freedom elbow exoskeleton.

1903.09122 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Finite Sample Analysis of Stochastic System Identification

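Anastasios Tsiamis, George J. Pappas

Affiliations * Department of Electrical and Systems Engineering, University of Pennsylvania

AI summary Using modern tools from machine learning and statistics, this paper studies the finite sample complexity of stochastic system identification. Based on a subspace identification algorithm and a finite number of output samples, it provides non-asymptotic high-probability upper bounds on the system parameter estimation errors, showing that with high probability the errors decrease at a rate of $1/\sqrt{N}$.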

Comments Under review

Abstract

In this paper, we analyze the finite sample complexity of stochastic system identification using modern tools from machine learning and statistics. An unknown discrete-time linear system evolves over time under Gaussian noise without external inputs. The objective is to recover the system parameters as well as the Kalman filter gain, given a single trajectory of output measurements over a finite horizon of length $N$. Based on a subspace identification algorithm and a finite number of $N$ output samples, we provide non-asymptotic high-probability upper bounds for the system parameter estimation errors. Our analysis uses recent results from random matrix theory, self-normalized martingales and SVD robustness, in order to show that with high probability the estimation errors decrease with a rate of $1/\sqrt{N}$. Our non-asymptotic bounds not only agree with classical asymptotic results, but are also valid even when the system is marginally stable.

1903.08792 2026-06-04 cs.LG cs.SY eess.SY stat.ML

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

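Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick

Affiliations * California Institute of Technology; University of Michigan

AI summary This paper proposes a controller architecture that combines a model-free reinforcement learning controller, a controller based on control barrier functions, and online learning of the unknown system dynamics to ensure safety during learning. Gaussian processes model the system dynamics, and the approach shows greater sample efficiency and safety on inverted pendulum control and on autonomous car-following with wireless vehicle-to-vehicle communication.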

Comments Published in AAAI 2019

Abstract

Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
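
How a control barrier function constrains a learned action can be shown on a toy system. The sketch below assumes known single-integrator dynamics dx/dt = u and a safe set x <= x_max, for which the CBF condition dh/dt >= -alpha * h with h(x) = x_max - x reduces to the bound u <= alpha * (x_max - x). The system, `x_max`, `alpha`, and the constant stand-in policy are all hypothetical; the paper additionally learns unknown dynamics with Gaussian processes, which is not modeled here.

```python
def cbf_filter(u_rl, x, x_max=1.0, alpha=2.0):
    """Minimally modify u_rl so that h(x) = x_max - x satisfies
    dh/dt >= -alpha * h under dx/dt = u (closed-form solution of the 1-D QP)."""
    u_bound = alpha * (x_max - x)  # -u >= -alpha * h  is equivalent to  u <= alpha * h
    return min(u_rl, u_bound)


def simulate(policy, x0=0.0, steps=200, dt=0.01):
    """Forward-Euler rollout with the CBF filter applied to every action."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += cbf_filter(policy(x), x) * dt
        trajectory.append(x)
    return trajectory


# Stand-in for an RL policy that always pushes toward the unsafe region.
traj = simulate(lambda x: 5.0)
```

Even with a policy that constantly drives toward the boundary, the filtered trajectory approaches x_max without crossing it, while actions that already satisfy the bound pass through unchanged.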

1803.02099 2026-06-04 cs.LG cs.SY eess.SY

A Hybrid Method for Traffic Flow Forecasting Using Multimodal Deep Learning

Shengdong Du, Tianrui Li, Xun Gong, Shi-Jinn Horng

Affiliations * School of Information Science and Technology, National Engineering Laboratory of Integrated Transportation Big Data Application Technology; Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology

AI summary This paper proposes a hybrid multimodal deep learning method for short-term traffic flow forecasting, which uses an attention-assisted multimodal deep learning architecture to jointly and adaptively learn the spatial-temporal correlation features and long-term temporal dependencies of multimodal traffic data.

Abstract

Traffic flow forecasting has been regarded as a key problem of intelligent transport systems. In this work, we propose a hybrid multimodal deep learning method for short-term traffic flow forecasting, which can jointly and adaptively learn the spatial-temporal correlation features and long temporal interdependence of multi-modality traffic data by an attention auxiliary multimodal deep learning architecture. According to the highly nonlinear characteristics of multi-modality traffic data, the base module of our method consists of one-dimensional Convolutional Neural Networks (1D CNN) and Gated Recurrent Units (GRU) with the attention mechanism. The former is to capture the local trend features and the latter is to capture the long temporal dependencies. Then, we design a hybrid multimodal deep learning framework (HMDLF) for fusing share representation features of different modality traffic data by multiple CNN-GRU-Attention modules. The experimental results indicate that the proposed multimodal deep learning model is capable of dealing with complex nonlinear urban traffic flow forecasting with satisfying accuracy and effectiveness.

1903.05196 2026-06-04 cs.LG cs.SY eess.SY stat.ML

A Review of Reinforcement Learning for Autonomous Building Energy Management

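Karl Mason, Santiago Grijalva

Affiliations * School of Electrical and Computer Engineering; Georgia Institute of Technology

AI summary This paper reviews the application of reinforcement learning to autonomous building energy management systems, summarizes the related literature, and discusses future research directions and challenges.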

Comments 17 pages, 3 figures

Abstract

The area of building energy management has received a significant amount of interest in recent years. This area is concerned with combining advancements in sensor technologies, communications and advanced control algorithms to optimize energy utilization. Reinforcement learning is one of the most prominent machine learning algorithms used for control problems and has had many successful applications in the area of building energy management. This research gives a comprehensive review of the literature relating to the application of reinforcement learning to developing autonomous building energy management systems. The main direction for future research and challenges in reinforcement learning are also outlined.

1807.06614 2026-06-04 cs.RO cs.SY eess.SY

Rapid Trajectory Optimization Using C-FROST with Illustration on a Cassie-Series Dynamic Walking Biped

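Ayonga Hereid, Omar Harib, Ross Hartley, Yukai Gong, Jessy W. Grizzle

Affiliations * College of Engineering and the Robotics Institute, University of Michigan

AI summary This paper proposes an approach that rapidly determines solutions for humanoids without removing or lumping degrees of freedom, using C-FROST and multi-threading, and validates it through numerical calculations and physical experiments on a 20-DoF floating-base model of a Cassie-series bipedal robot.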

Abstract

One of the big attractions of low-dimensional models for gait design has been the ability to compute solutions rapidly, whereas one of their drawbacks has been the difficulty in mapping the solutions back to the target robot. This paper presents a set of tools for rapidly determining solutions for "humanoids" without removing or lumping degrees of freedom. The main tools are (1) C-FROST, an open-source C++ interface for FROST, a direct collocation optimization tool; and (2) multi-threading. The results will be illustrated on a 20-DoF floating-base model for a Cassie-series bipedal robot through numerical calculations and physical experiments.

1903.05355 2026-06-04 cs.RO cs.LG cs.SY eess.SY

A Framework for On-line Learning of Underwater Vehicles Dynamic Models

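Bilal Wehbe, Marc Hildebrandt, Frank Kirchner

Affiliations * DFKI - Robotic Innovation Center

AI summary This paper proposes a framework for on-line learning of underwater vehicle dynamic models: an incremental support vector regression method learns the model sequentially from data streams, combined with incremental learning strategies that improve the model's generalization over the whole state space.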

Comments 8 pages, 6 figures, ICRA 2019 authors preprint

Abstract

Learning the dynamics of robots from data can help achieve more accurate tracking controllers, or aid their navigation algorithms. However, when the actual dynamics of the robots change due to external conditions, on-line adaptation of their models is required to maintain high fidelity performance. In this work, a framework for on-line learning of robot dynamics is developed to adapt to such changes. The proposed framework employs an incremental support vector regression method to learn the model sequentially from data streams. In combination with the incremental learning, strategies for including and forgetting data are developed to obtain better generalization over the whole state space. The framework is tested in simulation and real experimental scenarios demonstrating its adaptation capabilities to changes in the robot's dynamics.

1903.03948 2026-06-04 cs.AI cs.SY eess.SY

Rethinking System Health Management

Edward Balaban, Stephen B. Johnson, Mykel J. Kochenderfer

Affiliations * Intelligent Systems Division, NASA Ames Research Center; Dependable System Technologies, LLC; Jacobs ESSCA Group at NASA Marshall Space Flight Center; Department of Aeronautics and Astronautics, Stanford University

AI summary This paper proposes unifying system health management with decision making to increase operational effectiveness and reduce overall system complexity, and illustrates the limitations of conventional approaches through numerical examples.

Comments Published in the proceedings of the 2018 AAAI Fall Symposium on Integrating Planning, Diagnosis, and Causal Reasoning

Abstract

Health management of complex dynamic systems has traditionally evolved separately from automated control, planning, and scheduling (generally referred to in the paper as decision making). A goal of Integrated System Health Management has been to enable coordination between system health management and decision making, although successful practical implementations have remained limited. This paper proposes that, rather than being treated as connected, yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and computing, we argue that the unified approach will increase a system's operational effectiveness and may also lead to a lower overall system complexity. We overview the prevalent system health management methodology and illustrate its limitations through numerical examples. We then describe the proposed unification approach and show how it accommodates the typical system health management concepts.

1811.07834 2026-06-04 cs.RO cs.SY eess.SY

Safely Probabilistically Complete Real-Time Planning and Exploration in Unknown Environments

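David Fridovich-Keil, Jaime F. Fisac, Claire J. Tomlin

Affiliations * UC-Philippine-California Advanced Research Institute; ONR MURI; SRC CONIX Center

AI summary This paper presents a new motion planning framework that wraps around existing kinodynamic planners and guarantees recursive feasibility in a priori unknown, static environments. By using a robust controller derived from reachability analysis, the approach makes strong guarantees about overall safety and collision avoidance. Motion plans always stay within the safe backward reachable set of the initial state while safely exploring the space. This preserves the safety of the initial state and guarantees that the goal will eventually be found if it can be reached while exploring safely.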

Comments 7 pages, accepted to ICRA 2019

Abstract

We present a new framework for motion planning that wraps around existing kinodynamic planners and guarantees recursive feasibility when operating in a priori unknown, static environments. Our approach makes strong guarantees about overall safety and collision avoidance by utilizing a robust controller derived from reachability analysis. We ensure that motion plans never exit the safe backward reachable set of the initial state, while safely exploring the space. This preserves the safety of the initial state, and guarantees that we will eventually find the goal if it is possible to do so while exploring safely. We implement our framework in the Robot Operating System (ROS) software environment and demonstrate it in a real-time simulation.

1902.08705 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

A General Framework for Structured Learning of Mechanical Systems

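Jayesh K. Gupta, Kunal Menda, Zachary Manchester, Mykel J. Kochenderfer

Affiliations * Stanford University

AI summary This paper proposes a general framework for structured learning of mechanical systems, which incorporates prior knowledge where it is available and trains expressive function approximators where it is not, improving model accuracy and efficiency.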

Comments 10 pages, 7 figures. First two authors contributed equally. Submitted to IROS/RA-L. Code at https://github.com/sisl/mechamodlearn/

Abstract

Learning accurate dynamics models is necessary for optimal, compliant control of robotic systems. Current approaches to white-box modeling using analytic parameterizations, or black-box modeling using neural networks, can suffer from high bias or high variance. We address the need for a flexible, gray-box model of mechanical systems that can seamlessly incorporate prior knowledge where it is available, and train expressive function approximators where it is not. We propose to parameterize a mechanical system using neural networks to model its Lagrangian and the generalized forces that act on it. We test our method on a simulated, actuated double pendulum. We show that our method outperforms a naive, black-box model in terms of data-efficiency, as well as performance in model-based reinforcement learning. We also conduct a systematic study of our method's ability to incorporate available prior knowledge about the system to improve data efficiency.

1806.07115 2026-06-04 cs.RO cs.SY eess.SY

ConFusion: Sensor Fusion for Complex Robotic Systems using Nonlinear Optimization

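Timothy Sandy, Lukas Stadelmann, Simon Kerscher, Jonas Buchli

Affiliation: Agile & Dexterous Robotics Lab, ETH Zurich

AI summary: This paper presents ConFusion, an open-source sensor fusion framework for robotic applications that enables flexible sensor fusion design through nonlinear optimization, and demonstrates its performance in visual-inertial tracking and on a mobile manipulator.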

Journal ref
IEEE Robotics and Automation Letters, 2019, Volume 4, Number 2, Pages 1093-1100

Abstract

We present ConFusion, an open-source package for online sensor fusion for robotic applications. ConFusion is a modular framework for fusing measurements from many heterogeneous sensors within a moving horizon estimator. ConFusion offers greater flexibility in sensor fusion problem design than filtering-based systems and the ability to scale the online estimate quality with the available computing power. We demonstrate its performance in comparison to an iterated extended Kalman filter in visual-inertial tracking, and show its versatility through whole-body sensor fusion on a mobile manipulator.

1810.06749 2026-06-04 cs.LG cs.NA math.NA stat.ML

Optimally rotated coordinate systems for adaptive least-squares regression on sparse grids

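Bastian Bohn, Michael Griebel, Jens Oettershagen

Affiliation: Institute for Numerical Simulation, University of Bonn

AI summary: For high-dimensional data sets, this paper proposes a preprocessing method that determines an optimized, problem-dependent coordinate system to reduce the effective dimensionality of the data, thereby improving the performance of adaptive sparse grid least-squares regression algorithms.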

Abstract

For low-dimensional data sets with a large number of data points, standard kernel methods are usually no longer feasible for regression. Besides simple linear models or involved heuristic deep learning models, grid-based discretizations of larger (kernel) model classes lead to algorithms that naturally scale linearly in the number of data points. For moderate-dimensional or high-dimensional regression tasks, these grid-based discretizations suffer from the curse of dimensionality. Here, sparse grid methods have proven to circumvent this problem to a large extent. In this context, space- and dimension-adaptive sparse grids, which can detect and exploit a given low effective dimensionality of nominally high-dimensional data, are particularly successful. They nevertheless rely on an axis-aligned structure of the solution and exhibit issues for data with predominantly skewed and rotated coordinates. In this paper we propose a preprocessing approach for these adaptive sparse grid algorithms that determines an optimized, problem-dependent coordinate system and, thus, reduces the effective dimensionality of a given data set in the ANOVA sense. We provide numerical examples on synthetic data as well as real-world data to show how an adaptive sparse grid least squares algorithm benefits from our preprocessing method.

1810.03749 2026-06-04 cs.RO cs.SY eess.SY

Balancing Global Exploration and Local-connectivity Exploitation with Rapidly-exploring Random disjointed-Trees

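Tin Lai, Fabio Ramos, Gilad Francis

Affiliation: NVIDIA, USA

AI summary: This paper proposes RRdT*, an incremental optimal multi-query planner that uses multiple disjointed trees to exploit the local connectivity of the space via Markov chain random sampling and actively explores the global space when local-connectivity exploitation fails. It casts the balance between local exploitation and global exploration as a multi-armed bandit problem to improve sampling efficiency.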

Comments Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2019

Abstract

Sampling efficiency in a highly constrained environment has long been a major challenge for sampling-based planners. In this work, we propose Rapidly-exploring Random disjointed-Trees* (RRdT*), an incremental optimal multi-query planner. RRdT* uses multiple disjointed-trees to exploit local-connectivity of spaces via Markov Chain random sampling, which utilises neighbourhood information derived from previous successful and failed samples. To balance local exploitation, RRdT* actively explores unseen global spaces when local-connectivity exploitation is unsuccessful. The active trade-off between local exploitation and global exploration is formulated as a multi-armed bandit problem. We argue that the active balancing of global exploration and local exploitation is the key to improving sample efficiency in sampling-based motion planners. We provide rigorous proofs of completeness and optimal convergence for this novel approach. Furthermore, we demonstrate experimentally the effectiveness of RRdT*'s locally exploring trees in granting improved visibility for planning. Consequently, RRdT* outperforms existing state-of-the-art incremental planners, especially in highly constrained environments.
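
The bandit framing in this abstract can be illustrated with a generic UCB1 selector. This is only a sketch: the abstract does not say which bandit algorithm RRdT* uses, so UCB1 is a stand-in, and the class name `UCB1`, the arm indices, and the 0/1 reward convention are all assumptions made for illustration.

```python
import math


class UCB1:
    """Generic UCB1 bandit; each arm could stand for one local tree or for global sampling."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # how often each arm was pulled
        self.means = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Pull every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update; reward is assumed to be 1.0 for a
        # successful (collision-free) sample and 0.0 otherwise.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

In a planner loop one would call `select()` to decide where to sample next and `update()` with the outcome, so arms whose samples keep failing are visited less often while still being retried occasionally.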

1902.10320 2026-06-04 cs.RO cs.SY eess.SY

A New Simulation Metric to Determine Safe Environments and Controllers for Systems with Unknown Dynamics

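Shromona Ghosh, Somil Bansal, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, Claire J. Tomlin

Affiliation: University of California, Berkeley

AI summary: This paper proposes SPEC, a specification-centric simulation metric for systems with unknown dynamics. Tightening the specification with SPEC allows safe controllers to be synthesized for a larger set of safe environments.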

Comments 22nd ACM International Conference on Hybrid Systems: Computation and Control (2019)

Abstract

We consider the problem of extracting safe environments and controllers for reach-avoid objectives for systems with known state and control spaces, but unknown dynamics. In a given environment, a common approach is to synthesize a controller from an abstraction or a model of the system (potentially learned from data). However, in many situations, the relationship between the dynamics of the model and the actual system is not known, and hence it is difficult to provide safety guarantees for the system. In such cases, the Standard Simulation Metric (SSM), defined as the worst-case norm distance between the model and the system output trajectories, can be used to modify a reach-avoid specification for the system into a more stringent specification for the abstraction. Nevertheless, the obtained distance, and hence the modified specification, can be quite conservative. This limits the set of environments for which a safe controller can be obtained. We propose SPEC, a specification-centric simulation metric, which overcomes these limitations by computing the distance using only the trajectories that violate the specification for the system. We show that modifying a reach-avoid specification with SPEC allows us to synthesize a safe controller for a larger set of environments compared to SSM. We also propose a probabilistic method to compute SPEC for a general class of systems. Case studies using simulators for quadrotors and autonomous cars illustrate the advantages of the proposed metric for determining safe environment sets and controllers.
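
The difference between the two metrics can be sketched on finite sets of sampled trajectories. This is a toy reading of the abstract's one-sentence definitions, not the paper's formal construction: the function names, the pairing of model and system trajectories, and the `violates` predicate are assumptions introduced for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]
Pair = Tuple[Trajectory, Trajectory]  # (model trajectory, system trajectory)


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Largest Euclidean distance between time-aligned samples."""
    return max(math.dist(p, q) for p, q in zip(a, b))


def standard_simulation_metric(pairs: List[Pair]) -> float:
    """Worst-case distance over all sampled trajectory pairs."""
    return max(trajectory_distance(m, s) for m, s in pairs)


def spec_centric_metric(pairs: List[Pair],
                        violates: Callable[[Trajectory], bool]) -> float:
    """Worst-case distance restricted to pairs whose system trajectory
    violates the specification (0.0 if no sampled trajectory does)."""
    distances = [trajectory_distance(m, s) for m, s in pairs if violates(s)]
    return max(distances, default=0.0)
```

Because the second function maximizes over a subset of the pairs, its value can never exceed the first, which is the intuition behind the less conservative specification described above.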

1806.07190 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Stable Gaussian Process based Tracking Control of Euler-Lagrange Systems

Thomas Beckers, Dana Kulić, Sandra Hirche

Affiliations: Chair of Information-oriented Control (ITR), Department of Electrical and Computer Engineering, Technical University of Munich; Adaptive Systems Laboratory, Department of Electrical and Computer Engineering, University of Waterloo

AI summary: This paper proposes a stable tracking control method based on Gaussian process regression for high-performance tracking control of unknown Euler-Lagrange systems. A data-driven model provides feed-forward compensation, and the model fidelity is used to adapt the feedback gains, guaranteeing a globally bounded tracking error.

Comments Accepted manuscript for publication in Elsevier Automatica

Abstract

Perfect tracking control for real-world Euler-Lagrange systems is challenging due to uncertainties in the system model and external disturbances. The magnitude of the tracking error can be reduced either by increasing the feedback gains or improving the model of the system. The latter is clearly preferable as it allows good tracking performance to be maintained at low feedback gains. However, accurate models are often difficult to obtain. In this article, we address the problem of stable high-performance tracking control for unknown Euler-Lagrange systems. In particular, we employ Gaussian Process regression to obtain a data-driven model that is used for the feed-forward compensation of unknown dynamics of the system. The model fidelity is used to adapt the feedback gains, allowing low feedback gains in state space regions of high model confidence. The proposed control law guarantees a globally bounded tracking error with a specific probability. Simulation studies demonstrate the superiority over state-of-the-art tracking control approaches.

1902.08721 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Online Control with Adversarial Disturbances

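Naman Agarwal, Brian Bullins, Elad Hazan, Sham M. Kakade, Karan Singh

Affiliations: Google AI Princeton; Princeton University; University of Washington; Allen School of Computer Science and Engineering

AI summary: This paper studies online control of linear dynamical systems with adversarial disturbances and proposes an efficient algorithm with nearly tight regret bounds that performs nearly as well as a controller with full knowledge of the disturbances. It extends prior work in two main respects: it allows adversarial noise in the dynamics and general convex costs.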

Abstract

We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise). The objective we consider is one of regret: we desire an online control procedure that can do nearly as well as a procedure that has full knowledge of the disturbances in hindsight. Our main result is an efficient algorithm that provides nearly tight regret bounds for this problem. From a technical standpoint, this work generalizes previous work in two main respects: our model allows for adversarial noise in the dynamics, and allows for general convex costs.
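
The regret objective described here can be written out as follows. The notation is assumed for illustration rather than taken from the paper: $x_t$ is the state, $u_t$ the control, $w_t$ the adversarial disturbance, $c_t$ a convex cost, and $\Pi$ an unspecified comparator class of control procedures that know the disturbances in hindsight.

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + w_t,\\
\mathrm{Regret}_T &= \sum_{t=1}^{T} c_t(x_t, u_t)
  \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\!\left(x_t^{\pi}, u_t^{\pi}\right).
\end{aligned}
```

Here $(x_t^{\pi}, u_t^{\pi})$ denotes the trajectory that comparator $\pi$ would have produced under the same disturbance sequence. A regret that grows sublinearly in $T$ means the online controller's average cost approaches that of the best comparator.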

1711.09048 2026-06-04 cs.AI cs.RO cs.SY eess.SY

A Compression-Inspired Framework for Macro Discovery

Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas

Affiliations: College of Information and Computer Sciences; Department of Computer Science; University of Massachusetts Amherst; Federal University Rio Grande do Sul

AI summary: This paper proposes a compression-inspired framework for macro discovery that identifies recurring patterns in trajectories obtained from well-performing policies, helping reinforcement learning agents use early experience to solve new, related tasks quickly.

Comments Accepted as Extended Abstract, AAMAS, 2019

Abstract

In this paper we consider the problem of how a reinforcement learning agent tasked with solving a set of related Markov decision processes can use knowledge acquired early in its lifetime to improve its ability to more rapidly solve novel, but related, tasks. One way of exploiting this experience is by identifying recurrent patterns in trajectories obtained from well-performing policies. We propose a three-step framework in which an agent 1) generates a set of candidate open-loop macros by compressing trajectories drawn from near-optimal policies; 2) evaluates the value of each macro; and 3) selects a maximally diverse subset of macros that spans the space of policies typically required for solving the set of related tasks. Our experiments show that extending the original primitive action-set of the agent with the identified macros allows it to more rapidly learn an optimal policy in unseen, but similar MDPs.

1902.08274 2026-06-04 cs.AI cs.LG cs.MA cs.SY eess.SY

An Online Decision-Theoretic Pipeline for Responder Dispatch

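Ayan Mukhopadhyay, Geoffrey Pettet, Chinmaya Samal, Abhishek Dubey, Yevgeniy Vorobeychik

Affiliations: Vanderbilt University; Washington University

AI summary: This paper proposes an online decision-theoretic pipeline for responding effectively to emergency incidents. Models are updated from real-time data streams, improving response efficiency and reducing computation time.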

Comments Appeared in ICCPS 2019

Abstract

The problem of dispatching emergency responders to service traffic accidents, fire, distress calls and crimes plagues urban areas across the globe. While such problems have been extensively looked at, most approaches are offline. Such methodologies fail to capture the dynamically changing environments under which critical emergency response occurs, and therefore, fail to be implemented in practice. Any holistic approach towards creating a pipeline for effective emergency response must also look at other challenges that it subsumes - predicting when and where incidents happen and understanding the changing environmental dynamics. We describe a system that collectively deals with all these problems in an online manner, meaning that the models get updated with streaming data sources. We highlight why such an approach is crucial to the effectiveness of emergency response, and present an algorithmic framework that can compute promising actions for a given decision-theoretic model for responder dispatch. We argue that carefully crafted heuristic measures can balance the trade-off between computational time and the quality of solutions achieved and highlight why such an approach is more scalable and tractable than traditional approaches. We also present an online mechanism for incident prediction, as well as an approach based on recurrent neural networks for learning and predicting environmental features that affect responder dispatch. We compare our methodology with prior state-of-the-art and existing dispatch strategies in the field, which show that our approach results in a reduction in response time with a drastic reduction in computational time.

1812.07084 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

Learning Constraints from Demonstrations

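Glen Chou, Dmitry Berenson, Necmiye Ozay

Affiliation: Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 48109, USA

AI summary: This work proposes a method for learning unknown constraints from demonstrations. Using task demonstrations, cost functions, and the system dynamics and control constraints, it applies hit-and-run sampling to obtain lower-cost but unsafe trajectories, obtains a consistent representation of the unsafe set via an integer program, and theoretically analyzes which subset of the constraint can be learned from safe demonstrations.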

Comments Presented at the Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018, Mérida, Mexico

Abstract

We extend the learning from demonstration paradigm by providing a method for learning unknown constraints shared across tasks, using demonstrations of the tasks, their cost functions, and knowledge of the system dynamics and control constraints. Given safe demonstrations, our method uses hit-and-run sampling to obtain lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a consistent representation of the unsafe set via solving an integer program. Our method generalizes across system dynamics and learns a guaranteed subset of the constraint. We also provide theoretical analysis on what subset of the constraint is learnable from safe demonstrations. We demonstrate our method on linear and nonlinear system dynamics, show that it can be modified to work with suboptimal demonstrations, and that it can also be used to learn constraints in a feature space.
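
Hit-and-run sampling, named in this abstract, can be sketched on a simple convex set. The example below samples from a Euclidean ball centred at the origin, which stands in for the set of lower-cost trajectories that the paper samples from; the function name `hit_and_run_ball` and its parameters are hypothetical.

```python
import math
import random


def hit_and_run_ball(start, radius, n_steps, seed=0):
    """Hit-and-run samples inside the origin-centred ball of the given radius.

    `start` must lie inside the ball.
    """
    rng = random.Random(seed)
    x = list(start)
    samples = []
    for _ in range(n_steps):
        # Random direction: a normalised Gaussian vector.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Chord endpoints solve |x + t d|^2 = radius^2 for t.
        xd = sum(a * b for a, b in zip(x, d))
        xx = sum(a * a for a in x)
        disc = math.sqrt(max(xd * xd - (xx - radius * radius), 0.0))
        # Move to a uniformly random point on the chord.
        t = rng.uniform(-xd - disc, -xd + disc)
        x = [a + t * b for a, b in zip(x, d)]
        samples.append(tuple(x))
    return samples
```

For a general set, the chord computation would be replaced by a line search against the set's membership test; the ball is used here only because its chord has a closed form.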

1709.04794 2026-06-04 cs.AI cs.NA cs.PF math.NA

Fast semi-supervised discriminant analysis for binary classification of large data-sets

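Joris Tavernier, Jaak Simm, Karl Meerbergen, Joerg Kurt Wegner, Hugo Ceulemans, Yves Moreau

Affiliation: Department of Computer Science, KU Leuven

AI summary: This paper proposes and analyzes three scalable methods for semi-supervised discriminant analysis that exploit data sparsity and the shift-invariance of Krylov subspaces to improve the efficiency and performance of binary classification on large data sets.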

Abstract

High-dimensional data requires scalable algorithms. We propose and analyze three scalable and related algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition was improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and our methods only require a few seconds, significantly improving computation time over the previous state of the art.

1902.06366 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Detecting and Diagnosing Incipient Building Faults Using Uncertainty Information from Deep Neural Networks

Baihong Jin, Dan Li, Seshadhri Srinivasan, See-Kiong Ng, Kameshwar Poolla, Alberto Sangiovanni-Vincentelli

Affiliations: Department of EECS, University of California, Berkeley; Institute of Data Science, National University of Singapore; The Berkeley Education Alliance for Research in Singapore

AI summary: This paper proposes augmenting the supervised learning pipeline with Monte Carlo dropout to detect and diagnose unseen incipient faults, and validates its effectiveness in indicating the most likely incipient fault types on the RP-1043 dataset.

Abstract

Early detection of incipient faults is of vital importance to reducing maintenance costs, saving energy, and enhancing occupant comfort in buildings. Popular supervised learning models such as deep neural networks are considered promising due to their ability to directly learn from labeled fault data; however, it is known that the performance of supervised learning approaches highly relies on the availability and quality of labeled training data. In Fault Detection and Diagnosis (FDD) applications, the lack of labeled incipient fault data has posed a major challenge to applying these supervised learning techniques to commercial buildings. To overcome this challenge, this paper proposes using Monte Carlo dropout (MC-dropout) to enhance the supervised learning pipeline, so that the resulting neural network is able to detect and diagnose unseen incipient fault examples. We also examine the proposed MC-dropout method on the RP-1043 dataset to demonstrate its effectiveness in indicating the most likely incipient fault types.
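
The core idea of MC-dropout is to keep dropout active at prediction time and summarize several stochastic forward passes. The sketch below uses a toy linear unit rather than a deep network, and is not the paper's fault-detection pipeline; the function names and the default values of `drop_prob`, `n_samples`, and `seed` are assumptions for illustration.

```python
import random
from statistics import mean, pvariance


def dropout_forward(weights, x, drop_prob, rng):
    """One stochastic pass of a toy linear unit with inverted dropout on its inputs."""
    keep = 1.0 - drop_prob
    total = 0.0
    for w, xi in zip(weights, x):
        if rng.random() < keep:
            # Rescale kept inputs so the expected output matches the no-dropout output.
            total += w * xi / keep
    return total


def mc_dropout_predict(weights, x, drop_prob=0.5, n_samples=200, seed=0):
    """Return the mean and variance of predictions over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [dropout_forward(weights, x, drop_prob, rng) for _ in range(n_samples)]
    return mean(samples), pvariance(samples)
```

The variance across passes serves as the uncertainty signal: an input unlike the training data would be expected to produce a wider spread of predictions, which is what makes previously unseen fault patterns detectable in principle.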

1902.06361 2026-06-04 cs.LG cs.NE cs.SY eess.SY stat.ML

A One-Class Support Vector Machine Calibration Method for Time Series Change Point Detection

Baihong Jin, Yuxin Chen, Dan Li, Kameshwar Poolla, Alberto Sangiovanni-Vincentelli

Affiliations: Department of EECS, University of California, Berkeley; California Institute of Technology; Institute of Data Science, National University of Singapore

AI summary: This paper proposes a method for calibrating one-class support vector machines (OC-SVMs) for time series change point detection, using a heuristic search to find a good combination of input data and hyperparameters. Experiments show that OC-SVMs can detect change points with satisfactory accuracy using less training data than existing deep learning methods.

Abstract

It is important to identify the change point of a system's health status, which usually signifies an incipient fault under development. The One-Class Support Vector Machine (OC-SVM) is a popular machine learning model for anomaly detection and hence could be used for identifying change points; however, it is sometimes difficult to obtain a good OC-SVM model that can be used on sensor measurement time series to identify the change points in system health status. In this paper, we propose a novel approach for calibrating OC-SVM models. The approach uses a heuristic search method to find a good set of input data and hyperparameters that yield a well-performing model. Our results on the C-MAPSS dataset demonstrate that OC-SVM can also achieve satisfactory accuracy in detecting change points in time series with less training data, compared to state-of-the-art deep learning approaches. In our case study, the OC-SVM calibrated by the proposed model is shown to be useful especially in scenarios with a limited amount of training data.

1902.06133 2026-06-04 cs.RO cs.SY eess.SY

A Fleet of Miniature Cars for Experiments in Cooperative Driving

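Nicholas Hyldmar, Yijun He, Amanda Prorok

Affiliation: University of Cambridge

AI summary: This paper introduces a unique experimental testbed consisting of 16 miniature Ackermann-steering vehicles, built to address the lack of low-cost platforms for research and education in multi-car navigation and trajectory planning, and uses experiments to demonstrate the benefits of cooperative driving on multi-lane road topographies.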

Comments Accepted to ICRA 2019

Abstract

We introduce a unique experimental testbed that consists of a fleet of 16 miniature Ackermann-steering vehicles. We are motivated by a lack of available low-cost platforms to support research and education in multi-car navigation and trajectory planning. This article elaborates the design of our miniature robotic car, the Cambridge Minicar, as well as the fleet's control architecture. Our experimental testbed allows us to implement state-of-the-art driver models as well as autonomous control strategies, and test their validity in a real, physical multi-lane setup. Through experiments on our miniature highway, we are able to tangibly demonstrate the benefits of cooperative driving on multi-lane road topographies. Our setup paves the way for indoor large-fleet experimental research.