arXivDaily: arXiv daily academic digest, updated Monday through Friday
1710.06537 2026-06-04 cs.RO cs.SY eess.SY

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

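Affiliations: OpenAI

AI summary: This paper proposes a simple method for bridging the gap between simulation and reality: randomizing the simulator's dynamics during training yields policies that adapt to differing dynamics and generalize to the real world without any training on the physical system.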

Abstract

Simulations are attractive environments for training agents as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real world counterparts. In this paper, we demonstrate a simple method to bridge this "reality gap". By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained. This adaptivity enables the policies to generalize to the dynamics of the real world without any training on the physical system. Our approach is demonstrated on an object pushing task using a robotic arm. Despite being trained exclusively in simulation, our policies are able to maintain a similar level of performance when deployed on a real robot, reliably moving an object to a desired location from random initial configurations. We explore the impact of various design decisions and show that the resulting policies are robust to significant calibration error.
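The idea of randomizing simulator dynamics during training can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the parameter names (`link_mass_scale`, `friction`, `action_delay_steps`) and their ranges are hypothetical, and the policy update itself is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class DynamicsParams:
    # Hypothetical dynamics parameters; the paper's actual set is not listed here.
    link_mass_scale: float
    friction: float
    action_delay_steps: int


def sample_dynamics(rng: random.Random) -> DynamicsParams:
    """Draw one set of simulator dynamics (illustrative ranges only)."""
    return DynamicsParams(
        link_mass_scale=rng.uniform(0.5, 1.5),
        friction=rng.uniform(0.2, 1.0),
        action_delay_steps=rng.randint(0, 3),  # inclusive on both ends
    )


def training_episodes(num_episodes: int, seed: int = 0):
    """Yield (episode_index, params), resampling the dynamics every episode."""
    rng = random.Random(seed)
    for episode in range(num_episodes):
        yield episode, sample_dynamics(rng)
```

In a training loop, each yielded `DynamicsParams` would be applied to the simulator before the episode is rolled out, so the policy never sees one fixed set of dynamics.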

1809.07098 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY

Novelty-organizing team of classifiers in noisy and dynamic environments

Danilo Vasconcellos Vargas, Hirotaka Takano, Junichi Murata

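Affiliations: Graduate School of Information Science; Electrical Engineering, Kyushu University, Fukuoka, Japan; Faculty of Information Science

AI summary: This study applies the Novelty-Organizing Team of Classifiers (NOTC), which works effectively in noisy and dynamic environments, to the continuous action mountain car problem and its variants, showing that NOTC achieves better performance although its initialization takes some time.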

Journal ref
2015 IEEE Congress on Evolutionary Computation (CEC)

Abstract

In the real world, the environment changes constantly and the input variables are subject to noise. However, few algorithms have been shown to work under those circumstances. Here, Novelty-Organizing Team of Classifiers (NOTC) is applied to the continuous action mountain car as well as two variations of it: a noisy mountain car and an unstable weather mountain car. These problems take noise and changes in problem dynamics into account, respectively. Moreover, NOTC is compared with NeuroEvolution of Augmenting Topologies (NEAT) on these problems, revealing a trade-off between the approaches. While NOTC achieves the best performance in all of the problems, NEAT needs fewer trials to converge. It is demonstrated that NOTC achieves better performance because of its division of the input space (creating easier problems). Unfortunately, this division of the input space also requires some time to bootstrap.

1809.06970 2026-06-04 cs.LG cs.NI cs.PF cs.SY eess.SY stat.ML

FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices

Shuochao Yao, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Lu Su, Tarek Abdelzaher

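Affiliations: University of Illinois Urbana-Champaign; State University of New York at Buffalo

AI summary: This paper proposes the FastDeepIoT framework, which uncovers the nonlinear relationship between neural network structure and execution time and uses it to improve the trade-off between execution time and accuracy on mobile and embedded devices, without prior knowledge of the hardware specifications or the implementation details of the deep learning library.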

Comments Accepted by SenSys '18

Abstract

Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution time, posing a serious impediment to deployment on low-end devices. To address this challenge, recent literature has focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally affect performance attributes of interest, such as execution time. Rather, extreme run-time nonlinearities exist over the network configuration space. Hence, we propose a novel framework, called FastDeepIoT, that uncovers the nonlinear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. FastDeepIoT makes two key contributions. First, FastDeepIoT automatically learns an accurate and highly interpretable execution time model for deep neural networks on the target device. This is done without prior knowledge of either the hardware specifications or the detailed implementation of the deep learning library used. Second, FastDeepIoT informs a compression algorithm how to minimize execution time on the profiled device without impacting accuracy. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by 48% to 78% and energy consumption by 37% to 69% compared with state-of-the-art compression algorithms.

1809.06179 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Learning of Multi-Context Models for Autonomous Underwater Vehicles

Bilal Wehbe, Octavio Arriaga, Mario Michael Krell, Frank Kirchner

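Affiliations: DFKI - Robotic Innovation Center; Robotics Research Group

AI summary: This paper uses LSTM networks to learn multi-context models for autonomous underwater vehicles: a simulation model built from experimental data generates the different contexts, and the network achieves high classification accuracy, robustness to noise, and efficient scaling to large datasets.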

Comments 6 pages, 7 figures, AUV 2018 author copy

Abstract

Multi-context model learning is crucial for marine robotics, where several factors can cause disturbances to the system's dynamics. This work addresses the problem of identifying multiple contexts of an AUV model. We build a simulation model of the robot from experimental data, and use it to fill in the missing data and generate different model contexts. We implement an architecture based on long short-term memory (LSTM) networks to learn the different contexts directly from the data. We show that the LSTM network can achieve high classification accuracy compared to baseline methods, showing robustness against noise and scaling efficiently on large datasets.

1809.06009 2026-06-04 cs.LG cs.NA math.NA stat.ML

Uncertainty Propagation in Deep Neural Networks Using Extended Kalman Filtering

Jessica S. Titensky, Hayden Jananthan, Jeremy Kepner

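Affiliations: Massachusetts Institute of Technology; Department of Mathematics; Lincoln Laboratory Supercomputing Center

AI summary: This paper uses Extended Kalman Filtering to propagate and quantify input uncertainty through deep neural networks; the approach has lower computational overhead than existing techniques and naturally incorporates model error into the output uncertainty.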

Comments 4 Pages, 8 figures. Accepted at MIT IEEE Undergraduate Research Technology Conference 2018. Publication pending

Abstract

Extended Kalman Filtering (EKF) can be used to propagate and quantify input uncertainty through a Deep Neural Network (DNN) assuming mild hypotheses on the input distribution. This methodology yields results comparable to existing methods of uncertainty propagation for DNNs while lowering the computational overhead considerably. Additionally, EKF allows model error to be naturally incorporated into the output uncertainty.
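As a rough illustration of EKF-style uncertainty propagation, the sketch below pushes a mean and covariance through one dense ReLU layer by linearizing the layer at the input mean. It is an assumption-laden sketch rather than the paper's method: the layer form, the helper names, and the scalar `model_noise` term added to the diagonal are all chosen here for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def propagate_layer(mean, cov, weights, bias, model_noise=0.0):
    """Propagate (mean, cov) through y = relu(W x + b) via first-order linearization."""
    pre = [sum(w * m for w, m in zip(row, mean)) + b for row, b in zip(weights, bias)]
    out_mean = [max(0.0, p) for p in pre]
    # Jacobian of relu(W x + b) at the mean: rows of W, zeroed where the unit is inactive.
    jac = [[w if p > 0 else 0.0 for w in row] for row, p in zip(weights, pre)]
    out_cov = matmul(matmul(jac, cov), transpose(jac))
    # Hypothetical additive model-error term on the diagonal.
    for i in range(len(out_cov)):
        out_cov[i][i] += model_noise
    return out_mean, out_cov
```

Applying `propagate_layer` layer by layer gives an approximate output covariance with one Jacobian product per layer, which is where the lower computational overhead relative to sampling-based approaches would come from.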

1806.06161 2026-06-04 cs.RO cs.LG cs.SY eess.SY

BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning

Boris Ivanovic, James Harrison, Apoorva Sharma, Mo Chen, Marco Pavone

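Affiliations: Department of Mechanical Engineering, Stanford University; School of Computing Science, Simon Fraser University

AI summary: This paper proposes BaRC, which uses physical priors to design a curriculum based on backward reachability that accelerates the training of model-free RL algorithms on continuous control MDPs, improving performance and reducing the exploration required.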

Abstract

Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naive exploration strategies.

1804.01031 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Provably Robust Learning-Based Approach for High-Accuracy Tracking Control of Lagrangian Systems

Mohamed K. Helwa, Adam Heins, Angela P. Schoellig

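Affiliations: Dynamic Systems Lab; Institute for Aerospace Studies; University of Toronto

AI summary: This paper proposes a novel learning-based control approach built on Gaussian processes that ensures both closed-loop stability and high-accuracy tracking, guarantees robustness through an uncertainty bound, and is validated in simulation and in experiments.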

Comments 8 pages, 4 figures, 2 tables, submitted to IEEE Robotics and Automation Letters (RA-L) and the 2019 International Conference on Robotics and Automation (ICRA) (created: March 2018; updated: September 2018)

Abstract

Lagrangian systems represent a wide range of robotic systems, including manipulators, wheeled and legged robots, and quadrotors. Inverse dynamics control and feedforward linearization techniques are typically used to convert the complex nonlinear dynamics of Lagrangian systems to a set of decoupled double integrators, and then a standard, outer-loop controller can be used to calculate the commanded acceleration for the linearized system. However, these methods typically depend on having a very accurate system model, which is often not available in practice. While this challenge has been addressed in the literature using different learning approaches, most of these approaches do not provide safety guarantees in terms of stability of the learning-based control system. In this paper, we provide a novel, learning-based control approach based on Gaussian processes (GPs) that ensures both stability of the closed-loop system and high-accuracy tracking. We use GPs to approximate the error between the commanded acceleration and the actual acceleration of the system, and then use the predicted mean and variance of the GP to calculate an upper bound on the uncertainty of the linearized model. This uncertainty bound is then used in a robust, outer-loop controller to ensure stability of the overall system. Moreover, we show that the tracking error converges to a ball with a radius that can be made arbitrarily small. Furthermore, we verify the effectiveness of our approach via simulations on a 2 degree-of-freedom (DOF) planar manipulator and experimentally on a 6 DOF industrial manipulator.

1809.03314 2026-06-04 cs.CV cs.SY eess.SY

A Robotic Auto-Focus System based on Deep Reinforcement Learning

Xiaofan Yu, Runze Yu, Jingsong Yang, Xiaohui Duan

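Affiliations: Center of Wireless Communication and Signal Processing

AI summary: This paper proposes an end-to-end auto-focus method that uses deep reinforcement learning to learn focusing policies from visual input and reach a clear image automatically. By discretizing the action space and applying DQN, the method solves the auto-focus problem and extends to vision-based control problems.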

Comments To Appear at ICARCV 2018

Abstract

Considering its advantages in dealing with high-dimensional visual input and learning control policies in discrete domains, Deep Q Network (DQN) could become an alternative to traditional auto-focus methods. In this paper, based on Deep Reinforcement Learning, we propose an end-to-end approach that can learn auto-focus policies from visual input and finish at a clear spot automatically. We demonstrate that our method, which discretizes the action space with coarse-to-fine steps and applies DQN, is not only a solution to auto-focus but also a general approach to vision-based control problems. Separate phases of training in virtual and real environments are applied to obtain an effective model. Virtual experiments, which are carried out after the virtual training phase, indicate that our method can achieve 100% accuracy on a certain view with different focus ranges. Further training on real robots could eliminate the deviation between the simulator and the real scenario, leading to reliable performance in real applications.
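A coarse-to-fine discrete action space of the kind described might look like the sketch below. The step sizes, the focus range, and the function names are assumptions made for illustration; the abstract does not give the actual values, and the DQN that would select among these actions is omitted.

```python
# Hypothetical focus-motor step sizes, from coarse to fine.
COARSE_TO_FINE_STEPS = (100, 10, 1)


def build_actions(step_sizes=COARSE_TO_FINE_STEPS):
    """Return the discrete action set: a forward and a backward move per step size."""
    actions = []
    for size in step_sizes:
        actions.extend((+size, -size))
    return actions


def apply_action(position, action_index, actions, low=0, high=1000):
    """Move the focus position by the chosen action, clipped to the assumed range."""
    return min(high, max(low, position + actions[action_index]))
```

A Q-network would then output one value per entry of `build_actions()`, letting the agent take large steps far from focus and small steps near it.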

1709.06196 2026-06-04 cs.AI cs.RO cs.SY eess.SY

Online algorithms for POMDPs with continuous state, action, and observation spaces

Zachary Sunberg, Mykel Kochenderfer

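Affiliations: Aeronautics and Astronautics Dept., Stanford University

AI summary: This paper proposes the POMCPOW and PFT-DPW algorithms, which use weighted particle filtering to solve POMDPs with continuous state spaces, and demonstrates the effectiveness of these modifications.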

Comments Added Multilane section

Journal ref
Short version published in 2018 proceedings of the International Conference on Automated Planning and Scheduling (ICAPS)

Abstract

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.
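The weighted particle filtering mentioned above can be illustrated with a single belief-update step. This sketch shows only the reweighting of a particle belief by an observation likelihood, not POMCPOW or PFT-DPW themselves; the scalar state, the Gaussian observation model, and the function names are assumptions.

```python
import math


def gaussian_likelihood(observation, predicted, sigma):
    """Unnormalized likelihood of an observation under an assumed Gaussian sensor model."""
    return math.exp(-0.5 * ((observation - predicted) / sigma) ** 2)


def weighted_update(particles, weights, observation, sigma=1.0):
    """Reweight particles by the observation likelihood and renormalize."""
    new_weights = [
        w * gaussian_likelihood(observation, p, sigma)
        for p, w in zip(particles, weights)
    ]
    total = sum(new_weights)
    if total == 0.0:
        # Degenerate case: fall back to a uniform belief.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in new_weights]
```

Because every particle keeps a weight rather than being accepted or rejected outright, the belief does not collapse to a single particle when observations are continuous.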

1809.00037 2026-06-04 cs.RO cs.SY eess.SY

Estimation for Quadrotors

Stefanie Tellex, Andy Brown, Sergei Lupashin

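Affiliations: Brown University; Udacity, Inc.; Fotokite

AI summary: Based on quadrotor models, this document derives the Extended Kalman filter and provides pseudocode for the EKF, the Bayes filter, and the Unscented Kalman filter, addressing the sensor noise and limited computation that make quadrotor state estimation challenging.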

Abstract

This document describes standard approaches for filtering and estimation for quadrotors, created for the Udacity Flying Cars course. We assume previous knowledge of probability and some knowledge of linear algebra. We do not assume previous knowledge of Kalman filters or Bayes filters. This document derives an EKF for various models of drones in 1D, 2D, and 3D. We use the EKF and notation as defined in Thrun et al. [13]. We also give pseudocode for the Bayes filter, the EKF, and the Unscented Kalman filter [14]. The motivation behind this document is the lack of a step-by-step EKF tutorial that provides the derivations for a quadrotor helicopter. The goal of estimation is to infer the drone's state (pose, velocity, acceleration, and biases) from its sensor values and control inputs. This problem is challenging because sensors are noisy. Additionally, because of weight and cost issues, many drones have limited on-board computation, so we want to estimate these values as quickly as possible. The standard method for performing this estimation is the Extended Kalman filter, a nonlinear extension of the Kalman filter which linearizes a nonlinear transition and measurement model around the current state. However, the Unscented Kalman filter is better in almost every respect: it is simpler to implement, gives more accurate estimates, and has comparable runtime.
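For a scalar state, one EKF predict-and-update cycle can be sketched as below. The variable names and the function signature are chosen here for illustration and do not follow the document's notation; `f` and `h` stand for the transition and measurement models, and `f_jac` and `h_jac` for their derivatives.

```python
def ekf_step(mean, var, control, measurement, f, f_jac, h, h_jac, q, r):
    """One scalar EKF cycle: predict with f, then correct with the measurement."""
    # Predict: push the mean through f and the variance through its linearization.
    pred_mean = f(mean, control)
    f_lin = f_jac(mean, control)
    pred_var = f_lin * var * f_lin + q

    # Update: linearize h at the predicted mean and apply the Kalman gain.
    h_lin = h_jac(pred_mean)
    innovation = measurement - h(pred_mean)
    innovation_var = h_lin * pred_var * h_lin + r
    gain = pred_var * h_lin / innovation_var
    new_mean = pred_mean + gain * innovation
    new_var = (1.0 - gain * h_lin) * pred_var
    return new_mean, new_var
```

With a linear model such as `f = lambda x, u: x + u` and `h = lambda x: x`, the step reduces to the ordinary Kalman filter, which makes it easy to check by hand.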

1806.05220 2026-06-04 cs.RO cs.SY eess.SY

Decentralized Ergodic Control: Distribution-Driven Sensing and Exploration for Multi-Agent Systems

Ian Abraham, Todd D. Murphey

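Affiliations: Neuroscience and Robotics Laboratory

AI summary: This paper proposes a decentralized ergodic control policy for time-varying area coverage problems with multiple agents that have nonlinear dynamics, uses consensus to obtain a fully decentralized multi-agent control policy, and demonstrates applications to multi-agent terrain mapping and target localization.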

Comments 8 pages, Accepted for publication in IEEE Robotics and Automation Letters

Journal ref
IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 2377-3766, 2018

Abstract

We present a decentralized ergodic control policy for time-varying area coverage problems for multiple agents with nonlinear dynamics. Ergodic control allows us to specify distributions as objectives for area coverage problems for nonlinear robotic systems as a closed-form controller. We derive a variation of the ergodic control policy that can be used with consensus to enable a fully decentralized multi-agent control policy. Examples are presented to illustrate the applicability of our method for multi-agent terrain mapping as well as target localization. An analysis of ergodic policies as a Nash equilibrium is provided for game-theoretic applications.

1802.08215 2026-06-04 cs.RO cs.SY eess.SY

ArduSoar: an Open-Source Thermalling Controller for Resource-Constrained Autopilots

Samuel Tabor, Iain Guilliard, Andrey Kolobov

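Affiliations: Glasgow, Scotland; Australian National University; Microsoft Research

AI summary: This paper presents ArduSoar, the first soaring controller integrated into a major autopilot software suite for small UAVs, covering its algorithmic design, its integration with ArduPlane, and flight tests that show its robustness in non-ideal atmospheric conditions.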

Abstract

Autonomous soaring capability has the potential to significantly increase time aloft for fixed-wing UAVs. In this paper, we introduce ArduSoar, the first soaring controller integrated into a major autopilot software suite for small UAVs. We describe ArduSoar from the algorithmic standpoint, outline its integration with the ArduPlane autopilot, discuss parameter tuning for it, and conduct a series of flight tests on real sUAVs that show ArduSoar's robustness even in highly non-ideal atmospheric conditions.

1803.10309 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Canonical Correlation Analysis of Datasets with a Common Source Graph

Jia Chen, Gang Wang, Yanning Shen, Georgios B. Giannakis

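Affiliations: University of Minnesota

AI summary: This paper proposes a graph-regularized canonical correlation analysis method (gCCA) that introduces a graph structure to exploit knowledge of the common sources, improving data fusion and classification performance.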

Comments 10 pages, 7 figures

Abstract

Canonical correlation analysis (CCA) is a powerful technique for discovering whether or not hidden sources are commonly present in two (or more) datasets. Its well-appreciated merits include dimensionality reduction, clustering, classification, feature selection, and data fusion. Standard CCA, however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-) correlations. In this paper, this extra information provided by the common sources generating the data is encoded in a graph, and is invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, termed graph (g) CCA. The novel gCCA accounts for the graph-induced knowledge of common sources, while minimizing the distance between the wanted canonical variables. Tailored for diverse practical settings where the number of data vectors is smaller than their dimension, the dual formulation of gCCA is also developed. One such setting includes kernels that are incorporated to account for nonlinear data dependencies. The resultant graph-kernel (gk) CCA is also obtained in closed form. Finally, corroborating image classification tests over several real datasets are presented to showcase the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.

1509.02223 2026-06-04 cs.CV cs.NA math.NA

Diffusion tensor imaging with deterministic error bounds

Artur Gorokh, Yury Korolev, Tuomo Valkonen

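Affiliations: Faculty of Physics, Lomonosov Moscow State University; School of Engineering and Materials Science, Queen Mary University of London

AI summary: This paper models errors in inverse problems using partial order in Banach lattices and applies the theory to the difficult noise-modelling problem in diffusion tensor imaging, where a deterministic error-bounds approach simplifies the treatment of the nonlinear Stejskal-Tanner equation.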

Abstract

Errors in the data and the forward operator of an inverse problem can be handily modelled using partial order in Banach lattices. We present some existing results of the theory of regularisation in this novel framework, where errors are represented as bounds by means of the appropriate partial order. We apply the theory to Diffusion Tensor Imaging, where correct noise modelling is challenging: it involves the Rician distribution and the nonlinear Stejskal-Tanner equation. Linearisation of the latter in the statistical framework would complicate the noise model even further. We avoid this using the error bounds approach, which preserves simple error structure under monotone transformations.
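To see why bounds survive a monotone transformation, it may help to write out the Stejskal-Tanner relation in its usual form. The symbols below are notation assumed here rather than taken from the paper: $S$ is the measured signal, $S_0$ the signal without diffusion weighting, $b$ the b-value, $g$ the unit gradient direction, $D$ the diffusion tensor, and $l$, $u$ pointwise lower and upper bounds on $S$.

```latex
\[
S(b, g) = S_0 \exp\!\left(-b\, g^{\top} D\, g\right)
\quad\Longrightarrow\quad
\log S(b, g) = \log S_0 - b\, g^{\top} D\, g .
\]
\[
0 < l \le S(b, g) \le u
\quad\Longrightarrow\quad
\log l \;\le\; \log S_0 - b\, g^{\top} D\, g \;\le\; \log u .
\]
```

Since the logarithm is increasing, interval bounds on the signal become interval bounds on a quantity that is linear in $D$, whereas a statistical noise model such as the Rician distribution would not keep a simple form under the same transformation.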

1808.03983 2026-06-04 cs.RO cs.SY eess.SY

Robot Safe Interaction System for Intelligent Industrial Co-Robots

Changliu Liu, Masayoshi Tomizuka

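Affiliations: University of California, Berkeley

AI summary: This paper proposes a safe interaction system whose parallel planning and control architecture improves the efficiency and safety of co-robots in dynamic, uncertain environments; experiments verify the effectiveness of the method.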

Comments 12 pages

Abstract

Human-robot interactions have been recognized to be a key element of future industrial collaborative robots (co-robots). Unlike traditional robots that work in structured and deterministic environments, co-robots need to operate in highly unstructured and stochastic environments. To ensure that co-robots operate efficiently and safely in dynamic uncertain environments, this paper introduces the robot safe interaction system. In order to address the uncertainties during human-robot interactions, a unique parallel planning and control architecture is proposed, which has a long term global planner to ensure efficiency of robot behavior, and a short term local planner to ensure real time safety under uncertainties. In order for the robot to respond immediately to environmental changes, fast algorithms are used for real-time computation, i.e., the convex feasible set algorithm for the long term optimization, and the safe set algorithm for the short term optimization. Several test platforms are introduced for safe evaluation of the developed system in the early phase of deployment. The effectiveness and the efficiency of the proposed method have been verified in experiment with an industrial robot manipulator.

1808.03037 2026-06-04 cs.RO cs.SY eess.SY

Passive Compliance Control of Aerial Manipulators

Min Jun Kim, Ribin Balachandran, Marco De Stefano, Konstantin Kondak, Christian Ott

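Affiliations: German Aerospace Center (DLR)

AI summary: This paper proposes a passive compliance control method that achieves stable environmental interaction for aerial manipulators, despite the absence of actuation along body-planar directions, through a proper choice of end-effector coordinates and the time domain passivity technique.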

Comments IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018

Abstract

This paper presents a passive compliance control for aerial manipulators to achieve stable environmental interactions. The main challenge is the absence of actuation along body-planar directions of the aerial vehicle, which might be required during the interaction to preserve passivity. The controller proposed in this paper guarantees passivity of the manipulator through a proper choice of end-effector coordinates, and that of the vehicle fuselage is guaranteed by exploiting the time domain passivity technique. Simulation studies validate the proposed approach.

1608.02702 2026-06-04 cs.CV cs.NA math.NA

Steerable Principal Components for Space-Frequency Localized Images

Boris Landa, Yoel Shkolnisky

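Affiliations: Department of Applied Mathematics, School of Mathematical Sciences

AI summary: This paper proposes a fast and accurate method that expands images in two-dimensional Prolate Spheroidal Wave Functions to obtain steerable principal components, which are optimal for expanding the images and their rotations.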

Abstract

This paper describes a fast and accurate method for obtaining steerable principal components from a large dataset of images, assuming the images are well localized in space and frequency. The obtained steerable principal components are optimal for expanding the images in the dataset and all of their rotations. The method relies upon first expanding the images using a series of two-dimensional Prolate Spheroidal Wave Functions (PSWFs), where the expansion coefficients are evaluated using a specially designed numerical integration scheme. Then, the expansion coefficients are used to construct a rotationally-invariant covariance matrix which admits a block-diagonal structure, and the eigen-decomposition of its blocks provides us with the desired steerable principal components. The proposed method is shown to be faster than existing methods, while providing appropriate error bounds which guarantee its accuracy.

1712.07249 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Probabilistic Learning of Torque Controllers from Kinematic and Force Constraints

João Silvério, Yanlong Huang, Leonel Rozo, Sylvain Calinon, Darwin G. Caldwell

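Affiliations: Department of Advanced Robotics, Istituto Italiano di Tecnologia; Idiap Research Institute

AI summary: This paper proposes a probabilistic approach for simultaneously learning and synthesizing torque control commands under task-space, joint-space, and force constraints: the relevance of different torque controllers is learned probabilistically, and properties of Gaussian distributions are used to combine them into new torque commands that satisfy the important features of the task.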

Comments Accepted for publication at 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Abstract

When learning skills from demonstrations, one is often required to think in advance about the appropriate task representation (usually in either operational or configuration space). Here we propose a probabilistic approach for simultaneously learning and synthesizing torque control commands which take into account task space, joint space and force constraints. We treat the problem by considering different torque controllers acting on the robot, whose relevance is learned probabilistically from demonstrations. This information is used to combine the controllers by exploiting the properties of Gaussian distributions, generating new torque commands that satisfy the important features of the task. We validate the approach in two experimental scenarios using 7-DoF torque-controlled manipulators, with tasks that require the consideration of different controllers to be properly executed.

1807.05290 2026-06-04 cs.RO cs.SY eess.SY

Adaptive Model Predictive Control for High-Accuracy Trajectory Tracking in Changing Conditions

Karime Pereida, Angela Schoellig

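Affiliations: Dynamic Systems Lab; University of Toronto Institute for Aerospace Studies

AI summary: This paper proposes an adaptive model predictive controller that combines model predictive control with an L1 adaptive controller to improve trajectory tracking under unknown and changing disturbances. Experiments on a quadrotor show lower trajectory tracking error.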

英文摘要

Robots and automated systems are increasingly being introduced to unknown and dynamic environments where they are required to handle disturbances, unmodeled dynamics, and parametric uncertainties. Robust and adaptive control strategies are required to achieve high performance in these dynamic environments. In this paper, we propose a novel adaptive model predictive controller that combines model predictive control (MPC) with an underlying $\mathcal{L}_1$ adaptive controller to improve trajectory tracking of a system subject to unknown and changing disturbances. The $\mathcal{L}_1$ adaptive controller forces the system to behave in a predefined way, as specified by a reference model. A higher-level model predictive controller then uses this reference model to calculate the optimal reference input based on a cost function, while taking into account input and state constraints. We focus on the experimental validation of the proposed approach and demonstrate its effectiveness in experiments on a quadrotor. We show that the proposed approach has a lower trajectory tracking error compared to non-predictive, adaptive approaches and a predictive, non-adaptive approach, even when external wind disturbances are applied.

1709.03726 2026-06-04 cs.LG cs.SY eess.SY

Adaptive Graph Signal Processing: Algorithms and Optimal Sampling Strategies

自适应图信号处理:算法与最优采样策略

Paolo Di Lorenzo, Paolo Banelli, Elvin Isufi, Sergio Barbarossa, Geert Leus

发表机构 * Dept. of Engineering, University of Perugia(工程系,佩鲁吉亚大学)

AI总结 本文提出自适应图信号学习的新策略,通过分析随机采样对算法性能的影响,设计优化采样策略以提升稳态性能和收敛速度。

Comments Submitted to IEEE Transactions on Signal Processing, September 2017

详情
AI中文摘要

本文旨在提出自适应图信号学习的新策略,即在随机时间变化的顶点子集上观测信号。将经典自适应算法LMS和RLS重新纳入图信号处理框架,通过均方分析探讨随机采样对自适应重建能力和稳态性能的影响。随后提出几种概率采样策略,设计每个节点的采样概率,以优化稳态性能、图采样率和算法收敛速度的平衡。最后推导出一种分布式RLS策略,并证明其收敛于集中式算法。通过合成和真实数据的数值模拟,展示了所提采样和重建策略在图上信号(可能分布式)自适应学习中的良好性能。

英文摘要

The goal of this paper is to propose novel strategies for adaptive learning of signals defined over graphs, which are observed over a (randomly time-varying) subset of vertices. We recast two classical adaptive algorithms in the graph signal processing framework, namely, the least mean squares (LMS) and the recursive least squares (RLS) adaptive estimation strategies. For both methods, a detailed mean-square analysis illustrates the effect of random sampling on the adaptive reconstruction capability and the steady-state performance. Then, several probabilistic sampling strategies are proposed to design the sampling probability at each node in the graph, with the aim of optimizing the tradeoff between steady-state performance, graph sampling rate, and convergence rate of the adaptive algorithms. Finally, a distributed RLS strategy is derived and is shown to be convergent to its centralized counterpart. Numerical simulations carried out over both synthetic and real data illustrate the good performance of the proposed sampling and reconstruction strategies for (possibly distributed) adaptive learning of signals defined over graphs.
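To make the LMS idea with random sampling concrete, the sketch below uses an update of the form x[n+1] = x[n] + μ·B·D[n]·(y[n] − x[n]), where D[n] is a random diagonal sampling mask and B projects onto the assumed bandlimited subspace. This form is an assumption chosen for illustration and is not quoted from the abstract; the toy graph, the constant signal, and all names are hypothetical.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def graph_lms(observe, projector, sampling_probs, step_size, iterations, rng):
    """Graph LMS with random node sampling.

    observe() returns the current observation at every node, projector is
    the matrix B of the assumed bandlimited subspace, and node i is sampled
    independently with probability sampling_probs[i] at each iteration.
    """
    estimate = [0.0] * len(sampling_probs)
    for _ in range(iterations):
        y = observe()
        mask = [1.0 if rng.random() < p else 0.0 for p in sampling_probs]
        # Only sampled nodes contribute to the error term.
        sampled_error = [d * (yi - xi) for d, yi, xi in zip(mask, y, estimate)]
        correction = mat_vec(projector, sampled_error)
        estimate = [xi + step_size * ci for xi, ci in zip(estimate, correction)]
    return estimate


# Toy case: a constant signal on a connected 4-node graph. Constant signals
# span the lowest graph frequency, so B is the averaging matrix (1/N) * ones.
n_nodes = 4
projector = [[1.0 / n_nodes] * n_nodes for _ in range(n_nodes)]
true_signal = [3.0] * n_nodes
estimate = graph_lms(lambda: true_signal, projector, [0.5] * n_nodes,
                     step_size=0.5, iterations=200, rng=random.Random(0))
```

Lowering the sampling probabilities reduces how many nodes are read per iteration but slows convergence, which is the trade-off the proposed sampling strategies aim to optimize.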

1807.10757 2026-06-04 cs.CV cs.NA math.NA

A multi-contrast MRI approach to thalamus segmentation

一种用于丘脑分割的多对比度MRI方法

Veronica Corona, Jan Lellmann, Peter Nestor, Carola-Bibiane Schoenlieb, Julio Acosta-Cabronero

发表机构 * Department of Applied Mathematics and Theoretical Physics, University of Cambridge(应用数学与理论物理系,剑桥大学) Queensland Brain Institute, University of Queensland(昆士兰脑研究所,昆士兰大学) Mater Hospital, South Brisbane, Queensland, Australia(马特医院,南布里斯班,昆士兰,澳大利亚) Wellcome Centre for Human Neuroimaging, UCL Institute of Neurology, University College London, United Kingdom(Wellcome人类神经影像中心,伦敦大学学院神经学研究所,伦敦大学学院,英国) German Center for Neurodegenerative Diseases (DZNE), Magdeburg, Germany(德国神经退行性疾病研究中心(DZNE),马格德堡,德国)

AI总结 本文提出一种多模态MRI分割方法,通过多对比数据提高丘脑子核分割精度,结合迭代配准、手动分割模板、监督学习和凸优化,提升分割性能与鲁棒性。

详情
AI中文摘要

丘脑改变与许多神经系统疾病相关,包括阿尔茨海默病、帕金森病和多发性硬化症。例如,用于改善运动障碍症状严重程度的常规干预,通常包括针对间脑核团的手术或深部脑刺激。因此,准确勾画丘脑灰质亚区具有极其重要的临床意义。MRI非常适合结构分割,因为它能在单次扫描中提供不同的解剖视图。由于可能获得多种对比度,开发能够进行多谱处理的新型图像分割技术也变得越来越重要。本文提出了一种适用于多模态数据的新分割方法,并使用T1加权、T2*加权和定量磁化率成像(QSM)信息,在主要丘脑亚核团的自动分割上对其进行了评估。该方法包括四个步骤:高度迭代的图像配准、在平均训练数据模板上的手动分割、用于模式识别的监督学习,以及最终的凸优化步骤,后者通过施加进一步的空间约束来优化解。与标准的基于Morel图谱的方法相比,该方法得到的结果与手动分割更为一致。此外,我们表明多对比度方法提升了分割性能。随后我们研究了利用训练模板轮廓的先验知识能否进一步提高凸分割的精度和鲁棒性,结果在单个受试者中获得了高度精确的多对比度分割。该方法可扩展到大多数3D成像数据类型,以及任何在单次扫描或多受试者模板中可辨识的感兴趣区域。

英文摘要

Thalamic alterations are relevant to many neurological disorders including Alzheimer's disease, Parkinson's disease and multiple sclerosis. Routine interventions to improve symptom severity in movement disorders, for example, often consist of surgery or deep brain stimulation to diencephalic nuclei. Therefore, accurate delineation of grey matter thalamic subregions is of the utmost clinical importance. MRI is highly appropriate for structural segmentation as it provides different views of the anatomy from a single scanning session. With several contrasts potentially available, it is also of increasing importance to develop new image segmentation techniques that can operate multi-spectrally. We hereby propose a new segmentation method for use with multi-modality data, which we evaluated for automated segmentation of major thalamic subnuclear groups using T1-, T2*-weighted and quantitative susceptibility mapping (QSM) information. The proposed method consists of four steps: highly iterative image co-registration, manual segmentation on the average training-data template, supervised learning for pattern recognition, and a final convex optimisation step imposing further spatial constraints to refine the solution. This led to solutions in greater agreement with manual segmentation than the standard Morel atlas-based approach. Furthermore, we show that the multi-contrast approach boosts segmentation performance. We then investigated whether prior knowledge using the training-template contours could further improve convex segmentation accuracy and robustness, which led to highly precise multi-contrast segmentations in single subjects. This approach can be extended to most 3D imaging data types and any region of interest discernible in single scans or multi-subject templates.

1807.08048 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

Baidu Apollo EM Motion Planner

百度 Apollo EM 运动规划器

Haoyang Fan, Fan Zhu, Changchun Liu, Liangliang Zhang, Li Zhuang, Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, Qi Kong

发表机构 * Baidu USA LLC(百度美国有限公司)

AI总结 本文提出基于百度 Apollo 开源自动驾驶平台的实时运动规划系统,解决工业级L4(4级自动驾驶)运动规划问题,兼顾安全性、舒适性和可扩展性,通过分层结构实现多车道和单车道自动驾驶。

详情
AI中文摘要

本文介绍了一种基于百度 Apollo(开源)自动驾驶平台的实时运动规划系统。该系统旨在解决工业级L4(4级自动驾驶)运动规划问题,同时考虑安全性、舒适性和可扩展性。系统以分层方式覆盖多车道和单车道自动驾驶:(1)系统顶层为多车道策略,通过比较并行计算得到的车道级轨迹来处理变道场景。(2)在车道级轨迹生成器内部,基于Frenet坐标系迭代求解路径优化和速度优化。(3)对于路径和速度优化,提出将动态规划与基于样条的二次规划相结合,构建可扩展且易于调节的框架,同时处理交通规则、障碍物决策和平滑性。该规划器可扩展至高速公路和低速城市驾驶场景。我们还通过场景示例和道路测试结果展示了该算法。本文描述的系统自2017年9月Apollo v1.5发布以来已部署到数十辆百度Apollo自动驾驶车辆上。截至2018年5月16日,该系统已在各种城市场景下完成了3,380小时、约68,000公里(42,253英里)的闭环自动驾驶测试。本文描述的算法可在https://github.com/ApolloAuto/apollo/tree/master/modules/planning上获得。

英文摘要

In this manuscript, we introduce a real-time motion planning system based on the Baidu Apollo (open source) autonomous driving platform. The developed system aims to address the industrial level-4 motion planning problem while considering safety, comfort and scalability. The system covers multilane and single-lane autonomous driving in a hierarchical manner: (1) The top layer of the system is a multilane strategy that handles lane-change scenarios by comparing lane-level trajectories computed in parallel. (2) Inside the lane-level trajectory generator, it iteratively solves path and speed optimization based on a Frenet frame. (3) For path and speed optimization, a combination of dynamic programming and spline-based quadratic programming is proposed to construct a scalable and easy-to-tune framework to handle traffic rules, obstacle decisions and smoothness simultaneously. The planner is scalable to both highway and lower-speed city driving scenarios. We also demonstrate the algorithm through scenario illustrations and on-road test results. The system described in this manuscript has been deployed to dozens of Baidu Apollo autonomous driving vehicles since Apollo v1.5 was announced in September 2017. As of May 16th, 2018, the system has been tested under 3,380 hours and approximately 68,000 kilometers (42,253 miles) of closed-loop autonomous driving under various urban scenarios. The algorithm described in this manuscript is available at https://github.com/ApolloAuto/apollo/tree/master/modules/planning.
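Since the lane-level optimization above is formulated in a Frenet frame, it may help to see what that coordinate change looks like. The sketch below projects a Cartesian point onto a polyline reference line and returns the arc length s and signed lateral offset l. It is a simplified illustration under stated assumptions (piecewise-linear reference line, left of travel direction is positive); the function name is hypothetical and this is not Apollo's implementation.

```python
import math


def cartesian_to_frenet(point, reference_line):
    """Return (s, l) of a point relative to a polyline reference line.

    s is the arc length along the polyline to the closest projection, and
    l is the signed lateral distance (positive to the left of travel).
    """
    px, py = point
    best = None  # (distance, s, l) of the closest projection so far
    s_start = 0.0
    for (x0, y0), (x1, y1) in zip(reference_line, reference_line[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate segments
        # Projection parameter along the segment, clamped to [0, 1].
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        # The sign of the 2-D cross product tells left (+) from right (-).
        cross = dx * (py - y0) - dy * (px - x0)
        lateral = math.copysign(dist, cross)
        if best is None or dist < best[0]:
            best = (dist, s_start + t * seg_len, lateral)
        s_start += seg_len
    if best is None:
        raise ValueError("reference line needs at least one non-degenerate segment")
    return best[1], best[2]


# Hypothetical L-shaped reference line: 10 m east, then 10 m north.
line = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
s_a, l_a = cartesian_to_frenet((4.0, 1.5), line)
s_b, l_b = cartesian_to_frenet((12.0, 5.0), line)
```

In these coordinates, path planning becomes a search for l as a function of s, and speed planning a search for s as a function of time, which is what allows the two to be optimized in alternation.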

1709.05077 2026-06-04 cs.AI cs.SY eess.SY

Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

通过深度强化学习优化绿色数据中心的冷却系统

Yuanlong Li, Yonggang Wen, Kyle Guan, Dacheng Tao

发表机构 * School of Computer Science and Engineering, Nanyang Technological University(南洋理工大学计算机科学与工程学院) Bell Labs, Nokia(诺基亚贝尔实验室)

AI总结 本文提出利用数据中心监控数据优化冷却控制策略,采用深度强化学习框架设计端到端冷却控制算法,在模拟平台上实现约11%的冷却成本节省,在真实数据轨迹上实现约15%的冷却能耗节省。

详情
AI中文摘要

冷却系统在现代数据中心(DC)中起着关键作用。为数据中心冷却系统开发最优控制策略是一项具有挑战性的任务。现有方法通常依赖于基于机械冷却、电气和热管理知识构建的近似系统模型,这类模型难以设计,且可能导致次优或不稳定的性能。本文提出利用数据中心中的大量监控数据来优化控制策略。为此,我们将冷却控制策略设计转化为带温度约束的能量成本最小化问题,并将其纳入新兴的深度强化学习(DRL)框架。具体而言,我们提出了一种端到端冷却控制算法(CCA),它基于actor-critic框架以及深度确定性策略梯度(DDPG)算法的离策略(off-policy)离线版本。在所提出的CCA中,评估网络被训练用于预测受数据中心机房冷却状态惩罚的能量成本计数器,而策略网络被训练用于在给定当前负载和天气信息时预测优化的控制设置。所提出的算法在EnergyPlus模拟平台以及从新加坡国家超级计算中心(NSCC)收集的真实数据轨迹上进行了评估。结果表明,与手动配置的基线控制算法相比,所提出的CCA在模拟平台上可实现约11%的冷却成本节省。在基于数据轨迹的研究中,由于无法直接在真实数据中心上测试该算法,我们提出了一种去低估(de-underestimation)验证机制。尽管采用DUE时结果较为保守,但若将入口温度阈值设为26.6摄氏度,我们仍能在NSCC数据轨迹上实现约15%的冷却能耗节省。

英文摘要

The cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for a DC cooling system is a challenging task. The prevailing approaches often rely on approximating system models that are built upon the knowledge of mechanical cooling, electrical and thermal management, which are difficult to design and may lead to sub-optimal or unstable performance. In this paper, we propose utilizing the large amount of monitoring data in DC to optimize the control policy. To do so, we cast the cooling control policy design into an energy cost minimization problem with temperature constraints, and tackle it with the emerging deep reinforcement learning (DRL) framework. Specifically, we propose an end-to-end cooling control algorithm (CCA) that is based on the actor-critic framework and an off-policy offline version of the deep deterministic policy gradient (DDPG) algorithm. In the proposed CCA, an evaluation network is trained to predict an energy cost counter penalized by the cooling status of the DC room, and a policy network is trained to predict optimized control settings when given the current load and weather information. The proposed algorithm is evaluated on the EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. Our results show that the proposed CCA can achieve about 11% cooling cost saving on the simulation platform compared with a manually configured baseline control algorithm. In the trace-based study, we propose a de-underestimation validation mechanism as we cannot directly test the algorithm on a real DC. Even though the results with DUE are conservative, we can still achieve about 15% cooling energy saving on the NSCC data trace if we set the inlet temperature threshold at 26.6 degrees Celsius.
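The abstract describes an energy cost that is penalized by the cooling status of the DC room, with a temperature constraint. One simple way to express such a signal is energy plus a weighted penalty for inlet temperatures above a threshold. The sketch below is only an illustration of that idea: the function name, the linear penalty shape and the weight are assumptions, and only the 26.6 °C threshold comes from the abstract.

```python
def penalized_cooling_cost(energy_kwh, inlet_temps_c,
                           threshold_c=26.6, penalty_weight=10.0):
    """Energy cost plus a penalty for inlet temperatures above a threshold.

    energy_kwh: cooling energy used in the control interval.
    inlet_temps_c: measured server inlet temperatures in degrees Celsius.
    penalty_weight: hypothetical trade-off between energy and violations.
    """
    violation = sum(max(0.0, t - threshold_c) for t in inlet_temps_c)
    return energy_kwh + penalty_weight * violation


# No violation: the cost is just the energy term.
cost_ok = penalized_cooling_cost(100.0, [25.0, 26.0])
# One sensor is 1 degree above the threshold: the penalty is added.
cost_hot = penalized_cooling_cost(100.0, [25.0, 27.6])
```

A critic trained on a signal like this would learn to trade energy savings against overheating risk, and the weight controls how conservative the resulting policy is.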

1807.05289 2026-06-04 cs.RO cs.SY eess.SY

Transfer Learning for High-Precision Trajectory Tracking Through $\mathcal{L}_1$ Adaptive Feedback and Iterative Learning

通过L1自适应反馈和迭代学习实现高精度轨迹跟踪的迁移学习

Karime Pereida, Dave Kooijman, Rikky R. P. R. Duivenvoorden, Angela P. Schoellig

发表机构 * Institute for Aerospace Studies, University of Toronto, North York, ON M3H 5T6, Canada(多伦多大学航空航天研究所,加拿大安大略省北约克,邮编 M3H 5T6)

AI总结 本文提出结合L1自适应控制与迭代学习控制的框架,用于在未知动态环境中实现高精度轨迹跟踪,通过迁移学习实现不同系统间的经验传递。

详情
AI中文摘要

当机器人或自动化系统被引入未知和动态环境时,需要鲁棒且自适应的控制策略,以应对干扰、未建模动力学和参数不确定性。本文展示了一种结合L1自适应控制与迭代学习控制(ILC)的框架,用于在存在未知且变化的干扰时实现高精度轨迹跟踪。L1自适应控制器使系统的行为接近参考模型,但无法保证完美的轨迹跟踪,而ILC则基于先前的迭代改进轨迹跟踪性能。本文的组合框架使用L1自适应控制作为底层控制器,实现鲁棒且可重复的行为,而ILC则作为高层自适应方案,主要补偿系统性跟踪误差。我们说明了该框架能够在动力学特性不同的系统之间实现迁移学习,即一个系统的学习经验可以使另一个不同的系统受益。在两种不同四旋翼上的实验结果表明,与底层采用比例-微分(PD)或比例-积分-微分(PID)控制器的ILC方法相比,该组合L1-ILC框架具有更优的性能。结果表明,我们的L1-ILC框架能够在存在未知且变化的干扰时实现高精度轨迹跟踪,并能在动力学特性不同的系统之间迁移学习经验。此外,当初始输入基于自适应控制器的参考模型生成时,我们的方法在首次尝试时即可实现精确的轨迹跟踪。

英文摘要

Robust and adaptive control strategies are needed when robots or automated systems are introduced to unknown and dynamic environments where they are required to cope with disturbances, unmodeled dynamics, and parametric uncertainties. In this paper, we demonstrate the capabilities of a combined $\mathcal{L}_1$ adaptive control and iterative learning control (ILC) framework to achieve high-precision trajectory tracking in the presence of unknown and changing disturbances. The $\mathcal{L}_1$ adaptive controller makes the system behave close to a reference model; however, it does not guarantee that perfect trajectory tracking is achieved, while ILC improves trajectory tracking performance based on previous iterations. The combined framework in this paper uses $\mathcal{L}_1$ adaptive control as an underlying controller that achieves a robust and repeatable behavior, while the ILC acts as a high-level adaptation scheme that mainly compensates for systematic tracking errors. We illustrate that this framework enables transfer learning between dynamically different systems, where learned experience of one system can be shown to be beneficial for another different system. Experimental results with two different quadrotors show the superior performance of the combined $\mathcal{L}_1$-ILC framework compared with approaches using ILC with an underlying proportional-derivative controller or proportional-integral-derivative controller. Results highlight that our $\mathcal{L}_1$-ILC framework can achieve high-precision trajectory tracking when unknown and changing disturbances are present and can achieve transfer of learned experience between dynamically different systems. Moreover, our approach is able to achieve precise trajectory tracking in the first attempt when the initial input is generated based on the reference model of the adaptive controller.
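To illustrate how ILC improves tracking "based on previous iterations", the sketch below applies a generic P-type update u_{j+1}(t) = u_j(t) + γ·e_j(t+1) to a hypothetical first-order plant. This is a textbook form chosen for illustration, not the paper's L1-ILC scheme; the plant parameters, gain and reference are assumptions.

```python
import math


def run_trial(inputs, a=0.3, b=1.0):
    """Simulate y[t+1] = a*y[t] + b*u[t] from y[0] = 0 (hypothetical plant)."""
    outputs = [0.0]
    for u_t in inputs:
        outputs.append(a * outputs[-1] + b * u_t)
    return outputs


def iterative_learning_control(reference, iterations=30, gain=0.8):
    """P-type ILC: after each trial, correct the input with the shifted error."""
    horizon = len(reference) - 1
    inputs = [0.0] * horizon
    max_errors = []
    for _ in range(iterations):
        outputs = run_trial(inputs)
        errors = [r - y for r, y in zip(reference, outputs)]
        max_errors.append(max(abs(e) for e in errors))
        # u[t] affects y[t+1], so the update uses the error one step ahead.
        inputs = [u_t + gain * errors[t + 1] for t, u_t in enumerate(inputs)]
    return inputs, max_errors


# Track a slow sine that starts at the plant's initial output of zero.
reference = [math.sin(0.3 * t) for t in range(21)]
learned_inputs, max_errors = iterative_learning_control(reference)
```

The same disturbance or model mismatch repeats on every trial, so the learned input absorbs it. This is why ILC benefits from an underlying controller that makes the closed-loop behavior repeatable.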

1709.08174 2026-06-04 cs.LG cs.NA math.NA

Function approximation with zonal function networks with activation functions analogous to the rectified linear unit functions

基于类似修正线性单元函数的区域函数网络的函数近似

Hrushikesh N. Mhaskar

发表机构 * Institute of Mathematical Sciences, Claremont Graduate University(数学科学研究所,克莱蒙特研究生大学)

AI总结 本文研究了在q维球面上的区域函数网络的近似性质,探讨了非正定激活函数的逼近特性,并建立了相应的光滑性类别和逼近性质。

Comments 18 pages, Title changed from the previous version

详情
AI中文摘要

在q维球面S^q上,区域函数(ZF)网络的形式为x↦∑_{k=1}^n a_kϕ(x·x_k),其中ϕ:[-1,1]→R是激活函数,x_k∈S^q是中心,a_k∈R。尽管正定激活函数的近似性质已被广泛研究,但深度和浅层网络的近期兴趣促使研究类似修正线性单元函数的激活函数形式ϕ(t)=|t|,这些函数不是正定的。本文定义了适当的光滑性类别,并建立了此类网络在该类别中函数的逼近性质。中心可以独立于目标函数选择,系数是训练数据的线性组合。构造保持旋转对称性。

英文摘要

A zonal function (ZF) network on the $q$-dimensional sphere $\mathbb{S}^q$ is a network of the form $\mathbf{x}\mapsto \sum_{k=1}^n a_k\phi(\mathbf{x}\cdot\mathbf{x}_k)$ where $\phi:[-1,1]\to\mathbb{R}$ is the activation function, $\mathbf{x}_k\in\mathbb{S}^q$ are the centers, and $a_k\in\mathbb{R}$. While the approximation properties of such networks are well studied in the context of positive definite activation functions, recent interest in deep and shallow networks motivates the study of activation functions of the form $\phi(t)=|t|$, which are not positive definite. In this paper, we define an appropriate smoothness class and establish approximation properties of such networks for functions in this class. The centers can be chosen independently of the target function, and the coefficients are linear combinations of the training data. The constructions preserve rotational symmetries.
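The network form defined above can be evaluated directly. The sketch below computes the sum of a_k·φ(x·x_k) with φ(t) = |t| for points on the unit sphere; the centers and coefficients are arbitrary placeholders rather than values produced by the paper's construction.

```python
def zf_network(x, centers, coeffs, phi=abs):
    """Evaluate sum_k a_k * phi(<x, x_k>) for a point x on the unit sphere."""
    total = 0.0
    for a_k, x_k in zip(coeffs, centers):
        inner = sum(xi * ci for xi, ci in zip(x, x_k))
        total += a_k * phi(inner)
    return total


# Two hypothetical centers on S^2 and a unit-norm evaluation point.
centers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coeffs = [2.0, -1.0]
x = (0.6, 0.8, 0.0)
value = zf_network(x, centers, coeffs)            # 2*|0.6| - 1*|0.8|
value_antipodal = zf_network(tuple(-c for c in x), centers, coeffs)
```

Because |t| is an even function, a network of this form with φ(t) = |t| always takes the same value at x and at its antipode −x.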

1807.02297 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

面向具有动态偏好的智能体激励的组合老虎机

Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

发表机构 * Electrical Engineering Department, University of Washington(华盛顿大学电气工程系)

AI总结 本文提出一种多臂老虎机框架,用于在资源受限环境下匹配用户激励,结合贪心匹配、UCB算法和马尔可夫链混合时间,理论分析 regret 并通过合成和现实案例验证性能。

Comments Published as a conference paper in Conference on Uncertainty in Artificial Intelligence (UAI) 2018

详情
AI中文摘要

个性化激励或推荐设计以提高用户参与度正日益受到重视,随着数字平台提供商不断涌现。我们提出了一种多臂老虎机框架,用于匹配激励给用户,其偏好在事前未知且随时间动态变化,在资源受限环境下。我们设计了一种算法,结合了三个不同领域的思想:(i) 贪心匹配范式,(ii) 用于老虎机的上置信界算法 (UCB),以及 (iii) 马尔可夫链理论中的混合时间。对于该算法,我们提供了关于 regret 的理论界限,并通过合成和现实(如共享单车平台的供需匹配)示例展示了其性能。

英文摘要

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolving dynamically in time, in a resource constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance via both synthetic and realistic (matching supply and demand in a bike-sharing platform) examples.
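Of the three ingredients listed above, the upper confidence bound is the easiest to show in isolation. The sketch below implements the standard UCB1 index (empirical mean plus an exploration bonus) on a toy bandit with fixed rewards. It omits the greedy matching and mixing-time components, so it is not the paper's algorithm; the reward values are hypothetical.

```python
import math


def ucb_index(mean_reward, pulls, round_number):
    """UCB1 index: empirical mean plus sqrt(2 ln t / n); unpulled arms go first."""
    if pulls == 0:
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(round_number) / pulls)


def run_ucb(reward_fn, n_arms, rounds):
    """Play the arm with the largest UCB index in every round."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        arm = max(range(n_arms), key=lambda a: ucb_index(means[a], counts[a], t))
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Toy bandit with fixed (noise-free) rewards per arm.
arm_rewards = [0.2, 0.5, 0.8]
counts, means = run_ucb(lambda arm: arm_rewards[arm], n_arms=3, rounds=1000)
```

The exploration bonus shrinks as an arm is pulled more often, so weaker arms are still tried occasionally while most pulls go to the best one.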

1807.00553 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.DS stat.ML

A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics

对自动化决策中偏见的更广泛视角:反思认识论与动态性

Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli

发表机构 * Department of Electrical Engineering and Computer Sciences, University of California Berkeley, USA(加州大学伯克利分校电气工程与计算机科学系) Department of Rhetoric, University of California Berkeley, USA(加州大学伯克利分校修辞学系) School of Information, University of California Berkeley, USA(加州大学伯克利分校信息学院)

AI总结 本文探讨自动化决策中偏见的根源,将技术偏见视为认识论问题,新兴偏见视为动态反馈现象,强调需反思认识论并采用价值敏感设计方法改进决策系统。

Comments Presented at the 2018 Workshop on Fairness, Accountability and Transparency in Machine Learning during ICML 2018, Stockholm, Sweden

详情
AI中文摘要

机器学习(ML)正日益应用于现实世界,提供可操作见解并成为自动化决策系统的基础。尽管训练数据中固有的偏见是公平性讨论的核心问题,但这些系统也受到技术性和新兴偏见的影响,后者常作为实施中的上下文特定产物出现。本文将技术偏见视为认识论问题,新兴偏见视为动态反馈现象。为激发关于如何改变机器学习实践以有效应对这些问题的讨论,本文探索了偏见的更广泛视角,强调反思认识论的必要性,并指出价值敏感设计方法以重新审视自动化决策系统的设计和实施过程。

英文摘要

Machine learning (ML) is increasingly deployed in real world contexts, supplying actionable insights and forming the basis of automated decision-making systems. While issues resulting from biases pre-existing in training data have been at the center of the fairness debate, these systems are also affected by technical and emergent biases, which often arise as context-specific artifacts of implementation. This position paper interprets technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon. In order to stimulate debate on how to change machine learning practice to effectively address these issues, we explore this broader view on bias, stress the need to reflect on epistemology, and point to value-sensitive design methodologies to revisit the design and implementation process of automated decision-making systems.

1806.10472 2026-06-04 cs.CV cs.NA math.NA

Homogeneity of a region in the logarithmic image processing framework: application to region growing algorithms

对数图像处理框架中区域的同质性:应用于区域生长算法

Michel Jourlin, Guillaume Noyel

发表机构 * Lab. H. Curien, UMR CNRS 5516(H. Curien实验室,CNRS 5516研究单位) University of Strathclyde Institute of Global Public Health(思克莱德大学全球公共卫生研究所) International Prevention Research Institute, iPRI(国际预防研究所)

AI总结 本文探讨了对数图像处理(LIP)算子在评估区域同质性中的作用,提出两种新的异质性标准,改进了Revol技术以增强对比度变化的鲁棒性,减少区域生长过程中的链式效应。

详情
Journal ref
International Workshop on the Physics and Mechanics of Random Structures: from Morphology to Material Properties, Jun 2018, Island of Oléron, France
AI中文摘要

本文探讨了对数图像处理(LIP)算子在评估区域同质性中的作用,提出两种新的异质性准则,一种基于LIP加法,另一种基于LIP标量乘法。这些工具能够按照Revol的技术驱动区域生长算法:从初始种子出发,只要生长区域的非均匀性水平不超过某一阈值,就对其施加特定的膨胀操作。我们提出的新方法使Revol的现有技术对图像对比度变化具有鲁棒性,从而显著改进了该技术。这一性质大大减弱了区域生长过程中出现的链式效应。

英文摘要

The current paper deals with the role played by Logarithmic Image Processing (LIP) operators for evaluating the homogeneity of a region. Two new criteria of heterogeneity are introduced, one based on the LIP addition and the other based on the LIP scalar multiplication. Such tools are able to manage Region Growing algorithms following Revol's technique: starting from an initial seed, they consist of applying specific dilations to the growing region while its inhomogeneity level does not exceed a certain level. The new approaches we introduce significantly improve Revol's existing technique by making it robust to contrast variations in images. Such a property strongly reduces the chaining effect arising in region growing processes.
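For readers unfamiliar with the two LIP operations named above, the sketch below gives their standard definitions for gray tones in [0, M): f ⊕ g = f + g − f·g/M and λ ⊗ f = M − M·(1 − f/M)^λ. The choice M = 256 (8-bit images) is an assumption, and the heterogeneity criteria built on these operators are not reproduced here because the abstract does not state them.

```python
M = 256.0  # assumed upper bound of the gray-tone range (8-bit images)


def lip_add(f, g, m=M):
    """LIP addition of two gray tones: f + g - f*g/m."""
    return f + g - f * g / m


def lip_scalar_mul(lam, f, m=M):
    """LIP scalar multiplication: m - m * (1 - f/m) ** lam."""
    return m - m * (1.0 - f / m) ** lam


# Adding a gray tone to itself agrees with multiplying it by 2.
doubled_by_add = lip_add(100.0, 100.0)
doubled_by_mul = lip_scalar_mul(2.0, 100.0)
```

Both operations keep results inside [0, M), which is part of what makes LIP-based criteria well behaved when image contrast changes.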

1806.09919 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Tangent-Space Regularization for Neural-Network Models of Dynamical Systems

动力系统神经网络模型的切空间正则化

Fredrik Bagge Carlson, Rolf Johansson, Anders Robertsson

发表机构 * LCCC Linnaeus Center(LCCC 林奈中心)

AI总结 本文提出神经网络动力系统模型的切空间正则化方法,通过利用动力学函数的切空间特性,改进模型雅可比矩阵的正则化,减少对大量训练数据的依赖,并探讨不同网络架构对输入输出雅可比矩阵学习能力及L2正则化对系统稳定性的影响。

详情
AI中文摘要

本文介绍了用于动力系统神经网络模型的切空间正则化概念。在控制应用所关注的许多物理系统中,动力学函数的切空间表现出有用的性质(例如光滑性),这促使我们利用对动力学切空间的假设,沿系统轨迹对模型雅可比矩阵进行正则化。若不作任何假设,神经网络需要大量训练数据才能在不过拟合的情况下学习完整的非线性动力学。本文比较了不同网络架构在一步预测和仿真性能上的表现,并研究了不同架构学习出具有正确输入输出雅可比矩阵的模型的倾向。此外,还研究了L2权重正则化对所学雅可比矩阵特征值谱的影响,进而对系统稳定性的影响。

英文摘要

This work introduces the concept of tangent space regularization for neural-network models of dynamical systems. The tangent space to the dynamics function of many physical systems of interest in control applications exhibits useful properties, e.g., smoothness, motivating regularization of the model Jacobian along system trajectories using assumptions on the tangent space of the dynamics. Without assumptions, large amounts of training data are required for a neural network to learn the full non-linear dynamics without overfitting. We compare different network architectures on one-step prediction and simulation performance and investigate the propensity of different architectures to learn models with correct input-output Jacobian. Furthermore, the influence of $L_2$ weight regularization on the learned Jacobian eigenvalue spectrum, and hence system stability, is investigated.
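One way to read "regularization of the model Jacobian along system trajectories" under a smoothness assumption is to penalize how much the Jacobian changes between consecutive trajectory points. The sketch below computes such a penalty with finite differences. The exact regularizer used in the paper is not given in the abstract, so the penalty form, function names and test models here are all assumptions.

```python
import math


def numerical_jacobian(f, x, eps=1e-5):
    """Central-difference Jacobian of f: R^n -> R^m at point x."""
    out_dim = len(f(x))
    jac = [[0.0] * len(x) for _ in range(out_dim)]
    for j in range(len(x)):
        forward, backward = list(x), list(x)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = f(forward), f(backward)
        for i in range(out_dim):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def jacobian_smoothness_penalty(f, trajectory):
    """Sum of squared Frobenius norms of J(x_{t+1}) - J(x_t) along a trajectory."""
    jacobians = [numerical_jacobian(f, x) for x in trajectory]
    penalty = 0.0
    for j_prev, j_next in zip(jacobians, jacobians[1:]):
        for row_prev, row_next in zip(j_prev, j_next):
            penalty += sum((a - b) ** 2 for a, b in zip(row_prev, row_next))
    return penalty


trajectory = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
# A linear model has a constant Jacobian, so its penalty is numerically zero.
linear_penalty = jacobian_smoothness_penalty(
    lambda x: [2.0 * x[0] + x[1], -x[1]], trajectory)
# A nonlinear model has a state-dependent Jacobian and is penalized.
nonlinear_penalty = jacobian_smoothness_penalty(
    lambda x: [x[0] ** 2, math.sin(x[1])], trajectory)
```

In training, a term like this would be added to the prediction loss with a weight, nudging the model toward dynamics whose local linearization varies slowly along the data.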

1806.08083 2026-06-04 cs.AI cs.SY eess.SY

Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop

拓展主动推断领域:感知-动作循环中的更多内在动机

Martin Biehl, Christian Guckelsberger, Christoph Salge, Simón C. Smith, Daniel Polani

发表机构 * Araya Inc.(Araya公司) Computational Creativity Group, Department of Computing, Goldsmiths, University of London(伦敦大学金史密斯学院计算系计算创意小组) Game Innovation Lab, Department of Computer Science and Engineering, New York University(纽约大学计算机科学与工程系游戏创新实验室) Sepia Lab, Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire(赫特福德大学计算机科学系自适应系统研究组Sepia实验室) Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh(爱丁堡大学信息学院感知、行动与行为研究所)

AI总结 本文探讨主动推断中是否可利用其他内在动机替代原有动机,同时保持核心机制,并通过形式化方法连接通用强化学习。

Comments 53 pages, 6 figures, 2 tables

详情
AI中文摘要

主动推断是一种雄心勃勃的理论,将自主智能体的感知、推断和动作选择统一于单一原则之下。它为包括意识在内的许多认知现象提供了生物学上合理的解释。在主动推断中,动作选择由一个目标函数驱动,该函数依据当前推断出的关于世界的信念来评估可能的未来动作。主动推断在本质上独立于外在奖励,因此在不同环境或智能体形态等条件下具有高度鲁棒性。在文献中,具有这种独立性的范式被归纳为内在动机。一般而言,与主动推断不同,这些动机模型并不限定特定的推断和动作选择机制。本文研究主动推断的推断和动作选择机制是否也能被用于原有内在动机之外的其他内在动机。感知-动作循环明确地将推断和动作选择与环境和智能体记忆联系起来,因此被用作我们分析的基础。我们重构了主动推断方法,在其中定位其原始表述,并展示如何在保留许多原始特征的同时使用其他内在动机。此外,我们借助所提出的形式化方法说明了它与通用强化学习的联系。主动推断研究可能受益于对其他内在动机所诱导的动力学进行比较。内在动机研究则可能受益于一种实现内在动机智能体的新途径,这种途径同样具有主动推断的生物学合理性。

英文摘要

Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across, e.g., different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.