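arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.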

1905.07501 2026-06-04 cs.LG cs.NA math.NA stat.ML

Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement learning

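Panos Stinis

Affiliations: Advanced Computing, Mathematics and Data Division, Pacific Northwest National Laboratory

AI summary: This paper studies how to enforce constraints derived from a dynamical system during supervised, unsupervised, and reinforcement learning in order to accelerate the training of deep neural networks and improve their predictive ability. For reinforcement learning, it also proposes a novel approach based on homotopy of the action-value function to stabilize and accelerate training.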

Comments 30 pages, 5 figures

Abstract

We assume that we are given a time series of data from a dynamical system and our task is to learn the flow map of the dynamical system. We present a collection of results on how to enforce constraints coming from the dynamical system in order to accelerate the training of deep neural networks to represent the flow map of the system as well as increase their predictive ability. In particular, we provide ways to enforce constraints during training for all three major modes of learning, namely supervised, unsupervised and reinforcement learning. In general, the dynamic constraints need to include terms which are analogous to memory terms in model reduction formalisms. Such memory terms act as a restoring force which corrects the errors committed by the learned flow map during prediction. For supervised learning, the constraints are added to the objective function. For the case of unsupervised learning, in particular generative adversarial networks, the constraints are introduced by augmenting the input of the discriminator. Finally, for the case of reinforcement learning and in particular actor-critic methods, the constraints are added to the reward function. In addition, for the reinforcement learning case, we present a novel approach based on homotopy of the action-value function in order to stabilize and accelerate training. We use numerical results for the Lorenz system to illustrate the various constructions.
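The supervised case described above, where constraints are added to the objective function, can be sketched as follows. This is a minimal illustration and not the paper's formulation: it assumes the constraint is the forward-Euler residual of the Lorenz equations with the standard parameters (sigma = 10, rho = 28, beta = 8/3), it omits the memory terms the abstract calls for, and the names `lorenz_rhs`, `constrained_loss`, and `lam` are hypothetical.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for a state (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def constrained_loss(pred_next, true_next, current, dt, lam=1.0):
    """Squared data misfit plus a weighted penalty on the dynamics residual.

    The penalty (an assumption of this sketch) measures how far the predicted
    step is from a forward-Euler step of the known dynamics.
    """
    data_term = sum((p - t) ** 2 for p, t in zip(pred_next, true_next))
    rhs = lorenz_rhs(current)
    constraint_term = sum(
        ((p - c) / dt - f) ** 2 for p, c, f in zip(pred_next, current, rhs)
    )
    return data_term + lam * constraint_term
```

In a training loop, `pred_next` would be the output of the network that represents the flow map, and `lam` would balance fitting the data against respecting the dynamics.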

1902.01119 2026-06-04 cs.AI cs.CL cs.LG cs.SY eess.SY

The Natural Language of Actions

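Guy Tennenholtz, Shie Mannor

Affiliations: Faculty of Electrical Engineering, Technion Institute of Technology, Israel

AI summary: This paper proposes Act2Vec, a framework for learning context-based action representations to improve reinforcement learning performance; it groups similar actions and exploits relations between actions to improve Q-value approximation and state representation.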

Comments Published in the proceedings of the 36th International Conference on Machine Learning (ICML 2019)

1811.01516 2026-06-04 cs.RO cs.SY eess.SY

SLAMBooster: An Application-aware Controller for Approximation in SLAM

Yan Pei, Swarnendu Biswas, Donald S. Fussell, Keshav Pingali

Affiliations: University of Texas at Austin, USA; Indian Institute of Technology Kanpur, India

AI summary: This paper proposes SLAMBooster, an application-aware online controller for approximate computing in SLAM that dynamically adjusts approximation knobs to reduce computation time and energy consumption while maintaining localization accuracy.

Comments 13 pages

Abstract

Simultaneous Localization and Mapping (SLAM) is the problem of constructing a map of a mobile agent's environment while localizing the agent within the map. Dense SLAM algorithms perform reconstruction and localization at pixel granularity. These algorithms require a lot of computational power, which has hindered their use on low-power resource-constrained devices. Approximate computing can be used to speed up SLAM implementations as long as the approximations do not prevent the agent from navigating correctly through the environment. Previous studies of approximation in SLAM have assumed that the entire trajectory of the agent is known before the agent starts, and they have focused on offline controllers that set approximation knobs at the start of the trajectory. In practice, the trajectory is not known ahead of time, and allowing knob settings to change dynamically opens up more opportunities for reducing computation time and energy. We describe SLAMBooster, an application-aware, online control system for dense SLAM that adaptively controls approximation knobs during the motion of the agent. SLAMBooster is based on a control technique called proportional-integral-derivative (PID) controller but our experiments showed this application-agnostic controller led to an unacceptable reduction in localization accuracy. To address this problem, SLAMBooster also exploits domain knowledge for controlling approximation by performing smooth surface detection and pose correction. We implemented SLAMBooster in the open-source SLAMBench framework and evaluated it on several trajectories. Our experiments show that on the average, SLAMBooster reduces the computation time by 72% and energy consumption by 35% on an embedded platform, while maintaining the accuracy of localization within reasonable bounds. These improvements make it feasible to deploy SLAM on a wider range of devices.
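The PID technique that the abstract names as SLAMBooster's starting point can be sketched with a textbook discrete-time controller. The sketch is generic: the gains, the error signal, and the idea of clamping a single knob to a range are assumptions for illustration, and the domain-specific parts (smooth surface detection, pose correction) are not shown.

```python
class PID:
    """Textbook discrete-time PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        """Return the control output for the current error sample."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample yet
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def clamp(value, low, high):
    """Keep a knob setting inside its admissible range."""
    return max(low, min(high, value))
```

A hypothetical online use would be `knob = clamp(knob + pid.step(target - measured, dt), 0.0, 1.0)`, evaluated once per frame, where `measured` is whatever accuracy proxy the controller tracks.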

1905.05380 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Control Regularization for Reduced Variance Reinforcement Learning

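Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

Affiliations: California Institute of Technology, Pasadena, CA; University of Michigan, Ann Arbor, MI; Rice University, Houston, TX

AI summary: This paper proposes a functional regularization method that reduces the variance of reinforcement learning in continuous control by regularizing the behavior of the deep policy toward a policy prior, yielding a bias-variance trade-off, approximately preserved stability guarantees, and more efficient training.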

Comments Appearing in ICML 2019

Abstract

Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
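One simple way to picture regularizing a policy toward a prior in function space is a convex combination of the two actions. The abstract does not give the formula, so the weighting `lam / (1 + lam)` and the name `regularized_action` are assumptions of this sketch rather than a statement of the authors' method.

```python
def regularized_action(policy_action, prior_action, lam):
    """Blend learned and prior actions component-wise.

    lam = 0 returns the learned policy's action; a large lam approaches the prior.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    w = lam / (1.0 + lam)  # weight placed on the prior
    return [(1.0 - w) * u + w * p for u, p in zip(policy_action, prior_action)]
```

Under this blend, any noise in the learned action is scaled by `1 / (1 + lam)`, while the result is pulled toward the prior; that is one way to read the bias-variance trade-off the abstract mentions.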

1905.04351 2026-06-04 cs.LG cs.NA math.NA physics.comp-ph physics.data-an stat.ML

Solving Irregular and Data-enriched Differential Equations using Deep Neural Networks

Craig Michoski, Milos Milosavljevic, Todd Oliver, David Hatch

Affiliations: Oden Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712; Department of Astronomy, The University of Texas at Austin, Austin, TX 78712; Institute for Fusion Studies, University of Texas at Austin, Austin, TX 78712

AI summary: This paper presents a method for solving irregular and data-enriched differential equations with deep neural networks; by analyzing the Sod shock tube solution and a shock solution in compressible magnetohydrodynamics, it illustrates the method's advantages for performance improvements and parameter space exploration.

Comments 21 pages, 14 figures, 3 tables

Abstract

Recent work has introduced a simple numerical method for solving partial differential equations (PDEs) with deep neural networks (DNNs). This paper reviews and extends the method while applying it to analyze one of the most fundamental features in numerical PDEs and nonlinear analysis: irregular solutions. First, the Sod shock tube solution to compressible Euler equations is discussed, analyzed, and then compared to conventional finite element and finite volume methods. These methods are extended to consider performance improvements and simultaneous parameter space exploration. Next, a shock solution to compressible magnetohydrodynamics (MHD) is solved for, and used in a scenario where experimental data is utilized to enhance a PDE system that is a priori insufficient to validate against the observed/experimental data. This is accomplished by enriching the model PDE system with source terms and using supervised training on synthetic experimental data. The resulting DNN framework for PDEs seems to demonstrate almost fantastical ease of system prototyping, natural integration of large data sets (be they synthetic or experimental), all while simultaneously enabling single-pass exploration of the entire parameter space.

1905.03130 2026-06-04 cs.RO cs.SY eess.SY

Cooperative Distributed Robust Control of Modular Mobile Robots with Bounded Curvature and Velocity

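Xiaorui Zhu, Youngshik Kim, Mark A. Minor

Affiliations: University of Utah

AI summary: This paper studies a new motion control system for compliant framed wheeled modular mobile robots (CFMMR). The system combines a bounded-curvature motion controller with a nonlinear damping dynamic controller, evaluates different control structures through several forms of controller interaction, and validates posture regulation experimentally.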

1905.01150 2026-06-04 cs.RO cs.MA cs.SY eess.SY

A Right-of-Way Based Strategy to Implement Safe and Efficient Driving at Non-Signalized Intersections for Automated Vehicles

Yadong Xing, Can Zhao, ZhiHeng Li, Yi Zhang, Li Li, Fei-Yue Wang, Xiao Wang, Yujing Wang, Yuelong Su, Dongpu Cao

Affiliations: Graduate School at Shenzhen, Tsinghua University; Department of Automation, BNRist, Tsinghua University; traffic management solution division

AI summary: This paper proposes a strategy based on right-of-way assignment for safe and efficient automated driving at non-signalized intersections. Tests against existing strategies indicate that it improves traffic efficiency over the original strategy; limited by communication range and the lack of long-term planning, it falls short of cooperative driving strategies but has a lower communication cost.

Comments 6 pages, 7 figures

1809.07412 2026-06-04 cs.LG cs.AI cs.SY eess.SY

Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture

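Martin V. Butz, David Bilkey, Dania Humaidan, Alistair Knott, Sebastian Otte

Affiliations: Cognitive Modeling Group, Computer Science Department, University of Tübingen

AI summary: This study presents REPRISE, a monolithic neural event inference architecture that learns temporal event-predictive models of dynamical systems and combines retrospective and prospective inference to predict and control sensorimotor dynamics effectively.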

Comments This is the final revision submitted to the Neural Networks journal. The revision mainly includes improvements in language, explanation, and additional references and system relations

Abstract

We introduce REPRISE, a REtrospective and PRospective Inference SchEme, which learns temporal event-predictive models of dynamical systems. REPRISE infers the unobservable contextual event state and accompanying temporal predictive models that best explain the recently encountered sensorimotor experiences retrospectively. Meanwhile, it optimizes upcoming motor activities prospectively in a goal-directed manner. Here, REPRISE is implemented by a recurrent neural network (RNN), which learns temporal forward models of the sensorimotor contingencies generated by different simulated dynamic vehicles. The RNN is augmented with contextual neurons, which enable the encoding of distinct, but related, sensorimotor dynamics as compact event codes. We show that REPRISE concurrently learns to separate and approximate the encountered sensorimotor dynamics: it analyzes sensorimotor error signals adapting both internal contextual neural activities and connection weight values. Moreover, we show that REPRISE can exploit the learned model to induce goal-directed, model-predictive control, that is, approximate active inference: Given a goal state, the system imagines a motor command sequence optimizing it with the prospective objective to minimize the distance to the goal. The RNN activities thus continuously imagine the upcoming future and reflect on the recent past, optimizing the predictive model, the hidden neural state activities, and the upcoming motor activities. As a result, event-predictive neural encodings develop, which allow the invocation of highly effective and adaptive goal-directed sensorimotor control.

1904.13304 2026-06-04 cs.LG cs.SY eess.SP eess.SY stat.ML

A supervised-learning-based strategy for optimal demand response of an HVAC System

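Youngjin Kim

Affiliations: Member, IEEE

AI summary: This paper proposes a supervised-learning-based strategy for optimal demand response of an HVAC system; it trains artificial neural networks and replicates them with piecewise linear equations to solve the optimal demand response problem for HVAC systems in multi-zone buildings while ensuring thermal comfort and cost-effective operation.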

Comments 12 pages

Abstract

The large thermal capacity of buildings enables heating, ventilating, and air-conditioning (HVAC) systems to be exploited as demand response (DR) resources. Optimal DR of HVAC units is challenging, particularly for multi-zone buildings, because this requires detailed physics-based models of zonal temperature variations for HVAC system operation and building thermal conditions. This paper proposes a new strategy for optimal DR of an HVAC system in a multi-zone building, based on supervised learning (SL). Artificial neural networks (ANNs) are trained with data obtained under normal building operating conditions. The ANNs are replicated using piecewise linear equations, which are explicitly integrated into an optimal scheduling problem for price-based DR. The optimization problem is solved for various electricity prices and building thermal conditions. The solutions are further used to train a deep neural network (DNN) to directly determine the optimal DR schedule, referred to here as supervised-learning-aided meta-prediction (SLAMP). Case studies are performed using three different methods: explicit ANN replication (EAR), SLAMP, and physics-based modeling. The case study results verify the effectiveness of the proposed SL-based strategy, in terms of both practical applicability and computational time, while also ensuring the thermal comfort of occupants and cost-effective operation of the HVAC system.

1806.06317 2026-06-04 cs.LG cs.NA math.NA stat.ML

Laplacian Smoothing Gradient Descent

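Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, Alex Lin

Affiliations: University of California, Los Angeles

AI summary: This paper proposes a simple modification of gradient descent and stochastic gradient descent: multiplying the gradient by the inverse of a positive definite matrix (computable efficiently by FFT), which reduces variance, permits larger step sizes, and improves generalization accuracy, with both theoretical and empirical support.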

Comments 28 pages, 15 figures

Abstract

We propose a class of very simple modifications of gradient descent and stochastic gradient descent. We show that when applied to a large variety of machine learning problems, ranging from logistic regression to deep neural nets, the proposed surrogates can dramatically reduce the variance, allow a larger step size, and improve the generalization accuracy. The methods only involve multiplying the usual (stochastic) gradient by the inverse of a positive definite matrix (which can be computed efficiently by FFT) with a low condition number coming from a one-dimensional discrete Laplacian or its high-order generalizations. This multiplication also preserves the mean, increases the smallest component, and decreases the largest component. The theory of Hamilton-Jacobi partial differential equations demonstrates that the implicit version of the new algorithm is almost the same as doing gradient descent on a new function which (i) has the same global minima as the original function and (ii) is "more convex". Moreover, we show that optimization algorithms with these surrogates converge uniformly in the discrete Sobolev $H_\sigma^p$ sense and reduce the optimality gap for convex optimization problems. The code is available at: https://github.com/BaoWangMath/LaplacianSmoothing-GradientDescent
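The gradient transformation described in the abstract can be sketched in a few lines. This sketch assumes the matrix is I - sigma * L, with L the one-dimensional discrete Laplacian with periodic boundary conditions, so that the matrix is circulant and the discrete Fourier transform diagonalizes it. To stay within the standard library it uses a naive O(n^2) transform; in practice an FFT routine such as `numpy.fft.fft` and `numpy.fft.ifft` would take its place. The function names are hypothetical and this is not the authors' released code.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform; an FFT computes the same result faster."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j, v in enumerate(values)
        )
        out.append(total / n if inverse else total)
    return out


def laplacian_smooth(grad, sigma=1.0):
    """Solve (I - sigma * L) x = grad for the periodic 1-D Laplacian L."""
    n = len(grad)
    # First column of the circulant matrix: stencil (-sigma, 1 + 2*sigma, -sigma).
    col = [0.0] * n
    col[0] += 1.0 + 2.0 * sigma
    col[1 % n] -= sigma
    col[-1] -= sigma
    # In Fourier space the circulant solve is a pointwise division.
    x_hat = [g / c for g, c in zip(dft(grad), dft(col))]
    return [z.real for z in dft(x_hat, inverse=True)]
```

A smoothed descent step would then read `w = [wi - lr * gi for wi, gi in zip(w, laplacian_smooth(g, sigma))]`, with `lr` and `sigma` chosen by the user.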

1809.00846 2026-06-04 cs.LG cs.CV cs.SY eess.SY stat.ML

Towards Understanding Regularization in Batch Normalization

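Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng

Affiliations: The Chinese University of Hong Kong; SenseTime Research; The University of Hong Kong

AI summary: This paper analyzes theoretically the convergence and generalization of neural network training with batch normalization, shows that batch normalization acts as an implicit regularizer, and verifies its regularization properties experimentally in convolutional neural networks.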

Comments International Conference on Learning Representations (ICLR)

1904.09656 2026-06-04 cs.LG cs.NA math.NA

Solution of Definite Integrals using Functional Link Artificial Neural Networks

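Satyasaran Changdar, Snehangshu Bhattacharjee

Affiliations: Department of Information Technology, Institute of Engineering and Management

AI summary: This paper proposes a new method for solving definite integrals with artificial neural networks: by minimizing a carefully designed error function, it builds a neural network as a novel alternative to traditional numerical methods, particularly suited to integrating higher-order polynomials.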

Comments 14 pages, 7 figures

1903.03712 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Adaptive Power System Emergency Control using Deep Reinforcement Learning

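Qiuhua Huang, Renke Huang, Weituo Hao, Jie Tan, Rui Fan, Zhenyu Huang

Affiliations: Pacific Northwest National Laboratory; Battelle; U.S. Department of Energy; Deep Science laboratory; Google Brain

AI summary: This paper proposes an adaptive power system emergency control method based on deep reinforcement learning, which uses high-dimensional feature extraction and non-linear generalization to cope with uncertainties and variations in modern grids, and reports excellent performance and robustness for generator dynamic braking and under-voltage load shedding.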

Comments 12 pages

Abstract

Power system emergency control is generally regarded as the last safety net for grid security and resiliency. Existing emergency control schemes are usually designed off-line based on either the conceived "worst" case scenario or a few typical operation scenarios. These schemes are facing significant adaptiveness and robustness issues as increasing uncertainties and variations occur in modern electrical grids. To address these challenges, for the first time, this paper developed novel adaptive emergency control schemes using deep reinforcement learning (DRL), by leveraging the high-dimensional feature extraction and non-linear generalization capabilities of DRL for complex power systems. Furthermore, an open-source platform named RLGC has been designed for the first time to assist the development and benchmarking of DRL algorithms for power system control. Details of the platform and DRL-based emergency control schemes for generator dynamic braking and under-voltage load shedding are presented. Extensive case studies performed in both two-area four-machine system and IEEE 39-Bus system have demonstrated the excellent performance and robustness of the proposed schemes.

1712.01975 2026-06-04 cs.LG cs.NA math.NA math.OC

Regularization and feature selection for large dimensional data

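Nand Sharma, Prathamesh Verlekar, Rehab Ashary, Sui Zhiquan

Affiliations: Department of Mathematics, Colorado State University; Department of Computer Science, Colorado State University

AI summary: This paper studies five embedded feature selection methods that regularize via ridge regression, Lasso regression, or a combination of the two, in order to reduce the feature space and improve classification performance on high-dimensional data.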

1808.07921 2026-06-04 cs.RO cs.AI cs.PL cs.SE cs.SY eess.SY

SOTER: A Runtime Assurance Framework for Programming Safe Robotics Systems

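Ankush Desai, Shromona Ghosh, Sanjit A. Seshia, Natarajan Shankar, Ashish Tiwari

Affiliations: University of California at Berkeley, CA, USA; SRI International; Microsoft

AI summary: This paper presents SOTER, a framework that combines a programming language with an integrated runtime assurance system to provide safety guarantees for robotics systems, so that safety requirements are still met when uncertified components are used.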

Abstract

The recent drive towards achieving greater autonomy and intelligence in robotics has led to high levels of complexity. Autonomous robots increasingly depend on third party off-the-shelf components and complex machine-learning techniques. This trend makes it challenging to provide strong design-time certification of correct operation. To address these challenges, we present SOTER, a robotics programming framework with two key components: (1) a programming language for implementing and testing high-level reactive robotics software and (2) an integrated runtime assurance (RTA) system that helps enable the use of uncertified components, while still providing safety guarantees. SOTER provides language primitives to declaratively construct a RTA module consisting of an advanced, high-performance controller (uncertified), a safe, lower-performance controller (certified), and the desired safety specification. The framework provides a formal guarantee that a well-formed RTA module always satisfies the safety specification, without completely sacrificing performance by using higher performance uncertified components whenever safe. SOTER allows the complex robotics software stack to be constructed as a composition of RTA modules, where each uncertified component is protected using a RTA module. To demonstrate the efficacy of our framework, we consider a real-world case-study of building a safe drone surveillance system. Our experiments both in simulation and on actual drones show that the SOTER-enabled RTA ensures the safety of the system, including when untrusted third-party components have bugs or deviate from the desired behavior.
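The structure of an RTA module as the abstract describes it (an uncertified high-performance controller, a certified safe controller, and a safety specification) can be pictured as a switch between two controllers. The sketch below is plain Python, not SOTER's language, and the predicate `is_safe_to_use_advanced` is a hypothetical stand-in for the decision logic; a real module would need a check that guarantees the safe controller can still recover after the switch.

```python
from typing import Callable


def make_rta_module(
    advanced: Callable[[float], float],
    safe: Callable[[float], float],
    is_safe_to_use_advanced: Callable[[float], bool],
) -> Callable[[float], float]:
    """Wrap two controllers so the uncertified one runs only when judged safe."""

    def controller(state: float) -> float:
        if is_safe_to_use_advanced(state):
            return advanced(state)  # high performance, uncertified
        return safe(state)  # lower performance, certified

    return controller
```

As a toy example, for a one-dimensional position that must stay within [-10, 10], the advanced controller could be allowed only while `abs(state) < 8.0`, with the safe controller steering back toward zero otherwise.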

1806.02998 2026-06-04 cs.CV cs.NA math.GN math.NA

Logarithmic mathematical morphology: a new framework adaptive to illumination changes

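Guillaume Noyel

Affiliations: University of Strathclyde Institute of Global Public Health; International Prevention Research Institute (iPRI), Lyon, France

AI summary: This paper proposes a new mathematical morphology framework based on the Logarithmic Image Processing model. The framework adapts to illumination changes caused by variations in exposure time or light intensity and, by defining logarithmic dilation and erosion operators, handles low-contrast information more effectively.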

DOI: 10.1007/978-3-030-13469-3_53
Journal ref: 23rd Iberoamerican Congress on Pattern Recognition (CIARP 2018), Nov 2018, Madrid, Spain. Springer International Publishing, Lecture Notes in Computer Science, 11401, pp.453-461, 2019, Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. https://atvs.ii.uam.es/ciarp2018/

Abstract

A new set of mathematical morphology (MM) operators adaptive to illumination changes caused by variation of exposure time or light intensity is defined thanks to the Logarithmic Image Processing (LIP) model. This model based on the physics of acquisition is consistent with human vision. The fundamental operators, the logarithmic-dilation and the logarithmic-erosion, are defined with the LIP-addition of a structuring function. The combination of these two adjunct operators gives morphological filters, namely the logarithmic-opening and closing, useful for pattern recognition. The mathematical relation existing between "classical" dilation and erosion and their logarithmic-versions is established facilitating their implementation. Results on simulated and real images show that logarithmic-MM is more efficient on low-contrasted information than "classical" MM.
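A one-dimensional sketch may help make the logarithmic dilation concrete. It assumes the usual LIP addition, f + g - f*g/M with M = 256 for 8-bit grey levels, and defines the dilation by analogy with classical functional dilation, replacing ordinary addition of the structuring function by LIP addition. Both the formula and the border handling (out-of-range samples are skipped) are assumptions of this sketch, not quotations from the paper.

```python
M = 256.0  # assumed upper bound of the grey-level range (8-bit images)


def lip_add(f, g, m=M):
    """LIP addition; equals m - (m - f) * (m - g) / m, so it stays below m."""
    return f + g - f * g / m


def log_dilation_1d(f, b, m=M):
    """Supremum over offsets h of f(x - h) LIP-added to b(h).

    b is a structuring function of odd length, centred on its middle sample.
    """
    r = len(b) // 2
    out = [float("-inf")] * len(f)
    for k, bh in enumerate(b):
        h = k - r
        for x in range(len(f)):
            if 0 <= x - h < len(f):
                out[x] = max(out[x], lip_add(f[x - h], bh, m))
    return out
```

With a structuring function that is identically zero, the LIP addition leaves each sample unchanged and the operator reduces to a classical flat dilation, that is, a moving maximum.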

1904.08361 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems

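Ran Wang, Karthikeya Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty

Affiliations: College of Astronautics, Nanjing University; Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA

AI summary: This paper proposes a decoupled data-based approach for learning to control nonlinear stochastic dynamical systems with continuous state space, continuous action space, and unknown dynamics. Using a decoupled open-loop/closed-loop scheme, it solves an open-loop deterministic trajectory optimization problem with a black-box simulation model, then builds a closed-loop controller by linearizing the dynamics about this nominal trajectory so that a linear quadratic regulator algorithm can be applied. The approach is shown to be approximately optimal, and simulations suggest a significant reduction in training time compared with other state-of-the-art algorithms.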

Abstract

This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. This class of problems is typically addressed in the stochastic adaptive control and reinforcement learning literature using model-based and model-free approaches respectively. Both methods rely on solving a dynamic programming problem, either directly or indirectly, for finding the optimal closed-loop control policy. The inherent "curse of dimensionality" associated with dynamic programming methods makes these approaches computationally difficult as well. This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled "open-loop/closed-loop" approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a closed-loop control is developed around this open-loop trajectory by linearization of the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed-loop control. We show that the performance of the D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state-of-the-art algorithms.
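The closed-loop half of the scheme, linearizing about a nominal trajectory and then applying a linear quadratic regulator, can be sketched for a scalar system. The sketch assumes the black-box model is a function `step(x, u)` returning the next state and estimates the linearization by central finite differences; the abstract does not say how the linearization is obtained, so that choice and all names here are hypothetical.

```python
def linearize(step, x, u, eps=1e-5):
    """Finite-difference estimates of d(step)/dx and d(step)/du at (x, u)."""
    a = (step(x + eps, u) - step(x - eps, u)) / (2 * eps)
    b = (step(x, u + eps) - step(x, u - eps)) / (2 * eps)
    return a, b


def lqr_gains(a_seq, b_seq, q, r, q_final):
    """Backward Riccati recursion for scalar time-varying dynamics.

    Returns gains k_t for the feedback law du_t = -k_t * dx_t, where dx_t and
    du_t are deviations from the nominal state and control.
    """
    p = q_final
    gains = []
    for a, b in zip(reversed(a_seq), reversed(b_seq)):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * (a - b * k)
        gains.append(k)
    return gains[::-1]
```

The applied control would then be `u_nominal[t] - gains[t] * (x[t] - x_nominal[t])`, with `a_seq` and `b_seq` obtained by calling `linearize` along the open-loop trajectory.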

1904.07200 2026-06-04 cs.LG cs.NA math.NA stat.ML

A Discussion on Solving Partial Differential Equations using Neural Networks

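Tim Dockhorn

Affiliations: Department of Applied Mathematics, University of Waterloo

AI summary: This paper examines the ability of neural networks to solve partial differential equations. Numerical experiments show that small neural networks can accurately learn complex solutions; the paper analyzes how random weight initialization affects solution quality and discusses the choice of loss function, the strengths and weaknesses of neural networks relative to classical numerical methods, and directions for future research.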

Comments 9 pages, 2 figures

1904.06892 2026-06-04 cs.RO cs.SY eess.SY

Learning to Guide: Guidance Law Based on Deep Meta-learning and Model Predictive Path Integral Control

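Chen Liang, Weihong Wang, Zhenghua Liu, Chao Lai, Benchun Zhou

Affiliations: School of Automation Science and Electrical Engineering, Beihang University; Navigation and Control Technology Research Institute of China North Industries Group Corporation

AI summary: This paper proposes a novel guidance scheme based on model-based deep reinforcement learning. A deep neural network serves as the predictive model of the guidance dynamics within a model predictive path integral (MPPI) control framework, and meta-learning lets the deep neural dynamics model adapt online to changes in the environment, alleviating the performance deterioration of standard MPPI control caused by mismatch between the actual environment and the training data. On this basis, a novel guidance law is constructed for intercepting maneuvering targets under actuator failure.

Comments Code available at https://github.com/tccliangchen/deep_meta-learning_guidance_law. Published in IEEE Access, 2019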

DOI: 10.1109/ACCESS.2019.2909579

Abstract

In this paper, we present a novel guidance scheme based on a model-based deep reinforcement learning (RL) technique. With the model-based deep RL method, a deep neural network is trained as a predictive model of the guidance dynamics, which is incorporated into a model predictive path integral (MPPI) control framework. However, the traditional MPPI framework assumes that the actual environment is similar to the training dataset for the deep neural network, which is impractical given different target maneuvers, other perturbations, and actuator failures. To address this problem, our method utilizes a meta-learning technique to make the deep neural dynamics model adapt to such changes online. With this approach we can alleviate the performance deterioration of standard MPPI control caused by the difference between the actual environment and the training data. Then, a novel guidance law for a varying-velocity interceptor intercepting a maneuvering target with a desired terminal impact angle under actuator failure is constructed based on the aforementioned techniques. Simulation and experiment results under different cases show the effectiveness and robustness of the proposed guidance law in achieving successful interceptions of maneuvering targets.
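The core update of MPPI control, as it is commonly presented, reweights sampled control perturbations by the exponentiated negative cost of their rollouts. The sketch below shows only that weighting step for a scalar control per time step; the rollout costs are taken as given (in the paper's setting they would come from the learned dynamics model), the meta-learning adaptation is not shown, and the function name and `temperature` parameter are hypothetical.

```python
import math


def mppi_update(u_nominal, noise, costs, temperature=1.0):
    """Cost-weighted average of sampled perturbations added to the nominal controls.

    u_nominal: list of T controls; noise: K lists of T perturbations;
    costs: K rollout costs, one per perturbed control sequence.
    """
    c_min = min(costs)  # subtracting the minimum avoids underflow of every weight
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noise))
        for t, u in enumerate(u_nominal)
    ]
```

A low `temperature` concentrates the update on the cheapest sampled rollout, while a high one approaches a plain average of the perturbations.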

1904.06680 2026-06-04 cs.RO cs.SY eess.SY

Online Sampling in the Parameter Space of a Neural Network for GPU-accelerated Motion Planning of Autonomous Vehicles

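Mogens Graf Plessen

Affiliations: MPG

AI summary: This paper proposes online sampling in the parameter space of a neural network for GPU-accelerated motion planning of autonomous vehicles. Neural networks parametrize the controller so that nonlinear, non-convex systems can be handled; the sampled parametrization is held constant over the prediction horizon, while steering and longitudinal acceleration controls are determined from the varying feature vectors.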

Comments 8 pages, 8 figures, 3 tables, conference paper

Abstract

This paper proposes online sampling in the parameter space of a neural network for GPU-accelerated motion planning of autonomous vehicles. Neural networks are used as controller parametrization since they can handle nonlinear non-convex systems and their complexity does not scale with prediction horizon length. Network parametrizations are sampled at each sampling time and then held constant throughout the prediction horizon. Controls still vary over the prediction horizon due to varying feature vectors fed to the network. Full-dimensional vehicles are modeled by polytopes. Under the assumption of obstacle point data, and their extrapolation over a prediction horizon under constant velocity assumption, collision avoidance reduces to linear inequality checks. Steering and longitudinal acceleration controls are determined simultaneously. The proposed method is designed for parallelization and therefore well-suited to benefit from continuing advancements in hardware such as GPUs. Characteristics of proposed method are illustrated in 5 numerical simulation experiments including dynamic obstacle avoidance, waypoint tracking requiring alternating forward and reverse driving with maximal steering, and a reverse parking scenario.
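The reduction of collision avoidance to linear inequality checks can be sketched as follows. The sketch assumes the vehicle footprint is a polytope {p : A p <= b} expressed in a fixed frame and that each obstacle point moves with constant velocity; the fixed frame, the sampling times, and the function names are simplifying assumptions, not details taken from the paper.

```python
def point_in_polytope(A, b, p):
    """True if point p satisfies every half-space constraint of A p <= b."""
    return all(
        sum(a_ij * p_j for a_ij, p_j in zip(row, p)) <= b_i
        for row, b_i in zip(A, b)
    )


def collides(A, b, p0, v, times):
    """Check an obstacle point extrapolated at constant velocity against the polytope."""
    return any(
        point_in_polytope(A, b, [p + vel * t for p, vel in zip(p0, v)])
        for t in times
    )
```

For a rectangular footprint of half-length 2 and half-width 1 centred at the origin, `A = [[1, 0], [-1, 0], [0, 1], [0, -1]]` and `b = [2, 2, 1, 1]`. Each check is independent of the others, which fits the parallel evaluation the abstract emphasizes.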

1904.06524 2026-06-04 cs.RO cs.SY eess.SY

On Model Adaptation for Sensorimotor Control of Robots

机器人感觉运动控制中的模型自适应

David Navarro-Alarcon, Andrea Cherubini, Xiang Li

发表机构 * The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong（香港理工大学）； University of Montpellier / LIRMM, Montpellier, France（蒙彼利埃大学 / LIRMM）； The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong（香港中文大学）

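AI summary: This paper studies how to compute adaptive sensorimotor models that guide the motion of robotic systems whose action-to-perception relations are uncertain, and illustrates the approach with two case studies: shape control of deformable objects and soft manipulation of an ultrasonic probe with uncalibrated sensors.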

Comments 38th Chinese Control Conference

1904.05271 2026-06-04 cs.RO cs.SY eess.SY

Indoor Testing and Simulation Platform for Close-distance Visual Inspection of Complex Structures using Micro Quadrotor UAV

复杂结构近距离视觉检测用微四旋翼无人机室内测试与仿真平台

Zhexiong Shang, Zhigang Shen

发表机构 * Durham School of Architectural Engineering & Construction, University of Nebraska-Lincoln（达勒姆建筑工程与建设学院，内布拉斯加大学林肯分校）

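AI summary: This paper presents a low-cost experimental platform based on a micro quadrotor UAV for testing UAV path planning algorithms indoors. Two existing path planning algorithms are tested on it, and the results indicate that the visual data quality and path-following accuracy suffice to simulate most practical close-distance inspection applications.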

Comments 6 pages, 6 figures, accepted in ICCCBE 2018

AI中文摘要

近年来，使用无人机（也称为无人驾驶飞行器，UAV）进行近距离视觉检测已成为多个学科中的活跃研究领域。然而，在实现自主检测之前，仍有许多挑战需要克服，尤其是在检测复杂结构时。复杂的民用结构，如桥梁、水坝和风力涡轮机，规模庞大且几何复杂。这要求使用复杂的路径规划算法来实现近距离检测，同时避免碰撞。在实践中，直接在这些结构上部署路径规划结果容易出错、成本高且充满危险。本文基于微四旋翼无人机，提出了一种经济实惠的实验平台，用于测试基于无人机的路径规划结果。该平台允许用户随时进行多种路径规划实验，而无需担心昂贵且耗时的户外飞行测试。该平台基于Crazyflie套件开发，包括Crazyflie 2.0四旋翼、Crazyradio和Loco定位系统（LPS）。该平台配备机载微型FPV相机，飞行过程中可将视觉数据实时传输到主机计算机。该平台明确设计了手动配置和航点控制功能，以提高其在路径跟随和调试方面的灵活性和性能。为了评估所提出测试平台的实用性，测试了两种现有的基于无人机的路径规划算法。结果表明，尽管存在一定程度的误差，视觉数据的质量和路径跟随的准确性足以模拟大多数实际检测应用。

英文摘要

In recent years, using drones, also known as unmanned aerial vehicles (UAVs), for close-distance visual inspection has become an active research area in many disciplines. However, many challenges remain before autonomous inspection can be achieved, especially when inspecting complex structures. Complex civil structures, such as bridges, dams and wind turbines, are large-scale and geometrically complicated. Inspecting them requires sophisticated path planning algorithms that achieve close-distance inspection while avoiding collisions. In practice, directly deploying a path planning result on such structures is error-prone, costly, and hazardous. In this paper, relying on a micro quadrotor UAV, the authors present an affordable experimental platform for testing drone-based path planning results. The platform allows users to conduct many path planning experiments at any time without worrying about expensive and time-consuming outdoor test flights. The platform is built on the Crazyflie bundle, which includes the Crazyflie 2.0 quadrotor, Crazyradio and the Loco Positioning System (LPS). With an onboard micro FPV camera, visual data can be streamed live to the host computer during flight. Manual configuration and waypoint control functions are explicitly designed into the platform to increase its flexibility and performance in path following and debugging. To evaluate the practicability of the proposed test platform, two existing drone-based path planning algorithms are tested. The results show that, although a certain level of error exists, the quality of the visual data and the accuracy of path following are high enough to simulate most practical inspection applications.

1904.05072 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Differential Dynamic Programming for Multi-Phase Rigid Contact Dynamics

多相刚体接触动力学中的微分动态规划

Rohan Budhiraja, Justin Carpentier, Carlos Mastalli, Nicolas Mansard

发表机构 * CNRS, LAAS（法国国家科学研究中心，系统分析与架构实验室）； INRIA, France（法国国家信息与自动化研究所，法国）

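AI summary: This paper proposes using Differential Dynamic Programming (DDP) to compute whole-body trajectories under multi-phase rigid contact dynamics. Exploiting angular momentum yields more efficient motions with lower forces and smaller impacts, and the method is demonstrated on large-step walking with the HRP-2 robot and on attitude control in the absence of external forces.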

Comments 6 pages, IEEE RAS International Conference on Humanoid Robots

DOI: 10.1109/HUMANOIDS.2018.8624925

AI中文摘要

当今生成高效行走运动的常见策略是将问题分解为两个连续步骤：第一步生成接触序列和质心轨迹，第二步计算遵循质心模式的全身轨迹。然而，第二步通常由逆运动学求解器等简单程序处理。与此不同，我们提出使用局部最优控制求解器，即微分动态规划（DDP），来计算全身轨迹。我们的方法通过利用角动量产生更高效的运动，具有较低的力和较小的冲击。为此，我们提出了一种新的DDP形式，利用刚体接触模型的Karush-Kuhn-Tucker约束。通过在真实HRP-2机器人上执行大步行走，以及求解无外力情况下的姿态控制问题，我们实验性地展示了这种方法的重要性。

英文摘要

A common strategy today to generate efficient locomotion movements is to split the problem into two consecutive steps: the first one generates the contact sequence together with the centroidal trajectory, while the second one computes the whole-body trajectory that follows the centroidal pattern. Yet the second step is generally handled by a simple program such as an inverse kinematics solver. In contrast, we propose to compute the whole-body trajectory by using a local optimal control solver, namely Differential Dynamic Programming (DDP). Our method produces more efficient motions, with lower forces and smaller impacts, by exploiting the Angular Momentum (AM). With this aim, we propose an original DDP formulation exploiting the Karush-Kuhn-Tucker constraint of the rigid contact model. We experimentally show the importance of this approach by executing large steps walking on the real HRP-2 robot, and by solving the problem of attitude control under the absence of external forces.
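
For readers unfamiliar with the constraint mentioned in the abstract, the block below sketches the standard Karush-Kuhn-Tucker system for rigid-contact dynamics. It is the textbook form, not a transcription of the paper's equations, and the notation is assumed here: $M$ is the joint-space inertia matrix, $b$ collects Coriolis and gravity terms, $J_c$ is the contact Jacobian, $\lambda$ the contact forces, $\tau$ the generalized torques and $v$ the generalized velocity.

```latex
% Equations of motion with contact forces, plus zero acceleration at the contacts:
%   M \dot v + b = \tau + J_c^\top \lambda,   J_c \dot v + \dot J_c v = 0
\begin{bmatrix} M & J_c^\top \\ J_c & 0 \end{bmatrix}
\begin{bmatrix} \dot v \\ -\lambda \end{bmatrix}
=
\begin{bmatrix} \tau - b \\ -\dot J_c v \end{bmatrix}
```

Solving this linear system gives the acceleration and the contact forces together, so the contact-constrained dynamics can be evaluated in a single step inside each DDP rollout.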

1903.07266 2026-06-04 cs.LG cs.DC cs.MA cs.SY eess.SY stat.ML

Distributed stochastic optimization with gradient tracking over strongly-connected networks

在强连通网络上进行分布式随机优化与梯度跟踪

Ran Xin, Anit Kumar Sahu, Usman A. Khan, Soummya Kar

发表机构 * Department of Electrical and Computer Engineering, Tufts University（Tufts大学电气与计算机工程系）； Bosch Center for Artificial Intelligence（博世人工智能中心）； Department of Electrical and Computer Engineering, Carnegie Mellon University（卡内基梅隆大学电气与计算机工程系）

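AI summary: This paper studies distributed stochastic optimization for minimizing a sum of smooth, strongly convex local cost functions over a strongly-connected network. It proposes a distributed method, $\mathcal{S}$-$\mathcal{AB}$, in which each agent uses an auxiliary variable to asymptotically track the gradient of the global cost in expectation; combining row- and column-stochastic weights ensures both consensus and optimality and makes the method applicable to arbitrary strongly-connected graphs.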

AI中文摘要

在本文中，我们研究了在强连通网络上最小化平滑且强凸局部成本函数之和的分布式随机优化问题，假设每个代理都有访问随机一阶oracle（$\mathcal{SFO}$）的权限，我们提出了一种新的分布式方法，称为$\mathcal{S}$-$\mathcal{AB}$，其中每个代理使用辅助变量以期望意义渐近跟踪全局成本的梯度。$\mathcal{S}$-$\mathcal{AB}$算法同时使用行和列随机权重，以确保共识和最优性。由于未使用双随机权重，$\mathcal{S}$-$\mathcal{AB}$适用于任意强连通图。我们证明，在足够小的常数步长下，$\mathcal{S}$-$\mathcal{AB}$在期望均方意义上线性收敛到全局极小值的邻域。我们基于真实世界数据集进行了数值模拟以说明理论结果。

英文摘要

In this paper, we study distributed stochastic optimization to minimize a sum of smooth and strongly-convex local cost functions over a network of agents, communicating over a strongly-connected graph. Assuming that each agent has access to a stochastic first-order oracle ($\mathcal{SFO}$), we propose a novel distributed method, called $\mathcal{S}$-$\mathcal{AB}$, where each agent uses an auxiliary variable to asymptotically track the gradient of the global cost in expectation. The $\mathcal{S}$-$\mathcal{AB}$ algorithm employs row- and column-stochastic weights simultaneously to ensure both consensus and optimality. Since doubly-stochastic weights are not used, $\mathcal{S}$-$\mathcal{AB}$ is applicable to arbitrary strongly-connected graphs. We show that under a sufficiently small constant step-size, $\mathcal{S}$-$\mathcal{AB}$ converges linearly (in expected mean-square sense) to a neighborhood of the global minimizer. We present numerical simulations based on real-world data sets to illustrate the theoretical results.
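
The combination of row- and column-stochastic weights with a gradient tracker can be sketched as follows. This is a toy illustration in the spirit of an AB-type recursion, not the paper's exact algorithm: the scalar quadratic costs, the three-node directed graph, the step size and the names `s_ab`, `A_ROW` and `B_COL` are all assumptions made here.

```python
import random

# Directed, strongly-connected 3-node graph (assumed): 0->1, 1->2, 2->0, 0->2.
# A_ROW is row-stochastic (rows sum to 1), B_COL is column-stochastic
# (columns sum to 1); neither matrix is doubly stochastic.
A_ROW = [[1 / 2, 0.0, 1 / 2],
         [1 / 2, 1 / 2, 0.0],
         [1 / 3, 1 / 3, 1 / 3]]
B_COL = [[1 / 3, 0.0, 1 / 2],
         [1 / 3, 1 / 2, 0.0],
         [1 / 3, 1 / 2, 1 / 2]]


def s_ab(A, B, centers, alpha=0.02, iters=5000, noise_std=0.0, seed=0):
    """Gradient tracking with row-stochastic A and column-stochastic B.

    Agent i holds the toy cost f_i(x) = 0.5 * (x - centers[i])**2 and only
    sees noisy gradients. Returns the final local estimates of all agents.
    """
    rng = random.Random(seed)
    n = len(centers)

    def grad(i, xi):
        # Stochastic first-order oracle: exact gradient plus zero-mean noise.
        return (xi - centers[i]) + rng.gauss(0.0, noise_std)

    x = [0.0] * n
    g = [grad(i, x[i]) for i in range(n)]
    y = list(g)  # tracker initialised with the first gradient samples
    for _ in range(iters):
        # Consensus with row-stochastic weights, then a step along -y.
        x_new = [sum(A[i][j] * x[j] for j in range(n)) - alpha * y[i]
                 for i in range(n)]
        g_new = [grad(i, x_new[i]) for i in range(n)]
        # Tracker update with column-stochastic weights.
        y = [sum(B[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x
```

Calling `s_ab(A_ROW, B_COL, [1.0, 2.0, 6.0])` drives every agent toward the minimizer of the summed cost, which is the mean of the centers; with `noise_std > 0` the estimates settle in a neighborhood of it instead, consistent with the convergence statement in the abstract.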

1904.04600 2026-06-04 cs.RO cs.SY eess.SY

Hierarchical Planning of Dynamic Movements without Scheduled Contact Sequences

无调度接触序列的动态运动分层规划

Carlos Mastalli, Ioannis Havoutis, Michele Focchi, Darwin G. Caldwell, Claudio Semini

发表机构 * Department of Advanced Robotics, Istituto Italiano di Tecnologia（意大利技术研究院先进机器人部）； Robot Learning and Interaction Group, Idiap Research Institute（伊迪普研究所机器人学习与交互小组）

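AI summary: This paper presents a hierarchical trajectory optimization approach for planning dynamic movements without scheduled contact sequences, computing whole-body motions that achieve goals unreachable in a purely kinematic fashion. A feasible CoM motion is first found from the robot's centroidal dynamics and then refined with the full-dynamics model, using the CoM trajectory as a warm start. Complementarity constraints describe the contact model (environment geometry and non-sliding active contacts), and both optimization phases are posed as Mathematical Programs with Complementarity Constraints (MPCC). Experimental trials demonstrate the approach on a set of challenging tasks.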

Comments 6 pages, IEEE International Conference on Robotics and Automation (ICRA)

DOI: 10.1109/ICRA.2016.7487664

AI中文摘要

大多数动物和人类为完成复杂任务而采用的运动行为都涉及动态运动和丰富的接触交互。事实上，复杂的机动动作需要同时考虑动态运动和接触事件。我们提出了一种分层轨迹优化方法，用于规划接触序列未预先调度的动态运动。我们计算出的全身运动能够实现无法以纯运动学方式达到的目标。首先，我们根据机器人的质心动力学找到可行的质心运动。然后，我们通过应用机器人的完整动力学模型来细化该解，其中可行的质心轨迹用作热启动点。为了实现未调度的接触行为，我们使用互补约束来描述接触模型，即环境几何和非滑动的主动接触。两个优化阶段均被表述为带互补约束的数学规划问题（MPCC）。实验测试展示了我们的规划方法在一系列具有挑战性的任务中的性能。

英文摘要

Most animal and human locomotion behaviors for solving complex tasks involve dynamic motions and rich contact interaction. In fact, complex maneuvers need to consider dynamic movement and contact events at the same time. We present a hierarchical trajectory optimization approach for planning dynamic movements with unscheduled contact sequences. We compute whole-body motions that achieve goals that cannot be reached in a kinematic fashion. First, we find a feasible CoM motion according to the centroidal dynamics of the robot. Then, we refine the solution by applying the robot's full-dynamics model, where the feasible CoM trajectory is used as a warm-start point. To accomplish the unscheduled contact behavior, we use complementarity constraints to describe the contact model, i.e. environment geometry and non-sliding active contacts. Both optimization phases are posed as Mathematical Program with Complementarity Constraints (MPCC). Experimental trials demonstrate the performance of our planning approach in a set of challenging tasks.
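
As a hedged illustration of how complementarity constraints can encode an unscheduled contact, the block below gives a generic form. The symbols are assumptions introduced here, not the paper's notation: $\phi_i(q)$ is the distance of contact point $i$ to the environment, $\lambda_i$ the normal contact force, and $v_{t,i}$ the tangential velocity of the contact point.

```latex
% Environment geometry: no penetration, no pulling, force only when touching.
0 \le \phi_i(q) \;\perp\; \lambda_i \ge 0
\quad\Longleftrightarrow\quad
\phi_i(q) \ge 0,\;\; \lambda_i \ge 0,\;\; \phi_i(q)\,\lambda_i = 0
% Non-sliding active contact: a loaded contact point does not move tangentially.
\lambda_i \, v_{t,i}(q, \dot q) = 0
```

Because the product conditions let the optimizer decide at every time step whether a contact is active, no contact schedule has to be fixed in advance.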

1904.04595 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Simultaneous Contact, Gait and Motion Planning for Robust Multi-Legged Locomotion via Mixed-Integer Convex Optimization

通过混合整数凸优化实现鲁棒多足运动的同步接触、步态和运动规划

Bernardo Aceituno-Cabezas, Carlos Mastalli, Hongkai Dai, Michele Focchi, Andreea Radulescu, Darwin G. Caldwell, Jose Cappelletto, Juan C. Grieco, Gerardo Fernandez-Lopez, Claudio Semini

发表机构 * School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA（佐治亚理工学院电气与计算机工程学院，亚特兰大，GA 30332 USA）

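AI summary: This paper proposes a mixed-integer convex formulation that simultaneously plans contact locations, gait transitions and motion for multi-legged robots, increasing motion generality while keeping computation time low.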

Comments 8 pages, IEEE Robotics and Automation Letters

DOI: 10.1109/LRA.2017.2779821

AI中文摘要

传统多足运动规划方法将问题分为多个阶段，如接触搜索和轨迹生成。然而，同时考虑接触和运动对于生成复杂的全身行为至关重要。目前，将这些问题耦合在一起需要假设固定的步态序列和平坦地形条件，或者使用非凸优化，计算时间不可行。本文提出了一种混合整数凸公式，以高效的方式同时规划接触位置、步态转换和运动。与之前的工作不同，我们的方法不限于平坦地形或预设的步态序列。相反，我们纳入摩擦锥稳定性边际，近似机器人扭矩限制，并使用混合整数凸约束规划步态。我们通过在HyQ机器人上实验验证了我们的方法，穿越了不同具有挑战性的地形，其中非凸性和平坦地形假设可能导致次优或不稳定计划。我们的方法在保持低计算时间的同时提高了运动的通用性。

英文摘要

Traditional motion planning approaches for multi-legged locomotion divide the problem into several stages, such as contact search and trajectory generation. However, reasoning about contacts and motions simultaneously is crucial for the generation of complex whole-body behaviors. Currently, coupling these problems has required either the assumption of a fixed gait sequence and flat terrain condition, or non-convex optimization with intractable computation time. In this paper, we propose a mixed-integer convex formulation to simultaneously plan contact locations, gait transitions and motion, in a computationally efficient fashion. In contrast to previous works, our approach is not limited to flat terrain nor to a pre-specified gait sequence. Instead, we incorporate the friction cone stability margin, approximate the robot's torque limits, and plan the gait using mixed-integer convex constraints. We experimentally validated our approach on the HyQ robot by traversing different challenging terrains, where non-convexity and flat terrain assumptions might lead to sub-optimal or unstable plans. Our method increases the motion generality while keeping a low computation time.

1904.03665 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Learning to Control Highly Accelerated Ballistic Movements on Muscular Robots

在肌肉机器人上学习控制高加速度弹道式运动

Dieter Büchler, Roberto Calandra, Jan Peters

发表机构 * Max Planck Institute for Intelligent Systems（智能系统马克斯·普朗克研究所）； Facebook AI Research（脸书人工智能研究）

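AI summary: This paper studies how learning can improve control accuracy for high-speed, high-acceleration movements on muscular robots. It presents a four-degrees-of-freedom robot arm actuated by pneumatic artificial muscles that reaches high joint angle accelerations, and tunes control parameters with Bayesian optimization directly on the hardware, achieving tracking performance on a fast trajectory that exceeds previous results on comparable PAM-driven robots.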

Comments 12 pages, preprint submitted to Journal of Robotics and Autonomous Systems

AI中文摘要

高速和高加速度的运动本质上很难控制。在拟人机器人臂上应用学习方法来控制此类运动可以提高控制精度，但可能会损坏系统。学习方法固有的探索可能导致不稳定，并使机器人在高速下到达关节极限。因此，能够安全探索高速和高加速度运动的硬件是十分可取的。为了解决这个问题，我们提出使用由气动人工肌肉（PAMs）驱动的机器人。在本文中，我们展示了一种四自由度的机器人臂，其关节角加速度可高达28000度/秒²，同时借助拮抗驱动和气压范围限制避免危险的关节极限。利用这种机器人臂，我们能够通过贝叶斯优化直接在硬件上调整控制参数，而无需额外的安全考虑。在快速轨迹上取得的跟踪性能超过了以往在同类PAM驱动机器人上的结果。我们还表明，得益于线缆弯曲最小、轻量化的运动学结构以及PAM之间、PAM与连杆之间接触最少等精心的构造设计，系统可以用PID控制器在慢速轨迹上得到良好控制。最后，我们提出了一种控制拮抗肌肉对协同收缩的新技术。实验结果表明，选择最佳的协同收缩水平对于获得更好的跟踪性能至关重要。通过使用PAM驱动机器人和学习，我们朝着未来开发能够实现更类人运动的机器人迈出了一小步。

英文摘要

High-speed and high-acceleration movements are inherently hard to control. Applying learning to the control of such motions on anthropomorphic robot arms can improve the accuracy of the control but might damage the system. The inherent exploration of learning approaches can lead to instabilities and the robot reaching joint limits at high speeds. Having hardware that enables safe exploration of high-speed and high-acceleration movements is therefore desirable. To address this issue, we propose to use robots actuated by Pneumatic Artificial Muscles (PAMs). In this paper, we present a four degrees of freedom (DoFs) robot arm that reaches high joint angle accelerations of up to 28000 deg/s^2 while avoiding dangerous joint limits thanks to the antagonistic actuation and limits on the air pressure ranges. With this robot arm, we are able to tune control parameters using Bayesian optimization directly on the hardware without additional safety considerations. The achieved tracking performance on a fast trajectory exceeds previous results on comparable PAM-driven robots. We also show that our system can be controlled well on slow trajectories with PID controllers due to careful construction considerations such as minimal bending of cables, lightweight kinematics, and minimal contact between PAMs and between PAMs and the links. Finally, we propose a novel technique to control the co-contraction of antagonistic muscle pairs. Experimental results illustrate that choosing the optimal co-contraction level is vital to reach better tracking performance. Through the use of PAM-driven robots and learning, we take a small step towards the future development of robots capable of more human-like motions.

1811.07049 2026-06-04 cs.RO cs.SY eess.SY

RMPflow: A Computational Graph for Automatic Motion Policy Generation

RMPflow：一种用于自动运动策略生成的计算图

Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, Nathan Ratliff

发表机构 * NVIDIA, Seattle Robotics Lab（NVIDIA西雅图机器人实验室）； Georgia Institute of Technology, Robot Learning Lab（佐治亚理工学院机器人学习实验室）； University of Washington, Robotics and State Estimation Lab（华盛顿大学机器人与状态估计实验室）

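AI summary: This paper introduces RMPflow, a novel policy synthesis algorithm built on geometrically consistent transformations of Riemannian Motion Policies. It combines local policies into a global policy and exploits sparse structure for computational efficiency; the paper also studies its geometric properties and demonstrates its simplifying effect on planning through obstacles for high-degree-of-freedom manipulation systems.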

Comments WAFR 2018

1604.00970 2026-06-04 cs.CV cs.SY eess.SP eess.SY

Extended Object Tracking: Introduction, Overview and Applications

扩展目标跟踪：介绍、概述与应用

Karl Granstrom, Marcus Baum, Stephan Reuter

发表机构 * Department of Signals and Systems, Chalmers University of Technology（信号与系统系，查尔姆斯理工大学）

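AI summary: This article surveys current research on extended object tracking. It defines the problem and delimits it from other types of object tracking, introduces two basic extended object tracking approaches, and summarizes applications involving camera, X-band radar, lidar and RGB-D sensors.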

Comments 30 pages, 19 figures

Journal ref: Journal of Advances in Information Fusion, Volume 12, Number 2, Pages 139-174, December 2016, ISSN 1557-6418

AI中文摘要

本文详尽概述了扩展目标跟踪的当前研究。我们给出了扩展目标跟踪问题的明确定义，并讨论了其与其他类型目标跟踪的界限。接下来，广泛讨论了扩展目标建模的不同方面。随后，我们以教程形式介绍了两种基本且常用的扩展目标跟踪方法：随机矩阵方法和基于卡尔曼滤波的星凸形状方法。接下来的部分讨论了多个扩展目标的跟踪，并阐述了如何利用随机有限集（RFS）和非RFS多目标跟踪器来处理大量的可行关联假设。文章最后总结了当前的应用情况，重点介绍了四个涉及摄像头、X波段雷达、激光雷达（lidar）和红绿蓝深度（RGB-D）传感器的应用示例。

英文摘要

This article provides an elaborate overview of current research in extended object tracking. We provide a clear definition of the extended object tracking problem and discuss its delimitation to other types of object tracking. Next, different aspects of extended object modelling are extensively discussed. Subsequently, we give a tutorial introduction to two basic and well used extended object tracking approaches - the random matrix approach and the Kalman filter-based approach for star-convex shapes. The next part treats the tracking of multiple extended objects and elaborates how the large number of feasible association hypotheses can be tackled using both Random Finite Set (RFS) and Non-RFS multi-object trackers. The article concludes with a summary of current applications, where four example applications involving camera, X-band radar, light detection and ranging (lidar), red-green-blue-depth (RGB-D) sensors are highlighted.

1904.02765 2026-06-04 cs.RO cs.LG cs.SY eess.SY math.OC

Intent-Aware Probabilistic Trajectory Estimation for Collision Prediction with Uncertainty Quantification

意图感知的概率轨迹估计用于碰撞预测与不确定性量化

Andrew Patterson, Arun Lakshmanan, Naira Hovakimyan

发表机构 * Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign（伊利诺伊大学厄巴纳-香槟分校机械科学与工程系）

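AI summary: This paper proposes a Gaussian-process-based probabilistic trajectory estimation method for predicting collisions in uncertain environments. Replacing deterministic assumptions with probabilistic knowledge allows a larger class of obstacles to be considered, and two case studies show that collisions can be predicted with limited knowledge of obstacle behavior.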

AI中文摘要

在动态和未知的环境中，碰撞预测依赖于对环境变化的理解。许多碰撞预测方法依赖于对障碍物运动的确定性知识，但完全确定性的障碍物运动知识往往不可用。本文提出了一种基于高斯过程的预测方法，用概率知识替代对每个障碍物未来行为的确定性假设，以考虑更广泛的障碍物。该方法仅依赖位置和速度测量来预测与动态障碍物的碰撞。我们证明，障碍物位置的不确定性区域可以表示为通过高斯过程回归生成的多项式的组合。为了控制任意时间范围内不确定性的增长，假设概率障碍物意图作为障碍物位置和速度的分布，这可以自然地包含在高斯过程框架中。我们的方法在两个案例研究中得到验证：(i) 障碍物超越代理；(ii) 障碍物垂直穿过代理的路径。在这些模拟中，我们展示了即使在有限的障碍物行为知识下也能预测碰撞。

英文摘要

Collision prediction in a dynamic and unknown environment relies on knowledge of how the environment is changing. Many collision prediction methods rely on deterministic knowledge of how obstacles are moving in the environment. However, complete deterministic knowledge of the obstacles' motion is often unavailable. This work proposes a Gaussian process based prediction method that replaces the assumption of deterministic knowledge of each obstacle's future behavior with probabilistic knowledge, to allow a larger class of obstacles to be considered. The method solely relies on position and velocity measurements to predict collisions with dynamic obstacles. We show that the uncertainty region for obstacle positions can be expressed in terms of a combination of polynomials generated with Gaussian process regression. To control the growth of uncertainty over arbitrary time horizons, a probabilistic obstacle intention is assumed as a distribution over obstacle positions and velocities, which can be naturally included in the Gaussian process framework. Our approach is demonstrated in two case studies in which (i), an obstacle overtakes the agent and (ii), an obstacle crosses the agent's path perpendicularly. In these simulations we show that the collision can be predicted despite having limited knowledge of the obstacle's behavior.
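
To make the Gaussian process idea concrete, here is a minimal sketch of one-dimensional GP regression that predicts an obstacle coordinate over time together with its variance. It is a generic textbook construction, not the paper's method: the squared-exponential kernel, the hyperparameters and the names `rbf`, `cholesky`, `solve_chol` and `gp_predict` are assumptions, and the intent prior described in the abstract is not modeled.

```python
import math


def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two time stamps."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_chol(L, rhs):
    """Solve (L L^T) w = rhs by forward and backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(L[k][i] * w[k] for k in range(i + 1, n))) / L[i][i]
    return w


def gp_predict(t_train, y_train, t_query, noise=1e-2, length=1.0, var=1.0):
    """Posterior mean and variance of the latent position at time t_query.

    noise is the measurement noise variance added to the kernel diagonal.
    """
    n = len(t_train)
    K = [[rbf(t_train[i], t_train[j], length, var) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    k_star = [rbf(t_query, ti, length, var) for ti in t_train]
    mean = sum(ks * a for ks, a in zip(k_star, solve_chol(L, y_train)))
    variance = var - sum(ks * vi for ks, vi in zip(k_star, solve_chol(L, k_star)))
    return mean, max(variance, 0.0)
```

The predicted variance stays small near the measurements and grows back toward the prior variance far from them, which is the growth of uncertainty over long horizons that the abstract says the intent prior is meant to control.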
