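arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.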

1905.08314 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Longitudinal Dynamic versus Kinematic Models for Car-Following Control Using Deep Reinforcement Learning

Yuan Lin, John McPhee, Nasser L. Azad

Affiliations: University of Waterloo, Ontario, Canada

AI summary: This paper studies longitudinal car-following control with deep reinforcement learning (DRL) when vehicle dynamics are taken into account. The DRL framework is redesigned by adding the delayed control inputs and the actual vehicle acceleration to the reinforcement learning environment state, which yields near-optimal control performance under vehicle dynamics.

Comments: Accepted to 2019 IEEE Intelligent Transportation Systems Conference

DOI: 10.1109/ITSC.2019.8916781

Abstract

The majority of current studies on autonomous vehicle control via deep reinforcement learning (DRL) utilize point-mass kinematic models, neglecting vehicle dynamics which includes acceleration delay and acceleration command dynamics. The acceleration delay, which results from sensing and actuation delays, results in delayed execution of the control inputs. The acceleration command dynamics dictates that the actual vehicle acceleration does not rise up to the desired command acceleration instantaneously due to dynamics. In this work, we investigate the feasibility of applying DRL controllers trained using vehicle kinematic models to more realistic driving control with vehicle dynamics. We consider a particular longitudinal car-following control, i.e., Adaptive Cruise Control (ACC), problem solved via DRL using a point-mass kinematic model. When such a controller is applied to car following with vehicle dynamics, we observe significantly degraded car-following performance. Therefore, we redesign the DRL framework to accommodate the acceleration delay and acceleration command dynamics by adding the delayed control inputs and the actual vehicle acceleration to the reinforcement learning environment state, respectively. The training results show that the redesigned DRL controller results in near-optimal control performance of car following with vehicle dynamics considered when compared with dynamic programming solutions.
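
The state augmentation described in the abstract can be sketched with a toy environment. This is a minimal illustration under stated assumptions, not the authors' simulator: the class name `CarFollowingEnv`, the first-order lag used for the acceleration command dynamics, and every numeric value (`dt`, `delay_steps`, `tau`, the initial gap) are hypothetical choices made for the example.

```python
from collections import deque


class CarFollowingEnv:
    """Toy longitudinal car-following model (all parameter values are illustrative)."""

    def __init__(self, dt=0.1, delay_steps=2, tau=0.5):
        # delay_steps must be at least 1 in this sketch.
        self.dt = dt            # simulation step [s]
        self.tau = tau          # time constant of the assumed first-order acceleration lag [s]
        self.gap = 30.0         # distance to the lead vehicle [m]
        self.rel_speed = 0.0    # lead speed minus ego speed [m/s]
        self.accel = 0.0        # actual ego acceleration [m/s^2]
        # Commands issued but not yet executed, oldest first.
        self.pending = deque([0.0] * delay_steps, maxlen=delay_steps)

    def state(self):
        # Kinematic state plus the two additions named in the abstract:
        # the actual acceleration and the delayed (pending) control inputs.
        return (self.gap, self.rel_speed, self.accel, *self.pending)

    def step(self, command, lead_accel=0.0):
        executed = self.pending[0]      # command issued delay_steps steps ago
        self.pending.append(command)    # maxlen drops the oldest entry automatically
        # First-order lag: the actual acceleration approaches the executed command.
        self.accel += self.dt * (executed - self.accel) / self.tau
        self.rel_speed += self.dt * (lead_accel - self.accel)
        self.gap += self.dt * self.rel_speed
        return self.state()
```

With `delay_steps=2`, a braking command only starts to change the actual acceleration on the third call to `step`, which is exactly the information a purely kinematic state would hide from the agent.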

1803.00204 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML

Scalar Quantization as Sparse Least Square Optimization

Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

Affiliations: College of Electrical Engineering, Sichuan University; Department of Computer Science, Rutgers University -- New Brunswick; Engineering Practice Center, Chengdu University of Information Technology

AI summary: This paper proposes a new approach to scalar quantization based on sparse least square optimization. By introducing $l_1$, $l_1 + l_2$, and $l_0$ regularization, it addresses shortcomings of traditional clustering-based methods and can outperform existing methods in some bit-width reduction scenarios.

DOI: 10.1109/TPAMI.2019.2952096
Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

Abstract

Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has been soaring as it has been found huge utilities in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while being well-developed, have multiple drawbacks including the dependency of the random seed, empty or out-of-the-range clusters, and high time complexity for a large number of clusters. To overcome these problems, in this paper, the problem of scalar quantization is examined from a new perspective, namely sparse least square optimization. Specifically, inspired by the property of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization are proposed. Furthermore, to compute quantization results with a given amount of values/clusters, this paper designed an iterative method and a clustering-based method, and both of them are built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of k-means clustering-based quantization algorithm, although the two algorithms originated from different intuitions. The algorithms proposed were tested with three types of data and their computational performances, including information loss, time consumption, and the distribution of the values of the sparse vectors, were compared and analyzed. The paper offers a new perspective to probe the area of quantization, and the algorithms proposed can outperform existing methods especially under some bit-width reduction scenarios, when the required post-quantization resolution (number of values) is not significantly lower than the original number.
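
One possible way to see why an $l_1$-regularized least square problem produces shared values is sketched below. The notation ($w$, $L$, $\alpha$, $q$, $\lambda$) is introduced here for illustration and is an assumption, not necessarily the formulation used in the paper.

```latex
% w \in \mathbb{R}^n : values to quantize, sorted in ascending order
% L \in \mathbb{R}^{n \times n} : lower-triangular matrix of ones, L_{ij} = 1 for j \le i
\hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^n}
  \tfrac{1}{2}\,\lVert w - L\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1,
\qquad
q = L\hat{\alpha},
\qquad
q_i - q_{i-1} = \hat{\alpha}_i \quad (i \ge 2).
```

Under this parametrization every zero entry $\hat{\alpha}_i$ with $i \ge 2$ forces $q_i = q_{i-1}$, so the number of distinct quantized values is at most one plus the number of nonzero entries among $\hat{\alpha}_2, \dots, \hat{\alpha}_n$. Increasing $\lambda$ makes $\hat{\alpha}$ sparser and therefore the quantization coarser.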

1903.10604 2026-06-04 cs.CV cs.SY eess.SY

An Approach for Adaptive Automatic Threat Recognition Within 3D Computed Tomography Images for Baggage Security Screening

Qian Wang, Khalid N. Ismail, Toby P. Breckon

Affiliations: Department of Computer Science, Durham University, United Kingdom; Department of Engineering, Durham University, United Kingdom

AI summary: This paper proposes an adaptive automatic threat recognition (AATR) method based on 3D X-ray computed tomography images, aimed at rapidly evolving threat signatures. It combines a multi-scale 3D CT image segmentation algorithm, a multi-class support vector machine classifier, and an adaptation strategy, and reports a probability of detection around 90% with a probability of false alarm below 20%.

Comments: Technical Report, Durham University

Abstract

The screening of baggage using X-ray scanners is now routine in aviation security with automatic threat detection approaches, based on 3D X-ray computed tomography (CT) images, known as Automatic Threat Recognition (ATR) within the aviation security industry. These current strategies use pre-defined threat material signatures in contrast to adaptability towards new and emerging threat signatures. To address this issue, the concept of adaptive automatic threat recognition (AATR) was proposed in previous work. In this paper, we present a solution to AATR based on such X-ray CT baggage scan imagery. This aims to address the issues of rapidly evolving threat signatures within the screening requirements. Ideally, the detection algorithms deployed within the security scanners should be readily adaptable to different situations with varying requirements of threat characteristics (e.g., threat material, physical properties of objects). We tackle this issue using a novel adaptive machine learning methodology with our solution consisting of a multi-scale 3D CT image segmentation algorithm, a multi-class support vector machine (SVM) classifier for object material recognition and a strategy to enable the adaptability of our approach. Experiments are conducted on both open and sequestered 3D CT baggage image datasets specifically collected for the AATR study. Our proposed approach performs well on both recognition and adaptation. Overall our approach can achieve the probability of detection around 90% with a probability of false alarm below 20%. Our AATR shows the capabilities of adapting to varying types of materials, even the unknown materials which are not available in the training data, adapting to varying required probability of detection and adapting to varying scales of the threat object.

1905.09435 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling

Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, Soummya Kar

Affiliations: Carnegie Mellon University; Bosch Center for Artificial Intelligence

AI summary: This study proposes MATCHA, which uses matching decomposition sampling to achieve a win-win in the error-runtime trade-off of decentralized SGD. Experiments on a range of datasets and deep neural networks show that it takes up to 5× less time than vanilla decentralized SGD to reach the same training loss.

Abstract

This paper studies the problem of error-runtime trade-off, typically encountered in decentralized training based on stochastic gradient descent (SGD) using a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that can achieve a win-win in this error-runtime trade-off for any arbitrary network topology. The main idea of MATCHA is to parallelize inter-node communication by decomposing the topology into matchings. To preserve fast error convergence speed, it identifies and communicates more frequently over critical links, and saves communication time by using other links less frequently. Experiments on a suite of datasets and deep neural networks validate the theoretical analyses and demonstrate that MATCHA takes up to $5\times$ less time than vanilla decentralized SGD to reach the same training loss.
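
The idea of decomposing a topology into matchings and then activating them at different frequencies can be sketched as follows. This is a simplified illustration, not the MATCHA implementation: the greedy decomposition below is a stand-in for whatever decomposition the paper uses, and the activation probabilities are taken as given inputs rather than optimized. Both function names are hypothetical.

```python
import random


def decompose_into_matchings(edges):
    """Greedily split an undirected edge list into matchings.

    Within one matching no two edges share a node, so all of its links
    can communicate in parallel.
    """
    matchings = []  # each entry: (set of nodes already used, list of edges)
    for u, v in edges:
        for used, matching in matchings:
            if u not in used and v not in used:
                matching.append((u, v))
                used.update((u, v))
                break
        else:
            # No existing matching can take this edge, so open a new one.
            matchings.append(({u, v}, [(u, v)]))
    return [matching for _, matching in matchings]


def sample_active_edges(matchings, probabilities, rng):
    """Activate each matching independently with its own probability."""
    active = []
    for matching, p in zip(matchings, probabilities):
        if rng.random() < p:
            active.extend(matching)
    return active
```

For example, `decompose_into_matchings([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])` splits a four-node ring with one diagonal into three matchings. Giving the matchings that contain critical links a higher probability makes those links communicate more often, which is the behavior the abstract describes.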

1905.07960 2026-06-04 cs.LG cs.SY eess.SY stat.ML

A novel Multiplicative Polynomial Kernel for Volterra series identification

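Alberto Dalla Libera, Ruggero Carli, Gianluigi Pillonetto

Affiliations: Department of Information Engineering, University of Padova

AI summary: This paper proposes a new regularization network for identifying Volterra models. It introduces a new kernel built as a product of basic building blocks and estimates the unknown hyperparameters by marginal likelihood optimization. Experiments indicate that the method selects the monomials that actually influence the system output more effectively, improving the model's predictive capability.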
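
A kernel formed as a product of simple factors can be sketched as below. The summary only states that the kernel is a product of basic building blocks; the specific factor used here (an offset plus a weighted inner product, one factor per polynomial degree) and the names `offsets` and `weights` are assumptions made for illustration.

```python
def multiplicative_polynomial_kernel(x, z, offsets, weights):
    """Product of inhomogeneous linear kernels, one factor per polynomial degree.

    offsets: one non-negative scalar per factor (hypothetical hyperparameter)
    weights: one vector of non-negative per-input weights per factor
             (hypothetical hyperparameter)
    """
    value = 1.0
    for offset, w in zip(offsets, weights):
        linear = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        value *= offset + linear
    return value
```

With every offset and weight set to 1 and $p$ factors, this reduces to the standard inhomogeneous polynomial kernel $(1 + x^\top z)^p$. Giving each factor its own hyperparameters is what would let marginal likelihood optimization down-weight individual inputs or degrees.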

1904.11898 2026-06-04 cs.RO cs.CV cs.LG cs.SY eess.SY

Perceptual Attention-based Predictive Control

Keuntaek Lee, Gabriel Nakajima An, Viacheslav Zakharov, Evangelos A. Theodorou

Affiliations: Georgia Institute of Technology

AI summary: This paper proposes a new information processing architecture for safe deep learning-based visual navigation. Combining model predictive control (MPC), convolutional neural networks (CNNs), and uncertainty quantification, it implements a perceptual attention-based predictive control algorithm that detects unsafe conditions more rapidly.

Abstract

In this paper, we present a novel information processing architecture for safe deep learning-based visual navigation of autonomous systems. The proposed information processing architecture is used to support a perceptual attention-based predictive control algorithm that leverages model predictive control (MPC), convolutional neural networks (CNNs), and uncertainty quantification methods. The novelty of our approach lies in using MPC to learn how to place attention on relevant areas of the visual input, which ultimately allows the system to more rapidly detect unsafe conditions. We accomplish this by using MPC to learn to select regions of interest in the input image, which are used to output control actions as well as estimates of epistemic and aleatoric uncertainty in the attention-aware visual input. We use these uncertainty estimates to quantify the safety of our network controller under the current navigation condition. The proposed architecture and algorithm is tested on a 1:5 scale terrestrial vehicle. Experimental results show that the proposed algorithm outperforms previous approaches on early detection of unsafe conditions, such as when novel obstacles are present in the navigation environment. The proposed architecture is the first step towards using deep learning-based perceptual control policies in safety-critical domains.
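
The split into epistemic and aleatoric uncertainty can be illustrated with a common decomposition over repeated stochastic forward passes. This is a sketch under the assumption that the network outputs a predicted mean and a predicted noise variance on each pass; the abstract does not specify the estimator, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split predictive uncertainty from several stochastic forward passes.

    means:     predicted control value from each pass
    variances: predicted noise variance from each pass
    """
    epistemic = pvariance(means)   # disagreement between passes (model uncertainty)
    aleatoric = fmean(variances)   # average predicted noise (data uncertainty)
    return epistemic, aleatoric
```

A rising epistemic term signals inputs unlike the training data, such as a novel obstacle, which is the kind of condition a safety monitor would threshold on.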

1806.02957 2026-06-04 cs.LG cs.NA cs.NE math.NA physics.comp-ph stat.ML

A Deep Neural Network Surrogate for High-Dimensional Random Partial Differential Equations

Mohammad Amin Nabian, Hadi Meidani

Affiliations: Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

AI summary: This paper proposes a deep learning framework for solving high-dimensional random partial differential equations. The random PDE is approximated by a deep residual network with either strong or weak enforcement of initial and boundary conditions, and accuracy is demonstrated on diffusion and heat conduction problems.

DOI: 10.1016/j.probengmech.2019.05.001
Journal ref: Probabilistic Engineering Mechanics, 57, pp.14-25 (2019)

Abstract

Developing efficient numerical algorithms for the solution of high dimensional random Partial Differential Equations (PDEs) has been a challenging task due to the well-known curse of dimensionality. We present a new solution framework for these problems based on a deep learning approach. Specifically, the random PDE is approximated by a feed-forward fully-connected deep residual network, with either strong or weak enforcement of initial and boundary constraints. The framework is mesh-free, and can handle irregular computational domains. Parameters of the approximating deep neural network are determined iteratively using variants of the Stochastic Gradient Descent (SGD) algorithm. The satisfactory accuracy of the proposed frameworks is numerically demonstrated on diffusion and heat conduction problems, in comparison with the converged Monte Carlo-based finite element results.
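
Strong enforcement of boundary constraints can be sketched in one dimension by building the constraints into the trial function, so that any network output satisfies them. The example below is illustrative only: `tiny_network` is a hypothetical stand-in for the deep residual network, the domain is fixed to $[0, 1]$, and the random inputs of the PDE are omitted.

```python
import math


def tiny_network(x, params):
    """Stand-in for the deep residual network: one hidden tanh layer."""
    return sum(c * math.tanh(w * x + b) for w, b, c in params)


def trial_solution(x, params, left=0.0, right=1.0):
    """u(x) on [0, 1] with u(0) = left and u(1) = right by construction."""
    boundary_part = (1.0 - x) * left + x * right
    # The factor x * (1 - x) vanishes on the boundary, so the network
    # cannot violate the boundary values no matter what it outputs.
    return boundary_part + x * (1.0 - x) * tiny_network(x, params)
```

Weak enforcement would instead leave the network output unconstrained and add a penalty on the boundary mismatch to the training loss.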

1701.08711 2026-06-04 cs.CL cs.LG econ.GN q-fin.EC stat.ML

Predicting Auction Price of Vehicle License Plate with Deep Recurrent Neural Network

Vinci Chow

Affiliations: Department of Economics, The Chinese University of Hong Kong, Shatin, Hong Kong

AI summary: This paper frames vehicle license plate price prediction as a natural language processing task and builds a deep recurrent neural network to predict Hong Kong license plate auction prices. It shows how well the model explains price variation and how it can be extended into a search engine for plates.

DOI: 10.1016/j.eswa.2019.113008

Abstract

In Chinese societies, superstition is of paramount importance, and vehicle license plates with desirable numbers can fetch very high prices in auctions. Unlike other valuable items, license plates are not allocated an estimated price before auction. I propose that the task of predicting plate prices can be viewed as a natural language processing (NLP) task, as the value depends on the meaning of each individual character on the plate and its semantics. I construct a deep recurrent neural network (RNN) to predict the prices of vehicle license plates in Hong Kong, based on the characters on a plate. I demonstrate the importance of having a deep network and of retraining. Evaluated on 13 years of historical auction prices, the deep RNN's predictions can explain over 80 percent of price variations, outperforming previous models by a significant margin. I also demonstrate how the model can be extended to become a search engine for plates and to provide estimates of the expected price distribution.

1802.00714 2026-06-04 cs.RO cs.SY eess.SY

Incremental Control and Guidance of Hybrid Aircraft Applied to a Tailsitter UAV

E. J. J. Smeur, M. Bronz, G. C. H. E. de Croon

Affiliations: Delft University of Technology; ENAC, MAIAA, University of Toulouse

AI summary: This paper proposes incremental nonlinear dynamic inversion control for the attitude and position control of hybrid aircraft. The result is a single continuous controller that tracks the desired acceleration across the flight envelope, validated in multiple outdoor experiments on a tailsitter UAV.

Comments: 20 pages, 26 figures

DOI: 10.2514/1.G004520
Journal ref: Journal of Guidance, Control and Dynamics, September 2019 [online]

Abstract

Hybrid unmanned aircraft can significantly increase the potential of micro air vehicles, because they combine hovering capability with a wing for fast and efficient forward flight. However, these vehicles are very difficult to control, because their aerodynamics are hard to model and they are susceptible to wind gusts. This often leads to composite and complex controllers, with different modes for hover, transition and forward flight. In this paper, we propose incremental nonlinear dynamic inversion control for the attitude and position control. The result is a single, continuous controller, that is able to track the desired acceleration of the vehicle across the flight envelope. The proposed controller is implemented on the Cyclone hybrid UAV. Multiple outdoor experiments are performed, showing that unmodeled forces and moments are effectively compensated by the incremental control structure. Finally, we provide a comprehensive procedure for the implementation of the controller on other types of hybrid UAVs.
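
The reason an incremental control structure compensates unmodeled forces can be shown with a scalar toy example. This is a sketch of the general incremental dynamic inversion idea, not the controller from the paper: the plant, the control effectiveness value, and both function names are hypothetical.

```python
def indi_step(u_prev, accel_measured, accel_desired, effectiveness):
    """One incremental update: change the input by the acceleration error
    divided by the (assumed known) control effectiveness."""
    return u_prev + (accel_desired - accel_measured) / effectiveness


def plant_accel(u, disturbance, effectiveness=2.0):
    """Toy scalar plant; the disturbance is never given to the controller."""
    return effectiveness * u + disturbance
```

Because the update works from the measured acceleration rather than from a full model, the disturbance is cancelled without ever being estimated explicitly. Starting from `u = 0` with a disturbance of `-3.0` and a desired acceleration of `1.0`, a single `indi_step` yields `u = 2.0`, and the plant then produces exactly the desired acceleration.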

1904.10945 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Target-Based Temporal Difference Learning

Donghwan Lee, Niao He

Affiliations: Coordinated Science Laboratory (CSL), University of Illinois at Urbana-Champaign; Department of Industrial and Enterprise Systems Engineering, University of Illinois

AI summary: This paper introduces a new family of target-based temporal difference learning algorithms, analyzes their convergence theoretically, and shows that they may converge better than standard TD learning.

Abstract

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms and provide theoretical analysis on their convergences. In contrast to the standard TD-learning, target-based TD algorithms maintain two separate learning parameters-the target variable and online variable. Particularly, we introduce three members in the family, called the averaging TD, double TD, and periodic TD, where the target variable is updated through an averaging, symmetric, or periodic fashion, mirroring those techniques used in deep Q-learning practice. We establish asymptotic convergence analyses for both averaging TD and double TD and a finite sample analysis for periodic TD. In addition, we also provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to the standard TD-learning. While this work focuses on linear function approximation and policy evaluation setting, we consider this as a meaningful step towards the theoretical understanding of deep Q-learning variants with target networks.
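
The two-variable structure of target-based TD can be sketched on a tiny policy evaluation problem. The example is an illustration in the spirit of the averaging variant, not the exact update rule or step-size schedule analyzed in the paper: the two-state chain, the constant step sizes `alpha` and `tau`, and the function name are all assumptions.

```python
def averaging_td(num_steps=5000, alpha=0.1, tau=0.1, gamma=0.5):
    """Policy evaluation on a 2-state cycle with one-hot (tabular) features."""
    rewards = [1.0, 0.0]      # reward received when leaving state 0 / state 1
    online = [0.0, 0.0]       # online variable
    target = [0.0, 0.0]       # target variable
    state = 0
    for _ in range(num_steps):
        next_state = 1 - state    # deterministic transition 0 -> 1 -> 0
        # The TD error bootstraps from the target variable, not the online one.
        td_error = rewards[state] + gamma * target[next_state] - online[state]
        online[state] += alpha * td_error
        # The target slowly tracks the online variable (averaging update).
        target = [t + tau * (o - t) for t, o in zip(target, online)]
        state = next_state
    return online, target
```

For this chain the true values solve $V_0 = 1 + \gamma V_1$ and $V_1 = \gamma V_0$, giving $V_0 = 4/3$ and $V_1 = 2/3$ for $\gamma = 0.5$; both variables approach these values.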

1808.03258 2026-06-04 cs.LG cs.NA math.NA stat.ML

Application of Bounded Total Variation Denoising in Urban Traffic Analysis

Shanshan Tang, Haijun Yu

Affiliations: School of Mathematical Sciences, University of Chinese Academy of Sciences; NCMIS & LSEC, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Beijing 100190, China

AI summary: This paper applies bounded total variation denoising to improve the accuracy of urban traffic analysis. Denoising the data before combining a neural network with a history matching method improves traffic prediction, and it also yields better clustering results.

Comments: 7 figures, 3 tables, to appear in East Asian Journal on Applied Mathematics

DOI: 10.4208/eajam.181118.250219
Journal ref: East Asian Journal on Applied Mathematics Vol.9, No.3, pp. 622-642, 2019

Abstract

While it is believed that denoising is not always necessary in many big data applications, we show in this paper that denoising is helpful in urban traffic analysis by applying the method of bounded total variation denoising to the urban road traffic prediction and clustering problem. We propose two easy-to-implement methods to estimate the noise strength parameter in the denoising algorithm, and apply the denoising algorithm to GPS-based traffic data from Beijing taxi system. For the traffic prediction problem, we combine neural network and history matching method for roads randomly chosen from an urban area of Beijing. Numerical experiments show that the predicting accuracy is improved significantly by applying the proposed bounded total variation denoising algorithm. We also test the algorithm on clustering problem, where a recently developed clustering analysis method is applied to more than one hundred urban road segments in Beijing based on their velocity profiles. Better clustering result is obtained after denoising.

1905.01248 2026-06-04 cs.RO cs.SY eess.SY

Asymmetric Dual-Arm Task Execution using an Extended Relative Jacobian

Diogo Almeida, Yiannis Karayiannidis

Affiliations: Division of Robotics, Perception and Learning, KTH Royal Institute of Technology; Dept. of Electrical Eng., Chalmers University of Technology

AI summary: This paper proposes an asymmetric dual-arm task execution method based on an extended relative Jacobian. By defining an asymmetric relative motion space, it lets a user set the degree of asymmetry in task execution without specifying an absolute motion target, while the absolute motion remains available as a functional redundancy.

Comments: Accepted for presentation at ISRR19. 16 Pages

Abstract

Coordinated dual-arm manipulation tasks can be broadly characterized as possessing absolute and relative motion components. Relative motion tasks, in particular, are inherently redundant in the way they can be distributed between end-effectors. In this work, we analyse cooperative manipulation in terms of the asymmetric resolution of relative motion tasks. We discuss how existing approaches enable the asymmetric execution of a relative motion task, and show how an asymmetric relative motion space can be defined. We leverage this result to propose an extended relative Jacobian to model the cooperative system, which allows a user to set a concrete degree of asymmetry in the task execution. This is achieved without the need for prescribing an absolute motion target. Instead, the absolute motion remains available as a functional redundancy to the system. We illustrate the properties of our proposed Jacobian through numerical simulations of a novel differential Inverse Kinematics algorithm.
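
The redundancy in distributing a relative motion between two end-effectors can be illustrated with a deliberately simplified split of a translational velocity. This is not the extended relative Jacobian itself: the parameter `alpha`, its range, and the function name are assumptions for the example, and rotations and reference-frame changes are ignored.

```python
def split_relative_velocity(v_rel, alpha):
    """Distribute a desired relative velocity v_rel = v2 - v1 between two arms.

    alpha in [0, 1] is the assumed degree of asymmetry: alpha = 1 moves only
    arm 2, alpha = 0 moves only arm 1, and alpha = 0.5 is the symmetric case.
    """
    v1 = [-(1.0 - alpha) * c for c in v_rel]
    v2 = [alpha * c for c in v_rel]
    return v1, v2
```

Every value of `alpha` realizes the same relative motion, which is the redundancy the abstract refers to; the choice only changes how much each arm contributes.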

1806.06790 2026-06-04 cs.LG cs.AI cs.IT cs.SY eess.SY math.IT math.OC stat.ML

Towards Distributed Energy Services: Decentralizing Optimal Power Flow with Machine Learning

Roel Dobbe, Oscar Sondermeijer, David Fridovich-Keil, Daniel Arnold, Duncan Callaway, Claire Tomlin

Affiliations: AI Now Institute at New York University; Energy & Resources Group at UC Berkeley

AI summary: This paper proposes a machine learning-based decentralization approach that learns control policies for controllable distributed energy resources (DERs) from locally available information, so as to reconstruct and mimic the solution of a centralized optimal power flow (OPF) problem and thereby enable distributed energy services.

Comments: Accepted for publication. To appear in the IEEE Transactions on Smart Grid

Abstract

The implementation of optimal power flow (OPF) methods to perform voltage and power flow regulation in electric networks is generally believed to require extensive communication. We consider distribution systems with multiple controllable Distributed Energy Resources (DERs) and present a data-driven approach to learn control policies for each DER to reconstruct and mimic the solution to a centralized OPF problem from solely locally available information. Collectively, all local controllers closely match the centralized OPF solution, providing near optimal performance and satisfaction of system constraints. A rate distortion framework enables the analysis of how well the resulting fully decentralized control policies are able to reconstruct the OPF solution. The methodology provides a natural extension to decide what nodes a DER should communicate with to improve the reconstruction of its individual policy. The method is applied on both single- and three-phase test feeder networks using data from real loads and distributed generators, focusing on DERs that do not exhibit inter-temporal dependencies. It provides a framework for Distribution System Operators to efficiently plan and operate the contributions of DERs to achieve Distributed Energy Services in distribution networks.

1801.09627 2026-06-04 cs.LG cs.RO cs.SY eess.SY

Barrier-Certified Adaptive Reinforcement Learning with Applications to Brushbot Navigation

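Motoya Ohnishi, Li Wang, Gennaro Notomista, Magnus Egerstedt

Affiliations: School of Electrical Engineering, Royal Institute of Technology; Georgia Institute of Technology; RIKEN Center for Advanced Intelligence Project; School of Mechanical Engineering

AI summary: This paper presents a safe learning framework that combines an adaptive model learning algorithm with barrier certificates for systems with possibly nonstationary agent dynamics. A sparse optimization technique extracts the dynamic structure of the model, and the learned model is combined with control barrier certificates that constrain policies (feedback controllers) to maintain safety, that is, to avoid particular undesirable regions of the state space. Under certain conditions, recovery of safety in the sense of Lyapunov stability is guaranteed after violations caused by nonstationarity. The action-value function approximation is reformulated so that any kernel-based nonlinear function estimation method applies, and solutions of the barrier-certified policy optimization are guaranteed to be globally optimal under mild conditions. The framework is validated in quadrotor simulations and then tested on a real robot, the brushbot, whose dynamics are unknown, highly complex, and nonstationary.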

Comments ©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

DOI: 10.1109/TRO.2019.2920206
Journal ref: Published in IEEE Transactions on Robotics, 2019

Abstract

This paper presents a safe learning framework that employs an adaptive model learning algorithm together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique. We use the learned model in combination with control barrier certificates which constrain policies (feedback controllers) in order to maintain safety, which refers to avoiding particular undesirable regions of the state space. Under certain conditions, recovery of safety in the sense of Lyapunov stability after violations of safety due to the nonstationarity is guaranteed. In addition, we reformulate an action-value function approximation to make any kernel-based nonlinear function estimation method applicable to our adaptive learning framework. Lastly, solutions to the barrier-certified policy optimization are guaranteed to be globally optimal, ensuring the greedy policy improvement under mild conditions. The resulting framework is validated via simulations of a quadrotor, which has previously been used under stationarity assumptions in the safe learnings literature, and is then tested on a real robot, the brushbot, whose dynamics is unknown, highly complex and nonstationary.
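
How a barrier certificate constrains a policy can be sketched with a one-dimensional safety filter. This is a textbook-style illustration, not the paper's framework: the dynamics $\dot{x} = u$ are assumed known rather than learned, and the barrier $h(x) = x_{\max} - x$, the gain, and the function names are chosen for the example.

```python
def safe_control(x, u_nominal, x_max=1.0, gain=2.0):
    """Filter a nominal input through a barrier condition.

    Barrier h(x) = x_max - x for the toy dynamics dx/dt = u.
    The certificate dh/dt >= -gain * h becomes u <= gain * (x_max - x).
    """
    return min(u_nominal, gain * (x_max - x))


def simulate(x0=0.0, u_nominal=5.0, dt=0.01, steps=1000):
    """Euler simulation of the filtered closed loop."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * safe_control(x, u_nominal)
        trajectory.append(x)
    return trajectory
```

The nominal policy keeps pushing toward the unsafe region $x > x_{\max}$, but the filtered input shrinks as the state nears the boundary, so the trajectory approaches $x_{\max}$ without crossing it.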

1903.02531 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

Affiliations: University of California, Berkeley; Facebook AI Research

AI summary: This paper combines model-based control with learning-based perception for reliable visual navigation in novel environments. The perception module produces waypoints along a collision-free path so that the robot reaches the goal efficiently; the approach works well with low frame rates and transfers well from simulation to the real world.

Comments: Project website: https://vtolani95.github.io/WayPtNav/

Abstract

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories. However, it is challenging to use model-based methods in settings where the environment is a priori unknown and can only be observed partially through on-board sensors on the robot. In this work, we address this short-coming by coupling model-based control with learning-based perception. The learning-based perception module produces a series of waypoints that guide the robot to the goal via a collision-free path. These waypoints are used by a model-based planner to generate a smooth and dynamically feasible trajectory that is executed on the physical system using feedback control. Our experiments in simulated real-world cluttered environments and on an actual ground vehicle demonstrate that the proposed approach can reach goal locations more reliably and efficiently in novel environments as compared to purely geometric mapping-based or end-to-end learning-based alternatives. Our approach does not rely on detailed explicit 3D maps of the environment, works well with low frame rates, and generalizes well from simulation to the real world. Videos describing our approach and experiments are available on the project website.

1806.03816 2026-06-04 cs.LG cs.NA math.NA stat.ML

Adaptive MCMC via Combining Local Samplers

Kiarash Shaloudegi, András György

Affiliations: Imperial College London, London, UK

AI summary: This paper proposes an adaptive MCMC method that combines several local samplers run in parallel and prioritizes chains by kernel Stein discrepancy to improve overall sampling efficiency. Experiments show that it outperforms existing methods on multimodal problems and on a sensor localization task.

Abstract

Markov chain Monte Carlo (MCMC) methods are widely used in machine learning. One of the major problems with MCMC is the question of how to design chains that mix fast over the whole state space; in particular, how to select the parameters of an MCMC algorithm. Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only). The chains are prioritized based on kernel Stein discrepancy, which provides a good measure of performance locally. The samples from the independent chains are combined using a novel technique for estimating the probability of different regions of the sample space. Experimental results demonstrate that the proposed algorithm may provide significant speedups in different sampling problems. Most importantly, when combined with the state-of-the-art NUTS algorithm as the base MCMC sampler, our method remained competitive with NUTS on sampling from unimodal distributions, while significantly outperforming state-of-the-art competitors on synthetic multimodal problems as well as on a challenging sensor localization task.

1811.09358 2026-06-04 cs.LG cs.CV cs.NA math.NA math.OC stat.ML

A Sufficient Condition for Convergences of Adam and RMSProp

Adam和RMSProp收敛性的充分条件

Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu

发表机构 * Tencent AI Lab（腾讯AI实验室）； Stony Brook University（石溪大学）

AI总结本文提出了一种易于检查的充分条件，该条件仅依赖于基础学习率参数和历史二阶矩的组合，可保证通用Adam/RMSProp算法在大规模非凸随机优化中的全局收敛性，并表明若干Adam变体在非凸设定下的收敛性可由该条件直接推出。

Comments Accepted by CVPR2019 as an Oral presentation

AI中文摘要

Adam和RMSProp是训练深度神经网络时最具影响力的两种自适应随机算法，但已有若干简单反例表明，它们即使在凸设定下也可能发散。为促使Adam/RMSProp型算法收敛，人们已做过许多尝试，例如采用递减的自适应学习率、采用大批量、引入时间去相关技术、寻找类似的替代形式等。与现有方法不同，我们引入了一种易于检查的充分条件，该条件仅依赖于基础学习率参数和历史二阶矩的组合，可保证通用Adam/RMSProp算法在求解大规模非凸随机优化问题时的全局收敛性。此外，我们证明了AdamNC、AdaEMA等若干Adam变体在非凸设定下的收敛性可由所提出的充分条件直接推出。我们还说明，Adam本质上是一种带有指数移动平均动量、经特定加权的AdaGrad，这为理解Adam和RMSProp提供了新的视角。这一观察与该充分条件相结合，可以更深入地解释它们的发散现象。最后，我们将Adam和RMSProp应用于某个反例以及深度神经网络的训练，以验证该充分条件。数值结果与我们的理论分析完全一致。

英文摘要

Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, which have been pointed out to be divergent even in the convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been tried to promote Adam/RMSProp-type algorithms to converge. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization. Moreover, we show that the convergences of several variants of Adam, such as AdamNC, AdaEMA, etc., can be directly implied via the proposed sufficient condition in the non-convex setting. In addition, we illustrate that Adam is essentially a specifically weighted AdaGrad with exponential moving average momentum, which provides a novel perspective for understanding Adam and RMSProp. This observation coupled with this sufficient condition gives much deeper interpretations on their divergences. At last, we validate the sufficient condition by applying Adam and RMSProp to tackle a certain counterexample and train deep neural networks. Numerical results are exactly in accord with our theoretical analysis.
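
The abstract above refers to the base learning rate and the historical second-order moments. As a rough illustration of where those quantities enter, here is a minimal sketch of the textbook Adam update on a scalar toy problem. It is not the paper's generic Adam/RMSProp formulation or its sufficient condition; the function name `adam_minimize`, the toy objective, and all hyperparameter values are assumptions chosen for illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Run textbook Adam on a scalar problem and return the final iterate."""
    x = x0
    m = 0.0  # exponential moving average of gradients (momentum term)
    v = 0.0  # exponential moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the first moment
        v_hat = v / (1.0 - beta2 ** t)  # bias correction for the second moment
        # The base learning rate lr is rescaled by the second-moment estimate.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective (assumed): f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_final)
```

On this deterministic quadratic the iterate settles near the minimizer at 3; the divergence examples mentioned in the abstract need specially constructed gradient sequences that this sketch does not reproduce.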

1904.05728 2026-06-04 cs.RO cs.SY eess.SY

Technical Report: Safe, Aggressive Quadrotor Flight via Reachability-based Trajectory Design

技术报告：基于可达性的安全激进四旋翼飞行轨迹设计

Shreyas Kousik, Patrick Holmes, Ramanarayan Vasudevan

发表机构 * Mechanical Engineering, University of Michigan, Ann Arbor, MI（机械工程，密歇根大学，安娜堡，MI）

AI总结本文提出了一种基于可达性的四旋翼飞行轨迹设计方法，通过在在线规划中以新颖的方式使用zonotope，确保在存在依赖于轨迹的跟踪误差时飞行仍然安全；仿真显示，在500个随机生成的杂乱环境中实现了最高5 m/s的激进飞行且无一坠毁。

Comments 12 Pages, 3 Figures, 1 Table

AI中文摘要

四旋翼可以提供基础设施巡检和搜索救援等服务，这些服务要求其在杂乱环境中自主运行。由于传感器在任一时刻只能获得有限的信息，自主性通常通过滚动时域规划来实现，即在执行一个短期计划的同时计算新的计划。为了确保安全并防止机器人损毁，必须验证计划在存在不确定性（例如跟踪误差）的情况下仍然无碰撞。现有的基于样条的规划器通过均匀膨胀障碍物来补偿不确定性，这可能过于保守。相比之下，基于可达性的规划器可以把依赖于轨迹的不确定性表示为所规划轨迹的函数。本文应用基于可达性的轨迹设计（RTD）来规划四旋翼轨迹，使其在存在依赖于轨迹的跟踪误差时仍然安全。这是通过在在线规划中以新颖的方式使用zonotope来实现的。仿真显示，在500个随机生成的杂乱环境中实现了最高5 m/s的激进飞行且无一坠毁。

英文摘要

Quadrotors can provide services such as infrastructure inspection and search-and-rescue, which require operating autonomously in cluttered environments. Autonomy is typically achieved with receding-horizon planning, where a short plan is executed while a new one is computed, because sensors receive limited information at any time. To ensure safety and prevent robot loss, plans must be verified as collision free despite uncertainty (e.g., tracking error). Existing spline-based planners dilate obstacles uniformly to compensate for uncertainty, which can be conservative. On the other hand, reachability-based planners can include trajectory-dependent uncertainty as a function of the planned trajectory. This work applies Reachability-based Trajectory Design (RTD) to plan quadrotor trajectories that are safe despite trajectory-dependent tracking error. This is achieved by using zonotopes in a novel way for online planning. Simulations show aggressive flight up to 5 m/s with zero crashes in 500 cluttered, randomized environments.
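
For readers unfamiliar with zonotopes, the sketch below shows the standard definition (a center plus generators scaled by coefficients in [-1, 1]) and a conservative collision check against an axis-aligned box. It is only an illustration of the set representation, not the RTD planner described in the abstract; the function names and the example numbers are assumptions.

```python
def interval_hull(center, generators):
    """Tight axis-aligned bounding box (lo, hi) of the zonotope
    {center + sum_i b_i * g_i : -1 <= b_i <= 1}."""
    dim = len(center)
    # Along each axis the zonotope extends by the sum of absolute generator entries.
    radius = [sum(abs(g[k]) for g in generators) for k in range(dim)]
    lo = [center[k] - radius[k] for k in range(dim)]
    hi = [center[k] + radius[k] for k in range(dim)]
    return lo, hi


def may_collide(center, generators, box_lo, box_hi):
    """Conservative test: a False result guarantees the zonotope misses the box."""
    lo, hi = interval_hull(center, generators)
    return all(lo[k] <= box_hi[k] and box_lo[k] <= hi[k] for k in range(len(center)))


# Hypothetical 2-D reachable set and two box obstacles.
c = [0.0, 0.0]
G = [[1.0, 0.0], [0.5, 0.5]]
print(interval_hull(c, G))                        # ([-1.5, -0.5], [1.5, 0.5])
print(may_collide(c, G, [2.0, -1.0], [3.0, 1.0]))  # False: safely separated
print(may_collide(c, G, [1.0, 0.0], [2.0, 1.0]))   # True: cannot rule out contact
```

The bounding-box test can report a possible collision where the zonotope itself is clear, so an actual planner would normally use a tighter check.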

1811.10745 2026-06-04 cs.LG cs.CR cs.NA math.NA stat.ML

ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

基于Feynman-Kac形式的ResNets集成：提升自然准确率与鲁棒准确率

Bao Wang, Binjie Yuan, Zuoqiang Shi, Stanley J. Osher

发表机构 * Department of Mathematics（数学系）； Computer Science Department（计算机科学系）； University of California, Los Angeles（加州大学洛杉矶分校）； Tsinghua University（清华大学）； Yau Mathematical Sciences Center（丘成桐数学科学中心）

AI总结本文提出了一种基于Feynman-Kac形式的ResNets集成算法：在每个残差映射的输出中注入指定方差的高斯噪声，并对多个联合训练的改造后ResNets的输出取平均，从而提高鲁棒训练模型在干净图像和对抗图像上的准确率。

Comments 18 pages, 6 figures

AI中文摘要

经验对抗风险最小化（EARM）是一种广泛使用的数学框架，用于鲁棒地训练能够抵御对抗攻击的深度神经网络（DNNs）。然而，训练得到的鲁棒模型在分类干净图像和对抗图像时的自然准确率和鲁棒准确率都远不能令人满意。在本工作中，我们将输运方程最优控制的理论与ResNets的训练和测试实践统一起来。基于这一统一观点，我们提出了一种简单而有效的ResNets集成算法，以提升鲁棒训练模型在干净图像和对抗图像上的准确率。所提出的算法包括两个组成部分：首先，我们在每个残差映射的输出中注入指定方差的高斯噪声，以改造基础ResNets；其次，我们对多个联合训练的改造后ResNets的输出取平均，得到最终预测。这两个步骤近似了用于表示带粘性输运方程（即对流-扩散方程）解的Feynman-Kac公式。在CIFAR10基准上，这一简单算法得到的鲁棒模型在干净图像上的自然准确率为85.62%，在20次迭代的IFGSM攻击下的鲁棒准确率为57.94%，优于当前在CIFAR10上防御IFGSM攻击的最先进方法。所提出的ResNets集成的自然准确率和鲁棒准确率可以随着其基础模块ResNet的进步而动态提升。代码见 https://github.com/BaoWangMath/EnResNet 。

英文摘要

Empirical adversarial risk minimization (EARM) is a widely used mathematical framework to robustly train deep neural nets (DNNs) that are resistant to adversarial attacks. However, both natural and robust accuracies, in classifying clean and adversarial images, respectively, of the trained robust models are far from satisfactory. In this work, we unify the theory of optimal control of transport equations with the practice of training and testing of ResNets. Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images. The proposed algorithm consists of two components: First, we modify the base ResNets by injecting a variance specified Gaussian noise to the output of each residual mapping. Second, we average over the production of multiple jointly trained modified ResNets to get the final prediction. These two steps give an approximation to the Feynman-Kac formula for representing the solution of a transport equation with viscosity, or a convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under the 20 iterations of the IFGSM attack, which outperforms the current state-of-the-art in defending against IFGSM attack on the CIFAR10. Both natural and robust accuracies of the proposed ResNets ensemble can be improved dynamically as the building block ResNet advances. The code is available at: https://github.com/BaoWangMath/EnResNet.
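
For orientation, the Feynman-Kac formula mentioned in the abstract can be written in its standard textbook form as below. The notation (drift F, noise level σ, terminal data f, Brownian motion W) is an assumption made for this sketch, and the paper's exact equation and sign conventions may differ. Read loosely, injecting Gaussian noise corresponds to the stochastic term, and averaging several noisy networks corresponds to the Monte Carlo estimate of the expectation.

```latex
% Backward convection-diffusion (viscous transport) equation with terminal data f:
\[
  \frac{\partial u}{\partial t}(x,t) + F(x,t)\cdot\nabla u(x,t)
  + \frac{\sigma^{2}}{2}\,\Delta u(x,t) = 0, \qquad u(x,T) = f(x).
\]
% Feynman-Kac representation of its solution:
\[
  u(x,t) = \mathbb{E}\bigl[\, f(X_T) \,\big|\, X_t = x \,\bigr],
  \qquad dX_s = F(X_s,s)\,ds + \sigma\,dW_s .
\]
% Monte Carlo estimate with N independent sample paths X^{(i)} started at X_t = x:
\[
  u(x,t) \approx \frac{1}{N}\sum_{i=1}^{N} f\bigl(X_T^{(i)}\bigr).
\]
```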

1905.13587 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

GENO -- GENeric Optimization for Classical Machine Learning

GENO -- 为经典机器学习设计的通用优化

Sören Laue, Matthias Mitterreiter, Joachim Giesen

发表机构 * Friedrich-Schiller-Universität Jena（耶拿弗里德里希·席勒大学）

AI总结本文提出GENO框架，通过结合建模语言和通用求解器，实现了对大多数经典机器学习问题的高效自动求解，展示了其在效率上的优势。

AI中文摘要

尽管优化长期以来一直是机器学习的算法支柱，但新模型仍然需要耗时地实现新的求解器。因此，针对机器学习问题的优化算法实现数以千计。一个自然的问题是：是否总是需要实现新的求解器，还是存在一种足以应对大多数模型的算法。人们普遍认为这种“一种算法通吃”的做法行不通，因为这样的算法无法利用模型特有的结构，因而无法在种类繁多的问题上做到高效且鲁棒。本文对这一普遍看法提出挑战。我们设计并实现了优化框架GENO（GENeric Optimization），它将建模语言与通用求解器结合起来。GENO根据一类优化问题的声明式规范生成求解器。该框架足够灵活，可以涵盖大多数经典机器学习问题。我们在大量经典问题以及一些最近提出的问题上表明，自动生成的求解器（1）与精心设计的专用求解器同样高效，（2）比近期最先进的求解器效率明显更高，（3）比传统的“建模语言加求解器”方法高效若干个数量级。

英文摘要

Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems. A natural question is, if it is always necessary to implement a new solver, or if there is one algorithm that is sufficient for most models. Common belief suggests that such a one-algorithm-fits-all approach cannot work, because this algorithm cannot exploit model specific structure and thus cannot be efficient and robust on a wide variety of problems. Here, we challenge this common belief. We have designed and implemented the optimization framework GENO (GENeric Optimization) that combines a modeling language with a generic solver. GENO generates a solver from the declarative specification of an optimization problem class. The framework is flexible enough to encompass most of the classical machine learning problems. We show on a wide variety of classical but also some recently suggested problems that the automatically generated solvers are (1) as efficient as well-engineered specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical modeling language plus solver approaches.

1905.13428 2026-06-04 cs.LG cs.MA cs.SY eess.SY stat.ML

Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning

面向跨上下文多智能体强化学习的注意力策略

Matthew A. Wright, Roberto Horowitz

发表机构 * University of California Berkeley（加州大学伯克利分校）

AI总结本文提出了一种新的神经策略架构，用于解决多智能体问题，通过在策略层面学习多智能体关系，利用注意力机制实现智能体间的协作，优于传统方法并在大规模智能体场景中表现更优。

AI中文摘要

强化学习在现实世界中的许多潜在应用都涉及与数量随时间变化的其他智能体交互。我们为这类多智能体问题提出了新的神经策略架构。其他方法通常为每个智能体训练一个单独的、彼此独立的策略，再通过额外的策略间机制来强制合作；与之不同，我们秉承近期关于深度网络中关系归纳偏置作用的研究思路，通过注意力架构在策略层面学习多智能体关系。在我们的方法中，所有智能体共享同一策略，但各自在自己的上下文中独立应用该策略，在选择下一步动作时聚合其他智能体的状态信息。这种架构结构使其可以应用于智能体数量可变的环境。我们在一个多智能体自动驾驶车辆协调基准问题上展示了该架构，取得了优于全知识、完全集中式参考方案的结果，并且在扩展到大量智能体时显著优于该方案。

英文摘要

Many potential applications of reinforcement learning in the real world involve interacting with other agents whose numbers vary over time. We propose new neural policy architectures for these multi-agent problems. In contrast to other methods of training an individual, discrete policy for each agent and then enforcing cooperation through some additional inter-policy mechanism, we follow the spirit of recent work on the power of relational inductive biases in deep networks by learning multi-agent relationships at the policy level via an attentional architecture. In our method, all agents share the same policy, but independently apply it in their own context to aggregate the other agents' state information when selecting their next action. The structure of our architectures allows them to be applied on environments with varying numbers of agents. We demonstrate our architecture on a benchmark multi-agent autonomous vehicle coordination problem, obtaining superior results to a full-knowledge, fully-centralized reference solution, and significantly outperforming it when scaling to large numbers of agents.

1905.13053 2026-06-04 cs.AI cs.SY eess.SY

Unpredictability of AI

AI的不可预测性

Roman V. Yampolskiy

发表机构 * Computer Engineering and Computer Science, University of Louisville（路易斯维尔大学计算机工程与计算机科学系）

AI总结本文研究了AI安全领域中一个核心问题，即智能系统的行为预测难题，证明了即使知道终端目标，也无法准确预测超人类智能系统的行为，对AI安全产生了深远影响。

1905.09673 2026-06-04 cs.AI cs.LG cs.SY eess.SY

Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

基于Q矩阵迁移学习的深度Q学习用于新型火灾疏散环境

Jivitesh Sharma, Per-Arne Andersen, Ole-Christoffer Granmo, Morten Goodwin

发表机构 * Centre for Artificial Intelligence Research（人工智能研究中心）； Department of Information and Communication Technology（信息与通信技术系）； University of Agder, Norway（阿格德大学，挪威）

AI总结本文提出了一种基于Q矩阵迁移学习的深度Q学习方法，用于解决紧急疏散问题，通过预训练DQN网络权重以获取最短路径信息，并在复杂真实环境中实现最优疏散路径。

Comments 21 pages, 14 figures, 4 tables

AI中文摘要

我们关注紧急疏散这一重要问题，它显然可以从强化学习中受益，但在很大程度上尚未得到研究。紧急疏散是一项复杂的任务，难以用强化学习解决，因为紧急情况高度动态，包含大量变化的变量和复杂的约束，这使得训练十分困难。在本文中，我们提出了首个用于训练强化学习智能体进行疏散规划的火灾疏散环境。该环境被建模为一张刻画建筑结构的图，包含火势蔓延、不确定性和瓶颈等现实特征。我们按照OpenAI gym格式实现了该环境，以便于未来的研究。我们还提出了一种新的强化学习方法，该方法对基于DQN的智能体的网络权重进行预训练，以融入通往出口的最短路径信息。具体做法是使用表格型Q学习在建筑模型的图上学习最短路径，再让网络有意地在该Q矩阵上过拟合，从而把这一信息迁移到网络中。随后，预训练的DQN模型在火灾疏散环境中继续训练，以在时变条件下生成最优疏散路径。我们将所提出的方法与PPO、VPG、SARSA、A2C和ACKTR等最先进的强化学习算法进行了比较。结果表明，我们的方法大幅优于最先进的模型，包括原始的基于DQN的模型。最后，我们在一座大型且复杂的真实建筑上测试了我们的模型，该建筑包含91个房间，且可以从任一房间移动到任何其他房间，因此共有8281个动作。我们使用基于注意力的机制来处理大规模动作空间。我们的模型在该真实世界紧急环境中取得了接近最优的性能。

英文摘要

We focus on the important problem of emergency evacuation, which clearly could benefit from reinforcement learning but has been largely unaddressed. Emergency evacuation is a complex task which is difficult to solve with reinforcement learning, since an emergency situation is highly dynamic, with a lot of changing variables and complex constraints that make it difficult to train on. In this paper, we propose the first fire evacuation environment to train reinforcement learning agents for evacuation planning. The environment is modelled as a graph capturing the building structure. It consists of realistic features like fire spread, uncertainty and bottlenecks. We have implemented the environment in the OpenAI gym format, to facilitate future research. We also propose a new reinforcement learning approach that entails pretraining the network weights of a DQN-based agent to incorporate information on the shortest path to the exit. We achieved this by using tabular Q-learning to learn the shortest path on the building model's graph. This information is transferred to the network by deliberately overfitting it on the Q-matrix. Then, the pretrained DQN model is trained on the fire evacuation environment to generate the optimal evacuation path under time varying conditions. We perform comparisons of the proposed approach with state-of-the-art reinforcement learning algorithms like PPO, VPG, SARSA, A2C and ACKTR. The results show that our method is able to outperform state-of-the-art models by a huge margin including the original DQN based models. Finally, we test our model on a large and complex real building consisting of 91 rooms, with the possibility to move to any other room, hence giving 8281 actions. We use an attention based mechanism to deal with large action spaces. Our model achieves near optimal performance on the real world emergency environment.
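
The abstract describes using tabular Q-learning to learn shortest paths to the exit on the building graph before any network pretraining. A minimal sketch of that first step follows, under stated assumptions: the five-room graph, the reward of -1 per move, and the hyperparameters are hypothetical, and the loop sweeps over every state-action pair instead of sampling episodes so that the result is reproducible. The paper's environment, reward design, and the DQN pretraining itself are not reproduced here.

```python
def q_learning_shortest_path(graph, exit_room, sweeps=500, alpha=0.5, gamma=0.9):
    """Learn a Q-table for reaching exit_room; an action means 'move to that neighbour'."""
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
    for _ in range(sweeps):
        for s, nbrs in graph.items():
            if s == exit_room:
                continue  # the exit is terminal, nothing to learn there
            for a in nbrs:
                # Value of the next state; reaching the exit ends the episode.
                next_best = 0.0 if a == exit_room else max(q[a].values())
                target = -1.0 + gamma * next_best  # assumed cost of -1 per move
                q[s][a] += alpha * (target - q[s][a])  # standard Q-learning update
    return q


def greedy_path(q, start, exit_room, max_steps=50):
    """Follow the highest-valued action from start until the exit (or a step limit)."""
    path = [start]
    while path[-1] != exit_room and len(path) <= max_steps:
        s = path[-1]
        path.append(max(q[s], key=q[s].get))
    return path


# Hypothetical building graph: room -> directly reachable rooms; room 4 is the exit.
rooms = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
q_table = q_learning_shortest_path(rooms, exit_room=4)
print(greedy_path(q_table, start=0, exit_room=4))  # [0, 2, 4]
```

In this toy graph the greedy policy takes the two-move route through room 2 instead of the three-move route through rooms 1 and 3.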

1805.07297 2026-06-04 cs.LG cs.NA math.NA stat.ML

General solutions for nonlinear differential equations: a rule-based self-learning approach using deep reinforcement learning

非线性微分方程的通用解法：一种使用深度强化学习的基于规则的自学习方法

Shiyin Wei, Xiaowei Jin, Hui Li

发表机构 * Key Lab of Smart Prevention and Mitigation of Civil Engineering Disasters of the Ministry of Industry and Information Technology, Harbin Institute of Technology（工信部智能防灾减灾重点实验室，哈尔滨工业大学）； Key Lab of Structures Dynamic Behavior and Control of the Ministry of Education, Harbin Institute of Technology（教育部结构动力行为与控制重点实验室，哈尔滨工业大学）； School of Civil Engineering, Harbin Institute of Technology（哈尔滨工业大学土木工程学院）

AI总结本文提出了一种基于规则的自学习方法，利用深度强化学习求解非线性常微分方程和偏微分方程：由深度神经网络构成的执行者（actor）输出候选解，评判者（critic）则仅由物理规则（控制方程以及边界条件和初始条件）导出；该方法表现出迁移学习特性，并通过求解薛定谔方程、纳维-斯托克斯方程、伯格斯方程、范德波尔方程、洛伦兹方程以及一个运动方程得到验证，结果表明其解具有较高精度。

DOI: 10.1007/s00466-019-01715-1

AI中文摘要

本文首次提出了一种使用深度强化学习（DRL）的通用的、基于规则的自学习方法，用于求解非线性常微分方程和偏微分方程。求解器由一个以深度神经网络为结构、输出候选解的执行者（actor），以及一个仅由物理规则（控制方程以及边界条件和初始条件）导出的评判者（critic）组成。离散时间上的解被视为共享同一控制方程的多个任务；由于解在时间上的连续性，当前时间步的参数为下一时间步提供了理想的初始化，这体现出迁移学习的特性，并表明DRL求解器已经捕捉到方程的内在本质。该方法通过求解薛定谔方程、纳维-斯托克斯方程、伯格斯方程、范德波尔方程、洛伦兹方程以及一个运动方程得到验证。结果表明，该方法给出的解具有较高的精度，且求解过程有望变得更快。

英文摘要

A universal rule-based self-learning approach using deep reinforcement learning (DRL) is proposed for the first time to solve nonlinear ordinary differential equations and partial differential equations. The solver consists of a deep neural network-structured actor that outputs candidate solutions, and a critic derived only from physical rules (governing equations and boundary and initial conditions). Solutions in discretized time are treated as multiple tasks sharing the same governing equation, and the current step parameters provide an ideal initialization for the next owing to the temporal continuity of the solutions, which shows a transfer learning characteristic and indicates that the DRL solver has captured the intrinsic nature of the equation. The approach is verified through solving the Schrödinger, Navier-Stokes, Burgers', Van der Pol, and Lorenz equations and an equation of motion. The results indicate that the approach gives solutions with high accuracy, and the solution process promises to get faster.

1905.10457 2026-06-04 cs.LG cs.NA math.NA stat.ML

A Polynomial-Based Approach for Architectural Design and Learning with Deep Neural Networks

基于多项式的深度神经网络架构设计与学习方法

Joseph Daws, Clayton G. Webster

发表机构 * Oak Ridge National Lab（橡树岭国家实验室）； University of Tennessee at Knoxville（田纳西大学诺克斯维尔分校）

AI总结本文提出了一种基于多项式的新型方法，通过识别合适的网络架构和初始化来从训练数据中重建多元函数，利用多项式近似，通过标准训练过程改进网络，从而更可能获得理想的局部极小值。

Comments 11 pages, 6 figures, submitted to NeurIPS 2019, corrected several typos and included new examples

AI中文摘要

在本研究中，我们提出了一种新的方法，通过多项式近似来从训练数据中重建多元函数，同时确定合适的网络架构和初始化。使用梯度下降训练深度神经网络可以被视为沿着损失景观移动网络参数以最小化损失函数。参数初始化对于基于下降的迭代训练方法至关重要。我们的方法产生了一个初始状态为训练数据多项式表示的网络。该技术的主要优势是，从该初始状态出发，网络可以通过标准训练过程进行改进。由于网络已经近似了数据，训练更可能产生一组与理想局部极小值相关的参数。我们提供了构建此类网络所需的理论细节，并考虑了几个数值示例，揭示了我们的方法最终能够有效训练网络，从初始状态开始，以实现对大量目标函数的改进近似。

英文摘要

In this effort we propose a novel approach for reconstructing multivariate functions from training data, by identifying both a suitable network architecture and an initialization using polynomial-based approximations. Training deep neural networks using gradient descent can be interpreted as moving the set of network parameters along the loss landscape in order to minimize the loss functional. The initialization of parameters is important for iterative training methods based on descent. Our procedure produces a network whose initial state is a polynomial representation of the training data. The major advantage of this technique is from this initialized state the network may be improved using standard training procedures. Since the network already approximates the data, training is more likely to produce a set of parameters associated with a desirable local minimum. We provide the details of the theory necessary for constructing such networks and also consider several numerical examples that reveal our approach ultimately produces networks which can be effectively trained from our initialized state to achieve an improved approximation for a large class of target functions.

1905.11130 2026-06-04 cs.RO cs.SY eess.SY

Autonomous Interpretation of Demonstrations for Modification of Dynamical Movement Primitives

自主解读示教以修改动态运动基元

Martin Karlsson, Anders Robertsson, Rolf Johansson

发表机构 * LCCC Linnaeus Center（LCCC林奈中心）； ELLIIT Excellence Center（ELLIIT卓越中心）

AI总结本文提出了一种框架，使机器人操作员能够直观地调整动态运动基元（DMPs）。操作员可以通过牵引示教（lead-through programming）演示一段纠正轨迹，由此生成一个修改后的DMP，其前一部分取自有缺陷的轨迹，后一部分取自纠正轨迹。

1905.11129 2026-06-04 cs.RO cs.SY eess.SY

On Motion Control and Machine Learning for Robotic Assembly

关于机器人装配的运动控制与机器学习

Martin Karlsson

发表机构 * Department of Automatic Control, Lund University（隆德大学自动控制系）

AI总结本研究通过减少机器人编程所需的工程工作量并增强机器人应对意外事件的能力，提出了新的方法，从而加快编程速度并使非工程背景用户也能使用机器人。

1609.03213 2026-06-04 cs.SD cs.SY eess.SY

Relaxed Binaural LCMV Beamforming

松弛的双耳LCMV波束成形

Andreas I. Koutrouvelis, Richard C. Hendriks, Richard Heusdens, Jesper Jensen

发表机构 * Oticon Foundation（奥蒂康基金会）

AI总结本文提出了一种可视为线性约束最小方差（LCMV）框架松弛形式的双耳波束成形技术，在降噪并精确保持目标源双耳线索的同时，通过为每个干扰源设置独立的权衡参数，以预先设定的精度保持多个干扰源的双耳线索。

DOI: 10.1109/TASLP.2016.2628642
Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(1), 137-152, 2016

AI中文摘要

在本文中，我们提出了一种新的双耳波束成形技术，它可以看作是线性约束最小方差（LCMV）框架的一种松弛。所提出的方法能够同时实现降噪和对目标源双耳线索的精确保持，这一点与双耳最小方差无失真响应（BMVDR）方法类似。然而，与BMVDR不同，所提出的方法还能够以预先设定的精度保持多个干扰源的双耳线索。具体来说，它为每个干扰源使用一个独立的权衡参数，从而控制降噪与干扰源双耳线索保持之间的权衡。此外，我们给出了一种稳健的权衡参数选取方法，使干扰源双耳线索的保持精度始终优于BMVDR的相应精度。与以往采用严格等式约束的基于LCMV的双耳波束成形方法相比，所提出方法对约束的松弛使其能够对更多的干扰源实现近似的双耳线索保持。

英文摘要

In this paper we propose a new binaural beamforming technique which can be seen as a relaxation of the linearly constrained minimum variance (LCMV) framework. The proposed method can achieve simultaneous noise reduction and exact binaural cue preservation of the target source, similar to the binaural minimum variance distortionless response (BMVDR) method. However, unlike BMVDR, the proposed method is also able to preserve the binaural cues of multiple interferers to a certain predefined accuracy. Specifically, it is able to control the trade-off between noise reduction and binaural cue preservation of the interferers by using a separate trade-off parameter per interferer. Moreover, we provide a robust way of selecting these trade-off parameters in such a way that the preservation accuracy for the binaural cues of the interferers is always better than the corresponding ones of the BMVDR. The relaxation of the constraints in the proposed method achieves approximate binaural cue preservation of more interferers than other previously presented LCMV-based binaural beamforming methods that use strict equality constraints.
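
As background for the relaxation described above, the classical LCMV beamformer with strict equality constraints can be stated as follows. The symbols are assumptions for this sketch: w is the filter weight vector, R a Hermitian positive-definite noise covariance matrix, C the constraint matrix, and f the desired response vector. The paper's relaxed constraints and trade-off parameters are not reproduced here.

```latex
% Classical LCMV problem with strict equality constraints:
\[
  \min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
  \quad \text{subject to} \quad \mathbf{C}^{H}\mathbf{w} = \mathbf{f},
\]
% Closed-form solution (C assumed to have full column rank):
\[
  \mathbf{w}_{\mathrm{LCMV}}
  = \mathbf{R}^{-1}\mathbf{C}\,\bigl(\mathbf{C}^{H}\mathbf{R}^{-1}\mathbf{C}\bigr)^{-1}\mathbf{f}.
\]
```

Each equality constraint uses up one degree of freedom of the filter, which is why loosening the interferer constraints can leave room to handle more interferers.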

1905.10224 2026-06-04 cs.LG cs.DM cs.NA cs.NE math.NA stat.ML

Semi-Supervised Classification on Non-Sparse Graphs Using Low-Rank Graph Convolutional Networks

利用低秩图卷积网络对非稀疏图进行半监督分类

Dominik Alfke, Martin Stoll

发表机构 * Department of Mathematics, Chair of Scientific Computing（数学系，科学计算教研室）

AI总结本文提出了一种低秩图卷积网络架构，用于高效处理非稀疏图上的半监督学习问题，通过引入低秩滤波器提升运行效率和准确率，并扩展到超图数据集的处理。

2606.05150 2026-06-04 cs.NE cs.AI

Multi-Column RBF Neural Network Using Adaptive and Non-Adaptive Particle Swarm Optimization

使用自适应和非自适应粒子群优化的多列RBF神经网络

Ammar Hoori, Yuichi Motai

发表机构 * Department of Biomedical Engineering, Case Western Reserve University（生物医学工程系，凯斯西储大学）； Department of Electrical and Computer Engineering, Virginia Commonwealth University（电气与计算机工程系，弗吉尼亚联邦大学）

AI总结针对大规模数据集下RBF神经网络训练的可扩展性问题，提出基于粒子群优化（PSO）和自适应PSO（APSO）的多列RBF网络（MC-PSO和MC-APSO），通过并行训练多个RBFN并利用子集专门化提高精度和速度。

Comments 15 pages, under review

AI中文摘要

使用梯度下降算法训练的径向基函数神经网络（RBFN）在浅层和深层网络中都提供了有效的全连接结构。误差校正（ErrCor）是一种最先进的基于梯度的训练方法，它通过选择最优隐藏单元来提高准确率。另一方面，粒子群优化算法（PSO）作为一种基于种群的算法，利用群体经验来优化RBFN参数，具有全局搜索能力，并且对局部极小值具有鲁棒性。自适应PSO（APSO）是PSO的一种改进变体，它通过在优化过程中动态调整群体参数来提高收敛速度。ErrCor和PSO都取得了更好的结果和有竞争力的收敛性。然而，在大规模数据集上，这些方法面临可扩展性方面的挑战，例如核计算量过大和隐藏层结构过于庞大。最近的多列RBFN方法（MCRN）通过在并行系统中部署小型RBFN来提升ErrCor的性能。受MCRN成功的启发，我们提出了两种提升PSO性能的新方法：使用PSO的多列RBFN（MC-PSO）和使用APSO的多列RBFN（MC-APSO）。这些方法引入了用进化群体方法训练的并行RBFN结构。每个RBFN使用PSO或APSO算法在数据集的一个特定空间子集上独立训练，由此得到的专家型RBFN各自适配其对应的子集。在测试时，只有测试样本的近邻所在的那些被选中的RBFN才对多列输出有贡献。这种专门化提高了准确率，而并行性提高了速度。我们在多个基准数据集上评估了所提出的方法。MC-PSO和MC-APSO在准确率和召回率方面优于ErrCor、PSO、APSO和MCRN，并且在大多数实验中训练和测试时间也更短。

英文摘要

The radial basis function neural network (RBFN) trained with a gradient descending algorithm provides an effective fully connected structure in both shallow and deep networks. The error correction (ErrCor), a state-of-the-art gradient-based training method, selects optimal hidden units to improve accuracy. Alternatively, as a population-based algorithm, the particle swarm optimization algorithm (PSO) uses the swarm experience to optimize RBFN parameters, offering global search and robustness to local minima. Adaptive PSO (APSO) has emerged as an improved variant of PSO. APSO algorithm improves convergence speed by dynamically adjusting swarm parameters during optimization. Both ErrCor and PSO demonstrate improved results and competitive convergence. However, with large datasets, these methods face scalability challenges such as excessive kernel computations and large hidden layer structures. A recent multi-column RBFN approach (MCRN) improves ErrCor performance by deploying small RBFNs in a parallel system. Inspired by MCRN's success, we propose two novel approaches to improve PSO performance: the multi-column RBFN with PSO (MC-PSO) and the multi-column RBFN with APSO (MC-APSO). These methods introduce parallel RBFN structures trained using evolutionary swarm methods. Each RBFN is independently trained on a specific spatial subset of the dataset using either PSO or APSO algorithms. These resulting specialist-trained RBFNs are tailored to their respective subsets. During testing, only selected RBFNs, where the test instance neighbors are located, contribute to the multi-column output. This specialization improves accuracy, while parallelism enhances speed. We evaluate the proposed methods on various benchmark datasets. The MC-PSO and MC-APSO outperform ErrCor, PSO, APSO, and MCRN in terms of accuracy and recall. They also demonstrate faster training and testing times in most experiments.
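
To make the swarm update concrete, here is a minimal sketch of plain global-best PSO on a toy objective. It does not train an RBFN and is not the MC-PSO or MC-APSO method from the abstract; the function names, the sphere objective, the search range, and the inertia and acceleration coefficients (commonly used constriction-style values) are assumptions for illustration.

```python
import random


def pso_minimize(f, dim, n_particles=20, iters=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize f over R^dim with global-best particle swarm optimization."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]          # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum value 0 at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_val = pso_minimize(sphere, dim=2)
print(best_x, best_val)
```

An adaptive variant would change `w`, `c1`, and `c2` during the run; how APSO does this is not specified in the abstract, so it is left out.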
