arXivDaily: arXiv daily academic digest, updated Monday to Friday
1806.09919 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Tangent-Space Regularization for Neural-Network Models of Dynamical Systems

神经动力系统模型中的切空间正则化

Fredrik Bagge Carlson, Rolf Johansson, Anders Robertsson

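Affiliations: LCCC Linnaeus Center

AI summary: This paper proposes tangent-space regularization for neural-network models of dynamical systems. It exploits properties of the tangent space of the dynamics function to regularize the model Jacobian, which reduces the need for large amounts of training data. It also examines how well different network architectures learn the input-output Jacobian and how L2 regularization affects system stability.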

AI Chinese abstract

本文介绍了神经网络动力系统模型中的切空间正则化概念。许多物理系统在控制应用中的动力学函数的切空间表现出有用性质,例如光滑性,这促使通过假设动力学的切空间来沿系统轨迹正则化模型雅可比矩阵。在没有假设的情况下,神经网络需要大量训练数据才能学习完整的非线性动力学而不过拟合。本文比较了不同网络架构在一步预测和模拟性能上的表现,并研究了不同架构学习具有正确输入输出雅可比矩阵的倾向。此外,探讨了L2权重正则化对学习雅可比特征值谱以及系统稳定性的影响。

English abstract

This work introduces the concept of tangent space regularization for neural-network models of dynamical systems. The tangent space to the dynamics function of many physical systems of interest in control applications exhibits useful properties, e.g., smoothness, motivating regularization of the model Jacobian along system trajectories using assumptions on the tangent space of the dynamics. Without assumptions, large amounts of training data are required for a neural network to learn the full non-linear dynamics without overfitting. We compare different network architectures on one-step prediction and simulation performance and investigate the propensity of different architectures to learn models with correct input-output Jacobian. Furthermore, the influence of $L_2$ weight regularization on the learned Jacobian eigenvalue spectrum, and hence system stability, is investigated.
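
One way to read "regularization of the model Jacobian along system trajectories" is as a penalty on how much the Jacobian changes between consecutive trajectory points. The sketch below illustrates that reading only; the function names, the finite-difference Jacobian, and the squared-difference penalty are assumptions for illustration, not the authors' formulation.

```python
def jacobian_fd(f, z, eps=1e-6):
    """Finite-difference Jacobian of f at z, where z stacks state and input."""
    f0 = f(z)
    cols = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += eps
        fi = f(zp)
        cols.append([(a - b) / eps for a, b in zip(fi, f0)])
    # cols[i][r] holds d f_r / d z_i; transpose so that rows index outputs.
    return [list(row) for row in zip(*cols)]


def jacobian_change_penalty(f, trajectory):
    """Sum of squared entrywise Jacobian changes between consecutive points."""
    jacs = [jacobian_fd(f, z) for z in trajectory]
    total = 0.0
    for j0, j1 in zip(jacs, jacs[1:]):
        for r0, r1 in zip(j0, j1):
            total += sum((a - b) ** 2 for a, b in zip(r0, r1))
    return total
```

A linear model has a constant Jacobian, so its penalty is close to zero, while a model whose Jacobian varies along the trajectory is penalized. In training, such a term would be added to the prediction loss with a weight chosen by the user.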

1806.08083 2026-06-04 cs.AI cs.SY eess.SY

Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop

拓展主动推断领域:感知-动作循环中的更多内在动机

Martin Biehl, Christian Guckelsberger, Christoph Salge, Simón C. Smith, Daniel Polani

Affiliations: Araya Inc.; Computational Creativity Group, Department of Computing, Goldsmiths, University of London; Game Innovation Lab, Department of Computer Science and Engineering, New York University; Sepia Lab, Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire; Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh

AI summary: This paper asks whether the inference and action-selection machinery of active inference can be driven by other intrinsic motivations in place of the original one while keeping its core mechanism, and it uses its formalism to connect active inference to universal reinforcement learning.

Comments 53 pages, 6 figures, 2 tables

AI Chinese abstract

主动推断是一种雄心勃勃的理论,将自主代理的感知、推断和动作选择统一于单一原则下。它为许多认知现象提供了生物合理解释,包括意识。在主动推断中,动作选择由一个评估未来动作的客观函数驱动,该函数基于当前推断的世界信念。主动推断本质上独立于外在奖励,使其在不同环境或代理形态中具有高度鲁棒性。在文献中,共享这种独立性的范式被总结为内在动机。与主动推断不同,这些动机模型通常不承诺特定的推断和动作选择机制。本文研究主动推断的推断和动作选择机制是否也可用于其他内在动机替代原动机。感知-动作循环明确将推断和动作选择与环境和代理记忆联系起来,因此被用作分析基础。我们重构了主动推断方法,将其原始公式定位其中,并展示如何在保持许多原始特征的同时使用其他内在动机。此外,我们通过形式化方法展示了与通用强化学习的联系。主动推断研究可能从比较其他内在动机诱导的动力学中受益。内在动机研究可能从另一种实现内在动机代理的方式中受益,该方式也共享主动推断的生物合理性。

English abstract

Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across e.g. different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.

1804.01926 2026-06-04 cs.RO cs.SY eess.SY stat.ML

Scalable Magnetic Field SLAM in 3D Using Gaussian Process Maps

基于高斯过程地图的可扩展三维磁场SLAM

Manon Kok, Arno Solin

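Affiliations: Delft University of Technology; Aalto University; University of Cambridge

AI summary: This paper presents a method for 3D magnetic field SLAM that uses local anomalies in the magnetic field. It models the map with a Gaussian process, builds local maps with hexagonal block tiling, and combines reduced-rank Gaussian process regression with a Rao-Blackwellised particle filter to obtain a SLAM algorithm that is efficient in both computation and storage.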

Comments 11 pages, 5 figures

AI Chinese abstract

我们提出了一种利用磁场局部异常作为位置信息源的可扩展且完全三维的磁场同时定位与建图(SLAM)方法。这些异常是由于建筑物结构和家具等物体中存在铁磁材料引起的。我们使用高斯过程模型表示磁场地图,并考虑磁场的已知物理性质。我们使用三维六边形分块进行局部地图构建。为了使我们的方法计算可行,我们结合降维高斯过程回归与 Rao-Blackwellised 粒子滤波。我们展示了使用智能手机测量可以得到准确的位置和姿态估计,并证明我们的方法在计算复杂度和地图存储方面都实现了可扩展的磁场SLAM算法。

English abstract

We present a method for scalable and fully 3D magnetic field simultaneous localisation and mapping (SLAM) using local anomalies in the magnetic field as a source of position information. These anomalies are due to the presence of ferromagnetic material in the structure of buildings and in objects such as furniture. We represent the magnetic field map using a Gaussian process model and take well-known physical properties of the magnetic field into account. We build local maps using three-dimensional hexagonal block tiling. To make our approach computationally tractable we use reduced-rank Gaussian process regression in combination with a Rao-Blackwellised particle filter. We show that it is possible to obtain accurate position and orientation estimates using measurements from a smartphone, and that our approach provides a scalable magnetic field SLAM algorithm in terms of both computational complexity and map storage.

1704.06053 2026-06-04 cs.RO cs.SY eess.SY

Using Inertial Sensors for Position and Orientation Estimation

利用惯性传感器进行位置和姿态估计

Manon Kok, Jeroen D. Hol, Thomas B. Schön

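Affiliations: Delft Center for Systems and Control, Delft University of Technology, the Netherlands; Xsens Technologies B.V., Enschede, the Netherlands; Department of Information Technology, Uppsala University, Sweden

AI summary: This tutorial covers the signal-processing aspects of position and orientation estimation with inertial sensors. It discusses different modeling choices and key algorithms, namely optimization-based smoothing and filtering, extended Kalman filters, and complementary filters, and illustrates their performance on experimental and simulated data.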

Comments 90 pages, 38 figures

Journal ref
Foundations and Trends in Signal Processing, Vol. 11: No. 1-2, Pages 1-153, 2017
AI Chinese abstract

近年来,由于体积小、成本低,MEMS惯性传感器(3D加速度计和3D陀螺仪)已广泛可用。惯性传感器以高采样率获取数据,可通过积分获得位置和姿态信息。这些估计在短时间尺度上准确,但长时间尺度上会因积分漂移而产生误差。为克服此问题,惯性传感器通常与其它传感器和模型结合。本文教程聚焦于惯性传感器用于位置和姿态估计的信号处理方面,讨论了不同的建模选择和若干重要的算法。这些算法包括基于优化的平滑和滤波方法,以及计算成本更低的扩展卡尔曼滤波和互补滤波实现。通过实验和模拟数据展示了这些算法的估计质量。

English abstract

In recent years, MEMS inertial sensors (3D accelerometers and 3D gyroscopes) have become widely available due to their small size and low cost. Inertial sensor measurements are obtained at high sampling rates and can be integrated to obtain position and orientation information. These estimates are accurate on a short time scale, but suffer from integration drift over longer time scales. To overcome this issue, inertial sensors are typically combined with additional sensors and models. In this tutorial we focus on the signal processing aspects of position and orientation estimation using inertial sensors. We discuss different modeling choices and a selected number of important algorithms. The algorithms include optimization-based smoothing and filtering as well as computationally cheaper extended Kalman filter and complementary filter implementations. The quality of their estimates is illustrated using both experimental and simulated data.
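
The integration drift mentioned above, and how a complementary filter limits it, can be sketched for a single tilt angle. This is a minimal illustration under stated assumptions: the function name, the scalar one-axis model, and the blending weight `alpha` are hypothetical and not taken from the tutorial.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98):
    """Blend integrated gyroscope rates with accelerometer-derived angles.

    gyro_rates:   angular rates in rad/s (accurate short-term, but biased)
    accel_angles: tilt angles in rad derived from the accelerometer
                  (noisy, but drift-free)
    """
    angle = accel_angles[0]
    estimates = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        # High-pass the gyro integration, low-pass the accelerometer angle.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * accel_angle
        estimates.append(angle)
    return estimates
```

For a stationary sensor with a constant gyroscope bias, pure integration grows without bound, whereas the filtered estimate settles at a small constant offset of about `alpha * bias * dt / (1 - alpha)`.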

1709.10441 2026-06-04 cs.LG cs.NA math.NA

A representer theorem for deep kernel learning

深度核学习的代表定理

Bastian Bohn, Michael Griebel, Christian Rieger

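Affiliations: Institute for Numerical Simulation, University of Bonn; Fraunhofer Institute for Algorithms and Scientific Computing SCAI

AI summary: This paper provides finite-sample and infinite-sample representer theorems for concatenations of kernel functions in deep kernel learning, giving a mathematical foundation for analyzing machine learning algorithms based on function composition. It also shows how concatenated learning problems can be reformulated as neural networks and how the theorem applies to state-of-the-art deep learning methods.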

AI Chinese abstract

在本文中,我们为再生核希尔伯特空间中核函数的拼接(线性组合)提供了有限样本和无限样本的代表定理。这些结果为基于函数组合的机器学习算法分析提供了数学基础。在有限样本情况下,相应的无限维最小化问题可以转化为(非线性)有限维最小化问题,可通过非线性优化算法求解。此外,我们展示了如何将拼接的机器学习问题重新表述为神经网络,并证明了我们的代表定理适用于一系列最先进的深度学习方法。

English abstract

In this paper we provide a finite-sample and an infinite-sample representer theorem for the concatenation of (linear combinations of) kernel functions of reproducing kernel Hilbert spaces. These results serve as a mathematical foundation for the analysis of machine learning algorithms based on compositions of functions. As a direct consequence in the finite-sample case, the corresponding infinite-dimensional minimization problems can be recast into (nonlinear) finite-dimensional minimization problems, which can be tackled with nonlinear optimization algorithms. Moreover, we show how concatenated machine learning problems can be reformulated as neural networks and how our representer theorem applies to a broad class of state-of-the-art deep learning methods.

1806.00678 2026-06-04 cs.RO cs.SY eess.SY

AutoRally An open platform for aggressive autonomous driving

AutoRally:一个用于激进自动驾驶的开放平台

Brian Goldfain, Paul Drews, Changxi You, Matthew Barulic, Orlin Velev, Panagiotis Tsiotras, James M. Rehg

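Affiliations: Georgia Tech Autonomous Racing Facility

AI summary: This article presents AutoRally, a 1:5 scale robotics testbed designed to be robust, easy to use, and reproducible for autonomous driving research, so that people without specialist expertise can collect real-world data.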

AI Chinese abstract

本文介绍了一个1:5比例的机器人测试平台AutoRally,旨在提供稳健、易用和可重复的自动驾驶研究环境,使非专业人员也能收集真实世界的数据。

English abstract

This article presents AutoRally, a 1:5 scale robotics testbed for autonomous vehicle research. AutoRally is designed for robustness, ease of use, and reproducibility, so that a team of two people with limited knowledge of mechanical engineering, electrical engineering, and computer science can construct and then operate the testbed to collect real world autonomous driving data in whatever domain they wish to study. Complete documentation to construct and operate the platform is available online along with tutorials, example controllers, and a driving dataset collected at the Georgia Tech Autonomous Racing Facility. Offline estimation algorithms are used to determine parameters for physics-based dynamics models using an adaptive limited memory joint state unscented Kalman filter. Online vehicle state estimation using a factor graph optimization scheme and a convolutional neural network for semantic segmentation of drivable surface are presented. All algorithms are tested with real world data from the fleet of six AutoRally robots at the Georgia Tech Autonomous Racing Facility tracks, and serve as a demonstration of the robot's capabilities.

1806.00589 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Efficient Entropy for Policy Gradient with Multidimensional Action Space

在多维动作空间中高效的策略梯度熵

Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross

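Affiliations: New York University; New York University Abu Dhabi; New York University Shanghai; Massachusetts Institute of Technology

AI summary: This paper proposes efficient ways to compute the entropy bonus for policy gradient in high-dimensional action spaces, using new unbiased estimators to encourage exploration, and validates them on a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem.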

AI Chinese abstract

近年来,深度强化学习在解决高维状态空间(如Atari游戏)的序列决策过程方面表现出色。然而,许多强化学习问题涉及高维离散动作空间和高维状态空间。本文考虑熵奖励,用于在策略梯度中鼓励探索。在高维动作空间中,计算熵及其梯度需要枚举所有动作并为每个动作运行前向和反向传播,这可能计算上不可行。我们开发了几种新颖的无偏估计器用于熵奖励及其梯度。我们将这些估计器应用于几种参数化策略模型,包括独立采样、CommNet、带有修改MDP的自回归和带有LSTM的自回归。最后,我们在两个环境中测试我们的算法:一个多猎手多兔子网格游戏和一个多智能体多臂老虎机问题。结果表明,我们的熵估计器在边际额外计算成本下显著提升了性能。

English abstract

In recent years, deep reinforcement learning has been shown to be adept at solving sequential decision processes with high-dimensional state spaces such as in the Atari games. Many reinforcement learning problems, however, involve high-dimensional discrete action spaces as well as high-dimensional state spaces. This paper considers entropy bonus, which is used to encourage exploration in policy gradient. In the case of high-dimensional action spaces, calculating the entropy and its gradient requires enumerating all the actions in the action space and running forward and backpropagation for each action, which may be computationally infeasible. We develop several novel unbiased estimators for the entropy bonus and its gradient. We apply these estimators to several models for the parameterized policies, including Independent Sampling, CommNet, Autoregressive with Modified MDP, and Autoregressive with LSTM. Finally, we test our algorithms on two environments: a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem. The results show that our entropy estimators substantially improve performance with marginal additional computational cost.
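
To see why sampling avoids enumerating the action space, recall that the entropy is the expectation of -log π(a) under the policy, so the sample mean of -log π(a) over sampled actions is an unbiased estimate. The sketch below shows this generic Monte Carlo estimator, which is assumed here for illustration and is not necessarily one of the paper's estimators. The policy is a product of independent per-dimension distributions only so that the exact value is cheap to check; all names are hypothetical.

```python
import math
import random


def exact_entropy(factors):
    """Entropy of a product distribution: the sum of per-dimension entropies."""
    return -sum(p * math.log(p) for dist in factors for p in dist if p > 0)


def sampled_entropy(factors, n_samples, rng):
    """Monte Carlo estimate of the entropy using only sampled actions."""
    total = 0.0
    for _ in range(n_samples):
        log_p = 0.0
        for dist in factors:
            # Sample one component of the action and accumulate its log-probability.
            i = rng.choices(range(len(dist)), weights=dist)[0]
            log_p += math.log(dist[i])
        total -= log_p
    return total / n_samples
```

The estimator only needs the log-probability of each sampled action, so the same idea carries over to policies that do not factorize, such as autoregressive ones, where exact enumeration grows exponentially with the number of action dimensions.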

1709.05746 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

对抗性判别仿真到现实的视觉-运动策略转移

Fangyi Zhang, Jürgen Leitner, Zongyuan Ge, Michael Milford, Peter Corke

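Affiliations: Australian Centre for Robotic Vision (ACRV); Queensland University of Technology (QUT); Monash University

AI summary: This paper proposes an adversarial discriminative sim-to-real transfer approach that reduces the cost of labelling real data. In a table-top object reaching task, a 7 DoF arm is controlled from visual observations to reach a blue cuboid in clutter, and the transferred policy achieves a 97.8% success rate and 1.8 cm control accuracy with only 93 labelled and 186 unlabelled real images.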

Comments Under review for the International Journal of Robotics Research

AI Chinese abstract

各种方法已被提出以学习用于现实世界机器人应用的视觉-运动策略。一种解决方案是首先在仿真中学习然后转移到现实世界。在转移过程中,大多数现有方法需要带有标签的真实图像。然而,在许多机器人应用中,标注过程往往昂贵甚至不实际。在本文中,我们提出了一种对抗性判别仿真到现实转移方法,以减少标注真实数据的成本。通过模块化网络在桌面物体抓取任务中验证了该方法的有效性,其中7自由度的机械臂以速度模式控制在障碍物中抓取蓝色立方体。对抗性转移方法将标注真实数据的需求减少了50%。策略可以仅使用93个标注和186个未标注的真实图像转移到现实环境。转移的视觉-运动策略对训练中未见过的物体和移动目标具有鲁棒性,实现了97.8%的成功率和1.8厘米的控制精度。

English abstract

Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is first learning in simulation then transferring to the real world. In the transfer, most existing approaches need real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%. Policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy.

1805.10638 2026-06-04 cs.LG cs.NA math.NA stat.ML

Fast K-Means Clustering with Anderson Acceleration

快速K均值聚类的安德森加速方法

Juyong Zhang, Yuxin Yao, Yue Peng, Hao Yu, Bailin Deng

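Affiliations: University of Science and Technology of China; Cardiff University

AI summary: This paper proposes a new method to accelerate Lloyd's algorithm for K-Means clustering. It treats the assignment and update steps of Lloyd's algorithm as a fixed-point iteration, applies Anderson acceleration, and dynamically adjusts the parameter m to obtain robust and consistent speedups.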

AI Chinese abstract

我们提出了一种新的方法,用于加速K-均值聚类的Lloyd算法。与以往减少每次迭代计算成本或改进初始化的方法不同,我们的方法专注于减少收敛所需的迭代次数。这通过将Lloyd算法的分配步骤和更新步骤视为固定点迭代,并应用安德森加速,一种已建立的加速固定点求解器的技术来实现。经典安德森加速利用m个之前的迭代来找到加速的迭代,其在K-均值聚类中的性能对m的选择和样本分布敏感。我们提出了一种新的策略,动态调整m的值,以在不同问题实例上实现鲁棒且一致的加速。我们的方法补充了现有的加速技术,并可以与它们结合以实现最先进的性能。我们进行了广泛的实验来评估所提出方法的性能,在120个测试用例中,有106个用例优于其他算法,平均计算时间减少比率超过33%。

English abstract

We propose a novel method to accelerate Lloyd's algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce computational cost per iteration or improve initialization, our approach is focused on reducing the number of iterations required for convergence. This is achieved by treating the assignment step and the update step of Lloyd's algorithm as a fixed-point iteration, and applying Anderson acceleration, a well-established technique for accelerating fixed-point solvers. Classical Anderson acceleration utilizes m previous iterates to find an accelerated iterate, and its performance on K-Means clustering can be sensitive to the choice of m and the distribution of samples. We propose a new strategy to dynamically adjust the value of m, which achieves robust and consistent speedups across different problem instances. Our method complements existing acceleration techniques, and can be combined with them to achieve state-of-the-art performance. We perform extensive experiments to evaluate the performance of the proposed method, where it outperforms other algorithms in 106 out of 120 test cases, and the mean decrease ratio of computational time is more than 33%.
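
The fixed-point view can be sketched on one-dimensional data: one Lloyd iteration maps the centroids to new centroids, and Anderson acceleration extrapolates from the last two iterates. This is a simplified sketch, not the paper's algorithm: it fixes m = 1 instead of adapting it, and the energy-based fallback to the plain Lloyd step is an assumed safeguard. All function names are hypothetical.

```python
def lloyd_step(points, centroids):
    """One assignment + update step of Lloyd's algorithm (1-D data)."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for p in points:
        j = min(range(k), key=lambda c: abs(p - centroids[c]))
        sums[j] += p
        counts[j] += 1
    # Keep the old centroid if a cluster received no points.
    return [sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)]


def energy(points, centroids):
    """K-Means objective: sum of squared distances to the nearest centroid."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


def kmeans_anderson(points, centroids, iters=100, tol=1e-12):
    x = list(centroids)
    prev_g = prev_f = None
    for _ in range(iters):
        g = lloyd_step(points, x)                 # fixed-point map G(x)
        f = [gi - xi for gi, xi in zip(g, x)]     # residual G(x) - x
        if max(abs(v) for v in f) < tol:
            return g
        candidate = g
        if prev_f is not None:
            df = [a - b for a, b in zip(f, prev_f)]
            denom = sum(d * d for d in df)
            if denom > 0:
                # Least-squares coefficient for Anderson acceleration with m = 1.
                theta = sum(a * d for a, d in zip(f, df)) / denom
                acc = [gi - theta * (gi - pg) for gi, pg in zip(g, prev_g)]
                # Assumed safeguard: accept only if the objective does not get worse.
                if energy(points, acc) <= energy(points, g):
                    candidate = acc
        prev_g, prev_f = g, f
        x = candidate
    return x
```

With the fallback in place the objective never increases from one iterate to the next, so the accelerated loop is at least as safe as plain Lloyd iterations; whether it needs fewer iterations depends on the data.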

1710.01493 2026-06-04 cs.LG cs.CV cs.NA math.NA math.OC

Image Labeling Based on Graphical Models Using Wasserstein Messages and Geometric Assignment

基于图形模型的图像标注:利用Wasserstein消息与几何分配

Ruben Hühnerbein, Fabrizio Savarino, Freddie Åström, Christoph Schnörr

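Affiliations: Image and Pattern Analysis Group, Heidelberg University, Germany; Heidelberg Collaboratory for Image Processing, Heidelberg University, Germany

AI summary: This paper proposes a new approach to maximum a posteriori inference based on discrete graphical models. It uses local Wasserstein distances to smoothly approximate the objective function and yields rapidly converging iterations that can be carried out in parallel.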

AI Chinese abstract

我们介绍了一种基于离散图模型的最大后验推断新方法。通过利用局部Wasserstein距离来耦合图底层边的分配措施,给定的离散目标函数被平滑近似并限制在分配流形上。相应的乘法更新方案结合了两个过程:(i)所得到的黎曼梯度流的几何积分,以及(ii)将解四舍五入为有效的标签。在整个过程中,已知的LP松弛方法中的局部边缘约束得以满足,而平滑的几何设置导致快速收敛的迭代,可以并行执行每条边。

English abstract

We introduce a novel approach to Maximum A Posteriori inference based on discrete graphical models. By utilizing local Wasserstein distances for coupling assignment measures across edges of the underlying graph, a given discrete objective function is smoothly approximated and restricted to the assignment manifold. A corresponding multiplicative update scheme combines in a single process (i) geometric integration of the resulting Riemannian gradient flow and (ii) rounding to integral solutions that represent valid labelings. Throughout this process, local marginalization constraints known from the established LP relaxation are satisfied, whereas the smooth geometric setting results in rapidly converging iterations that can be carried out in parallel for every edge.

1805.09875 2026-06-04 cs.RO cs.SY eess.SY

Autonomous Thermalling as a Partially Observable Markov Decision Process (Extended Version)

自主热力上升作为部分可观测马尔可夫决策过程(扩展版本)

Iain Guilliard, Richard Rogahn, Jim Piavis, Andrey Kolobov

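Affiliations: Australian National University; Microsoft Research

AI summary: This paper models autonomous thermalling as a POMDP and designs a receding-horizon controller based on it. The controller is implemented in ArduPlane and compared with an existing method in live flight tests with two sUAVs thermalling simultaneously, where it shows a significant advantage.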

AI Chinese abstract

小型无人空中车辆(sUAVs)通常依赖主动推进保持飞行,这限制了飞行时间和范围。为解决此问题,自主热力上升试图利用大气中的上升气流(热力)。然而,低空热力的不规则性使得现有方法难以有效利用。本文将自主热力上升建模为POMDP,并基于此提出递推地平线控制器。该控制器被实现于流行的开源自动驾驶系统ArduPlane中,并通过一系列涉及两架同时热力上升的sUAV的实飞测试,与现有方法进行比较,结果表明基于POMDP的控制器具有显著优势。

English abstract

Small uninhabited aerial vehicles (sUAVs) commonly rely on active propulsion to stay airborne, which limits flight time and range. To address this, autonomous soaring seeks to utilize free atmospheric energy in the form of updrafts (thermals). However, their irregular nature at low altitudes makes them hard to exploit for existing methods. We model autonomous thermalling as a POMDP and present a receding-horizon controller based on it. We implement it as part of ArduPlane, a popular open-source autopilot, and compare it to an existing alternative in a series of live flight tests involving two sUAVs thermalling simultaneously, with our POMDP-based controller showing a significant advantage.

1805.09464 2026-06-04 cs.LG cs.IT cs.NA math.IT math.NA math.OC stat.ML

Simple and practical algorithms for $\ell_p$-norm low-rank approximation

简单且实用的ℓp-范数低秩近似算法

Anastasios Kyrillidis

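Affiliations: IBM T.J. Watson Research Center; Rice University

AI summary: This paper proposes non-convex, gradient-based algorithms for entrywise ℓp-norm low-rank approximation with p = 1 or p = ∞. The algorithms are easy to implement and typically attain better approximations faster than existing methods. In theory they can attain (1+ε)-OPT approximations, but they are not hyperparameter-free: the guarantee assumes that the algorithm's hyperparameters are known or can be approximated.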

Comments 16 pages, 11 figures, to appear in UAI 2018

AI Chinese abstract

我们提出了一种实用算法,用于entrywise ℓp-范数低秩近似,其中p=1或p=∞。所提出的框架是非凸且基于梯度的,易于实现且通常在速度和精度上优于现有方法。从理论角度看,我们证明所提方案可以达到(1+ε)-OPT近似。我们的算法并非超参数无关:只有在假设算法的超参数已知或可近似的情况下,才能实现所需目标。即,我们的理论表明为了在多项式时间内获得良好的解,需要知道哪些问题量,且不与最近的不可近似性结果相矛盾,如[46]。

English abstract

We propose practical algorithms for entrywise $\ell_p$-norm low-rank approximation, for $p = 1$ or $p = \infty$. The proposed framework, which is non-convex and gradient-based, is easy to implement and typically attains better approximations, faster, than the state of the art. From a theoretical standpoint, we show that the proposed scheme can attain $(1 + \varepsilon)$-OPT approximations. Our algorithms are not hyperparameter-free: they achieve the desiderata only assuming the algorithm's hyperparameters are known a priori, or are at least approximable. That is, our theory indicates which problem quantities need to be known in order to get a good solution within polynomial time, and does not contradict recent inapproximability results, as in [46].

1805.09408 2026-06-04 cs.CV cs.NA math.NA

Non-convex non-local flows for saliency detection

非凸非局部流用于显著性检测

Iván Ramírez, Gonzalo Galiano, Emanuele Schiavi

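Affiliations: Dpt. of Mathematics, Universidad Rey Juan Carlos; Dpt. of Mathematics, Universidad de Oviedo

AI summary: This paper proposes and numerically solves a new variational model for automatic saliency detection in digital images, combining a non-local framework with a new quadratic saliency detection term, and applies it to glioblastoma segmentation in MRI-Flair images.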

AI Chinese abstract

我们提出并数值求解了一个新的变分模型,用于数字图像的自动显著性检测。使用非局部框架,我们考虑了一组保持边缘的函数,结合一个新的二次显著性检测项。该术语定义了一个受p-拉普拉斯算子驱动的约束双侧障碍问题,包括所谓的超拉普拉斯情况(0 < p < 1)。然后考虑并应用了相关的非凸非局部反应流,用于MRI-Flair图像中的胶质瘤分割。通过快速卷积核基于的近似解进行计算。数值实验显示,与超拉普拉斯算子相关的非凸性在标准度量方面提供了单调改进的结果。

English abstract

We propose and numerically solve a new variational model for automatic saliency detection in digital images. Using a non-local framework we consider a family of edge preserving functions combined with a new quadratic saliency detection term. Such a term defines a constrained bilateral obstacle problem for image classification driven by p-Laplacian operators, including the so-called hyper-Laplacian case (0 < p < 1). The related non-convex non-local reactive flows are then considered and applied for glioblastoma segmentation in magnetic resonance fluid-attenuated inversion recovery (MRI-Flair) images. A fast approximate solution based on convolutional kernels is computed. The numerical experiments show how the non-convexity related to the hyper-Laplacian operators provides monotonically better results in terms of the standard metrics.

1805.08095 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

Small steps and giant leaps: Minimal Newton solvers for Deep Learning

小步与巨跃:用于深度学习的最小牛顿求解器

João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

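Affiliations: Visual Geometry Group, University of Oxford

AI summary: This paper proposes a fast second-order method that can serve as a drop-in replacement for current deep learning solvers. It needs only two additional forward-mode automatic differentiation operations per iteration, at a cost comparable to two standard forward passes, and is easy to implement. It addresses a long-standing issue with second-order solvers by avoiding the costly and noise-sensitive inversion of an approximate Hessian matrix at every iteration.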

AI Chinese abstract

我们提出了一种快速的二阶方法,可作为现有深度学习求解器的替代方案。与随机梯度下降(SGD)相比,该方法每个迭代仅需两次额外的前向模式自动微分操作,计算成本与两次标准前向传递相当,且易于实现。我们的方法解决了现有二阶求解器的长期问题,即每次迭代精确或通过共轭梯度法计算近似Hessian矩阵的逆矩阵,这一过程成本高且对噪声敏感。相反,我们提出保持一个梯度的估计值,该估计值通过逆Hessian矩阵投影得到,并在每次迭代中更新一次。该估计值的大小相同,类似于SGD中常用的动量变量。不维护Hessian的估计值。我们首先在具有已知闭式解的小问题上验证了我们的方法,称为CurveBall,包括噪声Rosenbrock函数和退化的两层线性网络,其中现有深度学习求解器似乎难以处理。然后我们在CIFAR和ImageNet上训练了多个大型模型,包括ResNet和VGG-f网络,展示了无需超参数调优的更快收敛速度。代码已提供。

English abstract

We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which has a computational cost comparable to two standard forward passes and is easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration. This estimate has the same size and is similar to the momentum variable that is commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.

1711.09220 2026-06-04 cs.LG cs.SY eess.SY math.OC

Fitting Jump Models

拟合跳跃模型

A. Bemporad, V. Breschi, D. Piga, S. Boyd

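Affiliations: IMT School for Advanced Studies Lucca; Dalle Molle Institute for Artificial Intelligence Research - USI/SUPSI; Department of Electrical Engineering, Stanford University

AI summary: This paper describes a new framework for fitting jump models to a sequence of data. It alternates between minimizing a loss function to fit multiple sets of model parameters and minimizing a discrete loss to determine which parameter set is active at each data point, and it covers popular model classes such as hidden Markov models.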

Comments Accepted for publication in Automatica

AI Chinese abstract

我们描述了一种新的框架,用于将跳跃模型拟合到数据序列中。关键思想是交替最小化损失函数以拟合多个模型参数,以及最小化离散损失函数以确定每个数据点的模型参数集。该框架相当通用,涵盖了隐马尔可夫模型和分段仿射模型等流行模型类别。所选损失函数的形状决定了最终跳跃模型的形状。

English abstract

We describe a new framework for fitting jump models to a sequence of data. The key idea is to alternate between minimizing a loss function to fit multiple model parameters, and minimizing a discrete loss function to determine which set of model parameters is active at each data point. The framework is quite general and encompasses popular classes of models, such as hidden Markov models and piecewise affine models. The shape of the loss functions chosen for minimization determines the shape of the resulting jump model.
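
The alternation described above can be sketched for the simplest case. The sketch assumes a scalar signal, one constant parameter per mode, a squared-error fitting loss, and a fixed penalty `lam` for each mode switch; these choices and all function names are illustrative assumptions rather than the paper's general framework.

```python
def assign_modes(ys, thetas, lam):
    """Discrete step: choose the mode sequence by dynamic programming.

    Minimizes sum_t (y_t - theta[s_t])^2 + lam * (number of mode switches).
    """
    k = len(thetas)
    cost = [(ys[0] - th) ** 2 for th in thetas]
    back = []
    for y in ys[1:]:
        best = min(range(k), key=cost.__getitem__)
        new_cost, ptr = [], []
        for m in range(k):
            # Either stay in mode m, or jump from the cheapest mode and pay lam.
            if cost[m] <= cost[best] + lam:
                prev, base = m, cost[m]
            else:
                prev, base = best, cost[best] + lam
            new_cost.append(base + (y - thetas[m]) ** 2)
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)
    s = min(range(k), key=cost.__getitem__)
    modes = [s]
    for ptr in reversed(back):
        s = ptr[s]
        modes.append(s)
    return modes[::-1]


def fit_jump_model(ys, thetas, lam, iters=20):
    """Alternate between mode assignment and per-mode parameter fitting."""
    thetas = list(thetas)
    modes = None
    for _ in range(iters):
        new_modes = assign_modes(ys, thetas, lam)
        if new_modes == modes:
            break
        modes = new_modes
        for m in range(len(thetas)):
            members = [y for y, s in zip(ys, modes) if s == m]
            if members:
                # The squared loss is minimized by the mean of the assigned points.
                thetas[m] = sum(members) / len(members)
    return thetas, modes
```

A larger `lam` makes mode switches more expensive, so isolated outliers are absorbed into the current mode instead of triggering a jump.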

1804.01825 2026-06-04 cs.LG econ.GN q-fin.EC stat.ML

Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio

利用Azure机器学习工作室评估医院病例成本预测模型

Alexei Botchkarev

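Affiliations: Microsoft Azure Machine Learning Studio

AI summary: This paper builds an Azure Machine Learning Studio tool for rapid assessment of multiple types of regression models. Its evaluation of hospital case cost prediction shows an advantage for robust regression, boosted decision tree regression, and decision forest regression.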

AI Chinese abstract

准确的医院病例成本建模和预测能力对高效医疗财务管理和预算规划至关重要。已知各种回归机器学习算法在医疗成本预测中表现良好。本实验的目的是构建一个Azure机器学习工作室工具,用于快速评估多种类型的回归模型。该工具提供了一个统一的实验环境,可比较14种回归模型:线性回归、贝叶斯线性回归、决策森林回归、提升决策树回归、神经网络回归、泊松回归、回归高斯过程、梯度提升机、非线性最小二乘回归、投影寻踪回归、随机森林回归、鲁棒回归、鲁棒回归与mm型估计器、支持向量回归。该工具通过五个性能指标将评估结果按模型准确性排列在单一表格中。对回归机器学习模型进行医院病例成本预测的评估显示,鲁棒回归模型、提升决策树回归和决策森林回归具有优势。该操作工具已发布到网络上,可供实验和扩展使用。

English abstract

The ability to model and predict hospital case costs accurately is critical for efficient health care financial management and budgetary planning. A variety of regression machine learning algorithms are known to be effective for health care cost predictions. The purpose of this experiment was to build an Azure Machine Learning Studio tool for rapid assessment of multiple types of regression models. The tool offers an environment for comparing 14 types of regression models in a unified experiment: linear regression, Bayesian linear regression, decision forest regression, boosted decision tree regression, neural network regression, Poisson regression, Gaussian processes for regression, gradient boosted machine, nonlinear least squares regression, projection pursuit regression, random forest regression, robust regression, robust regression with mm-type estimators, support vector regression. The tool presents assessment results arranged by model accuracy in a single table using five performance metrics. Evaluation of regression machine learning models for hospital case cost prediction demonstrated the advantage of the robust regression model, boosted decision tree regression, and decision forest regression. The operational tool has been published to the web and is openly available for experiments and extensions.

1510.07380 2026-06-04 cs.RO cs.SY eess.SY

SLAP: Simultaneous Localization and Planning Under Uncertainty for Physical Mobile Robots via Dynamic Replanning in Belief Space: Extended version

SLAP:通过信念空间中的动态重新规划实现物理移动机器人的同时定位与规划(在不确定性下):扩展版

Ali-akbar Agha-mohammadi, Saurav Agarwal, Sung-Kyun Kim, Suman Chakravorty, Nancy M. Amato

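Affiliations: NASA-JPL, Caltech; Dept. of Aerospace Eng.; Dept. of Computer Science

AI summary: This paper proposes a method for simultaneous localization and planning for physical mobile robots under uncertainty, based on dynamic replanning in belief space. An online replanning loop improves the offline policy, which lets the method cope with changing environments and large localization errors and outperform the FIRM method.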

Comments 20 pages, updated figures, extended theory and simulation results

AI Chinese abstract

同时定位与规划(SLAP)是自主机器人在不确定性环境下至关重要的能力。在最一般的形式下,SLAP诱导出一个连续的POMDP(部分可观测马尔可夫决策过程),需要在线不断求解。本文针对此问题提出一种在信念空间中的动态重新规划方案。该连续的POMDP在状态、动作和观测空间中通过采样方法进行离线近似,但通过在线重新规划循环实现局部改进。这种构造使所提方法能够应对环境变化和大定位误差,即使环境变化改变了最优轨迹的同调类。此外,本文方法优于当前最先进的FIRM(反馈信息路标)方法,通过消除不必要的稳定步骤。将信念空间规划应用于物理系统带来了诸多挑战。本文的重点是将所提规划器应用于物理机器人,并展示在不确定性、变化环境和存在大干扰(如被绑架机器人情况)下的SLAP解决方案性能。

English abstract

Simultaneous Localization and Planning (SLAP) is a crucial ability for an autonomous robot operating under uncertainty. In its most general form, SLAP induces a continuous POMDP (partially-observable Markov decision process), which needs to be repeatedly solved online. This paper addresses this problem and proposes a dynamic replanning scheme in belief space. The underlying POMDP, which is continuous in state, action, and observation space, is approximated offline via sampling-based methods, but operates in a replanning loop online to admit local improvements to the coarse offline policy. This construct enables the proposed method to combat changing environments and large localization errors, even when the change alters the homotopy class of the optimal trajectory. It further outperforms the state-of-the-art FIRM (Feedback-based Information RoadMap) method by eliminating unnecessary stabilization steps. Applying belief space planning to physical systems brings with it a plethora of challenges. A key focus of this paper is to implement the proposed planner on a physical robot and show the SLAP solution performance under uncertainty, in changing environments and in the presence of large disturbances, such as a kidnapped robot situation.

1805.04201 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Learning to Grasp Without Seeing

无需视觉的抓取学习

Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, Abhinav Gupta

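Affiliations: The Robotics Institute, Carnegie Mellon University

AI summary: This paper presents a tactile-sensing approach to grasping that improves grasp stability through a learned representation of tactile signals and iterative re-grasping. Experiments show that it can grasp novel objects without visual information.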

AI Chinese abstract

能否在不看到物体的情况下让机器人抓取未知物体?本文提出了一种基于触觉感知的解决方案,结合触觉信号定位与触觉反馈重抓取。我们创建了一个大规模抓取数据集,包含超过30帧RGB图像和280万条触觉样本。提出了一种无监督自编码方案,显著提升了触觉感知任务的性能。系统分为两个步骤:首先,触觉定位模型通过粒子滤波聚合目标信息,输出物体位置估计以建立初始抓取;其次,重抓取模型基于学习特征逐步改进抓取,估计抓取稳定性并预测下一步调整。最终通过大量实验验证了在无视觉信息下抓取新型物体的有效性,并在视觉策略基础上提升了整体准确率10.6%。

English abstract

Can a robot grasp an unknown object without seeing it? In this paper, we present a tactile-sensing based approach to this challenging problem of grasping novel objects without prior knowledge of their location or physical properties. Our key idea is to combine touch based object localization with tactile based re-grasping. To train our learning models, we created a large-scale grasping dataset, including more than 30 RGB frames and over 2.8 million tactile samples from 7800 grasp interactions of 52 objects. To learn a representation of tactile signals, we propose an unsupervised auto-encoding scheme, which shows a significant improvement of 4-9% over prior methods on a variety of tactile perception tasks. Our system consists of two steps. First, our touch localization model sequentially 'touch-scans' the workspace and uses a particle filter to aggregate beliefs from multiple hits of the target. It outputs an estimate of the object's location, from which an initial grasp is established. Next, our re-grasping model learns to progressively improve grasps with tactile feedback based on the learned features. This network learns to estimate grasp stability and predict adjustment for the next grasp. Re-grasping thus is performed iteratively until our model identifies a stable grasp. Finally, we demonstrate extensive experimental results on grasping a large set of novel objects using tactile sensing alone. Furthermore, when applied on top of a vision-based policy, our re-grasping model significantly boosts the overall accuracy by 10.6%. We believe this is the first attempt at learning to grasp with only tactile sensing and without any prior object knowledge.

1804.08676 2026-06-04 cs.RO cs.MA cs.SY eess.SY

Gesture based Human-Swarm Interactions for Formation Control using interpreters

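Aamodh Suresh, Sonia Martinez

Affiliations Department of Mechanical and Aerospace Engineering, University of California at San Diego, La Jolla, CA 92093, USA

AI summary This paper proposes a novel human-swarm interaction framework in which gestures control the shape and formation of a swarm. A wearable armband records the gestures, and an interpreter translates them into swarm control commands, combining machine learning and optimal control techniques to achieve formation control.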

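Details
AI abstract (translated from Chinese)

We propose a novel human-swarm interaction (HSI) framework that lets the user control the shape and formation of a swarm with simple arm gestures and motions. Gestures are recorded by a wearable armband, and the framework introduces a novel interpreter system that acts as an intermediary between the user and the swarm, simplifying the user's role in the interaction. The interpreter takes high-level input drawn by the user with gestures and translates it into low-level swarm control commands, using machine learning, Kalman filtering, and optimal control techniques to convert the input into swarm control parameters. A notion of human-interpretable dynamics is introduced, which the interpreter uses for planning and for providing feedback to the user. The swarm dynamics are governed by a novel decentralized formation controller based on distributed linear iterations and dynamic average consensus. The framework is validated both theoretically and experimentally in a 2D environment, with a human controlling a swarm of simulated robots in real time.

English abstract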

We propose a novel Human-Swarm Interaction (HSI) framework which enables the user to control a swarm shape and formation. The user commands the swarm utilizing just arm gestures and motions which are recorded by an off-the-shelf wearable armband. We propose a novel interpreter system, which acts as an intermediary between the user and the swarm to simplify the user's role in the interaction. The interpreter takes in a high level input drawn using gestures by the user, and translates it into low level swarm control commands. This interpreter employs machine learning, Kalman filtering and optimal control techniques to translate the user input into swarm control parameters. A notion of Human Interpretable dynamics is introduced, which is used by the interpreter for planning as well as to provide feedback to the user. The dynamics of the swarm are controlled using a novel decentralized formation controller based on distributed linear iterations and dynamic average consensus. The framework is demonstrated theoretically as well as experimentally in a 2D environment, with a human controlling a swarm of simulated robots in real time.
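
The abstract describes the formation controller as building on distributed linear iterations and average consensus. As a hedged illustration of that building block only, the sketch below runs plain (static) average consensus on a hypothetical four-agent ring. It is not the paper's dynamic-consensus controller; the graph, step size, and initial values are assumptions.

```python
def consensus_step(values, neighbors, step=0.25):
    """One distributed linear iteration: each agent moves toward its neighbors."""
    return [
        x + step * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]

def run_consensus(values, neighbors, iterations=200, step=0.25):
    """Repeat the local update; agents only ever read their neighbors' values."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, step)
    return values

# Hypothetical 4-agent ring: agent i talks to agents i-1 and i+1.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
initial = [1.0, 5.0, 3.0, 7.0]
final = run_consensus(initial, neighbors)
```

Because the graph is undirected, the update preserves the sum of the values, so every agent converges to the average of the initial values without any central coordinator.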

1804.07323 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

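Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

Affiliations University of Pennsylvania; U.S. Army Research Laboratory

AI summary This paper proposes a nonparametric stochastic compositional gradient descent method for Q-learning in continuous Markov decision problems. It reformulates Bellman's optimality equation as a nested non-convex stochastic optimization problem, parameterizes the solution in a reproducing kernel Hilbert space, and proves that the algorithm converges with probability 1 to a stationary point of the problem.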

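Details
AI abstract (translated from Chinese)

We consider Markov decision problems defined over continuous state and action spaces, in which an autonomous agent seeks to learn a map from states to actions that maximizes its long-term discounted accumulation of rewards. We address the problem through Bellman's optimality equation defined over action-value functions, which we reformulate as a nested non-convex stochastic optimization problem over a reproducing kernel Hilbert space (RKHS). We develop a functional generalization of the stochastic quasi-gradient method to solve it; owing to the structure of the RKHS, the solution is parameterized by scalar weights and past state-action pairs, and this parameterization grows in proportion to the iteration index. To curb this complexity explosion, we apply kernel orthogonal matching pursuit to the sequence of kernel weights and dictionaries, which introduces a controllable error in the descent direction of the underlying optimization method. We prove that the resulting algorithm, called KQ-Learning, converges with probability 1 to a stationary point of the problem, yielding a fixed point of the Bellman optimality operator under the hypothesis that it belongs to the RKHS. Under constant learning rates, we further obtain convergence to a small Bellman error that depends on the chosen learning rates. Numerical evaluation on the Continuous Mountain Car and Inverted Pendulum tasks yields convergent, parsimonious learned action-value functions and policies that are competitive with the state of the art and exhibit reliable, reproducible learning behavior.

English abstract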

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address this problem by considering Bellman's optimality equation defined over action-value functions, which we reformulate into a nested non-convex stochastic optimization problem defined over a Reproducing Kernel Hilbert Space (RKHS). We develop a functional generalization of stochastic quasi-gradient method to solve it, which, owing to the structure of the RKHS, admits a parameterization in terms of scalar weights and past state-action pairs which grows proportionately with the algorithm iteration index. To ameliorate this complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the sequence of kernel weights and dictionaries, which yields a controllable error in the descent direction of the underlying optimization method. We prove that the resulting algorithm, called KQ-Learning, converges with probability 1 to a stationary point of this problem, yielding a fixed point of the Bellman optimality operator under the hypothesis that it belongs to the RKHS. Under constant learning rates, we further obtain convergence to a small Bellman error that depends on the chosen learning rates. Numerical evaluation on the Continuous Mountain Car and Inverted Pendulum tasks yields convergent parsimonious learned action-value functions, policies that are competitive with the state of the art, and exhibit reliable, reproducible learning behavior.
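
The parameterization "in terms of scalar weights and past state-action pairs" can be made concrete with a small sketch. It assumes a Gaussian kernel and a hand-picked dictionary, neither of which is stated in the abstract, and it shows only how such a Q-function is evaluated, not the KQ-Learning update itself.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state-action vectors (assumed kernel)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))

def q_value(state_action, dictionary, weights, bandwidth=1.0):
    """Q(s, a) as a weighted sum of kernels centered at stored pairs."""
    return sum(
        w * gaussian_kernel(state_action, center, bandwidth)
        for w, center in zip(weights, dictionary)
    )

# Hypothetical dictionary of past (state, action) pairs and their weights.
dictionary = [(0.0, 1.0), (2.0, -1.0)]
weights = [1.5, -0.5]
q_at_first_center = q_value((0.0, 1.0), dictionary, weights)
```

Each stochastic update would append one more pair to `dictionary`, which is why the abstract mentions pruning it with kernel orthogonal matching pursuit; that pruning step is not shown here.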

1804.06114 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

A Support Tensor Train Machine

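Cong Chen, Kim Batselier, Ching-Yun Ko, Ngai Wong

Affiliations Department of Electrical and Electronic Engineering, The University of Hong Kong

AI summary This paper proposes the support tensor train machine, which replaces the rank-one tensor of the conventional support tensor machine with a tensor train to increase the model's expressive power; experiments show that it outperforms the SVM and STM.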

Comments 7 pages

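Details
AI abstract (translated from Chinese)

Extending traditional vector-based machine learning techniques to tensor form has attracted wide interest in recent years. One example is the support tensor machine (STM), which uses a rank-one tensor to capture the data structure and thereby alleviates the overfitting and curse-of-dimensionality problems of the conventional support vector machine (SVM). However, the expressive power of a rank-one tensor is too limited for many real-world data sets. To overcome this limitation, we introduce the support tensor train machine (STTM), obtained by replacing the rank-one tensor in the STM with a tensor train. Experiments confirm that the STTM outperforms the SVM and STM.

English abstract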

There has been growing interest in extending traditional vector-based machine learning techniques to their tensor forms. An example is the support tensor machine (STM) that utilizes a rank-one tensor to capture the data structure, thereby alleviating the overfitting and curse of dimensionality problems in the conventional support vector machine (SVM). However, the expressive power of a rank-one tensor is restrictive for many real-world data. To overcome this limitation, we introduce a support tensor train machine (STTM) by replacing the rank-one tensor in an STM with a tensor train. Experiments validate and confirm the superiority of an STTM over the SVM and STM.
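
To make the contrast between a rank-one tensor and a tensor train concrete, the sketch below evaluates entries of a 3-way tensor stored as tensor-train cores. A rank-one tensor is the special case in which every core slice is a 1 x 1 matrix. The helper names and the example vectors are hypothetical, and this shows only the storage format, not the STTM training procedure.

```python
def tt_element(cores, index):
    """Evaluate one tensor entry from tensor-train cores.

    cores[k][i] is the r_{k-1} x r_k matrix (list of rows) selected by index i
    in mode k; the first core has 1 row and the last has 1 column.
    """
    row = [1.0]  # running 1 x r product, starting from the scalar 1
    for core, i in zip(cores, index):
        matrix = core[i]
        row = [
            sum(row[a] * matrix[a][b] for a in range(len(row)))
            for b in range(len(matrix[0]))
        ]
    return row[0]

def tt_parameter_count(cores):
    """Total number of stored scalars across all cores."""
    return sum(len(m) * len(m[0]) for core in cores for m in core)

a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
d, e, f = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]

# Rank-one tensor a (x) b (x) c: every core slice is a 1 x 1 matrix.
rank_one_cores = [[[[v]] for v in vec] for vec in (a, b, c)]

# TT-rank-2 tensor a (x) b (x) c + d (x) e (x) f.
tt_cores = [
    [[[a[i], d[i]]] for i in range(2)],              # 1 x 2 slices
    [[[b[j], 0.0], [0.0, e[j]]] for j in range(2)],  # 2 x 2 slices
    [[[c[k]], [f[k]]] for k in range(2)],            # 2 x 1 slices
]
```

The rank-2 train stores 16 numbers instead of 6, and in exchange it can represent tensors that no single outer product can.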

1804.04696 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

Efficient Model Identification for Tensegrity Locomotion

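Shaojun Zhu, David Surovik, Kostas E. Bekris, Abdeslam Boularias

Affiliations Department of Computer Science, Rutgers University

AI summary This paper proposes an efficient method that uses a physics engine and a Bayesian optimization framework to identify unknown mechanical parameters of a high-dimensional, compliant tensegrity robot, improving the precision of locomotion control.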

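Details
AI abstract (translated from Chinese)

This paper aims to identify, in a practical manner, unknown physical parameters such as the mechanical models of actuated robot links, which are critical in dynamical robotic tasks. Key features include the use of an off-the-shelf physics engine and the Bayesian optimization framework. The task considered is locomotion with a high-dimensional, compliant tensegrity robot. The key insight is to project the model identification challenge into an appropriate lower-dimensional space for efficiency. Comparisons with alternatives indicate that the proposed method identifies the parameters more accurately within a given time budget, which in turn yields more precise locomotion control.

English abstract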

This paper aims to identify in a practical manner unknown physical parameters, such as mechanical models of actuated robot links, which are critical in dynamical robotic tasks. Key features include the use of an off-the-shelf physics engine and the Bayesian optimization framework. The task being considered is locomotion with a high-dimensional, compliant Tensegrity robot. A key insight, in this case, is the need to project the model identification challenge into an appropriate lower dimensional space for efficiency. Comparisons with alternatives indicate that the proposed method can identify the parameters more accurately within the given time budget, which also results in more precise locomotion control.

1804.04347 2026-06-04 cs.RO cs.SE cs.SY eess.SY

The CAT Vehicle Testbed: A Simulator with Hardware in the Loop for Autonomous Vehicle Applications

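Rahul Kumar Bhadani, Jonathan Sprinkle, Matthew Bunting

Affiliations Department of Electrical and Computer Engineering, University of Arizona, Tucson, USA

AI summary This paper presents the CAT Vehicle Testbed, which validates simulation results through hardware-in-the-loop simulation to support research in autonomous driving technology. The testbed builds on ROS and a physics-based vehicle model, supports multi-vehicle interaction and playback of logged data, and allows algorithm performance to be validated quickly.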

Comments In Proceedings SCAV 2018, arXiv:1804.03406

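Details
Journal ref
EPTCS 269, 2018, pp. 32-47
AI abstract (translated from Chinese)

This paper presents the CAT Vehicle (Cognitive and Autonomous Test Vehicle) Testbed: a research testbed built around a distributed simulation-based autonomous vehicle, with a straightforward transition to hardware-in-the-loop testing and execution, to support research in autonomous driving technology. The evolution of autonomous driving technology from active safety features and advanced driver assistance systems to fully sensor-guided autonomous driving requires testing of every possible scenario. However, researchers without a robotic platform of their own face difficulties when they want to demonstrate new results on a physical platform. There is therefore a need for a research testbed in which simulation-based results can be rapidly validated through hardware-in-the-loop simulation, so that the software can be tested on the physical platform. The CAT Vehicle Testbed provides this: it mimics the dynamics of a real vehicle in simulation and then transitions seamlessly to reproducing use cases in hardware. The simulator uses the Robot Operating System (ROS) with a physics-based vehicle model, including simulated sensors and actuators with configurable parameters. The testbed supports multi-vehicle simulation for vehicle-to-vehicle interaction. It also supports real-time data logging and capture; the data can be played back to examine particular scenarios or use cases and used for regression testing. As part of the feasibility demonstration, we describe the CAT Vehicle Challenge, in which student researchers from around the world reproduced their simulation results with fewer than 2 days of interfacing with the physical platform.

English abstract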

This paper presents the CAT Vehicle (Cognitive and Autonomous Test Vehicle) Testbed: a research testbed composed of a distributed simulation-based autonomous vehicle, with a straightforward transition to hardware-in-the-loop testing and execution, to support research in autonomous driving technology. The evolution of autonomous driving technology from active safety features and advanced driving assistance systems to full sensor-guided autonomous driving requires testing of every possible scenario. However, researchers who want to demonstrate new results on a physical platform face difficult challenges if they do not have access to a robotic platform in their own labs. Thus, there is a need for a research testbed where simulation-based results can be rapidly validated through hardware-in-the-loop simulation, in order to test the software on board the physical platform. The CAT Vehicle Testbed offers such a testbed, which can mimic the dynamics of a real vehicle in simulation and then seamlessly transition to reproduction of use cases with hardware. The simulator utilizes the Robot Operating System (ROS) with a physics-based vehicle model, including simulated sensors and actuators with configurable parameters. The testbed allows multi-vehicle simulation to support vehicle-to-vehicle interaction. Our testbed also facilitates logging and capturing of data in real time, which can be played back to examine particular scenarios or use cases and used for regression testing. As part of the demonstration of feasibility, we present a brief description of the CAT Vehicle Challenge, in which student researchers from all over the globe were able to reproduce their simulation results with fewer than 2 days of interfacing with the physical platform.

1804.02884 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

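Duc Thien Nguyen, Akshat Kumar, Hoong Chuin Lau

Affiliations School of Information Systems, Singapore Management University

AI summary This paper proposes an improved actor-critic method for optimizing collective multiagent planning problems; a decomposition of the approximate action-value function speeds up convergence, and the method is validated on a synthetic task and on taxi fleet optimization.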

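Details
AI abstract (translated from Chinese)

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in multiagent systems. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We study one such subclass, the CDEC-POMDP, in which the collective behavior of a population of agents affects the joint reward and the environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDEC-POMDP policies. Vanilla AC converges slowly on large problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and we derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that the new AC approach yields higher-quality solutions than the previous best approaches.

English abstract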

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDEC-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDEC-POMDP policies. Vanilla AC has slow convergence for larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better quality solutions than previous best approaches.

1612.07139 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation

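Lei Tai, Jingwei Zhang, Ming Liu, Joschka Boedecker, Wolfram Burgard

Affiliations University of Freiburg

AI summary This paper surveys deep learning for learning control in robotics, covering the two main approaches, deep reinforcement learning and imitation learning, and analyzing their use in navigation and manipulation tasks as well as the reality-gap challenge.

Comments 19 pages, 1 figure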

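Details
AI abstract (translated from Chinese)

Deep learning techniques have been widely applied across research fields, achieving state-of-the-art results. This survey covers deep learning solutions for learning control policies in robotics applications. We discuss the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we start from traditional reinforcement learning algorithms, show how they are extended to the deep setting, and introduce representative works that use DRL for robot navigation and manipulation tasks. We then discuss methods that address the reality gap, that is, how to transfer DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for DRL research. For imitation learning, we cover its three main categories, behavior cloning, inverse reinforcement learning, and generative adversarial imitation learning, introducing their formulations and corresponding robotics applications. Finally, we discuss open challenges and research frontiers.

English abstract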

Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin from traditional reinforcement learning algorithms, showing how they are extended to the deep context and which effective mechanisms can be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion with methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. For imitation learning, we go through its three main categories, behavior cloning, inverse reinforcement learning and generative adversarial imitation learning, by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers.

1612.00181 2026-06-04 cs.CV cs.NA math.NA

Monge's Optimal Transport Distance for Image Classification

Michael Snow, Jan Van lent

Affiliations Department of Engineering Design and Mathematics, Centre for Machine Vision, University of the West of England

AI summary This paper uses the Wasserstein distance to compare images, presents an efficient numerical method for solving Monge's problem, and uses a 1-NN algorithm to show the distance's advantages for image classification.

Comments 15 pages, 14 figures

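Details
AI abstract (translated from Chinese)

This paper focuses on a similarity measure for comparing images, the Wasserstein distance. The Wasserstein distance results from a partial differential equation (PDE) formulation of Monge's optimal transport problem. We present an efficient numerical method for solving Monge's problem. To demonstrate the measure's discriminatory power when comparing images, we use a $1$-nearest neighbour ($1$-NN) machine learning algorithm to illustrate its advantages on the MNIST dataset over other, more traditional distance metrics and over the Tangent Space distance. To our knowledge, the PDE formulation of the Wasserstein metric has not previously been applied to image comparison, nor has the Wasserstein distance been used within the $1$-nearest neighbour architecture.

English abstract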

This paper focuses on a similarity measure, known as the Wasserstein distance, with which to compare images. The Wasserstein distance results from a partial differential equation (PDE) formulation of Monge's optimal transport problem. We present an efficient numerical solution method for solving Monge's problem. To demonstrate the measure's discriminatory power when comparing images, we use a $1$-Nearest Neighbour ($1$-NN) machine learning algorithm to illustrate the measure's potential benefits over other more traditional distance metrics and also the Tangent Space distance, designed to perform excellently on the well-known MNIST dataset. To our knowledge, the PDE formulation of the Wasserstein metric has not been presented for dealing with image comparison, nor has the Wasserstein distance been used within the $1$-nearest neighbour architecture.
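
The idea of plugging a transport distance into a nearest-neighbour classifier can be sketched in one dimension, where the Wasserstein-1 distance between two histograms on the same unit-spaced bins has a closed form (the summed absolute difference of their cumulative distributions). This is a simplification for illustration: the paper works with 2-D images and a PDE-based solver, neither of which is reproduced here, and the toy histograms and labels are made up.

```python
from itertools import accumulate

def wasserstein_1d(p, q):
    """W1 distance between two histograms on the same unit-spaced bins."""
    p_total, q_total = sum(p), sum(q)
    # Normalize to probability mass and compare cumulative distributions.
    cdf_p = accumulate(x / p_total for x in p)
    cdf_q = accumulate(x / q_total for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

def nearest_neighbor_label(query, training_set):
    """1-NN: return the label of the training histogram closest to the query."""
    closest = min(training_set, key=lambda item: wasserstein_1d(query, item[0]))
    return closest[1]

# Hypothetical training data: mass on the far left vs. the far right.
training_set = [([1, 0, 0, 0], "left"), ([0, 0, 0, 1], "right")]
label = nearest_neighbor_label([0, 1, 0, 0], training_set)
```

The query is equally far from both training histograms in Euclidean distance, but the transport distance recognizes that its mass sits one bin from the "left" example and two bins from the "right" one.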

1712.04170 2026-06-04 cs.AI cs.NE cs.SY eess.SY

Interpretable Policies for Reinforcement Learning by Genetic Programming

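Daniel Hein, Steffen Udluft, Thomas A. Runkler

Affiliations Technical University of Munich, Department of Informatics; Siemens AG, Corporate Technology

AI summary This paper proposes GPRL, a method based on model-based batch reinforcement learning and genetic programming that automatically generates interpretable reinforcement learning policies from pre-existing default state-action trajectory samples; experiments show that it outperforms a symbolic regression baseline.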

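Details
AI abstract (translated from Chinese)

The search for interpretable reinforcement learning policies is of high academic and industrial interest. For industrial systems in particular, domain experts are more likely to deploy autonomously learned controllers if the policies are easy to understand and evaluate. Basic algebraic equations meet these requirements as long as their complexity is kept adequate. This paper introduces genetic programming for reinforcement learning (GPRL), an approach based on model-based batch reinforcement learning and genetic programming that automatically learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared with a straightforward method that uses genetic programming for symbolic regression and yields policies imitating an existing well-performing but non-interpretable policy. Experiments on three reinforcement learning benchmarks, namely mountain car, cart-pole balancing, and the industrial benchmark, show that GPRL outperforms the symbolic regression method. GPRL can produce well-performing, interpretable reinforcement learning policies from pre-existing default trajectory data.

English abstract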

The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straight-forward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.

1804.00684 2026-06-04 cs.LG cs.NA math.NA stat.ML

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

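Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham

Affiliations Dept of Anthropology, UCLA; Dept of Math, UCLA

AI summary This paper proposes a generic framework for modeling, analyzing, and forecasting sparse spatio-temporal data; it couples a self-exciting point process with a graph-structured recurrent neural network to model macroscale and microscale behavior jointly and to forecast in real time.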

Comments 9 pages, 19 figures

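Details
AI abstract (translated from Chinese)

We propose a generic framework for modeling, analyzing, and forecasting spatio-temporal data, with a special focus on data that is sparse in both space and time. Our multi-scale framework seamlessly couples two main components: a self-exciting point process that models the macroscale statistical behavior of the spatio-temporal data, and a graph-structured recurrent neural network (GSRNN) that discovers microscale patterns of the data on an inferred graph. This novel deep neural network (DNN) incorporates real-time interactions between graph nodes to enable more accurate real-time forecasting. The method is validated on crime and traffic forecasting.

English abstract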

We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real time interactions of the graph nodes to enable more accurate real time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting.
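
A self-exciting point process is one in which each event temporarily raises the rate of further events. The sketch below evaluates the conditional intensity of a Hawkes process with an exponential kernel, a common textbook form. The abstract does not say which kernel or parameters the authors use, so the kernel shape and the values of `mu`, `alpha`, and `beta` are assumptions.

```python
import math

def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity: background rate plus decaying boosts from past events.

    lambda(t) = mu + sum over past events s < t of alpha * exp(-beta * (t - s))
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )

# Hypothetical event history (for example, incident times in hours).
events = [1.0, 1.5, 4.0]
rate_before_any_event = hawkes_intensity(0.5, events)
rate_after_cluster = hawkes_intensity(1.6, events)
rate_long_after = hawkes_intensity(3.9, events)
```

The intensity is highest just after the cluster of events at 1.0 and 1.5 and decays back toward the background rate `mu`, which is the macroscale bursty behavior such a process is meant to capture.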

1803.10371 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system

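Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind Rajeswaran, Emanuel Todorov

Affiliations University of Washington; Roboti LLC

AI summary This paper presents a simulation-based reinforcement learning method for non-prehensile manipulation: policies trained in simulation transfer successfully to the physical system, and training with an ensemble of models makes them more robust.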

Comments Accepted at IEEE SIMPAR 2018. Project page: https://sites.google.com/view/phantomsim2real

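Details
AI abstract (translated from Chinese)

Reinforcement learning has emerged as a promising methodology for training robot controllers. However, most results have been limited to simulation because of the large number of samples required and the lack of automated yet safe data collection methods. Model-based reinforcement learning methods offer a way around these challenges, but the traditional concern has been the mismatch between the simulator and the real world. Here we show that control policies learned in simulation can transfer successfully to a physical system composed of three Phantom robots pushing an object to various target positions. We learn with a modified natural policy gradient algorithm applied to a carefully identified simulation model. The resulting policies, trained entirely in simulation, work well on the physical system without additional training. We also show that training with an ensemble of models makes the learned policies more robust to modeling errors, compensating for difficulties in system identification.

English abstract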

Reinforcement learning has emerged as a promising methodology for training robot controllers. However, most results have been limited to simulation due to the need for a large number of samples and the lack of automated-yet-safe data collection methods. Model-based reinforcement learning methods provide an avenue to circumvent these challenges, but the traditional concern has been the mismatch between the simulator and the real world. Here, we show that control policies learned in simulation can successfully transfer to a physical system, composed of three Phantom robots pushing an object to various desired target positions. We use a modified form of the natural policy gradient algorithm for learning, applied to a carefully identified simulation model. The resulting policies, trained entirely in simulation, work well on the physical system without additional training. In addition, we show that training with an ensemble of models makes the learned policies more robust to modeling errors, thus compensating for difficulties in system identification.

1803.07661 2026-06-04 cs.LG cs.NA math.NA stat.ML

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs

Zhe Li, Shuo Wang, Caiwen Ding, Qinru Qiu, Yanzhi Wang, Yun Liang

Affiliations Department of Electrical Engineering and Computer Science, Syracuse University, USA; Center for Energy-efficient Computing and Applications (CECA), Peking University, China

AI summary This paper proposes implementing RNNs on FPGAs with block-circulant weight matrices to achieve model compression and acceleration; experiments show up to a 35.7x energy-efficiency improvement over ESE.

Comments To appear in International Conference on Learning Representations 2018 Workshop Track

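Details
AI abstract (translated from Chinese)

Recurrent neural networks (RNNs) are becoming increasingly important for time-series applications, which require efficient real-time implementations. ESE, a recent pruning-based work, suffers degraded performance and energy efficiency because the network structure becomes irregular after pruning. We propose representing the weight matrices of RNNs with block-circulant matrices, which achieves model compression and acceleration simultaneously. Our goal is to implement RNNs on FPGAs with the highest performance and energy efficiency while meeting a given accuracy requirement (negligible accuracy degradation). Experimental results on actual FPGA deployments show that the proposed framework achieves a maximum energy-efficiency improvement of 35.7$\times$ over ESE.

English abstract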

Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications, which require efficient and real-time implementations. The recent pruning-based work ESE suffers from degraded performance and energy efficiency due to the irregular network structure after pruning. We propose block-circulant matrices for weight matrix representation in RNNs, thereby achieving simultaneous model compression and acceleration. We aim to implement RNNs on FPGAs with the highest performance and energy efficiency, subject to a given accuracy requirement (negligible accuracy degradation). Experimental results on actual FPGA deployments show that the proposed framework achieves a maximum energy efficiency improvement of 35.7$\times$ compared with ESE.
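
The compression idea behind block-circulant weights can be shown in a few lines: each k x k block is fully determined by one length-k vector, so a block stores k numbers instead of k squared. The sketch below uses direct circular convolution to stay dependency-free; circulant products can also be computed with FFTs, which is where the speed-up usually comes from. Function names and example values are hypothetical, and this is not the authors' FPGA implementation.

```python
def circulant_dense(c):
    """Expand a defining vector c into the full circulant matrix."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]

def circulant_matvec(c, x):
    """Multiply a circulant matrix by x using only its defining vector."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def block_circulant_matvec(blocks, x):
    """blocks[p][q] is the defining vector of the circulant block at (p, q)."""
    k = len(blocks[0][0])
    y = []
    for block_row in blocks:
        acc = [0.0] * k
        for q, c in enumerate(block_row):
            # Each block only sees its own slice of the input vector.
            part = circulant_matvec(c, x[q * k:(q + 1) * k])
            acc = [a + b for a, b in zip(acc, part)]
        y.extend(acc)
    return y

# A 2 x 4 weight matrix stored as two 2 x 2 circulant blocks (4 numbers, not 8).
blocks = [[[1, 2], [3, 4]]]
output = block_circulant_matvec(blocks, [1, 0, 0, 1])
```

Because every block has the same regular structure, the computation pattern stays uniform, in contrast to the irregular sparsity that pruning produces.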