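arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.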

1806.09919 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Tangent-Space Regularization for Neural-Network Models of Dynamical Systems

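Fredrik Bagge Carlson, Rolf Johansson, Anders Robertsson

Affiliations: LCCC Linnaeus Center

AI summary: This paper proposes tangent-space regularization for neural-network models of dynamical systems. It exploits properties of the tangent space of the dynamics function to improve regularization of the model Jacobian and to reduce the dependence on large amounts of training data. It also examines how different network architectures affect the ability to learn the input-output Jacobian, and how L2 regularization affects system stability.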

1806.08083 2026-06-04 cs.AI cs.SY eess.SY

Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop

Martin Biehl, Christian Guckelsberger, Christoph Salge, Simón C. Smith, Daniel Polani

Affiliations: Araya Inc.; Computational Creativity Group, Department of Computing, Goldsmiths, University of London; Game Innovation Lab, Department of Computer Science and Engineering, New York University; Sepia Lab, Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire; Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh

AI summary: This paper examines whether other intrinsic motivations can replace the one originally used in active inference while keeping its core machinery intact, and uses the resulting formalism to connect active inference to universal reinforcement learning.

Comments 53 pages, 6 figures, 2 tables

Abstract

Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across e.g. different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.

1804.01926 2026-06-04 cs.RO cs.SY eess.SY stat.ML

Scalable Magnetic Field SLAM in 3D Using Gaussian Process Maps

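Manon Kok, Arno Solin

Affiliations: Delft University of Technology; Aalto University; University of Cambridge

AI summary: This paper proposes a method for 3D magnetic field SLAM that exploits local anomalies in the magnetic field. It models the field with Gaussian processes on hexagonal map tiles and combines reduced-rank Gaussian process regression with a Rao-Blackwellised particle filter, yielding a SLAM algorithm that is efficient in both computation and storage.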

Comments 11 pages, 5 figures

1704.06053 2026-06-04 cs.RO cs.SY eess.SY

Using Inertial Sensors for Position and Orientation Estimation

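Manon Kok, Jeroen D. Hol, Thomas B. Schön

Affiliations: Delft Center for Systems and Control, Delft University of Technology, the Netherlands; Xsens Technologies B.V., Enschede, the Netherlands; Department of Information Technology, Uppsala University, Sweden

AI summary: This tutorial covers the signal processing side of position and orientation estimation with inertial sensors. It discusses different modeling choices and key algorithms, including optimization-based smoothing and filtering, extended Kalman filters, and complementary filters, and illustrates their performance on experimental and simulated data.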

Comments 90 pages, 38 figures

DOI: 10.1561/2000000094
Journal ref: Foundations and Trends in Signal Processing, Vol. 11: No. 1-2, Pages 1-153, 2017

Abstract

In recent years, MEMS inertial sensors (3D accelerometers and 3D gyroscopes) have become widely available due to their small size and low cost. Inertial sensor measurements are obtained at high sampling rates and can be integrated to obtain position and orientation information. These estimates are accurate on a short time scale, but suffer from integration drift over longer time scales. To overcome this issue, inertial sensors are typically combined with additional sensors and models. In this tutorial we focus on the signal processing aspects of position and orientation estimation using inertial sensors. We discuss different modeling choices and a selected number of important algorithms. The algorithms include optimization-based smoothing and filtering as well as computationally cheaper extended Kalman filter and complementary filter implementations. The quality of their estimates is illustrated using both experimental and simulated data.
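
A minimal sketch of the integration drift described above and of the complementary filter that the abstract names as a cheaper remedy, reduced to a single tilt angle. The synthetic signal, the gyro bias, the blend weight `alpha`, and the function names are illustrative assumptions, not taken from the tutorial.

```python
import math

def integrate_gyro(rates, dt, angle0=0.0):
    """Dead-reckon an angle by summing gyro rates; a constant bias grows into drift."""
    angle, out = angle0, []
    for w in rates:
        angle += w * dt
        out.append(angle)
    return out

def complementary_filter(rates, ref_angles, dt, alpha=0.98, angle0=0.0):
    """Blend the integrated gyro rate (trusted short-term) with a drift-free
    but noisy reference angle, e.g. from an accelerometer (trusted long-term)."""
    angle, out = angle0, []
    for w, ref in zip(rates, ref_angles):
        angle = alpha * (angle + w * dt) + (1.0 - alpha) * ref
        out.append(angle)
    return out

# Synthetic data (all values are made up for illustration).
dt, n, bias = 0.01, 2000, 0.05            # 20 s at 100 Hz, gyro bias in rad/s
truth = [0.5 * math.sin(0.5 * (k + 1) * dt) for k in range(n)]
gyro = [0.25 * math.cos(0.5 * k * dt) + bias for k in range(n)]
ref = [truth[k] + 0.05 * math.sin(40.0 * (k + 1) * dt) for k in range(n)]

drift_error = abs(integrate_gyro(gyro, dt)[-1] - truth[-1])
fused_error = abs(complementary_filter(gyro, ref, dt)[-1] - truth[-1])
print(f"gyro only: {drift_error:.3f} rad, complementary filter: {fused_error:.3f} rad")
```

With the gyro alone the bias accumulates linearly in time, while the blended estimate stays bounded because the reference angle keeps pulling it back. A larger `alpha` trusts the gyro more and filters reference noise harder, at the price of a larger residual offset from the bias.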

1709.10441 2026-06-04 cs.LG cs.NA math.NA

A representer theorem for deep kernel learning

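Bastian Bohn, Michael Griebel, Christian Rieger

Affiliations: Institute for Numerical Simulation, University of Bonn; Fraunhofer Institute for Algorithms and Scientific Computing SCAI

AI summary: This paper provides finite-sample and infinite-sample representer theorems for the concatenation of functions in deep kernel learning, giving a mathematical foundation for analyzing machine learning algorithms based on function composition. It also shows how concatenated machine learning problems can be recast as neural networks and how this applies to recent deep learning methods.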

1806.00678 2026-06-04 cs.RO cs.SY eess.SY

AutoRally: An open platform for aggressive autonomous driving

Brian Goldfain, Paul Drews, Changxi You, Matthew Barulic, Orlin Velev, Panagiotis Tsiotras, James M. Rehg

Affiliations: Georgia Tech Autonomous Racing Facility

AI summary: This paper presents AutoRally, a 1:5 scale robotics testbed designed to provide a robust, easy-to-use, and repeatable research environment for autonomous driving, so that non-specialists can also collect real-world data.

1806.00589 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML

Efficient Entropy for Policy Gradient with Multidimensional Action Space

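Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross

Affiliations: New York University; New York University Abu Dhabi; New York University Shanghai; Massachusetts Institute of Technology

AI summary: This paper proposes an efficient way to compute the entropy bonus for policy gradient methods in high-dimensional action spaces. New unbiased estimators improve exploration, and the approach is validated on a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem.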

Abstract

In recent years, deep reinforcement learning has been shown to be adept at solving sequential decision processes with high-dimensional state spaces such as in the Atari games. Many reinforcement learning problems, however, involve high-dimensional discrete action spaces as well as high-dimensional state spaces. This paper considers entropy bonus, which is used to encourage exploration in policy gradient. In the case of high-dimensional action spaces, calculating the entropy and its gradient requires enumerating all the actions in the action space and running forward and backpropagation for each action, which may be computationally infeasible. We develop several novel unbiased estimators for the entropy bonus and its gradient. We apply these estimators to several models for the parameterized policies, including Independent Sampling, CommNet, Autoregressive with Modified MDP, and Autoregressive with LSTM. Finally, we test our algorithms on two environments: a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem. The results show that our entropy estimators substantially improve performance with marginal additional computational cost.
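
To make the cost argument concrete, here is a small sketch comparing three ways of obtaining the entropy of a policy over a multidimensional discrete action space. It assumes a fully factored (independently sampled) policy with made-up probabilities; the plain Monte Carlo estimate shown is the textbook identity H = -E[log π(a)], not one of the paper's own estimators.

```python
import itertools
import math
import random

def entropy(p):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

# Hypothetical factored policy: 3 action dimensions, 4 choices each.
marginals = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.3, 0.15, 0.05],
]

def joint_prob(action):
    """Probability of a joint action under independent sampling per dimension."""
    return math.prod(m[a] for m, a in zip(marginals, action))

# 1) Exact entropy by enumerating all K**d joint actions (infeasible for large d).
exact = -sum(joint_prob(a) * math.log(joint_prob(a))
             for a in itertools.product(range(4), repeat=3))

# 2) For independent dimensions the entropy is the sum of marginal entropies (d*K terms).
factored = sum(entropy(m) for m in marginals)

# 3) Unbiased sample-based estimate: average -log pi(a) over sampled joint actions.
rng = random.Random(0)
def sample_action():
    return tuple(rng.choices(range(4), weights=m)[0] for m in marginals)

n_samples = 20000
mc = sum(-math.log(joint_prob(sample_action())) for _ in range(n_samples)) / n_samples

print(exact, factored, mc)
```

The enumeration grows as K**d, whereas the factored sum and the sampled estimate need only the probabilities of actions that are actually evaluated. For policies whose dimensions are not independent (for example autoregressive ones), the factored shortcut no longer applies and a sampled estimator of some kind is needed.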

1709.05746 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

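Fangyi Zhang, Jürgen Leitner, Zongyuan Ge, Michael Milford, Peter Corke

Affiliations: Australian Centre for Robotic Vision (ACRV); Queensland University of Technology (QUT); Monash University

AI summary: This paper proposes an adversarial discriminative sim-to-real transfer approach that reduces the cost of labelling real data. In a table-top object reaching task, a 7 DoF arm is controlled from visual observations to reach a blue cuboid in clutter; with only 93 labelled and 186 unlabelled real images, the transferred policy achieves a 97.8% success rate and 1.8 cm control accuracy.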

Comments Under review for the International Journal of Robotics Research

Abstract

Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is first learning in simulation then transferring to the real world. In the transfer, most existing approaches need real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%. Policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy.

1805.10638 2026-06-04 cs.LG cs.NA math.NA stat.ML

Fast K-Means Clustering with Anderson Acceleration

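Juyong Zhang, Yuxin Yao, Yue Peng, Hao Yu, Bailin Deng

Affiliations: University of Science and Technology of China; Cardiff University

AI summary: This paper proposes a new way to accelerate Lloyd's algorithm for K-Means clustering. It treats the assignment and update steps of Lloyd's algorithm as a fixed-point iteration, applies Anderson acceleration, and dynamically adjusts the parameter m to obtain robust and consistent speedups.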

Abstract

We propose a novel method to accelerate Lloyd's algorithm for K-Means clustering. Unlike previous acceleration approaches that reduce computational cost per iteration or improve initialization, our approach is focused on reducing the number of iterations required for convergence. This is achieved by treating the assignment step and the update step of Lloyd's algorithm as a fixed-point iteration, and applying Anderson acceleration, a well-established technique for accelerating fixed-point solvers. Classical Anderson acceleration utilizes m previous iterates to find an accelerated iterate, and its performance on K-Means clustering can be sensitive to the choice of m and the distribution of samples. We propose a new strategy to dynamically adjust the value of m, which achieves robust and consistent speedups across different problem instances. Our method complements existing acceleration techniques, and can be combined with them to achieve state-of-the-art performance. We perform extensive experiments to evaluate the performance of the proposed method, where it outperforms other algorithms in 106 out of 120 test cases, and the mean decrease ratio of computational time is more than 33%.
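
The core idea of Anderson acceleration can be sketched on a generic fixed-point problem x = g(x). The example below uses a scalar map and a fixed history of m = 1, in which case the update coincides with the secant method. It is an illustration under those assumptions only: it does not run Lloyd's algorithm and does not reproduce the paper's dynamic choice of m or any safeguards; the function names are hypothetical.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=200):
    """Plain iteration x <- g(x); returns (solution, number of g evaluations)."""
    x = x0
    for k in range(1, max_iter + 1):
        gx = g(x)
        if abs(gx - x) < tol:
            return gx, k
        x = gx
    return x, max_iter

def anderson_m1(g, x0, tol=1e-10, max_iter=200):
    """Anderson acceleration with one previous iterate (scalar case).
    With residual f = g(x) - x, the new iterate mixes g(x_k) and g(x_{k-1})
    using the weight theta that minimizes the mixed residual."""
    g_prev = g(x0)
    f_prev = g_prev - x0
    x = g_prev                      # the first step is an ordinary iteration
    for k in range(2, max_iter + 1):
        gx = g(x)
        f = gx - x
        if abs(f) < tol:
            return gx, k
        df = f - f_prev
        theta = f / df if df != 0.0 else 0.0   # fall back to a plain step if degenerate
        x_next = gx - theta * (gx - g_prev)
        g_prev, f_prev = gx, f
        x = x_next
    return x, max_iter

root_plain, evals_plain = fixed_point(math.cos, 1.0)
root_aa, evals_aa = anderson_m1(math.cos, 1.0)
print(root_plain, evals_plain)
print(root_aa, evals_aa)
```

Both routines reach the same fixed point of cos, but the accelerated one needs far fewer evaluations of `g`. In the K-Means setting, one evaluation of `g` would correspond to one assignment step plus one update step applied to the vector of centroids.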

1710.01493 2026-06-04 cs.LG cs.CV cs.NA math.NA math.OC

Image Labeling Based on Graphical Models Using Wasserstein Messages and Geometric Assignment

Ruben Hühnerbein, Fabrizio Savarino, Freddie Åström, Christoph Schnörr

Affiliations: Image and Pattern Analysis Group, Heidelberg University, Germany; Heidelberg Collaboratory for Image Processing, Heidelberg University, Germany

AI summary: This paper proposes a new approach to maximum a posteriori (MAP) inference for discrete graphical models. It approximates the objective function with local Wasserstein distances and converges using updates that can be computed in parallel.

1805.09875 2026-06-04 cs.RO cs.SY eess.SY

Autonomous Thermalling as a Partially Observable Markov Decision Process (Extended Version)

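Iain Guilliard, Richard Rogahn, Jim Piavis, Andrey Kolobov

Affiliations: Australian National University; Microsoft Research

AI summary: This paper models autonomous thermalling as a POMDP and designs a receding-horizon controller based on that model. The controller is implemented in ArduPlane and compared against an existing approach, showing a significant advantage in tests with multiple sUAVs thermalling simultaneously.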

1805.09464 2026-06-04 cs.LG cs.IT cs.NA math.IT math.NA math.OC stat.ML

Simple and practical algorithms for $\ell_p$-norm low-rank approximation

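Anastasios Kyrillidis

Affiliations: IBM T.J. Watson Research Center; Rice University

AI summary: This paper proposes gradient-based non-convex algorithms for $\ell_p$-norm low-rank approximation with p = 1 or p = ∞. The algorithms are easy to implement and compute approximations faster and more accurately; the accompanying theory shows that they attain a (1+ε)-OPT approximation without depending on hyperparameters.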

Comments 16 pages, 11 figures, to appear in UAI 2018

1805.09408 2026-06-04 cs.CV cs.NA math.NA

Non-convex non-local flows for saliency detection

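Iván Ramírez, Gonzalo Galiano, Emanuele Schiavi

Affiliations: Dpt. of Mathematics, Universidad Rey Juan Carlos; Dpt. of Mathematics, Universidad de Oviedo

AI summary: This paper proposes and solves new variational models for automatic saliency detection in digital images, combining a non-local framework with a new quadratic saliency detection term, and applies them to glioma segmentation in MRI-Flair images.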

1805.08095 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

Small steps and giant leaps: Minimal Newton solvers for Deep Learning

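João F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi

Affiliations: Visual Geometry Group, University of Oxford

AI summary: This paper proposes a fast second-order method that can serve as a drop-in replacement for current deep learning solvers. It needs only two additional forward-mode automatic differentiation operations per iteration, at a cost comparable to two standard forward passes, and is easy to implement. It addresses long-standing issues with existing second-order solvers by avoiding the costly, noise-sensitive inversion of an approximate Hessian matrix.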

Abstract

We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which has a computational cost comparable to two standard forward passes and is easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration exactly or by conjugate-gradient methods, a procedure that is both costly and sensitive to noise. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration. This estimate has the same size and is similar to the momentum variable that is commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known closed-form solutions (noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers seem to struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. Code is available.

1711.09220 2026-06-04 cs.LG cs.SY eess.SY math.OC

Fitting Jump Models

A. Bemporad, V. Breschi, D. Piga, S. Boyd

Affiliations: IMT School for Advanced Studies Lucca; Dalle Molle Institute for Artificial Intelligence Research - USI/SUPSI; Department of Electrical Engineering, Stanford University

AI summary: This paper proposes a new framework for fitting jump models to sequential data. It alternates between minimizing a loss function to fit the parameters of multiple models and determining which parameter set is active at each data point, and it applies to mainstream models such as hidden Markov models.

Comments Accepted for publication in Automatica
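
The alternating scheme in the summary above can be sketched on a toy problem. The version below is an assumption-laden illustration, not the paper's formulation: each mode is a constant level, the loss is squared error, a fixed penalty `lam` is charged per mode switch, and the mode sequence is found by dynamic programming. All names and data are hypothetical.

```python
def fit_jump_model(y, thetas, lam, n_iter=10):
    """Alternate between (1) the best mode sequence for fixed parameters and
    (2) the best parameters for a fixed mode sequence.
    Loss: sum_t (y[t] - thetas[s[t]])**2 + lam * (number of mode switches)."""
    thetas = list(thetas)
    K, T = len(thetas), len(y)
    s = [0] * T
    for _ in range(n_iter):
        # Step 1: dynamic programming over mode sequences (Viterbi-style).
        cost = [(y[0] - th) ** 2 for th in thetas]
        back = []
        for t in range(1, T):
            new_cost, ptr = [], []
            for k in range(K):
                j = min(range(K), key=lambda i: cost[i] + (lam if i != k else 0.0))
                new_cost.append((y[t] - thetas[k]) ** 2 + cost[j] + (lam if j != k else 0.0))
                ptr.append(j)
            cost = new_cost
            back.append(ptr)
        s = [min(range(K), key=lambda k: cost[k])]
        for ptr in reversed(back):
            s.append(ptr[s[-1]])
        s.reverse()
        # Step 2: refit each mode's parameter (the mean, under squared loss).
        for k in range(K):
            pts = [yt for yt, st in zip(y, s) if st == k]
            if pts:
                thetas[k] = sum(pts) / len(pts)
    return thetas, s

y = [0.1, -0.2, 0.0, 0.2, -0.1, 4.9, 5.2, 2.0, 5.0, 4.8]   # made-up signal with one outlier
init = [min(y), max(y)]
thetas_pen, modes_pen = fit_jump_model(y, init, lam=5.0)
thetas_free, modes_free = fit_jump_model(y, init, lam=0.0)
print(modes_pen)
print(modes_free)
```

With a switching penalty the outlier at index 7 stays in the second mode, because jumping away and back would cost more than the extra fitting error. With `lam=0.0` the procedure degenerates to a K-Means-style assignment and the outlier is pulled into the first mode.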

1804.01825 2026-06-04 cs.LG econ.GN q-fin.EC stat.ML

Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio

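Alexei Botchkarev

Affiliations: Microsoft Azure Machine Learning Studio

AI summary: This paper presents a tool built in Azure Machine Learning Studio for rapid assessment of multiple regression models. Its evaluation shows the advantage of robust regression, boosted decision tree regression, and decision forest regression for hospital case cost prediction.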

Abstract

The ability to model and predict hospital case costs accurately is critical for efficient health care financial management and budgetary planning. A variety of regression machine learning algorithms are known to be effective for health care cost predictions. The purpose of this experiment was to build an Azure Machine Learning Studio tool for rapid assessment of multiple types of regression models. The tool offers an environment for comparing 14 types of regression models in a unified experiment: linear regression, Bayesian linear regression, decision forest regression, boosted decision tree regression, neural network regression, Poisson regression, Gaussian processes for regression, gradient boosted machine, nonlinear least squares regression, projection pursuit regression, random forest regression, robust regression, robust regression with mm-type estimators, and support vector regression. The tool presents assessment results arranged by model accuracy in a single table using five performance metrics. Evaluation of regression machine learning models for hospital case cost prediction demonstrated the advantage of the robust regression model, boosted decision tree regression and decision forest regression. The operational tool has been published to the web and is openly available for experiments and extensions.

1510.07380 2026-06-04 cs.RO cs.SY eess.SY

SLAP: Simultaneous Localization and Planning Under Uncertainty for Physical Mobile Robots via Dynamic Replanning in Belief Space: Extended version

Ali-akbar Agha-mohammadi, Saurav Agarwal, Sung-Kyun Kim, Suman Chakravorty, Nancy M. Amato

Affiliations: NASA-JPL, Caltech; Dept. of Aerospace Eng.; Dept. of Computer Science

AI summary: This paper proposes a method for simultaneous localization and planning on physical mobile robots under uncertainty, based on dynamic replanning in belief space. An online replanning loop improves the offline policy, which lets the method cope with changing environments and large localization errors and outperform the FIRM method.

Comments 20 pages, updated figures, extended theory and simulation results

Abstract

Simultaneous Localization and Planning (SLAP) is a crucial ability for an autonomous robot operating under uncertainty. In its most general form, SLAP induces a continuous POMDP (partially-observable Markov decision process), which needs to be repeatedly solved online. This paper addresses this problem and proposes a dynamic replanning scheme in belief space. The underlying POMDP, which is continuous in state, action, and observation space, is approximated offline via sampling-based methods, but operates in a replanning loop online to admit local improvements to the coarse offline policy. This construct enables the proposed method to combat changing environments and large localization errors, even when the change alters the homotopy class of the optimal trajectory. It further outperforms the state-of-the-art FIRM (Feedback-based Information RoadMap) method by eliminating unnecessary stabilization steps. Applying belief space planning to physical systems brings with it a plethora of challenges. A key focus of this paper is to implement the proposed planner on a physical robot and show the SLAP solution performance under uncertainty, in changing environments and in the presence of large disturbances, such as a kidnapped robot situation.

1805.04201 2026-06-04 cs.RO cs.AI cs.SY eess.SY

Learning to Grasp Without Seeing

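Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, Abhinav Gupta

Affiliations: The Robotics Institute, Carnegie Mellon University

AI summary: This paper proposes a tactile-sensing approach to grasping. It learns a representation of tactile signals and uses iterative re-grasping to improve grasp stability; experiments show that it can grasp novel objects effectively without visual information.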

Abstract

Can a robot grasp an unknown object without seeing it? In this paper, we present a tactile-sensing based approach to this challenging problem of grasping novel objects without prior knowledge of their location or physical properties. Our key idea is to combine touch based object localization with tactile based re-grasping. To train our learning models, we created a large-scale grasping dataset, including more than 30 RGB frames and over 2.8 million tactile samples from 7800 grasp interactions of 52 objects. To learn a representation of tactile signals, we propose an unsupervised auto-encoding scheme, which shows a significant improvement of 4-9% over prior methods on a variety of tactile perception tasks. Our system consists of two steps. First, our touch localization model sequentially 'touch-scans' the workspace and uses a particle filter to aggregate beliefs from multiple hits of the target. It outputs an estimate of the object's location, from which an initial grasp is established. Next, our re-grasping model learns to progressively improve grasps with tactile feedback based on the learned features. This network learns to estimate grasp stability and predict adjustment for the next grasp. Re-grasping thus is performed iteratively until our model identifies a stable grasp. Finally, we demonstrate extensive experimental results on grasping a large set of novel objects using tactile sensing alone. Furthermore, when applied on top of a vision-based policy, our re-grasping model significantly boosts the overall accuracy by 10.6%. We believe this is the first attempt at learning to grasp with only tactile sensing and without any prior object knowledge.

1804.08676 2026-06-04 cs.RO cs.MA cs.SY eess.SY

Gesture based Human-Swarm Interactions for Formation Control using interpreters

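Aamodh Suresh, Sonia Martinez

Affiliations: Department of Mechanical and Aerospace Engineering, University of California at San Diego, La Jolla, CA 92093, USA

AI summary: This paper proposes a novel Human-Swarm Interaction framework in which arm gestures control the shape and formation of a swarm. Gestures are recorded by a wearable armband, and an interpreter translates them into swarm control commands, combining machine learning and optimal control techniques to achieve formation control.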

Abstract

We propose a novel Human-Swarm Interaction (HSI) framework which enables the user to control a swarm shape and formation. The user commands the swarm utilizing just arm gestures and motions which are recorded by an off-the-shelf wearable armband. We propose a novel interpreter system, which acts as an intermediary between the user and the swarm to simplify the user's role in the interaction. The interpreter takes in a high level input drawn using gestures by the user, and translates it into low level swarm control commands. This interpreter employs machine learning, Kalman filtering and optimal control techniques to translate the user input into swarm control parameters. A notion of Human Interpretable dynamics is introduced, which is used by the interpreter for planning as well as to provide feedback to the user. The dynamics of the swarm are controlled using a novel decentralized formation controller based on distributed linear iterations and dynamic average consensus. The framework is demonstrated theoretically as well as experimentally in a 2D environment, with a human controlling a swarm of simulated robots in real time.
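
The abstract mentions distributed linear iterations and average consensus as building blocks of the formation controller. The sketch below shows only the simplest static form of that idea, in which agents on a ring repeatedly average with their neighbors until they agree on the mean of their initial values. The graph, step size, and values are assumptions for illustration; this is not the paper's controller and not its dynamic (time-varying) consensus scheme.

```python
def consensus_step(x, neighbors, eps):
    """One synchronous linear iteration: each agent moves toward its neighbors' values."""
    return [xi + eps * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]

# Hypothetical 5-agent ring; each agent only communicates with its two neighbors.
n = 5
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
x = [4.0, -1.0, 7.5, 0.0, 2.0]   # e.g. each agent's local guess of a shared formation parameter
target = sum(x) / n              # the network-wide average, unknown to any single agent

for _ in range(60):
    # eps below 1 / (max node degree) is a standard sufficient condition for stability
    x = consensus_step(x, neighbors, eps=0.3)

print(x, target)
```

Because each update only redistributes differences between neighbors, the sum of all values is preserved, so agreement can only happen at the average. No agent ever needs global information, which is what makes this kind of iteration attractive for decentralized swarm control.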

1804.07323 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

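Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

Affiliations: University of Pennsylvania; U.S. Army Research Laboratory

AI summary: This paper proposes nonparametric stochastic compositional gradient descent for Q-learning in continuous Markov decision problems. It reformulates Bellman's optimality equation as a nested non-convex stochastic optimization problem, parameterizes the solution in a reproducing kernel Hilbert space, and proves that the algorithm converges with probability 1 to a stationary point of the problem.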

Abstract

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address this problem by considering Bellman's optimality equation defined over action-value functions, which we reformulate into a nested non-convex stochastic optimization problem defined over a Reproducing Kernel Hilbert Space (RKHS). We develop a functional generalization of stochastic quasi-gradient method to solve it, which, owing to the structure of the RKHS, admits a parameterization in terms of scalar weights and past state-action pairs which grows proportionately with the algorithm iteration index. To ameliorate this complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the sequence of kernel weights and dictionaries, which yields a controllable error in the descent direction of the underlying optimization method. We prove that the resulting algorithm, called KQ-Learning, converges with probability 1 to a stationary point of this problem, yielding a fixed point of the Bellman optimality operator under the hypothesis that it belongs to the RKHS. Under constant learning rates, we further obtain convergence to a small Bellman error that depends on the chosen learning rates. Numerical evaluation on the Continuous Mountain Car and Inverted Pendulum tasks yields convergent parsimonious learned action-value functions, policies that are competitive with the state of the art, and exhibit reliable, reproducible learning behavior.

1804.06114 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

A Support Tensor Train Machine

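Cong Chen, Kim Batselier, Ching-Yun Ko, Ngai Wong

Affiliations: The Department of Electrical and Electronic Engineering, The University of Hong Kong

AI summary: This paper proposes the support tensor train machine, which replaces the rank-one tensor in the conventional support tensor machine with a tensor train to increase the model's expressive power; experiments show that it outperforms SVM and STM.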

Comments 7 pages

1804.04696 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

Efficient Model Identification for Tensegrity Locomotion

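Shaojun Zhu, David Surovik, Kostas E. Bekris, Abdeslam Boularias

Affiliations: Department of Computer Science, Rutgers University

AI summary: This paper proposes an efficient method that uses a physics engine within a Bayesian optimization framework to identify unknown mechanical parameters of high-dimensional, compliant tensegrity robots, improving the accuracy of locomotion control.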

1804.04347 2026-06-04 cs.RO cs.SE cs.SY eess.SY

The CAT Vehicle Testbed: A Simulator with Hardware in the Loop for Autonomous Vehicle Applications

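Rahul Kumar Bhadani, Jonathan Sprinkle, Matthew Bunting

Affiliations: Department of Electrical and Computer Engineering, University of Arizona, Tucson, USA

AI summary: This paper presents the CAT Vehicle Testbed, which validates simulation results through hardware-in-the-loop simulation to support research on autonomous driving technology. Built on ROS and a physics-based vehicle model, it supports multi-vehicle interaction and real-time data logging with playback, so algorithm performance can be validated quickly.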

Comments In Proceedings SCAV 2018, arXiv:1804.03406

DOI: 10.4204/EPTCS.269.4
Journal ref: EPTCS 269, 2018, pp. 32-47

Abstract

This paper presents the CAT Vehicle (Cognitive and Autonomous Test Vehicle) Testbed: a research testbed comprised of a distributed simulation-based autonomous vehicle, with straightforward transition to hardware in the loop testing and execution, to support research in autonomous driving technology. The evolution of autonomous driving technology from active safety features and advanced driving assistance systems to full sensor-guided autonomous driving requires testing of every possible scenario. However, researchers who want to demonstrate new results on a physical platform face difficult challenges, if they do not have access to a robotic platform in their own labs. Thus, there is a need for a research testbed where simulation-based results can be rapidly validated through hardware in the loop simulation, in order to test the software on board the physical platform. The CAT Vehicle Testbed offers such a testbed that can mimic dynamics of a real vehicle in simulation and then seamlessly transition to reproduction of use cases with hardware. The simulator utilizes the Robot Operating System (ROS) with a physics-based vehicle model, including simulated sensors and actuators with configurable parameters. The testbed allows multi-vehicle simulation to support vehicle to vehicle interaction. Our testbed also facilitates logging and capturing of the data in the real time that can be played back to examine particular scenarios or use cases, and for regression testing. As part of the demonstration of feasibility, we present a brief description of the CAT Vehicle Challenge, in which student researchers from all over the globe were able to reproduce their simulation results with fewer than 2 days of interfacing with the physical platform.

1804.02884 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

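Duc Thien Nguyen, Akshat Kumar, Hoong Chuin Lau

Affiliations: School of Information Systems, Singapore Management University

AI summary: This paper proposes an improved actor-critic method for optimizing collective multiagent planning problems. It speeds up convergence by decomposing the approximate action-value function and is validated on a synthetic task and on taxi fleet optimization.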

1612.07139 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation

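Lei Tai, Jingwei Zhang, Ming Liu, Joschka Boedecker, Wolfram Burgard

Affiliations: University of Freiburg

AI summary: This paper surveys deep learning for learning control in robotics. It covers the two main approaches, deep reinforcement learning and imitation learning, and discusses their use in navigation and manipulation tasks as well as the reality-gap challenge.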

Comments 19 pages, 1 figure

Abstract

Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin from traditional reinforcement learning algorithms, showing how they are extended to the deep context and effective mechanisms that could be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion on methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. For imitation learning, we go through its three main categories, behavior cloning, inverse reinforcement learning and generative adversarial imitation learning, by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers.


1612.00181 2026-06-04 cs.CV cs.NA math.NA

Monge's Optimal Transport Distance for Image Classification

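Michael Snow, Jan Van lent

Affiliations: Department of Engineering Design and Mathematics, Centre for Machine Vision, University of the West of England

AI summary: This paper proposes comparing images with the Wasserstein distance, computed by an efficient numerical method for solving the Monge problem, and uses a 1-NN algorithm to demonstrate its advantages for image classification.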

Comments 15 pages, 14 figures
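
The idea of pairing a Wasserstein distance with a 1-NN classifier can be illustrated with a minimal sketch. It is not the paper's Monge solver: it assumes 1-D intensity histograms on unit-spaced bins, where the Wasserstein-1 distance reduces to the area between the two cumulative distributions. The function names `wasserstein_1d` and `classify_1nn` and the toy data are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """W1 distance between two histograms on the same unit-spaced bins."""
    total_p, total_q = sum(p), sum(q)
    # Normalise so both histograms carry unit mass.
    cdf_p = list(accumulate(x / total_p for x in p))
    cdf_q = list(accumulate(x / total_q for x in q))
    # In 1-D, W1 equals the area between the two cumulative distributions.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def classify_1nn(query, labelled):
    """Return the label of the training histogram closest to `query`."""
    nearest = min(labelled, key=lambda item: wasserstein_1d(query, item[0]))
    return nearest[1]


# Toy training set (illustrative only): intensity histograms with labels.
train = [
    ([4, 1, 0, 0], "dark"),
    ([0, 0, 1, 4], "bright"),
]
label = classify_1nn([3, 2, 0, 0], train)
```

Real images are 2-D, where no such closed form exists. That is the setting in which a numerical solver for the transport problem, as the summary describes, becomes necessary.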

1712.04170 2026-06-04 cs.AI cs.NE cs.SY eess.SY

Interpretable Policies for Reinforcement Learning by Genetic Programming

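Daniel Hein, Steffen Udluft, Thomas A. Runkler

Affiliations: Technical University of Munich, Department of Informatics; Siemens AG, Corporate Technology

AI summary: This paper proposes GPRL, a method based on model-based batch reinforcement learning and genetic programming that automatically generates interpretable reinforcement learning policies from pre-existing default state-action trajectory samples. Experiments show that it outperforms a conventional symbolic regression approach.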

Abstract

The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.


1804.00684 2026-06-04 cs.LG cs.NA math.NA stat.ML

Graph-Based Deep Modeling and Real Time Forecasting of Sparse Spatio-Temporal Data

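Bao Wang, Xiyang Luo, Fangbo Zhang, Baichuan Yuan, Andrea L. Bertozzi, P. Jeffrey Brantingham

Affiliations: Dept of Anthropology, UCLA; Dept of Math, UCLA

AI summary: This paper proposes a general framework for modeling, analyzing, and forecasting sparse spatio-temporal data. It combines self-exciting point processes with graph-structured recurrent neural networks for joint modeling at macro and micro scales and for real-time forecasting.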

Comments 9 pages, 19 figures
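
For readers unfamiliar with self-exciting point processes, the sketch below evaluates the conditional intensity of a Hawkes process, the standard example of this class. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are assumptions made for illustration; the listing does not say which kernel the paper uses.

```python
import math


def hawkes_intensity(t, past_events, mu, alpha, beta):
    """Conditional intensity at time t given earlier event times.

    mu    : constant background rate
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in past_events if t_i < t
    )
    return mu + excitation


# Illustrative event times (hypothetical data).
events = [1.0, 1.5, 4.0]
quiet = hawkes_intensity(0.5, events, mu=0.2, alpha=0.8, beta=1.0)
busy = hawkes_intensity(1.6, events, mu=0.2, alpha=0.8, beta=1.0)
```

Before any event the intensity equals the background rate. Shortly after a cluster of events it rises above the background rate, which is the self-exciting behaviour.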

1803.10371 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system

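Kendall Lowrey, Svetoslav Kolev, Jeremy Dao, Aravind Rajeswaran, Emanuel Todorov

Affiliations: University of Washington; Roboti LLC

AI summary: This paper presents a simulation-based reinforcement learning approach to non-prehensile manipulation. Policies trained in simulation transfer successfully to the physical system, and training with an ensemble of models makes them more robust.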

Comments Accepted at IEEE SIMPAR 2018. Project page: https://sites.google.com/view/phantomsim2real

Abstract

Reinforcement learning has emerged as a promising methodology for training robot controllers. However, most results have been limited to simulation due to the need for a large number of samples and the lack of automated-yet-safe data collection methods. Model-based reinforcement learning methods provide an avenue to circumvent these challenges, but the traditional concern has been the mismatch between the simulator and the real world. Here, we show that control policies learned in simulation can successfully transfer to a physical system, composed of three Phantom robots pushing an object to various desired target positions. We use a modified form of the natural policy gradient algorithm for learning, applied to a carefully identified simulation model. The resulting policies, trained entirely in simulation, work well on the physical system without additional training. In addition, we show that training with an ensemble of models makes the learned policies more robust to modeling errors, thus compensating for difficulties in system identification.


1803.07661 2026-06-04 cs.LG cs.NA math.NA stat.ML

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs

Zhe Li, Shuo Wang, Caiwen Ding, Qinru Qiu, Yanzhi Wang, Yun Liang

Affiliations: Department of Electrical Engineering and Computer Science, Syracuse University, USA; Center for Energy-efficient Computing and Applications (CECA), Peking University, China

AI summary: This paper implements RNNs on FPGAs using block-circulant matrices to improve model compression and acceleration. Experiments show a 35.7x energy-efficiency gain over ESE.

Comments To appear in International Conference on Learning Representations 2018 Workshop Track
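
The compression idea behind block-circulant weights can be sketched as follows. Each k-by-k block is defined by a single length-k vector, so a block needs k stored values instead of k squared. The code is a plain software illustration with hypothetical function names, not the paper's FPGA design. It uses a direct circular convolution for clarity, whereas fast implementations typically compute the same product with an FFT.

```python
def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column `c` by vector `x`."""
    k = len(c)
    # Entry (i, j) of a circulant matrix is c[(i - j) mod k].
    return [sum(c[(i - j) % k] * x[j] for j in range(k)) for i in range(k)]


def block_circulant_matvec(blocks, x, k):
    """blocks[r][s] is the defining vector of the (r, s) circulant block."""
    y = []
    for row in blocks:
        acc = [0.0] * k
        for s, c in enumerate(row):
            part = circulant_matvec(c, x[s * k:(s + 1) * k])
            acc = [a + b for a, b in zip(acc, part)]
        y.extend(acc)
    return y


def expand_dense(blocks, k):
    """Materialise the full dense matrix, only to check the result."""
    dense = []
    for row in blocks:
        for i in range(k):
            dense.append([c[(i - j) % k] for c in row for j in range(k)])
    return dense


# A 4x4 weight matrix stored as four length-2 vectors: 8 values, not 16.
blocks = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
x = [1, 0, 0, 1]
y = block_circulant_matvec(blocks, x, k=2)
dense = expand_dense(blocks, k=2)
y_dense = [sum(w * v for w, v in zip(row, x)) for row in dense]
```

Here `y` and `y_dense` agree, confirming that the compact representation gives the same product as the expanded dense matrix.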