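arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.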

1902.05343 2026-06-04 cs.CV cs.RO cs.SY eess.SY

Study of dynamical system based obstacle avoidance via manipulating orthogonal coordinates

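Weiya Ren

Affiliations: Artificial Intelligence Research Center of National Innovation Institute of Defense Technology; Tianjin Artificial Intelligence Innovation Center

AI summary: This paper studies dynamical-system-based obstacle avoidance. It introduces orthogonal coordinates to construct a more reasonable modulation matrix, so that the direction of the new trajectory can be expressed as a linear combination of the orthogonal coordinates. It proposes manipulating the orthogonal coordinates through a rotation matrix to resolve the local minimum problem and to produce more reasonable motion in three or more dimensions. The method also offers a solution for patrolling around a convex shape. Experimental results demonstrate the effectiveness of the proposed approach.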

1809.04539 2026-06-04 cs.RO cs.SY eess.SY

Frequency-Aware Model Predictive Control

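Ruben Grandia, Farbod Farshidian, Alexey Dosovitskiy, René Ranftl, Marco Hutter

Affiliations: Robotic Systems Lab, ETH Zurich; Intel Labs, Munich, Germany

AI summary: This paper proposes frequency-shaped cost functions to obtain robust solutions in optimal control for legged robots. Simulation and hardware experiments show that motion plans can be made compatible with actuator bandwidth limits, and that robust walking is achieved on terrain with unmodeled compliance.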

DOI: 10.1109/LRA.2019.2895882
Journal ref: IEEE Robotics and Automation Letters 2019

Abstract

Transferring solutions found by trajectory optimization to robotic hardware remains a challenging task. When the optimization fully exploits the provided model to perform dynamic tasks, the presence of unmodeled dynamics renders the motion infeasible on the real system. Model errors can be a result of model simplifications, but also naturally arise when deploying the robot in unstructured and nondeterministic environments. Predominantly, compliant contacts and actuator dynamics lead to bandwidth limitations. While classical control methods provide tools to synthesize controllers that are robust to a class of model errors, such a notion is missing in modern trajectory optimization, which is solved in the time domain. We propose frequency-shaped cost functions to achieve robust solutions in the context of optimal control for legged robots. Through simulation and hardware experiments we show that motion plans can be made compatible with bandwidth limits set by actuators and contact dynamics. The smoothness of the model predictive solutions can be continuously tuned without compromising the feasibility of the problem. Experiments with the quadrupedal robot ANYmal, which is driven by highly-compliant series elastic actuators, showed significantly improved tracking performance of the planned motion, torque, and force trajectories and enabled the machine to walk robustly on terrain with unmodeled compliance.
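As a rough illustration of what a frequency-shaped cost can look like, the classical frequency-weighted quadratic input cost is sketched below. The symbols $u$, $R$, and $r(s)$ are illustrative assumptions; the abstract does not state the exact formulation used in the paper.

```latex
% Time-domain quadratic input cost and its frequency-domain form (Parseval),
% assuming u(t) = 0 for t < 0 and u square-integrable:
J = \int_{0}^{\infty} u(t)^{\top} R \, u(t) \, dt
  = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{u}(j\omega)^{H} R \, \hat{u}(j\omega) \, d\omega .

% Frequency shaping lets the weight depend on frequency:
J_{\mathrm{shaped}} = \frac{1}{2\pi} \int_{-\infty}^{\infty}
  \hat{u}(j\omega)^{H} R(\omega) \, \hat{u}(j\omega) \, d\omega ,
\qquad R(\omega) = r(j\omega)^{H} r(j\omega) .
```

Under these assumptions, choosing $r(s)$ as a high-pass filter makes $R(\omega)$ grow with frequency, so input content above a chosen bandwidth is penalized more heavily. In the time domain this is typically handled by appending the filter states to the system and penalizing the filtered signal with a constant weight.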

1602.05702 2026-06-04 cs.SD cs.SY eess.SY stat.ML

EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses

Simon Van Eyndhoven, Tom Francart, Alexander Bertrand

Affiliations: ESAT Laboratory of KU Leuven; KU Leuven; Department of Electrical Engineering (ESAT); Department of Neurosciences

AI summary: This paper proposes an EEG-informed method for extracting the attended speaker. It combines microphone array recordings with EEG recordings to separate and denoise speakers in noisy environments, and shows that EEG-based auditory attention detection remains robust without access to clean speech signals.

Comments This paper is published in IEEE Transactions on Biomedical Engineering (2016) and is under copyright. Please cite this paper as: S. Van Eyndhoven, T. Francart, and A. Bertrand, "EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses", IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1045-1056, 2017

DOI: 10.1109/TBME.2016.2587382
Journal ref: IEEE Transactions on Biomedical Engineering, vol. 64, no. 5, pp. 1045-1056, 2017

Abstract

OBJECTIVE: We aim to extract and denoise the attended speaker in a noisy, two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, which are complemented with electroencephalography (EEG) recordings to infer the speaker of interest. METHODS: In this study, we propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multi-channel speech separation and denoising algorithm. RESULTS: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals. CONCLUSIONS: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD. SIGNIFICANCE: Current research on AAD always assumes the availability of the clean speech signals, which limits the applicability in real settings. We have extended this research to detect the attended speaker even when only microphone recordings with noisy speech mixtures are available. This is an enabling ingredient for new brain-computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.

1806.05722 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Non-asymptotic Identification of LTI Systems from a Single Trajectory

Samet Oymak, Necmiye Ozay

Affiliations: Department of Electrical and Computer Engineering, University of California, Riverside, CA; Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI

AI summary: From single-trajectory input-output data, this work uses the Ho-Kalman algorithm to learn the Markov parameters of a linear time-invariant system in finite time and, combining stability results with sample complexity analysis, determines the amount of data needed to learn a balanced realization of the system.

Comments Version 2 has two improvements: First, paper now uses spectral radius rather than largest singular value hence applies to a larger class of systems. Secondly, new sample complexity bounds are provided for approximating the system's Hankel operator via estimated Markov parameters
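The entry above mentions Markov parameters and the Ho-Kalman algorithm. The following is a minimal textbook sketch of Ho-Kalman realization in the noise-free case, not the paper's estimator. The function name `ho_kalman`, the Hankel dimensions `T1` and `T2`, and the convention that `markov[k]` holds $C A^k B$ (with the feedthrough term $D$ left out) are all assumptions made for illustration.

```python
import numpy as np

def ho_kalman(markov, n, T1, T2):
    """Recover (A, B, C) up to a similarity transform from Markov parameters.

    markov[k] is assumed to hold C @ A**k @ B (shape p x m) for k = 0 .. T1 + T2 - 1.
    """
    p, m = markov[0].shape
    # Block Hankel matrix with T1 block rows and T2 + 1 block columns: block (i, j) = markov[i + j].
    H = np.block([[markov[i + j] for j in range(T2 + 1)] for i in range(T1)])
    H_minus = H[:, : T2 * m]   # drop the last block column
    H_plus = H[:, m:]          # drop the first block column (shifted Hankel matrix)
    # A rank-n truncated SVD yields balanced observability and controllability factors.
    U, s, Vt = np.linalg.svd(H_minus, full_matrices=False)
    sqrt_s = np.sqrt(s[:n])
    O = U[:, :n] * sqrt_s          # observability factor, shape (T1 * p, n)
    Q = sqrt_s[:, None] * Vt[:n]   # controllability factor, shape (n, T2 * m)
    C = O[:p]
    B = Q[:, :m]
    # H_plus = O @ A @ Q, so A follows from two pseudo-inverses.
    A = np.linalg.pinv(O) @ H_plus @ np.linalg.pinv(Q)
    return A, B, C

# Hypothetical second-order system used only to exercise the sketch.
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
B_true = np.array([[1.0], [1.0]])
C_true = np.array([[1.0, 0.0]])
markov = [C_true @ np.linalg.matrix_power(A_true, k) @ B_true for k in range(6)]
A_hat, B_hat, C_hat = ho_kalman(markov, n=2, T1=3, T2=3)
```

With exact Markov parameters the recovered triple reproduces the same input-output behavior, although `A_hat` generally differs from `A_true` by a change of basis. With estimated Markov parameters the truncated SVD also acts as a denoising step.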

1803.06443 2026-06-04 cs.LG cs.DC cs.SY eess.SY stat.ML

Communication Compression for Decentralized Training

Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu

Affiliations: Department of Computer Science, University of Rochester; Department of Computer Science, ETH Zurich; Tencent AI Lab

AI summary: This paper studies how to combine communication compression with decentralization to obtain a training system that is robust on high-latency, low-bandwidth networks, proposes two new compression strategies, and proves their convergence.

Abstract

Optimizing distributed learning systems is an art of balancing between computation and communication. There have been two lines of research that try to deal with slower networks: communication compression for low bandwidth networks, and decentralization for high latency networks. In this paper, we explore a natural question: can the combination of both techniques lead to a system that is robust to both bandwidth and latency? Although the system implication of such combination is trivial, the underlying theoretical principle and algorithm design is challenging: unlike centralized algorithms, simply compressing exchanged information, even in an unbiased stochastic way, within the decentralized network would accumulate the error and fail to converge. In this paper, we develop a framework of compressed, decentralized training and propose two different strategies, which we call extrapolation compression and difference compression. We analyze both algorithms and prove both converge at the rate of $O(1/\sqrt{nT})$ where $n$ is the number of workers and $T$ is the number of iterations, matching the convergence rate for full precision, centralized training. We validate our algorithms and find that our proposed algorithm outperforms the best of merely decentralized and merely quantized algorithm significantly for networks with both high latency and low bandwidth.
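To make the phrase "compressing in an unbiased stochastic way" concrete, here is a minimal sketch of stochastic rounding onto a uniform grid. It is a generic compressor chosen for illustration, not the paper's extrapolation or difference compression; the function name `stochastic_quantize` and the parameter `num_levels` are assumptions.

```python
import numpy as np

def stochastic_quantize(x, num_levels, rng):
    """Randomly round each entry of x onto a uniform grid so that E[output] = x."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    step = (hi - lo) / (num_levels - 1)
    # Position of each entry on the grid, clipped to guard against rounding error.
    scaled = np.clip((x - lo) / step, 0, num_levels - 1)
    floor = np.floor(scaled)
    prob_up = scaled - floor   # round up with probability equal to the fractional part
    rounded = floor + (rng.random(x.shape) < prob_up)
    return lo + rounded * step

# Averaging many independent quantizations approaches the original vector.
rng = np.random.default_rng(0)
x = np.array([-1.0, -0.2, 0.1, 0.65, 1.0])
samples = np.stack([stochastic_quantize(x, 4, rng) for _ in range(4000)])
mean_estimate = samples.mean(axis=0)
```

Each quantized vector needs only about log2(`num_levels`) bits per entry plus the two range endpoints. The abstract's point is that, in a decentralized network, even an unbiased compressor like this one accumulates error if applied naively to the exchanged models.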

1809.08911 2026-06-04 cs.LG cs.CY cs.SY eess.SP eess.SY stat.ML

Understanding Compressive Adversarial Privacy

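Xiao Chen, Peter Kairouz, Ram Rajagopal

Affiliations: Stanford University

AI summary: This paper presents a compressive adversarial privacy framework that uses convex optimization to balance data privacy against utility, and demonstrates through empirical applications that the framework protects sensitive information.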

DOI: 10.1109/CDC.2018.8619455
Journal ref: 2018 IEEE Conference on Decision and Control (CDC)

Abstract

Designing a data sharing mechanism without sacrificing too much privacy can be considered as a game between data holders and malicious attackers. This paper describes a compressive adversarial privacy framework that captures the trade-off between the data privacy and utility. We characterize the optimal data releasing mechanism through convex optimization when assuming that both the data holder and attacker can only modify the data using linear transformations. We then build a more realistic data releasing mechanism that can rely on a nonlinear compression model while the attacker uses a neural network. We demonstrate in a series of empirical applications that this framework, consisting of compressive adversarial privacy, can preserve sensitive information.

1610.05202 2026-06-04 cs.LG cs.AI cs.DC cs.SY eess.SY stat.ML

Decentralized Collaborative Learning of Personalized Models over Networks

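Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi

Affiliations: INRIA

AI summary: This paper studies how agents in a collaborative peer-to-peer network can improve their locally trained models by communicating with other agents that have similar objectives. It proposes two asynchronous gossip algorithms and implements a decentralized algorithm based on ADMM.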

Comments To appear in the Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)

1309.7666 2026-06-04 cs.RO cs.SY eess.SY

Dynamic Sliding Mode Control based on Fractional calculus subject to uncertain delay based chaotic pneumatic robot

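Sara Gholipour P., Heydar Toosian Shandiz, Mobin Alizadeh, Sara Minagar, Seyed Javad Kazemitabar

Affiliations: Farabina Noshirvani’s smart Co.; Robotic Research Lab.; Faculty of Electrical and Robotic Engineering, Shahrood University of Technology

AI summary: This paper addresses chattering in sliding mode control of robot manipulators in which delay induces chaos. It proposes a dynamic sliding mode control method based on fractional calculus that removes chattering through its dynamic nature, applies the control to chaotic systems in a master-slave configuration, guarantees closed-loop stability through Lyapunov stability theory, and studies the delayed robot motion both qualitatively and quantitatively.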

Comments 8 pages, 9 figures, will be submitted in journal

1805.11572 2026-06-04 cs.CV cs.LG cs.NA math.NA stat.ML

Adversarial Regularizers in Inverse Problems

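Sebastian Lunz, Ozan Öktem, Carola-Bibiane Schönlieb

Affiliations: DAMTP Department of Mathematics; University of Cambridge; KTH - Royal Institute of Technology

AI summary: This paper proposes a new framework for solving inverse problems that uses a neural network as the regularization functional. The method learns the difference between the distribution of ground-truth images and the distribution of unregularized reconstructions in order to improve reconstruction performance.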

Comments published at NeurIPS 2018

1604.00359 2026-06-04 cs.AI cs.NA cs.NE math.NA

Using Well-Understood Single-Objective Functions in Multiobjective Black-Box Optimization Test Suites

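Dimo Brockhoff, Tea Tusar, Anne Auger, Nikolaus Hansen

Affiliations: Inria, research centre Saclay and CMAP UMR 7641

AI summary: This paper proposes constructing multiobjective problems by combining existing single-objective problems, introduces the bbob-biobj test suite and its extended version, and provides a general procedure for creating test suites with an arbitrary number of objectives so that the performance of deterministic and stochastic optimization algorithms can be compared.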

Comments ArXiv e-prints, arXiv:1604.00359

Abstract

Several test function suites are being used for numerical benchmarking of multiobjective optimization algorithms. While they have some desirable properties, like well-understood Pareto sets and Pareto fronts of various shapes, most of the currently used functions possess characteristics that are arguably under-represented in real-world problems. They mainly stem from the easier construction of such functions and result in improbable properties such as separability, optima located exactly at the boundary constraints, and the existence of variables that solely control the distance between a solution and the Pareto front. Here, we propose an alternative way of constructing multiobjective problems: combining existing single-objective problems from the literature. We describe in particular the bbob-biobj test suite with 55 bi-objective functions in continuous domain, and its extended version with 92 bi-objective functions (bbob-biobj-ext). Both test suites have been implemented in the COCO platform for black-box optimization benchmarking. Finally, we recommend a general procedure for creating test suites for an arbitrary number of objectives. Besides providing the formal function definitions and presenting their (known) properties, this paper also aims at giving the rationale behind our approach in terms of groups of functions with similar properties, objective space normalization, and problem instances. The latter allows us to easily compare the performance of deterministic and stochastic solvers, which is an often overlooked issue in benchmarking.

1812.11707 2026-06-04 cs.RO cs.SY eess.SY

UAV Control in Close Proximities - Ceiling Effect on Battery Lifetime

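Basaran Bahadir Kocer, Volkan Kumtepeli, Tegoeh Tjahjowidodo, Mahardhika Pratama, Anshuman Tripathi, Gerald Seet Gim Lee, Youyi Wang

Affiliations: 1 School of Mechanical

AI summary: This paper studies how exploiting the ceiling effect to reduce battery consumption during close-proximity UAV flight affects battery lifetime. Measured data show that the ceiling effect lowers the controller effort, and equivalent full cycle counting is applied for the first time to analyze its impact on battery degradation. The results indicate that battery degradation can be reduced by 15.77%.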

Comments ICoIAS 2019

1812.11315 2026-06-04 cs.RO cs.SY eess.SY

On Infusing Reachability-Based Safety Assurance within Probabilistic Planning Frameworks for Human-Robot Vehicle Interactions

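Karen Leung, Edward Schmerling, Mo Chen, John Talbot, J. Christian Gerdes, Marco Pavone

Affiliations: Department of Aeronautics and Astronautics; Department of Mechanical Engineering; School of Computing Science

AI summary: This paper presents a minimally interventional safety controller that ensures collision-free interaction between an autonomous vehicle and an externally controlled vehicle. A real-time controller provides both trajectory tracking and safety assurance, and traffic-weaving experiments show that the autonomous vehicle avoids collision even when the other vehicle defies the planner's expectations.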

Comments Presented at the International Symposium on Experimental Robotics, Buenos Aires, Argentina, 2018

Abstract

Action anticipation, intent prediction, and proactive behavior are all desirable characteristics for autonomous driving policies in interactive scenarios. Paramount, however, is ensuring safety on the road --- a key challenge in doing so is accounting for uncertainty in human driver actions without unduly impacting planner performance. This paper introduces a minimally-interventional safety controller operating within an autonomous vehicle control stack with the role of ensuring collision-free interaction with an externally controlled (e.g., human-driven) counterpart. We leverage reachability analysis to construct a real-time (100Hz) controller that serves the dual role of (1) tracking an input trajectory from a higher-level planning algorithm using model predictive control, and (2) assuring safety through maintaining the availability of a collision-free escape maneuver as a persistent constraint regardless of whatever future actions the other car takes. A full-scale steer-by-wire platform is used to conduct traffic weaving experiments wherein the two cars, initially side-by-side, must swap lanes in a limited amount of time and distance, emulating cars merging onto/off of a highway. We demonstrate that, with our control stack, the autonomous vehicle is able to avoid collision even when the other car defies the planner's expectations and takes dangerous actions, either carelessly or with the intent to collide, and otherwise deviates minimally from the planned trajectory to the extent required to maintain safety.

1805.11706 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML

Supervised Policy Update for Deep Reinforcement Learning

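Quan Vuong, Yiming Zhang, Keith W. Ross

Affiliations: University of California, San Diego; New York University

AI summary: This paper proposes a new sample-efficient methodology for deep reinforcement learning called Supervised Policy Update (SPU). Starting from data generated by the current policy, it formulates and solves a constrained optimization problem in the non-parameterized proximal policy space, then uses supervised regression to convert the optimal non-parameterized policy into a parameterized policy from which new samples are drawn. The method applies to both discrete and continuous action spaces and handles a variety of proximity constraints. The paper shows how it addresses the Natural Policy Gradient and Trust Region Policy Optimization (NPG/TRPO) problems as well as the Proximal Policy Optimization (PPO) problem. SPU is simpler to implement than TRPO, and in terms of sample efficiency the experiments show that it outperforms TRPO on Mujoco simulated robotic tasks and PPO on Atari video game tasks.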

Comments Accepted as a conference paper at ICLR 2019

Abstract

We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. Starting with data generated by the current policy, SPU formulates and solves a constrained optimization problem in the non-parameterized proximal policy space. Using supervised regression, it then converts the optimal non-parameterized policy to a parameterized policy, from which it draws new samples. The methodology is general in that it applies to both discrete and continuous action spaces, and can handle a wide variety of proximity constraints for the non-parameterized optimization problem. We show how the Natural Policy Gradient and Trust Region Policy Optimization (NPG/TRPO) problems, and the Proximal Policy Optimization (PPO) problem can be addressed by this methodology. The SPU implementation is much simpler than TRPO. In terms of sample efficiency, our extensive experiments show SPU outperforms TRPO in Mujoco simulated robotic tasks and outperforms PPO in Atari video game tasks.

1812.08280 2026-06-04 cs.RO cs.SY eess.SY

Extrinisic Calibration of a Camera-Arm System Through Rotation Identification

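Steve McGuire, Christoffer Heckman, Daniel Szafir, Simon Julier, Nisar Ahmed

Affiliations: Department of Aerospace Engineering Sciences, University of Colorado at Boulder

AI summary: This paper proposes recovering extrinsic calibration parameters from the structured motion of an observed arm. It combines the known kinematics of the arm with observations of conic sections in the image plane and computes the extrinsic calibration by maximum likelihood estimation. The method is validated in simulation and tested on a real-world model, with results that agree with ruler-based estimates.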

1812.07810 2026-06-04 cs.LG cs.CR cs.NA math.NA stat.ML

Fast Botnet Detection From Streaming Logs Using Online Lanczos Method

Zheng Chen, Xinli Yu, Chi Zhang, Jin Zhang, Cui Lin, Bo Song, Jianliang Gao, Xiaohua Hu, Wei-Shih Yang, Erjia Yan

Affiliations: CA Technologies, Inc.; College of Computing & Informatics, Drexel University; Department of Mathematics, Temple University; Department of Computer Science, Maryland University at Baltimore County

AI summary: This paper proposes a botnet detection method based on an online Lanczos method. By reducing the PCA-based approach to sub-cubic complexity, it improves the accuracy and sensitivity of real-time detection, and it also contributes a generalized online correlation matrix update formula and a new termination condition.

Abstract

Botnet, a group of coordinated bots, is becoming the main platform of malicious Internet activities like DDOS, click fraud, web scraping, spam/rumor distribution, etc. This paper focuses on design and experiment of a new approach for botnet detection from streaming web server logs, motivated by its wide applicability, real-time protection capability, ease of use and better security of sensitive data. Our algorithm is inspired by a Principal Component Analysis (PCA) to capture correlation in data, and we are first to recognize and adapt Lanczos method to improve the time complexity of PCA-based botnet detection from cubic to sub-cubic, which enables us to more accurately and sensitively detect botnets with sliding time windows rather than fixed time windows. We contribute a generalized online correlation matrix update formula, and a new termination condition for Lanczos iteration for our purpose based on error bound and non-decreasing eigenvalues of symmetric matrices. On our dataset of an ecommerce website logs, experiments show the time cost of Lanczos method with different time windows are consistently only 20% to 25% of PCA.
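The abstract's central idea, replacing a full eigendecomposition with a few Lanczos steps, can be sketched as follows. This is the plain textbook Lanczos iteration with full reorthogonalization applied to a synthetic correlation matrix. It does not include the paper's online update formula or its termination condition, and the function name `lanczos_ritz_values` and the synthetic data are assumptions.

```python
import numpy as np

def lanczos_ritz_values(matvec, n, k, rng, tol=1e-10):
    """Run up to k Lanczos steps on a symmetric operator; return Ritz values in ascending order."""
    Q = np.zeros((n, k))
    alpha = np.zeros(k)   # diagonal of the tridiagonal matrix
    beta = np.zeros(k)    # off-diagonal of the tridiagonal matrix
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    steps = 0
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Full reorthogonalization against all Lanczos vectors built so far.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        steps = j + 1
        beta[j] = np.linalg.norm(w)
        if beta[j] < tol:   # an invariant subspace was found, stop early
            break
        q = w / beta[j]
    off = beta[: steps - 1]
    T = np.diag(alpha[:steps]) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(T)

# Synthetic data: 10 of 30 features share a common factor, loosely mimicking coordinated activity.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 30))
X[:, :10] += 3.0 * rng.standard_normal((500, 1))
R = np.corrcoef(X, rowvar=False)
ritz = lanczos_ritz_values(lambda v: R @ v, n=30, k=12, rng=rng)
```

Only matrix-vector products with `R` are needed, and the largest Ritz value converges to the dominant eigenvalue after a few steps when that eigenvalue is well separated. That is the property a PCA-style detector relies on when the window slides and the matrix changes.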

1812.07410 2026-06-04 cs.LG cs.SY eess.SY stat.ML

An Improved Deep Belief Network Model for Road Safety Analyses

Guangyuan Pan, Liping Fu, Lalita Thakali, Matthew Muresan, Ming Yu

Affiliations: Intelligent Transportation Systems Research Center; Wuhan University of Technology; University of Waterloo; Department of Civil & Environmental Engineering; Department of Electrical & Computer Engineering

AI summary: This paper proposes an improved deep belief network model to strengthen crash prediction in road safety analyses, demonstrates its predictive performance in two case studies, and compares it with traditional models.

Journal ref: Transportation Research Board 97th Annual Meeting, 2018

Abstract

Crash prediction is a critical component of road safety analyses. A widely adopted approach to crash prediction is application of regression based techniques. The underlying calibration process is often time-consuming, requiring significant domain knowledge and expertise and cannot be easily automated. This paper introduces a new machine learning (ML) based approach as an alternative to the traditional techniques. The proposed ML model is called regularized deep belief network, which is a deep neural network with two training steps: it is first trained using an unsupervised learning algorithm and then fine-tuned by initializing a Bayesian neural network with the trained weights from the first step. The resulting model is expected to have improved prediction power and reduced need for the time-consuming human intervention. In this paper, we attempt to demonstrate the potential of this new model for crash prediction through two case studies including a collision data set from 800 km stretch of Highway 401 and other highways in Ontario, Canada. Our intention is to show the performance of this ML approach in comparison to various traditional models including negative binomial (NB) model, kernel regression (KR), and Bayesian neural network (Bayesian NN). We also attempt to address other related issues such as effect of training data size and training parameters.

1509.09236 2026-06-04 cs.LG cs.CC cs.NA math.NA math.OC

On the Complexity of Robust PCA and $\ell_1$-norm Low-Rank Matrix Approximation

Nicolas Gillis, Stephen A. Vavasis

Affiliations: Department of Mathematics and Operational Research, University of Mons; Department of Combinatorics and Optimization, University of Waterloo

AI summary: This paper proves that low-rank matrix approximation in the ℓ1-norm (ℓ1-LRA) is NP-hard already in the rank-one case, and connects it to several other known NP-hard problems, including robust PCA, ℓ0-LRA, and binary matrix factorization.

Comments 16 pages, some typos corrected

DOI: 10.1287/moor.2017.0895
Journal ref: Mathematics of Operations Research 43 (4), pp. 1072-1084, 2018

Abstract

The low-rank matrix approximation problem with respect to the component-wise $\ell_1$-norm ($\ell_1$-LRA), which is closely related to robust principal component analysis (PCA), has become a very popular tool in data mining and machine learning. Robust PCA aims at recovering a low-rank matrix that was perturbed with sparse noise, with applications for example in foreground-background video separation. Although $\ell_1$-LRA is strongly believed to be NP-hard, there is, to the best of our knowledge, no formal proof of this fact. In this paper, we prove that $\ell_1$-LRA is NP-hard, already in the rank-one case, using a reduction from MAX CUT. Our derivations draw interesting connections between $\ell_1$-LRA and several other well-known problems, namely, robust PCA, $\ell_0$-LRA, binary matrix factorization, a particular densest bipartite subgraph problem, the computation of the cut norm of $\{-1,+1\}$ matrices, and the discrete basis problem, which we all prove to be NP-hard.
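For readers unfamiliar with the problem, the component-wise $\ell_1$ low-rank approximation named in the abstract can be written out as below. The symbols $M$, $U$, $V$, $m$, $n$, and $r$ are notation assumed here for illustration; the abstract itself does not fix them.

```latex
% l1-LRA: best rank-r approximation of M in the component-wise l1-norm
\min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{n \times r}} \; \| M - U V^{\top} \|_1 ,
\qquad \| X \|_1 = \sum_{i=1}^{m} \sum_{j=1}^{n} | X_{ij} | .

% Rank-one case (r = 1), the case the abstract states is already NP-hard:
\min_{u \in \mathbb{R}^{m},\; v \in \mathbb{R}^{n}} \; \sum_{i=1}^{m} \sum_{j=1}^{n} | M_{ij} - u_i v_j | .
```

By contrast, the same problem in the Frobenius norm is solved exactly by the truncated singular value decomposition, which is why the hardness of the $\ell_1$ variant is notable.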

1810.08124 2026-06-04 cs.AI cs.SY eess.SY

Approximate Dynamic Programming for Planning a Ride-Sharing System using Autonomous Fleets of Electric Vehicles

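Lina Al-Kanj, Juliana Nascimento, Warren B. Powell

Affiliations: Operations Research and Financial Engineering Department

AI summary: This paper studies the dispatch problem, the surge pricing problem, and the fleet sizing problem in a ride-sharing system of autonomous electric vehicles. It uses approximate dynamic programming to optimize vehicle assignment, recharging, and repositioning decisions, applies hierarchical aggregation to improve value function estimates, and uses an adaptive learning approach to set the price of each trip.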

Abstract

Within a decade, almost every major auto company, along with fleet operators such as Uber, have announced plans to put autonomous vehicles on the road. At the same time, electric vehicles are quickly emerging as a next-generation technology that is cost effective, in addition to offering the benefits of reducing the carbon footprint. The combination of a centrally managed fleet of driverless vehicles, along with the operating characteristics of electric vehicles, is creating a transformative new technology that offers significant cost savings with high service levels. This problem involves a dispatch problem for assigning riders to cars, a surge pricing problem for deciding on the price per trip and a planning problem for deciding on the fleet size. We use approximate dynamic programming to develop high-quality operational dispatch strategies to determine which car is best for a particular trip, when a car should be recharged, and when it should be re-positioned to a different zone which offers a higher density of trips. We prove that the value functions are monotone in the battery and time dimensions and use hierarchical aggregation to get better estimates of the value functions with a small number of observations. Then, surge pricing is discussed using an adaptive learning approach to decide on the price for each trip. Finally, we discuss the fleet size problem which depends on the previous two problems.

1808.04580 2026-06-04 cs.LG cs.NA math.NA stat.ML

NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks

Dominik Alfke, Daniel Potts, Martin Stoll, Toni Volkmer

Affiliations: Technische Universität Chemnitz, Faculty of Mathematics, Chair of Scientific Computing; Technische Universität Chemnitz, Faculty of Mathematics, Chair of Applied Functional Analysis; Technische Universität Chemnitz, Faculty of Mathematics, Chair of Applied Analysis

AI summary: This paper proposes using the NFFT for fast matrix-vector products so that the graph Laplacian of fully connected networks can be handled efficiently, demonstrates applications to image segmentation and semi-supervised learning, and compares the approach with the Nyström method.

Comments 28 pages, 9 figures

DOI: 10.3389/fams.2018.00061

Abstract

The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is in certain applications densely populated. This makes computations, in particular matrix-vector products, with the graph Laplacian a hard task. A typical application is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible as the number of nodes in the graph is too large. We propose the use of the fast summation based on the nonequispaced fast Fourier transform (NFFT) to perform the dense matrix-vector product with the graph Laplacian fast without ever forming the whole matrix. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated multiplication into Lanczos-based eigenvalues routines or iterative linear system solvers and even consider other than the standard Gaussian kernels. We illustrate the feasibility of our approach on a number of test problems from image segmentation to semi-supervised learning based on graph-based PDEs. In particular, we compare our approach with the Nyström method. Moreover, we present and test an enhanced, hybrid version of the Nyström method, which internally uses the NFFT.
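The phrase "without ever forming the whole matrix" can be illustrated with a matrix-free product. The sketch below evaluates the Gaussian-kernel sums directly in row blocks, so it still costs O(n²) operations but only O(n · block) memory; the NFFT-based fast summation described in the abstract would replace this direct evaluation. The function name `laplacian_matvec`, the kernel scaling `exp(-d² / sigma²)`, and the zero diagonal are assumptions.

```python
import numpy as np

def laplacian_matvec(points, v, sigma, block=256):
    """Compute (D - W) @ v for W_ij = exp(-||x_i - x_j||^2 / sigma^2), W_ii = 0, in row blocks."""
    n = points.shape[0]
    out = np.empty(n)
    sq_norms = np.sum(points**2, axis=1)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # Squared distances between this block of rows and all points.
        d2 = sq_norms[start:stop, None] + sq_norms[None, :] - 2.0 * points[start:stop] @ points.T
        w = np.exp(-np.maximum(d2, 0.0) / sigma**2)
        w[np.arange(stop - start), np.arange(start, stop)] = 0.0   # no self-loops
        # Row of the Laplacian applied to v: degree times v_i minus the weighted sum.
        out[start:stop] = w.sum(axis=1) * v[start:stop] - w @ v
    return out

# Small random point cloud used only to exercise the sketch.
rng = np.random.default_rng(1)
pts = rng.standard_normal((300, 2))
v = rng.standard_normal(300)
y = laplacian_matvec(pts, v, sigma=1.0, block=64)
```

A function like this can be handed to any Krylov method that needs only matrix-vector products, such as a Lanczos eigenvalue routine or conjugate gradients, which is how the abstract describes embedding the accelerated multiplication.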

1511.06444 2026-06-04 cs.LG cs.NA math.NA math.PR

Universal halting times in optimization and machine learning

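Levent Sagun, Thomas Trogdon, Yann LeCun

Affiliations: Mathematics Department; Department of Mathematics; Computer Science Department; New York University; University of California, Irvine

AI summary: By analyzing the halting time distribution of optimization algorithms on random systems such as spin glasses and deep learning, this study finds that under certain conditions the distribution does not depend on the underlying distribution, and identifies two classes of distributions: Gumbel-like and Gaussian-like.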

DOI: 10.1090/qam/1483
Journal ref: Quart. Appl. Math. 76 (2018), 289-301

英文摘要

The authors present empirical distributions for the halting time (measured by the number of iterations to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after centering and scaling, remains unchanged even when the distribution on the landscape is changed. We observe two qualitative classes: a Gumbel-like distribution that appears in Google searches, human decision times, the QR eigenvalue algorithm, and spin glasses, and a Gaussian-like distribution that appears in the conjugate gradient method, deep networks with MNIST input data, and deep networks with random input data. This empirical evidence suggests the presence of a class of distributions for which the halting time is independent of the underlying distribution under some conditions.
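
The "centering and scaling" step in this abstract can be illustrated with a toy experiment. The sketch below is an assumption-laden stand-in and not the authors' setup: it runs gradient descent on a one-dimensional random quadratic, records the iteration count needed to reach a tolerance, and standardizes the counts so that samples drawn from two different input distributions can be compared on a common scale. All function names, the step size, and the two input distributions are hypothetical choices.

```python
import random
import statistics


def halting_time(a, b, step=0.5, tol=1e-6, max_iter=10_000):
    """Count gradient-descent iterations on f(x) = 0.5*a*x**2 - b*x
    until the gradient magnitude falls below tol."""
    x = 0.0
    for k in range(max_iter):
        grad = a * x - b
        if abs(grad) < tol:
            return k
        x -= step * grad
    return max_iter


def center_and_scale(times):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mu = statistics.mean(times)
    sd = statistics.stdev(times)
    return [(t - mu) / sd for t in times]


def sample_halting_times(draw_a, n=500, seed=0):
    """Draw n random curvatures with draw_a and record each halting time."""
    rng = random.Random(seed)
    return [halting_time(draw_a(rng), 1.0) for _ in range(n)]


# Two different distributions on the random "landscape" parameter a.
uniform_times = sample_halting_times(lambda rng: rng.uniform(0.5, 1.5))
beta_times = sample_halting_times(lambda rng: 0.5 + rng.betavariate(2, 2))
z_uniform = center_and_scale(uniform_times)
z_beta = center_and_scale(beta_times)
```

Comparing histograms of `z_uniform` and `z_beta` is the kind of check the abstract describes; this toy problem only demonstrates the procedure and makes no claim about whether the two normalized distributions coincide.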

1812.04303 2026-06-04 cs.CV cs.GR cs.NA math.NA

Analytic heuristics for a fast DSC-MRI

Marco Virgulin, Marco Castellaro, Enrico Grisan, Fabio Marcuzzi

Affiliations: Department of Mathematics, Padua University; Department of Information Engineering, Padua University

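AI summary: This paper proposes a deterministic approach to reconstructing dynamic susceptibility contrast MRI (DSC-MRI) data and compares it with the compressed sensing solutions available in the literature. A mathematical analysis shows that the problem is computationally intractable because of its non-polynomial complexity, yet it suggests simple heuristics that perform well. Results are given on real images and on synthetic phantoms with added noise.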

1812.03216 2026-06-04 cs.LG cs.RO cs.SY eess.SY

Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control

Zhuo Xu, Chen Tang, Masayoshi Tomizuka

Affiliations: University of California, Berkeley

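AI summary: This paper proposes a robust-control-based method for zero-shot transfer of deep reinforcement learning driving policies. It addresses the modeling gap between the source and target domains in autonomous driving by transferring concrete kinematic quantities, combines a transferable hierarchical RL trajectory planner with a disturbance-observer-based robust tracking controller, and validates zero-shot transfer in simulation across multiple driving scenarios.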

Comments: Published at IEEE ITSC 2018

Abstract

Although deep reinforcement learning (deep RL) methods have many strengths that are favorable when applied to autonomous driving, real deep RL applications in autonomous driving have been slowed down by the modeling gap between the source (training) domain and the target (deployment) domain. Unlike current policy transfer approaches, which are generally limited to using uninterpretable neural network representations as the transferred features, we propose to transfer concrete kinematic quantities in autonomous driving. The proposed robust-control-based (RC) generic transfer architecture, which we call RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on a disturbance observer (DOB). The deep RL policies trained with a known nominal dynamics model are transferred directly to the target domain, and DOB-based robust tracking control is applied to tackle the modeling gap, including vehicle dynamics errors and external disturbances such as side forces. We provide simulations validating the capability of the proposed method to achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing, and obstacle avoidance.
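
The role of a disturbance observer in closing a modeling gap can be shown on a deliberately simple plant. The sketch below is a hypothetical one-dimensional example and not the RL-RC controller from the paper: a proportional controller is designed on the nominal model v' = u, the true plant is v' = u + d with an unmodeled constant disturbance d, and a first-order filtered residual serves as the disturbance estimate. The function name and all parameter values are assumptions.

```python
def simulate(use_dob, v_ref=1.0, disturbance=1.0, gain=2.0,
             dt=0.01, steps=2000, alpha=0.1):
    """Track v_ref on the plant v' = u + d; the nominal model assumes d = 0.

    Returns the final state and the final disturbance estimate.
    """
    v, d_hat = 0.0, 0.0
    for _ in range(steps):
        # Controller designed on the nominal (disturbance-free) model.
        u_nominal = gain * (v_ref - v)
        # With the observer enabled, cancel the estimated disturbance.
        u = u_nominal - d_hat if use_dob else u_nominal
        # True plant, including the unmodeled disturbance.
        v_next = v + dt * (u + disturbance)
        if use_dob:
            # Residual between the measured acceleration and the nominal
            # prediction, passed through a first-order low-pass filter.
            residual = (v_next - v) / dt - u
            d_hat += alpha * (residual - d_hat)
        v = v_next
    return v, d_hat
```

Without the observer the state settles at `v_ref + disturbance / gain`, a steady-state offset caused by the modeling gap; with the observer the estimate converges to the disturbance and the offset vanishes, while the nominal controller itself is left unchanged.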

1812.03279 2026-06-04 cs.SD cs.NA eess.AS math.NA

Estimates of the Reconstruction Error in Partially Redressed Warped Frames Expansions

Thomas Mejstrik, Gianpaolo Evangelista

Affiliations: University of Vienna; MDW, University of Music and Performing Arts Vienna

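AI summary: This paper studies the approximation error of compactly supported frequency-warped analysis-synthesis elements and, through several examples and case studies, examines how approximations with finite time support keep the reconstruction error small when online computation is required.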

Comments: 8 pages, 5 figures, 4 tables, conference paper

Journal ref: Proc. of Digital Audio Effect Conf. (DAFx'16). Brno, Czech Republic, September 2016, pp. 9-16

Abstract

In recent work, redressed warped frames have been introduced for the analysis and synthesis of audio signals with non-uniform frequency and time resolutions. In these frames, the allocation of frequency bands or time intervals of the elements of the representation can be uniquely described by means of a warping map. Inverse warping applied after time-frequency sampling provides the key to reduce or eliminate dispersion of the warped frame elements in the conjugate variable, making it possible, e.g., to construct frequency warped frames with synchronous time alignment through frequency. The redressing procedure is however exact only when the analysis and synthesis windows have compact support in the domain where warping is applied. This implies that frequency warped frames cannot have compact support in the time domain. This property is undesirable when online computation is required. Approximations in which the time support is finite are however possible, which lead to small reconstruction errors. In this paper we study the approximation error for compactly supported frequency warped analysis-synthesis elements, providing a few examples and case studies.

1805.07857 2026-06-04 cs.LG cs.CV cs.NA math.NA stat.ML

Parallel Transport Convolution: A New Tool for Convolutional Neural Networks on Manifolds

Stefan C. Schonsheck, Bin Dong, Rongjie Lai

Affiliations: Rensselaer Polytechnic Institute

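AI summary: This paper proposes parallel transport convolution (PTC), a new generalization of the convolution operation to manifolds and their discrete counterparts. PTC preserves compactly supported filters, directionality, and transferability across manifolds, which makes it possible to build wavelet-like operations and deep convolutional neural networks on curved domains.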

Comments: 10 pages

Abstract

Convolution has played a prominent role in various applications in science and engineering for many years. It is the most important operation in convolutional neural networks. There has been growing research interest recently in generalizing convolutions to curved domains such as manifolds and graphs. However, existing approaches cannot preserve all the desirable properties of Euclidean convolutions, namely compactly supported filters, directionality, and transferability across different manifolds. In this paper we develop a new generalization of the convolution operation, referred to as parallel transport convolution (PTC), on Riemannian manifolds and their discrete counterparts. PTC is designed based on parallel transport, which is able to translate information along a manifold and to intrinsically preserve directionality. PTC allows for the construction of compactly supported filters and is also robust to manifold deformations. This enables us to perform wavelet-like operations and to define deep convolutional neural networks on curved domains.

1804.01526 2026-06-04 cs.LG cs.NA math.NA stat.ML

Training DNNs with Hybrid Block Floating Point

Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi

Affiliations: EPFL

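AI summary: This paper proposes hybrid block floating point (HBFP), which performs all dot products in block floating point and all other operations in floating point, matching floating-point accuracy while improving hardware density and throughput.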

Comments: 9 pages, 3 figures. Accepted in Neural Information Processing Systems 2018 (NeurIPS 2018)

Abstract

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full precision floating-point arithmetic to maximize performance per area. Ongoing research efforts seek to further increase that performance density by replacing floating-point with fixed-point arithmetic. However, a significant roadblock for these attempts has been fixed point's narrow dynamic range, which is insufficient for DNN training convergence. We identify block floating point (BFP) as a promising alternative representation since it exhibits wide dynamic range and enables the majority of DNN operations to be performed with fixed-point logic. Unfortunately, BFP alone introduces several limitations that preclude its direct applicability. In this work, we introduce HBFP, a hybrid BFP-FP approach, which performs all dot products in BFP and other operations in floating point. HBFP delivers the best of both worlds: the high accuracy of floating point at the superior hardware density of fixed point. For a wide variety of models, we show that HBFP matches floating point's accuracy while enabling hardware implementations that deliver up to 8.5x higher throughput.
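
To make the block floating point idea concrete, here is a small sketch of a dot product in which each vector is stored as integer mantissas sharing one exponent, so the multiply-accumulate loop uses integer arithmetic only. It is an illustration under stated assumptions, not the HBFP design from the paper: the function names, the choice of one block per vector, the 8-bit default mantissa width, and the rounding and clamping rules are all hypothetical.

```python
import math


def to_bfp(block, mantissa_bits=8):
    """Quantize floats to signed integer mantissas with one shared exponent.

    Returns (mantissas, exponent) such that value ~= mantissa * 2**exponent.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    exponent = e - (mantissa_bits - 1)
    scale = 2.0 ** exponent
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest mantissa and clamp to the signed range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in block]
    return mantissas, exponent


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product with integer multiply-accumulate and one final rescale."""
    ma, ea = to_bfp(a, mantissa_bits)
    mb, eb = to_bfp(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))  # integer-only inner loop
    return acc * 2.0 ** (ea + eb)
```

The shared exponent is what gives the format its wide dynamic range across blocks, while the inner loop stays in fixed-point logic; precision within a block is limited by the largest element, which is one reason the abstract keeps the remaining operations in floating point.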

1803.00444 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

Adrian Šošić, Elmar Rueckert, Jan Peters, Abdelhak M. Zoubir, Heinz Koeppl

Affiliations: Signal Processing Group; Institute for Robotics and Cognitive Systems; Autonomous Systems Labs; Bioinspired Communication Systems

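AI summary: This paper proposes an inverse reinforcement learning method based on nonparametric spatio-temporal subgoal modeling. It explains even a single trajectory more efficiently through local context, yields a more compact representation of behavior, and builds an implicit intentional model to predict behavior in unobserved situations. It handles intentions that change over time and can be applied in active learning scenarios.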

Comments: 45 pages, 14 figures; Version 3, published in the Journal of Machine Learning Research

Abstract

Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention. Instead of learning a global behavioral model, recent IRL methods divide the demonstration data into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework that significantly outperforms existing IRL solutions and provides smooth policy estimates consistent with the expert's plan. Most notably, our framework naturally handles situations where the intentions of the agent change over time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in active learning scenarios to guide the demonstration process of the expert.

1811.12084 2026-06-04 cs.CV cs.LG cs.NA math.AP math.NA

Networks for Nonlinear Diffusion Problems in Imaging

Simon Arridge, Andreas Hauptmann

Affiliations: Department of Computer Science, University College London

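AI summary: This paper proposes DiffNet, a network architecture based on nonlinear diffusion processes for diffusion-related problems in imaging. It offers better interpretability and generalisability than classical convolutional neural networks and achieves results competitive with U-Net on the inverse problem of nonlinear diffusion.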

Abstract

A multitude of imaging and vision tasks have recently seen a major transformation by deep learning methods and in particular by the application of convolutional neural networks. These methods achieve impressive results, even for applications where it is not apparent that convolutions are suited to capture the underlying physics. In this work we develop a network architecture based on nonlinear diffusion processes, named DiffNet. By design, we obtain a nonlinear network architecture that is well suited for diffusion-related problems in imaging. Furthermore, the performed updates are explicit, by which we obtain better interpretability and generalisability compared to classical convolutional neural network architectures. The performance of DiffNet is tested on the inverse problem of nonlinear diffusion with the Perona-Malik filter on the STL-10 image dataset. We obtain competitive results to the established U-Net architecture, with a fraction of parameters and necessary training data.
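
The explicit update structure mentioned in this abstract can be illustrated with the classical Perona-Malik scheme itself. The sketch below is not DiffNet, which learns its update from data; it is a hand-written explicit diffusion step on a one-dimensional signal with zero-flux boundaries. The diffusivity g(s) = 1 / (1 + (s/κ)²), the function names, and the values of `dt` and `kappa` are assumptions chosen for the example.

```python
def perona_malik_step(u, dt=0.25, kappa=0.1):
    """One explicit Perona-Malik update on a 1D signal, zero-flux boundaries."""
    def g(s):
        # Edge-stopping diffusivity: close to 1 for small gradients,
        # close to 0 across large jumps.
        return 1.0 / (1.0 + (s / kappa) ** 2)

    diffs = [u[i + 1] - u[i] for i in range(len(u) - 1)]
    flux = [g(abs(d)) * d for d in diffs]
    # Pad with zero flux at both ends so that total mass is conserved.
    padded = [0.0] + flux + [0.0]
    return [u[i] + dt * (padded[i + 1] - padded[i]) for i in range(len(u))]


def diffuse(u, steps, dt=0.25, kappa=0.1):
    """Apply several explicit steps in sequence."""
    for _ in range(steps):
        u = perona_malik_step(u, dt, kappa)
    return u
```

Each step is a local, explicit stencil operation, which is what makes an unrolled sequence of such steps resemble the layers of a network; with `dt` at most 0.5 in one dimension every update is a convex combination of neighbouring values, so the signal stays within its original range.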

1811.11259 2026-06-04 cs.LG cs.AI cs.DS cs.SY eess.SY stat.ML

Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning

Francesco Fraternali, Bharathan Balaji, Rajesh Gupta

Affiliations: University of California, San Diego; University of California, Los Angeles

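AI summary: This paper uses reinforcement learning to automatically configure the sampling rate of indoor solar-panel-based energy harvesting sensors. By shortening the training phase and reducing computational requirements, it enables fast deployment and scaling to large deployments, maximizing sensed data without depleting energy storage.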

Comments: 7 pages, 5 figures

DOI: 10.1145/3279755.3279760
Journal ref: ENSsys '18: International Workshop on Energy Harvesting & Energy-Neutral Sensing Systems, November 4, 2018, Shenzhen, China

Abstract

With the advent of the Internet of Things (IoT), an increasing number of energy harvesting methods are being used to supplement or supplant battery based sensors. Energy harvesting sensors need to be configured according to the application, hardware, and environmental conditions to maximize their usefulness. As of today, the configuration of sensors is either manual or heuristics based, requiring valuable domain expertise. Reinforcement learning (RL) is a promising approach to automate configuration and efficiently scale IoT deployments, but it is not yet adopted in practice. We propose solutions to bridge this gap: reduce the training phase of RL so that nodes are operational within a short time after deployment and reduce the computational requirements to scale to large deployments. We focus on configuration of the sampling rate of indoor solar panel based energy harvesting sensors. We created a simulator based on 3 months of data collected from 5 sensor nodes subject to different lighting conditions. Our simulation results show that RL can effectively learn energy availability patterns and configure the sampling rate of the sensor nodes to maximize the sensing data while ensuring that energy storage is not depleted. The nodes can be operational within the first day by using our methods. We show that it is possible to reduce the number of RL policies by using a single policy for nodes that share similar lighting conditions.

1809.00367 2026-06-04 cs.RO cs.SY eess.SY

Momentum Model-based Minimal Parameter Identification of a Space Robot

B. Naveen, Suril V. Shah, Arun K. Misra

Affiliations: Indian Institute of Technology, Jodhpur, Rajasthan, India; McGill University, Montreal, Quebec H3A 0C3, Canada

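AI summary: This paper proposes a momentum model-based method for identifying the minimal parameters of a space robot while on orbit. These parameters uniquely define the momentum and dynamic models and are therefore sufficient for motion planning and control of the satellite and the robotic arms mounted on it.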

Comments: Accepted for publication in AIAA Journal of Guidance, Control, and Dynamics

DOI: 10.2514/1.G003541

Abstract

Accurate information about inertial parameters is critical to motion planning and control of space robots. Before the launch, only a rudimentary estimate of the inertial parameters is available from experiments and computer-aided design (CAD) models. After the launch, on-orbit operations substantially alter the value of inertial parameters. In this work, we propose a new momentum model-based method for identifying the minimal parameters of a space robot while on orbit. Minimal parameters are combinations of the inertial parameters of the links and uniquely define the momentum and dynamic models. Consequently, they are sufficient for motion planning and control of both the satellite and robotic arms mounted on it. The key to the proposed framework is the unique formulation of the momentum model in the linear form of minimal parameters. Further, to estimate the minimal parameters, we propose a novel joint trajectory planning and optimization technique based on direction combinations of joints' velocity. The efficacy of the identification framework is demonstrated on a 12 degrees-of-freedom, spatial, dual-arm space robot. The methodology is developed for tree-type space robots, requires just the pose and twist data, and is scalable with increasing number of joints.

1811.05788 2026-06-04 cs.LG cs.AI cs.SY eess.SY

Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy

Robin Spiess, Felix Berkenkamp, Jan Poland, Andreas Krause

Affiliations: Department of Computer Science, ETH Zurich; ABB Corporate Research, Switzerland

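AI summary: This paper proposes a deep learning approach that uses images of the sky to compensate photovoltaic power fluctuations predictively and reduce battery stress. A neural network is trained by imitation learning to approximate a hindsight-optimal policy.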

Comments: 7 pages, 7 figures

Abstract

The energy output of photovoltaic (PV) power plants depends on the environment and thus fluctuates over time. As a result, PV power can cause instability in the power grid, in particular when increasingly used. Limiting the rate of change of the power output is a common way to mitigate these fluctuations, often with the help of large batteries. A reactive controller that uses these batteries to compensate ramps works in practice, but causes stress on the battery due to a high energy throughput. In this paper, we present a deep learning approach that uses images of the sky to compensate power fluctuations predictively and reduces battery stress. In particular, we show that the optimal control policy can be computed using information that is only available in hindsight. Based on this, we use imitation learning to train a neural network that approximates this hindsight-optimal policy, but uses only currently available sky images and sensor data. We evaluate our method on a large dataset of measurements and images from a real power plant and show that the trained policy reduces stress on the battery.
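
The reactive baseline described in this abstract, a controller that limits the rate of change of the grid output and lets the battery absorb the difference, can be sketched as follows. This is a hypothetical minimal version, not the controller used in the paper: the function names, the per-step ramp limit `max_ramp`, the sign convention (positive battery power means discharging), and the use of summed absolute battery power as a stand-in for throughput are assumptions, and battery capacity limits are ignored.

```python
def reactive_ramp_limiter(pv, max_ramp):
    """Limit the per-step change of the grid output to max_ramp.

    Returns (grid, battery), where battery[k] = grid[k] - pv[k]
    is the power the battery must supply (positive) or absorb (negative).
    """
    grid = [pv[0]]
    battery = [0.0]
    for p in pv[1:]:
        prev = grid[-1]
        # Follow the PV output as closely as the ramp limit allows.
        g = min(max(p, prev - max_ramp), prev + max_ramp)
        grid.append(g)
        battery.append(g - p)
    return grid, battery


def battery_throughput(battery):
    """Total absolute battery power over the horizon, a simple stress proxy."""
    return sum(abs(b) for b in battery)
```

For example, with `pv = [10.0, 10.0, 4.0, 4.0, 4.0, 10.0]` and `max_ramp = 2.0`, the grid output steps down 10, 10, 8, 6, 4 and then up to 6 while the battery covers the gap. A predictive policy that starts ramping before the drop arrives could lower the throughput, which is the motivation for learning from sky images.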
