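arXivDaily: a daily academic digest synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.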

2402.02555 2026-06-04 cs.CV cs.CL

High-Quality Entity Segmentation and Grounding

Lu Qi, Yi-Wen Chen, Tao Zhang, Xiangtai Li, Xu Yang, Bo Du, Ming-Hsuan Yang

Affiliations: Wuhan University; Insta360 Research; Department of EECS, University of California, Merced; Nanyang Technological University; Institute of Automation of the Chinese Academy of Sciences

AI summary: Proposes the ESG pipeline, which uses the new EntitySeg dataset and a two-stage decoupled design (CropFormer for high-quality segmentation, GELLA for accurate noun extraction and semantic matching) to achieve high-quality entity segmentation and grounding, and shows its effectiveness on five tasks.

Abstract

In this work, we propose ESG, a pipeline for high-quality entity segmentation and grounding supported by a new dataset, EntitySeg. First, the EntitySeg dataset contains images spanning various image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. Second, ESG consists mainly of two modules: CropFormer for high-quality entity segmentation, and GELLA for accurate noun extraction from sentences and semantic matching between language and visual regions. Unlike existing grounding methods that jointly train a segmentation model and a large language model, ESG adopts a two-stage decoupled design, preserving high-quality masks and grounding robustness without the trade-offs often introduced by joint training. CropFormer ensures high-quality entity segmentation results, which can then be encoded into the GELLA model for effective grounding. Extensive experimental results demonstrate the effectiveness of our proposed pipeline across five tasks: entity segmentation, panoptic segmentation, open-vocabulary segmentation, referring segmentation, and panoptic localized narratives. Furthermore, the GELLA module of the ESG pipeline is highly flexible and capable of processing mask inputs from any segmentation framework, thanks to its lightweight colormap/vision encoder, language/mask decoder, and association module. The entity segmentation dataset and grounding code will be released at https://github.com/qqlu/Entity.

2209.15448 2026-06-04 cs.LG math.ST stat.ME stat.TH

Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

Jiayi Wang, Zhengling Qi, Chengchun Shi

Affiliations: Department of Mathematical Sciences, University of Texas at Dallas; Department of Statistics, London School of Economics and Political Science; Department of Decision Sciences, George Washington University

AI summary: Proposes super-policy learning, which uses the actions observed in human-AI interaction as an input to the policy. Under unmeasured confounding, proximal causal inference is used to identify and learn a super-policy that outperforms both the standard optimal policy and the behavior policy.

Abstract

As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super-policy learning, which takes advantage of human-AI interaction for data-driven sequential decision making. This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In a decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information in the policy search in a novel and legitimate manner, the proposed super-policy learning yields a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies from batch data, a number of nonparametric and causal identification results are established under the framework of proximal causal inference. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties, such as finite-sample regret guarantees. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.

1410.6333 2026-06-04 cs.CV cs.NA math.NA

A Regularization Approach to Blind Deblurring and Denoising of QR Barcodes

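Yves van Gennip, Prashant Athavale, Jérôme Gilles, Rustum Choksi

Affiliations: Fields Institute, University of Toronto

AI summary: Proposes a purely regularization-based approach to blind deblurring and denoising of QR barcodes in the presence of noise, exploiting the known required patterns and relying on an open-source barcode reader.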

Comments 14 pages, 19 figures (with a total of 57 subfigures), 1 table; v3: previously missing reference [35] added

1808.03408 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration

Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

Affiliations: JD Explore Academy, Beijing, China; Facebook, USA; Meituan, Beijing, China; University of Minnesota, Twin Cities, USA; Tencent, Shenzhen, China

AI summary: Proposes AdaUSM, a weighted AdaGrad algorithm with a unified momentum scheme and a novel weighted adaptive learning rate, which achieves an O(log(T)/√T) convergence rate in the non-convex stochastic setting and offers a new perspective on the adaptive learning rates of Adam and RMSProp.

Comments IEEE TNNLS

Abstract

Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficiently accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, AccAdaGrad, etc. In spite of their effectiveness in practice, there is still a large gap in their convergence theory, especially in the difficult non-convex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, whose main characteristics are that (1) it incorporates a unified momentum scheme that covers both the heavy ball momentum and the Nesterov accelerated gradient momentum; (2) it adopts a novel weighted adaptive learning rate that can unify the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when we take polynomially growing weights in AdaUSM, we obtain its $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the non-convex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp. Lastly, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also carried out.
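The correspondence between exponential moving averages and exponentially growing weights can be checked numerically. The sketch below is not the AdaUSM algorithm itself (the abstract does not give its update rule); it only assumes the generic form of a weighted average of squared gradients, sum(a_i * g_i^2) / sum(a_i), and compares it with a bias-corrected Adam-style moving average. The function names and the weight choice a_i = beta^(-i) are illustrative assumptions.

```python
import random

def ema_second_moment(grads, beta):
    """Adam/RMSProp-style exponential moving average of squared gradients,
    with Adam's bias correction applied at the end."""
    v = 0.0
    for g in grads:
        v = beta * v + (1.0 - beta) * g * g
    return v / (1.0 - beta ** len(grads))

def weighted_second_moment(grads, weights):
    """Weighted average of squared gradients: sum(a_i * g_i^2) / sum(a_i)."""
    num = sum(a * g * g for a, g in zip(weights, grads))
    return num / sum(weights)

random.seed(0)
beta = 0.9
grads = [random.gauss(0.0, 1.0) for _ in range(50)]

# Exponentially growing weights a_i = beta^(-i), i = 1..t (illustrative choice).
exp_weights = [beta ** (-i) for i in range(1, len(grads) + 1)]

v_ema = ema_second_moment(grads, beta)
v_weighted = weighted_second_moment(grads, exp_weights)

# Uniform weights give the plain average of squared gradients (AdaGrad-style).
v_uniform = weighted_second_moment(grads, [1.0] * len(grads))
```

Here `v_ema` and `v_weighted` agree up to floating-point error, because dividing the moving average by 1 - beta^t normalizes the geometric weights to sum to one.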

1709.09480 2026-06-04 cs.AI cs.LG cs.SY eess.SY

A Benchmark Environment Motivated by Industrial Control Problems

Daniel Hein, Stefan Depeweg, Michel Tokic, Steffen Udluft, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

Affiliations: Siemens AG, Corporate Technology

AI summary: Presents a benchmark environment motivated by industrial control problems that bridges the gap between existing artificial benchmarks and real industrial settings. The paper describes the benchmark's dynamics in detail and identifies prototypical experimental settings to support the development of reinforcement learning methods.

Journal ref 2017 IEEE Symposium Series on Computational Intelligence (SSCI)

DOI: 10.1109/SSCI.2017.8280935

Abstract

In the research area of reinforcement learning (RL), novel and promising methods are frequently developed and introduced to the RL community. However, although many researchers are keen to apply their methods to real-world problems, implementing such methods in real industrial environments is often a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated, and compared using artificial software benchmarks. On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand. On the other hand, they usually do not share much similarity with real-world industrial applications. For this reason, we used our industry experience to design a benchmark that bridges the gap between freely available, documented, and motivated artificial benchmarks and the properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub. In this paper, we motivate and describe in detail the IB's dynamics and identify prototypical experimental settings that capture common situations in real-world industrial control problems.

1809.03225 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Gait learning for soft microrobots controlled by light fields

Alexander von Rohr, Sebastian Trimpe, Alonso Marco, Peer Fischer, Stefano Palagi

Affiliations: Micro, Nano, and Molecular Systems Group, Max Planck Institute for Intelligent Systems; Max Planck ETH Center for Learning Systems

AI summary: Proposes a probabilistic learning approach based on Bayesian optimization and Gaussian processes for optimizing the gaits of light-controlled soft microrobots, improving locomotion performance efficiently and robustly within a limited experimental budget.

Comments 8 pages, 7 figures, to appear in the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 2018

DOI: 10.1109/IROS.2018.8594092

Abstract

Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and to adapt them to changing conditions. However, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semi-synthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot's locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control.

1811.04333 2026-06-04 cs.RO cs.FL cs.SY eess.SY

Reactive Task and Motion Planning for Robust Whole-Body Dynamic Locomotion in Constrained Environments

Ye Zhao, Yinan Li, Luis Sentis, Ufuk Topcu, Jun Liu

Affiliations: George W. Woodruff School of Mechanical Engineering, Georgia Tech, USA; Department of Applied Mathematics, University of Waterloo, Canada; Department of Aerospace Engineering and Engineering Mechanics, UT Austin, USA; Institute for Computational Engineering and Sciences, UT Austin, USA

AI summary: Proposes a temporal-logic game framework for task planning and control of whole-body dynamic locomotion in constrained and dynamically changing environments, using methods from symbolic systems to guarantee the correctness of the locomotion behaviors.

Comments 49 pages, 23 figures, 1 table

Abstract

Contact-based decision and planning methods are becoming increasingly important to endow higher levels of autonomy for legged robots. Formal synthesis methods derived from symbolic systems have great potential for reasoning about high-level locomotion decisions and achieving complex maneuvering behaviors with correctness guarantees. This study takes a first step toward formally devising an architecture composed of task planning and control of whole-body dynamic locomotion behaviors in constrained and dynamically changing environments. At the high level, we formulate a two-player temporal logic game between the multi-limb locomotion planner and its dynamic environment to synthesize a winning strategy that delivers symbolic locomotion actions. These locomotion actions satisfy the desired high-level task specifications expressed in a fragment of temporal logic. Those actions are sent to a robust finite transition system that synthesizes a locomotion controller that fulfills state reachability constraints. This controller is further executed via a low-level motion planner that generates feasible locomotion trajectories. We construct a set of dynamic locomotion models for legged robots to serve as a template library for handling diverse environmental events. We devise a replanning strategy that takes into consideration sudden environmental changes or large state disturbances to increase the robustness of the resulting locomotion behaviors. We formally prove the correctness of the layered locomotion framework guaranteeing a robust implementation by the motion planning layer. Simulations of reactive locomotion behaviors in diverse environments indicate that our framework has the potential to serve as a theoretical foundation for intelligent locomotion behaviors.

1710.05465 2026-06-04 cs.AI cs.RO cs.SY eess.SY

Flow: A Modular Learning Framework for Mixed Autonomy Traffic

Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre M Bayen

Affiliations: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology; Institute of Data, Systems, and Society, Massachusetts Institute of Technology; Department of Mechanical Engineering, University of California, Berkeley

AI summary: Presents a modular learning framework that uses deep reinforcement learning to address complex traffic dynamics. Learned control laws improve system-level velocity by up to 57% over human driving with only 4-7% adoption of autonomous vehicles. In single-lane traffic, a small neural network control law with only local observation eliminates stop-and-go traffic and achieves near-optimal performance.

Comments 17 pages, 8 figures, 5 tables. 2021 IEEE Transactions on Robotics (T-RO)

DOI: 10.1109/TRO.2021.3087314

Abstract

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, the progression of these impacts, as AVs are adopted, is not well understood. Numerous technical challenges arise from the goal of analyzing the partial adoption of autonomy: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks. To shed light on near-term AV impacts, this article studies the suitability of deep reinforcement learning (RL) for overcoming these challenges in a low AV-adoption regime. A modular learning framework is presented, which leverages deep RL to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (stop-and-go traffic jams, lane changing, intersections). Learned control laws are found to improve upon human driving performance, in terms of system-level velocity, by up to 57% with only 4-7% adoption of AVs. Furthermore, in single-lane traffic, a small neural network control law with only local observation is found to eliminate stop-and-go traffic, surpassing all known model-based controllers to achieve near-optimal performance, and to generalize to out-of-distribution traffic densities.

1803.07696 2026-06-04 cs.RO cs.SY eess.SY

Inverse Optimal Control from Incomplete Trajectory Observations

Wanxin Jin, Dana Kulić, Shaoshuai Mou, Sandra Hirche

Affiliations: School of Aeronautics and Astronautics, Purdue University; Monash University; Chair of Information-oriented Control, Technical University of Munich

AI summary: Proposes a method for learning the objective function of an optimal control system from incomplete trajectory observations, using a recovery matrix to determine the weights of candidate features, and develops an incremental inverse optimal control algorithm.

Comments Codes: https://github.com/wanxinjin/IOC-from-Incomplete-Trajectory-Observations

Journal ref The International Journal of Robotics Research. 2021;40(6-7):848-865

DOI: 10.1177/0278364921996384

Abstract

This article develops a methodology that enables learning an objective function of an optimal control system from incomplete trajectory observations. The objective function is assumed to be a weighted sum of features (or basis functions) with unknown weights, and the observed data is a segment of a trajectory of system states and inputs. The proposed technique introduces the concept of the recovery matrix to establish the relationship between any available segment of the trajectory and the weights of given candidate features. The rank of the recovery matrix indicates whether a subset of relevant features can be found among the candidate features and the corresponding weights can be learned from the segment data. The recovery matrix can be obtained iteratively and its rank non-decreasing property shows that additional observations may contribute to the objective learning. Based on the recovery matrix, a method for using incomplete trajectory observations to learn the weights of selected features is established, and an incremental inverse optimal control algorithm is developed by automatically finding the minimal required observation. The effectiveness of the proposed method is demonstrated on a linear quadratic regulator system and a simulated robot manipulator.

1904.00378 2026-06-04 cs.RO cs.SY eess.SY

MAT-Fly: An Educational Platform for Simulating Unmanned Aerial Vehicles Aimed to Detect and Track Moving Objects

Giuseppe Silano, Luigi Iannelli

Affiliations: Faculty of Electrical Engineering, Czech Technical University in Prague; Department of Engineering, University of Sannio in Benevento, Piazza Roma 21

AI summary: Proposes a simulation approach for a specific task in the unmanned aerial vehicle field, the visual detection and tracking of arbitrary moving objects. It introduces MAT-Fly, a platform built on Matlab and the MathWorks Virtual Reality (VR) and Computer Vision System (CVS) toolboxes that simulates a quadrotor tracking a car moving along a nontrivial path, released as open source for educational use.

Comments 11 pages, 15 figures, journal paper

Journal ref IEEE Access, 2021

DOI: 10.1109/ACCESS.2021.3064758

Abstract

The main motivation of this work is to propose a simulation approach for a specific task within the Unmanned Aerial Vehicle (UAV) field, i.e., the visual detection and tracking of arbitrary moving objects. In particular, we describe MAT-Fly, a numerical simulation platform for multi-rotor aircraft characterized by ease of use and of control development. The platform is based on Matlab and the MathWorks Virtual Reality (VR) and Computer Vision System (CVS) toolboxes, which work together to simulate the behavior of a quad-rotor while tracking a car that moves along a nontrivial path. The VR toolbox was chosen because students are familiar with Matlab and because, thanks to its simple structure, it does not require notable effort from the user during the learning and development phase. The overall architecture is modular, so each block can easily be replaced with another, which simplifies code reuse and platform customization. Some simple testbeds are presented to show the validity of the approach and how the platform works. The simulator is released as open source, making it possible to inspect any part of the system, and is available for educational purposes.

1906.00729 2026-06-04 cs.LG cs.GT cs.SY eess.SY math.OC stat.ML

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

Kaiqing Zhang, Zhuoran Yang, Tamer Başar

Affiliations: Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois at Urbana-Champaign; Department of Operations Research and Financial Engineering, Princeton University

AI summary: Studies the global convergence of policy optimization for finding Nash equilibria in zero-sum linear quadratic (LQ) games. By analyzing the optimization landscape of LQ games, it shows that the stationary points of the objective with respect to linear feedback control policies constitute the Nash equilibria of the game, develops three projected nested-gradient methods guaranteed to converge to the Nash equilibrium, and shows that these algorithms have globally sublinear and locally linear convergence rates.

Comments Fixed some typos, addressed some comments from NeurIPS reviews

Abstract

We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.

1905.13268 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Interpretable PID Parameter Tuning for Control Engineering using General Dynamic Neural Networks: An Extensive Comparison

Johannes Günther, Elias Reichensdörfer, Patrick M. Pilarski, Klaus Diepold

Affiliations: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute

AI summary: Studies how extending PID controllers with General Dynamic Neural Networks (GDNN) can improve the performance and interpretability of complex control systems. In an extensive comparison on four benchmark systems, the neural PID controller outperforms standard PID control in 15 of 16 tasks and model-based control in 13 of 16 tasks.

DOI: 10.1371/journal.pone.0243320

Abstract

Modern automation systems rely on closed-loop control, wherein a controller interacts with a controlled process based on observations. These systems are increasingly complex, yet most controllers are linear Proportional-Integral-Derivative (PID) controllers. PID controllers perform well on linear and near-linear systems, but their simplicity is at odds with the robustness required to reliably control complex processes. Modern machine learning offers a way to extend PID controllers beyond their linear capabilities by using neural networks. However, such an extension comes at the cost of losing stability guarantees and controller interpretability. In this paper, we examine the utility of extending PID controllers with recurrent neural networks, namely General Dynamic Neural Networks (GDNN); we show that GDNN (neural) PID controllers perform well on a range of control systems and highlight how they can be a scalable and interpretable option for control systems. To do so, we provide an extensive study using four benchmark systems that represent the most common control engineering benchmarks. All control benchmarks are evaluated with and without noise as well as with and without disturbances. The neural PID controller performs better than standard PID control in 15 of 16 tasks and better than model-based control in 13 of 16 tasks. As a second contribution, we address the lack of interpretability that prevents neural networks from being used in real-world control processes. We use bounded-input bounded-output stability analysis to evaluate the parameters suggested by the neural network, thus making them understandable. This combination of rigorous evaluation and better interpretability is an important step towards the acceptance of neural-network-based control approaches. It is furthermore an important step towards interpretable and safely applied artificial intelligence.
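For readers unfamiliar with the baseline, the following is a minimal sketch of a standard discrete-time PID loop, the kind of controller whose gains the paper's neural network proposes. It does not implement the GDNN approach. The toy plant dx/dt = -x + u, the gain values, and the function name `simulate_pid` are all assumptions made for illustration.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, dt=0.01, steps=2000):
    """Closed-loop simulation of a discrete PID controller on the toy plant
    dx/dt = -x + u (forward-Euler integration). Returns the output trajectory."""
    x = 0.0
    integral = 0.0
    prev_error = setpoint - x
    trajectory = []
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt                  # I term: accumulated error
        derivative = (error - prev_error) / dt  # D term: error rate of change
        u = kp * error + ki * integral + kd * derivative
        x += dt * (-x + u)                      # one Euler step of the plant
        prev_error = error
        trajectory.append(x)
    return trajectory

# PI control: the integral term drives the steady-state error towards zero.
pi_traj = simulate_pid(kp=2.0, ki=1.0, kd=0.0)

# P-only control settles at kp / (1 + kp) on this plant, leaving an offset.
p_traj = simulate_pid(kp=2.0, ki=0.0, kd=0.0)
```

On this plant, the proportional-only loop settles near 2/3 of the setpoint, while adding the integral term brings the output close to the setpoint, which illustrates why the three gains are directly interpretable.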

1903.01577 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems

Andrew J. Taylor, Victor D. Dorobantu, Hoang M. Le, Yisong Yue, Aaron D. Ames

Affiliations: California Institute of Technology

AI summary: Proposes a machine learning framework centered on Control Lyapunov Functions to adapt to parametric uncertainty and unmodeled dynamics in robotic systems. It iteratively updates estimates of Lyapunov function derivatives and improves the controller, ultimately yielding a stabilizing quadratic-program-based controller, and is validated on a planar Segway simulation.

DOI: 10.1109/IROS40897.2019.8967820

Abstract

Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.

1711.01526 2026-06-04 cs.LG cs.SY eess.SY math.OC

On Identification of Distribution Grids

Omid Ardakanian, Vincent W. S. Wong, Roel Dobbe, Steven H. Low, Alexandra von Meier, Claire Tomlin, Ye Yuan

Affiliations: Department of Electrical Engineering and Computer Sciences, UC Berkeley, USA

AI summary: Studies the joint estimation of model parameters and operational structure of a distribution network from telemetry data, using the lasso for regression shrinkage and selection. It proposes tractable convex programs that handle the low-rank structure of the distribution system and develops an online algorithm for early detection and localization of critical events that change the admittance matrix.

DOI: 10.1109/TCNS.2019.2891002

Abstract

Large-scale integration of distributed energy resources into residential distribution feeders necessitates careful control of their operation through power flow analysis. While the knowledge of the distribution system model is crucial for this type of analysis, it is often unavailable or outdated. The recent introduction of synchrophasor technology in low-voltage distribution grids has created an unprecedented opportunity to learn this model from high-precision, time-synchronized measurements of voltage and current phasors at various locations. This paper focuses on joint estimation of model parameters (admittance values) and operational structure of a poly-phase distribution network from the available telemetry data via the lasso, a method for regression shrinkage and selection. We propose tractable convex programs capable of tackling the low rank structure of the distribution system and develop an online algorithm for early detection and localization of critical events that induce a change in the admittance matrix. The efficacy of these techniques is corroborated through power flow studies on four three-phase radial distribution systems serving real household demands.
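The shrinkage-and-selection behavior of the lasso can be seen on a tiny regression problem. The sketch below is a generic cyclic coordinate-descent solver, not the paper's estimator for admittance matrices; the toy data, the penalty value, and the function names are assumptions chosen so the result can be checked by hand.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator, the proximal map of lam * |w|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0      # correlation of feature j with the partial residual
            norm_sq = 0.0  # squared norm of feature j
            for i in range(n):
                # Prediction from every feature except j.
                pred_others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                norm_sq += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / norm_sq
    return w

# Two orthogonal features; the second has only a weak effect on y.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
w = lasso_coordinate_descent(X, y, lam=0.5)
```

With this data the strong coefficient is shrunk from 2.0 to 1.75 and the weak one is set exactly to zero, which is the mechanism that lets a lasso-based estimator select which entries of a sparse model are present.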

1605.01177 2026-06-04 cs.CV cs.SY eess.SY

A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms

有限轨迹集合空间上的度量用于多目标跟踪算法评估

Ángel F. García-Fernández, Abu Sajana Rahmathullah, Lennart Svensson

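Affiliations: Zenuity AB

AI summary: This paper proposes a metric on the space of finite sets of trajectories for evaluating multi-target tracking algorithms in a mathematically sound way. The metric compares an algorithm's trajectory estimates with the ground-truth trajectories and includes intuitive costs for localization error, missed and false targets, and track switches. Computing the metric requires solving a multidimensional assignment problem. The paper also proposes a lower bound, itself a metric on sets of trajectories, that can be computed in polynomial time via linear programming, and extends the metric to random finite sets of trajectories.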

Comments Matlab code for the metric is available at https://github.com/Agarciafernandez/MTT

Journal ref in IEEE Transactions on Signal Processing, vol. 68, pp. 3917-3928, 2020

1806.04225 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC

PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments

PAC-Bayes 控制：学习可证明能泛化到新环境的策略

Anirudha Majumdar, Alec Farid, Anoopkumar Sonar

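Affiliations: Department of Mechanical and Aerospace Engineering (authors 1 and 2); Department of Computer Science (author 3), Princeton University

AI summary: This paper proposes a PAC-Bayes-based approach to learning robot control policies. It minimizes an upper bound on the expected cost in novel environments, which yields strong generalization guarantees for robotic systems.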

Comments Extended version of paper presented at the 2018 Conference on Robot Learning (CoRL)

AI Chinese abstract

我们的目标是学习能够证明在新环境中泛化能力的机器人控制策略，给定一组示例环境的数据集。我们方法的关键技术思想是利用机器学习中的泛化理论工具，通过精确的类比（以缩减形式呈现）将控制策略在新环境中的泛化与监督学习中的假设泛化联系起来。特别是，我们利用Probably Approximately Correct (PAC)-Bayes框架，这使我们能够获得在新环境中（随机）控制策略预期成本的上界。我们提出策略学习算法，明确寻求最小化此上界。相应的优化问题可以在有限策略空间的设置中通过凸优化（特别是相对熵编程）解决。在更一般的情况下，对于连续参数化策略（例如神经网络策略），我们使用随机梯度下降来最小化此上界。我们展示了所提出方法应用于学习（1）反应性障碍物回避策略和（2）基于神经网络的抓取策略的模拟结果。我们还展示了Parrot Swing无人机在不同障碍物环境中的硬件结果。我们的例子展示了该方法在具有连续状态和动作空间、复杂（例如非线性）动态、丰富感官输入（例如深度图像）和基于神经网络的策略的机器人系统中提供强泛化保证的潜力。

English abstract

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.
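
To make the "upper bound on the expected cost" concrete for a finite policy space, here is a minimal sketch that evaluates a standard McAllester/Maurer-style PAC-Bayes bound. It assumes costs lie in [0, 1] and uses hypothetical names (`pac_bayes_bound`, `kl_divergence`) and made-up data. The exact bound and optimization procedure used in the paper may differ.

```python
import numpy as np

def kl_divergence(p, p0):
    """KL(p || p0) for discrete distributions; terms with p == 0 contribute 0."""
    p, p0 = np.asarray(p, float), np.asarray(p0, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / p0[mask])))

def pac_bayes_bound(costs, p, p0, delta=0.01):
    """Upper bound on the expected cost of a stochastic policy in new environments.

    costs: array of shape (N environments, L policies) with entries in [0, 1].
    p, p0: posterior and prior distributions over the L policies.
    Holds with probability at least 1 - delta over the N sampled environments.
    """
    costs = np.asarray(costs, float)
    n_envs = costs.shape[0]
    empirical = float(np.mean(costs @ p))  # average training cost under p
    regularizer = np.sqrt(
        (kl_divergence(p, p0) + np.log(2.0 * np.sqrt(n_envs) / delta)) / (2.0 * n_envs)
    )
    return empirical + regularizer

rng = np.random.default_rng(0)
costs = rng.uniform(size=(100, 5))            # made-up costs: 100 environments x 5 policies
prior = np.full(5, 0.2)                       # uniform prior over policies
posterior = np.array([0.6, 0.1, 0.1, 0.1, 0.1])
bound = pac_bayes_bound(costs, posterior, prior, delta=0.01)
```

A policy learning algorithm in this spirit would search over `posterior` to make `bound` as small as possible, trading off training cost against divergence from the prior.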


1904.10778 2026-06-04 cs.LG cs.SY eess.SY math.OC math.PR stat.ML

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

由随机递归算法诱导的马尔可夫链的一些极限性质

Abhishek Gupta, Hao Chen, Jianzong Pi, Gaurav Tendolkar

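Affiliations: Electrical and Computer Engineering Department, The Ohio State University; Microsoft Corp

AI summary: This paper studies limit properties of Markov chains induced by stochastic recursive algorithms. By analyzing the convergence of iterated random operators, it shows that the distribution of the random sequence converges weakly to the trajectory generated by the contraction operator, and that the time average of the random sequence converges to the spatial mean of the invariant distribution.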

Comments Accepted in SIMODS, 37 pages

AI Chinese abstract

递归随机算法由于数据驱动应用而近期受到广泛关注。例如，随机梯度下降用于解决大规模优化问题，经验动态规划算法用于解决马尔可夫决策问题。这些递归随机算法近似某些收缩算子，并可以被视为迭代随机算子的框架内。因此，我们考虑在波兰空间上迭代随机算子，模拟该波兰空间上的迭代收缩算子。假设迭代随机算子按一定批次大小索引，当批次大小趋于无穷时，每个随机算子的实现（以某种方式）收敛于它所模拟的收缩算子。我们证明，从相同的初始条件出发，由迭代随机算子生成的随机序列的分布弱收敛于由收缩算子生成的轨迹。我们进一步证明，在某些条件下，随机序列的时间平均收敛于不变分布的空间均值。然后，我们将这些结果应用于逻辑回归、经验价值迭代和经验Q值迭代，以说明此处发展的通用理论。

English abstract

Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate an iterated contraction operator over that Polish space. Assume that the iterated random operators are indexed by certain batch sizes such that as batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it is simulating. We show that starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q value iteration for finite state finite action MDPs to illustrate the general theory developed here.
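
The batch-size limit described in the abstract can be seen on a toy problem. The sketch below is an assumed scalar example, not one from the paper: the contraction is T(x) = 0.5x + 1, and the random operator replaces the constant 1 with the mean of a mini-batch of noisy samples. The function names are hypothetical.

```python
import numpy as np

def contraction(x):
    """Deterministic contraction T(x) = 0.5 * x + 1 with fixed point x = 2."""
    return 0.5 * x + 1.0

def random_operator(x, batch_size, rng):
    """Noisy version of T: the constant 1 is replaced by a mini-batch sample mean."""
    return 0.5 * x + rng.normal(loc=1.0, scale=1.0, size=batch_size).mean()

def max_deviation(batch_size, n_steps=50, seed=0):
    """Largest gap between the random and deterministic trajectories from x0 = 0."""
    rng = np.random.default_rng(seed)
    x_det = x_rand = 0.0
    worst = 0.0
    for _ in range(n_steps):
        x_det = contraction(x_det)
        x_rand = random_operator(x_rand, batch_size, rng)
        worst = max(worst, abs(x_rand - x_det))
    return worst

small_batch_gap = max_deviation(batch_size=10)
large_batch_gap = max_deviation(batch_size=100_000)
```

With a large batch the random trajectory stays close to the deterministic one, which is the informal picture behind the weak-convergence statement.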


1903.11483 2026-06-04 cs.LG cs.NE cs.RO cs.SY eess.SY stat.ML

Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression

通过符号回归构建动态系统的简洁解析模型

Erik Derner, Jiří Kubalík, Nicola Ancona, Robert Babuška

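Affiliations: Czech Institute of Informatics, Robotics, and Cybernetics; Czech Technical University in Prague; Department of Control Engineering, Faculty of Electrical Engineering; Delft University of Technology

AI summary: This paper proposes using symbolic regression to construct parsimonious analytic models of dynamic systems. Two state-of-the-art symbolic regression algorithms are applied in both the state-space and input-output domains. On simulated examples the approach outperforms common alternatives in most cases, and it is also demonstrated on a real system.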

Journal ref Applied Soft Computing, Volume 94, September 2020, 106432

DOI: 10.1016/j.asoc.2020.106432

AI Chinese abstract

构建动态系统的数学模型对于许多工程和科学学科至关重要。模型有助于模拟、分析系统行为、决策制定和自动控制算法的设计。即使像强化学习（RL）这样的无模型控制技术也已被证明能从使用模型中受益，通常这些模型是在线学习的。任何模型构建方法都必须处理模型的准确性和复杂性之间的权衡，这很难做到。本文提出利用符号回归（SR）来构建由解析方程描述的简洁过程模型。我们为方法配备了两种最先进的符号回归算法，它们自动搜索适合测量数据的方程：单节点遗传编程（SNGP）和多基因遗传编程（MGGP）。除了状态空间域中的标准问题表述外，我们还展示了该方法如何应用于非线性自回归加外生输入（NARX）类型的输入输出模型。我们展示了该方法在三个模拟示例中的应用，这些示例的状态空间最高可达14维：倒立摆、移动机器人和双足行走机器人。与深度神经网络和局部线性回归的比较表明，SR在大多数情况下优于这些常用替代方法。我们在真实摆系统上展示了解析模型的发现使RL控制器能够成功完成摆起任务，该模型仅基于100个数据样本构建。

English abstract

Developing mathematical models of dynamic systems is central to many disciplines of engineering and science. Models facilitate simulations, analysis of the system's behavior, decision making and design of automatic control algorithms. Even inherently model-free control techniques such as reinforcement learning (RL) have been shown to benefit from the use of models, typically learned online. Any model construction method must address the tradeoff between the accuracy of the model and its complexity, which is difficult to strike. In this paper, we propose to employ symbolic regression (SR) to construct parsimonious process models described by analytic equations. We have equipped our method with two different state-of-the-art SR algorithms which automatically search for equations that fit the measured data: Single Node Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In addition to the standard problem formulation in the state-space domain, we show how the method can also be applied to input-output models of the NARX (nonlinear autoregressive with exogenous input) type. We present the approach on three simulated examples with up to 14-dimensional state space: an inverted pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep neural networks and local linear regression shows that SR in most cases outperforms these commonly used alternative methods. We demonstrate on a real pendulum system that the analytic model found enables an RL controller to successfully perform the swing-up task, based on a model constructed from only 100 data samples.


1904.01068 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

在未知转移模型的确定性马尔可夫决策过程中实现高效且安全的探索

Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh

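Affiliations: Stanford University; Jet Propulsion Laboratory; California Institute of Technology

AI summary: This paper proposes a safe exploration algorithm that exploits Lipschitz continuity to ensure that no unsafe states are visited during exploration. The algorithm provides deterministic safety guarantees in deterministic Markov decision processes, and its performance is evaluated on simulated navigation tasks.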

Comments Proceedings of the American Control Conference (ACC), July 2019. The first two authors have equal contribution

1905.03632 2026-06-04 cs.SD cs.SY eess.AS eess.SY

Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

基于DNN支持的相对传递函数估计的块在线多通道语音增强

Jiri Malek, Zbynek Koldovsky, Marek Bohac

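Affiliations: Faculty of Mechatronics, Informatics, and Interdisciplinary Studies, Technical University of Liberec

AI summary: This paper presents a block-online multi-channel speech enhancement method in which beamforming is supported by DNN-based voice activity detection and the speaker is targeted through relative transfer function estimates. Processing each block independently suits dynamic environments and very short utterances, and the method remains robust for short blocks.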

Comments 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusions

Journal ref IET Signal Processing, vol. 14, no. 3, pp. 124-133, May 2020

DOI: 10.1049/iet-spr.2019.0304

AI Chinese abstract

本文解决多通道语音增强中的块在线处理问题。此类处理在移动说话人或处理极短语音（如语音助手场景）时至关重要。我们考虑了一种系统，该系统通过基于DNN的语音活动检测（VAD）进行波束成形，随后进行后滤波。通过估计麦克风之间的相对传输函数来定位说话人。输入信号的每个块独立处理，以使其适用于高度动态的环境。由于处理块长度较短，波束成形所需的统计信息估计不够精确。本研究分析了这种不精确性的影响，并将其与将记录视为单块（批量处理）的处理模式进行比较。所提出方法在CHiME-4大型数据集和另一个具有移动目标说话人数据集上进行了实验评估。评估基于客观和主观标准（如信号干扰比（SIR）或语音质量主观评价（PESQ））。此外，还评估了基于基线自动语音识别系统的词错误率（WER），其中增强方法作为前端解决方案。结果表明，所提出的方法在处理块长度较短时具有鲁棒性。即使在250毫秒的块长度下，也能在各项指标和WER上观察到显著改进。

English abstract

This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for the block length of 250 ms.
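
To show how a relative transfer function (RTF) estimate can steer a beamformer, the sketch below computes MVDR weights for a single frequency bin. The abstract does not say which beamformer the authors use, so MVDR is an assumption here, as are the function name `mvdr_weights` and the random stand-in data. In a block-online system the noise covariance would be estimated per block from frames the VAD marks as noise.

```python
import numpy as np

def mvdr_weights(noise_cov, rtf):
    """MVDR beamformer for one frequency bin: w = Phi^{-1} h / (h^H Phi^{-1} h).

    noise_cov: (M, M) Hermitian noise covariance matrix Phi.
    rtf:       (M,) relative transfer function h, with h[0] = 1 at the reference mic.
    """
    phi_inv_h = np.linalg.solve(noise_cov, rtf)
    return phi_inv_h / (rtf.conj() @ phi_inv_h)

rng = np.random.default_rng(0)
n_mics, n_frames = 4, 500
# Made-up RTF: reference microphone first, random complex gains elsewhere.
rtf = np.concatenate(
    ([1.0 + 0.0j], rng.normal(size=n_mics - 1) + 1j * rng.normal(size=n_mics - 1))
)
# Made-up noise-only frames, used to estimate the noise covariance of one block.
noise = rng.normal(size=(n_mics, n_frames)) + 1j * rng.normal(size=(n_mics, n_frames))
noise_cov = noise @ noise.conj().T / n_frames

w = mvdr_weights(noise_cov, rtf)
response = w.conj() @ rtf  # beamformer response toward the target (w^H h)
```

The response toward the target equals one by construction (the distortionless constraint), while the output noise power is minimized. Shorter blocks give a noisier `noise_cov`, which is the estimation inaccuracy the abstract studies.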


1905.10706 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Interactive Differentiable Simulation

交互式可微分模拟

Eric Heiden, David Millard, Hejia Zhang, Gaurav S. Sukhatme

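Affiliations: University of Southern California

AI summary: This paper introduces Interactive Differentiable Simulation (IDS), a differentiable physics engine for efficient, accurate inference of the physical properties of rigid-body systems. It performs system identification from visual input, yielding a world model whose parameters have physical meaning. It also supports automatic task-based robot design and parameter estimation for nonlinear dynamical systems, and shows orders-of-magnitude better sample efficiency than model-free reinforcement learning on nonlinear control domains.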

AI Chinese abstract

智能体需要对世界有物理理解才能预测其未来行动的影响。虽然基于学习的环境动力学模型在样本效率上相比无模型强化学习算法有所改进，但通常无法泛化到训练数据之外的系统状态，且往往依赖于非解释性的潜在变量。我们引入交互式可微分模拟（IDS），一种可微分的物理引擎，能够高效准确地推断刚体系统的物理属性。将模型集成到深度学习架构中，该模型能够利用视觉输入实现系统识别，从而建立具有物理意义的世界模型。我们展示了通过自动计算IDS中的梯度，实现非线性动态系统的自动任务机器人设计和参数估计。当与自适应模型预测控制算法结合时，我们的方法在具有挑战性的非线性控制领域中，相比无模型强化学习算法显示出数量级的样本效率提升。

English abstract

Intelligent agents need a physical understanding of the world to predict the impact of their actions in the future. While learning-based models of the environment dynamics have contributed to significant improvements in sample efficiency compared to model-free reinforcement learning algorithms, they typically fail to generalize to system states beyond the training data, while often grounding their predictions on non-interpretable latent variables. We introduce Interactive Differentiable Simulation (IDS), a differentiable physics engine that allows for efficient, accurate inference of physical properties of rigid-body systems. Integrated into deep learning architectures, our model is able to accomplish system identification using visual input, leading to an interpretable model of the world whose parameters have physical meaning. We present experiments showing automatic task-based robot design and parameter estimation for nonlinear dynamical systems by automatically calculating gradients in IDS. When integrated into an adaptive model-predictive control algorithm, our approach exhibits orders of magnitude improvements in sample efficiency over model-free reinforcement learning algorithms on challenging nonlinear control domains.


1905.13547 2026-06-04 cs.LG cs.SY eess.SY math.DS math.OC stat.ML

Learning robust control for LQR systems with multiplicative noise via policy gradient

通过策略梯度学习具有乘性噪声的LQR系统的鲁棒控制

Benjamin Gravell, Peyman Mohajerin Esfahani, Tyler Summers

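Affiliations: Control, Optimization, and Networks lab, UT Dallas; Delft Center for Systems and Control, TU Delft

AI summary: This paper studies LQR systems with multiplicative noise and uses policy gradient methods to obtain robust control. It proves global convergence of policy gradient algorithms despite the non-convex cost function.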

AI Chinese abstract

线性二次调节（LQR）问题重新成为强化学习控制复杂动态系统的重要理论基准，特别是当状态和动作空间连续时。与几乎所有近期相关工作不同，我们考虑了乘性噪声模型，这些模型由于显式地纳入系统动态中的固有不确定性和变化，从而提高了控制器的鲁棒性。鲁棒性是强化学习中一个关键但理解不足的问题；现有不考虑不确定性的方法可能会收敛到脆弱的策略或完全无法收敛。此外，有意地将乘性噪声注入到学习算法中可以增强策略的鲁棒性，如在领域随机化中的非正式工作所观察到的。尽管策略梯度算法需要优化非凸成本函数，我们展示了乘性噪声LQR成本具有称为梯度支配的特殊性质，该性质被用来证明策略梯度算法在问题参数上具有多项式依赖性的全局收敛性，以达到全局最优控制策略。结果在已知模型和未知模型设置中均提供，其中系统轨迹样本用于估计策略梯度。

English abstract

The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimal control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.
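
A scalar toy problem illustrates the model-known setting. The sketch assumes dynamics x_{t+1} = (A + d_t) x_t + (B + g_t) u_t with independent zero-mean noises d_t, g_t and linear feedback u_t = -k x_t; all numerical values and function names are made up. It runs plain gradient descent on the exact cost and compares the result with the gain from a generalized Riccati iteration. It does not reproduce the paper's analysis or its sampled-gradient (model-unknown) algorithm.

```python
A, B = 0.9, 1.0            # nominal scalar dynamics (illustrative values)
SIG_A, SIG_B = 0.1, 0.1    # standard deviations of the multiplicative noises
Q, R = 1.0, 1.0            # state and input cost weights

def decay(k):
    """E[x_{t+1}^2] / x_t^2 under u = -k x; must be < 1 for a finite cost."""
    return (A - B * k) ** 2 + SIG_A ** 2 + (SIG_B * k) ** 2

def cost(k):
    """Infinite-horizon expected cost starting from x_0 = 1."""
    return (Q + R * k ** 2) / (1.0 - decay(k))

def cost_gradient(k):
    """Exact derivative of cost(k) with respect to the gain k."""
    rho = decay(k)
    d_rho = -2.0 * B * (A - B * k) + 2.0 * SIG_B ** 2 * k
    return (2.0 * R * k * (1.0 - rho) + (Q + R * k ** 2) * d_rho) / (1.0 - rho) ** 2

def policy_gradient(k0=0.1, step=0.02, n_iter=2000):
    """Gradient descent on the gain, started from a stabilizing k0."""
    k = k0
    for _ in range(n_iter):
        k -= step * cost_gradient(k)
    return k

def riccati_gain(n_iter=500):
    """Optimal gain from the generalized Riccati recursion, for comparison."""
    p = Q
    for _ in range(n_iter):
        gain = A * B * p / (R + (B ** 2 + SIG_B ** 2) * p)
        p = Q + (A ** 2 + SIG_A ** 2) * p - A * B * p * gain
    return A * B * p / (R + (B ** 2 + SIG_B ** 2) * p)

k_pg = policy_gradient()
k_ric = riccati_gain()
```

Although `cost` is not convex in `k`, gradient descent from a stabilizing gain reaches the Riccati solution here, which is the behavior the gradient domination property explains in general.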


1602.04450 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics

具有安全约束的贝叶斯优化：机器人中的安全自动参数调节

Felix Berkenkamp, Andreas Krause, Angela P. Schoellig

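Affiliations: 1 Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, Switzerland. 2 Dynamic Systems Lab, Institute for Aerospace Studies, University of Toronto, Canada.

AI summary: This paper presents a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance but only evaluates parameters that satisfy all safety constraints with high probability. It explores the parameter space carefully by exploiting regularity assumptions expressed as a Gaussian process prior, and shows how context variables can be used to safely transfer knowledge to new tasks.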

AI Chinese abstract

机器人算法通常依赖于各种参数，这些参数的选择对机器人的性能有显著影响。虽然初始参数猜测可以从机器人的动态模型中获得，但通常需要在真实系统上手动调整参数以达到最佳性能。优化算法，如贝叶斯优化，已被用来自动化这一过程。然而，这些方法在优化过程中可能会评估不安全的参数，导致安全关键系统的故障。最近，一种称为SafeOpt的安全贝叶斯优化算法已被开发，该算法保证系统性能永远不会低于临界值；即，安全性是基于性能函数定义的。然而，在机器人中，将性能和安全性结合往往并不理想。例如，高增益控制器可能实现低平均跟踪误差（性能），但可能会超调并违反输入约束。在本文中，我们提出了一种通用算法，允许在目标函数之外存在多个独立的安全约束。给定初始的安全参数集，该算法最大化性能，但只评估满足所有约束的参数，以高概率。为此，它通过利用高斯过程先验的正则性假设来仔细探索参数空间。此外，我们展示了如何利用上下文变量安全地将知识转移到新情况和任务中。我们提供了理论分析，并证明所提出的算法能够实现快速、自动和安全的参数调节优化，在四旋翼飞行器的实验中得到了验证。

English abstract

Robotic algorithms typically depend on various parameters, the choice of which significantly affects the robot's performance. While an initial guess for the parameters may be obtained from dynamic models of the robot, parameters are usually tuned manually on the real system to achieve the best performance. Optimization algorithms, such as Bayesian optimization, have been used to automate this process. However, these methods may evaluate unsafe parameters during the optimization process that lead to safety-critical system failures. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined based on the performance function. However, coupling performance and safety is often not desirable in robotics. For example, high-gain controllers might achieve low average tracking error (performance), but can overshoot and violate input constraints. In this paper, we present a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance but only evaluates parameters that satisfy safety for all constraints with high probability. To this end, it carefully explores the parameter space by exploiting regularity assumptions in terms of a Gaussian process prior. Moreover, we show how context variables can be used to safely transfer knowledge to new situations and tasks. We provide a theoretical analysis and demonstrate that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.


1812.11137 2026-06-04 cs.LG cs.SY eess.SY math.OC stat.ML

Differential Temporal Difference Learning

差分时间差分学习

Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

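Affiliations: Department of Electrical and Computer Engineering, University of Florida; Department of Engineering, University of Cambridge

AI summary: This paper proposes a new class of differential temporal difference learning algorithms. They target two difficulties of conventional TD learning: slow convergence, and the fact that consistent algorithms for computing the relative value function exist only in special cases.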

Comments Preliminary versions of some of the results in this article were submitted as arXiv:1604.01828

AI Chinese abstract

由马尔可夫决策过程导出的价值函数，在机器学习技术的许多统计和工程应用中，是算法和性能指标的核心组成部分。在大多数实际情况下，计算相关贝尔曼方程的解具有挑战性。一类流行的近似技术，即时间差分（TD）学习算法，是通用强化学习方法的重要子类。本文介绍的算法旨在解决TD学习方法的两个众所周知的难题：由于方差非常高而导致的收敛缓慢，以及在计算相对价值函数的问题中，一致的算法仅在特殊情况下存在。首先，我们表明这些价值函数的梯度具有适合用于算法设计的表示形式。基于这一结果，引入了一类新的差分TD学习算法。对于欧几里得空间上具有光滑动力学的马尔可夫模型，这些算法在一般条件下被证明是一致的。数值结果表明，与标准方法相比，方差显著降低。

English abstract

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as Temporal Difference (TD) learning algorithms, is an important sub-class of general reinforcement learning methods. The algorithms introduced in this paper are intended to resolve two well-known difficulties of TD-learning approaches: their slow convergence due to very high variance, and the fact that, for the problem of computing the relative value function, consistent algorithms exist only in special cases. First we show that the gradients of these value functions admit a representation that lends itself to algorithm design. Based on this result, a new class of differential TD-learning algorithms is introduced. For Markovian models on Euclidean space with smooth dynamics, the algorithms are shown to be consistent under general conditions. Numerical results show dramatic variance reduction when compared to standard methods.
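
For reference, the standard method that the abstract compares against can be sketched as tabular TD(0) on a small Markov chain. This is the textbook baseline, not the differential TD algorithm of the paper. The two-state chain, rewards, discount factor, and step-size schedule are all made-up values.

```python
import numpy as np

def td0(P, r, gamma, n_steps=100_000, seed=0):
    """Tabular TD(0) estimate of the discounted value function of a Markov chain."""
    rng = np.random.default_rng(seed)
    n_states = len(r)
    v = np.zeros(n_states)
    visits = np.zeros(n_states)
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        visits[s] += 1
        alpha = 20.0 / (100.0 + visits[s])                 # decaying step size (assumed)
        v[s] += alpha * (r[s] + gamma * v[s_next] - v[s])  # TD(0) update
        s = s_next
    return v

P = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition matrix of a toy two-state chain
r = np.array([1.0, 0.0])                 # reward collected in each state
gamma = 0.9

v_td = td0(P, r, gamma)
v_exact = np.linalg.solve(np.eye(2) - gamma * P, r)  # solves the Bellman equation exactly
```

The gap between `v_td` and `v_exact` reflects the variance of the sampled updates, which is the quantity the differential TD algorithms aim to reduce.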


1407.0221 2026-06-04 cs.CV cs.NA math.NA

Imaging with Kantorovich-Rubinstein discrepancy

基于Kantorovich-Rubinstein偏差的成像

Jan Lellmann, Dirk A. Lorenz, Carola Schönlieb, Tuomo Valkonen

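Affiliations: Department for Applied Mathematics and Theoretical Physics, University of Cambridge; Institute for Analysis and Algebra, TU Braunschweig; Center for Mathematical Modeling (Modemat), EPN Quito

AI summary: This paper applies the Kantorovich-Rubinstein norm from optimal transport to imaging problems. It proposes a variational regularization model that combines a Kantorovich-Rubinstein discrepancy term with total variation regularization for image denoising and cartoon-texture decomposition, relates the model to other approaches, and shows that the optimization problem can be cast as a convex-concave saddle point problem and solved with standard tools.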

1801.03800 2026-06-04 cs.CV cs.NA math.NA

Cortical-inspired image reconstruction via sub-Riemannian geometry and hypoelliptic diffusion

基于子黎曼几何和亚椭圆扩散的皮层启发式图像重建

Ugo Boscain, Roman Chertovskih, Jean-Paul Gauthier, Dario Prandi, Alexey Remizov

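Affiliations: CNRS, LJLL, Université Pierre et Marie Curie, Paris, France; SYSTEC, FEUP, University of Porto, Portugal

AI summary: Building on a mathematical model of the primary visual cortex, this paper presents image inpainting algorithms based on hypoelliptic diffusion. One algorithm does not use information about where the image is corrupted, while the other does. The latter reaches state-of-the-art inpainting results, which is interpreted as validating that the visual cortex encodes the first type of algorithm.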

Journal ref ESAIM: ProcS. 64 (2018), pp. 37-53

1808.08252 2026-06-04 cs.RO cs.SY eess.SY

Inverse Statics Optimization for Compound Tensegrity Robots

复合张力结构机器人的逆静态优化

Andrew P. Sabelhaus, Albert H. Li, Kimberly A. Sover, Jacob Madden, Andrew Barkan, Adrian K. Agogino, Alice M. Agogino

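Affiliations: Department of Mechanical Engineering, University of California Berkeley; Intelligent Systems Division, NASA Ames Research Center

AI summary: This paper presents a method for calculating the cable tensions that hold a compound tensegrity robot in static equilibrium. The method reformulates the force density method and solves for the cable tensions as a quadratic optimization problem. Simulations and a hardware experiment validate its use in the design and control of a spine robot and a quadruped robot.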

AI Chinese abstract

由电缆驱动的张力结构（张力-完整性）机器人具有软机器人许多优点，如灵活性和鲁棒性，同时仍遵循简单的静态和动态模型。然而，现有的张力结构建模方法无法原生描述具有任意刚体的张力网络的机器人。本文提出了一种方法，用于计算此类张力结构机器人的电缆张力，这里定义为复合张力结构。首先，将复合张力结构机器人的静态平衡模型重新表述为用于其他张力结构的标准力密度法。接下来，我们提出了在所提出模型下计算机器人电缆张力的问题。提出了解决方案作为带有实际约束的二次优化问题。仿真展示了如何利用该逆静态优化问题来设计和控制两种不同的复合张力结构应用：由该脊柱制成的脊柱机器人和四足机器人。最后，通过硬件实验验证了逆静态模型的准确性，证明了使用所提出方法进行低误差开环控制的可行性。

English abstract

Robots built from cable-driven tensegrity ("tension-integrity") structures have many of the advantages of soft robots, such as flexibility and robustness, while still obeying simple statics and dynamics models. However, existing tensegrity modeling approaches cannot natively describe robots with arbitrary rigid bodies in their tension network. This work presents a method to calculate the cable tensions in static equilibrium for such tensegrity robots, here defined as compound tensegrity. First, a static equilibrium model for compound tensegrity robots is reformulated from the standard force density method used with other tensegrity structures. Next, we pose the problem of calculating tension forces in the robot's cables under our proposed model. A solution is proposed as a quadratic optimization problem with practical constraints. Simulations illustrate how this inverse statics optimization problem can be used for both the design and control of two different compound tensegrity applications: a spine robot and a quadruped robot built from that spine. Finally, we verify the accuracy of the inverse statics model through a hardware experiment, demonstrating the feasibility of low-error open-loop control using our proposed methodology.


1705.05415 2026-06-04 cs.RO cs.SY eess.SY

Robotic Wireless Sensor Networks

机器人无线传感器网络

Pradipta Ghosh, Andrea Gasparri, Jiong Jin, Bhaskar Krishnamachari

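Affiliations: University of Southern California; Università degli studi "Roma Tre"; Swinburne University of Technology

AI summary: This chapter surveys the emerging research area at the intersection of robotics and wireless sensor networks. It discusses cooperative control, learning, and adaptation methods that let multi-robot systems achieve sensing goals while meeting communication performance requirements, and it identifies gaps in the existing literature and directions for future research.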

DOI: 10.1007/978-3-319-92384-0_16

AI Chinese abstract

在本章中，我们介绍了机器人与无线传感器网络（WSN）交叉领域的新兴、前沿且跨学科的研究领域，称为机器人无线传感器网络（RWSN）。我们定义RWSN为一种自主的多机器人网络系统，旨在通过协同控制、学习和适应，实现特定的传感目标，同时满足和维持一定的通信性能要求。尽管机器人和WSN这两个领域都非常知名且研究充分，但这两个领域交叉处存在大量新的机会和研究方向，这些方向要么相对未被探索，要么完全未被探索。例如，使用一组机器人路由器来建立发送方和接收方之间的临时通信路径，利用受控的移动性优势来改进数据路由。我们发现，直接归类为RWSN相关研究的文献数量非常有限，而机器人和WSN文献中存在一些相关于这一新研究领域的文章。为了连接这些点，我们首先识别了与RWSN相关的核心问题和研究趋势，如连通性、定位、路由和信息的鲁棒流动。接着，我们根据第一步中识别的问题和趋势，将现有的RWSN研究以及机器人和WSN社区的相关最新进展进行分类。最后，我们分析现有文献中缺失的部分，并确定未来需要更多研究关注的主题。

English abstract

In this chapter, we present a literature survey of an emerging, cutting-edge, and multi-disciplinary field of research at the intersection of Robotics and Wireless Sensor Networks (WSN) which we refer to as Robotic Wireless Sensor Networks (RWSN). We define an RWSN as an autonomous networked multi-robot system that aims to achieve certain sensing goals while meeting and maintaining certain communication performance requirements, through cooperative control, learning and adaptation. While both of the component areas, i.e., Robotics and WSN, are very well-known and well-explored, there exists a whole set of new opportunities and research directions at the intersection of these two fields which are relatively or even completely unexplored. One such example would be the use of a set of robotic routers to set up a temporary communication path between a sender and a receiver that uses the controlled mobility to the advantage of packet routing. We find that only a limited number of articles can be directly categorized as RWSN-related works, whereas a range of articles in the robotics and the WSN literature are also relevant to this new field of research. To connect the dots, we first identify the core problems and research trends related to RWSN such as connectivity, localization, routing, and robust flow of information. Next, we classify the existing research on RWSN as well as the relevant state of the art from the robotics and WSN communities according to the problems and trends identified in the first step. Lastly, we analyze what is missing in the existing literature, and identify topics that require more research attention in the future.


1905.08314 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Longitudinal Dynamic versus Kinematic Models for Car-Following Control Using Deep Reinforcement Learning

纵向动态模型与运动学模型在使用深度强化学习的汽车跟随控制中的比较

Yuan Lin, John McPhee, Nasser L. Azad

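Affiliations: University of Waterloo, Ontario, Canada

AI summary: This paper studies longitudinal car-following control with deep reinforcement learning when vehicle dynamics are taken into account. It redesigns the DRL framework by adding the delayed control inputs and the actual vehicle acceleration to the reinforcement learning environment state, and the resulting controller achieves near-optimal control performance with vehicle dynamics considered.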

Comments Accepted to 2019 IEEE Intelligent Transportation Systems Conference

DOI: 10.1109/ITSC.2019.8916781

AI Chinese abstract

目前大多数关于通过深度强化学习（DRL）实现自动驾驶车辆控制的研究都使用点质量运动学模型，忽略了车辆动力学，包括加速度延迟和加速度命令动力学。加速度延迟源于传感和执行延迟，导致控制输入执行延迟。加速度命令动力学决定了实际车辆加速度不会立即达到期望的命令加速度，因为存在动力学限制。在本工作中，我们研究了将使用车辆运动学模型训练的DRL控制器应用于更现实的驾驶控制中的可行性。我们考虑了一个特定的纵向汽车跟随控制问题，即自适应巡航控制系统（ACC），该问题通过使用点质量运动学模型的DRL解决。当此类控制器应用于具有车辆动力学的汽车跟随时，我们观察到显著退化的汽车跟随性能。因此，我们重新设计DRL框架，通过将延迟的控制输入和实际车辆加速度分别添加到强化学习环境状态中，以适应加速度延迟和加速度命令动力学。训练结果表明，改进后的DRL控制器在考虑车辆动力学时的汽车跟随控制性能接近最优，与动态规划解决方案相比。

English abstract

The majority of current studies on autonomous vehicle control via deep reinforcement learning (DRL) utilize point-mass kinematic models, neglecting vehicle dynamics, which include acceleration delay and acceleration command dynamics. The acceleration delay, which results from sensing and actuation delays, results in delayed execution of the control inputs. The acceleration command dynamics dictate that the actual vehicle acceleration does not rise up to the desired command acceleration instantaneously due to dynamics. In this work, we investigate the feasibility of applying DRL controllers trained using vehicle kinematic models to more realistic driving control with vehicle dynamics. We consider a particular longitudinal car-following control problem, i.e., Adaptive Cruise Control (ACC), solved via DRL using a point-mass kinematic model. When such a controller is applied to car following with vehicle dynamics, we observe significantly degraded car-following performance. Therefore, we redesign the DRL framework to accommodate the acceleration delay and acceleration command dynamics by adding the delayed control inputs and the actual vehicle acceleration to the reinforcement learning environment state, respectively. The training results show that the redesigned DRL controller results in near-optimal control performance of car following with vehicle dynamics considered when compared with dynamic programming solutions.
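
The two effects named in the abstract can be illustrated with a small simulation. The sketch below models acceleration delay as a fixed number of time steps and acceleration command dynamics as a first-order lag. Both modeling choices, the parameter values, and the function name `simulate_acceleration` are assumptions, since the abstract does not give the equations.

```python
from collections import deque

def simulate_acceleration(commands, dt=0.1, delay_steps=2, time_constant=0.5):
    """Return the actual accelerations produced by a sequence of commanded ones.

    delay_steps:   number of time steps before a command starts to take effect.
    time_constant: first-order lag constant of the acceleration response (seconds).
    """
    buffer = deque([0.0] * delay_steps)   # commands that are still "in flight"
    accel = 0.0
    history = []
    for u in commands:
        buffer.append(u)
        delayed_u = buffer.popleft()      # command issued delay_steps steps ago
        accel += dt / time_constant * (delayed_u - accel)  # first-order lag update
        history.append(accel)
    return history

commands = [1.0] * 100                    # constant 1 m/s^2 command
history = simulate_acceleration(commands)
```

A controller that sees only gap and relative speed cannot tell which commands are still pending. Appending the contents of `buffer` and the current `accel` to the state, as the abstract describes, restores that information.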


1803.00204 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML

Scalar Quantization as Sparse Least Square Optimization

标量量化作为稀疏最小二乘优化

Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

Affiliations: College of Electrical Engineering, Sichuan University; Department of Computer Science, Rutgers University-New Brunswick; Engineering Practice Center, Chengdu University of Information Technology

AI summary: This paper proposes a new approach to scalar quantization based on sparse least square optimization. By introducing l1, l1 + l2, and l0 regularization, it addresses shortcomings of conventional clustering-based methods and improves performance in bit-width reduction scenarios.

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

DOI: 10.1109/TPAMI.2019.2952096

AI Chinese abstract

量化可以用来形成具有共享值的新向量/矩阵，其值接近原始数据。近年来，标量量化在值共享应用中的普及度迅速上升，因为它在减少神经网络复杂度方面具有巨大实用性。现有的基于聚类的量化技术虽然发展成熟，但存在多个缺点，包括对随机种子的依赖性、空集群或超出范围的集群，以及大量集群时的时间复杂度高。为克服这些问题，本文从新的视角研究标量量化问题，即稀疏最小二乘优化。具体来说，受稀疏最小二乘回归性质的启发，提出了几种基于l1最小二乘的量化算法。此外，还提出了类似的方案，具有l1 + l2和l0正则化。此外，为了计算给定数量的值/集群的量化结果，本文设计了一种迭代方法和一种基于聚类的方法，并且两者都建立在稀疏最小二乘之上。本文表明，后者方法在数学上等价于改进版的k-means聚类基量化算法，尽管两种算法起源于不同的直觉。所提出的算法在三种类型的数据上进行了测试，比较和分析了其计算性能，包括信息损失、时间消耗以及稀疏向量值的分布。本文为量化领域提供了新的视角，所提出的算法在某些位宽缩减场景下表现优异，当所需的量化后分辨率（值的数量）不显著低于原始数量时尤其如此。

English abstract

Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has been soaring, as it has been found to be highly useful for reducing the complexity of neural networks. Existing clustering-based quantization techniques, while being well-developed, have multiple drawbacks including dependence on the random seed, empty or out-of-range clusters, and high time complexity for a large number of clusters. To overcome these problems, in this paper, the problem of scalar quantization is examined from a new perspective, namely sparse least square optimization. Specifically, inspired by the property of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization are proposed. Furthermore, to compute quantization results with a given number of values/clusters, this paper designs an iterative method and a clustering-based method, both built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of the k-means clustering-based quantization algorithm, although the two algorithms originated from different intuitions. The proposed algorithms were tested with three types of data, and their computational performance, including information loss, time consumption, and the distribution of the values of the sparse vectors, was compared and analyzed. The paper offers a new perspective to probe the area of quantization, and the proposed algorithms can outperform existing methods, especially under some bit-width reduction scenarios, when the required post-quantization resolution (number of values) is not significantly lower than the original number.
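
The clustering-based baseline that the abstract contrasts with can be sketched as one-dimensional k-means (Lloyd's algorithm). This is the conventional technique, not the sparse least square method of the paper. The function name and the deterministic quantile initialization, used here to sidestep the random-seed dependence the abstract mentions, are assumptions.

```python
import numpy as np

def kmeans_quantize(values, n_levels, n_iter=100):
    """Quantize a 1-D array to n_levels shared values with Lloyd's algorithm."""
    values = np.asarray(values, float)
    # Deterministic initialization at evenly spaced quantiles (an assumption).
    centers = np.quantile(values, np.linspace(0.0, 1.0, n_levels))
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(n_levels):
            members = values[labels == j]
            if members.size:              # an empty cluster keeps its old center
                centers[j] = members.mean()
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return centers[labels], centers

weights = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])   # made-up values to quantize
quantized, centers = kmeans_quantize(weights, n_levels=2)
```

Each entry of `quantized` is replaced by its cluster center, so the array can be stored as a short codebook plus small integer indices.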


Robotic Wireless Sensor Networks

Longitudinal Dynamic versus Kinematic Models for Car-Following Control Using Deep Reinforcement Learning

Scalar Quantization as Sparse Least Square Optimization