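arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.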

2404.11309 2026-06-04 cs.CV

Achieving Rotation-Invariant Convolution via Non-Learnable Orientation Alignment Operators

Hanlin Mo, Peihong Lei, You Hao, Guoying Zhao

Affiliations: University of Science and Technology of China

AI summary: Proposes rotation-invariant convolutions (RIConvs) built on non-learnable operators. They have the same number of learnable parameters as standard convolutions and a similar computational process, and they improve accuracy on several vision tasks, most notably when training data is limited.

Abstract

Achieving rotational invariance in deep neural networks without data augmentation is a research hotspot. Intrinsic invariance enables features to capture targets' inherent properties, enhancing deep learning performance in visual tasks. Based on various types of non-learnable operators, this paper proposes a comprehensive set of convolution operations that are naturally invariant to arbitrary rotations. Unlike most prior methods, these rotation-invariant convolutions (RIConvs) have the same number of learnable parameters and a similar computational process as standard convolutions, making them interchangeable. Using the MNIST-Rot dataset, we validate their invariance across rotation angles and compare them with previous rotation-invariant CNNs, where two gradient-based RIConvs achieve state-of-the-art results. Then, we integrate RIConvs with classic CNN backbones and evaluate them on texture recognition, aircraft type recognition, and remote sensing image classification tasks. Results show that RIConvs significantly improve accuracy, particularly with limited training data, and enhance performance even with data augmentation.
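
The idea of aligning a patch with a non-learnable operator before applying ordinary weights can be illustrated with a toy sketch. This is not one of the paper's RIConv operators: the alignment rule (start the 8-neighbour ring at its largest value), the function name `aligned_response`, and the weights are all assumptions chosen for illustration, and the result is only invariant to rotations by multiples of 90 degrees, not to arbitrary rotations.

```python
import numpy as np

# Clockwise order of the 8 neighbours around the centre of a 3x3 patch.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def aligned_response(patch, w_center, w_ring):
    """Weighted sum of a 3x3 patch after a non-learnable alignment step."""
    ring = np.array([patch[i, j] for i, j in RING], dtype=float)
    # Alignment (hypothetical operator): start the ring at its largest entry.
    ring = np.roll(ring, -int(np.argmax(ring)))
    return w_center * patch[1, 1] + float(np.dot(w_ring, ring))

rng = np.random.default_rng(0)
w_center, w_ring = 0.5, rng.normal(size=8)   # 9 weights, as in a 3x3 kernel
patch = np.arange(9, dtype=float).reshape(3, 3)

# Rotating the patch by 90 degrees only shifts the ring cyclically,
# so the aligned ring, and hence the response, is unchanged.
responses = [aligned_response(np.rot90(patch, k), w_center, w_ring) for k in range(4)]
```

The sketch keeps the parameter count of a standard 3x3 kernel, which mirrors the abstract's point that the invariance comes from the fixed alignment step rather than from extra learnable weights.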


1905.04235 2026-06-04 cs.RO cs.SY eess.SY

Autonomous Locomotion Mode Transition in Quadruped Track-Legged Robots: A Simulation-Based Analysis for Step Negotiation

Jie Wang, Krispin Davies

Affiliations: University of Cambridge; ClearPath AI

AI summary: Presents a method for autonomous locomotion-mode switching in a quadruped hybrid robot, particularly when negotiating steps of different heights, using an energy-efficiency evaluation mechanism to achieve smooth transitions.

DOI: 10.1016/j.simpat.2024.102893

Abstract

Hybrid track/wheel-legged robots combine the advantages of wheel-based and leg-based locomotion, granting adaptability across varied terrains through efficient transitions between rolling and walking modes. However, automating these transitions remains a significant challenge. In this paper, we introduce a method designed for autonomous mode transition in a quadruped hybrid robot with a track/wheel-legged configuration, especially during step negotiation. Our approach hinges on a decision-making mechanism that evaluates the energy efficiency of both locomotion modes using a proposed energy-based criterion. To guarantee a smooth negotiation of steps, we incorporate two climbing gaits designated for the assessment of energy usage in walking locomotion. Simulation results validate the method's effectiveness, showing successful autonomous transitions across steps of diverse heights. Our suggested approach has universal applicability and can be modified to suit other hybrid robots of similar mechanical configuration, provided their locomotion energy performance is studied beforehand.


2402.02555 2026-06-04 cs.CV cs.CL

High-Quality Entity Segmentation and Grounding

Lu Qi, Yi-Wen Chen, Tao Zhang, Xiangtai Li, Xu Yang, Bo Du, Ming-Hsuan Yang

Affiliations: Wuhan University; Insta360 Research; Department of EECS, University of California, Merced; Nanyang Technological University; Institute of Automation of the Chinese Academy of Sciences

AI summary: Proposes the ESG pipeline, which combines a new dataset, EntitySeg, with a two-stage decoupled design (CropFormer for high-quality segmentation plus GELLA for accurate noun extraction and semantic matching) to deliver high-quality entity segmentation and grounding, and is shown to be effective on five tasks.

Abstract

In this work, we propose ESG, a pipeline for high-quality entity segmentation and grounding supported by a new dataset, EntitySeg. First, the proposed dataset, named EntitySeg, contains images spanning various image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. Second, ESG mainly consists of two modules: CropFormer for high-quality entity segmentation, and GELLA for accurate noun extraction from sentences and semantic matching between language and visual regions. Unlike existing grounding methods that jointly train a segmentation model and a large language model, ESG adopts a two-stage decoupled design, preserving high-quality masks and grounding robustness without the trade-offs often introduced by joint training. CropFormer ensures high-quality entity segmentation results, which can then be encoded into the GELLA model for effective grounding. Extensive experimental results demonstrate the effectiveness of our proposed pipeline across five tasks, including entity segmentation, panoptic segmentation, open-vocabulary segmentation, referring segmentation, and panoptic localized narratives. Furthermore, the GELLA module of the ESG pipeline is highly flexible and capable of processing mask inputs from any segmentation framework, thanks to its lightweight colormap/vision encoder, language/mask decoder, and association module. The entity segmentation dataset and grounding code will be released at https://github.com/qqlu/Entity.


2209.15448 2026-06-04 cs.LG math.ST stat.ME stat.TH

Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments

Jiayi Wang, Zhengling Qi, Chengchun Shi

Affiliations: Department of Mathematical Sciences, University of Texas at Dallas; Department of Statistics, London School of Economics and Political Science; Department of Decision Sciences, George Washington University

AI summary: Proposes super policy learning, which uses the actions observed in human-AI interaction as input. Under unmeasured confounding, proximal causal inference is used to obtain a super-policy that outperforms both the standard optimal policy and the behavior policy.

Abstract

As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super policy learning that takes advantage of human-AI interaction for data-driven sequential decision making. This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In the decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information for the policy search in a novel and legitimate manner, the proposed super policy learning will yield a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior one (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies using the batch data, a number of nonparametric and causal identifications are established under the framework of proximal causal inference. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties such as finite-sample regret guarantee. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.


1410.6333 2026-06-04 cs.CV cs.NA math.NA

A Regularization Approach to Blind Deblurring and Denoising of QR Barcodes

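Yves van Gennip, Prashant Athavale, Jérôme Gilles, Rustum Choksi

Affiliations: Fields Institute, University of Toronto

AI summary: Presents a purely regularization-based method for blind deblurring and denoising of QR barcodes in the presence of noise, exploiting the known required patterns and the availability of open-source barcode readers.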

Comments 14 pages, 19 figures (with a total of 57 subfigures), 1 table; v3: previously missing reference [35] added

1808.03408 2026-06-04 cs.LG cs.NA math.NA math.OC stat.ML

A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration

Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

Affiliations: JD Explore Academy, Beijing, China; Facebook, USA; Meituan, Beijing, China; University of Minnesota, Twin Cities, USA; Tencent, Shenzhen, China

AI summary: Proposes AdaUSM, a weighted AdaGrad algorithm that combines a unified momentum scheme with a novel weighted adaptive learning rate, achieves an O(log(T)/√T) convergence rate in the non-convex stochastic setting, and offers a new perspective on the adaptive learning rates of Adam and RMSProp.

Comments IEEE TNNLS

Abstract

Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficiently accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, AccAdaGrad, etc. In spite of their effectiveness in practice, there is still a large gap in their convergence theory, especially in the difficult non-convex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, whose main characteristics are that (1) it incorporates a unified momentum scheme which covers both the heavy ball momentum and the Nesterov accelerated gradient momentum; (2) it adopts a novel weighted adaptive learning rate that can unify the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when we take polynomially growing weights in AdaUSM, we obtain its $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the non-convex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp. Lastly, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also carried out.
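
The claim that Adam-style learning rates correspond to exponentially growing weights can be checked numerically on the second-moment estimate alone. The sketch below is a simplified illustration, not the AdaUSM update itself: it assumes a weighted running average of squared gradients with weights a_t, and the function names and the toy gradient sequence are hypothetical. With a_t = beta**(-t), the weighted average coincides with the bias-corrected exponential moving average used by Adam (RMSProp uses the same moving average without bias correction).

```python
import numpy as np

def weighted_second_moment(grads, weights):
    """v_t = sum_i a_i * g_i**2 / sum_i a_i over the gradients seen so far."""
    g2 = np.asarray(grads, dtype=float) ** 2
    a = np.asarray(weights, dtype=float)
    return np.cumsum(a * g2) / np.cumsum(a)

def ema_second_moment(grads, beta):
    """Bias-corrected exponential moving average of g**2 (Adam style)."""
    v, out = 0.0, []
    for t, g in enumerate(grads, start=1):
        v = beta * v + (1.0 - beta) * g * g
        out.append(v / (1.0 - beta ** t))
    return np.array(out)

grads = np.array([0.5, -1.0, 2.0, 0.1, -0.3])   # toy scalar gradients
t = np.arange(1, len(grads) + 1)
beta = 0.9

# Uniform weights give the running mean of g**2 (AdaGrad's sum divided by t).
uniform = weighted_second_moment(grads, np.ones_like(grads))
# Exponentially growing weights a_t = beta**(-t) reproduce the EMA.
exponential = weighted_second_moment(grads, beta ** (-t))
ema = ema_second_moment(grads, beta)
```

Comparing `exponential` with `ema` shows the two sequences agree up to floating-point error, which is the weighting view of Adam and RMSProp described in the abstract.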


1709.09480 2026-06-04 cs.AI cs.LG cs.SY eess.SY

A Benchmark Environment Motivated by Industrial Control Problems

Daniel Hein, Stefan Depeweg, Michel Tokic, Steffen Udluft, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

Affiliations: Siemens AG, Corporate Technology

AI summary: Presents a benchmark environment motivated by industrial control problems, intended to close the gap between real industrial settings and existing artificial benchmarks. The paper describes the benchmark dynamics in detail and identifies prototypical experimental settings to support the improvement of reinforcement learning methods.

DOI: 10.1109/SSCI.2017.8280935
Journal ref: 2017 IEEE Symposium Series on Computational Intelligence (SSCI)

Abstract

In the research area of reinforcement learning (RL), novel and promising methods are frequently developed and introduced to the RL community. However, although many researchers are keen to apply their methods to real-world problems, implementing such methods in real industry environments is often a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated, and compared using artificial software benchmarks. On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand. On the other hand, they usually do not share much similarity with industrial real-world applications. For this reason, we used our industry experience to design a benchmark which bridges the gap between freely available, documented, and motivated artificial benchmarks and properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub. In this paper, we motivate and describe in detail the IB's dynamics and identify prototypic experimental settings that capture common situations in real-world industry control problems.


1809.03225 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Gait learning for soft microrobots controlled by light fields

Alexander von Rohr, Sebastian Trimpe, Alonso Marco, Peer Fischer, Stefano Palagi

Affiliations: Micro, Nano, and Molecular Systems Group, Max Planck Institute for Intelligent Systems; Max Planck ETH Center for Learning Systems

AI summary: Proposes a probabilistic learning approach based on Bayesian optimization and Gaussian processes for optimizing the gaits of light-controlled soft microrobots, yielding data-efficient and robust improvements in locomotion performance within a limited experimental budget.

Comments 8 pages, 7 figures, to appear in the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 2018

DOI: 10.1109/IROS.2018.8594092

Abstract

Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing conditions. However, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semi-synthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot's locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control.
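
A generic Bayesian optimization loop with a Gaussian process surrogate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' learning scheme: the one-dimensional gait parameter, the `toy_performance` function standing in for a measured locomotion score, the squared-exponential kernel with a fixed length scale, and the upper-confidence-bound rule are all hypothetical choices. Only the budget of 20 evaluations echoes the abstract.

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_query)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_performance(x):
    """Hypothetical stand-in for a measured locomotion score (peak at x = 0.7)."""
    return np.exp(-((x - 0.7) ** 2) / 0.02)

grid = np.linspace(0.0, 1.0, 201)     # candidate gait parameters
x_obs = np.array([0.1, 0.5, 0.9])     # three initial experiments
y_obs = toy_performance(x_obs)

for _ in range(17):                   # 3 + 17 = 20 experiments in total
    mean, std = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(mean + 2.0 * std)]   # upper-confidence-bound rule
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_performance(x_next))

best_x = x_obs[np.argmax(y_obs)]      # best gait parameter found within budget
```

The surrogate's uncertainty term is what keeps the number of experiments small: each new test is placed where the model is either promising or still uncertain, rather than on a dense sweep of the parameter range.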


1811.04333 2026-06-04 cs.RO cs.FL cs.SY eess.SY

Reactive Task and Motion Planning for Robust Whole-Body Dynamic Locomotion in Constrained Environments

Ye Zhao, Yinan Li, Luis Sentis, Ufuk Topcu, Jun Liu

Affiliations: George W. Woodruff School of Mechanical Engineering, Georgia Tech, USA; Department of Applied Mathematics, University of Waterloo, Canada; Department of Aerospace Engineering and Engineering Mechanics, UT Austin, USA; Institute for Computational Engineering and Sciences, UT Austin, USA

AI summary: Proposes a temporal-logic game framework for task planning and control of whole-body dynamic locomotion in constrained and dynamically changing environments, using methods from symbolic systems to guarantee the correctness of the resulting locomotion behaviors.

Comments 49 pages, 23 figures, 1 table

Abstract

Contact-based decision and planning methods are becoming increasingly important to endow higher levels of autonomy for legged robots. Formal synthesis methods derived from symbolic systems have great potential for reasoning about high-level locomotion decisions and achieving complex maneuvering behaviors with correctness guarantees. This study takes a first step toward formally devising an architecture composed of task planning and control of whole-body dynamic locomotion behaviors in constrained and dynamically changing environments. At the high level, we formulate a two-player temporal logic game between the multi-limb locomotion planner and its dynamic environment to synthesize a winning strategy that delivers symbolic locomotion actions. These locomotion actions satisfy the desired high-level task specifications expressed in a fragment of temporal logic. Those actions are sent to a robust finite transition system that synthesizes a locomotion controller that fulfills state reachability constraints. This controller is further executed via a low-level motion planner that generates feasible locomotion trajectories. We construct a set of dynamic locomotion models for legged robots to serve as a template library for handling diverse environmental events. We devise a replanning strategy that takes into consideration sudden environmental changes or large state disturbances to increase the robustness of the resulting locomotion behaviors. We formally prove the correctness of the layered locomotion framework guaranteeing a robust implementation by the motion planning layer. Simulations of reactive locomotion behaviors in diverse environments indicate that our framework has the potential to serve as a theoretical foundation for intelligent locomotion behaviors.


1710.05465 2026-06-04 cs.AI cs.RO cs.SY eess.SY

Flow: A Modular Learning Framework for Mixed Autonomy Traffic

Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre M Bayen

Affiliations: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology; Institute of Data, Systems, and Society, Massachusetts Institute of Technology; Department of Mechanical Engineering, University of California, Berkeley

AI summary: Presents a modular learning framework that uses deep reinforcement learning to handle complex traffic dynamics. The learned control laws improve system-level velocity by up to 57% over human driving with only 4-7% adoption of autonomous vehicles. In single-lane traffic, a small neural network control law using only local observations eliminates stop-and-go traffic and achieves near-optimal performance.

Comments 17 pages, 8 figures, 5 tables. 2021 IEEE Transactions on Robotics (T-RO)

DOI: 10.1109/TRO.2021.3087314

Abstract

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, the progression of these impacts, as AVs are adopted, is not well understood. Numerous technical challenges arise from the goal of analyzing the partial adoption of autonomy: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks. To shed light on near-term AV impacts, this article studies the suitability of deep reinforcement learning (RL) for overcoming these challenges in a low AV-adoption regime. A modular learning framework is presented, which leverages deep RL to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (stop-and-go traffic jams, lane changing, intersections). Learned control laws are found to improve upon human driving performance, in terms of system-level velocity, by up to 57% with only 4-7% adoption of AVs. Furthermore, in single-lane traffic, a small neural network control law with only local observation is found to eliminate stop-and-go traffic, surpassing all known model-based controllers to achieve near-optimal performance, and to generalize to out-of-distribution traffic densities.


1803.07696 2026-06-04 cs.RO cs.SY eess.SY

Inverse Optimal Control from Incomplete Trajectory Observations

Wanxin Jin, Dana Kulić, Shaoshuai Mou, Sandra Hirche

Affiliations: School of Aeronautics and Astronautics, Purdue University; Monash University; Chair of Information-oriented Control, Technical University of Munich

AI summary: Presents a method for learning the objective function of an optimal control system from incomplete trajectory observations. A recovery matrix relates the observed trajectory segment to the weights of the candidate features, and an incremental inverse optimal control algorithm is developed.

Comments Codes: https://github.com/wanxinjin/IOC-from-Incomplete-Trajectory-Observations

DOI: 10.1177/0278364921996384
Journal ref: The International Journal of Robotics Research. 2021;40(6-7):848-865

Abstract

This article develops a methodology that enables learning an objective function of an optimal control system from incomplete trajectory observations. The objective function is assumed to be a weighted sum of features (or basis functions) with unknown weights, and the observed data is a segment of a trajectory of system states and inputs. The proposed technique introduces the concept of the recovery matrix to establish the relationship between any available segment of the trajectory and the weights of given candidate features. The rank of the recovery matrix indicates whether a subset of relevant features can be found among the candidate features and the corresponding weights can be learned from the segment data. The recovery matrix can be obtained iteratively and its rank non-decreasing property shows that additional observations may contribute to the objective learning. Based on the recovery matrix, a method for using incomplete trajectory observations to learn the weights of selected features is established, and an incremental inverse optimal control algorithm is developed by automatically finding the minimal required observation. The effectiveness of the proposed method is demonstrated on a linear quadratic regulator system and a simulated robot manipulator.


1904.00378 2026-06-04 cs.RO cs.SY eess.SY

MAT-Fly: An Educational Platform for Simulating Unmanned Aerial Vehicles Aimed to Detect and Track Moving Objects

Giuseppe Silano, Luigi Iannelli

Affiliations: Faculty of Electrical Engineering, Czech Technical University in Prague; Department of Engineering, University of Sannio in Benevento, Piazza Roma 21

AI summary: Presents a simulation approach for a specific task in the unmanned aerial vehicle field: visual detection and tracking of arbitrary moving objects. It introduces MAT-Fly, a platform built on Matlab and the MathWorks Virtual Reality (VR) and Computer Vision System (CVS) toolboxes that simulates a quad-rotor tracking a car moving along a nontrivial path, released as open source for educational use.

Comments 11 pages, 15 figures, journal paper

DOI: 10.1109/ACCESS.2021.3064758
Journal ref: IEEE Access, 2021

Abstract

The main motivation of this work is to propose a simulation approach for a specific task within the Unmanned Aerial Vehicle (UAV) field, i.e., the visual detection and tracking of arbitrary moving objects. In particular, we describe MAT-Fly, a numerical simulation platform for multi-rotor aircraft characterized by ease of use and of control development. The platform is based on Matlab and the MathWorks Virtual Reality (VR) and Computer Vision System (CVS) toolboxes, which work together to simulate the behavior of a quad-rotor while tracking a car that moves along a nontrivial path. The VR toolbox has been chosen because of students' familiarity with Matlab and because, thanks to its simple structure, it does not require a notable effort from the user in the learning and development phase. The overall architecture is quite modular, so that each block can be easily replaced with others, simplifying code reuse and platform customization. Some simple testbeds are presented to show the validity of the approach and how the platform works. The simulator is released as open source, making it possible to go through any part of the system, and is available for educational purposes.


1906.00729 2026-06-04 cs.LG cs.GT cs.SY eess.SY math.OC stat.ML

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

Kaiqing Zhang, Zhuoran Yang, Tamer Başar

Affiliations: Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois at Urbana-Champaign; Department of Operations Research and Financial Engineering, Princeton University

AI summary: Studies the global convergence of policy optimization for finding Nash equilibria in zero-sum linear quadratic (LQ) games. By analyzing the optimization landscape of LQ games, it shows that the stationary point with respect to linear feedback control policies constitutes the Nash equilibrium, proposes three projected nested-gradient methods guaranteed to converge to it, and shows that these algorithms have globally sublinear and locally linear convergence rates.

Comments Fixed some typos, addressed some comments from NeurIPS reviews

Abstract

We study the global convergence of policy optimization for finding the Nash equilibria (NE) in zero-sum linear quadratic (LQ) games. To this end, we first investigate the landscape of LQ games, viewing it as a nonconvex-nonconcave saddle-point problem in the policy space. Specifically, we show that despite its nonconvexity and nonconcavity, zero-sum LQ games have the property that the stationary point of the objective function with respect to the linear feedback control policies constitutes the NE of the game. Building upon this, we develop three projected nested-gradient methods that are guaranteed to converge to the NE of the game. Moreover, we show that all of these algorithms enjoy both globally sublinear and locally linear convergence rates. Simulation results are also provided to illustrate the satisfactory convergence properties of the algorithms. To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria. Our work serves as an initial step toward understanding the theoretical aspects of policy-based reinforcement learning algorithms for zero-sum Markov games in general.


1905.13268 2026-06-04 cs.LG cs.SY eess.SY stat.ML

Interpretable PID Parameter Tuning for Control Engineering using General Dynamic Neural Networks: An Extensive Comparison

Johannes Günther, Elias Reichensdörfer, Patrick M. Pilarski, Klaus Diepold

Affiliations: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute

AI summary: Examines extending PID controllers with General Dynamic Neural Networks (GDNN) to improve performance and interpretability on complex control systems. In an extensive comparison on four benchmark systems, the neural PID controller outperforms standard PID control in 15 of 16 tasks and model-based control in 13 of 16 tasks.

DOI: 10.1371/journal.pone.0243320

Abstract

Modern automation systems rely on closed-loop control, wherein a controller interacts with a controlled process based on observations. These systems are increasingly complex, yet most controllers are linear Proportional-Integral-Derivative (PID) controllers. PID controllers perform well on linear and near-linear systems, but their simplicity is at odds with the robustness required to reliably control complex processes. Modern machine learning offers a way to extend PID controllers beyond their linear capabilities by using neural networks. However, such an extension comes at the cost of losing stability guarantees and controller interpretability. In this paper, we examine the utility of extending PID controllers with recurrent neural networks, namely General Dynamic Neural Networks (GDNN); we show that GDNN (neural) PID controllers perform well on a range of control systems and highlight how they can be a scalable and interpretable option for control systems. To do so, we provide an extensive study using four benchmark systems that represent the most common control engineering benchmarks. All control benchmarks are evaluated with and without noise as well as with and without disturbances. The neural PID controller performs better than standard PID control in 15 of 16 tasks and better than model-based control in 13 of 16 tasks. As a second contribution, we address the lack of interpretability that prevents neural networks from being used in real-world control processes. We use bounded-input bounded-output stability analysis to evaluate the parameters suggested by the neural network, thus making them understandable. This combination of rigorous evaluation paired with better interpretability is an important step towards the acceptance of neural-network-based control approaches. It is furthermore an important step towards interpretable and safely applied artificial intelligence.
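
For readers unfamiliar with the baseline, a textbook discrete-time PID controller looks roughly like the sketch below. It uses fixed, hand-picked gains on a toy first-order plant; in the paper the gains are suggested by a neural network, which is not reproduced here. The class name `PID`, the plant dx/dt = -x + u, and the gain values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class PID:
    """Textbook discrete-time PID controller with fixed gains."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt                  # accumulate the I term
        derivative = (error - self.prev_error) / self.dt  # finite-difference D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(controller: PID, steps: int = 3000) -> float:
    """Drive the toy plant dx/dt = -x + u towards a setpoint of 1."""
    x = 0.0
    for _ in range(steps):
        u = controller.step(1.0, x)
        x += controller.dt * (-x + u)   # forward-Euler integration of the plant
    return x

# Hypothetical gains; the integral term removes the steady-state error.
final_state = simulate(PID(kp=2.0, ki=1.0, kd=0.0, dt=0.01))
```

Because the three gains map directly onto proportional, integral, and derivative action, a network that outputs them stays easier to inspect than one that outputs the control signal directly, which is the interpretability argument made in the abstract.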


1903.01577 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Episodic Learning with Control Lyapunov Functions for Uncertain Robotic Systems

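Andrew J. Taylor, Victor D. Dorobantu, Hoang M. Le, Yisong Yue, Aaron D. Ames

Affiliation: California Institute of Technology

AI summary: This paper proposes a machine learning framework built around Control Lyapunov Functions to adapt to parametric uncertainty and unmodeled dynamics in robotic systems. It iteratively updates estimates of the Lyapunov function derivative and improves the controller, ultimately yielding a stabilizing quadratic-program-based controller, and validates the method on a planar Segway simulation.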

DOI: 10.1109/IROS40897.2019.8967820

Abstract

Many modern nonlinear control methods aim to endow systems with guaranteed properties, such as stability or safety, and have been successfully applied to the domain of robotics. However, model uncertainty remains a persistent challenge, weakening theoretical guarantees and causing implementation failures on physical systems. This paper develops a machine learning framework centered around Control Lyapunov Functions (CLFs) to adapt to parametric uncertainty and unmodeled dynamics in general robotic systems. Our proposed method proceeds by iteratively updating estimates of Lyapunov function derivatives and improving controllers, ultimately yielding a stabilizing quadratic program model-based controller. We validate our approach on a planar Segway simulation, demonstrating substantial performance improvements by iteratively refining on a base model-free controller.
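The quadratic-program controller mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single CLF constraint of the form LfV + LgV·u ≤ −αV, for which the minimum-norm QP has a closed-form solution. The function name, the decay rate `alpha`, and the toy system are illustrative assumptions, and the learned derivative estimates from the paper are not modeled here.

```python
import numpy as np

def clf_qp_min_norm(lf_v, lg_v, v, alpha=1.0):
    """Solve min ||u||^2 subject to lf_v + lg_v @ u <= -alpha * v in closed form."""
    lg_v = np.asarray(lg_v, dtype=float)
    a = lf_v + alpha * v              # the constraint reads a + lg_v @ u <= 0
    if a <= 0.0:
        return np.zeros_like(lg_v)    # u = 0 already satisfies the constraint
    denom = lg_v @ lg_v
    if denom == 0.0:
        raise ValueError("constraint infeasible: LgV is zero while a > 0")
    return -a * lg_v / denom          # projection of 0 onto the constraint boundary

# Toy system (assumption): x' = x + u with V = x^2 / 2, so LfV = x^2 and LgV = x.
x = 2.0
u = clf_qp_min_norm(lf_v=x * x, lg_v=[x], v=0.5 * x * x, alpha=1.0)
```

With several constraints or input bounds the closed form no longer applies and a QP solver would be needed.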

1711.01526 2026-06-04 cs.LG cs.SY eess.SY math.OC

On Identification of Distribution Grids

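Omid Ardakanian, Vincent W. S. Wong, Roel Dobbe, Steven H. Low, Alexandra von Meier, Claire Tomlin, Ye Yuan

Affiliation: Department of Electrical Engineering and Computer Sciences, UC Berkeley, USA

AI summary: This paper studies the joint estimation of model parameters and operational structure of a distribution grid from telemetry data using the lasso, a regression shrinkage and selection method. It proposes tractable convex programs that handle the low-rank structure of distribution systems and develops an online algorithm for early detection and localization of critical events that change the admittance matrix.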

DOI: 10.1109/TCNS.2019.2891002

Abstract

Large-scale integration of distributed energy resources into residential distribution feeders necessitates careful control of their operation through power flow analysis. While the knowledge of the distribution system model is crucial for this type of analysis, it is often unavailable or outdated. The recent introduction of synchrophasor technology in low-voltage distribution grids has created an unprecedented opportunity to learn this model from high-precision, time-synchronized measurements of voltage and current phasors at various locations. This paper focuses on joint estimation of model parameters (admittance values) and operational structure of a poly-phase distribution network from the available telemetry data via the lasso, a method for regression shrinkage and selection. We propose tractable convex programs capable of tackling the low rank structure of the distribution system and develop an online algorithm for early detection and localization of critical events that induce a change in the admittance matrix. The efficacy of these techniques is corroborated through power flow studies on four three-phase radial distribution systems serving real household demands.
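As a rough illustration of the lasso step, the sketch below recovers one sparse row of a toy admittance matrix from synthetic measurements using proximal gradient descent (ISTA). It is a real-valued simplification: the paper works with complex phasors and low-rank structure, and the data, dimensions, and regularization weight here are assumptions made for the example.

```python
import numpy as np

def lasso_ista(A, b, lam, n_iter=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * A.T @ (A @ x - b)         # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft thresholding
    return x

rng = np.random.default_rng(0)
V = rng.normal(size=(200, 6))                          # synthetic voltage measurements
y_true = np.array([2.0, -1.0, 0.0, 0.0, -1.0, 0.0])    # sparse row: zeros mean "no line"
i_meas = V @ y_true + 0.01 * rng.normal(size=200)      # noisy current injection at one node
y_hat = lasso_ista(V, i_meas, lam=1.0)
```

The zero pattern of `y_hat` plays the role of the operational structure, and the nonzero values play the role of the admittance parameters.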

1605.01177 2026-06-04 cs.CV cs.SY eess.SY

A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms

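Ángel F. García-Fernández, Abu Sajana Rahmathullah, Lennart Svensson

Affiliation: Zenuity AB

AI summary: This paper proposes a metric on the space of finite sets of trajectories for evaluating multi-target tracking algorithms in a mathematically sound way. The metric compares trajectory estimates from different algorithms with the ground truth and includes intuitive costs for localization error, missed and false targets, and track switches. Computing it requires solving a multi-dimensional assignment problem. The paper also proposes a lower bound that is itself a metric on sets of trajectories and is computable in polynomial time by linear programming, and extends the metric to random finite sets of trajectories.

Comments Matlab code for the metric is available at https://github.com/Agarciafernandez/MTT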

DOI: 10.1109/TSP.2020.3005309
Journal ref: in IEEE Transactions on Signal Processing, vol. 68, pp. 3917-3928, 2020

Abstract

In this paper, we propose a metric on the space of finite sets of trajectories for assessing multi-target tracking algorithms in a mathematically sound way. The main use of the metric is to compare estimates of trajectories from different algorithms with the ground truth of trajectories. The proposed metric includes intuitive costs associated with localization error for properly detected targets, missed and false targets and track switches at each time step. The metric computation is based on solving a multi-dimensional assignment problem. We also propose a lower bound for the metric, which is also a metric for sets of trajectories and is computable in polynomial time using linear programming. We also extend the proposed metrics on sets of trajectories to random finite sets of trajectories.

1806.04225 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC

PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments

Anirudha Majumdar, Alec Farid, Anoopkumar Sonar

Affiliations: Department of Mechanical and Aerospace Engineering; Department of Computer Science, Princeton University

AI summary: This paper proposes a PAC-Bayes-based approach to learning robot control policies that comes with theoretical guarantees on generalization to novel environments.

Comments Extended version of paper presented at the 2018 Conference on Robot Learning (CoRL)

Abstract

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.
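To make the "upper bound" concrete, the sketch below evaluates one standard form of the PAC-Bayes bound (the McAllester/Maurer form) for a finite policy space. It assumes costs in [0, 1]; the exact bound optimized in the paper may differ, and the function names and numbers are illustrative assumptions.

```python
import math

def kl_categorical(p, p0):
    """KL divergence KL(p || p0) between two distributions over a finite policy set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0) if pi > 0.0)

def pac_bayes_bound(emp_cost, kl, n_envs, delta=0.01):
    """Bound on the expected cost in novel environments, holding with probability >= 1 - delta.

    emp_cost is the posterior-averaged cost on the n_envs training environments;
    costs are assumed to lie in [0, 1].
    """
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n_envs) / delta)) / (2.0 * n_envs))
    return emp_cost + slack

prior = [0.25, 0.25, 0.25, 0.25]            # distribution chosen before seeing data
posterior = [0.7, 0.1, 0.1, 0.1]            # distribution chosen after training
costs = [0.1, 0.5, 0.6, 0.4]                # hypothetical average training cost per policy
emp = sum(p * c for p, c in zip(posterior, costs))
bound = pac_bayes_bound(emp, kl_categorical(posterior, prior), n_envs=1000)
```

The trade-off is visible in the formula: moving the posterior toward low-cost policies lowers `emp_cost` but raises the KL term.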

1904.10778 2026-06-04 cs.LG cs.SY eess.SY math.OC math.PR stat.ML

Some Limit Properties of Markov Chains Induced by Stochastic Recursive Algorithms

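Abhishek Gupta, Hao Chen, Jianzong Pi, Gaurav Tendolkar

Affiliations: Electrical and Computer Engineering Department, The Ohio State University; Microsoft Corp

AI summary: This paper studies limit properties of Markov chains induced by stochastic recursive algorithms. By analyzing the convergence of iterated random operators, it shows that the distribution of the random sequence converges weakly to the trajectory generated by the contraction operator and, further, that the time average of the random sequence converges to the spatial mean of the invariant distribution.

Comments Accepted in SIMODS, 37 pages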

Abstract

Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate an iterated contraction operator over that Polish space. Assume that the iterated random operators are indexed by certain batch sizes such that as batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it is simulating. We show that starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q value iteration for finite state finite action MDPs to illustrate the general theory developed here.
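The setting can be illustrated with a one-dimensional toy example, chosen here purely for illustration. The contraction is T(x) = 0.5x + 1, and the random operator replaces the constant 1 by a sample mean over a batch. As the batch size grows, the random trajectory stays closer to the deterministic one started from the same point.

```python
import numpy as np

def contraction(x):
    """Deterministic contraction with modulus 0.5 and fixed point x* = 2."""
    return 0.5 * x + 1.0

def random_operator(x, batch_size, rng):
    """Random operator: the constant 1 is replaced by a batch average of N(1, 1) samples."""
    samples = rng.normal(loc=1.0, scale=1.0, size=batch_size)
    return 0.5 * x + samples.mean()

def max_deviation(batch_size, steps=50, seed=0):
    """Largest gap between the random and deterministic trajectories from the same start."""
    rng = np.random.default_rng(seed)
    x_det = x_rand = 10.0
    worst = 0.0
    for _ in range(steps):
        x_det = contraction(x_det)
        x_rand = random_operator(x_rand, batch_size, rng)
        worst = max(worst, abs(x_rand - x_det))
    return worst

small_batch_gap = max_deviation(batch_size=10)
large_batch_gap = max_deviation(batch_size=100_000)
```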

1903.11483 2026-06-04 cs.LG cs.NE cs.RO cs.SY eess.SY stat.ML

Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression

Erik Derner, Jiří Kubalík, Nicola Ancona, Robert Babuška

Affiliations: Czech Institute of Informatics, Robotics, and Cybernetics; Czech Technical University in Prague; Department of Control Engineering, Faculty of Electrical Engineering; Delft University of Technology

AI summary: This paper proposes using symbolic regression to construct parsimonious analytic models of dynamic systems. It applies two state-of-the-art symbolic regression algorithms in both the state-space and input-output settings, demonstrating superior performance on simulated examples and a real system.

DOI: 10.1016/j.asoc.2020.106432
Journal ref: Applied Soft Computing, Volume 94, September 2020, 106432

Abstract

Developing mathematical models of dynamic systems is central to many disciplines of engineering and science. Models facilitate simulations, analysis of the system's behavior, decision making and design of automatic control algorithms. Even inherently model-free control techniques such as reinforcement learning (RL) have been shown to benefit from the use of models, typically learned online. Any model construction method must address the tradeoff between the accuracy of the model and its complexity, which is difficult to strike. In this paper, we propose to employ symbolic regression (SR) to construct parsimonious process models described by analytic equations. We have equipped our method with two different state-of-the-art SR algorithms which automatically search for equations that fit the measured data: Single Node Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In addition to the standard problem formulation in the state-space domain, we show how the method can also be applied to input-output models of the NARX (nonlinear autoregressive with exogenous input) type. We present the approach on three simulated examples with up to 14-dimensional state space: an inverted pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep neural networks and local linear regression shows that SR in most cases outperforms these commonly used alternative methods. We demonstrate on a real pendulum system that the analytic model found enables an RL controller to successfully perform the swing-up task, based on a model constructed from only 100 data samples.

1904.01068 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

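Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh

Affiliations: Stanford University; Jet Propulsion Laboratory; California Institute of Technology

AI summary: This paper proposes a safe exploration algorithm that uses Lipschitz continuity to ensure that no unsafe state is visited during exploration. The algorithm provides deterministic safety guarantees in deterministic Markov decision processes, and its performance is demonstrated on simulated navigation tasks.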

Comments Proceedings of the American Control Conference (ACC), July 2019. The first two authors have equal contribution

1905.03632 2026-06-04 cs.SD cs.SY eess.AS eess.SY

Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates

Jiri Malek, Zbynek Koldovsky, Marek Bohac

Affiliation: Faculty of Mechatronics, Informatics, and Interdisciplinary Studies, Technical University of Liberec

AI summary: This paper proposes a DNN-supported block-online method for multi-channel speech enhancement that performs beamforming based on estimated relative transfer functions. It targets dynamic environments and short utterances, and remains robust when the processed blocks are short.

Comments 10 pages, 8 figures, 4 tables. Modified version of the article accepted for publication in IET Signal Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusions

DOI: 10.1049/iet-spr.2019.0304
Journal ref: IET Signal Processing, vol. 14, no. 3, pp. 124-133, May 2020

Abstract

This work addresses the problem of block-online processing for multi-channel speech enhancement. Such processing is vital in scenarios with moving speakers and/or when very short utterances are processed, e.g., in voice assistant scenarios. We consider several variants of a system that performs beamforming supported by DNN-based voice activity detection (VAD) followed by post-filtering. The speaker is targeted through estimating relative transfer functions between microphones. Each block of the input signals is processed independently in order to make the method applicable in highly dynamic environments. Owing to the short length of the processed block, the statistics required by the beamformer are estimated less precisely. The influence of this inaccuracy is studied and compared to the processing regime when recordings are treated as one block (batch processing). The experimental evaluation of the proposed method is performed on large datasets of CHiME-4 and on another dataset featuring a moving target speaker. The experiments are evaluated in terms of objective and perceptual criteria (such as signal-to-interference ratio (SIR) or perceptual evaluation of speech quality (PESQ), respectively). Moreover, word error rate (WER) achieved by a baseline automatic speech recognition system is evaluated, for which the enhancement method serves as a front-end solution. The results indicate that the proposed method is robust with respect to the short length of the processed block. Significant improvements in terms of the criteria and WER are observed even for the block length of 250 ms.

1905.10706 2026-06-04 cs.LG cs.RO cs.SY eess.SY stat.ML

Interactive Differentiable Simulation

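Eric Heiden, David Millard, Hejia Zhang, Gaurav S. Sukhatme

Affiliation: University of Southern California

AI summary: This paper introduces Interactive Differentiable Simulation (IDS), a differentiable physics engine for efficient, accurate inference of the physical properties of rigid-body systems. Combined with visual input, it performs system identification and yields a world model whose parameters have physical meaning. It also enables automatic task-based robot design and parameter estimation for nonlinear dynamical systems and, inside an adaptive model-predictive controller, greatly improves sample efficiency on nonlinear control problems.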

Abstract

Intelligent agents need a physical understanding of the world to predict the impact of their actions in the future. While learning-based models of the environment dynamics have contributed to significant improvements in sample efficiency compared to model-free reinforcement learning algorithms, they typically fail to generalize to system states beyond the training data, while often grounding their predictions on non-interpretable latent variables. We introduce Interactive Differentiable Simulation (IDS), a differentiable physics engine that allows for efficient, accurate inference of physical properties of rigid-body systems. Integrated into deep learning architectures, our model is able to accomplish system identification using visual input, leading to an interpretable model of the world whose parameters have physical meaning. We present experiments showing automatic task-based robot design and parameter estimation for nonlinear dynamical systems by automatically calculating gradients in IDS. When integrated into an adaptive model-predictive control algorithm, our approach exhibits orders of magnitude improvements in sample efficiency over model-free reinforcement learning algorithms on challenging nonlinear control domains.

1905.13547 2026-06-04 cs.LG cs.SY eess.SY math.DS math.OC stat.ML

Learning robust control for LQR systems with multiplicative noise via policy gradient

Benjamin Gravell, Peyman Mohajerin Esfahani, Tyler Summers

Affiliations: Control, Optimization, and Networks lab, UT Dallas; Delft Center for Systems and Control, TU Delft

AI summary: This paper studies LQR systems with multiplicative noise, uses policy gradient methods to learn robust controllers, and proves global convergence of policy gradient algorithms despite the non-convex cost.

Abstract

The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.

1602.04450 2026-06-04 cs.RO cs.LG cs.SY eess.SY

Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics

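Felix Berkenkamp, Andreas Krause, Angela P. Schoellig

Affiliations: Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, Switzerland; Dynamic Systems Lab, Institute for Aerospace Studies, University of Toronto, Canada

AI summary: This paper presents a generalized algorithm that allows multiple safety constraints separate from the objective. Given an initial set of safe parameters, it maximizes performance while evaluating only parameters that satisfy all safety constraints with high probability. It explores the parameter space carefully by exploiting regularity assumptions expressed as a Gaussian process prior, and shows how context variables can be used to transfer knowledge safely to new tasks.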

Abstract

Robotic algorithms typically depend on various parameters, the choice of which significantly affects the robot's performance. While an initial guess for the parameters may be obtained from dynamic models of the robot, parameters are usually tuned manually on the real system to achieve the best performance. Optimization algorithms, such as Bayesian optimization, have been used to automate this process. However, these methods may evaluate unsafe parameters during the optimization process that lead to safety-critical system failures. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined based on the performance function. However, coupling performance and safety is often not desirable in robotics. For example, high-gain controllers might achieve low average tracking error (performance), but can overshoot and violate input constraints. In this paper, we present a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance but only evaluates parameters that satisfy safety for all constraints with high probability. To this end, it carefully explores the parameter space by exploiting regularity assumptions in terms of a Gaussian process prior. Moreover, we show how context variables can be used to safely transfer knowledge to new situations and tasks. We provide a theoretical analysis and demonstrate that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.

2606.05150 2026-06-04 cs.NE cs.AI

Multi-Column RBF Neural Network Using Adaptive and Non-Adaptive Particle Swarm Optimization

Ammar Hoori, Yuichi Motai

Affiliations: Department of Biomedical Engineering, Case Western Reserve University; Department of Electrical and Computer Engineering, Virginia Commonwealth University

AI summary: To address the scalability of RBF neural network training on large datasets, this paper proposes multi-column RBF networks based on particle swarm optimization (PSO) and adaptive PSO (APSO), named MC-PSO and MC-APSO. Multiple RBFNs are trained in parallel, each specialized on a subset of the data, which improves accuracy and speed.

Comments 15 pages, under review

Abstract

The radial basis function neural network (RBFN) trained with a gradient descent algorithm provides an effective fully connected structure in both shallow and deep networks. The error correction (ErrCor), a state-of-the-art gradient-based training method, selects optimal hidden units to improve accuracy. Alternatively, as a population-based algorithm, the particle swarm optimization algorithm (PSO) uses the swarm experience to optimize RBFN parameters, offering global search and robustness to local minima. Adaptive PSO (APSO) has emerged as an improved variant of PSO. The APSO algorithm improves convergence speed by dynamically adjusting swarm parameters during optimization. Both ErrCor and PSO demonstrate improved results and competitive convergence. However, with large datasets, these methods face scalability challenges such as excessive kernel computations and large hidden layer structures. A recent multi-column RBFN approach (MCRN) improves ErrCor performance by deploying small RBFNs in a parallel system. Inspired by MCRN's success, we propose two novel approaches to improve PSO performance: the multi-column RBFN with PSO (MC-PSO) and the multi-column RBFN with APSO (MC-APSO). These methods introduce parallel RBFN structures trained using evolutionary swarm methods. Each RBFN is independently trained on a specific spatial subset of the dataset using either PSO or APSO algorithms. These resulting specialist-trained RBFNs are tailored to their respective subsets. During testing, only selected RBFNs, where the test instance neighbors are located, contribute to the multi-column output. This specialization improves accuracy, while parallelism enhances speed. We evaluate the proposed methods on various benchmark datasets. The MC-PSO and MC-APSO outperform ErrCor, PSO, APSO, and MCRN in terms of accuracy and recall. They also demonstrate faster training and testing times in most experiments.
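For readers unfamiliar with PSO, a minimal sketch of the basic (non-adaptive) update rule follows. It minimizes a generic objective rather than training an RBFN, and the coefficient values, search range, and function names are conventional choices assumed for illustration, not settings taken from the paper.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=30, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_social=1.5, seed=0):
    """Basic particle swarm optimization; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                   # personal bests
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[best_val.argmin()].copy()                  # global best
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # velocity mixes momentum, pull to personal best, and pull to global best
        vel = inertia * vel + c_personal * r1 * (best_pos - pos) + c_social * r2 * (g - pos)
        pos = pos + vel
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[best_val.argmin()].copy()
    return g, best_val.min()

# Toy objective (assumption): shifted sphere with minimum at (1, 1, 1).
solution, value = pso_minimize(lambda x: float(np.sum((x - 1.0) ** 2)), dim=3)
```

An adaptive variant would adjust `inertia`, `c_personal`, and `c_social` during the loop instead of keeping them fixed.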

2606.05129 2026-06-04 cs.CR cs.LG

Preserving Data Privacy in Learning Causal Structure with Fully Homomorphic Encryption

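Jian Yang, Yuan Tong, Qinbin Li, Zeyi Wen, Xiaofang Zhou

Affiliations: Hong Kong University of Science and Technology (Guangzhou); Hong Kong University of Science and Technology; University of California, Berkeley

AI summary: To address privacy leakage in distributed causal structure learning, this paper proposes a method based on fully homomorphic encryption. Using circuit simplification, approximations of division and logarithm, and SIMD batching, it learns causal structure efficiently on encrypted data and can be extended to support differential privacy.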

Abstract

Preserving data privacy is an important topic in structural data management and data mining. However, the issue of privacy leakage in distributed causal structure learning is a persistent challenge, especially in cases where data transmission and computation are required. In this paper, we propose a method based on fully homomorphic encryption (FHE) that performs calculations on ciphertexts, keeping data encrypted in transit and during computation. Nevertheless, applying FHE to causal structure learning is challenging due to the high computation cost and limited support for division as well as logarithm operations in FHE. To tackle this challenge, we propose a series of novel techniques including (i) circuit simplification for better efficiency, (ii) approximation of division and logarithm through Newton-Raphson Reciprocal and Taylor expansion, and (iii) a batching technique with SIMD-acceleration to enhance the whole learning process. Additionally, our method can be easily extended beyond FHE by demonstration of its portability to support differential privacy. Empirical results show that our method achieves high consistency and comparable causal structure with the plaintext version in the datasets tested. Finally, our method is efficient and practical to complete learning causal structures in tens of minutes even under the privacy protection of FHE.
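The two approximations named in (ii) can be sketched with ordinary floats. The point of the sketch is that both routines use only additions and multiplications, which are the operations FHE schemes typically support. No encryption is performed here, and the iteration counts, initial guess, and function names are assumptions for illustration.

```python
import math

def reciprocal_newton(d, x0, n_iter=6):
    """Approximate 1/d by the Newton-Raphson update x <- x * (2 - d * x).

    Converges when 0 < d * x0 < 2; uses only multiplication and subtraction.
    """
    x = x0
    for _ in range(n_iter):
        x = x * (2.0 - d * x)
    return x

def log_taylor(x, n_terms=12):
    """Approximate ln(x) by its Taylor series around 1, valid for |x - 1| < 1."""
    t = x - 1.0
    power = t
    total = 0.0
    sign = 1.0
    for k in range(1, n_terms + 1):
        total += sign * power * (1.0 / k)   # 1/k is a public constant, not a ciphertext division
        power *= t
        sign = -sign
    return total

inv_err = abs(reciprocal_newton(4.0, x0=0.1) - 0.25)
log_err = abs(log_taylor(1.3) - math.log(1.3))
```

In practice the inputs would need to be scaled into the convergence range of each approximation before evaluation.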

2606.05124 2026-06-04 cs.GR cs.CV cs.LG

Geometry Gaussians: Decoupling Appearance and Geometry in Gaussian Splatting

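Hongyu Zhou, Zorah Lähner

Affiliations: University of Bonn; Lamarr Institut

AI summary: To resolve the conflict between geometric representation and appearance rendering in 3D Gaussian Splatting, this paper adds a geometry opacity parameter to each splat, combined with a transparency-curated optimization pipeline. This decouples geometry from appearance and improves rendering and geometry quality on complex scenes, especially those with transparent objects.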

Abstract

After the success of 3D Gaussian Splatting (3DGS) for novel view synthesis, many works have explored how to also use it for geometric surface representation. However, extracting accurate geometric information directly from 3DGS remains challenging and can often reduce the appearance rendering quality. In this work, we show that 3DGS in its default form is inherently unsuited to represent texture and geometry at the same time, by training with complete ground-truth texture and geometry information. We also propose a simple solution by applying a single additional geometry opacity parameter to each splat, together with an optional transparency-curated optimization pipeline. Our experiments, both with ground-truth and vision foundation model geometric input, show that this change leads to improved rendering and geometry performance on a wide variety of datasets, and especially complex scenes with transparent objects benefit significantly from our method.

2606.05045 2026-06-04 math.DS cs.LG

Learning Control-Affine Reduced-Order Models via Autoencoders

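Ali Mjalled, Martin Mönnigmann

Affiliation: Automatic Control and Systems Theory, Ruhr-Universität Bochum

AI summary: This paper proposes a framework that uses autoencoders to learn a reduced latent space together with control-affine state-space dynamics, extends it to a sequence-based model to improve prediction accuracy, and demonstrates its usefulness through feedback linearization.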

Abstract

We present in this paper a framework for the identification of control-affine reduced-order models (ROMs). The proposed method utilizes autoencoders (AEs) to transform the high-dimensional states, and potentially the high-dimensional inputs, into reduced latent ones suitable for control-affine state-space dynamics. This is achieved by simultaneous training of the AE and the state-space model. In addition, we extend the discrete ROM formulation to a sequence-based model, which processes state and input histories to improve prediction accuracy while preserving the control-affine structure. We motivate our framework by applying feedback linearization to the derived models, and we present guidelines for its efficient use. The proposed framework is assessed on two numerical examples and its performance is compared to a baseline model, where the AE identifies a latent space with linear state-space dynamics. The assessment involves evaluating the prediction accuracy of the ROM on test data and its effectiveness in controlling the system to a desired state or trajectory.
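The appeal of a control-affine structure for feedback linearization can be shown on a scalar toy model. The sketch assumes discrete latent dynamics of the form z_next = f(z) + g(z)·u with g(z) never zero; the functions `f` and `g` below are hand-written stand-ins for a learned model, not the paper's networks.

```python
import math

def f(z):
    """Drift term of the toy latent dynamics (stand-in for a learned network)."""
    return z + 0.1 * math.sin(z)

def g(z):
    """Input gain of the toy latent dynamics; stays within [0.4, 0.6], so never zero."""
    return 0.5 + 0.1 * math.cos(z)

def feedback_linearizing_input(z, z_target, gain=0.5):
    """Choose u so that the next state equals z + gain * (z_target - z)."""
    v = z + gain * (z_target - z)      # desired next state under linear error dynamics
    return (v - f(z)) / g(z)           # invert the affine dependence on u

def simulate(z0, z_target, steps=20):
    z = z0
    for _ in range(steps):
        u = feedback_linearizing_input(z, z_target)
        z = f(z) + g(z) * u            # control-affine update
    return z

z_final = simulate(z0=4.0, z_target=1.0)
```

Because the input enters affinely, solving for `u` is a single division, and the closed-loop error shrinks by the factor `1 - gain` at every step.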

2606.05037 2026-06-04 cs.SE cs.AI

Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

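Arquimedes Canedo, Grama Chethan

Affiliation: Siemens Digital Industries Software, USA

AI summary: This paper proposes self-reflective APIs, which return machine-readable structured suggestions on validation failure so that an AI agent can repair the request and retry without external reasoning. On Anthropic models this raises task-completion rate by 36.7 to 40.0 percentage points and improves per-success token efficiency by 1.8 to 2.2 times.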

Abstract

When an AI agent calls an API and hits a validation error, it needs more than what went wrong: it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning. On a leak-audited pilot (N = 30 per cell, 3 LLMs, 10 adversarial tasks), structured suggestions lift task-completion rate by +36.7 to 40.0 pp over plain-English diagnoses on Anthropic models (Fisher's exact p ≤ 0.0022), at 1.8 to 2.2× better per-success token efficiency. The lift is not significant on gpt-4o-mini (p = 0.435); a second-domain replication on a billing API confirms the pattern. The comparison only holds after auditing two undocumented classes of answer leakage in LLM benchmarks. We ship audit_prompt_leakage.py as reusable CI infrastructure. Code and data: https://github.com/arquicanedo/self-reflective-apis.
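The retry pattern described in the abstract might look like the sketch below. Only the `recovery_feedback.suggestions[]` path comes from the abstract. The endpoint, the fields inside each suggestion (`field`, `action`, `value`), and the retry policy are hypothetical and are not taken from the linked repository.

```python
def validate_invoice(request):
    """Hypothetical endpoint: returns (ok, response) and suggests a fix on failure."""
    allowed = {"USD", "EUR"}
    if request.get("currency") not in allowed:
        return False, {
            "error": "invalid currency",
            "recovery_feedback": {
                # hypothetical suggestion schema, for illustration only
                "suggestions": [
                    {"field": "currency", "action": "replace", "value": "USD"}
                ]
            },
        }
    return True, {"status": "created"}

def apply_suggestions(request, response):
    """Apply each machine-readable suggestion to a copy of the request."""
    fixed = dict(request)
    for s in response["recovery_feedback"]["suggestions"]:
        if s["action"] == "replace":
            fixed[s["field"]] = s["value"]
    return fixed

def call_with_recovery(request, max_retries=2):
    """Call the endpoint and retry with repaired requests, without any model reasoning."""
    for _ in range(max_retries + 1):
        ok, response = validate_invoice(request)
        if ok:
            return response
        request = apply_suggestions(request, response)
    return response

result = call_with_recovery({"amount": 10, "currency": "usd"})
```

The design choice illustrated is that the repair step is a mechanical transformation of the request, so the agent does not have to interpret a prose error message.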
