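arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.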

1111.2258 2026-06-03 cs.RO cs.SY eess.SY

Design and Implementation of Prosthetic Arm using Gear Motor Control Technique with Appropriate Testing

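Biswarup Neogi, Soumyajit Mukherjee, Soumya Ghosal, Achintya Das, D. N. Tibarewala

AI summary: This paper presents a hardware design for a prosthetic arm based on gear motor control. Arm motion is implemented by programming a processor, muscle strain replaces the conventional electromyographic signal, and a lightweight prosthetic model is tested successfully.

Comments 5 Pages, 13 Figures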

1109.1552 2026-06-03 cs.LG cs.NI cs.SY eess.SY math.OC math.PR

Efficient Online Learning for Opportunistic Spectrum Access

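Wenhan Dai, Yi Gai, Bhaskar Krishnamachari

AI summary: For the non-Bayesian restless multi-armed bandit formulation of opportunistic spectrum access in cognitive radio networks, the paper proposes the continuous exploration and exploitation (CEE) algorithm, which achieves near-logarithmic regret and attains logarithmic regret when partial information is known.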

Abstract

The problem of opportunistic spectrum access in cognitive radio networks has been recently formulated as a non-Bayesian restless multi-armed bandit problem. In this problem, there are N arms (corresponding to channels) and one player (corresponding to a secondary user). The state of each arm evolves as a finite-state Markov chain with unknown parameters. At each time slot, the player can select K < N arms to play and receives state-dependent rewards (corresponding to the throughput obtained given the activity of primary users). The objective is to maximize the expected total rewards (i.e., total throughput) obtained over multiple plays. The performance of an algorithm for such a multi-armed bandit problem is measured in terms of regret, defined as the difference in expected reward compared to a model-aware genie who always plays the best K arms. In this paper, we propose a new continuous exploration and exploitation (CEE) algorithm for this problem. When no information is available about the dynamics of the arms, CEE is the first algorithm to guarantee near-logarithmic regret uniformly over time. When some bounds corresponding to the stationary state distributions and the state-dependent rewards are known, we show that CEE can be easily modified to achieve logarithmic regret over time. In contrast, prior algorithms require additional information concerning bounds on the second eigenvalues of the transition matrices in order to guarantee logarithmic regret. Finally, we show through numerical simulations that CEE is more efficient than prior algorithms.
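The regret defined in words above can be written out as follows. This is a sketch in assumed notation that does not appear in the abstract: $\mu_i$ is the stationary mean reward of arm $i$, $\mathcal{O}_K^*$ is the set of $K$ arms with the largest $\mu_i$, $A(t)$ is the set of arms played in slot $t$, and $r_i(t)$ is the reward collected from arm $i$ in that slot.

```latex
R(n) \;=\; n \sum_{i \in \mathcal{O}_K^*} \mu_i
      \;-\; \mathbb{E}\!\left[ \sum_{t=1}^{n} \sum_{i \in A(t)} r_i(t) \right]
```

Under this reading, logarithmic regret means $R(n) \le C \log n$ for some constant $C$, so the time-averaged regret $R(n)/n$ tends to zero.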


1109.1251 2026-06-03 cs.RO cs.SY eess.SY math.OC

Synthesis of Distributed Control and Communication Schemes from Global LTL Specifications

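Yushan Chen, Xu Chu Ding, Calin Belta

AI summary: Presents a technique for synthesizing multi-agent control and communication strategies from a global linear temporal logic (LTL) specification, using concurrency theory to check whether the specification is distributable and LTL model checking to generate the individual strategies.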

Comments Technical Report accompanying an accepted paper for CDC2011

1111.1684 2026-06-03 cs.RO cs.SY eess.SY

Simulation Techniques and Prosthetic Approach Towards Biologically Efficient Artificial Sense Organs- An Overview

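Biswarup Neogi, Soumya Ghosal, Soumyajit Mukherjee, Achintya Das, D. N. Tibarewala

AI summary: This overview surveys applications of control theory to prosthetic sense organs (vision, taste, and smell), with emphasis on the role of simulation techniques and control modeling in evaluating and designing artificial organs.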

Comments 12 Pages

1110.1781 2026-06-03 cs.LG cs.SY eess.SY

A Study of Unsupervised Adaptive Crowdsourcing

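G. Kesidis, A. Kurve

AI summary: Studies the performance of unsupervised crowdsourcing based on the agreement between a user's responses and the majority response, and proposes reliability measures for two scenarios.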

Comments Technical Report, 2 figures

1109.2288 2026-06-03 cs.RO cs.SY eess.SY

Heterogeneity for Increasing Performance and Reliability of Self-Reconfigurable Multi-Robot Organisms

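S. Kernbach, F. Schlachter, R. Humza, J. Liedke, S. Popesku, S. Russo, T. Ranzani, L. Manfredi, C. Stefanini, R. Matthias, Ch. Schwarzer, B. Girault, P. Alschbach, E. Meister, O. Scholz

AI summary: Examines design choices and performance evaluation for heterogeneity in self-reconfigurable modular robot systems, and shows through mechatronic and software design experiments that heterogeneous platforms improve system performance and reliability.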

1109.2088 2026-06-03 cs.LG cs.NI cs.SY eess.SY math.OC math.PR

Online Learning Algorithms for Stochastic Water-Filling

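Yi Gai, Bhaskar Krishnamachari

AI summary: For stochastically time-varying channels with unknown gain distributions, proposes two multi-armed-bandit-based online water-filling algorithms, CWF1 and CWF2, which optimize the expected sum-rate and the sum-rate evaluated at the expected SNRs respectively, and proves that their regret or number of incorrect allocations grows polynomially in the number of channels and logarithmically in time.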

Abstract

Water-filling is the term for the classic solution to the problem of allocating constrained power to a set of parallel channels to maximize the total data rate. It is used widely in practice, for example, for power allocation to sub-carriers in multi-user OFDM systems such as WiMax. The classic water-filling algorithm is deterministic and requires perfect knowledge of the channel gain-to-noise ratios. In this paper we consider how to allocate power over stochastically time-varying (i.i.d.) channels with unknown gain-to-noise ratio distributions. We adopt an online learning framework based on stochastic multi-armed bandits. We consider two variations of the problem, one in which the goal is to find a power allocation to maximize $\sum_i \mathbb{E}[\log(1 + SNR_i)]$, and another in which the goal is to find a power allocation to maximize $\sum_i \log(1 + \mathbb{E}[SNR_i])$. For the first problem, we propose a cognitive water-filling algorithm that we call CWF1. We show that CWF1 obtains a regret (defined as the cumulative gap over time between the sum-rate obtained by a distribution-aware genie and this policy) that grows polynomially in the number of channels and logarithmically in time, implying that it asymptotically achieves the optimal time-averaged rate that can be obtained when the gain distributions are known. For the second problem, we present an algorithm called CWF2, which is, to our knowledge, the first algorithm in the literature on stochastic multi-armed bandits to exploit non-linear dependencies between the arms. We prove that the number of times CWF2 picks the incorrect power allocation is bounded by a function that is polynomial in the number of channels and logarithmic in time, implying that its frequency of incorrect allocation tends to zero.
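As a point of reference for the classic deterministic water-filling that the abstract contrasts with, the following is a minimal Python sketch. It is not the paper's CWF1 or CWF2. It assumes known gain-to-noise ratios g_i and maximizes sum_i log(1 + g_i p_i) subject to sum_i p_i = P and p_i >= 0; the function name `water_filling` and the example values are illustrative assumptions.

```python
def water_filling(gains, total_power):
    """Classic water-filling for known gain-to-noise ratios (all > 0).

    Returns powers p_i = max(0, mu - 1/g_i), where the water level mu
    is chosen so that the powers sum to total_power.
    """
    floors = sorted(1.0 / g for g in gains)  # inverse gains, best channel first
    # Try the k best channels, dropping the worst one until the water
    # level sits above the floor of every channel still in use.
    for k in range(len(floors), 0, -1):
        mu = (total_power + sum(floors[:k])) / k
        if mu > floors[k - 1]:
            break
    return [max(0.0, mu - 1.0 / g) for g in gains]


# Hypothetical example: three channels sharing one unit of power.
powers = water_filling([2.0, 1.0, 0.1], total_power=1.0)
print(powers)
```

In this example the weakest channel receives no power, because its floor 1/g lies above the water level. The online setting in the abstract differs in that the gain distributions are unknown and must be learned.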


1004.2027 2026-06-03 cs.LG cs.AI cs.SY eess.SY math.OC stat.ML

Dynamic Policy Programming

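Mohammad Gheshlaghi Azar, Vicenc Gomez, Hilbert J. Kappen

AI summary: Proposes dynamic policy programming (DPP), whose performance-loss bounds depend on the supremum norm of the average accumulated error, giving it an advantage over standard approximate value iteration and approximate policy iteration under approximation error; it substantially outperforms existing reinforcement learning methods on several problem domains.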

Comments Submitted to Journal of Machine Learning Research

Abstract

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic $\ell_\infty$-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the $\ell_\infty$-norm of the average accumulated error, as opposed to the $\ell_\infty$-norm of the error in the case of standard approximate value iteration (AVI) and approximate policy iteration (API). This suggests that DPP can achieve better performance than AVI and API, since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process. We examine this theoretical result numerically by comparing the performance of the approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform other RL methods by a wide margin.


1108.6175 2026-06-03 cs.RO cs.SY eess.SY

Adaptive Locomotion of Multibody Snake-like Robot

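Eugen Meister, Sergej Stepanenko, Serge Kernbach

AI summary: For a snake-like robot with 25 degrees of freedom, proposes an adaptive rhythmic control algorithm, studies its behavior and energy characteristics in simulation and hardware experiments, and analyzes how the dynamics differ across body segments.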

Comments Multibody Dynamics 2011, ECCOMAS Thematic Conference, J.C. Samin, P. Fisette (eds.) Brussels, Belgium, 4-7 July, 2011

1108.4698 2026-06-03 cs.RO cs.SY eess.SY math.OC

Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

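Reza Moazzez Estanjini, Xu Chu Ding, Morteza Lahijanian, Jing Wang, Calin A. Belta, Ioannis Ch. Paschalidis

AI summary: For Markov decision process problems that maximize the probability of reaching some states while avoiding others, proposes an actor-critic approximate dynamic programming algorithm based on least squares temporal difference learning and proves that it converges to a stationary point in parameter space.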

Comments Technical report accompanying an accepted paper to CDC 2011

1108.5624 2026-06-03 cs.RO cs.SY eess.SY

Multi-Robot Searching Algorithm Using Levy Flight and Artificial Potential Field

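Donny K. Sutantyo, Serge Kernbach, Valentin A. Nepomnyashchikh, Paul Levi

AI summary: Proposes a multi-robot search algorithm that combines Levy flight with an artificial potential field, verifies its efficiency experimentally, and develops a general framework.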

Comments Eighth IEEE International Workshop on Safety, Security, and Rescue Robotics (SSRR-2010), Bremen, Germany, 26-30 July 2010

1108.5543 2026-06-03 cs.RO cs.NE cs.SY eess.SY

Multi-Robot Organisms: State of the Art

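Serge Kernbach, Oliver Scholz, Kanako Harada, Sergej Popesku, Jens Liedke, Humza Raja, Wenguo Liu, Fabio Caparrelli, Jaouhar Jemai, Jiri Havlik, Eugen Meister, Paul Levi

AI summary: Surveys recent progress in artificial multi-robot organisms, covering mechatronics, sensors and computing devices, and software frameworks, and introduces a grand challenge for swarm and reconfigurable robotics.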

1108.4432 2026-06-03 cs.RO cs.SY eess.SY math.OC physics.comp-ph

Exploiting the Passive Dynamics of a Compliant Leg to Develop Gait Transitions

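Harold Roberto Martinez Salazar, Juan Pablo Carbajal

AI summary: Analyzes the spring-loaded inverted pendulum model as a hybrid dynamical system, identifies stable and unstable regions, uses the unstable regions to induce gait transitions at constant energy, and proposes a simple variable angle-of-attack control strategy that keeps the system stable almost always.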

1108.3240 2026-06-03 cs.RO cs.SY eess.SY math.OC

Multi-robot Deployment From LTL Specifications with Reduced Communication

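Marius Kloetzer, Xu Chu Ding, Calin Belta

AI summary: Proposes a hierarchical framework that uses finite abstractions, parallel composition, and motion planning to deploy a global LTL specification automatically on a team of unicycle robots, with algorithms designed to reduce inter-robot communication during execution.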

Comments CDC 2011 Technical Report

1108.2126 2026-06-03 cs.RO cs.SY eess.SY math.OC

Multi-Modal Local Sensing and Communication for Collective Underwater Systems

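Serge Kernbach, Tobias Dipper, Donny Sutantyo

AI summary: Studies local sensing and communication for networked and swarm modes in collective underwater systems, achieving dedicated cooperation among multiple AUVs through specific combinations of modal and sub-modal communication.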

0906.0434 2026-06-03 cs.CV cs.NA math.NA stat.ME

Total Variation, Adaptive Total Variation and Nonconvex Smoothly Clipped Absolute Deviation Penalty for Denoising Blocky Images

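Aditya Chopra, Heng Lian

AI summary: To address the bias of the total variation model, proposes a nonconvex penalty inspired by high-dimensional variable selection, solved efficiently with an MM algorithm; experiments show better performance than conventional methods for denoising blocky images.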

1105.3931 2026-06-03 cs.LG cs.NA math.NA stat.ML

Behavior of Graph Laplacians on Manifolds with Boundary

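Xueyuan Zhou, Mikhail Belkin

AI summary: Analyzes the behavior of graph Laplacians near the boundary of a manifold with boundary, showing scaling properties that differ from the interior and have global effects, and gives convergence rates and numerical results.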

Abstract

In manifold learning, algorithms based on graph Laplacians constructed from data have received considerable attention both in practical applications and theoretical analysis. In particular, the convergence of graph Laplacians obtained from sampled data to certain continuous operators has become an active research topic recently. Most of the existing work has been done under the assumption that the data is sampled from a manifold without boundary or that the functions of interest are evaluated at a point away from the boundary. However, the question of boundary behavior is of considerable practical and theoretical interest. In this paper we provide an analysis of the behavior of graph Laplacians at a point near or on the boundary, discuss their convergence rates and their implications, and provide some numerical results. It turns out that while points near the boundary occupy only a small part of the total volume of a manifold, the behavior of the graph Laplacian there has different scaling properties from its behavior elsewhere on the manifold, with global effects on the whole manifold, an observation with potentially important implications for the general problem of learning on manifolds.


1004.2342 2026-06-03 cs.AI cs.PF cs.SY eess.SY math.OC math.PR

Mean field for Markov Decision Processes: from Discrete to Continuous Optimization

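Nicolas Gast, Bruno Gaujal, Jean-Yves Le Boudec

AI summary: Studies the convergence of Markov decision processes composed of many objects to an optimization problem over ordinary differential equations, obtains a continuous HJB equation through the mean-field approximation, and gives bounds on the reward difference together with a constructive algorithm.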

1009.4219 2026-06-03 cs.LG cs.SY eess.SY math.OC

Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems

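Laurent El Ghaoui, Vivian Viallon, Tarek Rabbani

AI summary: Proposes a fast method that eliminates irrelevant features in LASSO problems, substantially reducing running time, and that extends to general l1-penalized convex problems.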

Comments Submitted to JMLR in April 2011

Abstract

We describe a fast method to eliminate features (variables) in $\ell_1$-penalized least-squares regression (or LASSO) problems. The elimination of features leads to a potentially substantial reduction in running time, especially for large values of the penalty parameter. Our method is not heuristic: it only eliminates features that are guaranteed to be absent after solving the LASSO problem. The feature elimination step is easy to parallelize and can test each feature for elimination independently. Moreover, the computational effort of our method is negligible compared to that of solving the LASSO problem; it is roughly the same as a single gradient step. Our method extends the scope of existing LASSO algorithms to treat larger data sets, previously out of their reach. We show how our method can be extended to general $\ell_1$-penalized convex problems and present preliminary results for the Sparse Support Vector Machine and Logistic Regression problems.
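The abstract does not state the test itself. A minimal Python sketch of one screening test of this kind follows, using the basic SAFE rule as it is commonly cited for the problem min_w 0.5*||Xw - y||^2 + lam*||w||_1: feature k is discarded when |x_k^T y| < lam - ||x_k|| ||y|| (lam_max - lam)/lam_max, with lam_max = max_j |x_j^T y|. The function name `safe_screen`, the column-list data layout, and the toy data are assumptions made for illustration.

```python
import math


def safe_screen(columns, y, lam):
    """Return indices of features discarded by the basic SAFE test.

    columns: list of feature columns x_k (each a list of floats)
    y:       response vector (list of floats)
    lam:     l1 penalty parameter (> 0)
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    corr = [abs(dot(x, y)) for x in columns]  # |x_k^T y|, one pass over the data
    lam_max = max(corr)
    if lam >= lam_max:
        # The LASSO solution is identically zero in this regime.
        return list(range(len(columns)))
    slack = math.sqrt(dot(y, y)) * (lam_max - lam) / lam_max
    # Each feature is tested independently of the others.
    return [k for k, x in enumerate(columns)
            if corr[k] < lam - math.sqrt(dot(x, x)) * slack]


# Hypothetical toy data with orthonormal feature columns.
cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 1.0, 0.1]
print(safe_screen(cols, y, lam=2.5))
print(safe_screen(cols, y, lam=0.5))
```

The cost is dominated by the inner products x_k^T y, which is consistent with the abstract's remark that screening costs about as much as one gradient step. For small penalties the threshold becomes negative and nothing is discarded, which matches the abstract's remark that the savings are largest for large penalty values.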


1104.5391 2026-06-03 cs.LG cs.SY eess.SY math.OC

On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem

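Quan Liu, Kehao Wang, Lin Chen

AI summary: For the restless multi-armed bandit problem, analyzes a class of standard reward functions to establish a closed-form condition on the discount factor that guarantees optimality of the greedy policy under the discounted expected reward criterion, and verifies it in cognitive radio networks.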

Abstract

In this paper, we consider the restless bandit problem, one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-hard to approximate to any non-trivial factor, so optimality is very difficult to obtain. A natural alternative is the greedy policy, given its stability and simplicity. However, the greedy policy generally loses optimality because of its intrinsically myopic behavior. In this paper, by analyzing a class of so-called standard reward functions, we establish a closed-form condition on the discount factor β under which the optimality of the greedy policy is guaranteed under the discounted expected reward criterion; in particular, the condition with β = 1 indicates the optimality of the greedy policy under the average accumulative reward criterion. Thus, the standard form of the reward function can easily be used to judge the optimality of the greedy policy without any complicated calculation. Some examples in cognitive radio networks are presented to verify the effectiveness of the mathematical result in judging the optimality of the greedy policy.


1001.4475 2026-06-03 cs.LG cs.SY eess.SY math.OC math.ST stat.TH

X-Armed Bandits

Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvari

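AI summary: For stochastic multi-armed bandits whose arm set is a generic measurable space and whose mean-payoff function is locally Lipschitz with respect to a known dissimilarity, proposes the HOO algorithm, which achieves dimension-independent regret bounds and is proved minimax optimal.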

Abstract

We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches.


1103.4342 2026-06-03 cs.RO cs.SY eess.SY math.OC

MDP Optimal Control under Temporal Logic Constraints

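Xu Chu Ding, Stephen L. Smith, Calin Belta, Daniela Rus

AI summary: For Markov decision processes (MDPs), proposes a method that automatically generates a control policy under a given linear temporal logic (LTL) specification, introduces an optimizing proposition to minimize expected cost, and synthesizes optimal or sub-optimal policies with a dynamic programming algorithm.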

Comments Technical report accompanying the CDC2011 submission

Abstract

In this paper, we develop a method to automatically generate a control policy for a dynamical system modeled as a Markov Decision Process (MDP). The control specification is given as a Linear Temporal Logic (LTL) formula over a set of propositions defined on the states of the MDP. We synthesize a control policy such that the MDP satisfies the given specification almost surely, if such a policy exists. In addition, we designate an "optimizing proposition" to be repeatedly satisfied, and we formulate a novel optimization criterion in terms of minimizing the expected cost in between satisfactions of this proposition. We propose a sufficient condition for a policy to be optimal, and develop a dynamic programming algorithm that synthesizes a policy that is optimal under some conditions, and sub-optimal otherwise. This problem is motivated by robotic applications requiring persistent tasks, such as environmental monitoring or data gathering, to be performed.


1010.5290 2026-06-03 cs.LG cs.NA math.NA

Converged Algorithms for Orthogonal Nonnegative Matrix Factorizations

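Andri Mirzal

AI summary: Proposes uni-orthogonal and bi-orthogonal nonnegative matrix factorization algorithms based on the Lee and Seung algorithms and on ideas of Lin, gives convergence proofs, and confirms convergence experimentally.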

Comments 55 pages, 11 figures

1103.2491 2026-06-03 cs.LG cs.GT cs.SY eess.SY math.OC

Heterogeneous Learning in Zero-Sum Stochastic Games with Incomplete Information

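Quanyan Zhu, Hamidou Tembine, Tamer Basar

AI summary: For zero-sum stochastic games with incomplete information, proposes heterogeneous learning schemes in which each agent adopts a different learning pattern, uses stochastic approximation to reduce them to ordinary differential equations, and applies them to security games where attacker and defender learn differently because of differences in rationality and information.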

Abstract

Learning algorithms are essential for the applications of game theory in a networking environment. In dynamic and decentralized settings where the traffic, topology and channel states may vary over time and communication between agents is impractical, it is important to formulate and study games of incomplete information and fully distributed learning algorithms that require each agent to have only a minimal amount of information about the remaining agents. In this paper, we address this major challenge and introduce heterogeneous learning schemes in which each agent adopts a distinct learning pattern in the context of games with incomplete information. We use stochastic approximation techniques to show that the heterogeneous learning schemes can be studied in terms of their deterministic ordinary differential equation (ODE) counterparts. Depending on the learning rates of the players, these ODEs could be different from the standard replicator dynamics, (myopic) best response (BR) dynamics, logit dynamics, and fictitious play dynamics. We apply the results to a class of security games in which the attacker and the defender adopt different learning schemes due to differences in their rationality levels and the information they acquire.


1102.3396 2026-06-03 cs.RO cs.SY eess.SY

Detecting Separation in Robotic and Sensor Networks

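Chenda Liao, Harshavardhan Chenji, Prabir Barooah, Radu Stoleru, Tamás Kalmár-Nagy

AI summary: For robotic and sensor networks in which nodes may become separated from the base station through mobility or failure, proposes a distributed algorithm based on an averaging scheme that detects permanent separation by monitoring the convergence of each node's state.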

Abstract

In this paper we consider the problem of detecting separation of agents from a base station in robotic and sensor networks. Such separation can be caused by mobility and/or failure of the agents. While separation/cut detection may be performed by passing messages between a node and the base in static networks, such a solution is impractical for networks with high mobility, since routes are constantly changing. We propose a distributed algorithm to detect separation from the base station. The algorithm consists of an averaging scheme in which every node updates a scalar state by communicating with its current neighbors. We prove that if a node is permanently disconnected from the base station, its state converges to $0$. If a node is connected to the base station in an average sense, even if not connected at any instant, then we show that the expected value of its state converges to a positive number. Therefore, a node can detect if it has been separated from the base station by monitoring its state. The effectiveness of the proposed algorithm is demonstrated through simulations, a real system implementation, and experiments involving both static and mobile networks.
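The behavior described above can be illustrated with a minimal Python sketch. The update rule here is an assumption chosen for illustration and is not taken from the abstract: each node replaces its state with the sum of its neighbors' states divided by its degree plus one, and only the base station adds a constant input `source_strength`. The network, the function names, and the step counts are likewise hypothetical.

```python
def averaging_step(states, neighbors, base, source_strength=1.0):
    """One synchronous update using only each node's current neighbors."""
    new_states = {}
    for node, nbrs in neighbors.items():
        total = sum(states[j] for j in nbrs)
        if node == base:
            total += source_strength  # only the base station injects input
        new_states[node] = total / (len(nbrs) + 1)
    return new_states


def run(states, neighbors, base, steps):
    """Apply the update repeatedly on a fixed topology."""
    for _ in range(steps):
        states = averaging_step(states, neighbors, base)
    return states


# Hypothetical line network 0 - 1 - 2 - 3 with the base station at node 0.
connected = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
states = run({n: 0.0 for n in connected}, connected, base=0, steps=200)

# The link between nodes 1 and 2 then fails permanently.
after_cut = {0: [1], 1: [0], 2: [3], 3: [2]}
final = run(states, after_cut, base=0, steps=200)
print(final)
```

Under this assumed rule, nodes 2 and 3 lose their only path to the injected input, so their states decay toward zero, while nodes 0 and 1 settle at positive values. A node could therefore flag separation by comparing its state against a small threshold.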


1102.0899 2026-06-03 cs.AI cs.CV cs.LG cs.NA math.NA math.PR

Evidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov Model

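Michael DelRose, Christian Wagner, Philip Frederick

AI summary: To address the inability of hidden Markov models to model links between observations, proposes the Evidence Feed Forward hidden Markov model, which adds probabilistic observation-to-observation links to improve classification, and validates it on visual action data and measurement data.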

Comments 19 pages, International Journal of Artificial Intelligence and Applications


DOI: 10.5121/ijaia.2011.2101
Journal ref: International Journal of Artificial Intelligence and Applications (IJAIA), Vol. 2, No. 1, Jan 2011

Abstract

The ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. The intelligence of current computer algorithms has not reached this level of complexity, but there are several research efforts that are working towards it. With the number of classification algorithms available, it is hard to determine which algorithm works best for a particular situation. In classification of visual human intent data, Hidden Markov Models (HMMs) and their variants are leading candidates. The inability of HMMs to provide a probability for the observation-to-observation linkages is a major weakness of this classification technique. If a person is visually identifying an action of another person, they monitor patterns in the observations. By estimating the next observation, people are able to summarize the actions and thus determine, with fairly good accuracy, the intention of the person performing the action. These visual cues and linkages are important in creating intelligent algorithms for determining human actions based on visual observations. The Evidence Feed Forward Hidden Markov Model is a newly developed algorithm that provides observation-to-observation linkages. The following research addresses the theory behind Evidence Feed Forward HMMs, provides mathematical proofs that their parameters can be learned to optimize the likelihood of observations with an Evidence Feed Forward HMM, which is important in all computational intelligence algorithms, and gives comparative examples with standard HMMs in classification of both visual action data and measurement data, thus providing a strong base for Evidence Feed Forward HMMs in classification of many types of problems.


1003.4831 2026-06-03 cs.RO cs.SY eess.SY physics.med-ph

Ball on a beam: stabilization under saturated input control with large basin of attraction

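Yannick Aoustin, Alexander Formal'skii

AI summary: For two underactuated beam-and-ball systems, straight and circular, designs feedback control laws from the Jordan form that account for voltage saturation so that the basin of attraction approaches the controllability domain, and validates the nonlinear control laws in simulation.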


DOI: 10.1007/s11044-008-9128-0
Journal ref: Multibody System Dynamics 21 (2008) 71-89

Abstract

This article is devoted to the stabilization of two underactuated planar systems, the well-known straight beam-and-ball system and an original circular beam-and-ball system. The feedback control for each system is designed using the Jordan form of its model, linearized near the unstable equilibrium. The limits on the voltage fed to the motor are taken into account explicitly. The straight beam-and-ball system has one unstable mode in the motion near the equilibrium point. The proposed control law ensures that the basin of attraction coincides with the controllability domain. The circular beam-and-ball system has two unstable modes near the equilibrium point. Therefore, this device, never considered in the past, is much more difficult to control than the straight beam-and-ball system. The main contribution is a simple new control law whose gain parameters can be adjusted so that, in the linear case, the basin of attraction approaches the controllability domain arbitrarily closely. For both nonlinear systems, simulation results are presented to illustrate the efficiency of the designed nonlinear control laws and to determine the basin of attraction.


1010.0301 2026-06-03 cs.CV cs.NA math.NA

A Microwave Imaging and Enhancement Technique from Noisy Synthetic Data

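Anjan Kumar Kundu, Bijoy Bandopadhyay, Sugata Sanyal

AI summary: Proposes an inverse iterative algorithm for microwave imaging based on a method-of-moments solution, ensures convergence through constrained optimization, handles ill-posedness with the Levenberg-Marquardt method, and finally enhances the images reconstructed from noisy synthetic data.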

Comments 8 Pages, 10 Figures, International Symposium on Advanced Engineering and Applied Management-40th Anniversary in Higher Education-Image Processing-University Politegnica, Timisoara, 4-5 November, 2010, Hunedoara, ROMANIA

1008.3760 2026-06-03 cs.RO cs.SY eess.SY math.OC

Formal-language-theoretic Optimal Path Planning For Accommodation of Amortized Uncertainties and Dynamic Effects

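Ishanu Chattopadhyay, Anthony Cascone, Asok Ray

AI summary: Proposes a globally optimal path planning method based on the theory of quantitative measures of formal languages, models uncertainty with probabilistic uncontrollable transitions, and uses search-free combinatorial optimization to maximize the measure of a probabilistic regular language, so that robot navigation maximizes the probability of reaching the goal while minimizing the probability of hitting obstacles.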

Comments Submitted for review for possible publication elsewhere; journal reference will be added when available

Abstract

We report a globally-optimal approach to robotic path planning under uncertainty, based on the theory of quantitative measures of formal languages. A significant generalization of the language-measure-theoretic path planning algorithm $\nu^\star$ is presented that explicitly accounts for average dynamic uncertainties and estimation errors in plan execution. The notion of the navigation automaton is generalized to include probabilistic uncontrollable transitions, which account for uncertainties by modeling and planning for probabilistic deviations from the computed policy in the course of execution. The planning problem is solved by casting it in the form of a performance maximization problem for probabilistic finite state automata. In essence we solve the following optimization problem: Compute the navigation policy which maximizes the probability of reaching the goal, while simultaneously minimizing the probability of hitting an obstacle. Key novelties of the proposed approach include the modeling of uncertainties using the concept of uncontrollable transitions, and the solution of the ensuing optimization problem using a highly efficient search-free combinatorial approach to maximize quantitative measures of probabilistic regular languages. Applicability of the algorithm in various models of robot navigation has been shown with experimental validation on a two-wheeled mobile robotic platform (SEGWAY RMP 200) in a laboratory environment.


1003.2441 2026-06-03 cs.SD cs.NA math.NA

Up-sampling and Natural Sample Value Computation for Digital Pulse Width Modulators

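Kien C. Nguyen, Dilip V. Sarwate

AI summary: Proposes a new approach that combines upsampling, digital interpolation, and natural sampling conversion, implemented with polyphase digital interpolation filters and digital differentiators, to reduce harmonic distortion in digital pulse width modulation.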

Abstract

Digital pulse width modulation has been considered for high-fidelity and high-efficiency audio amplifiers for several years. It has been shown that the distortion can be reduced and the implementation of the system can be simplified if the switching frequency is much higher than the Nyquist rate of the modulating waveform. Hence, the input digital source is normally upsampled to a higher frequency. It was also proved that converting uniform samples to natural samples will decrease the harmonic distortion. Thus, in this paper, we examine a new approach that combines upsampling, digital interpolation and natural sampling conversion. This approach uses poly-phase implementation of the digital interpolation filter and digital differentiators. We will show that the structure consists of an FIR-type linear stage and a nonlinear stage. Some spectral simulation results of a pulse width modulation system based on this approach will also be presented. Finally, we will discuss the improvement of the new approach over old algorithms.
