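arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.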

2606.18196 2026-06-19 eess.SP new submission

Receiver-Aware Analysis and Verification of the Spectral Separation Coefficient Under Interference-Induced Degradation

Lucas Heublein, Fabian Benschuh, Alexander Rügamer, Felix Ott

AI summary: This paper computes a receiver-dependent spectral separation coefficient (SSC) by incorporating receiver front-end characteristics, and experimentally verifies the robustness of the interference impact computation using real-world and simulated datasets.

Comments 7 pages, 4 figures

Abstract

Interference poses a significant challenge to satellite-based positioning systems, making it essential to accurately quantify the effects of specific interference types on receiver performance and the resulting reliability of position computation. In current practice, interference effects are often quantified using receiver-independent metrics, with receiver-specific front-end characteristics either idealized or only implicitly considered. In this paper, we address this limitation by explicitly incorporating receiver-specific front-end characteristics into the computation of interference effects and validating the resulting receiver-dependent analysis experimentally. To this end, we record a real-world open-field dataset comprising 210 distinct interference scenarios and compute the receiver-dependent spectral separation coefficient (SSC) and interference impact for a specific receiver module. Furthermore, we verify the computation using a controlled dataset generated with a radio frequency constellation simulator (RFCS), employing the same receiver module and replaying similar interference classes. The comparison of results obtained in both environments demonstrates the robustness of the interference impact computation.
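
A receiver-dependent SSC can be illustrated numerically. The sketch below assumes the common textbook form, SSC = integral of G_s(f) G_i(f) |H(f)|^2 df, where G_s and G_i are unit-power spectral densities of the desired signal and the interference and |H(f)|^2 is the front-end power response. The abstract does not give the paper's exact formulation, so the function names, the flat spectra, and the 20 dB front-end roll-off are all illustrative assumptions.

```python
def spectral_separation_coefficient(freqs, psd_signal, psd_interf, frontend_gain=None):
    """Approximate SSC = integral of G_s(f) * G_i(f) * |H(f)|^2 df on a uniform grid."""
    if not (len(freqs) == len(psd_signal) == len(psd_interf)):
        raise ValueError("all inputs must have the same length")
    if frontend_gain is None:
        # Idealized (receiver-independent) front end: unit power gain everywhere.
        frontend_gain = [1.0] * len(freqs)
    df = freqs[1] - freqs[0]  # uniform grid spacing in Hz
    return sum(gs * gi * h2 * df
               for gs, gi, h2 in zip(psd_signal, psd_interf, frontend_gain))


def flat_psd(n_bins, bandwidth_hz):
    """Unit-power PSD that is flat over the whole analysis bandwidth."""
    return [1.0 / bandwidth_hz] * n_bins


# Hypothetical scenario: 2 MHz analysis band, flat signal and flat interference.
bandwidth_hz = 2.0e6
n_bins = 1000
step = bandwidth_hz / n_bins
freqs = [-bandwidth_hz / 2 + (k + 0.5) * step for k in range(n_bins)]
signal = flat_psd(n_bins, bandwidth_hz)
jammer = flat_psd(n_bins, bandwidth_hz)

ideal = spectral_separation_coefficient(freqs, signal, jammer)

# Assumed front end: the outer half of the band is attenuated by 20 dB in power.
gain = [1.0 if abs(f) <= bandwidth_hz / 4 else 0.01 for f in freqs]
receiver_aware = spectral_separation_coefficient(freqs, signal, jammer, gain)
```

With these toy inputs the idealized value is 1/B, and the front-end response lowers the coefficient, which is the kind of receiver dependence the abstract refers to.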

2606.16417 2026-06-19 cs.SD eess.AS new submission

Joycent: Diffusion-based Accent TTS without Accented Phone Prediction

Xintong Wang, Ye Wang

Affiliations: University of Science and Technology of China

AI summary: Proposes Joycent, a diffusion-based accent TTS method that synthesizes accented speech directly from standard phone sequences and speech references without accented phone prediction. It integrates accent and speaker representations through conditional layer normalization and introduces the WhisAID accent identification model, improving accentedness while preserving speaker identity.

Abstract

Accent text-to-speech (TTS) aims to synthesize speech with target accents. Existing accent TTS systems typically rely on a two-stage pipeline that first converts standard phone sequences into accented phone sequences and then synthesizes accented speech. However, such approaches suffer from error accumulation and require paired standard-accented phone sequence data, which is often limited in practice. Moreover, text-based accented phone representations are insufficient to model acoustic accent characteristics such as prosody and rhythm. In this work, we propose Joycent, a diffusion-based accent TTS model that synthesizes accented speech directly from standard phone sequences and speech references without accented phone prediction. Joycent integrates accent and speaker representations through conditional layer normalization (CLN) in the text encoder. We introduce WhisAID, a Mandarin accent identification model trained on accented Mandarin speech to extract accent representations. Experimental results show that Joycent improves accentedness while preserving speaker identity compared with baseline systems. We release our code and demos at: https://github.com/oshindow/Joycent-code.

2606.16057 2026-06-19 cs.RO cs.SY eess.SP eess.SY new submission

A Smart-Scheduled Hybrid (SSH) EKF-FGO State Estimation

Eric Levy, Soosan Beheshti

AI summary: Using a Smart-Scheduled Hybrid (SSH) EKF-FGO framework, this paper experimentally treats optimisation scheduling as an independent design variable, studies its role in balancing estimation accuracy against computational cost, and shows in planar SLAM simulations that scheduling strongly affects pre-optimisation drift, transient error, and runtime.

Comments This work has been accepted for presentation/publication at the 2026 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). The final published version will appear in IEEE Xplore.

Abstract

Reliable state estimation in robotics and control requires balancing estimation accuracy against computational cost. While filtering-based methods such as the Extended Kalman Filter (EKF) provide efficient real-time updates, and optimisation-based formulations using factor graphs improve global consistency, the role of optimisation scheduling is often treated implicitly rather than examined as an explicit design variable. This paper presents an experimental study that explicitly isolates optimisation scheduling using a Smart-Scheduled Hybrid (SSH) EKF-FGO framework as a controlled testbed. By combining EKF-based state propagation with periodically invoked batch optimisation and holding solver structure and effort fixed, the main contribution of this work is the experimental characterisation of optimisation scheduling as an independent design variable governing the trade-off between intermediate estimation accuracy and computational cost. Simulation results in a planar SLAM environment show that scheduling strongly influences pre-optimisation drift, transient error behaviour, and runtime. In particular, the results identify operating regimes in which most of the benefit of global optimisation can be retained at a fraction of the computational cost, highlighting optimisation scheduling as an under-explored yet critical consideration in hybrid state estimation systems.

2606.14784 2026-06-19 cs.SD cs.LG eess.AS new submission

LLM-Based Synthetic Ground Truth Generation for Audio-Based Emotion Classification via In-Context Learning

Qing Huang, Pooja Pol, Jianing Zhang

Affiliations: School of Business, Technical University of Applied Sciences Augsburg; Data Science und Autonome Systeme Technologietransferzentrum (TTZ)

AI summary: Proposes using large language models (LLMs) and in-context learning (ICL) to automatically generate emotion-related synthetic ground truth labels from streaming speech data in multi-user VR environments, addressing the difficulty of annotating team collaboration states.

Comments https://icaiit.org/paper.php?paper=14th_ICAIIT_2/3_9

Abstract

Understanding human states and interaction dynamics is a core goal of human-computer interaction (HCI). As interaction paradigms become more immersive, virtual reality (VR) has emerged as a powerful platform for studying collaborative work. In such settings, evaluating team collaboration states, including team performance and team resilience, requires continuous and reliable inference of latent team-level cognitive and affective states from multi-modal sensor data, such as speech signals. However, generating ground truth labels for these latent states remains challenging due to sensor-induced noise, contextual variability, and sparse expert annotations. Traditional self-reporting approaches provide only static and delayed measurements and are therefore insufficient for capturing dynamic team processes reflected in continuous speech data. In this work, we propose a large language model (LLM)-driven, agentic inference workflow for automated emotion-related synthetic ground truth generation from streaming speech data in multi-user VR environments. Leveraging the generalization capabilities of LLMs, we use In-Context Learning (ICL) with few-shot demonstrations of paired audio-based samples and their corresponding transcriptions. ICL tends to achieve task adaptation comparable to model fine-tuning while circumventing the computational overhead of parameter updates. To construct informative and robust in-context prompts, we adopt a retrieval-based selection strategy that dynamically identifies relevant audio demonstrations based on similarity in the acoustic feature space.
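
The retrieval-based selection of demonstrations can be sketched as a nearest-neighbour lookup in feature space followed by few-shot prompt assembly. This is a minimal sketch under stated assumptions: the three-dimensional feature vectors, the example transcripts, the label names, and the prompt layout are hypothetical, and cosine similarity is one plausible similarity measure, not necessarily the one the authors use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_demonstrations(query_features, pool, k=2):
    """Return the k pool entries whose acoustic features are closest to the query."""
    ranked = sorted(
        pool,
        key=lambda demo: cosine_similarity(query_features, demo["features"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query_transcript, demos):
    """Assemble a few-shot prompt: labelled demonstrations, then the unlabelled query."""
    blocks = [f"Transcript: {d['transcript']}\nLabel: {d['label']}" for d in demos]
    blocks.append(f"Transcript: {query_transcript}\nLabel:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pool with toy acoustic feature vectors.
pool = [
    {"features": [0.9, 0.1, 0.0], "transcript": "We are out of time!", "label": "stressed"},
    {"features": [0.1, 0.9, 0.1], "transcript": "Nice, that worked.", "label": "positive"},
    {"features": [0.8, 0.2, 0.1], "transcript": "Hurry, check the valve.", "label": "stressed"},
]
query_features = [1.0, 0.15, 0.05]
chosen = select_demonstrations(query_features, pool, k=2)
prompt = build_prompt("Move, move, we need that panel open.", chosen)
```

The resulting prompt would then be passed to the LLM; no model call is shown here because the abstract does not specify one.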

2606.13794 2026-06-19 eess.SY cs.AI cs.RO cs.SY new submission

An integrated interpretable control effectiveness learning and nonlinear control allocation methodology for overactuated aircrafts

Umut Demir, Aamir Ahmad, Walter Fichter

Affiliations: University of Stuttgart, Faculty of Aerospace Engineering and Geodesy, Institute of Flight Mechanics and Control (iFR)

AI summary: Proposes learning the control effectiveness mapping via sparse identification of nonlinear dynamics, combined with an online adaptation mechanism, to achieve efficient nonlinear control allocation for overactuated aircraft with both interpretability and low computational cost.

Abstract

Nonlinear dynamics and the strong couplings that arise between multiple effectors undermine the assumptions behind conventional, linear control allocation techniques. When flight enters regimes where nonlinear effects dominate, linear allocators exhibit reduced accuracy due to increased model mismatch, which subsequently degrades performance and robustness of the flight control system. High-fidelity onboard models and black-box, data-driven approaches can recover accuracy across the flight envelope, but the former impose computational burdens prohibitive for real-time allocation, while the latter sacrifice the interpretability required for verification and fault diagnosis. This paper addresses these limitations by learning an explicit, physics-constrained analytical model of the control effectiveness mapping from representative flight data using Sparse Identification of Nonlinear Dynamics. The resulting mapping is compact, interpretable, and admits analytical derivatives, enabling efficient computation within nonlinear solvers that additionally incorporate actuator dynamics, without requiring an onboard model. An online adaptation mechanism monitors prediction residuals and refreshes the model when significant plant changes are detected, providing graceful reconfiguration under actuator failures and varying operating conditions. The methodology is evaluated on a high-fidelity nonlinear benchmark aircraft across a range of aggressive maneuvers, achieving accuracy comparable to a full nonlinear onboard model while substantially reducing computational cost relative to established baselines.

2605.02989 2026-06-19 cs.IT eess.SP math.IT stat.ML updated version

Information Theory and Statistical Learning

Abbas El Gamal

AI summary: A preprint of a chapter under consideration for the third edition of Cover and Thomas's Elements of Information Theory. It presents the role of divergence measures in model training, with examples ranging from linear regression to generative diffusion models, and gives a more systematic derivation of the diffusion model.

Abstract

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. The table of contents (EIT-3 ToC) of the new edition can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu. Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), f-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
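
Two standard identities show why divergence measures appear in model training. The notation below is an assumption chosen for illustration, not the chapter's own: p is the data distribution, q_\theta and p_\theta are parametric models, z is a latent variable, and H(p) is the entropy of p.

```latex
\begin{align}
D(p \,\|\, q_\theta)
  &= \mathbb{E}_{p}\!\left[\log \frac{p(X)}{q_\theta(X)}\right]
   = -H(p) - \mathbb{E}_{p}\big[\log q_\theta(X)\big], \\
\arg\min_{\theta} D(p \,\|\, q_\theta)
  &= \arg\max_{\theta}\, \mathbb{E}_{p}\big[\log q_\theta(X)\big], \\
\log p_\theta(x)
  &= \underbrace{\mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q(z \mid x)}\right]}_{\text{ELBO}}
   + D\big(q(z \mid x) \,\|\, p_\theta(z \mid x)\big)
   \;\ge\; \text{ELBO}.
\end{align}
```

The first two lines say that minimizing the KL divergence from the data distribution is the same as maximizing expected log-likelihood, since H(p) does not depend on \theta. The third line shows that the ELBO falls short of the log-likelihood by exactly a KL term, which is nonnegative.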

2606.05846 2026-06-19 cs.CL eess.AS updated version

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Gio Paik, Hyunseo Shin, Soungmin Lee

Affiliations: University of Tokyo

AI summary: Investigates, through model merging and domain generalization methods, whether code-switching capabilities learned from a limited set of language pairs generalize to unseen language pairs. Experiments show that merged bilingual CS-ASR models do generalize to unseen pairs, but only modestly.

Comments ICML 2026 Workshop on Machine Learning for Audio

Abstract

Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets. Nevertheless, these approaches face an inherent scalability limitation, as support for CS must be developed separately for language pairs whose number grows combinatorially with the number of supported languages. In this work, we investigate whether CS capabilities learned from a limited set of seen language pairs can generalize to unseen language pairs through model merging and domain generalization methods. Our experiments show that merged bilingual CS-ASR models modestly generalize to unseen language pairs, suggesting limited transfer of bilingual CS capabilities across language pairs.

2605.28654 2026-06-19 cs.RO cs.SY eess.SY math.OC updated version

Integrated Exploration-Aware UAV Route Optimization and Path Planning

Jimin Choi, Grant Stagg, Cameron K. Peterson, Max Z. Li

Affiliations: Department of Aerospace Engineering, University of Michigan; Department of Electrical Engineering, Brigham Young University; Department of Aerospace Engineering, Department of Civil and Environmental Engineering, and Department of Industrial and Operations Engineering, University of Michigan

AI summary: Proposes an integrated exploration-aware UAV route optimization and path planning framework that uses risk maps, uncertain region-of-interest modeling, B-spline trajectory optimization, and online replanning to balance visits to reported locations against exploration for new information in hazard monitoring, improving average KL reduction by 15.9% over the offline optimized planner.

Abstract

Uncrewed aerial vehicles (UAVs) are increasingly used for exploration-driven monitoring in hazardous environments such as disaster zones, contaminated sites, wildfire areas, and damaged infrastructure, where limited flight endurance must be allocated between visiting reported locations and gathering new information. In these settings, prior information regarding hazards is often incomplete, spatially imprecise, and subject to change during execution. For example, initial reports may identify a region where a hazard is likely to exist, but the actual hazard may be displaced, partially observed, or entirely unreported. We present an integrated exploration-aware UAV route optimization and path planning framework for hazard monitoring under uncertain and evolving prior information. The environment is represented as a spatial risk map, where each location has an associated belief of hazardous conditions. Reported hazards are modeled as uncertain regions of interest (ROIs) rather than confirmed target locations, requiring the UAV to inspect reported areas while also using its limited flight endurance to explore informative regions. The proposed method solves a vehicle routing problem over reported ROIs, augments the route with auxiliary pseudo-nodes to improve spatial coverage, allocates the remaining flight distance budget across route segments, and optimizes dynamically feasible B-spline trajectories for local exploration. During execution, UAV measurements update a grid-based belief map, and the remaining trajectory is replanned when new information and the remaining budget justify adaptation. Across 48 scenario configurations, online replanning improves average KL reduction by 15.9% over the offline optimized planner and 48.6% over straight-line traversal.
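
The grid-based belief update can be illustrated for a single cell. The sketch below assumes a binary hazard state, a binary detection, and fixed detection and false-alarm probabilities. It also assumes that "KL reduction" means the drop in KL divergence between the true cell state and the belief. None of these details are given in the abstract, so every name and number here is hypothetical.

```python
import math


def update_belief(prior, detected, p_detect=0.9, p_false_alarm=0.1):
    """Bayes update of P(hazard) in one grid cell from a binary detection."""
    if detected:
        like_hazard, like_clear = p_detect, p_false_alarm
    else:
        like_hazard, like_clear = 1.0 - p_detect, 1.0 - p_false_alarm
    evidence = like_hazard * prior + like_clear * (1.0 - prior)
    return like_hazard * prior / evidence


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence D(p || q) in nats between two Bernoulli distributions."""
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


# Hypothetical cell: hazard truly present, uninformative prior, one positive detection.
truth = 1.0
prior = 0.5
posterior = update_belief(prior, detected=True)
kl_reduction = bernoulli_kl(truth, prior) - bernoulli_kl(truth, posterior)
```

Summing such per-cell reductions along a candidate trajectory gives one simple way to score how informative that trajectory is, which is the quantity a replanner would trade against the remaining flight budget.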

2603.19895 2026-06-19 eess.SY cs.SY math.CV math.DG math.DS updated version

Complex Frequency as Generalized Eigenvalue

Nikolas Sofos, Federico Milano

AI summary: Shows that complex frequency, when applied to the states of linear time-invariant systems, is a generalization of eigenvalues. Starting from the definition and decomposition of geometric frequency, the paper obtains complex frequency as its restriction to the two-dimensional Euclidean plane, proves that complex frequencies coincide with eigenvalues for linear systems, and notes that this equivalence does not generally hold for nonlinear systems.

Abstract

This paper shows that the concept of complex frequency, originally introduced to characterize the dynamics of signals with complex values, constitutes a generalization of eigenvalues when applied to the states of linear time-invariant (LTI) systems. Starting from the definition of geometric frequency, which provides a geometrical interpretation of frequency in electric circuits and admits a natural decomposition into symmetric and antisymmetric components associated with amplitude variation and rotational motion, respectively, we show that complex frequency arises as its restriction to the two-dimensional Euclidean plane. For LTI systems, it is shown that the complex frequencies computed from the system's states, subject to a non-isometric transformation, coincide with the original system's eigenvalues. This equivalence is demonstrated for diagonalizable systems of any order. The paper provides a unified geometric interpretation of eigenvalues, bridging classical linear system theory with differential geometry of curves. The paper also highlights that this equivalence does not generally hold for nonlinear systems. On the other hand, the geometric frequency of the system can always be defined, providing a geometrical interpretation of the system flow. A variety of examples based on linear and nonlinear circuits illustrate the proposed framework.
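
For a diagonalizable LTI system the stated equivalence can be checked in a few lines. The sketch below assumes the usual definition of the complex frequency of a nonzero complex signal z(t) as \eta = \dot{z}/z, and takes the transformation to be the change to modal coordinates. Whether this matches the paper's exact construction is an assumption.

```latex
\begin{align}
\dot{x} &= A x, \qquad A = V \Lambda V^{-1}, \qquad
  \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \\
z &= V^{-1} x \;\Longrightarrow\; \dot{z} = V^{-1} A V z = \Lambda z
  \;\Longrightarrow\; \dot{z}_i = \lambda_i z_i, \\
\eta_i &= \frac{\dot{z}_i}{z_i}
  = \frac{d}{dt}\ln |z_i| + j\,\frac{d}{dt}\arg z_i
  = \lambda_i \qquad (z_i \neq 0).
\end{align}
```

The real part of \eta_i is the rate of amplitude variation and the imaginary part is the rate of rotation, matching the symmetric and antisymmetric components mentioned in the abstract. The map V^{-1} preserves lengths only when V can be chosen unitary, that is, when A is normal, so in general the transformation is non-isometric.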

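2605.17443 2026-06-19 cs.CL cs.SD eess.AS updated version

Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades

Donghyuk Jung, Youngwon Choi

Affiliations: Korea Culture Technology Institute, Republic of Korea; Maum AI Inc., Republic of Korea

AI summary: Studies error propagation in ASR-LLM cascades for Korean spoken question answering (SQA). By analyzing downstream semantic failures, it shows error effects that conventional ASR metrics do not fully capture, finds that cascade degradation is consistent across LLMs of differing capability, identifies single-character ASR errors as a channel for semantic failure, and, in an auxiliary comparison, indicates that large audio language models outperform ASR-LLM pipelines with matched language models on noisy Korean SQA.

Comments Preprint. Submitted to APSIPA ASC 2026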

2605.08525 2026-06-19 cs.RO cs.SY eess.SY updated version

Model-Reference Adaptive Flight Control of a 95-mg Insect-Scale Flapping-Wing Aerial Robot

Francisco M. F. R. Gonçalves, Conor K. Trygstad, Néstor O. Pérez-Arancibia

Affiliations: Washington State University

AI summary: To address parameter uncertainty and disturbances in insect-scale flapping-wing aerial robots, proposes a model-reference adaptive control (MRAC) architecture combined with a hybrid multiplicative extended Kalman filter to achieve high-performance position control, and validates hovering and trajectory tracking in experiments with a 95-mg robot.

Comments Under review, 8 pages, 7 figures

Abstract

Due to the system's scale and complex fabrication, the model describing the dynamics of a flapping-wing insect-scale aerial robot is subject to parameter uncertainty, for example in the inertia matrix and the actuator mapping of the flier. Furthermore, due to its low inertia, this type of robot is greatly affected by stochastic and systematic disturbances during flight, including power-wire tension, gusts, and undesired aerodynamic forces produced by wing misalignment. Therefore, the high-performance execution of complex maneuvers at the subdecigram scale requires the robot to adapt its behavior to counteract disturbances and model uncertainty. Toward this objective, we introduce a model-reference adaptive control (MRAC) architecture for high-performance position control of flapping-wing robotic insects that can be modeled as rigid bodies in three-dimensional (3D) space. In addition, we demonstrate how the implementation of a hybrid multiplicative extended Kalman filter for estimating current and desired angular velocities during flight significantly dampens attitude vibrations, especially along the roll and pitch degrees of freedom (DOFs), and also improves flight performance. To show the suitability, functionality, and high performance of the proposed approach, we conducted real-time hovering and trajectory-tracking 6-DOF flight control experiments with a 95-mg insect-scale aerial robot.

2605.00457 2026-06-19 cs.NI cs.LG cs.SY eess.SY updated version

Utility-Aware DRL-Based TXOP Adaptation for NR-U and Wi-Fi Coexistence Networks

Po-Heng Chou, Yi-Fang Yu, Shou-Yu Chen, Chiapin Wang

Affiliations: Research Center for Information Technology Innovation (CITI), Academia Sinica (AS); Department of Electrical Engineering, National Taiwan Normal University (NTNU)

AI summary: To address unbalanced spectrum utilization when NR-U and Wi-Fi coexist in unlicensed spectrum, proposes a deep reinforcement learning (DRL) framework whose reward design enables flexible control of the trade-off among fairness, throughput, and utility.

Comments 15 pages, 13 figures, 2 tables, submitted to IEEE Open Journal of the Communications Society

Abstract

The coexistence of NR-U and Wi-Fi in the unlicensed spectrum introduces a challenging resource management problem, where heterogeneous channel access mechanisms can lead to unbalanced spectrum utilization and severe Wi-Fi performance degradation. To address this issue, this paper proposes a utility-aware deep reinforcement learning (DRL) framework for adaptive transmission opportunity (TXOP) control in NR-U/Wi-Fi coexistence networks. The coexistence process is formulated as a Markov decision process (MDP), in which the NR-U TXOP duration is treated as a controllable variable for regulating post-access channel occupancy. A deep Q-network (DQN) is then employed to learn adaptive TXOP control policies through online interaction with the coexistence environment. A key feature of the proposed framework is the integration of a configurable reward and criterion design, which enables explicit control of the fairness-efficiency-utility tradeoff. Three operating policies are developed, namely absolute fairness, moderate fairness, and utility-oriented moderate fairness, to characterize different coexistence operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared with the absolute fairness policy, the moderate fairness policy improves aggregate throughput by 68.22%, while the utility-oriented policy achieves a 177.6% improvement under the adopted utility evaluation metric. These results demonstrate that the proposed utility-aware DRL framework provides an effective and flexible solution for adaptive TXOP control and tradeoff management in heterogeneous unlicensed coexistence networks.
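
The Jain fairness index quoted above has a closed form, J = (sum of x_i)^2 / (n * sum of x_i^2), where x_i is the throughput of user or network i. The sketch below computes it. The throughput values are hypothetical and are not taken from the paper's simulations.

```python
def jain_fairness_index(throughputs):
    """Jain's index: 1.0 for perfectly equal shares, 1/n when one party takes everything."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one nonzero throughput")
    total = sum(throughputs)
    return total * total / (n * sum_sq)


# Hypothetical aggregate throughputs in Mbit/s for (NR-U, Wi-Fi).
equal_share = jain_fairness_index([10.0, 10.0])
skewed_share = jain_fairness_index([30.0, 10.0])
starved = jain_fairness_index([1.0, 0.0])
```

A reward that penalizes values of this index below a chosen threshold is one way a learning agent could be steered toward the stricter fairness policies. That reward shape is an assumption here, not the paper's stated design.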

2604.18105 2026-06-19 eess.AS cs.CL cs.SD updated version

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR

Yuan Xie, Jiaqi Song, Guang Qiu, Xianliang Wang, Kai Qiao, Junfeng Yuan, Shengqing Liu, Yi Zhang, Bowen Chen, Ming Lei, Jie Gao, Jie Wu

Affiliations: Advanced Intelligent Systems Group, NIO

AI summary: Proposes the NIM4-ASR framework, which redesigns the multi-stage training paradigm (pre-training architecture optimization, iterative asynchronous SFT, and ASR-specialized reinforcement learning) and adds production-oriented optimizations (noise robustness, streaming inference, and RAG-based hotword customization), achieving state-of-the-art performance with 2.3B parameters.

Abstract

Integrating large language models (LLMs) into automatic speech recognition (ASR) has become a mainstream paradigm in recent years. Although existing LLM-based ASR models demonstrate impressive performance on public benchmarks, their training remains predominantly data-driven, leaving key practical challenges insufficiently addressed, particularly limited downward scalability in resource-constrained deployments and hallucinations under acoustically challenging conditions. To address these issues, we present NIM4-ASR, a production-oriented LLM-based ASR framework optimized for both efficiency and robustness. Grounded in a principled delineation of functional roles between the encoder and the LLM, we redesign the multi-stage training paradigm to align each module with its intended capability boundary. Specifically, we reformulate the pre-training architecture and objective to mitigate the modality gap and improve parameter efficiency; introduce an iterative asynchronous SFT stage to preserve acoustic fidelity and constrain representation drift; and design an ASR-specialized reinforcement learning stage to further enhance recognition quality and robustness. We additionally incorporate a suite of production-oriented optimizations, including robustness under noisy and silent conditions, real-time streaming inference, and hotword customization via retrieval-augmented generation (RAG). Experiments show that NIM4-ASR achieves state-of-the-art performance on multiple public benchmarks with merely 2.3B parameters, while substantially outperforming larger-scale competitors on internal benchmarks, particularly in entity-intensive real-world scenarios. NIM4-ASR further supports million-scale hotword customization via RAG with sub-millisecond retrieval latency, enabling efficient adaptation to emerging entities and personalized user requirements.

2604.03725 2026-06-19 quant-ph cs.IT eess.SP math.IT updated version

Quantum Algebraic Diversity: Single-Copy Density Matrix Estimation via Group-Structured Measurements

Mitchell A. Thornton

AI summary: Extends the algebraic diversity framework to quantum measurement and presents the Quantum Algebraic Diversity Theorem, under which a group-structured POVM applied to a single copy of a quantum state yields a high-fidelity density matrix estimate. It also establishes a Classical-Quantum Duality Map and an Optimality Inheritance Theorem.

Comments v3: copy-reduction claim corrected; fidelities fixed; 1 figure removed

Abstract

We extend the algebraic diversity (AD) framework from classical signal processing to quantum measurement theory. The Quantum Algebraic Diversity (QAD) Theorem establishes that a group-structured positive operator-valued measure (POVM) applied to a single copy of a quantum state produces a full-rank, group-averaged density matrix estimator whose eigenbasis and eigenvalue ordering track those of the true density matrix, with a bias toward the symmetrized state, analogous to the classical recovery of covariance eigenstructure from a single observation. We establish a Classical-Quantum Duality Map connecting classical covariance estimation to quantum state tomography, and an Optimality Inheritance Theorem showing that classical group optimality transfers to quantum settings via the Born map within the group-averaged family. SIC-POVMs are identified as AD with the Heisenberg-Weyl group and mutually unbiased bases as AD with the Clifford group, revealing the hierarchy $\mathrm{HW}(d) \subseteq \mathcal{C}(d) \subseteq S_d$ that mirrors the classical $\mathbb{Z}_M \subseteq G_{\min} \subseteq S_M$. The double-commutator eigenvalue theorem gives polynomial-time adaptive POVM selection. A worked qubit example shows the group-averaged estimator from a single computational-basis measurement, averaged over a matched $\mathbb{Z}_2$ group, reaching fidelity 0.99 where standard single-basis tomography gives a rank-1 estimate of fidelity 0.80. Monte Carlo simulations for $d = 2$ to $13$ confirm fidelity above 0.90 from a single outcome while standard fidelity degrades as $\sim 1/d$. The growing ratio reflects collapse of the rank-1 standard estimator, not fewer copies per parameter: the biased single-copy estimator reduces the number of distinct measurement settings, not the per-parameter sampling cost, and a genuine copy reduction holds only under exact symmetry.

2604.09795 2026-06-19 eess.SY cs.RO cs.SY 版本更新

On Feedback Speed Control for a Planar Tracking

平面跟踪中的反馈速度控制

Xincheng Li, Tengyue Liu, Udit Halder

发表机构 * Department of Mechanical and Aerospace Engineering, University of South Florida（南佛罗里达大学机械与航空航天工程系）

AI总结针对领航-跟随平面跟踪问题，提出一种反馈速度控制律与恒定方位角转向策略，实现并排编队并证明渐近稳定性，扩展至N-agent链网络。

详情

AI中文摘要

本文研究了领航者和跟随者之间的平面跟踪问题。我们提出了一种新颖的反馈速度控制律，结合恒定方位角转向策略，以保持两个智能体之间的并排编队。我们证明了当领航者的转向已知时，所提出的控制使闭环系统渐近稳定。对于跟随者无法获取领航者转向的情况，我们表明系统相对于被视为输入的领航者转向仍然是输入-状态稳定的。此外，我们证明如果领航者的转向是周期性的，跟随者将渐近收敛到具有相同周期的周期轨道。我们通过数值模拟和移动机器人实验验证了这些结果。最后，我们通过将两智能体控制律扩展到N智能体链网络，展示了所提出方法的可扩展性，并说明了其在生物和工程群体中方向信息传播的意义。

English abstract

This paper investigates a planar tracking problem between a leader and follower agent. We propose a novel feedback speed control law, paired with a constant bearing steering strategy, to maintain an abreast formation between the two agents. We prove that the proposed control yields asymptotic stability of the closed-loop system when the steering of the leader is known. For the case when the leader's steering is unavailable to the follower, we show that the system is still input-to-state stable with respect to the leader's steering viewed as an input. Furthermore, we demonstrate that if the leader's steering is periodic, the follower will asymptotically converge to a periodic orbit with the same period. We validate these results through numerical simulations and experimental implementations on mobile robots. Finally, we demonstrate the scalability of the proposed approach by extending the two-agent control law to an N-agent chain network, illustrating its implications for directional information propagation in biological and engineered flocks.

2504.09642 2026-06-19 eess.SY cs.SY Updated version

HBS -- Hardware Build System: Characterizing and comparing direct-Tcl and indirect-abstract approaches for hardware build systems

HBS——硬件构建系统：直接Tcl与间接抽象硬件构建方法的特征化与比较

Michał Kruszewski

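AI summary: The paper characterizes and compares two approaches to hardware build systems: the direct-Tcl approach, in which the EDA tool executes the build code directly, and the indirect-abstract approach, in which the build system generates a Tcl script that an EDA tool then runs. It also presents HBS, a new direct-Tcl build system that closes the feature gap of existing direct-Tcl systems and serves as their representative in the comparison with indirect-abstract systems.

AI Chinese abstract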

构建系统已成为软件实现和部署过程中不可或缺的一部分。新的编程语言（如Go、Rust或Zig）在发布时都集成了构建系统。然而，在硬件描述领域，主流硬件描述语言（HDL）如VHDL或SystemVerilog并未发布官方构建系统。此外，硬件设计项目通常涉及多种语言。本文特征化并比较了两种常见的硬件构建系统实现方法。第一种是直接Tcl方法，其中构建系统代码在设计构建流程中由EDA工具直接执行。第二种是间接抽象方法，其中构建系统生成Tcl脚本，随后由合适的EDA工具运行。由于现有的直接Tcl构建系统在支持的功能方面均不及间接抽象构建系统，本文还提出了一种新的直接Tcl硬件构建系统，称为HBS。该实现的构建系统作为直接Tcl构建系统的代表，用于与间接抽象构建系统进行比较。

English abstract

Build systems have become an indispensable part of the software implementation and deployment process. New programming languages, such as Go, Rust, or Zig, are released with the build system integrated into the language tools. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilingual. The paper characterizes and compares two common approaches for hardware build system implementations. The first is the direct-Tcl approach, in which the build system code is executed directly by the EDA tool during the design build flow. The second is the indirect-abstract approach, in which the build system produces a Tcl script, which is later run by a proper EDA tool. As none of the existing direct-Tcl build systems was close to the indirect-abstract build systems in terms of supported functionalities, the paper also presents a new direct-Tcl hardware build system called HBS. The implemented build system was used as a representative of direct-Tcl build systems in comparison with indirect-abstract build systems.

2603.16865 2026-06-19 math.OC cs.SY eess.SY Updated version

Prescribed-Time Distributed Generalized Nash Equilibrium Seeking

预设时间分布式广义纳什均衡求解

Liraz Mudrik, Isaac Kaminer, Sean Kragelund, Abram H. Clark

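AI summary: For safety-critical multi-agent systems, the paper proposes the first fully distributed algorithm that solves the generalized Nash equilibrium problem with shared coupling constraints by a user-prescribed time T, using a multi-rate gain schedule to resolve the coupling among the observer, optimization, and dual-consensus layers.

Comments 12 pages, 5 figures

AI Chinese abstract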

从协同制导到碰撞避免等安全关键多智能体系统，通常必须在硬截止时间前达成协调决策，而非仅仅最终收敛。本文提出首个全分布式算法，用于在用户预设时间$T$内求解广义纳什均衡（GNE）问题（一种具有共享耦合约束和一般成本耦合的非合作博弈），该时间独立于初始条件。其基础是建立在优化李雅普诺夫函数框架上的集中式预设时间结果，并通过非归一化Hessian-梯度反馈实现，选择该反馈是因为与牛顿和归一化Hessian-梯度实现不同，它自然地分解为每个智能体的计算。分布式实现该反馈要求每个智能体同时运行三个耦合过程：全局状态的预设时间观测器、局部优化律以及强制变分GNE共享乘子的对偶一致性机制。它们的同步运行是核心难点，因为优化不断位移观测器跟踪的状态，而估计误差污染驱动优化的梯度。我们通过一种多速率增益调度解决该耦合，其中观测器和一致性层比优化层严格更快收缩，使得每个误差分量在$T$时刻精确消失。Fischer-Burmeister重构保持设计无投影，同时在截止时间强制执行约束。针对Cournot博弈和时间关键传感器覆盖问题的数值结果验证了该方法，并展示了其作为时间关键自主性求解器在环的应用。

English abstract

Safety-critical multi-agent systems, from cooperative guidance to collision avoidance, must often reach a coordinated decision by a hard deadline rather than merely converge to one eventually. This paper proposes the first fully distributed algorithm that solves the generalized Nash equilibrium (GNE) problem, a non-cooperative game with shared coupling constraints and general cost coupling, at a user-prescribed time $T$ independent of initial conditions. The foundation is a centralized, prescribed-time result built on the optimization Lyapunov function framework and implemented via unnormalized Hessian-gradient feedback, chosen because, unlike the Newton and normalized Hessian-gradient realizations, it naturally splits into per-agent computations. Distributing this feedback requires each agent to run three coupled processes simultaneously: a prescribed-time observer of the global state, a local optimization law, and a dual-consensus mechanism that enforces the shared multipliers of the variational GNE. Their simultaneous operation is the core difficulty, as the optimization continually displaces the states the observers track, while estimation errors corrupt the gradients that drive the optimization. We resolve this coupling with a multi-rate gain schedule whose observer and dual-consensus layers contract strictly faster than the optimization layer, so that every error component vanishes exactly at $T$. A Fischer-Burmeister reformulation keeps the design projection-free while enforcing the constraints at the deadline. Numerical results for a Cournot game and a time-critical sensor-coverage problem validate the approach and demonstrate its use as a solver-in-the-loop for time-critical autonomy.
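The Fischer-Burmeister reformulation named in the abstract rests on a standard identity: the function $\phi(a, b) = \sqrt{a^2 + b^2} - a - b$ is zero exactly when $a \ge 0$, $b \ge 0$, and $ab = 0$, so a complementarity condition can be written as a single equation without a projection. The sketch below only illustrates that identity; the name `fischer_burmeister` and the sample pairs are assumptions for illustration and are not taken from the paper.

```python
import math


def fischer_burmeister(a: float, b: float) -> float:
    """phi(a, b) = sqrt(a^2 + b^2) - a - b; zero iff a >= 0, b >= 0, a*b = 0."""
    return math.hypot(a, b) - a - b


# Complementary pairs: both nonnegative and at least one of them is zero.
complementary = [(0.0, 3.0), (2.0, 0.0), (0.0, 0.0)]
# Pairs that violate complementarity or nonnegativity.
violating = [(1.0, 1.0), (-1.0, 0.0), (0.0, -2.0)]

for a, b in complementary + violating:
    print(f"phi({a}, {b}) = {fischer_burmeister(a, b):+.4f}")
```

In a constrained game, `a` would typically play the role of a multiplier and `b` the role of a constraint slack; how the paper embeds this in its dynamics is not shown here.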

2603.16941 2026-06-19 eess.AS cs.CL cs.SD Updated version

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs

言语背后的声音：量化语音大语言模型中的交叉偏见

Shree Harsha Bokkahalli Satish, Christoph Minixhofer, Maria Teleki, James Caverlee, Ondřej Klejch, Peter Bell, Gustav Eje Henter, Éva Székely

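Affiliations: Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Sweden; Centre for Speech Technology Research, University of Edinburgh, UK; Texas A&M University, USA

AI summary: Using 2,880 controlled interactions, the study evaluates intersectional accent and gender bias in three SpeechLLMs across six English accents and two gender presentations. It finds that Eastern European-accented speech, especially from female-presenting voices, receives lower helpfulness scores, and that human evaluators are more sensitive than LLM judges.

Comments 5 pages, 3 figures, 1 table, Accepted to Interspeech 2026

AI Chinese abstract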

语音大语言模型直接处理语音输入，保留了之前级联管道中去除的口音和感知性别等线索，这导致了依赖于说话者身份的反应差异。我们使用2880次受控交互（涵盖六种英语口音和两种性别呈现，通过语音克隆保持语言内容不变），对三种语音大语言模型中的口音和性别偏见进行了大规模交叉评估。通过逐点LLM评判评分、成对比较以及经过人工验证的最佳-最差缩放，我们检测到反复出现的定向差异。东欧口音的语音获得较低的有用性评分，尤其是女性呈现的语音。反应保持礼貌但在有用性上存在差异。虽然LLM评判捕捉到了这些偏见的定向趋势，但人类评估者表现出显著更高的敏感性，显示出更强的口音级别对比。

English abstract

Speech Large Language Models (SpeechLLMs) process spoken input directly, retaining cues such as accent and perceived gender that were previously removed in cascaded pipelines. This introduces speaker-identity-dependent variation in responses. We present a large-scale intersectional evaluation of accent and gender bias in three SpeechLLMs using 2,880 controlled interactions across six English accents and two gender presentations, keeping linguistic content constant through voice cloning. Using pointwise LLM-judge ratings, pairwise comparisons, and Best-Worst Scaling with human validation, we detect recurring directional disparities. Eastern European-accented speech receives lower helpfulness scores, particularly for female-presenting voices. Responses remain polite but differ in helpfulness. While LLM judges capture the directional trend of these biases, human evaluators exhibit significantly higher sensitivity, showing stronger accent-level contrasts.

2510.06846 2026-06-19 eess.SY cs.SY Updated version

Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints

基于CBF的去中心化安全滤波器：面向输入受限的协同导弹系统碰撞避免

Johannes Autenrieb, Mark Spiller

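AI summary: For multi-vehicle interception scenarios, the paper proposes a decentralized safety filter based on robust control barrier functions that achieves collision avoidance through event-triggered activation and slack-variable relaxation while remaining computationally efficient and scalable.

Comments 7 pages, 5 figures, accepted for presentation at the 2026 American Control Conference (ACC 2026)

AI Chinese abstract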

本文提出了一种用于多智能体航空航天拦截场景中碰撞避免的去中心化安全滤波器。该方法利用鲁棒控制屏障函数（RCBF）来保证在有界输入和高相对度动力学下安全集的前向不变性。每个执行器执行其标称协同制导指令，而局部二次规划（QP）仅在必要时修改输入。基于距离和零控脱靶量（ZEM）准则的事件触发激活通过将主动约束限制在相关邻居来确保可扩展性。为了在多个同时主动约束下保证可行性，引入了一种松弛变量方案，以帕累托最优方式优先考虑关键智能体。多对多拦截场景的仿真结果表明，所提出的框架在最小偏离标称制导的情况下保持无碰撞运行，为安全关键的多智能体航空航天系统提供了一种计算高效且可扩展的解决方案。

English abstract

This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safe sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To ensure feasibility under multiple simultaneously active constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
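The idea of a local QP that "modifies the input only when necessary" can be sketched in its simplest form. With a single affine constraint $a^\top u \ge b$, the problem $\min_u \|u - u_{\text{nom}}\|^2$ has a closed-form solution: keep $u_{\text{nom}}$ if it is feasible, otherwise project it onto the constraint boundary. The example below assumes a single-integrator model, one neighbor, no input bounds, and no robustness or slack terms, so it is a much simpler setting than the paper's; all names and numbers are hypothetical.

```python
def dot(p, q):
    """Inner product of two equal-length sequences."""
    return sum(pi * qi for pi, qi in zip(p, q))


def safety_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to dot(a, u) >= b (one constraint).

    Assumes a is nonzero. With one affine constraint the QP reduces to a
    projection of u_nom onto a half-space.
    """
    slack = dot(a, u_nom) - b
    if slack >= 0.0:                      # nominal command is already safe
        return list(u_nom)
    scale = slack / dot(a, a)             # negative: push u along +a
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]


# Hypothetical scenario: single integrator x_dot = u, keep-out disc of radius r.
x = [2.0, 0.0]                            # own position
x_obs = [0.0, 0.0]                        # neighbor position
r, alpha = 1.0, 1.0                       # safety radius and barrier gain (assumed)
diff = [xi - oi for xi, oi in zip(x, x_obs)]
h = dot(diff, diff) - r**2                # barrier value; h >= 0 means safe
a = [2.0 * d for d in diff]               # gradient of h with respect to x
b = -alpha * h                            # CBF condition: dot(a, u) >= -alpha * h

u_nom = [-5.0, 1.0]                       # nominal command heading toward the neighbor
u_safe = safety_filter(u_nom, a, b)
print(u_safe)
```

Only the component of the command along the barrier gradient is changed, which is why a filter of this kind deviates minimally from the nominal guidance.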

2603.14403 2026-06-19 eess.SY cs.SY Updated version

Robust Safety Filters for Lipschitz-Bounded Adaptive Closed-Loop Systems with Structured Uncertainties

具有结构不确定性的Lipschitz有界自适应闭环系统的鲁棒安全滤波器

Johannes Autenrieb, Peter A. Fisher, Anuradha Annaswamy

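AI summary: To address transient safety in adaptive control systems, the paper proposes a reference-based adaptive safety framework that uses Lipschitz bounds on the tracking error to derive a robust CBF condition, reformulated as a convex SOCP, reducing conservatism while guaranteeing forward invariance and closed-loop stability.

Comments 6 pages, 4 figures, accepted for publication in the IEEE Control Systems Letters (L-CSS)

AI Chinese abstract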

自适应控制通过在线参数自适应为不确定动态系统提供闭环稳定性和参考跟踪。然而，仅凭这些特性并不能确保状态约束的前向不变性意义上的安全性，特别是在自适应的瞬态阶段。基于控制屏障函数（CBF）的安全滤波器已被提出以解决这一限制，但现有方法通常依赖于保守的约束收紧或二次规划公式中的静态安全裕度。本文针对具有结构参数不确定性的系统提出了一种基于参考的自适应安全框架，该框架明确考虑了瞬态植物-参考失配。安全性在参考层面通过基于屏障函数的滤波器强制执行，而自适应控制驱动植物跟踪安全认证的参考。通过利用闭环跟踪误差动态的Lipschitz界，推导出依赖于跟踪误差的鲁棒CBF条件，并等价地重新表述为凸二阶锥规划（SOCP）。与固定裕度CBF公式相比，所提出的安全滤波器公式通过使安全约束随着植物-参考跟踪误差的减小而逐渐减少限制性，从而减少了保守性，同时保留了前向不变性和闭环稳定性的正式保证。

English abstract

Adaptive control provides closed-loop stability and reference tracking for uncertain dynamical systems through online parameter adaptation. These properties alone, however, do not ensure safety in the sense of forward invariance of state constraints, particularly during transient phases of adaptation. Control barrier function (CBF)-based safety filters have been proposed to address this limitation, but existing approaches often rely on conservative constraint tightening or static safety margins within quadratic program formulations. This paper proposes a reference-based adaptive safety framework for systems with structured parametric uncertainty that explicitly accounts for transient plant-reference mismatch. Safety is enforced at the reference level using a barrier-function-based filter, while adaptive control drives the plant to track the safety-certified reference. By exploiting Lipschitz bounds on the closed-loop tracking error dynamics, a tracking-error-dependent robust CBF condition is derived and equivalently reformulated as a convex second-order cone program (SOCP). The proposed safety-filter formulation reduces conservatism relative to fixed-margin CBF formulations by rendering the resulting safety constraints progressively less restrictive as the plant-reference tracking error decreases, while preserving formal guarantees of forward invariance and closed-loop stability.

2603.10791 2026-06-19 eess.IV Updated version

Semantic Satellite Communications for Synchronized Audiovisual Reconstruction

面向同步视听重建的语义卫星通信

Fangyu Liu, Peiwen Jiang, Wenjin Wang, Xiao Li, Shi Jin

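AI summary: The paper proposes an adaptive multimodal semantic transmission system that uses a dual-stream generative architecture and a dynamic keyframe update mechanism to achieve high-quality synchronized audiovisual reconstruction in bandwidth-limited satellite scenarios, significantly reducing bandwidth consumption and improving robustness.

AI Chinese abstract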

卫星通信在支持高保真同步视听服务方面面临严重瓶颈，因为传统方案在信道波动、带宽有限和长传播延迟下难以处理跨模态一致性。为了解决这些问题，本文提出了一种针对卫星场景的自适应多模态语义传输系统，旨在带宽约束下实现高质量同步视听重建。与具有固定模态优先级的静态方案不同，我们的框架采用双流生成架构，可灵活切换视频驱动音频生成和音频驱动视频生成。这使得系统能够动态解耦语义，仅传输最重要的模态，同时利用跨模态生成恢复另一种模态。为了平衡重建质量和传输开销，动态关键帧更新机制根据无线场景和用户需求自适应维护共享知识库。此外，引入基于大语言模型的决策模块以增强系统适应性。通过集成卫星特定知识，该模块联合考虑任务需求和信道因素（如天气引起的衰落），主动调整传输路径和生成工作流。仿真结果表明，所提系统在实现高保真视听同步的同时显著降低带宽消耗，提高了挑战性卫星场景下的传输效率和鲁棒性。

English abstract

Satellite communications face severe bottlenecks in supporting high-fidelity synchronized audiovisual services, as conventional schemes struggle with cross-modal coherence under fluctuating channel conditions, limited bandwidth, and long propagation delays. To address these limitations, this paper proposes an adaptive multimodal semantic transmission system tailored for satellite scenarios, aiming for high-quality synchronized audiovisual reconstruction under bandwidth constraints. Unlike static schemes with fixed modal priorities, our framework features a dual-stream generative architecture that flexibly switches between video-driven audio generation and audio-driven video generation. This allows the system to dynamically decouple semantics, transmitting only the most important modality while employing cross-modal generation to recover the other. To balance reconstruction quality and transmission overhead, a dynamic keyframe update mechanism adaptively maintains the shared knowledge base according to wireless scenarios and user requirements. Furthermore, a large language model based decision module is introduced to enhance system adaptability. By integrating satellite-specific knowledge, this module jointly considers task requirements and channel factors such as weather-induced fading to proactively adjust transmission paths and generation workflows. Simulation results demonstrate that the proposed system significantly reduces bandwidth consumption while achieving high-fidelity audiovisual synchronization, improving transmission efficiency and robustness in challenging satellite scenarios.

2509.15069 2026-06-19 eess.SP cs.DS cs.NA math.NA Updated version

Efficient Computation of Time-Index Powered Weighted Sums Using Cascaded Accumulators

使用级联累加器高效计算时间索引加权和

Deijany Rodriguez Linares, Oksana Moryakova, Håkan Johansson

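AI summary: The paper proposes a method that uses cascaded accumulators to compute time-index powered weighted sums efficiently, reducing the number of multiplications from K×N to K+1 constant multiplications without storing data blocks, which suits real-time sample-by-sample processing systems.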

Comments This work has been submitted to the IEEE for possible publication

Journal ref IEEE Signal Processing Letters, vol. 33, pp. 893-897, Feb. 2026
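The general idea behind cascaded accumulators can be illustrated for powers up to 2. Three running sums, each fed by the previous one, need only additions per sample; the weighted sums $S_k = \sum_{n=0}^{N-1} n^k x[n]$ are then recovered at the end from a few multiplications by constants that depend only on $N$. The sketch below is a hand-derived illustration of this principle for $k \le 2$, not the structure from the paper; the function name and the closing formulas are assumptions worked out for this example.

```python
def power_sums_cascaded(samples):
    """Return (S0, S1, S2) with S_k = sum over n of n**k * x[n], n = 0..N-1.

    Per sample only three additions are performed; no block of data is stored.
    """
    a1 = a2 = a3 = 0          # three accumulators in cascade
    count = 0
    for x in samples:
        a1 += x               # a1 = sum of x[m]
        a2 += a1              # a2 = sum of (N - m) * x[m] once all N samples are in
        a3 += a2              # a3 = sum of (N - m + 1)(N - m) / 2 * x[m]
        count += 1
    n_total = count
    # Final combination: constants depend on N only, applied once.
    s0 = a1
    s1 = n_total * a1 - a2
    s2 = n_total**2 * a1 - (2 * n_total + 1) * a2 + 2 * a3
    return s0, s1, s2


x = [3, 1, 4, 1, 5, 9, 2, 6]
direct = tuple(sum(n**k * v for n, v in enumerate(x)) for k in range(3))
print(power_sums_cascaded(x), direct)   # the two tuples should agree
```

The direct computation multiplies every sample by `n**k`, whereas the cascaded version defers all multiplications to the final combination step.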

2603.04219 2026-06-19 cs.SD cs.AI eess.AS Updated version

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

ZeSTA: 基于领域条件训练的零样本文本转语音增强用于数据高效的个性化语音合成

Youngwon Choi, Jinwoo Oh, Hwayeon Kim, Hyeonyu Kim

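Affiliations: Maum AI Inc.; Humelo Inc.

AI summary: The paper proposes the ZeSTA framework, which uses a lightweight domain embedding to distinguish real from synthetic speech and combines it with real-data oversampling, improving speaker similarity for zero-shot TTS augmentation in extremely low-resource settings while preserving intelligibility and perceptual quality.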

Comments 6 pages, accepted to INTERSPEECH 2026

2510.08275 2026-06-19 eess.SY cs.SY Updated version

Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations

输入受限的高超声速滑翔飞行器控制分配算法

Johannes Autenrieb, Patrick Gruhn

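AI summary: To handle the strong actuation nonlinearities and physical constraints of hypersonic glide vehicles, the paper proposes an iterative control allocation method that embeds drag-sensitive soft constraints to improve energy efficiency and reduce surface temperatures, and demonstrates its effectiveness on the GHGV-2 model.

Comments 43 pages, 21 figures, accepted for publication in the AIAA Journal of Guidance, Control, and Dynamics

AI Chinese abstract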

高超声速滑翔飞行器（HGV）在具有执行机构强非线性和严格物理约束的挑战性飞行状态下运行。这些约束包括状态相关的执行器限制、非对称控制边界以及随机动条件变化的热载荷。本文介绍了一种迭代控制分配方法，以实时应对这些挑战。所提出的算法搜索能够实现期望力矩指令的控制输入，同时满足输入幅度和速率的约束。对于细长HGV构型，热载荷和阻力生成密切相关——较低的阻力通常导致表面加热减少。通过嵌入阻力敏感软约束，该方法提高了能量效率并隐含地降低了表面温度，从而降低了飞行器的红外特征。这些特性对于需要低可观测性的远程军事行动尤为有利。该方法利用DLR的通用高超声速滑翔飞行器2（GHGV-2）仿真模型进行了演示。结果证实了该方法在现实约束飞行条件下保持控制权限的有效性。

English abstract

Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated: lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.

2601.17464 2026-06-19 eess.SY cs.SY Updated version

Robust Output Regulation of Uncertain Linear Time-Varying Systems

不确定线性时变系统的鲁棒输出调节

Jinmeng Zha, Zhen Zhang

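AI summary: For robust output regulation of linear time-varying systems, the paper proposes a trajectory-matching system immersion framework, reveals the fundamental effect of parametric uncertainty, establishes an exact algebraic boundary termed finite linear parameterization, and designs an approximate robust controller that achieves an arbitrarily small bounded tracking error.

AI Chinese abstract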

线性时变系统的鲁棒输出调节几十年来一直是一个开放问题。为了解决这个问题，我们提出了轨迹匹配系统浸入框架，通过将调节方程重新表述为更具洞察力的形式。这一视角表明，找到内模等价于通过构造一个无外力系统来再现给定受迫系统的稳态输出轨迹。这揭示了参数不确定性的根本影响，给出了鲁棒调节的精确代数边界，称为有限线性参数化。由此，我们进一步证明时变系统中的不确定性容易激发无限维函数族，使得有限维调节器无法实现精确鲁棒调节。因此，我们建立了一个全面的近似鲁棒设计，它产生一个可以任意小的有界跟踪误差，并避免显式求解调节方程。此外，当不确定性以某些特定方式影响系统时，它可以确保精确调节。总体而言，这些结果为构建基于内模的设计提供了一个通用的、可执行的框架，并简化了鲁棒控制的实现过程。

English abstract

Robust output regulation for linear time-varying systems has remained an open problem for decades. By augmenting the classical immersion viewpoint, we propose the trajectory-matching system immersion framework. It reformulates the regulator equation as a forced system, and demonstrates that finding an internal model is equivalent to reproducing the non-decaying output trajectories of this forced system by constructing an unforced one. This perspective yields an exact algebraic boundary for finite-dimensional internal models, termed finite linear parameterization. It further reveals a distinctive obstruction in time-varying systems: even highly structured, finite-dimensional affine parametric uncertainties can generate infinite-dimensional families of non-decaying error-zeroing signals, thereby precluding exact robust regulation via linear finite-dimensional internal models in general. Hence, we develop a comprehensive approximate robust design, which yields a bounded tracking error that can be arbitrarily small, and avoids explicitly solving the regulator equation. Additionally, it recovers exact regulation when the uncertainty influences the system in some specified ways. Overall, these results clarify the intrinsic limitation of exact finite-dimensional robust regulation for uncertain LTV systems, and provide a general, executable framework for constructing an internal model-based design.

2508.02604 2026-06-19 cs.RO cs.SY eess.SY Updated version

Periodic robust robotic rock chop via virtual model control

基于虚拟模型控制的周期性鲁棒机器人砍切

Yi Zhang, Fumiya Iida, Fulvio Forni

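Affiliations: University of Cambridge; University of Tokyo

AI summary: The paper proposes a physically structured virtual-model controller that uses switched virtual mechanisms to generate a robust periodic chopping motion without pre-planned trajectories, achieving sub-millimeter cutting accuracy on several vegetables with a Franka manipulator.

AI Chinese abstract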

机器人切割是一项具有挑战性的、接触丰富的操作任务，机器人必须同时协商未知的物体力学、大接触力和精确的运动要求。我们的假设是，这种复杂性可以通过设计一个物理结构化的虚拟模型控制器来缓解，该控制器使用切换虚拟机构生成鲁棒的、有节奏的岩石砍切运动，无需预先规划的轨迹或精确的环境信息。运动是由环境、机器人动力学和切换虚拟机构的虚拟力之间的相互作用产生的，最终通过可用的驱动实现。通过理论分析和实验验证，我们证明了受控的机器人行为会稳定到周期性的运动。使用Franka机械臂进行的实验表明，在五种不同的蔬菜上实现了鲁棒的切割，对于1毫米到6毫米的厚度，以每秒近一次切割的速度实现了亚毫米级的切片精度。尽管刀的形状或砧板的高度发生变化，控制器仍保持高性能，并成功适应了不同的人形机械臂，展示了鲁棒性和平台独立性。

English abstract

Robotic cutting is a challenging, contact-rich manipulation task where the robot must simultaneously negotiate unknown object mechanics, large contact forces, and precise motion requirements. Our hypothesis is that this complexity can be alleviated through the design of a physically structured virtual-model controller that uses switched virtual mechanisms to generate a robust, rhythmic rock-chop motion for robotic cutting, without requiring pre-planned trajectories or precise environmental information. Motion is generated by the interaction between the environment, the robot's dynamics, and the virtual forces of the switching virtual mechanism, ultimately realized through the available actuation. Through theoretical analysis and experimental validation, we demonstrate that the controlled robot behavior settles into a stable periodic motion. Experiments with a Franka manipulator demonstrate robust cuts across five different vegetables, achieving sub-millimeter slice accuracy for thicknesses from 1 mm to 6 mm at a rate of nearly one cut per second. The controller maintains high performance despite changes in knife shape or cutting board height, and successfully adapts to a different humanoid manipulator, demonstrating robustness and platform independence.

2601.03112 2026-06-19 eess.IV cs.CV Updated version

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

DiT-JSCC：基于扩散变换器与语义表示的深度JSCC再思考

Kailin Tan, Jincheng Dai, Sixian Wang, Guo Lu, Shuo Shao, Kai Niu, Wenjun Zhang, Ping Zhang

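Affiliations: Beijing University of Posts and Telecommunications; Shanghai Jiao Tong University; University of Shanghai for Science and Technology

AI summary: The paper proposes the DiT-JSCC framework, which jointly learns a semantics-prioritized representation encoder and a diffusion transformer generative decoder, and uses coarse-to-fine conditional decoding together with Kolmogorov-complexity-inspired adaptive bandwidth allocation to improve semantic consistency and transmission efficiency under extreme channel conditions.

Comments 14 pages, 14 figures, 2 tables

AI Chinese abstract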

生成式联合源信道编码（GJSCC）已成为一种新的深度JSCC范式，用于在极端无线信道条件（如超低带宽和低信噪比）下实现高保真和鲁棒的图像传输。近期研究通常采用扩散模型作为生成解码器，但经常产生视觉上逼真但语义一致性有限的结果。这种局限性源于面向重建的JSCC编码器与生成解码器之间的根本性不匹配，因为前者缺乏显式的语义判别能力，无法提供可靠的条件线索。在本文中，我们提出DiT-JSCC，一种新颖的GJSCC骨干网络，能够联合学习语义优先的表示编码器和基于扩散变换器（DiT）的生成解码器，我们的开源项目旨在促进GJSCC的未来研究。具体来说，我们设计了一个语义-细节双分支编码器，与从粗到细的条件DiT解码器自然对齐，在极端信道条件下优先考虑语义一致性。此外，受Kolmogorov复杂度启发，引入了一种无需训练的自适应带宽分配策略，以进一步提高传输效率，从而真正重新定义生成解码时代的信息价值概念。大量实验表明，DiT-JSCC在语义一致性和视觉质量上始终优于现有JSCC方法，尤其是在极端条件下。

English abstract

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that jointly learns a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder. Our open-source project aims to promote future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve transmission efficiency, thereby redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

2601.00014 2026-06-19 eess.SP cs.AI cs.LG Updated version

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

建模全天心电图信号以可解释人工智能预测心力衰竭风险

Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, Joachim A. Behar

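Affiliation: Leumit Health Services

AI summary: The paper proposes DeepHHF, a deep learning model that uses 24-hour single-lead ECG data to predict heart failure risk within five years, reaching an AUC of 0.80 and outperforming a short-segment model and a clinical score; explainability analysis shows that the model focuses on arrhythmias and heart abnormalities.

AI Chinese abstract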

心力衰竭（HF）影响11.8%的65岁及以上成年人，降低生活质量和寿命。预防HF可降低发病率和死亡率。我们假设将人工智能（AI）应用于24小时单导联心电图（ECG）数据可预测五年内HF风险。为此，使用了Technion-Leumit Holter ECG（TLHE）数据集，包括20年间收集的47,729名患者的69,663条记录。我们的深度学习模型DeepHHF在24小时ECG记录上训练，实现了0.80的受试者工作特征曲线下面积，优于使用30秒片段和临床评分的模型。DeepHHF识别的高风险个体住院或死亡事件概率翻倍。可解释性分析显示DeepHHF关注心律失常和心脏异常。本研究强调了深度学习建模24小时连续ECG数据的可行性，捕捉了对可靠风险预测至关重要的阵发性事件。应用于单导联Holter ECG的人工智能无创、廉价且广泛可及，使其成为HF风险预测的有前景工具。

English abstract

Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To test this, we used the Technion-Leumit Holter ECG (TLHE) dataset, which includes 69,663 recordings from 47,729 patients collected over 20 years. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80, outperforming a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had twice the chance of hospitalization or death. Explainability analysis showed that DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
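The area under the receiver operating characteristic curve quoted in the abstract has a simple interpretation: it is the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way on toy data; the labels and scores are made up for illustration and have nothing to do with the TLHE dataset or DeepHHF, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive cases, 0 for negative cases.
    scores: predicted risk, higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example only: two positives, three negatives.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(round(roc_auc(toy_labels, toy_scores), 4))
```

This pairwise form is quadratic in the number of cases and is meant for reading, not for large cohorts, where a rank-based computation is the usual choice.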

2512.17473 2026-06-19 eess.SP cs.LG math.OC stat.ML Updated version

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

非线性矩阵分解的交替方向乘子法

Atharva Awari, Nicolas Gillis, Arnaud Vandaele

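Affiliation: University of Mons

AI summary: The paper proposes an algorithm based on the alternating direction method of multipliers (ADMM) for nonlinear matrix decompositions (NMD) that supports a range of nonlinear functions and loss functions, and illustrates its applicability and efficiency on real-world datasets.

Comments 16 pages, 7 figures. v3: Revised version: added new experiments and comparisons. Code available from https://gitlab.com/Atharva05/admm-for-nmd

AI Chinese abstract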

我们提出了一种基于交替方向乘子法（ADMM）的算法，用于求解非线性矩阵分解（NMD）。给定输入矩阵 $X \in \mathbb{R}^{m \times n}$ 和分解秩 $r \ll \min(m, n)$，NMD 寻求矩阵 $W \in \mathbb{R}^{m \times r}$ 和 $H \in \mathbb{R}^{r \times n}$，使得 $X \approx f(WH)$，其中 $f$ 是逐元素非线性函数。我们在几个代表性非线性模型上评估了我们的方法：适用于非负稀疏数据近似的修正线性单元激活 $f(x) = \max(0, x)$，适用于概率电路表示的逐分量平方 $f(x) = x^2$，以及适用于推荐系统的 MinMax 变换 $f(x) = \min(b, \max(a, x))$。所提出的框架灵活支持多种损失函数，包括最小二乘、$\ell_1$ 范数和 Kullback-Leibler 散度，并且可以轻松扩展到其他非线性和度量。我们在真实世界数据集上展示了该方法的适用性、效率和适应性，突出了其在广泛应用中的潜力。

English abstract

We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.
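The model $X \approx f(WH)$ and the three element-wise nonlinearities listed in the abstract can be written down directly. The sketch below only evaluates the model and its least-squares residual on synthetic data; it does not implement the ADMM updates, and the helper names, the bounds `a=1.0`, `b=5.0`, and the matrix sizes are assumptions chosen for illustration.

```python
import numpy as np


def relu(z):
    """f(x) = max(0, x), element-wise."""
    return np.maximum(0.0, z)


def square(z):
    """f(x) = x^2, element-wise."""
    return z**2


def minmax(z, a=1.0, b=5.0):
    """f(x) = min(b, max(a, x)), element-wise; a and b are assumed bounds."""
    return np.minimum(b, np.maximum(a, z))


def nmd_residual(X, W, H, f):
    """Least-squares misfit ||X - f(WH)||_F of a candidate factorization."""
    return np.linalg.norm(X - f(W @ H))


rng = np.random.default_rng(0)
m, n, r = 6, 5, 2
W = rng.standard_normal((m, r))
H = rng.standard_normal((r, n))
X = relu(W @ H)                       # synthetic data that fits the ReLU model exactly

print(nmd_residual(X, W, H, relu))    # residual of the generating factors
print(np.linalg.matrix_rank(W @ H), np.linalg.matrix_rank(X))
```

Comparing the two printed ranks shows why the nonlinearity matters: the product `W @ H` has rank at most `r`, while `f(W @ H)` is generally not low-rank.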

2511.14280 2026-06-19 eess.SY cs.SY math.OC Updated version

A graph-informed regret metric for optimal distributed control

面向最优分布式控制的图信息遗憾度量

Daniele Martinelli, Andrea Martin, Giancarlo Ferrari-Trecate, Luca Furieri

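AI summary: The paper proposes spatial regret, a metric for the worst-case performance gap between a distributed controller and an oracle controller with access to additional sensor information, designs distributed controllers based on it via a convex reformulation with a finite-dimensional approximation, and shows in power-system simulations that they mitigate localized disturbances effectively.

AI Chinese abstract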

我们考虑使用分布式控制器对大规模系统进行最优控制，这些控制器的网络拓扑与子系统之间的耦合图相匹配。在这项工作中，我们引入了空间遗憾，这是一种基于图的度量，用于衡量分布式控制器与能够访问额外传感器信息的先知控制器之间的最坏情况性能差距。先知的图是信息图的用户指定扩展，产生一个基准策略，该策略惩罚那些额外传感会改善性能的扰动。最小化空间遗憾可以产生尊重名义信息图的分布式控制器，这些控制器模仿先知对大规模网络特征扰动（如局部扰动）的响应。我们证明，最小化空间遗憾可以转化为一个具有有限维近似的无限规划。为了扩展到大型网络，我们推导了空间遗憾的上界，该上界可以以分布式方式高效最小化。在电力系统模型上的数值实验表明，与基于经典度量的控制器相比，所得控制器能更有效地抑制局部扰动。

English abstract

We consider the optimal control of large-scale systems using distributed controllers whose network topology mirrors the coupling graph between subsystems. In this work, we introduce spatial regret, a graph-informed metric measuring the worst-case performance gap between a distributed controller and an oracle with access to additional sensor information. The oracle's graph is a user-specified augmentation of the information graph, yielding a benchmark policy that penalizes disturbances for which additional sensing would improve performance. Minimizing spatial regret yields distributed controllers that respect the nominal information graph and emulate the oracle's response to disturbances characteristic of large-scale networks, such as localized perturbations. We show that minimizing spatial regret admits a convex reformulation as an infinite program with a finite-dimensional approximation. To scale to large networks, we derive an upper bound on the spatial regret that can be efficiently minimized in a distributed way. Numerical experiments on power-system models show that the resulting controllers mitigate localized disturbances more effectively than those based on classical metrics.
