arXivDaily: arXiv daily academic digest, updated Monday through Friday
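2606.04474 2026-06-12 cs.CL eess.AS version update

Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

Ming-Hao Hsu, Xiaohai Tian, Jun Zhang, Zhizheng Wu

AI summary: This paper diagnoses entity binding failures of speech LLMs in logical reasoning and proposes an entity-aware chain-of-thought method that significantly improves reasoning accuracy.

Comments
INTERSPEECH 2026
AI abstract (translated from Chinese)

Speech LLMs underperform text models on complex reasoning tasks. We show that this modality gap is not a uniform cognitive deficit. Evaluating three different speech LLMs, we find that speech-to-text (S2T) matches or exceeds text-to-text (T2T) on spatial, syntactic, and factual tasks. On logical tasks that require entity tracking, however, S2T accuracy falls to chance level. We diagnose this localized degradation as an entity binding failure: continuous speech features cause the model to lose precise entity-attribute associations during implicit reasoning. To address this, we propose Entity-Aware Chain-of-Thought (EA-CoT), which forces speech LLMs to explicitly enumerate entities and bind them to claims before reasoning. Strikingly, EA-CoT bridges the gap even when spoken names are misrecognized, yielding an absolute accuracy gain of up to 24.4%. Ablations confirm that these gains stem entirely from explicit semantic binding, reframing the modality gap as a solvable bottleneck.

English abstract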

Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We reveal that this gap is not a uniform cognitive deficit. Evaluating two architecturally diverse SLLMs, we show speech-to-text (S2T) matches or exceeds text-to-text (T2T) on spatial, syntactic, and factual tasks. Yet on logical tasks requiring entity tracking, S2T accuracy collapses to chance. We diagnose this as an entity binding failure: continuous speech features blur precise entity-property associations during implicit reasoning. To validate this diagnosis, we introduce Entity-Aware Chain-of-Thought (EA-CoT), a lightweight inference-time intervention forcing SLLMs to enumerate entities and bind them to claims before reasoning. EA-CoT bridges the gap, even when spoken names are misrecognized, yielding up to a 24.4 percentage-point accuracy gain. Ablations confirm the gains stem from explicit semantic binding, reframing the gap as an elicitation failure rather than a missing capability.
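
The enumerate-then-bind idea behind EA-CoT can be illustrated with a small prompt builder. This is only a sketch: the abstract does not give the actual prompt, so the function name `build_ea_cot_prompt` and the wording of the three steps are assumptions made for illustration.

```python
def build_ea_cot_prompt(question: str) -> str:
    """Wrap a question in instructions that ask for explicit entity binding first.

    The step wording is hypothetical; it is not the prompt used in the paper.
    """
    steps = [
        "List every entity (person, object, place) mentioned in the input.",
        "For each claim in the input, state which listed entity it is about.",
        "Only then reason step by step over the entity-claim pairs and answer.",
    ]
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return f"Follow these steps in order:\n{numbered}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_ea_cot_prompt("Who is taller, Ann or Bob?"))
```

The point of the design is that entities are written out as text before any inference happens, so later reasoning steps can refer to explicit tokens instead of relying on implicit associations.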

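2606.02868 2026-06-12 eess.SY version update

Closed-Form PI and PID Tuning of All-Pole Plants up to Third Order for Monotonic Minimum-Settling Step Responses

Senol Gulgonul

AI summary: Proposes a unified closed-form analytical PI/PID tuning method for all-pole plants up to third order that yields a strictly monotonic (zero-overshoot), minimum-settling-time step response.

Comments
v2: extended with monotonicity windows, third-order boundary theorem in final form, and comparisons; subsumes arXiv:2604.21294
AI abstract (translated from Chinese)

A unified, closed-form analytical PI/PID tuning method is presented for all-pole plants up to third order that yields a strictly monotonic (zero-overshoot) step response with minimum settling time. The design target is the binomial closed-loop transfer function p^n/(s+p)^n, which is monotonic with robustness depending only on the order n. Because adding a left-half-plane zero to a fixed pole pattern only slows the response, the minimum-settling-time solution requires the controller zeros to be cancelled, which forces the controller numerator to divide the plant denominator. Following this principle through shows that, for any stable plant, an exact real-gain solution exists up to second order with a PI controller and up to third order with a PID controller; beyond that, the residual binomial factor acquires a complex pole pair that a generic plant does not contain. Explicit gains are derived for first-order plants (PI), second-order plants with real and complex poles (PI and PID), and third-order plants with three real poles or one real pole plus a complex pair (PID). The second-order PI case is treated in full as the lowest-order instance. Monotonicity guarantees Mt = 1, hence Ms < 2, phase margin above 60 degrees, and gain margin above 6 dB, and for the binomial family these values tighten to universal constants. Numerical validation confirms the results.

English abstract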

A unified, closed-form analytical PI/PID tuning method is presented for all-pole plants up to third order that yields a strictly monotonic (zero-overshoot) step response with minimum settling time. The design target is the binomial closed loop p^n/(s+p)^n, which is monotonic with robustness depending only on the order n. Because a fixed PI/PID cannot assign the closed-loop poles and the controller zeros independently, realizing this target exactly requires the controller zeros to be cancelled, which forces the controller numerator to divide the plant denominator. It follows that an exact, real-gained solution exists for any stable plant precisely up to second order with a PI controller and third order with a PID controller; beyond that the residual binomial factor acquires a complex pair of damping sqrt(3)/2, which a generic plant does not contain. Explicit gains are derived for first-order plants (PI), second-order plants with real and complex poles (PI and PID), and third-order plants with three real poles or one real pole plus a complex pair (PID). The freedom of the coincident designs is shown to be bounded: a quadratic nonnegativity condition gives the exact window of the design pole for strict monotonicity, which collapses at the pole-ratio-2 changeover for real poles and is nonempty for damping ratios above approximately 0.443 for complex poles. Monotonicity guarantees Mt = 1, hence Ms <= 2, phase margin >= 60 degrees, and gain margin >= 6 dB, tightening to universal constants for the binomial family. Load-disturbance attenuation obeys IAEd = 1/Ki, making the cost of cancellation explicit, and comparisons with SIMC, the CHR zero-overshoot rule, and deadbeat-fitted explicit formulas quantify the trade: at matched maximum sensitivity the proposed design settles faster than SIMC on the third-order example, with markedly lower controller gains and peak control effort.
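
The cancellation argument can be checked on the simplest case. The sketch below assumes a first-order plant K/(tau*s + 1) and a PI controller Kp + Ki/s. Placing the controller zero on the plant pole leaves the loop transfer Kp*K/(tau*s), so the closed loop is p/(s + p) with p = Kp*K/tau. The gain formulas follow from that textbook algebra and are not copied from the paper; the function names and the forward-Euler simulation are illustrative assumptions.

```python
def pi_gains_first_order(K, tau, p):
    """PI gains that cancel the pole of K/(tau*s + 1) and place the closed-loop pole at -p."""
    Kp = p * tau / K   # proportional gain
    Ki = p / K         # integral gain, so Kp / Ki = tau (zero sits on the plant pole)
    return Kp, Ki


def simulate_step(K, tau, Kp, Ki, t_end=10.0, dt=1e-3):
    """Forward-Euler unit-step response of the PI loop around K/(tau*s + 1)."""
    y, integ = 0.0, 0.0
    ys = []
    for _ in range(int(t_end / dt)):
        e = 1.0 - y                    # tracking error for a unit reference
        u = Kp * e + Ki * integ        # PI control law using the current integral
        y += dt * (K * u - y) / tau    # plant state update
        integ += dt * e                # integrator update
        ys.append(y)
    return ys


if __name__ == "__main__":
    Kp, Ki = pi_gains_first_order(K=2.0, tau=0.5, p=2.0)
    ys = simulate_step(2.0, 0.5, Kp, Ki)
    monotonic = all(b >= a - 1e-12 for a, b in zip(ys, ys[1:]))
    print(Kp, Ki, monotonic)
```

With K = 2, tau = 0.5, and p = 2 the formulas give Kp = 0.5 and Ki = 1.0, and the simulated response rises toward 1 without overshoot, as expected for the first-order target p/(s + p).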

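2605.27628 2026-06-12 cs.AI cs.CY cs.ET cs.MA eess.SY version update

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

Srini Ramaswamy

AI summary: This paper proposes the SMARt model, which addresses hallucination and persistent unjustified behavior in autonomous AI systems by formalizing the capacity to detect epistemic drift, suspend reasoning, attempt recovery, and relinquish control when reliability declines.

Comments
This peer-reviewed paper is to appear in the Journal of Intelligent and Robotic Systems
AI abstract (translated from Chinese)

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignment limitations, this paper examines the architectural vulnerability of unbounded autonomy, that is, the presumption that an agent should keep operating as uncertainty rises. It introduces a theory of managed autonomy that defines intelligent behavior through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately relinquish control when reliability declines. We instantiate the theory with the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework with Stable, Meta-cognitive, Assisted, and Regulated states. By developing a timed, guarded Petri net formulation, we establish theoretically bounded properties of the system, showing how the architecture can formally enforce escalation, constrain invalid outputs, and ensure governance reachability under specified conditions. We further analyze how incorporating domain-specific trigger sets in different operational settings (e.g., healthcare, robotics) can systematically preserve safety, provided that completeness and soundness criteria are met. Because these triggers are designed to be adaptive, the SMARt model allows an agent's operational scope to expand safely and in a controlled way over time. We conclude that formalizing failure management within the autonomy lifecycle is a key step toward reliable and governed artificial intelligence.

English abstract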

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignment limitations, this paper explores the architectural vulnerability of unbounded autonomy - the presumption that an agent should continue operating regardless of rising uncertainty. It introduces a theory of managed autonomy that defines intelligent behavior through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately surrender control when reliability diminishes. We instantiate this theory via the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework featuring Stable, Meta-cognitive, Assisted, and Regulated states. By developing a timed, guarded Petri net formulation, we establish theoretically bounded properties for the system, demonstrating how architecture can formally mandate escalation, constrain invalid outputs, and ensure governance reachability under specified conditions. We further analyze how incorporating domain-specific trigger sets across varied operational settings (e.g., healthcare, robotics, etc.) can systematically preserve safety, assuming completeness and soundness criteria are met. Because these triggers are designed to be adaptive, the SMARt model accommodates the safe, controlled expansion of an agent's operational scope over time. We conclude that formalizing failure management within the autonomy lifecycle is a crucial step toward realizing reliable and governed artificial intelligence.
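
The four named states can be pictured as a small escalation ladder. The sketch below is only an illustration: the tier names come from the abstract, but the transition rule (one tier up on unresolved drift, one tier down on recovery) is an assumption, and the paper's timed, guarded Petri net is not modeled here.

```python
from enum import IntEnum


class Tier(IntEnum):
    STABLE = 0
    META_COGNITIVE = 1
    ASSISTED = 2
    REGULATED = 3


def next_tier(tier: Tier, drift_detected: bool, recovered: bool) -> Tier:
    """Hypothetical transition rule: escalate on unresolved drift, step down on recovery."""
    if drift_detected and not recovered:
        return Tier(min(tier + 1, Tier.REGULATED))   # escalation is capped at REGULATED
    if recovered:
        return Tier(max(tier - 1, Tier.STABLE))      # de-escalation is floored at STABLE
    return tier                                      # no trigger: stay in the current tier


if __name__ == "__main__":
    tier = Tier.STABLE
    for drift, recovered in [(True, False), (True, False), (False, True)]:
        tier = next_tier(tier, drift, recovered)
        print(tier.name)
```

Capping the ladder at `REGULATED` mirrors the idea that control is eventually surrendered instead of the agent continuing to act under rising uncertainty.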

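2605.27544 2026-06-12 eess.SY version update

Subsystem Structure as an Inferential Resource for Coupled Engineered Systems

Esmaeil Ghorbani, Jürgen Hackl

AI summary: Proposes a probabilistic compositional inference framework that exploits subsystem structure for state and parameter inference under sparse measurements in coupled engineered systems, reducing computational complexity from cubic to linear while preserving calibrated uncertainty.

AI abstract (translated from Chinese)

Engineered infrastructure systems pose inverse problems in which hidden states, unknown parameters, and subsystem couplings must be inferred from sparse and noisy measurements. These problems are hard because physical subsystems are heterogeneous, sensing is partial, uncertainty is distributed across subsystem interfaces, and computational cost grows rapidly with system size. We address this challenge with probabilistic compositional inference, a graph-based architecture that represents a coupled system as interacting subsystems, each keeping its own local model, estimator, and uncertainty representation, while coupling is handled through physically meaningful stochastic messages exchanged across subsystem interfaces. This formulation lets mechanistic, learned, and deterministic components coexist within a single inference framework and propagates calibrated uncertainty without assembling a global augmented state or covariance. We validate the framework in three increasingly demanding settings: a sparse-sensing canonical inverse problem, where interface couplings can also be learned from data; infrastructure-scale power networks, where the method matches centralized joint state-and-parameter inference while reducing computational scaling from approximately cubic to approximately linear; and a multi-physics turbine embedded in a power-grid network, where heterogeneous subsystems compose hierarchically without degrading local inference or collapsing local posteriors into a global estimate. Together, these results show that subsystem structure can serve as the organizing principle for uncertainty-aware inverse inference in coupled engineered systems.

English abstract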

Engineered infrastructure systems pose inverse problems in which hidden states, unknown parameters, and subsystem couplings must be inferred from sparse and noisy measurements. These problems are difficult because physical subsystems are heterogeneous, sensing is partial, uncertainty is distributed across subsystem interfaces, and computational cost grows rapidly with system size. We address this challenge with probabilistic compositional inference, a graph-based architecture that represents a coupled system as interacting subsystems, each retaining its own local model, estimator, and uncertainty representation, while coupling is handled through physically meaningful stochastic messages exchanged across subsystem interfaces. This formulation allows mechanistic, learned, and deterministic components to coexist within a single inference framework and propagates calibrated uncertainty without assembling a global augmented state or covariance. We validate the framework in three increasingly demanding settings: a sparse-sensing canonical inverse problem, where interface couplings can also be learned from data; infrastructure-scale power networks, where the method matches centralized joint state-and-parameter inference while reducing computational scaling from approximately cubic to approximately linear; and a multi-physics turbine embedded in a power-grid network, where heterogeneous subsystems compose hierarchically without degrading local inference or collapsing local posteriors into a global estimate. Together, these results show that subsystem structure can be exploited as the organizing principle for uncertainty-aware inverse inference in coupled engineered systems.

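2605.20755 2026-06-12 eess.AS version update

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

Haoyang Zhang, Jun Chen, Donghang Wu, Yuxin Li, Yuxin Zhang, Xiangyu Tony Zhang, Che Liu, Qingjian Lin, Yizhou Peng, Hexin Liu, Eng Siong Chng, Chao Yan, Boyong Wu, Yechang Huang, Xuerui Yang, Fei Tian

AI summary: This work proposes DuplexSLA, a full-duplex spoken language model that decodes assistant audio and a structured action stream on a shared 160 ms timeline, handling speech, language, and action in sync and addressing the lack of in-conversation planning and tool calling in existing full-duplex models.

AI abstract (translated from Chinese)

Recent advances in spoken dialogue language models have shifted from turn-based models to full-duplex designs, in which the model keeps listening to the user while generating a response. However, existing full-duplex architectures still lack a native channel for in-conversation planning and tool calling, leaving real-time agentic behavior either tied to turn boundaries or relegated to an external pipeline. We propose DuplexSLA, a native full-duplex speech-language-action foundation model that decodes assistant audio and a structured action stream on a shared 160 ms timeline. DuplexSLA is built on a dual-stream, three-channel architecture: a continuous user audio channel, a discrete assistant audio channel, and a rate-limited textual action channel, all decoded jointly in a single backbone, so that listening, speaking, planning, and tool calling unfold on one shared clock. The model has two core capabilities: (1) semantic-driven turn-taking control, where interruptions, pauses, and backchannel signals are handled inside the same backbone instead of by an external semantic VAD; and (2) in-conversation planning and tool calling, where planning text and structured tool calls are emitted on the action channel without interrupting the assistant audio, so that multi-action and backchannel-triggered tool use can be interleaved with ongoing speech. To evaluate these capabilities, we further construct DuplexSLA-Bench, a full-duplex benchmark covering pause, interrupt, and backchannel turn-taking as well as three styles of in-conversation tool calling. Our project page, interactive demos, and the DuplexSLA-Bench evaluation suite are publicly available at https://github.com/hyzhang24/DuplexSLA.

English abstract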

Recent advances in spoken dialogue language models have shifted from turn-based to full-duplex designs, where the model continuously listens to the user while generating responses. However, existing duplex backbones still lack a native channel for in-conversation planning and tool calling, leaving real-time agentic behaviour either tied to turn boundaries or relegated to an external cascade. We propose DuplexSLA, a native full-duplex Speech-Language-Action foundation model that decodes assistant audio together with a structured action stream on a shared 160 ms chunk timeline. DuplexSLA is built on a dual-stream three-channel formulation: a continuous user audio channel, a discrete assistant audio channel, and a rate-limited textual action channel, all decoded jointly by a single backbone, so that listening, speaking, planning, and tool calling unfold on one shared clock. Two capabilities define the model: (1) semantic-driven turn-taking control, where interruption, pause, and backchannel are handled inside the same backbone instead of by an external semantic VAD; and (2) in-conversation planning and tool calling, where planning text and structured tool calls are emitted on the action channel without halting assistant audio, so that multi-action and backchannel-triggered tool use are interleaved with ongoing speech. To evaluate these capabilities together, we further construct DuplexSLA-Bench, a duplex benchmark covering pause, interrupt, and backchannel turn-taking together with three styles of in-conversation tool calling. Our project page, interactive demos, and the DuplexSLA-Bench evaluation suite are publicly available at this https URL.

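2504.12423 2026-06-12 eess.AS eess.SP version update

Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios

Haohan Shi, Xiyu Shi, Safak Dogan, Saif Alzubi, Tianjin Huang, Yunxiao Zhang

AI summary: This paper proposes the ADD-C dataset and a data augmentation strategy to evaluate the robustness of audio deepfake detection systems in real-world communication scenarios; experiments show that the proposed method significantly improves system performance.

Comments
Accepted by EUSIPCO 2025
AI abstract (translated from Chinese)

Existing audio deepfake detection (ADD) systems often fail to generalize in real-world communication scenarios because audio codec compression and channel transmission effects significantly degrade audio quality. To address this challenge, we developed a rigorous benchmark to evaluate ADD system performance in these scenarios. We introduce ADD-C, a new test dataset for evaluating the robustness of ADD systems under different communication conditions, including different audio codec combinations and packet loss rates. Benchmarking three baseline ADD models on ADD-C shows a significant drop in robustness under these conditions. A new data augmentation (DA) strategy is proposed to improve the robustness of ADD systems. Experimental results show that the proposed method significantly improves ADD system performance on the proposed ADD-C dataset. Our benchmark can support future efforts to build practical, robustly generalizable ADD systems.

English abstract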

Existing Audio Deepfake Detection (ADD) systems often struggle to generalise effectively due to the significantly degraded audio quality caused by audio codec compression and channel transmission effects in real-world communication scenarios. To address this challenge, we developed a rigorous benchmark to evaluate the performance of the ADD system under such scenarios. We introduced ADD-C, a new test dataset to evaluate the robustness of ADD systems under diverse communication conditions, including different combinations of audio codecs for compression and packet loss rates. Benchmarking three baseline ADD models on the ADD-C dataset demonstrated a significant decline in robustness under such conditions. A novel Data Augmentation (DA) strategy was proposed to improve the robustness of ADD systems. Experimental results demonstrated that the proposed approach significantly enhances the performance of ADD systems on the proposed ADD-C dataset. Our benchmark can assist future efforts towards building practical and robustly generalisable ADD systems.
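
One of the channel effects mentioned above, packet loss, is easy to simulate on raw samples. The abstract does not describe the paper's augmentation strategy, so the sketch below is a generic stand-in: the function name `drop_packets`, the zero-fill behavior (no loss concealment), and the fixed frame length are all assumptions, and codec compression is not modeled.

```python
import random


def drop_packets(samples, frame_len, loss_rate, seed=0):
    """Zero out whole frames of `samples`, each with probability `loss_rate`."""
    rng = random.Random(seed)          # seeded so the augmentation is reproducible
    out = list(samples)                # copy, leave the input untouched
    for start in range(0, len(out), frame_len):
        if rng.random() < loss_rate:
            end = min(start + frame_len, len(out))
            out[start:end] = [0.0] * (end - start)   # lost packet replaced by silence
    return out


if __name__ == "__main__":
    audio = [0.1 * i for i in range(12)]
    print(drop_packets(audio, frame_len=4, loss_rate=0.5))
```

Applying a transform like this to training audio at several loss rates is one plausible way to expose a detector to degraded inputs before evaluating it under transmission conditions.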

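2604.15028 2026-06-12 eess.SY astro-ph.EP astro-ph.IM version update

Nonlinear backstepping with saturation for low-thrust station-keeping of libration point orbits

António Nunes, Sérgio Brás, Pedro Batista

AI summary: For continuous low-thrust station-keeping in the Earth-Moon system, proposes a nonlinear backstepping control law that achieves almost global uniform exponential stability via Lyapunov theory, formally incorporates actuator saturation constraints, and is validated through Monte Carlo analysis.

Comments
Manuscript accepted for Acta Astronautica. Please cite the published version. For a working demo of the solution proposed, see this https URL
AI abstract (translated from Chinese)

This paper presents a novel nonlinear backstepping control law for continuous low-thrust station-keeping in the Earth-Moon system. Quasi-periodic libration point orbits are targeted under a high-fidelity dynamics model. Almost global uniform exponential stability guarantees are obtained through Lyapunov stability theory. Actuator saturation is formally included in the controller design, so that these guarantees still hold under saturation. The relationship between saturation threshold, control gains, and deviation is studied, and an optimal procedure for gain selection is discussed. Representative application cases are tested numerically through a Monte Carlo analysis that accounts for operational errors, constraints, and external perturbations. Station-keeping under actuator saturation is validated with a conservative threshold for typical electric propulsion systems.

English abstract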

This paper presents a novel nonlinear backstepping control law for continuous, low-thrust station-keeping in the Earth-Moon system. Quasi-periodic libration point orbits are targeted under a high-fidelity model of the dynamics. Almost global uniform exponential stability guarantees are attained, as shown through Lyapunov's stability theory. Saturation of the actuators is formally included in the controller design, such that these guarantees hold even in the event of saturation. The relationship between saturation threshold, control gains, and deviation is studied and an optimal procedure for gain selection is discussed. The control solution is tested numerically through a Monte Carlo analysis over representative application cases, subject to operational errors, constraints, and external perturbations. Station-keeping under actuation saturation is validated considering a conservative threshold for typical electric propulsion systems.

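2512.14648 2026-06-12 cs.CV eess.IV version update

Adaptable Segmentation Pipeline for Diverse Brain Tumors with Radiomic-Guided Subtyping and Lesion-Wise Model Ensemble

Daniel Capellán-Martín, Abhijeet Parida, Zhifan Jiang, Nishad Kulkarni, Krithika Iyer, Austin Tapp, Syed Muhammad Anwar, María J. Ledesma-Carbayo, Marius George Linguraru

AI summary: Proposes a flexible, modular, adaptable segmentation pipeline that uses radiomic features to detect tumor subtypes and balance training, and lesion-level performance metrics to optimize model ensembling and post-processing; it achieves top-tier performance in the BraTS 2025 challenge and supports quantitative tumor measurement in clinical practice.

Comments
12 pages, 5 figures, 3 tables. Algorithm presented at MICCAI BraTS 2025
AI abstract (translated from Chinese)

Robust and generalizable segmentation of brain tumors on multi-parametric magnetic resonance imaging (MRI) remains difficult because tumor types differ widely. The BraTS 2025 Lighthouse Challenge benchmarks segmentation methods on diverse high-quality datasets of adult and pediatric tumors: multi-consortium international pediatric brain tumor segmentation (PED), preoperative meningioma tumor segmentation (MEN), meningioma radiotherapy segmentation (MEN-RT), and segmentation of pre- and post-treatment brain metastases (MET). We propose a flexible, modular, and adaptable pipeline that improves segmentation performance by selecting and combining state-of-the-art models and applying tumor- and lesion-specific processing before and after training. Radiomic features extracted from MRI help detect tumor subtype, ensuring more balanced training. Custom lesion-level performance metrics determine each model's influence in the ensemble and optimize the post-processing that further refines predictions, letting the workflow tailor every step to each case. On the BraTS test sets, our pipeline achieved performance comparable to top-ranked algorithms across multiple challenges. These findings confirm that custom lesion-aware processing and model selection yield robust segmentations without locking the method to a specific network architecture. Our method has potential for quantitative tumor measurement in clinical practice, supporting diagnosis and prognosis.

English abstract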

Robust and generalizable segmentation of brain tumors on multi-parametric magnetic resonance imaging (MRI) remains difficult because tumor types differ widely. The BraTS 2025 Lighthouse Challenge benchmarks segmentation methods on diverse high-quality datasets of adult and pediatric tumors: multi-consortium international pediatric brain tumor segmentation (PED), preoperative meningioma tumor segmentation (MEN), meningioma radiotherapy segmentation (MEN-RT), and segmentation of pre- and post-treatment brain metastases (MET). We present a flexible, modular, and adaptable pipeline that improves segmentation performance by selecting and combining state-of-the-art models and applying tumor- and lesion-specific processing before and after training. Radiomic features extracted from MRI help detect tumor subtype, ensuring a more balanced training. Custom lesion-level performance metrics determine the influence of each model in the ensemble and optimize post-processing that further refines the predictions, enabling the workflow to tailor every step to each case. On the BraTS testing sets, our pipeline achieved performance comparable to top-ranked algorithms across multiple challenges. These findings confirm that custom lesion-aware processing and model selection yield robust segmentations yet without locking the method to a specific network architecture. Our method has the potential for quantitative tumor measurement in clinical practice, supporting diagnosis and prognosis.

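2603.28945 2026-06-12 eess.SY version update

Coupling Scenario-Based Grid Simulations with State Estimation: Measurement Requirements for Low-Voltage Networks under the German Energy Transition Pathway

Nane Zimmermann, Lukas P. Wagner, Luca von Rönn, Florian Strobel, Paul Hüttmann, Felix Gehlhoff

AI summary: The study links the German government's energy transition pathway with state estimation performance requirements and evaluates how equipment quality and measurement configuration affect low-voltage distribution grids. It finds that transformer overloading and voltage violations are the main problems and that transformer-level measurement markedly reduces estimation error, and it recommends prioritizing transformer instrumentation.

AI abstract (translated from Chinese)

The growing adoption of electric vehicles, heat pumps, and rooftop photovoltaics is creating thermal and voltage stress in low-voltage distribution grids. This study links the German Federal Government's energy transition pathway (2025-2045) with state estimation performance requirements, evaluated on two SimBench reference networks across three equipment quality levels (good, medium, poor) and three VDE Forum Netztechnik/Netzbetrieb (VDE FNN) measurement configurations that differ in the availability of transformer and feeder-level instrumentation. In this study's analysis, congestion is caused exclusively by transformer overloading and voltage-band violations. No individual line exceeds its thermal rating (maximum: 89.5%). For a given deployment trajectory, equipment quality determines when congestion appears: with good equipment, congestion does not appear through 2045; with medium equipment, it appears from 2035 (3/6 scenarios); with poor equipment, from 2025 (6/6). Without transformer instrumentation, median voltage estimation errors reach 6-42% regardless of smart meter penetration. Adding a single transformer measurement reduces errors by an order of magnitude, to median errors of 0.5-1.7%. In urban networks, transformer-level instrumentation meets the VDE FNN voltage accuracy target (99th-percentile voltage error below 2%) in all configurations. In rural networks with poor equipment, the target is approached but not met. These findings indicate that prioritizing transformer instrumentation is an effective first step toward better grid observability, and that the current consumption-driven metering rollout should be combined with risk-based deployment criteria tied to local congestion exposure.

English abstract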

Increasing penetration of electric vehicles, heat pumps, and rooftop photovoltaics is creating thermal and voltage stress in low-voltage distribution grids. This work links the German Federal Government energy transition pathway (2025-2045) with state estimation performance requirements, evaluated on two SimBench reference networks across three equipment quality levels (good, medium, poor) and three VDE Forum Netztechnik/Netzbetrieb (VDE FNN) measurement constellations that differ in the availability of transformer and feeder-level instrumentation. Within this work's analysis, congestion is caused exclusively by transformer overloading and voltage-band violations. No individual line exceeds its thermal rating (maximum: 89.5%). Equipment quality governs congestion onset for a given deployment trajectory: under good equipment, congestion remains absent through 2045, under medium equipment it emerges from 2035 (3/6 scenarios), under poor equipment from 2025 (6/6). Without transformer instrumentation, median voltage estimation errors reach 6-42% regardless of smart meter penetration. Adding a single transformer measurement reduces errors by an order of magnitude, achieving median errors of 0.5-1.7%. In urban networks, transformer-level instrumentation meets the VDE FNN voltage accuracy target (99th percentile voltage error below 2%) in all configurations. In rural networks under poor equipment, the target is approached but not met. These findings motivate prioritizing transformer instrumentation as an effective first step for grid observability and supplementing the current consumption-driven metering rollout with risk-based deployment criteria linked to local congestion exposure.

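2603.23918 2026-06-12 eess.SP version update

Linking Dispersive-Medium Uncertainty to Clutter Analysis in Single-Snapshot FDA-MIMO-GPR

Yisu Yan, Jifeng Guo

AI summary: This paper builds a propagation-side statistical framework that maps random perturbations of the relaxation spectrum to complex permittivity, complex wavenumber, steering-vector perturbation, medium-induced clutter covariance, and total clutter covariance, and uses KL modal decomposition and subspace-projection analysis to characterize the effect of medium uncertainty on effective rank, effective clutter-subspace dimension, and target-clutter separability.

AI abstract (translated from Chinese)

Single-snapshot FDA-MIMO-GPR requires clutter models that account for dispersive-medium uncertainty, yet the statistical link between complex-medium characterization and clutter covariance analysis has remained unclear. This paper develops a propagation-side statistical framework that maps random perturbations of the relaxation spectrum to complex permittivity, complex wavenumber, steering-vector perturbation, medium-induced clutter covariance, and total clutter covariance. Within this framework, a KL-based modal decomposition and a subspace-projection analysis characterize the effect of medium uncertainty on effective rank, effective clutter-subspace dimension, and target-clutter separability. Numerical validation uses five dielectric families known from the literature to define physically traceable prior scenarios, a controlled random-field model to exercise the main propagation chain, and gprMax-based full-wave FDTD snapshots for an independent solver-level consistency check. Monte Carlo closure shows stage-wise numerical consistency, identifies steering linearization as the main approximation-sensitive step, and supports a weak-perturbation regime with a bounded extension into a moderate regime. In a representative whitening-and-detection benchmark, the structured covariance model raises AUC from 0.593 for a diagonal baseline to 0.753, while prior-mismatch experiments show gradual, not abrupt, degradation. These results provide an explicit, interpretable interface for embedding complex-medium uncertainty into FDA-MIMO-GPR clutter analysis in a first-order, propagation-dominated setting.

English abstract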

Single-snapshot FDA-MIMO-GPR requires clutter models that account for dispersive-medium uncertainty, yet the statistical link between complex-medium characterization and clutter covariance analysis has remained unclear. This paper develops a propagation-side statistical framework that maps random perturbations of the relaxation spectrum to complex permittivity, complex wavenumber, steering-vector perturbation, medium-induced clutter covariance, and total clutter covariance. Within this framework, the effects of medium uncertainty on effective rank, effective clutter-subspace dimension, and target--clutter separability are characterized through a KL-based modal decomposition and a subspace-projection analysis. Numerical validation uses five literature-informed dielectric families to define physically traceable prior scenarios, a controlled random-field model to exercise the main propagation chain, and gprMax-based full-wave FDTD snapshots for an independent solver-level consistency check. Monte Carlo closure shows stage-wise numerical consistency, identifies steering linearization as the dominant approximation-sensitive step, and supports a weak perturbation regime with a bounded extension into a moderate regime. In a representative whitening-and-detection benchmark, the structured covariance model raises AUC from 0.593 for a diagonal baseline to 0.753, while prior-mismatch experiments indicate gradual rather than abrupt degradation. These results provide an explicit and interpretable interface for embedding complex-medium uncertainty into FDA-MIMO-GPR clutter analysis within a first-order, propagation-dominated setting.

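2603.00610 2026-06-12 cs.SD cs.AI cs.LG cs.MM eess.AS version update

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

Yinghao Ma, Haiwen Xia, Hewei Gao, Weixiong Chen, Yuxin Ye, Yuchen Yang, Sungkyun Chang, Mingshuo Ding, Yizhi Li, Ruibin Yuan, Simon Dixon, Emmanouil Benetos

AI summary: To address the lack of effective evaluation mechanisms for music generation models, proposes the CMI-RewardBench benchmark together with large-scale preference datasets and parameter-efficient reward models, enabling music quality evaluation under multimodal instructions.

Comments
Accepted by ICML 2026
AI abstract (translated from Chinese)

Music generation models have advanced to handle complex multimodal inputs that mix text, lyrics, and reference audio, but evaluation mechanisms have lagged behind. In this paper, we close this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where generated music may be conditioned on text descriptions, lyrics, and audio prompts. We first introduce CMI-Pref-Pseudo, a large-scale preference dataset of 110k pseudo-labeled samples, and CMI-Pref, a high-quality human-annotated corpus tailored to fine-grained alignment tasks. To unify the evaluation landscape, we propose CMI-RewardBench, a unified benchmark that evaluates music reward models on heterogeneous samples across musicality, text-music alignment, and compositional instruction alignment. Using these resources, we develop CMI reward models (CMI-RMs), a parameter-efficient family of reward models that can process heterogeneous inputs. We evaluate their correlation with human judgment scores on musicality and alignment using CMI-Pref and earlier datasets. Further experiments show that CMI-RM not only correlates strongly with human judgments but also enables effective inference-time scaling through top-k filtering. Code is available on GitHub (this https URL). Model weights: CMI-RM (this https URL). Datasets: CMI-Pref-Pseudo (this https URL) and CMI-Pref (this https URL).

English abstract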

While music generation models have evolved to handle complex multimodal inputs mixing text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where the generated music may be conditioned on text descriptions, lyrics, and audio prompts. We first introduce CMI-Pref-Pseudo, a large-scale preference dataset comprising 110k pseudo-labeled samples, and CMI-Pref, a high-quality, human-annotated corpus tailored for fine-grained alignment tasks. To unify the evaluation landscape, we propose CMI-RewardBench, a unified benchmark that evaluates music reward models on heterogeneous samples across musicality, text-music alignment, and compositional instruction alignment. Leveraging these resources, we develop CMI reward models (CMI-RMs), a parameter-efficient reward model family capable of processing heterogeneous inputs. We evaluate their correlation with human judgment scores on musicality and alignment on CMI-Pref along with previous datasets. Further experiments demonstrate that CMI-RM not only correlates strongly with human judgments, but also enables effective inference-time scaling via top-k filtering. Code is available at GitHub ( this https URL ). Model weights: CMI-RM ( this https URL ). Datasets: CMI-Pref-Pseudo ( this https URL ) and CMI-Pref ( this https URL )
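
Inference-time scaling via top-k filtering amounts to generating several candidates, scoring each with a reward model, and keeping the best k. The sketch below shows only that selection step; `top_k_filter` and the toy reward are hypothetical stand-ins, not the CMI-RM interface.

```python
import heapq


def top_k_filter(candidates, reward_fn, k):
    """Return the k candidates with the highest reward, best first."""
    return heapq.nlargest(k, candidates, key=reward_fn)


if __name__ == "__main__":
    # Toy stand-in for a reward model: here the "reward" is just the clip length.
    clips = ["a", "bbb", "cc"]
    print(top_k_filter(clips, reward_fn=len, k=2))
```

In practice `reward_fn` would wrap a trained reward model that scores a generated clip against its text, lyrics, and audio prompts; sampling more candidates then trades extra compute for a better retained set.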

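2603.03017 2026-06-12 math.OC eess.SY version update

Stability properties of Minimal Gated Unit neural networks

Stefano De Carli, Davide Previtali, Mirko Mazzoleni, Fabio Previdi

AI summary: For resource-constrained environments, analyzes the input-to-state stability of Minimal Gated Unit networks, derives sufficient parametric conditions, proposes stability-promoting training methods, and verifies their parameter efficiency and inference speed advantages on synthetic data and the Silverbox benchmark.

Comments
Preprint submitted to Automatica. 16 pages, 6 figures and 1 table. MATLAB code for the proposed methodologies is available at: this https URL
AI abstract (translated from Chinese)

In this work, we address the need for efficient and formally stable recurrent neural networks (RNNs) in environments with limited computational resources by analyzing the stability of the Minimal Gated Unit (MGU) network, a lightweight alternative to the gated RNNs commonly used in system identification. We derive sufficient parametric conditions for input-to-state stability and incremental input-to-state stability of MGU networks. These conditions enable a posteriori verification of model stability and form the basis of novel stability-promoting training methods, including a warm start of the network parameters and a projected-gradient-based optimization scheme, both proposed in this work. Comparative evaluation, including robustness analysis and validation on synthetic and real-world data (the Silverbox benchmark), shows that the MGU network successfully combines formal stability guarantees with superior parameter efficiency and faster inference while maintaining comparable and satisfactory accuracy. Notably, the results on the Silverbox benchmark show that the stable MGU network effectively captures the system dynamics, whereas other stable RNNs fail to converge to a reliable model.

English abstract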

In this work, we address the need for efficient and formally stable Recurrent Neural Networks (RNNs) in environments with limited computational resources by analyzing the stability of the Minimal Gated Unit (MGU) network, a lightweight alternative to common gated RNNs used in system identification. We derive sufficient parametric conditions for the MGU network's input-to-state stability and incremental input-to-state stability properties. These conditions enable a-posteriori validation of model stability and form the basis for novel stability-promoting training methodologies, including a warm-start of the network's parameters and a projected gradient-based optimization scheme, both of which are presented in this work. Comparative evaluation, including robustness analysis and validation on synthetic and real-world data (i.e., the Silverbox benchmark), demonstrates that the minimal gated unit network successfully combines formal stability guarantees with superior parameter efficiency and faster inference times compared to other state-of-the-art recurrent neural networks, while maintaining comparable and satisfactory accuracy. Notably, the results attained on the Silverbox benchmark illustrate that the stable MGU network effectively captures the system dynamics, whereas other stable RNNs fail to converge to a reliable model.
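
For readers unfamiliar with the MGU, the sketch below implements one step of the commonly cited MGU recurrence in scalar form: a single forget gate both scales the previous state inside the candidate and blends old and new state. The scalar weights and the function name `mgu_step` are assumptions for illustration; the paper's exact parameterization and its stability conditions are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mgu_step(x, h, wf, uf, bf, wh, uh, bh):
    """One scalar MGU update: returns the next hidden state."""
    f = sigmoid(wf * x + uf * h + bf)                  # forget gate in (0, 1)
    h_tilde = math.tanh(wh * x + uh * (f * h) + bh)    # candidate state in (-1, 1)
    return (1.0 - f) * h + f * h_tilde                 # convex blend of old and candidate


if __name__ == "__main__":
    h = 0.0
    for k in range(5):
        h = mgu_step(math.sin(0.3 * k), h, 1.0, 0.5, 0.0, 1.5, -0.8, 0.1)
        print(round(h, 4))
```

Because the update is a convex combination of the previous state and a `tanh` output, a state that starts in [-1, 1] stays there. That boundedness is a basic property of the cell and is much weaker than the input-to-state stability conditions derived in the paper.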

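2603.01860 2026-06-12 eess.SP math.OC version update

Multiresolution Adaptive Block-Coordinate Forward-Backward for Image Reconstruction

Edgar Desainte-Maréville (OCKHAM), Marion Foare (OCKHAM, CPE), Paulo Gonçalves (OCKHAM), Nelly Pustelnik (Phys-ENS), Elisa Riccietti (OCKHAM)

AI summary: Proposes an adaptive multiresolution block-coordinate forward-backward algorithm whose stochastic block selection, based on a non-smooth Gauss-Southwell rule, dynamically balances updates across scales and adapts automatically to different blur and noise levels.

AI abstract (translated from Chinese)

Classical first-order optimization methods for imaging inverse problems scale poorly when image resolution is high. Wavelet-based multilevel strategies can accelerate convergence under strong blur, but their fixed coarse-to-fine schedules are less effective in moderate-blur or noise-dominated regimes. This paper proposes an adaptive multiresolution block-coordinate forward-backward algorithm for image restoration. Multiresolution block selection is driven by the local magnitude of the proximal update, obtained by applying a non-smooth Gauss-Southwell rule to the wavelet decomposition of the image. This adaptive selection strategy dynamically balances updates across scales, emphasizing coarse or fine blocks depending on the degradation. The proposed method therefore adapts automatically to varying blur and noise levels without relying on a predefined hierarchical update scheme.

English abstract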

Classical first-order optimization methods for imaging inverse problems scale poorly with image resolution. Wavelet based multilevel strategies can accelerate convergence under strong blur, but their fixed coarse-to-fine schedules lose effectiveness in moderate-blur or noise-dominated regimes. In this work, we propose an adaptive multiresolution block coordinate Forward-Backward algorithm for image restoration. Multiresolution block selection is driven by the local magnitude of the proximal update via a stochastic non-smooth Gauss-Southwell rule applied to the wavelet decomposition of the image. This adaptive selection strategy dynamically balances updates across scales, emphasizing coarse or fine blocks according to the degradation regime. As a result, the proposed method automatically adapts to varying blur and noise levels without relying on a predefined hierarchical update scheme.
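
The block-selection idea can be illustrated on a toy problem. The sketch below minimizes 0.5*||x - y||^2 + lam*||x||_1 over a few coefficient blocks and, at each iteration, updates only the block whose forward-backward (proximal gradient) step would move the most. Several simplifications are assumed: the selection is greedy instead of stochastic, the blocks are plain lists standing in for wavelet scales, and the forward operator is the identity, so this is not the paper's algorithm.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fb_update(x_b, y_b, lam, step):
    """Forward-backward step on one block for 0.5*||x - y||^2 + lam*||x||_1."""
    return [soft_threshold(xi - step * (xi - yi), step * lam) for xi, yi in zip(x_b, y_b)]


def gauss_southwell_fb(y_blocks, lam, step=1.0, iters=10):
    """Greedy block-coordinate forward-backward: update the block with the largest step."""
    x = [[0.0] * len(b) for b in y_blocks]
    picked = []
    for _ in range(iters):
        proposals = [fb_update(xb, yb, lam, step) for xb, yb in zip(x, y_blocks)]
        sizes = [sum((p - xi) ** 2 for p, xi in zip(pb, xb)) for pb, xb in zip(proposals, x)]
        b = max(range(len(x)), key=sizes.__getitem__)   # block with the largest proximal move
        if sizes[b] == 0.0:
            break                                        # every block is already at a fixed point
        x[b] = proposals[b]
        picked.append(b)
    return x, picked


if __name__ == "__main__":
    x, picked = gauss_southwell_fb([[0.1, -0.2], [3.0, -4.0], [1.0]], lam=0.5)
    print(x, picked)
```

In this toy run the block holding the large coefficients is updated first and the block of small coefficients is never selected, which is the behavior an adaptive rule is meant to provide: effort goes where the proximal update is largest.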

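2602.05121 2026-06-12 eess.SY cs.RO version update

Trojan Attacks on Neural Network Controllers for Robotic Systems

Farbod Younesi, Walter Lucia, Amr Youssef

AI summary: Targeting neural network controllers for robots, designs a lightweight parallel Trojan network that tampers with control commands under a specific trigger condition, and verifies the attack's effectiveness in simulation.

Comments
Paper submitted to the 2026 IEEE Conference on Control Technology and Applications (CCTA)
AI abstract (translated from Chinese)

Neural network controllers are increasingly used in robotic systems for tasks such as trajectory tracking and pose stabilization. However, their reliance on potentially untrusted training pipelines or supply chains introduces significant security vulnerabilities. This paper studies backdoor (Trojan) attacks on neural controllers, using a differential-drive mobile robot platform as a case study. Specifically, assuming the robot's tracking controller is implemented as a neural network, we design a lightweight parallel Trojan network that can be embedded in the controller. This malicious module stays dormant during normal operation but, on detecting a highly specific trigger condition defined by the robot's pose and goal parameters, corrupts the main controller's wheel velocity commands, causing undesired and potentially unsafe robot behavior. We provide a proof-of-concept implementation of the proposed Trojan network and validate it in simulation under two different attack scenarios. The results confirm the effectiveness of the proposed attack and show that neural-network-based robotic control systems face potentially critical security threats.

English abstract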

Neural network controllers are increasingly deployed in robotic systems for tasks such as trajectory tracking and pose stabilization. However, their reliance on potentially untrusted training pipelines or supply chains introduces significant security vulnerabilities. This paper investigates backdoor (Trojan) attacks against neural controllers, using a differential-drive mobile robot platform as a case study. In particular, assuming that the robot's tracking controller is implemented as a neural network, we design a lightweight, parallel Trojan network that can be embedded within the controller. This malicious module remains dormant during normal operation but, upon detecting a highly specific trigger condition defined by the robot's pose and goal parameters, compromises the primary controller's wheel velocity commands, resulting in undesired and potentially unsafe robot behaviours. We provide a proof-of-concept implementation of the proposed Trojan network, which is validated through simulation under two different attack scenarios. The results confirm the effectiveness of the proposed attack and demonstrate that neural network-based robotic control systems are subject to potentially critical security threats.

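2505.13102 2026-06-12 cs.LG cs.AI eess.SP version update

Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

Ji Qi, Tam Thuc Do, Mingxiao Liu, Zhuoshi Pan, Yuzhe Li, Gene Cheung, H. Vicky Zhao

AI summary: Proposes a lightweight, interpretable transformer-like network built by unrolling a mixed-graph optimization algorithm for spatiotemporal traffic forecasting, greatly reducing parameters while keeping competitive performance.

Comments
24 pages, 7 figures, 11 tables
AI abstract (translated from Chinese)

Unlike conventional "black-box" transformers with classical self-attention, we build a lightweight and interpretable transformer-like neural network by unrolling a mixed-graph-based optimization algorithm, for traffic forecasting with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ that captures spatial correlations across geography, and a directed graph $\mathcal{G}^d$ that captures sequential relationships over time. We predict future samples of a signal $\mathbf{x}$ under the assumption that it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, and to this end we design new $\ell_2$- and $\ell_1$-norm variational terms that quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM) and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$, which play the role of self-attention. Experiments show that our unrolled network matches state-of-the-art prediction schemes in traffic forecasting performance while drastically reducing the parameter count.

English abstract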

Unlike conventional "black-box" transformers with a classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$; to this end, we design new $\ell_2$- and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve traffic forecast performance competitive with state-of-the-art prediction schemes, while drastically reducing parameter counts.
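As a rough illustration of what "smooth with respect to a graph" can mean, the sketch below evaluates the standard Laplacian quadratic form on an undirected graph and a simple residual-based variation on a directed graph. The function names and the directed term (each node compared with the mean of its in-neighbours) are assumptions made for illustration; the abstract does not spell out the paper's exact variational terms.

```python
def laplacian_quadratic(x, edges):
    """Undirected smoothness x^T L x = sum of w_ij * (x_i - x_j)^2 over edges."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def directed_variation(x, parents, p=2):
    """Hypothetical directed smoothness term.

    parents maps a node to the list of its in-neighbours. The residual of a
    node is its value minus the mean of its in-neighbours. Returns the l1 norm
    of the residuals (p=1) or their squared l2 norm (p=2).
    """
    residuals = [
        x[i] - sum(x[j] for j in ps) / len(ps)
        for i, ps in parents.items()
        if ps  # nodes without in-neighbours contribute nothing
    ]
    if p == 1:
        return sum(abs(r) for r in residuals)
    return sum(r * r for r in residuals)


# Path graph 0 - 1 - 2 (undirected) and chain 0 -> 1 -> 2 (directed).
path_edges = [(0, 1, 1.0), (1, 2, 1.0)]
chain_parents = {1: [0], 2: [1]}
```

A constant signal scores zero under both terms, while an oscillating signal such as `[1, -1, 1]` is penalised, which is the sense in which these terms favour low-frequency reconstructions.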

2602.00142 2026-06-12 cs.IT eess.SP 版本更新

Semantic-Aware Command and Control Transmission for Multi-UAVs

面向多无人机的语义感知指挥控制传输

Boya Li, Xiaonan Liu, Dongzhu Liu, Dusit Niyato, Zhu Han

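AI summary: To address the inability of bit-oriented networks to meet URLLC requirements in UAV command and control transmission, proposes a semantic-aware transmission framework that uses semantic similarity to enable multicast transmission and a PPO algorithm to jointly optimize transmission mode and resource allocation, significantly improving transmission efficiency.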

详情
Comments
The paper requires further revision
AI中文摘要

无人机在低空经济中发挥了重要作用,并已应用于各种场景。然而,随着无人机数量的增加和无线数据的爆炸性增长,现有的面向比特的通信网络已接近香农容量,无法满足面向比特的无人机通信网络中指挥控制(C&C)传输的超可靠低延迟通信(URLLC)服务质量(QoS)要求。为解决这一问题,我们提出了一种在有限无线资源下用于多无人机的语义感知C&C传输方案。具体地,我们利用语义相似度来测量每个无人机在连续传输时间间隔(TTI)内C&C消息的变化,并捕获无人机间C&C消息的相关性,从而实现多播传输。基于语义相似度和无人机命令的重要性,我们设计了一个触发函数来量化无人机的QoS。然后,为了最大化长期QoS并利用语义相似度带来的C&C消息多播机会,我们开发了一种近端策略优化(PPO)算法,以联合决定传输模式(单播/多播/空闲)以及基站(BS)与无人机之间有限资源块(RB)的分配。实验结果表明,与面向比特的无人机传输相比,我们提出的语义感知框架显著提高了传输效率和有效性。

英文摘要

Uncrewed aerial vehicles (UAVs) play an important role in the low-altitude economy and are used in a variety of applications. However, with the growing number of UAVs and the explosive growth of wireless data, existing bit-oriented communication networks are approaching the Shannon capacity and cannot satisfy the quality-of-service (QoS) requirements of ultra-reliable low-latency communication (URLLC) for command and control (C&C) transmission. To address this issue, we propose a novel semantic-aware C&C transmission scheme for multiple UAVs under limited wireless resources. Specifically, we leverage semantic similarity to measure the variation in C&C messages for each UAV over continuous transmission time intervals (TTIs) and to capture the correlation of C&C messages among UAVs, enabling multicast transmission. Based on the semantic similarity and the importance of UAV commands, we design a trigger function to quantify the QoS of UAVs. Then, to maximize the long-term QoS and exploit the multicast opportunities that semantic similarity creates among C&C messages, we develop a proximal policy optimization (PPO) algorithm to jointly determine the transmission mode (unicast/multicast/idle) and the allocation of limited resource blocks (RBs) between a base station (BS) and UAVs. Experimental results show that our proposed semantic-aware framework significantly increases transmission efficiency and improves effectiveness compared with bit-oriented UAV transmission.
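To make the role of semantic similarity concrete, here is a minimal sketch that picks a transmission mode with a fixed cosine-similarity threshold. This is a hand-written heuristic, not the paper's method: the abstract states that the mode is chosen by a PPO policy, and the vector encoding of commands, the threshold `thr`, and the function names are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two command embeddings (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def choose_mode(prev_cmd, curr_cmd, other_cmds, thr=0.95):
    """Hypothetical rule of thumb for one UAV in one TTI.

    idle:      the new command barely differs from the previous one
    multicast: another UAV needs a semantically similar command
    unicast:   otherwise
    """
    if cosine(prev_cmd, curr_cmd) >= thr:
        return "idle"
    if any(cosine(curr_cmd, other) >= thr for other in other_cmds):
        return "multicast"
    return "unicast"
```

A learned policy would replace the fixed threshold and would also have to account for command importance and the limited number of resource blocks, which this sketch ignores.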

2601.13289 2026-06-12 eess.SP 版本更新

Semantic Communication for the Internet of Underwater Things: Architectures, Applications, Challenges, and Future Directions

水下物联网的语义通信:架构、应用、挑战与未来方向

Ruhul Amin Khalil, Asiya Jehangir, Hanane Lamaazi, Saddaf Rubab, Nasir Saeed

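AI summary: Surveys semantic communication for the Internet of Underwater Things, covering architectures, learning-driven approaches, and representative applications, and identifies key research directions for improving communication efficiency in resource-constrained underwater networks.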

详情
Comments
33 pages, 10 figures, and 9 tables. Submitted to IEEE for possible publication
AI中文摘要

水下物联网(IoUT)支持海洋感知、环境监测、海底检查和水下自主作业。然而,IoUT通信受限于有限的带宽、长传播延迟、时变水下信道、间歇性连接和严格的能量预算。语义通信(SC)通过传输任务相关的含义而非原始数据,提供了一种有前景的替代方案,从而在资源受限的水下网络中提高通信效率。本文对IoUT的SC进行了批判性和可行性感知的综述,重点关注机遇、挑战、局限性和未来研究方向。我们首先回顾了支持SC的IoUT系统的基础知识,包括语义表示、分层架构、语义信道建模和任务导向的评估指标。然后,我们考察了基于机器学习(ML)、知识图谱(KG)、视觉语言模型(VLM)、生成模型和联邦学习(FL)的学习驱动方法,并强调它们在水下边缘约束下的可行性。从SC角度分析了代表性应用,包括环境监测、海洋生态、海底基础设施检查、灾害响应和自主水下航行器(AUV)协调。最后,我们确定了关键研究方向,涉及标准化语义模型、可复现测试平台、计算-通信权衡、可信重建、混合水下链路、能量感知边缘智能、语义安全、数字孪生(DT)和跨域互操作性。本综述为开发可靠、高效和意义驱动的IoUT通信系统提供了结构化基础。

英文摘要

The Internet of Underwater Things (IoUT) supports marine sensing, environmental monitoring, subsea inspection, and autonomous underwater operations. However, IoUT communication is constrained by limited bandwidth, long propagation delay, time-varying underwater channels, intermittent connectivity, and strict energy budgets. Semantic Communication (SC) offers a promising alternative by transmitting task-relevant meaning rather than raw data, thereby improving communication efficiency in resource-constrained underwater networks. This paper presents a critical and feasibility-aware survey of SC for IoUT, focusing on opportunities, challenges, limitations, and future research directions. We first review the fundamentals of SC-enabled IoUT systems, including semantic representations, layered architectures, semantic channel modeling, and task-oriented evaluation metrics. We then examine learning-driven approaches based on machine learning (ML), knowledge graphs (KGs), vision-language models (VLMs), generative models, and federated learning (FL), with emphasis on their feasibility under underwater edge constraints. Representative applications, including environmental monitoring, marine ecology, subsea infrastructure inspection, disaster response, and autonomous underwater vehicle (AUV) coordination, are analyzed from an SC perspective. Finally, we identify key research directions involving standardized semantic models, reproducible testbeds, compute-communication trade-offs, trustworthy reconstruction, hybrid underwater links, energy-aware edge intelligence, semantic security, digital twins (DTs), and cross-domain interoperability. This survey provides a structured foundation for developing reliable, efficient, and meaning-driven IoUT communication systems.

2601.10869 2026-06-12 eess.SY 版本更新

Disturbance Attenuation Regulator II: Stage Bound Finite Horizon Solution

扰动衰减调节器 II:阶段有界有限时域解

Davide Mannini, James B. Rawlings

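AI summary: For linear systems subject to stage-bound disturbances, combines game theory and dynamic programming to derive a recursive solution for the optimal state feedback policy; the policy is nonlinear, requires solving a convex optimization at each stage, and admits a steady-state LMI formulation.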

详情
AI中文摘要

本文针对离散时间阶段有界扰动衰减调节器(StDAR)的状态反馈控制,提出了一种广义的有限时域递归解。该问题考虑受阶段有界扰动的线性动力系统,即扰动序列在每个时间步独立地受到阶段平方二范数界的约束。术语“广义”表示结果适用于任意初始状态。通过结合博弈论和动态规划,本文推导出最优状态反馈策略的递归解。最优策略在状态上是非线性的,并且需要在每个阶段求解一个可处理的凸优化问题以得到拉格朗日乘子向量;然后控制是显式的。对于具有恒定阶段界的系统,该问题允许一个稳态优化,表示为可处理的线性矩阵不等式(LMI),其经验计算成本约为$n$的三次方。数值例子说明了该解的性质。本文为任意初始状态的StDAR提供了完整的反馈解。配套论文讨论了信号界扰动衰减调节器(SiDAR):第一部分A中的有限时域解和第一部分B中的收敛性质。

英文摘要

This paper develops a generalized finite-horizon recursive solution to the discrete-time stage bound disturbance attenuation regulator (StDAR) for state feedback control. This problem addresses linear dynamical systems subject to stage bound disturbances, i.e., disturbance sequences constrained independently at each time step through stagewise squared two-norm bounds. The term generalized indicates that the results accommodate arbitrary initial states. By combining game theory and dynamic programming, this work derives a recursive solution for the optimal state feedback policy. The optimal policy is nonlinear in the state and requires solving a tractable convex optimization for the Lagrange multiplier vector at each stage; the control is then explicit. For systems with constant stage bound, the problem admits a steady-state optimization expressed as a tractable linear matrix inequality (LMI) whose empirical computational cost is approximately cubic in $n$. Numerical examples illustrate the properties of the solution. This work provides a complete feedback solution to the StDAR for arbitrary initial states. Companion papers address the signal bound disturbance attenuation regulator (SiDAR): the finite-horizon solution in Part I-A and convergence properties in Part I-B.
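The distinction between a stage bound and a signal bound can be written out as follows. The notation here is assumed for illustration (system matrices $A$, $B$, $G$, horizon $N$, bounds $\alpha_k$ and $\alpha$); the abstract only states that the stage bound constrains the squared two-norm of the disturbance independently at each time step, and the signal-bound form shown is the natural counterpart rather than a quotation from the companion papers.

```latex
% Linear dynamics with control u_k and disturbance w_k (assumed notation)
x_{k+1} = A x_k + B u_k + G w_k, \qquad k = 0, 1, \dots, N-1

% Stage bound: each disturbance sample is bounded on its own
\| w_k \|_2^2 \le \alpha_k \quad \text{for every } k = 0, 1, \dots, N-1

% Signal bound (assumed form): one budget shared across the whole horizon
\sum_{k=0}^{N-1} \| w_k \|_2^2 \le \alpha
```

Under the stage bound the adversary cannot save up disturbance energy for a single large step. This is consistent with one multiplier per stage, that is, the Lagrange multiplier vector mentioned in the abstract.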

2508.12681 2026-06-12 cs.RO cs.LG eess.SY 版本更新

Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory

基于Cosserat杆理论物理信息神经网络的软体连续机器人自适应模型预测控制

Johann Licher, Max Bartholdt, Henrik Krauss, Tim-Lukas Habich, Thomas Seel, Moritz Schappler

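AI summary: Proposes a real-time nonlinear model-predictive control framework based on a domain-decoupled physics-informed neural network (DD-PINN), achieving accurate dynamic control of a soft continuum robot with position errors below 3 mm.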

详情
Comments
Submitted to IEEE Transactions on Robotics, 20 pages, 14 figures
AI中文摘要

软体连续机器人(SCR)的动态控制对其应用扩展具有巨大潜力,但由于精确动态模型的高计算需求,仍然是一个具有挑战性的问题。虽然已经提出了如Koopman算子方法等数据驱动方法,但它们通常缺乏自适应性,且无法重建完整的机器人形状,限制了其适用性。本文介绍了一种基于具有自适应弯曲刚度的域解耦物理信息神经网络(DD-PINN)的实时非线性模型预测控制(MPC)框架。DD-PINN作为动态Cosserat杆模型的替代模型,加速比高达44,000倍。它还被用于无迹卡尔曼滤波器中,从末端执行器位置测量中估计模型状态和弯曲柔度。我们在GPU上实现了一个以70 Hz运行的非线性进化MPC。在仿真中,它展示了动态轨迹的精确跟踪和设定点控制,末端执行器位置误差低于3 mm(执行器长度的2.3%)。在实际实验中,控制器实现了类似的精度和高达3.55 m/s²的加速度。

英文摘要

Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of up to 44,000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s².

2508.19478 2026-06-12 physics.med-ph eess.IV 版本更新

Bayesian Insights into Exchange and Restriction in Gray Matter Diffusion MRI

灰质弥散MRI中交换与限制的贝叶斯洞察

Maëliss Jallais, Quentin Uhl, Tommaso Pavan, Malwina Molendowska, Derek K. Jones, Ileana Jelescu, Marco Palombo

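AI summary: Uses the Bayesian inference framework μGUIDE to assess the accuracy, precision, and degeneracy of parameter estimates for two gray matter models, NEXI and SANDIX; finds that parameters such as exchange time and soma radius show high uncertainty and bias under realistic noise conditions, underscoring the importance of uncertainty quantification for reproducibility.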

详情
AI中文摘要

弥散MRI(dMRI)中的生物物理模型有望表征灰质组织微观结构。然而,其参数估计的可靠性仍未得到充分研究,尤其是在包含水交换的模型中。本研究利用既定采集协议,在模拟和体内数据上评估了最近提出的两种灰质模型NEXI和SANDIX的准确性、精度和简并性。我们采用基于深度学习的贝叶斯推断框架μGUIDE来量化参数不确定性并检测简并性,从而实现对模型拟合的更可解释评估。结果表明,虽然某些微结构参数(如细胞外扩散率和神经突信号分数)被稳健估计,但其他参数(包括交换时间和胞体半径)通常与高不确定性和估计偏差相关,特别是在现实噪声条件和简化采集协议下。与非线性最小二乘拟合的比较突出了不确定性感知方法的关键优势:能够标记并过滤不可靠估计。总之,这些发现强调了在解释基于模型的估计时需要报告不确定性并考虑模型简并性。我们的研究倡导将概率拟合方法整合到成像流程中,以提高可重复性和生物学可解释性。

英文摘要

Biophysical models in diffusion MRI (dMRI) hold promise for characterizing gray matter tissue microstructure. Yet, the reliability of their parameter estimates remains largely under-studied, especially in models that incorporate water exchange. In this study, we investigate the accuracy, precision, and presence of degeneracy of two recently proposed gray matter models, NEXI and SANDIX, using established acquisition protocols, on both simulated and in vivo data. We employ μGUIDE, a Bayesian inference framework based on deep learning, to quantify parameter uncertainty and detect degeneracies, enabling a more interpretable assessment of model fits. Our results show that while some microstructural parameters, such as extra-cellular diffusivity and neurite signal fraction, are robustly estimated, others, including exchange time and soma radius, are often associated with high uncertainty and estimation bias, particularly under realistic noise conditions and reduced acquisition protocols. Comparison with non-linear least squares fitting highlights the critical advantage of uncertainty-aware methods: the ability to flag and filter out unreliable estimates. Together, these findings emphasize the need to report uncertainty and account for model degeneracies when interpreting model-based estimates. Our study advocates for the integration of probabilistic fitting approaches into imaging pipelines to improve reproducibility and biological interpretability.

2503.10919 2026-06-12 cs.RO eess.SY nlin.PS 版本更新

Data-Driven Soft Robot Control via Adiabatic Spectral Submanifolds

基于绝热谱子流形的数据驱动软体机器人控制

Roshan S. Kaundinya, John Irvin Alora, Jonas G. Matt, Luis A. Pabon, Marco Pavone, George Haller

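AI summary: To address the difficulty of controlling soft robots in nonlinear regimes, proposes a model-predictive control strategy based on adiabatic spectral submanifolds (aSSMs), constructing low-dimensional attracting manifolds from data and improving tracking performance by a factor of up to 10.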

详情
Comments
41 pages, 24 figures, IJRR (2026) in press
AI中文摘要

软体机器人的机械复杂性给基于模型的控制带来了重大挑战。具体而言,线性数据驱动模型难以在探索具有显著非线性行为的复杂空间扩展路径上控制软体机器人。为了解释这些非线性,我们基于最新的绝热谱子流形(aSSM)理论开发了一种模型预测控制策略。该理论适用是因为重度阻尼机器人的内部振动衰减速度远快于机器人沿预定路径的期望速度。在这种情况下,低维吸引不变流形(aSSM)从路径发出并承载机器人的主导动力学。借助这一最新理论,我们仅从数据出发设计了一种基于aSSM的模型预测控制方案。我们展示了数据驱动模型在跨不同任务跟踪动态轨迹方面的有效性。我们在软体躯干机器人和基于Cosserat杆的弹性软臂的高保真、高维有限元模型上进行了验证,额外实验确认了即使在存在实验噪声的情况下也具有鲁棒性能。值得注意的是,我们发现五维或六维aSSM简化模型在所有闭环控制任务中的跟踪性能比其他数据驱动建模方法高出最多10倍。

英文摘要

The mechanical complexity of soft robots creates significant challenges for their model-based control. Specifically, linear data-driven models have struggled to control soft robots on complex, spatially extended paths that explore regions with significant nonlinear behavior. To account for these nonlinearities, we develop here a model-predictive control strategy based on the recent theory of adiabatic spectral submanifolds (aSSMs). This theory is applicable because the internal vibrations of heavily overdamped robots decay at a speed that is much faster than the desired speed of the robot along its intended path. In that case, low-dimensional attracting invariant manifolds (aSSMs) emanate from the path and carry the dominant dynamics of the robot. Aided by this recent theory, we devise an aSSM-based model-predictive control scheme purely from data. We demonstrate the effectiveness of our data-driven model in tracking dynamic trajectories across diverse tasks. We validate on high-fidelity, high-dimensional finite-element models of a soft trunk robot and Cosserat-rod-based elastic soft arms, with additional experiments confirming robust performance even in the presence of experimental noise. Notably, we find that five- or six-dimensional aSSM-reduced models outperform other data-driven modeling methods in tracking performance by a factor of up to 10 across all closed-loop control tasks.

2509.21398 2026-06-12 cs.CV eess.IV 版本更新

Skeleton Sparsification and Densification Scale-Spaces

骨架稀疏化和致密化尺度空间

Julia Gierke, Pascal Peter

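AI summary: Proposes skeletonisation scale-spaces that sparsify the medial axis for hierarchical shape simplification, and introduces densification for the inverse coarse-to-fine progression, with applications to robust skeletonisation, shape compression, and stiffness enhancement for additive manufacturing.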

详情
AI中文摘要

Hamilton-Jacobi骨架,也称为中轴,是一种强大的形状描述符,它根据最大内切圆的中心来表示二值对象。尽管应用广泛,但中轴对噪声敏感:微小的边界变化可能导致骨架不成比例地扩大和产生不必要的分支。经典的剪枝方法通过系统地移除多余的骨架分支来缓解这一缺陷。这种骨架的顺序简化类似于稀疏化尺度空间的原理,该空间将图像嵌入到从越来越稀疏的像素表示重建的族中。我们通过引入骨架化尺度空间将两者结合起来:它们利用中轴的稀疏化来实现形状的层次简化。与传统的剪枝不同,我们的框架固有地满足关键的尺度空间特性,如层次结构、可控简化和对几何变换的等变性。我们在连续和离散公式中提供了严格的理论基础,并通过致密化进一步扩展了这一概念。通过逐步增长骨架而不是收缩它,我们允许从粗到细尺度的逆过程。致密化尺度空间甚至可以超越原始骨架,产生与实际问题相关的过完备形状表示。通过概念验证实验,我们展示了我们的框架在实际任务中的有效性,包括鲁棒骨架化、形状压缩和增材制造的刚度增强。

英文摘要

The Hamilton-Jacobi skeleton, also known as the medial axis, is a powerful shape descriptor that represents binary objects in terms of the centres of maximal inscribed discs. Despite its broad applicability, the medial axis suffers from sensitivity to noise: Minor boundary variations can lead to disproportionately large and undesirable expansions of the skeleton. Classical pruning methods mitigate this shortcoming by systematically removing extraneous skeletal branches. This sequential simplification of skeletons resembles the principle of sparsification scale-spaces that embed images into a family of reconstructions from increasingly sparse pixel representations. We combine both worlds by introducing skeletonisation scale-spaces: They leverage sparsification of the medial axis to achieve hierarchical simplification of shapes. Unlike conventional pruning, our framework inherently satisfies key scale-space properties such as hierarchical architecture, controllable simplification, and equivariance to geometric transformations. We provide a rigorous theoretical foundation in both continuous and discrete formulations and extend the concept further with densification. By growing the skeleton successively instead of shrinking it, we allow inverse progression from coarse to fine scales. Densification scale-spaces can even reach beyond the original skeleton to produce overcomplete shape representations with relevancy for practical applications. Through proof-of-concept experiments, we demonstrate the effectiveness of our framework for practical tasks including robust skeletonisation, shape compression, and stiffness enhancement for additive manufacturing.

2509.19526 2026-06-12 cs.LG eess.SY 版本更新

Metriplectic Conditional Flow Matching for Dissipative Dynamics

度量辛条件流匹配用于耗散动力学

Ali Baheri, Lars Lindemann

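AI summary: Proposes metriplectic conditional flow matching (MCFM), which learns dissipative dynamics by building the conservative-dissipative split into both the vector field and a structure-preserving sampler, guaranteeing monotonic energy decay and stable long-horizon rollouts.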

详情
AI中文摘要

度量辛条件流匹配(MCFM)在不违反第一原理的情况下学习耗散动力学。神经替代模型常常注入能量并破坏长期推演的稳定性;MCFM 则将保守-耗散分解同时融入向量场和结构保持采样器。MCFM 通过短时间过渡上的条件流匹配进行训练,避免了长时间推演伴随的梯度计算。在推理时,Strang-prox 方案交替进行辛更新和近端度量步骤,确保离散能量衰减;当有可信能量可用时,可选投影强制严格衰减。我们提供了连续和离散时间保证,将该参数化和采样器与守恒、单调耗散和稳定推演联系起来。在一个受控机械基准上,MCFM 产生的相图更接近真实情况,并且与同等表达能力的无约束神经流相比,能量增加和正能量率事件显著减少,同时匹配终端分布拟合。

英文摘要

Metriplectic conditional flow matching (MCFM) learns dissipative dynamics without violating first principles. Neural surrogates often inject energy and destabilize long-horizon rollouts; MCFM instead builds the conservative-dissipative split into both the vector field and a structure-preserving sampler. MCFM trains via conditional flow matching on short transitions, avoiding long rollout adjoints. At inference, a Strang-prox scheme alternates a symplectic update with a proximal metric step, ensuring discrete energy decay; an optional projection enforces strict decay when a trusted energy is available. We provide continuous- and discrete-time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts. On a controlled mechanical benchmark, MCFM yields phase portraits closer to ground truth and markedly fewer energy-increase and positive-energy-rate events than an equally expressive unconstrained neural flow, while matching terminal distributional fit.
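The idea of alternating a symplectic update with a proximal dissipative step can be illustrated on a hand-written damped harmonic oscillator. The sketch below is a toy under stated assumptions (unit mass and stiffness, linear damping `gamma`, no learned vector field) and is not the MCFM sampler itself: the conservative part is advanced with the exact rotation of $H = (q^2 + p^2)/2$, and the dissipative part with an implicit-Euler step, which is the proximal map of the quadratic damping potential.

```python
import math


def energy(q, p):
    """Energy of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def strang_prox_step(q, p, h, gamma):
    """One Strang-split step: half damping, full rotation, half damping."""
    # Proximal (implicit Euler) step for dp/dt = -gamma * p over h/2;
    # it only ever shrinks |p|, so it cannot add energy.
    p = p / (1.0 + 0.5 * h * gamma)
    # Exact symplectic flow of the conservative part over h (a rotation).
    c, s = math.cos(h), math.sin(h)
    q, p = c * q + s * p, -s * q + c * p
    # Second half damping step.
    p = p / (1.0 + 0.5 * h * gamma)
    return q, p


def rollout(q, p, h=0.1, gamma=0.5, steps=200):
    """Return the energy trace of a rollout starting at (q, p)."""
    energies = [energy(q, p)]
    for _ in range(steps):
        q, p = strang_prox_step(q, p, h, gamma)
        energies.append(energy(q, p))
    return energies
```

Because the rotation conserves the energy and each proximal step can only reduce it, the energy trace is non-increasing up to floating-point rounding, whatever the step size `h`.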

2509.09299 2026-06-12 eess.SY 版本更新

Towards Efficient and Secure Cloud-Assisted Autonomous Systems: A Review of Architectures, Algorithms, Security, and Deployment Challenges

迈向高效安全的云辅助自主系统:架构、算法、安全与部署挑战综述

Yasir Ali, Tayyab Manzoor, Huan Yang, Asif Ali, Yuanqing Xia

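AI summary: Reviews progress in cloud control systems (CCSs) published between 2012 and 2025, covering industrial automation, security and privacy, and cloud-based control techniques; classifies and analyzes existing work and discusses directions toward more efficient and practical CCSs.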

详情
Comments
61 pages, 11 Figures
AI中文摘要

网络控制系统(NCSs)在实时虚拟控制和管理背景下,对于实现完全连接和响应的智能环境起到了关键作用。然而,传统NCSs在处理大规模控制应用产生的海量数据方面面临巨大挑战,特别是在数据采集、存储和计算处理方面。为了应对这些挑战,云计算的出现和控制理论的进步催生了被称为云控制系统(CCSs)的新范式。近年来,CCSs因其在大规模数据管理、复杂计算和数据中心优化决策等方面的潜在特性而受到工业界的广泛关注。本研究对2012年至2025年间发表的多个研究中的CCSs最新进展进行了广泛综述。具体而言,重点提供了CCS研究当前发现的分类,涵盖了不同视角,如其工业自动化中的高效实现、安全与隐私考虑以及基于云的控制技术。通过选定的最新分析,对每种类别进行了深入探讨,对比了不同方法和对比方法论。此外,我们讨论了旨在设计更高效实用CCSs的未来方向。本研究的见解可帮助研究人员、实践者和决策者在其领域内进行有效的CCS设计和部署。

英文摘要

Networked Control Systems (NCSs) have been instrumental in realizing fully connected and responsive intelligent environments within the context of real-time virtual control and management. However, traditional NCSs face considerable challenges in handling the vast amounts of data generated by large-scale control applications, particularly in terms of data acquisition, storage, and computational processing. To address these challenges, the emergence of cloud computing and advancements in control theory have given rise to a new paradigm known as Cloud Control Systems (CCSs). Recently, CCSs have received substantial attention from industry for capabilities such as large-scale data management, complex computations, and data-centric optimized decisions. This study presents an extensive review of recent progress in CCSs, spanning multiple studies published between 2012 and 2025. Specifically, the focus is on providing a taxonomy of the current findings in CCS research, encompassing various perspectives, such as efficient implementations in industrial automation, security and privacy considerations, and cloud-based control techniques. Each category is examined in depth through selected state-of-the-art analyses of different approaches and contrasting methodologies. Furthermore, we discuss future directions aimed at designing more efficient and practical CCSs. The insights gained from this study can help researchers, practitioners, and decision-makers design and deploy CCSs effectively in their domains.

2509.01630 2026-06-12 cs.LG cs.MA cs.RO eess.SY 版本更新

DiffCoord: Differentiable Coordination for Distributed Multi-Agent Trajectory Optimization

DiffCoord: 分布式多智能体轨迹优化的可微协调

Bingheng Wang, Yichao Gao, Tianchen Sun, Shanker Ajay, Lin Zhao

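AI summary: Proposes the DiffCoord framework, which jointly meta-learns the coupled parameters of a truncated ADMM-DDP pipeline end-to-end, using agent-wise neural networks for task adaptation and scaling to varying agent counts. Validated on a cooperative aerial transport system, it reduces per-agent gradient computation time by up to 70% compared with existing methods.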

详情
AI中文摘要

将交替方向乘子法(ADMM)与微分动态规划(DDP)相结合,为分布式多智能体轨迹优化提供了一个可扩展的框架。在实践中,ADMM通常被截断以提高计算效率,这紧密耦合了原本分别控制协调质量和任务性能的参数。在本文中,我们提出了可微协调(DiffCoord),一个统一框架,联合元学习截断ADMM-DDP管道的这些耦合参数。这些参数由智能体神经网络生成以实现任务自适应,并且同构智能体之间共享相同的网络,从而能够扩展到不同数量的智能体。我们通过端到端微分ADMM-DDP管道实现了高效的元学习。值得注意的是,这产生了一个辅助的ADMM-LQR分布式梯度求解器,用于计算和协调关于这些参数的元梯度。该求解器继承了管道的计算结构,使得关键计算结果可以重用,并能够在智能体和轨迹时间线上高效并行化。我们通过协作空中运输系统的数值和物理实验验证了DiffCoord,该系统在狭窄空间中重新配置四旋翼编队以实现安全的六自由度负载操作。它能够鲁棒地适应变化的团队规模和负载动力学,同时与最先进的轨迹梯度方法相比,将每智能体梯度计算时间减少高达70%。

英文摘要

Integrating the Alternating Direction Method of Multipliers (ADMM) with Differential Dynamic Programming (DDP) provides a scalable framework for distributed multi-agent trajectory optimization. In practice, ADMM is typically truncated for computational efficiency, tightly coupling parameters that would otherwise separately govern coordination quality and task performance. In this paper, we propose Differentiable Coordination (DiffCoord), a unified framework that jointly meta-learns these coupled parameters for the truncated ADMM-DDP pipeline. These parameters are generated by agent-wise neural networks for task adaptation, and the same networks are shared among isomorphic agents to enable scalability to varying agent counts. We achieve efficient meta-learning by differentiating the ADMM-DDP pipeline end-to-end. Notably, this yields an auxiliary ADMM-LQR distributed gradient solver that computes and coordinates meta-gradients with respect to these parameters. This solver inherits the computational structure of the pipeline, enabling reuse of key computation results and efficient parallelization over agents and along trajectory horizons. We validate DiffCoord through numerical and physical experiments on a cooperative aerial transport system, where it reconfigures quadrotor formations for safe 6-DoF load manipulation in tight spaces. It adapts robustly to varying team sizes and load dynamics, while reducing per-agent gradient computation time by up to 70% compared with state-of-the-art trajectory-gradient methods.
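To see why truncating ADMM makes the outcome sensitive to its parameters, consider a minimal consensus problem in which each agent has a scalar quadratic cost and all agents must agree on one value. This is a toy sketch only: the scalar costs, the penalty `rho`, and the function name are assumptions, and there is no DDP or meta-learning here.

```python
def consensus_admm(targets, rho=1.0, iters=3):
    """Scaled-form consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z.

    Returns the shared variable z after `iters` iterations.
    """
    n = len(targets)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per agent
    for _ in range(iters):
        # Local update: each agent minimises its cost plus the penalty term.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Coordination update: average of local estimates plus duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: accumulate each agent's disagreement with z.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z
```

Run to convergence, any positive `rho` gives the same answer (the mean of the targets). With only three iterations, `rho=1.0` and `rho=0.1` land at noticeably different distances from that mean, so the penalty ends up affecting solution quality as well as agreement. That coupling is the kind of effect that motivates tuning such parameters jointly.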

2509.04682 2026-06-12 cs.SD cs.AI cs.CV cs.IR cs.LG eess.AS 版本更新

GetNetUPAM: Ecologically Informed Nested Cross-Validation and Noise-Robust Attention for Marine Bioacoustic Monitoring

GetNetUPAM:生态信息嵌套交叉验证与噪声鲁棒注意力用于海洋生物声学监测

Nicholas R. Rasmussen, Rodrigue Rizk, Longwei Wang, KC Santosh

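AI summary: Proposes the GetNetUPAM framework, which preserves ecological heterogeneity through hierarchical nested cross-validation, together with ARPA-N, a network integrating CBAM spatial attention, to achieve robust generalization under high-noise, low-SNR conditions; in a zero-training region it reduces false positives per hour by roughly 10x.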

详情
Comments
Resubmitted and under review as an anonymous submission to IEEETAI - We are allowed an archive submission. Final formatting is yet to be determined
AI中文摘要

部署可靠的生物声学监测系统需要能够在高噪声、低信噪比条件下泛化的模型,以及能够暴露部署相关故障模式的评估协议,这些在当前UPAM实践中基本未得到解决。内在噪声、可变传播以及混合的生物和人为源会导致分布偏移,而传统模型和单次划分评估会掩盖这些偏移,夸大性能并掩盖不稳定性。我们提出GetNetUPAM,一种分层嵌套交叉验证框架,它利用嵌套阶段来量化模型稳定性,而不是调整以获取夸大的保留分数。通过将数据划分为站点-年份块,GetNetUPAM保留了生态异质性,并迫使每个外层折代表不同的环境条件,防止过拟合局部噪声或传感器伪影。内层分层折衡量整个UPAM信号分布上的泛化能力,强制模型开发与外层保留部署条件严格分离。使用GetNetUPAM,我们评估了自适应分辨率池化和注意力网络(ARPA-N),一种用于不规则频谱图维度的CNN架构。ARPA-N将CBAM空间注意力集成为学习型噪声抑制器,生成注意力图以定位真实叫声结构,并避免标准CNN在长窗口数据上利用的全局非生物线索。在GetNetUPAM下,ARPA-N在不同环境条件下鲁棒泛化。在零训练的Balleny Islands区域,它在固定90%召回率下将每小时误报率降低超过一个数量级(约10倍),并在各折上持续改进指标。这些进展提供了可重复的基准,推动UPAM向可扩展、部署可靠的生态监测发展。

英文摘要

Deploying reliable bioacoustic monitoring systems requires models that generalize under high-noise, low-SNR conditions and evaluation protocols that expose deployment-relevant failure modes, gaps largely unaddressed in current UPAM practice. Intrinsic noise, variable propagation, and mixed biological and anthropogenic sources induce distribution shifts that conventional models and single-split evaluations obscure, inflating performance and masking instability. We introduce GetNetUPAM, a hierarchical nested cross-validation framework that uses the nested stage to quantify model stability rather than tune for inflated hold-out scores. By partitioning data into site-year blocks, GetNetUPAM preserves ecological heterogeneity and forces each outer fold to represent a distinct environmental regime, preventing overfitting to localized noise or sensor artifacts. Inner stratified folds measure generalization across the full UPAM signal distribution, enforcing strict separation between model development and the outer held-out deployment condition. Using GetNetUPAM, we evaluate the Adaptive Resolution Pooling and Attention Network (ARPA-N), a CNN architecture for irregular spectrogram dimensions. ARPA-N integrates CBAM spatial attention as a learned noise suppressor, producing attention maps that localize true call structure and avoid the global, non-biological cues exploited by standard CNNs on long-window data. Under GetNetUPAM, ARPA-N generalizes robustly across diverse environmental regimes. In the zero-training support Balleny Islands region, it reduces false positives per hour by over an order of magnitude (approximately 10x) at fixed 90 percent recall, yielding consistently improved metrics across folds. These advances provide a reproducible benchmark and move UPAM toward scalable, deployment-reliable ecological monitoring.
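The splitting scheme described above can be sketched in a few lines. This is an illustrative reading, not the GetNetUPAM code: it assumes one outer fold per site-year block and a simple round-robin stratification for the inner folds, and the function names and the `(site, year, label)` sample layout are hypothetical.

```python
from collections import defaultdict


def site_year_outer_folds(samples):
    """Yield (block, train_idx, test_idx), holding out one site-year block at a time.

    samples is a list of (site, year, label) tuples.
    """
    blocks = defaultdict(list)
    for idx, (site, year, _label) in enumerate(samples):
        blocks[(site, year)].append(idx)
    ordered = sorted(blocks.items())
    for block, test_idx in ordered:
        train_idx = [i for b, idxs in ordered if b != block for i in idxs]
        yield block, train_idx, test_idx


def stratified_inner_folds(indices, labels, k=2):
    """Split indices into k folds, dealing each class out round-robin."""
    by_label = defaultdict(list)
    for i in indices:
        by_label[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        for j, i in enumerate(by_label[label]):
            folds[j % k].append(i)
    return folds
```

Model selection would then use only the inner folds built from `train_idx`, so the held-out site-year block never influences development.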

2508.17149 2026-06-12 eess.SY eess.SP 版本更新

Enhancing Energy and Spectral Efficiency in IoT-Cellular Networks via Active SIM-Equipped LEO Satellites

通过有源SIM增强的LEO卫星提升IoT蜂窝网络的能效和频谱效率

Rahman Saadat Yeganeh, Hamid Behroozi, Mohammad Javad Omidi, Mohammad Robat Mili, Eduard A. Jorswieck, Symeon Chatzinotas

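AI summary: Studies an active stacked intelligent metasurface (ASIM) mounted on the back of satellite solar panels, combined with rate-splitting multiple access (RSMA) and a symbiotic radio network; multi-layer sequential processing improves channel gains and suppresses interference, and three optimization methods (BCD-SCA, MA-CSAC, and MCPPO) are evaluated for energy and spectral efficiency.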

详情
AI中文摘要

本文研究了一种由有源堆叠智能超表面(ASIM)增强的低地球轨道(LEO)卫星通信系统,该ASIM安装在卫星太阳能板的背板上,以有效利用有限的机载空间并降低主卫星功率放大器的需求。该系统通过速率分割多址(RSMA)服务多个地面用户,并通过共生无线电网络服务物联网设备。ASIM中的多层顺序处理提高了有效信道增益并抑制了用户间干扰,性能优于有源RIS和超越对角线RIS设计。评估了三种优化方法:块坐标下降与逐次凸逼近(BCD-SCA)、模型辅助多智能体约束软演员-评论家(MA-CSAC)以及多约束近端策略优化(MCPPO)。仿真结果表明,BCD-SCA在没有学习的情况下在凸场景中快速稳定收敛,MCPPO实现了快速初始收敛且稳定性适中,而MA-CSAC在大规模网络中获得了最高的长期频谱和能量效率。分析了不同ASIM元件、卫星天线和发射功率下的能量-频谱效率权衡。总体而言,研究表明,将多层ASIM与合适的优化算法集成,为下一代LEO卫星通信提供了一种可扩展、高能效且高性能的解决方案。

英文摘要

This paper investigates a low Earth orbit (LEO) satellite communication system enhanced by an active stacked intelligent metasurface (ASIM), mounted on the backplate of the satellite solar panels to efficiently utilize limited onboard space and reduce the main satellite power amplifier requirements. The system serves multiple ground users via rate-splitting multiple access (RSMA) and IoT devices through a symbiotic radio network. Multi-layer sequential processing in the ASIM improves effective channel gains and suppresses inter-user interference, outperforming active RIS and beyond-diagonal RIS designs. Three optimization approaches are evaluated: block coordinate descent with successive convex approximation (BCD-SCA), model-assisted multi-agent constraint soft actor-critic (MA-CSAC), and multi-constraint proximal policy optimization (MCPPO). Simulation results show that BCD-SCA converges fast and stably in convex scenarios without learning, MCPPO achieves rapid initial convergence with moderate stability, and MA-CSAC attains the highest long-term spectral and energy efficiency in large-scale networks. Energy-spectral efficiency trade-offs are analyzed for different ASIM elements, satellite antennas, and transmit power. Overall, the study demonstrates that integrating multi-layer ASIM with suitable optimization algorithms offers a scalable, energy-efficient, and high-performance solution for next-generation LEO satellite communications.

2507.07879 2026-06-12 cs.SD eess.AS 版本更新

LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification

LISTEN:面向边缘通知的轻量级工业声音可表示Transformer

Changheon Han, Yun Seok Kang, Yuseop Sim, Hyung Wook Park, Martin Byung-Guk Jun

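AI summary: Proposes LISTEN, a lightweight industrial sound foundation model distilled from the large model IMPACT; adapted with only a small amount of target data, it enables real-time machine monitoring on edge devices with performance close to the large model.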

详情
AI中文摘要

基于深度学习的机器听觉正在拓宽工业声学分析的范围,但其在实时车间中的广泛实施受到对每个新任务依赖大型、任务特定标注数据集的阻碍。虽然新兴的通用声音基础模型旨在减轻数据依赖性,但它们在实践中暴露出关键困境。通用声音基础模型计算成本高,并且在以音调谐波、宽带噪声和瞬态故障事件为特征的工业场景中失败,使得即时、现场部署不切实际。这些挑战共同意味着,在实时车间部署声音基础模型的实用端到端系统仍然难以实现。为了解决这一挑战,本研究引入了LISTEN(面向边缘通知的轻量级工业声音可表示Transformer),这是第一个专门针对工业声音的轻量级基础模型。通过从大规模教师模型IMPACT(基于声学认知Transformer的工业机器感知)进行知识蒸馏,我们构建了针对资源受限边缘环境优化的LISTEN。通过冻结骨干网络并仅对最小目标过程数据训练浅层头部,而不是进行完全微调或重新训练,LISTEN在多种制造过程中实现了与IMPACT几乎相同的性能。本研究进一步展示了一个完整的实时机器监控系统,包括使用工业物联网(IIoT)设备进行数据采集、使用最小标注数据进行快速模型适应,以及在低成本边缘设备上进行实时监控。通过在实时CNC机器上验证整个系统,这项工作建立了在活跃工业环境中部署轻量级工业声音基础模型的第一个可行的端到端系统。

英文摘要

Deep learning-based machine listening is broadening the scope of industrial acoustic analysis, yet its widespread implementation on live shop floors is hindered by the reliance on large, task-specific annotated datasets for every new task. While emerging general-purpose sound foundation models aim to alleviate data dependency, they reveal critical dilemmas in practice. General-purpose sound foundation models are computationally expensive and fail in industrial scenarios characterized by tonal harmonics, broadband noise, and transient fault events, making instant, on-site deployment impractical. These challenges combined mean that a practical, end-to-end system for deploying a sound foundation model on a live shop floor has remained elusive. To address this challenge, this study introduces LISTEN (Lightweight Industrial Sound-representable Transformer for Edge Notification), the first lightweight foundation model specialized for industrial sound. Through Knowledge Distillation (KD) from the large-scale teacher model IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer), we construct LISTEN optimized for resource-constrained edge environments. By freezing the backbone and training only a shallow head on minimal target-process data, rather than performing full fine-tuning or retraining, LISTEN achieves nearly identical performance to IMPACT across diverse manufacturing processes. This study further demonstrates a complete system for real-time machine monitoring, encompassing data acquisition with Industrial Internet of Things (IIoT) devices, rapid model adaptation using minimal annotated data, and real-time monitoring on a low-cost edge device. By validating the entire system on a live CNC machine, this work establishes the first feasible end-to-end system for deploying a lightweight industrial sound foundation model in an active industrial environment.
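For readers unfamiliar with knowledge distillation, the sketch below shows one common form of the objective: a temperature-softened KL divergence between teacher and student output distributions. The abstract does not say which distillation loss LISTEN uses, so this particular loss, the temperature `T`, and the function names are assumptions for illustration.

```python
import math


def softmax(logits, T=1.0):
    """Numerically stable softmax of logits / T."""
    m = max(logits)
    exps = [math.exp((v - m) / T) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, T=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # teacher, treated as the target
    q = softmax(student_logits, T)  # student, being trained
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge; the `T * T` factor is the usual rescaling that keeps gradient magnitudes comparable across temperatures.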

2402.01779 2026-06-12 eess.IV cs.CV cs.LG stat.ML 版本更新

Plug-and-Play image restoration with Stochastic deNOising REgularization

即插即用图像恢复:随机去噪正则化

Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis

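AI summary: Proposes the SNORE framework, which applies the denoiser only to images at the appropriate noise level and combines stochastic regularization with gradient descent to solve inverse problems, performing competitively with state-of-the-art methods on deblurring and inpainting.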

详情
AI中文摘要

即插即用(PnP)算法是一类迭代算法,通过结合物理模型和深度神经网络进行正则化来解决图像逆问题。尽管它们能产生令人印象深刻的图像恢复结果,但这些算法依赖于在迭代过程中噪声逐渐减小的图像上非标准地使用去噪器,这与最近基于扩散模型(DM)的算法形成对比,后者仅在重新加噪的图像上应用去噪器。我们提出了一种新的PnP框架,称为随机去噪正则化(SNORE),该框架仅在具有适当噪声水平的图像上应用去噪器。它基于显式的随机正则化,从而产生一种随机梯度下降算法来解决不适定逆问题。提供了该算法及其退火扩展的收敛性分析。实验上,我们证明SNORE在去模糊和修复任务上与最先进方法相比具有竞争力,无论是在定量还是定性方面。

English abstract

Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Although they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that become less and less noisy over the iterations, in contrast with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only to re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only to images with noise of the adequate level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we show that SNORE is competitive with state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
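
The idea of applying the denoiser only to re-noised iterates inside a stochastic gradient descent loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the update rule, so the sketch uses the standard identity that relates an MMSE denoiser to the score of the noised prior, a toy Gaussian-prior denoiser in place of a deep network, and a simple denoising data term. The names `gaussian_denoiser`, `stochastic_denoising_descent`, `tau`, `alpha`, and `step` are all hypothetical.

```python
import random

def gaussian_denoiser(z, sigma, prior_std=1.0):
    """MMSE denoiser for a zero-mean Gaussian prior (toy stand-in for a deep denoiser)."""
    shrink = prior_std**2 / (prior_std**2 + sigma**2)
    return [shrink * zi for zi in z]

def stochastic_denoising_descent(y, tau=0.5, sigma=0.1, alpha=1.0,
                                 step=0.01, iters=2000, seed=0):
    """Minimize a data term plus a stochastic denoising regularizer.

    Assumed data term: f(x) = ||x - y||^2 / (2 * tau^2).
    Assumed regularization gradient: alpha * (noisy - D(noisy)) / sigma^2,
    evaluated at a freshly re-noised copy of the current iterate.
    """
    rng = random.Random(seed)
    x = list(y)
    for _ in range(iters):
        # Re-noise the iterate so the denoiser always sees noise of level sigma.
        noisy = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        denoised = gaussian_denoiser(noisy, sigma)
        x = [xi - step * ((xi - yi) / tau**2 + alpha * (ni - di) / sigma**2)
             for xi, yi, ni, di in zip(x, y, noisy, denoised)]
    return x

y = [2.0, -1.0, 0.5]
x_hat = stochastic_denoising_descent(y)
print([round(v, 2) for v in x_hat])
```

For this toy Gaussian setting the expected gradient vanishes at x = y / (1 + alpha * tau^2 / (prior_std^2 + sigma^2)), so the iterates should hover near that shrunken copy of `y`, which gives a simple sanity check on the loop.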

2401.08301 2026-06-12 eess.SP cs.LG eess.SY version update

QoS Improvement in Multi User Cellular-Symbiotic Radio Network Assisted by Active-STAR-RIS

QoS Improvement in a Multi-User Cellular Symbiotic Radio Network Based on Active STAR-RIS

Rahman Saadat Yeganeh, Mohammad Javad Omidi, Farshad Zeinali, Mohammad Robat Mili, Mohammad Ghavami

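AI summary: This paper uses active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (ASRIS) to improve 6G cellular network service quality, optimizing beamforming, phase adjustments, and scheduling parameters with deep reinforcement learning to maximize the throughput between symbiotic backscatter devices and symbiotic user equipments.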

Comments
This article will be submitted to the Transactions journal
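AI abstract (translated from Chinese)

In this paper, we employ active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (ASRIS) to enhance the quality of 6G cellular network services. The network integrates commensal symbiotic radio (CSR) subsystems to facilitate communication between passive Internet of Things (IoT) users and active users, referred to as symbiotic backscatter devices (SBDs) and symbiotic user equipments (SUEs), respectively. Because the SBDs are passive, transmitting information to the SUEs poses significant challenges. To overcome this, we use the massive multiple-input multiple-output (MIMO) antennas at the base station (BS) to relay the information transmitted by the SBDs with greater power. The scheme uses non-orthogonal multiple access (NOMA) for multiple access among all users and successive interference cancellation (SIC) to eliminate potential interference. The primary objective is to maximize the throughput between SBDs and SUEs. To this end, we formulate an optimization problem over the active beamforming coefficients at the BS and ASRIS, the phase adjustments of the ASRIS, and the scheduling parameters between the CSR and cellular networks. To solve this problem, we use three deep reinforcement learning (DRL) methods: proximal policy optimization (PPO), twin delayed deep deterministic policy gradient (TD3), and asynchronous advantage actor-critic (A3C). These methods were simulated, and the results show that A3C, TD3, and PPO have the fastest convergence speeds and achieve the highest increases in network throughput, respectively. Finally, the proposed scheme was evaluated with a passive simultaneously transmitting and reflecting RIS (STAR-RIS), which performed worse than ASRIS.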

English abstract

In this article, we employ active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (ASRIS) to enhance the quality of 6G cellular network services. The network integrates commensal symbiotic radio (CSR) subsystems to facilitate communication between passive Internet of Things (IoT) users and active users, referred to as symbiotic backscatter devices (SBDs) and symbiotic user equipments (SUEs), respectively. Since the SBDs are passive, transmitting information to the SUEs poses significant challenges. To overcome this challenge, we harness the capabilities of massive multiple-input multiple-output (MIMO) antennas at the base station (BS) to relay the information transmitted by SBDs with greater power. This scheme uses the non-orthogonal multiple access (NOMA) technique for multiple access among all users, and potential interference is eliminated using successive interference cancellation (SIC). The primary objective is to maximize the throughput between SBDs and SUEs. To achieve this, we formulate an optimization problem involving variables such as active beamforming coefficients at the BS and ASRIS, phase adjustments of ASRIS, and scheduling parameters between CSR and cellular networks. To solve this optimization problem, we used three deep reinforcement learning (DRL) methods: proximal policy optimization (PPO), twin delayed deep deterministic policy gradient (TD3), and asynchronous advantage actor-critic (A3C). These methods were simulated, and the results demonstrate that A3C, TD3, and PPO have the best convergence speeds and achieve the highest increases in network throughput, respectively. Finally, the proposed scheme was evaluated using passive simultaneously transmitting and reflecting RIS (STAR-RIS), which demonstrated poorer performance compared to ASRIS.
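
The roles of NOMA and SIC mentioned above can be made concrete with the textbook two-user downlink case. This is a sketch under simplifying assumptions (single antenna, perfect SIC, fixed power split) and does not model the ASRIS, the backscatter link, or the paper's actual system; the function name `noma_sic_rates` and all numerical values are hypothetical.

```python
import math

def noma_sic_rates(p_strong, p_weak, g_strong, g_weak, noise):
    """Achievable rates (bit/s/Hz) for two-user downlink NOMA with perfect SIC.

    The weak user decodes its own signal, treating the strong user's signal
    as interference. The strong user first decodes and subtracts the weak
    user's signal (SIC), then decodes its own signal free of interference.
    """
    sinr_weak = p_weak * g_weak / (p_strong * g_weak + noise)
    # SIC only works if the strong user can decode the weak user's signal too.
    sinr_weak_at_strong = p_weak * g_strong / (p_strong * g_strong + noise)
    if sinr_weak_at_strong < sinr_weak:
        raise ValueError("strong user cannot decode the weak user's signal")
    sinr_strong = p_strong * g_strong / noise
    return math.log2(1 + sinr_strong), math.log2(1 + sinr_weak)

# Hypothetical values: more power goes to the user with the weaker channel.
rate_strong, rate_weak = noma_sic_rates(
    p_strong=0.2, p_weak=0.8, g_strong=1.0, g_weak=0.1, noise=0.01
)
print(f"strong user: {rate_strong:.3f} bit/s/Hz, weak user: {rate_weak:.3f} bit/s/Hz")
```

In an optimization of the kind the abstract describes, the power split and channel gains would be shaped by the beamforming and phase variables rather than fixed by hand as they are here.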