arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.21439 2026-05-21 eess.SY cs.RO cs.SY

Fully Actuated Manifold Constraint Based Output Feedback Control for Input-Constrained Uncertain Nonlinear Systems

Dianrui Mu, Changchun Hua, Yafeng Li, Jiannan Chen, Rao Wei

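AI summary: This paper proposes a low-complexity, model-free output-feedback controller for unknown time-varying nonlinear systems with unknown input constraints. It achieves preset control accuracy and maintains flexible control accuracy after actuator saturation. The method extends existing constraint control methods for linear manifolds to nonlinear manifold constructions and various constraint types, achieving preset control accuracy within finite or fixed time. Flexible control under unknown saturation is obtained by constructing an error-driven flexible constraint. Second-order and higher-order control examples and simulations are provided.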

Comments
22 pages, 12 figures, 2 tables
Abstract

This paper presents a low-complexity, model-free, output-feedback controller for a class of unknown time-varying nonlinear systems with unknown input constraints. The controller achieves the preset control accuracy when the actuator is not saturated and maintains flexible control accuracy after actuator saturation. This result extends existing constraint control methods for linear manifolds to a more general form, including the construction of nonlinear manifolds and various types of constraints, thereby achieving preset control accuracy within finite or fixed time. Additionally, flexible control under unknown saturation is achieved through the construction of an error-driven flexible constraint. Finally, second-order and higher-order control examples and simulations are provided.

2605.21399 2026-05-21 eess.SY cs.SY

Output Feedback Control of Linear Time-Invariant Systems with Operational Constraints

Marcel Menner, Heather Hussain, Eugene Lavretsky

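AI summary: This paper proposes a systematic method for designing robust linear controllers under operational constraints. Nagumo's Theorem and the Comparison Lemma guarantee constraint satisfaction, and min-norm optimal control principles inspired by Control Barrier Functions yield a continuous piecewise-linear output-feedback controller whose closed loop remains analyzable with linear systems theory. Flight control trade studies show the framework's practical relevance to safety-critical aircraft control.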

Abstract

This paper introduces a systematic method for designing robust linear controllers using output feedback in the presence of operational constraints. The design uses Nagumo's Theorem and the Comparison Lemma to guarantee constraint satisfaction, while incorporating min-norm optimal control principles inspired by Control Barrier Functions. The resulting controller is a continuous piecewise-linear output feedback policy that preserves the closed-loop system's analyzability using linear systems theory. Due to the linear control design, multi-input multi-output (MIMO) robustness margins can be derived with and without active operational constraints. This paper shows that operational constraints on the system's state can be satisfied using an observer-based output feedback control design. Through flight control trade studies, we demonstrate the practical relevance of the framework in safety-critical aircraft control applications.
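
The min-norm idea named in the abstract can be illustrated with a small sketch. It is not the authors' controller: it assumes a single affine input constraint `a . u >= b` and a hypothetical scalar plant `x' = u` with the state limit `x <= x_max`, and it uses full state feedback rather than the observer-based output feedback the paper describes. The names `min_norm_filter`, `safe_input`, `alpha`, `gain`, and `reference` are illustrative assumptions.

```python
def min_norm_filter(u_nom, a, b):
    """Return the input closest to u_nom (Euclidean norm) that satisfies a . u >= b."""
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if slack >= 0.0:
        return list(u_nom)  # nominal input already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("the input does not affect the constraint")
    # Project u_nom onto the half-space boundary a . u = b.
    return [ui - slack * ai / norm_sq for ai, ui in zip(a, u_nom)]


def safe_input(x, x_max=1.0, alpha=2.0, gain=5.0, reference=2.0):
    """Hypothetical scalar example: x' = u with the constraint x <= x_max."""
    u_nom = [gain * (reference - x)]  # nominal linear tracking law
    # With h(x) = x_max - x, enforcing h' + alpha * h >= 0
    # gives the input constraint -u >= -alpha * (x_max - x).
    a = [-1.0]
    b = -alpha * (x_max - x)
    return min_norm_filter(u_nom, a, b)[0]
```

In this toy case the filtered law reduces to the smaller of `gain * (reference - x)` and `alpha * (x_max - x)`, which is continuous and piecewise affine in `x`, the same qualitative shape the abstract attributes to the resulting controller.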

2605.21396 2026-05-21 eess.SY cs.SY

Grid-Aware Peer-to-Peer Energy Trading: A Learning-Augmented Framework

Devangi, Ankit Singhal, Yashasvi Bansal

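AI summary: This paper proposes a learning-augmented interface between peer-to-peer (P2P) energy trading and the distribution system operator (DSO). A supervised transformer-based regression model lets microgrids predict the DSO's response locally, reducing transaction overhead, alleviating the DSO's burden, and preserving information privacy.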

Abstract

Distribution networks are transitioning from passive to active systems due to the growing integration of distributed energy resources (DERs). Peer-to-peer (P2P) energy trading has emerged as a viable framework that enables local energy exchange among participants, represented here as aggregated microgrids (MGs). Incorporating network constraints is essential to ensure that P2P transactions remain physically feasible and consistent with the grid's operating limits. However, existing P2P frameworks still lack advanced predictive mechanisms that allow prosumers to anticipate network feasibility or the distribution system operator (DSO) response during trade formulation. This paper proposes a learning-augmented P2P and DSO interface that predicts the DSO's response to the proposed P2P trades, allowing prosumers to self-assess and refine their trading decisions. A supervised transformer-based regression model is trained to enable MGs to locally predict the DSO's response without sharing their proposed trades, thereby reducing transaction overhead, alleviating DSO burden, and preserving information privacy. The proposed framework is validated on the modified IEEE 33-bus distribution power system with interconnected microgrids. Case studies are presented to validate the effectiveness of the proposed model in terms of market efficiency, trade acceptance, and computational burden.

2605.21350 2026-05-21 eess.SP physics.med-ph

Advancements in Non-Invasive Neuroimaging: Exploring the Potential of Radar Technology for Brain Imaging and Tumour Detection

Keniel Peart, Indu Bodala, Shelly Vishwakarma

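AI summary: This study explores radar technology for non-invasive brain imaging and tumour detection as an alternative to MRI and CT scans. Electromagnetic interactions are simulated in Ansys HFSS to evaluate the penetration, signal strength, and safety of Patch and Vivaldi antennas in brain tissue. The results show that Patch antennas are best suited to tumour localization, while Vivaldi antennas suit broader scanning applications. Although the approach is promising for safer, more accessible imaging, especially in resource-limited settings, further research is needed to advance it in non-invasive medical diagnostics.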

Journal ref
2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1-6, 2024
Comments
7 pages, 7 figures, published at 2024 IEEE EMBC, Orlando, FL, USA
Abstract

This study investigates radar technology for non-invasive brain imaging and tumour detection, offering an alternative to MRI and CT scans. Using Ansys HFSS to simulate electromagnetic interactions in brain tissues, we evaluate the penetration, signal strength, and safety of Patch and Vivaldi antennas. Results show Patch antennas are optimal for tumour localization, while Vivaldi antennas suit broader scanning applications. Although promising for safer, more accessible imaging, especially in resource-limited environments, further research with diverse models and actual patient data is essential to advance this technology in non-invasive medical diagnostics.

2605.21344 2026-05-21 math.OC cs.SY eess.SY

Beyond Nonlinear Small-Gain Design: DADS with Partial-State Feedback

Iasson Karafyllis, Miroslav Krstic

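AI summary: This paper studies partial-state regulation for a scalar ODE interconnected with a possibly infinite-dimensional system described by PDEs. Combining DADS and IOS, it designs controllers that achieve robust regulation in the presence of external inputs without assuming knowledge of any disturbance or parameter bounds.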

Comments
30 pages, 4 figures
Abstract

Eduardo Sontag and coauthors studied Input-to-Output Stability (IOS) and the output asymptotic gain property. These notions changed control theory and recently had an impact on robust adaptive control through the Deadzone-Adapted Disturbance Suppression (DADS) control scheme. Moreover, the notion of IOS was recently extended to systems described by Partial Differential Equations (PDEs). In this work, we celebrate Eduardo Sontag by combining DADS and IOS for PDEs: we study the partial-state regulation problem for a scalar Ordinary Differential Equation (ODE) which is interconnected with a possibly infinite-dimensional system. In such a case, the DADS control scheme can allow an escape from the requirements of the small-gain theorem that is mainly used for partial-state feedback. We show the design procedure of partial-state DADS controllers and we prove robust regulation even in the presence of external inputs (disturbances) without assuming knowledge of any disturbance/parameter bounds. The DADS controller is applied to three different cases of the interconnection of an ODE with an almost completely unknown PDE: (a) a heat PDE, (b) a transport PDE, and (c) a wave PDE with viscous damping. We show that the same DADS controller can achieve robust regulation in all three cases.

2605.21332 2026-05-21 eess.AS

Speech Quality Embeddings for Improved Detection and Classification of Degradations in Speech Signals

Michael Kuhlmann, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

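AI summary: This paper proposes a method that produces frame-level embeddings and uses a contrastive loss to distinguish degradation types, improving the detection and classification of speech degradations.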

Comments
Accepted to 2026 Odyssey workshop
Abstract

Automatic subjective speech quality assessment (SSQA) traditionally estimates speech quality on an utterance or system level. While this resolution was adequate for older transmission or synthesis systems that produced speech signals of mediocre quality, modern systems generate high-quality speech with degradations that may occur only locally. With suitable model architectures and regularization losses, SSQA models trained with utterance-level targets can also yield useful local predictions of speech quality. In this work, we extend such models to produce frame-level embeddings that cluster by degradation type. Specifically, we employ a partial mix-up strategy on a parallel corpus of clean and degraded utterances and apply a contrastive loss to distinguish between degradation types. Through experiments on both in- and out-of-domain data, we demonstrate that our approach improves degradation detection and enables the identification of degradation types by analyzing embedding clusters.

2605.21319 2026-05-21 eess.SP

Optimal Time Window and Frequency Bandwidth Parameter Combination for Subject-Specific Motor Imagery EEG Classification

Matthew A. McCartney, Liisa A. Kivioja, Sonal S. Baberwal, Shirley Coyle

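AI summary: This paper studies how simultaneously optimising the time window and frequency bandwidth affects motor imagery EEG classification. Subject-specific models are trained and tested on different parameter combinations across 109 subjects, and a repeated measures ANOVA reveals significant accuracy differences between bandwidths and between time windows. The combination of (0, 4) s and (4, 12) Hz performs best across all subjects.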

Abstract

Motor-imagery (MI) EEG can be classified using supervised machine learning techniques such as Linear Discriminant Analysis applied to features extracted by Common Spatial Patterns. Performance of these models varies widely, possibly because MI studies commonly use different post-cue time windows and frequency bands. This study aims to assess how the simultaneous optimisation of both these parameters affects MI classification performance. This is done by iteratively training and testing a series of subject-specific models on different combinations of frequency bandwidth and time window options across 109 subjects. This is followed by a statistical analysis using repeated measures ANOVA to uncover significant differences between different bandwidths and time windows in terms of accuracy across the patient cohort. The resulting visualisations and statistical tests show that there are, indeed, significant differences between both specific time windows and specific bandwidths in terms of accuracy. While the comparison of classification accuracies across 23 frequency bandwidths during five different time windows demonstrates an optimal temporal and spectral scale combination of (0, 4) s in the (4, 12) Hz range across all subjects, the subjects demonstrate similar accuracies for other parameter combinations. These findings highlight the efficacy of personalised models in detecting the optimal temporal and spectral parameter combinations to best classify MI EEG signals that inherently vary across subjects.

2605.19961 2026-05-21 math.OC cs.SY eess.SY

Data-driven approximation of regions of attraction via an LP-based selection of PWA Lyapunov functions

Oumayma Khattabi, Matteo Tacchi-Bénard, Martin Gulan, Sorin Olaru

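AI summary: This paper presents a method to approximate regions of attraction of unknown nonlinear dynamical systems from data. Assuming point-wise evaluations of the vector field and known Lipschitz bounds, it constructs a polyhedral uncertainty set. This description enables the synthesis of a continuous piecewise-affine Lyapunov function via a linear program that enforces a robust decrease condition for all admissible vector fields. The approach certifies a region of attraction consistent with the available data, and numerical examples show that certified regions can be extracted from sparse data.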

Abstract

This paper presents a method to approximate regions of attraction of unknown nonlinear dynamical systems from data. Assuming point-wise evaluations of the vector field and known Lipschitz bounds, a polyhedral uncertainty set of admissible dynamics is constructed. This uncertainty description enables the synthesis of a continuous piece-wise affine Lyapunov candidate via a linear program, enforcing a robust decrease condition for all admissible vector fields. The approach allows certification of a region of attraction consistent with the available data. Numerical examples illustrate the effectiveness of the proposed method in extracting certified regions of attraction from sparse data.
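
To make the role of the Lipschitz bound concrete, the sketch below shows how point-wise samples constrain an unknown scalar vector field, and how a decrease condition can be checked against every admissible value. It is a simplified illustration, not the paper's method: it assumes a one-dimensional system and a fixed candidate V(x) = |x| instead of a piecewise-affine candidate selected by a linear program. The function names and sample values are hypothetical.

```python
def lipschitz_bounds(x, samples, lipschitz):
    """Interval of values f(x) can take given samples (x_i, f(x_i)) and a Lipschitz constant."""
    lower = max(f - lipschitz * abs(x - xi) for xi, f in samples)
    upper = min(f + lipschitz * abs(x - xi) for xi, f in samples)
    return lower, upper


def certifies_decrease(x, samples, lipschitz):
    """True if V(x) = |x| decreases at x != 0 for every admissible f(x)."""
    lower, upper = lipschitz_bounds(x, samples, lipschitz)
    # dV/dt = sign(x) * f(x); take the worst case over the admissible interval.
    worst = upper if x > 0 else -lower
    return worst < 0.0


# Two hypothetical samples of x' = f(x), with assumed Lipschitz constant 2.
samples = [(0.5, -0.5), (1.0, -1.0)]
```

With these samples, the decrease is certified at `x = 0.75`, between the data points, but not at `x = 0.1`, where the Lipschitz envelope is too wide. This is the kind of data-dependent limitation a certified region of attraction has to respect.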

2601.06006 2026-05-21 eess.AS cs.SD

Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models

Bang Zeng, Beilong Tang, Wang Xiang, Ming Li

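AI summary: This paper proposes a discriminative-generative two-stage framework that combines the controllability of discriminative extraction with the reconstruction capability of generative models, improving perceptual quality, intelligibility, and speaker consistency in target speaker extraction and speech enhancement.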

Comments
13 pages, 4 figures
Abstract

Target speaker extraction (TSE) aims to recover the speech of a desired speaker from a mixture given a short enrollment utterance, while speech enhancement (SE) focuses on improving speech quality under noisy conditions. Most existing TSE and SE systems are based on discriminative modeling and have shown strong interference suppression ability, but they often remain limited in perceptual quality and naturalness. To address this issue, we first introduce LauraTSE, a generative TSE model built on an autoregressive decoder-only language model. Although generative modeling is promising for quality enhancement, purely generative TSE may suffer from hallucination, content drift, and limited controllability in complex acoustic conditions. We therefore propose a discriminative-generative two-stage framework, where a discriminative front-end first produces target-related representations with strong interference suppression, and a generative back-end then reconstructs high-quality speech in the neural audio codec representation space. This design combines the controllability of discriminative extraction with the reconstruction capability of generative modeling. We further investigate several collaboration strategies for the two-stage framework, including front-end freezing, joint fine-tuning, SI-SDR regularization, and autoregressive/non-autoregressive inference. Experimental results on both TSE and SE benchmarks show that the proposed framework achieves a better balance among perceptual quality, intelligibility, and speaker consistency than purely discriminative or purely generative baselines.
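
For readers unfamiliar with SI-SDR, the quantity used in the regularization mentioned above, here is a minimal sketch of its standard definition on plain Python lists. How the authors weight or apply it during training is not stated in the abstract, so this shows only the metric itself. The function name `si_sdr` is an illustrative choice, and the usual mean-removal preprocessing is omitted for brevity.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    scale = dot / ref_energy  # optimal scaling of the reference toward the estimate
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    if target_energy == 0.0:
        return -math.inf  # estimate carries no component of the reference
    if noise_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference
    return 10.0 * math.log10(target_energy / noise_energy)
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR.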

2507.12081 2026-05-21 eess.AS

VoxATtack: A Multimodal Attack on Voice Anonymization Systems

Ahmad Aloradi, Ünal Ege Gaznepoglu, Emanuël A. P. Habets, Daniel Tenbrinck

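AI summary: This paper proposes VoxATtack, a model that attacks voice anonymization systems by combining speech and text information, and shows that textual information and selective data augmentation expose vulnerabilities in current voice anonymization methods.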

Comments
5 pages, 3 figures, 3 tables, accepted at WASPAA 2025
Abstract

Voice anonymization systems aim to protect speaker privacy by obscuring vocal traits while preserving the linguistic content relevant for downstream applications. However, because these linguistic cues remain intact, they can be exploited to identify semantic speech patterns associated with specific speakers. In this work, we present VoxATtack, a novel multimodal de-anonymization model that incorporates both acoustic and textual information to attack anonymization systems. While previous research has focused on refining speaker representations extracted from speech, we show that incorporating textual information with a standard ECAPA-TDNN improves the attacker's performance. Our proposed VoxATtack model employs a dual-branch architecture, with an ECAPA-TDNN processing anonymized speech and a pretrained BERT encoding the transcriptions. Both outputs are projected into embeddings of equal dimensionality and then fused based on confidence weights computed on a per-utterance basis. When evaluating our approach on the VoicePrivacy Attacker Challenge (VPAC) dataset, it outperforms the top-ranking attackers on five out of seven benchmarks, namely B3, B4, B5, T8-5, and T12-5. To further boost performance, we leverage anonymized speech and SpecAugment as augmentation techniques. This enhancement enables VoxATtack to achieve state-of-the-art on all VPAC benchmarks, after scoring 20.6% and 27.2% average equal error rate on T10-2 and T25-1, respectively. Our results demonstrate that incorporating textual information and selective data augmentation reveals critical vulnerabilities in current voice anonymization methods and exposes potential weaknesses in the datasets used to evaluate them.

2506.09521 2026-05-21 eess.AS cs.CL

You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks

Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters

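AI summary: This paper studies how linguistic content affects attacks on voice privacy systems. By adapting BERT as an automatic speaker verification system, it evaluates the impact of intra-speaker linguistic content similarity on attack performance and recommends reworking the VoicePrivacy datasets for fairer privacy evaluation.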

Comments
5 pages, 6 figures, 1 table, accepted at INTERSPEECH 2025. Update reason: change to the acknowledgements.
Abstract

Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets, by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenges the reliance on global EER for privacy evaluations.
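
Since the results above are reported as equal error rates, a short sketch of how an EER can be approximated from verification scores may help. It assumes two non-empty lists of similarity scores, one for same-speaker (target) trials and one for different-speaker (non-target) trials, and sweeps thresholds over the observed scores. The function name and the tie-handling choice are illustrative, not taken from the VoicePrivacy evaluation tooling.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

An EER near 50% corresponds to chance-level verification, so a mean of 35% from text alone, and 2% for some speakers, indicates that linguistic content leaks identity in these datasets.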

2411.09593 2026-05-21 eess.IV cs.AI cs.CV

SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms

Soumick Chatterjee, Hendrik Mattern, Marc Dörner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero, Steven Niederren, Fengming Lin, Yan Xia, Jiacheng Wang, Riyu Qiu, Liansheng Wang, Arya Yazdan Panah, Rosana El Jurdi, Guanghui Fu, Janan Arslan, Ghislain Vaillant, Romain Valabregue, Didier Dormont, Bruno Stankoff, Olivier Colliot, Luisa Vargas, Isai Daniel Chacón, Ioannis Pitsiorlas, Pablo Arbeláez, Maria A. Zuluaga, Stefanie Schreiber, Oliver Speck, Andreas Nürnberger

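AI summary: To address the shortage of public annotated datasets, this work provides an annotated dataset of Time-of-Flight angiography acquired with 7T MRI and evaluates a range of deep learning methods on small vessel segmentation.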

Abstract

The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with the ISBI 2023, in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset was created through a combination of automated pre-segmentation and extensive manual refinement. In this manuscript, sixteen submitted methods and two baseline methods are compared both quantitatively and qualitatively on two different datasets: held-out test MRAs from the same dataset as the training data (with labels kept secret) and a separate 7T ToF MRA dataset where both input volumes and labels are kept secret. The results demonstrate that most of the submitted deep learning methods, trained on the provided training dataset, achieved reliable segmentation performance. Dice scores reached up to 0.838 $\pm$ 0.066 and 0.716 $\pm$ 0.125 on the respective datasets, with an average performance of up to 0.804 $\pm$ 0.15.
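
The Dice scores quoted above measure the overlap between a predicted and a reference segmentation. The sketch below computes the standard Dice coefficient on flattened binary masks. It is an illustration of the metric only: the challenge's actual evaluation code is not described in the abstract, and the convention of returning 1.0 for two empty masks is an assumption.

```python
def dice_score(prediction, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(prediction) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(prediction, truth) if p and t)
    total = sum(1 for p in prediction if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total
```

For thin structures such as small vessels, a few misplaced voxels change the overlap noticeably, which is one reason Dice values for this task sit below those typical of large-organ segmentation.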

2605.21251 2026-05-21 eess.IV cs.CV

Local-sensitive connectivity filter (ls-cf): A post-processing unsupervised improvement of the frangi, hessian and vesselness filters for multimodal vessel segmentation

Erick O Rodrigues, Lucas O Rodrigues, João HP Machado, Dalcimar Casanova, Marcelo Teixeira, Jeferson T Oliva, Giovani Bernardes, Panos Liatsis

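AI summary: This paper proposes an unsupervised multimodal approach that improves the response of the Frangi filter for automatic vessel segmentation. The local-sensitive connectivity filter (LS-CF) computes pixel-level vessel continuity and introduces a local tolerance heuristic to fill in vessel discontinuities produced by the Frangi response. It achieves competitive results on a variety of multimodal datasets and, on the OSIRIX angiographic dataset, outperforms state-of-the-art approaches in accuracy.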

Journal ref
Journal of Imaging 2022
Abstract

Retinal vessel analysis is a procedure that can be used to assess risks to the eye. This work proposes an unsupervised multimodal approach that improves the response of the Frangi filter, enabling automatic vessel segmentation. We propose a filter that computes pixel-level vessel continuity while introducing a local tolerance heuristic to fill in vessel discontinuities produced by the Frangi response. This proposal, called the local-sensitive connectivity filter (LS-CF), is compared against a naive connectivity filter, the baseline thresholded Frangi filter response, the naive connectivity filter response combined with morphological closing, and current approaches in the literature. The proposal achieved competitive results on a variety of multimodal datasets. It was robust enough to outperform all the state-of-the-art approaches in the literature for the OSIRIX angiographic dataset in terms of accuracy and 4 out of 5 works in the case of the IOSTAR dataset, while also outperforming several works in the case of the DRIVE and STARE datasets and 6 out of 10 works on the CHASE-DB dataset. For CHASE-DB, it also outperformed all the state-of-the-art unsupervised methods.

2605.21224 2026-05-21 physics.optics cs.AI eess.SP

Artificial Intelligence Reshapes Microwave Photonics

Peng Li, Xihua Zou, Jia Ye, Wei Pan, Lianshan Yan

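AI summary: This review examines how artificial intelligence is advancing microwave photonics, covering the integration of AI with microwave photonic technologies across signal generation, transmission, processing, and detection.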

Comments
13 pages, 12 figures
Abstract

As a rapidly emerging interdisciplinary field that intrinsically integrates microwaves and photonics, microwave photonics (MWP) provides disruptive solutions to overcome the fundamental bandwidth limitations of conventional electronic systems. By exploiting the inherently ultra-wide bandwidth and low-loss characteristics of photonic technologies, MWP enables the generation, transmission, processing, and detection of microwave, millimeter-wave, and terahertz signals. Representative breakthroughs include fully photonic microwave radar systems, photonic analog-to-digital converters with bandwidth up to 320 GHz, and photonic wireless communication systems achieving data rates as high as 616 Gbit/s. Meanwhile, the rapid growth of artificial intelligence (AI) is reshaping scientific research, engineering, and daily life in unprecedented ways, such as AI for science/engineering and AI co-scientist/assistant. Correspondingly, AI is profoundly reshaping MWP in all aspects, ranging from signal generation and transmission to signal processing and detection. AI has revolutionized the design, simulation, fabrication, testing, deployment, and maintenance of MWP systems, delivering autonomous operation and exceptional efficiency beyond traditional systems. Motivated by these developments, this review paper provides the first comprehensive overview of AI-enabled MWP, systematically summarizing the state-of-the-art advances and presenting insights for both the academic community and the broader public.

2605.21211 2026-05-21 eess.SY cs.LG cs.SY math.OC

Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes

Austin Braniff, Yuhe Tian

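AI summary: This paper proposes an efficient and practical reinforcement learning (RL) control approach for chemical process systems. The Y-wise Affine Neural Network (YANN)-RL algorithm addresses the challenges of trusting RL algorithms and training reliable agents, and three publicly available chemical engineering case studies show reduced training time and data requirements.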

Comments
Accepted for publication at the 23rd IFAC World Congress, 2026
Abstract

In this work, we present an efficient and practically implementable approach for the application of reinforcement learning (RL)-based control in chemical process systems. This is an area that has yet to widely adopt RL-based control, largely due to inherent challenges in trusting RL algorithms and the time-consuming process of training reliable agents. To address these challenges, we leverage a class of RL algorithms termed Y-wise Affine Neural Network (YANN)-RL, which we have developed in our prior work (Braniff and Tian, 2025a). By strategically initializing actor and critic networks, YANN-RL algorithms provide confident and interpretable starting points within control schemes. We apply this RL-based control approach to three different process engineering case studies publicly available on the PC-Gym library (Bloor et al., 2026): (i) a continuous stirred tank reactor (CSTR), (ii) a four-tank system, and (iii) a multistage extraction column. Our approach is compared to several popular RL algorithms (PPO, SAC, DDPG, and TD3) and is benchmarked against nonlinear model predictive control (NMPC). These case studies demonstrate that YANN-RL can greatly reduce the training time and data needed, can be deployed with confidence for chemical process systems, and can approach the performance of NMPC without the knowledge of a full nonlinear model.

2605.21183 2026-05-21 eess.SY cs.SY

Collaborative Optimization of Battery Charging / Swapping Stations for eVTOLs Based on Closed-Loop Supply Chain and Space-Time Network

Pengfeng Lin, Miao Zhu, Jiahui Sun, Haoyang Cui, Xiaoyong Cao, Chuanlin Zhang, Yunda Yan

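AI summary: To address the constraints on eVTOL battery energy replenishment, this paper proposes a closed-loop supply chain-based model for battery charging and swapping stations. Time-space network methods are used to optimize battery scheduling and logistics, and solving the model with Gurobi verifies its feasibility. The model alleviates eVTOL range anxiety and supports commercialization.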

Abstract

Against the backdrop of the burgeoning global low-altitude economy, countries have successively introduced a series of policies to accelerate the application and commercialization of electric vertical take-off and landing (eVTOL) aircraft. Nevertheless, purely electric eVTOLs confront constraints including limited battery energy density, high operational power requirements, and challenges associated with rapid energy replenishment, which collectively restrict their flight endurance and application scenarios. Furthermore, while eVTOL deployment is scaling up, supporting charging infrastructure and regulations remain underdeveloped. This situation presents emerging power distribution networks with new challenges in maintaining adequate electricity supply and ensuring operational continuity. To tackle these issues, following an investigation into battery energy replenishment strategies, a closed-loop supply chain-based model for eVTOL battery charging and swapping is proposed. Time-space network methods are utilized to characterize the scheduling of batteries and logistics throughout the system. Subsequently, aiming to maximize the operational revenue of the model, optimized management of battery swapping, transportation, and charging processes is implemented, facilitating coordinated operation among eVTOLs, swapping stations, and charging stations. Finally, the model is solved by Gurobi, verifying its feasibility. Simulation results further indicate that the model alleviates range anxiety for eVTOLs, offering strong support for their commercialization. Moreover, it enables coordinated scheduling between eVTOLs and the distribution network, thereby facilitating the network's gradual improvement and upgrading.

2605.21181 2026-05-21 cs.IT eess.SP math.IT

On the Identifiability of Semi-Blind Estimation in Cell-Free Massive MIMO Networks

关于无蜂窝大规模MIMO网络中半盲估计的可识别性

Christian Forsch, Laura Cottatellucci

AI总结 本文研究了无蜂窝大规模MIMO网络中半盲联合信道估计与数据检测的可识别性,从大规模系统设计的视角分析了半盲恢复成功的条件,并基于图模型提出了递归概率分析方法,揭示了系统参数对可识别性的影响。

详情
Comments
6 pages, 4 figures, submitted for possible conference publication
AI中文摘要

半盲联合信道估计与数据检测(JCD)是缓解无蜂窝大规模多输入多输出(CF-MaMIMO)网络中导频污染的一种有前景的方法。这类方法的有效性从根本上取决于可识别性,即从接收到的上行观测中无歧义地恢复未知信道系数和发送数据信号的能力。在本文中,我们从大规模系统设计的角度研究了半盲JCD的可识别性。我们考虑一个接入点(AP)和用户设备(UE)按泊松点过程(PPP)在空间上分布的CF-MaMIMO网络。所得到的网络拓扑被建模为二部随机几何图(BRGG),该图刻画了由无线传播引起的局部连接。为了便于分析,我们用一个度分布相匹配的独立边随机图来近似这一具有空间相关性的图模型。基于该模型,我们提出了一种递归概率分析,用以刻画半盲恢复以高概率成功的条件。该分析给出了可识别区域与关键系统参数之间的函数关系,这些参数包括AP和UE密度,以及连接半径(超出该半径的信道系数被视为可忽略)。蒙特卡洛仿真验证了预测的可识别区域,并评估了所提图近似的准确性。所提框架从系统层面揭示了网络密度和连接性如何影响大规模CF-MaMIMO系统中的可识别性,并为选择部署参数和导频序列长度以实现可靠的半盲恢复提供了指导。

英文摘要

Semi-blind joint channel estimation and data detection (JCD) is a promising approach to mitigate pilot contamination in cell-free massive multiple-input multiple-output (CF-MaMIMO) networks. The effectiveness of such methods fundamentally depends on identifiability, i.e., the ability to unambiguously recover the unknown channel coefficients and transmitted data signals from the received uplink observations. In this work, we investigate the identifiability of semi-blind JCD from a large-scale system design perspective. We consider a CF-MaMIMO network in which access points (APs) and user equipments (UEs) are spatially distributed according to Poisson point processes (PPPs). The resulting network topology is modeled as a bipartite random geometric graph (BRGG) that captures local connectivity induced by wireless propagation. To enable a tractable analysis, the spatially dependent graph model is approximated by a surrogate independent-edge random graph with matched degree distributions. Building on this model, we develop a recursive probabilistic analysis that characterizes the conditions under which semi-blind recovery succeeds with high probability. The proposed analysis reveals an identifiability region as a function of key system parameters, including AP and UE densities and the connectivity radius beyond which channel coefficients are assumed negligible. Monte Carlo simulations validate the predicted identifiability region and assess the accuracy of the proposed graph approximation. The proposed framework provides system-level insights into how network density and connectivity affect identifiability in large-scale CF-MaMIMO systems and offers guidelines for selecting deployment parameters and pilot sequence lengths that enable reliable semi-blind recovery.

2605.21153 2026-05-21 eess.SY cs.SY

Coordinated Optimal Power Quality Management in Distribution Systems Using The Residual Capacity of Community IBRs

利用社区逆变器型资源(IBR)剩余容量的配电系统协调最优电能质量管理

Tiantian Ji, Pengfeng Lin, Miao Zhu, Stephan M. Goetz, Ahmed Abu-Siada, Syed Islam

AI总结 本文提出了一种全网协调优化模型,通过释放社区逆变器型资源(IBR)的剩余容量来缓解电压不平衡(VU)。现有的单序策略忽略了相互耦合的容量约束,导致容量裕度闲置,同时也未能利用社区IBR的集体治理能力。为此,本文在双公共同步参考坐标系下建立了序域网络模型,给出了严格的相电流和视在功率限值约束,并通过多面体近似将其凸化;二次目标函数可灵活平衡各序分量的容量分配。仿真和实验结果验证了所提策略的有效性。

详情
AI中文摘要

本文提出了一种全网协调优化模型,通过释放社区逆变器型资源(IBR)的剩余容量来缓解电压不平衡(VU)。现有的单序策略忽略了相互耦合的容量约束,导致容量裕度闲置,同时也未能利用社区IBR的集体治理能力。为此,本文在双公共同步参考坐标系下建立了序域网络模型,给出了严格的相电流和视在功率限值约束,并通过多面体近似将其凸化;二次目标函数可灵活平衡各序分量的容量分配。仿真和实验结果验证了所提策略的有效性。

英文摘要

This letter proposes a network-wide coordinated optimization model to mitigate voltage unbalance (VU) by unleashing the remaining capacity of community inverter-based resources (IBRs). Existing single-sequence strategies ignore coupled capacity constraints and leave headroom idle. They also fail to harness the collective governance capabilities of community IBRs. To close this gap and exploit the unused potential, we develop a sequence-domain network model in dual commonly shared synchronous reference frames. Strict phase current and apparent power limits are formulated and convexified via polyhedral approximations. A quadratic objective function flexibly balances sequence capacity allocation. Simulation and experimental results validate the effectiveness of the proposed strategy.
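As a rough illustration of what "convexified via polyhedral approximations" can mean for an apparent power limit P² + Q² ≤ S², the sketch below replaces the circle with a regular polygon inscribed in it, so every point that satisfies the linear constraints also satisfies the original limit. The abstract does not state which polyhedron the authors use; the function names, the number of sides `k`, and the inner (conservative) construction are assumptions made for this example.

```python
import math


def polygon_halfplanes(s_max, k=16):
    """Half-planes a*P + b*Q <= c of a regular k-gon inscribed in the circle of radius s_max.

    Assumed construction: the apothem is s_max*cos(pi/k), so the polygon's
    vertices lie exactly on the circle and the polygon is an inner approximation.
    """
    c = s_max * math.cos(math.pi / k)
    return [
        (math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k), c)
        for i in range(k)
    ]


def is_feasible(p, q, halfplanes, tol=1e-12):
    """Check the linear (convexified) constraints for an operating point (p, q)."""
    return all(a * p + b * q <= c + tol for a, b, c in halfplanes)


if __name__ == "__main__":
    hp = polygon_halfplanes(1.0, k=16)
    print(is_feasible(0.9, 0.0, hp))  # inside the polygon
    print(is_feasible(1.0, 0.0, hp))  # on the circle but outside the polygon
```

Increasing `k` tightens the approximation at the cost of more linear constraints, which is the usual trade-off with this kind of relaxation.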

2605.21141 2026-05-21 eess.AS

Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios

面向多说话人场景的线性约束深度波束成形器

Ilai Zaidel, Ori Engel, Bar Engel, Sharon Gannot

AI总结 本文提出了一种深度波束成形框架,用于在多说话人环境中增强目标说话人。深度神经网络直接从带噪多通道输入中估计波束成形权重,并通过受增广拉格朗日框架启发的自适应多项损失来满足线性空间约束。该模型还由目标相对传递函数和估计的干扰子空间引导,能够将波束指向目标说话人并将零陷指向干扰源,其整体增强性能优于基于相同估计空间特征构建的经典LCMV波束成形器。此外,与LCMV波束成形器相比,该模型的旁瓣更可控,背景噪声抑制也更好。

详情
AI中文摘要

我们提出了一种深度波束成形框架,用于在多说话人环境中增强目标说话人。深度神经网络(DNN)被训练为直接从带噪多通道输入中估计波束成形权重,同时通过受增广拉格朗日框架启发的自适应多项损失满足线性空间约束。该损失将信号重建与惩罚项相结合,这些惩罚项强制对目标保持无失真响应并抑制干扰子空间。模型进一步由目标相对传递函数(RTF)和估计的干扰子空间引导。所提模型能够将波束指向目标说话人,同时将零陷指向干扰源,其整体增强性能优于基于相同估计空间特征构建的经典LCMV波束成形器。此外,与LCMV波束成形器相比,所提模型的旁瓣更可控,背景噪声抑制也更好。

英文摘要

We propose a deep beamforming framework for enhancing target speaker(s) in multi-speaker environments. A deep neural network (DNN) is trained to estimate beamforming weights directly from noisy multichannel inputs while satisfying linear spatial constraints through an adaptive multi-term loss inspired by the augmented Lagrangian framework. The loss combines signal reconstruction with penalties that enforce a distortionless response toward the target and suppress the interference subspace. The model is further guided by the target relative transfer function (RTF) and the estimated interference subspace. The proposed model can direct a beam toward the target speaker while directing nulls toward the interfering sources, achieving superior overall enhancement performance compared with the classical LCMV beamformer constructed from the same estimated spatial signatures. Furthermore, compared with the LCMV beamformer, the proposed model produces more controlled sidelobes and improved background-noise attenuation.
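A minimal sketch of the kind of multi-term loss the abstract describes may help: a reconstruction term plus one penalty for deviating from a distortionless response toward the target and one penalty for leaking energy from the interference directions. The abstract gives no formulas, so the penalty forms, the function names, and the fixed weights `lam_d` and `lam_i` are assumptions (the paper describes adaptive weighting, which is not modeled here).

```python
def response(w, a):
    """Beamformer response w^H a for complex weights w and a spatial signature a."""
    return sum(wi.conjugate() * ai for wi, ai in zip(w, a))


def constraint_penalties(w, rtf_target, interferers):
    """Return (distortion, leakage) penalties for a single frequency bin.

    distortion: squared deviation from the distortionless constraint w^H a_target = 1
    leakage:    total squared response toward the interference signatures
    """
    distortion = abs(response(w, rtf_target) - 1.0) ** 2
    leakage = sum(abs(response(w, a)) ** 2 for a in interferers)
    return distortion, leakage


def total_loss(recon_loss, w, rtf_target, interferers, lam_d=1.0, lam_i=1.0):
    """Reconstruction loss plus weighted constraint penalties (weights are hypothetical)."""
    distortion, leakage = constraint_penalties(w, rtf_target, interferers)
    return recon_loss + lam_d * distortion + lam_i * leakage


if __name__ == "__main__":
    target = [1 + 0j, 1 + 0j]
    interferer = [1 + 0j, -1 + 0j]
    print(constraint_penalties([0.5 + 0j, 0.5 + 0j], target, [interferer]))
    print(constraint_penalties([1 + 0j, 0j], target, [interferer]))
```

In the two-microphone toy case, the averaging weights satisfy both constraints exactly, while the single-microphone weights keep the target undistorted but pass the interferer.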

2605.21136 2026-05-21 cs.NI cs.SY eess.SY

LoRa and LoRaWAN simulator-cum-emulator with CAD and capture effect in Python

具有CAD和捕获效应的LoRa与LoRaWAN模拟器兼仿真器(Python实现)

Matthijs Reyers, Niels Hokke, R. R. Venkatesha Prasad

AI总结 本文提出了一种基于Python的简单易用的离散事件模拟器,解决了现有LoRaWAN/LoRa模拟器代码复杂且不支持所有设备类的问题,并引入了一种新颖的评估真实设备固件的方法。

详情
Comments
11 pages in total; GitHub link included
AI中文摘要

现有的LoRaWAN/LoRa模拟器通常由大型复杂的C++代码库构成,且往往不支持所有设备类。本文提出了一种简单易用的基于Python的离散事件模拟器,该模拟器通过自定义的asyncio-based模拟内核、三阶段数据包交付模型(可复现捕获效应)、完整的LoRaWAN 1.0.4协议栈以及容器化固件系统来解决这些问题。该模拟器通过CFFI将真实STM32 C固件的HAL调用重定向到模拟器中,并通过GitHub(https://github.com/MatthijsReyers/lora-simulator)作为Python包分发,无需任何外部模拟框架或依赖项。

英文摘要

Existing LoRaWAN/LoRa simulators consist of large, complicated C++ codebases and often do not support all device classes. This paper presents the design of a simple-to-use, Python-based discrete-event simulator that addresses these gaps while also introducing a novel method for evaluating real device firmware in the simulator. The simulator is built on a custom asyncio-based simulation kernel, a three-phase packet delivery model that reproduces the capture effect, a full LoRaWAN 1.0.4 stack, and a containerized firmware system that cross-compiles real STM32 C firmware and redirects HAL calls into the simulator via CFFI. The simulator is distributed as a Python package via GitHub (https://github.com/MatthijsReyers/lora-simulator) and requires no external simulation framework or dependencies.

2605.21119 2026-05-21 math.OC cs.SY eess.SY

Scaled Graph Bounding Techniques for Reset Systems

用于重置系统的缩放图边界技术

Timo de Groot, Maurice Heemels, Tom Oomen, Sebastiaan van den Eijnden

AI总结 本文研究了重置系统中缩放图的上界技术,通过二次耗散性与缩放图的联系,揭示了通用缩放图近似方法的局限性。

详情
Comments
6 pages, 5 figures. To appear in the 23rd IFAC World Congress, Busan, South Korea, 2026
AI中文摘要

重置系统可以克服线性时不变控制的基本限制。最近引入的缩放(相对)图概念为开发重置系统的图形分析和设计工具提供了一个有前途的框架,这与广泛采用的线性系统回路形状方法一致。本文的目标是推导重置系统缩放图的上界技术,并获得其精度的见解。我们利用二次耗散性与缩放图之间的联系,将上界问题重新表述为寻找分段二次存储函数的问题。通过特定的采样技术,我们揭示了基于二次耗散性的通用缩放图近似方法的根本局限性。

英文摘要

Reset systems can overcome fundamental limitations of linear time-invariant control. The recently introduced notion of scaled (relative) graphs provides a promising framework for developing graphical analysis and design tools for reset systems, in line with widely adopted loop-shaping methods for linear systems. The aim of this paper is to derive techniques for over-bounding the scaled graph of reset systems and to gain insight into their accuracy. We exploit connections between quadratic dissipativity and scaled graphs to recast the over-bounding problem as the search for piecewise quadratic storage functions. Using specific sampling techniques, we reveal a fundamental limitation of general scaled graph approximation methods that are based on quadratic dissipativity.

2605.21116 2026-05-21 eess.IV

GeoDiff-SAR II: 3D-Driven Foundation Diffusion Models for SAR Generation via Decoupled Control

Xuanting Wu, Fan Zhang, Fei Ma, Yingbing Liu, Lingxiao Peng, Qiang Yin, Yongsheng Zhou

AI总结 本文提出GeoDiff-SAR II,一种基于3D模型引导的解耦框架,用于通过解耦控制生成合成孔径雷达图像,通过物理基础的几何-电磁线索实现对关键成像参数的可控生成。

详情
Comments
23 pages, 14 figures
AI中文摘要

现有的合成孔径雷达(SAR)图像生成方法仍然缺乏对关键成像参数的可靠可控性,特别是方位角、俯仰角和极化模式。我们的初步GeoDiff-SAR支持有限的方位角补全,但对大缺失方位角扇区无效,并未提供对多个成像条件的统一控制。为了解决这个问题,我们提出了GeoDiff-SAR II,一种3D模型引导的解耦框架,用于可控的SAR图像生成。所提出的框架通过物理基础的几何-电磁线索而非仅图像强度来施加可控性。我们引入了一个几何-电磁条件图(GECM),这是一个结构化的中间表示,编码了目标姿态图和主导散射中心,从而将宏观几何与微观散射响应解耦。在训练过程中,GECMs是从真实的稀疏方位角SAR图像中衍生出来的。在推断过程中,相同的表示可以直接从指定方位角、俯仰角和极化条件下的3D CAD模型中渲染出来,从而在大视角间隙中实现物理一致的控制。成像参数进一步转换为文本条件,同时GECM通过ControlNet注入以提供显式的空间指导。结合FLUX主干上的低秩适应(LoRA),所提出的框架在单一过程中统一了几何-电磁条件和参数感知生成。在模拟和真实数据集上的实验表明,能够可控地生成关键SAR成像参数,跨大方位角间隙具有稳定的泛化能力,并在图像保真度、物理一致性和下游自动目标识别(ATR)性能方面实现了持续的改进。

英文摘要

Existing Synthetic Aperture Radar (SAR) image generation methods still lack reliable controllability over key imaging parameters, particularly azimuth angle, depression angle, and polarization mode. Our preliminary GeoDiff-SAR supported limited azimuth completion, but remained ineffective for large missing azimuth sectors and did not provide unified control over multiple imaging conditions. To address this problem, we propose GeoDiff-SAR II, a 3D model-guided decoupled framework for controllable SAR image generation. The proposed framework imposes controllability through physically grounded geometric-electromagnetic cues rather than image intensity alone. We introduce a Geometric-Electromagnetic Conditioning Map (GECM), a structured intermediate representation that encodes the target pose map and dominant scattering centers, thereby decoupling macroscopic geometry from microscopic scattering responses. During training, GECMs are derived from real sparse-azimuth SAR images. During inference, the same representation is rendered directly from a 3D CAD model under specified azimuth, depression angle, and polarization conditions, enabling physically consistent control across large viewpoint gaps. The imaging parameters are further converted into text conditions, while the GECM is injected through ControlNet to provide explicit spatial guidance. Combined with Low-Rank Adaptation (LoRA) on a FLUX backbone, the proposed framework unifies geometric-electromagnetic conditioning and parameter-aware generation within a single process. Experiments on simulated and real datasets demonstrate controllable generation over key SAR imaging parameters, stable generalization across large azimuth gaps, and consistent improvements in image fidelity, physical consistency, and downstream Automatic Target Recognition (ATR) performance.

2605.21111 2026-05-21 cs.RO cs.SY eess.SY

Benchmarking Empirical and Learning-Based Approaches for Feedforward Steering Control in Autonomous Racing

为自动驾驶赛车中的前馈转向控制评估经验方法和学习方法

Georg Jank, Mattia Piccinini, Sebastian Wenk, Phillip Pitschi, Johannes Betz, Boris Lohmann

AI总结 本文通过系统评估两种学习方法和两种经验方法的前馈转向控制器,发现学习方法在开环评估中预测误差最小,但在闭环测试中路径跟踪性能和圈速并不优于所提出的方法,表明在完整轨迹规划和控制软件栈中评估前馈策略的必要性。

详情
Comments
8 pages, 12 figures, Accepted to be published as part of the 2026 IEEE International Conference on Intelligent Transportation Systems (ITSC 2026), Naples, Italy, September 15-18, 2026
AI中文摘要

前馈转向控制是自动驾驶赛车分层控制架构中的关键组成部分,其目标是通过预测车辆的逆横向动力学来减少反馈控制器的转向修正。本文对两种基于学习的和两种经验(解析)前馈转向控制器进行了系统的基准测试。我们提出了一种基于多项式曲面拟合的新EHD公式,能够以最少的参数刻画与速度相关的非线性转向行为。我们在基于真实阿布扎比自动驾驶赛车联赛(Abu Dhabi Autonomous Racing League)的高保真仿真框架中测试这些前馈控制器,并使用高保真双轨车辆动力学仿真器。开环评估显示,基于学习的控制器预测误差最低;然而闭环测试表明,这种精度提升并未转化为更好的路径跟踪性能或圈速,即使经过迭代微调后也是如此。相比之下,所提出的EHD方法在整体闭环鲁棒性和圈速方面表现最佳,突显了在完整的轨迹规划与控制软件栈中评估前馈策略的必要性。我们的代码可在 https://github.com/TUMRT/steering_ff_control 获取。

英文摘要

Feedforward steering control is a key component of hierarchical control architectures for autonomous racing. The goal is to reduce steering corrections from the feedback controllers by predicting the vehicle's inverse lateral dynamics. This paper presents a systematic benchmark of two learning-based and two empirical (analytical) feedforward steering controllers. We introduce a new EHD formulation based on a polynomial surface fit that captures velocity-dependent nonlinear steering behavior with minimal parametrization. We test the feedforward controllers in a high-fidelity simulation framework based on the real-world Abu Dhabi Autonomous Racing League competition, using a high-fidelity double-track vehicle dynamics simulator. Open-loop evaluation shows that the learning-based controllers achieve the lowest prediction errors; however, closed-loop testing reveals that this improved accuracy does not translate into superior path tracking performance or lap times, even after iterative fine-tuning. In contrast, the proposed EHD approach achieves the best overall closed-loop robustness and lap time, highlighting the necessity of evaluating feedforward strategies within the complete trajectory planning and control software stack. Our code is available at https://github.com/TUMRT/steering_ff_control.

2605.21096 2026-05-21 eess.IV

Joint Alignment and Denoising for Event-Based Vision Sensors Using Regret-based Pareto Optimization

基于遗憾的帕累托优化的事件视觉传感器联合对齐与去噪

Shimpei Harada, Junya Hara, Hiroshi Higashi, Yuichi Tanaka

AI总结 本文提出了一种针对事件视觉传感器的联合对齐与去噪方法,通过构建对比图来同时优化对齐和去噪过程,从而解决传统方法中对齐和去噪相互制约的问题。

详情
AI中文摘要

本文提出了一种针对事件视觉传感器的联合对齐与去噪方法。现有的事件视觉传感器信号处理方法通常将事件对齐(EA)和事件去噪(ED)作为独立模块处理。然而,这种分离方法导致了矛盾:没有ED,EA会受到噪声影响;没有EA,ED难以区分信号事件和噪声事件。为了解决这一矛盾,我们通过构建一个双目标帕累托优化问题,联合优化EA和ED。我们的方法基于一个对比图,该图统计每个像素中本地化事件的数量。利用对比图,我们可以将EA定义为最大化其方差,ED定义为最小化方差。我们将这两个冲突的问题转化为帕累托优化问题,并使用遗憾策略来获得解决方案。实验结果表明,我们的方法在去噪和运动估计方面均优于其他方法。

英文摘要

This paper proposes a joint alignment and denoising method for event-based vision sensors (EVSs). Existing signal processing methods for EVSs typically perform event alignment (EA) and event denoising (ED) as separate modules. However, this separation creates a dilemma: without ED, EA is biased by noise, whereas without EA, ED struggles to distinguish signal events from noise events. To address this dilemma, we jointly optimize EA and ED by formulating a bi-objective Pareto optimization problem. Our formulation is built upon a contrast map that counts the number of events localized in each pixel. With a contrast map, we can formulate EA as maximizing its variance and ED as minimizing the variance. We cast these two conflicting problems as a Pareto optimization and use a regret strategy to obtain a solution. Experimental results on denoising and motion estimation demonstrate that our method improves on alternative methods.
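The contrast-map idea can be sketched in a few lines: warp each event back along a candidate motion, count events per pixel, and use the variance of the counts as the alignment score. The constant-velocity warp, the event tuple layout `(x, y, t)`, and the function names below are assumptions for illustration only; the abstract does not specify the motion model or how the denoising objective and regret strategy are implemented, and those parts are not shown.

```python
def contrast_map(events, velocity, width, height):
    """Count warped events per pixel.

    events:   iterable of (x, y, t) tuples (assumed layout)
    velocity: (vx, vy) in pixels per time unit; each event is warped back to t = 0
    """
    vx, vy = velocity
    counts = [[0] * width for _ in range(height)]
    for x, y, t in events:
        xw = round(x - vx * t)
        yw = round(y - vy * t)
        if 0 <= xw < width and 0 <= yw < height:
            counts[yw][xw] += 1
    return counts


def variance(counts):
    """Population variance of the per-pixel event counts."""
    values = [c for row in counts for c in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    # A single point moving at 2 px per time unit along x.
    events = [(10 + 2 * t, 5, t) for t in range(5)]
    aligned = variance(contrast_map(events, (2, 0), 32, 16))
    blurred = variance(contrast_map(events, (0, 0), 32, 16))
    print(aligned > blurred)
```

With the correct velocity all events pile up in one pixel, so the count variance is larger than for the unwarped, smeared map.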

2605.21051 2026-05-21 eess.IV

Transcoding a 3D Gaussian Splatting Model from a Plenoptic Point Cloud or Mesh without the Original Multi-view Images

在没有原始多视角图像的情况下,从全光点云或网格转码得到3D高斯泼溅模型

Maja Krivokuća, Riad Bendouro, Neus Sabater

AI总结 本文提出了一种端到端的转码流程,用于在所采集3D物体或场景的原始多视角图像不可用时,从现有的3D全光点云或网格模型生成3D高斯泼溅(3DGS)模型。文中还提出了一种定制的初始化方法来引导3DGS模型的学习,并施加约束以确保最终的3DGS模型与输入点云或网格表面紧密贴合。在高质量的标准全光点云数据集上的测试表明,该流程生成的3DGS模型视觉质量高,且高斯基元(splat)的数量远少于原始稠密点云中的点数。此外,与3DGS模型学习中通常采用的基于SfM的默认初始化相比,定制初始化收敛快得多,表面表示也更干净。

详情
Comments
Submitted to an ICIP 2026 satellite workshop
AI中文摘要

在本文中,我们提出了一种端到端的转码流程,用于在所采集3D物体或场景的原始多视角图像不可用时,从现有的3D全光点云或网格模型生成3D高斯泼溅(3DGS)模型。我们还提出了一种定制的初始化方法来引导3DGS模型的学习,并施加约束以确保最终的3DGS模型与输入点云或网格表面紧密贴合。在高质量的标准全光点云数据集上的测试表明,我们的流程生成的3DGS模型视觉质量高,且高斯基元(splat)的数量远少于原始稠密点云中的点数。此外,与3DGS模型学习中通常采用的基于SfM的默认初始化相比,我们的定制初始化收敛快得多,表面表示也更干净。

英文摘要

In this paper, we propose an end-to-end transcoding pipeline to create 3D Gaussian splatting (3DGS) models from existing 3D plenoptic point cloud or mesh models when the original multi-view images of the captured 3D object or scene are not available. We also propose a custom initialisation to guide the 3DGS model learning, with constraints to ensure that the final 3DGS model aligns closely with the input point cloud or mesh surface. Tests on a high-quality, standard plenoptic point cloud dataset show that our pipeline produces 3DGS models of high visual quality, with far fewer splats than there are points in the original dense point clouds. Additionally, our custom initialisation leads to much faster convergence and a cleaner surface representation than the default SfM-based initialisation that is typically used for 3DGS model learning.

2605.21020 2026-05-21 eess.SP cs.IT math.IT

Microwave Linear Analog Computer (MiLAC)-Aided MIMO Radar Sensing: Transmit Beamforming Design and DoA Estimation

微波线性模拟计算机(MiLAC)辅助MIMO雷达感知:发射波束成形设计与波达方向(DoA)估计

Ziang Liu, Zheyu Wu, Bruno Clerckx

AI总结 本文研究了利用微波线性模拟计算机(MiLAC)辅助MIMO雷达感知的问题,提出了一种基于MiLAC的发射波束成形设计和基于二维离散傅里叶变换(2D-DFT)的方向估计方法,并证明了MiLAC辅助和全数字波束成形在CRB和方向估计性能上具有相同效果,同时降低了硬件成本和功耗。

详情
Comments
Submitted to IEEE journal
AI中文摘要

多输入多输出(MIMO)雷达具有波形多样性和大的空间自由度(DoFs),使其在高分辨率感知中具有吸引力。将MIMO雷达扩展到大规模阵列可以进一步提高感知性能,但也会增加硬件成本、功耗和数字处理复杂性。微波线性模拟计算机(MiLAC)可以通过将线性操作从数字域转移到模拟域来解决这些挑战。MiLAC在最近的研究中显示出在通信中的潜在优势,本文进一步探讨了其在雷达感知中的应用。具体而言,我们考虑了MiLAC辅助的发射波束成形和接收端基于二维离散傅里叶变换(2D-DFT)的方向估计。对于发射波束成形,我们提出了一个加权Cramer Rao界(CRB)最小化问题,在无损和互易MiLAC约束下,并提出了一种基于惩罚对偶分解(PDD)的迭代算法来解决非凸问题。我们进一步证明了MiLAC辅助和全数字波束成形达到相同的CRB。对于接收处理,我们展示了二维DFT可以通过无损互易MiLAC实现,这使得在模拟域中进行方向估计而无需数字优化。数值结果证实了理论发现,并表明MiLAC辅助方法在CRB和方向估计性能上与全数字基准相同。同时,硬件成本和功耗降低,因为仅需在发射端使用低分辨率DACs,而在接收端消除了RF链和ADCs。此外,在模拟域中执行二维DFT消除了所有用于方向估计的数字DFT操作。

英文摘要

Multiple-input multiple-output (MIMO) radar has waveform diversity and large spatial degrees of freedom (DoFs), making it attractive for high-resolution sensing. Scaling MIMO radar to massive arrays can further improve sensing performance, but it also increases hardware cost, power consumption, and digital processing complexity. The microwave linear analog computer (MiLAC) can tackle these challenges by moving linear operations from the digital domain to the analog domain. MiLAC has shown promising benefits for communications in recent studies, and this paper identifies its potential for radar sensing. Specifically, we consider both MiLAC-aided transmit beamforming and receiver-side two-dimensional discrete Fourier transform (2D-DFT)-based direction-of-arrival (DoA) estimation. For transmit beamforming, we formulate a weighted Cramér-Rao bound (CRB) minimization problem under lossless and reciprocal MiLAC constraints and propose a penalty dual decomposition (PDD)-based iterative algorithm to address the non-convex problem. We further prove that MiLAC-aided and fully-digital beamforming achieve the same CRB. For receiver processing, we show that the 2D-DFT can be implemented by a lossless reciprocal MiLAC, which enables analog-domain DoA estimation without digital optimization. Numerical results confirm the theoretical findings and show that the MiLAC-aided approach achieves the same CRB and DoA estimation performance as the fully-digital benchmark. Meanwhile, hardware cost and power consumption are reduced because only low-resolution DACs are required at the transmitter, while RF chains and ADCs are eliminated at the receiver. Moreover, performing the 2D-DFT in the analog domain eliminates all digital DFT operations for DoA estimation.

2605.21008 2026-05-21 eess.AS

A Survey of Audio Reasoning in Multimodal Foundation Models

多模态基础模型中音频推理的综述

Zhihan Guo, Wenqian Cui, Guan-Ting Lin, Daxin Tan, Jingyao Li, Qiyong Zheng, Dingdong Wang, Jing Xiong, Han Shi, Jiaya Jia, Irwin King

AI总结 本文综述了多模态基础模型中音频推理的研究问题,探讨了音频推理模型的架构和训练基础,并系统整理了音频到文本、音频到语音、音频视觉推理和代理音频推理等领域的最新进展,同时分析了新兴范式和评估实践,提出了未来发展方向。

详情
AI中文摘要

推理已成为现代基础模型的核心能力,然而在音频模态中的发展仍然有限。音频具有与文本和视觉不同的挑战:它是连续的、时间密集的,并且在多个时间尺度上包含语言学、非语言学和环境信息。因此,音频推理模型必须将声学信号对齐到大语言模型的离散语义空间,同时保持用于可靠推断所需的细粒度信息。进展也受到三个主要障碍的限制:真实音频基础推理数据的稀缺性、捷径学习和模态幻觉,以及语音交互中推理深度与实时延迟之间的张力。在本文中,我们提出了首个专门针对音频推理的综述。我们提供了一个统一的公式,区分直接预测建模与推理增强生成,回顾音频推理模型的架构和训练基础,并系统整理了音频到文本、音频到语音、音频视觉推理和代理音频推理等领域的最新进展。我们进一步分析了新兴范式,如思维链提示、监督微调、强化学习和延迟感知的语音交互,并讨论了评估实践、开放挑战和未来方向。我们的目标是为开发稳健、高效且原生接地的音频推理系统提供一个连贯的道路图。

英文摘要

Reasoning has become a defining capability of modern foundation models, yet its development in the audio modality remains limited. Audio poses challenges that are distinct from those of text and vision. It is continuous, temporally dense, and contains linguistic, paralinguistic, and environmental information at multiple time scales. As a result, audio reasoning models must align acoustic signals with the discrete semantic space of large language models, while still preserving fine-grained information needed for reliable inference. Progress is also limited by three major obstacles: the scarcity of genuinely audio-grounded reasoning data, shortcut learning and modality hallucination, and the tension between reasoning depth and real-time latency in spoken interaction. In this paper, we present the first dedicated survey of audio reasoning. We provide a unified formulation that distinguishes direct predictive modeling from reasoning-augmented generation, review the architectural and training foundations of audio reasoning models, and systematically organize recent advances in Audio-to-Text, Audio-to-Speech, Audio-Visual Reasoning and Agentic Audio Reasoning. We further examine emerging paradigms such as Chain-of-Thought prompting, supervised fine-tuning, reinforcement learning, and latency-aware spoken interaction, and discuss evaluation practices, open challenges, and future directions. Our goal is to offer a coherent roadmap for developing robust, efficient, and natively grounded audio reasoning systems.

2605.20977 2026-05-21 eess.IV

Parallel Context Modeling for Sliding Window Attention in Neural Video Coding

并行上下文建模用于神经视频编码中的滑动窗口注意力

Alexander Kopte, André Kaup

AI总结 本研究提出P-SWA,通过使用对角波前实现并行解码,提高了解码速度并改进了RD性能。

详情
Comments
Accepted for ICIP 2026
AI中文摘要

大多数神经视频编解码器依赖时间条件化,因此在长序列中容易出现误差传播。虽然VCT等基于Transformer的架构提供了无漂移的替代方案,但它们计算复杂度高,且率失真(RD)性能较差。近期的SWA通过降低复杂度并提升RD性能弥补了这些不足,但它将解码限制为严格顺序的光栅扫描,从而在解码延迟上形成关键瓶颈。为了解决这个问题,我们提出了P-SWA,利用对角波前实现并行解码。通过嵌入超先验并引入累加器来融合边信息和局部空间上下文,我们的方法相比并行VCT将解码速度提高了36%。同时,相对于SWA基线,它在I帧上实现了最高10.0%的Bjøntegaard Delta码率节省,在P帧上实现了7.1%的节省。

英文摘要

Most neural video codecs rely on temporal conditioning, which makes them susceptible to error propagation over long sequences. While Transformer-based architectures like the VCT offer a drift-free alternative, they suffer from high computational complexity and inferior RD performance. The recent SWA addresses these shortcomings by reducing complexity and enhancing RD performance, yet it restricts decoding to a strictly sequential raster-scan order, creating a critical bottleneck in decoding latency. To resolve this, we propose P-SWA, utilizing diagonal wavefronts to enable parallel decoding. By embedding a hyperprior and introducing an accumulator to fuse side information and local spatial context, our method increases decoding speed by 36% over the parallel VCT. Simultaneously, it achieves Bjøntegaard Delta-rate savings of up to 10.0% for I-frames and 7.1% for P-frames over the SWA baseline.
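To make the contrast between raster-scan and diagonal-wavefront decoding concrete, the sketch below groups the positions of a latent grid into wavefronts that could be decoded in parallel. It assumes each position depends only on already-decoded neighbours above and to the left; the actual context shape and wavefront lag used by P-SWA are not given in the abstract, so `lag` and the function name are hypothetical.

```python
def wavefront_schedule(height, width, lag=1):
    """Group grid positions into wavefronts that can be decoded in parallel.

    Position (r, c) is assigned to step r*lag + c. With lag=1 the left and
    top neighbours always fall in an earlier step; lag=2 additionally keeps
    the top-right neighbour in an earlier step.
    """
    steps = {}
    for r in range(height):
        for c in range(width):
            steps.setdefault(r * lag + c, []).append((r, c))
    return [steps[s] for s in sorted(steps)]


if __name__ == "__main__":
    schedule = wavefront_schedule(3, 4)
    for step, positions in enumerate(schedule):
        print(step, positions)
    # A strict raster scan needs height*width sequential steps instead.
    print(len(schedule), "wavefront steps vs", 3 * 4, "raster-scan steps")
```

For an H x W grid with `lag=1` the number of sequential steps drops from H*W to H + W - 1, which is where the latency benefit of wavefront decoding comes from.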

2605.20968 2026-05-21 eess.AS eess.SP

From Numbers to Perception, Energy Decay Curves Prediction

从数字到感知,能量衰减曲线预测

Imran Muhammad, Gerald Schuller

AI总结 本文提出了一种神经网络框架,通过房间几何和材料属性直接预测多带能量衰减曲线,以提高房间脉冲响应预测的准确性和效率。

详情
AI中文摘要

预测房间脉冲响应(RIRs)仍是一个挑战,由于音频信号的高维性和对感知准确性的需求。本文介绍了一种神经网络框架,该框架直接从房间几何和材料属性预测多带能量衰减曲线(EDCs)。与标准模型不同,我们的框架采用自定义复合损失函数,优化能量水平和衰减斜率,确保预测曲线符合物理衰减原理,同时保持对混响时间和早期反射的高敏感性。结果表明,该模型能够以最小的误差近似真实声学特性,在T30和清晰度指数上表现优异。该方法为传统模拟提供了计算高效的替代方案,有助于交互式虚拟环境中的真实音频渲染。

英文摘要

Predicting Room Impulse Responses (RIRs) remains a challenge due to the high dimensionality of audio signals and the need for perceptual accuracy. This paper introduces a neural network framework that predicts multi-band Energy Decay Curves (EDCs) directly from room geometry and material properties. Unlike standard models, our framework employs a custom composite loss function that optimizes for both energy levels and decay slopes in the log-domain. This ensures the predicted curves adhere to physical decay principles while maintaining high sensitivity to reverberation time and early reflections. Results demonstrate that the model successfully approximates ground-truth acoustics with minimal error in T30 and clarity indices. The approach offers a computationally efficient alternative to traditional simulations, facilitating realistic audio rendering for interactive virtual environments.
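For readers unfamiliar with energy decay curves, the sketch below computes an EDC from an impulse response by Schroeder backward integration and evaluates a simple composite loss that penalizes both level errors and slope errors in the log domain. The EDC definition is standard; the loss is only an assumed form (mean squared error on dB levels plus mean squared error on first differences, combined with a hypothetical `slope_weight`), since the abstract does not give the authors' exact loss.

```python
import math


def edc_db(rir, floor=1e-12):
    """Energy decay curve in dB via Schroeder backward integration.

    The curve is normalised to 0 dB at t = 0; rir must contain some energy.
    """
    energy = [h * h for h in rir]
    tail = [0.0] * len(energy)
    acc = 0.0
    for n in range(len(energy) - 1, -1, -1):
        acc += energy[n]
        tail[n] = acc
    total = tail[0]
    return [10.0 * math.log10(max(e / total, floor)) for e in tail]


def composite_loss(pred_db, true_db, slope_weight=1.0):
    """Assumed composite loss: MSE on dB levels plus weighted MSE on decay slopes."""
    n = len(true_db)
    level = sum((p - t) ** 2 for p, t in zip(pred_db, true_db)) / n
    d_pred = [pred_db[i + 1] - pred_db[i] for i in range(n - 1)]
    d_true = [true_db[i + 1] - true_db[i] for i in range(n - 1)]
    slope = sum((a - b) ** 2 for a, b in zip(d_pred, d_true)) / (n - 1)
    return level + slope_weight * slope


if __name__ == "__main__":
    # Synthetic exponential decay: amplitude falls by 60 dB every 500 samples.
    rir = [10 ** (-3 * n / 500) for n in range(1500)]
    curve = edc_db(rir)
    shifted = [v + 1.0 for v in curve]
    print(curve[100], composite_loss(shifted, curve))
```

A constant 1 dB offset is penalized only by the level term, while a curve with the right level at one point but the wrong decay rate is penalized by both terms, which is the motivation for combining them.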

2605.20868 2026-05-21 cs.LG cs.AI cs.SY eess.SY

Runtime-Certified Bounded-Error Quantized Attention

具有运行时认证的有界误差量化注意力

Dean Calver

AI总结 本文提出了一种分层的KV缓存架构,通过在GPU内存中存储INT8键和INT4值,同时在系统RAM中保留FP16原始数据,实现了运行时认证的注意力计算;通过两项误差分解得到每头、每步的误差界,用以驱动自适应精度选择和多级回退流程,确保在需要时能恢复到精确的稠密注意力输出。

详情
Comments
32 pages, 1 figure
AI中文摘要

KV缓存量化降低了长上下文LLM推理的内存开销,但会引入通常仅通过经验验证的近似误差。现有系统依赖于平均情况下的鲁棒性,没有在运行时检测故障或从故障中恢复的机制。本文提出了一种分层的KV缓存架构,实现了运行时认证的注意力计算:INT8键和INT4值存储在GPU内存中,而FP16原始数据保留在系统RAM中以实现确定性回退。两项误差分解给出了每头、每步的误差界,分别针对(i)键量化引起的注意力分布失真和(ii)值重建误差。这些界在线计算,用于驱动自适应精度选择和多级回退阶梯,保证在需要时能恢复到精确的稠密注意力输出。在PG-19、NIAH和RULER基准上,对LLaMA 3.1-8B(上下文长度最长128K)的测试表明,该系统在语言建模和检索任务中的质量与稠密FP16 KV在噪声范围内一致,同时修复了朴素INT8/INT4基线中出现的灾难性故障。短上下文下对值敏感的任务暴露出压缩与保真度之间的可控权衡,可通过更严格的值容差或FP16值回退来消除。该认证是局部的(每头、每步),不保证端到端模型的正确性,但确保每次注意力计算要么相对于FP16参考有界,要么通过回退被精确恢复。这将KV缓存量化重新定义为经运行时验证的计算,而不是固定的近似。其目标不是单纯的加速,而是使激进的KV压缩能够在严格的质量约束下安全部署。

英文摘要

KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We present a tiered KV cache architecture that enables runtime-certified attention: INT8 keys and INT4 values are stored in GPU memory, while FP16 originals are retained in system RAM for deterministic fallback. A two-term error decomposition yields per-head, per-step bounds on (i) attention distribution distortion from key quantization and (ii) value reconstruction error. These bounds are computed online and used to drive adaptive precision selection and a multi-stage fallback ladder, which guarantees recovery to the exact dense attention output when required. Across PG-19, NIAH, and RULER benchmarks on LLaMA 3.1-8B with contexts up to 128K, the system matches dense FP16 KV quality within noise for language modelling and retrieval tasks, while recovering catastrophic failures observed in naive INT8/INT4 baselines. Value-sensitive tasks at short context expose a controlled trade-off between compression and fidelity, which can be eliminated via tighter value tolerances or FP16-value fallback. The certification is local (per-head, per-step) and does not guarantee end-to-end model correctness, but ensures that each attention computation is either bounded relative to an FP16 reference or exactly recovered via fallback. This reframes KV cache quantization as a runtime-verified computation rather than a fixed approximation. The goal is not raw speedups, but enabling safe deployment of aggressive KV compression under strict quality constraints.
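The general pattern of "compute a cheap bound online, fall back to the exact value when the bound is too loose" can be illustrated on a single attention logit. The sketch below uses symmetric per-vector INT8 quantization and a simple worst-case bound |q·(k − k̂)| ≤ (scale/2)·‖q‖₁; this is not the paper's two-term decomposition, which the abstract does not spell out, and all function names and the `tolerance` parameter are assumptions.

```python
def quantize_int8(vec):
    """Symmetric per-vector INT8 quantization; returns (codes, scale)."""
    scale = max(abs(v) for v in vec) / 127.0 or 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in vec]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logit_error_bound(query, scale):
    """Worst-case logit error: each key element is off by at most scale/2."""
    return 0.5 * scale * sum(abs(q) for q in query)


def certified_logit(query, key, tolerance):
    """Use the quantized key if its error bound is within tolerance, else fall back.

    Returns (logit, bound, mode); the exact key stands in for the FP16 original.
    """
    codes, scale = quantize_int8(key)
    bound = logit_error_bound(query, scale)
    if bound > tolerance:
        return dot(query, key), 0.0, "fallback"
    return dot(query, dequantize(codes, scale)), bound, "quantized"


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]
    k = [0.3, -0.7, 1.1, 0.05]
    print(certified_logit(q, k, tolerance=1.0))
    print(certified_logit(q, k, tolerance=1e-6))
```

The point of the design is that the bound depends only on the query and the quantization scale, so it can be checked without touching the full-precision key unless the fallback is actually triggered.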