arXivDaily: arXiv daily academic digest, updated Monday through Friday
EESS Electrical Engineering and Systems Science 111
2606.11857 2026-06-11 eess.SP cs.LG New submission

REACH: Interpretability-Driven Feature Identification and Architecture Compression for Multi-Channel Vehicular Channel Estimation

Simbarashe Aldrin Ngorima, Albert Helberg, Marelie H. Davel

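AI summary: Proposes the REACH framework, which uses gradient-based attribution to identify key time-frequency features and to compress the network. For IEEE 802.11p channel estimation it substantially reduces parameters and computation, and OOD generalisation degrades more slowly than in-distribution accuracy as compression increases.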

Comments
22 pages, 16 figures
Abstract

Multi-channel mixed-SNR training improves out-of-distribution (OOD) generalisation of deep learning channel estimators for IEEE 802.11p vehicular communications, yet the internal mechanism responsible for this remains unexplained. This work presents REACH (Relevance-based Explanation and Architectural Compression for cHannel estimators), a gradient-based interpretability framework that operates at two levels. Input-level attribution identifies a subset of time-frequency features consistently relevant across all evaluated channel conditions, enabling input dimensionality reduction with minimal performance loss. Filter-level attribution reveals a near-universal internal representation, providing a representational account of the observed OOD generalisation. Guided by the resulting filter taxonomy, relevance-guided architecture compression substantially reduces both the number of parameters and the number of floating-point operations (FLOPs) with sub-1 dB normalised mean square error (NMSE) degradation, and OOD generalisation degrades more slowly than within-distribution accuracy under increasing compression.
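The sub-1 dB figure quoted above is an NMSE difference on a decibel scale. The following is a minimal sketch of how NMSE in dB is commonly computed, assuming the usual definition (error energy divided by true-channel energy). The function name `nmse_db` and the toy channel values are illustrative assumptions and are not taken from the paper.

```python
import math

def nmse_db(h_true, h_est):
    """NMSE in dB between true and estimated channel coefficients.

    Assumes the common definition sum|h - h_hat|^2 / sum|h|^2.
    """
    err = sum(abs(t - e) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    return 10.0 * math.log10(err / ref)

# Toy example (hypothetical values): a 10% amplitude error on every coefficient.
h_true = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
h_est = [h * 0.9 for h in h_true]
baseline = nmse_db(h_true, h_est)
print(f"NMSE = {baseline:.2f} dB")
```

Under this definition, a compressed estimator with "sub-1 dB degradation" would score less than 1 dB above the uncompressed estimator's NMSE on the same test data.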

2606.11836 2026-06-11 cs.SD cs.AI eess.AS New submission

Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering

Haoning Xu, Zhaoqing Li, Huimeng Wang, Youjun Chen, Chengxi Deng, Mengzhe Geng, Xunying Liu

Affiliations: The Chinese University of Hong Kong; National Research Council Canada

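AI summary: Proposes a data-free, training-free compression method based on k-means channel-wise clustering, with fine-grained mixed-sparsity pruning obtained by varying the number of parameter clusters per layer. It yields substantially lower WER than magnitude-based pruning on HuBERT-large and Whisper-large-v3.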

Comments
Accepted by Interspeech 2026
Abstract

This paper presents a novel data-free and training-free compression approach for speech foundation models using channel-wise clustering via k-means. More fine-grained, mixed-sparsity pruning, obtained by varying the number of parameter clusters across layers, is also explored. Experiments conducted on the LibriSpeech dataset suggest that when operating with a pruning sparsity of 50% on HuBERT-large, consistent WER reductions of 27.73%/18.61% absolute (34.37%/21.91% relative) over magnitude-based pruning were obtained on the test-clean and test-other subsets before fine-tuning, and 0.19%/0.79% absolute (3.36%/4.62% relative) after fine-tuning with only 3 epochs. Similar WER reductions of 2.86%/5.02% absolute (59.21%/55.29% relative) were observed against magnitude-based pruning on Whisper-large-v3 at 10% sparsity, all with no significant WER increase relative to the uncompressed baseline.
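The abstract quotes each WER reduction both in absolute and in relative terms. Since a relative reduction is the absolute reduction divided by the reference WER, each pair implies the WER of the magnitude-based pruning reference. The sketch below back-computes those values; they are derived here by arithmetic only and are not figures reported in the abstract, and the helper name `implied_baseline_wer` is hypothetical.

```python
def implied_baseline_wer(abs_reduction, rel_reduction_pct):
    """Reference WER (%) implied by an (absolute, relative %) reduction pair."""
    return abs_reduction / (rel_reduction_pct / 100.0)

# (absolute reduction, relative reduction in %) pairs as quoted in the abstract.
quoted = {
    "HuBERT-large 50% sparsity, test-clean, before fine-tuning": (27.73, 34.37),
    "HuBERT-large 50% sparsity, test-other, before fine-tuning": (18.61, 21.91),
    "HuBERT-large 50% sparsity, test-clean, after fine-tuning": (0.19, 3.36),
    "HuBERT-large 50% sparsity, test-other, after fine-tuning": (0.79, 4.62),
    "Whisper-large-v3 10% sparsity, test-clean": (2.86, 59.21),
    "Whisper-large-v3 10% sparsity, test-other": (5.02, 55.29),
}

for name, (abs_red, rel_pct) in quoted.items():
    reference = implied_baseline_wer(abs_red, rel_pct)
    # The WER of the clustering method is the reference minus the absolute gain.
    print(f"{name}: magnitude pruning ~{reference:.2f}%, "
          f"clustering ~{reference - abs_red:.2f}%")
```

The large absolute gaps before fine-tuning correspond to a very high implied reference WER, which is consistent with the abstract's framing that the comparison is against magnitude-based pruning rather than against the uncompressed model.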

2606.11829 2026-06-11 eess.SP New submission

Parametric Channel Estimation with Hardware Impaired Hybrid Beamformers: Sensing, Communications, and Power Efficiency Tradeoffs

Enrique T. R. Pinto, Silvio Mandelli, Marcus Henninger, Markku Juntti

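AI summary: Studies how hardware impairments affect sensing and communications performance in hybrid beamforming architectures, introduces the concept of double-isotropy and a multiple-start SAGE algorithm, and finds that medium-resolution ADCs give the best balance between power consumption and performance.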

Abstract

Due to the high power consumption and hardware costs of fully digital arrays, hybrid beamformers are often considered a more economical alternative. Furthermore, using high-resolution analog-to-digital converters (ADCs) can also have prohibitive power consumption, which leads to lower-resolution converters being considered for radio frequency (RF) front-end design. The finite quantization resolution as well as the nonlinearities caused by the power amplifiers (PAs) and low noise amplifiers (LNAs) can have a substantial impact on system performance. While widely studied for communications, the impact of hardware impairments on sensing performance is considerably less explored. In this work, we study the interplay between hybrid beamforming architectures, hardware impairments, and sensing and communications performance. Additionally, we define the concept of double-isotropy for pilot-combiner pairs, formalizing the notion of a perfectly energy-fair beam sweep. The multiple start (MS) space alternating generalized expectation maximization algorithm (SAGE) is also introduced, aimed at addressing the optimization issues arising from parametric channel estimation (PCE) in hybrid beamformed systems. We then provide a set of numerical results assessing the impacts of beamformer architecture and ADC resolution on PCE, sensing, and communications performance. The results show that medium-resolution ADCs lead to the most power-efficient configurations, with the best tradeoff between power consumption and performance for the majority of beamforming architectures. Additionally, fully digital beamforming architectures with high-resolution converters can often be replaced by a hybrid beamformer setup with medium-resolution converters without significant performance loss, at a lower power consumption and overall hardware cost.

2606.11796 2026-06-11 eess.SY New submission

Comparative Evaluation of Transition Mechanisms for Adaptive Droop Gains in Parallel Grid-Forming Inverters

E. D. Gomez Anccas, E. A. MacPherson, J. Tegeler, D. Schulz

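AI summary: Addresses the transients caused by droop-gain switching in parallel grid-forming inverters by comparing hard switching, rate limiting, first-order IIR low-pass filtering, and S-curve transitions. Experiments show that S-curves reduce the active-power overshoot from 632.7 W to about 115 W and limit the frequency overshoot to about 0.003 Hz.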

Abstract

Uncertainty in standalone microgrid operation usually originates from mismatches between power references and forecasts. These deviations are compensated by grid-forming controlled units, which distribute the required power contribution based on their droop gains. To introduce an additional degree of flexibility, it is possible to treat droop gains as decision variables to redistribute active-power contributions according to system-level objectives. However, directly applying updated droop gain references from a supervisory layer to the primary controllers can introduce power and frequency transients. This paper investigates transition mechanisms for applying scheduled active-power droop gain changes during operation. Hard switching, rate-limited transition, first-order IIR low-pass filtering, and cubic as well as quintic S-curve transitions are compared experimentally on two parallel 15 kW grid-forming inverter units. The results show that shaping the droop gain trajectory significantly reduces transient deviations compared to hard switching. In the considered case study, the S-curve transitions provide the strongest transient mitigation, reducing the active-power overshoot from 632.7 W to approximately 115 W and limiting the frequency overshoot to about 0.003 Hz.
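The transition mechanisms named in the abstract can be sketched as different ways of shaping the droop gain between its old and new value. The abstract does not give the exact polynomials or filter parameters, so the sketch below assumes the standard smoothstep forms for the cubic and quintic S-curves and a textbook first-order IIR update; all function names and gain values are hypothetical.

```python
def hard(u):
    """Hard switching: jump to the new gain as soon as the transition starts."""
    return 1.0 if u > 0.0 else 0.0

def linear(u):
    """Rate-limited transition: constant slope over the transition window."""
    return u

def cubic_s(u):
    """Cubic S-curve (assumed smoothstep): zero slope at both ends."""
    return 3 * u**2 - 2 * u**3

def quintic_s(u):
    """Quintic S-curve (assumed smootherstep): zero slope and curvature at both ends."""
    return 10 * u**3 - 15 * u**4 + 6 * u**5

def droop_gain(t, t_start, duration, k_old, k_new, shape):
    """Droop gain at time t for a transition starting at t_start."""
    u = min(max((t - t_start) / duration, 0.0), 1.0)  # normalised time in [0, 1]
    return k_old + (k_new - k_old) * shape(u)

def iir_lowpass(k_old, k_new, alpha, steps):
    """First-order IIR low-pass applied to a step in the gain reference."""
    y, trajectory = k_old, []
    for _ in range(steps):
        y += alpha * (k_new - y)  # y[n] = y[n-1] + alpha * (target - y[n-1])
        trajectory.append(y)
    return trajectory

# Hypothetical gains: move from 1.0 to 2.0 over one second, sampled at the midpoint.
for shape in (hard, linear, cubic_s, quintic_s):
    print(shape.__name__, droop_gain(0.5, 0.0, 1.0, 1.0, 2.0, shape))
```

The S-curves differ from the linear ramp in that the gain starts and ends with zero rate of change, which is one plausible reason they would excite smaller power and frequency transients than hard switching.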

2606.11795 2026-06-11 eess.AS cs.SD New submission

Tight Boundary Prediction in Speaker Diarization Using Causal-Anticausal Consistency

Shota Horiguchi, Marc Delcroix, Naohiro Tawara, Takanori Ashihara, Atsushi Ando

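AI summary: To counter the loose predicted boundaries that result from training on loose annotations, generates tight pseudo labels with causal and anticausal models and refines them iteratively through co-training, recovering about 70% of the effect of tight-label training and improving downstream performance.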

Comments
Accepted to Interspeech 2026 (Long Paper Track)
Abstract

Multi-talker conversational automatic speech recognition data are often used to train speaker diarization models. Because such data prioritize semantic continuity, pauses and boundary margins are included within speech segments, resulting in loose annotations. Models trained on such data tend to internalize mechanisms that reproduce this looseness, although tight speech intervals are sometimes preferable for downstream applications. In this paper, we address the novel task of enabling models to produce tight predictions using loose labels. Our method generates tighter pseudo labels using causal and anticausal models, which are inherently incapable of learning loosening behavior. We further propose a co-training scheme that iteratively tightens labels and updates both models for more progressive refinement. Experimental results show that the proposed method recovers about 70% of the tightening effect achieved by ideal tight-label training and improves downstream performance.

2606.11766 2026-06-11 eess.AS cs.AI cs.CL cs.SD New submission

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Eungbeom Kim, Kyogu Lee

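AI summary: Proposes interleaved stacking to accelerate distillation training of speech foundation models. It avoids the performance degradation of existing stacking methods by keeping layer positions consistent, and is validated on SUPERB.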

Comments
Accepted by Interspeech 2026
Abstract

Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires additional training of the student model. However, the training efficiency of SFM distillation remains underexplored. In this work, we explore training acceleration of SFM distillation to speed up model deployment. We examine the potential of stacking, in which the model depth is progressively increased through training until the target model depth is reached. While existing stacking methods improve training speed, they suffer from performance degradation. To handle this limitation, we propose interleaved stacking, a novel stacking method that consistently preserves layer position throughout the stacking process. This property is particularly critical in SFMs, in which each layer encodes distinct layer-specific knowledge. We validate the effectiveness of the proposed method on SUPERB.

2606.11665 2026-06-11 eess.SP New submission

Quantization Limitations of Leakage Suppression in Self-Calibrating Monostatic Integrated Sensing and Communication MIMO Systems

Jan Adler, Florian Gast, Gerhard P. Fettweis, Rafael F. Schaefer

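AI summary: Studies how quantization noise limits the leakage suppression achievable by digital precoding in self-calibrating monostatic MIMO systems, derives a closed-form expression for its impact, and validates it through numerical analysis and hardware experiments.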

Abstract

Power leaking directly from transmitting into receiving radio-frequency chains is a key challenge in the realization of monostatic sensing applications with multi-antenna communication front-ends, to which a promising solution is digitally precoding transmitted signals for improved leakage suppression. While digital transmit precodings perform well in theory, real-world deployments typically exhibit severely degraded leakage suppression. This work investigates quantization noise as a primary factor limiting the performance of such precoding schemes. A closed-form solution predicting the impact of quantization noise on the performance of arbitrary digital joint leakage estimation and leakage suppression precodings is derived, numerically analyzed, and validated in a hardware testbed.
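As background for why quantization can cap suppression performance, the sketch below checks the textbook rule of thumb for a uniform quantizer driven by a full-scale sine, SQNR ≈ 6.02·b + 1.76 dB. This is the standard uniform-noise model and is not the closed-form result derived in the paper; the quantizer design, test frequency, and function names are assumptions made for illustration.

```python
import math

def quantize(x, bits):
    """Uniform mid-rise quantizer covering [-1, 1) with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    idx = math.floor(x / step)
    # Clamp so that x = +1 maps to the top level instead of overflowing.
    idx = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, idx))
    return (idx + 0.5) * step

def measured_sqnr_db(bits, n=20000):
    """Signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    signal = [math.sin(2 * math.pi * 0.123456789 * k) for k in range(n)]
    noise = [s - quantize(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10.0 * math.log10(p_signal / p_noise)

def rule_of_thumb_sqnr_db(bits):
    """Uniform-noise approximation: noise power step**2 / 12, sine power 1/2."""
    return 6.02 * bits + 1.76

for b in (6, 8, 10):
    print(b, round(measured_sqnr_db(b), 2), round(rule_of_thumb_sqnr_db(b), 2))
```

Each additional bit buys roughly 6 dB of quantization-noise headroom under this model, which gives a sense of scale for a noise floor that a digital canceller cannot see below.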

2606.11638 2026-06-11 eess.SY New submission

Violation-Informed Spatio-Temporal Adaptive Targeting Framework for EV-Driven Distribution System Expansion Planning

Linhan Fang, Xingpeng Li

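AI summary: Proposes a violation-informed spatio-temporal adaptive targeting framework that combines violation analysis, joint optimization, and spatio-temporal dimensionality reduction to resolve EV-induced voltage and current violations efficiently, greatly reducing computational complexity while preserving planning accuracy.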

Abstract

The rapid adoption of electric vehicles (EVs) can cause severe voltage drops and line current overloads in distribution networks, creating an urgent need for scalable expansion planning methods. This paper proposes a computationally efficient violation-informed spatio-temporal adaptive targeting (STAT) framework for EV-driven distribution system expansion planning. The framework first identifies potential voltage and current violations through a violation analysis model, and then mitigates them through a joint optimal expansion planning model that co-optimizes investment decisions for line reconductoring, shunt capacitors, and battery energy storage systems. To reduce computational burden, the proposed STAT-temporal criticality assessment (STAT-TCA) method extracts primitive stress events from annual operating data, derives an initial set of candidate planning horizons from signature-consistent segments, and selects a final transferable critical horizon set through cross-horizon validation based on optimization feasibility and cost. Meanwhile, the proposed STAT-adaptive spatial targeting (STAT-AST) method constructs device-specific spatial features for BESS and SC siting to retain compact yet high-impact candidate bus sets. Case studies on 33-bus and 240-bus distribution systems demonstrate that the proposed STAT framework can substantially reduce the temporal and spatial planning dimensions while preserving planning fidelity. Full-year validation further confirms that the resulting investment plans can eliminate EV-induced voltage and thermal violations while maintaining feasible BESS operations.

2606.11631 2026-06-11 eess.AS cs.SD New submission

Benchmarking Neural Speech Compression from a Rate-Distortion Perspective

Jun Xu, Zhengxue Cheng, Fengxi Zhang, Yuhan Liu, Li Song, Wenjun Zhang

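AI summary: Proposes ECC, an entropy-constrained codec that combines scalar quantization with a learned entropy model and achieves better rate-distortion performance than conventional and neural codecs at low bitrates.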

Abstract

Learning-based speech compression has achieved promising low-bitrate performance, but many neural speech codecs still describe quantized latents with preset-rate discrete symbols or apply entropy coding only after symbol generation. Such designs decouple representation learning from probability modeling, limiting their ability to exploit the non-uniform usage and temporal dependencies of learned speech latents. In this paper, we benchmark neural speech compression from a rate-distortion perspective and further investigate entropy-constrained coding for low-bitrate speech compression. We first formulate a unified learning-based speech coding pipeline and provide a benchmark-style analysis of recent neural speech codecs, showing that explicit probability modeling remains underexplored in learned speech compression. We then propose ECC, an Entropy-Constrained Codec that combines scalar quantization with a learned entropy model. ECC integrates hyperprior-based side information, channel-wise context modeling, latent residual prediction, and lightweight temporal modeling to estimate latent likelihoods for rate estimation during training and arithmetic coding during inference. To further improve low-bitrate efficiency, ECC introduces entropy skip, which omits highly predictable residual symbols using decoder-available scale estimates without transmitting additional skip masks. Extensive experiments show that ECC achieves a favorable low-bitrate rate-distortion trade-off over conventional and neural codec baselines, reducing BD-rate by 39.9% on ViSQOL and 76.3% on PESQ on average over two widely-used test sets. Ablation and diagnostic studies further validate the effectiveness of entropy modeling. Project Page: this https URL
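The abstract's "latent likelihoods for rate estimation" rests on a standard identity: a symbol with model probability p costs about -log2(p) bits under an ideal entropy coder. The sketch below illustrates that identity together with a simple skip rule. The thresholding rule is an assumption made for illustration; the abstract only states that skipping uses scale estimates available to the decoder, so no mask needs to be transmitted. All names and numbers are hypothetical.

```python
import math

def estimated_bits(likelihoods):
    """Ideal code length in bits for symbols with the given model probabilities."""
    return sum(-math.log2(p) for p in likelihoods)

def symbols_to_code(scales, threshold):
    """Indices still coded under an assumed rule: skip symbols whose predicted
    scale is below the threshold. Encoder and decoder can both evaluate this
    from the same scale estimates, so the decision itself costs no bits."""
    return [i for i, s in enumerate(scales) if s >= threshold]

# Hypothetical per-symbol likelihoods and predicted scales.
likelihoods = [0.5, 0.25, 0.125, 0.9]
scales = [1.2, 0.8, 0.5, 0.05]

kept = symbols_to_code(scales, threshold=0.1)
bits_all = estimated_bits(likelihoods)
bits_kept = estimated_bits([likelihoods[i] for i in kept])
print(f"all symbols: {bits_all:.3f} bits, after skip: {bits_kept:.3f} bits")
```

In this toy case the skipped symbol was already cheap because its likelihood was high, which matches the intuition that highly predictable symbols contribute little information and can be omitted at small distortion cost.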

2606.11622 2026-06-11 eess.SP New submission

Measurement-Based Analysis of Outdoor Massive MIMO Channel Characteristics over FR3 Frequency Band

Enrui Liu, Pan Tang, Haiyang Miao, Qi Zhen, Jianhua Zhang, Sen Wang

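AI summary: Based on measurements with massive MIMO platforms at 8 GHz and 15 GHz, analyzes channel parameters in UMa scenarios and finds that multipath at the higher frequency is more concentrated and more directional, while the lower frequency shows broader multipath distributions and more stable performance, offering guidance for multi-band MIMO modeling and 6G design.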

Comments
Accepted for presentation at EuCAP 2026. 5 pages, 4 figures, 3 tables
Abstract

The Frequency Range 3 (FR3) band is attracting increasing attention due to limited lower-frequency spectrum and growing mobile communication demand. This study experimentally investigates channel characteristics in Urban Macro (UMa) scenarios at 8 GHz and 15 GHz using a large-scale MIMO platform with time-division multiplexing (TDM). Key parameters, including root mean square (RMS) delay spread (DS) and angular spread (AS), were extracted and compared with 3rd Generation Partnership Project (3GPP) TR 38.901. Results reveal clear frequency-dependent behaviors: RMS delay spread remains nearly constant under line of sight (LOS) but decreases from 8 GHz to 15 GHz in non-line of sight (NLOS), indicating reduced multipath dispersion at higher frequencies. Both azimuthal spreads (including ASA and ASD) and elevation spreads (including ESA and ESD) exhibit a corresponding decrease with increasing frequency, demonstrating a consistent trend towards more directional propagation across all angular domains. Capacity analysis indicates that the 15 GHz channel slightly outperforms 8 GHz in both LOS and NLOS scenarios due to more concentrated multipath energy and larger dominant singular values. Higher frequencies exhibit greater directionality, whereas lower frequencies provide broader multipath distributions and more stable performance, offering valuable guidance for multi-band MIMO modeling and 6G system design.
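The RMS delay spread referred to above is the power-weighted standard deviation of the path delays in a power delay profile. A minimal sketch of that standard definition follows; the function name and the toy two-path profile are illustrative and are not measurement data from the study.

```python
import math

def rms_delay_spread(delays_ns, powers):
    """Power-weighted standard deviation of path delays (linear power units)."""
    total = sum(powers)
    mean_delay = sum(p * d for p, d in zip(powers, delays_ns)) / total
    second_moment = sum(p * d * d for p, d in zip(powers, delays_ns)) / total
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(second_moment - mean_delay ** 2, 0.0))

# Toy profile (hypothetical): two equal-power paths 100 ns apart.
two_path = rms_delay_spread([0.0, 100.0], [1.0, 1.0])
print(f"RMS delay spread = {two_path:.1f} ns")
```

A profile whose energy is concentrated in fewer, closer paths yields a smaller value, which is the sense in which the NLOS results indicate reduced multipath dispersion at 15 GHz. Angular spreads are defined analogously over angles, with extra care needed for angle wrap-around.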

2606.11596 2026-06-11 eess.SY cs.AI New submission

Model-Based and Data-Driven Hierarchical Control and Topology Co-Design for Robust Networked Systems

Shirantha Welikala, Zihao Song, Hai Lin, Panos J. Antsaklis

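AI summary: For networked systems composed of linear subsystems, proposes hierarchical control strategies that are either model-based or rely only on trajectory data, combining dissipativity theory with linear matrix inequalities to guarantee local and global dissipativity and to optimize the topology. The approach is applied to robust voltage regulation and current sharing in a DC microgrid.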

Comments
To be submitted to Automatica
Abstract

In this paper, we consider a class of networked systems comprising an interconnected set of linear subsystems, disturbance inputs, and performance outputs. Using dissipativity theory, we first propose a model-based hierarchical control design strategy to ensure the closed-loop networked system is dissipative from its disturbance inputs to performance outputs. This involves designing local controllers for each subsystem to enforce local dissipativity guarantees, which are then exploited to co-design distributed global controllers and the interconnection topology to enforce global dissipativity guarantees while optimizing interconnection topology costs. The overall design process requires only solving a sequence of linear matrix inequality (LMI) problems, thereby retaining compositionality and decentralizability while avoiding non-convex, iterative design processes that are inefficient and centralized. This model-based hierarchical control design strategy assumes knowledge of the subsystem dynamics, which may not be available in many real-world networked systems. Motivated by this, we also propose a data-driven hierarchical control design strategy that assumes only the availability of rich input-state-output trajectory data from the subsystems. The proposed data-driven design process assumes that the unknown disturbances affecting the subsystem dynamics are bounded by a quadratic matrix inequality (relaxing conventional bounds) and accounts for this by using the matrix S-lemma. Finally, the effectiveness of the proposed model-based and data-driven hierarchical control designs is illustrated for a networked system representing a DC microgrid, with the aim of enforcing robust (dissipative) voltage regulation and current sharing.

2606.11590 2026-06-11 hep-ex eess.SY New submission

A High-Precision Clock Synchronization System for the CEPC Accelerator

Jun Hu, Xin Zhou, Xiaoshan Jiang, Dapeng Jin

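AI summary: To meet the 30 ps synchronization requirement for 192 control nodes in the 100 km CEPC tunnel, proposes an enhanced White Rabbit-based system that replaces the DAC+VCXO chain with a DSPLL, adds GTX phase alignment, and uses a cascaded global-control architecture, achieving 3.38 ps point-to-point precision and 6.66 ps over a 12-level cascade.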

Comments
23 pages, 17 figures
Abstract

The Circular Electron Positron Collider (CEPC) distributes a reference clock to 192 control nodes along its 100 km underground tunnel. The required synchronization precision is 30 ps (standard deviation). We present an enhanced White Rabbit (WR)-based clock synchronization system designed to meet this requirement. A noise-budget analysis of the standard WR slave loop identifies the analog actuation chain (DAC + VCXO + multiplier PLL) and restart-induced timing uncertainty as the dominant limitations. In our redesigned node, the DAC+VCXO chain is replaced by a Si5345A DSPLL clock generator with DCO-based phase control, removing the board-level analog tuning stage. GTX transceiver phase alignment and manual byte-alignment fixing reduce restart uncertainty from 88.8 ps to 12 ps peak-to-peak. For multi-node operation, we introduce a cascaded global-control architecture with a PC-side PID auto-tuned by TD3 reinforcement learning and an on-chip-temperature feed-forward calibrated to -0.76 ps/°C. The measured point-to-point synchronization precision is 3.38 ps over 1 m of fiber and 3.92 ps over 50 km. In a 12-level cascade, the end-node precision reaches 6.66 ps at constant temperature and 7.30 ps under a 13 °C temperature swing. Synchronized-clock TIE jitter stays below 1 ps regardless of cascade depth. Restart uncertainty is 2.82 ps (std. dev.). A 4-level cascade operated stably for 25 hours of continuous monitoring. All measured metrics fall well within the CEPC 30 ps budget.

2606.11589 2026-06-11 eess.SY New submission

Large Language Models in Process Systems Engineering: Opportunities, Architectures, and Industrial Deployment Challenges

Bhushan Gopaluni, Vidya Kotamraju, Syon Bhushan

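AI summary: A systematic survey of large language model applications in process systems engineering across seven areas. It finds that LLMs perform well on natural-language tasks but still face challenges in real-time execution, constraint satisfaction, and formal safety guarantees.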

Abstract

Large Language Models (LLMs) have rapidly emerged as tools of interest across engineering disciplines, and Process Systems Engineering (PSE) is no exception. This survey provides a systematic review of LLM applications in PSE, organizing the literature into seven categories: (1) process design and engineering, (2) molecular design and synthesis, (3) process modeling and simulation, (4) time-series forecasting, (5) optimization and scheduling, (6) process control, and (7) fault detection and diagnosis. For each category, we summarize the state of the art, identify common methodological approaches, and critically assess demonstrated capabilities versus aspirational claims. We find that LLMs show genuine promise for tasks involving natural language, including querying documentation, synthesizing unstructured knowledge, and enabling flexible human-machine interaction. However, applications requiring real-time execution, constraint satisfaction, or formal safety guarantees remain challenging. We conclude by identifying open problems and productive research directions for the PSE community.

2606.11588 2026-06-11 eess.SP New submission

Antenna Coding and Digital Precoding for Limited Feedback MIMO Systems Using Pixel Antennas

Zhetong Li, Hongyu Li

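AI summary: To address the channel state information acquisition overhead introduced by pixel antennas, proposes a limited-feedback MIMO system based on codebooks and index feedback, jointly designs the antenna coder and digital precoder, and develops a low-complexity offline codebook construction algorithm. Performance exceeds that of conventional fixed-configuration antennas.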

Abstract

Pixel antennas enable antenna coding, a technique that provides more degrees of freedom in wave manipulation to enhance wireless communications. However, acquiring full channel state information (CSI) at the transmitter incurs prohibitive overhead due to the unique hardware constraints of pixel antennas. This paper thus proposes a limited feedback multi-input multi-output (MIMO) system using pixel antennas, where the antenna coder and digital precoder are designed based on pre-defined codebooks and efficient index feedback. We first derive the optimal digital precoder under practical power constraints, which provides insight into simplifying the joint codebook construction for the antenna coder and digital precoder. We then develop a low-complexity offline codebook construction algorithm that enables subsequent codebook designs for the antenna coder and digital precoder. Simulation results demonstrate that the proposed scheme significantly outperforms unconstrained MIMO systems using conventional antennas with fixed configurations.

2606.11581 2026-06-11 eess.AS cs.SD New submission

Sensitivity Analysis of Generative Spatial Audio Metrics: A Study on Responsiveness, Smoothness, and Symmetry

Purnima Kamath, Adrian S. Roman, Koichi Saito, Yuki Mitsufuji, Juan P. Bello

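AI summary: Proposes a framework for analyzing how sensitive generative spatial audio metrics are to changes in spatial parameters, defines three desired properties (responsiveness, smoothness, and symmetry), and, after evaluating standard metrics, finds that FAD with localization-specific embeddings and acoustic maps perform best.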

Comments
Accepted for publication at Interspeech 2026
Abstract

Evaluating generative spatial audio for First-Order Ambisonics (FOA) remains challenging due to a limited understanding of how metrics respond to changes in spatial parameters such as azimuth and elevation. We propose a framework to analyze metric sensitivity along continuous spatial trajectories, drawing on principles of sensitivity analysis in parametric sound synthesis. Using controlled FOA scenes with increasing scene complexity, we define three desiderata for metric behavior: Responsiveness, Smoothness, and Symmetry. We assess standard distribution-based and sample-based metrics, including Fréchet Audio Distance (FAD), intensity vectors, and acoustic maps. Our findings show that FAD using localization-specific embeddings and acoustic maps yield high Responsiveness and robust Smoothness and Symmetry across conditions, while intensity vectors degrade with increasing scene complexity. This is the first step towards investigating the sensitivity of metrics for generative spatial audio.

2606.11500 2026-06-11 eess.IV cs.CE cs.IT cs.LG q-bio.NC New submission

FlexiBrain: Resolution-Agnostic Voxel-Level Encoding for Native fMRI

Mo Wang, Wenhao Ye, Junfeng Xia, Minghao Xu, Hongkai Wen, Quanying Liu

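AI summary: Proposes FlexiBrain, a resolution-agnostic voxel-level encoding framework based on Mamba-JEPA that processes native fMRI data directly through dynamic patch resizing, avoiding destructive spatial standardization. It improves performance by up to 12 percentage points across five downstream tasks and substantially reduces preprocessing cost.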

Abstract

The success of large-scale deep learning models in neuroscience is fundamentally constrained by severe data heterogeneity. Native fMRI data aggregated from diverse sources exhibit substantial variation in both spatial and temporal resolutions. Consequently, most existing frameworks rely on lengthy, rigid preprocessing pipelines that enforce uniformity across datasets. This practice introduces two critical limitations: (1) potential degradation of subject-specific anatomical information; (2) significant computational overhead, often requiring hours of processing per subject. Here, we propose FlexiBrain, a resolution-agnostic voxel-level encoding framework for native fMRI based on Mamba-JEPA. FlexiBrain defines patch sizes in real-world physical units and employs dynamic patch resizing, thereby bypassing destructive spatial standardization while enabling direct ingestion of data in native space. We instantiate the framework using an efficient Mamba-JEPA backbone to model high-dimensional 4D fMRI signals. Across five diverse downstream neuroscience tasks, FlexiBrain consistently outperforms recent state-of-the-art methods, achieving gains of up to 12 percentage points without external data augmentation. Importantly, FlexiBrain functions as a seamless plug-in module, substantially reducing preprocessing costs and accelerating the development of robust voxel-level fMRI foundation models. Code is available at this https URL.
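Defining patch sizes in physical units means that the number of voxels per patch changes with each scan's native resolution. The sketch below illustrates that idea only: it converts a patch edge length in millimetres into a per-axis voxel count. The rounding rule, the function name `patch_shape_voxels`, and the example resolutions are assumptions, not details of the FlexiBrain implementation.

```python
def patch_shape_voxels(patch_mm, voxel_size_mm):
    """Voxels per patch along each axis for a cubic patch of edge patch_mm.

    voxel_size_mm is the scan's native voxel size (x, y, z) in millimetres.
    At least one voxel is kept per axis so coarse scans still yield a patch.
    """
    return tuple(max(1, round(patch_mm / v)) for v in voxel_size_mm)

# Hypothetical scans at different native resolutions, same 12 mm physical patch.
print(patch_shape_voxels(12.0, (2.0, 2.0, 3.0)))  # finer in-plane resolution
print(patch_shape_voxels(12.0, (3.0, 3.0, 4.0)))  # coarser acquisition
```

Because each patch covers the same physical extent regardless of acquisition, a model could in principle consume scans from different sites without first resampling them to a common grid, which is the motivation stated in the abstract.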

2606.11490 2026-06-11 cs.LG eess.SY 新提交

OmniLoc: A Geometry-Aware Foundation Model for Anchor-Free UE Localization Across Diverse Indoor Environments

OmniLoc: 一种几何感知的基础模型,用于跨多样室内环境的无锚点用户设备定位

Lei Chu, Yuning Zhang, Omer Gokalp Serbetci, Anushka Katiyar, Bassel Abou Ali Modad, Andreas F. Molisch

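AI summary: Proposes OmniLoc, presented as the first foundation model built directly on wireless measurements for this task, which achieves robust anchor-free localization across indoor environments through a unified input tokenization module, a geometry-aware Transformer, and a geometry-aware location estimation module, significantly outperforming existing methods.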

详情
AI中文摘要

由于建筑几何形状、可检测接入点(AP)集合以及接收信号异质性的显著变化,基于无线测量的室内定位在大规模部署中仍然具有挑战性。现有的基于学习的方法通常仅在有限环境下表现良好,并在环境变化下性能下降,使得在多样室内环境中进行鲁棒的无锚点定位变得极其困难。本文提出OmniLoc,一种环境交互式基础模型,用于跨多样室内环境的无锚点用户设备定位。据我们所知,OmniLoc是首个直接基于无线测量构建的用于此任务的基础模型。OmniLoc基于三个关键设计。首先,统一输入分词模块将异构无线测量转换为更易于学习的通用表示。其次,几何感知Transformer通过强调主导AP同时聚合来自辅助AP的互补证据,执行AP感知特征提取。第三,几何感知位置估计模块根据几何嵌入进行回归,以生成几何一致的位置预测。我们在大规模内部数据集和公共基准数据集上评估OmniLoc。结果表明,OmniLoc显著优于现有方法,当其设计组件集成时能持续改进现有骨干网络,并在跨环境评估中展现出强大的泛化能力。

英文摘要

Indoor localization from wireless measurements remains challenging in large-scale deployments due to substantial variation in building geometry, the set of detectable access points (APs), and the heterogeneity of received signals. Existing learning-based methods often perform well only in limited settings and degrade under environmental shifts, making robust anchor-free localization across diverse indoor environments notoriously difficult. In this paper, we present OmniLoc, an environment-interactive foundation model for anchor-free user equipment localization across diverse indoor environments. To the best of our knowledge, OmniLoc is the first foundation-model-based approach built directly on wireless measurements for this task. OmniLoc is built on three key designs. First, a unified input tokenization module converts heterogeneous wireless measurements into a common representation that is more amenable to learning. Second, a geometry-aware Transformer performs AP-aware feature extraction by emphasizing dominant APs while aggregating complementary evidence from supporting APs. Third, a geometry-aware location estimation module conditions regression on geometric embeddings to produce geometrically consistent location predictions. We evaluate OmniLoc on both a large-scale in-house dataset and a public benchmark dataset. Results show that OmniLoc significantly outperforms existing methods, consistently improves existing backbones when its design components are integrated, and demonstrates strong generalization in cross-environment evaluations.

2606.11474 2026-06-11 cs.LG eess.SY physics.acc-ph 新提交

Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems

基于马氏距离的潜在分布外检测用于时变系统中混合ES-DRL控制

Shaifalee Saxena, Alexander Scheinker

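AI summary: To address the performance degradation of reinforcement-learning controllers in time-varying systems, proposes out-of-distribution detection based on Mahalanobis distance in the latent space of a variational autoencoder, enabling adaptive switching to an extremum seeking controller; evaluated on particle accelerator control.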

详情
AI中文摘要

本文研究了非线性时变系统中基于马氏距离的潜在分布外(OOD)检测,用于测试时RL控制器切换。RL控制器可以在训练分布内快速控制高维系统,但当时间变化动力学产生未见过的观测时,其性能可能下降。我们考虑一个组合的ES-DRL控制器,其中RL提供快速的分布内动作,而有界极值搜索(ES)在OOD操作下提供鲁棒的模型无关控制。关键挑战在于决定何时切换。我们在分布内束流剖面观测上训练变分自编码器(VAE),并使用VAE潜在空间中的马氏距离在测试时检测OOD束流剖面。此OOD决策设置一个二元开关,选择RL控制器或ES控制器。我们在安全关键的粒子加速器控制中评估该方法。在此设置中,空间磁体运动产生RL训练期间未见过的OOD束流剖面。VAE潜在空间的可视化表明,所提方法识别出此OOD场景,并为组合控制器中RL和ES之间的切换提供可解释信号。

英文摘要

In this paper, we study Mahalanobis-guided latent out-of-distribution (OOD) detection for test-time RL controller switching in nonlinear time-varying systems. RL controllers can quickly control high-dimensional systems within the training distribution, but their performance can degrade when time-varying dynamics produce unseen observations. We consider a combined ES-DRL controller, where RL provides fast in-distribution actions and bounded extremum seeking (ES) provides robust model-independent control under OOD operation. The key challenge is deciding when to switch. We train a variational autoencoder (VAE) on in-distribution beam-profile observations and use Mahalanobis distance in the VAE latent space to detect OOD beam profiles at test time. This OOD decision sets a binary switch that selects either the RL controller or the ES controller. We evaluate the approach in safety-critical particle accelerator control. In this setting, spatial magnet motion creates OOD beam profiles that were not seen during RL training. Visualization of the VAE latent space shows that the proposed method identifies this OOD scenario and provides an interpretable signal for switching between RL and ES in the combined controller.
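
The switching rule described in the abstract can be sketched as follows, assuming the VAE encoder has already produced latent vectors. The function names, the ridge term, and the threshold value are hypothetical choices for illustration; the abstract does not state how the threshold is set.

```python
import numpy as np


def fit_latent_stats(z_train):
    """Mean and inverse covariance of in-distribution latent codes (rows = samples)."""
    mu = z_train.mean(axis=0)
    cov = np.atleast_2d(np.cov(z_train, rowvar=False))
    # A small ridge keeps the covariance invertible (an assumption, not from the abstract).
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mu, cov_inv


def mahalanobis(z, mu, cov_inv):
    """Mahalanobis distance of one latent vector from the training distribution."""
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))


def select_controller(z, mu, cov_inv, threshold):
    """Binary switch: 'RL' when in distribution, 'ES' when flagged as OOD."""
    return "ES" if mahalanobis(z, mu, cov_inv) > threshold else "RL"


rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 4))  # stand-in for encoded beam profiles
mu, cov_inv = fit_latent_stats(latents)
print(select_controller(np.zeros(4), mu, cov_inv, threshold=4.0))
print(select_controller(np.full(4, 10.0), mu, cov_inv, threshold=4.0))
```

In practice the threshold would be chosen from the distribution of distances on held-out in-distribution data; that choice is left open here.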

2606.11449 2026-06-11 eess.SP 新提交

Coherent Multiband OFDM Sensing via Low-Complexity Gap Reconstruction

通过低复杂度间隙重建的相干多频带OFDM感知

Lorenzo Pucci, Leonardo Pucci, Andrea Giorgetti

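AI summary: To address the spectral gap problem in multiband OFDM sensing for integrated sensing and communication, proposes a low-complexity iterative reconstruction method comprising delay-domain equalization and iterative apodization; it stays close to full-band performance for moderate gaps, with complexity independent of the number of targets.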

详情
Comments
6 pages; This paper was accepted for presentation at the IEEE PIMRC 2026
AI中文摘要

本文研究了集成感知与通信(ISAC)框架内的相干多频带正交频分复用(OFDM)感知。我们考虑一种频带内配置,其中两个等宽感知子带在同一OFDM信道内对称分配,而中心部分仍可用于通信。我们解决了由频谱间隙引起的缺失频域样本的重建以及由此产生的延迟剖面中栅瓣的抑制问题。为此,我们提出了一种低复杂度迭代重建方法,包括初始延迟域均衡阶段和基于迭代切趾的算子,并强制执行数据一致性。多目标场景的性能结果表明,所提方法在中等间隙大小下保持接近全频带参考,仅因残余栅瓣在较大间隙下性能下降。与基于压缩感知的正交匹配追踪(OMP)基线相比,随着目标数量增加,尤其是在实际相关的低信噪比(SNR)区域,它表现出更有利的性能趋势,同时其复杂度缩放与估计的目标数量无关。

英文摘要

This paper investigates coherent multiband orthogonal frequency division multiplexing (OFDM) sensing within an integrated sensing and communication (ISAC) framework. We consider an intra-band configuration in which two sensing subbands of equal width are allocated symmetrically within the same OFDM channel, while the central portion remains available for communication. We address the reconstruction of missing frequency-domain samples induced by the spectral gap and the suppression of the resulting grating lobes in the delay profile. To this end, we propose a low-complexity iterative reconstruction method consisting of an initial delay-domain equalization stage and an iterative apodization-based operator with data-consistency enforcement. Performance results for multi-target scenarios show that the proposed approach remains close to the full-band reference for moderate gap sizes and degrades only for larger gaps because of residual grating lobes. Compared with the compressed-sensing-based orthogonal matching pursuit (OMP) baseline, it exhibits a more favorable performance trend as the number of targets increases, especially in the practically relevant low-signal-to-noise ratio (SNR) regime, while offering a complexity scaling that is independent of the estimated number of targets.

2606.11432 2026-06-11 eess.SP cs.IT math.PR 新提交

Additive Noise, Shift Recovery, and Signed Signals in the Cumulative Distribution Transform

累积分布变换中的加性噪声、位移恢复与有符号信号

Harbir Antil, Ratna Khatri, Aryan Saxena

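AI summary: Studies the sensitivity of the cumulative distribution transform to additive noise, derives a first-order expansion and applies it to shift recovery, proposes explicit estimators with stability bounds, and extends the approach to signed signals.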

详情
AI中文摘要

累积分布变换(CDT)是一种基于分位数的传输表示,可精确线性化正密度的一维平移。我们研究该结构在加性扰动下的行为,以及如何利用它进行位移恢复。在局部非退化条件下,我们推导出一阶展开,表明物理空间中的加性噪声通过噪声的原函数(由倒数密度加权)在CDT空间中引起非局部扰动。这给出了变换域敏感性的显式描述,并特别表明扰动在低密度区域被放大。当物理空间扰动建模为中心高斯随机场时,诱导的一阶CDT扰动也是高斯的,具有显式协方差核。然后我们利用该结构研究CDT坐标下的恢复。在已知模板情况下,传输位移通过投影到常数模式获得,给出显式估计器,并在无噪声情况下具有精确性,在扰动下具有稳定性界。在未知模板情况下,多次观测允许联合恢复位移和公共模板(直至自然常数模式规范),导致简单的去位移-平均过程。我们还考虑了基于有符号累积分布变换(SCDT)的有符号信号类比,其中位移通过特征匹配数值估计,未知模板通过交替对齐和平均恢复。数值实验验证了扰动分析,并展示了密度值信号和有符号信号的有效恢复。

英文摘要

The cumulative distribution transform (CDT) is a quantile-based transport representation that exactly linearizes one-dimensional translations of positive densities. We study how this structure behaves under additive perturbations and how it can be exploited for shift recovery. Under a local nondegeneracy condition, we derive a first-order expansion showing that additive noise in physical space induces a nonlocal perturbation in CDT space through the primitive of the noise, weighted by the reciprocal density. This yields an explicit description of transform-domain sensitivity and shows, in particular, that perturbations are amplified in low-density regions. When the physical-space perturbation is modeled as a centered Gaussian random field, the induced first-order CDT perturbation is again Gaussian, with an explicit covariance kernel. We then use this structure to study recovery in CDT coordinates. In the known-template setting, the transport shift is obtained by projection onto the constant mode, giving an explicit estimator together with exactness in the noiseless case and a stability bound under perturbations. In the unknown-template setting, multiple observations permit joint recovery of the shifts and a common template up to the natural constant-mode gauge, leading to a simple de-shift-and-average procedure. We also consider a signed-signal analogue based on the signed cumulative distribution transform (SCDT), where shifts are estimated numerically by feature matching and unknown templates are recovered by alternating alignment and averaging. Numerical experiments validate the perturbation analysis and illustrate effective recovery for both density-valued and signed signals.
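
The known-template case can be illustrated numerically. The sketch below assumes a uniform reference on [0, 1], so that the CDT of a positive density reduces to its quantile function; a translation by mu then adds the constant mu to the transform, and averaging the difference of the two transforms (the projection onto the constant mode) recovers the shift. The grid, the Gaussian template, and the function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np


def cdt(density, x, u):
    """Quantile function of a positive density sampled on grid x, evaluated at levels u."""
    # Cumulative trapezoidal integration gives the CDF on the grid.
    increments = 0.5 * (density[1:] + density[:-1]) * np.diff(x)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    cdf /= cdf[-1]  # normalise so the density integrates to one
    return np.interp(u, cdf, x)  # invert the CDF by interpolation


def estimate_shift(observed, template, x, u):
    """Mean difference of the two transforms, i.e. the constant-mode component."""
    return float(np.mean(cdt(observed, x, u) - cdt(template, x, u)))


x = np.linspace(-10.0, 10.0, 4001)
u = np.linspace(0.01, 0.99, 99)  # stay away from the low-density tails
template = np.exp(-0.5 * x**2)
observed = np.exp(-0.5 * (x - 1.5) ** 2)  # template translated by 1.5
print(estimate_shift(observed, template, x, u))
```

Restricting the levels u to the interior reflects the abstract's remark that perturbations are amplified where the density is low.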

2606.11429 2026-06-11 eess.AS cs.CL cs.SD 新提交

Gumbel-BEARD: Automatic Layer Selection for Self-Supervised Adaptation of Whisper in Low-Resource Domains

Gumbel-BEARD:低资源领域Whisper自监督自适应的自动层选择

Zilai Wang, Natarajan Balaji Shankar, Mohan Shi, Kaiyuan Zhang, Abeer Alwan

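AI summary: Proposes the Gumbel-BEARD framework, which automatically selects Whisper encoder layers via a trainable Gumbel-Softmax selector and uses a BEST-RQ self-supervised objective for adaptation to low-resource domains, achieving state-of-the-art word error rates on child speech datasets and up to 6% relative WER reduction on dialectal speech.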

详情
Comments
Accepted by Interspeech 2026
AI中文摘要

语音基础模型在低资源领域常因领域不匹配和数据稀缺而表现不佳。我们提出Gumbel-BEARD,一种领域自适应框架,通过端到端可训练的硬Gumbel-Softmax选择器自动选择Whisper编码器层。它利用BEST-RQ目标实现自监督自适应,无需手动调整即可动态适应目标声学特征。在MyST儿童语音语料库上的实验证明了其效率和可扩展性:使用10小时标注数据进行微调,我们的方法匹配了在完整133小时标注集上训练的完全监督基线。我们在MyST上使用Whisper-medium建立了8.21%的新最先进词错误率(WER),在OGI自发言语数据集上使用Whisper-small达到11.06%。在CORAAL上的评估进一步证实了对成人方言领域偏移的鲁棒性,相对WER降低高达6%,突显了我们的方法对多样低资源条件的泛化能力。

英文摘要

Speech foundation models often struggle in low-resource domains due to domain mismatch and data scarcity. We propose Gumbel-BEARD, a domain adaptation framework that automates Whisper encoder layer selection via an end-to-end trainable hard Gumbel-Softmax selector. It enables self-supervised adaptation with a BEST-RQ objective that dynamically adapts to target acoustic characteristics without manual tuning. Experiments on the MyST child speech corpus demonstrate efficiency and scalability: with 10 h of labeled data for fine-tuning, our method matches a fully supervised baseline trained on the complete 133 h labeled set. We establish new state-of-the-art word error rates (WERs) of 8.21% using Whisper-medium on MyST and 11.06% using Whisper-small on the OGI Spontaneous dataset. Evaluation on CORAAL further confirms robustness to adult dialectal domain shifts, with up to 6% relative WER reduction, highlighting the generalizability of our approach to diverse low-resource conditions.

2606.11400 2026-06-11 cs.SD cs.AI eess.AS 新提交

Steering Where to Listen: Instruction-Based Activation Steering Redirects Temporal Attention in Large Audio-Language Models

引导听哪里:基于指令的激活操控重定向大型音频语言模型中的时间注意力

Tsung-En Lin, Hung-Yi Lee

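AI summary: Proposes instruction-based vector steering, which contrasts activations under different instructions to redirect temporal attention over audio tokens, enabling training-free sound event localization that substantially outperforms direct prompting and random baselines.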

详情
AI中文摘要

大型音频语言模型(LALMs)在音频理解方面表现出色,但很少揭示它们关注音频信号的哪个部分。我们引入了基于指令的向量操控,该方法通过对比不同指令提示下的激活来构建操控向量,同时保持音频不变。通过对LALM注意力的系统探测,我们发现——与标准提示或基于音频的操控不同——这种干预显著重新分配了分配给音频令牌的时间注意力,将其集中在声学相关的区域。然后我们展示了这种注意力转移在行为上是有意义的:在受控的三事件设置中,读取由操控引起的最大注意力变化的时间位置,可以恢复查询声音事件的位置,而无需任何训练,在Qwen2-Audio和Audio Flamingo 3上分别达到60.87%和68.72%与真实区间的重叠,远高于直接提示(31.84%,46.75%)和随机基线(27.74%)。我们的结果表征了LALMs中基于指令的操控的机制特性,并为这些模型编码的潜在时间结构提供了一种无需训练的探测方法。

英文摘要

Large Audio-Language Models (LALMs) excel at audio understanding but expose little about where in an audio signal they attend. We introduce instruction-based vector steering, which constructs a steering vector by contrasting activations from differently instructed prompts while keeping the audio fixed. Through a systematic probe of LALM attention, we find that, unlike standard prompting or audio-based steering, this intervention significantly redistributes the temporal attention allocated to audio tokens, concentrating it on acoustically relevant regions. We then show that this attention shift is behaviorally meaningful: in a controlled three-event setting, reading out the temporal position of maximal steering-induced attention change recovers the location of a queried sound event without any training, attaining 60.87% and 68.72% overlap with ground-truth intervals on Qwen2-Audio and Audio Flamingo 3, far above direct prompting (31.84%, 46.75%) and random baselines (27.74%). Our results characterize a mechanistic property of instruction-based steering in LALMs and provide a training-free probe for the latent temporal structure these models encode.
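
A generic contrastive steering vector can be sketched as below. The difference-of-means construction, the scaling factor `alpha`, and the function names are assumptions borrowed from common activation-steering practice; the abstract does not specify the layer, the contrast, or the scaling the authors use.

```python
import numpy as np


def steering_vector(acts_instructed, acts_baseline):
    """Difference of mean activations between two prompt conditions (rows = samples)."""
    return acts_instructed.mean(axis=0) - acts_baseline.mean(axis=0)


def apply_steering(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to a hidden state (broadcasts over tokens)."""
    return hidden + alpha * vector


# Toy activations standing in for hidden states collected with the audio held fixed.
acts_a = np.array([[1.0, 2.0], [3.0, 4.0]])
acts_b = np.zeros((2, 2))
v = steering_vector(acts_a, acts_b)
print(apply_steering(np.ones(2), v, alpha=0.5))
```

In a real model the arrays would come from a chosen layer of the LALM, and the steered hidden state would be written back during the forward pass.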

2606.11386 2026-06-11 cs.CL cs.AI eess.AS 新提交

Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering

通过激活引导克服全双工口语语言模型中的状态惯性

Cheng-Kuang Chang, Kai-Wei Chang, Alexander H. Liu, James Glass

Affiliation: MIT CSAIL (MIT Computer Science and Artificial Intelligence Laboratory)

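AI summary: To address the delayed response of full-duplex spoken language models to user interruptions, proposes activation steering with a perception vector, which substantially improves interruption comprehension without fine-tuning.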

详情
AI中文摘要

全双工口语语言模型(FD-SLMs)通过允许模型同时听和说实现无缝语音交互,但其协调听与说的内部机制尚未充分探索。我们分析了FD-SLM隐藏表示中编码的预测行为,发现它们表现出特定流的预测模式:在听时,它们优先预测传入的用户流;而在说时,它们优先预测模型输出流。基于这一观察,我们表明FD-SLMs动态调节其内部预测焦点在两个状态之间:与模型输出生成一致的生成状态和与传入用户输入一致的感知状态。然而,这种调节可能滞后于对话上下文的突然变化。在用户打断期间,模型在过渡到感知状态之前短暂地偏向生成状态,导致其错过传入输入的开头。我们将这种延迟的内部过渡称为状态惯性。为了量化其下游影响,我们引入了零缓冲基准(ZBB),这是一个用于评估当用户语音突然开始时即时中断理解能力的诊断基准。我们使用响应正确性和初始词出现率(IWOR)来评估这一设置。最后,我们通过使用感知向量的激活引导来缓解状态惯性,这是一种无需训练且计算开销很小的干预措施。在多个最先进的FD-SLMs上,激活引导显著改善了中断处理;例如,在PersonaPlex上,它将正确性从28%提高到45%,将IWOR从40%提高到72%,而无需任何微调。

英文摘要

Full-duplex spoken language models (FD-SLMs) enable seamless speech interaction by allowing models to listen and speak simultaneously, yet the internal mechanism by which they coordinate listening and speaking remains underexplored. We analyze the predictive behavior encoded in FD-SLM hidden representations and find that they exhibit stream-specific predictive patterns: during listening, they preferentially predict the incoming user stream, whereas during speaking, they preferentially predict the model output stream. Building on this observation, we show that FD-SLMs dynamically modulate their internal predictive focus between two states: a generative state aligned with model output generation and a perceptive state aligned with incoming user input. However, this modulation can lag behind abrupt changes in conversational context. During user interruptions, the model remains transiently biased toward the generative state before transitioning into the perceptive state, causing it to miss the beginning of the incoming input. We term this delayed internal transition state inertia. To quantify its downstream impact, we introduce the Zero-Buffer Benchmark (ZBB), a diagnostic benchmark for evaluating immediate interruption comprehension when user speech begins abruptly. We evaluate this setting using response correctness and initial-word occurrence rate (IWOR). Finally, we mitigate state inertia through activation steering with a perception vector, a training-free intervention with little additional computational overhead. Across multiple state-of-the-art FD-SLMs, activation steering substantially improves interruption handling; for example, on PersonaPlex, it improves correctness from 28% to 45% and IWOR from 40% to 72% without any fine-tuning.

2606.11373 2026-06-11 eess.SY 新提交

From Symmetry to Stability: Quantifying Converter Grid Impedance Asymmetry as Indicator of Stability Margin

从对称到稳定:量化变流器电网阻抗不对称性作为稳定裕度的指标

Chirag Ramgopal Shah, Marta Molinas, Sjur Føyen, Roy Nilsen

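AI summary: Proposes the Asymmetricity Quantification Index (AQI), which uses sequence-domain impedance analysis to link system asymmetricity directly to the stability margin, shows that instability is dominated by asymmetric control loops and operating points, and validates the analysis in hardware-in-the-loop experiments.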

详情
AI中文摘要

尽管变流器控制器中的对称性对于鲁棒稳定裕度是可取的,但系统级不对称性与不稳定性之间的直接联系尚未明确建立。变流器控制通过直流母线电压控制、锁相环和功率同步环等环路引入三相不对称性。此外,两电平电压源变流器固有的不对称拓扑结构(将直流电压转换为三相平衡组)是不对称性传播到控制结构的根本原因。因此,建立系统不对称性(而非仅控制不对称性)与稳定裕度之间的直接关系对于理解潜在的不稳定机制至关重要。本文利用从互联变流器-电网阻抗的序列域表示导出的不对称性量化指标(AQI)来量化不对称性。在该域中,通过对称矩阵的定义识别对称性,并将其作为衡量不对称性的基准。一个稳健且广义的分析将AQI与稳定裕度相关联,包括连接到电网的跟网型和构网型控制结构。研究发现,不稳定性源于组合变流器-电网系统中不对称性的增加,这主要由不对称控制回路和运行点主导。因此,在不损害控制器功能的前提下降低不对称性可以提高稳定裕度。该分析在控制硬件在环和功率硬件在环环境中得到验证。

英文摘要

Although symmetricity in the converter controller is desirable for robust stability margins, a direct link between system-level asymmetricity and instability has yet to be clearly established. Converter control introduces three-phase asymmetricity through loops such as DC-link voltage control, a phase-locked loop, and a power synchronization loop. Furthermore, the inherently asymmetric topology of the two-level voltage-source converter, which converts a DC voltage into a three-phase balanced set, acts as the underlying origin of the asymmetries that propagate into the control structure. Consequently, establishing a direct relationship between system asymmetricity (rather than control asymmetricity alone) and the stability margin is essential for understanding the underlying instability mechanisms. In this work, asymmetricity is quantified using the Asymmetricity Quantification Index (AQI), derived from the sequence-domain representation of the interconnected converter-grid impedance. Within this domain, symmetricity is identified through the definition of symmetrical matrices, which serve as the benchmark against which asymmetricity is measured. A robust and generalized analysis correlates AQI with the stability margin, including both grid-following and grid-forming control structures connected to the power grid. It is found that instability arises from increased asymmetricity in the combined converter-grid system, which is dominated by asymmetric control loops and operating points. Thus, reducing asymmetricity without compromising controller functionality can improve stability margins. The analysis is validated in both control-hardware-in-the-loop and power-hardware-in-the-loop environments.

2606.11371 2026-06-11 cs.CL cs.AI eess.AS eess.SP 新提交

The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales

人类与AI生成语言的动态:语义如何在不同时间尺度上波动

Han-Jen Chang, Yasir Çatal, Angelika Wolman, Agustín Ibáñez, David Smith, I-Wen Su, Kai-Yuan Cheng, Georg Northoff

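AI summary: Proposes a semantic-timescale analysis pipeline that uses autocorrelation-window measures (ACW-0) to quantify the temporal organization of semantic specificity and contextual similarity in human and AI-generated speech; longer ACW-0 is associated with more generic vocabulary, and the association is attenuated after shuffling.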

详情
Comments
45 pages, 4 figures, 4 tables. Accepted manuscript; published in Computer Speech & Language
AI中文摘要

口语,无论是人类还是大型语言模型(LLM)产生的,都会随时间展开,具有变化的语义内容。然而,我们仍然缺乏简单、可解释的时间序列特征来捕捉通用与特定内容如何随时间分布,并可用于比较人类和AI生成的语音。我们引入了一个语义时间尺度分析流程,将带有时间戳的词级转录转换为语义时间序列。对于每个口语叙述,我们计算(i)基于WordNet词深度的语义特异性,以及(ii)基于SBERT嵌入的上下文相似性,并使用自相关窗口度量(ACW-0及相关指标)量化其时间依赖性。然后,我们将原始语音与多种随机化对照进行比较,这些对照选择性地破坏词汇身份、时间顺序和词时长。在人类朗读的自传叙述、TTS朗读和LLM生成的文本(通过TTS渲染)中,我们发现语义时间序列中ACW-0较长的片段往往包含更多通用词汇,而ACW-0较短的片段则富含更具体的词汇。当词序和计时被随机化时,这些关联被强烈削弱或消除,表明基于ACW的度量捕捉了语义内容超越静态词汇分布的非平凡时间组织。我们的结果表明,基于ACW的语义时间尺度是分析和比较人类与AI生成语音时间结构的有用特征系列。

英文摘要

Spoken language, whether produced by humans or large language models (LLMs), unfolds over time with varying semantic content. However, we still lack simple, interpretable time-series features that capture how generic versus specific content is distributed over time, and that can be used to compare human and AI-generated speech. We introduce a semantic-timescale analysis pipeline that turns word-level transcripts with timestamps into semantic time-series. For each spoken narrative, we compute (i) semantic specificity using WordNet-based word depth and (ii) contextual similarity using SBERT embeddings, and quantify their temporal dependence using autocorrelation-window measures (ACW-0 and related metrics). We then compare original speech to multiple shuffled controls that selectively disrupt lexical identity, temporal order, and word duration. Across human-read autobiographical narratives, TTS readings, and LLM-generated texts rendered with TTS, we find that segments with longer ACW-0 in the semantic time-series tend to contain more generic vocabulary, whereas segments with shorter ACW-0 are enriched in more specific words. These associations are strongly attenuated or abolished when word order and timing are randomized, indicating that ACW-based measures capture non-trivial temporal organization of semantic content beyond static lexical distributions. Our results suggest that ACW-based semantic timescales are a useful family of features for analyzing and comparing the temporal structure of human and AI-generated speech.
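
A minimal sketch of an ACW-0 computation follows, taking ACW-0 to be the first lag at which the sample autocorrelation reaches zero. That is a common definition, but it is an assumption with respect to this paper, as are the function name and the toy series; the input must not be constant.

```python
import numpy as np


def acw0(series):
    """First lag (in samples) at which the autocorrelation is zero or negative.

    Returns the series length if the autocorrelation never reaches zero.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # One-sided sample autocorrelation, normalised so that lag 0 equals 1.
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf = acf / acf[0]
    non_positive = np.flatnonzero(acf <= 0)
    return int(non_positive[0]) if non_positive.size else x.size


t = np.arange(1000)
slow = np.sin(2 * np.pi * t / 100)  # slowly varying series: long timescale
fast = np.tile([1.0, -1.0], 50)     # alternating series: short timescale
print(acw0(slow), acw0(fast))
```

Applied to a per-word semantic specificity or similarity series, a larger value would indicate that the semantic signal stays correlated with itself over more words.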

2606.11366 2026-06-11 eess.SY 新提交

Probabilistic Repair Logistics Modeling for Utility-Scale PV Inverter Fleets Using Event-Driven Simulation

基于事件驱动仿真的公用事业规模光伏逆变器机群概率维修物流建模

Jinlei Wei, Yongxin Zhang, Guanyu Tian

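AI summary: Proposes an event-driven Monte Carlo framework that models repair logistics with opportunistic scheduling and a two-component VaR-style mixture distribution, calibrated by Wasserstein distance; it reproduces the bimodal structure observed in the field and shows that 51.2% of units are accommodated through opportunistic insertion.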

详情
AI中文摘要

随着可再生能源系统的扩展,逆变器的可用性对电网可靠性和经济性日益重要,然而光伏逆变器维修物流的建模仍然不足。本文提出了一个针对具有并行生产线的集中维修设施的事件驱动蒙特卡洛框架,捕捉从行政预等待和运输到健康驱动维修和返回库存的完整维修周期。该模型引入了机会调度,利用强制等待期将额外单元插入临时空闲的生产线,从而在不增加产能的情况下提高吞吐量。阶段持续时间由两分量VaR风格的混合分布表示,用于常规和重尾延迟,而连续健康评分决定维修完成。通过最小化模拟与经验维修持续时间分布之间的一维Wasserstein距离进行校准,该模型应用于43个现场观察的维修案例,以53.3天的Wasserstein距离再现了经验双峰结构。结果表明,51.2%的单元通过机会插入完成,表明等待期提供了显著的可恢复调度资源。

英文摘要

As renewable energy systems expand, inverter availability becomes increasingly important for grid reliability and economics, yet photovoltaic inverter repair logistics remain under-modeled. This paper presents an event-driven Monte Carlo framework for a centralized repair facility with parallel production lines, capturing the full repair cycle from administrative pre-wait and transport to health-driven repair and return-to-inventory. The model incorporates opportunistic scheduling that uses mandatory hold periods to insert additional units onto temporarily idle lines, improving throughput without added capacity. Stage durations are represented by a two-component VaR-style mixture distribution for routine and heavy-tailed delays, while a continuous health score determines repair completion. Calibrated by minimizing the one-dimensional Wasserstein distance between simulated and empirical repair-duration distributions, the model is applied to 43 field-observed repairs, reproducing the empirical bimodal structure with a Wasserstein distance of 53.3 days. Results show that 51.2% of units are accommodated through opportunistic insertion, indicating that hold periods provide a significant recoverable scheduling resource.
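
The calibration objective can be illustrated with a small sketch. For two empirical samples of equal size, the one-dimensional Wasserstein-1 distance is the mean absolute gap between their sorted values. The mixture sampler below uses exponential components and made-up parameters purely as placeholders; the paper's VaR-style mixture is not specified in the abstract.

```python
import numpy as np


def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch only handles samples of equal size")
    return float(np.mean(np.abs(a - b)))


def sample_durations(rng, n, p_tail, routine_mean, tail_mean):
    """Two-component mixture: routine delays most of the time, long delays otherwise.

    Exponential components are a placeholder assumption.
    """
    is_tail = rng.random(n) < p_tail
    routine = rng.exponential(routine_mean, n)
    tail = rng.exponential(tail_mean, n)
    return np.where(is_tail, tail, routine)


rng = np.random.default_rng(0)
simulated = sample_durations(rng, 43, p_tail=0.2, routine_mean=30.0, tail_mean=200.0)
observed = sample_durations(rng, 43, p_tail=0.3, routine_mean=40.0, tail_mean=180.0)
print(wasserstein_1d(simulated, observed))
```

A calibration loop would repeat the simulation over candidate parameters and keep the set that minimises this distance against the field data.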

2606.11362 2026-06-11 eess.SY 新提交

An Admittance-Based Inverter Connection Screening Tool for Small-Signal System Strength

基于导纳的逆变器并网小信号强度筛选工具

Andreas Hadjileonidas, Debargha Brahma, Yue Zhu, Timothy C. Green

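AI summary: To address small-signal instability caused by high penetration of inverter-based resources, proposes an admittance-based inverter connection screening tool that efficiently predicts the impact of candidate inverter configurations on small-signal system strength from their admittances at critical modal frequencies and the system's admittance spectrum, avoiding analytical models; its accuracy is verified against model-based studies.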

详情
Comments
10 pages, 10 figures, 9 tables
AI中文摘要

随着逆变器资源(IBRs)高渗透电力系统中小信号失稳(特别是次同步振荡)的频繁发生,新IBR并网规划变得日益重要且具有挑战性。此类并网对小信号稳定性的影响并不总是直接的,因为它强烈依赖于连接位置、逆变器运行模式、控制配置、参数化及运行条件。本文提出一种逆变器连接筛选工具(ICST),能够高效准确地评估潜在逆变器配置对小信号系统强度的影响。它可以在候选配置中识别出最适合给定连接位置的逆变器配置,避免降低小信号系统强度,甚至能增强它。因此,在保持小信号稳定性的同时,可以支持更高的IBR渗透率。ICST利用候选逆变器配置在关键模态频率下的导纳以及系统的导纳谱来评估它们,从而避免了解析模型的需求。基于ICST的规划流程可支持系统运营商、资产所有者和IBR开发商在规划研究的不同阶段进行决策,并通过修改的IEEE 57节点系统进行了演示。与基于模型的对比研究证明了ICST在预测逆变器并网模态影响方面的准确性及其在选择合适逆变器控制配置方面的有效性。

英文摘要

The increasing occurrence of small-signal instability, particularly sub-synchronous oscillations (SSOs), in power systems with a high penetration of inverter-based resources (IBRs) has made the planning of new IBR connections increasingly important and challenging. The impact of such connections on small-signal stability is not always straightforward, as it strongly depends on the connection location, inverter operating mode, control configuration, parametrisation, and operating conditions. This paper proposes an inverter connection screening tool (ICST) that enables efficient and accurate assessment of the impact of prospective inverter configurations on small-signal system strength. It can identify, among the candidates considered, the most suitable inverter configuration for a given connection location that avoids degrading small-signal system strength and can also enhance it. As a result, higher IBR penetration can be supported while maintaining small-signal stability. The ICST evaluates candidate inverter configurations using their admittances at critical modal frequencies, along with the system's admittance spectrum, thereby avoiding the need for analytical models. The ICST-based planning procedure, which can support system operators, asset owners, and IBR developers in decision-making across different stages of planning studies, is demonstrated using a modified IEEE 57-bus system. Comparisons with model-based studies demonstrate the accuracy of the ICST in predicting the modal impact of inverter connections and its effectiveness in selecting suitable inverter control configurations.

2606.11342 2026-06-11 eess.SP 新提交

Beamforming Gain with Single-RF Movable Arrays

单射频可移动阵列的波束赋形增益

Zhenqiao Cheng, Chongjun Ouyang, Hao Jiang, Xingqi Zhang, Arumugam Nallanathan

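AI summary: Studies an array in which a single RF chain drives all movable antenna elements and analyzes the beamforming gain achievable through antenna placement; shows that the gain scales linearly with the number of antennas in single-path channels, gives coherent-combining conditions and aperture requirements for multipath channels, and derives the optimal power allocation and an antenna position search algorithm for the multiuser case.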

详情
Comments
5 pages
AI中文摘要

研究了一种单射频(RF)可移动阵列,其中所有可移动单元由单个RF链驱动,具有相等的幅度和相位。分析了通过天线放置实现的可达波束赋形增益。结果表明,在单径信道中,波束赋形增益随天线数量线性增长;在多径信道中,建立了相干合并条件和孔径要求。对于多用户传输,推导了闭式的最优最大最小功率分配,并基于此开发了一种逐元素坐标搜索算法用于天线放置设计。数值结果验证了分析,并揭示了一个基本权衡:仅通过天线放置即可实现波束赋形增益,但代价是增加孔径资源。

英文摘要

A single-radio-frequency (RF) movable array is investigated, in which all movable elements are driven by a single RF chain with equal amplitude and equal phase. The achievable beamforming gain enabled by antenna placement is analyzed. Linear beamforming gain scaling with the number of antennas is shown to be achievable in single-path channels, while coherent-combining conditions and aperture requirements are established for multipath channels. For multiuser transmission, the optimal max-min power allocation is derived in closed form, based on which an element-wise coordinate-search algorithm is developed for antenna placement design. Numerical results validate the analysis and reveal a fundamental tradeoff: beamforming gains can be achieved through antenna placement alone, but only at the expense of increased aperture resources.
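
The single-path result can be illustrated under a simple far-field model. The sketch assumes a one-dimensional array, a plane wave arriving at angle theta, a per-element channel phase of 2*pi*x*sin(theta)/lambda, and equal-amplitude, equal-phase excitation with total power normalised by the number of elements. These modelling choices and the function names are assumptions made for illustration, not the paper's system model.

```python
import numpy as np


def beamforming_gain(positions, theta, wavelength=1.0):
    """Power gain of equal-amplitude, equal-phase elements toward angle theta."""
    x = np.asarray(positions, dtype=float)
    phases = 2 * np.pi * x * np.sin(theta) / wavelength
    # Coherent sum of per-element channel phasors, normalised by element count.
    return float(np.abs(np.exp(1j * phases).sum()) ** 2 / x.size)


def aligned_positions(n, theta, wavelength=1.0):
    """Positions whose channel phases differ by whole multiples of 2*pi."""
    return np.arange(n) * wavelength / abs(np.sin(theta))


theta = np.pi / 6
n = 8
placed = aligned_positions(n, theta)
half_wave = np.arange(n) * 0.5
print(beamforming_gain(placed, theta), placed[-1])        # gain near n, larger aperture
print(beamforming_gain(half_wave, theta), half_wave[-1])  # fixed half-wavelength spacing
```

Under these assumptions the aligned placement reaches a gain equal to the number of elements, but it spans a much larger aperture than the half-wavelength array, which matches the tradeoff stated in the abstract.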

2606.11339 2026-06-11 math.OC cs.AI cs.LG eess.SY stat.ML 新提交

Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

松弛全局几何下分布式优化的量化随机原始-对偶方法

Susmit Sarkar, Abhinav Raghuvanshi, Kushal Chakrabarti, Mayank Baranwal

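AI summary: Proposes q-PDGD, a quantized stochastic primal-dual method, and proves linear convergence to a neighborhood or O(1/k) convergence under relaxed global geometry, matching the best-known centralized stochastic oracle complexity.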

详情
Comments
Accepted to UAI
AI中文摘要

我们研究具有随机梯度和有限比特通信(由随机(无偏)量化建模)的分布式优化。我们提出q-PDGD,一种量化的随机原始-对偶方法,并在松弛全局几何下对其进行分析。在受限割线不等式(RSI)下,常数步长产生线性收缩到由梯度噪声、量化失真和网络连通性确定的显式邻域,而递减步长在没有共享最小化器假设的情况下实现O(1/k)收敛。在Polyak-Lojasiewicz(PL)不等式下,我们在相同的随机量化设置中获得线性到邻域的收敛。我们的结果在预言复杂度上匹配已知最优的集中随机速率,并通过实验证明了量化水平、步长选择和图结构之间的预测权衡。

英文摘要

We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global geometry. Under the restricted secant inequality (RSI), a constant step-size yields linear contraction to an explicit neighborhood determined by gradient noise, quantization distortion, and network connectivity, while a diminishing step-size achieves O(1/k) convergence without shared-minimizer assumptions. Under the Polyak-Łojasiewicz (PL) inequality, we obtain linear-to-neighborhood convergence in the same stochastic quantized setting. Our results match the best-known centralized stochastic rates in oracle complexity, and are supported by experiments demonstrating the predicted tradeoffs between quantization level, step-size choice, and graph structure.
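
One standard example of a random unbiased quantizer is stochastic rounding to a uniform grid, sketched below. Whether q-PDGD uses this particular quantizer is not stated in the abstract; the function name and the grid step are illustrative assumptions.

```python
import numpy as np


def stochastic_quantize(x, step, rng):
    """Round each entry to a multiple of `step`, up or down at random.

    The probability of rounding up equals the fractional position within the
    grid cell, so the expected output equals the input (unbiasedness).
    """
    scaled = np.asarray(x, dtype=float) / step
    low = np.floor(scaled)
    prob_up = scaled - low
    round_up = rng.random(scaled.shape) < prob_up
    return step * (low + round_up)


rng = np.random.default_rng(0)
q = stochastic_quantize(np.full(20000, 0.3), step=1.0, rng=rng)
print(np.unique(q), q.mean())
```

A coarser step lowers the number of bits per message but increases the quantization distortion, which is one of the terms that sets the size of the convergence neighborhood described in the abstract.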

2606.11336 2026-06-11 cs.HC cs.ET eess.SY 新提交

Towards a Joint Understanding of Remote Operation for Vehicles in Public Road Traffic

面向公共道路交通中车辆远程操作的联合理解

Elisabeth Shi, Maria-Magdalena Wolf, Nina Theobald, Bettina Abendroth, Eugen Wige, Johannes Springer, Katharina Hottelart, Andreas Schrank, Thorben Brandt, Michael Oehl, Frank Diermeyer, Lena Plum

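AI summary: Proposes a framework that unifies remote operation concepts by tracing terminology back to differences in information processing between human and vehicle, supports interdisciplinary exchange, and integrates recently discussed forms of remote operation.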

详情
AI中文摘要

持续驾驶自动化系统被设想用作无人驾驶出行服务的基础。然而,研究人员和从业者都承认,当前的驾驶自动化系统尚无法处理人类驾驶员能够处理的所有交通情况。为了弥合这一差距并实现无需车内人类驾驶员或后备的出行服务,远程操作(或遥操作)正被越来越多地讨论。最近,已采取首批法律行动,允许在公共道路上进行某些形式的远程操作。远程操作涵盖了支持驾驶自动化系统的广泛方法,从远程辅助(包括提供信息或释放操作)到远程驾驶(包括从远程位置驾驶车辆)。因此,在公共道路交通中安全实施远程操作对多个学科(如工程学、心理学、信息学、法学等)和利益相关者(如远程操作服务提供商、远程操作员、车辆制造商、监管机构等)的协作提出了挑战。同时,由于期望和语言的不同,跨学科讨论往往具有挑战性。为了建立共同基础,本文追溯术语到人类和车辆双方信息处理的原始差异。该框架旨在通过明确指定需要什么来吸引包括不同背景和兴趣的研究人员和利益相关者在内的多样化受众,从而帮助进一步讨论。近期讨论的远程操作形式被整合到该框架中。

英文摘要

Sustained driving automation systems are envisioned to be used as the foundation for driverless mobility services. However, both researchers and practitioners acknowledge that current driving automation systems are not yet able to handle all traffic situations that a human driver can handle. To bridge this gap and enable mobility services without an in-vehicle human driver or fallback, remote operation (or teleoperation) is increasingly discussed. Recently, the first legal actions have been taken to enable some forms of remote operation on public roads. Remote operation encompasses a broad spectrum of methods to support a driving automation system, ranging from remote assistance, which includes providing information or releasing a maneuver, to remote driving, which includes driving the vehicle from a remote location. As such, safe implementation of remote operation in public road traffic poses a challenge for the collaboration of multiple academic disciplines (e.g., engineering, psychology, informatics, and law) and stakeholders (e.g., remote operation service providers, remote operators, vehicle manufacturers, and regulatory authorities). At the same time, the interdisciplinary discourse is often challenging due to differing expectations and language. To build a common ground, this article traces terminology back to the original differences in information processing on both the human and the vehicle side. This framework aims to help further discourse by directly specifying what is needed to engage a diverse audience including researchers and stakeholders of different backgrounds and interests. Recently discussed forms of teleoperation are integrated into this framework.