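arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.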

2606.09829 2026-06-09 eess.SP New submission

Adaptive Derivative Estimation via Stein's Unbiased Risk

Yonathan Murin, Ali Ozer Ercan

AI summary: Proposes SURDE, which uses Stein's unbiased risk estimate to score candidate filter lengths and soft-combine their outputs, resolving the noise-bias tradeoff in causal FIR derivative filtering; proves minimax optimality and outperforms ICI and AWVE on simulated and real data.

Comments: Submitted to IEEE Transactions on Signal Processing, 23 pages

Abstract

Estimating derivatives from noisy sampled data is fundamental to control, human-computer interaction, and biomedical engineering. Causal FIR derivative filters offer a natural approach to this challenge, yet their performance depends on their length. While short filters amplify noise, long filters introduce smoothing bias. We present SURDE (SURE Derivative Estimator), which addresses this tradeoff at each time step by evaluating a data-driven cost derived from Stein's Unbiased Risk Estimator (SURE) across a bank of candidate lengths and soft-combining their outputs via exponential weighting. We prove a minimax-optimal oracle inequality for the soft-combined estimator and use it to derive the optimal weighting temperature in closed form. Thus, the only tuning parameter for SURDE is the noise variance. Via numerical simulations we show that SURDE consistently outperforms alternative adaptive methods (the Intersection of Confidence Intervals (ICI) rule and the Adaptive Windowing Velocity Estimator (AWVE)) for first-derivative estimation. We further show that SURDE is robust to noise-variance misspecification (9% degradation over a $4\times$ range), and that it is also superior to ICI and AWVE in real-data scenarios (the EuRoC MAV dataset). SURDE is causal, computationally light, and requires only a rough estimate of the noise variance.
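
The soft-combination step described in the abstract can be sketched as below. This is a minimal illustration, assuming the per-length cost values have already been computed; the function name `soft_combine`, the `temperature` argument, and the example numbers are hypothetical and not taken from the paper, which derives its cost from SURE and its temperature in closed form.

```python
import math

def soft_combine(estimates, costs, temperature):
    """Exponentially weighted average of candidate estimates.

    estimates: derivative estimates, one per candidate filter length
    costs: data-driven cost for each candidate (lower is better)
    temperature: positive scale; smaller values approach hard selection
    """
    if len(estimates) != len(costs) or not estimates:
        raise ValueError("need one cost per estimate")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    raw = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(raw)
    weights = [r / total for r in raw]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights

# Illustrative numbers only (assumed, not from the paper).
combined, weights = soft_combine([1.0, 2.0, 3.0], [0.5, 0.1, 0.9], 0.2)
```

The candidate with the lowest cost receives the largest weight, and the combined value always lies between the smallest and largest candidate estimates.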

2606.09825 2026-06-09 cs.LG cs.AI cs.SY eess.SY math.OC New submission

An Agency-Transferring Model-Free Policy Enhancement Technique

Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko

Affiliations: Center for Engineering Systems and Sciences; Central University; Sirius University of Science and Technology

AI summary: Proposes embedding a suboptimal baseline policy into reinforcement learning training, gradually transferring agency from the baseline to a learnable policy, which improves training efficiency and ultimately yields a standalone policy that outperforms the baseline.

Abstract

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates without baseline policy support. The paper formalizes what it means for the baseline policy to be functional: under this policy, the agent reaches a goal set and remains there with high probability. The proposed arbitration mechanism is designed to exploit this property during training, yielding high goal-reaching rates right from the beginning of training. A theoretical analysis provides a formal interpretation of this behavior under stated assumptions and extends it to the final baseline-free regime, where explicit lower bounds are derived for the goal-reaching probability of the standalone learning policy. Empirical results on continuous-control benchmarks show that the proposed method achieves returns that match or exceed those of competitive approaches, while maintaining the highest goal-reaching rates throughout training among the compared methods -- including in the final stage, where the learning policy operates without any baseline support.
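
The arbitration idea can be illustrated with a toy sketch. The abstract does not specify the arbitration rule, so the linearly decaying probability of deferring to the baseline, the function names, and the scalar toy policies below are all assumptions made for illustration, not the authors' mechanism.

```python
import random

def baseline_probability(step, total_steps):
    """Probability of deferring to the baseline; decays linearly from 1 to 0 (assumed schedule)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)

def arbitrate(state, step, total_steps, baseline_policy, learning_policy, rng):
    """Return the action for this step and which policy produced it."""
    if rng.random() < baseline_probability(step, total_steps):
        return baseline_policy(state), "baseline"
    return learning_policy(state), "learning"

# Toy policies on a scalar state (hypothetical stand-ins for real controllers).
def baseline(state):
    return -0.5 * state

def learner(state):
    return -0.9 * state

rng = random.Random(0)
sources = [arbitrate(1.0, t, 100, baseline, learner, rng)[1] for t in range(101)]
```

Early steps are handled almost entirely by the baseline, and by the final step the learning policy acts alone, which mirrors the baseline-free end state described in the abstract.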

2606.09753 2026-06-09 eess.SP New submission

Jamming-Resilient Sparse Delay-Doppler NOMA: Unitary Precoding, Randomized Active Sets, and Superincreasing Power Allocation

Michel Kulhandjian, Hovannes Kulhandjian, Theodoros A. Tsiftsis

AI summary: Proposes a sparse delay-Doppler NOMA scheme resilient to intentional jamming: unitary precoding and randomized active sets suppress the jammer, superincreasing power allocation simplifies SIC, and jamming resilience is preserved under Rician fading.

Comments: 30 pages, 16 figures. Master version. Journal companion: arXiv:TBD. WCL companion: arXiv:TBD

Abstract

We propose a sparse delay-Doppler NOMA scheme resilient to intentional jamming. The transmitter places user data on a small random subset of delay-Doppler bins, spreads the result through a unitary precoder, and re-draws the active subset per frame from a pseudo-random seed shared with the receiver. The receiver detects and discards jammed bins, recovers the sparse signal by least squares, and decodes per bin via SIC. Hadamard, DFT, and Haar-random precoders all yield essentially the same BER, because a Marchenko-Pastur conditioning argument controls any random unitary submatrix. The closed-form BER has no jammer-induced floor, unlike the well-known partial-band floor of conventional OTFS-NOMA. The same argument shows that compromising the shared seed does not break the system: random unitary submatrices remain well-conditioned, so BER stays within the unjammed envelope. For more than two users we use a superincreasing power allocation (Merkle-Hellman knapsack) and prove the resulting low-complexity SIC matches maximum-likelihood detection exactly, removing the usual SIC propagation ceiling. For more than four users we partition them into pairs assigned to disjoint bin subsets; this OMA-friendly NOMA rule reaches floor BER at eight users by SNR around 20 dB. We extend the framework to Rician fading and show the jammer-independence property holds for arbitrary Rician K-factor. Monte Carlo simulations track the analytical predictions within 3 dB and show at least a 40 dB BER-ratio improvement against pattern-aware jammers, with about 24 dB of cumulative gain over conventional OTFS-NOMA under oracle jamming.
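
The claim that SIC can match maximum-likelihood detection under a superincreasing allocation can be checked numerically in a simplified setting. The sketch below assumes real-valued BPSK symbols, power-of-two amplitudes (each larger than the sum of all smaller ones), and additive Gaussian noise on a single bin; these choices and all function names are illustrative assumptions, not the paper's system model.

```python
import itertools
import random

def superincreasing_amplitudes(num_users, base=1.0):
    """Amplitudes where each term exceeds the sum of all smaller ones (powers of two)."""
    return [base * 2 ** k for k in range(num_users)]

def sic_decode(y, amplitudes):
    """Successive cancellation: decide the strongest user's sign, subtract, repeat."""
    signs = [0] * len(amplitudes)
    residual = y
    for k in reversed(range(len(amplitudes))):
        signs[k] = 1 if residual >= 0 else -1
        residual -= signs[k] * amplitudes[k]
    return tuple(signs)

def ml_decode(y, amplitudes):
    """Exhaustive search for the sign pattern whose superposition is nearest to y."""
    return min(
        itertools.product((-1, 1), repeat=len(amplitudes)),
        key=lambda s: abs(y - sum(si * a for si, a in zip(s, amplitudes))),
    )

amps = superincreasing_amplitudes(4)
rng = random.Random(1)
mismatches = 0
for _ in range(2000):
    sent = [rng.choice((-1, 1)) for _ in amps]
    y = sum(s * a for s, a in zip(sent, amps)) + rng.gauss(0.0, 1.5)
    if sic_decode(y, amps) != ml_decode(y, amps):
        mismatches += 1
```

In this toy setting the two decoders agree on every noisy sample: the candidate sums for a positive and a negative leading sign are mirror images about zero, so the sign of the residual already identifies the nearest candidate at each stage.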

2606.09717 2026-06-09 cs.SD eess.AS New submission

What Makes Synthetic Speech Sound Sarcastic? A Prosody-Controlled Perception Study

Zhu Li, Shekhar Nayak, Matt Coler

Affiliations: University of Groningen

AI summary: Uses a controllable neural TTS system to manipulate speech rate, pitch variation, and loudness; finds that loudness primarily drives human sarcasm perception whereas the model relies more on speech rate, revealing differences in prosodic cue weighting.

Comments: Accepted to Interspeech 2026

Abstract

Prosody plays a central role in sarcasm perception, yet previous studies have relied on naturally produced speech that lacks fine-grained control over individual acoustic dimensions. As prosodic cues co-vary in natural data, isolating their independent contributions remains challenging. We introduce a controlled framework using neural text-to-speech (TTS) with prompt-based prosodic conditioning to manipulate speech rate, pitch variation, and loudness. An orthogonal stimulus set was constructed to enable causal testing of prosodic cue effects. Human listeners rated sarcasm and naturalness, and their judgments were compared with predictions from a foundation model capable of processing audio input. Results show that loudness primarily drives human sarcasm perception, whereas the model assigns greater weight to speech rate, leading to distinct cue-weighting patterns. This study shows how controllable neural TTS enables investigation of prosodic cue weighting in speech perception.
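
One common way to build an orthogonal stimulus set is a full factorial crossing of the manipulated cues, so that each cue varies independently of the others. The sketch below assumes three levels per cue; the level names and the three-by-three-by-three layout are hypothetical, since the abstract does not state the actual design.

```python
import itertools

# Hypothetical factor levels; the abstract does not list the actual ones.
FACTORS = {
    "speech_rate": ["slow", "neutral", "fast"],
    "pitch_variation": ["flat", "neutral", "wide"],
    "loudness": ["soft", "neutral", "loud"],
}

def full_factorial(factors):
    """Return every combination of levels as a list of dicts (one per stimulus condition)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]

conditions = full_factorial(FACTORS)
```

Because every level of one cue is paired equally often with every level of the others, an effect attributed to loudness, for example, cannot be explained by co-varying speech rate or pitch.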

2606.09698 2026-06-09 cs.IT cs.SY eess.SY math.IT math.OC New submission

Optimal Feedback Communication with Information Maximization and Distortion Minimization

Aolin Xu

AI summary: Studies optimal schemes for sending a real-valued source over a channel with feedback, gives conditions for jointly maximizing information and minimizing distortion, and proves that posterior matching achieves both for symmetric discrete channels.

Abstract

We study the problem of optimally sending a real-valued source through multiple uses of a channel with feedback. First, we state a set of conditions that are sufficient for an encoder to achieve maximal mutual information between the source and all the channel outputs. This set of conditions is also necessary when the channel is input-identifiable, a condition widely satisfied by common channel models. More notably, we further study the information maximization-distortion minimization problem, where the mutual information between the source and all channel outputs still needs to be maximized, while at each step, the MMSE of estimating the source from the channel outputs so far also needs to be minimized. We derive a solution to this problem for discrete channels with certain symmetries, e.g. $k$-ary symmetric or $k$-ary erasure channels. We show that for such channels the famous posterior matching scheme, while not necessary for information maximization alone, is sufficient and essentially necessary for achieving both information maximization and distortion minimization. This work also provides a new perspective on regularizing distortion-minimizing feedback communication through information maximization, which enables us to find an optimal solution that would otherwise be intractable.

2606.09667 2026-06-09 eess.AS cs.CL cs.SD New submission

Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading

Eder del Blanco, David Gimeno-Gómez, Eva Navas, Carlos-D. Martínez-Hinarejos, Inma Hernáez

AI summary: Proposes a masked multimodal speech synthesis framework that combines surface electromyography and lipreading signals, using modality masking during training to improve robustness; reduces word error rate by up to 14 absolute percentage points in multispeaker settings.

Comments: 12 pages, 7 figures and 6 tables. Submitted to Transactions on Audio, Speech and Language Processing

Abstract

Speech restoration through silent speech interfaces (SSIs) has emerged as a promising assistive technology for individuals with impaired or absent laryngeal voice production. Among non-invasive SSI modalities, surface electromyography (sEMG) and video-based lipreading provide complementary articulatory information, yet their integration for continuous speech synthesis remains underexplored. Moreover, existing multimodal approaches rarely address robustness to modality degradation or temporary sensor failure, limiting their applicability in realistic scenarios. In this work, we propose a masked multimodal speech synthesis framework that jointly leverages sEMG and lipreading signals through modality masking during training. Under multispeaker settings, the proposed approach reduces word error rate by up to 14 absolute percentage points compared to the strongest unimodal baseline. Experimental results not only show that masking strategies are critical for these performance gains and robustness under low-bitrate conditions, but also that they generalize better than degradation-specific data augmentations in the presence of modality absence conditions. Phone-level analyses further reveal complementary contributions across modalities, with particularly strong benefits for vowels and for specific consonant groups. Overall, these findings demonstrate the effectiveness and robustness of masked multimodal integration for silent speech synthesis, although adaptation to laryngectomized speakers remains an open research challenge.
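
Modality masking during training can be sketched as follows. This is a simplified illustration that zeroes out a whole modality for a sample; the masking probability, the choice to mask at most one modality at a time, and the name `mask_modalities` are assumptions, and the paper's actual masking strategies may differ.

```python
import random

def mask_modalities(emg, video, p_mask, rng):
    """Randomly zero out at most one modality for a training sample.

    With probability p_mask one modality (chosen uniformly) is replaced by zeros,
    so the model also sees inputs where only the other modality carries signal.
    """
    if not 0.0 <= p_mask <= 1.0:
        raise ValueError("p_mask must be in [0, 1]")
    if rng.random() < p_mask:
        if rng.random() < 0.5:
            return [0.0] * len(emg), list(video), "emg_masked"
        return list(emg), [0.0] * len(video), "video_masked"
    return list(emg), list(video), "none"

rng = random.Random(0)
emg_out, video_out, tag = mask_modalities([0.2, 0.4], [1.0, 1.0, 1.0], 1.0, rng)
```

Training on such masked samples is one way to prepare a model for a sensor dropping out at inference time, which is the robustness scenario the abstract targets.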

2606.09652 2026-06-09 eess.SP New submission

Throughput Analysis for Near-Field Mobile Communications: Beamfocusing or Caustic Beamforming?

Jiannan Wang, Xianghao Yu, Robert Schober

AI summary: For terahertz near-field communications, analyzes the throughput of beamfocusing and caustic beamforming, establishes a switching-overhead threshold, and shows that caustic beamforming is preferable in high-mobility scenarios.

Abstract

The migration to the Terahertz (THz) band and the deployment of extremely large antenna arrays (ELAAs) are transitioning wireless communications into the radiative near-field regime, fundamentally evolving conventional angular beam steering to beamfocusing (BF). However, the combination of the extremely narrow beamwidth and the mobility of the users necessitates frequent beamfocusing reconfigurations, incurring a significant switching overhead that degrades the system achievable throughput. In this regard, caustic beamforming (CB) is a promising alternative based on the synthesis of a continuous curved beam, which eliminates the need for beam tracking at the expense of a distributed beamforming gain. By leveraging the Airy beam as a canonical model, this paper develops an analytical framework to compare the throughputs achieved by CB and BF. Our main results include closed-form throughput expressions for both beamforming strategies and a performance boundary for paradigm selection. First, we derive the BF throughput by modeling a defocusing penalty induced by continuous user movement. The optimal beam dwell time that maximizes the throughput is analytically determined, and the impact of user speed and switching overhead on the throughput is quantified. For the CB scheme, we demonstrate that its throughput is determined by the signal-to-noise ratio (SNR) and the geometry of the trajectory of the user, yet invariant to the user speed. Finally, we analytically establish a threshold for the switching overhead to define the crossover point of the achievable throughput of both beamformers. Crucially, this threshold asymptotically vanishes at extremely high frequencies, positioning the continuous CB scheme as the preferred beam design paradigm for high-mobility THz communications.

2606.09620 2026-06-09 cs.RO cs.SY eess.SY New submission

Motion planning for hundreds of floating robots

Jan Kamm, Antonio Terpin, Raffaello D'Andrea, Aswin Ramachandran

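Affiliations: Institute for Dynamic Systems and Control, ETH Zürich

AI summary: Addresses collision-free motion planning for large fleets of floating robots with a scalable pipeline that decomposes the problem via a collision graph into independent subproblems solved in parallel; validated in simulation with 500 robots and in a real-world demonstration.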
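
The collision-graph decomposition mentioned in the summary can be illustrated by grouping robots into connected components of a conflict graph. This is a minimal sketch, assuming conflicts are given as pairs of robot indices; the function name and the union-find approach are illustrative choices, not the authors' implementation.

```python
def independent_groups(num_robots, conflicts):
    """Split robots into groups with no conflicts between groups (union-find)."""
    parent = list(range(num_robots))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in conflicts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for robot in range(num_robots):
        groups.setdefault(find(robot), []).append(robot)
    return sorted(groups.values())

# Six robots with three potential collisions (made-up example data).
groups = independent_groups(6, [(0, 1), (1, 2), (4, 5)])
```

Each resulting group shares no conflict with any other group, so the groups can be planned independently and in parallel.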

2606.09617 2026-06-09 math.OC cs.AI cs.CY cs.SY eess.SY New submission

Powering the Future of AI: Navigating the Trade-offs for Europe's Energy Transition and Net-Zero Goals

Mohammad Hemmati, Gbemi Oluleye, Vassilis M. Charitopoulos

AI summary: Uses a spatial optimisation model across 21 AI growth scenarios to quantify AI's impact on European electricity demand, capacity, emissions, and operation; finds that AI could add 73-723 TWh of demand by 2050, risking cumulative emissions overshoots of 67-181 MtCO2 between 2030 and 2050, and that the siting of AI infrastructure will depend more on firm power and system flexibility.

Abstract

The rapid expansion of AI globally has led to the proliferation of energy-intensive hyperscale data centres (DCs), making them a structurally challenging component in power system planning and operation. Using a spatially explicit optimisation model of Europe across 21 AI growth scenarios, we systematically quantify additional demand, capacity requirements, emissions, and operational impacts of DCs. Results indicate that AI could drive 73-723 TWh of extra demand by 2050, risking cumulative emissions overshoots of 67-181 MtCO2 between 2030 and 2050. Our analysis indicates that after 2030, the geography of AI infrastructure will be shaped more by firm power and system flexibility than by the mere abundance of clean energy. In moderate scenarios, AI requires an additional 200 hours of firm generation, which increases LCOE by 35 EUR/MWh in key hubs. We show that even under the pessimistic scenarios, existing infrastructure would require 70 GW of additional capacity, while under managed growth pathways, this expansion could reach 226 GW. We further find that DC workload dynamics strongly shape energy dispatch, system flexibility, and emissions, while improved efficiency significantly reduces capacity needs and system peaks. While our findings suggest that net-zero targets for 2050 may be achieved, critical emission risks may appear in the intermediate years, and the EU may compromise its carbon-neutral goals unless policies adapt to this accelerating digital transformation.

2606.09573 2026-06-09 eess.SP New submission

Bernoulli Filtering for Multi-Sensor Tracking with Thresholded Measurements

Gustav Zetterqvist, Fredrik Gustafsson, Gustaf Hendeby

AI summary: To handle state-dependent missed detections caused by sensor detection thresholds, proposes a recursive tracking framework based on Bernoulli filtering that jointly handles clutter and target existence uncertainty; in simulation it reduces the GOSPA metric by 62.4% relative to a Bernoulli filter with fixed detection probability.

Comments: This work has been submitted to the IEEE for possible publication

Abstract

Target tracking is challenging when sensor detection thresholds cause state-dependent missed detections, particularly in multi-sensor scenarios with clutter and uncertain target existence. A recently developed missed detection framework models detection probability as a function of target state, sensor characteristics, and detection threshold, but it is limited to individual measurements and does not address the recursive tracking problem. This work extends the framework using a Bernoulli filter formulation to jointly handle recursive target tracking, clutter, and target existence uncertainty. A Bernoulli particle filter is evaluated in a simulated 2D multi-sensor tracking scenario with nonlinear measurements, clutter, and detection uncertainty. Incorporating accurate detection threshold knowledge reduces the generalized optimal subpattern assignment (GOSPA) metric by 62.4% compared to a conventional Bernoulli filter with fixed detection probability, while better balancing missed detections and false alarms.

2606.09557 2026-06-09 eess.AS New submission

Your U-Net Dereverberation Model is Secretly an RIR Encoder

Sina Khanagha, Timo Gerkmann

AI summary: Analyzes how the intermediate representations of NCSN++ U-Net dereverberation models capture global room characteristics, finds that deeper layers encode structured RIR embeddings, and proposes a training strategy conditioned on contrastively learned RIR embeddings that improves dereverberation and speeds up inference.

Comments: Accepted to Interspeech 2026

Abstract

In this work, we analyze the ability of NCSN++ U-Net based audio dereverberation models to capture global room characteristics in their intermediate representations. Through an empirical study of both a state-of-the-art diffusion-based model and a discriminative counterpart, we show that deeper layers encode structured room impulse response (RIR)-dependent embeddings. Moreover, the discriminative ability of this implicit room representation correlates with dereverberation performance across objective metrics. Motivated by this observation, we propose a training strategy that explicitly conditions the network on pre-trained RIR embeddings, obtained via self-supervised contrastive learning. Incorporating RIR conditioning improves representation quality, accelerates convergence, and enhances dereverberation performance, while significantly reducing the number of reverse diffusion steps required by the diffusion-based model during inference.

2606.09534 2026-06-09 eess.SY cs.SY New submission

A Continuification Approach to CAV Control in Mixed Traffic via Variable Speed Limits

Brian Block, Cecilia Pasquale, Silvia Siri, Simona Sacone, Stephanie Stockar

AI summary: Proposes a continuification-based CAV control strategy: an LQR designed on the PDE determines the optimal variable speed limit, which is then converted into input speeds for the individual CAVs, reducing the computational burden of controlling multiple CAVs.

Comments: 7 pages, 5 figures. Accepted to IEEE ITSC 2026, Naples, Italy

Abstract

This paper presents a method for controlling traffic via the use of connected and automated vehicles (CAVs) acting as moving bottlenecks. Current methods for moving bottleneck control use a coupled PDE-ODE model, based on the Lighthill-Whitham-Richards (LWR) model, to represent the influence of the CAV. Control of the CAV is normally achieved by designing the control on the ODE that models the speed of the moving bottleneck. The method proposed in this paper instead seeks to reduce the computational burden of controlling multiple CAVs by designing the moving bottleneck controller first on the PDE. The original control designed on the PDE is a linear quadratic regulator (LQR) that determines the optimal variable speed limit (VSL) for the entire length of freeway in order to regulate density to a desired setpoint. Then, a continuification approach is utilized to determine the input speed for each CAV. Results show that multiple CAVs can be controlled via this method with minimal computational burden, and that as the number of CAVs increases the solution approaches the global optimal solution determined by the LQR.

2606.09505 2026-06-09 eess.SY cs.SY math.OC New submission

Guaranteed Fast Implementation of the Split Covariance Intersection Filter: Nested Newton Method Thanks to the Fourth-Order Convexity of w-Optimization

Hao Li

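AI summary: Proves that the w-optimization problem in the split covariance intersection filter is fourth-order convex and, building on this, proposes a nested Newton method that gives a guaranteed fast implementation of the filter.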

2606.09504 2026-06-09 eess.SP New submission

Hierarchical Federated Learning for Unsupervised Waveform Classification over Tactical MANETs

Charles E. Thornton, Daniel J. Jakubisin

AI summary: Proposes a hierarchical federated learning framework that performs waveform classification with unsupervised denoising convolutional autoencoders over tactical MANETs subject to Rayleigh fading, random mobility, and multi-hop routing loss; a two-stage aggregation protocol cuts transmitted bits by about 12%, and channel-driven subsampling is found to act as an implicit regularizer.

Comments: 6 pages, 3 figures

Abstract

Distributed radio frequency sensing in contested tactical environments demands collaborative learning across mobile nodes. In ad-hoc networks, learning must occur without persistent backhaul, ground truth labels, or reliable communication links. Traditional federated learning approaches assume either ideal link conditions or supervised training objectives, neither of which holds in practice for deployed MANET platforms. This paper presents a hierarchical federated learning framework for unsupervised waveform classification over tactical MANETs subject to Rayleigh fading, random waypoint mobility, and multi-hop routing loss. Each node trains a local denoising convolutional autoencoder on raw IQ observations without label exchange, learning compact representations through a self-supervised reconstruction objective. A two-stage aggregation protocol elects connectivity-based relay aggregators consistent with OLSR multipoint relay selection, compressing cluster-level model updates before forwarding to a mobile server proxy. Simulation results demonstrate that in-network aggregation reduces attempted transmission bits relative to relay-forward federated averaging by around 12% at equivalent classification performance. Notably, stochastic channel-driven subsampling under non-IID data acts as an implicit regularizer, with both MANET conditions matching or exceeding ideal federated averaging on unsupervised representation quality. This suggests that moderate link loss can partially compensate for client drift in heterogeneous networks. Performance is assessed by analyzing the learned latent embeddings using KMeans normalized mutual information and linear probe accuracy.
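
The two-stage aggregation idea can be sketched with plain weighted averaging. The code below assumes model parameters are flat lists of floats and that clusters are weighted by their sample counts; it deliberately leaves out compression, link loss, relay election, and the autoencoder itself, and every name in it is hypothetical.

```python
def weighted_average(models, weights):
    """Weighted average of parameter vectors (lists of floats)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(dim)]

def hierarchical_fedavg(clusters):
    """clusters: list of clusters, each a list of (params, num_samples) pairs."""
    cluster_models, cluster_sizes = [], []
    for members in clusters:
        params = [p for p, _ in members]
        sizes = [n for _, n in members]
        cluster_models.append(weighted_average(params, sizes))  # stage 1: relay aggregator
        cluster_sizes.append(sum(sizes))
    return weighted_average(cluster_models, cluster_sizes)  # stage 2: server proxy

# Made-up example: two clusters, three nodes in total.
clusters = [
    [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    [([5.0, 6.0], 60)],
]
global_model = hierarchical_fedavg(clusters)
```

With sample-count weights and no losses, the two-stage result equals a flat federated average over all nodes, while each relay forwards a single aggregated update instead of every member's update.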

2606.09496 2026-06-09 eess.SP New submission

Orbital Plane Geometry and Information Conditioning for Doppler-Only LEO Positioning

Charles E Thornton

AI summary: For Doppler-only positioning of a stationary receiver using LEO signals of opportunity, models each satellite's contribution as a weighted projection onto its orbital plane, derives the eigenvalues, condition number, and worst-case Cramér-Rao lower bound of the information matrix, and shows how the dihedral angle between orbital planes and the information strengths affect conditioning.

Comments: 5 pages, 3 figures

Abstract

We study an idealized information model for Doppler-only positioning with low earth orbit (LEO) signals of opportunity from a stationary receiver. Motivated by the observation that Doppler measurements from a satellite pass provide information primarily within the associated orbital plane, we model each satellite contribution as a weighted projection onto that plane. Under this model, the combined information matrix from multiple satellites is a sum of orbital-plane projection operators. Closed-form expressions are derived for the eigenvalues, condition number, and worst-case Cramer-Rao lower bound. For two satellites, the conditioning is governed by the dihedral angle between orbital planes and the relative information strengths of the two links. Monte Carlo evaluation of pass-integrated Doppler Fisher information matrices demonstrates that the proposed surrogate captures the dominant conditioning trends associated with orbital-plane diversity. The results provide a simple geometric framework for understanding the role of constellation geometry in Doppler-only positioning systems.
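
For the two-satellite case, the dependence on the dihedral angle can be worked out directly from the projection model. The derivation below is a sketch that assumes "projection onto the orbital plane" means the orthogonal projector $I - n n^{\top}$ for a unit plane normal $n$; the symbols $w_i$, $n_i$, and $\theta$ are notation introduced here, and the paper's own expressions may be written differently.

```latex
% Assumed model: unit normals n_1, n_2 with n_1^T n_2 = cos(theta), weights w_1, w_2 > 0.
\begin{align}
  J &= w_1\,(I - n_1 n_1^{\top}) + w_2\,(I - n_2 n_2^{\top})
     = (w_1 + w_2)\,I - w_1 n_1 n_1^{\top} - w_2 n_2 n_2^{\top}, \\
  \lambda_1 &= w_1 + w_2
     \quad \text{(eigenvector along } n_1 \times n_2\text{)}, \\
  \lambda_{2,3} &= \tfrac{1}{2}\Bigl[(w_1 + w_2)
     \pm \sqrt{(w_1 + w_2)^2 - 4\,w_1 w_2 \sin^2\theta}\,\Bigr], \\
  \kappa(J) &= \frac{\lambda_1}{\lambda_3}
     = \frac{2\,(w_1 + w_2)}{(w_1 + w_2) - \sqrt{(w_1 + w_2)^2 - 4\,w_1 w_2 \sin^2\theta}} .
\end{align}
```

As a check, coplanar orbits ($\theta \to 0$) give $\lambda_3 \to 0$ and an unbounded condition number, while perpendicular planes with equal weights ($\theta = \pi/2$, $w_1 = w_2$) give eigenvalues $2w, w, w$ and $\kappa = 2$.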

2606.09444 2026-06-09 eess.IV New submission

Vendor-agnostic 4D Phase Contrast MRI: a complete open-source pipeline for velocities, displacement, and strain analysis

Marta B. Maggioni, Sabine M. Räuber, Katarina Puš, Bostjan Šimunič, Xeni Deligianni, Regina M. M. Schlaeger, Francesco Santini

AI summary: Presents a fully open-source 4D flow PC-MRI pipeline integrating compressed-sensing acceleration, BART reconstruction, and strain analysis; a gradient probing sequence ensures correct velocity signs, and validation on two MRI systems and two anatomical sites shows substantially shorter acquisition times.

Abstract

Phase contrast MRI (PC MRI) enables quantitative assessment of tissue motion and strain. Although it is increasingly used, standardized, vendor-agnostic pipelines for accelerated acquisitions remain scarce. We present a fully open-source 4D flow PC-MRI pipeline integrating a compressed sensing-accelerated sequence implemented in PyPulseq, BART-based reconstruction, and strain analysis. Additionally, a gradient probing sequence was developed to ensure correct velocity sign assignment across scanner orientations and vendors. The pipeline was validated across two Siemens MRI systems (3T MAGNETOM Prisma and 3T Vida Fit) in two anatomical applications: forearm (Flexor Digitorum Superficialis, n=9) and thigh (Vastus Lateralis, n=10) during Neuromuscular Electrical Stimulation (NMES)-induced contractions. Compressed sensing reduced acquisition times from 35 and 80 minutes to 5 and 11 minutes for the arm and leg acquisitions, respectively. Muscle strain maps and sigmoid-fitted strain curves enabled extraction of peak strain, mean strain, and buildup rate. Strains in the Vastus Lateralis were approximately one order of magnitude higher than in the Flexor Digitorum Superficialis (median peak strain 0.49 vs. 0.063, mean strain 0.31 vs. 0.031). The pipeline demonstrates multi-platform compatibility and provides a reproducible, open framework for quantitative muscle imaging.

2606.09439 2026-06-09 eess.SY cs.SY New submission

Tracking the Effective Surface Area of Non-Convex Satellites

Lauritz Rismark Fosso, Raymond Kristiansen, Jan Tommy Gravdahl, Sveinung Johan Ohrem, Alessio Bocci

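AI summary: Proposes a framework that tracks the effective surface area of non-convex satellites with a backstepping control algorithm, so that aerodynamic drag in low Earth orbit can be used for orbit control while solar panel orientation is optimized at the same time.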

2606.09436 2026-06-09 eess.SY cs.SY New submission

Leveraging Optimal Information-Power Flow for Transmission Switching in AC/MTDC Grids

Haixiao Li, Aleksandra Lekić

AI summary: Proposes an optimal information-power flow model that accounts for the communication network to solve transmission switching in AC/multi-terminal DC grids, reformulated via convex relaxation and related techniques as a mixed-integer second-order cone program.

Comments: 6 pages

Abstract

The emerging AC/multi-terminal DC grids are regarded as a promising solution for accommodating the increasing integration of renewable energy sources. This work proposes an optimization framework to address transmission switching (TS) problems arising in practical operational scenarios, such as maintenance scheduling, contingency management, and fault restoration. Unlike most existing studies, the proposed framework considers the role of communication networks in TS operations and develops an optimal information-power flow (OIPF) model. The OIPF model captures the impact of information flows on circuit breaker actions while incorporating communication-related costs, thereby better reflecting practical operational decision-making processes. To ensure computational tractability, the resulting optimization problem is formulated as a mixed-integer second-order cone programming (MISOCP) model through convex relaxations, polygonal approximations, and Big-M reformulations. Numerical case studies illustrate the applicability of the proposed OIPF model and indicate its potential in supporting transmission switching decisions.

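2606.09407 2026-06-09 eess.SY cs.SY New submission

Delayed Functional Observers for Output-Delayed Linear Systems

Hieu Trinh

AI summary: Addresses severely delayed output measurements by proposing a new class of delayed functional observers that systematically handle unequal delays in the actuator and sensor channels, reconstructing delayed control laws within a low-order framework.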

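2606.09406 2026-06-09 eess.SY cs.SY New submission

Advanced simulation framework for AC/MTDC power systems

Aleksandra Lekić, Azadeh Kermansaravi, Haixiao Li, Yasel Quintero Lares, Saif Alsarayreh, Robert Dimitrovski

AI summary: Targets stability and harmonic issues in hybrid AC/MTDC power systems by proposing HARMONY, a C++-based open-source simulation framework that integrates optimal power flow and harmonic stability analysis to provide fast, trusted stability assessment.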

Comments: 13 pages

Abstract

Alternating current (AC)/multi-terminal direct current (MTDC) hybrid power systems (HPSs) play a crucial role in enabling long-distance power transmission and flexible interconnections between AC grids. However, HPSs face numerous challenges, with stability and harmonic issues being particularly prominent. Traditional electromagnetic transient (EMT) tools have struggled to accommodate small-signal stability problems and the potential issues of the optimal interactions among converters. To address this gap, HARMONY ("HARMONic stabilitY assessment of PE-penetrated power systems") has been developed in C++ as a comprehensive mathematical framework for the advanced simulation and analysis of interconnected AC/MTDC HPSs. The primary goals of HARMONY are to provide faster and trusted stability analyses and to address the analytical difficulties associated with converter control dynamics, converter-driven stability, and interoperability in HPSs. The framework is intended to be open source, thereby broadening collaboration among researchers and contributing to the community of power systems engineers. In this paper, we demonstrate two core functionalities of HARMONY: optimal power flow (OPF) and harmonic stability analysis (HAS). The underlying analysis models and computational methodologies for both functionalities are presented in detail to help future readers and users gain a clear understanding of the mathematical fundamentals of HARMONY. Furthermore, we introduce the integrated framework of OPF and HAS designed in HARMONY, along with representative printed analysis results, to demonstrate the appealing capabilities of HARMONY.

2606.09366 2026-06-09 cs.CL eess.AS New submission

Is Text All You Need? Text as a Universal Information Bottleneck for Speech LLMs

Ming-Hao Hsu, Yuxuan Hu, Shujie Liu, Jinyu Li, Yan Lu, Zhizheng Wu

Affiliations: The Chinese University of Hong Kong, Shenzhen; Microsoft Corporation; Microsoft Research Asia

AI summary: Proposes Convex Gate (C-Gate) to bridge speech and LLMs: a convex-hull constraint confines speech representations to the LLM's input embedding manifold, yielding strong joint performance on ASR and emotion recognition and showing that geometry, rather than discreteness, is the key design factor.

Abstract

Large language models (LLMs) provide a powerful reasoning backbone for speech understanding, but integrating continuous acoustic signals into a frozen LLM remains challenging. Existing speech-to-LLM interfaces typically operate at two extremes: either enforcing near-discrete token alignment, which benefits transcription but loses paralinguistic information, or learning unconstrained continuous representations, which can drift away from the LLM's input space and degrade autoregressive decoding. In this work, we propose Convex Gate (C-Gate), a speech-to-LLM bridge that constrains all speech representations to lie within the LLM's input embedding manifold with an architectural convex-hull constraint. Concretely, each frame is represented as a convex combination of token embeddings, ensuring compatibility with the pretrained LLM while preserving continuous expressivity. Across automatic speech recognition (ASR) and emotion recognition, C-Gate achieves strong joint performance, improving LibriSpeech WER by up to 48.7% relative while matching or exceeding single-task emotion accuracy. Beyond performance, our analysis reveals a key insight: information is not carried by discrete token identities, but by time-resolved trajectories in the embedding space. Causal interventions confirm that both the trajectory structure and alignment to the pretrained embedding manifold are critical for performance. These results suggest that geometry, rather than token discreteness, is the fundamental design factor in speech-to-LLM interfaces, and provide a controlled regime for studying multimodal integration in frozen LLMs. We release the checkpoint, per-sample outputs, mechanism dumps, and intervention suite for replication.
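
As an illustration of the convex-combination idea in the abstract, the following Python sketch maps one speech frame onto the convex hull of a toy embedding table. It is a minimal sketch under stated assumptions, not the authors' implementation: the softmax gate, the function names `softmax` and `convex_gate_frame`, and the three-token embedding table are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def convex_gate_frame(frame_logits, token_embeddings):
    """Represent one speech frame as a convex combination of token embeddings.

    frame_logits: one score per vocabulary token (hypothetical gate output).
    token_embeddings: list of embedding vectors, one per vocabulary token.
    """
    weights = softmax(frame_logits)
    dim = len(token_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, token_embeddings))
        for d in range(dim)
    ]


# Toy vocabulary of three 2-D embeddings (illustrative values only).
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = softmax([2.0, 0.5, -1.0])
frame = convex_gate_frame([2.0, 0.5, -1.0], EMBEDDINGS)
```

Because the weights are non-negative and sum to one, the output cannot leave the convex hull of the embedding rows, which is the compatibility property the abstract attributes to C-Gate while the representation itself stays continuous.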

2606.09357 2026-06-09 eess.AS New submission

Rethinking Depth: A study of the Recursive-Transformer for Speech Recognition

Thomas Rolland, Carlos Carvalho, Alberto Abad

AI summary: An experimental study of the Recursive Transformer in speech recognition encoders: recursion over a finite number of loops in latent space cuts parameters by 66% while maintaining performance.

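2606.09345 2026-06-09 eess.AS New submission

A study on the impact of region specific data on the performance of Indic ASR

Agneedh Basu, Pavan Kumar J, Pranav Bhat, Sujith Pulikodan, Visruth Sanka, Nihar Desai, Prasanta Kumar Ghosh

AI summary: Uses controlled fine-tuning experiments to study cross-region ASR generalization for Indian languages, finding that geographic distance correlates positively with word error rate and underscoring the importance of regionally diverse data.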

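2606.09342 2026-06-09 eess.AS New submission

Parameter-Efficient Continual Learning for Automatic Speech Recognition

Steven Vander Eeckt, Hugo Van hamme

AI summary: Proposes an SVD-based head-tail subspace partition in which adaptation takes the form of approximate rotations within the low-energy tail subspace, combined with weight averaging to reduce forgetting, for parameter-efficient continual learning in ASR.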

Comments: Accepted at Interspeech 2026

Abstract

Speech foundation models enable strong general-purpose ASR and are attractive for downstream adaptation. However, their size and the catastrophic forgetting induced by sequential fine-tuning demand parameter-efficient and regularized training methods, motivating parameter-efficient continual learning (PECL). While PECL has been widely studied in NLP and vision, it has received less attention in ASR. In this paper, we propose a simple yet effective PECL method based on recent advances in parameter-efficient fine-tuning for ASR. We partition pretrained weight matrices into head and tail subspaces according to singular values and restrict adaptation to approximate rotations within the low-energy tail subspace, preserving dominant components and reducing forgetting. For subsequent tasks, rotations are combined via weight averaging to further improve retention. Experiments on two benchmarks demonstrate reduced forgetting and superior overall performance compared to recent PECL baselines.

2606.09335 2026-06-09 eess.AS New submission

Factors affecting ASR performance: A study using state of the art ASR models in Indic Languages

Agneedh Basu, Pavan Kumar J, Pranav Bhat, Sujith Pulikodan, Visruth Sanka, Nihar Desai, Prasanta Kumar Ghosh

AI summary: A large-scale zero-shot analysis of several open-source ASR models on Indian-language speech datasets, examining how linguistic, speaker, and acoustic factors affect word error rate and revealing cross-lingual patterns and language-specific sensitivities.

2606.09332 2026-06-09 eess.SP New submission

Wearable Single-Lead ECG Detects Fine-Grained Structural Heart Disease Through Echo-Report Supervision

Chenyang He, Qinghao Zhao, Shun Huang, Jun Li, Gongzheng Tang, Hao Zhang, Tong Liu, Zhengkai Xue, Jian Liu, Kangyin Chen, Cheng Ding, Shenda Hong

AI summary: Proposes the AnyECG-Echo framework, which contrastively pre-trains on single-lead ECGs and echocardiography reports, detects 13 fine-grained structural heart disease subtypes in an external cohort, and achieves high AUROC with dual-axis interpretability.

Abstract

Structural heart disease (SHD) is a primary driver of heart failure and cardiovascular mortality, yet early detection remains constrained by the limited accessibility of echocardiography. While the single-lead electrocardiogram (ECG) is ubiquitous through wearables, existing AI screening models often depend on 12-lead inputs, generalize poorly across institutions, or require massive, condition-specific labeled datasets. Recent work has demonstrated the feasibility of contrastive pre-training between single-lead ECGs and echocardiography reports within a single health system. Here, we present AnyECG-Echo, a framework that advances this paradigm toward clinical translation through three key developments: (1) evaluation in a geographically independent external cohort (n = 16,621); (2) diagnostic coverage of 13 fine-grained SHD subtypes spanning myocardial, chamber, valvular, and great-vessel pathologies; and (3) dual-axis mechanistic interpretability combining electrophysiology-grounded Shapley attribution with emergent correlations to quantitative measurements. Across validation cohorts totaling n = 25,222, the model demonstrated high AUROC for high-impact subtypes, including reduced left ventricular systolic function (AUROC 0.866-0.924), global heart enlargement (0.877-0.931), and mitral stenosis (0.836-0.906). Furthermore, we successfully validated the alignment of model outputs with established medical physiological traits, thereby enhancing interpretability. Notably, we discovered that AnyECG-Echo's outputs function as physiologically grounded digital biomarkers that accurately track objective metrics such as LVEF and myocardial wall thickness. These findings prove that wearable single-lead ECGs can effectively detect fine-grained structural heart disease, offering a practical solution for population-scale screening.

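2606.09330 2026-06-09 eess.IV New submission

Dynamic XR Rendering Offloading Based on Feature-Based Quality Assessment

Sige Liu, Zhe Wang, Lavish Kamal Kumar, Yansha Deng

AI summary: Proposes an edge-aided XR rendering testbed that uses a perceptual quality metric based on deep feature embeddings and cosine similarity, together with a contextual bandit learning controller, to dynamically optimize rendering offloading decisions and balance perceptual quality against latency.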

Abstract

Extended Reality (XR) applications demand intensive computation and low latency, especially for real-time rendering tasks. In this letter, we present an edge-aided XR rendering testbed that dynamically offloads rendering workloads between the XR client and the edge server based on network conditions and latency constraints. The testbed integrates a Microsoft HoloLens 2 headset, a GPU-enabled edge server, and a customized remote rendering toolkit based on the HOLO Stream SDK, enabling seamless switching between local and edge rendering modes in real time. To overcome the limitations of pixel-level quality metrics under head movements and asynchronous frame arrivals, we propose a perceptual evaluation metric based on deep feature embeddings and cosine similarity, which remains robust to spatial and temporal misalignments. Furthermore, we design a contextual bandit learning controller to adapt rendering placement decisions in real time by jointly optimizing perceptual quality and latency. Experimental results demonstrate the feasibility and performance of our testbed, validating its effectiveness in delivering high-quality and interactive XR experiences.
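
The perceptual metric in this abstract can be sketched as a cosine similarity between the feature embedding of a reference frame and that of a rendered frame. This is a minimal sketch, assuming the embeddings have already been extracted by some deep network that the abstract does not name; `cosine_similarity`, `perceptual_score`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for degenerate (all-zero) embeddings
    return dot / (norm_a * norm_b)


def perceptual_score(reference_embedding, rendered_embedding):
    """Hypothetical quality score: 1.0 means identical feature direction."""
    return cosine_similarity(reference_embedding, rendered_embedding)


# Toy embeddings (illustrative values only).
same_direction = perceptual_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = perceptual_score([1.0, 0.0], [0.0, 1.0])
```

Cosine similarity depends only on the direction of each embedding, not its magnitude, so a uniform rescaling of the features leaves the score unchanged. Any robustness to spatial or temporal misalignment would have to come from the feature extractor itself, which this sketch does not model.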

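2606.09317 2026-06-09 eess.AS New submission

A Comparative Study of Pre-trained Speech Encoders and Training Objectives for Large-Scale Indic Spoken Language Identification

Agneedh Basu, Pavan Kumar J, Sujith P, Visruth Sanka, Nihar Desai, Prasanta Kumar Ghosh

AI summary: Systematically compares two pre-trained speech encoders, Whisper and FastConformer, each paired with a linear classifier, for spoken language identification across 42 Indian languages; evaluates three training objectives (cross-entropy, supervised contrastive loss, and hierarchical softmax) and finds that the frozen FastConformer excels in cross-domain tests while hierarchical softmax consistently outperforms the other objectives.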

Abstract

Spoken language identification (LID) for Indian languages is a challenging problem due to the large number of languages, significant phonetic overlap among related varieties, and the scarcity of labeled data for many low-resource languages. In this work, we present a systematic comparative study of two pre-trained speech encoders, Whisper and FastConformer, combined with a linear classifier for large-scale Indic LID spanning 42 languages across four linguistic families. We evaluate both encoders in frozen (linear probing) and fine-tuned settings, and compare three training objectives: cross-entropy (CE), supervised contrastive loss with cross-entropy (CE+SupCon), and hierarchical softmax (HSM). Models are trained on the Vaani dataset and evaluated in a cross-corpus setting on Vaani-Test (held-out), FLEURS, and Kathbath, providing insights into domain generalization. The frozen FastConformer encoder achieves over 90% macro accuracy on FLEURS and Kathbath without any task-specific adaptation, substantially outperforming Whisper on out-of-domain benchmarks, while fine-tuned Whisper yields stronger in-domain performance. HSM consistently outperforms CE and CE+SupCon for both encoders across all benchmarks, with the largest gains on out-of-domain test sets. CE+SupCon degrades FastConformer's cross-corpus generalization, suggesting that the contrastive objective over-specializes representations to in-domain conditions. Per-family analysis shows that Central Indo-Aryan varieties are the hardest to discriminate, with Hindi-Urdu and the Sadri-Chhattisgarhi-Surgujia cluster being the dominant confusion pairs.
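
To make the hierarchical softmax (HSM) objective concrete, the sketch below factors a language probability as P(family) × P(language | family). The abstract does not describe the hierarchy, so the two-level family-to-language tree, the function name `hierarchical_probs`, and the toy logits are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_probs(family_logits, language_logits):
    """Two-level hierarchical softmax (assumed structure).

    family_logits: dict mapping family name -> score.
    language_logits: dict mapping family name -> {language name -> score}.
    Returns a dict mapping language name -> P(family) * P(language | family).
    """
    families = list(family_logits)
    family_p = dict(zip(families, softmax([family_logits[f] for f in families])))
    probs = {}
    for family in families:
        languages = list(language_logits[family])
        within = softmax([language_logits[family][lang] for lang in languages])
        for lang, p in zip(languages, within):
            probs[lang] = family_p[family] * p
    return probs


# Toy logits for two families with two languages each (illustrative only).
probs = hierarchical_probs(
    {"Indo-Aryan": 1.0, "Dravidian": 0.0},
    {
        "Indo-Aryan": {"Hindi": 0.2, "Urdu": 0.0},
        "Dravidian": {"Tamil": 0.0, "Telugu": 0.0},
    },
)
predicted = max(probs, key=probs.get)
```

The per-language probabilities still sum to one, while a confusion between two languages of the same family only affects the within-family factor. Training would minimize the negative log of the returned probability for the correct language.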

2606.09292 2026-06-09 cs.RO cs.SY eess.SY New submission

Dual Quaternion-Based Unscented Kalman Filter with Visual Inertial Odometry for Navigation in GPS-Denied Environments

Mohamed Khalifa, Hashim A. Hashim

Affiliations: Carleton University

AI summary: Proposes a dual quaternion-based unscented Kalman filter (DQUKF) combined with visual inertial odometry (VIO) for accurate state estimation in GPS-denied environments, achieving a position RMSE of 0.2584 m on the EuRoC dataset.

DOI: 10.1016/j.measurement.2026.121964

Abstract

Reliable navigation in GPS-denied environments remains a fundamental challenge in robotics, aerospace, and autonomous vehicle applications. This paper presents a Dual Quaternion-Based Unscented Kalman Filter (DQUKF) equipped with a Visual Inertial Odometry (VIO) algorithm for accurate state estimation enabling navigation in GPS-denied locations. The proposed framework formulates the DQUKF in error-state form, where the nominal pose is represented by a unit dual quaternion and the local pose error is represented by a 6-dimensional twistor parameterization used for sigma point generation, covariance propagation, and measurement correction. In parallel, the VIO algorithm tracks features across image frames, synchronizes measurements between the IMU and camera, and provides visual constraints that complement inertial propagation. Simulation results on the EuRoC MAV dataset show that the proposed DQUKF converges under high initialization uncertainty and achieves a position RMSE of 0.2584 m in the difficult flight sequence, outperforming the benchmark filters.
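
For readers unfamiliar with the pose representation named in the abstract, the sketch below builds a unit dual quaternion from a rotation and a translation and recovers the translation again. It is a minimal sketch, not the paper's filter: it assumes the common convention real = r, dual = ½ t ⊗ r with quaternions stored as (w, x, y, z), and all function names are hypothetical.

```python
import math


def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )


def quat_conj(q):
    """Quaternion conjugate (inverse for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def pose_to_dual_quat(rotation, translation):
    """Return (real, dual) with real = r and dual = 0.5 * t * r (assumed convention)."""
    t = (0.0, translation[0], translation[1], translation[2])
    dual = tuple(0.5 * c for c in quat_mul(t, rotation))
    return rotation, dual


def dual_quat_to_translation(real, dual):
    """Recover the translation as the vector part of 2 * dual * conj(real)."""
    t = quat_mul(dual, quat_conj(real))
    return tuple(2.0 * c for c in t[1:])


# 90-degree rotation about z plus a translation (illustrative values only).
rotation = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
real, dual = pose_to_dual_quat(rotation, (1.0, 2.0, 3.0))
recovered = dual_quat_to_translation(real, dual)
unit_constraint = sum(r * d for r, d in zip(real, dual))
```

A unit dual quaternion satisfies two constraints: the real part has unit norm, and the real and dual parts are orthogonal as 4-vectors, which `unit_constraint` checks numerically. These constraints leave six degrees of freedom, matching the 6-dimensional local error parameterization mentioned in the abstract.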

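2606.09282 2026-06-09 eess.SY cs.MA cs.SY New submission

Revisiting mesoscopic traffic flow simulation in SUMO: Limitations, analysis, and an alternative

Ying-Chuan Ni, Alina Akopian, Anastasios Kouvelas, Michail A. Makridis

AI summary: Addresses the fact that the Eissfeldt mesoscopic model in SUMO does not follow LWR theory by proposing a discrete-time implementation based on the link transmission model that explicitly accounts for backward traveling spaces to capture congestion dynamics accurately.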

Comments: Presentation at SUMO Conference 2026

Abstract

Mesoscopic traffic flow models combine the merits of both macroscopic and microscopic models by capturing individual vehicle behavior in great detail while retaining computational efficiency. At the time of this study, the mesoscopic model proposed by Eissfeldt (2004) is used in Simulation of Urban MObility (SUMO). The movement of vehicles is governed by dynamic headways between edges. However, the model does not fully comply with the principle of the Lighthill-Whitham-Richards (LWR) model. Several problems are identified, including the incomplete consideration of queue dynamics and the limited implementation of backward traveling spaces. Two case study scenarios demonstrate that these problems lead to unrealistic onset and recovery patterns of congestion. The magnitude of congestion is generally underestimated with this model. To address these drawbacks, a proper mesoscopic discrete-time implementation of the link transmission model, which follows the LWR principle, is proposed. By explicitly incorporating backward traveling spaces to capture queue spillback phenomena, the proposed model provides a more precise representation of congestion dynamics. The link density outputs are consistent with the kinematic wave theory and the microscopic traffic simulation in SUMO, thus verifying its theoretical accuracy.
