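arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.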

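2606.13417 2026-06-12 eess.SP New submission

Mitigating SAR-ADC Non-Idealities in Massive MU-MIMO Systems via Affine Models

Jérémy Guichemerre, Christoph Studer

AI summary: Observing that the non-idealities of practical SAR ADCs are largely neglected, the paper proposes two affine models (one based on Bussgang's decomposition, one maximizing the SDR) and uses them to design low-complexity methods that mitigate these non-idealities in massive MU-MIMO systems.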

Comments: Presented at the International Symposium on Wireless Communication Systems (ISWCS) 2024

Abstract

Low-resolution data converters can significantly reduce the power consumption and silicon area of all-digital massive multi-user (MU) multiple-input multiple-output (MIMO) basestations. However, the existing literature almost exclusively focuses on idealistic quantization models, neglecting the inherent non-idealities present in real-world analog-to-digital converter (ADC) implementations. To overcome this limitation, we propose two affine models, one based on Bussgang's decomposition and one that maximizes the signal-to-distortion ratio (SDR), both accounting for the most prominent non-idealities in successive approximation register (SAR) ADCs. Subsequently, we utilize these models to devise low-complexity methods that mitigate SAR-ADC non-idealities in massive MU-MIMO wireless systems.
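
As a rough illustration of the affine-modeling idea (not the paper's actual models), the following Python sketch fits y ≈ a·x + b by least squares to the output of a hypothetical 1-bit converter with a comparator offset, driven by unit-variance Gaussian input. The converter model, the offset value 0.3, and all function names are assumptions made for illustration. With zero offset, the fitted gain should approach √(2/π), the standard Bussgang gain of a 1-bit quantizer for unit-variance Gaussian input.

```python
import math
import random


def affine_fit(x, y):
    """Least-squares affine fit y ~ a*x + b; returns (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    return a, b, residuals


def one_bit_adc(v, offset=0.0):
    """Hypothetical 1-bit converter with a comparator offset (illustrative only)."""
    return 1.0 if v + offset >= 0.0 else -1.0


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Ideal 1-bit quantizer: the affine fit reduces to the linear Bussgang gain.
a_ideal, b_ideal, d_ideal = affine_fit(x, [one_bit_adc(v) for v in x])

# Offset non-ideality: the output gains a bias that only an affine term captures.
a_off, b_off, d_off = affine_fit(x, [one_bit_adc(v, offset=0.3) for v in x])

# The residual (distortion) is uncorrelated with the input by construction.
mean_x = sum(x) / len(x)
resid_corr = sum((xi - mean_x) * di for xi, di in zip(x, d_off)) / len(x)

print(f"ideal:  a={a_ideal:.4f}, b={b_ideal:.4f}")
print(f"offset: a={a_off:.4f}, b={b_off:.4f}, residual correlation={resid_corr:.2e}")
```

In this toy setting the offset shows up mainly in the bias term b, which is the kind of effect a purely linear model cannot represent.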


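2606.13416 2026-06-12 eess.SP New submission

Towards Standardizing Affine Frequency Division Multiplexing (AFDM) for Future Wireless Networks

Qu Luo, Lixia Xiao, Pei Xiao, Zilong Liu, Yin Xu, Qihao Peng, Zeping Sui, Hee Wook Kim, Hüseyin Arslan

AI summary: A systematic study of AFDM from a standardization perspective. It examines backward compatibility with 4G/5G multi-numerology frameworks, FMCW radar, and LoRa; analyzes key capabilities such as multi-antenna and multi-user support and PAPR; and assesses its potential in NTN, ISAC, V2X, and UWA scenarios, concluding that AFDM is a timely and compelling technology for future wireless networks.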

Abstract

Affine frequency division multiplexing (AFDM) has emerged as a compelling waveform candidate for future wireless networks, owing to its strong resilience to doubly selective channels and its ability to enable the seamless integration of communication and sensing functionalities. Against this background, this article provides a systematic study of AFDM from a standardization perspective. We first introduce the principles of AFDM and discuss the major considerations involved in waveform standardization. We then examine the backward compatibility of AFDM with 4G/5G multi-numerology frameworks and their anticipated evolution, frequency-modulated continuous-wave (FMCW) radar waveforms, and long-range (LoRa) modulation, demonstrating that AFDM can be incorporated into legacy processing chains with limited modification. Key standardization-critical capabilities are further discussed, including multiple-antenna and multi-user support and peak-to-average power ratio (PAPR). Finally, we investigate the potential of AFDM in several emerging scenarios, including non-terrestrial networks (NTN), integrated sensing and communications (ISAC), vehicle-to-everything (V2X), and underwater acoustic (UWA) communications, in which severe delay-Doppler dispersion places stringent demands on waveform robustness. Through these explorations, it is shown that AFDM represents a timely and compelling technology for future wireless networks.

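2606.13378 2026-06-12 eess.SP New submission

The Influence of Gain and Phase Mismatches on Beam Patterns in Phased Arrays

Jérémy Guichemerre, Christoph Studer

AI summary: To address the degradation of the maximum sidelobe level caused by gain and phase mismatches in phased arrays, the paper proposes a frequency-domain framework that describes the beampattern as a base function evaluated along a deformation, shows that mismatches produce weighted replicas of the ideal beampattern, and derives an approximation of the maximum-SLL distribution under random mismatches, enabling fast yield-oriented design.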

Comments: Submitted to a journal

Abstract

Practical implementations of phased arrays suffer from per-antenna gain, phase, and delay mismatches, which can significantly worsen the maximum sidelobe level (SLL) of beampatterns. The existing literature either analyzes specific structured mismatch patterns or derives per-angle marginal statistics under random mismatches, which fail to characterize global beampattern metrics such as the maximum SLL. To address this limitation, we propose a frequency-domain framework in which the beampattern is described by a tapering-window-dependent base function evaluated along a deformation determined by the array architecture and signal bandwidth. This formulation enables a spectral analysis of mismatches, revealing that element-wise errors generate weighted replicas of the ideal beampattern whose amplitudes are given by the discrete Fourier transform of the mismatch sequence. Building on this insight, we derive an approximation of the maximum SLL distribution under random gain and phase mismatches. The resulting expressions enable yield-oriented design and rapid design-space exploration without relying on computationally intensive Monte-Carlo simulations.
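
The "weighted replicas" statement can be checked numerically in a simplified setting. The sketch below assumes a narrowband uniform linear array with a uniform taper, which is much narrower than the paper's framework (no delay mismatches, no bandwidth-dependent deformation). Each element weight is multiplied by 1 + e_n, where e_n is a hypothetical small complex error. Writing e_n through its inverse DFT shows that the mismatched pattern equals the ideal pattern plus copies shifted by k/N in normalized spatial frequency, each scaled by E_k/N. All names and values are illustrative assumptions.

```python
import cmath
import random


def beampattern(weights, u):
    """Narrowband array factor of a uniform linear array at normalized spatial frequency u."""
    return sum(w * cmath.exp(2j * cmath.pi * n * u) for n, w in enumerate(weights))


def dft(seq):
    """Plain O(N^2) discrete Fourier transform (forward, negative exponent)."""
    size = len(seq)
    return [
        sum(seq[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size))
        for k in range(size)
    ]


rng = random.Random(1)
N = 16
taper = [1.0] * N  # uniform taper, chosen only for simplicity
# Hypothetical per-element complex errors: element n has weight taper[n] * (1 + errors[n]).
errors = [complex(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(N)]

E = dft(errors)
u = 0.137  # arbitrary evaluation point

# Direct evaluation of the mismatched beampattern.
direct = beampattern([w * (1 + e) for w, e in zip(taper, errors)], u)

# Ideal pattern plus N shifted replicas weighted by the DFT of the error sequence.
replicas = beampattern(taper, u) + sum(
    E[k] * beampattern(taper, u + k / N) for k in range(N)
) / N

print(abs(direct - replicas))
```

The two evaluations agree up to floating-point rounding, because the identity is exact for this array model.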


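2606.13129 2026-06-12 eess.SP New submission

A High Input Impedance Chopper Stabilized Amplifier Based On Charge Conservation

Prabhas K Deshpande, Naveen Kadayinti

AI summary: To address the low input impedance of chopper-stabilized amplifiers, the paper proposes a differential capacitor flipping technique that makes the input impedance purely capacitive and independent of the chopping frequency. Implemented in TSMC 65 nm CMOS, it achieves a DC input impedance of 21 GΩ and is demonstrated for ECG signal acquisition with dry electrodes.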

Comments: 11 pages, 29 figures

Abstract

Chopper-stabilized amplifiers are widely used to realize amplifiers with low offset and to reject flicker noise. One of their main limitations is the low input impedance (Zin) produced by the switched-capacitor input network. Here, Zin is resistive due to the switched-capacitor action and is inversely proportional to the product of the chopping frequency (Fch) and the input capacitance (Ci). Since Fch should be greater than the flicker-noise corner frequency, this results in a low Zin. When interfacing with sensors that have a high output impedance (Zo), chopper-stabilized amplifiers load the sensor, which reduces sensitivity. This paper presents a novel input-impedance boosting technique, the differential capacitor flipping technique for chopper-based capacitively coupled instrumentation amplifiers (CCIAs), which prevents the discharge and recharge of the Ci's in every cycle by reconfiguring the capacitor positions while preserving the chopping operation. Ideally, this results in a purely capacitive Zin that is independent of Fch. The proposed architecture is used to demonstrate electrocardiogram (ECG) signal acquisition with dry electrodes that have a Zo on the order of a few megaohms. The circuit, implemented in a TSMC 65 nm CMOS technology node, features a Zin of 21 GΩ at DC. It consumes 2.6 µW (2.8 µW including the clock-generation circuits), with 7.2 µVrms (1 Hz to 150 Hz) of total integrated input-referred noise.
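
To get a feel for the loading problem, the short Python sketch below evaluates a commonly quoted approximation for a conventional chopper CCIA, Zin ≈ 1/(2·Fch·Ci), and the resulting resistive-divider attenuation for a megaohm-level source. The factor of 2, the chopping frequency, the input capacitance, and the source impedance are assumptions chosen for illustration and are not taken from the paper; only the 21 GΩ figure comes from the abstract.

```python
def switched_cap_zin(f_chop_hz, c_in_farad):
    """Approximate resistive input impedance of a chopped capacitive input.

    Uses the common approximation Zin ~ 1 / (2 * Fch * Ci); the factor 2 is
    an assumption here, the abstract only states the inverse proportionality.
    """
    return 1.0 / (2.0 * f_chop_hz * c_in_farad)


def loading_factor(z_in_ohm, z_source_ohm):
    """Fraction of the source voltage that reaches the amplifier input."""
    return z_in_ohm / (z_in_ohm + z_source_ohm)


f_chop = 10e3     # hypothetical chopping frequency: 10 kHz
c_in = 10e-12     # hypothetical input capacitance: 10 pF
z_source = 2e6    # hypothetical dry-electrode impedance: 2 megaohms

z_conventional = switched_cap_zin(f_chop, c_in)  # about 5 megaohms
z_boosted = 21e9                                 # DC value reported in the abstract

print(f"conventional Zin: {z_conventional / 1e6:.1f} Mohm, "
      f"signal retained: {loading_factor(z_conventional, z_source):.3f}")
print(f"boosted Zin:      {z_boosted / 1e9:.0f} Gohm, "
      f"signal retained: {loading_factor(z_boosted, z_source):.6f}")
```

Under these assumed values, the conventional input loses roughly 30% of the signal across the electrode, while a gigaohm-level input leaves it essentially intact.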


2606.13110 2026-06-12 eess.IV New submission

JOMP: Jointly-Optimized Mixed-Precision Quantization Across Neural Video Coding Frameworks and Buffering Strategies

Yu-Hsiang Lin, Ruhan Conceição, Chun-Hung Wu, Huu-Tai Phung, Tzu-Hsiang Chou, Marcelo Porto, Luciano Volcan Agostini, Wen-Hsiao Peng

AI summary: Proposes the JOMP framework, which treats quantization parameters and bit widths as learnable variables to realize mixed-precision quantization for neural video codecs, achieving rate-distortion performance close to DCVC-FM while reducing bit operations by 87.6%.

Abstract

Variational autoencoder-based neural video coding has demonstrated impressive rate-distortion performance. However, its adoption in real-world applications remains hindered by challenges such as prohibitively high computational complexity and limited cross-platform interoperability. These issues are often overlooked, as most neural video codecs rely on floating-point arithmetic to fully explore their rate-distortion potential. Practical deployment, however, requires integer-based implementations. Converting floating-point implementations into integer-based networks is non-trivial, since it involves quantizing interdependent coding components, whose sensitivity to precision may vary across codec designs. This paper introduces a Jointly-Optimized Mixed-Precision (JOMP) framework, in which both quantization parameters and bit widths are treated as learnable variables during training. This enables different codec modules to operate at varying precision levels, thereby jointly optimizing the rate-distortion-complexity trade-off. To the best of our knowledge, JOMP is the first mixed-precision quantization framework for neural video codecs. Its effectiveness is validated through a systematic investigation of quantization across different coding frameworks and temporal buffering strategies. Our study marks the first attempt at a unified understanding of the combined effects of modern coding frameworks and temporal buffering strategies, with the aim of informing future development of neural video codecs from a practicality perspective. In addition, we develop a complete integerization pipeline to achieve deterministic decoding. Overall, when applied to our best-performing model, JOMP enables end-to-end mixed-precision learning for integer neural video codecs, achieving rate-distortion performance comparable to that of the state-of-the-art DCVC-FM while reducing bit operations by 87.6%.

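2606.13091 2026-06-12 eess.SP New submission

Inverse Learning assisted V2I Communication for Intent Based 6G ISAC Vehicular Networks

Anoop C V, Anup Aprem

AI summary: For the autonomous decision-making of RSUs with integrated sensing and communication in 6G vehicular networks, the paper proposes inverse learning methods that infer the RSU's intent function, enabling V2I communication applications such as adaptive beam allocation.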

Abstract

6G is expected to bring unprecedented advancements in the capabilities of vehicular networks. However, the advent of 6G will also change the operation of vehicular communication infrastructure such as roadside units (RSUs), including the adoption of an autonomous intent-based network paradigm and integrated sensing and communication (ISAC) capabilities. While ISAC enables sensing and communication within a single 6G network node, the intent-based network design paradigm ensures that network nodes such as RSUs act as autonomous cognitive agents to fulfill the objectives of their respective communication service providers. This paradigm shift necessitates the development of V2I communication strategies that learn and adapt to the sensing-assisted communication and the autonomous decision-making strategies of RSUs. We model the RSU as a constrained utility maximizer, where the utility function characterizes the RSU intent, and formulate an inverse learning (IL) problem to infer the underlying utility function from observed ISAC RSU actions, for example the adaptive beamwidth allocation in response to the kinematic states of vehicles within a vehicular micro-cloud (VMC). The main contributions of this paper are: (i) ATIL, a nonparametric method based on Afriat's theorem for fixed utility learning; (ii) FICNNIL, a parametric approach using fully input-concave neural networks for structured fixed utility learning; (iii) PICNNIL, a parametric approach based on partially input-concave neural networks for inverse learning of state-dependent utilities; and (iv) the federated inverse learning algorithms FedFICNNIL and FedPICNNIL for fixed and state-dependent utilities, respectively. We demonstrate the proposed IL-based framework for two V2I communication applications in VMCs, namely predictive scheduling for cooperative data downloading and dynamic cluster-head selection.

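2606.13046 2026-06-12 eess.SP New submission

Thermal characterisation by Scanning Photothermal Radiometry using a random undersampled measurement scheme

Florian Crouau (I2M-BX), Alejandro Mateos-Canseco (I2M-BX), Jérémie Maire (I2M-BX), Jean-Luc Battaglia (I2M-BX), Stéphane Chevalier (I2M-BX)

AI summary: To address the long measurement times of scanning photothermal radiometry, the paper proposes a random undersampling scheme that achieves a 6-fold reduction in the number of measurements on a carbon-fiber aluminum-matrix composite, using irregular sampling of sparse signals and a weighted random technique.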

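2606.12967 2026-06-12 eess.SP New submission

Simulating Torsional Vibrations of Faulty Bevel Gears Using the Polygonal Contact Method

Milla Vehviläinen, Aleksanteri Hämäläinen, Pekka Rahkola, Mikko Savolainen, Janne Keränen, Jari Halme, Jenni Pippuri-Mäkeläinen, Kari Tammi, Anouar Belahcen

AI summary: Proposes a multibody simulation based on the polygonal contact method to reproduce the torsional vibrations of an experimental azimuth thruster test rig. The approach handles arbitrary fault geometries, and the simulated signals closely match the measurements in both the time and frequency domains.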

Abstract

Gears are an integral component of electromechanical applications, but accurate condition monitoring methods, including data-driven predictive maintenance, are strongly dependent on high-quality data, especially from faulty components. To address the scarcity of data, we proposed a multibody simulation using an advanced polygonal contact method to replicate torsional vibrations from an experimental azimuth thruster test rig. The key novelty is the ability to simulate both healthy and faulty gears with arbitrary fault geometries. The simulated signals closely matched the measurements in both time and frequency domains. In the time domain, average torque levels and periodic fluctuations aligned well, although measured signals exhibited higher peak-to-peak amplitudes and greater noise, particularly in healthy conditions at lower rotational speeds. In the frequency domain, the simulations accurately reproduced expected fault frequencies and corresponding sidebands, with larger faults producing higher amplitudes. While the simulations tended to overestimate peak amplitudes and underestimate external noise, the results were highly comparable to measurements and consistent with the physical expectations. These findings provide a robust foundation for enhancing data-driven condition monitoring methods, particularly those employing machine learning or deep learning.


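2606.12899 2026-06-12 eess.SP New submission

LGVSC: A Large-Model-Driven Generative Video Semantic Communication Framework

Yu Ma, Hang Yin, Li Qiao, Shuo Sun, Zhen Gao, Yin Xu, Wenjun Zhang

AI summary: Proposes LGVSC, a large-model-driven generative video semantic communication framework that decouples the encoder and decoder, introduces a probability-based semantic similarity score, and uses semantic-guided keyframe extraction to achieve efficient video semantic transmission at extremely low bandwidth while retaining zero-shot generalization.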

Comments: Accepted by IEEE Transactions on Vehicular Technology

Abstract

Driven by the massive video transmission requirements in the Internet of Everything, semantic communication holds great promise for striking a balance between transmission efficiency and quality. This paper introduces a large-model-driven generative video semantic communication (LGVSC) framework, enabling efficient video semantic transmission under extremely low bandwidth conditions. First, by decoupling the encoder and decoder as well as exposing explicit intermediate semantic representations, LGVSC maintains interpretability, avoiding the black-box behavior commonly observed in end-to-end systems. Next, we introduce a new metric, i.e., the probability-based semantic similarity score (PSSS), which quantifies semantic similarity for complex modalities within a continuous range, allowing for more precise evaluation of semantic content. Building on PSSS, we propose a semantic-guided keyframe extraction module driven by a multimodal large model. This module can enhance fine-grained semantic consistency during keyframe selection at the transmitter, optimizing transmission bandwidth without compromising semantic fidelity. Additionally, we design a generative large-model-driven dynamic semantic-adaptive decoder at the receiver, which can adapt to videos of arbitrary lengths. Simulation results demonstrate that LGVSC significantly outperforms traditional schemes, achieving a channel bandwidth ratio on the order of 10^-4 to 10^-3, while maintaining strong zero-shot generalization across downstream tasks.


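2606.12870 2026-06-12 eess.SP New submission

Rotatable Antenna-Enabled Near-Field Integrated Sensing and Communication

Zequan Wang, Liang Yin, Yitong Liu, Hongwen Yang

AI summary: The paper proposes exploiting the orientation-domain spatial degrees of freedom provided by element-wise rotation of rotatable antennas (RAs) to enhance near-field communication and sensing. It jointly optimizes beamforming and antenna boresight directions with an alternating optimization algorithm and derives a closed-form root Cramér-Rao bound (RCRB). Simulations show that RAs can compensate for limited RF chains and improve near-field sensing accuracy.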

Abstract

In this paper, we propose leveraging rotatable antennas (RAs) to enhance near-field communication and sensing by exploiting a new orientation-domain spatial degree of freedom (DoF) provided by element-wise antenna rotation. Specifically, we investigate an RA-enabled near-field integrated sensing and communication (ISAC) system with sub-connected hybrid beamforming, where each transmit RA can independently adjust its boresight direction under a practical rotation constraint. A spherical-wave channel model incorporating orientation-dependent antenna gains is established to characterize multi-user communication and target sensing in the presence of clutter. Based on this model, a weighted communication-sensing utility maximization problem is formulated by jointly optimizing the receive beamformer, digital beamformer, analog beamformer, and RA boresight directions. To solve the resulting non-convex problem, an alternating optimization algorithm is developed by combining fractional programming, Riemannian optimization, and a spherical-cap Frank-Wolfe-based boresight update. To further understand the impact of RA rotation on near-field sensing, we derive a closed-form root Cramér-Rao bound (RCRB) expression. Simulation results demonstrate the convergence and effectiveness of the proposed algorithm. It is shown that the RA-enabled hybrid design can match or even outperform the fully-digital FPA benchmark in some regimes, indicating that the orientation-domain DoF introduced by element-wise rotation can compensate for limited RF chains. The RCRB and beampattern results further show that RA rotation improves off-broadside sensing accuracy, enhances range-domain focusing, and suppresses same-angle clutter in the near field.

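2606.12844 2026-06-12 eess.SP New submission

Active Perception for Radio Map Reconstruction in Uncharted 3D Air-Ground Environments

Wenlihan Lu, Miaowen Wen, Shijian Gao

AI summary: Proposes 3D-URAM, an active perception framework that recovers radio maps with a Bayesian UNet and plans paths with a dynamic probabilistic roadmap and a transformer-based policy, reducing reconstruction error by more than 50% under sparse measurements.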

Abstract

Radio maps provide the essential foundation for low-altitude networking systems. Unlike terrestrial radio maps, which are typically generated from drive-test measurements, mapping the air-ground environment requires the deployment of unmanned aerial vehicles (UAVs). This shift introduces two formidable challenges in uncharted 3D scenarios. First, sparse radio measurements and incomplete geometric observations hinder accurate reconstruction. Second, the large 3D action space and the strict power constraints caused by the high energy consumption of spectrum scanners make informative exploration difficult. To address these issues, this paper proposes 3D uncertainty-aware radio active mapping (3D-URAM), a closed-loop active perception framework that decouples the mapping process into two offline-trained stages. In Stage I, a Bayesian UNet is developed to recover radio maps from sparse measurements and partial geometry while providing calibrated predictive uncertainty. In Stage II, a dynamic probabilistic roadmap and a transformer-based waypoint selection policy trained via proximal policy optimization maximize long-horizon uncertainty reduction under travel budgets. Experimental results demonstrate that 3D-URAM reduces reconstruction error by over 50% compared to representative baselines. Real-world field tests within a 300 m × 200 m × 100 m space also validate the potential of active radio map reconstruction.

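2606.12831 2026-06-12 eess.SP New submission

Sub-array Selection Optimization for Joint Self-Interference and Multi-User Interference Suppression in FD mMIMO

Yuanzhe Gong, Yuanxing Zhang, Tho Le-Ngoc

AI summary: Proposes a beamforming optimization scheme with joint antenna sub-array selection and angular perturbation-based nulling for full-duplex massive MIMO systems, suppressing self-interference and multi-user interference simultaneously. Based on measured channels, sub-array selection improves residual beam-level self-interference suppression by 29.2 dB for the sample 1x2 sub-arrays, and the joint scheme reaches an average beam-level isolation of 85.2 dB.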

Abstract

This paper proposes a beamforming optimization scheme with joint antenna sub-array selection (SAS) and angular perturbation-based nulling (APN) for full-duplex (FD) massive multiple-input multiple-output (mMIMO) systems, to simultaneously suppress self-interference (SI) and multi-user interference (MUI). A comprehensive over-the-air SI channel measurement campaign, conducted with an 8x8Tx-8x8Rx FD array prototype, reveals significant variations across sub-arrays at different spatial locations, as well as reconfigurable characteristics of the SI channel under diverse Tx and Rx sub-array configurations. To exploit the selective SI channels, a particle swarm optimization (PSO)-based algorithm is developed to jointly determine optimal sub-array indices and perturbed steering angles, thereby effectively nullifying potential interference. Selecting sub-arrays with inherently lower SI channels notably enhances the beam-level isolation, while the added selection flexibility among comparable SI channels ensures more uniform SI suppression across diverse DL/UL locations and significantly improves worst-case isolation. Experimental evaluation based on the measured SI channel demonstrates that the proposed SAS technique achieves residual Tx-Rx beam-level SI suppression improvements of 29.2 dB and 26.6 dB for the sample 1x2 and 1x4 sub-arrays, respectively. A worst-case improvement greater than 30.7 dB is observed. Overall, the joint SAS and APN optimization scheme achieves average beam-level isolation of 85.2 dB and 83.3 dB with the 1x2 and 1x4 sub-arrays, respectively. With the application of a baseband precoder, all tested sub-array configurations achieve average MUI suppression better than -181.3 dB. These results confirm the potential of the proposed optimization algorithm to successfully reduce interference to the noise floor, thereby guaranteeing reliable FD mMIMO operation.


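2606.12823 2026-06-12 eess.SP New submission

Chirp Parameter Optimization and Distributed Detection for Cooperative RSMA-AFDM Systems

Qingyu Li, Guanghui Liu, Yusha Liu, Fuchen Xu, Chengxiang Liu, Hongjun Liu, Liaoyuan Zeng

AI summary: Because the signal dispersion of AFDM makes it difficult to adopt conventional multiple-access schemes directly, the paper introduces cooperative rate-splitting multiple access (RSMA), optimizes the chirp parameters to reduce the correlation between users' channels, and designs two expectation-propagation-based distributed detection schemes that fully exploit the diversity gain.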

Comments: This work has been submitted to the IEEE for possible publication

Abstract

Affine frequency division multiplexing (AFDM) exhibits excellent Doppler robustness and the ability to characterize doubly selective channels. However, its signal dispersion characteristics make it challenging to directly adopt traditional time-frequency multiple access schemes. To address this issue, we introduce cooperative rate splitting multiple access (RSMA) for AFDM systems. The flexible configuration of AFDM chirp parameters can reduce the correlation between users' equivalent channels, which decreases the interference from RSMA private streams. We conduct a theoretical analysis of the cooperative RSMA-AFDM system and demonstrate that minimizing the overlap in the channel column spaces among users can effectively enhance the system performance. Guided by this analysis, we design a chirp parameter optimization scheme that reduces multi-user interference and maximizes diversity gain. To fully exploit the diversity gain brought by the proposed chirp parameter optimization, two expectation propagation (EP)-based distributed cooperative detection schemes are proposed. First, a decision-fusion-based method is developed, where local information and cooperative information are fused by maximum ratio combining, achieving a globally consistent estimate of the common stream. Second, we develop a belief-consensus EP-based detection scheme. In each iteration, user nodes exchange and fuse the first- and second-order statistics of the common stream, and the resulting beliefs gradually converge to a consistent global decision, which significantly improves the overall reliability.


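2606.12884 2026-06-12 stat.ME eess.SP New submission

Volterra-Wiener-Kunchenko Orthogonalization: From Wiener-Hermite to Distribution-Matched Volterra Bases

Serhii Zabolotnii

AI summary: To address the ill-conditioning of Volterra identification under non-Gaussian input, the paper constructs a distribution-matched VWK basis by oriented Gram-Schmidt orthogonalization and proves that the risk of a self-normalized diagonal estimator in the variance-matched Gaussian basis is governed by the skew coefficient. Experiments show that the VWK basis is better conditioned than the power basis.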

Comments: 20 pages, 1 figure; companion reproducibility archive with code, frozen results, and Lean 4 files

Abstract

The monomial parameterization of finite-memory Volterra identification is ill-conditioned under non-Gaussian input, and the Wiener-Hermite expansion removes this ill-conditioning only for Gaussian white-noise input. We construct the distribution-matched Volterra-Wiener-Kunchenko (VWK) basis by oriented Gram-Schmidt orthogonalization of monomials in $L^2(P)$ and use it as an arbitrary-polynomial-chaos coordinate system for finite-memory Volterra identification from data, following the generalized polynomial chaos of Xiu and Karniadakis (2002) and the data-driven arbitrary polynomial chaos of Oladyshkin and Nowak (2012). The basis itself is classical; the contribution is the Volterra-estimation reading. First, an order-2 misspecification-penalty theorem shows that a self-normalized diagonal estimator in the variance-matched Gaussian basis incurs an excess $L^2(P)$ risk governed by the skew coefficient $\delta=\mu_3/\sigma^2$, vanishing exactly for symmetric inputs. Second, conditioning experiments separate the constructional fact that the population matched Gram is the identity from the finite-sample design Gram: at $n=2000$, the centered-exponential empirical VWK Gram remains far better conditioned than the power Gram, although it degrades with degree. Third, a machine-checked Lean 4 proof establishes the Binomial$(N,p)$ Krawtchouk row for arbitrary $N$. Full least squares over a fixed span is basis-invariant, so VWK stabilizes diagonal cross-correlation and regularized coordinate fits rather than claiming universal prediction superiority. The analysis is moment-based, finite-memory, and restricted to product input laws.
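
The core construction, Gram-Schmidt orthogonalization of monomials under the input distribution, can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the paper's code: it works in one variable only, replaces the population inner product in $L^2(P)$ by a sample mean over 2000 centered-exponential draws, and uses modified Gram-Schmidt. Because the basis is orthonormalized against the same samples used to evaluate the Gram matrix, that matrix comes out as the identity by construction; the finite-sample effects discussed in the abstract arise when the basis is built from one distribution and evaluated on different data. Function names are hypothetical.

```python
import math
import random


def inner(f, g):
    """Empirical L2(P) inner product: sample mean of f(x) * g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def matched_basis(samples, degree):
    """Modified Gram-Schmidt on 1, x, ..., x^degree evaluated at the samples.

    Normalizing by a positive norm keeps each leading coefficient positive.
    """
    basis = []
    for k in range(degree + 1):
        v = [x ** k for x in samples]
        for q in basis:
            c = inner(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(inner(v, v))
        basis.append([vi / norm for vi in v])
    return basis


rng = random.Random(0)
# Centered exponential input: skewed, non-Gaussian, zero mean.
samples = [rng.expovariate(1.0) - 1.0 for _ in range(2000)]

powers = [[x ** k for x in samples] for k in range(4)]
basis = matched_basis(samples, 3)

gram_power = [[inner(p, q) for q in powers] for p in powers]
gram_matched = [[inner(p, q) for q in basis] for p in basis]

for row in gram_matched:
    print(["%.3f" % abs(v) for v in row])
```

Comparing `gram_power` with `gram_matched` shows why the monomial coordinates are awkward for skewed inputs: the power Gram has large off-diagonal entries driven by the higher moments, while the matched Gram is diagonal.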


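2606.13544 2026-06-12 eess.AS cs.AI cs.CL New submission

Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

Soumyajit Mitra, Prabhat Pandey, Abhinav Jain, Shanmukha Sahith, K V Vijay Girish

AI summary: Proposes ModeratorLM, a role-conditioned speech large language model that uses chunk-wise streaming and chain-of-thought reasoning to achieve adaptive turn-taking in multi-party conversations, substantially improving turn-taking precision and recall.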

Comments: Accepted for publication at Interspeech 2026

AI中文摘要

多方口语对话中的轮流发言仍然是语音代理面临的基本挑战，特别是在动态的发言权竞争和用户期望变化的情况下。我们提出ModeratorLM，一种角色扮演语音代理，它在多方环境中根据明确分配的角色来调节轮流发言行为。该系统基于以分块流式方式运行的语音大语言模型。我们进一步引入了一种推理增强变体，该变体结合了对对话上下文和分配角色的链式推理。我们构建了RolePlayConv，一个大规模合成数据集，包含具有多种助手角色的口语多方对话。在真实会议数据和RolePlayConv上的实验表明，与无角色条件的基线相比，轮流发言精度提高了40%以上，召回率提高了70%以上，同时大幅减少了误报中断。

英文摘要

Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing voice agent that conditions turn-taking behavior on an explicitly assigned role in multi-party settings. The system is built on a speech large language model operating in chunk-wise streaming manner. We further introduce a reasoning-augmented variant that incorporates chain-of-thought reasoning over conversational context and the assigned role. We construct RolePlayConv, a large-scale synthetic dataset of spoken multi-party conversations with diverse assistant roles. Experiments on real-world meeting data and RolePlayConv show improved turn-taking precision by over 40% and recall by more than 70%, while substantially reducing false-positive interruptions compared to non-role-conditioned baselines.

2606.13450 2026-06-12 eess.AS cs.SD New submission

Endpoint Anticipation for Low-Latency Spoken Dialogue

低延迟口语对话的端点预测

Sathvik Udupa, Shinji Watanabe, Petr Schwarz, Jan Cernocky

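AI summary: Proposes Endpoint Anticipation, which achieves low latency by forecasting end-of-turn signals in advance and speculatively executing the LLM and TTS pipelines on partial context, reducing average latency by 505 ms.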

Comments: Accepted at Interspeech 2026

AI中文摘要

虽然低延迟交互对于口语对话至关重要，但级联架构通常受限于反应式话轮结束检测。我们提出端点预测，从反应式检测转向主动预测结束信号。我们的基于语音的模型可提前最多2.56秒预测端点，从而能够在部分上下文中投机执行LLM和TTS流水线。我们引入指标来量化实现的延迟降低与计算冗余之间的权衡。在对话和任务导向数据集上的评估表明，我们的模型始终优于基于VAP的竞争基线。与Unmute框架的集成展示了平均延迟降低505毫秒，投机计算增加28.4%，有效掩盖了顺序瓶颈，从而在实时语音到语音交互中实现复杂推理。

English abstract

While low-latency interaction is critical for spoken dialogue, cascaded architectures are often bottlenecked by reactive turn-completion detection. We propose Endpoint Anticipation, shifting from reactive detection to proactive forecasting of end-of-turn signals. Our speech-based model anticipates endpoints up to 2.56 seconds in advance, enabling speculative execution of LLM and TTS pipelines on partial context. We introduce metrics to quantify the trade-off between realized latency reduction and computational redundancy. Evaluation across conversational and task-oriented datasets shows our model consistently outperforms competitive VAP-based baselines. Integration with the Unmute framework demonstrates a 505 ms average latency reduction with a 28.4% increase in speculative computation, effectively masking sequential bottlenecks to enable complex reasoning in real-time speech-to-speech interaction.

2606.13193 2026-06-12 eess.AS cs.PL cs.SD New submission

A Dual-Mode Faust-to-CLAP Compilation System

双模式 Faust 到 CLAP 编译系统

Facundo Franchino (1), Stéphane Letz (2), Jatin Chowdhury (3) ((1) University of York, (2) GRAME-CNCM, (3) Massachusetts Institute of Technology)

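AI summary: Presents the faust2clap framework, which supports a static (ahead-of-time compiled) mode and a dynamic (interpreted) mode, and preserves DSP parameter identity through an address-based identity matching algorithm and a stable slot allocation scheme, enabling efficient compilation and hot reloading.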

Comments: 4 pages, 4 figures, 1 algorithm. Presented at the International Faust Conference (IFC-26), Lyon, France, June 2026

AI中文摘要

我们描述了 faust2clap，一个建立从 Faust DSP 规范到 CLAP 格式的首个官方维护编译路径的框架。该系统以两种不同模式运行。静态模式采用提前编译以生成最优效率的原生二进制文件，而动态模式使用运行时解释以允许在不中断宿主应用程序的情况下修改 DSP 代码。后一种能力解决了音频软件开发中一个长期存在的摩擦，即编辑、编译和重载循环的累积开销。我们详细阐述了两种模式背后的算法机制，特别关注参数身份问题。为了在结构 DSP 突变中保留参数值及其与宿主自动化的绑定，我们引入了一种基于地址的身份匹配算法和一种稳定的槽位分配方案。该实现包含约 2400 行 C++ 架构和 Python 工具代码，并已集成到 Faust 主发行版中。

English abstract

We describe faust2clap, a framework establishing the first officially maintained compilation pathway from Faust DSP specifications to the CLAP format. The system operates in two different modes. A static mode employs ahead-of-time compilation to yield native binaries of optimal efficiency, while a dynamic mode uses runtime interpretation to permit DSP code modification without interrupting the host application. This latter capability addresses a persistent friction in audio software development, namely the cumulative overhead of the edit, compile, and reload cycle. We detail the algorithmic machinery underlying both modes, focusing specifically on the problem of parameter identity. To preserve both parameter values and their bindings to host automation across structural DSP mutations, we introduce an address-based identity matching algorithm and a stable slot allocation scheme. The implementation, comprising approximately 2,400 lines of C++ architecture and Python tooling code, has been integrated into the main Faust distribution.
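The idea of keeping parameter identity across a DSP reload can be sketched in a few lines. This is a hypothetical illustration, not the faust2clap implementation or its API: the class `ParamTable`, its fields, and the example addresses are invented here, and the policy of never recycling a retired slot is an assumption about what "stable slot allocation" might mean.

```python
from dataclasses import dataclass, field


@dataclass
class ParamTable:
    """Hypothetical host-facing parameter table (not the faust2clap API)."""

    slot_of: dict = field(default_factory=dict)   # address -> stable slot id
    value_of: dict = field(default_factory=dict)  # slot id -> current value
    next_slot: int = 0

    def reload(self, params):
        """Apply a newly compiled DSP, given as {address: default value}.

        Returns {address: slot id} for the parameters that are now active.
        """
        for address, default in params.items():
            if address not in self.slot_of:
                # Unseen address: allocate a fresh slot, never a recycled one.
                self.slot_of[address] = self.next_slot
                self.value_of[self.next_slot] = default
                self.next_slot += 1
            # Known address: keep both the slot id and the current value.
        return {address: self.slot_of[address] for address in params}


table = ParamTable()
first = table.reload({"/synth/freq": 440.0, "/synth/gain": 0.5})
table.value_of[first["/synth/gain"]] = 0.8  # user or host automation edit

# Structural edit: "freq" removed, "cutoff" added, "gain" unchanged.
second = table.reload({"/synth/gain": 0.5, "/synth/cutoff": 1000.0})
print(second, table.value_of[second["/synth/gain"]])
```

In this sketch a parameter that survives the edit keeps its slot and its edited value, so a host binding that refers to the slot id stays valid, while a new parameter can never collide with an older binding.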

2606.13109 2026-06-12 eess.AS cs.SD New submission

Generating Training Targets for Real-World Speech Enhancement via Close-to-Distant Microphone Projection

为真实场景语音增强生成训练目标：通过近远麦克风投影

Tomohiro Nakatani, Rintaro Ikeshita, Naoyuki Kamo, Marc Delcroix, Shoko Araki

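AI summary: Proposes Close-to-Distant microphone Projection (C2D projection), which generates paired data from real recordings and realizes the projection with a Parametric Multichannel Wiener Filter; neural networks trained on the projected data outperform the existing GSS method in distant speech enhancement.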

AI中文摘要

在远距离语音捕获场景中训练语音增强（SE）神经网络需要配对的失真和干净参考语音信号。虽然此类数据通常通过模拟生成，但模拟与真实录音之间的不匹配显著限制了SE的准确性。为解决此问题，我们提出近远麦克风投影（C2D投影），一种从近距离和远距离麦克风捕获的真实录音中生成配对数据的方法。C2D投影估计一个最优投影矩阵，将近麦克风输入转换为与远麦克风录音对齐的干净参考信号，同时执行去噪。我们证明，使用参数化多通道维纳滤波器（PMWF）的变体可以有效地实现这种投影。实验结果表明，在具有挑战性的CHiME6晚宴派对ASR任务中，使用C2D投影数据训练的神经网络在oracle说话人日志条件下，当使用GSS的增强输出作为神经网络的辅助输入时，优于最先进的引导源分离（GSS）。

English abstract

Training neural networks (NNs) for speech enhancement (SE) in distant speech-capturing scenarios requires paired distorted and clean reference speech signals. While such data are often generated through simulation, the mismatch between simulated and real recordings significantly limits SE accuracy. To address this issue, we propose Close-to-Distant microphone Projection (C2D projection), a method that generates paired data from real recordings captured by close and distant microphones. C2D projection estimates an optimal projection matrix that transforms close-microphone inputs into clean reference signals aligned with distant-microphone recordings, while simultaneously performing denoising. We show this projection can be effectively realized using a variant of the Parametric Multichannel Wiener Filter (PMWF). Experimental results demonstrate that an NN trained with C2D-projected data outperforms the state-of-the-art Guided Source Separation (GSS) on the challenging CHiME6 dinner party ASR task under oracle diarization, when using the enhanced output from GSS as an auxiliary input to the NN.

2606.13095 2026-06-12 eess.AS cs.SD New submission

Balancing ASR and diarization in end-to-end LLMs for multi-talker speech recognition

在端到端大语言模型中平衡ASR与说话人日志以进行多说话人语音识别

Naijun Zheng, Yuke Lin, Sanli Tian, Mengtian Li, Zhiwei Lin, Longshuai Xiao, Dandan Tu

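AI summary: Proposes a dual-encoder architecture, a feature interleaving format, a length-aware speaker ID loss, and an adaptive-threshold ASR loss strategy to train LLM-based systems efficiently with limited real data while balancing ASR and speaker diarization, achieving relative improvements of 18% on AliMeeting and 24% on Aishell4.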

Comments: Accepted in Interspeech 2026

AI中文摘要

多说话人语音识别通常通过结合自动语音识别（ASR）和说话人日志的流水线系统来处理。最近，基于大语言模型（LLM）的方法通过联合建模语义和说话人信息显示出前景，但它们通常需要大规模的多说话人语料库，而标注这些语料库成本高昂。在本文中，我们研究了如何在有限真实录音数据下高效训练基于LLM的系统，同时保持说话人归属的高准确性。我们提出了几种策略：（1）双编码器架构，用于提取语义和说话人特征；（2）特征交错格式，将这些特征合并作为LLM的输入；（3）长度感知的说话人ID损失，以增强日志能力；（4）自适应阈值的ASR损失计算，以减轻语音重叠引起的幻觉。这些策略平衡了ASR和说话人日志任务之间的训练。我们的系统优于开源基线方法，在AliMeeting语料库上实现了18%的相对改进，在Aishell4语料库上实现了24%的相对改进。

English abstract

Multi-talker speech recognition is often addressed by combining automatic speech recognition (ASR) and speaker diarization in a pipeline system. Recently, LLM-based approaches have shown promise by jointly modeling semantic and speaker information, but they typically require large-scale multi-talker corpora that are costly to annotate. In this paper, we investigate how to efficiently train an LLM-based system with limited real-recorded data while maintaining high accuracy in speaker attribution. We propose several strategies: (1) a dual-encoder architecture to extract semantic and speaker features, (2) a feature interleaving format to merge these features as the inputs to the LLM, (3) a length-aware speaker ID loss to enhance diarization capability, and (4) an adaptive threshold strategy for ASR loss computation to mitigate hallucinations caused by speech overlaps. These strategies balance training between ASR and diarization tasks. Our system outperforms open-source baseline approaches, achieving relative improvements of 18% on the AliMeeting corpus and 24% on the Aishell4 corpus.

2606.13633 2026-06-12 eess.SY cs.LG New submission

Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model

基于混合CNN-元胞自动机火灾模型的空中野火抑制规划

Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala, Anthony Wong

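AI summary: Proposes a framework that combines a hybrid neural-cellular automaton wildfire model with gradient-based optimization of aerial drops, quantifies uncertainty through Monte Carlo sampling and spatially correlated perturbations, and shows in a case study that it can generate coherent suppression schedules.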

AI中文摘要

空中野火抑制不仅需要预测火势蔓延，还需要在操作和环境不确定性下设计有效的干预策略。我们提出了一个空中野火抑制的建模与优化框架，该框架将混合神经-元胞自动机野火模型与基于梯度的目标空中投放设计相结合。野火模型根据地形、燃料和风数据预测空间变化的蔓延行为，而干预模块确定二元投放动作，其连续值位置和方向参数映射到模拟网格。水和阻燃剂具有不同的抑制效果，分别对应于立即减少活跃燃烧和持续减少未来蔓延。为了评估所得抑制方案的鲁棒性，我们通过每日火势状态的蒙特卡洛采样量化偶然不确定性，并通过空间相关的预测误差扰动量化认知不确定性。基于2020年Bear Fire的案例研究表明，该框架可以生成连贯的空中抑制调度，以减少总火灾影响面积，并支持对野火干预策略的不确定性感知分析。

English abstract

Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial wildfire suppression that combines a hybrid neural-cellular automaton wildfire model with gradient-based design of targeted aerial drops. The wildfire model predicts spatially varying spread behavior from terrain, fuel, and wind data, while the intervention module determines binary drop actions with continuous-valued location and orientation parameters mapped to the simulation grid. Water and retardant are represented with distinct suppression effects, corresponding to immediate reduction of active burning and persistent reduction of future spread. To evaluate the robustness of the resulting suppression plans, we quantify both aleatoric uncertainty through Monte Carlo sampling of daily fire-state realizations and epistemic uncertainty through spatially correlated prediction-error perturbations. A case study based on the 2020 Bear Fire shows that the framework can generate coherent aerial suppression schedules for reducing total fire-affected area and can support uncertainty-aware analysis of wildfire intervention strategies.

2606.13601 2026-06-12 cs.RO eess.SY New submission

MCR-Bionic Hand: Anatomical Structural Priors for Dexterous Manipulation

MCR-Bionic Hand: 用于灵巧操作的解剖结构先验

Haosen Yang, Guowu Wei

Affiliation: University of Salford

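AI summary: Presents MCR-Bionic Hand, a biomimetic robotic hand built on anatomical structural priors of the human hand, which maps low-dimensional control to dexterous manipulation through structural intelligence and is validated on contact-rich tasks.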

AI中文摘要

灵巧机器人手通常被表述为由自由度、驱动和控制算法支配的高维主动控制系统。然而，人类手的灵巧性部分编码在骨骼、韧带、肌腱、腱膜和内在肌肉的物理结构中。本文将这种贡献描述为两种相互关联的结构智能形式：结构先验生成，其中腕指腱固定、FDS/FDP路径和背侧伸肌腱帽将低维姿态输入转换为默认抓取构型及PIP到DIP协调；以及肌肉介导的调节，其中外在肌、蚓状肌和骨间肌围绕该默认状态调节MCP姿态、远端稳定性、指尖力路径和接触状态。基于此框架，MCR-Bionic Hand被开发为一个1:1肌肉骨骼仿生手，在一个主体内集成了两排八骨手腕、跨腕肌腱、解剖屈肌腱路径、掌板和侧副韧带约束、背侧伸肌腱帽以及内在肌通路。功能演示和几何力学模型表明，手腕姿态诱导多关节预塑形，伸肌腱帽将PIP姿态映射为耦合的DIP响应，而内在肌通路在抓取形成后调节远端稳定性和指尖动作方向。接触密集型任务，包括硬币旋转、笔传递、手背翻硬币和立方体操作，表明MCR-Bionic将低维状态生成与精细接触后调节联系起来。这些结果表明，解剖仿生学的价值不在于视觉相似性，而在于识别执行部分控制功能的人手结构。

English abstract

Dexterous robotic hands are usually formulated as high-dimensional active control systems governed by degrees of freedom, actuation, and algorithms. Human hand dexterity, however, is partly encoded in the physical architecture of bones, ligaments, tendons, aponeuroses, and intrinsic muscles. This work describes that contribution as two linked forms of structural intelligence: structural prior generation, in which wrist-to-finger tenodesis, FDS/FDP routing, and the dorsal extensor hood transform low-dimensional posture inputs into default grasp configurations and PIP-to-DIP coordination; and muscle-mediated modulation, in which extrinsic muscles, lumbricals, and interossei regulate MCP posture, distal stability, fingertip force paths, and contact states around that default state. Based on this framework, MCR-Bionic Hand is developed as a 1:1 musculoskeletal biomimetic hand integrating a two-row, eight-bone wrist, cross-wrist tendons, anatomical flexor routing, volar plate and collateral ligament constraints, the dorsal extensor hood, and intrinsic muscle pathways within one body. Functional demonstrations and geometric mechanical models show that wrist posture induces multi-joint pre-shaping, the extensor hood maps PIP posture to a coupled DIP response, and intrinsic plus pathways modulate distal stability and fingertip action direction after grasp formation. Contact-rich tasks, including coin rotation, pen transfer, dorsal coin flipping, and cube manipulation, show that MCR-Bionic links low-dimensional state generation with fine post-contact modulation. These results suggest that anatomical biomimetics is valuable not for visual similarity, but for identifying human hand structures that perform part of the control function.

2606.13511 2026-06-12 eess.SY cs.NI eess.SP New submission

Spectrum Sharing Across Terrestrial and Non-Terrestrial Services in the FR3 Upper Midband

FR3上中频段的地面与非地面服务频谱共享

Paolo Testolina, Ergest Beshaj, Michele Polese, Tommaso Melodia

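AI summary: Addresses coexistence between 6G terrestrial systems and satellite systems in the FR3 band, evaluating interference with a large-scale 3D model of Boston and ray tracing; finds that sidelobes and non-line-of-sight paths contribute significantly to interference and that the spatial distribution of terrestrial base stations affects interference levels.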

Comments: 9 pages, 6 figures, 1 table. Presented at 2025 IEEE DySPAN. Copyright 2025 IEEE. Please cite it as: P. Testolina, E. Beshaj, M. Polese and T. Melodia, "Spectrum Sharing Across Terrestrial and Non-Terrestrial Services in the FR3 Upper Midband," 2025 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), London, United Kingdom, 2025, pp. 1-9, doi: https://doi.org/10.1109/DySPAN64764.2025.11115947

AI中文摘要

7至24 GHz之间的频段，也称为上中频段或频率范围（FR）3，正被视为第六代（6G）移动网络的使能频段。与24 GHz以上频率相比，该频谱部分表现出不同的传播特性，同时也有潜力为移动系统提供比sub-6 GHz范围更大的带宽分配。然而，6G技术和频谱政策需要保证与已使用这些频段的现有系统的共存，这些系统包括从无线电定位到卫星通信、遥感以及射电天文学等多种服务。在本文中，我们考虑了6G地面系统与卫星现有系统在FR3频段不同部分共存的挑战。利用波士顿城市地面部署的大规模3D模型和开源光线追踪解决方案，我们评估了数十个地面下一代节点B（gNB）在不同仰角下对卫星产生的射频干扰（RFI）水平。我们的模型基于真实的遮挡、杂波、衍射和反射，表明旁瓣和非视距（NLoS）路径可能显著贡献RFI。除了方向性，gNB的空间分布在定义RFI水平中也起着关键作用，这表明精心设计和运营地面部署可以创造共存机会。

English abstract

The frequency bands between 7 and 24 GHz, also known as upper midband or Frequency Range (FR) 3, are being considered as an enabler of 6th Generation (6G) mobile networks. This portion of the spectrum exhibits different propagation characteristics compared to frequencies above 24 GHz, while also offering the potential to provide larger bandwidth allocations for mobile systems than those available in the sub-6 GHz range. 6G technology and spectrum policy, however, will need to guarantee coexistence with the incumbents that already use these frequency bands, which include a variety of services, from radiolocation to satellite-based communications, remote sensing, and radioastronomy. In this paper, we consider the challenge of coexistence between 6G terrestrial systems and satellite incumbents in different portions of the FR3 bands. Using a large-scale 3D model of a terrestrial deployment in the city of Boston and an open-source ray tracing solution, we evaluate the level of Radio Frequency Interference (RFI) that tens of terrestrial Next Generation Node Bs (gNBs) generate toward satellites at different elevation angles. Our model, based on realistic obstruction, clutter, diffraction, and reflections, shows that sidelobes and Non-Line-of-Sight (NLoS) paths can significantly contribute to RFI. Besides directionality, the spatial distribution of gNBs also plays a key role in defining the RFI levels, suggesting that a careful design and operation of terrestrial deployments can create coexistence opportunities.

2606.13485 2026-06-12 eess.SY New submission

Impedance MPC with Patient-Torque Estimation for Knee Rehabilitation Exoskeletons

用于膝关节康复外骨骼的阻抗模型预测控制与患者力矩估计

Yongyan Cao, Jinshan Tang

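AI summary: Proposes an impedance model predictive control framework that uses a Kalman disturbance state to estimate patient torque, achieving offset-free tracking and Assist-as-Needed at 500 Hz while meeting the clinical accuracy criterion.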

AI中文摘要

膝关节康复外骨骼必须强制执行规定的关节轨迹，同时保持对非自主痉挛和自主患者努力的安全顺从——这是任何固定增益阻抗控制器的目标冲突。我们提出了一种用于膝关节康复外骨骼的阻抗模型预测控制框架，并在串联弹性执行器（SEA）平台上进行了演示：代数前馈将膝关节动力学简化为常系数标量双积分器，而滚动时域二次规划（QP）计算校正力矩，同时强制执行硬性的运动范围、力矩和速度限制（ISO 13482）。由直接基于SEA的力矩传感（通过弹性元件测量的串联弹性弹簧挠度——一种固有的、无EMG的患者力矩估计，而非单独的力传感器）驱动的卡尔曼扰动状态提供了标称无偏移保证，并通过其符号和期望运动方向实现无传感器的辅助按需。常状态矩阵允许离线预计算QP成本逆，从而实现多步时域下的500 Hz运行。在七个控制器基准测试（正弦跟踪、等长保持）中，500 Hz卡尔曼MPC在15 Nm痉挛下实现了0.1 mrad RMS、0.1 mrad稳态、0.2 mrad峰值的无偏移，而相同刚度下的经典阻抗控制器稳态偏移为515 mrad——直接测量通道几乎立即（几个采样周期内）收敛估计。没有估计器时，它实现经典阻抗（4.8 mrad RMS，8.3 mrad稳态）。所有MPC变体均满足87 mrad临床标准；没有经典控制器满足。该架构通过考虑耦合的每个关节QP为20自由度MyoSuite myoLeg设计。

English abstract

Knee rehabilitation exoskeletons must enforce a prescribed joint trajectory while remaining safely compliant with involuntary spasm and voluntary patient effort, objectives that are in tension for any fixed-gain impedance controller. We present an Impedance Model Predictive Control framework for knee rehabilitation exoskeletons, demonstrated on a series-elastic-actuator (SEA) platform: an algebraic feedforward reduces the knee dynamics to a constant-coefficient scalar double integrator, and a receding-horizon quadratic program (QP) computes corrective torques while enforcing hard range-of-motion, torque, and velocity limits (ISO 13482). A Kalman disturbance state driven by direct SEA-based torque sensing (the series-elastic spring deflection measured through the elastic element, which is an intrinsic, EMG-free patient-torque estimate rather than a separate load cell) gives a nominal offset-free guarantee and, via its sign and the desired-motion direction, sensorless Assist-as-Needed. The constant state matrix permits offline precomputation of the QP cost inverse, enabling 500 Hz operation with a multi-step horizon. Across seven-controller benchmarks (sinusoidal tracking, isometric hold), the 500 Hz Kalman MPC is offset-free (0.1 mrad RMS, 0.1 mrad steady-state, 0.2 mrad peak) under 15 Nm spasm, versus a 515 mrad steady-state offset for classical impedance at the same stiffness, with the direct-measurement channel converging the estimate almost immediately (within a few sampling periods). Without the estimator it realizes a classical impedance (4.8 mrad RMS, 8.3 mrad steady-state). All MPC variants meet the 87 mrad clinical criterion; no classical controller does. The architecture is formulated for the 20 DOF MyoSuite myoLeg via coupling-aware per-joint QPs.
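The reduction to a scalar double integrator with a disturbance state can be written out in discrete time. The block below is a sketch under assumptions the abstract does not state: a zero-order-hold discretization at the 500 Hz rate, an inertia-normalized corrective input $u_k$, a constant patient-torque disturbance $d_k$ entering through the same channel as the input, and measurements of the joint angle and of the disturbance itself (standing in for the direct SEA torque channel). The symbols $q_k$, $\omega_k$, $T$, $w_k$, and $v_k$ are illustrative names.

```latex
\begin{aligned}
\begin{bmatrix} q_{k+1} \\ \omega_{k+1} \\ d_{k+1} \end{bmatrix}
&=
\begin{bmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} q_k \\ \omega_k \\ d_k \end{bmatrix}
+
\begin{bmatrix} T^2/2 \\ T \\ 0 \end{bmatrix} u_k + w_k,
\qquad T = \tfrac{1}{500}\,\mathrm{s}, \\
y_k &=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} q_k \\ \omega_k \\ d_k \end{bmatrix} + v_k .
\end{aligned}
```

Because the state matrix in this model is constant, anything that depends only on it and on the cost weights can be computed once offline, which is the property the abstract uses to reach 500 Hz with a multi-step horizon.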

2606.13479 2026-06-12 eess.SY New submission

A Reactive Redistribution Mechanism for STL Tasks in Multi-Agent Systems Under Time-Varying Communication

时变通信下多智能体系统中STL任务的响应式再分配机制

Gregorio Marchesini, Bjarne Jan Jesse Moro, Siyuan Liu, Lars Lindemann, Dimos V. Dimarogonas

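AI summary: Proposes a communication-aware task decomposition framework that handles disjunctive STL specifications through a graph transition system and introduces a redistribution mechanism for time-varying communication networks, enabling dynamic task reallocation; scalability is demonstrated on a swarm of Crazyflie drones.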

AI中文摘要

我们提出了一种通信感知的任务分解框架，用于具有信号时序逻辑（STL）中指定的协作相对配置目标的多智能体系统，允许在时变通信网络下进行动态任务重新分配。基于我们之前的工作，该框架支持直接使用现有的反馈控制器来实现响应式任务满足。我们解决了两个关键挑战：析取STL规范与时变通信网络。析取规范通过一个图转移系统来处理，该系统捕获由逻辑OR运算符引起的替代任务序列。为了解决时变连通性，我们引入了一种再分配机制，该机制在网络演化过程中将任务从断开的智能体转移到连接的智能体，同时保持分散执行。在Crazyflie无人机群上的仿真和实验证明了在智能体数量、通信连通性和规范复杂性方面的可扩展性。

English abstract

We present a communication-aware task decomposition framework for multi-agent systems with collaborative relative configuration objectives specified in Signal Temporal Logic (STL), allowing for dynamic task reallocation under time-varying communication networks. Building on our prior work, the framework supports the direct use of existing feedback controllers for reactive task satisfaction. We address two key challenges: disjunctive STL specifications and time-varying communication networks. Disjunctive specifications are handled through a graph transition system that captures the alternative task sequences induced by logical OR operators. To address time-varying connectivity, we introduce a redistribution mechanism that transfers tasks from disconnected agents to connected ones as the network evolves while preserving decentralized execution. Simulations and experiments on a swarm of Crazyflie drones demonstrate scalability in the number of agents, communication connectivity, and specification complexity.

2606.13465 2026-06-12 eess.SY New submission

Embodied Opinion Dynamics for Safety-Critical Motion Control in Dynamic Environments

动态环境中安全关键运动控制的具身意见动力学

Zhiqi Tang, Yu Xing

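AI summary: Proposes an adaptive control framework that embeds nonlinear opinion dynamics in the motion-perception layer of an autonomous vehicle, enabling safe decision-making and motion control in non-cooperative environments, with formal safety guarantees and stability analysis.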

2606.13421 2026-06-12 eess.SY New submission

When expectation fails: stochastic MPC of linear systems with random input losses

当期望失效：具有随机输入损失的线性系统的随机MPC

Paul Trodden, Xinda Li

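AI summary: For constrained linear systems under multiplicative binary input uncertainty, reveals fundamental structural shortcomings of expectation-based stochastic MPC, including non-monotonicity of the value function and deterioration of the region of attraction, and shows that certainty-equivalent MPC can outperform it in recursive feasibility.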

Comments: This work has been submitted to the IEEE for possible publication

AI中文摘要

我们考虑受乘性二元输入不确定性影响的约束线性系统的随机模型预测控制（MPC），其动机源于诸如网络控制中的丢包和间歇驱动等应用。在这种情况下的常见方法是用期望替代随机动力学，从而得到易于处理的公式，这些公式允许标准的终端成分和期望意义上的稳定性保证。我们表明，这些公式可能表现出与确定性MPC根本不同的结构性质，并且可能作为实现闭环行为的指标产生误导。特别地，期望值函数在预测时域上不一定单调，并且基于值函数的吸引域内部近似可能随着时域增加而恶化。此外，我们建立了与确定性等价（乐观）MPC的概率比较，表明后者在随机MPC证明可行性但以概率1失败的情况下，能够确保严格正的概率递归可行性。这些结果凸显了基于期望的随机MPC对于具有乘性二元不确定性的系统的固有限制，并促使重新审视如何将随机性纳入此类系统的约束预测控制设计中。

English abstract

We consider stochastic model predictive control (MPC) for constrained linear systems subject to multiplicative binary input uncertainty, motivated by applications such as networked control with packet losses and intermittent actuation. A common approach in this setting replaces the stochastic dynamics with their expectation, yielding tractable formulations that admit standard terminal ingredients and stability guarantees in expectation. We show that such formulations can exhibit structural properties that differ fundamentally from those of deterministic MPC and may be misleading as indicators of realized closed-loop behaviour. In particular, the expected value function is not necessarily monotonic in the prediction horizon, and value function-based inner approximations of the region of attraction may deteriorate as the horizon increases. Furthermore, we establish a probabilistic comparison with certainty-equivalent (optimistic) MPC, showing that the latter can ensure a strictly positive probability of recursive feasibility in situations where stochastic MPC certifies feasibility but fails with probability one. These results highlight inherent limitations of expectation-based stochastic MPC for systems with multiplicative binary uncertainty and motivate a re-examination of how stochasticity is incorporated into constrained predictive control design for such systems.
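The gap between expected and realized behaviour can be seen in a one-step toy example. This is a sketch, not the paper's formulation: it assumes a scalar system x+ = a·x + θ·b·u with θ ~ Bernoulli(p), a state bound |x| ≤ x_max imposed on the expected state only, and made-up numerical values.

```python
def expected_next(a, b, p, x, u):
    """Next state of the expectation model: theta is replaced by its mean p."""
    return a * x + p * b * u


def realized_next(a, b, theta, x, u):
    """Next state for one realization theta in {0, 1} of the input loss."""
    return a * x + theta * b * u


# Illustrative values (assumptions, not taken from the paper).
a, b, p = 2.0, 1.0, 0.5     # unstable scalar plant, input delivered half the time
x0, x_max = 0.8, 1.0        # initial state and state bound |x| <= x_max

# Input chosen so that the *expected* next state is exactly zero.
u = -a * x0 / (p * b)

expected = expected_next(a, b, p, x0, u)
realized = [realized_next(a, b, theta, x0, u) for theta in (0, 1)]

print("expected next state :", expected, "feasible:", abs(expected) <= x_max)
for theta, x1 in zip((0, 1), realized):
    print(f"theta = {theta}: next state {x1}, feasible: {abs(x1) <= x_max}")
```

Here the expected state satisfies the bound while both possible realizations violate it: the input is sized for a delivery rate of one half, so it overshoots when it arrives and leaves the unstable state uncorrected when it is lost.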

2606.13358 2026-06-12 eess.SY New submission

Sizing of a grid-forming power converter to improve the small-signal stability of an LCC-HVDC system connected to a weak grid

面向改善弱电网下LCC-HVDC系统小信号稳定性的构网型电力变换器容量设计

Anup Joshi, Javier Renedo, Xavier Guillaud

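AI summary: Proposes a simplified LCC-HVDC model and builds a small-signal state-space model that includes a grid-forming VSC to analyze how it improves LCC-HVDC stability in weak grids, finding that even a relatively small GFM-VSC can ensure stability in the test system studied.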

AI中文摘要

电网换相换流器高压直流输电（LCC-HVDC）已被证明是远距离大容量输电的可靠技术。然而，换流器接口发电（CIG）渗透率的增加导致交流电网变弱，使LCC-HVDC系统的运行变得脆弱，并对其稳定性构成严重挑战。构网型（GFM）控制的电压源换流器（VSC）已被证明在弱电网条件下具有稳定作用。然而，文献中尚未深入研究GFM控制的VSC（GFM-VSC）对弱电网条件下LCC-HVDC稳定性的影响。本文提出并验证了一个简化的LCC-HVDC模型。然后，建立了由上述LCC-HVDC、GFM-VSC和无穷大电网组成的系统的小信号状态空间模型，以研究不同组件之间的相互作用。小信号稳定性分析表明，GFM-VSC对弱电网条件下LCC-HVDC链路的稳定性具有稳定作用。此外，对GFM功率变换器容量的研究表明，在本研究分析的测试系统中，即使GFM功率变换器容量相对于总额定视在功率（LCC-HVDC额定功率与GFM-VSC额定视在功率之和）的占比很小，也足以保证系统的稳定性。本文仅关注小信号稳定性，但需要强调的是，在选择GFM-VSC的最终容量时，还应考虑其他稳定性现象。

English abstract

Line-commutated converter high-voltage direct current (LCC-HVDC) has proven to be a reliable technology for bulk power transmission over long distances. However, the growing penetration of converter-interfaced generation (CIG) is resulting in weaker AC grids, rendering the operation of LCC-HVDC systems vulnerable and posing a serious challenge to their stability. Grid-forming (GFM) controlled voltage source converters (VSCs) have been shown to have a stabilizing effect in weak grid conditions. However, the impact of GFM-controlled VSCs (GFM-VSCs) on the stability of LCC-HVDC in weak grid conditions has not been studied in depth in the literature. In this paper, a simplified model of LCC-HVDC is proposed and validated. Then, a small-signal state-space model of a system consisting of the aforementioned LCC-HVDC, a GFM-VSC, and an infinite grid is developed to study the interactions between the different components. The small-signal stability analysis shows the stabilizing effect of the GFM-VSC on the LCC-HVDC link in weak grid conditions. Furthermore, the study on the sizing of the GFM power converter reveals that, in the test system analyzed in this study, even a modest share of GFM power converter capacity relative to the total nominal apparent power (the sum of the nominal power of the LCC-HVDC and the nominal apparent power of the GFM-VSC) is sufficient to ensure the stability of the system. This work focuses only on small-signal stability, but it is important to highlight that other stability phenomena should also be taken into account when selecting the final size of the GFM-VSC.

2606.13315 2026-06-12 cs.CV eess.IV New submission

Masked and Predictive Self-Supervised Foundation Models for 3D Brain MRI

用于3D脑部MRI的掩码和预测自监督基础模型

Esra Ergün, Hersh Chandarana, Dan Sodickson, Gözde Ünal

Affiliations: Istanbul Technical University; NYU Langone Health

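AI summary: Studies self-supervised foundation models for MRI-based disease detection, introducing a spectral-domain reconstruction loss (for MAE) and variance-covariance regularization (for JEPA), and shows across five downstream tasks that the objective design needs to match the task structure.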

AI中文摘要

自监督基础模型在医学影像中展现出巨大潜力。然而，现有的MRI基础模型研究主要强调分割和密集预测任务，而针对基于MRI的疾病检测的自监督基础模型的系统研究仍然有限。在这项工作中，我们研究了两种主要的自监督预训练范式用于基于MRI的疾病检测：通过掩码自编码器（MAE）的基于重建的学习和通过联合嵌入预测架构（JEPA）的预测表示学习。我们通过引入一种新颖的MAE频谱域重建损失来增强对细粒度解剖结构的敏感性，并通过在我们的JEPA框架中集成方差-协方差正则化（VCR）来鼓励去相关的潜在表示，从而研究辅助目标的作用。我们的模型在对比度无关的设置下，在异质单对比度MRI体积上进行预训练，无需模态拼接。在五个下游疾病检测任务中，我们的结果突出了自监督目标设计对医学基础模型预训练的重要性，表明每个目标的下游收益由其与任务结构的相关性决定。具体来说，当下游判别信号以强高频解剖结构为特征时，频谱正则化带来最大的改进；而当判别信息跨越多个去相关的特征维度时，协方差正则化最为有益。具有频谱域监督的MAE在基于MRI的疾病检测中始终实现优越的下游性能。这些发现表明，医学影像中的自监督目标编码了特定的偏差，其下游收益根本上取决于任务的结构。

English abstract

Self-supervised foundation models have shown strong promise in medical imaging. However, existing MRI foundation-model studies have primarily emphasized segmentation and dense prediction tasks, while systematic investigation of self-supervised foundation models for MRI-based disease detection remains limited. In this work, we investigate two major self-supervised pretraining paradigms for MRI-based disease detection: reconstruction-based learning via Masked Autoencoders (MAE) and predictive representation learning via Joint Embedding Predictive Architectures (JEPA). We study the role of auxiliary objectives by introducing a novel spectral-domain reconstruction loss for MAE to enhance sensitivity to fine-grained anatomical structure, and by integrating variance--covariance regularization (VCR) within our JEPA framework to encourage decorrelated latent representations. Our models are pretrained on heterogeneous single-contrast MRI volumes in a contrast-agnostic setting, without modality concatenation. Across five downstream disease detection tasks, our results highlight the importance of self-supervised objective design for medical foundation model pretraining, demonstrating that the downstream benefit of each objective is determined by its relevance to the task's structure. Specifically, spectral regularization yields the largest improvements when the downstream discriminative signal is characterized by strong high-frequency anatomical structures, while covariance regularization is most beneficial when discriminative information spans multiple decorrelated feature dimensions. MAE with spectral-domain supervision consistently achieves superior downstream performance for MRI-based disease detection. These findings suggest that self-supervised objectives in medical imaging encode specific biases, and their downstream benefit is fundamentally conditioned on the task's structure.

2606.13226 2026-06-12 eess.SY New submission

Multi-Phase Optimization of Shared Charging Infrastructure for Freight Electrification

货运电气化共享充电基础设施的多阶段优化

Joas Kahlert, Jiali Fu, Chengxi Liu

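AI summary: Proposes a multi-phase optimization framework that uses high-resolution trajectory data from two freight companies to minimize the total number of charging stations while meeting electrification targets, showing that shared infrastructure reduces redundant investment but introduces dependencies.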

Comments: This work has been submitted to the IEEE for possible publication

AI中文摘要

向重型电池电动车的转型需要高效且经济地部署充电基础设施，尤其是在多个运营商共享资源的情况下。本文提出了一个多阶段优化框架，用于共享网络中充电站的联合规划，使用了来自两家运营特征不同的货运公司的高分辨率经验卡车轨迹数据。模型被制定为在连续扩展阶段中最小化充电站总数，同时确保达到预定的电气化目标。分析捕捉了车队使用的异质性，一家公司运营空间集中的网络，路线较短且一致，另一家公司则表现出更分散的运营，具有更长且变化更多的驾驶模式。结果表明，早期基础设施部署主要支持集中运营的车队，而后期扩展阶段对于容纳长途和地理上分散的运输需求至关重要。此外，共享基础设施不仅能够减少冗余投资，还会引入依赖关系，某些车队严重依赖整个网络来维持电气化运营。总体而言，研究结果强调了协调和数据驱动的基础设施规划的重要性，并表明车队特定特征强烈影响基础设施需求和电气化结果。所提出的框架提供了关于协作和分阶段部署策略如何增强货运电气化的可扩展性和效率的实践见解。

English abstract

The transition to heavy-duty battery electric vehicles requires an efficient and cost-effective deployment of the charging infrastructure, particularly when multiple operators share resources. This paper presents a multi-phase optimization framework for the joint planning of charging stations in a shared network, using high-resolution empirical truck trajectory data from two freight companies with distinct operational characteristics. The model is formulated to minimize the total number of charging stations while ensuring that the predefined electrification targets are met over successive expansion stages. The analysis captures heterogeneity in fleet usage, with one company operating a spatially concentrated network with shorter and more consistent routes, and the other exhibiting more dispersed operations with longer and more variable driving patterns. The results show that early-stage infrastructure deployment primarily supports fleets with concentrated operations, while later expansion phases are essential to accommodate long-haul and geographically dispersed transport demand. Furthermore, shared infrastructure not only enables reductions in redundant investments, but also introduces dependencies where certain fleets rely heavily on the full network to sustain electrified operations. In general, the findings highlight the importance of coordinated and data-driven infrastructure planning, and demonstrate that fleet-specific characteristics strongly influence both infrastructure requirements and electrification outcomes. The proposed framework provides practical insights on how collaborative and phased deployment strategies can enhance the scalability and efficiency of freight transport electrification.

2606.13215 2026-06-12 eess.SY cs.CE New submission

Mitigating business risks from renewable PPA power sourcing uncertainties for European green hydrogen production: Robust system design, regulatory adjustments and offtake flexibility

缓解可再生能源PPA电力采购不确定性对欧洲绿氢生产的商业风险：稳健系统设计、监管调整与承购灵活性

Jonathan Brandt, Astrid Bensmann, Richard Hanke-Rauschenbach

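AI summary: Addresses the business risks that renewable power sourcing rules (such as additionality and temporal correlation) pose for European green hydrogen production, proposing mitigation measures and assessing their cost impact through system design optimization, regulatory adjustments, and offtake flexibility analysis.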

AI中文摘要

随着中东持续危机推动能源价格近年来第二次飙升，欧盟对化石能源进口的持续依赖日益明显。然而，尽管本地绿氢生产提供了改善能源韧性的诱人前景，其启动速度远落后于2022年能源危机后设定的官方雄心。已宣布与已实现生产项目之间执行差距扩大的一个显著原因是可再生能源电力采购规则过于严格，促使成员国部委和欧盟委员会提议将计划中的规则审查从2028年提前至2026年。为促进成功的审查和规则调整，我们解决了理解电力采购规则对绿氢生产影响的一个重要空白。从欧洲电解槽运营商的角度出发，我们展示了附加性标准及其与所需时间相关性的相互作用如何危及绿氢承购协议的履行，并影响不同欧洲竞价区域的绿氢生产成本。将不同设计范式应用于绿氢生产系统表明，电解槽运营商措施（如PPA和储能扩容）有助于缓解附加性标准带来的商业风险，但成本增加。或者，放松时间相关性和增加承购灵活性同时提高了生产系统的稳健性并降低了生产成本。其中，放松时间相关性规则不会导致排放强度阈值超标，突显了延长过渡规则支持欧洲绿氢生产启动的潜力。

English abstract

As energy prices surge for the second time in recent years, driven by the ongoing crisis in the Middle East, the European Union's continuing reliance on fossil energy imports is becoming increasingly apparent. However, despite offering an intriguing prospect of improved energy resilience, the ramp-up of local green hydrogen production lags far behind the officially stated ambitions set after the 2022 energy crisis. A prominent reason for the widening implementation gap between announced and realised production projects is overly strict rules on renewable power sourcing, prompting Member States' ministries and the European Commission to propose advancing a planned rules review from 2028 to 2026. To contribute to a successful review and rule adjustments, we address an important gap in understanding the effects of power purchase rules on green hydrogen production. By taking the perspective of European electrolyser operators, we show how the criterion of additionality and its interaction with required temporal correlation can jeopardise the fulfilment of green hydrogen offtake agreements and affect green hydrogen production costs across different European bidding zones. Applying different design paradigms to a green hydrogen production system reveals that electrolyser operator measures, such as PPA and storage upsizing, can help to mitigate the business risks posed by the additionality criterion but come with increased costs. Alternatively, relaxed temporal correlation and increased offtake flexibility both increase production system robustness and reduce production costs. Moreover, relaxing temporal correlation rules does not cause emission intensity thresholds to be exceeded, underlining the potential of extended transitional rules to support the ramp-up of European green hydrogen production.
