arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.30339 2026-05-29 cs.CV cs.MM cs.SD eess.AS

Benchmarking Single-Factor Physical Video-to-Audio Generation

Tingle Li, Siddharth Gururani, Kevin J. Shih, Gantavya Bhatt, Sang-gil Lee, Zhifeng Kong, Arushi Goel, Gopala Anumanchipalli, Ming-Yu Liu

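AI summary: Proposes the FlatSounds benchmark, which evaluates the physical reasoning of video-to-audio models through controlled counterfactual pairs and single-video pattern tests; finds that models rely on text captions rather than the visual stream, and that physical accuracy trades off against temporal alignment.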

Comments
CVPR 2026
English abstract

Generative video-to-audio (V2A) models produce highly plausible soundtracks, but it remains unclear whether they capture the underlying physical processes. Existing evaluations emphasize perceptual realism and overlook physical correctness under controlled interventions. In this paper, we introduce FlatSounds, a benchmark that audits the physical reasoning of V2A models through: 1) controlled counterfactual pairs in which a single physical factor is varied, and 2) single-video pattern tests that probe internal consistency and directional trends. These settings test whether the generated audio correctly reflects specific physical properties and timings. Our evaluation of state-of-the-art models reveals a consistent trade-off: models rely more on text captions than the visual stream to infer physics and semantics. Captions generally improve physical and semantic accuracy, but paradoxically degrade temporal alignment. Our results highlight the need to move beyond audio quality toward learning physical processes directly from pixels. Finally, we find that our physics-based metrics correlate strongly with human preference tests on our own data. Project webpage: https://research.nvidia.com/labs/cosmos-lab/flatsounds/

2605.30269 2026-05-29 cs.CV eess.IV

Boosting Image Quality Assessment Performance: Unsupervised Score Fusion by Deep Maximum a Posteriori Estimation

Zhongling Wang, Raymond Zhou, Shahrukh Athar, Wenbo Yang, Zhou Wang

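AI summary: Proposes an unsupervised score-fusion framework for image quality assessment based on deep maximum a posteriori estimation, using fine-grained uncertainty estimation to improve the accuracy and reduce the uncertainty of fused predictions.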

Comments
2024 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
English abstract

Over the past decades, numerous Image Quality Assessment (IQA) models have emerged, aiming to predict the perceptual quality of images. However, individual models are often biased toward certain types of image content or distortions, depending on the design principle and process. An intuitive idea is to harness the strengths and mitigate the weaknesses of each IQA model, by fusing the scores of multiple models into a stronger one. Here we make one of the first attempts to seek an optimal solution for the idea and propose a general framework for unsupervised IQA score fusion using deep Maximum a Posteriori (MAP) estimation. The proposed model conducts fine-grained uncertainty estimation at the score level to increase the accuracy and reduce the uncertainty in fused predictions. Comprehensive experiments demonstrate the superiority of the proposed model over individual IQA models and other fusion methods. It also exhibits an interesting capability of rejecting "bad" models in the fusion process.
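
As a rough illustration of why score-level uncertainty matters in fusion, the sketch below uses a much simpler stand-in than the deep MAP model described in the abstract. If each model's score is treated as the true quality plus independent Gaussian noise and the prior is flat, the MAP estimate reduces to an inverse-variance weighted average. The function name `fuse_scores` and all numbers are hypothetical, and the variances are assumed to be given rather than estimated.

```python
from typing import Sequence, Tuple


def fuse_scores(scores: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Fuse per-model quality scores by inverse-variance weighting.

    Assumes independent Gaussian errors and a flat prior (an illustrative
    simplification, not the paper's deep MAP estimator).
    Returns (fused_score, fused_variance).
    """
    if len(scores) != len(variances) or not scores:
        raise ValueError("scores and variances must be non-empty and equal in length")
    weights = [1.0 / v for v in variances]  # low-variance models get large weights
    total = sum(weights)
    fused = sum(w * s for w, s in zip(weights, scores)) / total
    return fused, 1.0 / total


# Hypothetical example: the third model is very uncertain, so it barely moves the result.
fused, fused_var = fuse_scores([80.0, 60.0, 10.0], [1.0, 4.0, 400.0])
```

A model with a very large variance receives a near-zero weight, which is one simple way a fusion rule can effectively ignore an unreliable model.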

2605.30222 2026-05-29 eess.SY cs.SY

Optimization of Predictive Maintenance Schedules under Uncertainty: A Scenario-Based Theoretical Framework

Jerzy Baranowski, Waldemar Bauer

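AI summary: Proposes a scenario-based framework that integrates calendar-based, usage-based, and remaining-useful-life prognostic information to optimize predictive maintenance schedules over a finite planning horizon, evaluating complete maintenance schedules with expected-cost and tail-risk criteria.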

Comments
This work has been submitted to the IEEE for possible publication
English abstract

This paper proposes a scenario-based framework for predictive maintenance scheduling under uncertainty in a finite planning horizon. The considered setting involves multiple assets for which maintenance decisions are informed by three heterogeneous sources of information: calendar-based overhaul intervals, usage-based limits driven by uncertain future operating cycles, and condition-monitoring outputs represented through remaining useful life (RUL) estimates with uncertainty. While these elements have been studied extensively in the maintenance literature, they are often treated separately or only partially integrated. In contrast, the proposed formulation evaluates complete maintenance schedules under simulated future scenarios and compares them using expected-cost and tail-risk criteria. The contribution is primarily conceptual and methodological: we define a unified finite-horizon decision framework that combines calendar-, usage-, and prognostics-based information within a common scheduling problem. A small synthetic computational example is used as a proof of concept. The results show that integrated scenario-based policies can substantially outperform simpler single-trigger rules, while the difference between risk-neutral and risk-aware integrated policies remains modest under the present calibration.
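
The comparison of schedules by expected cost and tail risk can be sketched as follows. The abstract does not name a specific tail-risk measure, so this example assumes conditional value-at-risk (CVaR) over equally likely scenarios; the function names, the `risk_weight` blend, and the cost figures are all hypothetical.

```python
import math
from typing import Dict, Sequence


def expected_cost(costs: Sequence[float]) -> float:
    """Average cost of one schedule over equally likely scenarios."""
    return sum(costs) / len(costs)


def cvar(costs: Sequence[float], alpha: float = 0.9) -> float:
    """Mean of the worst (1 - alpha) fraction of scenario costs (at least one scenario)."""
    ordered = sorted(costs, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


def score(costs: Sequence[float], risk_weight: float, alpha: float) -> float:
    """Blend expected cost and CVaR; risk_weight = 0 gives a risk-neutral criterion."""
    return (1.0 - risk_weight) * expected_cost(costs) + risk_weight * cvar(costs, alpha)


def best_schedule(scenario_costs: Dict[str, Sequence[float]],
                  risk_weight: float = 0.0, alpha: float = 0.9) -> str:
    """Return the name of the schedule with the lowest blended score."""
    return min(scenario_costs, key=lambda name: score(scenario_costs[name], risk_weight, alpha))


# Hypothetical per-scenario costs for two candidate schedules.
costs = {"A": [10.0, 20.0, 30.0, 100.0], "B": [35.0, 40.0, 45.0, 50.0]}
risk_neutral_choice = best_schedule(costs, risk_weight=0.0)
risk_aware_choice = best_schedule(costs, risk_weight=0.5, alpha=0.5)
```

In this toy data, schedule A is cheaper on average but has one very expensive scenario, so the two criteria can disagree. The abstract reports that, under its calibration, the gap between risk-neutral and risk-aware policies was modest.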

2605.30183 2026-05-29 eess.SY cs.SY

Fault-Ride-Through Coordination Strategy for Offshore AC Islands with Multi-Infeed HVDC Interconnections

Eleni Tsotsopoulou, Vasileios Psaras, Dionysios Moutevelis, Oriol Gomis-Bellmunt, Alexandros Paspatis

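AI summary: For offshore AC islands with multi-infeed HVDC interconnections, proposes a coordinated control strategy that introduces zero active and reactive power injection during faults and post-fault active power droop control, achieving fault ride-through while meeting grid code requirements.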

English abstract

Large-scale offshore Wind Farms (WFs) are considered key assets towards realizing a sustainable power system. These systems are often configured as offshore AC islands and their integration largely depends on High-Voltage Direct-Current (HVDC) technology. This topology, while it enables cost-effective transmission over large offshore distances, may lead to operational challenges. Specifically, the operation of offshore AC islands during faults and the fulfillment of grid code requirements are identified as major challenges for their large-scale deployment. To address this pressing issue, a comprehensive coordination control strategy for the different participating converters in multi-infeed AC offshore islands during Fault Ride Through (FRT) operation is presented in this work. The proposed strategy introduces advanced control functions in the FRT schemes of both the HVDC and WF converters, such as zero active and reactive power injection during faults, as well as post-fault active power droop control coordination to tackle power imbalances. The proposed FRT coordination strategy is validated through both extensive simulations in PSCAD/EMTDC and Power Hardware-in-the-Loop (PHIL) experiments, considering both AC and DC faults.

2605.30172 2026-05-29 cs.CE eess.SP physics.med-ph

A Lumped-Element Electrical Model of the Human Head for Brain-Oriented Applications

Angelo Faccia, Ermanno Citraro, Francesco P. Andriulli

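AI summary: Proposes a lumped-element circuit model based on a three-shell geometry that uses radial and tangential RC branches to capture the dispersive properties of head tissue, and validates it against a semi-analytical spherical-harmonics reference solution.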

Comments
4 pages, 4 figures. To appear in the proceedings of the APS March Meeting 2026, Detroit, Michigan
English abstract

In this work, we present a compact surrogate circuit for electro-quasi-static (EQS) head modeling. A three-shell geometry (brain, skull, scalp) is considered, and each layer is modeled through radial and tangential pathways, implemented as RC branches. Frequency-dependent tissue conductivity and permittivity are mapped into dispersive resistive and capacitive elements. The model is validated against a semi-analytical spherical-harmonics reference solution over multiple geometrical configurations and operating frequencies, demonstrating good agreement. Neglecting dispersion and capacitive pathways can lead to an overestimation of scalp potentials over the considered frequency range, highlighting the need for dispersive RC circuit modeling.
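
To make the role of the capacitive pathway concrete, here is a minimal sketch of a single parallel RC branch for a uniform slab of tissue. It is not the paper's circuit: the slab geometry, the function name, and the conductivity and permittivity values are hypothetical, and the frequency dependence of the tissue is represented only by passing in the values that apply at the chosen frequency.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def rc_branch_impedance(freq_hz: float, sigma: float, eps_r: float,
                        length_m: float, area_m2: float) -> complex:
    """Impedance of a resistor and capacitor in parallel for a uniform tissue slab.

    sigma (S/m) and eps_r (relative permittivity) are the values at freq_hz.
    """
    resistance = length_m / (sigma * area_m2)          # R = L / (sigma * A)
    capacitance = EPS0 * eps_r * area_m2 / length_m    # C = eps * A / L
    admittance = 1.0 / resistance + 1j * 2.0 * math.pi * freq_hz * capacitance
    return 1.0 / admittance


# Hypothetical slab: 7 mm thick, 1 cm^2 cross-section, evaluated at 100 kHz.
z_with_c = rc_branch_impedance(1e5, sigma=0.02, eps_r=1e3, length_m=0.007, area_m2=1e-4)
# Setting eps_r to zero removes the capacitor, leaving a purely resistive branch.
z_resistive_only = rc_branch_impedance(1e5, sigma=0.02, eps_r=0.0, length_m=0.007, area_m2=1e-4)
```

Because the capacitor adds a parallel current path, the branch impedance magnitude with it is never larger than the purely resistive value. This is consistent in spirit with the abstract's observation that neglecting capacitive pathways can overestimate scalp potentials.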

2605.30127 2026-05-29 cs.HC eess.SP

REACT: A Conditioning Framework for User-Adaptive sEMG Hand Pose Estimation

Eric Xie, Hei Shing Cheung

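AI summary: Proposes REACT, a framework that uses a lightweight conditioning mechanism and Feature-wise Linear Modulation (FiLM) to personalize a pretrained model at inference time from a small amount of calibration data, enabling cross-user sEMG hand pose estimation and reducing angular error by up to 3.9% on the EMG2POSE benchmark.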

Comments
6 pages, 3 figures
English abstract

Surface electromyography (sEMG) enables continuous hand pose estimation on wearable devices, but models trained on multi-user corpora degrade on unseen individuals due to inter-user variability in anatomy and electrode placement. We propose REACT, a lightweight conditioning framework that personalizes a frozen pretrained EMG-to-pose backbone at inference time using only a handful of calibration recordings. REACT learns a compact user embedding from calibration data and applies Feature-wise Linear Modulation (FiLM) to adapt the shared encoder's feature space, requiring no gradient updates at deployment. On the large-scale EMG2POSE benchmark, REACT improves over the state-of-the-art baseline across all three generalization splits in both regression and tracking modes, reducing angular error by up to 3.9% with minimal parameter overhead and under 45 seconds of per-user calibration.
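
The FiLM operation mentioned in the abstract scales and shifts each feature channel with parameters derived from a conditioning vector. The sketch below shows only that operation in plain Python. The embedding and the mapping from embedding to `(gamma, beta)` are hypothetical stand-ins (a per-channel mean and a fixed linear rule), not REACT's learned components.

```python
from typing import List, Tuple

Frames = List[List[float]]  # time steps x feature channels


def user_embedding(calibration: Frames) -> List[float]:
    """Hypothetical stand-in for a learned embedding: per-channel mean of calibration frames."""
    n = len(calibration)
    return [sum(channel) / n for channel in zip(*calibration)]


def film_params(embedding: List[float], scale: float = 0.1) -> Tuple[List[float], List[float]]:
    """Hypothetical conditioning head: a zero embedding yields the identity modulation."""
    gamma = [1.0 + scale * e for e in embedding]
    beta = [-scale * e for e in embedding]
    return gamma, beta


def film(features: Frames, gamma: List[float], beta: List[float]) -> Frames:
    """Feature-wise linear modulation: y[t][c] = gamma[c] * x[t][c] + beta[c]."""
    return [[g * x + b for x, g, b in zip(frame, gamma, beta)] for frame in features]


# Calibration happens once per user; no gradient step is involved at any point.
calibration = [[1.0, 3.0], [3.0, 5.0]]
gamma, beta = film_params(user_embedding(calibration))
adapted = film([[0.5, -0.5], [1.0, 2.0]], gamma, beta)
```

The point of the design is that the backbone weights stay frozen: adapting to a new user only changes `gamma` and `beta`, which are computed in a forward pass from the calibration data.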

2605.30095 2026-05-29 math.ST cs.IT eess.SP math.IT stat.TH

The generalized method of moments is (almost) statistically efficient in low-SNR Gaussian latent-variable models

Amnon Balanov, Tamir Bendory, Dan Edidin

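AI summary: For low-SNR Gaussian latent-variable models, proves that the optimally weighted generalized method of moments has the same first-order asymptotic covariance as maximum-likelihood estimation, providing a statistically efficient alternative.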

English abstract

We study estimation in the low signal-to-noise ratio (SNR) regime for a broad class of Gaussian latent-variable models, including Gaussian mixtures and orbit recovery problems. We show that, in this regime, the generalized method-of-moments (GMoM) matches the first-order asymptotic efficiency of maximum likelihood. In particular, if the moment features are chosen up to the minimal local order required for identification and are weighted optimally, then the resulting GMoM estimator has the same leading asymptotic covariance as the maximum-likelihood estimator. Our analysis shows that, in low SNR, this equivalence is governed by a layered local geometry: different directions become informative at different moment orders, partitioning the space into layers with distinct SNR scalings. We prove that the observed Fisher information and the GMoM information operator admit matching layerwise expansions across these layers. As a consequence, in the low-SNR regime, GMoM provides a statistically efficient alternative to maximum likelihood, while preserving the computational advantages of moment-based estimation.

2605.29995 2026-05-29 cs.IT eess.SP math.IT

Low-Overhead Receiver Design for Data-Dependent Superimposed Training via Deep Learning

Xinjie Li, Xingyu Zhou, Jing Zhang, Chao-Kai Wen, Xiao Li, Shi Jin

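AI summary: To address the performance-complexity bottleneck caused by pilot-data coupling in superimposed pilot transmission, proposes an enhanced data-dependent superimposed training (DDST) framework that combines a mix transmission scheme with a Vision Transformer-based neural receiver, achieving non-iterative decoupling and efficient interference suppression under time-varying channels.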

Comments
This work has been submitted to the IEEE for possible publication
English abstract

Superimposed pilot (SIP) transmission improves spectral efficiency by eliminating the dedicated pilot overhead required in orthogonal pilot (OP)-based schemes. However, SIP suffers from severe pilot-data coupling, which leads to a critical performance-complexity bottleneck at the receiver. To address this issue, this paper proposes a low-overhead transmission framework that revitalizes data-dependent superimposed training (DDST) with enhanced interference mitigation strategies. First, for quasi-static block-fading channels, an enhanced DDST receiver is developed to achieve non-iterative pilot-data decoupling by exploiting data-dependent algebraic structures. Second, to overcome the sensitivity of conventional DDST to channel variations and symbol misidentification in fast time-varying environments, a mix transmission scheme is developed. By strategically applying DDST to a subset of resource elements, the proposed scheme combines the interference-free transmission property of OP with the zero-pilot-overhead advantage of SIP, thereby improving demapping reliability and interference suppression. Furthermore, under the proposed mix scheme, a Vision Transformer-based neural receiver is designed to capture the orthogonal structure between pilots and perturbation-bearing data, as well as the underlying channel correlations, thereby relaxing the stringent quasi-static assumption required for interference disentanglement. Simulation results demonstrate that the proposed framework achieves significant performance gains in the low-to-medium SNR regime under time-varying channels while providing superior computational efficiency compared with state-of-the-art SIP receivers.

2605.29975 2026-05-29 cs.LG eess.SP

A Fully Convolutional Approach to Denoising Structural Dynamics Data from X-Ray Photon Correlation Spectroscopy

Nisar Nellikunnummel, Andi Barbour, Lutz Wiegart, Tatiana Konstantinova, Anthony DeGennaro

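AI summary: Proposes a fully convolutional denoising autoencoder (FC-DAE) for denoising two-time intensity-intensity correlation functions in X-ray photon correlation spectroscopy; it accepts inputs of arbitrary size and recovers intricate dynamical features under low signal-to-noise conditions while maintaining structural fidelity.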

English abstract

We present a fully convolutional denoising autoencoder (FC-DAE) for denoising two-time intensity-intensity correlation functions ($C_2$) in X-ray photon correlation spectroscopy (XPCS). Unlike conventional denoising autoencoders that are typically restricted to fixed input sizes, the FC-DAE accepts inputs of arbitrary dimensions while preserving correlation structures across diverse dynamical regimes. The model is trained using experimentally derived $C_2$ data collected at NSLS-II beamlines, with data augmentation applied to expand the diversity of the dataset and reduce overfitting. The FC-DAE successfully recovers intricate dynamical features in low signal-to-noise conditions while maintaining structural fidelity. To assess reconstruction reliability, we employ quantitative metrics to evaluate structural fidelity and identify potential model-induced bias. Our results demonstrate that the FC-DAE provides robust denoising performance with high computational efficiency, enabling recovery of XPCS dynamics under photon-limited and low-dose measurement conditions.

2605.29950 2026-05-29 eess.AS eess.SP

Frequency-Modulated and Single-Tone Excitation to Reveal Vibro-Acoustic Nonlinearities in Loosened Bolted Joints

Berkay Kullukcu, Robin Pianowski, Dina Hannebauer

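AI summary: Proposes a method for detecting bolt loosening that combines frequency-modulated and single-tone excitation with a vibro-acoustic technique, using harmonic band power ratios to distinguish preload states.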

English abstract

Preload loss in bolted joints results in alterations of the stiffness, damping, and nonlinearity of the structure, but existing monitoring techniques for rail-vehicle systems are often not capable of combining controlled shaker tests and sensing of nonlinear features. This paper proposes a method for detecting bolt loosening using a vibro-acoustic technique, where the structure is subjected to controlled shaker tests to sense the nonlinear features. A triaxial accelerometer was attached to the demonstrator, a microphone was placed in close proximity, and one of the bolts was tested under 0%, 20%, 40%, and 80% preload conditions. Single-tone and frequency-modulated (FM) signals close to the main natural frequency of 130 Hz, which was identified using sine sweep and narrow-band excitation, were applied to the demonstrator. When the structure was subjected to 130 Hz single-tone excitation, the loose state of the bolt exhibited several additional high-frequency spectral peaks. FM excitation between 125 and 135 Hz further distinguished between the states. Harmonic band power ratios, normalized to the carrier, distinguished between the loose state and the 80% preload state, where the difference between the loose and 80% preload states was 17.5 dB for l = 2 and 36.5 dB for l = 6.
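
A carrier-normalized harmonic power ratio of the kind described in the abstract can be sketched as below. This is a simplified assumption about the metric: it measures power at a single frequency for the carrier and for its l-th harmonic, whereas the paper refers to band powers whose exact bandwidths are not given here. The function names and the synthetic test signal are hypothetical.

```python
import math
from typing import Sequence


def tone_power(signal: Sequence[float], fs: float, freq: float) -> float:
    """Power of the component at `freq`, from a single-frequency DFT projection."""
    n = len(signal)
    re = sum(x * math.cos(2.0 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return (re * re + im * im) / (n * n)


def harmonic_ratio_db(signal: Sequence[float], fs: float, carrier_hz: float, order: int) -> float:
    """Power at the order-th harmonic relative to the carrier, in dB."""
    harmonic = max(tone_power(signal, fs, order * carrier_hz), 1e-30)  # guard against log(0)
    carrier = max(tone_power(signal, fs, carrier_hz), 1e-30)
    return 10.0 * math.log10(harmonic / carrier)


# Synthetic one-second signal: a 130 Hz carrier plus a second harmonic at 10% amplitude.
fs = 2600.0
signal = [math.sin(2.0 * math.pi * 130.0 * i / fs) + 0.1 * math.sin(2.0 * math.pi * 260.0 * i / fs)
          for i in range(2600)]
ratio_l2 = harmonic_ratio_db(signal, fs, carrier_hz=130.0, order=2)  # close to -20 dB
```

Comparing this ratio between two preload states gives a difference in dB, which is the form in which the abstract reports its results for l = 2 and l = 6.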

2605.29942 2026-05-29 physics.app-ph eess.IV

Reconfigurable Multistate MRAM Synapses with Vortex STNO based Neurons for Scalable In-Memory Convolutional Neural Networks

Ravish Kumar Raj, Simon N. Richter, Saeed Baghaee Ivriq, Oliver Fridorf, Darío Fernández-Khatiboun, Yasser Rezaeiyan, Luana Benetti, Tim Boehnert, Ricardo Ferreira, Hooman Farkhani, Sonal Shreya, Farshad Moradi

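AI summary: Proposes a unified architecture that integrates multistate MRAM synapses with vortex spin-torque nano-oscillator neurons, using fieldline-driven write channels to realize programmable convolution kernels and pooling operations; reports high accuracy on several datasets with significantly reduced energy consumption.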

Comments
29 pages, 17 Figures and 4 tables
English abstract

Magnetic tunnel junction (MTJ)-based magnetic random-access memory (MRAM) is a promising platform for neuromorphic and in-memory computing owing to its non-volatility, high endurance, fast switching dynamics and CMOS compatibility. However, conventional spin-transfer torque and spin-orbit torque MRAM implementations for neural networks often suffer from high critical switching currents, large latency, thermal instability and significant read-write overheads. Here, we demonstrate a unified multistate MRAM-spin-torque nano-oscillator (STNO) architecture that integrates synapses and neurons on a single chip for convolutional neural network (CNN) applications. The system employs 1×8 multistate MRAM arrays as programmable synapses coupled with a vortex-based STNO neuron, enabling both individual and collective programming through fieldline-driven write channels. Multiple configurable resistance states are achieved by tuning internal and external magnetic fields together with bias currents, allowing quantized positive and negative synaptic weights for configurable kernel and pooling operations. The proposed architecture is evaluated through simulation on the MNIST, SVHN, CIFAR-10, Google Speech Commands (GSC) and RadioML datasets, achieving accuracies of 99.76%, 87.93%, 78.14%, 87.96% and 56.46%, respectively. Based on fabricated device dimensions, the complete architecture occupies ~6171.2 μm² with an average energy consumption of 200.08 pJ per training and inference cycle for MNIST, highlighting its potential for scalable low-power neuromorphic computing.

2605.29931 2026-05-29 cs.AI eess.AS

It's All About Speed: AI's Impact on Workflow in Music Production

Finn McClellan, Fabio Morreale

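AI summary: Through an ethnographic study, examines how AI and automated tools affect music production workflow, focusing on the usage and attitudes of recording engineers, mixers, and producers, and analyzes the tensions among speed, controllability, and creative agency and how they may be alleviated.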

Comments
Audio Engineering Society Conference Paper - Presented at the AES International Conference on Machine Learning and Artificial Intelligence for Audio 2025 - September 8-10, London, UK
English abstract

In this paper, we present the results of an ethnographic study into the impact of AI and automated tools on music production workflow. Focusing specifically on professional participants who identified as recording engineers, mixers, and producers, we discuss their usage of common AI and automated software, as well as their sentiments on the proliferation of these tools. We discuss tensions that may be created between users and automated tools in key areas such as the need for speed and efficiency, controllability, and maintaining creative agency, and how these tensions may be alleviated through tool design.

2605.29862 2026-05-29 eess.AS cs.AI cs.SD

Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions

Heejoon Koo, Yoon Tae Kim, Miika Toikkanen, June-Woo Kim

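AI summary: To address the domain shift caused by stethoscope device differences in respiratory sound classification, proposes a causality-inspired multimodal federated domain generalization framework that achieves device-invariant representations through content-preserving style perturbations, counterfactual text augmentation, and gradient alignment, outperforming conventional methods on the ICBHI and SPRSound datasets.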

Comments
2 figures, 4 tables, and 5 pages
English abstract

AI-driven respiratory sound classification (RSC) is promising for automated pulmonary disease detection, yet multi-site deployment is hindered by inter-stethoscope variability. We introduce a federated domain generalization (FedDG) formulation for RSC under stethoscope-induced device shifts, where clients use heterogeneous devices and the model is evaluated on unseen devices. Our empirical analysis shows that stethoscope-induced style and disease-specific content are tightly entangled, making deterministic style removal unreliable. In response, we propose a causality-inspired multimodal FedDG framework that combines: (i) a causality-inspired device style intervention network that performs content-preserving style perturbations, (ii) counterfactual text augmentation that neutralizes metadata shortcuts, and (iii) gradient alignment that facilitates device-invariant representations across clients. Built on a multimodal language-audio pretraining model, it outperforms conventional data augmentation and federated learning baselines in leave-one-device-out validation on ICBHI and SPRSound datasets. Code will be released upon publication.

2605.29859 2026-05-29 eess.AS cs.CL

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables

Sung-Lin Yeh, Wei Zhou, Gil Keren, Duc Le, Zhong Meng, Hao Tang, Jay Mahadeokar, Ozlem Kalinli, Alexandre Mourachko

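AI summary: Proposes a discrete latent variable model on mel spectrograms that jointly optimizes the encoder and the speech language model; it outperforms codec-based and other mel-spectrogram-based baselines on zero-shot text-to-speech and speech-to-text tasks and alleviates prolonged silences and word omissions in autoregressive modeling.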

English abstract

Recent speech language models rely on encoders that are optimized separately from autoregressive models. Since these encoders are unaware of the downstream objectives, the extracted representations may not be optimal for downstream tasks. To address this limitation, we introduce a discrete latent variable model on mel spectrograms that jointly optimizes the encoder and the speech language model. Joint optimization not only brings improvements over codec-based and other mel-spectrogram-based baselines on zero-shot Text-to-Speech (TTS) and Speech-to-Text (STT) tasks, but also effectively alleviates common issues in autoregressive mel-spectrogram modeling, such as prolonged silence generation and word omissions.

2605.29849 2026-05-29 eess.SY cs.LG cs.SY

BuilDyn: Excitation-Driven Data Generation for Building Thermal Dynamics Modeling and Control

Felix Koch, Thomas Krug, Fabian Raisch, Benjamin Schäfer, Benjamin Tischler

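AI summary: Presents the BuilDyn package, which generates control-oriented building data through customizable excitation strategies, improving the robustness of machine learning models to unseen operating conditions.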

English abstract

Machine learning (ML) is increasingly used for data-driven modeling of buildings to enable downstream tasks such as fault detection and diagnosis, and energy-efficient control. While recent work improves generalization across building characteristics, weather, and occupancy, generalization also depends on sufficient exploration of the control-driven system state space. Existing real-world datasets and simulation environments predominantly reflect stationary operation under fixed control policies, resulting in limited excitation and reduced robustness to unseen operating conditions. This paper introduces BuilDyn, a package based on BuilDa that enables customizable excitation strategies for control-oriented data generation. BuilDyn further supports sampling from representative building distributions and provides a Python interface for easy integration into machine learning pipelines. We demonstrate the benefits of BuilDyn by comparing the performance of data-driven ML models trained on non-excited and excited data for one building. With BuilDyn, we hope to advance scalable control-oriented modeling and support future directions such as transfer learning and building-specific foundation models.

2605.29841 2026-05-29 eess.SY cs.SY math.OC

Distributed Nonlinear Model Predictive Control for District Heating Networks

Alessandro Bettoni, Giacomo Mastroddi, Marco Muttoni

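AI summary: Proposes a distributed nonlinear model predictive control method based on the alternating direction method of multipliers that uses a graph-based model to optimize the mass flow absorption of buildings, balancing the performance of centralized control against the privacy preservation of decentralized schemes.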

Comments
9 pages, 9 figures
English abstract

This paper presents a distributed nonlinear model predictive control scheme for district heating networks that uses the alternating direction method of multipliers. Exploiting a graph-based model of the thermal dynamics, our controller optimizes the mass flow absorption of buildings in a distributed cooperative scheme that mediates between the superior performance of centralized control and the privacy preservation of decentralized schemes. A benchmark three-building network simulation is used to compare the performance of the proposed solution with a decentralized model predictive control scheme.
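
For readers unfamiliar with the alternating direction method of multipliers, the sketch below shows its consensus form on a toy problem. It is not the paper's controller: each "agent" has a hypothetical scalar quadratic cost instead of a nonlinear thermal model, and the function name and parameters are illustrative. What it does show is the pattern that makes the method attractive for distributed control: local solves, a lightweight coordination step, and a dual update.

```python
from typing import List, Tuple


def consensus_admm(targets: List[float], rho: float = 1.0,
                   iterations: int = 100) -> Tuple[float, List[float]]:
    """Minimize sum_i (x_i - targets[i])**2 subject to x_i = z for every agent i.

    Returns the consensus value z and the agents' local values x.
    """
    n = len(targets)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0
    for _ in range(iterations):
        # Local step: each agent minimizes its own cost plus a penalty for leaving z.
        x = [(2.0 * a + rho * (z - ui)) / (2.0 + rho) for a, ui in zip(targets, u)]
        # Coordination step: only x_i + u_i is exchanged, never the local cost itself.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each agent's remaining disagreement with z.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x


# Three hypothetical agents with different preferred values agree on a common one.
z, x = consensus_admm([1.0, 2.0, 6.0])
```

For this quadratic toy problem the consensus value converges to the average of the targets. In a setting like the one in the abstract, the local step would instead be each building's own nonlinear optimization, which is why private model details need not be shared.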

2605.29824 2026-05-29 cs.IT eess.SP math.IT

On the Effect of Pulse Shaping Filters in Zak-OTFS Waveform for Radar Sensing

Abhishek Bairwa, Ananthanarayanan Chockalingam

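AI summary: Studies the effect of different pulse shaping filters (sinc and Gaussian-sinc) on the self-ambiguity function of Zak-OTFS radar waveforms. The sinc and GS filters have narrower main lobes and resolve densely populated multi-target scenes better, while the Gaussian filter has lower sidelobes and performs better in sparse scenes; with an interference-mitigating receiver, the sinc and GS filters outperform the Gaussian filter in both cases.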

Comments
Submitted to IEEE journal for possible publication
English abstract

In radar sensing, the self-ambiguity function of the probing waveform plays a crucial role in the resolvability and detection of multiple targets. In the recent Zak-OTFS-based radar literature, a Gaussian pulse shaping filter has been considered and shown to offer better range/velocity estimation performance than the traditional chirp waveform in scenes with multiple targets. While the self-ambiguity function with the Gaussian filter has very low sidelobes, its main lobe is wide, which compromises resolvability and performance. Motivated by this, we seek filters with better ambiguity characteristics. Specifically, we explore two other known filters, namely the sinc and Gaussian-sinc (GS) filters, and demonstrate that they outperform the Gaussian filter under different scenarios and receiver processing. To this end, we derive closed-form expressions for the self-ambiguity functions of the Zak-OTFS waveform with sinc and GS filters. The ambiguity functions of the sinc- and GS-filtered waveforms have narrow main lobes, resulting in better resolvability in densely populated scenes for the basic peak-detection-based receiver. The ambiguity function of the Gaussian-filtered waveform has very low sidelobes, resulting in better performance in sparsely populated scenes. When a receiver with inter-target interference mitigation is used, the sinc and GS filters perform better than the Gaussian filter in both densely and sparsely populated scenes.

2605.29818 2026-05-29 eess.SY cs.SY

Teleoperation Operational Design Domain based on Minimal Risk Maneuver Capability

Leon Johann Brettin, Nayel Fabian Salem, Ole Hans, Markus Maurer

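AI summary: For teleoperated road vehicles, proposes an Operational Design Domain (ODD) concept based on minimal risk maneuver capability and demonstrates it with a use case example.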

Comments
This is a preprint. The manuscript is under preparation and has not yet been submitted for peer review
English abstract

This article discusses the concept of an Operational Design Domain (ODD) designed specifically for teleoperated road vehicles. For this purpose, the ODD concept designed for automated driving is adapted for teleoperation. As teleoperation becomes more common in regular traffic, the question arises under which operating conditions such vehicles are able and allowed to drive. Currently, these conditions are selected primarily based on network performance. From a safety perspective, it is difficult to base such a selection on a reliable connection because it is almost impossible to guarantee sufficient reliability. With this in mind, the ODD concept designed for automated driving is adapted for teleoperation: A concept is proposed for basing the ODD for a teleoperation system on the capability of the teleoperated vehicle to perform a minimal risk maneuver using a dedicated system designed solely for this purpose. This concept is then demonstrated using a use case example.

2605.29813 2026-05-29 cs.IT cs.SY eess.SY math.IT

Tackling Interference in HAPS Networks via Angular-Aware Clustering and RSMA

通过角度感知聚类和RSMA解决HAPS网络中的干扰问题

Afsoon Alidadi Shamsabadi, Animesh Yadav, Halim Yanikomeroglu

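AI summary: To address the inter-user interference caused by strong line-of-sight links in HAPS networks, the paper proposes an angular-aware user clustering and interference-aware resource block allocation framework, and incorporates rate-splitting multiple access (RSMA) to mitigate interference within the same resource block, significantly improving per-user spectral efficiency.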

详情
AI中文摘要

高空平台站(HAPS)已成为下一代无线网络的有前途的推动者,为地面用户提供无处不在的连接。无论是独立运行还是与地面网络集成,HAPS因其在平流层的战略位置,可以显著增强覆盖范围和容量。然而,由于HAPS链路的独特传播特性,HAPS赋能网络中的干扰管理需要特别关注。特别是,HAPS与地面用户之间的强视距(LoS)条件导致信道变化有限,从而加剧了用户间干扰。在这项工作中,我们考虑一个单一的HAPS通过多个波束在有限数量的正交资源块(RB)上服务多个地面用户。为了解决由此产生的干扰,我们提出了一种新颖的角度感知用户聚类和干扰感知RB分配框架,该框架策略性地对用户进行聚类,设计波束以服务每个聚类,并将RB分配给跨聚类的用户。为了进一步减轻同一RB内的干扰,引入了速率分割多址接入(RSMA)方案。仿真结果表明,所提出的基于聚类和RSMA的方法在可实现每用户频谱效率方面显著优于基线方案。

英文摘要

High Altitude Platform Stations (HAPS) have emerged as a promising enabler for next-generation wireless networks, offering ubiquitous connectivity to ground users. Operating either in standalone mode or in integration with terrestrial networks, HAPS can significantly enhance both coverage and capacity due to their strategic placement in the stratosphere. However, interference management in HAPS-empowered networks requires special attention due to the unique propagation characteristics of HAPS links. In particular, the strong line-of-sight (LoS) conditions between HAPS and ground users result in limited channel variability, thereby intensifying inter-user interference. In this work, we consider a single HAPS serving multiple ground users through multiple beams over a limited number of orthogonal resource blocks (RBs). To address the resulting interference, we propose a novel angular-aware user clustering and interference-aware RB allocation framework that strategically clusters users, designs beams to serve each cluster, and allocates RBs to users across clusters. To further mitigate intra-RB interference, a rate-splitting multiple access (RSMA) scheme is incorporated. Simulation results demonstrate that the proposed clustering and RSMA-based approach significantly outperforms baseline schemes in terms of achievable per-user spectral efficiency.

2605.29798 2026-05-29 cs.CV cond-mat.mtrl-sci eess.IV

Low-Magnification SEM May Suffice: Interpretable Deep Learning for Multi-Scale Fracture-Cause Classification in Zirconia-Toughened Alumina

低倍率SEM可能足够:用于氧化锆增韧氧化铝多尺度断裂原因分类的可解释深度学习

Julian Schmid, Pawel Astankow, Tom Vater, Julius Beck, Robert Cichon, Danny Krautz

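AI summary: Proposes an interpretable vision-transformer workflow that automatically classifies fracture causes in alumina matrix composite implants from low-magnification SEM images, reaching accuracy comparable to that at high magnification.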

详情
AI中文摘要

可靠识别氧化铝基复合材料髋关节和膝关节植入物的断裂起源对于质量保证和患者安全至关重要,然而当前的断口分析工作流程耗时、部分主观且依赖高倍率扫描电子显微镜(SEM)。我们提出了一种可解释的视觉变换器(ViT)工作流,用于对广泛用于全关节置换的氧化铝基复合材料(BIOLOX delta, CeramTec GmbH)的断裂原因进行自动分类。从五年的生产爆破和验证测试中整理了8,493张SEM图像(50倍至10,000倍)的数据集,并按照制造链定义的三个缺陷类别(生坯、硬加工和材料缺陷)进行标注。在严重的类别不平衡下,微调后的ViT在分层五折交叉验证中达到了0.907的准确率和0.888的宏F1分数,两阶段感知哈希/SSIM泄漏审计确认了样本重叠可忽略。值得注意的是,低倍率(50倍)下的性能与高倍率(1k-10k倍)相当,表明宏观特征——镜面几何和羽状纹线场——已经编码了足够的诊断信号。Grad-CAM归因一致地定位在经典的断口线索(镜面、羽状纹、孔隙、加工痕迹)上,与既定的断口分析标准一致。这些结果共同将可解释ViT定位为陶瓷植入物质量保证的补充工具,能够实现低倍率预筛选并减少对耗时的高倍率检查的依赖。

英文摘要

Reliable identification of fracture origins in alumina matrix composite hip and knee implants is critical for quality assurance and patient safety, yet current fractographic workflows are time-consuming, partly subjective, and reliant on high-magnification scanning electron microscopy (SEM). We present an interpretable vision-transformer (ViT) workflow for automated classification of fracture causes in an alumina matrix composite (BIOLOX delta, CeramTec GmbH) widely used in total joint replacements. A dataset of 8,493 SEM images (50x-10,000x) was curated from five years of in-production burst and proof tests and annotated into three defect categories defined along the manufacturing chain: green body, hard machining, and material defects. Under severe class imbalance, the fine-tuned ViT reached an accuracy of 0.907 and a macro-F1 of 0.888 in stratified five-fold cross-validation, with a two-stage perceptual-hash/SSIM leakage audit confirming negligible specimen overlap. Notably, performance at low magnification (50x) was comparable to that at high magnification (1k-10kx), indicating that macro-scale features - mirror geometry and hackle line fields - already encode sufficient diagnostic signal. Grad-CAM attributions consistently localised on canonical fractographic cues (mirrors, hackles, pores, machining marks), aligning with established fractographic criteria. Together, these results position interpretable ViTs as a complementary tool for ceramic-implant quality assurance, enabling low-magnification pre-screening and reducing reliance on time-intensive high-magnification inspection.
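
The abstract reports both accuracy and macro-F1 because the classes are severely imbalanced. The sketch below shows, on an invented toy label set (not the paper's data), why the two metrics can diverge: a classifier that always predicts the majority class keeps a high accuracy but a low macro-F1. The label names merely echo the three defect categories named above, and the helper functions are illustrative, not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class that is never hit scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: 8 majority-class samples and 1 sample of each minority class.
y_true = ["green_body"] * 8 + ["hard_machining"] + ["material"]
y_pred = ["green_body"] * 10  # a degenerate classifier that ignores the minorities

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred)
print(toy_accuracy, round(toy_macro_f1, 4))
```

On this toy set the accuracy is 0.8 while the macro-F1 is roughly 0.30, which is why a macro-F1 close to the accuracy, as reported in the abstract, suggests that the minority classes are not being ignored.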

2605.29777 2026-05-29 eess.SP

Multi-Snapshot Deep Denoising for Channel Estimation in OTFS Modulated Systems

OTFS调制系统中基于多快照深度去噪的信道估计

Surbhi Gehlot, Siddhi Shinde, Suraj Srivastava, Sandeep Kumar Yadav

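AI summary: Proposes a deep-denoising-based channel estimation framework that formulates channel state information recovery as an image restoration problem. It exploits the structural invariance of the delay-Doppler domain channel so that multiple OTFS frame snapshots jointly enhance the lightweight NAFNet denoiser, giving reliable estimation at low complexity and low pilot SNR while supporting fractional delay and Doppler effects.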

详情
Journal ref
IEEE Communication Letters, Vol 30 (2025), Page No. 2029-2033
Comments
5 pages, 3 figures
AI中文摘要

针对正交时频空间(OTFS)调制系统,提出了一种基于深度去噪的信道估计框架,其中信道状态信息(CSI)恢复被建模为图像复原问题。该方法的一个显著特点是利用时延-多普勒(DD)域信道在几何相干时间内的结构不变性,使得在此期间捕获的多个OTFS帧可以作为近似相同信道的噪声快照。这些快照共同增强了所提出的基于非线性激活自由网络(NAFNet)的轻量级去噪器的有效性。该方法具有较低的计算复杂度,即使在低导频信噪比(PSNR)下也能可靠运行,并且可以适应分数时延和分数多普勒效应。仿真结果表明,与现有方法相比,该方法具有显著的性能提升。

英文摘要

A deep denoising based channel estimation framework is proposed for orthogonal time frequency space (OTFS) modulated systems, wherein channel state information (CSI) recovery is formulated as an image restoration problem. A salient attribute of the approach is the exploitation of structural invariance in the delay-Doppler (DD) domain channel over a geometric coherence time, allowing multiple OTFS frames captured during this period to serve as noisy snapshots of approximately the same channel. These snapshots jointly enhance the effectiveness of the proposed lightweight denoiser based on the nonlinear activation free network (NAFNet). The method exhibits low computational complexity, operates reliably even at low pilot signal-to-noise ratio (PSNR), and can accommodate both fractional delay and fractional Doppler effects. Simulation results demonstrate significant performance gains over existing methods.
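
The intuition for why several snapshots of approximately the same channel help can be illustrated with plain averaging. This is a deliberately simple stand-in, not the NAFNet denoiser from the paper. The sketch assumes a perfectly static toy delay-Doppler grid with two paths and independent complex Gaussian noise per snapshot; the grid size, path gains, and noise level are made-up values.

```python
import numpy as np


def make_snapshots(channel, num_snapshots, noise_std, rng):
    """Return noisy complex observations of one (assumed static) DD-domain channel."""
    shape = (num_snapshots,) + channel.shape
    noise = (noise_std / np.sqrt(2)) * (
        rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    )
    return channel[None, ...] + noise


def average_snapshots(snapshots):
    """Baseline estimator: average over the snapshot axis (axis 0)."""
    return np.mean(snapshots, axis=0)


def mse(estimate, truth):
    """Mean squared error between a complex estimate and the true grid."""
    return float(np.mean(np.abs(estimate - truth) ** 2))


rng = np.random.default_rng(0)
channel = np.zeros((16, 16), dtype=complex)  # toy delay-Doppler grid
channel[2, 3] = 1.0                          # hypothetical path 1
channel[5, 7] = 0.5j                         # hypothetical path 2

snaps = make_snapshots(channel, num_snapshots=8, noise_std=0.5, rng=rng)
single_mse = mse(snaps[0], channel)
averaged_mse = mse(average_snapshots(snaps), channel)
print(single_mse, averaged_mse)
```

With independent noise, averaging K snapshots reduces the noise variance by about a factor of K. A learned denoiser can go further by also exploiting the sparse structure of the grid, which averaging ignores.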

2605.29753 2026-05-29 eess.IV cs.AI

A unified deeplearning framework for contrast-phase-specific virtual monochromatic imaging

一种用于对比相位特异性虚拟单色成像的统一深度学习框架

Antony Jerald, Hemant K Aggarwal, Brian Nett, Avinash Gopal, Phaneendra K Yalavarthy, Bipul Das, Rajesh Langoju

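AI summary: Proposes a unified deep learning framework that uses contrast phase information as a prior to synthesize contrast-phase-specific virtual monochromatic 50 keV images from single-energy CT data, performs the energy transformation with a novel prior conditioning architecture, and demonstrates contrast enhancement and generalization across four contrast phases.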

详情
Journal ref
SPIE Medical Imaging 2026
AI中文摘要

双能CT(DECT)可实现虚拟单色成像(VMI)并提高对比度分辨率,但其临床采用受到硬件复杂性和成本的限制。在这项工作中,我们提出了一种统一的深度学习框架,通过利用对比相位信息作为先验,从单能CT(SECT)数据合成对比相位特异性虚拟单色50 keV图像。该模型使用DECT衍生的70 keV和50 keV图像对进行训练,涵盖四个对比相位——血管期、动脉期、门脉期和延迟期——采用一种新颖的先验条件架构,将对比相位先验整合到能量转换过程中。我们证明了所提出的统一模型能够实现对比增强,并在对比相位之间具有良好的泛化能力。此外,我们展示了该模型可以从SECT输入生成类似50 keV的图像,并保留对比相位特异性动态。

英文摘要

Dual-energy CT (DECT) enables virtual monochromatic imaging (VMI) and improved contrast resolution, but its clinical adoption is limited by hardware complexity and cost. In this work, we propose a unified deep learning framework that synthesizes contrast-phase-specific virtual monochromatic 50 keV images from single-energy CT (SECT) data by leveraging contrast phase information as a prior. The model is trained using DECT-derived 70 keV and 50 keV image pairs across four contrast phases -- Angio, Arterial, Portal, and Delayed -- using a novel prior conditioning architecture that integrates contrast phase priors into the energy transformation process. We demonstrate that the proposed unified model achieves contrast enhancement and generalizes well across contrast phases. Additionally, we show that the model can generate 50 keV-like images from SECT inputs, preserving contrast phase-specific dynamics.

2605.29679 2026-05-29 cs.IT eess.SP math.IT

A Unified Two-Stage Generative Diffusion Framework for Channel Estimation and Port Selection in Multiuser MIMO-FAS

多用户MIMO-FAS中信道估计与端口选择的统一两阶段生成扩散框架

Erqiang Tang, Wei Guo, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

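AI summary: Proposes a unified two-stage diffusion framework that, via maximum-a-posteriori inference, decomposes the joint task into a continuous flow-based diffusion model (for channel estimation) and a discrete diffusion model (for port selection), achieving high-accuracy channel recovery and globally optimal port selection at low sampling ratios.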

详情
AI中文摘要

流体天线系统(FAS)已成为下一代无线系统的一项有前景的技术。然而,实际的多用户多输入多输出FAS(MIMO-FAS)面临两个内在耦合的挑战:从有限的射频链获取精确的高维信道状态信息(CSI),以及解决组合端口选择问题,其中后者的有效性高度依赖于前者的结果。在本文中,我们提出一个统一的两阶段扩散框架,将联合任务表述为最大后验(MAP)推断问题,并通过插件近似将其分解为两个顺序采样阶段。对于阶段I,一个基于连续流的扩散模型作为2D FAS信道的强大隐式先验,并行引导生成方案实现近似后验采样,即使在严重低子采样比下也能实现准确的多用户信道恢复。对于阶段II,训练一个离散扩散模型,通过结合启发式标签上的监督学习和强化微调来近似条件端口选择分布,有效克服传统启发式算法的局部最优。大量仿真表明,所提出的框架同时实现了卓越的信道估计精度和全局优化的端口选择,显著提高了最小可达速率。

英文摘要

Fluid antenna systems (FAS) have emerged as a promising technology for next-generation wireless systems. However, practical multiuser multiple-input multiple-output FAS (MIMO-FAS) faces two inherently coupled challenges: acquiring accurate high-dimensional channel state information (CSI) from limited RF chains and solving the combinatorial port selection problem, where the effectiveness of the latter highly depends on the result of the former. In this paper, we propose a unified two-stage diffusion framework that formulates the joint task as a maximum-a-posteriori (MAP) inference problem and decomposes it into two sequential sampling stages through a plug-in approximation. For Stage I, a continuous flow-based diffusion model serves as a powerful implicit prior for 2D FAS channels, and a parallel guided generation scheme realizes approximate posterior sampling, enabling accurate multiuser channel recovery even under severely low sub-sampling ratios. For Stage II, a discrete diffusion model is trained to approximate the conditional port selection distribution by combining supervised learning on heuristic labels with reinforcement fine-tuning, effectively overcoming the local optima of conventional heuristic algorithms. Extensive simulations demonstrate that the proposed framework simultaneously achieves exceptional channel estimation accuracy and globally optimized port selection, substantially improving the minimum achievable rate.

2605.29677 2026-05-29 cs.HC eess.SP q-bio.NC

Embodied Virtual Reality Feedback Reshapes Neural Representations to Support Continuous Three-Dimensional Motor Imagery Decoding

具身虚拟现实反馈重塑神经表征以支持连续三维运动想象解码

Niall McShane, Attila Korik, Karl McCreadie, Naomi Du Bois, Darryl Charles, Damien Coyle

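AI summary: Through a longitudinal experiment with ten participants, this study presents the first systematic investigation of embodied virtual reality feedback in real-time motor-imagery control of a 3D virtual limb, finding that VR feedback significantly outperforms screen feedback, improves decoding performance, and elicits more decodable and generalisable neural representations.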

详情
Comments
28 pages, 7 figures, 3 tables. Submitted to Nature Biomedical Engineering. Data to be made available via Zenodo (DOI: 10.5281/zenodo.16047021)
AI中文摘要

连续脑机接口(BCI)通过解码想象运动中的运动轨迹提供直观的运动控制,然而反馈模态和纵向训练如何塑造神经表征和解码性能仍知之甚少。我们首次系统研究了在由运动想象驱动的实时三维虚拟肢体控制过程中,具身虚拟现实(VR)反馈的作用,涉及十名参与者的十个纵向会话。使用三种策略评估性能:实际在线性能(固定解码器泛化,FDG)、周期性再训练(顺序自适应训练,SAT)和会话内上限估计(会话内重建,WSR)。CNN-LSTM解码器在VR下实现了会话内想象运动相关性r = 0.762,在屏幕反馈下为r = 0.672。VR在所有策略和运动维度上均显著优于屏幕反馈(改进8.9-13.0%,所有p <= 0.002,d = 1.42-2.05)。这种优势在无需再训练的固定解码器下持续存在,表明具身VR反馈能诱发本质上更可解码和泛化的神经表征。线性混合效应模型证实了反馈模态和运动轴的主效应显著,且无交互作用。在神经生理学上,VR产生了更强的感觉运动-顶叶去同步和增强的运动-额叶功能连接,所有频带均涉及广泛的前岛叶活动,并增加了上顶叶小叶耦合,这与真实运动执行相关的模式相似。这些发现确立了具身空间反馈作为下一代面向直观运动控制和神经康复的连续BCI的关键设计原则。

英文摘要

Continuous brain-computer interfaces (BCIs) that decode motion trajectories from imagined movement offer intuitive motor control, yet how feedback modality and longitudinal training shape neural representations and decoding performance remains poorly understood. We present the first systematic investigation of embodied virtual reality (VR) feedback during real-time 3D virtual limb control driven by motor imagery, across ten longitudinal sessions in ten participants. Performance was evaluated using three strategies: actual online performance (Fixed Decoder Generalisation, FDG), periodic retraining (Sequential Adaptive Training, SAT), and within-session upper-bound estimation (Within-Session Reconstruction, WSR). A CNN-LSTM decoder achieved within-session imagined movement correlations of r = 0.762 under VR and r = 0.672 under screen feedback. VR significantly outperformed screen feedback across all strategies and movement dimensions (improvements of 8.9-13.0%, all p <= 0.002, d = 1.42-2.05). This advantage persisted under fixed decoders without retraining, demonstrating that embodied VR feedback elicits inherently more decodable and generalisable neural representations. Linear mixed-effects modelling confirmed robust main effects of feedback modality and movement axis with no interaction. Neurophysiologically, VR produced stronger sensorimotor-parietal desynchronisation and enhanced motor-frontal functional connectivity, with pervasive anterior insula engagement across all frequency bands and increased superior parietal lobule coupling, paralleling patterns associated with real movement execution. These findings establish embodied spatial feedback as a key design principle for next-generation continuous BCIs targeting intuitive motor control and neurorehabilitation.

2605.29628 2026-05-29 cs.SD cs.AI cs.CL cs.LG eess.AS

COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings

COMET:音频-文本多模态对比嵌入中模态间隙的概念空间剖析

Yonggang Zhu, Liting Gao, Aidong Men, Wenwu Wang

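AI summary: Proposes the COMET framework, which uses a PLS-SVD decomposition to show that the modality gap in CLAP models is driven mainly by a small number of shared-concept axes, and mitigates the gap in a training-free manner with a spectral truncation method, bringing zero-shot audio captioning close to fully supervised performance.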

详情
AI中文摘要

对比语言-音频预训练(CLAP)模型广泛用于音频理解,并在许多零样本应用中支持模态无关的条件交换。然而,其性能受到音频和文本嵌入之间模态间隙的严重影响。现有解释主要将此间隙归因于锥体效应,将其视为均值嵌入之间的偏移,但仅纠正均值只能带来有限的改进。其他假设,如信息不平衡和维度坍缩,也被提出,但仍未得到充分验证,并且在音频领域尚未被深入研究。同时,一些工作尝试将多模态对比嵌入分解为可解释的概念,但没有任何工作从概念分解的角度显式分析模态间隙。在这项工作中,我们引入了COMET(基于PLS-SVD变换的概念空间组织与模态间隙解释),这是一个新颖的用于CLAP的偏最小二乘奇异值分解(PLS-SVD)框架,揭示了模态间隙的更广泛视角。我们的框架揭示,只有一小部分可解释的轴(捕捉共享概念)对相似度计算有显著贡献,并且均值分量仅部分代表模态间隙。基于这一见解,我们提出了一种简单的谱截断方法,以无训练的方式缓解模态间隙。该方法使得零样本音频字幕通过条件交换接近全监督性能,无需大型辅助记忆库或昂贵计算。同时,它在保持检索和音频字幕任务强性能的同时,实现了显著的嵌入维度缩减。

英文摘要

Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swapping in many zero-shot applications. However, their performance is heavily affected by the modality gap between audio and text embeddings. Existing explanations mainly attribute this gap to the cone effect, treating it as a shift between mean embeddings, yet correcting the mean alone yields only limited improvements. Alternative hypotheses, such as information imbalance and dimensionality collapse, have also been proposed, but they remain insufficiently verified and have not been thoroughly studied in the audio domain. Meanwhile, several works attempt to decompose multimodal contrastive embeddings into interpretable concepts, but none explicitly analyze the modality gap from the perspective of concept decomposition. In this work, we introduce COMET (Concept space Organization and Modality gap Explanation with PLS-SVD Transformation), a novel partial least squares singular value decomposition (PLS-SVD) framework for CLAP that offers a broader perspective on the modality gap. Our framework reveals that only a small, interpretable subset of axes, which captures shared concepts, contributes substantially to similarity computation, and that the mean component only partially represents the modality gap. Building on this insight, we propose a simple spectral truncation method that mitigates the modality gap in a training-free manner. The method enables zero-shot audio captioning with condition swapping to approach fully supervised performance, without requiring large auxiliary memory banks or expensive computation. At the same time, it achieves substantial embedding dimensionality reduction while preserving strong performance on retrieval and audio captioning tasks.
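
A generic PLS-SVD step followed by spectral truncation can be sketched as follows. This is a textbook-style illustration on synthetic embeddings, not the COMET implementation: the function names `pls_svd` and `truncate`, the centering step, the toy data with three shared latent factors, and the choice `k=3` are all assumptions made for the example.

```python
import numpy as np


def pls_svd(audio, text):
    """SVD of the cross-covariance between paired audio and text embeddings.

    Returns (audio_axes, strengths, text_axes); column i of each axes matrix
    is the i-th paired direction, ordered by decreasing singular value.
    """
    audio_c = audio - audio.mean(axis=0)   # remove the per-modality mean
    text_c = text - text.mean(axis=0)
    cross_cov = audio_c.T @ text_c / (len(audio) - 1)
    u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)
    return u, s, vt.T


def truncate(embeddings, axes, k):
    """Spectral truncation: project centered embeddings onto the top-k axes."""
    centered = embeddings - embeddings.mean(axis=0)
    return centered @ axes[:, :k]


# Synthetic paired embeddings: 3 shared latent factors, different mixing per
# modality, small noise, and opposite mean offsets that mimic a mean shift.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3))
audio = latent @ rng.standard_normal((3, 16)) + 0.05 * rng.standard_normal((500, 16)) + 1.0
text = latent @ rng.standard_normal((3, 16)) + 0.05 * rng.standard_normal((500, 16)) - 1.0

audio_axes, strengths, text_axes = pls_svd(audio, text)
energy_top3 = strengths[:3].sum() / strengths.sum()
audio_small = truncate(audio, audio_axes, k=3)
text_small = truncate(text, text_axes, k=3)
print(round(float(energy_top3), 3), audio_small.shape, text_small.shape)
```

In this toy setup almost all of the cross-covariance spectrum sits in the first three axes, so keeping only those axes shrinks the embeddings from 16 to 3 dimensions while retaining the shared structure. How many axes matter for a real CLAP model is an empirical question that the paper addresses.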

2605.29613 2026-05-29 eess.AS cs.SD

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

基于扩散的ASR解码策略:基于置信度阈值的系统评估

Jeong Hun Yeo, Minsu Kim, Hyeongseop Rha, Yong Man Ro

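AI summary: This paper systematically evaluates three decoding strategies for diffusion-language-model-based ASR and proposes a negative-log-likelihood-based uncertainty measure to monitor decoding progress. Threshold-based strategies outperform the fixed-number strategy in both accuracy and speed, and the static-threshold strategy matches the accuracy of autoregressive decoding with higher efficiency.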

详情
AI中文摘要

虽然基于LLM的自动语音识别(ASR)实现了高准确率,但其速度受限于顺序自回归解码。扩散语言模型(DLM)提供了一种并行替代方案,然而其解码策略在ASR场景中尚未得到充分探索。本文分析了三种用于DLM-based ASR的解码方案:固定步数、静态置信度阈值和动态置信度阈值。我们提出使用基于负对数似然的不确定性度量作为解码进度的代理来测量逐轮准确率。结果表明,基于阈值的策略在准确率和速度上均显著优于固定步数方案。我们将此归因于ASR独有的特性:大多数token在早期就达到高置信度,从而可以积极收集可靠token,仅将困难token留到后续轮次。值得注意的是,静态阈值策略在匹配自回归解码准确率的同时提供了更高的效率。

英文摘要

While LLM-based Automatic Speech Recognition (ASR) achieves high accuracy, its speed is limited by sequential autoregressive decoding. Diffusion Language Models (DLMs) offer a parallel alternative, yet their decoding strategies remain under-explored in ASR contexts. This paper analyzes three decoding schemes for DLM-based ASR: fixed-number, static confidence threshold, and dynamic confidence threshold. We propose measuring round-wise accuracy using Negative Log-Likelihood-based uncertainty as a proxy for decoding progress. Our results show that both threshold-based strategies significantly outperform fixed-number schemes in accuracy and speed. We attribute this to a property unique to ASR: most tokens reach high confidence early, allowing reliable ones to be harvested aggressively while leaving only difficult tokens for later rounds. Notably, the static-threshold strategy matches the accuracy of autoregressive decoding while offering superior efficiency.
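
The difference between fixed-number and static-threshold unmasking can be sketched with a toy decoder loop. Everything below is hypothetical: `toy_predict` stands in for a diffusion language model, its confidences are hand-picked so that most tokens are confident from the first round, and the values 0.9 and 2 are arbitrary. The sketch only illustrates the selection rules, not the paper's models or results.

```python
MASK = None  # placeholder for a position that has not been decoded yet

TARGET = list("hello world")
# Hand-picked base confidences: most positions are easy, two are hard.
BASE = [0.95, 0.97, 0.99, 0.96, 0.40, 0.98, 0.95, 0.30, 0.97, 0.99, 0.96]


def toy_predict(tokens):
    """Hypothetical model: confidence grows as more context is revealed."""
    revealed = sum(t is not MASK for t in tokens)
    return {
        pos: (TARGET[pos], min(1.0, BASE[pos] + 0.1 * revealed))
        for pos, tok in enumerate(tokens)
        if tok is MASK
    }


def fixed_number(k):
    """Unmask the k most confident masked positions per round."""
    def select(preds):
        ranked = sorted(preds, key=lambda p: preds[p][1], reverse=True)
        return ranked[:k]
    return select


def static_threshold(tau):
    """Unmask every position whose confidence reaches tau (at least one)."""
    def select(preds):
        chosen = [p for p, (_, conf) in preds.items() if conf >= tau]
        if not chosen:  # guarantee progress when nothing passes the threshold
            chosen = [max(preds, key=lambda p: preds[p][1])]
        return chosen
    return select


def decode(predict, length, select):
    """Iteratively unmask a sequence; returns (tokens, number_of_rounds)."""
    tokens = [MASK] * length
    rounds = 0
    while any(t is MASK for t in tokens):
        preds = predict(tokens)
        for pos in select(preds):
            tokens[pos] = preds[pos][0]
        rounds += 1
    return tokens, rounds


threshold_tokens, threshold_rounds = decode(toy_predict, len(TARGET), static_threshold(0.9))
fixed_tokens, fixed_rounds = decode(toy_predict, len(TARGET), fixed_number(2))
print(threshold_rounds, fixed_rounds)
```

In this toy case the threshold rule finishes in two rounds, because nine tokens are confident immediately and the two hard ones follow once context is available, whereas unmasking two tokens per round needs six rounds.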

2605.28569 2026-05-29 eess.SY cs.SY

Actor-Identifier-Critic Reinforcement Learning for Adaptive Model-Free Optimal Control of Nonlinear Systems with Stochastic Packet Dropouts

Actor-Identifier-Critic 强化学习用于具有随机丢包的非线性系统的自适应无模型最优控制

Kianoush Aqabakee, Kosar Behnia, Amirhossein Heydarian Ardakani, Farzaneh Abdollahi, Elham Shirazi

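AI summary: Proposes an Actor-Identifier-Critic controller that uses an identifier to learn the system dynamics and handles packet dropouts in the controller-to-actuator and sensor-to-controller channels, achieving model-free tracking control of nonlinear systems.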

详情
AI中文摘要

控制系统中的丢包是一个关键挑战,因为它会显著损害系统性能和稳定性。在这种情况下,经典控制器通常难以提供有效的控制,因为它们依赖于可能并不总是可用的精确系统模型。本文提出了一种新颖的 Actor-Identifier-Critic (AIC) 控制器,用于解决在控制器到执行器和传感器到控制器通道均存在丢包的非线性系统的无模型跟踪控制问题。通过使用标识器学习系统动态,所提出的控制器能够处理通信链路中的丢包,并在无模型控制框架内促进从评论家到行动者的梯度传播。该方法的性能在两个非线性 SIMO 和 MIMO 系统以及一个受随机丢包影响的电力系统稳定性案例研究中得到了验证。

英文摘要

Packet dropouts in control systems pose a critical challenge, as they can significantly compromise system performance and stability. Under these conditions, classical controllers often struggle to deliver effective control because they rely on accurate system models, which may not always be available. This paper proposes a novel Actor-Identifier-Critic (AIC) controller to address model-free tracking control of nonlinear systems in the presence of packet dropouts in both the controller-to-actuator and sensor-to-controller channels. Using an identifier to learn the system dynamics, the proposed controller is able to handle packet dropouts in the communication link and facilitate gradient propagation from the critic to the actor within a model-free control framework. The performance of the proposed method is demonstrated on two nonlinear SIMO and MIMO systems and a case study on power system stability subject to stochastic packet dropouts.
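
The problem setting, dropouts on both communication channels, can be made concrete with a small simulation. This sketch is not the AIC controller: it uses a fixed linear state-feedback gain on a scalar linear plant, models each dropout as an independent Bernoulli event, and assumes hold-last-value behaviour at both the controller and the actuator. All of these choices, and the numeric values, are illustrative assumptions.

```python
import random


def simulate(a, b, gain, drop_prob, steps, seed=0):
    """Simulate x[k+1] = a*x[k] + b*u[k] with Bernoulli dropouts on both channels.

    Assumed policies: the controller keeps the last received measurement,
    and the actuator holds the last received command.
    """
    rng = random.Random(seed)
    x = 1.0
    last_measurement = 0.0  # controller's view before any packet arrives
    last_command = 0.0      # actuator input before any packet arrives
    trajectory = [x]
    for _ in range(steps):
        if rng.random() >= drop_prob:   # sensor-to-controller packet delivered
            last_measurement = x
        command = -gain * last_measurement
        if rng.random() >= drop_prob:   # controller-to-actuator packet delivered
            last_command = command
        x = a * x + b * last_command
        trajectory.append(x)
    return trajectory


ideal = simulate(a=1.2, b=1.0, gain=1.0, drop_prob=0.0, steps=50)
lossy = simulate(a=1.2, b=1.0, gain=1.0, drop_prob=0.3, steps=50)
print(abs(ideal[-1]), abs(lossy[-1]))
```

Without dropouts the closed loop contracts by a factor of 0.2 per step. With dropouts the open-loop unstable plant (a = 1.2) runs on stale information for some steps, which is the kind of degradation a dropout-aware controller has to compensate for.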

2605.20560 2026-05-29 cs.IT eess.SP math.IT

Reconfigurable Coupler Antenna for Wireless Networks

可重构耦合天线用于无线网络

Xiaodan Shao, Chuangye Shan, Weihua Zhuang, Xuemin Shen

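AI summary: This article introduces the reconfigurable coupler antenna (RCA) technique, which reconfigures the positions and rotations of low-cost couplers and harnesses mutual coupling to achieve mechanical beamforming and improve wireless network performance, and presents its advantages in beamforming gain, path-loss reduction, fading mitigation, spatial multiplexing gain, interference suppression, and geometric gain.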

详情
Comments
7 pages
AI中文摘要

可重构耦合天线(RCA),也称为柔性耦合天线(FCA),是一种新技术,旨在通过重新配置固定位置有源天线周围低成本耦合器的位置和旋转,利用互耦来提升无线通信网络的性能。具体而言,不同的耦合器可以在收发端独立调整其位置和/或旋转,以重塑耦合器上的感应电流用于辐射,从而协同实现用于定向信号增强或零陷的机械波束赋形。无源耦合器的位置和/或旋转重构提供了一种新的、成本效益高的方式来增强无线通信性能,同时显著降低传统有源阵列的天线和射频(RF)链成本。RCA的紧凑和低外形结构使其特别适用于对尺寸、重量和功率(SWAP)有严格限制的设备。在本文中,我们概述了RCA,以揭示其在无线网络中的有前景能力,包括其系统建模、实际实现以及相对于现有技术的竞争优势。我们展示了多种RCA实现的性能增强,包括机械波束赋形增益、路径损耗降低、衰落缓解、空间复用增益、干扰抑制和几何增益。此外,我们详细阐述了RCA的设计挑战以及有前景的解决方案,并讨论了RCA在无线网络中的关键应用。最后,给出了数值结果,以验证RCA辅助传输在无线网络中带来的显著容量增益。

英文摘要

The reconfigurable coupler antenna (RCA), also called the flexible coupler antenna (FCA), is a new technique that aims to improve the performance of wireless communication networks by reconfiguring the positions and rotations of low-cost couplers around fixed-position active antennas to harness mutual coupling. Specifically, different couplers can independently adjust their positions and/or rotations at the transceiver to reshape the induced currents on the couplers for radiation, thereby collaboratively achieving mechanical beamforming for directional signal enhancement or nulling. The position and/or rotation reconfiguration of passive couplers provides a new and cost-effective means of enhancing wireless communication performance, while significantly reducing the antenna and radio-frequency (RF) chain costs of conventional active arrays. The compact and low form-factor structure of the RCA makes it particularly appealing for devices with stringent size, weight, and power (SWAP) constraints. In this article, we provide an overview of RCA to reveal its promising capabilities in wireless networks, including its system modeling, practical implementation, and competitive advantages over existing techniques. We present a variety of RCA-enabled performance enhancements in terms of mechanical beamforming gain, path-loss reduction, fading mitigation, spatial multiplexing gain, interference suppression, and geometric gain. Furthermore, we elaborate on the design challenges of RCA as well as promising solutions, and discuss the key applications of RCA in wireless networks. Finally, numerical results are presented to verify the substantial capacity gains enabled by RCA-aided transmission in wireless networks.

2605.01395 2026-05-29 eess.SY cs.RO cs.SY

Quasi-Static Control of Discrete Cosserat Rod

离散Cosserat杆的准静态控制

Srishti Siddharth

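AI summary: For soft robots modelled as Cosserat rods, the paper builds on a piecewise constant strain spatial discretisation, uses external wrenches (forces/moments) as control inputs, and designs state-feedback linearisation control laws in strain space and task space for end-effector trajectory tracking and shape control.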

详情
Comments
Submitted to 17th APCA International Conference on Automatic Control and Soft Computing (CONTROLO 2026)
AI中文摘要

在本文中,我们为使用Cosserat杆建模的软体机器人设计了反馈控制律,其中Cosserat杆通过分段常应变(PCS)方法进行空间离散化。PCS方法将描述Cosserat杆的非线性偏微分方程转化为非线性常微分方程组。这种简化得到的软体机器人模型类似于串联刚性连杆机械臂。我们通过将外部力/力矩作为控制输入,为准静态PCS模型设计了反馈控制律。控制律基于应变空间和任务空间的状态反馈线性化设计。大量的数值结果展示了这些控制律在软体机器人末端执行器轨迹跟踪和形状控制中的性能。

英文摘要

In this paper, we design feedback control laws for soft robots modelled using the Cosserat rod, which is spatially discretised using the Piecewise Constant Strain (PCS) approach. The PCS approach transforms the nonlinear PDEs describing the Cosserat rod into a system of nonlinear ODEs. This simplification results in a soft robot model similar to that of serial rigid-link manipulators. We design feedback control laws for the quasi-static PCS model by using the external wrenches as control inputs. The control laws are designed based on state-feedback linearisation in strain and task spaces. An extensive set of numerical results demonstrates the performance of the control laws for end-effector trajectory tracking and shape control of soft robots.

2605.00898 2026-05-29 eess.SP cs.LG

A Deep Learning Model for Battery State Prediction towards Intelligent Energy Management

面向智能能源管理的电池状态预测深度学习模型

Athanasios Koukosias, Vasileios Tzanidakis, Sotiris Athanasiou, Kostas Kolomvatsos

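AI summary: Proposes a deep learning model that integrates advanced neural network architectures with large-scale training data to predict the future state and performance of industrial electrochemical energy storage systems, supporting predictive maintenance and the optimized allocation of energy resources.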

详情
Comments
11 pages, 11 figures, Journal
AI中文摘要

准确预测电池健康指标(包括剩余容量和寿命)对于确保电动汽车和大规模储能基础设施等应用的可靠性、安全性和运行效率至关重要。预测结果可用于构建先进的监测机制,持续检查电池健康状态,以协助众多应用的高效实时管理。本研究探讨了用于预测工业电化学储能系统未来状态和性能的深度学习(DL)模型的开发与实现。为应对这一挑战,我们提出了一种专用计算框架,该框架将先进的神经网络架构与大规模训练数据集相结合,能够精确建模电池退化动态和运行趋势。所提出的方法为电池的最优管理提供了决策支持机制,促进了预测性维护和能源资源的高效分配。我们的研究结果凸显了基于深度学习的预测建模在推动可持续和智能能源管理系统发展方面的巨大潜力。

英文摘要

Accurate forecasting of battery health indicators, including remaining capacity and lifetime, is of paramount importance for ensuring the reliability, safety, and operational efficiency of applications such as electric vehicles and large-scale energy storage infrastructures. The forecasts can be used to build an advanced monitoring mechanism that continuously checks battery health status and assists in the efficient real-time management of numerous applications. This research investigates the development and implementation of a Deep Learning (DL) model for the prediction of the future state and performance of industrial electrochemical energy storage systems. To address this challenge, we propose a dedicated computational framework that integrates advanced neural network architectures with large-scale training datasets, enabling precise modeling of battery degradation dynamics and operational trends. The proposed approach provides a decision support mechanism for the optimal management of batteries, facilitating both predictive maintenance and the efficient allocation of energy resources. Our findings highlight the potential of DL-based predictive modeling to significantly contribute to the advancement of sustainable and intelligent energy management systems.