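arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.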

2606.11332 2026-06-11 eess.SP New submission

Learning-Based Phase Estimation for Multi-Frequency Carrier Phase Ranging under Structured Multipath Conditions

Jakub Bonczyk, Jakub Nikonowicz, Łukasz Matuszewski

AI summary: To handle the non-Gaussian, asymmetric phase observations that arise in carrier-phase ranging under multipath, the paper proposes a learning-based estimator that works directly on the empirical phase distribution without a preset statistical model, and reports higher accuracy than classical methods in 3GPP scenarios.

Comments: 13 pages, 9 figures, 4 tables

Abstract

Carrier-phase (CP) ranging is a key enabler of high-precision positioning in modern wireless systems. In multi-frequency OFDM-based sensing, phase observations across subcarriers provide information about the underlying propagation geometry. However, in realistic industrial and urban environments, these observations exhibit non-Gaussian and asymmetric characteristics due to deterministic multipath components, violating standard circular statistical assumptions. In this work, we analyze CP-based ranging as an estimation problem over circular phase observations. We show that conventional model-based estimators, such as circular averaging under von Mises assumptions, become biased under 3GPP-compliant propagation conditions. Using a QuaDRiGa-based simulation framework, we evaluate empirical phase distributions in Industrial Factory (InF) and Urban Microcell (UMi) scenarios and quantify their deviation from classical statistical models. To address these limitations, we propose a learning-based estimator that operates directly on empirical phase distributions without assuming a predefined statistical model. Experimental results show improved accuracy compared to classical estimators, particularly under multipath conditions.
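
The bias claim in this abstract can be illustrated with a small sketch. It is not the authors' estimator or their QuaDRiGa setup: it only computes a circular mean (the angle of the averaged unit phasors) for a unit direct path plus one deterministic multipath component. The amplitude 0.6, the one-sided phase offsets, and the function names are assumptions chosen for illustration.

```python
import cmath


def circular_mean(phases):
    """Circular mean: the angle of the averaged unit phasors."""
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(resultant)


def observed_phase(true_phase, mp_amplitude, mp_offset):
    """Phase of a unit direct path plus one deterministic multipath component."""
    direct = cmath.exp(1j * true_phase)
    reflected = mp_amplitude * cmath.exp(1j * (true_phase + mp_offset))
    return cmath.phase(direct + reflected)


true_phase = 0.5  # radians, hypothetical ground truth
# Hypothetical multipath offsets, all on one side of the direct path (asymmetric).
offsets = [0.2 + 0.1 * k for k in range(12)]

clean = [observed_phase(true_phase, 0.0, off) for off in offsets]
multipath = [observed_phase(true_phase, 0.6, off) for off in offsets]

bias_clean = circular_mean(clean) - true_phase
bias_multipath = circular_mean(multipath) - true_phase
print(f"bias without multipath: {bias_clean:.4f} rad")
print(f"bias with multipath:    {bias_multipath:.4f} rad")
```

Because every reflected component pulls the observed phase in the same direction, the averaged phasor no longer points at the true phase, which is the kind of asymmetry a symmetric von Mises model cannot represent.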

2606.11327 2026-06-11 eess.SY New submission

Shared Renewable Allocation and Hydrogen Flexibility in Local Energy Markets: A Market Design Perspective

Pratik Mochi, Magnus Korpås

AI summary: Proposes a coordinated local electricity-hydrogen market framework in which a mixed-integer linear programming model co-optimizes electricity trading, battery operation, wind allocation, and hydrogen production, and shows that market design rules and renewable access mechanisms critically shape system behaviour and flexibility.

Abstract

The integration of green hydrogen in local energy markets is often analyzed from a technical flexibility perspective, while the effect of market design rules remains less explored. This paper proposes a coordinated local electricity-hydrogen market framework in which hydrogen participation is regulated by explicit renewable access mechanisms. A mixed-integer linear programming model is developed to co-optimize electricity trading, battery operation, wind allocation and hydrogen production under centralized coordination. Six regulatory cases are examined, covering hydrogen supply options and access to local wind. Results are obtained for representative seasonal weeks for a Norwegian energy community. The electrolyzer, when connected as a rigid load, increases grid dependence, but it also improves system cost when price-based participation is activated. Direct renewable access reduces grid imports, enhances wind allocation and introduces competition with households for energy distribution and system cost optimization. Furthermore, findings show that (i) hydrogen integration in local energy systems is essentially a market design problem and (ii) renewable access rules critically determine system behaviour, flexibility interactions and seasonal performance.

2606.11287 2026-06-11 eess.IV cs.CV New submission

Intelligent Skin Cancer Detection Using a Multispectral Metasurface and a Hybrid

Afsane Saee Arezoomand

AI summary: Proposes combining multispectral metasurface imaging with a hybrid CNN-ViT deep learning architecture for skin cancer detection, reporting approximately 98% accuracy, 95% sensitivity, and 99% specificity in simulation-based evaluations.

Comments: 8 pages

Abstract

Skin cancer is among the most prevalent malignancies worldwide, and its early detection is essential for improving patient survival and reducing treatment costs. Conventional dermoscopic and visual imaging techniques are primarily limited to the visible spectrum and often fail to capture subtle spectral signatures associated with early-stage malignancies. This study proposes an innovative framework that integrates a multispectral metasurface for imaging with a hybrid deep learning architecture based on Convolutional Neural Networks and Vision Transformers. The designed metasurface enables noninvasive acquisition of rich spectral information highly sensitive to tissue alterations, while the hybrid CNN-ViT model simultaneously extracts local and global features to robustly classify skin lesions. Simulation-based evaluations demonstrate that the proposed method achieves approximately 98% accuracy, 95% sensitivity, and 99% specificity, surpassing conventional RGB-based and single-architecture approaches. Qualitative analyses using attention maps reveal that the model focuses on clinically relevant lesion regions, improving interpretability. Overall, the results indicate that combining metasurface-based multispectral imaging with hybrid deep learning can introduce a new generation of diagnostic tools in dermatology and pave the way for portable, fast, and highly accurate clinical systems.

2606.11280 2026-06-11 cs.IT eess.SP New submission

Designed-Source Reductions and a Dual-Purpose Feasibility Band for Semantic Rate-Distortion

Joss Armstrong

AI summary: For the designed-source subclass in semantic communication, specializes the Stavrou-Kountouris (SK) framework to conditional-mean decoding and Lloyd-Max stationarity, and derives a feasibility band.

Abstract

The joint rate-distortion framework of Stavrou and Kountouris (IEEE Transactions on Communications 2023) characterises dual-fidelity tradeoffs for semantic communication on stochastic semantic sources. Many task-oriented communication systems instead use designed sources, where the semantic object is a deterministic oracle allocation $\phi^{(t)}$ rather than a stochastic quantity given by nature. We isolate the subclass of designed sources under smooth concave utility with assumptions A1, A2 and Euclidean allocation codomain, and restrict the encoder class to deterministic common-category mappings. Within this subclass the SK exponential-tilting decoder and generalised Blahut--Arimoto iteration specialise to conditional-mean decoding and Lloyd--Max stationarity on $\phi^{(t)}$. When the second fidelity is a monotone single-letter distortion, the joint problem stays inside the SK admissible class; the common-category SK rate is lower-bounded by the max of the corresponding Shannon rate-distortion functions, with equality only when the common-category reconstruction is compatible and RDF-optimal. When the second fidelity is aggregate verification, the joint problem leaves the SK single-letter class and admits a constrained-design feasibility band $R_{\min}(\varepsilon^*) \leq R \leq R_{\max}(\beta^*)$ of width $\log_2(K_{\max}/K_{\min})$ bits in partition cardinality. The reduction and the band are scope statements on the SK apparatus, not modifications to it. A smart-grid economic-dispatch example with a non-technical-loss-detection contrast illustrates the band.

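2606.11279 2026-06-11 eess.AS cs.CL cs.LG cs.SD New submission

Massive Open-Vocabulary Keyword Spotting

Leonor Barreiros, Raul Monteiro, Afonso Mendes, Gonçalo M. Correia

AI summary: Proposes an open-vocabulary keyword spotting system with a smaller memory footprint that handles large-scale databases without fine-tuning and reaches entity recall comparable to the uncompressed approach on unseen languages.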

2606.11226 2026-06-11 math.NA eess.SY New submission

A Scalable Approach for Transient Thermal Modeling of Automotive Power Electronics

Neelakantan Padmanabhan

AI summary: Proposes the LPLSP method, which combines lumped-parameter modeling with linear superposition, for transient thermal simulation of automotive inverter modules; the error is below 5%, and the method supports rapid design iteration and long mission-profile simulation.

Comments: This arXiv version corresponds to the author accepted manuscript published in SAE Technical Papers. The final version of record is available at this https URL

Abstract

Efficient thermal management is critical for the reliability and performance of power electronics systems in automotive applications. This work presents a computationally efficient modeling approach for transient thermal simulation of power electronic systems, with a focus on inverter modules using multiple MOSFETs mounted on a printed circuit board assembly (PCBA). The case study considers an inverter module comprising six MOSFETs, arranged as high-side and low-side pairs for a three-phase system, mounted on a PCBA and attached to a heat sink. Computational fluid dynamics (CFD) simulations in Ansys Icepak are performed considering different heat transfer mechanisms, including natural convection, forced convection at constant velocity, and forced convection with varying flow velocity. A transient thermal model is developed using the Lumped Parameter Linear Superposition (LPLSP) method, a hybrid approach that combines lumped parameter modeling with the principle of linear superposition to capture transient thermal behavior efficiently. Component temperatures from the CFD simulations are compared with temperatures from the LPLSP model and from a Linear Time Invariant (LTI) based reduced order model (ROM) developed for this system. The LPLSP model reproduces a wide range of use cases with an error of less than 5%. This method enables rapid thermal performance evaluation of power electronics systems that have very fast transients in component-level power dissipation and variations in ambient conditions, making it particularly well-suited for early-stage design iterations and long-duration mission profile simulations. The approach offers a practical path to reducing development cycles for automotive power electronics design.
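
The idea of combining lumped parameters with linear superposition can be sketched as follows. This is an illustrative reading, not the paper's LPLSP implementation: it assumes each source-to-component coupling is a single first-order RC element driven by a constant power step, and all resistances, time constants, and powers are made-up values.

```python
import math


def temperature_rise(t, power_w, r_th, tau_s):
    """Step response of one lumped RC element: dT = P * R * (1 - exp(-t / tau))."""
    return power_w * r_th * (1.0 - math.exp(-t / tau_s))


def component_temperature(t, ambient_c, powers_w, r_th_row, tau_row):
    """Linear superposition: sum the rise caused by every heat source."""
    rise = sum(
        temperature_rise(t, p, r, tau)
        for p, r, tau in zip(powers_w, r_th_row, tau_row)
    )
    return ambient_c + rise


# Hypothetical values for one MOSFET heated by itself and two neighbours.
powers_w = [5.0, 3.0, 3.0]   # dissipated power of each source, W
r_th_row = [4.0, 1.0, 0.5]   # self and mutual thermal resistances, K/W
tau_row = [2.0, 6.0, 9.0]    # time constants, s

for t in (0.0, 1.0, 10.0, 1000.0):
    temp = component_temperature(t, 25.0, powers_w, r_th_row, tau_row)
    print(f"t = {t:7.1f} s -> {temp:.2f} degC")
```

Evaluating such closed-form responses costs a few arithmetic operations per time step, which is why this style of model suits long mission profiles better than repeated CFD runs.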

2606.11225 2026-06-11 eess.SY physics.app-ph New submission

Emergent Non-Hermitian Topology in Multi-Robot Network

Jielong Zhang, Guiju Duan, Tinggui Chen, Shengjie Zheng, Bozheng Xue, Baizhan Xia

AI summary: By digitally programming non-reciprocal interaction rules, the authors experimentally realize programmable non-Hermitian topological phases in multi-robot networks, observe topological zero modes and skin effects, and dynamically tune the topological modes.

Abstract

Non-Hermitian (NH) topology has been extensively explored in wave and matter systems, typically relying on the routing of complex, non-reciprocal couplings in physical space. This work demonstrates the experimental realization of programmable NH topological phases within decentralized multi-robot networks. By digitally programming non-reciprocal interaction rules and establishing real-time state exchange among active robots, we observe emergent topological zero modes (TZMs) and NH skin effects in synthetic lattices spanning one to three dimensions. Dynamically tailoring non-reciprocal parameters enables the precise morphing of TZMs between localized and delocalized states, establishing a versatile framework for topological mode engineering across dimensionalities. This platform establishes multi-robot networks as highly reconfigurable systems for exploring non-equilibrium topological physics, while paving the way for topologically protected, robust collective behaviors in active matter.

2606.11197 2026-06-11 eess.AS cs.AI cs.CL cs.SD New submission

MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation

Xuzhi Wang, Xinran Wu, Ziping Zhao, Jianhua Tao, Björn W. Schuller

AI summary: Proposes a memory-augmented feature method that selectively integrates historical temporal features and dynamic memory features and, combined with a hierarchical attention fusion module, achieves state-of-the-art performance on the DAIC-WOZ and E-DAIC datasets.

Comments: Accepted at IEEE TAC

Abstract

Speech-based automatic estimation of depression levels is essential for enabling early detection and timely intervention, particularly in resource-constrained mental health settings. In recent years, deep learning has demonstrated impressive success across various domains, including affective computing and mental health assessment. Most existing approaches rely on RNN-based architectures (such as LSTM and GRU) to model temporal information for depression estimation. However, the extracted features often emphasize only a few adjacent speech segments, limiting their ability to capture long-range dependencies. To overcome this limitation, we introduce a memory-based feature augmentation method that enhances the representational capacity of GRU-extracted features. Rather than indiscriminately incorporating historical data, our memory bank is designed to selectively integrate two types of components in order to reduce redundancy and irrelevance: (1) historical temporal features that closely resemble the current GRU output, offering complementary contextual information; and (2) dynamic memory features identified based on feature variability, which capture behavioral and emotional fluctuations indicative of depressive symptoms. To effectively fuse the memory-augmented features with GRU outputs, we further design a Hierarchical Attention Fusion (HAF) module. Our method is evaluated on the widely used DAIC-WOZ and E-DAIC datasets, achieving state-of-the-art performance.

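2606.10511 2026-06-11 eess.SP Version update

Simplified Temporal Convolutional-Based Channel Estimation for a WiFi Vehicular Communication Channel

Simbarashe Aldrin Ngorima, Albert Helberg, Marelie Davel

AI summary: To address inaccurate channel estimation caused by insufficient pilots in the IEEE 802.11p standard under high mobility, proposes an estimator based on a simplified temporal convolutional network (DPA-TCN); trained on a mixed-SNR dataset, it performs on par with LSTM-DPA-TA while reducing model complexity by about 65%.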

2606.11107 2026-06-11 eess.IV cs.CV cs.LG Version update

Multimodal Brain Tumour Classification Using Feature Fusion

Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber

AI summary: Proposes a two-branch multimodal network that fuses MRI images with 91 radiomic features; with gated fusion, brain tumour classification reaches 96.13% accuracy.

Abstract

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate clinicians' multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. The two streams are fused via concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.
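
The abstract does not spell out the form of the gate, so the following is only one common formulation of gated fusion, written as a sketch. It assumes both branches have already been projected to embeddings of equal length, and the names `gated_fusion`, `weights`, and `bias` are hypothetical, with random stand-in values in place of learned parameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(img_feat, rad_feat, weights, bias):
    """Blend two equal-length embeddings with an element-wise gate.

    gate_i  = sigmoid(w_i . [img_feat; rad_feat] + b_i)
    fused_i = gate_i * img_feat_i + (1 - gate_i) * rad_feat_i
    """
    joint = img_feat + rad_feat  # list concatenation of both streams
    fused = []
    for i, (a, b) in enumerate(zip(img_feat, rad_feat)):
        logit = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        gate = sigmoid(logit)
        fused.append(gate * a + (1.0 - gate) * b)
    return fused


random.seed(0)
dim = 4
img_feat = [random.uniform(-1, 1) for _ in range(dim)]  # stand-in CNN embedding
rad_feat = [random.uniform(-1, 1) for _ in range(dim)]  # stand-in MLP embedding
weights = [[random.gauss(0, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused = gated_fusion(img_feat, rad_feat, weights, bias)
print(fused)
```

Unlike plain concatenation, the gate lets the network decide per feature how much to trust the image stream versus the radiomic stream, so each fused value always lies between the two inputs.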

2606.06940 2026-06-11 eess.AS cs.SD Version update

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models

Zhixian Zhao, Shuiyuan Wang, Wenjie Tian, Jingbin Hu, Ziyu Zhang, Lei Xie

AI summary: Proposes the CogAudio-LLM framework, which builds the LIME-440K dataset for acoustic-semantic decoupling, designs the EIPS chain-of-thought mechanism for psychological reasoning, and uses the DR-SAPO optimization strategy to balance logical rigor with empathetic quality, addressing semantic dominance and shallow affective cognition in audio language models.

Comments: Accepted by Interspeech 2026

Abstract

While Audio Language Models (ALMs) demonstrate strong semantic understanding, they struggle with complex affective interactions. Specifically, textual semantic dominance often overshadows acoustic nuances, and a lack of cognitive depth leads to generic, emotion-agnostic responses. We propose CogAudio-LLM (this https URL), a novel cognitive affective reasoning framework. To mitigate semantic dominance, we build LIME-440K, a "lexically-identical, multi-emotion" dataset designed to facilitate acoustic-semantic decoupling. We introduce EIPS, a 4-step Chain-of-Thought (CoT) mechanism incorporating psychological reasoning. For inference efficiency, multi-stage training explicitly establishes EIPS via supervised fine-tuning, then distills this logic into an implicit generation process. Finally, we design DR-SAPO (Dual-Route Soft Adaptive Policy Optimization) to dynamically balance the logical rigor of the CoT with the empathetic quality of the direct response.

2606.06065 2026-06-11 cs.CL cs.SD eess.AS Version update

Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

Seung Hwan Cho, Young-Min Kim

AI summary: For dual-output second-language speech recognition, the study finds that multi-task learning degrades surface transcription, attributes this to encoder-level representational entanglement, and shows that the effect grows with surface-meaning divergence, especially in English.

Comments: 5 pages, 2 figures, Accepted to the 43rd International Conference on Machine Learning Workshop on Machine Learning for Audio

Abstract

Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.

2606.05394 2026-06-11 cs.SD eess.AS Version update

nnAudio 2: Overcoming Dynamic Compilation Barriers and Transform Inconsistencies

Abhinaba Roy, Junyi Liang, Dorien Herremans

AI summary: To address nnAudio's problems with TorchScript compilation, inverse-transform edge cases, and dependency drift, the update removes dynamic state mutation, restricts the scope of inverse transforms, and updates dependencies, restoring compatibility with modern PyTorch and SciPy and making differentiable audio analysis more robust.

Abstract

nnAudio is an open-source audio feature extraction toolbox for deep learning, but its use in current environments is hindered by TorchScript incompatibilities, inverse-transform edge cases, and dependency drift. We present a targeted modernization for modern PyTorch and scientific Python. We resolve TorchScript compilation failures in STFT and iSTFT by removing dynamic state mutation and module construction from scripted code paths and tightening argument handling in inverse-related helpers. We clarify inverse-STFT behavior by restricting reliable inversion to the uniform-bin setting (freq_scale='no') and raising explicit runtime errors for unsupported frequency scales, preventing silently degraded reconstructions. We restore CFP compatibility with modern SciPy and ensure VQT reduces to CQT when gamma = 0. Regression tests cover the new STFT/iSTFT behaviors, and the updated codebase passes the full repository test suite in a modern Python environment. These improvements provide a more robust foundation for differentiable audio analysis in research and deployment.
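
Two behaviors named in this abstract can be sketched in isolation. The code below is not nnAudio's source and does not use its API: `check_invertible` and `vqt_bandwidth` are hypothetical helpers, `'log'` is just an example of a non-uniform scale, and the bandwidth form B_k = alpha * f_k + gamma is assumed as the usual variable-Q definition.

```python
SUPPORTED_INVERSE_SCALES = {"no"}  # uniform-bin setting named in the abstract


def check_invertible(freq_scale):
    """Raise instead of silently returning a degraded reconstruction."""
    if freq_scale not in SUPPORTED_INVERSE_SCALES:
        raise RuntimeError(
            f"inverse STFT is not supported for freq_scale={freq_scale!r}; "
            "use freq_scale='no'"
        )


def vqt_bandwidth(center_freq_hz, alpha, gamma):
    """Assumed variable-Q bandwidth B_k = alpha * f_k + gamma."""
    return alpha * center_freq_hz + gamma


check_invertible("no")  # passes silently
try:
    check_invertible("log")  # example of a non-uniform scale
except RuntimeError as exc:
    print(exc)

# With gamma = 0 the quality factor f_k / B_k is the same for every bin,
# which is the constant-Q (CQT) case.
for f in (110.0, 440.0, 1760.0):
    print(f, f / vqt_bandwidth(f, alpha=0.06, gamma=0.0))
```

Failing loudly at the call site is the design choice the abstract describes: a caller learns immediately that the requested inversion is unsupported instead of receiving a plausible-looking but wrong waveform.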

2606.02220 2026-06-11 eess.AS Version update

SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment

SooHwan Eom, Mark Hasegawa-Johnson, Chang D. Yoo

AI summary: Proposes the SiamCTC framework, which combines Siamese networks with the Connectionist Temporal Classification (CTC) loss to align different temporal realizations flexibly and monotonically, learning speech representations without strict frame-level correspondence and improving robustness to speaking-rate variation.

Comments: Accepted to Interspeech 2025

Abstract

Self-supervised speech representation learning has made significant progress through Siamese networks, which leverage different views of the same input. However, existing methods often require frame-wise alignment between these views, overlooking the broader linguistic context invariance across different speaking styles. We introduce SiamCTC, a framework that integrates Siamese networks with Connectionist Temporal Classification (CTC) to learn speech representations without strict frame-level correspondence. By employing CTC loss to establish flexible, monotonic alignments between differing temporal realizations of the same content, SiamCTC accommodates speed perturbations and other temporal augmentations. This design relaxes frame-wise constraints while preserving temporal coherence and enhancing robustness to speaking-rate variations in downstream tasks. Our experiments demonstrate that SiamCTC leads to more adaptable speech representations, particularly at diverse speaking rates.

2605.27303 2026-06-11 eess.SP Version update

Point Spread Function Optimization for Communication-assisted UAV-borne MIMO TomoSAR

Pouya Fakharizadeh, Mohamed-Amine Lahmeri, Gerhard Krieger, Robert Schober

AI summary: For UAV-borne MIMO SAR tomography systems, proposes a particle-swarm-optimization-based method that jointly optimizes the UAV formation and the offloading power allocation to minimize point spread function sidelobe levels.

Abstract

This paper tackles the optimization of the point spread function (PSF) of unmanned aerial vehicle (UAV)-borne multiple-input multiple-output (MIMO) synthetic aperture radar (SAR) tomography systems. A swarm of UAV-borne SAR systems is deployed to image an area to obtain its height profile. To achieve a high-quality three-dimensional (3D) image of the scene, the PSF has to exhibit low sidelobes. The heavy computations, required for image generation, are performed on the ground. To this end, the sensor data collected by the UAV-SARs is offloaded in real time via a frequency division multiple access (FDMA) air-to-ground backhaul link. In this work, the UAV formation and the power allocated for offloading are jointly optimized for the minimization of the PSF sidelobe levels. To this end, we propose a novel solution based on the particle swarm optimization (PSO) algorithm, which meets practical sensing and communication constraints. Our simulation results demonstrate that the proposed solution can significantly improve sidelobe suppression compared to several benchmark schemes.

2605.23770 2026-06-11 eess.SY astro-ph.EP math.OC physics.space-ph Version update

Reachability for Low-Thrust Trajectories via Maximum Initial Mass

Giacomo Acciarini, Dario Izzo, Zhong Zhang

AI summary: Proposes a dual reachability formulation that turns reachability assessment into a scalar optimization problem by maximizing the initial mass (or solar-sail strength), and builds efficient surrogate models with residual networks.

Comments: Presented at the 30th International Symposium on Space Flight Dynamics, 1-5 June 2026, Toulouse, France

Abstract

Reachability analysis plays a central role in low-thrust spacecraft trajectory optimization by identifying which target states can be achieved under constraints on time, thrust, and propellant. Classical approaches construct reachable sets by solving many optimal control problems over grids of terminal states, requiring extensive forward simulations with fixed initial conditions. While effective, this approach is computationally expensive and becomes impractical for high-dimensional systems or strongly nonlinear dynamics, such as those encountered in cislunar environments or solar sail missions. This work introduces a dual formulation of the reachability problem. Instead of computing reachable sets directly, we determine, for fixed transfer time and boundary conditions, the maximum allowable initial mass (or, for solar sails, a scalar sail-strength parameter) that permits a successful transfer. A target is reachable if the spacecraft's initial mass does not exceed this threshold. This reformulation reduces reachability assessment to a scalar optimization problem for each target, producing a smooth scalar field that encodes equivalent feasibility information to classical reachable sets. We develop indirect maximum-initial-mass (MIM) formulations for both electric low-thrust and solar-sail dynamics and show how they can serve as efficient reachability oracles. Building on this formulation, we construct data-driven surrogate models to approximate the MIM-based reachability indicator. We investigate fully connected neural networks and demonstrate that residual networks provide the best trade-off between accuracy, training stability, and model complexity. The resulting surrogates enable rapid reachability evaluation while preserving the numerical advantages of the dual formulation, offering a practical tool for preliminary mission design and feasibility assessment.
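
The dual test described above (a target is reachable if the initial mass does not exceed a threshold) can be shown with a deliberately simplified stand-in for the threshold. The sketch below is not the paper's indirect MIM formulation: it assumes a field-free transfer with continuous constant thrust, so the threshold follows in closed form from the rocket equation, and the delta-v, thrust, specific impulse, and transfer time are hypothetical.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def max_initial_mass(delta_v, thrust_n, isp_s, transfer_time_s):
    """Toy MIM oracle: largest m0 for which continuous thrust over the
    transfer time delivers at least delta_v (field-free rocket equation)."""
    exhaust_velocity = isp_s * G0
    propellant = thrust_n / exhaust_velocity * transfer_time_s  # mass burned, kg
    ratio = math.exp(delta_v / exhaust_velocity)  # required m0 / m_final
    return propellant * ratio / (ratio - 1.0)


def is_reachable(initial_mass, mim):
    """Dual test from the abstract: reachable iff m0 does not exceed the threshold."""
    return initial_mass <= mim


# Hypothetical transfer: 2 km/s in 100 days with 0.2 N thrust at 3000 s Isp.
mim = max_initial_mass(2000.0, 0.2, 3000.0, 100 * 86400.0)
print(f"maximum initial mass: {mim:.1f} kg")
print("500 kg spacecraft reachable: ", is_reachable(500.0, mim))
print("1500 kg spacecraft reachable:", is_reachable(1500.0, mim))
```

One scalar per target replaces a full reachable-set computation, and a surrogate model only has to regress that scalar.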

2605.19031 2026-06-11 cs.AI eess.SP Version update

KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

Mengxi Liu, Sizhen Bian, Vitor Fortes, Francisco Calatrava Nicolas, Daniel Geißler, Maximilian Kiefer-Emmanouilidis, Bo Zhou, Paul Lukowicz

AI summary: Investigates KANs for improving IMU-based human activity recognition (HAR) models and proposes a hybrid architecture that combines the precision of KANs with the robustness and efficiency of MLPs; across eight public datasets, the hybrid model achieves an average relative macro F1 improvement of 5.33% over a pure-MLP model.

Comments: 23 pages, and 9 figures

Abstract

Kolmogorov-Arnold Networks (KANs) have demonstrated an exceptional ability to learn complex functions on clean, low-dimensional data but struggle to maintain performance on noisy and imperfect real-world datasets. In contrast, conventional multi-layer perceptrons (MLPs) are far more tolerant to noise and computationally efficient. Replacing all MLP components with KANs in HAR models often degrades accuracy and computational efficiency, highlighting an open challenge: how to combine KANs' precision with MLPs' noise robustness and efficiency. To address this, we systematically explore various placements of KAN modules within deep HAR networks and propose a hybrid architecture that combines the strengths of both paradigms: it uses a KAN-based input embedding layer, retains MLP layers for intermediate feature mixing, and introduces a specialized LarctanKAN module for final activity classification. Across eight public HAR datasets, the hybrid KAN-MLP model achieves an average relative macro F1 improvement of 5.33% compared with the pure-MLP model, significantly outperforming standalone KAN and MLP baselines. Furthermore, integrating this hybrid strategy into other state-of-the-art HAR architectures consistently boosts their performance. Our findings demonstrate that a carefully orchestrated combination of KAN, MLP, or other conventional neural components yields more robust and accurate HAR models for real-world wearable sensing environments.

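2605.15161 2026-06-11 eess.SY math.DS updated version

On the Nonexistence of Continuous Immersions for Discrete-time Systems

Eron Ristich, Eduardo Sontag, Necmiye Ozay

AI summary: This paper studies the nonexistence of continuous immersions for discrete-time systems, extending the continuous-time results of Liu et al. (2023) and considering a generalization of alpha-limit sets.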

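2605.06100 2026-06-11 eess.SP cs.AI cs.LG cs.RO updated version

CredibleDFGO: Differentiable Factor Graph Optimization with Credibility Supervision

Liang Qian, Penggao Yan, Penghui Xu, Li-Ta Hsu

AI summary: To address unreliable GNSS covariance, the paper proposes the CredibleDFGO framework, which combines a differentiable Gauss-Newton solver with a Weighting Generation Network and supervises the predictive distribution with proper scoring rules, improving covariance credibility and positioning accuracy.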

Comments: Submitted to NAVIGATION: Journal of the Institute of Navigation

Abstract

Global navigation satellite system (GNSS) positioning is widely used for urban navigation, but the covariance reported by the GNSS solver is often unreliable in urban canyons. Existing differentiable factor graph optimization (DFGO) methods learn measurement weighting through the solver, but they still use position-only objectives. As a result, the position estimate may improve while the reported covariance remains too small, too large, or incorrectly oriented. We propose CredibleDFGO (CDFGO), a differentiable GNSS factor graph framework that makes covariance credibility an explicit training target. A Weighting Generation Network (WGN) predicts per-satellite reliability weights, and a differentiable Gauss-Newton solver maps these weights to a position estimate and a Hessian-derived posterior covariance. We use proper scoring rules to supervise the East-North predictive distribution end to end. We study negative log-likelihood (NLL), the energy score (ES), and their combination. Results on three UrbanNav test scenes show consistent gains in covariance credibility. Positioning accuracy also improves on the medium-urban and harsh-urban scenes; on the deep-urban scene, both the mean horizontal error and the 95th-percentile error improve. On the harsh-urban Mong Kok (MK) scene, CDFGO-Combined reduces the mean horizontal error from 13.77 m to 11.68 m, reduces NLL from 40.63 to 6.59, and reduces ES from 12.31 to 9.05 relative to DFGO (MAE). Case studies link the MK improvement to better axis-wise consistency, more credible local covariance ellipses, and satellite-level reweighting.
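
The two scoring rules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 2D Gaussian East-North predictive distribution with mean `mu` and covariance `cov`, and the function names (`gaussian_nll_2d`, `sample_gaussian_2d`, `energy_score_mc`) and the Monte Carlo estimate of the energy score are illustrative assumptions.

```python
import math
import random


def gaussian_nll_2d(y, mu, cov):
    """Negative log-likelihood of point y under a 2D Gaussian N(mu, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return 0.5 * (maha + math.log(det)) + math.log(2.0 * math.pi)


def sample_gaussian_2d(mu, cov, n, rng):
    """Draw n samples from N(mu, cov) via a 2x2 Cholesky factor (cov symmetric)."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return samples


def energy_score_mc(y, samples):
    """Monte Carlo energy score: E||X - y|| - 0.5 * E||X - X'|| (lower is better)."""
    n = len(samples)
    to_obs = sum(math.hypot(s[0] - y[0], s[1] - y[1]) for s in samples) / n
    spread = sum(
        math.hypot(s[0] - t[0], s[1] - t[1]) for s in samples for t in samples
    ) / (n * n)
    return to_obs - 0.5 * spread
```

Under these assumptions, a covariance that is too small for the actual error is penalized heavily by the NLL, which is the kind of overconfidence the abstract describes; the energy score instead compares the sample cloud to the observed position without requiring a density.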

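2603.25979 2026-06-11 cs.GT eess.SY updated version

Move Over, Prisoner's Dilemma: Colonel Blotto has arrived

Keith Paarporn, Jason R. Marden

AI summary: This article introduces the Colonel Blotto game framework, surveys key analytical and computational results, and shows its applications in cybersecurity, network defense, and multi-agent systems, with a focus on three research directions: interdependent contest objectives, alternate winning rules, and multi-agent competitive environments.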

Abstract

The Prisoner's Dilemma, zero-sum games, LQR team problems, and differential games have shaped game theory in controls for decades, but the field's most pressing adversarial challenges demand a richer framework, and its name is Colonel Blotto. Strategic adversarial constraints represent a fundamental consideration in control systems, from cybersecurity defense to infrastructure protection. Colonel Blotto games, despite their direct relevance to such applications, remain underutilized in the controls community relative to other game-theoretic approaches. This article aims to close that gap for the controls community. Indeed, theoretical advances within the last two decades have spurred a resurgence of interest and enabled their applications across several domains. In this article, we introduce the Colonel Blotto framework, survey key analytical and computational results, and demonstrate how problems spanning cybersecurity, network defense, and multi-agent systems fit naturally within this structure. Three research directions are examined in depth: interdependent contest objectives that capture networked vulnerabilities, alternate winning rules that model partial rewards and structural asymmetries, and multi-agent competitive environments involving coalition formation and strategic concessions. Taken together, these directions reveal a framework that is both practically deployable and rich enough to capture the strategic complexity inherent in adversarial resource allocation.
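
For readers new to the model, a minimal sketch of the classic two-player Colonel Blotto payoff rule follows. It assumes the textbook winner-take-all rule with ties split evenly; the function name `blotto_payoffs` and the example numbers are illustrative and not taken from the article.

```python
def blotto_payoffs(alloc_a, alloc_b, values):
    """Return (payoff_a, payoff_b) under the winner-take-all rule.

    alloc_a, alloc_b: resources each player commits to every battlefield.
    values: the value of winning each battlefield.
    """
    if not (len(alloc_a) == len(alloc_b) == len(values)):
        raise ValueError("all inputs must cover the same battlefields")
    payoff_a = payoff_b = 0.0
    for a, b, v in zip(alloc_a, alloc_b, values):
        if a > b:
            payoff_a += v
        elif b > a:
            payoff_b += v
        else:
            # Tie: split the battlefield value evenly (an assumed convention).
            payoff_a += v / 2
            payoff_b += v / 2
    return payoff_a, payoff_b


# Player A has a budget of 6 and player B a budget of 4 over three battlefields.
print(blotto_payoffs([3, 3, 0], [1, 1, 2], [1.0, 1.0, 1.0]))  # (2.0, 1.0)
```

The alternate winning rules mentioned in the abstract would replace the comparison inside the loop, for example with a rule that awards partial value in proportion to the resources committed.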

2603.23372 2026-06-11 eess.SY updated version

WAKE-NET: A 3D-Wake-Aware Economic Turbine Layout and Cabling Optimization Framework for Multi-Capacity Multi-Hub-Height Wind Farms Serving Grid-Scale and Industrial Power Systems

Ann Mary Toms, Xingpeng Li

AI summary: Proposes the WAKE-NET framework, which integrates turbine layout, capacity selection, cable routing, and hub height diversification, and uses wake-aware optimization to make wind farm energy yield prediction and economic assessment more reliable.

Abstract

The global transition towards renewable energy has accelerated the deployment of utility-scale wind farms, increasing the need for accurate performance and economic assessments. Although wind energy offers substantial potential for carbon emission reduction, investment decisions are highly sensitive to predicted annual energy production and economic profitability. Conventional wind farm analyses often estimate turbine power output based solely on incoming wind conditions, neglecting wake interactions between turbines. These wake effects can significantly reduce downstream turbine performance, leading to overestimation of energy yield and financial returns. This study proposes WAKE-NET, a 3D wake-aware optimization framework that integrates turbine layout optimization, turbine capacity selection, cable routing, and hub height diversification within a unified profit-driven formulation. Unlike traditional approaches that assume uniform hub heights and turbine capacities or ignore wake dynamics, the proposed framework accounts for wake-induced power losses during optimization. A benchmark wake-ignorant model is also evaluated to quantify the impact of neglecting wake interactions. Results indicate that the wake-ignorant optimization can significantly overestimate annual profits, while the use of multiple hub heights and capacities reduces wake overlap and improves spatial utilization. Overall, the findings demonstrate that wake-aware optimization coupled with hub height and capacity diversification provides more reliable energy yield prediction and economic assessment, offering valuable guidance for large-scale wind farm planning and investment.

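2511.01747 2026-06-11 eess.SP updated version

AnyPPG: An ECG-Guided PPG Foundation Model Trained on Over 100,000 Hours of Recordings for Holistic Health Profiling

Guangkun Nie, Xiaocheng Fang, Gongzheng Tang, Yujie Xiao, Jun Li, Bo Liu, Hongyan Li, Shenda Hong

AI summary: Proposes AnyPPG, a photoplethysmography (PPG) foundation model pretrained with ECG guidance on over 100,000 hours of data. It conducts the first phenome-wide study covering 1,468 disease phenotypes and shows that PPG can go beyond traditional cardiovascular applications, achieving meaningful discrimination for 307 phenotypes (including 230 non-circulatory conditions).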

Abstract

Photoplethysmography (PPG) is widely used as a non-invasive and accessible modality for continuous health monitoring. However, despite being a peripheral hemodynamic signal intrinsically coupled with systemic circulation, existing research has largely confined its scope to a narrow range of cardiovascular tasks, leaving a fundamental question underexplored: to what extent can PPG support holistic health profiling beyond traditional cardiovascular applications? To answer this question, we present AnyPPG, a foundation model-based framework designed to reveal the broader health-profiling potential of PPG. To ensure reliable performance for this investigation, AnyPPG is pretrained with ECG guidance on the most diverse PPG corpus with synchronized ECG to date, comprising over 100,000 hours of recordings from six large-scale data sources. This pretraining yields robust and physiologically grounded PPG representations that provide a reliable basis for subsequent analysis. Building upon this pretrained model, we conduct a systematic investigation into the association between PPG and holistic health through, to our knowledge, the first PPG-based phenome-wide disease detection study, spanning 1,468 disease phenotypes in more than 15,000 subjects. Our evaluation demonstrates the effectiveness of AnyPPG: across eight clinical and wearable datasets covering 15 downstream tasks, it achieves the best performance in 13 tasks. More importantly, in the phenome-wide analysis, AnyPPG exhibits meaningful discriminative capability (AUC $\ge$ 0.70) for 307 phenotypes across 16 distinct phecode chapters, including 230 non-circulatory conditions such as dementia and chronic kidney disease, many of which have rarely been explored using PPG. Collectively, these findings indicate that easily acquired PPG signals encode rich health-related information extending well beyond conventional cardiovascular assessment.

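2603.14762 2026-06-11 math.OC cs.LG eess.SY updated version

Online Learning for Supervisory Switching Control

Haoyuan Sun, Ali Jadbabaie

AI summary: Studies online learning for supervisory switching control of partially observed linear dynamical systems, proposing a non-asymptotic analysis that adapts multi-armed bandit algorithms to identify a stabilizing controller and perform system identification.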

Abstract

We study supervisory switching control for partially observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible with a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to a control-theoretic setting. The proposed data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of state history, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, where each identifies the matching controller in $O(N \log^2 N)$ steps, while simultaneously achieving finite $L_2$-gain with respect to system disturbances.

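2603.11678 2026-06-11 eess.AS cs.SD updated version

RAF: Relativistic Adversarial Feedback For Universal Speech Synthesis

Yongjoon Lee, Jung-Woo Choi

AI summary: Proposes the Relativistic Adversarial Feedback (RAF) training objective, which uses self-supervised speech models and relativistic pairing to improve the in-domain fidelity and generalization of GAN vocoders, outperforming the LSGAN-trained BigVGAN with 88% fewer parameters.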

Comments: Accepted to Interspeech 2026 Long paper track. Code: this https URL

Abstract

We propose Relativistic Adversarial Feedback (RAF), a novel training objective for GAN vocoders that improves in-domain fidelity and generalization to unseen scenarios. Although modern GAN vocoders employ advanced architectures, their training objectives often fail to promote generalizable representations. RAF addresses this problem by leveraging speech self-supervised learning models to assist discriminators in evaluating sample quality, encouraging the generator to learn richer representations. Furthermore, we utilize relativistic pairing for real and fake waveforms to improve the modeling of the training data distribution. Experiments across multiple datasets show consistent gains in both objective and subjective metrics on GAN-based vocoders. Importantly, the RAF-trained BigVGAN-base outperforms the LSGAN-trained BigVGAN in perceptual quality using only 12% of the parameters. Comparative studies further confirm the effectiveness of RAF as a training framework for GAN vocoders.
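
The idea of pairing real and fake samples relativistically can be sketched with the standard relativistic GAN loss from the wider GAN literature. The abstract does not give RAF's exact objective, so this is an assumed, generic form rather than the paper's loss; `relativistic_losses` and its inputs (raw discriminator scores for paired real and generated waveforms) are hypothetical names.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def relativistic_losses(real_scores, fake_scores):
    """Mean discriminator and generator losses over paired (real, fake) scores.

    The discriminator is rewarded when a real sample scores higher than its
    paired fake sample; the generator is rewarded for the reverse.
    """
    pairs = list(zip(real_scores, fake_scores))
    if not pairs:
        raise ValueError("need at least one (real, fake) pair")
    # -log(sigmoid(d)) equals softplus(-d).
    d_loss = sum(softplus(-(r - f)) for r, f in pairs) / len(pairs)
    g_loss = sum(softplus(-(f - r)) for r, f in pairs) / len(pairs)
    return d_loss, g_loss
```

Unlike a least-squares (LSGAN) objective, which scores each sample against a fixed target, this form depends only on the score difference within each pair.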

2509.15680 2026-06-11 cs.SD eess.AS updated version

SAM: A Mamba-2 State-Space Audio-Language Model

Taehan Lee, Jaehan Jung, Hyukjun Lee

AI summary: Proposes SAM, an audio-language model built on a Mamba-2 backbone that matches or surpasses 7B transformer-based models on AudioSet and AudioCaps with fewer parameters, and systematically analyzes how SSMs interact with audio encoder outputs.

Comments: 6 pages, Accepted to Interspeech 2026

Abstract

We present SAM, a State-space Audio-language Model that integrates an audio encoder with a Mamba-2 backbone. SAM-2.7B achieves 21.1 mAP on AudioSet and 17.6 SPICE on AudioCaps, matching or surpassing larger 7B transformer-based models with fewer parameters. We further provide the first systematic, representation-level analysis of how SSMs interact with audio encoder outputs: (1) joint audio encoder finetuning is essential, supported by accuracy gains and observed adaptation of token representation rank and similarity across different SSM sizes; (2) despite linear scaling, SSMs benefit more from compact, information-rich audio token representations than from excessively long token sequences; and (3) incorporating instruction-following supervision substantially improves reasoning ability, boosting MMAU-Sound accuracy from 22.8 to 56.8. Through comprehensive experiments and analysis, we establish practical design principles for SSMs as strong, scalable backbones for audio-language models.

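2602.22964 2026-06-11 eess.SP updated version

A guided residual search for nonlinear state-space identification

Merijn Floren, Jan Swevers

AI summary: To address the nonconvex optimization problem in parameter identification of nonlinear state-space models, the paper proposes a method combining a guided residual search with multi-step optimization, improving convergence reliability and efficiency.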

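2602.14913 2026-06-11 cs.LG eess.IV updated version

Coverage Guarantees for Pseudo-Calibrated Conformal Prediction under Distribution Shift

Farbod Siahkali, Ashwin Verma, Vijay Gupta

AI summary: To address the failure of conformal prediction coverage under distribution shift, the paper uses pseudo-calibration and tools from domain adaptation to derive a lower bound on target coverage, proposes inflating the conformal threshold by a slack parameter along with a source-tuned pseudo-calibration algorithm, and shows experimentally that these mitigate coverage degradation.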

Comments: Under review. 6 pages, 2 figures, 1 table

Abstract

Conformal prediction (CP) offers distribution-free marginal coverage guarantees under an exchangeability assumption, but these guarantees can fail if the data distribution shifts. We analyze the use of pseudo-calibration as a tool to counter this performance loss under a bounded label-conditional covariate shift model. Using tools from domain adaptation, we derive a lower bound on target coverage in terms of the source-domain loss of the classifier and a Wasserstein measure of the shift. Using this result, we provide a method to design pseudo-calibrated sets that inflate the conformal threshold by a slack parameter to keep target coverage above a prescribed level. Finally, we propose a source-tuned pseudo-calibration algorithm that interpolates between hard pseudo-labels and randomized labels as a function of classifier uncertainty. Numerical experiments show that our bounds qualitatively track pseudo-calibration behavior and that the source-tuned scheme mitigates coverage degradation under distribution shift while maintaining nontrivial prediction set sizes.
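
The threshold-inflation step can be illustrated with a minimal split-conformal sketch. The abstract does not say how the slack enters or how it is derived from the Wasserstein bound, so the additive `slack` argument below is an assumption, as are the function names `conformal_threshold` and `prediction_set`.

```python
import math


def conformal_threshold(cal_scores, alpha, slack=0.0):
    """Split-conformal score threshold, inflated by an additive slack.

    cal_scores: nonconformity scores on the calibration set.
    alpha: target miscoverage level (0.1 means 90% coverage).
    slack: assumed additive inflation; its derivation is not reproduced here.
    """
    n = len(cal_scores)
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is included
    return sorted(cal_scores)[k - 1] + slack


def prediction_set(label_scores, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, score in label_scores.items() if score <= threshold]
```

A larger slack admits more labels into each prediction set, which is the trade-off between restored coverage and set size that the abstract's experiments examine.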

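2602.09144 2026-06-11 eess.SY updated version

Shaping Energy Exchange with Gyroscopic Interconnections: a Geometric Approach

Jasper Juchem, Mia Loccufier

AI summary: This paper uses a geometric approach to study energy exchange in conservative systems with constant skew-symmetric velocity coupling. It introduces an inscribed-radius metric to quantify subsystem performance, develops a computation method that needs no time-domain simulation, and shows that low-order resonances restrict energy depletion through phase-locking while high-order resonances recover conservative bounds, yielding an interconnection design framework for energy absorption and containment control.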

Comments: Conference paper submitted to the 10th IEEE Conference on Control Technology and Applications (CCTA) 2026 in Vancouver; currently under review

Abstract

Gyroscopic interconnections enable redistribution of energy among degrees of freedom while preserving passivity and total energy, and they play a central role in controlled Lagrangian methods and IDA-PBC. Yet their quantitative effect on transient energy exchange and subsystem performance is not well characterised. We study a conservative mechanical system with constant skew-symmetric velocity coupling. Its dynamics are integrable and evolve on invariant two-tori, whose projections onto subsystem phase planes provide a geometric description of energy exchange. When the ratio of normal-mode frequencies is rational, these projections become closed resonant Lissajous curves, enabling structured analysis of subsystem trajectories. To quantify subsystem behaviour, we introduce the inscribed-radius metric: the radius of the largest origin-centred circle contained in a projected trajectory. This gives a lower bound on attainable subsystem energy and acts as an internal performance measure. We derive resonance conditions and develop an efficient method to compute or certify the inscribed radius without time-domain simulation. Our results show that low-order resonances can strongly restrict energy depletion through phase-locking, whereas high-order resonances recover conservative bounds. These insights lead to an explicit interconnection-shaping design framework for both energy absorption and containment control strategies, while taking responsiveness into account.
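
The inscribed-radius definition can be made concrete with a brute-force sketch that samples a closed curve and takes its smallest distance from the origin. This is only a numerical baseline under the assumption that the curve encircles the origin; the paper's own method avoids this kind of sampling, and the names `inscribed_radius` and `lissajous`, along with the parametrisation, are illustrative.

```python
import math


def inscribed_radius(curve, n_samples=20000, period=2.0 * math.pi):
    """Smallest distance from the origin to a sampled closed curve.

    For a closed curve that encircles the origin, this approximates the radius
    of the largest origin-centred circle that fits inside it.
    """
    best = math.inf
    for i in range(n_samples):
        t = period * i / n_samples
        x, y = curve(t)
        best = min(best, math.hypot(x, y))
    return best


def lissajous(p, q, phase, amp_x=1.0, amp_y=1.0):
    """Closed Lissajous curve with integer frequency ratio p:q."""
    return lambda t: (amp_x * math.sin(p * t + phase), amp_y * math.sin(q * t))


# A 1:1 resonance with a quarter-period phase offset traces an ellipse.
ellipse = lissajous(1, 1, math.pi / 2, amp_x=2.0, amp_y=1.0)
print(inscribed_radius(ellipse))  # approximately 1.0, the semi-minor axis
```

With zero phase offset the same 1:1 curve collapses to a line through the origin and the sampled radius drops to zero, which shows how strongly the metric depends on the phase relationship between the modes.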

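2602.02229 2026-06-11 cs.LG eess.SP updated version

Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts

Guangyi Zhang, Yunlong Cai, Guanding Yu, Osvaldo Simeone

AI summary: Proposes prediction-powered risk monitoring (PPRM), a semi-supervised method based on prediction-powered inference that combines synthetic labels with a small number of true labels to construct anytime-valid lower bounds on the running risk, enabling detection of harmful shifts; it is validated on image classification, large language model, and telecommunications monitoring tasks.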

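2602.00560 2026-06-11 cs.SD eess.AS updated version

Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards

Yong Ren, Jiangyan Yi, Jianhua Tao, Tao Wang, Le Xu, Zhengqi Wen

AI summary: Proposes a framework that edits content in a stable semantic space and preserves acoustic continuity with a Flow Matching decoder, and uses Self-Consistency Rewards Group Relative Policy Optimization to achieve imperceptible text-based speech editing.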

Comments: Accepted by Interspeech 2026

Abstract

Imperceptible text-based speech editing modifies spoken content through transcript manipulation while preserving acoustic continuity. Prior acoustic-space approaches suffer from content-style entanglement, causing unstable generation and boundary artifacts. We introduce a framework guided by the principle of "Edit Content, Preserve Acoustics". Editing is conducted in a stable semantic space, while acoustic realization is handled by a Flow Matching decoder. To ensure perceptual consistency, we propose Self-Consistency Rewards Group Relative Policy Optimization, which leverages a pre-trained Text-to-Speech model as an implicit critic, together with intelligibility and duration constraints. Experiments demonstrate consistent improvements over state-of-the-art autoregressive and non-autoregressive baselines in intelligibility, robustness, and perceptual quality.
