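arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.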

2606.12867 2026-06-17 cs.LG New submission

SMGFM: Spectral Multimodal Graph Pretraining for Multimodal-Attributed Graphs

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

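AI summary: Proposes SMGFM, a framework that uses graph spectral decomposition to separate structure-induced semantics from modality-specific semantics and fuses modalities through band-level routing, achieving state-of-the-art performance on graph-level and modality-level tasks.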

Abstract

Multimodal-attributed graphs (MAGs) couple graph topology with node semantics from text, images, and other modalities. Traditional graph learning contextualizes node semantics by coupling topology with node features. However, this coupling design becomes troublesome in MAGs, where structure-induced and modality-intrinsic semantics may contribute differently to downstream tasks. Structure-induced semantics promote relational consistency through smooth topological variation, whereas modality-intrinsic semantics often encode local, fine-grained distinctions that should not be uniformly smoothed or aligned. Therefore, the key challenge is to identify semantic roles before cross-modal fusion. To this end, we leverage graph-frequency variation as a prior, where low-frequency components capture topology-consistent semantics and high-frequency components preserve modality-specific semantics. Based on this intuition, we propose SMGFM, a spectral multimodal graph pretraining framework that decomposes each modality-specific node signal into graph-frequency bands and assigns band-level semantic roles before cross-modal interaction. Concretely, SMGFM constructs frequency-resolved modality tokens with scalable Chebyshev filters, estimates their coupling reliability through topology-conditioned routing, and performs band-modality interaction before fusion. Its frequency-routed objectives align smooth consensus routes while preserving modality-specific routes, mitigating spatial-domain entanglement and uniform cross-modal alignment. Extensive experiments conducted on the MAG datasets demonstrate that SMGFM achieves state-of-the-art performance across graph-level and modality-level tasks.
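The band decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a toy 4-node ring graph, a normalized Laplacian with largest eigenvalue 2, and two hand-picked first-order Chebyshev filters; the function names, graph, signal, and coefficients are all hypothetical and are not taken from the paper, which does not state its filter design here.

```python
# Minimal sketch: split a node signal into low/high graph-frequency bands
# with Chebyshev polynomial filters. All values are illustrative assumptions.

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; assumes every node has at least one edge."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(1.0 if i == j else 0.0) - adj[i][j] / (deg[i] * deg[j]) ** 0.5
             for j in range(n)] for i in range(n)]

def chebyshev_filter(lap, signal, coeffs):
    """Compute sum_k coeffs[k] * T_k(L - I) applied to signal.

    L - I maps the spectrum from [0, 2] to [-1, 1] (lambda_max = 2 assumed),
    and the three-term recurrence avoids any eigendecomposition.
    """
    n = len(signal)
    shifted = [[lap[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    t_prev = list(signal)                       # T_0 x = x
    out = [coeffs[0] * v for v in t_prev]
    if len(coeffs) == 1:
        return out
    t_curr = matvec(shifted, signal)            # T_1 x = (L - I) x
    out = [o + coeffs[1] * v for o, v in zip(out, t_curr)]
    for c in coeffs[2:]:
        t_next = [2.0 * a - b for a, b in zip(matvec(shifted, t_curr), t_prev)]
        out = [o + c * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out

ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
lap = normalized_laplacian(ring)
signal = [3.0, 1.0, 3.0, 1.0]      # smooth part (2) plus oscillating part (+1/-1)

LOW_PASS = [0.5, -0.5]             # h(lambda) = 1 - lambda / 2
HIGH_PASS = [0.5, 0.5]             # h(lambda) = lambda / 2
low_band = chebyshev_filter(lap, signal, LOW_PASS)
high_band = chebyshev_filter(lap, signal, HIGH_PASS)
```

On this toy graph the low band keeps the constant component and the high band keeps the alternating component, and the two bands sum back to the original signal because the two filter responses add up to one.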

2606.18074 2026-06-17 stat.ML cs.LG stat.ME New submission

Tensor-based second-order causal discovery

Nathan Ouyang, Kexin Wan, Anna Seigal

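AI summary: Proposes the TSCD algorithm, which uses a tensor of covariance matrices from observational and interventional data to identify a directed acyclic graph and its edge functions under a linear structural equation model, requiring only uncorrelated noise; the method extends to nonlinear models and is identifiable from a number of interventions logarithmic in the number of variables.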

Comments 27 pages, 7 figures. Code available at https://github.com/QWE123665/Tensor-based-Second-order-Causal-Discovery

Abstract

Causal discovery seeks to uncover the causal dependencies among variables. For this purpose, we propose an algorithm called Tensor-based Second-order Causal Discovery (TSCD). Its input is a tensor obtained from the covariance matrices of observational and interventional data. Assuming the causal dependencies follow a linear structural equation model on a directed acyclic graph (DAG), TSCD outputs the DAG and the functions on its edges, requiring only that the noise variables are uncorrelated. We also implement a version of the approach for nonlinear models. Our focus on second-order statistics (via the covariance matrices) is motivated by their statistical and computational efficiency relative to higher-order moments, their identifiability relative to first-order statistics, and that they work regardless of whether the variables are Gaussian. We show that TSCD has identifiable causal order and parameters from a number of interventions that is logarithmic in the number of variables. Experiments show that TSCD is robust to noise, competitive with existing methods, and scales to hundreds of variables.
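To make the second-order input concrete, here is a minimal sketch of how covariance matrices arise from a linear structural equation model and how they change under an intervention. It is not the TSCD algorithm. The three-variable chain, the edge weights, the unit noise variances, and the choice of a perfect intervention that cuts incoming edges are assumptions made for illustration.

```python
# Minimal sketch: covariances of a linear SEM X = W X + e on a DAG,
# before and after an assumed perfect intervention.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sem_covariance(weights, noise_var):
    """Return (I - W)^{-1} D (I - W)^{-T}; weights[j][i] is the edge i -> j.

    For a DAG, W is nilpotent, so (I - W)^{-1} = I + W + ... + W^{n-1}.
    D is diagonal because the noise terms are assumed uncorrelated.
    """
    n = len(weights)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(n - 1):
        power = matmul(power, weights)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    noise = [[noise_var[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    total_t = [list(col) for col in zip(*total)]
    return matmul(matmul(total, noise), total_t)

def perfect_intervention(weights, node):
    """Remove every edge pointing into `node` (assumed intervention type)."""
    return [[0.0] * len(row) if j == node else list(row)
            for j, row in enumerate(weights)]

# Hypothetical chain 0 -> 1 -> 2 with weights 2 and 3.
chain = [[0.0, 0.0, 0.0],
         [2.0, 0.0, 0.0],
         [0.0, 3.0, 0.0]]
unit_noise = [1.0, 1.0, 1.0]

sigma_obs = sem_covariance(chain, unit_noise)
sigma_int = sem_covariance(perfect_intervention(chain, 1), unit_noise)
covariance_tensor = [sigma_obs, sigma_int]   # stacked 2 x 3 x 3 array
```

In this toy case the covariance between variables 0 and 1 drops from 2 to 0 once variable 1 is intervened on, which is the kind of contrast between observational and interventional second moments that a covariance-based method can exploit.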

2606.17504 2026-06-17 eess.IV cs.CV New submission

Two-Stage Fine-Tuning of ResNet50 for High-Sensitivity Melanoma Detection on Dermoscopic Images

Aryan Bhagat

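AI summary: Proposes a two-stage fine-tuning method for ResNet50 that addresses class imbalance and suboptimal transfer learning through staged training and low-learning-rate fine-tuning, reaching an AUC-ROC of 0.9559 and a sensitivity of 87.56% on 3,826 test images and outperforming single-stage fine-tuning.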

Comments 13 pages, 4 figures, 4 tables. Code available at https://github.com/Aryanbhagat23/melanoma-detection

Abstract

Melanoma is the most dangerous form of skin cancer with five-year survival rates exceeding 99% when detected early but falling sharply once the disease spreads. This paper proposes and evaluates a two-stage fine-tuning approach for ResNet50 applied to binary melanoma classification on dermoscopic images. The core challenges addressed are class imbalance and suboptimal transfer learning from single-stage fine-tuning. After stratified train/validation/test splitting, random oversampling was applied exclusively to the training set to achieve a 1:1 class balance. Stage 1 trained only the classification head with the ResNet50 base frozen, while Stage 2 fine-tuned all layers jointly at a low learning rate of 1e-5 to prevent catastrophic forgetting of learned visual features. On an independent test set of 3,826 images, the model achieved an AUC-ROC of 0.9559, accuracy of 88.34%, sensitivity of 87.56%, specificity of 89.13%, and F1-score of 88.29%. An ablation study confirms the two-stage protocol significantly outperforms single-stage fine-tuning, with sensitivity gains of over 4%. Grad-CAM visualizations demonstrate correct lesion localization. A fully deployable Streamlit detection application is provided alongside all training code.
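Two pieces of the pipeline above can be sketched without any deep learning library: random oversampling applied to the training split only, and the sensitivity and specificity metrics. This is a minimal sketch with hypothetical function names and toy data; it is not the authors' released code, and the ResNet50 stages themselves are not shown.

```python
import random

def oversample_training_set(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until the classes are 1:1.

    Assumes binary labels (0 or 1) with both classes present. Apply this to
    the training split only, so validation and test data stay untouched.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    minority = min(by_class, key=lambda y: len(by_class[y]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    balanced = ([(s, majority) for s in by_class[majority]]
                + [(s, minority) for s in by_class[minority] + extra])
    rng.shuffle(balanced)
    return balanced

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy training split: 2 melanoma (1) and 6 benign (0) image ids.
train_ids = ["m0", "m1", "b0", "b1", "b2", "b3", "b4", "b5"]
train_labels = [1, 1, 0, 0, 0, 0, 0, 0]
balanced = oversample_training_set(train_ids, train_labels)

sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
```

Oversampling after the split, rather than before it, keeps duplicated images from leaking into the validation or test sets.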

2606.12623 2026-06-17 stat.AP cs.LG New submission

Estimating Individualized Treatment Effects in Acute Ischemic Stroke with Causal Transformation Models (TRAM-DAG): A Multi-Centre Observational Study with External RCT Validation

Oliver Dürr, Lisa Herzog, Pascal Bühler, Susanne Wegener, Beate Sick

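AI summary: Proposes causal transformation models (TRAM-DAG) for estimating individualized treatment effects in patients with acute ischemic stroke; after fitting on observational data, validation in an RCT population shows that the average estimated effect agrees with the ATE and that the estimates correctly rank patients by outcome.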

Comments This submission has been withdrawn by the authors pending completion of internal review. A revised version will be posted in due course

Abstract

Personalized medicine in acute ischemic stroke requires moving beyond average treatment effects (ATE) to individualized treatment effect (ITE) estimates to support treatment decisions. In acute ischemic stroke, mechanical thrombectomy has been shown to be more effective on average than lysis in randomized controlled trials (RCTs), such as the MR CLEAN study. We aim to identify which individual patients benefit most from mechanical thrombectomy compared to lysis. The outcome of interest is the modified Rankin Scale (mRS) at three months, an ordinal measure of functional disability (0: no symptoms, 6: death). We demonstrate that causal transformation models on directed acyclic graphs (TRAM-DAG) can be used for ITE estimation after being fitted on observational MAGIC multi-center stroke patient data. To ensure comparability with the MR CLEAN population, which we use for validation, we train the TRAM-DAG on a MAGIC sub-population with NIHSS at admission >= 6, corresponding to one inclusion criterion of MR CLEAN. The fitted model is then used to estimate ITEs for stroke patients in the MR CLEAN population. While these ITE estimates cannot be confirmed experimentally, we show that their average is consistent with the trial's reported ATE. Furthermore, the ITE estimates correctly rank trial patients by their observed frequency of a good outcome (mRS at three months <= 2). These findings support the use of causal models like TRAM-DAG for personalized decision-making in stroke care and highlight their ability to bridge the gap between observational evidence and clinical trials.

2606.12666 2026-06-17 cs.CR cs.AI New submission

CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents

Siyu Shen, Fenghao Xu, Wenrui Diao, Kehuan Zhang

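AI summary: To address the incidental visual privacy exposure caused by mobile GUI agents uploading screenshots, proposes CAPED, a context-aware pre-upload exposure control layer that combines task-requirement extraction, screen-context privacy priors, and UI element parsing to expose only task-relevant content, substantially reducing privacy leakage while preserving high task utility.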

Abstract

Screenshot-based mobile GUI agents can operate ordinary smartphone apps through the same visual interface as a human user, but this capability also turns every screen observation into a privacy boundary. During normal task execution, screenshots may expose contacts, messages, photos, files, recommendations, health cues, and other sensitive context that is unrelated to the user's request. We call this problem incidental visual privacy exposure. It is difficult to address with existing defenses: text anonymization misses many visual and inferential cues, while generic privacy masking can remove the evidence and controls that a GUI agent needs to complete the task. This paper presents CAPED, a context-aware pre-upload exposure control layer for mobile GUI agents. CAPED is designed as a phone-side protection layer: before screenshots are released to a remote multimodal agent, it extracts task requirements, uses screen context as a privacy prior, parses visible UI elements, and selectively exposes only content needed for the current task while masking incidental private content. We evaluate CAPED on AndroidWorld for broad task utility and with a controlled 28-task seeded privacy evaluation used as a measurement instrument for trajectory-level incidental leakage. In this seeded evaluation, Full CAPED reduces success-conditioned weighted seeded leakage from 0.766 under raw screenshots to 0.268 while preserving high task utility. A broader AndroidWorld run shows a remaining prototype-level utility cost, but the results show that task-driven selective exposure can reduce incidental visual leakage before screenshots are released to a remote GUI agent.

2206.06208 2026-06-17 eess.AS cs.CL cs.SD

Automated Evaluation of Standardized Dementia Screening Tests

Franziska Braun, Markus Förstel, Bastian Oppermann, Andreas Erzigkeit, Thomas Hillemacher, Hartmut Lehfeld, Korbinian Riedhammer

Comments Submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2206.05018

2606.17977 2026-06-17 econ.EM New submission

Beyond Parallel Trends in Staggered Difference-in-Differences: Identification under Higher-Order Parallelism

Zecharias Anteneh

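AI summary: Proposes a hierarchy of higher-order parallelism assumptions that replaces the conventional parallel trends assumption, achieves point identification of cohort-specific and average treatment effects in staggered difference-in-differences designs, and proves an aggregation theorem.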

Comments 38 pages, 4 figures. Companion Stata command (anddp) implementing the estimator will be available soon at https://github.com/zanteneh/anddp

Abstract

In difference-in-differences designs, the parallel trends assumption requires that the outcome gap between treated and control units would have remained flat absent treatment. Pre-treatment event studies frequently reject this flat-gap requirement. Existing responses include parametric trend controls and bounds on the treatment effect under assumptions about the magnitude of the violation. This paper shows that point identification of cohort-specific and aggregate treatment effects in staggered designs remains achievable under strictly weaker assumptions. I replace the flat-gap requirement with a hierarchy of higher-order conditions, Parallel[p], embed this framework in the group-time average treatment effect structure of Callaway and Sant'Anna (2021), and prove an aggregation theorem for the case where different cohorts are identified under different feasible polynomial orders, a challenge unique to staggered designs that has not been previously addressed. A sequential order-selection procedure guides applied practice. Monte Carlo evidence confirms that post-selection bootstrap coverage remains near-nominal and that inference is robust to realistic serial correlation. Applied to Medicaid expansion data, the method yields point estimates resting on an assumption the pre-treatment data do not reject, in contrast to the flat-gap requirement which those same data decisively reject.
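The abstract does not define Parallel[p], so the following is only one possible reading, sketched for intuition. It assumes that an order-m condition means the m-th difference of the untreated outcome gap is zero, so that m = 1 reproduces the flat-gap (parallel trends) case and m = 2 allows a linear gap trend. The function name, the indexing, and the numbers are hypothetical and may not match the paper's definitions or estimator.

```python
from math import comb

def extrapolate_gap(pre_gaps, order):
    """Predict the next untreated gap, assuming its `order`-th difference is zero.

    The condition sum_{k=0}^{m} (-1)^k C(m, k) g_{t-k} = 0 is solved for g_t.
    order=1 gives g_t = g_{t-1}; order=2 gives g_t = 2 g_{t-1} - g_{t-2}.
    """
    if order < 1 or len(pre_gaps) < order:
        raise ValueError("need at least `order` pre-treatment gaps")
    return -sum((-1) ** k * comb(order, k) * pre_gaps[-k]
                for k in range(1, order + 1))

# Hypothetical treated-minus-control outcome gaps in three pre-periods,
# followed by the observed gap in the first post-treatment period.
pre_gaps = [1.0, 1.5, 2.0]
observed_post_gap = 3.5

effect_flat_gap = observed_post_gap - extrapolate_gap(pre_gaps, order=1)
effect_linear_gap = observed_post_gap - extrapolate_gap(pre_gaps, order=2)
```

With a gap that was already widening by 0.5 per period, the flat-gap assumption attributes 1.5 to treatment, while the order-2 reading attributes 1.0, which shows how the assumed order changes the point estimate.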

2606.18196 2026-06-17 eess.SP New submission

Receiver-Aware Analysis and Verification of the Spectral Separation Coefficient Under Interference-Induced Degradation

Lucas Heublein, Fabian Benschuh, Alexander Rügamer, Felix Ott

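AI summary: Computes a receiver-dependent spectral separation coefficient (SSC) by incorporating receiver front-end characteristics, and experimentally verifies the robustness of the interference impact computation on real-world and simulated datasets.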

Comments 7 pages, 4 figures

Abstract

Interference poses a significant challenge to satellite-based positioning systems, making it essential to accurately quantify the effects of specific interference types on receiver performance and the resulting reliability of position computation. In current practice, interference effects are often quantified using receiver-independent metrics, with receiver-specific front-end characteristics either idealized or only implicitly considered. In this paper, we address this limitation by explicitly incorporating receiver-specific front-end characteristics into the computation of interference effects and validating the resulting receiver-dependent analysis experimentally. To this end, we record a real-world open-field dataset comprising 210 distinct interference scenarios and compute the receiver-dependent spectral separation coefficient (SSC) and interference impact for a specific receiver module. Furthermore, we verify the computation using a controlled dataset generated with a radio frequency constellation simulator (RFCS), employing the same receiver module and replaying similar interference classes. The comparison of results obtained in both environments demonstrates the robustness of the interference impact computation.
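For orientation, a commonly used textbook form of the SSC is the integral of the product of the unit-area signal and interference power spectral densities, weighted by the squared magnitude of the receiver front-end response. The sketch below evaluates that form numerically. The exact formulation and normalization used in the paper are not given in the abstract, and the rectangular spectra, the frequency grid, and the two front-end filters are assumptions.

```python
import math

def normalize_psd(psd, df):
    """Scale a sampled PSD so that it integrates to one over the grid."""
    area = sum(psd) * df
    return [p / area for p in psd]

def spectral_separation_coefficient(signal_psd, interference_psd,
                                    frontend_gain_sq, df):
    """Riemann-sum estimate of the integral of |H(f)|^2 G_s(f) G_i(f) df."""
    gs = normalize_psd(signal_psd, df)
    gi = normalize_psd(interference_psd, df)
    return sum(h * s * i for h, s, i in zip(frontend_gain_sq, gs, gi)) * df

DF = 1e4                           # 10 kHz grid spacing (assumed)
bins = range(-500, 501)            # -5 MHz .. +5 MHz around the carrier

def rect(half_width_bins):
    return [1.0 if abs(k) <= half_width_bins else 0.0 for k in bins]

signal_psd = rect(100)             # about 2 MHz wide, flat (toy spectrum)
interference_psd = rect(100)       # matched-spectrum interference
ideal_frontend = [1.0] * len(bins) # receiver-independent case
narrow_frontend = rect(50)         # hypothetical 1 MHz brick-wall front end

ssc_ideal = spectral_separation_coefficient(
    signal_psd, interference_psd, ideal_frontend, DF)
ssc_narrow = spectral_separation_coefficient(
    signal_psd, interference_psd, narrow_frontend, DF)
ssc_ideal_db = 10.0 * math.log10(ssc_ideal)   # in dB/Hz
```

The narrower front end passes only part of the overlapping spectrum, so the same interference yields a smaller coefficient, which is why a receiver-independent value can misstate the impact on a specific module.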

2606.18134 2026-06-17 eess.AS New submission

Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

Alexander Polok, Samuele Cornell, Sathvik Udupa, Jan Černocký, Shinji Watanabe, Lukáš Burget

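AI summary: Proposes diarization-conditioned spoken language models that condition the acoustic encoder to extract target-speaker representations, avoiding the catastrophic forgetting risked by Serialized Output Training, and substantially improve speaker-attributed transcription on multiple datasets.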

Comments Accepted to Interspeech 2026

Abstract

We propose diarization-conditioned spoken language models (SLMs), a strategy for extending SLMs to far-field multi-talker audio. Rather than adapting the decoder via Serialized Output Training, which risks catastrophic forgetting, we condition the acoustic encoder on diarization masks to extract target-speaker representations, keeping the decoder frozen. We instantiate this as Dixtral, integrating a Diarization Conditioned Whisper (DiCoW) encoder into the Voxtral SLM. On AMI, NOTSOFAR-1, LibriSpeechMix, and Mixer6, Dixtral outperforms Gemini 3.0 Flash, VibeVoice, and Voxtral Mini Transcribe V2 on speaker-attributed transcription by 29.0%, 19.8%, and 16.0% absolute cpWER respectively. On a novel long-form multi-speaker QA benchmark, zero-shot Dixtral matches Gemini on far-field content understanding, and when fine-tuned surpasses both Gemini and Voxtral operating on close-talk across all tasks.
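The cpWER figures above refer to the concatenated minimum-permutation word error rate, a standard metric for speaker-attributed transcription. A minimal sketch of its usual definition follows: concatenate each speaker's utterances, try every assignment of hypothesis speakers to reference speakers, and keep the assignment with the fewest word errors. The function names and the toy transcripts are hypothetical, and real scoring tools add text normalization that is omitted here.

```python
from itertools import permutations

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Minimum over speaker permutations of total errors / reference words."""
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))     # pad so speaker counts match
    hyps += [[]] * (n - len(hyps))
    best = min(sum(edit_distance(r, hyps[p]) for r, p in zip(refs, perm))
               for perm in permutations(range(n)))
    return best / sum(len(r) for r in refs)

reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hypothesis = {"spk1": ["fine thanks"], "spk2": ["hello there how were you"]}
score = cp_wer(reference, hypothesis)
```

Here the best assignment pairs A with spk2 and B with spk1, leaving one substituted word out of seven. An "absolute" cpWER gain, as quoted in the abstract, is a difference in percentage points between two such scores.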

2606.18072 2026-06-17 eess.AS New submission

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

Zheqi Dai, Guangyan Zhang, Zhen Ye, Jingyu Li, Haolin He, Chunyat Wu, Yiwen Guo, Qiuqiang Kong

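AI summary: Applies MeanFlow in a highly compressed latent space to achieve one-step Token2Wav generation, resolving the speed-quality trade-off of multi-step flow-matching decoders with up to a 17x RTF improvement and negligible quality loss.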

Comments 5 pages, 1 figure

Abstract

Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer superior quality but suffer from high inference latency due to iterative sampling, creating a severe quality-speed trade-off. In this paper, we propose a novel Token2Wav architecture that overcomes this limitation by applying MeanFlow in a highly compressed latent space. By modeling the average velocity rather than the instantaneous velocity field, MeanFlow enables true one-step generation. Operating in the latent domain mitigates the memory and stability issues of waveform-level flows, yielding up to a 17$\times$ improvement in Real-Time Factor (RTF) compared to multi-step baselines with negligible quality degradation. Furthermore, we introduce refinement strategies that mitigate latent mismatch, including decoder-only fine-tuning with the MeanFlow generator frozen and end-to-end joint fine-tuning, improving fidelity without increasing inference-time cost. Code and demo are publicly available.
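The difference between integrating an instantaneous velocity field and jumping with an average velocity can be shown on a toy problem. In the sketch below the dynamics dz/dt = z are an assumption chosen because the average velocity has a closed form; in MeanFlow that average velocity is predicted by a trained network, which is not modeled here. Time runs from t = 1 (noise) to t = 0 (data), and all names are hypothetical.

```python
import math

RATE = 1.0  # toy dynamics dz/dt = RATE * z (illustrative assumption)

def instantaneous_velocity(z, t):
    return RATE * z

def average_velocity(z_t, r, t):
    """Average of the velocity over [r, t] along the path that ends at z_t.

    For the toy dynamics z_r = z_t * exp(-RATE * (t - r)), so the average
    velocity (z_t - z_r) / (t - r) is available in closed form.
    """
    return z_t * (1.0 - math.exp(-RATE * (t - r))) / (t - r)

def multi_step_sample(z_1, steps):
    """Euler integration from t = 1 down to t = 0 using instantaneous velocity."""
    z, h = z_1, 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * h
        z -= h * instantaneous_velocity(z, t)
    return z

def one_step_sample(z_1):
    """Single jump z_0 = z_1 - (1 - 0) * u(z_1, r=0, t=1)."""
    return z_1 - (1.0 - 0.0) * average_velocity(z_1, 0.0, 1.0)

z_start = 2.0
exact = z_start * math.exp(-RATE)          # true endpoint of the toy flow
euler_4 = multi_step_sample(z_start, 4)    # four velocity evaluations
single = one_step_sample(z_start)          # one average-velocity evaluation
```

Four Euler steps still carry discretization error on this toy flow, while the single average-velocity jump lands on the exact endpoint, which is the property that makes one-step generation possible when the average velocity is learned well.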

2606.18054 2026-06-17 eess.AS New submission

AI-based Cognitive-linguistic Features for Dementia Assessment in Picture Description

Lingfeng Xu, Prad Kadambi, Samuel Goldinger, Visar Berisha, Kimberly D. Mueller, Julie Liss

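AI summary: Introduces seven clinical constructs tailored to the Cookie Theft picture description task and uses large language models to generate severity scores and explanations; Claude 3.5 Sonnet reaches 85% accuracy on the ADReSS dataset with an expert agreement rating of 3.99/5, demonstrating the potential of LLMs for interpretable cognitive screening.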

Comments 10 pages, 2 figures

Abstract

Picture descriptions provide valuable insights into several clinical constructs related to cognitive-linguistic abilities. However, operationalizing these constructs into quantitative measures remains challenging, limiting interpretability and clinical utility. We introduced seven constructs tailored to the Cookie Theft picture description task and prompted large language models (LLMs) to evaluate them, generating severity scores and example-based explanations. Among the examined LLMs, Claude 3.5 Sonnet performed the best, producing severity scores that significantly distinguish cognitively impaired individuals from healthy controls. The model achieves a high accuracy of 85% on the ADReSS dataset. Expert evaluation of Claude's scores and explanations yields a 3.99/5 average agreement. The findings demonstrate the potential of LLMs to operationalize clinical constructs and generate interpretable evaluations, offering a promising approach for accessible cognitive screening tools.

2606.17942 2026-06-17 eess.SP New submission

On the Optimum Energy-per-bit Launch Power in Coherent Hollow-core Fibre Transmission Systems

Ronit Sohanpal, Eric Sillekens, Mindaugas Jarmolovicius, Robert I. Killey, Polina Bayvel

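AI summary: Studies the launch power that minimizes energy per bit in hollow-core fibre transmission systems and finds that a 1000 km C-band link operated at that launch power reduces total power consumption by 41.5% at a throughput loss of only 2.2%.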

Comments European Conference on Optical Communications (ECOC) 2026

2606.17903 2026-06-17 eess.SP New submission

Constellation Design for Nonlinear Unified SWIPT Receiver Channels with Memory

Triantafyllos Mavrovoltsos, Elio Faddoul, Zulqarnain Bin Ashraf, Constantinos Psomas, Besma Smida, Ioannis Krikidis

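AI summary: For nonlinear unified SWIPT receiver channels, proposes a constellation design method that accounts for memory effects, using a state-adaptive policy and an autoencoder framework to optimize the trade-off between symbol error rate and energy harvesting.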

Comments Submitted to IEEE Transactions on Communications

Abstract

Unified receivers (URs) have emerged as a promising architecture for simultaneous wireless information and power transfer (SWIPT), since a common rectifying front-end enables information decoding (ID) and energy harvesting (EH) from the same rectified output. However, rectification is nonlinear due to the diode, while the capacitor introduces memory across symbols, making constellation design over the channel challenging. In this paper, we study constellation design for nonlinear UR-SWIPT channels in both memoryless and memory regimes. First, we propose a tractable unified rectification model that captures both (i) the nonlinear steady-state mapping and (ii) the asymmetric capacitor charging/discharging dynamics under transient operation. To isolate the impact of rectification with memory on ID, we study the information-based design. In this setting, we develop a state-adaptive policy with an algorithmic constellation design that accounts for the rectifier state and shapes the constellation in the observation domain. By approximating the rectifier state distribution, we derive a closed-form average symbol error rate (SER) expression and characterize the rate-reliability (R-R) tradeoff. We then seek constellations that minimize the SER under average transmit power and EH constraints. We address the resulting energy-constrained setting in the memoryless regime using an autoencoder-based framework that embeds the nonlinear rectification model as a differentiable channel block. Numerical results validate the proposed models, demonstrate the impact of memory on the R-R tradeoff, and show how learned constellations adapt to EH requirements in the rate-energy tradeoff.

2606.17900 2026-06-17 eess.SP New submission

Time-Slotted Multi-Cluster UAV AirComp with Energy-Awareness: A Pointer Network-Assisted Soft Actor-Critic Learning Framework

Xunqiang Lan, Xiao Tang, Ruonan Zhang, Qinghe Du, Tony Q. S. Quek

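AI summary: Proposes a UAV-assisted over-the-air computation system that minimizes aggregation error and energy consumption by jointly optimizing beamforming, normalizing factors, sensor scheduling, and UAV trajectory, solved with a hierarchical learning framework (pointer network and soft actor-critic).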

Comments Accepted @ IEEE JSTSP

Abstract

Over-the-air computation (AirComp) has emerged as a promising approach for massive data aggregation, but it is challenged by channel variations, task distributions, and the inherent energy limitations of the computation nodes. In this paper, we propose an unmanned aerial vehicle (UAV)-assisted AirComp system to serve multi-cluster computation tasks over time, where the spatial and time diversity facilitated by UAV mobility is exploited for efficient and accurate data computation. Specifically, we aim to minimize the AirComp aggregation error and the energy consumption by jointly optimizing the transceiver beamforming, normalizing factors, sensor scheduling, and UAV trajectory. To solve the formulated problem, we decompose it into two layers, where the inner layer addresses the optimization-based AirComp transceiver design and the outer layer focuses on the deep reinforcement learning (DRL)-based scheduling and trajectory design. In particular, a pointer-network actor-critic method is developed to tackle the binary scheduling problem, and a soft actor-critic DRL algorithm is employed to determine the UAV trajectory. Simulation results validate the convergence of the proposed hierarchical learning framework and demonstrate significant performance gains in AirComp aggregation error and energy consumption compared with baseline schemes.

2606.17893 2026-06-17 eess.SP New submission

Condition-Wise Sinkhorn Drifting for One-Shot Learned Channel Simulation

Rick Fritschek, Rafael F. Schaefer

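AI summary: To address the high cost of diffusion-style reverse sampling in learned communication systems, proposes condition-wise Sinkhorn drifting, a one-shot channel surrogate whose generator is trained with a conditional Sinkhorn objective; evaluated on AWGN, Rayleigh fading, and other channels, the condition-wise variant is strongest under conditional diagnostics and symbolic-coding checks.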

Comments 12 pages, 3 figures

Abstract

Learned communication systems may evaluate stochastic channel surrogates millions of times inside differentiable training loops, making diffusion-style reverse sampling expensive. This paper proposes condition-wise Sinkhorn drifting, a one-shot channel surrogate that preserves the transmitted symbol and transports only the conditional output laws $p(y\mid x)$. We formulate a conditional Sinkhorn objective over repeated outputs at the same transmitted symbol and train the generator with finite-sample barycentric velocities followed by detached particle regression. Experiments on additive white Gaussian noise (AWGN), Rayleigh fading, solid-state power amplifier (SSPA) nonlinearity, and a compact tapped-delay-line (TDL) channel compare direct drifting, joint Sinkhorn drifting, condition-wise Sinkhorn drifting, conditional denoising diffusion probabilistic modeling (DDPM), denoising diffusion implicit modeling (DDIM), and Wasserstein generative adversarial network (WGAN) references. Within the evaluated one-shot drifting-family variants, condition-wise Sinkhorn is strongest under conditional diagnostics and symbolic-coding checks, while diffusion remains strongest on the hardest downstream symbol-error-rate (SER) curves. The resulting operating point is a condition-preserving one-shot simulator for settings where repeated channel calls make diffusion-style sampling too costly.
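The idea of solving a separate transport problem per transmitted symbol can be sketched with plain Sinkhorn iterations and a barycentric projection. This is a minimal sketch on scalar outputs, assuming uniform sample weights, a squared-distance cost, and a fixed regularization strength; the function names, the data, and these choices are hypothetical, and the generator training step (regressing toward the detached targets) is not shown.

```python
import math

def sinkhorn_plan(xs, ys, eps=0.5, iters=200):
    """Entropic optimal transport plan between two uniform 1-D point sets."""
    n, m = len(xs), len(ys)
    kernel = [[math.exp(-((x - y) ** 2) / eps) for y in ys] for x in xs]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def barycentric_velocities(xs, ys, eps=0.5):
    """Per-particle drift: plan-weighted mean of the targets minus the particle."""
    plan = sinkhorn_plan(xs, ys, eps)
    return [sum(p * y for p, y in zip(row, ys)) / sum(row) - x
            for row, x in zip(plan, xs)]

def condition_wise_velocities(generated, reference, eps=0.5):
    """Solve one transport problem per symbol, never mixing symbols."""
    return {symbol: barycentric_velocities(generated[symbol], reference[symbol], eps)
            for symbol in generated}

# Toy channel outputs grouped by transmitted symbol.
generated = {"s0": [0.0, 1.0], "s1": [4.0, 5.0]}
reference = {"s0": [0.5, 1.5], "s1": [4.5, 5.5]}
velocities = condition_wise_velocities(generated, reference)
```

Because each symbol is handled on its own, generated outputs for "s0" are only ever pulled toward reference outputs for "s0", which is how the conditional laws are kept separate; a joint plan over all samples could instead match outputs across symbols.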

2606.17879 2026-06-17 eess.AS New submission

A 399uW 114.3 dB DR Companding Readout ASIC for MEMS Microphones Employing a Multirate Time-Domain ADC

Javier Granizo, Ruben Garvi, Ricardo Carrero, Jorge de la Torre, Javier Fernandez, Dietmar Straeussnigg, Andreas Wiesbauer, Luis Hernandez

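AI summary: Proposes a VCO-based multirate companding ADC architecture that mitigates boundary artifacts through a time-domain signal representation, achieving a 114.3 dB dynamic range at under 400 μW for digital MEMS microphone readout.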

Abstract

Improvements in the dynamic range and sensitivity of digital MEMS microphones are essential in applications like advanced noise canceling and voice recognition. A cost-effective solution to achieve these goals is the companding ADC architecture. Companding ADCs split the dynamic range into several segments with different quantization noise levels, relaxing power constraints. A common problem of companding microphones is the audible artifacts generated when the input signal crosses the boundaries between different amplitude segments. In this paper, we present a companding ADC architecture that mitigates the boundary artifacts by leveraging the instantaneous, high-resolution time-domain representation of the input signal in a VCO-based ADC. The use of a multi-rate frequency-to-digital converter decouples quantization noise from the VCO frequency while keeping standard audio sampling rates. Co-optimization of the driver and oscillator circuits enables our VCO-ADC to reach >112 dBc of peak SFDR without a feedback DAC while keeping a giga-ohm input impedance compatible with a capacitive MEMS. We show measurements of a 0.13 μm ASIC implementing a complete readout circuit for a digital MEMS microphone. This includes two analog channels and the digital signal processing and calibration blocks required to deliver a standard single-bit PDM output. The ADC reaches a dynamic range of 114.3 dB with a power budget under 400 μW, a Schreier $\mathrm{FoM}_{\mathrm{SNDR}}$ of 171.0 dB, and a $\mathrm{FoM}_{\mathrm{DR}}$ of 191.3 dB.
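The reported figures of merit can be cross-checked with the standard Schreier formula, FoM = metric in dB + 10 log10(bandwidth / power). The sketch below assumes a 20 kHz audio bandwidth, which the abstract does not state, and takes the 399 μW power from the title; the implied SNDR it derives is an inference under that assumption, not a number reported by the authors.

```python
import math

def schreier_fom(metric_db, bandwidth_hz, power_w):
    """Schreier figure of merit: metric (DR or SNDR, in dB) + 10 log10(BW / P)."""
    return metric_db + 10.0 * math.log10(bandwidth_hz / power_w)

POWER_W = 399e-6               # from the paper title
DYNAMIC_RANGE_DB = 114.3       # from the abstract
ASSUMED_BANDWIDTH_HZ = 20e3    # assumption: full audio band

fom_dr = schreier_fom(DYNAMIC_RANGE_DB, ASSUMED_BANDWIDTH_HZ, POWER_W)

# The efficiency term is shared by both figures of merit, so the reported
# FoM_SNDR of 171.0 dB implies an SNDR under the same assumed bandwidth.
efficiency_term_db = 10.0 * math.log10(ASSUMED_BANDWIDTH_HZ / POWER_W)
implied_sndr_db = 171.0 - efficiency_term_db
```

Under the 20 kHz assumption the computed value matches the reported 191.3 dB to within rounding, which suggests the figures were evaluated over roughly that bandwidth.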

2606.17869 2026-06-17 eess.IV New submission

Perceptually-Weighted Video Quality Metric for Asymmetric Encoded Sports Videos

Anna Meyer, Jonas Janzen, Diwakara Reddy, Alexander Kopte, Simon Deniffel, Paul Wawerek-López, Marc Windsheimer, André Kaup

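AI summary: Proposes a Perceptually-Weighted Video Quality Metric (PW-VQM) that separates foreground from background by combining open-vocabulary object detection with optical flow analysis and gives the foreground higher weight during quality aggregation, reaching an SROCC of 0.9511 on sports videos and outperforming metrics such as SSIM and VMAF.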

Comments accepted for International Conference on Quality of Multimedia Experience 2025 (QoMEX'26)

详情

AI中文摘要

客观视频质量度量通常假设均匀的空间注意力，这一假设与人类视觉感知的选择性相矛盾，尤其是在体育视频中。通过语义编码为显著区域分配更多比特可以带来显著的比特率节省。我们提出了一种感知加权视频质量度量(PW-VQM)，这是一种全参考度量，考虑了空间区域感知重要性的不均匀性，因此针对非对称编码内容的质量评估。在多尺度小波域中计算的SSIM图通过区分前景和背景区域进行加权。通过结合开放词汇目标检测和光流分析识别感知显著的前景区域，并在质量聚合中赋予更高权重。在体育视频内容上评估，PW-VQM实现了0.9511的斯皮尔曼等级相关系数，优于包括SSIM、VMAF、FUNQUE和LPIPS在内的现有度量。消融研究证实了感知加权各组成部分的单独贡献。

English abstract

Objective video quality metrics commonly assume uniform spatial attention, an assumption that conflicts with the selective nature of human visual perception, particularly in sports videos. Here, allocating more bits for salient regions through semantic encoding can lead to significant bitrate savings. We present a Perceptually-Weighted Video Quality Metric (PW-VQM), a full-reference metric that accounts for the unequal perceptual importance of spatial regions and therefore targets quality evaluation for asymmetrically encoded content. SSIM maps computed in a multiscale wavelet domain are weighted by differentiating between foreground and background regions. Perceptually salient foreground regions are identified by combining open-vocabulary object detection with optical flow analysis, and are assigned higher weight during quality aggregation. Evaluated on sports video content, PW-VQM achieves a Spearman Rank Order Correlation Coefficient of 0.9511, outperforming established metrics including SSIM, VMAF, FUNQUE, and LPIPS. An ablation study confirms the individual contributions of the components of the perceptual weighting.
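
The two ideas in this abstract, foreground-weighted pooling of a quality map and evaluation by Spearman rank-order correlation, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the weights `fg_weight` and `bg_weight`, the toy quality map, and the mask are all hypothetical, and the real metric computes SSIM in a multiscale wavelet domain with masks from object detection and optical flow.

```python
def weighted_quality(ssim_map, fg_mask, fg_weight=3.0, bg_weight=1.0):
    """Weighted mean of a per-pixel quality map; foreground pixels count more."""
    num = 0.0
    den = 0.0
    for q_row, m_row in zip(ssim_map, fg_mask):
        for q, is_fg in zip(q_row, m_row):
            w = fg_weight if is_fg else bg_weight
            num += w * q
            den += w
    return num / den

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy 2x3 quality map: the background column is degraded, the foreground is clean.
ssim_map = [[0.9, 0.9, 0.4], [0.9, 0.9, 0.4]]
fg_mask = [[True, True, False], [True, True, False]]
uniform = weighted_quality(ssim_map, fg_mask, 1.0, 1.0)
weighted = weighted_quality(ssim_map, fg_mask)
```

In this toy case the weighted score is higher than the uniform one, reflecting that the degradation sits in the less important background, which is the situation asymmetric encoding creates.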

2606.17806 2026-06-17 eess.AS new submission

PhASE-Flow: Phonetic-Conditioned Acoustic Flow Matching in SSL Representation Domain for Speech Enhancement

PhASE-Flow：语音增强中SSL表示域内基于音素条件的声学流匹配

Jun Gao, Xiaobin Rong, Yu Sun, Dahan Wang, Jing Lu

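AI summary: Proposes PhASE-Flow, a flow-matching speech enhancement framework that models directly in the self-supervised learning representation space, generating clean acoustic representations conditioned on phonetic ones and reaching leading performance with only four sampling steps.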

Comments Accepted by Interspeech 2026

AI Chinese abstract

流匹配（FM）能够实现高保真生成，而自监督学习（SSL）语音模型提供跨越声学和音素层次的分层表示。然而，现有的基于FM的语音增强（SE）方法主要在频谱域中操作，仅将SSL特征作为外部条件，而非直接在SSL潜在空间中建模。为了充分利用SSL表示的结构丰富性，我们提出了PhASE-Flow，一个完全在SSL空间中运行的基于FM的SE框架。它建模给定音素表示的干净声学表示的条件分布，并通过神经声码器重建波形。实验表明，PhASE-Flow在感知质量和可懂度上优于最先进的基线。值得注意的是，它仅用四个采样步骤即可达到竞争性能，实现了高效推理。音频演示可在此网址获取：https://anonymous.4open.science/w/phase-flow_demo-E6E1/。

English abstract

Flow matching (FM) enables high-fidelity generation, while self-supervised learning (SSL) speech models provide hierarchical representations spanning acoustic and phonetic levels. However, existing FM-based speech enhancement (SE) methods operate primarily in the spectral domain, treating SSL features only as external conditions rather than modeling directly in the SSL latent space. To fully exploit the structural richness of SSL representations, we propose PhASE-Flow, an FM-based SE framework that operates entirely in the SSL space. It models the conditional distribution of clean acoustic representations given phonetic ones, reconstructing the waveform via a neural vocoder. Experiments show that PhASE-Flow outperforms state-of-the-art baselines in perceptual quality and intelligibility. Notably, it achieves competitive performance with only four sampling steps, enabling highly efficient inference. Audio demos are available at https://anonymous.4open.science/w/phase-flow_demo-E6E1/.
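
To make the "four sampling steps" concrete, the sketch below shows generic few-step flow-matching sampling: a fixed-step Euler integration of dx/dt = v(x, t) from t = 0 to t = 1. It is not PhASE-Flow itself. The function `oracle_velocity` and the vector `TARGET` are hypothetical stand-ins for the trained, phonetically conditioned network and for a clean SSL feature vector.

```python
def euler_sample(x0, velocity, num_steps=4):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the oracle below never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical target standing in for a clean acoustic representation.
TARGET = [0.5, -1.0, 2.0]

def oracle_velocity(x, t):
    """Velocity of a straight-line path toward TARGET (stand-in for a learned model)."""
    return [(yi - xi) / (1.0 - t) for xi, yi in zip(x, TARGET)]

sample = euler_sample([0.0, 0.0, 0.0], oracle_velocity, num_steps=4)
```

With this idealized straight-line velocity field, four Euler steps reach the target up to floating-point error; a learned field is only approximately straight, which is why the number of steps trades off quality against inference cost.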

2606.17801 2026-06-17 eess.SP new submission

Joint Direction-of-Arrival and Range Estimation for Millimeter-Wave Uniform Linear Array Radar

毫米波均匀线性阵列雷达的联合到达角与距离估计

Necati Kagan Erkek, Zeynep Gul Pehlivanli

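AI summary: Proposes an FFT-based direction-of-arrival and range estimation framework for a 77 GHz monostatic uniform linear array radar, achieving high-accuracy angle and range estimation with narrowband and wideband waveforms.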

Comments 6 pages

2606.17737 2026-06-17 eess.SP new submission

Deep CSI Feedback for FDD Massive MIMO Systems: A Curvelet Learning Approach

FDD大规模MIMO系统的深度CSI反馈：一种曲波学习方法

Mengli Tao, Jiancun Fan, Huiqiang Xie, Kai Xie

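AI summary: To address the large CSI feedback overhead in FDD massive MIMO systems, proposes SwinCANet, a curvelet-transform-based framework that improves reconstruction quality through frequency-domain decomposition and attention mechanisms and adds a denoising variant to suppress noise; simulations confirm its superior performance.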

AI Chinese abstract

下行信道状态信息（CSI）反馈在频分双工（FDD）大规模多输入多输出（mMIMO）系统中起着关键作用。超大规模MIMO中天线数量的增长增加了CSI反馈的难度和开销，这对传统的下行CSI反馈机制构成了重大挑战。为了解决现有CSI反馈方法的局限性，本文提出了一种基于曲波学习的新框架，称为SwinCANet，包括频域信息处理模块和去噪模块。频域信息处理模块采用曲波变换将CSI分解为低频和高频分量。随后，分别利用Swin Transformer和通道注意力块提取低频和高频表示，从而提高重建质量。值得注意的是，额外的Swin Transformer促进了多尺度频率分量的融合，增强了不同角度分辨率和空间方向上的能力。此外，我们开发了一种变体（De-SwinCANet），它采用Sigmoid阈值函数有效抑制噪声系数，从而减轻各种信道损伤和非线性失真。数值仿真结果表明，所提出的方法在具有挑战性的传播条件下实现了优于现有基准的性能，同时保持了鲁棒性。

English abstract

Downlink channel state information (CSI) feedback plays a key role in frequency division duplex (FDD) massive multiple-input multiple-output (mMIMO) systems. The growing number of antennas in ultra-massive MIMO increases the difficulty and overhead of CSI feedback, which poses significant challenges for conventional downlink CSI feedback mechanisms. To address the limitations of existing CSI feedback approaches, this paper proposes a novel curvelet-learning-based framework termed SwinCANet, comprising a frequency-domain information processing module and a denoising module. The frequency-domain information processing module employs the curvelet transform to decompose CSI into low-frequency and high-frequency components. Subsequently, a Swin Transformer and a channel-wise attention block are used to extract the low-frequency and high-frequency representations, respectively, thereby enhancing reconstruction quality. Notably, an additional Swin Transformer facilitates the fusion of multi-scale frequency components, enhancing capabilities across different angular resolutions and spatial directions. Furthermore, we develop a variant (De-SwinCANet), which employs a Sigmoid threshold function to effectively suppress noise coefficients, thereby mitigating various channel impairments and nonlinear distortions. Numerical simulation results demonstrate that the proposed method outperforms existing benchmarks while remaining robust under challenging propagation conditions.

2606.17718 2026-06-17 eess.SP new submission

BASIIS: Bistatic Angular Sampling and Interpolation for ISAC Setups

BASIIS: 双基地ISAC设置的角度采样与插值

Alexander Felix, Marcus Henninger, Lucas Giroto, Maximilian Bauhofer, Stephan ten Brink, Silvio Mandelli

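AI summary: For the four-dimensional angular-domain sampling problem of the transmit and receive arrays in bistatic ISAC, proposes a minimal sampling and interpolation scheme based on the ortho-baseline coarray that preserves detection accuracy while using 3 to 5 times fewer TX-RX direction pairs.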

AI Chinese abstract

集成感知与通信（ISAC）是6G的一个定义性特征，以有限的额外开销将蜂窝网络扩展到雷达类感知。在双基地部署中，感知需要协调发射（TX）和接收（RX）阵列以扫描离开角和到达角的笛卡尔积，导致角度域中的四维采样问题。本文为双基地ISAC建立了一个完整的角度采样框架，将基于DFT的最优采样方法扩展到两个阵列的全方位角和仰角域。我们表明双基地几何耦合了TX和RX仰角，并通过正交基线共阵列（一种捕获阵列对联合仰角孔径的虚拟阵列）表示这种耦合。从共阵列中，我们推导出一种最小采样和插值方案，该方案近乎无损且可适用于任何波束赋形架构。蒙特卡洛模拟证实，所提出的最小采集基本上等同于密集过采样成像的检测精度，同时采集的TX-RX方向对减少了3到5倍。这使得双基地操作能够大幅降低ISAC系统的无线电资源使用开销。

English abstract

Integrated Sensing and Communications (ISAC) is a defining feature of 6G, extending cellular networks with radar-like sensing at limited additional overhead. In bistatic deployments, sensing requires coordinating the transmitter (TX) and receiver (RX) arrays to scan the Cartesian product of the angles of departure and arrival, resulting in a four-dimensional sampling problem in the angular domain. This work establishes a complete angular sampling framework for bistatic ISAC, extending the DFT-based optimal-sampling methodology to the full azimuth and elevation domains of both arrays. We show that the bistatic geometry couples the TX and RX elevation angles, and represent this coupling through the ortho-baseline coarray, a virtual array that captures the joint elevation aperture of the array pair. From the coarray we derive a minimal sampling and interpolation scheme that is near-lossless and realizable with any beamforming architecture. Monte Carlo simulations confirm that the proposed minimal acquisition essentially matches the detection accuracy of dense oversampled imaging while acquiring 3 to 5 times fewer TX-RX direction pairs. This enables bistatic operation with drastically reduced radio-resource overhead in ISAC systems.

2606.17699 2026-06-17 eess.SP new submission

Joint Synchronization and Radar Parameter Estimation for Distributed OFDM-ISAC Systems

分布式OFDM-ISAC系统的联合同步与雷达参数估计

Niclas Führling, Hyeon Seok Rou, Kuranage Roche Rayan Ranasinghe, Giuseppe Thadeu Freitas de Abreu, Nuria González-Prelcic

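AI summary: For synchronization of distributed ISAC systems over doubly dispersive channels, proposes a joint synchronization and radar parameter estimation method based on bivariate Gaussian belief propagation that jointly estimates time offsets, frequency offsets, and channel parameters, with performance close to the CRLB.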

2606.17662 2026-06-17 eess.AS new submission

An Analysis of the Effectiveness of Synthetic Speech Data for ASR Fine-tuning in Selected Indic Languages

合成语音数据在选定印度语言ASR微调中的有效性分析

Sujith Pulikodan, Agneedh Basu, Pavan Kumar, Pranav Bhat, Visruth Sanka, Nihar Desai, Prasanta Kumar Ghosh

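AI summary: Studies the effect of combining synthetic speech data with real data for ASR fine-tuning in three Indic languages (Hindi, Kannada, and Telugu), analyzing how different synthetic sources and voice cloning affect performance.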

2606.17641 2026-06-17 eess.SP new submission

Toward Quantum-Enhanced ISAC: Active-RIS-Aided Integrated Sensing and Communication with Rydberg Atomic Receivers

面向量子增强型ISAC：基于活性RIS的集成感知与通信系统及里德伯原子接收机

Hong-Bae Jeon, Hyung-Joo Moon, Yonghwi Kim

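AI summary: Proposes an active-RIS-aided integrated sensing and communication system that exploits the magnitude-only, real-domain observations of Rydberg atomic receivers, jointly designing base station beamforming and RIS reflection coefficients to minimize the Cramér-Rao bound and solving the non-convex problem with an alternating optimization framework.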

AI Chinese abstract

本文研究了一种采用里德伯原子接收机（RARE）的活性RIS（ARIS）辅助集成感知与通信（ISAC）系统。利用RARE的幅度实域观测结构，我们首先推导了统一的ISAC模型，并给出了到达角（DoA）估计的闭式克拉美-罗界（CRB）。基于此公式，我们提出联合设计基站（BS）波束成形和ARIS反射系数，以在RARE特定的信号干扰噪声比（SINR）和ARIS功率约束下最小化CRB。为解决由此产生的高度非凸问题，我们开发了一种交替优化（AO）框架，该框架结合了用于波束成形的半定松弛（SDR）和用于ARIS设计的基于主化-最小化（MM）的方法。数值结果表明，所提出的RARE感知框架显著优于传统的基于射频的设计，并实现了接近雷达专用基准的性能，突显了RARE在ARIS辅助量子增强型ISAC中的潜力。

English abstract

In this paper, we investigate an active-RIS (ARIS)-aided integrated sensing and communication (ISAC) system with a Rydberg Atomic REceiver (RARE). Leveraging the magnitude-only and real-domain observation structure of RARE, we first derive a unified ISAC model, along with a closed-form Cramér-Rao bound (CRB) for direction-of-arrival (DoA) estimation. Based on this formulation, we propose a joint design of the base station (BS) beamforming and ARIS reflection coefficients to minimize the CRB under RARE-specific signal-to-interference-noise-ratio (SINR) and ARIS power constraints. To tackle the resulting highly non-convex problem, we develop an alternating optimization (AO) framework that combines semidefinite relaxation (SDR) for beamforming and a majorization-minimization (MM)-based approach for ARIS design. Numerical results demonstrate that the proposed RARE-aware framework significantly outperforms conventional RF-based designs and achieves performance close to the radar-only benchmark, highlighting the potential of RARE for quantum-enhanced ISAC with ARIS.

2606.17570 2026-06-17 eess.IV new submission

Fine-UNETR for PSMA PET/CT Lesion Segmentation: Automated Tumor Quantification and Overall Survival Stratification in Prostate Cancer

Fine-UNETR 用于 PSMA PET/CT 病灶分割：前列腺癌中自动肿瘤定量和总生存期分层

Mansour Abtahi, Chae Moon Hong, Nikhil Deveshwar, Stellamaris Nwihim, Peder E. Z. Larson, Thomas A. Hope

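AI summary: Proposes Fine-UNETR, a Vision Transformer-based architecture for automated whole-body PSMA PET/CT lesion segmentation, and validates the clinical utility of AI-derived tumor burden biomarkers for overall survival stratification before radioligand therapy.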

AI Chinese abstract

引言：开发并评估 Fine-UNETR，一种基于 Vision Transformer 的架构，用于全身 PET/CT 上 PSMA 亲和病灶的自动分割，并评估 AI 衍生的肿瘤负荷生物标志物在放射配体治疗中总生存期分层的临床效用。方法：在这项回顾性研究中，分析了来自前列腺癌患者的 373 次 PSMA PET/CT 扫描（平均年龄 71±8 岁）。Fine-UNETR 是一种改进的 UNETR，采用 8×8×8 体素块嵌入和轴向滑动窗口训练，在 299 次扫描上训练，并在 74 次扫描上验证。在独立的 67 名放射配体治疗前患者队列中，使用 Kaplan-Meier 分析和 log-rank 检验评估总生存期分层。在来自 AutoPET IV PSMA PET/CT 数据集的 192 例病例上进行外部验证。结果：Fine-UNETR 的 Dice 相似系数（DSC）为 66.63%，灵敏度为 70.27%，精确率为 67.77%，病灶检测率为 79.53%（SUVmax ≥ 5 的病灶为 96.05%）。在外部验证数据集上，模型达到 DSC 44.11% 和病灶检测率 87.18%，表明尽管体素级重叠减少，病灶检测性能仍得以保持。AI 衍生的生物标志物与金标准具有极好的一致性（总肿瘤体积：r=0.984；总病灶摄取：r=0.989；病灶计数：r=0.960）。在临床队列中，总肿瘤体积（p=0.0019）、SUVmax（p=0.014）和 SUVmean（p=0.016）显著分层了总生存期。结论：Fine-UNETR 能够实现准确的全身 PSMA 病灶自动分割和肿瘤负荷量化。在外部数据集上的性能尽管存在域偏移的证据，但表现出鲁棒性。AI 衍生的生物标志物在放射配体治疗前队列中显著分层了总生存期，支持自动 PSMA PET/CT 量化在预后判断中的临床效用。

English abstract

Introduction: To develop and evaluate Fine-UNETR, a Vision Transformer-based architecture for automated segmentation of PSMA-avid lesions on whole-body PET/CT, and to assess clinical utility of AI-derived tumor burden biomarkers for overall survival stratification in radioligand therapy. Methods: In this retrospective study, 373 PSMA PET/CT scans (mean age, 71 ± 8 years) from patients with prostate cancer were analyzed. Fine-UNETR, a modified UNETR with 8×8×8 voxel patch embedding and axial sliding window training, was trained on 299 scans and validated on 74 scans. Overall survival stratification was assessed in an independent cohort of 67 pre-radioligand therapy patients using Kaplan-Meier analysis and log-rank testing. External validation was performed on 192 cases from the AutoPET IV PSMA PET/CT dataset. Results: Fine-UNETR achieved a Dice similarity coefficient (DSC) of 66.63%, sensitivity of 70.27%, precision of 67.77%, and a lesion detection rate of 79.53% (96.05% for lesions with SUVmax ≥ 5). On the external validation dataset, the model achieved a DSC of 44.11% and a lesion detection rate of 87.18%, indicating that lesion detection performance was preserved despite reduced voxel-level overlap. AI-derived biomarkers showed excellent agreement with ground truth (total tumor volume: r=0.984; total lesion uptake: r=0.989; lesion count: r=0.960). In the clinical cohort, total tumor volume (p=0.0019), SUVmax (p=0.014), and SUVmean (p=0.016) significantly stratified overall survival. Conclusion: Fine-UNETR enables accurate automated whole-body PSMA lesion segmentation and tumor burden quantification. Performance on an external dataset demonstrates robustness despite evidence of domain shift. AI-derived biomarkers significantly stratified overall survival in a pre-radioligand therapy cohort, supporting the clinical utility of automated PSMA PET/CT quantification for prognostication.
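
For readers less familiar with the overlap metrics quoted above, the sketch below computes the Dice similarity coefficient, sensitivity, and precision from two binary masks using their standard voxel-wise definitions. How the study aggregates these (per voxel, per lesion, or per scan) is not stated in the abstract, so the flat voxel lists and the toy masks here are assumptions for illustration.

```python
def overlap_metrics(pred, truth):
    """Voxel-wise Dice, sensitivity, and precision for two binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    # Convention used here: empty masks on both sides count as perfect agreement.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return dice, sensitivity, precision

# Toy flattened masks: 3 overlapping voxels, 1 spurious, 1 missed.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 1, 0, 0, 1, 0]
dice, sensitivity, precision = overlap_metrics(pred, truth)
print(dice, sensitivity, precision)
```

Because Dice penalizes every mismatched voxel, a model can keep finding most lesions while its Dice drops, which is consistent with the external-validation pattern reported above (lower DSC, preserved detection rate).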

2606.17479 2026-06-17 eess.SP new submission

A Miniaturized Dynamic Array for Antenna-Level Physical Layer Security

用于天线级物理层安全的小型化动态阵列

Sheng Huang, Jacob R. Randall, Cory Hilton, Jeffrey A. Nanzer

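AI summary: Proposes a compact dynamic omnidirectional array based on directional modulation that uses a single RF input and a switching-controlled four-element printed meander-line monopole array to achieve angle-selective information recovery in the E-plane while keeping omnidirectional coverage in the H-plane.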

Comments 14 pages, 11 figures

AI Chinese abstract

提出了一种紧凑的动态全向阵列，用于通过方向调制实现天线级物理层安全。与基于相控阵波束合成或多个射频链的传统方向调制发射机不同，所提出的架构使用单个射频输入和开关控制的四元印刷曲折线单极子阵列，工作于5.05 GHz。状态相关的激励在辐射场中引入可控的幅度和相位扰动，产生角度相关的星座畸变和误码率行为。可靠的信息恢复被限制在E面的窄边射区域，而H面保持准静态全向，提供完整的360度信息可恢复区域。该天线在单层Rogers RO4350B基板上实现，紧凑尺寸为0.57 x 1.11 λ₀²。使用基于商用射频元件的四路开关网络进行实验验证。采用16-QAM在5.05 GHz的通信测量表明，在BER ≤ 10⁻³准则下，校准开关模式的E面信息波束宽度为30至36度，而测量的H面未观察到误码，信噪比保持在约33 dB以上。还利用馈电相位偏移来引导BER定义的信息可恢复扇区，展示了使用相同天线级开关机制的信息波束转向。这些结果表明，紧凑的天线级方向调制可以在一个主平面内提供角度选择性信息恢复，同时在正交平面保持全向覆盖。

English abstract

A compact dynamic omnidirectional array is proposed for antenna-level physical-layer security through directional modulation. Unlike conventional directional-modulation transmitters based on phased-array beam synthesis or multiple RF chains, the proposed architecture uses a single RF input and a switching-controlled four-element printed meander-line monopole array operating at 5.05 GHz. The state-dependent excitation introduces controllable magnitude and phase perturbations in the radiated field, producing angle-dependent constellation distortion and bit error rate behavior. Reliable information recovery is confined to a narrow broadside region in the E-plane, whereas the H-plane remains quasi-static and omnidirectional, providing a full 360-degree information-recoverable region. The antenna is implemented on a single-layer Rogers RO4350B substrate with a compact footprint of 0.57 × 1.11 λ₀². A four-path switching network based on commercial RF components is used for experimental validation. Communication measurements using 16-QAM at 5.05 GHz demonstrate BER-defined E-plane information beamwidths of 30 to 36 degrees for calibrated switching modes under a BER ≤ 10⁻³ criterion, while no bit errors are observed in the measured H-plane and the SNR remains above approximately 33 dB. Feed-phase offsets are also used to steer the BER-defined information-recoverable sector, demonstrating information-beam steering with the same antenna-level switching mechanism. These results show that compact antenna-level directional modulation can provide angularly selective information recovery in one principal plane while preserving omnidirectional coverage in the orthogonal plane.

2606.17439 2026-06-17 eess.SP new submission

Two-Stage IQ Imbalance Estimation and Compensation for AFDM Systems

AFDM系统的两级IQ不平衡估计与补偿

Zhenfeng Huang, Yitong Liu, Yuping Yan, Hongwen Yang

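AI summary: For IQ imbalance in AFDM systems, proposes a two-stage estimation and compensation method: a preamble is first used to iteratively estimate the time-invariant parameters, and BEM channel estimation is then combined with a modified LMMSE detector to suppress interference, giving fast convergence and a near-ideal bit error rate.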

Comments submitted to IEEE Wireless Communications Letters

2606.17382 2026-06-17 eess.SP new submission

Automated Estimation of Equivalent Circuit Model from Impedances with Long Short-Term Memory

基于长短期记忆网络的阻抗等效电路模型自动估计

Ryoma Iki, Motoya Furugori, Noboru Katayama

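AI summary: Proposes a machine learning method combining an LSTM with a convolutional feature extractor that generates equivalent circuit topologies directly from impedance spectra, without fitting or a preset number of elements, and identifies the correct topology with 77.8% accuracy on synthetic data.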

AI Chinese abstract

电化学阻抗谱（EIS）是一种广泛使用的非破坏性电化学系统表征技术，其分析通常依赖于将测量谱拟合到等效电路模型（ECM）。然而，选择合适的ECM仍然是一个主要瓶颈：基于知识的选择需要专家判断且难以复现，而现有的自动化方法要么从固定的候选电路集中选择，要么在基因表达编程的情况下需要重复的等效电路拟合和预定的电路规模。本文提出了一种机器学习方法，通过将电路表示为符号序列，并利用长短期记忆（LSTM）网络结合卷积特征提取器生成该序列，直接从阻抗谱估计ECM。由于LSTM天然处理变长序列，该方法直接生成电路拓扑，在估计过程中无需任何拟合，也不需要对元件数量进行先验假设。引入阻抗的四次方根变换以强调对区分电路至关重要的中频特征，自适应波束搜索生成多个排序候选。在由119种电路拓扑生成的100,000个合成数据集（阻抗添加1%噪声）上评估，该方法在77.8%的情况下识别出正确拓扑作为最可能的ECM，在98.8%的情况下正确拓扑位于前五名候选之中，每个数据集的平均估计时间为17.8毫秒——比报道的基于拟合的方法快几个数量级。这些结果表明，使用神经网络直接生成拓扑是实现全自动、无需专家的ECM估计的有前景的途径。

English abstract

Electrochemical Impedance Spectroscopy (EIS) is a widely used, non-destructive technique for characterizing electrochemical systems, and its analysis typically relies on fitting the measured spectra to an Equivalent Circuit Model (ECM). Selecting an appropriate ECM, however, remains a major bottleneck: knowledge-based selection requires expert judgment and is difficult to reproduce, while existing automated approaches either choose from a fixed set of candidate circuits or, in the case of Gene Expression Programming, require repeated equivalent-circuit fitting and a predetermined circuit scale. Here, we propose a machine learning method that estimates an ECM directly from an impedance spectrum by representing the circuit as a serialized string of symbols and generating this string with a Long Short-Term Memory (LSTM) network coupled to a convolutional feature extractor. Because the LSTM inherently handles variable-length sequences, the method produces the circuit topology directly, without any fitting during estimation or any prior assumption about the number of elements. A fourth-root transformation of the impedance is introduced to emphasize the mid-frequency features essential for distinguishing circuits, and an adaptive beam search yields multiple ranked candidates. Evaluated on 100,000 synthetic datasets generated from 119 circuit topologies with 1% added noise on impedances, the method identified the correct topology as the most probable ECM in 77.8% of cases and among the top five candidates in 98.8% of cases, with an average estimation time of 17.8 milliseconds per dataset, several orders of magnitude faster than reported fitting-based approaches. These results indicate that direct topology generation with a neural network is a promising route toward fully automated, expert-independent ECM estimation.
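
As an illustration of the kind of input such a model consumes, the sketch below generates the impedance spectrum of one simple equivalent circuit, R0 in series with a parallel R1-C1 pair, and applies a fourth-root compression. The abstract does not give the exact form of the transform, so the sign-preserving fourth root applied separately to the real and imaginary parts is an assumption, as are the component values and the function names.

```python
import math

def r_rc_impedance(freq_hz, r0=10.0, r1=100.0, c1=1e-5):
    """Impedance of R0 in series with (R1 parallel C1) at one frequency."""
    omega = 2.0 * math.pi * freq_hz
    z_c = 1.0 / (1j * omega * c1)          # capacitor impedance
    z_parallel = (r1 * z_c) / (r1 + z_c)   # R1 in parallel with C1
    return r0 + z_parallel

def signed_fourth_root(value):
    """Sign-preserving fourth root of a real number (assumed form of the transform)."""
    return math.copysign(abs(value) ** 0.25, value)

# Logarithmic frequency sweep from 0.1 Hz to 100 kHz, one point per decade.
freqs = [10.0 ** e for e in range(-1, 6)]
spectrum = [r_rc_impedance(f) for f in freqs]

# Compressed (real, imaginary) features that a network could take as input.
features = [(signed_fourth_root(z.real), signed_fourth_root(z.imag)) for z in spectrum]
```

The real part falls from about R0 + R1 at low frequency to about R0 at high frequency, and a root-type compression narrows that range so that smaller mid-frequency features are not swamped by the largest values.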

2606.17337 2026-06-17 eess.AS new submission

From Signals to Patterns: Non-Invasive Tuberculosis Detection from Cough Audio using Bandit Weighted Hyperbolic Prototypes

从信号到模式：使用Bandit加权双曲原型从咳嗽音频进行非侵入性结核病检测

Mohd Mujtaba Akhtar, Girish, Sanjam Wadhwa, Muskaan Singh, Ning Ma

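AI summary: Proposes the COBALT framework, which fuses spectral features with speech foundation representations and uses codebook-aligned hyperbolic prototypes with bandit reliability weighting to achieve the best performance on the CODA TB DREAM Challenge benchmark.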

Comments Accepted to INTERSPEECH 2026

2606.17333 2026-06-17 eess.SP new submission

Communication Modeling of Long-Distance Abscisic Acid Signaling in Plant Vascular Systems

植物维管系统中长距离脱落酸信号传导的通信建模

Necati Kagan Erkek, Hani Ballouz, Radin Monshian Motlagh

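AI summary: Reviews abscisic acid (ABA) biosynthesis, long-distance transport, and experimental quantification methods, and proposes a molecular-communication-based ABA transport model, using MATLAB Brownian-motion simulations to assess how release quantity and receiver radius affect the detected signal.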

Comments 16 pages

AI Chinese abstract

脱落酸（ABA）是一种关键的植物激素，用于协调对干旱、盐碱、冷胁迫、病原体攻击、创伤和发育老化的响应。本文综述了增加ABA生物合成的生物刺激、主要产生部位和途径，以及ABA通过植物维管组织的长距离运动。然后讨论了实验量化方法，包括带电子捕获检测的气液色谱法和带紫外检测的高效液相色谱法。最后，本文提出了一种受分子通信启发的ABA传输模型，其中根侧ABA释放被表示为发射器，木质部路径被表示为有界通道，大豆组织被表示为接收器。使用MATLAB布朗运动模拟来评估释放分子数量和接收器半径对检测到的ABA信号的影响。结果表明，更高的释放量产生更平滑和更强的接收趋势，而更大的接收器增加分子捕获概率。

English abstract

Abscisic acid (ABA) is a central plant hormone for coordinating responses to drought, salinity, cold stress, pathogen attack, wounding, and developmental aging. This paper reviews the biological stimuli that increase ABA biosynthesis, the main production sites and pathways, and the long-distance movement of ABA through plant vascular tissues. It then discusses experimental quantification approaches, including gas-liquid chromatography with electron-capture detection and high-performance liquid chromatography with ultraviolet detection. Finally, the paper presents a molecular-communication-inspired model of ABA transport in which root-side ABA release is represented as a transmitter, the xylem pathway as a bounded channel, and soybean tissue as a receiver. MATLAB Brownian-motion simulations are used to evaluate the effects of released molecule quantity and receiver radius on the detected ABA signal. The results show that higher release quantities produce smoother and stronger reception trends, while larger receivers increase molecule-capture probability.
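
The transmitter, bounded channel, and receiver picture can be sketched as a small Brownian-motion simulation. The paper uses MATLAB; the version below is in Python and is only a toy: the 2D geometry, the reflecting walls, the step size, and every parameter value are hypothetical, and the receiver simply counts molecules that ever come within its radius.

```python
import math
import random

def min_distances(num_molecules, num_steps, step_sigma, half_width, receiver_xy, seed=0):
    """Closest approach to the receiver for each Brownian path in a bounded channel."""
    rng = random.Random(seed)
    rx, ry = receiver_xy
    result = []
    for _ in range(num_molecules):
        x, y = 0.0, 0.0  # all molecules are released at the transmitter
        best = math.hypot(x - rx, y - ry)
        for _ in range(num_steps):
            x += rng.gauss(0.0, step_sigma)
            y += rng.gauss(0.0, step_sigma)
            # Single reflection off the channel walls at y = +/- half_width.
            if y > half_width:
                y = 2.0 * half_width - y
            elif y < -half_width:
                y = -2.0 * half_width - y
            best = min(best, math.hypot(x - rx, y - ry))
        result.append(best)
    return result

def captured(distances, radius):
    """Number of molecules that ever came within `radius` of the receiver center."""
    return sum(1 for d in distances if d <= radius)

# Hypothetical setup: 200 molecules, receiver 5 units downstream of the release point.
dists = min_distances(200, 300, 0.5, 2.0, (5.0, 0.0))
small, large = captured(dists, 0.5), captured(dists, 1.5)
```

Because both receiver sizes are evaluated on the same set of paths, the larger radius can never capture fewer molecules than the smaller one, which mirrors the reported trend that larger receivers increase capture probability.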
