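arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.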

2606.05149 2026-06-04 cs.CV cs.LG eess.IV

An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

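Gandhimathi Padmanaban, Fred Feng

AI summary: Proposes a two-stage pipeline that combines an RT-DETR detector with a fine-tuned ViT-Base/16 for six-class vehicle body-type classification and adds a confidence-based abstention mechanism, reaching accuracies of 0.94 and 0.89 on in-distribution and out-of-distribution datasets, respectively.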

Comments: 24 pages, 10 figures, venue TBD

Abstract

Vehicle body type is a significant determinant of cyclist injury severity in overtaking crashes, yet automated tools for classifying vehicles into injury-risk-relevant categories from naturalistic roadway video do not exist in the open literature. Standard object detection benchmarks provide only coarse vehicle labels (car, truck, bus, motorcycle), while existing fine-grained recognition systems are trained on controlled imagery and lack evaluation for deployment robustness across recording sites. This paper presents an open-source two-stage computer vision pipeline combining a pre-trained RT-DETR detector for coarse vehicle localization with a fine-tuned Vision Transformer (ViT-Base/16) for six-category body-type classification: passenger car, SUV, pickup truck, minivan, large van, and commercial truck. A confidence-based abstention mechanism withholds Stage 2 predictions when softmax output falls below 0.60, producing unknown labels rather than silent misclassifications. Evaluated on 3,805 annotated overtaking events from a bicycle-lane corridor in Ann Arbor, Michigan (in-distribution), the pipeline achieved 0.94 accuracy with per-class F1 scores from 0.91 (minivan) to 0.97 (SUV). On an independent out-of-distribution evaluation of 311 events from an open cycling dataset without retraining, accuracy was 0.89. Three of four well-represented categories maintained F1 at or above 0.90 under domain shift. The largest degradation was observed for minivan (F1 = 0.72), driven by abstention rate rising from 2.4% to 25.0% rather than active misclassification, consistent with the mechanism propagating genuine model uncertainty. The full pipeline, including inference scripts, training code, evaluation utilities, and model weights, is released as open-source software to support reproducibility and reuse across roadside video archives and cycling safety research.
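
The abstention rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: the function names, the label ordering, and the example logits are assumptions made for illustration; only the six categories and the 0.60 threshold come from the abstract.

```python
import math

# Label order is an assumption; the abstract only lists the six categories.
LABELS = ["passenger car", "SUV", "pickup truck",
          "minivan", "large van", "commercial truck"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_abstention(logits, threshold=0.60):
    """Return (label, confidence). The label is 'unknown' when the top
    softmax probability falls below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    conf = probs[best]
    label = LABELS[best] if conf >= threshold else "unknown"
    return label, conf
```

A confident logit vector such as `[5.0, 0, 0, 0, 0, 0]` keeps its label, while a flat vector (all zeros) yields `unknown` because its top probability is only 1/6.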

2606.05121 2026-06-04 cs.SD cs.AI cs.CL cs.MM eess.AS

Audio Interaction Model

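Zhifei Xie, Zihang Liu, Ze An, Xiaobin Hu, Yue Liao, Ziyang Ma, Dongchao Yang, Mingbao Lin, Deheng Ye, Shuicheng Yan, Chunyan Miao

AI summary: Proposes Audio-Interaction, a unified online large audio language model that achieves real-time audio interaction through an always-on perceive-decide-respond loop, and builds the StreamAudio-2M dataset and the Proactive-Sound-Bench benchmark; it maintains performance on mainstream audio tasks while unlocking real-time ASR, streaming audio instruction following, and proactive help.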

Comments: Next generation of LALMs, work in progress

Abstract

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-decide-respond loop, listens to sound, environment, and instructions in real time and reacts on the fly. We formalize this regime as the Audio Interaction Model, and realize it with Audio-Interaction, a unified streaming model that retains offline task execution while adding online general audio instruction following, from dialogue to full voice chatting, deciding when to respond from the semantics of the stream. To enable this, we propose SoundFlow, a framework that instantiates the perceive-decide-respond loop end to end, from data to training to deployment, through streaming-native data construction, comprehension-aware training, and asynchronous low-latency inference for stable real-time interaction. We further construct StreamAudio-2M, a 2.6M-item streaming corpus spanning 7 fundamental abilities and 28 sub-tasks, and Proactive-Sound-Bench for evaluating proactive audio intervention. Across 8 benchmarks, Audio-Interaction preserves competitive performance on mainstream audio tasks while unlocking capabilities inaccessible to offline LALMs, including real-time ASR, streaming audio instruction following, and proactive help.

2606.05113 2026-06-04 eess.IV

3D-GlioPREDICT: 3D Latent Diffusion for Post-Radiotherapy Brain MRI Prediction in Patients with Glioma

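Nordin Belkacemi, Selena Huisman, Vera C. Keil, Joost J. C. Verhoeff, Szabolcs David

AI summary: Proposes a 3D latent diffusion framework that generates post-radiotherapy brain MRI conditioned on the spatially resolved voxel-wise dose distribution together with a pretreatment image and follow-up time, outperforming a 2D approach on a public dataset.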

Comments: 12 pages, 6 figures, 2 tables

Abstract

Radiotherapy is a cornerstone of glioma treatment, inducing complex structural changes in brain tissue that are difficult to anticipate. Predicting these changes from pretreatment data could improve understanding of treatment-related effects and support the development of image-based outcome prediction methods. Recent studies have shown that follow-up brain magnetic resonance imaging can be synthesized from baseline imaging and treatment information, but most existing approaches operate on single 2D slices and represent treatment as a global parameter rather than a spatially dynamic variable. In this work, we address both limitations with a 3D latent diffusion framework that conditions image generation on the spatially resolved voxel-wise dose distribution, alongside a pretreatment image and follow-up time. To make volumetric synthesis computationally feasible, the model combines latent-space compression with ControlNet-based spatial conditioning. The method was trained and evaluated on a public dataset comprising 257 scans from 25 glioma patients. Prediction quality was assessed using mean squared error, peak signal-to-noise ratio, and structural similarity index. Anatomical consistency was further evaluated using Dice scores for cerebrospinal fluid, gray matter, and white matter segmentations, together with hippocampus volume prediction error and deformation analysis based on log Jacobian determinant maps. Compared with our previously proposed 2D approach, the 3D model achieved improved image similarity while maintaining good agreement with ground truth anatomy and deformation patterns. Overall, these results support the feasibility of 3D treatment-aware generative modeling for predicting post-radiotherapy brain MRI using only pretreatment information. Code is available at https://github.com/nordinbelkacemi/fu-pred-3d
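
Two of the evaluation metrics named above, peak signal-to-noise ratio and the Dice score, have standard definitions that a short Python sketch can make concrete. The helper names, the flat-list image representation, and the `data_range` default are assumptions for illustration, not taken from the linked repository.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized flat intensity lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / err)

def dice(mask_a, mask_b):
    """Dice overlap of two binary masks given as flat 0/1 lists.
    Two empty masks are treated as a perfect match (a convention)."""
    inter = sum(1 for x, y in zip(mask_a, mask_b) if x and y)
    size = sum(1 for x in mask_a if x) + sum(1 for y in mask_b if y)
    return 1.0 if size == 0 else 2.0 * inter / size
```

For volumetric data the same formulas apply after flattening the voxel grid; the structural similarity index is omitted here because it needs local windowed statistics.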

2606.05084 2026-06-04 eess.SP

A Cancellation Mechanism in AFDM Radar Sensing: Exact Fisher Information and Delay-Doppler Decoupling

Tingjun Lyu, Yunmei Shi

AI summary: By analyzing a cancellation mechanism in the affine frequency division multiplexing (AFDM) waveform, this paper derives a closed-form Fisher information matrix and Cramér-Rao bounds, proving that AFDM is less delay-Doppler-coupled than OFDM and that the delay bound improves quadratically with the chirp rate.

Abstract

We consider radar sensing with affine frequency division multiplexing (AFDM), a chirp-based waveform recently proposed for high-mobility integrated sensing and communication. While numerical Cramér-Rao bounds for AFDM radar are available in the literature, no closed-form Fisher information analysis has so far revealed how the waveform's chirp structure shapes delay-Doppler estimation accuracy. In this paper, we provide such an analysis. We identify a cancellation in the AFDM likelihood: the frequency drift introduced by the chirp modulation is exactly compensated by a discrete phase correction built into the chirp-periodic prefix, leaving only a small residual. Exploiting this cancellation, we derive an exact closed-form Fisher information matrix that depends on the AFDM chirp structure through a single scalar, and from it we obtain closed-form Cramér-Rao bounds for joint delay and Doppler estimation. Three consequences follow. AFDM is provably less delay-Doppler-coupled than OFDM for any nonzero chirp rate. The delay Cramér-Rao bound improves quadratically with the chirp rate, while the Doppler bound is unaffected by it. Finally, our framework reduces continuously to the classical OFDM result as the chirp vanishes, certifying it as a strict generalization of OFDM radar sensing theory. Overall, our work shows that the chirp-periodic prefix -- until now studied only as a channel-equalization device -- is the structural element that decouples delay and Doppler in AFDM sensing, and that AFDM's superior sensing performance can be characterized analytically rather than through numerical bounds alone. Numerical experiments at realistic vehicular and low-Earth-orbit parameters validate all closed-form expressions.
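
To see why delay-Doppler coupling matters for the bounds, the generic two-parameter case can be written out. This is the textbook relation between a 2x2 Fisher information matrix and its Cramér-Rao bounds, not the paper's closed-form result; the symbols $J_{\tau\tau}$, $J_{\nu\nu}$, $J_{\tau\nu}$, and $\rho$ are notation assumed here for delay $\tau$ and Doppler $\nu$.

```latex
J = \begin{bmatrix} J_{\tau\tau} & J_{\tau\nu} \\ J_{\tau\nu} & J_{\nu\nu} \end{bmatrix},
\qquad
\rho^2 = \frac{J_{\tau\nu}^2}{J_{\tau\tau}\, J_{\nu\nu}},
\qquad
\mathrm{CRB}(\tau) = \left[J^{-1}\right]_{11} = \frac{1}{J_{\tau\tau}\,(1-\rho^2)},
\qquad
\mathrm{CRB}(\nu) = \left[J^{-1}\right]_{22} = \frac{1}{J_{\nu\nu}\,(1-\rho^2)}.
```

When the off-diagonal term vanishes, $\rho = 0$ and each bound reduces to the reciprocal of its own diagonal entry, which is the decoupled case; any nonzero coupling inflates both bounds by the factor $1/(1-\rho^2)$.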

2606.05053 2026-06-04 eess.SP

Deep Learning Based Multi-Step Channel Prediction for Adaptive Underwater Acoustic OFDM Systems

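Tian Tian, Ying Zhang, Agastya Raj, Fei-Yun Wu, Marco Ruffini

AI summary: Proposes a multi-step channel prediction model based on the PatchCSI-T Transformer, combined with greedy adaptive modulation and power allocation, to achieve low-latency, high-accuracy CSI prediction and improve end-to-end bit error rate and spectral efficiency on real underwater acoustic channels.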

2606.05041 2026-06-04 eess.SP

A Compact Omnidirectional Meanderline Antenna Array for Wireless Security Using Dynamic Magnitude and Phase Pattern Modulation

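Sheng Huang, Jacob R. Randall, Cory Hilton, Jeffrey A. Nanzer

AI summary: Proposes a compact four-element dynamic array that achieves planar physical-layer security through antenna-level directional modulation, without phased-array beamforming or multiple RF chains, while providing omnidirectional coverage in the H-plane.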

Comments: 11 pages, 13 figures; submitted to IEEE Transactions on Antennas and Propagation

Abstract

A compact dynamic four-element array with omnidirectional H-plane coverage is presented for planar physical-layer security using antenna-level directional modulation. The proposed approach achieves angularly selective information transmission without phased-array beamforming or multiple RF chains by dynamically switching the excitation paths of a four-element array. The antenna comprises four printed meander-line monopole elements operating at 5.05 GHz with independently controlled differential power excitation, which introduces magnitude and phase pattern modulation and dynamic motion of the apparent element spacing, resulting in strongly angle-dependent signal distortion and bit error rate (BER) performance. Reliable information recovery is confined to a narrow broadside region in the E-plane, while significantly elevated BER is observed at off-broadside angles. In contrast, the H-plane radiation remains static and omnidirectional, enabling full 360-degree information-recoverable coverage in the orthogonal plane. The antenna is fabricated on a single-layer Rogers RO4350B substrate with a compact footprint of 0.55 x 1.73 lambda_0^2. A four-path switching network implemented using commercial RF components validates the concept experimentally. Communication measurements under high-SNR conditions above 19 dB using 16-QAM demonstrate a planar information beamwidth below 24 degrees, confirming effective antenna-level directional modulation with angle-dependent BER characteristics and omnidirectional H-plane coverage.

2606.05038 2026-06-04 math.DS cs.SY eess.SY

Dual Lyapunov-based Synchronization Control of Rössler System

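Alkım Gökçen, Savaş Şahin, Mahmut Kudeyt, Swapnil Tripathi, Özkan Karabacak

AI summary: Proposes a synchronization method for nonlinear dynamical systems that combines dual Lyapunov stability analysis with polynomial optimization, computing a nonlinear state feedback function via semidefinite programming and sum-of-squares polynomials so that the Rössler system synchronizes to a reference model and its chaotic behavior is destroyed.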

Comments: Presented at the International Interdisciplinary Chaos Symposium on Chaos and Complex Systems (SCCS 2025), Istanbul, Türkiye. A version of this work has been accepted for publication in the conference proceedings and will appear in Chaos and Complex Systems: Proceedings of the 6th International Interdisciplinary Chaos Symposium (Springer Cham)

Abstract

This paper proposes a novel approach to the synchronization problem for nonlinear dynamical systems, integrating dual Lyapunov stability analysis with polynomial optimization. A comprehensive review of the relevant scientific literature on synchronization methods is conducted, with a particular focus on classical Lyapunov-based methods for chaotic systems. In this study, the Rössler system is synchronized using a dual Lyapunov-based closed-loop synchronization method. This method uses semidefinite programming and sum-of-squares polynomials to compute a nonlinear state feedback function that synchronizes a chaotic system to a selected reference model. The aim is to destroy the chaotic behavior so that a limit cycle becomes attracting instead. Simulations are performed for 100 randomly selected initial conditions to show that the synchronization process succeeds. Furthermore, bifurcation diagrams and phase portraits are evaluated to analyze the system dynamics. The paper discusses the results and how new constraints should be employed and adapted to more complex systems.
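
For readers unfamiliar with the Rössler system, the sketch below integrates the uncontrolled equations with a fixed-step fourth-order Runge-Kutta scheme. It does not implement the paper's dual Lyapunov controller; the parameter values a = 0.2, b = 0.2, c = 5.7 are the commonly used chaotic setting and are an assumption here, as are the function names, step size, and initial state.

```python
def rossler(state, a=0.2, b=0.2, c=5.7):
    """Right-hand side of the Rössler equations:
    dx/dt = -y - z, dy/dt = x + a*y, dz/dt = b + z*(x - c)."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for an autonomous ODE."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )

def simulate(state=(1.0, 1.0, 0.0), dt=0.01, steps=5000):
    """Return the trajectory as a list of (x, y, z) tuples, start included."""
    traj = [state]
    for _ in range(steps):
        state = rk4_step(rossler, state, dt)
        traj.append(state)
    return traj
```

A synchronization experiment would add a feedback term to the right-hand side and compare the controlled trajectory with a reference model; that part depends on the paper's computed feedback function and is left out.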

2606.04981 2026-06-04 eess.SY cs.SY

Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Yugui Liu, Yibo Ding, Xudong Li, Jing Qu, Wenyi Zhang, Tong Qian, Wuyou Xiao, Zhengyang Hu, Jiaqi Ruan, Zhao Xu

AI summary: Proposes a bi-level computation-electricity coordination framework that exploits regional heterogeneities through a peer-to-peer cloud service market (P2P-CSM) among geo-distributed data centers, jointly optimizing computation and electricity to raise data center operational profit and reduce energy consumption.

Abstract

Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and the power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on a modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8%, but also effectively alleviates grid congestion and yields a 3.2% reduction in total energy consumption.

2606.04975 2026-06-04 eess.SY cs.SY

A Survey of Smart Grid Emerging Use Cases and Relevant 5G and 6G Capabilities and Features

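Manoj Kumar, Nishith D. Tripathi, Jeffrey H. Reed

AI summary: Surveys smart grid architecture, quantifies the service requirements of emerging use cases, describes key 5G and 6G capabilities and enablers, and discusses open challenges and future directions.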

DOI: 10.1109/ACCESS.2026.3667671
Journal ref: IEEE Access, vol. 14, pp. 30455-30475, 2026

Abstract

The growing complexity of modern energy systems has led to the adoption of Smart Grids (SGs) that use advanced communication technologies to facilitate efficient, reliable, secure, and sustainable energy operation and management. Unlike existing surveys that often treat grid and communication domains separately, this work rigorously quantifies service requirements for high-complexity emerging scenarios. It provides a comprehensive overview of SG architecture that integrates digital communication infrastructure with distributed energy resources (DERs), microgrids, energy storage systems, and cybersecurity frameworks. Furthermore, emerging SG use cases such as smart distributed voltage control, real-time fault detection and self-healing, smart and autonomous monitoring, and predictive maintenance are identified, and more importantly, the service performance requirements associated with these use cases are quantified. Additionally, key capabilities and emerging SG enablers of fifth-generation (5G) and sixth-generation (6G) networks are described. These capabilities and enablers include network slicing, edge computing, spectrum management, artificial intelligence (AI) driven optimization, digital twins, and Open Radio Access Network (O-RAN). Finally, the paper discusses open challenges and future research directions for designing scalable, intelligent, and secure next-generation SG systems.

2606.04943 2026-06-04 eess.AS eess.SP

Differentiable Articulatory Copy-Synthesis of Biphonic Singing

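Mateo Cámara, María Pilar Daza-Llin, Fernando Marcos-Macías, José Luis Blanco

AI summary: Proposes a differentiable Kelly-Lochbaum waveguide model with a sublingual second source, cubic B-spline tract parameterization, and spatially varying learnable damping, optimized end-to-end from audio by gradient descent; it achieves copy-synthesis of biphonic singing (sygyt), reducing log-spectral distance by 30-38% relative to an articulatory baseline on 20 segments.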

Comments: Accepted to DAFx 2026

Abstract

Sygyt is a Tuvan style of biphonic singing in which a low vocal drone is sustained while a high harmonic is selectively amplified in the 1-3 kHz region. Copy-synthesizing this effect remains challenging for articulatory models, since it requires fine control of narrowly focused resonances that standard low-dimensional tract parameterizations cannot easily reproduce. We address this problem with a differentiable Kelly-Lochbaum waveguide augmented with a sublingual second source, cubic B-spline tract parameterization, and spatially varying learnable damping, optimized end-to-end by gradient descent from audio. On 20 segments from two independent sygyt datasets (5 singers, 10 pitches), the proposed model reduces log-spectral distance by 30-38% relative to an articulatory baseline, with the largest gains concentrated in the overtone region. Cepstral-envelope analysis further shows more accurate recovery of the merged formant structure characteristic of sygyt production. The model also outperforms a DDSP harmonic-plus-noise baseline with direct per-harmonic spectral control, suggesting that explicit acoustic structure is a useful inductive bias for overtone-singing copy-synthesis.
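
The log-spectral distance reported above has several variants in the literature. The following is a minimal single-frame sketch, assuming the common root-mean-square form over dB differences of power spectra; the paper's exact definition may differ, and the function name and the `eps` guard are illustrative.

```python
import math

def log_spectral_distance(power_ref, power_est, eps=1e-12):
    """RMS difference, in dB, between two power spectra of equal length.
    eps guards against taking the log of zero."""
    diffs = [
        10.0 * math.log10((p + eps) / (q + eps))
        for p, q in zip(power_ref, power_est)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For a full recording, one would typically compute this per short-time frame and average over frames.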

2606.04942 2026-06-04 eess.SP cs.NA math.NA

Encounter Geometry Effects on Space-Based Laser Debris Remediation and Estimation

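Matthew C. Fox, Gavin M. Baker, David O. Williams Rogers, Hang Woon Lee

AI summary: Addresses how the laser-debris encounter geometry affects remediation effectiveness and estimation accuracy in space-based laser debris remediation, proposing a joint ablation-and-estimation methodology and analyzing how performance varies across geometric configurations.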

Comments: 40 pages

Abstract

The escalating accumulation of orbital debris poses a critical threat to future space operations. Space-based lasers leveraging laser ablation have emerged as a promising approach for mitigating debris proliferation and preserving the orbital environment. Current literature, however, treats space-based laser debris remediation as a deterministic problem, assuming that momentum transfer and the resulting debris perturbations are precisely known. In reality, laser-to-debris engagement outcomes are inherently stochastic due to partially known debris characteristics. Compounding this challenge, estimating critical laser-matter parameters in situ, such as the momentum coupling coefficient, requires ablation that consequently perturbs the debris trajectory. This establishes a coupled ablation-and-estimation problem in which the laser platform and target debris encounter geometry influences remediation effectiveness and estimation accuracy. To address this problem, we present a joint ablation-and-estimation methodology that provides insights into the driving factors that make different encounter geometries improve or degrade overall remediation and estimation performance. Results across multiple coplanar and out-of-plane encounter geometries demonstrate how periapsis-lowering capacity, linear system observability, and nonlinear estimation performance evolve as laser parameters and relative orbit geometry vary. By identifying the key drivers behind these metrics, this study highlights critical considerations for the safe and effective operation of space-based lasers under uncertainty.

2606.04939 2026-06-04 eess.AS

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

Hui Wang, Yifan Yang, Zeyue Tian, Yuhang Jia, Jinghua Zhao, Long Zhou, Bing Han, Cheng Liu, Jiaming Zhou, Geng Tu, Yong Qin

AI summary: Proposes UAT, a framework that unifies audio generation, editing, and captioning through continuous latent diffusion and masked discrete diffusion, achieving competitive captioning performance while retaining high-fidelity audio synthesis.

Abstract

Audio generation and audio-to-text understanding remain largely separate, with diffusion models dominating high-fidelity synthesis and autoregressive (AR) language models driving captioning and semantic prediction. Existing unified approaches typically rely on either heterogeneous modules or AR-centric modeling, which can hinder joint optimization and limit acoustic fidelity. We present UAT, to our knowledge, the first diffusion-centric framework that supports unified audio generation, editing, and captioning. UAT couples continuous latent diffusion for audio with masked discrete diffusion for text, enabling bidirectional audio-text modeling within a shared dual-stream backbone. Experiments show that UAT preserves strong audio generation and editing capabilities while achieving competitive captioning performance, demonstrating a favorable balance between acoustic synthesis and semantic prediction. Demo samples are available at https://UAT-demo.github.io.

2606.04921 2026-06-04 cs.SD eess.AS

SURF: Separation via Unsupervised Remixing Flow

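Henry Li, Robin Scheibler, Efthymios Tzinis, Matt Shannon, Arnaud Doucet, John R. Hershey

AI summary: Proposes SURF, an unsupervised flow matching method that learns source separation directly from mixtures, combining supervised flow matching with self-supervised regression and using a remixing step to bootstrap a student model; it sets a new state of the art on image and audio benchmarks.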

Comments: Accepted at ICML 2026

Abstract

The goal of single-channel source separation is to reconstruct $K$ sources given their mixture. In supervised settings where vast amounts of clean source data are available, this challenging, ill-posed problem has been addressed successfully by generative diffusion and flow-based prior models. However, access to such clean source samples is often limited, and even when available, supervised models are vulnerable to domain shifts. To bridge this gap, we present Separation via Unsupervised Remixing Flow (SURF), an unsupervised flow matching approach for source separation that learns directly from observed mixtures. This method relies on a novel combination of state-of-the-art supervised flow matching and regression-based self-supervised techniques. At a high level, starting from a teacher model, we utilize a "remixing" step to bootstrap the learning of a student flow model from the teacher's estimates. We provide insights into the objectives optimized by this approach and draw a novel connection to the Wake-Sleep algorithm. Empirical evaluations on image and audio benchmarks demonstrate that SURF establishes a new state-of-the-art, significantly outperforming existing unsupervised methods. See our demo page for examples. https://google.github.io/df-conformer/surf/

2606.04913 2026-06-04 eess.SP

Access Protocols for Segmented Waveguide-Enabled Pinching-Antenna Systems (SWANs)

Shan Shan, Chongjun Ouyang, Petar Popovski, Yuanwei Liu

AI summary: Proposes a two-stage access protocol framework for segmented waveguide-enabled pinching-antenna systems (SWANs) that uses SWAN-induced reconfigurable channel diversity as a protocol-level resource for uplink random access; with channel-oracle and access stages designed under three operating modes, it achieves low-overhead channel acquisition and improved multiuser access performance.

Abstract

This paper proposes an access protocol framework for segmented waveguide-enabled pinching-antenna systems (SWANs), which exploits SWAN-induced reconfigurable channel diversity as a protocol-level resource for uplink random access. The framework consists of two stages, a channel-oracle stage and an access stage, designed under three SWAN operating modes: (i) one-segment selection (OS), (ii) segment aggregation (SA), and (iii) segment multiplexing (SM). Specifically, in the channel oracle stage, the OS mode is adopted to acquire sparse pilot observations and infer the channel responses across the SWAN configuration space. In this way, high-dimensional uplink channel acquisition is recast as a low-dimensional geometric localization problem, thereby reducing pilot overhead while preserving channel reconstruction accuracy. For the access stage, we construct two oracle-guided access codebooks under the SA and SM modes, respectively, which address the tradeoff between hardware complexity and multiuser access resolution. In particular, the SA-based scheme supports single radio frequency (RF) chain access through randomized segment-group activation, whereas the SM-based R-access scheme exploits multiple RF chains to construct deterministic access slots and enhance collision resolution. Finally, our numerical results demonstrate that (i) the proposed two-stage framework improves access performance under the same training overhead, (ii) anchor densification is more effective than aggressive segment aggregation for SA, and (iii) SM-based R-access achieves deterministic coverage and higher throughput in moderate- and high-load regimes, whereas SA-based access remains attractive for low-complexity implementations.

2606.04872 2026-06-04 eess.SY cs.SY

Consistent Distributed Cooperative Localization for Ultra Large-Scale Multi-agent Systems

Leonardo Pedroso, W. P. M. H. Heemels, Pedro Batista

AI summary: Existing cooperative localization methods cannot combine performance, consistency, and scalability in ultra large-scale multi-agent systems; this paper proposes a fully distributed algorithm based on overlapping covariance intersection that achieves optimal conservative covariance propagation and is proven recursively consistent.

Abstract

Cooperative localization (CL) is fundamental in emerging multi-agent systems, where agents fuse local sensing data with exchanged information to estimate their own states. At a large scale, however, tracking cross-correlations becomes infeasible, preventing the use of optimal filters. Ignoring or underestimating these correlations leads to overconfident, and thus inconsistent, estimates. Existing CL algorithms achieve good performance and consistency typically at the expense of communication, computation, or memory that scales with the network size. This is incompatible with ultra large-scale systems (ULSS) - for example, satellite mega-constellations - where per-agent resources are limited and must remain independent of the number of agents. This reveals a critical gap: no existing CL method is simultaneously well-performing, consistent, and ULSS-scalable. This paper introduces a new CL framework that addresses this gap using the recently proposed overlapping covariance intersection methodology, which enables agents to exploit limited structural information about cross-correlations without compromising consistency. The resulting CL algorithm leads to optimal conservative covariance propagation using only locally available information. The method is fully distributed, scalable to an ultra large scale, and provably recursively consistent. Simulations demonstrate substantial performance improvement over state-of-the-art consistent CL approaches while preserving scalability.
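
As background, the classical covariance intersection rule that the overlapping variant builds on can be sketched in a few lines of Python. This is the standard two-estimate fusion for 2x2 covariances with a grid search over the weight, not the paper's overlapping covariance intersection algorithm; all names and the trace-minimization criterion are assumptions for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def covariance_intersection(xa, Pa, xb, Pb, grid=101):
    """Fuse estimates (xa, Pa) and (xb, Pb) with unknown cross-correlation.
    P^-1 = w*Pa^-1 + (1-w)*Pb^-1, x = P*(w*Pa^-1*xa + (1-w)*Pb^-1*xb),
    with w in [0, 1] chosen on a grid to minimize trace(P).
    Returns (x, P, w)."""
    Ia, Ib = inv2(Pa), inv2(Pb)
    best = None
    for i in range(grid):
        w = i / (grid - 1)
        info = [[w * Ia[r][c] + (1 - w) * Ib[r][c] for c in range(2)]
                for r in range(2)]
        P = inv2(info)
        tr = P[0][0] + P[1][1]
        if best is None or tr < best[0]:
            vec = [sum(w * Ia[r][c] * xa[c] + (1 - w) * Ib[r][c] * xb[c]
                       for c in range(2)) for r in range(2)]
            x = [P[r][0] * vec[0] + P[r][1] * vec[1] for r in range(2)]
            best = (tr, x, P, w)
    return best[1], best[2], best[3]
```

Because the fused covariance never claims more information than a convex combination of the inputs, the result stays consistent for any actual cross-correlation, at the cost of being conservative.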

2606.04869 2026-06-04 eess.SY cs.SY

Source Side Mitigation of AI Datacenter Power Fluctuations with a Hybrid Energy Storage System and Residual Differentiable Predictive Control

基于混合储能系统和残差可微预测控制的AI数据中心功率波动源侧缓解

Haiyang You, Chengwei Lou, Jin Yang

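AI summary: To counter power fluctuations caused by AI datacenter workloads, this work proposes a hybrid energy storage system with differentiable predictive control (HESS-DPC) framework in which a residual differentiable predictive policy, trained offline, corrects a rule-based baseline; on the NPCC 140-bus system it reduces grid-side residual deviations and cuts generator peak-to-peak frequency deviations by more than 80%.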

AI中文摘要

超大规模AI数据中心的快速增长在互联点引入了结构化的、由工作负载驱动的有功功率波动。这些波动对电网表现为时变扰动注入，无法通过传统的峰值或平均负荷表示来捕捉。为了在残余功率扰动传播到大电网之前减少它，本文提出了一种混合储能系统与可微预测控制（HESS-DPC）框架，用于数据中心侧的功率平滑。首先建立了一个工作负载驱动的扰动模型，将互联点负荷偏差表示为训练和微调工作负载的叠加，以捕捉可能激发发电机频率动态的结构化强迫输入。然后，一个基于频率的规则控制器将该偏差分配给电池储能系统（BESS）和超级电容器（SC），将能量主导分量分配给BESS，将快速变化分量分配给SC。为了克服固定频率分解的预测和约束限制，离线训练了一个残差可微预测控制策略，以在规则基线周围计算有限时域的命令修正，同时强制执行一步安全保护。在NPCC 140节点系统上的仿真表明，HESS-DPC在工作负载转换期间减少了电网侧残余偏差，改善了SC荷电状态在长时间运行中的可持续性，并将所有监测发电机的峰峰频率偏差降低了80%以上，受影响最严重的发电机响应从15.1 mHz降至1.3 mHz。这些结果证实，在数据中心互联点进行本地有功功率平滑可以显著减轻由AI工作负载引起的频率扰动。

英文摘要

The rapid growth of hyperscale AI datacenters introduces structured, workload-driven active-power fluctuations at the point of interconnection. These fluctuations appear to the grid as time-varying disturbance injections that cannot be captured by conventional peak- or average-load representations. To reduce the residual power disturbance before it propagates into the bulk power system, this paper proposes a hybrid energy storage system with differentiable predictive control (HESS-DPC) framework for datacenter-side power smoothing. A workload-driven disturbance model is first established, representing the point-of-interconnection load deviation as the superposition of training and fine-tuning workloads to capture the structured forcing inputs that can excite generator frequency dynamics. A frequency-based rule-based controller then allocates this deviation between a battery energy storage system (BESS) and a supercapacitor (SC), assigning the energy-dominant component to the BESS and the fast-varying component to the SC. To overcome the anticipation and constraint limitations of fixed-frequency decomposition, a residual differentiable predictive control policy is trained offline to compute finite-horizon command corrections around the rule-based baseline while enforcing a one-step safeguard. Simulations on the NPCC 140-bus system show that HESS-DPC reduces grid-side residual deviations during workload transitions, improves SC state-of-charge sustainability over extended operation, and reduces generator peak-to-peak frequency deviations by more than 80 percent across all monitored generators, with the worst-affected generator response falling from 15.1 mHz to 1.3 mHz. These results confirm that local active-power smoothing at the datacenter point of interconnection can substantially mitigate frequency disturbances caused by AI workloads.
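
The frequency-based allocation described in the abstract can be pictured with a first-order low-pass filter: the slow part of the load deviation goes to the BESS and the remainder to the SC. This is only a sketch under stated assumptions; the filter type, the smoothing factor `alpha`, and the function name are hypothetical, and the paper's rule-based controller and residual DPC policy are not reproduced here.

```python
def split_power(deviation, alpha=0.1):
    """Split a load-deviation series into BESS and SC commands.

    A first-order low-pass filter (assumed, with smoothing factor alpha)
    extracts the slow, energy-dominant component for the BESS; the
    fast-varying residual is assigned to the supercapacitor.
    """
    bess, sc = [], []
    low = 0.0
    for p in deviation:
        low += alpha * (p - low)   # slow component tracked by the filter
        bess.append(low)
        sc.append(p - low)         # fast residual
    return bess, sc
```

For a step change in load, the SC absorbs most of the initial jump and hands the sustained part over to the BESS as the filter settles.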

2606.04849 2026-06-04 cs.IT eess.SP math.IT

Dynamic FDD for Spectrum Sharing in Non-Terrestrial Networks

非地面网络中频谱共享的动态FDD

Sourav Mukherjee, Bho Matthiesen, Armin Dekorsy, Petar Popovski

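AI summary: To manage interference among dense low Earth orbit satellite constellations operating over shared frequency bands, this work proposes dynamic FDD band assignment, jointly optimized with user scheduling and power allocation, achieving up to 30% higher throughput than conventional FDD.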

Comments: Submitted to IEEE for possible publication. The associated code for reproducibility with DOI is available at: https://doi.org/10.5281/zenodo.20307458

AI中文摘要

未来6G网络预计将集成低地球轨道卫星巨型星座，以实现无缝的全球连接，特别是在服务不足和偏远地区。然而，密集巨型星座的部署在共享频段上运行的卫星之间引入了干扰。这为研究频谱共享提供了一个相当新的设置，加剧了基于下行和上行传输固定频带的传统FDD系统灵活性有限的问题。我们解决了这个频谱共享问题，并提出了动态重新分配FDD频带的方法，以改善密集部署中的干扰管理，同时评估了该方法的性能增益。为此，我们制定了一个联合优化问题，其中包含两个方向上的动态频带分配、用户调度和功率分配。这个非凸混合整数问题通过等价变换、交替优化和最先进的工业级混合整数求解器相结合的方法来解决。数值结果表明，所提出的动态FDD频带分配方法显著提高了系统性能，相比传统FDD，在密集部署中吞吐量提升高达30%。

英文摘要

Future 6G networks are envisioned to integrate low Earth orbit satellite mega-constellations to enable seamless global connectivity, particularly in underserved and remote areas. However, the deployment of dense mega-constellations introduces interference among satellites operating over shared frequency bands. This is a relatively new setting for studying spectrum sharing, and it exacerbates the limited flexibility of conventional FDD systems, which rely on fixed bands for downlink and uplink transmissions. We address this spectrum-sharing problem by proposing dynamic re-assignment of FDD bands for improved interference management in dense deployments, and we evaluate the performance gain of this approach. To this end, we formulate a joint optimization problem that incorporates dynamic band assignment, user scheduling, and power allocation in both directions. This non-convex mixed-integer problem is solved using a combination of equivalence transforms, alternating optimization, and state-of-the-art industrial-grade mixed-integer solvers. Numerical results demonstrate that the proposed dynamic FDD band assignment significantly enhances system performance over conventional FDD, achieving up to 30% improvement in throughput in dense deployments.

2606.04790 2026-06-04 eess.SY cs.SY math.OC

A model-free approach to control barrier functions for higher-order systems

高阶系统的无模型控制障碍函数方法

Lukas Lanza, Johannes Köhler, Dario Dennstädt, Thomas Berger, Karl Worthmann

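AI summary: For nonlinear systems of arbitrary relative degree, this work proposes a model-free control barrier function design that requires neither a dynamic model nor state measurements, and uses funnel control techniques to guarantee safety.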

2606.04787 2026-06-04 eess.SY cs.SY math.OC

Towards Guaranteed Optimal PID Tuning for Uncertain Nonlinear Systems

面向不确定非线性系统的保证最优PID整定

Jingru Zhu, Cheng Zhao, Lei Guo

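AI summary: For uncertain nonlinear systems, this work proposes HRS-KW, an iterative learning algorithm that combines hysteretic random search with the Kiefer-Wolfowitz algorithm; using only input-output data, it achieves near-optimal PID tuning while guaranteeing closed-loop stability and almost sure convergence to an ε-optimal solution.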

Comments: Accepted by IFAC World Congress 2026

AI中文摘要

尽管PID控制器在工程实践中广泛应用，但设计最优PID参数长期以来在理论和实践中都被视为一个具有挑战性的问题，尤其是在面对不确定非线性动力系统时。基于作者近期为MIMO非线性不确定系统建立的PID控制理论（Zhao and Guo, 2022），该理论为PID控制系统的全局稳定性提供了具体的PID参数集，本文进一步提出了一种近最优PID整定方法，其中仅可获得关于控制性能的输入输出（零阶）数据。该整定方法被表述为一个约束优化问题，并通过一种称为HRS-KW算法的迭代学习算法求解，该算法结合了滞回随机搜索和Kiefer-Wolfowitz算法，旨在利用全局探索和局部梯度加速两者的优势。该方法无需系统动力学的精确结构知识，但理论上可以保证其几乎必然收敛到PID参数的ε-最优解，同时确保闭环系统稳定性。仿真结果表明，我们的HRS-KW算法优于其他相关优化方法，表现出更好的收敛到指定ε-最优性能集的能力。

英文摘要

Despite the widespread use of PID controllers in engineering practice, designing optimal PID parameters has long been regarded as a challenging problem in both theory and practice, particularly when faced with uncertain nonlinear dynamical systems. Based on the authors' PID control theory established recently for MIMO nonlinear uncertain systems (Zhao and Guo, 2022), which provides a concrete PID parameter set for global stability of PID controlled systems, this paper further proposes a near-optimal PID tuning method, where only input-output (zeroth-order) data on the control performance is available. The tuning method is formulated as a constrained optimization problem and solved by an iterative learning algorithm, referred to as HRS-KW algorithm, that combines a hysteretic random search with the Kiefer-Wolfowitz algorithm, aiming at utilizing the advantages of both global exploration and local gradient acceleration. This method operates without requiring precise structural knowledge of the system dynamics, yet its almost sure convergence to an epsilon-optimal solution for the PID parameters can be guaranteed in theory while ensuring closed-loop system stability. Simulation results illustrate that our HRS-KW algorithm outperforms other related optimization methods, exhibiting better convergence to the prescribed epsilon-optimal performance set.
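
To illustrate the zeroth-order ingredient named in the abstract, here is a minimal sketch of the Kiefer-Wolfowitz finite-difference recursion applied to three gains. It covers only the local gradient-like step: the hysteretic random search, the stabilizing parameter set, and the actual closed-loop cost from the paper are not reproduced. The cost `toy_cost`, its target gains, and the step-size constants are hypothetical placeholders.

```python
def kiefer_wolfowitz(cost, theta0, iterations=2000, a=0.3, c=0.1):
    """Minimize `cost` using only function evaluations (zeroth-order data).

    Step sizes a_k = a / k and perturbations c_k = c / k**(1/3) follow the
    usual decaying schedules; the constants are illustrative assumptions.
    """
    theta = list(theta0)
    for k in range(1, iterations + 1):
        a_k = a / k
        c_k = c / k ** (1.0 / 3.0)
        grad = []
        for i in range(len(theta)):
            plus = list(theta)
            minus = list(theta)
            plus[i] += c_k
            minus[i] -= c_k
            # Central finite difference replaces the unavailable gradient.
            grad.append((cost(plus) - cost(minus)) / (2.0 * c_k))
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


def toy_cost(gains):
    """Hypothetical stand-in for a measured control-performance index."""
    target = (2.0, 0.5, 0.1)  # assumed best (kp, ki, kd), illustration only
    return sum((g - t) ** 2 for g, t in zip(gains, target))
```

In a real tuning loop, each call to `cost` would correspond to running the closed-loop system and measuring its performance, which is why the number of evaluations per iteration matters.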

2606.04775 2026-06-04 cs.LG cs.AI cs.CV cs.SY eess.SY math.OC

Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control

通过降阶线性最优控制引导视频生成模型的激活

Jihoon Hong, Alice Chan, Qiyue Dai, Julian Skifstad, Glen Chou

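AI summary: Proposes LA-LQR, a framework that models text-to-video inference as a dynamical system and applies reduced-order optimal control for minimally invasive activation steering, reducing unsafe generations while preserving visual quality.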

AI中文摘要

在大规模网络数据上训练的文本到视频（T2V）模型可能生成不良内容，这促使我们进行干预以减少有害输出而不牺牲视觉质量。激活引导提供了一种有吸引力的机制替代微调和提示过滤，但现有的T2V引导方法仍然有限，通常采用粗糙的、非预测性的干预，可能导致过度引导和内容退化。为了弥补这一差距，我们提出了潜在激活线性二次型调节器（LA-LQR），一种用于最小侵入性T2V引导的降阶最优控制框架。LA-LQR将T2V推理表述为一个动态系统，并计算闭环反馈干预，将激活引导向期望的特征设定点，同时惩罚不必要的扰动。为了使最优控制对高维视频激活可行，我们将激活投影到由对比提示对导出的低维、任务相关子空间，估计该潜在空间中的局部线性动力学，并求解潜在LQR问题以获得时间步和层特定的引导信号。我们提供了将潜在设定点跟踪与原始激活空间特征控制联系起来的理论界限，并实证验证了降阶潜在动力学的保真度。在概念引导和视频安全基准测试中，LA-LQR相对于基线减少了不安全生成，同时保持了提示保真度和视觉质量。

英文摘要

Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanistic alternative to finetuning and prompt filtering, but existing T2V steering methods remain limited, typically applying coarse, non-anticipative interventions that can lead to oversteering and content degradation. To close this gap, we propose Latent Activation Linear-Quadratic Regulator (LA-LQR), a reduced-order optimal control framework for minimally invasive T2V steering. LA-LQR formulates T2V inference as a dynamical system and computes closed-loop feedback interventions that steer activations toward desired feature setpoints while penalizing unnecessary perturbations. To make optimal control feasible for high-dimensional video activations, we project activations onto a low-dimensional, task-relevant subspace derived from contrastive prompt pairs, estimate local linear dynamics in this latent space, and solve a latent LQR problem to obtain timestep- and layer-specific steering signals. We provide theoretical bounds relating latent setpoint tracking to raw activation-space feature control, and empirically validate the fidelity of the reduced latent dynamics. On concept steering and video safety benchmarks, LA-LQR reduces unsafe generations relative to baselines, while preserving prompt fidelity and visual quality.
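
The closed-loop idea behind an LQR-based steering signal can be sketched with a scalar finite-horizon LQR. This is a toy under strong assumptions: a one-dimensional latent state, known linear error dynamics `e[t+1] = a*e[t] + b*u[t]`, and hypothetical function names. It does not reproduce LA-LQR's subspace projection, dynamics estimation, or layer-specific signals.

```python
def finite_horizon_lqr(a, b, q, r, horizon):
    """Backward Riccati recursion for cost sum(q*e^2 + r*u^2).

    Returns time-ordered feedback gains K_t for the control u_t = -K_t * e_t.
    """
    P = q                      # terminal cost weight
    gains = []
    for _ in range(horizon):
        K = (b * P * a) / (r + b * b * P)
        P = q + a * a * P - a * b * P * K
        gains.append(K)
    gains.reverse()            # recursion runs backward in time
    return gains


def steer(z0, setpoint, gains, a, b):
    """Drive a scalar latent feature toward `setpoint` (error coordinates)."""
    e = z0 - setpoint
    for K in gains:
        u = -K * e             # small, state-dependent intervention
        e = a * e + b * u
    return setpoint + e
```

The weight `r` penalizes the intervention itself, which is the mechanism by which an LQR formulation discourages unnecessary perturbations.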

2606.04770 2026-06-04 eess.SP

WiSER: A Wireless Scene Encoder for Geometry-Grounded Multi-View Wireless Prediction

WiSER: 一种用于几何基础的多视角无线预测的无线场景编码器

Jing Qiao, Yiyang Guo, Hao Ye

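AI summary: Proposes WiSER, a model that uses a sparse voxel scene representation and a transmitter-conditioned scene memory to jointly predict radio maps and multipath channel impulse responses; on a dataset built from ScanNet++ scenes and Sionna ray tracing, it outperforms baseline methods.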

AI中文摘要

室内无线传播由三维场景几何、无线电材料属性以及发射器和接收器配置之间的相互作用决定，这些因素共同决定了整体覆盖行为和路径级多径结构。然而，大多数基于学习的特定位置预测方法针对单一无线表示设计，例如无线电图估计或信道脉冲响应（CIR）预测，因此没有明确利用跨异构无线视图共享的传播结构。本文介绍了WiSER，一种用于联合无线电图和多径CIR预测的无线场景编码器。WiSER将室内场景的稀疏体素表示和发射器位置映射为发射器条件化的稀疏3D场景记忆，该记忆由两个结构感知解码器查询：用于密集接收器平面路径增益预测的射线走廊解码器，以及用于可变基数延迟和功率抽头预测的检测变换器（DETR）风格集合解码器。为了训练和评估这一设置，我们使用ScanNet++室内场景和Sionna射线追踪构建了一个共同注册的室内场景和无线数据集流程，在共同坐标系和传播配置下生成对齐的稀疏体素输入、密集无线电图标签和无序多径CIR抽头集合。实验结果表明，WiSER优于特定场景的无线电图基线，并在匹配延迟和功率预测上显著优于参考CIR基线。这些结果表明，发射器条件化的稀疏3D场景表示可以作为可重用的无线场景编码器，用于异构传播查询，为AI原生无线系统的表示学习和基础模型开发提供了几何基础的一步。

英文摘要

Indoor wireless propagation is governed by the interaction among three-dimensional (3D) scene geometry, radiomaterial properties, and transmitter and receiver configuration, which jointly determine both aggregate coverage behavior and path-level multipath structure. However, most learning-based site-specific prediction methods are designed for a single wireless representation, such as radiomap estimation or channel impulse response (CIR) prediction, and therefore do not explicitly exploit the propagation structure shared across heterogeneous wireless views. This paper introduces WiSER, a Wireless Scene Encoder for joint radiomap and multipath CIR prediction. WiSER maps a sparse voxel representation of an indoor scene and a transmitter location into a transmitter-conditioned sparse 3D scene memory, which is queried by two structure-aware decoders: a ray-corridor decoder for dense receiver-plane path-gain prediction and a Detection Transformer (DETR)-style set decoder for variable cardinality delay and power tap prediction. To train and evaluate this setting, we construct a co-registered indoor scene and wireless dataset pipeline using ScanNet++ indoor scenes and Sionna Ray Tracing, producing aligned sparse voxel inputs, dense radiomap labels, and unordered multipath CIR tap sets under a common coordinate frame and propagation configuration. Experimental results show that WiSER outperforms scene-specific radiomap baselines and substantially improves matched delay and power prediction over reference CIR baselines. These results suggest that transmitter-conditioned sparse 3D scene representations can serve as reusable wireless scene encoders for heterogeneous propagation queries, providing a geometry-grounded step toward representation learning and foundation-model development for AI-native wireless systems.

2606.04756 2026-06-04 eess.SP

Ultra-precise TDoA-based Localization of Frequency Hopping LPWAN Transmitters

基于TDoA的跳频LPWAN发射器超精确定位

Thomas Maul, Alfred Mueller, Sebastian Klob, Joerg Robert

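AI summary: Proposes a time difference of arrival (TDoA) localization scheme that uses signals of opportunity (SoO) to synchronize base stations with sub-meter accuracy and overcomes the loss of phase information caused by frequency hopping (FH), achieving 2D localization accuracy below 10 m in line-of-sight (LOS) scenarios.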

AI中文摘要

物联网（IoT）是一个高度新兴的市场。它是数字孪生或工业场景中资产跟踪等多种应用的关键推动者。这通常需要提供精确的位置信息。然而，由于高能耗和室内应用，全球导航卫星系统（GNSS）等系统被排除在外。为了弥补这一差距，人们讨论了多种系统。为了有助于研究可能的黄金标准，本文讨论了基于低功耗广域网（LPWAN）的定位。因此，提出了一种基于LPWAN标准ETSI TS 103 357中到达时间差（TDoA）测量的概念。本文解决了两个主要挑战。首先，TDoA测量要求接收基站具有高精度的时间同步。在这项工作中，通过利用机会信号（SoO）作为同步源解决了这一问题，实现了亚米级的同步精度。另一个问题源于发射端点的跳频（FH）波形，导致相位信息丢失，从而损失可用的定位带宽。引入了一种方法克服这一限制。本文阐述了系统概念，在理论研究和仿真中证明了其功能。最后，实际测量验证了其功能，并显示在视距（LOS）场景下二维定位精度低于10米。

英文摘要

The Internet of Things (IoT) is a rapidly emerging market. It serves as a key enabler for a variety of applications, such as digital twins or asset tracking in industrial scenarios, which often require precise position information. However, systems like Global Navigation Satellite Systems (GNSS) are ruled out by their high energy cost and by indoor use. A variety of systems has been discussed to close this gap. To contribute to the investigation of possible gold standards, this paper discusses localization based on Low Power Wide Area Networks (LPWAN) and presents a concept based on Time Difference of Arrival (TDoA) measurements within the LPWAN standard ETSI TS 103 357. The paper addresses two major challenges. First, TDoA measurements require highly precise time synchronization of the receiving base stations. This issue is solved by exploiting Signals of Opportunity (SoO) as a synchronization source, enabling sub-meter synchronization accuracy. Second, the Frequency Hopping (FH) waveform of the transmitting endpoints results in a loss of phase information and thus of usable localization bandwidth. A method is introduced to overcome this limitation. The paper describes the system concept and demonstrates its functionality in theoretical investigations and simulations. Finally, real-world measurements verify the functionality and show a 2D localization accuracy below 10 m in Line of Sight (LOS) scenarios.
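
As a rough picture of what TDoA localization computes, the sketch below recovers a 2D position from arrival-time differences by brute-force grid search. The anchor layout, grid size, and function names are assumptions for illustration; the paper's synchronization via signals of opportunity and its handling of frequency hopping are not modeled. Since radio waves travel about 0.3 m per nanosecond, the sketch also hints at why base-station timing must be so precise.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tdoas(point, anchors):
    """Arrival-time differences of each anchor relative to anchors[0]."""
    d = [math.dist(point, a) for a in anchors]
    return [(di - d[0]) / SPEED_OF_LIGHT for di in d[1:]]


def locate(measured, anchors, size=100, step=1.0):
    """Grid search for the point whose predicted TDoAs best match `measured`.

    Searches the square [0, size] x [0, size] with the given step in meters.
    """
    best, best_err = None, float("inf")
    n = int(size / step)
    for ix in range(n + 1):
        for iy in range(n + 1):
            p = (ix * step, iy * step)
            err = sum((m - t) ** 2 for m, t in zip(measured, tdoas(p, anchors)))
            if err < best_err:
                best, best_err = p, err
    return best
```

A practical solver would replace the grid search with an iterative least-squares method; the exhaustive search is used here only because it is easy to verify.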

2606.04730 2026-06-04 cs.CL eess.AS

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

多语言长篇语音指令跟随：KIT 在 IWSLT 2026 的提交

Enes Yavuz Ugan, Maike Züfle, Yuka Ko, Supriti Sinhamahapatra, Fabian Retkowski, Seymanur Akti, Jan Niehues, Alexander Waibel

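AI summary: Proposes a general data augmentation pipeline that converts short-form speech corpora into long-form training data through segment concatenation, LLM-based label generation, and cross-lingual translation, and combines likelihood with Minimum Bayes Risk decoding to prevent likelihood-based re-ranking from degrading semantic tasks on long-form speech.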

Comments: 9 pages main paper, IWSLT 2026 Instruction Following track

AI中文摘要

随着大语言模型的出现，单任务和基于标记的多任务模型已演变为基于指令的系统，该系统从自然语言提示中隐式推断任务和目标语言。这一趋势反映在IWSLT的指令跟随赛道中，该赛道今年引入了包括未知惊喜任务在内的新任务，对已知任务的过拟合构成了真正的挑战。我们展示了KIT在无约束设置下对长指令和短指令跟随赛道的提交。我们的方法结合了一个通用数据增强流水线，通过片段拼接、基于LLM的标签生成和跨语言翻译将短语音语料转换为长语音训练数据，在六个任务和四种语言上产生了超过100万个实例。我们进一步表明，基于似然的重新排序虽然对ASR非常有效，但会系统地降低语义任务，通过选择从分段音频处理而非整体长语音推理中生成的候选者，这一失败模式通过将似然与最小贝叶斯风险解码相结合得以解决。

英文摘要

With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruction Following Track, which this year introduced new tasks including an unknown surprise task, posing a genuine challenge against overfitting to known tasks. We present KIT's submission to the Long and Short Instruction Following tracks in the unconstrained setting. Our approach combines a general data augmentation pipeline that converts short-form corpora into long-form training data through segment concatenation, LLM-based label generation, and cross-lingual translation, yielding over 1M instances across six tasks and four languages. We further show that likelihood-based re-ranking, while highly effective for ASR, systematically degrades semantic tasks by spuriously selecting candidates generated from segmented audio processing rather than holistic long-form inference, a failure mode resolved by combining likelihood with Minimum Bayes Risk decoding.
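
Minimum Bayes Risk (MBR) decoding, mentioned at the end of the abstract, selects the candidate that agrees most with the other candidates rather than the one with the highest model likelihood. The sketch below shows that selection rule only. The token-level Jaccard utility and the function names are assumptions, and the way the submission combines likelihood with MBR is not reproduced.

```python
def jaccard(a, b):
    """Token-set overlap between two strings (a deliberately simple utility)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def mbr_select(hypotheses):
    """Return the hypothesis with the highest mean utility against the rest."""
    best_index, best_utility = 0, float("-inf")
    for i, hyp in enumerate(hypotheses):
        others = [h for j, h in enumerate(hypotheses) if j != i]
        utility = sum(jaccard(hyp, o) for o in others) / max(len(others), 1)
        if utility > best_utility:
            best_index, best_utility = i, utility
    return hypotheses[best_index]
```

In practice the utility would be a task metric or a learned similarity score; an outlier candidate scores poorly under MBR even if the model assigns it a high likelihood.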

2606.04725 2026-06-04 eess.SY cs.SY math.OC

GPU-Accelerated Direct Transcription-Based Nonlinear Model Predictive Control

基于直接转录的GPU加速非线性模型预测控制

Evelyn Gondosiswanto, Joshua L. Pulsipher

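AI summary: Proposes a GPU-accelerated framework for nonlinear model predictive control (NMPC) based on direct transcription and second-order interior-point methods; a parametric interior-point formulation reuses structure-dependent computations across re-solves, yielding over an order-of-magnitude speedup in total NMPC run time.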

AI中文摘要

本文提出了一种基于直接转录和二阶内点法的非线性模型预测控制（NMPC）的GPU加速框架。许多实际系统表现出非线性动力学，无法通过线性模型准确捕捉，这促使了NMPC的使用。然而，NMPC需要重复实时求解最优控制问题（OCP），这些问题在转录后成为计算量大的大规模非线性规划（NLP）。尽管GPU加速已成为非线性优化的一种有前景的方法，但现有的基于GPU的NMPC工作流在每次求解时都会重构结构相同的OCP。这引入了大量开销，尽管连续求解仅因更新的系统测量或参考轨迹而不同。为了解决这一限制，我们引入了一种参数化内点公式，该公式利用了转录OCP的固定结构，从而能够在重新求解中重用结构相关的计算（例如，稀疏Cholesky分解中的符号分解）。我们在蒸馏塔和二维加热板基准上，与最先进的CPU和GPU配置进行了评估。结果表明，该框架在总NMPC运行时间上实现了超过一个数量级的加速。这些改进主要由每次迭代求解时间的减少驱动，与基线相比，GPU执行实现了高达94%的减少。总体而言，结果证明了在GPU加速的NMPC中利用重复问题结构的有效性，并突显了所提出框架扩展实时NMPC应用范围的潜力。

英文摘要

In this paper, we present a GPU-accelerated framework for nonlinear model predictive control (NMPC) based on direct transcription and second-order interior-point methods. Many real-world systems exhibit nonlinear dynamics that cannot be accurately captured by linear models, motivating the use of NMPC. However, NMPC requires the repeated real-time solution of optimal control problems (OCP), which become computationally demanding large-scale nonlinear programs (NLPs) after transcription. Although GPU acceleration has emerged as a promising approach for nonlinear optimization, existing GPU-based NMPC workflows reconstruct structurally identical OCPs at each solve. This introduces substantial overhead even though successive solves differ only through updated system measurements or reference trajectories. To address this limitation, we introduce a parametric interior-point formulation that exploits the fixed structure of transcribed OCPs, enabling reuse of structure-dependent computations (e.g., symbolic factorization in sparse Cholesky) across re-solves. We evaluate the proposed framework on distillation column and 2D heated plate benchmarks against state-of-the-art CPU and GPU configurations. The results show that the framework achieves over an order-of-magnitude speedup in total NMPC run times. These improvements are primarily driven by reduced per-iteration solve times, with GPU execution achieving up to a 94% reduction compared to the baseline. Overall, the results demonstrate the effectiveness of exploiting repeated problem structure in GPU-accelerated NMPC and highlight the potential of the proposed framework to expand the envelope of real-time NMPC applications.

2606.04698 2026-06-04 eess.SP

Adaptive $c_2$-Perturbed AFDM Waveform Design for Integrated Sensing and Communication

面向集成感知与通信的自适应 $c_2$ 扰动 AFDM 波形设计

Shiqi Cui, Fan Zhang, Yuanshuo Gang, Zeping Sui, Tianqi Mao, Zhaocheng Wang

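AI summary: To address the high PAPR of AFDM waveforms and the imperfect autocorrelation sidelobes caused by random data symbols, this work proposes a real-time data-driven framework that optimizes the pre-chirp parameter $c_2$ to improve ISAC performance.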

Comments: 6 pages, 3 figures, submitted to IEEE GLOBECOM 2026

AI中文摘要

仿射频分复用（AFDM）是一种有前途的集成感知与通信（ISAC）系统波形，因其在时频双弥散信道中的优越性能。然而，AFDM 仍面临两个挑战：高 PAPR 和随机数据符号产生非理想的自相关旁瓣。为了解决这些挑战，本文提出了一种实时数据驱动框架，优化预啁啾参数 $c_2$ 以增强 AFDM-ISAC 性能。具体而言，我们构建了一个无侧信息优化问题，以降低 PAPR 以及非周期和周期自相关函数的加权积分旁瓣水平，其复杂度与传统 AFDM 接收机相当。此外，通过利用闭式梯度，开发了一种高效的非单调线搜索谱投影梯度算法。仿真结果表明，所提方法实现了优越的感知与通信权衡，并在严重的功率放大器非线性情况下能够获得提升的误码率性能。

英文摘要

Affine frequency division multiplexing (AFDM) is a promising waveform for integrated sensing and communication (ISAC) systems owing to its superior performance in time-frequency doubly dispersive channels. However, AFDM still faces two challenges: a high peak-to-average power ratio (PAPR), and imperfect autocorrelation sidelobes produced by random data symbols. To address these challenges, this paper proposes a real-time data-driven framework that optimizes the pre-chirp parameter $c_2$ to enhance AFDM-ISAC performance. Specifically, a side-information-free optimization problem is formulated to reduce the PAPR and the weighted integrated sidelobe levels of both aperiodic and periodic autocorrelation functions, with complexity comparable to that of the conventional AFDM receiver. Furthermore, an efficient non-monotone line-search spectral projected-gradient algorithm is developed by exploiting closed-form gradients. Simulation results demonstrate that the proposed method achieves a superior sensing-communication trade-off and improved bit error rate performance in the presence of severe power amplifier nonlinearity.
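
Since PAPR is one of the two quantities the optimization targets, a minimal sketch of how it is computed for a block of complex baseband samples may help. The function name is an assumption, and the AFDM modulation and the $c_2$ optimization themselves are not shown.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a sample block, in decibels."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)
```

A constant-envelope block gives 0 dB, while a block that concentrates all energy in one sample gives the largest possible value for its length, which is the kind of peak that drives a power amplifier into its nonlinear region.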

2606.04680 2026-06-04 eess.AS cs.CL cs.SD

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

读你所听：基于声学差异的无参考假设评估

Zhihan Li, Hankun Wang, Yiwei Guo, Bohan Li, Xie Chen, Kai Yu

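AI summary: Proposes READ, a metric that uses a pretrained autoregressive TTS model to compute the acoustic discrepancy between speech and text hypotheses, evaluating ASR hypotheses without reference transcripts and achieving up to 20% relative error rate reduction under noisy conditions.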

2606.04638 2026-06-04 eess.SY cs.SY

Mixed potential for nonlinear RLC circuits with memristors

含忆阻器的非线性RLC电路的混合势

Mauro Di Marco, Mauro Forti, Luca Pancioni, Giacomo Innocenti, Alberto Tesi

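AI summary: Introduces, for the first time, the concept of a mixed potential for RLC circuits with memristors (RLCM), establishes an equivalence principle through flux-charge domain analysis, and derives a compact form of the state equations.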

AI中文摘要

在1964年发表的两篇开创性文章中，Brayton和Moser引入了混合势的概念，作为描述和分析包含电阻、电容和电感的一类非线性RLC电路的基本理论工具。本文首次证明，对于也包含忆阻器的RLC电路（RLCM），可以引入混合势。这要求忆阻器电路不在传统的电压-电流域中分析，而是在通量-电荷域中分析。通量-电荷分析法（FCAM）在此扩展中起着关键作用，特别是通过FCAM在通量-电荷域的RLCM电路与电压-电流域的非线性RLC电路之间建立的等价原理。讨论了几个明确找到混合势的例子，包括含有忆阻器的基本电路，如带有忆阻器的蔡氏电路，以及具有神经结构的大规模忆阻器阵列。本文主要致力于为忆阻器电路引入混合势，并研究其主要理论性质，例如通过混合势在通量-电荷域中以有效且紧凑的形式写出电路状态方程的可能性。在配套论文[1]中，混合势被用于系统地获得RLCM电路收敛性的类李雅普诺夫结果。这些结果将扩展现有的收敛性结果，这些结果未涵盖忆阻器电路中电容和电感同时存在的重要情况。

英文摘要

In two seminal articles published in 1964, Brayton and Moser introduced the concept of a mixed potential as a fundamental theoretical tool to describe and analyze a class RLC of nonlinear circuits containing resistors, capacitors, and inductors. In this paper, it is shown for the first time that a mixed potential can be introduced for a class RLCM of RLC circuits that also contain memristors. This is possible provided that a memristor circuit is analyzed not in the traditional voltage-current domain but in the flux-charge domain. The flux-charge analysis method (FCAM) plays a crucial role in the extension. In particular, a key step is an equivalence principle, established via FCAM, between an RLCM circuit in the flux-charge domain and a nonlinear RLC circuit in the voltage-current domain. Several examples are discussed where the mixed potential is found explicitly. These include basic circuits with memristors, such as Chua's circuit with a memristor, and large-scale memristor arrays with a neural architecture. This paper is mainly devoted to introducing a mixed potential for memristor circuits and studying its main theoretical properties, such as the possibility of writing the circuit state equations in the flux-charge domain in an effective and compact form via the mixed potential. In a companion paper [1], the mixed potential is used to systematically obtain Lyapunov-like results on the convergence of RLCM circuits. Those results extend existing convergence results, which do not cover the important case where capacitors and inductors are simultaneously present in a memristor circuit.
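
For orientation, the classical Brayton-Moser result in the voltage-current domain writes the state equations of a nonlinear RLC circuit in terms of a single scalar mixed potential $P(i, v)$, with $i$ the inductor currents and $v$ the capacitor voltages. The symbols below follow the usual textbook convention and are an assumption, not the notation of this paper; the flux-charge counterpart derived in the paper is not reproduced here.

```latex
L(i)\,\frac{\mathrm{d}i}{\mathrm{d}t} = \frac{\partial P(i, v)}{\partial i},
\qquad
C(v)\,\frac{\mathrm{d}v}{\mathrm{d}t} = -\frac{\partial P(i, v)}{\partial v}
```

Here $L(i)$ and $C(v)$ denote the incremental inductance and capacitance matrices. The compact form referred to in the abstract is the analogue of these equations with flux and charge variables in place of $i$ and $v$.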

2606.04600 2026-06-04 eess.SP

Joint 3D Trajectory and Power Allocation for HAPs-UAV Bistatic ISARAC in Low-Altitude Networks

低空网络中HAPs-UAV双基地ISARAC的联合3D轨迹与功率分配

Bang Huang, Mohamed-Slim Alouini

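AI summary: For a HAPs-UAV bistatic ISARAC system in low-altitude networks, this work proposes joint 3D trajectory and power allocation that uses alternating optimization to maximize the ground-user sum rate subject to SAR imaging SNR and resolution constraints.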

AI中文摘要

本文研究了低空网络中高空平台（HAPs）-无人机（UAV）双基地集成合成孔径雷达（SAR）与通信（ISARAC）系统的联合三维（3D）轨迹规划和资源分配问题。在所提出的架构中，HAPs通过发射ISARAC波形用于地面用户通信，提供持久的广域连接，而低空UAV利用其近距离和机动性被动收集地面目标回波以实现高分辨率SAR成像。我们构建了一个地面用户和速率最大化问题，受限于严格的SAR成像信噪比（SNR）和分辨率要求、ISARAC传输的总能量预算以及UAV动态约束。该问题本质上是非凸的。为了解决它，开发了一种交替优化（AO）框架，其中固定UAV状态下的功率分配子问题具有闭式注水解，而固定发射功率下的UAV轨迹优化通过逐次凸近似（SCA）和凸差（DC）规划处理。仿真结果验证了所提方法的有效性，并展示了其在低空网络场景中联合支持持续通信覆盖和高分辨率感知的能力。

英文摘要

This paper investigates joint three-dimensional (3D) trajectory planning and resource allocation for a high-altitude platform (HAPs)-unmanned aerial vehicle (UAV) bistatic integrated synthetic aperture radar (SAR) and communication (ISARAC) system in low-altitude networks. In the proposed architecture, the HAPs provides persistent wide-area connectivity by transmitting ISARAC waveforms for ground-user communications, while a low-altitude UAV exploits its proximity and mobility to passively collect ground-target echoes for high-resolution SAR imaging. We formulate a sum-rate maximization problem for ground users subject to stringent SAR imaging signal-to-noise ratio (SNR) and resolution requirements, a total energy budget for ISARAC transmission, and UAV dynamic constraints. The resulting problem is inherently nonconvex. To tackle it, an alternating optimization (AO) framework is developed, where the power-allocation subproblem with fixed UAV states admits a closed-form water-filling solution, while the UAV trajectory optimization with fixed transmit powers is handled via successive convex approximation (SCA) and difference-of-convex (DC) programming. Simulation results verify the effectiveness of the proposed approach and demonstrate its capability to jointly support persistent communication coverage and high-resolution sensing in low-altitude network scenarios.
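
The water-filling structure mentioned for the power-allocation subproblem can be illustrated with the textbook version: maximize the sum of log(1 + g_i * p_i) under a total power budget. The sketch below is that classical form only; the channel gains, the bisection on the water level, and the function name are assumptions, and the paper's SAR imaging constraints and energy budget over time are not included.

```python
def water_filling(gains, total_power, iterations=100):
    """Classical water-filling: p_i = max(0, level - 1/g_i), sum(p_i) = budget.

    The water level is found by bisection, an illustrative choice.
    """
    inverse = [1.0 / g for g in gains]
    lo, hi = 0.0, total_power + max(inverse)
    for _ in range(iterations):
        level = 0.5 * (lo + hi)
        used = sum(max(0.0, level - inv) for inv in inverse)
        if used > total_power:
            hi = level
        else:
            lo = level
    level = 0.5 * (lo + hi)
    return [max(0.0, level - inv) for inv in inverse]
```

Stronger channels receive more power, and sufficiently weak channels receive none, which is the qualitative behavior a closed-form water-filling solution encodes.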

2606.04595 2026-06-04 eess.IV

KD-NVC: A Search-and-Distill Framework to Accelerate Neural Video Coding

KD-NVC：一种加速神经视频编码的搜索与蒸馏框架

Yuxiao Sun, Meiqin Liu, Chao Yao, Hui Xiang, Jingran Wu, Xianguo Zhang, Jian Jin, Weisi Lin, Yao Zhao

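AI summary: To reduce the high complexity that limits real-time neural video decoding on edge devices, this work proposes KD-NVC, a two-stage distillation framework that uses acceleration-efficiency-based neural architecture search (AE-NAS) and energy-aware feature distillation (EFD) to accelerate individual modules while preserving rate-distortion performance.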

Comments: This manuscript is submitted to IEEE Transactions

AI中文摘要

尽管神经视频编码（NVC）在率失真性能上取得了显著进展，但边缘设备上的实时解码已成为重要需求，却仍受限于高复杂度。知识蒸馏（KD）广泛用于模型加速，但其在NVC中的应用面临关键挑战。具体而言，NVC子模块的异质性使得统一的架构缩减效果不佳，需要针对每个模块进行设计以更好地权衡率失真与速度。然而，由于神经视频编解码器的训练成本高昂，通过现有神经架构搜索（NAS）算法搜索多样化架构是难以承受的。此外，在确定轻量级架构后，现有蒸馏方法忽略了由率约束引起的特征能量稀疏性，而这对于维持压缩性能至关重要。为解决这些问题，我们提出了一种两阶段蒸馏框架KD-NVC。在第一阶段，我们引入了一种基于加速效率的神经架构搜索（AE-NAS）算法。该算法探索模块级帕累托前沿，以自适应地在异构模块间分配加速预算。同时，它引入加速效率指标来确定最终的学生架构，而无需实际训练所有架构级候选。在第二阶段，我们设计了一种能量感知特征蒸馏（EFD）损失，该损失对齐教师和学生编解码器之间的空间聚合特征能量签名，传递率诱导的稀疏模式以实现更好的压缩效率。实验结果表明，所提框架持续优于现有的面向编解码器的蒸馏方法，并在RTX 5060上实现了1080p下69 FPS的解码，同时保持了与VTM-LDB相当的率失真性能。

英文摘要

While neural video coding (NVC) has achieved remarkable rate-distortion performance, real-time decoding on edge devices has become an important demand but remains limited by high complexity. Knowledge distillation (KD) is widely used for model acceleration, yet its application to NVC faces critical challenges. Specifically, the heterogeneity of NVC sub-modules renders uniform architectural reduction suboptimal, necessitating a per-module design for better rate-distortion-speed trade-off. However, searching for diverse architectures via existing neural architecture search (NAS) algorithms is unaffordable due to the expensive training cost of neural video codecs. Moreover, after the lightweight architecture is determined, existing distillation methods overlook the feature-energy sparsity induced by the rate-constraint, which is essential for maintaining compression performance. To address these issues, we propose a two-stage distillation framework KD-NVC. In the first stage, we introduce an acceleration-efficiency-based neural architecture search (AE-NAS) algorithm. It explores the module-wise Pareto frontier to adaptively allocate the acceleration budget across heterogeneous modules. Also, it introduces the acceleration-efficiency metric to determine the final student architecture without practically training all architecture-level candidates. In the second stage, we design an energy-aware feature distillation (EFD) loss that aligns the spatially-aggregated feature-energy signatures between the teacher and student codecs, transferring the rate-induced sparsity patterns for better compression efficiency. Experimental results demonstrate that the proposed framework consistently outperforms existing codec-oriented distillation methods, and achieves 69 FPS decoding at 1080p on RTX 5060 while maintaining comparable RD performance to VTM-LDB.

2606.04565 2026-06-04 eess.SY cs.SY

Implementation of a Misalignment-Tolerant MIMO Near Field Wireless Power Transfer System

一种容忍错位的MIMO近场无线电力传输系统的实现

Taroh Hijikata, Allan Jr Mesa, Charleston Dale Ambatali

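AI summary: Uses the Nelder-Mead iterative optimization algorithm to estimate input amplitude and phase settings for a MIMO configuration with four transmitting and two receiving elements, significantly improving near-field wireless power transfer efficiency under misalignment.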

Comments: 5 pages, 8 figures, submitted and accepted to the 2026 IEEE Wireless Power Transfer Conference and Expo (WPTCE)

AI中文摘要

反应近场无线电力传输（WPT）系统的效率随着分离距离的增加而迅速下降，并且对发射线圈和接收线圈之间的错位高度敏感。这些限制限制了供电设备的移动性，并将许多近场WPT应用局限于静态场景。为了解决这些挑战，研究了一种多输入多输出（MIMO）WPT配置，因为它能够塑造发射器和接收器之间的磁场分布。通过适当设置每个发射线圈的幅度和相位，可以实现最大功率传输效率；然而，确定这些最优设置需要准确了解系统的S参数。本文介绍了使用Nelder-Mead迭代优化算法来估计近场WPT系统中最大化传输效率的输入幅度和相位设置。该实现包括一个四元件发射器和一个两元件接收器。基于测量的S参数，所提出的方法在对准和错位条件下均显著提高了WPT效率。

英文摘要

The efficiency of reactive near-field wireless power transfer (WPT) systems degrades rapidly with increasing separation distance and is highly sensitive to misalignment between transmitting and receiving coils. These limitations restrict the mobility of powered devices and confine many near-field WPT applications to static scenarios. To address these challenges, a multiple-input multiple-output (MIMO) WPT configuration is investigated due to its capability to shape the magnetic field distribution between the transmitter and receiver. Maximum power transfer efficiency can be achieved by appropriately setting the amplitude and phase of each transmitting coil; however, determining these optimal settings requires accurate knowledge of the system's S-parameters. This paper presents the use of the Nelder-Mead iterative optimization algorithm to estimate the input amplitude and phase settings that maximize transfer efficiency in a near-field WPT system. The implementation comprises a four-element transmitter and a two-element receiver. Based on measured S-parameters, the proposed approach significantly improves WPT efficiency under both aligned and misaligned conditions.
