arXivDaily arXiv daily academic digest, updated Monday to Friday
EESS (Electrical Engineering and Systems Science) 237
2605.10879 2026-05-12 cs.IT cs.CR cs.NI eess.SP math.IT

Private Information Retrieval With Arbitrary Privacy Requirements for Graph-Based Storage

Mohamed Nomeir, Shreya Meel, Sennur Ulukus

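AI summary This paper reformulates the privacy requirement in the private information retrieval (PIR) problem to support more flexible privacy needs. It focuses on PIR over graph-based storage, where each server may have a different, arbitrary privacy requirement set of messages, rather than requiring every message to be private from every server. For two specific storage structures, the path graph and the cyclic graph, the authors analyze several privacy settings, with particular attention to privacy sets defined by a server's neighborhood range. This yields a smooth transition from local PIR to standard graph-replicated PIR, and the authors derive capacity bounds for the scenarios considered.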

English abstract

We reformulate the definition of privacy in the private information retrieval (PIR) problem to accommodate flexible privacy requirements. We focus on graph-replicated PIR, with a generalized privacy requirement, instead of requiring all messages to be private from all servers, during retrieval. Towards this, we define a privacy requirement set for each server, which can be an arbitrary subset of all message indices, as long as the stored message indices are in their privacy requirement set. Since both the storage and privacy requirement sets have many possibilities, we focus on two specific storage settings, namely the path and cyclic graphs. We consider several privacy settings for each of them, which are not necessarily the same, to give different examples for privacy sets. Of particular interest are the privacy sets that comprise the indices of messages stored at servers within a neighborhood range. The neighborhood range parameter allows a transition from the recently introduced local PIR [1] to the standard graph-replicated PIR. In these cases, we derive bounds on the capacity or find the exact capacity.

2605.10872 2026-05-12 cs.IT cs.CR cs.NI eess.SP math.IT

Local Private Information Retrieval: A New Privacy Perspective for Graph-Based Replicated Systems

Shreya Meel, Mohamed Nomeir, Sennur Ulukus

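AI summary This paper rethinks the notion of privacy in multi-server, graph-replicated private information retrieval (PIR) systems and proposes a new privacy model, local user privacy, in which the user needs to hide the desired message index from a server only if that server stores the corresponding message. The core of the work is to quantify the gain in communication efficiency of PIR under this local privacy and to establish the corresponding capacity results. The authors find that for disjoint unions of distinct graphs the local PIR capacity shows a multiplicative gain, give lower bounds for connected graphs, and derive the exact local PIR capacity for the cyclic graph and for the path graph with an odd number of vertices.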

English abstract

We rethink the definition of privacy in multi-server, graph-replicated private information retrieval (PIR) systems, and introduce a novel setting where the user's privacy is governed by the servers' storage structure. In particular, while retrieving a message from a server, the user is concerned with hiding their desired message index from the server, only if the server stores the corresponding message. We coin this privacy requirement as local user privacy and the resulting PIR problem as local PIR on the graph. Our goal is to measure the gain in communication efficiency of local PIR, compared to that of canonical PIR, by establishing its capacity, i.e., the maximum number of message symbols retrieved, per downloaded symbol. To this end, we observe a remarkable gain in the local PIR capacity of graphs, that are disjoint union of distinct graphs, which is multiplicative, compared to the PIR capacity, when the individual graphs are identical. For connected graphs, we propose schemes to establish capacity lower bounds for edge-transitive and bipartite graphs, which are greater than the best-known PIR capacity bounds. Finally, we derive the exact local PIR capacity for the cyclic graph, and the path graph with an odd number of vertices.

2605.10822 2026-05-12 cs.LG eess.SP

Benchmarking Sensor-Fault Robustness in Forecasting

Alexander Windmann, Philipp Wittenberg, Gianluca Manca, Marcel Dix, Jens U. Brandt, Oliver Niggemann

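AI summary This paper proposes SensorFault-Bench, a benchmark framework for evaluating the robustness of forecasting models under sensor faults. Using a standardized fault severity model and several real-world datasets, the study systematically evaluates different forecasting architectures and robustness-improvement methods across multiple fault scenarios, and shows that model rankings based on clean-data error can differ markedly from performance under actual fault scenarios. The framework also provides open-source code and data access interfaces, so follow-up work can be extended and compared under a unified protocol.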

English abstract

Cyber-physical system (CPS) forecasting models depend on sensor streams with noisy, biased, missing, or temporally misaligned readings, yet standard forecasting evaluation often selects models by nominal error without showing whether they remain robust under such faults. We introduce SensorFault-Bench, a shared CPS-grounded sensor-fault stress-test protocol for evaluating forecasting architectures and robustness-improvement methods, and an operational taxonomy organizing the method comparison. Across four real-world datasets and eight scored scenarios governed by a standardized severity model, it reports worst-scenario degradation, clean mean squared error (MSE), and worst-scenario fault-time MSE, separating relative robustness from absolute error. A disjoint fault-transfer split lets explicit fault-training methods train on adjacent fault families while evaluation uses separate benchmark scenarios. Empirically, forecasting architectures favored by clean MSE can degrade sharply under faults, and clean-MSE rankings can disagree with worst-scenario fault-time error rankings. Chronos-2, the evaluated zero-shot foundation-model representative, matches or trails the last-value naive forecaster in clean MSE on the two single-target datasets and has the largest worst-scenario degradation on ETTh1 and Traffic, where all channels are forecast targets. For the evaluated robustness-improvement method set, paired deltas show selective degradation reductions: projected gradient descent adversarial training and randomized training lead where value faults dominate observed degradation, while fault augmentation leads where availability faults dominate. SensorFault-Bench provides open-source code, documented data access, and reproduction and extension guides, so new datasets, architectures, and robustness-improvement methods can be evaluated under the same CPS sensor-fault robustness protocol.

2605.10772 2026-05-12 cs.CV cs.AI eess.IV

Towards a Large Language-Vision Question Answering Model for MSTAR Automatic Target Recognition

David F. Ramirez, Tim L. Overman, Kristen Jaskie, Marv Kleine, Andreas Spanias

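AI summary This paper studies the application of large language-vision models (LLVM) to target recognition in synthetic aperture radar (SAR) imagery, in particular automatic target recognition (ATR) of military vehicles. By building a training and evaluation benchmark based on the MSTAR Public Dataset and adding descriptive text and question-answer pairs, the authors explore LLVM performance on remote sensing image captioning and visual question answering (VQA). Experiments show that, with parameter-efficient fine-tuning, the model reaches 98% accuracy in identifying fine-grained target qualities, offering a new technical path for machine-assisted remote sensing target recognition in military and intelligence contexts.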

Comments Accepted to SPIE Defense + Commercial Sensing, Automatic Target Recognition XXXV

Journal ref
Proc. SPIE 13463, Automatic Target Recognition XXXV, 134630D (29 May 2025)
English abstract

Large language-vision models (LLVM), such as OpenAI's ChatGPT and GPT-4, have gained prominence as powerful tools for analyzing text and imagery. The merging of these data domains represents a significant paradigm shift with far-reaching implications for automatic target recognition (ATR). Recent transformer-based LLVM research has shown substantial improvements for geospatial perception tasks. Our study examines the application of LLVM to remote sensing image captioning and visual question-answering (VQA), with a specific focus on synthetic aperture radar (SAR) imagery. We examine newly published LLVM methods, including CLIP and LLaVA neural network transformer architectures. We have developed a work-in-progress SAR training and evaluation benchmark derived from the MSTAR Public Dataset. This has been extended to include descriptive text captions and question-answer pairs for VQA tasks. This challenge dataset is designed to push the boundaries of an LLVM in identifying nuanced ATR details in SAR imagery. Utilizing parameter-efficient fine-tuning, we train an LLVM method to identify fine-grained target qualities at 98% accuracy. We detail our data setup and experiments, addressing potential pitfalls that could lead to misleading conclusions. Accurately identifying and differentiating military vehicle types in SAR data poses a critical challenge, especially under complex environmental conditions. Mastering this target recognition skill may require a human analyst months of training and years of practice. This research represents a unique effort to apply LLVM to SAR applications, advancing machine-assisted remote sensing ATR for military and intelligence contexts.

2605.10745 2026-05-12 eess.SP

How Time-Sensitive are IoBNT Networks? An Age of Information Perspective for In-Body Monitoring

Jorge Torres Gómez

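AI summary From the perspective of information freshness (Age of Information, AoI), this work studies the time sensitivity of in-body nanosensor networks (IoBNT) for disease monitoring. Using a Markov model that captures cardiovascular physiology together with nanocommunication channel models, it evaluates the network's ability to keep information updated across sample generation, carrying, and delivery. Under reasonable assumptions, the monitoring device receives fresh information within tens of seconds, which suggests the network is suitable for monitoring tissue-level diseases such as bacterial infections, while more efficient architectures are needed to monitor faster cellular-level processes.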

Journal ref
Habilitation Thesis in Electrical Engineering, TU Berlin, Germany, 2026
English abstract

This thesis develops a theoretical framework to evaluate the monitoring capability of IoBNT networks. We consider a scenario in which nanosensors passively flow in the bloodstream and detect biomarkers associated with potential diseases, reporting their detections to external gateways on the skin that host a monitoring device. The nanosensors thus realize an artificial point-to-point communication channel between the disease region and the monitor: some packets reach the destination directly, while others are lost through vessel paths that bypass the gateway. We evaluate the network's monitoring capability over this artificial channel using the Age of Information (AoI) concept, which jointly integrates sample generation (at the disease region), carrying (nanosensor travel through vessels), and delivery (nanosensor-to-gateway) as random events. These are modeled through (i) a Markov model that follows cardiovascular physiology and (ii) channel models of reported nanocommunication technologies. We compute the Markov transition probabilities using a cardiovascular simulator built as a low-complexity electric circuit model of the human vessels. For the nanosensor-to-gateway link, we model two well-known schemes: ultrasonic and terahertz channels. Integrating these components within the AoI framework, we report information freshness via the average peak AoI (PAoI) metric. Under realistic physiological and communication assumptions, fresh information appears on the monitor within tens of seconds. The network is therefore suitable for monitoring tissue-level processes such as bacterial infections, while more adequate architectures are needed to monitor cellular-scale processes, which occur on timescales below tens of seconds.
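
As a rough illustration of the freshness metric named in this abstract, the sketch below computes an average peak AoI from a list of packet records. It assumes the common textbook definition (the age just before a delivery equals that delivery time minus the generation time of the previously delivered packet) and in-order delivery. The function name, the record layout, and the numbers are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple

# Each record is (generation_time, delivery_time); delivery_time is None
# for a packet that was lost, e.g. a nanosensor that bypassed the gateway.
Record = Tuple[float, Optional[float]]


def average_peak_aoi(records: List[Record]) -> float:
    """Average peak Age of Information over the delivered packets.

    Assumption: packets are delivered in generation order, so the age
    right before delivery k is delivery_k - generation_(k-1).
    """
    delivered = [(g, d) for g, d in records if d is not None]
    delivered.sort(key=lambda rec: rec[1])  # order by delivery time
    if len(delivered) < 2:
        raise ValueError("need at least two delivered packets")
    peaks = [
        d_next - g_prev
        for (g_prev, _), (_, d_next) in zip(delivered, delivered[1:])
    ]
    return sum(peaks) / len(peaks)


# Hypothetical timeline in seconds; the second packet is lost.
example = [(0.0, 5.0), (8.0, None), (10.0, 12.0), (20.0, 30.0)]
print(average_peak_aoi(example))
```

Lost packets lengthen the gap between deliveries, which is how loss on bypassing vessel paths would show up as a larger peak age in this simplified view.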

2605.10739 2026-05-12 eess.IV cs.AI cs.CV

Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model

David F. Ramirez, Tim Overman, Kristen Jaskie, Andreas Spanias

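AI summary This paper presents SMART-HC-VQA, a multimodal visual question answering dataset based on Sentinel-2 satellite imagery, for analyzing the spatiotemporal evolution of human activity. The dataset converts construction annotations, type labels, temporal-phase labels, and related information into natural language question-answer pairs, forming a temporally extended automatic target recognition and visual question answering challenge. The work also introduces a multi-image large language model training framework that can process multi-temporal remote sensing imagery and reason about it semantically, providing a reproducible foundation for understanding language-guided remote sensing activities.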

Comments Accepted to 2026 SPIE Defense + Security, Automatic Target Recognition XXXVI

English abstract

We introduce SMART-HC-VQA, a Sentinel-2-based visual question answering dataset derived from the IARPA SMART Heavy Construction dataset, designed for spatiotemporal analysis of human activity. The dataset transforms construction-site annotations, construction-type labels, temporal-phase labels, geographic metadata, and observation relationships into natural language question-answer triplets. This approach redefines the existing dataset as a temporally extended automatic target recognition and visual question answering (VQA) challenge, considering a fixed geospatial site as a target whose attributes and activity states evolve across sparse satellite observations. Currently, SMART-HC-VQA comprises 21,837 accessible Sentinel-2 image chips, 65,511 single-image VQA examples, and approximately 2.3 million two-image temporal comparison examples generated via our novel Image-Pairwise Combinatorial Augmentation. We detail the workflow for retrieving and processing Sentinel-2 imagery, segmenting large satellite tiles into site-centered images, maintaining traceability to SMART-HC annotations, and analyzing the distributions of site size, observation count, temporal coverage, construction type, and phase labels. Additionally, we describe an implemented multi-image MLLM training framework based on LLaVA-NeXT Mistral-7B, adapted to accept multiple dated image inputs and train on metadata-derived VQA examples. This work offers a reproducible foundation for understanding language-guided remote sensing activities, aiming not only to detect change but also to reason about the ongoing processes, their progression, and potential future developments.

2605.10738 2026-05-12 math.OC cs.MA cs.RO cs.SY eess.SY

Decentralized Contingency MPC based on Safe Sets for Nonlinear Multi-agent Collision Avoidance

Max Studt, Georg Schildbach

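AI summary This paper studies how to achieve decentralized collision avoidance in nonlinear multi-agent systems without sharing trajectory information. It proposes a contingency model predictive control (MPC) framework based on safe sets, in which each agent solves a local optimization that relies only on state information and couples a nominal trajectory with a contingency certificate, so that a feasible avoidance maneuver is available under receding-horizon operation. The method introduces a novel geometric safe-set update mechanism that guarantees recursive feasibility and convergence, and it is validated in a range of dense and sparse scenarios.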

English abstract

Decentralized collision avoidance remains challenging, particularly when agents do not communicate any information related to planned trajectories. Most existing approaches either rely on conservative coordination mechanisms or provide limited guarantees on recursive feasibility and convergence. This paper develops a decentralized contingency MPC framework for multi-agent systems with nonlinear dynamics that achieves collision-free motion under a state-only information pattern. Each agent follows the same consensual rule set, enabling safe decentralized planning without communication. Each agent solves a local optimization problem that couples a nominal trajectory with a contingency certificate ensuring a feasible backup maneuver under receding-horizon operation. A novel geometric and decentralized safe-set update mechanism prevents feasibility loss between consecutive time steps. The resulting scheme guarantees recursive feasibility, including collision avoidance, and establishes a Lyapunov-type convergence result to an admissible safe equilibrium. Simulation results demonstrate performance in both sparse and dense multi-agent environments, including cluttered bottleneck scenarios and under plug-and-play operation.

2605.10704 2026-05-12 eess.SP cs.RO

xApp Empowered Resource Management for Non-Terrestrial Users in 5G O-RAN Networks

Mohammed M. H. Qazzaz, Syed Ali Zaidi, Aubida A. Al-Hameed, Abdelaziz Salama, Des Mclernon

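AI summary This paper proposes a deep-reinforcement-learning-based xApp for resource management of non-terrestrial user equipment in 5G Open Radio Access Networks (O-RAN), aimed at optimizing handover decisions for UAVs flying along predetermined routes. The method uses a Double Deep Q-Network (DDQN) combined with transfer learning for predictive optimization, anticipating network conditions to reduce handover frequency and outage probability. Experiments show that the framework significantly reduces the number of handovers while maintaining connectivity reliability, validating intelligent learning-based approaches for UAV mobility management in next-generation O-RAN architectures.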

English abstract

This paper introduces a proactive Unmanned Aerial Vehicle (UAV) mobility management xApp for Open Radio Access Network (O-RAN) Near Real-Time Radio Intelligent Controller (Near-RT RIC) environments, employing Double Deep Q-Network (DDQN) reinforcement learning (RL) enhanced with transfer learning to optimise handover decisions for UAVs operating along predetermined flight trajectories. Unlike reactive approaches that respond to signal degradation, the proposed framework anticipates network conditions and minimises both outage probability and handover frequency through predictive optimisation. The system leverages centralised weight averaging to consolidate knowledge from multiple flight scenarios into a global model capable of generalising to previously unseen operational environments without extensive retraining. A comprehensive evaluation demonstrates that the proposed framework achieves a favourable trade-off between handover frequency and connectivity reliability, reducing handover events by up to 54.6% compared to greedy approaches while maintaining outage probability at practically negligible levels. The results validate the effectiveness of intelligent learning-based approaches for UAV mobility management in next-generation O-RAN architectures, thereby contributing to seamless integration of aerial user equipment into cellular networks.
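
For readers unfamiliar with DDQN, the following minimal sketch shows the generic Double DQN bootstrap target, in which the online network picks the next action and the target network evaluates it. This is the standard textbook rule, not the paper's xApp code. Treating each action as a candidate serving cell is an assumption made here for illustration, and all names and numbers are hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Generic Double DQN target for one transition.

    next_q_online / next_q_target hold Q-values of the next state under
    the online and target networks, one entry per action (assumed here
    to be one entry per candidate cell for a handover decision).
    """
    if done:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical values: the online net prefers action 1, so the target
# uses next_q_target[1] rather than the maximum of next_q_target.
y = double_dqn_target(1.0, [0.2, 0.9, 0.5], [1.0, 0.4, 2.0], gamma=0.5)
print(y)
```

Decoupling selection from evaluation is what distinguishes this target from the plain DQN target, which would take the maximum of the target network's values directly.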

2605.10688 2026-05-12 cs.LG eess.SP

DANCE: Detect and Classify Events in EEG

Jarod Lévy, Hubert Banville, Jérémy Rapin, Jean-Remi King, Thomas Moreau, Stéphane d'Ascoli

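AI summary This paper proposes DANCE, a deep learning method that detects and classifies events directly from raw, unaligned electroencephalography (EEG) signals, removing the reliance of conventional methods on known event onsets. It frames neural decoding as a set-prediction problem, enabling end-to-end asynchronous decoding. Experiments show that DANCE outperforms existing methods on a range of cognitive, clinical, and brain-computer interface tasks and sets a new state of the art in seizure monitoring.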

Comments 29 pages

English abstract

Event identification in continuous neural recordings is a critical task in neuroscience. Decoding in EEG is dominated by classifying windows aligned to known event onsets. However, while available in controlled experiments, such onsets are absent in continuous real-world monitoring. Here, we introduce DANCE, a deep learning pipeline that frames neural decoding as a set-prediction problem and jointly detects and classifies events directly from raw, unaligned signals. Evaluated separately on ten datasets curated from the literature with a wide variety of event types (ranging from milliseconds to minutes in duration), our model outperforms existing methods on a broad range of cognitive, clinical and BCI tasks. This single architecture establishes a new state of the art in the competitive task of seizure monitoring and matches the accuracy of onset-informed models for BCI tasks. Overall, our method marks a step towards end-to-end asynchronous neural decoding models.

2605.10621 2026-05-12 cs.LG cs.SY eess.SY

Hierarchical End-to-End Taylor Bounds for Complete Neural Network Verification

Taha Entesari, Mahyar Fazlyab

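AI summary This paper studies reachability analysis of neural networks, which computes or bounds the set of outputs attainable over a given input domain in order to certify the safety and robustness of learning-enabled physical systems. Existing methods rely mostly on tractable overapproximations that use at most second-order information. The paper proposes HiTaB, a new verification framework that uses the Hessian and its Lipschitz constant to bring in higher-order smoothness information systematically. It builds a unified hierarchy of zeroth-, first-, and second-order bounds and gives an efficient layerwise curvature propagation procedure for upper-bounding the Hessian Lipschitz constant in deep networks, yielding tighter and more informative safety certificates.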

English abstract

Reachability analysis of neural networks, which seeks to compute or bound the set of outputs attainable over a given input domain, is central to certifying safety and robustness in learning-enabled physical systems. Since exact reachable set computation is generally intractable, existing methods typically rely on tractable overapproximations. Examining the state of the art for smooth, twice-differentiable networks, we observe that existing approaches exploit at most second-order information and do not systematically leverage higher-order information. In this work, we introduce HiTaB, a novel verification framework that exploits second-order smoothness through both the Hessian, $\nabla^2 f$, and its Lipschitz constant, $L_{\nabla^2 f}$. We further develop a unified hierarchy of zeroth-, first-, and second-order bounds, together with precise conditions under which higher-order approximations yield provable improvements. Our main technical contribution is a compositional procedure for efficiently bounding $L_{\nabla^2 f}$ in deep neural networks via layerwise propagation of curvature bounds. We extend the framework to both $\ell_2$- and $\ell_\infty$-constrained input sets and show how it can be integrated into branch-and-bound verification pipelines. To our knowledge, this is the first practical reachability analysis framework for smooth neural networks that systematically exploits Lipschitz continuity of curvature, leading to tighter and more informative safety certificates.
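
To make the role of $L_{\nabla^2 f}$ concrete, the block below states the classical second-order Taylor remainder bound for a scalar function whose Hessian is Lipschitz in the $\ell_2$ operator norm. It is a standard textbook inequality given here as a sketch of the kind of bound involved. The expansion point $x_0$ and radius $\epsilon$ are notation introduced for this illustration, and the paper's exact formulation may differ.

```latex
% Assumption: \|\nabla^2 f(x) - \nabla^2 f(y)\|_2 \le L_{\nabla^2 f}\,\|x - y\|_2 for all x, y.
\[
\left| f(x) - f(x_0) - \nabla f(x_0)^\top (x - x_0)
       - \tfrac{1}{2}\,(x - x_0)^\top \nabla^2 f(x_0)\,(x - x_0) \right|
\;\le\; \frac{L_{\nabla^2 f}}{6}\,\|x - x_0\|_2^3 .
\]
% Over the ball \|x - x_0\|_2 \le \epsilon, the quadratic model of f
% therefore encloses f up to an additive slack of L_{\nabla^2 f}\,\epsilon^3 / 6.
```

Because the slack shrinks cubically in the radius, a tighter estimate of $L_{\nabla^2 f}$ or a smaller input region (for example after a branch-and-bound split) tightens the enclosure quickly.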

2605.10571 2026-05-12 eess.IV cs.CV

Set-Based Groupwise Registration for Variable-Length, Variable-Contrast Cardiac MRI

Yi Zhang, Yidong Zhao, Tijmen Toxopeus, Maša Božić-Iven, Sebastian Weingärtner, Qian Tao

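AI summary For cardiac MRI sequences of variable length and varying contrast, this study proposes \AnyTwoReg, a set-based groupwise registration method, to address the poor cross-protocol generalization of conventional deep learning methods. The method treats an MRI sequence as an unordered set, decoupling network design from sequence length and input ordering. Using a shared encoder and correlation-guided feature aggregation, it builds a permutation-invariant canonical reference and learns a permutation-equivariant mapping from images to deformation fields. Experiments show good zero-shot generalization to unseen quantitative MRI datasets and improved quality of downstream quantitative mapping.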

Comments MICCAI 2026. Submitted Version

English abstract

Quantitative cardiac magnetic resonance imaging (MRI) enables non-invasive myocardial tissue characterization but relies on robust motion correction within these variable-length, variable-contrast image sequences. Groupwise registration, which simultaneously aligns all images, has shown greater robustness than pairwise registration for motion correction. However, current deep-learning-based groupwise registration methods cannot generalize across MRI sequences: the architecture typically encodes input data as a fixed-length channel stack, which rigidly couples network design to protocol-specific sequence length, input ordering, and contrast dynamics. At inference time, any change in imaging protocols will render the network unusable. In this work, we introduce \emph{\AnyTwoReg}, a new set-based groupwise registration framework that takes a quantitative MRI sequence as an unordered set. This set formulation fundamentally decouples network design from sequence length and input ordering. By utilizing a shared encoder and correlation-guided feature aggregation, \emph{\AnyTwoReg} constructs a permutation-invariant canonical reference for registration, and learns a permutation-equivariant mapping from images to deformation fields. Additionally, we extract contrast-insensitive image features from an existing foundation model to handle extreme contrast variations. Trained exclusively on a single public $T_1$ mapping dataset (STONE, sequence length $L=11$), \AnyTwoReg generalizes to two unseen quantitative MRI datasets (MOLLI, ASL) with variable lengths ($L \in [11, 60]$) and different contrast dynamics. It achieves strong cross-protocol generalization in a zero-shot manner, and consistently improves downstream quantitative mapping quality. Notably, while designed for quantitative MRI sequences, our framework is directly applicable to Cine MRI sequences for inter-cardiac-phase registration.

2605.10565 2026-05-12 eess.SP

Exponential Noise Robustness of Type-Based Multiple Access for Over-the-Air Computation

Marc Martinez-Gost, Ana Pérez-Neira, Miguel Ángel Lagunas

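AI summary This paper studies the robustness of type-based multiple access (TBMA) for over-the-air computation (AirComp) under nonparametric estimation, where no prior knowledge of the data distribution is available. Unlike conventional AirComp methods, which rely on amplitude modulation and are sensitive to noise, TBMA permits structured modulation formats that improve performance: the superposition of signals induces a discrete lattice structure at the receiver, and nearest-lattice-point projection suppresses noise effectively. The method achieves a mean squared error (MSE) that decays exponentially with the energy-to-noise spectral density ratio, a fundamental robustness advantage over conventional methods.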

Comments Submitted to GLOBECOM 2026

English abstract

This paper studies the robustness of type-based multiple access (TBMA) in over-the-air computation (AirComp) under nonparametric estimation, where no prior knowledge of the data distribution is available. While conventional AirComp approaches rely on amplitude modulations and suffer from noise sensitivity, TBMA enables the use of more structured modulation formats that can be exploited for improved performance. We show that the superposition of transmitted signals in TBMA induces a discrete lattice structure in the received signal space, where each lattice point corresponds to the number of devices accessing a given channel resource. By exploiting this structure through nearest-lattice-point projection, noise effects can be substantially suppressed. The proposed technique achieves an exponential decay of the mean squared error (MSE) with respect to the energy-to-noise spectral density ratio, whereas in conventional techniques the MSE only scales inversely with this ratio. Simulation results validate the theoretical findings and demonstrate that TBMA provides a fundamental robustness advantage over traditional AirComp.
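
The lattice idea can be illustrated with a toy simulation. In the sketch below each device picks one of a few channel resources, the receiver sees the per-resource device count plus Gaussian noise, and the estimate is snapped to the nearest integer. This is a deliberately simplified model: it rounds each resource independently and ignores fading and the constraint that counts sum to the number of devices. The function name and all parameters are hypothetical, not the paper's setup.

```python
import random
from typing import Tuple


def simulate_tbma(
    num_devices: int = 20,
    num_resources: int = 4,
    sigma: float = 0.15,
    trials: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (raw_mse, projected_mse) of the per-resource count estimate."""
    rng = random.Random(seed)
    raw_se = 0.0
    proj_se = 0.0
    for _ in range(trials):
        # Each device transmits on exactly one resource (its "type").
        counts = [0] * num_resources
        for _ in range(num_devices):
            counts[rng.randrange(num_resources)] += 1
        # Superposition: the receiver observes the count plus noise.
        received = [c + rng.gauss(0.0, sigma) for c in counts]
        # Projection: snap to the nearest integer in [0, num_devices].
        projected = [min(max(round(r), 0), num_devices) for r in received]
        raw_se += sum((r - c) ** 2 for r, c in zip(received, counts))
        proj_se += sum((p - c) ** 2 for p, c in zip(projected, counts))
    n = trials * num_resources
    return raw_se / n, proj_se / n


raw_mse, projected_mse = simulate_tbma()
print(f"raw MSE: {raw_mse:.5f}  projected MSE: {projected_mse:.5f}")
```

In this toy model the projected estimate is wrong only when the noise magnitude exceeds half the lattice spacing, a Gaussian tail event, which gives some intuition for why the error can fall off exponentially rather than inversely with the noise level.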

2605.10558 2026-05-12 cs.MA cs.SY eess.SY

Effect of Graph Gluing on Consensus in Networked Multi-Agent Systems

Rohollah Moghadam, Santosh Kandel

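AI summary This paper studies how graph gluing operations affect consensus performance in networks of multi-agent systems. By analyzing bridge gluing and interface gluing, it examines how the number and structure of communication links between subsystems influence the Fiedler eigenvalue of the combined graph and hence the consensus convergence rate. The study establishes an explicit relationship between interconnection strategies, algebraic connectivity, and system performance, and validates the theoretical analysis through simulations.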

English abstract

In this paper, the effects of graph gluing operations in networks of multi-agent systems and their impact on system performance are investigated. In many practical applications, multiple multi-agent subsystems must be interconnected through communication links to accomplish complex tasks, resulting in a larger communication network. Such interconnections modify the underlying graph topology and consequently affect the consensus behavior and convergence rate of the network. In particular, this paper examines both bridge gluing and interface gluing and analyzes how the number and structure of communication links between subsystems influence the Fiedler eigenvalue of the resulting graph. Since the Fiedler eigenvalue is directly related to the convergence rate of consensus dynamics, the proposed analysis establishes a clear relationship between interconnection strategies, algebraic connectivity, and system performance. The results provide theoretical insight into how different gluing mechanisms alter the spectral properties of the graph Laplacian and, in turn, the convergence characteristics of the networked multi-agent system. Simulation studies are presented to illustrate the theoretical findings and to validate the effectiveness of the proposed framework.
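
A small numerical experiment can show how inter-subsystem links move the Fiedler eigenvalue. The sketch below joins two triangle subsystems with one link and then with two links, and estimates the second-smallest Laplacian eigenvalue by shifted power iteration in pure Python. The example graphs and function names are hypothetical, and the sketch does not reproduce the paper's formal definitions of bridge and interface gluing.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def laplacian(n: int, edges: List[Edge]) -> List[List[float]]:
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    return lap


def fiedler_value(n: int, edges: List[Edge], iters: int = 500) -> float:
    """Estimate the second-smallest Laplacian eigenvalue.

    Power iteration on (shift*I - L), restricted to vectors orthogonal
    to the all-ones vector, converges to the Fiedler vector.
    """
    lap = laplacian(n, edges)
    shift = 2.0 * max(lap[i][i] for i in range(n))  # >= largest eigenvalue
    x = [float(i + 1) for i in range(n)]  # deterministic, non-symmetric start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the all-ones component
        y = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    mean = sum(x) / n
    x = [xi - mean for xi in x]
    lx = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(x, lx)) / sum(a * a for a in x)


# Two triangle subsystems: nodes 0-2 and nodes 3-5.
TRIANGLES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
one_link = fiedler_value(6, TRIANGLES + [(2, 3)])
two_links = fiedler_value(6, TRIANGLES + [(2, 3), (1, 4)])
print(f"one link: {one_link:.4f}  two links: {two_links:.4f}")
```

For this pair of graphs the value can be checked by hand: with one link it is $(5 - \sqrt{17})/2 \approx 0.438$, and with the second link it rises to 1, consistent with the general fact that adding edges never lowers algebraic connectivity.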

2605.10520 2026-05-12 eess.SY cs.SY

Equivariant Observer Design on SL(3) for Image Intensity-Based Homography Estimation

Tarek Bouazza, Pieter van Goor, Robert Mahony, Tarek Hamel

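AI summary This paper studies homography estimation from image intensity information and proposes an equivariant observer designed on the special linear group $\mathbf{SL}(3)$, avoiding the reliance of traditional methods on feature extraction and matching. The method performs image registration by directly minimizing a cost function defined on image pixel intensities, analyzes conditions for non-degeneracy of the cost function, and introduces a second-order observer to improve convergence. Experimental results validate the proposed method on real images.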

Comments 16 pages, 4 figures, preprint submitted to Automatica

English abstract

This paper addresses the problem of homography estimation using a nonlinear observer designed on the Lie group $\mathbf{SL}(3)$ that exploits the full image information through direct image registration. Unlike traditional feature-based methods, which rely on extensive feature extraction and matching, the proposed approach formulates an observer that minimises a cost function defined directly in terms of image pixel intensities. Explicit conditions ensuring the non-degeneracy of the cost function are derived, and a comprehensive analysis is conducted to characterise and generate degenerate (unobservable) image configurations. Theoretical results demonstrate local exponential convergence of the observer. To improve local convergence properties, a second-order observer variant is introduced by incorporating the Hessian of the cost function into the correction term. Simulation results demonstrate the performance of the proposed solutions on real images.

2605.10490 2026-05-12 eess.SY cs.SY

Glycemic Safety Tube: A Provably Safe Control Framework for Artificial Pancreas Systems under Parametric Uncertainty

Pukhrambam Akash Singh, Ratnangshu Das, Ahan Basu, Pushpak Jagtap

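AI summary This study proposes Glycemic Safety Tube Control (GSTC), a control framework for safe glucose regulation in artificial pancreas systems under parametric uncertainty. The method does not rely on an accurate patient-specific model; by design it keeps glucose levels within a clinically safe range and ensures input constraint satisfaction in the presence of meal disturbances and estimation errors. Experiments show that GSTC maintains safety across different meal patterns and patient conditions, demonstrating robustness and computational efficiency, and offering a safe, efficient, and patient-independent control approach for next-generation artificial pancreas systems.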

English abstract

Type 1 diabetes eliminates the body's ability to produce insulin, making glucose regulation entirely dependent on external insulin delivery and the control algorithm. Existing closed-loop methods either rely on accurate patient-specific models or do not provide formal safety guarantees, and are often computationally demanding for wearable devices. This paper proposes Glycemic Safety Tube Control (GSTC), a model-free and computationally efficient control framework for automated insulin delivery. The method enforces clinically relevant safety bounds on glucose levels by design, ensuring that glucose remains within a prescribed safe range. We also derive feasibility conditions that guarantee safety and input constraint satisfaction under bounded meal disturbances and estimation errors. The performance of GSTC is evaluated against state-of-the-art methods, including linear and nonlinear model predictive control and sliding mode control. The results demonstrate that GSTC maintains safety under varying meal patterns and patient conditions, highlighting its robustness and computational efficiency. Overall, GSTC provides a safe, efficient, and patient-independent approach for next-generation artificial pancreas systems.

2605.10489 2026-05-12 eess.SY cs.SY

Observing the state of networks with directed higher-order interactions

Roberto Rizzello, Davide Salzano, Stefano Boccaletti, Pietro De Lellis

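AI summary This paper studies how to reconstruct the state of a network of nonlinear dynamical systems in the presence of directed higher-order interactions. Building on analytical convergence results, the authors propose an algorithmic observer design procedure that simultaneously selects the nodes to be measured and the observer gains. Extensive numerical experiments demonstrate the performance and robustness of the designed observer, which is then applied to fully reconstruct the opinions of a group of agents.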

English abstract

We consider the problem of reconstructing the state of a network of nonlinear dynamical systems in the presence of directed higher-order interactions. Grounded on analytical convergence results, we propose an algorithmic observer design procedure that simultaneously selects the nodes to be measured and the observer gains. We complement the theoretical analysis with an exhaustive numerical investigation campaign that showcases the performance and robustness of the designed observer. Finally, the algorithmic procedure is used to fully reconstruct the opinions of a group of agents.

2605.10433 2026-05-12 cs.IT eess.SP math.IT

Syndrome Adaptive Gain Control for Min-Sum Decoding of Quantum LDPC Codes

Hernan Cordova, Alexios Balatsoukas-Stimming, Yunus Can Gültekin, Gabriele Liga, Alex Alvarado

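AI summary This paper proposes the syndrome adaptive gain Min-Sum (SAGMS) decoding algorithm for quantum low-density parity-check (QLDPC) codes, aimed at the performance loss caused by message magnitude overestimation in conventional Min-Sum (MS) decoding. The method adjusts the message gain dynamically during decoding according to the fraction of unsatisfied stabilizers, with no optimization needed for a specific code or noise level. Simulations show that SAGMS retains MS-level complexity while matching or outperforming offline-optimized scaled MS decoding, and in some cases it outperforms belief propagation (BP) decoding.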

Comments 6 pages, 3 figures

English abstract

Min-Sum (MS) decoding is a popular low-complexity alternative to belief propagation (BP), retaining only the minimum incoming message magnitude during check-node (CN) processing, at the cost of systematic message magnitude overestimation. The scaled MS (SMS) decoder compensates for this effect using a fixed scaling factor. We propose the syndrome adaptive gain Min-Sum (SAGMS) decoder for quantum low-density parity-check (QLDPC) codes, which adapts the message gain online based on the fraction of unsatisfied stabilizers, requiring no per-code or per-noise level optimization. We show that the scaling factor required for SMS to match belief propagation decreases with the CN degree, so any fixed scaling optimized for one degree incurs a growing penalty as the CN degree varies. SAGMS avoids this limitation by adapting the gain during decoding. Simulations on generalized bicycle QLDPC codes demonstrate that SAGMS matches or outperforms the frame error rate (FER) of an offline optimized SMS decoder. Moreover, SAGMS approaches BP performance and, under certain conditions, outperforms it while retaining MS-level complexity.
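
The check-node step described in this abstract can be sketched as follows. The code implements the generic scaled Min-Sum check-node update, plus a helper that measures the fraction of unsatisfied checks from a syndrome-mismatch vector. How SAGMS maps that fraction to a gain is not specified in the abstract, so the gain is left as a caller-supplied parameter. All function names and values are hypothetical.

```python
from typing import List, Sequence


def check_node_update(incoming: Sequence[float], gain: float = 1.0) -> List[float]:
    """Scaled Min-Sum check-node update.

    For each edge, the outgoing message is the product of the signs of
    the OTHER incoming messages times their minimum magnitude, scaled by
    `gain` (gain = 1.0 gives plain Min-Sum).
    """
    messages = list(incoming)
    outgoing = []
    for i in range(len(messages)):
        others = messages[:i] + messages[i + 1:]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        outgoing.append(gain * sign * min(abs(m) for m in others))
    return outgoing


def unsatisfied_fraction(syndrome_mismatch: Sequence[int]) -> float:
    """Fraction of checks whose current parity disagrees with the
    measured syndrome (1 = unsatisfied, 0 = satisfied)."""
    return sum(syndrome_mismatch) / len(syndrome_mismatch)


# Hypothetical degree-3 check node with a fixed gain of 0.8.
print(check_node_update([2.0, -0.5, 3.0], gain=0.8))
print(unsatisfied_fraction([0, 1, 1, 0]))
```

An adaptive decoder in this spirit would recompute the unsatisfied fraction each iteration and pass a gain derived from it into `check_node_update`, instead of fixing the gain offline.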

2605.10431 2026-05-12 eess.SY cs.SY math.OC

Hierarchical 2-degree-of-freedom control combining Youla-Kucera parameterization and model predictive control

Zhiheng Zhao, Hans Henrik Niemann, John Bagterp Jørgensen

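AI summary This paper presents a hierarchical two-degree-of-freedom control structure that combines Youla-Kucera parameterization with model predictive control. Through coprime factorization of the system, the method introduces an auxiliary feedforward channel and a controller parameterization channel: the former implements cascaded model predictive control to optimize system performance, and the latter achieves offset-free model predictive control through H2 optimal controller design. The approach combines the strengths of parameterized control and predictive control, improving control accuracy and robustness.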

Comments 7 pages, 4 figures, accepted for European Control Conference 2026 (ECC 2026)

English abstract

A hierarchical 2DOF (2-degree-of-freedom) structure combining Youla-Kucera (YK) parameterization and model predictive control (MPC) is presented in this paper. The YK parameterization employs the coprime factorization of the nominal system and controller, thereby introducing an auxiliary feedforward channel dedicated to system optimization and a controller parameterization channel. The feedforward channel is utilized to implement cascaded MPC for system optimization. The controller parameterization channel is utilized to achieve offset-free MPC by designing an appropriate YK parameter through the H2 optimal controller design.

2605.10411 2026-05-12 astro-ph.IM cs.SY eess.SY

High-speed single-photoelectron detection for Cherenkov astronomy

Luca Giangrande, Matthieu Heller, Teresa Montaruli

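AI summary This study presents a high-speed single-photoelectron detection system for Cherenkov astronomy, addressing the difficulty of achieving nanosecond timing and single-photoelectron resolution in a low-noise, scalable detector. It co-designs a custom hexagonal silicon photomultiplier sensor with a dedicated front-end application-specific integrated circuit (ASIC); the system features optical filtering, fourfold pixel segmentation, and high-precision readout, and achieves high timing resolution and good linearity at low power. The system clearly resolves single-photoelectron signals, offering an efficient and scalable solution for imaging with large-field-of-view Cherenkov telescopes.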

English abstract

Silicon photomultipliers are increasingly replacing photomultiplier tubes in Cherenkov telescope cameras, but achieving single-photoelectron resolution with nanosecond timing in a low-noise, scalable detector system remains challenging. We present a co-designed SiPM sensor and front-end application-specific integrated circuit (ASIC) that meets these requirements. The custom hexagonal sensor, developed with Hamamatsu Photonics, incorporates an integrated optical filter and fourfold pixel segmentation. The readout is performed by a second prototype of the FANSIC ASIC, optimized for this application and fabricated in 65 nm standard CMOS technology. It provides eight channels with on-chip analog summing of sub-channels on a $3.5 \times 3.5~\mathrm{mm}^2$ die, while consuming only 24 mW per channel. We demonstrate clear single-photoelectron peak separation with a gain of $2.7 \times 10^{-12}~\mathrm{V \cdot s}$, and an impulse response below 4 ns full width at half maximum with a 1.7 ns rise time, preserving the nanosecond-scale structure of Cherenkov pulses. The system responds linearly from 1 to 130 photoelectrons, and 55 distinct photoelectron peaks are resolved by varying the source intensity. These results demonstrate that the integrated sensor-electronics architecture delivers the speed, resolution, and dynamic range required for imaging atmospheric Cherenkov telescopes, and provides a scalable path toward large-area camera modules.

2605.10398 2026-05-12 eess.AS

SF-Flow: Sound field magnitude estimation via flow matching guided by sparse measurements

Ege Erdem, Shoichi Koyama, Tomohiko Nakamura, Orchisama Das, Zoran Cvetković

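AI summary This paper proposes SF-Flow, a method for reconstructing the magnitude of a 3D sound field from sparse microphone measurements. It builds on the Flow Matching generative model and uses a 3D U-Net conditioned by a permutation-invariant set encoder, which allows stable and efficient reconstruction from an arbitrary number of sparse inputs. Experiments show that SF-Flow reconstructs accurately up to 1 kHz, trains much faster than the autoencoder baseline, and improves significantly as the dataset grows.

Abstract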

Reconstructing a 3D sound field from sparse microphone measurements is a fundamental yet ill-posed problem, which we address through Acoustic Transfer Function (ATF) magnitude estimation. ATF magnitude encapsulates key perceptual and acoustic properties of a physical space with applications in room characterization and correction. Although recent generative paradigms such as Flow Matching (FM) have achieved state-of-the-art performance in speech and music generation, their potential in spatial audio remains underexplored. We propose a novel framework for 3D ATF magnitude reconstruction as a guided generation task, with a 3D U-Net conditioned by a permutation-invariant set encoder. This architecture enables reconstruction from an arbitrary number of sparse inputs while leveraging the stable and efficient training properties of FM. Experimental results demonstrate that SF-Flow achieves accurate reconstruction up to 1 kHz, trains substantially faster than the autoencoder baseline, and improves significantly with dataset size.

2605.10352 2026-05-12 eess.SP

Quantifying System Level KPI Deviations of Sionna RT: Material and Near-Field Error Analysis Using a 5G OAI Testbed

Faizan Rauf, Srijita Sanyal, Markus Heinrichs, Aydin Sezgin

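AI summary This paper studies how far Sionna RT deviates in 5G system-level key performance indicators (KPIs). Using an OpenAirInterface (OAI) 5G NR testbed, it compares RT-simulated channels with channels measured by a vector network analyzer (VNA), finds that antenna near-field transition effects and material property mismatch are the main sources of system-level KPI error, and provides a quantitative benchmark for digital-twin-based planning of 5G and future networks.

Abstract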

Ray tracing (RT) has recently gained renewed interest in wireless communications, driven by its integration into digital twin (DT) frameworks for site-specific channel modeling. Several previous studies have validated RT at the channel level, yet how channel-level RT errors propagate into real 5G system-level key performance indicators (KPIs) on actual hardware remains unquantified. This paper addresses this gap by comparing Sionna RT simulated channels against vector network analyzer (VNA) measured channels using an OpenAirInterface (OAI) 5G NR testbed. Channel measurements are conducted at 20 receiver positions in an indoor laboratory, with both channel types injected into a hardware-in-the-loop channel emulator interfacing an OAIBOX MAX base station and a Quectel UE. RSRP, PUCCH SNR, and SINR are evaluated under both conditions. The results identify antenna near-field transition effects as a critical position-dependent error source, alongside material property mismatch, providing a quantitative benchmark for digital twin-based 5G and beyond network planning.

2605.10351 2026-05-12 cs.LG eess.SP

Foundations of Reliable Inference: Reliability-Efficiency Co-Design

Jiayi Huang

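AI summary This work examines how to make inference more efficient while keeping the uncertainty estimates of AI models trustworthy. The author proposes a unified framework, developed from two perspectives, for the co-design of reliability and computational efficiency. The work offers a theoretical basis and methodological support for building efficient, trustworthy AI inference systems.

Comments PhD Thesis

Abstract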

Reliable inference requires that artificial intelligence (AI) models provide trustworthy uncertainty estimates, not merely accurate predictions. Recent advances in Bayesian learning have made significant progress toward this goal, while growing concerns about computational overhead have shifted the design criterion from reliability alone to the co-design of reliability and efficiency, i.e., reducing computational overhead while preserving trustworthy uncertainty quantification. This thesis develops a unified framework from two perspectives to address the central question: can we efficiently perform reliable inference?

2605.10350 2026-05-12 eess.SP

Signal-Dependent Shot Noise Modeling of Rydberg Atomic Quantum Receivers: A Design Perspective

Qihao Peng, Qu Luo, Tierui Gong, Neng Ye, Jizhou Wu, Cunhua Pan, Maged Elkashlan, Pei Xiao, Chau Yuen, George K. Karagiannidis, Jiangzhou Wang

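AI summary This paper proposes a communication-oriented complex baseband equivalent model for superheterodyne Rydberg atomic quantum receivers (RAQRs) that captures photodetection-induced signal-dependent shot noise and its coupling with the optical operating point. By constructing complex baseband representations under direct incoherent and balanced coherent optical detection, the study shows that the optical operating point jointly affects the effective receive gain and the equivalent noise background, establishing a gain-noise tradeoff determined by system design. Further analysis shows that neglecting signal-dependent shot noise leads to inaccurate operating-point design. The paper also derives a lower bound on the achievable rate of multiple-input multiple-output (MIMO) systems that accounts for this noise and confirms the performance advantage of RAQ-MIMO under specific noise conditions.

Abstract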

In this paper, we develop a communication-oriented complex baseband equivalent model for superheterodyne Rydberg atomic quantum receivers (RAQRs). The model explicitly captures photodetection-induced signal-dependent shot noise and its coupling with the optical operating point. By leveraging an atomic superheterodyne architecture and a strong local oscillator, we construct a complex baseband representation for both the received signal and the signal-dependent shot noise under both direct incoherent optical detection and balanced coherent optical detection. The derived model reveals that the optical operating point jointly determines the normalized effective receive gain and the equivalent noise background, thereby establishing a traceable gain-noise tradeoff governed by system design. More importantly, the proposed model shows that neglecting signal-dependent shot noise may lead to inaccurate operating-point design. Finally, by extending to the multiple-input multiple-output (MIMO) case, we derive a lower bound on the achievable rate while considering the signal-dependent shot noise. Our analysis reveals that the non-zero asymptotic rate of RAQ-MIMO and its superiority over conventional RF-MIMO hinge on the normalized noise floor of the RAQ receive chain falling below that of RF-MIMO. Simulation results validate our analysis and yield practical, closed-form design guidelines for RAQR front ends, revealing parameter regimes in which RAQ-MIMO outperforms conventional MIMO systems.

2605.10340 2026-05-12 eess.IV cs.CE cs.ET

Learning to Focus Synthetic Aperture Radar On-line with State-Space Models

Sebastian Fieldhouse, Roberto Del Prete, Gabriele Daga, Nathaniel Rensly, Gabriele Meoni, Kea-Tiong Tang

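AI summary This paper proposes an Online SAR Processor (OSP) that treats synthetic aperture radar (SAR) imaging as a data stream and processes it during acquisition, addressing the high latency of conventional SAR focusing and its poor fit for closed-loop cognitive systems. OSP uses a tiny state-space model trained with teacher-student distillation and multi-stage losses to form images efficiently at low latency. Experiments show that OSP achieves about 70 times lower latency and 130 times lower memory use than a linewise digital-signal-processing baseline while running on a single CPU core, and keeps imaging quality sufficient for downstream tasks such as vessel detection and flood mapping.

Abstract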

Conventional focusing methods for Synthetic Aperture Radar (SAR) employ block processing efficiently but remain latency-heavy processes that prevent the realisation of a closed-loop cognitive SAR vision system. We present the first Online SAR Processor (OSP), an online image-formation framework that treats SAR sensing as a stream and produces focused SAR image output line by line during acquisition. OSP uses a tiny state-space surrogate model trained with teacher-student distillation and multi-stage losses. We evaluate the method on 300 GB of SAR data from Maya4, a Sentinel-1-derived dataset containing raw, range-compressed, range-cell-migration-corrected, and azimuth-compressed products. Relative to a linewise digital-signal-processing baseline, OSP delivers approximately 70$\times$ lower latency and 130$\times$ lower memory use; on a single AMD CPU core it processes one row in 16 ms with a memory footprint of 6 MB whilst maintaining a focusing quality high enough to support downstream decisions, which we illustrate with vessel detection and flood-mapping tasks.

2605.10337 2026-05-12 cs.AI eess.SP

CORTEG: Foundation Models Enable Cross-Modality Representation Transfer from Scalp to Intracranial Brain Recordings

Liuyin Yang, Qiang Sun, Bob Van Dyck, Eva Calvo Merino, Marc M. Van Hulle

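AI summary This study proposes CORTEG, a framework for transferring foundation models pretrained on scalp EEG to intracranial ECoG signals in order to improve brain-computer interface decoding. CORTEG combines an electrode-aware spatial adapter, a dual-stream tokenizer, and a leave-one-subject-out fine-tuning strategy, enabling cross-subject learning and fast per-patient calibration. Experiments show that CORTEG matches or exceeds task-specific methods across tasks, with the largest gains when data is limited, pointing toward efficient and scalable intracranial brain-computer interfaces.

Abstract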

Intracranial electrocorticography (ECoG) offers high-signal-to-noise access to cortical activity for brain-computer interfaces, yet limited per-patient data has led most prior work to rely on small, subject-specific decoders that neglect information shared across patients. We investigate whether large pretrained scalp-EEG foundation models (EEG FMs) can be adapted to ECoG, enabling cross-patient learning and competitive decoding performance while calibrating to a held-out patient in 10-30 minutes on a single GPU. We introduce CORTEG, a cross-modality transfer framework that combines a pretrained EEG FM backbone, an electrode-aware KNNSoftFourier spatial adapter, a dual-stream tokenizer for low-frequency and high-gamma activity, and a leave-one-subject-out fine-tuning strategy. We evaluate CORTEG on two challenging regression tasks: public finger trajectory regression (n=9) and private audio envelope regression (n=16). CORTEG matches or exceeds the strongest task-specific baselines on both tasks: it reaches the highest mean correlation among compared methods on the public finger benchmark (gain not statistically significant on n=9 subjects), with larger and statistically significant gains on the audio task and in low-data per-patient calibration. Feature analyses align with neurophysiology, and latent manifolds capture low-dimensional finger-movement structure. CORTEG provides systematic evidence that scalp-EEG pretraining can be repurposed for ECoG decoding, enabling data-efficient intracranial BCIs that can adapt to new patients.
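
The leave-one-subject-out idea mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration of how such folds are usually formed, not the CORTEG training code: the function name and the subject labels are hypothetical, and the fine-tuning and calibration steps are left as a comment.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, held_out_subject) pairs, one per subject."""
    for held_out in subject_ids:
        training = [s for s in subject_ids if s != held_out]
        yield training, held_out


# Hypothetical subject labels; n=9 mirrors the size of the public finger benchmark.
subjects = [f"S{i}" for i in range(1, 10)]
folds = list(leave_one_subject_out(subjects))

for training, held_out in folds:
    # In a real pipeline: fine-tune on `training`, then calibrate and
    # evaluate on `held_out`. Here we only report the split.
    print(f"held out {held_out}: train on {len(training)} subjects")
```

Each subject is held out exactly once, so no data from the evaluated patient leaks into the cross-patient training set for that fold.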

2605.10264 2026-05-12 cs.IT cs.SY eess.SY math.IT

Low-Cost GNSS Anti-Jamming Through 2-Bit Phase Shift Beamforming with Machine Learning

Burak Soner, Ekin Uzun, Can Aksoy

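AI summary This paper studies a low-cost GNSS anti-jamming method based on beamforming with only 2-bit phase shifters, which restricts each complex array weight to one of four QPSK phase states. To cope with the resulting limited beampattern solution space, the authors formulate a discrete optimization framework and introduce a machine-learning method that delivers high-performance beamforming at low latency. Experiments show that the approach substantially improves GNSS receiver signal quality under moderate and strong jamming, confirming the anti-jamming effectiveness of 2-bit phase-shift beamforming.

Comments Accepted for presentation at RAST 2026. Author accepted version. Final version to appear in IEEE Xplore

Abstract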

We investigate low-cost GNSS anti-jamming using beamforming with inexpensive 2-bit phase shifters, constraining each complex array weight to one of four QPSK phase states (real/imaginary = -1 or +1). This severe quantization sharply limits the beampattern solution space, making conventional real-valued beamforming and naive weight quantization highly suboptimal. We formulate a discrete optimization that trades interference suppression against satellite-direction gain, and benchmark known combinatorial optimization methods across array sizes and interference conditions. Simulations show that performance improves with array size, with oracle and greedy search achieving up to 34 dB nulling, but oracle incurs exponential latency and greedy sampling is stochastic. To obtain deterministic low-latency performance, we propose an ML-aided method based on gradient-boosted decision trees followed by local search, which performs similarly to the oracle for larger arrays at fixed latency. We further validate the approach experimentally using a fully digital emulation of the QPSK oracle beamformer and compare against a GNSS receiver without beamforming capability. Under mild jamming (J/S approximately 44 dB) both receivers maintain adequate tracking, with QPSK yielding a 4.2 dB higher average C/N0 (37.3 vs. 33.1 dB-Hz). Under moderate and strong jamming (J/S approximately 62-70 dB) the benefit is substantial. At J/S = 70 dB the unprotected receiver degrades to near tracking limits (avg C/N0 = 9.3 dB-Hz) while the QPSK oracle sustains an average C/N0 of 20.8 dB-Hz. These results confirm that 2-bit phase-shift beamforming provides considerable anti-jamming benefit over a standard GNSS receiver, motivating further research on oracle-level practical methods.
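
To make the 2-bit constraint concrete, the following Python sketch runs an exhaustive search over all QPSK weight vectors for a small array. It is a sketch under stated assumptions rather than the paper's formulation: the uniform linear array with half-wavelength spacing, the single jammer, the objective (satellite-direction gain minus jammer-direction gain, in dB), and all function names are illustrative choices.

```python
import itertools

import numpy as np

# The four 2-bit phase states: real and imaginary parts are each -1 or +1.
QPSK_STATES = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])


def steering_vector(n_elements, angle_deg, spacing_wavelengths=0.5):
    """Plane-wave response of an assumed uniform linear array."""
    n = np.arange(n_elements)
    phase = 2 * np.pi * spacing_wavelengths * n * np.sin(np.deg2rad(angle_deg))
    return np.exp(1j * phase)


def gain_db(weights, angle_deg):
    """Array power gain |w^H a|^2 toward angle_deg, in dB."""
    a = steering_vector(len(weights), angle_deg)
    power = np.abs(np.vdot(weights, a)) ** 2  # vdot conjugates its first argument
    return 10 * np.log10(power + 1e-12)  # small floor avoids log(0) at exact nulls


def exhaustive_qpsk_search(n_elements, sat_deg, jam_deg):
    """Try all 4**n_elements weight vectors and keep the best-scoring one."""
    best_w, best_score = None, -np.inf
    for combo in itertools.product(QPSK_STATES, repeat=n_elements):
        w = np.array(combo)
        # Illustrative objective: reward satellite gain, penalize jammer gain.
        score = gain_db(w, sat_deg) - gain_db(w, jam_deg)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


weights, score = exhaustive_qpsk_search(n_elements=4, sat_deg=0.0, jam_deg=40.0)
print(weights, round(float(score), 1))
```

The loop visits 4^N candidates, which is why this kind of search becomes impractical as the array grows and a fixed-latency alternative is attractive.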

2605.10213 2026-05-12 eess.SP

Unsupervised Online Channel Estimation for High-Mobility OFDM via Implicit Neural Representation

Bohao Shi, Tianfu Qi, Xiaonan Chen, Jun Wang

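AI summary This paper studies unsupervised online channel estimation for orthogonal frequency division multiplexing (OFDM) systems in high-mobility scenarios and proposes a framework based on implicit neural representation (INR) to handle the severe inter-carrier interference (ICI) caused by Doppler effects. The method models the time-varying, frequency-selective channel as a continuous time-frequency function and captures fine-grained channel variations with a sinusoidal representation network (SIREN), without offline pre-training or labeled data. Simulations show that it achieves near-optimal link reliability in realistic vehicle-to-everything (V2X) environments and is more robust out of distribution than supervised learning baselines, offering an adaptable, data-efficient physical-layer solution.

Abstract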

Accurate channel estimation remains challenging in high-mobility wireless systems because Doppler shifts induce severe inter-carrier interference (ICI) in Orthogonal Frequency Division Multiplexing (OFDM). We propose an unsupervised online channel estimation framework based on Implicit Neural Representation (INR). Unlike discrete-grid estimators, the proposed method decouples channel representation from the OFDM sampling resolution by modeling the time-varying frequency-selective channel as a continuous function of time-frequency coordinates. A Sinusoidal Representation Network (SIREN) with Gaussian Fourier feature mapping captures fine-grained channel variations and high-frequency details without offline pre-training or labeled data. For each received slot, the network parameters are updated by per-slot online fitting that minimizes a physics-aware ICI loss, while a confidence-aware decision-directed loop balances reliable pilots and dynamically harvested pseudo-pilots. Simulations in realistic Vehicle-to-Everything (V2X) environments show that the proposed method achieves near-optimal link-level reliability, significantly outperforming Least Squares (LS) and robust Linear Minimum Mean Square Error (LMMSE) estimators. Compared with supervised deep learning baselines, it also exhibits strong out-of-distribution (OOD) robustness under environmental distribution shifts, establishing an adaptable data-efficient physical-layer paradigm.
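
A minimal NumPy sketch of the representation described above: normalized (time, frequency) coordinates pass through a Gaussian Fourier feature mapping and then through sine-activated layers that output a complex channel coefficient. Layer sizes, the feature scale, `omega0 = 30`, and the initialization (the scheme commonly used for SIREN) are assumptions, the weights are untrained, and the physics-aware ICI loss and online fitting loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def gaussian_fourier_features(coords, B):
    """Map (time, frequency) coordinates to sin/cos features of random projections."""
    proj = 2 * np.pi * coords @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)


def init_siren(in_dim, hidden_dim, out_dim, n_hidden=2, omega0=30.0):
    """Random weights using the commonly used SIREN initialization bounds."""
    dims = [in_dim] + [hidden_dim] * n_hidden + [out_dim]
    params = []
    for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
        bound = 1.0 / d_in if i == 0 else np.sqrt(6.0 / d_in) / omega0
        W = rng.uniform(-bound, bound, size=(d_in, d_out))
        params.append((W, np.zeros(d_out)))
    return params


def siren_forward(params, x, omega0=30.0):
    """Sine-activated hidden layers, then a linear layer giving (real, imag)."""
    for W, b in params[:-1]:
        x = np.sin(omega0 * (x @ W + b))
    W, b = params[-1]
    out = x @ W + b
    return out[..., 0] + 1j * out[..., 1]


# Assumed sizes: 16 random frequencies (32 features), two hidden layers of 64 units.
B = rng.normal(scale=4.0, size=(16, 2))
params = init_siren(in_dim=32, hidden_dim=64, out_dim=2)

# Query the channel at arbitrary normalized (time, frequency) points,
# which need not lie on the OFDM symbol/subcarrier grid.
coords = np.array([[0.10, 0.25], [0.50, 0.50]])
features = gaussian_fourier_features(coords, B)
h = siren_forward(params, features)
print(h)
```

Because the network takes continuous coordinates as input, the same parameters can be evaluated at any time-frequency point, which is the sense in which the representation is decoupled from the sampling grid.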

2605.10203 2026-05-12 cs.SD eess.AS

Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

Haowen Li, Tianxiang Li, Yi Yang, Boyu Cao, Qi Liu

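AI summary This study proposes Polyphonia, a zero-shot timbre transfer framework that addresses the tendency of stem-specific timbre editing in polyphonic music to damage the background accompaniment. Its core is an acoustic-informed attention calibration mechanism that uses a probabilistic acoustic prior to establish coarse boundaries, so that the target stem can be localized and modified more precisely while the non-target stems stay semantically intact. Experiments show a 15.5% improvement in target alignment over existing methods, with competitive music fidelity and non-target integrity.

Comments Accepted by ICML 2026

Abstract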

The advancement of diffusion-based text-to-music generation has opened new avenues for zero-shot music editing. However, existing methods fail to achieve stem-specific timbre transfer, which requires altering specific stems while strictly preserving the background accompaniment. This limitation severely hinders practical application, since real-world production necessitates precise manipulation of components within dense mixtures. Our key finding is that, while vanilla cross-attention captures semantic features of stems, it lacks the spectral resolution to strictly localize targets in dense mixtures, leading to boundary leakage. To resolve this dilemma, we propose Polyphonia, a zero-shot editing framework with Acoustic-Informed Attention Calibration. Rather than relying solely on diffuse semantic attention, Polyphonia leverages a probabilistic acoustic prior to establish coarse boundaries, enabling precise semantic synthesis while preserving non-target stems. For evaluation, we propose PolyEvalPrompts, a standardized prompt set with 1,170 timbre transfer tasks in polyphonic music. Specifically, Polyphonia achieves an increase of 15.5% in target alignment compared to baselines, while maintaining competitive music fidelity and non-target integrity.

2605.10199 2026-05-12 cs.CL eess.AS

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Hui Lu, Xueyuan Chen, Huimeng Wang, Shuhai Peng, Shiyin Kang, Xixin Wu, Zhiyong Wu

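AI summary This paper studies how a large language model (LLM) in full-duplex spoken dialogue can keep listening to user input while generating its own spoken response. The authors argue that how the user stream is routed into the LLM is a key architectural question, and they compare two routing strategies: injecting the user stream directly into the model input, and accessing it as external memory through cross-attention. Experiments show that direct injection performs better on semantic understanding and question answering but is prone to context corruption in situations such as user interruptions, whereas cross-attention routing is somewhat weaker on question answering but better preserves the generation context and is more robust. The study offers practical guidance for designing full-duplex spoken dialogue systems.

Abstract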

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion yields stronger semantic grounding and consistently better question-answering performance. However, under semantically overlapping conditions such as user interruptions, it is more vulnerable to context corruption: if the model fails to stop in time, the overlapping user stream can interfere with ongoing generation and lead to semantically incoherent continuations. Cross-attention routing underperforms on question answering, but better preserves the LLM generation context and is more robust to this failure mode. These results establish user-stream routing as a central design axis in full-duplex spoken dialogue and offer practical guidance on the tradeoff between semantic integration and context robustness. We provide a demo page for qualitative inspection.
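
The two routing strategies can be contrasted with a small NumPy sketch. It is a schematic under assumptions, not the paper's architecture: additive fusion of time-aligned frames stands in for channel fusion, a single-head residual attention layer without masking stands in for a cross-attention adapter, and all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_model_steps, n_user_frames = 8, 5, 7  # hypothetical sizes


def channel_fusion(model_emb, user_emb):
    """Inject the user stream into the model input (assumed: additive, frame-aligned)."""
    return model_emb + user_emb


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention_routing(hidden, user_memory, w_q, w_k, w_v):
    """Keep the user stream as external memory and read it via attention."""
    q, k, v = hidden @ w_q, user_memory @ w_k, user_memory @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual form: the model's own hidden states pass through unchanged
    # and the user information is added on top.
    return hidden + attn @ v, attn


model_emb = rng.normal(size=(n_model_steps, d_model))
user_aligned = rng.normal(size=(n_model_steps, d_model))
user_memory = rng.normal(size=(n_user_frames, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

fused = channel_fusion(model_emb, user_aligned)
routed, attn = cross_attention_routing(model_emb, user_memory, w_q, w_k, w_v)
print(fused.shape, routed.shape, attn.shape)
```

In the fused variant the user frames alter the input sequence itself, while in the attention variant the input sequence is untouched and the user stream is consulted on the side, which mirrors the integration-versus-robustness tradeoff the abstract reports.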

2605.10192 2026-05-12 eess.SP

LO-Free Receiver: Next-Gen Low-Power Joint Communication and Sensing

Hasan Atalay Gunel, Mohaned Chraiti, Ali Gorcin

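AI summary This paper proposes a receiver architecture without a local oscillator (LO) for low-power joint communication and sensing (JCAS). Information is embedded in and recovered from the relative spatial phase between antennas, and antenna-domain correlation is used to form a baseband observable tied to the direction of arrival (DoA), which naturally decouples communication and sensing. The study develops a full manifold-domain signal model and receiver architecture and analyzes error rate and sensing precision under phase noise. The approach suits massive Internet-of-Things deployments, particularly at millimeter-wave and higher frequencies.

Comments The paper was accepted by VTC

Abstract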

This paper introduces and analyzes Spatial Phase Manifold Communications (SPMC), a paradigm that facilitates joint communication and sensing (JCAS) over a local oscillator (LO)-free receiver. Information is embedded in, and recovered from, the relative spatial phase between antennas. In contrast to conventional coherent receivers that rely on LOs and on channel estimation/equalization, SPMC exploits antenna-domain correlation to form a baseband observable that is a function of inter-antenna phase differences. Since these phase differences are fundamentally tied to Direction-of-Arrival (DoA) and vice-versa, the formulation recasts communication and sensing as inference over the unit-circle manifold and thus naturally supports JCAS decomposition, i.e., data and spatial sensing are encoded and recovered through DoA signatures. We develop a comprehensive framework comprising: (i) a manifold-domain signal model and corresponding phase-alphabet design; (ii) an LO-free quadrature spatial-correlator receiver architecture that resolves the phase-sign ambiguity without requiring an LO; and (iii) an analysis of error probability and sensing precision, including robustness to phase noise. The proposed paradigm is particularly suited to massive Internet-of-Things (IoT) deployments, for which hardware simplicity, LO distribution cost, power consumption, and seamless sensing integration are critical, especially at millimeter-wave and higher carrier frequencies.
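
The link between inter-antenna phase and DoA can be illustrated with a short Python sketch. It is a simplified model under assumptions, not the paper's quadrature spatial-correlator: two antennas at half-wavelength spacing, a noise-free narrowband plane wave, and a complex-envelope signal model used purely for convenience. The point is that correlating the two antenna signals cancels the common carrier term, so the receiver never needs to know the carrier frequency or phase.

```python
import numpy as np


def inter_antenna_phase(x0, x1):
    """Phase of the averaged conjugate product; common carrier terms cancel."""
    return np.angle(np.mean(x1 * np.conj(x0)))


def doa_from_phase(delta_phi, spacing_wavelengths=0.5):
    """Invert delta_phi = 2*pi*(d/lambda)*sin(theta) for the angle in degrees."""
    return np.rad2deg(np.arcsin(delta_phi / (2 * np.pi * spacing_wavelengths)))


rng = np.random.default_rng(0)
t = np.arange(1000) / 1e6  # assumed sample times in seconds
true_doa_deg = 20.0
spacing = 0.5  # antenna spacing in wavelengths (assumed)

# Carrier with a frequency offset and a random phase, both unknown to the receiver.
carrier = np.exp(1j * (2 * np.pi * 12_345.0 * t + rng.uniform(0, 2 * np.pi)))

delta_phi_true = 2 * np.pi * spacing * np.sin(np.deg2rad(true_doa_deg))
x0 = carrier                                # reference antenna
x1 = carrier * np.exp(1j * delta_phi_true)  # second antenna sees a phase offset

estimated_doa = doa_from_phase(inter_antenna_phase(x0, x1), spacing)
print(round(float(estimated_doa), 3))
```

With half-wavelength spacing the phase difference stays within (-π, π) for all angles in (-90°, 90°), so the inversion is unambiguous in this toy setting; noise, multipath, and the phase-sign ambiguity discussed in the abstract are outside this sketch.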