arXivDaily: arXiv daily academic digest, updated Monday to Friday
EESS Electrical Engineering and Systems Science 104
2605.20169 2026-05-20 eess.SY cs.SY

The OAPS solution: a real-time predictive system for flexible PWR operation

Guillaume Dupré, Alain Grossetête

AI summary: This paper presents an innovative solution for facilitating safe and flexible operation of nuclear power plants. The OAPS system provides optimal strategies (e.g., axial offset control, xenon oscillation mitigation, effluent minimization) and real-time recommendations (e.g., dilution and boration flowrates, turbine power setpoints and variation rates) to help plant operators perform power variations confidently and efficiently.

Details
Comments
ICAPP 2025 - International Congress on Advances in Nuclear Power Plants, SFEN, Sep 2025, Juan-les-Pins / Antibes, France
Abstract

This paper presents an innovative solution designed to facilitate safe and flexible operation of nuclear power plants (NPPs). The purpose of this new device, named the OAPS system, is to provide optimal strategies (e.g., axial offset control, xenon oscillation mitigation, effluent minimization) and real-time recommendations (e.g., dilution and boration flowrates, turbine power setpoints and variation rates) to help NPP operators perform power variations confidently and efficiently. Just as a GPS navigator optimizes and modifies its planned route according to the current position of the user, the OAPS system regularly updates its recommendations based on the latest plant measurements. To achieve this, the OAPS system relies on model predictive control, an advanced control technique that is well established yet still cutting-edge in the nuclear industry. The conventional axial offset control strategy of the OAPS system was previously validated on both Framatome's full-scope PWR simulator and EDF's full-scope N4 simulator. In this paper, three new advanced strategies are showcased on an intermediate-complexity PWR simulator developed by Framatome: 1) determination of the fastest feasible power variation rates, 2) accelerated cancellation of axial power oscillations, and 3) minimization of water and boron effluents.

2605.20144 2026-05-20 eess.SY cs.SY

A Unified Framework for Attack-Resilient CLF-CBF Quadratic Programs for Nonlinear Control-Affine Systems

Mohamadamin Rajabinezhad, Shan Zuo

AI summary: This paper proposes a unified framework of attack-resilient CLF-CBF quadratic programs for nonlinear control-affine systems. By embedding a unified adaptive compensation term, it achieves finite-time recovery to the nominal safe set under control-input false data injection attacks without requiring a prior bound on the attack magnitude, relying only on a growth-rate characterization and an online gain tuning law.

Details
Comments
Under review for possible publication
Abstract

This letter introduces attack-resilient Control Lyapunov Functions (AR-CLFs) and attack-resilient Control Barrier Functions (AR-CBFs) for nonlinear control-affine systems subject to control-input false data injection attacks (FDIA) satisfying an at-most-exponentially growing envelope. The proposed framework embeds a unified adaptive compensation term into both the CLF decrease and CBF safety constraints. In contrast to input-to-state stability/safety (ISS/ISSf)-based methods that certify disturbance-dependent enlarged safe sets, the proposed approach enables finite-time recovery to the nominal safe set without requiring a prior magnitude bound on the FDIA, relying instead on a growth-rate characterization used for analysis and an online gain tuning law that regulates the compensation term. A unified quadratic program (QP) is developed to enforce the AR-CLF and AR-CBF conditions simultaneously, guaranteeing uniformly ultimately bounded (UUB) stability and uniform ultimate safety (UUS) under unbounded FDIA. Numerical results demonstrate improved resilience compared to existing ISS-CLF, ISSf-CBF, and robust CLF-CBF-QP approaches.

2605.20138 2026-05-20 cs.RO cs.SY eess.SY

Hamilton-Jacobi Reachability for Spacecraft Collision Avoidance

Larry Hui, Jordan Kam, William Su, Jianshu Zhou

AI summary: This paper presents a Hamilton-Jacobi (HJ) reachability framework for collision avoidance between two satellites in the same orbit, modeling relative motion in the radial-tangential-normal (RTN) frame with planar Hill-Clohessy-Wiltshire (HCW) dynamics. The target set consists of unsafe relative configurations corresponding to minimum separation requirements under Federal Communications Commission (FCC) orbital standards. The interaction between spacecraft is modeled as a zero-sum differential game in which Player 1 is the controlled satellite and Player 2 is a bounded adversarial disturbance with unknown intent. Backward reachable sets characterize the relative states from which collision cannot be avoided in the worst case, while states outside them admit provably safe trajectories. The sets are combined with supervisory hybrid control logic to decide when evasive maneuvers must begin, providing mathematically grounded safety guarantees for scalability.

Details
Comments
Accepted to the 20th IEEE International Conference on Control & Automation (IEEE ICCA 2026). 6 pages, 4 figures
Abstract

This article presents a Hamilton-Jacobi (HJ) reachability framework for collision avoidance between two satellites operating in the same circular orbit, where relative motion is modeled in the radial-tangential-normal (RTN) frame using planar Hill-Clohessy-Wiltshire (HCW) dynamics. We define the target set as the unsafe relative configurations in the orbit plane corresponding to minimum separation requirements consistent with Federal Communications Commission (FCC) orbital standards. The interaction between spacecraft is formulated as a zero-sum differential game, where Player 1 is the controlled satellite and Player 2 is modeled as a bounded adversarial disturbance with unknown intent. We present the HJ formulation and compute backward reachable sets that characterize relative states from which collision cannot be avoided under worst-case disturbances, while states outside these sets admit provably collision-free trajectories. These reachable sets are integrated with supervisory hybrid control logic to determine when evasive maneuvers must be initiated, enabling mathematically grounded safety guarantees for scalability.
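
As a rough illustration of the model named in this abstract, the planar HCW equations can be written as a state derivative that a reachability solver would evaluate. This is a minimal sketch, not the authors' code: the function names, the additive way the disturbance enters, and the circular unsafe set with radius `min_separation` are assumptions made for illustration.

```python
import numpy as np

def hcw_planar_derivative(state, control, disturbance, n):
    """Planar Hill-Clohessy-Wiltshire relative dynamics in the RTN frame.

    state = [x, y, vx, vy], with x radial and y along-track (tangential).
    control and disturbance are 2-vectors of relative acceleration
    (assumed here to enter additively).
    n is the mean motion of the reference circular orbit in rad/s.
    """
    x, y, vx, vy = state
    ax = 3.0 * n**2 * x + 2.0 * n * vy + control[0] + disturbance[0]
    ay = -2.0 * n * vx + control[1] + disturbance[1]
    return np.array([vx, vy, ax, ay])

def in_unsafe_set(state, min_separation):
    """Hypothetical target set: in-plane separation below a minimum distance."""
    return np.hypot(state[0], state[1]) < min_separation
```

A pure along-track offset with zero relative velocity is an equilibrium of these equations, which is a quick sanity check on the signs.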

2605.20136 2026-05-20 eess.SY cs.SY

Enabling Real-Time Phase Control in Traffic Signal Hardware-in-the-Loop Simulation

Zhiyao Zhang, Gergely Zachár, William Barbour, Matt Bunting, Marcos Quiñones-Grueiro, Jonathan Sprinkle, Dan Work

AI summary: This paper presents the first HILS testbed supporting real-time phase control. A novel middleware architecture translates dynamic phase actions into commands for NTCIP-compliant hardware controllers, handling phase transitions, signal state synchronization, and errors. Experiments show that the system executes real-time phase commands efficiently with low, sub-millisecond latency.

Details
Comments
7 pages, 5 figures, accepted to IEEE ITSC 2026
Abstract

Advanced Traffic Signal Control (TSC) algorithms require real-time phase control, yet existing Hardware-in-the-Loop Simulation (HILS) testbeds only support pre-programmed timing plans. In this paper, we present the first HILS testbed for real-time phase control. We develop a novel middleware architecture that translates dynamic phase actions (selection, switch, and duration) into commands for NTCIP-compliant commercial hardware controllers. This middleware manages phase transitions, synchronizes signal states, and handles errors without interrupting the hardware's internal operations. Experimental validation demonstrates that the system executes real-time phase commands, handles system conflicts, and achieves a low internal system latency, sub-millisecond on average.

2605.20132 2026-05-20 physics.geo-ph cs.LG eess.SP

FiLark: a streaming-first software framework for end-to-end exploration, annotation, and algorithm integration in distributed acoustic sensing

Jintao Li, Weichang Li, Kai Tong, Xaingyu Guo

AI summary: This paper presents the FiLark framework, which applies a streaming-first principle to end-to-end exploration, annotation, and algorithm integration for distributed acoustic sensing data, addressing the inability of conventional batch-oriented analysis frameworks to handle continuous, high-channel-count data streams.

Details
Abstract

Distributed acoustic sensing (DAS) systems generate continuous, ultra-high-channel-count data streams at rates that exceed the capabilities of conventional batch-oriented analysis frameworks. As a result, essential tasks such as interactive exploration of long-duration recordings, scalable event annotation, and real-time algorithm-in-the-loop monitoring remain inadequately supported by workflows built around manually selected data segments and offline processing. This paper presents FiLark (Fiber Lark), a Python framework that applies a streaming-first principle uniformly across data access, signal processing, visualization, and monitoring for DAS. Instead of operating on manually selected data segments, FiLark presents any DAS source, including continuous multi-file recordings, as a unified stream and builds all system components around that abstraction. An OpenGL-based ring-buffer renderer enables interactive browsing and visualization of arbitrarily long recordings with constant memory usage. An integrated annotation interface supports event labeling directly within continuous data streams, facilitating the creation of reproducible machine-learning-ready labeled datasets without offline preprocessing. The signal processing library includes temporal, spatial, spectral, and decomposition-based operators, with both CPU implementations and GPU-accelerated variants via PyTorch, alongside stateful chunked execution that preserves processing continuity and application semantics across segment boundaries. A standardized monitor interface further integrates streaming detectors and learning-based models into the visualization workflow. By sharing a common streaming abstraction across all layers, FiLark allows processing configurations and workflows developed interactively to transfer directly to scalable production pipelines without modification.
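
The idea of stateful chunked execution mentioned in this abstract can be illustrated with a small NumPy sketch. It is not FiLark's API: the class `StatefulEMA`, the helper `stream_chunks`, and the choice of a first-order low-pass filter are hypothetical, chosen only to show how carrying filter state across chunk boundaries makes chunked output match a single pass over the whole recording.

```python
import numpy as np

class StatefulEMA:
    """First-order low-pass filter that carries its state across chunks."""

    def __init__(self, alpha, n_channels):
        self.alpha = alpha
        self.state = np.zeros(n_channels)  # last output, kept between calls

    def process(self, chunk):
        """Filter a chunk of shape (n_samples, n_channels) along time."""
        out = np.empty_like(chunk, dtype=float)
        for i, sample in enumerate(chunk):
            self.state = self.alpha * sample + (1.0 - self.alpha) * self.state
            out[i] = self.state
        return out

def stream_chunks(data, chunk_size):
    """Yield consecutive time slices, standing in for a DAS stream."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

Because the state survives between `process` calls, concatenating the filtered chunks reproduces the result of filtering the full array at once, with no discontinuity at segment boundaries.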

2605.20108 2026-05-20 eess.SY cs.AI cs.LG cs.LO cs.SY

k-Inductive Neural Barrier Certificates for Unknown Nonlinear Dynamics

Ben Wooding, Hongchao Zhang, Taylor T. Johnson, Abolfazl Lavaei

AI summary: This paper proposes neural-network-based k-inductive neural barrier certificates (k-NBCs) for partially unknown nonlinear systems. It exploits the scalability of neural networks and a generalization of Willems et al.'s fundamental lemma to build a data-driven representation for SMT verification, while increasing design flexibility.

Details
Comments
18 pages, 5 figures, 3rd International Conference on Neuro-Symbolic Systems (NeuS)
Abstract

While conventional (k=1) discrete-time barrier certificate conditions impose strict safety constraints by requiring the function to be non-increasing at every step, k-inductive barrier certificates relax this by allowing a temporary increase (up to k-1 times, each within a threshold ε) while maintaining overall safety and improving flexibility. This paper leverages neural networks to construct k-inductive neural barrier certificates (k-NBCs) for (partially) unknown nonlinear systems. While neural networks offer scalability in the design process, they lack formal guarantees, requiring additional approaches such as counterexample-guided inductive synthesis (CEGIS) with satisfiability modulo theories (SMT) for verification. However, the CEGIS-SMT framework requires knowledge of the system dynamics, which is unavailable in practical settings. To address this, we leverage a generalization of Willems et al.'s fundamental lemma, using a single state trajectory, to construct a data-driven representation of (partially) unknown models for SMT verification without sacrificing accuracy. Additionally, CEGIS-SMT removes the constraint of restricting barrier certificates to specific function classes, such as sum-of-squares, enabling greater flexibility in their design. We validate our approach on three nonlinear case studies with (partially) unknown dynamics.
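
The relaxation described in this abstract can be made concrete with a sampled check of the two step conditions: each one-step increase stays within ε, and the barrier does not increase over k steps. This is only a sketch under assumptions: the exact conditions in the paper (including those on the initial and unsafe sets) may differ, checking sampled states proves nothing formally, and `check_k_inductive_steps` and the toy system are hypothetical.

```python
def check_k_inductive_steps(barrier, step, states, k, eps):
    """Check the step conditions on sampled states (not a formal proof).

    one-step: B(f(x)) - B(x) <= eps
    k-step:   B(f^k(x)) - B(x) <= 0
    """
    for x in states:
        b0 = barrier(x)
        if barrier(step(x)) - b0 > eps:
            return False
        xk = x
        for _ in range(k):
            xk = step(xk)
        if barrier(xk) - b0 > 0.0:
            return False
    return True

# Toy system: the state grows by 1.2 on even phases and shrinks by 0.5 on odd
# phases, so the barrier can rise for one step but falls over two steps.
def toy_step(s):
    x, phase = s
    return (1.2 * x, 1) if phase == 0 else (0.5 * x, 0)

def toy_barrier(s):
    return abs(s[0]) - 1.0

toy_states = [(i / 10.0, p) for i in range(-10, 11) for p in (0, 1)]
```

On this toy system the check passes for k=2 with ε=0.25 but fails for k=1, which is the kind of flexibility the k-inductive relaxation is meant to provide.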

2605.20079 2026-05-20 cs.CV cs.AI cs.LG eess.IV

Probability-Conserving Flow Guidance

Parsa Esmati, Junha Hyung, Amirhossein Dadashzadeh, Jaegul Choo, Majid Mirmehdi

AI summary: This paper proposes AdaMaG, a probability-conserving flow guidance method. By analyzing the continuity equation, it decomposes the effect of guidance into a divergence term and a score-parallel term, and controls both through a time-dependent schedule and score-parallel attenuation, improving generation quality and reducing hallucinations at no additional inference cost.

Details
Abstract

Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generative manifold geometry, breaking probability conservation and driving samples off the learned manifold under strong guidance. We analyse guidance through the continuity equation and show its effect decomposes into a divergence term and a score-parallel term defined invariantly across parameterisations. We prove the divergence term blows up structurally as sampling approaches the data manifold, motivating a time-dependent schedule alongside score-parallel attenuation. The resulting plug-and-play rule, Adaptive Manifold Guidance (AdaMaG), bounds both terms at no additional inference cost. Finally, we show that most empirical heuristics for reducing saturation or improving generation quality correspond directly to the two terms in our decomposition. Across image generation benchmarks, AdaMaG improves realism, reduces hallucinations, and induces controlled desaturation in high-guidance regimes.
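
For readers unfamiliar with the baseline that this abstract criticizes, standard classifier-free guidance is the linear extrapolation sketched below. The sketch shows plain CFG only, not AdaMaG; the function name and the use of NumPy arrays in place of model outputs are assumptions for illustration.

```python
import numpy as np

def cfg_velocity(v_cond, v_uncond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)
```

A scale of 1 recovers the conditional prediction, 0 recovers the unconditional one, and scales above 1 extrapolate beyond both, which is the strong-guidance regime the abstract discusses.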

2605.20038 2026-05-20 eess.SY cs.SY

A New Simple-to-Configure Self-Perturbing Multivariable Extremum-Seeking Controller

Timothy I. Salsbury, Min Gyung Yu

AI summary: This paper presents a new stochastic relay-based extremum-seeking controller for multi-input-single-output systems. A simplified configuration process makes practical deployment more feasible, and the controller is shown to optimize both static and dynamic systems.

Details
Journal ref
Salsbury, Timothy I., and Min Gyung Yu. "A New Simple-to-Configure Self-Perturbing Multivariable Extremum-Seeking Controller." IFAC-PapersOnLine 58.28 (2024): 756-761
Abstract

This paper presents a new stochastic relay-based extremum-seeking controller (ESC) for multi-input-single-output (MISO) systems. The goal of this work was to create an algorithm that is much simpler to configure than alternative approaches, making deployment to real-world problems easier. A solution is developed first for a static map and then adapted for a general class of dynamic systems. The number of configurable parameters is one per input channel for the static case, and only one additional parameter is needed for the dynamic version. The problem of gradient identification is solved via the use of stochastic relay gains, and a simple stability proof for the static case is presented. Simulation tests demonstrate the performance of the strategy for optimizing both static and dynamic systems.

2605.20016 2026-05-20 eess.IV cs.CV

FGSVQA: Frequency-Guided Short-form Video Quality Assessment

Xinyi Wang, Angeliki Katsenou, Junxiao Shen, David Bull

AI summary: This paper proposes an end-to-end video quality assessment framework that uses a CLIP-based dense visual encoder and frequency-domain compression priors to generate artifact- and structure-aware weight maps for efficient video quality prediction.

Details
Comments
4 pages, 1 figure
Abstract

Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address this challenge, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP, and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method achieves strong performance on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787), while maintaining efficient inference runtime. The code and additional results are available at: https://github.com/xinyiW915/FGSVQA.
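
The two metrics quoted in this abstract, SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation), can be computed as in the sketch below. It is illustrative only: the helper names are hypothetical, ties in the scores are assumed absent, and the nonlinear score mapping that quality-assessment studies often apply before PLCC is omitted.

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation between predictions and subjective scores."""
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Ranks come from a double argsort, which assumes no tied values.
    """
    def rank(values):
        return np.argsort(np.argsort(np.asarray(values)))
    return plcc(rank(pred), rank(mos))
```

SRCC depends only on ordering, so a monotone but nonlinear relation between predictions and scores gives SRCC of 1 while PLCC stays below 1.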

2605.19997 2026-05-20 eess.SP

CAT-MoEformer: Context-Aware Temporal MoE Transformer for Beam Prediction

Changkai Zhou, Cunhua Pan, Hong Ren, Jiangzhou Wang

AI summary: This paper proposes CAT-MoEformer, a context-aware transformer with scene-conditioned mixture-of-experts (MoE) feed-forward networks, for proactive mmWave beam prediction from compressed uplink pilot observations. A three-layer asymmetric convolutional network followed by a squeeze-and-excitation recalibration block extracts frequency-beam correlation features, a pretrained GPT-2 model captures the temporal evolution of beam sequences, and scene-conditioned MoE-FFN modules replace the feed-forward networks in the upper transformer layers. A lightweight gating network maps the scenario label and normalized user equipment speed to expert mixing weights, so routing decisions are conditioned on physical propagation descriptors, avoiding the load imbalance of routing on latent hidden states.

Details
Comments
13 pages, 5 figures
Abstract

This paper proposes CAT-MoEformer, a context-aware transformer with scene-conditioned mixture-of-experts (MoE) feed-forward networks, for proactive mmWave beam prediction from compressed uplink pilot observations. The spatial encoder comprises a three-layer asymmetric convolutional network followed by a squeeze-and-excitation recalibration block, which extracts frequency-beam correlation features from pilot tensors without explicit channel reconstruction. A truncated pretrained GPT-2 backbone models the temporal evolution of beam sequences, with the feed-forward networks in the upper three transformer layers replaced by scene-conditioned MoE-FFN modules. A lightweight gating network maps the scenario label and normalized user equipment speed to expert mixing weights, conditioning the routing decision on physical propagation descriptors rather than on latent hidden states. This design yields interpretable expert assignments and eliminates the load imbalance associated with token-level routing. To prevent expert collapse under soft routing, a three-stage training strategy is introduced: hard expert assignment in the first stage establishes scene-specific specialization, isolated gating network training in the second stage aligns the soft routing distribution with the hard partition, and top-1 hard inference in the third stage fine-tunes the model under deterministic single-expert activation to maximize scene-specific precision. Results on 3GPP TR 38.901 Urban Macro channel simulations with 64,000 user samples demonstrate that CAT-MoEformer achieves a Top-1 beam prediction accuracy of 94.88% and a beam switching instant accuracy of 80.62%, representing gains of 2.33% and 9.55%, respectively, over a CNN+GPT-2 baseline, with an inference latency of 0.52 ms.
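
The scene-conditioned routing described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' architecture: the single linear gating layer, the one-hot scenario encoding, and the names `gate_weights` and `moe_ffn` are assumptions, and the experts are plain callables standing in for feed-forward networks.

```python
import numpy as np

def gate_weights(scenario_id, speed_norm, W, b, n_scenarios):
    """Map (scenario label, normalized UE speed) to expert mixing weights.

    W has shape (n_experts, n_scenarios + 1) and b has shape (n_experts,).
    """
    one_hot = np.zeros(n_scenarios)
    one_hot[scenario_id] = 1.0
    features = np.concatenate([one_hot, [speed_norm]])
    logits = W @ features + b
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

def moe_ffn(x, experts, weights, top1=False):
    """Soft mixture of expert outputs, or the single best expert if top1."""
    if top1:
        return experts[int(np.argmax(weights))](x)
    return sum(w * f(x) for w, f in zip(weights, experts))
```

Because the gate sees only the scenario descriptor and speed, every token in a sequence is routed to the same experts, which is what makes the assignment interpretable in the sense the abstract describes.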

2605.19992 2026-05-20 eess.SY cs.SY math.OC

Robust synchronization for multi-agent systems governed by PDEs with observable and unobservable disturbances

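Yongchun Bi, Jun Zheng, Guchuan Zhu, Jiye Zhang

AI summary: This paper studies robust synchronization for multi-agent systems governed by parabolic partial differential equations in the presence of observable and unobservable disturbances. A disturbance observer using only boundary output measurements estimates the observable Dirichlet boundary disturbances while keeping the observer error system robust to unobservable disturbances. Distributed synchronization controllers built from the reference signal and local output information let all agents track the reference trajectory, with exponential tracking when unobservable disturbances are absent and robustness preserved during controller implementation. The effect of unobservable Dirichlet-Robin boundary disturbances on synchronization performance is analyzed by proving boundedness of solutions to the synchronization error system, and input-to-state stability (ISS) of the closed-loop system is established to characterize the influence of all disturbances. The stability analysis relies on the generalized Lyapunov method and a recursion technique, and well-posedness is proved with the lifting technique and semigroup theory. Simulations validate the control scheme, showing effective disturbance estimation and rejection, robust synchronization, and the ISS properties in various scenarios.

Details
Abstract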

This paper investigates robust synchronization for multi-agent systems (MASs) governed by parabolic partial differential equations in the presence of both observable and unobservable disturbances. Using only boundary output measurements, a disturbance observer is designed to estimate observable Dirichlet boundary disturbances while ensuring robustness of the observer error system with unobservable disturbances occurring in the domain. Using only the reference signal and local output information, distributed synchronization controllers are then constructed to enable all agents to track the reference trajectory. In particular, exponential tracking is achieved in the absence of unobservable disturbances, while robustness is preserved when additional unobservable disturbances occur during controller implementation. We further analyze the impact of unobservable Dirichlet-Robin boundary disturbances on synchronization performance by proving the boundedness of solutions to the synchronization error system. Moreover, to characterize the influence of all disturbances, input-to-state stability (ISS) is established for the closed-loop system. For the involved systems, the generalized Lyapunov method and the recursion technique are extensively employed in the stability analysis, and the lifting technique and semigroup theory are used to prove the well-posedness. Simulation results validate the proposed control scheme, demonstrating effective disturbance estimation and rejection, robust synchronization, and the ISS properties under various scenarios.

2605.19967 2026-05-20 eess.SY cs.SY

Safe Deep Reinforcement Learning for Spacecraft Reorientation with Pointing Keep-Out Constraint

Juntang Yang, Mohamed Khalil Ben-Larbi

AI summary: This paper proposes a deep reinforcement learning method for spacecraft reorientation control with a single pointing keep-out zone. It designs a new state space representation, formulates a reward function that achieves the control objective while enforcing the attitude constraint, adopts the soft actor-critic algorithm to handle continuous state and action spaces, trains the agent with curriculum learning, and guarantees compliance with the attitude constraint through a safety filter based on control barrier functions.

Details
Abstract

This paper implements deep reinforcement learning (DRL) with a safety filter for spacecraft reorientation control with a single pointing keep-out zone. A new state space representation is designed, which includes a compact representation of the attitude constraint zone. A reward function is formulated to achieve the control objective while enforcing the attitude constraint. The soft actor-critic (SAC) algorithm is adopted to handle continuous state and action spaces. A curriculum learning approach is implemented for agent training. To guarantee compliance with the attitude constraint, a control barrier function (CBF)-based safety filter is implemented for agent deployment. Simulation results demonstrate the effectiveness of the proposed state space representation and the designed reward function. Monte Carlo simulations underscore that reward shaping alone cannot guarantee safety during reorientation maneuvers. In contrast, with the CBF-based safety filter, the constraint can be guaranteed during maneuvers.
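
A CBF-based safety filter of the kind mentioned in this abstract is commonly posed as a small quadratic program: stay as close as possible to the agent's action while satisfying a linear constraint a·u ≥ b. With a single constraint the program has a closed-form solution, sketched below. This is a generic sketch, not the paper's filter: how `a` and `b` are derived from the attitude dynamics and barrier function (for example a = L_g h(x), b = -L_f h(x) - αh(x)) is an assumption left outside the code, and `a` is assumed nonzero.

```python
import numpy as np

def cbf_safety_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a @ u >= b in closed form.

    u_nom: nominal action from the learned policy.
    a, b:  coefficients of the single linear CBF constraint (a nonzero).
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = b - a @ u_nom
    if violation <= 0.0:
        return u_nom  # nominal action already satisfies the constraint
    # Otherwise project onto the constraint boundary along a.
    return u_nom + (violation / (a @ a)) * a
```

The filter leaves safe actions untouched and makes the smallest correction otherwise, which is why it can be added at deployment without retraining the agent.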

2605.19963 2026-05-20 eess.SP

ADOPT: Analytical Demodulation of Periodic Textures for In-Plane Wave Tracking

Jalal Jouidi, Florent Chatelain, Lucie Bailly, Pierre Granjon, Nicolas Le Bihan, Stefan Catheline

AI summary: This paper proposes an analytical demodulation method, based on a two-dimensional analytic signal, for tracking in-plane waves from image sequences. At high signal-to-noise ratios it outperforms conventional digital image correlation, particularly for small displacements, and it is more computationally efficient.

Details
Comments
21 pages, 14 figures
Abstract

This paper addresses the problem of tracking in-plane waves from image sequences using periodic surface patterns. Wave-induced deformation is modeled as a spatial phase modulation of a periodic carrier. We propose ADOPT (Analytical Demodulation of Periodic Textures), a method based on an oriented two-dimensional analytic signal to estimate displacement phase and orientation. The approach relies on a physical model describing longitudinal and transverse in-plane waves. Orientation-selective filtering isolates relevant spectral components, and phase extraction provides a stable reconstruction of the displacement field. A theoretical analysis using the Cramér-Rao bound evaluates performance limits of ADOPT. Simulations show that the proposed method outperforms state-of-the-art Digital Image Correlation (DIC) at high signal-to-noise ratios, especially for small displacements where DIC becomes limited. Moreover, ADOPT is more computationally efficient. Experiments on silicone membranes with periodic patterns confirm accurate estimation of wave fields and dispersion curves under impulsive excitation. Overall, the proposed framework provides a robust and efficient solution for wave-induced displacement estimation.

2605.19887 2026-05-20 cs.DC cs.MA cs.RO cs.SY eess.SY

DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines

Thien Tran, Jonathan Kua, Thuong Hoang, Minh Tran, Yuemin Ding, Jiong Jin

AI summary: This paper proposes a DAG-based, QoS-aware dynamic task placement framework for sensing-perception-planning-control pipelines in networked robotics. Dynamic task placement accounts for compute cost, communication delay, and feasible placement sets, addressing the shortcomings of static edge offloading and single-stage models.

Details
Comments
4 pages, 1 figure, 1 algorithm, accepted as a Work-in-Progress (WiP) paper at the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July, 2026, Melbourne, Australia
Abstract

Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages may become computationally intensive onboard due to complex models embedded on robots. In practice, offloading the perception task to on-site edges statically is inappropriate for latency-sensitive, precise industrial settings over a standardized industrial network. This emphasizes the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers can partially address the gap by relocating a single pipeline stage on binary threshold rules, without a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. This pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets; over a small interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell-time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.
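
The decision rule outlined in this abstract (a window cost, a Hamming-distance switching penalty, hysteresis, and a minimum dwell time) can be sketched as follows. All names, weights, and the exact form of the rule are assumptions for illustration; the paper's algorithm may combine these ingredients differently.

```python
def hamming(p, q):
    """Number of pipeline stages whose placement differs."""
    return sum(a != b for a, b in zip(p, q))

def window_cost(metrics, weights):
    """Weighted sum of window metrics, e.g. tail latency, violation rate."""
    return sum(weights[name] * metrics[name] for name in weights)

def choose_placement(current, costs, switch_weight, hysteresis, dwell, min_dwell):
    """Pick the next placement from a small candidate set.

    costs maps each candidate placement (a tuple with one entry per stage,
    including the current placement) to its window cost.
    """
    if dwell < min_dwell:
        return current  # too soon since the last switch
    def total(candidate):
        return costs[candidate] + switch_weight * hamming(candidate, current)
    best = min(costs, key=total)
    # Switch only if the improvement clears the hysteresis margin.
    if costs[current] - total(best) > hysteresis:
        return best
    return current
```

The dwell check and the hysteresis margin both suppress rapid back-and-forth switching, which is the placement chatter the abstract refers to.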

2605.19885 2026-05-20 eess.IV cs.CR cs.ET cs.MM

Set Shaping Theory as a Complementary Payload-Shaping Layer for Steganography

Aida Koch, Logan Lewis, Lily Scott, Agi Weber

AI summary: This paper studies Set Shaping Theory (SST) as a reversible payload-shaping layer for least significant bit (LSB) image steganography. Controlled simulations show that SST reduces KL divergence, making steganographic traces less detectable under histogram-based criteria.

Details
Abstract

This paper studies the use of Set Shaping Theory (SST) as a reversible payload-shaping layer for least significant bit (LSB) image steganography. The proposal is not intended to replace existing steganographic methods or to compete with them as a new embedding scheme. Instead, SST is positioned as a complementary preprocessing stage that makes an existing embedding method easier to apply with lower statistical disturbance. The SST transformation increases the message length by K symbols and is implemented with the approximate and fast transformation algorithm developed by Glen Tankersley. Although the embedded payload is lengthened from N to N+K bits, the selected representation can reduce D_KL(P||Q) and therefore make the subsequent steganographic insertion less detectable under histogram-based criteria. Across 1,800 controlled simulations on four synthetic cover-image models, SST reduced D_KL(P||Q) by an average of 25.16 percent relative to a fair N+K LSB baseline, with a 95 percent confidence interval of +/- 1.22 percent. For K=8, the average reduction reached 42.81 percent. Additional robustness simulations with keyed random embedding paths confirmed the effect across several distances: at K=8, SST reduced KL divergence by 42.44 percent, Jensen-Shannon divergence by 29.62 percent, total variation by 12.41 percent, and symmetric chi-square distance by 28.30 percent. An additional image-based matrix-embedding/STC-like simulation showed that SST also reduces the minimum weighted insertion cost: relative to the unshaped K=0 reference, K=8 reduced the cost by 6.93 percent.
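
The four histogram distances reported in this abstract can be computed as in the sketch below. It is illustrative only: which histograms play the roles of P and Q in the paper is not stated here, the small smoothing constant is an assumption to avoid division by zero, and the symmetric chi-square distance is written in one common convention (others differ by a constant factor).

```python
import numpy as np

def normalize(hist, eps=1e-12):
    """Turn a histogram of counts into a smoothed probability vector."""
    p = np.asarray(hist, dtype=float) + eps
    return p / p.sum()

def kl(p, q):
    """Kullback-Leibler divergence D_KL(P||Q) in nats."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence (symmetric, based on the mixture m)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tv(p, q):
    """Total variation distance."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def sym_chi2(p, q):
    """Symmetric chi-square distance, sum of (p - q)^2 / (p + q)."""
    return float(np.sum((p - q) ** 2 / (p + q)))
```

All four vanish when the two histograms coincide, so a smaller value after shaping corresponds to a smaller statistical disturbance under these histogram-based criteria.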

2605.19834 2026-05-20 cs.LG cs.AI cs.SY eess.SY

A Closed-loop, State-centric, Multi-agent Framework for Passenger Load Estimation from Heterogeneous Data Streams

一种闭环、以状态为中心的多智能体框架,用于从异构数据流中估计乘客负载

Yiyao Xu, Hao Zhou, Yuhang Wang, Jingran Sun

AI总结 本文提出一种闭环、以状态为中心的多智能体框架,用于从异构数据流中准确估计乘客负载,通过动态分配信任和物理约束提升鲁棒性。

详情
Comments
Preprint version of a paper accepted by the 2026 IEEE 29th International Conference on Intelligent Transportation Systems (ITSC). 7 pages, 4 figures
AI中文摘要

为了支持运营和乘客服务,公共交通机构需要可靠的乘客负载轨迹。目前,负载估计通常是从不完美的传感系统推断而来,而非完全观察,现代自动乘客计数(APC)系统的准确性仍受车站布局、流量强度和运营条件的影响。为了解决从异构数据流中稳健估计乘客负载的挑战,包括增量计数误差、证据冲突和上下文依赖的传感器可靠性,我们提出了一种闭环、以状态为中心的多智能体框架。该方法在每一步都强制物理可行性,动态分配信任给证据源,并将物理推导出的违反残差反馈回训练以提高鲁棒性。该架构包括一个统一的停靠事件骨干,一个耦合的感知-物理-融合循环用于停靠点推断,以及可选的行程级宏修正和闭环校准模块。

英文摘要

To support operations and passenger-facing services, transit agencies need reliable passenger load trajectories. Currently, load estimates are typically inferred from imperfect sensing systems rather than fully observed, and the accuracy of modern automatic passenger counting (APC) systems still varies with station layout, flow intensity, and operating conditions. To address the challenges of robust passenger load estimation from heterogeneous data streams, including incremental count errors, evidence conflicts, and context-dependent sensor reliability, we propose a closed-loop, state-centric, multi-agent framework. This method enforces physical feasibility at every step, allocates trust dynamically among evidence sources, and feeds physics-derived violation residuals back into training for robustness improvement. The architecture consists of a unified stop-event backbone, a coupled Perception--Physical--Fusion loop for stop-by-stop inference, and optional trip-level macro-correction and closed-loop calibration modules.
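
To make "physical feasibility at every step" and "violation residuals" concrete, here is a deliberately simple Python sketch. It is not the paper's framework: it only propagates a load from boarding and alighting counts, clips it to the range [0, capacity], and records how far the raw value violated that range. The function name, the clipping rule, and the example counts are all assumptions made for illustration.

```python
def propagate_load(boardings, alightings, capacity, initial_load=0):
    """Stop-by-stop load with a hard feasibility projection.

    Returns (loads, residuals). residuals[k] is raw_load - feasible_load at
    stop k, so a non-zero value flags counts that are physically impossible
    (negative load or load above vehicle capacity).
    """
    loads, residuals = [], []
    load = initial_load
    for boarded, alighted in zip(boardings, alightings):
        raw = load + boarded - alighted           # conservation of passengers
        feasible = min(max(raw, 0), capacity)     # project onto [0, capacity]
        loads.append(feasible)
        residuals.append(raw - feasible)
        load = feasible                           # carry the feasible state forward
    return loads, residuals


if __name__ == "__main__":
    # Hypothetical counts: the last stop reports more alightings than riders.
    loads, residuals = propagate_load([10, 5, 0], [0, 3, 15], capacity=40)
    print(loads, residuals)
```

In a learning setting, residuals like these are the kind of signal that could be fed back as a penalty, which is the role the abstract assigns to its physics-derived violation residuals.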

2605.19833 2026-05-20 cs.SD cs.AI cs.CL cs.MM eess.AS

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

Mega-ASR: 通过扩大现实世界声学模拟实现野外语音识别

Zhifei Xie, Kaiyu Pang, Haobin Zhang, Deheng Ye, Xiaobin Hu, Shuicheng Yan, Chunyan Miao

AI总结 本文提出Mega-ASR,一种统一的野外语音识别框架,结合可扩展的复合数据构建与渐进的声学到语义优化,通过在7种经典声学现象和54种物理上合理的复合场景上训练,显著提升了在恶劣环境下的语音识别性能。

详情
Comments
Project page: https://xzf-thu.github.io/Mega-ASR/. Code, models, and dataset will be released. A robust ASR framework targeting in-the-wild and compositional acoustic scenarios where conventional ASR systems fail
AI中文摘要

尽管自动语音识别(ASR)和大型音频-语言模型取得了快速进展,但现实环境中鲁棒的识别仍然受到一个“声学鲁棒性瓶颈”的限制:模型在严重、复合的失真下常常失去声学基础并产生遗漏或幻觉。我们提出了Mega-ASR,一种统一的ASR-in-the-wild框架,结合可扩展的复合数据构建与渐进的声学到语义优化。我们引入了Voices-in-the-Wild-2M,涵盖7种经典声学现象和54种物理上合理的复合场景,并通过Acoustic-to-Semantic Progressive Supervised Fine-Tuning和Dual-Granularity WER-Gated Policy Optimization训练Mega-ASR。大量实验表明,Mega-ASR在恶劣条件ASR基准测试中显著优于先前的最先进系统(在VOiCES R4-B-F上45.69% vs. 54.01%,在NOIZEUS Sta-0上21.49% vs. 29.34%)。在复杂的复合声学场景中,Mega-ASR进一步在强大的开源和闭源基线中实现了超过30%的相对WER降低,建立了在野外鲁棒ASR的可扩展范式。

英文摘要

Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an "acoustic robustness bottleneck": models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional distortions. We propose Mega-ASR, a unified ASR-in-the-wild framework that combines scalable compound-data construction with progressive acoustic-to-semantic optimization. We introduce Voices-in-the-Wild-2M, covering 7 classic acoustic phenomena and 54 physically plausible compound scenarios, and train Mega-ASR with Acoustic-to-Semantic Progressive Supervised Fine-Tuning and Dual-Granularity WER-Gated Policy Optimization. Extensive experiments demonstrate that Mega-ASR achieves significant advantages over prior state-of-the-art systems on adverse-condition ASR benchmarks (45.69% vs. 54.01% on VOiCES R4-B-F, and 21.49% vs. 29.34% on NOIZEUS Sta-0). On complex compositional acoustic scenarios, Mega-ASR further delivers over 30% relative WER reduction against strong open- and closed-source baselines, establishing a scalable paradigm for robust ASR in-the-wild.
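
For readers who want to relate the reported figures to each other, the sketch below computes word error rate (WER) with a standard word-level edit distance and converts two WER values into a relative reduction. It assumes the benchmark numbers in the abstract are WERs (lower is better), which the abstract implies but does not state outright. Under that assumption the gaps work out to roughly 15.4% relative on VOiCES R4-B-F and 26.8% relative on NOIZEUS Sta-0. The function names are hypothetical.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length.

    Assumes a non-empty reference and whitespace tokenization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))      # distance from empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        cur = [i]                         # deleting i reference words
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline, system):
    """Fraction by which `system` improves on `baseline` (both error rates)."""
    return (baseline - system) / baseline


if __name__ == "__main__":
    print(wer("the cat sat", "the sat"))
    print(round(100 * relative_reduction(54.01, 45.69), 1))
    print(round(100 * relative_reduction(29.34, 21.49), 1))
```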

2605.19793 2026-05-20 cs.NI cs.ET cs.SY eess.SY

Motion-Coupled Sensing: When the State Change Powers Its Own Sensing

运动耦合感知:当状态变化本身驱动其自身的感知

Muhammad Tahir, Muhammad Mubbashar Baig, Umer Irfan, Muhammad Ahad, Naveed Anwar Bhatti

AI总结 该研究提出了一种运动耦合感知方法,利用机械运动提供的能量进行自供电的感知任务,解决了传统物联网系统中对电池维护和周期性轮询的依赖问题。

详情
Comments
9 Pages, 12 Figures
AI中文摘要

无电池物联网系统长期以来主要沿着两条路径发展:环境能量感知,其中能量到达与被监控事件相互分离;以及动能事件电报,其中用户操作为关于该操作本身的简短报告供能。机械门控状态揭示了第三种情况:访问运动不仅是需要报告的事件,而且是潜在物理状态可能已改变并必须测量的时刻。我们证明常规铰链运动可以为一次有界的唤醒-感知-传输事务提供足够的能量,包括超声波感知和远距离LoRa上行链路。我们将这一原理称为运动耦合感知,并通过一个开源的紧凑型电磁能量采集器实现,该采集器无需结构修改即可加装到垃圾桶、门和柜子上。我们按要求最高的工作负载(垃圾桶监控)确定平台规格,其中每次操作必须同时为超声波测量和远距离LoRa上行链路供能。在五个校园地点和5,945次盖子操作中,垃圾桶部署实现了99.3%的每事件传输可靠性。在房间门(1,870次操作)和办公室柜子(1,636次操作)上的现场部署分别实现了92%和94%的传输成功率,证明了相同的能量包络无需硬件重新设计即可迁移到不同的铰链几何形状。这些结果表明,机械访问可以被视为自供电的感知事务,从而免除了物联网部署中的周期性轮询和计划性电池维护。

英文摘要

Batteryless IoT systems have largely followed two paths: ambient-energy sensing, where energy arrival is decoupled from the event being monitored, and kinetic event telegrams, where a user actuation powers a short report of the actuation itself. Mechanically gated states expose a third case: the access motion is not only an event to report, but the moment at which a latent physical state may have changed and must be measured. We show that routine hinge motion can supply enough energy for one bounded wake-sense-transmit transaction, including ultrasonic sensing and a long-range LoRa uplink. We call this principle motion-coupled sensing and instantiate it with an open-source compact electromagnetic harvester that retrofits to bins, doors, and cabinets with no structural modification. We size the platform for the most demanding workload, waste-bin monitoring, where each actuation must power both an ultrasonic measurement and a long-range LoRa uplink. Across five campus locations and 5,945 lid actuations, the bin deployment achieves 99.3% per-event transmission reliability. Field deployments on room doors with 1,870 actuations and office cabinets with 1,636 actuations achieve 92% and 94% transmission success respectively, demonstrating that the same energy envelope transfers across hinge geometries without hardware redesign. These results show that mechanical access can be treated as a self-powered sensing transaction, removing periodic polling and scheduled battery maintenance for IoT deployments.

2605.19790 2026-05-20 eess.SP

Channel Estimation for Beyond Diagonal RIS-Aided Multi-User mmWave Systems

面向超对角RIS辅助多用户毫米波系统的信道估计

Linyu Peng, Tian Qiu, Cunhua Pan, Jiangzhou Wang, Taihao Zhang, Hong Ren

AI总结 本文提出了一种新的块克罗内克结构级联信道模型,用于超对角RIS辅助多用户毫米波系统,通过三阶段估计协议提高信道估计精度并降低导频开销。

详情
AI中文摘要

超对角可重构智能表面(BD-RIS)代表了推进毫米波(mmWave)通信的有前景的架构。然而,其复杂的元件间连接破坏了传统的解耦数学结构,从而使级联信道估计变得十分困难。本文为配备均匀平面阵列(UPAs)的组连接BD-RIS辅助多用户(MU)毫米波系统提出了一种新的块克罗内克结构级联信道模型。通过利用级联信道稀疏性,提出了一种高效的三阶段估计协议。具体而言,第一阶段通过基于离散傅里叶变换(DFT)的方法获取基站(BS)处的公共到达角(AoAs)。第二阶段利用块克罗内克结构以及正交匹配追踪(OMP)和基于相关性的最小二乘(LS)方法来提取指定典型用户的完整级联信道。最后,第三阶段利用分层块OMP(HBOMP)算法来估计其他用户的信道。该方法按结构重建公共分量和用户特定分量,从根本上降低了计算复杂度并显著减少了导频开销。数值仿真验证了所提协议在保持相对较低导频开销的同时提高了信道估计精度。

英文摘要

Beyond diagonal reconfigurable intelligent surface (BD-RIS) represents a promising architecture for advancing millimeter-wave (mmWave) communications. However, its intricate inter-element connections invalidate the conventional decoupled mathematical structure, thereby severely complicating cascaded channel estimation. In this paper, we formulate a novel block-Kronecker-structured cascaded channel model for a group-connected BD-RIS-aided multi-user (MU) mmWave system equipped with uniform planar arrays (UPAs). By exploiting the cascaded channel sparsity, an efficient three-stage estimation protocol is proposed. Specifically, Stage I acquires the common angles of arrival (AoAs) at the base station (BS) via a discrete Fourier transform (DFT)-based approach. Stage II leverages the block-Kronecker structure alongside orthogonal matching pursuit (OMP) and correlation-based least squares (LS) to extract the complete cascaded channel for a designated typical user. Finally, Stage III utilizes a Hierarchical Block OMP (HBOMP) algorithm to estimate the other users' channels. This structurally reconstructs the common and user-specific components, which fundamentally reduces the computational complexity and substantially reduces the pilot overhead. Numerical simulations verify that the proposed protocol yields improved channel estimation accuracy while maintaining a relatively low pilot overhead.
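
Orthogonal matching pursuit, which Stage II builds on, can be sketched generically as follows. This is plain textbook OMP for a real-valued dictionary, written in dependency-free Python; it is not the paper's block-Kronecker or hierarchical block variant, and mmWave channels would need complex arithmetic. The names `omp` and `solve_linear` and the tiny dictionary are hypothetical, and the dictionary columns are assumed to be non-zero and linearly independent on the selected support.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def omp(dictionary, y, sparsity):
    """Greedy sparse recovery of x such that dictionary @ x is close to y."""
    m, n = len(dictionary), len(dictionary[0])
    cols = [[dictionary[i][j] for i in range(m)] for j in range(n)]
    residual, support, coef = y[:], [], []
    for _ in range(sparsity):
        # 1) Pick the unused column most correlated with the residual.
        scores = [
            -1.0 if j in support
            else abs(dot(cols[j], residual)) / dot(cols[j], cols[j]) ** 0.5
            for j in range(n)
        ]
        support.append(max(range(n), key=lambda j: scores[j]))
        # 2) Least squares on the current support via the normal equations.
        gram = [[dot(cols[a], cols[b]) for b in support] for a in support]
        coef = solve_linear(gram, [dot(cols[a], y) for a in support])
        # 3) Update the residual with the refitted coefficients.
        residual = [
            y[i] - sum(c * cols[j][i] for c, j in zip(coef, support))
            for i in range(m)
        ]
    x = [0.0] * n
    for c, j in zip(coef, support):
        x[j] = c
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0, 1.0],
         [0.0, 1.0, 0.0, 1.0],
         [0.0, 0.0, 1.0, 0.0]]
    print(omp(A, [2.0, 0.0, 3.0], sparsity=2))
```

The refit in step 2 is what distinguishes OMP from plain matching pursuit: every selected coefficient is re-estimated jointly each time the support grows.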

2605.18222 2026-05-20 eess.AS

Contextual Biasing for Streaming ASR via CTC-based Word Spotting

通过基于CTC的词搜索实现流式ASR中的上下文偏置

Kai-Chen Tsai, Tien-Hong Lo, Yun-Ting Sun, Berlin Chen

AI总结 本文提出了一种基于CTC的词搜索的流式ASR上下文偏置方法,通过在音频块间维护活跃关键词路径,实现了跨块关键词检测,并引入增量承诺机制以降低延迟并保证输出稳定性,实验表明该方法有效降低了WER并提升了关键词F分数。

详情
AI中文摘要

上下文偏置对于提高自动语音识别(ASR)系统中罕见词和领域特定词的识别性能至关重要。尽管近年来提出了多种方法,但大多数方法集中在离线设置,并未明确解决流式ASR的挑战。例如,基于CTC的词搜索(CTC-WS)通过直接从CTC对数概率中检测关键词表现出强大性能,但其局限于离线处理并需要完整话语的访问。在本工作中,我们提出了CTC-WS的流式扩展,用于实时上下文偏置。我们的方法通过使用具有状态的标记传递算法在音频块间维护活跃关键词路径,从而实现跨块关键词检测。为确保低延迟和稳定输出,我们引入了增量承诺机制,该机制仅发出保证不受未来音频影响的片段,同时推迟不确定区域。该方法自然地与流式ASR流水线集成,无需对底层声学模型进行修改或额外训练,使其在实际部署中具有实用性。实验结果表明,我们的方法降低了整体WER,并有效提高了关键词F分数,证明了其在实时ASR应用中的有效性。

英文摘要

Contextual biasing is essential to improving the recognition of rare and domain-specific words in an automatic speech recognition (ASR) system. While numerous methods have been proposed in recent years, most of them focus on offline settings and do not explicitly address the challenges of streaming ASR. For example, CTC-based word spotting (CTC-WS) has demonstrated strong performance by directly detecting keywords from CTC log-probabilities, but it is limited to offline processing and requires access to the full utterance. In this work, we present a streaming extension of CTC-WS for real-time contextual biasing. Our method maintains active keyword paths across audio chunks using a stateful token passing algorithm, enabling the detection of keywords that span multiple chunks. To ensure low latency and stable output, we introduce an incremental commitment mechanism that only emits segments guaranteed not to be affected by future audio, while deferring uncertain regions. This method naturally integrates with streaming ASR pipelines and does not require modifications to the underlying acoustic model or additional training, making it practical for real-world deployment. Experimental results show that our method reduces overall WER and effectively improves keyword F-score, demonstrating its effectiveness for real-time ASR applications.

2605.14852 2026-05-20 eess.SP cs.SY eess.SY

An integration-free approach for particle flow filtering

一种无需积分的粒子流滤波方法

Domonkos Csuzdi, Tamás Bécsi, Olivér Törő

AI总结 本文提出了一种无需积分的粒子流滤波方法,通过将ODE转换为特定的本征空间,推导出闭合形式的代数表达式,并证明其等价于精确的卡尔曼测量更新,从而在非线性测量模型中实现了高效且稳定的粒子更新。

详情
AI中文摘要

Log-homotopy粒子流滤波器通过连续地将样本从先验分布迁移至后验分布来实现非线性贝叶斯估计。这种迁移由伪时间常微分方程(ODE)控制。这些滤波器的主要实际挑战是需要数值积分,这导致计算成本高且易出现刚性问题。本文针对向量线性高斯测量下的精确Daum-Huang确定性粒子流,开发了一种精确的、无需积分的闭合形式解。通过将ODE变换到特定的特征空间,我们推导出齐次状态转移矩阵和非齐次激励项的闭合形式代数表达式。我们证明这种解析解等价于精确的卡尔曼测量更新。我们将这种闭合形式计算嵌入到N步分段方法中,用于非线性测量模型。我们进一步提出一种恒定收缩率的子步调度,使每一步在D的与最大特征值$α_{\max}$对应的特征方向上的收缩量相等。结果是一种能够缓解刚性的、无需积分的粒子更新方法,适用于高度非线性的测量模型。在纯方位跟踪基准测试中,它在所比较的滤波器中实现了最低的误差,每次更新的成本与确定性粒子流基线相当,并远低于随机流。

英文摘要

Log-homotopy particle flow filters realize nonlinear Bayesian estimation by continuously migrating samples from the prior to the posterior distribution. This transport is governed by a pseudo-time ordinary differential equation (ODE). A major practical challenge of these filters is the need for numerical integration, which suffers from high computational cost and susceptibility to stiffness. This paper develops an exact, integration-free closed-form solution for the exact Daum--Huang deterministic particle flow under vector linear Gaussian measurements. By transforming the ODE into a specific eigenspace, we derive closed-form algebraic expressions for both the homogeneous state transition matrix and the inhomogeneous forcing term. We prove that this analytic solution is equivalent to the exact Kalman measurement update. We embed this closed-form evaluation within an $N$-step piecewise method for nonlinear measurement models. We further propose a constant contraction rate substep schedule that equalizes the per-step contraction along the eigendirection of $D$ associated with the largest eigenvalue $α_{\max}$. The result is a stiffness-mitigating, integration-free particle update for highly nonlinear measurement models. On a bearings-only tracking benchmark, it achieves the lowest error among the compared filters, at a per-update cost comparable to deterministic particle flow baselines and substantially lower than stochastic flows.
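
To show what the numerical integration being avoided looks like, the sketch below integrates the scalar exact Daum-Huang flow dx/dλ = A(λ) x + b(λ) with a fixed-step RK4 scheme and compares the result with the Kalman measurement update. It uses the form of A(λ) and b(λ) commonly stated in the particle-flow literature, specialized to one dimension. It is not the paper's closed-form eigenspace solution; the function names and the numbers in the demo are hypothetical.

```python
def edh_flow_1d(x0, prior_mean, P, h, R, z, steps=100):
    """Move one particle from pseudo-time 0 to 1 under the scalar exact flow.

    Model: prior N(prior_mean, P), measurement z = h * x + noise, noise ~ N(0, R).
    """
    def rhs(x, lam):
        A = -0.5 * P * h * h / (lam * h * h * P + R)
        b = (1 + 2 * lam * A) * ((1 + lam * A) * P * h * z / R + A * prior_mean)
        return A * x + b

    x, dlam = x0, 1.0 / steps
    for k in range(steps):
        lam = k * dlam
        # Classical fourth-order Runge-Kutta step in pseudo-time.
        k1 = rhs(x, lam)
        k2 = rhs(x + 0.5 * dlam * k1, lam + 0.5 * dlam)
        k3 = rhs(x + 0.5 * dlam * k2, lam + 0.5 * dlam)
        k4 = rhs(x + dlam * k3, lam + dlam)
        x += dlam * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


def kalman_update_1d(m, P, h, R, z):
    """Standard scalar Kalman measurement update: returns (mean, variance)."""
    K = P * h / (h * h * P + R)
    return m + K * (z - h * m), (1 - K * h) * P


if __name__ == "__main__":
    m, P, h, R, z = 0.5, 2.0, 1.5, 0.8, 1.2
    post_mean, post_var = kalman_update_1d(m, P, h, R, z)
    print(edh_flow_1d(m, m, P, h, R, z), post_mean)
```

A particle started at the prior mean should land on the Kalman posterior mean, and one started a prior standard deviation away should land a posterior standard deviation away, which is the linear-Gaussian equivalence the abstract refers to.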

2512.20332 2026-05-20 cs.IT eess.SP math.IT

RIS-Empowered OTFS Modulation With Faster-than-Nyquist Signaling in High-Mobility Wireless Communications

利用RIS的OTFS调制与Faster-than-Nyquist信号在高机动无线通信中的应用

Chaorong Zhang, Benjamin K. Ng, Hui Xu, Chan-Tong Lam, Halim Yanikomeroglu

AI总结 本文提出了一种结合RIS、OTFS调制和Faster-than-Nyquist信号的新方案,旨在提高高机动无线通信中的可靠性和频谱效率。

详情
Comments
Revision in IEEE Journal
AI中文摘要

高机动无线通信系统受到严重的多普勒扩展和多路径延迟影响,这会降低传统调制方案的可靠性和频谱效率。正交时间频率空间(OTFS)调制通过在延迟-多普勒(DD)域中表示符号,从而在这些环境中表现出强大的鲁棒性,而Faster-than-Nyquist(FTN)信号可以通过有意的符号堆积进一步提高频谱效率。同时,可重构智能表面(RIS)通过被动波束成形提供了一种改进链路质量的有前途的方法。受这些优势的启发,我们提出了一种新的RIS增强的OTFS调制与FTN信号(RIS-OTFS-FTN)方案。首先,我们建立了统一的DD域输入-输出关系,该关系共同考虑了RIS被动波束成形、FTN引起的符号间干扰以及DD域信道特性。基于此模型,我们为帧错误率、频谱效率和峰值对平均功率比(PAPR)等提供了全面的分析性能。此外,设计了一种实际的RIS相位调整策略,采用量化相位选择以最大化有效信道增益。在标准化的扩展车辆A(EVA)信道模型下进行的广泛蒙特卡洛模拟验证了理论结果,并提供了关于频谱效率、PAPR、输入回退(IBO)和误差性能之间权衡的关键见解,其中有一些有趣的见解。所提出的RIS-OTFS-FTN方案在可靠性和频谱效率方面均表现出显著的性能提升,为未来高机动性和频谱受限的无线系统提供了一种可行的解决方案。

英文摘要

High-mobility wireless communication systems suffer from severe Doppler spread and multi-path delay, which degrade the reliability and spectral efficiency of conventional modulation schemes. Orthogonal time frequency space (OTFS) modulation offers strong robustness in such environments by representing symbols in the delay-Doppler (DD) domain, while faster-than-Nyquist (FTN) signaling can further enhance spectral efficiency through intentional symbol packing. Meanwhile, reconfigurable intelligent surfaces (RIS) provide a promising means to improve link quality via passive beamforming. Motivated by these advantages, we propose a novel RIS-empowered OTFS modulation with FTN signaling (RIS-OTFS-FTN) scheme. First, we establish a unified DD-domain input-output relationship that jointly accounts for RIS passive beamforming, FTN-induced inter-symbol interference, and DD-domain channel characteristics. Based on this model, we provide a comprehensive performance analysis covering the frame error rate, spectral efficiency, and peak-to-average power ratio (PAPR), among other metrics. Furthermore, a practical RIS phase adjustment strategy with quantized phase selection is designed to maximize the effective channel gain. Extensive Monte Carlo simulations under a standardized extended vehicular A (EVA) channel model validate the theoretical results and provide key insights into the trade-offs among spectral efficiency, PAPR, input back-off (IBO), and error performance. The proposed RIS-OTFS-FTN scheme demonstrates notable performance gains in both reliability and spectral efficiency, offering a viable solution for future high-mobility and spectrum-constrained wireless systems.

2512.01482 2026-05-20 math.OC cs.SY eess.SY

On robotic manipulators with time-dependent inertial parameters: From physical consistency to boundedness of the mass matrix

关于具有时间依赖惯性参数的机械臂:从物理一致性到质量矩阵的有界性

Tom Kaufmann, Johann Reger

AI总结 本文研究了具有时间依赖惯性参数的机械臂动力学,提出了一种通用的机器人方程,考虑了惯性参数随时间变化以及由于质量分布重新调整引起的影响,并探讨了惯性参数的物理一致性和有界性对质量矩阵的影响,为自适应控制提供了更现实的鲁棒性测试方法。

详情
Comments
to be published in Nonlinear Dynamics
AI中文摘要

我们通过引入时间依赖惯性参数以及由于内部质量分布重新调整引起的影响,扩展了描述开放式运动链动力学的机器人方程。时间依赖惯性参数主要发生在机器人负载末端效应器时,这一场景在我们的模型中被涵盖,而无需限制机器人运动参数必须保持不变。此外,我们的模型还包含了符合这种运动限制的质量分布重新调整,例如连接到机器人上的拖车或乘客的移动。为了配合通用的机器人方程,我们引入了统一的物理一致性和惯性参数上界的概念,从而可以证明质量矩阵存在有限、正的统一界这一结构性质在时间依赖惯性参数情况下仍然成立。这些发现对自适应控制有影响,因为它们有助于更现实地测试对未知时间依赖性的鲁棒性。此外,本文的结果还为确保在上界和均匀物理一致的估计条件下,估计质量矩阵存在有限、正的统一界提供了途径。

英文摘要

We generalize the robotics equation describing the dynamics of open kinematic chains by including the effect of time-dependent change of inertial parameters as well as the effects of causative mass-density redistribution, triggered by internal movement of mass-carrying particles relative to their body-fixed frames. Time dependency of inertial parameters that results from the sole addition of mass to the robot prominently occurs during the loading of end-effectors--a scenario covered by our model without restriction from the restraint that kinematic parameters of the robot must remain constant. Further, our model also includes internal mass-density redistributions that adhere to this kinematic restraint such as trolleys attached to the robot or the movement of passengers. To accompany the generalized robotics equation with some theoretical infrastructure, we then introduce the concepts of uniform physical consistency and upper boundedness of inertial parameters under which desirable, structural properties regarding the existence of finite, positive uniform bounds of the mass matrix can be shown to carry over to the more involved case of time-dependent inertial parameters. These findings have implications for adaptive control, as they facilitate more realistic testing for robustness against unforeseen time dependencies. Moreover, the results in this paper also provide a pathway to ensuring the desirable existence of finite, positive uniform bounds of the estimated mass matrix under upper bounded, uniformly physically consistent estimation regimes.

2512.00667 2026-05-20 eess.SY cs.RO cs.SY

Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering

分数阶黏弹性模型参数的主动学习用于真实触觉渲染

Harun Tolasa, Gorkem Gemalmaz, Volkan Patoglu

AI总结 本文提出了一种系统的方法,通过主动学习优化分数阶黏弹性模型的参数,以提高触觉渲染的感知真实感,同时通过人类在回路优化和群体感知地图结合,选择出在一般人群中被广泛认为真实的参数。

详情
Comments
This work has been submitted to the IEEE Transactions on Haptics for possible publication. 14 pages, 8 figures
AI中文摘要

有效的医疗模拟器需要真实地渲染具有黏弹性材料特性(如蠕变和应力松弛)的生物组织。分数阶模型提供了一种有效描述本质上时间依赖的黏弹性动力学的方法,仅需少量参数,因为它们自然地捕捉记忆效应。然而,由于分数元素的阶数与其他参数之间的非直观、频率依赖的耦合,确定产生高感知真实感的分数阶模型参数值仍是一个重大挑战。在本研究中,我们提出了一种系统的方法,通过主动学习优化分数阶黏弹性模型的参数,以优化触觉渲染在一般人群中的感知真实感。首先,我们证明通过基于定性反馈的人类在回路(HiL)优化可以有效优化分数阶模型的参数,以确保对每个人都能保持一致的高真实感评分。其次,我们提出了一种严格的方法,将HiL优化结果结合到一个在完整数据集上训练的聚合感知地图中,并展示如何从这种表示中选择群体层面的最佳参数,这些参数在一般人群中被广泛认为是真实的。最后,我们通过人类受试者实验验证了广义分数阶黏弹性模型参数在三种黏弹性材料中的有效性。总体而言,通过所提出的HiL优化和聚合方法建立的广义分数阶黏弹性模型有潜力显著提高医疗训练模拟器的sim-to-real过渡性能。

英文摘要

Effective medical simulators necessitate realistic haptic rendering of biological tissues that exhibit viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as they naturally capture memory effects. However, due to the unintuitive, frequency-dependent coupling among the order of the fractional element and other parameters, determining appropriate parameter values for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, using qualitative feedback-based human-in-the-loop (HiL) optimization, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results into an aggregate perceptual map trained on the entire dataset, and demonstrate how to select population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters for three viscoelastic materials by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.

2511.18236 2026-05-20 cs.RO cs.SY eess.SY

APULSE: A Scalable Hybrid Algorithm for the RCSPP on Large-Scale Dense Graphs

APULSE:一种用于大规模密集图上RCSPP的可扩展混合算法

Nuno Soares, António Grilo

AI总结 本文提出APULSE算法,通过结合A*启发式搜索、Pulse式剪枝机制和时间桶策略,高效解决大规模密集图上的资源受限最短路径问题,展现出显著的可扩展性和鲁棒性。

详情
Journal ref
in IEEE Access, vol. 14, pp. 40690-40706, 2026
Comments
This version corrects keywords and reference [9]. 9 pages
AI中文摘要

资源受限最短路径问题(RCSPP)是一个基础的NP难优化挑战,广泛应用于网络路由和自主导航等领域。该问题涉及在受预算限制的二次资源下寻找最小主成本路径。尽管存在各种RCSPP求解器,但它们在应用于复杂现实场景中常见的大型密集图时往往面临严重的可扩展性限制,使其在时间敏感的规划中不切实际。在无人地面车辆(UGVs)的任务规划等领域,这种挑战尤为突出。本文介绍APULSE,一种混合标签设置算法,旨在高效解决此类挑战性图中的RCSPP。APULSE结合了由A*启发式引导的最佳优先搜索、激进的Pulse式剪枝机制以及时间桶策略,以有效减少状态空间。通过使用大规模UGV规划场景的计算研究,APULSE与最先进的算法进行了基准测试。结果表明,APULSE在大型问题实例上能够以数量级更快的速度和更高的鲁棒性找到近最优解,特别是在竞争方法失败的情况下。这种优越的可扩展性使APULSE成为复杂大规模环境中的RCSPP有效解决方案,使其能够实现交互式决策支持和动态重新规划能力。

英文摘要

The resource-constrained shortest path problem (RCSPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing to autonomous navigation. This problem involves finding a path that minimizes a primary cost subject to a budget on a secondary resource. While various RCSPP solvers exist, they often face critical scalability limitations when applied to the large, dense graphs characteristic of complex, real-world scenarios, making them impractical for time-critical planning. This challenge is particularly acute in domains like mission planning for unmanned ground vehicles (UGVs), which demand solutions on large-scale terrain graphs. This paper introduces APULSE, a hybrid label-setting algorithm designed to efficiently solve the RCSPP on such challenging graphs. APULSE integrates a best-first search guided by an A* heuristic with aggressive, Pulse-style pruning mechanisms and a time-bucketing strategy for effective state-space reduction. A computational study, using a large-scale UGV planning scenario, benchmarks APULSE against state-of-the-art algorithms. The results demonstrate that APULSE consistently finds near-optimal solutions while being orders of magnitude faster and more robust, particularly on large problem instances where competing methods fail. This superior scalability establishes APULSE as an effective solution for RCSPP in complex, large-scale environments, enabling capabilities such as interactive decision support and dynamic replanning.
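
The problem statement (minimize a primary cost subject to a budget on a secondary resource) can be made concrete with a small exact label-setting search. This sketch is not APULSE: it has no A* heuristic, no Pulse-style bounds, and no time bucketing. It only shows budget pruning and dominance pruning on a best-first queue, assuming non-negative costs and resources. The graph encoding and all names are hypothetical.

```python
import heapq


def rcspp(graph, source, target, budget):
    """Cheapest source-to-target path whose total resource is within budget.

    graph: dict mapping node -> list of (neighbor, cost, resource) arcs.
    Returns (cost, resource, path) or None if no feasible path exists.
    """
    # Smallest resource among labels already expanded at each node. Labels pop
    # in non-decreasing cost, so a later label at the same node with no less
    # resource is dominated and can be discarded.
    best_resource = {}
    heap = [(0, 0, source, (source,))]
    while heap:
        cost, used, node, path = heapq.heappop(heap)
        if node == target:
            return cost, used, list(path)
        if best_resource.get(node, float("inf")) <= used:
            continue  # dominance pruning
        best_resource[node] = used
        for nxt, arc_cost, arc_res in graph.get(node, []):
            if used + arc_res <= budget:  # budget (feasibility) pruning
                heapq.heappush(
                    heap, (cost + arc_cost, used + arc_res, nxt, path + (nxt,))
                )
    return None


if __name__ == "__main__":
    g = {
        "s": [("a", 1, 5), ("b", 2, 1), ("t", 10, 0)],
        "a": [("t", 1, 5)],
        "b": [("t", 3, 1)],
    }
    for budget in (10, 5, 0):
        print(budget, rcspp(g, "s", "t", budget))
```

Tightening the budget in the demo forces the search onto progressively more expensive paths, which is the trade-off that makes the problem harder than an ordinary shortest-path query.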

2510.23820 2026-05-20 eess.SY cs.SY

MDP-based Energy-aware Task Scheduling for Battery-less IoT

基于MDP的无电池物联网能量感知任务调度

Shahab Jahanbazi, Mateen Ashraf, Onel L. A. López

AI总结 本文提出了一种基于MDP的无电池物联网任务调度方法,通过建模能量间歇性和严格的时间约束,设计了具有阈值结构的最优稳定调度器,以提高长期任务完成率和可靠性。

详情
AI中文摘要

无电池物联网设备依赖于环境能量采集,因此需要联合考虑能量间歇性和严格的时间约束的调度策略。在周期性监控应用中,传感-计算-传输任务链必须在每个报告周期内完成。本文将该问题建模为具有独立和相同分布(i.i.d.)能量到达特性的长期平均奖励马尔可夫决策过程(MDP),明确捕捉电容器电压变化、任务顺序、允许的起始窗口和安全执行要求。我们进一步提出了促进可靠任务完成并惩罚低能量执行风险的奖励机制。我们证明所考虑的MDP是unichain的,并且最优稳定策略具有阈值结构,从而得到最优稳定阈值基于(OSTB)调度器。为了考虑更现实的能量来源,我们还研究了基于有限状态马尔可夫过程的相关采集模型,并证明所提出的框架可以在保守的充分条件下应用于该更丰富的设置。最后,数值结果表明,OSTB在长期完整链完成率、功率故障和延迟方面优于代表性基线,特别是在采集能量稀缺时表现更优。

英文摘要

Battery-less Internet of Things (IoT) devices rely on ambient energy harvesting and therefore require scheduling policies that jointly account for energy intermittency and hard timing constraints. This challenge is especially acute in periodic monitoring applications, where a sensing--computing--transmitting task chain must be completed within each reporting cycle. In this paper, we formulate this problem within a setting characterized by independently and identically distributed (i.i.d.) energy arrivals as a long-term average-reward Markov decision process (MDP) that explicitly captures capacitor-voltage evolution, task ordering, permissible start windows, and safe-execution requirements. We further propose rewards that promote reliable task completion while penalizing risky low-energy execution. We prove that the considered MDP is unichain and that the optimal stationary policy has a threshold structure, which leads to an optimal stationary threshold-based (OSTB) scheduler. To account for more realistic energy sources, we additionally study a correlated harvesting model based on a finite-state Markov process and show that the proposed framework can be applied to this richer setting under conservative sufficient conditions. Finally, numerical results show that OSTB outperforms representative baselines in terms of long-term full-chain completion rate, power failures, and latency, particularly when harvested energy is scarce.

2508.07285 2026-05-20 eess.AS

Non-Intrusive Automatic Speech Recognition Refinement: A Survey

非侵入式自动语音识别精修:综述

Mohammad Reza Peyghan, Saman Soleimani Roudi, Saeedreza Zouashkiani, Sajjad Amini, Fatemeh Rajabi, Shahrokh Ghaemmaghami

AI总结 本文综述了非侵入式自动语音识别精修方法,探讨了融合、重评分、修正、蒸馏和训练调整五类核心方法,分析了其优缺点及应用场景,并提出了标准化评估指标以促进公平比较,为开发更鲁棒、准确的ASR精修管道提供基础。

详情
AI中文摘要

自动语音识别(ASR)是现代技术的核心组成部分,驱动着语音激活助手、转录服务和可访问性工具等应用。然而,ASR系统仍难以应对人类语音的固有变异性,如口音、方言和说话方式,以及环境干扰,包括背景噪声。此外,领域特定的对话常使用专业术语,这会加剧转录错误。这些不足不仅降低了原始ASR的准确性,还会通过后续的自然语言处理流程传播错误。由于重新设计ASR模型成本高且耗时,非侵入式精修技术,即不改变模型架构的方法,变得越来越受欢迎。在本文综述中,我们回顾了当前非侵入式精修方法,并将其分为五类:融合、重评分、修正、蒸馏和训练调整。对于每类方法,我们概述了主要方法、优势、缺点以及理想的应用场景。除了方法分类外,本文还调研了旨在在领域特定上下文中精修ASR的适应技术,回顾了常用评估数据集及其构建过程,并提出了标准化的指标集以促进公平比较。最后,我们识别了开放研究空白,并提出了未来工作的有前途方向。通过提供这种结构化的概述,我们旨在为研究人员和实践者提供开发更鲁棒、准确的ASR精修管道的清晰基础。

英文摘要

Automatic Speech Recognition (ASR) is an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the inherent variability of human speech, such as accents, dialects, and speaking styles, as well as environmental interference, including background noise. Moreover, domain-specific conversations often employ specialized terminology, which can exacerbate transcription errors. These shortcomings not only degrade raw ASR accuracy but also propagate mistakes through subsequent natural language processing pipelines. Because redesigning an ASR model is costly and time-consuming, non-intrusive refinement techniques that leave the model's architecture intact have become increasingly popular. In this survey, we review current non-intrusive refinement approaches and group them into five classes: fusion, re-scoring, correction, distillation, and training adjustment. For each class, we outline the main methods, advantages, drawbacks, and ideal application scenarios. Beyond method classification, this work surveys adaptation techniques aimed at refining ASR in domain-specific contexts, reviews commonly used evaluation datasets along with their construction processes, and proposes a standardized set of metrics to facilitate fair comparisons. Finally, we identify open research gaps and suggest promising directions for future work. By providing this structured overview, we aim to equip researchers and practitioners with a clear foundation for developing more robust, accurate ASR refinement pipelines.

2506.12218 2026-05-20 eess.SP cs.LG

Directed Acyclic Graph Convolutional Networks

有向无环图卷积网络

Samuel Rey, Hamed Ajorlou, Gonzalo Mateos

AI总结 本文提出了一种专门针对DAG上信号卷积学习的新型图神经网络架构DCN,通过因果图滤波器学习节点表示,利用正式的卷积操作实现频域表示,并引入并行DCN(PDCN)以解耦模型复杂度与图规模,实验证明其在准确率、鲁棒性和计算效率上优于现有方法。

详情
AI中文摘要

有向无环图(DAG)在科学和工程应用中至关重要,包括因果推断、调度和神经架构搜索。本文介绍DAG卷积网络(DCN),一种专为从DAG上信号进行卷积学习设计的新型图神经网络(GNN)架构。DCN利用因果图滤波器学习节点表示,这些表示考虑了DAG固有的偏序,这是一种在传统GNN中不存在的强归纳偏差。与以往在DAG上的机器学习方法不同,DCN基于允许频域表示的正式卷积操作。我们进一步提出并行DCN(PDCN),该模型将输入DAG信号馈入一组并行的因果图移位算子,并使用共享的多层感知机处理这些DAG感知特征。这样,PDCN在解耦模型复杂度与图规模的同时保持了令人满意的预测性能。所提架构的置换等变性和表达能力也得到了确立。在多个任务、数据集和实验条件下进行的全面数值测试表明,(P)DCN在准确率、鲁棒性和计算效率方面与现有最先进基线相比具有优势。这些结果将(P)DCN定位为一种可行的深度学习框架,该框架专门针对DAG结构数据,并基于(图)信号处理的基本原理设计。

英文摘要

Directed acyclic graphs (DAGs) are central to science and engineering applications including causal inference, scheduling, and neural architecture search. In this work, we introduce the DAG Convolutional Network (DCN), a novel graph neural network (GNN) architecture designed specifically for convolutional learning from signals supported on DAGs. The DCN leverages causal graph filters to learn nodal representations that account for the partial ordering inherent to DAGs, a strong inductive bias not present in conventional GNNs. Unlike prior art in machine learning over DAGs, DCN builds on formal convolutional operations that admit spectral-domain representations. We further propose the Parallel DCN (PDCN), a model that feeds input DAG signals to a parallel bank of causal graph-shift operators and processes these DAG-aware features using a shared multilayer perceptron. This way, PDCN decouples model complexity from graph size while maintaining satisfactory predictive performance. The architectures' permutation equivariance and expressive power properties are also established. Comprehensive numerical tests across several tasks, datasets, and experimental conditions demonstrate that (P)DCN compares favorably with state-of-the-art baselines in terms of accuracy, robustness, and computational efficiency. These results position (P)DCN as a viable framework for deep learning from DAG-structured data that is designed from first (graph) signal processing principles.

2506.03178 2026-05-20 eess.IV cs.AI cs.CV

LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning

LLaMA-XR: 一种基于LLaMA和QLoRA微调的新型放射科报告生成框架

Md. Zihad Bin Jahangir, Muhammad Ashad Kabir, Sumaiya Akter, Israt Jahan, Minh Chau

AI总结 本文提出LLaMA-XR框架,结合LLaMA 3.1与DenseNet-121图像嵌入及QLoRA微调,提升放射科报告生成的准确性和临床相关性,同时保持计算效率。

详情
Journal ref
Bioengineering 2026, 13(5), 493
Comments
25 pages
AI中文摘要

自动化放射科报告生成具有减少放射科医生工作负担和提高诊断准确性的潜力。然而,从胸部X光片生成精确且具有临床意义的报告仍然具有挑战性,因为医学语言的复杂性和对上下文理解的需求。现有模型在保持准确性和上下文相关性方面存在困难。在本文中,我们提出了LLaMA-XR,一种新型框架,整合了LLaMA 3.1与基于DenseNet-121的图像嵌入以及量化低秩适应(QLoRA)微调。LLaMA-XR在保持计算效率的同时实现了改进的连贯性和临床准确性。这种效率是由一种优化策略驱动的,该策略增强了参数利用并减少了内存开销,使报告生成速度更快,计算资源需求更低。在IU X光基准数据集上进行的广泛实验表明,LLaMA-XR优于一系列最先进的方法。我们的模型在ROUGE-L得分上达到0.433,在METEOR得分上达到0.336,建立了该领域的性能新基准。这些结果突显了LLaMA-XR作为自动化放射科报告的有效且高效的AI系统潜力,提供了增强的临床效用和可靠性。

英文摘要

Automated radiology report generation holds significant potential to reduce radiologists' workload and enhance diagnostic accuracy. However, generating precise and clinically meaningful reports from chest radiographs remains challenging due to the complexity of medical language and the need for contextual understanding. Existing models often struggle with maintaining both accuracy and contextual relevance. In this paper, we present LLaMA-XR, a novel framework that integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. LLaMA-XR achieves improved coherence and clinical accuracy while maintaining computational efficiency. This efficiency is driven by an optimization strategy that enhances parameter utilization and reduces memory overhead, enabling faster report generation with lower computational resource demands. Extensive experiments conducted on the IU X-ray benchmark dataset demonstrate that LLaMA-XR outperforms a range of state-of-the-art methods. Our model achieves a ROUGE-L score of 0.433 and a METEOR score of 0.336, establishing new performance benchmarks in the domain. These results underscore LLaMA-XR's potential as an effective and efficient AI system for automated radiology reporting, offering enhanced clinical utility and reliability.
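
As a rough guide to what a ROUGE-L score such as 0.433 measures, the sketch below computes a sentence-level ROUGE-L F-measure from the longest common subsequence (LCS) of word tokens. It assumes lowercase whitespace tokenization and equal weighting of precision and recall; published evaluation toolkits may apply different tokenization, stemming, or weighting, so this will not necessarily reproduce the paper's numbers. The function names and example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        cur = [0]
        for j, token_b in enumerate(b, 1):
            if token_a == token_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L with precision and recall weighted equally (assumed beta = 1)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "no acute cardiopulmonary abnormality"
    candidate = "no acute abnormality"
    print(rouge_l_f1(reference, candidate))
```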

2505.01824 2026-05-20 math.OC cs.SY eess.SY

Smoothness of the Augmented Lagrangian Dual in Convex Optimization

凸优化中增强拉格朗日对偶的光滑性

Jingwang Li, Vincent Lau

AI总结 本文研究了线性约束优化问题的对偶函数和增强拉格朗日函数的光滑性,证明了在域非空的条件下,增强拉格朗日对偶函数在特定条件下处处光滑,并且对于任意λ,存在最优解。

详情
AI中文摘要

本文聚焦于一般的线性约束优化问题:$\min_{x \in \mathbb{R}^d} f(x) \ \text{s.t.} \ Ax = b$,其中$f: \mathbb{R}^d \rightarrow \mathbb{R} \cup \{+\infty\}$是一个闭的正常凸函数,$A \in \mathbb{R}^{p \times d}$,$b \in \mathbb{R}^p$。我们定义标准对偶函数$ϕ(λ) = \inf_x \{f(x) + \langle λ, A x - b \rangle\}$,增强拉格朗日函数$\mathcal{L}_ρ(x, λ) = f(x) + \langle λ, Ax - b \rangle + \frac{ρ}{2}\|Ax - b\|^2$($ρ > 0$),以及增强拉格朗日对偶函数$ϕ_ρ(λ) = \inf_x \mathcal{L}_ρ(x, λ)$。在$\text{dom} \ ϕ \neq \emptyset$这一基本条件下,我们证明了:(1) $ϕ_ρ$处处是$\frac{1}{ρ}$-光滑的;(2) 对于任意$λ \in \mathbb{R}^p$,$\min_{x \in \mathbb{R}^d} \mathcal{L}_ρ(x, λ)$的解存在。这些理论结果显著削弱了文献中通常用于确保此类性质的严格假设。

英文摘要

This paper focuses on the general linearly constrained optimization problem: $\min_{x \in \mathbb{R}^d} f(x) \ \text{s.t.} \ Ax = b$, where $f: \mathbb{R}^d \rightarrow \mathbb{R} \cup \{+\infty\}$ is a closed proper convex function, $A \in \mathbb{R}^{p \times d}$, and $b \in \mathbb{R}^p$. We define the standard dual function $ϕ(λ) = \inf_x \{f(x) + \langle λ, A x - b \rangle\}$, the augmented Lagrangian $\mathcal{L}_ρ(x, λ) = f(x) + \langle λ, Ax - b \rangle + \frac{ρ}{2}\|Ax - b\|^2$ ($ρ > 0$), and the augmented Lagrangian dual function $ϕ_ρ(λ) = \inf_x \mathcal{L}_ρ(x, λ)$. Under the fundamental condition that $\text{dom} \ ϕ \neq \emptyset$, we establish that: (1) $ϕ_ρ$ is $\frac{1}{ρ}$-smooth everywhere; and (2) the solution to $\min_{x \in \mathbb{R}^d} \mathcal{L}_ρ(x, λ)$ exists for any $λ \in \mathbb{R}^p$. These theoretical findings substantially weaken the stringent assumptions typically imposed in the literature to ensure such properties.
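
A scalar sanity check of claim (1) may help fix ideas. The choice $f(x) = \tfrac12 x^2$ with $A = 1$ is an illustrative assumption, not an example from the paper. Here $f$ is strongly convex, so $ϕ$ is already smooth and the example only verifies that the $\frac{1}{ρ}$ bound holds; it does not exercise the weak-assumption regime the paper targets.

```latex
% Illustrative case: f(x) = x^2/2, A = 1, so the constraint reads x = b.
\begin{align*}
ϕ(λ) &= \inf_x \Big\{ \tfrac12 x^2 + λ (x - b) \Big\}
      = -\tfrac12 λ^2 - λ b,
  & ϕ'(λ) &= -(λ + b), \\
x_ρ(λ) &= \arg\min_x \mathcal{L}_ρ(x, λ) = \frac{ρ b - λ}{1 + ρ},
  & ϕ_ρ'(λ) &= x_ρ(λ) - b = -\frac{λ + b}{1 + ρ}.
\end{align*}
% Hence, for any λ_1, λ_2:
\[
|ϕ_ρ'(λ_1) - ϕ_ρ'(λ_2)| = \frac{|λ_1 - λ_2|}{1 + ρ} \le \frac{1}{ρ}\,|λ_1 - λ_2| .
\]
```

The gradient of $ϕ_ρ$ is the constraint residual $A x_ρ(λ) - b$ at the minimizer, and its Lipschitz constant $\frac{1}{1+ρ}$ sits below the general bound $\frac{1}{ρ}$, as expected.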