Real-time Speech Restoration using Data Prediction Mean Flows
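Sebastian Braun
AI summary: This paper proposes a few-step flow matching model based on Data Prediction Mean Flows, paired with a low-latency architecture, for real-time speech restoration. Compared with existing methods, it uses 120x less compute while achieving similar audio quality.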
Generative models are capable of addressing difficult problems with non-unique solutions, such as bandwidth extension and gap filling, and of removing highly non-linear artifacts from codecs, clipping, and distortion, as opposed to removing linear additive components like noise and reverb. While large offline processing models have shown impressive results, these tasks have not been solved with real-time capable models with low latency and compute. We propose a few-step flow matching model using Data Prediction Mean Flows in combination with a suitable novel low-latency architecture to make flow matching models an attractive choice under these constraints. Compared to the state of the art, our proposed mean flow model uses 120x less compute and introduces no algorithmic latency other than the STFT, while achieving similar audio quality.
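FORGE: Self-Evolving Agent Memory Without Weight Updates
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman
AI summary: FORGE uses a population broadcast mechanism to evolve self-generated memory without gradient updates, improving the decision-making of hierarchical ReAct agents. On CybORG CAGE-2 it substantially raises returns and lowers failure rates.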
Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed), with an outer loop that propagates the best-performing instance's memory to the population between stages and freezes converged instances via a graduation criterion. We evaluate on CybORG CAGE-2, a stochastic network-defense POMDP at a 30-step horizon against the B-line attacker, where all four tested LLM families (Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, Qwen3-235B) exhibit strongly negative, heavy-tailed zero-shot rewards. Compared against both a zero-shot baseline and a Reflexion baseline (isolated single-stream learning), FORGE improves average evaluation return by 1.7-7.7× over zero-shot and by 29-72% over Reflexion in all 12 model-representation conditions, reducing major-failure rates (returns below -100) to as low as about 1%. We find that (1) population broadcast is the critical mechanism, with a no-graduation ablation confirming that broadcast carries the performance gains while graduation primarily saves compute; (2) Examples achieves the strongest returns for three of four models, while Rules offers the best cost-reliability profile with about 40% fewer tokens; and (3) weaker baseline models benefit disproportionately, suggesting FORGE may mitigate capability gaps rather than amplify strong models. All evidence is confined to CAGE-2 B-line; cross-family findings are directional evidence.
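The outer loop described in this abstract (broadcast the best instance's memory, freeze converged instances) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Instance` class, the `broadcast_best` function, and the graduation rule (score at or above a fixed threshold) are hypothetical placeholders, since the abstract does not specify the actual criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One agent instance with prompt-injected memory (hypothetical structure)."""
    name: str
    memory: list = field(default_factory=list)  # natural-language artifacts
    score: float = float("-inf")                # mean evaluation return this stage
    graduated: bool = False                     # frozen instances stop evolving


def broadcast_best(population, graduation_threshold):
    """Copy the best instance's memory to every non-graduated instance.

    The graduation rule (score >= threshold) is an assumption made for
    illustration; the source only says converged instances are frozen.
    """
    best = max(population, key=lambda inst: inst.score)
    # Freeze instances that meet the (assumed) graduation criterion.
    for inst in population:
        if inst.score >= graduation_threshold:
            inst.graduated = True
    # Broadcast: overwrite the memory of every instance that is still evolving.
    for inst in population:
        if inst is not best and not inst.graduated:
            inst.memory = list(best.memory)  # copy, so later edits stay independent
    return best
```

A call such as `broadcast_best(population, graduation_threshold=-10.0)` would run once between stages. Returns are negative in this setting, so a threshold close to zero marks a strong instance.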
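A Unified Generative AI Framework for Smart Energy Infrastructure: Smart Gas Distribution, Utility Billing, Carbon Analytics, and Quantum-Inspired Optimization
Pavan Manjunath, Thomas Pruefer
AI summary: This paper proposes a unified generative AI framework that integrates smart gas distribution, billing, carbon analytics, and quantum-inspired optimization to improve energy management efficiency and environmental accountability.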
The accelerating convergence of smart metering, generative artificial intelligence, and quantum-inspired combinatorial optimisation is reshaping how energy utilities manage physical infrastructure, customer engagement, and environmental accountability.
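Revisiting Preemption: Multi-Threshold Preemption Policies for Minimizing Age of Information
Sahan Liyanaarachchi, Sennur Ulukus, Nail Akar
AI summary: This paper studies multi-threshold preemption policies for systems with random update arrivals, introduces an analytical framework for evaluating their age of information, and demonstrates their effectiveness for age of information optimization.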
The study of optimal preemption policies for status update systems has been a recurring topic in the age of information (AoI) literature, where threshold-based structures have been shown to be optimal under a generate-at-will update generation model under certain assumptions. In this work, we study the effectiveness of threshold-based policies for a system with random update arrivals. In this regard, we introduce an analytical framework for evaluating the AoI of multi-threshold preemption policies and present interesting characteristics of the structure of the optimal preemption policy. We show the effectiveness of these threshold-based policies over the traditional probabilistic preemption policies and single-threshold policies, where we observe that significant gains in terms of AoI can be obtained by utilizing both the age of the packet and the age of the system when designing these preemption policies.
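Context, Deliberation, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in Adversarial POMDPs
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman
AI summary: This study examines how context, deliberation, and hierarchical decomposition affect the performance and cost of compound LLM agents in adversarial, partially observable sequential environments. Programmatic state abstraction is the most cost-efficient choice, while hierarchical decomposition without deliberation achieves the best performance.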
Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs. We present a controlled study of compound LLM agent design in CybORG CAGE-2, a cyber defense environment modeled as a Partially Observable Markov Decision Process (POMDP). Reward is non-positive, so all configurations operate in a failure-mitigation mode. Our evaluation spans five model families, six models, and twelve configurations (3,475 episodes) with token-level cost accounting. We vary context representation (raw observations vs. a deterministic state-tracking layer with compressed history), deliberation (self-questioning, self-critique, and self-improvement tools, with optional chain-of-thought prompting), and hierarchical decomposition (monolithic ReAct vs. delegation to specialized sub-agents). We find that: (1) Programmatic state abstraction delivers the largest returns per token spent (RPTS), improving mean return by up to 76% over raw observations. (2) Distributing deliberation tools across a hierarchy degrades performance relative to hierarchy alone for all five model families, reaching up to 3.4× worse mean return while using 1.8-2.7× more tokens. We call this destructive pattern a deliberation cascade. (3) Hierarchical decomposition without deliberation achieves the best absolute performance for most models, and context engineering is generally more cost-effective than deliberation. These findings suggest a design principle for structured adversarial POMDPs: invest in programmatic infrastructure and clean task decomposition rather than deeper per-agent reasoning, as these strategies can interfere when combined.
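Watts vs. Bytes: Turning Data Centers into Grid Assets through Storage-Compute Co-Optimization
Shaohui Liu, Sungho Shin, Deepjyoti Deka
AI summary: This paper proposes a robust co-optimization framework for day-ahead operation of data centers under grid limits, coordinating computing demand with battery energy storage systems to improve grid-service capability and operational efficiency.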
Enabling continued data-center growth under increasing grid stress motivates closer coordination between flexible computing demand and co-located battery energy storage systems (BESS) to improve site operations and provide grid services. This paper develops a robust co-optimization framework for day-ahead operation of data centers with co-located BESS under utility-imposed interconnection limits on peak load and ramping. The model jointly considers deadline-constrained computing workloads, managed through workload scheduling and dynamic voltage and frequency scaling (DVFS), together with degradation-aware BESS dispatch to enable cost optimization and participation in ancillary-service markets. Case studies based on real-world market and workload data show that the proposed framework yields feasible day-ahead schedules across a range of operating conditions, with substantially larger benefits when interconnection constraints become binding. Under baseline conditions, BESS value is derived from both ancillary-service participation and improved workload and energy management. Under stressed peak-load and ramping limits, however, the daily value of BESS increases by a factor of two or more, driven primarily by BESS actions that reduce the potential incompletion of the schedulable workload while complying with interconnection constraints. Under tight peak-load caps, workload composition also matters: a higher share of non-schedulable jobs can increase operating cost by more than 25% relative to more flexible workload mixes. DVFS studies further show that processor-level control is a material flexibility lever under tight load limits. These results demonstrate that coordinated compute-storage flexibility can materially expand the operational headroom and grid value of data centers, especially under increasingly scarce grid capacity.
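How Far Back in Time Does a Digital Twin Reflect the State of Its Physical Object: Age of Staleness
Ismail Cosandal, Sennur Ulukus
AI summary: This paper proposes a new metric, age of staleness (AoS), which measures how long ago the current estimate was last correct. It analyzes Markov sources and a sampling-rate optimization problem, and shows how the metric applies to digital twin networks.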
The groundbreaking metric age of information (AoI) has been introduced to measure information freshness in communication networks. As transformational as it is, the AoI metric falls short in some applications, such as remote monitoring, since it is a semantic-agnostic metric that does not consider the dynamics of the random process. There is a need to quantify the performance of a remote estimator via a metric that combines freshness and semantic aspects. To this end, in this paper, we introduce a novel metric coined age of staleness (AoS) that measures how long ago the current estimate was last correct. First, we analyze a simple scenario where an $n$-ary symmetric Markov source is observed by a monitor at a constant sampling rate, obtain a closed-form expression for the AoS, and show that it is a monotonically decreasing function of the sampling rate. Next, we consider multiple distinct Markov sources, and formulate an optimization problem in which the remote monitor allocates the total sampling rate among the tracked sources. Although the optimization problem is non-convex, its structure is suitable for obtaining a near-optimal solution using the polyblock algorithm, which leverages the monotonicity of the objective function. While the new AoS metric could be applicable in many scenarios, we believe it is particularly well-suited for a digital twin network (DTN) where multiple physical objects (POs) are monitored under a total sampling rate constraint to maintain a digital representation of them, namely, their digital twin (DT).
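MAxLM: Multi-Agent Language-Model-Based Scheduling and Resource Allocation for MU-MIMO-OFDMA Wireless Networks
Adnan Quadri, Hongxiang Li
AI summary: This paper proposes the MAxLM framework, which uses pretrained language models to optimize uplink scheduled access in wireless networks and relies on the WiSER platform for autonomous scheduling and resource allocation. Experiments show that it outperforms benchmark methods across different numbers of STAs and antenna settings.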
Wireless networks support multi-user (MU) communication with multiple-input multiple-output (MIMO) and orthogonal frequency-division multiple access (OFDMA) technologies. In the joint MU-MIMO-OFDMA-enabled transmission mode, network throughput can be significantly increased by effectively utilizing the multi-channel resources to schedule numerous wireless users/stations (STAs) simultaneously. In this paper, we study ways to optimize the user scheduling and resource allocation (SRA) for the uplink (UL) scheduled access (UL-SA) of a joint MU-MIMO-OFDMA-enabled wireless local area network (WLAN). In particular, we propose a multi-agent (MA) framework that utilizes an openly available pretrained small/medium-sized Language Model (xLM) to perform SRA for the UL-SA. To facilitate autonomous SRA using our proposed technique, we introduce the AI-assisted Wireless Systems Engineering and Research (WiSER) platform. We evaluate the performance of MAxLM-optimized SRA for network scenarios with a varying number of STAs and antenna settings on the WLAN Access Point. Numerical results confirm that our proposed technique achieves higher UL-SA throughput than the benchmark techniques.
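SiFo: A Wireless Foundation Model for Low-Overhead Site-Specific Channel State Information Feedback
Cheng-Jie Zhao, Zhaolin Wang, Zongyao Zhao, Yuanwei Liu
AI summary: SiFo is a wireless foundation model-based framework that achieves efficient site-specific CSI feedback through pretraining and lightweight calibration, improving spectral efficiency.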
SiFo, a wireless foundation model-based framework, is proposed for low-overhead site-specific channel state information (CSI) feedback. In 3GPP NR, Type-II feedback provides an expressive codebook-based CSI representation, but it requires substantial reference-signal overhead, UE-side search, and feedback. Learning-based site-specific feedback can reduce these online costs while retaining high-quality subspace representation by exploiting deployment-dependent propagation structure. However, existing site-specific designs typically train a dedicated neural network for each new site, which limits scalability when the number of deployments is large. SiFo addresses this scalability issue by pretraining a CSI feedback model across source sites and adapting it to a target site through lightweight calibration. A small set of target-site users reports low-dimensional reference signal received power (RSRP) fingerprints, and their full-CSI-based subspace labels are stored as calibration memory. During online operation, a served user is matched to calibrated users through the same SSB probing and RSRP reporting procedure, so nearby calibration samples provide site-specific subspace guidance without updating model parameters. SiFo therefore transfers common propagation knowledge while retaining local adaptation. Numerical results across ten city scenarios demonstrate that SiFo (i) achieves higher CSI-capture efficiency than separately trained site-specific learning baselines under the same target-site labeled budget, (ii) approaches the high-overhead 3GPP NR Type-II feedback reference using only RSRP measurements collected during online SSB probing, and (iii) converts the high CSI-capture efficiency and low overhead into effective spectral efficiency improvement under limited target-site data.
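Covert Bayesian Quickest Change Detection
Yun-Feng Lo, Matthieu R. Bloch
AI summary: This paper studies covert quickest change detection in a Bayesian, infinite-horizon setting. It introduces a covertness budget metric, analyzes second-order bounds on detection delay under constraints on the false-alarm probability and the covertness budget, and proposes an achievability scheme.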
We investigate the problem of covert quickest change detection in a Bayesian and infinite-horizon setting. A legitimate entity seeks to detect a change in the state of a discrete memoryless channel as quickly as possible by actively probing it. Simultaneously, the entity must ensure its probing remains covert from an adversary monitoring the channel for active sensing. We introduce the expected covertness budget (ECB) as an analytically tractable covertness metric that bounds from above the relative entropy between the observation sequences induced by active and passive sensing. Under constraints on both the probability of false alarm (PFA) and the ECB, we establish a second-order asymptotic converse bound on the average detection delay as the PFA constraint approaches zero, for any positive ECB constraint, explicitly quantifying the maximum square-root-order covert sensing gain possible. Furthermore, we propose an achievability scheme utilizing a constant-sensing-probability Shiryaev-type policy and show that it matches the second-order asymptotic converse. We illustrate our result with a numerical example.
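Preference-Based Active Learning of MPC Objective Functions
Hasna El Hasnaouy, Pablo Krupa, Mario Zanon, Alberto Bemporad
AI summary: This paper proposes two active learning strategies that learn the MPC objective function from human preferences over trajectory pairs, obtaining closed-loop behavior that better matches the preferences with fewer queries.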
Designing the objective function in Model Predictive Control (MPC) is challenging when performance assessment criteria are available only from human judgment. We adopt a preference-based learning (PbL) approach to learn the MPC objective function from preferences over trajectory pairs. However, the real-world application of PbL is often restricted by the significant cost or limited availability of human preference queries. To address this, Active Learning (AL) strategies seek to improve sampling efficiency, reducing the labeling effort required to obtain a well-performing classifier. We present two AL strategies for learning the MPC objective function from human preferences over pairwise system trajectories: a pool-based strategy that selects trajectory pairs that are both uncertain under the current surrogate and diverse relative to previously labeled comparisons, and a query-synthesis strategy that incorporates new trajectories using the current surrogate-driven MPC. Numerical results show that the proposed strategies yield closed-loop behaviors that align more closely with the expressed preferences using fewer queries than a random sampling approach.
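The pool-based strategy in this abstract selects pairs that are both uncertain and diverse. A minimal sketch of such a selection rule follows. Everything specific in it is an assumption for illustration and is not taken from the paper: the logistic (Bradley-Terry style) preference model, the additive score, the Euclidean diversity measure, and the names `preference_probability`, `select_query`, and `diversity_weight`.

```python
import math


def preference_probability(cost_a, cost_b):
    """Assumed logistic model: probability that A is preferred (lower cost is better)."""
    return 1.0 / (1.0 + math.exp(cost_a - cost_b))


def select_query(pool, labeled_features, surrogate_cost, diversity_weight=1.0):
    """Return the candidate pair with the highest uncertainty-plus-diversity score.

    pool: list of (features_a, features_b), each a tuple of floats
    labeled_features: concatenated feature tuples of already-labeled pairs
    surrogate_cost: callable mapping a feature tuple to a scalar cost
    """
    def score(pair):
        feat_a, feat_b = pair
        p = preference_probability(surrogate_cost(feat_a), surrogate_cost(feat_b))
        # 1.0 when the surrogate cannot decide (p = 0.5), 0.0 when it is certain.
        uncertainty = 1.0 - 2.0 * abs(p - 0.5)
        joint = feat_a + feat_b
        # Distance to the nearest labeled comparison; 0.0 if nothing is labeled yet.
        if labeled_features:
            diversity = min(math.dist(joint, old) for old in labeled_features)
        else:
            diversity = 0.0
        return uncertainty + diversity_weight * diversity

    return max(pool, key=score)
```

With `diversity_weight=0.0` the rule reduces to pure uncertainty sampling; larger weights favor pairs far from what has already been labeled.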
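Holographic Airy Beamforming: Curved Trajectory Optimization for Blockage-Resilient Terahertz Communication
Xinyuan Hu, Boya Di, Lingyang Song
AI summary: This paper proposes a holographic Airy beamforming scheme based on amplitude control that uses a reconfigurable holographic surface for blockage-resilient terahertz communication. Trajectory optimization raises the received power of a blocked user by more than 10 dB.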
Terahertz communication offers vast bandwidth for high-speed transmission in 6G networks but faces severe blockage challenges in the near-field region due to large antenna arrays. To overcome the limitation that near-field focused beams are susceptible to obstacles, wavefront engineering is leveraged to generate an Airy beam that propagates along a parabolic trajectory to circumvent blockages. In this paper, we consider the reconfigurable holographic surface (RHS) as a potential solution for such precise wavefront engineering, owing to its compact radiation element spacing, which is much smaller than half a wavelength. We reveal that the adjustable effective aperture of the RHS allows the parabolic offset to be located within the antenna aperture, which enhances the freedom in designing Airy beam trajectories. An analog beamforming method, named the holographic Airy beamforming scheme based on amplitude control, is then proposed to generate the curved beam that propagates along the desired trajectory. To maximize the received power of a blocked user, we develop a geometry-based trajectory optimization algorithm. Simulation results validate that, compared to traditional phase-controlled arrays with analog beamforming, the RHS can leverage its adjustable effective aperture to improve the received power of the blocked user by over 10 dB.
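Enhanced Input Stacking for Non-Square MIMO Modal Identification via Fast and Relaxed Vector Fitting
Beatrice E. Bauret Martínez, Gabriele Dessena, Marco Civera, Oscar E. Bonilla-Manrique
AI summary: This paper proposes a MIMO framework based on ground vibration test data that uses an enhanced input-stacking strategy to identify the modal parameters of aircraft structures. Numerical and experimental validation shows high accuracy and robustness to noise.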
Fast and Relaxed Vector Fitting (FRVF) is a frequency-domain system identification approach that has been widely adopted in electrical system modelling, while its application to mechanical systems has remained relatively unexplored. In this work, FRVF is reformulated for the identification of structural modal parameters of an aircraft based on Ground Vibration Test (GVT) data within a Multi-Input Multi-Output (MIMO) framework. The proposed procedure consists of three stages: (i) rational approximation of frequency response functions via an enhanced input-stacking strategy, (ii) identification of system poles from the resulting rational model, and (iii) estimation of modal parameters from the extracted poles and associated residues. The methodology is first numerically validated on a MIMO beam model, with particular emphasis on accuracy and robustness under increasing measurement noise. Subsequently, experimental validation is conducted using GVT data from the BAE Systems Hawk T1A aircraft. The results obtained demonstrate a level of performance comparable to that achieved by existing methods. Overall, the extended MIMO formulation of FRVF exhibits high accuracy and strong robustness to measurement noise, highlighting its suitability for application in GVT-based modal analysis.
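Beyond Diagonal Reconfigurable Intelligent Surfaces: Distributed Scattering Matrix Design and MIMO Beamforming via Fractional Programming and Manifold Optimization
Iván Alexander Morales Sandoval, Marko Fidanovski, Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Emil Björnson
AI summary: This paper proposes a distributed MIMO beamforming method based on fractional programming and manifold optimization for multi-user cell-free massive MIMO systems aided by beyond diagonal reconfigurable intelligent surfaces, improving system performance through better scattering matrix design and beamforming.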
We consider the optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS)-aided multi-user (MU) cell-free (CF)-massive multiple-input multiple-output (mMIMO) systems, where the propagation environment design achieved via scattering matrix optimization is complemented by an efficient base station (BS) beamforming (BF) scheme that effectively exploits the resulting "engineered" channel. In particular, we describe a fractional programming (FP) method that is based on the equivalent channel incorporating a reciprocal BD-RIS (RBD-RIS) parameterized by existing scattering matrix design methods, and that yields the correspondingly optimized multiple-input multiple-output (MIMO) BF weights. The proposed approach decomposes the transmit (TX) beamformer into multiple sum-rate maximization (SRM) sub-beamformers, each satisfying an independent power constraint, such that distributed MIMO-BF scenarios can be optimally handled. Although the proposed SRM-MIMO-BF framework is independent of the specific scattering matrix design, extending the BD-RIS-aided system model to the CF-mMIMO setting requires the design of a corresponding beamforming matrix. In this context, this work investigates the impact of beamforming in reconfigurable intelligent surface (RIS)-aided systems. Simulation results demonstrate that the proposed method for designing the MIMO-BF weights, when combined with the previously developed design of RBD-RIS scattering matrices, outperforms existing BD-RIS-aided state-of-the-art (SotA) schemes employing existing MIMO-BF techniques, indicating that the whole contribution is more than the sum of the parts.
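Constrained-MPC-Based Motion Planning for Morphing Quadrotors in Ultra-Narrow Passages
Harsh Modi, Xiao Liang, Minghui Zheng
AI summary: This paper proposes a motion planning framework that plans morphology and trajectory for morphing quadrotors in extremely constrained environments. A novel obstacle avoidance cost function lets the quadrotor navigate through extremely narrow gaps under limited 2D LiDAR perception.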
This paper introduces a motion planning framework to plan morphology and trajectory for morphing quadrotors in extremely constrained environments. We develop a novel obstacle avoidance cost function for nonlinear model predictive control (MPC) that enables navigation through extremely narrow gaps under limited perception from a 2D LiDAR. Classical artificial potential field-based costs typically have a high cost in narrow passages, artificially blocking the navigable path. In contrast, we propose a smooth exponential obstacle cost that preserves low traversal cost within narrow gaps while maintaining strong collision avoidance behavior. The formulation avoids hard activation thresholds and introduces a cost reduction factor to reduce the cost within narrow passages. Direct use of 2D LiDAR measurements in MPC allows navigation around arbitrarily shaped obstacles. The method is embedded within an acados-based nonlinear MPC framework. Simulation and experimental results demonstrate successful traversal of narrow corridors where typical repulsive cost functions would fail. The approach provides a computationally efficient and practical solution for navigating through tight spaces while maintaining safety from obstacles. While we implement the framework on morphing quadrotors, the cost function formulation is general-purpose and applies to any mobile robot application. The implementation code is available at https://github.com/harshjmodi1996/morphocopter_mpc and a short video is available at https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2026/03/MPC_MorphoCopter_video.mp4.
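To illustrate why a thresholded potential-field cost can block a narrow gap while a smooth exponential cost need not, here is a small numerical sketch. It is not the paper's formulation: the abstract does not give the cost expression or the cost reduction factor, so the function forms, the names (`apf_cost`, `exponential_cost`, `gap_center_cost`), and all parameter values are assumptions chosen only for illustration.

```python
import math


def apf_cost(distance, influence_radius=1.0, gain=1.0):
    """Textbook repulsive potential: active only inside the influence radius."""
    if distance >= influence_radius:
        return 0.0  # hard activation threshold
    return 0.5 * gain * (1.0 / distance - 1.0 / influence_radius) ** 2


def exponential_cost(distance, weight=1.0, decay=10.0):
    """Smooth exponential obstacle cost with no activation threshold (assumed form)."""
    return weight * math.exp(-decay * distance)


def gap_center_cost(cost_fn, gap_width):
    """Total cost at the middle of a gap with one wall on each side."""
    half_width = gap_width / 2.0
    return 2.0 * cost_fn(half_width)


# Compare both costs at the center of a 0.4 m gap (hypothetical dimension).
apf_value = gap_center_cost(apf_cost, 0.4)
exp_value = gap_center_cost(exponential_cost, 0.4)
```

With these assumed parameters the potential-field cost at the gap center is much larger than the exponential one, while the exponential cost still rises steeply as the distance to a wall shrinks.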
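Robust Beamforming for a Near-Field STAR-RIS-Enabled ISCPT Framework
Zahra Rostamikafaki, Francois Chan, Claude D'Amours
AI summary: This paper proposes a near-field integrated sensing, communication, and power transfer (ISCPT) framework aided by a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS). Harvested power is maximized through robust optimization under imperfect cascaded channel state information (CSI), subject to constraints on user rate, eavesdropper tolerable rate, and minimum sensing beampattern gain.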
A simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-aided near-field integrated sensing, communication, and power transfer (ISCPT) framework is proposed. We formulate a robust harvested power maximization problem under imperfect cascaded channel state information (CSI), with constraints on required user rate, eavesdropper tolerable rate, and minimum sensing beampattern gain. To address this non-convex problem, we adopt alternating optimization (AO). First, we approximate the semi-infinite inequality constraints using the S-procedure and obtain rank-one active beamforming via sequential rank-one constraint relaxation (SROCR); then we update the passive STAR-RIS coefficients with a penalty-based scheme refined by successive convex approximation (SCA). Simulations in the near field demonstrate notable gains in harvested power while meeting secrecy and beampattern targets, outperforming conventional baselines.
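State Estimation
Hao Li
AI summary: This article discusses state estimation within advanced control theory, emphasizing its importance in practical applications and the foundational role of mathematical methodology.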
Control science is a core representative of the third industrial revolution and is vital to modern civilization. Control systems are the main subject of control science and may involve many considerations, such as hardware, software, operation, maintenance, economy, and society. Beyond all of these, however, the consideration most essential to a control system is methodology in the mathematical sense, and knowledge of it is what we refer to as control theory. Besides its importance from the mathematical perspective, control theory is all the more appealing because it is deeply rooted in practical applications. The charm of control theory lies in both know-why and know-how, and it is the fusion of control theory with practical applications that highlights this charm. Control theory for practical applications, especially with a so-called "advanced" flavour, involves several fundamental aspects. This article introduces the State Estimation aspect of Advanced Control Theory for Practical Applications [1,2].
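Dynamic Plasma Shape Control with Arbitrary Sensor Subsets
D. Sorokin, M. Stokolesov, A. Granovskiy, I. Prokofyev, E. Adishchev, M. Nurgaliev, E. Khayrutdinov, G. Subbotin, R. Clark, D. Orlov
AI summary: This paper presents a reinforcement learning agent trained in the NSFsim simulator on a dataset of 120 experimental plasma shapes. The agent tracks dynamically changing plasma shape targets in real time and remains robust under diagnostic failures.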
Plasma shape control in tokamaks requires a real-time controller that tracks dynamically changing shape targets while tolerating diagnostic failures. Classical approaches decompose the problem into equilibrium reconstruction followed by a linear controller, and assume a fixed, fully operational sensor set. We present a reinforcement learning agent that addresses both limitations simultaneously. The agent is trained in NSFsim, a high-fidelity tokamak simulator configured for DIII-D, on a curated dataset of 120 experimental plasma shapes. The shape targets are resampled as random step changes every 0.25 s, exposing the agent to diverse transitions across the full shape envelope. At test time the agent zero-shot tracks dynamic shape sequences; on a held-out static configuration in simulation it achieves a mean shape error of 2.01 cm, and dynamic trajectory following is demonstrated qualitatively in simulation and on the physical device. Diagnostic dropout randomly masks 30% of magnetic sensors per episode, yielding a single policy robust to arbitrary sensor subsets without backup controllers or mode-switching logic. An asymmetric actor-critic architecture with privileged equilibrium information improves value estimation under partial observability; an auxiliary shape reconstruction head on the actor enables end-to-end shape reconstruction from raw diagnostics and serves as an interpretability tool for policy analysis. The policy transfers to experimental DIII-D shots, where it directly commands the coil actuators on two dynamic shape maneuvers, and to the independent GSevolve simulator.
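Layer Selection in Feature-Based Loss Functions Affects Image Quality and Microstructural Consistency in Deep Learning Super-Resolution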
David Lohr, Rene Werner
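AI summary: This study examines how feature-based loss functions affect diffusion signal consistency in deep learning super-resolution. Deeper network layers produce grid-like artifacts, whereas shallow layers keep the images consistent with the ground truth, including at 9-fold super-resolution.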
Clinical application of high-resolution diffusion MRI is hindered by hardware limitations and prohibitive scan times, motivating computational super-resolution. This study investigates the efficacy of a feature-based loss function in preserving diffusion signal consistency in deep learning super-resolution. Using 7T data from the Human Connectome Project to generate pairs of low- and high-resolution diffusion-weighted images (DWI), we trained UNets for 2D super-resolution. Ablation and isolation studies evaluated different VGG16 layers for feature-based losses against an image-based L1 baseline. Deeper layers and combinations thereof resulted in grid-like artifacts in super-resolution DWIs, which persisted in diffusion parameters like quantitative and fractional anisotropy. No such artifacts were present when using the shallowest layer. Downstream analysis for this layer showed great consistency with the ground truth, even for 9-fold super-resolution. Image SNR and the depth of the VGG16 layers used modulated artifact appearance and severity, mandating careful selection of contributing layers for application in and beyond diffusion MRI.
Communication-Efficient Approximate Gradient Coding for Heterogeneous Systems
Heekang Song, Wan Choi
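AI summary: This paper proposes a communication-efficient gradient coding scheme that jointly optimizes gradient coding and quantization in a unified framework to reduce residual error; experiments on the COCO dataset show faster convergence and improved communication efficiency.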
We propose a communication-efficient optimally structured gradient coding scheme to jointly address straggler resilience and communication efficiency in heterogeneous distributed learning. By establishing a unified framework that simultaneously optimizes gradient coding and quantization, we formulate an optimization problem to minimize residual error subject to an unbiasedness constraint. We rigorously establish the joint global optimum by deriving a closed-form code structure coupled with an optimal bit allocation strategy, while simultaneously proposing a low-complexity bit allocation algorithm that efficiently yields near-optimal performance. We provide rigorous convergence analysis for convex and smooth functions. Experiments on the COCO dataset demonstrate that our joint design significantly accelerates convergence and enhances communication efficiency compared to existing baselines.
Agent-Native Wireless Communications: Architectures, Opportunities, and the Road Ahead
Yuanwei Liu, Xu Gan, Zhaolin Wang, Shan Shan, Zongyao Zhao, Zhiguo Ding
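AI summary: This article proposes an agent-native wireless communication framework and examines the interplay between agent intelligence and communication systems. It covers the use of agents in communication-system design and operation as well as wireless service support for agents, and outlines directions for measurable, safe, and interoperable deployment.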
Future wireless networks are moving toward autonomous service operation, where network control and resource management need to respond to time-varying radio conditions and evolving service objectives. To address this shift, this article develops an agent-native wireless communication framework that characterizes the interplay between agent intelligence and communication systems. In this framework, the coupling is organized around "agents for communications" and "communications for agents". For agent-native operation, the architecture is organized around deployable computing infrastructure, programmable open radio access network (O-RAN) software, and controllable communication interfaces. Based on this architecture, "agents for communications" addresses the use of agents in communication-system design and operation, including agent-generated communication software and agent-driven adaptive wireless optimization. On the other side, "communications for agents" addresses wireless service support for agent operation, including network-supported single-agent loops and network-assisted multi-agent coordination. Finally, the article outlines promising research directions for measurable, safe, and interoperable deployment of agent-native wireless communications.
Improving Speech Recognition for Oral Cancer Patients through Data Augmentation and LLM Error Correction
Hidde Folkertsma, Thomas Tienkamp, Sebastiaan de Visscher, Max Witjes, Rob van Son, Jiapan Guo, Bence Mark Halpern
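AI summary: Using data augmentation and large language model error correction, this paper improves speech recognition for oral cancer patients, reporting relative word error rate reductions of 40% for Whisper and 50% for MMS.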
In recent years, the performance of automatic speech recognition (ASR) systems has made considerable progress. Unfortunately, for people with speech impairments, such as people treated for oral cancer (OC), ASR performance is still lagging behind. The scarcity and variability of OC speech data make the development of ASR models for this type of speech difficult. In this work, we use data augmentation and large language model (LLM) error correction to mitigate this problem. We apply various augmentation techniques on a corpus of Dutch oral cancer speech to create synthetic data, and evaluate their effect on ASR performance. We finetune Whisper and Massively Multilingual Speech (MMS) models for each augmentation technique and observe, on average, an 8% relative decrease in Word Error Rate (WER) when including data created using text-to-speech (TTS). When employing LLMs for error correction, we see a further 21.4-26.2% relative decrease in WER for finetuned ASR models and a 10.0% relative decrease for non-finetuned models. Overall, we achieve a 40% relative WER decrease for Whisper and a 50% relative WER decrease for MMS, indicating that a combination of data augmentation and LLM correction is a viable strategy for the recognition of OC speech.
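The abstract reports its gains as relative decreases in WER. As a minimal sketch of what those quantities mean, the following computes WER as the word-level edit distance divided by the number of reference words, plus the relative decrease between two WER values. The function names and the example sentences are illustrative assumptions, not the authors' evaluation code, and real evaluations usually normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_decrease(baseline: float, improved: float) -> float:
    """Relative decrease of `improved` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcripts, chosen only to illustrate the arithmetic.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    print(f"relative decrease: {relative_decrease(0.5, 0.3):.2f}")
```

Note that a relative decrease is measured against the baseline WER, so a drop from 0.5 to 0.3 is a 40% relative decrease but only a 20 percentage point absolute decrease.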
Uncertainty Propagation under Residual Disturbances: A Smart Home Case Study
Guanru Pan, Dirk Reinhardt, Sebastien Gros, Timm Faulwasser
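AI summary: This paper presents a data-driven framework for uncertainty propagation under unmeasured or statistically unmodeled disturbances. Residual disturbances consolidate all unstructured disturbances into a single quantity that can be estimated from data, and polynomial chaos expansions combined with higher-order Chebyshev inequalities enable efficient uncertainty quantification.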
This paper presents a data-driven framework for uncertainty propagation under unmeasured or statistically unmodeled (unstructured) disturbances. We consider residual disturbances, which consolidate all unstructured disturbances into a single quantity that can be estimated from data. Under mild assumptions, the resulting stochastic predictor is causal and distributionally consistent, enabling efficient uncertainty quantification through polynomial chaos expansions and higher-order Chebyshev inequalities. The proposed method is validated using experimental data from a smart home in Norway.
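To illustrate why higher-order Chebyshev inequalities can tighten tail estimates, here is a minimal sketch of the generic moment bound P(|X - mu| >= t) <= E|X - mu|^k / t^k, which follows from Markov's inequality. The function name is hypothetical, and the sketch does not reproduce the paper's predictor or its polynomial chaos machinery; it only shows the inequality itself.

```python
def chebyshev_tail_bound(abs_central_moment: float, t: float, k: int) -> float:
    """Upper bound on P(|X - mu| >= t) from the k-th absolute central moment.

    abs_central_moment is E|X - mu|^k, supplied by the caller (for example,
    estimated from data or computed from a surrogate model).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if k < 1:
        raise ValueError("k must be at least 1")
    # A probability never exceeds 1, so the bound is clipped.
    return min(1.0, abs_central_moment / t ** k)


if __name__ == "__main__":
    # Standard Gaussian: E|X|^2 = 1 and E|X|^4 = 3.
    print(chebyshev_tail_bound(1.0, 3.0, 2))  # classical Chebyshev, k = 2
    print(chebyshev_tail_bound(3.0, 3.0, 4))  # fourth-order bound, tighter here
```

For a standard Gaussian and t = 3, the fourth-order bound (1/27) is tighter than the classical second-order bound (1/9); which order is tightest depends on the distribution and on t.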
Joint Mobile User Localization and Passive Target Sensing via Optimized Sequential Beamforming
Aymen Hamrouni, Sofie Pollin, Hazem Sallouha
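AI summary: This paper proposes a velocity-aware sequential beamforming framework that dynamically couples monostatic sensing and bistatic positioning in time and optimizes resource allocation, achieving centimeter-level positioning accuracy and robust velocity estimation.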
Integrated sensing and communication (ISAC) relies on monostatic sensing (MS) and bistatic positioning (BP) to enable comprehensive environmental awareness and user localization. However, existing frameworks predominantly assume static geometries and optimize these modalities independently, neglecting user mobility and sequential information sharing. In this paper, we propose a velocity-aware sequential beamforming framework that dynamically couples MS and BP in time. We derive the Cramér-Rao bounds (CRBs) in the position domain to formulate a non-convex resource allocation problem. Instead of relying on static weighted-sum tradeoffs, we introduce a sequential Bayesian optimization strategy where MS is executed first to construct a reliable structural prior on the user equipment (UE) and passive targets (PTs). This covariance prior is subsequently passed to the UE to regularize the BP estimation stage. We demonstrate that optimizing a single shared beamformer globally across both phases yields superior synergistic gains compared to a two-stage greedy approach. Simulation results validate that the shared sequential design efficiently balances limited symbol resources, achieving centimeter-level positioning accuracy for both the UE and PTs, robust velocity estimation, and a significantly reduced computational runtime.
Video Quality Evaluation Methodology and AV2 Compression Performance Results
Zhijun Lei, Vibhoothi Vibhoothi, Dzung Hoang, Yixin Du, Ramzi Khsib
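AI summary: This paper describes the quality and performance evaluation methodology of the AV2 Common Test Conditions and presents the coding gains of AV2 (v13.0) over AV1; experiments show significant BD-rate reductions for AV2 on the PSNR-YUV and VMAF metrics.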
The Alliance for Open Media (AOMedia) has developed the AV2 video coding standard to supersede AV1, aiming for substantial compression efficiency gains across diverse media applications. This paper details the quality and performance evaluation methodology defined in the AV2 Common Test Conditions (CTC), which introduces new evaluation methods and content, including convex-hull-based adaptive streaming (AS) configuration, user-generated content (UGC), and extended chroma formats. We present the coding gains of AV2 (v13.0) against the AV1 baseline. Experimental results show that AV2 achieves significant Bjøntegaard-Delta Rate (BD-rate) reductions of 29.81% and 33.79% for PSNR-YUV and VMAF, respectively, under random access configuration, validating the efficiency of AV2 for next-generation streaming applications.
Shared Prosperity Internet
Juan A. Cabrera, Pit Hofmann, Jonas Schulz, Frederic Benken, Hrjehor Mark, Giang T. Nguyen, Holger Boche, Frank H. P. Fitzek
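AI summary: This paper proposes the Shared Prosperity Internet architecture, which maps physical constraints to three design principles and builds three technical pillars on them, aims to make the benefits of automation and AI broadly accessible, and defines measurable outcome metrics.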
The Shared Prosperity Internet (SPI) is a network-computing architecture that makes the benefits of automation and Artificial Intelligence (AI) broadly accessible to society. To ground its design, this paper maps the physical constraints of Shannon, Landauer, Turing, and Einstein to three design principles (trustworthiness, sustainability, and technological sovereignty) and translates them into three technical pillars: i) post-Shannon, goal-oriented communication that transmits only what the task requires; ii) anticipatory decision-making ("negative latency") with confidence-bounded pre-action and correction; and iii) beyond-digital computing that selects energy-optimal substrates under deadline and computability constraints. The SPI is grounded in three societal use cases: remote teaching for pupils, remote teaching of robots and cyber-physical systems, and elder care. Furthermore, this paper defines measurable outcomes for an SPI, including latency decomposition, bits per event, energy and CO2 per task, safety and privacy indicators, and robustness.
Learning Context-Conditioned Gaussian Overbounds for Convolution-Based Uncertainty Propagation
Ruirui Liu, Xuejie Hou, Yiping Jiang, Hui Ren
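AI summary: This paper proposes a unified learning framework that trains neural networks to produce context-aware Gaussian overbounds with provable conservatism on a finite quantile grid and, under three explicit regularity assumptions, continuous-tail conservatism on a certified interval.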
Uncertainty quantification is essential in safety-critical settings--from autonomous driving to aviation, finance, and health--where decisions must rely on conservative bounds rather than point estimates. Predictor-level intervals (e.g., from quantile regression, conformal prediction, variance networks, or Bayesian models) generally do not compose: adding two per-variable intervals need not yield a valid interval for their sum or preserve coverage. In aviation, Gaussian overbounding replaces complex error distributions with a conservative Gaussian whose tails dominate the truth, so conservatism propagates through linear operations. Yet classical overbounds are global, often overly conservative, and hard to adapt to feature-conditioned errors. We propose a unified learning framework that trains neural networks to produce context-aware Gaussian overbounds--mean and scale--with provable conservatism on a finite quantile grid and, under three explicit regularity assumptions, continuous-tail conservatism on a certified interval. Our overbounding loss enforces conservativeness at selected quantiles while penalizing distributional distance with a Wasserstein-style term. The learned bounds support conservative linear-combination and convolution analysis on the enforced grid, and on the certified interval when assumptions hold, while being less redundant than traditional methods. We provide a scoped analysis of discrete-to-continuous conservatism and compact-domain objective regularity, and validate on synthetic data and real-world datasets, including multipath, ionospheric, and tropospheric residual errors. Across these settings, the method yields tighter bounds while maintaining conservatism on the enforced grid and in experiments. The framework is modality-agnostic and applicable to learning systems that require conservative, feature-conditioned uncertainty estimates in dynamic environments.
Reactive Robot-Centric Safety Mechanism for Autonomous Navigation in Constrained and Dynamic Environments
Viswa Narayanan Sankaranarayanan, Vignesh K. Viswanathan, Akshit Saradagi, Sumeet Satpute, George Nikolakopoulos
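AI summary: This paper proposes a composite control barrier function safety filter based on 3D LIDAR perception that ensures real-time safe navigation of autonomous robots in constrained, dynamic environments, and validates its reliability in complex environments through field experiments.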
In this work, we address the problem of ensuring real-time safety in autonomous robot navigation in spatially constrained dynamic environments by utilizing only onboard sensors. We present a real-time control architecture that integrates a composite control barrier function (CBF) safety filter, based on 3D LIDAR perception, directly into the autonomy pipeline. The proposed perception-driven framework enforces collision avoidance constraints dynamically from onboard point cloud data, thus allowing a large number of constraints to be handled at the control frequency while remaining minimally invasive to nominal task execution. The safety region is defined as an ellipsoid in the body frame, consistent with the geometry of the platform, which induces time-varying constraints in the world frame as the robot rotates; this effect is handled through a dedicated time-varying CBF formulation for each LIDAR point. We validate the system through multiple field experiments in underground environments using a quadruped platform performing a visual inspection task, demonstrating reliable operation in the presence of dynamic obstacles, unsafe high-level references, and abrupt localization anomalies, and while traversing narrow corridors.
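The idea of a minimally invasive CBF safety filter can be sketched in a few lines. The example below is a strong simplification and not the architecture described in the abstract: it assumes single-integrator dynamics (velocity is the control input), a single static obstacle point, and a circular safety region instead of a body-frame ellipsoid. With h(x) = ||x - p||^2 - r^2, the filter enforces grad h(x) · u + alpha * h(x) >= 0 and, for one constraint, the closest safe input has a closed form. All names and parameters are illustrative assumptions.

```python
from typing import List, Sequence


def cbf_filter(x: Sequence[float], u_nom: Sequence[float],
               obstacle: Sequence[float], radius: float,
               alpha: float = 1.0) -> List[float]:
    """Return the input closest to u_nom that satisfies one CBF constraint.

    Assumes single-integrator dynamics x_dot = u and the barrier
    h(x) = ||x - obstacle||^2 - radius^2 (safe when h >= 0).
    """
    d = [xi - pi for xi, pi in zip(x, obstacle)]
    h = sum(di * di for di in d) - radius ** 2
    grad_h = [2.0 * di for di in d]
    # Constraint value for the nominal input: grad_h . u_nom + alpha * h
    margin = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if margin >= 0.0:
        return list(u_nom)  # nominal input is already safe, leave it untouched
    grad_sq = sum(g * g for g in grad_h)
    if grad_sq == 0.0:
        raise ValueError("robot position coincides with the obstacle point")
    # Project u_nom onto the half-space {u : grad_h . u + alpha * h >= 0}.
    return [u - (margin / grad_sq) * g for u, g in zip(u_nom, grad_h)]


if __name__ == "__main__":
    # Robot at (2, 0) driving straight at an obstacle point at the origin.
    print(cbf_filter([2.0, 0.0], [-5.0, 0.0], [0.0, 0.0], radius=1.0))
```

With many LIDAR points there is one such constraint per point, and the closed-form projection no longer applies directly; a quadratic program or a composite barrier that merges the constraints is then needed.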
Downlink Performance Analysis of Pinching Antenna Systems: WDMA or NOMA?
Han Zhang, Bingxin Zhang, Yizhe Zhao, Kun Yang
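AI summary: This paper analyzes the downlink performance of pinching antenna systems and compares WDMA and NOMA across signal-to-noise ratios: NOMA is more spectrally efficient at high SNR, whereas WDMA is more reliable at low to moderate SNR but suffers from an outage floor and rate saturation.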
This paper presents an analytical framework for downlink pinching antenna systems (PASS) employing waveguide division multiple access (WDMA) and non-orthogonal multiple access (NOMA). A unified channel model is developed to capture antenna deployment, user spatial distribution, and path loss. Closed-form and single-integral expressions for the outage probability and average achievable rate are derived and validated via Monte Carlo simulations. The results show that NOMA achieves higher spectral efficiency at high transmit signal-to-noise ratio (SNR) due to successive interference cancellation (SIC), whereas WDMA offers more reliable performance at low to moderate SNR but suffers from an outage floor and rate saturation at high SNR. Moreover, WDMA performance is more sensitive to the user spatial distribution due to the spatially dependent inter-waveguide interference. These findings provide design insights for access-scheme selection and antenna placement in PASS.
On the (Non-)Robustness of Encrypted Controllers to Covert Attacks
Philipp Binfet, Janis Adamek, Moritz Schulze Darup
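AI summary: This work examines the resilience of encrypted controllers to covert attacks and proposes a protection based on verifiable computation that incurs no communication overhead.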
The security of networked control systems (NCS) is receiving increasing attention from both cyber-security and system-theoretic perspectives. The former focuses on classical IT security goals such as confidentiality, integrity, and availability of process data, while the latter investigates tailored attacks (and detection schemes), including covert and zero-dynamics attacks. Confidentiality in control systems can, for instance, be achieved by securely outsourcing the evaluation of the controller to third-party platforms, such as cloud services. The underlying technology enabling such secure computation is often homomorphic encryption (HE). Recent works in encrypted control have proposed modifications to underlying HE schemes to achieve not only confidentiality but also resilience to certain types of integrity attacks. While extensions in this direction are desirable in principle, we show that the integrity problem in encrypted control cannot be solved by public-key HE schemes alone due to their inherent malleability. In other words, the same homomorphisms that enable encrypted control in the first place can be leveraged not only constructively but also destructively. More precisely, we demonstrate that NCS are vulnerable to covert attacks, even when encrypted control is employed. Remarkably, this remains possible without knowledge of an unencrypted model. Yet, resilience to such attacks can still be achieved through complementary techniques. We present an approach based on verifiable computation that integrates with modern homomorphic cryptosystems and is asymptotically secure while incurring no communication overhead.
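The malleability argument can be made concrete with a toy example. The sketch below assumes a textbook Paillier cryptosystem (an additively homomorphic public-key scheme; the abstract does not name a specific scheme) with tiny hard-coded primes that offer no security. It shows that anyone holding only the public key can shift an encrypted value by a chosen offset without decrypting it. This illustrates malleability only and is not the covert attack constructed in the paper.

```python
import math

# Toy parameters (illustrative assumption, far too small for real security).
P, Q = 293, 433
N = P * Q            # public modulus
N2 = N * N
G = N + 1            # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # private (Python 3.8+)


def encrypt(m: int, r: int) -> int:
    """Encrypt m with randomness r (r must be coprime to N). Public key only."""
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def tamper(c: int, delta: int, r: int) -> int:
    """Attacker's step: add delta to the hidden plaintext using only N and G.

    Multiplying ciphertexts adds the underlying plaintexts modulo N.
    """
    return (c * encrypt(delta, r)) % N2


if __name__ == "__main__":
    c = encrypt(100, r=17)           # e.g. an encrypted control input
    c_bad = tamper(c, delta=7, r=23)
    print(decrypt(c), decrypt(c_bad))
```

The receiver decrypts the tampered ciphertext without any error, which is why encryption alone gives confidentiality but not integrity, and why a complementary mechanism such as the verifiable computation mentioned in the abstract is needed.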