arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.03644 2026-06-03 cs.AI

AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse

Jie Ou, Jinyu Guo, Shiyao Guo, Yuang Li, Ruiqi Wu, Zhaokun Wang, Wenyi Li, Wenhong Tian

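AI summary: Proposes AdapShot, which dynamically optimizes the number of shots through an entropy-based probing mechanism and combines it with a semantics-aware KV cache reuse strategy for efficient many-shot in-context learning, achieving a performance gain of about 10% and a 4.64x speedup.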

Abstract

Many-Shot In-Context Learning (ICL) has emerged as a promising paradigm, leveraging extensive examples to unlock the reasoning potential of Large Language Models (LLMs). However, existing methods typically rely on a predetermined, fixed number of shots. This static approach often fails to adapt to the varying difficulty of different queries, leading to either insufficient context or interference from noise. Furthermore, the prohibitive computational and memory costs of long contexts severely limit Many-Shot's feasibility. To address the above limitations, we propose AdapShot, which dynamically optimizes shot counts and leverages KV cache reuse for efficient inference. Specifically, we design a probe-based evaluation mechanism that utilizes output entropy to determine the optimal number of shots. To bypass the redundant prefilling computation during both the probing and inference phases, we incorporate a semantics-aware KV cache reuse strategy. Within this reuse strategy, to address positional encoding incompatibilities, we introduce a decoupling and re-encoding method that enables the flexible reordering of cached key-value pairs. Extensive experiments demonstrate that AdapShot achieves an average performance gain of around 10% and a 4.64x speedup compared to state-of-the-art DBSA.
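
The entropy-based probe described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate shot count yields a probe output distribution and that the lowest-entropy candidate is preferred; the function names, the dictionary format, and the selection rule are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_shot_count(probe_outputs):
    """Return the candidate shot count whose probe output has the lowest entropy.

    probe_outputs maps a shot count to an output probability distribution.
    This format and the argmin rule are assumptions made for illustration.
    """
    return min(probe_outputs, key=lambda k: entropy(probe_outputs[k]))


# Hypothetical probe results for three candidate shot counts.
probes = {
    4: [0.40, 0.35, 0.25],   # too little context: uncertain output
    16: [0.85, 0.10, 0.05],  # confident output
    64: [0.60, 0.30, 0.10],  # extra shots add noise
}
best = pick_shot_count(probes)
```

In this toy setting the 16-shot probe is the most confident, so it is the one selected; a real system would obtain the distributions from the model's next-token probabilities.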

2605.02488 2026-06-03 cs.AI cs.DB cs.LO

Efficient Temporal Datalog Materialisation for Composite Event Recognition

Periklis Mantenoglou

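AI summary: To detect critical situations in high-velocity event streams, maps practical fragments of prominent event specification languages into Temporal Datalog->- and introduces Streaming Trigger Graphs, an extension of a Datalog materialisation technique, yielding a uniform composite event recognition mechanism.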

Abstract

Several applications demand the timely detection of critical situations, such as threats to safety and transparency, over high-velocity streams of symbolic events. This demand has motivated the development of (i) event specification languages, which define composite events via temporal patterns over simpler events, and (ii) stream reasoning frameworks, evaluating patterns expressed in these languages. However, event specification languages are typically studied in isolation, complicating their comparison in terms of expressivity and obscuring the scope of their associated stream reasoners. To mitigate this issue, we map practical fragments of prominent event specification languages into Temporal Datalog->-, a temporal Datalog with stratified negation and no future dependencies. To support efficient stream reasoning over Temporal Datalog->-, we propose Streaming Trigger Graphs, an extension of a state-of-the-art technique for Datalog materialisation. Our approach yields a uniform composite event recognition mechanism that has the potential to generalise across a wide range of practical event specification languages.

2605.03299 2026-06-03 cs.CL

LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

Minh Chu Xuan, Tien-Phat Nguyen, Linh Ngo Van, Dinh Viet Sang, Nguyen Thi Ngoc Diep, Trung Le

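AI summary: Proposes the LLM-XTM framework, which combines LLM-guided topic refinement with self-consistency uncertainty quantification to improve the coherence and alignment of cross-lingual topic models in a black-box, stable manner while reducing reliance on bilingual resources.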

Comments
ACL 2026

Abstract

Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.

2605.01712 2026-06-03 cs.LG

CoAction: Cross-task Correlation-aware Pareto Set Learning

Xinyue Chen, Yingxuan Liang, Yiqin Huang, Chikai Shang, Hai-Lin Liu, Fangqing Gu

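AI summary: Proposes the CoAction framework, which uses a task-aware Transformer to handle multiple multi-objective optimization problems simultaneously, capturing inter-task correlations through self-attention, with competitive results on the Hypervolume, Range, and Sparsity metrics.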

Comments
Accepted by ICIC 2026 (Oral)

Abstract

Pareto set learning (PSL) is an emerging paradigm in multi-objective optimization that trains neural networks to map preference vectors to Pareto optimal solutions. However, existing PSL methods primarily focus on solving a single multi-objective optimization problem at a time. This limitation not only increases computational costs in multi-objective multitask optimization scenarios by requiring a separate model for each task, but also fails to exploit inter-task correlations. To address this, we propose a Cross-tAsk correlation-aware Pareto Set Learning (CoAction) framework, which leverages a task-aware Transformer to handle multiple tasks simultaneously. Specifically, by assigning task-specific embedding vectors to individual tasks, the model effectively distinguishes between tasks while facilitating knowledge sharing among them. We utilize a Transformer encoder as the backbone architecture to leverage its self-attention mechanism for capturing complex task dependencies. The proposed approach is evaluated on comprehensive multitask test suites covering both benchmark problems and real-world applications, demonstrating effectiveness and competitive performance in Hypervolume, Range, and Sparsity.

2606.03940 2026-06-03 eess.IV cs.CV cs.LG cs.RO

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

Dan Jacobellis, Neeraja J. Yadwadkar

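AI summary: Proposes the SEAOTTER framework, which pairs a sensor-embedded autoencoder with a learnable JPEG transcode; at a 200:1 compression ratio it encodes 7 times faster and decodes 3.5 times faster than AVIF and raises ImageNet top-1 accuracy by 8%, while retaining JPEG compatibility.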

Abstract

In robotics systems, vast amounts of visual data are easily captured at high resolution using low-cost, low-power hardware. Yet, limited bandwidth and on-device compute resources prevent full utilization when transmitted via conventional codecs like JPEG/MPEG. Newer codecs, like AV1/AVIF, improve the rate-distortion trade-off, but demand far more resources for encoding, impractical without custom ASICs. Recent asymmetric autoencoders deliver high quality under extreme power and bandwidth constraints, but add prohibitive decoding cost and use bespoke formats that ignore decades of infrastructure built around standards like JPEG. To address these limitations, we introduce a compression framework for cloud robotics based on a Sensor Embedded Autoencoder paired with a One-Time Transcode for Efficient Reconstruction (SEAOTTER). Because the sensor, cloud, and consumer stages face very different power and bandwidth budgets, SEAOTTER combines the compactness of a learned latent with the broad usability of a standard JPEG file. Since naive transcoding degrades performance, we propose a learnable JPEG color and quantization transform that enables increased accuracy for global, dense, and vision-language-based perception. Using SEAOTTER, we train both general-purpose and task-aware transcoding pipelines for a pre-trained, frozen encoder. At a compression ratio of 200:1 and compared to AVIF, we observe 7 times faster encoding, 3.5 times faster decoding, and +8% ImageNet top-1 accuracy, while retaining compatibility with JPEG infrastructure. Our code is available at https://github.com/UT-SysML/seaotter .

2606.03455 2026-06-03 eess.AS cs.SD

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Wenxi Chen, Dongya Jia, Yushen Chen, Zhikang Niu, Yuzhe Liang, Xiquan Li, Ruiqi Yan, Ziyang Ma, Guanrou Yang, Sanyuan Chen, Yue Wang, Zhuo Chen, Kai Yu, Xie Chen

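AI summary: Proposes WavTTS, the first raw-waveform generative TTS model, built on flow matching with a Diffusion Transformer; it models waveforms directly via a simple patchification strategy, integrates multi-scale mel-spectrogram supervision, and approaches the performance of latent-space generative models in zero-shot TTS.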

Abstract

Recently, diffusion models operating on VAE latents or mel-spectrograms have become the dominant paradigm for zero-shot TTS. Although these compressed representations improve generation efficiency, they inevitably suffer from information loss and non-end-to-end training. Theoretically, directly modeling raw waveforms circumvents these issues; however, this direction remains underexplored and is often deemed difficult due to the extremely long sequence length of audio signals. To overcome this, we propose WavTTS, the first raw waveform generative TTS model that substantially narrows the gap with latent-space generative models. Built upon flow matching with a Diffusion Transformer (DiT), WavTTS directly models speech waveforms via a simple patchification strategy, while integrating multi-scale mel-spectrogram supervision to provide perceptual guidance during training. Furthermore, we investigate the impact of prediction targets and noise scheduling in waveform diffusion, and develop an effective schedule design to improve generation quality. Evaluations on open-source benchmarks demonstrate that WavTTS closely approaches the performance of current state-of-the-art latent generative zero-shot TTS models, while substantially outperforming previous end-to-end speech generation models. Our findings demonstrate the feasibility of scaling diffusion-based TTS directly in the waveform space, opening a new direction for end-to-end speech generation.

2606.03116 2026-06-03 eess.AS cs.AI cs.SD

AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following

Haitao Li, Tian Tan, Yuguang Yang, Shan Yang, Xie Chen

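AI summary: To address the difficulty of decoupling complex instructions, the lack of interpretability, and the failure to capture fine-grained attribute mismatches when evaluating instruction-guided audio generation, proposes a dynamic rubric-based evaluation paradigm that adaptively decomposes audio captions into verifiable binary rubric items, builds a bilingual benchmark of 7,920 samples and a 105K-sample training corpus, and trains a dedicated evaluator with SFT and GRPO, improving zero-shot alignment detection and instruction alignment in downstream reinforcement learning.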

Abstract

The rapid advancement of instruction-guided audio generation has highlighted the critical need for robust alignment evaluation. Current automated evaluation methods heavily rely on holistic scoring from general-purpose large language models, which struggle to decouple complex instructions, lack interpretability, and fail to capture fine-grained attribute mismatches. To address this, we introduce a novel dynamic rubric-based evaluation paradigm that adaptively decomposes complex audio captions into a variable number of independent, verifiable binary rubric items. To rigorously benchmark this capability, we propose the AnyAudio-Judge Bench, a comprehensive, bilingual benchmark comprising 7,920 meticulously curated samples across four diverse audio domains (speech, sound, music, and mixed), featuring deliberately constructed hard negatives. Furthermore, we construct a large-scale corpus of 105K samples with explicit Chain-of-Thought (CoT) rationales to train our dedicated evaluator, the AnyAudio-Judge model. By employing a training pipeline that combines Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), our model successfully aligns its reasoning paths with the rubric-based scoring mechanism. Extensive experiments demonstrate that AnyAudio-Judge not only significantly enhances zero-shot alignment detection compared to state-of-the-art baselines, but also provides precise and interpretable reward signals that substantially improve instruction alignment in downstream reinforcement learning for audio generation.

2606.02913 2026-06-03 eess.AS cs.SD

A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination

Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

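AI summary: Compares generative and discriminative deep learning methods for speech enhancement, analyzing robustness, complexity, and hallucination characteristics under high and low signal-to-noise ratios and under matched and mismatched training scenarios.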

Abstract

In this study, we conduct a comprehensive comparative analysis of generative and discriminative deep learning-based speech enhancement methods, specifically in noise reduction tasks. Our investigation focuses on evaluating their effectiveness under high and low signal-to-noise ratio conditions, considering both matched and mismatched training scenarios. We further investigate the impact of training data volume and model convergence speed, and interpret the performance differences in terms of objective results for the considered training paradigms. Additionally, we compare the complexity-performance trade-off and the practical viability of these approaches. To further strengthen the evaluation, we study the hallucination characteristics of generative approaches in terms of word error rate and phoneme similarity. The insights derived from this study provide empirical evidence to assist researchers and practitioners in understanding whether the perceptual gains of different approaches justify their computational cost in practical applications.

2606.02906 2026-06-03 eess.IV cs.CV

Depth from Dual Differential Defocus and Stereo Consensus

Junjie Luo, Wei Xu, Dylan Chu, Emma Alexander, Qi Guo

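AI summary: Proposes the D^3S Consensus algorithm, which fuses depth-from-defocus and stereo to achieve highly accurate depth estimation over a working range beyond the depth of field, selects reliable predictions by enforcing consensus between physically independent cues, and reaches a comparable working range with a smaller baseline.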

Abstract

We introduce D^3S Consensus, a physics-based, closed-form algorithm that unifies depth-from-defocus (DfD) and stereo to achieve highly accurate depth estimation throughout an extended working range beyond the depth-of-field (DoF) of cameras. Given a pair of dual-defocus stereo images, the method estimates an overdetermined set of depths using a novel DfD theory, Dual Differential Defocus (D^3), and (S)tereo in a coupled fashion. It then picks the most confident depth prediction from the set by enforcing consensus between these physically independent cues to reject unreliable estimates. Analysis shows that D^3S achieves a comparable working range under the same error tolerance with a 10x smaller baseline than previous triangulation-based depth estimation systems. This enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo and DfD designs. We demonstrate the first D^3S prototype with only a 4 mm baseline and a 12 mm EFL. It generates up to 900 x 1800-pixel depth maps with 1-cm mean absolute error over 0.3-1.64 m from a snapshot acquisition. This surpasses the reported accuracy of certain commercially available stereo cameras with much larger form factors.

2606.02661 2026-06-03 eess.IV cs.AI cs.LG

Learning to Refine: Spectral-Decoupled Iterative Refinement Framework for Precipitation Nowcasting

Yunlong Zhou, Chen Zhao, Danyang Peng, Fanfan Ji, Xiao-Tong Yuan

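AI summary: Proposes the Spectral-Decoupled Iterative Refinement (SDIR) framework, which uses a dual-path design (SFG-Former and FR-Refiner) and a Physically Consistent Power Spectral Density loss to perform progressive frequency-decoupled refinement for precipitation nowcasting within a deterministic framework, eliminating blurring and hallucinations and outperforming existing methods in spatial accuracy, with spectral fidelity competitive with diffusion-based methods.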

Comments
21 pages, 10 figures, accepted at ICML 2026

Abstract

Accurate precipitation nowcasting is vital for disaster mitigation, but deep learning methods face a key trade-off: regression models produce over-smoothed, spectrally decaying predictions that blur convective details and violate turbulence power laws; diffusion models generate realistic yet unanchored hallucinations lacking physical grounding. We propose Spectral-Decoupled Iterative Refinement (SDIR), a deterministic framework that reformulates nowcasting as progressive frequency-decoupled refinement. SDIR first extracts a stable low-frequency synoptic skeleton, then iteratively refines high-frequency textures under physical constraints, eliminating both blurring and hallucinations. It features a dual-path design: the Synoptic Frequency-Guided Former (SFG-Former) with Scale-Adaptive Transformers for global structure, and the Fourier Residual Refiner (FR-Refiner) with Scale-Conditioned Fourier Neural Operators for fine residuals. A Physically Consistent Power Spectral Density (PCPSD) loss with dynamic masking enforces a turbulence-consistent spectral distribution. Experiments on three benchmarks show SDIR significantly outperforms SOTA methods in spatial accuracy while achieving spectral fidelity competitive with diffusion-based methods, enabling reliable high-resolution operational nowcasting. Code link: https://github.com/RuntimeWarning/SDIR.

2606.02642 2026-06-03 eess.AS cs.AI cs.CV cs.LG cs.MM cs.SD

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

Chenshuang Zhang, Kyeong Seon Kim, Chengxin Liu, Tae-Hyun Oh

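AI summary: To study speech-vision hallucination in audio-visual large language models, proposes the SVHalluc benchmark, which evaluates along semantic and temporal dimensions how well models align speech content with visual signals, and finds that existing models are limited in cross-modal understanding.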

Comments
Accepted at CVPR 2026

Abstract

Despite the success of audio-visual large language models (LLMs), they can produce plausible but ungrounded outputs, termed hallucinations. Existing benchmarks focus on environmental sounds (e.g., dog barking) to indicate event occurrence. In contrast, human speech carries fundamentally different, rich semantics and temporal structures, yet it remains unexplored whether current models can accurately align speech content with corresponding visual signals. In this work, we show that speech content can induce hallucinations in audio-visual LLMs. To systematically study this, we introduce SVHalluc, the first comprehensive benchmark for evaluating speech-vision hallucination in audio-visual LLMs. Our benchmark diagnoses speech-vision hallucinations from two critical and complementary aspects: semantic and temporal. Experimental results demonstrate that state-of-the-art open-source audio-visual LLMs struggle to align speech content with corresponding visual signals, with near-random accuracy on multiple tasks. In contrast, Gemini 2.5 Pro significantly outperforms the open-source models. Our analysis suggests that their failures stem from limited ability in cross-modality understanding, despite strong performance in single-modality perception. Our work uncovers a new and fundamental limitation of current audio-visual LLMs and highlights the need for speech-grounded video comprehension. Project page: https://chenshuang-zhang.github.io/projects/svhalluc/.

2606.02639 2026-06-03 eess.IV cs.AI cs.CV

Sparse-View Lung Nodule Volumetry from Digitally Reconstructed Radiographs via AReT: Anatomy-Regularized TensoRF

Spoorthi M, Suja Palaniswamy

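AI summary: Identifies and resolves a problem with TensoRF's default density shift when applied to X-ray attenuation fields, and proposes AReT, an anatomy-regularized tensorial radiance field framework that stably reconstructs lung nodule volumes from only three orthogonal X-ray projections, with high accuracy on the LIDC-IDRI dataset.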

Abstract

We identify and resolve a previously unreported failure mode in TensoRF when applied to X-ray attenuation fields: the default density shift of -10, originally introduced for RGB scene reconstruction, suppresses density gradients and prevents sparse-view medical reconstruction regardless of learning rate or regularization strategy. Setting the density shift to zero restores gradient flow and enables stable volumetric reconstruction of pulmonary nodules from only three orthogonal X-ray projections. Building on this, we propose AReT, an anatomy-regularized tensorial radiance field framework for lung nodule reconstruction using coronal, sagittal, and axial projections from the LIDC-IDRI dataset (19 patients, radiologist-annotated nodules). Unlike existing NeRF approaches requiring dense multi-view acquisition, AReT is designed for sparse-view thoracic imaging and incorporates chest-anatomy-aware regularization combining L1 sparsity and total variation smoothness. A systematic comparison across 11 reconstruction strategies shows anatomy-aware regularization consistently outperforms generative-prior-guided approaches. Evaluated against radiologist consensus segmentations, AReT achieves Pearson r=0.983 (p<0.0001) for clinically actionable nodules >=10 mm (n=14), median absolute volumetric error of 11.4%, near-zero systematic bias of -77.3 mm^3, and 8.4x improvement over spherical volume approximation.
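
A small numerical sketch can make the density-shift failure mode concrete. It assumes the raw density feature is passed through a softplus activation after the shift is added, which the abstract does not state; under that assumption the gradient with respect to the raw feature is sigmoid(raw + shift), which is tiny at shift -10 and healthy at shift 0. The function names are illustrative only.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def density(raw, shift):
    """Density under the assumed activation softplus(raw + shift)."""
    return softplus(raw + shift)


def density_grad(raw, shift):
    """Derivative of softplus(raw + shift) w.r.t. raw, i.e. sigmoid(raw + shift)."""
    return 1.0 / (1.0 + math.exp(-(raw + shift)))


# Gradient at a raw feature of 0 (a typical value near initialization).
grad_default = density_grad(0.0, -10.0)  # roughly 4.5e-5: almost no signal
grad_zero = density_grad(0.0, 0.0)       # exactly 0.5
```

With the shift at -10 the gradient near initialization is more than four orders of magnitude smaller than with the shift at 0, which is consistent with the suppressed density gradients the abstract reports.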

2606.02634 2026-06-03 eess.IV cs.AI

Echo-POSED: Geometric Self-Distillation for Echocardiography Guidance

Elias Stenhede, Edvart Grüner Bjerke, Joanna Sulkowska, Eivind Bjørkan Orstad, Ole Jakob Elle, Ulysse Côté-Allard, Arian Ranjbar

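AI summary: Proposes Echo-POSED, a self-supervised framework trained on 2D views sliced from 3D echocardiography volumes that provides real-time transthoracic echocardiography guidance without expert-labelled views or tracked probe trajectories; it is equivariant to probe motion, with a pose representation on SO(3)×SO(3), and reaches a combined mean angular error of 8.2 degrees in intra-patient guidance simulations with cardiac motion.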

Abstract

We introduce Echo-POSED, a self-supervised framework for real-time transthoracic echocardiography (TTE) guidance that recommends probe adjustments directly from 2D ultrasound images, without the need for expert-labelled views or tracked probe trajectories. Instead, it trains on 2D views sliced from routinely acquired 3D echocardiography volumes, enforcing equivariance to probe motions while remaining invariant to cardiac phase, yielding a pose representation on SO(3)×SO(3). Across a held-out split and public external 3D-TTE datasets (including vendor shift), Echo-POSED maintains geometric consistency under virtual perturbations and enables intra- and inter-patient guidance simulations, achieving a combined mean angular error of 8.2 degrees between the guided and target views in intra-patient simulations with cardiac motion.

2606.02631 2026-06-03 eess.AS cs.AI cs.CV cs.LG cs.SD

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

Shenghao Ding

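AI summary: Studies whether audio, images, and video can share a unified wavelet token schema; using a continuous-token model built on a Haar DWT/IDWT frontend, it tests the feasibility of a shared schema on several datasets and analyzes the effects of latent capacity and metadata.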

Comments
12 pages, 3 figures

Abstract

This paper studies whether audio, images, and video can share a common wavelet token schema rather than relying on separate modality-specific latent grids. It introduces a preliminary continuous-token model built around a one-level Haar DWT/IDWT frontend, a shared coefficient-token layout, optional structural metadata, lightweight modality value adapters, and a shared token-wise encoder-decoder trunk. On Speech Commands, EuroSAT RGB, and DAVIS 2017 data, a dense shared model reaches 39.92 dB audio, 29.37 dB image, and 23.93 dB video PSNR. A matched-rate sweep under continuous latent scalar budgets indicates that the visual gains are not explained solely by latent capacity, while also showing that additive metadata embeddings are not a universal source of improvement. Finally, fixed-rate energy selection provides a strong non-parametric baseline: energy_global improves average PSNR over uniform selection by 16.73 dB for audio, 16.90 dB for images, and 15.86 dB for video under compressed keep ratios. Masked sparse training reaches 34.45 dB video PSNR with 50% of dense tokens. The results support a unified wavelet token schema and sparse token interface, while stopping short of establishing a universal discrete vocabulary.
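
For readers unfamiliar with the frontend, the following is a minimal sketch of a one-level orthonormal Haar DWT/IDWT on a 1D signal, together with a PSNR helper. It illustrates only the standard transform, not the paper's token layout, adapters, or trunk; the sample signal and the peak value of 1.0 are assumptions.

```python
import math

S = math.sqrt(2.0)


def haar_dwt(x):
    """One-level orthonormal Haar DWT of an even-length sequence."""
    approx = [(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / S, (a - d) / S])
    return out


def psnr(ref, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB (infinite for a perfect match)."""
    mse = sum((r - q) ** 2 for r, q in zip(ref, rec)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


signal = [0.1, 0.3, 0.8, 0.6, 0.2, 0.2, 0.9, 0.5]
approx, detail = haar_dwt(signal)
recon = haar_idwt(approx, detail)                   # full reconstruction
coarse = haar_idwt(approx, [0.0] * len(detail))     # detail tokens dropped
```

Keeping both coefficient bands reconstructs the signal up to floating-point error, while zeroing the detail band gives a lossy approximation; this is the kind of keep/drop trade-off that a sparse token interface exposes.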

2606.02615 2026-06-03 eess.AS cs.AI cs.SD

FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demonstrations

Haolong Zheng, Siyin Wang, Xulin Fan, Zengrui Jin, Mark Hasegawa-Johnson

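AI summary: Proposes FSA-GRPO, an RL-based post-training method that uses a specially designed reward to encourage the model to leverage few-shot demonstrations, strengthening its few-shot adaptation ability and yielding gains in children's speech recognition, speech translation, and audio understanding.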

Abstract

Few-shot prompting provides an effective way to adapt auditory large language models to low-resource tasks such as children's speech recognition. However, most auditory large language models are not explicitly trained to perform inference in this demonstration-conditioned format, limiting the extent to which they can benefit from few-shot prompting. To address this limitation, we introduce Few-Shot Aware GRPO (FSA-GRPO), an RL-based post-training recipe that uses a specially designed reward to encourage the model to leverage few-shot demonstrations, thereby strengthening its few-shot adaptation ability. Notably, training with only high-resource adult ASR data improves the model's general few-shot adaptation ability, yielding gains not only in children's speech recognition but also in speech translation and audio understanding. We further study data selection and auxiliary reward weighting to identify an effective training recipe. Our experiments show that when in-domain data are unavailable or cannot be used for training, FSA-GRPO is more effective than direct tuning on related out-of-domain data.

2606.03878 2026-06-03 stat.ML cs.LG

Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss

Prashant Shekhar, Caroline Howard

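AI summary: To handle the signal loss caused by privacy-preserving reporting systems, proposes a robust causal decision framework that projects the observation-compatible experimental worlds onto the incrementality functional, gives a sharp decision frontier, and returns certified, rejected, or unresolved incrementality decisions.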

Abstract

Advertising platforms use randomized lift tests to measure incrementality, but privacy-preserving reporting systems degrade the observed signal through match-rate loss, linkability loss, attribution-window loss, aggregation-threshold suppression, randomized reporting noise, and segment-heterogeneous signal loss. This paper formulates privacy-constrained advertising measurement as a robust causal decision problem under these signal losses. Given a randomized experiment and an ambiguity set for privacy-induced degradation, the framework projects the observation-compatible fiber of clean/unfiltered experimental worlds onto the incrementality functional and returns certified, rejected, and unresolved decisions. The main result gives a sharp decision frontier. Reports outside the frontier support uniformly valid certification or rejection, whereas reports inside it contain too little information for any method to uniformly distinguish above-threshold incrementality from non-incrementality. Supporting results give finite-sample certification, sample-complexity guarantees, a minimax lower bound showing that signal loss reduces effective information, and a reporting-granularity tradeoff. On 2.0M Criteo Uplift rows and the 64K-row Hillstrom email experiment, clean conversion lift is positive in both datasets, with lifts 0.00112 and 0.00495, respectively. Population certification survives mild degradation in Criteo and severe degradation in Hillstrom, while all considered finite-sample stress settings in both datasets remain unresolved after simultaneous uncertainty and reporting noise are included. Overall, the research contributes a decision-theoretic layer for privacy-aware incrementality measurement whose output is the strongest causal claim justified by degraded ads signals.

2606.03820 2026-06-03 stat.ML cs.LG

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

扩散模型中流蒸馏的定量近似框架

Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou

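AI summary: For flow distillation in diffusion models, proposes a quantitative approximation framework that treats few-step sampling as error propagation under compositions of learned flow maps; theory and experiments show that a stability-balanced non-uniform time grid substantially reduces end-to-end relative MSE.

Details
AI Chinese abstract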

我们为扩散蒸馏开发了一个定量近似框架,将少步采样视为学习流映射组合下的误差传播。聚焦于概率流ODE的轨迹蒸馏,我们表明局部近似误差在低噪声多模态区域可能被强烈放大,其中底层动力学变得刚性。在一个解析可处理的高斯混合Ornstein--Uhlenbeck设定中,我们分离了两个核心困难:近似时间依赖的分数场和控制由概率流ODE的时间积分Jacobian界决定的动力学放大。在近似方面,我们证明了构造性的L^p(p_t)保证,表明ReLU--ReQU网络随时间一致地近似高斯混合分数,其深度和宽度在目标精度上呈多对数缩放,并显式依赖于混合几何。在稳定性方面,我们推导了概率流速度的空间Lipschitz常数的一个显式界L(t),并将其转化为由∫_s^t L(u)du控制的流映射稳定性估计,使得刚性区域中的后期放大可计算。基于这些估计,我们证明深度残差组合有效近似长时程传输,全局误差由稳定性放大因子控制,并识别出一个Lipschitz不匹配区域,其中一步蒸馏在结构上不利。由此产生的理论通过累积稳定性坐标的均匀划分得到一个稳定性平衡的非均匀时间网格。实验支持该预测,并在8个分段下与均匀网格相比将端到端相对MSE降低了高达51.9%。

English abstract

We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein-Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive $L^p(p_t)$ guarantees showing that ReLU-ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound $L(t)$ for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow map stability estimate governed by $\int_s^t L(u)\,du$, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
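The grid construction named in the abstract (uniform partitioning in the cumulative stability coordinate) can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name `stability_balanced_grid` and the example bound `L(t) = 1/(t + 0.05)` are hypothetical and are not taken from the paper, which derives its own explicit bound.

```python
import numpy as np

def stability_balanced_grid(L, t0, t1, n_segments, n_fine=10_001):
    """Place grid points so that each segment carries an equal share of
    the cumulative stability coordinate Lambda(t) = int_{t0}^{t} L(u) du.

    L is assumed to be a positive, vectorized Lipschitz bound (hypothetical
    here); the integral is approximated with the trapezoid rule.
    """
    ts = np.linspace(t0, t1, n_fine)
    vals = L(ts)
    # Cumulative trapezoid integral of L on the fine grid.
    increments = 0.5 * (vals[1:] + vals[:-1]) * np.diff(ts)
    cum = np.concatenate(([0.0], np.cumsum(increments)))
    # Uniform partition in the Lambda coordinate, mapped back to time.
    targets = np.linspace(0.0, cum[-1], n_segments + 1)
    grid = np.interp(targets, cum, ts)
    return grid, cum[-1]

# Hypothetical bound that is large near t = 0 (a stand-in for a stiff regime).
example_L = lambda t: 1.0 / (t + 0.05)
grid, total = stability_balanced_grid(example_L, 0.0, 1.0, n_segments=8)
```

With this example bound, segments are short where `L` is large and long where it is small, which is the qualitative behavior a stability-balanced grid is meant to have.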

2606.03736 2026-06-03 stat.ML cs.LG

Resource-Constrained Adaptive Inference for Sequential Pricing

资源约束下的自适应推断用于顺序定价

Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi

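AI summary: To address resource constraints that make fixed-price inference infeasible, proposes a target-aware pricing controller that certifies feasible target bands and logs continuous local densities, yielding studentized intervals via localized debiasing, and analyzes the regret-information accounting.

Details
AI Chinese abstract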

资源约束的定价控制器可能使得固定价格推断变得不可能:即使每个已实现的动作具有已知的正密度,控制器的资源状态也可能从可行集中移除目标价格邻域。我们通过局部不可识别结果和已实现的信息时钟形式化了这种支持排除失败。然后,我们设计了一种目标感知定价控制器,该控制器认证可行的目标带并记录连续的局部密度。局部去偏产生了学生化区间,其宽度由该时钟控制。由此产生的遗憾-信息核算(直到初始求解误差)表明,廉价的探索可能不足以进行推断:多项式目标质量给出多项式速率,而纯$1/t$目标分支在没有额外局部移动的情况下不会产生收缩的固定目标区间。实验显示了在认证带中的校准以及当资源状态崩溃目标支持时的诊断性弃权。

English abstract

Resource-constrained pricing controllers can make fixed-price inference impossible: the controller's resource state may remove the target price neighborhood from the feasible set, even when every realized action has a known positive density. We formalize this support-exclusion failure through a local non-identification result and a realized information clock. We then design a target-aware pricing controller that certifies feasible target bands and logs continuous local densities. Localized debiasing gives studentized intervals whose width is governed by this clock. The resulting regret--information accounting, stated up to pilot re-solving error, shows that cheap exploration can be insufficient for inference: polynomial target mass gives polynomial rates, while a pure $1/t$ target branch does not yield shrinking fixed-target intervals without additional local movement. Experiments show calibration in certified bands and diagnostic abstention when the resource state collapses target support.

2606.03600 2026-06-03 stat.ML cs.LG

Set-Preserving Calibration from Conformal P-Values to E-Values

从共形p值到e值的集合保持校准

Nabil Alami, Jad Zakharia, Souhaib Ben Taieb

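AI summary: Addressing the limitations of p-to-e conversion in conformal prediction, proposes a set-preserving P2E calibrator that converts efficiently without changing the prediction set, achieving the desired coverage and improved efficiency in cross-conformal prediction and conformal aggregation.

Details
AI Chinese abstract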

标准的共形预测(CP)过程通常用p值表述,但仅依赖p值限制了灵活性,例如在跨模型或数据分割组合依赖证据时。最近的工作探索了共形推断的e值表述,然而CP中p值和e值表述之间的直接联系仍然缺失,特别是在统计效率方面。我们首先指出了CP设置中经典p到e校准器的局限性,表明它们不是集合保持的,可能导致过于保守的预测集。为解决这一问题,我们提出了一种新颖的P2E校准器,它将共形p值转换为e值,而不改变原始共形p值诱导的预测集。我们在理论和实证上证明,我们的校准器相比现有的p到e校准器可以带来显著的效率提升。这种e值表述使得能够原则性地使用e值合并和随机化的最新进展,我们在两个应用中展示了其影响:交叉共形预测(CCP),其变体通常仅提供近似的$1-2\alpha$覆盖率,以及共形聚合(CA)。在这两种情况下,我们基于e值的方法满足所需的$1-\alpha$覆盖率保证,同时相比标准基线提高了效率。更广泛地说,我们的方法扩展了CP的灵活性,并为高效、无分布的量化不确定性开辟了新方向。

English abstract

Standard conformal prediction (CP) procedures are typically formulated in terms of p-values, but reliance on p-values alone limits flexibility, for example, when combining dependent evidence across models or data splits. Recent work has explored e-value formulations for conformal inference, yet a direct connection between p- and e-value formulations in CP has been missing, especially regarding their statistical efficiency. We first identify limitations of classical p-to-e calibrators in the CP setting, showing that they are not set-preserving and can lead to overly conservative prediction sets. To address this, we propose a novel P2E calibrator that converts conformal p-values into e-values without altering the prediction set induced by the original conformal p-value. We establish both theoretically and empirically that our calibrator can yield significant efficiency gains over existing p-to-e calibrators. This e-value formulation enables principled use of recent advances in e-value merging and randomization, and we demonstrate its impact in two applications: cross-conformal prediction (CCP), whose variants typically provide only approximate $1-2\alpha$ coverage, and conformal aggregation (CA). In both cases, our e-value-based methods satisfy the desired $1-\alpha$ coverage guarantee while improving efficiency over standard baselines. More broadly, our approach expands the flexibility of CP and opens new directions for efficient, distribution-free uncertainty quantification.
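The claim that classical p-to-e calibrators are not set-preserving can be seen in a small worked example. The sketch below uses the standard power calibrator $e = \kappa p^{\kappa-1}$ with $0 < \kappa < 1$; it does not implement the paper's P2E calibrator, and the labels and p-values are made up for illustration.

```python
def power_calibrator(p, kappa=0.5):
    """Classical p-to-e calibrator e = kappa * p**(kappa - 1), 0 < kappa < 1."""
    return kappa * p ** (kappa - 1.0)

def p_value_set(p_values, alpha):
    """Conformal prediction set: keep labels whose p-value exceeds alpha."""
    return {label for label, p in p_values.items() if p > alpha}

def e_value_set(p_values, alpha, kappa=0.5):
    """E-value prediction set: keep labels whose e-value is below 1/alpha."""
    return {
        label
        for label, p in p_values.items()
        if power_calibrator(p, kappa) < 1.0 / alpha
    }

# Hypothetical conformal p-values for four candidate labels.
p_values = {"a": 0.5, "b": 0.08, "c": 0.02, "d": 0.001}
alpha = 0.1

set_from_p = p_value_set(p_values, alpha)   # contains only "a"
set_from_e = e_value_set(p_values, alpha)   # contains "a", "b", and "c"
```

For $\kappa = 0.5$ the e-value rule keeps every label with $p > \alpha^2/4$, so at $\alpha = 0.1$ the threshold drops from 0.1 to 0.0025 and the set grows. This is the kind of conservativeness a set-preserving calibrator is designed to avoid.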

2606.03574 2026-06-03 stat.ML cs.LG

Few-Shot Prediction for Pulsar Noise with Long Short-Term Memory Network

基于长短期记忆网络的脉冲星噪声少样本预测

Qingye Tang, Dechao An, Haoran Peng, Yuqi Ouyang

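AI summary: To address the scarcity of pulsar timing data, proposes an LSTM network optimized with model-agnostic meta-learning that adapts quickly to new frequency domains from only a few ground-truth timing residuals, with particle swarm optimization for automatic hyperparameter tuning; achieves accurate predictions on the IPTA dataset using 10% of the data.

Details
AI Chinese abstract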

本文提出了一种新颖的解决方案,用于在有限数据下预测脉冲星计时残差,解决了PTA数据集中毫秒脉冲星自旋频率子组数据稀缺的关键挑战。该方案应用了长短期记忆(LSTM)网络,并通过模型无关元学习算法进行优化,使得仅需少量真实计时残差即可通过微调LSTM网络快速适应新的频域。同时,采用粒子群优化算法进行自动超参数优化,提高了预测精度。我们的解决方案在国际脉冲星计时阵列(IPTA)第二次数据发布上进行了评估,在高频测试频域的三个指标上均展现出鲁棒的泛化能力和准确预测,且仅需这些域中10%的计时残差进行模型微调。此外,我们的轻量级结构仅需16.86 MB CPU内存和18毫秒即可完成单步残差预测。所有这些特性使得我们的解决方案非常适合实际应用,在这些应用中,有效且实时的脉冲星计时残差预测至关重要——尤其是在计算能力、内存或能源有限的资源受限环境中。

English abstract

This work proposes a novel solution for predicting pulsar timing residuals with limited data, addressing the critical challenge of data scarcity across spin-frequency subgroups of millisecond pulsars in PTA datasets. The proposed solution applies a Long Short-Term Memory (LSTM) network optimized with the model-agnostic meta-learning algorithm, enabling rapid adaptation to a new frequency domain by fine-tuning the LSTM network with only a few ground-truth timing residuals. A particle swarm optimization algorithm is also used for automatic hyperparameter optimization, leading to improved prediction accuracy. Our solution, evaluated on the second data release of the International Pulsar Timing Array (IPTA), demonstrates robust generalization with accurate predictions on three metrics across high-frequency test domains, while requiring only 10% of the timing residuals from these domains for model fine-tuning. Furthermore, our lightweight structure requires only 16.86 MB of CPU memory and 18 milliseconds for single-step residual prediction. These characteristics make our solution well suited to real-world applications where effective, real-time prediction of pulsar timing residuals is essential, particularly in resource-constrained environments with limited computational power, memory, or energy availability.

2606.03553 2026-06-03 stat.ML cs.LG math.OC

A Robust Optimization Approach to Sparse Principal Component Analysis

稀疏主成分分析的鲁棒优化方法

David Vävinggren, Francis Bach, André M. H. Teixeira, Dave Zachariah, Antônio H. Ribeiro

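AI summary: Proposes AdvPCA, which uses robust optimization to achieve sparsity by introducing worst-case latent-space perturbations into the reconstruction objective, and gives a closed-form reduction and an iterative algorithm.

Details
AI Chinese abstract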

虽然主成分分析(PCA)是降维的基本工具,但其稠密表示使其不适用于高维数据。现有方法通过显式的$\ell_1$惩罚来促进稀疏性,但由于任务的无监督性质,这些惩罚不易调整。相比之下,我们提出了对抗性PCA(AdvPCA),它利用鲁棒优化,通过优化针对有界、最坏情况潜在空间扰动的重建目标来实现稀疏性。我们表明,该公式允许闭式约简,从而产生一种实用的迭代算法,该算法交替进行稀疏编码器的对抗性线性回归式更新和解码器的正交更新。通过对解进行理论刻画,我们推导出一种数据自适应参数化,使算法能够开箱即用地有效执行。我们通过在合成和真实世界基因组学数据上的数值实验验证了这些主张。

English abstract

While principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these are not obvious to tune due to the unsupervised nature of the task. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, leading to a practical iterative algorithm that alternates between adversarial linear regression-style updates for the sparse encoder and orthogonal updates for the decoder. By theoretically characterizing the solution, we derive a data-adaptive parameterization that allows the algorithm to perform effectively out of the box. We validate these claims through numerical experiments on synthetic and real-world genomics data.

2606.03292 2026-06-03 stat.ML cs.LG

Combining Statistical Features and Deep Encodings for Rehearsal-Based Class-Incremental Time Series Classification

结合统计特征与深度编码的基于排练的类增量时间序列分类

Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca Pilar Díaz-Redondo

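AI summary: Proposes a dual-stream feature extraction pipeline (combining deep temporal embedding features from a pre-trained frozen foundation model with statistical features) for class-incremental continual learning on multivariate time series, achieving competitive average accuracy and low forgetting rates on five benchmark datasets.

Details
AI Chinese abstract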

现实环境中使用的许多系统需要在不遗忘分类模型先前学习内容的情况下添加新类别并整合新信息。这被称为类增量持续学习,而对于多变量时间序列,数据的时间结构进一步增加了复杂性。本文提出了一种基于双流特征提取管道(使用通过预训练冻结基础模型生成的深度时间嵌入特征以及应用统计特征)的多变量时间序列分类类增量持续学习的新方法。在五个基准数据集上的评估表明,所提出的系统在所有数据集上实现了有竞争力的平均准确率,同时在所有实验配置中保持了较低的遗忘率。

English abstract

Many systems used in real-world environments require adding new categories and incorporating new information without forgetting what was previously learnt by the classification model. This is known as class-incremental continual learning, and in the case of multivariate time series, it is further complicated by the temporal structure of the data. In this paper, we present a novel approach to class-incremental continual learning for the classification of multivariate time series data, based on a dual-stream feature extraction pipeline that combines deep temporal embedding features generated by a pre-trained frozen foundation model with statistical features. Evaluated on five benchmark datasets, the proposed system achieves competitive average accuracy across all datasets while maintaining low forgetting rates across all experimental configurations.

2606.03245 2026-06-03 stat.ML cs.LG

Hierarchies of Calibration: Classification meets Regression

校准的层次结构:分类与回归的融合

Johannes Resin, Lu Yang, Tilmann Gneiting

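AI summary: Reviews, extends, and bridges notions of calibration for classification and regression tasks, with emphasis on the hierarchical relations among them, and introduces modal calibration along with the distinction between full, partial, and average calibration.

Details
AI Chinese abstract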

校准概念形式化了概率预测与相应结果之间的兼容性。简而言之,结果应与从预测分布中随机抽取的样本无法区分。本文回顾、扩展并桥接了针对分类和回归任务提出的校准概念。特别强调了各种概念之间的层次关系,因为它们适用于一般实值数据、连续结果、计数数据、名义类别和二元结果。为了突出若干贡献,我们引入了名义结果的模态校准概念,在此背景下区分了全校准、部分校准和平均校准,并证明了双概率积分变换(PIT)校准在逻辑上独立于先前针对离散结果提出的校准概念。此外,我们推广了关于校准概念的现有结果,这些概念以预测分布的性质或泛函(如均值、分位数或事件概率)表示。在整篇论文中,我们通过实例说明这些概念及其层次关系,并提供支持构建指导性示例和反例的算法工具。

English abstract

Concepts of calibration formalize the compatibility between probabilistic predictions and the respective outcomes. In a nutshell, the outcomes ought to be indistinguishable from random draws from the predictive distributions. In this paper, we review, extend, and bridge notions of calibration that have been proposed for classification and regression tasks. Particular emphasis is given to hierarchical relations between the various notions, as they apply to general real-valued data, continuous outcomes, count data, nominal classes, and binary outcomes. To highlight a number of contributions, we introduce the notion of modal calibration for nominal outcomes, we distinguish full, partial, and average calibration in this setting, and we show that double probability integral transform (PIT) calibration is logically independent of previously proposed concepts of calibration for discrete outcomes. Furthermore, we generalize extant results on concepts of calibration that are expressed in terms of properties or functionals of the predictive distributions, such as means, quantiles, or event probabilities. Throughout the paper, we illustrate the concepts and their hierarchical relations in worked examples, and we provide algorithmic tools that support the construction of instructive examples and counterexamples.
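The idea that outcomes should be indistinguishable from draws from the predictive distributions is often checked with the probability integral transform (PIT): for a continuous predictive CDF $F$, the value $F(Y)$ should be uniform on $[0, 1]$ under calibration. The sketch below is a generic illustration with Gaussian forecasts and simulated data; it is not one of the paper's worked examples.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def pit_values(y, mean, sd):
    """PIT of outcomes y under Gaussian forecasts N(mean, sd^2)."""
    return normal_cdf((y - mean) / sd)

rng = np.random.default_rng(0)
mu = rng.normal(size=20_000)        # forecast means
y = mu + rng.normal(size=20_000)    # outcomes drawn from N(mu, 1)

# Forecast N(mu, 1) matches the data-generating process.
pit_calibrated = pit_values(y, mu, 1.0)
# Forecast N(mu, 4) is too wide (overdispersed).
pit_overdispersed = pit_values(y, mu, 2.0)

# A uniform PIT has variance 1/12; an overdispersed forecast piles PIT
# values near 0.5 and so has smaller variance.
var_calibrated = pit_calibrated.var()
var_overdispersed = pit_overdispersed.var()
```

Comparing the PIT variance with 1/12 is only one simple diagnostic; histograms or formal uniformity tests are common alternatives.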

2606.03217 2026-06-03 stat.ML cond-mat.dis-nn cs.LG

An Asymptotic Theory of Chain-of-Thought in In-Context Learning

上下文学习中思维链的渐近理论

Kaito Takanami, Cengiz Pehlevan

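AI summary: Using high-dimensional random matrix theory, derives an exact formula for the generalization error of chain-of-thought in in-context learning for linear regression, revealing phase transitions governed by reasoning depth, pretraining data amount, and context length.

Details
AI Chinese abstract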

思维链推理已成为一种广泛使用的机制,通过在推理时生成中间推理步骤来激发大型语言模型的多步推理。然而,泛化能力随思维链深度的缩放行为仍知之甚少。为了解决这个问题,我们研究了一个理论上可解的线性回归中上下文权重预测的思维链模型,其中测试时推理表示为权重参数估计的迭代细化。利用高维渐近下的随机矩阵理论工具,我们推导了泛化误差作为推理深度、预训练数据量和上下文长度的精确公式。我们的分析揭示了指数与多项式改进、饱和及过度思考之间的尖锐相变,并刻画了最优推理深度如何缩放。我们进一步表明,更深的推理在预训练和上下文信息足够丰富时最为有效,而有限的预训练或上下文会使较长的推理容易产生误差放大或饱和。我们还通过在完全学习的线性注意力和softmax注意力模型上的实验验证了这些预测。我们的结果为测试时思维链深度如何影响泛化提供了一个统一的理论解释。

English abstract

Chain-of-thought (CoT) reasoning has become a widely used mechanism for eliciting multi-step reasoning in large language models by generating intermediate reasoning steps at inference time. Yet the scaling behavior of generalization with CoT depth remains poorly understood. To address this question, we study a theoretically solvable model of CoT for in-context weight prediction in linear regression, where test-time reasoning is represented as an iterative refinement of the weight-parameter estimate. Using tools from random matrix theory under high-dimensional asymptotics, we derive an exact formula for the generalization error as a function of reasoning depth, pretraining data amount, and context length. Our analysis reveals a sharp phase transition separating exponential and polynomial improvement, saturation, and overthinking, and characterizes how the optimal reasoning depth scales. We further show that deeper reasoning is most effective with sufficiently rich pretraining and in-context information, whereas limited pretraining or context makes longer reasoning prone to error amplification or saturation. We also validate these predictions through experiments on fully learned linear attention and softmax attention models. Our results provide a unified theoretical account of how test-time CoT depth affects generalization.
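One way to picture "test-time reasoning as an iterative refinement of the weight-parameter estimate" is gradient descent on the in-context least-squares loss, with the number of steps playing the role of reasoning depth. This is an assumption made for illustration; the abstract does not state the exact update rule of the paper's solvable model, and the function name `iterative_refinement` is hypothetical.

```python
import numpy as np

def iterative_refinement(X, y, depth, step=None):
    """Refine a weight estimate from in-context pairs (X, y).

    Each step is one gradient-descent update on the mean squared error;
    `depth` stands in for the number of reasoning steps (an assumption).
    Returns the list of estimates w_0, ..., w_depth.
    """
    n, d = X.shape
    gram = X.T @ X / n
    if step is None:
        # Step size 1 / lambda_max keeps the iteration non-expansive.
        step = 1.0 / np.linalg.eigvalsh(gram)[-1]
    w = np.zeros(d)
    history = [w.copy()]
    for _ in range(depth):
        residual = y - X @ w
        w = w + step * (X.T @ residual) / n
        history.append(w.copy())
    return history

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true                      # noiseless context for clarity
errors = [np.linalg.norm(w - w_true) for w in iterative_refinement(X, y, 20)]
```

In this noiseless, well-conditioned toy case the error shrinks with every step; the regimes of saturation and overthinking described in the abstract arise in the paper's richer setting with limited pretraining or context.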

2606.03112 2026-06-03 stat.AP cs.LG

Trans GAN-WT: A Feature Extraction and Interactive Learning-Based Anomaly Detection Model for Wind Turbine Time Series Data

Trans GAN-WT: 一种基于特征提取和交互学习的风电机组时间序列数据异常检测模型

Jingzhe Kang

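AI summary: Proposes TransGAN-WT, an anomaly detection model that fuses a Transformer and a generative adversarial network; through reconstruction-error amplification, autoregressive multimodal feature extraction, and interactive learning of temporal features, it reaches an F1 score of 96.10% and a false positive rate of only 0.06% on real wind turbine datasets.

Details
AI Chinese abstract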

随着风电场规模和数量的增加,风电机组的日常运维成本不断上升。为了降低运维成本并在灾难性故障发生前提高风电机组及系统运行数据的可靠性,监测设备运行状态并在早期检测故障至关重要。利用工况数据对风电机组运行状态进行异常评估,实现运行状态异常监测具有重要的实际意义。然而,现有的异常检测方法既无法在充满大量冗余信息的数据中进行有效的关系建模,也无法合理利用有价值的异常数据。为此,本文提出了一种融合Transformer和生成对抗网络的异常检测模型。首先,通过放大重构误差来降低微小偏差异常的漏检率。其次,利用自回归推理提取多模态特征,以增强训练的稳定性和泛化能力。最后,构建时序特征提取模块,促进不同时间尺度特征之间的交互学习,有效减少时间冗余。在真实风电机组数据集上进行的多组实验结果表明,TransGAN-WT在多个风电机组数据集上的平均F1分数达到96.10%,比几种其他最先进的基线方法分别高出5.84%和2.89%。同时,其误报率(FPR)仅为0.06%,并通过Wilcoxon符号秩检验验证了与最先进基线方法相比取得了统计上显著的性能提升,有效保障了风电机组的稳定运行。

English abstract

With the increasing scale and number of wind farms, the daily operation and maintenance costs of wind turbines are rising. To reduce these costs and enhance the reliability of wind turbine and system operation data before catastrophic failures occur, it is crucial to monitor the operating status of the equipment and detect failures at an early stage. Using working-condition data to assess and monitor abnormal operating states of wind turbines is therefore of great practical significance. However, existing anomaly detection methods can neither perform effective relational modeling in data filled with a large amount of redundant information nor make reasonable use of the valuable anomaly data. For this reason, this paper proposes an anomaly detection model that fuses a Transformer and a generative adversarial network. First, it reduces the missed-detection rate of minor-deviation anomalies by amplifying the reconstruction error. Second, it uses autoregressive inference to extract multimodal features to enhance the stability and generalization ability of training. Finally, a temporal feature extraction module is constructed to promote interactive learning between features of different time scales and effectively reduce temporal redundancy. The results of multiple sets of experiments conducted on real wind turbine datasets show that TransGAN-WT achieves an average F1 score of 96.10% across multiple wind turbine datasets, which is 5.84% and 2.89% higher than several other state-of-the-art baseline methods. It also achieves a false positive rate (FPR) of 0.06%, and a Wilcoxon signed-rank test verifies that its performance improvement over the state-of-the-art baseline methods is statistically significant, effectively ensuring the stable operation of wind turbines.

2606.03018 2026-06-03 stat.ME cs.LG math.ST stat.ML stat.TH

A Fast Screening Approach for High-dimensional Outcomes and High-dimensional Predictors

高维结果与高维预测变量的快速筛选方法

Hongju Park, Zhenyao Ye, Shuo Chen

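AI summary: Proposes the Graph Independence Dual Screening (GIDS) framework, which reduces the dimensionality of both response variables and predictors to address the computational burden and interpretability problems of high-dimensional cross-modal analysis.

Details
Comments
38 pages, 2 figures
AI Chinese abstract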

由于超高维度和复杂依赖结构伴随高水平噪声,对多模态高维数据间的交互建模本质上具有挑战性。筛选方法能有效降低维度,但大多数现有方法仅缩减预测变量空间而保留所有结果变量。在交叉模态分析中,不同结果变量通常选择不同的预测变量子集,因此并集仍然很大且响应维度不变,限制了筛选的实际效益。这导致沉重的计算负担和较差的可解释性。为解决这些局限,我们提出一个新的筛选框架——图独立双筛选(GIDS),它同时降低响应变量和预测变量的维度。我们设计了计算高效的算法,促进后续选择过程,提高准确性和可扩展性,并建立了支持性的理论结果。广泛的模拟研究表明,GIDS优于仅筛选预测变量的现有方法。为展示其实用性,我们将GIDS应用于阿尔茨海默病神经影像学倡议(ADNI)数据集,分析全基因组865,353个DNA甲基化与49,386个转录组变量之间的交互。GIDS将特征空间缩减至约9,000个CpG位点和2,000个转录本,揭示了块状交互结构:具有强关联的CpG位点簇和基因转录本簇。这些发现不仅提高了计算可处理性,还产生了可解释的生物学见解,突显了阿尔茨海默病背后的协调调控机制。

English abstract

Modeling interactions among multimodal, high-dimensional data is intrinsically challenging due to ultra-high dimensionality and complex dependence structure with high-level noise. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union remains large and the response dimension is unchanged, limiting the practical benefit of screening. This gives rise to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation variables and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.
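To make the idea of screening both sides concrete, the sketch below keeps predictors and outcomes that take part in at least one strong marginal correlation. This is a deliberately simplified stand-in based on plain marginal correlations, not the GIDS algorithm itself (whose graph-based construction is not described in the abstract); the function name and threshold are hypothetical.

```python
import numpy as np

def dual_marginal_screen(X, Y, threshold):
    """Keep predictors (columns of X) and outcomes (columns of Y) involved in
    at least one absolute marginal correlation above `threshold`.

    Simplified illustration only; not the GIDS procedure.
    """
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    corr = Xs.T @ Ys / n                     # (p, q) cross-correlation matrix
    strong = np.abs(corr) > threshold
    keep_x = np.flatnonzero(strong.any(axis=1))
    keep_y = np.flatnonzero(strong.any(axis=0))
    return keep_x, keep_y

# Synthetic data with one planted block: 4 predictors and 3 outcomes
# share a latent factor z; all other variables are pure noise.
rng = np.random.default_rng(0)
n, p, q = 500, 50, 40
z = rng.normal(size=(n, 1))
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :4] = z + 0.5 * rng.normal(size=(n, 4))
Y[:, :3] = z + 0.5 * rng.normal(size=(n, 3))

keep_x, keep_y = dual_marginal_screen(X, Y, threshold=0.5)
```

Screening outcomes as well as predictors shrinks both dimensions at once, which is the practical point the abstract makes about predictor-only screening.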

2606.02909 2026-06-03 stat.ML cs.LG

Scalable Derivative Gaussian Processes via Exact Gradient Reduction

可扩展的导数高斯过程通过精确梯度约简

Hyunseok Seung, Matthias Katzfuss

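AI summary: Proposes TERA, which uses exact gradient reduction to cut the cost of derivative Gaussian processes from $\mathcal{O}(n^3 d^3)$ to $\mathcal{O}(dm^2 + m^6)$ per target, enabling scalable inference in high-dimensional spaces.

Details
AI Chinese abstract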

梯度观测可以显著改善高斯过程(GP)代理,特别是在函数评估昂贵的高维设置中。然而,对n个函数值和n个完整梯度(d维)进行精确推理的计算复杂度与联合状态大小呈三次方关系,导致难以处理的O(n^3 d^3)计算瓶颈。我们提出TERA,一种基于目标特定精确梯度约简的高度可扩展导数GP方法。我们证明,对于平稳核,与连接目标和条件点的方向正交的梯度分量在条件上独立于目标函数值;因此,一旦指定了大小为m的条件集,精确条件密度完全由至多m^2个方向导数刻画。通过将这些约简的、无维度的条件作为Vecchia近似中的局部因子,TERA有效地将n和d从稠密矩阵求逆中解耦。这将每个目标的评估成本降低到O(d m^2 + m^6)时间和O(d m^2 + m^4)内存,同时保持底层导数GP模型在数学上不变。实验评估表明,TERA实现了最先进的预测精度,同时比标准导数GP快数个数量级。关键的是,计算时间和峰值GPU内存相对于d基本保持平稳,从而在高维空间中实现高度可扩展的推理。

English abstract

Gradient observations can substantially improve Gaussian process (GP) surrogates, particularly in high-dimensional settings where function evaluations are expensive. However, exact inference with $n$ function values and $n$ full gradients in $d$ dimensions scales cubically in the joint state size, imposing an intractable $\mathcal{O}(n^3 d^3)$ computational bottleneck. We introduce TERA, a highly scalable derivative GP method based on target-specific exact gradient reduction. We prove that for stationary kernels, the gradient components orthogonal to the directions connecting the target and conditioning points are conditionally independent of the target function value; consequently, the exact conditional density is fully characterized by at most $m^2$ directional derivatives once a conditioning set of size $m$ is specified. By using these reduced, dimension-free conditionals as local factors in a Vecchia approximation, TERA effectively decouples $n$ and $d$ from the dense matrix inversion. This reduces the per-target evaluation cost to $\mathcal{O}(dm^2 + m^6)$ time and $\mathcal{O}(dm^2 + m^4)$ memory, leaving the underlying derivative GP model mathematically unchanged. Empirical evaluations demonstrate that TERA achieves state-of-the-art predictive accuracy while operating orders of magnitude faster than standard derivative GPs. Crucially, both computation time and peak GPU memory remain essentially flat with respect to $d$, enabling highly scalable inference in high-dimensional spaces.
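The $\mathcal{O}(n^3 d^3)$ bottleneck comes from the size of the joint covariance over function values and gradients, which is $n(d+1) \times n(d+1)$. The sketch below builds that matrix for a squared-exponential kernel, a standard construction for derivative GPs; it illustrates the baseline cost only and is not an implementation of TERA. The function name is hypothetical.

```python
import numpy as np

def derivative_gp_covariance(X, lengthscale=1.0):
    """Joint covariance of [f(x_1..n), grad f(x_1), ..., grad f(x_n)] under
    the kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2)).

    Returns a dense matrix of size n(d+1) x n(d+1).
    """
    n, d = X.shape
    l2 = lengthscale ** 2
    R = X[:, None, :] - X[None, :, :]                  # R[a, b] = x_a - x_b
    K = np.exp(-0.5 * np.sum(R ** 2, axis=-1) / l2)    # cov(f_a, f_b)
    # cov(f(x_a), d f(x_b) / d x_j) = k * (x_a - x_b)_j / l^2
    K_fg = (K[:, :, None] * R / l2).reshape(n, n * d)
    # cov(d_i f(x_a), d_j f(x_b)) = k * (delta_ij / l^2 - r_i r_j / l^4)
    outer = R[:, :, :, None] * R[:, :, None, :]
    K_gg = K[:, :, None, None] * (np.eye(d) / l2 - outer / l2 ** 2)
    K_gg = K_gg.transpose(0, 2, 1, 3).reshape(n * d, n * d)
    return np.block([[K, K_fg], [K_fg.T, K_gg]])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K_joint = derivative_gp_covariance(X)   # 20 x 20 for n = 5, d = 3
```

A dense Cholesky factorization of this matrix costs on the order of $(n(d+1))^3$ operations, which is why the matrix becomes unmanageable as $d$ grows even for moderate $n$.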

2606.02740 2026-06-03 stat.ML cs.LG

ScoreStop: Gradient-based early stopping using functional score tests

ScoreStop: 基于梯度的早期停止方法使用函数得分检验

Oliver J. Hines, Christian L. Hines

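AI summary: Proposes ScoreStop, which uses a functional score test at each iteration to test whether the current predictor is the population risk minimizer, giving a gradient-based early-stopping rule for gradient boosted decision trees that avoids overfitting.

Details
Comments
Presented at the International Conference on Machine Learning 2026 Workshop on Hypothesis Testing
AI Chinese abstract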

梯度提升决策树需要停止规则以避免过拟合。标准规则监控验证损失,如果损失在固定的耐心期内没有改善则停止。然而,耐心参数没有可解释的尺度,验证损失可能带有噪声或由用户指定的梯度隐式定义。我们提出ScoreStop,一种基于梯度的早期停止规则,将每次迭代的停止决策视为检验当前预测器是否为总体风险最小化器的原假设。我们使用在验证数据上计算的函数得分检验,其统计量在更新方向上具有尺度不变性,并且在原假设下具有已知的渐近分布。由于我们的检验使用梯度而非损失值,相同的构造适用于隐式损失(如LambdaRank)和通过影响函数的数据依赖损失(如Cox回归)。在合成实验和真实数据基准测试中,我们展示了ScoreStop与基于损失的方法相比具有竞争力。

English abstract

Gradient boosted decision trees require a stopping rule to avoid overfitting. The standard rule monitors a validation loss and stops if the loss fails to improve for a fixed patience period. However, the patience parameter has no interpretable scale and validation losses can be noisy or implicitly defined by a user-specified gradient. We propose ScoreStop, a gradient-based early-stopping rule that casts the stopping decision at each iteration as a test of the null hypothesis that the current predictor is the population risk minimizer. We use a functional score test, computed on validation data, with a statistic that is scale-invariant in the update direction, with a known asymptotic distribution under the null. Because our test uses gradients rather than loss values, the same construction applies to implicit losses such as LambdaRank, and data-dependent losses such as Cox regression via influence functions. In synthetic experiments and real-data benchmarks, we show that ScoreStop is competitive with loss-based methods.
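For reference, the standard patience-based rule that the abstract contrasts with ScoreStop can be written in a few lines. This sketch covers only that loss-based baseline; the score test itself is not reproduced here because the abstract does not give its statistic. The function name and example losses are made up for illustration.

```python
def patience_stop(val_losses, patience):
    """Standard loss-based early stopping.

    Stops at the first iteration where the validation loss has failed to
    improve for `patience` consecutive iterations. Returns a pair
    (best_iteration, stop_iteration).
    """
    best_loss = float("inf")
    best_iter = -1
    for it, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_iter = loss, it
        elif it - best_iter >= patience:
            return best_iter, it
    return best_iter, len(val_losses) - 1

# Noisy validation curve: a late improvement at iteration 6 is missed
# when patience is 3, but found when patience is 10.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
short = patience_stop(losses, patience=3)    # (2, 5)
long = patience_stop(losses, patience=10)    # (6, 6)
```

The example shows why the patience parameter is hard to choose: its effect depends on the noise level of the validation loss rather than on any interpretable scale.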

2606.02664 2026-06-03 stat.ML cs.LG

State-Coupled Volatility in Latent Dynamical Systems: Recovery Under Partial Observation

潜变量动力系统中的状态耦合波动性:部分观测下的恢复

Imani Beckett

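AI summary: Proposes a state-coupled stochastic volatility framework that uses a particle expectation-maximization algorithm to estimate, under partial observation, how latent process variance depends on displacement from equilibrium, and validates recovery and detection performance in simulation.

Details
Comments
40 pages, 16 figures
AI Chinese abstract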

潜状态空间模型广泛用于研究部分观测的动力系统,但大多数公式假设过程变异性与潜状态位置无关。然而,在许多生物、行为和生理系统中,变异性可能系统地依赖于潜在动力状态,产生恒定方差模型无法捕捉的结构化随机性。我们引入了一个状态耦合随机波动框架,其中潜过程方差取决于与潜平衡点的位移。为了在部分观测下估计这种关系,我们开发了一种粒子期望最大化程序,结合了引导粒子滤波和反向轨迹平滑。该模型包含一个耦合参数 $\gamma$,用于量化潜状态位置与过程变异性之间的关联强度。一个大规模仿真基准评估了在不同耦合强度、观测噪声水平、轨迹长度和持续性机制下的恢复和检测性能。与基于观测状态的异方差代理相比,所提出的框架一致地减少了恢复偏差,在强耦合下改进最大。恢复性能随着潜持续性的增加而提高,而检测性能在广泛条件下保持竞争力,并随着观测噪声的增加而变得更加有利。综合来看,结果表明当明确建模潜状态结构时,可以在部分观测下识别和估计状态耦合波动性。该框架为研究状态依赖变异性以及评估结构化随机性是否提供超出平均状态轨迹所包含的系统动力学信息提供了实用的方法论基础。

English abstract

Latent state-space models are widely used to study partially observed dynamical systems, yet most formulations assume that process variability is independent of latent-state position. In many biological, behavioral, and physiological systems, however, variability may depend systematically on the underlying dynamical state, producing structured stochasticity that is not captured by constant-variance models. We introduce a state-coupled stochastic volatility framework in which latent process variance depends on displacement from a latent equilibrium. To estimate this relationship under partial observation, we develop a particle expectation-maximization procedure combining bootstrap particle filtering and backward trajectory smoothing. The model includes a coupling parameter, $\gamma$, that quantifies the strength of association between latent-state position and process variability. A large-scale simulation benchmark evaluated recovery and detection performance across varying coupling strengths, observation noise levels, trajectory lengths, and persistence regimes. The proposed framework consistently reduced recovery bias relative to an observed-state heteroskedastic proxy, with the largest improvements occurring under strong coupling. Recovery performance improved with increasing latent persistence, while detection performance remained competitive across a broad range of conditions and became increasingly advantageous as observation noise increased. Taken together, the results demonstrate that state-coupled volatility can be identified and estimated under partial observation when latent-state structure is explicitly modeled. The framework provides a practical methodological foundation for studying state-dependent variability and evaluating whether structured stochasticity contributes information about system dynamics beyond that contained in mean-state trajectories alone.
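The bootstrap particle filtering step mentioned in the abstract can be sketched for a toy state-coupled model. The model form below is an assumption chosen for illustration: an AR(1) latent state around an equilibrium `mu` whose process variance is `sigma0**2 * (1 + gamma * (x - mu)**2)`, observed with Gaussian noise. The paper's exact parameterization, its EM updates, and its backward smoother are not reproduced.

```python
import numpy as np

def simulate(T, phi, mu, sigma0, gamma, obs_sd, rng):
    """Simulate the assumed state-coupled volatility model."""
    x = np.empty(T)
    y = np.empty(T)
    prev = mu
    for t in range(T):
        # Process noise grows with displacement from the equilibrium mu.
        sd = sigma0 * np.sqrt(1.0 + gamma * (prev - mu) ** 2)
        x[t] = mu + phi * (prev - mu) + sd * rng.standard_normal()
        y[t] = x[t] + obs_sd * rng.standard_normal()
        prev = x[t]
    return x, y

def bootstrap_filter(y, phi, mu, sigma0, gamma, obs_sd, n_particles, rng):
    """Bootstrap particle filter: propagate, weight by likelihood, resample.

    Returns filtered means and the log-likelihood estimate.
    """
    particles = np.full(n_particles, mu, dtype=float)
    means = np.empty(len(y))
    loglik = 0.0
    for t, obs in enumerate(y):
        sd = sigma0 * np.sqrt(1.0 + gamma * (particles - mu) ** 2)
        particles = mu + phi * (particles - mu) + sd * rng.standard_normal(n_particles)
        logw = -0.5 * ((obs - particles) / obs_sd) ** 2 - np.log(obs_sd * np.sqrt(2.0 * np.pi))
        shift = logw.max()                   # stabilize the exponentials
        w = np.exp(logw - shift)
        loglik += shift + np.log(w.mean())
        w /= w.sum()
        means[t] = np.sum(w * particles)
        particles = rng.choice(particles, size=n_particles, p=w)
    return means, loglik

rng = np.random.default_rng(1)
x, y = simulate(200, phi=0.8, mu=0.0, sigma0=0.5, gamma=0.5, obs_sd=1.0, rng=rng)
means, loglik = bootstrap_filter(y, 0.8, 0.0, 0.5, 0.5, 1.0, 2000, rng)
```

In a particle EM procedure, a filter like this would supply the E-step quantities, and the log-likelihood estimate could be compared across values of `gamma`.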

2606.02645 2026-06-03 stat.ML cs.AI cs.LG

Target Updates May Stabilize Linear Q-Learning: Periodic and Soft Dynamics

目标更新可能稳定线性Q学习:周期性和软动态

Donghwan Lee

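AI summary: Using exact switched linear system dynamics and joint spectral radius analysis, proves that under specific spectral and step-size conditions, periodic hard target updates and soft target updates guarantee that linear Q-learning converges to the exact projected Q-Bellman solution.

Details
AI Chinese abstract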

Q学习中的周期性目标更新和actor-critic方法中的软目标更新是经验上公认的稳定机制,但其精确的理论解释仍不完整。本文针对线性函数逼近的Q学习(线性Q学习),利用Bellman最大值引起的精确切换线性系统(SLS)动力学以及由此产生的切换矩阵族的联合谱半径(JSR),对这些机制进行了严格而精确的分析。尽管线性Q学习通常可能无法收敛,但我们证明,在明确的谱和步长条件下,周期性硬目标更新和软目标更新可以保证收敛到精确的投影Q-Bellman解。主要分析针对确定性线性Q学习进行,其中目标更新机制最为透明。一旦为均值递归建立了相应的JSR证书,随机强化学习设置可以通过将确定性模式替换为采样随机模式并添加相应的随机噪声分析来处理。

English abstract

Periodic target updates in Q-learning and soft target updates in actor-critic methods are empirically well-established stabilization mechanisms, but their precise theoretical explanation is still incomplete. This paper gives a rigorous and exact analysis of these mechanisms for Q-learning with linear function approximation (linear Q-learning) using the exact switched linear system (SLS) dynamics induced by the Bellman maximum and the joint spectral radius (JSR) of the resulting switching matrix families. Although linear Q-learning can fail to converge in general, we prove that, under explicit spectral and step-size conditions, periodic hard target updates and soft target updates can guarantee convergence to the exact projected Q-Bellman solution. The main analysis is carried out for deterministic linear Q-learning, where the target-update mechanism is most transparent. Once the corresponding JSR certificate is established for the mean recursion, the stochastic reinforcement-learning setting can be treated by replacing deterministic modes with sampled stochastic modes and adding the corresponding stochastic-noise analysis.
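The two target-update mechanisms analyzed in the paper have simple, widely used forms: a periodic hard update copies the online parameters into the target every fixed number of steps, and a soft update moves the target a fraction `tau` toward the online parameters at every step. The sketch below shows only these update rules on plain parameter vectors; the function names, `tau`, and `period` are illustrative, and the convergence conditions from the paper are not encoded.

```python
import numpy as np

def soft_update(target, online, tau):
    """Soft (Polyak) update: move the target a fraction tau toward online."""
    return (1.0 - tau) * target + tau * online

def periodic_update(target, online, step, period):
    """Periodic hard update: copy online into target every `period` steps."""
    return online.copy() if step % period == 0 else target

online = np.array([1.0, 2.0])

# Soft updates against fixed online parameters: after k steps the target
# equals (1 - (1 - tau)**k) * online.
soft_target = np.zeros(2)
for _ in range(10):
    soft_target = soft_update(soft_target, online, tau=0.1)

# Periodic updates: the target stays frozen until step 5.
hard_target = np.zeros(2)
hard_history = []
for step in range(1, 6):
    hard_target = periodic_update(hard_target, online, step, period=5)
    hard_history.append(hard_target.copy())
```

Both rules slow down how quickly the bootstrapping target follows the online estimate, which is the stabilizing effect the paper studies.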