arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.03673 2026-06-03 eess.SP

Chasing Lightning: Detecting, Characterizing, and Identifying a Powerful Space-Based GNSS Interference Source

Zachary L. Clements, Argyris Kriezis, Todd E. Humphreys

AI summary Using data from terrestrial GNSS reference station networks collected between 2019 and 2026, this paper develops a received-power-based detection framework, details the spatial, temporal, and spectral patterns of the interference events, and blends received-power and time-difference-of-arrival measurements to identify the interference source as a constellation of Russian early warning satellites in Molniya orbits.

Comments Submitted for review to the Institute of Navigation journal NAVIGATION

Abstract

This paper analyzes and identifies a space-based Global Navigation Satellite System (GNSS) interference source that has caused scores of powerful transient wide-area interference events over continental Europe, Greenland, and Canada since 2019. While terrestrial or near-terrestrial sources are primarily responsible for the recent uptick in GNSS interference worldwide, space-based interferers are of special concern given their potential for vast geographic reach and their portent of a qualitative escalation in GNSS interference. Based on data collected between 2019 and 2026 from a network of terrestrial GNSS reference stations, this paper (1) develops a received-power-based detection framework; (2) details the spatial, temporal, and spectral patterns of wide-area interference events caused by the source; (3) presents and analyzes identification techniques that blend received-power and time-difference-of-arrival measurements; and (4) applies these techniques to confidently identify the GNSS interference source as a constellation of Russian early warning satellites in Molniya ("lightning") orbits.

2606.03531 2026-06-03 eess.SP

Voxel-CKM: Voxelized Radio Frequency Radiance Fields for Fast and Few-Shot CKM Construction

Hanlei Li, Guangyi Zhang, Kequan Zhou, Yunlong Cai, Guanding Yu

AI summary Proposes Voxel-CKM, a framework that uses voxelized radio frequency radiance fields and vector-matrix decomposition for fast, few-shot channel knowledge map construction.

Abstract

Channel knowledge maps (CKMs) are designed to predict channel state information (CSI) from user locations, thereby enabling low-overhead CSI acquisition. However, existing CKM construction methods often require hours-to-days of training time and dense measurements, resulting in substantial deployment cost. In this paper, we propose Voxel-CKM, a novel voxelized radio frequency (RF) radiance field framework for fast and few-shot CKM construction. The core idea is to replace implicit neural representations with explicit voxel grids to efficiently capture the spatial variation of wireless channels. Building upon this, we further introduce a compact vector-matrix (VM) decomposition to parameterize these voxel grids using a small set of matrices and vectors, which significantly accelerates convergence and facilitates fast CKM construction. To enable few-shot learning, we incorporate a transmitter prior as an inductive bias to guide the learning process under sparse measurements. Additionally, a total-variation (TV) regularization loss is proposed to mitigate overfitting and stabilize optimization. Experiments show that Voxel-CKM substantially accelerates training convergence and improves performance in the few-shot regime.
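
The total-variation regularizer mentioned above can be illustrated on a small grid. The sketch below is a minimal, assumed form (anisotropic TV on a 2D grid of scalars); the abstract does not state which TV variant or grid dimensionality Voxel-CKM actually uses, and the function name `total_variation` is hypothetical.

```python
def total_variation(grid):
    """Anisotropic total variation of a 2D grid given as a list of equal-length rows.

    Sums absolute differences between vertically and horizontally adjacent cells.
    This is an illustrative form, not necessarily the loss used in Voxel-CKM.
    """
    rows, cols = len(grid), len(grid[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:  # difference with the cell below
                tv += abs(grid[i + 1][j] - grid[i][j])
            if j + 1 < cols:  # difference with the cell to the right
                tv += abs(grid[i][j + 1] - grid[i][j])
    return tv


smooth = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
spiky = [[1.0, 3.0, 1.0], [1.0, 1.0, 1.0]]
print(total_variation(smooth), total_variation(spiky))
```

A flat grid incurs no penalty while an isolated spike is penalized once per neighboring cell, which is why adding such a term to the training loss discourages fitting isolated noisy measurements.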

2606.03468 2026-06-03 eess.IV cs.MM cs.NI

When BBR Meets Live Streaming

Xu Yan, Tong Li, Bo Wu, Cheng Luo, Jiuxiang Zhu, Laizhong Cui

AI summary To address the problems caused by BBR's inaccurate bandwidth estimation in live streaming, proposes BBR-Copilot, an auxiliary component that proactively sends extra data to generate accurate bandwidth samples and improves BBR's performance in live streaming.

Abstract

Recently, industrial pioneers such as Amazon, Tencent, ByteDance, and Huawei have been adopting BBR as the congestion control algorithm for their live-streaming applications, including TikTok Live. However, BBR, originally designed for bulk data transmission, faces multiple challenges in live-streaming scenarios. In this paper, we first explore two key issues that arise from BBR's inaccurate bandwidth estimation in live-streaming scenarios: (i) BBR cannot easily exit its startup phase, resulting in severe self-inflicted packet loss; and (ii) BBR sends data at a lower rate than the available bandwidth during its stable phase. We then propose BBR-Copilot, an auxiliary congestion control component that cooperates with BBR to help it adapt to live-streaming scenarios. BBR-Copilot proactively generates accurate bandwidth measurement samples by smartly creating and sending extra data. We implement a BBR-Copilot prototype on top of QUIC and evaluate it on a testbed. Experimental results show that BBR-Copilot effectively enhances BBR's performance in live-streaming scenarios.
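
To see why bandwidth samples can mislead BBR when the application sends less than the path can carry, consider a toy model. BBR estimates bottleneck bandwidth as a windowed maximum of delivery-rate samples; everything else below (the capacity and encoder rate values, the `delivery_rate` helper, and the way extra data is injected) is a hypothetical illustration and not the BBR-Copilot mechanism itself.

```python
from collections import deque


class WindowedMaxFilter:
    """Keep the maximum of the most recent `window` delivery-rate samples."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)

    def update(self, rate_mbps):
        self.samples.append(rate_mbps)
        return max(self.samples)


PATH_CAPACITY_MBPS = 10.0  # assumed bottleneck capacity
ENCODER_RATE_MBPS = 3.0    # assumed live-stream bitrate


def delivery_rate(offered_mbps):
    """A flow can never be delivered faster than it is offered or than the path allows."""
    return min(offered_mbps, PATH_CAPACITY_MBPS)


bw_filter = WindowedMaxFilter(window=10)

# Application-limited phase: every sample reflects the encoder, not the path.
for _ in range(20):
    estimate_app_limited = bw_filter.update(delivery_rate(ENCODER_RATE_MBPS))

# One round with extra (padding) data offered on top of the video stream.
estimate_with_probe = bw_filter.update(delivery_rate(ENCODER_RATE_MBPS + 9.0))

print(estimate_app_limited, estimate_with_probe)
```

In this toy setting the estimate stays at the encoder rate until a round carries enough data to fill the path, which is the intuition behind generating bandwidth samples proactively.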

2606.03370 2026-06-03 eess.IV

SMAC: Spatial-Modal Joint Modeling and Adaptive Representation Collapse for Multimodal Object Tracking

Meijing Gao, Qitai Sun, Huanyu Sun, Bingxuan Yang, Bingzhou Sun, Xu Chen, Yonghao Yan, Yuxuan Yang

AI summary To address insufficient joint modeling of spatial and modal features and the limited adaptability of fixed fusion strategies in multimodal multi-object tracking under complex illumination, proposes a multimodal tracking framework based on spatial-modal convolution fusion and distillation prompts. It achieves adaptive fusion through decoupled 3D convolution, amplitude-phase decomposition, and a representation collapse network, and attains leading performance on the UniRTL dataset.

Comments 12 pages, 16 figures. Code and pretrained models are available at https://github.com/QitaiSun/SMAC

Abstract

Multimodal multi-object tracking (MOT) under complex illumination remains challenging due to insufficient joint modeling of spatial and modal features and the limited adaptability of fixed fusion strategies. To address these issues, this paper proposes a spatial-modal convolution fusion and distillation-prompt-based multimodal MOT framework. A spatial-modal fusion backbone is first constructed, where a Basic module performs spatial feature extraction and modal interaction via decoupled 3D convolution, while a Mixed module models nonlinear cross-modal correlations through amplitude-phase decomposition. In addition, a representation collapse network is designed for adaptive multimodal fusion. A Distillation Prompt Guidance (DPG) module generates dynamic modal weights under teacher supervision, and a Global Modal Difference Aggregation (GMDA) module preserves discriminative information during multimodal representation collapse. Extensive experiments on the UniRTL dataset demonstrate the effectiveness of the proposed method. The proposed tracker achieves 63.31 HOTA and 79.21 MOTA on the RNT modality, outperforming several state-of-the-art methods while maintaining favorable inference efficiency. The source code and pretrained models are publicly available at https://github.com/QitaiSun/SMAC.

2606.03337 2026-06-03 eess.SP

Node-Oriented Proactive Spectral Modulation: A Unified Fractional Framework for Graph Signal Denoising

Manjun Cui, Zhichao Zhang, Yangfan He

AI summary Proposes a node-oriented fractional filtering (NOFF) framework that, through a low-rank constraint (LRNOFF), unifies localized spatial adaptability with proactive spectral modulation, addressing spectral rigidity and overfitting in graph signal denoising.

Abstract

Graph signal denoising is a fundamental task in graph signal processing. While the node-oriented filtering approach enhances spatial adaptability, it suffers from spectral rigidity due to its reliance on the graph Fourier transform. Conversely, emerging fractional-domain transforms provide crucial spectral flexibility but are fundamentally limited by their globally shared filtering paradigm, failing to accommodate localized topological variations. To bridge this gap, this paper proposes a generalized node-oriented fractional filtering (NOFF) framework that seamlessly integrates localized spatial adaptability with proactive spectral modulation across various fractional transforms. However, straightforwardly assigning independent full-rank filters to all vertices incurs a prohibitive parameter space, leading to severe overfitting on random noise. To mitigate this, we introduce the low-rank NOFF (LRNOFF) architecture. By imposing a strict low-rank constraint, LRNOFF inherently acts as a powerful implicit regularizer, preventing noise memorization and ensuring the extraction of robust spectral bases. Furthermore, we develop an efficient computational implementation termed LRNOFF-Fast, which drastically reduces computational and memory overhead while preserving theoretical optimality. Experiments on real-world datasets demonstrate that the proposed framework achieves state-of-the-art performance.

2606.03013 2026-06-03 eess.SP

Fault-Aware Design for Reconfigurable Holographic Surface-Aided ISAC Systems

Lu Wang, Mohamadreza Delbari, Gui Zhou, Luis F. Abanto-Leon, Matthias Hollick, Vahid Jamali

AI summary To address hardware faults in reconfigurable holographic surface (RHS)-aided integrated sensing and communication (ISAC) systems, proposes a block-coordinate-descent-based optimization method that minimizes the misspecified Cramér-Rao bound (MCRB) subject to signal-to-interference-and-noise ratio (SINR) and other constraints, yielding a fault-aware RHS design with an average performance gain of 13.7%.

Comments accepted by IEEE PIMRC 2026

Abstract

Reconfigurable holographic surface (RHS)-aided integrated sensing and communication (ISAC) systems hold great promise for achieving both sensing and communication with low hardware costs and high energy efficiency. However, existing works largely overlook practical hardware impairments in RHSs, particularly faulty RHS elements with uncontrollable amplitudes, which degrade system performance if left unaddressed. This work aims to fill the gap by i) quantifying the impact of faulty RHS elements on ISAC performance and ii) optimizing the functional RHS elements to preserve the ISAC performance. Specifically, we derive the misspecified Cramer-Rao bound (MCRB) for sensing and the signal-to-interference-and-noise ratio (SINR) for communication to measure the performance loss caused by faulty elements. We then formulate an optimization problem that minimizes MCRB, subject to constraints on SINR, transmit power budget, and RHS amplitude. The high non-convexity of the formulated problem poses a significant challenge, which we address by reformulating and proposing a block coordinate descent-based solution incorporating majorization-minimization and successive convex approximation techniques. Simulation results verify that the proposed approach achieves an average 13.7% performance gain compared to the fault-unaware benchmark.

2606.02961 2026-06-03 eess.IV

AtlasGS: Brain MRI Spatial Resolution Harmonization With Shared Gaussian Geometry

Yifan Gao, Peiran Xu, Yimeng He, Haoran Li, Ziyang Long, Yufeng Wang, Ju Dong Yang, Debiao Li

AI summary Proposes a Gaussian-splatting-based shared geometry framework that uses two-stage training to achieve isotropic super-resolution reconstruction of multimodal MRI, reaching state-of-the-art performance on multiple datasets.

Abstract

The Gaussian Splatting (GS)-based shared geometry framework adopts a two-stage training strategy, in which an explicit, subject-specific Gaussian scaffold encoding anatomical geometry is first learned from the isotropic structural scan and then reused to fit appearance for target modalities acquired with sparse slices. Experiments on the UK Biobank, GBM, and ABCD datasets for through-plane super-resolution across multiple modalities (T2-weighted, FLAIR, DWI, ASL), degradation factors (×3, ×5, ×7), and pathological abnormalities (glioblastoma) demonstrate state-of-the-art reconstruction fidelity. The shared Gaussian geometry enables arbitrary-view generation for target modalities with strong structural consistency and further shows potential for self-supervised in-plane super-resolution. This work establishes explicit geometry-guided representations as a novel, flexible, and interpretable pathway toward retrospective multi-contrast MRI harmonization and reliable clinical reference construction. Source code is available at: https://github.com/yfgao76/AtlasGS

2606.02891 2026-06-03 eess.SP

Global Unknown Estimation: A Statistical Framework for Wireless Distributed Learning

Yicheng Qu, Ali Bereyhi, Ben Liang

AI summary To address the limited effectiveness of over-the-air computation (AirComp) aggregation in wireless distributed learning, proposes the global unknown estimation (GUE) statistical framework, which treats model aggregation as an inference task and reduces the required power by approximately 15 dB relative to AirComp in the low-SNR regime.

Abstract

Over-the-air computation (AirComp) is widely used for model aggregation in wireless distributed learning. Although it enhances communication efficiency, we believe AirComp aggregation has limited effectiveness because its target problem differs from that of distributed learning. In this paper, we develop a rigorous formulation for optimal model aggregation in wireless distributed learning. Using this formulation, we show that AirComp aggregation generally assumes a mismatched statistical model for local parameters. We then propose a statistical framework for model aggregation, called global unknown estimation (GUE). It captures the statistical relation between the local and global model parameters, allowing model aggregation to be interpreted as an inference task. We validate the efficiency of GUE through numerical experiments. Our results show that, in the low-SNR regime, GUE can reduce the required power for model aggregation by approximately 15 dB compared to AirComp aggregation. Remarkably, this gain is obtained without additional computational overhead.

2606.02782 2026-06-03 eess.SP

Short-Acquisition Contrast-Free Super-Resolution Microvascular Imaging in Rabbit Kidney

Zhengchang Kou, Yuning Zhao, Mingrui Liu, Rita J. Miller, Michael L. Oelze

AI summary Proposes a contrast-free super-resolution ultrasound microvascular imaging method based on high-frequency ultrafast ultrasound and nonlinear beamforming of blood-flow backscatter signals. Using only 125 ms of data per image, it achieves imaging at 8 frames/s with a spatial resolution of 22.2 µm, a three-fold improvement over conventional power Doppler.

Abstract

Ultrasound localization microscopy (ULM) enables micrometer-scale microvascular imaging by localizing and tracking intravascular microbubbles, but its dependence on exogenous contrast agents and long acquisition times limits clinical translation. This study presents a high-frame-rate contrast-free super-resolution ultrasound microvascular imaging method based on high-frequency ultrafast ultrasound and nonlinear beamforming of backscatter signals from native blood flow. Using only 125 milliseconds of in vivo ultrafast data per image, the proposed method achieved an imaging frame rate of 8 frames/s in a rabbit kidney model. The reconstructed microvascular images resolved vessels with a global spatial resolution of 22.2 µm over a field of view of 23.04 × 15.18 mm², where the wavelength of ultrasound was 67.5 µm. This corresponds to a three-fold improvement over conventional power Doppler imaging under the same acquisition duration. Compared with conventional flow imaging, the proposed method provided improved microvascular contrast and finer vessel delineation without microbubble injection. These results demonstrate a practical pathway toward high-frame-rate, contrast-free super-resolution ultrasound imaging for microvascular assessment.
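
The reported numbers can be related by simple arithmetic. The frame-rate line below assumes back-to-back acquisitions with no gaps between images, which the abstract does not state explicitly; the second ratio only compares the reported resolution with the reported wavelength and is separate from the three-fold comparison against power Doppler.

```latex
\[
f_{\text{frame}} = \frac{1}{T_{\text{acq}}} = \frac{1}{0.125\ \text{s}} = 8\ \text{frames/s},
\qquad
\frac{\Delta x}{\lambda} = \frac{22.2\ \mu\text{m}}{67.5\ \mu\text{m}} \approx 0.33 .
\]
```

Under these assumptions the reported resolution is roughly one third of the ultrasound wavelength.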

2606.02771 2026-06-03 eess.SP

A data-driven filter bank framework for IMU-based heave motion estimation

Aybars Tokta

AI summary Proposes a data-driven framework that optimizes a bank of IIR filters, each associated with a specific frequency range, on a synthetic dataset to deliver accurate and robust IMU-based heave motion estimation.

Comments 6 pages, 10 figures

Abstract

In this study, we address the IMU-based heave motion estimation problem for inertial navigation systems. Unlike existing approaches, we propose a data-driven framework in which a bank of IIR filters, each associated with a specific frequency range, is optimized using a synthetically generated dataset of realistic heave-acceleration tuples. The synthetic heave signal generation pipeline starts by synthesizing random wave signals from established wave energy spectra and then processing them through heave response amplitude operators reported in the literature. The corresponding vertical acceleration measurements are obtained by double-differentiating the heave signals and corrupting them with realistic low- and high-frequency disturbances observed in real IMU recordings. A Fourier-transform-based method is used to estimate the mean peak period and select the appropriate filter. Simulation results from both offline and real-time tests demonstrate that the proposed method is robust to varying sea regimes and provides accurate heave estimation, with a maximum RMSE not exceeding the larger of 5 cm or 5% of the significant heave height.
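
The step that turns a synthetic heave signal into a vertical acceleration can be sketched with a finite-difference second derivative. This is a minimal illustration on a single sinusoid; the amplitude, period, and sampling interval are assumed values, the wave-spectrum synthesis and response amplitude operators are omitted, and no IMU disturbances are added.

```python
import math


def second_derivative(signal, dt):
    """Central-difference second derivative; returns interior samples only."""
    return [
        (signal[k + 1] - 2.0 * signal[k] + signal[k - 1]) / dt**2
        for k in range(1, len(signal) - 1)
    ]


AMPLITUDE_M = 1.0  # assumed heave amplitude
PERIOD_S = 8.0     # assumed wave period
DT_S = 0.01        # assumed 100 Hz sampling

omega = 2.0 * math.pi / PERIOD_S
times = [k * DT_S for k in range(2001)]
heave = [AMPLITUDE_M * math.sin(omega * t) for t in times]

# Numerical acceleration versus the analytic result a(t) = -omega^2 * h(t).
accel = second_derivative(heave, DT_S)
analytic = [-omega**2 * h for h in heave[1:-1]]
max_error = max(abs(a - b) for a, b in zip(accel, analytic))
print(max_error)
```

Because acceleration scales with the square of the angular frequency, long-period heave produces weak acceleration signals, which is one reason frequency-specific filters are attractive for the inverse problem.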

2606.03961 2026-06-03 stat.ME stat.AP

A Neural Estimation Framework for Aggregated Relational Data under Intractable Likelihoods

Rowland G Seymour, Joseph Marsh

AI summary Proposes a simulation-based neural estimation framework that trains a permutation-invariant Bayes estimator to handle the cross-population dependence induced in aggregated relational data by homophily, latent-space clustering, and imperfect recall, and applies it to generative models that are intractable for the Network Scale-Up Method.

Comments 33 pages, 3 figures, 2 tables

Abstract

Aggregated relational data (ARD) consists of survey responses to questions of the form "how many people do you know who X?" and is widely used in survey statistics for indirect inference about populations and social networks. The dominant ARD inference target is hidden-population size estimation via the Network Scale-Up Method (NSUM), but ARD is also used for personal-network-size estimation, mixing-pattern recovery, and inference about latent network structure. Bayesian inference for ARD almost universally assumes that, conditional on a respondent's degree, the counts reported for different subpopulations are independent. There are, however, reasons to question this assumption, as homophily, latent-space clustering, and imperfect recall may all induce cross-population dependence. We develop a simulation-based neural estimation framework for ARD which requires only a simulator, so it can be applied to generative models whose likelihood cannot be written down or efficiently evaluated. The framework trains a permutation-invariant neural Bayes estimator that returns, for each marginal parameter, a posterior median and a 95% credible interval, by minimising a multi-quantile pinball loss with a cumulative-gap construction that rules out quantile crossing by design. We demonstrate the framework on three structurally distinct intractable extensions of NSUM-style ARD inference: a stochastic block model, a latent-space model, and a recall-subset model. We apply the framework to an ARD household survey collected in Rwanda. The framework provides inference on any new survey drawn from the training distribution, and extends the reach of ARD modelling to network-structure and cognitive-process assumptions beyond those currently accessible to likelihood-based inference.
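
One plausible reading of a multi-quantile pinball loss with a cumulative-gap construction is sketched below. The parameterization (lowest quantile plus non-negative softplus gaps) and the function names are assumptions for illustration; the abstract does not specify the exact construction used in the paper.

```python
import math

LEVELS = (0.025, 0.5, 0.975)  # lower bound, median, upper bound of a 95% interval


def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any real number to a non-negative gap."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def quantiles_from_raw(raw_lowest, raw_gap_1, raw_gap_2):
    """Accumulate non-negative gaps so the three quantiles cannot cross."""
    lower = raw_lowest
    median = lower + softplus(raw_gap_1)
    upper = median + softplus(raw_gap_2)
    return lower, median, upper


def pinball_loss(target, prediction, level):
    """Quantile (pinball) loss at the given level."""
    diff = target - prediction
    return max(level * diff, (level - 1.0) * diff)


def multi_quantile_loss(target, raw_outputs):
    """Sum of pinball losses over the three levels for unconstrained network outputs."""
    preds = quantiles_from_raw(*raw_outputs)
    return sum(pinball_loss(target, q, tau) for q, tau in zip(preds, LEVELS))


print(quantiles_from_raw(0.0, -3.0, 5.0))
print(multi_quantile_loss(1.0, (0.0, 0.0, 0.0)))
```

Because the ordering is enforced by the parameterization rather than by a penalty, any unconstrained network output yields a valid interval around the median.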

2606.03880 2026-06-03 stat.ME stat.AP

Principal Components Decomposition of Fraction of Variance Explained in High Dimensional Linear Models with Strong Correlation

Man Luo, Chun Chieh Fan, David Azriel, Armin Schwartzman

AI summary To address the failure of traditional FVE estimators when predictors in high-dimensional linear models are strongly correlated, proposes a framework that decomposes the FVE into a low-dimensional strongly correlated component and a high-dimensional weakly correlated component. It reduces bias by combining principal components decomposition with GWASH or LMM-REML, and is validated on ABCD brain imaging data.

Abstract

The fraction of variance explained (FVE) in a linear model quantifies the extent to which predictors account for outcome variability. In high-dimensional settings, where traditional FVE estimators do not apply, modern FVE estimators such as GWASH or the linear mixed-effects model estimated through restricted maximum likelihood (LMM-REML) struggle with strong correlation among predictors, often found, for example, in brain imaging data. We propose a decomposition framework that partitions the FVE into two components: a low-dimensional component capturing the strong correlation, estimable by low-dimensional methods, and a high-dimensional component with remaining weak correlation, estimable by high-dimensional methods. Simulations demonstrate that decomposing dominant principal components (PCs) and estimating the high-dimensional FVE using GWASH or LMM-REML leads to improved bias reduction compared to directly applying standard approaches such as GWASH and LMM-REML. Our method shows consistent performance asymptotically as both the number of predictors and the number of samples increase. We illustrate the method in an analysis of the Adolescent Brain Cognitive Development (ABCD) brain imaging dataset, capturing nuanced heritability signals in the FVE of cognitive measures predicted by high-resolution brain imaging data.

2606.03828 2026-06-03 stat.ME

Network Time Series Models for Multivariate Volatility Forecasting

Chiara Boetti, Matthew A. Nunes

AI summary Proposes a network-based generalised heterogeneous autoregressive (GNHAR) model that incorporates cross-sectional spillovers through a directed graph inferred from Granger-causality tests or connectedness indices, yielding parsimonious multivariate volatility forecasts.

Abstract

Realized volatility has become a standard tool for measuring latent variation in financial assets, and its forecasting is crucial for a wide range of financial applications. We propose a network-based model for forecasting a vector of realized variance processes through the heterogeneous autoregressive (HAR) approach. The generalised network HAR (GNHAR) model incorporates cross-sectional spillovers through a directed graph inferred from Granger-causality tests or connectedness indices, yielding a parsimonious multivariate time series model specification. In an application to ten equities over tranquil and crisis regimes, the proposed GNHAR model improves upon common HAR model benchmarks under both short- and long-term forecasting. We also compare the network-based specification when the jump-continuous decomposition or node-specific option-implied variances are considered. Finally, unlike overparameterised models, our approach yields a concise set of parameters that track the strengthening or weakening of cross-market dependencies, providing a time-varying quantitative assessment of market stability.
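
The HAR approach regresses next-period realized variance on daily, weekly (5-day), and monthly (22-day) averages of past realized variance. The sketch below builds those regressors and adds one network term; the spillover definition (mean of the in-neighbours' daily values), the coefficient names, and the function names are hypothetical and are not taken from the GNHAR specification.

```python
def har_features(rv, t):
    """Daily, weekly (5-day) and monthly (22-day) averages ending at day t (requires t >= 21)."""
    daily = rv[t]
    weekly = sum(rv[t - 4 : t + 1]) / 5.0
    monthly = sum(rv[t - 21 : t + 1]) / 22.0
    return daily, weekly, monthly


def neighbour_average(rv_by_asset, in_neighbours, asset, t):
    """Hypothetical spillover term: mean daily realized variance over the asset's in-neighbours."""
    nbrs = in_neighbours[asset]
    if not nbrs:
        return 0.0
    return sum(rv_by_asset[n][t] for n in nbrs) / len(nbrs)


def forecast(rv_by_asset, in_neighbours, asset, t, coef):
    """One-step-ahead forecast from HAR regressors plus the assumed network term."""
    d, w, m = har_features(rv_by_asset[asset], t)
    s = neighbour_average(rv_by_asset, in_neighbours, asset, t)
    return coef["const"] + coef["d"] * d + coef["w"] * w + coef["m"] * m + coef["net"] * s
```

With a single shared network coefficient per lag, the parameter count stays small regardless of the number of assets, which is the sense in which such a specification is parsimonious.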

2606.03805 2026-06-03 stat.ME

Regularization in Paired Comparison Models via Pseudo-Games and Phantom Players

Mark E. Glickman

AI summary To address the instability of maximum likelihood estimation in paired comparison models, this paper proposes two data-augmentation regularization methods, adding pseudo-games and introducing a phantom player, which yield finite, shrunken estimates and resolve location nonidentifiability. In the Bradley-Terry model they perform comparably to ridge regularization.

Comments 22 pages, 4 figures, 2 tables

Abstract

Paired comparison models are useful for estimating latent abilities or preferences from binary outcomes, but maximum likelihood estimation can be unstable or fail when the comparison graph is disconnected or nearly separated. Ridge regularization addresses these difficulties by shrinking ability parameters toward a common center, but it can obscure the simple likelihood interpretation that makes Bradley-Terry and Thurstone-Mosteller models attractive to practitioners. This paper describes two data-augmentation perspectives on regularization. The first adds fractional pseudo-games between every pair of competitors. The second adds a fixed-strength phantom player and gives each real competitor a weighted pseudo-win and pseudo-loss against that player. Both approaches yield finite, shrunken estimates; the phantom-player construction also resolves the usual location nonidentifiability without an explicit linear constraint. For the Bradley-Terry model, the two augmentations lead to transparent penalty functions that can be compared directly with ridge penalties. An application to the 2025 Major League Baseball regular season illustrates that tuned pseudo-game and phantom-player regularization can closely reproduce ridge-regularized strength estimates while retaining an intuitive augmented-data representation.
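
The phantom-player idea can be tried on a case where plain maximum likelihood fails: one competitor who wins every game. The sketch below fits a Bradley-Terry model with a standard minorization-maximization update, fixing the phantom's strength at 1 and giving each real competitor one pseudo-win and one pseudo-loss of weight `weight` against it. The fitting algorithm, the function name, and the chosen weight are assumptions for illustration, not the paper's procedure.

```python
def fit_bradley_terry_phantom(wins, weight, iterations=5000):
    """Fit Bradley-Terry strengths with a phantom player of fixed strength 1.

    wins[i][j] is the number of times competitor i beat competitor j.
    Each competitor also gets `weight` pseudo-wins and `weight` pseudo-losses
    against the phantom, which keeps every estimate finite and positive.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i]) + weight   # real wins plus the pseudo-win
            denom = 2.0 * weight / (p[i] + 1.0)  # 2 * weight pseudo-games vs the phantom
            for j in range(n):
                if j != i:
                    games = wins[i][j] + wins[j][i]
                    denom += games / (p[i] + p[j])
            new_p.append(total_wins / denom)
        p = new_p
    return p


# Competitor 0 beat competitor 1 three times and never lost: the unregularized MLE diverges.
wins = [[0, 3], [0, 0]]
strengths = fit_bradley_terry_phantom(wins, weight=1.0)
print(strengths)
```

Because the phantom's strength is fixed, the scale of the strengths is pinned down without any extra constraint, and the undefeated competitor receives a finite estimate.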

2606.03750 2026-06-03 stat.ME

Extending TCLUST to higher dimensions

Lucía Trapote Reglero, Luis Ángel García Escudero, Agustín Mayo Íscar

AI summary To address the difficulty of parameter estimation when the traditional robust clustering method TCLUST is applied to high-dimensional data, proposes tHHDC, which combines the HDDC framework with trimming to merge robust clustering and dimensionality reduction.

Abstract

Outliers are known to significantly distort the results of many commonly used clustering methods, often leading to unreliable partitions. To address this issue, several robust clustering approaches have been developed that not only reduce their influence but also facilitate the detection of meaningful outliers. This presentation focuses on robust clustering methods based on trimming, especially TCLUST, which extends the type of trimming used by MCD in one-population problems to the more general case of multiple and unknown clusters. While TCLUST performs well on low-dimensional data, it struggles with high-dimensional datasets due to the complexity of estimating a large number of parameters. The Robust Linear Grouping (RLG) method offers an alternative by assuming clusters lie near lower-dimensional subspaces, thereby combining clustering with dimensionality reduction. However, RLG has limitations when subspaces intersect and assumes overly simplistic isotropic orthogonal errors. A robust clustering method extending TCLUST will be presented, building on the High Dimensional Data Clustering (HDDC) approach by incorporating trimming and eigenvalue constraints. This new approach, called tHHDC, combines TCLUST and RLG, requiring careful modification and integration of both methodologies within that HDDC framework. A study of the theoretical properties of this approach, together with a feasible algorithm for its implementation, will be presented. The interest of the proposed methodology, along with the issue of selecting input parameters, will be illustrated through a simulation study and a real-data example.

2606.03702 2026-06-03 stat.ME

Dynamic Mini Max Design and Sequential HB Inference for Repeated Surveys

Siu-Ming Tam

AI summary Proposes the Dynamic Mini-Max (DMM) framework, which jointly optimizes sample size and wave overlap to reduce cost while satisfying precision constraints, respondent burden limits, and budget, and enables coherent inference for levels and movements.

Comments 41 pages, 4 figures

Abstract

This paper develops a Dynamic Mini-Max (DMM) framework for repeated surveys comprising a Dynamic Mini-Max Design and a Sequential Hierarchical Bayes Update (SHBU). The DMM jointly optimizes sample size and wave overlap subject to simultaneous precision constraints for levels and movements, a respondent burden limit, and a fieldwork budget. The methods are illustrated using 2021 Australian Census data (t = 1) and simulated waves t = 2, 3, 4. Both the DMM and the classical design start from the same 5% proportional allocation of n_A = 42,018 units. The DMM reduces this to n* = 40,251 while meeting all precision constraints, achieving a cost saving of approximately 6.3%. Level coverage is comparable between the two designs (maximum absolute relative error (MARE) ratio 0.844--1.263). Movement coverage diverges markedly: the DMM achieves 100% across all 27 domain-variable cells, while the classical design achieves only 82%--96% (87.5%--95.0% nationally). The classical confidence interval understates movement uncertainty because it addresses sampling variance only and does not account for the model variance component V_mod_hat. Additional benefits of the DMM framework -- including coherent joint inference for levels and movements, sequential updating without ad hoc composite-estimator chaining, and small area estimation -- are outlined in the paper.

2606.03670 2026-06-03 stat.ME math.ST stat.TH

Projection Diagnostics for Directional Asymmetry and Tail-Ratio Departure in Multivariate Data

多元数据中方向不对称性与尾部比率偏离的投影诊断

Sayantan Banerjee, Soudeep Deb

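AI summary: Proposes projection-based diagnostics that classify data into four regimes using directional skewness and quantile-based tail ratios, avoiding the instability of higher-order moments, and establishes their theoretical properties.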

详情
AI中文摘要

我们研究基于投影的诊断方法,用于区分多元数据中的方向不对称性与尾部比率偏离。该方法将问题简化为单维投影,并计算两个基于分位数的汇总统计量:在多个分位数水平上评估的方向偏度度量,以及相对于选定基准评估的分位数间尾部比率。这两个汇总统计量导致四类分类:对称基准尾部、对称尾部偏离、偏斜基准尾部和偏斜尾部偏离。分位数公式避免依赖三阶和四阶矩,这些矩在重尾设置中可能不稳定。我们在中心对称性和椭圆性下建立总体性质,在搜索方向上建立均匀有限样本界,以及在分离模式下的阈值分类器一致性。还使用稀疏秩一计算说明为什么坐标方向在高维中可以补充随机方向。所得诊断旨在指导后续建模选择,例如对称、偏斜、尾部偏离或组合多元模型是否合适。

英文摘要

We study projection-based diagnostics for distinguishing directional asymmetry from tail-ratio departure in multivariate data. The procedure reduces the problem to one-dimensional projections and computes two quantile-based summaries: a directional skewness measure evaluated over several quantile levels, and an interquantile tail-ratio evaluated relative to a chosen benchmark. The two summaries lead to a four-regime classification: symmetric benchmark-tail, symmetric tail-departed, skewed benchmark-tail, and skewed tail-departed. The quantile formulation avoids relying on third and fourth moments, which can be unstable in heavy-tailed settings. We establish population properties under central symmetry and ellipticity, uniform finite-sample bounds over the searched directions, and consistency of the threshold classifier under separated regimes. A sparse rank-one calculation is also used to show why coordinate directions can complement random directions in high dimensions. The resulting diagnostic is meant to guide subsequent modelling choices, for example whether a symmetric, skewed, tail-departed, or combined multivariate model is appropriate.
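The two quantile-based summaries described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption: the skewness is a Bowley-type quantile skewness, the tail ratio compares an outer to an inner interquantile range against a Gaussian benchmark, and the function names, quantile levels, and thresholds are hypothetical.

```python
import math
from statistics import NormalDist


def quantile(sorted_x, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_x) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def project(data, direction):
    """Sorted one-dimensional projection of the rows of data onto a normalised direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sorted(sum(x * d for x, d in zip(row, direction)) / norm for row in data)


def directional_skewness(proj, levels=(0.05, 0.10, 0.25)):
    """Quantile skewness with the largest magnitude over several levels (assumed aggregation)."""
    med = quantile(proj, 0.5)
    values = []
    for p in levels:
        lo, hi = quantile(proj, p), quantile(proj, 1.0 - p)
        values.append((hi + lo - 2.0 * med) / (hi - lo))
    return max(values, key=abs)


def tail_ratio(proj, p=0.05):
    """Outer interquantile range divided by the interquartile range."""
    outer = quantile(proj, 1.0 - p) - quantile(proj, p)
    inner = quantile(proj, 0.75) - quantile(proj, 0.25)
    return outer / inner


# Value of the same ratio under a Gaussian benchmark (assumed choice of benchmark).
GAUSSIAN_TAIL_RATIO = NormalDist().inv_cdf(0.95) / NormalDist().inv_cdf(0.75)


def classify(data, directions, skew_tol=0.1, tail_tol=0.25):
    """Four-regime label from the worst case over the searched directions (hypothetical thresholds)."""
    projections = [project(data, d) for d in directions]
    skew = max(abs(directional_skewness(p)) for p in projections)
    tail = max(abs(tail_ratio(p) / GAUSSIAN_TAIL_RATIO - 1.0) for p in projections)
    symmetry = "skewed" if skew > skew_tol else "symmetric"
    tails = "tail-departed" if tail > tail_tol else "benchmark-tail"
    return f"{symmetry} {tails}"
```

Because only quantiles enter the two summaries, no third or fourth moments are computed, which is the property the abstract highlights for heavy-tailed settings.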

2606.03477 2026-06-03 stat.ME

Surrogate-assisted optimal sampling for risk prediction under measurement constraints

测量约束下基于替代辅助的最优抽样用于风险预测

Sunhyun Park, Seong-ho Lee

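AI summary: For settings where the true response is costly to measure and available only for a limited subset, proposes a surrogate-assisted optimal sampling framework that allocates the measurement budget by minimizing the expected cross-entropy loss and builds an inverse-probability-weighted estimator, improving prediction performance and robustness.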

详情
AI中文摘要

在许多风险预测问题中,协变量和响应替代变量通常可在大规模目标人群中常规获取,而真实响应则成本高昂且仅对有限子集观测。这产生了一个设计问题:在固定测量预算下,必须决定哪些观测应接受响应测量以构建预测模型。我们提出了一种在测量约束下用于风险预测的替代辅助最优抽样框架。在目标设定中,替代变量识别出已确认的阳性病例,而替代阴性观测的响应未被观测且可选择性地测量,因此抽样设计决定了响应测量预算的分配方式。我们的框架构建了最小化期望样本外交叉熵损失主导项的最优抽样设计,并将所得设计纳入逆概率加权交叉熵估计量。所提出的设计仅依赖于协变量、替代变量和初步估计量,因此在设计阶段不需要未标记观测的响应。我们建立了所得估计量的一致性、渐近正态性和前导阶预测最优性。广泛的模拟研究和两个真实数据应用表明,所提出的设计提高了预测性能,并在替代变量错误设定和罕见结局设定下表现出鲁棒性。

英文摘要

In many risk prediction problems, covariates and a response surrogate are routinely available for a large target population, whereas the true response is costly to ascertain and is observed only for a limited subset. This creates a design problem: one must decide which observations should receive response measurement in order to build a prediction model under a fixed measurement budget. We propose a surrogate-assisted optimal sampling framework for risk prediction under measurement constraints. In the target setting, the surrogate identifies confirmed positive cases, while responses for surrogate-negative observations remain unobserved and can be selectively measured, and thus the sampling design determines how the response measurement budget is allocated. Our framework constructs an optimal sampling design minimizing the leading term of the expected out-of-sample cross-entropy loss and incorporates the resulting design into an inverse-probability-weighted cross-entropy estimator. The proposed design depends only on covariates, the surrogate, and a preliminary estimator, and therefore does not require responses from unlabeled observations at the design stage. We establish consistency, asymptotic normality, and leading-order prediction optimality of the resulting estimator. Extensive simulation studies and two real data applications demonstrate that the proposed design improves prediction performance and exhibits robustness under surrogate misspecification and rare outcome settings.

2606.03429 2026-06-03 stat.ME cond-mat.dis-nn cond-mat.stat-mech math-ph math.MP physics.data-an

Modeling Discrete Data with High-Order Vector Potts Models

高阶矢量Potts模型对离散数据的建模

Aaron De Clercq, Merijn Moody, Clélia de Mulatier

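AI summary: Introduces q-state spin models to extend the maximum entropy framework from binary to discrete data, proposes high-order vector Potts models, uses a loop expansion of the partition function and gauge transformations to characterize their statistical properties, and focuses on Minimally Complex Models for fast model selection.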

Comments 89 pages, 16 figures

详情
AI中文摘要

对高维数据进行建模具有挑战性,但对于理解许多复杂系统至关重要。最大熵模型(如Ising模型和Potts模型)已被广泛用于从数据中的相关模式捕获成对相互作用,从而能够从观测(例如,从蛋白质序列或神经群体活动)中推断复杂系统的图形表示。最近,人们对涉及三个或更多变量的高阶相关模式建模的兴趣日益增长。虽然在高阶Ising模型的二元数据方面取得了进展,但我们将此框架扩展到更一般的离散数据情况。我们引入了q态自旋模型,这是一个完整的最大熵模型族,将矢量Potts模型推广到包含长程和任意高阶相互作用。在成对情况下,与标准矢量Potts模型相比,我们的模型允许更多样化的相互作用类型。我们通过示例讨论了它们的统计解释,并将其与离散傅里叶分析联系起来。利用配分函数的圈展开,我们证明了自旋模型的统计性质完全由其相互作用的代数结构所捕获。我们定义了规范变换,在此变换下该结构(以及配分函数)保持不变。规范变换下等价的模型可以被视为同一抽象统计模型的不同表示,尽管通常具有不同阶数的相互作用,这扩展了二元情况的结果。对于数据分析的实际应用,我们专注于二元情况下称为最小复杂模型的一个子集,并将其推广到离散数据。我们获得了这些模型边际似然的闭式表达式,从而能够快速进行模型选择。我们通过简单的真实世界示例说明了它们的用途。

英文摘要

Modeling high-dimensional data is challenging, yet essential to understanding many complex systems. Maximum entropy models such as Ising and Potts models have been used extensively to capture pairwise interactions from correlation patterns in data, allowing one to infer graphical representations of complex systems from observations (e.g., from protein sequences or neural population activity). Recently, there has been growing interest in modeling higher-order correlation patterns involving three or more variables simultaneously. While progress has been made in binary data with high-order Ising models, we extend this framework to the more general case of discrete data. We introduce q-state spin models, a complete family of maximum entropy models that generalize the vector Potts model to include long-range and arbitrary high-order interactions. In the pairwise case, our models allow for more diverse interaction types compared to the standard vector Potts model. We discuss their statistical interpretation with examples and relate them to discrete Fourier analysis. Using a loop expansion of the partition function, we show that the statistical properties of spin models are fully captured by the algebraic structure of their interactions. We define gauge transformations under which this structure, and thus the partition function, remains invariant. Models equivalent under gauge transformations can be seen as different representations of the same abstract statistical model, despite generally having interactions of different orders, extending results from the binary case. For practical application to data analysis, we focus on a subset of models known in the binary case as Minimally Complex Models, generalizing them to discrete data. We obtain a closed-form expression for the marginal likelihood of these models, enabling fast model selection. We illustrate their use with simple real-world examples.

2606.03230 2026-06-03 stat.ME stat.CO

Predictively-Oriented Kalman Filtering

面向预测的卡尔曼滤波

Zheyang Shen, Gerardo Duran-Martin, Chris. J. Oates

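AI summary: To address over-confident inference caused by model misspecification in nonlinear state-space models, proposes EKF-PrO, a fast approximate linear-Gaussian update based on predictively oriented posteriors, with computational cost comparable to existing filtering methods.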

详情
AI中文摘要

本文提出了一种后贝叶斯方法,用于非线性状态空间模型中的在线滤波,能够避免在动力学模型、测量模型或两者都可能被误设的情况下出现过度自信的推断。这通过使用预测导向(PrO)后验来解决,这是一种新兴范式,其中学习(即后验集中)当且仅当整体模型被良好指定时发生,而不严格遵循贝叶斯定理。由于PrO后验的表征具有挑战性,我们的主要技术贡献是一种快速的近似线性高斯更新过程,类似于(迭代)扩展卡尔曼滤波。该方法称为EKF-PrO,没有可调超参数,计算成本与现有滤波方法相当。在系统性地误设状态空间模型的一系列线性和非线性应用中,对性能进行了实证评估。

英文摘要

This paper presents a post-Bayesian approach to online filtering in nonlinear state-space models, capable of avoiding over-confident inferences in settings where either the dynamical model, the measurement model, or both, could be misspecified. This is addressed using predictively oriented (PrO) posteriors, an emerging paradigm in which learning (i.e., posterior concentration) occurs if and only if the overall model is well-specified, without strict adherence to Bayes' theorem. As the characterisation of PrO posteriors is challenging, our main technical contribution is a fast approximate linear-Gaussian update procedure, analogous to an (iterated) extended Kalman filter. The methodology, which we call EKF-PrO, has no tunable hyper-parameters and has a computational cost comparable to that of existing filtering methods. Performance is empirically assessed on a range of linear and non-linear applications, in which the state-space model is systematically misspecified.

2606.03211 2026-06-03 stat.ME stat.ML

Optimized Labeling Resource Allocation for Prediction-Assisted Inference via OPAL

通过OPAL进行预测辅助推断的优化标注资源分配

Virginia L. Ma, Emmanuel J. Candès

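AI summary: Proposes OPAL, which allocates labeling resources through a learnable smooth policy to minimize estimator variance, enabling efficient labeling and statistical inference in prediction-assisted inference.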

详情
AI中文摘要

主动统计推断是一个新框架,能够对总体参数做出具有可证明统计保证的精确声明。它利用预测性“黑箱”机器学习模型策略性地决定标注哪些数据点,大致优先考虑ML模型对其标签值不确定的样本。一个主要问题是,当不确定性估计存在噪声时,该框架可能变得脆弱。本文介绍了OPAL(标注分配优化策略),它在可处理的平滑策略类中学习标注策略,以产生方差最小的估计量。实际上,OPAL是一个端到端的流程,将黑箱模型的不确定性得分转化为数据自适应的标注策略,然后对收集的样本进行推断。我们在涵盖医学影像数据、计算社会科学和蛋白质组学的真实数据集上评估了OPAL。作为一个具体例子,我们考虑从组织病理学图像预测乳腺癌亚型,并使用OPAL为不同人口统计组的比值比形成有效的置信区间。我们表明,OPAL在有限样本中实现了名义覆盖,并具有人们期望从拥有更多标注样本的方法中获得的准确性。

英文摘要

Active Statistical Inference is a new framework to make precise claims about population parameters with provable statistical guarantees. It uses a predictive "black-box" machine learning (ML) model to strategically decide which data points to label, roughly prioritizing samples for which the ML model is unsure about their label values. A major issue is that the framework can be brittle when uncertainty estimates are noisy. This paper introduces OPAL (Optimized Policy for Allocation of Labels), which learns a labeling strategy within a tractable class of smooth policies to yield estimators with the lowest variance. In effect, OPAL is an end-to-end pipeline that turns a black-box model's uncertainty scores into a data-adaptive labeling strategy and then performs inference on the collected samples. We evaluate OPAL on real datasets spanning medical imaging data, computational social science, and proteomics. As a concrete example, we consider predicting breast cancer subtype from histopathology images and using OPAL to form valid confidence intervals for odds ratios for different demographic groups. We show that OPAL achieves nominal coverage in finite samples and has the accuracy one expects from methods which have far more labeled samples.
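As a rough illustration of how uncertainty scores can drive labeling while keeping the estimate unbiased, here is a minimal sketch for estimating a population mean. It is not OPAL's learned policy: the mixing rule in `labeling_probs`, the parameter `tau`, and all function names are hypothetical, and the estimator is the generic inverse-probability-weighted correction of model predictions.

```python
def labeling_probs(uncertainty, budget, tau=0.5):
    """Hypothetical smooth policy: mix uncertainty-proportional and uniform sampling.

    budget is the target average labeling probability; tau in (0, 1] keeps every
    probability bounded away from zero, which guards against noisy uncertainty scores.
    """
    mean_u = sum(uncertainty) / len(uncertainty)
    return [min(1.0, budget * ((1.0 - tau) * u / mean_u + tau)) for u in uncertainty]


def active_mean_estimate(predictions, labels, probs, sampled):
    """Inverse-probability-weighted mean estimate.

    Each unit contributes its model prediction f; labeled units add the
    correction (y - f) / pi, so the estimate is unbiased for the mean of y
    whenever every pi is positive. labels[i] is only read when sampled[i] is True.
    """
    total = 0.0
    for f, y, pi, xi in zip(predictions, labels, probs, sampled):
        total += f + ((y - f) / pi if xi else 0.0)
    return total / len(predictions)
```

In use, one would draw `sampled[i]` as a Bernoulli variable with probability `probs[i]`, collect labels only for the sampled units, and pass placeholders for the rest. The variance of the estimate depends on how the probabilities track the size of the prediction errors, which is the quantity a learned policy would aim to reduce.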

2606.03154 2026-06-03 stat.ME

Efficient Federated Estimation and Inference for High-Dimensional Tail Index Regression

高维尾指数回归的高效联邦估计与推断

Haoyu Geng, Liuhua Peng, Changliang Zou, Xiaolong Cui

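AI summary: For heterogeneous federated data, proposes a high-dimensional tail index regression method based on sparsity regularization and nonconcave fusion penalties that performs coefficient estimation, variable selection, and group recovery, and develops a debiased federated inference procedure.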

Comments 35 pages, 5 figures

详情
AI中文摘要

尾指数回归研究协变量如何影响重尾数据的尾部重度。在许多应用中,数据分布在异质来源中,由于隐私或监管限制,直接合并不可行。现有方法主要关注单数据集分析,未解决异质联邦设置。我们开发了一个针对高维尾指数回归的个性化联邦框架,该框架在利用客户端间潜在相似性的同时适应客户端异质性。所提出的估计器结合稀疏正则化和非凹融合惩罚,进行系数估计、变量选择和分组恢复。我们建立了非渐近收敛速度,并证明该估计器通过一致恢复潜在分组结构具有oracle性质。在计算方面,我们开发了一种基于ADMM的联邦算法,具有自适应梯度更新,并建立了其收敛保证。我们进一步提出了一种基于相关客户端间自适应加权聚合的去偏联邦推断程序,产生有效的置信区间和假设检验,其效率优于仅目标推断。模拟研究和真实数据分析证明了所提出方法的有效性。

英文摘要

Tail index regression studies how covariates affect tail heaviness in heavy-tailed data. In many applications, data are distributed across heterogeneous sources, where direct pooling is infeasible due to privacy or regulatory constraints. Existing methods mainly focus on single-dataset analysis and do not address heterogeneous federated settings. We develop a personalized federated framework for high-dimensional tail index regression that accommodates client heterogeneity while exploiting latent similarities across clients. The proposed estimator combines sparsity regularization with nonconcave fusion penalties to perform coefficient estimation, variable selection, and group recovery. We establish non-asymptotic convergence rates and show that the estimator enjoys an oracle property by consistently recovering the underlying grouping structure. For computation, we develop an ADMM-based federated algorithm with adaptive gradient updates and establish its convergence guarantees. We further propose a debiased federated inference procedure based on adaptive weighted aggregation across related clients, yielding valid confidence intervals and hypothesis tests with improved efficiency over target-only inference. Simulation studies and real-data analysis demonstrate the effectiveness of the proposed methods.

2606.03023 2026-06-03 stat.ME stat.AP

Marginalised Poisson Hurdle Model for Cross-Sectional Count Data with Excess Zeros

边际化泊松障碍模型用于含过多零的截面计数数据

Fred Fosu Agyarko, Edward Acheampong, Issah Seidu, Samuel Iddi

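AI summary: For count data with excess zeros, proposes the Marginalised Poisson Hurdle Model (MPHM), which reparametrises the count component to model the marginal mean directly, resolving the non-constant incidence density ratio (IDR) of the standard Poisson hurdle model, and proves asymptotic properties of the estimator.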

详情
AI中文摘要

在健康经济学和流行病学中,含过多零的计数数据频繁出现。标准泊松障碍模型(PHM)直接参数化潜在的泊松率,因此其计数分量系数是对数率比而非边际均值的对数比。因此,PHM的发病率密度比(IDR)既不精确也不随协变量分布恒定,这使应用报告复杂化。我们提出边际化泊松障碍模型(MPHM),它重新参数化计数分量,使得系数向量beta直接控制边际均值E[Y]。一个非线性连接方程将结构泊松率与该参数化均值联系起来。我们证明了连接解的存在性和唯一性,开发了向量化的Brent方法求解器,推导了得分方程和块对角Fisher信息,建立了渐近正态性,并证明了exp(beta)在所有协变量值上精确恒定。一项模拟研究,样本量n ∈ {100, 250, 500, 1000},零比例π ∈ {0.2, 0.4, 0.6, 0.8},R = 200次重复,在所有16种场景下确认了一致性、接近零的偏差以及0.905-0.975的95% Wald覆盖率。应用于NMES1988医生就诊数据(n = 4,406),MPHM得出每个额外慢性病的IDR = 1.163(95% CI: 1.150-1.177)——这是一个精确的、全人群效应,而PHM无法得出。MPHM通过直接参数化E[Y]解决了非恒定IDR问题。得到的IDR对每个个体和整个人群都成立,无需进一步边际化,大大简化了健康利用研究中协变量效应的报告。

英文摘要

Count data with excess zeros arise frequently in health economics and epidemiology. The standard Poisson Hurdle Model (PHM) parametrises the underlying Poisson rate directly, so its count-component coefficients are log-rate ratios rather than log-ratios of the marginal mean. Consequently, the incidence density ratio (IDR) from the PHM is neither exact nor constant across covariate profiles, complicating applied reporting. We propose the Marginalised Poisson Hurdle Model (MPHM), which reparametrises the count component so that the coefficient vector beta directly governs the marginal mean E[Y]. A nonlinear connector equation links the structural Poisson rate to this parametrised mean. We prove existence and uniqueness of the connector solution, develop a vectorised Brent's-method solver, derive the score equations and block-diagonal Fisher information, establish asymptotic normality, and prove that exp(beta) is exactly constant across all covariate values. A simulation study with n in {100, 250, 500, 1000}, zero proportion pi in {0.2, 0.4, 0.6, 0.8}, and R = 200 replications confirms consistency, near-zero bias, and 95% Wald coverage of 0.905-0.975 across all 16 scenarios. Applied to the NMES1988 physician visit data (n = 4,406), the MPHM yields IDR = 1.163 (95% CI: 1.150-1.177) per additional chronic condition - an exact, population-wide effect not derivable from the PHM. The MPHM resolves the non-constant IDR problem by directly parametrising E[Y]. The resulting IDR holds for every individual and the whole population without further marginalisation, substantially simplifying the reporting of covariate effects in health utilisation research.
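The idea of a connector equation can be sketched under the textbook Poisson hurdle form, which is an assumption here because the abstract does not state the equation: if P(Y = 0) = pi and positive counts follow a zero-truncated Poisson with rate lambda, then E[Y] = (1 - pi) * lambda / (1 - exp(-lambda)). Given a marginal mean mu = exp(x'beta), one solves this for lambda. The sketch below uses plain bisection in place of the Brent's-method solver mentioned in the abstract, and the function names are hypothetical.

```python
import math


def truncated_mean(lam):
    """Mean of a zero-truncated Poisson(lam): lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def solve_rate(marginal_mean, zero_prob, n_iter=200):
    """Solve (1 - zero_prob) * truncated_mean(lam) = marginal_mean for lam by bisection.

    truncated_mean is strictly increasing from 1 (as lam -> 0) to infinity, so a
    unique solution exists exactly when marginal_mean / (1 - zero_prob) > 1.
    """
    target = marginal_mean / (1.0 - zero_prob)
    if target <= 1.0:
        raise ValueError("no structural rate exists: conditional mean must exceed 1")
    # truncated_mean(lam) > lam for lam > 0, so the root lies below target.
    lo, hi = 1e-12, target
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the marginal mean parametrised as exp(x'beta), a one-unit change in a covariate multiplies E[Y] by exp(beta) for every covariate profile, while the structural rate returned by `solve_rate` adjusts to whatever value keeps the equation satisfied.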

2606.03012 2026-06-03 stat.ME

Powerful Switchback Experiments -- Or Not?

强大的切换实验?——或者不是?

Sergei Pankratev

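AI summary: Derives a closed-form multi-level asymptotic variance formula for the individual-level OLS estimator in switchback experiments, reveals a structural floor on statistical power, and studies three methodological applications.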

详情
AI中文摘要

切换实验——其中处理在跨时间段的聚类水平上分配——广泛应用于市场和平台环境,但目前尚不存在闭合形式的功效公式。我们通过推导个体层面OLS估计量的闭合形式、多水平渐近方差近似来填补这一空白,从而便于功效预算。利用该公式,我们揭示了统计功效的结构性下限:虽然特质噪声随观测密度消失,但宏观冲击会因聚类规模不平衡而受到乘法惩罚。通过解析推导和蒙特卡洛模拟,我们确认该公式在典型参数下是精确的,并在极端边界情况下作为数学上的保守上界。我们研究了三种方法论应用。首先,我们证明诸如分层等高级分配设计仅能部分消除聚类规模不平衡对功效的惩罚。其次,我们证明针对宏观冲击的方差缩减技术比针对残差噪声的技术产生不成比例的更高效率增益。第三,我们形式化了个体层面估计量与单元层面估计量之间的有限样本功效权衡。

英文摘要

Switchback experiments -- in which treatment is assigned at the level of a cluster crossed with a time period -- are widely used in marketplace and platform settings, yet no closed-form power formula exists for them. We fill this gap by deriving a closed-form, multi-level asymptotic variance approximation for the individual-level OLS estimator, facilitating power budgeting. Using this formula, we reveal a structural floor on statistical power: while idiosyncratic noise vanishes with observation density, macro-level shocks are multiplicatively penalized by cluster size imbalance. We confirm through analytical derivations and Monte Carlo simulations that the formula is exact across typical parameters and serves as a mathematically conservative upper bound in extreme boundary regimes. We study three methodological applications. First, we prove that advanced assignment designs like stratification only partially eliminate the penalty of cluster size imbalance on power. Second, we demonstrate that variance reduction techniques targeting macro-level shocks yield disproportionately greater efficiency gains than those targeting residual noise. Third, we formalize the finite-sample power trade-offs between individual-level and cell-level estimators.

2606.03007 2026-06-03 stat.AP q-bio.PE

Computing the final epidemic size distributions of a multi-type Galton--Watson process

计算多类型 Galton-Watson 过程的最终流行规模分布

Yuta Okada, Hiroshi Nishiura

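AI summary: Proposes a method based on the choice of the Cauchy integral contour for computing the final size distribution of multi-type Galton-Watson processes, and applies it to simulated data and real-world Middle East respiratory syndrome data.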

Comments Submitted; under review

详情
AI中文摘要

Galton-Watson 过程 (GWP) 是一种离散时间分支过程模型,为分析流行病数据和估计基本再生数等关键流行病学参数提供了有力工具。当与基于监测的簇大小数据结合使用时,即使每个传播过程不可直接观测,GWP 也能揭示传播异质性的程度。当获得簇大小分布数据时,可通过使用与观测簇大小数据对应的概率质量函数来统计推断控制传播的参数。然而,对于多类型 GWP,实际应用仍然有限,可能是因为缺乏概念上和实践中直接的方法来推导最终规模分布的闭式解。在本研究中,我们提出一个框架,通过选择柯西积分轮廓的方法来计算多类型 GWP 的最终规模分布。我们提供了如何将我们的框架应用于模拟数据和中东呼吸综合征真实数据的示例,并讨论了在使用未以灭绝为条件的似然进行统计推断时参数可识别性方面的潜在陷阱。

英文摘要

The Galton--Watson process (GWP) is a discrete-time branching process model that provides a powerful tool for analyzing epidemic data and estimating key epidemiological parameters such as the basic reproduction number. When used with surveillance-based cluster size data, the GWP can also elicit information about the extent of transmission heterogeneity, even when each transmission process is not directly observable. When cluster size distribution data are available, the parameters that govern the transmission can be statistically inferred by using the probability mass function that corresponds to the observed cluster size data. For multi-type GWPs, however, real-world applications remain limited, possibly because of the absence of conceptually and practically straightforward approaches for deriving the closed-form solution of the final size distribution. In the present study, we propose a framework for computing the final size distribution of multi-type GWPs, using a method for the choice of the Cauchy integral contour. We provide examples of how our framework can be applied to both simulated data and real-world data of Middle East respiratory syndrome, and discuss potential pitfalls surrounding the identifiability of parameters for statistical inference when using likelihoods that are not conditioned on extinction.
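To make the Cauchy-integral idea concrete, the following sketch treats the much simpler single-type case with Poisson offspring, which is an assumption chosen for illustration and not the multi-type setting of the paper. The final size generating function H(z) satisfies H(z) = z * exp(R * (H(z) - 1)), and the probability of final size n is the contour integral of H(z) / z^(n+1) over a circle around the origin, approximated here by the trapezoidal rule. The function name, the radius, and the number of contour points are hypothetical choices.

```python
import cmath
import math


def final_size_pmf(R, n_max, radius=0.5, n_points=64, n_iter=200):
    """Approximate P(N = n) for n = 0..n_max, with one initial case and R < 1.

    Requires n_max < n_points. On the circle |z| = radius < 1 the fixed-point map
    h -> z * exp(R * (h - 1)) is a contraction, so simple iteration converges.
    """
    values = []
    for k in range(n_points):
        z = radius * cmath.exp(2j * math.pi * k / n_points)
        h = z
        for _ in range(n_iter):
            h = z * cmath.exp(R * (h - 1.0))
        values.append(h)
    pmf = []
    for n in range(n_max + 1):
        # Trapezoidal rule for (1 / (2*pi*i)) * contour integral of H(z) / z**(n + 1).
        total = sum(
            h * cmath.exp(-2j * math.pi * k * n / n_points)
            for k, h in enumerate(values)
        )
        pmf.append((total / n_points).real / radius**n)
    return pmf
```

In this special case the result can be checked against the known Borel distribution, P(N = n) = exp(-R * n) * (R * n)^(n - 1) / n!. The choice of radius trades off aliasing error against amplification of rounding error by the factor radius^(-n), which gives some intuition for why contour selection matters.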

2606.02874 2026-06-03 stat.CO stat.ML

Neural Posterior Estimation for Stochastic Epidemic Models Using Final Outcome Data

基于最终结果数据的随机流行病模型的神经后验估计

Theodore Kypraios

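AI summary: Applies neural posterior estimation (NPE) for the first time to stochastic SIR epidemic models observed through final outcome data, using a log-normal posterior approximation parameterised by a feed-forward neural network that accurately recovers reference posteriors and extends to joint inference on global and local transmission rates in the household model.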

详情
AI中文摘要

神经后验估计(NPE)是一种基于模拟的贝叶斯推断方法,通过训练神经网络从模拟的参数-数据对中近似后验分布,绕过了似然评估。我们首次将NPE应用于通过最终结果数据观测的随机易感-感染-移除(SIR)流行病模型,考虑了均匀混合和家庭结构种群。这类数据在回顾性暴发调查和家庭传播研究中自然出现,但推断在计算上具有挑战性:数据增强马尔可夫链蒙特卡洛(MCMC)在大种群中混合缓慢且难以实现,而近似贝叶斯计算(ABC)则面临低接受率,尤其是对于大种群或不太可能的结果。此类观测的离散、低维特性使得该设置特别适合NPE。我们表明,由前馈神经网络参数化的对数正态后验近似能够准确恢复各种种群大小和传播机制下的参考后验,并自然地扩展到家庭模型中全局和局部传播率的联合推断。一旦训练完成,网络在几秒钟内产生近似后验分布,并可靠地推广到训练中未见过的种群大小和结构。在合成和真实暴发数据集上的性能始终强劲,结果与已发表的分析高度一致。

英文摘要

Neural posterior estimation (NPE) is a simulation-based approach to Bayesian inference that trains a neural network to approximate the posterior distribution from simulated parameter-data pairs, bypassing likelihood evaluation. We apply NPE -- to our knowledge for the first time -- to stochastic susceptible-infectious-removed (SIR) epidemic models observed through final outcome data, considering both homogeneously mixing and household-structured populations. Such data arise naturally in retrospective outbreak investigations and household transmission studies, yet inference is computationally challenging: data-augmentation Markov chain Monte Carlo (MCMC) can be slow to mix in large populations and difficult to implement, while Approximate Bayesian Computation (ABC) suffers from low acceptance rates, particularly for large populations or unlikely outcomes. The discrete, low-dimensional nature of such observations makes this setting particularly well suited to NPE. We show that a log-normal posterior approximation, parameterised by a feed-forward neural network, accurately recovers reference posteriors across a range of population sizes and transmission regimes, and extends naturally to joint inference on global and local transmission rates in the household model. Once trained, the network produces approximate posterior distributions in seconds and generalises reliably to population sizes and structures not seen during training. Performance on both synthetic and real outbreak datasets is consistently strong, with results in close agreement with published analyses.

2606.02833 2026-06-03 stat.ME

Identification, Estimation, and Inference for Sequential Causally Ordered Mediation Pathways

顺序因果中介路径的识别、估计与推断

Ritoban Kundu, Canyi Chen, Peter X. K. Song

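AI summary: Establishes a general framework for sequentially ordered mediators that decomposes the total effect into path-specific effects, and proposes an inference method based on a studentized statistic with data-splitting that controls Type I error under the composite null.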

详情
AI中文摘要

中介分析在揭示暴露通过中间途径影响结果的机制中起着重要作用。虽然单中介变量设置的方法学进展已较为成熟,但处理多个顺序中介变量的严谨工具仍不完善。这类设置在纵向队列研究等应用中很常见,其中暴露随时间通过复杂的中介链发挥作用。在本文中,我们建立了一个针对顺序中介变量的通用框架,能够识别总效应并将其正式分解为特定路径的效应。我们还开发了针对连续和分类结果的中介估计量的估计程序。此外,我们引入了一种新的检验策略,使用学生化统计量结合数据分割进行推断。该方法在复合零假设下,针对多种数据生成机制实现了有效的第一类错误控制。通过大量模拟和两项大规模实证研究的应用,我们证明了所提出的方法能够提供可靠的估计、有效的推断,并在发现新中介途径方面具有更高的功效。

英文摘要

Mediation analysis plays an essential role in uncovering the mechanisms by which an exposure influences an outcome through intermediate pathways. While methodological advances for single-mediator settings are well established, rigorous tools for handling multiple, sequentially ordered mediators remain underdeveloped. Such settings are common in applications like longitudinal cohort studies, where exposures operate through complex chains of mediators over time. In this paper, we establish a general framework for sequentially ordered mediators that enables the identification and formal decomposition of the total effect into component path-specific effects. We also develop estimation procedures for mediation estimands with both continuous and categorical outcomes. Furthermore, we introduce a new testing strategy to conduct inference using a studentized statistic combined with data-splitting. This approach achieves valid Type I error control under the composite null across diverse data-generating mechanisms. Through extensive simulations and applications to two large-scale empirical studies, we demonstrate that the proposed methodology provides reliable estimation, valid inference, and improved power for discovering novel mediation pathways.

2606.02777 2026-06-03 stat.CO stat.ME

Emulators for Large-scale Computer Experiments with Quantitative and Qualitative Inputs

大规模计算机实验的模拟器:定量与定性输入

Anita Shahrokhian, Youngdeok Hwang, C. Devon Lin

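AI summary: Proposes a scalable framework based on additive Gaussian processes and the Vecchia approximation for emulating large-scale computer experiments with mixed inputs.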

详情
AI中文摘要

同时具有定量和定性输入的计算机实验在各个领域已变得普遍。然而,为这类大规模实验构建精确且计算高效的模拟器仍然是一个重大挑战。我们提出了一种新颖的、可扩展的框架,用于模拟具有混合输入的计算机实验。我们的方法基于一种新的协方差函数,该函数结合了加性高斯过程(GPs)来处理混合输入,并采用Vecchia近似实现可扩展性。我们证明,当与所提出的建模框架结合时,大规模计算机实验的方法可以有效地扩展。

英文摘要

Computer experiments with both quantitative and qualitative inputs have become common across various areas. However, constructing accurate and computationally efficient emulators for such experiments at large scales remains a significant challenge. We propose a novel, scalable framework for emulating computer experiments with mixed inputs. Our approach is based on a new covariance function integrating additive Gaussian Processes (GPs) to handle the mixed inputs, with Vecchia approximation for scalability. We demonstrate that methods for large-scale computer experiments can be effectively extended when paired with our proposed modeling framework.

2606.02676 2026-06-03 stat.ME

Diagnostic Tools for Extreme Value Regression Models

极值回归模型的诊断工具

Ed Mackay, Jordan Richards, Philip Jonathan

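AI summary: For extreme value regression models, proposes two visual diagnostics, the standardised tail plot and the normalised residual plot, whose asymptotic distributions allow goodness-of-fit comparison at global and regional levels, supporting model selection and refinement.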

详情
AI中文摘要

可视化和定量的拟合优度诊断是实践者工具箱中的重要工具。在拟合极值回归模型时,对令人信服且可靠的诊断的需求尤为明显,这类模型用于远超出响应变量可观测范围的外推,且常在未观测的协变量值处进行评估。尽管如此,针对极值回归模型的诊断工具很少,现有的工具在低维或非欧几里得协变量域(现代应用中常见)上的可解释性或可扩展性方面往往存在不足。此外,现有方法倾向于提供模型拟合的全局视角;即它们量化整个数据集上的拟合优度,而不提供对协变量空间中模型拟合可能较差的区域的洞察。我们提出了两种新的极值回归模型可视化诊断工具:标准化尾部图和标准化残差图。通过考虑标准化超越概率的渐近分布,我们证明这些图的置信边界近似独立于构建时使用的样本量。这使得我们能够提出可视化诊断工具,尽管协变量域各区域的样本量不同,但可以高效且一致地在全局和区域层面比较拟合优度。在讨论全局和区域拟合优度的汇总统计量之后,我们提供了两个极值回归模型的应用实例,说明我们的诊断工具如何用于模型比较(在数千个候选模型中)并提供支持模型设计的可操作发现。

英文摘要

Visual and quantitative goodness-of-fit diagnostics are an important tool in the practitioner's toolbox. The need for convincing and reliable diagnostics is particularly clear when fitting extreme value regression models, which are used for extrapolation far beyond the observable range of the response variable, and often evaluated at unobserved covariate values. Despite this, few diagnostics have been developed for extreme value regression models, and those available often suffer in terms of interpretability or scalability on low-dimensional or non-Euclidean covariate domains, often encountered in modern applications. Moreover, existing methods tend to offer a global perspective on model fit; that is, they quantify goodness-of-fit across the entire dataset, without offering insight into regions of the covariate space where the model fit may be poor. We propose two novel visual diagnostics for extreme value regression models: the standardised tail plot and the normalised residual plot. By considering the asymptotic distribution of normalised exceedance probabilities, we show that uncertainty bounds for our plots are approximately independent of the sample size used in their construction. This allows us to propose visual diagnostics which can efficiently and consistently compare goodness-of-fit at both a global and regional level, despite varying sample sizes over regions of the covariate domain. Following a discussion of summary statistics for global and regional goodness-of-fit, we provide two applications of extreme value regression models that illustrate how our diagnostics can be used to perform model comparison (across thousands of candidate models) and provide actionable findings that support model design.

2606.02589 2026-06-03 stat.ME stat.ML

Rashomon-Seeded Annealing for Robust Bayesian Inference in Factorial Designs

Rashomon播种退火用于因子设计中的鲁棒贝叶斯推断

Yiyang Fan, Soumyakanti Pan, Tyler H. McCormick

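AI summary: To address posterior multimodality and MCMC convergence problems caused by model uncertainty in factorial designs, proposes Rashomon-seeded annealing, which uses Rashomon sets to initialize annealed importance sampling and delivers full posterior inference without exhaustively enumerating the model space.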

Comments 28 pages, 8 figures

详情
AI中文摘要

在因子设计中,通过贝叶斯模型平均整合模型不确定性受到可解释交互效应的组合爆炸阻碍,通常产生多模态后验,标准马尔可夫链蒙特卡罗算法遇到显著的收敛问题。我们提出一个通用计算框架,将Rashomon集(传统上因预测和可解释性而受到重视的高性能模型集合)重新用作估计完整后验的战略性“热启动”。我们的方法,Rashomon播种退火,通过将起始密度锚定在这些预先识别的高证据区域内,同时保持对整个模型空间的全局支持,来初始化退火重要性采样(AIS)。AIS校正不是将推断限制在Rashomon集并低估不确定性,而是恢复完整的后验推断,将Rashomon证书从推断截断转变为提议机制。我们使用Rashomon划分集(RPS)作为因子设计的严格认证种子构造器来演示这种方法。所得算法产生一致的自标准化后验摘要,如模型平均单元均值、可信区间和不确定性摘要,而无需穷举整个模型空间。这弥合了高证据模型发现与严格贝叶斯推断之间的差距,并概述了一种通用策略,其中任何高后验种子集都可以为基于AIS的模型平均提供计算杠杆。

英文摘要

Integrating over model uncertainty in factorial designs via Bayesian model averaging is hindered by the combinatorial explosion of interpretable interaction effects, often yielding a multimodal posterior, where standard Markov chain Monte Carlo algorithms encounter significant convergence issues. We propose a general computational framework that repurposes Rashomon sets, collections of high-performing models traditionally valued for prediction and interpretability, as a strategic "warm start" for estimating the full posterior. Our method, Rashomon-seeded annealing, initializes annealed importance sampling (AIS) by anchoring the starting density within these pre-identified, high-evidence regions while preserving global support over the entire model space. Rather than restricting inference to the Rashomon set and understating uncertainty, the AIS correction restores full posterior inference, turning the Rashomon certificate from an inferential truncation into a proposal mechanism. We demonstrate this approach using Rashomon Partition Sets (RPS) as a rigorous, certified seed constructor for factorial designs. The resulting algorithm yields consistent self-normalized posterior summaries, such as model-averaged cell means, credible intervals, and uncertainty summaries without exhaustive enumeration of the complete model space. This bridges the gap between high-evidence model discovery and rigorous Bayesian inference, and outlines a general strategy in which any high-posterior seed set can provide computational leverage for AIS-based model averaging.