2606.10869 2026-06-10 eess.SP New submission

Information Bottleneck Meets Quantization: Finite Rate Analysis and Optimal Designs

Francesco Binucci, Paolo Banelli

AI summary: The paper theoretically analyzes how scalar and vector quantization of the Gaussian Information Bottleneck (GIB) latent representation affects informativeness about the target data, proposes task-oriented quantization designs under a finite-rate constraint, validates them on MMSE regression problems, and extends the task-oriented idea to non-Gaussian settings.

Comments 16 pages, 9 figures

Abstract

The Information Bottleneck (IB) is a well-established framework that seeks a compact latent representation of a data source by trading the rate and size of the representation against information accuracy with respect to separate target data. The Gaussian IB (GIB) is its simple closed-form solution when the target is jointly Gaussian with the source. However, in many practical problems the latent representation has to be stored or represented with a finite number of bits, whereas the optimal (G)IB solution is not finite-rate. First, this manuscript theoretically analyzes the effect of scalar and vector quantization of the GIB latent representation and its impact on the (dis)informativeness with respect to the target data. Then, task-oriented quantization designs are proposed by (jointly) reformulating the GIB optimization problem under a finite-rate constraint on the latent representation. Simulation results on MMSE regression problems confirm the effectiveness of the proposed quantization designs, which show significant gains over more heuristic, or separate, quantization designs of the standard GIB latent representation. Finally, the paper extends the task-oriented philosophy to non-Gaussian settings by suitably modifying the cost function used in variational auto-encoders (VAEs) of IB-inspired vector quantizers.

2606.10864 2026-06-10 eess.AS New submission

Phoneme-First Prediction for LLM-Based Speech Recognition

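Jakob Poncelet, Hugo Van hamme

AI summary: Proposes integrating a phoneme-prediction step into the LLM, predicting phonemes before generating the transcript, to improve speech recognition accuracy and interpretability in low-resource scenarios.

Comments Accepted at EUSIPCO 2026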

Complex Variational Autoencoder with Heavy-Tailed Likelihood for Radar Target Detection in Sea Clutter

Ting Bai, Jun Tang, Yuxin Xu

AI summary: To address the heavy-tailed, spiky nature of sea clutter and the scarcity of target labels, the paper proposes an unsupervised complex-valued variational autoencoder that uses a Student-t negative log-likelihood to capture heavy-tailed reconstruction errors and adds a time-domain amplitude error constraint, enabling radar target detection at a constant false-alarm rate.

Abstract

To address the heavy-tailed, spike-prone nature of sea clutter and the scarcity of labeled target data, an unsupervised complex-valued variational autoencoder (VAE) for maritime radar target detection is proposed. In implementation, each complex baseband slow-time sequence is represented by its in-phase and quadrature components, and the model learns their joint reconstruction from clutter-only data. A Student-$t$ negative log-likelihood is adopted to capture heavy-tailed reconstruction errors while reducing sensitivity to outliers during clutter learning. In addition, a time-domain amplitude error constraint is introduced to penalize slow-time magnitude mismatch in the reconstruction. At inference, reconstruction deviation is used as the detection statistic, and the decision threshold is set via an empirical quantile estimated from a clutter-only validation set to enforce a constant false-alarm rate (CFAR). Experiments on measured sea-clutter data show that detection performance is consistently improved over MF, AMF, and a real-valued $\beta$-VAE under CFAR constraints.
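The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes the detection statistic is a scalar reconstruction deviation for which larger values are more target-like. The names `cfar_threshold` and `detect` and the order-statistic rule are illustrative assumptions, not the authors' implementation.

```python
import math


def cfar_threshold(clutter_scores, pfa):
    """Empirical-quantile threshold from clutter-only validation scores.

    Returns the order statistic that leaves at most floor(pfa * n) of the
    n clutter scores strictly above it (assumption: larger score = target).
    """
    if not 0.0 < pfa < 1.0:
        raise ValueError("pfa must lie strictly between 0 and 1")
    ordered = sorted(clutter_scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one clutter score")
    rank = n - math.floor(pfa * n)  # 1-based rank of the threshold
    return ordered[rank - 1]


def detect(score, threshold):
    """Declare a target when the statistic exceeds the threshold."""
    return score > threshold


# Usage with hypothetical validation scores and a 25% false-alarm budget.
validation = [5, 3, 8, 1, 7, 2, 6, 4]
thr = cfar_threshold(validation, 0.25)
false_alarms = sum(detect(s, thr) for s in validation)
```

With few validation samples, the achievable false-alarm rates are limited to multiples of 1/n, so very small rates need a correspondingly large clutter-only set.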

2606.10464 2026-06-10 eess.AS New submission

GC-LoRA: Gated Convolutional LoRA for Parameter-Efficient Acoustic Adaptation

Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, Abeer Alwan

AI summary: Proposes GC-LoRA, an adapter architecture that injects Conformer-style local convolutional processing into pretrained Transformer encoders to capture local acoustic dependencies efficiently, achieving word error rate reductions of up to 10.9% across several acoustically mismatched domains.

Comments Accepted for publication at Interspeech 2026

Abstract

Transformer-based Speech Foundation Models excel in most Automatic Speech Recognition tasks but often suffer performance degradation when applied to domains with mismatched acoustic characteristics. While Parameter Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), adjust global attention, they lack the local context modeling crucial for capturing domain-specific variations. We propose GC-LoRA, a novel adapter architecture that injects Conformer-style local convolutional processing into pretrained Transformer encoders. By integrating a lightweight adapter into the encoder attention output projections, our method efficiently captures local acoustic dependencies without disrupting pretrained global representations. Experiments across diverse datasets (acoustically degraded, bandlimited, dialectal, and child speech) demonstrate the efficacy of our approach, achieving Word Error Rate (WER) reductions of up to 10.9% compared to baselines while adding minimal trainable parameters.

2606.10240 2026-06-10 eess.IV New submission

Laplace-Mixture Dipole Inversion for Quantitative Susceptibility Mapping

Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

AI summary: Proposes LAMDI, an automatic dipole inversion method based on a Laplace mixture prior that preserves fine anatomical structures in quantitative susceptibility mapping without manual parameter tuning, with performance comparable to existing methods.

Abstract

Purpose: To develop an automatic dipole inversion method for quantitative susceptibility mapping (QSM) that preserves fine anatomical structures without the need for manual regularization-parameter tuning. Theory: The original approximate message passing with parameter estimation (AMP-PE) framework models image gradients with a single Laplace prior, which does not fully capture the heavy-tailed gradient distribution of brain susceptibility maps. This prior mismatch can lead to over-regularization and blocky reconstructions. We address this limitation by modeling the gradients with a two-component Laplace mixture prior. Methods: We propose a Laplace-Mixture Dipole Inversion (LAMDI) method by incorporating a two-component Laplace mixture prior into the AMP-PE framework with automatic parameter estimation. LAMDI was evaluated on a public in vivo dataset. Its performance was compared with FANSI, MEDI, and AMP-PE with a single-Laplace prior (AMP-PE-L1) under both standard default and reference-tuned settings. Results: On a public multi-orientation QSM dataset, LAMDI achieved NRMSE and SSIM comparable to AMP-PE-L1 while substantially reducing HFEN, suggesting improved preservation of high-frequency anatomical detail. Under reference-based tuning, FANSI and MEDI achieved the best performance for some metrics, but LAMDI remained competitive without requiring reference maps or manual regularization tuning. Conclusion: LAMDI provides an effective and automatic parameter-estimation alternative for QSM dipole inversion by combining competitive reconstruction accuracy with improved preservation of fine anatomical detail.

2606.10190 2026-06-10 eess.SP New submission

Optimal Illumination via Joint Movement and Phase Optimization for Movable Antenna-RIS Configuration

Yan Zhang, Nicola Marchetti, Indrakshi Dey

AI summary: Proposes a movable-antenna-enhanced RIS architecture that models antenna motion with stochastic differential equations and optimizes long-term SNR through a two-timescale framework, reaching up to 36 dB steady-state SNR and up to 16 times higher energy efficiency.

Abstract

Reconfigurable intelligent surfaces (RIS) enable programmable control of wireless propagation but remain vulnerable to persistent deep fades in static deployments. This paper introduces a Movable Antenna-enhanced RIS (MA-RIS) architecture where antenna elements physically reposition to sample independent spatial channels, enabling mobility-induced diversity. We model antenna motion using a Stochastic Differential Equation (SDE) framework capturing controlled drift and environmental diffusion. Itô calculus-based analysis characterizes steady-state antenna distributions, spatial decorrelation, and outage probability, revealing fundamental trade-offs between control strength and mobility randomness. To maximize long-term SNR while accounting for control overhead, we propose an overhead-aware two-timescale framework separating slow antenna trajectory control from fast phase adaptation. The stochastic optimal control problem is solved via predictive approximation of the Hamilton-Jacobi-Bellman (HJB) formulation, enabling real-time implementation. Simulations validate theoretical predictions: the two-timescale strategy achieves up to 36 dB steady-state SNR with remarkable stability, outperforming position-only control by up to 15 dB and uncontrolled baselines by over 30 dB. Despite experiencing a lower SNR than Active RIS, the proposed approach delivers up to 16 times higher energy efficiency (EE) across varying system scales, establishing a new paradigm of mobility-enabled channel adaptation for resilient wireless systems.

2606.10164 2026-06-10 eess.SP New submission

Curved Beam Enabled Wireless Communications: Modeling, Analysis and Optimization

Jiawei Yao, Xiaoren Xu, Walid Saad, Mingzhe Chen

AI summary: For scenarios with a blockage, proposes generating curved beams with a continuous aperture array to improve wireless communication performance; models beam control and a segmented channel, and designs an iterative algorithm based on fractional programming and enhanced block coordinate ascent to optimize the weighted sum-rate.

Abstract

In this paper, the problem of using curved beams to improve wireless communication performance in the presence of a blockage is studied. In particular, a transmitter equipped with a continuous aperture array can generate curved beams to serve multiple receivers by allowing signals to propagate along both straight and curved paths. To optimize the weighted sum-rate, a curved beam model is developed for controlling the beam steering, beam focusing, and beam curving functions, along with a segmented channel model to characterize practical channels induced by the blockage. Based on the introduced curved beam model, an optimization problem is posed with the goal of maximizing the weighted sum-rate of all users under a transmit power budget and physical constraints of curved beams. To solve this problem, the continuous aperture is first converted into finite summations via a discrete sampling of the continuous coordinate. Then, the performance gap between the ideal continuous aperture design and its practical discrete aperture approximation is analyzed. Based on the above discrete approximation, an iterative algorithm is developed to optimize curved beam control parameters. In particular, the original problem is reformulated into a tractable form via fractional programming (FP). The transformed problem is then solved by an enhanced block coordinate ascent (BCA) method, which determines a surrogate-construction point leveraging the local descent from previous iterations, thereby accelerating convergence. Next, a proximal regularization term is added to the surrogate function to control the update magnitude and suppress aggressive updates, thereby improving update stability. Finally, the beam amplitudes are computed based on the effective channel gains. Simulation results show that the proposed method can improve the weighted sum-rate compared to using only straight beams.

2606.10048 2026-06-10 eess.SP New submission

Human Walking Sensing and Pose Estimation in the 6 GHz Band Using Amplitude and Phase CSI

Zhaorui Yin, Mattia Brambilla, Monica Nicoli

AI summary: Studies indoor human pose estimation from the amplitude and phase CSI of 6 GHz OFDM signals, designs a processing pipeline, and adapts four deep learning models; experiments show that amplitude-only CSI performs comparably to joint amplitude-phase processing and that phase information is more useful as a complementary feature.

Abstract

This paper investigates human pose estimation from Orthogonal Frequency-Division Multiplexing (OFDM) signals in an indoor multistatic wireless network operating in the 6 GHz band. We design and validate a processing pipeline that exploits both the amplitude and phase of the Channel State Information (CSI) from multiple radio links to estimate the human body pose. Four deep learning architectures from the literature, namely DT-Pose, MetaFi++, HPE-Li, and VST-Pose, are adapted to the OFDM CSI structure and extended to jointly exploit the amplitude and phase information. The models estimate the pose of a human walking within the network coverage area. Performance evaluation is conducted on an open-access dataset using standard pose-estimation metrics such as Procrustes-aligned Mean Per-Joint Position Error (PA-MPJPE) and Bone Length Loss (BLL). Results indicate that reliable human pose reconstruction can be achieved from 6 GHz OFDM CSI measurements, with DT-Pose providing the best overall accuracy. On average, amplitude-only CSI yields performance comparable to joint amplitude-phase processing, whereas phase information is more beneficial as a complementary feature than as a standalone input.

2606.11013 2026-06-10 stat.ME New submission

Empirical stratification for treatment effect heterogeneity with post-treatment variables

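Chao Cheng, Rui Wang, Yichi Zhang

AI summary: Proposes an assumption-lean empirical stratification framework that defines empirical scores from the potential post-treatment variable responses predicted from baseline covariates, constructs identifiable empirically stratified treatment effects, and connects them to principal-stratification causal effects.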

Predicting Current Outcomes from Historical Survey Data via Weighted Conformal Prediction

Chihoon Lee, Sungkyu Jung, Hyokyung G. Hong

AI summary: To handle outcomes that large-scale surveys measure only in certain years, proposes a weighted conformal prediction framework that estimates the likelihood ratio between historical and target covariate distributions to obtain valid population-level predictions with coverage guarantees.

Comments Submitted to Journal of the Royal Statistical Society Series B. 89 pages, 14 figures. Includes supplementary material

Abstract

In large-scale complex surveys such as the National Health and Nutrition Examination Survey (NHANES), some outcomes are measured only in selected years, leaving incomplete records across survey waves. We develop a weighted conformal prediction framework that enables valid population-level prediction of unobserved outcomes using information from earlier surveys. The method accommodates covariate shift, where both continuous and categorical covariate distributions evolve over time while survey design affects representativeness. It integrates subgroup-specific density ratio and subgroup-proportion estimation to approximate likelihood ratios between the historical and target covariate distributions, and we establish coverage guarantees for the resulting prediction sets. Simulation studies and an application predicting low-density lipoprotein cholesterol (LDL-C) for the current U.S. population show that the proposed approach achieves coverage close to the nominal level and improved efficiency over existing methods, particularly when covariate distributions are complex or unknown.
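As a rough illustration of the weighted conformal idea in this abstract, the sketch below computes the weighted quantile of calibration scores that sizes a prediction set under covariate shift. It follows the generic weighted conformal recipe, not the paper's subgroup-specific estimator. The weights stand in for likelihood ratios between target and historical covariate distributions, assumed to be already estimated, and all names are hypothetical.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    scores:      nonconformity scores on historical (calibration) data
    weights:     likelihood-ratio weights of those points (assumed given)
    test_weight: likelihood-ratio weight of the new target point
    The test point's probability mass is placed at +infinity, so the
    result is infinite when the calibration mass alone cannot reach
    the 1 - alpha level.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    return math.inf


# Usage with absolute-residual scores: the interval is prediction +/- q.
q = weighted_conformal_quantile([3.0, 1.0, 2.0], [1.0, 1.0, 1.0], 1.0, 0.25)
interval = (10.0 - q, 10.0 + q)  # 10.0 is a hypothetical point prediction
```

With equal weights this reduces to ordinary split conformal prediction. Unequal weights shift the quantile toward calibration points that resemble the target population.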

2606.10409 2026-06-10 stat.ME New submission

Methods to Adjust for Covariate Measurement Error in Flexible Modelling of Functional Form: Results of a Blinded, Controlled Neutral Comparison Simulation Study

Mohammed Sedki, Aris Perperoglou, Anne C. M. Thiébaut, Steve Ferreira Guerra, Paul Gustafson, Frank E. Harrell, Willi Sauerbrei, Michal Abrahamowicz, Laurence S. Freedman

AI summary: Through a blinded, multi-stage neutral comparison simulation study, evaluates six families of measurement error correction methods combined with four flexible regression models for estimating non-linear associations; pointwise SIMEX is the most accurate and robust, followed by Bayesian methods and regression calibration, multiple imputation performs less well, and B-splines perform worst.

Abstract

Covariate measurement error is pervasive in epidemiological research and distorts estimated exposure-outcome associations, yet correction methods have been studied almost exclusively under linear modelling assumptions. Their behaviour when the underlying association is non-linear and is itself estimated with flexible regression remains poorly characterised. We report a blinded, multi-stage neutral comparison simulation study, conducted within the STRATOS initiative, evaluating measurement error correction coupled with flexible modelling of functional form. Six families of correction methods (pointwise and coefficient-based Simulation Extrapolation [SIMEX], Bayesian inference on the logit and risk scales, Multiple Imputation [MI], and Regression Calibration [RC]) were each combined with B-splines (BS), penalised splines (PS), fractional polynomials (FP), and natural splines (NS), yielding 23 analytic methods. Methods were applied to case-control data generated under five functional forms (J-shape, linear, two threshold models, and saturation) across simulated datasets spanning varying sample sizes, replication substudy sizes, error magnitudes, and error distributions, with classical additive error and a replication substudy for error calibration. Performance was assessed by the log mean squared error of the estimated function over the central 95% of the exposure distribution. Pointwise SIMEX was the most accurate and most robust approach overall, followed by Bayesian methods and RC when paired with PS, FP, or NS; MI performed less well, and Bayesian estimation with unpenalised BS performed worst. PS, FP, and NS were near-equivalent, whereas BS was consistently inferior. No single method dominated across all scenarios, underscoring the value of sensitivity analyses.

2606.10096 2026-06-10 stat.ME New submission

Estimating the Wasserstein barycenter of one-dimensional distributions under sparse sampling

James Peng, Florian Stijven, Linbo Wang, Peter B. Gilbert

AI summary: For data in which each unit's one-dimensional distribution is observed only through a small i.i.d. sample, proposes the marginal-constructed barycenter (MCB) estimator, which estimates the distribution of latent quantiles with binomial mixture methods, overcomes the bias of the empirical Wasserstein barycenter under sparse sampling, and is shown to be consistent and asymptotically normal.

Abstract

We study distributional data under sparse sampling where each unit is represented by a probability distribution on the real line observed only through a small i.i.d. sample. A natural notion of central tendency for one-dimensional distributional data is the Wasserstein barycenter, whose quantile function is the pointwise average of the unit-level quantile functions. We focus on pointwise estimation of the Wasserstein barycenter quantile function: at a given quantile level, the target is the population mean of the corresponding unit-level quantiles. A naive plug-in estimator is the empirical Wasserstein barycenter, which treats observed unit-level empirical distributions as the true latent unit-level distributions. Under sparse sampling, however, this estimator can be severely biased. We propose an approach that avoids directly estimating either the unit-level distributions or the full population law of distributions. We start with the more ambitious goal of characterizing the distribution of latent unit-level quantiles at a given quantile level. We show that this distribution can be written in terms of the marginal distributions of the unit-level CDF values, which can be estimated using binomial mixture methods. This motivates our estimator, the marginal-constructed barycenter (MCB) estimator, obtained by taking the mean of the estimated distribution of latent unit-level quantiles. We establish conditions under which the MCB estimator is pointwise consistent and asymptotically normal, and show through simulations that it can substantially outperform the empirical Wasserstein barycenter under sparse sampling. We illustrate the method in an analysis of HIV-1 sequence data from the HVTN 502/503 vaccine efficacy trials, using the barycenter to summarize and compare within-participant distributions of viral sequence features when only a small number of sequences are available per participant.
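The naive plug-in estimator mentioned in this abstract, the empirical Wasserstein barycenter evaluated at one quantile level, can be sketched as follows. This is the baseline the abstract describes as biased under sparse sampling, not the MCB estimator. The function names and the left-continuous quantile convention are assumptions made for illustration.

```python
import math


def empirical_quantile(sample, p):
    """Left-continuous empirical quantile: smallest x with ECDF(x) >= p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(sample)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based order statistic
    return ordered[rank - 1]


def empirical_barycenter_quantile(samples, p):
    """Average of the unit-level empirical quantiles at level p.

    In one dimension the Wasserstein barycenter's quantile function is the
    pointwise mean of the unit-level quantile functions, so the plug-in
    estimate at level p is just this average.
    """
    quantiles = [empirical_quantile(sample, p) for sample in samples]
    return sum(quantiles) / len(quantiles)


# Usage: two units, four observations each, evaluated at the median level.
units = [[1, 2, 3, 4], [10, 20, 30, 40]]
median_level_estimate = empirical_barycenter_quantile(units, 0.5)
```

When each unit contributes only a handful of observations, each empirical quantile is a noisy and generally biased stand-in for the latent quantile, which is the failure mode the abstract's estimator is designed to address.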

2606.10093 2026-06-10 stat.AP stat.ME New submission

Predicting Hospitalization from a Whole-Person Health Score with Incomplete Electronic Health Records Data: A Case Study

Grayson E. Weavil, Joseph Rigdon, Sarah C. Lotspeich

AI summary: Uses statistical modeling and machine learning to compute the allostatic load index (ALI) from incomplete electronic health records and evaluate how well it predicts hospitalization; the pattern submodel approach performs best in sample (AUC = 0.73) but cross-validates less well (AUC = 0.63).

Comments 13 pages, 5 figures, 2 tables, R code and simulated dataset available on GitHub

Abstract

Embedding a standardized whole-person health measure in electronic health records (EHR) could be instrumental to preventative care. The allostatic load index (ALI), calculated from ten component stressors across three body systems, offers a promising snapshot of holistic health. The ALI can be calculated from EHR data, but many components are missing, since not all patients undergo all tests. Using statistical modeling and machine learning, EHR data for $1000$ patients from a large academic health system were used to predict in-patient hospitalization (as a count or binary) from ALI, controlling for age and sex. Various methods were evaluated to fill in information gaps for patients' missing ALI components, including summary measures combining components or using them separately. Performance was measured using receiver operating characteristic (ROC) curves and corresponding areas under the ROC curve (AUC). Count modeling of hospitalization did not improve upon binary, and logistic regression beat random forest. Overall, summary measures performed similarly, with the complete-case proportion (i.e., the proportion of non-missing components that were "unhealthy") performing best (AUC $= 0.64$) but by $\leq 0.01$. When using components separately, the pattern submodel approach most accurately predicted hospitalization (AUC $= 0.73$) in sample, but did not cross-validate as well (AUC $= 0.63$). All summary measures performed similarly. However, when including the ALI components separately, tailoring models to subsets of patients with the same missing data pattern performed best. Next steps include EHR implementation to enable prediction and support clinician decision-making at scale.
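The AUC values quoted in this abstract summarize how well predicted risks rank hospitalized patients above non-hospitalized ones. A minimal sketch of that metric, using the pairwise (Mann-Whitney) formulation, is given below. The function name and the toy data are hypothetical and unrelated to the study's dataset or its R code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case, counting ties
    as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Usage: 1 = hospitalized, 0 = not hospitalized; scores are predicted risks.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to ranking no better than chance, which gives a reference point for reading the in-sample and cross-validated values reported above.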
