arXivDaily: arXiv daily academic digest, updated Monday to Friday
CS (Computer Science): 953
2606.07469 2026-06-08 econ.EM cs.NA econ.TH math.NA math.PR New submission

Statistical and Numerical Convergence in Stochastic Equilibrium

David Staines

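AI summary: Building on the rigorous stochastic equilibrium theory of SELCKE, the paper finds that the system converges geometrically to long-run equilibrium at a rate given by the greater of the eigenvalue or inverse eigenvalue closest to the unit circle and the maximum shock persistence, and develops a simulation procedure to test whether stochastic equilibrium exists.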

Comments
91 pages: 63 main text, 28 supplementary materials
Abstract

This paper sets out the most general computational and econometric implications of the rigorous stochastic equilibrium theory from SELCKE (Staines (2024a)) arXiv:2312.16214. The analytical backbone is the discovery that the system converges geometrically to long-run equilibrium, at a rate given by the greater of the eigenvalue or inverse eigenvalue (from outside) closest to the unit circle and the maximum shock persistence. High-order shocks converge faster. I develop a simulation procedure to test, with asymptotic power, whether stochastic equilibrium exists for a particular model. The fundamental approximation result asserts that, whatever the order of expansion or loss function, the stochastic steady state delivers the most accurate perturbation solution. I also show that super-consistent parameter estimators, converging at rate $O(1/T)$, arise whenever second-order terms vanish. Besides Calvo, I study stochastic equilibrium in two alternative pricing models, where dynamics simplify considerably. I bound the time at which the impulse response peaks by the maximum lag in the errors. This lends empirical support to Taylor contracts, although there are issues surrounding unit roots and the strong cost channel. For menu costs, I demonstrate that the initial price distribution decays away super-exponentially, producing a system equivalent to Calvo with an endogenous reset probability. The impact of idiosyncratic disturbances appears as an additional wedge between actual and efficient output. Blow-up of the objective function at the boundary is proven with the help of new distributional arguments, so the model meets existing eigenvalue existence conditions for the recursive equilibrium. Along the way, new light is shed on existing theoretical models and statistical procedures.
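The convergence-rate statement can be read as a small computation. The sketch below is one possible reading under stated assumptions, not the author's procedure: the relevant eigenvalues are given as a list (those strictly inside the unit circle enter as |λ|, those strictly outside as 1/|λ|), shock persistences are AR(1) coefficients, and the distance to long-run equilibrium then shrinks roughly like `rate**t`. `convergence_rate` and `half_life` are hypothetical helpers.

```python
import numpy as np


def convergence_rate(eigenvalues, shock_persistences):
    """Hypothetical helper: geometric convergence rate as read from the abstract.

    Eigenvalues strictly inside the unit circle contribute |lambda|; those strictly
    outside contribute 1/|lambda| ("inverse eigenvalue from outside"). Unit roots are
    excluded. The largest candidate (closest to the unit circle) is compared with the
    maximum shock persistence, and the greater of the two is returned.
    """
    mags = np.abs(np.asarray(eigenvalues, dtype=complex))
    inside = mags[mags < 1.0]
    outside = mags[mags > 1.0]
    candidates = np.concatenate([inside, 1.0 / outside])
    eig_rate = float(candidates.max()) if candidates.size else 0.0
    shock_rate = float(np.max(np.abs(shock_persistences))) if len(shock_persistences) else 0.0
    return max(eig_rate, shock_rate)


def half_life(rate):
    """Periods until the distance to equilibrium halves, assuming decay like rate**t."""
    return np.log(0.5) / np.log(rate)


# Eigenvalues 0.5 (inside) and 1.25 (outside, contributing 1/1.25 = 0.8);
# shocks with persistence 0.9 and 0.7. Here rate == 0.9: the shock persistence dominates.
rate = convergence_rate([0.5, 1.25], [0.9, 0.7])
print(rate, half_life(rate))
```

Taking the maximum reflects that the slowest-decaying component governs the overall geometric rate.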

2606.07463 2026-06-08 eess.SP cs.CE cs.LG New submission

Amortized Neural Optimization for Pre-Layout Signal Integrity Design Space Exploration using Differentiable Surrogates

Julian Withöft, Werner John, Emre Ecik, Ralf Brüning, Jürgen Götze

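AI summary: Proposes an amortized neural optimization (ANO) framework that replaces iterative black-box optimization with a differentiable neural network surrogate, obtaining near-optimal design parameters in a single forward pass and achieving speedups of three to four orders of magnitude in scenarios such as DDR5 DFE and SerDes equalization.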

Comments
16 pages, 20 figures, 8 tables
Abstract

Pre-layout design space exploration (DSE) for high-speed signal integrity (SI) analysis is often limited by the computational cost of simulations and iterative optimization algorithms within modern electronic design automation (EDA) workflows. While machine learning surrogate models accelerate the simulation step, optimizing designs still relies on iterative black-box search methods. This iterative nature scales poorly, making multi-corner sweeps computationally expensive. As a solution, this paper proposes amortized neural optimization (ANO) for pre-layout SI design. ANO entirely eliminates iterative black-box inference by using fully differentiable neural network surrogate models. ANO extracts analytical gradients from the surrogate to train a global optimization policy. Instead of solving the optimization problem repeatedly at inference, the optimization process is learned offline and therefore amortized. Once the ANO policy is trained, it maps different channel contexts directly to near-optimal design parameters in a single deterministic forward pass. The efficiency and accuracy of the ANO framework are demonstrated on three complex SI design scenarios: DDR5 decision feedback equalization (DFE), 9-dimensional SerDes Tx/Rx co-equalization, and DDR3 DQS differential pair routing that optimizes eye diagram metrics under intra-pair skew constraints. At the cost of roughly 10% in optimality relative to instance-specific black-box algorithms, ANO achieves speedups of three to four orders of magnitude. For a large-scale 320,000-instance multi-corner SerDes sweep, ANO collapses what would have taken days of computation with iterative search algorithms into a single batched forward pass that completes in milliseconds. This turns computationally expensive SI optimization into real-time, interactive pre-layout DSE.
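A toy version of the amortization idea, under loudly stated assumptions: the "surrogate" is a hypothetical quadratic score whose optimum depends on the context, and the "policy" is a linear map rather than a neural network. What it shows is the structure: surrogate gradients with respect to the design parameters are chained through the policy during offline training, after which many contexts are handled in one batched forward pass with no per-instance search. None of the names below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
CTX_DIM, PARAM_DIM = 4, 3  # hypothetical context and design-parameter sizes

# Hypothetical differentiable surrogate: score (higher is better) of design x under
# channel context c. Its optimum x* = A @ c depends on the context.
A = rng.normal(size=(PARAM_DIM, CTX_DIM))


def surrogate_score(x, c):
    return -np.sum((x - c @ A.T) ** 2, axis=-1)


def surrogate_grad_x(x, c):
    # Analytical gradient of the score with respect to the design parameters.
    return -2.0 * (x - c @ A.T)


# Policy: linear map from context to design parameters (stand-in for a neural network).
W = np.zeros((PARAM_DIM, CTX_DIM))


def policy(c):
    return c @ W.T


# Offline training: gradient ascent on the mean surrogate score, chained through the policy.
lr = 0.05
for step in range(2000):
    c = rng.normal(size=(64, CTX_DIM))      # batch of sampled channel contexts
    g_x = surrogate_grad_x(policy(c), c)    # d score / d x, shape (64, PARAM_DIM)
    g_W = g_x.T @ c / len(c)                # d mean score / d W, shape (PARAM_DIM, CTX_DIM)
    W += lr * g_W

# Inference: one batched forward pass over many new contexts, no per-instance search.
contexts = rng.normal(size=(320_000, CTX_DIM))
designs = policy(contexts)
```

In this toy the policy recovers the exact optimum map; with a real surrogate and a nonlinear policy, a gap to instance-specific optimization (the abstract reports roughly 10%) is expected.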

2606.07381 2026-06-08 eess.IV cs.AI cs.CV New submission

Impact of Synthetic Lesional MR Images in Automated Focal Cortical Dysplasia Detection in Low-Data Scenarios

Prabhjot Kaur, Hakim Ouaalam, Sedat Kandemirli, Sanjay P. Prabhu, Simon K. Warfield

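AI summary: Synthesizes FCD lesion MRI data with a conditional generative network and evaluates its realism and its effect on automated detection; synthetic data can reduce annotation needs by about 20%, but real data remain more effective.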

Abstract

Background and Purpose: Automated detection of focal cortical dysplasia (FCD) requires large volumes of voxelwise lesion-delineated MRI data, which are difficult to acquire. This study aims to generate synthetic MRI data exhibiting FCD, assess their realism, and evaluate their impact on automated FCD detection, particularly in reducing the need for manual annotations. Methods: T1-weighted (T1w) and T2-weighted Fluid-Attenuated Inversion Recovery (FLAIR) MRI scans from 131 FCD patients and 90 healthy controls from three sites were retrospectively studied. Synthetic MRIs were generated by conditioning a generative network on binary FCD masks. Two neuroradiologists attempted to identify the real images in a random set of 14 real and 14 synthetic scans. Three nnU-Net models were trained to detect FCD using: (i) real-only data (35 FCD / 35 controls), (ii) real data (35 FCD / 35 controls) plus synthetic augmentation, and (iii) expanded real data (70 FCD / 70 controls). Results: Experts showed limited ability to distinguish real from synthetic images, with classification accuracy of 60% for T1w and 70% for FLAIR (inter-rater agreement kappa = 0.86). Adding synthetic augmentation to detector training increased sensitivity by 8.14% (p = 0.12) and improved model confidence at true lesion sites (from 0.83 ± 0.11 to 0.89 ± 0.12; p = 0.02). The expanded real-data model further improved sensitivity to 73.8% (p < 0.001) and confidence to 0.90 ± 0.14 (p = 0.01). Conclusion: Conditional generative networks can generate realistic synthetic FCD-MRIs, reducing labeled data needs by approximately 20% while maintaining equivalent sensitivity. Equivalent amounts of real data, when available, remain more effective than synthetic augmentation.
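The abstract reports inter-rater agreement as a kappa statistic without naming the variant. Assuming Cohen's kappa for two raters (observed agreement corrected for the agreement expected by chance), a minimal sketch of both reported reader metrics, with hypothetical helper names, is:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (e.g. 'real'/'synthetic')."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same label.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1.0 - p_expected)


def accuracy(calls, truth):
    """Fraction of scans whose real/synthetic status was called correctly."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


reader_1 = ["real", "real", "synthetic", "synthetic", "real"]
reader_2 = ["real", "synthetic", "synthetic", "synthetic", "real"]
print(cohens_kappa(reader_1, reader_2))
```

Note that kappa measures how much the two readers agree with each other, not whether they are right, so high agreement can coexist with modest accuracy.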

2606.07374 2026-06-08 eess.SP cs.CV New submission

Beyond Backscatter: InSAR coherence from detected SAR images

Francescopaolo Sica, Andrea Pulella, Michael Schmitt

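AI summary: Proposes a deep learning framework that regresses coherence directly from detected SAR images without precise coregistration, using a Residual U-Net to learn the relationship between backscatter magnitude and coherence; validation on multiple datasets shows improved accuracy and generalization for high-resolution coherence regression.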

Comments
27 pages, 20 figures
Abstract

In this work, we propose a deep learning framework for coherence regression directly from detected SAR images, without the need for accurate coregistration. A Residual U-Net is trained using coherence maps derived from precisely coregistered Sentinel-1 SLC data to learn the relationship between backscatter magnitudes and coherence. The model is trained on 12-day SLC pairs and evaluated across different datasets, including coregistered SLC products and open-access analysis-ready data, covering diverse radiometric properties, geometries, and locations. Experimental results demonstrate that the proposed method achieves high-resolution coherence regression with improved accuracy compared to existing intensity-based approaches. The network generalizes well across diverse geographical locations and even across temporal baselines that were never seen at training time. Additionally, because it can operate on globally available analysis-ready data, such as ground range detected (GRD) data distributed through Google Earth Engine, the method can be applied at large scale in mission design, change monitoring, and diverse mapping tasks.
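For reference, coherence targets of the kind the network learns to regress are conventionally estimated from a coregistered complex SLC pair as |<s1 s2*>| / sqrt(<|s1|^2> <|s2|^2>), where <.> is a local spatial average. The sketch assumes a simple boxcar window (the abstract states neither the estimator nor the window size) and uses `scipy.ndimage.uniform_filter`. It does not reproduce the network itself, which works from detected images only.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def box_mean(x, size):
    """Boxcar moving average; complex input is filtered part by part."""
    if np.iscomplexobj(x):
        return uniform_filter(x.real, size=size) + 1j * uniform_filter(x.imag, size=size)
    return uniform_filter(x, size=size)


def coherence(s1, s2, size=5):
    """Sample coherence magnitude over size x size windows.

    s1, s2: coregistered complex SLC images of equal shape.
    """
    cross = box_mean(s1 * np.conj(s2), size)
    p1 = box_mean(np.abs(s1) ** 2, size)
    p2 = box_mean(np.abs(s2) ** 2, size)
    gamma = np.abs(cross) / np.sqrt(p1 * p2 + 1e-12)
    return np.clip(gamma, 0.0, 1.0)


rng = np.random.default_rng(0)
shape = (64, 64)
s1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
noise = rng.normal(size=shape) + 1j * rng.normal(size=shape)

same = coherence(s1, s1 * np.exp(1j * 0.7))   # constant phase shift only: coherence ~ 1
independent = coherence(s1, noise, size=9)    # decorrelated pair: small, biased above 0
```

Even fully decorrelated pairs yield nonzero estimates, and this bias grows as the window shrinks. That is one reason high-resolution coherence is hard to estimate directly.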

2606.07347 2026-06-08 eess.SP cs.ET New submission

CSI Phase Averaging for High-Sensitivity Wi-Fi Sensing in Low-Multipath Environments

Toshinori Suzuki, Shin-ichiro Ogura, Yu Morishima, Hiroshi Matsuura

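AI summary: Proposes a model-driven, low-complexity motion detection method that exploits the structure of CSI phase to suppress phase offset errors and uses phase averaging to reduce noise; experiments show it can detect flying birds several meters away in a low-multipath outdoor environment.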

Comments
13 pages, 11 figures, 3 tables
Abstract

This paper presents a low-complexity motion detection method for outdoor Wi-Fi sensing based on a model-driven approach. In low-multipath propagation environments, which are generally considered disadvantageous for Wi-Fi sensing, the method exploits the structural characteristics of the phase components of the channel state information (CSI) to mitigate phase offset errors originating from the wireless devices. In addition, phase averaging provides a processing gain that reduces random noise components, including quantization and thermal noise. The theoretical basis of the method is described, and its effectiveness is evaluated experimentally using Compressed Beamforming frames obtained from commercial IEEE 802.11ac devices. The experiments focus primarily on wild crows flying in an outdoor orchard environment. The results demonstrate that the method can detect birds even when they fly several meters away from the direct line-of-sight path between the transmitter and receiver antennas. Furthermore, the results indicate that fluctuations caused by vegetation movement are negligible when the wind speed is below 3 m/s. The proposed approach is expected to be applicable not only to orchard monitoring but also to other outdoor Wi-Fi sensing applications in low-multipath environments.
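The processing gain from phase averaging can be checked numerically. This is a minimal sketch under simple assumptions: N independent phase samples, each corrupted by additive Gaussian noise of standard deviation `sigma`. The paper's actual averaging axis (for example subcarriers or packets) and its offset-mitigation step are not reproduced, and all numbers are illustrative. Averaging unit phasors (a circular mean) reduces the noise standard deviation by about sqrt(N).

```python
import numpy as np


def averaged_phase(phases, axis=-1):
    """Circular mean of phase samples (radians) along `axis`.

    Averaging unit phasors instead of raw angles avoids errors at the +/-pi wrap.
    """
    return np.angle(np.mean(np.exp(1j * np.asarray(phases)), axis=axis))


rng = np.random.default_rng(1)
true_phase = 0.3   # rad, illustrative motion-induced phase
sigma = 0.2        # rad, per-sample random phase noise (quantization + thermal, lumped)
N = 64             # number of samples averaged per estimate

trials = true_phase + sigma * rng.normal(size=(10_000, N))
single = trials[:, 0]                        # no averaging
averaged = averaged_phase(trials, axis=1)    # one averaged estimate per trial

print(np.std(single), np.std(averaged))      # roughly 0.2 and 0.2 / sqrt(64) = 0.025
```

The gain applies only to noise that is independent across the averaged samples. A phase offset common to all samples survives averaging, which is why it must be handled separately.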

2606.07259 2026-06-08 eess.AS cs.SD New submission

Assessing True Generalisability of Audio-Visual Speech Recognisers

Zhaofeng Lin, Stavros Petridis, Maja Pantic, Naomi Harte

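AI summary: By building an evaluation set strictly matched to the LRS3 test set, finds that current state-of-the-art audio-visual speech recognition models suffer an across-the-board performance collapse on unseen data, revealing insufficient generalization, and analyzes the causes of the degradation, lexical bias, and error patterns.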

Comments
Accepted to Interspeech 2026 Long paper track. 9 pages, 4 figures
Abstract

Current Audio-Visual Speech Recognition (AVSR) models achieve near-perfect performance on the standard LRS3 benchmark, raising concerns of adaptive overfitting. To systematically assess true generalisability, we construct a highly controlled, unseen evaluation set subsampled from the massive MultiVSR dataset. Unlike standard out-of-distribution benchmarks, our subset strictly matches the acoustic, visual, and demographic distributions of the LRS3 test set. Evaluating five state-of-the-art architectures reveals a universal performance collapse, proving that current systems fail to generalise even under strictly aligned conditions. Through a fine-grained attribute analysis across seven factors, we isolate the specific drivers of this degradation. Furthermore, we uncover a profound lexical bias, expose distinct error patterns, and surprisingly reveal that audio-visual performance even lags behind audio-only settings. We release our matched test set for future benchmarking.

2606.07063 2026-06-08 eess.IV cs.CV New submission

Beyond Universality: The GCC-FER Dataset and Culture-Aware Adaptation for Dynamic Facial Expression Recognition

Sonalika Singh, Jyotirindra Dandapat, Avishi Razdan, Kshipra V. Moghe, Puneet Gupta, Lalan Kumar

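AI summary: Addressing the neglect of cultural differences in dynamic facial expression recognition, introduces GCC-FER, the first large-scale global cross-cultural dataset, and designs CA-FER, a culture-aware adaptation system that mitigates cultural bias by adaptively calibrating facial representations; experiments demonstrate its effectiveness.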

Abstract

Dynamic Facial Expression Recognition (DFER) is a key enabling technology in affective computing, human-computer interaction, and intelligent multimedia systems. Despite the significant influence of cultural nuances on FER performance, most existing FER systems assume that emotional expressions are universally consistent across populations. In practice, expressions vary across cultures, and this variation can be attributed to systematic differences in facial muscle activation patterns. A major challenge in advancing cross-cultural FER lies in the scarcity of culturally diverse benchmark datasets. To address this, a new hybrid multicultural video dataset termed Global Cross-Cultural Facial Expression Recognition (GCC-FER) is introduced. GCC-FER comprises 23,934 video samples spanning four cultural groups (African, Caucasian, East Asian, and South Asian) across seven basic expressions, combining psychologist-supervised in-house data collection for underrepresented populations with rigorous ethnicity filtering of existing sources. To the best of our knowledge, GCC-FER is the first large-scale global cross-cultural DFER dataset designed to address these demographic gaps. Leveraging this dataset, behaviorally grounded cultural priors are derived for each cultural group, along with a global prior for practical deployment. A Culture-Aware FER (CA-FER) system is proposed to mitigate cultural bias by adaptively recalibrating latent facial representations. Extensive experiments on GCC-FER and DFEW demonstrate that the proposed system consistently improves FER performance across multicultural settings.

2606.06983 2026-06-08 eess.IV cs.AI cs.CV New submission

DaX: Learning General Pathology Representations Across Scales

Bokai Zhao, Yiyang Zhang, Long Bai, Tai Ma, Hanqing Chao, Minfeng Xu

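AI summary: Proposes DaX, a pathology vision foundation model that extends DINOv3 self-supervised learning with designs such as continuous magnification training and cross-scale tissue views, achieving the best mean performance on 161 clinical tasks from 44 public datasets.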

Abstract

Computational pathology requires visual representations that transfer across diverse clinical endpoints and remain robust to variation in magnification, staining, scanner type, slide preparation, and input resolution. We present DaX, a pathology vision foundation model that adapts DINOv3-style self-supervised learning to whole-slide histopathology. DaX is initialized from natural-image DINOv3 weights and incorporates continuous magnification training, cross-scale tissue views, orientation-agnostic and acquisition-robust augmentation, multi-input-size training, and Gram-anchored dense consistency. These designs aim to connect local cellular morphology with global tissue architecture while stabilizing dense token-level representations across input scales. We further construct a WSI-level benchmark comprising 161 clinically meaningful tasks from 44 public datasets, covering 28,182 patients and 34,394 slides across four clinical domains and nine task categories. All models are evaluated under a fixed patient-level cross-validation protocol with fold-level statistical ranking, enabling reproducible comparisons that are less sensitive to split-dependent variation. Across this benchmark, DaX achieves the highest mean performance across tasks and consistently strong task-level ranking scores, with gains spanning diagnostic pathology, biomarker and molecular profiling, tissue/specimen context, and risk, response, and prognosis. These results support DaX as a transferable visual encoder for computational pathology and provide a standardized evaluation framework for future pathology foundation models. Project page: https://alibaba-damo-academy.github.io/DaX/benchboard/.
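The abstract does not specify DaX's Gram-anchored dense consistency. As an assumption, the sketch follows the general Gram-anchoring idea introduced with DINOv3: compare the patch-to-patch cosine-similarity (Gram) matrix of the student's dense tokens with that of an anchor model, so that only the relative structure among tokens is constrained. Token counts and widths below are illustrative, and the function names are hypothetical.

```python
import numpy as np


def gram(tokens):
    """Patch-to-patch cosine-similarity matrix: (P, D) tokens -> (P, P)."""
    x = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    return x @ x.T


def gram_anchor_loss(student_tokens, anchor_tokens):
    """Squared Frobenius distance between student and anchor Gram matrices.

    Only pairwise similarity structure is compared, so the student and the
    anchor may have different feature widths D.
    """
    diff = gram(student_tokens) - gram(anchor_tokens)
    return float(np.sum(diff ** 2))


rng = np.random.default_rng(0)
anchor = rng.normal(size=(196, 64))                # e.g. 14 x 14 patch tokens
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))     # random orthogonal matrix
rotated = 3.0 * anchor @ q                         # rotated and rescaled features
unrelated = rng.normal(size=(196, 128))

print(gram_anchor_loss(rotated, anchor))           # ~0: same similarity structure
print(gram_anchor_loss(unrelated, anchor))         # clearly > 0
```

Because the loss is invariant to rotations and rescaling of the feature space, it constrains the dense token structure without pinning the features to specific values.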

2606.06907 2026-06-08 eess.AS cs.AI cs.SD New submission

SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models

Seonuk Kim, Yonghyeon Jun, Ju Yeon Kang, Jimin Hong, Yoonhyeong Lee, Nam Soo Kim

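AI summary: Targeting the spectrotemporal perception weaknesses of large audio language models, proposes SpectCount, a data-efficient fine-tuning method that uses fully synthetic audio signals generated on the fly, requires no real audio or annotations, and significantly improves performance on multiple auditory benchmarks.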

Comments
5 pages, 5 figures
Abstract

Large audio language models (LALMs) extend large language models with an audio encoder and large-scale audio data. However, the scarcity of high-quality annotated audio data remains a fundamental bottleneck for scaling. Through a probing analysis of signal detectability, we identify fine-grained spectrotemporal perceptual weaknesses in a foundation LALM. To address these weaknesses, we propose Spectrotemporal Counting (SpectCount), a data-efficient fine-tuning approach based on fully synthetic audio signals generated on the fly, without relying on real-world audio, annotations, or pretrained generative models. SpectCount not only resolves the observed weaknesses but also improves performance on diverse auditory benchmarks spanning sound, music, and speech that were unseen during fine-tuning. These results suggest that weakness-targeted synthetic signals provide a data-efficient path toward enhanced auditory understanding in LALMs.
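The abstract does not describe SpectCount's signal generator, so the following is only a hypothetical illustration of on-the-fly synthetic counting data. Each example is a short signal containing K windowed tone bursts in non-overlapping time slots, paired with a counting question whose answer K is known by construction, so no annotation is needed. The sample rate, burst length, frequency range, and question text are all assumptions.

```python
import numpy as np


def make_counting_example(rng, sr=16_000, duration=2.0, max_events=6):
    """Hypothetical generator: audio with K tone bursts; the label is K."""
    n = int(sr * duration)
    audio = np.zeros(n)
    k = int(rng.integers(1, max_events + 1))
    burst_len = int(0.1 * sr)
    # Non-overlapping slots keep every event separable in time.
    slots = rng.choice(n // burst_len, size=k, replace=False)
    t = np.arange(burst_len) / sr
    window = np.hanning(burst_len)
    for slot in slots:
        freq = rng.uniform(200.0, 4000.0)   # random tone frequency per burst
        start = slot * burst_len
        audio[start:start + burst_len] += window * np.sin(2 * np.pi * freq * t)
    question = "How many tones do you hear?"
    return audio.astype(np.float32), question, k


audio, question, label = make_counting_example(np.random.default_rng(0))
```

Since the label comes from the generator itself, an unlimited stream of examples can be produced during fine-tuning. Varying frequency and timing in the same way would target spectral as well as temporal resolution.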

2606.06847 2026-06-08 eess.IV cs.CV New submission

Physics-Driven Semantic Scattering Structure Understanding of Aircraft Target in SAR Images

Yifei Yin, Xiaogang Yu, Hao Shi, Liang Chen, Wei Li

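AI summary: To address unstable scattering-center representations and missing weakly scattering components of aircraft targets in SAR images, proposes S3U-SAR, a physics-driven framework that defines semantic scattering keypoints and uses multi-dimensional physical prior constraints to reconstruct the complete topological structure, achieving the best performance on the benchmark dataset.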

Abstract

Synthetic aperture radar (SAR) has become indispensable for target interpretation owing to its all-day, all-weather observation capability. In SAR target interpretation, electromagnetic scattering information provides a physically grounded cue beyond visual texture and has been widely exploited. However, existing methods remain dominated by local scattering center representations. Such unordered and component-agnostic representations are highly unstable for aircraft targets. As a result, physically existing components with weak scattering responses are often missed, leaving the reconstructed topology incomplete. To address this limitation, we establish Semantic Scattering Structure Understanding as a new paradigm for SAR aircraft interpretation. Semantic scattering keypoints are defined to associate local electromagnetic responses with physically meaningful aircraft components, while visibility-aware attributes are introduced to retain weakly observable yet physically existing components. The keypoints are further organized into a stable semantic scattering structure. Building on this, we propose S3U-SAR, a physics-driven framework that localizes semantic scattering keypoints and constructs a complete representation constrained by multi-dimensional physical priors, namely scattering heterogeneity, rigid-body topology, and speckle uncertainty. A confidence-gated joint supervision strategy is further introduced to alleviate optimization conflicts. We construct KP-SAR-Aircraft-1.0, the first fine-grained benchmark for semantic scattering structure understanding. Extensive experiments demonstrate that S3U-SAR achieves the best performance compared with baselines. Cross-category and cross-dataset evaluations further verify its robustness and transferability.

2606.06837 2026-06-08 eess.AS cs.LG New submission

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

Vsevolod Kovalev, Pranay Manocha

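AI summary: Proposes SEAM, a framework that combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone to achieve 0.971 ROC-AUC with 8-second windows, and exposes the problem of shortcut learning.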

Comments
Accepted to Interspeech 2026
Abstract

Scripted vs. spontaneous speech detection is appealing for interview guardrails, but benchmark performance can be inflated by shortcuts tied to corpus identity, channel conditions, and recording artifacts rather than to speaking style itself. We present SEAM, a shortcut-aware framework for real-time scriptedness detection that combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone. With 8 s windows, the model achieves 0.971 ± 0.004 ROC-AUC on an external interview-domain evaluation set. Removing the shortcut-prevention components improves internal held-out metrics but sharply reduces external performance, indicating shortcut learning. Post-training quantization reduces the model footprint to 41.8 MB with little loss in external performance. The results demonstrate that robust real-time scriptedness detection depends not only on the backbone but also on shortcut-aware data design and evaluation. We release code and model checkpoints.

2606.06795 2026-06-08 eess.AS cs.SD New submission

BiEAR: A Human Auditory-Inspired Adaptive Binaural Front-end for Multi-Speaker Localisation and Distance Estimation

Hanyu Meng, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Qiquan Zhang, Haizhou Li

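AI summary: Proposes BiEAR, a human auditory-inspired adaptive binaural front-end that uses a neural controller to dynamically adjust the frequency selectivity of a filterbank, improving the accuracy and robustness of multi-speaker localisation and distance estimation.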

Comments
Accepted to INTERSPEECH 2026
Abstract

We present BiEAR, a human auditory-inspired adaptive binaural front-end for multi-speaker localisation and distance estimation. Inspired by medial olivocochlear (MOC) feedback in human hearing, BiEAR uses a neural controller to adaptively adjust the frequency selectivity of a binaural auditory filterbank during inference. This yields time-frequency adaptive representations for both ears, enabling the model to respond to changing acoustic conditions. We evaluate BiEAR on multi-speaker localisation and distance estimation in anechoic and real-room environments. Results show that the adaptive front-end improves localisation accuracy and robustness to unseen speakers and rooms compared with commonly used fixed binaural front-ends. Visualisation and analysis of learned filter adaptations show that BiEAR emphasises informative frequency bands over time. These findings suggest that adaptive, biologically inspired binaural front-ends can improve machine hearing robustness in complex acoustic scenes.
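To make "adjustable frequency selectivity" concrete, the sketch builds a standard 4th-order gammatone filterbank with ERB bandwidths (Glasberg and Moore) and lets a per-channel factor `bw_scale` stretch or shrink each bandwidth. `bw_scale` is a stand-in for the output of BiEAR's neural controller. The abstract does not say how the real controller parameterises the filters, so this is an assumption. The same bank would be applied to the left and right ear signals.

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)


def erb_rate_to_hz(rate):
    return (10.0 ** (rate / 21.4) - 1.0) / 4.37e-3


def erb_space(f_lo, f_hi, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    return erb_rate_to_hz(np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n))


def gammatone_bank(centres, bw_scale, sr=16_000, dur=0.05, order=4):
    """Peak-normalised gammatone impulse responses, shape (channels, samples).

    bw_scale multiplies each channel's bandwidth: < 1 sharpens the filter,
    > 1 broadens it. It stands in for a (hypothetical) controller output.
    """
    t = np.arange(int(sr * dur)) / sr
    b = 1.019 * erb(centres) * bw_scale
    irs = (t ** (order - 1) * np.exp(-2 * np.pi * b[:, None] * t)
           * np.cos(2 * np.pi * centres[:, None] * t))
    return irs / np.max(np.abs(irs), axis=1, keepdims=True)


centres = erb_space(100.0, 7000.0, 32)
gains = np.ones_like(centres)
gains[8:16] = 0.6                        # e.g. sharpen a group of mid-frequency channels
bank = gammatone_bank(centres, gains)    # filter each ear signal, e.g. with np.convolve
```

Narrower bandwidths give longer impulse responses. Adapting selectivity therefore trades frequency resolution against temporal resolution, channel by channel.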

2606.06725 2026-06-08 eess.IV cs.CV New submission

Compute-Optimal Network Design for Echocardiography Myocardial Segmentation and Perfusion Quantification using Neural Scaling Laws

Clara Rodrigo González, Matthieu Toulemonde, Lasha Gvinianidze, Cameron A. B. Smith, Oscar Bates, Roxy Senior, Fu Siong Ng, Meng-Xing Tang

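AI summary: Applies neural scaling laws to predict myocardial segmentation performance and determine optimal network sizes on the CAMUS and CEUS datasets, achieving state-of-the-art performance with a 240-fold reduction in parameters; automatic segmentation performs equivalently to a senior cardiologist for myocardial perfusion quantification.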

Comments
15 pages, 4 figures, 5 tables, journal
Abstract

Myocardial perfusion quantification using contrast-enhanced ultrasound offers a bedside, non-ionizing alternative to nuclear imaging modalities. However, its clinical adoption is hindered by time-consuming manual labelling. Automated segmentation has proved challenging due to a paucity of in-domain training data. Adapting strategies currently used to optimise large language models on large datasets, we apply neural scaling laws to predict network performance for myocardial segmentation. We extrapolate performance on subsets of the data to determine the optimal network size on the CAMUS echocardiography dataset and on a 25-patient contrast-enhanced ultrasound (CEUS) dataset. Finally, we validate the clinical utility of our models by comparing the final myocardial perfusion parameters with those obtained by a senior cardiologist. Extrapolation based on the scaling law predicts test loss at the full dataset size, allowing us to select two networks that achieve state-of-the-art performance on CAMUS with a 240-fold reduction in parameter count. We observe that the gradient of the scaling law transfers from CAMUS to the CEUS dataset, albeit with a bias in the predicted losses. The automatically segmented masks perform equivalently to a senior cardiologist in myocardial perfusion quantification. These results establish neural scaling laws as a practical tool for data-driven, compute-optimal model design on small imaging datasets.
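The extrapolation step can be illustrated with the simplest scaling-law form: a pure power law L(D) = a·D^(-b) in the training-set size D, fitted by least squares in log-log space. This form is an assumption. The paper may use a different one, for example with an irreducible loss term, or may also scale model size. Under this form, the log-log slope is -b. A "gradient that transfers with a bias" would then correspond to a similar b with a different a. The losses below are synthetic, not the paper's.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit L(D) = a * D**(-b) by linear least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(a, b, size):
    return a * np.asarray(size, dtype=float) ** (-b)


# Hypothetical test losses measured on growing training subsets (not the paper's data).
subset_sizes = np.array([50, 100, 200, 400])
subset_losses = 2.0 * subset_sizes ** -0.3

a, b = fit_power_law(subset_sizes, subset_losses)
full_size_loss = predict_loss(a, b, 1600)   # extrapolate to the full dataset size
```

With real, noisy losses, fitting on the smallest subsets and checking the prediction against a held-out larger subset is a cheap sanity check before trusting the extrapolation.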

2606.06540 2026-06-08 eess.IV cs.CV New submission

ErA: Error-Aware Deep Unrolling Network for Single Image Defocus Deblurring

Tu Vo, Chan Y. Park

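AI summary: Proposes ErA, which jointly learns a compact kernel basis and per-pixel weights and uses an error-aware term in Augmented Lagrangian unrolling, with alternating updates and ResUNet denoisers, to correct kernel estimation errors, achieving state-of-the-art performance on multiple datasets.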

Abstract

We introduce ErA (Error-Aware Deep Unrolling Network), an end-to-end framework for single-image defocus deblurring. ErA jointly learns a compact kernel basis and per-pixel weights, while an error-aware term in Augmented Lagrangian unrolling corrects kernel estimation errors via alternating updates and ResUNet denoisers. It achieves state-of-the-art PSNR/SSIM on DPDD, RealDOF, and RTF, and shows strong generalization on CUHK, for which no ground truth is available.
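A compact kernel basis plus per-pixel weights describes a forward model of spatially varying defocus blur: each blurred pixel is a weighted mix of the image convolved with a few basis kernels. The sketch covers only that forward model. It assumes disk-shaped basis kernels and weights that sum to one per pixel, and does not reproduce ErA's error-aware Augmented Lagrangian unrolling, which inverts the model.

```python
import numpy as np
from scipy.ndimage import convolve


def disk_kernel(radius, size=9):
    """Normalised disk (defocus) kernel; radius 0 gives an identity kernel."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    k = (xx ** 2 + yy ** 2 <= radius ** 2).astype(float)
    return k / k.sum()


def spatially_varying_blur(image, basis, weights):
    """Blur as a per-pixel mixture of basis-kernel responses.

    image:   (H, W) sharp image
    basis:   (K, s, s) kernel basis
    weights: (K, H, W) per-pixel mixing weights, assumed to sum to 1 over K
    """
    responses = np.stack([convolve(image, k, mode="reflect") for k in basis])
    return np.sum(weights * responses, axis=0)


basis = np.stack([disk_kernel(r) for r in (0, 1, 2, 4)])
h, w = 32, 32
# Hypothetical depth layout: left half in focus (kernel 0), right half strongly defocused (kernel 3).
weights = np.zeros((len(basis), h, w))
weights[0, :, : w // 2] = 1.0
weights[3, :, w // 2 :] = 1.0
image = np.random.default_rng(0).random((h, w))
blurred = spatially_varying_blur(image, basis, weights)
```

Representing blur with K shared kernels and K weight maps is far cheaper than storing a separate kernel for every pixel. Errors in the estimated weights or kernels are what an error-aware correction term would have to absorb.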

2606.06534 2026-06-08 eess.IV cs.AI New submission

Attention Consistent Longitudinal Medical Visual Question Answering Guided by Vision Foundation Models

Jialin Wu, Qianru Zhang, Georges El Fakhri, Xiaofeng Liu

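AI summary: Proposes an attention-guided encoder-decoder framework that combines lightweight registration, adaptive mask generation, and auxiliary loss functions for longitudinal medical visual question answering on chest X-rays, achieving strong performance on the Medical-Diff-VQA benchmark.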

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 6448-6458
Comments
Accepted to CVPR 2026 Workshop PHAROS-AIF-MIH
Abstract

Longitudinal medical visual question answering (VQA) requires reasoning about anatomical differences between an image from a current time point and an image from a reference time point. We propose an attention-guided encoder-decoder for this task with chest X-rays. Instead of conventional direct contrast, we include a lightweight affine registration module that reduces nuisance motion by co-registering the current image to the reference image, with a small registration regularizer. The registered image pair is fed into the image encoder, followed by a frozen DINO-based mask generator and a trainable adaptive mask generator that produce masks applied to the original image pairs. The masked image pairs are fed into the image encoder again, and the resulting features are concatenated with text features as the input to a multimodal transformer-based decoder that generates the final answers. To stabilize learning and clarify the change signal, and inspired by DINO-v3, we add auxiliary objectives, including a mask rebuilding loss, a pairwise Gram-style consistency loss, and a KoLeo uniformity loss, which enhance the geometry of the representation. On the Medical-Diff-VQA benchmark, the model delivers strong BLEU, ROUGE-L, CIDEr, and METEOR scores while offering intrinsic interpretability through the shared saliency mask. These results support saliency-conditioned generation with mild pre-alignment as a principled framework for longitudinal reasoning in medical VQA. Our training strategy also illustrates the potential of a paradigm for using image foundation models in biomedicine: optimizing supervised and unsupervised learning objectives simultaneously.
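Of the auxiliary objectives listed, the KoLeo uniformity term has a compact standard form, as used in DINOv2-style training: the negative mean log distance from each L2-normalised embedding to its nearest neighbour in the batch. The loss decreases as embeddings spread apart. This is a minimal sketch assuming that standard form; the abstract gives neither the loss weighting nor which features it is applied to.

```python
import numpy as np


def koleo_loss(features, eps=1e-8):
    """KoLeo-style uniformity loss: -mean log nearest-neighbour distance.

    features: (n, d) batch of embeddings; they are L2-normalised first.
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)      # ignore each point's distance to itself
    nearest = dists.min(axis=1)
    return float(-np.mean(np.log(nearest + eps)))


spread = np.eye(4)                                       # mutually orthogonal embeddings
collapsed = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]])
print(koleo_loss(spread), koleo_loss(collapsed))         # the collapsed batch scores higher
```

Penalising near-duplicate embeddings counteracts representation collapse. This complements the supervised answer loss, which alone does not constrain how features are distributed.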

2606.06524 2026-06-08 eess.IV cs.CV cs.LG New submission

Advanced Flood Prediction with Physics-Guided Deep Learning: Combining UNet, FNO, and SAR/Optical Imagery

Tewodros Syum Gebre, Jagrati Talreja, Leila Hashemi-Beni

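AI summary: Proposes a physics-guided deep learning framework that fuses multimodal remote sensing with shallow water equation constraints. A hybrid UNet-FNO architecture achieves high-accuracy flood prediction, with an IoU of 0.82 and an F1 score of 0.90.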

Comments
This paper has been accepted for publication in the Proceedings of the IEEE Radar Conference (RadarConf 2026). The final authenticated version will be available through IEEE Xplore
English abstract

Accurate and scalable flood mapping remains challenging due to limited ground observations, heterogeneous terrain conditions, and the difficulty of enforcing hydrodynamic consistency within data-driven models. This work introduces a physics-guided deep learning framework that integrates multi-modal remote sensing (Sentinel-1 SAR, Sentinel-2 optical imagery, and DEM-derived terrain features) with constraints from the depth-averaged shallow water equations (SWE). The proposed hybrid architecture combines a UNet to capture fine-scale spatial details with a Fourier Neural Operator (FNO) to model basin-scale hydraulic interactions, while physics-informed residual losses ensure mass and momentum consistency. Evaluated across diverse floodplain settings, the hybrid model achieves an Intersection over Union of 0.82 and an F1 score of 0.90 for flood extent prediction, outperforming UNet-only and FNO-only baselines. Using hydrodynamic simulations as reference data, the model achieves an RMSE of 0.21 m for water depth and 0.15 m/s for flow velocity. Physics consistency is maintained, with low residuals and mass imbalance below 2.1%. Ablation studies confirm that removing physics-based regularization significantly degrades performance, underscoring the value of physical constraints for stability and generalization. These results demonstrate that embedding hydrodynamic principles into deep learning yields more accurate, reliable, and physically coherent flood predictions, offering strong potential for operational monitoring and large-scale deployment.
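The abstract names two ingredients that can be sketched in a few lines: overlap metrics for flood extent, and a mass-consistency residual of the kind used as a physics loss. This minimal illustration makes several assumptions that are not from the paper:

- a 1-D grid;
- a forward difference in time;
- central differences in space.

The paper works with the 2-D depth-averaged equations and also penalises momentum. Its discretisation is not described in the abstract, and the function names are hypothetical.

```python
def flood_extent_scores(pred, truth):
    """IoU and F1 for binary flood masks given as equal-shaped nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:  # predicted flooded, actually dry
                fp += 1
            elif t:  # actually flooded, predicted dry
                fn += 1
    denom = tp + fp + fn
    if denom == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    return tp / denom, 2 * tp / (2 * tp + fp + fn)


def continuity_residual(h_now, h_next, u_now, dt, dx):
    """Mean squared residual of the 1-D mass equation dh/dt + d(hu)/dx = 0.

    Forward difference in time, central difference in space, interior cells only.
    """
    q = [h * u for h, u in zip(h_now, u_now)]  # unit discharge hu
    residuals = [
        (h_next[i] - h_now[i]) / dt + (q[i + 1] - q[i - 1]) / (2 * dx)
        for i in range(1, len(h_now) - 1)
    ]
    return sum(r * r for r in residuals) / len(residuals)


iou, f1 = flood_extent_scores([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
print(iou, f1)
print(continuity_residual([1.0] * 5, [1.0] * 5, [0.5] * 5, dt=1.0, dx=1.0))
```

When IoU and F1 are computed from the same pooled pixel counts, F1 = 2 IoU / (1 + IoU). An IoU of 0.82 therefore corresponds to an F1 of about 0.90, which is consistent with the pair of figures reported above.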

2606.06509 2026-06-08 eess.IV cs.AI cs.LG q-bio.TO New submission

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

Himanshu Singh

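AI summary: For medical imaging under limited labels and compute, proposes an anatomy-aware benchmark. Comparing representations of different anatomical structures across several classifiers, it finds that representation quality matters more than model complexity.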

Comments
ACCEPTED at ICML 2026 Workshop GlobalSouthML (Seoul, South Korea; PMLR 306, 2026)
English abstract

Numerous medical imaging problems must be solved under limited labels and constrained compute, yet it remains unclear whether performance gains are driven mainly by more expressive models or by better representation of clinically meaningful anatomy. We study this question through a low-data anatomy-aware benchmark for 5-class cardiac pathology prediction on the public ACDC MRI dataset. Using segmentation-derived patient descriptors from the right ventricle, myocardium, and left ventricle, we compare anatomy-specific and multi-structure representations across linear, kernel, and tree-based classifiers. We find that under limited label settings, representation dominates complexity. These results suggest that in resource-constrained healthcare settings, identifying and representing the most informative anatomy may matter more than increasing model complexity alone.

2606.07399 2026-06-08 stat.ML cs.LG New submission

Automatic, Debiased, and Invariant Counterfactual Generation under General Interventions

Raphael C Kim, Jingsen Zhu, Ramin Zabih, Michele Santacatterina

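AI summary: Proposes the ADIGen framework, which combines Riesz regression, causal invariance, and orthogonal statistical learning to make counterfactual generation under general interventions automatic, debiased, and invariant. It also provides excess-risk bounds.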

English abstract

Generative models for counterfactual outcomes have great potential to support decision-making under complex interventions, but existing approaches are limited by unstable estimation, poor generalization across environments, and bias from nuisance model misspecification. We introduce ADIGen, a framework for automatic, debiased, and invariant counterfactual generation under general interventions, including high-dimensional interventions and outcomes. ADIGen combines Riesz regression to avoid unstable density-ratio estimation, causal invariance to improve generalization under distribution shift, and orthogonal statistical learning to obtain doubly robust guarantees against nuisance model misspecification. We provide excess-risk bounds showing that ADIGen controls counterfactual risk under general interventions, with a product-bias nuisance remainder and an invariant risk bound across environments.
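Riesz regression avoids explicit density-ratio (inverse-propensity) estimation by fitting the Riesz representer of the target functional directly. The sketch below shows the general technique in its simplest scalar case: the average treatment effect with a binary treatment and a linear sieve. It is an illustration under stated assumptions, not ADIGen's counterfactual-generation objective. All function names and the toy data are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-15:
            raise ValueError("singular Gram matrix: enlarge the sample or shrink the basis")
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def riesz_regression_ate(D, X, basis):
    """Fit alpha(d, x) = rho . basis(d, x) by minimising the empirical Riesz loss
    mean[alpha(D, X)^2 - 2 * (alpha(1, X) - alpha(0, X))] for the ATE functional."""
    n = len(D)
    feats = [basis(d, x) for d, x in zip(D, X)]
    k = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) / n for j in range(k)] for i in range(k)]
    moment = [0.0] * k
    for x in X:
        b1, b0 = basis(1, x), basis(0, x)
        for i in range(k):
            moment[i] += (b1[i] - b0[i]) / n
    rho = solve_linear(gram, moment)  # first-order condition: gram @ rho = moment
    return lambda d, x: sum(r * f for r, f in zip(rho, basis(d, x)))


def dr_ate(Y, D, X, g, alpha):
    """Debiased ATE: plug-in contrast from outcome model g plus Riesz-weighted residuals."""
    terms = [g(1, x) - g(0, x) + alpha(d, x) * (y - g(d, x)) for y, d, x in zip(Y, D, X)]
    return sum(terms) / len(terms)


def saturated_basis(d, x):
    """Indicator features for each (treatment, covariate cell) pair, with x in {0, 1}."""
    return [float(d == 1 and x == 0), float(d == 1 and x == 1),
            float(d == 0 and x == 0), float(d == 0 and x == 1)]


X = [0, 0, 0, 0, 1, 1, 1, 1]
D = [1, 0, 0, 0, 1, 1, 1, 0]          # propensity 0.25 in cell x=0, 0.75 in cell x=1
Y = [5.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 2.0]
alpha = riesz_regression_ate(D, X, saturated_basis)
print(dr_ate(Y, D, X, lambda d, x: 0.0, alpha))
```

With the saturated indicator basis, the fitted representer reproduces D/p(X) - (1 - D)/(1 - p(X)) with the empirical propensities. The weights are recovered without a separate propensity model. In this toy case, the debiased estimate is the same whether the outcome model is the zero function or the true cell means, a small-scale instance of double robustness.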

2606.07062 2026-06-08 stat.CO cs.MS New submission

CATEKAPPA: An R Shiny Application for Design and Analysis of Consistency Tests Based on the Kappa Statistic for Categorical Responses

Zheng Gai, Li Xincheng, Jiang Wangyingjie, Zhao Panwei

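AI summary: Targets two major difficulties in agreement testing for categorical data: sample size determination and kappa coefficient computation. Develops CATEKAPPA, an R Shiny application that integrates sample size planning with agreement analysis, supports Cohen's, Fleiss', and Light's kappa, and provides automatic interpretation.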

Comments
10 pages, 4 figures. The open-source R package CATEKAPPA is available on CRAN at https://CRAN.R-project.org/package=catekappa, and the source code repository is hosted at https://github.com/satellite837/catekappa. Manuscript planned for submission to Journal of Statistical Software (JSS). Supplementary R package source code uploaded as ancillary file
English abstract

The kappa statistic is the most widely used measure of inter-rater agreement for categorical data. Despite its popularity, applied researchers often encounter two major hurdles: (i) determining the sample size required to achieve a desired level of agreement with given power, and (ii) computing appropriate kappa coefficients with proper interpretation. Existing R packages such as irr and kappaSize provide these functionalities but require programming skills and lack an integrated, user-friendly interface. We present CATEKAPPA, an R package that bridges this gap by combining sample size planning (via kappaSize) and agreement analysis (via irr) into a single Shiny-based web application. The package supports Cohen's kappa for two raters, Fleiss' kappa for three or more raters, and Light's kappa, and provides automatic interpretation using the Landis & Koch scale. Users can either launch an interactive graphical interface or use command-line functions for scripting. The package is freely available on CRAN.
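The agreement computations the package automates are short enough to sketch directly. This Python version covers the following:

- Cohen's kappa, (p_o - p_e)/(1 - p_e), where p_o is observed agreement and p_e is chance agreement;
- Light's kappa, taken as the mean of pairwise Cohen's kappas;
- the Landis & Koch (1977) verbal bands.

It is an independent illustration, not CATEKAPPA's R interface or the irr/kappaSize API. Fleiss' kappa and sample-size planning are not shown, and the rating vectors are made up.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (p_o - p_e) / (1 - p_e)


def lights_kappa(ratings):
    """Light's kappa: mean of Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(ratings, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


def landis_koch(kappa):
    """Landis & Koch (1977) verbal benchmark for a kappa value."""
    k = round(kappa, 2)  # report to two decimals so float noise cannot cross a band edge
    if k < 0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
k = cohens_kappa(a, b)
print(round(k, 3), landis_koch(k))  # prints: 0.6 moderate
```

Here the raters agree on 8 of 10 items (p_o = 0.8), and each uses "yes" half the time (p_e = 0.5). This gives kappa = 0.3 / 0.5 = 0.6.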

2606.07016 2026-06-08 stat.AP cs.CV New submission

An Integrated Roadside Sensing and Communication Framework for Vulnerable Road User Safety at Signalized Intersections

Parvez Anowar

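AI summary: Proposes a framework that integrates multimodal sensing, edge computing, V2X/P2X communication, and adaptive signal control. An analysis of 53,319 annotations from the public R-LiViT dataset finds the following:

- VRUs account for 49% of observations.
- Day and night densities differ substantially.
- Close-proximity events vary 10-fold across locations.
- 83% of pedestrian bounding boxes are small.

These findings support multimodal sensing and adaptive deployment.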

Comments
17 pages, 5 figures, 2 tables. Preprint
English abstract

Vulnerable road users (VRUs) account for approximately half of urban traffic deaths globally, with intersections concentrating a disproportionate share of these casualties. Recent reviews of sensing technology for VRU protection have cataloged dozens of single-sensor and dual-sensor deployments, yet none of the surveyed systems couples multi-modal sensing with edge-side near-miss analytics and bidirectional vehicle-to-everything (V2X) and pedestrian-to-everything (P2X) messaging in a single intersection cabinet. This paper presents an integrated framework for VRU protection at signalized intersections, combining LiDAR, radar, RGB camera, and thermal camera at the perception layer, edge-based prediction and surrogate-safety analytics at the computation layer, V2X and P2X messaging at the communication layer, and adaptive signal control at the actuation layer. The framework is grounded in an empirical case study using R-LiViT, the first publicly released roadside LiDAR-Visual-Thermal dataset, which provides 200 multi-modal sequences and 2,400 annotated RGB-T frames at three German intersections. Analysis of 53,319 detection annotations reveals that VRUs comprise approximately 49% of all road-user observations, that day-to-night density drops by 38% for pedestrians and 45% for vehicles while the night distribution shows a higher close-proximity share, that per-frame close-proximity event counts vary approximately 10-fold across the eight unique locations at three intersections, and that 83% of pedestrian bounding boxes are small in image space, indicating that VRUs are typically far from any single sensor. These findings support multi-modal sensing, edge-side analytics, and adaptive context-sensitive deployment rather than uniform single-sensor solutions.

2606.06957 2026-06-08 stat.ML cs.LG New submission

Deep Single-Index Fréchet Regression

Muqing Cui, Yidong Zhou, Su I Iao, Hans-Georg Müller

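AI summary: Proposes the DeSI framework, which estimates a single-index direction with a deep neural network and performs Fréchet regression in a metric space. This mitigates the curse of dimensionality while retaining interpretability. Convergence rates are guaranteed theoretically, and performance is strong on distributions, networks, and other data.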

English abstract

Predicting outputs that are located in non-Euclidean spaces, such as probability distributions, networks, and symmetric positive-definite matrices, is becoming increasingly important in modern data analysis, particularly when inputs are high-dimensional. We propose DeSI (Deep Single-Index Fréchet Regression), a semiparametric framework for regression with metric space-valued outputs and multivariate inputs that assumes a single-index structure for the conditional Fréchet mean. DeSI estimates an interpretable index direction, which quantifies the relative importance of inputs, using a deep neural network, and performs Fréchet regression along the resulting one-dimensional index in the target metric space. This structure mitigates the curse of dimensionality while retaining interpretability, which stands in contrast to standard deep neural networks. We establish theoretical guarantees for DeSI, including uniform approximation and convergence rates, and demonstrate its strong predictive performance through simulations on distributions, networks, and symmetric positive-definite matrices, as well as an application to compositional mood data from New Jersey.
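The second stage described above, Fréchet regression along a one-dimensional index, has a particularly simple form in one setting: univariate distributions as responses, with the 2-Wasserstein metric. The weighted Fréchet mean is then the distribution whose quantile function is the weighted average of the quantile functions. The sketch makes these assumptions:

- The index direction theta is given rather than learned.
- Gaussian kernel weights replace the local linear weights that Fréchet regression commonly uses. Local linear weights can be negative, which requires projecting back onto monotone quantile functions.

All names and data are hypothetical.

```python
import math


def empirical_quantiles(sample, levels):
    """Empirical (inverse-CDF) quantile function of a 1-D sample at levels in (0, 1]."""
    s = sorted(sample)
    n = len(s)
    return [s[min(n - 1, max(0, math.ceil(p * n) - 1))] for p in levels]


def single_index_frechet_predict(theta, X, samples, x_new, levels, bandwidth):
    """Predict a distribution (as a quantile function on `levels`) at x_new.

    theta: index direction, assumed already estimated (DeSI uses a deep network for this);
    X: covariate vectors; samples: one 1-D sample (a distributional response) per row of X.
    Uses Gaussian kernel weights on the scalar index u = theta . x.
    """
    u = [sum(t * xi for t, xi in zip(theta, x)) for x in X]
    u0 = sum(t * xi for t, xi in zip(theta, x_new))
    w = [math.exp(-0.5 * ((ui - u0) / bandwidth) ** 2) for ui in u]
    total = sum(w)
    w = [wi / total for wi in w]
    quantiles = [empirical_quantiles(s, levels) for s in samples]
    # 1-D distributions under 2-Wasserstein: the weighted Frechet mean has quantile
    # function equal to the weighted average of the quantile functions.
    return [sum(wi * q[k] for wi, q in zip(w, quantiles)) for k in range(len(levels))]


X = [[0.0, 5.0], [1.0, -3.0]]
samples = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
pred = single_index_frechet_predict([1.0, 0.0], X, samples, [0.5, 100.0],
                                    [0.25, 0.5, 0.75], bandwidth=1.0)
print(pred)  # prints: [5.0, 6.0, 7.0]
```

With theta = (1, 0), the second covariate has no effect on the index, so the large value 100 in x_new is ignored. The new point sits halfway between the two training indices, so the prediction is the equal-weight Wasserstein barycentre of the two responses.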

2606.06855 2026-06-08 stat.ML cs.LG math.ST stat.TH New submission

Stability beyond Bounded Differences: Sharp Generalization Bounds under Finite $L_p$ Moments

Qianqian Lei, Soham Bonnerjee, Yuefeng Han, Wei Biao Wu

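AI summary: For heavy-tailed or unbounded losses, proposes a stability framework that requires only a finite $L_p$ moment condition. It derives sharp high-probability generalization bounds covering empirical risk minimization, transductive regression, and meta-learning.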

English abstract

While algorithmic stability is a central tool for understanding generalization of learning algorithms, existing high-probability guarantees typically rely on uniform boundedness or sub-Gaussian/sub-Weibull tail assumptions, which can be overly restrictive for modern settings with heavy-tailed or unbounded losses. We develop a stability-based framework that requires only a finite $L_p$ moment condition. Our first contribution is sharp concentration inequalities for functions of independent random variables under $L_p$ constraints, extending McDiarmid's bounded-differences techniques beyond the classical regime. Leveraging these results, we derive sharp high-probability generalization bounds across a range of learning paradigms, including empirical risk minimization, transductive regression, and meta-learning. These guarantees show that $L_p$ stability suffices for robust generalization even when boundedness fails, substantially weakening the standard assumptions in the stability literature.
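For reference, the classical result being extended is McDiarmid's bounded-differences inequality. Below is a standard statement, together with the textbook uniform-stability bound (Bousquet and Elisseeff) that follows from it when the loss takes values in $[0, M]$. Here $A_S$ is the output of the algorithm trained on a sample $S$ of size $n$, $R$ is the population risk, and $\widehat{R}_S$ is the empirical risk. The dependence on $M$ shows where boundedness enters; this is the assumption the abstract relaxes to a finite $L_p$ moment. These are the classical forms, not the paper's new bounds.

```latex
% McDiarmid: X_1,\dots,X_n independent; f has bounded differences c_i
\[
\sup_{x_1,\dots,x_n,\,x_i'} \bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr| \le c_i
\;\Longrightarrow\;
\Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E} f(X_1,\dots,X_n) \ge t \bigr)
\le \exp\!\Bigl( -\frac{2t^2}{\sum_{i=1}^{n} c_i^2} \Bigr).
\]
% Uniform stability \beta_n and loss in [0, M]: with probability at least 1 - \delta,
\[
R(A_S) \le \widehat{R}_S(A_S) + 2\beta_n + \bigl(4 n \beta_n + M\bigr) \sqrt{\frac{\ln(1/\delta)}{2n}} .
\]
```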

2606.06814 2026-06-08 stat.ML cs.LG math.ST stat.AP stat.TH New submission

The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces

Soo Min Kwon, Alec S. Xu, Can Yaras, Dogyoon Song, Laura Balzano, Qing Qu

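AI summary: Uses a mixture of low-rank Gaussians to analyze how training task diversity, defined by the number of non-overlapping subspace columns, improves the generalization and optimization of in-context learning with linear attention. The analysis explains why diverse training shortens the learning plateau and enables out-of-distribution generalization, and it is extended to nonlinear settings.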

English abstract

The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanisms. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the training data. By modeling the training task vectors as a mixture of low-rank Gaussians, we show how training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why training with task diversity shortens the ICL plateau and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by empirically demonstrating how our results extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a tractable framework to unify existing observations.
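The data model and predictor can be sketched as follows.

- **Task vectors.** Each task vector is drawn as w = U_k z from one of several low-rank Gaussian components. Diversity corresponds to having more components whose bases share no columns.
- **Predictor.** The sketch assumes the reduced form y_hat = x_q^T Gamma (1/n) sum_i y_i x_i, which is common in analyses of single-layer linear attention on linear-regression tasks.

Whether the paper uses exactly this parametrization, and how it trains Gamma, is not stated in the abstract. Gamma is therefore simply passed in, and all names are hypothetical.

```python
import random


def sample_task(components, rng):
    """Draw a task vector w = U_k z from a mixture of low-rank Gaussians.

    components: list of bases U_k, each a list of r column vectors of length d;
    a component is chosen uniformly at random and z ~ N(0, I_r).
    """
    basis = rng.choice(components)
    z = [rng.gauss(0.0, 1.0) for _ in basis]
    d = len(basis[0])
    return [sum(zj * col[i] for zj, col in zip(z, basis)) for i in range(d)]


def linear_attention_predict(gamma, xs, ys, x_query):
    """Reduced single-layer linear-attention ICL predictor:
    y_hat = x_query^T Gamma (1/n) sum_i y_i x_i, with Gamma a d x d matrix."""
    n, d = len(xs), len(x_query)
    h = [sum(y * x[i] for x, y in zip(xs, ys)) / n for i in range(d)]
    gamma_h = [sum(gamma[i][j] * h[j] for j in range(d)) for i in range(d)]
    return sum(q * g for q, g in zip(x_query, gamma_h))


rng = random.Random(0)
e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
diverse = [[e1], [e2], [e3]]  # three rank-1 subspaces with non-overlapping columns
w = sample_task(diverse, rng)
xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
ys = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
y_hat = linear_attention_predict(identity, xs, ys, e1)
print(w, y_hat)  # with Gamma = I and isotropic x, y_hat approximates w[0]
```

With Gamma equal to the identity, the predictor is one step of gradient descent from zero on the in-context least-squares loss. This is the usual starting point for such analyses.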

2606.06785 2026-06-08 stat.ML cs.LG math.DS New submission

Empirical Transfer Operators and Finite-Sample Change Detection for Noisy Expanding Interval Maps

Aparna Rajput

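AI summary: For one-dimensional noisy dynamical systems, proposes a finite-sample change detection method based on partition-based empirical transition matrices. It detects changes in the invariant density via the L1 distance between the stationary distributions of a sliding window and a baseline segment, and provides finite-sample bounds and false-alarm guarantees.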

Comments
27 pages, 2 tables, 1 figure
English abstract

We study finite-sample change detection for one-dimensional noisy dynamical systems using partition-based empirical approximations of stationary behaviour. Given observations from an interval-valued process, we partition the state space, estimate a finite transition matrix from observed transitions between partition elements, and apply a small Doeblin-type regularisation to ensure a unique stationary distribution. From an initial reference segment, we compute a baseline empirical stationary distribution \(\widehat{\pi}_{0,\rho}\). For each later sliding window, we compute \(\widehat{\pi}_{t,\rho}\) and define the score \[ S_t=\|\widehat{\pi}_{t,\rho}-\widehat{\pi}_{0,\rho}\|_1. \] Large values of \(S_t\) indicate a change in stationary behaviour relative to the baseline. The statistic detects changes in invariant density or stationary law, but not all possible changes in transition dynamics. Under explicit assumptions on empirical transition concentration, finite-state stationary distribution stability, partition approximation, regularisation bias, and noise stability, we derive a finite-sample bound for the empirical stationary density. The bound separates sampling error, regularisation bias, partition approximation error, and noise bias. We then obtain a single-window false-alarm guarantee and a sufficient detection condition when the invariant density changes by more than the estimation error. We illustrate the method on synthetic noisy beta-map change-point experiments.
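The pipeline above (partition, count, regularise, compute the stationary law, compare) fits in a short sketch. It rests on several assumptions:

- The state space is [0, 1] with equal-width cells.
- The Doeblin-type regularisation is taken to be mixing with the uniform kernel, P_rho = (1 - rho) P_hat + rho/m, which is one natural choice.
- The stationary distribution is found by power iteration.

The toy series is i.i.d. with a shift in support, not the paper's noisy beta maps, and all names are hypothetical.

```python
import random


def cell_of(x, m):
    """Index of the equal-width cell of [0, 1] containing x (x = 1 falls in the last cell)."""
    return min(int(x * m), m - 1)


def regularised_transition_matrix(segment, m, rho):
    """Row-normalised transition counts between cells, mixed with the uniform kernel.

    Every entry of P_rho is at least rho/m, so the chain satisfies a Doeblin condition
    and has a unique stationary distribution.
    """
    counts = [[0.0] * m for _ in range(m)]
    for a, b in zip(segment, segment[1:]):
        counts[cell_of(a, m)][cell_of(b, m)] += 1.0
    P = []
    for row in counts:
        total = sum(row)
        base = [c / total for c in row] if total > 0 else [1.0 / m] * m  # unvisited cell
        P.append([(1.0 - rho) * p + rho / m for p in base])
    return P


def stationary(P, tol=1e-12, max_iter=10_000):
    """Stationary distribution by power iteration pi <- pi P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def change_scores(series, m, rho, ref_len, window):
    """S_t = ||pi_hat_{t,rho} - pi_hat_{0,rho}||_1 for each sliding window after the reference."""
    pi0 = stationary(regularised_transition_matrix(series[:ref_len], m, rho))
    scores = []
    for t in range(ref_len, len(series) - window + 1):
        pit = stationary(regularised_transition_matrix(series[t:t + window], m, rho))
        scores.append(sum(abs(a - b) for a, b in zip(pit, pi0)))
    return scores


rng = random.Random(1)
before = [0.5 * rng.random() for _ in range(500)]       # mass on [0, 0.5)
after = [0.5 + 0.5 * rng.random() for _ in range(300)]  # mass on [0.5, 1)
scores = change_scores(before + after, m=10, rho=0.05, ref_len=300, window=100)
print(round(scores[0], 3), round(scores[-1], 3))
```

Windows taken before the shift score low, and windows entirely after it score close to the maximum L1 distance of 2. The regularisation keeps this score slightly below 2, because every cell retains at least rho/m mass per step.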

2606.06772 2026-06-08 stat.ML cs.AI cs.LG New submission

Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods

Junyu Zhou, Puyu Wang, Yunwen Lei, Marius Kloft, Yiming Ying

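AI summary: Establishes a connection between the learning dynamics of over-parameterized deep neural networks and those of kernel methods. Proves that gradient descent and stochastic gradient descent attain minimax-optimal generalization error at sufficient width.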

Comments
37 pages
English abstract

Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood. In this paper, we make significant progress toward closing this gap by providing a comprehensive generalization analysis of DNNs trained using gradient-based methods. First, we establish, for the first time, a crucial connection between the learning dynamics of a DNN with smooth activation functions trained via gradient-based methods and those of kernel methods, showing that gradient-based methods on over-parameterized DNNs can fully inherit the favorable learning dynamics of their kernel counterparts. Building on this connection and the well-established optimality of kernel methods, we derive the first known minimax-optimal rates for the excess population risk of both gradient descent (GD) and stochastic gradient descent (SGD), under the assumption that network width scales polynomially with the sample size. Our results demonstrate that, with sufficient width, DNNs trained by GD or SGD can achieve generalization performance comparable to kernel-based methods.

2606.06764 2026-06-08 stat.ML cs.AI cs.LG New submission

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks

Junyu Zhou, Puyu Wang, Yunwen Lei, Yiming Ying, Ding-Xuan Zhou

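AI summary: For deep ReLU networks in the neural tangent kernel (NTK) regime, establishes for the first time minimax-optimal generalization rates for gradient descent (GD) and stochastic gradient descent (SGD). Shows that the optimal rates of kernel methods are attainable at sufficient width.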

Comments
39 pages, 1 table
English abstract

Recent progress has been made in understanding the statistical generalization performance of gradient descent methods for overparameterized neural networks within the neural tangent kernel (NTK) regime. However, most of the existing work on regression problems is limited to shallow network architectures, leaving a notable gap in the theory of deep neural networks. This paper addresses this gap by presenting a comprehensive generalization analysis for deep ReLU networks trained using gradient descent (GD) and stochastic gradient descent (SGD). Specifically, we establish the first known minimax-optimal rates of excess population risk for both GD and SGD with deep ReLU networks, under the assumption that the network width scales polynomially with respect to the network depth and training sample size. Our results demonstrate that with sufficient width, gradient descent methods for deep ReLU networks can achieve optimal generalization rates on par with kernel methods.

2606.07445 2026-06-08 q-fin.MF cs.GT econ.TH q-fin.PR New submission

Bubbles vs. Baselines: Token Valuation and Institutional Capital in PoS Networks under EIP-1559

Mikhail Perepelitsa

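AI summary: Builds an open-economy macroeconomic equilibrium model of the strategic interaction between institutional investors and retail consumers in PoS networks under EIP-1559. Token valuation is anchored to a fundamental baseline set by network adoption, while institutional excess returns stem from leveraged extraction of retail consumers' transactional utility.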

English abstract

This paper presents an open-economy macroeconomic equilibrium model for Proof-of-Stake (PoS) networks with fee-burn mechanics (EIP-1559) that formalizes the strategic interplay between a Kelly-optimizing rational institutional investor and a utility-driven retail consumer. We analyze network dynamics across two behavioral regimes. In the Unbounded Accumulation Model, the consumer purely accumulates tokens, creating one-sided buy pressure that interacts with institutional portfolio rebalancing to fuel an ever-expanding speculative bubble and generate compounding excess returns for investors. Conversely, in the Utility-Consumption Model, the consumer dynamically buys and sells tokens to balance crypto wealth against real-world fiat consumption. Within this framework, we derive an explicit steady-state equilibrium price for ETH, demonstrating how token valuation anchors to a stable fundamental baseline that scales directly with network adoption while completely dissolving the institutional yield premium. Our numerical simulations show that while exogenous traditional finance (TradFi) shocks propagate through portfolio rebalancing to drive high token price volatility, network inflation remains highly stable. Furthermore, we prove that network security is insulated from institutional monopoly by counter-cyclical consumer behavior. Our findings reveal that institutional excess wealth creation in PoS ecosystems is not native to the staking protocol itself, but is strictly driven by the leveraged extraction of the retail consumer's continuous demand for transactional utility.
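Two textbook ingredients of this setting can be made concrete:

- the EIP-1559 base-fee update and burn, using the integer arithmetic and constants of the EIP-1559 specification;
- the log-optimal (Kelly) risky share for a lognormal asset.

This sketch does not reproduce the paper's equilibrium model. Reading "Kelly-optimizing" as the standard continuous-time fraction (mu - r)/sigma^2 is an assumption, and the fee values are illustrative.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # constants from the EIP-1559 specification
ELASTICITY_MULTIPLIER = 2


def next_base_fee(parent_base_fee, parent_gas_used, parent_gas_limit):
    """EIP-1559 base-fee update for the next block (integer arithmetic)."""
    target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == target:
        return parent_base_fee
    if parent_gas_used > target:
        delta = (parent_base_fee * (parent_gas_used - target)
                 // target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
        return parent_base_fee + max(delta, 1)  # increases are at least 1 wei
    delta = (parent_base_fee * (target - parent_gas_used)
             // target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return parent_base_fee - delta


def burned(base_fee, gas_used):
    """Amount destroyed by the fee burn in one block: base fee times gas used."""
    return base_fee * gas_used


def kelly_fraction(mu, r, sigma):
    """Continuous-time Kelly (log-optimal) risky share for a lognormal asset."""
    return (mu - r) / sigma ** 2


fee = 100  # wei per gas, illustrative
print(next_base_fee(fee, 30_000_000, 30_000_000))  # full block: +12.5%, rounded down -> 112
print(next_base_fee(fee, 0, 30_000_000))           # empty block -> 88
print(kelly_fraction(0.10, 0.02, 0.4))
```

Sustained demand above the gas target compounds the base fee upward. Every unit of gas used then destroys base_fee tokens, which is the burn channel the model builds on.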

2606.06652 2026-06-08 econ.GN cs.CE cs.IT eess.SP math.IT q-fin.EC New submission

Probabilistic Risk Sensitivity and Loss Aversion in Cumulative Prospect Theory

Symeon Vaidanis, Marios Kountouris

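AI summary: Proposes a binary-gamble framework that defines a probabilistic risk-sensitivity metric as a probability-threshold ratio. Uses it to analyze acceptance and preference thresholds in cumulative prospect theory, and compares it with utility premia, probability premia, and Arrow-Pratt curvature measures.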

Comments
This paper has been submitted for publication
English abstract

This paper develops a binary-gamble framework for characterizing risk sensitivity and loss aversion in Cumulative Prospect Theory (CPT). The proposed probabilistic risk-sensitivity metric is defined as a probability-threshold ratio that determines acceptance and preference thresholds in choice problems involving either a certain outcome and a binary gamble or two binary gambles. We show how standard notions of symmetric and non-symmetric bet aversion can be recovered within this framework, and we compare the resulting threshold-based conditions with utility premia, probability premia, and Arrow-Pratt curvature measures. The analysis clarifies when these criteria coincide and when they diverge, particularly for increasing aversion conditions, binary gambles with unequal probability distributions, and settings involving probability weighting functions. We also identify technical restrictions that arise when CPT-utility functions are used to represent loss aversion at the reference point. The resulting framework provides a decision-theoretic interpretation of risk sensitivity that is directly tied to probability thresholds and complements existing premium-based approaches.
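An acceptance threshold of the kind discussed above can be computed numerically. The sketch assumes the Tversky-Kahneman (1992) parametric value and weighting functions with their median parameter estimates. It then uses bisection to find the win probability at which a mixed binary gamble becomes acceptable against a sure zero. How the paper combines such thresholds into its probability-threshold ratio is not reproduced, and all names are hypothetical.

```python
def tk_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function: x^alpha for gains, -lam * (-x)^beta for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta


def tk_weight(p, gamma):
    """Tversky-Kahneman probability weighting w(p) = p^g / (p^g + (1 - p)^g)^(1/g)."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def acceptance_threshold(gain, loss, alpha=0.88, beta=0.88, lam=2.25,
                         gamma_gain=0.61, gamma_loss=0.69, tol=1e-12):
    """Win probability p at which the mixed gamble (gain w.p. p, loss w.p. 1 - p)
    has zero CPT value, i.e. becomes weakly preferred to the sure outcome 0."""
    if not gain > 0 > loss:
        raise ValueError("expects a mixed gamble: gain > 0 > loss")

    def cpt_value(p):
        return (tk_weight(p, gamma_gain) * tk_value(gain, alpha, beta, lam)
                + tk_weight(1.0 - p, gamma_loss) * tk_value(loss, alpha, beta, lam))

    lo, hi = 0.0, 1.0  # cpt_value(0) < 0 < cpt_value(1), and it increases in p here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cpt_value(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


linear = dict(alpha=1.0, beta=1.0, lam=1.0, gamma_gain=1.0, gamma_loss=1.0)
print(acceptance_threshold(10, -10, **linear))  # risk-neutral benchmark
print(acceptance_threshold(10, -10))            # TK (1992) median parameters
```

Bisection is valid here because, with these weighting parameters, w(p) is increasing. The CPT value of the mixed gamble is therefore monotone in p. The risk-neutral case gives 0.5, while loss aversion pushes the threshold above 0.5.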

2606.06537 2026-06-08 q-bio.QM cs.CV eess.IV New submission

DSU-Net: An Attention-Enhanced Dense Skip U-Net for Breast Lesion Segmentation in Mammographic Images

Reza Bozorgpour, Mohammadreza Soltany Sadrabadi

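AI summary: Proposes DSU-Net, which uses dense skip connections and attention mechanisms to improve feature propagation and boundary delineation. It achieves high-accuracy breast lesion segmentation on the CBIS-DDSM dataset.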

English abstract

Breast cancer remains one of the leading causes of cancer-related mortality among women worldwide, making early detection essential for effective treatment. Mammography is the primary screening modality; however, accurate delineation of suspicious lesions remains challenging and subject to inter-observer variability. Automated segmentation methods can assist radiologists by providing consistent and efficient lesion localization. This study presents DSU-Net, an attention-enhanced Dense Skip U-Net architecture for automated breast lesion segmentation in mammographic images. The proposed framework integrates dense skip connections and attention mechanisms to improve feature propagation, preserve spatial information, and enhance lesion boundary delineation. Experiments were conducted using the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM). To address severe foreground-background imbalance, a composite loss function combining Dice loss, focal loss, and binary cross-entropy loss was employed during training. The proposed model achieved a Dice Similarity Coefficient of 0.9421, an Intersection over Union of 0.8905, an accuracy of 0.9711, and an AUC-ROC of 0.9878 on the validation dataset. Qualitative evaluation demonstrated accurate delineation of lesions with varying sizes and morphologies, while quantitative results confirmed robust discrimination between lesion and background regions. These findings demonstrate that DSU-Net provides accurate and reliable breast lesion segmentation in mammographic images and highlight the potential of attention-guided deep learning for computer-aided breast cancer screening and diagnosis.
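The composite loss mentioned above can be sketched over flattened pixel probabilities. The following choices are assumptions, not taken from the paper:

- equal weights on the three terms;
- the focal-loss settings gamma = 2 and alpha = 0.25, commonly used defaults;
- a soft (probabilistic) Dice term.

The paper's actual weights are not given in the abstract, and the function names are hypothetical.

```python
import math


def composite_segmentation_loss(probs, targets, w_dice=1.0, w_focal=1.0, w_bce=1.0,
                                gamma=2.0, alpha=0.25, eps=1e-7):
    """Weighted sum of soft Dice, focal, and binary cross-entropy losses.

    probs: predicted foreground probabilities per pixel; targets: 0/1 labels per pixel.
    """
    p = [min(max(x, eps), 1.0 - eps) for x in probs]  # clip so log() stays finite
    n = len(p)
    inter = sum(pi * t for pi, t in zip(p, targets))
    dice = 1.0 - (2.0 * inter + eps) / (sum(p) + sum(targets) + eps)
    bce = -sum(t * math.log(pi) + (1 - t) * math.log(1.0 - pi)
               for pi, t in zip(p, targets)) / n
    focal = 0.0
    for pi, t in zip(p, targets):
        p_t = pi if t else 1.0 - pi        # probability assigned to the true class
        a_t = alpha if t else 1.0 - alpha  # class-balancing weight
        focal -= a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    focal /= n
    return w_dice * dice + w_focal * focal + w_bce * bce


def dice_to_iou(dice):
    """For a single pooled confusion matrix, IoU = Dice / (2 - Dice)."""
    return dice / (2.0 - dice)


good = composite_segmentation_loss([0.95, 0.05, 0.9, 0.1], [1, 0, 1, 0])
bad = composite_segmentation_loss([0.1, 0.9, 0.1, 0.9], [1, 0, 1, 0])
print(good, bad, dice_to_iou(0.9421))
```

The identity in `dice_to_iou` maps the reported Dice of 0.9421 to an IoU of about 0.8905, matching the reported value. This is consistent with, though it does not confirm, both metrics being computed from pooled counts.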

2606.06516 2026-06-08 q-bio.QM cs.LG New submission

Probabilistic learning to perform pre-onset individualised prediction of disease severity: application to Veno Occlusive Disease

Dalia Chakrabarty, Kane Warrior, Chuqiao Zhang, Akash Bhojgaria, Joydeep Chakrabartty

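AI summary: Proposes a new probabilistic supervised learning method that uses digital twins and probabilistic inverse learning to predict the severity score of veno-occlusive disease (VOD) automatically, before bone marrow transplantation. The prediction helps physicians plan treatment.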

English abstract

We advance a new probabilistic supervised learning approach that permits reliable, automated, and early individualised prediction of the severity with which a disease will develop in a prospective patient. The prediction capacity is illustrated by the pre-transplant prediction of the severity score of Veno-Occlusive Disease (VOD) in the digital twin (DT) of a prospective patient, where this score parametrises the severity with which VOD will develop in the patient after they undergo a Bone Marrow Transplant. The relationship between the pre-transplant variables and the severity score is learnt by modelling it as a (random) function, treated as a sample function of an adequately chosen stochastic process. The parameters of this underlying process are learnt from a training dataset generated from the real-time evolution of retrospective patients in a cohort; this training dataset is then augmented in size by probabilistic inverse learning of the scores of prospective patients. The augmented training set permits learning of the function that enables, at the pre-transplant stage, automated prediction of the VOD severity score that characterises the DT of a physical patient in their unique pre-transplant state. This score is then fed back to the real prospective patient as the severity with which VOD will develop in them after transplant. The score allows the treating haematologist-oncologists to decide on the treatment regimen, which in this illustration reduces to deciding whether to treat the patient with Defibrotide. An AI facility is developed to perform such automated prediction, with the physician inputting the pre-transplant data that characterise the DT of the prospective patient under consideration.
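The abstract models the map from pre-transplant variables to the severity score as a sample function of a stochastic process, without naming the process. A Gaussian process is one common choice for this kind of model. The sketch below assumes a zero-mean GP with a squared-exponential kernel and shows how such a model yields both a predicted score and its uncertainty for a new patient. The kernel, hyperparameters, and toy data are hypothetical, and the paper's inverse-learning augmentation step is not shown.

```python
import math


def rbf(a, b, length_scale=1.0, amplitude=1.0):
    """Squared-exponential covariance between two input vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return amplitude ** 2 * math.exp(-0.5 * sq / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward_sub_t(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, x_new, length_scale=1.0, amplitude=1.0, noise=1e-3):
    """Posterior mean and variance of a zero-mean GP at x_new given noisy observations."""
    n = len(X)
    K = [[rbf(X[i], X[j], length_scale, amplitude) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    weights = backward_sub_t(L, forward_sub(L, y))  # (K + noise^2 I)^{-1} y
    k_star = [rbf(x, x_new, length_scale, amplitude) for x in X]
    mean = sum(k * w for k, w in zip(k_star, weights))
    v = forward_sub(L, k_star)
    var = rbf(x_new, x_new, length_scale, amplitude) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


X = [[0.0], [1.0], [2.0]]  # hypothetical standardised pre-transplant covariate
y = [0.0, 1.0, 0.0]        # hypothetical severity scores
print(gp_predict(X, y, [1.0]), gp_predict(X, y, [10.0]))
```

Near observed patients, the posterior variance is small. For a patient far from the training data, the prediction reverts to the prior mean with prior-level variance, which flags that such a score should not be trusted.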