arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2606.19943 2026-06-19 eess.IV cs.AI 新提交

SIMBA: ABidirectional Retrieval Forward Simulation Framework for Modeling FY-4A GIIRS Hyperspectral Infrared Radiances Toward NWP Applications

SIMBA：面向NWP应用的FY-4A GIIRS高光谱红外辐射双向检索正向模拟框架

Jingdong Shen, Fu Wang*, Qifeng Lu, Hao Huang, Chunqiang Wu, Chi Yang, Xiaofang Liu

AI总结提出SIMBA框架，联合进行大气廓线检索和辐射重建，通过循环一致性约束和双向Mamba模块增强耦合，在FY-4A GIIRS数据上优于多种深度学习基线。

详情

AI中文摘要

高光谱红外观测是数值天气预报（NWP）的重要数据源，因为它们提供了大气温度和湿度垂直结构的丰富信息。然而，现有的深度学习方法主要关注从辐射到大气廓线的单向检索，而反向辐射模拟过程以及大气状态空间与辐射观测空间之间的一致性考虑不足。在本研究中，我们提出了SIMBA，一个用于FY-4A GIIRS高光谱红外辐射建模的统一双向检索-正向模拟框架，面向NWP应用。该框架联合执行大气廓线检索和辐射重建，引入循环一致性约束以加强两个过程之间的耦合，并采用双向Mamba状态空间模块来捕捉沿气压层的长程依赖。利用配准的FY-4A GIIRS观测和ERA5再分析数据，该方法在温度检索、比湿检索、长波辐射重建和中波辐射重建上进行了评估。实验结果表明，SIMBA在检索和重建任务上均优于多个代表性深度学习基线，而消融实验证实了双向设计和循环一致性机制的贡献。这些结果表明，所提出的框架对于联合大气廓线检索和高光谱红外辐射建模是有效的，并显示出未来在雅可比相关分析和面向NWP扩展方面的潜力。

英文摘要

Hyperspectral infrared observations are an important data source for numerical weather prediction (NWP) because they provide rich information on the vertical structure of atmospheric temperature and humidity. However, most existing deep learning methods mainly focus on one-way retrieval from radiances to atmospheric profiles, while the reverse radiance simulation process and the consistency between atmospheric state space and radiance observation space are insufficiently considered. In this study, we propose SIMBA, a unified bidirectional retrieval-forward simulation framework for FY-4A GIIRS hyperspectral infrared radiance modeling toward NWP applications. The framework jointly performs atmospheric profile retrieval and radiance reconstruction, introduces a cycle-consistency constraint to strengthen the coupling between the two processes, and employs a bidirectional Mamba state-space module to capture long-range dependencies along pressure levels. Using collocated FY-4A GIIRS observations and ERA5 reanalysis data, the proposed method is evaluated for temperature retrieval, specific humidity retrieval, long-wave radiance reconstruction, and medium-wave radiance reconstruction. Experimental results show that SIMBA outperforms several representative deep learning baselines across both retrieval and reconstruction tasks, while ablation experiments confirm the contribution of the bidirectional design and cycle-consistency mechanism. These results demonstrate that the proposed framework is effective for joint atmospheric profile retrieval and hyperspectral infrared radiance modeling, and suggest potential for future Jacobian-related analysis and NWP-oriented extensions.

URL PDF HTML ☆

赞 0 踩 0

2606.19574 2026-06-19 eess.IV cs.CV 新提交

FrequencyFormer: A Co-Designed Sensor-to-Processor Pipeline for Frequency-Domain Vision Transformer Inference

FrequencyFormer: 面向频域视觉Transformer推理的协同设计传感器到处理器流水线

Chengwei Zhou, Ovishake Sen, Xuming Chen, Rishith Paramasivam, Shaahin Angizi, Swarup Bhunia, Baibhab Chatterjee, Gourav Datta

AI总结提出FrequencyFormer，通过多尺度DCT标记化将图像压缩为频域令牌，结合近传感器LUT硬件和低功耗通信架构，实现高达128倍数据压缩和28.8 TOPS/W能效，兼容多种视觉任务。

详情

AI中文摘要

在传感器边缘系统上部署视觉Transformer（ViT）不仅受限于设备计算能力，还受限于从传感器到处理器传输高维图像数据所需的能量和带宽。虽然传感器内和近传感器计算通过早期特征提取降低了这一成本，但现有方法通常仅提供适度的压缩。我们观察到频域提供了视觉信息的自然紧凑表示，并且可以在传感器级别利用以减少传感器到处理器的数据移动。基于这一见解，我们提出了FrequencyFormer，一种用于高效ViT推理的协同设计传感器到处理器流水线。FrequencyFormer包括：（1）多尺度DCT标记化器，将224x224图像压缩为紧凑的频域令牌，实现高达128倍的片外数据量减少，且精度损失较小；（2）基于查找表（LUT）的近传感器硬件实现，利用固定DCT系数实现无乘法器、节能且面积高效的标记化；（3）改进的基于MIPI的低功耗通信架构，进一步降低传输能量。FrequencyFormer可作为标准ViT补丁嵌入的直接替代，并与分类、检测和分割任务的预训练骨干网络兼容。该流水线实现了28.8 TOPS/W的能效，将通信能量降低230倍，并将总传感器侧能量降低2.22倍，展示了频域标记化作为传感器内ViT部署的可扩展基础。

英文摘要

Deploying vision transformers (ViTs) on sensor-edge systems is limited not only by on-device compute, but also by the energy and bandwidth required to transmit high-dimensional image data from the sensor to the processor. While in-sensor and near-sensor computing reduce this cost through early feature extraction, existing methods often provide only modest compression. We observe that the frequency domain provides a naturally compact representation of visual information and can be exploited at the sensor level to reduce sensor-to-processor data movement. Building on this insight, we present FrequencyFormer, a co-designed sensor-to-processor pipeline for efficient ViT inference. FrequencyFormer includes: (1) a multi-scale DCT tokenizer that compresses a 224x224 image into compact frequency-domain tokens, achieving up to 128x reduction in off-chip data volume with modest accuracy loss; (2) a LUT-based near-sensor hardware implementation that leverages fixed DCT coefficients for multiplier-free, energy- and area-efficient tokenization; and (3) a modified MIPI-based low-power communication architecture that further reduces transfer energy. FrequencyFormer serves as a drop-in replacement for standard ViT patch embedding and remains compatible with pretrained backbones across classification, detection, and segmentation tasks. The pipeline achieves 28.8 TOPS/W, reduces communication energy by 230x, and lowers total sensor-side energy by 2.22x, demonstrating frequency-domain tokenization as a scalable foundation for in-sensor ViT deployment.

URL PDF HTML ☆

赞 0 踩 0

2606.19372 2026-06-19 eess.IV cs.CV cs.LG 新提交

Full-Self Diagnostics (FSD): Physics-Grounded Visual Biomarker Inference from Smartphone Video via Inverse Problems and Operator Learning

全自诊断(FSD): 通过逆问题和算子学习从智能手机视频进行基于物理的可视生物标志物推断

Jonathan Thomas, Harsh Thaker

AI总结提出全自诊断(FSD)框架，结合物理前向模型、信息论可观测性、正则化逆问题、算子学习和随机变分推断，从9秒面部视频恢复生理状态，在59名受试者38812次扫描中验证，血糖MARD达29.86%。

Comments 38,812 paired scans, preliminary longitudinal validation of multichannel visual glucose inference (MARD 17 to 46 percent across cohorts); physics plus information theory plus operator learning framework

详情

AI中文摘要

我们提出全自诊断(FSD)，一个统一的数学框架，用于从消费级智能手机拍摄的无约束9秒面部视频中恢复潜在生理状态。该方法整合了五个相互增强的组件：(1)基于辐射传输方程和发色团吸收的物理前向模型，将相机观测映射到生物标志物浓度；(2)信息论可观测性理论，证明多通道视觉信号（光谱、脉搏、呼吸、微表情和眼动）与生理状态包含严格递增的互信息；(3)具有域均匀可辨识性保证的稳定Tikhonov正则化逆问题；(4)算子学习公式，实现跨设备、分辨率和人群的泛化；(5)可解释为随机变分推断的监督学习过程，从配对生物传感器真实值持续优化模型，性能随配对观测数量的平方根倒数比例提升。在59名受试者的38812次真实世界配对扫描上的实证验证展示了实际性能。第一作者自采数据（血糖范围35-550 mg/dL）的MARD为29.86%，97.57%的预测落在Clarke误差网格A+B区，仅0.27%在危险E区。一位管理良好的糖尿病参与者在较窄的70-180 mg/dL范围内达到MARD 17%。这些结果证实，消费级面部视频编码了足够的结构化信息，可在完全无约束条件下进行临床相关的非侵入性生物标志物推断，且性能随更多配对数据的可用性可预测地提升。

英文摘要

We present Full-Self Diagnostics (FSD), a unified mathematical framework for recovering latent physiological states from unconstrained 9-second facial videos captured by consumer smartphones. The approach integrates five mutually reinforcing components: (1) a physics-based forward model derived from the radiative transfer equation and chromophore absorption that maps camera observables to biomarker concentrations; (2) an information-theoretic observability theory proving that multi-channel visual signals (spectral, pulse, respiratory, micro-expression, and oculomotor) contain strictly increasing mutual information with physiological state; (3) a stable, Tikhonov-regularized inverse problem with domain-uniform identifiability guarantees; (4) an operator-learning formulation that enables generalization across devices, resolutions, and populations; and (5) a supervised learning procedure, interpretable as stochastic variational inference, that continuously refines the model from paired biosensor ground truth with performance improving proportionally to one over the square root of the number of paired observations. Empirical validation on 38812 real-world paired scans across 59 subjects demonstrates practical performance. Self-collected data from the lead author (glucose range 35-550 mg/dL) yields MARD of 29.86 percent with 97.57 percent of predictions in Clarke Error Grid Zones A+B and only 0.27 percent in the dangerous Zone E. A well-managed diabetic participant achieves MARD of 17 percent in the narrower 70-180 mg/dL band. These results confirm that consumer-grade facial video encodes sufficient structured information for clinically relevant, non-invasive biomarker inference under fully unconstrained conditions, with performance scaling predictably as more paired data becomes available.

URL PDF HTML ☆

赞 0 踩 0

2606.19767 2026-06-19 eess.IV cs.CV physics.med-ph 新提交

Contour-Constrained Deformable Registration with Parameter Characterization for Head and Neck Surgical Guidance

面向头颈外科引导的带参数表征的轮廓约束可变形配准

Qingyun Yang, Jon S. Heiselman, Ayberk Acar, Morgan J. Ringel, Michael I. Miga, Matthieu Chabanas, Michael C. Topf, Jie Ying Wu

AI总结提出一种基于正则化Kelvinlet基函数的可变形配准框架，通过表面点云、基准标记和轮廓约束校正术后组织变形，在9例头颈标本上将配准误差从刚性配准的11.11mm降至5.62mm，降幅达49.41%。

详情

AI中文摘要

全球每年新增89万例头颈部鳞状细胞癌，其复发率在实体恶性肿瘤中最高。尽管冰冻切片分析是术中切缘评估的标准方法，但由于切除标本与切除床之间的对准不精确，加上切除后黏膜组织收缩，准确地将检测到的阳性切缘重新定位到切除床上仍然具有挑战性。我们提出了一种生物力学驱动的可变形配准框架，用于校正术后组织变形以提供术中引导。该方法基于正则化Kelvinlet基函数的可变形配准方法，将3D标本网格配准到术中切除床点云。配准匹配表面点云、基准标记和边界轮廓约束，直接惩罚标本与切除床边界之间的垂直距离一致性。在来自皮肤、颊粘膜和舌部位的9个标本上，使用刚性配准的整体平均目标配准误差为$11.11 \pm 4.07$ mm，使用无轮廓约束的可变形配准则降至$8.20 \pm 2.68$ mm（降低26.19%）。所提出的轮廓约束可变形配准进一步将误差降至$5.62 \pm 2.28$ mm，相对于刚性配准降低了49.41%。我们在临床最具挑战性的舌标本中观察到最大降幅。我们还进行了系统的两阶段参数搜索，以表征表面配准、基准对应、轮廓约束和应变能正则化的相对重要性。该搜索表明，对于具有大侧向变形的组织类型，轮廓权重主导配准精度，而算法在广泛的参数组合范围内均可运行。

英文摘要

With 890,000 annual new cases globally, head and neck squamous cell carcinoma has one of the highest recurrence rates among solid malignancies. Although frozen section analysis is the standard of care for intraoperative margin assessment, accurately relocating detected positive margins on the resection bed remains challenging due to imprecise alignment between resected specimens and their resection bed, compounded by post-resection mucosal tissue shrinkage. We present a biomechanics-driven deformable registration framework that corrects post-resection tissue deformation to provide intraoperative guidance. Our approach registers 3D specimen meshes to intraoperative resection bed point clouds using a deformable registration approach based on regularized Kelvinlet basis functions. The registration matches surface point clouds, fiducial landmarks, and boundary contour constraints that directly penalize perpendicular distance-to-agreement between specimen and resection bed boundaries. Across nine specimens from skin, buccal mucosa, and tongue sites, the overall mean target registration error was $11.11 \pm 4.07$ mm using rigid registration, which decreased to $8.20 \pm 2.68$ mm (26.19\% reduction) using deformable registration without contour constraint. The proposed contour-constrained deformable registration further reduced the error to $5.62 \pm 2.28$ mm, a 49.41\% reduction relative to rigid registration. We observed the largest reduction in the most clinically challenging tongue specimens. We also performed a systematic two-stage parameter search to characterize the relative importance of surface alignment, fiducial correspondences, contour constraint, and strain energy regularization. This search revealed that contour weighting dominates registration accuracy for tissue types with large lateral deformation, while the algorithm operates over a broad range of parameter combinations.

URL PDF HTML ☆

赞 0 踩 0

2606.20112 2026-06-19 cs.CV eess.IV 交叉投稿

Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation

像素级残差扩散Transformer：可扩展的3D CT体生成

Zhenkai Zhang, Markus Hiller, Krista A. Ehinger, Tom Drummond

发表机构 * School of Computing and Information Systems, The University of Melbourne（墨尔本大学计算与信息系统学院）

AI总结提出像素级残差扩散Transformer（PRDiT），通过两阶段训练（局部MLP盲估计器分离低频结构+全局残差扩散Transformer建模高频残差）实现高保真3D CT体生成，在LIDC-IDRI和RAD-ChestCT数据集上优于现有方法。

Comments Accepted at ICLR 2026. Code available at https://github.com/Fredy-Zhang/PRDiT

详情

AI中文摘要

由于现有生成模型固有的巨大计算需求和优化困难，生成具有精细细节的高分辨率3D CT体仍然具有挑战性。在本文中，我们提出了像素级残差扩散Transformer（PRDiT），这是一种可扩展的生成框架，可直接在体素级别合成高质量的3D医学体。PRDiT引入了一个两阶段训练架构，包括：1）一个局部去噪器，形式为基于MLP的盲估计器，作用于重叠的3D块，以有效分离低频结构；2）一个全局残差扩散Transformer，采用内存高效注意力来建模和细化整个体上的高频残差。这种从粗到细的建模策略简化了优化，增强了训练稳定性，并有效保留了细微结构，而无需自编码器瓶颈。在LIDC-IDRI和RAD-ChestCT数据集上进行的大量实验表明，PRDiT始终优于最先进的模型，如HA-GAN、3D LDM和WDM-3D，在3D FID、MMD和Wasserstein距离指标上显著降低。

英文摘要

Generating high-resolution 3D CT volumes with fine details remains challenging due to substantial computational demands and optimization difficulties inherent to existing generative models. In this paper, we propose the Pixel-Level Residual Diffusion Transformer (PRDiT), a scalable generative framework that synthesizes high-quality 3D medical volumes directly at voxel-level. PRDiT introduces a two-stage training architecture comprising 1) a local denoiser in the form of an MLP-based blind estimator operating on overlapping 3D patches to separate low-frequency structures efficiently, and 2) a global residual diffusion transformer employing memory-efficient attention to model and refine high-frequency residuals across entire volumes. This coarse-to-fine modeling strategy simplifies optimization, enhances training stability, and effectively preserves subtle structures without the limitations of an autoencoder bottleneck. Extensive experiments conducted on the LIDC-IDRI and RAD-ChestCT datasets demonstrate that PRDiT consistently outperforms state-of-the-art models, such as HA-GAN, 3D LDM and WDM-3D, achieving significantly lower 3D FID, MMD and Wasserstein distance scores.

URL PDF HTML ☆

赞 0 踩 0

2603.10791 2026-06-19 eess.IV 版本更新

Semantic Satellite Communications for Synchronized Audiovisual Reconstruction

面向同步视听重建的语义卫星通信

Fangyu Liu, Peiwen Jiang, Wenjin Wang, Xiao Li, Shi Jin

AI总结提出自适应多模态语义传输系统，通过双流生成架构和动态关键帧更新机制，在带宽受限的卫星场景下实现高质量同步视听重建，显著降低带宽消耗并提升鲁棒性。

详情

AI中文摘要

卫星通信在支持高保真同步视听服务方面面临严重瓶颈，因为传统方案在信道波动、带宽有限和长传播延迟下难以处理跨模态一致性。为了解决这些问题，本文提出了一种针对卫星场景的自适应多模态语义传输系统，旨在带宽约束下实现高质量同步视听重建。与具有固定模态优先级的静态方案不同，我们的框架采用双流生成架构，可灵活切换视频驱动音频生成和音频驱动视频生成。这使得系统能够动态解耦语义，仅传输最重要的模态，同时利用跨模态生成恢复另一种模态。为了平衡重建质量和传输开销，动态关键帧更新机制根据无线场景和用户需求自适应维护共享知识库。此外，引入基于大语言模型的决策模块以增强系统适应性。通过集成卫星特定知识，该模块联合考虑任务需求和信道因素（如天气引起的衰落），主动调整传输路径和生成工作流。仿真结果表明，所提系统在实现高保真视听同步的同时显著降低带宽消耗，提高了挑战性卫星场景下的传输效率和鲁棒性。

英文摘要

Satellite communications face severe bottlenecks in supporting high-fidelity synchronized audiovisual services, as conventional schemes struggle with cross-modal coherence under fluctuating channel conditions, limited bandwidth, and long propagation delays. To address these limitations, this paper proposes an adaptive multimodal semantic transmission system tailored for satellite scenarios, aiming for high-quality synchronized audiovisual reconstruction under bandwidth constraints. Unlike static schemes with fixed modal priorities, our framework features a dual-stream generative architecture that flexibly switches between video-driven audio generation and audio-driven video generation. This allows the system to dynamically decouple semantics, transmitting only the most important modality while employing cross-modal generation to recover the other. To balance reconstruction quality and transmission overhead, a dynamic keyframe update mechanism adaptively maintains the shared knowledge base according to wireless scenarios and user requirements. Furthermore, a large language model based decision module is introduced to enhance system adaptability. By integrating satellite-specific knowledge, this module jointly considers task requirements and channel factors such as weather-induced fading to proactively adjust transmission paths and generation workflows. Simulation results demonstrate that the proposed system significantly reduces bandwidth consumption while achieving high-fidelity audiovisual synchronization, improving transmission efficiency and robustness in challenging satellite scenarios.

URL PDF HTML ☆

赞 0 踩 0

2601.15119 2026-06-19 eess.IV cs.CV 版本更新

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

Md Mahmudul Hoque, Md Mehedi Hassain, Muntakimur Rahaman, Md. Towhidul Islam, Shaista Rani, Md Sharif Mollah

发表机构 * Department of CSE, CCN University of Science & Technology（计算机科学与工程系，CCN科学与技术大学）； Department of EEE,International Islamic University Chittagong（电子工程系，国际伊斯兰大学恰tagong分校）； Faculty of Engineering, Multimedia University（工程学院，多媒体大学）； Department of CSE, Stamford University of Bangladesh（计算机科学与工程系，斯塔福德大学孟加拉国分校）； Department of Biology, Lucknow University（生物学系，拉胡尔大学）； Department of CSE, Bangladesh Army International University of Science & Technology（计算机科学与工程系，孟加拉国军队国际科学与技术大学）

2601.03112 2026-06-19 eess.IV cs.CV 版本更新

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

DiT-JSCC：基于扩散变换器与语义表示的深度JSCC再思考

Kailin Tan, Jincheng Dai, Sixian Wang, Guo Lu, Shuo Shao, Kai Niu, Wenjun Zhang, Ping Zhang

发表机构 * Beijing University of Posts and Telecommunications（北京邮电大学）； Shanghai Jiao Tong University（上海交通大学）； University of Shanghai for Science and Technology（上海科技大学）

AI总结提出DiT-JSCC框架，联合学习语义优先表示编码器和扩散变换器生成解码器，通过粗细粒度条件解码和基于Kolmogorov复杂度的自适应带宽分配，在极端信道条件下提升语义一致性与传输效率。

Comments 14pages, 14figures, 2tables

详情

AI中文摘要

生成式联合源信道编码（GJSCC）已成为一种新的深度JSCC范式，用于在极端无线信道条件（如超低带宽和低信噪比）下实现高保真和鲁棒的图像传输。近期研究通常采用扩散模型作为生成解码器，但经常产生视觉上逼真但语义一致性有限的结果。这种局限性源于面向重建的JSCC编码器与生成解码器之间的根本性不匹配，因为前者缺乏显式的语义判别能力，无法提供可靠的条件线索。在本文中，我们提出DiT-JSCC，一种新颖的GJSCC骨干网络，能够联合学习语义优先的表示编码器和基于扩散变换器（DiT）的生成解码器，我们的开源项目旨在促进GJSCC的未来研究。具体来说，我们设计了一个语义-细节双分支编码器，与从粗到细的条件DiT解码器自然对齐，在极端信道条件下优先考虑语义一致性。此外，受Kolmogorov复杂度启发，引入了一种无需训练的自适应带宽分配策略，以进一步提高传输效率，从而真正重新定义生成解码时代的信息价值概念。大量实验表明，DiT-JSCC在语义一致性和视觉质量上始终优于现有JSCC方法，尤其是在极端条件下。

英文摘要

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that can jointly learn a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder, our open-source project aims to promote the future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve the transmission efficiency, thereby indeed redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

URL PDF HTML ☆

赞 0 踩 0

2508.01819 2026-06-19 eess.IV 版本更新

Decoding the Alzheimer's Continuum: Interpretable Multi-Gate Routing for Diagnosis and Transition Prediction

解码阿尔茨海默病连续谱：可解释的多门路由用于诊断与转换预测

Yufeng Jiang, Hexiao Ding, Hongzhao Chen, Jing Lan, Xinzhi Teng, Gerald W. Y. Cheng, Yunlin Mao, Zongxi Li, Haoran Xie, Jung Sun Yoo, Jing Cai

AI总结提出M$^3$AD统一框架，利用可解释多门专家混合架构，基于T1加权sMRI同时实现三分类诊断和阶段转换预测，准确率达95.13%。

Comments Accepted by MICCAI2026

详情

AI中文摘要

阿尔茨海默病（AD）表现为从正常认知（NC）经轻度认知障碍（MCI）到痴呆的连续进展。然而，大多数深度学习方法将此连续谱简化为不连续的分类任务，很大程度上忽略了动态阶段转换。为了解码这一复杂进展，我们提出M$^3$AD，一个统一框架，仅使用T1加权sMRI联合处理三分类诊断和诊断阶段转换预测。M$^3$AD利用可解释的多门专家混合架构，采用专门的路由机制动态捕获诊断特定的病理模式和跨连续谱的共享结构特征。它进一步通过自适应注意力融合整合临床先验（年龄、性别、eTIV）以增强泛化能力。M$^3$AD在原始实验设置下达到95.13%的准确率（MCLNC报告为90.44%），转换预测准确率为94.87%。关键的是，分析多门路由揭示了区分稳定性和进展性MCI的独特专家激活特征，为个体水平的进展风险分层提供了机制基础。代码见：此 https URL。

英文摘要

Alzheimer's disease (AD) manifests as a continuous progression from normal cognition (NC) through mild cognitive impairment (MCI) to dementia. However, most deep learning approaches reduce this continuum to disjointed classification tasks, largely ignoring dynamic stage transitions. To decode this complex progression, we propose M$^3$AD, a unified framework that jointly addresses three-class diagnosis classification and diagnosis stage transition prediction using only T1-weighted sMRI. M$^3$AD leverages an interpretable multi-gate mixture of experts architecture, employing specialized routing mechanisms to dynamically capture both diagnosis-specific pathological patterns and shared structural features across the continuum. It further integrates clinical priors (age, sex, eTIV) via adaptive attention fusion to enhance generalization. M$^3$AD achieves 95.13% accuracy, compared to 90.44% reported by MCLNC under its original experimental setting, and 94.87% for transition prediction. Crucially, analyzing the multi-gate routing reveals distinct expert activation signatures distinguishing stable from progressive MCI, providing a mechanistic basis for individual-level progression risk stratification. Code is available at https://github.com/csyfjiang/M3AD.

URL PDF HTML ☆

赞 0 踩 0

2503.23179 2026-06-19 eess.IV cs.CV 版本更新

OncoReg: Medical Image Registration for Oncological Challenges

OncoReg：面向肿瘤学挑战的医学图像配准

Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Tim Hable, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

发表机构 * Institute of Medical Informatics, University of Lübeck（吕贝克大学医学信息学研究所）； Institute of Radiology and Nuclear Medicine, University Hospital Schleswig-Holstein（石勒斯维希-霍尔斯坦大学医院放射科和核医学研究所）； Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute（伦塞拉塞尔理工学院生物医学工程系和生物技术与跨学科研究中心）； School of Computer Science, University of Birmingham（伯明翰大学计算机科学学院）； Division of Informatics, Imaging and Data Sciences, University of Manchester（曼彻斯特大学信息学、成像和数据科学系）； DAMO Academy, Alibaba Group（阿里集团DAMO学院）； Hangzhou Shengshi Technology Co., Ltd（杭州盛世科技有限公司）； Department of Radiation Oncology, University Hospital Schleswig-Holstein（石勒斯维希-霍尔斯坦大学医院放射肿瘤科）； EchoScout GmbH ； Radboud University Medical Center, Nijmegen（奈密根大学医学中心）； Institute of Interventional Radiology, University Hospital Schleswig-Holstein（石勒斯维希-霍尔斯坦大学医院介入放射科）

AI总结提出OncoReg挑战，通过两阶段框架在保护患者隐私的同时开发可泛化的图像配准方法，用于放射治疗中锥束CT与扇束CT的配准，发现特征提取是关键，深度学习和经典方法结合最有效。

Comments 21 pages, 13 figures

详情

AI中文摘要

在现代癌症研究中，由于患者隐私相关的挑战，产生的大量医学数据往往未被充分利用。OncoReg挑战通过一个两阶段框架解决了这一问题，该框架使研究人员能够在确保患者隐私的同时开发和验证图像配准方法，并促进更可泛化的AI模型的发展。第一阶段涉及使用公开可用的数据集，第二阶段则专注于在安全的医院网络内对私有数据集进行模型训练。OncoReg建立在Learn2Reg挑战的基础上，纳入了放射治疗中介入性锥束计算机断层扫描与标准计划扇束CT图像的配准。准确的图像配准在肿瘤学中至关重要，特别是在图像引导放射治疗的动态治疗调整中，需要精确对齐以最小化对健康组织的辐射暴露，同时有效靶向肿瘤。本文详细介绍了OncoReg挑战的方法和数据，并对竞赛参赛作品和结果进行了全面分析。研究发现，特征提取在此配准任务中起着关键作用。从该挑战中涌现的一种新方法展示了其多功能性，而现有方法的表现与新技术相当。深度学习和经典方法在图像配准中仍扮演重要角色，尤其是方法的组合，特别是在特征提取方面，被证明最为有效。

英文摘要

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

URL PDF HTML ☆

赞 0 踩 0

2405.10705 2026-06-19 eess.IV cs.CV 版本更新

3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

基于血管概率引导衰减学习的稀疏视角动态DSA图像三维血管重建

Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

发表机构 * School of Biomedical Engineering \& State Key Laboratory of Advanced Medical Materials ； Devices, ShanghaiTech University, Shanghai, China ； National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China ； School of Electronic Information ； Communications, Huazhong University of Science ； Department of Computer Science \& Engineering, Texas A\&M University, USA

AI总结提出血管概率引导衰减学习框架，通过静态与动态衰减场互补加权实现稀疏视角DSA重建，降低辐射剂量，并采用渐进训练和时间扰动损失提升质量。

Comments Accepted by Medical Image Analysis (MedIA), 2026

详情

DOI: 10.1016/j.media.2026.104088

AI中文摘要

数字减影血管造影（DSA）是血管疾病诊断的金标准之一。借助造影剂，时间分辨的二维DSA图像提供全面的血流信息，可用于重建三维血管结构以进行医学评估。当前的商用DSA系统通常需要数百个扫描视角进行重建，导致大量辐射暴露。在本研究中，我们提出了一种基于神经渲染的优化框架，专门用于高质量稀疏视角DSA重建，以减少辐射剂量。我们的方法称为血管概率引导衰减学习，将DSA成像表示为静态和动态衰减场的互补加权组合，权重来自时间无关的血管概率场。作为前景掩膜，血管概率为静态和动态场提供适应不同场景类型的适当梯度。该机制实现了静态背景与动态造影剂流的自监督分解，并显著提高了重建质量。我们的模型通过最小化合成投影与真实DSA图像之间的差异进行训练。我们进一步采用两种训练策略来提高重建质量：（1）由粗到细的渐进训练以改善几何结构，以及（2）时间扰动渲染损失以保持时间一致性。实验结果表明了高质量的三维血管重建和二维DSA图像合成。

英文摘要

Digital Subtraction Angiography (DSA) is one of the gold standards for vascular disease diagnosis. With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive blood flow information and can be utilized to reconstruct 3D vessel structures for medical assessment. Current commercial DSA systems typically require hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. In this study, we propose a neural rendering-based optimization framework tailored for high-quality sparse-view DSA reconstruction to reduce radiation dosage. Our approach, termed vessel probability guided attenuation learning, represents DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the time-independent vessel probability field. Functioning as a foreground mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism enables self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves reconstruction quality. Our model is trained by minimizing the discrepancy between synthesized projections and real captured DSA images. We further employ two training strategies to improve reconstruction quality: (1) coarse-to-fine progressive training for better geometry and (2) temporal perturbed rendering loss for temporal consistency. Experimental results have demonstrated high-quality 3D vessel reconstruction and 2D DSA image synthesis.

URL PDF HTML ☆

赞 0 踩 0