arXivDaily: arXiv daily academic digest, updated Monday through Friday

Science and Medicine

Medical AI

Medical intelligence, clinical AI, medical imaging, pathology, diagnostics, and large models for healthcare.

Included for today/current date: 1. Signal sources: cs.CV, cs.LG, q-bio, eess.IV, eess.SP
2606.19827 2026-06-19 cs.LG cs.AI New submission 80%

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Daehwan Kim, Haejun Chung, Ikbeom Jang

Affiliations Hanyang University; Hankuk University of Foreign Studies

Topic match Other medical AI: adaptive binning for self-supervised learning on medical tabular data, improving performance.

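AI summary Proposes an adaptive binning method that dynamically refines discretization through a feature-wise coarse-to-fine curriculum and combines categorical reconstruction with ordinal supervision, improving self-supervised learning performance on medical tabular data.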

Comments Accepted to MICCAI 2026

Abstract

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
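
The coarse-to-fine idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is at the repository linked above). It only shows quantile discretization of one numerical feature and a per-feature schedule that refines the bin count when the loss plateaus. All names (`quantile_edges`, `bin_labels`, `PlateauDetector`, `refine_schedule`), the bin-doubling rule, and the `patience` and `min_delta` values are assumptions made for illustration. The paper's representation-aware split selection is replaced here by plain quantile re-binning.

```python
import numpy as np


def quantile_edges(x, n_bins):
    """Interior cut points that split x into n_bins roughly equal-mass bins."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # np.unique drops repeated cut points caused by heavily tied values.
    return np.unique(np.quantile(x, qs))


def bin_labels(x, edges):
    """Ordinal bin index for each value, in the range 0 .. len(edges)."""
    return np.searchsorted(edges, x, side="right")


class PlateauDetector:
    """Flags a plateau once the loss fails to improve for `patience` updates."""

    def __init__(self, patience=3, min_delta=1e-3):
        self.patience = patience
        self.min_delta = min_delta
        self.reset()

    def reset(self):
        self.best = np.inf
        self.stale = 0

    def update(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


def refine_schedule(losses, start_bins=2, max_bins=16, patience=3):
    """Bin count used at each step for one feature (hypothetical doubling rule)."""
    detector = PlateauDetector(patience=patience)
    n_bins = start_bins
    used = []
    for loss in losses:
        used.append(n_bins)
        # On a plateau, move to a finer discretization and restart tracking.
        if detector.update(loss) and n_bins < max_bins:
            n_bins = min(2 * n_bins, max_bins)
            detector.reset()
    return used


# Usage: bin one feature, then trace the schedule over a toy loss curve.
x = np.arange(100.0)
edges = quantile_edges(x, 4)
labels = bin_labels(x, edges)
schedule = refine_schedule([1.0, 0.5, 0.5, 0.5, 0.5, 0.5])
```

In a real pretext task the labels from `bin_labels` would be the per-feature prediction targets, and each feature would keep its own detector so that features refine independently.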