arXivDaily arXiv每日学术速递 周一至周五更新
重置

1. 深度学习架构与训练方法 10 篇

2606.18275 2026-06-18 cs.ET cond-mat.mtrl-sci cs.LG 交叉投稿

A physical adaptive material motor unit neural network: a hygromorph composite material machine

一种物理自适应材料运动单元神经网络:潮致变形复合材料机器

Charles de Kergariou, David Correa, Adam W. Perriman, Helmut Hauser, Fabrizio Scarpa

发表机构 * Bristol Composites Institute, School of Civil, Aerospace and Mechanical Engineering, University of Bristol(布里斯托尔复合材料研究所,土木、航空航天与机械工程学院,布里斯托尔大学) School of Architecture, University of Waterloo(滑铁卢大学建筑学院) Research School of Chemistry and John Curtin School of Medical Research, Australian National University(化学研究学校和约翰·库廷医学研究学院,澳大利亚国立大学) School of Cellular and Molecular Medicine, University of Bristol(细胞与分子医学学院,布里斯托尔大学) School of Engineering Mathematics and Technology, University of Bristol(工程数学与技术学院,布里斯托尔大学) Bristol Robotics Lab, Bristol, United Kingdom(布里斯托尔机器人实验室,布里斯托尔,英国)

AI总结 提出一种基于木材和炭黑复合材料的物理自适应运动单元神经网络,通过数据感知反向传播训练,实现动态遮阳控制,并能随数据库扩展增量学习。

Comments 35 pages, 16 figures

详情
AI中文摘要

新型材料科学的进步使得结构能够通过将记忆和学习能力直接嵌入材料来充当智能机器。我们的工作介绍了一种物理自适应材料运动单元神经网络,利用由木材和炭黑基复合材料组成的新一代可控执行器,这些执行器对温度和相对湿度敏感。这些材料执行器被组装成一种类似肌肉收缩触发的运动单元结构,形成一种能够进行动态遮阳控制的智能机器,例如可用于建筑物。该机器由一个神经网络控制,该网络在超过350个在不同环境条件下收集的实验数据点上进行训练。通过建立一种新的数据感知反向传播训练,我们展示了该机器能够预测遮阳响应,并随着数据库的扩展逐步学习预测适当的行为。我们还展示了该机器优化配置以在两种不同条件下实现相似遮阳输出的能力。

英文摘要

Advances in novel materials science enable structures to function as intelligent machines by embedding memory and learning capabilities directly into materials. Our work introduces a physical adaptive material motor unit neural network,leveraging a new generation of controllable actuators composed of wood- and carbon black-based composites, sensitive to temperature and relative humidity. These material actuators are assembled into a motor unit-like structure inspired by muscle contraction trigger, forming an intelligent machine capable of dynamic shading control that can be used, for example, in buildings. The machine is governed by a neural network trained on over 350 experimental data points collected under diverse environmental conditions. By establishing a new data-aware backpropagation training, we show that the machine predicts shading responses and learns to predict appropriate behaviour incrementally as the database expands. We also demonstrate the ability of the machine to optimise configurations to achieve similar shading outputs under two distinct conditions.

2606.18305 2026-06-18 math.NA cs.LG cs.NA 交叉投稿

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

起始迭代神经算子:面向高保真正问题和逆问题的统一架构

Kuilin Qin, Lianfang Wang, Xu Sun, Jiwei Jia, Yu Wang, Yong Wang, Yuping Duan

发表机构 * School of Mathematical Sciences, Beijing Normal University(北京师范大学数学科学学院) School of Mathematics, Jilin University(吉林大学数学学院) Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang(浙江省数字医疗诊断技术重点实验室) School of Physics, Nankai University(南开大学物理学院)

AI总结 提出起始迭代神经算子(SINO),通过神经网络重解释传统迭代方法的初始化与迭代格式,实现频谱-时空协同建模,在Navier-Stokes方程、声波方程等正逆问题中提升数值精度与泛化能力。

详情
AI中文摘要

算子学习是一个新兴的交叉学科领域,融合了机器学习与科学计算。通过映射无限维函数空间,该方法为高维偏微分方程(PDE)提供了高效的代理建模框架。与传统数值求解器相比,它在计算复杂度和逼近精度之间实现了更优的权衡,在实时预测和参数扫描等多查询任务中展现出显著优势。鉴于正演模拟和反演推理对精度的严格要求,以及现有算子学习方法在处理复杂边界或长期演化时的精度瓶颈,我们提出了起始迭代神经算子(SINO)。我们的框架通过神经网络重新诠释传统迭代方法的初始化策略和迭代格式,建立了一种高效的频谱-时空协同建模方法。具体而言,频域初始化模块捕获全局稳定的低频特征,而时域学习模块专注于优化局部解残差,从而有效克服了传统单域建模方法的内在局限性。在典型动力系统(如Navier-Stokes方程和声波方程)以及实际应用(包括超分辨率成像和天气预报)上的大量实验表明,SINO在数值精度、泛化能力和鲁棒性方面均取得了卓越性能。

英文摘要

Operator learning is an emerging interdisciplinary field that integrates machine learning with scientific computing. By mapping infinite-dimensional function spaces, this approach provides an efficient surrogate modeling framework for high-dimensional partial differential equations (PDEs). Compared to traditional numerical solvers, it achieves a superior trade-off between computational complexity and approximation accuracy, demonstrating significant advantages in many-query tasks such as real-time prediction and parameter sweeps. Given the stringent accuracy requirements of both forward simulation and inverse inference, as well as the precision bottlenecks of existing operator learning methods in handling complex boundaries or long-term evolution, we propose the Starter-Iterator Neural Operator (SINO). Our framework reinterprets the initialization strategies and iterative formats of traditional iterative methods through neural networks, establishing an efficient approach for spectral-spatiotemporal collaborative modeling. Specifically, the frequency-domain initialization module captures globally stable low-frequency features, while the time-domain learning module focuses on optimizing local solution residuals, thereby effectively overcoming the inherent limitations of conventional single-domain modeling approaches. Extensive experiments on typical dynamical systems such as the Navier-Stokes equations and acoustic wave equations, as well as practical applications including super-resolution imaging and weather forecasting, demonstrate that SINO achieves outstanding performance in numerical accuracy, generalization capability, and robustness.

2606.18611 2026-06-18 cs.SD cs.AI cs.LG stat.ML 交叉投稿

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

QC-GAN: 一种参数高效的四元数Conformer GAN用于高保真语音增强

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

发表机构 * The Asahi Shimbun Company(朝日新闻社) Tokyo Woman's Christian University(东京女子基督教大学)

AI总结 提出参数高效的QC-GAN,结合四元数Conformer生成器和MetricGAN训练,通过汉密尔顿积共享权重减少参数量,在VoiceBank+DEMAND上以0.89M参数达到PESQ 3.48,性能媲美两倍大小模型。

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech2026

详情
AI中文摘要

我们提出了一种参数高效的语音增强框架——四元数Conformer GAN(QC-GAN),它将四元数Conformer生成器与基于MetricGAN的训练相结合。汉密尔顿积通过结构化权重共享对幅度和相位进行编码,在减少层参数数量的同时保持其相互依赖性。采用度量学习判别器,通过优化近似感知评估分数来最大化感知质量。在VoiceBank+DEMAND数据集上,QC-GAN仅用0.89M参数就达到了3.48的语音质量感知评估(PESQ)分数,其性能与最先进模型相当,而参数量不到后者的一半。一个35K参数的变体实现了3.23的PESQ分数,以显著更少的参数超越了传统方法。在DNS-Challenge 3数据集上的评估进一步证实了其在真实世界条件下的泛化能力。

英文摘要

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator was employed to maximize perceptual quality by optimizing the approximate perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering a performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieved a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirmed generalization to real-world conditions.

2606.18759 2026-06-18 cs.CG cs.LG cs.NA math.NA 交叉投稿

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

参数曲面上类测地线曲线计算的神经网络框架

Sheng-Gwo Chen, Chen-Chang Peng

发表机构 * Department of Applied Mathematics, National Chiayi University, Chia-Yi 600, Taiwan(国立嘉义大学应用数学系,嘉义600,台湾)

AI总结 提出基于物理信息神经网络(PINNs)的框架,高效计算参数曲面上的类测地线曲线,支持多曲面系统和旋转曲面。

Comments 22 pages, 16 figures, 8 tables

详情
AI中文摘要

类测地线曲线的概念由Chen于2010年提出,作为估计参数曲面上最短路径(测地线)的一种方法,其收敛性已在理论上得到证明。然而,高效的数值计算框架尚未被开发。在本文中,我们提出了一种优雅且高效的方法,通过利用深度学习和物理信息神经网络(PINNs)来计算类测地线曲线。在所提出的框架下,不仅可以高效处理单个参数曲面,还可以稳健地处理一大类复杂参数曲面,包括具有$C^0$或更高连续性的多曲面系统以及旋转曲面。

英文摘要

The concept of geodesic-like curves was introduced by Chen in 2010 as a method for estimating shortest paths (geodesics) on parametric surfaces, with its convergence established theoretically. However, an efficient numerical computational framework has not yet been developed. In this paper, we propose an elegant and efficient approach for computing geodesic-like curves by leveraging deep learning and Physics-Informed Neural Networks (PINNs). Under the proposed framework, not only can single parametric surfaces be handled efficiently, but a broad class of complex parametric surfaces including multi-surface systems with $C^0$ or higher continuity and surfaces of revolution can also be robustly addressed.

2606.18837 2026-06-18 cs.MA cs.AI cs.LG 交叉投稿

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Skill-MAS: 演化元技能以自动生成多智能体系统

Hehai Lin, Qi Yang, Chengwei Qin

发表机构 * Ant Group(蚂蚁集团) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州))

AI总结 提出Skill-MAS,通过将高层编排能力解耦为可演化的元技能,在无需参数更新的情况下实现经验保留,利用多轨迹采样和选择性反思优化元技能,在多个基准和LLM上取得显著性能提升且成本可控。

详情
AI中文摘要

基于大型语言模型(LLM)的自动多智能体系统(MAS)生成已成为处理复杂任务的关键前沿。然而,现有方法在模型能力和经验保留之间面临两难困境。推理时MAS利用冻结的尖端LLM,但重复相同搜索而不从过去经验中学习。相反,训练时MAS通过梯度更新内化经验,但受限于较小模型的低能力上限,且难以扩展到大型尖端LLM。为弥合这一差距,我们提出Skill-MAS,一种新颖的第三条路径,通过将高层编排能力概念化为可演化的元技能,将经验保留与参数更新解耦。Skill-MAS通过一个封闭优化循环来精炼这种架构知识:(1)多轨迹采样在当前元技能下为每个任务采样行为分布;(2)选择性反思自适应选择优先任务,并应用分层对比分析将系统经验蒸馏为可泛化的策略级原则。在四个复杂基准和四个不同LLM上的大量实验表明,Skill-MAS不仅实现了显著的性能提升,而且保持了良好的成本-性能权衡。进一步分析揭示,演化后的元技能高度鲁棒,并在未见任务和不同LLM之间表现出强迁移性。

英文摘要

Large Language Model (LLM)-based automatic Multi-Agent Systems (MAS) generation has become a crucial frontier for tackling complex tasks. However, existing methods face a dilemma between model capability and experience retention. Inference-time MAS leverages frozen frontier LLMs but repeats identical searches without learning from past experience. Conversely, Training-time MAS internalizes experience via gradient updates but is constrained by the low capability ceiling of smaller models, and is hard to scale to large frontier LLMs. To bridge this gap, we propose Skill-MAS, a novel third path that decouples experience retention from parametric updates by conceptualizing the high-level orchestration capability as an evolvable Meta-Skill. Skill-MAS refines this architectural knowledge through a closed optimization loop: (1) Multi-Trajectory Rollout samples a behavioral distribution for each task under the current Meta-Skill; and (2) Selective Reflection adaptively selects priority tasks and applies hierarchical contrastive analysis to distill systemic experience into generalizable, strategy-level principles. Extensive experiments across four complex benchmarks and four distinct LLMs demonstrate that Skill-MAS not only achieves remarkable performance gains but also maintains a favorable cost-performance trade-off. Further analysis reveals that the evolved Meta-Skills are highly robust and exhibit strong transferability across unseen tasks and different LLMs.

2606.18853 2026-06-18 stat.ML cs.LG 交叉投稿

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

划分路径的核:树集成的统一表示

Nicolas Mahler

AI总结 提出KPP核,通过路径度量索引森林节点,统一了预测、精确加性归因、确定性Lipschitz鲁棒半径和Rademacher风险界,为树集成提供几何框架。

Comments 31 pages

详情
AI中文摘要

最近的一系列工作将单个决策树重新表述为基于其分裂的工程特征的线性模型,为oracle不等式和特征重要性重解释开辟了途径,但留下了一个开放问题:当通过节点而非分裂索引特征映射时,森林诱导的统一几何对象是什么。本文研究了该对象。KPP通过森林节点索引特征映射,并由路径度量加权,该度量将每个坐标转化为平方欧几里得路径等距嵌入的分量。KPP在承载度量的非对角Gram矩阵下统一了四个支柱:预测、精确加性归因、KPP度量下的确定性Lipschitz鲁棒半径,以及在固定、诚实或交叉拟合条件下的回归和分类的均匀Rademacher风险界。所有概率保证均以表示为条件,并在三种显式条件机制下陈述;鲁棒半径保证在KPP度量下是确定性的,而非原始输入的范数。回归和分类的快速率改进被推测为开放问题,并未声称是定理。

英文摘要

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

2606.19039 2026-06-18 cs.NE cs.LG cs.SD 交叉投稿

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

自适应语音到脉冲编码用于脉冲神经网络

Taharim Rahman Anon, Jakaria Islam Emon

发表机构 * PI LLC(1 PI LLC)

AI总结 提出一种可学习的残差语音到脉冲编码器,与R-LIF骨干网络联合训练,在GSC-v2上达94.97%准确率,参数高效且学习任务对齐的脉冲表示。

Comments Accepted at Interspeech 2026. This version is a preprint

详情
AI中文摘要

连续声学信号与离散事件驱动处理之间的不匹配仍然是神经形态语音处理的基本瓶颈。当前系统通常依赖固定的脉冲编码器,迫使下游脉冲神经网络(SNN)补偿非自适应的输入表示。为了解决这个问题,我们提出了一种可学习的残差语音到脉冲编码器,与循环漏积分点火(R-LIF)骨干网络进行端到端联合训练。我们在Google Speech Commands v2(GSC-v2)基准上验证了该方法,达到了高达94.97%的准确率。值得注意的是,学习到的编码器仍然高度参数高效,其紧凑的35k参数变体达到了89.8%,匹配或超过了需要多一个数量级参数的先前基线。我们以编码器为中心的分析,包括线性探测和梯度残差检查,表明编码器并不追求忠实的信号重建,而是学习任务对齐的脉冲表示,增强了类别可分性。最后,我们通过比较直接反馈对齐(DFA)和替代梯度BPTT在相同架构和训练条件下的表现,对生物启发、硬件友好的信用分配进行了基准测试。我们发现DFA达到了91.5%的准确率,量化了生物启发学习规则在现代神经形态音频中的性能权衡。

英文摘要

The mismatch between continuous acoustic signals and discrete event-driven processing remains a fundamental bottleneck for neuromorphic speech processing. Current systems typically rely on fixed spike encoders, forcing downstream Spiking Neural Networks (SNNs) to compensate for non-adaptive input representations. To address this, we present a learnable residual speech-to-spike encoder jointly trained end-to-end with a Recurrent Leaky Integrate-and-Fire (R-LIF) backbone. We validate this approach on the Google Speech Commands v2 (GSC-v2) benchmark, achieving up to 94.97% accuracy. Notably, the learned encoder remains highly parameter-efficient with a compact 35k-parameter variant that reaches 89.8%, matching or exceeding prior baselines that require an order of magnitude more parameters. Our encoder-focused analysis, including linear probing and gradient-residual inspection, indicates that the encoder does not target faithful signal reconstruction but instead learns task-aligned spike representations that enhance class separability. Finally, we benchmark bio-inspired, hardware-friendly credit assignment by comparing Direct Feedback Alignment (DFA) with surrogate-gradient BPTT under identical architectures and training conditions. We find that DFA reaches 91.5% accuracy, quantifying the performance trade-off of bio-inspired learning rules for modern neuromorphic audio.

2606.19101 2026-06-18 eess.SP cs.LG 交叉投稿

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

结构优于非线性:面向动力学学习的显式交互架构

Augusto Sarti

AI总结 提出基于波启发交互结构的显式动力学单元,通过结构化组织而非非线性表达实现建模能力,在非线性系统辨识中深度提升表示质量与泛化性能。

Comments 11 pages, 2 figures, 2 tables

详情
AI中文摘要

大多数动力学系统的学习架构依赖于通用非线性函数逼近,通常需要高模型复杂度来捕获结构化行为。在这项工作中,我们提出了一种替代范式,其中建模能力主要来源于结构而非表达性非线性。我们引入了一类基于波启发交互结构和内部状态的显式结构化动力学单元。受波计算原理启发,所提出的单元采用严格的因果组织,消除了代数循环,产生无需隐式求解器即可评估的完全显式模型。堆叠此类单元可产生具有涌现层次行为的分层动力学架构。通过非线性系统辨识任务的实验,我们表明即使在有限的参数优化下,深度也能提高表示质量和泛化能力。特别地,所提出的架构即使在仅进行读出层拟合时也能产生信息丰富的内部表示,这表明有用的动力学结构在大量参数优化之前就已从交互的组织中涌现。这些结果表明,结构优先的设计为学习动力学系统提供了一种可行且有效的替代传统黑箱方法,突出了交互结构作为模型表达性主要来源的作用。

英文摘要

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

2606.19168 2026-06-18 cs.AI cs.LG 交叉投稿

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

超越安全数据:具有正则安全反射的预训练阶段对齐

Jinhan Li, Kexian Tang, Yihan Xu, Zhuorui Ye, Kaifeng Lyu

发表机构 * Institute for Interdisciplinary Information Sciences, Tsinghua University(清华大学交叉信息研究院)

AI总结 提出安全反射预训练方法,在预训练语料中插入安全反思,使模型具备自我监控能力,实验表明该方法能有效降低推理和微调攻击成功率。

详情
AI中文摘要

为了实现大型语言模型(LLMs)更深层次的安全对齐,最近的研究探讨了如何将安全干预措施提前到预训练阶段,主要通过过滤不安全数据或将其改写为更安全的形式。我们认为,预训练阶段的对齐应超越使数据安全:LLMs可能将看似良性的知识和能力组合成不安全的行为。为此,我们提出了安全反射预训练,一种预训练阶段的对齐方法,该方法定期在预训练语料中插入简短的安全反思,将自我监控直接集成到语言建模中,建立一种基础能力,随后通过兼容的后训练加以强化。我们在FineWeb-Edu上预训练的1.7B模型上的实验表明,安全反射预训练提高了安全分类准确性,并显著降低了推理阶段和微调攻击的成功率。除了真实世界实验,我们还引入了一个完全受控的合成环境MedSafetyWorld,其中包含清晰的安全定义和推理结构,模型可以轻松地从安全数据中泛化出不安全行为。在MedSafetyWorld中的消融实验进一步表明,与数据过滤和改写相比,安全反射预训练在防止模型根据安全数据泛化出的不安全行为方面具有明显优势。综合来看,我们的发现表明,预训练对齐不仅应使训练数据安全,还应塑造模型可能从安全数据中习得的行为。

英文摘要

To achieve deeper safety alignment for large language models (LLMs), recent efforts have studied how to push safety interventions earlier into the pretraining stage, primarily by filtering unsafe data or rewriting it into safer forms. We argue that pretraining-stage alignment should go beyond making the data safe: LLMs may compose seemingly benign knowledge and capabilities into unsafe behaviors. To this end, we propose Safety Reflection Pretraining, a pretraining-stage alignment method which regularly inserts short safety reflections into pretraining corpora to integrate self-monitoring directly into language modeling, establishing a foundational capability that is subsequently reinforced by compatible post-training. Our experiments with 1.7B models pretrained on FineWeb-Edu show that Safety Reflection Pretraining improves safety classification accuracy and substantially reduces the success rates of inference-stage and finetuning attacks. Complementary to our real-world experiments, we also introduce a fully controlled synthetic environment, MedSafetyWorld, with a clear definition of safety and a reasoning structure under which models can easily generalize unsafe behaviors from safe data. Ablations in MedSafetyWorld further demonstrate a clear advantage of Safety Reflection Pretraining in preventing models from acting on unsafe behaviors generalized from safe data, compared with data filtering and rewriting. Taken together, our findings suggest that pretraining alignment should not only make the training data safe, but also shape the behaviors that models are likely to acquire from safe data.

2606.19279 2026-06-18 cs.AI cs.LG cs.LO math.CT math.LO math.PR 交叉投稿

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

NeSyCat Torch:神经符号学习中范畴语义的可微张量实现

Daniel Romero Schellhorn, Till Mossakowski, Björn Gehrke

发表机构 * University of Osnabrück(奥斯纳布吕克大学)

AI总结 提出NeSyCat Torch框架,通过强单子和真值聚合结构统一神经符号语义,利用惰性对数张量单子实现可微训练,在MNIST加法任务上优于LTN和DeepProbLog。

详情
AI中文摘要

神经符号语义是碎片化的:经典、模糊、概率和神经系统的真值各自遵循其归纳规则。NeSyCat扩展了ULLER,将它们统一在一个单一的真值归纳定义下,该定义以强单子和真值上的聚合结构为参数。NeSyCat至今缺乏对由神经网络学习的谓词和函数的描述。我们提供NeSyCat Torch作为缺失的环节,通过神经网络解释计算符号,在概率编程和张量后端中实现该框架。我们使用分布单子作为参考语义和度量评估,并辅以一个用于数值稳定、可微训练的单子:对数半环上的惰性对数张量单子。为了高效批量训练,我们还采用了批处理单子。公理即源代码:一次性地用基于单子的do-notation编写,单子绑定执行边缘化,惰性地剪枝不需要的分支。在MNIST加法任务上,我们的HaskTorch、JAX和PyTorch实现在速度和准确性上优于LTN和DeepProbLog,同时几乎达到DeepStochLog的准确性。然而,与DeepStochLog不同,我们保持在一个统一的框架内,适用于许多一阶神经符号方法。即,该构造以单子为参数;例如,用Giry单子实例化它可将方法扩展到连续概率(在此留作未来工作)。

英文摘要

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. NeSyCat has so far lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link and interpret computational symbols via neural networks, implementing the framework in probabilistic programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it by a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we furthermore employ a batch monad. The axioms are the source code: written once in monad-based do-notation, monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while achieving nearly the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay in a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation here is left for future work).

2. 表示学习、自监督与对比学习 2 篇

2606.18520 2026-06-18 stat.ML cs.CG cs.CL cs.DS cs.IR cs.LG 交叉投稿

Compact Geometric Representations of Hierarchies

层次结构的紧凑几何表示

Prashant Gokhale, Piotr Indyk, Yuhao Liu, Sandeep Silwal, Tony Chang Wang, Haike Xu

发表机构 * UW-Madison(威斯康星大学麦迪逊分校) MIT(麻省理工学院)

AI总结 研究如何用低维几何嵌入表示有向无环图中的祖先-后代关系,提出基于树宽等结构参数的维度上界和下界,并在真实数据集上验证了紧凑性。

Comments Published at the 39th Annual Conference on Learning Theory (COLT) 2026. 22 Pages

详情
AI中文摘要

计算数据的几何表示是现代机器学习的基石,通常通过训练双编码器将查询和文档映射到共享嵌入空间来实现。You等人[NeurIPS '25]的最新工作将这种方法扩展到层次检索,其中相关性由有向无环图(DAG)中的祖先-后代关系决定。虽然先前的工作表明当后代数量较少时存在有效嵌入,但这些界限对于深层层次结构会严重退化,所需维度与节点总数相当。在本文中,我们研究了更一般图类的紧凑可达性嵌入,并提供了使用维度依赖于结构图参数的嵌入来表示层次结构的理论保证。我们证明,对于任何有向树,存在常数维度3的可达性嵌入,与树的大小或深度无关。我们将这一结果推广到以树宽$t$为特征的图,构造了维度为$O(t \log n)$的嵌入,其中$n$是节点数。作为这些上界的补充,我们提供了匹配或接近匹配的下界,表明对于一般DAG,维度$\Omega(n)$是必要的,而对于树宽为$t$的图,需要$\Omega(t/\log(n/t))$的维度。我们还获得了由DAG中交叉边数量参数化的上界和下界。此外,我们展示了我们的嵌入可以在真实世界数据集上构建,并且与先前具有理论保证的嵌入相比,在高召回率情况下维度小得多。

英文摘要

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $Ω(n)$ is necessary for general DAGs and $Ω(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.

2606.19249 2026-06-18 cs.CV cs.LG 交叉投稿

Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory

Transformer几何观测站TGO-I:谱几何观测站

Kaustubh Kapil, Kishor P. Upla

发表机构 * Sardar Vallabhai National Institute of Technology (SVNIT), Surat, India(印度苏拉特萨达尔·瓦拉巴伊国家理工学院(SVNIT))

AI总结 提出TGO框架,通过分析ViT表示的谱几何(有效秩、稳定秩、参与比、谱熵、谱平坦度、谱各向异性等),发现训练过程中维度利用增加、各向异性降低、谱熵和参与比上升,最终CLS标记表示具有最高有效维度和最低各向异性。

详情
AI中文摘要

尽管Vision Transformers(ViTs)被广泛采用并在众多计算机视觉应用中取得成功,对其维度和表示几何的基本理解仍然相对未被充分探索。为了弥补这一差距,我们引入了Transformer几何观测站(TGO),这是一个系统的实验和分析流程框架,旨在研究Vision Transformers的表示几何和动态。TGO-I是该框架的第一部分,专注于ViT表示的谱几何。使用在ImageNet-100上训练的ViT-Small/16模型,我们分析了训练过程中的有效秩、稳定秩、参与比、谱熵、谱平坦度、谱各向异性、协方差结构、特征谱和奇异值谱。我们的结果揭示了维度利用的一致增加,伴随着各向异性降低、谱熵增加、参与比增加以及逐渐平坦的特征谱。与常见的直觉(即训练应将信息集中到少数主导方向)相反,我们观察到方差在表示维度上的逐渐重新分布。这一现象在最终的CLS标记表示中尤为明显,该表示在网络中表现出最高的有效维度和最低的各向异性。

英文摘要

Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relatively underexplored. To address this gap, we introduce Transformer Geometry Observatory (TGO), a systematic framework of experiments and analysis pipelines designed to investigate the representational geometry and dynamics of Vision Transformers. TGO-I, the first installment of the framework, focuses on the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, we analyze Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra throughout training. Our results reveal a consistent increase in dimensional utilization, accompanied by decreasing anisotropy, increasing spectral entropy, increasing participation ratio, and progressively flatter eigenspectra. Contrary to the common intuition that training should concentrate information into a small number of dominant directions, we observe a progressive redistribution of variance across representational dimensions. This phenomenon is particularly pronounced in the final CLS token representation, which exhibits the highest effective dimensionality and lowest anisotropy within the network.

3. 强化学习与序列决策 5 篇

2606.18438 2026-06-18 math.OC cs.LG 交叉投稿

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

基于学习优化的临时工顺序雇佣

Chris Lee, Xiuli Chao, Izak Duenyas

发表机构 * Department of Industrial and Operations Engineering, University of Michigan(工业与运营工程系,密歇根大学) Ross School of Business, University of Michigan(罗斯商学院,密歇根大学)

AI总结 针对临时工场景中工人产能和劳动力供给的不确定性,提出DR-UCB策略,通过学习周期顺序决策替换与雇佣,实现累积利润最大化,并证明其遗憾下界匹配。

详情
AI中文摘要

在本文中,我们研究了临时工场景下存在工人产能和劳动力供给不确定性的顺序劳动力管理问题。企业通过维持固定规模的活跃团队并随时间学习工人生产力,以最大化累积利润。我们强调该问题中的两个关键运营摩擦:替换工人成本高昂,且工人可能因先前工作承诺、日程限制或入职流程等原因无法立即雇佣。因此,雇佣决策仅在随机延迟后生效。我们将该问题建模为具有昂贵切换和延迟动作的随机多臂赌博机,并开发了一种基于学习的雇佣策略DR-UCB(延迟替换-UCB),该策略通过学习周期顺序做出替换和雇佣决策。在每个周期中,该策略使用实时生产数据确定何时启动劳动力变更以及替换和雇佣哪些工人。我们证明,所提策略的前沿遗憾在其对时间范围的依赖上匹配下界。数值实验表明,DR-UCB优于基准策略。

英文摘要

In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.

2606.18514 2026-06-18 cs.RO cs.LG 交叉投稿

N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

N(CO)$^2$: 基于机会约束的神经组合优化求解随机定向问题

Anas Saeed, Marcos Abel Zuzuárregui, Stefano Carpin

发表机构 * Department of Computer Science and Engineering, University of California, Merced(加州大学默塞德分校计算机科学与工程系)

AI总结 提出N(CO)$^2$框架,结合强化学习求解随机定向问题,无需手工启发式,在不确定环境下优化路径选择,性能媲美MILP。

详情
Journal ref
In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), 2025
AI中文摘要

神经组合优化(NCO)通过学习启发式,为求解复杂图优化问题提供了一种有前景的替代传统启发式方法的方法。这类问题在自动化领域频繁出现,可用于建模多种应用。虽然NCO在确定性组合优化问题上已被广泛研究,但只有少数工作旨在解决随机组合优化问题。本文提出N(CO)$^2$:基于机会约束的神经组合优化,用于求解随机定向问题(SOP),无需手工设计的启发式。通过集成强化学习(RL)框架,模型在不确定性下优化路径选择,有效平衡探索与利用。实验结果表明,我们的方法在多种SOP实例上具有良好的泛化能力,与最先进的混合整数线性规划(MILP)相比性能具有竞争力。所提方法减少了启发式设计的人力投入,同时在不确定环境中实现自适应和高效的决策。

英文摘要

Neural combinatorial optimization (NCO) offers a promising alternative to traditional heuristic-based methods for solving complex graph optimization problems by proposing to learn heuristics through data. This class of problems frequently arises in automation, as it can be used to model a variety of applications. While NCO has been extensively studied for deterministic combinatorial optimization problems, there are only a few works that aim to solve stochastic combinatorial optimization problems. In this work, we present N(CO)$^2$: Neural Combinatorial Optimization with Chance cOnstraints to solve the Stochastic Orienteering Problem (SOP) without the use of hand-crafted heuristics. By integrating a reinforcement learning (RL) framework, the model optimizes path selection under uncertainty, effectively balancing exploration and exploitation. Empirical results demonstrate that our method generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP) for the task. The proposed approach reduces human effort in heuristic design while enabling adaptive and efficient decision-making in uncertain environments.

2606.18531 2026-06-18 stat.ML cs.LG 交叉投稿

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

轨迹级监督何时允许高效的离线强化学习?

Xuanfei Ren, Tengyang Xie

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

AI总结 本文研究离线强化学习中仅使用轨迹级结果(如累积回报或偏好)进行策略优化的统计理论,提出OPAC算法并证明其样本复杂度,同时揭示在非线性聚合目标下存在的统计障碍。

Comments 69 pages

详情
AI中文摘要

离线强化学习通常在过程级奖励监督下进行分析,然而许多序列决策数据集仅记录轨迹级结果。我们发展了从这种结果级监督进行离线策略优化的统计理论。首先研究规范设置,其中目标仍是期望累积奖励,但每个离线轨迹仅提供一个标量标签,其条件均值是累积回报。我们提出OPAC,一种悲观演员-评论家算法,它学习潜在奖励模型并从轨迹级标签优化策略。我们证明了阶为$\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$的高概率保证和匹配的下界,刻画了用单个轨迹级标签替代过程级奖励的尖锐统计代价。然后我们将该原理扩展到基于偏好的反馈,在偏好模型常数范围内保留了领先的视界和可集中性依赖。最后,我们研究广义基于结果的离线强化学习,其中监督和目标都是由潜在每步奖励的非线性聚合引起的轨迹级量。该问题通常不可学习:对于全成功目标,即使具有确定性转移和常数可集中性,任何离线学习器可能需要$\Omega(2^H)$个轨迹。然后我们通过两个结构系数$\kappa_\mu(\sigma)$和$\chi_\mu(\sigma)$识别出一个可处理的区域,这两个系数捕捉了结果聚合和广义贝尔曼更新中的信息损失,在此区域广义OPAC实现了多项式样本复杂度。我们的结果共同描绘了何时结果级监督能够实现样本高效的离线控制,以及何时缺失过程级奖励会带来根本性的统计障碍。

英文摘要

Offline reinforcement learning is typically analyzed under process-level reward supervision, yet many sequential decision datasets record only trajectory-level outcomes. We develop a statistical theory for offline policy optimization from such outcome-level supervision. We first study the canonical setting where the target remains the expected cumulative reward, but each offline trajectory provides only a scalar label whose conditional mean is the cumulative return. We propose OPAC, a pessimistic actor-critic algorithm that learns a latent reward model and optimizes a policy from trajectory-level labels. We prove a high-probability guarantee of order $\widetilde O(H^2\sqrt{C_{sa}(π^\star)/n})$ and a matching lower bound, characterizing the sharp statistical cost of replacing process-level rewards with one trajectory-level label. We then extend the principle to preference-based feedback, preserving the leading horizon and concentrability dependence up to preference-model constants. Finally, we study generalized outcome-based offline RL, where both the supervision and the objective are trajectory-level quantities induced by a nonlinear aggregation of latent per-step rewards. This problem is not learnable in general: for all-success objectives, any offline learner may require $Ω(2^H)$ trajectories even with deterministic transitions and constant concentrability. We then identify a tractable regime through two structural coefficients, $κ_μ(σ)$ and $χ_μ(σ)$, capturing information loss in outcome aggregation and generalized Bellman updates, under which generalized OPAC achieves polynomial sample complexity. Together, our results delineate when outcome-level supervision enables sample-efficient offline control and when missing process-level rewards create fundamental statistical barriers.

2606.18598 2026-06-18 cs.AI cs.LG 交叉投稿

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

在地质、需求和定价不确定性下优化锂生产决策:多目标决策的POMDP框架

Anna C. Edmonds, Mansur M. Arief, Robert J. Moss, Mykel J. Kochenderfer, Jef Caers

发表机构 * Computer Science Department, Stanford University(斯坦福大学计算机科学系) Aeronautics and Astronautics Department, Stanford University(斯坦福大学航空与航天系) Earth and Planetary Sciences Department, Stanford University(斯坦福大学地球与行星科学系)

AI总结 提出POMDP框架,通过信念状态规划优化锂矿开采决策,动态适应价格不确定性,实现更高需求满足和更平衡的经济环境效益。

Comments 24 pages, 14 tables, 4 figures

详情
AI中文摘要

锂生产中的决策制定具有挑战性,无论是从投资者角度还是战略生产角度。决定开采哪些矿山以及何时开采,不仅涉及地质和价格不确定性,还涉及提取方法选择的复杂性,从直接锂提取到硬岩开采。先前的工作探索了该问题的模型和优化采矿决策的不同方法;这些模型没有考虑定价不确定性、需求不确定性或提取锂的不同采矿技术。将不同的定价模型和提取技术纳入这些模型,可以制定更稳健的策略,不仅决定何时何地开采矿山,还决定采用哪种生产方法。我们将问题表述为部分可观测马尔可夫决策过程(POMDP),并使用信念状态规划方法求解以获得最优决策。在我们的研究中,我们表明POMDP求解器通过信念状态规划和显式不确定性管理,动态适应变化的锂价格机制(静态、线性、指数和随机),优于人类启发式启发法。通过优化勘探、生产和技术选择的顺序,该框架在所有不同的定价和矿床情景下,在项目生命周期内实现了更高的需求满足和更平衡的经济环境结果。

英文摘要

Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technology into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve using belief state planning methods to get optimal decision making. In our study, we show that POMDP solvers outperform human inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic environmental outcomes over the projects lifetime in all different pricing and deposit scenarios.

2606.19069 2026-06-18 eess.SY cs.LG cs.SY 交叉投稿

Model-Free Reinforcement Learning Control for Resilient Cyber-Physical Systems

面向弹性信息物理系统的无模型强化学习控制

Hugo O. Garcés, Alejandro J. Rojas, Bernardo A. Hernández, Andrés Escalona, Jonathan M. Palma, Md. Rezwan Parvez, Bhushan Gopaluni, Sirish L. Shah

发表机构 * Departmento de Ingenier\'ia El\'ectrica, Universidad de Concepci\'on, Concepci\'on, Chile (e-mail: ) Department of Electrical \& Computer Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, AB, Canada (e-mail: ) Department of Chemical Biological Engineering, University of British Columbia, Vancouver, BC V6T 1Z3, Canada ( ) Department of Chemical \& Materials Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, AB, Canada (e-mail: )

AI总结 本文比较了无模型控制器在非线性系统遭受网络攻击(虚假数据注入和拒绝服务攻击)下的性能,分析了四种强化学习奖励类型,发现Lyapunov奖励在低跟踪误差下弹性最佳,指数奖励在中等训练条件下提供良好折衷,渐进和线性奖励收敛快但鲁棒性差。

Comments Accepted to the 23rd IFAC World Congress 2026

详情
AI中文摘要

本文比较了无模型控制器在遭受网络攻击(包括虚假数据注入和拒绝服务攻击)的非线性系统上的性能。分析了四种强化学习奖励类型的准确性、成本和弹性。结果表明,Lyapunov奖励在低跟踪误差下提供最佳弹性。指数模式在中等训练条件下也提供了良好的折衷,具有可接受的弹性。渐进和线性奖励收敛更快,但鲁棒性较差。强化学习模型预测控制器(RL-MPC)表现出强稳态弹性,但需要更长的训练时间;强化学习比例-积分-微分控制器(RL-PID)更快,训练时间显著减少。近端策略优化(PPO)优于深度确定性策略梯度(DDPG),关键绩效指标(KPI)方差显著降低。本研究旨在强调精心设计的强化学习奖励如何提高性能和对网络威胁的弹性。

英文摘要

This paper compares the performance of model-free controllers on a nonlinear system under cyberattacks, including false data injection and denial-of-service attacks. Four RL reward types are analyzed for accuracy, cost, and resilience. Results show that the Lyapunov reward offers the best resilience with low tracking error. Exponential mode also provides good trade-offs with acceptable resilience under moderate training conditions. Progressive and linear rewards converge faster but are less robust. RL-MPCs show strong steady-state resilience but require longer training times; RL-PID controllers are faster with significantly less training time. Proximal Policy Optimization outperforms Deep Deterministic Policy Gradient with a significant reduction in KPI variance. This study serves to highlight how well-designed RL rewards can improve performance and resilience against cyber threats.

4. 生成模型与概率建模 5 篇

2606.18290 2026-06-18 cond-mat.stat-mech cs.LG eess.SP 交叉投稿

Stochastic Thermodynamics and SDE-based Generative Models

随机热力学与基于SDE的生成模型

Yaowen Zhang

发表机构 * GitHub

AI总结 本文在随机热力学框架下,为基于SDE的生成模型(如扩散模型和薛定谔桥)定义了轨迹层面的功、热和熵产生,并推广了Jarzynski恒等式和类第二定律不等式。

详情
AI中文摘要

基于SDE的生成模型,包括扩散模型和薛定谔桥,在信号处理任务中有着广泛的应用,如语音增强、图像恢复和时间序列生成。本文在随机热力学的背景下为这类模型提出了一个建模框架。本文的主要结果是功、热和熵产生的轨迹层面定义,以及一个推广的Jarzynski恒等式和一个类第二定律不等式。所提出的框架扩展了原始的Jarzynski设置,以适应时间依赖的浴温和非保守驱动力。这种热力学视角可能从非平衡统计力学的角度加深我们对扩散模型和薛定谔桥的理解。

英文摘要

SDE-based generative models, including diffusion models and the Schrödinger bridge, have found broad applications in signal processing tasks such as speech enhancement, image restoration, and time-series generation. This note presents a modeling framework for such models within the context of stochastic thermodynamics. The main results of this note are trajectory-level definitions of work, heat, and entropy production, along with a generalized Jarzynski identity and a second-law-like inequality. The proposed framework extends the original Jarzynski setup to accommodate time-dependent bath temperature and nonconservative driving forces. This thermodynamic perspective may deepen our understanding of diffusion models and the Schrödinger bridge from a nonequilibrium statistical mechanics viewpoint.

2606.18354 2026-06-18 eess.IV cs.LG 交叉投稿

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

基于解剖掩膜条件扩散的阿尔茨海默病结构MRI合成

Muge Zhang, Muhammad Ali Khaliq, Jamal Alsakran, Byeong Kil Lee, Jeeho Ryoo

发表机构 * Fairleigh Dickinson University(Fairleigh Dickinson大学) University of Colorado at Colorado Springs(科罗拉多州立大学)

AI总结 针对阿尔茨海默病结构MRI合成中细微解剖变化难以捕捉的问题,本文扩展Med-DDPM条件扩散模型,以解剖分割掩膜为条件生成3D结构MRI,实验表明合成数据训练的模型Dice分数与真实数据相当,混合数据训练则显著提升性能。

详情
Journal ref
2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)
AI中文摘要

生成式机器学习模型的最新进展显著改善了医学成像,为数据增强、隐私保护和模型泛化提供了有前景的解决方案。然而,由于神经退行性病变相关的细微、区域特异性和渐进性解剖变化,合成阿尔茨海默病(AD)的高质量结构MRI数据仍然具有挑战性。在本文中,我们将最初为脑肿瘤合成设计的Med-DDPM条件扩散模型扩展,以生成专门针对AD的3D结构MRI。我们采用Med-DDPM,因为与其他生成模型相比,它具有稳定的结构和保真度,特别适合捕捉AD特征的细微解剖变化。我们的方法以来自ADNI数据集的解剖分割掩膜为条件,将关键的AD相关脑结构纳入生成过程。我们通过在真实、合成和混合数据集上训练分割模型,系统评估了合成图像的质量和实用性。实验结果表明,仅在合成数据上训练的分割模型达到了与真实数据训练(0.6513)相当的Dice分数(0.6532),同时召回率显著提高。值得注意的是,在混合数据集(混合真实和合成图像)上训练的模型优于真实和纯合成基线,Dice分数达到0.7244。这些发现强调了条件扩散模型在生成解剖准确、AD特异性合成MRI方面的成功应用,并突出了它们在增强训练数据可用性、提高诊断准确性和促进神经影像研究可重复性方面的潜力。

英文摘要

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.

2606.18790 2026-06-18 cs.SD cs.AI cs.LG 交叉投稿

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

闭环:用于符号音乐生成中可解释激活引导的PID反馈控制

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

发表机构 * Athens University of Economics and Business(雅典经济与商业大学) Orfium Research(Orfium 研究) Hellenic Mediterranean University(希腊地中海大学) Archimedes / Athena Research Center(阿基米德/雅典娜研究中心)

AI总结 提出基于PID反馈控制的推理时激活引导框架,通过差分均值法提取音高和时长潜在方向,并利用Gram-Schmidt正交化解耦多属性引导,实现符号音乐生成中细粒度、可解释的属性调制。

Comments Accepted at Learning to Listen: ICML 2026 Workshop on Machine Learning for Audio (43rd International Conference on Machine Learning - ICMLMLA26), 4 pages main (11 total), 2 figures

详情
AI中文摘要

基于Transformer的架构在生成复杂符号序列方面取得了显著进展,但在实现对离散信号属性的细粒度、可解释控制方面仍存在明显差距。本文研究了多轨音乐Transformer(MMT)的机制可解释性,并提出了一种无需重新训练即可通过推理时激活引导实现确定性属性调制的框架。利用差分均值(DiffMean)方法,我们在残差流中分离出信号属性(特别是音高和时长)的潜在方向。我们验证了该领域的线性表示假设,实现了引导幅度与属性偏移之间的高相关性。为了解决多属性引导中固有的特征纠缠问题,我们引入了一种利用Gram-Schmidt正交化的双引导框架。实验结果表明,与朴素向量加法相比,这种几何解耦减少了概念干扰和信号退化,即使在强自回归条件下也能实现独立的确定性控制。

英文摘要

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.

2606.18856 2026-06-18 cs.CL cs.LG 交叉投稿

Approximate Structured Diffusion for Sequence Labelling

近似结构化扩散用于序列标注

Nicolas Floquet, Joseph Le Roux, Nadi Tomeh

发表机构 * Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN(巴黎北大学 Sorbonne、法国国家科学研究中心、巴黎北信息学实验室、LIPN)

AI总结 提出一种基于扩散的条件随机场(CRF)训练方法,通过引入标签噪声条件来捕捉长距离依赖,结合近似推理在词性标注任务上实现16.5%的错误率降低。

详情
AI中文摘要

序列标注是自然语言处理(NLP)的核心任务,涉及为输入句子的每个标记分配一个标签。从机器学习的角度来看,序列标注通常被建模为由神经网络参数化的线性链条件随机场(CRF)。虽然这种方法在经验上取得了良好结果,但CRF假设有限的决策跨度(例如标签二元组),这可能会限制其表达能力,并在需要长距离依赖时损害性能。我们证明可以利用扩散来训练一个以整个标签序列为条件的CRF,但条件是标签的噪声版本。实验表明,该方法结合近似CRF推理,在词性标注任务上实现了16.5%的错误率降低,提高了标签准确性。

英文摘要

Sequence labelling, a core task of Natural Language Processing (NLP), consists in assigning each token of an input sentence a label. From a Machine Learning point of view, sequence labelling is often cast as a Linear-Chain Conditional Random Field (CRF) parametrised by a neural network. While this approach gives good empirical results, CRFs assume a finite decision span (eg label bigrams) which can limit their expressivity and hurt performance when long-range dependencies are required. We show we can leverage diffusion to train a CRF conditioned on an entire label sequence, with the caveat that the condition is on a noisy version of labels. We show experimentally that this method, in conjunction with approximate CRF inference, improves label accuracy with a 16.5% error reduction for POS-tagging.

2606.19005 2026-06-18 cs.CL cs.LG 交叉投稿

Sumi: Open Uniform Diffusion Language Model from Scratch

Sumi: 从头训练的开放均匀扩散语言模型

Mengyu Ye, Keito Kudo, Wataru Ikeda, Ryosuke Matsuda, Keisuke Sakaguchi, Jun Suzuki

发表机构 * Tohoku University(东北大学)

AI总结 本文提出Sumi,一个从零开始预训练的70亿参数均匀扩散语言模型,在1.5T tokens上训练,性能与同规模自回归模型相当,并开源所有资源。

详情
AI中文摘要

扩散模型已成为自回归模型的有前途的替代方案。其中,均匀扩散语言模型(UDLM)允许在任何步骤更新任何token,原则上能够实现更灵活的生成。然而,目前还没有从零开始预训练的大参数规模和大token预算的UDLM。自回归建模和掩码扩散建模已经拥有大规模的可供社区研究和构建的模型;而均匀扩散模型则没有。大规模从头预训练的UDLM将为研究缩放行为、生成动态、可控性以及与现有自回归和掩码扩散模型的权衡提供一个干净的参考点。为此,我们引入了Sumi(日语中“墨水”的意思),一个完全开放的70亿参数均匀扩散语言模型,从零开始在1.5T tokens上预训练。Sumi在知识、推理和编码基准测试中与在可比token预算下训练的自回归模型表现相当,但在常识基准测试中表现较差,其中我们以教育为主的数据混合可能是原因之一。我们发布了模型权重、检查点和完整的训练方案,包括在公开可用的语料库上的数据混合的完整规范。我们希望这次发布能使社区研究大规模原生均匀扩散,并促进对其尚未很好理解的方面的研究。

英文摘要

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none. A scratch-pretrained UDLM at scale would provide a clean reference point for studying scaling behavior, generation dynamics, controllability, and trade-offs against established autoregressive and masked diffusion models. To this end, we introduce Sumi ("ink" in Japanese), a fully open 7B uniform diffusion language model pretrained from scratch on 1.5T tokens. Sumi performs competitively with autoregressive models trained at comparable token budgets on knowledge, reasoning, and coding benchmarks, while under-performing on commonsense benchmarks, where our education-heavy data mixture is a likely contributor. We release our model weights, checkpoints, and full training recipe, including a complete specification of the data mixture over publicly available corpora. We hope this release enables the community to study native uniform diffusion at scale and catalyzes work on its as-yet poorly understood aspects.

5. 优化、泛化与理论分析 8 篇

2606.18515 2026-06-18 quant-ph cs.LG stat.ML 交叉投稿

Exponentially many initializations to avoid barren plateaus

指数多个初始化以避免贫瘠高原

Ankit Kulshrestha, Ricard Puig, Diego García-Martín, Lukasz Cincio, Ilya Safro, Zoë Holmes, M. Cerezo

发表机构 * Fujitsu Research of America, Santa Clara, CA 95054, USA(美国富士通美洲研究部) University of Delaware, Newark, DE 19716, USA(德雷克塞尔大学) Department for Quantum Information and Computation at Kepler (QUICK), Johannes Kepler University, Linz, Austria(约翰·凯撒大学量子信息与计算部门) Information Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA(洛斯阿拉莫斯国家实验室信息科学部)

AI总结 提出一阶矩框架诊断初始化能否逃离完全集中的贫瘠高原不动点,发现避免贫瘠高原的初始化策略高度非唯一,存在指数多个不等价族,且不同初始化导致不同极小值。

Comments 18 + 27 pages, 5+4 figures, 1 Table

详情
AI中文摘要

贫瘠高原被描述为一种平均情况现象:选择一个拟设,天真地初始化,然后集中随之而来。这导致了一种普遍观点,即贫瘠高原的潜在治愈方法仅仅是更仔细地初始化参数。在这里,我们表明情况更为微妙。我们引入了一个一阶矩框架,该框架提供了一个简单的算子级诊断,用于判断初始化何时可能逃离完全集中的贫瘠高原不动点,并用于比较不同初始化策略引起的偏差。我们的框架恢复了几种已知的初始化方案,如恒等初始化和高斯初始化,但也表明避免贫瘠高原是高度非唯一的。实际上,许多平移、有偏和非对称的参数分布可以避免集中,并且这些选择不必等价。事实上,我们的结果表明,可以生成指数多个不等价的初始化策略族。然后,我们的数值实验表明,不同一阶矩不同的初始化可能导致不同的达到极小值,这表明通过智能初始化避免贫瘠高原可以将指数集中问题转化为从众多选项中选择正确可训练口袋的挑战。

英文摘要

Barren plateaus are stated as an average-case phenomenon: pick an ansatz, initialize it naively, and concentration follows. This has led to the common view that a potential cure for barren plateaus is simply to initialize the parameters more carefully. Here we show that the situation is subtler. We introduce a first-moment framework that gives a simple operator-level diagnostic for when an initialization may escape the fully concentrated barren-plateau fixed point, and for comparing the biases induced by different initialization strategies. Our framework recovers several known initialization schemes such as identity and Gaussian initialization, but also shows that barren-plateau avoidance is highly non-unique. Indeed, many shifted, biased, and non-symmetric parameter distributions can avoid concentration, and these choices need not be equivalent. In fact, our results show that one can generate exponentially many families of inequivalent initialization strategies. Then, our numerics indicate that different first-moment-distinct initializations can lead to different attained minima, suggesting that avoiding barren plateaus via smart initializations can trade the exponential concentration problem for the challenge of selecting the right trainable pocket amongst many options.

2606.18527 2026-06-18 stat.ML cs.LG 交叉投稿

Toward Simultaneously Optimal Regret in U-Calibration

面向同时最优遗憾的U-校准

Rafael Frongillo, Haipeng Luo, Nishant A. Mehta, Jon Schneider

发表机构 * University of Colorado Boulder(科罗拉多大学波德穆尔分校) University of Southern California(南加州大学) Google Research(谷歌研究)

AI总结 提出一种基于自和谐噪声的FTPL变体,实现对所有有界适当损失的最优$\tilde O(\sqrt{T})$遗憾和对光滑损失的对数遗憾。

Comments 30 pages; to appear at COLT 2026

详情
AI中文摘要

U-校准研究在线预测算法,其预测可被任何未知下游智能体使用,同时保证对所有适当损失函数的次线性遗憾。现有U-校准算法对每个有界适当损失实现了最坏情况最优的$O(\sqrt{T})$遗憾,但它们未能适应更简单的损失:如我们所示,即使对于平方损失等光滑损失,它们也会产生$\Omega(\sqrt{T})$遗憾,而不是最优的$O(\log T)$遗憾。在这项工作中,我们表明这一局限性并非固有。具体来说,我们设计了一个单一的预测算法,同时对所有有界适当损失实现$\tilde O(\sqrt{T})$遗憾,并对所有有界光滑适当损失实现$O(\log T)$遗憾。更一般地,我们的算法还对于相对于对数障碍光滑的损失(包括几个非Lipschitz例子)实现了对数遗憾。我们的方法基于一种新颖的跟随扰动领导者(FTPL)变体,其中使用自和谐噪声直接在预测空间中应用扰动。由于这种噪声的复杂性质,所得分析也大大偏离了先前的FTPL分析,可能具有独立意义。

英文摘要

U-calibration studies online forecasting algorithms whose predictions can be consumed by any unknown downstream agent, guaranteeing sublinear regret simultaneously for all proper loss functions. Existing U-calibration algorithms achieve worst-case optimal $O(\sqrt{T})$ regret for every bounded proper loss, but they fail to adapt to easier losses: as we show, even for smooth losses such as squared loss, they incur $Ω(\sqrt{T})$ regret instead of the optimal $O(\log T)$ regret. In this work, we show that this limitation is not inherent. Specifically, we design a single forecast algorithm that simultaneously achieves $\tilde O(\sqrt{T})$ regret for every bounded proper loss and $O(\log T)$ regret for every bounded smooth proper loss. More generally, our algorithm also attains logarithmic regret for losses that are smooth relative to the log-barrier, which include several non-Lipschitz examples. Our approach is based on a novel variant of Follow-the-Perturbed-Leader (FTPL) in which perturbations are applied directly in the prediction space using self-concordant noise. The resulting analysis also departs substantially from prior FTPL analyses due to the complex nature of this noise and may be of independent interest.

2606.18679 2026-06-18 cs.DS cs.GT cs.LG math.OC 交叉投稿

Fair Online Resource Allocation

公平在线资源分配

Christopher En, Yuri Faenza, Andrea Lodi, Gonzalo Muñoz

发表机构 * Columbia University, IEOR Department(哥伦比亚大学工业工程与运营研究系) Cornell Tech(康奈尔科技学院) Universidad de Chile(智利大学)

AI总结 研究在线资源分配中的公平性问题,提出基于对偶镜像下降的算法,在批次内强制执行公平约束,实现亚线性遗憾,并通过难民数据验证了福利与公平的权衡。

Comments 3 pages, 4 figures. To appear in the proceedings of EC 2026

详情
AI中文摘要

我们研究公平在线资源分配问题,其动机源于难民安置和航班调度等应用,其中代理顺序到达并必须分配到容量有限的设施。我们引入一个模型,在资源约束和Lipschitz公平性要求下最大化整体福利,该要求确保同一批次中到达的相似代理获得相似的预期结果。我们首先分析离线问题,证明最优公平分配的价值至少是最优不公平分配的$\Omega(1/\gamma)$倍,其中$\gamma$是公平系数,从而界定了公平的代价。对于在线设置,我们提出一种基于对偶镜像下降的算法,该算法在估计最优对偶变量的同时,在批次内强制执行公平约束。我们证明该算法相对于最优离线流体基准实现了亚线性遗憾。最后,我们使用难民经济项目的真实数据验证了理论结果,展示了算法的性能,并考察了福利最大化与公平执行之间的权衡。

英文摘要

We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $Ω(1/γ)$ fraction of the optimal unfair allocation, where $γ$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

2606.18807 2026-06-18 cs.DS cs.LG 交叉投稿

Learning Augmented Exact Exponential Algorithms

学习增强的精确指数时间算法

Tatiana Belova, Yuriy Dementiev, Danil Sagunov

发表机构 * ITMO University(ITMO大学)

AI总结 提出一种通用方法,利用略优于随机猜测的噪声预测器,可证明地减少NP难子集选择问题的搜索空间,运行时间加速随预测质量平滑扩展,且仅需预测的成对独立性或无需知道预测器精度。

详情
AI中文摘要

学习增强算法领域已经证明,机器学习预测可以在广泛的问题中绕过最坏情况下的下界。然而,到目前为止,关注点几乎完全集中在多项式时间算法上,其中预测改进了竞争比、近似保证或运行时间。在本文中,我们提出了一个问题:预测能否推动NP难问题的精确指数时间算法的前沿?我们通过提出一种通用方法对此问题给出肯定回答,该方法增强了一整类用于各种子集选择问题的最先进精确算法。我们表明,一个仅略优于随机猜测的噪声预测器足以可证明地减少搜索空间,并且由此产生的运行时间加速随预测质量平滑扩展。重要的是,我们的算法仅需要预测的成对独立性,或者,不需要知道预测器的精度——这两种设置都比通常假设的更弱且更现实。

英文摘要

The field of learning-augmented algorithms has demonstrated that machine-learned predictions can bypass worst-case lower bounds across a wide range of problems. So far, however, the focus has been almost exclusively on polynomial-time algorithms, where predictions improve competitive ratios, approximation guarantees, or running times. In this paper, we raise the question of whether predictions can push the frontier of exact exponential-time algorithms for NP-hard problems. We answer this question affirmatively by proposing a general approach that augments an entire family of state-of-the-art exact algorithms for a variety of subset selection problems. We show that a noisy predictor that is only marginally better than random guessing suffices to provably reduce the search space, and that the resulting runtime speedup scales smoothly with the prediction quality. Importantly, our algorithms require only pairwise independence of predictions or, alternatively, do not require the knowledge of the predictor's accuracy - both strictly weaker and more realistic settings than typically assumed.

2606.18993 2026-06-18 stat.ML cs.LG stat.ME 交叉投稿

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

基于自适应投注的序列核条件独立性检验

Zheng He, Danica J. Sutherland

AI总结 提出一种对估计误差更鲁棒的序列条件独立性检验方法,通过自适应优化核条件独立性统计量、归一化及截断平移校准,在合成与真实数据上控制第一类错误并保持高功效。

Comments Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c

详情
AI中文摘要

检验条件独立性是基础但本质上困难的问题:在没有额外假设的情况下,通常无法控制第一类错误。“Model-X”范式通过假设精确知道相关条件分布来解决这一困难。虽然经典的一次性检验有时可以容忍对该假设的小偏差,但现有的序列条件独立性检验通常要求精确知道Model-X条件分布,这使得当必须估计该分布时它们变得脆弱。我们提出了一种新方法,对这类估计误差具有更强的鲁棒性。我们的方法将测试-投注应用于自适应优化的核条件独立性统计量,并结合归一化方案和截断-移位校准策略。这些修改大大减少了第一类错误膨胀,同时在高维合成基准和现实世界公平性任务中保持了高功效,优于现有的序列Model-X方法。代码可在https://this URL获取。

英文摘要

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X'' paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.

2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML 交叉投稿

Wasserstein Policy Learning for Distributional Outcomes

Wasserstein 策略学习用于分布性结果

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

AI总结 针对分布值结果,提出基于Wasserstein重心和效用泛函的策略学习框架,使用IPW和DR估计器,证明遗憾率由策略类复杂度主导,并给出极小化下界。

Comments Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)

详情
AI中文摘要

离线策略学习在因果推断中受到越来越多的关注。主要目标是学习一个策略(个体化治疗规则),作为从协变量到治疗的映射,以最大化定义为标量值潜在结果均值的经验福利。在本文中,我们研究具有分布值结果的离线策略学习,其中每个潜在结果是$\mathbb{R}$上的概率测度,奖励通过应用于诱导结果分布的Wasserstein重心的效用泛函来定义。我们基于逆概率加权(IPW)和双稳健(DR)估计器为策略学习框架建立了统计保证。通过处理组合策略类和无限维分位数域乘积上的具有挑战性的均匀偏差,我们证明了有限样本遗憾具有主导依赖$\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$。在一维Wasserstein设定下,并在所述正则条件下,主导遗憾率仍由策略类复杂度控制。此外,我们提供了一个极小化下界,建立了对$N$和$\mathrm{N\text{-}dim}(\Pi)$主导依赖的尖锐性。

英文摘要

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(Π)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(Π)$.

2606.19147 2026-06-18 stat.ML cs.LG math.ST stat.TH 交叉投稿

On Local Population-Risk Certificates

论局部总体风险证书

Mingzhi Song

发表机构 * Department of Mathematics, The University of Hong Kong(香港大学数学系)

AI总结 本文提出局部总体风险增量证书,用于在模型更新时提供风险控制,通过双边置信带判断更新是否接受。

Comments 35 pages, 6 figures

详情
AI中文摘要

本文为当前模型周围的总体风险增量开发了局部证书。对于局部候选集 \(\mathcal D\),该证书是 \(P({\ell_{\theta+v}-\ell_\theta})\) 在 \(v\in\mathcal D\) 上的双边置信带。作为应用,该置信带的上端点产生了一个风险控制的更新规则:仅当认证的上端点为非正时,更新被接受;否则保留当前模型。

英文摘要

This paper develops local certificates for population-risk increments around a current model. For a local candidate set \(\mathcal D\), the certificate is a two-sided confidence band for \(P({\ell_{θ+v}-\ell_θ})\) over \(v\in\mathcal D\). As an application, the upper endpoint of this band yields a risk-controlled update rule: an update is accepted only when its certified upper endpoint is nonpositive; otherwise the current model is retained.

2606.19212 2026-06-18 stat.ML cs.LG 交叉投稿

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

语义对抗攻击的广义特征值几何

Martin Anthony, Kaveh Salehzadeh Nobari

AI总结 提出一种连续局部模型,通过矩阵束$(A,B)$的最大广义特征值量化语义对抗攻击性,并给出预测翻转条件、攻击性证书及VC界。

详情
AI中文摘要

最近的实证工作表明,语义等价的释义可以欺骗金融情感分类器:尽管释义在强参考嵌入下保持与原文接近,但它可能足以改变目标模型的表示,从而改变预测类别。现有的鲁棒性理论要么假设单模型威胁模型,要么主要关注实证攻击算法。我们开发了一个连续局部模型来描述语义释义扰动,该模型捕捉了这种双模型结构。我们证明,在代理模型预算下,目标表示的最坏情况局部位移由从两个嵌入映射的雅可比矩阵构造的矩阵束$(A,B)$的最大广义特征值控制。由此产生的攻击性指标$\lambda^*(x)$是局部释义几何和所选嵌入器固有的,为仿射读出提供了闭式预测翻转条件,并支持保守的总体和有限样本攻击性证书。为了对仿射读出的类别进行统一控制,我们推导了二元攻击性指标的无分布VC界,以及基于攻击性调整边界的尺度敏感边界,该边界从标准分类器边界中减去局部几何惩罚。我们还将连续理论与离散释义搜索联系起来,识别出成功与不成功的有限搜索之间的不对称性,并给出了离散和连续设置一致时的覆盖条件。最后,我们提出了一个使用软令牌松弛和生成的释义集的实证验证框架,以评估部署的金融文本分类器上的局部特征值几何、预测翻转条件和有限搜索近似。

英文摘要

Recent empirical work shows that semantically equivalent paraphrases can fool financial sentiment classifiers: although a paraphrase remains close to the original under a strong reference embedding, it may shift the target model's representation enough to change the predicted class. Existing robustness theory either assumes a single-model threat model or focuses mainly on empirical attack algorithms. We develop a continuous local model of semantic paraphrase perturbations that captures this two-model structure. We show that the worst-case local displacement of the target representation, subject to a proxy-model budget, is governed by the largest generalised eigenvalue of a matrix pencil $(A,B)$ constructed from the Jacobians of the two embedding maps. The resulting attackability index $λ^*(x)$ is intrinsic to the local paraphrase geometry and the chosen embedders, yields a closed-form prediction-flip condition for affine readouts, and supports conservative population and finite-sample attackability certificates. For uniform control over classes of affine readouts, we derive a distribution-free VC bound for binary attackability indicators and a scale-sensitive margin bound based on an attackability-adjusted margin that subtracts a local geometric penalty from the standard classifier margin. We also connect the continuous theory to discrete paraphrase search, identify an asymmetry between successful and unsuccessful finite searches, and give a covering condition under which the discrete and continuous settings agree. Finally, we propose an empirical verification framework using soft-token relaxations and generated paraphrase sets to assess the local eigenvalue geometry, prediction-flip condition, and finite-search approximation on a deployed financial-text classifier.

6. 高效学习、压缩与部署 3 篇

2606.18463 2026-06-18 cs.DC cs.LG cs.NA math.NA stat.ML 交叉投稿

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

面向GPU上广义线性模型的混合精度通信避免SGD

Aditya Devarakonda, Irene Simó Muñoz, Giulia Guidi

发表机构 * Department of Computer Science, Wake Forest University(沃杰福大学计算机科学系) Department of Computer Science, Cornell University(康奈尔大学计算机科学系)

AI总结 提出混合精度通信避免SGD(CA-SGD),通过分析有限精度误差将精度选择分解为九个独立部分,在NVIDIA GPU上实现5.1-6.8倍加速,且损失与FP32 SGD匹配。

详情
AI中文摘要

分布式随机梯度下降(SGD)受限于通信而非计算,因为每次迭代都需要跨进程进行AllReduce。通信避免SGD(CA-SGD)通过将$s$次连续的AllReduce替换为单个$sb\ imes sb$ Gram矩阵的AllReduce,将通信开销分摊到$s$次迭代中,以更多的计算和带宽换取更少的同步点。现代GPU配备矩阵硬件和低精度格式,通过加速Gram GEMM和缩减BF16流量来抵消这一开销。我们研究了NVIDIA GPU上针对广义线性模型的混合精度CA-SGD。我们的有限精度分析将一次CA-SGD外迭代的局部舍入误差分解为九个独立的精度选择,仅通过低精度单元舍入误差依赖于硬件,因此所得方案原则上可跨GPU代际迁移。该方案将输入矩阵和边缘向量以低精度存储,从低精度输入计算Gram矩阵并采用高精度累加,以高精度通信该矩阵,并以高精度执行内部递推和权重更新。在NERSC Perlmutter A100 GPU上,混合精度CA-SGD在逻辑回归、线性回归和泊松问题上的损失与FP32 SGD相差在0.5%以内,并在epsilon、SUSY、HIGGS、synth和Poisson-synth数据集上达到5.1-6.8倍于FP32 SGD的加速。我们的软件可在以下网址获取:this https URL

英文摘要

Distributed stochastic gradient descent (SGD) is limited by communication rather than computation, since each iteration requires an AllReduce across processes. Communication-avoiding SGD (CA-SGD) amortizes communication over $s$ iterations by replacing $s$ consecutive AllReduces with a single AllReduce of an $sb\times sb$ Gram matrix, trading more computation and bandwidth for fewer synchronization points. Modern GPUs with matrix hardware and reduced-precision formats offset this by accelerating the Gram GEMM and shrinking BF16 traffic. We study mixed-precision CA-SGD for generalized linear models on NVIDIA GPUs. Our finite-precision analysis decomposes the local rounding error of one CA-SGD outer iteration into nine independent precision choices, depending on the hardware only through its low-precision unit roundoffs, so the resulting recipes transfer in principle across GPU generations. The recipe stores the input matrix and margin vector in low precision, computes the Gram matrix from low-precision inputs with high-precision accumulation, communicates it in high precision, and performs the inner recurrence and weight updates in high precision. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches FP32 SGD loss within $0.5\%$ on logistic, linear, and Poisson problems and reaches $5.1$--$6.8\times$ speedup over FP32 SGD on epsilon, SUSY, HIGGS, synth, and Poisson-synth. Our software is available at https://doi.org/10.5281/zenodo.20448273

2606.19004 2026-06-18 cs.DC cs.AI cs.LG 交叉投稿

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

Spotlight: 协同种子探索与抢占式GPU用于DiT强化学习后训练

Ruiqi Lai, Dakai An, Wei Gao, Ju Huang, Siran Yang, Jiamang Wang, Lin Qu, Dmitrii Ustiugov, Wei Wang

发表机构 * NTU Singapore(南洋理工大学) Hong Kong University of Science and Technology(香港科技大学) Alibaba Group(阿里巴巴集团)

AI总结 针对DiT强化学习后训练成本高的问题,提出Spotlight系统,通过利用探索对旧权重的容忍性和SP组快速重配置,在抢占式GPU上实现高效训练,加速4倍并降低成本1.4-6.4倍。

详情
AI中文摘要

扩散Transformer(DiT)的强化学习(RL)后训练成本极高,需要数千块高端GPU。现有工作探索了两个降低成本的方向:种子探索通过选择高对比度样本来改善训练收敛,但增加了关键路径的计算量;抢占式GPU提供69-77%的成本降低,但在训练期间处于空闲状态,因为DiT rollout几乎同时完成,这阻止了类似LLM的rollout与训练流水线化。抢占式GPU的抢占进一步破坏了序列并行(SP)组,导致GPU拓扑碎片化。我们提出了Spotlight,这是第一个利用抢占式GPU进行DiT RL后训练的系统。Spotlight基于我们设计的两个关键洞察:(1)我们证明探索可以容忍过时的模型权重,因为使用前一次迭代模型权重的探索保留了随机种子的相对排序,允许探索在训练期间在空闲的抢占式GPU上运行。(2)SP重配置可以重用节点内状态,将组恢复时间从分钟级缩短到亚秒级启动。基于这些洞察,Spotlight引入了三种技术:基于bandit的探索规划器,在训练时间预算内最大化奖励方差;弹性序列并行,通过持久调度器和节点内权重复制动态重配置SP组;以及抢占感知的拉取式请求调度器,平衡负载并在抢占时提交进行中的状态。我们在开源RL平台ROLL上实现了Spotlight,并在Qwen-Image后训练上进行了评估。Spotlight达到相同目标验证分数的速度比基线快4倍,总成本降低1.4-6.4倍,同时在分辨率512×512和1280×1280的DeepSeek-OCR和Geneval数据集上实现了更优的图像质量。

英文摘要

Reinforcement learning (RL) post-training of Diffusion Transformers (DiTs) is prohibitively expensive, requiring thousands of high-end GPUs. Existing works explore two directions to reduce cost: seed exploration improves training convergence by selecting high-contrast samples, yet adds compute to the critical path; spot GPUs offer 69--77\% lower cost, yet sit idle during training because DiT rollouts finish nearly simultaneously, which prevents LLM-style pipelining of rollout with training. Spot preemptions further break Sequence Parallelism (SP) groups, fragmenting GPU topology. We present Spotlight, the first system that harvests spot GPUs for DiT RL post-training. Spotlight rests on two key insights we devise: (1)~we show that exploration can tolerate stale model weights because exploration that uses the model weights from the previous iteration preserves the relative ranking of random seeds, allowing exploration to run on idle spot GPUs during training. (2)~SP reconfiguration can reuse on-node state, reducing group recovery from minutes to sub-second launches. Built on these insights, Spotlight introduces three techniques: a bandit-based exploration planner that maximizes reward variance within the training time budget, elastic sequence parallelism that reconfigures SP groups on the fly via persistent schedulers and intra-node weight copying, and a preemption-aware pull-based request scheduler that balances load and commits in-flight state upon preemption. We implement Spotlight on the open-source RL platform ROLL and evaluate it on Qwen-Image post-training. Spotlight reaches the same target validation score $4\times$ faster than baselines, reducing total cost by $1.4$-$6.4\times$ while achieving superior image quality on DeepSeek-OCR and Geneval datasets with resolution $512\times512$ and $1280\times1280$.

2606.14824 2026-06-18 cs.AR cs.AI cs.LG 交叉投稿

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

在512MB内存下的嵌入式设备上运行硬件感知的神经架构搜索

Andrea Mattia Garavagno, Edoardo Ragusa, Paolo Gastaldo, Antonio Frisoli

发表机构 * University of Bologna(博洛尼亚大学) Politecnico di Milano(米兰理工学院)

AI总结 提出一种在资源受限的嵌入式设备上直接运行的硬件感知神经架构搜索方法,生成针对低端MCU的微型CNN,在Visual Wake Word数据集上达到最先进水平。

详情
Journal ref
2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2024, pp. 1-2
AI中文摘要

本文提出了一种新颖的硬件感知神经架构搜索(HW NAS)方法,该方法考虑了运行它的计算平台上的可用资源,使其能够在各种嵌入式设备上执行。所提出的HW NAS生成针对低端微控制器单元(MCU)的微型卷积神经网络(CNN),这些MCU通常用于物联网(IoT)或可穿戴机器人领域,从而开辟了新的应用场景。网关可以运行它来根据获取的数据定制CNN的架构,而无需使用外部服务器,从而确保隐私。所提出的技术在Visual Wake Word数据集(一个标准的TinyML基准)上的多个人体识别任务中,在多个嵌入式设备上取得了最先进的结果。

英文摘要

This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

7. 联邦学习、隐私与安全 3 篇

2606.18312 2026-06-18 cs.CR cs.DC cs.LG 交叉投稿

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

TIGER:通过嵌入子空间距离优化反转Transformer梯度

William Kalikman, Ivo Petrov, Dimitar I. Dimitrov, Martin Vechev

发表机构 * ETH Zürich(苏黎世联邦理工学院) INSAIT, Sofia University "St. Kliment Ohridski"(索菲亚大学"圣克莱门特·奥赫里茨基")

AI总结 提出TIGER攻击,通过将子空间信号转化为可微目标,直接优化令牌嵌入以最小化到子空间的距离,在编码器模型上提升重建质量和速度,在解码器模型上增强对差分隐私的鲁棒性。

Comments 16 pages, 13 pages main text,

详情
AI中文摘要

联邦学习允许多个客户端通过向中央服务器发送梯度更新来联合训练共享模型,同时保持原始输入在本地。然而,先前的梯度反转攻击表明,这些更新可以泄露足够的信息来重建客户端输入。现有的针对Transformer的攻击要么优化虚拟输入以匹配真实的客户端更新,这对于现代模型来说成本高昂且不稳定;要么利用注意力梯度的低秩性来识别包含真实层嵌入的子空间,然后对候选令牌进行离散成员测试。然而,这种令牌测试在数值噪声(例如来自量化或差分隐私)下很脆弱,并且对于具有非因果注意力的编码器模型扩展性差。我们引入了TIGER,一种连续的梯度反转攻击,它将这种子空间信号转化为可微目标。TIGER不是搜索令牌或匹配完整梯度,而是直接优化令牌嵌入以最小化它们到子空间的距离。我们的实验表明,在仅编码器模型上,TIGER在重建质量和运行时间上均显著优于现有攻击;而在解码器模型上,TIGER比先前基于子空间的攻击更鲁棒,从而在受差分隐私保护的联邦学习设置中实现了首次成功的重建。

英文摘要

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough information to reconstruct client inputs. Existing attacks on transformers either optimize dummy inputs to match the true client updates, which is costly and unstable for modern models, or exploit the low rank of attention gradients to identify a subspace containing the true layer embeddings, followed by a discrete membership test for candidate tokens. However, this token test is brittle under numerical noise, i.e., from quantization or Differential Privacy (DP), and scales poorly for encoder models with non-causal attention. We introduce TIGER, a continuous gradient inversion attack that turns this subspace signal into a differentiable objective. Instead of searching over tokens or matching full gradients, TIGER directly optimizes token embeddings to minimize their distance to the subspace. Our experiments demonstrate that on encoder-only models, TIGER substantially improves both reconstruction quality and runtime over existing attacks, while on decoder models, TIGER is more robust than prior subspace-based attacks, enabling the first successful reconstructions in DP-defended federated learning settings.

2606.19023 2026-06-18 cs.CR cs.LG 交叉投稿

Lifecycle-Aware Dynamic Analysis for Secure ML Model Execution

生命周期感知的动态分析用于安全ML模型执行

Gabriele Digregorio, Marco Di Gennaro, Francesco Pastore, Stefano Zanero, Stefano Longari, Michele Carminati

发表机构 * Politecnico di Milano(米兰理工大学)

AI总结 提出Moat,一种动态生命周期感知方法,通过监控模型执行各阶段与宿主系统的结构化交互来检测恶意行为,在多个框架上实现零误报率。

详情
AI中文摘要

对预训练机器学习(ML)模型的日益依赖引入了新的攻击面。最近的漏洞表明,恶意行为可以嵌入模型工件中,常常绕过现有防御。当前的模型扫描解决方案主要依赖于静态的、特定格式的规则或已知的攻击签名,这限制了它们跨框架泛化和检测新型利用路径的能力。相比之下,我们提出了一种解决方案,专注于攻击对执行模型的宿主系统产生的影响,并基于关于ML模型执行的基本直觉。特别地,我们观察到ML模型在定义良好的生命周期阶段内运行,并且在每个阶段内,与宿主系统的交互是高度结构化和可预测的。我们将这些直觉转化为Moat,一种用于安全ML模型执行的动态生命周期感知方法,并在我们的参考实现Re-Moat中实例化此设计。我们使用来自Hugging Face Hub的77,974个真实世界模型工件、来自CVE的31个概念验证(PoC)以及来自最先进数据集的334个模型,在多个ML框架上评估Re-Moat,并将其与最先进的模型扫描解决方案进行比较。我们的结果表明,我们的方法检测到所有评估的攻击类别,同时保持接近零的误报率,验证了我们的直觉并激励了用于安全ML模型执行的动态分析。

英文摘要

The growing reliance on pre-trained Machine Learning (ML) models has introduced new attack surfaces. Recent vulnerabilities demonstrate that malicious behavior can be embedded within model artifacts, often bypassing existing defenses. Current model-scanning solutions primarily rely on static, format-specific rules or known attack signatures, which limit their ability to generalize across frameworks and to detect novel exploitation paths. In contrast, we propose a solution that focuses on the effects an attack has on the host system executing the model and builds on foundational intuitions about ML model execution. In particular, we observe that ML models operate within well-defined lifecycle phases and that, within each phase, interactions with the host system are highly structured and predictable. We translate these intuitions into Moat, a dynamic lifecycle-aware approach for securing ML model execution, and instantiate this design in Re-Moat, our reference implementation. We evaluate Re-Moat across multiple ML frameworks using 77,974 real-world model artifacts from the Hugging Face Hub, 31 Proofs-of-Concept (PoCs) from CVEs, and 334 models from a state-of-the-art dataset, and compare it against state-of-the-art model-scanning solutions. Our results show that our approach detects all evaluated attack classes while maintaining a close-to-zero false-positive rate, validating our intuitions and motivating dynamic analysis for securing ML model execution.

2606.19129 2026-06-18 cs.CR cs.LG 交叉投稿

Giskard : Byzantine Robust and Confidential Aggregation for Large-Scale Decentralized Learning

Giskard: 大规模去中心化学习中的拜占庭鲁棒与机密聚合

Ousmane Touat, César Sabater, Mohamed Maouche, Sonia Ben Mokhtar

发表机构 * INSA Lyon, LIRIS, CNRS(里尔斯大学 Lyon,LIRIS,CNRS) INRIA, INSA Lyon(法国国家科学研究中心 INRIA,里尔斯大学 Lyon)

AI总结 针对去中心化学习中同时保证机密性和抵御拜占庭行为的挑战,提出Giskard协议,通过树状委员会结构和BGW风格MPC实现近似中位数聚合,在百万级参与者下降低通信复杂度并保持模型效用。

Comments 17 pages, with appendix

详情
AI中文摘要

在去中心化学习中同时处理机密性和拜占庭行为是一个具有挑战性的问题。实际上,在去中心化学习中,客户端在本地保留数据的同时训练机器学习模型,并与一组邻居共享其模型参数或梯度。虽然强制机密性需要隐藏交换的模型参数/梯度(例如,通过使用密码学技术),但处理拜占庭贡献通常需要检查后者。因此,大多数研究工作分别处理这些目标。最近的一系列工作提出使用安全多方计算(MPC)来实现对模型投毒攻击的鲁棒聚合器,从而同时保证机密性和拜占庭鲁棒性。然而,这些解决方案扩展性差:它们要么要求参与者之间进行全对全通信,要么将整个计算委托给一个小子集,其计算和通信负载随网络规模成比例增长。在本文中,我们提出了Giskard,一种用于机密且拜占庭鲁棒的去中心化聚合协议。Giskard将$n$个参与方组织成一个大小为$O(\log n)$的委员会树,并通过在值域上进行委员会适应的分布式二分搜索来评估坐标-wise近似中位数,在每个委员会内使用BGW风格的MPC。我们通过理论证明其安全性和机密性,并通过涉及多达一百万个参与者的广泛实验来评估Giskard。与其最接近的竞争对手相比,Giskard渐近地降低了每方通信复杂度,同时在多达$n/4$个拜占庭参与方下表现出相当的模型效用。

英文摘要

Dealing simultaneously with confidentiality and Byzantine behaviors in decentralized learning is a challenging problem. Indeed, in decentralized learning, clients train a machine learning model while keeping their data locally and share their model parameters or gradients with a set of neighbors. While enforcing confidentiality calls for hiding the exchanged model parameters/gradients (e.g., by using cryptographic techniques), dealing with Byzantine contributions often requires inspecting the latter. Hence, most research works address these objectives separately. A recent line of work proposes to employ secure multi-party computation (MPC) to implement robust aggregators against model poisoning, thereby enforcing both confidentiality and Byzantine resilience. However, these solutions scale badly: they either require all-to-all communication between participants or delegate the entire computation to a small subset, whose computational and communication load grows proportionally with the size of the network. In this paper, we present Giskard, a protocol for confidential and Byzantine-robust decentralized aggregation. Giskard organizes $n$ parties into a tree of committees of size $O(\log n)$ and evaluates a coordinate-wise approximate median via a committee-adapted distributed binary search over the value domain, using BGW-style MPC within each committee. We assess Giskard both theoretically by proving its security and confidentiality properties and experimentally through extensive experiments involving up to one million participants. Compared to its closest competitors, Giskard reduces per-party communication complexity asymptotically while exhibiting comparable model utility under up to $n/4$ Byzantine parties.

8. 鲁棒性、不确定性与可信学习 4 篇

2606.18467 2026-06-18 stat.ML cs.LG 交叉投稿

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

ToolChain-CRC: 检索与工具使用漂移下代理型AI的共形风险控制

Jeffery Opoku, David Banahene

发表机构 * The University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德谷分校) Florida International University(佛罗里达国际大学)

AI总结 针对检索增强和工具使用代理在漂移下的风险控制问题,提出ToolChain-CRC方法,通过构建轨迹级风险评分并校准接受或干预规则,实现可证明的轨迹级风险控制。

Comments 26 pages, 11 figures

详情
AI中文摘要

现代AI代理检索文档、调用工具、检查中间信息,然后产生最终答案或行动。这产生了一个仅从最终答案无法察觉的风险控制问题。即使检索薄弱、工具输出错误或早期步骤缺乏支持,最终响应也可能看起来可接受。我们提出ToolChain-CRC,一种针对漂移下检索增强和工具使用代理的共形风险控制方法。该方法将每次代理运行视为动作、观察和最终输出的完整轨迹。它构建步骤级风险评分,将其组合成轨迹风险评分,校准接受或干预规则,并添加一个随时报警,可在最终答案前停止风险运行。我们在可交换校准运行下证明了轨迹级风险控制,给出了具有可审计常数的漂移感知扩展,并通过超鞅构造证明了随时升级规则。实验涵盖合成工具链漂移、RAG/工具使用压力测试、基于SQuAD的公共检索任务、无API代理问答案例研究、消融实验、目标风险敏感性检查、20种子鲁棒性检查、漂移边界审计以及实时RAG/工具使用代理基准。在这些设置中,仅基于最终答案的校准可能遗漏检索和工具故障,而轨迹级校准将接受轨迹的风险保持在目标之下。

英文摘要

Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.

2606.18530 2026-06-18 cs.CR cs.CL cs.LG 交叉投稿

Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks

评估基于提示的防御策略对抗领域伪装注入攻击

Aaditya Pai

发表机构 * Data Science Institute(数据科学研究所)

AI总结 针对领域伪装注入攻击,评估五种基于提示的防御方法(如释义、重点标记等)在三个模型家族和三个部署领域中的有效性,发现释义法最有效,可将伪装攻击成功率降低55-84%。

Comments 9 pages, 4 figures, 4 tables; under review at the AdvML-Frontiers x CoTMA workshop, COLM 2026

详情
AI中文摘要

领域伪装注入攻击使用领域特定词汇将恶意指令嵌入检索内容中,从而逃避依赖句法注入标记的标准检测器。当检测失败时,从业者需要知道哪些防御架构能降低攻击成功率。我们评估了五种基于提示的防御方法(重点标记、释义、提示夹层以及两种组合)对抗领域伪装注入攻击,涉及三个模型家族(Claude Haiku、Llama 3.1 8B、Gemini 2.0 Flash)和三个部署领域(金融、法律、通用),共进行3,510次试验。在代理处理之前对检索内容进行释义是最一致有效的防御方法,根据模型不同,可将伪装攻击成功率降低55-84%,并且在所有测试模型上均实现了比我们的Llama Guard 4配置更低的攻击成功率。防御效果强烈依赖于模型:重点标记在Claude Haiku上将攻击成功率减半,但在Llama 3.1 8B上没有任何益处。金融领域部署面临最高的残余风险,基线攻击成功率为26-33%,在较弱模型上没有任何基于提示的防御能完全消除威胁。这些结果首次系统评估了专门针对伪装类注入攻击的基于提示的防御方法,并为从业者建立了基于基准的建议。所有任务均使用合成构建的专业文档;这些基准排名是否能推广到真实企业文档仍是一个开放问题。

英文摘要

Domain-camouflaged injection attacks embed malicious instructions in retrieved content using domain-appropriate vocabulary, evading standard detectors that rely on syntactic injection markers. When detection fails, practitioners need to know which defense architectures reduce attack success. We evaluate five prompting-based defenses (spotlighting, paraphrasing, prompt sandwiching, and two combinations) against domain-camouflaged injection across three model families (Claude Haiku, Llama 3.1 8B, Gemini 2.0 Flash) and three deployment domains (financial, legal, general) using 3,510 trials. Paraphrasing retrieved content before agent processing is the most consistently effective defense in this benchmark, reducing camouflage attack success rate by 55-84\% depending on model, and achieves lower attack success rates than our Llama Guard 4 configuration on every model tested. Defense effectiveness is strongly model-dependent: spotlighting halves attack success on Claude Haiku but provides no benefit on Llama 3.1 8B. Financial domain deployments face the highest residual risk at 26-33\% baseline attack success rate, with no prompting-based defense fully eliminating the threat on weaker models. These results provide the first systematic evaluation of prompting-based defenses specifically against camouflage-class injection attacks and establish benchmark-based recommendations for practitioners. All tasks use synthetically constructed professional documents; whether these benchmark rankings generalize to real enterprise documents remains an open question.

2606.18860 2026-06-18 cs.CV cs.LG 交叉投稿

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

医学图像分割中对抗模型的不确定性量化

Hana Jebril, Thomas Pinetz, Günter Klambauer, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria(人工智能研究所、医学数据科学中心、维也纳医学大学,奥地利) Comprehensive Center for AI in Medicine, Medical University of Vienna, Austria(医学人工智能综合中心、维也纳医学大学,奥地利) ELLIS Unit Linz, LIT AI Lab and Institute for Machine Learning, Johannes Kepler University Linz, Austria(林茨ELLIS单位、LIT人工智能实验室和机器学习研究所、林茨约瑟夫·冯·克拉夫特大学,奥地利) Institute for Machine Learning, Johannes Kepler University Linz, Austria(机器学习研究所、林茨约瑟夫·冯·克拉夫特大学,奥地利) Clinical Research Center for Medical AI, Johannes Kepler University Linz, Austria(医学人工智能临床研究中心、林茨约瑟夫·冯·克拉夫特大学,奥地利)

AI总结 提出QUAM-SM后处理框架,通过针对性对抗搜索识别脆弱像素,量化不确定性并分离认知与偶然不确定性,在公开数据集上优于现有方法。

Comments Accepted at MICCAI 2026

详情
AI中文摘要

可靠的像素级不确定性量化具有通过实现高保真纵向监测和区分真实病理变化与伪影来改变临床工作流程的潜力。理想情况下,这些模型提供关键治疗计划和手术干预所需的稳定性。然而,标准深度学习模型常常遭受校准不良,产生过度自信的预测,掩盖了微妙病理边界处的潜在脆弱性。为了解决这个问题,我们提出了QUAM-SM,一种使用针对性对抗搜索来识别“对抗脆弱”像素的后处理框架。通过主动寻找暴露预测不稳定性的扰动,我们的方法突出了决策最容易被翻转的区域。重要的是,该框架将认知不确定性与偶然不确定性分离。在两个具有多个专家标注的公开数据集上的实验表明,QUAM-SM在可靠性和边界敏感性方面优于标准和最新的不确定性估计方法。代码可在以下网址获取:https://this https URL

英文摘要

Reliable pixel-level uncertainty quantification holds the potential to transform clinical workflows by enabling high-fidelity longitudinal monitoring and distinguishing true pathological changes from artifacts. Ideally, these models provide the stability required for critical treatment planning and surgical intervention. However, standard deep learning models often suffer from miscalibration, yielding overconfident predictions that mask underlying vulnerabilities at subtle pathological boundaries. To address this, we propose QUAM-SM, a post-hoc framework using targeted adversarial search to identify "adversarially fragile" pixels. By actively seeking perturbations that expose predictive instability, our method highlights regions where decisions are most vulnerable to being flipped. Importantly, the framework disentangles epistemic uncertainty from aleatoric uncertainty. Experiments on two public datasets with multiple expert annotations demonstrate that QUAM-SM outperforms both standard and recent uncertainty estimation approaches in terms of reliability and boundary sensitivity. Code is available at https://github.com/HanaJebril/quam_sm

2606.19300 2026-06-18 cs.CV cs.LG 交叉投稿

Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation

置信度不等于可靠性:重新思考脑肿瘤分割中的MC Dropout

Xin Ci Wong, Duygu Sarikaya, Kieran Zucker, Marc De Kamps, Nishant Ravikumar

发表机构 * Centre for Doctoral Training in AI for Medical Diagnosis and Care, School of Computing, University of Leeds(利兹大学计算机学院人工智能医学诊断与护理博士培训中心) School of Computer Science, University of Leeds(利兹大学计算机科学学院)

AI总结 通过MC Dropout不确定性估计,发现全局不确定性-误差对齐(AUROC≈0.97)可能掩盖关键子区域(如增强肿瘤)的严重误校准(ECE=0.915),表明子区域校准评估对临床安全至关重要。

Comments Accepted for MIUA2016

详情
AI中文摘要

多参数MRI中的胶质瘤分割是治疗计划的关键组成部分。一个在治疗关键子区域上静默失败的分割模型会带来患者安全风险,而Dice分数等基于重叠的指标无法暴露这种风险。我们探究通过蒙特卡洛(MC)Dropout进行的体素级不确定性估计能否可靠地识别临床关键子区域中的分割错误,以及校准失败模式是否仅从标准报告指标中可检测。在126名BraTS21患者的两模型实证案例研究中,我们评估了高性能预训练SegResNet和本地训练的带有残差单元的UNet(UNet-Res)。MC dropout保持了分割准确性($|\Delta \text{Dice}|$ $<0.01$),同时实现了强不确定性-误差对齐(熵(H)的AUROC $\approx$0.97),表明不确定性正确地将错误体素排在正确体素之上。基于熵的患者分层识别出一个高不确定性亚组,其分割性能显著较低(全肿瘤Dice中位数$0.835$ vs. $0.925$),支持不确定性作为实用的分诊信号。然而,全局对齐可能掩盖重要的区域特异性差异。尽管AUROC相似,UNet-Res在增强肿瘤熵上接近零($0.054$),期望校准误差(ECE)为$0.915$,Dice仅为$0.714$,表明在最临床关键子区域上置信度严重误校准,这是标准Dice和AUROC报告无法发现的失败模式。这些发现表明,强不确定性-误差对齐对于临床安全是必要但不充分的:在选择临床部署模型时,子区域特异性校准评估必须伴随AUROC评估。

英文摘要

Glioma segmentation in multiparametric MRI is a critical component of treatment planning. A segmentation model that fails silently on treatment-critical sub-regions represents a patient safety risk that overlap-based metrics such as Dice scores cannot expose. We ask whether voxel-level uncertainty estimation via Monte Carlo (MC) Dropout can reliably identify segmentation errors in clinically critical sub-regions, and whether calibration failure modes are detectable from standard reporting metrics alone. In an empirical two-model case study on 126 BraTS21 patients, we evaluate a high-performance pretrained SegResNet and a locally trained UNet with residual units (UNet-Res). MC dropout preserved segmentation accuracy ($|Δ\text{Dice}|$ $<0.01$) while achieving strong uncertainty-error alignment (AUROC for entropy (H) $\approx$0.97), indicating uncertainty correctly ranks erroneous voxels above correct ones. Entropy-based patient stratification identified a high-uncertainty subgroup with substantially lower segmentation performance (median whole-tumour Dice $0.835$ vs. $0.925$), supporting uncertainty as a practical triage signal. However, global alignment can mask important region-specific differences. Despite similar AUROC, UNet-Res exhibited near-zero enhancing tumour entropy ($0.054$) and Expected Calibration Error (ECE) of $0.915$, with a Dice of only $0.714$, indicating severely miscalibrated confidence on the most clinically critical sub-region, a failure mode invisible to standard Dice and AUROC reporting. These findings demonstrate that strong uncertainty-error alignment is necessary but insufficient for clinical safety: sub-region-specific calibration assessment must accompany AUROC evaluation when selecting models for clinical deployment.

9. 迁移、元学习与持续学习 1 篇

2606.18567 2026-06-18 stat.ML cs.LG stat.AP stat.ME 交叉投稿

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

通过迁移学习弥合结构易损性建模中的数据空白:方法与案例研究

Narges Saeednejad, Jamie Ellen Padgett

发表机构 * Department of Civil and Environmental Engineering, Rice University(Rice大学土木与环境工程系) Ken Kennedy Institute, Rice University(Rice大学肯尼迪研究所)

AI总结 提出以方法为中心的迁移学习框架,解决领域偏移、类别不平衡和目标标签稀缺问题,通过三个案例验证其在低数据场景下提升失效检测与预测稳定性的有效性。

Comments 24 pages, 12 figures

详情
AI中文摘要

本文提出了一个以方法为中心的迁移学习框架,用于在领域偏移、类别不平衡和目标标签稀缺的情况下进行易损性自适应,同时保持工程可解释性并支持不确定性下的决策。通过三个互补的案例研究展示了四种迁移学习策略(基于实例、基于参数、分层贝叶斯和多源):(i) 基于实例的迁移学习通过重要性加权,利用卡特里娜飓风观测数据演示了沿海桥梁易损性;(ii) 基于参数的迁移学习结合分层贝叶斯迁移学习,实现了跨层的部分合并和后验不确定性量化,利用伊恩飓风观测数据演示了住宅建筑易损性;(iii) 多源迁移学习融合多个分析易损性模型,学习源权重并进行正则化的目标域自适应,利用2001年尼斯夸利地震观测数据演示了地震桥梁易损性。在这些案例研究中,直接迁移源模型(即使用现有最先进模型)在领域偏移和严重类别不平衡下失败,而有针对性的自适应在低数据场景下显著提高了失效检测和预测稳定性。这些发现强调了在开发和自适应易损性模型时,需要对诊断、策略选择和不确定性报告提供系统指导。

英文摘要

This paper presents a methodology-centered transfer learning framework for fragility adaptation under domain shift, class imbalance, and scarce target labels while preserving engineering interpretability and supporting decision-making under uncertainty. Four transfer learning strategies (instance-based, parameter-based, hierarchical Bayesian, and multi-source) are demonstrated through three complementary case studies: (i) instance-based transfer learning via importance weighting, demonstrated on coastal bridge fragility using Hurricane Katrina observations; (ii) parameter-based transfer learning together with hierarchical Bayesian transfer learning, enabling partial pooling across strata and posterior uncertainty quantification, demonstrated on residential building fragility using Hurricane Ian observations; and (iii) multi-source transfer learning that fuses multiple analytical fragility models with learned source weights and regularized target-domain adaptation, demonstrated on seismic bridge fragility using observations from the 2001 Nisqually earthquake. Across these case studies, direct transfer of source models (i.e. using existing state-of-the-art models) fails under domain shift and severe class imbalance, while targeted adaptation substantially improves failure detection and predictive stability in low-data regimes. These findings highlight the need for systematic guidance on diagnostics, strategy selection, and uncertainty reporting when developing and adapting fragility models.

10. 数据集、基准与评测 13 篇

2606.18267 2026-06-18 cs.SI cs.LG cs.NE 交叉投稿

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

图实例景观:当结构相似性(不)反映最短路径性能时

Maryam Gholami Shiri, Ivana Krminac, Marko Djukanović, Sašo Džeroski, Eva Tuba, Tome Eftimov

发表机构 * Jožef Stefan Institute(乔泽夫·斯塔芬研究所) Ljubljana, Slovenia(斯洛文尼亚卢布尔雅那) Jožef Stefan International Postgraduate School(乔泽夫·斯塔芬国际研究生学院) University of Banja Luka(班贾卢卡大学) Faculty of Natural Science and Mathematics(自然科学与数学学院) University of Nova Gorica(诺瓦戈里察大学) Institute of Information Sciences (IZUM)(信息科学研究所(IZUM)) Trinity University(特里尼蒂大学)

AI总结 通过将图嵌入低维结构特征空间并聚类,分析最短路径算法在不同图结构区域中的性能差异,发现结构相似性并不保证性能相似。

Comments Preprint version of a paper accepted at the 2026 IEEE Congress on Evolutionary Computation (IEEE CEC 2026)

详情
AI中文摘要

最短路径算法的基准测试通常基于异构图集上的聚合性能,这限制了对不同搜索范式如何响应实例结构的理解。我们采用实例景观视角进行图基准测试,将图嵌入到低成本的结构特征空间中,并将其聚类为结构相似的区域。研究了三个基准套件:加权 Erdős--Rényi 图、随机几何(无线)图和真实世界道路网络。我们评估了四种代表性的最短路径求解器,涵盖无信息精确搜索(Dijkstra)、双向精确搜索(双向 Dijkstra)、启发式引导精确搜索(A$^{*}$)和基于双端队列的策略(DEQ)。在多种特征选择方案下分析聚类鲁棒性,并使用非参数检验比较不同景观区域内的运行时间分布。虽然生成器参数诱导出稳定的结构区域,但我们发现特征空间相似性并不一定意味着性能相似:即使在相同的景观区域内,也经常观察到显著的运行时间变化。合并套件分析进一步表明,不同的基准族占据大部分不相交的区域。这些结果突出了结构景观用于最短路径算法结构感知基准测试的潜力和局限性。

英文摘要

Benchmarking shortest-path algorithms is commonly based on aggregate performance over heterogeneous graph sets, which limits insight into how different search paradigms react to instance structure. We adopt an instance-landscape view of graph benchmarking by embedding graphs into a low-cost structural feature space and clustering them into regions of similar structure. Three benchmark suites are studied: weighted Erdős--Rényi graphs, random geometric (wireless) graphs, and real-world road networks. We evaluate four representative shortest-path solvers spanning uninformed exact search (Dijkstra), bidirectional exact search (bidirectional Dijkstra), heuristic-guided exact search (A$^{*}$), and deque-based strategies (DEQ). Clustering robustness is analyzed under multiple feature-selection schemes, and runtime distributions are compared across landscape regions using non-parametric tests. While generator parameters induce stable structural regions, we find that feature-space similarity does not necessarily imply performance similarity: significant runtime shifts are frequently observed even within the same landscape region. A merged-suite analysis further shows that different benchmark families occupy largely disjoint regions. These results highlight both the potential and the limits of structural landscapes for the structure-aware benchmarking of shortest-path algorithms.

2606.18281 2026-06-18 stat.AP cs.LG stat.ML 交叉投稿

A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings

竞争风险背景下条件平均处理效应估计指南

Daniel Klippert, Sarah Friedrich, Markus Pauly

发表机构 * Department of Statistics, TU Dortmund University(图恩-多特蒙德大学统计学系) Research Center Trustworthy Data Science and Security, University Alliance Ruhr (UA Ruhr)(鲁尔大学联盟可信数据科学与安全研究中心) Institute for Mathematics, University of Augsburg(艾希施泰特大学数学研究所)

AI总结 针对竞争风险生存数据,比较六种元学习器估计条件平均处理效应,提供R包crsurvlearners指导模型选择。

详情
AI中文摘要

条件平均处理效应(CATE)是个性化医疗中治疗决策的核心。在竞争风险背景下,从生存数据估计CATE允许对特定感兴趣事件的治疗效果进行患者特异性评估,同时适当考虑替代事件类型。在存在合并症的情况下,这种区分至关重要,因为竞争死亡原因可能混淆治疗效果。本文聚焦于右删失生存时间和二元治疗,研究CATE定义为在固定时间点上感兴趣事件绝对风险的协变量条件差异。为此,我们研究了元学习器,这些学习器将机器学习算法适应于竞争风险场景中的CATE估计。我们系统比较了六种元学习器,结合Cox回归或随机生存森林进行风险建模,以及弹性网回归或随机森林进行直接CATE建模。为提供模型选择的实践指导,我们在多种模拟设置中评估其性能,这些设置在风险复杂性、治疗异质性、治疗分配、事件类型分布和删失方面有所不同。为促进应用,我们提供R包crsurvlearners,实现了所有考虑的方法。

英文摘要

Conditional average treatment effects (CATEs) are central to treatment decision-making in personalized medicine. In competing risks settings, estimating CATEs from survival data allows for patient-specific assessments of treatment effectiveness for a specific event of interest while properly accounting for alternative event types. This distinction is essential in the presence of comorbidities, where competing causes of death may otherwise confound the therapeutic benefit. Focusing on right-censored survival times with binary treatment, we examine CATEs defined as covariate-conditional differences in the absolute risk for the event of interest at a fixed time. To this end, we study meta-learners which adapt machine learning algorithms for CATE estimation in competing risks scenarios. We systematically compare six meta-learners, combining Cox regression or random survival forests for risk modeling with elastic net regression or random forests for direct CATE modeling. To provide practical guidance on model selection, we evaluate their performance in multiple simulation settings, that differ in hazard complexity, treatment heterogeneity, treatment assignment, event type distribution and censoring. To facilitate applied use, we provide the R package, crsurvlearners, which implements all considered approaches.

2606.18302 2026-06-18 q-bio.OT cs.LG 交叉投稿

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

基于蛋白质的鱼类物种识别:孟加拉本土鱼类的数据集、模型与见解

Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib, Mohammad Shahidur Rahman

发表机构 * Shahjalal University of Science

AI总结 本研究构建了首个孟加拉本土鱼类蛋白质序列数据集,并系统评估了七种架构,提出了一种轻量级混合模型MotifCNN-Transformer+TA-PE,在资源受限场景下优于大型蛋白质语言模型ProtBERT。

Comments Published in 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN). \c{opyright} 2026 IEEE. Personal use of this material is permitted

详情
Journal ref
2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)
AI中文摘要

在孟加拉国,正确识别鱼类物种对于粮食安全、经济发展和气候适应性至关重要。蛋白质序列直接反映功能和进化约束,对物种认证和生物多样性监测具有重要意义。然而,目前尚无针对孟加拉本土鱼类物种的蛋白质序列识别基准。本研究通过引入首个包含9种孟加拉本土鱼类2845条高质量蛋白质序列的精选数据集来填补这一空白。我们还通过对七种架构范式进行系统基准测试,建立了该领域首个蛋白质序列分类基线。此外,我们提出了一种实用的新型混合架构——MotifCNN与具有末端感知位置编码的Transformer(MotifCNN-Transformer+TA-PE)。该新架构实现了79.80%的准确率和0.80的宏F1分数。最高准确率83.04%由微调的蛋白质语言模型ProtBERT取得,该模型有4.2亿参数,需要双16GB GPU进行推理。根据McNemar检验,ProtBERT相比我们的MotifCNN-Transformer+TA-PE的3.24%准确率提升在统计上不显著(p = 0.1120)。在九类中的六类上,我们的新架构在每类识别中优于ProtBERT。此外,我们的MotifCNN-Transformer+TA-PE比ProtBERT快约5倍,小42倍,支持16倍更大的批处理大小,且无需GPU推理,使其在资源受限地区(如孟加拉农村)部署更为实用。除此之外,我们的基础性工作展示了系统发育关系对序列相似性的影响,并为南亚蛋白质依赖型经济中的渔业管理、食品认证和生物多样性保护建立了途径。

英文摘要

Correct identification of fish species is highly significant for food security, economic development, and climate resilience in Bangladesh. Protein sequences directly reflect functional and evolutionary constraints which are important for species authentication and biodiversity monitoring. Yet there exists no benchmark for native Bangladeshi fish species identification from protein sequence. In this study, we addressed this gap by introducing the first curated dataset for nine native Bangladeshi fish species of 2845 high quality protein sequences. We also established the first protein sequence classification baseline for this domain through a systematic benchmarking of seven architectural paradigms. Moreover, we propose a realistic deployable novel hybrid architecture of MotifCNN and Transformer with Terminal-Aware Positional-Encoding (MotifCNN-Transformer+TA-PE). Our novel architecture achieves 79.80% accuracy with macro-F1 of 0.80. The highest 83.04% accuracy is achieved by finetuned protein language model ProtBERT that has 420M parameters and requires dual 16GB GPUs for inference. According to McNemar's test, ProtBERT's 3.24% accuracy gain over our MotifCNN-Transformer+TA-PE is statistically insignificant (p = 0.1120). Our novel architecture beats it among six of the nine classes in per class identification. Also our MotifCNN-Transformer+TA-PE is approximately 5x faster, 42x smaller, and supports 16x larger batch size than ProtBERT and has GPU free inference, making it more practical for deployment in resources constrained areas such as rural Bangladesh. Beyond this, our foundational work shows effects of phylogenetic relationships on sequence similarity and establishes pathways for fisheries management, food authentication and biodiversity conservation in South Asia's protein dependent economy.

2606.18436 2026-06-18 stat.ML cs.LG 交叉投稿

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

逐点是否无意义?基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

发表机构 * Norwegian Meteorological Institute(挪威气象研究所)

AI总结 本研究通过多模态图神经网络系统,消融分析雷达、数值预报、地面观测、卫星数据及训练损失对降水临近预报的影响,发现各模态分别改善不同方面,点观测虽提升局部但需结合损失函数和不确定性表示才能优化雷达场。

详情
AI中文摘要

稀疏点观测在降水临近预报中日益可用,但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率,并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置,使用雷达网格、站点位置、降雨起始的互补诊断,以及oracle、位移和幅度评分。结果表明,每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推,Netatmo观测改善了局部站点和起始诊断,卫星预测因子减少了某些站点级偏差,但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益,而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论,但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是,稀疏观测可以提供有用的局部约束,但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.

2606.18557 2026-06-18 cs.AI cs.LG cs.LO 交叉投稿

DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models

DeFAb:基础模型中可废止溯因的可验证基准

Patrick Cooper, Alvaro Velasquez

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校)

AI总结 提出DeFAb基准,通过将知识库转换为可验证的溯因实例,评估基础模型在可废止推理中的创造力与理论推理能力,发现前沿模型准确率远低于符号求解器。

Comments 33 pages, 14 figures, 23 tables. Dataset: https://huggingface.co/datasets/PatrickAllenCooper/DeFAb ; code and evaluation harness: https://github.com/PatrickAllenCooper/blanc

详情
AI中文摘要

一个基于规则的逻辑求解器在不到50微秒内以100%的准确率解决了我们基准中的每个实例;而最佳前沿语言模型在渲染鲁棒评估下最高仅达65%,最差降至23.5%(四种表面渲染的最坏情况)。我们引入DeFAb(可废止溯因基准),这是一个数据集和生成流水线,将四十年的公共资助知识库转换为形式化可废止溯因实例:通过覆盖默认值同时保留无关期望来构建解释异常假设。由于每个假设必须通过多项式时间检查(有效推导、保守性和最小性),DeFAb将逻辑严谨性作为衡量创造性和理论推理的工具,评分的是理论修正的规范构建,而非流畅但破坏理论的散文。该流水线将分类层次结构(OpenCyc、YAGO、Wikidata)与行为属性图(ConceptNet、UMLS)配对,从18个来源生成372,648+个实例,涉及33.75M条实例化规则,分为三个级别,并具有多项式时间可验证的金标准。四个前沿模型未能可靠内化可废止推理:渲染鲁棒的Level 2准确率为7.8-23.5%;思维链方差(约36个百分点)超过任何模型间差距;匹配的污染控制隔离出+19.4个百分点的Level 3差距。我们进一步发布了DeFAb-Hard(235个实例的Level 3难度变体;最佳模型53.3% vs 符号100%)和CONJURE(一个内核验证的变革性创造力变体,包含560个Lean 4/Mathlib实例,其金答案证明内核先前未包含的定义,无需判断的验证器;试点发现零新概念)。同一验证器还可作为偏好优化(DPO、RLVR/GRPO)的精确奖励。基于MIT许可发布于此https URL。

英文摘要

A rule-based logic solver resolves every instance in our benchmark in under 50 microseconds with 100% accuracy; the best frontier language model reaches 65% at best and drops to 23.5% under rendering-robust evaluation (worst case over four surface renderings). We introduce DeFAb (Defeasible Abduction Benchmark), a dataset and generation pipeline that converts four decades of publicly funded knowledge bases into formally grounded instances for defeasible abduction: constructing hypotheses that explain anomalies by overriding defaults while preserving unrelated expectations. Because every hypothesis must pass polynomial-time checks for valid derivation, conservativity, and minimality, DeFAb makes logical rigor the instrument for measuring creativity and theoretical reasoning, scoring the disciplined construction of theory revisions rather than fluent but theory-destroying prose. The pipeline pairs taxonomic hierarchies (OpenCyc, YAGO, Wikidata) with behavioral property graphs (ConceptNet, UMLS) to produce 372,648+ instances across 33.75M materialized rules from 18 sources, in three levels with polynomial-time verifiable gold standards. Four frontier models do not reliably internalize defeasible reasoning: rendering-robust Level 2 accuracy is 7.8-23.5%; chain-of-thought variance (~36 pp) exceeds any inter-model gap; and a matched contamination control isolates a +19.4 pp Level 3 gap. We further release DeFAb-Hard (a 235-instance Level 3 difficulty variant; best model 53.3% vs 100% symbolic) and CONJURE (a kernel-verified transformative-creativity variant of 560 Lean 4/Mathlib instances whose gold answers are definitions the proof kernel did not previously contain, judge-free verifier; a pilot finds zero novel concepts). The same verifier doubles as an exact reward for preference optimization (DPO, RLVR/GRPO). Released under MIT at https://huggingface.co/datasets/PatrickAllenCooper/DeFAb.

2606.18686 2026-06-18 cs.AI cs.CL cs.LG 交叉投稿

ForecastBench-Sim: A Simulated-World Forecasting Benchmark

ForecastBench-Sim:一个模拟世界预测基准

Jaeho Lee, Nick Merrill, Ezra Karger

发表机构 * Forecasting Research Institute(预测研究所)

AI总结 提出基于Freeciv游戏模拟的预测基准ForecastBench-Sim,通过游戏回滚生成可控、即时可解的预测问题,用于评估AI系统的概率推理能力。

Comments 15 pages, 5 main figures, 6 appendix figures. Spotlight presentation at Forecasting as a New Frontier of Intelligence / Workshop on AI Forecasting, ICML 2026

详情
AI中文摘要

通用AI系统的预测基准通常继承现实世界的约束:结果缓慢显现、尾部事件罕见、反事实问题难以评分。我们引入ForecastBench-Sim,一个基于Freeciv(一款以文明系列为模型的回合制策略游戏)游戏回滚的模拟世界预测基准。预测者接收固定的世界报告(当前游戏状态的结构化快照),并回答关于隐藏未来状态的问题;然后基准继续模拟并对预测进行评分。由于世界是模拟的,同一设置可以生成任意时间跨度的连续或二元预测问题、用于条件或因果问题的配对干预世界,以及罕见或破坏性结果的已解决示例。我们描述了基准流程、问题族、评分协议和发布工件,并报告了来自模型评估和匿名人工试点的验证切片。ForecastBench-Sim旨在通过提供受控、即时可解的任务来补充现实世界预测基准,用于研究动态世界状态下的概率推理。

英文摘要

Forecasting benchmarks for general-purpose AI systems usually inherit the constraints of the real world: outcomes resolve slowly, tail events are rare, and counterfactual questions are difficult to score. We introduce ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, a turn-based strategy game modelled on the Civilization series. Forecasters receive a fixed world report (a structured snapshot of the current game state) and answer questions about hidden future states; the benchmark then continues the simulation and scores forecasts. Because the world is simulated, the same setup can generate continuous or binary forecasting questions at arbitrary time horizons, paired intervention worlds for conditional or causal questions, and resolved examples of rare or disruptive outcomes. We describe the benchmark pipeline, question families, scoring protocol, and release artifacts, and report validation slices from model evaluations and an anonymized human pilot. ForecastBench-Sim is intended to complement real-world forecasting benchmarks by providing controlled, immediately resolvable tasks for studying probabilistic reasoning under dynamic world states.

2606.18729 2026-06-18 stat.ML cs.LG 交叉投稿

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

TimeLAVA: 时间序列的学习无关数据估值

Wenqin Liu, Weizhi Quan, Aoqi Zuo, Erdun Gao, Vu Nguyen, Dino Sejdinovic, Howard Bondell, Mingming Gong

发表机构 * School of Mathematics and Statistics, The University of Melbourne(墨尔本大学数学与统计学学院) Statistics, The University of Melbourne(墨尔本大学统计学系) Statistics, University of Sydney(悉尼大学统计学系) Responsible AI Research Centre, Australian Institute for Machine Learning(澳大利亚机器学习研究所负责任人工智能研究中心) Amazon(亚马逊) School of Mathematical Sciences, Adelaide University(阿德莱德大学数学科学学院) Department of Machine Learning, MBZUAI(MBZUAI机器学习系)

AI总结 提出TimeLAVA,一种学习无关框架,通过小波变换和最优传输评估时间序列片段对分布差异的边际贡献,无需模型训练,在异常检测、数据剪枝和标签噪声检测中优于现有方法。

Comments 34pages

详情
Journal ref
ICML2026
AI中文摘要

数据估值量化单个样本的内在质量,以实现原则性的数据整理、质量控制和鲁棒学习。对于医疗、金融和工业监控等关键领域的时间序列,有效的估值方法至关重要但基本缺乏。现有方法要么依赖于模型,限制了其泛化性,要么针对独立同分布数据设计,因此无法捕捉序列数据固有的时间依赖性、多尺度模式和非平稳动态。我们引入了TimeLAVA,一种学习无关框架,通过评估时间片段对最小化评估数据与参考数据之间分布差异的边际贡献来估值。其核心是一种新颖的基于选择性小波的Wasserstein差异,结合了用于时间定位的多尺度小波变换和用于对分布偏移具有鲁棒性的非平衡最优传输。通过敏感性分析高效计算片段值,无需模型训练,并聚合成逐点得分。我们提供了将估值与模型无关泛化联系起来的理论保证,并证明了对异常值污染的有界敏感性。在异常检测、数据剪枝和标签噪声检测上的大量实验表明,TimeLAVA在多样化的真实世界数据集上产生了比现有方法显著更具信息量的价值分数。

英文摘要

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, finance, and industrial monitoring, effective valuation methods are essential yet fundamentally lacking. Existing approaches are either model-dependent, limiting their generalizability, or designed for i.i.d. data and thus fail to capture temporal dependencies, multi-scale patterns, and non-stationary dynamics inherent to sequential data. We introduce TimeLAVA, a learning-agnostic framework that values temporal segments by their marginal contribution to minimizing distributional discrepancy between evaluated and reference data. At its core is a novel Selective Wavelet-based Wasserstein discrepancy combining multi-scale wavelet transforms for temporal localization with unbalanced optimal transport for robustness to distributional shifts. Segment values are efficiently computed via sensitivity analysis without requiring model training and aggregated into point-wise scores. We provide theoretical guarantees linking valuation to model-agnostic generalization and prove bounded sensitivity to outlier contamination. Extensive experiments across anomaly detection, data pruning, and label noise detection demonstrate that TimeLAVA produces significantly more informative value scores than existing methods on diverse real-world datasets.

2606.18750 2026-06-18 stat.AP cs.LG 交叉投稿

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

确保可信的在线A/B测试:解决关于CUPED的五个关键问题

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

AI总结 本文系统解决CUPED应用中五个常见但被忽视的问题,包括最优调整规范、回归调整有效性、鲁棒方差估计,并扩展到多臂实验和两阶段抽样设计,通过理论分析和实验验证提供可靠方法,已在字节跳动平台部署。

Comments 15 pages, 3 figures

详情
AI中文摘要

A/B测试已成为大规模在线实验中数据驱动决策的金标准,为功能发布、定价优化和用户体验提升提供关键指导。为最大化统计灵敏度,许多科技公司常规使用实验前数据控制实验(CUPED),该技术实现大幅方差缩减,同时保持平均处理效应估计的无偏性。尽管被广泛采用,CUPED的几个关键方法和实践细节仍未充分探索。本文系统解决了关于CUPED应用的五个常见但被忽视的问题。首先,我们提供各种后CUPED估计量的比较分析,以确定最优调整规范。其次,我们评估基于回归的调整的有效性,并描述为此类框架定制的鲁棒方差估计方法。最后,我们将研究扩展到复杂但常见的场景,包括多臂实验和两阶段抽样设计。我们的发现表明,在这些设置中,天真地依赖标准方差估计量可能导致严重误导的推断。通过提供严格的理论见解和广泛的实验验证,本工作加深了对CUPED的概念理解。值得注意的是,推荐的方法已成功部署并集成到字节跳动的实验平台中。

英文摘要

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.

2606.18972 2026-06-18 stat.ML cs.LG 交叉投稿

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

FOSC-X: 一种用于从聚类层次结构中提取最优局部切割和非水平聚类的扩展框架

Connor Simpson, Ricardo J. G. B. Campello

AI总结 提出FOSC-X框架,通过动态规划从层次聚类树中提取前M个全局最优的局部非水平切割聚类,支持聚类数约束,在线性时间内保证最优排序。

详情
AI中文摘要

从层次结构中提取平坦聚类解是实际聚类分析中的常见任务,可表述为优化问题。现有方法侧重于寻找单个最优解。我们引入FOSC-X,一个从层次聚类树的局部非水平切割中提取前M个全局最优平坦聚类的框架,同时可选地对聚类数量施加约束。这使得能够自动识别多个高质量替代聚类,捕捉层次结构的不同方面。无约束时,利用子树内局部最优部分候选可组合成全局最优解并自动确定聚类数的性质,通过动态规划在多项式时间内求解前M问题。然而,这可能导致聚类数最终不理想——例如,在特定应用领域中过大而失去意义或难以实际分析。施加聚类数约束破坏了无约束动态规划方法的最优性性质,因为局部最优部分候选可能不再能组合成可行的全局最优解。FOSC-X通过一种动态规划策略应对这一挑战,该策略使用可行性的下界和上界维护紧凑的可行候选集,同时剪枝不可行或占优的组合。所得方法保证在有无聚类数约束下,均以聚类节点数和数据集大小的线性时间复杂度获得前M个解的最优排序。实验表明,FOSC-X能有效揭示单解提取方法忽略的替代聚类结构。

英文摘要

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.

2606.19057 2026-06-18 stat.ML cs.LG stat.CO stat.ME 交叉投稿

Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning

通过正-无标签学习量化与审计大语言模型评估

Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh

AI总结 针对大语言模型作为评估者存在的系统性偏差(如冗长偏好),提出基于部分最优传输的几何审计框架,利用少量人工验证正样本校正偏差,无需重训练即可提升与人类偏好的一致性。

详情
AI中文摘要

大语言模型(LLM)越来越多地被用作可扩展评估的评判者,然而这种LLM作为评判者的系统表现出与语义质量脱节的系统性偏差,最显著的是冗长偏差。同时,人工监督成本高昂且通常具有选择性,产生可靠的正向判断,但大多数输出未被标记且质量可能参差不齐。我们将选择性人工监督下的LLM评估形式化为一个正-无标签学习问题,并提出了一个基于部分最优传输的几何审计框架。通过在固定嵌入空间中将一小部分人工验证的正样本与可靠的无标签输出子集对齐,我们的方法识别出与人类一致的偏好,并在无需重新训练的情况下纠正有偏的评判者。实验表明,该方法提高了与人类偏好的一致性,增强了对呈现偏差的鲁棒性,并提供了可解释的置信度估计,为现有的LLM作为评判者流程提供了一种可扩展且统计上有依据的替代方案。

英文摘要

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM--as--a--Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive--unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human--verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human--consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM--as--a--judge pipelines.

2606.19184 2026-06-18 cs.CV cs.LG 交叉投稿

When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift

当AUC误导:域偏移下深度伪造检测器的极化感知评估

Dat Nguyen, Cosmin Radoi, Romain Hermary, Marcella Astrid, Nesryne Mejri, Enjie Ghorbel, Djamila Aouada

发表机构 * Cristal Laboratory, National School of Computer Sciences, University of Manouba(马努巴大学国家计算机科学学院Cristal实验室)

AI总结 针对现有AUC评估无法反映真实场景中混合数据源和不同伪影类型的问题,提出Cross-dataset AUC(Cross-AUC)指标,通过平均每域AUC并引入预测极化度量(Wasserstein距离)来评估域偏移鲁棒性,实验证明其有效性。

详情
AI中文摘要

生成式AI的最新进展,如扩散模型和换脸工具,使得创建高度逼真的深度伪造成为可能,导致了包括金融欺诈和非自愿色情内容在内的现实危害。为此,深度伪造检测成为一个活跃的研究领域,近期方法越来越关注提高对未见操作的泛化能力。这通常通过跨多个数据集分别测量的ROC曲线下面积(AUC)来评估。然而,这种评估未能反映检测器面对混合数据源和不同伪影类型的真实场景。为解决这一局限,我们引入一种新指标——跨数据集AUC(Cross-AUC),该指标平均每域AUC并加入预测极化度量,以考虑对域偏移的鲁棒性。极化程度通过类别分数分布之间的Wasserstein距离量化。Cross-AUC不仅更真实地评估深度伪造检测器在域偏移下的泛化能力,而且具有可解释性,因为它能更好地解释性能下降的原因。在七个基准数据集上的实验证明了其实用性。

英文摘要

Recent advances in generative AI, such as diffusion models and face-swapping tools, have enabled the creation of highly realistic deepfakes, leading to real-world harms including financial fraud and non-consensual explicit content. In response, deepfake detection has become an active research area, with recent methods increasingly focusing on improving generalization to unseen manipulations. This is typically evaluated using the Area Under the ROC Curve (AUC) measured separately across multiple datasets. However, such an evaluation fails to reflect real-world scenarios where detectors face a mixture of data sources and varying artifact types. To address this limitation, we introduce a novel metric, Cross-dataset AUC (Cross-AUC) that averages per-domain AUCs with a measure of prediction polarization for taking into account the robustness to domain shift. The polarization extent is quantified by the Wasserstein Distance between class score distributions. Cross-AUC not only assesses the generalization capabilities of deepfake detectors under domain shifts more realistically, but it is also interpretable as it better explains the reason behind a drop in performance. Experiments performed on seven benchmark datasets demonstrate its practical relevance.

2606.19245 2026-06-18 cs.AI cs.LG 交叉投稿

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

TxBench-PP:分析AI代理在小分子临床前药理学中的表现

Hannah Le, Ramesh Ramasamy, Alex Urrutia, Mahsa Yazdani, Tim Proctor, Kenny Workman

发表机构 * LatchBio

AI总结 提出TxBench-PP基准,用于评估AI代理从真实实验数据中恢复临床前药理学结论的能力,测试显示最强配置Claude Opus 4.8 / Pi仅通过59.3%的端点尝试。

详情
AI中文摘要

人工智能(AI)代理有望通过压缩解释和决策循环来加速药物发现,但实际部署需要基于现实程序决策的可信评估。我们引入了TherapeuticsBench临床前药理学(TxBench-PP),这是一个针对小分子临床前药理学的可验证基准,也是更广泛的TherapeuticsBench在药物发现阶段和治疗模式中的首个聚焦切片。TxBench-PP测试代理是否能够从真实实验数据中恢复准确的结论,而非从文献中记忆的事实。该基准包含100个评估,按程序阶段、实验类型和任务结构索引,涵盖作用机制(MoA)和药效学(PD)推理、化合物-靶点结合、因果靶点验证、可开发性与安全性以及转化疗效。代理接收现实的工作流程快照,在编码环境中检查文件,并返回确定性评分的结构化答案。在16个模型-工具配置(包括11个模型和4,800条轨迹)中,没有系统能够可靠地恢复临床前药理学决策。最强配置Claude Opus 4.8 / Pi通过了59.3%的端点尝试(178/300;95% CI, 51.1-67.6),其次是GPT-5.5 / Pi,为55.3%(166/300;47.0-63.6)。

英文摘要

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than memorized facts from literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers graded deterministically. Across 16 model-harness configurations, comprising 11 models and 4,800 trajectories, no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3\% of endpoint attempts (178/300; 95\% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3\% (166/300; 47.0-63.6).

2606.19334 2026-06-18 cs.CL cs.CY cs.LG 交叉投稿

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

用LOCUS解放法律:美国地方条例语料库

Denis Peskoff, Joe Barrow, Christopher Vu, Diag Davenport

发表机构 * UC Berkeley(加州大学伯克利分校) School of Information(信息学院) Independent(独立研究者)

AI总结 为解决美国地方条例缺乏机器可读语料的问题,构建了包含9239个市县条例的LOCUS语料库,并训练ModernBERT分类器以分析法律透明度等维度。

Comments 14 pages, 6 figures

详情
AI中文摘要

法律人工智能的进展越来越依赖于大规模获取权威法律文本。然而,美国法律中最具影响力的层级之一——地方条例——在很大程度上仍然缺失于现有的机器可读语料库中。地方法规管辖着分区、住房、商业许可、公共卫生、噪音、动物控制以及许多其他日常监管领域,但它们分散在专为人类浏览而非批量研究访问设计的供应商平台上。我们引入了LOCUS——美国地方条例语料库——一个全面的语料库和县级统一访问层,用于美国市和县条例。原始语料库可供研究人员发布,几乎涵盖了所有公开可用的市和县条例。由此产生的原始语料库包含来自9239个城市和县的法规。一个较小的县级统一LOCUS访问层覆盖了美国3144个县中最大的2309个,覆盖了大部分人口。我们使用OCR来处理使法律无法成为公共资源的各种文档格式。我们发布了带有覆盖元数据的语料库,以支持可重复性、下游法律AI研究以及逐步扩展对地方法律的机器可读访问。我们训练了一系列基于ModernBERT的分类器和评分器,以便从多个维度分析美国地方法律,例如不透明性和家长式作风,这些维度以前从未在此规模上研究过。LOCUS-v1及其衍生模型可在以下网址获取:this https URL

英文摘要

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable corpora: local ordinances. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce LOCUS - the Local Ordinance Corpus for the United States - a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, available for release to researchers, represents nearly all publicly available municipal and county ordinance codes. The resulting raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer provides coverage for the largest 2,309 of 3,144 U.S. counties, accounting for a majority of the population. We use OCR to handle the myriad of document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We train a collection of ModernBERT-based classifiers and scorers to facilitate analyzing U.S. local law among several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at: https://huggingface.co/datasets/LocalLaws/LOCUS-v1

11. 机器学习应用 21 篇

2606.17077 2026-06-18 physics.chem-ph cs.AI cs.LG quant-ph 交叉投稿

Comprehensive pKa Data Augmentation from Limited Real Data through an Engineered Models-Quantum Framework

基于工程化模型-量子框架从有限真实数据中全面增强pKa数据

Wang Rui, Liu Dinghao

发表机构 * Department of Chemistry, Tsinghua University(清华大学化学系) Department of Chemical Engineering, Tsinghua University(清华大学化学工程系) School of Science, China Pharmaceutical University(中国药科大学理学院)

AI总结 针对pKa数据稀疏问题,提出量子辅助分子生成方法,利用优化机器学习模型预测和量子退火器采样,在相干伊辛机上实现极端值采样。

详情
AI中文摘要

质子解离常数(pKa)对于功能分子发现和分子建模至关重要。基于已建立的最大实验pKa数据库iBonD,我们和其他研究人员开发了多种方法,包括基于机器学习的经验预测和高精度能量计算。尽管如此,高质量pKa数据的快速增强仍然受到根本性限制。作为这项工作的一部分,我们使用一组经过广泛优化的机器学习模型,对未标记分子数据集进行了大规模基于回归的pKa预测。结果表明,由于未标记分子数据集的特征分布,pKa数据分布近似正态,尾部区域样本极度稀缺。尽管这种增强对于提高整体数据可用性和预测建模非常有价值,但对于高效发现具有广谱pKa性质的分子仍然不足。为了解决这个问题,我们探索从广阔的化学空间中定向生成具有稀疏pKa性质的分子。鉴于传统的连续潜在空间VAE-RNN分子生成方法稳定性不足,且在补充稀疏数据方面未能显示出明显优势,我们设计并实现了一种量子辅助的稀疏pKa分子生成。在模拟量子退火器上验证了可行性,并在物理相干伊辛机(CIM)上进一步实现了优越的极端值采样。(未完待续)

英文摘要

Proton dissociation constants (pKa) are critical for functional molecule discovery and molecular modeling. Building on iBonD, the largest experimental pKa database established, we and other researchers have developed several methods including machine-learning-based empirical prediction and high-accuracy energy calculations. Despite this foundation, the rapid augmentation of high-quality pKa data remains fundamentally constrained. As part of this work, we performed large-scale regression-based pKa prediction on unlabeled molecular datasets using a collection of extensively optimized machine-learning models. The results indicate that, since the feature distributions of unlabeled molecular datasets, the pKa data distribution approximates normality, with extreme scarcity of tail-region samples. Although such augmentation is highly valuable for improving overall data availability and predictive modeling, it remains insufficient for efficiently discovering molecules with broad-spectrum pKa properties. To address this, we explore the targeted generation of molecules with sparse pKa properties from the vast chemical space. Given that traditional continuous latent space VAE-RNN methods for molecular generation suffer from insufficient stability and fail to demonstrate clear advantages in complementing sparse data, we design and implement a quantum-assisted sparse-pKa molecular generation. Feasibility is validated on a simulated quantum annealer, and superior extreme-value sampling is further achieved on physical coherent Ising machines (CIMs). (to be continued)

2601.23018 2026-06-18 cs.HC cs.AI cs.LG 交叉投稿

Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback

整合多标签分类与生成式AI实现用户反馈的可扩展分析

Sandra Loop, Erik Bertram, Sebastian Juhl, Martin Schrepp

发表机构 * SAP SE(SAP公司) Hochschule Fresenius Heidelberg(弗赖辛大学海德堡分校) University of Missouri(密苏里大学)

AI总结 提出结合监督多标签分类与生成式AI的方法,高效处理大量用户评论,自动分配主题标签并生成摘要,同时发现情感分析不能可靠反映产品满意度。

Comments 8 pages, 2 figures, submitted to Springer Nature

详情
AI中文摘要

在高度竞争的软件市场中,用户体验(UX)评估对于确保软件质量和促进产品长期成功至关重要。此类UX评估通常将标准化问卷的定量指标与通过开放式问题收集的定性反馈相结合。虽然开放式反馈为改进提供了有价值的见解,并有助于解释定量结果,但分析大量用户评论具有挑战性且耗时。在本文中,我们介绍了一家大型软件公司在长期UX测量项目中开发的技术,以高效处理和解释大量用户评论。为了提供收集到的评论的高层概述,我们采用监督机器学习方法,为每条评论分配有意义的预定义主题标签。此外,我们展示了如何利用生成式AI(GenAI)创建简洁且信息丰富的用户反馈摘要,促进向组织尤其是高层管理人员有效传达发现。最后,我们研究了用户评论中表达的情感是否可以作为整体产品满意度的指标。我们的结果表明,仅凭情感分析并不能可靠地反映用户满意度。相反,产品满意度需要在调查中明确评估,以衡量用户对产品的感知。

英文摘要

In highly competitive software markets, user experience (UX) evaluation is crucial for ensuring software quality and fostering long-term product success. Such UX evaluations typically combine quantitative metrics from standardized questionnaires with qualitative feedback collected through open-ended questions. While open-ended feedback offers valuable insights for improvement and helps explain quantitative results, analyzing large volumes of user comments is challenging and time-consuming. In this paper, we present techniques developed during a long-term UX measurement project at a major software company to efficiently process and interpret extensive volumes of user comments. To provide a high-level overview of the collected comments, we employ a supervised machine learning approach that assigns meaningful, pre-defined topic labels to each comment. Additionally, we demonstrate how generative AI (GenAI) can be leveraged to create concise and informative summaries of user feedback, facilitating effective communication of findings to the organization and especially upper management. Finally, we investigate whether the sentiment expressed in user comments can serve as an indicator for overall product satisfaction. Our results show that sentiment analysis alone does not reliably reflect user satisfaction. Instead, product satisfaction needs to be assessed explicitly in surveys to measure the user's perception of the product.

2606.03745 2026-06-18 hep-ph cs.LG hep-ex physics.data-an 交叉投稿

Predicting the Neutrino Mass Ordering Using Neural Networks

利用神经网络预测中微子质量顺序

T. J. C. Bezerra, L. Asquith, E. Bannister, W. Shorrock

发表机构 * Department of Physics and Astronomy, University of Sussex(苏塞克斯大学物理与天文学系)

AI总结 针对中微子质量顺序这一粒子物理核心问题,提出基于前馈神经网络分类器的机器学习方法,利用合成长基线数据集训练,并与标准χ²和logL方法对比,证明其性能相当,可作为独立交叉检验工具。

Comments 11 pages, 7 figures

详情
AI中文摘要

确定中微子质量顺序仍是粒子物理中的一个核心开放问题。虽然下一代长基线实验有望解决这一问题,但当前数据提供的灵敏度有限,因为正常顺序和倒置顺序之间的谱差异细微且与参数简并纠缠。我们研究了一种用于质量顺序确定的机器学习策略,使用前馈神经网络分类器,该分类器在合成长基线数据集上训练,这些数据集由三味振荡概率、物质效应和统计涨落生成。我们使用常见的判别指标(包括接收者操作特征曲线)将分类器与标准χ²和logL方法进行评估,以量化灵敏度并说明如何选择操作点以优先考虑纯度或效率。我们发现,在所研究的场景中,神经网络实现了与常规拟合相当的性能,为已有分析提供了灵活、独立的交叉检验。该框架可以扩展以包含系统不确定性并探索振荡参数的联合推断,也可作为在中微子物理中引入机器学习方法的教学工具。

英文摘要

Determining the neutrino mass ordering remains a central open problem in particle physics. While next-generation long-baseline experiments are expected to resolve this question, current data provide limited sensitivity because the spectral differences between normal and inverted ordering are subtle and entangled with parameter degeneracies. We investigate a machine-learning strategy for mass-ordering determination using a feed-forward neural-network classifier trained on synthetic long-baseline datasets generated with three-flavour oscillation probabilities, matter effects, and statistical fluctuations. We evaluate the classifier against standard $χ^2$ and $\log\mathcal{L}$ approaches using common discrimination metrics, including receiver-operating-characteristic curves, to quantify sensitivity and to illustrate how operating points can be selected to prioritise purity or efficiency. We find that the neural network achieves performance comparable to conventional fits for the scenarios studied, providing a flexible, independent cross-check of established analyses. The framework can be extended to incorporate systematic uncertainties and to explore joint inference of oscillation parameters, and it may also serve as a pedagogical tool for introducing machine-learning methods in neutrino physics.

2606.18271 2026-06-18 cs.AI cs.LG 交叉投稿

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

NAVI-Orbital:用于自主地球观测的零样本视觉语言模型的首次在轨演示

Juan Manuel Delfa Victoria, Taran Cyriac John, Andrew W. Herson

发表机构 * NASA Jet Propulsion Laboratory (JPL)(美国宇航局喷气推进实验室) Loft Orbital(Loft Orbital公司)

AI总结 本文介绍NAVI-Orbital系统,在低地球轨道卫星上首次实现视觉语言模型的自主多模态推理,通过语义压缩解决数据下传瓶颈。

Comments 17 pages, 47 figures

详情
AI中文摘要

随着地球观测数据的生成速度超过下行链路带宽和人在回路处理能力,星载采集与可操作地面情报之间的差距日益扩大。本文介绍NAVI-Orbital,一个部署在低地球轨道(LEO)航天器上的软件系统。2026年4月16日,NAVI-Orbital实现了据作者所知首次在轨演示,即视觉语言模型完全在星上进行自主多模态推理。NAVI-Orbital使用本地视觉语言模型(Gemma 3)对每个捕获场景进行分类,生成其内容及特征间关系的文本描述,并通过自然语言对话响应操作员的后续查询。该系统通过纯英语提示替代传统指令序列进行任务重定向,并由基于图的状态机(LangGraph)编排,协调用于检测和对话的专用代理。地面基准测试(在7,960张图像的精选AID基准上准确率达88.16%)、Flatsat验证以及实时在轨捕获的新获取、未见过的地球图像(包括未校正的YAM-9图像,在星上通过硬件加速GPU推理处理且未对飞行仪器进行微调)的结果表明,在卫星级边缘计算机上运行基础模型是可行的,通过星上地球观测的语义压缩,颠覆了传统的先采集后全部下传的带宽模式。

英文摘要

As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.

2606.18323 2026-06-18 cs.SD cs.LG 交叉投稿

Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs

通过ASR自验证与蒸馏实现可靠的神经编解码文本转语音:跨模型与编解码器的近零灾难性失败

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

AI总结 针对开放自回归神经编解码TTS模型的随机灾难性失败(静音、早停、重复或幻觉),提出基于ASR往返的格式鲁棒度量,通过最佳N自验证将失败率降至近零,并通过蒸馏将鲁棒性迁移至单次解码,在无测试代价下关闭约52-58%的失败。

详情
AI中文摘要

开放自回归神经编解码文本转语音(TTS)模型在典型输入上表现优异,但会出现随机灾难性失败:在相当一部分话语中,它们会发出静音、提前终止或陷入重复或幻觉内容。我们表明这种失败模式可以廉价地消除。在单一格式鲁棒度量(通过ASR往返的灾难性失败率)下,最佳N ASR自验证将失败率降至近零:在标准语料库(LibriSpeech)上N=2时未观察到失败,在困难提示集上N=4时也未观察到。这不是单一模型的假象:该减少在四个开放编解码TTS系统和三个神经编解码器(XCodec2、SNAC、Mimi)上复现,其中三个系统在N=2时达到近零下限。然后,通过将自验证行为蒸馏到模型中,我们在推理时免费实现了修复,这恢复了单次解码中的大部分鲁棒性,在无测试代价下关闭了困难输入上约52-58%的失败。蒸馏增益集中在需要的地方(困难输入);在已经可靠的散文上,没有改进空间且无检测到变化。一项受控比较添加了一个干净的负面结果:离线直接偏好优化(DPO/IPO)并未优于普通监督蒸馏,而在线迭代变体虽有前景但在我们的评估规模下统计上不显著。我们诚实地报告了唯一抵抗的模型(一个更大的Llasa,其中规模并未明显帮助)以及一个罕见词能力上限,该上限无法通过任何自蒸馏方法克服。

英文摘要

Open autoregressive neural-codec text-to-speech (TTS) models sound excellent on typical inputs yet suffer stochastic catastrophic failures: on a meaningful fraction of utterances they emit silence, terminate early, or collapse into repetitive or hallucinated content. We show this failure mode is cheap to remove. Under a single format-robust metric (a catastrophic-failure rate via an ASR round-trip), best-of-N ASR self-verification drives failures to near-zero: no observed failures remain by N=2 on a standard corpus (LibriSpeech) and by N=4 on a hard prompt set. This is not an artifact of one model: the reduction replicates across four open codec-TTS systems and three neural codecs (XCodec2, SNAC, Mimi), reaching the near-zero floor by N=2 on three of the four. We then make the fix free at inference time by distilling the self-verified behaviour into the model, which recovers much of the robustness in single-shot decoding, closing ~52-58% of the failure mass on hard inputs at no test-time cost. The distillation gain concentrates where it is needed (hard inputs); on already-reliable prose there is no headroom and no detectable change. A controlled comparison adds a clean negative: offline direct preference optimization (DPO/IPO) does not beat plain supervised distillation, and an online iterative variant is promising but not statistically separable at our evaluation size. We report honestly the one model that resists (a larger Llasa where scale did not obviously help) and a rare-word capability ceiling that no self-distillation method overcomes

2606.18429 2026-06-18 cs.CV cs.AI cs.LG 交叉投稿

CAOA -- Completion-Assisted Object-CAD Alignment

CAOA -- 补全辅助的物体-CAD对齐

Hiranya Garbha Kumar, Minhas Kamal, Balakrishnan Prabhakaran

发表机构 * University at Albany(奥尔巴尼大学)

AI总结 提出CAOA方法,结合语义感知点云补全和对称感知相对位姿估计,在Scan2CAD上实现17%精度提升,并发布S2C-Completion数据集。

Comments GitHub: https://github.com/MinhasKamal/CAOA

详情
Journal ref
Thirteenth International Conference on 3D Vision (3DV), 2026
AI中文摘要

准确地将CAD模型与室内RGB-D扫描中的对应物体对齐是3D语义重建的核心挑战。该任务需要估计9自由度(DoF)位姿——位置、旋转和三轴尺度——但受到噪声和不完整扫描以及导致几何畸变的分割误差的阻碍。我们提出补全辅助的物体-CAD对齐(CAOA),该方法将语义和上下文感知的点云补全模块与对称感知的相对位姿估计算法相结合,实现CAD模型与扫描物体的精确对齐。现有的补全方法通常在合成数据集上训练和评估,往往难以泛化到真实扫描。为弥合这一差距,我们引入了一种针对室内场景的合成数据生成策略,通过与广泛使用的补全数据集进行定量比较,验证了其显著减小合成到真实领域差距的效果。此外,我们发布了S2C-Completion,一个来自Scan2CAD的超过8500个物体-CAD对的专家标注数据集,用于真实室内单物体补全,并作为该任务的新基准。对于物体-CAD对齐,我们通过对称感知损失融入对称信息,提高了对对称模糊的鲁棒性。在Scan2CAD基准上,CAOA相比最先进方法实现了17%的精度提升。

英文摘要

Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central challenge in 3D semantic reconstruction. The task requires estimating a 9-Degree-of-Freedom (DoF) pose-position, rotation, and scale along three axes-but is hindered by noisy and incomplete scans, as well as segmentation errors that cause geometric distortions. We present Completion-Assisted Object-CAD Alignment (CAOA), a method that integrates a semantically and contextually aware point cloud completion module with a symmetry-aware relative pose estimation algorithm, enabling precise alignment of CAD models to scanned objects. Existing completion methods are typically trained and evaluated on synthetic datasets, which often fail to generalize to real-world scans. To bridge this gap, we introduce a synthetic data generation strategy tailored to indoor scenes, significantly reducing the synthetic-to-real domain gap-validated through quantitative comparisons with widely used completion datasets. In addition, we release S2C-Completion, an expert-annotated dataset of over 8,500 object-CAD pairs from Scan2CAD, created for real-world indoor single-object completion and intended as a new benchmark for this task. For object-CAD alignment, we incorporate symmetry information via a symmetry-aware loss, improving robustness to symmetric ambiguities. On the Scan2CAD benchmark, CAOA achieves a 17% accuracy improvement over state-of-the-art methods.

2606.18464 2026-06-18 astro-ph.IM astro-ph.EP cs.LG 交叉投稿

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

利用深度学习建模径向速度数据中的多普勒频移以探测地球质量系外行星

Isidro Gómez-Vargas, Xavier Dumusque, Yinan Zhao, Khaled Al Moulla, Michael Cretignier

发表机构 * Department of Astronomy, University of Geneva 51 chemin de Pegasi, 1290 Versoix, Switzerland. Instituto de Astrofı\'isica de Andaluc\'ia (CSIC), Glorieta de la Astronom\'ia s/n, E-18008 Granada, Spain. Institute of Space Sciences (CSIC), Carrer de Can Magrans s/n, E-08193 Barcelona, Spain. Department of Astronomy, University of Texas at Austin, 2515 Speedway, Austin, TX 78712, USA. Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, 4150-762 Porto, Portugal. Department of Physics, University of Oxford, OX13RH Oxford, UK.

AI总结 针对恒星活动干扰,提出结合物理启发光谱表示与深度学习的框架,通过交叉验证和遗传算法优化,可靠恢复振幅≥25 cm/s、周期10-550天的行星信号,并发布Python包doppleriann。

Comments 20 pages, 14 figures. Accepted for publication in Astronomy & Astrophysics

详情
AI中文摘要

由于恒星活动的影响,在恒星径向速度测量中探测由地球质量行星引起的微小多普勒频移仍然极具挑战性。许多在模拟数据上表现良好的深度学习方法难以可靠地应用于真实恒星光谱。本工作的目标是开发一种深度学习框架,使其能够泛化到真实、未见过的光谱,并提高径向速度数据中地球质量行星的可探测性。我们在注入行星信号的HARPS-N太阳光谱上训练人工神经网络,使用基于通量和谱线形成温度的物理驱动光谱表示,以及它们的速度梯度。探索了两种训练策略:留出测试和交叉验证。通过基于遗传算法的超参数优化增强模型鲁棒性,并使用蒙特卡洛dropout量化预测不确定性。在交叉验证策略下,我们最精确的神经网络模型能够可靠地恢复振幅≥25 cm/s、周期在10到550天之间的行星信号的振幅、相位和轨道周期。此外,在所有测试案例中,成功恢复的信号对应于多普勒频移预测周期图中最显著的峰值。基于温度的光谱壳表示始终优于基于通量的壳。我们还发布了实现该框架的Python包doppleriann。我们的结果表明,将物理驱动的光谱表示与深度学习相结合,为从真实观测的径向速度数据中探测地球质量行星提供了一条有前景的途径,该建模框架既具有物理基础又具有统计严谨性,并包含了不确定性量化和优化的训练策略。

英文摘要

Detecting the tiny Doppler shifts induced by Earth-mass planets in stellar radial-velocity measurements remains extremely challenging due to stellar activity. Many deep-learning methods performing well on simulated data remain difficult to apply reliably on real stellar spectra. The aim of this work is to develop a deep-learning framework that generalizes to real, unseen spectra and improves the detectability of Earth-mass planets in radial-velocity data. We train artificial neural networks on HARPS-N solar spectra with injected planetary signals, using physics-motivated spectral representations based on flux and line-formation temperature, together with their velocity gradients. Two training strategies are explored: hold-out testing and cross-validation. Model robustness is enhanced through genetic-algorithm-based hyperparameter optimization, and predictive uncertainty is quantified using Monte Carlo dropout. Our most precise neural network model reliably retrieves, under the cross-validation strategy, the amplitudes, phases, and orbital periods of planetary signals with amplitudes greater than or equal to 25 cm/s and periods between 10 and 550 days. In addition, in all cases tested here, the successfully recovered signals correspond to the most significant peaks in the periodograms of the Doppler-shift predictions. Temperature-based spectral-shell representations consistently outperform flux-based shells. We also release doppleriann, a Python package implementing the proposed framework. Our results demonstrate that combining physically motivated spectral representations with deep learning provides a promising pathway toward the detection of Earth-mass planets in radial-velocity data from real observations, supported by a modeling framework that is both physically grounded and statistically rigorous, incorporating uncertainty quantification and optimized training strategies.

2606.18698 2026-06-18 cs.RO cs.AI cs.LG 交叉投稿

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

利用能量特征进行基于深度学习的表面分类:三个独立数据集的比较分析

Alexander Belyaev, Oleg Kushnarev

AI总结 研究评估能量特征作为表面分类的独立或辅助模态的可行性,在三个数据集上比较多种深度学习架构,发现CNN性能最优,纯能量特征准确率85-90%,与惯性特征结合可达96-99%,且能量特征可稳定提升1-2%准确率。

详情
AI中文摘要

基于能量的方法在移动机器人表面分类中仍是一个相对未被充分研究的途径,尽管在受限环境中取得了有希望的结果。本研究评估了使用能量衍生特征作为独立分类模态或作为惯性数据补充输入的可行性。在三个公开数据集上进行了全面评估,比较了现代深度学习架构(包括循环神经网络、卷积神经网络、仅编码器变压器和Mamba状态空间模型)在自动超参数调整和输入序列长度优化下的性能。模型在所有评估数据集上均实现了比先前报道值更高的准确率,其中卷积神经网络取得了最高的整体性能。当仅依赖基于能量的特征时,模型分类准确率在85-90%范围内,比与惯性特征结合时(96-99%)低约5-10%。用能量特征增强惯性数据导致平均准确率持续提高1-2%。这些发现表明,仅依赖能量特征的分类器为独立部署提供了足够的准确性,同时在与其它感知模态结合使用时也提供了一致的增益。

英文摘要

The energy-based method remains a comparatively underexamined approach for surface classification in mobile robotics, despite promising results in constrained environments. This study evaluated the viability of using energy-derived features as either a standalone classification modality or as supplementary input to inertial data. A comprehensive evaluation was conducted across three publicly available datasets, comparing the performance of modern deep learning architectures including recurrent neural networks, convolutional neural networks, encoder-only transformers, and Mamba state-space models, under automated hyperparameter tuning and input sequence length optimization. The models achieved higher accuracy than previously reported values on all evaluated datasets, with the convolutional neural network yielding the highest overall performance. When relying exclusively on energy-based features, the models attained classification accuracies in the range of 85-90%, approximately 5-10% lower than those achieved when combined with inertial features (96-99%). Augmenting inertial data with energy features resulted in a consistent mean accuracy improvement of 1-2%. These findings indicate that classifiers relying solely on energy features offer sufficient accuracy for standalone deployment, while also providing a consistent gain when used in combination with other sensing modalities.

2606.18723 2026-06-18 cs.CV cs.LG 交叉投稿

Clinically Aligned Geometry Constraints for Robust IVUS Vessel Boundary Segmentation

临床对齐的几何约束用于鲁棒的IVUS血管边界分割

Yunshu Chen, Litao Yang, Giuseppe Di Giovanni, Jordan Tan, Deval Mehta, Andrew Lin, Derek Chew, Masasi Fujino, Julie Butters, Stephen Nicholls, Zongyuan Ge, Kyung Hoon Cho

发表机构 * AIM For Health Lab, Monash University(莫纳什大学AIM健康实验室) Department of Data Science and Artificial Intelligence, Faculty of IT, Monash University(莫纳什大学信息技术学院数据科学与人工智能系) Monash University Victorian Heart Institute(莫纳什大学维多利亚心脏研究所) School of Computing Technologies, RMIT University(皇家墨尔本理工大学计算技术学院) National Cerebral and Cardiovascular Center(国立循环器病研究中心) Department of Cardiology, Chonnam National University Hospital and Medical School(全南大学医院和医学院心脏病学系)

AI总结 提出GeoCat网络,通过双编码器与可微几何一致性损失,在IVUS分割中降低边界漂移和拓扑错误,提升临床几何测量精度。

Comments MICCAI2026 Accepted

详情
AI中文摘要

血管内超声(IVUS)管腔和外弹性膜(EEM)分割对于定量评估冠状动脉斑块负荷至关重要。管腔或EEM勾画的误差会直接传播到斑块面积、斑块负荷和几何测量中。然而,优先考虑重叠分数的标准方法常常遭受边界漂移和拓扑错误,导致临床测量不准确。我们提出GeoCat,一个几何一致性网络,使用双笛卡尔-极坐标编码器,结合跨域注意力和时间融合,处理5帧IVUS片段。可微的几何一致性损失直接监督临床相关描述符,包括直径、方向和横截面积。该模型在来自146名患者的12,242张标注帧上训练,这些帧使用两种商用IVUS系统采集。我们使用分割准确性和斑块相关临床指标评估性能,包括Dice/IoU、边界测量(95HD(mm)、ASSD)、拓扑违规率和临床几何误差(dmax/dmin、角度和面积)。在我们的数据集上,GeoCat实现了0.93的Dice,将95HD降低到0.14 mm,并将拓扑违规率降低到1.0%。重要的是,它显著提高了几何保真度,产生0.13-0.16 mm的直径误差和约8度的角度误差,支持可靠的斑块负荷量化。

英文摘要

Intravascular ultrasound (IVUS) lumen and external elastic membrane (EEM) segmentation is important for quantitative coronary plaque burden assessment. Errors in lumen or EEM delineation directly propagate to plaque area, plaque burden and geometric measurements. However, standard methods prioritising overlap scores often suffer from boundary drift and topology errors, leading to inaccurate clinical measurements. We present GeoCat, a geometry-consistent network that processes 5-frame IVUS clips using dual Cartesian-polar encoders with cross-domain attention and temporal fusion. A differentiable geometry consistency loss directly supervises clinically relevant descriptors including diameters, orientations, and cross-sectional areas. The model is trained on 12,242 annotated frames from 146 patients acquired with two commercial IVUS systems. We evaluate performance using both segmentation accuracy and plaque-relevant clinical metrics, including Dice/IoU, boundary measures(95HD (mm), ASSD), topology violation rate, and clinical geometry errors (dmax/dmin, angles, and areas). On our dataset, GeoCat achieves a Dice of 0.93, reduces 95HD to 0.14 mm, and lowers topology violations to 1.0%. Importantly, it significantly improves geometric fidelity, yielding diameter errors of 0.13-0.16 mm and angular errors of ~8 degrees, supporting reliable plaque burden quantification.

2606.18734 2026-06-18 eess.SP cs.LG 交叉投稿

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

点云辅助的切线高斯溅射局部统计信道预测

Ye Xue, Yiheng Wang, Xinhua Shao, Qi Yan, Shutao Zhang, Tsung-Hui Chang

AI总结 提出点云辅助切线高斯溅射(PC-TGS)框架,通过融合稀疏无线电测量与密集LiDAR几何数据,将角功率谱外推到未测量网格,实现大规模无线数字孪生中的高效信道预测。

详情
AI中文摘要

准确、特定地点的信道信息对于优化下一代无线网络至关重要。在各种方法中,局部统计信道建模(LSCM)通过从参考信号接收功率(RSRP)测量中建模信道多径角功率谱(APS),已成为一种针对高效网络优化的最先进方法。然而,尽管其有效性,LSCM无法在绝大多数没有测量值的位置预测APS,这严重限制了其在大规模真实场景中的适用性。为了解决这一挑战,我们提出了\emph{点云辅助切线高斯溅射}(PC-TGS),这是第一个通过将稀疏无线电测量与密集的基于LiDAR的几何信息相结合,将APS\emph{外推}到未测量室外网格的框架。PC-TGS将环境散射体表示为各向异性的3D高斯分布,通过原始点云的松弛均值重新参数化进行初始化和细化。切线平面投影将每个高斯分布精确映射到局部角度域,而深度感知的电磁溅射过程聚合它们的贡献。为了确保实际部署,我们推导了用于APS bin积分的闭式高斯加权平均(GWA),并提供了可证明的误差界。在LiDAR扫描的城市规模数据集(500万个点,6310个RSRP样本)上的评估表明,与最先进的基线相比,PC-TGS在APS和RSRP预测性能上更优,并且在外推APS任务中推理时间更快。这些结果突显了PC-TGS在大规模无线数字孪生中实现几何感知和数据高效信道预测的潜力。

英文摘要

Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from the reference signal received power (RSRP) measurement, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present \emph{point-cloud-assisted tangent Gaussian splatting} (PC-TGS), the first framework to \emph{extrapolate} APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. { Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance compared to state-of-the-art baselines and faster inference time for APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.

2606.18824 2026-06-18 cs.CV cs.LG 交叉投稿

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

他们将去哪里?从自我中心视频建模多模态行人机动

Yuxuan Xie, Nicolas Pugeault, Chongfeng Wei, Hubert P. H. Shum, Edmond S. L. Ho

发表机构 * School of Computing Science, University of Glasgow(格拉斯哥大学计算机科学学院) James Watt School of Engineering, University of Glasgow(格拉斯哥大学詹姆斯·瓦特工程学院) Department of Computer Science, Durham University(杜伦大学计算机科学系)

AI总结 提出MMPM框架,通过行为感知交互模块和基于CVAE的模态感知轨迹预测器,分别建模行人过马路和不过马路两种模式,提升自我中心视角下多模态轨迹预测准确性。

Comments Accepted at The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

详情
AI中文摘要

从自我中心摄像头进行行人轨迹预测具有挑战性,因为它依赖于与车辆和场景上下文的复杂交互以及行人的意图。通过建模行人历史与未来轨迹的相关性和意图,通常会产生多模态(即多个模式)分布。现有的随机预测器通常从单一单峰分布中采样多个未来轨迹,这可能导致次优的“混合模式”轨迹,这些轨迹位于不同的运动模式之间,并在真实场景中变得不合理。在本文中,我们提出MMPM,一种模态感知框架,基于行人的过马路行为将未来轨迹分布分别建模为语义上有意义的模式。MMPM由两个模块组成:行为感知行人交互模块(PIM),通过引入注视、头部和手势来联合捕捉行人-车辆和行人-环境交互;以及基于CVAE的模态感知轨迹预测器(MTP)模块,分别对过马路和不过马路两种模式的未来轨迹分布进行建模。基于查询的解码器进一步在解码过程中强制执行模态一致性。在PIE和JAAD数据集上的实验表明,我们的方法超越了最先进的基线。我们提出的MTP是模型无关的,可以集成到现有框架如BiTrap-NP和SGNet-ED中,以进一步提高未来轨迹预测性能。我们还引入了一种数据驱动的验证协议,将预测与时空一致的真实轨迹匹配,展示了相比先前工作改进的逐帧位移误差。

英文摘要

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. By modelling correlation and intent from the historical and future trajectories of the pedestrian, it will usually result in a multimodal (i.e. multiple modes) distribution. Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head and hand gesture, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module to model the future trajectory distributions on two modes, crossing and non-crossing the road, separately. A query-based decoder further enforces mode consistency during decoding. Experiments on PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic, which can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.

2606.18876 2026-06-18 cs.CV cs.LG 交叉投稿

Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow

光学相干断层扫描中基于轨迹对齐的时间无关流的测试时自适应

Veit Hucke, Thomas Pinetz, Gregor Reiter, Ursula Schmidt-Erfurth, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria(人工智能研究所、医学数据科学中心、维也纳医学大学,奥地利) Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna, Austria(医学人工智能综合中心、维也纳医学大学,奥地利) Department of Ophthalmology and Optometry, Medical University of Vienna, Austria(眼科与视光学部、维也纳医学大学,奥地利) Laboratory for Ophthalmic Image Analysis, Medical University of Vienna, Austria(眼科图像分析实验室、维也纳医学大学,奥地利)

AI总结 提出一种基于流匹配的测试时自适应方法,通过直方图匹配和去除时间条件,生成高质量替代图像,在AMD分割中达到最优性能。

Comments Accepted in MICCAI

详情
AI中文摘要

光学相干断层扫描(OCT)在眼科中至关重要,但图像质量不一致,尤其是在低成本设备中,阻碍了自动化分析。为了解决这个问题,我们引入了一种基于流匹配的测试时自适应方法,从噪声输入生成高质量替代图像。通常,测试数据和训练数据之间的域差距会导致去噪过程中像素分布不匹配。我们通过将测试图像的直方图与合成参考轨迹匹配来克服这一问题,成功地将输入与预期分布对齐。此外,我们移除了网络的时间条件,以考虑真实世界噪声分布的轻微偏差。我们的方法在分割年龄相关性黄斑变性(AMD)两个阶段的关键生物标志物方面达到了最先进的性能。代码地址:this https URL。

英文摘要

Optical coherence tomography (OCT) is essential in ophthalmology, but inconsistent image quality especially in low-cost devices hinders automated analysis. To address this, we introduce a flow-matching-based test-time adaptation method that generates high-quality surrogate images from noisy inputs. Typically, domain gaps between test and training data cause pixel distribution mismatches during the denoising process. We overcome this by matching the test image's histogram to synthetic reference trajectories, successfully aligning the input with expected distributions. Additionally, we remove the network's time conditioning to account for slight deviations in real-world noise distributions. Our approach achieves state-of-the-art performance in segmenting critical biomarkers for two stages of Age-related Macular Degeneration (AMD). Code is available: https://github.com/Veit21/tta-flow.

2606.18932 2026-06-18 astro-ph.EP astro-ph.IM cs.AI cs.LG 交叉投稿

TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches

TransitNet: 一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架

Xingchen Yan, Jian Ge, Qingtian Liu, Kevin Willis, Quanquan Hu, Jiapeng Zhu

发表机构 * Shanghai Astronomical Observatory, Shanghai 200030, China(上海天文台,上海200030,中国) University of Chinese Academy of Sciences, Yanqi Lake Campus, East Road 1, Huairou, Beijing 101408, China(中国科学院大学,燕琦湖校区,东路1号,北京101408,中国) Science Talent Training Center, Gainesville, FL, 32606 USA(科学人才培训中心,佛罗里达州盖恩斯维尔,32606美国)

AI总结 提出紧凑型注意力增强深度学习框架TransitNet,用于低信噪比凌星盲搜索,在SNR 6-8范围内达到95.2%准确率,恢复率93.0%,远超TLS和BLS,且模型仅1.5 MB,推理速度提升12-25倍。

Comments 24 pages, 23 figures, 3 tables, submitted to MNRAS

详情
AI中文摘要

受中长周期地球大小行星观测不完整性的启发,我们提出了TransitNet,一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架。为了实现盲搜索条件下现实的方法开发和客观的阈值校准,我们开发了一个统一的数据集构建、基准测试和阈值选择框架。在由未见过的Kepler目标构建的恢复基准测试中,TransitNet在具有挑战性的信噪比6-8范围内达到了95.2%的准确率,并优于TLS和BLS,ROC-AUC和PR-AP值分别为0.974和0.982。在一次注入的地球大小和亚地球大小凌星恢复实验中,TransitNet实现了93.0%的恢复率,显著超过TLS(63.1%)和BLS(60.0%)。除了检测,TransitNet还提供了基于注意力的凌星窗口和中点估计。在一个独立评估集上,97.4%的注入凌星被估计的凌星窗口完全覆盖。应用于真实的Kepler观测,该模型成功恢复了所有34个选定的已确认Kepler行星,平均绝对凌星中点误差为1.24小时。该模型结合了约1.5 MB的紧凑体积和高推理效率,相对于CPU-TLS加速约12-25倍,相对于CPU-BLS加速约4-5倍。这些结果表明,TransitNet在测试范围内为低信噪比凌星盲搜索提供了一个准确、可扩展且计算高效的框架,并激励其扩展到更长周期的地球大小行星搜索。

英文摘要

Motivated by the observational incompleteness of intermediate-to-long-period Earth-size planets, we present TransitNet, a compact attention-augmented deep-learning framework for low-SNR transit blind searches. To enable realistic method development and objective threshold calibration under blind-search conditions, we develop a unified dataset construction, benchmarking, and threshold-selection framework. On recovery benchmarks constructed from unseen Kepler targets, TransitNet attains 95.2 percent accuracy in the challenging SNR range of 6 to 8 and outperforms both TLS and BLS, achieving ROC-AUC and PR-AP values of 0.974 and 0.982, respectively. In an injected Earth-size and sub-Earth-size transit recovery experiment, TransitNet achieves a recovery rate of 93.0 percent, substantially exceeding those of TLS (63.1 percent) and BLS (60.0 percent). In addition to detection, TransitNet provides attention-based estimates of transit windows and midpoints. On an independent evaluation set, 97.4 percent of injected transits are fully covered by the estimated transit window. Applied to real Kepler observations, the model successfully recovers all 34 selected confirmed Kepler planets, with a mean absolute transit midpoint error of 1.24 hours. The model combines a compact footprint of about 1.5 MB with high inference efficiency, yielding speed-ups of about 12 to 25 times relative to CPU-TLS and about 4 to 5 times relative to CPU-BLS. These results demonstrate that TransitNet provides an accurate, scalable, and computationally efficient framework for low-SNR transit blind searches in the tested regime and motivate its extension to longer-period Earth-size planet searches.

2606.19092 2026-06-18 stat.AP cs.LG 交叉投稿

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

使用马尔可夫决策过程对2型糖尿病护理随访间隔进行上下文感知优化

Parisa Lotfibagha, Kristen Miller, William J. Gallagher, Elizabeth B. Selden, Muge Capan

AI总结 提出上下文马尔可夫决策过程模型,利用电子健康记录数据为2型糖尿病患者优化个性化随访间隔,识别低风险和高风险亚群,相比固定间隔策略显著降低预期累积成本。

详情
AI中文摘要

慢性病管理依赖于定期的医患互动来跟踪疾病进展和控制。对于2型糖尿病,当前指南对所有患者规定固定的初级保健随访间隔,忽略了临床轨迹和患者特征的异质性。本研究引入上下文马尔可夫决策过程模型,利用来自10个初级保健诊所的22,154名2型糖尿病患者的电子健康记录数据,优化亚群特定的随访间隔决策。上下文通过以下方式识别:i) 利用主成分分析对代表个体健康轨迹的变量进行降维,以及ii) 通过主成分和额外的患者层面特征使用聚类将患者分配到上下文中。出现了两个不同的上下文,分别代表低风险和高风险亚群。CMDP导出的策略建议:(i) 如果当前就诊的实验室值未测量,则在1个月内随访;(ii) 对于实验室值升高或近期住院,最多3个月;(iii) 对于持续血糖控制,6至12个月,高风险上下文患者的随访间隔更短。最优策略实现了比基准更低的预期累积成本(例如,在高共病上下文中,相对于美国糖尿病协会类似的固定间隔随访策略,CMDP策略降低了约34.8%的成本;在低共病上下文中降低了约6.4%)。这些发现展示了上下文感知方法如何为适应性随访策略提供信息,并有可能通过综合机器学习和概率决策模型来推进初级保健中的慢性病管理。

英文摘要

Chronic disease management relies on regular patient-provider interactions to follow-up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower- and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if lab value at current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy. These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.

2606.19118 2026-06-18 cs.AI cs.LG econ.GN q-fin.EC 交叉投稿

Analysing drivers and interdependencies in European electricity markets using XAI

使用XAI分析欧洲电力市场的驱动因素与相互依赖性

Antoine Pesenti, Aidan O'Sullivan

发表机构 * UCL Energy Institute, University College London, UK(伦敦大学学院能源研究所,英国)

AI总结 结合深度神经网络与可解释人工智能(XAI)技术,利用SHAP和SSHAP框架分析39个欧洲竞价区的电价决定因素,发现可再生能源(尤其是太阳能)对电价形成具有重要作用,天然气价格仍是主导驱动因素,且互联互通显著影响价格动态。

Comments 12 pages

详情
AI中文摘要

电力市场本质上是复杂系统,具有强非线性、高维交互以及跨区域日益增长的相互依赖性。虽然深度神经网络(DNN)在电价预测方面表现出强大的能力,但其缺乏可解释性限制了其在理解电价形成潜在驱动因素方面的实用性。本文通过将DNN模型与可解释人工智能(XAI)技术相结合,分析了39个欧洲竞价区电价的决定因素,填补了这一空白。我们采用SHAP(SHapley Additive exPlanations)量化特征贡献,并应用和扩展了SSHAP(一种聚合框架)以提高高维设置下的可解释性。分析表明,可再生能源(尤其是太阳能)在电价形成中发挥着不成比例的重要作用,尽管其在总发电量中占比较低。天然气价格仍然是跨电力市场的主导且一致的驱动因素,而互联互通显著影响价格动态,凸显了欧洲电力系统的强相互依赖性。此外,我们构建了一个合成性的全欧盟电力市场,以探索完全一体化单一价格市场的反事实情景。

英文摘要

Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions and apply and extend SSHAP, an aggregation framework to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.

2606.19149 2026-06-18 cs.CR cs.LG 交叉投稿

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

OpenAnt:通过代码分解、对抗性验证和动态测试实现LLM驱动的漏洞发现

Nahum Korda, Gadi Evron

AI总结 提出OpenAnt系统,结合静态分析与LLM推理,通过代码分解、对抗性验证和动态测试三阶段流水线,在降低误报率的同时发现未知漏洞。

详情
AI中文摘要

在大型代码库中自动发现漏洞仍然具有挑战性:传统静态分析误报率高,而模糊测试等动态方法需要大量基础设施且通常针对狭窄的漏洞类别。大型语言模型(LLM)的最新进展使得对程序行为进行语义推理成为可能,但将LLM应用于仓库级安全分析会引入上下文管理、成本和验证方面的挑战。我们提出了OpenAnt,一个开源漏洞发现系统,它在多阶段流水线中集成了静态程序分析与基于LLM的推理。OpenAnt引入了三种关键技术。首先,代码库被分解为自包含的分析单元,并通过从外部入口点的可达性进行过滤,将分析面减少高达97%,同时保留与攻击相关的代码。其次,候选漏洞通过受限攻击者模拟进行对抗性验证,其中模型在现实攻击者能力下评估可利用性。第三,通过动态验证确认发现结果,其中自动生成利用环境,在沙箱容器中执行,并在使用后丢弃。在包括OpenSSL、WordPress和Flowise在内的广泛使用的开源项目上的评估表明,这种架构可以识别先前未知的漏洞,同时保持可管理的分析成本并大幅减少误报。我们的结果表明,结合语义推理与利用验证的闭环漏洞发现流水线,为可扩展的自动化安全分析提供了一条实用路径。OpenAnt已在Apache 2.0许可下开源,网址为https://this https URL。

英文摘要

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.

2606.19186 2026-06-18 cs.RO cs.LG 交叉投稿

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

学习标注延迟和误报AEB事件:针对极端类别不平衡和非对称标签噪声的实用系统

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

发表机构 * Li Auto(理想汽车)

AI总结 提出首个自动化AEB标注框架,通过特定数据增强和噪声抑制技术,解决极端类别不平衡和非对称标签噪声问题,将延迟/误报触发召回率提升80%,人工工作量减少50%。

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

详情
Journal ref
2026 IEEE International Conference on Robotics and Automation (ICRA)
AI中文摘要

自主紧急制动(AEB)优化依赖于准确标注的真实世界触发事件,特别是揭示系统缺陷的罕见但关键的延迟和误报AEB触发事件。然而,这些少数样本在每天数千次触发事件中占比不到5%,使得大规模人工标注成本过高。我们提出了首个自动化AEB标注框架来解决这一问题。在开发过程中,我们识别出两个严重损害延迟/误报触发标注准确性的基本挑战:(1)极端类别不平衡,其中延迟/误报触发被真实触发淹没;(2)非对称标签噪声,其中误标注的多数样本(真实触发)抑制了少数样本(延迟/误报触发)的学习。为克服这些挑战,我们提出两项关键创新:(1)特定数据增强,通过操纵焦点目标属性、移植自车动态和掩蔽非焦点代理来合成逼真样本;(2)噪声抑制,使用稳定硬度估计和探针引导的自适应阈值来清理误标注的真实触发样本。关键的是,我们将模型部署为具有全栈架构的实用标注系统,从每天数千个AEB事件中高效识别关键的延迟/误报触发。生产结果表明,延迟/误报触发的召回率提高了80%,人工工作量减少了50%。除了直接收益,该系统通过积累高质量标注实现持续自我改进,为车载AEB系统优化奠定了必要的数据基础。

英文摘要

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) Extreme class imbalance where delayed/false triggers are overwhelmed by true triggers; (2) Asymmetric label noise where mislabeled majority samples (true triggers) suppress minority samples (delayed/false triggers) learning. To overcome these challenges, we propose two key innovations: (1) Specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate 80% improvement in recall of delayed/false triggers and 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization

2606.19251 2026-06-18 physics.comp-ph cs.LG physics.flu-dyn 交叉投稿

Acceleration of an algebraic multigrid pressure solver using graph neural networks

使用图神经网络加速代数多重网格压力求解器

Eric Chillón, Artur K. Lidtke, Nguyen Anh Khoa Doan, Bernat Font

发表机构 * Faculty of Mechanical Engineering, Delft University of Technology, The Netherlands(荷兰代尔夫特理工大学机械工程学院) Maritime Research Institute Netherlands, The Netherlands(荷兰海事研究院) Department of Aeronautics, Imperial College London, United Kingdom(英国伦敦帝国理工学院航空系)

AI总结 提出一种基于图卷积同构网络的代数多重网格平滑器,通过预测最优多项式系数构造稀疏伪逆算子,减少V-cycle迭代次数,在非结构化网格上实现4%-37%的加速,并泛化至训练时未见的大规模网格。

Comments 23 pages, 11 figures

详情
AI中文摘要

求解压力-泊松方程仍然是非结构化不可压缩流求解器的主要计算瓶颈,这主要是由于传统线性求解器对网格不规则性固有的敏感性。本文引入了一种数据驱动的代数多重网格(AMG)平滑器,该平滑器使用改进的图卷积同构网络(GCIN)。图神经网络预测最优多项式系数,以在不同网格拓扑上构造稀疏伪逆算子。优化系数以减少每次V-cycle迭代后的残差。通过直接从稀疏系数矩阵捕获系统的代数结构,所提出的方法在适应非结构化网格中的局部各向异性的同时,保持了求解器的线性性。我们的框架通过减少达到给定容差所需的V-cycle次数,并在不同基准测试中实现4%到37%的墙钟加速,展示了显著的性能提升。值得注意的是,该模型在比训练时所见大128倍的网格上保持效率,并在未见过的工业相关问题上(如AirfRANS数据集)加速求解器收敛,表现出鲁棒的泛化能力。

英文摘要

Solving the pressure-Poisson equation remains the primary computational bottleneck in incompressible unstructured flow solvers primarily due to the inherent sensitivity of traditional linear solvers to mesh irregularities. This work introduces a data-driven algebraic multigrid (AMG) smoother that uses a modified graph convolutional isomorphism network (GCIN). The graph neural network predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator across diverse grid topologies. The coefficients are optimized to reduce the residual after each V-cycle iteration. By directly capturing the algebraic structure of the system from the sparse coefficient matrix, the proposed method maintains the solver's linearity while adapting to local anisotropies in unstructured grids. Our framework demonstrates significant performance gains by reducing the number of V-cycles required for a given tolerance and delivering wall-clock speedups from 4% to 37% across diverse benchmarks. Notably, the model exhibits robust generalization by maintaining efficiency on meshes up to 128 times larger than those seen in training, and by accelerating the solver's convergence on unseen industry-relevant problems such as the AirfRANS dataset.

2606.19253 2026-06-18 cs.CV cs.AI cs.LG cs.RO 交叉投稿

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas: 通过全景重投影实现3D场景理解

Bartłomiej Baranowski, Dave Zhenyu Chen, Matthias Nießner

发表机构 * Technical University of Munich(慕尼黑工业大学) Huawei(华为)

AI总结 提出OneCanvas方法,将多视图补丁特征聚合到全景画布上,利用深度和相机位姿进行重投影,无需复杂几何编码器或大量训练,在SQA3D等基准上达到最先进精度。

Comments Project page: https://baranowskibrt.github.io/onecanvas/

详情
AI中文摘要

现有的视觉语言模型(VLM)中的3D场景理解方法要么依赖复杂的、模型特定的几何编码器,要么为了追求空间推理而需要大量的训练预算。相反,OneCanvas将所有视图的补丁特征聚合到一个单一的等距柱状全景画布上。具体来说,每个补丁利用其深度和相机位姿被反投影到3D世界坐标,然后根据从画布原点看到的该点的连续经度和纬度放置在画布上,无需对重叠视图进行光栅化或聚合。补丁的度量坐标的3D位置嵌入被添加到其特征中,从而恢复了将世界位置压缩到角度画布坐标时丢失的深度。因此,来自所有帧的补丁共享一个空间坐标系,无需融合或对主干网络进行重大架构修改。预训练的VLM将此表示视为普通图像。由于画布可以以任何感兴趣的姿态为中心,相同的表示直接支持从特定视角进行情境推理,这是机器人和具身AI中的常见需求。得益于这种表示,我们还可以引入空间预训练课程:通过程序化地将从真实图像中提取的对象的补丁特征放置在原本空白的画布上的选定3D世界位置,我们生成了涵盖广泛空间推理任务的即时监督,并控制答案分布以减少空间推理捷径。OneCanvas在SQA3D和VSI-Bench上达到了最先进的准确率,并在SPBench上泛化到分布外数据,其训练计算量比最强竞争方法少一个数量级。

英文摘要

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch features from all views onto a single equirectangular panoramic canvas. Namely, each patch is unprojected to a 3D world coordinate using its depth and camera pose, then placed on the canvas at the continuous longitude and latitude of that point as seen from the canvas origin, with no rasterization or aggregation across overlapping views. A 3D position embedding of the patch's metric coordinates is added to its feature, restoring the depth lost when collapsing the world position to an angular canvas coordinate. Patches from all frames thus share one spatial coordinate system with no fusion or major architectural modifications of the backbone. The pretrained VLM consumes this representation as if it were an ordinary image. Because the canvas can be centered on any pose of interest, the same representation directly supports situated reasoning from a specific viewpoint, a common requirement in robotics and embodied AI. Thanks to this representation, we can also introduce a spatial pretraining curriculum: by procedurally placing patch features of objects, drawn from real images, at chosen 3D world positions on an otherwise empty canvas, we generate on-the-fly supervision spanning a broad range of spatial reasoning tasks, with answer distributions controlled to reduce spatial reasoning shortcuts. OneCanvas achieves state-of-the-art accuracy on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using an order of magnitude less training compute than the strongest competing methods.

2606.19302 2026-06-18 physics.ao-ph cs.LG 交叉投稿

Optimal scenario design for climate emulation

气候模拟的最优情景设计

Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

发表机构 * Department of Aeronautics and Astronautics, Massachusetts Institute of Technology(航空与航天系,麻省理工学院) Center for Sustainability Science and Strategy, Massachusetts Institute of Technology(可持续科学与战略中心,麻省理工学院) Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology(地球、大气与行星科学系,麻省理工学院) Brahmal Vasudevan Institute for Sustainable Aviation, Department of Aeronautics, Imperial College London(可持续航空研究所,帝国理工学院伦敦校区) Institute for Data, Systems, and Society, Massachusetts Institute of Technology(数据、系统与社会研究所,麻省理工学院)

AI总结 针对气候模拟器泛化能力受限的问题,提出通过可微简单气候模型优化训练数据情景,使小数据集训练的模拟器性能优于标准情景集。

详情
AI中文摘要

随着深度学习在物理系统中的普及,改进泛化性的努力主要集中在设计嵌入物理约束的架构上。然而,对于机器学习替代气候模型(模拟器),我们表明现有情景中用于生成训练数据的低结构多样性限制了预测能力。在此,我们研究是否可以优化训练数据集本身以提高泛化性。我们引入一种方法创建数据集,使模拟器能够泛化到训练数据中未出现的新结构情景。我们使用可微简单气候模型(SCM)计算模拟器损失对训练数据扰动的敏感性,迭代更新训练数据以最大化模拟器技能。对于SCM,以这种方式优化的一个情景训练出的模拟器优于在六个标准ScenarioMIP路径上训练的模拟器。尽管训练数据集更小,但我们实现了更高的预测技能,发现我们的模拟器成功隔离了不同气候强迫因子(如温室气体与气溶胶)的独特物理行为,而无需单强迫运行。然后我们证明,使用SCM优化的情景驱动中等复杂度气候模型时,产生的训练数据集比在ScenarioMIP输出上训练得到更熟练的模拟器。我们的结果表明,在运行全尺度气候模型的计算受限环境中,生成少量动态丰富的情景比扩展传统排放路径集对模拟和表征系统响应具有更大的边际价值。

英文摘要

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.

2606.19329 2026-06-18 astro-ph.IM cs.LG 交叉投稿

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

钱德拉-盖亚对应体星表:利用机器学习解决钱德拉源星表中X射线源与盖亚源的多重匹配歧义

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo

发表机构 * Center for Astrophysics Harvard \& Smithsonian, 60 Garden St, Cambridge MA 02138, USA Harvard John A. Paulson School of Engineering Universidad del Rosario, School of Engineering, Science The NSF AI Institute for Artificial Intelligence New York University, Courant Institute, 60 5th Avenue, New York NY, USA Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 New College of Florida, 5800 Bayshore Road, Sarasota, FL 34243, USA Astrophysics Laboratory, 3251 Hanover St, Palo Alto, CA 94304, USA

AI总结 提出结合源属性(星等、颜色、距离)的机器学习框架,解决钱德拉源星表与盖亚源星表的交叉匹配歧义,为约11.3万个X射线源找到对应体,并识别约2万个假匹配。

Comments Accepted to The Astrophysical Journal. Website: https://www.samuelperezdi.com/chandragaia/

详情
AI中文摘要

我们提出了一个框架,用于将钱德拉源星表(CSC v2.1)中的源与盖亚数据发布3中的光学源进行交叉匹配。与纯空间方法不同,我们使用源属性(如星等、颜色和距离)来识别真实对应体、检测偶然重合,并在存在多个合理候选者时解决歧义。我们使用NWAY(一种考虑位置误差和源密度的贝叶斯交叉匹配框架)定义高置信度匹配的训练集。我们在两个星表的多种特征上训练梯度提升分类器(LightGBM)。在约25.4万个独特X射线源中,我们为约11.3万个源找到了对应体,其中约7000个源存在多个合理对应体。对于约2万个基于分离的交叉匹配能找到匹配的源,我们未找到对应体,并将其中的一半归因于偶然重合。我们在钱德拉猎户座超深项目(COUP)上验证了该流程,机器学习匹配在不使用任何位置信息的情况下再现了NWAY交叉匹配的95%。我们发布了约11.3万个钱德拉-盖亚对应体的星表,以及约7000个替代匹配和约2万个歧义NWAY关联,以支持未来对钱德拉和盖亚均可探测到的源进行种群研究。我们讨论了局限性,并提供了该框架的泛化版本,适用于其他交叉匹配场景。

英文摘要

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~$254$k unique X-ray sources, we find counterparts for ~$113$k sources, of which plausible multiple counterparts are found for ~$7$k. We find no counterparts for ~$20$k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~$113$k Chandra-Gaia counterparts, together with ~$7$k alternative matches and ~$20$k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.

12. 其他/综合机器学习 2 篇

2606.18535 2026-06-18 stat.ME cs.LG math.ST stat.TH 交叉投稿

Shrinkage priors for Bayesian Substitute Confounders

贝叶斯替代混杂因子的收缩先验

Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh

发表机构 * School of Mathematical Sciences, University of Nottingham, Nottingham, UK(诺丁汉大学数学科学学院) Department of Statistics, Rice University, USA(里士满大学统计学系;伯克利国家实验室) Lawrence Berkeley National Laboratory, USA(洛斯阿拉莫斯国家实验室统计科学组) Statistical Sciences Group, Los Alamos National Laboratory, USA

AI总结 针对多原因观察研究中替代混杂因子过度编码问题,提出贝叶斯因子分配框架,利用收缩先验学习稀疏替代混杂因子,保持粗粒度多原因依赖,并证明后验集中性和重叠保持几何性质,实现潜在结果的一致性估计。

详情
AI中文摘要

多原因观察研究通过原因间的依赖结构包含关于未测量混杂的信息。然而,对未观测混杂的直接插补通常比学习一个低维替代得分更复杂,该得分保留了稳定因果调整所需的共享分配变异。去混杂因子(Wang and Blei, 2019)及相关替代混杂因子方法利用了这一思想,但灵活的分配模型可以拟合原因的联合分布,同时产生过度编码处理向量、破坏重叠或捕获单原因变异的得分。我们开发了一个贝叶斯因子分配框架,用于学习稀疏替代混杂因子,该框架通过收缩先验保留粗粒度的多原因依赖。该理论在后验集中性、因子得分收缩和保留重叠的分配几何层面进行阐述,因此不依赖于特定的收缩先验。在这些条件下,当相应的潜变量识别假设成立时,所提出的回归调整估计量对平均潜在结果是一致的。收缩先验为潜在结构学习提供了自然工具:它们倾向于由多个原因支持的低维因子,阻止有效的单原因因子,并通过渐进收缩诱导潜在因子的排序。合成实验说明了信号强度、结果有效性和几何感知正则化的作用。在阿尔茨海默病神经影像学倡议(ADNI)基线分析中,稀疏替代得分恢复了对侵入性脑脊液生物标志物直接条件调整的大部分效果,而重叠崩溃诊断则识别出拟合因子何时简化为单个观测测量。

英文摘要

Multi-cause observational studies contain information about unmeasured confounding through the dependence structure among causes. However, literal imputation of the unobserved confounder is often more complex than learning a lower-dimensional substitute score that preserves the shared assignment variation needed for stable causal adjustment. The deconfounder (Wang and Blei, 2019) and related substitute confounder methods exploit this idea, but flexible assignment models can fit the joint distribution of the causes while producing scores that over-encode the treatment vector, collapse overlap, or capture single-cause variation. We develop a Bayesian factor assignment framework for learning sparse substitute confounders that retain coarse multi-cause dependence with shrinkage priors. The theory is stated at the level of posterior concentration, factor score contraction, and overlap-preserving assignment geometry and therefore does not rely on a particular shrinkage prior. Under these conditions, the proposed regression-adjusted estimators are consistent for mean potential outcomes when the corresponding latent variable identification assumptions hold. Shrinkage priors provide a natural tool for latent structural learning: they favour low-dimensional factors supported by multiple causes, discourage effectively single-cause factors, and induce an ordering of the latent factors through progressive shrinkage. Synthetic experiments illustrate the roles of signal strength, outcome validity, and geometry-aware regularization. In an Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline analysis, sparse substitute scores recover much of the adjustment obtained by directly conditioning on invasive cerebrospinal-fluid biomarkers, while collapse diagnostics identify when fitted factors reduce to individual observed measurements.

2606.19270 2026-06-18 eess.IV cs.LG physics.med-ph 交叉投稿

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

超越算法:医学影像人工智能中的概念创新

Mark A. Anastasio

发表机构 * Mallinckrodt Institute of Radiology and Department of Electrical & Systems Engineering, Washington University in St. Louis(马林克罗德特放射医学研究所和电气与系统工程系,华盛顿大学圣路易斯分校)

AI总结 本文区分算法创新与概念创新,指出当前激励结构过度奖励算法新颖性而忽视概念贡献,通过医学影像AI案例展示概念不足导致的错位目标与有限临床影响,并提出促进概念创新的建议。

详情
AI中文摘要

人工智能推动了医学影像研究的快速发展,产生了日益复杂的算法,并在基准任务上稳步改进。然而,这种以算法为中心的发展轨迹也揭示了一个日益加剧的不平衡:虽然计算方法快速进步,但定义成像任务、评估指标和临床意义的概念基础有时仍未得到充分审视。在这篇观点文章中,我们区分了算法创新(专注于在固定问题定义内改进计算实现和性能)与概念创新(重新定义提出的问题、衡量成功的方式以及方法在临床上的相关性)。我们认为,当前的激励结构、培训路径和发表规范不成比例地奖励算法新颖性,尤其是对早期职业研究者而言,而有时低估了对科学成熟和临床转化至关重要的概念贡献。通过医学影像AI的代表性例子,我们展示了概念基础不足如何导致目标错位、泛化脆弱以及现实世界影响有限。最后,我们为研究者、导师、审稿人和期刊提出了可操作的建议,以更好地识别、支持和整合概念创新与算法进步。

英文摘要

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.