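arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.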

2606.18275 2026-06-18 cs.ET cond-mat.mtrl-sci cs.LG cross-listed

A physical adaptive material motor unit neural network: a hygromorph composite material machine

Charles de Kergariou, David Correa, Adam W. Perriman, Helmut Hauser, Fabrizio Scarpa

Affiliations: Bristol Composites Institute, School of Civil, Aerospace and Mechanical Engineering, University of Bristol; School of Architecture, University of Waterloo; Research School of Chemistry and John Curtin School of Medical Research, Australian National University; School of Cellular and Molecular Medicine, University of Bristol; School of Engineering Mathematics and Technology, University of Bristol; Bristol Robotics Lab, Bristol, United Kingdom

AI summary: Proposes a physical adaptive motor unit neural network built from wood- and carbon black-based composites, trained with data-aware backpropagation, that performs dynamic shading control and learns incrementally as the database expands.

Comments 35 pages, 16 figures

Abstract

Advances in novel materials science enable structures to function as intelligent machines by embedding memory and learning capabilities directly into materials. Our work introduces a physical adaptive material motor unit neural network, leveraging a new generation of controllable actuators composed of wood- and carbon black-based composites, sensitive to temperature and relative humidity. These material actuators are assembled into a motor unit-like structure inspired by the muscle contraction trigger, forming an intelligent machine capable of dynamic shading control that can be used, for example, in buildings. The machine is governed by a neural network trained on over 350 experimental data points collected under diverse environmental conditions. By establishing a new data-aware backpropagation training, we show that the machine predicts shading responses and learns to predict appropriate behaviour incrementally as the database expands. We also demonstrate the ability of the machine to optimise configurations to achieve similar shading outputs under two distinct conditions.


2606.18305 2026-06-18 math.NA cs.LG cs.NA cross-listed

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

Kuilin Qin, Lianfang Wang, Xu Sun, Jiwei Jia, Yu Wang, Yong Wang, Yuping Duan

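Affiliations: School of Mathematical Sciences, Beijing Normal University; School of Mathematics, Jilin University; Key Laboratory of Digital Technology in Medical Diagnostics of Zhejiang; School of Physics, Nankai University

AI summary: Proposes the Starter-Iterator Neural Operator (SINO), which reinterprets the initialization and iteration schemes of traditional iterative methods through neural networks to achieve spectral-spatiotemporal collaborative modeling, improving numerical accuracy and generalization on forward and inverse problems such as the Navier-Stokes and acoustic wave equations.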

Abstract

Operator learning is an emerging interdisciplinary field that integrates machine learning with scientific computing. By mapping infinite-dimensional function spaces, this approach provides an efficient surrogate modeling framework for high-dimensional partial differential equations (PDEs). Compared to traditional numerical solvers, it achieves a superior trade-off between computational complexity and approximation accuracy, demonstrating significant advantages in many-query tasks such as real-time prediction and parameter sweeps. Given the stringent accuracy requirements of both forward simulation and inverse inference, as well as the precision bottlenecks of existing operator learning methods in handling complex boundaries or long-term evolution, we propose the Starter-Iterator Neural Operator (SINO). Our framework reinterprets the initialization strategies and iterative formats of traditional iterative methods through neural networks, establishing an efficient approach for spectral-spatiotemporal collaborative modeling. Specifically, the frequency-domain initialization module captures globally stable low-frequency features, while the time-domain learning module focuses on optimizing local solution residuals, thereby effectively overcoming the inherent limitations of conventional single-domain modeling approaches. Extensive experiments on typical dynamical systems such as the Navier-Stokes equations and acoustic wave equations, as well as practical applications including super-resolution imaging and weather forecasting, demonstrate that SINO achieves outstanding performance in numerical accuracy, generalization capability, and robustness.


2606.18611 2026-06-18 cs.SD cs.AI cs.LG stat.ML cross-listed

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

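Affiliations: The Asahi Shimbun Company; Tokyo Woman's Christian University

AI summary: Proposes the parameter-efficient QC-GAN, which combines a quaternion Conformer generator with MetricGAN training and reduces the parameter count by sharing weights through the Hamilton product; on VoiceBank+DEMAND it reaches PESQ 3.48 with 0.89M parameters, on par with models twice its size.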

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech2026
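
The Hamilton product mentioned in the summary is the standard quaternion multiplication rule. The sketch below shows only that rule in plain Python; it is not the QC-GAN generator, and the name `hamilton_product` is a hypothetical helper chosen for illustration. The usual intuition for the parameter saving is that one quaternion weight (4 real numbers) acts on a 4-component input, where an unconstrained real linear map on the same input would need 16.

```python
from typing import Tuple

# A quaternion w + x*i + y*j + z*k stored as (w, x, y, z).
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Multiply two quaternions with the Hamilton product (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )
```

Because the same four weight components are reused in every output component, the rule itself is what ties the weights together.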

2606.18759 2026-06-18 cs.CG cs.LG cs.NA math.NA cross-listed

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

Sheng-Gwo Chen, Chen-Chang Peng

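Affiliations: Department of Applied Mathematics, National Chiayi University, Chia-Yi 600, Taiwan

AI summary: Proposes a framework based on physics-informed neural networks (PINNs) that efficiently computes geodesic-like curves on parametric surfaces, with support for multi-surface systems and surfaces of revolution.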

Comments 22 pages, 16 figures, 8 tables

2606.18837 2026-06-18 cs.MA cs.AI cs.LG cross-listed

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Hehai Lin, Qi Yang, Chengwei Qin

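Affiliations: Ant Group; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: Proposes Skill-MAS, which decouples high-level orchestration capability into an evolvable Meta-Skill so that experience is retained without parameter updates; the Meta-Skill is refined through multi-trajectory rollout and selective reflection, yielding significant performance gains at controlled cost across multiple benchmarks and LLMs.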

Abstract

Large Language Model (LLM)-based automatic Multi-Agent Systems (MAS) generation has become a crucial frontier for tackling complex tasks. However, existing methods face a dilemma between model capability and experience retention. Inference-time MAS leverages frozen frontier LLMs but repeats identical searches without learning from past experience. Conversely, Training-time MAS internalizes experience via gradient updates but is constrained by the low capability ceiling of smaller models, and is hard to scale to large frontier LLMs. To bridge this gap, we propose Skill-MAS, a novel third path that decouples experience retention from parametric updates by conceptualizing the high-level orchestration capability as an evolvable Meta-Skill. Skill-MAS refines this architectural knowledge through a closed optimization loop: (1) Multi-Trajectory Rollout samples a behavioral distribution for each task under the current Meta-Skill; and (2) Selective Reflection adaptively selects priority tasks and applies hierarchical contrastive analysis to distill systemic experience into generalizable, strategy-level principles. Extensive experiments across four complex benchmarks and four distinct LLMs demonstrate that Skill-MAS not only achieves remarkable performance gains but also maintains a favorable cost-performance trade-off. Further analysis reveals that the evolved Meta-Skills are highly robust and exhibit strong transferability across unseen tasks and different LLMs.


2606.18853 2026-06-18 stat.ML cs.LG cross-listed

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Nicolas Mahler

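AI summary: Proposes the KPP kernel, which indexes forest nodes through a path metric and unifies prediction, exact additive attribution, a deterministic Lipschitz robust radius, and Rademacher risk bounds, giving tree ensembles a geometric framework.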

Comments 31 pages

Abstract

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.


2606.19039 2026-06-18 cs.NE cs.LG cs.SD cross-listed

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

Taharim Rahman Anon, Jakaria Islam Emon

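Affiliations: PI LLC

AI summary: Proposes a learnable residual speech-to-spike encoder trained jointly with an R-LIF backbone; it reaches 94.97% accuracy on GSC-v2, is parameter-efficient, and learns task-aligned spike representations.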

Comments Accepted at Interspeech 2026. This version is a preprint

Abstract

The mismatch between continuous acoustic signals and discrete event-driven processing remains a fundamental bottleneck for neuromorphic speech processing. Current systems typically rely on fixed spike encoders, forcing downstream Spiking Neural Networks (SNNs) to compensate for non-adaptive input representations. To address this, we present a learnable residual speech-to-spike encoder jointly trained end-to-end with a Recurrent Leaky Integrate-and-Fire (R-LIF) backbone. We validate this approach on the Google Speech Commands v2 (GSC-v2) benchmark, achieving up to 94.97% accuracy. Notably, the learned encoder remains highly parameter-efficient with a compact 35k-parameter variant that reaches 89.8%, matching or exceeding prior baselines that require an order of magnitude more parameters. Our encoder-focused analysis, including linear probing and gradient-residual inspection, indicates that the encoder does not target faithful signal reconstruction but instead learns task-aligned spike representations that enhance class separability. Finally, we benchmark bio-inspired, hardware-friendly credit assignment by comparing Direct Feedback Alignment (DFA) with surrogate-gradient BPTT under identical architectures and training conditions. We find that DFA reaches 91.5% accuracy, quantifying the performance trade-off of bio-inspired learning rules for modern neuromorphic audio.
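
For readers unfamiliar with the leaky integrate-and-fire (LIF) dynamics that the abstract builds on, here is a minimal sketch of a single non-recurrent LIF unit turning a continuous input sequence into spikes. It is a textbook discretization, not the paper's learnable residual encoder or its R-LIF backbone; the function name `lif_spikes`, the decay `beta`, and the soft reset are assumptions made for illustration.

```python
from typing import List


def lif_spikes(currents: List[float], beta: float = 0.9,
               threshold: float = 1.0) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron over an input sequence.

    beta is the per-step membrane decay; a spike is emitted when the
    membrane potential reaches the threshold, followed by a soft reset.
    """
    potential = 0.0
    spikes = []
    for current in currents:
        # Leak the previous potential, then integrate the new input.
        potential = beta * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential -= threshold  # soft reset: subtract the threshold
        else:
            spikes.append(0)
    return spikes
```

With fixed `beta` and `threshold`, this behaves like the non-adaptive encoders the abstract contrasts against; a learnable encoder would instead train such parameters (and more) jointly with the downstream network.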


2606.19101 2026-06-18 eess.SP cs.LG cross-listed

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

Augusto Sarti

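AI summary: Proposes explicit dynamical units based on wave-inspired interaction structures, whose modeling capability comes from structured organization rather than expressive nonlinearity; in nonlinear system identification, depth improves representation quality and generalization.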

Comments 11 pages, 2 figures, 2 tables

Abstract

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.


2606.19168 2026-06-18 cs.AI cs.LG cross-listed

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

Jinhan Li, Kexian Tang, Yihan Xu, Zhuorui Ye, Kaifeng Lyu

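Affiliations: Institute for Interdisciplinary Information Sciences, Tsinghua University

AI summary: Proposes Safety Reflection Pretraining, which inserts safety reflections into the pretraining corpus to give models a self-monitoring capability; experiments show that it effectively lowers the success rates of inference-stage and finetuning attacks.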

Abstract

To achieve deeper safety alignment for large language models (LLMs), recent efforts have studied how to push safety interventions earlier into the pretraining stage, primarily by filtering unsafe data or rewriting it into safer forms. We argue that pretraining-stage alignment should go beyond making the data safe: LLMs may compose seemingly benign knowledge and capabilities into unsafe behaviors. To this end, we propose Safety Reflection Pretraining, a pretraining-stage alignment method which regularly inserts short safety reflections into pretraining corpora to integrate self-monitoring directly into language modeling, establishing a foundational capability that is subsequently reinforced by compatible post-training. Our experiments with 1.7B models pretrained on FineWeb-Edu show that Safety Reflection Pretraining improves safety classification accuracy and substantially reduces the success rates of inference-stage and finetuning attacks. Complementary to our real-world experiments, we also introduce a fully controlled synthetic environment, MedSafetyWorld, with a clear definition of safety and a reasoning structure under which models can easily generalize unsafe behaviors from safe data. Ablations in MedSafetyWorld further demonstrate a clear advantage of Safety Reflection Pretraining in preventing models from acting on unsafe behaviors generalized from safe data, compared with data filtering and rewriting. Taken together, our findings suggest that pretraining alignment should not only make the training data safe, but also shape the behaviors that models are likely to acquire from safe data.


2606.19279 2026-06-18 cs.AI cs.LG cs.LO math.CT math.LO math.PR cross-listed

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

Daniel Romero Schellhorn, Till Mossakowski, Björn Gehrke

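Affiliations: University of Osnabrück

AI summary: Proposes the NeSyCat Torch framework, which unifies neurosymbolic semantics through a strong monad and an aggregation structure on truth values and uses a lazy log-tensor monad for differentiable training; on MNIST addition it outperforms LTN and DeepProbLog.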

Abstract

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. NeSyCat has so far lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link and interpret computational symbols via neural networks, implementing the framework in probabilistic programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it by a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we furthermore employ a batch monad. The axioms are the source code: written once in monad-based do-notation, monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while achieving nearly the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay in a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation here is left for future work).
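
To make the log-semiring and the marginalisation step in the abstract more concrete, the sketch below works through the MNIST-addition idea with plain lists: given log-probabilities over two digits, it marginalises over all digit pairs to obtain log-probabilities over their sum. In the log-semiring, "addition" is log-sum-exp and "multiplication" is ordinary addition. This is an illustrative stand-in, not the NeSyCat Torch API; `log_add`, `log_mul`, and `log_sum_distribution` are hypothetical names, and nothing here is lazy or batched.

```python
import math
from typing import List

NEG_INF = float("-inf")  # log(0), the additive identity of the log-semiring


def log_add(a: float, b: float) -> float:
    """Semiring addition: log(exp(a) + exp(b)), computed stably."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_mul(a: float, b: float) -> float:
    """Semiring multiplication: log(exp(a) * exp(b))."""
    return a + b


def log_sum_distribution(log_p_a: List[float],
                         log_p_b: List[float]) -> List[float]:
    """Log-probabilities of a + b for independent a, b (index = value)."""
    out = [NEG_INF] * (len(log_p_a) + len(log_p_b) - 1)
    for i, la in enumerate(log_p_a):
        for j, lb in enumerate(log_p_b):
            # Marginalise over every pair (i, j) that yields the sum i + j.
            out[i + j] = log_add(out[i + j], log_mul(la, lb))
    return out
```

Working in log space avoids underflow when many small probabilities are multiplied, which is the usual motivation for training against log-likelihoods.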


2606.18520 2026-06-18 stat.ML cs.CG cs.CL cs.DS cs.IR cs.LG cross-listed

Compact Geometric Representations of Hierarchies

Prashant Gokhale, Piotr Indyk, Yuhao Liu, Sandeep Silwal, Tony Chang Wang, Haike Xu

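Affiliations: UW-Madison; MIT

AI summary: Studies how to represent ancestor-descendant relationships in directed acyclic graphs with low-dimensional geometric embeddings, gives upper and lower bounds on the dimension in terms of structural parameters such as treewidth, and verifies compactness on real-world datasets.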

Comments Published at the 39th Annual Conference on Learning Theory (COLT) 2026. 22 Pages

Abstract

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $\Omega(n)$ is necessary for general DAGs and $\Omega(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.
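
As intuition for why trees admit constant-size reachability labels, the sketch below uses the classical depth-first interval labeling: each node gets an (entry, exit) pair, and one node is an ancestor of another exactly when its interval contains the other's. This is a well-known comparison-based scheme, not the dimension-3 embedding constructed in the paper, whose details are not given in the abstract; `interval_labels` and `is_ancestor` are hypothetical names.

```python
from typing import Dict, List, Tuple


def interval_labels(children: Dict[str, List[str]],
                    root: str) -> Dict[str, Tuple[int, int]]:
    """Assign each tree node a half-open DFS interval [entry, exit)."""
    labels: Dict[str, Tuple[int, int]] = {}
    entry: Dict[str, int] = {}
    clock = 0
    stack = [(root, False)]  # (node, whether its subtree is finished)
    while stack:
        node, finished = stack.pop()
        if finished:
            labels[node] = (entry[node], clock)
            continue
        entry[node] = clock
        clock += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(labels: Dict[str, Tuple[int, int]], u: str, v: str) -> bool:
    """True if u is an ancestor of v (every node is its own ancestor)."""
    (u_in, u_out), (v_in, v_out) = labels[u], labels[v]
    return u_in <= v_in and v_out <= u_out
```

Two integers per node suffice regardless of tree size or depth, which mirrors the flavor of the constant-dimension result, although the paper's guarantee concerns geometric embeddings rather than interval comparisons.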


2606.19249 2026-06-18 cs.CV cs.LG cross-listed

Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory

Kaustubh Kapil, Kishor P. Upla

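Affiliations: Sardar Vallabhai National Institute of Technology (SVNIT), Surat, India

AI summary: Proposes the TGO framework, which analyzes the spectral geometry of ViT representations (effective rank, stable rank, participation ratio, spectral entropy, spectral flatness, spectral anisotropy, and more). During training, dimensional utilization increases, anisotropy decreases, and spectral entropy and participation ratio rise; the final CLS token representation has the highest effective dimensionality and the lowest anisotropy.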

Abstract

Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relatively underexplored. To address this gap, we introduce Transformer Geometry Observatory (TGO), a systematic framework of experiments and analysis pipelines designed to investigate the representational geometry and dynamics of Vision Transformers. TGO-I, the first installment of the framework, focuses on the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, we analyze Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra throughout training. Our results reveal a consistent increase in dimensional utilization, accompanied by decreasing anisotropy, increasing spectral entropy, increasing participation ratio, and progressively flatter eigenspectra. Contrary to the common intuition that training should concentrate information into a small number of dominant directions, we observe a progressive redistribution of variance across representational dimensions. This phenomenon is particularly pronounced in the final CLS token representation, which exhibits the highest effective dimensionality and lowest anisotropy within the network.
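
Several of the metrics named in the abstract have common textbook definitions in terms of the eigenvalues of a representation's covariance matrix. The sketch below uses those common definitions; the paper's exact normalizations may differ, and the function names are hypothetical. The input is assumed to be a list of non-negative covariance eigenvalues with at least one positive entry.

```python
import math
from typing import Sequence


def participation_ratio(eigs: Sequence[float]) -> float:
    """(sum of eigenvalues)^2 / sum of squared eigenvalues."""
    total = sum(eigs)
    return total * total / sum(e * e for e in eigs)


def spectral_entropy(eigs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the normalized eigenvalue spectrum."""
    total = sum(eigs)
    return -sum((e / total) * math.log(e / total) for e in eigs if e > 0)


def effective_rank(eigs: Sequence[float]) -> float:
    """Exponential of the spectral entropy."""
    return math.exp(spectral_entropy(eigs))


def stable_rank(eigs: Sequence[float]) -> float:
    """Sum of covariance eigenvalues divided by the largest one."""
    return sum(eigs) / max(eigs)
```

All four quantities equal the number of dimensions for a perfectly flat spectrum and collapse to 1 when a single direction carries all the variance, which is why a rising value is read as increasing dimensional utilization.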


2606.18438 2026-06-18 math.OC cs.LG cross-listed

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

Chris Lee, Xiuli Chao, Izak Duenyas

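Affiliations: Department of Industrial and Operations Engineering, University of Michigan; Ross School of Business, University of Michigan

AI summary: For contingent labor settings with uncertainty in worker production and labor supply, proposes the DR-UCB policy, which makes replacement and hiring decisions sequentially through learning cycles to maximize cumulative profit, and shows that its leading-order regret matches the lower bound in its dependence on the time horizon.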

Abstract

In this paper, we study a sequential workforce management problem in a contingent labor setting with uncertainty in both worker production and labor supply. A firm seeks to maximize cumulative profit by maintaining an active team of fixed size while learning worker productivity over time. We emphasize two critical operational frictions in this problem: replacing workers is costly, and workers may not be available immediately for hiring because of, for example, prior job commitments, scheduling constraints, or onboarding procedures. Thus, hiring decisions take effect only after a random delay. We formulate this problem as a stochastic multi-play bandit with costly switching and delayed actions, and develop a learning-based hiring policy, DR-UCB (DelayedReplacement-UCB), that makes replacement and hiring decisions sequentially through learning cycles. In each cycle, the policy uses real-time production data to determine when to initiate workforce changes and which workers to replace and hire. We show that the leading-order regret of the proposed policy matches its lower bound in its dependence on the time horizon. Our numerical experiments show that DR-UCB outperforms benchmark policies.
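
The abstract does not spell out the DR-UCB rule, so the sketch below shows only the generic ingredient its name points to: a standard UCB1 index used to pick a fixed-size team in a multi-play bandit. It ignores switching costs, hiring delays, and learning cycles, all of which are central to the paper; `ucb_index` and `select_team` are hypothetical names and the exploration constant is the textbook one.

```python
import math
from typing import List


def ucb_index(mean: float, pulls: int, t: int) -> float:
    """UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # untried workers are explored first
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def select_team(means: List[float], pulls: List[int], t: int,
                team_size: int) -> List[int]:
    """Return the indices of the team_size workers with the highest index."""
    scores = [ucb_index(m, n, t) for m, n in zip(means, pulls)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:team_size])
```

A policy with costly switching would not re-rank every period like this; batching decisions into cycles, as the abstract describes, is one way to limit how often the team changes.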


2606.18514 2026-06-18 cs.RO cs.LG cross-listed

N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

Anas Saeed, Marcos Abel Zuzuárregui, Stefano Carpin

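Affiliations: Department of Computer Science and Engineering, University of California, Merced

AI summary: Proposes the N(CO)$^2$ framework, which uses reinforcement learning to solve the stochastic orienteering problem without hand-crafted heuristics, optimizing path selection under uncertainty with performance competitive with MILP.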


Journal ref: In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), 2025

Abstract

Neural combinatorial optimization (NCO) offers a promising alternative to traditional heuristic-based methods for solving complex graph optimization problems by proposing to learn heuristics through data. This class of problems frequently arises in automation, as it can be used to model a variety of applications. While NCO has been extensively studied for deterministic combinatorial optimization problems, there are only a few works that aim to solve stochastic combinatorial optimization problems. In this work, we present N(CO)$^2$: Neural Combinatorial Optimization with Chance cOnstraints to solve the Stochastic Orienteering Problem (SOP) without the use of hand-crafted heuristics. By integrating a reinforcement learning (RL) framework, the model optimizes path selection under uncertainty, effectively balancing exploration and exploitation. Empirical results demonstrate that our method generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP) for the task. The proposed approach reduces human effort in heuristic design while enabling adaptive and efficient decision-making in uncertain environments.
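
A chance constraint in this setting is usually read as "the probability that a route's total travel time exceeds the budget must stay below a tolerance". The sketch below estimates that probability by Monte Carlo sampling for a fixed route. It is a generic check under that assumed formulation, not the paper's learned policy; `failure_probability`, `satisfies_chance_constraint`, and the `sample_time` callback are hypothetical.

```python
import random
from typing import Callable, Sequence


def failure_probability(edge_lengths: Sequence[float],
                        sample_time: Callable[[float, random.Random], float],
                        budget: float,
                        n_samples: int = 10_000,
                        seed: int = 0) -> float:
    """Estimate P(total travel time > budget) for a fixed route."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    failures = 0
    for _ in range(n_samples):
        total = sum(sample_time(length, rng) for length in edge_lengths)
        if total > budget:
            failures += 1
    return failures / n_samples


def satisfies_chance_constraint(edge_lengths: Sequence[float],
                                sample_time: Callable[[float, random.Random], float],
                                budget: float,
                                alpha: float) -> bool:
    """True if the estimated failure probability is at most alpha."""
    return failure_probability(edge_lengths, sample_time, budget) <= alpha
```

A sampler such as `lambda length, rng: length * rng.uniform(0.8, 1.5)` would model travel times that scale with edge length; any other noise model can be plugged in the same way.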


2606.18531 2026-06-18 stat.ML cs.LG cross-listed

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Xuanfei Ren, Tengyang Xie

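Affiliations: University of Wisconsin-Madison

AI summary: Studies the statistical theory of offline reinforcement learning when policy optimization uses only trajectory-level outcomes (such as cumulative returns or preferences), proposes the OPAC algorithm and proves its sample complexity, and identifies the statistical barriers that arise under nonlinear aggregation objectives.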

Comments 69 pages

Abstract

Offline reinforcement learning is typically analyzed under process-level reward supervision, yet many sequential decision datasets record only trajectory-level outcomes. We develop a statistical theory for offline policy optimization from such outcome-level supervision. We first study the canonical setting where the target remains the expected cumulative reward, but each offline trajectory provides only a scalar label whose conditional mean is the cumulative return. We propose OPAC, a pessimistic actor-critic algorithm that learns a latent reward model and optimizes a policy from trajectory-level labels. We prove a high-probability guarantee of order $\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$ and a matching lower bound, characterizing the sharp statistical cost of replacing process-level rewards with one trajectory-level label. We then extend the principle to preference-based feedback, preserving the leading horizon and concentrability dependence up to preference-model constants. Finally, we study generalized outcome-based offline RL, where both the supervision and the objective are trajectory-level quantities induced by a nonlinear aggregation of latent per-step rewards. This problem is not learnable in general: for all-success objectives, any offline learner may require $\Omega(2^H)$ trajectories even with deterministic transitions and constant concentrability. We then identify a tractable regime through two structural coefficients, $\kappa_\mu(\sigma)$ and $\chi_\mu(\sigma)$, capturing information loss in outcome aggregation and generalized Bellman updates, under which generalized OPAC achieves polynomial sample complexity. Together, our results delineate when outcome-level supervision enables sample-efficient offline control and when missing process-level rewards create fundamental statistical barriers.


2606.18598 2026-06-18 cs.AI cs.LG cross-listed

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

Anna C. Edmonds, Mansur M. Arief, Robert J. Moss, Mykel J. Kochenderfer, Jef Caers

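Affiliations: Computer Science Department, Stanford University; Aeronautics and Astronautics Department, Stanford University; Earth and Planetary Sciences Department, Stanford University

AI summary: Proposes a POMDP framework that optimizes lithium mining decisions through belief state planning, adapts dynamically to price uncertainty, and achieves higher demand fulfillment and more balanced economic and environmental outcomes.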

Comments 24 pages, 14 tables, 4 figures

Abstract

Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technologies into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve it using belief-state planning methods to obtain optimal decisions. In our study, we show that POMDP solvers outperform human-inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief-state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic and environmental outcomes over the project's lifetime across all pricing and deposit scenarios.
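
The belief-state planning mentioned in this abstract rests on a Bayesian belief update over hidden states. The following is a minimal sketch of that single step only, not the authors' model: the two deposit grades, the observation probabilities, and the function name `update_belief` are hypothetical, and the hidden state is assumed static (no transition model).

```python
def update_belief(belief, likelihood):
    """Bayes update: posterior(s) is proportional to P(obs | s) * prior(s)."""
    unnormalized = {s: belief[s] * likelihood[s] for s in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: two deposit grades and one positive drilling result.
prior = {"rich": 0.5, "poor": 0.5}
p_positive_core = {"rich": 0.8, "poor": 0.2}  # assumed observation model
posterior = update_belief(prior, p_positive_core)
```

A planner would then choose actions (explore, mine, switch technology) as a function of `posterior` rather than of the unknown true state.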

2606.19069 2026-06-18 eess.SY cs.LG cs.SY (cross-listed)

Model-Free Reinforcement Learning Control for Resilient Cyber-Physical Systems

Hugo O. Garcés, Alejandro J. Rojas, Bernardo A. Hernández, Andrés Escalona, Jonathan M. Palma, Md. Rezwan Parvez, Bhushan Gopaluni, Sirish L. Shah

Affiliations: Departamento de Ingeniería Eléctrica, Universidad de Concepción, Concepción, Chile; Department of Electrical & Computer Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, AB, Canada; Department of Chemical & Biological Engineering, University of British Columbia, Vancouver, BC V6T 1Z3, Canada; Department of Chemical & Materials Engineering, University of Alberta, Edmonton, T6G 1H9, Alberta, AB, Canada

AI summary: Compares model-free controllers on a nonlinear system under cyberattacks (false data injection and denial of service) and analyzes four RL reward types: the Lyapunov reward gives the best resilience with low tracking error, the exponential reward offers a good trade-off under moderate training conditions, and the progressive and linear rewards converge faster but are less robust.

Comments Accepted to the 23rd IFAC World Congress 2026

Abstract

This paper compares the performance of model-free controllers on a nonlinear system under cyberattacks, including false data injection and denial-of-service attacks. Four RL reward types are analyzed for accuracy, cost, and resilience. Results show that the Lyapunov reward offers the best resilience with low tracking error. Exponential mode also provides good trade-offs with acceptable resilience under moderate training conditions. Progressive and linear rewards converge faster but are less robust. RL-MPCs show strong steady-state resilience but require longer training times; RL-PID controllers are faster with significantly less training time. Proximal Policy Optimization outperforms Deep Deterministic Policy Gradient with a significant reduction in KPI variance. This study serves to highlight how well-designed RL rewards can improve performance and resilience against cyber threats.

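2606.18290 2026-06-18 cond-mat.stat-mech cs.LG eess.SP (cross-listed)

Stochastic Thermodynamics and SDE-based Generative Models

Yaowen Zhang

Affiliations: GitHub

AI summary: Within the framework of stochastic thermodynamics, defines trajectory-level work, heat, and entropy production for SDE-based generative models (such as diffusion models and Schrödinger bridges), and generalizes the Jarzynski equality and second-law-like inequalities.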

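2606.18354 2026-06-18 eess.IV cs.LG (cross-listed)

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

Muge Zhang, Muhammad Ali Khaliq, Jamal Alsakran, Byeong Kil Lee, Jeeho Ryoo

Affiliations: Fairleigh Dickinson University; University of Colorado at Colorado Springs

AI summary: To address the difficulty of capturing subtle anatomical changes when synthesizing structural MRI for Alzheimer's disease, extends the Med-DDPM conditional diffusion model to generate 3D structural MRI conditioned on anatomical segmentation masks; models trained on synthetic data reach Dice scores comparable to those trained on real data, and training on hybrid data improves performance markedly.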

DOI: 10.1109/MIPR67560.2025.00037
Journal ref: 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)

Abstract

Recent advances in generative machine learning models have significantly improved medical imaging, offering promising solutions for data augmentation, privacy preservation, and improved model generalization. However, synthesizing high-quality structural MRI data for Alzheimer's Disease (AD) remains challenging due to the subtle, region-specific, and progressive anatomical changes associated with neurodegeneration. In this paper, we extend the Med-DDPM conditional diffusion model -- originally designed for brain tumor synthesis -- to generate 3D structural MRIs specifically tailored to AD. We adopted Med-DDPM due to its established stability and structural fidelity compared to other generative models, which makes it particularly suitable for capturing the subtle anatomical changes characteristic of AD. Our approach conditions the diffusion process on anatomical segmentation masks derived from the ADNI dataset, incorporating key AD-relevant brain structures into the generation process. We systematically evaluate the quality and utility of the synthetic images by training segmentation models on real, synthetic, and hybrid (mixed) datasets. Experimental results demonstrate that segmentation models trained exclusively on synthetic data achieve comparable Dice scores (0.6532) to those trained on real data (0.6513), while exhibiting significantly enhanced recall. Notably, models trained on hybrid datasets (mixing real and synthetic images) outperform both real and synthetic-only baselines, achieving a Dice score of 0.7244. These findings underscore the successful use of conditional diffusion models for generating anatomically accurate, AD-specific synthetic MRIs, and highlight their potential for enhancing training data availability, improving diagnostic accuracy, and promoting research reproducibility in neuroimaging studies.
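
The Dice scores reported in this abstract measure overlap between a predicted and a reference segmentation mask. A minimal sketch of the metric for flattened binary masks follows; the function name and the toy masks are illustrative assumptions, and the paper's own evaluation code (for example, how it handles multiple structures or 3D volumes) is not described in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Toy masks: one overlapping voxel, two foreground voxels in each mask.
score = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Here `score` is 2 * 1 / (2 + 2) = 0.5.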

2606.18790 2026-06-18 cs.SD cs.AI cs.LG (cross-listed)

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

Affiliations: Athens University of Economics and Business; Orfium Research; Hellenic Mediterranean University; Archimedes / Athena Research Center

AI summary: Proposes an inference-time activation steering framework based on PID feedback control, extracts latent directions for pitch and duration with the Difference-in-Means method, and decouples multi-attribute steering with Gram-Schmidt orthogonalization, enabling fine-grained, interpretable attribute modulation in symbolic music generation.

Comments Accepted at Learning to Listen: ICML 2026 Workshop on Machine Learning for Audio (43rd International Conference on Machine Learning - ICMLMLA26), 4 pages main (11 total), 2 figures

Abstract

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.
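
The two geometric steps named in this abstract, a Difference-in-Means direction and Gram-Schmidt decoupling of a second direction, can be sketched on toy vectors. This is an illustration under stated assumptions, not the authors' code: the 2-dimensional "activations", the helper names, and the choice to orthogonalize the duration direction against the pitch direction are all hypothetical.

```python
def mean_vec(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_mean(positive, negative):
    """Difference-in-Means direction: mean(positive) - mean(negative)."""
    return [a - b for a, b in zip(mean_vec(positive), mean_vec(negative))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(v, u):
    """One Gram-Schmidt step: remove from v its component along u."""
    scale = dot(v, u) / dot(u, u)
    return [a - scale * b for a, b in zip(v, u)]


# Toy activations for examples with high vs. low values of each attribute.
pitch_dir = diff_mean([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
duration_dir = diff_mean([[1.0, 1.0], [3.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]])
duration_clean = orthogonalize(duration_dir, pitch_dir)
```

After this step, adding a multiple of `duration_clean` to an activation leaves its projection onto `pitch_dir` unchanged, which is the sense in which the two steering directions are decoupled.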

2606.18856 2026-06-18 cs.CL cs.LG (cross-listed)

Approximate Structured Diffusion for Sequence Labelling

Nicolas Floquet, Joseph Le Roux, Nadi Tomeh

Affiliations: Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN

AI summary: Proposes a diffusion-based training method for conditional random fields (CRFs) that conditions on noisy labels to capture long-range dependencies and, combined with approximate inference, achieves a 16.5% error-rate reduction on part-of-speech tagging.

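2606.19005 2026-06-18 cs.CL cs.LG (cross-listed)

Learning-Augmented Exact Exponential-Time Algorithms

Tatiana Belova, Yuriy Dementiev, Danil Sagunov

Affiliations: ITMO University

AI summary: Proposes a general approach in which a noisy predictor only marginally better than random guessing provably reduces the search space for NP-hard subset selection problems; the runtime speedup scales smoothly with prediction quality, and the algorithms require only pairwise independence of predictions or, alternatively, no knowledge of the predictor's accuracy.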

Abstract

The field of learning-augmented algorithms has demonstrated that machine-learned predictions can bypass worst-case lower bounds across a wide range of problems. So far, however, the focus has been almost exclusively on polynomial-time algorithms, where predictions improve competitive ratios, approximation guarantees, or running times. In this paper, we raise the question of whether predictions can push the frontier of exact exponential-time algorithms for NP-hard problems. We answer this question affirmatively by proposing a general approach that augments an entire family of state-of-the-art exact algorithms for a variety of subset selection problems. We show that a noisy predictor that is only marginally better than random guessing suffices to provably reduce the search space, and that the resulting runtime speedup scales smoothly with the prediction quality. Importantly, our algorithms require only pairwise independence of predictions or, alternatively, do not require the knowledge of the predictor's accuracy - both strictly weaker and more realistic settings than typically assumed.

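2606.18993 2026-06-18 stat.ML cs.LG stat.ME (cross-listed)

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Zheng He, Danica J. Sutherland

AI summary: Proposes a sequential conditional independence test that is more robust to estimation error, combining an adaptively optimized kernel conditional independence statistic with normalization and truncate-and-shift calibration; it controls Type I error while maintaining high power on synthetic and real-world data.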

Comments Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c

Abstract

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.
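
The testing-by-betting principle this abstract builds on can be sketched generically. The code below is a simplified illustration, not the SKCI method: it assumes payoffs already lie in [-1, 1] with conditional mean at most zero under the null, and it uses a fixed bet fraction where the paper adapts its bets. The function name and parameters are hypothetical.

```python
def betting_test(payoffs, alpha=0.05, bet=0.5):
    """Sequential test by betting.

    Wealth starts at 1 and is multiplied by (1 + bet * payoff) each round.
    Under the null the wealth is a nonnegative supermartingale, so by
    Ville's inequality it reaches 1/alpha with probability at most alpha.
    Returns (rejected, stopping_round, final_wealth).
    """
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must lie in [0, 1) to keep wealth positive")
    wealth = 1.0
    rounds = 0
    for rounds, payoff in enumerate(payoffs, start=1):
        if not -1.0 <= payoff <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * payoff
        if wealth >= 1.0 / alpha:
            return True, rounds, wealth
    return False, rounds, wealth


# Strong, consistent evidence against the null: every payoff is +1.
rejected, stopped_at, wealth = betting_test([1.0] * 20)
```

With payoffs of +1 and a bet of 0.5, the wealth grows as 1.5 to the power t and first exceeds 1/0.05 = 20 at round 8, so the test stops early without examining the remaining data.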

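2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML (cross-listed)

Wasserstein Policy Learning for Distributional Outcomes

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

AI summary: For distribution-valued outcomes, proposes a policy learning framework based on Wasserstein barycenters and utility functionals, using IPW and DR estimators; proves that the regret rate is governed by policy-class complexity and gives a minimax lower bound.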

Comments Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)

Abstract

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.
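
In one dimension, the 2-Wasserstein barycenter used in this abstract has a simple form: its quantile function is the weighted average of the input quantile functions. For empirical distributions with the same number of equally weighted atoms, this reduces to averaging order statistics. The sketch below illustrates only that fact; the function name, the equal-sample-size restriction, and the toy data are assumptions, and it is not the paper's estimator.

```python
def barycenter_1d(samples, weights=None):
    """Atoms of the 1D Wasserstein-2 barycenter of empirical distributions.

    Each entry of `samples` is a list of n real values (equal n required
    in this sketch). The i-th barycenter atom is the weighted average of
    the i-th order statistics.
    """
    k, n = len(samples), len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    if weights is None:
        weights = [1.0 / k] * k
    ordered = [sorted(s) for s in samples]
    return [sum(w * s[i] for w, s in zip(weights, ordered)) for i in range(n)]


# Two toy outcome distributions; the barycenter sits halfway between them.
atoms = barycenter_1d([[4.0, 0.0, 2.0], [10.0, 14.0, 12.0]])
```

A utility functional, such as the mean or a quantile, can then be evaluated directly on `atoms`.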

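2606.19147 2026-06-18 stat.ML cs.LG math.ST stat.TH (cross-listed)

On Local Population-Risk Certificates

Mingzhi Song

Affiliations: Department of Mathematics, The University of Hong Kong

AI summary: Proposes local population-risk increment certificates that provide risk control when a model is updated, using two-sided confidence bands to decide whether an update is accepted.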

Comments 35 pages, 6 figures

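2606.19212 2026-06-18 stat.ML cs.LG (cross-listed)

Running Hardware-Aware Neural Architecture Search on Embedded Devices Under 512MB of RAM

Andrea Mattia Garavagno, Edoardo Ragusa, Paolo Gastaldo, Antonio Frisoli

Affiliations: University of Bologna; Politecnico di Milano

AI summary: Proposes a hardware-aware neural architecture search method that runs directly on resource-constrained embedded devices and generates tiny CNNs for low-end MCUs, achieving state-of-the-art results on the Visual Wake Word dataset.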

DOI: 10.1109/ICCE59016.2024.10444268
Journal ref: 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2024, pp. 1-2

Abstract

This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

2606.18312 2026-06-18 cs.CR cs.DC cs.LG (cross-listed)

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

William Kalikman, Ivo Petrov, Dimitar I. Dimitrov, Martin Vechev

Affiliations: ETH Zürich; INSAIT, Sofia University "St. Kliment Ohridski"

AI summary: Proposes the TIGER attack, which turns the subspace signal into a differentiable objective and directly optimizes token embeddings to minimize their distance to the subspace, improving reconstruction quality and speed on encoder models and robustness to differential privacy on decoder models.

Comments 16 pages, 13 pages main text,

Abstract

Federated learning allows multiple clients to jointly train a shared model by sending gradient updates to a central server while keeping raw inputs local. However, prior gradient inversion attacks show that these updates can reveal enough information to reconstruct client inputs. Existing attacks on transformers either optimize dummy inputs to match the true client updates, which is costly and unstable for modern models, or exploit the low rank of attention gradients to identify a subspace containing the true layer embeddings, followed by a discrete membership test for candidate tokens. However, this token test is brittle under numerical noise, e.g., from quantization or Differential Privacy (DP), and scales poorly for encoder models with non-causal attention. We introduce TIGER, a continuous gradient inversion attack that turns this subspace signal into a differentiable objective. Instead of searching over tokens or matching full gradients, TIGER directly optimizes token embeddings to minimize their distance to the subspace. Our experiments demonstrate that on encoder-only models, TIGER substantially improves both reconstruction quality and runtime over existing attacks, while on decoder models, TIGER is more robust than prior subspace-based attacks, enabling the first successful reconstructions in DP-defended federated learning settings.
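
The central idea in this abstract, using the distance from a vector to a subspace as a differentiable objective, can be illustrated on a toy problem. This is a sketch under strong assumptions, not the TIGER attack: the subspace is given by a hypothetical orthonormal basis rather than recovered from attention gradients, a single vector is optimized with plain gradient descent, and all names are illustrative.

```python
def project(x, basis):
    """Project x onto the span of an orthonormal list of basis vectors."""
    proj = [0.0] * len(x)
    for b in basis:
        coeff = sum(xi * bi for xi, bi in zip(x, b))
        proj = [p + coeff * bi for p, bi in zip(proj, b)]
    return proj


def subspace_distance_sq(x, basis):
    """Squared Euclidean distance from x to the subspace."""
    residual = [xi - pi for xi, pi in zip(x, project(x, basis))]
    return sum(r * r for r in residual)


def descend(x, basis, lr=0.25, steps=20):
    """Gradient descent on the squared distance; its gradient is 2 * (x - Px)."""
    for _ in range(steps):
        p = project(x, basis)
        x = [xi - lr * 2.0 * (xi - pi) for xi, pi in zip(x, p)]
    return x


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # assumed orthonormal basis
start = [1.0, 2.0, 4.0]
end = descend(start, basis)
```

With a learning rate of 0.25, each step halves the off-subspace residual, so the third coordinate shrinks toward zero while the in-subspace coordinates stay fixed.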

2606.19023 2026-06-18 cs.CR cs.LG (cross-listed)

A Guide to Estimating Conditional Average Treatment Effects in the Presence of Competing Risks

Daniel Klippert, Sarah Friedrich, Markus Pauly

Affiliations: Department of Statistics, TU Dortmund University; Research Center Trustworthy Data Science and Security, University Alliance Ruhr (UA Ruhr); Institute for Mathematics, University of Augsburg

AI summary: For survival data with competing risks, compares six meta-learners for estimating conditional average treatment effects and provides the R package crsurvlearners to guide model selection.

Abstract

Conditional average treatment effects (CATEs) are central to treatment decision-making in personalized medicine. In competing risks settings, estimating CATEs from survival data allows for patient-specific assessments of treatment effectiveness for a specific event of interest while properly accounting for alternative event types. This distinction is essential in the presence of comorbidities, where competing causes of death may otherwise confound the therapeutic benefit. Focusing on right-censored survival times with binary treatment, we examine CATEs defined as covariate-conditional differences in the absolute risk for the event of interest at a fixed time. To this end, we study meta-learners, which adapt machine learning algorithms for CATE estimation in competing risks scenarios. We systematically compare six meta-learners, combining Cox regression or random survival forests for risk modeling with elastic net regression or random forests for direct CATE modeling. To provide practical guidance on model selection, we evaluate their performance in multiple simulation settings that differ in hazard complexity, treatment heterogeneity, treatment assignment, event type distribution, and censoring. To facilitate applied use, we provide the R package crsurvlearners, which implements all considered approaches.

2606.18302 2026-06-18 q-bio.OT cs.LG (cross-listed)

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

Md Nasiat Hasan Fahim, Md. Abid Ullah Muhib, Mohammad Shahidur Rahman

Affiliations: Shahjalal University of Science

AI summary: Builds the first protein sequence dataset for native Bangladeshi fish, systematically evaluates seven architectures, and proposes a lightweight hybrid model, MotifCNN-Transformer+TA-PE, that outperforms the large protein language model ProtBERT in resource-constrained settings.

Comments Published in 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN). © 2026 IEEE. Personal use of this material is permitted

DOI: 10.1109/QPAIN69676.2026.11546620
Journal ref: 2026 IEEE 2nd International Conference on Quantum Photonics, Artificial Intelligence & Networking (QPAIN)

Abstract

Correct identification of fish species is highly significant for food security, economic development, and climate resilience in Bangladesh. Protein sequences directly reflect functional and evolutionary constraints, which are important for species authentication and biodiversity monitoring. Yet no benchmark exists for identifying native Bangladeshi fish species from protein sequences. In this study, we address this gap by introducing the first curated dataset of 2,845 high-quality protein sequences from nine native Bangladeshi fish species. We also establish the first protein sequence classification baseline for this domain through a systematic benchmarking of seven architectural paradigms. Moreover, we propose a practical, deployable hybrid architecture that combines MotifCNN with a Transformer using Terminal-Aware Positional Encoding (MotifCNN-Transformer+TA-PE). This architecture achieves 79.80% accuracy with a macro-F1 of 0.80. The highest accuracy, 83.04%, is achieved by the fine-tuned protein language model ProtBERT, which has 420M parameters and requires dual 16GB GPUs for inference. According to McNemar's test, ProtBERT's accuracy gain of 3.24 percentage points over MotifCNN-Transformer+TA-PE is not statistically significant (p = 0.1120). In per-class identification, our architecture outperforms ProtBERT on six of the nine classes. MotifCNN-Transformer+TA-PE is also approximately 5x faster and 42x smaller than ProtBERT, supports a 16x larger batch size, and runs inference without a GPU, making it more practical for deployment in resource-constrained areas such as rural Bangladesh. Beyond this, our foundational work shows the effects of phylogenetic relationships on sequence similarity and establishes pathways for fisheries management, food authentication, and biodiversity conservation in South Asia's protein-dependent economy.
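
McNemar's test, cited in this abstract, compares two classifiers on the same test set using only the examples on which they disagree. The sketch below implements the exact (binomial) two-sided variant. The abstract does not say which variant the authors used or give the disagreement counts, so the counts here are hypothetical and the result is not expected to reproduce p = 0.1120.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: examples model A classifies correctly and model B incorrectly.
    c: examples model B classifies correctly and model A incorrectly.
    Under the null, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical disagreement counts between two models.
p_value = mcnemar_exact(3, 9)
```

For these counts the one-sided tail is 299/4096, giving a two-sided p-value of about 0.146, which would not be significant at the 0.05 level.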

2606.18436 2026-06-18 stat.ML cs.LG (cross-listed)

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

Wenqin Liu, Weizhi Quan, Aoqi Zuo, Erdun Gao, Vu Nguyen, Dino Sejdinovic, Howard Bondell, Mingming Gong

Affiliations: School of Mathematics and Statistics, The University of Melbourne; Statistics, The University of Melbourne; Statistics, University of Sydney; Responsible AI Research Centre, Australian Institute for Machine Learning; Amazon; School of Mathematical Sciences, Adelaide University; Department of Machine Learning, MBZUAI

AI summary: Proposes TimeLAVA, a learning-agnostic framework that uses wavelet transforms and optimal transport to assess the marginal contribution of time-series segments to distributional discrepancy without model training, outperforming existing methods in anomaly detection, data pruning, and label noise detection.

Comments 34 pages

Journal ref: ICML 2026

Abstract

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, finance, and industrial monitoring, effective valuation methods are essential yet fundamentally lacking. Existing approaches are either model-dependent, limiting their generalizability, or designed for i.i.d. data and thus fail to capture temporal dependencies, multi-scale patterns, and non-stationary dynamics inherent to sequential data. We introduce TimeLAVA, a learning-agnostic framework that values temporal segments by their marginal contribution to minimizing distributional discrepancy between evaluated and reference data. At its core is a novel Selective Wavelet-based Wasserstein discrepancy combining multi-scale wavelet transforms for temporal localization with unbalanced optimal transport for robustness to distributional shifts. Segment values are efficiently computed via sensitivity analysis without requiring model training and aggregated into point-wise scores. We provide theoretical guarantees linking valuation to model-agnostic generalization and prove bounded sensitivity to outlier contamination. Extensive experiments across anomaly detection, data pruning, and label noise detection demonstrate that TimeLAVA produces significantly more informative value scores than existing methods on diverse real-world datasets.

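2606.18750 2026-06-18 stat.AP cs.LG (cross-listed)

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

AI summary: Systematically addresses five common but overlooked questions in applying CUPED, covering the optimal adjustment specification, the validity of regression adjustment, and robust variance estimation, with extensions to multi-arm experiments and two-stage sampling designs; provides reliable methods backed by theoretical analysis and experimental validation, which have been deployed on ByteDance's platform.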

Comments 15 pages, 3 figures

Abstract

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
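
The basic CUPED adjustment that this abstract takes as its starting point subtracts from each outcome the part explained by a pre-experiment covariate. The sketch below shows the textbook single-covariate form, with theta = cov(Y, X) / var(X); the function name and toy data are assumptions, and the estimator variants and variance corrections the paper studies are not reproduced here.

```python
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return (adjusted outcomes, theta) for Y_cuped = Y - theta * (X - mean(X))."""
    x_bar, y_bar = fmean(x), fmean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    theta = cov / var if var > 0 else 0.0
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)], theta


# Toy data: the in-experiment metric tracks the pre-experiment metric closely.
pre = [1.0, 2.0, 3.0, 4.0]
post = [2.1, 3.9, 6.2, 7.8]
adjusted, theta = cuped_adjust(post, pre)
variance_before = pvariance(post)
variance_after = pvariance(adjusted)
```

The adjustment leaves the sample mean unchanged while shrinking the variance, which is what makes the treatment-effect estimate more sensitive. In an actual experiment theta is usually estimated once on data pooled across arms, an assumption this single-sample sketch does not model.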

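2606.18972 2026-06-18 stat.ML cs.LG (cross-listed)

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson, Ricardo J. G. B. Campello

AI summary: Proposes the FOSC-X framework, which uses dynamic programming to extract the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, supports cluster-count constraints, and guarantees optimal rankings in linear time.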

Abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
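
The unconstrained property this abstract relies on, that locally optimal choices inside subtrees combine into a globally optimal flat clustering, can be sketched for the single-best-solution case. This is a simplified illustration and not FOSC-X itself: it returns one solution rather than the top-M, ignores cluster-count constraints, and uses a hypothetical tree with made-up additive quality scores.

```python
def best_selection(tree, quality, node):
    """Best non-overlapping cluster selection in the subtree rooted at node.

    tree maps a node to its list of children (leaves are absent or empty).
    quality maps every node to an additive cluster-quality score.
    Returns (total quality, list of selected cluster nodes).
    """
    children = tree.get(node, [])
    if not children:
        return quality[node], [node]
    child_total, child_selection = 0.0, []
    for child in children:
        total, selection = best_selection(tree, quality, child)
        child_total += total
        child_selection += selection
    # Keep this node as one cluster only if it beats its best descendants.
    if quality[node] >= child_total:
        return quality[node], [node]
    return child_total, child_selection


tree = {"root": ["A", "B"], "A": ["A1", "A2"]}
quality = {"root": 1.0, "A": 3.0, "A1": 2.0, "A2": 2.0, "B": 1.5}
total, chosen = best_selection(tree, quality, "root")
```

The result selects `A1` and `A2` together with `B`, clusters taken from different depths of the tree, which is what a non-horizontal cut means. Each node is visited once, so this single-solution pass is linear in the number of tree nodes.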


2606.19057 2026-06-18 stat.ML cs.LG stat.CO stat.ME 交叉投稿

用LOCUS解放法律：美国地方条例语料库

Denis Peskoff, Joe Barrow, Christopher Vu, Diag Davenport

发表机构 * UC Berkeley（加州大学伯克利分校）； School of Information（信息学院）； Independent（独立研究者）

AI总结为解决美国地方条例缺乏机器可读语料的问题，构建了包含9239个市县条例的LOCUS语料库，并训练ModernBERT分类器以分析法律透明度等维度。

Comments 14 pages, 6 figures

详情

AI中文摘要

法律人工智能的进展越来越依赖于大规模获取权威法律文本。然而，美国法律中最具影响力的层级之一（地方条例）在很大程度上仍然缺失于现有的机器可读语料库中。地方法规管辖着分区、住房、商业许可、公共卫生、噪音、动物控制以及许多其他日常监管领域，但它们分散在专为人类浏览而非批量研究访问设计的供应商平台上。我们引入了LOCUS（美国地方条例语料库），一个面向美国市和县条例的全面语料库和县级统一访问层。原始语料库可向研究人员提供，几乎涵盖了所有公开可用的市和县条例，包含来自9239个城市和县的法规。一个较小的县级统一LOCUS访问层覆盖了美国3144个县中最大的2309个，覆盖了大部分人口。我们使用OCR来处理使法律无法成为公共资源的各种文档格式。我们发布了带有覆盖元数据的语料库，以支持可重复性、下游法律AI研究以及逐步扩展对地方法律的机器可读访问。我们训练了一系列基于ModernBERT的分类器和评分器，以便从多个维度分析美国地方法律，例如不透明性和家长式作风，这些维度以前从未在此规模上研究过。LOCUS-v1及其衍生模型可在以下网址获取：https://huggingface.co/datasets/LocalLaws/LOCUS-v1

英文摘要

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable corpora: local ordinances. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce LOCUS - the Local Ordinance Corpus for the United States - a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, available for release to researchers, represents nearly all publicly available municipal and county ordinance codes. The resulting raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer provides coverage for the largest 2,309 of 3,144 U.S. counties, accounting for a majority of the population. We use OCR to handle the myriad document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We train a collection of ModernBERT-based classifiers and scorers to facilitate analyzing U.S. local law along several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at: https://huggingface.co/datasets/LocalLaws/LOCUS-v1


2606.17077 2026-06-18 physics.chem-ph cs.AI cs.LG quant-ph 交叉投稿

通过ASR自验证与蒸馏实现可靠的神经编解码文本转语音：跨模型与编解码器的近零灾难性失败

Ali Asaria, Tony Salomone, Deep Gandhi

发表机构 * Transformer Lab

AI总结针对开放自回归神经编解码TTS模型的随机灾难性失败（静音、早停、重复或幻觉），提出基于ASR往返的格式鲁棒度量，通过最佳N自验证将失败率降至近零，并通过蒸馏将鲁棒性迁移至单次解码，在无测试代价下关闭约52-58%的失败。

详情

AI中文摘要

开放自回归神经编解码文本转语音（TTS）模型在典型输入上表现优异，但会出现随机灾难性失败：在相当一部分话语中，它们会发出静音、提前终止或陷入重复或幻觉内容。我们表明这种失败模式可以廉价地消除。在单一格式鲁棒度量（通过ASR往返的灾难性失败率）下，最佳N ASR自验证将失败率降至近零：在标准语料库（LibriSpeech）上N=2时未观察到失败，在困难提示集上N=4时也未观察到。这不是单一模型的假象：该减少在四个开放编解码TTS系统和三个神经编解码器（XCodec2、SNAC、Mimi）上复现，其中三个系统在N=2时达到近零下限。然后，通过将自验证行为蒸馏到模型中，我们在推理时免费实现了修复，这恢复了单次解码中的大部分鲁棒性，在无测试代价下关闭了困难输入上约52-58%的失败。蒸馏增益集中在需要的地方（困难输入）；在已经可靠的散文上，没有改进空间且无检测到变化。一项受控比较添加了一个干净的负面结果：离线直接偏好优化（DPO/IPO）并未优于普通监督蒸馏，而在线迭代变体虽有前景但在我们的评估规模下统计上不显著。我们诚实地报告了唯一抵抗的模型（一个更大的Llasa，其中规模并未明显帮助）以及一个罕见词能力上限，该上限无法通过任何自蒸馏方法克服。

英文摘要

Open autoregressive neural-codec text-to-speech (TTS) models sound excellent on typical inputs yet suffer stochastic catastrophic failures: on a meaningful fraction of utterances they emit silence, terminate early, or collapse into repetitive or hallucinated content. We show this failure mode is cheap to remove. Under a single format-robust metric (a catastrophic-failure rate via an ASR round-trip), best-of-N ASR self-verification drives failures to near-zero: no observed failures remain by N=2 on a standard corpus (LibriSpeech) and by N=4 on a hard prompt set. This is not an artifact of one model: the reduction replicates across four open codec-TTS systems and three neural codecs (XCodec2, SNAC, Mimi), reaching the near-zero floor by N=2 on three of the four. We then make the fix free at inference time by distilling the self-verified behaviour into the model, which recovers much of the robustness in single-shot decoding, closing ~52-58% of the failure mass on hard inputs at no test-time cost. The distillation gain concentrates where it is needed (hard inputs); on already-reliable prose there is no headroom and no detectable change. A controlled comparison adds a clean negative: offline direct preference optimization (DPO/IPO) does not beat plain supervised distillation, and an online iterative variant is promising but not statistically separable at our evaluation size. We report honestly the one model that resists (a larger Llasa where scale did not obviously help) and a rare-word capability ceiling that no self-distillation method overcomes.
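
The best-of-N ASR self-verification loop described in the abstract might look roughly like the sketch below. The `synthesize` and `transcribe` callables, the word-error-rate acceptance rule, and the `max_wer` threshold are all assumptions made for illustration; the paper's actual failure criterion is not given in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                            # deletion
                           cur[j - 1] + 1,                         # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)


def best_of_n(text, synthesize, transcribe, n=2, max_wer=0.5):
    """Sample up to n utterances and keep the one whose ASR round-trip is best.

    Stops early once a sample passes the (assumed) WER threshold.
    Returns (audio, wer).
    """
    best = None
    for attempt in range(n):
        audio = synthesize(text, seed=attempt)
        wer = word_error_rate(text, transcribe(audio))
        if best is None or wer < best[1]:
            best = (audio, wer)
        if wer <= max_wer:
            break
    return best


# Stand-ins for a real TTS model and ASR system: the first sample is "silence".
def fake_synthesize(text, seed):
    return "" if seed == 0 else text


def fake_transcribe(audio):
    return audio


print(best_of_n("the quick brown fox", fake_synthesize, fake_transcribe, n=2))
```

Distillation, as the abstract describes it, would then fine-tune the model on the accepted samples so that a single decode behaves like the verified one.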


2606.18429 2026-06-18 cs.CV cs.AI cs.LG 交叉投稿

CAOA -- Completion-Assisted Object-CAD Alignment

CAOA -- 补全辅助的物体-CAD对齐

Hiranya Garbha Kumar, Minhas Kamal, Balakrishnan Prabhakaran

发表机构 * University at Albany（奥尔巴尼大学）

AI总结提出CAOA方法，结合语义感知点云补全和对称感知相对位姿估计，在Scan2CAD上实现17%精度提升，并发布S2C-Completion数据集。

Comments GitHub: https://github.com/MinhasKamal/CAOA

详情

DOI: 10.1109/3DV69130.2026.00047
Journal ref: Thirteenth International Conference on 3D Vision (3DV), 2026

AI中文摘要

准确地将CAD模型与室内RGB-D扫描中的对应物体对齐是3D语义重建的核心挑战。该任务需要估计9自由度（DoF）位姿——位置、旋转和三轴尺度——但受到噪声和不完整扫描以及导致几何畸变的分割误差的阻碍。我们提出补全辅助的物体-CAD对齐（CAOA），该方法将语义和上下文感知的点云补全模块与对称感知的相对位姿估计算法相结合，实现CAD模型与扫描物体的精确对齐。现有的补全方法通常在合成数据集上训练和评估，往往难以泛化到真实扫描。为弥合这一差距，我们引入了一种针对室内场景的合成数据生成策略，通过与广泛使用的补全数据集进行定量比较，验证了其显著减小合成到真实领域差距的效果。此外，我们发布了S2C-Completion，一个来自Scan2CAD的超过8500个物体-CAD对的专家标注数据集，用于真实室内单物体补全，并作为该任务的新基准。对于物体-CAD对齐，我们通过对称感知损失融入对称信息，提高了对对称模糊的鲁棒性。在Scan2CAD基准上，CAOA相比最先进方法实现了17%的精度提升。

英文摘要

Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central challenge in 3D semantic reconstruction. The task requires estimating a 9-Degree-of-Freedom (DoF) pose (position, rotation, and scale along three axes) but is hindered by noisy and incomplete scans, as well as segmentation errors that cause geometric distortions. We present Completion-Assisted Object-CAD Alignment (CAOA), a method that integrates a semantically and contextually aware point cloud completion module with a symmetry-aware relative pose estimation algorithm, enabling precise alignment of CAD models to scanned objects. Existing completion methods are typically trained and evaluated on synthetic datasets, which often fail to generalize to real-world scans. To bridge this gap, we introduce a synthetic data generation strategy tailored to indoor scenes, significantly reducing the synthetic-to-real domain gap, as validated through quantitative comparisons with widely used completion datasets. In addition, we release S2C-Completion, an expert-annotated dataset of over 8,500 object-CAD pairs from Scan2CAD, created for real-world indoor single-object completion and intended as a new benchmark for this task. For object-CAD alignment, we incorporate symmetry information via a symmetry-aware loss, improving robustness to symmetric ambiguities. On the Scan2CAD benchmark, CAOA achieves a 17% accuracy improvement over state-of-the-art methods.


2606.18464 2026-06-18 astro-ph.IM astro-ph.EP cs.LG 交叉投稿

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

利用深度学习建模径向速度数据中的多普勒频移以探测地球质量系外行星

Isidro Gómez-Vargas, Xavier Dumusque, Yinan Zhao, Khaled Al Moulla, Michael Cretignier

发表机构 * Department of Astronomy, University of Geneva, 51 chemin de Pegasi, 1290 Versoix, Switzerland. Instituto de Astrofísica de Andalucía (CSIC), Glorieta de la Astronomía s/n, E-18008 Granada, Spain. Institute of Space Sciences (CSIC), Carrer de Can Magrans s/n, E-08193 Barcelona, Spain. Department of Astronomy, University of Texas at Austin, 2515 Speedway, Austin, TX 78712, USA. Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, 4150-762 Porto, Portugal. Department of Physics, University of Oxford, OX13RH Oxford, UK.

AI总结针对恒星活动干扰，提出结合物理启发光谱表示与深度学习的框架，通过交叉验证和遗传算法优化，可靠恢复振幅≥25 cm/s、周期10-550天的行星信号，并发布Python包doppleriann。

Comments 20 pages, 14 figures. Accepted for publication in Astronomy & Astrophysics

详情

AI中文摘要

由于恒星活动的影响，在恒星径向速度测量中探测由地球质量行星引起的微小多普勒频移仍然极具挑战性。许多在模拟数据上表现良好的深度学习方法难以可靠地应用于真实恒星光谱。本工作的目标是开发一种深度学习框架，使其能够泛化到真实、未见过的光谱，并提高径向速度数据中地球质量行星的可探测性。我们在注入行星信号的HARPS-N太阳光谱上训练人工神经网络，使用基于通量和谱线形成温度的物理驱动光谱表示，以及它们的速度梯度。探索了两种训练策略：留出测试和交叉验证。通过基于遗传算法的超参数优化增强模型鲁棒性，并使用蒙特卡洛dropout量化预测不确定性。在交叉验证策略下，我们最精确的神经网络模型能够可靠地恢复振幅≥25 cm/s、周期在10到550天之间的行星信号的振幅、相位和轨道周期。此外，在所有测试案例中，成功恢复的信号对应于多普勒频移预测周期图中最显著的峰值。基于温度的光谱壳表示始终优于基于通量的壳。我们还发布了实现该框架的Python包doppleriann。我们的结果表明，将物理驱动的光谱表示与深度学习相结合，为从真实观测的径向速度数据中探测地球质量行星提供了一条有前景的途径，该建模框架既具有物理基础又具有统计严谨性，并包含了不确定性量化和优化的训练策略。

英文摘要

Detecting the tiny Doppler shifts induced by Earth-mass planets in stellar radial-velocity measurements remains extremely challenging due to stellar activity. Many deep-learning methods performing well on simulated data remain difficult to apply reliably on real stellar spectra. The aim of this work is to develop a deep-learning framework that generalizes to real, unseen spectra and improves the detectability of Earth-mass planets in radial-velocity data. We train artificial neural networks on HARPS-N solar spectra with injected planetary signals, using physics-motivated spectral representations based on flux and line-formation temperature, together with their velocity gradients. Two training strategies are explored: hold-out testing and cross-validation. Model robustness is enhanced through genetic-algorithm-based hyperparameter optimization, and predictive uncertainty is quantified using Monte Carlo dropout. Our most precise neural network model reliably retrieves, under the cross-validation strategy, the amplitudes, phases, and orbital periods of planetary signals with amplitudes greater than or equal to 25 cm/s and periods between 10 and 550 days. In addition, in all cases tested here, the successfully recovered signals correspond to the most significant peaks in the periodograms of the Doppler-shift predictions. Temperature-based spectral-shell representations consistently outperform flux-based shells. We also release doppleriann, a Python package implementing the proposed framework. Our results demonstrate that combining physically motivated spectral representations with deep learning provides a promising pathway toward the detection of Earth-mass planets in radial-velocity data from real observations, supported by a modeling framework that is both physically grounded and statistically rigorous, incorporating uncertainty quantification and optimized training strategies.
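
To give a sense of scale for the 25 cm/s amplitudes quoted in the abstract, the short calculation below uses the non-relativistic Doppler relation, Δλ = λ · v / c. The 500 nm reference wavelength and the function name are illustrative assumptions, not values from the paper.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre


def doppler_shift_nm(wavelength_nm: float, velocity_m_s: float) -> float:
    """Non-relativistic wavelength shift for a given radial velocity."""
    return wavelength_nm * velocity_m_s / SPEED_OF_LIGHT_M_S


# A 25 cm/s signal observed at an (assumed) optical wavelength of 500 nm.
shift = doppler_shift_nm(500.0, 0.25)
print(f"{shift:.3e} nm")
```

The resulting shift is below a millionth of a nanometre, which helps explain why stellar activity can easily mask such signals.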


2606.18698 2026-06-18 cs.RO cs.AI cs.LG 交叉投稿

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

利用能量特征进行基于深度学习的表面分类：三个独立数据集的比较分析

Alexander Belyaev, Oleg Kushnarev

AI总结研究评估能量特征作为表面分类的独立或辅助模态的可行性，在三个数据集上比较多种深度学习架构，发现CNN性能最优，纯能量特征准确率85-90%，与惯性特征结合可达96-99%，且能量特征可稳定提升1-2%准确率。

详情

AI中文摘要

基于能量的方法在移动机器人表面分类中仍是一个相对未被充分研究的途径，尽管在受限环境中取得了有希望的结果。本研究评估了使用能量衍生特征作为独立分类模态或作为惯性数据补充输入的可行性。在三个公开数据集上进行了全面评估，比较了现代深度学习架构（包括循环神经网络、卷积神经网络、仅编码器变压器和Mamba状态空间模型）在自动超参数调整和输入序列长度优化下的性能。模型在所有评估数据集上均实现了比先前报道值更高的准确率，其中卷积神经网络取得了最高的整体性能。当仅依赖基于能量的特征时，模型分类准确率在85-90%范围内，比与惯性特征结合时（96-99%）低约5-10%。用能量特征增强惯性数据导致平均准确率持续提高1-2%。这些发现表明，仅依赖能量特征的分类器为独立部署提供了足够的准确性，同时在与其它感知模态结合使用时也提供了一致的增益。

英文摘要

The energy-based method remains a comparatively underexamined approach for surface classification in mobile robotics, despite promising results in constrained environments. This study evaluated the viability of using energy-derived features as either a standalone classification modality or as supplementary input to inertial data. A comprehensive evaluation was conducted across three publicly available datasets, comparing the performance of modern deep learning architectures including recurrent neural networks, convolutional neural networks, encoder-only transformers, and Mamba state-space models, under automated hyperparameter tuning and input sequence length optimization. The models achieved higher accuracy than previously reported values on all evaluated datasets, with the convolutional neural network yielding the highest overall performance. When relying exclusively on energy-based features, the models attained classification accuracies in the range of 85-90%, approximately 5-10% lower than those achieved when combined with inertial features (96-99%). Augmenting inertial data with energy features resulted in a consistent mean accuracy improvement of 1-2%. These findings indicate that classifiers relying solely on energy features offer sufficient accuracy for standalone deployment, while also providing a consistent gain when used in combination with other sensing modalities.


2606.18723 2026-06-18 cs.CV cs.LG 交叉投稿

Clinically Aligned Geometry Constraints for Robust IVUS Vessel Boundary Segmentation

临床对齐的几何约束用于鲁棒的IVUS血管边界分割

Yunshu Chen, Litao Yang, Giuseppe Di Giovanni, Jordan Tan, Deval Mehta, Andrew Lin, Derek Chew, Masasi Fujino, Julie Butters, Stephen Nicholls, Zongyuan Ge, Kyung Hoon Cho

发表机构 * AIM For Health Lab, Monash University（莫纳什大学AIM健康实验室）； Department of Data Science and Artificial Intelligence, Faculty of IT, Monash University（莫纳什大学信息技术学院数据科学与人工智能系）； Monash University Victorian Heart Institute（莫纳什大学维多利亚心脏研究所）； School of Computing Technologies, RMIT University（皇家墨尔本理工大学计算技术学院）； National Cerebral and Cardiovascular Center（国立循环器病研究中心）； Department of Cardiology, Chonnam National University Hospital and Medical School（全南大学医院和医学院心脏病学系）

AI总结提出GeoCat网络，通过双编码器与可微几何一致性损失，在IVUS分割中降低边界漂移和拓扑错误，提升临床几何测量精度。

Comments MICCAI2026 Accepted

详情

AI中文摘要

血管内超声（IVUS）管腔和外弹性膜（EEM）分割对于定量评估冠状动脉斑块负荷至关重要。管腔或EEM勾画的误差会直接传播到斑块面积、斑块负荷和几何测量中。然而，优先考虑重叠分数的标准方法常常遭受边界漂移和拓扑错误，导致临床测量不准确。我们提出GeoCat，一个几何一致性网络，使用双笛卡尔-极坐标编码器，结合跨域注意力和时间融合，处理5帧IVUS片段。可微的几何一致性损失直接监督临床相关描述符，包括直径、方向和横截面积。该模型在来自146名患者的12,242张标注帧上训练，这些帧使用两种商用IVUS系统采集。我们使用分割准确性和斑块相关临床指标评估性能，包括Dice/IoU、边界测量（95HD（mm）、ASSD）、拓扑违规率和临床几何误差（dmax/dmin、角度和面积）。在我们的数据集上，GeoCat实现了0.93的Dice，将95HD降低到0.14 mm，并将拓扑违规率降低到1.0%。重要的是，它显著提高了几何保真度，产生0.13-0.16 mm的直径误差和约8度的角度误差，支持可靠的斑块负荷量化。

英文摘要

Intravascular ultrasound (IVUS) lumen and external elastic membrane (EEM) segmentation is important for quantitative coronary plaque burden assessment. Errors in lumen or EEM delineation directly propagate to plaque area, plaque burden and geometric measurements. However, standard methods prioritising overlap scores often suffer from boundary drift and topology errors, leading to inaccurate clinical measurements. We present GeoCat, a geometry-consistent network that processes 5-frame IVUS clips using dual Cartesian-polar encoders with cross-domain attention and temporal fusion. A differentiable geometry consistency loss directly supervises clinically relevant descriptors including diameters, orientations, and cross-sectional areas. The model is trained on 12,242 annotated frames from 146 patients acquired with two commercial IVUS systems. We evaluate performance using both segmentation accuracy and plaque-relevant clinical metrics, including Dice/IoU, boundary measures (95HD (mm), ASSD), topology violation rate, and clinical geometry errors (dmax/dmin, angles, and areas). On our dataset, GeoCat achieves a Dice of 0.93, reduces 95HD to 0.14 mm, and lowers topology violations to 1.0%. Importantly, it significantly improves geometric fidelity, yielding diameter errors of 0.13-0.16 mm and angular errors of ~8 degrees, supporting reliable plaque burden quantification.
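
Some of the quantities named in the abstract can be made concrete with a small sketch. It treats masks as sets of pixel coordinates and assumes the common definition of plaque burden as (EEM area minus lumen area) divided by EEM area; the topology check (lumen contained in the EEM) is one plausible reading of "topology violation", not necessarily the paper's definition. All function names are hypothetical.

```python
def dice(pred: set, truth: set) -> float:
    """Dice overlap between two masks given as sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def plaque_burden(lumen_area: float, eem_area: float) -> float:
    """Fraction of the EEM cross-section occupied by plaque."""
    return (eem_area - lumen_area) / eem_area


def violates_topology(lumen: set, eem: set) -> bool:
    """Assumed rule: the lumen must lie entirely inside the EEM."""
    return not lumen <= eem


# Toy 2x2 frame: the EEM covers all four pixels, the lumen covers two.
eem = {(0, 0), (0, 1), (1, 0), (1, 1)}
lumen = {(0, 0), (0, 1)}
print(plaque_burden(len(lumen), len(eem)), violates_topology(lumen, eem))
```

Because plaque burden is computed from both areas, a small boundary error in either contour shifts it directly, which is the error propagation the abstract points to.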


2606.18734 2026-06-18 eess.SP cs.LG 交叉投稿

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

点云辅助的切线高斯溅射局部统计信道预测

Ye Xue, Yiheng Wang, Xinhua Shao, Qi Yan, Shutao Zhang, Tsung-Hui Chang

AI总结提出点云辅助切线高斯溅射（PC-TGS）框架，通过融合稀疏无线电测量与密集LiDAR几何数据，将角功率谱外推到未测量网格，实现大规模无线数字孪生中的高效信道预测。

详情

AI中文摘要

准确、特定地点的信道信息对于优化下一代无线网络至关重要。在各种方法中，局部统计信道建模（LSCM）通过从参考信号接收功率（RSRP）测量中建模信道多径角功率谱（APS），已成为一种针对高效网络优化的最先进方法。然而，尽管其有效性，LSCM无法在绝大多数没有测量值的位置预测APS，这严重限制了其在大规模真实场景中的适用性。为了解决这一挑战，我们提出了点云辅助切线高斯溅射（PC-TGS），这是第一个通过将稀疏无线电测量与密集的基于LiDAR的几何信息相结合，将APS外推到未测量室外网格的框架。PC-TGS将环境散射体表示为各向异性的3D高斯分布，通过原始点云的松弛均值重新参数化进行初始化和细化。切线平面投影将每个高斯分布精确映射到局部角度域，而深度感知的电磁溅射过程聚合它们的贡献。为了确保实际部署，我们推导了用于APS bin积分的闭式高斯加权平均（GWA），并提供了可证明的误差界。在LiDAR扫描的城市规模数据集（500万个点，6310个RSRP样本）上的评估表明，与最先进的基线相比，PC-TGS在APS和RSRP预测性能上更优，并且在外推APS任务中推理时间更快。这些结果突显了PC-TGS在大规模无线数字孪生中实现几何感知和数据高效信道预测的潜力。

英文摘要

Accurate, site-specific channel information is crucial for optimizing next-generation wireless networks. Among various approaches, localized statistical channel modeling (LSCM), which models the channel multipath angular power spectrum (APS) from reference signal received power (RSRP) measurements, has emerged as a state-of-the-art method tailored for efficient network optimization. However, despite its effectiveness, LSCM cannot predict APS at the vast majority of locations where no measurements are available, which significantly restricts its applicability in large-scale, real-world scenarios. To address this challenge, we present point-cloud-assisted tangent Gaussian splatting (PC-TGS), the first framework to extrapolate APS to unmeasured outdoor grids by integrating sparse radio measurements with dense LiDAR-based geometry. PC-TGS represents environmental scatterers as anisotropic 3D Gaussians, initialized and refined through a relaxed-mean reparameterization of the raw point cloud. A tangent-plane projection accurately maps each Gaussian into the local angular domain, while a depth-aware electromagnetic splatting process aggregates their contributions. To ensure practical deployment, we derive a closed-form Gaussian-weighted average (GWA) for APS bin integration and provide a provable error bound. Evaluations on a LiDAR-scanned city-scale dataset (5M points, 6,310 RSRP samples) demonstrate that PC-TGS achieves better APS and RSRP prediction performance than state-of-the-art baselines and faster inference on the APS extrapolation task. These results highlight the potential of PC-TGS to enable geometry-aware and data-efficient channel prediction in large-scale wireless digital twins.


2606.18824 2026-06-18 cs.CV cs.LG 交叉投稿

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

他们将去哪里？从自我中心视频建模多模态行人机动

Yuxuan Xie, Nicolas Pugeault, Chongfeng Wei, Hubert P. H. Shum, Edmond S. L. Ho

发表机构 * School of Computing Science, University of Glasgow（格拉斯哥大学计算机科学学院）； James Watt School of Engineering, University of Glasgow（格拉斯哥大学詹姆斯·瓦特工程学院）； Department of Computer Science, Durham University（杜伦大学计算机科学系）

AI总结提出MMPM框架，通过行为感知交互模块和基于CVAE的模态感知轨迹预测器，分别建模行人过马路和不过马路两种模式，提升自我中心视角下多模态轨迹预测准确性。

Comments Accepted at The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

详情

AI中文摘要

从自我中心摄像头进行行人轨迹预测具有挑战性，因为它依赖于与车辆和场景上下文的复杂交互以及行人的意图。通过建模行人历史与未来轨迹的相关性和意图，通常会产生多模态（即多个模式）分布。现有的随机预测器通常从单一单峰分布中采样多个未来轨迹，这可能导致次优的“混合模式”轨迹，这些轨迹位于不同的运动模式之间，并在真实场景中变得不合理。在本文中，我们提出MMPM，一种模态感知框架，基于行人的过马路行为将未来轨迹分布分别建模为语义上有意义的模式。MMPM由两个模块组成：行为感知行人交互模块（PIM），通过引入注视、头部和手势来联合捕捉行人-车辆和行人-环境交互；以及基于CVAE的模态感知轨迹预测器（MTP）模块，分别对过马路和不过马路两种模式的未来轨迹分布进行建模。基于查询的解码器进一步在解码过程中强制执行模态一致性。在PIE和JAAD数据集上的实验表明，我们的方法超越了最先进的基线。我们提出的MTP是模型无关的，可以集成到现有框架如BiTrap-NP和SGNet-ED中，以进一步提高未来轨迹预测性能。我们还引入了一种数据驱动的验证协议，将预测与时空一致的真实轨迹匹配，展示了相比先前工作改进的逐帧位移误差。

英文摘要

Pedestrian trajectory prediction from an ego-centric camera is challenging since it depends on complex interactions with vehicles and scene context, as well as the intention of the pedestrian. Modelling the correlation and intent in a pedestrian's historical and future trajectories usually results in a multimodal distribution (i.e. one with multiple modes). Existing stochastic predictors often sample multiple futures from a single unimodal distribution, which can yield sub-optimal 'mixed-mode' trajectories that lie between distinct motion patterns and become implausible in real scenes. In this paper, we propose MMPM, a mode-aware framework that separately models future trajectory distributions into semantically meaningful modes based on the pedestrian's crossing behavior. MMPM consists of two modules: a behavior-aware Pedestrian Interaction Module (PIM) that jointly captures pedestrian-vehicle and pedestrian-environment interactions by introducing gaze, head, and hand-gesture cues, and a CVAE-based Mode-aware Trajectory Predictor (MTP) module that models the future trajectory distributions separately for two modes, crossing and not crossing the road. A query-based decoder further enforces mode consistency during decoding. Experiments on the PIE and JAAD datasets show that our method surpasses state-of-the-art baselines. Our proposed MTP is model-agnostic and can be integrated into existing frameworks such as BiTrap-NP and SGNet-ED to further improve future trajectory prediction performance. We additionally introduce a data-driven validation protocol that matches predictions to spatio-temporally consistent ground-truth trajectories, demonstrating improved frame-wise displacement errors over previous work.


2606.18876 2026-06-18 cs.CV cs.LG 交叉投稿

Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow

光学相干断层扫描中基于轨迹对齐的时间无关流的测试时自适应

Veit Hucke, Thomas Pinetz, Gregor Reiter, Ursula Schmidt-Erfurth, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria（人工智能研究所、医学数据科学中心、维也纳医学大学，奥地利）； Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna, Austria（医学人工智能综合中心、维也纳医学大学，奥地利）； Department of Ophthalmology and Optometry, Medical University of Vienna, Austria（眼科与视光学部、维也纳医学大学，奥地利）； Laboratory for Ophthalmic Image Analysis, Medical University of Vienna, Austria（眼科图像分析实验室、维也纳医学大学，奥地利）

AI总结提出一种基于流匹配的测试时自适应方法，通过直方图匹配和去除时间条件，生成高质量替代图像，在AMD分割中达到最优性能。

Comments Accepted in MICCAI

2606.18932 2026-06-18 astro-ph.EP astro-ph.IM cs.AI cs.LG 交叉投稿

TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches

TransitNet: 一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架

Xingchen Yan, Jian Ge, Qingtian Liu, Kevin Willis, Quanquan Hu, Jiapeng Zhu

发表机构 * Shanghai Astronomical Observatory, Shanghai 200030, China（上海天文台，上海200030，中国）； University of Chinese Academy of Sciences, Yanqi Lake Campus, East Road 1, Huairou, Beijing 101408, China（中国科学院大学，雁栖湖校区，东路1号，怀柔，北京101408，中国）； Science Talent Training Center, Gainesville, FL, 32606 USA（科学人才培训中心，佛罗里达州盖恩斯维尔，32606美国）

AI总结提出紧凑型注意力增强深度学习框架TransitNet，用于低信噪比凌星盲搜索，在SNR 6-8范围内达到95.2%准确率，恢复率93.0%，远超TLS和BLS，且模型仅1.5 MB，推理速度提升12-25倍。

Comments 24 pages, 23 figures, 3 tables, submitted to MNRAS

详情

AI中文摘要

受中长周期地球大小行星观测不完整性的启发，我们提出了TransitNet，一种用于低信噪比凌星盲搜索的紧凑型注意力增强深度学习框架。为了实现盲搜索条件下现实的方法开发和客观的阈值校准，我们开发了一个统一的数据集构建、基准测试和阈值选择框架。在由未见过的Kepler目标构建的恢复基准测试中，TransitNet在具有挑战性的信噪比6-8范围内达到了95.2%的准确率，并优于TLS和BLS，ROC-AUC和PR-AP值分别为0.974和0.982。在一次注入的地球大小和亚地球大小凌星恢复实验中，TransitNet实现了93.0%的恢复率，显著超过TLS（63.1%）和BLS（60.0%）。除了检测，TransitNet还提供了基于注意力的凌星窗口和中点估计。在一个独立评估集上，97.4%的注入凌星被估计的凌星窗口完全覆盖。应用于真实的Kepler观测，该模型成功恢复了所有34个选定的已确认Kepler行星，平均绝对凌星中点误差为1.24小时。该模型结合了约1.5 MB的紧凑体积和高推理效率，相对于CPU-TLS加速约12-25倍，相对于CPU-BLS加速约4-5倍。这些结果表明，TransitNet在测试范围内为低信噪比凌星盲搜索提供了一个准确、可扩展且计算高效的框架，并激励其扩展到更长周期的地球大小行星搜索。

英文摘要

Motivated by the observational incompleteness of intermediate-to-long-period Earth-size planets, we present TransitNet, a compact attention-augmented deep-learning framework for low-SNR transit blind searches. To enable realistic method development and objective threshold calibration under blind-search conditions, we develop a unified dataset construction, benchmarking, and threshold-selection framework. On recovery benchmarks constructed from unseen Kepler targets, TransitNet attains 95.2 percent accuracy in the challenging SNR range of 6 to 8 and outperforms both TLS and BLS, achieving ROC-AUC and PR-AP values of 0.974 and 0.982, respectively. In an injected Earth-size and sub-Earth-size transit recovery experiment, TransitNet achieves a recovery rate of 93.0 percent, substantially exceeding those of TLS (63.1 percent) and BLS (60.0 percent). In addition to detection, TransitNet provides attention-based estimates of transit windows and midpoints. On an independent evaluation set, 97.4 percent of injected transits are fully covered by the estimated transit window. Applied to real Kepler observations, the model successfully recovers all 34 selected confirmed Kepler planets, with a mean absolute transit midpoint error of 1.24 hours. The model combines a compact footprint of about 1.5 MB with high inference efficiency, yielding speed-ups of about 12 to 25 times relative to CPU-TLS and about 4 to 5 times relative to CPU-BLS. These results demonstrate that TransitNet provides an accurate, scalable, and computationally efficient framework for low-SNR transit blind searches in the tested regime and motivate its extension to longer-period Earth-size planet searches.
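
For context on why Earth-size transits fall in the low-SNR regime, the sketch below applies two standard textbook approximations that are not taken from the paper: transit depth ≈ (Rp/Rs)², and, under white noise, SNR ≈ depth / σ · √N for N in-transit points with per-point scatter σ. The radii are nominal values; the noise level and point count are made up.

```python
import math

EARTH_RADIUS_KM = 6_371.0   # nominal mean radius
SUN_RADIUS_KM = 695_700.0   # nominal solar radius


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional flux drop when the planet crosses the stellar disc."""
    return (planet_radius_km / star_radius_km) ** 2


def transit_snr(depth: float, sigma: float, n_in_transit: int) -> float:
    """White-noise SNR estimate for the combined in-transit points."""
    return depth / sigma * math.sqrt(n_in_transit)


depth = transit_depth(EARTH_RADIUS_KM, SUN_RADIUS_KM)
print(f"depth = {depth * 1e6:.1f} ppm")
# Hypothetical light curve: 100 ppm scatter per point, 60 in-transit points.
print(f"SNR   = {transit_snr(depth, 100e-6, 60):.2f}")
```

An Earth-Sun analogue gives a depth of under 100 parts per million, so many in-transit points are needed before the signal stands out from the noise.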


2606.19092 2026-06-18 stat.AP cs.LG 交叉投稿

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

使用马尔可夫决策过程对2型糖尿病护理随访间隔进行上下文感知优化

Parisa Lotfibagha, Kristen Miller, William J. Gallagher, Elizabeth B. Selden, Muge Capan

AI总结提出上下文马尔可夫决策过程模型，利用电子健康记录数据为2型糖尿病患者优化个性化随访间隔，识别低风险和高风险亚群，相比固定间隔策略显著降低预期累积成本。

详情

AI中文摘要

慢性病管理依赖于定期的医患互动来跟踪疾病进展和控制。对于2型糖尿病，当前指南对所有患者规定固定的初级保健随访间隔，忽略了临床轨迹和患者特征的异质性。本研究引入上下文马尔可夫决策过程模型，利用来自10个初级保健诊所的22,154名2型糖尿病患者的电子健康记录数据，优化亚群特定的随访间隔决策。上下文通过以下方式识别：i) 利用主成分分析对代表个体健康轨迹的变量进行降维，以及ii) 通过主成分和额外的患者层面特征使用聚类将患者分配到上下文中。出现了两个不同的上下文，分别代表低风险和高风险亚群。CMDP导出的策略建议：(i) 如果当前就诊的实验室值未测量，则在1个月内随访；(ii) 对于实验室值升高或近期住院，最多3个月；(iii) 对于持续血糖控制，6至12个月，高风险上下文患者的随访间隔更短。最优策略实现了比基准更低的预期累积成本（例如，在高共病上下文中，相对于美国糖尿病协会类似的固定间隔随访策略，CMDP策略降低了约34.8%的成本；在低共病上下文中降低了约6.4%）。这些发现展示了上下文感知方法如何为适应性随访策略提供信息，并有可能通过综合机器学习和概率决策模型来推进初级保健中的慢性病管理。

英文摘要

Chronic disease management relies on regular patient-provider interactions to follow up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower- and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if the lab value at the current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in the high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy). These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.
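
A minimal sketch of how a follow-up policy can be derived from a Markov decision process is given below, using plain value iteration to minimise expected discounted cost. The states, interval actions, costs, and transition probabilities are invented for illustration and are not from the paper; a contextual MDP in the paper's sense would solve one such model per patient context. For simplicity, this sketch also applies the same discount to every action regardless of interval length.

```python
def value_iteration(states, actions, cost, trans, gamma=0.95, tol=1e-9):
    """Return (values, policy) minimising expected discounted cost."""
    values = {s: 0.0 for s in states}
    while True:
        q = {s: {a: cost[s][a] + gamma * sum(p * values[t]
                                             for t, p in trans[s][a].items())
                 for a in actions}
             for s in states}
        new_values = {s: min(q[s].values()) for s in states}
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            policy = {s: min(q[s], key=q[s].get) for s in states}
            return values, policy


# Hypothetical two-state model; actions are follow-up intervals in months.
states = ["controlled", "elevated"]
actions = [1, 3, 6]
cost = {"controlled": {1: 3.0, 3: 1.5, 6: 1.0},
        "elevated": {1: 4.0, 3: 5.0, 6: 8.0}}
trans = {
    "controlled": {1: {"controlled": 0.95, "elevated": 0.05},
                   3: {"controlled": 0.90, "elevated": 0.10},
                   6: {"controlled": 0.85, "elevated": 0.15}},
    "elevated": {1: {"controlled": 0.6, "elevated": 0.4},
                 3: {"controlled": 0.4, "elevated": 0.6},
                 6: {"controlled": 0.2, "elevated": 0.8}},
}
values, policy = value_iteration(states, actions, cost, trans)
print(policy)
```

Comparing `values` under the optimal policy with the cost of a fixed-interval policy is the kind of benchmark comparison the abstract reports.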


2606.19118 2026-06-18 cs.AI cs.LG econ.GN q-fin.EC 交叉投稿

Analysing drivers and interdependencies in European electricity markets using XAI

使用XAI分析欧洲电力市场的驱动因素与相互依赖性

Antoine Pesenti, Aidan O'Sullivan

发表机构 * UCL Energy Institute, University College London, UK（伦敦大学学院能源研究所，英国）

AI总结结合深度神经网络与可解释人工智能（XAI）技术，利用SHAP和SSHAP框架分析39个欧洲竞价区的电价决定因素，发现可再生能源（尤其是太阳能）对电价形成具有重要作用，天然气价格仍是主导驱动因素，且互联互通显著影响价格动态。

Comments 12 pages

详情

AI中文摘要

电力市场本质上是复杂系统，具有强非线性、高维交互以及跨区域日益增长的相互依赖性。虽然深度神经网络（DNN）在电价预测方面表现出强大的能力，但其缺乏可解释性限制了其在理解电价形成潜在驱动因素方面的实用性。本文通过将DNN模型与可解释人工智能（XAI）技术相结合，分析了39个欧洲竞价区电价的决定因素，填补了这一空白。我们采用SHAP（SHapley Additive exPlanations）量化特征贡献，并应用和扩展了SSHAP（一种聚合框架）以提高高维设置下的可解释性。分析表明，可再生能源（尤其是太阳能）在电价形成中发挥着不成比例的重要作用，尽管其在总发电量中占比较低。天然气价格仍然是跨电力市场的主导且一致的驱动因素，而互联互通显著影响价格动态，凸显了欧洲电力系统的强相互依赖性。此外，我们构建了一个合成性的全欧盟电力市场，以探索完全一体化单一价格市场的反事实情景。

英文摘要

Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions and apply and extend SSHAP, an aggregation framework to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.
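
The idea behind SHAP feature attributions can be shown with an exact Shapley computation on a toy model. This is a brute-force sketch that enumerates all feature orderings against a fixed baseline; it does not use the `shap` package, and the toy price function and feature names are assumptions for illustration only.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Averages each feature's marginal contribution over every ordering,
    so the cost grows factorially and it only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(baseline)
        prev = model(z)
        for i in order:
            z[i] = x[i]          # switch feature i from baseline to actual
            cur = model(z)
            phi[i] += cur - prev
            prev = cur
    return [value / len(orders) for value in phi]


# Hypothetical price model with features (gas price, solar output, load).
def toy_price(z):
    gas, solar, load = z
    return 2.0 * gas - 0.5 * solar + 0.1 * gas * load


print(shapley_values(toy_price, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0]))
```

The attributions always sum to the difference between the prediction at `x` and at the baseline, which is what makes them additive explanations.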


2606.19149 2026-06-18 cs.CR cs.LG 交叉投稿

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

OpenAnt：通过代码分解、对抗性验证和动态测试实现LLM驱动的漏洞发现

Nahum Korda, Gadi Evron

AI总结提出OpenAnt系统，结合静态分析与LLM推理，通过代码分解、对抗性验证和动态测试三阶段流水线，在降低误报率的同时发现未知漏洞。

详情

AI中文摘要

在大型代码库中自动发现漏洞仍然具有挑战性：传统静态分析误报率高，而模糊测试等动态方法需要大量基础设施且通常针对狭窄的漏洞类别。大型语言模型（LLM）的最新进展使得对程序行为进行语义推理成为可能，但将LLM应用于仓库级安全分析会引入上下文管理、成本和验证方面的挑战。我们提出了OpenAnt，一个开源漏洞发现系统，它在多阶段流水线中集成了静态程序分析与基于LLM的推理。OpenAnt引入了三种关键技术。首先，代码库被分解为自包含的分析单元，并通过从外部入口点的可达性进行过滤，将分析面减少高达97%，同时保留与攻击相关的代码。其次，候选漏洞通过受限攻击者模拟进行对抗性验证，其中模型在现实攻击者能力下评估可利用性。第三，通过动态验证确认发现结果，其中自动生成利用环境，在沙箱容器中执行，并在使用后丢弃。在包括OpenSSL、WordPress和Flowise在内的广泛使用的开源项目上的评估表明，这种架构可以识别先前未知的漏洞，同时保持可管理的分析成本并大幅减少误报。我们的结果表明，结合语义推理与利用验证的闭环漏洞发现流水线，为可扩展的自动化安全分析提供了一条实用路径。OpenAnt已在Apache 2.0许可下开源，网址为https://github.com/knostic/OpenAnt。

英文摘要

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.
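
The first stage, filtering analysis units by reachability from external entry points, can be sketched as a breadth-first search over a call graph. The graph representation, function names, and example program below are hypothetical and are not taken from the OpenAnt code base.

```python
from collections import deque


def reachable_units(call_graph, entry_points):
    """Return every function reachable from the given entry points.

    `call_graph` maps a function name to the names it calls.
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Made-up program: only code behind the HTTP handler is attacker-reachable.
call_graph = {
    "http_handler": ["parse_request", "render"],
    "parse_request": ["decode_body"],
    "admin_cli": ["rebuild_index"],
}
kept = reachable_units(call_graph, ["http_handler"])
print(sorted(kept))
```

Units outside the returned set (here, `admin_cli` and `rebuild_index`) would be dropped before any LLM-based analysis, which is where the reduction in analysis surface would come from.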

2606.19186 2026-06-18 cs.RO cs.LG cross-list

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

学习标注延迟和误报AEB事件：针对极端类别不平衡和非对称标签噪声的实用系统

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

Affiliations: Li Auto

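AI summary: Presents the first automated AEB annotation framework, which uses specific data augmentation and noise suppression to handle extreme class imbalance and asymmetric label noise, improving recall of delayed/false triggers by 80% and cutting manual workload by 50%.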

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

Journal ref: 2026 IEEE International Conference on Robotics and Automation (ICRA)

AI Chinese abstract

自主紧急制动（AEB）优化依赖于准确标注的真实世界触发事件，特别是揭示系统缺陷的罕见但关键的延迟和误报AEB触发事件。然而，这些少数样本在每天数千次触发事件中占比不到5%，使得大规模人工标注成本过高。我们提出了首个自动化AEB标注框架来解决这一问题。在开发过程中，我们识别出两个严重损害延迟/误报触发标注准确性的基本挑战：（1）极端类别不平衡，其中延迟/误报触发被真实触发淹没；（2）非对称标签噪声，其中误标注的多数样本（真实触发）抑制了少数样本（延迟/误报触发）的学习。为克服这些挑战，我们提出两项关键创新：（1）特定数据增强，通过操纵焦点目标属性、移植自车动态和掩蔽非焦点代理来合成逼真样本；（2）噪声抑制，使用稳定硬度估计和探针引导的自适应阈值来清理误标注的真实触发样本。关键的是，我们将模型部署为具有全栈架构的实用标注系统，从每天数千个AEB事件中高效识别关键的延迟/误报触发。生产结果表明，延迟/误报触发的召回率提高了80%，人工工作量减少了50%。除了直接收益，该系统通过积累高质量标注实现持续自我改进，为车载AEB系统优化奠定了必要的数据基础。

English abstract

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) Extreme class imbalance where delayed/false triggers are overwhelmed by true triggers; (2) Asymmetric label noise where mislabeled majority samples (true triggers) suppress minority samples (delayed/false triggers) learning. To overcome these challenges, we propose two key innovations: (1) Specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate 80% improvement in recall of delayed/false triggers and 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization.

2606.19251 2026-06-18 physics.comp-ph cs.LG physics.flu-dyn cross-list

Acceleration of an algebraic multigrid pressure solver using graph neural networks

使用图神经网络加速代数多重网格压力求解器

Eric Chillón, Artur K. Lidtke, Nguyen Anh Khoa Doan, Bernat Font

Affiliations: Faculty of Mechanical Engineering, Delft University of Technology, The Netherlands; Maritime Research Institute Netherlands, The Netherlands; Department of Aeronautics, Imperial College London, United Kingdom

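AI summary: Proposes an algebraic multigrid smoother based on a graph convolutional isomorphism network that predicts optimal polynomial coefficients to build a sparse pseudo-inverse operator, reducing the number of V-cycles, giving 4% to 37% wall-clock speedups on unstructured grids, and generalizing to meshes up to 128 times larger than those seen in training.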

Comments 23 pages, 11 figures

AI Chinese abstract

求解压力-泊松方程仍然是非结构化不可压缩流求解器的主要计算瓶颈，这主要是由于传统线性求解器对网格不规则性固有的敏感性。本文引入了一种数据驱动的代数多重网格（AMG）平滑器，该平滑器使用改进的图卷积同构网络（GCIN）。图神经网络预测最优多项式系数，以在不同网格拓扑上构造稀疏伪逆算子。优化系数以减少每次V-cycle迭代后的残差。通过直接从稀疏系数矩阵捕获系统的代数结构，所提出的方法在适应非结构化网格中的局部各向异性的同时，保持了求解器的线性性。我们的框架通过减少达到给定容差所需的V-cycle次数，并在不同基准测试中实现4%到37%的墙钟加速，展示了显著的性能提升。值得注意的是，该模型在比训练时所见大128倍的网格上保持效率，并在未见过的工业相关问题上（如AirfRANS数据集）加速求解器收敛，表现出鲁棒的泛化能力。

English abstract

Solving the pressure-Poisson equation remains the primary computational bottleneck in incompressible unstructured flow solvers primarily due to the inherent sensitivity of traditional linear solvers to mesh irregularities. This work introduces a data-driven algebraic multigrid (AMG) smoother that uses a modified graph convolutional isomorphism network (GCIN). The graph neural network predicts optimal polynomial coefficients to construct a sparse pseudo-inverse operator across diverse grid topologies. The coefficients are optimized to reduce the residual after each V-cycle iteration. By directly capturing the algebraic structure of the system from the sparse coefficient matrix, the proposed method maintains the solver's linearity while adapting to local anisotropies in unstructured grids. Our framework demonstrates significant performance gains by reducing the number of V-cycles required for a given tolerance and delivering wall-clock speedups from 4% to 37% across diverse benchmarks. Notably, the model exhibits robust generalization by maintaining efficiency on meshes up to 128 times larger than those seen in training, and by accelerating the solver's convergence on unseen industry-relevant problems such as the AirfRANS dataset.
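
The idea of a polynomial smoother can be illustrated with a minimal sketch. It is not the authors' implementation: the coefficients below are picked by hand as a stand-in for what the graph neural network would predict, the 1D Poisson matrix is a toy substitute for a pressure-Poisson system, and all function names are hypothetical. One sweep applies x <- x + p(A)(b - A x), where p(A) = sum_k c_k A^k plays the role of the approximate inverse.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a small list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def residual(A, b, x):
    """r = b - A x."""
    return [bi - axi for bi, axi in zip(b, matvec(A, x))]

def poly_apply(A, coeffs, r):
    """Evaluate (sum_k coeffs[k] * A^k) r with Horner's scheme."""
    y = [coeffs[-1] * ri for ri in r]
    for c in reversed(coeffs[:-1]):
        y = [ayi + c * ri for ayi, ri in zip(matvec(A, y), r)]
    return y

def smooth(A, b, x, coeffs):
    """One smoothing sweep: x <- x + p(A) (b - A x)."""
    r = residual(A, b, x)
    return [xi + di for xi, di in zip(x, poly_apply(A, coeffs, r))]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))

def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, a toy stand-in for a Poisson system."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

A = poisson_1d(8)
b = [1.0] * 8
x0 = [0.0] * 8
# Hand-picked assumption: p(A) = I - 0.25 A, which equals two damped
# Richardson sweeps with omega = 0.5; a learned model would supply these.
coeffs = [1.0, -0.25]
x1 = smooth(A, b, x0, coeffs)
print(norm(residual(A, b, x0)), norm(residual(A, b, x1)))
```

Because the update is a fixed polynomial in A applied to the residual, the sweep stays linear in the right-hand side, which is the property the abstract highlights.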

2606.19253 2026-06-18 cs.CV cs.AI cs.LG cs.RO cross-list

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas: 通过全景重投影实现3D场景理解

Bartłomiej Baranowski, Dave Zhenyu Chen, Matthias Nießner

Affiliations: Technical University of Munich; Huawei

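AI summary: Proposes OneCanvas, which aggregates multi-view patch features onto a panoramic canvas by reprojecting them with depth and camera pose, avoiding complex geometry encoders and large training budgets while reaching state-of-the-art accuracy on benchmarks such as SQA3D.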

Comments Project page: https://baranowskibrt.github.io/onecanvas/

AI Chinese abstract

现有的视觉语言模型（VLM）中的3D场景理解方法要么依赖复杂的、模型特定的几何编码器，要么为了追求空间推理而需要大量的训练预算。相反，OneCanvas将所有视图的补丁特征聚合到一个单一的等距柱状全景画布上。具体来说，每个补丁利用其深度和相机位姿被反投影到3D世界坐标，然后根据从画布原点看到的该点的连续经度和纬度放置在画布上，无需对重叠视图进行光栅化或聚合。补丁的度量坐标的3D位置嵌入被添加到其特征中，从而恢复了将世界位置压缩到角度画布坐标时丢失的深度。因此，来自所有帧的补丁共享一个空间坐标系，无需融合或对主干网络进行重大架构修改。预训练的VLM将此表示视为普通图像。由于画布可以以任何感兴趣的姿态为中心，相同的表示直接支持从特定视角进行情境推理，这是机器人和具身AI中的常见需求。得益于这种表示，我们还可以引入空间预训练课程：通过程序化地将从真实图像中提取的对象的补丁特征放置在原本空白的画布上的选定3D世界位置，我们生成了涵盖广泛空间推理任务的即时监督，并控制答案分布以减少空间推理捷径。OneCanvas在SQA3D和VSI-Bench上达到了最先进的准确率，并在SPBench上泛化到分布外数据，其训练计算量比最强竞争方法少一个数量级。

English abstract

Existing approaches to 3D scene understanding in Vision-Language Models (VLMs) either rely on complex, model-specific geometry encoders or large training budgets in pursuit of spatial reasoning. Instead, OneCanvas aggregates patch features from all views onto a single equirectangular panoramic canvas. Namely, each patch is unprojected to a 3D world coordinate using its depth and camera pose, then placed on the canvas at the continuous longitude and latitude of that point as seen from the canvas origin, with no rasterization or aggregation across overlapping views. A 3D position embedding of the patch's metric coordinates is added to its feature, restoring the depth lost when collapsing the world position to an angular canvas coordinate. Patches from all frames thus share one spatial coordinate system with no fusion or major architectural modifications of the backbone. The pretrained VLM consumes this representation as if it were an ordinary image. Because the canvas can be centered on any pose of interest, the same representation directly supports situated reasoning from a specific viewpoint, a common requirement in robotics and embodied AI. Thanks to this representation, we can also introduce a spatial pretraining curriculum: by procedurally placing patch features of objects, drawn from real images, at chosen 3D world positions on an otherwise empty canvas, we generate on-the-fly supervision spanning a broad range of spatial reasoning tasks, with answer distributions controlled to reduce spatial reasoning shortcuts. OneCanvas achieves state-of-the-art accuracy on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using an order of magnitude less training compute than the strongest competing methods.
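
The reprojection step described in the abstract can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes a pinhole camera with intrinsics fx, fy, cx, cy, a camera-to-world pose given as a rotation matrix and translation, and an axis convention (+z forward, +x right, latitude measured toward +y) that the abstract does not specify. All function names are hypothetical.

```python
import math

def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at metric depth into camera coordinates."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

def camera_to_world(p_cam, rotation, translation):
    """Apply a camera-to-world pose: p_world = R p_cam + t (R as three rows)."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def canvas_coords(p_world, origin):
    """Continuous longitude, latitude (radians) and range seen from the canvas origin."""
    dx, dy, dz = (p_world[i] - origin[i] for i in range(3))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        raise ValueError("point coincides with the canvas origin")
    lon = math.atan2(dx, dz)   # assumed convention: +z forward, +x right
    lat = math.asin(dy / r)    # assumed convention: latitude toward +y
    return lon, lat, r

def to_canvas_pixels(lon, lat, width, height):
    """Map angles to continuous equirectangular canvas coordinates."""
    x = (lon / (2.0 * math.pi) + 0.5) * width
    y = (0.5 - lat / math.pi) * height
    return x, y

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_cam = unproject(320.0, 240.0, 3.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = camera_to_world(p_cam, identity, (1.0, 0.0, 0.0))
lon, lat, r = canvas_coords(p_world, origin=(1.0, 0.0, 0.0))
print(to_canvas_pixels(lon, lat, 1024, 512), r)
```

The range r returned here is exactly what the angular coordinates discard; the abstract restores it through a 3D position embedding added to each patch feature, which this sketch does not attempt.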

2606.19302 2026-06-18 physics.ao-ph cs.LG cross-list

Optimal scenario design for climate emulation

气候模拟的最优情景设计

Christopher B. Womack, Shahine Bouabid, Andrei Sokolov, Popat Salunke, Glenn Flierl, Sebastian D. Eastham, Noelle E. Selin

Affiliations: Department of Aeronautics and Astronautics, Massachusetts Institute of Technology; Center for Sustainability Science and Strategy, Massachusetts Institute of Technology; Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology; Brahmal Vasudevan Institute for Sustainable Aviation, Department of Aeronautics, Imperial College London; Institute for Data, Systems, and Society, Massachusetts Institute of Technology

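AI summary: To address the limited generalization of climate emulators, proposes optimizing the training-data scenarios with a differentiable simple climate model, so that emulators trained on a small dataset outperform those trained on the standard scenario set.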

AI Chinese abstract

随着深度学习在物理系统中的普及，改进泛化性的努力主要集中在设计嵌入物理约束的架构上。然而，对于机器学习替代气候模型（模拟器），我们表明现有情景中用于生成训练数据的低结构多样性限制了预测能力。在此，我们研究是否可以优化训练数据集本身以提高泛化性。我们引入一种方法创建数据集，使模拟器能够泛化到训练数据中未出现的新结构情景。我们使用可微简单气候模型（SCM）计算模拟器损失对训练数据扰动的敏感性，迭代更新训练数据以最大化模拟器技能。对于SCM，以这种方式优化的一个情景训练出的模拟器优于在六个标准ScenarioMIP路径上训练的模拟器。尽管训练数据集更小，但我们实现了更高的预测技能，发现我们的模拟器成功隔离了不同气候强迫因子（如温室气体与气溶胶）的独特物理行为，而无需单强迫运行。然后我们证明，使用SCM优化的情景驱动中等复杂度气候模型时，产生的训练数据集比在ScenarioMIP输出上训练得到更熟练的模拟器。我们的结果表明，在运行全尺度气候模型的计算受限环境中，生成少量动态丰富的情景比扩展传统排放路径集对模拟和表征系统响应具有更大的边际价值。

English abstract

As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.

2606.19329 2026-06-18 astro-ph.IM cs.LG cross-list

The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

钱德拉-盖亚对应体星表：利用机器学习解决钱德拉源星表中X射线源与盖亚源的多重匹配歧义

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo

Affiliations: Center for Astrophysics Harvard & Smithsonian, 60 Garden St, Cambridge MA 02138, USA; Harvard John A. Paulson School of Engineering; Universidad del Rosario, School of Engineering, Science; The NSF AI Institute for Artificial Intelligence; New York University, Courant Institute, 60 5th Avenue, New York NY, USA; Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213; New College of Florida, 5800 Bayshore Road, Sarasota, FL 34243, USA; Astrophysics Laboratory, 3251 Hanover St, Palo Alto, CA 94304, USA

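AI summary: Proposes a machine-learning framework that uses source properties (magnitudes, colors, distances) to resolve ambiguous cross-matches between the Chandra Source Catalog and Gaia, finding counterparts for ~113k X-ray sources and finding no counterpart for ~20k sources that separation-based cross-matching does match, about half of which are attributed to chance coincidences.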

Comments Accepted to The Astrophysical Journal. Website: https://www.samuelperezdi.com/chandragaia/

AI Chinese abstract

我们提出了一个框架，用于将钱德拉源星表（CSC v2.1）中的源与盖亚数据发布3中的光学源进行交叉匹配。与纯空间方法不同，我们使用源属性（如星等、颜色和距离）来识别真实对应体、检测偶然重合，并在存在多个合理候选者时解决歧义。我们使用NWAY（一种考虑位置误差和源密度的贝叶斯交叉匹配框架）定义高置信度匹配的训练集。我们在两个星表的多种特征上训练梯度提升分类器（LightGBM）。在约25.4万个独特X射线源中，我们为约11.3万个源找到了对应体，其中约7000个源存在多个合理对应体。对于约2万个基于分离的交叉匹配能找到匹配的源，我们未找到对应体，并将其中的一半归因于偶然重合。我们在钱德拉猎户座超深项目（COUP）上验证了该流程，机器学习匹配在不使用任何位置信息的情况下再现了NWAY交叉匹配的95%。我们发布了约11.3万个钱德拉-盖亚对应体的星表，以及约7000个替代匹配和约2万个歧义NWAY关联，以支持未来对钱德拉和盖亚均可探测到的源进行种群研究。我们讨论了局限性，并提供了该框架的泛化版本，适用于其他交叉匹配场景。

English abstract

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~254k unique X-ray sources, we find counterparts for ~113k sources, of which plausible multiple counterparts are found for ~7k. We find no counterparts for ~20k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~113k Chandra-Gaia counterparts, together with ~7k alternative matches and ~20k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.

2606.18535 2026-06-18 stat.ME cs.LG math.ST stat.TH cross-list

Shrinkage priors for Bayesian Substitute Confounders

贝叶斯替代混杂因子的收缩先验

Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh

Affiliations: School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Department of Statistics, Rice University, USA; Lawrence Berkeley National Laboratory, USA; Statistical Sciences Group, Los Alamos National Laboratory, USA

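AI summary: To counter over-encoding by substitute confounders in multi-cause observational studies, proposes a Bayesian factor assignment framework that uses shrinkage priors to learn sparse substitute confounders retaining coarse multi-cause dependence, and states posterior concentration and overlap-preserving geometry conditions under which mean potential outcomes are estimated consistently.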

AI Chinese abstract

多原因观察研究通过原因间的依赖结构包含关于未测量混杂的信息。然而，对未观测混杂的直接插补通常比学习一个低维替代得分更复杂，该得分保留了稳定因果调整所需的共享分配变异。去混杂因子（Wang and Blei, 2019）及相关替代混杂因子方法利用了这一思想，但灵活的分配模型可以拟合原因的联合分布，同时产生过度编码处理向量、破坏重叠或捕获单原因变异的得分。我们开发了一个贝叶斯因子分配框架，用于学习稀疏替代混杂因子，该框架通过收缩先验保留粗粒度的多原因依赖。该理论在后验集中性、因子得分收缩和保留重叠的分配几何层面进行阐述，因此不依赖于特定的收缩先验。在这些条件下，当相应的潜变量识别假设成立时，所提出的回归调整估计量对平均潜在结果是一致的。收缩先验为潜在结构学习提供了自然工具：它们倾向于由多个原因支持的低维因子，阻止有效的单原因因子，并通过渐进收缩诱导潜在因子的排序。合成实验说明了信号强度、结果有效性和几何感知正则化的作用。在阿尔茨海默病神经影像学倡议（ADNI）基线分析中，稀疏替代得分恢复了对侵入性脑脊液生物标志物直接条件调整的大部分效果，而重叠崩溃诊断则识别出拟合因子何时简化为单个观测测量。

English abstract

Multi-cause observational studies contain information about unmeasured confounding through the dependence structure among causes. However, literal imputation of the unobserved confounder is often more complex than learning a lower-dimensional substitute score that preserves the shared assignment variation needed for stable causal adjustment. The deconfounder (Wang and Blei, 2019) and related substitute confounder methods exploit this idea, but flexible assignment models can fit the joint distribution of the causes while producing scores that over-encode the treatment vector, collapse overlap, or capture single-cause variation. We develop a Bayesian factor assignment framework for learning sparse substitute confounders that retain coarse multi-cause dependence with shrinkage priors. The theory is stated at the level of posterior concentration, factor score contraction, and overlap-preserving assignment geometry and therefore does not rely on a particular shrinkage prior. Under these conditions, the proposed regression-adjusted estimators are consistent for mean potential outcomes when the corresponding latent variable identification assumptions hold. Shrinkage priors provide a natural tool for latent structural learning: they favour low-dimensional factors supported by multiple causes, discourage effectively single-cause factors, and induce an ordering of the latent factors through progressive shrinkage. Synthetic experiments illustrate the roles of signal strength, outcome validity, and geometry-aware regularization. In an Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline analysis, sparse substitute scores recover much of the adjustment obtained by directly conditioning on invasive cerebrospinal-fluid biomarkers, while collapse diagnostics identify when fitted factors reduce to individual observed measurements.

2606.19270 2026-06-18 eess.IV cs.LG physics.med-ph cross-list

Beyond Algorithms: Conceptual Innovation in Medical Imaging AI

超越算法：医学影像人工智能中的概念创新

Mark A. Anastasio

Affiliations: Mallinckrodt Institute of Radiology and Department of Electrical & Systems Engineering, Washington University in St. Louis

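AI summary: Distinguishes algorithmic from conceptual innovation, argues that current incentive structures over-reward algorithmic novelty while undervaluing conceptual contributions, uses medical imaging AI examples to show how weak conceptual grounding leads to misaligned objectives and limited clinical impact, and offers recommendations for fostering conceptual innovation.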

AI Chinese abstract

人工智能推动了医学影像研究的快速发展，产生了日益复杂的算法，并在基准任务上稳步改进。然而，这种以算法为中心的发展轨迹也揭示了一个日益加剧的不平衡：虽然计算方法快速进步，但定义成像任务、评估指标和临床意义的概念基础有时仍未得到充分审视。在这篇观点文章中，我们区分了算法创新（专注于在固定问题定义内改进计算实现和性能）与概念创新（重新定义提出的问题、衡量成功的方式以及方法在临床上的相关性）。我们认为，当前的激励结构、培训路径和发表规范不成比例地奖励算法新颖性，尤其是对早期职业研究者而言，而有时低估了对科学成熟和临床转化至关重要的概念贡献。通过医学影像AI的代表性例子，我们展示了概念基础不足如何导致目标错位、泛化脆弱以及现实世界影响有限。最后，我们为研究者、导师、审稿人和期刊提出了可操作的建议，以更好地识别、支持和整合概念创新与算法进步。

English abstract

Artificial intelligence has driven rapid progress in medical imaging research, producing increasingly sophisticated algorithms and steady improvements on benchmark tasks. However, this algorithm-centric trajectory has also revealed a growing imbalance: while computational methods advance rapidly, the conceptual foundations that define imaging tasks, evaluation metrics, and clinical meaning sometimes remain underexamined. In this Perspective, we distinguish algorithmic innovation, which focuses on improving computational implementations and performance within a fixed problem definition, from conceptual innovation, which reframes what problems are posed, how success is measured, and why an approach is clinically relevant. We argue that prevailing incentive structures, training pathways, and publication norms disproportionately reward algorithmic novelty, particularly for early-career researchers, while at times undervaluing conceptual contributions that are essential for scientific maturation and clinical translation. Through representative examples from medical imaging AI, we show how insufficient conceptual grounding can lead to misaligned objectives, fragile generalization, and limited real-world impact. We conclude with actionable recommendations for researchers, mentors, reviewers, and journals to better recognize, support, and integrate conceptual innovation alongside algorithmic advances.

1. Deep learning architectures and training methods (10 papers)

A physical adaptive material motor unit neural network: a hygromorph composite material machine

Starter-Iterator Neural Operator: A Unified Architecture for High-Fidelity Forward and Inverse PDE Problems

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

A Neural Network Framework for Geodesic-Like Curve Computation on Parametric Surfaces

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Adaptive Speech-to-Spike Encoding for Spiking Neural Networks

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

2. Representation learning, self-supervised and contrastive learning (2 papers)

Compact Geometric Representations of Hierarchies

Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory

3. Reinforcement learning and sequential decision-making (5 papers)

Sequential Hiring of Contingent Workers Through Learning-Based Optimization

N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

Model-Free Reinforcement Learning Control for Resilient Cyber-Physical Systems

4. Generative models and probabilistic modeling (5 papers)

Stochastic Thermodynamics and SDE-based Generative Models

Structural MRI Synthesis for Alzheimer's Disease via Conditional Diffusion on Anatomical Masks

Closing the Loop: PID Feedback Control for Interpretable Activation Steering in Symbolic Music Generation

Approximate Structured Diffusion for Sequence Labelling

Sumi: Open Uniform Diffusion Language Model from Scratch

5. Optimization, generalization, and theoretical analysis (8 papers)

Exponentially many initializations to avoid barren plateaus

Toward Simultaneously Optimal Regret in U-Calibration

Fair Online Resource Allocation

Learning Augmented Exact Exponential Algorithms

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Wasserstein Policy Learning for Distributional Outcomes

On Local Population-Risk Certificates

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

6. Efficient learning, compression, and deployment (3 papers)

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

7. Federated learning, privacy, and security (3 papers)

TIGER: Inverting Transformer Gradients via Embedding-Subspace Distance Optimization

Lifecycle-Aware Dynamic Analysis for Secure ML Model Execution

Giskard: Byzantine Robust and Confidential Aggregation for Large-Scale Decentralized Learning

8. Robustness, uncertainty, and trustworthy learning (4 papers)

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation

9. Transfer, meta-, and continual learning (1 paper)

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

10. Datasets, benchmarks, and evaluation (13 papers)

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings

Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models

ForecastBench-Sim: A Simulated-World Forecasting Benchmark

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Quantifying and Auditing LLM Evaluation via Positive--Unlabeled Learning

When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

11. Machine learning applications (21 papers)

Comprehensive pKa Data Augmentation from Limited Real Data through an Engineered Models-Quantum Framework

Integrating Multi-Label Classification and Generative AI for Scalable Analysis of User Feedback

Predicting the Neutrino Mass Ordering Using Neural Networks

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs

CAOA -- Completion-Assisted Object-CAD Alignment

Modeling Doppler Shifts in Radial-Velocity Data with Deep Learning toward Earth-mass Exoplanet Detection

Leveraging Energy Features for Surface Classification with Deep Learning: A Comparative Analysis Across Three Independent Datasets

Clinically Aligned Geometry Constraints for Robust IVUS Vessel Boundary Segmentation

Point-Cloud-Assistant Localized Statistical Channel Prediction by Tangent Gaussian Splatting

Where Will They Go? Modelling Multimodal Pedestrian Manoeuvres from Ego-centric Videos

Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow

TransitNet: A Compact Attention-Augmented Deep Learning Framework for Low-SNR Transit Blind Searches

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

Analysing drivers and interdependencies in European electricity markets using XAI