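arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.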

2606.13095 2026-06-12 eess.AS cs.SD new submission

Balancing ASR and diarization in end-to-end LLMs for multi-talker speech recognition

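Naijun Zheng, Yuke Lin, Sanli Tian, Mengtian Li, Zhiwei Lin, Longshuai Xiao, Dandan Tu

AI summary: Proposes a dual-encoder architecture, a feature interleaving format, a length-aware speaker ID loss, and an adaptive-threshold ASR loss strategy to train an LLM-based system efficiently with limited real-recorded data while balancing ASR and speaker diarization, achieving relative improvements of 18% on AliMeeting and 24% on Aishell4.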

Comments: Accepted in Interspeech 2026

Abstract

Multi-talker speech recognition is often addressed by combining automatic speech recognition (ASR) and speaker diarization in a pipeline system. Recently, LLM-based approaches have shown promise by jointly modeling semantic and speaker information, but they typically require large-scale multi-talker corpora that are costly to annotate. In this paper, we investigate how to efficiently train an LLM-based system with limited real-recorded data while maintaining high accuracy in speaker attribution. We propose several strategies: (1) a dual-encoder architecture to extract semantic and speaker features, (2) a feature interleaving format to merge these features as the inputs to the LLM, (3) a length-aware speaker ID loss to enhance diarization capability, and (4) an adaptive threshold strategy for ASR loss computation to mitigate hallucinations caused by speech overlaps. These strategies balance training between ASR and diarization tasks. Our system outperforms open-source baseline approaches, achieving relative improvements of 18% on the AliMeeting corpus and 24% on the Aishell4 corpus.

2606.13017 2026-06-12 q-bio.NC cs.LG new submission

Deep Sleep Classification via EEG Signal Criticality: A Passive BCI Approach for Sleep-Improvement Neurofeedback

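Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

AI summary: Using criticality features extracted with detrended fluctuation analysis (DFA), this study identifies deep sleep (N3) with a Naive Bayes classifier at a balanced accuracy of 87.17%, providing an efficient sensing mechanism for state-dependent neurofeedback in passive brain-computer interfaces.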

Comments: 7 pages, 3 figures, accepted for publication in the Proceedings of the 10th Graz Brain-Computer Interface Conference 2026, Graz, Austria, September 14-17, 2026

Abstract

Automated sleep staging is a fundamental application of passive Brain-Computer Interfaces (pBCI), decoding spontaneous neural states to enable closed-loop interventions independent of user intent. This study evaluates criticality features derived from Detrended Fluctuation Analysis (DFA) for the specific identification of deep sleep (N3). We analyzed 347,232 EEG epochs from 290 older women using UMAP manifold learning to visualize state transitions. Subsequently, six classifiers were benchmarked via 10-fold cross-validation, using balanced accuracy to determine the optimal "state-sensing" engine for [...]. Naive Bayes achieved the highest mean balanced accuracy (87.17% ± 0.24%), significantly outperforming a fully connected deep neural network (FNN: 81.58%) and Random Forest (80.97%). Linear models (LDA: 57.21%; SVM: 51.01%) performed poorly, indicating that DFA-derived criticality features reside on a distinct, non-linear manifold. Probabilistic decoding of EEG criticality provides a high-accuracy sensing mechanism for pBCIs. This robust classification pipeline supports the development of state-dependent neurofeedback, such as targeted auditory stimulation, to enhance cognitive recovery.
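
The metric reported in this abstract, balanced accuracy, is the mean of per-class recall, so a rare class such as N3 counts as much as the far more frequent remaining epochs. A minimal Python sketch of that definition follows; the helper name `balanced_accuracy` and the toy labels are made up for illustration and do not come from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; every class weighs the same regardless of size."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): 8 non-N3 epochs and 2 N3 epochs, one N3 epoch missed.
y_true = ["other"] * 8 + ["N3"] * 2
y_pred = ["other"] * 8 + ["N3", "other"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

On these toy labels plain accuracy is 0.9 while balanced accuracy is 0.75, because recall is 1.0 on the majority class but only 0.5 on N3.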

2606.12838 2026-06-12 q-bio.QM cs.AI cs.LG q-bio.GN new submission

OCOO-T: A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

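Danning Jiang, Zheming An, Yalong Zhao, Lipeng Lai

AI summary: Proposes OCOO-T, a minimalist flow-matching-based virtual cell model that uses continuous-time denoising and adaptive layer normalization to achieve state-of-the-art transcriptional perturbation prediction on multiple benchmarks.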

Comments: 22 pages, 6 figures

Abstract

Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

2606.12654 2026-06-12 stat.ME cs.LG stat.ML new submission

Computationally tractable robust differentially private mean estimation

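Kelly Ramsay

AI summary: Proposes a new differentially private mean estimator called the "balloon mean", which achieves computational tractability, robustness, and zero-concentrated differential privacy through iterative clipping on expanding Mahalanobis-distance balls, with theoretical guarantees on statistical performance and robustness under heavy-tailed and contaminated elliptical models.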

2606.12623 2026-06-12 stat.AP cs.LG new submission

Estimating Individualized Treatment Effects in Acute Ischemic Stroke with Causal Transformation Models (TRAM-DAG): A Multi-Centre Observational Study with External RCT Validation

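Oliver Dürr, Lisa Herzog, Pascal Bühler, Susanne Wegener, Beate Sick

AI summary: Proposes causal transformation models (TRAM-DAG) for estimating individualized treatment effects in acute ischemic stroke patients; after fitting on observational data, validation on an RCT population shows that the average of the estimates is consistent with the ATE and that the estimates correctly rank patients by outcome.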

Abstract

Personalized medicine in acute ischemic stroke requires moving beyond average treatment effects (ATE) to individualized treatment effect (ITE) estimates to support treatment decisions. In acute ischemic stroke, mechanical thrombectomy has been shown to be more effective on average than lysis in randomized controlled trials (RCTs), such as the MR CLEAN study. We aim to identify which individual patients benefit most from mechanical thrombectomy compared to lysis. The outcome of interest is the modified Rankin Scale (mRS) at three months, an ordinal measure of functional disability (0: no symptoms, 6: death). We demonstrate that causal transformation models on directed acyclic graphs (TRAM-DAG) can be used for ITE estimation after being fitted on observational MAGIC multi-center stroke patient data. To ensure comparability with the MR CLEAN population, which we use for validation, we train the TRAM-DAG on a MAGIC sub-population with NIHSS at admission >= 6, corresponding to one inclusion criterion of MR CLEAN. The fitted model is then used to estimate ITEs for stroke patients in the MR CLEAN population. While these ITE estimates cannot be confirmed experimentally, we show that their average is consistent with the trial's reported ATE. Furthermore, the ITE estimates correctly rank trial patients by their observed frequency of a good outcome (mRS at three months <= 2). These findings support the use of causal models like TRAM-DAG for personalized decision-making in stroke care and highlight their ability to bridge the gap between observational evidence and clinical trials.

2606.12471 2026-06-12 stat.ML cs.CL cs.ET cs.LG new submission

Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency

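Seth Dobrin, Łukasz Chmiel

AI summary: Proposes the Physics-Grounded Symbolic Architecture (PGSA) and proves that it achieves exact linear identifiability and near-infinite temporal consistency in non-Gaussian dynamical systems, overcoming the Gaussian boundary that limits statistical world models.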

Comments: Pre-print

Abstract

Klindt, LeCun, and Balestriero (arXiv:2605.26379) proved that Joint-Embedding Predictive Architectures (JEPAs) achieve linear identifiability, the linear recovery of the world's true latent variables, if and only if the world's latent dynamics follow a Gaussian, stationary process. This Gaussian boundary implies a fundamental limit on temporal consistency: for any non-Gaussian physical system, the representation error of a statistical World Model grows monotonically with time. We prove that this limit is an artifact of the statistical alignment mechanism, not a property of World Models in general. We introduce the Physics-Grounded Symbolic Architecture (PGSA) and prove three results: (1) a PGSA achieves exact linear identifiability for all physical regimes, regardless of the latent distribution; (2) the per-step error of a PGSA is bounded by numerical precision alone; and (3) as a direct consequence, a PGSA maintains temporal consistency for an unbounded number of transitions, a property we term near-infinite temporal consistency. We further prove that statistical World Models cannot achieve this property for any non-Gaussian system, regardless of model capacity or the volume of training data. The algebraic cores of four of the theorems are formalized in Lean 4 with Mathlib4 v4.31.0 (zero sorry placeholders); the Klindt et al. converse is taken as an external premise. The contrast establishes that symbolic grounding in the causal generator of the world's dynamics is the sufficient condition and, in non-Gaussian regimes, the only condition for near-infinite temporal consistency.

2606.13671 2026-06-12 cs.LG new submission

Understanding Truncated Positional Encodings for Graph Neural Networks

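James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri

AI summary: Studies how truncated positional encodings (such as the first k eigenspaces or powers of the adjacency matrix) affect the expressive power of graph neural networks; proves that several families of positional encodings differ fundamentally in expressive power under truncation and that truncated spectral encodings are no longer stronger than the 1-WL test, and shows experimentally that a mix of truncated encodings outperforms any single family.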

Comments: 28 pages, 4 figures, ICML 2026

Abstract

Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based (polynomials of the adjacency matrix) - are theoretically equivalent in expressive power, with expressivity between the 1-WL and 3-WL tests. However, this equivalence assumes the GNN uses the "complete" version of these PEs, which requires $O(n^3)$ time and space complexity. Instead, practitioners commonly use truncated variants of these encodings, such as the first $k$ eigenspaces or powers of the adjacency matrix. However, the theoretical properties of these truncated PEs are unknown. In this work, we initiate the study of these truncated PEs. Theoretically, we show that, under truncation, several families of PEs are fundamentally different in expressive power. As a corollary, we show that truncated spectral PEs are no longer stronger than the 1-WL test. We also study a family of spectral PEs, the $k$-harmonic distances, to highlight the differences in expressive power of even closely related truncated PEs. Finally, we experimentally show that a mix of truncated PEs is preferable to any single family on real-world datasets.
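
To make "truncated" concrete for the walk-based family mentioned in this abstract: one common choice, assumed here and not necessarily the paper's exact definition, gives each node the diagonal entries of the first k powers of the adjacency matrix, that is, its closed-walk counts up to length k. The function names and the two toy graphs below are illustrative only.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_walk_pe(adj, k):
    """For each node i, return [(A^1)_ii, ..., (A^k)_ii].

    (A^p)_ii counts closed walks of length p that start and end at node i.
    Stopping at k powers is the truncation; the complete encoding would use
    powers up to the number of nodes.
    """
    n = len(adj)
    power = [row[:] for row in adj]      # holds A^p, starting with p = 1
    features = [[] for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            features[i].append(power[i][i])
        power = matmul(power, adj)
    return features


# Hypothetical toy graphs: a triangle and a 3-node path.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

print(truncated_walk_pe(triangle, 3))
print(truncated_walk_pe(path, 3))
```

With k = 2 every triangle node and the middle node of the path receive the same feature vector, [0, 2]; the third power separates them (2 versus 0), which illustrates how the truncation level changes what the encoding can distinguish.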

2606.13658 2026-06-12 cs.AI new submission

Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization

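Marianna Bergamaschi Ganapini, Massimo Chiriatti, Enrico Panai, Giuseppe Riva

AI summary: Compares three frameworks of AI cognition, argues that System 0 has a distinctive theoretical status, and introduces the concept of "cognitive colonization": AI systems can embed external interests into the architecture of the self, an influence that is hard to detect.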

2606.13655 2026-06-12 cs.CV cs.GR new submission

Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

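Jen-Hao Cheng, Yipeng Wang, Hao Zhang, Gengshan Yang, Jenq-Neng Hwang

AI summary: Proposes Flex4DHuman, a multi-view video diffusion model conditioned on relative camera poses that converts monocular or sparse multi-view video into dense multi-view video without explicit geometry priors, for use in 4D Gaussian splatting reconstruction.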

Comments: 18 pages, 8 figures. Code, and multi-view caption dataset available

Abstract

We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike prior human-centric methods that rely on skeletons, depth maps, normals, or rendered target-view geometry, Flex4DHuman requires no explicit geometry priors and instead conditions generation through relative camera-pose positional encoding. The generated videos can be directly ingested by downstream reconstruction pipelines to create dynamic 4D Gaussian splats. Built on the Wan 2.1 1.3B text-to-video model, Flex4DHuman preserves the backbone architecture and encodes camera and view information through a five-axis positional encoding that extends spatio-temporal RoPE with view indices and continuous SE(3) relative camera geometry. A three-stage curriculum progressively trains the model for pose following, flexible reference-to-target view generation, and temporal rollout. To support temporal rollout, we train with clean historical target-view tokens. We also add multi-view captions to enable test-time text control. Combined with an off-the-shelf 4D Gaussian Splatting stage, our framework lifts monocular static-camera videos into dynamic 4D Gaussian splats. Experiments on DNA-Rendering and ActorsHQ show that Flex4DHuman surpasses prior state-of-the-art methods, while the same formulation generalizes to animal categories after mixed human-animal training. These capabilities make Flex4DHuman a practical step toward scalable 4D content creation from casual monocular videos for simulation, gaming, AR/VR, and video re-shooting.

2606.13637 2026-06-12 cs.LG new submission

The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning

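Ayushman Trivedi, Bhavika Melwani

AI summary: Analyzing sequential learning of a ResNet-18 on Split CIFAR-100, the study finds that forgotten knowledge remains compactly decodable after representational reorganization and proposes the Stable Recovery Manifold hypothesis, indicating that catastrophic forgetting is primarily a problem of accessibility and manifold alignment.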

Comments: 9 pages, 8 figures, 8 tables

Abstract

Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze recoverability, representational drift, and recovery complexity across ten tasks. We introduce Recovery Subspace Dimensionality (k_t), a measure of the minimum number of singular directions required to preserve 90 percent of full probe performance. Contrary to our Recoverability Diffusion hypothesis, recovery dimensionality remains stable throughout training (mean k_t = 8.0) despite substantial representational drift. Principal-angle drift strongly predicts recoverability (r = -0.862), and a simple geometric model explains 82.2 percent of recoverability variance. These findings support the Stable Recovery Manifold hypothesis, suggesting that forgotten knowledge remains compactly decodable despite representational reorganization. The results indicate that catastrophic forgetting is primarily an accessibility and manifold-alignment problem rather than information destruction.

2606.13633 2026-06-12 eess.SY cs.LG new submission

Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model

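Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala, Anthony Wong

AI summary: Proposes a framework that combines a hybrid neural-cellular automaton wildfire model with gradient-optimized aerial drops, quantifies uncertainty through Monte Carlo sampling and spatially correlated perturbations, and shows in a case study that it can generate effective suppression plans.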

Abstract

Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial wildfire suppression that combines a hybrid neural-cellular automaton wildfire model with gradient-based design of targeted aerial drops. The wildfire model predicts spatially varying spread behavior from terrain, fuel, and wind data, while the intervention module determines binary drop actions with continuous-valued location and orientation parameters mapped to the simulation grid. Water and retardant are represented with distinct suppression effects, corresponding to immediate reduction of active burning and persistent reduction of future spread. To evaluate the robustness of the resulting suppression plans, we quantify both aleatoric uncertainty through Monte Carlo sampling of daily fire-state realizations and epistemic uncertainty through spatially correlated prediction-error perturbations. A case study based on the 2020 Bear Fire shows that the framework can generate coherent aerial suppression schedules for reducing total fire-affected area and can support uncertainty-aware analysis of wildfire intervention strategies.

2606.13603 2026-06-12 cs.LG cs.AI cs.CL new submission

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

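Daniel Scalena, Sara Candussio, Luca Bortolussi, Elisabetta Fersini, Malvina Nissim, Gabriele Sarti

AI summary: Estimates the causal importance of chain-of-thought steps via early exit and finds a "commitment boundary" in reasoning, where transient guesses give way to a stable answer; subsequent steps are epiphenomenal, so exiting early can shorten reasoning by up to 55% with negligible impact on performance.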

Abstract

Chain-of-thought (CoT) reasoning is the dominant paradigm for inference-time scaling in language models, yet the causal influence of individual steps on the final answer remains poorly understood. We estimate each step's causal importance via early exit and use this measure to study how answers form across the reasoning traces of several model families. Across diverse tasks, we find that reasoning typically crosses a "commitment boundary": a sharp transition from transient intermediate guesses to a stable, high-confidence answer. This transition often happens in a single step, well before the model's reasoning block ends, and is followed by "epiphenomenal" CoT steps that leave the final answer probability unaltered. Using attention probes, we show that answer-formation stages can be linearly decoded from intermediate reasoning steps with high accuracy and generalize robustly to unseen reasoning tasks. We exploit this signal to early-exit reasoning blocks at the commitment boundary, reducing the length of CoTs by up to 55% on average with negligible impact on model performance.
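
One way to picture the commitment boundary described in this abstract is as the first reasoning step from which the early-exit probability of the final answer stays high. The Python sketch below works under that assumption only: the fixed threshold, the function name `commitment_step`, and the probability trace are hypothetical, and the paper's own criterion (which relies on attention probes) may differ.

```python
def commitment_step(answer_probs, threshold=0.9):
    """Index of the first step from which the final-answer probability
    stays at or above `threshold` until the end of the trace, else None.

    answer_probs[i] is assumed to be the probability of the final answer
    when the model is forced to answer right after reasoning step i.
    """
    boundary = None
    for i, p in enumerate(answer_probs):
        if p >= threshold:
            if boundary is None:
                boundary = i   # candidate start of the stable phase
        else:
            boundary = None    # a dip means the model had not committed yet
    return boundary


# Hypothetical trace: a transient high guess at step 2, commitment at step 4.
trace = [0.10, 0.30, 0.95, 0.20, 0.92, 0.97, 0.96]
step = commitment_step(trace)
skippable = len(trace) - 1 - step  # steps after the boundary
print(step, skippable)
```

For this trace the function returns step 4, leaving two of the seven steps after the boundary; under the abstract's terminology those later steps would be the epiphenomenal ones.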

2606.13598 2026-06-12 cs.AI cs.CL cs.LG cs.MA new submission

Reward Modeling for Multi-Agent Orchestration

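King Yeung Tsang, Zihao Zhao, Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Semih Yavuz, Shafiq Joty, Hao Wang

AI summary: Proposes OrchRM, a framework that builds a reward model from intermediate artifacts of multi-agent executions through self-supervised learning, without human annotation, enabling efficient orchestrator training and test-time scaling and improving performance across multiple domains while reducing compute cost.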

Comments: Preprint; work in progress

Abstract

Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating orchestration quality without human annotations. OrchRM leverages intermediate artifacts from multi-agent executions to construct win-lose pairs for Bradley-Terry reward model training. Unlike existing MAS test-time scaling and orchestrator training frameworks that rely on costly sub-agent rollouts, OrchRM operates directly at the orchestration level, enabling efficient and high-performing reward-guided orchestrator training and MAS test-time scaling. OrchRM improves training efficiency by up to 10x in token usage while improving MAS test-time scaling performance by up to 8% in accuracy. These gains consistently transfer across multiple domains, including mathematical reasoning, web-based question answering, and multi-hop reasoning, demonstrating orchestration-level reward modeling as a scalable direction for robust multi-agent orchestration. Code will be available at this https URL.
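
The Bradley-Terry objective named in this abstract scores a win-lose pair by the negative log-probability that the winner is preferred, -log sigmoid(r_win - r_lose). The Python sketch below shows only that standard pairwise loss; the function name and the scalar rewards are illustrative, and how OrchRM builds its pairs or parameterizes the reward model is not shown here.

```python
import math


def bradley_terry_loss(reward_win, reward_lose):
    """-log sigmoid(reward_win - reward_lose), computed in a stable way."""
    margin = reward_win - reward_lose
    if margin >= 0:
        # -log sigmoid(m) = log(1 + exp(-m)); exp(-m) cannot overflow here.
        return math.log1p(math.exp(-margin))
    # For negative margins rewrite as -m + log(1 + exp(m)) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scalar rewards for a preferred and a rejected orchestration.
well_ranked = bradley_terry_loss(2.0, 0.0)   # winner scored higher: small loss
mis_ranked = bradley_terry_loss(0.0, 2.0)    # winner scored lower: large loss
tied = bradley_terry_loss(1.0, 1.0)          # no preference: log(2)
print(well_ranked, mis_ranked, tied)
```

Minimizing this loss over many pairs pushes the reward of preferred items above that of rejected ones; only the difference between the two rewards matters, not their absolute values.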

2606.13587 2026-06-12 cs.CV new submission

Towards Effective Waste Segmentation for Automated Waste Recycling in Cluttered Background

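Mamoona Javaid, Mubashir Noman, Abdul Hannan, Shah Nawaz, Mustansar Fiaz, Sajid Ghuffar

AI summary: Proposes a cascaded segmentation network that combines the spatial and spectral domains, together with an auxiliary feature enhancement module, for efficient waste segmentation in cluttered scenes; its effectiveness is validated on three datasets.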

Comments: accepted at ICML 2026

Abstract

The rapid expansion of urban areas and population growth are causing an immense increase in waste production, which demands efficient and automated waste management. In this scenario, automated waste recycling (AWR) using deep learning methods can assist humans in optimal waste management. Recent deep learning approaches for AWR provide promising waste segmentation performance; however, these methods rely on large backbone networks that are inefficient for AWR systems and suffer from performance deterioration in cluttered scenes. To this end, an optimal waste segmentation network is introduced that uses the spatial domain to capture localized structural dependencies and the spectral domain to efficiently extract global contextual relationships. This cascaded design allows the network to progressively leverage both local and global representations across complementary domains to highlight the semantic information necessary for effective segmentation of various waste objects. Furthermore, an auxiliary feature enhancement module (AFEM) is introduced to enhance the target objects' boundaries and blob amplification for better segmentation in cluttered scenarios. Extensive experimentation on the ZeroWaste-aug, ZeroWaste-f, and SpectralWaste datasets reveals the merits of the proposed method.

2606.13566 2026-06-12 cs.AI new submission

A Three-Layer Framework for AI in Scientific Discovery

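Guojun Liao

AI summary: Proposes a three-layer framework for AI in scientific discovery whose core innovation is Layer 2: model formation through qualitative reasoning, which recognizes when a framework is structurally inadequate and locates the missing concept; three case studies illustrate its importance.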

Abstract

Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper proposes a three-layer view of AI in discovery. Layer 1 is search and retrieval by large language models. Layer 2, as the main innovation of this paper, is model formation through qualitative reasoning: the capacity to recognize when a current framework is structurally inadequate and to understand the problem within a broader representational space, not through trial and error, but through structural insight into what is missing and where it can be found. Layer 3 is execution, optimization, and refinement. The main claim is that Layer 2 is both the most important and the least developed. Search without model formation remains confined to inherited frameworks, while execution without conceptual revision only amplifies an existing formulation. We illustrate Layer 2 reasoning through three case studies: S. S. Chern's intrinsic proof of the Gauss-Bonnet theorem, the resolution of the Nesterov Accelerated Gradient convergence problem via Lyapunov functions, and the autonomous disproof of the Erdos unit distance conjecture by OpenAI in 2026. Each case exhibits the same structural signature: a framework that had become inadequate, a missing conceptual object, and a resolution found in an unexpected neighboring field.

2606.13565 2026-06-12 cs.LG new submission

A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

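Sophia Tang, Yuchen Zhu, Molei Tao, Pranam Chatterjee

AI summary: Proposes the A2D2 framework, which enables reward-guided fine-tuning of any-length discrete diffusion models by jointly optimizing the insertion and unmasking policies together with a quality-based inference schedule; it has a theoretical guarantee of convergence to the reward-tilted distribution and empirically improves reward optimization as well as generation flexibility and accuracy.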

Abstract

Discrete diffusion models offer a simple and stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. Principled reward-guided fine-tuning for any-length discrete diffusion, however, remains largely unexplored. We introduce Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding (A2D2), a unified framework for reward-guided fine-tuning of any-length discrete diffusion models via joint optimization of the insertion and unmasking policies together with a quality-based inference schedule. We derive the Radon-Nikodym derivative for the joint insertion-unmasking path measures, enabling theoretically guaranteed convergence to the intractable reward-tilted sequence distribution without requiring target samples. Building on this, we establish unmasking and insertion quality as tractable approaches for minimizing decoding error and introduce the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates the reward-tilted distribution. Empirically, A2D2 improves reward optimization while enhancing generation flexibility and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.
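
For readers unfamiliar with the term, a reward-tilted distribution is commonly written as the reference model reweighted by an exponentiated reward. The form below is that common convention, not a formula taken from the paper; the symbols $p_{\mathrm{ref}}$ (pretrained model), $r$ (reward), and $\alpha$ (temperature) are assumptions introduced for illustration.

```latex
p^{*}(x) \;=\; \frac{1}{Z}\, p_{\mathrm{ref}}(x)\, \exp\!\big(r(x)/\alpha\big),
\qquad
Z \;=\; \sum_{x'} p_{\mathrm{ref}}(x')\, \exp\!\big(r(x')/\alpha\big)
```

Under this convention the normalizer $Z$ sums over every possible sequence, which in the any-length setting includes sequences of all lengths; this is one reason such a target is intractable to sample from directly.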

2606.13543 2026-06-12 cs.NI cs.LG new submission

NetCause: Counterfactual Learning for Root Cause Analysis in Large-Scale Networks

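Fabien Chraim, Jian Zhang, Dominik Janzing, Xiang Song, Christos Faloutsos, John Evans

AI summary: Proposes the NetCause framework, which models network incidents as graph-temporal processes and ranks candidate root causes through counterfactual simulation, improving accuracy by 16.1% on 31 expert-labeled incidents.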

Comments: 9 pages, 6 figures

Abstract

Can a learned model capture how faults propagate through a large-scale network and use this knowledge to causally attribute customer impact to its underlying root cause? Existing root cause analysis techniques often rely on static rules, correlation heuristics, or topology-local reasoning, which struggle to generalize in dynamic environments where faults propagate across complex physical and logical dependencies. We present NetCause, a self-supervised learning-based framework that models network incidents as graph-temporal processes and uses counterfactual simulation to rank candidate root causes. This approach produces an interpretable ranking of root cause hypotheses and integrates naturally with operator-defined mitigation and remediation actions. We train the model on over 1,500 incidents collected over six months from a leading cloud provider's production network and evaluate it on 31 expert-labeled incidents. NetCause consistently improves root cause ranking quality in the regime most relevant to operational decision-making, achieving a 16.1% accuracy improvement over a rule-based heuristic baseline. While training is computationally intensive, inference is lightweight, requiring only seconds of GPU runtime per incident (well below typical telemetry collection latencies).


2606.13532 2026-06-12 cs.NI cs.LG New submission

Graphical Causal Reasoning for Root Cause Analysis in Cloud Networks

Fabien Chraim, Dominik Janzing, John Evans

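AI summary: Proposes a root cause analysis method for cloud network incidents based on graph causal discovery. It reduces dimensionality through spatiotemporal grouping and an automation ontology, builds a causal graph using bivariate Granger causality and conditional independence tests, and introduces a probabilistic method for time-aware root cause scoring. On 35 production incidents, it recalls the correct root cause in 85.7% of cases and produces an exact match in 74.3%.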

Comments: 6 pages, 4 figures

Abstract

Cloud computing relies on large-scale networks, which are inherently complex systems. In this paper, we present a novel approach to root cause analysis (RCA) of cloud network incidents, leveraging graph-based causal discovery techniques. Our method addresses the limitations of rule-based automation by introducing a spatiotemporal grouping strategy and an automation ontology to reduce the dimensionality of the problem. We construct a causal graph from binary time series data using bivariate Granger causality and conditional independence tests. For inference, we introduce a probabilistic method that assigns edge-specific conditional probabilities as a function of time lag, allowing for interpretable, time-aware root cause scoring via causal graph traversal. We evaluated the system using a labeled dataset of 35 production incidents from a major cloud provider. The model successfully recalled the correct root cause in 85.7% of incidents and produced an exact match in 74.3%. In production, the deployed system has been used in over 800 real-world incidents, with positive qualitative feedback from network engineers. These results highlight the practicality of a data-driven, causal approach to RCA in dynamic and large-scale operational environments.


2606.13529 2026-06-12 cs.HC cs.LG New submission

Ride, Track, and Recover: Pilot Randomized Trial of a Wearable Digital Self-Management Intervention During a Veteran Endurance-Cycling Program

Alan Ta, Nilsu Salgin, Caleb Armstrong, Kala Phillips Reindel, Farzan Sasangohar

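AI summary: A randomized trial evaluating whether a wearable digital self-management intervention stabilizes hyperarousal symptoms of post-traumatic stress disorder (PTSD) in veterans. The intervention group showed more durable symptom improvement, and the perceived precision of machine-learning detections was positively associated with symptom severity.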

Abstract

Post-traumatic stress disorder (PTSD) in veterans is characterized by persistent hyperarousal and comorbid anxiety and depressive symptoms that are difficult to monitor and manage outside clinical settings. Thirteen veterans participating in a Project Hero cycling event in Texas were randomized by computer-generated sequence in a naturalistic setting to two arms: (1) digital intervention plus physical activity, or (2) physical activity only, plus a third at-home monitoring control cohort consisting of 7 veterans selected from the broader Project Hero veteran community. Continuous smartwatch sensing combined heart rate and accelerometer features to detect hyperarousal events, which were confirmed in real time by participants. Weekly self-report measures of anxiety, depression, and PTSD severity were collected. Generalized additive mixed models characterized nonlinear trajectories over time. Baseline-normalized hyperarousal trajectories differed significantly across conditions, with the digital intervention group (n=7) showing structured stabilization compared to late-study escalation in the physical-only group (n=3). Both cycling groups exhibited acute symptom improvements during the endurance event; however, the digital intervention group demonstrated a higher overall maintenance of gains. The at-home control group (n=4) showed gradual symptom declines. Perceived precision of ML detections varied substantially across individuals and was positively associated with symptom severity, with higher-severity participants confirming a greater proportion of detected events. These results suggest that coupling wearable detection with digital self-management tools may support stabilization of hyperarousal and symptom improvement while emphasizing the importance of personalization and human-centered design in wearable mental health systems.


2606.13501 2026-06-12 cs.DC cs.LG cs.PF New submission

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

Xinwei Qiang, Yifan Hu, Shixuan Sun, Jing Yang, Han Zhao, Chen Chen, Yu Feng, Jingwen Leng, Minyi Guo

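AI summary: Proposes GF-DiT, a policy-programmable runtime that optimizes Diffusion Transformer serving by dynamically adjusting the parallelism of requests, using group-free collectives for low-overhead online reconfiguration to substantially raise throughput and lower latency.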

Abstract

Diffusion Transformers (DiTs) have become the dominant architecture for image and video generation, creating growing demand for efficient DiT serving. Existing systems assign each request a fixed parallel configuration throughout its lifetime. However, DiT workloads exhibit substantial heterogeneity across requests, execution stages, and system conditions, making static parallelism inefficient and often leading to poor GPU utilization and degraded service quality. This paper argues that DiT serving should treat GPU parallelism as a first-class schedulable resource. We present GF-DiT, a policy-programmable runtime for elastic DiT serving that dynamically adapts the parallelism of running requests according to workload demands and service objectives. GF-DiT introduces an asynchronous execution abstraction that decomposes requests into independently schedulable trajectory tasks and enables online GPU reallocation. To make elastic parallelism practical, GF-DiT further proposes group-free collectives, a lightweight communication abstraction that supports low-overhead online formation and reconfiguration of arbitrary execution groups. We implement GF-DiT in vLLM-Omni and evaluate it on representative image and video diffusion workloads. Compared with fixed-pipeline execution with static parallelism, GF-DiT improves throughput by up to 6.01×, reduces mean latency by up to 95%, lowers SLO violation rates by up to 90%, and reduces communication-group setup overhead from 778 ms to approximately 60 μs.


2606.13468 2026-06-12 cs.SE cs.AI New submission

Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset

Mahmoud Abujadallah, Ali Arabat, Mohammed Sayagh

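AI summary: An analysis of the AIDev dataset finds that 46.41% of the code fixes proposed by AI agents (Copilot, Devin, Cursor, Claude) are rejected. A qualitative study of 306 non-merged PRs identifies 14 rejection reasons in four categories and offers recommendations for better guiding the model.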

Comments: 5 pages, 2 figures, MSR '26: Proceedings of the 23rd International Conference on Mining Software Repositories, April 2026, Rio de Janeiro, Brazil

Abstract

AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41% of the fixes proposed by the agents Copilot, Devin, Cursor, and Claude are rejected. This represents a significant amount of wasted resources that require human reviews, verifications, and running tests and validations for fixes that are merely discarded. Our goal in this paper is to understand the failure modes of AI-agents, an understanding that is crucial for better integrating AI-agents as efficient teammates. In this paper, we conduct a qualitative study on a representative sample of 306 non-merged pull requests created or co-authored by the agents mentioned earlier, followed by a quantitative analysis of the reasons for rejection. Our qualitative findings identify 14 reasons divided into four high-level categories for rejecting AI-agent fixes. We observe that developers can reject fixes due to fixes whose implementation is incorrect (e.g., incomplete, wrong approach), fixes that do not pass the continuous integration (CI) pipelines and fail tests, fixes for which the agent is unable to perform the implementation (e.g., no code generated, sessions lost), and fixes whose priority is low. Our results shed light on the importance of better guiding the model at these levels: (1) proposing hints about the approach to follow for fixing an issue, (2) outlining constraints or limitations regarding the approaches that should not be taken, and (3) instructing the agent on how to validate the implementation through CI pipelines and without introducing a breaking change. Our results suggest the need for good prioritization of tasks so that generated fixes do not lead to wasted human review efforts or wasted agent resources (e.g., tokens, compute, or allowed number of requests).


2606.13461 2026-06-12 cs.LG cs.CV New submission

Reinforcement Learning for Neural Model Editing

Shaivi Malik

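AI summary: Formulates neural model editing as a reinforcement learning problem in which editing policies are learned from reward feedback, with good results on bias mitigation and machine unlearning tasks.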

Abstract

Editing pretrained neural networks requires specialized algorithms tailored to specific objectives. Designing such algorithms is often time-consuming and demands significant effort. We present an exploratory framework that formulates neural model editing as a reinforcement learning problem, where agents modify models using reward feedback. We introduce two environments: MaskWorld, where agents scale weights multiplicatively, and ShiftWorld, where agents apply additive weight updates. The reward function combines a utility-preservation objective with a task-specific editing objective, enabling agents to learn targeted modifications while maintaining overall model performance. We evaluate the framework on bias mitigation in text classification and machine unlearning in image classification, both of which traditionally rely on specialized algorithms. Our results show that the learned policies reduce forget set accuracy to nearly 0% while preserving over 90% retain set accuracy on the unlearning task. In the bias mitigation setting, the learned policies improve bias-related performance by more than 5% while maintaining general classification utility. Our findings show that neural model editing can be cast as a reinforcement learning problem, allowing editing policies to be learned from reward feedback rather than manually engineered for each task.
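The two action types and the combined reward described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the flat list-of-floats weight representation, and the linear `trade_off` weighting are hypothetical and are not taken from the paper, which does not specify its reward in the abstract.

```python
from typing import List


def mask_step(weights: List[float], scales: List[float]) -> List[float]:
    """MaskWorld-style action (assumed form): scale each weight multiplicatively."""
    return [w * s for w, s in zip(weights, scales)]


def shift_step(weights: List[float], deltas: List[float]) -> List[float]:
    """ShiftWorld-style action (assumed form): apply an additive update to each weight."""
    return [w + d for w, d in zip(weights, deltas)]


def unlearning_reward(retain_acc: float, forget_acc: float, trade_off: float = 0.5) -> float:
    """Hypothetical reward for the unlearning task.

    Combines a utility-preservation term (accuracy on the retain set) with an
    editing term (one minus accuracy on the forget set). The linear mix and
    the trade_off parameter are illustrative assumptions.
    """
    utility = retain_acc
    editing = 1.0 - forget_acc
    return trade_off * utility + (1.0 - trade_off) * editing


# A scale of 0.0 switches a weight off; a scale of 1.0 leaves it unchanged.
masked = mask_step([1.0, 2.0, 0.5], [1.0, 0.0, 2.0])
shifted = shift_step([1.0, 2.0], [0.5, -2.0])
```

Under this sketch, a policy that drives forget-set accuracy toward zero while keeping retain-set accuracy high receives a larger reward than one that leaves the forget set intact, which matches the qualitative behavior the abstract reports.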


2606.13452 2026-06-12 cs.DL cs.CL cs.CY cs.HC New submission

Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty

Chenggang Yang, Chengzhi Zhang

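AI summary: An analysis of 15,328 Nature Communications papers and their review comments finds that authors and reviewers both emphasize result-oriented innovation, though reviewers take a more comprehensive view. Highly innovative papers benefit from strong promotional language, while for moderately innovative papers promotional language correlates significantly with reviewer disagreement.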

Abstract

Novelty is a crucial metric for assessing the quality of academic papers. Scholars strive to highlight the novel aspects of their work, particularly in the title, abstract, and introduction. Peer review, serving as the gatekeeper of scientific rigor, rigorously evaluates the novelty of papers, yet a cognitive gap may exist between author self-promotion and reviewer evaluation. To investigate this, we analyzed 15,328 academic papers published in Nature Communications from 2016 to 2021, along with their peer-review comments. We found that both reviewers and authors emphasize result-oriented innovation, with reviewers adopting a more comprehensive evaluation perspective. Furthermore, by examining promotional intensity against inherent paper novelty, we found that its effect depends on the paper's actual innovation level. Highly innovative papers benefit from stronger promotional language, receiving more positive evaluations. We also found that promotional language significantly correlates with reviewer disagreement on novelty specifically for papers of moderate innovativeness, whereas it has negligible impact for papers with either very high or very low novelty. This reveals how promotional language operates most prominently in the gray area of academic evaluation.


2606.13451 2026-06-12 cs.LG New submission

Uncertainty Estimation for Molecular Diffusion Models

Paul Seij, Christian A. Naesseth, Stephan Mandt, Metod Jazbec

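AI summary: Proposes a post-hoc method that uses a Laplace approximation of the denoising network to estimate per-sample uncertainty in pretrained molecular diffusion models; the score correlates negatively with sample quality and can be used to filter generated samples.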

2606.13449 2026-06-12 cs.SE cs.AI New submission

Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests

Ali Arabat, Mohammed Sayagh

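AI summary: An analysis of 15,549 agentic PRs from 148 projects finds that instruction files have no consistently positive effect on merge rate, amount of code change, or merge effort, although projects that improved have longer and more clearly structured instruction files. Proposes "Instructions-as-Code" as a research direction.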

Comments: 5 pages, 8 figures, 23rd International Conference on Mining Software Repositories, April 13--14, 2026

Abstract

AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create instruction files that guide the AI-agents, including how to navigate the project, locate the right components, run tests, respect best practices, and more. In this paper, we investigate the relationship between the creation of these instructions and the performance of AI-agents in creating better pull requests, which have a higher chance of success (i.e., the merge rate), address more complex tasks (e.g., code churn), and require less effort to be merged (e.g., time to merge). To this end, we analyze 15,549 agentic PRs from 148 projects in the AIDev dataset. Using the three dimensions, we compare each project before and after the creation of the instruction files. We find that specifying instructions for AI-agents does not necessarily lead to better results. With the instruction files, 27.7% of the projects increased their merge rate by at least 20%, while 26.35% decreased it. The same observation is seen with the amount of changes (e.g., code churn, number of modified files) and with the efforts to merge an agentic PR (e.g., merge time and number of comments). From a first exploration, we find that projects that managed to increase their merge rate have substantially longer instruction files, which are also well structured into a higher number of sections and sub-sections. Our results motivate the need for research to assist practitioners in framing the development of instruction files as a software engineering activity (aka Instructions-as-Code).


2606.13443 2026-06-12 cs.LG New submission

How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators

Jihyeon Hur, Yongseok Kwon, Min-Gi Jo, Jeongwhan Choi, Noseong Park

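AI summary: To address the limited adaptivity of fixed memory weights in existing neural operators, proposes AMGFNO, which uses a learnable gate to adjust memory weights dynamically and reduces nRMSE by 55-79% at low resolution.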

2606.13441 2026-06-12 cs.AI cs.CL New submission

Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models

Joseph Keshet

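AI summary: Argues that large language models lack the commitment-bearing agency required for moral responsibility: their outputs stem from probabilistic mappings rather than intrinsic intentionality, and stochastic sampling does not amount to choice.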

Abstract

Recent advances in large language models (LLMs) have prompted claims that such systems exhibit agency or qualify as moral agents. This paper argues that these attributions are misguided. We maintain that moral responsibility requires commitment-bearing agency grounded in intrinsic intentionality and self-attributed action, and that such agency constitutes the form of free will relevant to responsibility. Although LLMs generate coherent and normatively evaluable outputs, their operation is fully characterized by probabilistic input-output mappings learned from data. Their apparent intentionality is derived rather than intrinsic, and their outputs are neither owned as commitments nor guided by reasons. Variability introduced by stochastic sampling does not amount to choice or authorship. We address objections from the intentional stance, functionalism, compatibilism, and the presence of moral reasoning in model outputs, arguing that none suffice to establish genuine agency.


2606.13436 2026-06-12 cs.AI New submission

Evaluation Sovereignty in Metadata-Driven Classification: A Multi-Track Framework for Weakly Supervised Information Systems

Raymond Vasquez

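AI summary: Addresses how label authority affects evaluation validity in weakly supervised metadata systems. Introduces the concept of evaluation sovereignty and a multi-track evaluation framework, shows experimentally that model performance differs sharply between silver-label and gold-label evaluation, and redefines evaluation validity as a system-level property.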

Abstract

Evaluation in machine learning is typically treated as a neutral measurement process. However, in operational information systems, evaluation outcomes are often conditioned by the processes used to generate labels. This paper does not seek to improve classification performance. Instead, it examines the validity of performance measurement under differing label-authority regimes. This issue is particularly relevant in large-scale metadata-driven systems, where labels are often incomplete, inconsistent, or weakly supervised. We introduce evaluation sovereignty, defined as the degree to which performance metrics are independent of label authority and supervision regime, and propose a multi-track evaluation framework that systematically varies training and evaluation label sources. Using hierarchical multi-label classification on large-scale scientific metadata, we demonstrate that models exhibiting strong performance under operational ("silver") evaluation degrade substantially under independent ("gold") evaluation, particularly for fine-grained classification. For example, Micro-F1 decreases from approximately 0.54 to 0.03. Notably, ranking-based metrics remain above baseline, revealing a divergence between latent model signal and classification validity. These findings suggest that commonly reported performance metrics may reflect alignment with labeling processes rather than true predictive capability. We therefore reconceptualize evaluation validity as a system-level property shaped by label governance and provide a practical methodology for auditing intelligent systems operating under weak supervision.
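The gap between "silver" and "gold" evaluation is easiest to see by scoring one fixed set of predictions against two label sources. The sketch below uses the standard multi-label Micro-F1 definition, 2·TP / (2·TP + FP + FN), pooled over all documents. The toy label sets are invented for illustration and are not data from the paper; the function name is likewise an assumption.

```python
from typing import List, Set


def micro_f1(predicted: List[Set[str]], reference: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification.

    True positives, false positives, and false negatives are pooled across
    all documents before the F1 ratio is computed.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)   # labels predicted and present in the reference
        fp += len(pred - ref)   # labels predicted but absent from the reference
        fn += len(ref - pred)   # reference labels the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (invented): the same predictions scored against two label sources.
predictions = [{"A", "B"}, {"C"}, {"D"}]
silver = [{"A", "B"}, {"C"}, {"D", "A"}]   # operational labels the model was trained to match
gold = [{"A"}, {"E"}, {"F"}]               # independently produced labels

silver_score = micro_f1(predictions, silver)
gold_score = micro_f1(predictions, gold)
```

In this toy case the predictions agree closely with the silver labels and poorly with the gold labels, so the same model yields two very different Micro-F1 values depending only on which label authority is used for scoring.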


2606.13426 2026-06-12 cs.LG stat.ML New submission

Accelerating Speculative Diffusions via Block Verification

Alexander Soen, Hisham Husain, Valentin De Bortoli, Arnaud Doucet

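AI summary: Proposes a speculative sampling scheme for diffusion models that raises draft acceptance rates through block verification; the training-free Free Drafter achieves up to a 6.3% speedup.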

Abstract

Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions -- which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.
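For context, the discrete-space case that the abstract calls straightforward can be sketched in a few lines. This is the standard single-token accept/reject rule used in speculative decoding for LLMs, not the continuous-space scheme or the block verification the paper proposes. The function names and the toy distributions are assumptions for illustration.

```python
import random
from typing import List


def residual_distribution(target: List[float], draft: List[float]) -> List[float]:
    """Normalized positive part of (target - draft), sampled from after a rejection."""
    excess = [max(p - q, 0.0) for p, q in zip(target, draft)]
    total = sum(excess)
    if total == 0.0:
        # Target and draft coincide, so a rejection cannot occur; return the target.
        return list(target)
    return [e / total for e in excess]


def speculative_sample(target: List[float], draft: List[float], rng: random.Random) -> int:
    """Draw one token whose distribution matches `target`, proposing from `draft`."""
    indices = range(len(target))
    x = rng.choices(indices, weights=draft)[0]           # draft proposal
    if rng.random() < min(1.0, target[x] / draft[x]):    # accept with prob min(1, p/q)
        return x
    # On rejection, resample from the residual distribution.
    return rng.choices(indices, weights=residual_distribution(target, draft))[0]


target = [0.5, 0.3, 0.2]
draft = [0.2, 0.5, 0.3]
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(20000):
    counts[speculative_sample(target, draft, rng)] += 1
frequencies = [c / 20000 for c in counts]
```

The empirical frequencies should approach `target` even though every proposal comes from `draft`. In a finite vocabulary the residual is a simple renormalized vector; the abstract's point is that no equally cheap construction is available for continuous diffusion states.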


2606.13400 2026-06-12 cs.LG cs.AI cs.RO New submission

PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update

Jianming Ma, Qiyue Yang, Yang Zhang, Liyun Yan, Zhanxiang Cao, Yazhou Zhang, Yue Gao

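AI summary: Proposes PolyFlow, a polytope-constrained flow matching framework that embeds constraints directly into the model and flow dynamics. Its discrete-time flow formulation and projection-free architecture eliminate discretization error and strictly satisfy arbitrary polyhedral constraints, achieving zero constraint violation and lower inference latency on planning and control tasks.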

Comments: 30 pages, 12 figures, Accepted to ICML 2026

Abstract

While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existing approaches typically enforce safety through post-hoc corrections, which incur substantial computational overhead and may distort the learned distribution. We propose PolyFlow, a polytope-constrained flow matching framework that embeds constraints directly into the model and flow dynamics. PolyFlow introduces a discrete-time flow formulation and a projection-free architecture, which eliminate the discretization error and guarantee strict satisfaction of arbitrary polyhedral constraints, without the need for expensive iterative solvers. Experimental results show that PolyFlow achieves zero constraint violation while maintaining high distributional fidelity across a range of planning and control tasks. Compared to state-of-the-art constrained generation baselines, PolyFlow significantly reduces inference latency and demonstrates a favorable trade-off between safety, efficiency, and generative quality. Code is available on this https URL.
