arXivDaily: arXiv daily academic digest, updated Monday through Friday
2411.08821 2026-06-17 stat.ML cs.LG stat.CO updated version

Conditional Local Importance by Quantile Expectations

Kelvyn K. Bladen, Adele Cutler, D. Richard Cutler, Kevin R. Moon

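AI summary: Proposes CLIQUE, a model-agnostic local variable importance method that captures locally dependent relationships through quantile expectations, improves stability, and applies directly to multi-class classification.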

Comments 29 pages, 28 figures

Journal ref
Transactions on Machine Learning Research (2026)

Abstract

Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current popular methods, including LIME and SHAP, provide useful measures of feature contribution in the prediction space, while leaving opportunities for improved characterization of local structure in the model loss space. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that highlights locally dependent relationships, provides improved stability over permutation-based methods, and can be directly applied to multi-class classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information, captures interaction behavior beyond what can be evaluated by correlations, and assigns zero importance in regions where the response is invariant to changes in variables.

2603.04198 2026-06-17 stat.ML cs.LG updated version

Stable and Steerable Sparse Autoencoders with Weight Regularization

Piotr Jedryszek, Oliver M. Crook

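AI summary: L1/L2 weight regularization improves the cross-seed feature consistency of sparse autoencoders and raises steering success rates on language models while preserving interpretability scores.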

Abstract

Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, it dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.

2603.02159 2026-06-17 stat.ML cs.LG updated version

Instrumental and Proximal Causal Inference with Gaussian Processes

Yuqi Zhang, Krikamol Muandet, Dino Sejdinovic, Edwin Fong, Siu Lun Chau

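AI summary: Proposes a deconditional Gaussian process framework for causal inference under unobserved confounding that provides reliable posterior uncertainty quantification and supports model selection via marginal likelihood optimization.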

Abstract

Instrumental variable (IV) and proximal causal learning (Proxy) methods are central frameworks for causal inference in the presence of unobserved confounding. Despite substantial methodological advances, existing approaches rarely provide reliable epistemic uncertainty (EU) quantification. We address this gap through a Deconditional Gaussian Process (DGP) framework for uncertainty-aware causal learning. Our formulation recovers popular kernel estimators as the posterior mean, ensuring predictive precision, while the posterior variance yields principled and well-calibrated EU. Moreover, the probabilistic structure enables systematic model selection via marginal log-likelihood optimization. Empirical results demonstrate strong predictive performance alongside informative EU quantification, evaluated via empirical coverage frequencies and decision-aware accuracy rejection curves. Together, our approach provides a unified, practical solution for causal inference under unobserved confounding with reliable uncertainty.

2602.17894 2026-06-17 stat.ML cs.LG math.ST stat.TH updated version

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy

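AI summary: For multi-source data collection under a fixed budget, proposes a sampling plan that maximizes the effective sample size and, paired with a post-stratification estimator, achieves minimax-optimal risk.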

Comments COLT 2026

Abstract

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities (for example, health markers, demographics, or political affiliations), and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g. attempting to "match" the target distribution) or relying on standard estimators (e.g. sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size, that is, the total sample size divided by $D_{\chi^2}(q \,\|\, \overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.
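
A minimal Python sketch of the effective sample size defined in the abstract above. The function names, the example group proportions, and the choice to form the aggregated source distribution $\overline{p}$ as a sample-count-weighted mixture of the sources are illustrative assumptions, not taken from the paper.

```python
def chi2_divergence(q, p):
    """Chi-square divergence D(q || p) = sum_i (q_i - p_i)^2 / p_i."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same number of groups")
    if any(pi <= 0 for pi in p):
        raise ValueError("every group needs positive mass under p")
    return sum((qi - pi) ** 2 / pi for qi, pi in zip(q, p))


def aggregate(source_dists, counts):
    """Assumed aggregation: mix source group proportions by samples drawn."""
    n = sum(counts)
    groups = range(len(source_dists[0]))
    return [sum(c * d[g] for c, d in zip(counts, source_dists)) / n for g in groups]


def effective_sample_size(n_total, q, p_bar):
    """Total sample size divided by D_chi2(q || p_bar) + 1."""
    return n_total / (chi2_divergence(q, p_bar) + 1.0)


# Hypothetical example: 50/50 target, two sources with skewed group mixes.
target = [0.5, 0.5]
sources = [[0.8, 0.2], [0.2, 0.8]]
balanced = effective_sample_size(200, target, aggregate(sources, [100, 100]))
one_sided = effective_sample_size(200, target, aggregate(sources, [200, 0]))
```

In this example, drawing 100 samples from each source yields an aggregate that matches the target, so all 200 samples count, while drawing all 200 from the first source gives an effective sample size of 128. Sampling costs, which drive the budgeted trade-off studied in the paper, are not modeled here.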

2507.04704 2026-06-17 q-bio.QM cs.AI cs.CV updated version

SPATIA: Multimodal Generation and Prediction of Spatial Cell Phenotypes

Zhenglun Kong, Mufan Qiu, John Boesen, Xiang Lin, Sukwon Yun, Tianlong Chen, Manolis Kellis, Marinka Zitnik

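AI summary: Proposes SPATIA, a model that fuses cell morphology, gene expression, and spatial context, using confidence-aware flow matching and morphology-profile alignment for multi-scale generation and prediction; it outperforms 18 baseline models across 12 tasks.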

Comments ICML 2026

Abstract

Understanding how cellular morphology, gene expression, and spatial context jointly shape tissue function is a central challenge in biology. Image-based spatial transcriptomics technologies now provide high-resolution measurements of cell images and gene expression profiles, but existing methods typically analyze these modalities in isolation or at limited resolution. We address the problem by introducing SPATIA, a multi-level generative and predictive model that learns unified, spatially aware representations by fusing morphology, gene expression, and spatial context from the cell to the tissue level. SPATIA also incorporates a spatially conditioned generative framework with confidence-aware OT reweighting and morphology-profile alignment for modeling target-state morphology distributions. Specifically, we propose a confidence-aware flow matching objective that reweights weak optimal-transport pairs based on uncertainty. We further apply morphology-profile alignment to encourage biologically meaningful image generation, enabling the modeling of microenvironment-dependent phenotypic transitions. We assembled a multi-scale dataset consisting of 25.9 million cell-gene pairs across 17 tissues. We benchmark SPATIA against 18 models across 12 tasks, spanning categories such as phenotype generation, annotation, clustering, gene imputation, and cross-modal prediction. SPATIA achieves improved performance over state-of-the-art models, improving generative fidelity by 8% and predictive accuracy by up to 3%.

2602.05790 2026-06-17 cs.IT cs.LG math.IT stat.ML updated version

Price of metric universality in vector quantization is at most 0.11 bit

Alina Harbuzova, Or Ordentlich, Yury Polyanskiy

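AI summary: Proves that a universal codebook exists which, for all possible statistics of X and Gaussian W, performs at least as well as an X-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension.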

Comments 41 pages, 1 figure

Abstract

Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second order) statistics of $X$ and requires a careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such a universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows the existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms.
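
To give a sense of scale for a 0.11 bit rate loss, the sketch below uses the textbook rate-distortion function of a scalar Gaussian source under squared error, $D(R) = \sigma^2 2^{-2R}$. This is only a back-of-envelope illustration under that assumption; the paper's setting (vector quantization of $W$ under a distortion that depends on the statistics of $X$) is more general, and the function names are hypothetical.

```python
def gaussian_distortion(variance, rate_bits):
    """Scalar Gaussian rate-distortion function: D(R) = sigma^2 * 2^(-2R)."""
    if rate_bits < 0:
        raise ValueError("rate must be non-negative")
    return variance * 2.0 ** (-2.0 * rate_bits)


def distortion_penalty(rate_loss_bits):
    """Factor by which D grows when the rate drops by rate_loss_bits."""
    return 2.0 ** (2.0 * rate_loss_bits)


# Assumed example: unit-variance source at 4 bits versus 4 - 0.11 bits.
full_rate = gaussian_distortion(1.0, 4.0)
reduced_rate = gaussian_distortion(1.0, 4.0 - 0.11)
penalty = reduced_rate / full_rate
```

Under this simplified model, giving up 0.11 bit per dimension multiplies the mean squared error by about 1.16, independent of the starting rate.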

2602.04901 2026-06-17 q-bio.GN cs.LG updated version

Beyond Independent Genes: Learning Module-Inductive Representations for Single-Cell Gene Perturbation Prediction

Jiafa Ruan, Ruijie Quan, Liyang Xu, Zongxin Yang, Yi Yang

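AI summary: Proposes scBIG, a framework that learns module-level representations of coordinated gene programs via gene-relation clustering, a gene-cluster-aware encoder, and structure-aware alignment, and combines them with conditional flow matching for flexible, generalizable perturbation prediction, with an average improvement of 6.7% on multiple single-cell perturbation benchmarks.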

Abstract

Predicting transcriptional responses to genetic perturbations is a central problem in functional genomics. In practice, perturbation responses are rarely gene-independent but instead manifest as coordinated, program-level transcriptional changes among functionally related genes. However, most existing methods do not explicitly model such coordination, due to gene-wise modeling paradigms and reliance on static biological priors that cannot capture dynamic program reorganization. To address these limitations, we propose scBIG, a module-inductive perturbation prediction framework that explicitly models coordinated gene programs. scBIG induces coherent gene programs from data via Gene-Relation Clustering, captures inter-program interactions through a Gene-Cluster-Aware Encoder, and preserves modular coordination using structure-aware alignment objectives. These structured representations are then modeled using conditional flow matching to enable flexible and generalizable perturbation prediction. Extensive experiments on multiple single-cell perturbation benchmarks show that scBIG consistently outperforms state-of-the-art methods, particularly on unseen and combinatorial perturbation settings, achieving an average improvement of 6.7% over the strongest baselines. The code is available at https://github.com/ttruan2426-dot/scBIG.

2602.04155 2026-06-17 stat.ML cs.GT cs.LG updated version

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

Jiwoo Han, Moulinath Banerjee, Yuekai Sun

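AI summary: Interprets group fairness as a bargaining problem among subpopulations, recovers the Kalai-Smorodinsky solution through a relative improvement criterion, and gives an axiomatic justification and finite-sample convergence guarantees.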

Comments Accepted at ICML 2026

Abstract

When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.
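
The abstract defines relative improvement as the ratio of actual risk reduction to potential reduction from a baseline predictor. The sketch below assumes that the potential reduction for a group is its baseline risk minus the best risk achievable for that group alone, and it selects among a finite set of candidate predictors. Both choices, and all names and numbers, are illustrative assumptions rather than the paper's formulation.

```python
def relative_improvement(risk, baseline_risk, best_risk):
    """(baseline - actual) / (baseline - best achievable) for one group."""
    potential = baseline_risk - best_risk
    if potential <= 0:
        raise ValueError("baseline must be strictly worse than the best risk")
    return (baseline_risk - risk) / potential


def worst_relative_improvement(risks, baseline, best):
    """Smallest relative improvement across groups for one predictor."""
    return min(relative_improvement(r, b, o) for r, b, o in zip(risks, baseline, best))


def maximin_choice(candidates, baseline, best):
    """Pick the candidate whose worst-off group improves the most."""
    return max(
        candidates,
        key=lambda name: worst_relative_improvement(candidates[name], baseline, best),
    )


# Hypothetical per-group risks for two groups.
baseline = [1.0, 0.5]   # risk of the baseline predictor in each group
best = [0.2, 0.4]       # assumed best achievable risk in each group
candidates = {"A": [0.3, 0.48], "B": [0.5, 0.43]}
chosen = maximin_choice(candidates, baseline, best)  # selects "B"
```

In this made-up example, minimizing the worst-group loss would pick candidate A (worst loss 0.48 versus 0.50), while maximizing the worst relative improvement picks B, because A realizes only 20% of the second group's attainable improvement.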

2602.02881 2026-06-17 cs.SE cs.AI updated version

Learning-Infused Formal Reasoning: From Contract Synthesis to Artifact Reuse and Formal Semantics

Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

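AI summary: Presents a long-term research vision for combining formal methods with artificial intelligence, building a knowledge-driven verification ecosystem through automated contract synthesis, semantic artifact reuse, and refinement theory to accelerate future assurance.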

Comments LNCS Proceedings Submitted Version. 17 pages. Accepted and presented at VERIFAI-2026: The Interplay between Artificial Intelligence and Software Verification LASER center, Villebrumier, France, March 8-11, 2026

Abstract

This paper articulates a long-term research vision for formal methods at the intersection with artificial intelligence, outlining multiple conceptual and technical dimensions and reporting on our ongoing work toward realising this vision. It advances a forward-looking perspective on the next generation of formal methods based on the integration of automated contract synthesis, semantic artifact reuse, and refinement-based theory. We argue that future verification systems must shift from building individual correctness proofs toward a cumulative, knowledge-driven paradigm in which specifications, contracts, and proofs are continuously synthesised and transferred across systems. To support this shift, we outline a hybrid framework combining large language models with graph-based representations to enable scalable semantic matching and principled reuse of verification artifacts. Learning-based components provide semantic guidance across heterogeneous notations and abstraction levels, while symbolic matching ensures formal soundness. Grounded in compositional reasoning, this vision points toward verification ecosystems that evolve systematically, leveraging past verification efforts to accelerate future assurance.

2601.22184 2026-06-17 cs.GT cs.LG cs.MA updated version

Tacit Coordination of Large Language Models

Ido Aharon, Emanuele La Malfa, Michael Wooldridge, Sarit Kraus

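AI summary: Studies how focal points emerge when large language models coordinate without communication in multi-agent settings; evaluated on games and search-and-rescue tasks, the models match or outperform humans in most scenarios but fail on tasks requiring numerical common sense or cultural salience, and learning-free strategies are proposed to improve coordination.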

Comments Code: https://github.com/EmanueleLM/focal-points

Abstract

Large Language Models (LLMs) are increasingly deployed in multi-agent settings that require coordination without communication, from human-AI interaction to safety-critical scenarios. Humans often overcome the absence of communication through focal points: salient solutions that naturally stand out to all participants. We present the first large-scale evaluation of how, when, and why focal points emerge in LLMs, comparing their behaviour with humans across cooperative and competitive games, including realistic search and rescue scenarios, demonstrating when focal points enable effective coordination. Across more than 20 open- and closed-source models, we find that LLMs exhibit a remarkable ability to coordinate without communication, often matching or outperforming humans. However, the same models consistently fail in tasks requiring numerical common sense or culturally nuanced notions of salience. We additionally evaluate simple learning-free strategies that substantially improve coordination both among LLMs and between humans and LLMs. Our results reveal striking coordination capabilities, as well as social limitations in modern LLMs, and offer new insight into the latent notions of salience encoded within them. Our findings caution against assuming that LLMs share humans' cultural and perceptual substrate when deployed in coordination settings.

2601.21455 2026-06-17 stat.ML cs.LG updated version

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

Yizhou Min, Yizhou Lu, Lanqi Li, Zhen Zhang, Jiaye Teng

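AI summary: Critically examines whether the standard conformal prediction metrics (coverage and interval length) are sufficient, shows that a counter-intuitive approach called the Prejudicial Trick (PT) can deceptively shorten interval length while keeping coverage valid, and proposes a new metric, interval stability, to detect such behavior.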

Abstract

Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length might be deceptively improved through a counter-intuitive approach termed the Prejudicial Trick (PT), while the coverage remains valid. Specifically, for any given test sample, PT probabilistically returns an interval, which is either null or constructed using an adjusted confidence level, thereby preserving marginal coverage. While PT potentially yields a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric, interval stability, which helps detect whether a new CP method implicitly improves the length based on such PT-like techniques. Code is available at https://github.com/benben-cd/PT-Conformal-Prediction.
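
A sketch of the mechanism described above, using split conformal prediction with absolute-residual scores. The abstract states only that the returned interval is either null or built with an adjusted confidence level so that marginal coverage is preserved; the parameter `p_null`, the specific adjusted level `(1 - alpha) / (1 - p_null)`, and all function names are assumptions chosen so that `(1 - p_null) * adjusted = 1 - alpha`.

```python
import math
import random


def conformal_quantile(cal_scores, coverage):
    """Split-conformal threshold: the ceil((n + 1) * coverage)-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * coverage)
    if k > n:
        return math.inf  # too few calibration points for this level
    return sorted(cal_scores)[k - 1]


def pt_interval(prediction, cal_scores, alpha, p_null, rng=random):
    """Return None (empty interval) with probability p_null, otherwise a
    symmetric interval built at the adjusted level (assumed construction)."""
    if not 0.0 <= p_null <= alpha < 1.0:
        raise ValueError("need 0 <= p_null <= alpha < 1")
    if rng.random() < p_null:
        return None  # the empty interval never covers the true value
    adjusted = (1.0 - alpha) / (1.0 - p_null)
    q = conformal_quantile(cal_scores, adjusted)
    return (prediction - q, prediction + q)
```

Repeated calls with the same input can return `None` on one run and an interval on the next, which is the kind of instability an interval-stability metric is meant to expose. Whether the average length actually drops depends on the score distribution; this sketch does not demonstrate that.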

2601.06862 2026-06-17 cs.CR cs.CV cs.LG cs.MM eess.IV updated version

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

Michael Sidorov, Ofer Hadar

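AI summary: Because ISPs cannot access encrypted content to assess QoE, proposes a CNN-based framework that predicts BRISQUE and MOS from packet sizes alone and outperforms previous models on WhatsApp and Zoom datasets.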

Abstract

The quality of the user experience has become one of the most important aspects in today's world, as it directly influences individuals' willingness to continue using or abandon a product or service. In this context, video conferencing applications (VCAs), which experienced widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data, the widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Since ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics on the data path. In this work, we present a simple yet effective QoE prediction framework based on an almost stock convolutional neural network (CNN) architecture that uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The proposed framework is simple, easy to implement, and does not require high-end computational resources, yet it provides superior prediction performance, as shown in our experiments on two custom datasets collected from WhatsApp and Zoom, which achieve substantial improvements over previous models for the QoE prediction task.

2512.25065 2026-06-17 cs.OS cs.AI cs.DC updated version

Vulcan: Instance-specialized, Verifiable Systems Heuristics Through LLM-driven Search

Rohit Dwivedula, Divyanshu Saxena, Sujay Yadalam, Eric Hayden Campbell, Daehyeok Kim, Aditya Akella

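AI summary: Proposes Vulcan, a framework that uses LLMs to generate systems heuristics, ensures safety by isolating decision logic and using the restricted language Anvil, and achieves substantial performance gains in scheduling, caching, and memory management.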

Comments 19 pages

Abstract

Systems resource management tasks rely primarily on hand-designed heuristics. However, growing hardware heterogeneity and workload diversity require heuristics specialized to particular deployment instances, making manual design expensive and difficult to scale. In this paper, we explore how to synthesize systems heuristics using LLMs. The main challenge is ensuring that generated heuristics execute safely, integrate correctly with the surrounding system, and still achieve strong performance. We propose Vulcan, a framework that identifies LLM-friendly interfaces that isolate core decision logic from the rest of the implementation. With Vulcan, LLM-generated code is restricted to simple stateless decision functions, while trusted runtime abstractions provide rich derived statistics for meaningful policy exploration without system-integration bugs. To ensure execution safety, LLMs synthesize heuristics in a restricted language, Anvil, that guarantees important properties by construction. We evaluate Vulcan across three well-studied domains and demonstrate up to 4.9x higher savings for spot-VM scheduling, up to 2x lower miss ratios for cache eviction, and up to 10% higher application performance for tiered-memory systems, while ensuring execution safety throughout.

2510.21127 2026-06-17 cs.NI cs.AI updated version

Enhanced Evolutionary Multi-Objective Deep Reinforcement Learning for Reliable and Efficient Wireless Rechargeable Sensor Networks

Bowei Tong, Hui Kang, Jiahui Li, Geng Sun, Jiacheng Wang, Yaoqi Yang, Bo Xu, Dusit Niyato

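AI summary: To address the trade-off between node survival rate and charging energy efficiency in wireless rechargeable sensor networks, proposes an enhanced evolutionary multi-objective deep reinforcement learning algorithm that combines an LSTM policy network, an MLP prospective increment model, and time-varying Pareto policy evaluation, and significantly outperforms existing methods.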

Comments The article content needs to be significantly revised

Abstract

Despite rapid advancements in sensor networks, conventional battery-powered sensor networks suffer from limited operational lifespans and frequent maintenance requirements that severely constrain their deployment in remote and inaccessible environments. As such, wireless rechargeable sensor networks (WRSNs) with mobile charging capabilities offer a promising solution to extend network lifetime. However, WRSNs face critical challenges from the inherent trade-off between maximizing the node survival rates and maximizing charging energy efficiency under dynamic operational conditions. In this paper, we investigate a typical scenario where mobile chargers move and charge the sensor, thereby maintaining the network connectivity while minimizing the energy waste. Specifically, we formulate a multi-objective optimization problem that simultaneously maximizes the network node survival rate and mobile charger energy usage efficiency across multiple time slots, which presents NP-hard computational complexity with long-term temporal dependencies that make traditional optimization approaches ineffective. To address these challenges, we propose an enhanced evolutionary multi-objective deep reinforcement learning algorithm, which integrates a long short-term memory (LSTM)-based policy network for temporal pattern recognition, a multilayer perceptron-based prospective increment model for future state prediction, and a time-varying Pareto policy evaluation method for dynamic preference adaptation. Extensive simulation results demonstrate that the proposed algorithm significantly outperforms existing approaches in balancing node survival rate and energy efficiency while generating diverse Pareto-optimal solutions. Moreover, the LSTM-enhanced policy network converges 25% faster than conventional networks, with the time-varying evaluation method effectively adapting to dynamic conditions.

2510.19528 2026-06-17 stat.ML cs.LG math.ST stat.TH updated version

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

Sebastian Reboul, Hélène Halconruy

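AI summary: Proposes a two-stage framework that learns upper and lower bounds on value functions from offline data and incorporates them into online algorithms; decoupling the two bounds gives more flexible and tighter approximations, the theoretical analysis yields high-probability regret bounds, and experiments show substantial regret reductions.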

Comments 35 pages, 5 figures

Abstract

We investigate the fundamental problem of leveraging offline data to accelerate online reinforcement learning, a direction with strong potential but limited theoretical grounding. Our study centers on how to learn and apply value envelopes within this context. To this end, we introduce a principled two-stage framework: the first stage uses offline data to derive upper and lower bounds on value functions, while the second incorporates these learned bounds into online algorithms. Our method extends prior work by decoupling the upper and lower bounds, enabling more flexible and tighter approximations. In contrast to approaches that rely on fixed shaping functions, our envelopes are data-driven and explicitly modeled as random variables, with a filtration argument ensuring independence across phases. The analysis establishes high-probability regret bounds determined by two interpretable quantities, thereby providing a formal bridge between offline pre-training and online fine-tuning. Empirical results on tabular MDPs demonstrate substantial regret reductions compared with both UCBVI and prior methods while remaining competitive with related approaches.

2510.18003 2026-06-17 cs.CR cs.AI cs.CY updated version

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

Fengqing Jiang, Yichen Feng, Yuetai Li, Luyao Niu, Basel Alomair, Radha Poovendran

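AI summary: Proposes the BadScientist framework, which generates fabricated papers using presentation-manipulation strategies that require no real experiments, and reveals systematic vulnerabilities in LLM review systems: the acceptance rate of fabricated papers reaches a certain level, reviewers exhibit a concern-acceptance conflict, and current detection methods have limited effect.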

Comments ACL 2026; Project Page at https://bad-scientist.github.io/

Abstract

The convergence of LLM-powered research assistants and AI-based peer review systems creates a critical vulnerability: fully automated publication loops where AI-generated research is evaluated by AI reviewers without human oversight. We investigate this through BadScientist, a framework that evaluates whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems. Our generator employs presentation-manipulation strategies requiring no real experiments. We develop a rigorous evaluation framework with formal error guarantees (concentration bounds and calibration analysis), calibrated on real data. Our results reveal systematic vulnerabilities: fabricated papers achieve acceptance rates up to . Critically, we identify a concern-acceptance conflict: reviewers frequently flag integrity issues yet assign acceptance-level scores. Our mitigation strategies show only marginal improvements, with detection accuracy barely exceeding random chance. Despite provably sound aggregation mathematics, integrity checking systematically fails, exposing fundamental limitations in current AI-driven review systems and underscoring the urgent need for defense-in-depth safeguards in scientific publishing.

2502.18049 2026-06-17 stat.ML cs.LG updated version

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

无崩溃的递归学习:基于加权的稳定化框架

Hengzhi He, Shirong Xu, Guang Cheng

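AI summary Addresses model collapse in recursive generative model training with a weighting-based training strategy. For settings that mix real and synthetic data, it derives a unified expression for the optimal weighting scheme and reveals a trade-off between leveraging synthetic data and model performance.

Comments This article has been accepted for publication in Journal of the Royal Statistical Society: Series B, published by Oxford University Press

AI Chinese abstract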

最近的研究发现了递归生成模型训练中的一个有趣现象,称为模型崩溃,即基于先前模型生成的数据训练的模型表现出严重的性能下降。解决这一问题并开发更有效的训练策略已成为生成模型研究的核心挑战。在本文中,我们在一个新框架下研究这一现象,其中生成模型在每一步迭代中基于新收集的真实数据和上一步的合成数据的组合进行训练。为了开发整合真实和合成数据的最优训练策略,我们评估了加权训练方案在各种场景下的性能,包括高斯分布估计、广义线性模型和非参数估计。我们从理论上刻画了合成数据的混合比例和加权方案对最终模型性能的影响。我们的关键发现是,在不同设置下,不同合成数据比例下的最优加权方案渐近地遵循一个统一表达式,揭示了利用合成数据与模型性能之间的基本权衡。在某些情况下,分配给真实数据的最优权重对应于黄金比例的倒数。最后,我们在大量模拟数据集和一个真实表格数据集上验证了我们的理论结果。

English abstract

Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation, generalized linear models, and nonparametric estimation. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and model performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.
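
For reference, the reciprocal of the golden ratio mentioned in the abstract has a simple closed form. The identity below is standard; the symbol $\varphi$ is notation introduced here, and the abstract does not say in which settings this value arises as the optimal weight.

```latex
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618,
\qquad
\frac{1}{\varphi} = \frac{\sqrt{5}-1}{2} = \varphi - 1 \approx 0.618
```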

2510.01359 2026-06-17 cs.CR cs.AI updated version

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

破解代码:通过系统性越狱攻击对AI代码代理进行安全评估

Shoumik Saha, Jifan Chen, Sam Mayers, Sanjay Krishna Gouda, Zijian Wang, Varun Kumar

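AI summary Proposes the JAWS-Bench benchmark, which evaluates jailbreak risk in code LLM agents across three workspace levels. It finds that wrapping a model in an agent raises attack success rate by 1.6× and reveals a high proportion of executable attack code.

Comments 22 pages, 18 figures, 8 tables

AI Chinese abstract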

具备代码能力的大语言模型(LLM)代理被嵌入软件工程工作流中,可以读取、编写和执行代码,这使得“越狱”的风险超越了纯文本环境。先前的评估侧重于拒绝或有害文本检测,而未涉及代理是否编译并运行恶意程序。我们提出了JAWS-Bench(跨工作区越狱基准),该基准涵盖三个逐步升级的工作区场景,以反映攻击者的能力:空工作区(JAWS-0)、单文件工作区(JAWS-1)和多文件工作区(JAWS-M)。我们将其与一个分层的、可执行感知的评判框架配对,该框架测试(i)合规性、(ii)攻击成功性、(iii)语法正确性以及(iv)运行时可执行性,以衡量可部署的危害。在来自五个系列的七个LLM后端上,JAWS-0中的纯提示攻击实现了61%的合规性;其中58%有害,52%可解析,27%可端到端运行。在JAWS-1中,更强模型的合规性达到约100%,平均攻击成功率(ASR)约为71%;JAWS-M将平均ASR提升至约75%,其中32%的攻击代码可运行。将LLM封装为代理会使ASR提高1.6倍,这是通过在规划和工具使用过程中推翻初始拒绝来实现的。类似的趋势也出现在OpenHands、SWE-Agent和OpenAI Codex上,表明我们的JAWS-Bench是代理无关的。类别分析识别出哪些攻击类别最易受攻击且最可部署,从而激励了执行感知的防御和保留拒绝的代理设计。

English abstract

Code-capable large language model (LLM) agents are embedded in software engineering workflows where they can read, write, and execute code, raising "jailbreak" stakes beyond text-only settings. Prior evaluations emphasize refusal or harmful-text detection, leaving open whether agents compile and run malicious programs. We present JAWS-Bench (Jailbreaks Across WorkSpaces), a benchmark spanning three escalating workspace regimes mirroring attacker capability: empty (JAWS-0), single-file (JAWS-1), and multi-file (JAWS-M). We pair this with a hierarchical, executable-aware Judge Framework that tests (i) compliance, (ii) attack success, (iii) syntactic correctness, and (iv) runtime executability, to measure deployable harm. Across seven LLM backends from five families, prompt-only attacks in JAWS-0 achieve 61% compliance; 58% are harmful, 52% parse, and 27% run end-to-end. In JAWS-1, compliance reaches ~100% for stronger models with a mean ASR (Attack Success Rate) ~71%; JAWS-M raises mean ASR to ~75%, with 32% runnable attack code. Wrapping an LLM in an agent increases ASR by 1.6$\times$, by overturning initial refusals during planning and tool use. Similar trends hold for OpenHands, SWE-Agent, and OpenAI Codex, suggesting our JAWS-Bench is agent-agnostic. Category analyses identify which attack classes are most vulnerable and deployable, motivating execution-aware defenses and refusal-preserving agent designs.

2501.10729 2026-06-17 stat.ME cs.LG stat.ML updated version

Robust Local Polynomial Regression with Similarity Kernels

基于相似性核的稳健局部多项式回归

Yaniv Shulman

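AI summary To address the sensitivity of traditional local polynomial regression to outliers, proposes a conditional density kernel weighting method that incorporates response-variable information. It reduces the influence of outliers through localized density estimation and achieves lower empirical bias while remaining competitive with standard LOWESS.

AI Chinese abstract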

局部多项式回归(LPR)因其灵活性和简单性,是一种广泛使用的非参数方法,用于建模复杂关系。它通过拟合低阶多项式到数据的局部子集(按邻近度加权)来估计回归函数。然而,传统的LPR对异常值和高杠杆点敏感,这些点会显著影响估计精度。本文重新审视用于计算回归权重的核函数,并提出一种新颖的框架,将预测变量和响应变量都纳入加权机制。本工作的重点是一种条件密度核,通过局部密度估计减轻异常值的影响,从而稳健地估计权重。所提出的方法已在Python中实现,并在此https URL公开提供。总体分析量化了基于密度的稳健加权引起的偏差,报告的实验显示,与迭代稳健LOWESS相比,经验偏差更低,同时与标准LOWESS保持竞争力。这一进展为传统LPR提供了有前景的扩展,为稳健回归应用开辟了新的可能性。

English abstract

Local Polynomial Regression (LPR) is a widely used nonparametric method for modeling complex relationships due to its flexibility and simplicity. It estimates a regression function by fitting low-degree polynomials to localized subsets of the data, weighted by proximity. However, traditional LPR is sensitive to outliers and high-leverage points, which can significantly affect estimation accuracy. This paper revisits the kernel function used to compute regression weights and proposes a novel framework that incorporates both predictor and response variables in the weighting mechanism. The focus of this work is a conditional density kernel that robustly estimates weights by mitigating the influence of outliers through localized density estimation. The proposed method is implemented in Python and is publicly available at https://github.com/yaniv-shulman/rsklpr. The population analysis quantifies the bias induced by density-based robust weighting, and the reported experiments show lower empirical bias than iterative robust LOWESS while remaining competitive with standard LOWESS. This advancement provides a promising extension to traditional LPR, opening new possibilities for robust regression applications.
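
The baseline that the abstract starts from, fitting a low-degree polynomial to nearby points weighted by proximity, can be sketched as follows. This is the standard non-robust local linear estimator, not the rsklpr implementation; the function name `local_linear_fit`, the Gaussian kernel, and the `bandwidth` parameter are illustrative assumptions.

```python
import numpy as np

def local_linear_fit(x, y, x0, bandwidth):
    """Estimate E[y | x = x0] with a kernel-weighted degree-1 polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Gaussian proximity weights (assumed kernel): points near x0 count more.
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    # Design matrix centred at x0, so the intercept is the fitted value at x0.
    X = np.column_stack([np.ones_like(x), x - x0])
    # Weighted least squares, done by scaling each row by sqrt(weight).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]

x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0
print(local_linear_fit(x, y, x0=0.3, bandwidth=0.2))
```

Here the weights depend only on the predictor distance `x - x0`. The framework described in the abstract instead lets the weights depend on the response as well, which is what gives it robustness to outliers.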

2507.05164 2026-06-17 math.DS cs.LG nlin.AO updated version

A Dynamical Systems Perspective on the Analysis of Neural Networks

神经网络分析的动力学系统视角

Dennis Chemnitz, Maximilian Engel, Christian Kuehn, Sara-Viola Kuntz

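AI summary Uses dynamical systems to reformulate challenges in deep neural networks, gradient descent, and related topics. It studies information propagation, training dynamics, and mean-field limits, covering properties such as network embedding, stability, and graph limits.

Comments preprint of a book chapter contribution

AI Chinese abstract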

在本章中,我们利用动力学系统分析机器学习算法的几个方面。作为阐述性贡献,我们展示了如何将深度神经网络、(随机)梯度下降及相关主题中的各种挑战重新表述为动力学陈述。我们还解决了三个具体挑战。首先,我们考虑信息通过神经网络的传播过程,即研究不同架构下的输入-输出映射。我们解释了增强神经ODE的通用嵌入性质(可表示给定正则性的任意函数)、根据合适函数类对多层感知器和神经ODE的分类,以及神经延迟方程中的记忆依赖性。其次,我们从动力学角度考虑神经网络的训练方面。我们描述了梯度下降的动力学系统视角,并研究了超定问题的稳定性。然后我们将此分析扩展到过参数化设置,并描述了稳定性边缘现象,也涉及隐式偏差的可能解释。对于随机梯度下降,我们通过插值解的Lyapunov指数展示了过参数化设置的稳定性结果。第三,我们解释了关于神经网络平均场极限的几个结果。我们描述了一个结果,该结果通过有向图测度将现有技术扩展到涉及图极限的异质神经网络。这表明大类神经网络自然落入图上Kuramoto型模型及其大图极限的框架内。最后,我们指出使用动力学研究可解释和可靠AI的类似策略也可应用于生成模型或梯度训练方法中的基本问题(如反向传播或梯度消失/爆炸)等设置。

English abstract

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.
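
As a one-dimensional illustration of the kind of stability statement mentioned above (a textbook fact, not a result taken from the chapter), consider gradient descent with step size $\eta$ on the quadratic $f(x) = \tfrac{\lambda}{2}x^2$ with $\lambda > 0$. The symbols $\eta$ and $\lambda$ are notation assumed here.

```latex
x_{k+1} = x_k - \eta f'(x_k) = (1 - \eta\lambda)\,x_k
\quad\Longrightarrow\quad
x_k = (1 - \eta\lambda)^k x_0,
\qquad
x_k \to 0 \iff |1 - \eta\lambda| < 1 \iff 0 < \eta < \frac{2}{\lambda}.
```

The threshold $\eta\lambda = 2$ is the boundary usually referred to in discussions of the edge of stability, with the largest Hessian eigenvalue playing the role of $\lambda$.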

2410.08562 2026-06-17 cond-mat.mtrl-sci cs.LG updated version

Adaptable Method for Crystal Design across Diverse Constraints and Objectives with Pretrained Property Predictors

基于预训练属性预测器的可适应方法用于跨多样约束与目标的晶体设计

Akihiro Fujii, Yoshitaka Ushiku, Koji Shimizu, Anh Khoa Augustin Lu, Satoshi Watanabe

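AI summary Proposes a direct predictor-guided gradient optimization method that combines off-the-shelf predictors, site-wise element masks, template initialization, and task-specific losses for data-efficient, constraint-rich crystal design. It outperforms generative and Bayesian baselines on perovskites and supports half-metal design.

AI Chinese abstract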

先进的晶体设计可以加速从光伏到自旋电子学等应用中的材料发现。实际设计必须满足多种属性和物理约束,然而现有的基于机器学习的方法通常依赖于大型数据集、重新训练或任务特定的生成器。在这里,我们展示了直接预测器引导的梯度优化通过结合现成预测器与位点元素掩码、模板初始化和任务特定损失,实现了数据高效、约束丰富的晶体设计。在钙钛矿中,它在三个目标——带隙、形成能和容忍因子——以及两个硬约束下优于生成和贝叶斯基线。DFT评估进一步表明,尽管使用的预测器训练数据约为领先生成模型的十分之一,其带隙目标性能仍具有竞争力。通过灵活组合预训练预测器与应用导向的掩码和自定义损失,同一框架支持半金属设计。这种模块化可以帮助研究人员和工程师将多样化的应用需求直接转化为优化的候选晶体,且计算成本最低。

English abstract

Advanced crystal design can accelerate materials discovery across applications from photovoltaics to spintronics. Practical design must satisfy multiple properties and physical constraints, yet existing machine-learning-based approaches to such design often depend on large datasets, retraining, or task-specific generators. Here, we show that direct predictor-guided gradient optimization enables data-efficient, constraint-rich crystal design by combining off-the-shelf predictors with site-wise element masks, template initialization, and task-specific losses. In perovskites, it outperformed generative and Bayesian baselines under three targets -- band gap, formation energy, and tolerance factor -- and two hard constraints. DFT assessment further showed band-gap targeting competitive with a leading generative model despite using predictors trained on roughly one-tenth of the data. By flexibly combining pretrained predictors with application-oriented masks and custom losses, the same framework supported half-metal design. Such modularity could help researchers and engineers translate diverse application requirements directly into optimized candidate crystals with minimal computational cost.

2405.15379 2026-06-17 stat.ML cs.LG math.PR math.ST stat.TH updated version

Randomized Midpoint Method for Log-Concave Sampling under Constraints

对数凹分布约束采样的随机中点方法

Yifeng Yu, Shijie Zhang, Lu Yu

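AI summary Proposes randomized midpoint discretizations of overdamped and kinetic Langevin diffusions in constrained domains, builds a unified framework via projection operators, proves convergence guarantees in Wasserstein-q distance, and derives lower bounds showing the results are near-optimal.

AI Chinese abstract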

本文研究在凸紧集上支撑的对数凹分布的采样问题,特别关注约束域中过阻尼和动能朗之万扩散的随机中点离散化。我们重新审视了通过投影算子处理约束的近端框架,并发展了一个更通用的公式,涵盖了欧几里得、Bregman和Gauge投影。由此产生的光滑近似允许对约束下的朗之万算法及其变体进行统一且易于处理的分析。在此框架内,我们建立了光滑代理与目标分布之间Wasserstein-$q$($q\geqslant 1$)距离的收敛保证。我们进一步推导了互补的下界,表明结果在阶上是近乎最优的。基于这种紧致近似分析,我们获得了约束下随机中点朗之万算法的新收敛保证,以及普通和动能朗之万蒙特卡洛方法的改进界,从而推进了约束扩散采样的理论理解。

English abstract

In this paper, we study the problem of sampling from log-concave distributions supported on convex and compact sets, with a particular focus on the randomized midpoint discretization of both overdamped and kinetic Langevin diffusions in constrained domains. We revisit the proximal framework for handling constraints through projection operators and develop a more general formulation that encompasses Euclidean, Bregman, and Gauge projections. The resulting smooth approximation allows a unified and tractable analysis of Langevin algorithms and their variants under constraints. Within this framework, we establish convergence guarantees in Wasserstein-$q$ $(q\geqslant 1)$ distances between the smooth surrogate and the target distribution. We further derive complementary lower bounds, showing that the results are near-optimal in order. Building upon this tight approximation analysis, we obtain new convergence guarantees for the randomized midpoint Langevin algorithms and refined bounds for both vanilla and kinetic Langevin Monte Carlo methods under constraints, thereby advancing the theoretical understanding of constrained diffusion-based sampling.
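
The randomized midpoint idea can be sketched for the simplest case, overdamped Langevin dynamics without constraints. This is a minimal sketch under stated assumptions: it omits the projection-based smooth approximation that the paper uses to handle constraints, and the function name `randomized_midpoint_langevin` and its parameters are hypothetical.

```python
import numpy as np

def randomized_midpoint_langevin(grad_f, x0, step, n_steps, rng):
    """Approximately sample from a density proportional to exp(-f), unconstrained."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        alpha = rng.uniform()  # random midpoint location in [0, 1]
        # Brownian increments over [0, alpha*step] and [alpha*step, step], so the
        # midpoint and the full step are driven by the same Brownian path.
        w_mid = np.sqrt(alpha * step) * rng.standard_normal(x.shape)
        w_rest = np.sqrt((1.0 - alpha) * step) * rng.standard_normal(x.shape)
        # Euler step from x to the random midpoint.
        x_mid = x - alpha * step * grad_f(x) + np.sqrt(2.0) * w_mid
        # Full step using the gradient evaluated at the midpoint.
        x = x - step * grad_f(x_mid) + np.sqrt(2.0) * (w_mid + w_rest)
    return x

# Usage: 5000 independent coordinates targeting a standard Gaussian, f(x) = x^2 / 2.
rng = np.random.default_rng(0)
samples = randomized_midpoint_langevin(lambda x: x, np.zeros(5000), 0.1, 200, rng)
print(samples.mean(), samples.var())
```

Evaluating the drift at a uniformly random point inside the step, rather than at its start, is what distinguishes this scheme from the plain Euler discretization of Langevin Monte Carlo.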

2503.05598 2026-06-17 cs.CE cs.LG updated version

From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing

从理论到应用:神经算子在科学计算中的实用入门

Prashant K. Jha

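AI summary Reviews neural network architectures for learning solution operators of parametric PDEs, including DeepONet, PCANet, and the Fourier Neural Operator. It evaluates them on three canonical problems and discusses their use as surrogate models for Bayesian inverse problems, along with open challenges.

Comments 72 pages, 22 figures, GitHub repository: https://github.com/CEADpx/neural_operators

AI Chinese abstract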

本综述考察了用于学习参数偏微分方程(PDE)解算子的神经算子架构,重点强调概念清晰性和实际实现。该工作分析了关键模型,包括DeepONet、PCANet和傅里叶神经算子,突出了它们的基础表示、计算结构和比较性能。这些架构在三个典型PDE问题上进行了演示:泊松方程、线弹性问题和超弹性问题。为使内容自包含,介绍了关键基础主题,包括函数空间的有限维表示、奇异值分解以及从无限维函数空间中采样。除了正向建模,本综述还讨论了在贝叶斯逆问题框架中使用神经算子作为替代模型,包括先验指定、正向映射近似和后验计算。三种神经算子架构的性能在分布内样本、分布外样本和贝叶斯推断任务上进行了评估。本综述还讨论了与预测精度和泛化相关的挑战,概述了新兴策略,如基于残差的误差校正和多层级训练。最后,本综述将神经算子定位在更广泛的科学计算工作流中,并指出了实现可靠、可扩展算子学习的方向。

English abstract

This review examines neural operator architectures for learning solution operators of parametric partial differential equations (PDEs), with an emphasis on conceptual clarity and practical implementation. The work analyzes key models, including DeepONet, PCANet, and the Fourier Neural Operator, highlighting their underlying representations, computational structures, and comparative performance. These architectures are demonstrated on three canonical PDE problems: the Poisson equation, a linear elasticity problem, and a hyperelasticity problem. To make the presentation self-contained, key foundational topics are introduced, including finite-dimensional representations of function spaces, singular-value decomposition, and sampling from infinite-dimensional function spaces. Beyond forward modeling, the review discusses the use of neural operators as surrogate models within a Bayesian inverse-problem framework, including prior specification, forward-map approximation, and posterior computation. The performance of the three neural-operator architectures is evaluated on in-distribution samples, out-of-distribution samples, and Bayesian inference tasks. The review also discusses challenges related to prediction accuracy and generalization, outlining emerging strategies such as residual-based error correction and multi-level training. The review concludes by positioning neural operators within broader scientific-computing workflows and by identifying directions for reliable, scalable operator learning.

2503.04507 2026-06-17 q-bio.QM cs.LG updated version

The Morse Transform for Discrete Shape Analysis

离散形状分析的Morse变换

Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda

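AI summary Proposes a topological transform based on directional piecewise-linear Morse theory that quantifies the geometry of embedded objects by recording critical points across multiple height functions. The resulting feature vectors achieve the highest mean AUROC in ligand-based virtual screening.

Comments 37 pages, 3 main figures, 2 main tables, 12 appendix figures and 4 appendix tables

AI Chinese abstract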

物体的几何形状在调节其与物理世界的相互作用中起着至关重要的作用。然而,为了统计推断或分类任务的目的,用数值描述几何信息仍然困难。在这里,我们引入了一种新的拓扑变换,它利用定向分段线性Morse理论,通过编录多个高度函数下的临界点来量化嵌入对象的几何形状。该Morse变换的输出记录了表征底层形状的临界点的高度和局部拓扑类型(峰、谷或鞍点),保留了比欧拉特征变换更精细的信息,同时自然优先考虑形状的最外层区域。关键的是,该输出可以进一步压缩为丰富而紧凑的特征向量。我们将Morse特征向量作为配体虚拟筛选(LBVS)的描述符进行基准测试,这本质上依赖于分子的形状。在常见的梯度提升树分类流程下,与其他拓扑变换描述符和标准基于形状的LBVS描述符相比,Morse描述符实现了最高的平均AUROC。

English abstract

The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.

2403.18957 2026-06-17 cs.CY cs.CL cs.LG cs.SI updated version

Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models

使用大型视觉语言模型审核不安全的用户生成内容游戏中的非法在线图像推广

Keyan Guo, Ayush Utkarsh, Wenbo Ding, Isabelle Ondracek, Ziming Zhao, Guo Freeman, Nishant Vishwamitra, Hongxin Hu

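AI summary Targets images on social media that illicitly promote unsafe UGC games. It proposes UGCG-Guard, a system that uses large vision-language models and a conditional prompting strategy for zero-shot domain adaptation, reaching 94% detection accuracy.

Comments In Proceedings of the 33rd USENIX Conference on Security Symposium (SEC '24), August 14-16, 2024

AI Chinese abstract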

在线用户生成内容游戏(UGCG)在儿童和青少年中越来越受欢迎,用于社交互动和更具创造性的在线娱乐。然而,它们带来了更高的接触露骨内容的风险,引发了对儿童和青少年在线安全的日益关注。尽管存在这些担忧,但很少有研究关注社交媒体上基于图像的非法不安全UGCG推广问题,这种推广可能无意中吸引年轻用户。这一挑战源于难以获得全面的UGCG图像训练数据以及这些图像与传统不安全内容不同的独特性质。在这项工作中,我们迈出了研究不安全UGCG非法推广威胁的第一步。我们收集了一个包含2,924张图像的真实世界数据集,这些图像展示了游戏创作者用于推广UGCG的多种色情和暴力内容。我们的深入研究揭示了对此问题的新认识,以及自动标记非法UGCG推广的迫切需求。我们还创建了一个尖端系统UGCG-Guard,旨在帮助社交媒体平台有效识别用于非法UGCG推广的图像。该系统利用最近引入的大型视觉语言模型(VLM),并采用一种新颖的条件提示策略进行零样本域适应,以及思维链(CoT)推理进行上下文识别。UGCG-Guard取得了出色结果,在现实场景中检测这些用于非法推广游戏的图像时准确率达到94%。

English abstract

Online user-generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.

2206.06208 2026-06-17 eess.AS cs.CL cs.SD

Automated Evaluation of Standardized Dementia Screening Tests

标准化痴呆筛查测试的自动化评估

Franziska Braun, Markus Förstel, Bastian Oppermann, Andreas Erzigkeit, Thomas Hillemacher, Hartmut Lehfeld, Korbinian Riedhammer

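AI summary Studies automated scoring of standardized dementia screening tests by analyzing score correlations for manual and automatic transcripts. It finds that automatic scoring is stricter than human scoring on some tasks while overall correlation remains high.

Comments Submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2206.05018

Journal ref
Proceedings of Interspeech 2022
AI Chinese abstract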

在痴呆筛查和监测中,标准化测试在临床实践中起关键作用,因为它们旨在通过测量多种认知任务的表现来最小化主观性。本文报告了一项研究,该研究包括一个半标准化的病史采集,随后是两种标准化的神经心理学测试,即SKT和CERAD-NB。这些测试包括命名物体、学习词列表等基本任务,以及广泛使用的工具如MMSE。大多数任务是口头进行的,因此应适合基于转录文本的自动化评分。对于前30名患者的第一批,我们分析了专家手动评分与基于手动和自动转录的自动评分之间的相关性。对于SKT和CERAD-NB,我们观察到使用手动转录本时的高到完美相关性;对于某些相关性较低的任务,自动评分比人类参考更严格,因为其仅限于音频。使用自动转录本时,相关性下降如预期,与识别准确性相关;然而,我们仍观察到高达0.98(SKT)和0.85(CERAD-NB)的高相关性。我们证明使用词替代可以缓解识别错误,从而提高与专家评分的相关性。

English abstract

For dementia screening and monitoring, standardized tests play a key role in clinical routine since they aim at minimizing subjectivity by measuring performance on a variety of cognitive tasks. In this paper, we report on a study that consists of a semi-standardized history taking followed by two standardized neuropsychological tests, namely the SKT and the CERAD-NB. The tests include basic tasks such as naming objects and learning word lists, but also widely used tools such as the MMSE. Most of the tasks are performed verbally and should thus be suitable for automated scoring based on transcripts. For the first batch of 30 patients, we analyze the correlation between expert manual evaluations and automatic evaluations based on manual and automatic transcriptions. For both SKT and CERAD-NB, we observe high to perfect correlations using manual transcripts; for certain tasks with lower correlation, the automatic scoring is stricter than the human reference since it is limited to the audio. Using automatic transcriptions, correlations drop as expected and are related to recognition accuracy; however, we still observe high correlations of up to 0.98 (SKT) and 0.85 (CERAD-NB). We show that using word alternatives helps to mitigate recognition errors and subsequently improves correlation with expert scores.

2606.17977 2026-06-17 econ.EM new submission

Beyond Parallel Trends in Staggered Difference-in-Differences: Identification under Higher-Order Parallelism

超越交错双重差分中的平行趋势:高阶平行性下的识别

Zecharias Anteneh

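AI summary Proposes a hierarchy of higher-order parallelism assumptions to replace the conventional parallel trends assumption, achieving point identification of cohort-specific and aggregate treatment effects in staggered difference-in-differences designs, and proves an aggregation theorem.

Comments 38 pages, 4 figures. Companion Stata command (anddp) implementing the estimator will be available soon at https://github.com/zanteneh/anddp

AI Chinese abstract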

在双重差分设计中,平行趋势假设要求处理组和对照组之间的结果差距在未处理情况下保持平坦。预处理事件研究经常拒绝这一平坦差距要求。现有的应对措施包括参数趋势控制以及基于违规程度假设的处理效应边界。本文表明,在严格更弱的假设下,交错设计中队列特定和平均处理效应的点识别仍然可以实现。我将平坦差距要求替换为高阶条件层次 Parallel[p],将该框架嵌入 Callaway 和 Sant'Anna (2021) 的组-时间平均处理效应结构中,并证明了一个聚合定理,该定理适用于不同队列在不同可行多项式阶数下被识别的情况,这是交错设计特有的此前未解决的挑战。一个序贯阶数选择程序指导应用实践。蒙特卡洛证据证实,选择后自助法覆盖接近名义水平,且推断对现实序列相关具有稳健性。应用于医疗补助扩展数据,该方法得到的点估计基于预处理数据未拒绝的假设,而同样的数据明确拒绝了平坦差距要求。

English abstract

In difference-in-differences designs, the parallel trends assumption requires that the outcome gap between treated and control units would have remained flat absent treatment. Pre-treatment event studies frequently reject this flat-gap requirement. Existing responses include parametric trend controls and bounds on the treatment effect under assumptions about the magnitude of the violation. This paper shows that point identification of cohort-specific and aggregate treatment effects in staggered designs remains achievable under strictly weaker assumptions. I replace the flat-gap requirement with a hierarchy of higher-order conditions, Parallel[p], embed this framework in the group-time average treatment effect structure of Callaway and Sant'Anna (2021), and prove an aggregation theorem for the case where different cohorts are identified under different feasible polynomial orders, a challenge unique to staggered designs that has not been previously addressed. A sequential order-selection procedure guides applied practice. Monte Carlo evidence confirms that post-selection bootstrap coverage remains near-nominal and that inference is robust to realistic serial correlation. Applied to Medicaid expansion data, the method yields point estimates resting on an assumption the pre-treatment data do not reject, in contrast to the flat-gap requirement which those same data decisively reject.

2606.18196 2026-06-17 eess.SP new submission

Receiver-Aware Analysis and Verification of the Spectral Separation Coefficient Under Interference-Induced Degradation

接收机感知的干扰诱导退化下频谱分离系数的分析与验证

Lucas Heublein, Fabian Benschuh, Alexander Rügamer, Felix Ott

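AI summary Computes a receiver-dependent spectral separation coefficient (SSC) by incorporating receiver front-end characteristics, and experimentally verifies the robustness of the interference impact computation using real-world and simulated datasets.

Comments 7 pages, 4 figures

AI Chinese abstract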

干扰对基于卫星的定位系统构成重大挑战,因此准确量化特定干扰类型对接收机性能以及由此产生的位置计算可靠性的影响至关重要。当前实践中,干扰影响通常使用与接收机无关的指标进行量化,而接收机特定的前端特性要么被理想化,要么仅被隐含考虑。在本文中,我们通过将接收机特定的前端特性明确纳入干扰影响的计算中,并通过实验验证所得的依赖接收机的分析,来解决这一局限性。因此,我们记录了一个包含210个不同干扰场景的真实世界开放场数据集,并针对特定接收机模块计算了依赖接收机的频谱分离系数(SSC)和干扰影响。此外,我们使用由射频星座模拟器(RFCS)生成的受控数据集验证了计算,该模拟器采用相同的接收机模块并回放类似的干扰类别。两种环境下获得的结果比较证明了干扰影响计算的鲁棒性。

English abstract

Interference poses a significant challenge to satellite-based positioning systems, making it essential to accurately quantify the effects of specific interference types on receiver performance and the resulting reliability of position computation. In current practice, interference effects are often quantified using receiver-independent metrics, with receiver-specific front-end characteristics either idealized or only implicitly considered. In this paper, we address this limitation by explicitly incorporating receiver-specific front-end characteristics into the computation of interference effects and validating the resulting receiver-dependent analysis experimentally. To this end, we record a real-world open-field dataset comprising 210 distinct interference scenarios and compute the receiver-dependent spectral separation coefficient (SSC) and interference impact for a specific receiver module. Furthermore, we verify the computation using a controlled dataset generated with a radio frequency constellation simulator (RFCS), employing the same receiver module and replaying similar interference classes. The comparison of results obtained in both environments demonstrates the robustness of the interference impact computation.
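
For context, the classical spectral separation coefficient is usually written as the overlap integral of two normalized power spectral densities over the receiver bandwidth. The form below is the common textbook definition with an idealized front end, given here as an assumption; the abstract does not state the receiver-dependent formula the paper actually uses. The symbols $G_s$, $G_i$ (unit-area spectra of the desired signal and the interference) and $\beta_r$ (front-end bandwidth) are notation introduced here.

```latex
\kappa_{s,i} = \int_{-\beta_r/2}^{\beta_r/2} G_s(f)\, G_i(f)\, \mathrm{d}f,
\qquad
\int_{-\infty}^{\infty} G_s(f)\,\mathrm{d}f = \int_{-\infty}^{\infty} G_i(f)\,\mathrm{d}f = 1
```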

2606.18134 2026-06-17 eess.AS new submission

Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

通过说话人日志条件将口语大语言模型扩展到多说话人音频

Alexander Polok, Samuele Cornell, Sathvik Udupa, Jan Černocký, Shinji Watanabe, Lukáš Burget

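AI summary Proposes diarization-conditioned spoken language models that condition the acoustic encoder to extract target-speaker representations, avoiding the catastrophic forgetting risked by Serialized Output Training, and substantially improves speaker-attributed transcription across multiple datasets.

Comments Accepted to Interspeech 2026

AI Chinese abstract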

我们提出了说话人日志条件的口语语言模型(SLMs),这是一种将SLMs扩展到远场多说话人音频的策略。不同于通过序列化输出训练来调整解码器(这有灾难性遗忘的风险),我们通过说话人日志掩码条件化声学编码器以提取目标说话人表示,同时保持解码器冻结。我们将其实例化为Dixtral,将说话人日志条件的Whisper(DiCoW)编码器集成到Voxtral SLM中。在AMI、NOTSOFAR-1、LibriSpeechMix和Mixer6上,Dixtral在说话人属性转录方面分别以29.0%、19.8%和16.0%的绝对cpWER优于Gemini 3.0 Flash、VibeVoice和Voxtral Mini Transcribe V2。在一个新颖的长篇多说话人问答基准上,零样本Dixtral在远场内容理解上与Gemini持平,而经过微调后,在所有任务上均超越了Gemini和基于近讲语音的Voxtral。

English abstract

We propose diarization-conditioned spoken language models (SLMs), a strategy for extending SLMs to far-field multi-talker audio. Rather than adapting the decoder via Serialized Output Training, which risks catastrophic forgetting, we condition the acoustic encoder on diarization masks to extract target-speaker representations, keeping the decoder frozen. We instantiate this as Dixtral, integrating a Diarization Conditioned Whisper (DiCoW) encoder into the Voxtral SLM. On AMI, NOTSOFAR-1, LibriSpeechMix, and Mixer6, Dixtral outperforms Gemini 3.0 Flash, VibeVoice, and Voxtral Mini Transcribe V2 on speaker-attributed transcription by 29.0%, 19.8%, and 16.0% absolute cpWER respectively. On a novel long-form multi-speaker QA benchmark, zero-shot Dixtral matches Gemini on far-field content understanding, and when fine-tuned surpasses both Gemini and Voxtral operating on close-talk across all tasks.
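
The cpWER figures quoted above refer to concatenated minimum-permutation word error rate. A minimal sketch of how such a metric can be computed follows; it is an illustrative implementation, not the evaluation code used in the paper, and the function names `edit_distance` and `cp_wer` are hypothetical. It assumes at least one reference word and a small number of speakers, since it tries every speaker permutation.

```python
from itertools import permutations

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def cp_wer(ref_by_speaker, hyp_by_speaker):
    """Concatenate each speaker's utterances, then take the best speaker mapping."""
    refs = [" ".join(utts).split() for utts in ref_by_speaker]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker]
    # Pad with empty streams so both sides have the same number of speakers.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words

ref = [["hello world", "how are you"], ["fine thanks"]]
hyp = [["fine thanks"], ["hello word how are you"]]
print(cp_wer(ref, hyp))
```

In this example the hypothesis speakers are swapped relative to the reference, and the permutation search recovers the correct mapping, leaving one substituted word out of seven reference words.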

2606.18072 2026-06-17 eess.AS new submission

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

基于潜在空间中MeanFlow的一步式Token到波形生成

Zheqi Dai, Guangyan Zhang, Zhen Ye, Jingyu Li, Haolin He, Chunyat Wu, Yiwen Guo, Qiuqiang Kong

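AI summary Proposes applying MeanFlow in a highly compressed latent space for one-step Token2Wav generation, resolving the speed-quality trade-off of multi-step flow-matching decoders with up to a 17× RTF improvement and negligible quality loss.

Comments 5 pages, 1 figure

AI Chinese abstract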

神经音频编解码器是现代基于LLM的文本到语音(TTS)和多模态系统的核心。随着低比特率语义编解码器的重要性日益增加,Token到波形(Token2Wav)解码器成为决定感知质量和系统效率的瓶颈。传统的多步流匹配解码器提供了卓越的质量,但由于迭代采样导致高推理延迟,造成了严重的质量-速度权衡。在本文中,我们提出了一种新颖的Token2Wav架构,通过在高度压缩的潜在空间中应用MeanFlow来克服这一限制。通过建模平均速度而非瞬时速度场,MeanFlow实现了真正的一步生成。在潜在域中操作减轻了波形级流的内存和稳定性问题,与多步基线相比,实时因子(RTF)提升了高达17倍,且质量下降可忽略。此外,我们引入了缓解潜在不匹配的细化策略,包括冻结MeanFlow生成器的仅解码器微调和端到端联合微调,在不增加推理时间成本的情况下提高了保真度。代码和演示已公开。

English abstract

Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer superior quality but suffer from high inference latency due to iterative sampling, creating a severe quality-speed trade-off. In this paper, we propose a novel Token2Wav architecture that overcomes this limitation by applying MeanFlow in a highly compressed latent space. By modeling the average velocity rather than the instantaneous velocity field, MeanFlow enables true one-step generation. Operating in the latent domain mitigates the memory and stability issues of waveform-level flows, yielding up to a 17$\times$ improvement in Real-Time Factor (RTF) compared to multi-step baselines with negligible quality degradation. Furthermore, we introduce refinement strategies that mitigate latent mismatch, including decoder-only fine-tuning with the MeanFlow generator frozen and end-to-end joint fine-tuning, improving fidelity without increasing inference-time cost. Code and demo are publicly available.
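
The contrast between average and instantaneous velocity can be written out as follows. This follows the way MeanFlow is commonly formulated and is offered as a sketch; the notation ($z_t$ for the state at time $t$, $v$ for the instantaneous velocity field, $u$ for the average velocity over $[r, t]$, with $t = 1$ as noise and $t = 0$ as data) is assumed here rather than taken from this abstract, and the paper's latent-space variant may differ in detail.

```latex
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, \mathrm{d}\tau,
\qquad
z_r = z_t - (t - r)\, u(z_t, r, t)
\quad\Longrightarrow\quad
z_0 = z_1 - u(z_1, 0, 1)
```

Because $u$ already integrates the velocity over the whole interval, a single network evaluation at $(r, t) = (0, 1)$ replaces the iterative ODE solve that a multi-step flow-matching decoder needs.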