arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.10468 2026-06-01 eess.AS cs.AI cs.HC cs.MM cs.SD

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

Jing Peng, Ziyi Chen, Haoyu Li, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, Shuai Wang

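AI summary: Proposes G-STAR, a framework that couples a cache-conditioned speaker-tracking module with a Speech-LLM transcription backbone for end-to-end speaker-attributed recognition of long-form, overlapping multi-speaker speech; it supports component-wise optimization and joint training and performs strongly in both local and global evaluation.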

Comments submitted to EMNLP 2026

English abstract

We study timestamped speaker-attributed automatic speech recognition (SA-ASR) for long-form, multi-party speech with overlap. In this setting, chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-stamped, speaker-labeled transcripts. Prior Speech-LLM systems tend to prioritize either local diarization or global labeling, lacking the ability to jointly model fine-grained temporal boundaries and robust cross-chunk identity linking. We propose G-STAR, an end-to-end framework that couples a cache-conditioned speaker-tracking module with a Speech-LLM transcription backbone. The tracker provides structured speaker cues with temporal grounding, and the LLM generates attributed text conditioned on these cues. G-STAR supports component-wise optimization and joint end-to-end training, enabling flexible learning under heterogeneous supervision and domain shift. Under chunk-wise decoding protocols, experiments on both oracle-segmented local evaluation and full-meeting global evaluation show strong speaker-attributed transcription performance.

2603.08721 2026-06-01 cs.AR cs.LG cs.SE

KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware

Jiayi Nie, Haoran Wu, Yao Lai, Zeyu Cao, Cheng Zhang, Binglei Lou, Erwei Wang, Jianyi Cheng, Timothy M. Jones, Robert Mullins, Rika Antonova, Yiren Zhao

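AI summary: Proposes the KernelCraft benchmark, which evaluates how well LLM agents generate and optimize low-level kernels for emerging accelerators through a function-calling, feedback-driven workflow, and shows across multiple tasks that agents can quickly produce correct and efficient kernels.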

English abstract

New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels, a time-consuming and error-prone process that does not scale across hardware targets. This delays emerging hardware platforms from reaching the market. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft: the first benchmark for evaluating an LLM agent's ability to generate and optimize low-level kernels for customized accelerators through a function-calling, feedback-driven workflow. We evaluate agent performance across three emerging accelerators on more than 20 machine-learning tasks, each with five diverse task configurations. Across four leading reasoning models, the strongest agents generate functionally correct kernels for unseen ISAs within a few refinement steps and produce optimized kernels that match or outperform compiler baselines. These results demonstrate KernelCraft's potential to accelerate the accelerator chip development cycle. KernelCraft is available at https://kernelcraft-cam.github.io/.

2603.08385 2026-06-01 eess.IV cs.CV

Rectified flow-based prediction of post-treatment brain MRI from pre-radiotherapy priors for patients with glioma

Selena Huisman, Nordin Belkacemi, Vera C. Keil, Joost Verhoeff, Szabolcs David

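AI summary: Proposes a rectified-flow conditional image generation model that uses pre-radiotherapy MRI and dose maps to predict post-treatment brain MRI at arbitrary time points, with fast inference and preserved semantic and visual fidelity.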

Comments 10 pages, 6 figures, 1 supplementary table, added GitHub url, corrected figure captions

English abstract

Brain tumors result in 20 years of lost life on average. Standard therapies induce complex structural changes in the brain that are monitored through MRI. Recent developments in artificial intelligence (AI) enable conditional multimodal image generation from clinical data. In this study, we investigate AI-driven generation of follow-up MRI in patients with intracranial tumors through conditional image generation. This approach enables realistic modeling of post-radiotherapy changes, allowing for treatment optimization. The public SAILOR dataset of 25 patients was used to create a 2D rectified flow model conditioned on axial slices of pre-treatment MRI and RT dose maps. Cross-attention conditioning was used to incorporate temporal and chemotherapy data. The resulting images were validated with structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), Dice scores and Jacobian determinants. The resulting model generates realistic follow-up MRI for any time point, while integrating treatment information. Comparing real versus predicted images, SSIM is 0.88, and PSNR is 22.82. Tissue segmentations from real versus predicted MRI result in a mean Dice-Sørensen coefficient (DSC) of 0.91. The rectified flow (RF) model enables up to 250x faster inference than Denoising Diffusion Probabilistic Models (DDPM). The proposed model generates realistic follow-up MRI in real-time, preserving both semantic and visual fidelity as confirmed by image quality metrics and tissue segmentations. Conditional generation allows counterfactual simulations by varying treatment parameters, producing predicted morphological changes. This capability has potential to support adaptive treatment dose planning and personalized outcome prediction for patients with intracranial tumors. Code will be available upon peer-reviewed publication at: https://github.com/SelenaIHuisman/RF-GlioPREDICT
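
Two of the validation metrics named in this abstract, PSNR and the Dice-Sørensen coefficient, have standard definitions that a short sketch can make concrete. The code below is a minimal illustration on flat pixel lists, not the authors' evaluation pipeline; the function names, the `data_range` parameter, and the toy inputs are assumptions introduced here.

```python
import math


def psnr(real, pred, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    data_range is the assumed maximum pixel value (1.0 for normalized images).
    """
    mse = sum((r - p) ** 2 for r, p in zip(real, pred)) / len(real)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def dice(mask_a, mask_b):
    """Dice-Sørensen coefficient between two binary masks of equal length."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total


# Toy inputs (illustrative only, not data from the paper).
real_pixels = [0.0, 1.0, 1.0, 0.0]
pred_pixels = [0.0, 1.0, 0.0, 0.0]
print(psnr(real_pixels, pred_pixels))
print(dice([0, 1, 1, 0], [0, 1, 0, 0]))
```

In practice these metrics would be computed per slice or per volume on image arrays; the list-based version only shows the arithmetic behind the reported numbers.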

2603.05529 2026-06-01 cs.DB cs.AI

NGDBench: Towards Neural Graph Data Management

Yufei Li, Yisen Gao, Jiaxuan Xiong, Jiaxin Bai, Shijie Zhong, Haoyu Huang, Zhongwei Xie, Hong Ting Tsang, Yangqiu Song

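AI summary: Observing that existing data management systems lack implicit-structure discovery and reasoning under noise, incompleteness, and dynamic updates, proposes the NGDBench benchmark, which uses a graph view to unify structured and unstructured sources and evaluates neural query methods on noise robustness and dynamic state tracking.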

Comments https://github.com/HKUST-KnowComp/NGDBench

English abstract

Data critical to real-world decision-making is increasingly found within organizations. Such data is heterogeneous, constantly evolving, and only imperfectly captured. However, current data management systems remain largely passive, retrieving what is explicitly stored while offering limited support for uncovering implicit structure or reasoning under noise, incompleteness, and continuous updates. We argue that next-generation data management requires neural capabilities, which can uncover complex latent relationships, distinguish reliable signals from noise, and remain consistent as the underlying data state evolves. To support this direction, we introduce NGDBench, a benchmark across five domains that unifies structured and unstructured sources. NGDBench adopts a graph view because graphs provide a flexible abstraction for modeling complex systems, capturing latent relationships, and subsuming structured formats such as relational tables. Each instance pairs a clean latent graph with a realistically perturbed observed graph. NGDBench supports full Cypher queries and dynamic data management operations. Evaluations of state-of-the-art Text-to-Cypher by LLMs and GraphRAG pipelines reveal that current neural query methods remain sensitive to noise and struggle with dynamic state tracking, highlighting the need for resilient, inference-capable data management. Our code is available at https://github.com/HKUST-KnowComp/NGDBench.

2603.00068 2026-06-01 cs.CY cs.AI

The Global Landscape of Environmental AI Regulation: From the Cost of Reasoning to a Right to Green AI

Kai Ebert, Boris Gamazaychikov, Philipp Hacker, Sasha Luccioni

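AI summary: Through empirical evidence, a global regulatory map, and policy recommendations, the paper documents the growing environmental cost of generative AI and proposes a three-pronged response: model-level transparency, user choice rights, and international coordination.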

Comments 23 pages, 1 table, preprint

English abstract

Artificial intelligence (AI) systems impose substantial and growing environmental costs, yet transparency about these impacts has declined even as their deployment has accelerated. This paper makes three contributions. First, we collate empirical evidence that generative Web search and reasoning models - which have proliferated in 2025 - come with much higher cumulative environmental impacts than previous generations of AI approaches. Second, we map the global regulatory landscape across eleven jurisdictions and find that the manner in which environmental governance operates (predominantly at the facility-level rather than the model-level, with a focus on training rather than inference, with limited AI-specific energy disclosure requirements outside the EU) limits its applicability. Third, to address this, we propose a three-pronged policy response: mandatory model-level transparency that covers inference consumption, benchmarks, and compute locations; user rights to opt out of unnecessary generative AI integration and to select environmentally optimized models; and international coordination to prevent regulatory arbitrage. We conclude with concrete legislative proposals - including amendments to the EU AI Act, Consumer Rights Directive, and Digital Services Act - that could serve as templates for other jurisdictions.

2602.21620 2026-06-01 cs.GT cs.LG

Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners

Arnab Maiti, Junyan Liu, Kevin Jamieson, Lillian J. Ratliff

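AI summary: Using a repeated-game model of no-regret learners, the paper analyzes the conditions under which high-price equilibria arise in the Bertrand pricing game and compares how external regret and swap regret affect competitive behavior.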

Comments 36 pages, 34 figures

English abstract

We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
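
The stage game described in this abstract can be sketched in a few lines. The abstract specifies the price grid $\{1/k, \ldots, 1\}$ and the rule that the lowest-price players split demand equally; the code additionally assumes that a player's payoff is revenue (price times demand share), and the names `bertrand_payoffs` and `unit_demand` are hypothetical.

```python
from fractions import Fraction


def bertrand_payoffs(prices, demand):
    """Return per-player payoffs for one round of the discrete Bertrand game.

    Players at the lowest price split demand(p_min) equally; everyone else
    gets nothing. Payoff = price * demand share is an assumption made here.
    """
    p_min = min(prices)
    winners = {i for i, p in enumerate(prices) if p == p_min}
    share = demand(p_min) / len(winners)
    return [p_min * share if i in winners else 0 for i in range(len(prices))]


k = 4
price_grid = [Fraction(j, k) for j in range(1, k + 1)]  # {1/k, 2/k, ..., 1}


def unit_demand(price):
    """Hypothetical constant (hence non-increasing) demand function."""
    return Fraction(1)


# Two players tie at 1/2 and split the demand; the third is priced out.
print(bertrand_payoffs([Fraction(1, 2), Fraction(1, 2), Fraction(1)], unit_demand))
```

Exact fractions avoid floating-point ties, which matters because the tie-splitting rule depends on prices being exactly equal.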

2602.19171 2026-06-01 cs.GR cs.AI

HistCAD: A Constraint-Aware Parametric History-Based CAD Representation, Dataset, and Benchmark with Industrial Complexity

Xintong Dong, Chuanyang Li, Peng Zheng, Chuqi Han, Jiaxin Jing, Hailong Shen, Yanzhi Song, Zhouwang Yang

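AI summary: Proposes the HistCAD representation standard, dataset, and benchmark, which record sketches, features, and operations with explicit constraints to enable generation and evaluation of editable parametric CAD sequences.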

English abstract

Parametric CAD sequences are reusable because dimensional and geometric constraints govern how parameter changes propagate. Existing CAD generation datasets and benchmarks emphasize reconstruction fidelity, execution validity, or static shape similarity, leaving preservation of design intent under edits largely unmeasured. We introduce HistCAD, a representation standard, dataset, and benchmark for executable parametric CAD with explicit constraints. HistCAD defines an intermediate language independent of CAD software, recording sketch primitives, constraints, feature operations, and 3D point boundary references for operations such as fillet and chamfer. The dataset contains 170,236 executable sequences aligned with native CAD models, STEP files, rendered views, and text annotations, combining academic scale with professionally authored industrial complexity. Building on this representation, the Constraint-Aware Editability Benchmark applies parameter edits and reports Edit Reachability, conditional preserved constraint satisfaction, and Overall Editable Success, abbreviated ER, cPCSR, and OES; these metrics separate failures to reach a valid edited state from failures to preserve required constraints. Experiments show that explicit constraints are essential for preserving design intent after edits, and that HistCAD supports supervised CAD generation from text and direct LLM workflows. We argue that HistCAD reframes CAD generation from static shape imitation to the synthesis of reusable parametric sequences with explicit constraints.

2602.16601 2026-06-01 stat.ML cs.LG

Quantifying Error Propagation and Model Collapse in Diffusion Models

Nail B. Khelifa, Richard E. Turner, Ramji Venkataramanan

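AI summary: Theoretically analyzes the error-propagation mechanism by which recursive training causes model collapse in score-based diffusion models, gives upper and lower bounds on the accumulated divergence between the generated and target distributions, and characterizes different drift regimes.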

Comments Accepted at ICML 2026

English abstract

Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. Notably, to the best of our knowledge, this is the first lower bound on the divergence between the learned and target distributions, even for standard diffusion models. Our results allow us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. In a certain regime, the accumulated divergence after several retraining rounds can be expressed as a discounted sum of score estimation errors made at each generation. We also provide empirical results on synthetic data and images to illustrate the theory.

2602.13812 2026-06-01 cs.DB cs.AI cs.MA

DTBench: A Synthetic Benchmark for Document-to-Table Extraction

Yuxiang Guo, Zhuoran Du, Nan Tang, Kezheng Tang, Congcong Ge, Yunjun Gao

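AI summary: Proposes DTBench, a synthetic benchmark that generates documents through a reverse Table2Doc paradigm and a multi-agent synthesis workflow, to systematically evaluate multiple LLM capabilities in document-to-table extraction.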

Comments KDD26

English abstract

Document-to-table (Doc2Table) extraction derives structured tables from unstructured documents under a target schema, enabling reliable and verifiable SQL-based data analytics. Although large language models (LLMs) have shown promise in flexible information extraction, their ability to produce precisely structured tables remains insufficiently understood, particularly for indirect extraction that requires complex capabilities such as reasoning and conflict resolution. Existing benchmarks neither explicitly distinguish nor comprehensively cover the diverse capabilities required in Doc2Table extraction. We argue that a capability-aware benchmark is essential for systematic evaluation. However, constructing such benchmarks using human-annotated document-table pairs is costly, difficult to scale, and limited in capability coverage. To address this, we adopt a reverse Table2Doc paradigm and design a multi-agent synthesis workflow to generate documents from ground-truth tables. Based on this approach, we present DTBench, a synthetic benchmark that adopts a proposed two-level taxonomy of Doc2Table capabilities, covering 5 major categories and 13 subcategories. We evaluate several mainstream LLMs on DTBench, and demonstrate substantial performance gaps across models, as well as persistent challenges in reasoning, faithfulness, and conflict resolution. DTBench provides a comprehensive testbed for data generation and evaluation, facilitating future research on Doc2Table extraction. The benchmark is publicly available at https://github.com/ZJU-DAILY/DTBench.

2602.12386 2026-06-01 cs.MA cs.GT cs.LG

Provably Convergent Actor-Critic for MARL through Risk-aversion

Yizhou Zhang, Eric Mazumdar

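AI summary: For infinite-horizon general-sum Markov games, proposes a single-timescale actor-critic algorithm based on Risk-averse Quantal response Equilibria (RQE) and uses the regularity of RQE to prove global convergence with finite-sample guarantees.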

English abstract

Learning stationary policies in infinite-horizon general-sum Markov games (MGs) remains a fundamental open problem in Multi-Agent Reinforcement Learning (MARL). While stationary strategies are preferred for their practicality, computing stationary forms of classic game-theoretic equilibria is computationally intractable -- a stark contrast to the comparative ease of solving single-agent RL or zero-sum games. To bridge this gap, we study Risk-averse Quantal response Equilibria (RQE), a solution concept rooted in behavioral game theory that incorporates risk aversion and bounded rationality. We demonstrate that RQE possesses strong regularity conditions that make it uniquely amenable to learning in MGs. We propose a novel single-timescale Actor-Critic algorithm characterized by a faster actor and a slower critic. Leveraging the regularity of RQE, we prove that this approach achieves global convergence with finite-sample guarantees. We empirically validate our algorithm in several environments to demonstrate superior convergence properties compared to risk-neutral baselines.

2602.09405 2026-06-01 stat.ML cs.IT cs.LG math.IT math.ST stat.TH

Is Memorization Helpful or Harmful? Prior Information Sets the Threshold

Chen Cheng, Rina Foygel Barber

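AI summary: In an overparameterized linear model under a Bayesian framework, studies how the prior distribution determines the relationship between training error and generalization error, giving conditions under which memorization is necessary or overfitting is harmful.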

Comments 33 pages, 3 figures. Accepted to the Conference on Learning Theory (COLT) 2026

English abstract

We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $\pi$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $\pi$.

2602.09309 2026-06-01 cond-mat.mtrl-sci cond-mat.mes-hall cs.LG physics.atm-clus

How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science

Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban

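AI summary: Proposes the RADII benchmark, which evaluates the extrapolation ability of crystal generative models on radius-resolved nanoparticle structures; finds that error increases beyond training radii and that the extrapolation frontier is predictable.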

English abstract

Every generative model for crystalline materials harbors a critical structure size beyond which its outputs become unreliable; we call this the extrapolation frontier. Despite its consequences for nanomaterial design, this frontier has never been systematically measured. We introduce RADII, a radius-resolved benchmark of ~75,000 crystal-derived nanoparticle structures (33-11,298 atoms) that treats radius as a continuous scaling knob, tracing generation quality from in- to out-of-distribution under leakage-free splits. Each model is conditioned on target composition and atom count, isolating geometric extrapolation as the evaluation variable. RADII provides frontier-specific diagnostics: per-radius error profiles pinpoint each architecture's scaling ceiling, surface-interior decomposition separates boundary from bulk failures, and cross-metric sequencing reveals which aspect of structural fidelity breaks first. Benchmarking five state-of-the-art architectures, we find that: (i) well-behaved models degrade by ~13% in global positional error beyond training radii, while divergent models show poor fidelity across scales, with local bond fidelity ranging from negligible degradation to over 2x error growth; (ii) no two architectures share a failure sequence, revealing the frontier as a multi-dimensional surface shaped by model family; and (iii) well-behaved models follow the expected geometric scaling exponent alpha ~ 1/3, whose in-distribution fit predicts out-of-distribution error, making frontiers forecastable. Scaling MatterGen to its published parameter count stabilizes sampling but does not close the frontier, while DiffCSP remains unstable at published scale. These findings establish output scale as a first-class evaluation axis for geometric generative models. Code and data: https://github.com/KurbanIntelligenceLab/RADII.
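
The claim that an in-distribution fit of the scaling exponent can forecast out-of-distribution error can be illustrated with an ordinary log-log least-squares fit. The sketch below assumes a power law of the form error = c * N^alpha in the atom count N; that functional form, the function names, and the synthetic numbers are assumptions made here for illustration and are not taken from the RADII code or results.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = alpha * log(size) + log(c).

    Returns (alpha, c) for the assumed model error = c * size ** alpha.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    alpha = cov_xy / var_x
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c


def predict_error(size, alpha, c):
    """Extrapolate the fitted power law to a new structure size."""
    return c * size ** alpha


# Synthetic in-distribution points generated with alpha = 1/3 (not paper data).
train_sizes = [64, 512, 4096]
train_errors = [0.1 * s ** (1 / 3) for s in train_sizes]
alpha, c = fit_power_law(train_sizes, train_errors)
print(alpha, predict_error(8000, alpha, c))
```

With real benchmark data the fit would be noisy, so comparing the forecast against held-out larger radii is what would test whether a frontier is in fact forecastable.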

2602.01011 2026-06-01 cs.MA cs.AI

Multi-Agent Teams Hold Experts Back

Aneesh Pappu, Batu El, Hancheng Cao, Carmelo di Nolfo, Yanchao Sun, Meng Cao, James Zou

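AI summary: Studies self-organizing multi-agent LLM teams and finds that, under unconstrained coordination, they fail to match expert performance; integrative compromise is the main bottleneck, with performance losses of up to 41.1%.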

Comments Accepted at the International Conference on Machine Learning (ICML 2026)

English abstract

Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 41.1% on ML benchmarks. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.

2602.07457 2026-06-01 cs.SE cs.AI cs.CL

Pull Requests as a Training Signal for Repo-Level Code Editing

Qinglin Zhu, Tianyu Chen, Shuai Lu, Lei Ji, Runcong Zhao, Murong Ma, Xiangxiang Dai, Yulan He, Lin Gui, Peng Cheng, Yeyun Gong

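AI summary: Proposes Clean-PR, which uses real GitHub pull requests as a training signal by converting them into Search/Replace edit blocks through reconstruction and validation; combined with agentless-aligned supervised fine-tuning, it significantly improves repository-level code editing on SWE-bench.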

Comments Accepted at ICML 2026

English abstract

Repository-level code editing requires models to understand complex dependencies and execute precise multi-file modifications across a large codebase. While recent gains on SWE-bench rely heavily on complex agent scaffolding, it remains unclear how much of this capability can be internalised via high-quality training signals. To address this, we propose Clean Pull Request (Clean-PR), a mid-training paradigm that leverages real-world GitHub pull requests as a training signal for repository-level editing. We introduce a scalable pipeline that converts noisy pull request diffs into Search/Replace edit blocks through reconstruction and validation, resulting in the largest publicly available corpus of 2 million pull requests spanning 12 programming languages. Using this training signal, we perform a mid-training stage followed by an agentless-aligned supervised fine-tuning process with error-driven data augmentation. On SWE-bench, our model significantly outperforms the instruction-tuned baseline, achieving absolute improvements of 13.6% on SWE-bench Lite and 12.3% on SWE-bench Verified. These results demonstrate that repository-level code understanding and editing capabilities can be effectively internalised into model weights under a simplified, agentless protocol, without relying on heavy inference-time scaffolding.
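
A Search/Replace edit block, as mentioned in this abstract, pairs a snippet to locate with the text that should replace it. The sketch below shows one plausible way to apply such a block to a file's contents; the function name and the exactly-one-match check are assumptions introduced here, not the Clean-PR pipeline's actual validation logic.

```python
def apply_search_replace(source, search, replace):
    """Apply a single Search/Replace edit block to a source string.

    The edit is rejected unless `search` occurs exactly once, so the
    target location is unambiguous (an assumed validation rule).
    """
    matches = source.count(search)
    if matches != 1:
        raise ValueError(f"expected exactly one match, found {matches}")
    return source.replace(search, replace, 1)


# Toy file contents with a bug that the edit block fixes.
original = "def add(a, b):\n    return a - b\n"
patched = apply_search_replace(original, "return a - b", "return a + b")
print(patched)
```

Rejecting ambiguous or missing matches is one simple way to filter noisy diffs: an edit that cannot be located unambiguously in the pre-change file cannot be reliably replayed.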

2602.03896 2026-06-01 stat.ML cs.LG q-bio.NC

A hitchhiker's guide to Poisson gradient estimation

Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii

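AI summary: Systematically compares Exponential Arrival Time simulation and Gumbel-SoftMax relaxation, proposes a modified EAT method that reduces bias, and validates its superior performance on Poisson latent variable models.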

Comments Published at ICML 2026. Code: https://github.com/hadivafaii/PoissonGradientEstimation

English abstract

Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: *Exponential Arrival Time* (EAT) simulation and *Gumbel-SoftMax* (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. These results extend to over-dispersed Negative Binomial latents, where modified EAT again performs best. However, only GSM generalizes to arbitrary non-Poisson distributions, including the under-dispersed regime. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
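
The idea behind exponential arrival time simulation rests on a standard fact: the number of arrivals of a rate-λ Poisson process in a unit time window is Poisson(λ), and the gaps between arrivals are independent Exp(λ) draws. The sketch below shows only this exact, non-differentiable sampler; the paper's differentiable relaxation and its bias-reducing modification are not reproduced, and the function name is an assumption.

```python
import random


def poisson_via_arrival_times(rate, rng):
    """Draw one Poisson(rate) count by counting Exp(rate) arrivals in [0, 1]."""
    elapsed = 0.0
    count = 0
    while True:
        elapsed += rng.expovariate(rate)  # next inter-arrival gap
        if elapsed > 1.0:
            return count  # arrival falls outside the unit window
        count += 1


rng = random.Random(0)  # fixed seed for reproducibility
samples = [poisson_via_arrival_times(3.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # empirical mean, should be near the rate
```

The hard threshold `elapsed > 1.0` is the step that blocks gradients with respect to the rate; a relaxation would replace that indicator with a smooth surrogate.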

2601.22202 2026-06-01 eess.IV cs.CV

A Survey on Semantic Communication for Vision: Categories, Frameworks, Enabling Techniques, and Applications

Runze Cheng, Yao Sun, Ahmad Taha, Xuesong Liu, David Flynn, Muhammad Ali Imran

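AI summary This paper systematically surveys semantic communication methods for visual data (SemCom-Vision), classifies existing approaches by semantic quantization scheme into three categories (semantic preservation, semantic expansion, and semantic refinement), and summarizes ML-based encoder-decoder models, training algorithms, and knowledge utilization strategies.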

Journal ref
IEEE Transactions on Network Science and Engineering, vol. 13, pp. 8080-8103, 2026

Abstract

Semantic communication (SemCom) emerges as a transformative paradigm for traffic-intensive visual data transmission, shifting focus from raw data to meaningful content transmission and relieving the increasing pressure on communication resources. However, to achieve SemCom, challenges are faced in accurate semantic quantization for visual data, robust semantic extraction and reconstruction under diverse tasks and goals, transceiver coordination with effective knowledge utilization, and adaptation to unpredictable wireless communication environments. In this paper, we present a systematic review of SemCom for visual data transmission (SemCom-Vision), wherein an interdisciplinary analysis integrating computer vision (CV) and communication engineering is conducted to provide comprehensive guidelines for the machine learning (ML)-empowered SemCom-Vision design. Specifically, this survey first elucidates the basics and key concepts of SemCom. Then, we introduce a novel classification perspective to categorize existing SemCom-Vision approaches as semantic preservation communication (SPC), semantic expansion communication (SEC), and semantic refinement communication (SRC) based on communication goals interpreted through semantic quantization schemes. Moreover, this survey articulates the ML-based encoder-decoder models and training algorithms for each SemCom-Vision category, followed by knowledge structure and utilization strategies. Finally, we discuss potential SemCom-Vision applications.

2601.21778 2026-06-01 cs.NE cs.LG

Error Amplification Limits ANN-to-SNN Conversion in Continuous Control

Zijie Xu, Zihan Huang, Yiting Dong, Kang Chen, Wenxuan Liu, Zhaofei Yu

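AI summary To address the poor performance of ANN-to-SNN conversion in continuous control, the paper proposes Cross-Step Residual Potential Initialization (CRPI), a mechanism that recovers performance by suppressing temporally correlated errors.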

Comments Accepted by ICML2026

Abstract

Spiking Neural Networks (SNNs) can achieve competitive performance by converting already existing well-trained Artificial Neural Networks (ANNs), avoiding further costly training. This property is particularly attractive in Reinforcement Learning (RL), where training through environment interaction is expensive and potentially unsafe. However, existing conversion methods perform poorly in continuous control, where suitable baselines are largely absent. We identify error amplification as the key cause: small action approximation errors become temporally correlated across decision steps, inducing cumulative state distribution shift and severe performance degradation. To address this issue, we propose Cross-Step Residual Potential Initialization (CRPI), a lightweight gradient-free mechanism that carries over residual membrane potentials across decision steps to suppress temporally correlated errors. Experiments on continuous control benchmarks with both vector and visual observations demonstrate that CRPI can be integrated into existing conversion pipelines and substantially recovers lost performance. Our results highlight continuous control as a critical and challenging benchmark for ANN-to-SNN conversion, where small errors can be strongly amplified and impact performance. Code is available at https://github.com/xuzijie32/ANN2SNN-CRPI.
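
The idea of carrying residual membrane potential across decision steps can be illustrated with a toy single-neuron sketch. It assumes an integrate-and-fire neuron with soft reset and a constant input; it is not the authors' implementation (see their repository for that), and all names are hypothetical.

```python
def run_if_neuron(input_current, timesteps, threshold=1.0, potential=0.0):
    """Simulate an integrate-and-fire neuron with soft reset for one decision step.

    Returns the spike count and the residual membrane potential.
    """
    spikes = 0
    for _ in range(timesteps):
        potential += input_current
        if potential >= threshold:
            spikes += 1
            potential -= threshold  # soft reset keeps the remainder
    return spikes, potential


def mean_rate(input_current, decision_steps, timesteps, carry_over):
    """Average firing rate over several decision steps, with or without carry-over."""
    potential = 0.0
    total_spikes = 0
    for _ in range(decision_steps):
        if not carry_over:
            potential = 0.0  # discard the residual at every decision step
        spikes, potential = run_if_neuron(input_current, timesteps, potential=potential)
        total_spikes += spikes
    return total_spikes / (decision_steps * timesteps)
```

With an input of 0.35 and 4 timesteps per decision, resetting the potential every step makes the same rounding error at each step, whereas carrying the residual over lets the errors average out, so `mean_rate(0.35, 10, 4, True)` lands closer to 0.35 than `mean_rate(0.35, 10, 4, False)`.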

2601.20076 2026-06-01 math.OC cs.LG

Randomized Feasibility Methods for Constrained Optimization with Adaptive Step Sizes

Abhishek Chakraborty, Angelia Nedić

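AI summary Proposes an adaptive-stepsize randomized feasibility algorithm that combines Polyak steps with random constraint sampling for constrained optimization with strongly convex or general convex objectives, and proves linear convergence or an O(1/√T) worst-case rate.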

Abstract

We consider minimizing an objective function subject to constraints defined by the intersection of lower-level sets of convex functions. We study two cases: (i) strongly convex and Lipschitz-smooth objective function and (ii) convex but possibly nonsmooth objective function. To deal with the constraints that are not easy to project on, we use a randomized feasibility algorithm with Polyak steps and a random number of sampled constraints per iteration, while taking (sub)gradient steps to minimize the objective function. For case (i), we prove linear convergence in expectation of the objective function values to any prescribed tolerance using an adaptive stepsize. For case (ii), we develop a fully problem parameter-free and adaptive stepsize scheme that yields an $O(1/\sqrt{T})$ worst-case rate in expectation. The infeasibility of the iterates decreases geometrically with the number of feasibility updates almost surely, while for the averaged iterates, we establish an expected lower bound on the function values relative to the optimal value that depends on the distribution for the random number of sampled constraints. For certain choices of sample-size growth, optimal rates are achieved. Finally, simulations on a Quadratically Constrained Quadratic Programming (QCQP) problem, Support Vector Machines (SVM), and logistic regression with group fairness constraints demonstrate the computational efficiency of our algorithm compared to other state-of-the-art methods.
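
The feasibility part of such a scheme can be sketched as follows: sample a constraint $g_i(x) \le 0$ at random and take a Polyak step $x \leftarrow x - \frac{\max(g_i(x), 0)}{\|d\|^2} d$, where $d$ is a (sub)gradient of $g_i$ at $x$. The two constraints, the uniform sampling, and all names below are illustrative assumptions; the objective (sub)gradient step and the adaptive stepsize from the abstract are omitted.

```python
import random


def polyak_feasibility_step(x, g, grad_g):
    """Move x toward the set {g <= 0} with a Polyak step; no-op if already feasible."""
    violation = max(g(x), 0.0)
    if violation == 0.0:
        return x
    d = grad_g(x)
    norm_sq = sum(di * di for di in d)
    if norm_sq == 0.0:
        return x
    step = violation / norm_sq
    return [xi - step * di for xi, di in zip(x, d)]


# Two illustrative convex constraints g_i(x) <= 0 in the plane (assumed, not from the paper).
CONSTRAINTS = [
    (lambda x: x[0] + x[1] - 1.0, lambda x: [1.0, 1.0]),  # half-plane
    (lambda x: x[0] ** 2 + x[1] ** 2 - 4.0, lambda x: [2 * x[0], 2 * x[1]]),  # disc
]


def randomized_feasibility(x0, n_steps=200, seed=0):
    """Repeatedly sample one constraint uniformly at random and apply a Polyak step."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        g, grad_g = rng.choice(CONSTRAINTS)
        x = polyak_feasibility_step(x, g, grad_g)
    return x


def max_violation(x):
    """Largest constraint violation at x (zero when x is feasible)."""
    return max(max(g(x), 0.0) for g, _ in CONSTRAINTS)
```

For a linear constraint the Polyak step coincides with the exact projection onto the half-space, while for the disc it only needs a function value and a gradient, which is the appeal when projections are expensive.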

2601.19966 2026-06-01 cond-mat.mtrl-sci cs.LG physics.chem-ph physics.comp-ph

Global Plane Waves From Local Gaussians: Periodic Charge Densities in a Blink

Jonas Elsborg, Felix Ærtebjerg, Luca Thiede, Alán Aspuru-Guzik, Tejs Vegge, Arghya Bhowmik

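AI summary Proposes the ELECTRAFI model, which uses analytic Fourier transforms of real-space anisotropic Gaussians and the Poisson summation formula to reconstruct periodic charge densities quickly with a single inverse FFT, achieving speedups of up to 633 times while maintaining high accuracy.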

Comments ICML 2026, 29 pages including appendix, 11 Figures, 7 tables

Abstract

We introduce ELECTRAFI, a fast, end-to-end differentiable model for predicting periodic charge densities in crystalline materials. ELECTRAFI constructs anisotropic Gaussians in real space and exploits their closed-form Fourier transforms to analytically evaluate plane-wave coefficients via the Poisson summation formula. This formulation delegates non-local and periodic behavior to analytic transforms, enabling reconstruction of the full periodic charge density with a single inverse FFT. By avoiding explicit real-space grid probing, periodic image summation, and spherical harmonic expansions, ELECTRAFI matches or exceeds state-of-the-art accuracy across periodic benchmarks while being up to $633 \times$ faster than the strongest competing method, reconstructing crystal charge densities in a fraction of a second. When used to initialize DFT calculations, ELECTRAFI reduces total DFT compute cost by up to ~20%, whereas slower charge density models negate savings due to high inference times. Our results show that accuracy and inference cost jointly determine end-to-end DFT speedups, and motivate our focus on efficiency.
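
The identity behind this construction can be written out for a single normalized Gaussian. This is a sketch under assumed notation (mean $\mu$, covariance $\Sigma$, lattice vectors $R$, reciprocal lattice vectors $G$, unit-cell volume $V$), none of which is taken from the paper; ELECTRAFI's actual parameterization may differ.

$$
g(r) = \frac{\exp\!\left(-\tfrac{1}{2}(r-\mu)^\top \Sigma^{-1} (r-\mu)\right)}{\sqrt{(2\pi)^3 \det \Sigma}},
\qquad
\hat g(k) = \int_{\mathbb{R}^3} g(r)\, e^{-i k \cdot r}\, dr = \exp\!\left(-i k \cdot \mu - \tfrac{1}{2} k^\top \Sigma k\right)
$$

$$
\sum_{R} g(r + R) = \frac{1}{V} \sum_{G} \hat g(G)\, e^{i G \cdot r}
$$

The left-hand side is the sum over periodic images that would otherwise be evaluated explicitly. The right-hand side is a plane-wave expansion whose coefficients $\hat g(G)/V$ are available in closed form, so a truncated set of $G$ vectors can be turned into a real-space grid with one inverse FFT.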

2508.17671 2026-06-01 cs.GT cs.AI cs.MA econ.TH

Consistent Opponent Modeling in Imperfect-Information Games

Sam Ganzfried

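AI summary Existing opponent modeling methods in imperfect-information games cannot guarantee convergence to the opponent's true strategy; the paper proposes a convex optimization algorithm based on the sequence-form game representation and projected gradient descent that achieves efficient and consistent opponent modeling.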

Abstract

The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy under standard Bayesian identifiability and visitation assumptions, given observations from gameplay and possibly additional historical data if it is available.

2509.16187 2026-06-01 cs.SE cs.LG

MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair

Ali Reza Ibrahimzada, Brandon Paulsen, Reyhaneh Jabbarvand, Joey Dodds, Daniel Kroening

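AI summary Proposes MatchFixAgent, an LLM-based multi-agent framework for language-agnostic, repository-level validation and repair of code translation equivalence, which significantly outperforms existing methods in validation coverage and repair success rate.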

Comments Published in ICML 2026

Abstract

Code translation transforms source code from one programming language (PL) to another. Validating the functional equivalence of a translation and repairing it, if necessary, are critical steps in code translation. Existing automated validation and repair approaches struggle to generalize to many PLs due to high engineering overhead, and they rely on existing and often inadequate test suites, which results in false claims of equivalence and ineffective translation repair. To bridge this gap, we develop MatchFixAgent, a large language model (LLM)-based, PL-agnostic framework for equivalence validation and repair of translations. MatchFixAgent features a multi-agent architecture that divides equivalence validation into several sub-tasks to ensure thorough and consistent semantic analysis of the translation. We compare MatchFixAgent's validation and repair results with four repository-level code translation techniques. Our results demonstrate that MatchFixAgent produces (in)equivalence verdicts for 99.2% of translation pairs, with the same equivalence validation result as prior work on 72.8% of them. When MatchFixAgent's result disagrees with prior work, we find that 60.7% of the time MatchFixAgent's result is actually correct. In addition, we show that MatchFixAgent can repair 50.6% of inequivalent translations, compared to prior work's 18.5%.

2512.11779 2026-06-01 stat.ML cs.AI cs.LG

Conditional Coverage Diagnostics for Conformal Prediction

Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach

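AI summary Proposes casting conditional coverage estimation as a classification problem and diagnosing conditional coverage deviations of conformal prediction with the excess risk of the target coverage (ERT); experiments show that modern classifiers yield higher statistical power than traditional metrics.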

Abstract

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome sample-inefficiency and overfitting issues of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if some classifier can achieve lower risk than the target coverage. Through the choice of a (proper) loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage, and non-constant target coverages. We call the resulting family of metrics excess risk of the target coverage (ERT). We show experimentally that the use of modern classifiers provides much higher statistical power than simple classifiers underlying established metrics like CovGap. Additionally, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.
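
One plausible reading of this risk difference can be sketched on synthetic data: compare the held-out Brier risk of always predicting the target coverage with that of a classifier that predicts coverage from features. The Brier loss, the per-group frequency "classifier", the synthetic data, and all names are assumptions made for illustration; this is not the API of the authors' package.

```python
import random


def brier(prob, covered):
    """Squared error between a predicted coverage probability and a 0/1 outcome."""
    return (prob - covered) ** 2


def excess_risk_of_target(groups, covered, target, split=0.5):
    """Brier risk of the constant target minus Brier risk of a per-group classifier.

    The classifier is fit on the first part of the data and scored on the rest.
    """
    n_train = int(len(groups) * split)
    # Fit: empirical coverage frequency in each group on the training part.
    sums, counts = {}, {}
    for g, c in zip(groups[:n_train], covered[:n_train]):
        sums[g] = sums.get(g, 0.0) + c
        counts[g] = counts.get(g, 0) + 1
    predict = {g: sums[g] / counts[g] for g in sums}
    # Score on held-out points; unseen groups fall back to the target.
    test = list(zip(groups[n_train:], covered[n_train:]))
    risk_target = sum(brier(target, c) for _, c in test) / len(test)
    risk_clf = sum(brier(predict.get(g, target), c) for g, c in test) / len(test)
    return risk_target - risk_clf


def simulate(n=20000, seed=0):
    """Synthetic coverage indicators: 0.8 in group 0, 1.0 in group 1, 0.9 marginally."""
    rng = random.Random(seed)
    groups, covered = [], []
    for _ in range(n):
        g = rng.randrange(2)
        p = 0.8 if g == 0 else 1.0
        groups.append(g)
        covered.append(1 if rng.random() < p else 0)
    return groups, covered
```

In this simulation the marginal coverage equals the 0.9 target, yet the estimate is clearly positive (near 0.01, the squared gap between conditional and target coverage), which is the kind of deviation a marginal check would miss.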

2512.00919 2026-06-01 stat.ML cs.LG

Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression

Dimitri Meunier, Jakub Wornbard, Vladimir R. Kostic, Antoine Moulin, Alek Fröhlich, Karim Lounici, Massimiliano Pontil, Arthur Gretton

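AI summary For nonparametric instrumental variable regression with hidden confounders, proposes learning outcome-aware spectral features by minimizing a contrastive loss derived from an augmented operator, mitigating the poor representation of the causal function caused by spectral misalignment.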

Comments ICML 2026

Abstract

We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimators based on learned spectral features, that is, features spanning the top singular subspaces of the operator linking treatments to instruments. While powerful, such features are agnostic to the outcome variable. Consequently, the method can fail when the true causal function is poorly represented by these dominant singular functions. To mitigate this, we introduce Augmented Spectral Feature Learning, a framework that makes the feature learning process outcome-aware. Our method learns features by minimizing a novel contrastive loss derived from an augmented operator that incorporates information from the outcome. By learning these task-specific features, our approach remains effective even under spectral misalignment. We provide a theoretical analysis of this framework and validate our approach on challenging benchmarks.

2511.05875 2026-06-01 cs.HC cs.AI cs.CV

Towards a Humanized Social-Media Ecosystem: AI-Augmented HCI Design Patterns for Safety, Agency & Well-Being

Mohd Ruhul Ameen, Akif Islam

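AI summary Proposes the Human-Layer AI (HL-AI) framework, in which user-owned, explainable intermediaries in the browser give users real-time control without platform cooperation, implementing five design patterns (content rewriting, integrity detection, feed curation, behavioral interruption, and recovery mode) to improve social media safety and user well-being.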

Comments 6 pages, 5 tables, 7 figures, and 2 algorithm tables. Accepted at International Conference on Signal Processing, Information, Communication and Systems (SPICSCON 2025)

Journal ref
2025 IEEE International Conference on Signal Processing, Information, Communication and Systems (SPICSCON)

Abstract

Social platforms connect billions of people, yet their engagement-first algorithms often work on users rather than with them, amplifying stress, misinformation, and a loss of control. We propose Human-Layer AI (HL-AI): user-owned, explainable intermediaries that sit in the browser between platform logic and the interface. HL-AI gives people practical, moment-to-moment control without requiring platform cooperation. We contribute a working Chrome/Edge prototype implementing five representative pattern frameworks (Context-Aware Post Rewriter, Post Integrity Meter, Granular Feed Curator, Micro-Withdrawal Agent, and Recovery Mode) alongside a unifying mathematical formulation balancing user utility, autonomy costs, and risk thresholds. Evaluation spans technical accuracy, usability, and behavioral outcomes. The result is a suite of humane controls that help users rewrite before harm, read with integrity cues, tune feeds with intention, pause compulsive loops, and seek shelter during harassment, all while preserving agency through explanations and override options. This prototype offers a practical path to retrofit today's feeds with safety, agency, and well-being, inviting rigorous cross-cultural user evaluation.

2510.20853 2026-06-01 eess.AS cs.CL cs.SD

Beyond Hearing: Learning Task-Agnostic ExG Representations from Earphones via Physiology-Informed Tokenization

Hyungjun Yoon, Seungjoo Lee, Yu Yvonne Wu, Xiaomeng Chen, Taiting Lu, Freddy Yifei Liu, Taeckyung Lee, Hyeongheon Cha, Haochen Zhao, Gaoteng Zhao, Dongyao Chen, Cecilia Mascolo, Sung-Ju Lee, Lili Qiu

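AI summary Proposes Physiology-informed Multi-band Tokenization (PiMT), an earphone-based approach that learns robust representations from unobtrusively collected everyday ExG data via a reconstruction task, enabling general-purpose ExG monitoring across diverse tasks, including the five human senses.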

Comments Accepted to ICLR 2026

Abstract

Electrophysiological (ExG) signals offer valuable insights into human physiology, yet building foundation models that generalize across everyday tasks remains challenging due to two key limitations: (i) insufficient data diversity, as most ExG recordings are collected in controlled labs with bulky, expensive devices; and (ii) task-specific model designs that require tailored processing (i.e., targeted frequency filters) and architectures, which limit generalization across tasks. To address these challenges, we introduce an approach for scalable, task-agnostic ExG monitoring in the wild. We collected 50 hours of unobtrusive free-living ExG data with an earphone-based hardware prototype to narrow the data diversity gap. At the core of our approach is Physiology-informed Multi-band Tokenization (PiMT), which decomposes ExG signals into 12 physiology-informed tokens, followed by a reconstruction task to learn robust representations. This enables adaptive feature recognition across the full frequency spectrum while capturing task-relevant information. Experiments on our new DailySense dataset, the first to enable ExG-based analysis across five human senses, together with four public ExG benchmarks, demonstrate that PiMT consistently outperforms state-of-the-art methods across diverse tasks.

2510.03415 2026-06-01 cs.PL cs.AI cs.CL cs.SE

LLMs Lean on Priors, Not Programming Language Semantics

Aditya Thimmaiah, Jiyang Zhang, Jayanth Srinivasa, Junyi Jessy Li, Milos Gligoric

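AI summary Using the PLSemanticsBench benchmark, the paper finds that frontier LLMs rely on statistical regularities from pretraining rather than formal semantic rules in program execution tasks, and that semantic mutations and structural complexity cause large drops in accuracy.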

Comments Accepted at ICML 2026

Abstract

Recent work asks whether large language models (LLMs) condition their reasoning on explicit rules rather than statistical regularities from pretraining. Program execution provides a canonical instance: formal semantics define behavior through symbolic transition rules that can be systematically altered under distribution shift. We investigate whether LLMs can condition their reasoning on formal semantics through program execution and introduce PLSemanticsBench, pairing featherweight C programs with two semantic systems (small-step operational semantics and K semantics) and probing four capabilities: composing rules for final states, selecting rules when state is unmutated, sustaining such conditioning over long traces, and following supplied rules under novel semantics. To decouple semantic reasoning from syntactic familiarity, we redefine familiar operators to induce symbol-meaning conflict and introduce novel symbols defined only through the supplied rules, and stress-test models on Human-Written, LLM-Translated, and Fuzzer-Generated splits with increasing structural complexity. Across 11 frontier LLMs, strong final-state accuracy under standard semantics (up to 90%) drops sharply, by as much as 40 to 60 percentage points, under semantic mutations and increasing structural complexity. Only a handful of models achieve non-zero long-horizon conditioning accuracy, and even the best systems reach just 35%. Together, these results suggest that contemporary LLMs often rely on pretrained lexical associations rather than systematically conditioning on supplied formal rules. PLSemanticsBench is publicly available at https://EngineeringSoftware.github.io/PLSemanticsBench.

2509.14221 2026-06-01 cs.IR cs.CL

GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing

Silan Hu, Shiqi Zhang, Yimin Shi, Xiaokui Xiao

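AI summary To address the lack of a dedicated benchmark for ad-injected response generation in generative engine marketing, proposes GEM-Bench, the first comprehensive benchmark, comprising multi-scenario datasets, multi-dimensional metrics, and baseline solutions; experiments show that simple prompt-based methods can raise click-through rate but lower user satisfaction, while inserting ads into pre-generated ad-free responses mitigates this at the cost of extra overhead.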

Comments Technical Report

Abstract

Generative Engine Marketing (GEM) is an emerging ecosystem for monetizing generative engines, such as LLM-based chatbots, by seamlessly integrating relevant advertisements into their responses. At the core of GEM lies the generation and evaluation of ad-injected responses. However, existing benchmarks are not specifically designed for this purpose, which limits future research. To address this gap, we propose GEM-Bench, the first comprehensive benchmark for ad-injected response generation in GEM. GEM-Bench includes three curated datasets covering both chatbot and search scenarios, a metric ontology that captures multiple dimensions of user satisfaction and engagement, and several baseline solutions implemented within an extensible multi-agent framework. Our preliminary results indicate that, while simple prompt-based methods achieve reasonable engagement such as click-through rate, they often reduce user satisfaction. In contrast, approaches that insert ads based on pre-generated ad-free responses help mitigate this issue but introduce additional overhead. These findings highlight the need for future research on designing more effective and efficient solutions for generating ad-injected responses in GEM. The benchmark and all related resources are publicly available at https://gem-bench.org/.

2505.05168 2026-06-01 math.ST cs.LG stat.ML stat.TH

Dynamical local Fréchet curve regression in manifolds

M. D. Ruiz-Medina, A. Torres-Signes

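AI summary This paper derives a least-squares local linear Fréchet curve predictor for response and regressor in a separable Hilbert space, proposes an intrinsic local linear Fréchet curve predictor on the manifold based on weighted Fréchet means, and proves its asymptotic optimality.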

Comments This paper is currently under journal second revision

Abstract

Under mild conditions, this paper derives a least-squares local linear Fréchet curve predictor for a response and regressor evaluated in a separable Hilbert space. We obtain conditions allowing the implementation of this local linear Fréchet functional predictor in the ambient $L^{2}$-space of vector functions, with values in the time-varying tangent space on a compact Riemannian manifold. Second, an intrinsic local linear Fréchet curve predictor evaluated in such a manifold is proposed, based on a weighted Fréchet mean approach. Its asymptotic optimality is proved. The simulation study and real-data application analyze the finite-sample performance of the empirical versions of both predictors, compared with a geodesic Nadaraya-Watson-type curve predictor. In the real-data application, the functional prediction of the time-varying spherical coordinates of the Earth's magnetic field is addressed, based on observations of the geocentric latitude and longitude of NASA's MAGSAT satellite.

2509.14789 2026-06-01 eess.AS cs.CR cs.SD eess.SP

Acoustic Simulation Framework for Multi-channel Replay Speech Detection

Michael Neri, Tuomas Virtanen

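AI summary Proposes an acoustic simulation framework that uses publicly available resources to simulate multi-channel replay speech configurations, trains the M-ALRAD detector and extends it with inter-channel phase difference features, and evaluates generalization on the ReMASC corpus without real training data.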

Comments Submitted to IEEE MMSP 2026

Abstract

Replay speech attacks pose a significant threat to voice-controlled systems, especially in smart environments where voice assistants are widely deployed. While multi-channel audio offers spatial cues that can enhance replay detection robustness, existing datasets and methods predominantly rely on single-channel recordings. Moreover, previous studies highlighted that generalization of this attack to new environments is challenging, requiring new methods for generating data encompassing various acoustic conditions. Hence, in this work we introduce an acoustic simulation framework designed to simulate multi-channel replay speech configurations using publicly available resources. Using the framework, we train the state-of-the-art multi-channel replay detector M-ALRAD and evaluate its generalisation on the ReMASC real-recording corpus without any real training data. To improve the exploitation of spatial information, we extend M-ALRAD with inter-channel phase difference features computed for adjacent microphone pairs, augmenting the beamformed representation with directional cues. Synthetic datasets will be available upon acceptance of the paper.
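
An inter-channel phase difference (IPD) for one microphone pair and one frequency bin is commonly computed as the phase of $X_a[k]\,\overline{X_b[k]}$. The sketch below assumes a plain DFT over a single frame and a synthetic delayed tone; the paper's actual feature pipeline (framing, STFT settings, microphone geometry) is not specified in the abstract, and the function names are hypothetical.

```python
import cmath
import math


def dft_bin(signal, k):
    """Single DFT coefficient X[k] of a real-valued frame."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))


def inter_channel_phase_difference(ch_a, ch_b, k):
    """Phase of X_a[k] * conj(X_b[k]), in radians within [-pi, pi]."""
    return cmath.phase(dft_bin(ch_a, k) * dft_bin(ch_b, k).conjugate())


def delayed_tone(n, k, delay):
    """Cosine at DFT bin k, delayed by `delay` samples (a stand-in for a second microphone)."""
    return [math.cos(2 * math.pi * k * (t - delay) / n) for t in range(n)]
```

For a tone at bin `k` of an `n`-sample frame, a delay of `d` samples on the second channel gives an IPD of `2 * pi * k * d / n`, so the feature encodes the time difference of arrival between adjacent microphones and hence a directional cue.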

2509.06856 2026-06-01 stat.ML cs.LG cs.NA math.NA

Sequential Least-Squares Estimators with Fast Randomized Sketching for Linear Statistical Models

Guan-Yu Chen, Dong-Yue Xie, Xi Yang

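AI summary Proposes a sequential least-squares estimation framework that integrates sketch-and-solve and iterative sketching, iteratively solving subproblems with progressively larger sketch sizes to obtain high-precision parameter estimates efficiently.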

Abstract

We propose a novel randomized framework for the estimation problem of large-scale linear statistical models, namely Sequential Least-Squares Estimators with Fast Randomized Sketching (SLSE-FRS), which integrates Sketch-and-Solve and Iterative-Sketching methods for the first time. By iteratively constructing and solving sketched least-squares (LS) subproblems with increasing sketch sizes to achieve better precisions, SLSE-FRS gradually refines the estimators of the true parameter vector, ultimately producing high-precision estimators. We analyze the convergence properties of SLSE-FRS, and provide its efficient implementation. Numerical experiments show that SLSE-FRS outperforms the state-of-the-art methods, namely the Preconditioned Conjugate Gradient (PCG) method, and the Iterative Double Sketching (IDS) method.