arXivDaily: daily arXiv digest, updated Monday to Friday
2606.10255 2026-06-10 eess.IV cs.CV cs.DL cs.LG physics.bio-ph New submission

POPSICLE: Benchmark Datasets for Segmentation and Localization in CryoET

Jonathan Schwartz, Utz Heinrich Ermel, C. Braxton Owens, Zhuowen Zhao, Ariana Peck, Gus L. W. Hart, Grant J. Jensen, Bridget Carragher, Dari Kimanius

Affiliations: Biohub; Brigham Young University

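AI summary: Presents POPSICLE, a benchmark suite built on the CryoET Data Portal. It covers eukaryotic and prokaryotic systems and both purified and in situ samples, and it supports voxel-wise segmentation and sparse localization tasks. It aims to fill the lack of standardized benchmarks in cryo-electron tomography.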

English abstract

Cryo-electron tomography (cryoET) has emerged as a powerful tool in structural and cellular biology by enabling direct visualization of macromolecular structures within intact cells, thereby linking molecular architecture to cellular organization in a native context. Realizing the full potential of cryoET, however, increasingly depends on advances in computational analysis, particularly machine learning (ML), to interpret its complex and information-rich data. Despite rapid progress, ML development for cryoET remains bottlenecked by the lack of standardized, well-annotated benchmarks. Existing evaluations are typically small, task-specific, and assembled in isolation, limiting robust comparisons across methods. Here, we present POPSICLE, a benchmark suite for cryoET segmentation and macromolecular localization built from the CryoET Data Portal, an open, ML-ready repository of tomographic data, metadata, and annotations. POPSICLE spans eukaryotic and prokaryotic systems, both purified and fully in situ samples, and dense voxel-wise segmentation as well as sparse localization tasks. Built on a living data resource, it can expand as new datasets and annotations become available. Baseline experiments reveal substantial variation in model rankings across tasks, underscoring the need for benchmarks tailored to the unique characteristics of cryoET rather than evaluation practices adapted from adjacent biomedical imaging domains. POPSICLE thus provides an open and extensible foundation for reproducible ML evaluation in cryoET.
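
POPSICLE's two task families are usually scored with different kinds of metric. The following is a minimal sketch that assumes a Dice overlap for dense voxel-wise segmentation and a distance-thresholded F1 for sparse localization. Localization uses greedy one-to-one matching. The function names, the matching rule and the radius are illustrative assumptions, not POPSICLE's official evaluation code.

```python
import math


def dice(pred, target):
    """Dice overlap 2|A and B| / (|A| + |B|) between two flat binary masks (0/1 values)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def localization_f1(pred_pts, true_pts, radius):
    """F1 for particle picks: a prediction is a true positive if it can be paired
    with a still-unmatched ground-truth centre within `radius` (greedy, order-dependent)."""
    unmatched = list(true_pts)
    tp = 0
    for p in pred_pts:
        best_idx, best_d = None, radius
        for i, t in enumerate(unmatched):
            d = math.dist(p, t)
            if d <= best_d:
                best_idx, best_d = i, d
        if best_idx is not None:
            unmatched.pop(best_idx)  # each true particle can be matched only once
            tp += 1
    fp = len(pred_pts) - tp
    fn = len(true_pts) - tp
    return 0.0 if tp == 0 else 2.0 * tp / (2 * tp + fp + fn)


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(localization_f1([(0, 0, 0), (10, 10, 10)], [(1, 0, 0), (50, 50, 50)], radius=3.0))  # 0.5
```

The two metrics reward different behavior: one counts overlapping voxels, the other counts correctly placed points. A model can therefore rank differently on the two task types. In crowded tomograms, a maximum bipartite matching can pair more picks than the greedy loop shown here.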

2606.11140 2026-06-10 physics.geo-ph cs.AI cs.LG stat.AP stat.ML New submission

Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques

Guido Di Federico, Wenchao Teng, Louis J. Durlofsky

Affiliations: Department of Energy Science & Engineering, Stanford University

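AI summary: Targets high-dimensional parameter inversion in subsurface-flow data assimilation. It compares an ensemble-Kalman method (ESMDA) with Monte Carlo methods (MCMC/SMC) under a latent diffusion model (LDM) parameterization on 3D channelized geomodels. The Monte Carlo methods reduce data mismatch and uncertainty more effectively while preserving geological realism.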

English abstract

Data assimilation (DA) in subsurface flow entails calibrating model parameters to match observed data, typically at wells, while preserving geological realism. Latent diffusion models (LDMs) provide efficient mappings from high-dimensional geological model space to a low-dimensional latent variable, reducing the dimensionality of the inverse problem while maintaining plausibility in posterior geomodels. However, the high nonlinearity in the LDM mapping may degrade the performance of Kalman-gain-based ensemble updates. We present a systematic comparison of DA algorithms applied to large-scale 3D channelized geomodels with hierarchical geological uncertainty. We compare model-space and latent-space DA using the ensemble smoother with multiple data assimilation (ESMDA), and demonstrate a key trade-off: model-space updates achieve significant uncertainty reduction but produce geologically unrealistic posterior models, while latent-space updates preserve realism but exhibit limited uncertainty reduction. Motivated by this, we explore rigorous Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) algorithms in the 3D-LDM latent space. To accommodate their high computational demands, we develop a fast surrogate flow model that approximates well-rate responses. MCMC and SMC are evaluated against ESMDA across three synthetic test cases, with DA performed in the LDM latent space. All models maintain geological realism due to the LDM parameterization. MCMC and SMC are consistent with one another and achieve lower data mismatch and more uncertainty reduction than latent-space ESMDA. Our overall results demonstrate that ensemble Kalman methods may provide overestimated posterior uncertainty with highly nonlinear parameterizations, while rigorous Monte Carlo sampling, enabled by fast surrogate models, can provide a more reliable alternative.
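
A scalar sketch of the ESMDA update the abstract refers to may help readers unfamiliar with it. At each of the assimilation steps, ensemble member $j$ is updated as $m_j \leftarrow m_j + C_{md}(C_{dd}+\alpha_i C_D)^{-1}(d_{obs} + \sqrt{\alpha_i}\,e_j - g(m_j))$. Here $e_j \sim N(0, C_D)$ and the inflation factors satisfy $\sum_i 1/\alpha_i = 1$. The sketch assumes one scalar parameter, one scalar observation and a caller-supplied forward model `g`; the function name `esmda_scalar` is hypothetical. In the paper, the parameter would be the LDM latent vector and `g` a (surrogate) flow simulator.

```python
import math
import random
import statistics


def esmda_scalar(prior, forward, d_obs, obs_var, alphas, seed=0):
    """Ensemble smoother with multiple data assimilation for a scalar parameter.

    prior   -- list of prior ensemble members
    forward -- callable mapping a parameter value to a predicted observation
    obs_var -- observation-error variance C_D
    alphas  -- inflation factors; must satisfy sum(1/alpha) == 1
    """
    if abs(sum(1.0 / a for a in alphas) - 1.0) > 1e-9:
        raise ValueError("inflation factors must satisfy sum(1/alpha) == 1")
    rng = random.Random(seed)
    ens = list(prior)
    n = len(ens)
    for alpha in alphas:
        preds = [forward(m) for m in ens]
        m_bar, d_bar = statistics.fmean(ens), statistics.fmean(preds)
        # Ensemble (cross-)covariances between parameter and predicted data.
        c_md = sum((m - m_bar) * (d - d_bar) for m, d in zip(ens, preds)) / (n - 1)
        c_dd = sum((d - d_bar) ** 2 for d in preds) / (n - 1)
        gain = c_md / (c_dd + alpha * obs_var)  # Kalman-type gain with inflated noise
        # Each member sees an observation perturbed with inflated noise sqrt(alpha) * e_j.
        ens = [
            m + gain * (d_obs + math.sqrt(alpha * obs_var) * rng.gauss(0.0, 1.0) - d)
            for m, d in zip(ens, preds)
        ]
    return ens
```

Take a linear-Gaussian toy problem: prior $N(0,1)$, $g(m)=2m$, $d_{obs}=1$ and noise variance $0.25$. The exact posterior has mean $8/17$ and variance $1/17$, so the returned ensemble can be checked directly. With a strongly nonlinear map, such as an LDM decoder, this agreement is no longer guaranteed. That is the gap the MCMC/SMC comparison probes.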

2606.10130 2026-06-10 physics.optics cs.LG physics.data-an New submission

Effective Training Principles of Physical Reservoirs

Sobhi Saeed, Mehmet Müftüoglu, Glitta R. Cheeran, Juliane Heim, Bennet Fischer, Mario Chemnitz

Affiliations: Leibniz-Institute of Photonic Technology; Friedrich Schiller University

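AI summary: Studies output pruning and regularization as ways to reduce overfitting and computational cost in physical reservoir computing. It compares several methods and shows that informed output sampling and regularization improve performance.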

Comments 19 pages, 7 figures

English abstract

Reservoir computers benefit from the inherent complexity of optical phenomena, which provide rich, often nonlinear dynamics. However, training directly on the reservoir's output renders the system prone to overfitting and computationally inefficient during the training phase. In this work, we investigate strategies to mitigate overfitting and reduce computational overhead through output pruning and regularization. We compare loss-minimizing search methods (Equal Search and Branch and Bound) against an output-oriented statistical filtering approach (Variance Filter) and random pruning, highlighting advantages and disadvantages of each approach and the overall importance of informed reservoir output sampling, particularly for a shrinking latent space. We further demonstrate that enforcing readout selection across the full output spectrum improves performance, especially for non-iterative methods. Additionally, we examine L1 and L2 regularization techniques (LASSO and ridge regression), both of which significantly enhance performance on highly nonlinear tasks such as the Spiral Benchmark. While our methods are of general use, results are obtained from and discussed exemplarily for a nonlinear fiber-optical extreme learning machine. Overall, this study provides a deep analysis of the reservoirs' hidden-layer filtering mechanisms and the output-layer training, enabling optimized performance in physical reservoir computing systems.
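
The Variance Filter and the ridge-regression readout can be sketched in a few lines. This minimal sketch makes two assumptions. The filter keeps the `keep` output channels with the largest variance across training samples. The readout is a linear map with no intercept, fitted in closed form by solving $(X^\top X + \lambda I)w = X^\top y$. The function names are hypothetical. Equal Search, Branch and Bound, and LASSO are not shown.

```python
import statistics


def variance_filter(features, keep):
    """Indices (ascending) of the `keep` output channels with the largest variance.

    features -- list of samples, each a list of reservoir output channels
    """
    n_ch = len(features[0])
    variances = [statistics.pvariance([row[j] for row in features]) for j in range(n_ch)]
    ranked = sorted(range(n_ch), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y by Gauss-Jordan elimination with partial pivoting."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    M = [A[i] + [b[i]] for i in range(d)]  # augmented matrix [A | b]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(d):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]
```

In use, the filter is applied first and the readout is then fitted on the surviving channels only. A larger `lam` trades training fit for robustness against overfitting.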

2606.09963 2026-06-10 physics.flu-dyn cs.AI New submission

Geometry-Aware Anisotropic Boundary Correction for Aerodynamic Simulation

Xin Zhang, Yipeng Huang, Shu Jiang, Zhenzhong Wang, Min Jiang

Affiliations: School of Informatics, Xiamen University; Institute of Artificial Intelligence, Xiamen University

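AI summary: Neural operators neglect the anisotropic physical behavior at boundaries. To address this, the paper proposes GeoABC, a geometry-conditioned anisotropic boundary correction framework that uses boundary geometry to introduce direction-aware corrections. On 2D airfoil and 3D car tasks, it reduces near-boundary relative L2 error by about 38% on average.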

English abstract

Aerodynamic simulation is a key component of engineering shape design, where core quantities such as the surface pressure coefficient strongly depend on flow dynamics near solid boundaries. Neural operators provide an efficient alternative to expensive Computational Fluid Dynamics (CFD) solvers. However, conventional methods treat the boundary region isotropically, failing to account for the distinct physical behaviors along the boundaries. In reality, the aerodynamic process exhibits anisotropy: along the tangential direction, flow propagates along the wall; along the normal direction, physical quantities are constrained by the wall. To explicitly model the distinct physical behaviors, we propose GeoABC, a geometry-conditioned anisotropic boundary correction framework. GeoABC leverages the boundary geometries to introduce direction-aware boundary correction into the intermediate representations of neural operators, transforming boundary geometry from static input features into a structural prior that modulates physical prediction. On 2D airfoil and 3D car tasks, GeoABC consistently adapts to multiple neural operator backbones, reducing near-boundary relative $L_2$ error by $\sim 38\%$ on average, narrowing the structural near-wall gap shared by mainstream neural operators, and advancing neural operators toward high-fidelity aerodynamic simulation.
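
The tangential/normal split behind GeoABC can be illustrated on a single feature vector. This is a toy sketch built on assumptions. It projects a vector onto the local unit wall normal and rescales the tangential and normal parts with separate gains. The normal gain here comes from a hypothetical `wall_gain` that suppresses the normal component at the wall and fades out away from it. In GeoABC, the direction-aware modulation is instead learned inside the neural operator's intermediate representations.

```python
import math


def wall_gain(distance, length):
    """Hypothetical gain: 0 at the wall, approaching 1 at distances much larger than `length`."""
    return 1.0 - math.exp(-distance / length)


def anisotropic_correction(v, normal, gain_t, gain_n):
    """Rescale the tangential and wall-normal components of vector `v` separately.

    normal -- wall normal (any length; normalised here)
    gain_t -- gain applied to the tangential component
    gain_n -- gain applied to the normal component
    """
    norm = math.hypot(*normal)
    n = [c / norm for c in normal]
    v_dot_n = sum(a * b for a, b in zip(v, n))
    v_n = [v_dot_n * c for c in n]          # component along the wall normal
    v_t = [a - b for a, b in zip(v, v_n)]   # remainder lies in the tangent plane
    return [gain_t * a + gain_n * b for a, b in zip(v_t, v_n)]


# At the wall the normal component is removed and the tangential one is kept.
print(anisotropic_correction([3.0, 4.0], [0.0, 2.0], 1.0, wall_gain(0.0, 0.1)))  # [3.0, 0.0]
```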

2606.10384 2026-06-10 nlin.AO cs.AI physics.comp-ph New submission

Towards Critical Branching Mechanism in Recurrent Neural Networks

Feixiang Ren, Ling Feng

Affiliations: Department of Physics, National University of Singapore; Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR)

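AI summary: Analyzes hidden-state dynamics of LSTM networks. Small networks near their optimal training stage show near-critical dynamics: scale-free avalanche statistics and branching parameters close to 1. Larger networks remain subcritical. The paper introduces a mixture branching process framework to explain why subcritical branching coexists with 1/f noise.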

English abstract

Criticality has been proposed as a key organizing principle in biological neural systems, yet its origin and relevance in artificial neural networks remain unclear. We analyze hidden-state dynamics in trained long short-term memory (LSTM) networks and show that small networks near their optimal training epochs (steps) exhibit scale-free avalanche statistics and branching parameters close to unity, indicative of near-critical dynamics, while larger models remain subcritical. To explain the coexistence of subcritical branching with robust $1/f^{\beta}$ noise, we introduce a mixture branching process framework that links heterogeneous branching dynamics to long-range temporal correlations. These results identify critical-like behavior in LSTMs as an emergent, capacity-dependent dynamical regime.
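
Avalanche sizes and the branching parameter $\sigma$ are commonly estimated from an activity time series. An avalanche is a run of consecutive nonzero activity. $\sigma$ is the average ratio of activity at step $t+1$ to activity at step $t$, taken over steps with nonzero activity. The sketch below makes two assumptions: LSTM "activity" is the number of hidden units whose change between consecutive steps exceeds a threshold `theta`, and $\sigma$ is computed with this naive ratio estimator. Neither is necessarily the paper's definition.

```python
def activity_from_states(states, theta):
    """Number of units whose value changes by more than `theta` between consecutive steps."""
    return [sum(1 for a, b in zip(prev, cur) if abs(b - a) > theta)
            for prev, cur in zip(states, states[1:])]


def avalanche_sizes(activity):
    """Total activity within each maximal run of nonzero steps."""
    sizes, current = [], 0
    for a in activity:
        if a > 0:
            current += a
        elif current:
            sizes.append(current)
            current = 0
    if current:  # close an avalanche still running at the end of the series
        sizes.append(current)
    return sizes


def branching_ratio(activity):
    """Naive estimator: mean of A[t+1] / A[t] over steps with A[t] > 0."""
    ratios = [nxt / cur for cur, nxt in zip(activity, activity[1:]) if cur > 0]
    return sum(ratios) / len(ratios) if ratios else float("nan")


acts = [0, 1, 2, 4, 0, 3, 3, 0]
print(avalanche_sizes(acts))   # [7, 6]
print(branching_ratio(acts))   # 1.0
```

A value of $\sigma$ close to 1 corresponds to the near-critical regime described in the abstract, and $\sigma < 1$ to subcritical dynamics. Scale-free statistics would then be assessed from the distribution of avalanche sizes collected over long runs.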

2606.10698 2026-06-10 hep-ph cs.LG hep-th New submission

Efficient AI-Inspired Reduction of Feynman Integrals via Tube Seeding

Justin Berman, Francois Charton, Andres Luna, Matthias Wilhelm, Mao Zeng

Affiliations: Leinweber Institute for Theoretical Physics, Randall Laboratory of Physics, University of Michigan, Ann Arbor, 450 Church St, Ann Arbor, MI 48109-1040, USA; Axiom Math, 124 University Avenue, Palo Alto, California, 94301, United States; Niels Bohr International Academy, Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen, Denmark; Center for Quantum Mathematics, Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark; Higgs Centre for Theoretical Physics, University of Edinburgh, Edinburgh, EH9 3FD, United Kingdom

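AI summary: Uses machine learning to discover a new seed-selection strategy whose sparse seed set grows only linearly. This enables reduction of multi-loop integrals with high numerator powers at substantially lower computation time and memory footprint, making the approach suitable for phenomenological applications.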

Comments 61 pages, 25 figures, 11 tables

English abstract

In this paper, we use machine learning to discover a new seeding strategy for integration-by-parts reduction of Feynman integrals, which is a frequent bottleneck in state-of-the-art calculations in theoretical particle and gravitational-wave physics. Our strategy allows us to reduce multi-loop integrals with large numerator powers via essentially the standard Laporta algorithm but with a sparse selection of seed integrals that grows only linearly with the numerator power, whereas existing strategies lead to growth with a polynomial power that increases with the complexity of the integral being reduced. The seeds are restricted to a thin tube-like region that connects the target integral to the master integrals along a zigzag path. We demonstrate the power of our approach by reducing non-planar 2-loop 5-point integrals of rank 20 with numerical kinematics over a finite field, which is prohibitively difficult for the Laporta algorithm with conventional seeding. Going beyond individual integrals, we further demonstrate the reduction of a complete set of top-level rank-10 integrals by dividing the target integrals into several chunks, each of which can be solved by our sparse seeding strategy with considerably less time and a significantly lower memory footprint than other state-of-the-art strategies, making the approach well-suited for phenomenological applications. We provide a proof-of-principle implementation on GitHub at https://github.com/andreslunagodoy/tube_seeding.
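
A toy lattice of integrals shows why a tube of seeds scales linearly while conventional seeding does not. This is a purely geometric illustration and performs no IBP reduction. It indexes integrals by two integers and builds a staircase (zigzag) path from a target at `(k, k)` to the origin. It then collects all lattice points within Chebyshev distance 1 of the path and compares that count with a full box of indices up to `k`. The path rule, the tube radius and all function names are assumptions and do not reproduce the authors' construction.

```python
def zigzag_path(k):
    """Staircase from (k, k) to (0, 0), alternately lowering the first and second index."""
    x, y = k, k
    path = [(x, y)]
    while x > 0 or y > 0:
        if x >= y and x > 0:
            x -= 1
        else:
            y -= 1
        path.append((x, y))
    return path


def tube_seeds(path, radius=1):
    """All lattice points within Chebyshev distance `radius` of some point on the path."""
    seeds = set()
    for x, y in path:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                seeds.add((x + dx, y + dy))
    return seeds


def box_seeds(k):
    """Conventional-style seeding: every index pair up to the target."""
    return {(x, y) for x in range(k + 1) for y in range(k + 1)}


for k in (10, 20, 40):
    print(k, len(tube_seeds(zigzag_path(k))), len(box_seeds(k)))
```

Away from the endpoints, each unit increase of `k` adds a fixed number of tube points, whereas the box grows as $(k+1)^2$.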

2606.10381 2026-06-10 hep-ex cs.AI cs.CL cs.IR physics.ins-det New submission

Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis

Ruobing Jiang, Dawei Fu, Cheng Jiang, Tianyi Yang, Zijian Wang, Youpeng Wu, Yong Ban, Yajun Mao, Qiang Li

Affiliations: Peking University

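AI summary: Proposes an agentic hybrid RAG framework for evidence retrieval and answer generation in muon collider research. It combines sparse and dense retrieval with agentic reasoning. The authors also build the first benchmark for this domain and use it to validate the framework.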

Comments 22 pages, 5 figures, and 6 tables

English abstract

Muon collider research spans accelerator physics, detector instrumentation, and high-energy phenomenology, with relevant evidence scattered across a rapidly expanding and heterogeneous body of scientific literature. As high-energy physics (HEP) increasingly explores agent-assisted analysis workflows, efficiently locating, integrating, and verifying scientific evidence becomes an essential capability. While retrieval-augmented generation (RAG) offers a promising framework for scientific question answering, integrating agentic reasoning without compromising retrieval precision remains a key challenge. In this work, we present agentic hybrid RAG, an evidence-grounded RAG framework for muon collider research. The framework combines a hybrid retriever, integrating sparse lexical and dense semantic retrieval, with an agentic reasoning module for query decomposition, evidence expansion, and grounded answer generation. To enable systematic evaluation, we construct the first benchmark for retrieval-augmented scientific question answering in the muon collider domain, comprising a curated literature corpus together with dedicated retrieval and answer-generation benchmarks covering major detector and physics research topics. Extensive evaluation shows that hybrid retrieval provides the strongest retrieval backbone, while agentic reasoning is most effective for controlled evidence expansion and answer synthesis. Built on this principle, agentic hybrid RAG consistently outperforms representative retrieval and RAG baselines in retrieval effectiveness, answer quality, evidence coverage, and factual grounding. Together, the benchmark and framework provide a foundation for evidence-grounded scientific question answering and future HEP analysis agents operating over large-scale scientific literature.
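
One common way to merge a sparse lexical ranking with a dense semantic ranking is reciprocal rank fusion (RRF). The abstract does not say which fusion rule agentic hybrid RAG uses, so the following is an assumed, generic sketch. Each retriever contributes $1/(k + \text{rank})$ per document, using the conventional $k = 60$. Ties are broken by document ID so the output is deterministic.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs (best first) into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


sparse_hits = ["a", "b", "c"]   # e.g. from a BM25-style lexical retriever
dense_hits = ["b", "c", "a"]    # e.g. from an embedding retriever
print(reciprocal_rank_fusion([sparse_hits, dense_hits]))  # ['b', 'a', 'c']
```

RRF uses only ranks, so it needs no calibration between lexical and embedding scores, which live on incomparable scales.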

2606.10547 2026-06-10 eess.IV cond-mat.mtrl-sci cs.LG physics.ins-det New submission

Unsupervised Deep Learning for Limited-Angle STEM-EDX Tomography -- Application to 3D Chemical Analysis of Phase-Change Memory Devices

Daniel del Pozo Bueno, Serge Brosset, Theo Monniez, Gabriele Navarro, Philippe Ciuciu, Zineb Saghi

Affiliations: CEA, LETI, Univ. Grenoble Alpes; CEA, Neurospin, Paris-Saclay University; Inria, MIND

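AI summary: Proposes an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV), together with a multi-channel extension (DIPm-TV). It addresses missing-wedge artefacts and noise in limited-angle STEM-EDX tomography and enables 3D chemical analysis of phase-change memory devices.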

Comments 29 pages (17 main manuscript + 12 supplementary information), 4 figures, 8 supplementary figures, 1 table, and 4 supplementary tables

English abstract

Energy Dispersive X-ray (EDX) tomography in Scanning Transmission Electron Microscopy (STEM) enables 3D compositional and elemental mapping at the nanoscale, but its use is limited by restricted tilt ranges and low-dose conditions required to avoid beam damage. Limited-angle acquisition introduces missing-wedge artefacts such as elongation and anisotropic resolution, while noisy low-dose data further degrade reconstruction quality and quantitative reliability. Here, we introduce an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV) for limited-angle STEM-EDX tomography. We extend it to a multi-channel formulation (DIPm-TV) that jointly reconstructs multiple elemental maps by exploiting spatial correlations. Using a synthetic 3-channel phantom, we show that the method compensates for severe missing-wedge artefacts corresponding to approximately $100^\circ$ of missing angular range under moderate noise, outperforming simultaneous iterative reconstruction technique and compressed sensing approaches. We apply the method to 3D chemical analysis of Ge-Sb-Te (GST) memory devices in virgin (as-fabricated) and SET (crystalline) operational states. Samples were prepared as cross-sectional focused ion beam lamellae and acquired under a limited-angle tilt range from $-40^\circ$ to $+40^\circ$ with $5^\circ$ steps and a dose of $2.0\times10^5$ e⁻/Å². The multi-channel approach enables voxel-by-voxel elemental reconstruction using only EDX signals without external structural priors such as high-angle annular dark-field imaging. The reconstructed volumes show near-isotropic spatial resolution and reveal compositional heterogeneities associated with device operation. This approach enables 3D chemical characterization in experimentally accessible sample geometries where conventional methods fail due to severe angular limitations.
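
The TV term in DIP-TV penalizes differences between neighboring values and therefore favors piecewise-constant reconstructions. The following sketch isolates that term on a 1D signal. It minimizes $\tfrac12\|x-y\|^2 + \lambda\sum_i \sqrt{(x_{i+1}-x_i)^2+\varepsilon}$ by plain gradient descent. The smoothing $\varepsilon$, $\lambda$, the step size and the iteration count are arbitrary illustrative choices. There is no Deep Image Prior network, no tomographic projector and no multi-channel coupling.

```python
import math


def smoothed_tv(x, eps):
    """Smoothed total variation: sum of sqrt(diff^2 + eps) over neighbouring samples."""
    return sum(math.sqrt((b - a) ** 2 + eps) for a, b in zip(x, x[1:]))


def tv_denoise(y, lam=0.5, eps=1e-2, step=0.04, iters=2000):
    """Gradient descent on 0.5*||x - y||^2 + lam * smoothed_tv(x, eps), starting from x = y."""
    x = list(y)
    n = len(x)
    for _ in range(iters):
        grad = [xi - yi for xi, yi in zip(x, y)]  # gradient of the data-fidelity term
        for i in range(n - 1):
            d = x[i + 1] - x[i]
            g = lam * d / math.sqrt(d * d + eps)  # derivative of the TV term w.r.t. d
            grad[i] -= g
            grad[i + 1] += g
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


clean = [0.0] * 10 + [1.0] * 10                              # a single step edge
noisy = [c + 0.3 * (-1) ** i for i, c in enumerate(clean)]   # deterministic oscillating noise
denoised = tv_denoise(noisy)
```

For the default parameters, the gradient's Lipschitz constant is at most $L = 1 + 4\lambda/\sqrt{\varepsilon} = 21$. The step of 0.04 is below $1/L$, so each iteration decreases the objective. The oscillation is removed while the edge survives, slightly shrunk.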

2606.10349 2026-06-10 cond-mat.dis-nn cond-mat.str-el cs.LG New submission

Magnetic HIP-NN for spin dynamics in disordered itinerant magnets

Supriyo Ghosh, Yunhao Fan, Sheng Zhang, Kipton Barros, Gia-Wei Chern

Affiliations: Department of Physics, University of Virginia; Department of Chemistry, University of Chicago; Theoretical Division and CNLS, Los Alamos National Laboratory

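AI summary: Proposes magnetic HIP-NN (mHIP-NN), which applies hierarchical message passing to rotationally invariant spin correlations. It efficiently simulates electron-mediated spin dynamics in disordered itinerant magnets. It accurately reproduces Landau-Lifshitz-Gilbert dynamics and the nonequilibrium evolution of spin correlations after thermal quenches.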

Comments 12 pages, 5 figures

English abstract

We present a magnetic extension of the Hierarchically Interacting Particle Neural Network (HIP-NN) that enables large-scale simulations of electron-mediated spin dynamics in disordered itinerant magnets. The resulting magnetic HIP-NN (mHIP-NN) incorporates rotationally invariant spin correlations directly into hierarchical message-passing layers, enabling the network to learn emergent magnetic energy landscapes and effective local fields from coupled geometric-spin environments while preserving spin-rotation symmetry. As a benchmark application, we consider structurally disordered itinerant $s$-$d$ exchange models in which the effective magnetic forces arise dynamically from the instantaneous electronic structure and are computationally prohibitive to evaluate using conventional exact-diagonalization-based approaches. We show that mHIP-NN accurately reproduces the local torques governing Landau-Lifshitz-Gilbert dynamics and faithfully captures the nonequilibrium evolution of spatial spin correlations following thermal quenches. Our results establish symmetry-aware hierarchical message-passing networks as an efficient and scalable framework for large-scale simulations of frustrated itinerant spin systems and nonequilibrium magnetic dynamics. More broadly, because the learned energy functional remains fully differentiable with respect to both atomic coordinates and spin variables, the framework also provides a natural foundation for spin-dependent interatomic potentials and coupled atom-spin dynamics.
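
mHIP-NN must predict the local field that drives Landau-Lifshitz-Gilbert precession and damping. This minimal sketch of how such a field is consumed makes several assumptions:

- dimensionless units, with the gyromagnetic ratio set to 1;
- the equation of motion $\dot{\mathbf S} = -\mathbf S\times\mathbf H - \alpha\,\mathbf S\times(\mathbf S\times\mathbf H)$;
- explicit Euler steps, with the spin renormalized to unit length after each step.

In the paper's setting, $\mathbf H_i$ would come from the learned energy as $-\partial E/\partial \mathbf S_i$; here it is a fixed field.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llg_step(s, h, alpha, dt):
    """One explicit Euler step of dS/dt = -S x H - alpha * S x (S x H), then |S| = 1."""
    sxh = cross(s, h)          # precession term
    sxsxh = cross(s, sxh)      # damping term
    new = [si + dt * (-p - alpha * q) for si, p, q in zip(s, sxh, sxsxh)]
    norm = math.sqrt(sum(c * c for c in new))
    return tuple(c / norm for c in new)


spin = (1.0, 0.0, 0.0)
field = (0.0, 0.0, 1.0)
for _ in range(2000):
    spin = llg_step(spin, field, alpha=0.5, dt=0.01)
```

For this form, with $\mathbf H$ along $+z$ and $S_z(0)=0$, the continuous solution satisfies $S_z(t) = \tanh(\alpha t)$. The spin therefore spirals into the field direction, which makes a simple sanity check.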

2606.10771 2026-06-10 astro-ph.IM cs.LG cs.RO New submission

On-sky demonstration of reinforcement learning for adaptive optics control

Jalo Nousiainen, Vincent Chambouleyron, Benoit Neichel, Sylvain Cetre, Jean-Francois Sauvage, Angelie Alagao, Markus Kasper, Jonathan Dray, Romain Fetick, Byron Engler

Affiliations: European Southern Observatory; Aix Marseille University; CNRS; CNES; LAM; Wakea Consulting; Bertin Alpao

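AI summary: The first on-telescope demonstration of PO4AO, a reinforcement-learning-based adaptive optics controller. It outperforms a conventional integrator controller across a range of conditions, demonstrating robustness and high performance.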

Comments 11 pages, 12 figures; accepted by A&A

English abstract

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately $750\,\mu\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.
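
PO4AO was benchmarked against a standard integrator. For reference, a per-mode integrator closes the loop as $u_k = u_{k-1} - g\,e_k$, where $e_k$ is the measured residual and $g$ the loop gain. The scalar sketch below assumes a one-frame delay, meaning the residual at step $k$ sees the command from step $k-1$. It also assumes an optional leak factor and a caller-supplied disturbance sequence. It illustrates only this baseline, not PO4AO's policy optimization.

```python
def integrator_loop(disturbance, gain, n_steps, leak=1.0):
    """Simulate a scalar (single-mode) integrator AO loop.

    disturbance -- callable k -> disturbance at frame k (e.g. a turbulence mode coefficient)
    gain        -- integrator loop gain g
    leak        -- leak factor (1.0 = pure integrator)
    Returns the residual measured at every frame.
    """
    u = 0.0
    residuals = []
    for k in range(n_steps):
        e = disturbance(k) + u      # residual seen by the wavefront sensor (one-frame delay)
        residuals.append(e)
        u = leak * u - gain * e     # integrate the residual into the command
    return residuals


res = integrator_loop(lambda k: 1.0, gain=0.5, n_steps=50)
print(res[:4])  # [1.0, 0.5, 0.25, 0.125]
```

For a constant disturbance and a pure integrator, the residual shrinks by a factor $(1-g)$ per frame, so under this delay model the loop converges for $0 < g < 2$. Fast vibrations are much harder for such a fixed-gain loop to reject. Vibration compensation is one of the behaviors the abstract reports PO4AO learning.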

2606.10197 2026-06-10 astro-ph.GA cs.AI New submission

Integral Field Unit Spectroscopy with One Fiber

Zehao Peng, Biprateep Dey, Chris J. Maddison, Joshua S. Speagle

Affiliations: University of Toronto; Vector Institute

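AI summary: Proposes a multimodal probabilistic foundation model that uses a masked autoencoder to predict high-resolution spectra at arbitrary spatial positions within a galaxy from broadband images. It needs no IFU training data and performs comparably to a supervised baseline.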

Comments Accepted for Conference on Physics and AI at Stanford University (PAI 2026)

English abstract

Integral field unit (IFU) spectroscopy provides spatially resolved spectra across galaxies, offering crucial insights into their evolution. However, its high observational cost limits current IFU datasets to $\sim 10^4$ objects. We present a multi-modal, probabilistic foundation model that predicts high-resolution spectra with calibrated uncertainties at arbitrary spatial locations within a galaxy directly from broadband images. Built on a masked autoencoder framework, our architecture injects fiber positional encodings and redshift-aware wavelength encodings, enabling spatially conditioned predictions. Trained on 4.7 million images and single-fiber spectroscopic observations from the Dark Energy Spectroscopic Instrument (DESI) survey, our model exploits the natural variance of fiber placements and the morphological self-similarity of galaxies to achieve IFU-like capabilities without any IFU training data. Predicted emission line flux maps match independent IFU observations from the Mapping Nearby Galaxies at APO (MaNGA) survey, with performance comparable to a supervised baseline trained directly on IFU data.
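
The abstract names fiber positional encodings and redshift-aware wavelength encodings but does not give their form. This hypothetical sketch applies standard sinusoidal encodings to two inputs: the fiber's (x, y) offset within the galaxy, and the logarithm of the rest-frame wavelength $\lambda_\text{rest} = \lambda_\text{obs}/(1+z)$. All function names, the number of frequencies and the base of 10000 are assumptions borrowed from common transformer practice.

```python
import math


def sinusoidal_encoding(value, n_freqs, base=10000.0):
    """Interleaved sin/cos features of a scalar at geometrically spaced frequencies."""
    enc = []
    for i in range(n_freqs):
        freq = 1.0 / (base ** (i / n_freqs))
        enc.extend([math.sin(value * freq), math.cos(value * freq)])
    return enc


def rest_frame_wavelength(observed, z):
    """Undo cosmological redshift: lambda_rest = lambda_obs / (1 + z)."""
    return observed / (1.0 + z)


def fiber_and_wavelength_tokens(fiber_xy, observed_wavelengths, z, n_freqs=8):
    """Hypothetical conditioning features: one fiber-position vector and one vector per wavelength."""
    fiber_enc = sinusoidal_encoding(fiber_xy[0], n_freqs) + sinusoidal_encoding(fiber_xy[1], n_freqs)
    wave_enc = [sinusoidal_encoding(math.log(rest_frame_wavelength(w, z)), n_freqs)
                for w in observed_wavelengths]
    return fiber_enc, wave_enc
```

Encoding the rest-frame wavelength means a given emission line maps to the same features at any redshift. This is one plausible reading of "redshift-aware".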

2606.10023 2026-06-10 astro-ph.CO astro-ph.IM cs.LG New submission

Learning the Universe: Posterior Reliability of Neural Generative Models in High-Dimensional Field-Level Inference of Cosmic Initial Conditions

Ludvig Doeser, Jens Jasche

Affiliations: The Oskar Klein Centre, Department of Physics, Stockholm University, AlbaNova University Centre; Center for Computational Astrophysics, Flatiron Institute

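AI summary: Uses Hamiltonian Monte Carlo reference posteriors to evaluate the posterior reliability of neural generative models (stochastic interpolants and GLOW normalizing flows) for high-dimensional field-level inference of cosmic initial conditions. Matching posterior means or marginal distributions does not guarantee a correct uncertainty structure.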

Comments This is a Learning the Universe publication. 19 pages, 18 figures

English abstract

Accurate posterior estimation is central to scientific inference, as uncertainties determine what can be reliably learned from observational data. While Markov chain Monte Carlo methods provide asymptotic convergence guarantees, they are computationally demanding in high-dimensional settings. Neural network-based generative models for entire discretized 3D fields enable fast amortized inference but often lack convergence guarantees and principled accuracy assessment. Using Hamiltonian Monte Carlo to obtain reference posterior samples, we conduct a controlled field-level evaluation of an implicit generative model (Stochastic Interpolants) and an explicit likelihood-based model (GLOW normalizing flows). This comparison, unavailable in typical applications, enables the detection of posterior geometry failures that standard metrics cannot capture. As a case study, we consider the cosmological inverse problem of inferring cosmic initial conditions from present-day large-scale structure. To match the precision of modern cosmological data, this problem increasingly relies on complex, non-linear, and non-differentiable simulators, which are incompatible with gradient-based inference frameworks. Generative models offer a route to address these challenges, provided their inferred posteriors are reliable. In this work, we show that matching posterior means, marginal distributions, or achieving high cross-correlation does not imply correct uncertainty structure, as revealed by posterior variance fields and sample-based evaluations. Through this work, we aim to raise awareness of the challenges of uncertainty estimation in high-dimensional field-level settings, highlighting the importance of careful design and validation of neural generative approaches for scientific applications.
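
The finding that matching means does not imply matching uncertainty can be checked with per-voxel moments over posterior samples. This sketch, with hypothetical function names, compares a reference sample set (for example from HMC) with samples from a candidate generative model. It reports the largest per-voxel mean difference and the per-voxel variance ratio. A ratio far from 1 flags miscalibrated uncertainty even when the means agree. It is only a first-moment/second-moment check and cannot detect every posterior-geometry failure.

```python
import statistics


def per_voxel_moments(samples):
    """Per-voxel mean and sample variance; `samples` is a list of flattened fields."""
    voxels = list(zip(*samples))
    means = [statistics.fmean(v) for v in voxels]
    variances = [statistics.variance(v) for v in voxels]
    return means, variances


def compare_posteriors(reference, candidate):
    """Largest absolute per-voxel mean difference and per-voxel variance ratios (candidate / reference)."""
    ref_m, ref_v = per_voxel_moments(reference)
    cand_m, cand_v = per_voxel_moments(candidate)
    max_mean_diff = max(abs(a - b) for a, b in zip(ref_m, cand_m))
    var_ratios = [c / r for c, r in zip(cand_v, ref_v)]
    return max_mean_diff, var_ratios


# Two-voxel toy: identical means, but the candidate is twice as wide (4x the variance).
ref = [[-1.0, 2.0], [1.0, 4.0], [-1.0, 2.0], [1.0, 4.0]]
cand = [[-2.0, 1.0], [2.0, 5.0], [-2.0, 1.0], [2.0, 5.0]]
print(compare_posteriors(ref, cand))
```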

2606.09041 2026-06-10 cs.CY cs.AI cs.GR cs.HC cs.MM Cross-list

Culturally-Aware AI for Cross-Boundary Community Learning: Undergraduate Innovation at the Intersection of Computation and Design

Jiaojiao Zhao, Weisheng Zhang, Jiawen Cai, Haibin Gao, Luyao Zhang

Affiliations: Duke Kunshan University; Zhouzhuang Mystery of Life Museum; Digital Innovation Research Center and Social Science Division

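AI summary: Proposes a collaborative framework that delivers culturally-aware AI education through community-engaged computing. It fosters interdisciplinary integration of social work and computational science and is applied to cultural heritage preservation and sustainable development.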

English abstract

Research on artificial intelligence in education (AIED) is rapidly expanding, yet technical progress often lacks human-centered grounding and adequate attention to cultural context. Community-Based Learning, a pedagogy rooted in social work, remains underrepresented in AIED research, particularly within Asia-Pacific contexts. This paper reports on cross-boundary Community-Based Learning where undergraduate students develop AI-enabled solutions for cultural heritage preservation and sustainable development. We examine how community-engaged computing operationalizes human-centered AIED across three dimensions: education, technology, and culture. We contribute a collaborative framework for culturally-aware AIED that fosters multi-stakeholder collaboration while widening participation by dissolving disciplinary silos between social work and computational science.

2605.27770 2026-06-10 hep-th cs.LG Cross-list

Sampling Triangulations and Calabi-Yau Threefolds with Autoregressive GNNs

Nate MacFadden

Affiliations: Department of Physics, Cornell University

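AI Summary: Proposes dualGNN, an autoregressive message-passing graph neural network for sampling fine, regular triangulations of convex polytopes, and applies it in string theory to uniformly sample Calabi-Yau threefolds; the model is small, fast to train, and generalizes well.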

Comments 50 pages, 27 figures, 3 tables

Abstract

We introduce `dualGNN', an autoregressive message-passing GNN for sampling fine, regular triangulations (FRTs) of convex polytopes. dualGNN operates on a generalization of the dual graph of a triangulation, with edges labeled by `signed circuits' -- combinatorial invariants from oriented matroid theory which we show are both necessary and sufficient for exposing regularity. The model is independent of the number of points in the polytope and invariant under the polytope's orientation-preserving symmetries ($\mathrm{SL}(d,\mathbb{Z}) \ltimes \mathbb{Z}^d$). When implemented with a certain masking procedure, one can also guarantee that every rollout produces a fine triangulation (in $2$D). On unseen polygons with $N_\mathrm{pts} \leq 40$, dualGNN is the most uniform FRT sampler we tested, and even a model trained on a single polygon generalizes well to other polygons. The model is small ($\sim92$k parameters), trains in $\sim7.5$ hours on a single consumer GPU, and runs without modification on an M1 MacBook Pro. We apply dualGNN to string theory, uniformly sampling Calabi-Yau threefolds at $h^{1,1}=86$ and producing samples consistent with uniformity at $h^{1,1}=128$. This is an order of magnitude beyond previous learned methods with a model $\sim1000\times$ smaller. Code, training scripts, and pretrained models are available at https://github.com/natemacfadden/dualGNN .

2601.11072 2026-06-10 cs.HC cs.AI cs.CY Cross-list

More Human or More AI? Visualizing Human-AI Collaboration Disclosures in Journalistic News Production

Amber Kusters, Pooja Prajod, Pablo Cesar, Abdallah El Ali

Affiliations: Centrum Wiskunde & Informatica, Amsterdam; TU Delft; Utrecht University

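AI Summary: Uses co-design sessions and a lab study to examine how different disclosure visualizations (text, role-based timeline, task-based timeline, chatbot) and collaboration ratios affect users' perceptions of human-AI collaboration in news.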

Comments Accepted to ACM CHI 2026 - Preprint

Abstract

Within journalistic editorial processes, disclosing AI usage is currently limited to simplistic labels, which misses the nuance of how humans and AI collaborated on a news article. Through co-design sessions (N=10), we elicited 69 disclosure designs and implemented four prototypes that visually disclose human-AI collaboration in journalism. We then ran a within-subjects lab study (N=32) to examine how disclosure visualizations (Textual, Role-based Timeline, Task-based Timeline, Chatbot) and collaboration ratios (Primarily Human vs. Primarily AI) influenced visualization perceptions, gaze patterns, and post-experience responses. We found that textual disclosures were least effective in communicating human-AI collaboration, whereas Chatbot offered the most in-depth information. Furthermore, while role-based timelines amplified AI contribution in primarily human articles, task-based timelines shifted perceptions toward human involvement in primarily AI articles. We contribute Human-AI collaboration disclosure visualizations and their evaluation, and cautionary considerations on how visualizations can alter perceptions of AI's actual role during news article creation.

2605.03344 2026-06-10 cs.IR cs.AI cs.CL Replacement

RAG over Thinking Traces Can Improve Reasoning Tasks

Negar Arabzadeh, Wenjie Ma, Sewon Min, Matei Zaharia

Affiliations: University of California, Berkeley

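AI Summary: Proposes retrieving thinking traces instead of documents and converting them into structured representations with the T3 method, which substantially improves performance on reasoning tasks and outperforms both standard RAG and no-RAG baselines.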

Abstract

Retrieval-augmented generation (RAG) has proven effective for knowledge-intensive tasks, but is widely believed to offer limited benefit for reasoning-intensive problems such as math and code generation. We challenge this assumption by showing that the limitation lies not in RAG itself, but in the choice of corpus. Instead of retrieving documents, we propose retrieving thinking traces, i.e., intermediate thinking trajectories generated during problem-solving attempts. We show that thinking traces are already a strong retrieval source, and further introduce T3, an offline method that transforms them into structured, retrieval-friendly representations that are easier to use. Using these traces as a corpus, a simple retrieve-then-generate pipeline consistently improves reasoning performance across strong models and benchmarks such as AIME 2025-2026, LiveCodeBench, and GPQA-Diamond, outperforming both non-RAG baselines and retrieval over standard web corpora. For instance, on AIME 2025-2026, RAG with traces generated by Gemini-2-thinking achieves relative gains of +56.3%, +8.6%, and +7.6% for Gemini-2.5-Flash, GPT-OSS-120B, and GPT-5, respectively, even though these models are more recent than the one that generated the traces. Overall, our results suggest that thinking traces are an effective retrieval corpus for reasoning tasks, and transforming them into structured, compact, or diagnostic representations unlocks even stronger gains. Code available at https://github.com/Narabzad/t3.
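The retrieve-then-generate pipeline can be pictured with a deliberately simple sketch. Assumptions: the bag-of-words cosine retriever, the prompt template, and the helper names `retrieve` and `build_prompt` are hypothetical, since the abstract does not specify the retriever, and T3's structured trace representations are not reproduced. The output is a prompt string that would then be passed to whichever LLM is being evaluated.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, traces, k=2):
    """Return the k stored thinking traces most similar to the query."""
    q = bag_of_words(query)
    return sorted(traces, key=lambda t: cosine(q, bag_of_words(t)), reverse=True)[:k]


def build_prompt(query, traces, k=2):
    """Prepend retrieved traces to the problem before calling the generator model."""
    context = "\n\n".join(
        f"[Trace {i + 1}]\n{t}" for i, t in enumerate(retrieve(query, traces, k))
    )
    return f"Relevant reasoning traces:\n{context}\n\nProblem: {query}\nSolve step by step."
```

In practice the corpus would hold thousands of traces and the lexical scorer would likely be replaced by dense embeddings; the structure (score, take top-k, prepend, generate) stays the same.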

2606.09677 2026-06-10 eess.AS cs.AI Replacement

MeCo: One-Step MeanFlow-based Corrector for Multi-Channel Speech Separation

Dohwan Kim, Jung-Woo Choi

Affiliations: School of Electrical Engineering, KAIST

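AI Summary: Proposes MeCo, a one-step MeanFlow-based generative corrector that uses Data-Space Optimization to jointly train a generative objective and signal fidelity, improving both signal fidelity and human listening quality at very low computational cost.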

Comments 5 pages, accepted to Interspeech 2026

Abstract

While discriminative models for multi-channel speech separation excel in reference-based metrics, they often exhibit suboptimal human listening quality. To address this, we propose a novel MeanFlow-based one-step generative corrector (MeCo). MeCo learns a conditional average velocity field to map discriminative estimates directly onto the clean speech manifold in a single step. To maximize one-step generation performance, we introduce Data-Space Optimization (DSO). DSO integrates an $\mathbf{x}_r$-loss, which penalizes prediction errors on longer displacement intervals to serve as a generative objective for human listening quality, with an Endpoint SI-SDR loss that directly optimizes terminal signal fidelity. Experiments demonstrate that MeCo achieves state-of-the-art (SOTA) performance with minimal computational overhead, simultaneously achieving superior signal fidelity and human listening quality in both in-domain and out-of-domain scenarios.
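The Endpoint SI-SDR term optimizes the standard scale-invariant signal-to-distortion ratio of the corrector's final output, $\text{SI-SDR} = 10\log_{10}\big(\lVert \alpha s\rVert^2 / \lVert \alpha s - \hat{s}\rVert^2\big)$ with $\alpha = \langle \hat{s}, s\rangle / \lVert s\rVert^2$. A minimal sketch of that metric, written as a loss, follows; it assumes single-channel, time-aligned signals given as plain Python lists and uses the common zero-mean convention. The MeanFlow corrector and the $\mathbf{x}_r$-loss themselves are not reproduced here.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    n = len(reference)
    # Remove DC offsets so the measure ignores constant shifts
    r_mean = sum(reference) / n
    e_mean = sum(estimate) / n
    r = [x - r_mean for x in reference]
    e = [x - e_mean for x in estimate]
    # Project the estimate onto the reference to get the optimally scaled target
    alpha = sum(a * b for a, b in zip(e, r)) / (sum(a * a for a in r) + eps)
    target = [alpha * a for a in r]
    distortion = [a - b for a, b in zip(e, target)]
    num = sum(a * a for a in target)
    den = sum(a * a for a in distortion)
    return 10.0 * math.log10((num + eps) / (den + eps))


def si_sdr_loss(reference, estimate):
    """Negated SI-SDR, so that minimizing the loss maximizes signal fidelity."""
    return -si_sdr(reference, estimate)
```

Because the target is rescaled by $\alpha$, multiplying the estimate by any nonzero gain leaves the score unchanged, which is why SI-SDR is a fidelity measure rather than a loudness-matching one.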

2606.09141 2026-06-10 eess.AS cs.SD Replacement

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Hanke Xie, Xiaming Ren, Dake Guo, Ruonan You, Wenhao Li, Jingbin Hu, Guobin Ma, Huakang Chen, Kejie Xu, Rui Huang, Weiguo Tan, Xianrong Wang, Lei Xie

Affiliations: Huawei Technologies Co., Ltd.

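AI Summary: Proposes FlashTTS, which combines a lagged multi-track architecture, parallel multi-token prediction, and an X-pred mean flow matching decoder to achieve low-latency streaming TTS, cutting first-packet latency to 325 ms while preserving zero-shot voice cloning and cross-lingual intelligibility.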

Comments Accepted to Interspeech 2026

Abstract

Recent progress in speech dialogue systems requires Text-to-Speech (TTS) models to be faster and more responsive. Modern speech dialogue systems impose two primary requirements on TTS models: low latency and support for streaming inputs and outputs. However, most existing single-codebook LLM-based TTS methods rely on multi-stage pipelines that lack native streaming capabilities. These systems typically suffer from high end-to-end latency due to slow autoregressive prediction and multi-step flow matching. To address these limitations, we propose FlashTTS, an open-source and low-latency streaming TTS framework. FlashTTS introduces a lagged multi-track architecture that natively processes streaming text and speech inputs, thereby eliminating the need for sentence-level buffering. To accelerate acoustic generation, we integrate parallel Multi-Token Prediction (MTP) with an X-pred mean flow matching decoder. This configuration achieves high-fidelity token-to-mel generation in exactly two function evaluations (2-NFE). By jointly optimizing input processing and decoding efficiency, FlashTTS offers a practical foundation for real-time speech dialogue systems. Experiments show that FlashTTS substantially reduces First-Packet Latency to 325ms compared to robust streaming baselines, all while preserving strong zero-shot voice cloning and cross-lingual intelligibility. Speech samples are available. The model code and checkpoints will be released as open source.

2606.08799 2026-06-10 stat.ML cs.LG Replacement

Generalization in Nonlinear Least Squares via Learned Feature Geometry

Ayub Kharel, Ilja Kuzborskij, Patrick Rebeschini, Yasin Abbasi-Yadkori

Affiliations: University of Oxford; Google DeepMind; Sapient Intelligence

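AI Summary: Analyzes the generalization error of ridge-regularized nonlinear least squares through algorithmic stability, defines a data-dependent effective dimension from the empirical Jacobian Gram matrix and a residual-curvature term, and shows that it depends on intrinsic dimension rather than parameter count.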

Comments Preprint, under review

Abstract

We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability, deriving error bounds for local minimizers in terms of a data-dependent effective dimension that reflects the geometry of the gradient model at the trained parameters, through the empirical Jacobian Gram matrix and a residual-curvature term. In the linear case, where the curvature term vanishes, this recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization as is typical in neural tangent kernel analyses. We further bound this effective dimension via covering complexity of the gradient features, leading to guarantees that depend on learned geometry rather than parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with intrinsic dimension, while for one-hidden-layer ReLU networks, the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles using the Brascamp-Lieb inequality under strongly log-concave noise.
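In the linear case the abstract says the bound reduces to the classical effective dimension of the Jacobian kernel. A small sketch of that classical quantity, $d_{\mathrm{eff}}(\lambda) = \operatorname{tr}\big(K (K + \lambda I)^{-1}\big)$ with $K = J J^\top$ the Gram matrix of per-example gradients, is below. The normalization (no $1/n$ factor), the helper names, and the omission of the paper's residual-curvature term are assumptions made for illustration.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small, well-conditioned matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def effective_dimension(jacobian, lam):
    """d_eff(lam) = tr(K (K + lam I)^{-1}), where K = J J^T and row i of J is the
    gradient of the model output on example i with respect to the parameters."""
    n = len(jacobian)
    gram = [[sum(a * b for a, b in zip(jacobian[i], jacobian[j])) for j in range(n)] for i in range(n)]
    shifted = [[gram[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    # K (K + lam I)^{-1} = I - lam (K + lam I)^{-1}, so only the inverse's trace is needed
    return n - lam * sum(inv[i][i] for i in range(n))
```

Equivalently, each eigenvalue $\mu_i$ of $K$ contributes $\mu_i/(\mu_i+\lambda)$, so directions with little gradient energy relative to the ridge parameter barely count. This is how a trained model whose Jacobians are compressed can have a small effective dimension despite a large parameter count.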

2606.08251 2026-06-10 cs.CY cs.AI Replacement

Contemporary AI lacks the imagination to diverge or negate in science

Honglin Bao, Siyang Wu, Xiao Liu, Sida Li, Shiyun Cao, James A. Evans

Affiliations: Data Science Institute, University of Chicago; Knowledge Lab, University of Chicago

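AI Summary: A large-scale evaluation by scientists finds that current AI lacks diversity in scientific hypothesis generation and does not spontaneously propose null hypotheses, and that automated evaluation agrees poorly with expert judgment, although a fine-tuned reward model narrows the gap.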

Abstract

Bold projections that artificial intelligence will accelerate scientific discovery have raced ahead of evidence from working scientists, and the field still lacks large-scale, scientist-in-the-loop tests of these claims. Here we mount the largest such evaluation to date and map what AI cannot yet do for science. We invited authors of 121,640 recent preprints across biology, medicine, chemistry, and the social sciences to judge ideas that large language models (LLMs) generated from the context and puzzles of their own papers. 6,749 scientists returned 25,139 sets of ratings on novelty, empirical feasibility, probability of being true, and favorability of adoption. Three patterns emerge. First, non-reasoning LLMs collapse into a narrow "hivemind" of similar ideas; reasoning models roam a wider hypothesis space, yet no model class spontaneously proposes null hypotheses -- a move humans make more freely. Second, scientists reward ideas that resemble their own and prize probability over novelty, though social scientists tolerate risk more readily than life scientists. Senior social scientists are the harshest critics, and their skepticism is well-earned: LLMs falter most in pluralistic fields like the social sciences that demand context-aware interpretation and evolving theories. Third, automated evaluators on which the community currently relies -- LLM-as-a-judge, artificial metrics, and even state-of-the-art (SOTA) models -- agree only weakly with expert judgment, and retrieval augmentation and scientist persona prompting yield only marginal gains. A Qwen3-14B reward model we post-trained on human ratings captures field taste nuances, beats SOTA models by up to 27%, and closes the gap to the inter-rater consistency of independent peer reviewers. For all the hype, today's scientific AI still represents a collaborator whose imagination, outputs and judgment benefit from human grounding.

2606.03419 2026-06-10 math.OC cs.AI cs.CG cs.NE math.CO Replacement

Optimizing Explicit Unit-Distance Lower-Bound Certificates

Michael T. M. Emmerich

Affiliations: Faculty of Information Technology, University of Jyväskylä

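AI Summary: Targeting lower bounds in Erdős's unit-distance problem, casts the choice of certificate parameters as a nonlinear integer program, presents an open-source verification pipeline, and improves the certificate to obtain u(n) > n^{1.0152} for arbitrarily large n.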

Comments 17 pages, 9 figures. Added a declaration on the use of AI. Added references to further contributions discussed on MathOverflow, including a reference to the independently developed verification pipeline and certificate package by Tseng (2026), published on Zenodo

Abstract

The 2026 disproof of Erdős's unit-distance conjecture and Sawin's quantitative refinement show that the maximum number $u(n)$ of unit distances among $n$ planar points can exceed $n^{1+\varepsilon}$ for a fixed positive $\varepsilon$. Sawin's explicit bound gives more than $n^{1.014}$ unit distances for arbitrarily large $n$ and exposes integer parameters whose choice is not fully optimized. This report treats Sawin's parameter selection as a nonlinear integer optimization problem and develops an open-source Python optimization and verification pipeline for certificates involving prime sets $T$ and $S_Q$, integer multiplicities $k(p)$, and a rationally encoded real parameter $R$. After reproducing Sawin's certificate with $\delta=0.014114\ldots$, the pipeline yields improved certificates with the same $T$. We develop a tailored integer evolution strategy that achieves a certificate with $\delta=0.015263\ldots$, supporting the cautious statement $u(n)>n^{1.0152}$ for arbitrarily large $n$. For extended ramified prime ranges, the Emmerich–Cordella certificate obtained with the same framework reports $u(n)>n^{1.031}$ for $\#T=67$, illustrating the importance of enlarging $T$. Very recent MathOverflow discussions, brought to the author's attention as of version 4, report further improvements, including certificates with $\delta>0.035$ and even $\delta>0.036$. Some of these improvements may rely not only on larger prime ranges but also on modified constraint systems and additional degrees of freedom that deviate from Sawin's original formulation. Beyond this application, the work illustrates how randomized optimization heuristics can improve, verify, and refine explicit certificates for combinatorial geometry through nonlinear integer optimization.

2606.00038 2026-06-10 cs.CY cs.AI Replacement

Beyond Tool Adoption: A Practical Five-Stage Developmental Continuum for AI Literacy in Higher Education

J. Paul Liu, Rachel Levy

Affiliations: Department of Marine, Earth, and Atmospheric Sciences; AI Hub for Science; Center of Geospatial Analytics; Data Science and AI Academy; Department of Mathematics; North Carolina State University

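AI Summary: Proposes a five-stage AI literacy continuum that helps educators diagnose where students stand and guide them from avoiding or blindly using AI toward critically evaluating and improving AI applications.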

Comments 26 pages, 5 tables, 2 figures, 1 Supplementary Table

Abstract

Artificial intelligence (AI) literacy is increasingly recognized as a foundational competency for all university graduates. Yet students' engagement with AI tools often clusters at two extremes: avoidance driven by fear, mistrust, ethical concern, or lack of access, and uncritical reliance that produces fluent output while masking misunderstanding. Existing AI literacy frameworks provide valuable competency definitions, but most offer limited guidance for diagnosing where learners begin and how they progress toward responsible, critical engagement. This paper proposes a five-stage AI Literacy Continuum that describes developmental orientations toward AI use in higher education: 0) Not Yet Engaged, 1) Uncritical Use, 2) Informed Use, 3) Critical Evaluation, and 4) Improvement. The continuum complements dimensional frameworks by providing educators with a practical diagnostic and instructional pathway aligned with international frameworks, including those of UNESCO and the OECD. We present a design-based implementation case from North Carolina State University, where credit-bearing courses and intensive hands-on workshops engaged more than 330 participants between Fall 2024 and Spring 2026. Because the implementation did not use a validated pre/post instrument or comparison group, we frame the findings as observational and practice-based: participants exhibited behaviors consistent with movement from non-engagement or uncritical use toward informed engagement, while sustained and discipline-embedded experiences produced stronger evidence of critical evaluation and improvement-oriented practice. We discuss curricular pathways, opportunity considerations, and assessment strategies, and argue that AI literacy should be understood not as tool adoption alone but as a developmental capacity to understand, evaluate, and responsibly apply AI systems in disciplinary and societal contexts.

2605.30370 2026-06-10 cs.NE cs.AI cs.CV cs.LG Replacement

Updating the standard neuron model in artificial neural networks

Raul Mohedano, Thomas Batard, Erik Velasco-Salido, Ramsses De Los Santos Mendoza, Jorge H. Martínez, Stacey Levine, Marcelo Bertalmío

Affiliations: Spanish National Research Council (CSIC); Center for Research in Mathematics (CIMAT); Universidad Autónoma de Madrid (UAM); National Science Foundation (NSF)

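AI Summary: Replaces the standard point neuron model with a more realistic model of cortical cells, improving the expressivity, robustness, and learning speed of artificial neural networks without adding parameters, while reducing memorization and the amount of training data required.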

Comments Acknowledgments included in the manuscript

Abstract

Since their inception in the 1950s, artificial neural networks (ANNs) have used the so-called point neuron model then prevalent in neuroscience, in the hope that this analogy would allow for a better emulation of brain function. Over the years the neuroscience literature has shown that the point neuron model is too simplistic to properly represent many fundamental neural processes; however, the standard neuron model in ANNs has remained the same. Here we replace it with a very recent model of cortical cells and demonstrate through theoretical analyses and experimental results how, simply by using a more realistic neural unit element without increasing the number of parameters, the resulting ANNs offer a number of important advantages, including increased expressivity, robustness, and learning speed, and reduced memorization and training-data requirements.

2605.30292 2026-06-10 stat.ML cs.LG math.ST stat.ME stat.TH Replacement

Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

Affiliations: Schools of Industrial and Systems Engineering and Electrical and Computer Engineering; Department of Statistics, University of Chicago

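AI Summary: Addressing data non-exchangeability and predictors with memory in time series, proposes the leave-a-window-out (LWO) method, a modification of the jackknife that achieves valid coverage and yields narrower intervals than split conformal prediction.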

Comments 40 pages, 8 figures

Abstract

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and is treated symmetrically during training. However, these assumptions are impractical in many settings, such as time series, where temporal dependence violates exchangeability and it is preferable to use predictors that leverage dependence by treating data asymmetrically. Recent work shows that split conformal prediction is robust to these issues, but sample splitting can reduce accuracy, motivating the study of methods that do not rely on data splitting in the time series setting. In this work, we show that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a modification tailored to such settings, which we term the leave-a-window-out (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs are based on quantifying the degree to which the data departs from cyclic exchangeability, which we introduce new coefficients to measure. Experiments on time series demonstrate that our method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.
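To make the window idea concrete, here is a minimal sketch under loud assumptions: the model is a no-intercept AR(1) least-squares fit, the interval is the plain jackknife form (full-data prediction plus or minus a residual quantile), and `window` is a hypothetical half-width parameter. The paper's exact construction and its stability conditions are not reproduced; setting `window=0` recovers the vanilla leave-one-out jackknife that the abstract shows can lose coverage.

```python
import math
import random


def simulate_ar1(beta, n, seed=0):
    """Generate y_t = beta * y_{t-1} + N(0, 1) noise, starting from y_0 = 0."""
    rng = random.Random(seed)
    ys = [0.0]
    for _ in range(n):
        ys.append(beta * ys[-1] + rng.gauss(0.0, 1.0))
    return ys


def fit_ar1(pairs):
    """Least-squares slope for y_t ~ beta * y_{t-1} (no intercept)."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den if den else 0.0


def lwo_interval(series, x_new, alpha=0.1, window=5):
    """Jackknife-style interval whose residuals come from leave-a-window-out refits."""
    pairs = list(zip(series[:-1], series[1:]))  # (y_{t-1}, y_t)
    n = len(pairs)
    residuals = []
    for i, (x_i, y_i) in enumerate(pairs):
        # Exclude every pair within `window` steps of i (window=0: leave-one-out)
        kept = [p for j, p in enumerate(pairs) if abs(j - i) > window]
        residuals.append(abs(y_i - fit_ar1(kept) * x_i))
    residuals.sort()
    k = min(n, math.ceil((1 - alpha) * (n + 1)))  # conformal-style order statistic
    q = residuals[k - 1]
    center = fit_ar1(pairs) * x_new
    return center - q, center + q
```

One intuition for the window: removing the temporal neighbors of each held-out point keeps correlated observations from leaking into the refit that scores it, so its residual behaves more like a genuine out-of-sample error.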

2605.24818 2026-06-10 stat.ME cs.CL cs.LG Replacement

Spiking the training data to correct for test set contamination

Johnny Tian-Zheng Wei, Jerry Li, Ameya Godbole, Robin Jia

Affiliations: University of Southern California

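AI Summary: Proposes correcting the score inflation caused by test set contamination by deliberately contaminating the training data with some test examples at known rates ("spiking"), then using the spiked examples to calibrate memorization predictors for statistical correction.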

Abstract

The literature on test set contamination largely focuses on detection, but the correction of contaminated test scores is underexplored. Our core proposal is to spike the training data by intentionally contaminating some test examples at known rates. The spiked examples can then be used to calibrate predictors of model memorization which enable principled statistical correction of inflated test scores. To evaluate different correction estimators, we first present a simulation framework based on the Hubble models. Hubble models come in minimal pairs, where the perturbed model was deliberately contaminated with several test sets, while the standard model was not, serving as the counterfactual and correction target. We consider estimators that use information from a memorization predictor, correctness predictor, or both. In simulation, we establish basic statistical intuitions and show that estimators leveraging memorization and correctness information are better than naive estimation which makes no correction at all. We then instantiate several memorization and correctness predictors, and find that simple predictors such as Platt-scaled membership inference metrics provide good signal for correction. Finally, we examine the practical considerations of spiking. Simple memorization predictors need no more than 10 examples for calibration and often transfer from one dataset to another. Taken together, spiking is a promising solution for test set contamination.
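The abstract reports that Platt-scaled membership inference metrics give a good correction signal. As a hedged sketch of just the calibration step, the code below fits Platt's sigmoid $p(\text{memorized}\mid s) = \sigma(a s + b)$ to membership scores, labeling spiked examples (known to be in the training data) 1 and unspiked examples 0. The scores are made-up numbers, the fit is plain logistic regression by gradient ascent without Platt's original target smoothing, and the paper's correction estimators are not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p(memorized | score) = sigmoid(a * score + b) by maximizing the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = y - sigmoid(a * s + b)  # gradient of the Bernoulli log-likelihood
            grad_a += err * s
            grad_b += err
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b
```

Because the spiking rates are known by construction, the spiked examples supply labeled calibration data for free; per the abstract, no more than about 10 such examples were needed for simple memorization predictors.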

2605.17189 2026-06-10 stat.ML cs.IT cs.LG math.IT math.ST stat.TH Replacement

Sample-efficient inductive matrix completion with noise and inexact side-information

Yuepeng Yang, Cong Ma

Affiliations: Department of Statistics and Data Sciences, Yale University; Department of Statistics, University of Chicago

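AI Summary: Studies sample-efficient inductive matrix completion with noise and inexact side-information using a nonconvex projected gradient descent algorithm; establishes a regularity condition at the effective problem size, which yields linear convergence and an estimation error that depends only on the effective problem size.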

Abstract

Inductive matrix completion (IMC) is a variant of low-rank matrix completion that incorporates row and column side-information. In principle, it can reduce the effective dimension of the recovery problem from the ambient matrix size to the dimension of the side-information features. Existing theory, however, does not fully realize this advantage in the noisy setting: sample-efficient guarantees only apply to noiseless recovery, while noisy guarantees require sample sizes comparable to ordinary matrix completion. This paper closes this gap for noisy IMC. We analyze a nonconvex projected gradient descent algorithm with spectral initialization and prove that, under exact side-information, it achieves linear convergence and stable recovery at a sample complexity governed by the effective side-information dimension rather than the ambient matrix dimension. The key technical ingredient is a local regularity condition for the IMC loss that holds at this reduced sample size, despite the mismatch between the observation pattern and the side-information subspaces. We further extend the analysis to inexact side-information, showing that the same reduced sample complexity is preserved and that the estimation error degrades optimally with the level of subspace misspecification. Motivated by this trade-off, we also propose a penalized interpolation between IMC and ordinary matrix completion that balances sample efficiency against robustness to imperfect side-information. Simulations and experiments on the MovieLens dataset support the theoretical findings and illustrate the practical benefits of exploiting side-information in low-sample regimes.

2605.09595 2026-06-10 cs.NE cs.RO Replacement

Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

Zhuangyu Han, Abhronil Sengupta

Affiliations: School of Electrical Engineering and Computer Science

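AI Summary: Proposes an equilibrium-propagation-based PPO framework that combines a CPG policy with a residual adjustment policy and uses local learning for efficient quadruped locomotion control on uneven terrain; performance matches backpropagation, with 4.3× better GPU memory efficiency than backpropagation through time.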

Abstract

Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning provides a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, making the learning rule more compatible with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for uneven-terrain quadruped locomotion. The controller combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, while replacing conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, we derive an EP-compatible PPO output-nudging signal and introduce a two-sided ratio clipping mechanism that stabilizes policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence in a two-stage uneven terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while improving GPU memory efficiency by $4.3\times$ compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.
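For context on the PPO side of this framework, below is a sketch of the standard PPO clipped surrogate for one sample with a diagonal Gaussian policy, the usual parameterization for continuous control. This is the textbook objective, not the paper's two-sided ratio clipping or its EP-compatible output-nudging signal, whose exact forms the abstract does not give; the function names and the default `eps=0.2` are assumptions.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log density of a diagonal Gaussian policy evaluated at `action`."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum makes the objective pessimistic: once the ratio leaves $[1-\epsilon, 1+\epsilon]$ in the direction the advantage favors, the clipped term caps the gain, so the gradient stops pushing the new policy further from the old one.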

2507.09788 2026-06-10 cs.MA cs.AI cs.CL cs.HC Updated version

TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit

Paulo Salem, Robert Sim, Christopher Olsen, Prerit Saxena, Rafael Barcelos, Yi Ding

Affiliations: Microsoft Corporation; Dipeak Technology

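AI summary: Addressing the shortcomings of existing LLM-based multi-agent systems in fine-grained persona simulation, the paper proposes the TinyTroupe toolkit, which supports detailed persona definitions and programmatic control for behavioral research and social simulation.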

Comments 9 pages

Abstract

Recent advances in Large Language Models (LLM) have led to a new class of autonomous agents, renewing and expanding interest in the area. LLM-powered Multiagent Systems (MAS) have thus emerged, both for assistive and simulation purposes, yet tools for realistic human behavior simulation -- with its distinctive challenges and opportunities -- remain underdeveloped. Existing MAS libraries and tools lack fine-grained persona specifications, population sampling facilities, experimentation support, and integrated validation, among other key capabilities, limiting their utility for behavioral studies, social simulation, and related applications. To address these deficiencies, in this work we introduce TinyTroupe, a simulation toolkit enabling detailed persona definitions (e.g., nationality, age, occupation, personality, beliefs, behaviors) and programmatic control via numerous LLM-driven mechanisms. This allows for the concise formulation of behavioral problems of practical interest, either at the individual or group level, and provides effective means for their solution. TinyTroupe's components are presented using representative working examples, such as brainstorming and market research sessions, thereby simultaneously clarifying their purpose and demonstrating their usefulness. Quantitative and qualitative evaluations of selected aspects are also provided, including preliminary experiments with real human behavior as control. Results highlight possibilities, limitations, and trade-offs. The approach, though realized as a specific Python implementation, is meant as a novel conceptual contribution, which can be partially or fully incorporated in other contexts. The library is available as open source at https://github.com/microsoft/tinytroupe.
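
The abstract lists persona attributes (nationality, age, occupation, personality, beliefs) and population sampling among the capabilities TinyTroupe provides. The sketch below illustrates only the general idea of a persona record and a seeded population sampler in plain Python. The `Persona` class, the `sample_population` function, and the attribute pools are hypothetical and are not TinyTroupe's API. Per the abstract, TinyTroupe relies on LLM-driven mechanisms, not the uniform random draws used here.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Persona:
    # Hypothetical schema mirroring attributes named in the abstract;
    # not TinyTroupe's actual classes or field names.
    name: str
    nationality: str
    age: int
    occupation: str
    personality: list = field(default_factory=list)
    beliefs: list = field(default_factory=list)


def sample_population(n: int, pools: dict, seed: int = 0) -> list:
    """Draw n personas, sampling each attribute uniformly from `pools`.

    A seeded RNG makes the population reproducible across experiment runs.
    """
    rng = random.Random(seed)
    low, high = pools["age_range"]
    return [
        Persona(
            name=f"agent_{i}",
            nationality=rng.choice(pools["nationality"]),
            age=rng.randint(low, high),
            occupation=rng.choice(pools["occupation"]),
            personality=rng.sample(pools["traits"], k=2),
            beliefs=[rng.choice(pools["beliefs"])],
        )
        for i in range(n)
    ]


pools = {  # illustrative attribute pools, not real survey data
    "nationality": ["Brazilian", "Japanese", "Kenyan"],
    "age_range": (18, 70),
    "occupation": ["nurse", "engineer", "teacher"],
    "traits": ["curious", "cautious", "outgoing", "skeptical"],
    "beliefs": ["values privacy", "early adopter of technology"],
}
for persona in sample_population(3, pools, seed=42):
    print(persona.name, persona.nationality, persona.age, persona.occupation)
```

Sampling each attribute independently ignores correlations such as those between age and occupation. A more realistic population sampler would need to model those joint distributions.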

2604.01114 2026-06-10 cs.HC cs.AI cs.CY cs.ET Updated version

Trust and Reliance on AI in Education: AI Literacy and Need for Cognition as Moderators

Griffin Pitts, Neha Rani, Weedguet Mildort

Affiliations: North Carolina State University; University of Florida

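AI summary: Using a programming problem-solving experiment, this study finds a non-linear relationship between students' trust in an AI assistant and their appropriate reliance on it. Higher trust is associated with weaker discrimination between correct and incorrect suggestions, and AI literacy and need for cognition significantly moderate this relationship.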

Comments Full paper accepted to the 27th International Conference on AI in Education (AIED 2026). AIED Proceedings to be released Summer 2026

Abstract

As generative AI systems are integrated into educational settings, students often encounter AI-generated output while working through learning tasks, either by requesting help or through integrated tools. Trust in AI can influence how students interpret and use that output, including whether they evaluate it critically or exhibit overreliance. We investigate how students' trust relates to their appropriate reliance on an AI assistant during programming problem-solving tasks, and whether this relationship differs by learner characteristics. With 432 undergraduate participants, students completed Python output-prediction problems while receiving recommendations and explanations from an AI chatbot, including accurate and intentionally misleading suggestions. We operationalize reliance behaviorally as the extent to which students' responses reflected appropriate use of the AI assistant's suggestions, accepting them when they were correct and rejecting them when they were incorrect. Pre- and post-task surveys assessed trust in the assistant, AI literacy, need for cognition, programming self-efficacy, and programming literacy. Results showed a non-linear relationship in which higher trust was associated with lower appropriate reliance, suggesting weaker discrimination between correct and incorrect recommendations. This relationship was significantly moderated by students' AI literacy and need for cognition. These findings highlight the need for future work on instructional and system supports that encourage more reflective evaluation of AI assistance during problem-solving.
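
The behavioral reliance measure in this abstract (accept a suggestion when it is correct, reject it when it is incorrect) can be illustrated with a small scoring function. This is a minimal sketch assuming each trial is recorded as two booleans. The `Trial` record and the function names are hypothetical, and the paper's actual scoring and statistical modeling may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    ai_correct: bool  # was the AI assistant's suggestion correct?
    accepted: bool    # did the student's final answer follow the suggestion?


def appropriate_reliance(trials: list) -> float:
    """Share of trials where acceptance matched suggestion correctness."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.accepted == t.ai_correct for t in trials) / len(trials)


def reliance_breakdown(trials: list) -> tuple:
    """(acceptance rate on correct suggestions, rejection rate on incorrect ones).

    Separating the two exposes discrimination between correct and incorrect
    advice: a student who accepts everything scores high on the first rate
    but low on the second.
    """
    correct = [t for t in trials if t.ai_correct]
    incorrect = [t for t in trials if not t.ai_correct]
    accept = sum(t.accepted for t in correct) / len(correct) if correct else float("nan")
    reject = sum(not t.accepted for t in incorrect) / len(incorrect) if incorrect else float("nan")
    return accept, reject


# A student who accepts every suggestion (illustrative data).
always_accept = [Trial(True, True)] * 3 + [Trial(False, True)] * 2
print(appropriate_reliance(always_accept))  # 0.6
print(reliance_breakdown(always_accept))    # (1.0, 0.0)
```

The single score of 0.6 looks moderate, but the breakdown shows no rejection of incorrect advice at all. That is the kind of weak discrimination the abstract associates with high trust.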

2604.13776 2026-06-10 cs.CY cs.CL cs.CR cs.CV Updated version

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li, Erman Ayday

Affiliations: Case Western Reserve University

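AI summary: The paper shows that AI content watermarking exhibits systematic biases across languages, cultures, and demographic groups. It proposes three evaluation dimensions (cross-lingual detection parity, culturally diverse content coverage, and demographic disaggregation of detection metrics) and argues that fairness audits must be performed before watermarks are deployed.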

Comments 7 pages. Accepted at the Multimodal Alignment for a Pluralistic Society (MAPS) Workshop, CVPR 2026

Abstract

Watermarking is becoming the default mechanism for AI content authentication, with governance policies and frameworks referencing it as infrastructure for content provenance. Yet across text, image, and audio modalities, watermark signal strength, detectability, and robustness depend on statistical properties of the content itself, properties that vary systematically across languages, cultural visual traditions, and demographic groups. We examine how this content dependence creates modality-specific pathways to bias. Reviewing the major watermarking benchmarks across modalities, we find that, with one exception, none report performance across languages, cultural content types, or population groups. To address this, we propose three concrete evaluation dimensions for pluralistic watermark benchmarking: cross-lingual detection parity, culturally diverse content coverage, and demographic disaggregation of detection metrics. We argue that watermarking is part of the pluralistic alignment pipeline and should be held to the same evaluation standards. We connect this to governance frameworks currently mandating watermarking deployment without requiring fairness evaluation. Our position is that evaluation must precede deployment, and that the same bias auditing requirements applied to AI models should extend to the verification layer.
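
Two of the proposed dimensions, cross-lingual detection parity and demographic disaggregation of detection metrics, amount to computing a detector's error rates per group rather than in aggregate. This is a minimal sketch assuming each evaluation record is a `(group, is_watermarked, flagged)` triple. The function names, the max-minus-min gap statistic, and the toy records are illustrative choices, not the paper's protocol.

```python
import math
from collections import defaultdict


def rates_by_group(records):
    """Per-group true-positive rate (watermarked content flagged) and
    false-positive rate (unwatermarked content wrongly flagged)."""
    counts = defaultdict(lambda: {"tp": 0, "pos": 0, "fp": 0, "neg": 0})
    for group, is_watermarked, flagged in records:
        c = counts[group]
        if is_watermarked:
            c["pos"] += 1
            c["tp"] += int(flagged)
        else:
            c["neg"] += 1
            c["fp"] += int(flagged)
    return {
        group: {
            "tpr": c["tp"] / c["pos"] if c["pos"] else math.nan,
            "fpr": c["fp"] / c["neg"] if c["neg"] else math.nan,
        }
        for group, c in counts.items()
    }


def parity_gap(rates, metric):
    """Largest between-group difference in `metric`, skipping undefined groups."""
    values = [r[metric] for r in rates.values() if not math.isnan(r[metric])]
    return max(values) - min(values) if values else math.nan


# Toy records for two hypothetical language groups.
records = [
    ("lang_a", True, True), ("lang_a", True, True),
    ("lang_a", False, False), ("lang_a", False, False),
    ("lang_b", True, True), ("lang_b", True, False),
    ("lang_b", False, True), ("lang_b", False, False),
]
rates = rates_by_group(records)
print(rates)
print(parity_gap(rates, "tpr"), parity_gap(rates, "fpr"))  # 0.5 0.5
```

Pooled over both groups, the toy detector's TPR is 0.75. That single number hides the finding that content in `lang_b` is both missed more often and falsely flagged more often, which is the kind of gap disaggregated reporting is meant to surface.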