arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.06535 2026-06-08 cs.SE cs.LG New submission

Architecturally Significant MLOps Guidelines for ML Model Integration and Deployment: a Gray Literature Review

Faezeh Amou Najafabad, Markus Haug, Keerthiga Rajenthiram, Justus Bogner, Ilias Gerostathopoulos

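Affiliations: Vrije Universiteit Amsterdam; Technical University of Munich

AI summary: Through a gray literature review, the paper consolidates 25 architecturally significant MLOps guidelines in five categories to guide the integration and deployment of ML models in MLOps systems.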

Comments
ECSA2026

Abstract

Context. Despite the growing adoption of Machine Learning Operations (MLOps), teams often approach MLOps projects in an ad hoc manner due to the lack of consolidated architectural guidance. The community would benefit from a reference that synthesizes knowledge to inform the architectural design of MLOps systems, especially regarding the integration and deployment of ML models. Objective. In response, our goal is to provide a comprehensive overview of architecturally significant guidelines for the integration and deployment of ML models in MLOps systems. Method. We conduct a gray literature review of 103 web sources to analyze state-of-practice knowledge on MLOps model integration and deployment. We then apply thematic analysis to synthesize these practices into recommended guidelines. Results. We contribute a collection of 25 architecturally significant MLOps guidelines for model integration and deployment, organized into five categories, and describe their impact on the overall system architecture. Conclusion. Our results serve as an overview of state-of-practice MLOps guidelines to support researchers and practitioners with the integration and deployment of ML models in their MLOps systems.

2606.06521 2026-06-08 cs.AR cs.AI cs.DC cs.LG cs.PF New submission

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

Reed Lau

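Affiliations: Tencent

AI summary: Addresses the precision problem that arises in FP8 attention when the softmax probability matrix P is cast to FP8 before the multiplication. It analyzes how the KV block iteration order and the static scaling factor affect precision, finds that forward iteration makes non-sink values underflow to zero while reverse iteration combined with S=256 eliminates underflow, and shows that S=256 is optimal in bit-exactness, quantization step, and range coverage among power-of-two scales.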

Comments
8 pages, 3 figures, 3 tables, 1 algorithm. Technical note on FP8 E4M3 P-cast precision

Abstract

FP8 (E4M3) acceleration for attention computation offers significant throughput gains, but the 3-bit mantissa introduces precision challenges when the softmax probability matrix $P$ is cast to FP8 before the $P \cdot V$ matrix multiplication. We analyze two implementation choices that affect output precision under the Attention Sink phenomenon: (1) the KV block iteration order, and (2) the static scaling factor applied to $P$ before casting. We show that forward KV iteration causes P-collapse -- to leading order a fraction $\Phi(\Delta + \delta_k - 6.93 - \ln S)$ of non-sink $P$ values underflow to zero, where the small shift $\delta_k \approx 1$ (for $k_{\text{sink}} = 4$) is the expected within-sink-block score maximum -- and that reverse iteration removes it, with a zero-underflow guarantee when reverse is combined with $S = 256$. We further give a constructive characterization of $S = 256 = 2^8$ as the static scale that simultaneously satisfies (i) bit-exact IEEE 754 scaling, (ii) the lower envelope of a sawtooth function $dp(S)$ over the E4M3 number line ($dp = 2^{-4}$, the minimum worst-case quantization step), and (iii) the maximum normal-range coverage among bit-exact ($2^k$) scales (a non-bit-exact scale such as $448$ attains slightly higher coverage; Sec. 5). Both optimizations are already deployed in FlashAttention-3/4 on engineering grounds; our contribution is a quantitative account of why these choices are good and a closed-form threshold $\Delta_c = 6.93 + \ln S - \delta_k$ for predicting kernel-level precision loss. Kernel-faithful experiments ($Q, K, V$ in FP32 to isolate the P-cast effect) show $3$-$10\times$ MSE improvement at moderate sink strengths, and paired tests confirm both fixes saturate to the same precision floor when combined -- which motivated updating the hpc-ops kernel from $S = 1$ to $S = 256$.
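
The threshold quoted in this abstract can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the abstract does not state: that the constant 6.93 is $10 \ln 2$, meaning a scaled probability rounds to zero once it drops below $2^{-10}$ (half of the smallest E4M3 subnormal, $2^{-9}$), and that $\Phi$ is the standard normal CDF. All function names are illustrative.

```python
import math

# Assumption: E4M3 has a smallest subnormal of 2^-9, and round-to-nearest
# sends anything below half of that (2^-10) to zero. ln(2^10) ~= 6.93.
E4M3_MIN_SUBNORMAL = 2.0 ** -9
UNDERFLOW_CUTOFF = E4M3_MIN_SUBNORMAL / 2.0


def critical_gap(scale, delta_k=1.0):
    """Closed-form threshold Delta_c = 6.93 + ln S - delta_k from the abstract."""
    return -math.log(UNDERFLOW_CUTOFF) + math.log(scale) - delta_k


def underflows(score_gap, scale):
    """True if exp(-score_gap) * scale falls below the assumed FP8 cutoff.

    score_gap is the running row maximum minus the score of this entry.
    """
    return math.exp(-score_gap) * scale < UNDERFLOW_CUTOFF


def collapse_fraction(delta, scale, delta_k=1.0):
    """Leading-order fraction Phi(Delta - Delta_c), Phi assumed standard normal."""
    x = delta - critical_gap(scale, delta_k)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Under these assumptions, a score gap of 8 underflows with $S = 1$ but survives with $S = 256$, which shifts the cutoff by $\ln 256 \approx 5.55$.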

2606.06515 2026-06-08 cs.AR cs.AI cs.DC cs.ET cs.LG New submission

DxPTA: An Architecture Design Space Exploration with Optical Dataflow-guided Strategy for HW/SW Co-Design of Photonic Transformer Accelerators

Rachmad Vidya Wicaksana Putra, Solomon Micheal Serunjogi, Mahmoud Rasras, Muhammad Shafique

Affiliations: eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi; Photonic Research Lab (PRL), Division of Engineering, New York University (NYU) Abu Dhabi; New York University (NYU) Abu Dhabi

AI summary: Proposes DxPTA, which analyzes architecture parameters through the optical dataflow and designs a constraint-aware search algorithm, enabling efficient hardware/software co-design of photonic transformer accelerators and a much faster search while meeting area, power, and other constraints.

Comments
8 pages, 12 figures

Abstract

Transformer-based networks have emerged as prominent AI models with state-of-the-art performance, potentially paving the way toward artificial general intelligence (AGI). However, their large sizes still hinder efficient implementation, highlighting the need for alternative solutions that enable energy-efficient acceleration. Recently, state-of-the-art works have proposed photonic transformer accelerators (PTAs) with significant speedup and energy-efficiency improvements over conventional electronic accelerators. However, these PTA architectures are developed without considering application constraints (e.g., area, power, energy, and latency). Moreover, their manual design approach requires substantial design time to determine a suitable architecture for the targeted application, which makes it unscalable. To address these limitations, we propose DxPTA, a novel design space exploration methodology for enabling efficient hardware/software co-design of a PTA architecture that meets all constraints. It is achieved by (1) identifying the PTA architecture parameters based on the coherent optical dataflow; (2) analyzing the impact/significance of the parameters; and (3) leveraging this analysis to devise a constraint-aware architecture search algorithm. Experimental results show that DxPTA can find appropriate PTA architectures for different transformer-based models (i.e., DeiT-T/S/B and BERT-B/L). It achieves up to 26mm^2 area, 4.8W power, 39mJ energy, and 6ms latency under constraints of 50mm^2 area, 5W power, 50mJ energy, and 10ms latency, with 15.2x faster search time than the exhaustive approach. These results demonstrate the potential of the DxPTA methodology for enabling efficient PTA designs for diverse AGI-based applications.

2606.06510 2026-06-08 cs.AR cs.AI cs.DC cs.PF New submission

FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail

Satoshi Matsuoka

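Affiliations: RIKEN Center for Computational Science (R-CCS)

AI summary: Using the Chinese Remainder Theorem-based Ozaki Scheme II, the paper exploits FP8 tensor throughput on AI-optimized GPUs to reach memory-roof performance at full FP64 accuracy, challenging the conventional view that native FP64 hardware is the foundation of scientific computing.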

Comments
There is a companion Part (2) paper focusing on Ozaki-style FFT

Abstract

Conventional HPC dogma holds that native hardware FP64 silicon is the irreducible foundation of scientific computing -- the "holy grail" of double-precision simulation. This paper argues the dogma is wrong: on AI-optimised GPUs of the B300 generation and beyond, abundant FP8 tensor throughput combined with the Chinese Remainder Theorem-based Ozaki Scheme II recovers memory-roof execution at full FP64 accuracy across the canonical HPC kernel spectrum. NVIDIA's Blackwell Ultra (B300) collapses native FP64 to ~1.3 TFLOPS -- a 31x regression from the B200 -- rendering even memory-bound kernels (SpMV, GEMV, stencils) compute-bound. We make four contributions. First, a unified analytic model, the Tensor-Memory Equilibrium (TME) model, augmenting the Roofline with a compute multiplier alpha, a bandwidth multiplier beta, and a reconstruction latency gamma. Second, we identify register-level fusion as the mechanism driving beta -> 1, making emulation essentially free behind the memory wall. Third, we project that Ozaki II vaults emulated FP64 from the ~1 TFLOPS native floor to ~500 TFLOPS (B300) and ~400 TFLOPS (Rubin R200), exceeding even B200's native FP64 ceiling by over an order of magnitude in the compute-bound regime while matching the memory roof in the bandwidth-bound regime. Fourth, against an H100 baseline, Ozaki II matches or exceeds H100 on every workload studied, versus the up-to-50x regression that B300 native FP64 imposes. Combined with a companion FFT analysis (Kulisch fixed-point reconstruction on the surviving INT32 pipe) and FP32+Kahan reductions reported in the companion Part(2) paper, every surveyed kernel class on B300 reaches the memory roof at full FP64. The evidence supports the title's claim: FP8, with Ozaki II and Kulisch escape routes, is all one needs for production HPC; native FP64 silicon is no longer the holy grail it has been taken to be.
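
The claim that a low FP64 peak turns memory-bound kernels compute-bound can be illustrated with the plain Roofline model. This is a sketch of the classical Roofline only, not of the paper's TME model, whose formula the abstract does not give. The memory bandwidth and the SpMV arithmetic intensity below are assumed round numbers chosen for illustration; only the ~1.3 TFLOPS and ~500 TFLOPS figures come from the abstract.

```python
def roofline(intensity, peak_flops, bandwidth):
    """Attainable FLOP/s for a kernel with the given FLOP/byte intensity."""
    return min(peak_flops, intensity * bandwidth)


def ridge_point(peak_flops, bandwidth):
    """Intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_flops / bandwidth


NATIVE_FP64 = 1.3e12      # ~1.3 TFLOPS native FP64 on B300 (from the abstract)
EMULATED_FP64 = 5.0e14    # ~500 TFLOPS projected with Ozaki II (from the abstract)
HBM_BANDWIDTH = 8.0e12    # ASSUMED memory bandwidth in bytes/s
SPMV_INTENSITY = 0.25     # ASSUMED arithmetic intensity of SpMV in FLOP/byte

native = roofline(SPMV_INTENSITY, NATIVE_FP64, HBM_BANDWIDTH)
emulated = roofline(SPMV_INTENSITY, EMULATED_FP64, HBM_BANDWIDTH)
print(f"ridge (native):   {ridge_point(NATIVE_FP64, HBM_BANDWIDTH):.4f} FLOP/byte")
print(f"ridge (emulated): {ridge_point(EMULATED_FP64, HBM_BANDWIDTH):.1f} FLOP/byte")
print(f"SpMV native:   {native:.3e} FLOP/s")
print(f"SpMV emulated: {emulated:.3e} FLOP/s")
```

With these assumed numbers the native ridge point sits below the SpMV intensity, so the kernel is capped by the FP64 peak, while the emulated peak moves the ridge far to the right and the kernel is again limited by bandwidth.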

2606.06505 2026-06-08 cs.CG cs.AI cs.CV math.DG New submission

A Geometric Gaussian Mixture Representation of Plane Curves

Ali Darijani, Benedikt Stratmann, Jürgen Beyerer

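Affiliations: Fraunhofer IOSB; KIT, IES

AI summary: Proposes a user-defined probabilistic polygonal representation of plane curves: each line segment is given a normal-direction uncertainty parameter and a Gaussian mixture model is constructed, preserving local geometry and normal uncertainty for many curve types.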

Abstract

We introduce a user-defined probabilistic polygonal representation for plane curves. Given a curve, we select vertices on the curve and connect consecutive vertices by line segments to obtain a polygonal approximation. Each segment is equipped with a user-defined uncertainty parameter in the normal direction. This yields a collection of thin probabilistic geometric primitives that retain the geometry of the underlying curve while extending it beyond the idealized deterministic one-dimensional formulation. For each segment, we define a random variable that is uniformly distributed in the tangent direction of the segment and Gaussian distributed in the normal direction of the segment. By matching the first and second central moments, this construction induces a Gaussian component whose mean lies at the segment midpoint and whose covariance encodes both tangential and normal uncertainty. Combining the segment-wise components with appropriate weights yields a Gaussian Mixture Model (GMM) representation of the user-defined probabilistic polygonal representation of the plane curve. The proposed framework provides an analytically tractable probabilistic model that preserves local geometry and uncertainty in the normal direction. It applies to smooth, closed, open, non-regular, and self-intersecting plane curves, allows adaptive discretization and varying uncertainty in the normal direction, and as a result supports uncertainty-aware geometric modeling. Experiments on a collection of canonical plane curves show that the resulting GMM captures local tangent, local normal, and local arc length, so that the global shape of the underlying curves is faithfully captured as well. The representation is particularly relevant for applications in uncertainty-aware CAD and digital twins, probabilistic obstacle modeling in robotics, and probabilistic trajectory planning.
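
The moment-matching step described in this abstract can be sketched directly. A uniform distribution along a segment of length $L$ has variance $L^2/12$, so the matched covariance is $(L^2/12)\,tt^\top + \sigma^2\,nn^\top$ for unit tangent $t$ and unit normal $n$. The weighting by segment length is an assumption made here for illustration; the abstract says only "appropriate weights". Function names are hypothetical.

```python
import math


def segment_gaussian(p, q, sigma):
    """Moment-matched Gaussian for segment p-q with normal std-dev sigma.

    Returns (mean, covariance, length); covariance is a 2x2 nested tuple.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    tx, ty = dx / length, dy / length   # unit tangent
    nx, ny = -ty, tx                    # unit normal
    var_t = length ** 2 / 12.0          # variance of a uniform law on the segment
    var_n = sigma ** 2                  # user-defined normal variance
    mean = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    off = var_t * tx * ty + var_n * nx * ny
    cov = (
        (var_t * tx * tx + var_n * nx * nx, off),
        (off, var_t * ty * ty + var_n * ny * ny),
    )
    return mean, cov, length


def curve_gmm(vertices, sigma):
    """GMM as a list of (weight, mean, cov); weights ASSUMED proportional to length."""
    comps = [segment_gaussian(a, b, sigma) for a, b in zip(vertices, vertices[1:])]
    total = sum(length for _, _, length in comps)
    return [(length / total, mean, cov) for mean, cov, length in comps]
```

For a closed curve, repeating the first vertex at the end of `vertices` closes the polygon; a per-segment `sigma` would be a straightforward extension.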

2606.06498 2026-06-08 cs.GR cs.CV New submission

Semantic-Structural Alignment for Generative Pictorial Charts

Zhida Sun, Yulin Zhang, Zheng Gu, Min Lu, Bongshin Lee, Daniel Cohen-Or, Hui Huang

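Affiliations: Visual Computing Research Center (VCC), College of Computer Science and Software Engineering (CSSE), Shenzhen University, China

AI summary: Proposes a generative framework that uses structural alignment and semantic alignment mechanisms within a Multi-Modal Diffusion Transformer to automatically synthesize pictorial charts that are both artistically expressive and structurally faithful.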

Comments
11 pages, 17 figures, Accepted to ACM TOG

Abstract

Traditional statistical graphics are precise but often lack the visual appeal, memorability, and engagement of pictorial charts. We present a generative framework for the automated synthesis of pictorial charts that bridges the gap between semantic expression and structural faithfulness. Rather than treating charts merely as images to be stylized, we frame the problem as a dual-conditioned generation task guided by two parallel external control signals: a text prompt capturing the semantic context of the editing intent, and a context image providing the abstract statistical chart's global structure. To reinforce these controls within a Multi-Modal Diffusion Transformer, we introduce two complementary feature-level mechanisms: structural alignment to anchor spatial layouts to the input chart, and semantic alignment to transfer expressive textures from reference images. Generalizing across major visual channels (i.e., length, area, angle, and position) and diverse semantic domains, our method produces pictorial charts that are both artistically compelling and structurally consistent. Extensive quantitative evaluations and perceptual user studies demonstrate that our framework outperforms traditional controllable generation and image editing baselines, providing a foundation for high-fidelity, data-driven generative modeling in expressive visual storytelling. Project page: https://ssalign.github.io/.

2606.07403 2026-06-08 math.OC cs.LG New submission

The Proxy Benders Decomposition

Changkun Guan, El Mehdi Er Raqabi, Mathieu Tanneau, Pascal Van Hentenryck

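Affiliations: H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, USA; Department of Operations and Decision Systems, Université Laval, Quebec, Canada

AI summary: To address the slow convergence caused by repeatedly solving subproblems in Benders decomposition, proposes the proxy Benders decomposition (Proxy-BD), which uses a self-supervised predict-project-and-complete mechanism to generate feasible dual solutions, greatly reducing computational cost while preserving theoretical validity.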

Abstract

Benders decomposition is a fundamental framework for solving large-scale mixed-integer optimization problems with complicating variables that, when fixed, yield significantly easier subproblems. However, classical Benders decomposition repeatedly solves highly similar subproblems and often exhibits zigzagging behavior across iterations, leading to slow convergence in large-scale settings. Motivated by the repetitive structure and parametric nature of Benders subproblems, this paper introduces the proxy Benders decomposition (Proxy-BD), a new decomposition framework in which subproblem optimization is replaced by certified optimization proxies rather than repeated exact solves. The proposed proxy follows a self-supervised predict-project-and-complete mechanism that produces dual-feasible solutions for generating provably valid Benders cuts. The framework preserves the theoretical validity of the decomposition independently of prediction quality through a projection-and-completion certification layer. A formal characterization of proxy-induced cuts is established, and the framework naturally extends to modern decomposition schemes, including branch-and-Benders-cut algorithms. Computational experiments on large-scale facility location and network design problems demonstrate that Proxy-BD substantially reduces the computational effort of subproblems while maintaining near-optimal solution quality. On large-scale uncapacitated facility location instances up to 2000x2000, Proxy-BD achieves median optimality gaps below 0.5%, yields up to 161x median speedups, and reduces the number of generated cuts by more than 240x on the largest instances. The computational gains consistently increase with recourse complexity, indicating that proxy-based inference scales substantially more favorably than repeated exact subproblem optimization in large-scale decomposition settings.
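
Why a dual-feasible point is enough for a valid cut follows from weak duality, and a small sketch makes this concrete. The code below is not the paper's predict-project-and-complete layer. It assumes an LP subproblem $\min\{c^\top x : Ax \ge b - By,\ x \ge 0\}$ with $c \ge 0$, and it repairs a hypothetical predicted dual vector by clamping and uniform scaling, which is a deliberately simple stand-in for a projection step.

```python
def repair_dual(pi_pred, A, c):
    """Turn a raw prediction into a dual-feasible pi (pi >= 0, A^T pi <= c).

    ASSUMES c >= 0, so shrinking a nonnegative pi toward zero restores feasibility.
    """
    pi = [max(0.0, v) for v in pi_pred]              # enforce pi >= 0
    at_pi = [sum(A[i][j] * pi[i] for i in range(len(pi))) for j in range(len(c))]
    scale = 1.0
    for j in range(len(c)):
        if at_pi[j] > c[j]:                          # violated dual constraint
            scale = min(scale, c[j] / at_pi[j])
    return [scale * v for v in pi]


def benders_cut(pi, b, B):
    """Return (const, coeffs) of the cut theta >= const - coeffs . y.

    By weak duality, pi^T (b - B y) is a lower bound on the subproblem value
    for every y, whatever the quality of the prediction behind pi.
    """
    const = sum(p * bi for p, bi in zip(pi, b))
    coeffs = [sum(pi[i] * B[i][k] for i in range(len(pi))) for k in range(len(B[0]))]
    return const, coeffs
```

A poor prediction yields a weaker cut, never an invalid one, which is the property the abstract attributes to its certification layer.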

2606.07325 2026-06-08 math.ST cs.AI cs.IT math.IT stat.TH New submission

A Temporal Spatial Minimax Rate for Smoothly-Varying Distributions in Wasserstein Space

Munsik Kim

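Affiliations: University of California, Berkeley

AI summary: Studies the minimax rate for estimating a future value of a curve in Wasserstein space from finitely many noisy snapshots of its past, deriving a temporal-spatial lower bound with matching upper bounds in some regimes.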

Abstract

We study the minimax rate of estimating a future value $\mu_{t_n+h}$ of a curve $t\mapsto\mu_t$ in the $2$-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ from finitely many noisy snapshots of its past, under an adiabatic bound $\|\nabla_t^k v\|\le\varepsilon$ on the $k$-th covariant derivative of the velocity field. Our central result is a unified temporal-spatial minimax lower bound: over regular, locally transport-rich subclasses, every estimator incurs $W_2$-risk with $M$-exponent $\gamma_d(k+1)/(k+1+\gamma_d)$, $\gamma_d=\min(1/d,1/2)$ ($M$ the total sample size). It follows from a temporal-to-spatial reduction: the smoothness budget defines a reachable $W_2$-ball into which a transport packing is embedded along the time axis, and the information of the entire snapshot experiment is controlled by a Fano argument -- the spatial packing is classical, but its smoothness-admissible temporal embedding and the full-window analysis are new. The bound interpolates a dimension-free extrapolation floor of order $\varepsilon h^{k+1}$ -- the irreducible cost of an unobserved future, present even with the exact past -- and the spatial estimation curse $M^{-\gamma_d}$, recovering the static distribution-estimation rate as $k\to\infty$. We state the lower bound in a design-dependent form -- with a design-weighted effective sample size -- valid for arbitrary observation times, and obtain the closed-form exponent in the dense (equispaced) regime. The matching upper bound is established at $k=0$ (rate $M^{-1/(d+1)}$, $d\ge3$) and, in a translation submodel, for all $k$; for $k\ge1$ a covariant estimator attains the rate conditionally on two estimates (a comparison-geometry bias bound and an optimal-transport map-estimation rate), leaving the unconditional general-$k$ upper bound as an open problem. Numerical experiments on synthetic curved and flat families corroborate the predicted exponents.
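
The exponent formula in this abstract is easy to sanity-check against the two special cases it mentions. The sketch below evaluates $\gamma_d(k+1)/(k+1+\gamma_d)$ exactly with rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction


def gamma_d(d):
    """Spatial exponent gamma_d = min(1/d, 1/2) from the abstract."""
    return min(Fraction(1, d), Fraction(1, 2))


def risk_exponent(k, d):
    """M-exponent gamma_d (k+1) / (k+1+gamma_d) of the W_2-risk lower bound."""
    g = gamma_d(d)
    return g * (k + 1) / (k + 1 + g)


# k = 0, d >= 3 reproduces the stated rate M^{-1/(d+1)}.
for d in (3, 5, 10):
    print(d, risk_exponent(0, d), Fraction(1, d + 1))

# Large k approaches the static rate gamma_d.
print(float(risk_exponent(10**6, 3)), float(gamma_d(3)))
```

At $k = 0$ and $d \ge 3$ the formula reduces to $(1/d)/(1 + 1/d) = 1/(d+1)$, and as $k$ grows the ratio $(k+1)/(k+1+\gamma_d)$ tends to 1, leaving $\gamma_d$.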

2606.07153 2026-06-08 math.NA cs.LG cs.NA math.OC New submission

No-Harm Physics-Informed Inverse Learning with Residual-Calibrated Uncertainty

Ronald Katende

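Affiliations: Department of Mathematics, Kabale University

AI summary: Proposes a no-harm certification-and-selection framework in which a residual-calibrated radius ensures that physics-informed inverse learning is no worse than the baseline; data, physics, and boundary residuals are combined to give an a posteriori error bound and a deterministic uncertainty radius.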

Comments
25 pages, 10 Tables, 12 Figures

Abstract

Physics-informed learning is increasingly used for partial differential equation (PDE)-governed inverse problems, but its reliability remains difficult to certify. This paper develops a no-harm certification-and-selection framework for physics-informed inverse learning. A learned reconstruction is accepted only when its residual-calibrated radius is no worse than the baseline radius, namely when $R_{\mathrm{learn}}\le R_{\mathrm{base}}+\varepsilon_{\mathrm{safe}}$; otherwise, the method returns the baseline. The certificate combines data, physics, boundary or initial-condition, and optimization residuals. Under a conditional stability estimate, these residuals yield an a posteriori reconstruction-error bound and a deterministic uncertainty radius. A high-probability certificate is also derived for physics residuals estimated from independent random collocation points. Numerical tests on Poisson source recovery, inverse heat reconstruction, limited-angle tomography, elliptic coefficient identification, and stochastic residual validation show that the selector accepts certified improvements, rejects shifted, hallucinated, or unfinished candidates, and becomes conservative in strongly ill-posed regimes. The framework is therefore a certification-and-selection layer, not another reconstruction architecture.
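
The acceptance rule stated in this abstract is a single comparison, sketched below. How the radii $R_{\mathrm{learn}}$ and $R_{\mathrm{base}}$ are computed from the residuals is not given in the abstract, so they are taken here as precomputed inputs; the function and argument names are hypothetical.

```python
def select_reconstruction(learned, baseline, r_learn, r_base, eps_safe=0.0):
    """Return the learned reconstruction only if R_learn <= R_base + eps_safe.

    r_learn and r_base are residual-calibrated radii, assumed to be computed
    elsewhere; otherwise the baseline is returned unchanged (the no-harm rule).
    """
    if r_learn <= r_base + eps_safe:
        return learned
    return baseline


# A candidate with a larger certified radius than the baseline is rejected.
print(select_reconstruction("learned", "baseline", r_learn=0.8, r_base=0.5))
# A candidate within the safety margin is accepted.
print(select_reconstruction("learned", "baseline", r_learn=0.52, r_base=0.5, eps_safe=0.05))
```

Because the fallback is always the baseline itself, the selected output never has a certified radius worse than $R_{\mathrm{base}} + \varepsilon_{\mathrm{safe}}$.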

2606.06782 2026-06-08 cs.IT cs.LG math.IT math.ST stat.ML stat.TH New submission

The Sharp Phase Transition of Tyler's M-Estimator for Robust Subspace Recovery

Gilad Lerman, Teng Zhang

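Affiliations: School of Mathematics, University of Minnesota; Department of Mathematics, University of Central Florida

AI summary: Studies the behavior of Tyler's M-estimator at the critical signal-to-noise ratio DS-SNR = 1, proves that it converges to the true subspace, and thereby establishes a sharp phase transition.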

Abstract

Robust Subspace Recovery (RSR) aims to identify an underlying d-dimensional subspace from a dataset heavily corrupted by outliers. Complexity-theoretic results establish a threshold for the problem's computational hardness based on the dimension-scaled signal-to-noise ratio (DS-SNR): the problem is SSE-hard when the DS-SNR is strictly less than 1, and solvable via practical algorithms when it is greater than 1 under general position assumptions. However, the exact behavior of practical algorithms at the critical boundary DS-SNR = 1 has remained unknown. This work resolves the behavior of Tyler's M-estimator (TME) at this critical boundary, consequently establishing a sharp phase transition. Specifically, we prove that TME converges exactly to the true subspace for DS-SNR ≥ 1 under a new stability condition, which is less restrictive than the general position assumptions used in prior literature. Our analysis utilizes a decomposition of the TME iterates within a majorization-minimization framework.
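
For readers unfamiliar with the estimator, the sketch below shows the commonly stated fixed-point iteration for Tyler's M-estimator, $\Sigma \leftarrow \sum_i x_i x_i^\top / (x_i^\top \Sigma^{-1} x_i)$ followed by trace normalization, written out for 2-D data in pure Python. The abstract does not spell out the iteration, so this form, the trace normalization to the dimension, and the toy data are assumptions made for illustration.

```python
def tyler_step(points, sigma):
    """One fixed-point update for 2-D points; sigma is ((a, b), (b, c))."""
    (a, b), (_, c) = sigma
    det = a * c - b * b
    inv = ((c / det, -b / det), (-b / det, a / det))   # 2x2 inverse
    s00 = s01 = s11 = 0.0
    for x, y in points:
        # w = x^T Sigma^{-1} x, the weight that down-weights distant points
        w = x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)
        s00 += x * x / w
        s01 += x * y / w
        s11 += y * y / w
    trace = s00 + s11
    # Normalize so that trace(Sigma) equals the dimension (2).
    return ((2 * s00 / trace, 2 * s01 / trace), (2 * s01 / trace, 2 * s11 / trace))


def tyler(points, iterations=20):
    sigma = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(iterations):
        sigma = tyler_step(points, sigma)
    return sigma


# Toy data: three inliers on the x-axis (a 1-D subspace) and one outlier.
sigma = tyler([(1.0, 0.0), (2.0, 0.0), (-3.0, 0.0), (0.0, 1.0)])
print(sigma)
```

On this toy set the iterate concentrates on the x-axis, so the dominant eigenvector of the result identifies the inlier subspace; in practice the near-singular limit calls for a stopping rule or regularization.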

2606.06543 2026-06-08 quant-ph cs.AI New submission

Coordinated optimization of departure sequencing and section-track allocation in railway short-term concentrated departure scenarios based on qubo and hybrid quantum algorithms

Xiaobin Li, Yanbin Gao, Weiguang Wang, Xuechen Liang

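Affiliations: School of Transportation Engineering

AI summary: For railway short-term concentrated departure scenarios, proposes a coordinated optimization framework that combines a QUBO model with simulation-based evaluation; under dynamic conditions the hybrid quantum algorithms reduce comprehensive cost by 4.28% to 26.26% and total delay by 4.37% to 24.25%.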

Abstract

This study examines the coordinated optimization of departure sequencing and section-track allocation in railway short-term concentrated departure scenarios. A quadratic unconstrained binary optimization (QUBO) model is formulated to represent departure-position assignment and section-track selection within a unified binary framework. Because the quality of a dispatching scheme depends on time-dependent operational interactions that cannot be fully captured by a static combinatorial model, a simulation-based evaluation layer is introduced to assess section occupation, intermediate-station waiting, platform-capacity pressure, running-time fluctuations, and delay propagation. Within this layered framework, conventional heuristics, quantum-inspired algorithms, and hybrid algorithms are compared on the same decision structure. The results show that the QUBO model can generate feasible candidate schemes after decoding, while the simulation layer clearly differentiates the operational performance of the competing algorithms under both normal and disturbed conditions. In the tested scenarios, QPSO-QAOA performs best under normal conditions, and the quantum-enhanced methods reduce comprehensive cost by 4.28% to 26.26% and total delay by 4.37% to 24.25% on average under dynamic conditions relative to their conventional counterparts. These findings suggest that the integration of QUBO-based modeling and simulation-based evaluation provides a useful methodological framework for railway short-term concentrated departure scheduling, although validation with real operational data remains necessary.
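
How an assignment problem becomes a QUBO can be shown on a toy instance. The sketch below is not the paper's model: it covers only a hypothetical departure-position assignment with made-up costs, using the standard device of adding squared one-hot penalties so that the whole objective stays quadratic in binary variables. It is solved by brute force rather than by any quantum or hybrid algorithm.

```python
from itertools import product


def energy(x, cost, penalty):
    """Penalized objective; x[i][p] = 1 means train i takes departure position p."""
    n = len(cost)
    total = sum(cost[i][p] * x[i][p] for i in range(n) for p in range(n))
    for i in range(n):                       # each train gets exactly one position
        total += penalty * (sum(x[i]) - 1) ** 2
    for p in range(n):                       # each position holds exactly one train
        total += penalty * (sum(x[i][p] for i in range(n)) - 1) ** 2
    return total


def brute_force(cost, penalty=10.0):
    """Enumerate all binary assignments; feasible only for tiny instances."""
    n = len(cost)
    best = None
    for bits in product((0, 1), repeat=n * n):
        x = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        e = energy(x, cost, penalty)
        if best is None or e < best[0]:
            best = (e, x)
    return best


# Hypothetical costs: cost[i][p] for train i in position p.
best_energy, best_x = brute_force([[1, 4], [2, 1]])
print(best_energy, best_x)
```

The penalty weight must exceed the largest cost saving obtainable by violating a constraint; otherwise the minimizer may decode to an infeasible schedule.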

2606.07257 2026-06-08 physics.optics cs.LG New submission

On the conditional equivalence of phase retrieval algorithms

Jakob Schroeder, Andreas Döpp

Affiliations: Fakultät für Physik, Ludwig-Maximilian-Universität München; Munich Center for Machine Learning (MCML); Max Planck Institut für Quantenoptik

AI summary: Shows that the Gerchberg-Saxton algorithm is equivalent to gradient descent on an amplitude least-squares loss and gives global and local probabilistic interpretations, offering guidance for relaxation in iterative phase retrieval.

Abstract

Phase retrieval - recovering a complex-valued field from intensity measurements - is typically solved using variants of the Gerchberg-Saxton (GS) algorithm, understood as alternating projections between measurement planes. Meanwhile, modern computational imaging increasingly relies on gradient-based optimization and automatic differentiation. Here we show that these two approaches are mathematically identical: the GS magnitude replacement step is exactly a unit gradient descent step on an amplitude least-squares loss. This equivalence enables seamless integration of classical phase retrieval with differentiable physics pipelines. We further identify two complementary probabilistic interpretations of this equivalence: globally, the amplitude loss is the negative log-likelihood under Gaussian amplitude noise; locally, each projection step arises as a Bayesian update with the propagated field as prior. The local view provides qualitative guidance for relaxation in iterative phase retrieval.
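
The equivalence claimed in this abstract can be verified numerically for a single field sample. The sketch assumes the per-sample loss $L(z) = \tfrac{1}{2}(|z| - a)^2$, whose gradient with respect to the real and imaginary parts of $z$, packed as a complex number, is $(|z| - a)\,z/|z|$; the factor $\tfrac{1}{2}$ is an assumption about the normalization that makes the step size exactly 1. Only the magnitude-replacement step in one plane is shown, not the propagation between planes.

```python
def gs_magnitude_replacement(z, a):
    """Gerchberg-Saxton step: keep the phase of z, impose the measured amplitude a."""
    return a * z / abs(z)


def gradient_step(z, a, step=1.0):
    """Gradient descent on L(z) = 0.5 * (|z| - a)^2 (assumed normalization)."""
    grad = (abs(z) - a) * z / abs(z)
    return z - step * grad


for z in (1 + 2j, -0.3 + 0.4j, 3j):
    gs = gs_magnitude_replacement(z, 2.0)
    gd = gradient_step(z, 2.0)
    print(z, gs, gd, abs(gs - gd))
```

Algebraically, $z - (|z| - a)\,z/|z| = a\,z/|z|$, so the two updates coincide; a step size below 1 gives a relaxed update that moves only part of the way toward the measured amplitude.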

2606.06573 2026-06-08 physics.flu-dyn cs.CL cs.LG eess.SP New submission

Multiscale POD of Transformer Attention Fields: Scale-Selective Analysis via Morlet Scalogram

Athanasios Zeris

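Affiliations: Independent Researcher, Athens, Greece

AI summary: Proposes scale-selective POD for analyzing transformer attention fields: a Morlet wavelet identifies the temporal scales, the energetically dominant modes are extracted at each scale, and a layer-wise organization of scales emerges, with no architectural modification or linguistic annotation required.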

Comments
23 pages, 3 figures, 4 tables

Abstract

We introduce scale-selective Proper Orthogonal Decomposition (POD) for transformer attention fields, inspired by the use of POD for extracting energetically dominant modes from turbulent flow ensembles. The Morlet continuous wavelet transform identifies dominant temporal scales in the attention lag structure across a document ensemble; POD then extracts the energetically dominant modes at each scale from the ensemble of attention fields. The resulting modes reveal layer-dependent scale organisation, with early layers emphasising fine scales and later layers shifting toward coarser scales. We define a spectral concentration index from the POD eigenvalue decay rate and show empirically that it differentiates layers by their attention field complexity. By the classical POD optimality theorem, the extracted modes minimise the average L2 reconstruction error over the ensemble (Theorem 1), giving a data-driven effective rank for each layer. The method requires no architectural modification and no linguistic annotations: dominant attention patterns emerge from ensemble statistics alone. The turbulence analogy is structural rather than physical: we borrow ensemble covariance and modal analysis, not fluid dynamics itself.

2606.07385 2026-06-08 nlin.CD cs.LG physics.data-an 新提交

Unified Geometry-Guided ML-FTLE for Tracking Transient Chaos from Scalar Time Series

统一几何引导的ML-FTLE用于从标量时间序列追踪瞬态混沌

S. V. Manivelan, Andrei Velichko, I. Manimehan

发表机构 * Department of Physics, M. R. Government Arts College (Affiliated to Bharathidasan University, Tiruchirappalli)(物理系,M.R.政府艺术学院(隶属于巴拉特拉桑大学, Tiruchirappalli)) Institute of Physics and Technology, Petrozavodsk State University(物理与技术学院,佩特罗扎沃茨克州立大学)

AI总结 提出几何引导的机器学习框架,通过结合预测轨迹发散和宏观吸引子形态,从标量观测中检测瞬态混沌,无需控制方程,验证了融合拓扑状态空间与预测发散能系统改进连续过渡追踪。

详情
Comments
Preprint; 9 figures; submitted for peer review
AI中文摘要

在没有控制方程的情况下,从标量观测中检测瞬态混沌是非线性动力学中的一个基本挑战。我们提出了一个几何引导的机器学习框架,该框架统一了预测轨迹发散与宏观吸引子形态,以追踪突然的动力学状态转变。该方法通过样本外 k-最近邻预测误差提取局部不稳定性尺度,建立 ML-FTLE 估计器,随后将此时间发散映射到由最小庞加莱占用网格字典导出的结构接近矩阵上。通过偏最小二乘回归,我们提取一个直接校准到经验有限时间李雅普诺夫谱的潜在几何成分,得到基于庞加莱的几何引导 FTLE。对解析 QR-FTLE 基线的验证证实,融合拓扑状态空间与预测发散系统地改进了连续过渡追踪。结构相似性指数最优地解析了逐渐阻尼,而豪斯多夫距离在突然的相空间崩溃期间表现出极端弹性。此外,宏观空间离散化作为针对加性高斯噪声的鲁棒拓扑正则化器,即使在中等信号阈值下也能保留确定性特征。这个无方程框架为监测复杂非平稳系统中的结构转变提供了高精度、抗噪声的诊断方法。

英文摘要

Detecting transient chaos from scalar observations without governing equations represents a fundamental challenge in nonlinear dynamics. We propose a geometry-guided machine learning framework that unifies predictive trajectory divergence with macroscopic attractor morphology to track abrupt regime shifts. The methodology extracts a local instability scale via out-of-sample k-nearest neighbor forecast errors to establish the ML-FTLE estimator, subsequently mapping this temporal divergence onto a structural closeness matrix derived from a minimal dictionary of Poincare occupancy grids. By employing partial least squares regression, we extract a latent geometric component calibrated directly to the empirical finite-time Lyapunov spectrum, yielding the Poincare-based geometric-guided FTLE. Validation against analytical QR-FTLE baselines confirms that fusing topological state spaces with predictive divergence systematically improves continuous transition tracking. The Structural Similarity Index optimally resolves gradual damping, while Hausdorff Distance exhibits extreme resilience during abrupt phase-space collapses. Furthermore, macroscopic spatial discretization acts as a robust topological regularizer against additive Gaussian noise, preserving deterministic signatures even at moderate signal thresholds. This equation-free framework provides a highly accurate, noise-resilient diagnostic for monitoring structural transitions in complex non-stationary systems.

2606.06107 2026-06-08 quant-ph cs.IT cs.LG eess.IV math.IT physics.optics 交叉投稿

Deployed trusted-node quantum key distribution over 300 km with a multi-core fiber access link

部署的300公里可信节点量子密钥分发与多芯光纤接入链路

Martin Clason, Joakim Argillander, Didrik Bergström, Daniel Spegel-Lexne, Giulio Foletto, Ashraf El Hassan, Mohamed Bourennane, Onur Günlü, Katia Gallo, Rui Lin, Guilherme B. Xavier

发表机构 * Department of Electrical Engineering, Linköping University(电气工程系,林雪平大学) Department of Physics, KTH Royal Institute of Technology(物理学系,皇家理工学院) Department of Physics, Stockholm University(物理学系,斯德哥尔摩大学) Lehrstuhl für Nachrichtentechnik, Technische Universität Dortmund(通信工程教席,多特蒙德工业大学) Department of Electrical Engineering, Chalmers University of Technology(电气工程系,查尔姆斯理工大学)

AI总结 通过270公里部署单模光纤和33公里多芯光纤段,总长303公里,使用商用QKD系统与外部超导纳米线单光子探测器,实现了可信节点量子密钥分发,并演示了动态可重构光纤网络中的集成以及密钥速率对图像传输保真度的影响。

详情
Comments
11 pages, 4 figures
AI中文摘要

量子密钥分发(QKD)越来越多地被考虑部署在现实通信网络中,其中长距离、异构光纤基础设施以及与经典通信的共存带来了巨大挑战。在这里,我们展示了在林雪平大学和瑞典国家量子通信基础设施的斯德哥尔摩枢纽之间,通过270公里部署的单模光纤和一段33公里模拟城域接入链路的多芯光纤(MCF)段,总距离303公里的可信节点QKD。两个子链路使用商用QKD系统,其接收器与外部超导纳米线单光子探测器接口,使得能够在超出标准内部门控模式探测器支持的损耗下运行。我们在两个MCF芯之间主动切换QKD信道的同时运行链路,其他芯中有共传播的以太网流量和注入的宽带光学噪声。结果证明了将商用QKD集成到与未来混合量子-经典网络相关的苛刻、动态可重构光纤基础设施中。最后,使用生成的密钥,我们说明了有限且时变的QKD吞吐量如何影响一次性密码本保护的图像传输:图像保真度强烈依赖于可用的QKD生成的密钥预算和压缩算法的选择,突显了现实场景中基于QKD加密的应用级挑战。

英文摘要

Quantum key distribution (QKD) is increasingly considered for deployment in realistic communication networks, where long distances, heterogeneous fiber infrastructure, and coexistence with classical traffic present substantial challenges. Here, we demonstrate trusted-node QKD between Linköping University and the Stockholm hub of the Swedish national quantum communication infrastructure over 270 km of deployed single-mode fiber, extended by a 33 km multi-core fiber (MCF) segment emulating a metropolitan access link, for a total distance of 303 km. The two sub-links use commercial QKD systems whose receivers are interfaced with external superconducting nanowire single-photon detectors, enabling operation at losses beyond those supported by standard internal gated-mode detectors. We operate the link while actively switching the QKD channel between two MCF cores, with co-propagating Ethernet traffic and injected broadband optical noise in the other cores. The results demonstrate the integration of commercial QKD into demanding, dynamically reconfigurable fiber infrastructure relevant to future hybrid quantum-classical networks. Finally, using the generated secret keys, we illustrate how limited and time-varying QKD throughput affects one-time-pad-protected image transmission: image fidelity depends strongly on the available QKD-generated key budget and the choice of compression algorithm, highlighting application-level challenges for QKD-based encryption in realistic scenarios.
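The link between key budget and image transmission follows from how a one-time pad works: every payload byte consumes one fresh key byte, so the available key caps the payload size and pushes the sender toward compression. The sketch below is a toy illustration, not the experiment's pipeline. `zlib` stands in for an image codec, `secrets.token_bytes` stands in for QKD-generated key material, and the 1024-byte budget is a hypothetical figure.

```python
import secrets
import zlib

def otp_xor(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with one key byte; the same call encrypts and decrypts."""
    if len(key) < len(payload):
        raise ValueError("one-time pad needs at least one key byte per payload byte")
    return bytes(p ^ k for p, k in zip(payload, key))

def fits_key_budget(payload: bytes, key_budget_bytes: int) -> bool:
    """True if the payload can be fully protected with the available key."""
    return len(payload) <= key_budget_bytes

# Stand-in for raw image data: 4096 bytes with a short repeating pattern.
raw_image = bytes(i % 16 for i in range(4096))
compressed = zlib.compress(raw_image, 9)

key_budget_bytes = 1024                       # hypothetical key available for this frame
key = secrets.token_bytes(key_budget_bytes)   # placeholder for QKD-generated key

raw_fits = fits_key_budget(raw_image, key_budget_bytes)
compressed_fits = fits_key_budget(compressed, key_budget_bytes)

ciphertext = otp_xor(compressed, key)
recovered = zlib.decompress(otp_xor(ciphertext, key))
# Key bytes used here must never be reused for another message.
```

With a lossy codec the same budget check would translate into a quality setting rather than a yes/no answer, which is one way key budget and compression choice could end up shaping image fidelity.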

2606.05050 2026-06-08 cond-mat.mtrl-sci cs.AI physics.chem-ph 交叉投稿

Autonomous heterogeneous catalyst discovery with a self-evolving multi-agent digital twin

自主异质催化剂发现:一种自进化多智能体数字孪生系统

Zhilong Song, Zongmin Zhang, Lixue Cheng

发表机构 * Department of Chemistry, Hong Kong University of Science and Technology(香港科技大学化学系) IAS Center for AI for Scientific Discoveries, Hong Kong University of Science and Technology(香港科技大学人工智能科学发现中心) Department of Computer Science and Engineering, Hong Kong University of Science and Technology(香港科技大学计算机科学与工程系) Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology(香港科技大学化学与生物工程系)

AI总结 提出CatDT(催化数字孪生),一种自进化多智能体系统,通过集成八种专业智能体和27种科学工具,在单个GPU上5-30分钟内自动构建工作催化剂数字孪生,实现从体相晶体和自然语言反应描述到稳定晶面预测、反应路径枚举、过渡态定位和动力学计算的全流程,在七个气固相基准上预测与实验偏差在0.5-2倍内,并独立发现丙烷脱氢非贵金属候选催化剂。

详情
AI中文摘要

理论异质催化有望实现快速催化剂发现,然而计算和机器学习预测常常偏离实验,并局限于狭窄的材料家族,原因是缺乏忠实、条件感知的催化模拟器。我们提出CatDT(催化数字孪生),一种自进化多智能体系统,构建工作催化剂的自主数字孪生,统一了气固和液固建模。仅从体相晶体和自然语言反应描述出发,八个专业智能体和27种科学工具在单个GPU上5-30分钟内预测稳定晶面、重构工作表面、枚举和排序反应路径、定位过渡态并计算动力学。两项创新解决了最困难的步骤:UniMech通过融合智能体引导提议与能量缓存图搜索,以比穷举枚举低超过$10^3$倍的成本发现新型材料的主导路径;记忆增强的强化循环将600个催化表面的势垒计算成功率从41%提高到84%。在七个气固基准上(台阶金属、单原子催化剂、有序金属间化合物、富空位二维硫化物和碳化物,以及强金属-载体相互作用(SMSI)界面),每个CatDT预测在四个数量级内与实验偏差在0.5-2倍之间。对于丙烷脱氢,CatDT独立发现与Pt基工业基准相媲美的非贵金属候选催化剂,其中提出的Ni@ZrO$_2$ SMSI覆盖层在约100%选择性下达到$1.63~\text{s}^{-1}$的模拟TOF。更广泛地说,忠实催化剂数字孪生(或任何多阶段科学模拟器)的决定性因素不是原始LLM能力,而是围绕它的工程化框架:确定性工具、持久记忆和跨模型、工具和运行累积的已验证自我改进。

英文摘要

Theoretical heterogeneous catalysis promises rapid catalyst discovery, yet computational and machine-learning predictions often deviate from experiment and stay confined to narrow material families, for want of a faithful, condition-aware catalytic simulator. We present CatDT (Catalysis Digital Twin), a self-evolving multi-agent system that builds an autonomous digital twin of a working catalyst, unifying gas-solid and liquid-solid modeling. From only a bulk crystal and a natural-language reaction description, eight specialized agents and 27 scientific tools predict stable facets, reconstruct working surfaces, enumerate and rank reaction pathways, locate transition states, and compute kinetics in 5-30 min on a single GPU. Two innovations address the hardest steps: UniMech finds dominant pathways for novel materials at over $10^3\times$ lower cost than exhaustive enumeration by fusing agent-guided proposals with energy-cached graph search, and a memory-augmented reinforcement loop raises barrier-calculation success from 41\% to 84\% across 600 catalytic surfaces. Across seven gas-solid benchmarks -- stepped metals, single-atom catalysts, ordered intermetallics, vacancy-rich 2D sulfides and carbides, and a strong-metal--support-interaction (SMSI) interface -- every CatDT prediction lies within 0.5-2 times experiment over four orders of magnitude. For propane dehydrogenation, CatDT independently discovers non-precious candidates rivaling the Pt-based industrial benchmark, with a proposed Ni@ZrO$_2$ SMSI overlayer reaching a simulated TOF of $1.63~\text{s}^{-1}$ at $\sim$100\% selectivity. More broadly, the decisive factor for a faithful catalyst digital twin -- or any multi-stage scientific simulator -- is not raw LLM capability but the engineered harness around it: deterministic tools, persistent memory, and verified self-improvement that compound across models, tools, and runs.

2606.04550 2026-06-08 cs.IR cs.AI cs.SI 交叉投稿

Trading Engagement for Sustainability: Carbon-Aware Re-ranking for E-commerce Recommendations

用参与度换取可持续性:面向电子商务推荐中碳感知的重排序

Noah Lund Syrdal, Anders Vestrum, Jorgen Bergh

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文提出一种碳感知重排序策略,通过检索增强的碳足迹估计管道推断缺失的产品碳足迹标签,并在三个推荐模型上权衡用户参与度与碳排放,实现可持续推荐。

详情
Comments
23 pages, 30 figures. Code available at https://github.com/andersvestrum/carbon-aware-recsys
AI中文摘要

电子商务推荐系统强烈影响用户考虑和购买的产品,然而可持续性信号(如产品碳足迹,PCF)几乎从未在目录规模上可用。我们研究了在大多数商品缺失PCF标签且必须推断的现实场景中的碳感知产品推荐。我们首先通过检索增强的PCF估计管道来估计产品级碳足迹,该管道利用语义相似性搜索、少样本LLM提示和最近邻回退,将来自碳目录(一组生命周期评估产品的小型集合)的监督转移到大型未标记的电子商务目录。然后,我们在三个已建立的推荐模型(BPR、NeuMF和LightGCN)产生的相关性分数之上应用碳感知的事后重排序策略。该方法通过单个可调参数lambda在预测的用户-物品参与度和估计的碳足迹之间进行权衡。在这项离线研究中,参与度通过亚马逊评论交互来操作化,这些交互作为隐式反馈以及用户兴趣或购买行为的代理。我们在亚马逊评论数据集上跨三个产品类别(家居与厨房、运动与户外、电子产品)评估该框架。通过扫描lambda,我们构建了帕累托前沿,表征每个模型和类别可实现的参与度和碳权衡。在所有模型和类别中,以最小的参与度成本即可实现显著的碳减排。然而,可用的碳空间因模型和类别而异,强调了模型选择和领域背景的重要性。

英文摘要

E-commerce recommender systems strongly influence which products users consider and purchase, yet sustainability signals such as Product Carbon Footprint (PCF) are almost never available at catalog scale. We study carbon-aware product recommendation in the realistic setting where PCF labels are missing for most items and must be inferred. We first estimate product-level carbon footprints via a retrieval-augmented PCF estimation pipeline that transfers supervision from the Carbon Catalogue, a small set of life-cycle-assessed products, to a large unlabeled e-commerce catalog using semantic similarity search, few-shot LLM prompting, and a nearest-neighbour fallback. We then apply a carbon-aware post-hoc re-ranking strategy on top of relevance scores produced by three established recommendation models: BPR, NeuMF, and LightGCN. The method trades off predicted user-item engagement against estimated carbon footprint through a single tunable parameter, lambda. In this offline study, engagement is operationalized through Amazon review interactions, which serve as implicit feedback and as a proxy for user interest or purchase behavior. We evaluate the framework on the Amazon Reviews dataset across three product categories: Home and Kitchen, Sports and Outdoors, and Electronics. By sweeping lambda, we construct Pareto frontiers that characterize the achievable engagement and carbon trade-off for each model and category. Substantial carbon reductions are achievable at minimal engagement cost across all models and categories. However, the available carbon headroom varies by model and category, underscoring the importance of model choice and domain context.
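The single-parameter trade-off can be sketched as a post-hoc score adjustment. The abstract does not give the scoring formula, so the convex combination below, the min-max normalization, and the item data are all assumptions for illustration. The repository linked in the Comments field is the place to look for the actual formulation.

```python
def minmax(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def carbon_aware_rerank(candidates, lam):
    """Re-rank (item_id, relevance, carbon_kg) tuples.

    Assumed score: (1 - lam) * normalized relevance - lam * normalized carbon.
    lam = 0 reproduces the base recommender's order; lam = 1 ranks by carbon alone.
    """
    rel = minmax([c[1] for c in candidates])
    co2 = minmax([c[2] for c in candidates])
    scored = [((1 - lam) * r - lam * e, c[0]) for c, r, e in zip(candidates, rel, co2)]
    return [item for _, item in sorted(scored, key=lambda pair: -pair[0])]

# Hypothetical candidates: relevance from a base model, carbon in kg CO2e.
candidates = [("kettle_a", 0.92, 48.0), ("kettle_b", 0.90, 12.0), ("kettle_c", 0.60, 5.0)]

order_base = carbon_aware_rerank(candidates, lam=0.0)
order_mild = carbon_aware_rerank(candidates, lam=0.2)
order_carbon = carbon_aware_rerank(candidates, lam=1.0)
```

In this toy case a small lambda already swaps the two near-equally relevant items, so the top slot moves from 48 kg to 12 kg at almost no relevance cost. Sweeping lambda and recording engagement and carbon at each setting is how a Pareto frontier would be traced.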

2601.12359 2026-06-08 cs.CR cs.AI cs.CL 交叉投稿

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

零样本嵌入漂移检测:一种针对LLM提示注入的轻量级防御

Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu

发表机构 * Algoverse AI Research(Algoverse AI研究院) Berkeley(伯克利大学)

AI总结 本文提出ZEDD,通过量化嵌入空间中良性与可疑输入之间的语义变化,实现对直接和间接提示注入的检测。该方法无需模型内部访问或先验知识,具有低工程开销,能高效部署于多种LLM架构,准确率达93%以上。

详情
Comments
Accepted to NeurIPS 2025 Lock-LLM Workshop
AI中文摘要

提示注入攻击已成为LLM应用中的日益严重漏洞,其中对抗性提示利用电子邮件或用户生成内容等间接输入渠道绕过对齐保护措施,导致有害或意外输出。尽管对齐技术有所进步,但最先进的LLM仍广泛易受对抗性提示攻击,凸显了需要稳健、高效且可推广的检测机制的紧迫性。本文提出零样本嵌入漂移检测(ZEDD),一种轻量级、低工程开销的框架,通过量化嵌入空间中良性与可疑输入之间的语义变化,识别直接和间接提示注入尝试。ZEDD无需访问模型内部、先验攻击类型知识或任务特定重训练,可高效地在多种LLM架构上进行零样本部署。我们的方法使用对抗性清洁提示对,并通过余弦相似度测量嵌入漂移,以捕捉现实世界注入攻击中的细微对抗性操纵。为确保评估的鲁棒性,我们编纂并重新标注了涵盖五个注入类别的综合LLMail-Inject数据集。广泛实验表明,嵌入漂移是一种稳健且可转移的信号,优于传统方法在检测准确性和操作效率方面。在Llama 3、Qwen 2和Mistral等模型架构上,分类准确率超过93%,误报率低于3%,我们的方法提供了一种轻量级、可扩展的防御层,可整合到现有LLM流程中,填补了保护LLM系统以抵御适应性对抗威胁的关键空白。

英文摘要

Prompt injection attacks have become a growing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful or unintended outputs. Despite advances in alignment, even state-of-the-art LLMs remain broadly vulnerable to adversarial prompts, underscoring the urgent need for robust, efficient, and generalizable detection mechanisms beyond inefficient, model-specific patches. In this work, we propose Zero-Shot Embedding Drift Detection (ZEDD), a lightweight, low-engineering-overhead framework that identifies both direct and indirect prompt injection attempts by quantifying semantic shifts in embedding space between benign and suspect inputs. ZEDD operates without requiring access to model internals, prior knowledge of attack types, or task-specific retraining, enabling efficient zero-shot deployment across diverse LLM architectures. Our method uses adversarial-clean prompt pairs and measures embedding drift via cosine similarity to capture subtle adversarial manipulations inherent to real-world injection attacks. To ensure robust evaluation, we assemble and re-annotate the comprehensive LLMail-Inject dataset spanning five injection categories derived from publicly available sources. Extensive experiments demonstrate that embedding drift is a robust and transferable signal, outperforming traditional methods in detection accuracy and operational efficiency. With more than 93% accuracy in classifying prompt injections across model architectures such as Llama 3, Qwen 2, and Mistral, and a false positive rate below 3%, our approach offers a lightweight, scalable defense layer that integrates into existing LLM pipelines, addressing a critical gap in securing LLM-powered systems to withstand adaptive adversarial threats.
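The drift measurement itself is simple to sketch: embed a clean input and its suspect counterpart, then compare them by cosine similarity. The vectors, the 0.15 threshold, and the function names below are hypothetical. In practice the embeddings would come from an encoder model, and the abstract does not say how ZEDD sets its decision threshold.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def embedding_drift(clean_vec, suspect_vec):
    """Drift as 1 - cosine similarity: 0 for identical directions, larger for bigger shifts."""
    return 1.0 - cosine_similarity(clean_vec, suspect_vec)

def looks_injected(clean_vec, suspect_vec, threshold=0.15):
    """Flag the suspect input when drift exceeds a threshold (value chosen for illustration)."""
    return embedding_drift(clean_vec, suspect_vec) > threshold

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
clean = [0.90, 0.10, 0.00]
benign_variant = [0.88, 0.12, 0.02]   # small wording change, same meaning
injected = [0.30, 0.20, 0.90]         # content pulled toward a different instruction

benign_flag = looks_injected(clean, benign_variant)
injected_flag = looks_injected(clean, injected)
```

Because only embeddings are compared, a detector of this shape needs no access to the protected model's weights, which is consistent with the zero-shot deployment the abstract describes.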

2606.05919 2026-06-08 stat.ML cs.LG econ.EM stat.CO 版本更新

Finding Most Influential Sets

寻找最具影响力的集合

Lucas D. Konrad, Nikolas Kuschnig

发表机构 * Vienna University of Economics and Business(维也纳经济与商业大学) Monash University(莫纳什大学)

AI总结 针对具有线性分式留出效应的估计量,提出一种基于Dinkelbach方法的高效算法,将最具影响力集合的选择转化为一个单参数序列的top-k问题,实现全局最优解。

详情
Comments
Published as a conference paper at ICML 2026, fixed ref
AI中文摘要

识别最具影响力的集合(MIS),即移除后能最大程度改变目标估计量的大小为$k$的子集,通常是不可行的,因为需要搜索$\binom{n}{k}$个子集。对于具有线性分式留出效应的估计量,我们证明MIS选择可简化为一个单参数序列的top-$k$问题。Dinkelbach方法产生了一种每轮迭代成本为$\mathcal{O}(n)$且有限终止的算法。对于固定残差化输入,该算法返回单变量比率目标的全局最优集,包括预言机残差化偏线性模型。当存在估计的干扰函数时,均匀分母和生成得分稳定性意味着对一阶预言机正交得分目标的近似;在分离条件下,可精确恢复集合。模拟和应用表明,该方法恢复了以前计算上无法访问的精确MIS。

英文摘要

Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection reduces to a one-parameter sequence of top-$k$ problems. Dinkelbach's method yields an algorithm with $\mathcal{O}(n)$ cost per iteration and finite termination. For fixed residualized inputs, the algorithm returns a globally optimal set for the univariate ratio objective, including the oracle-residualized partial linear model. With estimated nuisance functions, uniform denominator and generated-score stability imply approximation to the first-order oracle orthogonal-score objective; exact set recovery follows under a separation condition. Simulations and applications show that the method recovers exact MIS that were previously computationally inaccessible.
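The reduction to a sequence of top-k problems can be illustrated on a generic linear-fractional objective. The sketch below is not the paper's estimator: it assumes the effect of removing a set S has the form (a0 + Σ a_i) / (b0 + Σ b_i) with a positive denominator, and it maximizes that ratio over sets of size k. For fixed λ the objective Σ (a_i − λ b_i) is additive, so the inner problem is a top-k selection. `heapq.nlargest` costs O(n log k) per iteration; a linear-time selection routine would be needed to match the O(n) bound in the abstract.

```python
import heapq
import random
from itertools import combinations

def ratio(subset, a, b, a0=0.0, b0=0.0):
    """Linear-fractional objective evaluated on a candidate set."""
    return (a0 + sum(a[i] for i in subset)) / (b0 + sum(b[i] for i in subset))

def dinkelbach_top_k(a, b, k, a0=0.0, b0=0.0, tol=1e-12, max_iter=100):
    """Maximize ratio(S) over |S| = k, assuming b0 + sum of any k b-values is positive."""
    subset = list(range(k))                 # any feasible starting set
    lam = ratio(subset, a, b, a0, b0)
    for _ in range(max_iter):
        # Inner step: top-k of the per-item scores a_i - lam * b_i.
        subset = heapq.nlargest(k, range(len(a)), key=lambda i: a[i] - lam * b[i])
        gap = a0 - lam * b0 + sum(a[i] - lam * b[i] for i in subset)
        if gap <= tol:                      # no set beats the current ratio: optimal
            break
        lam = ratio(subset, a, b, a0, b0)   # Dinkelbach update
    return sorted(subset), ratio(subset, a, b, a0, b0)

random.seed(1)
n, k = 12, 3
a = [random.gauss(0, 1) for _ in range(n)]          # hypothetical numerator contributions
b = [random.uniform(0.5, 2.0) for _ in range(n)]    # hypothetical positive denominator terms

best_set, best_ratio = dinkelbach_top_k(a, b, k)
brute_ratio = max(ratio(s, a, b) for s in combinations(range(n), k))
```

The brute-force comparison over all 220 subsets is only feasible because the instance is tiny. Finding the set with the largest change in either direction would require a second run with the numerator negated.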

2606.05763 2026-06-08 eess.AS cs.SD 版本更新

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

M2S-AVSR:面向鲁棒视听语音识别的模态感知多视角自监督表示

Fei Su, Cancan Li, Ming Li, Juan Liu

发表机构 * School of Artificial Intelligence and the School of Computer Science, Wuhan University, China(人工智能学院和计算机科学学院,武汉大学,中国) School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen, China(人工智能学院,香港中文大学(深圳),中国) School of Artificial Intelligence, Wuhan University, China(人工智能学院,武汉大学,中国)

AI总结 提出一种模态感知多视角自监督表示框架(M2S-AVSR),通过多视角编码学习视角不变视觉语音表示,并利用模态感知模块进行细粒度融合,以应对视角变化、音频失真和视觉遮挡等挑战,在多个基准上取得最优性能。

详情
Comments
submitted to IEEE Transactions on Audio, Speech, and Language Processing
AI中文摘要

视听语音识别(AVSR)通过利用视觉线索增强语音识别的鲁棒性,而现实场景由于视角变化、音频失真和视觉遮挡而仍然具有挑战性,这些因素会降低模态质量并增加视听异步性。在本文中,我们提出了一种新颖的模态感知多视角自监督表示框架,用于鲁棒的视听语音识别(M2S-AVSR)。首先,我们引入了一个多视角表示学习编码器,以学习视角不变的视觉语音表示。其次,我们采用了一个模态感知模块,该模块显式地对模态质量和跨模态同步性进行建模,以执行细粒度的模态感知融合,从而在解码过程中实现细粒度的视觉信息注入。此外,我们提出了AISHELL8-RealScene,一个在真实环境中录制的公开多场景、多视角对话视听数据集,并在此基础上建立了语音识别基准。在英语和普通话基准上的实验证明了所提出方法在挑战性条件下的有效性。在LRS3上,M2S-AVSR在视角扰动和视觉退化设置下实现了高达29.4%的相对改进。我们的方法还在MISP2021-AVSR测试集上取得了新的最先进性能。在AISHELL8-RealScene上,它在户外场景中取得了最佳结果。所提出的方法和数据集为未来在现实条件下进行鲁棒语音和多模态任务的研究提供了有用的支持。

英文摘要

Audio-Visual Speech Recognition (AVSR) enhances speech recognition robustness by leveraging visual cues, but real-world scenarios remain challenging due to viewpoint variation, audio distortion, and visual occlusion, which degrade modality quality and increase audio-visual asynchrony. In this paper, we propose a novel Modality-aware Multi-view Self-supervised representation framework for robust Audio-Visual Speech Recognition (M2S-AVSR). First, we introduce a multi-view representation learning encoder to learn view-invariant visual speech representations. Next, we employ a modality-aware module that explicitly models modality quality and cross-modal synchrony to perform fine-grained modality-aware fusion, enabling fine-grained visual information injection during decoding. In addition, we release AISHELL8-RealScene, a public multi-scenario, multi-view conversational audio-visual dataset recorded in real-world environments, and establish a speech recognition benchmark on it. Experiments on English and Mandarin benchmarks demonstrate the effectiveness of the proposed method under challenging conditions. On LRS3, M2S-AVSR achieves up to 29.4% relative improvement under viewpoint perturbation and visual degradation settings. Our method also achieves new state-of-the-art performance on the MISP2021-AVSR test set. On AISHELL8-RealScene, it achieves the best result in outdoor scenes. The proposed method and dataset provide useful support for future research on robust speech and multimodal tasks under realistic conditions.

2606.05654 2026-06-08 cs.SE cs.AI cs.LG 版本更新

When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability

当表面形式改变审核决策:代码混合工作流不稳定性的配对研究

Suraj Babu Thimma Krishnaram, Yibo Hu, Karthikeyan Saravanan

发表机构 * GitHub

AI总结 通过配对评估设置,研究在清洁英语与泰米尔语-英语代码混合输入下,仇恨审核工作流的变化,发现代码混合导致决策翻转率高达0.265,并增加审核负担和误报。

详情
AI中文摘要

仇恨审核通常被评估为对清洁英语输入的分类,但部署的系统必须将内容路由到诸如ALLOW、FLAG或REVIEW等操作。我们通过配对评估设置研究这种工作流在代码混合输入下的变化,其中相同的基础内容以清洁英语和泰米尔语-英语代码混合形式表达。在基于清洁英语开发数据调整的阈值下,代码混合输入产生显著的动作不稳定性,配对清洁到代码混合决策翻转率为0.265。主要工作流影响是增加的审核负担和增加的非仇恨内容误报:审核率从0.138上升到0.297,非仇恨误报率从0.069上升到0.104。仅泰米尔语输入整体表现出更强的退化,表明存在更广泛的语言覆盖限制,而非相同的代码混合不稳定性模式。一个简单的基于分歧的延迟规则减少了压力输入上的自动错误,但只能通过增加审核负载。这些结果表明,工作流级别的评估揭示了标准分类摘要可能遗漏的审核失败。

英文摘要

Hate moderation is often evaluated as classification on clean English inputs, but deployed systems must route content to actions such as ALLOW, FLAG, or REVIEW. We study how this workflow changes under code-mixed inputs using a paired evaluation setting where the same underlying content is expressed as clean English and Tamil-English code-mix. Under thresholds tuned on clean English development data, code-mixed inputs produce substantial action instability, with a paired clean-to-code-mix decision flip rate of 0.265. The main workflow effects are increased review burden and increased false-flagging of non-hateful content: review rate rises from 0.138 to 0.297 and non-hate false-flag rate rises from 0.069 to 0.104. Tamil-only inputs show stronger degradation overall, suggesting a broader language-coverage limitation rather than the same code-mixed instability pattern. A simple disagreement-based deferral rule reduces automatic errors on stressed inputs, but only by increasing review load. These results show that workflow-level evaluation reveals moderation failures that standard classification summaries can miss.
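The workflow-level metrics can be made concrete with a small sketch. The score-to-action mapping, both thresholds, and the toy scores below are assumptions; the abstract does not state how the studied system routes content, and the numbers here are unrelated to the reported 0.265 flip rate.

```python
def route(score, flag_threshold=0.8, review_threshold=0.5):
    """Map a hate-probability score to a moderation action (hypothetical thresholds)."""
    if score >= flag_threshold:
        return "FLAG"
    if score >= review_threshold:
        return "REVIEW"
    return "ALLOW"

def flip_rate(clean_scores, mixed_scores):
    """Share of paired items whose action differs between the two surface forms."""
    pairs = list(zip(clean_scores, mixed_scores))
    return sum(route(c) != route(m) for c, m in pairs) / len(pairs)

def review_rate(scores):
    """Share of items sent to human review."""
    return sum(route(s) == "REVIEW" for s in scores) / len(scores)

# Toy paired scores: position i is the same content in clean English and in code-mix.
clean_scores = [0.10, 0.95, 0.30, 0.85, 0.20]
mixed_scores = [0.15, 0.60, 0.55, 0.90, 0.25]

flips = flip_rate(clean_scores, mixed_scores)
clean_review = review_rate(clean_scores)
mixed_review = review_rate(mixed_scores)
```

In this toy set two of five pairs change action, and both changes land in REVIEW. That is the kind of shift an accuracy figure alone can hide, since a REVIEW outcome is not counted as a wrong label.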

2606.03163 2026-06-08 cs.MA cs.AI cs.DC 版本更新

OpenAgenet / OAN Yellow Paper: Technical Architecture for Trust-Governed Resource Identity and Discovery

OpenAgenet/OAN黄皮书:信任治理的资源身份与发现技术架构

Jinliang Xu

发表机构 * OpenAgenet / OAN

AI总结 本文提出OpenAgenet/OAN协议中立信任层技术架构,通过角色架构、身份对象、注册工作流、根治理生命周期、根验证包模型、授权感知发现、签名可信调用、验证要求、状态转换、安全属性、实现边界和部署考虑,实现异构智能体框架(包括MCP、A2A、ANP类系统及领域特定协议)的身份准入、可发现、可验证和安全交互。

详情
AI中文摘要

本文描述了OpenAgenet / OAN的技术架构。OAN是一个协议中立的信任层,用于开放的智能体互连。它规定了角色架构、身份对象、注册工作流、根治理生命周期、根验证包模型、授权感知发现、签名可信调用、验证要求、状态转换、安全属性、实现边界和部署考虑。该设计旨在支持异构智能体框架和交互协议,包括MCP、A2A、ANP类系统以及领域特定的智能体协议。OAN不定义智能体之间的完整业务对话;它定义了在特定协议交互开始之前,智能体身份如何变得可接纳、可发现、可验证且安全可接近。

英文摘要

This yellow paper describes the technical architecture of OpenAgenet / OAN. OAN is a protocol-neutral trust layer for open Agent interconnection and discoverable AI resource products. It specifies the role architecture, \texttt{did:oan} identity objects, registration workflow, governance-backed Root lifecycle enforcement, Root-verified package model, authorization-aware Discovery, Root-issued infrastructure authorization VCs, signed trusted invocation, verification requirements, state transitions, security properties, implementation boundaries, and deployment considerations. The design is intended to support heterogeneous Agent frameworks and interaction protocols, including MCP, A2A, ANP-like systems, domain-specific Agent protocols, Skills, MCP Servers, and Tool/API resources. OAN does not define the entire business conversation among Agents or the native protocol of every resource; it defines how resource identities become admissible, discoverable, verifiable, and safe to approach before protocol-specific interaction begins.

2606.03161 2026-06-08 cs.MA cs.AI 版本更新

OpenAgenet / OAN White Paper: Open Infrastructure for Trusted Agent Interconnection

OpenAgenet/OAN白皮书:可信智能体互连的开放基础设施

Jinliang Xu


AI总结 针对智能体从孤立应用转向开放多运营商网络时面临的身份验证、治理状态、发现授权、新鲜度和信任证据问题,提出协议无关的信任层OAN,通过根治理身份准入、注册商辅助注册、根验证包发布、授权感知发现和签名可信调用来实现可信互连。

详情
AI中文摘要

OpenAgenet,简称OAN,是一个用于可信智能体互连的开放基础设施项目。它解决了一个在智能体从孤立应用转向开放的多运营商网络时变得明显的问题:在智能体能够安全地发现、选择和调用另一个智能体之前,它需要一种方法来验证身份来源、治理状态、发现授权、新鲜度和连接前的信任证据。OAN被设计为一个协议无关的信任层。它不取代智能体交互协议、工具协议、模型编排框架或应用级工作流。相反,它提供了根治理的身份准入、注册商辅助的注册、根验证的包发布、授权感知的发现以及签名的可信调用。本文介绍了OAN的动机、架构、角色、治理模型、与MCP、A2A和ANP的关系、部署模式、合作模型、区块链支持的授权公告、原型状态、性能概况和路线图。

英文摘要

OpenAgenet, abbreviated as OAN, is an open infrastructure project for trusted Agent interconnection. It addresses a problem that becomes visible when Agents move from isolated applications into open, multi-operator networks: before an Agent can safely discover, select, and invoke another Agent, it needs a way to verify identity provenance, governance state, discovery authorization, freshness, and pre-connection trust evidence. OAN is designed as a protocol-neutral trust layer. It does not replace Agent interaction protocols, tool protocols, model orchestration frameworks, or application-level workflows. Instead, it provides \texttt{did:oan}-based resource identity, governance-backed admission, Registrar-assisted onboarding, Root-verified package publication, authorization-aware Discovery, Root-issued infrastructure authorization VCs, and signed trusted invocation. The architectural center of OAN is the combination of federated governance, resource identity, and trusted Discovery, rather than a single directory or naming service. This white paper explains the motivation, architecture, roles, governance model, relationship with MCP, A2A, and ANP, deployment patterns, cooperation model, on-chain governance layer, prototype status, performance profile, and roadmap of OAN.

2606.02475 2026-06-08 math.NA cs.CE cs.LG cs.NA 版本更新

Physics-Informed Residuals for Adaptive Mesh Refinement in Finite-Difference PDE Solvers

面向有限差分PDE求解器中自适应网格细化的物理信息残差

Henry Kasumba, Ronald Katende

发表机构 * Department of Mathematics, Makerere University(数学系,Makerere大学) Department of Mathematics, Kabale University(数学系,Kabale大学)

AI总结 提出利用物理信息神经网络(PINN)作为离网格残差探针,为有限差分求解器提供自适应网格细化指示,在粘性Burgers方程等基准测试中验证了其有效性。

详情
Comments
20 pages, 5 tables, 5 figures
AI中文摘要

经典有限差分求解器仍是偏微分方程的可靠工具,但其效率取决于网格分辨率的放置位置。当求解困难集中在尖锐梯度、前沿、振荡或约束敏感区域附近时,均匀细化可能浪费自由度。本文研究了一种混合策略,其中物理信息神经网络(PINN)不作为最终求解器,而是作为自适应网格细化的离网格残差探针。PINN残差在域内采样,转换为单元指示器,并在最终近似由有限差分求解器计算之前指导细化。该方法在三个基准测试上进行了评估。主要的全求解器验证使用一维粘性Burgers方程,在自适应网格上进行非均匀有限差分求解。PINN阈值细化在60个自由度下达到最终相对$L^2$误差0.021067,而均匀细化在192个自由度下为0.022617。在匹配网格大小时,PINN阈值将误差降低了约67.5%。PINN-Dörfler细化性能类似,使用58个自由度时误差为0.021264。梯度指示器仍略精确,因此结果支持有用性而非普遍优越性。基于非线性薛定谔方程和不可压缩Navier-Stokes系统的二维和三维代理测试表明,PINN残差可以组织结构化细化并优于随机细化,尽管它们并不始终优于梯度或均匀基线。结果支持PINN引导的AMR作为一种残差指示器策略,将物理信息诊断信息传递到有限差分网格自适应中,同时保留经典求解器作为最终近似引擎。

英文摘要

Classical finite-difference solvers remain reliable tools for partial differential equations, but their efficiency depends on where mesh resolution is placed. Uniform refinement can waste degrees of freedom when solution difficulty is localised near sharp gradients, fronts, oscillations, or constraint-sensitive regions. This paper studies a hybrid strategy in which a physics-informed neural network (PINN) is used not as the final solver, but as an off-grid residual probe for adaptive mesh refinement. The PINN residual is sampled over the domain, converted into cellwise indicators, and used to guide refinement before the final approximation is computed by a finite-difference solver. The method is evaluated on three benchmarks. The main full-solver validation uses the one-dimensional viscous Burgers equation with a nonuniform finite-difference solve on the adapted meshes. PINN-threshold refinement attains final relative $L^2$ error $0.021067$ with $60$ degrees of freedom, compared with $0.022617$ for uniform refinement with $192$ degrees of freedom. At matched mesh size, PINN-threshold reduces the error by about $67.5\%$. PINN-Dörfler refinement gives similar performance, with error $0.021264$ using $58$ degrees of freedom. A gradient indicator remains slightly more accurate, so the result supports usefulness rather than universal superiority. Manufactured 2D and 3D proxy tests, based on a nonlinear Schrödinger equation and an incompressible Navier--Stokes system, show that PINN residuals can organise structured refinement and improve over random refinement, although they do not consistently outperform gradient or uniform baselines. The results support PINN-guided AMR as a residual-indicator strategy for transferring physics-informed diagnostic information into finite-difference mesh adaptation while preserving the classical solver as the final approximation engine.
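The step from cellwise indicators to a refined mesh can be sketched for the two marking rules named in the abstract. This is a generic illustration on a 1D mesh, not the paper's implementation. The indicator values are made up (in the paper they would be derived from sampled PINN residuals), and the parameters `fraction_of_max` and `theta` are hypothetical choices. Dörfler marking is shown in its usual form, selecting the smallest set of cells that carries a fraction θ of the total squared indicator.

```python
def threshold_marking(indicators, fraction_of_max=0.5):
    """Mark every cell whose indicator reaches a fraction of the largest indicator."""
    cutoff = fraction_of_max * max(indicators)
    return [i for i, eta in enumerate(indicators) if eta >= cutoff]

def doerfler_marking(indicators, theta=0.6):
    """Mark the fewest cells whose squared indicators sum to at least theta * total."""
    order = sorted(range(len(indicators)), key=lambda i: indicators[i], reverse=True)
    total = sum(eta ** 2 for eta in indicators)
    marked, accumulated = [], 0.0
    for i in order:
        if accumulated >= theta * total:
            break
        marked.append(i)
        accumulated += indicators[i] ** 2
    return sorted(marked)

def refine(nodes, marked):
    """Bisect marked cells of a 1D mesh; cell i spans nodes[i] to nodes[i + 1]."""
    marked_set = set(marked)
    refined = []
    for i in range(len(nodes) - 1):
        refined.append(nodes[i])
        if i in marked_set:
            refined.append(0.5 * (nodes[i] + nodes[i + 1]))
    refined.append(nodes[-1])
    return refined

nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
indicators = [0.1, 0.2, 1.0, 0.3]   # made-up residual indicators; cell 2 holds a sharp front

marked_threshold = threshold_marking(indicators)
marked_doerfler = doerfler_marking(indicators, theta=0.9)
refined_nodes = refine(nodes, marked_threshold)
```

The finite-difference solve would then run on `refined_nodes`, which keeps the classical solver as the final approximation step as the abstract describes.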

2606.01765 2026-06-08 cs.FL cs.CL cs.LG 版本更新

An Algebraic View of the Expressivity of Recurrent Language Models

循环语言模型表达能力的代数视角

Franz Nowak, Ryan Cotterell, Reda Boumasmoud


AI总结 本文通过代数统一框架分析循环神经网络在不同算术模型下的表达能力,将形式语言识别问题归结为语法幺半群是否划分特定圈积的代数问题。

详情
Comments
28 pages, 2 figures, to be published at ICML 2026
AI中文摘要

循环神经语言模型能识别哪些形式语言?文献中的形式结果存在冲突:一些作者报告图灵完备性,而另一些则显示等价于正则语言。这种差异的原因在于底层算术模型不同。本文发展了一个统一的代数视角来刻画循环神经网络的表达能力,首先对各种算术模型进行形式化描述。该视角将表达能力归结为一个代数问题,例如网络的语法幺半群是否划分某个圈积。作为案例研究,本文重新审视了对角状态空间模型:一旦强制执行浮点递归,同一架构无法实现偶数模计数器,但在无符号整数量化下却能实现每个偶数模计数器。

英文摘要

What formal languages can a recurrent neural language model recognize? Formal results in the literature conflict: some authors report Turing-completeness, while others show equivalence to regular languages. The reason for this discrepancy is that the underlying arithmetic model differs. The paper develops a unified algebraic account of the expressivity of recurrent neural networks, starting with a formal account of various arithmetic models. This account reduces expressivity to an algebraic question, e.g., whether a network's syntactic monoid divides a certain wreath product. As a case study, the paper revisits diagonal state-space models: the same architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.

2606.00279 2026-06-08 cs.CR cs.LG 版本更新

Bit-Exact AI Inference Verification Without Performance Tradeoffs

无性能权衡的位精确AI推理验证

Naci Cankaya

发表机构 * Naci Cankaya(纳西·卡纳亚)

AI总结 针对GPU浮点运算非确定性导致AI工作负载验证困难的问题,提出通过软件仿真实现位精确重计算,在不牺牲性能的前提下实现可审计的推理验证。

详情
Comments
Best paper award, ICML 2026 TAIGR workshop. Code can be found at https://github.com/NaciCankaya/hardware_rounding_error_predictor
AI中文摘要

验证AI工作负载的声明是对隐蔽对手(仅在检测可能性高时才遵守监控)进行可信AI治理的前提,然而GPU浮点运算明显的非确定性迫使审计人员接受近似输出匹配。隐蔽对手可以利用监控计算中不可验证的自由度。攻击向量包括隐写术、未报告的推理软件修改以及通过未报告的批次元素进行的隐蔽计算。通过实验,我们分析了现代推理引擎(vLLM、HF transformers)如何产生确定性但非不变性的输出,而无需设置影响性能的确定性标志,只要重计算所需的信息可用且后端未调用原子函数。我们证明,这种位精确重计算不需要访问相同的硬件,通过跨多个NVIDIA GPU变体的LLM推理的纯软件仿真实现。因此,累积的舍入误差可以成为用于推理的软件和硬件设置的可审计签名,而不是可验证性的约束。

英文摘要

Verifying claims about AI workloads is a prerequisite for credible AI governance of covert adversaries (who comply with monitoring only when detection likelihood is high), yet the apparent non-determinism of GPU floating-point arithmetic forces auditors to accept approximate output matches. Covert adversaries can exploit unverifiable degrees of freedom in monitored computation. Attack vectors include steganography, unreported modification of inference software, and covert computation via unreported batch elements. Empirically, we analyze how modern inference engines (vLLM, HF transformers) produce deterministic but non-invariant outputs, without needing to set performance-compromising determinism flags, if the right information is available for re-computation and no atomic functions are called in the backend. We demonstrate that such bitwise-precise re-computation does not require access to identical hardware, via a software-only emulation of LLM inference across multiple NVIDIA GPU variants. Thus, accumulated rounding errors can be an auditable signature of the software and hardware setup used for inference, instead of a constraint on verifiability.
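The root of the apparent non-determinism is that floating-point addition is not associative, so the order in which a kernel accumulates values changes the low bits of the result. The sketch below uses Python doubles as a stand-in for GPU arithmetic and does not reproduce the paper's emulator. It only shows that a fixed reduction order replays bit-for-bit, while a different order leaves a different, reproducible bit pattern.

```python
import struct

def bits(x: float) -> str:
    """Hex string of the IEEE 754 double representation of x."""
    return struct.pack(">d", x).hex()

def reduce_left(values):
    """Sequential accumulation: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total

def reduce_pairwise(values):
    """Tree-shaped accumulation, as a parallel reduction might do."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return reduce_pairwise(values[:mid]) + reduce_pairwise(values[mid:])

values = [0.1, 0.2, 0.3]

left = reduce_left(values)        # (0.1 + 0.2) + 0.3
pair = reduce_pairwise(values)    # 0.1 + (0.2 + 0.3)
replay = reduce_left(values)      # same order again

same_order_matches = bits(left) == bits(replay)
different_order_matches = bits(left) == bits(pair)
```

An auditor who knows the reduction order can therefore expect an exact match, and a mismatch in the last bits indicates that some part of the declared setup differs. This is the sense in which rounding error can act as a signature.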

2605.31051 2026-06-08 cs.NE cs.AI 版本更新

Linear Ordering Problem: Time for a Change

线性排序问题:变革之时

Fabrizio Fagiolo, Marco Baioletti, Valentino Santucci

发表机构 * University for Foreigners of Perugia(佩鲁吉亚外国大学) University of Perugia(佩鲁吉亚大学)

AI总结 针对线性排序问题(LOP)中基准数据过时及多最优解问题,提出基于最新经济数据的新基准套件和生成多样高质量解的算法方案,并引入质量和多样性评估指标。

详情
Comments
Accepted for publication at PPSN 2026 - Conference on Parallel Problem Solving
AI中文摘要

线性排序问题(LOP)是一个基础组合优化问题,在经济学、社会选择和机器学习等领域有重要应用。其最突出的用途是经济投入产出表的三角化,这有助于识别经济中的关键产业。大多数现有算法都是在基于过时的宏观经济数据的基准上评估的,这些数据不再反映当代经济的结构。此外,LOP实例通常表现出许多不同的全局最优解,这些解之间可能差异很大,给依赖单一解的应用带来了挑战。为了解决这些局限性,我们引入了一个基于最新现实世界经济数据的新基准套件,以及一种利用最先进的LOP元启发式算法生成多样化高质量解集的算法方案,同时提供了评估质量和多样性的指标。实验在传统单解场景和新引入的多解场景下对所提出的基准套件进行了结果报告。

英文摘要

The Linear Ordering Problem (LOP) is a fundamental combinatorial optimization problem with important applications in areas such as economics, social choice, and machine learning. Its most prominent use is the triangulation of economic input-output tables, which helps identify critical industries in an economy. Most existing algorithms have been evaluated on benchmarks derived from outdated macroeconomic data, which no longer reflect the structure of contemporary economies. Furthermore, LOP instances often exhibit many distinct global optima that can differ substantially from one another, creating challenges for applications that rely on a single solution. To address these limitations, we introduce a novel benchmark suite derived from up-to-date real-world economic data and an algorithmic scheme that leverages state-of-the-art LOP metaheuristics to generate diverse sets of high-quality solutions, together with metrics for assessing both quality and diversity. We report experimental results on the proposed benchmark suite under both the traditional single-solution setting and the newly introduced multi-solution scenario.
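A tiny instance shows both the objective and the multiple-optima issue. The sketch below enumerates all permutations, which is only viable for very small matrices, and it uses Kendall tau distance as one possible diversity measure. The abstract does not name its metrics, so that choice and the 3×3 matrix are assumptions.

```python
from itertools import permutations

def lop_value(matrix, order):
    """Sum of the entries above the diagonal after permuting rows and columns by order."""
    n = len(order)
    return sum(matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n))

def all_optima(matrix):
    """Exhaustively collect every ordering that attains the best objective value."""
    best, optima = None, []
    for order in permutations(range(len(matrix))):
        value = lop_value(matrix, order)
        if best is None or value > best:
            best, optima = value, [order]
        elif value == best:
            optima.append(order)
    return best, optima

def kendall_tau_distance(p, q):
    """Number of item pairs that p and q rank in opposite order."""
    position_in_q = {item: idx for idx, item in enumerate(q)}
    n = len(p)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if position_in_q[p[i]] > position_in_q[p[j]]
    )

# Made-up 3-sector flow table: sectors 0/1 and 0/2 are tied, sector 1 should precede 2.
matrix = [
    [0, 5, 3],
    [5, 0, 4],
    [3, 2, 0],
]

best_value, optima = all_optima(matrix)
spread = kendall_tau_distance(optima[0], optima[-1])
```

Here three of the six orderings are optimal, and two of them disagree on two of the three pairs. That illustrates why a single reported optimum can be misleading and why a set of diverse solutions carries more information.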

2605.30432 2026-06-08 math.DS cs.LG cs.SI nlin.AO physics.soc-ph 版本更新

Learning effective models from network dynamics data with multiple initial conditions using weak form SINDy

使用弱形式SINDy从多初始条件的网络动力学数据中学习有效模型

Moyi Tian, Daniel A. Messenger, Vanja Dukic, Nancy Rodríguez, David M. Bortz

Affiliations: Department of Applied Mathematics, University of Colorado, Boulder, CO 80309, United States; Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, United States

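AI summary: The paper uses Weak Form Sparse Identification of Nonlinear Dynamics (WSINDy) to learn effective models from network dynamics data with multiple initial conditions, and evaluates how the noise level and the number of trajectories affect learning accuracy.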

Comments
24 pages, 14 figures, 1 table. Code available at https://github.com/Moyi-Tian/WSINDy-NetworkDynamics
AI中文摘要

社会系统由通过社交互动相互影响的个体网络组成。研究这些网络上的过程演化有助于我们更好地理解社会行为模式。我们研究了一个耦合线上和线下社交活动的系统,并探讨如何使用弱形式稀疏非线性动力学识别(WSINDy)方法直接从数据中学习有效模型,该方法用于发现控制方程。我们使用网络上的随机交互过程的平均场近似模型生成的数据评估学习性能,并测试在不同噪声水平下系统恢复的准确性。结果表明,当噪声较高时,使用更多轨迹可以提高准确性,但只需少量额外轨迹即可获得大部分收益,之后改进甚微。我们还从网络上的平均随机数据中学习有效的常微分方程模型。当传统的平均场近似失效时,直接从随机过程中识别连续常微分方程能够生成更符合数据的有效模型,并更深入地理解潜在动力学。

英文摘要

Social systems consist of networks of individuals who influence one another through social interactions. Studying how processes evolve on these networks can help us better understand patterns of social behavior. We study a system that couples online and offline social activity and investigate how to learn effective models directly from data using Weak Form Sparse Identification of Nonlinear Dynamics (WSINDy), a method for discovering governing equations. We assess learning performance using data generated by a mean-field approximation model of a stochastic interaction process on networks and test how accurately the system can be recovered under different noise levels. Our results show that using more trajectories improves accuracy when noise is high, but only a small number of additional trajectories is needed to gain most of the benefit, with little improvement beyond that. We also learn effective ODE models from averaged stochastic data on networks. When traditional mean-field approximations fail, identifying continuum ODEs directly from stochastic processes yields efficient models that better match the data and provide deeper insight into the underlying dynamics.

2605.25645 2026-06-08 cs.DC cs.AI 版本更新

Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines

在Google Cloud TPU上微调和服务Gemma 4 31B:与GPU基线的技术比较

Jatin Kishnani, Mayank Goel, Amit Singh, Pulkit Agrawal, Sairanjan Mishra

Affiliations: Google Cloud

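AI summary: The paper presents the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, provides code-level adaptations alongside an empirical comparison with GPU platforms, and shows that TPUs have an advantage in training speed and cost.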

AI中文摘要

我们首次端到端展示了在TPU硬件上微调和服务Google的Gemma 4 31B模型,提供了TPU与GPU平台在大语言模型适配上的实证比较。使用LoRA在Google TPU v5p-8上进行训练,在TPU v6e-8(Trillium)上进行推理,我们记录了将基于PyTorch、HuggingFace TRL和FSDP的GPU原生训练配方移植到JAX + Tunix/Qwix栈所需的全部代码级适配。这些适配涵盖网格配置、LoRA模块命名约定、分片注释修正、梯度检查点、数据管道重构以及自定义的Orbax到safetensors检查点合并过程。对于推理,我们详细描述了在v6e-8上服务Gemma 4所需的vLLM-TPU Docker设置,并刻画了由此产生的延迟和吞吐量特征。与相同超参数下的2xH100 GPU基线相比,TPU训练完成速度快1.61倍,成本低2.12倍。推理吞吐量在平台间差异在3%以内,而TPU的首令牌延迟低2倍(235 ms vs. 475 ms)。总体而言,对于代表性的训练加服务工作负载,TPU配置便宜1.82倍。我们的工作填补了开放工具生态系统中的关键空白,为从业者提供了可复现、生产就绪的Gemma 4在TPU基础设施上部署的配方。

英文摘要

We present the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, providing an empirical comparison of TPU and GPU platforms for large language model adaptation. Using LoRA on a Google TPU v5p-8 for training and TPU v6e-8 (Trillium) for inference, we document the full set of code-level adaptations required to port a GPU-native training recipe, built on PyTorch, HuggingFace TRL, and FSDP, to the JAX + Tunix/Qwix stack. These adaptations span mesh configuration, LoRA module naming conventions, sharding annotation corrections, gradient checkpointing, data pipeline restructuring, and a custom Orbax-to-safetensors checkpoint merging procedure. For inference, we detail the vLLM-TPU Docker setup necessary to serve Gemma 4 on v6e-8 and characterize the resulting latency and throughput profile across a QPS sweep spanning 512 to 16k input tokens. Across both workloads we compare performance and cost against a similarly priced 2xH100 GPU baseline running identical hyperparameters. The TPU completes training 1.61x faster at 2.12x lower cost. For inference, TPU v6e-8 matches GPU at short context (<=2048 tokens) and decisively outperforms at long context: 66% higher throughput and 23.6x faster TTFT at 4096-token inputs (61 ms vs 1,443 ms at QPS=4). Our work closes a critical gap in the open tooling ecosystem and provides practitioners with a recipe for Gemma 4 Dense 31B deployment on TPU infrastructure.

2603.11075 2026-06-08 cs.AR cs.AI 版本更新

VeriHGN: Heterogeneous Graph-Based Congestion Prediction for Chip Layout Verification

VeriHGN: 基于异构图的芯片布局验证中的拥堵预测

Runbang Hu, Bo Fang, Bingzhe Li, Yuede Ji

Affiliations: The University of Texas at Arlington; The University of Texas at Dallas

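AI summary: The paper proposes VeriHGN, a framework that unifies circuit components and spatial grids in an enhanced heterogeneous graph, modeling the interaction between logical intent and physical realization more faithfully and improving the accuracy and correlation of congestion prediction.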

Comments
Accepted at KDD 2026
AI中文摘要

随着非常大规模集成电路(VLSI)设计在规模和复杂性上持续增长,布局验证已成为现代电子设计自动化(EDA)工作流程中的核心挑战。在实践中,拥堵只能在详细布线后才能被准确识别,这使得传统验证既耗时又昂贵。因此,学习方法被探索以实现早期阶段的拥堵预测并减少布线迭代。然而,尽管先前的方法结合了网表连接性和布局特征,但它们通常以松散耦合的方式建模这两个方面,并主要产生数值拥堵估计。我们提出VeriHGN,一个基于增强异构图的验证框架,将电路组件和空间网格统一到单一关系表示中,从而实现更准确的逻辑意图与物理实现的交互建模。在工业基准测试中,包括ISPD2015、CircuitNet-N14和CircuitNet-N28,实验表明,VeriHGN在预测准确性和相关性度量方面均优于现有最先进方法。

英文摘要

As Very Large Scale Integration (VLSI) designs continue to scale in size and complexity, layout verification has become a central challenge in modern Electronic Design Automation (EDA) workflows. In practice, congestion can only be accurately identified after detailed routing, making traditional verification both time-consuming and costly. Learning-based approaches have therefore been explored to enable early-stage congestion prediction and reduce routing iterations. However, although prior methods incorporate both netlist connectivity and layout features, they often model the two in a loosely coupled manner and primarily produce numerical congestion estimates. We propose VeriHGN, a verification framework built on an enhanced heterogeneous graph that unifies circuit components and spatial grids into a single relational representation, enabling more faithful modeling of the interaction between logical intent and physical realization. Experiments on industrial benchmarks, including ISPD2015, CircuitNet-N14, and CircuitNet-N28, demonstrate that VeriHGN achieves the best or near-best performance over state-of-the-art methods in prediction accuracy and correlation metrics.