arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.20490 2026-06-19 cs.MS New submission

Software package MaRDI Open Interfaces for improved interoperability in numerical optimization

软件包MaRDI开放接口:提升数值优化互操作性

Dmitry I. Kabanov, Stephan Rave, Mario Ohlberger

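AI summary: Presents the MaRDI Open Interfaces software package, which reduces coding and testing effort through a unified interface for nonlinear optimization, and demonstrates its interoperability by training a physics-informed neural network to solve the viscous Burgers' equation.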

Comments 15 pages, 1 figure, 1 table, GAMM2026

Details
AI Chinese abstract

为了解决计算科学中的互操作性挑战,我们介绍了软件包MaRDI Open Interfaces的最新更新。该软件包旨在减少计算科学家在编写数值求解器绑定以及将实验代码适配到同一问题类型(例如,基准测试哪个求解器更好)的不同求解器接口上所花费的时间和编码/测试工作。通过简化这些任务,该软件包帮助研究人员专注于其计算项目的实际本质。在这里,我们展示了一个最近开发的非线性优化接口,并说明了如何将其应用于优化问题的计算实验。作为此类问题的一个例子,我们考虑了训练物理信息神经网络以预测粘性Burgers方程的解。

English abstract

To address the challenges of interoperability in computational science, we present the latest updates to the software package MaRDI Open Interfaces. This software package aims to decrease the time and coding/testing effort that computational scientists spend on tasks such as writing bindings to numerical solvers and adapting experiment codes to the varying interfaces of solvers for the same problem type (e.g., when benchmarking which solver is better). By streamlining these tasks, the package helps researchers focus on the actual essence of their computational projects. Here, we demonstrate a recently developed interface for nonlinear optimization and illustrate how it can be applied in computational experiments with optimization problems. As an example of such a problem, we consider the training of physics-informed neural networks to predict the solutions of the viscous Burgers' equation.

2606.20410 2026-06-19 cs.MS New submission

MaRDI Open Interfaces for Interoperable Nonlinear Optimization

MaRDI 开放接口:实现可互操作的非线性优化

Dmitry I. Kabanov, Stephan Rave, Mario Ohlberger

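AI summary: Presents the MaRDI Open Interfaces software package, which improves interoperability between different solvers and programming languages in nonlinear optimization through unified interfaces for numerical problems and automatic data marshalling, reducing code-modification and testing costs.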

Comments 12 pages, 1 figure, 1 table, deRSE2026

Details
AI Chinese abstract

MaRDI开放接口是一个旨在提高科学计算互操作性的软件包,特别是针对非线性优化。为此,该包具有两个主要特点。首先,它为典型的数值问题提供统一接口,以帮助在同一问题类型的求解器之间切换。其次,它自动处理编程语言之间的数据编组。因此,计算科学家可以通过使用该包更快地进行实验,减少代码修改和测试工作。本文描述了该软件包的总体结构,并展示了非线性优化接口的示例。

English abstract

MaRDI Open Interfaces is a software package that aims to improve interoperability in scientific computing, particularly for nonlinear optimization. To this end, the package has two main features. First, it provides unified interfaces for typical numerical problems, which makes it easier to switch between solvers for the same problem type. Second, it automates data marshalling between programming languages. Hence, computational scientists can conduct experiments faster by using the package, with less code-modification and testing effort. In this work we describe the general structure of the software package and show examples with the interface for nonlinear optimization.
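
The idea of a unified interface that lets an experiment switch solvers without code changes can be illustrated with a minimal Python sketch. This is not the MaRDI Open Interfaces API: the `Optimizer` protocol, the two toy solvers, and the test problem are all hypothetical names chosen for illustration only.

```python
from typing import Callable, List, Protocol

Vector = List[float]


class Optimizer(Protocol):
    """Hypothetical common interface shared by every solver in this sketch."""

    def minimize(self, grad: Callable[[Vector], Vector], x0: Vector) -> Vector: ...


class GradientDescent:
    """Plain gradient descent with a fixed step size."""

    def __init__(self, lr: float = 0.1, iters: int = 500) -> None:
        self.lr, self.iters = lr, iters

    def minimize(self, grad: Callable[[Vector], Vector], x0: Vector) -> Vector:
        x = list(x0)
        for _ in range(self.iters):
            g = grad(x)
            x = [xi - self.lr * gi for xi, gi in zip(x, g)]
        return x


class HeavyBall:
    """Gradient descent with heavy-ball momentum."""

    def __init__(self, lr: float = 0.1, beta: float = 0.5, iters: int = 500) -> None:
        self.lr, self.beta, self.iters = lr, beta, iters

    def minimize(self, grad: Callable[[Vector], Vector], x0: Vector) -> Vector:
        x = list(x0)
        v = [0.0] * len(x)
        for _ in range(self.iters):
            g = grad(x)
            v = [self.beta * vi - self.lr * gi for vi, gi in zip(v, g)]
            x = [xi + vi for xi, vi in zip(x, v)]
        return x


def grad(x: Vector) -> Vector:
    # Gradient of f(x) = (x[0] - 1)^2 + 2 * (x[1] + 2)^2, minimized at (1, -2).
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


def run_experiment(solver: Optimizer) -> Vector:
    # The experiment code depends only on the shared interface,
    # so swapping solvers requires no changes here.
    return solver.minimize(grad, [0.0, 0.0])


results = {type(s).__name__: run_experiment(s) for s in (GradientDescent(), HeavyBall())}
for name, x in results.items():
    print(name, x)
```

Because `run_experiment` is written against the interface rather than a concrete solver, benchmarking a further solver only requires adding another class with the same `minimize` signature.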

2606.20496 2026-06-19 math.NA cs.DC cs.MS cs.NA Cross-list

Coarse Solvers for Exascale Solution of Poisson Problems

用于泊松问题百亿亿次求解的粗网格求解器

Thilina Ratnayaka, Paul Fischer, Luke Olson

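AI summary: Proposes a two-level Schwarz method to replace algebraic multigrid (AMG) as the coarse-grid solver of a p-multigrid preconditioner, achieving communication-free interpolation through a structured non-nested coarse space; experiments on the Summit/Frontier supercomputers show better performance than BoomerAMG.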

Details
AI Chinese abstract

我们提出一种两层Schwarz方法,作为代数多重网格(AMG)的替代方案,用于求解由不可压缩Navier-Stokes方程的谱/有限元离散产生的压力泊松方程的p-多重网格(pMG)预条件子的最后一层(粗网格)求解器。所提出的Schwarz方法包括原始pMG粗空间中的一个局部问题和一个全局粗问题。本文的主要贡献是为全局粗问题提出了一种新颖的、结构化的非嵌套粗空间。所提出的全局粗空间的结构化特性使得原始p-多重网格粗空间与全局粗问题之间的插值无需通信。通过在橡树岭领导计算设施的Summit/Frontier超算上使用高度可扩展的不可压缩Navier-Stokes求解器套件Nek5000/RS进行的一系列实验,我们展示了所提方法相比最先进的AMG求解器BoomerAMG的有效性。

English abstract

We present a two-level Schwarz method as an alternative to the algebraic multigrid (AMG) method used as the last-level (coarse) solver of the p-multigrid (pMG) preconditioner for the pressure Poisson equation resulting from the spectral/finite element discretization of the incompressible Navier-Stokes equations. The proposed Schwarz method consists of a local problem in the original pMG coarse space and a global coarse problem. The main contribution of the paper is a novel, structured, non-nested coarse space for the global coarse problem. The structured nature of the proposed global coarse space enables communication-free interpolation between the original p-multigrid coarse space and the global coarse problem. We demonstrate the effectiveness of the proposed method compared to the state-of-the-art AMG solver BoomerAMG through a series of experiments performed using Nek5000/RS, a suite of highly scalable incompressible Navier-Stokes solvers, on the Summit/Frontier supercomputers at the Oak Ridge Leadership Computing Facility.

2606.20358 2026-06-19 math.CV cs.MS Cross-list

Formalizing Extended Complex Numbers, Möbius Transformations, and Cross Ratio in Lean 4

在 Lean 4 中形式化扩充复数、莫比乌斯变换和交比

Fubin Yan, Kenneth W. Shum

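AI summary: Formalizes the extended complex plane, Möbius transformations, and the cross ratio in Lean 4, proving the group structure, three-point uniqueness, and cross-ratio invariance, with about 6,000 lines of verified code.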

Comments 10 pages

Details
AI Chinese abstract

扩充复平面是复分析、双曲几何和数学物理中的一个基本对象。其几何由莫比乌斯变换支配,交比作为中心不变量。我们在 Lean 4 定理证明器中形式化了这些概念。扩充复平面使用 Mathlib 的 Option 类型在 $\mathbb{C}$ 上表示,其中附加元素表示无穷远点。在此基础之上,我们定义了莫比乌斯变换、它们在扩充复平面上的作用以及交比。我们形式化了莫比乌斯变换的几个基本性质,包括它们的群结构,并将它们与射影一般线性群等同。我们还证明了将任意三个不同点映射到任意另外三个不同点的莫比乌斯变换的唯一性,以及交比的不变性。所有证明都在 Lean 4 中进行了机器检查。完整的开发包含约 6000 行 Lean 代码,包括约 40 个定义和 150 个引理与定理。这项工作为未来共形几何、双曲模型、模形式以及数学物理应用的形式化提供了经过验证的基础。

English abstract

The extended complex plane is a fundamental object in complex analysis, hyperbolic geometry, and mathematical physics. Its geometry is governed by Möbius transformations, with the cross ratio serving as a central invariant. We present a formalization of these concepts in the Lean 4 theorem prover. The extended complex plane is represented using Mathlib's Option type over $\mathbb{C}$, where the additional element represents the point at infinity. On this foundation, we define Möbius transformations, their action on the extended complex plane, and the cross ratio. We formalize several basic properties of Möbius transformations, including their group structure, and identify them with a projective general linear group. We also prove the uniqueness of a Möbius transformation mapping any three distinct points to any other three distinct points, and the invariance of the cross ratio. All proofs are machine-checked in Lean 4. The complete development comprises approximately 6,000 lines of Lean code, including about 40 definitions and 150 lemmas and theorems. This work provides a verified foundation for future formalizations of conformal geometry, hyperbolic models, modular forms, and applications in mathematical physics.
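
The invariance of the cross ratio under a Möbius transformation can be checked numerically with a small Python sketch. It is an informal floating-point check, not the Lean formalization: the function names are hypothetical, only finite points are handled (the point at infinity is left out), and the cross-ratio convention used here is one of several in common use.

```python
def mobius(a: complex, b: complex, c: complex, d: complex, z: complex) -> complex:
    """Apply z -> (a*z + b) / (c*z + d); requires a*d - b*c != 0."""
    if a * d - b * c == 0:
        raise ValueError("degenerate transformation: a*d - b*c must be nonzero")
    return (a * z + b) / (c * z + d)


def cross_ratio(z1: complex, z2: complex, z3: complex, z4: complex) -> complex:
    """Cross ratio under the (assumed) convention (z1-z3)(z2-z4) / ((z1-z4)(z2-z3))."""
    return ((z1 - z3) * (z2 - z4)) / ((z1 - z4) * (z2 - z3))


# Four distinct finite points, none of which is mapped to infinity below.
points = [0j, 1 + 0j, 1j, 2 + 1j]

# Coefficients of an arbitrary non-degenerate Möbius transformation.
a, b, c, d = 1 + 2j, 3 + 0j, 2 + 0j, 1 - 1j

before = cross_ratio(*points)
after = cross_ratio(*(mobius(a, b, c, d, z) for z in points))

# The two values agree up to floating-point rounding.
print(abs(before - after))
```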

2606.05017 2026-06-19 cs.AR cs.MS Updated version

GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity

GoldenFloat: 从GF4到GF256的基于Phi的静态拆分浮点系列及其Lucas精确整数恒等式

Dmitrii Vasilev

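AI summary: Presents GoldenFloat, a static-split floating-point family generated by a single closed-form rule, together with three concrete implementations: a multi-width RTL generator, a Lucas-exact accumulator path, and an FPGA codec.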

Comments 20 pages, single-file LaTeX, ASCII source. v2: peer-anchor updates. Adds Sarnoff P3109 (arXiv:2606.04028), AMD MXFP4 silicon (arXiv:2605.09825), NVIDIA GB10 NVFP4 measurement, companion catalog (arXiv:2606.09686), MixFP4 (arXiv:2605.31035). FL-002 expanded: (c1) GF256 bias, (c2) count drift, (g) static-split vs micro-mixing. TTSKY26a regeneration timeline added. No mathematical claims revised

Details
AI Chinese abstract

我们提出一种面向硬件的GoldenFloat(GF)描述,这是一个由单一闭式规则生成的静态拆分浮点系列,以及三个具体成果:(i)一个开放的多宽度RTL生成器,覆盖GF4-GF256,并带有针对正确舍入参考的连续积分差分扫描;(ii)一个整数支持的Lucas精确累加器路径,在n=1,...,256时以500位精度验证;(iii)一个GF16 FPGA编解码器,在Artix-7(Xilinx XC7A35T)上以323 MHz通过35/35测试台。对于每个总宽度N>=4,指数宽度e=round((N-1)/phi^2),其中小数部分f=N-1-e,phi=(1+sqrt(5))/2。该规则复现了九种格式(9/9)的已实现指数宽度,并一致扩展到GF128、GF512、GF1024。该规则与posit、takum、OCP-MX以及IEEE P3109多宽度浮点草案并列。我们不对其中任何一种提出每级精度或优越性声明。广度/工具链一致性框架被记录为一个开放猜想,并带有预注册的证伪路径。证伪分类账(FL-002)记录了开放问题及解决它们的实验。报告了日期为2026-05-31的RTL正确性勘误;制造的TTSKY26b芯片带有缺陷的乘法器组合,修正后的生成器是再生基线。

English abstract

We present a hardware-oriented description of GoldenFloat (GF), a static-split floating-point family generated by a single closed rule, and three concrete artefacts: (i) an open multi-width RTL generator covering GF4-GF256 with a continuous-integration differential sweep against a correctly-rounded reference; (ii) an integer-backed Lucas-exact accumulator path verified at 500-digit precision for n = 1, ..., 256; and (iii) a GF16 FPGA codec passing a 35-of-35 testbench at 323 MHz on Artix-7 (Xilinx XC7A35T). A format-conformance oracle (Corona) ships in the same repository and is used as the blackbox check in our continuous-integration audit. The rule and its scope. For each total width N >= 4, the exponent width is e = round((N-1)/phi^2) with fraction f = N-1-e and phi = (1+sqrt(5))/2. The rule reproduces the realised exponent widths of nine formats GF4, GF8, GF12, GF16, GF20, GF24, GF32, GF64, GF256 (9/9) and extends consistently to GF128, GF512, GF1024. The rule is positioned alongside posit (2022 Posit Standard), takum (Hunhold 2024, 2025), OCP-MX (Rouhani et al. 2023), and the IEEE P3109 multi-width float draft, all of which are width-spanning families under a parameterised rule. We make no per-rung accuracy or superiority claim against any of them. What is open. The breadth/toolchain-coherence framing is recorded as an open conjecture with a pre-registered falsification path: a matched-substrate FPGA experiment and a matched-budget software ablation. A falsification ledger (FL-002) records the open questions and the experiments that would settle them. An RTL-correctness erratum dated 2026-05-31 is reported in Section 5.5; the fabricated TTSKY26b dies carry the defective multiplier portfolio, and the corrected generator is the regeneration baseline.
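
The split rule quoted in the abstract, e = round((N-1)/phi^2) with f = N-1-e, is easy to evaluate directly. The Python sketch below does only that; the function name `gf_split` is hypothetical, and reading the one remaining bit as a sign bit is an assumption that the abstract does not state. The output is not compared against the published formats here.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, as defined in the abstract


def gf_split(n: int) -> tuple:
    """Return (exponent_width, fraction_width) for total width n >= 4.

    Implements e = round((n - 1) / phi^2) and f = n - 1 - e.
    Python's round() resolves exact ties to even, but (n - 1) / phi^2
    is irrational for every integer n > 1, so no tie can occur.
    """
    if n < 4:
        raise ValueError("the rule is stated for total widths N >= 4")
    e = round((n - 1) / PHI**2)
    f = n - 1 - e
    return e, f


# The nine widths listed in the abstract.
WIDTHS = [4, 8, 12, 16, 20, 24, 32, 64, 256]
table = {n: gf_split(n) for n in WIDTHS}

for n, (e, f) in table.items():
    print(f"GF{n}: exponent bits = {e}, fraction bits = {f}")
```

For example, `gf_split(16)` evaluates 15 / phi^2 ≈ 5.73, which rounds to 6 exponent bits and leaves 9 fraction bits.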

1501.00324 2026-06-19 cs.MS cs.CE Updated version

A New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems

一种针对有限元问题设计的新型稀疏矩阵向量乘法GPU算法

Jonathan Wong, Ellen Kuhl, Eric Darve

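AI summary: To address the GPU performance bottleneck of sparse matrix vector multiplication (SPMV) in finite element analysis, proposes a new SPMV algorithm and its variants; effective-bandwidth tests and finite element simulations of the heart show speedups of up to 12-fold over existing algorithms.

Comments 35 pages, 22 figures. Code available at: https://github.com/thejonwong/warpkernel

Journal ref Int J Numer Meth Eng 102(12):1784-1814, 2015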

Details
AI Chinese abstract

近年来,图形处理器(GPU)越来越多地被用于各种科学计算应用中。然而,CPU和GPU之间的架构差异要求开发能够利用GPU硬件的算法。由于稀疏矩阵向量乘法(SPMV)操作在有限元分析中常用,因此针对GPU上的非结构化有限元网格,开发了一种新的SPMV算法及其几种变体。针对15个不同大小和不同稀疏结构的稀疏矩阵,测量并分析了当前GPU算法和新提出算法的有效带宽。随后研究了优化效果以及新GPU算法与其变体之间的差异。最后,在心脏的GPU有限元模拟中,将新SPMV GPU算法和当前SPMV GPU算法用于GPU CG求解器,并将这些结果与并行PETSc有限元实现结果进行比较。有效带宽测试表明,对于各种稀疏矩阵,新算法与当前算法相比具有非常有利的性能,并能带来非常显著的好处。GPU有限元模拟结果证明了使用GPU进行有限元分析的优势,并且表明所提出的算法在实际有限元应用中可以实现高达12倍的加速比。

English abstract

Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several variations are developed for unstructured finite element meshes on GPUs. The effective bandwidth of current GPU algorithms and of the newly proposed algorithms is measured and analyzed for 15 sparse matrices of varying sizes and varying sparsity structures. The effects of optimization and the differences between the new GPU algorithm and its variants are then studied. Lastly, both new and current SPMV GPU algorithms are utilized in the GPU CG solver in GPU finite element simulations of the heart. These results are then compared against parallel PETSc finite element implementation results. The effective bandwidth tests indicate that the new algorithms compare very favorably with current algorithms for a wide variety of sparse matrices. GPU finite element simulation results demonstrate the benefit of using GPUs for finite element analysis, and also show that the proposed algorithms can yield speedup factors of up to 12-fold for real finite element applications.
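
For readers unfamiliar with the operation being optimized, the following Python sketch shows a sequential SPMV, y = A x, over the standard compressed sparse row (CSR) layout. It is a reference baseline only, not the GPU algorithm proposed in the paper; the function name and the small example matrix are chosen for illustration.

```python
from typing import List, Sequence


def csr_spmv(
    indptr: Sequence[int],
    indices: Sequence[int],
    data: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format.

    indptr[r]:indptr[r + 1] delimits the nonzeros of row r,
    indices holds their column numbers, and data holds their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Example matrix:
#   [[1, 0, 2],
#    [0, 3, 0],
#    [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(indptr, indices, data, x)
print(y)  # [7.0, 6.0, 19.0]
```

Each row performs little arithmetic per byte read and gathers entries of `x` through `indices`, so memory traffic rather than floating-point work usually limits performance, which is why effective bandwidth is the metric the abstract reports.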