arXivDaily: arXiv daily academic digest, updated Monday through Friday
cs.DS Data Structures (14)
2606.12179 2026-06-11 cs.DS math.NA New submission

Nearly Instance Optimal Sparse Matrix Approximation from Matrix-Vector Products

Christopher Musco, Indu Ramesh

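AI summary: Studies the problem of learning a sparse approximation to an implicit matrix using only matrix-vector product queries, proposes a unified framework based on degeneracy, proves nearly tight bounds on the query complexity, and gives polynomial-time algorithms.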

English abstract

A large body of work studies the problem of learning an approximation to an implicit matrix $A\in \mathbb{R}^{m\times n}$ that is accessible only via matrix-vector product queries (matvec queries) of the form ${x} \rightarrow {A}{x}$ or ${x} \rightarrow {A}^T{x}$. Of particular interest are methods that learn a near-optimal approximation with a fixed sparsity pattern. For example, we might want to learn a near-optimal diagonal, banded, or arrow-head approximation to an implicit matrix $A$. Naturally, the number of matvec queries required to solve this problem depends on the sparsity pattern, which can be encoded as a binary matrix ${S}\in \{0,1\}^{m\times n}$. The query complexity of previous algorithms scales with quantities like the total number of ones in ${S}$, its maximum column/row sparsity, or the chromatic number of its "conflict graph". These quantities are incomparable: for a given ${S}$, parameterizing by one might yield lower query complexity than another. In this work, we unify and tighten these prior results by providing a nearly sharp characterization of the matvec query complexity of sparse matrix approximation. Generalizing a definition from graph algorithms, let the degeneracy, $\mathrm{degen}({S})$, denote the smallest number $k$ so that, if we iteratively delete all rows and columns of ${S}$ with $\leq k$ ones, we are left with an empty matrix. We show that a near-optimal approximation to $A$ with sparsity pattern $S$ can be learned with $\tilde{O}(\mathrm{degen}({S}))$ matrix-vector product queries, and $\Omega(\mathrm{degen}({S}))$ queries are necessary, for any sparsity pattern ${S}$. Moreover, unlike prior work based on graph coloring, all of our methods run in polynomial time.
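
The degeneracy defined in the abstract can be computed directly from its definition. The following is a minimal brute-force sketch, not the paper's algorithm; the function names `peels_to_empty` and `degeneracy` and the list-of-lists encoding of the pattern $S$ are assumptions made for illustration.

```python
def peels_to_empty(S, k):
    """Return True if repeatedly deleting every row and column with <= k
    remaining ones leaves an empty matrix."""
    rows = set(range(len(S)))
    cols = set(range(len(S[0]))) if S else set()
    while True:
        # Count ones only among rows/columns that have not been deleted yet.
        dead_rows = {i for i in rows if sum(S[i][j] for j in cols) <= k}
        dead_cols = {j for j in cols if sum(S[i][j] for i in rows) <= k}
        if not dead_rows and not dead_cols:
            break
        rows -= dead_rows
        cols -= dead_cols
    return not rows and not cols


def degeneracy(S):
    """Smallest k for which the peeling process empties S (brute force over k)."""
    m = len(S)
    n = len(S[0]) if S else 0
    for k in range(max(m, n) + 1):
        if peels_to_empty(S, k):
            return k


# Hypothetical 4x4 and 3x3 example patterns.
diagonal = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
arrowhead = [[1 if (i == 0 or j == 0 or i == j) else 0 for j in range(4)]
             for i in range(4)]
dense = [[1] * 3 for _ in range(3)]
print(degeneracy(diagonal), degeneracy(arrowhead), degeneracy(dense))
```

The arrowhead pattern illustrates why the parameters are incomparable: its first row and column are fully dense, yet peeling the sparse rows and columns first empties the matrix with a small $k$.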

2606.11974 2026-06-11 cs.DS cs.DC New submission

Near-Optimal Distributed 2-Ruling Sets on Graphs with Low Arboricity

Malte Baumecker, Rustam Latypov, Yannic Maus, Jara Uitto

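AI summary: For graphs of low arboricity, presents a nearly optimal randomized algorithm in the LOCAL model that computes a 2-ruling set w.h.p. in $O(\log \log n)$ rounds, an exponential improvement that nearly matches the lower bound.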

English abstract

Given a graph $G=(V,E)$, a $\beta$-ruling set is a subset of nodes $S\subseteq V$ that is independent, and each node in $V$ is at distance at most $\beta$ from some node in $S$. In this paper, we present almost optimal distributed algorithms for finding $2$-ruling sets in the classical LOCAL model. Our main contribution is a randomized algorithm that w.h.p. computes a $2$-ruling set on any $n$-node graph with bounded arboricity in $O(\log \log n)$ rounds. In fact, the algorithm works up to arboricity $O(\log\log n)$, improves exponentially over the prior state of the art that can be achieved by combining [Barenboim, Elkin, Pettie, Schneider; JACM'16], [Ghaffari; SODA'16], and [Bisht, Kothapalli and Pemmaraju; PODC'14], and nearly matches the lower bound of $\Omega(\log \log n / \log \log \log n)$ [Balliu, Brandt, Kuhn, Olivetti; FOCS'20]. The domination parameter $\beta=2$ is optimal for algorithms with runtime $\log^{o(1)}n$: on graphs with arboricity $2$, there is a lower bound of $\Omega(\sqrt{\log n})$ rounds for MIS (i.e., $\beta = 1$) [Khoury, Schild; FOCS'25]. Additionally, we obtain improved algorithms for larger arboricity. For general graphs with arboricity $\alpha$, we present a randomized algorithm that computes a $2$-ruling set in $\widetilde{O}(\log^{5/8} \alpha +\log^{5/3} \log n)$ rounds. This improves exponentially over the state of the art for a large range of non-constant arboricity. Our techniques extend beyond distributed computing. We present an $O(\log \log \log n)$-round algorithm in the low-space Massively Parallel Computation (MPC) model that w.h.p. computes a $2$-ruling set on any graph with arboricity up to $2^{\mathrm{poly}(\log \log n)}$, improving exponentially over the state of the art from [Kothapalli, Pai, Pemmaraju; FSTTCS'20] combined with [Fischer, Giliberti, Grunau; SPAA'23].
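
The definition of a $\beta$-ruling set can be checked mechanically. The sketch below is a sequential verifier only, not the distributed algorithm from the paper; the name `is_ruling_set` and the adjacency-dictionary encoding are assumptions made for illustration.

```python
from collections import deque


def is_ruling_set(adj, S, beta):
    """Check that S is independent and every node is within distance beta of S.

    adj maps each node to a list of its neighbors (undirected graph).
    """
    S = set(S)
    # Independence: no edge may join two nodes of S.
    if any(v in S for u in S for v in adj[u]):
        return False
    # Multi-source BFS from S gives each node's distance to the nearest S-node.
    dist = {u: 0 for u in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return all(v in dist and dist[v] <= beta for v in adj)


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_ruling_set(path, {2}, 2), is_ruling_set(path, {2}, 1))
```

On the path, the single middle node is a 2-ruling set but not a 1-ruling set; a 1-ruling set is exactly a maximal independent set (MIS), which is the $\beta = 1$ case mentioned in the abstract.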

2606.11820 2026-06-11 math.OC cs.DS New submission

On finding exact solutions of linear programs in the oracle model

Daniel Dadush, László A. Végh, Giacomo Zambelli

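AI summary: Presents an algorithm for solving linear programs in the oracle model that finds exact solutions with bounds stated in terms of a geometric condition number, without relying on bit-complexity parameters.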

English abstract

We consider linear programming in the oracle model: $\max\{c^\top x \,:\, x\in P\}$, where the polyhedron $P=\{x\in\mathbb{R}^n\,:\, Ax\le b\}$ is given by a separation oracle. We present an algorithm that finds exact primal and dual solutions using $O(n^2\log(n/\delta))$ oracle calls and $O(n^4\log(n/\delta)+n^5\log\log(1/\delta))$ arithmetic operations, where $\delta$ is a geometric condition number associated with the system $(A,b)$. These bounds do not depend on the cost vector $c$ and do not require a priori knowledge of $\delta$. For rational data, $\log(1/\delta)$ is polynomially bounded in the encoding size of $(A,b)$, thus providing a polynomial-time algorithm. The algorithm works in a black box manner, requiring a subroutine for approximate primal and dual solutions; the above running times are achieved when using the cutting plane method of Jiang, Lee, Song, and Wong (STOC 2020) for this subroutine. Whereas approximate solvers may return primal solutions only, we develop a general framework for extracting dual certificates based on the work of Burrell and Todd (Math. Oper. Res. 1985). Our algorithm strengthens results by Grötschel, Lovász, and Schrijver (Prog. Comb. Opt. 1984), and by Frank and Tardos (Combinatorica 1987) that rely on bit-complexity arguments. Our algorithm avoids rounding-based arguments such as simultaneous Diophantine approximation and uses geometric arguments instead.
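
To make the access model concrete, the sketch below shows what a separation oracle for $P=\{x : Ax\le b\}$ returns. It is an illustration under assumptions: in the oracle model the algorithm does not see $A$ and $b$, whereas here the oracle is built from an explicit system purely for demonstration, and the name `separation_oracle` and the tolerance `tol` are hypothetical.

```python
def separation_oracle(A, b, x, tol=1e-9):
    """Return None if A x <= b holds (x is in P); otherwise return the pair
    (a_i, b_i) of a most violated constraint, which separates x from P."""
    worst, worst_violation = None, tol
    for a_i, b_i in zip(A, b):
        violation = sum(a * xi for a, xi in zip(a_i, x)) - b_i
        if violation > worst_violation:
            worst, worst_violation = (a_i, b_i), violation
    return worst


# Hypothetical example: the unit square 0 <= x1, x2 <= 1.
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [1, 1, 0, 0]
print(separation_oracle(A, b, [0.5, 0.5]))
print(separation_oracle(A, b, [2.0, 3.0]))
```

Each call to such a function counts as one oracle call in the complexity bounds quoted in the abstract.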

2606.11760 2026-06-11 cs.DS cs.CR cs.DB New submission

A Fast Gaussian Mechanism under Continual Observation, with Applications

Rasmus Pagh, Sia Sejer

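AI summary: For privately releasing a vector under continual updates, proposes a constant-time sampling method based on Brownian bridges for fast Gaussian noise generation, and applies it to differentially private CountSketches to improve orthogonal range counting queries and join size estimation.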

English abstract

We consider the problem of privately releasing a $k$-dimensional vector under updates: Starting with a zero vector, at times $t_1, t_2,\dots$ the vector is updated by adding $x^{(1)}, x^{(2)},\dots$, respectively. For positive integers $T$, $k$ we model the updates as a data set $\{(t_i, x^{(i)})\}_i$, where $t_i \in [T]$ and $x^{(i)} \in B_k$ (the $k$-dimensional unit ball). Two such data sets are said to be neighboring if their symmetric difference has size at most $1$. The continual release consists of the sum $A^{(t)} = \sum_{i \;: \; t_i \leq t} x^{(i)}$ for each time step $t=1,\dots,T$. Classical continual release techniques allow us to release an approximation of $A^{(1)},\dots,A^{(T)}$ with additive noise of magnitude $\text{polylog}(T)$, computed in time $O(kT)$, even in the on-line, adaptive case where data is continually revealed for the current time step. Motivated by private sketching techniques, we consider the setting where only a subset of entries in $A^{(t)}$ need to be released at time step $t$. Our new result is that it is possible to sample any desired entry in a given noise vector in constant time while reproducing exactly the distribution of the binary tree mechanism with Gaussian noise. The improvement on the known time bound of $O(\log T)$ comes from a new data structure that allows us to sample a new noise value with the correct correlations in constant time using Brownian bridges. We present two data management applications, of independent interest, that use our technique in conjunction with differentially private CountSketches: 1) A dynamic data structure for orthogonal range counting queries with a better privacy/accuracy/space trade-off than previous data structures, and 2) Join size estimation, where in addition we show improved high-probability bounds.
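
For orientation, the sketch below shows the classical baseline that the abstract improves on: the binary tree mechanism, in which the noise added to the prefix sum $A^{(t)}$ is the sum of independent Gaussian samples attached to the dyadic intervals that tile $[1, t]$, so each lookup touches $O(\log T)$ nodes. This is not the paper's constant-time Brownian-bridge data structure. The scalar ($k=1$) setting, the names `dyadic_nodes`, `TreeNoise`, and `continual_release`, and the unspecified noise scale `sigma` (which would have to be calibrated to the privacy parameters) are assumptions made for illustration.

```python
import random


def dyadic_nodes(t):
    """Decompose the prefix [1, t] into disjoint dyadic intervals (start, length)."""
    nodes, start = [], 1
    for bit in reversed(range(t.bit_length())):
        length = 1 << bit
        if t & length:
            nodes.append((start, length))
            start += length
    return nodes


class TreeNoise:
    """Lazily sampled Gaussian noise for the nodes of the binary tree."""

    def __init__(self, sigma, seed=0):
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.node_noise = {}  # (start, length) -> Gaussian sample

    def prefix_noise(self, t):
        total = 0.0
        for node in dyadic_nodes(t):  # one node per set bit of t
            if node not in self.node_noise:
                self.node_noise[node] = self.rng.gauss(0.0, self.sigma)
            total += self.node_noise[node]
        return total


def continual_release(updates, T, sigma, seed=0):
    """updates: list of (t_i, x_i) scalar updates with 1 <= t_i <= T."""
    noise = TreeNoise(sigma, seed)
    by_time = {}
    for t_i, x_i in updates:
        by_time[t_i] = by_time.get(t_i, 0.0) + x_i
    released, running = [], 0.0
    for t in range(1, T + 1):
        running += by_time.get(t, 0.0)  # exact prefix sum A^(t)
        released.append(running + noise.prefix_noise(t))
    return released


print(dyadic_nodes(6))
print(continual_release([(1, 1.0), (3, 0.5)], T=4, sigma=1.0))
```

Because consecutive prefixes share tree nodes, their noise values are correlated; reproducing exactly these correlations while sampling a single entry in constant time is the improvement the abstract describes.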

2606.11701 2026-06-11 cs.DS New submission

Beyond Frequency Marching: Orbit Recovery in Dihedral and Projected Multireference Alignment

Tait Weicht, Alexander S. Wein

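AI summary: For the dihedral and projected variants of multireference alignment, gives the first polynomial-time algorithms, which use the method of moments and recursively decompose the problem to recover the signal.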

Comments
58 pages
English abstract

Multireference alignment (MRA) is the task of recovering a hidden "signal" vector, given many noisy copies that have been cyclically shifted by unknown offsets. This task belongs to the class of orbit recovery problems, in which the observed samples are affected by some group action. These problems have a variety of practical motivations, including the reconstruction of 3-dimensional molecular structure from cryogenic electron microscopy (cryo-EM) images. We consider two variants of MRA: dihedral MRA, where the cyclic group is replaced by the dihedral group, allowing for reversals of the vector in addition to shifts; and projected MRA, where the observations are passed through a projection operator akin to the tomographic projection present in cryo-EM. We apply the method of moments and aim to recover the signal from the third moment tensor of the samples. This inverse problem is well understood for basic MRA, but for the variants we consider there is no polynomial-time algorithm known to succeed for generic signals. We give the first such algorithm for both of these variants. Our method requires the signal length to be a power of two, and recursively subdivides the problem into smaller problems of half the size. The algorithm's success for generic signals is proven, conditional on a conjecture about the rank of a certain symbolic matrix of polynomials. For any given problem size, this conjecture can be verified on a computer.
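
The group actions in the two basic settings can be illustrated in a few lines. The sketch below is noise-free and does not implement the paper's third-moment algorithm; it only shows the cyclic and dihedral actions on a signal and checks that one simple statistic, the circular autocorrelation, is unchanged by both. The function names and the example signal are assumptions made for illustration.

```python
def cyclic_shift(x, s):
    """Cyclically shift x by s positions."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def reverse(x):
    """Reflection i -> -i (mod n), the extra generator of the dihedral group."""
    n = len(x)
    return [x[(-i) % n] for i in range(n)]


def autocorrelation(x):
    """Circular autocorrelation a[k] = sum_i x[i] * x[(i + k) mod n]."""
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


# Hypothetical signal whose length is a power of two.
signal = [3, 1, 4, 1, 5, 9, 2, 6]
observed = cyclic_shift(reverse(signal), 5)  # one dihedral group element
print(autocorrelation(observed) == autocorrelation(signal))
```

Invariant statistics of this kind determine the signal at best up to the group action, which is why the task is phrased as recovering an orbit rather than the signal itself.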

2606.11469 2026-06-11 cs.DS cs.LG math.ST New submission

Density estimation for Hellinger via minimum-distance estimators: mixtures of Gaussians, log-concave, and more

Spencer Compton, Jerry Li

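AI summary: Extends the minimum-distance estimator approach from total variation distance to Hellinger distance via reverse data processing inequalities, obtaining near-linear-time learning of mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances) with near-optimal sample complexity.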

English abstract

We study the task of density estimation, where we hope to accurately estimate a probability density from $n$ samples. A textbook method for density estimation in total variation distance is the minimum-distance estimator approach, where both the algorithm and the analysis follow merely from bounding the VC dimension of a particular concept class (the so-called Yatracos class). While this technique originally yielded sharp guarantees primarily for total variation distance, in this work we extend the minimum-distance estimator approach to learning in Hellinger distance. Our main observation is that we may produce an analogous recipe for Hellinger (where we only require bounding the VC dimension of a related concept class) by drawing connections to recent results yielding reverse data processing inequalities. This recipe is flexible enough to accommodate fast algorithms originally designed for total variation distance; by modifying the approach of Acharya et al. (2017) we obtain the first near-linear-time algorithm for learning classes including univariate mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances), with near-optimal sample complexity.
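
The two distances contrasted in the abstract are easy to compare numerically. The sketch below uses discrete distributions as a stand-in for densities and assumes the normalization $H^2(p,q) = \tfrac12\sum_i(\sqrt{p_i}-\sqrt{q_i})^2$; under that convention the standard inequalities $H^2 \le \mathrm{TV} \le \sqrt{2}\,H$ hold. The function names and example distributions are assumptions made for illustration.

```python
import math


def total_variation(p, q):
    """TV(p, q) = (1/2) * sum_i |p_i - q_i| for discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance with H^2 = (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2,
    so that 0 <= H <= 1 (an assumed normalization)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


p = [0.5, 0.5]
q = [0.9, 0.1]
tv, h = total_variation(p, q), hellinger(p, q)
print(tv, h, h * h <= tv <= math.sqrt(2) * h)
```

Since $\mathrm{TV}$ can be as small as $H^2$, a guarantee in Hellinger distance is in general stronger than the same numerical guarantee in total variation.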

2606.11448 2026-06-11 cs.DS cs.CC cs.IT New submission

A Unified Lower Bound on the Noisy Query Complexity of Boolean Functions

Yuzhou Gu, Xin Li, Yinzhan Xu

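AI summary: For the noisy query model, proves a general lower bound on the noisy query complexity of Boolean functions based on degree statistics of subgraphs of the Boolean hypercube, unifying and improving existing results and resolving an open problem posed by Gu, Li and Xu.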

Comments
COLT 2026
English abstract

We study the query complexity of Boolean functions $f: \{0, 1\}^n \rightarrow \{0, 1\}$ in the noisy query model introduced by Feige, Raghavan, Peleg and Upfal [SICOMP 1994]. In this model, an algorithm can adaptively query the bits of an input vector, but each query result is independently flipped with constant probability $p \in (0, 1/2)$; repeated queries are allowed. The noisy query complexity $\mathsf{N}_p(f)$ of a function $f$ is defined as the minimum expected number of queries needed to compute $f(x)$ with error probability at most $1/3$, for the worst case input $x$. We prove a general lower bound on $\mathsf{N}_p(f)$ based on degree statistics of certain subgraphs of the Boolean hypercube. This is the first general lower bound beyond those implied by the simple observation that $\mathsf{N}_p(f)$ is lower bounded by the randomized query complexity. We show that this recovers (up to a constant factor) most previously known lower bounds on the noisy query complexity of Boolean functions, providing a unified framework for understanding these results and simplifying the proofs in several cases. Furthermore, this resolves in the affirmative an open problem of Gu, Li and Xu [COLT 2025] that $\mathsf{N}_p(f) = \Omega(\mathsf{I}(f) \log \mathsf{I}(f))$, where $\mathsf{I}(f)$ denotes the total influence of $f$. We also apply our general lower bound to obtain tight bounds on the noisy query complexity for several new functions.
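
The noisy query model is simple to simulate. The sketch below implements an oracle that flips each answer independently with probability $p$, together with the naive strategy of repeating each query and taking a majority vote; it illustrates the model only and is unrelated to the paper's lower-bound technique. The class and function names, the repetition count, and the fixed seed are assumptions made for illustration.

```python
import random


class NoisyOracle:
    """Answers bit queries on x, flipping each answer independently w.p. p."""

    def __init__(self, x, p, seed=0):
        self.x, self.p = x, p
        self.rng = random.Random(seed)
        self.queries = 0  # total number of noisy queries issued

    def query(self, i):
        self.queries += 1
        flip = self.rng.random() < self.p
        return self.x[i] ^ flip


def majority_bit(oracle, i, reps):
    """Estimate x[i] by majority vote over reps repeated queries (reps odd)."""
    ones = sum(oracle.query(i) for _ in range(reps))
    return int(2 * ones > reps)


def noisy_or(oracle, n, reps):
    """Naive OR: estimate bits one by one, stopping at the first estimated 1."""
    return int(any(majority_bit(oracle, i, reps) for i in range(n)))


oracle = NoisyOracle([0, 0, 1, 0], p=0.1)
print(noisy_or(oracle, 4, reps=201), oracle.queries)
```

The naive strategy spends `reps` queries per bit; the noisy query complexity $\mathsf{N}_p(f)$ asks how few expected queries any adaptive strategy can get away with.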

2606.11437 2026-06-11 cs.DS cs.AI cs.LG stat.ML New submission

The Power of Test-Time Training for Approximate Sampling

Noah Golowich, Ankur Moitra, Dhruv Rohatgi

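AI summary: Formalizes test-time training (TTT) as the problem of sampling from a distribution in a known class, proves a quadratic lower bound on the query complexity, and shows that the bound can be circumvented when the size of the class is suitably bounded, providing a theoretical framework for TTT.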

English abstract

Efficiently sampling from a complex probability distribution is a fundamental problem which has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from LLMs have been proposed to solve challenging reasoning problems. The efficacy of such sampling algorithms is limited, however, by the relationship between the LLM and the particular sampling task at hand, which has motivated the framework of test-time training (TTT). TTT works by updating a model's weights in response to partial generations and reward feedback received at inference time, thus adapting to the particular problem. In this work, we propose a formalization for TTT as the problem of producing a sample from a given probability measure $\mu^\star$ belonging to a known class ${F}$ of distributions, given an oracle $\hat \mu$ which yields approximate density estimates for $\mu^\star$. This is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989): namely, when ${F}$ is the class of all distributions, it coincides exactly with the aforementioned counting-to-sampling reduction. In this paper, we first show a quadratic lower bound on the query complexity of sampling from $\mu^\star$ given query access to $\hat \mu$ (for sufficiently large classes ${F}$), thus showing that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010) is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of ${F}$ is bounded appropriately. As we discuss, this latter result can be viewed as an abstraction of TTT, and thus represents a starting point for the development of a principled theoretical framework for TTT.

2606.11283 2026-06-11 cs.DS cs.LG stat.ML New submission

Fixed-Parameter Tractability of Private Synthetic Data Generation

Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi

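AI summary: Studies synthetic data generation under differential privacy, establishes fixed-parameter tractability parameterized by the treewidth of the query family's incidence graph, and presents two algorithms with optimal error rates.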

English abstract

We study the problem of generating synthetic data under differential privacy. We establish fixed-parameter tractability (FPT) for this problem where the parameter is the treewidth of the query family's incidence graph. Our algorithms attain optimal error rates across all regimes and are realized by two different approaches: the first is based on linear programming (LP) and the FPT of the separation problem for the LP dual; the second is based on a subsampled private multiplicative weights method, where we obtain FPT for sampling from Gibbs distributions. Both approaches are unified by a dynamic programming framework over a tree decomposition.

2606.01183 2026-06-11 cs.DC cs.DB cs.DS cs.PF Updated version

The World's Fastest Matching Engine Algorithm

Jake Yoon

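AI summary: Proposes two data-structure techniques, the Priority-Indicated Node (PIN) and neighbor-aware tree operations, that remove the latency of pointer chasing and root-to-leaf search from order books, achieving sub-microsecond tail latency and throughput of tens of millions of messages per second.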

Comments
20 pages, 5 figures, 7 tables
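AI abstract (translated from the Chinese)

Every electronic exchange relies on an order book whose storage layer determines matching latency. The dominant implementation, linked lists chained through a balanced tree, imposes two costs on every operation: pointer-chased traversal to reach the insertion point, and root-to-leaf search to locate the target price level. Under micro-burst conditions, these costs produce tail-latency spikes that degrade market quality when liquidity is most needed. We present two data-structure contributions that eliminate these costs. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, each carrying a per-slot indicator of the entry's global priority. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. The second addresses a broader inefficiency: balanced search trees search from root to leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors, as in ordered event streams, incremental index maintenance, and electronic trading. Neighbor-aware insertion and deletion use known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, and apply uniformly to red-black, AVL, and B/B+-tree variants. A single CPU core sustains 32 million order messages per second with sub-microsecond tail latency under micro-bursts of millions of messages per second, 5-11 times faster than the best open-source matching engines on the same hardware. Scaled out to a single 96-core instance, the engine sustains 640 million messages per second across 10,000 symbols.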

English abstract

A single CPU core sustains 32 million order messages per second at sub-microsecond median end-to-end host-path response latency, 4.7-11 times faster than the best available open-source matching engines on identical hardware. Scaled out, a single 96-core commodity server (~$1,630/month) sustains ~640 million messages per second across 10,000 symbols, over 20 times the provisioned capacity of the U.S. consolidated quote feed. We reach these numbers by attacking the storage layer that sets matching latency. The dominant order-book implementation, linked lists chained through a balanced tree, imposes two costs on every operation: pointer-chased traversal to the insertion point, and root-to-leaf search to locate the target price level. Under micro-bursts these costs produce tail-latency spikes that degrade market quality precisely when liquidity is most needed. We present two data-structure contributions that eliminate them. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, with indicators encoding the entry's global priority status. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. A depth-aware capacity model sizes each PIN so hot entries fit within L1 residency. The second targets a broader inefficiency: balanced search trees search from root to leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors, which in electronic trading are available at zero cost. Neighbor-aware insertion and deletion use known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, across red-black, AVL, and B+-tree variants.
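
The idea of neighbor-aware insertion can be illustrated on a plain binary search tree. The sketch below is not the paper's implementation: it omits rebalancing, deletion, and insertion before the minimum, and the names `Node`, `insert_after`, and `in_order` as well as the `prev`/`next` neighbor links are assumptions made for illustration. It relies on a standard fact: if the in-order predecessor has a right child, then the successor is the leftmost node of that right subtree and therefore has no left child.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None  # tree links
        self.prev = self.next = None   # in-order neighbor links


def insert_after(pred, key):
    """Insert key immediately after pred in in-order position using O(1)
    reference writes and no root-to-leaf search.

    The caller guarantees pred.key < key < pred.next.key (if pred.next exists).
    """
    node = Node(key)
    succ = pred.next
    if pred.right is None:
        pred.right = node
    else:
        # succ is the leftmost node of pred's right subtree, so succ.left is free.
        succ.left = node
    # Splice the new node into the in-order neighbor list.
    node.prev, node.next = pred, succ
    pred.next = node
    if succ is not None:
        succ.prev = node
    return node


def in_order(root):
    """Recursive in-order traversal, used only to check the result."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = Node(10)
n20 = insert_after(root, 20)
n15 = insert_after(root, 15)
n17 = insert_after(n15, 17)
print(in_order(root))
```

In a balanced variant, the attach step would be followed by the single-path rebalancing that the abstract mentions.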

2605.02030 2026-06-11 cs.DB cs.DS Updated version

U-HNSW: An Efficient Graph-based Solution to ANNS Under Universal Lp Metrics

Huayi Wang, Jingfan Meng, Jun Xu

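AI summary: Proposes U-HNSW, the first graph-based method for approximate nearest neighbor search under universal $L_p$ metrics, which builds HNSW indexes on the $L_1$ and $L_2$ metrics and uses an early-termination strategy, with query times up to 2670 times shorter than MLSH.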

English abstract

Approximate nearest neighbor search under universal $L_p$ metrics (ANNS-U-$L_p$) is an important and challenging research problem, as it requires answering queries under all possible values of $p$ ($0 < p \le 2$) simultaneously without building an index for each possible value of $p$. The state-of-the-art solution, called MLSH, is a Locality-Sensitive Hashing (LSH)-based ANNS method with barely acceptable query performance. In contrast, graph-based ANNS methods, which offer significantly improved query efficiency on the ANNS-$L_p$ problem (with a fixed value of $p$), cannot be naively extended to the ANNS-U-$L_p$ problem. In this paper, we propose U-HNSW, the first graph-based method for ANNS-U-$L_p$. Our scheme uses HNSW graph indexes built on two base metrics ($L_1$ and $L_2$) to generate promising nearest-neighbor candidates, and then verifies these candidates with an early-termination strategy that substantially reduces the number of expensive $L_p$ distance computations. Experimental results show that U-HNSW not only achieves up to 2670 times shorter query times than the original MLSH implementation running on a RAM disk (up to 15 times shorter than the idealized MLSH), but also outperforms the original HNSW on the ANNS-$L_p$ problem (with a fixed value of $p$), except for a few special values of $p$.
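
The abstract does not spell out its early-termination strategy, so the sketch below shows one generic possibility under stated assumptions: while accumulating $\sum_i |x_i - q_i|^p$ for a candidate, abandon the computation as soon as the partial sum exceeds the $p$-th power of the current best distance. The function names `lp_distance_early_stop` and `verify_candidates` are hypothetical and are not taken from U-HNSW.

```python
def lp_distance_early_stop(x, q, p, bound):
    """Return the L_p distance between x and q, or None as soon as the
    partial sum shows that the distance must exceed bound."""
    limit = bound ** p
    acc = 0.0
    for a, b in zip(x, q):
        acc += abs(a - b) ** p
        if acc > limit:  # the remaining terms can only increase acc
            return None
    return acc ** (1.0 / p)


def verify_candidates(candidates, q, p, k=1):
    """Keep the k candidates closest to q, abandoning hopeless ones early."""
    best = []  # sorted list of (distance, candidate index), at most k entries
    for idx, x in enumerate(candidates):
        bound = best[-1][0] if len(best) == k else float("inf")
        d = lp_distance_early_stop(x, q, p, bound)
        if d is not None:
            best.append((d, idx))
            best.sort()
            best = best[:k]
    return best


# Hypothetical candidate set and query.
candidates = [[3.0, 4.0], [1.0, 0.0], [10.0, 10.0]]
print(verify_candidates(candidates, [0.0, 0.0], p=1.0))
print(verify_candidates(candidates, [0.0, 0.0], p=0.5))
```

Comparing partial sums against `bound ** p` avoids taking the $p$-th root for rejected candidates and is valid for any $p > 0$, including the range $0 < p \le 2$ considered in the abstract.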

2210.09899 2026-06-11 cs.DS cs.CC cs.LO Updated version

First Order Logic on Pathwidth Revisited Again

Michael Lampis

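AI summary: Studies the complexity of deciding first-order-expressible properties on graphs of bounded pathwidth and proves an elementary dependence on the formula, in contrast to the treewidth case.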

English abstract

Courcelle's celebrated theorem states that all MSO-expressible properties can be decided in linear time on graphs of bounded treewidth. Unfortunately, the hidden constant implied by this theorem is a tower of exponentials whose height increases with each quantifier alternation in the formula. More devastatingly, this cannot be improved, under standard assumptions, even if we consider the much more restricted problem of deciding FO-expressible properties on trees. In this paper we revisit this well-studied topic and identify a natural special case where the dependence of Courcelle's theorem can, in fact, be improved. Specifically, we show that all FO-expressible properties can be decided with an elementary dependence on the input formula, if the input graph has bounded pathwidth (rather than treewidth). This is a rare example of treewidth and pathwidth having different complexity behaviors. Our result is also in sharp contrast with MSO logic on graphs of bounded pathwidth, where it is known that the dependence has to be non-elementary, under standard assumptions. Our work builds upon, and generalizes, a corresponding meta-theorem by Gajarský and Hliněný for the more restricted class of graphs of bounded tree-depth.

1303.2033 2026-06-11 cs.DS cs.IT cs.NA math.IT math.NA

Extended Fourier analysis of signals

Vilnis Liepins

Comments
52 pages, 11 figures
English abstract

This summary of the doctoral thesis provides a comprehensive formulation of the Extended Discrete Fourier Transform (EDFT), derived directly from the Fourier integral and its orthogonality properties. The method is obtained by solving weighted least-squares estimators in both continuous and discrete domains, yielding an adaptive frequency-domain representation that remains fully consistent with the classical Fourier framework. In the special case of uniformly sampled data on a uniform frequency grid of the same size, the EDFT reduces exactly to the classical Discrete Fourier Transform (DFT). However, when the analysis grid exceeds the number of observed samples, EDFT circumvents conventional zero-padding by optimizing the transformation basis over the extended frequency set. This enables accurate spectral estimation from incomplete or nonuniformly sampled data. Consequently, the EDFT achieves enhanced frequency resolution in regions of strong spectral content while maintaining global resolution balance, thereby remaining consistent with the uncertainty principle. The inverse EDFT reconstructs the original signal and produces extrapolated or interpolated samples wherever spectral information is available. The EDFT requires no explicit separation of deterministic and stochastic components and accurately captures broadband, transient, and sinusoidal features simultaneously. Simulation studies confirm its robustness under nonuniform sampling, multiple Nyquist zones, missing-data conditions, and signals with mixed spectra comprising both line and continuous components. Although iterative computation of the EDFT entails higher numerical cost compared to the classical DFT, this limitation - significant in the 1990s - has been largely mitigated by modern computational resources, rendering the EDFT practical for contemporary signal analysis applications.

2410.00568 2026-06-11 cs.DS cs.DM

Approximation of Spanning Tree Congestion using Hereditary Bisection

Petr Kolman

Comments
Final DMTCS version. 9 pages
English abstract

The Spanning Tree Congestion (STC) problem is the following NP-hard problem: given a graph $G$, construct a spanning tree $T$ of $G$ minimizing its maximum edge congestion where the congestion of an edge $e\in T$ is the number of edges $uv$ in $G$ such that the unique path between $u$ and $v$ in $T$ passes through $e$; the optimal value for a given graph $G$ is denoted $STC(G)$. It is known that every spanning tree is an $n/2$-approximation for the STC problem. A long-standing problem is to design a better approximation algorithm. Our contribution towards this goal is an $O(\Delta\cdot\log^{3/2}n)$-approximation algorithm where $\Delta$ is the maximum degree in $G$ and $n$ the number of vertices. For graphs with a maximum degree bounded by a polylog of the number of vertices, this is an exponential improvement over the previous best approximation. Our main tool for the algorithm is a new lower bound on the spanning tree congestion which is of independent interest. Denoting by $hb(G)$ the hereditary bisection of $G$ which is the maximum bisection width over all subgraphs of $G$, we prove that for every graph $G$, $STC(G)\geq \Omega(hb(G)/\Delta)$.
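
The congestion of a given spanning tree follows directly from the definition above. The sketch below evaluates it for a fixed tree; it does not search for an optimal tree and is not the approximation algorithm from the paper. The name `tree_congestion`, the vertex labelling $0,\dots,n-1$, and the edge-list encoding are assumptions made for illustration.

```python
def tree_congestion(n, graph_edges, tree_edges):
    """Maximum, over tree edges e, of the number of graph edges uv whose
    unique tree path between u and v passes through e (requires n >= 2)."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    def tree_path(u, v):
        # Search the tree from u recording parents, then walk back from v.
        parent = {u: None}
        stack = [u]
        while stack:
            a = stack.pop()
            for b in adj[a]:
                if b not in parent:
                    parent[b] = a
                    stack.append(b)
        path = []
        while parent[v] is not None:
            path.append(frozenset((v, parent[v])))
            v = parent[v]
        return path

    load = {frozenset(e): 0 for e in tree_edges}
    for u, v in graph_edges:  # tree edges count toward their own congestion
        for e in tree_path(u, v):
            load[e] += 1
    return max(load.values())


# Hypothetical examples: the 4-cycle with a path tree, and K4 with a star tree.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(tree_congestion(4, cycle, [(0, 1), (1, 2), (2, 3)]))
print(tree_congestion(4, k4, [(0, 1), (0, 2), (0, 3)]))
```

$STC(G)$ is the minimum of this quantity over all spanning trees of $G$.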