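arXivDaily daily academic digest: synchronized with the full arXiv data set, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems.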

2606.20331 2026-06-19 cs.DS cs.CC New submission

Computing Twin-Width via Treedepth and Vertex Integrity

Robert Ganian, Mathis Rocton

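AI summary: This paper proves that approximating twin-width is fixed-parameter tractable when parameterized by treedepth, and that computing twin-width exactly is fixed-parameter tractable when parameterized by vertex integrity, which gives the first non-trivial parameterized algorithm for computing optimal contraction sequences.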

Comments A short version of this preprint appeared at STACS 2026

DOI: 10.4230/LIPIcs.STACS.2026.42

English abstract

Twin-width is a graph parameter that has become central to explaining the fixed-parameter tractability of first-order model checking across many graph classes. Despite its algorithmic importance, computing twin-width remains poorly understood: even recognizing graphs of twin-width at most four is NP-hard, and no fixed-parameter approximations parameterized by twin-width itself are known. A recent approach towards breaking this barrier focuses on first developing fixed-parameter algorithms for computing or approximating twin-width under parameterizations distinct from twin-width. Our first result establishes that approximating twin-width is fixed-parameter tractable when parameterized by treedepth, thereby breaking the long-standing barrier that all previous tractable parameterizations were based on deletion distance. The proof proceeds via oriented twin-width, yielding the first constructive evidence that this variant may be easier to handle algorithmically. As our second main result, we show that computing twin-width exactly is fixed-parameter tractable with respect to vertex integrity. This constitutes the first non-trivial parameterized algorithm for computing optimal contraction sequences.

2606.20202 2026-06-19 cs.DS New submission

Tight Algorithm and Hardness for Submodular Linear Ordering

Evan Abboud, Roy Schwartz

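AI summary: For the Minimum Linear Ordering Problem with a general submodular function, the paper gives a polynomial-time O(√(n/ln n))-approximation algorithm and a matching information-theoretic lower bound: no algorithm that evaluates f a polynomial number of times can achieve an o(√(n/ln n))-approximation.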

Comments 25 pages. Accepted to the 53rd International Colloquium on Automata, Languages, and Programming (ICALP 2026)

English abstract

We consider the Minimum Linear Ordering Problem: given a ground set $N$ of cardinality $n$ and a non-negative set function $f\colon 2^N\rightarrow \mathbb{R}_{\geq 0}$, the goal is to find an ordering $\pi$ of $N$ that minimizes the sum of the values of $f$ over all prefixes of $\pi$. This problem has been studied for various classes of set functions, and the case of a submodular $f$ is of special interest, as it captures classic problems including Minimum Linear Arrangement and Minimum Containing Interval Graph. In this work, we resolve the approximability of the Minimum Linear Ordering Problem for a general submodular $f$ by establishing matching upper and lower bounds and present: $(1)$ a polynomial-time algorithm achieving an $O(\sqrt{n/\ln n})$-approximation; and $(2)$ a matching information-theoretic hardness result, showing that no algorithm evaluating $f$ a polynomial number of times can achieve an $o(\sqrt{n/\ln n})$-approximation. Previously, the best known hardness of approximation was $2$, and an $O(\sqrt{n/\ln n})$-approximation was known only for the special case where $f$ is both submodular and symmetric.
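
To make the objective in this abstract concrete, here is a minimal sketch of the prefix-sum cost and an exhaustive search over orderings. It is not the paper's approximation algorithm. The function names, the choice to sum over non-empty prefixes, and the toy submodular function (the cut function of a 4-vertex path) are assumptions made for illustration only.

```python
from itertools import permutations

def ordering_cost(f, order):
    """Sum of f over all non-empty prefixes of the ordering (assumed convention)."""
    prefix = set()
    total = 0
    for item in order:
        prefix.add(item)
        total += f(frozenset(prefix))
    return total

def brute_force_mlop(ground_set, f):
    """Exhaustive search over all n! orderings; only feasible for tiny ground sets."""
    return min(permutations(ground_set), key=lambda order: ordering_cost(f, order))

# Toy submodular function (assumption): cut function of the path graph 0-1-2-3.
EDGES = [(0, 1), (1, 2), (2, 3)]

def cut(S):
    """Number of path edges with exactly one endpoint in S."""
    return sum((u in S) != (v in S) for u, v in EDGES)

best = brute_force_mlop(range(4), cut)
print(best, ordering_cost(cut, best))
```

With a graph cut function as `f`, this cost equals the total edge length of the corresponding linear arrangement, which is how the problem captures Minimum Linear Arrangement.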

2606.18679 2026-06-19 cs.DS cs.GT cs.LG math.OC New submission

Fair Online Resource Allocation

Christopher En, Yuri Faenza, Andrea Lodi, Gonzalo Muñoz

Affiliations: Columbia University, IEOR Department; Cornell Tech; Universidad de Chile

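AI summary: Studies fairness in online resource allocation and proposes an algorithm based on dual mirror descent that enforces fairness constraints within batches and achieves sublinear regret; experiments on refugee data examine the trade-off between welfare and fairness.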

Comments 30 pages, 4 figures. To appear in the proceedings of EC 2026

English abstract

We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

2606.20539 2026-06-19 cs.DB cs.DS Cross-list

Caching for Dollars, Not Hits: An Exact Offline Reference for Cloud-Egress Caching and the Crossover That Decides When It Pays

Madhulatha Mandarapu, Sandeep Kunkunuru

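AI summary: For caching where the cost is cloud-storage egress fees rather than latency, the paper gives an exact polynomial-time offline optimum for uniform-size pages, finds that LRU's dollar-regret rises with miss-cost dispersion while cost-aware GreedyDual cuts it sharply, and gives a closed-form crossover that decides when cost-aware caching is needed.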

Comments 6 pages, 3 figures. Code, benchmarks, and full pre-registration: https://github.com/samyama-ai/cloud-egress-cache

English abstract

When a cache miss fetches from cloud object storage, the bill is per GET request and per byte of egress, not latency. Classic caching minimizes the miss rate, the wrong objective: a rarely but expensively fetched object can cost thousands of times more dollars than a frequently but cheaply fetched one. Generalized-caching theory bounds the miss-cost objective, but no reported benchmark measures how far deployed heuristics sit from the dollar-optimal offline policy on real cloud prices. We supply that reference. For uniform-size page caches with heterogeneous miss costs the offline dollar-optimum is exact in polynomial time via an integral interval linear program -- validated against brute force; variable sizes are NP-hard, so we extend the flow-based offline bound from the hit-ratio objective to dollars (cost-FOO), tight to about four percent. Against this reference we find: (i) a heterogeneity-regret law -- LRU's dollar-regret rises with miss-cost dispersion (Spearman 0.87) while cost-aware GreedyDual cuts it to roughly a tenth; (ii) a contention frontier -- GreedyDual's residual regret collapses to near zero exactly when the budget fits the expensive working set, and is the open slice otherwise; and (iii) a closed-form crossover s* = GET_fee/egress_rate (about 4 KB on S3, 330 B on GCS) that predicts which deployments need dollar-aware caching at all. On a real Twitter trace the price vector alone moves the workload across s*, shifting the regime as predicted. The artifact is a reproducible billing-faithful benchmark; heuristics and bounds it builds on are prior work, credited.
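
The closed-form crossover s* = GET_fee/egress_rate in this abstract can be checked with a few lines of arithmetic. The sketch below assumes a miss costs one GET fee plus a per-byte egress charge, as the abstract describes. The price figures are illustrative assumptions, not values taken from the paper or from a current price list, and the names `crossover_bytes` and `miss_cost` are hypothetical.

```python
GB = 10**9  # bytes per (decimal) gigabyte, assumed billing unit

# Illustrative price points (assumptions, in US dollars).
PRICES = {
    "S3":  {"get_fee": 0.0004 / 1_000,  "egress_rate": 0.09 / GB},
    "GCS": {"get_fee": 0.0004 / 10_000, "egress_rate": 0.12 / GB},
}

def crossover_bytes(get_fee, egress_rate):
    """Object size at which the per-request fee equals the egress charge."""
    return get_fee / egress_rate

def miss_cost(size_bytes, get_fee, egress_rate):
    """Dollar cost of one miss: one GET request plus egress for the object."""
    return get_fee + size_bytes * egress_rate

for name, p in PRICES.items():
    s_star = crossover_bytes(p["get_fee"], p["egress_rate"])
    print(f"{name}: s* is about {s_star:.0f} bytes")
```

Under these assumed prices the crossover lands near 4.4 KB for S3 and near 333 B for GCS, in line with the "about 4 KB" and "330 B" figures quoted above. For objects much smaller than s* the request fee dominates the miss cost; for much larger objects the egress term does.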

2606.19393 2026-06-19 cs.DM cs.DS math.CO Cross-list

An alternative way of defining finite graphs

Maxim Nazarov

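AI summary: Proposes a complete graph invariant, the "linear notation of a graph", as an alternative definition of finite graphs, intended to simplify drawing graphs with their symmetries and comparing graphs for isomorphism.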

Journal ref Prikl. Diskr. Mat., 2015, no. 3(29), 83-94

2606.20082 2026-06-19 math.OC cs.DS cs.LG Cross-list

Beyond Averaging in John Ellipsoid Approximation: High-Accuracy Algorithms in the Leverage-Score Model

Xiaoyu Li, Junwei Yu, Jiaojiao Jiang, Junbin Gao, Andi Han

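AI summary: This paper separates three costs in John ellipsoid approximation algorithms (certification, identification, and accuracy), shows that the accuracy dependence is only doubly logarithmic, and proposes an accelerated method and a damped Newton method that achieve high-accuracy approximation in the leverage-score model.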

English abstract

The John ellipsoid of a symmetric polytope $P=\{\mathbf{x}\in\mathbb{R}^d:\|\mathbf{A}\mathbf{x}\|_\infty\le1\}$, $\mathbf{A}\in\mathbb{R}^{n\times d}$, is computed by a long line of leverage-score algorithms, from Cohen, Cousins, Lee and Yang (COLT 2019) to its successors [WY24, CLS+25], all reaching a $(1+\varepsilon)$-approximation in $\Theta(\varepsilon^{-1}\log(n/d))$ iterations. We separate this complexity into three costs the modern line conflates (certification, identification, and accuracy) and locate the historical $\varepsilon^{-1}$ in the first alone. In the equivalent D-optimal-design form $\min_{\mathbf{p}\in\Delta_n}-\log\det(\sum_i p_i\mathbf{a}_i\mathbf{a}_i^\top)$, the leverage-score oracle is exactly the first-order oracle and the $(1+\varepsilon)$-John guarantee the Frank-Wolfe gap $g(\mathbf{p})\le\varepsilon d$; through this dictionary the costs come apart. The $\varepsilon^{-1}$ is a certification artifact: the uniform average of the iterates, the certificate used throughout the line, has gap exactly $\Theta(1/T)$, however cheap each iteration is made. Pointed instead at the last iterate the same oracle is fast: a warm-started accelerated method reaches the guarantee in $C(\mathbf{A})+O(\sqrt{\kappa}\log(1/\varepsilon))$ queries after an $\varepsilon$-independent setup $C(\mathbf{A})$, and once the optimal face is identified the facial problem is an unconstrained self-concordant minimization whose Hessian the oracle recovers exactly, so damped Newton needs only $O(\log\log(1/\varepsilon))$ steps, for a total of $C(\mathbf{A})+O(d^2\log\log(1/\varepsilon))$ queries. The accuracy dependence is thus doubly logarithmic after an $\varepsilon$-independent, condition-dependent setup; the open problem is the remaining identification cost (a condition-free bound on reaching the optimal face) and lower bounds. Accuracy is not the obstruction.
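
The correspondence between leverage scores and the Frank-Wolfe gap stated in this abstract can be spelled out in a short derivation. This is a sketch from the standard D-optimal-design calculus, not text from the paper; the symbol $M(\mathbf{p})$ is introduced here for convenience and $M(\mathbf{p})$ is assumed invertible.

```latex
\begin{align*}
f(\mathbf{p}) &= -\log\det M(\mathbf{p}), \qquad
  M(\mathbf{p}) = \sum_i p_i\,\mathbf{a}_i\mathbf{a}_i^\top, \\
\frac{\partial f}{\partial p_i}(\mathbf{p}) &= -\,\mathbf{a}_i^\top M(\mathbf{p})^{-1}\mathbf{a}_i, \\
\sum_i p_i\,\mathbf{a}_i^\top M(\mathbf{p})^{-1}\mathbf{a}_i
  &= \operatorname{tr}\!\bigl(M(\mathbf{p})^{-1}M(\mathbf{p})\bigr) = d, \\
g(\mathbf{p}) &= \max_{\mathbf{q}\in\Delta_n}
  \langle \nabla f(\mathbf{p}),\, \mathbf{p}-\mathbf{q}\rangle
  = \max_i \mathbf{a}_i^\top M(\mathbf{p})^{-1}\mathbf{a}_i \;-\; d, \\
g(\mathbf{p}) \le \varepsilon d
  &\iff \max_i \mathbf{a}_i^\top M(\mathbf{p})^{-1}\mathbf{a}_i \le (1+\varepsilon)\,d .
\end{align*}
```

Each gradient coordinate is a (negated) leverage-type quantity, which is why a leverage-score oracle acts as a first-order oracle for this objective.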

2606.19763 2026-06-19 math.PR cs.DS Cross-list

Optimal Sparsification of Gaussian Processes

Shivam Nadimpalli

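AI summary: For suprema of centered Gaussian processes, the paper proves an optimal dimension-free sparsification theorem that improves the previous result by an exponential factor, and shows that the dependence on ε is tight.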

Comments 38 pages, 1 figure

English abstract

We prove an optimal dimension-free sparsification theorem for suprema of centered Gaussian processes. Given a bounded set $T\subseteq\mathbb{R}^n$, we show that the supremum of the canonical Gaussian process on $T$ can be $L^2$-approximated by the supremum of a shifted subprocess indexed by only $\exp(O(1/\varepsilon^2))$ points, with error at most $\varepsilon$ times the Gaussian width of $T$. In particular, the size of the approximating process is independent of both the ambient dimension and the cardinality of the original index set. This improves a recent sparsification theorem of De, Nadimpalli, O'Donnell, and Servedio (2026) by an exponential factor, and we show that the dependence on $\varepsilon$ is tight up to constants in the exponent. As consequences, we obtain an exponentially improved junta theorem for norms over Gaussian space and sharpen results on learning, property testing, and polyhedral approximation of convex sets under the Gaussian measure. The proof is based on an interpolation argument that combines Sudakov's minoration with the Brascamp--Lieb inequality.

2606.01183 2026-06-19 cs.DC cs.DB cs.DS cs.PF Version update

The World's Fastest Matching Engine Algorithm

Jake Yoon

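AI summary: Proposes two data-structure contributions, the Priority-Indicated Node (PIN) and neighbor-aware tree operations, which eliminate the pointer-chasing and root-to-leaf search latency in order books and achieve sub-microsecond tail latency at tens of millions of messages per second.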

Comments 20 pages, 5 figures, 7 tables

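AI abstract

Every electronic exchange relies on an order book whose storage layer determines matching latency. The dominant implementation, linked lists chained through a balanced tree, imposes two costs on every operation: pointer-chased traversal to reach the insertion point, and root-to-leaf search to locate the target price level. Under micro-bursts these costs produce tail-latency spikes that degrade market quality precisely when liquidity is most needed. We present two data-structure contributions that eliminate these costs. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, each slot carrying a per-slot indicator of the entry's global priority. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves the insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. The second addresses a broader inefficiency: balanced search trees search from root to leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors, as in ordered event streams, incremental index maintenance, and electronic trading. Neighbor-aware insertion and deletion use known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, and apply uniformly to red-black, AVL, and B/B+-tree variants. A single CPU core sustains 32 million order messages per second at sub-microsecond tail latency under micro-bursts of millions of messages per second, 5-11 times faster than the best open-source matching engines on the same hardware. Scaled out to a single 96-core instance, the engine sustains 640 million messages per second across 10,000 symbols.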

English abstract

A single CPU core sustains 32 million order messages per second at sub-microsecond median end-to-end host-path response latency, 4.7-11 times faster than the best available open-source matching engines on identical hardware. Scaled out, a single 96-core commodity server (~$1,630/month) sustains ~640 million messages per second across 10,000 symbols, over 20 times the provisioned capacity of the U.S. consolidated quote feed. We reach these numbers by attacking the storage layer that sets matching latency. The dominant order-book implementation, linked lists chained through a balanced tree, imposes two costs on every operation: pointer-chased traversal to the insertion point, and root-to-leaf search to locate the target price level. Under micro-bursts these costs produce tail-latency spikes that degrade market quality precisely when liquidity is most needed. We present two data-structure contributions that eliminate them. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, with indicators encoding the entry's global priority status. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. A depth-aware capacity model sizes each PIN so hot entries fit within L1 residency. The second targets a broader inefficiency: balanced search trees search from root to leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors, which in electronic trading are available at zero cost. Neighbor-aware insertion and deletion use known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, across red-black, AVL, and B+-tree variants.
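
The neighbor-aware insertion idea in this abstract can be illustrated on a plain, unbalanced binary search tree. The sketch below is not the paper's implementation: it omits the single-path rebalancing step and the PIN structure entirely, and the `Node` layout and the `insert_after` helper are hypothetical names. It relies on a standard BST fact: if a node has a right child, its in-order successor is the leftmost node of that right subtree and therefore has no left child.

```python
class Node:
    """BST node that also keeps in-order neighbour links (hypothetical layout)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.prev = self.next = None

def insert_after(pred, key):
    """Attach `key` as the in-order successor of `pred` with O(1) reference writes.

    Assumes pred.key < key, and key < pred.next.key when pred.next exists.
    No root-to-leaf search is performed and no rebalancing is done.
    """
    node = Node(key)
    succ = pred.next
    if pred.right is None:
        pred.right = node
        node.parent = pred
    else:
        # succ is the leftmost node of pred's right subtree, so its left slot is free.
        succ.left = node
        node.parent = succ
    # Splice the new node into the in-order neighbour chain.
    node.prev, node.next = pred, succ
    pred.next = node
    if succ is not None:
        succ.prev = node
    return node

def inorder(root):
    """Recursive in-order traversal, used only to check the result."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = Node(10)
n20 = insert_after(root, 20)
n15 = insert_after(root, 15)
n17 = insert_after(n15, 17)
print(inorder(root))
```

A balanced variant would follow the attach step with rebalancing along the path from the new node toward the root, which is the part the abstract calls single-path rebalancing.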

2501.09293 2026-06-19 cs.DS Version update

Non-Splitting Coflow Scheduling with Provable Guarantees in Heterogeneous Parallel Networks

Chi-Yeh Chen

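AI summary: For non-splitting coflow scheduling in heterogeneous parallel networks, the paper proposes a unified polynomial-time approximation algorithm that minimizes the makespan, and gives approximation ratios for pure EPS, pure not-all-stop OCS, and pure all-stop OCS environments.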

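AI abstract

As a prominent network abstraction, coflow effectively captures communication patterns in data centers. Since coflow scheduling in large-scale data centers is $\mathcal{NP}$-hard, the existing literature has mainly focused on limited environments with $m=2$ network cores and relies on flow splitting, which introduces substantial operational overhead. Crucially, no approximation algorithm with provable performance guarantees has been proposed for the more practical non-splitting coflow scheduling problem, even for the $m=2$ case, let alone for general hybrid architectures. To bridge this critical gap, this paper studies the non-splitting problem in a hybrid, heterogeneous parallel network with multiple network cores ($m \ge 2$) composed of Electronic Packet Switches (EPS), not-all-stop Optical Circuit Switches (OCS), and all-stop OCS. We propose a unified polynomial-time approximation algorithm that minimizes the makespan in the hybrid environment without incurring any splitting overhead. Let $\tau$ denote the maximum flow degree across all ports in the network, $N$ the number of input/output ports, and $m$ the number of network cores. In a pure EPS environment, the algorithm achieves an approximation guarantee of $\min\left\{\tau, 2Nm+1\right\}$. For pure not-all-stop and pure all-stop OCS environments, the guaranteed ratios are $2\min\left\{\tau, 2Nm+1\right\}$ and $2\min\left\{2\tau-1, 2Nm+\tau\right\}$, respectively. Notably, when specialized to the $m=2$ setting, our algorithm sheds the dependence on network size, yielding constant bounds of $2$ and $4$ for pure EPS and pure not-all-stop OCS, respectively, and $2\tau+2$ for pure all-stop OCS. Using these component bounds, we show that the overall performance guarantee in a hybrid architecture is bounded above by that of the worst-performing switch architecture in the network.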

English abstract

As a prominent network abstraction, coflow models efficiently capture communication patterns in data centers. Since coflow scheduling in large-scale data centers is $\mathcal{NP}$-hard, the existing literature has predominantly focused on limited environments with $m=2$ network cores, relying on flow splitting, which introduces substantial operational overhead. Crucially, no approximation algorithm with provable performance guarantees has been proposed for the more practical, non-splitting coflow scheduling problem, even for the $m=2$ case, let alone for general hybrid architectures. To bridge this critical gap, this paper investigates the non-splitting problem within a hybrid, heterogeneous parallel network featuring multiple network cores ($m \ge 2$) composed of Electronic Packet Switches (EPS), not-all-stop Optical Circuit Switches (OCS), and all-stop OCS. We propose three unified polynomial-time approximation algorithms that minimize the makespan and the total weighted coflow completion time across this hybrid environment without incurring any splitting overhead. Let $\tau$ denote the maximum flow degree across all ports in the network, and let $m$ be the number of network cores. To minimize the makespan, our algorithm achieves an approximation ratio of $2\min\left\{2\tau-1, m+\tau-1\right\}$ in the hybrid architecture. To minimize the total weighted coflow completion time, our algorithm achieves an approximation ratio of $16\min\left\{2\tau-1, 2m+\tau-1\right\}$ in the hybrid architecture. Moreover, we characterize the approximation ratios of our algorithm under different architectural combinations.

2509.15069 2026-06-19 eess.SP cs.DS cs.NA math.NA Version update

Efficient Computation of Time-Index Powered Weighted Sums Using Cascaded Accumulators

Deijany Rodriguez Linares, Oksana Moryakova, Håkan Johansson

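AI summary: Proposes a method that uses cascaded accumulators to compute time-index powered weighted sums efficiently, reducing the number of multiplications from K×N to K+1 constant multiplications with no need to store data blocks, which suits real-time sample-by-sample processing systems.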

Comments This work has been submitted to the IEEE for possible publication

Journal ref IEEE Signal Processing Letters, vol. 33, pp. 893-897, Feb. 2026
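
The cascaded-accumulator idea summarized in this entry can be illustrated for the first power of the time index only. The sketch below is an assumption-laden illustration, not the structure from the paper, which this listing does not show: it computes the sum of n·x[n] over n = 0..N-1 with two running sums and a single multiplication at the end, without storing the samples. The function name is hypothetical.

```python
def first_moment_cascaded(samples):
    """Return (sum of x[n], sum of n * x[n]) for n = 0..N-1.

    Uses two cascaded accumulators and one final multiplication,
    processing the input sample by sample without buffering it.
    """
    acc0 = 0   # running sum of the samples
    acc1 = 0   # running sum of acc0 (second accumulator in the cascade)
    count = 0  # number of samples seen so far (N at the end)
    for x in samples:
        acc0 += x
        acc1 += acc0
        count += 1
    # x[j] is added into acc1 exactly (N - j) times, so
    # acc1 = N * sum(x) - sum(j * x[j]).
    return acc0, count * acc0 - acc1

total, weighted = first_moment_cascaded([3, 1, 4])
print(total, weighted)
```

Higher powers of the index would need further accumulator stages and a fixed set of combining constants, which is presumably where the K+1 constant multiplications in the summary come from.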

2509.19598 2026-06-19 cs.IT cs.DS math.IT Version update

Efficient $\varepsilon$-approximate minimum-entropy couplings

Spencer Compton

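AI summary: For the minimum-entropy coupling problem on discrete probability distributions, the paper gives an algorithm with running time n^{O(poly(1/ε)·exp(m))} that achieves H(ALG) ≤ H(OPT) + ε, which yields a polynomial-time approximation scheme for constant m.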