Coset Ensemble Decoder for Quantum Error Correction with Algorithm-Hardware Co-Design
Shuang Liang, Jubo Xu, Giulio Bassanino, Qianzhou Wang, Yidong Zhou, Yuncheng Lu, Zhiwen Mo, Paul H. J. Kelly, Bo Yuan, Wayne Luk, Hongxiang Fan
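AI summary: Proposes a coset ensemble decoding algorithm that improves Union-Find decoding by explicitly exploiting logically equivalent cosets. Combined with hardware architecture optimizations, it achieves a better accuracy-latency trade-off than MWPM and UF decoders under circuit-level depolarizing noise and significantly reduces FPGA resource consumption.
Details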
- Comments: 15 pages, 19 figures, 1 table. Accepted to appear in the 53rd Annual International Symposium on Computer Architecture (ISCA 2026)
Reliable large-scale quantum computation relies on fault-tolerant architectures, where quantum error correction (QEC) continuously extracts and decodes error syndromes in real time. A critical component in QEC is the decoder, a classical subsystem that must simultaneously deliver high logical accuracy and ultra-low latency. This paper presents a novel algorithm-hardware co-design that improves the accuracy-latency trade-off over existing approaches such as vanilla Minimum-Weight Perfect Matching (MWPM) and Union-Find (UF) decoders. At the algorithmic level, we introduce coset ensemble decoding, which improves UF decoding by explicitly exploiting logically equivalent cosets. Our method performs ensemble forest exploration to generate multiple coset-consistent candidates and aggregates them to approximate coset-level maximum-likelihood decoding. We further reduce computational and memory complexity via reverse-order elimination and lossless graph compression, without sacrificing accuracy. At the hardware level, we design a domain-specific architecture that temporally reuses resources, avoiding the code-distance-proportional resource growth in prior spatial architectures. Several optimizations, such as multi-bank memory hashing and hierarchical ID mapping, are proposed to mitigate pipeline stalls and memory conflicts under highly concurrent access patterns. Under a circuit-level depolarizing noise model, our co-design approach achieves a better accuracy-latency trade-off than prior MWPM- and UF-based decoders, while reducing FPGA LUT consumption by up to 8.2 times compared with reported UF-based decoder resources. The tunable candidate number further exposes a flexible design knob, enabling users to tailor decoding performance to the requirements of different fault-tolerant workloads. Our implementation is publicly available at https://github.com/IMSeonL/coset-ensemble-decoder.
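To make the idea of aggregating candidates at the coset level concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each candidate correction is a set of decoding-graph edge IDs tagged with the logical class (coset) it belongs to, and that every edge fails independently with the same probability `p < 0.5`. The function name `aggregate_by_coset`, the candidate encoding, and the toy edge IDs are all hypothetical; the abstract does not specify the actual aggregation rule.

```python
import math
from collections import defaultdict


def aggregate_by_coset(candidates, p):
    """Pick the logical class with the largest summed candidate likelihood.

    candidates: iterable of (edges, logical_class) pairs, where `edges` is a
        collection of edge IDs forming one syndrome-consistent correction and
        `logical_class` is a hashable coset label (e.g. 0 or 1).
    p: assumed uniform per-edge error probability, 0 < p < 0.5.
    Returns (best_class, scores), where scores maps each class to the log of
    its summed (unnormalized) likelihood.
    """
    weight = math.log((1 - p) / p)  # cost of one edge in log-likelihood units
    seen = set()
    log_terms = defaultdict(list)
    for edges, logical_class in candidates:
        key = frozenset(edges)
        if key in seen:  # count each distinct correction only once
            continue
        seen.add(key)
        log_terms[logical_class].append(-weight * len(key))

    def logsumexp(values):
        top = max(values)
        return top + math.log(sum(math.exp(v - top) for v in values))

    scores = {cls: logsumexp(vals) for cls, vals in log_terms.items()}
    best_class = max(scores, key=scores.get)
    return best_class, scores


# Toy ensemble: one weight-2 candidate in class 0, three weight-3 in class 1.
toy_candidates = [
    ({1, 2}, 0),
    ({3, 4, 5}, 1),
    ({3, 4, 6}, 1),
    ({3, 7, 5}, 1),
]
high_noise_choice, _ = aggregate_by_coset(toy_candidates, p=0.4)
low_noise_choice, _ = aggregate_by_coset(toy_candidates, p=0.01)
```

In this toy ensemble, a minimum-weight rule would always return class 0, because its single candidate has the fewest edges. Summing over candidates instead selects class 1 at `p = 0.4`, where three slightly heavier corrections jointly outweigh the lighter one, and class 0 at `p = 0.01`. Adding more candidates tightens the approximation at the cost of more work, which is one way to read the tunable candidate number mentioned in the abstract.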