PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation
Minghao Shao, Nouhaila Innan, Hariharan Janardhanan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique
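AI summary: Proposes the PennySynth framework, which uses retrieval-augmented generation and code-aware embeddings over a dataset of 13,389 PennyLane instruction-code pairs to reach 52%-68% pass@5 on QHack competition challenges, markedly improving the structural validity and functional correctness of generated quantum code.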
Comments: 11 pages, 3 figures
The growing complexity of quantum programming frameworks has exposed a critical limitation in existing large language model (LLM)-based code assistants: general-purpose models hallucinate PennyLane-specific gate names, misplace device configurations, and produce structurally invalid circuits when faced with specialized quantum coding challenges. We present PennySynth, a retrieval-augmented generation framework that addresses this gap by conditioning LLM inference on a curated knowledge base of 13,389 PennyLane instruction-code pairs, built via a three-stage extraction, verification, and deduplication pipeline over official PennyLane repositories, community GitHub sources, and QHack competition archives. PennySynth introduces a code-aware embedding strategy using st-codesearch-distilroberta-base, a model trained for natural-language-to-code retrieval, which raises average retrieval cosine similarity from 0.45 with a general-purpose baseline to 0.726. Evaluated across 74 challenges spanning three years of the QHack competition (2022, 2023, 2024), PennySynth achieves 64%, 68%, and 52% pass@5 on QHack 2022, 2023, and 2024, respectively, improving over Claude Sonnet 4.6 without retrieval by +28, +25, and +28 percentage points. We further introduce a quantum-adapted CodeBLEU metric that upweights qml.* token patterns and show that structural code similarity and functional correctness capture distinct aspects of quantum code quality. Controlled ablations reveal that code-aware embeddings are the primary driver of retrieval performance, while dataset expansion and source composition provide additional gains when retrieval quality is sufficiently precise.
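The abstract reports pass@5 but does not say how it is computed. As a rough illustration only, the sketch below uses the unbiased pass@k estimator that is common in code-generation benchmarks, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a challenge and c is the number that pass its tests. The function names and the per-challenge counts are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one challenge.

    n: samples generated, c: samples that passed, k: evaluation budget.
    Assumption: this is the standard estimator, not necessarily the
    exact procedure used by the PennySynth authors.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    # Probability that a random size-k draw contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per challenge."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts for four challenges, for illustration only.
example_results = [(5, 0), (5, 2), (5, 5), (10, 1)]
example_score = mean_pass_at_k(example_results, k=5)
```

When exactly five samples are drawn per challenge (n = k = 5), the estimator reduces to a simple check: a challenge counts as solved if at least one of its five samples passes.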