Qiskit Code Migration with LLMs
Jose Manuel Suarez, Luis Mariano Bibbo, Joaquin Bogado, Alejandro Fernandez
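AI summary: To address the code maintenance problems caused by the version evolution of quantum software development kits, the paper proposes a hybrid approach that combines large language models with Retrieval-Augmented Generation (RAG). An automatically generated taxonomy of migration scenarios guides the models in migrating Qiskit code across versions, which reduces hallucinations and improves the quality of migration suggestions.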
Details
The rapid evolution of Quantum Development Kits (QDKs) introduces a specific form of technical debt that compromises code maintainability and hinders software reuse. In the specialized domain of Quantum Software Engineering (QSE), this challenge is intensified by the scarcity of high-quality training data and the high volatility of emerging frameworks, which often lead general-purpose Large Language Models (LLMs) to produce unreliable or hallucinated results. This paper proposes a hybrid approach integrating LLMs with Retrieval-Augmented Generation (RAG) to automate the migration of Qiskit code across versions. The proposed methodology enhances the precision and reliability of migration suggestions by using an automatically generated taxonomy of migration scenarios as a structured, version-specific knowledge source that guides the models. The approach is implemented as an automated, extensible workflow that evaluates two LLMs (Google Gemini Flash-2.5 and OpenAI Gpt-oss-20b) under two retrieval schemes (unconstrained and restrictive). Results demonstrate that the taxonomy-based RAG architecture, particularly under the restrictive scheme, significantly reduces hallucinations and improves descriptive quality, with Google Gemini Flash-2.5 showing superior performance in detecting complex refactoring scenarios. These findings confirm the potential of this data-centric methodology to foster technological independence and provide robust, intelligent assistants that mitigate API obsolescence, ensuring the long-term availability of quantum algorithms within a rapidly shifting ecosystem and flattening the learning curve within QSE.
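To make the taxonomy-guided retrieval idea concrete, the following is a minimal sketch and not the authors' workflow. The `Scenario` fields, the two sample taxonomy entries, the keyword-based `retrieve` function, and the prompt wording are all assumptions introduced for illustration. In particular, the abstract does not define the "restrictive" and "unconstrained" schemes; here they are assumed to differ only in whether the model is told to stay within the retrieved scenarios. No LLM is called; the sketch stops at prompt construction.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One row of a hypothetical migration taxonomy (field names are assumed)."""
    pattern: str   # identifier to look for in the legacy source code
    category: str  # kind of change, e.g. "removal" or "rename"
    advice: str    # suggested migration for that identifier


# Illustrative entries for a pre-1.0 to 1.0 migration; NOT the paper's taxonomy.
TAXONOMY = [
    Scenario("execute", "removal", "Use transpile() followed by backend.run()."),
    Scenario("cnot", "rename", "Use QuantumCircuit.cx() instead."),
]


def retrieve(code: str, taxonomy: list[Scenario]) -> list[Scenario]:
    """Naive keyword retrieval: keep scenarios whose pattern appears as a whole word."""
    return [
        s for s in taxonomy
        if re.search(rf"\b{re.escape(s.pattern)}\b", code)
    ]


def build_prompt(code: str, scenarios: list[Scenario], restrictive: bool) -> str:
    """Assemble the prompt; the two schemes differ only in the instruction line."""
    context = "\n".join(
        f"- [{s.category}] {s.pattern}: {s.advice}" for s in scenarios
    )
    if restrictive:
        # Assumed meaning of "restrictive": the model must not go beyond the context.
        rule = "Report ONLY scenarios listed in the context; if none apply, say so."
    else:
        # Assumed meaning of "unconstrained": the context is advisory.
        rule = "Use the context as a hint; you may also report other scenarios."
    return f"{rule}\n\nContext:\n{context}\n\nCode:\n{code}"


if __name__ == "__main__":
    legacy = "from qiskit import QuantumCircuit\nqc = QuantumCircuit(2)\nqc.cnot(0, 1)\n"
    hits = retrieve(legacy, TAXONOMY)
    print(build_prompt(legacy, hits, restrictive=True))
```

A real pipeline would likely replace the keyword match with embedding-based retrieval and send the resulting prompt to a model. The sketch only shows how a version-specific taxonomy can narrow what the model is asked to consider.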