TAHOE: Text-to-SQL with Automated Hint Optimization from Experience
Zhiyi Chen, Jie Song, Peng Li
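AI summary: Proposes TAHOE, a system whose error-driven hint learning pipeline turns debugging traces into a structured Hint Bank, combined with a Strategy Layer that models user intent. On Spider 2.0-Snow it substantially improves Text-to-SQL performance without updating model parameters.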
Details
Large Language Models (LLMs) have democratized database access through Text-to-SQL, but moving from prototypes to production remains difficult. Real deployments must handle strict SQL dialects, massive schemas, and evolving user preferences, while supervised fine-tuning is costly and rigid and agentic test-time scaling is expensive. We present Tahoe, a system that treats prompt optimization as a dynamic data management problem. Tahoe uses an error-driven hint learning pipeline across Development and Deployment to consolidate debugging traces into a structured Hint Bank. Compiler feedback is distilled into reusable Syntax Hints for dialect-specific rules, while execution and user feedback are converted into Semantic Hints for schema- and user-specific logic. Tahoe further introduces a Strategy Layer that models conflicting user intents as competing strategies under shared natural-language triggers, with recency signals and post-learning attribution statistics that summarize empirical success, harm, inertness, and support. At inference time, Tahoe retrieves relevant hints and guides the LLM through Logic Planning followed by SQL Synthesis. We implement and evaluate the development-phase workflow, leaving deployment-time human-feedback updates for future work. On Spider 2.0-Snow, Tahoe substantially improves Text-to-SQL without updating model parameters. On 113 supervised Spider 2.0-Snow-0212 examples using GPT-5.5, Tahoe raises pass rate from 61.95 percent to 79.42 percent and pass-at-4 from 72.57 percent to 87.61 percent, achieves 100 percent Snowflake syntax pass rate, and reduces average compiler-feedback critic rounds from 2.79 to 0.12 per sampled candidate. The same Hint Bank also transfers to weaker backbones, including a 19.7 percentage-point pass-rate gain on Doubao-2.0-lite.
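As a rough illustration of the retrieval step described in the abstract, the following is a minimal sketch, not the authors' implementation. Every name here (`Hint`, `HintBank`, `retrieve`, `build_prompt`), the token-overlap matching, and the smoothed success-rate score are assumptions made for this example; the abstract does not specify how hints are stored, matched, or scored. The sketch keeps Syntax and Semantic Hints in one bank, treats hints that share a trigger as competing strategies, and keeps only the best-scoring one per trigger.

```python
from dataclasses import dataclass


@dataclass
class Hint:
    """Hypothetical hint record; field names are assumptions, not from the paper."""
    kind: str      # "syntax" (dialect rules) or "semantic" (schema/user logic)
    trigger: str   # natural-language trigger shared by competing strategies
    text: str      # guidance injected into the prompt
    success: int = 0   # times the hint helped
    harm: int = 0      # times the hint hurt
    inert: int = 0     # times the hint made no difference

    @property
    def support(self) -> int:
        # Total number of attributed observations for this hint.
        return self.success + self.harm + self.inert

    def score(self) -> float:
        # Laplace-smoothed success rate over non-inert outcomes (an assumed formula).
        return (self.success + 1) / (self.success + self.harm + 2)


def tokenize(s: str) -> set[str]:
    return set(s.lower().split())


class HintBank:
    def __init__(self) -> None:
        self.hints: list[Hint] = []

    def add(self, hint: Hint) -> None:
        self.hints.append(hint)

    def retrieve(self, question: str, k: int = 3) -> list[Hint]:
        """Return up to k hints, keeping one winning strategy per trigger."""
        q = tokenize(question)
        best: dict[str, tuple[float, Hint]] = {}
        for h in self.hints:
            overlap = len(q & tokenize(h.trigger))
            if overlap == 0:
                continue
            rank = overlap * h.score()
            current = best.get(h.trigger)
            if current is None or rank > current[0]:
                best[h.trigger] = (rank, h)
        ranked = sorted(best.values(), key=lambda p: p[0], reverse=True)
        return [h for _, h in ranked[:k]]


def build_prompt(question: str, hints: list[Hint]) -> str:
    # Two-stage instruction mirroring "Logic Planning followed by SQL Synthesis".
    lines = [f"- [{h.kind}] {h.text}" for h in hints]
    return (
        "Hints:\n" + "\n".join(lines)
        + f"\n\nQuestion: {question}\n"
        + "First write a logic plan, then write the Snowflake SQL."
    )


# Illustrative data only; these hints are invented for the example.
bank = HintBank()
bank.add(Hint("semantic", "active users last month",
              "Count users with at least one login event.", success=5))
bank.add(Hint("semantic", "active users last month",
              "Count users with a paid subscription.", success=1, harm=3))
bank.add(Hint("syntax", "date truncation month",
              "Use DATE_TRUNC('month', col) for month bucketing."))

selected = bank.retrieve("count active users last month", k=2)
prompt = build_prompt("count active users last month", selected)
```

In this toy setup the two semantic hints compete under the same trigger, and the one with the better attribution record wins. A real system would likely use embedding-based retrieval and recency signals rather than raw token overlap.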