LLM Reasoning Capabilities - arXivDaily Topic

2305.14985 2026-06-19 cs.CV cs.CL Version update 65%

IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models

Haoxuan You, Rui Sun, Zhecan Wang, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang

Affiliations: Columbia University; HKUST; University of California, Los Angeles

Topic match: complex problem solving. An LLM generates sub-questions and reasons its way to the final answer.

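AI summary: Proposes the IdealGPT framework, which uses large language models to iteratively decompose vision-language reasoning tasks. A repeating cycle of sub-question generation, sub-answer acquisition, and final-answer reasoning substantially improves multi-step reasoning performance in the zero-shot setting.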

Comments: 13 pages, 5 figures

Abstract

The field of vision-and-language (VL) understanding has made unprecedented progress with end-to-end large pre-trained VL models (VLMs). However, they still fall short in zero-shot reasoning tasks that require multi-step inferencing. To achieve this goal, previous works resort to a divide-and-conquer pipeline. In this paper, we argue that previous efforts have several inherent shortcomings: 1) They rely on domain-specific sub-question decomposing models. 2) They force models to predict the final answer even if the sub-questions or sub-answers provide insufficient information. We address these limitations via IdealGPT, a framework that iteratively decomposes VL reasoning using large language models (LLMs). Specifically, IdealGPT utilizes an LLM to generate sub-questions, a VLM to provide corresponding sub-answers, and another LLM to reason to achieve the final answer. These three modules perform the divide-and-conquer procedure iteratively until the model is confident about the final answer to the main question. We evaluate IdealGPT on multiple challenging VL reasoning tasks under a zero-shot setting. In particular, our IdealGPT outperforms the best existing GPT-4-like models by an absolute 10% on VCR and 15% on SNLI-VE. Code is available at https://github.com/Hxyou/IdealGPT
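The abstract describes a three-module loop: an LLM asks sub-questions, a VLM answers them, and a second LLM tries to answer the main question, repeating until it is confident. A minimal Python sketch of that control flow follows. It assumes each module is a plain function; the names `iterative_decompose`, `ask`, `look`, `reason`, the `max_rounds` cap, and the boolean confidence flag are hypothetical and are not taken from the authors' code at https://github.com/Hxyou/IdealGPT. The toy stubs stand in for real LLM and VLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures for the three roles named in the abstract.
# Evidence is the running list of (sub_question, sub_answer) pairs.
Evidence = List[Tuple[str, str]]
Questioner = Callable[[str, Evidence], List[str]]                 # LLM: propose sub-questions
Answerer = Callable[[str, str], str]                              # VLM: (image_ref, sub_q) -> sub_answer
Reasoner = Callable[[str, Evidence], Tuple[Optional[str], bool]]  # LLM: (answer, confident)


@dataclass
class ReasoningTrace:
    answer: Optional[str]
    confident: bool
    rounds: int
    evidence: Evidence = field(default_factory=list)


def iterative_decompose(image_ref: str, main_question: str,
                        ask: Questioner, look: Answerer, reason: Reasoner,
                        max_rounds: int = 4) -> ReasoningTrace:
    """Repeat ask -> answer -> reason until the reasoner reports confidence."""
    evidence: Evidence = []
    answer: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        # 1) Sub-questions are conditioned on all evidence gathered so far.
        for sub_q in ask(main_question, evidence):
            # 2) Each sub-question is answered against the image.
            evidence.append((sub_q, look(image_ref, sub_q)))
        # 3) Try the main question; stop early only when confident.
        answer, confident = reason(main_question, evidence)
        if confident:
            return ReasoningTrace(answer, True, round_idx, evidence)
    # Round budget (an assumption of this sketch) exhausted: keep the last attempt.
    return ReasoningTrace(answer, False, max_rounds, evidence)


# Usage with toy stubs in place of real model calls (all hypothetical).
def toy_ask(question: str, evidence: Evidence) -> List[str]:
    pool = ["What is person A holding?", "Where is person A looking?"]
    return pool[len(evidence):len(evidence) + 1]  # one new question per round


def toy_look(image_ref: str, sub_q: str) -> str:
    return {"What is person A holding?": "a menu",
            "Where is person A looking?": "at the waiter"}[sub_q]


def toy_reason(question: str, evidence: Evidence) -> Tuple[Optional[str], bool]:
    return ("ordering food", True) if len(evidence) >= 2 else (None, False)


trace = iterative_decompose("img_001.jpg", "What is person A doing?",
                            toy_ask, toy_look, toy_reason)
print(trace.answer, trace.confident, trace.rounds)  # ordering food True 2
```

Keeping the three roles as separate callables mirrors the abstract's split between sub-question generation, sub-answering, and reasoning, so any model could be plugged into each slot. The round cap is an added assumption that keeps the loop bounded when the reasoner never becomes confident.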

1702.06162 2026-06-19 cs.CR Version update 55%

Survey of Automated Vulnerability Detection and Exploit Generation Techniques in Cyber Reasoning Systems

Teresa Nicole Brooks

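Topic match: complex problem solving. A survey of automated vulnerability detection and exploit generation, which involves reasoning.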

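AI summary: Surveys the automated vulnerability detection and exploit generation techniques of Mayhem and Mechanical Phish, winning systems of the DARPA Cyber Grand Challenge, and summarizes their core methods, underlying technologies, and related work.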

Comments: This is the accepted submitted version of the paper published in the Intelligent Computing Proceedings of the 2018 Computing Conference, Volume 2.

Journal ref: Intelligent Computing: Proceedings of the 2018 Computing Conference, Vol. 2, Springer, 2019, pp. 1083-1102

DOI: 10.1007/978-3-030-01177-2_79

Abstract

Software is everywhere, from mission-critical systems such as industrial power stations and pacemakers to even household appliances. This growing dependence on technology and the increasing complexity of software have serious security implications, as they mean we are potentially surrounded by software that contains exploitable vulnerabilities. These challenges have made binary analysis an important area of research in computer science and have emphasized the need for automated analysis systems that can operate at scale, speed, and efficacy, all while performing with the skill of a human expert. Though great progress has been made in this area of research, there remain limitations and open challenges to be addressed. Recognizing this need, DARPA sponsored the Cyber Grand Challenge (CGC), a competition to showcase the current state of the art in systems that perform automated vulnerability detection, exploit generation, and software patching. This paper is a survey of the vulnerability detection and exploit generation techniques, underlying technologies, and related work of two of the winning systems, Mayhem and Mechanical Phish.
