arXivDaily: daily arXiv academic digest, updated Monday through Friday

AI Agent

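Agents, tool calling, planning, workflows, multi-agent systems, and autonomous task execution.

Entries included for today/current date: 5. Signal sources: cs.AI, cs.CL, cs.LG, cs.SE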
2606.18502 2026-06-18 cs.CL New submission 90%

Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications

Paresh Dashore, Shreyas Kulkarni, Uttam Gurram, Nadia Bathaee, Kartik Balasubramaniam, Genta Indra Winata, Sambit Sahu, Shi-Xiong Zhang

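Affiliation: Capital One

Topic match: Workflow automation (framework for customizing and deploying multi-agent systems)

AI summary: Proposes a unified framework that combines agentic model customization (continual pretraining, fine-tuning, preference optimization) with inference optimization (speculative decoding, FP8 quantization), achieving domain adaptation and a 4.48x throughput speedup while maintaining performance and improving robustness on long-tail scenarios.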

Comments Preprint

Abstract

Large language model (LLM)-based multi-agent systems demonstrate strong performance on complex reasoning and task execution, enabling broad enterprise applications. However, production deployment remains challenging due to domain-specific customization requirements and high latency and inference costs in agentic workflows. We propose a unified framework for customization and efficient deployment of multi-agent systems in real-world settings. The first stage, Agentic Model Customization, combines continual pretraining, supervised fine-tuning, and preference optimization to adapt a compact model to specialized domains while retaining strong agentic capabilities. The second stage, Inference Optimization, integrates speculative decoding and FP8 quantization with targeted calibration to enable cost-efficient serving with minimal quality loss. Across enterprise workloads, our framework enables rapid domain adaptation and achieves a 4.48x speedup in throughput while maintaining performance and improving robustness on long-tail scenarios.
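
The abstract names speculative decoding as one of the inference optimizations but does not describe the variant used. As a rough illustration only, the sketch below shows the greedy form of the technique: a cheap draft model proposes up to `k` tokens, the target model checks them, and the longest agreeing prefix is kept together with the target's own token at the first disagreement. The toy models `toy_target` and `toy_draft` are hypothetical stand-ins, not anything from the paper.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2) The target checks each position. A real server would score all
        #    positions in one batched forward pass; here we loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's token
            if expected != tok:
                break  # first mismatch: discard the rest of the proposal
        seq.extend(accepted)
    return seq


def toy_target(prefix: List[int]) -> int:
    """Hypothetical deterministic 'large' model over a 50-token vocabulary."""
    return (31 * sum(prefix) + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    """Hypothetical 'small' model that disagrees on every third position."""
    t = toy_target(prefix)
    return t if len(prefix) % 3 else (t + 1) % 50
```

Because every kept token is the target's own greedy choice, the output matches plain greedy decoding with the target model; any speedup comes from verifying several draft tokens per target pass, which this loop-based toy does not measure.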

2606.18661 2026-06-18 cs.CV cs.AI New submission 85%

LandslideAgent with Multimodal LandslideBench: A Domain-Rule-Augmented Agent for Autonomous Landslide Identification and Analysis

Chengfu Liu, Dongyang Hou, Junwu Xiang, Cheng Yang, Xuezhi Cui, Zeyuan Wang, Liangtian Liu, Zelang Miao

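Affiliation: Central South University

Topic match: Workflow automation (instruction-driven agent framework for autonomous landslide identification and analysis)

AI summary: Proposes an instruction-driven agentic framework comprising the multimodal dataset LandslideBench, the landslide-specific vision-language model LandslideVLM, and the domain-rule-augmented agent LandslideAgent, for autonomous landslide identification and analysis.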

Abstract

Intelligent landslide hazard interpretation is critical for disaster prevention, yet current paradigms struggle to simultaneously extract visual features and high-level geoscientific semantics, while general-purpose vision-language models (VLMs) suffer from perceptual limitations and domain hallucinations in complex geological scenarios. To address these challenges, we propose an instruction-driven agentic framework comprising three components. First, LandslideBench, a multimodal fine-grained dataset with seven subtype labels, high-resolution imagery, pixel-level masks, and high-quality textual descriptions, is constructed via multi-VLM cross-validation and interactive annotation. Then, LandslideVLM, a landslide-oriented VLM, is fine-tuned via LoRA on LandslideBench to enhance geological semantic understanding. Finally, LandslideAgent, a domain rule-enhanced agent taking LandslideVLM as its cognitive backbone, employs a dual-rule controller incorporating structured report metadata constraints and cross-validation identification constraints to regulate automated tool invocation. Experiments demonstrate that LandslideBench provides effective baselines across five mainstream models on fine-grained classification and semantic segmentation. LandslideVLM achieves accuracy improvements of 10.96%, 32.87%, and 15.91% on landslide discrimination, fine-grained classification, and semantic description quality, respectively. LandslideAgent further enables autonomous multi-source spatial data inference, realizing full-process intelligence for landslide identification and analysis.
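
The abstract states that LandslideVLM is fine-tuned via LoRA. For readers unfamiliar with the technique, here is a minimal pure-Python sketch of a LoRA-adapted linear layer: the pretrained weight `W` stays frozen and only a low-rank update `B @ A`, scaled by `alpha / r`, is trained. The class name, dimensions, rank, and `alpha` are illustrative assumptions and say nothing about the configuration the authors used.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                      # frozen pretrained weight, d_out x d_in
        self.scale = alpha / r
        # A gets a small random init; B starts at zero, so the adapted layer
        # initially behaves exactly like the frozen layer.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Number of parameters in A and B (W is not trained)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a `d_out x d_in` weight, the adapter trains `r * (d_in + d_out)` parameters instead of `d_out * d_in`, which is the source of the savings when `r` is much smaller than both dimensions.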

2606.18425 2026-06-18 cs.SE cs.AI cs.DC New submission 85%

From Specification to Execution: AI Assisted Scientific Workflow Management

Komal Thareja, Hamza Safri, Rajiv Mayani, Anirban Mandal, Ewa Deelman

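Affiliations: RENCI, University of North Carolina at Chapel Hill, NC, USA; Information Sciences Institute, University of Southern California, Marina del Rey, CA, USA

Topic match: Workflow automation (AI-assisted scientific workflow generation and debugging)

AI summary: Proposes an AI-assisted approach that combines specification-driven workflow generation, automated debugging, and distributed execution, integrating Pegasus with an MCP layer, for end-to-end management from natural language to large-scale scientific workflows.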

Abstract

Scientific workflow management systems (WMS) support scalable and reproducible execution of complex pipelines, but workflow design, implementation, and debugging remain largely manual and require significant expertise. Recent approaches using large language models (LLMs) show promise for workflow generation from natural language, but often rely on direct code synthesis, which limits transparency, reproducibility, and integration with workflow systems. We present an AI-assisted approach to scientific workflow management that combines specification-driven workflow generation, automated debugging, and distributed execution. The method introduces a structured specification phase that separates workflow intent, design, and implementation, allowing validation prior to code generation. We also develop an LLM-based debugging agent that diagnoses and resolves failures across multiple system layers. To support distributed execution and user interaction, we integrate Pegasus, a widely used WMS, with a Model Context Protocol (MCP) layer, providing a unified interface for workflow submission, monitoring, and control. We evaluate the approach using a federated learning workflow for medical imaging, chosen for its parallel, iterative, and dependency-intensive structure. The system generated and executed large-scale workflows with thousands of jobs, reduced debugging effort, and allowed non-expert users to construct workflows with expert-level design patterns. These results indicate that end-to-end AI-assisted workflow generation and execution is feasible, and point toward AI-driven platforms for managing the scientific workflow lifecycle.
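
The abstract describes a structured specification phase that is validated before any code is generated. One way to picture that step, under heavy assumptions, is a declarative job graph that is checked for unknown dependencies and cycles before anything is handed to a workflow system. The spec layout (`jobs`, `needs`), the function `validate_spec`, and the example `FL_SPEC` are hypothetical; this is not the Pegasus API or the authors' specification format.

```python
from graphlib import CycleError, TopologicalSorter


def validate_spec(spec):
    """Check a workflow spec and return one valid execution order.

    Raises ValueError if a job depends on an undefined job or if the
    dependencies form a cycle.
    """
    jobs = spec["jobs"]
    for name, job in jobs.items():
        for dep in job.get("needs", []):
            if dep not in jobs:
                raise ValueError(f"job {name!r} depends on unknown job {dep!r}")
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: set(job.get("needs", [])) for name, job in jobs.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


# Hypothetical one-round federated learning spec: two sites train in
# parallel, then results are aggregated and evaluated.
FL_SPEC = {
    "jobs": {
        "train_site_a": {"needs": []},
        "train_site_b": {"needs": []},
        "aggregate": {"needs": ["train_site_a", "train_site_b"]},
        "evaluate": {"needs": ["aggregate"]},
    }
}
```

Calling `validate_spec(FL_SPEC)` returns an ordering in which both training jobs precede `aggregate`, which precedes `evaluate`. Rejecting a malformed spec at this stage is cheaper than discovering the problem after thousands of jobs have been submitted.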

2606.18874 2026-06-18 cs.AI New submission 80%

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

Zijian Wang, Hanqi Li, Ziyue Yang, Zijian Hu, Shenghan Zuo, Yunzhe Zhang, Da Ma, Danyu Luo, Chenrun Wang, Jing Peng, Tiancheng Huang, Sijia Guo, Huayang Wang, Zichen Zhu, Senyu Han, Yilu Cao, Kai Yu, Lu Chen

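Affiliations: X-LANCE Lab, School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China; Suzhou Laboratory, Suzhou, China

Topic match: Workflow automation (automating scientific research workflows; externalizing synthesis and validation)

AI summary: Proposes the Xcientist harness, which externalizes research synthesis and experimental validation into inspectable, contract-governed processes, addresses claim drift in automated research, and demonstrates the approach across several domains.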

Comments 65 pages, 14 figures, 19 tables

Abstract

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.
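
The abstract defines claim drift as the situation where runnable artifacts no longer support the mechanism originally claimed. The sketch below is one hypothetical way to make that check explicit: a persistent record lists the mechanisms a claim relies on, the mechanisms actually present in the code, and those whose ablation confirmed an effect. The record fields and the `claim_drift` function are assumptions for illustration and are not taken from Xcientist.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchRecord:
    """Hypothetical persistent artifact linking a claim to its evidence."""
    claim: str
    claimed_mechanisms: set
    implemented_mechanisms: set = field(default_factory=set)
    ablations_passed: set = field(default_factory=set)


def claim_drift(record):
    """Return the claimed mechanisms that the runnable artifact does not back.

    A mechanism counts as supported only if it is both implemented and
    confirmed by an ablation. An empty result means no drift was detected.
    """
    supported = record.implemented_mechanisms & record.ablations_passed
    return record.claimed_mechanisms - supported
```

For example, a record that claims both `graph_attention` and `memory_cache` but only implements and ablates `graph_attention` would report `{"memory_cache"}` as drifted, prompting either a code repair or a narrower claim.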

2606.17510 2026-06-18 cs.SE cs.SY eess.SY New submission 75%

OmniDroneX: An LLM-Assisted Holistic Drone-as-a-Service Ecosystem

I-Ling Yen, Akeem Mohammed, Farokh Bastani, San-Yih Hwang

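Topic match: Workflow automation (LLM-assisted drone service composition and mission definition)

AI summary: Proposes OmniDroneX, a unified Drone-as-a-Service ecosystem that connects low-level physical primitives to high-level missions through the libUAV interface and the PT-SOA abstraction model, uses LLMs to assist function identification, service composition, and natural-language mission definition, and supports multiple composition techniques for scalable, self-evolving UAV systems.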

Comments This manuscript is a full version of a paper accepted in shortened form by IEEE International Conference on Joint Cloud Computing

Abstract

Despite rapid advances in UAV technologies, current deployments remain limited due to several gaps in UAV systems research. To address these challenges, we propose OmniDroneX, a unified Drone-as-a-Service ecosystem, in which drones are transitioned from fixed function platforms into dynamically composable entities that can be integrated with external infrastructures to offer omni-capabilities. OmniDroneX bridges low-level physical primitives with high-level mission intent through a unified vendor-agnostic interface (libUAV) and a formal physical-service abstraction model (PT-SOA). A core innovation is the diverse application of large language models (LLMs) across multiple layers of the OmniDroneX architecture. LLMs are used to assist in identifying and formalizing primitive device functions and abstract service definitions, supporting automated service composition and workflow generation, and enabling interactive, natural-language mission specification and refinement. OmniDroneX also incorporates important categories of composition techniques that are essential in dynamic UAV systems, including physical layer composition for drone capability augmentation, as well as spatiotemporal, functional, collaborative, exception-aware, and QoS-based service compositions. Collectively, these features allow OmniDroneX to serve as a foundation for scalable, resilient, and self-evolving UAV ecosystems operating in complex and dynamic environments.
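
Among the composition categories the abstract lists, QoS-based service composition is the easiest to illustrate in a few lines. The sketch below assumes a mission made of sequential stages, each with candidate services annotated with latency and cost, and picks the cheapest combination that fits a latency budget by brute force. The service names, the numbers, and the `compose` function are hypothetical and do not reflect the libUAV interface or the PT-SOA model.

```python
from itertools import product


def compose(stages, max_latency):
    """Pick one service per stage, minimizing cost within a latency budget.

    stages: list of lists of dicts with keys "name", "latency", "cost".
    Returns (total_cost, [service names]) or None if no combination fits.
    """
    best = None
    for combo in product(*stages):
        latency = sum(s["latency"] for s in combo)
        cost = sum(s["cost"] for s in combo)
        if latency <= max_latency and (best is None or cost < best[0]):
            best = (cost, [s["name"] for s in combo])
    return best


# Hypothetical two-stage mission: sense, then analyze.
STAGES = [
    [
        {"name": "onboard_camera", "latency": 2, "cost": 1},
        {"name": "edge_lidar", "latency": 1, "cost": 4},
    ],
    [
        {"name": "onboard_cpu", "latency": 6, "cost": 1},
        {"name": "cloud_gpu", "latency": 2, "cost": 5},
    ],
]
```

Exhaustive search is only practical for small catalogs, since the number of combinations grows multiplicatively with each stage; a real system would likely need a smarter search or heuristic.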