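arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.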



2603.22528 2026-03-25 cs.IR cs.AI

GraphRAG for Engineering Diagrams: ChatP&ID Enables LLM Interaction with P&IDs

Achmad Anggawirya Alimin, Artur M. Schweidtmann

2603.22513 2026-03-25 cs.SE cs.CL

Generating and Evaluating Sustainable Procurement Criteria for the Swiss Public Sector using In-Context Prompting with Large Language Models

Yingqiang Gao, Veton Matoshi, Luca Rolshoven, Tilia Ellendorff, Judith Binder, Jeremy Austin Jann, Gerold Schneider, Matthias Stürmer

Abstract

Public procurement refers to the process by which public sector institutions, such as governments, municipalities, and publicly funded bodies, acquire goods and services. Swiss law requires the integration of ecological, social, and economic sustainability requirements into tender evaluations in the form of criteria that a bidder has to fulfill. However, translating high-level sustainability regulations into concrete, verifiable, and sector-specific procurement criteria (such as selection criteria, award criteria, and technical specifications) remains a labor-intensive and error-prone manual task that requires substantial domain expertise across several groups of goods and services. This paper presents a configurable, LLM-assisted pipeline, delivered as software, that supports the systematic generation and evaluation of sustainability-oriented procurement criteria catalogs for Switzerland. The system integrates in-context prompting, interchangeable LLM backends, and automated output validation to enable auditable criteria generation across different procurement sectors. As a proof of concept, we instantiate the pipeline using official sustainability guidelines published by the Swiss government and the European Commission, which are ingested as structured reference documents. We evaluate the system through a combination of automated quality checks, including an LLM-based evaluation component, and expert comparison against a manually curated gold standard. Our results demonstrate that the proposed pipeline can substantially reduce manual drafting effort while producing criteria catalogs that are consistent with official guidelines. We further discuss system limitations, failure modes, and design trade-offs observed during deployment, highlighting key considerations for integrating generative AI into public sector software workflows.


2603.22510 2026-03-25 cs.DL cs.AI cs.IR

Do Large Language Models Reduce Research Novelty? Evidence from Information Systems Journals

Ali Safari

2603.22499 2026-03-25 cs.CR cs.LG

OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection

Jeffrey Flynt

Abstract

Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset -- the field's canonical benchmark -- is also static, lacks cross-surface correlation scenarios, and predates the LLM era. We present OrgForge-IT, a verifiable synthetic benchmark in which a deterministic simulation engine maintains ground truth and language models generate only surface prose, making cross-artifact consistency an architectural guarantee. The corpus spans 51 simulated days, 2,904 telemetry records at a 96.4% noise rate, and four detection scenarios designed to defeat single-surface and single-day triage strategies across three threat classes and eight injectable behaviors. A ten-model leaderboard reveals several findings: (1) triage and verdict accuracy dissociate - eight models achieve identical triage F1=0.80 yet split between verdict F1=1.0 and 0.80; (2) baseline false-positive rate is a necessary companion to verdict F1, with models at identical verdict accuracy differing by two orders of magnitude on triage noise; (3) victim attribution in the vishing scenario separates tiers - Tier A models exonerate the compromised account holder while Tier B models detect the attack but misclassify the victim; (4) rigid multi-signal thresholds structurally exclude single-surface negligent insiders, demonstrating the necessity of parallel, threat-class-specific triage pipelines; and (5) agentic software-engineering training acts as a force multiplier for multi-day temporal correlation, but only when paired with frontier-level parameter scale. Finally, prompt sensitivity analysis reveals that unstructured prompts induce vocabulary hallucination, motivating a two-track scoring framework separating prompt adherence from reasoning capability. OrgForge-IT is open source under the MIT license.


2603.22469 2026-03-25 eess.SY cs.AI cs.SY math.OC

Stability-Preserving Online Adaptation of Neural Closed-loop Maps

Danilo Saccani, Luca Furieri, Giancarlo Ferrari-Trecate

2603.22468 2026-03-25 stat.ML cs.LG math.ST stat.TH

SPDE Methods for Nonparametric Bayesian Posterior Contraction and Laplace Approximation

Enric Alberola-Boloix, Ioar Casado-Telletxea

Comments 32 pages, under review

2603.22437 2026-03-25 cs.CR cs.LG eess.SP

mmFHE: mmWave Sensing with End-to-End Fully Homomorphic Encryption

Tanvir Ahmed, Yixuan Gao, Adnan Armouti, Rajalakshmi Nandakumar

Comments Under review

2603.22401 2026-03-25 quant-ph cs.LG

Probabilistic modeling over permutations using quantum computers

Vasilis Belis, Giulio Crognaletti, Matteo Argenton, Michele Grossi, Maria Schuld

Comments 36 pages, 4 Figures

2603.22399 2026-03-25 quant-ph cs.AI cs.LG q-bio.BM

Latent Style-based Quantum Wasserstein GAN for Drug Design

Julien Baglio, Yacine Haddad, Richard Polifka

Comments Main part: 22 pages, 11 figures, 6 tables. Supplementary material: 16 pages, 15 figures, 14 tables

2603.21825 2026-03-25 cs.HC cs.AI

BadminSense: Enabling Fine-Grained Badminton Stroke Evaluation on a Single Smartwatch

Taizhou Chen, Kai Chen, Xingyu Liu, Pingchuan Ke, Zhida Sun

Journal ref In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems

2603.20328 2026-03-25 stat.ML cs.LG

Decorrelation, Diversity, and Emergent Intelligence: The Isomorphism Between Social Insect Colonies and Ensemble Machine Learning

Ernest Fokoué, Gregory Babbitt, Yuval Levental

Comments 47 pages, 13 figures, 4 tables

2603.12567 2026-03-25 cond-mat.mtrl-sci cs.LG

Foundation-Model Surrogates Enable Data-Efficient Active Learning for Materials Discovery

Jeffrey Hu, Rongzhi Dong, Ying Feng, Ming Hu, Jianjun Hu

Comments 18 pages

Abstract

Active learning (AL) has emerged as a powerful paradigm for accelerating materials discovery by iteratively steering experiments toward promising candidates, reducing the number of costly synthesis-and-characterization cycles needed to identify optimal materials. However, current AL relies predominantly on Gaussian Process (GP) and Random Forest (RF) surrogates, which suffer from complementary limitations: GP underfits complex composition-property landscapes due to rigid kernel assumptions, while RF produces unreliable heuristic uncertainty estimates in small-data regimes. This small-data challenge is pervasive in materials science, making reliable surrogate modeling extremely difficult with models trained from scratch on each new dataset. Here we propose In-Context Active Learning (ICAL), which addresses this bottleneck by replacing conventional surrogates with TabPFN, a transformer-based foundation model (FM) pre-trained on millions of synthetic regression tasks to meta-learn a universal prior over tabular data, upon which TabPFN performs principled Bayesian inference in a single forward pass without dataset-specific retraining, delivering strong small-data regression performance and well-calibrated predictive uncertainty (required for effective AL). We benchmark ICAL against GP and RF across 10 materials datasets and TabPFN wins on 8 out of 10 datasets, achieving a mean saving of 52% in extra evaluations relative to GP and 29.77% relative to RF. Cross-validation analysis confirms that TabPFN's advantage stems from superior uncertainty calibration, achieving the lowest Negative Log-Likelihood and Area Under the Sparsification Error curve among all surrogates. These results demonstrate that pre-trained FMs can serve as effective surrogates for active learning, enabling data-efficient discovery across diverse materials systems and small-data experimental sciences.
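The active learning loop described in this abstract can be sketched as follows. This is a minimal illustration only: `toy_surrogate` is a hypothetical nearest-neighbour stand-in, not TabPFN, and the upper-confidence-bound acquisition rule, the one-dimensional pool, and the toy objective are all assumptions, since the abstract does not say which acquisition function ICAL uses.

```python
import random


def toy_surrogate(train, x):
    """Stand-in surrogate (NOT TabPFN): predict the nearest labeled value
    and use the distance to that point as a crude uncertainty."""
    nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y, abs(nearest_x - x)


def active_learning(objective, pool, n_init, n_rounds, kappa, rng):
    """Label n_init random candidates, then repeatedly label the candidate
    with the highest upper confidence bound (mean + kappa * uncertainty)."""
    pool = list(pool)
    rng.shuffle(pool)
    train = [(x, objective(x)) for x in pool[:n_init]]
    unlabeled = pool[n_init:]

    def ucb(x):
        mean, uncertainty = toy_surrogate(train, x)
        return mean + kappa * uncertainty

    for _ in range(n_rounds):
        if not unlabeled:
            break
        best = max(unlabeled, key=ucb)
        unlabeled.remove(best)
        # "Running the experiment" is simulated by calling the objective.
        train.append((best, objective(best)))
    return train


def objective(x):
    """Hypothetical property to maximise, peaking at x = 0.7."""
    return -(x - 0.7) ** 2


pool = [i / 20 for i in range(21)]
train = active_learning(objective, pool, n_init=3, n_rounds=8,
                        kappa=1.0, rng=random.Random(0))
best_x, best_y = max(train, key=lambda pair: pair[1])
```

Swapping the surrogate for a model with calibrated predictive uncertainty changes only the function called inside `ucb`; the rest of the loop stays the same.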


2603.02098 2026-03-25 cs.IR cs.CL cs.CV

Efficient and High-Fidelity Omni Modality Retrieval

Chuong Huynh, Manh Luong, Abhinav Shrivastava

Comments CVPR 2026. Project page: https://hmchuong.github.io/omniret

Abstract

Multimodal retrieval is the task of aggregating information from queries across heterogeneous modalities to retrieve desired targets. State-of-the-art multimodal retrieval models can understand complex queries, yet they are typically limited to two modalities: text and vision. This limitation impedes the development of universal retrieval systems capable of comprehending queries that combine more than two modalities. To advance toward this goal, we present OmniRet, the first retrieval model capable of handling complex, composed queries spanning three key modalities: text, vision, and audio. Our OmniRet model addresses two critical challenges for universal retrieval: computational efficiency and representation fidelity. First, feeding massive token sequences from modality-specific encoders to Large Language Models (LLMs) is computationally inefficient. We therefore introduce an attention-based resampling mechanism to generate compact, fixed-size representations from these sequences. Second, compressing rich omni-modal data into a single embedding vector inevitably causes information loss and discards fine-grained details. We propose Attention Sliced Wasserstein Pooling to preserve these fine-grained details, leading to improved omni-modal representations. OmniRet is trained on an aggregation of approximately 6 million query-target pairs spanning 30 datasets. We benchmark our model on 13 retrieval tasks and an MMEBv2 subset. Our model demonstrates significant improvements on composed query, audio, and video retrieval tasks, while achieving on-par performance with state-of-the-art models on others. Furthermore, we curate a new Audio-Centric Multimodal Benchmark (ACM). This new benchmark introduces two critical, previously missing tasks, composed audio retrieval and audio-visual retrieval, to more comprehensively evaluate a model's omni-modal embedding capacity.
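As a rough illustration of what an attention-based resampler does, the sketch below uses a fixed number of query vectors that cross-attend over a variable-length token sequence, so the output size depends only on the number of queries. It is an assumption-laden simplification, not OmniRet's implementation: the function name `resample` is hypothetical, the learned projections and multiple heads of a real resampler are omitted, and the tokens serve as both keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(queries, tokens):
    """Cross-attention with tokens as keys and values.

    Returns one output vector per query, so the result has len(queries)
    rows no matter how many tokens are passed in.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / scale for t in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)]
        )
    return outputs


# Two hypothetical query vectors compress sequences of any length to 2 rows.
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
short_seq = [[0.1 * i, 0.2, -0.1 * i] for i in range(5)]
long_seq = [[0.01 * i, 0.2, -0.01 * i] for i in range(50)]
compact_short = resample(queries, short_seq)
compact_long = resample(queries, long_seq)
```

Both `compact_short` and `compact_long` have two rows of dimension three, which is the fixed-size property the abstract relies on for efficiency.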


2602.22625 2026-03-25 cs.GR cs.CV

DiffBMP: Differentiable Rendering with Bitmap Primitives

Seongmin Hong, Junghun James Kim, Daehyeop Kim, Insoo Chung, Se Young Chun

Comments Accepted to CVPR 2026, https://diffbmp.com

2602.12288 2026-03-25 eess.SY cs.AI cs.RO cs.SY

Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

Xiaowen Tao, Yinuo Wang, Haitao Ding, Yuanyang Qi, Ziyu Song

Comments 18 pages, 5 figures, 7 tables. This version supersedes all previous preprint versions

2602.07023 2026-03-25 q-fin.TR cs.AI

Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation

Zeping Li, Guancheng Wan, Keyang Chen, Yu Chen, Yiwen Zhao, Philip Torr, Guangnan Ye, Zhenfei Yin, Hongfeng Chai

2602.00657 2026-03-25 cs.CC cs.DM cs.DS cs.LG math.CO

Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds

Sujoy Bhore, Liana Khazaliya, Fionn Mc Inerney

Comments An extended abstract of this paper will appear in the proceedings of ICLR 2026

2601.14001 2026-03-25 cs.IR cs.LG

Cross-Sensory Brain Passage Retrieval: Scaling Beyond Visual to Audio

Niall McGuire, Yashar Moshfeghi

Comments Accepted at ECIR 2026

DOI: 10.1007/978-3-032-21289-4_31

Abstract

Query formulation from internal information needs remains fundamentally challenging across all Information Retrieval paradigms due to cognitive complexity and physical impairments. Brain Passage Retrieval (BPR) addresses this by directly mapping EEG signals to passage representations without intermediate text translation. However, existing BPR research exclusively uses visual stimuli, leaving critical questions unanswered: Can auditory EEG enable effective retrieval for voice-based interfaces and visually impaired users? Can training on combined EEG datasets from different sensory modalities improve performance despite severe data scarcity? We present the first systematic investigation of auditory EEG for BPR and evaluate cross-sensory training benefits. Using dual encoder architectures with four pooling strategies (CLS, mean, max, multi-vector), we conduct controlled experiments comparing auditory-only, visual-only, and combined training on the Alice (auditory) and Nieuwland (visual) datasets. Results demonstrate that auditory EEG consistently outperforms visual EEG, and cross-sensory training with CLS pooling achieves substantial improvements over individual training: 31% in MRR (0.474), 43% in Hit@1 (0.314), and 28% in Hit@10 (0.858). Critically, combined auditory EEG models surpass BM25 text baselines (MRR: 0.474 vs 0.428), establishing neural queries as competitive with traditional retrieval whilst enabling accessible interfaces. These findings validate auditory neural interfaces for IR tasks and demonstrate that cross-sensory training addresses data scarcity whilst outperforming single-modality approaches. Code: https://github.com/NiallMcguire/Audio_BPR
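For readers unfamiliar with the metrics quoted in this abstract, the sketch below computes MRR and Hit@k from the 1-based rank at which each query's relevant passage was retrieved. The function names and the example ranks are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the
    relevant passage in each query's result list."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for five queries (not results from the paper).
ranks = [1, 3, 2, 12, 1]
mrr = mean_reciprocal_rank(ranks)   # (1 + 1/3 + 1/2 + 1/12 + 1) / 5
hit1 = hit_at_k(ranks, 1)           # 2 of 5 queries
hit10 = hit_at_k(ranks, 10)         # 4 of 5 queries
```

MRR rewards placing the relevant passage near the top, while Hit@k only records whether it appears within the first k results, which is why the two can move by different amounts under the same change in training.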


2601.07315 2026-03-25 cs.MA cs.AI cs.AR

VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing

Guanyuan Pan, Shuai Wang, Yugui Lin, Tiansheng Zhou, Pietro Liò, Zhenxin Zhao, Yaqi Wang

Comments submitted to the 34th ACM International Conference on Multimedia (ACMMM 2026)

2512.22387 2026-03-25 cs.SE cs.AI cs.MA

AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents

Bhanu Prakash Vangala, Ali Adibifar, Ashish Gehani, Tanu Malik

2512.19703 2026-03-25 eess.AS cs.IR cs.LG cs.MM cs.SD

ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval

Siyuan Fu, Xuchen Guo, Mingjun Liu, Hongxiang Li, Boyin Tan, Gongxi Zhu, Xianwei Zhuang, Jinghan Ru, Yuxin Xie, Yuguo Yin

2512.19196 2026-03-25 physics.comp-ph cs.LG cs.NA math.NA

Adaptive Probability Flow Residual Minimization for High-Dimensional Fokker-Planck Equations

Xiaolong Wu, Qifeng Liao

Abstract

Solving high-dimensional Fokker-Planck (FP) equations is a challenge in computational physics and stochastic dynamics, due to the curse of dimensionality (CoD) and unbounded domains. Existing deep learning approaches, such as Physics-Informed Neural Networks, face computational challenges as dimensionality increases, driven by the $O(d^2)$ complexity of automatic differentiation for second-order derivatives. While recent probability flow approaches bypass this by learning score functions or matching velocity fields, they often involve serial operations or depend on sampling efficiency in complex distributions. To address these issues, we propose the Adaptive Probability Flow Residual Minimization (A-PFRM) method. The second-order FP equation is reformulated as an equivalent first-order deterministic Probability Flow ODE (PF-ODE) constraint, which avoids explicit Hessian computation. Unlike score matching or velocity matching, A-PFRM solves FP equations by minimizing the residual of the continuity equation induced by the PF-ODE. By utilizing Continuous Normalizing Flows combined with the Hutchinson Trace Estimator, the training complexity is reduced to a linear scale of $O(d)$, achieving an efficient $O(1)$ wall-clock time on GPUs. To address data sparsity in high dimensions, a generative adaptive sampling strategy is employed, and we further prove that dynamically aligning collocation points with the evolving probability mass is a necessary condition to bound the approximation error. Experiments on diverse benchmarks -- ranging from anisotropic Ornstein-Uhlenbeck (OU) processes and high-dimensional Brownian motions with time-varying diffusion terms, to Geometric OU processes featuring non-Gaussian solutions -- demonstrate that A-PFRM effectively mitigates the CoD, maintaining high accuracy and constant temporal cost for problems up to 100 dimensions.
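The Hutchinson trace estimator mentioned in this abstract replaces an exact trace with an average of quadratic forms over random probe vectors, which needs only matrix-vector products. The sketch below is a generic illustration with Rademacher probes on a small explicit matrix; the function name `hutchinson_trace` and the matrix `A` are assumptions, and in a continuous normalizing flow the matrix would instead be the Jacobian of the velocity field, accessed through automatic differentiation rather than stored explicitly.

```python
import random


def hutchinson_trace(matvec, dim, num_probes, rng):
    """Estimate tr(A) as the mean of eps^T (A eps) over random probes eps
    with independent +1/-1 (Rademacher) entries."""
    total = 0.0
    for _ in range(num_probes):
        eps = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        a_eps = matvec(eps)  # only a matrix-vector product is required
        total += sum(e * y for e, y in zip(eps, a_eps))
    return total / num_probes


# Illustrative symmetric matrix with exact trace 2 + 3 + 4 = 9.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]


def matvec(v):
    """Multiply the example matrix A by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


rng = random.Random(0)
estimate = hutchinson_trace(matvec, dim=3, num_probes=4000, rng=rng)
```

Each probe costs one matrix-vector product, so the work per probe does not require forming the full matrix; the estimate is unbiased and its variance comes from the off-diagonal entries.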


2512.10766 2026-03-25 cs.CR cs.AI cs.CV

Metaphor-based Jailbreak Attacks on Text-to-Image Models

Chenyu Zhang, Lanjun Wang, Yiwen Ma, Wenhui Li, Yi Tu, An-An Liu

Comments Code is available at https://github.com/datar001/metaphor-based-jailbreaking-attack

Abstract

Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreak attacks have shown that adversarial prompts can effectively bypass these mechanisms and induce T2I models to produce sensitive content, revealing critical safety vulnerabilities. However, existing attack methods implicitly assume that the attacker knows the type of deployed defenses, which limits their effectiveness against unknown or diverse defense mechanisms. In this work, we reveal an underexplored vulnerability of T2I models to the metaphor-based jailbreak attack (MJA), which aims to attack diverse defense mechanisms without prior knowledge of their type by generating metaphor-based adversarial prompts. Specifically, MJA consists of two modules: an LLM-based multi-agent generation module (LMAG) and an adversarial prompt optimization module (APO). LMAG decomposes the generation of metaphor-based adversarial prompts into three subtasks: metaphor retrieval, context matching, and adversarial prompt generation. Subsequently, LMAG coordinates three LLM-based agents to generate diverse adversarial prompts by exploring various metaphors and contexts. To enhance attack efficiency, APO first trains a surrogate model to predict the attack results of adversarial prompts and then designs an acquisition strategy to adaptively identify optimal adversarial prompts. Extensive experiments on T2I models with various external and internal defense mechanisms demonstrate that MJA achieves stronger attack performance while using fewer queries, compared with six baseline methods. Additionally, we provide an in-depth vulnerability analysis suggesting that metaphor-based adversarial prompts evade safety mechanisms by inducing semantic ambiguity, while sensitive images arise from the model's probabilistic interpretation of concealed semantics.


2512.09275 2026-03-25 stat.ML cs.LG

Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression

Weiyi He, Yue Xing

Comments 25 pages, 3 figures

2511.05919 2026-03-25 cs.CR cs.AI cs.CL

Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci

2511.04568 2026-03-25 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH

Riesz Regression As Direct Density Ratio Estimation

Masahiro Kato

2510.12416 2026-03-25 stat.ML cs.LG

Geopolitics, Geoeconomics, and Sovereign Risk: Different Shocks, Different Channels

Alvaro Ortiz, Tomasa Rodrigo, Pablo Saborido

2509.19668 2026-03-25 eess.AS cs.AI cs.SD

Selective Classifier-free Guidance for Zero-shot Text-to-speech

John Zheng, Farhad Maleki

Comments 5 pages, 7 figures, 1 table. Revision 1: removed ICASSP copyright notice

2508.10149 2026-03-25 stat.ML cs.LG

Prediction-Powered Inference with Inverse Probability Weighting

Jyotishka Datta, Nicholas G. Polson

Comments 10 pages, 3 figures

2508.09537 2026-03-25 cs.SE cs.AI

From Context to Intent: Reasoning-Guided Function-Level Code Completion

Yanzhou Li, Tianlin Li, Yiran Zhang, Shangqing Liu, Aishan Liu, Xianglong Liu, Yang Liu

Abstract

The growing capabilities of Large Language Models (LLMs) have led to their widespread adoption for function completion within code repositories. Recent studies on such tasks show promising results when explicit instructions, often in the form of docstrings, are available to guide the completion. However, in real-world scenarios, clear docstrings are frequently absent. Under such conditions, LLMs typically fail to produce accurate completions. To support more automated and accurate function completion in such settings, we aim to enable LLMs to infer the developer's intent accurately before completing the code. Our key insight is that the preceding code, namely the code context before the function to be completed, often contains valuable cues that help the model understand the intended functionality. However, inferring intent from such implicit context is non-trivial and constitutes a core challenge in function-level code completion. To tackle this challenge, inspired by how humans interpret context, we propose a reasoning-based prompting framework that guides LLMs to utilize these contextual cues to infer intent step by step. To incentivize LLMs to reason through the preceding code and infer intent, we further curate a dataset of 40k examples, each annotated with intermediate reasoning traces and corresponding docstrings. Extensive experiments on DevEval and ComplexCodeEval demonstrate consistent performance improvements across multiple models, achieving over 25% relative gains in pass@1 for both DeepSeekCoder and CodeLLaMA families. Building upon our framework, we further develop an intent-interactive platform that supports lightweight human feedback. This platform allows developers to select from a set of candidate intentions or edit the intent to better guide the model. Our experiments show that this interactive approach leads to further performance improvements.
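To make the pass@1 figures in this abstract concrete, the sketch below shows the widely used unbiased pass@k estimator (computed from n sampled completions of which c pass the tests) and how a relative gain is derived from two scores. Whether the paper computes pass@1 with this exact estimator is an assumption, and all numbers in the example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k), where n is the
    number of sampled completions and c the number that pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline


# Hypothetical numbers: 3 of 10 samples pass, so pass@1 is about 0.3.
p1 = pass_at_k(n=10, c=3, k=1)
# Moving pass@1 from 0.20 to 0.25 is a 25% relative gain (5 points absolute).
gain = relative_gain(0.20, 0.25)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled completions that pass; a relative gain of 25% therefore means the improved score is 1.25 times the baseline, not 25 percentage points higher.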
