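arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.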

2506.04989 2026-04-10 cs.SE cs.CY cs.LG

BacPrep: Lessons from Deploying an LLM-Based Bacalaureat Assessment Platform

Adrian-Marius Dumitran, Radu Dita, Angela Liliana Dumitran

Comments First version accepted at BBGI (ITS 2025 Workshop); second version accepted at ITS 2026

Details

English abstract

Accessing quality preparation and feedback for the Romanian Bacalaureat exam is challenging, particularly for students in remote or underserved areas. This paper presents BacPrep, an experimental online platform exploring Large Language Model (LLM) potential for automated assessment, aiming to offer a free, accessible resource. Using official exam questions from the last 5 years, BacPrep employs the latest available Gemini Flash model (currently Gemini 2.5 Flash, via the gemini-flash-latest endpoint) to prioritize user experience quality during the data collection phase, with model versioning to be locked for subsequent rigorous evaluation. The platform has collected over 100 student solutions across Computer Science and Romanian Language exams, enabling preliminary assessment of LLM grading quality. This revealed several significant challenges: grading inconsistency across multiple runs, arithmetic errors when aggregating fractional scores, performance degradation under large prompt contexts, failure to apply subject-specific rubric weightings, and internal inconsistencies between generated scores and qualitative feedback. These findings motivate a redesigned architecture featuring subject-level prompt decomposition, specialized per-subject graders, and a median-selection strategy across multiple runs. Expert validation against human-graded solutions remains the critical next step.
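
Two of the remedies named in the abstract, exact aggregation of fractional scores and median selection across runs, can be illustrated with a minimal sketch. This is not the BacPrep implementation: the function names, the string score format, and the sample numbers are all hypothetical, and the sketch assumes the per-item scores have already been extracted from the model output.

```python
from fractions import Fraction
from statistics import median

def total_score(item_scores):
    """Sum per-item scores exactly; scores are strings such as '0.5' or '3/4'."""
    # Fractions avoid the rounding drift of binary floats (0.1 + 0.2 != 0.3).
    return sum((Fraction(s) for s in item_scores), Fraction(0))

def median_grade(runs):
    """Return the median total over several grading runs of one solution."""
    totals = [total_score(run) for run in runs]
    return median(totals)

# Three hypothetical grading runs of the same solution (made-up numbers).
runs = [
    ["1.5", "0.75", "2"],
    ["1.5", "0.5", "2"],
    ["1.25", "0.75", "2.5"],
]
final = median_grade(runs)
print(final, float(final))
```

Taking the median rather than the mean keeps a single outlier run from shifting the reported grade, which is one plausible reason to prefer it when runs disagree.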

2504.13532 2026-04-10 quant-ph cs.CV q-fin.PR

Quantum Walks-Based Adaptive Distribution Generation with Efficient CUDA-Q Acceleration

Yen-Jui Chang, Wei-Ting Wang, Chen-Yu Liu, Yun-Yuan Wang, Ching-Ray Chang

Comments 17 pages, 5 figures

2503.21840 2026-04-10 eess.IV cs.CV

Vision Language Models versus Machine Learning Models Performance on Polyp Detection and Classification in Colonoscopy Images

Mohammad Amin Khalafi, Seyed Amir Ahmad Safavi-Naini, Ameneh Salehi, Nariman Naderi, Dorsa Alijanzadeh, Pardis Ketabi Moghadam, Kaveh Kavosi, Negar Golestani, Shabnam Shahrokh, Soltanali Fallah, Jamil S Samaan, Nicholas P. Tatonetti, Nicholas Hoerter, Girish Nadkarni, Hamid Asadzadeh Aghdaei, Ali Soroush

Comments Code is available at: https://github.com/aminkhalafi/CML-vs-LLM-on-Polyp-Detection. CoI: AlSo serves on the advisory board and holds equity in Virgo Surgical Solutions. The other authors declare no conflicts of interest. Data

Details

DOI: 10.1038/s41598-025-29566-2
Journal ref: Scientific Reports 15, 45484 (2025)

English abstract

Introduction: This study provides a comprehensive performance assessment of vision-language models (VLMs) against established convolutional neural networks (CNNs) and classic machine learning models (CMLs) for computer-aided detection (CADe) and computer-aided diagnosis (CADx) of colonoscopy polyp images. Method: We analyzed 2,258 colonoscopy images with corresponding pathology reports from 428 patients. We preprocessed all images using standardized techniques (resizing, normalization, and augmentation) and implemented a rigorous comparative framework evaluating 11 distinct models: ResNet50, 4 CMLs (random forest, support vector machine, logistic regression, decision tree), two specialized contrastive vision language encoders (CLIP, BiomedCLIP), and three general-purpose VLMs (GPT-4, Gemini-1.5-Pro, Claude-3-Opus). Our performance assessment focused on two clinical tasks: polyp detection (CADe) and classification (CADx). Result: In polyp detection, ResNet50 achieved the best performance (F1: 91.35%, AUROC: 0.98), followed by BiomedCLIP (F1: 88.68%, AUROC: [AS1] ). GPT-4 demonstrated comparable effectiveness to traditional machine learning approaches (F1: 81.02%, AUROC: [AS2] ), outperforming other general-purpose VLMs. For polyp classification, performance rankings remained consistent but with lower overall metrics. ResNet50 maintained the highest efficacy (weighted F1: 74.94%), while GPT-4 demonstrated moderate capability (weighted F1: 41.18%), significantly exceeding other VLMs (Claude-3-Opus weighted F1: 25.54%, Gemini-1.5-Pro weighted F1: 6.17%). Conclusion: CNNs remain superior for both CADx and CADe tasks. However, VLMs like BiomedCLIP and GPT-4 may be useful for polyp detection tasks where training CNNs is not feasible.
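
The classification results above are reported as weighted F1. The abstract does not say how that figure was computed, so the following sketch assumes the usual support-weighted definition: per-class F1 averaged with weights proportional to each class's share of the true labels. The class names and predictions are made up for illustration and are not data from the study.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's support in y_true."""
    n = len(y_true)
    return sum(
        y_true.count(c) / n * f1_per_class(y_true, y_pred, c)
        for c in sorted(set(y_true))
    )

# Hypothetical labels for four images (illustrative only).
y_true = ["adenoma", "adenoma", "adenoma", "hyperplastic"]
y_pred = ["adenoma", "adenoma", "hyperplastic", "hyperplastic"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Because the weights follow class frequency, a model that does well only on the majority class can still post a moderate weighted F1, which is worth keeping in mind when comparing the figures above.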

2503.12374 2026-04-10 cs.SE cs.AI

Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios

Zhi Chen, Wei Ma, Lingxiao Jiang

Comments Paper accepted at ICSE 2026, Research Track

2410.17690 2026-04-10 eess.SY cs.GT cs.MA cs.RO cs.SY

Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure

Adam Casselman, Abraham P. Vinod, Sarah H. Q. Li

Comments 8 pages, 4 figures

2604.08140 2026-04-10 cs.CR cs.AI cs.MM cs.NI

Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark

Longgang Zhang, Xiaowei Fu, Fuxiang Huang, Lei Zhang

Comments Project page https://github.com/lgzhangzlg/Multimodal-Reasoning-with-LLM-for-Encrypted-Traffic-Interpretation-A-Benchmark

Details

English abstract

Network traffic, as a key media format, is crucial for ensuring security and communications in modern internet infrastructure. While existing methods offer excellent performance, they face two key bottlenecks: (1) they fail to capture multidimensional semantics beyond unimodal sequence patterns; (2) their black-box nature, i.e., providing only category labels, offers no auditable reasoning process. We identify a key factor: existing network traffic datasets are primarily designed for classification and inherently lack rich semantic annotations, so they cannot support the generation of human-readable evidence reports. To address this data scarcity, this paper proposes, for the first time, a Byte-Grounded Traffic Description (BGTD) benchmark that combines raw bytes with structured expert annotations. BGTD provides necessary behavioral features and verifiable chains of evidence for multimodal reasoning towards explainable encrypted traffic interpretation. Built upon BGTD, this paper proposes an end-to-end traffic-language representation framework (mmTraffic), a multimodal reasoning architecture bridging physical traffic encoding and semantic interpretation. To alleviate modality interference and generative hallucinations, mmTraffic adopts a jointly-optimized perception-cognition architecture. By incorporating a perception-centered traffic encoder and a cognition-centered LLM generator, mmTraffic achieves refined traffic interpretation with guaranteed category prediction. Extensive experiments demonstrate that mmTraffic autonomously generates high-fidelity, human-readable, and evidence-grounded traffic interpretation reports, while maintaining highly competitive classification accuracy compared to specialized unimodal models (e.g., NetMamba). The source code is available at https://github.com/lgzhangzlg/Multimodal-Reasoning-with-LLM-for-Encrypted-Traffic-Interpretation-A-Benchmark

2604.08123 2026-04-10 cs.DC cs.AI

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows

Lingyun Yang, Suyi Li, Tianyu Feng, Xiaoxiao Jiang, Zhipeng Di, Weiyi Lu, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

2604.08113 2026-04-10 cs.CR cs.AI cs.LG

TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems

Labani Halder, Payel Sadhukhan, Sarbani Palit

2604.08099 2026-04-10 eess.SY cs.RO cs.SY

Complementary Filtering on SO(3) for Attitude Estimation with Scalar Measurements

Alessandro Melis, Soulaimane Berkane, Tarek Hamel

Comments Submitted to CDC 2026

2604.08062 2026-04-10 cs.HC cs.AI

From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants

Valdemar Danry, Javier Hernandez, Andrew Wilson, Pattie Maes, Judith Amores

2604.08037 2026-04-10 cs.CR cs.AI cs.CV cs.LG

PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

Soumya Mazumdar, Vineet Kumar Rakesh, Tapas Samanta

Comments GitHub: https://github.com/mazumdarsoumya/PrivFedTalk

Details

English abstract

Talking-head generation has advanced rapidly with diffusion-based generative models, but training usually depends on centralized face-video and speech datasets, raising major privacy concerns. The problem is more acute for personalized talking-head generation, where identity-specific data are highly sensitive and often cannot be pooled across users or devices. PrivFedTalk is presented as a privacy-aware federated framework for personalized talking-head generation that combines conditional latent diffusion with parameter-efficient identity adaptation. A shared diffusion backbone is trained across clients, while each client learns lightweight LoRA identity adapters from local private audio-visual data, avoiding raw data sharing and reducing communication cost. To address heterogeneous client distributions, Identity-Stable Federated Aggregation (ISFA) weights client updates using privacy-safe scalar reliability signals computed from on-device identity consistency and temporal stability estimates. Temporal-Denoising Consistency (TDC) regularization is introduced to reduce inter-frame drift, flicker, and identity drift during federated denoising. To limit update-side privacy risk, secure aggregation and client-level differential privacy are applied to adapter updates. The implementation supports both low-memory GPU execution and multi-GPU client-parallel training on heterogeneous shared hardware. Comparative experiments on the present setup across multiple training and aggregation conditions with PrivFedTalk, FedAvg, and FedProx show stable federated optimization and successful end-to-end training and evaluation under constrained resources. The results support the feasibility of privacy-aware personalized talking-head training in federated environments, while suggesting that stronger component-wise, privacy-utility, and qualitative claims need further standardized evaluation.
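
The abstract describes ISFA only at a high level: client updates are weighted by scalar reliability signals derived from identity consistency and temporal stability. The sketch below shows one way such a weighting could look. It is an illustration, not the authors' method: combining the two signals by multiplication, the function names, and the toy numbers are all assumptions, and secure aggregation and differential privacy are omitted.

```python
def reliability_weights(identity_scores, stability_scores):
    """Turn two per-client scalar signals into normalized aggregation weights.

    Assumption: the signals are combined by a simple product; the abstract
    does not specify the actual rule.
    """
    raw = [i * s for i, s in zip(identity_scores, stability_scores)]
    total = sum(raw)
    if total == 0:
        # Fall back to uniform weights when no client looks reliable.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

def aggregate(updates, weights):
    """Weighted average of client updates, each a flat list of floats."""
    dim = len(updates[0])
    return [sum(w * u[k] for w, u in zip(weights, updates)) for k in range(dim)]

# Two hypothetical clients with toy two-parameter adapter updates.
updates = [[1.0, 0.0], [0.0, 1.0]]
weights = reliability_weights([0.9, 0.3], [1.0, 1.0])
merged = aggregate(updates, weights)
print(weights, merged)
```

With equal reliability scores this reduces to an unweighted average of the updates, which is the simplest FedAvg-style baseline to compare against.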

2604.08003 2026-04-10 eess.AS cs.CL cs.SD

Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs

Yuan Xie, Jiaqi Song, Guang Qiu, Xianliang Wang, Ming Lei, Jie Gao, Jie Wu

2604.07989 2026-04-10 cs.IR cs.AI

Show Me the Infographic I Imagine: Intent-Aware Infographic Retrieval for Authoring Support

Jing Xu, Jiarui Hu, Zhihao Shuai, Yiyun Chen, Weikai Yang

Comments Project homepage: https://infographicretrieval.github.io/

2604.07988 2026-04-10 cs.DC cs.AI

LogAct: Enabling Agentic Reliability via Shared Logs

Mahesh Balakrishnan, Ashwin Bharambe, Davide Testuggine, David Geraghty, David Mao, Vidhya Venkat, Ilya Mironov, Rithesh Baradi, Gayathri Aiyer, Victoria Dudin

2604.07970 2026-04-10 eess.SY cs.RO cs.SY

Karma Mechanisms for Decentralised, Cooperative Multi Agent Path Finding

Kevin Riehl, Julius Schlapbach, Anastasios Kouvelas, Michail A. Makridis

2604.07951 2026-04-10 quant-ph cs.AI cs.LG

Investigation of Automated Design of Quantum Circuits for Imaginary Time Evolution Methods Using Deep Reinforcement Learning

Ryo Suzuki, Shohei Watabe

Comments 11 pages, 11 figures

2604.07929 2026-04-10 cs.IR cs.AI

Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems

Maria Movin, Claudia Hauff, Aron Henriksson, Panagiotis Papapetrou

2604.07911 2026-04-10 cs.MA cs.AI cs.LG

Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration

Nickson Patel

Comments 15 pages, 4 figures, preprint

Details

English abstract

Multi-agent LLM orchestration systems suffer from context pollution: when N concurrent agents compete for the orchestrator's context window, each agent's task state, partial outputs, and pending questions contaminate the steering interactions of every other agent, degrading decision quality. We introduce Dynamic Attentional Context Scoping (DACS), a mechanism in which the orchestrator operates in two asymmetric modes. In Registry mode it holds only lightweight per-agent status summaries (<=200 tokens each), remaining responsive to all agents and the user. When an agent emits a SteeringRequest, the orchestrator enters Focus(a_i) mode, injecting the full context of agent a_i while compressing all other agents to their registry entries. Context isolation is agent-triggered, asymmetric, and deterministic: the context window contains exactly F(a_i) + R_{-i} during steering, eliminating cross-agent contamination without requiring context compression or retrieval. We evaluate DACS across four experimental phases totalling 200 trials: Phase 1 tests N in {3,5,10} (60 trials); Phase 2 tests agent heterogeneity and adversarial dependencies (60 trials); Phase 3 tests decision density up to D=15 (40 trials); Phase 4 uses autonomous LLM agents for free-form questions (40 trials, Claude Haiku 4.5). Across all 8 synthetic scenarios, DACS achieves 90.0–98.4% steering accuracy versus 21.0–60.0% for a flat-context baseline (p < 0.0001 throughout), with wrong-agent contamination falling from 28–57% to 0–14% and context efficiency ratios of up to 3.53x. The accuracy advantage grows with N and D; keyword matching is validated by LLM-as-judge across all phases (mean kappa=0.909). DACS outperforms the flat-context baseline by +17.2pp at N=3 (p=0.0023) and +20.4pp at N=5 (p=0.0008) in Phase 4, with the advantage growing with N confirmed by two independent judges.
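
The two modes described in the abstract can be sketched as a single context-assembly function: with no focus it returns only registry entries, and with a focused agent it returns that agent's full context plus the registry entries of all others, i.e. F(a_i) + R_{-i}. The class, function, and tag names below are hypothetical and are not taken from the DACS code; token counting and the SteeringRequest protocol are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Agent:
    name: str
    summary: str       # short registry entry (the abstract caps these at 200 tokens)
    full_context: str  # complete task state, injected only in focus mode

def build_context(agents: Dict[str, Agent], focus: Optional[str] = None) -> List[str]:
    """Assemble the orchestrator context.

    focus=None  -> Registry mode: one summary per agent.
    focus=name  -> Focus mode: full context for `name`, summaries for the rest.
    """
    if focus is not None and focus not in agents:
        raise KeyError(focus)
    parts = []
    for name, agent in agents.items():
        if name == focus:
            parts.append(f"[FOCUS {name}] {agent.full_context}")
        else:
            parts.append(f"[REGISTRY {name}] {agent.summary}")
    return parts

# Hypothetical agents (illustrative only).
agents = {
    "a": Agent("a", "a: idle", "A-FULL"),
    "b": Agent("b", "b: waiting on user", "B-FULL"),
}
print(build_context(agents))
print(build_context(agents, focus="b"))
```

The point of the design is that the focused context is deterministic: nothing from another agent's full state can appear in it, so no retrieval or compression step has to decide what to keep.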

2604.07896 2026-04-10 quant-ph cs.LG

Non-variational supervised quantum kernel methods: a review

John Tanner, Chon-Fai Kam, Jingbo Wang

Comments 38 pages, 11 figures, 1 table

2604.07872 2026-04-10 cs.NE cs.AI

PyVRP+: LLM-Driven Metacognitive Heuristic Evolution for Hybrid Genetic Search in Vehicle Routing Problems

Manuj Malik, Jianan Zhou, Shashank Reddy Chirra, Zhiguang Cao

Comments 18 pages, accepted to the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

2604.07869 2026-04-10 cs.IR cs.LG

Ensembles at Any Cost? Accuracy-Energy Trade-offs in Recommender Systems

Jannik Nitschke, Lukas Wegmeth, Joeran Beel

2604.07863 2026-04-10 cs.IR cs.AI

Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

Saman Forouzandeh, Kamal Berahmand, Mahdi Jalili

Comments The 49th International ACM SIGIR Conference on Research and Development in Information Retrieval

2604.07857 2026-04-10 eess.SY cs.AI cs.SY

Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey

Xiaojing Chen, Haiqi Yu, Wei Ni, Dusit Niyato, Ruichen Zhang, Xin Wang, Shunqing Zhang, Shugong Xu

2604.07851 2026-04-10 cs.IR cs.AI

ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning

Jiani Huang, Shijie Wang, Liangbo Ning, Wenqi Fan, Qing Li

Comments Accepted by ACL 2026

2604.07831 2026-04-10 cs.CR cs.CL cs.CV

Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo, Derek Yuen, Kun Shao, Huaibo Huang, Junxian Duan, Jie Cao, Ran He

Comments 44 pages, 10 figures, public code will be available at https://github.com/HashTAG00002/UI-Injection

2604.07810 2026-04-10 stat.ML cs.LG math.PR stat.ME

Intensity Dot Product Graphs

Giulio Valentino Dalla Riva, Matteo Dalla Riva

2604.07803 2026-04-10 cs.CY cs.AI cs.CV

The Weaponization of Computer Vision: Tracing Military-Surveillance Ties through Conference Sponsorship

Noa Garcia, Amelia Katirai

Comments FAccT 2026

2604.07788 2026-04-10 cs.IR cs.CL

PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context

Steven Au, Baihan Lin

2604.07781 2026-04-10 eess.SY cs.AI cs.LG cs.SY

Toward Generalizable Graph Learning for 3D Engineering AI: Explainable Workflows for CAE Mode Shape Classification and CFD Field Prediction

Tong Duy Son, Kohta Sugiura, Marc Brughmans, Andrey Hense, Zhihao Liu, Amirthalakshmi Veeraraghavan, Ajinkya Bhave, Jay Masters, Paolo di Carlo, Theo Geluk

2604.07762 2026-04-10 cond-mat.stat-mech cs.LG math.OC math.PR

Generative optimal transport via forward-backward HJB matching

Haiqian Yang, Vishaal Krishnan, Sumit Sinha, L. Mahadevan

Comments 16 pages, 4 figures

Details

English abstract

Controlling the evolution of a many-body stochastic system from a disordered reference state to a structured target ensemble, characterized empirically through samples, arises naturally in non-equilibrium statistical mechanics and stochastic control. The natural relaxation of such a system - driven by diffusion - runs from the structured target toward the disordered reference. The natural question is then: what is the minimum-work stochastic process that reverses this relaxation, given a pathwise cost functional combining spatial penalties and control effort? Computing this optimal process requires knowledge of trajectories that already sample the target ensemble - precisely the object one is trying to construct. We resolve this by establishing a time-reversal duality: the value function governing the hard backward dynamics satisfies an equivalent forward-in-time HJB equation, whose solution can be read off directly from the tractable forward relaxation trajectories. Via the Cole-Hopf transformation and its associated Feynman-Kac representation, this forward potential is computed as a path-space free energy averaged over these forward trajectories - the same relaxation paths that are easy to simulate - without any backward simulation or knowledge of the target beyond samples. The resulting framework provides a physically interpretable description of stochastic transport in terms of path-space free energy, risk-sensitive control, and spatial cost geometry. We illustrate the theory with numerical examples that visualize the learned value function and the induced controlled diffusions, demonstrating how spatial cost fields shape transport geometry analogously to Fermat's Principle in inhomogeneous media. Our results establish a unifying connection between stochastic optimal control, Schrödinger bridge theory, and non-equilibrium statistical mechanics.
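
The phrase "path-space free energy averaged over forward trajectories" refers to a log-expectation of an exponentiated path cost. The sketch below estimates the generic form V(x) = -lam * log E[exp(-(1/lam) * integral of q(X_s) ds)] by Monte Carlo over simulated diffusion paths. It is not the paper's algorithm: the driftless one-dimensional Brownian dynamics, the running cost q, and the parameters lam and sigma are all assumptions chosen to keep the example small.

```python
import math
import random

def path_free_energy(x0, cost, T=1.0, steps=100, n_paths=2000,
                     sigma=1.0, lam=1.0, seed=0):
    """Monte Carlo estimate of -lam * log E[exp(-(1/lam) * int_0^T cost(X_s) ds)].

    Paths are Euler-Maruyama samples of dX = sigma dW starting at x0
    (an assumed toy dynamics, not the relaxation process of the paper).
    """
    rng = random.Random(seed)
    dt = T / steps
    acc = 0.0
    for _ in range(n_paths):
        x, running = x0, 0.0
        for _ in range(steps):
            running += cost(x) * dt                         # accumulate path cost
            x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # diffuse one step
        acc += math.exp(-running / lam)
    return -lam * math.log(acc / n_paths)

# A hypothetical quadratic spatial cost; the estimate depends on the seed.
value = path_free_energy(0.0, lambda x: 0.5 * x * x)
print(value)
```

For a constant cost c the exponent is the same on every path, so the estimate equals c * T up to floating-point error, which gives a simple sanity check. For large costs the exponential can underflow, and a log-sum-exp formulation would be the safer choice.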
