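arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.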

2604.00280 2026-04-02 cs.SE cs.AI

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

Md Rakib Hossain Misu, Iris Ma, Cristina V. Lopes

Abstract

Formal specifications play a central role in ensuring software reliability and correctness. However, automatically synthesizing high-quality formal specifications remains a challenging task, often requiring domain expertise. Recent work has applied large language models to generate specifications in Java Modeling Language (JML), reporting high verification pass rates. But does passing a verifier mean that the specification is actually correct and complete? In this work, we first conduct a comprehensive evaluation comparing classical and prompt-based approaches for automated JML specification synthesis. We then investigate whether prompt optimization can push synthesis quality further by evolving prompts through structured verification feedback. While optimization improves verifier pass rates, we find a clear performance ceiling. More critically, we propose Spec-Harness, an evaluation framework that measures specification correctness and completeness through symbolic verification, revealing that a large fraction of verifier-accepted specifications, including optimized ones, are in fact incorrect or incomplete, over- or under-constraining both inputs and outputs in ways invisible to the verifier. To push beyond this ceiling, we propose VeriAct, a verification-guided agentic framework that iteratively synthesizes and repairs specifications through a closed loop of LLM-driven planning, code execution, verification, and Spec-Harness feedback. Our experiments on two benchmark datasets show that VeriAct outperforms both prompt-based and prompt-optimized baselines, producing specifications that are not only verifiable but also correct and complete.

2604.00263 2026-04-02 eess.IV cs.CV

Feature-level Site Leakage Reduction for Cross-Hospital Chest X-ray Transfer via Self-Supervised Learning

Ayoub Louaye Bouaziz, Lokmane Chebouba

Comments Accepted at the 7th International Conference on Computing Systems and Applications [Algiers, 2026]

2604.00242 2026-04-02 cs.IR cs.CL

FGR-ColBERT: Identifying Fine-Grained Relevance Tokens During Retrieval

Antonín Jarolím, Martin Fajčík

2604.00237 2026-04-02 cs.CY cs.AI cs.MA

AI-Mediated Explainable Regulation for Justice

Thomas Hofweber, Andreas Sudmann, Evangelos Pournaras

2604.00225 2026-04-02 eess.IV cs.CV

Pupil Design for Computational Wavefront Estimation

Ali Almuallem, Nicholas Chimitt, Bole Ma, Qi Guo, Stanley H. Chan

2604.00222 2026-04-02 cs.SE cs.LG cs.PF

Risk-Aware Batch Testing for Performance Regression Detection

Ali Sayedsalehi, Peter C. Rigby, Gregory Mierzwinski

Comments 14 pages, 1 figure, 4 tables. Replication package and dataset available

Abstract

Performance regression testing is essential in large-scale continuous-integration (CI) systems, yet executing full performance suites for every commit is prohibitively expensive. Prior work on performance regression prediction and batch testing has shown independent benefits, but each faces practical limitations: predictive models are rarely integrated into CI decision-making, and conventional batching strategies ignore commit-level heterogeneity. We unify these strands by introducing a risk-aware framework that integrates machine-learned commit risk with adaptive batching. Using Mozilla Firefox as a case study, we construct a production-derived dataset of human-confirmed regressions aligned chronologically with Autoland, and fine-tune ModernBERT, CodeBERT, and LLaMA-3.1 variants to estimate commit-level performance regression risk, achieving up to 0.694 ROC-AUC with CodeBERT. The risk scores drive a family of risk-aware batching strategies, including Risk-Aged Priority Batching and Risk-Adaptive Stream Batching, evaluated through realistic CI simulations. Across thousands of historical Firefox commits, our best overall configuration, Risk-Aged Priority Batching with linear aggregation (RAPB-la), yields a Pareto improvement over Mozilla's production-inspired baseline. RAPB-la reduces total test executions by 32.4%, decreases mean feedback time by 3.8%, maintains mean time-to-culprit at approximately the baseline level, reduces maximum time-to-culprit by 26.2%, and corresponds to an estimated annual infrastructure cost savings of approximately $491K under our cost model. These results demonstrate that risk-aware batch testing can reduce CI resource consumption while improving diagnostic timeliness. To support reproducibility and future research, we release a complete replication package containing all datasets, fine-tuning pipelines, and implementations of our batching algorithms.

2604.00189 2026-04-02 cs.SE cs.AI cs.NI

Making Sense of AI Agents Hype: Adoption, Architectures, and Takeaways from Practitioners

Ruoyu Su, Matteo Esposito, Roberta Capuano, Rafiullah Omar, June Sallou, Henry Muccini, Davide Taibi

2604.00186 2026-04-02 eess.SY cs.AI cs.CY cs.SY econ.GN q-fin.EC stat.AP

Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis of Emerging Labor Market Disruption

Ravish Gupta, Saket Kumar

Comments 26 pages, 2 figures, 6 tables. Submitted to IMF-OECD-PIIE-World Bank Conference on Labor Markets and Structural Transformation 2026

2604.00181 2026-04-02 cs.CR cs.AI

NFC based inventory control system for secure and efficient communication

Razi Iqbal, Awais Ahmad, Asfandyar Gillani

2604.00179 2026-04-02 eess.SY cs.LG cs.SY

Finite-Time Analysis of Projected Two-Time-Scale Stochastic Approximation

Yitao Bai, Thinh T. Doan, Justin Romberg

Comments 6 pages, 3 figures

2604.00171 2026-04-02 cs.SE cs.AI cs.LO

Unified Architecture Metamodel of Information Systems Developed by Generative AI

Oleg Grynets, Vasyl Lyashkevych

Comments 22 pages, 13 figures, 12 tables, 28 references

Abstract

The rapid development of AI and LLMs has driven new approaches to the SDLC, in which a large portion of code and of technical and business documentation is generated automatically. However, because no single architectural framework provides consistent, repeatable transformations across the different representation layers of information systems, such systems remain fragmented in their representation. This study explores the problem of creating a unified architecture for LLM-oriented applications based on architectural frameworks selected by SMEs. A framework structure is proposed that covers some key types of architectural diagrams and supports a closed cycle of transformations such as "Code to Documentation to Code". The key architectural diagrams are split equally among the main architectural layers: high-layer (business and domain understanding), middle-layer (system architecture), and low-layer (developer-layer architecture). Each architectural layer in turn contains some abstraction layers, which make it more flexible and better suited to the requirements of design principles and architectural patterns. The experiments conducted demonstrated stable quality of generated documentation and code when a structured architectural context in the form of architectural diagrams is used. The results confirm that the proposed unified architecture metamodel can serve as an effective interface between humans and models, improving the accuracy, stability, and repeatability of LLM generation. However, the selected set of architectural diagrams should be optimised to avoid redundancy between some diagrams, and some diagrams should be updated to represent extra contextual orchestration. This work demonstrates measurable improvements for a new generation of intelligent tools that automate the SDLC and enable a comprehensive architecture compatible with AI-driven development.

2604.00167 2026-04-02 cs.SE cs.AI

A Study on the Impact of Fault localization Granularity for Repository-Scale Code Repair Tasks

Joseph Townsend, Chandresh Pravin, Kwun Ho Ngan, Matthieu Parizy

Abstract

Automatic program repair can be a challenging task, especially when resolving complex issues at the repository level, which often involves issue reproduction, fault localization, code repair, testing, and validation. Issues of this scale are commonly found in popular GitHub repositories or datasets that are derived from them. Some repository-level approaches separate localization and repair into distinct phases. Where this is the case, the fault localization approaches vary in terms of the granularity of localization. Although the impact of granularity has been explored to some degree for smaller datasets, not all studies isolate this issue from the separate question of localization accuracy by testing code repair under the assumption of perfect fault localization. To the best of the authors' knowledge, no repository-scale studies have explicitly investigated granularity under this assumption, nor conducted a systematic empirical comparison of granularity levels in isolation. We propose a framework for performing such tests by modifying the localization phase of the Agentless framework to retrieve ground-truth localization data and include this as context in the prompt fed to the repair phase. We show that under this configuration and as a generalization over the SWE-Bench-Mini dataset, function-level granularity yields the highest repair rate, ahead of line-level and file-level granularity. However, a deeper dive suggests that the ideal granularity may in fact be task dependent. This study is not intended to improve on the state of the art, nor do we intend for results to be compared against any complete agentic frameworks. Rather, we present a proof of concept for investigating how fault localization may impact automatic code repair in repository-scale scenarios. We present preliminary findings to this end and encourage further research into this relationship between the two phases.

2604.00120 2026-04-02 cs.SE cs.AI

From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering

Rafal Wlodarski

2604.00112 2026-04-02 cs.CR cs.LG cs.SE

Efficient Software Vulnerability Detection Using Transformer-based Models

Sameer Shaik, Zhen Huang, Daniela Stan Raicu, Jacob Furst

2604.00081 2026-04-02 cs.CY cs.AI cs.RO

Beyond Symbolic Control: Societal Consequences of AI-Driven Workforce Displacement and the Imperative for Genuine Human Oversight Architectures

Richard J. Mitchell

Comments 23 pages, 23 references

2604.00065 2026-04-02 q-bio.GN cs.LG

Genetic algorithms for multi-omic feature selection: a comparative study in cancer survival analysis

Luca Cattelani, Vittorio Fortino

2604.00064 2026-04-02 stat.ML cs.LG math.PR math.ST q-fin.CP stat.TH

Forecast collapse of transformer-based models under squared loss in financial time series

Pierre Andreoletti

2604.00060 2026-04-02 stat.ML cs.IT cs.LG math.IT

Scaled Gradient Descent for Ill-Conditioned Low-Rank Matrix Recovery with Optimal Sampling Complexity

Zhenxuan Li, Meng Huang

2604.00058 2026-04-02 q-bio.GN cs.AI cs.LG

GenoBERT: A Language Model for Accurate Genotype Imputation

Lei Huang, Chuan Qiu, Kuan-Jui Su, Anqi Liu, Yun Gong, Weiqiang Lin, Lindong Jiang, Chen Zhao, Meng Song, Jeffrey Deng, Qing Tian, Zhe Luo, Ping Gong, Hui Shen, Chaoyang Zhang, Hong-Wen Deng

2604.00057 2026-04-02 cs.MM cs.AI

Towards Automatic Soccer Commentary Generation with Knowledge-Enhanced Visual Reasoning

Zeyu Jin, Xiaoyu Qin, Songtao Zhou, Kaifeng Yun, Jia Jia

Comments Accepted by ICME 2026

2604.00053 2026-04-02 cs.SE cs.AI

The Energy Footprint of LLM-Based Environmental Analysis: LLMs and Domain Products

Alicia Bao, Jiamian He, Angel Hsu, Diego Manya, Ji, Zhang

2604.00049 2026-04-02 math.NA cs.NA cs.RO

A Generalized Matrix Inverse that is Consistent with Respect to Diagonal Transformations

Jeffrey Uhlmann

Comments This reflects the 2018 SIMAX publication. (The 1604.08476 preprint has a comment saying that its content is contained in the SIMAX paper, but the two are quite distinct.)

2604.00048 2026-04-02 eess.IV cs.AI

Whittaker-Henderson smoother for long satellite image time series interpolation

Mathieu Fauvel

2604.00046 2026-04-02 cs.SE cs.LG

Large Language Models for Analyzing Enterprise Architecture Debt in Unstructured Documentation

Christin Pagels, Simon Hacks, Rob Henk Bemthuis

Comments Author version, 2 figures, 5 tables. To appear in the Proceedings of the 41st ACM/SIGAPP Symposium on Applied Computing (SAC '26), 2026

2604.00043 2026-04-02 cs.PL cs.AI

DriftScript: A Domain-Specific Language for Programming Non-Axiomatic Reasoning Agents

Seamus Brady

2604.00039 2026-04-02 cs.PL cs.LG

Transformers for Program Termination

Yoav Alon, Cristina David

Comments 12 pages

2604.00038 2026-04-02 stat.ML cs.LG

Isomorphic Functionalities between Ant Colony and Ensemble Learning: Part II-On the Strength of Weak Learnability and the Boosting Paradigm

Ernest Fokoué, Gregory Babbitt, Yuval Levental

Comments 21 pages, 5 figures, 4 tables

2604.00036 2026-04-02 q-bio.NC cs.AI cs.LG cs.NE physics.bio-ph

When and Where: A Model Hippocampal Network Unifies Formation of Time Cells and Place Cells

Qiaorong S. Yu, Zhaoze Wang, Vijay Balasubramanian

Comments 18 pages, 6 figures

2604.00032 2026-04-02 physics.ed-ph cs.CY cs.RO

Rusty Flying Robots: Learning a Full Robotics Stack with Real-Time Operation on an STM32 Microcontroller in a 9 ECTS MS Course

Wolfgang Hoenig, Christoph Scherer, Khaled Wahba

Comments Accepted at the International Conference on Robotics in Education (RiE), 2026

2604.00031 2026-04-02 q-fin.GN cs.LG

Decomposable Reward Modeling and Realistic Environment Design for Reinforcement Learning-Based Forex Trading

Nabeel Ahmad Saidd