arXivDaily: arXiv daily academic digest, updated Monday through Friday
All subject categories: 1717
2604.00280 2026-04-02 cs.SE cs.AI

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

Md Rakib Hossain Misu, Iris Ma, Cristina V. Lopes

Abstract

Formal specifications play a central role in ensuring software reliability and correctness. However, automatically synthesizing high-quality formal specifications remains a challenging task, often requiring domain expertise. Recent work has applied large language models to generate specifications in Java Modeling Language (JML), reporting high verification pass rates. But does passing a verifier mean that the specification is actually correct and complete? In this work, we first conduct a comprehensive evaluation comparing classical and prompt-based approaches for automated JML specification synthesis. We then investigate whether prompt optimization can push synthesis quality further by evolving prompts through structured verification feedback. While optimization improves verifier pass rates, we find a clear performance ceiling. More critically, we propose Spec-Harness, an evaluation framework that measures specification correctness and completeness through symbolic verification, revealing that a large fraction of verifier-accepted specifications, including optimized ones, are in fact incorrect or incomplete, over- or under-constraining both inputs and outputs in ways invisible to the verifier. To push beyond this ceiling, we propose VeriAct, a verification-guided agentic framework that iteratively synthesizes and repairs specifications through a closed loop of LLM-driven planning, code execution, verification, and Spec-Harness feedback. Our experiments on two benchmark datasets show that VeriAct outperforms both prompt-based and prompt-optimized baselines, producing specifications that are not only verifiable but also correct and complete.

2604.00263 2026-04-02 eess.IV cs.CV

Feature-level Site Leakage Reduction for Cross-Hospital Chest X-ray Transfer via Self-Supervised Learning

Ayoub Louaye Bouaziz, Lokmane Chebouba

Comments Accepted at The 7th International Conference on Computing Systems and Applications [Algiers, 2026]

Abstract

Cross-hospital failure in chest X-ray models is often attributed to domain shift, yet most work assumes invariance without measuring it. This paper studies how to measure site leakage directly and how that measurement changes conclusions about transfer methods. We study multi-site self-supervised learning (SSL) and feature-level adversarial site confusion for cross-hospital transfer. We pretrain a ResNet-18 on NIH and CheXpert without pathology labels. We then freeze the encoder and train a linear pneumonia classifier on NIH only, evaluating transfer to RSNA. We quantify site leakage using a post hoc linear probe that predicts acquisition site from frozen backbone features $f$ and projection features $z$. Across 3 random seeds, multi-site SSL improves RSNA AUC from 0.6736 $\pm$ 0.0148 (ImageNet initialization) to 0.7804 $\pm$ 0.0197. Adding adversarial site confusion on $f$ reduces measured leakage but does not reliably improve AUC and increases variance. On $f$, site probe accuracy drops from 0.9890 $\pm$ 0.0021 (SSL-only) to 0.8504 $\pm$ 0.0051 (CanonicalF), where chance is 0.50. On $z$, probe accuracy drops from 0.8912 $\pm$ 0.0092 to 0.7810 $\pm$ 0.0250. These results show that measuring leakage changes how transfer methods should be interpreted: multi-site SSL drives transfer, while adversarial confusion exposes the limits of invariance assumptions.
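
A post hoc site probe of the kind described above can be sketched as a logistic-regression classifier trained on frozen features. This is an illustrative sketch on synthetic data, not the authors' code: the feature dimension, the injected site shift, and the names `fit_linear_probe` and `probe_accuracy` are assumptions. With two sites, accuracy near 0.50 indicates little linearly decodable site information.

```python
import numpy as np


def fit_linear_probe(features, site_labels, lr=0.1, epochs=2000):
    """Logistic-regression probe trained with full-batch gradient descent to
    predict a binary acquisition-site label from frozen features."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - site_labels  # per-sample cross-entropy gradient w.r.t. the logits
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(features, site_labels, w, b):
    preds = (features @ w + b > 0).astype(int)
    return float((preds == site_labels).mean())


rng = np.random.default_rng(0)
n_per_site, dim = 500, 16
# Hypothetical frozen features: site B is shifted along one direction ("leaky").
site_a = rng.normal(size=(n_per_site, dim))
site_b = rng.normal(size=(n_per_site, dim))
site_b[:, 0] += 3.0
X = np.vstack([site_a, site_b])
y = np.concatenate([np.zeros(n_per_site), np.ones(n_per_site)]).astype(int)

# Random train/test split; the probe is evaluated on held-out samples.
idx = rng.permutation(len(y))
train, test = idx[: len(y) // 2], idx[len(y) // 2 :]
w, b = fit_linear_probe(X[train], y[train])
leaky_acc = probe_accuracy(X[test], y[test], w, b)

# Features carrying no site information: the probe should stay near chance (0.50).
X_clean = rng.normal(size=X.shape)
w0, b0 = fit_linear_probe(X_clean[train], y[train])
clean_acc = probe_accuracy(X_clean[test], y[test], w0, b0)
print(f"leaky features: {leaky_acc:.3f}, site-free features: {clean_acc:.3f}")
```

Lower probe accuracy on the same held-out protocol is what the abstract reads as reduced leakage; the probe is trained after the encoder is frozen, so it measures, rather than changes, the representation.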

2604.00242 2026-04-02 cs.IR cs.CL

FGR-ColBERT: Identifying Fine-Grained Relevance Tokens During Retrieval

Antonín Jarolím, Martin Fajčík

Abstract

Document retrieval identifies relevant documents but does not provide fine-grained evidence cues, such as specific relevant spans. A possible solution is to apply an LLM after retrieval; however, this introduces significant computational overhead and limits practical deployment. We propose FGR-ColBERT, a modification of the ColBERT retrieval model that integrates fine-grained relevance signals distilled from an LLM directly into the retrieval function. Experiments on MS MARCO show that FGR-ColBERT (110M) achieves a token-level F1 of 64.5, exceeding the 62.8 of Gemma 2 (27B), despite being approximately 245 times smaller. At the same time, it preserves retrieval effectiveness (99% relative Recall@50) and remains efficient, incurring only a ~1.12x latency overhead compared to the original ColBERT.
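
FGR-ColBERT builds on ColBERT's late-interaction scoring. The sketch below shows the standard MaxSim score (each query token's best dot product against the document tokens, summed over query tokens) on toy normalized embeddings. The per-document-token signal `doc_token_support` is a hypothetical heuristic for illustration only, not the relevance signal FGR-ColBERT distills from an LLM. The last line checks the "approximately 245 times smaller" figure from the parameter counts quoted above.

```python
import numpy as np


def maxsim_score(query_emb, doc_emb):
    """ColBERT-style late interaction: each query token keeps its best-matching
    document token (max dot product); the maxima are summed over query tokens."""
    sim = query_emb @ doc_emb.T          # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())


def doc_token_support(query_emb, doc_emb):
    """Hypothetical per-document-token signal: the highest similarity each
    document token reaches against any query token."""
    sim = query_emb @ doc_emb.T
    return sim.max(axis=0)               # shape: (num_doc_tokens,)


def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


q = normalize(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
d = normalize(np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]))
print(maxsim_score(q, d))        # 2.0: both query tokens find an exact match
print(doc_token_support(q, d))   # [1. 0. 1.]: the middle document token matches nothing

# Model-size ratio quoted in the abstract: 27B parameters vs. 110M parameters.
print(round(27e9 / 110e6))       # 245
```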

2604.00237 2026-04-02 cs.CY cs.AI cs.MA

AI-Mediated Explainable Regulation for Justice

Thomas Hofweber, Andreas Sudmann, Evangelos Pournaras

Abstract

Current regulatory decision-making faces numerous problems that make adopted regulations static, unexplained, unduly influenced by powerful interest groups, and tainted by a perception of illegitimacy. These well-known problems with the regulatory process can lead to injustice and have substantial negative effects on society and democracy. We discuss a new approach that uses distributed artificial intelligence (AI) to make regulatory recommendations that are explainable and adaptable by design. We outline the main components of a system that can implement this approach and show how it would resolve the problems with the present regulatory system. This approach models and reasons about stakeholder preferences with separate preference models, and aggregates these preferences in a value-sensitive way. Such recommendations can be updated in response to changes in facts or values and are inherently explainable. We suggest how stakeholders can make their preferences known to the system and how they can verify whether their preferences were properly considered in the regulatory decision. The resulting system promises to support regulatory justice, legitimacy, and compliance.

2604.00225 2026-04-02 eess.IV cs.CV

Pupil Design for Computational Wavefront Estimation

Ali Almuallem, Nicholas Chimitt, Bole Ma, Qi Guo, Stanley H. Chan

Abstract

Establishing a precise connection between imaged intensity and the incident wavefront is essential for emerging applications in adaptive optics, holography, computational microscopy, and non-line-of-sight imaging. While prior work has shown that breaking symmetries in pupil design enables wavefront recovery from a single intensity measurement, there is little guidance on how to design a pupil that improves wavefront estimation. In this work, we introduce a quantitative asymmetry metric to bridge this gap and, through an extensive empirical study and supporting analysis, demonstrate that increasing asymmetry enhances wavefront recoverability. We analyze the trade-offs in pupil design, including the impact on light throughput and on performance under noise. Both large-scale simulations and optical bench experiments are carried out to support our findings.

2604.00222 2026-04-02 cs.SE cs.LG cs.PF

Risk-Aware Batch Testing for Performance Regression Detection

Ali Sayedsalehi, Peter C. Rigby, Gregory Mierzwinski

Comments 14 pages, 1 figure, 4 tables. Replication package and dataset available

Abstract

Performance regression testing is essential in large-scale continuous-integration (CI) systems, yet executing full performance suites for every commit is prohibitively expensive. Prior work on performance regression prediction and batch testing has shown independent benefits, but each faces practical limitations: predictive models are rarely integrated into CI decision-making, and conventional batching strategies ignore commit-level heterogeneity. We unify these strands by introducing a risk-aware framework that integrates machine-learned commit risk with adaptive batching. Using Mozilla Firefox as a case study, we construct a production-derived dataset of human-confirmed regressions aligned chronologically with Autoland, and fine-tune ModernBERT, CodeBERT, and LLaMA-3.1 variants to estimate commit-level performance regression risk, achieving up to 0.694 ROC-AUC with CodeBERT. The risk scores drive a family of risk-aware batching strategies, including Risk-Aged Priority Batching and Risk-Adaptive Stream Batching, evaluated through realistic CI simulations. Across thousands of historical Firefox commits, our best overall configuration, Risk-Aged Priority Batching with linear aggregation (RAPB-la), yields a Pareto improvement over Mozilla's production-inspired baseline. RAPB-la reduces total test executions by 32.4%, decreases mean feedback time by 3.8%, maintains mean time-to-culprit at approximately the baseline level, reduces maximum time-to-culprit by 26.2%, and corresponds to an estimated annual infrastructure cost savings of approximately $491K under our cost model. These results demonstrate that risk-aware batch testing can reduce CI resource consumption while improving diagnostic timeliness. To support reproducibility and future research, we release a complete replication package containing all datasets, fine-tuning pipelines, and implementations of our batching algorithms.
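
The abstract names Risk-Aged Priority Batching with linear aggregation (RAPB-la) but does not define it, so the following is only a hypothetical sketch of what a "risk-aged priority" and "linear aggregation" could look like: commits are ordered by predicted risk plus a waiting-time bonus, and a batch is filled while the summed risk stays under a budget. `Commit`, `priority`, `next_batch`, `aging_rate`, `risk_budget`, and `max_size` are illustrative names and parameters, not the paper's.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    commit_id: str
    risk: float   # predicted probability that the commit causes a performance regression
    arrival: int  # time step at which the commit landed


def priority(commit, now, aging_rate):
    # Hypothetical "risk-aged" priority: predicted risk plus a linear waiting-time
    # bonus, so low-risk commits are not starved indefinitely.
    return commit.risk + aging_rate * (now - commit.arrival)


def next_batch(pending, now, aging_rate=0.01, risk_budget=1.0, max_size=8):
    """Fill one batch in priority order while the summed (linearly aggregated)
    risk stays within `risk_budget`; commits that do not fit wait for a later batch."""
    ordered = sorted(pending, key=lambda c: priority(c, now, aging_rate), reverse=True)
    batch, total_risk = [], 0.0
    for c in ordered:
        if len(batch) >= max_size:
            break
        if batch and total_risk + c.risk > risk_budget:
            continue
        batch.append(c)
        total_risk += c.risk
    return batch


pending = [
    Commit("a", 0.70, 0),
    Commit("b", 0.05, 0),
    Commit("c", 0.40, 1),
    Commit("d", 0.02, 1),
    Commit("e", 0.10, 2),
]
ids = [c.commit_id for c in next_batch(pending, now=3)]
print(ids)  # ['a', 'e', 'b', 'd']: 'c' would push the summed risk past 1.0
```

Capping the summed risk keeps likely-regressing commits out of large batches, where bisecting to a culprit would be expensive, while low-risk commits can share a batch.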

2604.00189 2026-04-02 cs.SE cs.AI cs.NI

Making Sense of AI Agents Hype: Adoption, Architectures, and Takeaways from Practitioners

Ruoyu Su, Matteo Esposito, Roberta Capuano, Rafiullah Omar, June Sallou, Henry Muccini, Davide Taibi

Abstract

To support practitioners in understanding how agentic systems are designed in real-world industrial practice, we present a review of practitioner conference talks on AI agents. We analyzed 138 recorded talks to examine how companies adopt agent-based architectures (Objective 1), identify recurring architectural strategies and patterns (Objective 2), and analyze application domains and technologies used to implement and operate LLM-driven agentic systems (Objective 3).

2604.00186 2026-04-02 eess.SY cs.AI cs.CY cs.SY econ.GN q-fin.EC stat.AP

Agentic AI and Occupational Displacement: A Multi-Regional Task Exposure Analysis of Emerging Labor Market Disruption

Ravish Gupta, Saket Kumar

Comments 26 pages, 2 figures, 6 tables. Submitted to IMF-OECD-PIIE-World Bank Conference on Labor Markets and Structural Transformation 2026

Abstract

This paper extends the Acemoglu-Restrepo task exposure framework to address the labor market effects of agentic artificial intelligence systems: autonomous AI agents capable of completing entire occupational workflows rather than discrete tasks. Unlike prior automation technologies that substitute for individual subtasks, agentic AI systems execute end-to-end workflows involving multi-step reasoning, tool invocation, and autonomous decision-making, substantially expanding occupational displacement risk beyond what existing task-level analyses capture. We introduce the Agentic Task Exposure (ATE) score, a composite measure (computed algorithmically from O*NET task data using calibrated adoption parameters, not estimated by regression) that incorporates AI capability scores, workflow coverage factors, and logistic adoption velocity. Applying the ATE framework across five major US technology regions (Seattle-Tacoma, San Francisco Bay Area, Austin, New York, and Boston) over a 2025-2030 horizon, we find that 93.2% of the 236 analyzed occupations across six information-intensive SOC groups (financial, legal, healthcare, healthcare support, sales, and administrative/clerical) cross the moderate-risk threshold (ATE >= 0.35) in Tier 1 regions by 2030, with credit analysts, judges, and sustainability specialists reaching ATE scores of 0.43-0.47. We simultaneously identify seventeen emerging occupational categories benefiting from reinstatement effects, concentrated in human-AI collaboration, AI governance, and domain-specific AI operations roles. Our findings carry implications for workforce transition policy, regional economic planning, and the temporal dynamics of labor market adjustment.

2604.00181 2026-04-02 cs.CR cs.AI

NFC based inventory control system for secure and efficient communication

Razi Iqbal, Awais Ahmad, Asfandyar Gillani

Abstract

This paper proposes using Near Field Communication (NFC) for inventory control systems instead of traditional barcodes. Because of its high security, ease of use, and efficiency, NFC is well suited to systems such as inventory control. In traditional inventory control systems, each product has a barcode attached to it, which is vulnerable to attacks because barcodes are openly readable and have no security. Furthermore, barcodes are prone to damage and can be unreliable when attached to certain types of products, e.g., hot and frozen products, circular products, and irregularly shaped products such as clothes. NFC, on the other hand, is efficient, secure, and reliable for short-range wireless communication. In this paper, we present a prototype inventory control system for an electronics store in which each product has a passive NFC tag attached to it. When a customer buys a product, the receipt is generated via NFC communication between the passive tag on the product and an NFC-enabled device (e.g., a smartphone or reader) at the cash counter.

2604.00179 2026-04-02 eess.SY cs.LG cs.SY

Finite-Time Analysis of Projected Two-Time-Scale Stochastic Approximation

Yitao Bai, Thinh T. Doan, Justin Romberg

Comments 6 pages, 3 figures

Abstract

We study the finite-time convergence of projected linear two-time-scale stochastic approximation with constant step sizes and Polyak--Ruppert averaging. We establish an explicit mean-square error bound, decomposing it into two interpretable components, an approximation error determined by the constrained subspace and a statistical error decaying at a sublinear rate, with constants expressed through restricted stability margins and a coupling invertibility condition. These constants cleanly separate the effect of subspace choice (approximation errors) from the effect of the averaging horizon (statistical errors). We illustrate our theoretical results through a number of numerical experiments on both synthetic and reinforcement learning problems.
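
The recursion studied above can be sketched in NumPy. This is an illustrative sketch, not the authors' experimental code: the matrices, step sizes, Gaussian noise model, and the choice of which iterate is slow are assumptions, and the example uses identity projections so that the fixed point is the exact solution of the coupled linear system. With a proper subspace, the averaged iterates would instead approach a projected fixed point, and the gap to the true solution would play the role of the approximation error in the bound.

```python
import numpy as np


def projected_two_time_scale_sa(A11, A12, A21, A22, b1, b2, P_x, P_y,
                                alpha, beta, num_steps, noise_std, seed=0):
    """Linear two-time-scale SA with constant step sizes, projection onto fixed
    subspaces (projection matrices P_x, P_y), and Polyak-Ruppert averaging.

    The slow iterate x uses step alpha; the fast iterate y uses the larger step beta.
    Returns the running averages of x and y.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(len(b1))
    y = np.zeros(len(b2))
    x_avg = np.zeros_like(x)
    y_avg = np.zeros_like(y)
    for k in range(1, num_steps + 1):
        noise_x = noise_std * rng.standard_normal(len(b1))
        noise_y = noise_std * rng.standard_normal(len(b2))
        # Both updates read the current (x, y) pair, i.e. a simultaneous update.
        x_new = P_x @ (x + alpha * (b1 - A11 @ x - A12 @ y + noise_x))
        y_new = P_y @ (y + beta * (b2 - A21 @ x - A22 @ y + noise_y))
        x, y = x_new, y_new
        # Polyak-Ruppert averaging as an incremental mean of the iterates.
        x_avg += (x - x_avg) / k
        y_avg += (y - y_avg) / k
    return x_avg, y_avg


d = 2
I = np.eye(d)
A11, A12, A21, A22 = I, 0.5 * I, 0.5 * I, I
b1, b2 = np.array([1.0, -1.0]), np.array([0.5, 2.0])
x_bar, y_bar = projected_two_time_scale_sa(A11, A12, A21, A22, b1, b2, I, I,
                                           alpha=0.01, beta=0.1,
                                           num_steps=40_000, noise_std=0.1)
# Exact solution of the coupled linear system, for comparison.
full = np.block([[A11, A12], [A21, A22]])
sol = np.linalg.solve(full, np.concatenate([b1, b2]))
err = np.abs(np.concatenate([x_bar, y_bar]) - sol).max()
print(err)
```

Most of the remaining error in this toy comes from averaging in the early transient from the zero start; it shrinks as the horizon grows, in line with the statistical term decaying with the averaging horizon.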

2604.00171 2026-04-02 cs.SE cs.AI cs.LO

Unified Architecture Metamodel of Information Systems Developed by Generative AI

Oleg Grynets, Vasyl Lyashkevych

Comments 22 pages, 13 figures, 12 tables, 28 references

Abstract

The rapid development of AI and LLMs has driven new software development life cycle (SDLC) methods in which a large portion of code and of technical and business documentation is generated automatically. However, since there is no single architectural framework that provides consistent, repeatable transformations across the different representation layers of information systems, such systems remain fragmented in their representation. This study explores the problem of creating a unified architecture for LLM-oriented applications based on architectural frameworks selected by SMEs. A framework structure is proposed that covers several key types of architectural diagrams and supports a closed cycle of transformations, such as "Code to Documentation to Code". The key architectural diagrams are split equally among three main architectural layers: high-layer (business and domain understanding), middle-layer (system architecture), and low-layer (developer-layer architecture). Each architectural layer in turn contains several abstraction layers, which make it more flexible and better aligned with design principles and architectural patterns. Experiments demonstrated stable quality of generated documentation and code when a structured architectural context, in the form of architectural diagrams, is provided. The results confirm that the proposed unified architecture metamodel can serve as an effective interface between humans and models, improving the accuracy, stability, and repeatability of LLM generation. However, the selected set of architectural diagrams should be optimised to avoid redundancy between some diagrams, and some diagrams should be updated to represent additional contextual orchestration. This work demonstrates measurable improvements for a new generation of intelligent tools that automate the SDLC and enable a comprehensive architecture compatible with AI-driven development.

2604.00167 2026-04-02 cs.SE cs.AI

A Study on the Impact of Fault localization Granularity for Repository-Scale Code Repair Tasks

Joseph Townsend, Chandresh Pravin, Kwun Ho Ngan, Matthieu Parizy

Abstract

Automatic program repair can be a challenging task, especially when resolving complex issues at the repository level, which often involves issue reproduction, fault localization, code repair, testing, and validation. Issues of this scale are commonly found in popular GitHub repositories and in datasets derived from them. Some repository-level approaches separate localization and repair into distinct phases. In such approaches, fault localization methods vary in the granularity of localization. Although the impact of granularity has been explored to some degree on smaller datasets, not all studies isolate it from the separate question of localization accuracy by testing code repair under the assumption of perfect fault localization. To the best of the authors' knowledge, no repository-scale studies have explicitly investigated granularity under this assumption, nor conducted a systematic empirical comparison of granularity levels in isolation. We propose a framework for performing such tests by modifying the localization phase of the Agentless framework to retrieve ground-truth localization data and include it as context in the prompt fed to the repair phase. We show that, under this configuration and aggregated over the SWE-Bench-Mini dataset, function-level granularity yields a higher repair rate than line-level and file-level granularity. However, a closer analysis suggests that the ideal granularity may in fact be task dependent. This study is not intended to improve on the state of the art, nor are its results intended to be compared against complete agentic frameworks. Rather, we present a proof of concept for investigating how fault localization may affect automatic code repair in repository-scale scenarios. We present preliminary findings to this end and encourage further research into the relationship between the two phases.

2604.00120 2026-04-02 cs.SE cs.AI

From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering

Rafal Wlodarski

Abstract

Software engineering courses often require rapid upskilling in supporting knowledge areas such as domain understanding and modeling methods. We report an experience from a two-week milestone in a master's course where 29 students used a customized ChatGPT (GPT-3.5) tutor grounded in a curated course knowledge base to learn cryptocurrency-finance basics and Domain-Driven Design (DDD). We logged all interactions and evaluated a 34.5% random sample of prompt-answer pairs (60/~174) with a five-dimension rubric (accuracy, relevance, pedagogical value, cognitive load, supportiveness), and we collected pre/post self-efficacy. Responses were consistently accurate and relevant in this setting: accuracy averaged 98.9% with no factual errors and only 2/60 minor inaccuracies, and relevance averaged 92.2%. Pedagogical value was high (89.4%) with generally appropriate cognitive load (82.78%), but supportiveness was low (37.78%). Students reported large pre-post self-efficacy gains for genAI-assisted domain learning and DDD application. From these observations we distill seventeen concrete teaching practices spanning prompt/configuration and course/workflow design (e.g., setting expected granularity, constraining verbosity, curating guardrail examples, adding small credit with a simple quality rubric). Within this single-course context, results suggest that genAI-supported learning can complement instruction in domain understanding and modeling tasks, while leaving room to improve tone and follow-up structure.

2604.00112 2026-04-02 cs.CR cs.LG cs.SE

Efficient Software Vulnerability Detection Using Transformer-based Models

Sameer Shaik, Zhen Huang, Daniela Stan Raicu, Jacob Furst

Abstract

Detecting software vulnerabilities is critical to ensuring the security and reliability of modern computer systems. Deep neural networks have shown promising results on vulnerability detection, but they lack the capability to capture global contextual information on vulnerable code. To address this limitation, we explore the application of transformers for C/C++ vulnerability detection. We use program slices that encapsulate key syntactic and semantic features of program code, such as API function calls, array usage, pointer manipulations, and arithmetic expressions. By leveraging transformers' capability to capture both local and global contextual information on vulnerable code, our work can identify vulnerabilities accurately. Combined with data balancing and hyperparameter fine-tuning, our work offers a robust and efficient approach to identifying vulnerable code with moderate resource usage and training time.

2604.00081 2026-04-02 cs.CY cs.AI cs.RO

Beyond Symbolic Control: Societal Consequences of AI-Driven Workforce Displacement and the Imperative for Genuine Human Oversight Architectures

Richard J. Mitchell

Comments 23 pages, 23 references

Abstract

The accelerating displacement of human labor by artificial intelligence (AI) and robotic systems represents a structural transformation whose societal consequences extend far beyond conventional labor market analysis. This paper presents a systematic multi-domain examination of the likely effects on economic structure, psychological well-being, political stability, education, healthcare, and geopolitical order. We identify a critical and underexamined dimension of this transition: the governance gap between nominal human oversight of AI systems -- where humans occupy positions of formal authority over AI decisions -- and genuine human oversight, where those humans possess the cognitive access, technical capability, and institutional authority to meaningfully understand, evaluate, and override AI outputs. We argue that this distinction, largely absent from current governance frameworks including the EU AI Act and NIST AI Risk Management Framework 1.0, represents the primary architectural failure mode in deployed AI governance. The societal consequences of labor displacement intensify this problem by concentrating consequential AI decision-making among an increasingly narrow class of technical and capital actors. We propose five architectural requirements for genuine human oversight systems and characterize the governance window -- estimated at 10-15 years -- before current deployment trajectories risk path-dependent social, economic, and institutional lock-in.

2604.00065 2026-04-02 q-bio.GN cs.LG

Genetic algorithms for multi-omic feature selection: a comparative study in cancer survival analysis

Luca Cattelani, Vittorio Fortino

Abstract

Multi-omic datasets offer opportunities for improved biomarker discovery in cancer research, but their high dimensionality and limited sample sizes make identifying compact and effective biomarker panels challenging. Feature selection in large-scale omics can be efficiently addressed by combining machine learning with genetic algorithms, which naturally support multi-objective optimization of predictive accuracy and biomarker set size. However, genetic algorithms remain relatively underexplored for multi-omic feature selection, where most approaches concatenate all layers into a single feature space. To address this limitation, we introduce Sweeping*, a multi-view, multi-objective algorithm alternating between single- and multi-view optimization. It employs a nested single-view multi-objective optimizer, and for this study we use the genetic algorithm NSGA3-CHS. It first identifies informative biomarkers within each layer, then jointly evaluates cross-layer interactions; these multi-omic solutions guide the next single-view search. Through repeated sweeps, the algorithm progressively identifies compact biomarker panels capturing cross-modal complementary signals. We benchmark five Sweeping* strategies, including hierarchical and concatenation-based variants, using survival prediction on three TCGA cohorts. Each strategy jointly optimizes predictive accuracy and set size, measured via the concordance index and root-leanness. Overall performance and estimation error are assessed through cross hypervolume and Pareto delta under 5-fold cross-validation. Our results show that Sweeping* can improve the accuracy-complexity trade-off when sufficient survival signal is present and that integrating omic layers can enhance survival prediction beyond clinical-only models, although benefits remain cohort-dependent.
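
Each Sweeping* strategy yields a set of trade-offs between predictive accuracy and panel size, and comparing such sets starts from non-dominated (Pareto) filtering. The snippet below shows only that filtering step for hypothetical (concordance index, number of features) pairs; it is not NSGA3-CHS or Sweeping*, and it uses the raw feature count as a stand-in for the paper's root-leanness measure.

```python
def pareto_front(solutions):
    """Return the non-dominated (c_index, n_features) pairs, maximizing the
    concordance index and minimizing the biomarker-set size (O(n^2) check)."""
    front = []
    for i, (ci, size) in enumerate(solutions):
        dominated = any(
            cj >= ci and sj <= size and (cj > ci or sj < size)
            for j, (cj, sj) in enumerate(solutions)
            if j != i
        )
        if not dominated:
            front.append((ci, size))
    return sorted(front, key=lambda s: s[1])


# Hypothetical candidate panels: (cross-validated C-index, number of biomarkers).
candidates = [(0.62, 3), (0.65, 5), (0.64, 8), (0.70, 12), (0.69, 20), (0.60, 3)]
front = pareto_front(candidates)
print(front)
```

Here (0.64, 8), (0.69, 20), and (0.60, 3) drop out because another panel is at least as accurate and no larger; the surviving three panels are the accuracy-complexity trade-offs a hypervolume-style summary would then score.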

2604.00064 2026-04-02 stat.ML cs.LG math.PR math.ST q-fin.CP stat.TH

Forecast collapse of transformer-based models under squared loss in financial time series

Pierre Andreoletti

Abstract

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results with numerical experiments on high-frequency EUR/USD exchange rate data, analyzing the distribution of trajectory-level forecasting errors. The results show that Transformer-based models yield larger errors than a simple linear benchmark on a large majority of forecasting windows, consistent with the variance-driven mechanism identified by the theory.
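
The variance mechanism described above can be reproduced in miniature. In this toy sketch (an assumption-laden illustration, not the paper's experiment), returns are i.i.d. standard normal, so the Bayes-optimal forecast under squared loss is zero, and an over-parameterized 50-lag least-squares regression stands in for an expressive model such as a Transformer. Whatever the regression fits in the training window is noise, so on fresh data it only adds variance.

```python
import numpy as np


def lagged_design(series, num_lags):
    """Rows are windows of `num_lags` past returns; targets are the next return."""
    X = np.stack([series[i : i + num_lags] for i in range(len(series) - num_lags)])
    y = series[num_lags:]
    return X, y


rng = np.random.default_rng(1)
num_lags = 50
train = rng.standard_normal(200)      # i.i.d. returns: E[r_t | past] = 0
test = rng.standard_normal(10_000)

X_tr, y_tr = lagged_design(train, num_lags)
X_te, y_te = lagged_design(test, num_lags)

# Flexible predictor (50 coefficients from 150 training rows): least squares on lags.
coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
mse_flexible = np.mean((X_te @ coef - y_te) ** 2)

# Bayes-optimal predictor under squared loss here: the conditional mean, i.e. zero.
mse_zero = np.mean(y_te ** 2)
print(f"zero forecast MSE: {mse_zero:.3f}, 50-lag regression MSE: {mse_flexible:.3f}")
```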

2604.00060 2026-04-02 stat.ML cs.IT cs.LG math.IT

Scaled Gradient Descent for Ill-Conditioned Low-Rank Matrix Recovery with Optimal Sampling Complexity

Zhenxuan Li, Meng Huang

Abstract

The low-rank matrix recovery problem seeks to reconstruct an unknown $n_1 \times n_2$ rank-$r$ matrix from $m$ linear measurements, where $m\ll n_1n_2$. This problem has been extensively studied over the past few decades, leading to a variety of algorithms with solid theoretical guarantees. Among these, non-convex methods based on gradient descent (GD) have become particularly popular due to their computational efficiency. However, these methods typically suffer from two key limitations: a sub-optimal sample complexity of $O((n_1 + n_2)r^2)$ and an iteration complexity of $O(\kappa\log(1/\epsilon))$ to achieve $\epsilon$-accuracy, resulting in slow convergence when the target matrix is ill-conditioned. Here, $\kappa$ denotes the condition number of the unknown matrix. Recent studies show that a preconditioned variant of GD, known as scaled gradient descent (ScaledGD), can significantly reduce the iteration complexity to $O(\log(1/\epsilon))$. Nonetheless, its sample complexity remains sub-optimal at $O((n_1 + n_2)r^2)$. In contrast, a delicate virtual sequence technique demonstrates that standard GD in the positive semidefinite (PSD) setting achieves the optimal sample complexity $O((n_1 + n_2)r)$, but converges more slowly, with an iteration complexity of $O(\kappa^2 \log(1/\epsilon))$. In this paper, through a more refined analysis, we show that ScaledGD achieves both the optimal sample complexity $O((n_1 + n_2)r)$ and the improved iteration complexity $O(\log(1/\epsilon))$. Notably, our results extend beyond the PSD setting to general low-rank matrix recovery problems. Numerical experiments further validate that ScaledGD accelerates convergence for ill-conditioned matrices with the optimal sampling complexity.
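
For reference, the ScaledGD update discussed above preconditions each factor's gradient by the inverse Gram matrix of the other factor. The sketch below applies it to matrix sensing with i.i.d. Gaussian measurement matrices and spectral initialization; the dimensions, number of measurements, step size 0.5, and iteration count are illustrative assumptions, and the code reproduces neither the paper's refined analysis nor its experiments.

```python
import numpy as np


def scaled_gd_matrix_sensing(A, y, rank, step=0.5, iters=200):
    """ScaledGD for min_{L,R} 1/(2m) * sum_i (<A_i, L R^T> - y_i)^2.

    A: (m, n1, n2) stack of measurement matrices, y: (m,) measurements.
    Spectral initialization, then the preconditioned updates
    L <- L - step * grad_L (R^T R)^{-1},  R <- R - step * grad_R (L^T L)^{-1}.
    """
    m = len(y)
    # Spectral initialization from the back-projection (1/m) * sum_i y_i A_i.
    Y = np.tensordot(y, A, axes=1) / m
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    L = U[:, :rank] * np.sqrt(s[:rank])
    R = Vt[:rank].T * np.sqrt(s[:rank])
    for _ in range(iters):
        residual = np.einsum("kij,ij->k", A, L @ R.T) - y
        G = np.tensordot(residual, A, axes=1) / m   # adjoint operator applied to the residual
        L_new = L - step * (G @ R) @ np.linalg.inv(R.T @ R)
        R_new = R - step * (G.T @ L) @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new                         # simultaneous update of both factors
    return L @ R.T


rng = np.random.default_rng(0)
n1, n2, r, m = 15, 15, 2, 4000
# Ill-conditioned ground truth: singular values 10 and 1 (condition number 10).
U0, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n2, r)))
M = U0 @ np.diag([10.0, 1.0]) @ V0.T
A = rng.standard_normal((m, n1, n2))
y = np.einsum("kij,ij->k", A, M)
M_hat = scaled_gd_matrix_sensing(A, y, rank=r)
rel_err = np.linalg.norm(M_hat - M) / np.linalg.norm(M)
print(rel_err)
```

Because the preconditioners $(R^\top R)^{-1}$ and $(L^\top L)^{-1}$ rescale each direction by its own singular value, the step size does not have to shrink with $\kappa$, which is the source of the $O(\log(1/\epsilon))$ iteration count.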

2604.00058 2026-04-02 q-bio.GN cs.AI cs.LG

GenoBERT: A Language Model for Accurate Genotype Imputation

Lei Huang, Chuan Qiu, Kuan-Jui Su, Anqi Liu, Yun Gong, Weiqiang Lin, Lindong Jiang, Chen Zhao, Meng Song, Jeffrey Deng, Qing Tian, Zhe Luo, Ping Gong, Hui Shen, Chaoyang Zhang, Hong-Wen Deng

Abstract

Genotype imputation enables dense variant coverage for genome-wide association and risk-prediction studies, yet conventional reference-panel methods remain limited by ancestry bias and reduced rare-variant accuracy. We present Genotype Bidirectional Encoder Representations from Transformers (GenoBERT), a transformer-based, reference-free framework that tokenizes phased genotypes and uses a self-attention mechanism to capture both short- and long-range linkage disequilibrium (LD) dependencies. Benchmarking on two independent datasets including the Louisiana Osteoporosis Study (LOS) and the 1000 Genomes Project (1KGP) across ancestry groups and multiple genotype missingness levels (5-50%) shows that GenoBERT achieves the highest overall accuracy compared to four baseline methods (Beagle5.4, SCDA, BiU-Net, and STICI). At practical sparsity levels (up to 25% missing), GenoBERT attains high overall imputation accuracy ($r^2 \approx 0.98$) across datasets, and maintains robust performance ($r^2 > 0.90$) even at 50% missingness. Experimental results across different ancestries confirm consistent gains across datasets, with resilience to small sample sizes and weak LD. A 128-SNP (single-nucleotide polymorphism) context window (approximately 100 Kb) is validated through LD-decay analyses as sufficient to capture local correlation structures. By eliminating reference-panel dependence while preserving high accuracy, GenoBERT provides a scalable and robust solution for genotype imputation and a foundation for downstream genomic modeling.
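
A common way to compute imputation $r^2$ figures like those quoted above is the squared Pearson correlation between imputed dosages and true genotypes at the masked positions. Whether GenoBERT reports exactly this variant (for example, pooled over all sites versus averaged per variant) is an assumption here, and the genotypes, mask rate, and imputed values below are synthetic.

```python
import numpy as np


def imputation_r2(true_dosage, imputed_dosage):
    """Squared Pearson correlation between true genotypes (0/1/2 allele counts)
    and imputed dosages, computed over the masked (imputed) positions."""
    r = np.corrcoef(true_dosage, imputed_dosage)[0, 1]
    return float(r ** 2)


rng = np.random.default_rng(0)
true = rng.binomial(2, 0.3, size=1000).astype(float)   # hypothetical genotypes
mask = rng.random(1000) < 0.25                          # about 25% "missing" positions
imputed = true[mask] + rng.normal(0.0, 0.1, size=mask.sum())  # a near-perfect imputer
r2 = imputation_r2(true[mask], imputed)
print(round(r2, 3))
```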

2604.00057 2026-04-02 cs.MM cs.AI

Towards Automatic Soccer Commentary Generation with Knowledge-Enhanced Visual Reasoning

Zeyu Jin, Xiaoyu Qin, Songtao Zhou, Kaifeng Yun, Jia Jia

Comments Accepted by ICME 2026

Abstract

Soccer commentary plays a crucial role in enhancing the soccer viewing experience for audiences. Previous studies on automatic soccer commentary generation typically adopt an end-to-end method to generate anonymous live text commentary. Such generated commentary falls short of real-world live televised commentary, as it contains anonymous entities and context-dependent errors, and lacks statistical insight into game events. To bridge the gap, we propose GameSight, a two-stage model that addresses soccer commentary generation as a knowledge-enhanced visual reasoning task, enabling knowledgeable, live-television-style commentary with accurate references to entities (players and teams). GameSight first performs visual reasoning to align anonymous entities through fine-grained visual and contextual analysis. The entity-aligned commentary is then refined with knowledge by incorporating external historical statistics and iteratively updated internal game-state information. As a result, GameSight improves player alignment accuracy by 18.5% on the SN-Caption-test-align dataset compared with Gemini 2.5-pro. Combined with further knowledge enhancement, GameSight outperforms the compared baselines in segment-level accuracy and commentary quality, as well as in game-level contextual relevance and structural composition. We believe that our work paves the way for a more informative and engaging human-centric experience with AI sports applications. Demo Page: https://gamesight2025.github.io/gamesight2025

2604.00053 2026-04-02 cs.SE cs.AI

The Energy Footprint of LLM-Based Environmental Analysis: LLMs and Domain Products

Alicia Bao, Jiamian He, Angel Hsu, Diego Manya, Ji Zhang

Abstract

As large language models (LLMs) are increasingly used in domain-specific applications, including climate change and environmental research, understanding their energy footprint has become an important concern. The growing adoption of retrieval-augmented generation (RAG) systems for climate-domain-specific analysis raises a key question: how does the energy consumption of domain-specific RAG workflows compare with that of direct generic LLM usage? Prior research has focused on standalone model calls or coarse token-based estimates, leaving the energy implications of deployed application workflows insufficiently understood. In this paper, we assess the inference-time energy consumption of two LLM-based climate analysis chatbots (ChatNetZero and ChatNDC) compared with the generic GPT-4o-mini model. We estimate energy use under actual user queries by decomposing each workflow into retrieval, generation, and hallucination-checking components. We also test across different times of day and geographic access locations. Our results show that the energy consumption of domain-specific RAG systems depends strongly on their design. More agentic pipelines substantially increase inference-time energy use, particularly when used for additional accuracy or verification checks, although they may not yield proportional gains in response quality. While more research is needed to test these initial findings more robustly across models, environments, and prompting structures, this study provides a new understanding of how the design of domain-specific LLM products affects both the energy footprint and the quality of output.

2604.00049 2026-04-02 math.NA cs.NA cs.RO

A Generalized Matrix Inverse that is Consistent with Respect to Diagonal Transformations

Jeffrey Uhlmann

Comments This reflects the 2018 SIMAX publication. (The 1604.08476 preprint has a comment saying that its content is contained in the SIMAX paper, but the two are quite distinct.)

Details
Journal ref
SIAM Journal on Matrix Analysis and Applications, Vol. 39, No. 2, pp. 781-800, 2018
English abstract

A new generalized matrix inverse is derived which is consistent with respect to arbitrary nonsingular diagonal transformations, e.g., it preserves units associated with variables under state space transformations, thus providing a general solution to a longstanding open problem relevant to a wide variety of applications in robotics, tracking, and control systems. The new inverse complements the Drazin inverse (which is consistent with respect to similarity transformations) and the Moore-Penrose inverse (which is consistent with respect to unitary/orthonormal transformations) to complete a trilogy of generalized matrix inverses that exhausts the standard family of analytically important linear system transformations. Results are generalized to obtain unit-consistent and unit-invariant matrix decompositions, and examples of their use are described.
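The consistency property can be checked numerically. Below is a minimal NumPy sketch in the spirit of the construction, under a loudly simplifying assumption: every entry of A is nonzero, so the positive diagonal scalings D and E that give each row and column of the scaled core S a unit geometric mean can be read off in closed form by double-centering log|A|. The paper treats general matrices, including those with zero entries; the function name `uc_inverse` and the example matrices are our own illustrations.

```python
import numpy as np

def uc_inverse(A):
    """Sketch of a unit-consistent generalized inverse.

    Assumes every entry of A is nonzero (a simplification of the paper's setting).
    Factor A = D @ S @ E with positive diagonal D, E such that every row and column
    of |S| has geometric mean 1, then return E^-1 @ pinv(S) @ D^-1.
    """
    A = np.asarray(A, dtype=float)
    if np.any(A == 0):
        raise ValueError("this sketch assumes all entries of A are nonzero")
    L = np.log(np.abs(A))
    r = L.mean(axis=1)          # row means of log-magnitudes
    c = L.mean(axis=0)          # column means of log-magnitudes
    g = L.mean()                # grand mean
    d = np.exp(r - g)           # diagonal of D (left scaling)
    e = np.exp(c)               # diagonal of E (right scaling)
    S = A / np.outer(d, e)      # log|S| = L - r_i - c_j + g, double-centered
    # Moore-Penrose inverse of the scale-free core, then undo both scalings.
    return np.linalg.pinv(S) / np.outer(e, d)

# Check consistency under diagonal changes of units, including sign flips.
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(3, 4)) * rng.choice([-1.0, 1.0], size=(3, 4))
P = np.diag([2.0, -0.5, 10.0])        # rescale the rows
Q = np.diag([0.1, 3.0, -1.0, 7.0])    # rescale the columns
lhs = uc_inverse(P @ A @ Q)
rhs = np.linalg.inv(Q) @ uc_inverse(A) @ np.linalg.inv(P)
print(np.allclose(lhs, rhs))  # True
```

The same check generally fails for `np.linalg.pinv`, which is consistent only under unitary transformations. For a nonsingular square A, the sketch reduces to the ordinary inverse.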

2604.00048 2026-04-02 eess.IV cs.AI

Whittaker-Henderson smoother for long satellite image time series interpolation

Mathieu Fauvel

Details
English abstract

The Whittaker smoother is a widely adopted solution for pre-processing satellite image time series (SITS). Yet two key limitations remain: the smoothing parameter must be tuned individually for each pixel, and the standard formulation assumes homoscedastic noise, imposing uniform smoothing across the temporal dimension. This paper addresses both limitations by casting the Whittaker smoother as a differentiable neural layer, in which the smoothing parameter is inferred by a neural network. The framework is further extended to handle heteroscedastic noise through a time-varying regularization, allowing the degree of smoothing to adapt locally along the time series. To enable large-scale processing, a sparse, memory-efficient, and fully differentiable implementation is proposed, exploiting the symmetric banded structure of the underlying linear system via Cholesky factorization. Benchmarks on GPU demonstrate that this implementation substantially outperforms standard dense linear solvers in both speed and memory consumption. The approach is validated on SITS acquired over metropolitan France between 2016 and 2024. Results confirm the feasibility of large-scale heteroscedastic Whittaker smoothing, though reconstruction differences from the homoscedastic baseline remain limited, suggesting that the transformer architecture used for smoothing parameter estimation may lack the temporal acuity needed to capture abrupt noise variations such as single-day cloud contamination.
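The linear system behind this abstract is the standard Whittaker normal equation (W + Dᵀ Λ D) z = W y, where W holds per-date weights (0 for cloudy or missing dates), D is the d-th order difference operator, and Λ = diag(λ_t) holds the smoothing parameters. The NumPy sketch below hand-codes a banded Cholesky solve so that only O(n·d) storage is used. It is an illustration under our own assumptions, not the paper's GPU implementation: λ is passed in directly rather than predicted by a neural network, and the names `whittaker_smooth`, `w`, `lam`, and `d` are hypothetical.

```python
import math
import numpy as np

def whittaker_smooth(y, w, lam, d=2):
    """Solve (W + D^T diag(lam) D) z = W y with a banded Cholesky factorization.

    y, w: arrays of length n (w = 0 marks missing dates, e.g. clouds).
    lam: scalar (homoscedastic) or array of length n - d (time-varying penalty).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (n - d,))
    # Coefficients of the d-th difference: (D z)_r = sum_a c[a] * z[r + a].
    c = [(-1) ** (d - a) * math.comb(d, a) for a in range(d + 1)]

    # Lower band storage of M = W + D^T Lam D: band[k, j] = M[j + k, j], k = 0..d.
    band = np.zeros((d + 1, n))
    band[0] += w
    for r in range(n - d):
        for a in range(d + 1):
            for b in range(a + 1):
                band[a - b, r + b] += lam[r] * c[a] * c[b]

    # Banded Cholesky M = L L^T, with L stored the same way: Lb[k, j] = L[j + k, j].
    Lb = np.zeros_like(band)
    for j in range(n):
        s = band[0, j] - sum(Lb[k, j - k] ** 2 for k in range(1, min(d, j) + 1))
        Lb[0, j] = math.sqrt(s)
        for i in range(j + 1, min(n, j + d + 1)):
            s = band[i - j, j] - sum(
                Lb[i - m, m] * Lb[j - m, m] for m in range(max(0, i - d), j)
            )
            Lb[i - j, j] = s / Lb[0, j]

    # Forward substitution L u = W y, then back substitution L^T z = u.
    u = w * y
    for i in range(n):
        u[i] = (u[i] - sum(Lb[i - m, m] * u[m] for m in range(max(0, i - d), i))) / Lb[0, i]
    z = u.copy()
    for i in range(n - 1, -1, -1):
        z[i] = (z[i] - sum(Lb[m - i, i] * z[m] for m in range(i + 1, min(n, i + d + 1)))) / Lb[0, i]
    return z
```

Passing a scalar `lam` gives the homoscedastic smoother; passing an array of length n - d lets the penalty vary along the series, as in the heteroscedastic variant. The factorization costs O(n·d²) time and O(n·d) memory, versus O(n³) time and O(n²) memory for a dense solve.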

2604.00046 2026-04-02 cs.SE cs.LG

Large Language Models for Analyzing Enterprise Architecture Debt in Unstructured Documentation

Christin Pagels, Simon Hacks, Rob Henk Bemthuis

Comments Author version, 2 figures, 5 tables. To appear in the Proceedings of the 41st ACM/SIGAPP Symposium on Applied Computing (SAC '26), 2026

Details
English abstract

Enterprise Architecture Debt (EA Debt) arises from suboptimal design decisions and misaligned components that can degrade an organization's IT landscape over time. Its early indicators, Enterprise Architecture Smells (EA Smells), are currently detected mostly by hand or only from structured artifacts, leaving much unstructured documentation under-analyzed. This study proposes an approach that uses a large language model (LLM) to identify and quantify EA Debt in unstructured architectural documentation. Following a design science research approach, we design and evaluate an LLM-based prototype for automated EA Smell detection. The artifact ingests unstructured documents (e.g., process descriptions, strategy papers), applies fine-tuned detection models, and outputs the identified smells. We evaluate the prototype through a case study using synthetic yet realistic business documents, benchmarking it against a custom GPT-based model. Results show that LLMs can detect multiple predefined EA Smells in unstructured text: the benchmark model achieves higher precision and processing speed, while the fine-tuned on-premise model offers data-protection advantages. The findings highlight opportunities for integrating LLM-based smell detection into EA governance practice.

2604.00043 2026-04-02 cs.PL cs.AI

DriftScript: A Domain-Specific Language for Programming Non-Axiomatic Reasoning Agents

Seamus Brady

Details
English abstract

Non-Axiomatic Reasoning Systems (NARS) provide a framework for building adaptive agents that operate under insufficient knowledge and resources. However, the standard input language, Narsese, poses a usability barrier: its dense symbolic notation, overloaded punctuation, and implicit conventions make programs difficult to read, write, and maintain. We present DriftScript, a Lisp-like domain-specific language that compiles to Narsese. DriftScript provides source-level constructs covering the major sentence and term forms used in Non-Axiomatic Logic (NAL) levels 1 through 8, including inheritance, temporal implication, variable quantification, sequential conjunction, and operation invocation, while replacing symbolic syntax with readable keyword-based S-expressions. The compiler is a zero-dependency, four-stage pipeline implemented in 1,941 lines of C99. When used with the DriftNARS engine, DriftScript programs connect to external systems through four structured callback types and an HTTP operation registry, enabling a sense-reason-act loop for autonomous agents. We describe the language design and formal grammar, detail the compiler architecture, and evaluate the compiler through a 106-case test suite, equivalence testing against hand-written Narsese, a NAL coverage analysis, structural readability metrics, and compilation benchmarks. The source code is available at https://github.com/seamus-brady/DriftNARS. This paper focuses on the design and implementation of the DriftScript language and its embedding into DriftNARS, rather than on new inference algorithms for NARS itself.

2604.00039 2026-04-02 cs.PL cs.LG

Transformers for Program Termination

Yoav Alon, Cristina David

Comments 12 pages

Details
English abstract

Determining whether a program terminates is a core challenge in program analysis with direct implications for correctness, verification, and security. We investigate whether transformer architectures can recognise termination patterns directly from source code and how their strengths can be amplified through ensembles. To overcome the extreme scarcity of non-terminating examples, we design an ensemble framework of compact transformer encoders, systematically trained with a suite of imbalance-aware loss functions and class-aware sampling techniques. By combining models trained with distinct loss functions, our ensembles achieve substantially stronger performance than any single transformer, outperforming both powerful off-the-shelf LLMs and graph-based methods. Finally, we introduce an attribution pipeline that produces syntax-aware explanations for the termination estimation.
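The abstract does not name its specific losses. Purely as an illustration, the pure-Python sketch below shows two common imbalance-aware choices for a rare "non-terminating" label, class-weighted binary cross-entropy and the focal loss, plus a soft-voting rule that averages ensemble members' probabilities. The function names, default parameters, and the averaging rule are our assumptions, not the authors' setup.

```python
import math

def weighted_bce(p, y, pos_weight):
    """Binary cross-entropy in which the rare positive class (y = 1) is up-weighted."""
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: the (1 - p_t)^gamma factor down-weights already easy examples."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

def ensemble_predict(probabilities, threshold=0.5):
    """Soft voting (hypothetical rule): average members' P(non-terminating), then threshold."""
    mean_p = sum(probabilities) / len(probabilities)
    return mean_p, mean_p >= threshold
```

For example, `ensemble_predict([0.2, 0.7, 0.9])` averages to about 0.6 and flags the program as non-terminating, even though one member disagrees; training members with different losses is what makes such disagreements informative.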

2604.00038 2026-04-02 stat.ML cs.LG

Isomorphic Functionalities between Ant Colony and Ensemble Learning: Part II-On the Strength of Weak Learnability and the Boosting Paradigm

Ernest Fokoué, Gregory Babbitt, Yuval Levental

Comments 21 pages, 5 figures, 4 tables

Details
English abstract

In Part I of this series, we established a rigorous mathematical isomorphism between ant colony decision-making and random forest learning, demonstrating that variance reduction through decorrelation is a universal principle shared by biological and computational ensembles. Here we turn to the complementary mechanism: bias reduction through adaptive weighting. Just as boosting algorithms sequentially focus on difficult instances, ant colonies dynamically amplify successful foraging paths through pheromone-mediated recruitment. We prove that these processes are mathematically isomorphic, establishing that the fundamental theorem of weak learnability has a direct analog in colony decision-making. We develop a formal mapping between AdaBoost's adaptive reweighting and ant recruitment dynamics, show that the margin theory of boosting corresponds to the stability of quorum decisions, and demonstrate through comprehensive simulation that ant colonies implementing adaptive recruitment achieve the same bias-reduction benefits as boosting algorithms. This completes a unified theory of ensemble intelligence, revealing that both variance reduction (Part I) and bias reduction (Part II) are manifestations of the same underlying mathematical principles governing collective intelligence in biological and computational systems.
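For readers less familiar with the boosting side of the mapping, the sketch below shows one round of textbook discrete AdaBoost reweighting in pure Python, the mechanism the abstract pairs with pheromone-mediated recruitment. It is the standard algorithm, not the authors' ant-colony model, and the function name `adaboost_reweight` is our own.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting.

    weights: current example weights summing to 1; labels are +1 / -1.
    Returns (alpha, new_weights): the weak learner's vote and the renormalized weights.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must beat chance without being perfect")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified examples (t * p = -1) grow by e^alpha; correct ones shrink by e^-alpha.
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(raw)
    return alpha, [r / z for r in raw]
```

With five equally weighted examples and one mistake, the error is 0.2, alpha is ln 2, and after renormalization the single misclassified example carries half of the total weight (0.5) while each correct one drops to 0.125. This "amplify the hard cases" step is the adaptive weighting that the abstract maps onto recruitment to successful foraging paths.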

2604.00036 2026-04-02 q-bio.NC cs.AI cs.LG cs.NE physics.bio-ph

When and Where: A Model Hippocampal Network Unifies Formation of Time Cells and Place Cells

Qiaorong S. Yu, Zhaoze Wang, Vijay Balasubramanian

Comments 18 pages, 6 figures

Details
English abstract

Hippocampal place and time cells encode spatial and temporal aspects of experience. Both share the same neural substrate, but they have been modeled as having different functions and mechanistic origins: place cells as continuous attractors and time cells as leaky integrators. Here, we show that both types emerge from two dynamical regimes of a single recurrent neural network (RNN) modeling hippocampal CA3 as a predictive autoencoder. The network receives simulated, partially occluded "experience vectors" containing spatial patterns (location-specific activity sampled during environmental traversal) and/or temporal patterns (correlated activity pairs separated by "void" intervals), and is trained to reconstruct the missing input. During spatial navigation, the network generates stable attractor-like place fields. When trained on temporally structured inputs, however, it produces sequentially broadened fields, recapitulating time cells. By varying the spatio-temporal patterning of the input, we observe that hidden units transition smoothly between time-cell-like and place-cell-like representations. These results suggest a shared origin, but a task-driven difference, between place and time cells.

2604.00032 2026-04-02 physics.ed-ph cs.CY cs.RO

Rusty Flying Robots: Learning a Full Robotics Stack with Real-Time Operation on an STM32 Microcontroller in a 9 ECTS MS Course

Wolfgang Hoenig, Christoph Scherer, Khaled Wahba

Comments Accepted at the International Conference on Robotics in Education (RiE), 2026

Details
English abstract

We describe a novel master's-level project course that teaches robotics along the traditional robotics pipeline (dynamics, state estimation, control, planning). A key motivational element is that students must directly apply the algorithms they learn on a highly constrained compute platform, in effect making a robot fly. We teach nonlinear algorithms as deployed in state-of-the-art flight stacks such as PX4. Didactically, we rely on two core concepts: 1) avoiding provided black-box software infrastructure, and 2) using the safe and efficient programming language Rust both on the PC (for simulation) and on an STM32 microcontroller (for robot deployment). We discuss our methodology and the student feedback over two years, with ten students each year. Teaching material: https://imrclab.github.io/teaching/flying-robots

2604.00031 2026-04-02 q-fin.GN cs.LG

Decomposable Reward Modeling and Realistic Environment Design for Reinforcement Learning-Based Forex Trading

Nabeel Ahmad Saidd

Details
English abstract

Applying reinforcement learning (RL) to foreign exchange (Forex) trading remains challenging because it requires a realistic environment, well-defined reward functions, and an expressive action space at the same time, yet many prior studies rely on simplified simulators, single scalar rewards, and restricted action representations, limiting both interpretability and practical relevance. This paper presents a modular RL framework that addresses these limitations through three tightly integrated components: a friction-aware execution engine that enforces strict anti-lookahead semantics (observation at time t, execution at time t+1, and mark-to-market at time t+1) while incorporating realistic costs such as spread, commission, slippage, rollover financing, and margin-triggered liquidation; a decomposable 11-component reward architecture with fixed weights and per-step diagnostic logging to enable systematic ablation and component-level attribution; and a 10-action discrete interface with legal-action masking that encodes explicit trading primitives while enforcing margin-aware feasibility constraints. Empirical evaluation on EURUSD focuses on learning dynamics rather than generalization and reveals strongly non-monotonic reward interactions, where additional penalties do not reliably improve outcomes. The full reward configuration achieves the highest training Sharpe ratio (0.765) and cumulative return (57.09 percent). The expanded action space increases return but also turnover, and it reduces the Sharpe ratio relative to a conservative 3-action baseline, indicating a return-activity trade-off under a fixed training budget. Scaling-enabled variants consistently reduce drawdown, with the combined configuration achieving the strongest endpoint performance.
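The anti-lookahead timing (observe at t, execute at t+1, mark to market at t+1) is easy to get subtly wrong, so here is a deliberately simplified pure-Python sketch of that loop. It is not the paper's engine: it assumes a single flat per-unit friction cost and omits slippage modeling, rollover financing, margin liquidation, the 11-component reward, and action masking. The names `run_backtest`, `policy`, and `cost_per_unit` are hypothetical.

```python
def run_backtest(prices, policy, cost_per_unit=0.0):
    """Minimal anti-lookahead loop: observe at t, execute at t+1, mark to market at t+1.

    prices: sequence of mid prices; policy(history) -> target position in units.
    cost_per_unit: hypothetical flat friction (e.g. half-spread + commission) per unit traded.
    Returns the equity curve, one value per executed step.
    """
    position, equity, curve = 0.0, 0.0, []
    for t in range(len(prices) - 1):
        target = policy(prices[: t + 1])      # the decision sees data up to t only
        # The position held over (t, t+1] was chosen earlier, so it earns this move.
        equity += position * (prices[t + 1] - prices[t])
        # The new order fills at the t+1 price and pays friction on the traded size.
        equity -= abs(target - position) * cost_per_unit
        position = target
        curve.append(equity)                  # marked to market at t+1
    return curve
```

For instance, always holding one unit on prices `[1, 2, 3, 4]` yields an equity curve of `[0.0, 1.0, 2.0]`: the first price move is missed because the order placed after observing t = 0 only fills at t = 1, which is exactly the lag a lookahead-free simulator must impose.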