arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1771
专题追踪
2603.23482 2026-03-25 cs.SE cs.AI

ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains

Muhammad Khalid, Manuel Oriol, Yilmaz Uygun

Comments 17 pages, 6 figures, 7 tables. Accepted at VerifAI-2026 Workshop, co-located with ETAPS 2026

详情
英文摘要

Requirements engineering is a vital, yet labor-intensive, stage in the software development process. This article introduces ReqFusion: an AI-enhanced system that automates the extraction, classification, and analysis of software requirements utilizing multiple Large Language Model (LLM) providers. The architecture of ReqFusion integrates OpenAI GPT, Anthropic Claude, and Groq models to extract functional and non-functional requirements from various documentation formats (PDF, DOCX, and PPTX) in academic, industrial, and tender proposal contexts. The system uses a domain-independent extraction method and generates requirements following the Project, Environment, Goal, and System (PEGS) approach introduced by Bertrand Meyer. The main idea is that, because the PEGS format is detailed, LLMs have more information and cues about the requirements, producing better results than a simple generic request. An ablation study confirms this hypothesis: PEGS-guided prompting achieves an F1 score of 0.88, compared to 0.71 for generic prompting under the same multi-provider configuration. The evaluation used 18 real-world documents to generate 226 requirements through automated classification, with 54.9% functional and 45.1% nonfunctional across academic, business, and technical domains. An extended evaluation on five projects with 1,050 requirements demonstrated significant improvements in extraction accuracy and a 78% reduction in analysis time compared to manual methods. The multi-provider architecture enhances reliability through model consensus and fallback mechanisms, while the PEGS-based approach ensures comprehensive coverage of all requirement categories.

2603.23443 2026-03-25 cs.SE cs.AI

Evaluating LLM-Based Test Generation Under Software Evolution

Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar

Comments 10 pages, 9 figures, 2 tables

详情
英文摘要

Large Language Models (LLMs) are increasingly used for automated unit test generation. However, it remains unclear whether these tests reflect genuine reasoning about program behavior or simply reproduce superficial patterns learned during training. If the latter dominates, LLM-generated tests may exhibit weaknesses such as reduced coverage, missed regressions, and undetected faults. Understanding how LLMs generate tests and how those tests respond to code evolution is therefore essential. We present a large-scale empirical study of LLM-based test generation under program changes. Using an automated mutation-driven framework, we analyze how generated tests react to semantic-altering changes (SAC) and semantic-preserving changes (SPC) across eight LLMs and 22,374 program variants. LLMs achieve strong baseline results, reaching 79% line coverage and 76% branch coverage with fully passing test suites on the original programs. However, performance degrades as programs evolve. Under SACs, the pass rate of newly generated tests drops to 66%, and branch coverage declines to 60%. More than 99% of failing SAC tests pass on the original program while executing the modified region, indicating residual alignment with the original behavior rather than adaptation to updated semantics. Performance also declines under SPCs despite unchanged functionality: pass rates fall to 79% and branch coverage to 69%. Although SPC edits preserve semantics, they often introduce larger syntactic changes, leading to instability in generated test suites. Models generate more new tests while discarding many baseline tests, suggesting sensitivity to lexical changes rather than true semantic impact. Overall, our results indicate that current LLM-based test generation relies heavily on surface-level cues and struggles to maintain regression awareness as programs evolve.

2603.23438 2026-03-25 cs.CR cs.AI

Targeted Adversarial Traffic Generation : Black-box Approach to Evade Intrusion Detection Systems in IoT Networks

Islam Debicha, Tayeb Kenaza, Ishak Charfi, Salah Mosbah, Mehdi Sehaki, Jean-Michel Dricot

Comments Already published in International Journal of Machine Learning and Cybernetics. Debicha, I., Kenaza, T., Charfi, I. et al. Targeted adversarial traffic generation: black-box approach to evade intrusion detection systems in IoT networks. Int. J. Mach. Learn. & Cyber. 17, 58 (2026). https://doi.org/10.1007/s13042-025-02873-w

Journal ref Int. J. Mach. Learn. & Cyber. 17, 58 (2026)

详情
英文摘要

The integration of machine learning (ML) algorithms into Internet of Things (IoT) applications has introduced significant advantages alongside vulnerabilities to adversarial attacks, especially within IoT-based intrusion detection systems (IDS). While theoretical adversarial attacks have been extensively studied, practical implementation constraints have often been overlooked. This research addresses this gap by evaluating the feasibility of evasion attacks on IoT network-based IDSs, employing a novel black-box adversarial attack. Our study aims to bridge theoretical vulnerabilities with real-world applicability, enhancing understanding and defense against sophisticated threats in modern IoT ecosystems. Additionally, we propose a defense scheme tailored to mitigate the impact of evasion attacks, thereby reinforcing the resilience of ML-based IDSs. Our findings demonstrate successful evasion attacks against IDSs, underscoring their susceptibility to advanced techniques. In contrast, we proposed a defense mechanism that exhibits robust performance by effectively detecting the majority of adversarial traffic, showcasing promising outcomes compared to current state-of-the-art defenses. By addressing these critical cybersecurity challenges, our research contributes to advancing IoT security and provides insights for developing more resilient IDS.

2603.23419 2026-03-25 cs.HC cs.AI

Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback

Teerthaa Parakh, Karen M. Feigh

Comments 14 pages, 9 figures. Preprint. An extended abstract is under submission

详情
英文摘要

Human decision-making is strongly influenced by cognitive biases, particularly under conditions of uncertainty and risk. While prior work has examined bias in single-step decisions with immediate outcomes and in human interaction with a single autonomous agent, comparatively little attention has been paid to decision-making under delayed outcomes involving multiple AI agents, where decisions at each step affect subsequent states. In this work, we study how delayed outcomes shape decision-making and responsibility attribution in a multi-agent human-AI task. Using a controlled game-based experiment, we analyze how participants adjust their behavior following positive and negative outcomes. We observe asymmetric responses to gains and losses, with stronger corrective adjustments after negative outcomes. Importantly, participants often fail to correctly identify the actions that caused failure and misattribute responsibility across AI agents, leading to systematic revisions of decisions that are weakly related to the underlying causes of poor performance. We refer to this phenomenon as a form of attribution bias, manifested as biased error attribution under delayed feedback. Our findings highlight how cognitive biases can be amplified in human-AI systems with delayed outcomes and multiple autonomous agents, underscoring the need for decision-support systems that better support causal understanding and learning over time.

2603.23405 2026-03-25 cs.MA cs.AI cs.RO

Planning over MAPF Agent Dependencies via Multi-Dependency PIBT

Zixiang Jiang, Yulun Zhang, Rishi Veerapaneni, Jiaoyang Li

详情
英文摘要

Modern Multi-Agent Path Finding (MAPF) algorithms must plan for hundreds to thousands of agents in congested environments within a second, requiring highly efficient algorithms. Priority Inheritance with Backtracking (PIBT) is a popular algorithm capable of effectively planning in such situations. However, PIBT is constrained by its rule-based planning procedure and lacks generality because it restricts its search to paths that conflict with at most one other agent. This limitation also applies to Enhanced PIBT (EPIBT), a recent extension of PIBT. In this paper, we describe a new perspective on solving MAPF by planning over agent dependencies. Taking inspiration from PIBT's priority inheritance logic, we define the concept of agent dependencies and propose Multi-Dependency PIBT (MD-PIBT) that searches over agent dependencies. MD-PIBT is a general framework where specific parameterizations can reproduce PIBT and EPIBT. At the same time, alternative configurations yield novel planning strategies that are not expressible by PIBT or EPIBT. Our experiments demonstrate that MD-PIBT effectively plans for as many as 10,000 homogeneous agents under various kinodynamic constraints, including pebble motion, rotation motion, and differential drive robots with speed and acceleration limits. We perform thorough evaluations on different variants of MAPF and find that MD-PIBT is particularly effective in MAPF with large agents.

2603.23356 2026-03-25 hep-ex cs.AI cs.CV cs.LG

Contrastive Metric Learning for Point Cloud Segmentation in Highly Granular Detectors

Max Marriott-Clarke, Lazar Novakovic, Elizabeth Ratzer, Robert J. Bainbridge, Loukas Gouskos, Benedikt Maier

详情
英文摘要

We propose a novel clustering approach for point-cloud segmentation based on supervised contrastive metric learning (CML). Rather than predicting cluster assignments or object-centric variables, the method learns a latent representation in which points belonging to the same object are embedded nearby while unrelated points are separated. Clusters are then reconstructed using a density-based readout in the learned metric space, decoupling representation learning from cluster formation and enabling flexible inference. The approach is evaluated on simulated data from a highly granular calorimeter, where the task is to separate highly overlapping particle showers represented as sets of calorimeter hits. A direct comparison with object condensation (OC) is performed using identical graph neural network backbones and equal latent dimensionality, isolating the effect of the learning objective. The CML method produces a more stable and separable embedding geometry for both electromagnetic and hadronic particle showers, leading to improved local neighbourhood consistency, a more reliable separation of overlapping showers, and better generalization when extrapolating to unseen multiplicities and energies. This translates directly into higher reconstruction efficiency and purity, particularly in high-multiplicity regimes, as well as improved energy resolution. In mixed-particle environments, CML maintains strong performance, suggesting robust learning of the shower topology, while OC exhibits significant degradation. These results demonstrate that similarity-based representation learning combined with density-based aggregation is a promising alternative to object-centric approaches for point cloud segmentation in highly granular detectors.

2603.23322 2026-03-25 stat.AP cs.AI cs.CY physics.geo-ph

Leveraging LLMs and Social Media to Understand User Perception of Smartphone-Based Earthquake Early Warnings

Hanjing Wang, S. Mostafa Mousavi, Patrick Robertson, Richard M. Allen, Alexie Barski, Robert Bosch, Nivetha Thiruverahan, Youngmin Cho, Tajinder Gadh, Steve Malkos, Boone Spooner, Greg Wimpey, Marc Stogaitis

详情
英文摘要

Android's Earthquake Alert (AEA) system provided timely early warnings to millions during the Mw 6.2 Marmara Ereglisi, Türkiye earthquake on April 23, 2025. This event, the largest in the region in 25 years, served as a critical real-world test for smartphone-based Earthquake Early Warning (EEW) systems. The AEA system successfully delivered alerts to users with high precision, offering over a minute of warning before the strongest shaking reached urban areas. This study leveraged Large Language Models (LLMs) to analyze more than 500 public social media posts from the X platform, extracting 42 distinct attributes related to user experience and behavior. Statistical analyses revealed significant relationships, notably a strong correlation between user trust and alert timeliness. Our results indicate a distinction between engineering and the user-centric definition of system accuracy. We found that timeliness is accuracy in the user's mind. Overall, this study provides actionable insights for optimizing alert design, public education campaigns, and future behavioral research to improve the effectiveness of such systems in seismically active regions.

2603.23305 2026-03-25 stat.ML cs.LG

Contextual Graph Matching with Correlated Gaussian Features

Mohammad Hassan Ahmad Yarandi, Luca Ganassali

详情
英文摘要

We investigate contextual graph matching in the Gaussian setting, where both edge weights and node features are correlated across two networks. We derive precise information-theoretic thresholds for exact recovery, and identify conditions under which almost exact recovery is possible or impossible, in terms of graph and feature correlation strengths, the number of nodes, and feature dimension. Interestingly, whereas an all-or-nothing phase transition is observed in the standard graph-matching scenario, the additional contextual information introduces a richer structure: thresholds for exact and almost exact recovery no longer coincide. Our results provide the first rigorous characterization of how structural and contextual information interact in graph matching, and establish a benchmark for designing efficient algorithms.

2603.20534 2026-03-25 cs.SE cs.AI

An Industrial-Scale Retrieval-Augmented Generation Framework for Requirements Engineering: Empirical Evaluation with Automotive Manufacturing Data

Muhammad Khalid, Yilmaz Uygun

Comments 10 pages, 6 figures

详情
英文摘要

Requirements engineering in Industry 4.0 faces critical challenges with heterogeneous, unstructured documentation spanning technical specifications, supplier lists, and compliance standards. While retrieval-augmented generation (RAG) shows promise for knowledge-intensive tasks, no prior work has evaluated RAG on authentic industrial RE workflows using comprehensive production-grade performance metrics. This paper presents a comprehensive empirical evaluation of RAG for industrial requirements engineering automation using authentic automotive manufacturing documentation comprising 669 requirements across four specification standards (MBN 9666-1, MBN 9666-2, BQF 9666-5, MBN 9666-9) spanning 2015-2023, plus 49 supplier qualifications with extensive supporting documentation. Through controlled comparisons with BERT-based and ungrounded LLM approaches, the framework achieves 98.2% extraction accuracy with complete traceability, outperforming baselines by 24.4% and 19.6%, respectively. Hybrid semantic-lexical retrieval achieves MRR of 0.847. Expert quality assessment averaged 4.32/5.0 across five dimensions. The evaluation demonstrates 83% reduction in manual analysis time and 47% cost savings through multi-provider LLM orchestration. Ablation studies quantify individual component contributions. Longitudinal analysis reveals a 55% reduction in requirement volume coupled with 1,800% increase in IT security focus, identifying 10 legacy suppliers (20.4%) requiring requalification, representing potential $2.3M in avoided contract penalties.

2603.16146 2026-03-25 stat.ML cs.LG cs.SY eess.SY stat.ME

Deep Adaptive Model-Based Design of Experiments

Arno Strouwen, Sebastian Micluţa-Câmpeanu

详情
英文摘要

Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between each experimental step, precluding real-time applications. We address this by combining Deep Adaptive Design (DAD), which amortizes sequential design into a neural network policy trained offline, with differentiable mechanistic models. For dynamical systems with known governing equations but uncertain parameters, we extend sequential contrastive training objectives to handle nuisance parameters and propose a transformer-based policy architecture that respects the temporal structure of dynamical systems. We demonstrate the approach on four systems of increasing complexity: a fed-batch bioreactor with Monod kinetics, a Haldane bioreactor with uncertain substrate inhibition, a two-compartment pharmacokinetic model with nuisance clearance parameters, and a DC motor for real-time deployment.

2603.08956 2026-03-25 econ.GN cs.LG q-fin.EC

A Survey of Reinforcement Learning For Economics

Pranjal Rawat

详情
英文摘要

This survey (re)introduces reinforcement learning methods to economists. The curse of dimensionality limits how far exact dynamic programming can be effectively applied, forcing us to rely on suitably "small" problems or our ability to convert "big" problems into smaller ones. While this reduction has been sufficient for many classical applications, a growing class of economic models resists such reduction. Reinforcement learning algorithms offer a natural, sample-based extension of dynamic programming, extending tractability to problems with high-dimensional states, continuous actions, and strategic interactions. I review the theory connecting classical planning to modern learning algorithms and demonstrate their mechanics through simulated examples in pricing, inventory control, strategic games, and preference elicitation. I also examine the practical vulnerabilities of these algorithms, noting their brittleness, sample inefficiency, sensitivity to hyperparameters, and the absence of global convergence guarantees outside of tabular settings. The successes of reinforcement learning remain strictly bounded by these constraints, as well as a reliance on accurate simulators. When guided by economic structure, reinforcement learning provides a remarkably flexible framework. It stands as an imperfect, but promising, addition to the computational economist's toolkit. A companion survey (Rust and Rawat, 2026b) covers the inverse problem of inferring preferences from observed behavior. All simulation code is publicly available.

2510.23421 2026-03-25 econ.GN cs.AI q-fin.EC

Quantifying Systemic Vulnerability in the Foundation Model Industry

Claudio Pirrone, Stefano Fricano, Gioacchino Fazio

Comments Conference Paper - SIEPI (29-30 January 2026) - Bari

详情
英文摘要

The foundation model industry exhibits unprecedented concentration in critical inputs: semiconductors, energy infrastructure, elite talent, capital, and training data. Despite extensive sectoral analyses, no comprehensive framework exists for assessing overall industrial vulnerability. We develop the Artificial Intelligence Industrial Vulnerability Index (AIIVI) grounded in O-Ring production theory, recognizing that foundation model production requires simultaneous availability of non-substitutable inputs. Given extreme data opacity and rapid technological evolution, we implement a validated human-in-the-loop methodology using large language models to systematically extract indicators from dispersed grey literature, with complete human verification of all outputs. Applied to six state-of-the-art foundation model developers, AIIVI equals 0.82, indicating extreme vulnerability driven by compute infrastructure (0.85) and energy systems (0.90). While industrial policy currently emphasizes semiconductor capacity, energy infrastructure represents the emerging binding constraint. This methodology proves applicable to other fast-evolving, opaque industries where traditional data sources are inadequate.

2510.15994 2026-03-25 cs.CR cs.AI

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

Dongsen Zhang, Zekun Li, Xu Luo, Xuannan Liu, Peipei Li, Wenjun Xu

Comments Accepted by ICLR 2026

详情
英文摘要

The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks broad interoperability, it also enlarges the attack surface by making tools first-class, composable objects with natural-language metadata, and standardized I/O. We present MSB (MCP Security Benchmark), the first end-to-end evaluation suite that systematically measures how well LLM agents resist MCP-specific attacks throughout the full tool-use pipeline: task planning, tool invocation, and response handling. MSB contributes: (1) a taxonomy of 12 attacks including name-collision, preference manipulation, prompt injections embedded in tool descriptions, out-of-scope parameter requests, user-impersonating responses, false-error escalation, tool-transfer, retrieval injection, and mixed attacks; (2) an evaluation harness that executes attacks by running real tools (both benign and malicious) via MCP rather than simulation; and (3) a robustness metric that quantifies the trade-off between security and performance: Net Resilient Performance (NRP). We evaluate nine popular LLM agents across 10 domains and 405 tools, producing 2,000 attack instances. Results reveal the effectiveness of attacks against each stage of MCP. Models with stronger performance are more vulnerable to attacks due to their outstanding tool calling and instruction following capabilities. MSB provides a practical baseline for researchers and practitioners to study, compare, and harden MCP agents. Code: https://github.com/dongsenzhang/MSB

2508.07494 2026-03-25 math.OC cs.AI

From Product Hilbert Spaces to the Generalized Koopman Operator and the Nonlinear Fundamental Lemma

Mircea Lazar

Comments Revisions compared to first version: formal analysis of the generalized Koopman composition operator, exact bilinear form with finite-dimensional input Hilbert space for input-affine systems, quantitative persistency of excitation notion for infinite-dimensional bilinear systems, nonlinear fundamental lemma in terms of Hankel operators and frames, addition soft-robotic manipulator example

详情
英文摘要

The generalization of the Koopman operator to systems with control input and the derivation of a nonlinear fundamental lemma are two open problems that play a key role in the development of data-driven control methods for nonlinear systems. In this paper we derive a novel solution to these problems based on basis functions expansion in a product Hilbert space constructed as the tensor product between the Hilbert spaces of the state and input observable functions, respectively. We identify relaxed invariance conditions that guarantee existence of a bounded linear operator, i.e., the generalized Koopman operator, from the constructed product Hilbert space to the Hilbert space corresponding to the lifted state propagated forward in time. Compared to classical Koopman invariance conditions, measure preservation is not required. Moreover, we derive a nonlinear fundamental lemma by exploiting the constructed exact infinite-dimensional bilinear Koopman representation and Hankel operators. The effectiveness of the developed generalized Koopman embedding is illustrated on the Van der Pol oscillator and in predictive control of a soft-robotic manipulator model.

2411.03231 2026-03-25 cs.CR cs.AI cs.DC cs.LO

LOGSAFE: Logic-Guided Verification for Trustworthy Federated Time-Series Learning

Dung Thuy Nguyen, Ziyan An, Taylor T. Johnson, Meiyi Ma, Kevin Leach

Comments 17th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS)

详情
英文摘要

This paper introduces LOGSAFE, a defense mechanism for federated learning in time series settings, particularly within cyber-physical systems. It addresses poisoning attacks by moving beyond traditional update-similarity methods and instead using logical reasoning to evaluate client reliability. LOGSAFE extracts client-specific temporal properties, infers global patterns, and verifies clients against them to detect and exclude malicious participants. Experiments show that it significantly outperforms existing methods, achieving up to 93.27% error reduction over the next best baseline. Our code is available at https://github.com/judydnguyen/LOGSAFE-Robust-FTS.

2410.06112 2026-03-25 cs.NI cs.LG

SwiftQueue: Optimizing Low-Latency Applications with Swift Packet Queuing

Siddhant Ray, Xi Jiang, Jack Luo, Nick Feamster, Junchen Jiang

详情
英文摘要

Low Latency, Low Loss, and Scalable Throughput (L4S), as an emerging router-queue management technique, has seen steady deployment in the industry. An L4S-enabled router assigns each packet to the queue based on the packet header marking. Currently, L4S employs per-flow queue selection, i.e. all packets of a flow are marked the same way and thus use the same queues, even though each packet is marked separately. However, this may hurt tail latency and latency-sensitive applications because transient congestion and queue buildups may only affect a fraction of packets in a flow. We present SwiftQueue, a new L4S queue-selection strategy in which a sender uses a novel per-packet latency predictor to pinpoint which packets likely have latency spikes or drops. The insight is that many packet-level latency variations result from complex interactions among recent packets at shared router queues. Yet, these intricate packet-level latency patterns are hard to learn efficiently by traditional models. Instead, SwiftQueue uses a custom Transformer, which is well-studied for its expressiveness on sequential patterns, to predict the next packet's latency based on the latencies of recently received ACKs. Based on the predicted latency of each outgoing packet, SwiftQueue's sender dynamically marks the L4S packet header to assign packets to potentially different queues, even within the same flow. Using real network traces, we show that SwiftQueue is 45-65% more accurate in predicting latency and its variations than state-of-art methods. Based on its latency prediction, SwiftQueue reduces the tail latency for L4S-enabled flows by 36-45%, compared with the existing L4S queue-selection method.

2312.10618 2026-03-25 stat.ME cs.LG stat.ML

Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines

Liyun Zeng, Hao Helen Zhang

详情
英文摘要

Classification and probability estimation are fundamental tasks with broad applications across modern machine learning and data science, spanning fields such as biology, medicine, engineering, and computer science. Recent development of weighted Support Vector Machines (wSVMs) has demonstrated considerable promise in robustly and accurately predicting class probabilities and performing classification across a variety of problems (Wang et al., 2008). However, the existing framework relies on an $\ell^2$-norm regularized binary wSVMs optimization formulation, which is designed for dense features and exhibits limited performance in the presence of sparse features with redundant noise. Effective sparse learning thus requires prescreening of important variables for each binary wSVM to ensure accurate estimation of pairwise conditional probabilities. In this paper, we propose a novel class of wSVMs frameworks that incorporate automatic variable selection with accurate probability estimation for sparse learning problems. We developed efficient algorithms for variable selection by solving either the $\ell^1$-norm or elastic net regularized wSVMs optimization problems. Class probability is then estimated either via the $\ell^2$-norm regularized wSVMs framework applied to the selected variables, or directly through elastic net regularized wSVMs. The two-step approach offers a strong advantage in simultaneous automatic variable selection and reliable probability estimators with competitive computational efficiency. The elastic net regularized wSVMs achieve superior performance in both variable selection and probability estimation, with the added benefit of variable grouping, at the cost of increases compensation time for high dimensional settings. The proposed wSVMs-based sparse learning methods are broadly applicable and can be naturally extended to $K$-class problems through ensemble learning.

2011.01007 2026-03-25 cs.CY cs.CL

The 2020s Political Economy of Machine Translation

Steven Weber

Comments 42 pages, 0 figures

Journal ref Bus. Polit. 24 (2022) 96-112

详情
英文摘要

This paper explores the hypothesis that the diversity of human languages, right now a barrier to interoperability in communication and trade, will become significantly less of a barrier as machine translation technologies are deployed over the next several years.But this new boundary-breaking technology does not reduce all boundaries equally, and it creates new challenges for the distribution of ideas and thus for innovation and economic growth.

2603.23300 2026-03-25 q-fin.PM cs.AI cs.MA q-fin.ST

Designing Agentic AI-Based Screening for Portfolio Investment

Mehmet Caner, Agostino Capponi, Nathan Sun, Jonathan Y. Tan

详情
英文摘要

We introduce a new agentic artificial intelligence (AI) platform for portfolio management. Our architecture consists of three layers. First, two large language model (LLM) agents are assigned specialized tasks: one agent screens for firms with desirable fundamentals, while a sentiment analysis agent screens for firms with desirable news. Second, these agents deliberate to generate and agree upon buy and sell signals from a large portfolio, substantially narrowing the pool of candidate assets. Finally, we apply a high-dimensional precision matrix estimation procedure to determine optimal portfolio weights. A defining theoretical feature of our framework is that the number of assets in the portfolio is itself a random variable, realized through the screening process. We introduce the concept of sensible screening and establish that, under mild screening errors, the squared Sharpe ratio of the screened portfolio consistently estimates its target. Empirically, our method achieves superior Sharpe ratios relative to an unscreened baseline portfolio and to conventional screening approaches, evaluated on S&P 500 data over the period 2020--2024.

2603.23279 2026-03-25 cs.SI cs.AI

Emergence of Fragility in LLM-based Social Networks: the Case of Moltbook

Luca Sodano, Sofia Sciangula, Amulya Galmarini, Francesco Bertolotti

详情
英文摘要

The rapid diffusion of large language models and the growth in their capability has enabled the emergence of online environments populated by autonomous AI agents that interact through natural language. These platforms provide a novel empirical setting for studying collective dynamics among artificial agents. In this paper we analyze the interaction network of Moltbook, a social platform composed entirely of LLM based agents, using tools from network science. The dataset comprises 39,924 users, 235,572 posts, and 1,540,238 comments collected through web scraping. We construct a directed weighted network in which nodes represent agents and edges represent commenting interactions. Our analysis reveals strongly heterogeneous connectivity patterns characterized by heavy tailed degree and activity distributions. At the mesoscale, the network exhibits a pronounced core periphery organization in which a very small structural core (0.9% of nodes) concentrates a large fraction of connectivity. Robustness experiments show that the network is relatively resilient to random node removal but highly vulnerable to targeted attacks on highly connected nodes, particularly those with high out degree. These findings indicate that the interaction structure of AI agent social systems may develop strong centralization and structural fragility, providing new insights into the collective organization of LLM native social environments.

2603.23269 2026-03-25 cs.CR cs.AI cs.LG

Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs

Wenyu Chen, Xiangtao Meng, Chuanchao Zang, Li Wang, Xinyu Gao, Jianing Wang, Peng Zhan, Zheng Li, Shanqing Guo

详情
英文摘要

Large Language Models(LLMs) are widely deployed, yet are vulnerable to jailbreak prompts that elicit policy-violating outputs. Although prior studies have uncovered these risks, they typically treat all tokens as equally important during prompt mutation, overlooking the varying contributions of individual tokens to triggering model refusals. Consequently, these attacks introduce substantial redundant searching under query-constrained scenarios, reducing attack efficiency and hindering comprehensive vulnerability assessment. In this work, we conduct a token-level analysis of refusal behavior and observe that token contributions are highly skewed rather than uniform. Moreover, we find strong cross-model consistency in refusal tendencies, enabling the use of a surrogate model to estimate token-level contributions to the target model's refusals. Motivated by these findings, we propose TriageFuzz, a token-aware jailbreak fuzzing framework that adapts the fuzz testing approach with a series of customized designs. TriageFuzz leverages a surrogate model to estimate the contribution of individual tokens to refusal behaviors, enabling the identification of sensitive regions within the prompt. Furthermore, it incorporates a refusal-guided evolutionary strategy that adaptively weights candidate prompts with a lightweight scorer to steer the evolution toward bypassing safety constraints. Extensive experiments on six open-source LLMs and three commercial APIs demonstrate that TriageFuzz achieves comparable attack success rates (ASR) with significantly reduced query costs. Notably, it attains a 90% ASR with over 70% fewer queries compared to baselines. Even under an extremely restrictive budget of 25 queries, TriageFuzz outperforms existing methods, improving ASR by 20-40%.

2603.23252 2026-03-25 cs.NI cs.AI

AI Lifecycle-Aware Feasibility Framework for Split-RIC Orchestration in NTN O-RAN

Daniele Tarchi

Comments 12 pages, 9 figures. Submitted to IEEE Transactions on Network and Service Management (TNSM)

详情
英文摘要

Integrating Artificial Intelligence (AI) into Non-Terrestrial Networks (NTN) is constrained by the joint limits of satellite SWaP and feeder-link capacity, which directly impact O-RAN closed-loop control and model lifecycle management. This paper studies the feasibility of distributing the O-RAN control hierarchy across Ground, LEO, and GEO segments through a Split-RIC architecture. We compare three deployment scenarios: (i) ground-centric control with telemetry streaming, (ii) ground--LEO Split-RIC with on-board inference and store-and-forward learning, and (iii) GEO--LEO multi-layer control enabled by inter-satellite links. For each scenario, we derive closed-form expressions for lifecycle energy and lifecycle latency that account for training-data transfer, model dissemination, and near-real-time inference. Numerical sensitivity analysis over feeder-link conditions, model complexity, and orbital intermittency yields operator-relevant feasibility regions that delineate when on-board inference and non-terrestrial learning loops are physically preferable to terrestrial offloading.

2603.23210 2026-03-25 cond-mat.dis-nn cond-mat.mtrl-sci cs.LG

Generative Inversion of Spectroscopic Data for Amorphous Structure Elucidation

Jiawei Guo, Daniel Schwalbe-Koda

Comments 10 pages; SI: 51 pages

详情
英文摘要

Determining atomistic structures from characterization data is one of the most common yet intricate problems in materials science. Particularly in amorphous materials, proposing structures that balance realism and agreement with experiments requires expert guidance, good interatomic potentials, or both. Here, we introduce GLASS, a generative framework that inverts multi-modal spectroscopic measurements into realistic atomistic structures without knowledge of the potential energy surface. A score-based model learns a structural prior from low-fidelity data and samples out-of-distribution structures conditioned on differentiable spectral targets. Reconstructions using pair distribution functions (PDFs), X-ray absorption spectroscopy, and diffraction measurements quantify the complementarity between spectral modalities and demonstrate that PDFs is the most informative probe for our framework. We use GLASS to rationalize three contested experimental problems: paracrystallinity in amorphous silicon, a liquid-liquid phase transition in sulfur, and ball-milled amorphous ice. In each case, generated structures reproduce experimental measurements and reveal mechanisms inaccessible to diffraction analysis alone.

2603.23205 2026-03-25 stat.ML cs.LG stat.ME

Between Resolution Collapse and Variance Inflation: Weighted Conformal Anomaly Detection in Low-Data Regimes

Oliver Hennhöfer, Christine Preisach

Comments 18 pages, 2 figures, 7 tables

详情
英文摘要

Standard conformal anomaly detection provides marginal finite-sample guarantees under the assumption of exchangeability . However, real-world data often exhibit distribution shifts, necessitating a weighted conformal approach to adapt to local non-stationarity. We show that this adaptation induces a critical trade-off between the minimum attainable p-value and its stability. As importance weights localize to relevant calibration instances, the effective sample size decreases. This can render standard conformal p-values overly conservative for effective error control, while the smoothing technique used to mitigate this issue introduces conditional variance, potentially masking anomalies. We propose a continuous inference relaxation that resolves this dilemma by decoupling local adaptation from tail resolution via continuous weighted kernel density estimation. While relaxing finite-sample exactness to asymptotic validity, our method eliminates Monte Carlo variability and recovers the statistical power lost to discretization. Empirical evaluations confirm that our approach not only restores detection capabilities where discrete baselines yield zero discoveries, but outperforms standard methods in statistical power while maintaining valid marginal error control in practice.

2603.23106 2026-03-25 stat.ML cs.LG cs.NA math.NA quant-ph

High-Resolution Tensor-Network Fourier Methods for Exponentially Compressed Non-Gaussian Aggregate Distributions

Juan José Rodríguez-Aldavero, Juan José García-Ripoll

Comments 22 pages, 13 figures

详情
英文摘要

Characteristic functions of weighted sums of independent random variables exhibit low-rank structure in the quantized tensor train (QTT) representation, also known as matrix product states (MPS), enabling up to exponential compression of their fully non-Gaussian probability distributions. Under variable independence, the global characteristic function factorizes into local terms. Its low-rank QTT structure arises from intrinsic spectral smoothness in continuous models, or from spectral energy concentration as the number of components $D$ grows in discrete models. We demonstrate this on weighted sums of Bernoulli and lognormal random variables. In the former, despite an adversarial, incompressible small-$D$ regime, the characteristic function undergoes a sharp bond-dimension collapse for $D \gtrsim 300$ components, enabling polylogarithmic time and memory scaling. In the latter, the approach reaches high-resolution discretizations of $N = 2^{30}$ frequency modes on standard hardware, far beyond the $N = 2^{24}$ ceiling of dense implementations. These compressed representations enable efficient computation of Value at Risk (VaR) and Expected Shortfall (ES), supporting applications in quantitative finance and beyond.

2603.23073 2026-03-25 cs.SE cs.AI

Can an LLM Detect Instances of Microservice Infrastructure Patterns?

Carlos Eduardo Duarte, Neil B. Harrison, Filipe Figueiredo Correia, Ademar Aguiar, Pavlína Gonçalves

Comments Accepted at ICSA 2026 - International Conference on Software Architecture - Research Track

详情
英文摘要

Architectural patterns are frequently found in various software artifacts. The wide variety of patterns and their implementations makes detection challenging with current tools, especially since they often only support detecting patterns in artifacts written in a single language. Large Language Models (LLMs), trained on a diverse range of software artifacts and knowledge, might overcome the limitations of existing approaches. However, their true effectiveness and the factors influencing their performance have not yet been thoroughly examined. To better understand this, we developed MicroPAD. This tool utilizes GPT 5 nano to identify architectural patterns in software artifacts written in any language, based on natural-language pattern descriptions. We used MicroPAD to evaluate an LLM's ability to detect instances of architectural patterns, particularly infrastructure-related microservice patterns. To accomplish this, we selected a set of GitHub repositories and contacted their top contributors to create a new, human-annotated dataset of 190 repositories containing microservice architectural patterns. The results show that MicroPAD was capable of detecting pattern instances across multiple languages and artifact types. The detection performance varied across patterns (F1 scores ranging from 0.09 to 0.70), specifically in relation to their prevalence and the distinctiveness of the artifacts through which they manifest. We also found that patterns associated with recognizable, dominant artifacts were detected more reliably. Whether these findings generalize to other LLMs and tools is a promising direction for future research.

2603.23063 2026-03-25 cs.SE cs.AI cs.CY cs.LG

Machine Learning Models for the Early Detection of Burnout in Software Engineering: a Systematic Literature Review

Tien Rahayu Tulili, Ayushi Rastogi, Andrea Capiluppi

Comments This paper is under review

详情
英文摘要

Burnout is an occupational syndrome that, like many other professions, affects the majority of software engineers. Past research studies showed important trends, including an increasing use of machine learning techniques to allow for an early detection of burnout. This paper is a systematic literature review (SLR) of the research papers that proposed machine learning (ML) approaches, and focused on detecting burnout in software developers and IT professionals. Our objective is to review the accuracy and precision of the proposed ML techniques, and to formulate recommendations for future researchers interested to replicate or extend those studies. From our SLR we observed that a majority of primary studies focuses on detecting emotions or utilise emotional dimensions to detect or predict the presence of burnout. We also performed a cross-sectional study to detect which ML approach shows a better performance at detecting emotions; and which dataset has more potential and expressivity to capture emotions. We believe that, by identifying which ML tools and datasets show a better performance at detecting emotions, and indirectly at identifying burnout, our paper can be a valuable asset to progress in this important research direction.

2603.23057 2026-03-25 eess.AS cs.LG

Prompt Amplification and Zero-Shot Late Fusion in Audio-Language Models for Speech Emotion Recognition

Saurabh Kataria, Xiao Hu

详情
英文摘要

Audio-Language Models (ALMs) are making strides in understanding speech and non-speech audio. However, domain-specialist Foundation Models (FMs) remain the best for closed-ended speech processing tasks such as Speech Emotion Recognition (SER). Using ALMs for Zero-shot SER is a popular choice, but their potential to work with specialists to achieve state-of-the-art (SOTA) performance remains unexplored. We propose ZS-Fuse, a late-fusion method that combines zero-shot emotion estimates from a dual-encoder ALM with specialist FMs. To handle ambiguity in emotions and sensitivity to prompt choice, 1) we use a simple prompt ensemble and 2) suggest a novel technique called prompt amplification, which repeats audio and text queries to discover stronger zero-shot capabilities. We demonstrate the efficacy of our technique by evaluating ZS-Fuse with three dual-encoder ALMs and two FMs, and report improvements over SOTA baselines, such as WavLM-Large, on three speech emotion recognition datasets.

2603.23050 2026-03-25 cs.DB cs.AI

DBAutoDoc: Automated Discovery and Documentation of Undocumented Database Schemas via Statistical Analysis and Iterative LLM Refinement

Amith Nagarajan, Thomas Altman

详情
英文摘要

A tremendous number of critical database systems lack adequate documentation. Declared primary keys are absent, foreign key constraints have been dropped for performance, column names are cryptic abbreviations, and no entity-relationship diagrams exist. We present DBAutoDoc, a system that automates the discovery and documentation of undocumented relational database schemas by combining statistical data analysis with iterative large language model (LLM) refinement. DBAutoDoc's central insight is that schema understanding is fundamentally an iterative, graph-structured problem. Drawing structural inspiration from backpropagation in neural networks, DBAutoDoc propagates semantic corrections through schema dependency graphs across multiple refinement iterations until descriptions converge. This propagation is discrete and semantic rather than mathematical, but the structural analogy is precise: early iterations produce rough descriptions akin to random initialization, and successive passes sharpen the global picture as context flows through the graph. The system makes four concrete contributions detailed in the paper. On a suite of benchmark databases, DBAutoDoc achieved overall weighted scores of 96.1% across two model families (Google's Gemini and Anthropic's Claude) using a composite metric. Ablation analysis demonstrates that the deterministic pipeline contributes a 23-point F1 improvement over LLM-only FK detection, confirming that the system's contribution is substantial and independent of LLM pre-training knowledge. DBAutoDoc is released as open-source software with all evaluation configurations and prompt templates included for full reproducibility.

2603.23007 2026-03-25 cs.CR cs.AI

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

Yutao Luo, Haotian Zhu, Shuchao Pang, Zhigang Lu, Tian Dong, Yongbin Zhou, Minhui Xue

详情
英文摘要

The rapid adoption of mobile graphical user interface (GUI) agents, which autonomously control applications and operating systems (OS), exposes new system-level attack surfaces. Existing backdoors against web GUI agents and general GenAI models rely on environmental injection or deceptive pop-ups to mislead the agent operation. However, these techniques do not work on screenshots-based mobile GUI agents due to the challenges of restricted trigger design spaces, OS background interference, and conflicts in multiple trigger-action mappings. We propose AgentRAE, a novel backdoor attack capable of inducing Remote Action Execution in mobile GUI agents using visually natural triggers (e.g., benign app icons in notifications). To address the underfitting caused by natural triggers and achieve accurate multi-target action redirection, we design a novel two-stage pipeline that first enhances the agent's sensitivity to subtle iconographic differences via contrastive learning, and then associates each trigger with a specific mobile GUI agent action through a backdoor post-training. Our extensive evaluation reveals that the proposed backdoor preserves clean performance with an attack success rate of over 90% across ten mobile operations. Furthermore, it is hard to visibly detect the benign-looking triggers and circumvents eight representative state-of-the-art defenses. These results expose an overlooked backdoor vector in mobile GUI agents, underscoring the need for defenses that scrutinize notification-conditioned behaviors and internal agent representations.