arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1284
专题追踪
2602.06699 2026-02-09 quant-ph cs.CL cs.LG

Quantum Attention by Overlap Interference: Predicting Sequences from Classical and Many-Body Quantum Data

Alessio Pecilli, Matteo Rosati

Comments 4 + 1 pages, 2 figures

详情
英文摘要

We propose a variational quantum implementation of self-attention (QSA), the core operation in transformers and large language models, which predicts future elements of a sequence by forming overlap-weighted combinations of past data. At variance with previous approaches, our QSA realizes the required nonlinearity through interference of state overlaps and returns a Renyi-1/2 cross-entropy loss directly as the expectation value of an observable, avoiding the need to decode amplitude-encoded predictions into classical logits. Furthermore, QSA naturally accommodates a constrained, trainable data-embedding that ties quantum state overlaps to data-level similarities. We find a gate complexity dominant scaling O(T d^2) for QSA, versus O(T^2 d) classically, suggesting an advantage in the practical regime where the sequence length T dominates the embedding size d. In simulations, we show that our QSA-based quantum transformer learns sequence prediction on classical data and on many-body transverse-field Ising quantum trajectories, establishing trainable attention as a practical primitive for quantum dynamical modeling.

2602.06693 2026-02-09 cs.NI cs.CC cs.LG

Makespan Minimization in Split Learning: From Theory to Practice

Robert Ganian, Fionn Mc Inerney, Dimitra Tsigkari

Comments This paper will appear at IEEE INFOCOM 2026

详情
英文摘要

Split learning recently emerged as a solution for distributed machine learning with heterogeneous IoT devices, where clients can offload part of their training to computationally-powerful helpers. The core challenge in split learning is to minimize the training time by jointly devising the client-helper assignment and the schedule of tasks at the helpers. We first study the model where each helper has a memory cardinality constraint on how many clients it may be assigned, which represents the case of homogeneous tasks. Through complexity theory, we rule out exact polynomial-time algorithms and approximation schemes even for highly restricted instances of this problem. We complement these negative results with a non-trivial polynomial-time 5-approximation algorithm. Building on this, we then focus on the more general heterogeneous task setting considered by Tirana et al. [INFOCOM 2024], where helpers have memory capacity constraints and clients have variable memory costs. In this case, we prove that, unless P=NP, the problem cannot admit a polynomial-time approximation algorithm for any approximation factor. However, by adapting our aforementioned 5-approximation algorithm, we develop a novel heuristic for the heterogeneous task setting and show that it outperforms heuristics from prior works through extensive experiments.

2602.06654 2026-02-09 cs.IR cs.AI

Multimodal Generative Retrieval Model with Staged Pretraining for Food Delivery on Meituan

Boyu Chen, Tai Guo, Weiyu Cui, Yuqing Li, Xingxing Wang, Chuan Shi, Cheng Yang

详情
英文摘要

Multimodal retrieval models are becoming increasingly important in scenarios such as food delivery, where rich multimodal features can meet diverse user needs and enable precise retrieval. Mainstream approaches typically employ a dual-tower architecture between queries and items, and perform joint optimization of intra-tower and inter-tower tasks. However, we observe that joint optimization often leads to certain modalities dominating the training process, while other modalities are neglected. In addition, inconsistent training speeds across modalities can easily result in the one-epoch problem. To address these challenges, we propose a staged pretraining strategy, which guides the model to focus on specialized tasks at each stage, enabling it to effectively attend to and utilize multimodal features, and allowing flexible control over the training process at each stage to avoid the one-epoch problem. Furthermore, to better utilize the semantic IDs that compress high-dimensional multimodal embeddings, we design both generative and discriminative tasks to help the model understand the associations between SIDs, queries, and item features, thereby improving overall performance. Extensive experiments on large-scale real-world Meituan data demonstrate that our method achieves improvements of 3.80%, 2.64%, and 2.17% on R@5, R@10, and R@20, and 5.10%, 4.22%, and 2.09% on N@5, N@10, and N@20 compared to mainstream baselines. Online A/B testing on the Meituan platform shows that our approach achieves a 1.12% increase in revenue and a 1.02% increase in click-through rate, validating the effectiveness and superiority of our method in practical applications.

2602.06639 2026-02-09 eess.SY cs.RO cs.SY

Efficient and Robust Modeling of Nonlinear Mechanical Systems

Davide Tebaldi, Roberto Zanasi

详情
英文摘要

The development of efficient and robust dynamic models is fundamental in the field of systems and control engineering. In this paper, a new formulation for the dynamic model of nonlinear mechanical systems, that can be applied to different automotive and robotic case studies, is proposed, together with a modeling procedure allowing to automatically obtain the model formulation. Compared with the Euler-Lagrange formulation, the proposed model is shown to give superior performances in terms of robustness against measurement noise for systems exhibiting dependence on some external variables, as well as in terms of execution time when computing the inverse dynamics of the system.

2602.06621 2026-02-09 stat.ML cs.LG

Infinite-dimensional generative diffusions via Doob's h-transform

Thorben Pieper-Sethmacher, Daniel Paulin

详情
英文摘要

This paper introduces a rigorous framework for defining generative diffusion models in infinite dimensions via Doob's h-transform. Rather than relying on time reversal of a noising process, a reference diffusion is forced towards the target distribution by an exponential change of measure. Compared to existing methodology, this approach readily generalises to the infinite-dimensional setting, hence offering greater flexibility in the diffusion model. The construction is derived rigorously under verifiable conditions, and bounds with respect to the target measure are established. We show that the forced process under the changed measure can be approximated by minimising a score-matching objective and validate our method on both synthetic and real data.

2602.06616 2026-02-09 cs.CR cs.LG

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu, Junyuan Zhang, Yi Liu, Ka-Ho Chow

详情
英文摘要

Retrieval-augmented generation (RAG) is increasingly deployed in real-world applications, where its reference-grounded design makes outputs appear trustworthy. This trust has spurred research on poisoning attacks that craft malicious content, inject it into knowledge sources, and manipulate RAG responses. However, when evaluated in practical RAG systems, existing attacks suffer from severely degraded effectiveness. This gap stems from two overlooked realities: (i) content is often processed before use, which can fragment the poison and weaken its effect, and (ii) users often do not issue the exact queries anticipated during attack design. These factors can lead practitioners to underestimate risks and develop a false sense of security. To better characterize the threat to practical systems, we present Confundo, a learning-to-poison framework that fine-tunes a large language model as a poison generator to achieve high effectiveness, robustness, and stealthiness. Confundo provides a unified framework supporting multiple attack objectives, demonstrated by manipulating factual correctness, inducing biased opinions, and triggering hallucinations. By addressing these overlooked challenges, Confundo consistently outperforms a wide range of purpose-built attacks across datasets and RAG configurations by large margins, even in the presence of defenses. Beyond exposing vulnerabilities, we also present a defensive use case that protects web content from unauthorized incorporation into RAG systems via scraping, with no impact on user experience.

2602.06599 2026-02-09 cs.MA cs.AI cs.LG

Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response

Ariyan Bighashdel, Thiago D. Simão, Frans A. Oliehoek

Comments Accepted at the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

详情
英文摘要

Multi-agent reinforcement learning (MARL) offers a scalable alternative to exact game-theoretic analysis but suffers from non-stationarity and the need to maintain diverse populations of strategies that capture non-transitive interactions. Policy Space Response Oracles (PSRO) address these issues by iteratively expanding a restricted game with approximate best responses (BRs), yet per-agent BR training makes it prohibitively expensive in many-agent or simulator-expensive settings. We introduce Joint Experience Best Response (JBR), a drop-in modification to PSRO that collects trajectories once under the current meta-strategy profile and reuses this joint dataset to compute BRs for all agents simultaneously. This amortizes environment interaction and improves the sample efficiency of best-response computation. Because JBR converts BR computation into an offline RL problem, we propose three remedies for distribution-shift bias: (i) Conservative JBR with safe policy improvement, (ii) Exploration-Augmented JBR that perturbs data collection and admits theoretical guarantees, and (iii) Hybrid BR that interleaves JBR with periodic independent BR updates. Across benchmark multi-agent environments, Exploration-Augmented JBR achieves the best accuracy-efficiency trade-off, while Hybrid BR attains near-PSRO performance at a fraction of the sample cost. Overall, JBR makes PSRO substantially more practical for large-scale strategic learning while preserving equilibrium robustness.

2602.06593 2026-02-09 cs.SE cs.AI

AgentStepper: Interactive Debugging of Software Development Agents

Robert Hutter, Michael Pradel

详情
英文摘要

Software development agents powered by large language models (LLMs) have shown great promise in automating tasks like environment setup, issue solving, and program repair. Unfortunately, understanding and debugging such agents remain challenging due to their complex and dynamic nature. Developers must reason about trajectories of LLM queries, tool calls, and code modifications, but current techniques reveal little of this intermediate process in a comprehensible format. The key insight of this paper is that debugging software development agents shares many similarities with conventional debugging of software programs, yet requires a higher level of abstraction that raises the level from low-level implementation details to high-level agent actions. Drawing on this insight, we introduce AgentStepper, the first interactive debugger for LLM-based software engineering agents. AgentStepper enables developers to inspect, control, and interactively manipulate agent trajectories. AgentStepper represents trajectories as structured conversations among an LLM, the agent program, and tools. It supports breakpoints, stepwise execution, and live editing of prompts and tool invocations, while capturing and displaying intermediate repository-level code changes. Our evaluation applies AgentStepper to three state-of-the-art software development agents, ExecutionAgent, SWE-Agent, and RepairAgent, showing that integrating the approach into existing agents requires minor code changes (39-42 edited lines). Moreover, we report on a user study with twelve participants, indicating that AgentStepper improves the ability of participants to interpret trajectories (64% vs. 67% mean performance) and identify bugs in the agent's implementation (17% vs. 60% success rate), while reducing perceived workload (e.g., frustration reduced from 5.4/7.0 to 2.4/7.0) compared to conventional tools.

2602.06555 2026-02-09 cs.DC cs.LG

Reinforcement Learning-Based Dynamic Management of Structured Parallel Farm Skeletons on Serverless Platforms

Lanpei Li, Massimo Coppola, Malio Li, Valerio Besozzi, Jack Bell, Vincenzo Lomonaco

Comments Accepted at AHPC3 workshop, PDP 2026

详情
英文摘要

We present a framework for dynamic management of structured parallel processing skeletons on serverless platforms. Our goal is to bring HPC-like performance and resilience to serverless and continuum environments while preserving the programmability benefits of skeletons. As a first step, we focus on the well known Farm pattern and its implementation on the open-source OpenFaaS platform, treating autoscaling of the worker pool as a QoS-aware resource management problem. The framework couples a reusable farm template with a Gymnasium-based monitoring and control layer that exposes queue, timing, and QoS metrics to both reactive and learning-based controllers. We investigate the effectiveness of AI-driven dynamic scaling for managing the farm's degree of parallelism via the scalability of serverless functions on OpenFaaS. In particular, we discuss the autoscaling model and its training, and evaluate two reinforcement learning (RL) policies against a baseline of reactive management derived from a simple farm performance model. Our results show that AI-based management can better accommodate platform-specific limitations than purely model-based performance steering, improving QoS while maintaining efficient resource usage and stable scaling behaviour.

2602.06553 2026-02-09 math.AG cs.LG

Evolving Ranking Functions for Canonical Blow-Ups in Positive Characteristic

Gergely Bérczi

Comments 41 pages

详情
英文摘要

Resolution of singularities in positive characteristic remains a long-standing open problem in algebraic geometry. In characteristic zero, the problem was solved by Hironaka in 1964, work for which he was awarded the Fields Medal. Modern proofs proceed by constructing suitable ranking functions, that is, invariants shown to strictly decrease along canonical sequences of blow-ups, ensuring termination. In positive characteristic, however, no such general ranking function is known: Frobenius-specific pathologies, such as the kangaroo phenomenon, can cause classical characteristic-zero invariants to plateau or even temporarily increase, presenting a fundamental obstruction to existing approaches. In this paper we report a sequence of experiments using the evolutionary search model AlphaEvolve, designed to discover candidate ranking functions for a toy canonical blow-up process. Our test benchmarks consist of carefully selected hypersurface singularities in dimension $4$ and characteristic $p=3$, with monic purely inseparable leading term, a regime in which naive order-based invariants often fail. After iteratively refining the experimental design, we obtained a discretized five-component lexicographic ranking function satisfying a bounded-delay descent criterion with zero violations across the benchmark. These experiments in turn motivated our main results: the conjectural delayed ranking functions in characteristic $3$ formulated in two conjectures.

2602.06545 2026-02-09 stat.ML cs.LG

Operationalizing Stein's Method for Online Linear Optimization: CLT-Based Optimal Tradeoffs

Zhiyu Zhang, Aaditya Ramdas

详情
英文摘要

Adversarial online linear optimization (OLO) is essentially about making performance tradeoffs with respect to the unknown difficulty of the adversary. In the setting of one-dimensional fixed-time OLO on a bounded domain, it has been observed since Cover (1966) that achievable tradeoffs are governed by probabilistic inequalities, and these descriptive results can be converted into algorithms via dynamic programming, which, however, is not computationally efficient. We address this limitation by showing that Stein's method, a classical framework underlying the proofs of probabilistic limit theorems, can be operationalized as computationally efficient OLO algorithms. The associated regret and total loss upper bounds are "additively sharp", meaning that they surpass the conventional big-O optimality and match normal-approximation-based lower bounds by additive lower order terms. Our construction is inspired by the remarkably clean proof of a Wasserstein martingale central limit theorem (CLT) due to Röllin (2018). Several concrete benefits can be obtained from this general technique. First, with the same computational complexity, the proposed algorithm improves upon the total loss upper bounds of online gradient descent (OGD) and multiplicative weight update (MWU). Second, our algorithm can realize a continuum of optimal two-point tradeoffs between the total loss and the maximum regret over comparators, improving upon prior works in parameter-free online learning. Third, by allowing the adversary to randomize on an unbounded support, we achieve sharp in-expectation performance guarantees for OLO with noisy feedback.

2602.06534 2026-02-09 cs.CR cs.LG

AlertBERT: A noise-robust alert grouping framework for simultaneous cyber attacks

Lukas Karner, Max Landauer, Markus Wurzenberger, Florian Skopik

详情
英文摘要

Automated detection of cyber attacks is a critical capability to counteract the growing volume and sophistication of cyber attacks. However, the high numbers of security alerts issued by intrusion detection systems lead to alert fatigue among analysts working in security operations centres (SOC), which in turn causes slow reaction time and incorrect decision making. Alert grouping, which refers to clustering of security alerts according to their underlying causes, can significantly reduce the number of distinct items analysts have to consider. Unfortunately, conventional time-based alert grouping solutions are unsuitable for large scale computer networks characterised by high levels of false positive alerts and simultaneously occurring attacks. To address these limitations, we propose AlertBERT, a self-supervised framework designed to group alerts from isolated or concurrent attacks in noisy environments. Thereby, our open-source implementation of AlertBERT leverages masked-language-models and density-based clustering to support both real-time or forensic operation. To evaluate our framework, we further introduce a novel data augmentation method that enables flexible control over noise levels and simulates concurrent attack occurrences. Based on the data sets generated through this method, we demonstrate that AlertBERT consistently outperforms conventional time-based grouping techniques, achieving superior accuracy in identifying correct alert groups.

2602.06506 2026-02-09 cs.HC cs.CL cs.CY

Designing Computational Tools for Exploring Causal Relationships in Qualitative Data

Han Meng, Qiuyuan Lyu, Peinuan Qin, Yitian Yang, Renwen Zhang, Wen-Chieh Lin, Yi-Chieh Lee

Comments 19 pages, 5 figures, conditionally accepted by CHI26

详情
英文摘要

Exploring causal relationships for qualitative data analysis in HCI and social science research enables the understanding of user needs and theory building. However, current computational tools primarily characterize and categorize qualitative data; the few systems that analyze causal relationships either inadequately consider context, lack credibility, or produce overly complex outputs. We first conducted a formative study with 15 participants interested in using computational tools for exploring causal relationships in qualitative data to understand their needs and derive design guidelines. Based on these findings, we designed and implemented QualCausal, a system that extracts and illustrates causal relationships through interactive causal network construction and multi-view visualization. A feedback study (n = 15) revealed that participants valued our system for reducing the analytical burden and providing cognitive scaffolding, yet navigated how such systems fit within their established research paradigms, practices, and habits. We discuss broader implications for designing computational tools that support qualitative data analysis.

2602.06476 2026-02-09 cs.MA cs.AI cs.LG

Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning

Kyungbeom Kim, Seungwon Oh, Kyung-Joong Kim

详情
英文摘要

Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behaviors. Recent methods introduce diversity through clustering, pruning, or masking, but typically compromise resource efficiency. We propose Prism, a parameter sharing framework that induces inter-agent diversity by representing shared networks in the spectral domain via singular value decomposition (SVD). All agents share the singular vector directions while learning distinct spectral masks on singular values. This mechanism encourages inter-agent diversity and preserves scalability. Extensive experiments on both homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) benchmarks show that Prism achieves competitive performance with superior resource efficiency.

2602.06443 2026-02-09 cs.CR cs.AI

TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents

Yibing Liu, Chong Zhang, Zhongyi Han, Hansong Liu, Yong Wang, Yang Yu, Xiaoyan Wang, Yilong Yin

Comments 9 pages, 5 figures, 1 table

详情
英文摘要

We address the problem of runtime trajectory anomaly detection, a critical capability for enabling trustworthy LLM agents. Current safety measures predominantly focus on static input/output filtering. However, we argue that ensuring LLM agents reliability requires auditing the intermediate execution process. In this work, we formulate the task of Trajectory Anomaly Detection. The goal is not merely detection, but precise error localization. This capability is essential for enabling efficient rollback-and-retry. To achieve this, we construct TrajBench, a dataset synthesized via a perturb-and-complete strategy to cover diverse procedural anomalies. Using this benchmark, we investigate the capability of models in process supervision. We observe that general-purpose LLMs, even with zero-shot prompting, struggle to identify and localize these anomalies. This reveals that generalized capabilities do not automatically translate to process reliability. To address this, we propose TrajAD, a specialized verifier trained with fine-grained process supervision. Our approach outperforms baselines, demonstrating that specialized supervision is essential for building trustworthy agents.

2602.06431 2026-02-09 cs.SI cs.AI cs.IR

A methodology for analyzing financial needs hierarchy from social discussions using LLM

Abhishek Jangra, Sachin Thukral, Arnab Chatterjee, Jayasree Raveendran

Comments 15 pages, 5 figures, 4 tables

详情
英文摘要

This study examines the hierarchical structure of financial needs as articulated in social media discourse, employing generative AI techniques to analyze large-scale textual data. While human needs encompass a broad spectrum from fundamental survival to psychological fulfillment financial needs are particularly critical, influencing both individual well-being and day-to-day decision-making. Our research advances the understanding of financial behavior by utilizing large language models (LLMs) to extract and analyze expressions of financial needs from social media posts. We hypothesize that financial needs are organized hierarchically, progressing from short-term essentials to long-term aspirations, consistent with theoretical frameworks established in the behavioral sciences. Through computational analysis, we demonstrate the feasibility of identifying these needs and validate the presence of a hierarchical structure within them. In addition to confirming this structure, our findings provide novel insights into the content and themes of financial discussions online. By inferring underlying needs from naturally occurring language, this approach offers a scalable and data-driven alternative to conventional survey methodologies, enabling a more dynamic and nuanced understanding of financial behavior in real-world contexts.

2602.06395 2026-02-09 cs.CR cs.AI cs.LG

Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers

Mona Rajhans, Vishal Khawarey

Comments Accepted for publication in 18th ACM International Conference on Agents and Artificial Intelligence (ICAART 2026), Marbella, Spain

详情
英文摘要

Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention. However, these models remain vulnerable to adversarial perturbations small, deliberate input modifications that can degrade detection accuracy and compromise interpretability. This paper presents an empirical study of adversarial robustness and explainability drift across two cybersecurity domains phishing URL classification and network intrusion detection. We evaluate the impact of L (infinity) bounded Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) perturbations on model accuracy and introduce a quantitative metric, the Robustness Index (RI), defined as the area under the accuracy perturbation curve. Gradient based feature sensitivity and SHAP based attribution drift analyses reveal which input features are most susceptible to adversarial manipulation. Experiments on the Phishing Websites and UNSW NB15 datasets show consistent robustness trends, with adversarial training improving RI by up to 9 percent while maintaining clean-data accuracy. These findings highlight the coupling between robustness and interpretability degradation and underscore the importance of quantitative evaluation in the design of trustworthy, AI-driven cybersecurity systems.

2602.06365 2026-02-09 eess.SY cs.LG cs.SY

Advances in Battery Energy Storage Management: Control and Economic Synergies

Venkata Rajesh Chundru, Shreshta Rajakumar Deshpande, Stanislav A Gankov

Comments Pre Print

详情
英文摘要

The existing literature on Battery Energy Storage Systems (BESS) predominantly focuses on two main areas: control system design aimed at achieving grid stability and the techno-economic analysis of BESS dispatch on power grid. However, with the increasing incorporation of ancillary services into power grids, a more comprehensive approach to energy management systems is required. Such an approach should not only optimize revenue generation from BESS but also ensure the safe, efficient, and reliable operation of lithium-ion batteries. This research seeks to bridge this gap by exploring literature that addresses both the economic and operational dimensions of BESS. Specifically, it examines how economic aspects of grid duty cycles can align with control schemes deployed in BESS systems. This alignment, or synergy, could be instrumental in creating robust digital twins virtual representations of BESS systems that enhance both grid stability and revenue potential. The literature review is organized into five key categories: (1) ancillary services for BESS, exploring support functions that BESS can provide to power grids; (2) control systems developed for real-time BESS power flow management, ensuring smooth operations under dynamic grid conditions; (3) optimization algorithms for BESS dispatch, focusing on efficient energy allocation strategies; (4) techno-economic analyses of BESS and battery systems to assess their financial viability; and (5) digital twin technologies for real-world BESS deployments, enabling advanced predictive maintenance and performance optimization. This review will identify potential synergies, research gaps, and emerging trends, paving the way for future innovations in BESS management and deployment strategies.

2602.06350 2026-02-09 eess.IV cs.CV

AS-Mamba: Asymmetric Self-Guided Mamba Decoupled Iterative Network for Metal Artifact Reduction

Bowen Ning, Zekun Zhou, Xinyi Zhong, Zhongzhen Wang, HongXin Wu, HaiTao Wang, Liu Shi, Qiegen Liu

Comments 10 pages,10 figures

详情
英文摘要

Metal artifact significantly degrades Computed Tomography (CT) image quality, impeding accurate clinical diagnosis. However, existing deep learning approaches, such as CNN and Transformer, often fail to explicitly capture the directional geometric features of artifacts, leading to compromised structural restoration. To address these limitations, we propose the Asymmetric Self-Guided Mamba (AS-Mamba) for metal artifact reduction. Specifically, the linear propagation of metal-induced streak artifacts aligns well with the sequential modeling capability of State Space Models (SSMs). Consequently, the Mamba architecture is leveraged to explicitly capture and suppress these directional artifacts. Simultaneously, a frequency domain correction mechanism is incorporated to rectify the global amplitude spectrum, thereby mitigating intensity inhomogeneity caused by beam hardening. Furthermore, to bridge the distribution gap across diverse clinical scenarios, we introduce a self-guided contrastive regularization strategy. Extensive experiments on public andclinical dental CBCT datasets demonstrate that AS-Mamba achieves superior performance in suppressing directional streaks and preserving structural details, validating the effectiveness of integrating physical geometric priors into deep network design.

2602.06345 2026-02-09 cs.CR cs.AI

Zero-Trust Runtime Verification for Agentic Payment Protocols: Mitigating Replay and Context-Binding Failures in AP2

Qianlong Lan, Anuj Kaul, Shaun Jones, Stephanie Westrum

详情
英文摘要

The deployment of autonomous AI agents capable of executing commercial transactions has motivated the adoption of mandate-based payment authorization protocols, including the Universal Commerce Protocol (UCP) and the Agent Payments Protocol (AP2). These protocols replace interactive, session-based authorization with cryptographically issued mandates, enabling asynchronous and autonomous execution. While AP2 provides specification-level guarantees through signature verification, explicit binding, and expiration semantics, real-world agentic execution introduces runtime behaviors such as retries, concurrency, and orchestration that challenge implicit assumptions about mandate usage. In this work, we present a security analysis of the AP2 mandate lifecycle and identify enforcement gaps that arise during runtime in agent-based payment systems. We propose a zero-trust runtime verification framework that enforces explicit context binding and consume-once mandate semantics using dynamically generated, time-bound nonces, ensuring that authorization decisions are evaluated at execution time rather than assumed from static issuance properties. Through simulation-based evaluation under high concurrency, we show that context-aware binding and consume-once enforcement address distinct and complementary attack classes, and that both are required to prevent replay and context-redirect attacks. The proposed framework mitigates all evaluated attacks while maintaining stable verification latency of approximately 3.8~ms at throughput levels up to 10{,}000 transactions per second. We further demonstrate that the required runtime state is bounded by peak concurrency rather than cumulative transaction history, indicating that robust runtime security for agentic payment execution can be achieved with minimal and predictable overhead.

2602.06336 2026-02-09 cs.CR cs.DC cs.LG

AdFL: In-Browser Federated Learning for Online Advertisement

Ahmad Alemari, Pritam Sen, Cristian Borcea

详情
英文摘要

Since most countries are coming up with online privacy regulations, such as GDPR in the EU, online publishers need to find a balance between revenue from targeted advertisement and user privacy. One way to be able to still show targeted ads, based on user personal and behavioral information, is to employ Federated Learning (FL), which performs distributed learning across users without sharing user raw data with other stakeholders in the publishing ecosystem. This paper presents AdFL, an FL framework that works in the browsers to learn user ad preferences. These preferences are aggregated in a global FL model, which is then used in the browsers to show more relevant ads to users. AdFL can work with any model that uses features available in the browser such as ad viewability, ad click-through, user dwell time on pages, and page content. The AdFL server runs at the publisher and coordinates the learning process for the users who browse pages on the publisher's website. The AdFL prototype does not require the client to install any software, as it is built utilizing standard APIs available on most modern browsers. We built a proof-of-concept model for ad viewability prediction that runs on top of AdFL. We tested AdFL and the model with two non-overlapping datasets from a website with 40K visitors per day. The experiments demonstrate AdFL's feasibility to capture the training information in the browser in a few milliseconds, show that the ad viewability prediction achieves up to 92.59% AUC, and indicate that utilizing differential privacy (DP) to safeguard local model parameters yields adequate performance, with only modest declines in comparison to the non-DP variant.

2602.06297 2026-02-09 stat.ML cs.LG

Time-uniform conformal and PAC prediction

Kayla E. Scharfstein, Arun Kumar Kuchibhotla

详情
英文摘要

Given that machine learning algorithms are increasingly being deployed to aid in high stakes decision-making, uncertainty quantification methods that wrap around these black box models such as conformal prediction have received much attention in recent years. In sequential settings, where data are observed/generated in a streaming fashion, traditional conformal methods do not provide any guarantee without fixing the sample size. More importantly, traditional conformal methods cannot cope with sequentially updated predictions. As such, we develop an extension of the conformal prediction and related probably approximately correct (PAC) prediction frameworks to sequential settings where the number of data points is not fixed in advance. The resulting prediction sets are anytime-valid in that their expected coverage is at the required level at any time chosen by the analyst even if this choice depends on the data. We present theoretical guarantees for our proposed methods and demonstrate their validity and utility on simulated and real datasets.

2602.05975 2026-02-09 cs.IR cs.CL

SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

Tiansheng Hu, Yilun Zhao, Canyu Zhang, Arman Cohan, Chen Zhao

详情
英文摘要

Deep research agents have emerged as powerful systems for addressing complex queries. Meanwhile, LLM-based retrievers have demonstrated strong capability in following instructions or reasoning. This raises a critical question: can LLM-based retrievers effectively contribute to deep research agent workflows? To investigate this, we introduce SAGE, a benchmark for scientific literature retrieval comprising 1,200 queries across four scientific domains, with a 200,000 paper retrieval corpus. We evaluate six deep research agents and find that all systems struggle with reasoning-intensive retrieval. Using DR Tulu as backbone, we further compare BM25 and LLM-based retrievers (i.e., ReasonIR and gte-Qwen2-7B-instruct) as alternative search tools. Surprisingly, BM25 significantly outperforms LLM-based retrievers by approximately 30%, as existing agents generate keyword-oriented sub-queries. To improve performance, we propose a corpus-level test-time scaling framework that uses LLMs to augment documents with metadata and keywords, making retrieval easier for off-the-shelf retrievers. This yields 8% and 2% gains on short-form and open-ended questions, respectively.

2602.05817 2026-02-09 cs.CR cs.LG cs.NI

Interpreting Manifolds and Graph Neural Embeddings from Internet of Things Traffic Flows

Enrique Feito-Casares, Francisco M. Melgarejo-Meseguer, Elena Casiraghi, Giorgio Valentini, José-Luis Rojo-Álvarez

详情
英文摘要

The rapid expansion of Internet of Things (IoT) ecosystems has led to increasingly complex and heterogeneous network topologies. Traditional network monitoring and visualization tools rely on aggregated metrics or static representations, which fail to capture the evolving relationships and structural dependencies between devices. Although Graph Neural Networks (GNNs) offer a powerful way to learn from relational data, their internal representations often remain opaque and difficult to interpret for security-critical operations. Consequently, this work introduces an interpretable pipeline that generates directly visualizable low-dimensional representations by mapping high-dimensional embeddings onto a latent manifold. This projection enables the interpretable monitoring and interoperability of evolving network states, while integrated feature attribution techniques decode the specific characteristics shaping the manifold structure. The framework achieves a classification F1-score of 0.830 for intrusion detection while also highlighting phenomena such as concept drift. Ultimately, the presented approach bridges the gap between high-dimensional GNN embeddings and human-understandable network behavior, offering new insights for network administrators and security analysts.

2602.05754 2026-02-09 cs.DC cs.AI

TimelyFreeze: Adaptive Parameter Freezing Mechanism for Pipeline Parallelism

Seonghye Cho, Jaemin Han, Hyunjin Kim, Euisoo Jung, Jae-Gil Lee

详情
英文摘要

Pipeline parallelism enables training models that exceed single-device memory, but practical throughput remains limited by pipeline bubbles. Although parameter freezing can improve training throughput by adaptively skipping backward computation, existing methods often over-freeze parameters, resulting in unnecessary accuracy degradation. To address this issue, we propose TimelyFreeze, which models the pipeline schedule as a directed acyclic graph and solves a linear program to compute optimal freeze ratios that minimize batch execution time under accuracy constraints. Experiments show that TimelyFreeze achieves up to 40% training throughput improvement on LLaMA-8B with comparable accuracy. Overall, it enables faster large-scale model training without compromising convergence and generalizes across diverse pipeline-parallel settings.

2602.05386 2026-02-09 cs.CR cs.AI

Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang

详情
英文摘要

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense framework, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, the Spider-Sense invokes a hierarchical defence mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S$^2$Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3\%.

2602.05227 2026-02-09 stat.ML cs.LG cs.NA math.AP math.NA stat.ME

Radon--Wasserstein Gradient Flows for Interacting-Particle Sampling in High Dimensions

Elias Hess-Childs, Dejan Slepčev, Lantian Xu

Comments 49 pages, 7 figures; corrected Figure 4.4

详情
英文摘要

Gradient flows of the Kullback--Leibler (KL) divergence, such as the Fokker--Planck equation and Stein Variational Gradient Descent, evolve a distribution toward a target density known only up to a normalizing constant. We introduce new gradient flows of the KL divergence with a remarkable combination of properties: they admit accurate interacting-particle approximations in high dimensions, and the per-step cost scales linearly in both the number of particles and the dimension. These gradient flows are based on new transportation-based Riemannian geometries on the space of probability measures: the Radon--Wasserstein geometry and the related Regularized Radon--Wasserstein (RRW) geometry. We define these geometries using the Radon transform so that the gradient-flow velocities depend only on one-dimensional projections. This yields interacting-particle-based algorithms whose per-step cost follows from efficient Fast Fourier Transform-based evaluation of the required 1D convolutions. We additionally provide numerical experiments that study the performance of the proposed algorithms and compare convergence behavior and quantization. Finally, we prove some theoretical results including well-posedness of the flows and long-time convergence guarantees for the RRW flow.

2602.03868 2026-02-09 eess.AS cs.AI cs.CL cs.SD

Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts

Chandrashekar M S, Vineet Singh, Lakshmi Pedapudi

Comments 9 pages, 6 figures

详情
英文摘要

The digitization of agricultural advisory services in India requires robust Automatic Speech Recognition (ASR) systems capable of accurately transcribing domain-specific terminology in multiple Indian languages. This paper presents a benchmarking framework for evaluating ASR performance in agricultural contexts across Hindi, Telugu, and Odia languages. We introduce evaluation metrics including Agriculture Weighted Word Error Rate (AWWER) and domain-specific utility scoring to complement traditional metrics. Our evaluation of 10,934 audio recordings, each transcribed by up to 10 ASR models, reveals performance variations across languages and models, with Hindi achieving the best overall performance (WER: 16.2%) while Odia presents the greatest challenges (best WER: 35.1%, achieved only with speaker diarization). We characterize audio quality challenges inherent to real-world agricultural field recordings and demonstrate that speaker diarization with best-speaker selection can substantially reduce WER for multi-speaker recordings (upto 66% depending on the proportion of multi-speaker audio). We identify recurring error patterns in agricultural terminology and provide practical recommendations for improving ASR systems in low-resource agricultural domains. The study establishes baseline benchmarks for future agricultural ASR development.

2602.02614 2026-02-09 cs.SE cs.AI cs.CR

Testing Storage-System Correctness: Challenges, Fuzzing Limitations, and AI-Augmented Opportunities

Ying Wang, Jiahui Chen, Dejun Jiang

详情
英文摘要

Storage systems are fundamental to modern computing infrastructures, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many storage-system failures (including durability, ordering, recovery, and consistency violations) remain difficult to expose systematically. This difficulty stems not primarily from insufficient testing tooling, but from intrinsic properties of storage-system execution, including nondeterministic interleavings, long-horizon state evolution, and correctness semantics that span multiple layers and execution phases. This survey adopts a storage-centric view of system testing and organizes existing techniques according to the execution properties and failure mechanisms they target. We review a broad spectrum of approaches, ranging from concurrency testing and long-running workloads to crash-consistency analysis, hardware-level semantic validation, and distributed fault injection, and analyze their fundamental strengths and limitations. Within this framework, we examine fuzzing as an automated testing paradigm, highlighting systematic mismatches between conventional fuzzing assumptions and storage-system semantics, and discuss how recent artificial intelligence advances may complement fuzzing through state-aware and semantic guidance. Overall, this survey provides a unified perspective on storage-system correctness testing and outlines key challenges

2602.00169 2026-02-09 cond-mat.mtrl-sci cs.AI

Towards Agentic Intelligence for Materials Science

Huan Zhang, Yizhan Li, Wenhao Huang, Ziyu Hou, Yu Song, Xuye Liu, Farshid Effaty, Jinya Jiang, Sifan Wu, Qianggang Ding, Izumi Takahara, Leonard R. MacGillivray, Teruyasu Mizoguchi, Tianshu Yu, Lizi Liao, Yuyu Luo, Yu Rong, Jia Li, Ying Diao, Heng Ji, Bang Liu

Comments 81 pages

详情
英文摘要

The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fine-tuned models toward agentic systems that plan, act, and learn across the full discovery loop. This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining, through domain adaptation and instruction tuning, to goal-conditioned agents interfacing with simulation and experimental platforms. Unlike prior reviews, we treat the entire process as an end-to-end system to be optimized for tangible discovery outcomes rather than proxy benchmarks. This perspective allows us to trace how upstream design choices-such as data curation and training objectives-can be aligned with downstream experimental success through effective credit assignment. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science. We then analyze the field through two focused lenses: From the AI perspective, the survey details LLM strengths in pattern recognition, predictive analytics, and natural language processing for literature mining, materials characterization, and property prediction; from the materials science perspective, it highlights applications in materials design, process optimization, and the acceleration of computational workflows via integration with external tools (e.g., DFT, robotic labs). Finally, we contrast passive, reactive approaches with agentic design, cataloging current contributions while motivating systems that pursue long-horizon goals with autonomy, memory, and tool use. This survey charts a practical roadmap towards autonomous, safety-aware LLM agents aimed at discovering novel and useful materials.