arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1527
2603.11332 2026-03-13 cs.CC cs.LG

On the Computational Hardness of Transformers

Barna Saha, Yinzhan Xu, Christopher Ye, Hantao Yu

Comments 46 pages, 2 figures. Abstract shortened to meet arXiv requirements

详情
英文摘要

The transformer has revolutionized modern AI across language, vision, and beyond. It consists of $L$ layers, each running $H$ attention heads in parallel and feeding the combined output to the subsequent layer. In attention, the input consists of $N$ tokens, each a vector of dimension $m$. The attention mechanism involves multiplying three $N \times m$ matrices, applying softmax to an intermediate product. Several recent works have advanced our understanding of the complexity of attention. Known algorithms for transformers compute each attention head independently. This raises a fundamental question that has recurred throughout TCS under the guise of ``direct sum'' problems: can multiple instances of the same problem be solved more efficiently than solving each instance separately? Many answers to this question, both positive and negative, have arisen in fields spanning communication complexity and algorithm design. Thus, we ask whether transformers can be computed more efficiently than $LH$ independent evaluations of attention. In this paper, we resolve this question in the negative, and give the first non-trivial computational lower bounds for multi-head multi-layer transformers. In the small embedding regime ($m = N^{o(1)}$), computing $LH$ attention heads separately takes $LHN^{2 + o(1)}$ time. We establish that this is essentially optimal under SETH. In the large embedding regime ($m = N$), one can compute $LH$ attention heads separately using $LHN^{ω+ o(1)}$ arithmetic operations (plus exponents), where $ω$ is the matrix multiplication exponent. We establish that this is optimal, by showing that $LHN^{ω- o(1)}$ arithmetic operations are necessary when $ω> 2$. Our lower bound in the large embedding regime relies on a novel application of the Baur-Strassen theorem, a powerful algorithmic tool underpinning the famous backpropagation algorithm.

2603.11330 2026-03-13 q-bio.QM cs.LG cs.NA math.DS math.NA

Ill-Conditioning in Dictionary-Based Dynamic-Equation Learning: A Systems Biology Case Study

Yuxiang Feng, Niall M Mangan, Manu Jayadharan

详情
英文摘要

Data-driven discovery of governing equations from time-series data provides a powerful framework for understanding complex biological systems. Library-based approaches that use sparse regression over candidate functions have shown considerable promise, but they face a critical challenge when candidate functions become strongly correlated: numerical ill-conditioning. Poor or restricted sampling, together with particular choices of candidate libraries, can produce strong multicollinearity and numerical instability. In such cases, measurement noise may lead to widely different recovered models, obscuring the true underlying dynamics and hindering accurate system identification. Although sparse regularization promotes parsimonious solutions and can partially mitigate conditioning issues, strong correlations may persist, regularization may bias the recovered models, and the regression problem may remain highly sensitive to small perturbations in the data. We present a systematic analysis of how ill-conditioning affects sparse identification of biological dynamics using benchmark models from systems biology. We show that combinations involving as few as two or three terms can already exhibit strong multicollinearity and extremely large condition numbers. We further show that orthogonal polynomial bases do not consistently resolve ill-conditioning and can perform worse than monomial libraries when the data distribution deviates from the weight function associated with the orthogonal basis. Finally, we demonstrate that when data are sampled from distributions aligned with the appropriate weight functions corresponding to the orthogonal basis, numerical conditioning improves, and orthogonal polynomial bases can yield improved model recovery accuracy across two baseline models.

2603.11316 2026-03-13 physics.med-ph cs.CV cs.LG

MRI2Qmap: multi-parametric quantitative mapping with MRI-driven denoising priors

Mohammad Golbabaee, Matteo Cencini, Carolin Pirkl, Marion Menzel, Michela Tosetti, Bjoern Menze

详情
英文摘要

Magnetic Resonance Fingerprinting (MRF) and other highly accelerated transient-state parameter mapping techniques enable simultaneous quantification of multiple tissue properties, but often suffer from aliasing artifacts due to compressed sampling. Incorporating spatial image priors can mitigate these artifacts, and deep learning has shown strong potential when large training datasets are available. However, extending this paradigm to MRF-type sequences remains challenging due to the scarcity of quantitative imaging data for training. Can this limitation be overcome by leveraging sources of training data from clinically-routine weighted MRI images? To this end, we introduce MRI2Qmap, a plug-and-play quantitative reconstruction framework that integrates the physical acquisition model with priors learned from deep denoising autoencoders pretrained on large multimodal weighted-MRI datasets. MRI2Qmap demonstrates that spatial-domain structural priors learned from independently acquired datasets of routine weighted-MRI images can be effectively used for quantitative MRI reconstruction. The proposed method is validated on highly accelerated 3D whole-brain MRF data from both in-vivo and simulated acquisitions, achieving competitive or superior performance relative to existing baselines without requiring ground-truth quantitative imaging data for training. By decoupling quantitative reconstruction from the need for ground-truth MRF training data, this framework points toward a scalable paradigm for quantitative MRI that can capitalize on the large and growing repositories of routine clinical MRI.

2603.11304 2026-03-13 stat.ML cs.AI cs.LG stat.ME

Worst-case low-rank approximations

Anya Fries, Markus Reichstein, David Blei, Jonas Peters

详情
英文摘要

Real-world data in health, economics, and environmental sciences are often collected across heterogeneous domains (such as hospitals, regions, or time periods). In such settings, distributional shifts can make standard PCA unreliable, in that, for example, the leading principal components may explain substantially less variance in unseen domains than in the training domains. Existing approaches (such as FairPCA) have proposed to consider worst-case (rather than average) performance across multiple domains. This work develops a unified framework, called wcPCA, applies it to other objectives (resulting in the novel estimators such as norm-minPCA and norm-maxregret, which are better suited for applications with heterogeneous total variance) and analyzes their relationship. We prove that for all objectives, the estimators are worst-case optimal not only over the observed source domains but also over all target domains whose covariance lies in the convex hull of the (possibly normalized) source covariances. We establish consistency and asymptotic worst-case guarantees of empirical estimators. We extend our methodology to matrix completion, another problem that makes use of low-rank approximations, and prove approximate worst-case optimality for inductive matrix completion. Simulations and two real-world applications on ecosystem-atmosphere fluxes demonstrate marked improvements in worst-case performance, with only minor losses in average performance.

2603.11274 2026-03-13 cs.HC cs.AI cs.CY

"I followed what felt right, not what I was told": Autonomy, Coaching, and Recognizing Bias Through AI-Mediated Dialogue

Atieh Taheri, Hamza El Alaoui, Patrick Carrington, Jeffrey P. Bigham

Comments Accepted to CHI 2026 (ACM Conference on Human Factors in Computing Systems), 23 pages, 5 figures

详情
英文摘要

Ableist microaggressions remain pervasive in everyday interactions, yet interventions to help people recognize them are limited. We present an experiment testing how AI-mediated dialogue influences recognition of ableism. 160 participants completed a pre-test, intervention, and a post-test across four conditions: AI nudges toward bias (Bias-Directed), inclusion (Neutral-Directed), unguided dialogue (Self-Directed), and a text-only non-dialogue (Reading). Participants rated scenarios on standardness of social experience and emotional impact; those in dialogue-based conditions also provided qualitative reflections. Quantitative results showed dialogue-based conditions produced stronger recognition than Reading, though trajectories diverged: biased nudges improved differentiation of bias from neutrality but increased overall negativity. Inclusive or no nudges remained more balanced, while Reading participants showed weaker gains and even declines. Qualitative findings revealed biased nudges were often rejected, while inclusive nudges were adopted as scaffolding. We contribute a validated vignette corpus, an AI-mediated intervention platform, and design implications highlighting trade-offs conversational systems face when integrating bias-related nudges.

2603.11264 2026-03-13 eess.SY cs.RO cs.SY

Multi-Robot Multitask Gaussian Process Estimation and Coverage

Lai Wei, Andrew McDonald, Vaibhav Srivastava

详情
英文摘要

Coverage control is essential for the optimal deployment of agents to monitor or cover areas with sensory demands. While traditional coverage involves single-task robots, increasing autonomy now enables multitask operations. This paper introduces a novel multitask coverage problem and addresses it for both the cases of known and unknown sensory demands. For known demands, we design a federated multitask coverage algorithm and establish its convergence properties. For unknown demands, we employ a multitask Gaussian Process (GP) framework to learn sensory demand functions and integrate it with the multitask coverage algorithm to develop an adaptive algorithm. We introduce a novel notion of multitask coverage regret that compares the performance of the adaptive algorithm against an oracle with prior knowledge of the demand functions. We establish that our algorithm achieves sublinear cumulative regret, and numerically illustrate its performance.

2603.11244 2026-03-13 q-bio.GN cs.LG

A Standardized Framework For Evaluating Gene Expression Generative Models

Andrea Rubbi, Andrea Giuseppe Di Francesco, Mohammad Lotfollahi, Pietro Liò

详情
英文摘要

The rapid development of generative models for single-cell gene expression data has created an urgent need for standardised evaluation frameworks. Current evaluation practices suffer from inconsistent metric implementations, incomparable hyperparameter choices, and a lack of biologically-grounded metrics. We present Generated Genetic Expression Evaluator (GGE), an open-source Python framework that addresses these challenges by providing a comprehensive suite of distributional metrics with explicit computation space options and biologically-motivated evaluation through differentially expressed gene (DEG)-focused analysis and perturbation-effect correlation, enabling standardized reporting and reproducible benchmarking. Through extensive analysis of the single-cell generative modeling literature, we identify that no standardized evaluation protocol exists. Methods report incomparable metrics computed in different spaces with different hyperparameters. We demonstrate that metric values vary substantially depending on implementation choices, highlighting the critical need for standardization. GGE enables fair comparison across generative approaches and accelerates progress in perturbation response prediction, cellular identity modeling, and counterfactual inference.

2603.11241 2026-03-13 eess.AS cs.LG cs.SD

Cough activity detection for automatic tuberculosis screening

Joshua Jansen van Vüren, Devendra Singh Parihar, Daphne Naidoo, Kimsey Zajac, Willy Ssengooba, Grant Theron, Thomas Niesler

详情
英文摘要

The automatic identification of cough segments in audio through the determination of start and end points is pivotal to building scalable screening tools in health technologies for pulmonary related diseases. We propose the application of two current pre-trained architectures to the task of cough activity detection. A dataset of recordings containing cough from patients symptomatic for tuberculosis (TB) who self-present at community-level care centres in South Africa and Uganda is employed. When automatic start and end points are determined using XLS-R, an average precision of 0.96 and an area under the receiver-operating characteristic of 0.99 are achieved for the test set. We show that best average precision is achieved by utilising only the first three layers of the network, which has the dual benefits of reduced computational and memory requirements, pivotal for smartphone-based applications. This XLS-R configuration is shown to outperform an audio spectrogram transformer (AST) as well as a logistic regression baseline by 9% and 27% absolute in test set average precision respectively. Furthermore, a downstream TB classification model trained using the coughs automatically isolated by XLS-R comfortably outperforms a model trained on the coughs isolated by AST, and is only narrowly outperformed by a classifier trained on the ground truth coughs. We conclude that the application of large pre-trained transformer models is an effective approach to identifying cough end-points and that the integration of such a model into a screening tool is feasible.

2603.11229 2026-03-13 stat.ML cs.LG

Trustworthy predictive distributions for rare events via diagnostic transport maps

Elizabeth Cucuzzella, Rafael Izbicki, Ann B. Lee

Comments 19 pages, 5 figures, 2 tables

详情
英文摘要

Forecast systems in science and technology are increasingly moving beyond point prediction toward methods that produce full predictive distributions of future outcomes y, conditional on high-dimensional and complex sequences of inputs x. However, even when forecast systems provide a full predictive distribution, the result is rarely calibrated with respect to all x and y. The estimated density can be especially unreliable in low-frequency or out-of-distribution regimes, where accurate uncertainty quantification and a means for human experts to verify results are most needed to establish trust in models. In this paper, we take an initial predictive distribution as given and treat it as a useful but potentially misspecified base model. WE then introduce diagnostic transport maps, covariate-dependent probability-to-probability maps that quantify how the base model's probabilities should be adjusted to better match the true conditional distribution of calibration data. At deployment, these maps provide the user with real-time local diagnostics that reveal where the model fails and how it fails (including bias, dispersion, skewness, and tail errors), while also producing a recalibrated predictive distribution through a simple composition with the base model. We apply diagnostic transport maps to short-term tropical cyclone intensity forecasting and show that an easy-to-fit parametric version identifies evolutionary modes associated with local miscalibration and improves the predictive performance for rare events, including 24-hour rapid intensity change, as compared to the operational forecasts of the National Hurricane Center.

2603.11212 2026-03-13 cs.CR cs.LG

Security-by-Design for LLM-Based Code Generation: Leveraging Internal Representations for Concept-Driven Steering Mechanisms

Maximilian Wendlinger, Daniel Kowatsch, Konstantin Böttinger, Philip Sperl

Comments to be published in the IEEE European Symposium on Security and Privacy (EuroS&P)'26

详情
英文摘要

Large Language Models (LLMs) show remarkable capabilities in understanding natural language and generating complex code. However, as practitioners adopt CodeLLMs for increasingly critical development tasks, research reveals that these models frequently generate functionally correct yet insecure code, posing significant security risks. While multiple approaches have been proposed to improve security in AI-based code generation, combined benchmarks show these methods remain insufficient for practical use, achieving only limited improvements in both functional correctness and security. This stems from a fundamental gap in understanding the internal mechanisms of code generation and the root causes of security vulnerabilities, forcing researchers to rely on heuristics and empirical observations. In this work, we investigate the internal representation of security concepts in CodeLLMs, revealing that models are often aware of vulnerabilities as they generate insecure code. Through systematic evaluation, we demonstrate that CodeLLMs can distinguish between security subconcepts, enabling a more fine-grained analysis than prior black-box approaches. Leveraging these insights, we propose Secure Concept Steering for CodeLLMs (SCS-Code). During token generation, SCS-Code steers LLMs' internal representations toward secure and functional code output, enabling a lightweight and modular mechanism that can be integrated into existing code models. Our approach achieves superior performance compared to state-of-the-art methods across multiple secure coding benchmarks.

2603.11205 2026-03-13 eess.AS cs.SD

Can LLMs Help Localize Fake Words in Partially Fake Speech?

Lin Zhang, Thomas Thebaud, Zexin Cai, Sanjeev Khudanpur, Daniel Povey, Leibny Paola García-Perera, Matthew Wiesner, Nicholas Andrews

Comments Submitted to Interspeech 2026; put on arxiv based on requirement from Interspeech: "Interspeech no longer enforces an anonymity period for submissions." and "For authors that prefer to upload their paper online, a note indicating that the paper was submitted for review to Interspeech should be included in the posting."

详情
英文摘要

Large language models (LLMs), trained on large-scale text, have recently attracted significant attention for their strong performance across many tasks. Motivated by this, we investigate whether a text-trained LLM can help localize fake words in partially fake speech, where only specific words within a speech are edited. We build a speech LLM to perform fake word localization via next token prediction. Experiments and analyses on AV-Deepfake1M and PartialEdit indicates that the model frequently leverages editing-style pattern learned from the training data, particularly word-level polarity substitutions for those two databases we discussed, as cues for localizing fake words. Although such particular patterns provide useful information in an in-domain scenario, how to avoid over-reliance on such particular pattern and improve generalization to unseen editing styles remains an open question.

2603.11200 2026-03-13 cs.CR cs.LG

DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries

Massimiliano Altieri, Ronan Hamon, Roberto Corizzo, Michelangelo Ceci, Ignacio Sanchez

详情
英文摘要

Network intrusion detection systems play a crucial role in the security strategy employed by organisations to detect and prevent cyberattacks. Such systems usually combine pattern detection signatures with anomaly detection techniques powered by machine learning methods. However, the commonly proposed machine learning methods present drawbacks such as over-reliance on labeled data and limited generalization capabilities. To address these issues, embedding-based methods have been introduced to learn representations from network data, such as DNS traffic, mainly due to its large availability, that generalise effectively to many downstream tasks. However, current approaches do not properly consider contextual information among DNS queries. In this paper, we tackle this issue by proposing DNS-GT, a novel Transformer-based model that learns embeddings for domain names from sequences of DNS queries. The model is first pre-trained in a self-supervised fashion in order to learn the general behavior of DNS activity. Then, it can be finetuned on specific downstream tasks, exploiting interactions with other relevant queries in a given sequence. Our experiments with real-world DNS data showcase the ability of our method to learn effective domain name representations. A quantitative evaluation on domain name classification and botnet detection tasks shows that our approach achieves better results compared to relevant baselines, creating opportunities for further exploration of large-scale language models for intrusion detection systems. Our code is available at: https://github.com/m-altieri/DNS-GT.

2603.11138 2026-03-13 stat.ML cs.LG math.ST stat.TH

Deep regression learning from dependent observations with minimum error entropy principle

William Kengne, Modou Wade

详情
英文摘要

This paper considers nonparametric regression from strongly mixing observations. The proposed approach is based on deep neural networks with minimum error entropy (MEE) principle. We study two estimators: the non-penalized deep neural network (NPDNN) and the sparse-penalized deep neural network (SPDNN) predictors. Upper bounds of the expected excess risk are established for both estimators over the classes of Hölder and composition Hölder functions. For the models with Gaussian error, the rates of the upper bound obtained match (up to a logarithmic factor) with the lower bounds established in \cite{schmidt2020nonparametric}, showing that both the MEE-based NPDNN and SPDNN estimators from strongly mixing data can achieve the minimax optimal convergence rate.

2603.11134 2026-03-13 math.ST cs.LG stat.TH

Conformal e-prediction in the presence of confounding

Vladimir Vovk, Ruodu Wang

Comments 8 pages, 2 figures

详情
英文摘要

This note extends conformal e-prediction to cover the case where there is observed confounding between the random object $X$ and its label $Y$. We consider both the case where the observed data is IID and a case where some dependence between observations is permitted.

2603.11128 2026-03-13 stat.ML cs.LG cs.NE

Efficient Approximation to Analytic and $L^p$ functions by Height-Augmented ReLU Networks

ZeYu Li, FengLei Fan, TieYong Zeng

详情
英文摘要

This work addresses two fundamental limitations in neural network approximation theory. We demonstrate that a three-dimensional network architecture enables a significantly more efficient representation of sawtooth functions, which serves as the cornerstone in the approximation of analytic and $L^p$ functions. First, we establish substantially improved exponential approximation rates for several important classes of analytic functions and offer a parameter-efficient network design. Second, for the first time, we derive a quantitative and non-asymptotic approximation of high orders for general $L^p$ functions. Our techniques advance the theoretical understanding of the neural network approximation in fundamental function spaces and offer a theoretically grounded pathway for designing more parameter-efficient networks.

2603.11126 2026-03-13 cs.MA cs.CL

Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

Comments 5 pages, 3 figures, accepted to 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

详情
英文摘要

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved alignment, they often rely on a single evaluator or narrowly defined reward signals, limiting their ability to capture ethical pluralism. In this work, we propose the Value Alignment System using Combinatorial Fusion Analysis (VAS-CFA), a framework that operationalizes multi-agent fusion alignment. It instantiates multiple moral agents, each fine-tuned to represent a distinct normative perspective, and fuses their outputs using CFA with both rank- and score-based aggregation. This design leverages cognitive diversity, between agents, to mitigate conflicts and redundancies across multiple agents, producing responses that better reflect human values. Empirical evaluation demonstrates that VAS-CFA outperforms both single agent baselines and prior aggregation approaches on standard metrics, showing that multi-agent fusion provides a robust and effective mechanism for advancing value alignment in LLMs.

2603.11125 2026-03-13 stat.ML cs.LG

Co-Diffusion: An Affinity-Aware Two-Stage Latent Diffusion Framework for Generalizable Drug-Target Affinity Prediction

Yining Qian, Pengjie Wang, Yixiao Li, An-Yang Lu, Cheng Tan, Shuang Li, Lijun Liu

详情
英文摘要

Predicting drug-target affinity is fundamental to virtual screening and lead optimization. However, existing deep models often suffer from representation collapse in stringent cold-start regimes, where the scarcity of labels and domain shifts prevent the learning of transferable pharmacophores and binding motifs. In this paper, we propose Co-Diffusion, a novel affinity-aware framework that redefines DTA prediction as a constrained latent denoising process to enhance generalization. Co-Diffusion employs a two-stage paradigm: Stage I establishes an affinity-steered latent manifold by aligning drug and target embeddings under an explicit supervised objective, ensuring that the latent space reflects the intrinsic binding landscape. Stage II introduces modality-specific latent diffusion as a stochastic perturb-and-denoise regularizer, forcing the model to recover consistent affinity semantics from noisy structural representations. This approach effectively mitigates the reconstruction-regression conflict common in generative DTA models. Theoretically, we show that Co-Diffusion maximizes a variational lower bound on the joint likelihood of drug structures, protein sequences, and binding strength. Extensive experiments across multiple benchmarks demonstrate that Co-Diffusion significantly outperforms state-of-the-art baselines, particularly yielding superior zero-shot generalization on unseen molecular scaffolds and novel protein families-paving a robust path for in silico drug prioritization in unexplored chemical spaces.

2603.11095 2026-03-13 cs.MM cs.SD eess.SP

Multimodal Self-Attention Network with Temporal Alignment for Audio-Visual Emotion Recognition

Inyong Koo, yeeun Seong, Minseok Son, Jaehyuk Jang, Changick Kim

Comments 5 pages, 3 figures, accepted to ICASSP 2026

详情
英文摘要

Audio-visual emotion recognition (AVER) methods typically fuse utterance-level features, and even frame-level attention models seldom address the frame-rate mismatch across modalities. In this paper, we propose a Transformer-based framework focusing on the temporal alignment of multimodal features. Our design employs a multimodal self-attention encoder that simultaneously captures intra- and inter-modal dependencies within a shared feature space. To address heterogeneous sampling rates, we incorporate Temporally-aligned Rotary Position Embeddings (TaRoPE), which implicitly synchronize audio and video tokens. Furthermore, we introduce a Cross-Temporal Matching (CTM) loss that enforces consistency among temporally proximate pairs, guiding the encoder toward better alignment. Experiments on CREMA-D and RAVDESS datasets demonstrate consistent improvements over recent baselines, suggesting that explicitly addressing frame-rate mismatch helps preserve temporal cues and enhances cross-modal fusion.

2603.11088 2026-03-13 cs.CR cs.AI

The Attack and Defense Landscape of Agentic AI: A Comprehensive Survey

Juhee Kim, Xiaoyuan Liu, Zhun Wang, Shi Qiu, Bo Li, Wenbo Guo, Dawn Song

Comments Accepted to USENIX Security 2026. This manuscript is an extended version of the conference paper, including additional discussion and updated content

详情
英文摘要

AI agents that combine large language models with non-AI system components are rapidly emerging in real-world applications, offering unprecedented automation and flexibility. However, this unprecedented flexibility introduces complex security challenges fundamentally different from those in traditional software systems. This paper presents the first systematic and comprehensive survey of AI agent security, including an analysis of the design space, attack landscape, and defense mechanisms for secure AI agent systems. We further conduct case studies to point out existing gaps in securing agentic AI systems and identify open challenges in this emerging domain. Our work also introduces the first systematic framework for understanding the security risks and defense strategies of AI agents, serving as a foundation for building both secure agentic systems and advancing research in this critical area.

2603.11082 2026-03-13 cs.SE cs.AI

Quality-Driven Agentic Reasoning for LLM-Assisted Software Design: Questions-of-Thoughts (QoT) as a Time-Series Self-QA Chain

Yen-Ku Liu, Yun-Cheng Tsai

详情
英文摘要

Recent advances in large language models (LLMs) have accelerated AI-assisted software development, yet practical deployment remains constrained by incomplete implementations, weak modularization, and inconsistent security practices. We introduce Questions-of-Thoughts (QoT), a quality-driven inference-time scaffold that turns a user goal into (i) an ordered sequence of engineering steps and (ii) stepwise self-questioning to verify constraints and reduce omission errors, while maintaining a lightweight reasoning record that stabilizes subsequent design decisions. We evaluate QoT across three representative backend engineering domains: API Design, Data Communication, and File Systems. Each task requires multi-module decomposition and exposes standard failure modes in LLM-generated systems. To enable data-driven comparison, we score generated artifacts using an ISO/IEC-inspired quality rubric that measures Scalability, Completeness, Modularity, and Security. We report domain-wise gains as the change in total quality score, defined as the QoT score minus the NoQoT score. Results show capacity-dependent improvements: QoT yields consistent quality improvements for larger models and more complex domains, while smaller models may exhibit trade-offs under tight context and planning budgets. We release an open artifact with prompts, scoring guidelines, raw generations, and scripts that reproduce the reported tables and figures to support applied AI and data analytics research.

2603.11078 2026-03-13 cs.SE cs.AI cs.CL

CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents

Kristen Pereira, Neelabh Sinha, Rajat Ghosh, Debojyoti Dutta

详情
英文摘要

Recent advances in frontier large language models have enabled code review agents that operate in open-ended, reasoning-intensive settings. However, the lack of standardized benchmarks and granular evaluation protocols makes it difficult to assess behavior of code review agents beyond coarse success metrics, particularly for tasks where false positives are costly. To address this gap, we introduce CR-Bench, a benchmarking dataset, and CR-Evaluator, a fine-grained evaluation pipeline for code review agents. Using these tools, we conduct a preliminary study evaluating both a single-shot agent and a Reflexion-based agent across two frontier models. We find that code review agents can exhibit a low signal-to-noise ratio when designed to identify all hidden issues, obscuring true progress and developer productivity when measured solely by resolution rates. Our analysis identifies the hidden trade-off between issue resolution and spurious findings, revealing a frontier that constrains effective agent design. Together, CR-Bench and CR-Evaluator provide a timely foundation for studying and developing code review agents as LLM-based systems transition from controlled benchmarks to real-world software engineering workflows.

2603.11068 2026-03-13 cond-mat.mtrl-sci cs.AI

From Phase Prediction to Phase Design: A ReAct Agent Framework for High-Entropy Alloy Discovery

Iman Peivaste, Salim Belouettar

详情
英文摘要

Discovering high-entropy alloy (HEA) compositions that reliably form a target crystal phase is a high-dimensional inverse design problem that conventional trial-and-error experimentation and forward-only machine learning models cannot efficiently solve. Here we present a ReAct (Reasoning + Acting) LLM agent that autonomously proposes, validates, and iteratively refines HEA compositions by querying a calibrated XGBoost surrogate trained on 4,753 experimental records across four phases (FCC, BCC, BCC+FCC, BCC+IM), achieving 94.66\% accuracy (F1 macro = 0.896). Against Bayesian optimisation (BO) and random search baselines, the full-prompt agent achieves descriptor-space rediscovery rates of 38\%, 18\%, and 38\% for FCC, BCC, and BCC+FCC (Mann--Whitney $p \leq 0.039$), with proposals lying 2.4--22.8$\times$ closer to the experimental phase manifold than random search. An ablation reveals that domain priors shift the agent from landmark-alloy recall toward compositionally diverse exploration -- an uninformed agent scores higher rediscovery by concentrating on literature-dense families, while the full-prompt agent explores underrepresented space (unique ratio 1.0 vs.\ 0.39 for BCC+FCC). These regimes represent distinct criteria: proximity to known literature versus genuine discovery. Spearman analysis confirms agent reasoning is statistically aligned with empirical phase distributions ($ρ= 0.736$, $p = 0.004$ for BCC). This work establishes LLM-guided agentic reasoning as a principled, transparent, and manifold-aware complement to gradient-free optimisation for inverse alloy design.

2603.11061 2026-03-13 physics.chem-ph cs.AI cs.NE

Hybrid Quantum-Classical Encoding for Accurate Residue-Level pKa Prediction

Van Le, Tan Le

详情
英文摘要

Accurate prediction of residue-level pKa values is essential for understanding protein function, stability, and reactivity. While existing resources such as DeepKaDB and CpHMD-derived datasets provide valuable training data, their descriptors remain primarily classical and often struggle to generalize across diverse biochemical environments. We introduce a reproducible hybrid quantum-classical framework that enriches residue-level representations with a Gaussian kernel-based quantum-inspired feature mapping. These quantum-enhanced descriptors are combined with normalized structural features to form a unified hybrid encoding processed by a Deep Quantum Neural Network (DQNN). This architecture captures nonlinear relationships in residue microenvironments that are not accessible to classical models. Benchmarking across multiple curated descriptor sets demonstrates that the DQNN achieves improved cross-context generalization relative to classical baselines. External evaluation on the PKAD-R experimental benchmark and an A$β$40 case study further highlights the robustness and transferability of the quantum-inspired representation. By integrating quantum-inspired feature transformations with classical biochemical descriptors, this work establishes a scalable and experimentally transferable approach for residue-level pKa prediction and broader applications in protein electrostatics.

2603.11054 2026-03-13 cs.SI cs.AI cs.CR cs.CY cs.GT

A Survey on Quantitative Modeling of Trust in Online Social Networks

Wenting Song, K. Suzanne Barber

Comments 34 pages, 9 figures, submitted to ACM computing surveys

详情
英文摘要

Online social networks facilitate user engagement and information sharing but are also rife with misinformation and deception. Research on trust modeling in online social networks focuses on developing computational models or algorithms to measure trust relationships, assess the reliability of shared content, and detect spam or malicious activities. However, most existing review papers either briefly mention the concept of trust or focus on a single category of trust models. In this paper, we offer a comprehensive categorization and review of state-of-the-art trust models developed for online social networks. First, we explore theories and models related to trust in psychology and identify several factors that influence the formation and evolution of online trust. Next, state-of-the-art trust models are categorized based on their algorithmic foundations. For each category, the modeling mechanisms are investigated, and their unique contributions to quantitative trust modeling are highlighted. Subsequently, we provide an implementation-centric trust modeling handbook, which summarizes available datasets, trust-related features, promising modeling techniques, and feasible application scenarios. Finally, the findings of the literature review are summarized, and unresolved challenges are discussed.

2603.11051 2026-03-13 cs.IR cs.AI cs.CL cs.LG

OpenSanctions Pairs: Large-Scale Entity Matching with LLMs

Chandler Smith, Magnus Sesodia, Friedrich Lindenberg, Christian Schroeder de Witt

详情
英文摘要

We release OpenSanctions Pairs, a large-scale entity matching benchmark derived from real-world international sanctions aggregation and analyst deduplication. The dataset contains 755,540 labeled pairs spanning 293 heterogeneous sources across 31 countries, with multilingual and cross-script names, noisy and missing attributes, and set-valued fields typical of compliance workflows. We benchmark a production rule-based matcher (nomenklatura RegressionV1 algorithm) against open- and closed-source LLMs in zero- and few-shot settings. Off-the-shelf LLMs substantially outperform the production rule-based baseline (91.33\% F1), reaching up to 98.95\% F1 (GPT-4o) and 98.23\% F1 with a locally deployable open model (DeepSeek-R1-Distill-Qwen-14B). DSPy MIPROv2 prompt optimization yields consistent but modest gains, while adding in-context examples provides little additional benefit and can degrade performance. Error analysis shows complementary failure modes: the rule-based system over-matches (high false positives), whereas LLMs primarily fail on cross-script transliteration and minor identifier/date inconsistencies. These results indicate that pairwise matching performance is approaching a practical ceiling in this setting, and motivate shifting effort toward pipeline components such as blocking, clustering, and uncertainty-aware review. Code available at https://github.com/chansmi/OSINT_entity_resolution

2603.10323 2026-03-13 cs.CR cs.CV

The Orthogonal Vulnerabilities of Generative AI Watermarks: A Comparative Empirical Benchmark of Spatial and Latent Provenance

Jesse Yu, Nicholas Wei

Comments 10 pages, 4 figures

详情
英文摘要

As open-weights generative AI rapidly proliferates, the ability to synthesize hyper-realistic media has introduced profound challenges to digital trust. Automated disinformation and AI-generated imagery have made robust digital provenance a critical cybersecurity imperative. Currently, state-of-the-art invisible watermarks operate within one of two primary mathematical manifolds: the spatial domain (post-generation pixel embedding) or the latent domain (pre-generation frequency embedding). While existing literature frequently evaluates these models against isolated, classical distortions, there is a critical lack of rigorous, comparative benchmarking against modern generative AI editing tools. In this study, we empirically evaluate two leading representative paradigms, RivaGAN (Spatial) and Tree-Ring (Latent), utilizing an automated Attack Simulation Engine across 30 intensity intervals of geometric and generative perturbations. We formalize an "Adversarial Evasion Region" (AER) framework to measure cryptographic degradation against semantic visual retention (OpenCLIP > 75.0). Our statistical analysis ($n=100$ per interval, $MOE = \pm 3.92\%$) reveals that these domains possess mutually exclusive, mathematically orthogonal vulnerabilities. Spatial watermarks experience severe cryptographic degradation under algorithmic pixel-rewriting (exhibiting a 67.47% AER evasion rate under Img2Img translation), whereas latent watermarks exhibit profound fragility against geometric misalignment (yielding a 43.20% AER evasion rate under static cropping). By proving that single-domain watermarking is fundamentally insufficient against modern adversarial toolsets, this research exposes a systemic vulnerability in current digital provenance standards and establishes the foundational exigence for future multi-domain cryptographic architectures.

2603.10065 2026-03-13 cs.IT cs.AI cs.SY eess.SY math.IT stat.ME

The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification

Moriba Kemessia Jah

详情
英文摘要

This paper proves that the Epistemic Support-Point Filter (ESPF) is the unique optimal recursive estimator within the class of epistemically admissible evidence-only filters. Where Bayesian filters minimize mean squared error and are driven toward an assumed truth, the ESPF minimizes maximum entropy and surfaces what has not been proven impossible -- a fundamentally different epistemic commitment with fundamentally different failure modes. Two results locate this theorem within the broader landscape of estimation theory. The first is a unification: the ESPF's optimality criterion is the log-geometric mean of the alpha-cut volume family in the Holder mean hierarchy. The Popperian minimax bound and the Kalman MMSE criterion occupy the p=+inf and p=0 positions on the same curve. Possibility and probability are not competing frameworks: they are the same ignorance functional evaluated under different alpha-cut geometries. The Kalman filter is the Gaussian specialization of the ESPF's optimality criterion, not a separate invention. The second result is a diagnostic: numerical validation over a 2-day, 877-step Smolyak Level-3 orbital tracking run shows that possibilistic stress manifests through necessity saturation and surprisal escalation rather than MVEE sign change -- a direct consequence of the Holder ordering, not an empirical observation. Three lemmas establish the result: the Possibilistic Entropy Lemma decomposes the ignorance functional; the Possibilistic Cramer-Rao Bound limits entropy reduction per measurement; the Evidence-Optimality Lemma proves minimum-q selection is the unique minimizer and that any rule incorporating prior possibility risks race-to-bottom bias.

2603.08771 2026-03-13 stat.ML cs.IT cs.LG math.IT

Micro-Diffusion Compression - Binary Tree Tweedie Denoising for Online Probability Estimation

Roberto Tacconelli

Comments 12 pages, 1 figure

详情
英文摘要

We present Midicoth, a lossless compression system that introduces a micro-diffusion denoising layer for improving probability estimates produced by adaptive statistical models. In compressors such as Prediction by Partial Matching (PPM), probability estimates are smoothed by a prior to handle sparse observations. When contexts have been seen only a few times, this prior dominates the prediction and produces distributions that are significantly flatter than the true source distribution, leading to compression inefficiency. Midicoth addresses this limitation by treating prior smoothing as a shrinkage process and applying a reverse denoising step that corrects predicted probabilities using empirical calibration statistics. To make this correction data-efficient, the method decomposes each byte prediction into a hierarchy of binary decisions along a bitwise tree. This converts a single 256-way calibration problem into a sequence of binary calibration tasks, enabling reliable estimation of correction terms from relatively small numbers of observations. The denoising process is applied in multiple successive steps, allowing each stage to refine residual prediction errors left by the previous one. The micro-diffusion layer operates as a lightweight post-blend calibration stage applied after all model predictions have been combined, allowing it to correct systematic biases in the final probability distribution. Midicoth combines five fully online components: an adaptive PPM model, a long-range match model, a trie-based word model, a high-order context model, and the micro-diffusion denoiser applied as the final stage.

2603.08245 2026-03-13 cs.CG cs.CV

Topologically Stable Hough Transform

Stefan Huber, Kristóf Huszár, Michael Kerber, Martin Uray

Comments Extended abstract will be presented at EuroCG'26; 11 pages, 7 figures

详情
英文摘要

We propose an alternative formulation of the well-known Hough transform to detect lines in point clouds. Replacing the discretized voting scheme of the classical Hough transform by a continuous score function, its persistent features in the sense of persistent homology give a set of candidate lines. We also devise and implement an algorithm to efficiently compute these candidate lines.

2603.07949 2026-03-13 cs.DC cs.RO

RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA Models

Zihao Zheng, Sicheng Tian, Hangyu Cao, Chenyue Li, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Guojie Luo, Xiang Chen

详情
英文摘要

Vision Language Action (VLA) models are mainstream in embodied intelligence but face high inference costs. Edge-Cloud Collaborative (ECC) inference offers an effective fix by easing edge-device computing pressure to meet real-time needs. However, existing ECC frameworks are suboptimal for VLA models due to two challenges: (1) Mainstream environment-oriented edge-cloud partitioning methods are susceptible to interference from visual noise; (2) Existing edge-cloud partitioning methods overlook the step-wise redundancy unique to embodied tasks, thereby disrupting the physical continuity of motion. To address these issues, we propose a novel ECC inference framework, termed RAPID. Specifically, we developed an implementation tailored to the proposed framework. Experiments demonstrate this achieves a speedup of up to 1.73x with only 5%~7% overhead.