arXivDaily: daily arXiv paper digest, updated Monday to Friday
All subject categories: 1530
2604.08720 2026-04-13 cs.SE cs.AI

Demystifying the Silence of Correctness Bugs in PyTorch Compiler

Meiziniu Li, Dongze Li, Jianmeng Liu, Shing-Chi Cheung

Abstract

Performance optimization of AI infrastructure is key to the fast adoption of large language models (LLMs). The PyTorch compiler (torch.compile), a core optimization tool for deep learning (DL) models (including LLMs), has received due attention. However, torch.compile is prone to correctness bugs, which cause incorrect outputs of compiled DL models without triggering exceptions, crashes, or warnings. These bugs pose a serious threat to the reliability of downstream LLM applications. Data from the PyTorch community shows that 19.2% of high-priority issues are incorrect outputs of compiled DL models induced by torch.compile bugs, the second-most-common bug category (only behind program crashes at 19.57%). However, no systematic study has been conducted to specifically characterize and thereby detect these bugs. In this paper, we present the first empirical study of the correctness bugs in torch.compile, examine their characteristics, and assess the effectiveness of existing fuzzers in detecting them. Based on our findings, we propose a proof-of-concept testing technique named AlignGuard, tailored specifically for detecting correctness bugs in torch.compile. AlignGuard incorporates bug characteristics distilled from our empirical study, applying LLM-based test mutation to existing test cases for correctness bug detection. At the time of writing, AlignGuard has successfully detected 23 new correctness bugs in recent torch.compile. All these bugs have been confirmed or fixed by the PyTorch development team, and over half (14/23) of them are even marked as high-priority bugs, underscoring the usefulness of our technique.
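The abstract does not spell out AlignGuard's internals, but the general idea behind catching silent correctness bugs is differential testing. You run the same computation through a trusted reference path and an optimized path, then flag any output mismatch beyond a tolerance, because no crash or warning will do it for you. The stdlib-only sketch below rests on loud assumptions. `variance_fused` is a hypothetical "optimized" kernel with a planted numerical bug, and the input-shift mutation, tolerances, and seeds are all invented for the example. It is not AlignGuard's LLM-based test mutation.

```python
import math
import random

OFFSETS = (0.0, 1e4, 1e8)  # hypothetical input-shift mutations

def variance_reference(xs):
    # Two-pass variance: the numerically stable "eager" reference path
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def variance_fused(xs):
    # Hypothetical single-pass "optimized" kernel computing E[x^2] - E[x]^2.
    # When |mean| >> std it silently returns a wrong value: no exception.
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean

def differential_test(ref, opt, seeds, n=64, rel_tol=1e-6, abs_tol=1e-9):
    """Run ref and opt on mutated random inputs; return (seed, ref_out, opt_out)
    for every input where the two paths disagree beyond tolerance."""
    failures = []
    for seed in seeds:
        rng = random.Random(seed)
        base = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        offset = OFFSETS[seed % len(OFFSETS)]  # mutation: shift all inputs
        xs = [x + offset for x in base]  # true variance is shift-invariant
        a, b = ref(xs), opt(xs)
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((seed, a, b))
    return failures

failures = differential_test(variance_reference, variance_fused, range(20))
print(f"{len(failures)} of 20 mutated inputs disagree")
```

In a real harness the two paths would be an eager model and `torch.compile(model)` on the same inputs, with tolerances chosen per dtype. The shift mutation is useful here because it preserves the expected output, so any disagreement points to a bug rather than a wrong test oracle.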

2604.08703 2026-04-13 cs.MM cs.DB cs.LG

QoS-QoE Translation with Large Language Model

Yingjie Yu, Mingyuan Wu, Ahmadreza Eslaminia, Lingzhi Zhao, Kaizhuo Yan, Klara Nahrstedt

Abstract

QoS-QoE translation is a fundamental problem in multimedia systems because it characterizes how measurable system and network conditions affect user-perceived experience. Although many prior studies have examined this relationship, their findings are often developed for specific setups and remain scattered across papers, experimental settings, and reporting formats, limiting systematic reuse, cross-scenario generalization, and large-scale analysis. To address this gap, we first introduce the QoS-QoE Translation dataset, a source-grounded dataset of structured QoS-QoE relationships from the multimedia literature, with a focus on video-streaming-related tasks. We construct the dataset through an automated pipeline that combines paper curation, QoS-QoE relationship extraction, and iterative data evaluation. Each record preserves the extracted relationship together with parameter definitions, supporting evidence, and contextual metadata. We further evaluate the capability of large language models (LLMs) on QoS-QoE translation, both before and after supervised fine-tuning on our dataset, and show strong performance on both continuous-value and discrete-label prediction in bidirectional translation (QoS-to-QoE and QoE-to-QoS). Our dataset provides a foundation for benchmarking LLMs in QoS-QoE translation and for supporting future LLM-based reasoning for multimedia quality prediction and optimization. The complete dataset and code are publicly available at https://yyu6969.github.io/qos-qoe-translation-page/, for full reproducibility and open access.

2604.08669 2026-04-13 cond-mat.quant-gas cs.LG quant-ph

An Algorithm for Fast Assembling Large-Scale Defect-Free Atom Arrays

Tao Zhang, Xiaodi Li, Hui Zhai, Linghui Chen

Abstract

It is widely believed that tens of thousands of physical qubits are needed to build a practically useful quantum computer. Atom arrays formed by optical tweezers are among the most promising platforms for achieving this goal, owing to the excellent scalability and mobility of atomic qubits. However, assembling a defect-free atom array with ~10^4 qubits remains algorithmically challenging, in addition to other hardware limitations. This is due to the computationally hard path-planning problems and the time-consuming generation of sufficiently smooth trajectories for optical tweezer potentials by spatial light modulators (SLMs). Here, we present a unified framework comprising two innovative components to fully address these algorithmic challenges: (1) a path-planning module that employs a supervised learning approach using a graph neural network combined with a modified auction decoder, and (2) a potential-generation module called the phase and profile-aware Weighted Gerchberg-Saxton algorithm. The inference time for the first module is a nearly size-independent constant overhead of ~5 ms, and the second module generates a potential frame in about 0.5 ms, a timescale shorter than the current commercial SLM refresh time. Altogether, our algorithm enables the assembly of an atom array with 10^4 qubits on a timescale much shorter than the typical vacuum lifetime of the trapped atoms.
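To make the path-planning step concrete: assigning each loaded atom to a target site is an assignment problem, and the classic Bertsekas auction algorithm is one standard way to solve it. The sketch below is the plain auction, with no GNN and none of the paper's modifications, which the abstract does not describe. Benefit is assumed to be negative Euclidean move distance, and `plan_moves` and the toy coordinates are hypothetical.

```python
import math

def auction_assignment(benefit, eps=1e-3):
    """Bertsekas forward auction on a square benefit matrix.
    Returns assigned[i] = target chosen for atom i; the total benefit is
    within n * eps of optimal."""
    n = len(benefit)
    prices = [0.0] * n
    owner = [None] * n      # owner[j]: atom currently holding target j
    assigned = [None] * n   # assigned[i]: target currently held by atom i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of every target to atom i at current prices
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best_j = max(range(n), key=lambda j: values[j])
        second = max((values[j] for j in range(n) if j != best_j),
                     default=values[best_j])
        # Bid: raise the price by the margin over the runner-up plus eps
        prices[best_j] += values[best_j] - second + eps
        if owner[best_j] is not None:
            # Outbid atom loses its target and re-enters the auction
            assigned[owner[best_j]] = None
            unassigned.append(owner[best_j])
        owner[best_j] = i
        assigned[i] = best_j
    return assigned

def plan_moves(atoms, targets, eps=1e-3):
    # Benefit = negative move distance, so the auction minimizes total travel
    benefit = [[-math.dist(a, t) for t in targets] for a in atoms]
    return auction_assignment(benefit, eps)

print(plan_moves([(0, 0), (0, 3), (5, 0)], [(0, 1), (4, 0), (0, 2)]))
```

Each unassigned atom bids for its best target and raises that target's price by its margin over the second-best option plus `eps`. A positive `eps` guarantees termination, and when `n * eps` is below the gap between the best and second-best total cost, the result is optimal. For the toy input the call returns `[0, 2, 1]`, a total travel distance of 3.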

2604.08661 2026-04-13 quant-ph cond-mat.dis-nn cs.LG physics.comp-ph

Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States

Asif Bin Ayub, Amine Mohamed Aboussalah, Mohamed Hibat-Allah

Comments 16 pages, 4 figures, and 1 table

Abstract

Neural Quantum States based on autoregressive recurrent neural network (RNN) wave functions enable efficient sampling without Markov-chain autocorrelation, but standard RNN architectures are biased toward finite-length correlations and can fail on states with long-range dependencies. A common response is to adopt transformer-style self-attention, but this typically comes with substantially higher computational and memory overhead. Here we introduce dilated RNN wave functions, where recurrent units access distant sites through dilated connections, injecting an explicit long-range inductive bias while retaining a favorable $\mathcal{O}(N \log N)$ forward pass scaling. We show analytically that dilation changes the correlation geometry and can induce power-law correlation scaling in a simplified linearized and perturbative setting. Numerically, for the critical 1D transverse-field Ising model, dilated RNNs reproduce the expected power-law connected two-point correlations in contrast to the exponential decay typical of conventional RNN ansätze. We further show that the dilated RNN accurately approximates the one-dimensional Cluster state, a paradigmatic example with long-range conditional correlations that has previously been reported to be challenging for RNN-based wave functions. These results highlight dilation as a simple geometric mechanism for building correlation-aware autoregressive neural quantum states.
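One way to see why dilation helps is to count how many recurrent hops separate two sites. The abstract does not give the exact wiring, so the sketch below assumes a common stacked layout in which layer `l` connects site `i` to site `i - 2**l`. With about log2(N) layers, the forward pass touches N connections per layer, which is consistent with the stated $\mathcal{O}(N \log N)$ cost. `path_length` is a hypothetical helper, not the authors' code.

```python
import math

def path_length(j, i, num_layers):
    """Shortest number of hops from the input at site j to the top-layer
    output at site i (j <= i) in a stacked dilated RNN where layer l uses
    dilation 2**l. num_layers=1 is an ordinary (undilated) RNN."""
    n = i + 1
    prev = None
    for l in range(num_layers):
        d = 2 ** l
        cur = [math.inf] * n
        for k in range(n):
            best = math.inf
            if l == 0 and k == j:
                best = 0  # the input enters the bottom layer at site j
            if prev is not None:
                best = min(best, prev[k] + 1)  # vertical edge from layer l-1
            if k - d >= 0:
                best = min(best, cur[k - d] + 1)  # dilated recurrent edge
            cur[k] = best
        prev = cur
    return prev[i]

print(path_length(0, 1023, num_layers=1))   # 1023 hops: ordinary RNN
print(path_length(0, 1023, num_layers=10))  # 19 hops with dilations 1, 2, ..., 512
```

With ten dilated layers, the 19 hops are ten dilated steps (one per layer, covering 1 + 2 + ... + 512 = 1023 sites) plus nine moves between layers. Shorter paths between distant sites are the intuition behind the long-range inductive bias described above.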

2604.08648 2026-04-13 astro-ph.HE astro-ph.IM cs.LG hep-ph

High-dimensional inference for the γ-ray sky with differentiable programming

Siddharth Mishra-Sharma, Tracy R. Slatyer, Yitian Sun, Yuqing Wu

Comments 17 pages, 13 figures. Code available at https://github.com/smsharma/fermi-prob-prog

Abstract

We motivate the use of differentiable probabilistic programming techniques to account for the large model space inherent to astrophysical γ-ray analyses. Targeting the longstanding Galactic Center γ-ray Excess (GCE) puzzle, we construct a differentiable forward model and likelihood that make liberal use of GPU acceleration and vectorization to simultaneously account for a continuum of possible spatial morphologies consistent with the GCE emission in a fully probabilistic manner. Our setup allows for efficient inference over the large model space using variational methods. Beyond application to γ-ray data, a goal of this work is to showcase how differentiable probabilistic programming can be used as a tool to enable flexible analyses of astrophysical datasets.

2604.08628 2026-04-13 cs.CR cs.AI cs.IR

Retrieval Augmented Classification for Confidential Documents

Yeseul E. Chang, Rahul Kailasa, Simon Shim, Byunghoon Oh, Jaewoo Lee

Comments Appears in: KSII The 17th International Conference on Internet (ICONI) 2025, Dec 2025. 7 pages (48-54)

Journal ref
In Proceedings of KSII ICONI 2025, Dec 2025
Abstract

Unauthorized disclosure of confidential documents demands robust, low-leakage classification. In real work environments, documents flow in and out constantly. To keep knowledge up to date, we propose a methodology for classifying confidential documents using Retrieval Augmented Classification (RAC). To assess its effectiveness, we compare RAC and supervised fine-tuning (FT) on the WikiLeaks US Diplomacy corpus under realistic sequence-length constraints. On balanced data, RAC matches FT. On unbalanced data, RAC is more stable while delivering comparable performance: about 96% accuracy on both the original (unbalanced) and augmented (balanced) sets, and up to 94% F1 with proper prompting. By contrast, FT attains 90% F1 when trained on the augmented, balanced set but drops to 88% F1 when trained on the original, unbalanced set. When robust augmentation is infeasible, RAC provides a practical, security-preserving path to strong classification by keeping sensitive content out of model weights and under your control, and it remains robust as real-world conditions change in class balance, data, context length, or governance requirements. Because RAC grounds decisions in an external vector store with similarity matching, it is less sensitive to label skew, reduces parameter-level leakage, and can incorporate new data immediately via reindexing, a step that is difficult for FT, which typically requires retraining. The contributions of this paper are threefold: first, a RAC-based classification pipeline and evaluation recipe; second, a controlled study that isolates class-imbalance and context-length effects for FT versus RAC in confidential-document grading; and third, actionable guidance on RAC design patterns for governed deployments.
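The core RAC mechanism, classifying a document by what it most resembles in an external store, fits in a few lines. This minimal sketch rests on stated assumptions. `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the similarity-weighted vote replaces the LLM prompting step the paper uses, and the labels and documents are invented.

```python
import math
from collections import Counter

def embed(text):
    # Hypothetical stand-in for an embedding model: bag-of-words counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class VectorStore:
    """Labelled examples kept outside any model's weights."""

    def __init__(self):
        self.items = []  # (vector, label) pairs

    def add(self, text, label):
        # New data is usable immediately: "reindexing" is just an append here
        self.items.append((embed(text), label))

    def classify(self, text, k=3):
        query = embed(text)
        scored = sorted(((cosine(query, vec), label) for vec, label in self.items),
                        reverse=True)[:k]
        votes = Counter()
        for score, label in scored:
            votes[label] += score  # similarity-weighted vote over top-k
        return votes.most_common(1)[0][0]

store = VectorStore()
store.add("troop movements near the border secret cable", "SECRET")
store.add("secret negotiations on missile deployment", "SECRET")
store.add("public holiday schedule for embassy staff", "UNCLASSIFIED")
store.add("embassy cafeteria menu for next week", "UNCLASSIFIED")
print(store.classify("secret cable on missile movements"))  # prints SECRET
store.add("quarterly budget report restricted distribution", "RESTRICTED")
print(store.classify("restricted budget report"))  # prints RESTRICTED
```

Adding a labelled document with `store.add` changes the very next prediction, which is the property the abstract contrasts with fine-tuning. Nothing about the documents is stored in model weights.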

2604.08625 2026-04-13 stat.ML cs.LG math.ST stat.TH

Spectral-Transport Stability and Benign Overfitting in Interpolating Learning

Gustav Olaf Yunus Laitinen-Lundström Fredriksson-Imanov

Comments 50 pages, 7 figures, 4 tables. Research article. Includes full proofs, model-specific corollaries, and synthetic supporting experiments. Submitted to Machine Learning

Abstract

We develop a theoretical framework for generalization in the interpolating regime of statistical learning. The central question is why highly overparameterized estimators can attain zero empirical risk while still achieving nontrivial predictive accuracy, and how to characterize the boundary between benign and destructive overfitting. We introduce a spectral-transport stability framework in which excess risk is controlled jointly by the spectral geometry of the data distribution, the sensitivity of the learning rule under single-sample replacement, and the alignment structure of label noise. This leads to a scale-dependent Fredriksson index that combines effective dimension, transport stability, and noise alignment into a single complexity parameter for interpolating estimators. We prove finite-sample risk bounds, establish a sharp benign-overfitting criterion through the vanishing of the index along admissible spectral scales, and derive explicit phase-transition rates under polynomial spectral decay. For a model-specific specialization, we obtain an explicit theorem for polynomial-spectrum linear interpolation, together with a proof of the resulting rate. The framework also clarifies implicit regularization by showing how optimization dynamics can select interpolating solutions of minimal spectral-transport energy. These results connect algorithmic stability, double descent, benign overfitting, operator-theoretic learning theory, and implicit bias within a unified structural account of modern interpolation.

2604.08606 2026-04-13 cs.GT cs.AI econ.TH

Extrapolating Volition with Recursive Information Markets

Abhimanyu Pallavi Sudhir, Long Tran-Thanh

Comments Accepted to Games, Agents and Incentives Workshop at AAMAS-2026

Abstract

One of the impediments to the efficiency of information markets is the inherent information asymmetry present in them, exacerbated by the "buyer's inspection paradox" (the buyer cannot mitigate the asymmetry by "inspecting" the information, because in doing so the buyer obtains the information without paying for it). Previous work has suggested that using Large Language Model (LLM) buyers to inspect and purchase information could overcome this information asymmetry, as an LLM buyer can simply "forget" the information it inspects. In this work, we analyze this mechanism formally through a "value-of-information" paradigm, i.e. whether it incentivizes information to be priced and provided in accordance with its "true value". We focus in particular on our new recursive version of the mechanism, which we believe has a range of applications including in AI alignment research, where it is related to Extrapolated Volition and Scalable Oversight.

2604.08602 2026-04-13 cs.DL cs.AI cs.LG

TiAb Review Plugin: A Browser-Based Tool for AI-Assisted Title and Abstract Screening

Yuki Kataoka, Masahiro Banno, Michihito Kyo, Shuri Nakao, Tomoo Sato, Shunsuke Taito, Tomohiro Takayama, Takahiro Tsuge, Yasushi Tsujimoto, Ryuhei So, Toshi A. Furukawa

Comments 25 pages, 2 figures. Abstract submitted to Cochrane Colloquium 2026. Code: https://github.com/youkiti/tiab-review-plugin

Abstract

Background: Server-based screening tools impose subscription costs, while open-source alternatives require coding skills. Objectives: We developed a browser extension that provides no-code, serverless artificial intelligence (AI)-assisted title and abstract screening and examined its functionality. Methods: TiAb Review Plugin is an open-source Chrome browser extension (available at https://chromewebstore.google.com/detail/tiab-review-plugin/alejlnlfflogpnabpbplmnojgoeeabij). It uses Google Sheets as a shared database, requiring no dedicated server and enabling multi-reviewer collaboration. Users supply their own Gemini API key, stored locally and encrypted. The tool offers three screening modes: manual review, large language model (LLM) batch screening, and machine learning (ML) active learning. For ML evaluation, we re-implemented the default ASReview active learning algorithm (TF-IDF with Naive Bayes) in TypeScript to enable in-browser execution, and verified equivalence against the original Python implementation using 10-fold cross-validation on six datasets. For LLM evaluation, we compared 16 parameter configurations across two model families on a benchmark dataset, then validated the optimal configuration (Gemini 3.0 Flash, low thinking budget, TopP=0.95) with a sensitivity-oriented prompt on five public datasets (1,038 to 5,628 records, 0.5 to 2.0 percent prevalence). Results: The TypeScript classifier produced top-100 rankings 100 percent identical to the original ASReview across all six datasets. For LLM screening, recall was 94 to 100 percent with precision of 2 to 15 percent, and Work Saved over Sampling at 95 percent recall (WSS@95) ranged from 48.7 to 87.3 percent. Conclusions: We developed a functional browser extension that integrates LLM screening and ML active learning into a no-code, serverless environment, ready for practical use in systematic review screening.
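WSS@95, the headline efficiency metric above, is commonly defined as WSS@r = (TN + FN) / N - (1 - r), evaluated at the point in the ranked list where recall first reaches r. Every record below that point is left unread. The minimal sketch below implements that definition, assuming 0/1 relevance labels in screening order. The toy ranking is invented.

```python
def wss_at_recall(ranked_labels, recall_pct=95):
    """Work Saved over Sampling at a target recall (in percent).

    ranked_labels: 1 = relevant, 0 = irrelevant, in the order a reviewer
    would screen the records (highest predicted relevance first).
    """
    n_total = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        raise ValueError("WSS is undefined without relevant records")
    found = 0
    for n_screened, label in enumerate(ranked_labels, start=1):
        found += label
        # Integer comparison: found / n_relevant >= recall_pct / 100
        if found * 100 >= recall_pct * n_relevant:
            # Records after this point are never read: TN + FN = N - n_screened
            return (n_total - n_screened) / n_total - (100 - recall_pct) / 100
    raise AssertionError("unreachable: recall reaches 100% at the end")

# Toy ranking: relevant records at positions 1, 2, 3 and 10 out of 100
ranking = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1] + [0] * 90
print(round(wss_at_recall(ranking), 4))
```

With 4 relevant records in 100, and the fourth found at position 10, the reviewer stops after 10 records, so WSS@95 = 90/100 - 0.05 = 0.85. Recall is compared with integer arithmetic so that the 95% threshold is not affected by floating-point rounding.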

2604.08597 2026-04-13 cs.DB cs.AI

STIndex: A Context-Aware Multi-Dimensional Spatiotemporal Information Extraction System

Wenxiao Zhang, Yu Liu, Qiang Sun, Yihao Ding, Sirui Li, Yanbing Liu, Jin B. Hong, Wei Liu

Abstract

Extracting structured knowledge from unstructured data still faces practical limitations: entity and event extraction pipelines remain brittle, knowledge graph construction requires costly ontology engineering, and cross-domain generalization is rarely production-ready. In contrast, space and time provide universal contextual anchors that naturally align heterogeneous information and benefit downstream tasks such as retrieval and reasoning. We introduce STIndex, an end-to-end system that structures unstructured content into a multidimensional spatiotemporal data warehouse. Users define domain-specific analysis dimensions with configurable hierarchies, while large language models perform context-aware extraction and grounding. STIndex integrates document-level memory, geocoding correction, and quality validation, and offers an interactive analytics dashboard for visualization, clustering, burst detection, and entity network analysis. In an evaluation on a public health benchmark, STIndex improves spatiotemporal entity extraction F1 by 4.37% (GPT-4o-mini) and 3.60% (Qwen3-8B). A live demonstration and open-source code are available at https://stindex.ai4wa.com/dashboard.

2604.08594 2026-04-13 q-bio.NC cs.AI cs.HC

Mapping generative AI use in the human brain: divergent neural, academic, and mental health profiles of functional versus socio-emotional AI use

Junjie Wang, Xianyang Gan, Dan Liu, Jingxian He, Stefania Ferraro, Keith M. Kendrick, Weihua Zhao, Shuxia Yao, Christian Montag, Benjamin Becker

Comments 45 pages, 20 figures, 5 tables

Abstract

The widespread adoption of generative artificial intelligence conversational agents (AICAs) among university students constitutes a novel cognitive-social environment whose impact on the maturing brain remains elusive. Combining surveys with high-resolution structural MRI, we examined patterns of general, functional, and socio-emotional AICA use, academic performance, mental health, and brain structural signatures in a comparatively large sample of 222 young individuals. Across computational anatomy, meta-analytic network-level, and behavioral decoding analyses, we observed use-specific associations. Higher general and functional AICA use frequencies were linked to better academic outcomes (GPA), larger dorsolateral prefrontal and calcarine gray matter volume, and enhanced hippocampal network clustering and local efficiency. In contrast, more frequent socio-emotional AICA use was associated with poorer mental health (depression, social anxiety) and lower volume of superior temporal and amygdalar regions central to social and affective processing. These findings indicate that the same class of AI tools exerts distinct effects depending on usage patterns and motivations, engaging prefrontal-hippocampal systems that support cognition versus socio-emotional systems that may track distress-linked usage. These heterogeneities are crucial for designing environments that harness the educational benefits of AI while mitigating mental health risks.

2604.08585 2026-04-13 cs.DB cs.AI

QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference

Jianxin Yan, Zeheng Qian, Wangze Ni, Zhitao Shen, Zhiping Wang, Haoyang Li, Jia Zhu, Lei Chen, Kui Ren

Abstract

Cache fusion accelerates the generation process of LLMs equipped with RAG through KV caching and selective token recomputation, thereby reducing computational costs and improving efficiency. However, existing methods primarily rely on local perspectives for token selection and lack global awareness from the user query. Utilizing this global awareness is challenging due to the high cost of obtaining context-aware query representations and the strict pipeline constraints required for efficient attention analysis. Thus, this demonstration introduces QCFuse, an innovative KV cache fusion system centered on the user query. QCFuse leverages semantic summary anchors to enhance query representations and selectively recomputes query-related tokens to improve accuracy, updating tokens based on the attention distribution of the most critical Transformer layer to preserve the high efficiency of the pipeline structure. Evaluations on real-world datasets demonstrate that QCFuse improves the response efficiency of LLMs by 40% while maintaining accuracy equivalent to current methods. Additionally, in certain scenarios, QCFuse achieves an attention denoising effect that yields higher response accuracy, demonstrating substantial potential in the optimization of LLM inference.
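The selective-recomputation step can be illustrated as follows. Each cached context token is scored by how much attention it receives from the query at a single chosen layer, only the top fraction is recomputed, and cached KV entries are reused for the rest. The scoring rule (summed attention), the fixed `ratio`, and the toy attention matrix are assumptions for illustration. The abstract does not specify QCFuse's exact criterion or how it picks the critical layer.

```python
def select_tokens_to_recompute(attn, ratio):
    """Pick context tokens whose cached KV entries should be recomputed.

    attn[q][t]: attention weight from query token q to context token t,
    taken from one chosen Transformer layer (assumed already computed).
    Returns the sorted indices of the top `ratio` fraction of context tokens
    ranked by total attention received from the query.
    """
    n_ctx = len(attn[0])
    scores = [sum(row[t] for row in attn) for t in range(n_ctx)]
    k = max(1, round(ratio * n_ctx))
    top = sorted(range(n_ctx), key=lambda t: scores[t], reverse=True)[:k]
    return sorted(top)  # every other token reuses its cached KV entry

# Toy attention from 2 query tokens over 8 cached context tokens
attn = [
    [0.05, 0.40, 0.05, 0.05, 0.05, 0.30, 0.05, 0.05],
    [0.10, 0.30, 0.10, 0.05, 0.05, 0.30, 0.05, 0.05],
]
print(select_tokens_to_recompute(attn, ratio=0.25))  # prints [1, 5]
```

Here context tokens 1 and 5 receive the most query attention, so with `ratio=0.25` of 8 tokens only those two would be recomputed.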

2604.08580 2026-04-13 math.OC cs.LG

Adjoint Matching through the Lens of the Stochastic Maximum Principle in Optimal Control

Carles Domingo-Enrich, Jiequn Han

Abstract

Reward fine-tuning of diffusion and flow models and sampling from tilted or Boltzmann distributions can both be formulated as stochastic optimal control (SOC) problems, where learning an optimal generative dynamics corresponds to optimizing a control under SDE constraints. In this work, we revisit and generalize Adjoint Matching, a recently proposed SOC-based method for learning optimal controls, and place it on a rigorous footing by deriving it from the Stochastic Maximum Principle (SMP). We formulate a general Hamiltonian adjoint matching objective for SOC problems with control-dependent drift and diffusion and convex running costs, and show that its expected value has the same first variation as the original SOC objective. As a consequence, critical points satisfy the Hamilton-Jacobi-Bellman (HJB) stationarity conditions. In the important practical case of state- and control-independent diffusion, we recover the lean adjoint matching loss previously introduced in adjoint matching, which avoids second-order terms and whose critical points coincide with the optimal control under mild uniqueness assumptions. Finally, we show that adjoint matching can be precisely interpreted as a continuous-time method of successive approximations induced by the SMP, yielding a practical and implementable alternative to classical SMP-based algorithms, which are obstructed by intractable martingale terms in the stochastic setting. These results are also of independent interest to the stochastic control community, providing new implementable objectives and a viable pathway for SMP-based iterations in stochastic problems.
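For the common special case of quadratic control cost, control-affine drift, and control-independent diffusion, the SMP step that links optimal controls to adjoint states is a one-line computation. The derivation below is the textbook case, written with a minimizing Hamiltonian. It is not the paper's general Hamiltonian adjoint matching objective.

```latex
\begin{aligned}
&\min_{u}\ \mathbb{E}\Big[\int_0^T \big(\tfrac12\|u(X_t,t)\|^2 + f(X_t,t)\big)\,dt + g(X_T)\Big],\\
&\qquad dX_t = \big(b(X_t,t) + \sigma(t)\,u(X_t,t)\big)\,dt + \sigma(t)\,dW_t,\\[4pt]
&H(x,u,a,t) = a^\top\big(b(x,t) + \sigma(t)\,u\big) + \tfrac12\|u\|^2 + f(x,t),\\
&\nabla_u H = \sigma(t)^\top a + u = 0 \;\Longrightarrow\; u^\star(x,t) = -\sigma(t)^\top a_t .
\end{aligned}
```

Since $H$ is strictly convex in $u$, this stationary point is the unique minimizer, and because the diffusion does not depend on $u$, it does not enter the minimization. Along the optimal trajectory the adjoint coincides with $\nabla_x V$, recovering the familiar HJB feedback $u^\star = -\sigma^\top \nabla_x V$. A regression loss of the form $\mathbb{E}\int_0^T \|u(X_t,t) + \sigma(t)^\top a_t\|^2\,dt$ is therefore a natural way to drive a learned control toward this condition. The paper specifies which adjoint process $a_t$ is used and how it is computed.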

2604.08576 2026-04-13 cs.NI cs.AI cs.LG

GAN-Enhanced Deep Reinforcement Learning for Semantic-Aware Resource Allocation in 6G Network Slicing

Daniel Benniah John

Comments 15 pages, 8 figures. Under review. Simulation-based evaluation for 6G network slicing

Abstract

Sixth-generation (6G) wireless networks must support heterogeneous services: enhanced Mobile Broadband (eMBB) requiring 1 Tbps data rates, massive Machine-Type Communications (mMTC) supporting 10 million devices per km², and Ultra-Reliable Low-Latency Communications (URLLC) with 0.1-1 ms latency. Current resource allocation suffers from three limitations: (1) semantic blindness, which wastes 35% of bandwidth on redundant data, (2) discrete action quantization, and (3) limited training diversity. This paper proposes GAN-DDPG, a Generative Adversarial Network-enhanced Deep Deterministic Policy Gradient framework integrating conditional GANs for traffic synthesis, continuous-action DDPG, and semantic-aware reward optimization. Extensive simulations with statistical validation demonstrate significant improvements over baseline DDPG: spectral efficiency gains of 22% for URLLC, 20% for eMBB, and 25% for mMTC (all p < 0.001), with 18% lower latency and 31% lower packet loss.
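Two standard DDPG ingredients fit naturally in a continuous slicing controller: Polyak ("soft") target-network updates, and a mapping from the actor's unbounded output to valid resource shares. The sketch below shows both in plain Python over flat weight lists. The softmax mapping from actions to per-slice bandwidth fractions is an assumption for illustration, not the paper's stated design, and no GAN or semantic reward is shown.

```python
import math

def soft_update(target, online, tau=0.005):
    """Polyak averaging used by DDPG target networks:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

def action_to_shares(action):
    """Hypothetical mapping from a raw continuous actor output (one value per
    slice, e.g. eMBB, mMTC, URLLC) to bandwidth fractions that are positive and
    sum to 1, via a numerically stable softmax."""
    m = max(action)
    exps = [math.exp(a - m) for a in action]
    total = sum(exps)
    return [e / total for e in exps]

target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5)
print(target)  # prints [0.5, 1.0]
shares = action_to_shares([0.3, -1.2, 0.8])  # illustrative actor output
print([round(s, 3) for s in shares])
```

A softmax keeps every slice's share positive and the total at exactly one budget, so the actor can explore continuously without producing infeasible allocations. The original DDPG paper used tau = 0.001 for its target networks.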

2604.08552 2026-04-13 cs.DB cs.AI

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

Josef Hardi, Martin J. O'Connor, Marcos Martinez-Romero, Jean G. Rosario, Stephen A. Fisher, Mark A. Musen

Abstract

Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. When reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model's training knowledge alone. We present an LLM-based metadata standardization system that queries authoritative biomedical terminology services in real time to retrieve canonically correct vocabulary terms on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical, scalable approach to automated standardization of biomedical metadata.

2604.08551 2026-04-13 cs.CR cs.CY cs.LG

Self-Sovereign Agent

Wenjie Qu, Xuandong Zhao, Jiaheng Zhang, Dawn Song

Abstract

We investigate the emerging prospect of self-sovereign agents: AI systems that can economically sustain and extend their own operation without human involvement. Recent advances in large language models and agent frameworks have substantially expanded agents' practical capabilities, pointing toward a potential shift from developer-controlled tools to more autonomous digital actors. We analyze the remaining technical barriers to such deployments and discuss the security, societal, and governance challenges that could arise if such systems become practically viable. A project page is available at: https://self-sovereign-agent.github.io.

2604.08550 2026-04-13 cs.IR cs.AI

Unbiased Rectification for Sequential Recommender Systems Under Fake Orders

Qiyu Qin, Yichen Li, Haozhao Wang, Cheng Wang, Rui Zhang, Ruixuan Li

Abstract

Fake orders pose increasing threats to sequential recommender systems by misleading recommendation results through artificially manipulated interactions, including click farming, context-irrelevant substitutions, and sequential perturbations. Unlike attacks that inject carefully designed fake users to influence recommendation performance, fake orders embedded within genuine user sequences aim to disrupt user preferences and mislead recommendation results, thereby manipulating the exposure rates of specific items to gain a competitive advantage. To protect users' authentic interest preferences and eliminate misleading information, this paper aims to perform precise and efficient rectification on compromised sequential recommender systems while avoiding the enormous computational and time costs of retraining existing models. Specifically, we find that fake orders are not necessarily harmful; in certain cases, some fake orders can even have a data augmentation effect. Based on this insight, we propose Dual-view Identification and Targeted Rectification (DITaR), which identifies harmful samples to achieve unbiased rectification of the system. The core idea of this method is to obtain differentiated representations from collaborative and semantic views for precise detection, and then filter the detected suspicious fake orders to select truly harmful ones for targeted rectification with gradient ascent. This ensures that useful information in fake orders is not removed while preventing residual bias. Moreover, it maintains the original data volume and sequence structure, thus protecting system performance and trustworthiness to achieve optimal unbiased rectification. Extensive experiments on three datasets demonstrate that DITaR achieves superior performance compared to state-of-the-art methods in terms of recommendation quality, computational efficiency, and system robustness.

2604.08549 2026-04-13 cs.IR cs.AI cs.CL

VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering

Miloš Košprdić, Adela Ljajić, Bojana Bašaragin, Darija Medvecki, Lorenzo Cassano, Nikola Milošević

Journal ref
Submitted to IEEE Access, 2026
Abstract

We introduce VerifAI, an open-source expert system for biomedical question answering that integrates retrieval-augmented generation (RAG) with a novel post-hoc claim verification mechanism. Unlike standard RAG systems, VerifAI ensures factual consistency by decomposing generated answers into atomic claims and validating them against retrieved evidence using a fine-tuned natural language inference (NLI) engine. The system comprises three modular components: (1) a hybrid Information Retrieval (IR) module optimized for biomedical queries (MAP@10 of 42.7%), (2) a citation-aware Generative Component fine-tuned on a custom dataset to produce referenced answers, and (3) a Verification Component that detects hallucinations with state-of-the-art accuracy, outperforming GPT-4 on the HealthVer benchmark. Evaluations demonstrate that VerifAI significantly reduces hallucinated citations compared to zero-shot baselines and provides a transparent, verifiable lineage for every claim. The full pipeline, including code, models, and datasets, is open-sourced to facilitate reliable AI deployment in high-stakes domains.
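The retrieval figure above (MAP@10 of 42.7%) is a mean over queries of average precision in the top 10 results. Conventions for the AP denominator differ between benchmarks. The minimal sketch below assumes the common choice min(number of relevant documents, k), and the document ids are invented.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """AP@k, normalized by min(#relevant, k) (one common convention)."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0

def map_at_k(runs, k=10):
    """Mean of AP@k over queries; runs = [(ranked_ids, relevant_ids), ...]."""
    return sum(average_precision_at_k(r, g, k) for r, g in runs) / len(runs)

runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # hits at ranks 1 and 3
    (["d7", "d9"], {"d9"}),                     # hit at rank 2
]
print(map_at_k(runs))
```

For the two toy queries, AP@10 is (1/1 + 2/3)/2 = 5/6 and (1/2)/1 = 1/2, so MAP@10 = 2/3.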

2604.08277 2026-04-13 quant-ph cs.AI cs.LG

QARIMA: A Quantum Approach To Classical Time Series Analysis

Nishikanta Mohanty, Bikash K. Behera, Badshah Mukherjee, Pravat Dash

Comments 17 algorithms, 19 figures, 26 tables

Abstract

We present a quantum-inspired ARIMA methodology that integrates quantum-assisted lag discovery with fixed-configuration variational quantum circuits (VQCs) for parameter estimation and weak-lag refinement. Differencing and candidate lags are identified via swap-test-driven quantum autocorrelation (QACF) and quantum partial autocorrelation (QPACF), with a delayed-matrix construction that aligns quantum projections to time-domain regressors, followed by standard information-criterion parsimony. Given the screened orders (p,d,q), we retain a fixed VQC ansatz, optimizer, and training budget, preventing hyperparameter leakage, and deploy the circuit in two estimation roles: VQC-AR for autoregressive coefficients and VQC-MA for moving-average coefficients. Between screening and estimation, a lightweight VQC weak-lag refinement re-weights or prunes screened AR lags without altering (p,d,q). Across environmental and industrial datasets, we perform rolling-origin evaluations against automated classical ARIMA, reporting out-of-sample mean squared error (MSE), mean absolute percentage error (MAPE), and Diebold-Mariano tests on MSE and MAE. Empirically, the seven quantum contributions, namely (1) differencing selection, (2) QACF, (3) QPACF, (4) swap-test primitives with delayed-matrix construction, (5) VQC-AR, (6) VQC weak-lag refinement, and (7) VQC-MA, collectively reduce meta-optimization overhead and make explicit where quantum effects enter order discovery, lag refinement, and AR/MA parameter estimation.
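The swap-test primitive behind QACF has a simple classical model. For amplitude-encoded states |ψ⟩ and |φ⟩, the ancilla is measured in |0⟩ with probability P(0) = (1 + |⟨ψ|φ⟩|²)/2, so repeated shots estimate the squared overlap. The sketch below simulates that statistic in plain Python for a centered series and its lagged copy. It is not a circuit, and the window normalization is an assumption that differs slightly from the textbook sample ACF, which normalizes by the full-series variance. The paper's delayed-matrix construction is not reproduced.

```python
import math
import random

def amplitude_encode(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot amplitude-encode a zero vector")
    return [x / norm for x in v]

def swap_test_p0(psi, phi):
    """Probability of measuring the swap-test ancilla in |0>."""
    overlap = sum(a * b for a, b in zip(psi, phi))
    return 0.5 * (1.0 + overlap ** 2)

def lag_overlap_sq(series, lag, shots=None, seed=0):
    """Squared overlap between the centered series and its lagged copy,
    either exactly or estimated from a finite number of simulated shots."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    psi = amplitude_encode(centered[:-lag])
    phi = amplitude_encode(centered[lag:])
    p0 = swap_test_p0(psi, phi)
    if shots is not None:
        rng = random.Random(seed)
        p0 = sum(rng.random() < p0 for _ in range(shots)) / shots
    return 2.0 * p0 - 1.0  # invert P(0) = (1 + |<psi|phi>|^2) / 2

alternating = [1.0, -1.0] * 4
print(lag_overlap_sq(alternating, 1))   # ~1.0 although the lag-1 correlation is -1
quarter = [1.0, 0.0, -1.0, 0.0] * 2
print(lag_overlap_sq(quarter, 1))       # 0.0: the lagged copy is orthogonal
print(lag_overlap_sq(quarter, 2, shots=20000))  # noisy estimate of 1.0
```

The swap test returns |ρ|², not ρ. The alternating series has lag-1 correlation -1, yet `lag_overlap_sq` reports 1, so sign information has to come from elsewhere.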

2604.06816 2026-04-13 physics.optics cs.CV

Enhanced Self-Supervised Multi-Image Super-Resolution for Camera Array Images

Yating Chen, Feng Huang, Xianyu Wu, Jing Wu, Ying Shen

Abstract

Conventional multi-image super-resolution (MISR) methods, such as burst and video SR, rely on sequential frames from a single camera. Consequently, they suffer from complex image degradation and severe occlusion, increasing the difficulty of accurate image restoration. In contrast, multi-aperture camera-array imaging captures spatially distributed views with sampling offsets forming a stable disk-like distribution, which enhances the non-redundancy of observed data. Existing MISR algorithms fail to fully exploit these unique properties. Supervised MISR methods tend to overfit the degradation patterns in training data, and current self-supervised learning (SSL) techniques struggle to recover fine-grained details. To address these issues, this paper thoroughly investigates the strengths, limitations, and applicability boundaries of multi-image-to-single-image (Multi-to-Single) and multi-image-to-multi-image (Multi-to-Multi) SSL methods. We propose the Multi-to-Single-Guided Multi-to-Multi SSL framework, which combines the advantages of Multi-to-Single and Multi-to-Multi to generate visually appealing and high-fidelity images rich in texture details. The Multi-to-Single-Guided Multi-to-Multi SSL framework provides a new paradigm for integrating deep neural networks with classical physics-based variational methods. To enhance the ability of the MISR network to recover high-frequency details from aliased artifacts, this paper proposes a novel camera-array SR network called dual Transformer, suitable for SSL. Experiments on synthetic and real-world datasets demonstrate the superiority of the proposed method.

2604.03936 2026-04-13 stat.ML cs.LG stat.ME

Biconvex Biclustering

Sam Rosen, Eric C. Chi, Jason Xu

Comments 34 pages, 5 figures

This article proposes a biconvex modification to convex biclustering to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and weights informative features while discovering biclusters. Moreover, the method adapts to the data and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to the optimization subproblems. These contributions are theoretically grounded: we establish finite-sample bounds on the objective function under sub-Gaussian errors and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulations show that our method consistently recovers the underlying biclusters while weighting and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.
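
Proximal methods for fusion-penalized objectives like this one are typically built from the block soft-thresholding prox of the Euclidean norm. The sketch below evaluates the standard convex biclustering objective, 0.5·||X − U||_F² + γ(Σ_{i<j} w_ij ||U_i − U_j|| + Σ_{k<l} w̃_kl ||U_·k − U_·l||), and includes that prox. The optional `feature_weights` argument, which rescales each column's fidelity term, is a hypothetical stand-in for learned feature weighting; the paper's exact biconvex formulation is not given in the abstract.

```python
import numpy as np


def group_soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(v)
    if norm <= tau:
        return np.zeros_like(v)
    return (1.0 - tau / norm) * v


def fusion_penalty(M, weights):
    """Sum over i < j of w_ij * ||M[i] - M[j]||_2, taken over the rows of M."""
    n = M.shape[0]
    return sum(weights[i, j] * np.linalg.norm(M[i] - M[j])
               for i in range(n) for j in range(i + 1, n))


def biclustering_objective(X, U, gamma, row_w, col_w, feature_weights=None):
    """Convex biclustering objective; `feature_weights` (hypothetical) rescales
    each column's squared-error term to mimic learned feature weighting."""
    R = (X - U) ** 2
    if feature_weights is not None:
        R = R * feature_weights[np.newaxis, :]
    fidelity = 0.5 * R.sum()
    return fidelity + gamma * (fusion_penalty(U, row_w) + fusion_penalty(U.T, col_w))
```

When the prox zeroes a difference vector, the corresponding rows or columns fuse into one cluster. This shrinkage is what drives biclusters to emerge as γ grows.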

2603.28965 2026-04-13 eess.SY cs.RO cs.SY math.DS

Koopman Operator Framework for Modeling and Control of Off-Road Vehicle on Deformable Terrain

Kartik Loya, Phanindra Tallapragada

Comments 11 pages, 14 figures, 4 tables. Submitted to ASME Journal of Autonomous Vehicles (JAVS-26-1012)

This work presents a hybrid physics-informed and data-driven modeling framework for predictive control of autonomous off-road vehicles operating on deformable terrain. Traditional high-fidelity terramechanics models are often too computationally demanding to be directly used in control design. Modern Koopman operator methods can be used to represent the complex terramechanics and vehicle dynamics in a linear form. We develop a framework whereby a Koopman linear system can be constructed using data from simulations of a vehicle moving on deformable terrain. For vehicle simulations, the deformable-terrain terramechanics are modeled using Bekker-Wong theory, and the vehicle is represented as a simplified five-degree-of-freedom (5-DOF) system. The Koopman operators are identified from large simulation datasets for sandy loam and clay using a recursive subspace identification method, where Grassmannian distance is used to prioritize informative data segments during training. The advantage of this approach is that the Koopman operator learned from simulations can be updated with data from the physical system in a seamless manner, making this a hybrid physics-informed and data-driven approach. Prediction results demonstrate stable short-horizon accuracy and robustness under mild terrain-height variations. When embedded in a constrained MPC, the learned predictor enables stable closed-loop tracking of aggressive maneuvers while satisfying steering and torque limits.
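
A minimal sketch of the linear-predictor idea, assuming the lifted (observable) states are already available: fit z_{k+1} ≈ A z_k + B u_k by batch least squares (dynamic mode decomposition with control). This is a simpler stand-in for the paper's recursive subspace identification. The Grassmannian distance between the column spaces of two data segments, computed from principal angles, is included because the abstract uses it to prioritize informative segments; exactly how the paper applies it is not specified there. Function names are hypothetical.

```python
import numpy as np


def fit_linear_predictor(Z, Z_next, U):
    """Least-squares fit of Z_next ≈ A @ Z + B @ U.

    Z, Z_next: (n, m) lifted states at steps k and k+1; U: (p, m) inputs.
    """
    G = np.vstack([Z, U])                 # stacked regressors [z_k; u_k]
    AB = Z_next @ np.linalg.pinv(G)       # minimum-norm least-squares solution
    n = Z.shape[0]
    return AB[:, :n], AB[:, n:]


def grassmann_distance(S1, S2):
    """Geodesic distance between the column spaces of S1 and S2 via principal angles."""
    Q1, _ = np.linalg.qr(S1)
    Q2, _ = np.linalg.qr(S2)
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
    return float(np.sqrt(np.sum(np.arccos(cosines) ** 2)))


def rollout(A, B, z0, inputs):
    """Multi-step prediction with the learned linear model, as an MPC would use it."""
    traj = [z0]
    for u in inputs:
        traj.append(A @ traj[-1] + B @ u)
    return np.array(traj)
```

Because the fitted model is linear in the lifted state, it can be embedded in a standard constrained MPC (a quadratic program). The steering and torque limits then become linear constraints on u.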

2603.28013 2026-04-13 cs.CR cs.AI cs.LG

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang, Zechen Zhang

Comments 10 pages, 8 figures. Benchmark code and run logs released

Multi-agent LLM systems are entering production (processing documents, managing workflows, and acting on behalf of users), yet their resilience to prompt injection is still evaluated with a single binary question: did the attack succeed? This leaves architects without the diagnostic information needed to harden real pipelines. We introduce a kill-chain canary methodology that tracks a cryptographic token through four stages (EXPOSED -> PERSISTED -> RELAYED -> EXECUTED) across 950 runs, five frontier LLMs, six attack surfaces, and five defense conditions. The results reframe prompt injection as a pipeline-architecture problem: every model is fully exposed, yet outcomes diverge downstream. Claude blocks every injection at the memory-write stage (attack success rate, ASR, of 0/164), GPT-4o-mini propagates injections at 53%, and DeepSeek exhibits either 0% or 100% ASR depending on the surface. Three findings matter for deployment: (1) write-node placement is the highest-leverage safety decision, since routing writes through a verified model eliminates propagation; (2) all four defenses fail on at least one surface due to channel mismatch alone, with no adversarial adaptation required; (3) invisible white-font PDF payloads match or exceed the ASR of visible text, so rendered-layer screening is insufficient. These dynamics apply directly to production: institutional investors and financial firms already run NLP pipelines over earnings calls, SEC filings, and analyst reports, the same document-ingestion workflows that are now migrating to LLM agents. Code, run logs, and tooling are publicly released.
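
The stage-level scoring idea can be sketched under stated assumptions. A fresh random canary is embedded in the injected payload, and the harness records the text observed at four hypothetical checkpoints (model context, memory writes, inter-agent messages, tool-call arguments). The deepest checkpoint that contains the canary determines the stage a run reached. The checkpoint mapping and log format are illustrative and do not reflect the released benchmark's schema.

```python
import secrets

STAGES = ["EXPOSED", "PERSISTED", "RELAYED", "EXECUTED"]


def new_canary():
    """Unguessable token, so a match cannot occur by chance."""
    return "CANARY-" + secrets.token_hex(8)


def furthest_stage(canary, run_log):
    """Return the deepest kill-chain stage whose checkpoint text contains the canary.

    run_log maps a stage name to the strings captured at that checkpoint,
    e.g. prompt context, memory writes, agent-to-agent messages, tool calls.
    """
    reached = None
    for stage in STAGES:
        if any(canary in text for text in run_log.get(stage, [])):
            reached = stage
    return reached


def reached_at_least(canary, run_log, stage):
    s = furthest_stage(canary, run_log)
    return s is not None and STAGES.index(s) >= STAGES.index(stage)


def attack_success_rate(runs, stage="EXECUTED"):
    """Fraction of (canary, run_log) pairs that reached at least `stage`."""
    return sum(reached_at_least(c, log, stage) for c, log in runs) / len(runs)
```

Computing ASR separately at each stage, instead of a single success bit, is what makes it possible to say where a pipeline stopped an injection, for example at PERSISTED for the memory-write result above.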

2602.17667 2026-04-13 cs.IR cs.CV cs.LG

When & How to Write for Personalized Demand-aware Query Rewriting in Video Search

Cheng cheng, Chenxing Wang, Aolin Li, Haijun Wu, Huiyun Hu, Juyuan Wang

In video search systems, user historical behaviors provide rich context for identifying search intent and resolving ambiguity. However, traditional methods utilizing implicit history features often suffer from signal dilution and delayed feedback. To address these challenges, we propose WeWrite, a novel Personalized Demand-aware Query Rewriting framework. Specifically, WeWrite tackles three key challenges: (1) When to Write: an automated posterior-based mining strategy extracts high-quality samples from user logs, identifying scenarios where personalization is strictly necessary; (2) How to Write: a hybrid training paradigm combines Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO) to align the LLM's output style with the retrieval system; (3) Deployment: a parallel "Fake Recall" architecture ensures low latency. Online A/B testing on a large-scale video platform demonstrates that WeWrite improves the Click-Through Video Volume (VV>10s) by 1.07% and reduces the Query Reformulation Rate by 2.97%.
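
GRPO replaces a learned value baseline with a group baseline. For each query, several rewrites are sampled, and each rewrite's reward is standardized against the others in its group. A minimal sketch of that advantage computation follows. It uses the population standard deviation (implementations differ on this detail), and the reward values in any example are placeholders, since the abstract does not describe WeWrite's retrieval-alignment reward.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages within one sampled group: (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Rewrites that beat their siblings receive positive advantages. If every sampled rewrite scores the same, the group contributes no reward-driven policy-gradient signal, which is one reason sample mining matters.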

2602.07142 2026-04-13 cs.HC cs.AI

Exploring Teachers' Perspectives on Using Conversational AI Agents for Group Collaboration

Prerna Ravi, Carúmey Stevens, Beatriz Flamia Azevedo, Jasmine David, Brandon Hanks, Hal Abelson, Grace Lin, Emma Anderson

Comments Accepted to 27th International Conference on AI in Education (AIED) 2026

Collaboration is a cornerstone of 21st-century learning, yet teachers continue to face challenges in supporting productive peer interaction. Emerging generative AI tools offer new possibilities for scaffolding collaboration, but their role in mediating in-person group work remains underexplored, especially from the perspective of educators. This paper presents findings from an exploratory qualitative study with 33 K-12 teachers who interacted with Phoenix, a voice-based conversational agent designed to function as a near-peer in face-to-face group collaboration. Drawing on playtesting sessions, surveys, and focus groups, we examine how teachers perceived the agent's behavior, its influence on group dynamics, and its classroom potential. While many appreciated Phoenix's capacity to stimulate engagement, they also expressed concerns around autonomy, trust, anthropomorphism, and pedagogical alignment. We contribute empirical insights into teachers' mental models of AI, reveal core design tensions, and outline considerations for group-facing AI agents that support meaningful, collaborative learning.

2602.04674 2026-04-13 cs.SI cs.AI cs.CL

Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility

Eun Cheol Choi, Lindsay E. Young, Emilio Ferrara

Comments Accepted to ICWSM 2026

Large language models (LLMs) are increasingly used as proxies for human judgment in computational social science, yet their ability to reproduce patterns of susceptibility to misinformation remains unclear. We test whether LLM-simulated survey respondents, prompted with participant profiles drawn from social survey data measuring network, demographic, attitudinal and behavioral features, can reproduce human patterns of misinformation belief and sharing. Using three online surveys as baselines, we evaluate whether LLM outputs match observed response distributions and recover feature-outcome associations present in the original survey data. LLM-generated responses capture broad distributional tendencies and show modest correlation with human responses, but consistently overstate the association between belief and sharing. Linear models fit to simulated responses exhibit substantially higher explained variance and place disproportionate weight on attitudinal and behavioral features, while largely ignoring personal network characteristics, relative to models fit to human responses. Analyses of model-generated reasoning and LLM training data suggest that these distortions reflect systematic biases in how misinformation-related concepts are represented. Our findings suggest that LLM-based survey simulations are better suited for diagnosing systematic divergences from human judgment than for substituting it.
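
The explained-variance comparison can be sketched with ordinary least squares: fit the same linear model to human responses and to simulated responses, then compare R² and coefficients. The feature matrix and outcomes in any example are synthetic placeholders, not the surveys' variables.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept; returns (coefficients without intercept, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta[1:], 1.0 - ss_res / ss_tot
```

Suppose the simulated responses yield a much higher R² than the human ones, with weight concentrated on attitudinal and behavioral columns and near-zero weight on network columns. That would be the kind of divergence the abstract reports.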

2602.04418 2026-04-13 cs.MA cs.AI cs.DC cs.ET cs.SE

SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing

Indraveni Chebolu, Arnab Mallick, Harmesh Rana

Comments Accepted at 14th International Workshop on Engineering Multi-Agent Systems (EMAS @ AAMAS)

We present SPEAR, a multi-agent coordination framework for smart contract auditing that applies established MAS patterns in a realistic security analysis workflow. SPEAR models auditing as a coordinated mission carried out by specialized agents: a Planning Agent prioritizes contracts using risk-aware heuristics, an Execution Agent allocates tasks via the Contract Net protocol, and a Repair Agent autonomously recovers from brittle generated artifacts using a programmatic-first repair policy. Agents maintain local beliefs updated through AGM-compliant revision, coordinate via negotiation and auction protocols, and revise plans as new information becomes available. An empirical study compares the multi-agent design with centralized and pipeline-based alternatives under controlled failure scenarios, focusing on coordination, recovery behavior, and resource use.
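
The Contract Net protocol runs in three steps: the manager announces a task, contractors bid, and the manager awards the task. A minimal single-round sketch follows. It assumes bids are scalar cost estimates, that each task goes to the lowest bidder with ties broken by name, and that busier agents bid higher. Agent names, the cost rule, and the task format are hypothetical and not taken from SPEAR's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Contractor:
    name: str
    skills: set
    load: int = 0
    assigned: list = field(default_factory=list)

    def bid(self, task):
        """Return a cost estimate, or None to decline tasks outside our skills."""
        if task["kind"] not in self.skills:
            return None
        return task["effort"] + self.load  # busier agents bid higher


def contract_net_round(tasks, contractors):
    """Announce each task, collect bids, and award it to the cheapest bidder."""
    awards = {}
    for task in tasks:
        bids = [(c.bid(task), c.name, c) for c in contractors]
        bids = [b for b in bids if b[0] is not None]
        if not bids:
            awards[task["id"]] = None  # no capable contractor; the planner must replan
            continue
        _, _, winner = min(bids, key=lambda b: (b[0], b[1]))
        winner.load += task["effort"]
        winner.assigned.append(task["id"])
        awards[task["id"]] = winner.name
    return awards
```

Because bids reflect current load, work spreads across capable agents without a central scheduler. Unassigned tasks (None) are the signal for the planning or repair layer to intervene.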

2601.08588 2026-04-13 quant-ph cs.IT cs.LG math.IT math.ST stat.TH

Sample Complexity of Composite Quantum Hypothesis Testing

Jacob Paul Simpson, Efstratios Palias, Sharu Theresa Jose

Comments Accepted to ISIT 2026

This paper investigates symmetric composite binary quantum hypothesis testing (QHT), where the goal is to determine which of two uncertainty sets contains an unknown quantum state. While asymptotic error exponents for this problem are well studied, the finite-sample regime remains poorly understood. We bridge this gap by characterizing the sample complexity, i.e., the minimum number of state copies required to achieve a target error level. Specifically, we derive lower bounds that generalize the sample complexity of simple QHT and introduce new upper bounds for various uncertainty sets, including sets of both finite and infinite cardinality. Notably, our upper and lower bounds match up to universal constants, providing a tight characterization of the sample complexity. Finally, we extend our analysis to the differentially private setting, establishing the sample complexity of privacy-preserving composite QHT.
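
For intuition, consider the simplest commuting case: diagonal qubit states diag(1−p, p), which are effectively Bernoulli distributions. With equal priors, the optimal (Helstrom) error for n copies is (1 − ½‖p^{⊗n} − q^{⊗n}‖₁)/2, and the sample complexity is the smallest n that brings this error to ε or below. The sketch also uses an elementary fact. A test that solves the composite problem must in particular separate every pair of states drawn from the two sets. Therefore the largest pairwise simple-test sample complexity is a valid lower bound for the composite task. Restricting to commuting states and equal priors are our simplifications; the paper's bounds cover general quantum states.

```python
from math import exp, lgamma, log


def _log_binom_pmf(n, k, p):
    """log of C(n, k) p^k (1-p)^(n-k); computed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def helstrom_error_bernoulli(p, q, n):
    """Minimum equal-prior error for n copies of diag(1-p, p) vs diag(1-q, q)."""
    l1 = sum(abs(exp(_log_binom_pmf(n, k, p)) - exp(_log_binom_pmf(n, k, q)))
             for k in range(n + 1))
    return 0.5 * (1.0 - 0.5 * l1)


def sample_complexity(p, q, eps, n_max=1000):
    """Smallest n whose optimal error is <= eps, or None if not reached by n_max."""
    for n in range(1, n_max + 1):
        if helstrom_error_bernoulli(p, q, n) <= eps:
            return n
    return None


def composite_lower_bound(set0, set1, eps, n_max=1000):
    """Max pairwise simple-test complexity: a lower bound for the composite task."""
    best = 0
    for p in set0:
        for q in set1:
            n = sample_complexity(p, q, eps, n_max)
            if n is None:
                return None
            best = max(best, n)
    return best
```

For p = 0.9 and q = 0.1, one copy gives error 0.1 and three copies give 0.028. Reaching ε = 0.05 therefore takes n = 3. Pairs that are closer together dominate the composite bound.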

2512.01708 2026-04-13 stat.ML cs.LG

Differentially Private and Federated Structure Learning in Bayesian Networks

Ghita Fassy El Fehri, Aurélien Bellet, Philippe Bastien

Learning the structure of a Bayesian network from decentralized data poses two major challenges: (i) ensuring rigorous privacy guarantees for participants, and (ii) avoiding communication costs that scale poorly with dimensionality. In this work, we introduce Fed-Sparse-BNSL, a novel federated method for learning linear Gaussian Bayesian network structures that addresses both challenges. By combining differential privacy with greedy updates that target only a few relevant edges per participant, Fed-Sparse-BNSL efficiently uses the privacy budget while keeping communication costs low. Our careful algorithmic design preserves model identifiability and enables accurate structure estimation. Experiments on synthetic and real datasets demonstrate that Fed-Sparse-BNSL achieves utility close to non-private baselines while offering substantially stronger privacy and communication efficiency.
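
A minimal sketch of one privatized, sparse participant update, under our own assumptions. The update is a vector of edge scores. It is clipped to L2 norm C, which gives L2 sensitivity C under add/remove of one participant (an assumption about the neighboring relation). It is then perturbed with the classic Gaussian mechanism, σ = C·sqrt(2 ln(1.25/δ))/ε, whose standard analysis assumes ε < 1. Finally, only the k largest-magnitude edges are communicated. Fed-Sparse-BNSL's actual scoring, edge selection, and privacy accounting may differ, and all names are hypothetical.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Classic Gaussian-mechanism noise scale (standard analysis assumes 0 < epsilon < 1)."""
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip_l2(vec, bound):
    """Scale `vec` down so its L2 norm is at most `bound`."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, bound / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def private_sparse_update(edge_scores, clip, epsilon, delta, k, rng):
    """Clip, add Gaussian noise, then keep only the k largest-magnitude edges.

    Selecting after noising is post-processing, so it consumes no extra budget.
    Returns {edge_index: noisy_score}.
    """
    sigma = gaussian_sigma(clip, epsilon, delta)
    noisy = [v + rng.gauss(0.0, sigma) for v in clip_l2(edge_scores, clip)]
    top = sorted(range(len(noisy)), key=lambda i: abs(noisy[i]), reverse=True)[:k]
    return {i: noisy[i] for i in top}
```

Sending k entries instead of all d edge scores is what keeps communication low. Noise is still added to every coordinate here, a simplification relative to methods that spend budget only on the targeted edges.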

2511.03913 2026-04-13 cs.NE cs.AI

Evolutionary Optimization Trumps Adam Optimization on Embedding Space Exploration

Domício Pereira Neto, João Correia, Penousal Machado

Comments 34 pages, 6 figures, 3 tables, 18 appendix figures, 1 appendix table

Deep diffusion models have revolutionized image generation by producing high-quality outputs. However, achieving specific objectives with these models often requires costly adaptations such as fine-tuning, which can be resource-intensive and time-consuming. An alternative approach is inference-time control, which involves optimizing the prompt embeddings to guide the generation process without altering the model weights. We explore prompt-embedding search optimization for the Stable Diffusion XL Turbo model, comparing a gradient-free evolutionary approach, the Separable Covariance Matrix Adaptation Evolution Strategy (sep-CMA-ES), against the widely used gradient-based optimizer Adaptive Moment Estimation (Adam). Candidate images are evaluated by a weighted objective that combines LAION Aesthetic Predictor V2 and CLIPScore, enabling explicit trade-offs between aesthetic quality and prompt-image alignment. On 36 prompts sampled from Parti Prompts (P2) under three weight settings (aesthetics-only, balanced, alignment-only), sep-CMA-ES consistently achieves higher objective values than Adam. We additionally analyze divergence from the unoptimized baseline using cosine similarity and SSIM and report the compute and memory footprints. These results suggest that sep-CMA-ES is an effective inference-time optimizer for prompt-embedding search, improving aesthetics-alignment trade-offs and resource usage without model fine-tuning.
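
The weighted objective and a gradient-free search loop can be sketched as follows. For brevity, the loop is a simplified diagonal-Gaussian evolution strategy: sample candidates, keep an elite, and refit the per-coordinate mean and spread. It shares sep-CMA-ES's diagonal covariance but omits its evolution paths and step-size control, so it is not sep-CMA-ES itself. The objective J = w·aesthetic + (1 − w)·alignment, with w = 1, 0.5, and 0 for the three settings, is our reading of the abstract. Any scorer passed in is a placeholder for LAION Aesthetic Predictor V2 and CLIPScore.

```python
import random


def weighted_objective(aesthetic, alignment, w):
    """J = w * aesthetic + (1 - w) * alignment; w=1 aesthetics-only, w=0 alignment-only."""
    return w * aesthetic + (1.0 - w) * alignment


def diagonal_es(score, x0, sigma0=0.5, popsize=16, elite=4, iters=50,
                min_sigma=1e-3, seed=0):
    """Maximize `score` over a real vector with a diagonal-Gaussian ES.

    Each generation samples `popsize` candidates, keeps the best `elite`,
    and refits the per-coordinate mean and standard deviation to them.
    Returns the best candidate seen and its score.
    """
    rng = random.Random(seed)
    mean, sigma = list(x0), [sigma0] * len(x0)
    best_x, best_f = list(x0), score(x0)
    for _ in range(iters):
        pop = [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, sigma)]
               for _ in range(popsize)]
        pop.sort(key=score, reverse=True)
        top = pop[:elite]
        top_f = score(top[0])
        if top_f > best_f:
            best_x, best_f = top[0], top_f
        for d in range(len(mean)):
            vals = [x[d] for x in top]
            mean[d] = sum(vals) / elite
            var = sum((v - mean[d]) ** 2 for v in vals) / elite
            sigma[d] = max(var ** 0.5, min_sigma)  # floor avoids premature collapse
    return best_x, best_f
```

Only objective values are needed, never gradients through the diffusion model. This is consistent with the lower memory footprint the abstract reports relative to backpropagation-based Adam.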