arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1851
2604.18716 2026-04-22 cs.CR cs.LG

TrEEStealer: Stealing Decision Trees via Enclave Side Channels

Jonas Sander, Anja Rabich, Nick Mahling, Felix Maurer, Jonah Heller, Qifan Wang, Thomas Eisenbarth, David Oswald

详情
英文摘要

Today, machine learning is widely applied in sensitive, security-related, and financially lucrative applications. Model extraction attacks undermine current business models where a model owner sells model access, e.g., via MLaaS APIs. Additionally, stolen models can enable powerful white-box attacks, facilitating privacy attacks on sensitive training data, and model evasion. In this paper, we focus on Decision Trees (DT), which are widely deployed in practice. Existing black-box extraction attacks for DTs are either query-intensive, make strong assumptions about the DT structure, or rely on rich API information. To limit attacks to the black-box setting, CPU vendors introduced Trusted Execution Environments (TEE) that use hardware-mechanisms to isolate workloads from external parties, e.g., MLaaS providers. We introduce TrEEStealer, a high-fidelity extraction attack for stealing TEE-protected DTs. TrEEStealer exploits TEE-specific side-channels to steal DTs efficiently and without strong assumptions about the API output or DT structure. The extraction efficacy stems from a novel algorithm that maximizes the information derived from each query by coupling Control-Flow Information (CFI) with passive information tracking. We use two primitives to acquire CFI: for AMD SEV, we follow previous work using the SEV-Step framework and performance counters. For Intel SGX, we reproduce prior findings on current Xeon 6 CPUs and construct a new primitive to efficiently extract the branch history of inference runs through the Branch-History-Register. We found corresponding vulnerabilities in three popular libraries: OpenCV, mlpack, and emlearn. We show that TrEEStealer achieves superior efficiency and extraction fidelity compared to prior attacks. Our work establishes a new state-of-the-art for DT extraction and confirms that TEEs fail to protect against control-flow leakage.

2604.18697 2026-04-22 cs.CR cs.CL cs.LG

Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs

Ruixuan Liu, David Evans, Li Xiong

Comments Accepted by S&P 2026

详情
英文摘要

Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to show a model is sufficiently protected against broader memorization risks. However, we show that indistinguishability properties are neither sufficient nor necessary for preventing data extraction in LLM APIs. We formalize a privacy-game separation between extraction and indistinguishability-based privacy, showing that indistinguishability and inextractability are incomparable: upper-bounding distinguishability does not upper-bound extractability. To address this gap, we introduce $(l, b)$-inextractability as a definition that requires at least $2^b$ expected queries for any black-box adversary to induce the LLM API to emit a protected $l$-gram substring. We instantiate this via a worst-case extraction game and derive a rank-based extraction risk upper bound for targeted exact extraction, as well as extensions to cover untargeted and approximate extraction. The resulting estimator captures the extraction risk over multiple attack trials and prefix adaptations. We show that it can provide a tight and efficient estimation for standard greedy extraction and an upper bound on the probabilistic extraction risk given any decoding configuration. We empirically evaluate extractability across different models, clarifying its connection to distinguishability, demonstrating its advantage over existing extraction risk estimators, and providing actionable mitigation guidelines across model training, API access, and decoding configurations in LLM API deployment. Our code is publicly available at: https://github.com/Emory-AIMS/Inextractability.

2604.18663 2026-04-22 cs.CR cs.AI

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

Wentao Zhang, Yan Zhuang, ZhuHang Zheng, Mingfei Zhang, Jiawen Deng, Fuji Ren

Comments 22 pages, Accepted to the ACL 2026 Main Conference

详情
英文摘要

Existing jamming attacks on Retrieval-Augmented Generation (RAG) systems typically induce explicit refusals or denial-of-service behaviors, which are conspicuous and easy to detect. In this work, we formalize a subtler availability threat, termed soft failure, which degrades system utility by inducing fluent and coherent yet non-informative responses rather than overt failures. We propose Deceptive Evolutionary Jamming Attack (DEJA), an automated black-box attack framework that generates adversarial documents to trigger such soft failures by exploiting safety-aligned behaviors of large language models. DEJA employs an evolutionary optimization process guided by a fine-grained Answer Utility Score (AUS), computed via an LLM-based evaluator, to systematically degrade the certainty of answers while maintaining high retrieval success. Extensive experiments across multiple RAG configurations and benchmark datasets show that DEJA consistently drives responses toward low-utility soft failures, achieving SASR above 79\% while keeping hard-failure rates below 15\%, significantly outperforming prior attacks. The resulting adversarial documents exhibit high stealth, evading perplexity-based detection and resisting query paraphrasing, and transfer across model families to proprietary systems without retargeting.

2604.18660 2026-04-22 cs.CR cs.AI

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

Jin Zhao, Marta Knežević, Tanja Käser

Comments ACL 2026

详情
英文摘要

Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior work evaluates pedagogical quality via answer leakage-the disclosure of complete solutions instead of scaffolding-but typically assumes well-intentioned learners, leaving tutor robustness under student misuse largely unexplored. In this paper, we study scenarios where students behave adversarially and aim to obtain the correct answer from the tutor. We evaluate a broad set of LLM-based tutor models, including different model families, pedagogically aligned models, and a multi-agent design, under a range of adversarial student attacks. We adapt six groups of adversarial and persuasive techniques to the educational setting and use them to probe how likely a tutor is to reveal the final answer. We evaluate answer leakage robustness using different types of in-context adversarial student agents, finding that they often fail to carry out effective attacks. We therefore introduce an adversarial student agent that we fine-tune to jailbreak LLM-based tutors, which we propose as the core of a standardized benchmark for evaluating tutor robustness. Finally, we present simple but effective defense strategies that reduce answer leakage and strengthen the robustness of LLM-based tutors in adversarial scenarios.

2604.18658 2026-04-22 cs.CR cs.AI cs.CL

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang, Yiqing Jiang

Comments 15 pages. Companion manuscript on per-decision proof-obligation synthesis (LSVJ-S) in preparation

详情
英文摘要

Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI credential exfiltration (Aug 2024), Microsoft 365 Copilot calendar-injection leaks (Jan 2024), and a Meta agent unauthorized forum post exposing operational data (Mar 2026). We propose Owner-Harm, a formal threat model with eight categories of agent behavior damaging the deployer. We quantify the defense gap on two benchmarks: a compositional safety system achieves 100% TPR / 0% FPR on AgentHarm (generic criminal harm) yet only 14.8% (4/27; 95% CI: 5.9%-32.5%) on AgentDojo injection tasks (prompt-injection-mediated owner harm). A controlled generic-LLM baseline shows the gap is not inherent to owner-harm (62.7% vs. 59.3%, delta 3.4 pp) but arises from environment-bound symbolic rules that fail to generalize across tool vocabularies. On a post-hoc 300-scenario owner-harm benchmark, the gate alone achieves 75.3% TPR / 3.3% FPR; adding a deterministic post-audit verifier raises overall TPR to 85.3% (+10.0 pp) and Hijacking detection from 43.3% to 93.3%, demonstrating strong layer complementarity. We introduce the Symbolic-Semantic Defense Generalization (SSDG) framework relating information coverage to detection rate. Two SSDG experiments partially validate it: context deprivation amplifies the detection gap 3.4x (R = 3.60 vs. R = 1.06); context injection reveals structured goal-action alignment, not text concatenation, is required for effective owner-harm detection.

2604.18649 2026-04-22 cs.CR cs.AI

Position: No Retroactive Cure for Infringement during Training

Satoru Utsunomiya, Masaru Isonuma, Junichiro Mori, Ichiro Sakata

Comments 12pages

详情
英文摘要

As generative AI faces intensifying legal challenges, the machine learning community has increasingly relied on post-hoc mitigation -- especially machine unlearning and inference-time guardrails -- to argue for compliance. This paper argues that such post-hoc mitigation methods cannot retroactively cure liability from unlawful acquisition and training, because compliance hinges on data lineage, not the outputs. Our argument has three parts. First, unauthorized copying/ingestion can be a legally complete completed act, and model weights may operate as fixed copies that retain training-derived expressive value, making later filtering beside the point for infringement. Second, contract and tort/unfair-competition rules -- via licenses, terms of service, and anti-free-riding principles -- can independently restrict access and use, often bypassing copyright defenses (e.g., fair use or TDM exceptions). Third, since value from protected inputs can persist in weights, remedies such as unjust enrichment and disgorgement may require stripping gains and, in some cases, reaching the model itself. We therefore argue for a shift from Post-Hoc Sanitization to verifiable Ex-Ante Process Compliance.

2604.18621 2026-04-22 q-bio.GN cs.LG

Quantum AI for Cancer Diagnostic Biomarker Discovery

Mandeep Kaur Saggi, Amandeep Singh Bhatia, Humaira Gowher, Sabre Kais

Comments 25 pages, 15 figures

详情
英文摘要

Quantum machine learning offers a promising new paradigm for computational biology by leveraging quantum mechanical principles to enhance cancer classification, biomarker discovery, and bioinformatics diagnostics. In this study, we apply QML to identify subtype specific biomarkers for lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), the two predominant forms of non-small cell lung cancer. Our methodology involves a two-phase process: in Phase 1, differential expression analysis and methylation analysis between tumor and normal samples allows us to identify LUAD-specific and LUSC-specific genes, revealing potential prognostic biomarkers for cancer subtypes. Phase 2 focuses on developing a quantum classifier capable of distinguishing between LUAD and LUSC tumors, as well as between tumor and normal samples. This classifier not only enhances diagnostic precision but also demonstrates the quantum advantage in processing large-scale multiomic datasets. Our results consistently demonstrated that Sample3, representing the combined gene set, achieved the highest overall predictive performance in all metrics. These results demonstrate that QML provides an effective and scalable approach for biomarker discovery and subtype specific cancer classification. GO enrichment analysis highlighted the significant involvement of genes in synaptic signaling, ion channel regulation, and neuronal development. In the quantum phase, KEGG analysis further identified enrichment in cancer-associated pathways, including neurotrophin, MAPK, Ras, and PI3KAkt signaling, with key genes such as NGFR, NTRK2, and NTF3 suggesting a central role in neurotrophinmediated oncogenic processes. Our findings highlight the growing potential of quantum computing to advance precision oncology and next-generation biomedical analytics.

2604.18616 2026-04-22 cs.DC cs.AI cs.PL

ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

Haohui Mai, Xiaoyan Guo, Xiangyun Ding, Daifeng Li, Qiuchu Yu, Chenzhun Guo, Cong Wang, Jiacheng Zhao, Christos Kozyrakis, Binhang Yuan

详情
英文摘要

LLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pass/fail feedback, leaving them unable to diagnose global constraint violations. We present Argus, an agentic framework that addresses this through data-flow invariants: compile-time specifications encoding how data must be choreographed throughout kernel execution. Argus introduces a tile-based, Pythonic DSL exposing hardware instructions and compiler policies while hiding low-level representations. The DSL provides tag functions to propagate symbolic annotations through data and control flow, and tag assertions to enforce relational constraints at use sites. When violations occur, the compiler returns concrete counterexamples identifying the thread, data element, and program point, enabling dense, structured feedback for targeted fixes. Invariants are verified at compile time via abstract interpretation over a layout algebra and SMT solving, with zero runtime overhead. An in-context reinforcement learning planner learns to select optimizations and synthesize effective invariants, supported by a curated knowledge base of GPU optimization techniques. We evaluate Argus on the AMD MI300X GPU across GEMM, flash attention, and MoE kernels accounting for over 90% of GPU time in LLM inference. Generated kernels achieve 99-104% of state-of-the-art hand-optimized assembly throughput and are 2-1543x faster than existing agentic systems. Argus further generalizes to 200 KernelBench tasks, solving 100% of Level 1 and 90% of Level 2 problems.

2604.18612 2026-04-22 cs.NE cs.AI cs.LG

Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models

Xudong Wang, Chaoning Zhang, Chenghao Li, Shuxu Chen, Qigan Sun, Jiaquan Zhang, Fachrina Dewi Puspitasari, Tae-Ho Kim, Jiwei Wei, Malu Zhang, Guoqing Wang, Yang Yang, Heng Tao Shen

Comments Accepted to ACL 2026. 9 pages, 5 figures

详情
英文摘要

Large Language Models (LLMs) have demonstrated strong capabilities in complex reasoning tasks, while recent prompting strategies such as Chain-of-Thought (CoT) have further elevated their performance in handling complex logical problems. Despite these advances, high-quality reasoning remains heavily reliant on manual static prompts and is sensitive to decoding configurations and task distributions, leading to performance fluctuations and limited transferability. Existing automatic prompt optimization methods typically adopt single-agent local search, failing to simultaneously optimize prompts and decoding hyperparameters within a unified framework to achieve stable global improvements. To address this limitation, we propose Agent-GWO, a dynamic prompt optimization framework for complex reasoning. Specifically, we unify prompt templates and decoding hyperparameters as inheritable agent configurations. By leveraging the leader-follower mechanism of the Grey Wolf Optimizer (GWO), we automatically select three leader agents ($α$, $β$, and $δ$) to guide the collaborative updates of the remaining agents, enabling iterative convergence toward robust optimal reasoning configurations that can be seamlessly integrated for inference. Extensive experiments on multiple mathematical and hybrid reasoning benchmarks across diverse LLM backbones show that Agent-GWO consistently improves accuracy and stability over existing prompt optimization methods. The code will be released publicly.

2604.18611 2026-04-22 cs.NE cs.AI cs.LG

Neuromorphic Continual Learning for Sequential Deployment of Nuclear Plant Monitoring Systems

Samrendra Roy, Sajedul Talukder, Syed Bahauddin Alam

详情
英文摘要

Anomaly detection in nuclear industrial control systems (ICS) requires continuous, energy-efficient monitoring across multiple subsystems that are often deployed at different stages of plant commissioning. When a conventional neural network is sequentially trained to monitor new subsystems, it catastrophically forgets previously learned anomaly patterns, a safety-critical failure mode. We present the first spiking neural network (SNN)-based anomaly detection system with continual learning for nuclear ICS, addressing both challenges simultaneously. Our approach introduces spike-encoded asynchronous sensor fusion, a delta-based encoding that converts heterogeneous sensor streams into sparse spike trains at rates dictated by each sensor's natural dynamics, achieving 92.7% input sparsity. We evaluate five continual learning strategies, including sequential fine-tuning, Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI), experience replay, and a hybrid EWC+Replay approach, on the HAI 21.03 nuclear ICS security dataset across three sequentially deployed subsystems (boiler, turbine, water treatment). The hybrid EWC+Replay method achieves an average F1 score of 0.979 with near-zero average forgetting (AF = 0.000 single seed; 0.035 +/- 0.039 across three seeds), while requiring 12.6x fewer operations (an estimated 2.5x in energy based on published hardware specifications) than an equivalent artificial neural network. The system detects all tested attacks with a mean latency of 0.6 seconds. These results demonstrate that neuromorphic computing offers a viable path toward always-on, energy-efficient, and adaptable safety monitoring for next-generation nuclear facilities.

2604.18610 2026-04-22 cs.NE cs.AI

SpikeMLLM: Spike-based Multimodal Large Language Models via Modality-Specific Temporal Scales and Temporal Compression

Han Xu, Zhiyong Qin, Di Shang, Jiahong Zhang, Xuerui Qiu, Bo Lei, Tiejun Huang, Bo Xu, Guoqi Li

详情
英文摘要

Multimodal Large Language Models (MLLMs) have achieved remarkable progress but incur substantial computational overhead and energy consumption during inference, limiting deployment in resource-constrained environments. Spiking Neural Networks (SNNs), with their sparse event-driven computation, offer inherent energy efficiency advantages on neuromorphic hardware, yet extending them to MLLMs faces two key challenges: heterogeneous modalities make uniform spike encoding insufficient, and high-resolution image inputs amplify timestep unfolding overhead. We propose SpikeMLLM, the first spike-based framework for MLLMs, which unifies existing ANN quantization methods in the spiking representation space and incorporates Modality-Specific Temporal Scales (MSTS) guided by Modality Evolution Discrepancy (MED) and Temporally Compressed LIF (TC-LIF) for timestep compression from T=L-1 to T=log2(L)-1. Experiments on four representative MLLMs across diverse multimodal benchmarks show that SpikeMLLM maintains near-lossless performance under aggressive timestep compression (Tv/Tt=3/4), with average gaps of only 0.72% and 1.19% relative to the FP16 baseline on InternVL2-8B and Qwen2VL-72B. We further develop a dedicated RTL accelerator tailored to the spike-driven datapath, observing 9.06x higher throughput and 25.8x better power efficiency relative to an FP16 GPU baseline under a deployment-oriented co-design setting, suggesting the promise of algorithm-hardware co-design for efficient multimodal intelligence.

2604.18607 2026-04-22 cs.NE cs.AI

TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution

Yang Yang, Zining Zhong, Jindong Li, Jiemin Wu, Kaishen Yuan, Wenshuo Chen, Menglin Yang, Yutao Yue

Comments 12 pages, 8 figures

详情
英文摘要

LLM-driven program evolution can discover high-quality programs, but its cost and run-to-run variance hinder reliable progress. We propose TurboEvolve, a multi-island evolutionary framework that improves sample efficiency and robustness under fixed evaluation budgets. Inspired by the multiple-offspring strategy in evolutionary algorithms, TurboEvolve introduces verbalized Sampling, prompting the LLM to emit K diverse candidates with explicit self-assigned sampling weights, and an online scheduler that adapts K to expand exploration under stagnation and reduce overhead during steady progress. To exploit existing solution pools, we further propose "seed-pool injection," which clusters seeds and assigns them across islands with controlled perturbations and elitist preservation to balance diversity and refinement. Across multiple program-optimization benchmarks, TurboEvolve consistently achieves stronger performance at lower budgets and improves best-known solutions on several tasks.

2604.18606 2026-04-22 eess.SP cs.AI

Thermal Anomaly Detection using Physics Aware Neuromorphic Networks: Comparison between Raw and L1C Sentinel-2 Data

Stephen Smith, Cormac Purcell, Gabriele Meoni, Roberto Del Prete, Zdenka Kuncic

Comments 9 pages, 5 figures

详情
英文摘要

Damage caused by bushfires and volcanic eruptions escalates rapidly when detection is delayed, making fast and reliable early warning capabilities essential. Recent Earth Observation (EO) approaches have shown that thermal anomaly detection can be performed directly on decompressed Level-0 (L0) sensor data, avoiding computationally expensive preprocessing chains. However, direct exploitation of raw data remains challenging due to domain shift, sensor drift, radiometric inconsistencies, and the scarcity of labelled training samples. To address these challenges, this work proposes a Physics-Aware Neuromorphic Network (PANN) framework for onboard thermal anomaly detection. The proposed lightweight architecture, inspired by physical neural network principles and neuromorphic computing paradigms, is evaluated using two Sentinel-2 datasets: decompressed L0 with additional metadata (i.e. raw) and Level-1C (L1C). The PANN achieves a Matthews Correlation Coefficient (MCC) of $0.809$ on raw measurements, compared to $0.875$ when using ground-processed L1C products. The mean processing latency per L0 granule is $2.44 \pm 0.09~\mathrm{s}$, which is below the Sentinel-2 acquisition time of $3.6~\mathrm{s}$, demonstrating the feasibility of real-time, onboard processing. Furthermore, the projected execution time for the corresponding neuromorphic hardware instantiation is substantially lower at $0.1290 \pm 0.0002~\mathrm{s}$. Memory usage, including all necessary programs and packages, remains within realistic onboard constraints, with requirements of $0.673 \pm 0.007~\mathrm{Gb}$ for the software PANN and $0.393 \pm 0.004~\mathrm{Gb}$ for the estimated hardware realisation. Overall, these results indicate that PANN offers a promising pathway toward low-latency and resource-efficient onboard EO processing for thermal event detection.

2604.18591 2026-04-22 cs.HC cs.AI

SPRITE: From Static Mockups to Engine-Ready Game UI

Yunshu Bai, RuiHao Li, Hao Zhang, Chien Her Lim, Ming Yan, Mengtian Li

Comments CHI EA '26

详情
英文摘要

Game UI implementation requires translating stylized mockups into interactive engine entities. However, current "Screenshot-to-Code" tools often struggle with the irregular geometries and deep visual hierarchies typical of game interfaces. To bridge this gap, we introduce SPRITE, a pipeline that transforms static screenshots into editable engine assets. By integrating Vision-Language Models (VLMs) with a structured YAML intermediate representation, SPRITE explicitly captures complex container relationships and non-rectangular layouts. We evaluated SPRITE against a curated Game UI benchmark and conducted expert reviews with professional developers to assess reconstruction fidelity and prototyping efficiency. Our findings demonstrate that SPRITE streamlines development by automating tedious coding and resolving complex nesting. By facilitating rapid in-engine iteration, SPRITE effectively blurs the boundaries between artistic design and technical implementation in game development. Project page: https://baiyunshu.github.io/sprite.github.io/

2604.18589 2026-04-22 cs.HC cs.AI

CentaurTA Studio: A Self-Improving Human-Agent Collaboration System for Thematic Analysis

Lei Wang, Min Huang, Eduard Dragut

详情
英文摘要

Thematic analysis is difficult to scale: manual workflows are labor-intensive, while fully automated pipelines often lack controllability and transparent evaluation. We present \textbf{CentaurTA Studio}, a web-based system for self-improving human--agent collaboration in open coding and theme construction. The system integrates (1) a two-stage human feedback pipeline separating simulator drafting and expert validation, (2) persistent prompt optimization that distills validated feedback into reusable alignment principles, and (3) rubric-based evaluation with early stopping for process control. Across three domains, CentaurTA achieves the strongest performance in both Open Coding and Theme Construction, reaching up to 92.12\% accuracy and consistently outperforming baseline systems. Agreement between the rubric-based LLM judge and human annotators reaches substantial reliability (average $κ= 0.68$). Ablation studies show that removing the feedback loop reduces performance from 90\% to 81\%, while eliminating the Critic or early stopping degrades accuracy or increases interaction cost. The full system reaches peak performance within 10 iterative rounds (about 25 minutes), demonstrating improved efficiency over expert-only refinement.

2604.18586 2026-04-22 cs.CY cs.AI cs.CL cs.LG cs.SI

Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem

Geovana S. de Oliveira, Ana P. C. Silva, Fabricio Murai, Carlos H. G. Ferreira

Comments Paper accepted at WebSci'26

详情
英文摘要

Vaccination remains a cornerstone of global public health, yet the COVID-19 pandemic exposed how online misinformation, political polarization, and declining institutional trust can undermine immunization efforts. Most of the prior computational studies that analyzed vaccine discourse on social platforms focus on English-language data, specific vaccines, or short time windows, impairing our understanding of long-term dynamics in high-impact, non-English contexts like Brazil, home to one of the world's most comprehensive immunization systems. We here present the largest longitudinal study of Brazil's vaccine discourse on YouTube, leveraging a semi-supervised stance detection framework that combines self-labeling and self-training to classify nearly 1.4 million comments. By integrating stance with temporal patterns, engagement metrics, and channel taxonomy (legacy media, science communicators, digital-native outlets), we map how pro- and anti-vaccine narratives evolve and circulate within a hybrid media ecosystem. Our results show that semi-supervised learning substantially improves stance classification robustness, enabling fine-grained tracking of public attitudes across Brazil's full immunization schedule. Polarization spikes during epidemiological crises, especially COVID-19, but becomes fragmented across vaccines and interaction patterns in the post-pandemic period. Notably, science communication and digital-native channels emerge as the primary loci of both supportive and oppositional engagement, revealing structural vulnerabilities in contemporary health communication. Thus, our work advances computational methods for large-scale stance modeling while offering actionable evidence for public health agencies, platform governance, and online information ecosystems.

2604.18507 2026-04-22 math.OC cs.AI cs.LG

Learning the Riccati solution operator for time-varying LQR via Deep Operator Networks

Jun Chen, Umberto Biccari, Junmin Wang

详情
英文摘要

We propose a computational framework for replacing the repeated numerical solution of differential Riccati equations in finite-horizon Linear Quadratic Regulator (LQR) problems by a learned operator surrogate. Instead of solving a nonlinear matrix-valued differential equation for each new system instance, we construct offline an approximation of the associated solution operator mapping time-dependent system parameters to the Riccati trajectory. The resulting model enables fast online evaluation of approximate optimal feedbacks across a wide class of systems, thereby shifting the computational burden from repeated numerical integration to a one-time learning stage. From a theoretical perspective, we establish control-theoretic guarantees for this operator-based approximation. In particular, we derive bounds quantifying how operator approximation errors propagate to feedback performance, trajectory accuracy, and cost suboptimality, and we prove that exponential stability of the closed-loop system is preserved under sufficiently accurate operator approximation. These results provide a framework to assess the reliability of data-driven approximations in optimal control. On the computational side, we design tailored DeepONet architectures for matrix-valued, time-dependent problems and introduce a progressive learning strategy to address scalability with respect to the system dimension. Numerical experiments on both time-invariant and time-varying LQR problems demonstrate that the proposed approach achieves high accuracy and strong generalization across a wide range of system configurations, while delivering substantial computational speedups compared to classical solvers. The method offers an effective and scalable alternative for parametric and real-time optimal control applications.

2604.18146 2026-04-22 cs.IR cs.AI cs.CL

Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations

Yunjia Xi, Menghui Zhu, Jianghao Lin, Bo Chen, Ruiming Tang, Yong Yu, Weinan Zhang

Comments SIGIR 2026

详情
英文摘要

Recently, large language models (LLMs) have advanced recommendation systems (RSs), and recent works have begun to explore how to integrate LLMs into industrial RSs. While most approaches deploy LLMs offline to generate and pre-cache augmented representations for RSs, high-dimensional representations from LLMs introduce substantial storage and computational costs. Thus, it is crucial to compress LLM representations effectively. However, we identify a counterintuitive phenomenon during representation compression: Mid-layer Representation Advantage (MRA), where representations from middle layers of LLMs outperform those from final layers in recommendation tasks. This degraded final layer renders existing compression methods, which typically compress on the final layer, suboptimal. We interpret this based on modularity theory that LLMs develop spontaneous internal functional modularity and force the final layer to specialize in the proxy training task. Thus, we propose \underline{M}odul\underline{a}r \underline{R}epresentation \underline{C}ompression (MARC) to explicitly control the modularity of LLMs. First, Modular Adjustment explicitly introduces compression and task adaptation modules, enabling the LLM to operate strictly as a representation-learning module. Next, to ground each module to its specific task, Modular Task Decoupling uses information constraints and different network structures to decouple tasks. Extensive experiments validate that MARC addresses MRA and produces efficient representations. Notably, MARC achieved a 2.82% eCPM lift in an online A/B test within a large-scale commercial search advertising scenario.

2604.18005 2026-04-22 cs.MA cs.AI cs.CL

Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation

Nuo Chen, Yicheng Tong, Yuzhe Yang, Yufei He, Xueyi Zhang, Qingyun Zou, Qian Wang, Bingsheng He

Comments 56 pages, 15 figures; Accepted at ACL 2026 Findings

详情
英文摘要

Multi-agent systems (MAS) are increasingly used for open-ended idea generation, driven by the expectation that collective interaction will broaden the exploration diversity. However, when and why such collaboration truly expands the solution space remains unclear. We present a systematic empirical study of diversity in MAS-based ideation across three bottom-up levels: model intelligence, agent cognition, and system dynamics. At the model level, we identify a compute efficiency paradox, where stronger, highly aligned models yield diminishing marginal diversity despite higher per-sample quality. At the cognition level, authority-driven dynamics suppress semantic diversity compared to junior-dominated groups. At the system level, group-size scaling yields diminishing returns and dense communication topologies accelerate premature convergence. We characterize these outcomes as collective failures emerging from structural coupling, a process where interaction inadvertently contracts agent exploration and triggers diversity collapse. Our analysis shows that this collapse arises primarily from the interaction structure rather than inherent model insufficiency, highlighting the importance of preserving independence and disagreement when designing MAS for creative tasks. Our code is available at https://github.com/Xtra-Computing/MAS_Diversity.

2604.16369 2026-04-22 cs.CY cs.AI cs.CL

Why AI Readiness Is an Organizational Learning Problem, Not a Technology Purchase

Jeanne McClure, Gregg Gerdau

Comments 8 Pages 2 figures 1 table

详情
英文摘要

Global corporate AI investment reached $252.3 billion in 2024, yet only 6% of firms report significant earnings impact. This article argues that AI project failure is fundamentally an organizational learning problem rather than a technology deficit. Drawing on a systematic synthesis of 19 large-scale industry and academic sources, including surveys of nearly 10,000 organizational leaders, we identify two categories of failure: organizational (culture, leadership alignment, governance, and human-AI learning deficits) and technical (semantic bottlenecks and output management challenges). We introduce the Siloed-Integrated-Orchestrated (SIO) progression model, which maps enterprise AI capability across five pillars -- Culture & Leadership, Human Capital & Operations, Data Architecture, Systems Infrastructure, and Governance & Regulatory Compliance -- and provides prescriptive guidance for advancing between stages. The implications challenge organizations to reframe AI investment as capability development rather than technology procurement.

2604.13327 2026-04-22 cs.DC cs.LG cs.PL

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

Hongyi Jin, Bohan Hou, Guanjie Wang, Ruihang Lai, Jinqi Chen, Zihao Ye, Yaxing Cai, Yixin Dong, Xinhao Cheng, Zhihao Zhang, Yilong Zhao, Yingyi Huang, Lijie Yang, Jinchen Jiang, Gabriele Oliaro, Jianan Ji, Xupeng Miao, Vinod Grover, Todd C. Mowry, Zhihao Jia, Tianqi Chen

Comments 16 pages. 18 figures. accepted in MLSys 2026. References corrected

详情
英文摘要

Modern GPU workloads, especially large language model (LLM) inference, suffer from kernel launch overheads and coarse synchronization that limit inter-kernel parallelism. Recent megakernel techniques fuse multiple operators into a single persistent kernel to eliminate launch gaps and expose inter-kernel parallelism, but struggle to handle dynamic shapes and data-dependent computation in real workloads. We present Event Tensor, a unified compiler abstraction for dynamic megakernels. Event Tensor encodes dependencies between tiled tasks, and enables first-class support for both shape and data-dependent dynamism. Built atop this abstraction, our Event Tensor Compiler (ETC) applies static and dynamic scheduling transformations to generate high-performance persistent kernels. Evaluations show that ETC achieves state-of-the-art LLM serving latency while significantly reducing system warmup overhead.

2604.08608 2026-04-22 cs.CR cs.AI cs.LG

Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines

Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala, Yoonpyo Lee, Syed Bahauddin Alam, Sajedul Talukder

Comments This paper got accepted for AAAI 2026 Summer Symposium

详情
英文摘要

We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems where a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers -- the violation only emerges at the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing the compositional safety gap is closable.

2604.04981 2026-04-22 q-bio.GN cs.LG cs.NE

An Imbalanced Dataset with Multiple Feature Representations for Studying Quality Control of Next-Generation Sequencing

Philipp Röchner, Clarissa Krämer, Johannes U Mayer, Franz Rothlauf, Steffen Albrecht, Maximilian Sprang

详情
英文摘要

Next-generation sequencing (NGS) is a key technique for studying the DNA and RNA of organisms. However, identifying quality problems in NGS data across different experimental settings remains challenging. To develop automated quality-control tools, researchers require datasets with features that capture the characteristics of quality problems. Existing NGS repositories, however, offer only a limited number of quality-related features. To address this gap, we propose a dataset derived from 37,491 NGS samples with two types of quality-related feature representations. The first type consists of 34 features derived from quality control tools (QC-34 features). The second type has a variable number of features ranging from eight to 1,183. These features were derived from read counts in problematic genomic regions identified by the ENCODE blocklist (BL features). All features describe the same human and mouse samples from five genomic assays, allowing direct comparison of feature representations. The proposed dataset includes a binary quality label, derived from automated quality control and domain experts. Among all samples, $3.2\%$ are of low quality. Supervised machine learning algorithms accurately predicted quality labels from the features, confirming the relevance of the provided feature representations. The proposed feature representations enable researchers to study how different feature types (QC-34 vs. BL features) and granularities (varying number of BL features) affect the detection of quality problems.

2604.03753 2026-04-22 cs.CR cs.LG

Spatiotemporal-Aware Bit-Flip Injection on DNN-based Advanced Driver Assistance Systems (extended version)

Taibiao Zhao, Xiang Zhang, Mingxuan Sun, Ruyi Ding, Xugui Zhou

Comments The authors have identified issues in the experimental setup and evaluation that may affect the validity of the results. In particular, inconsistencies in the fault injection protocol and temporal analysis may lead to incorrect conclusions. The authors therefore request withdrawal for thorough revision

详情
英文摘要

Modern advanced driver assistance systems (ADAS) rely on deep neural networks (DNNs) for perception and planning. Since DNNs' parameters reside in DRAM during inference, bit flips caused by cosmic radiation or low-voltage operation may corrupt DNN computations, distort driving decisions, and lead to real-world incidents. This paper presents a SpatioTemporal-Aware Fault Injection (STAFI) framework to locate critical fault sites in DNNs for ADAS efficiently. Spatially, we propose a Progressive Metric-guided Bit Search (PMBS) that efficiently identifies critical network weight bits whose corruption causes the largest deviations in driving behavior (e.g., unintended acceleration or steering). Furthermore, we develop a Critical Fault Time Identification (CFTI) mechanism that determines when to trigger these faults, taking into account the context of real-time systems and environmental states, to maximize the safety impact. Experiments on DNNs for a production ADAS demonstrate that STAFI uncovers 29.56x more hazard-inducing critical faults than the strongest baseline.

2603.20897 2026-04-22 cs.CY cs.AI cs.AR

The data heat island effect: quantifying the impact of AI data centers in a warming world

Andrea Marinoni, Erik Cambria, Weisi Lin, Mauro Dalla Mura, Jocelyn Chanussot, Edoardo Ragusa, Chi Yan Tso, Yihao Zhu, Benjamin Horton

详情
英文摘要

The strong and continuous increase of AI-based services leads to the steady proliferation of AI data centres worldwide with the unavoidable escalation of their power consumption. It is unknown how this energy demand for computational purposes will impact the surrounding environment. Here, we focus our attention on the heat dissipation of AI hyperscalers. Taking advantage of land surface temperature measurements acquired by remote sensing platforms over the last decades, we are able to obtain a robust assessment of the temperature increase recorded in the areas surrounding AI data centres globally. We estimate that the land surface temperature increases by 2°C on average after the start of operations of an AI data centre, inducing local microclimate zones, which we call the data heat island effect. We assess the impact on the communities, quantifying that more than 340 million people could be affected by this temperature increase. Our results show that the data heat island effect could have a remarkable influence on communities and regional welfare in the future, hence becoming part of the conversation around environmentally sustainable AI worldwide.

2603.04445 2026-04-22 cs.NI cs.CL cs.PF

Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey

Yasmin Moslem, John D. Kelleher

Comments Work funded by ADAPT Centre, Trinity College Dublin, and Huawei Ireland

详情
英文摘要

The rapid growth of large language models (LLMs) with diverse capabilities, costs, and domains has created a critical need for intelligent model selection at inference time. While smaller models suffice for routine queries, complex tasks demand more capable models. However, static model deployment does not account for the complexity and domain of incoming queries, leading to suboptimal performance and increased costs. Dynamic routing systems that adaptively select models based on query characteristics have emerged as a solution to this challenge. We provide a systematic analysis of state-of-the-art multi-LLM routing and cascading approaches. In contrast to mixture-of-experts architectures, which route within a single model, we study routing across multiple independently trained LLMs. We cover diverse routing paradigms, including query difficulty, human preferences, clustering, uncertainty quantification, reinforcement learning, multimodality, and cascading. For each paradigm, we analyze representative methods and examine key trade-offs. Beyond taxonomy, we introduce a conceptual framework that characterizes routing systems along three dimensions: when decisions are made, what information is used, and how they are computed. This perspective highlights that practical systems are often compositional, integrating multiple paradigms under operational constraints. Our analysis demonstrates that effective multi-LLM routing requires balancing competing objectives. Choosing the optimal routing strategy depends on deployment and computational constraints. Well-designed routing systems can outperform even the most powerful individual models by strategically leveraging specialized capabilities across models while maximizing efficiency gains. Meanwhile, open challenges remain in developing routing mechanisms that generalize across diverse architectures, modalities, and applications.

2602.18571 2026-04-22 cs.SE cs.AI

Debug2Fix: Can Interactive Debugging Help Coding Agents Fix More Bugs?

Spandan Garg, Yufan Huang

Comments In Review

详情
英文摘要

While significant progress has been made in automating various aspects of software development through coding agents, there is still significant room for improvement in their bug fixing capabilities. Debugging and investigation of runtime behavior remains largely a manual, developer-driven process. Popular coding agents typically rely on either static analysis of the code or iterative test-fix cycles, which is akin to trial and error debugging. We posit that there is a wealth of rich runtime information that developers routinely access while debugging code, which agents are currently deprived of due to design limitations. Despite how prevalent debuggers are in modern IDEs and command-line tools, they have surprisingly not made their way into coding agents. In this work, we introduce Debug2Fix, a novel framework that incorporates interactive debugging as a core component of a software engineering agent via a subagent architecture. We incorporate debuggers for Java and Python into our agent framework and evaluate against GitBug-Java and SWE-Bench-Live and achieve >20% improvement in performance compared to the baseline for certain models. Furthermore, using our framework, we're able to make weaker models like GPT-5 and Claude Haiku 4.5 match or exceed the performances of stronger models like Claude Sonnet 4.5, showing that better tool design is often just as important as switching to a more expensive model. Finally, we conduct systematic ablations demonstrating the importance of both the subagent architecture and debugger integration.

2602.17036 2026-04-22 cs.IR cs.LG

LiveGraph: Active-Structure Neural Re-ranking for Exercise Recommendation

Rong Fu, Zijian Zhang, Haiyun Wei, Jiekai Wu, Kun Liu, Xianda Li, Haoyu Zhao, Yang Li, Yongtai Liu, Ziming Wang, Rui Lu, Simon Fong

Comments 19 pages, 5 figures

详情
英文摘要

The continuous expansion of digital learning environments has catalyzed the demand for intelligent systems capable of providing personalized educational content. While current exercise recommendation frameworks have made significant strides, they frequently encounter obstacles regarding the long-tailed distribution of student engagement and the failure to adapt to idiosyncratic learning trajectories. We present LiveGraph, a novel active-structure neural re-ranking framework designed to overcome these limitations. Our approach utilizes a graph-based representation enhancement strategy to bridge the information gap between active and inactive students while integrating a dynamic re-ranking mechanism to foster content diversity. By prioritizing the structural relationships within learning histories, the proposed model effectively balances recommendation precision with pedagogical variety. Comprehensive experimental evaluations conducted on multiple real-world datasets demonstrate that LiveGraph surpasses contemporary baselines in both predictive accuracy and the breadth of exercise diversity.

2602.15423 2026-04-22 cs.IR cs.LG

GaiaFlow: Semantic-Guided Diffusion Tuning for Carbon-Frugal Search

Rong Fu, Jia Yee Tan, Chunlei Meng, Shuo Yin, Xiaowen Ma, Wangyu Wu, Muge Qi, Guangzhen Yao, Zhaolu Kang, Zeli Su, Simon Fong

Comments 19 pages, 7 figures

详情
英文摘要

As the burgeoning power requirements of sophisticated neural architectures escalate, the information retrieval community has recognized ecological sustainability as a pivotal priority that necessitates a fundamental paradigm shift in model design. While contemporary neural rankers have attained unprecedented accuracy, the substantial environmental externalities associated with their computational intensity often remain overlooked in large-scale deployments. We present GaiaFlow, an innovative framework engineered to facilitate carbon-frugal search by operationalizing semantic-guided diffusion tuning. Our methodology orchestrates the convergence of retrieval-guided Langevin dynamics and a hardware-independent performance modeling strategy to optimize the trade-off between search precision and environmental preservation. By incorporating adaptive early exit protocols and precision-aware quantized inference, the proposed architecture significantly mitigates operational carbon footprints while maintaining robust retrieval quality across heterogeneous computing infrastructures. Extensive experimental evaluations demonstrate that GaiaFlow achieves a superior equilibrium between effectiveness and energy efficiency, offering a scalable and sustainable pathway for next-generation neural search systems.

2602.11631 2026-04-22 physics.geo-ph cs.LG

Enforcing Reciprocity in Operator Learning for Seismic Wave Propagation

Caifeng Zou, Yaozhong Shi, Zachary E. Ross, Robert W. Clayton, Kamyar Azizzadenesheli

详情
英文摘要

Accurate and efficient wavefield modeling underpins seismic structure and source studies. Traditional methods comply with physical laws but are computationally intensive. Data-driven methods, while opening new avenues for advancement, have yet to incorporate strict physical consistency. The principle of reciprocity is one of the most fundamental physical laws in wave propagation. We introduce the Reciprocity-Enforced Neural Operator (RENO), a transformer-based architecture for modeling seismic wave propagation that hard-codes the reciprocity principle. The model leverages the cross-attention mechanism and commutative operations to guarantee invariance under swapping source and receiver positions. Beyond improved physical consistency, the proposed architecture supports simultaneous realizations for multiple sources. This yields an order-of-magnitude inference speedup at a similar memory footprint over a conventional neural operator on a realistic multi-source configuration. We demonstrate the functionality using the reciprocity relation for particle velocity fields under single forces. This architecture is also applicable to pressure fields under dilatational sources and travel-time fields governed by the eikonal equation, paving the way for encoding more complex reciprocity relations.