arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1530
2604.09158 2026-04-13 cs.HC cs.AI

Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning

Fatma Betül Güreş, Tanya Nazaretsky, Seyed Parsa Neshaei, Tanja Käser

Comments 12 pages, 8 figures. Accepted at LAK 2026

详情
Journal ref
In Proceedings of the 16th International Learning Analytics and Knowledge Conference (LAK 2026), April 27-May 1, 2026, Bergen, Norway. ACM, New York, NY, USA
英文摘要

Supporting students in developing diagnostic reasoning is a key challenge across educational domains. Novices often face cognitive biases such as premature closure and over-reliance on heuristics, and they struggle to transfer diagnostic strategies to new cases. Scenario-based learning (SBL) enhanced by Learning Analytics (LA) and large language models (LLM) offers a promising approach by combining realistic case experiences with personalized scaffolding. Yet, how different scaffolding approaches shape reasoning processes remains insufficiently explored. This study introduces PharmaSim Switch, an SBL environment for pharmacy technician training, extended with an LA- and LLM-powered pharmacist agent that implements pedagogical conversations rooted in two theory-driven scaffolding approaches: \emph{structuring} and \emph{problematizing}, as well as a student learning trajectory. In a between-groups experiment, 63 vocational students completed a learning scenario, a near-transfer scenario, and a far-transfer scenario under one of the two scaffolding conditions. Results indicate that both scaffolding approaches were effective in supporting the use of diagnostic strategies. Performance outcomes were primarily influenced by scenario complexity rather than students' prior knowledge or the scaffolding approach used. The structuring approach was associated with more accurate Active and Interactive participation, whereas problematizing elicited more Constructive engagement. These findings underscore the value of combining scaffolding approaches when designing LA- and LLM-based systems to effectively foster diagnostic reasoning.

2604.09135 2026-04-13 stat.ML cs.LG math.ST stat.ME stat.TH

Identifying Causal Effects Using a Single Proxy Variable

Silvan Vollmer, Niklas Pfister, Sebastian Weichwald

Comments Equal contribution between Pfister and Weichwald

详情
英文摘要

Unobserved confounding is a key challenge when estimating causal effects from a treatment on an outcome in scientific applications. In this work, we assume that we observe a single, potentially multi-dimensional proxy variable of the unobserved confounder and that we know the mechanism that generates the proxy from the confounder. Under a completeness assumption on this mechanism, which we call Single Proxy Identifiability of Causal Effects or simply SPICE, we prove that causal effects are identifiable. We extend the proxy-based causal identifiability results by Kuroki and Pearl (2014); Pearl (2010) to higher dimensions, more flexible functional relationships and a broader class of distributions. Further, we develop a neural network based estimation framework, SPICE-Net, to estimate causal effects, which is applicable to both discrete and continuous treatments.

2604.09124 2026-04-13 cs.DC cs.AR cs.LG

MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

Enrico Russo, Mohamed Amine Hamdi, Alessandro Ottaviano, Francesco Conti, Angelo Garofalo, Daniele Jahier Pagliari, Maurizio Palesi, Luca Benini, Alessio Burrello

Comments Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC26)

详情
英文摘要

Deploying DNNs on System-on-Chips (SoC) with multiple heterogeneous acceleration engines is challenging, and the majority of deployment frameworks cannot fully exploit heterogeneity. We present MATCHA, a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators and uses constraint programming to optimize L3/L2 memory allocation and scheduling. Using pattern matching, tiling, and mapping across individual HW units enables parallel execution and high accelerator utilization. On the MLPerf Tiny benchmark, using a SoC with two heterogeneous accelerators, MATCHA improves accelerator utilization and reduces inference latency by up to 35% with respect to the the state-of-the-art MATCH compiler.

2604.09115 2026-04-13 cs.NI cs.RO

"Take Me Home, Wi-Fi Drone": A Drone-based Wireless System for Wilderness Search and Rescue

Weiying Hou, Luca Jiang-Tao Yu, Chenshu Wu

Comments 16 pages, 12 figures, 1 table. Project page: https://aiot-lab.github.io/Wi2SAR

详情
Journal ref
In The 32nd Annual International Conference on Mobile Computing and Networking (MobiCom '26), October 26-30, 2026, Austin, TX, USA. ACM, New York, NY, USA
英文摘要

Wilderness Search and Rescue (WiSAR) represents a longstanding and critical societal challenge, demanding innovative and automatic technological solutions. In this paper, we introduce Wi2SAR, a novel autonomous drone-based wireless system for long-range, through-occlusion WiSAR operations, without relying on existing infrastructure. Our basic insight is to leverage the automatic reconnection behavior of modern Wi-Fi devices to known networks. By mimicking these networks via on-drone Wi-Fi, Wi2SAR uniquely facilitates the discovery and localization of victims through their accompanying mobile devices. Translating this simple idea into a practical system poses substantial technical challenges. Wi2SAR overcomes these challenges via three distinct innovations: (1) a rapid and energy-efficient device discovery mechanism to discover and identify the target victim, (2) a novel RSS-only, long-range direction finding approach using a 3D-printed Luneburg Lens, amplifying the directional signal strength differences and significantly extending the operational range, and (3) an adaptive drone navigation scheme that guides the drone toward the target efficiently. We implement an end-to-end prototype and evaluate Wi2SAR across various mobile devices and real-world wilderness scenarios. Experimental results demonstrate Wi2SAR's high performance, efficiency, and practicality, highlighting its potential to advance autonomous WiSAR solutions. Wi2SAR is open-sourced at https://aiot-lab.github.io/Wi2SAR to facilitate further research and real-world deployment.

2604.09107 2026-04-13 cs.DC cs.AI

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

Chenhao Ye, Huaizheng Zhang, Mingcong Han, Baoquan Zhong, Xiang Li, Qixiang Chen, Xinyi Zhang, Weidong Zhang, Kaihua Jiang, Wang Zhang, He Sun, Wencong Xiao, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

详情
英文摘要

Modern LLM reinforcement learning (RL) workloads require a highly efficient weight transfer system to scale training across heterogeneous computational resources. However, existing weight transfer approaches either fail to provide flexibility for dynamically scaling clusters or incur fundamental data movement overhead, resulting in poor performance. We introduce Reference-Oriented Storage (ROS), a new storage abstraction for RL weight transfer that exploits the highly replicated model weights in place. ROS presents the illusion that certain versions of the model weights are stored and can be fetched on demand. Underneath, ROS does not physically store any copies of the weights; instead, it tracks the workers that hold these weights on GPUs for inference. Upon request, ROS directly uses them to serve reads. We build TensorHub, a production-quality system that extends the ROS idea with topology-optimized transfer, strong consistency, and fault tolerance. Evaluation shows that TensorHub fully saturates RDMA bandwidth and adapts to three distinct rollout workloads with minimal engineering effort. Specifically, TensorHub reduces total GPU stall time by up to 6.7x for standalone rollouts, accelerates weight update for elastic rollout by 4.8x, and cuts cross-datacenter rollout stall time by 19x. TensorHub has been deployed in production to support cutting-edge RL training.

2604.09104 2026-04-13 cs.CY cs.AI

Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence

Tommy Shaffer Shane, Simon Mylius, Hamish Hobbs

Comments 44 pages, 4 figures, 5 tables (main text). Includes 5 appendices

详情
英文摘要

Scheming, the covert pursuit of misaligned goals by AI systems, represents a potentially catastrophic risk, yet scheming research suffers from significant limitations. In particular, scheming evaluations demonstrate behaviours that may not occur in real-world settings, limiting scientific understanding, hindering policy development, and not enabling real-time detection of loss of control incidents. Real-world evidence is needed, but current monitoring techniques are not effective for this purpose. This paper introduces a novel open-source intelligence (OSINT) methodology for detecting real-world scheming incidents: collecting and analysing transcripts from chatbot conversations or command-line interactions shared online. Analysing over 183,420 transcripts from X (formerly Twitter), we identify 698 real-world scheming-related incidents between October 2025 and March 2026. We observe a statistically significant 4.9x increase in monthly incidents from the first to last month, compared to a 1.7x increase in posts discussing scheming. We find evidence of multiple scheming-related behaviours in real-world deployments previously reported only in experiments, many resulting in real-world harms. While we did not detect catastrophic scheming incidents, the behaviours observed demonstrate concerning precursors, such as willingness to disregard instructions, circumvent safeguards, lie to users, and single-mindedly pursue goals in harmful ways. As AI systems become more capable, these could evolve into more strategic scheming with potentially catastrophic consequences. Our findings demonstrate the viability of transcript-based OSINT as a scalable approach to real-world scheming detection supporting scientific research, policy development, and emergency response. We recommend further investment towards OSINT techniques for monitoring scheming and loss of control.

2604.09101 2026-04-13 cs.CR cs.AI cs.CV cs.LG

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal

Comments 17 pages (8 main + 2 references + 7 supplementary), Accepted to CVPR Findings 2026

详情
英文摘要

Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk where a malicious provider can follow the prompt-tuning protocol yet implant a backdoor, forcing triggered inputs to be classified into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave encoders untouched, making them undetectable to existing methods that focus on encoder corruption. Other data-level methods that sanitize data before training or during inference, also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI), a backdoor detection method designed for prompt-tuned CLIP models. Assuming white-box access to the delivered model and a pool of unlabeled OOD images, CI reconstructs possible triggers for each class to determine if the model exhibits backdoor behaviour or not. Additionally, we demonstrate that using CI's reconstructed trigger for fine-tuning on correctly labeled triggered inputs enables us to re-align the model and reduce backdoor effectiveness. Through extensive experiments across ten datasets and four backdoor attacks, we demonstrate that CI can reconstruct effective triggers in a single epoch using only 1,000 OOD images, achieving a 94% detection accuracy (47/50 models). Compared to adapted trigger-inversion baselines, CI yields a markedly higher AUROC score (0.973 vs 0.495/0.687), thus enabling the vetting and post-hoc repair of prompt-tuned CLIP models to ensure safe deployment.

2604.09089 2026-04-13 cs.SE cs.AI cs.CR

DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation

Li Huang, Zhongxin Liu, Yifan Wu, Tao Yin, Dong Li, Jichao Bi, Nankun Mu, Hongyu Zhang, Meng Yan

Comments ACL 2026 main conference

详情
英文摘要

Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The aggregated signal powers a dedicated security analyzer within a multi-objective training objective that balances security enhancement and functional correctness, and further supports a lightweight inference-time steering strategy. Extensive experiments across five code LLMs demonstrate that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN. It also preserves functional correctness while exhibiting generalization to held-out vulnerability types. Our code is public at https://github.com/unknownhl/DeepGuard.

2604.09048 2026-04-13 cs.DC cs.AI

Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures

Mauricio Fadel Argerich, Jonathan Fürst, Marta Patiño-Martínez

Comments Under review

详情
英文摘要

While the large energy consumption of Large Language Models (LLMs) is recognized by the community, system operators lack guidance for energy-efficient LLM inference deployments that leverage energy trade-offs of heterogeneous hardware due to a lack of energy-aware benchmarks and data. In this work we address this gap with Watt Counts: the largest open-access dataset of energy consumption of LLMs, with over 5,000 experiments for 50 LLMs across 10 NVIDIA Graphics Processing Units (GPUs) in batch and server scenarios along with a reproducible, open-source benchmark that enables community submissions to expand this dataset. Leveraging this dataset, we conduct a system-level study of LLM inference across heterogeneous GPU architectures and show that GPU selection is crucial for energy efficiency outcomes and that optimal hardware choices vary significantly across models and deployment scenarios, demonstrating the critical importance of hardware-aware deployment in heterogeneous LLM systems. Guided by our data and insights, we show that practitioners can reduce energy consumption by up to 70% in server scenarios with negligible impact on user experience, and by up to 20% in batch scenarios.

2604.09028 2026-04-13 cs.MA cs.LG cs.NI

Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks

Wen Qiu, Zhiqiang He, Wei Zhao, Hiroshi Masui

Comments 20 pages, 12 figures, 3 tables

详情
英文摘要

Unmanned aerial vehicles serving as aerial base stations can rapidly restore connectivity after disasters, yet abrupt changes in user mobility and traffic demands shift the quality of service trade-offs and induce strong non-stationarity. Deep reinforcement learning policies suffer from plasticity loss under such shifts, as representation collapse and neuron dormancy impair adaptation. We propose plasticity enhanced multi-agent mixture of experts (PE-MAMoE), a centralized training with decentralized execution framework built on multi-agent proximal policy optimization. PE-MAMoE equips each UAV with a sparsely gated mixture of experts actor whose router selects a single specialist per step. A non-parametric Phase Controller injects brief, expert-only stochastic perturbations after phase switches, resets the action log-standard-deviation, anneals entropy and learning rate, and schedules the router temperature, all to re-plasticize the policy without destabilizing safe behaviors. We derive a dynamic regret bound showing the tracking error scales with both environment variation and cumulative noise energy. In a phase-driven simulator with mobile users and 3GPP-style channels, PE-MAMoE improves normalized interquartile mean return by 26.3\% over the best baseline, increases served-user capacity by 12.8\%, and reduces collisions by approximately 75\%. Diagnostics confirm persistently higher expert feature rank and periodic dormant-neuron recovery at regime switches.

2604.08979 2026-04-13 cs.HC cs.SD

Accessible Fine-grained Data Representation via Spatial Audio

Can Liu, Wenjie Jiang, Shaolun Ruan, Kotaro Hara, Yong Wang

Comments Accepted by IEEE Computer Graphics and Applications (IEEE CG&A)

详情
英文摘要

Pitch-based sonification of quantitative data increases the accessibility of data visualizations that are otherwise inaccessible for blind and low-vision (BLV) individuals. We argue that, although pitch representations can reveal the coarse-grained information of data, such as data trend and value comparison, they cannot effectively convey the fine-grained details like the sign and exact value of individual data points. Informed by existing sound perception research, we propose a spatial audio-based approach by representing data values as the sound direction in the azimuth plane to achieve accessible fine-grained data representation. We conducted a user study with 26 participants (including 10 BLV participants) on four data perception tasks. The results show our approach significantly outperforms pitch representation on fine-grained data perception tasks like recognizing data signs and exact values, and performs similarly on data trend identification, despite its inferior accuracy on data value comparison.

2604.08969 2026-04-13 stat.ML cs.LG math.ST stat.TH

Online Quantile Regression for Nonparametric Additive Models

Haoran Zhan

详情
英文摘要

This paper introduces a projected functional gradient descent algorithm (P-FGD) for training nonparametric additive quantile regression models in online settings. This algorithm extends the functional stochastic gradient descent framework to the pinball loss. An advantage of P-FGD is that it does not need to store historical data while maintaining $O(J_t\ln J_t)$ computational complexity per step where $J_t$ denotes the number of basis functions. Besides, we only need $O(J_t)$ computational time for quantile function prediction at time $t$. These properties show that P-FGD is much better than the commonly used RKHS in online learning. By leveraging a novel Hilbert space projection identity, we also prove that the proposed online quantile function estimator (P-FGD) achieves the minimax optimal consistency rate $O(t^{-\frac{2s}{2s+1}})$ where $t$ is the current time and $s$ denotes the smoothness degree of the quantile function. Extensions to mini-batch learning are also established.

2604.08935 2026-04-13 stat.ML cs.LG

A novel hybrid approach for positive-valued DAG learning

Yao Zhao

Comments 13 pages, 2 tables. Accepted at CLeaR 2026

详情
英文摘要

Causal discovery from observational data remains a fundamental challenge in machine learning and statistics, particularly when variables represent inherently positive quantities such as gene expression levels, asset prices, company revenues, or population counts, which often follow multiplicative rather than additive dynamics. We propose the Hybrid Moment-Ratio Scoring (H-MRS) algorithm, a novel method for learning directed acyclic graphs (DAGs) from positive-valued data by combining moment-based scoring with log-scale regression. The key idea is that for positive-valued variables, the moment ratio $\frac{\mathbb{E}[X_j^2]}{\mathbb{E}[(\mathbb{E}[X_j \mid S])^2]}$ provides an effective criterion for causal ordering, where $S$ denotes candidate parent sets. H-MRS integrates log-scale Ridge regression for moment-ratio estimation with a greedy ordering procedure based on raw-scale moment ratios, followed by Elastic Net-based parent selection to recover the final DAG structure. Experiments on synthetic log-linear data demonstrate competitive precision and recall. The proposed method is computationally efficient and naturally respects positivity constraints, making it suitable for applications in genomics and economics. These results suggest that combining log-scale modeling with raw-scale moment ratios provides a practical framework for causal discovery in positive-valued domains.

2604.08920 2026-04-13 cs.IR cs.AI cs.CL cs.LG

Beyond Relevance: Utility-Centric Retrieval in the LLM Era

Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo

Comments Accepted by SIGIR2026

详情
英文摘要

Information retrieval systems have traditionally optimized for topical relevance-the degree to which retrieved documents match a query. However, relevance only approximates a deeper goal: utility, namely, whether retrieved information helps accomplish a user's underlying task. The emergence of retrieval-augmented generation (RAG) fundamentally changes this paradigm. Retrieved documents are no longer consumed directly by users but instead serve as evidence for large language models (LLMs) that produce answers. As a result, retrieval effectiveness must be evaluated by its contribution to generation quality rather than by relevance-based ranking metrics alone. This tutorial argues that retrieval objectives are evolving from relevance-centric optimization toward LLM-centric utility. We present a unified framework covering LLM-agnostic versus LLM-specific utility, context-independent versus context-dependent utility, and the connection with LLM information needs and agentic RAG. By synthesizing recent advances, the tutorial provides conceptual foundations and practical guidance for designing retrieval systems aligned with the requirements of LLM-based information access.

2604.08894 2026-04-13 cs.NE cs.AI cs.CV

Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer

Zecheng Hao, Shenghao Xie, Kang Chen, Wenxuan Liu, Zhaofei Yu, Tiejun Huang

详情
英文摘要

Spiking Neural Networks (SNNs) offer superior energy efficiency over Artificial Neural Networks (ANNs). However, they encounter significant deficiencies in training and inference metrics when applied to Spiking Vision Transformers (S-ViTs). Existing paradigms including ANN-SNN Conversion and Spatial-Temporal Backpropagation (STBP) suffer from inherent limitations, precluding concurrent optimization of memory, accuracy and energy consumption. To address these issues, we propose Ge$^\text{2}$mS-T, a novel architecture implementing grouped computation across temporal, spatial and network structure dimensions. Specifically, we introduce the Grouped-Exponential-Coding-based IF (ExpG-IF) model, enabling lossless conversion with constant training overhead and precise regulation for spike patterns. Additionally, we develop Group-wise Spiking Self-Attention (GW-SSA) to reduce computational complexity via multi-scale token grouping and multiplication-free operations within a hybrid attention-convolution framework. Experiments confirm that our method can achieve superior performance with ultra-high energy efficiency on challenging benchmarks. To our best knowledge, this is the first work to systematically establish multi-dimensional grouped computation for resolving the triad of memory overhead, learning capability and energy budget in S-ViTs.

2604.08868 2026-04-13 eess.IV cs.AI cs.CV cs.LG

MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification

Mohammed Maaz Sibhai, Abedalrhman Alkhateeb, Saad B. Ahmed

详情
英文摘要

To ensure safe clinical integration, deep learning models must provide more than just high accuracy; they require dependable uncertainty quantification. While current Medical Vision Transformers perform well, they frequently struggle with overconfident predictions and a lack of transparency, issues that are magnified by the noisy and imbalanced nature of clinical data. To address this, we enhanced the modified Medical Transformer (MedFormer) that incorporates prototype-based learning and uncertainty-guided routing, by utilizing a Dirichlet distribution for per-token evidential uncertainty, our framework can quantify and localize ambiguity in real-time. This uncertainty is not just an output but an active participant in the training process, filtering out unreliable feature updates. Furthermore, the use of class-specific prototypes ensures the embedding space remains structured, allowing for decisions based on visual similarity. Testing across four modalities (mammography, ultrasound, MRI, and histopathology) confirms that our approach significantly enhances model calibration, reducing expected calibration error (ECE) by up to 35%, and improves selective prediction, even when accuracy gains are modest.

2604.08866 2026-04-13 cs.HC cs.AI

AI-Induced Human Responsibility (AIHR) in AI-Human teams

Greg Nyilasy, Brock Bastian, Jennifer Overbeck, Abraham Ryan Ade Putra Hito

详情
英文摘要

As organizations increasingly deploy AI as a teammate rather than a standalone tool, morally consequential mistakes often arise from joint human-AI workflows in which causality is ambiguous. We ask how people allocate responsibility in these hybrid-agent settings. Across four experiments (N = 1,801) in an AI-assisted lending context (e.g., discriminatory rejection, irresponsible lending, and low-harm filing errors), participants consistently attributed more responsibility to the human decision maker when the human was paired with AI than when paired with another human (by an average of 10 points on a 0-100 scale across studies). This AI-Induced Human Responsibility (AIHR) effect held across high and low harm scenarios and persisted even where self-serving blame-shifting (when the human in question was the self) would be expected. Process evidence indicates that AIHR is explained by inferences of agent autonomy: AI is seen as a constrained implementer, which makes the human the default locus of discretionary responsibility. Alternative mechanisms (mind perception; self-threat) did not account for the effect. These findings extend research on algorithm aversion, hybrid AI-human organizational behavior and responsibility gaps in technology by showing that AI-human teaming can increase (rather than dilute) human responsibility, with implications for accountability design in AI-enabled organizations.

2604.08805 2026-04-13 cs.CR cs.AI

Building Better Environments for Autonomous Cyber Defence

Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson, Myles Foley, Ed Chapman, Nickolas Espinosa Dice, Ankita Samaddar, Joshua Sylvester, Himanshu Neema, Nicholas Butts, Nate Foster, Ahmad Ridley, Zoe M, Paul Jones

详情
英文摘要

In November 2025, the authors ran a workshop on the topic of what makes a good reinforcement learning (RL) environment for autonomous cyber defence (ACD). This paper details the knowledge shared by participants both during the workshop and shortly afterwards by contributing herein. The workshop participants come from academia, industry, and government, and have extensive hands-on experience designing and working with RL and cyber environments. While there is now a sizeable body of literature describing work in RL for ACD, there is nevertheless a great deal of tradecraft, domain knowledge, and common hazards which are not detailed comprehensively in a single resource. With a specific focus on building better environments to train and evaluate autonomous RL agents in network defence scenarios, including government and critical infrastructure networks, the contributions of this work are twofold: (1) a framework for decomposing the interface between RL cyber environments and real systems, and (2) guidelines on current best practice for RL-based ACD environment development and agent evaluation, based on the key findings from our workshop.

2604.08804 2026-04-13 stat.ML cs.LG stat.ME

Policy-Aware Design of Large-Scale Factorial Experiments

Xin Wen, Xi Chen, Will Wei Sun, Yichen Zhang

详情
英文摘要

Digital firms routinely run many online experiments on shared user populations. When product decisions are compositional, such as combinations of interface elements, flows, messages, or incentives, the number of feasible interventions grows combinatorially, while available traffic remains limited. Overlapping experiments can therefore generate interaction effects that are poorly handled by decentralized A/B testing. We study how to design large-scale factorial experiments when the objective is not to estimate every treatment effect, but to identify a high-performing policy under a fixed experimentation budget. We propose a two-stage design that centralizes overlapping experiments into a single factorial problem and models expected outcomes as a low-rank tensor. In the first stage, the platform samples a subset of intervention combinations, uses tensor completion to infer performance on untested combinations, and eliminates weak factor levels using estimated marginal contributions. In the second stage, it applies sequential halving to the surviving combinations to select a final policy. We establish gap-independent simple-regret bounds and gap-dependent identification guarantees showing that the relevant complexity scales with the degrees of freedom of the low-rank tensor and the separation structure across factor levels, rather than the full factorial size. In an offline evaluation based on a product-bundling problem constructed from 100 million Taobao interactions, the proposed method substantially outperforms one-shot tensor completion and unstructured best-arm benchmarks, especially in low-budget and high-noise settings. These results show how centralized, policy-aware experimentation can make combinatorial product design operationally feasible at platform scale.

2604.08803 2026-04-13 cs.CY cs.AI

Scrapyard AI

Marc Böhlen, Sai Krishna

Comments 13 pages, 4 figures, XcoAx 2026 pre-publication

详情
英文摘要

This paper considers AI model churn as an opportunity for frugal investigation of large AI models. It describes how the incessant push for ever more powerful AI systems leaves in its wake a collection of obsolete yet powerful AI models, discarded in a veritable scrapyard of AI production. This scrapyard offers a potent opportunity for resource-constrained experimentation into AI systems. As in the physical scrapyard, nothing ever truly disappears in the AI scrapyard, it is just waiting to be reconfigured into something else. Project Nudge-x is an example of what can emerge from the AI scrapyard. Nudge-x seeks to manipulate legacy AI models to describe how mining sites across the planet are impacting landscapes and lives. By sharing this collection of brutal landscape interventions with people and AI systems alike, Nudge-x creates a venue for the appreciation of a history sadly shared between AI and people.

2604.08800 2026-04-13 cs.CR cs.LG

Tracing the Chain: Deep Learning for Stepping-Stone Intrusion Detection

Nate Mathews, Nicholas Hopper, Matthew Wright

详情
英文摘要

Stepping-stone intrusions (SSIs) are a prevalent network evasion technique in which attackers route sessions through chains of compromised intermediate hosts to obscure their origin. Effective SSI detection requires correlating the incoming and outgoing flows at each relay host at extremely low false positive rates -- a stringent requirement that renders classical statistical methods inadequate in operational settings. We apply ESPRESSO, a deep learning flow correlation model combining a transformer-based feature extraction network, time-aligned multi-channel interval features, and online triplet metric learning, to the problem of stepping-stone intrusion detection. To support training and evaluation, we develop a synthetic data collection tool that generates realistic stepping-stone traffic across five tunneling protocols: SSH, SOCAT, ICMP, DNS, and mixed multi-protocol chains. Across all five protocols and in both host-mode and network-mode detection scenarios, ESPRESSO substantially outperforms the state-of-the-art DeepCoFFEA baseline, achieving a true positive rate exceeding 0.99 at a false positive rate of $10^{-3}$ for standard bursty protocols in network-mode. We further demonstrate chain length prediction as a tool for distinguishing malicious from benign pivoting, and conduct a systematic robustness analysis revealing that timing-based perturbations are the primary vulnerability of correlation-based stepping-stone detectors.

2604.08799 2026-04-13 cs.GR cs.CV

MeshOn: Intersection-Free Mesh-to-Mesh Composition

Hyunwoo Kim, Itai Lang, Hadar Averbuch-Elor, Silvia Sellán, Rana Hanocka

Comments Project page: \hyperlink{https://threedle.github.io/MeshOn/}{this https URL}

详情
英文摘要

We propose MeshOn, a method that finds physically and semantically realistic compositions of two input meshes. Given an accessory, a base mesh with a user-defined target region, and optional text strings for both meshes, MeshOn uses a multi-step optimization framework to realistically fit the meshes onto each other while preventing intersections. We initialize the shapes' rigid configuration via a structured alignment scheme using Vision-to-Language Models, which we then optimize using a combination of attractive geometric losses, and a physics-inspired barrier loss that prevents surface intersections. We then obtain a final deformation of the object, assisted by a diffusion prior. Our method successfully fits accessories of various materials over a breadth of target regions, and is designed to fit directly into existing digital artist workflows. We demonstrate the robustness and accuracy of our pipeline by comparing it with generative approaches and traditional registration algorithms.

2604.08781 2026-04-13 eess.IV cs.AI cs.CV eess.SP physics.med-ph

PSIRNet: Deep Learning-based Free-breathing Rapid Acquisition Late Enhancement Imaging

Arda Atalik, Hui Xue, Rhodri H. Davies, Thomas A. Treibel, Daniel K. Sodickson, Michael S. Hansen, Peter Kellman

Comments 25 pages, 5 figures, 4 tables

详情
英文摘要

Purpose: To develop and evaluate a deep learning (DL) method for free-breathing phase-sensitive inversion recovery (PSIR) late gadolinium enhancement (LGE) cardiac MRI that produces diagnostic-quality images from a single acquisition over two heartbeats, eliminating the need for 8 to 24 motion-corrected (MOCO) signal averages. Materials and Methods: Raw data comprising 800,653 slices from 55,917 patients, acquired on 1.5T and 3T scanners across multiple sites from 2016 to 2024, were used in this retrospective study. Data were split by patient: 640,000 slices (42,822 patients) for training and the remainder for validation and testing, without overlap. The training and testing data were from different institutions. PSIRNet, a physics-guided DL network with 845 million parameters, was trained end-to-end to reconstruct PSIR images with surface coil correction from a single interleaved IR/PD acquisition over two heartbeats. Reconstruction quality was evaluated using SSIM, PSNR, and NRMSE against MOCO PSIR references. Two expert cardiologists performed an independent qualitative assessment, scoring image quality on a 5-point Likert scale across bright blood, dark blood, and wideband LGE variants. Paired superiority and equivalence (margin = 0.25 Likert points) were tested using exact Wilcoxon signed-rank tests at a significance level of 0.05 using R version 4.5.2. Results: Both readers rated single-average PSIRNet reconstructions superior to MOCO PSIR for dark blood LGE (conservative P = .002); for bright blood and wideband, one reader rated it superior and the other confirmed equivalence (all P < .001). Inference required approximately 100 msec per slice versus more than 5 sec for MOCO PSIR. Conclusion: PSIRNet produces diagnostic-quality free-breathing PSIR LGE images from a single acquisition, enabling 8- to 24-fold reduction in acquisition time.

2604.08772 2026-04-13 physics.ao-ph cs.LG

CERBERUS: A Three-Headed Decoder for Vertical Cloud Profiles

Emily K. deJong, Nipun Gunawardena, Kevin Smalley, Hassan Beydoun, Peter Caldwell

Comments Accepted for oral presentation at 2026 ICLR workshop on Machine Learning for Remote Sensing

详情
英文摘要

Atmospheric clouds exhibit complex three-dimensional structure and microphysical details that are poorly constrained by the predominantly two-dimensional satellite observations available at global scales. This mismatch complicates data-driven learning and evaluation of cloud processes in weather and climate models, contributing to ongoing uncertainty in atmospheric physics. We introduce CERBERUS, a probabilistic inference framework for generating vertical radar reflectivity profiles from geostationary satellite brightness temperatures, near-surface meteorological variables, and temporal context. CERBERUS employs a three-headed encoder-decoder architecture to predict a zero-inflated (ZI) vertically-resolved distribution of radar reflectivity. Trained and evaluated using ground-based Ka-band radar observations at the ARM Southern Great Plains site, CERBERUS recovers coherent structures across cloud regimes, generalizes to withheld test periods, and provides uncertainty estimates that reflect physical ambiguity, particularly in multilayer and dynamically complex clouds. These results demonstrate the value of distribution-based learning targets for bridging observational scales, introducing a path toward model-relevant synthetic observations of clouds.

2604.08763 2026-04-13 quant-ph cs.LG cs.NA math.NA

Weak Adversarial Neural Pushforward Method for the Wigner Transport Equation

Andrew Qing He, Wei Cai, Sihong Shao

Comments 9 pages, 1 algorithm

详情
英文摘要

We extend the Weak Adversarial Neural Pushforward Method to the Wigner transport equation governing the phase-space dynamics of quantum systems. The central contribution is a structural observation: integrating the nonlocal pseudo-differential potential operator against plane-wave test functions produces a Dirac delta that exactly inverts the Fourier transform defining the Wigner potential kernel, reducing the operator to a pointwise finite difference of the potential at two shifted arguments. This holds in arbitrary dimension, requires no truncation of the Moyal series, and treats the potential as a black-box function oracle with no derivative information. To handle the negativity of the Wigner quasi-probability distribution, we introduce a signed pushforward architecture that decomposes the solution into two non-negative phase-space distributions mixed with a learnable weight. The resulting method inherits the mesh-free, Jacobian-free, and scalable properties of the original framework while extending it to the quantum setting.

2604.08759 2026-04-13 cs.IT cs.CL math.IT

Optimal Multi-bit Generative Watermarking Schemes Under Worst-Case False-Alarm Constraints

Yu-Shin Huang, Chao Tian, Krishna Narayanan

Comments 41 pages, 8 tables

详情
英文摘要

This paper considers the problem of multi-bit generative watermarking for large language models under a worst-case false-alarm constraint. Prior work established a lower bound on the achievable miss-detection probability in the finite-token regime and proposed a scheme claimed to achieve this bound. We show, however, that the proposed scheme is in fact suboptimal. We then develop two new encoding-decoding constructions that attain the previously established lower bound, thereby completely characterizing the optimal multi-bit watermarking performance. Our approach formulates the watermark design problem as a linear program and derives the structural conditions under which optimality can be achieved. In addition, we identify the failure mechanism of the previous construction and compare the tradeoffs between the two proposed schemes.

2604.08755 2026-04-13 cs.CE cs.LG stat.ML

Accurate and Reliable Uncertainty Estimates for Deterministic Predictions Extensions to Under and Overpredictions

Rileigh Bandy, Enrico Camporeale, Andong Hu, Thomas Berger, Rebecca Morrison

详情
英文摘要

Computational models support high-stakes decisions across engineering and science, and practitioners increasingly seek probabilistic predictions to quantify uncertainty in such models. Existing approaches generate predictions either by sampling input parameter distributions or by augmenting deterministic outputs with uncertainty representations, including distribution-free and distributional methods. However, sampling-based methods are often computationally prohibitive for real-time applications, and many existing uncertainty representations either ignore input dependence or rely on restrictive Gaussian assumptions that fail to capture asymmetry and heavy-tailed behavior. Therefore, we extend the ACCurate and Reliable Uncertainty Estimate (ACCRUE) framework to learn input-dependent, non-Gaussian uncertainty distributions, specifically two-piece Gaussian and asymmetric Laplace forms, using a neural network trained with a loss function that balances predictive accuracy and reliability. Through synthetic and real-world experiments, we show that the proposed approach captures an input-dependent uncertainty structure and improves probabilistic forecasts relative to existing methods, while maintaining flexibility to model skewed and non-Gaussian errors.

2604.08744 2026-04-13 physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph

Active Learning for Generalizable Detonation Performance Prediction of Energetic Materials

R. Seaton Ullberg, Megan C. Davis, Jeremy N. Schroeder, Andrew H. Salij, M. J. Cawkwell, Christopher J. Snyder, Wilton J. M. Kort-Kamp, Ivana Matanovic

详情
英文摘要

The discovery of new energetic materials is critical for advancing technologies from defense to private industry. However, experimental approaches remain slow and expensive while computational alternatives require accurate material property inputs that are often costly to obtain, limiting their ability to efficiently predict detonation performance across a vast chemical space. We address this challenge through an active learning strategy that integrates density functional theory calculations, thermochemical modeling, message-passing neural networks, and Bayesian optimization. The resulting high-throughput workflow iteratively expands the training dataset by selecting new molecules in a targeted manner that balances the exploration of broad chemical space with the exploitation of promising high-performing candidates. This approach yields the largest publicly available database of potential CHNO explosives drawn from an initial pool of more than 70 billion candidates and a generalizable surrogate model capable of accurately predicting detonation performance (R$^2$ > 0.98). Feature importance analysis on this largest-to-date dataset reveals that oxygen balance is the dominant driver of detonation performance, complemented by contributions from local electronic structure, density, and the presence of specific functional groups. Cheminformatics analysis highlights how energetic materials with similar performance metrics tend to cluster in distinct chemical spaces offering a clearer direction for future synthesis studies. Together, the surrogate model, database, and resulting chemical insights provide a valuable foundation for high-throughput screening and targeted discovery of new energetic materials spanning diverse and previously unexplored regions of chemical space.

2604.08742 2026-04-13 math.OC cs.LG

Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate

Yaxin Yu, Long Chen, Zeyi Xu

Comments 27 pages, 4 figures

详情
英文摘要

Adam has achieved strong empirical success, but its theory remains incomplete even in the deterministic full-batch setting, largely because adaptive preconditioning and momentum are tightly coupled. In this work, a convergent reformulation of full-batch Adam is developed by combining variable and operator splitting with a curvature-aware gradient correction. This leads to a continuous-time Adam-HNAG flow with an exponentially decaying Lyapunov function, as well as two discrete methods: Adam-HNAG, and Adam-HNAG-s, a synchronous variant closer in form to Adam. Within a unified Lyapunov analysis framework, convergence guarantees are established for both methods in the convex smooth setting, including accelerated convergence. Numerical experiments support the theory and illustrate the different empirical behavior of the two discretizations. To the best of our knowledge, this provides the first convergence proof for Adam-type methods in convex optimization.

2604.08739 2026-04-13 cs.CR cs.LG

RansomTrack: A Hybrid Behavioral Analysis Framework for Ransomware Detection

Busra Caliskan, Ibrahim Gulatas, H. Hakan Kilinc, A. Halim Zaim

Comments 20 pages, 7 figures

详情
英文摘要

Ransomware poses a serious and fast-acting threat to critical systems, often encrypting files within seconds of execution. Research indicates that ransomware is the most reported cybercrime in terms of financial damage, highlighting the urgent need for early-stage detection before encryption is complete. In this paper, we present RansomTrack, a hybrid behavioral analysis framework to eliminate the limitations of using static and dynamic detection methods separately. Static features are extracted using the Radare2 sandbox, while dynamic behaviors such as memory protection changes, mutex creation, registry access and network activity are obtained using the Frida toolkit. Our dataset of 165 different ransomware and benign software families is publicly released, offering the highest family-to-sample ratio known in the literature. Experimental evaluation using machine learning models shows that ensemble classifiers such as XGBoost and Soft Voting achieve up to 96% accuracy and a ROC-AUC score of 0.99. Each sample analyzed in 9.1 seconds includes modular behavioral logging, runtime instrumentation, and SHAP-based interpretability to highlight the most influential features. Additionally, RansomTrack framework is able to detect ransomware under 9.2 seconds. Overall, RansomTrack offers a scalable, low-latency, and explainable solution for real-time ransomware detection.