arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1564
2603.25024 2026-03-27 stat.ML cs.LG

Improving Infinitely Deep Bayesian Neural Networks with Nesterov's Accelerated Gradient Method

Chenxu Yu, Wenqi Fang

详情
英文摘要

As a representative continuous-depth neural network approach, stochastic differential equation (SDE)-based Bayesian neural networks (BNNs) have attracted considerable attention due to their solid theoretical foundations and strong potential for real-world applications. However, their reliance on numerical SDE solvers inevitably incurs a large number of function evaluations (NFEs), resulting in high computational cost and occasional convergence instability. To address these challenges, we propose a Nesterov-accelerated gradient (NAG) enhanced SDE-BNN model. By integrating NAG into the SDE-BNN framework along with an NFE-dependent residual skip connection, our method accelerates convergence and substantially reduces NFEs during both training and testing. Extensive empirical results show that our model consistently outperforms conventional SDE-BNNs across various tasks, including image classification and sequence modeling, achieving lower NFEs and improved predictive accuracy.

2603.24986 2026-03-27 cs.HC cs.AI

Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators

Ray-Yuan Chung, Xuhai Xu, Ari Pollack

Comments Accepted in CHI '26 Workshop on Human-Agent Collaboration

详情
英文摘要

Large language model based health agents are increasingly used by health consumers and clinicians to interpret health information and guide health decisions. However, most AI systems in healthcare operate in siloed configurations, supporting individual users rather than the multi-stakeholder relationships central to healthcare. Such use can fragment understanding and exacerbate misalignment among patients, caregivers, and clinicians. We reframe AI not as a standalone assistant, but as a collaborator embedded within multi-party care interactions. Through a clinically validated fictional pediatric chronic kidney disease case study, we show that breakdowns in adherence stem from fragmented situational awareness and misaligned goals, and that siloed use of general-purpose AI tools does little to address these collaboration gaps. We propose a conceptual framework for designing AI collaborators that surface contextual information, reconcile mental models, and scaffold shared understanding while preserving human decision authority.

2603.24974 2026-03-27 math.OC cs.LG stat.ML

The Value of Information in Resource-Constrained Pricing

Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi

Comments Extended version of the NeurIPS 2025 paper (arXiv:2501.14155). This version adds phase transition, surrogate-assisted variance reduction under model misspecification, and numerical experiments

详情
英文摘要

Firms that price perishable resources -- airline seats, hotel rooms, seasonal inventory -- now routinely use demand predictions, but these predictions vary widely in quality. Under hard capacity constraints, acting on an inaccurate prediction can irreversibly deplete inventory needed for future periods. We study how prediction uncertainty propagates into dynamic pricing decisions with linear demand, stochastic noise, and finite capacity. A certified demand forecast with known error bound~$ε^0$ specifies where the system should operate: it shifts regret from $O(\sqrt{T})$ to $O(\log T)$ when $ε^0 \lesssim T^{-1/4}$, and we prove this threshold is tight. A misspecified surrogate model -- biased but correlated with true demand -- cannot set prices directly but reduces learning variance by a factor of $(1-ρ^2)$ through control variates. The two mechanisms compose: the forecast determines the regret regime; the surrogate tightens estimation within it. All algorithms rest on a boundary attraction mechanism that stabilizes pricing near degenerate capacity boundaries without requiring non-degeneracy assumptions. Experiments confirm the phase transition threshold, the variance reduction from surrogates, and robustness across problem instances.

2603.24968 2026-03-27 eess.IV cs.AI

Subject-Specific Low-Field MRI Synthesis via a Neural Operator

Ziqi Gao, Nicha Dvornek, Xiaoran Zhang, Gigi Galiana, Hemant Tagare, Todd Constable

Comments 11 pages, 2 figures, 2 tables

详情
英文摘要

Low-field (LF) magnetic resonance imaging (MRI) improves accessibility and reduces costs but generally has lower signal-to-noise ratios and degraded contrast compared to high field (HF) MRI, limiting its clinical utility. Simulating LF MRI from HF MRI enables virtual evaluation of novel imaging devices and development of LF algorithms. Existing low field simulators rely on noise injection and smoothing, which fail to capture the contrast degradation seen in LF acquisitions. To this end, we introduce an end-to-end LF-MRI synthesis framework that learns HF to LF image degradation directly from a small number of paired HF-LF MRIs. Specifically, we introduce a novel HF to LF coordinate-image decoupled neural operator (H2LO) to model the underlying degradation process, and tailor it to capture high-frequency noise textures and image structure. Experimental results in T1w and T2w MRI demonstrate that H2LO produces more faithful simulated low-field images than existing parameterized noise synthesis models and popular image-to-image translation models. Furthermore, it improves performance in downstream image enhancement tasks, showcasing its potential to enhance LF MRI diagnostic capabilities.

2603.24898 2026-03-27 cs.CR cs.AI cs.NI

Sovereign AI at the Front Door of Care: A Physically Unidirectional Architecture for Secure Clinical Intelligence

Vasu Srinivasan, Dhriti Vasu

Comments 31 pages

详情
英文摘要

We present a Sovereign AI architecture for clinical triage in which all inference is performed on-device and inbound data is delivered via a physically unidirectional channel, implemented using receive-only broadcast infrastructure or certified hardware data diodes, with no return path to any external network. This design removes the network-mediated attack surface by construction, rather than attempting to secure it through software controls. The system performs conversational symptom intake, integrates device-captured vitals, and produces structured, triage-aligned clinical records at the point of care. We formalize the security properties of receiver-side unidirectionality and show that the architecture is transport-agnostic across broadcast and diode-enforced deployments. We further analyze threat models, enforcement mechanisms, and deployment configurations, demonstrating how physical one-way data flow enables high-assurance operation in both resource-constrained and high-risk environments. This work positions physically unidirectional channels as a foundational primitive for sovereign, on-device clinical intelligence at the front door of care.

2603.24891 2026-03-27 cs.AR cs.AI

Surrogates, Spikes, and Sparsity: Performance Analysis and Characterization of SNN Hyperparameters on Hardware

Ilkin Aliyev, Jesus Lopez, Tosiron Adegbija

详情
Journal ref
IEEE International Symposium on Performance Analysis of Systems and Software 2026
英文摘要

Spiking Neural Networks (SNNs) offer inherent advantages for low-power inference through sparse, event-driven computation. However, the theoretical energy benefits of SNNs are often decoupled from real hardware performance due to the opaque relationship between training-time choices and inference-time sparsity. While prior work has focused on weight pruning and compression, the role of training hyperparameters -- specifically surrogate gradient functions and neuron model configurations -- in shaping hardware-level activation sparsity remains underexplored. This paper presents a workload characterization study quantifying the sensitivity of hardware latency to SNN hyperparameters. We decouple the impact of surrogate gradient functions (e.g., Fast Sigmoid, Spike Rate Escape) and neuron models (LIF, Lapicque) on classification accuracy and inference efficiency across three event-based vision datasets: DVS128-Gesture, N-MNIST, and DVS-CIFAR10. Our analysis reveals that standard accuracy metrics are poor predictors of hardware efficiency. While Fast Sigmoid achieves the highest accuracy on DVS-CIFAR10, Spike Rate Escape reduces inference latency by up to 12.2% on DVS128-Gesture with minimal accuracy trade-offs. We also demonstrate that neuron model selection is as critical as parameter tuning; transitioning from LIF to Lapicque neurons yields up to 28% latency reduction. We validate on a custom cycle-accurate FPGA-based SNN instrumentation platform, showing that sparsity-aware hyperparameter selection can improve accuracy by 9.1% and latency by over 2x compared to baselines. These findings establish a methodology for predicting hardware behavior from training parameters. The RTL and reproducibility artifacts are at https://zenodo.org/records/18893738.

2603.24877 2026-03-27 cs.HC cs.AI

More Than "Means to an End": Supporting Reasoning with Transparently Designed AI Data Science Processes

Venkatesh Sivaraman, Patrick Vossler, Adam Perer, Julian Hong, Jean Feng

Comments Accepted to Workshop on Tools for Thought at CHI'26: Understanding, Protecting, and Augmenting Human Cognition with Generative AI - From Vision to Implementation

详情
英文摘要

Generative artificial intelligence (AI) tools can now help people perform complex data science tasks regardless of their expertise. While these tools have great potential to help more people work with data, their end-to-end approach does not support users in evaluating alternative approaches and reformulating problems, both critical to solving open-ended tasks in high-stakes domains. In this paper, we reflect on two AI data science systems designed for the medical setting and how they function as tools for thought. We find that success in these systems was driven by constructing AI workflows around intentionally-designed intermediate artifacts, such as readable query languages, concept definitions, or input-output examples. Despite opaqueness in other parts of the AI process, these intermediates helped users reason about important analytical choices, refine their initial questions, and contribute their unique knowledge. We invite the HCI community to consider when and how intermediate artifacts should be designed to promote effective data science thinking.

2603.24857 2026-03-27 cs.CR cs.AI cs.CL cs.CV cs.LG

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan

Comments Published at Transactions on Machine Learning Research (TMLR)

详情
英文摘要

As machine learning (ML) systems expand in both scale and functionality, the security landscape has become increasingly complex, with a proliferation of attacks and defenses. However, existing studies largely treat these threats in isolation, lacking a coherent framework to expose their shared principles and interdependencies. This fragmented view hinders systematic understanding and limits the design of comprehensive defenses. Crucially, the two foundational assets of ML -- \textbf{data} and \textbf{models} -- are no longer independent; vulnerabilities in one directly compromise the other. The absence of a holistic framework leaves open questions about how these bidirectional risks propagate across the ML pipeline. To address this critical gap, we propose a \emph{unified closed-loop threat taxonomy} that explicitly frames model-data interactions along four directional axes. Our framework offers a principled lens for analyzing and defending foundation models. The resulting four classes of security threats represent distinct but interrelated categories of attacks: (1) Data$\rightarrow$Data (D$\rightarrow$D): including \emph{data decryption attacks and watermark removal attacks}; (2) Data$\rightarrow$Model (D$\rightarrow$M): including \emph{poisoning, harmful fine-tuning attacks, and jailbreak attacks}; (3) Model$\rightarrow$Data (M$\rightarrow$D): including \emph{model inversion, membership inference attacks, and training data extraction attacks}; (4) Model$\rightarrow$Model (M$\rightarrow$M): including \emph{model extraction attacks}. Our unified framework elucidates the underlying connections among these security threats and establishes a foundation for developing scalable, transferable, and cross-modal security strategies, particularly within the landscape of foundation models.

2603.24849 2026-03-27 cs.HC cs.AI cs.CV cs.CY

Gaze patterns predict preference and confidence in pairwise AI image evaluation

Nikolas Papadopoulos, Shreenithi Navaneethan, Sheng Bai, Ankur Samanta, Paul Sajda

Comments This paper has been accepted to ACM ETRA 2026

详情
英文摘要

Preference learning methods, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), rely on pairwise human judgments, yet little is known about the cognitive processes underlying these judgments. We investigate whether eye-tracking can reveal preference formation during pairwise AI-generated image evaluation. Thirty participants completed 1,800 trials while their gaze was recorded. We replicated the gaze cascade effect, with gaze shifting toward chosen images approximately one second before the decision. Cascade dynamics were consistent across confidence levels. Gaze features predicted binary choice (68% accuracy), with chosen images receiving more dwell time, fixations, and revisits. Gaze transitions distinguished high-confidence from uncertain decisions (66% accuracy), with low-confidence trials showing more image switches per second. These results show that gaze patterns predict both choice and confidence in pairwise image evaluations, suggesting that eye-tracking provides implicit signals relevant to the quality of preference annotations.

2603.24825 2026-03-27 cs.SE cs.AI

Learning From Developers: Towards Reliable Patch Validation at Scale for Linux

Chih-En Lin, Attreyee Mukherjee, Ajay Rawat, Ruqi Zhang, Pedro Fonseca

Comments Submitted to OSDI'26

详情
英文摘要

Patch reviewing is critical for software development, especially in distributed open-source development, which highly depends on voluntary work, such as Linux. This paper studies the past 10 years of patch reviews of the Linux memory management subsystem to characterize the challenges involved in patch reviewing at scale. Our study reveals that the review process is still primarily reliant on human effort despite a wide-range of automatic checking tools. Although kernel developers strive to review all patch proposals, they struggle to keep up with the increasing volume of submissions and depend significantly on a few developers for these reviews. To help scale the patch review process, we introduce FLINT, a patch validation system framework that synthesizes insights from past discussions among developers and automatically analyzes patch proposals for compliance. FLINT employs a rule-based analysis informed by past discussions among developers and an LLM that does not require training or fine-tuning on new data, and can continuously improve with minimum human effort. FLINT uses a multi-stage approach to efficiently distill the essential information from past discussions. Later, when a patch proposal needs review, FLINT retrieves the relevant validation rules for validation and generates a reference-backed report that developers can easily interpret and validate. FLINT targets bugs that traditional tools find hard to detect, ranging from maintainability issues, e.g., design choices and naming conventions, to complex concurrency issues, e.g., deadlocks and data races. FLINT detected 2 new issues in Linux v6.18 development cycle and 7 issues in previous versions. FLINT achieves 21% and 14% of higher ground-truth coverage on concurrency bugs than the baseline with LLM only. Moreover, FLINT achieves a 35% false positive rate, which is lower than the baseline.

2603.24775 2026-03-27 cs.CR cs.AI

AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A

Sunil Prakash

Comments 17 pages, 10 tables, 2 figures

详情
英文摘要

AI agents increasingly call tools via the Model Context Protocol (MCP) and delegate to other agents via Agent-to-Agent (A2A), yet neither protocol verifies agent identity. A scan of approximately 2,000 MCP servers found all lacked authentication. In our survey, we did not identify a prior implemented protocol that jointly combines public-key verifiable delegation, holder-side attenuation, expressive chained policy, transport bindings across MCP/A2A/HTTP, and provenance-oriented completion records. We introduce Invocation-Bound Capability Tokens (IBCTs), a primitive that fuses identity, attenuated authorization, and provenance binding into a single append-only token chain. IBCTs operate in two wire formats: compact mode (a signed JWT for single-hop cases) and chained mode (a Biscuit token with Datalog policies for multi-hop delegation). We provide reference implementations in Python and Rust with full cross-language interoperability. Compact mode verification takes 0.049ms (Rust) and 0.189ms (Python), with 0.22ms overhead over no-auth in real MCP-over-HTTP deployment. In a real multi-agent deployment with Gemini 2.5 Flash, AIP adds 2.35ms of overhead (0.086% of total end-to-end latency). Adversarial evaluation across 600 attack attempts shows 100% rejection rate, with two attack categories (delegation depth violation and audit evasion through empty context) uniquely caught by AIP's chained delegation model that neither unsigned nor plain JWT deployments detect.

2603.24774 2026-03-27 cs.SE cs.AI

From Untestable to Testable: Metamorphic Testing in the Age of LLMs

Valerio Terragni

Comments Accepted for publication at IEEE Computer Magazine. This is the authors' accepted manuscript. Version of record available via DOI: 10.1109/MC.2026.3671990

详情
Journal ref
IEEE Computer Magazine, Software Engineering Column, September 2026
英文摘要

This article discusses the challenges of testing software systems with increasingly integrated AI and LLM functionalities. LLMs are powerful but unreliable, and labeled ground truth for testing rarely scales. Metamorphic Testing solves this by turning relations among multiple test executions into executable test oracles.

2603.24752 2026-03-27 physics.chem-ph cs.LG

Autotuning T-PaiNN: Enabling Data-Efficient GNN Interatomic Potential Development via Classical-to-Quantum Transfer Learning

Vivienne Pelletier, Vedant Bhat, Daniel J. Rivera, Steven A. Wilson, Christopher L. Muhich

Comments 19 pages, 7 figures

详情
英文摘要

Machine-learned interatomic potentials (MLIPs), particularly graph neural network (GNN)-based models, offer a promising route to achieving near-density functional theory (DFT) accuracy at significantly reduced computational cost. However, their practical deployment is often limited by the large volumes of expensive quantum mechanical training data required. In this work, we introduce a transfer learning framework, Transfer-PaiNN (T-PaiNN), that substantially improves the data efficiency of GNN-MLIPs by leveraging inexpensive classical force field data. The approach consists of pretraining a PaiNN MLIP architecture on large-scale datasets generated from classical molecular simulations, followed by fine-tuning (dubbed autotuning) using a comparatively small DFT dataset. We demonstrate the effectiveness of autotuning T-PaiNN on both gas-phase molecular systems (QM9 dataset) and condensed-phase liquid water. Across all cases, T-PaiNN significantly outperforms models trained solely on DFT data, achieving order-of-magnitude reductions in mean absolute error while accelerating training convergence. For example, using the QM9 data set, error reductions of up to 25 times are observed in low-data regimes, while liquid water simulations show improved predictions of energies, forces, and experimentally relevant properties such as density and diffusion. These gains arise from the model's ability to learn general features of the potential energy surface from extensive classical sampling, which are subsequently refined to quantum accuracy. Overall, this work establishes transfer learning from classical force fields as a practical and computationally efficient strategy for developing high-accuracy, data-efficient GNN interatomic potentials, enabling broader application of MLIPs to complex chemical systems.

2603.24738 2026-03-27 cs.DC cs.AI cs.LG cs.MA

Decentralized Task Scheduling in Distributed Systems: A Deep Reinforcement Learning Approach

Daniel Benniah John

Comments 12 pages, 8 figures. Under review. Code available at GitHub

详情
英文摘要

Efficient task scheduling in large-scale distributed systems presents significant challenges due to dynamic workloads, heterogeneous resources, and competing quality-of-service requirements. Traditional centralized approaches face scalability limitations and single points of failure, while classical heuristics lack adaptability to changing conditions. This paper proposes a decentralized multi-agent deep reinforcement learning (DRL-MADRL) framework for task scheduling in heterogeneous distributed systems. We formulate the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and develop a lightweight actor-critic architecture implemented using only NumPy, enabling deployment on resource-constrained edge devices without heavyweight machine learning frameworks. Using workload characteristics derived from the publicly available Google Cluster Trace dataset, we evaluate our approach on a 100-node heterogeneous system processing 1,000 tasks per episode over 30 experimental runs. Experimental results demonstrate 15.6% improvement in average task completion time (30.8s vs 36.5s for random baseline), 15.2% energy efficiency gain (745.2 kWh vs 878.3 kWh), and 82.3% SLA satisfaction compared to 75.5% for baselines, with all improvements statistically significant (p < 0.001). The lightweight implementation requires only NumPy, Matplotlib, and SciPy. Complete source code and experimental data are provided for full reproducibility at https://github.com/danielbenniah/marl-distributed-scheduling.

2603.24704 2026-03-27 stat.ME cs.LG stat.AP stat.ML

Conformal Selective Prediction with General Risk Control

Tian Bai, Ying Jin

详情
英文摘要

In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over cases where the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded and continuously-valued risk. SCoRE offers two types of guarantees on the risk among ``positive'' cases in which the system opts to trust the model. Built upon conformal inference and hypothesis testing ideas, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. Such a property is ensured by data exchangeability without requiring any modeling assumptions. Passing these e-values on to hypothesis testing procedures, we yield the binary trust decisions with finite-sample error control. SCoRE avoids the need of uniform concentration, and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.

2603.24703 2026-03-27 cs.SE cs.RO cs.SY eess.SY

IndustriConnect: MCP Adapters and Mock-First Evaluation for AI-Assisted Industrial Operations

Melwin Xavier, Melveena Jolly, Vaisakh M A, Midhun Xavier

详情
英文摘要

AI assistants can decompose multi-step workflows, but they do not natively speak industrial protocols such as Modbus, MQTT/Sparkplug B, or OPC UA, so this paper presents INDUSTRICONNECT, a prototype suite of Model Context Protocol (MCP) adapters that expose industrial operations as schema-discoverable AI tools while preserving protocol-specific connectivity and safety controls; the system uses a common response envelope and a mock-first workflow so adapter behavior can be exercised locally before connecting to plant equipment, and a deterministic benchmark covering normal, fault-injected, stress, and recovery scenarios evaluates the flagship adapters, comprising 870 runs (480 normal, 210 fault-injected, 120 stress, 60 recovery trials) and 2820 tool calls across 7 fault scenarios and 12 stress scenarios, where the normal suite achieved full success, the fault suite confirmed structured error handling with adapter-level uint16 range validation, the stress suite identified concurrency boundaries, and same-session recovery after endpoint restart is demonstrated for all three protocols, with results providing evidence spanning adapter correctness, concurrency behavior, and structured error handling for AI-assisted industrial operations.

2603.24692 2026-03-27 cs.NE cs.AI

Reconstructing Spiking Neural Networks Using a Single Neuron with Autapses

Wuque Cai, Hongze Sun, Quan Tang, Shifeng Mao, Zhenxing Wang, Jiayi He, Duo Chen, Dezhong Yao, Daqing Guo

详情
英文摘要

Spiking neural networks (SNNs) are promising for neuromorphic computing, but high-performing models still rely on dense multilayer architectures with substantial communication and state-storage costs. Inspired by autapses, we propose time-delayed autapse SNN (TDA-SNN), a framework that reconstructs SNNs with a single leaky integrate-and-fire neuron and a prototype-learning-based training strategy. By reorganizing internal temporal states, TDA-SNN can realize reservoir, multilayer perceptron, and convolution-like spiking architectures within a unified framework. Experiments on sequential, event-based, and image benchmarks show competitive performance in reservoir and MLP settings, while convolutional results reveal a clear space--time trade-off. Compared with standard SNNs, TDA-SNN greatly reduces neuron count and state memory while increasing per-neuron information capacity, at the cost of additional temporal latency in extreme single-neuron settings. These findings highlight the potential of temporally multiplexed single-neuron models as compact computational units for brain-inspired computing.

2603.24634 2026-03-27 cs.NI cs.AI cs.LG

Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

Matteo Salvatori, Filippo Vannella, Sebastian Macaluso, Stylianos E. Trevlakis, Carlos Segura Perales, José Suarez-Varela, Alexandros-Apostolos A. Boulogeorgos, Ioannis Arapakis

详情
英文摘要

HandOver (HO) control in cellular networks is governed by a set of HO control parameters that are traditionally configured through rule-based heuristics. A key parameter for HO optimization is the Cell Individual Offset (CIO), defined for each pair of neighboring cells and used to bias HO triggering decisions. At network scale, tuning CIOs becomes a tightly coupled problem: small changes can redirect mobility flows across multiple neighbors, and static rules often degrade under non-stationary traffic and mobility. We exploit the pairwise structure of CIOs by formulating HO optimization as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) on the network's dual graph. In this representation, each agent controls a neighbor-pair CIO and observes Key Performance Indicators (KPIs) aggregated over its local dual-graph neighborhood, enabling scalable decentralized decisions while preserving graph locality. Building on this formulation, we propose TD3-D-MA, a discrete Multi-Agent Reinforcement Learning (MARL) variant of the TD3 algorithm with a shared-parameter Graph Neural Network (GNN) actor operating on the dual graph and region-wise double critics for training, improving credit assignment in dense deployments. We evaluate TD3-D-MA in an ns-3 system-level simulator configured with real-world network operator parameters across heterogeneous traffic regimes and network topologies. Results show that TD3-D-MA improves network throughput over standard HO heuristics and centralized RL baselines, and generalizes robustly under topology and traffic shifts.

2603.24629 2026-03-27 cs.SE cs.AI cs.MA cs.SY eess.SY

Sketch2Simulation: Automating Flowsheet Generation via Multi Agent Large Language Models

Abdullah Bahamdan, Emma Pajak, John D. Hedengren, Antonio del Rio Chanona

Comments 27 pages, 14 figures, 8 tables

详情
英文摘要

Converting process sketches into executable simulation models remains a major bottleneck in process systems engineering, requiring substantial manual effort and simulator-specific expertise. Recent advances in generative AI have improved both engineering-diagram interpretation and LLM-assisted flowsheet generation, but these remain largely disconnected: diagram-understanding methods often stop at extracted graphs, while text-to-simulation workflows assume structured inputs rather than raw visual artifacts. To bridge this gap, we present an end-to-end multi-agent large language model system that converts process diagrams directly into executable Aspen HYSYS flowsheets. The framework decomposes the task into three coordinated layers: diagram parsing and interpretation, simulation model synthesis, and multi-level validation. Specialized agents handle visual interpretation, graph-based intermediate representation construction, code generation for the HYSYS COM interface, execution, and structural verification. We evaluate the framework on four chemical engineering case studies of increasing complexity, from a simple desalting process to an industrial aromatic production flowsheet with multiple recycle loops. The system produces executable HYSYS models in all cases, achieving complete structural fidelity on the two simpler cases and strong performance on the more complex ones, with connection consistency above 0.93 and stream consistency above 0.96. These results demonstrate a viable end-to-end sketch-to-simulation workflow while highlighting remaining challenges in dense recycle structures, implicit diagram semantics, and simulator-interface constraints.

2603.24618 2026-03-27 cs.AR cs.AI cs.LG

Causal AI For AMS Circuit Design: Interpretable Parameter Effects Analysis

Mohyeu Hussain, David Koblah, Reiner Dizon-Paradis, Domenic Forte

详情
英文摘要

Analog-mixed-signal (AMS) circuits are highly non-linear and operate on continuous real-world signals, making them far more difficult to model with data-driven AI than digital blocks. To close the gap between structured design data (device dimensions, bias voltages, etc.) and real-world performance, we propose a causal-inference framework that first discovers a directed-acyclic graph (DAG) from SPICE simulation data and then quantifies parameter impact through Average Treatment Effect (ATE) estimation. The approach yields human-interpretable rankings of design knobs and explicit 'what-if' predictions, enabling designers to understand trade-offs in sizing and topology. We evaluate the pipeline on three operational-amplifier families (OTA, telescopic, and folded-cascode) implemented in TSMC 65nm and benchmark it against a baseline neural-network (NN) regressor. Across all circuits the causal model reproduces simulation-based ATEs with an average absolute error of less than 25%, whereas the neural network deviates by more than 80% and frequently predicts the wrong sign. These results demonstrate that causal AI provides both higher accuracy and explainability, paving the way for more efficient, trustworthy AMS design automation.

2603.24617 2026-03-27 cs.DS cs.LG math.OC

Multi-LLM Query Optimization

Arlen Dean, Zijin Zhang, Stefanus Jasin, Yuqing Liu

详情
英文摘要

Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/α_{\min}) / \log(1/α_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum.

2603.24601 2026-03-27 eess.SP cs.AI cs.LG

FED-HARGPT: A Hybrid Centralized-Federated Approach of a Transformer-based Architecture for Human Context Recognition

Wandemberg Gibaut, Alexandre Osorio, Amparo Munoz, Sildolfo F. G. Neto, Fabio Grassiotto

Comments Paper presented on: July 2025 Conference: XVII Simpósio Brasileiro de Automação Inteligente (SBAI) At: São João del-Rei

详情
英文摘要

The study explores a hybrid centralized-federated approach for Human Activity Recognition (HAR) using a Transformer-based architecture. With the increasing ubiquity of edge devices, such as smartphones and wearables, a significant amount of private data from wearable and inertial sensors is generated, facilitating discreet monitoring of human activities, including resting, sleeping, and walking. This research focuses on deploying HAR technologies using mobile sensor data and leveraging Federated Learning within the Flower framework to evaluate the training of a federated model derived from a centralized baseline. The experimental results demonstrate the effectiveness of the proposed hybrid approach in improving the accuracy and robustness of HAR models while preserving data privacy in a non-IID data scenario. The federated learning setup demonstrated comparable performance to centralized models, highlighting the potential of federated learning to strike a balance between data privacy and model performance in real-world applications.

2603.24599 2026-03-27 eess.SP cs.AI

A Learnable SIM Paradigm: Fundamentals, Training Techniques, and Applications

Hetong Wang, Yashuai Cao, Tiejun Lv

Comments 9 pages, 5 figures, accepted by IEEE Wireless Communications Magazine

详情
英文摘要

Stacked intelligent metasurfaces (SIMs) represent a breakthrough in wireless hardware by comprising multilayer, programmable metasurfaces capable of analog computing in the electromagnetic (EM) wave domain. By examining their architectural analogies, this article reveals a deeper connection between SIMs and artificial neural networks (ANNs). Leveraging this profound structural similarity, this work introduces a learnable SIM architecture and proposes a learnable SIM-based machine learning (ML) paradigm for sixth-generation (6G)-andbeyond systems. Then, we develop two SIM-empowered wireless signal processing schemes to effectively achieve multi-user signal separation and distinguish communication signals from jamming signals. The use cases highlight that the proposed SIM-enabled signal processing system can significantly enhance spectrum utilization efficiency and anti-jamming capability in a lightweight manner and pave the way for ultra-efficient and intelligent wireless infrastructures.

2603.24598 2026-03-27 math.OC cs.LG cs.SY eess.SY

Response-Aware Risk-Constrained Control Barrier Function With Application to Vehicles

Qijun Liao, Jue Yang

Comments 22 pages, 20 figures

详情
英文摘要

This paper proposes a unified control framework based on Response-Aware Risk-Constrained Control Barrier Function for dynamic safety boundary control of vehicles. Addressing the problem of physical model parameter mismatch, the framework constructs an uncertainty propagation model that fuses nominal dynamics priors with direct vehicle body responses. Utilizing simplified single-track dynamics to provide a baseline direction for control gradients and covering model deviations through statistical analysis of body response signals, the framework eliminates the dependence on accurate online estimation of road surface adhesion coefficients. By introducing Conditional Value at Risk (CVaR) theory, the framework reformulates traditional deterministic safety constraints into probabilistic constraints on the tail risk of barrier function derivatives. Combined with a Bayesian online learning mechanism based on inverse Wishart priors, it identifies environmental noise covariance in real-time, adaptively tuning safety margins to reduce performance loss under prior parameter mismatch. Finally, based on Control Lyapunov Function (CLF), a unified Second-Order Cone Programming (SOCP) controller is constructed. Theoretical analysis establishes convergence of Sequential Convex Programming to local Karush-Kuhn-Tucker points and provides per-step probabilistic safety bounds. High-fidelity dynamics simulations demonstrate that under extreme conditions, the method not only eliminates the output divergence phenomenon of traditional methods but also achieves Pareto improvement in both safety and tracking performance. For the chosen risk level, the per-step safety violation probability is theoretically bounded by approximately 2%, validated through high-fidelity simulations showing zero boundary violations across all tested scenarios.

2603.24595 2026-03-27 cs.PL cs.AI

Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels

Mengting He, Shihao Xia, Haomin Jia, Wenfei Wu, Linhai Song

详情
英文摘要

The widespread adoption of large language models (LLMs) has made GPU-accelerated inference a critical part of modern computing infrastructure. Production inference systems rely on CUDA kernels to implement core transformer operations, yet these kernels are highly susceptible to memory-safety bugs due to model-dependent tensor layouts, intricate memory indexing, and massive thread-level parallelism. Such bugs can corrupt model weights, crash inference services, or even enable adversarial attacks. Existing techniques either depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths, and none can effectively detect CUDA memory bugs in LLM inference systems. This paper presents Model2Kernel, the first practical system for automatically verifying the memory safety of CUDA kernels used in LLM inference. Model2Kernel performs model-aware dynamic analysis to determine how each model invokes kernels and to classify kernel arguments as either fixed by the model architecture or controlled by model users. Using this information, Model2Kernel then applies CUDA-specialized symbolic execution, supported by new abstractions for dynamic tensor memory and thread identifiers, to accurately pinpoint memory bugs in kernels. In the evaluation on CUDA kernels and models from vLLM, Hugging Face, and recent LLM research papers, Model2Kernel discovers 353 previously unknown bugs while producing only nine false positives, demonstrating its effectiveness.

2603.24477 2026-03-27 cs.SE cs.LG

Composer 2 Technical Report

Cursor Research, :, Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair, Charlie Snell, Chen Lu, Chen Shen, Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner, Julius Vering, Kevin Frans, Lee Danilek, Less Wright, Lujing Cen, Luke Melas-Kyriazi, Michael Truell, Michiel de Jong, Naman Jain, Nate Schmidt, Nathan Wang, Niklas Muennighoff, Oleg Rybkin, Paul Loh, Phillip Kravtsov, Rishabh Yadav, Sahil Shah, Sam Kottler, Alexander M Rush, Shengtong Zhang, Shomil Jain, Sriram Sankar, Stefan Heule, Stuart H. Sul, Sualeh Asif, Victor Rong, Wanqi Zhu, William Lin, Yuchen Wu, Yuri Volkov, Yury Zemlyanskiy, Zack Holbrook, Zhiyuan Zhang

详情
英文摘要

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step execution, and coherence on long-horizon realistic coding problems. We develop infrastructure to support training in the same Cursor harness that is used by the deployed model, with equivalent tools and structure, and use environments that match real problems closely. To measure the ability of the model on increasingly difficult tasks, we introduce a benchmark derived from real software engineering problems in large codebases including our own. Composer 2 is a frontier-level coding model and demonstrates a process for training strong domain-specialized models. On our CursorBench evaluations the model achieves a major improvement in accuracy compared to previous Composer models (61.3). On public benchmarks the model scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual in our harness, comparable to state-of-the-art systems.

2603.21152 2026-03-27 physics.geo-ph cs.AI

TRACE: A Multi-Agent System for Autonomous Physical Reasoning for Seismology

Feng Liu, Jian Xu, Xin Cui, Xinghao Wang, Zijie Guo, Jiong Wang, S. Mostafa Mousavi, Xinyu Gu, Hao Chen, Ben Fei, Lihua Fang, Fenghua Ling, Zefeng Li, Lei Bai

Comments 25 pages for main text and 164 pages for appendices

详情
英文摘要

Inferring physical mechanisms that govern earthquake sequences from geophysical observations remains a challenging task, particularly across tectonically distinct environments where similar seismic patterns can reflect different underlying processes. Current seismological processing and interpretation rely heavily on experts' choice of parameters and the synthesis of various seismological products, limiting reproducibility and the formation of generalizable knowledge across settings. Here we present TRACE (Trans-perspective Reasoning and Automated Comprehensive Evaluator), a multi-agent system that combines large language model planning with formal seismological constraints to derive auditable, physically grounded mechanistic inferences from raw observations. Applied to the 2019 Ridgecrest sequence, TRACE autonomously identifies stress-perturbation-induced delayed triggering, resolving the cascading interaction between the Mw 6.4 and Mw 7.1 mainshocks. For the 2025 Santorini-Kolumbo volcanic eruption, the system identifies a structurally guided intrusion model, distinguishing episodic migration via fault channels from the continuous propagation expected in homogeneous crustal failure. By providing a generalizable infrastructure for deriving physical insights from seismic phenomena, TRACE advances the field from expert-dependent analysis toward knowledge-guided autonomous discovery in Earth sciences.

2603.20283 2026-03-27 cs.IR cs.LG

FastPFRec: A Fast Personalized Federated Recommendation with Secure Sharing

Zhenxing Yan, Jidong Yuan, Yongqi Sun, Haiyang Liu, Zhihui Gao

详情
Journal ref
Expert Systems with Applications, 2026, 319: 132135
英文摘要

Graph neural network (GNN)-based federated recommendation systems effectively capture user-item relationships while preserving data privacy. However, existing methods often face slow convergence on graph data and privacy leakage risks during collaboration. To address these challenges, we propose FastPFRec (Fast Personalized Federated Recommendation with Secure Sharing), a novel framework that enhances both training efficiency and data security. FastPFRec accelerates model convergence through an efficient local update strategy and introduces a privacy-aware parameter sharing mechanism to mitigate leakage risks. Experiments on four real-world datasets (Yelp, Kindle, Gowalla-100k, and Gowalla-1m) show that FastPFRec achieves 32.0% fewer training rounds, 34.1% shorter training time, and 8.1% higher accuracy compared with existing baselines. These results demonstrate that FastPFRec provides an efficient and privacy-preserving solution for scalable federated recommendation.

2603.18865 2026-03-27 eess.SY cs.LG cs.SY

RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelity Radio Map Construction

Xiucheng Wang, Zixuan Guo, Nan Cheng

详情
英文摘要

Radio maps (RMs) provide spatially continuous propagation characterizations essential for 6G network planning, but high-fidelity RM construction remains challenging. Rigorous electromagnetic solvers incur prohibitive computational latency, while data-driven models demand massive labeled datasets and generalize poorly from simplified simulations to complex multipath environments. This paper proposes RadioDiff-FS, a few-shot diffusion framework that adapts a pretrained main-path generator to multipath-rich target domains with only a small number of high-fidelity samples. The adaptation is grounded in a theoretical decomposition of the multipath RM into a dominant main-path component and a directionally sparse residual. This decomposition shows that the cross-domain shift corresponds to a bounded and geometrically structured feature translation rather than an arbitrary distribution change. A direction-consistency loss (DCL) is then introduced to constrain diffusion score updates along physically plausible propagation directions, thereby suppressing phase-inconsistent artifacts that arise in the low-data regime. Experiments show that RadioDiff-FS reduces NMSE by 59.5\% on static RMs and by 74.0\% on dynamic RMs relative to the vanilla diffusion baseline, achieving an SSIM of 0.9752 and a PSNR of 36.37 dB under severely limited supervision. Even in a one-shot setting with a single target-domain sample per scene, RadioDiff-FS outperforms all fully supervised baselines, confirming that the directional constraint provides an effective inductive bias under extreme data scarcity. Code is available at https://github.com/UNIC-Lab/RadioDiff-FS.

2603.11413 2026-03-27 cs.HC cs.AI

Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

David Fraile Navarro, Farah Magrabi, Enrico Coiera

Comments 12 pages

详情
英文摘要

Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluation used an exam-style protocol -- forced A/B/C/D output, knowledge suppression, and suppression of clarifying questions -- that differs fundamentally from how consumers use health chatbots. We tested five frontier LLMs (GPT-5.2, Claude Sonnet 4.6, Claude Opus 4.6, Gemini 3 Flash, Gemini 3.1 Pro) on a 17-scenario partial replication bank under constrained (exam-style, 1,275 trials) and naturalistic (patient-style messages, 850 trials) conditions, with targeted ablations and prompt-faithful checks using the authors' released prompts. Naturalistic interaction improved triage accuracy by 6.4 percentage points ($p = 0.015$). Diabetic ketoacidosis was correctly triaged in 100% of trials across all models and conditions. Asthma triage improved from 48% to 80%. The forced A/B/C/D format was the dominant failure mechanism: three models scored 0--24% with forced choice but 100% with free text (all $p < 10^{-8}$), consistently recommending emergency care in their own words while the forced-choice format registered under-triage. Prompt-faithful checks on the authors' exact released prompts confirmed the scaffold produces model-dependent, case-dependent results. Our results suggest that the headline under-triage rate is highly contingent on evaluation format and may not generalize as a stable estimate of deployed triage behavior. Valid evaluation of consumer health AI requires testing under conditions that reflect actual use.