arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2603.05693 2026-03-09 eess.IV cs.AI cs.CV

Longitudinal Lesion Inpainting in Brain MRI via 3D Region Aware Diffusion

Zahra Karimaghaloo, Dumitru Fetco, Haz-Edine Assemlal, Hassan Rivaz, Douglas L. Arnold


Accurate longitudinal analysis of brain MRI is often hindered by evolving lesions, which bias automated neuroimaging pipelines. While deep generative models have shown promise in inpainting these lesions, most existing methods operate cross-sectionally or lack 3D anatomical continuity. We present a novel pseudo-3D longitudinal inpainting framework based on Denoising Diffusion Probabilistic Models (DDPM). Our approach utilizes multi-channel conditioning to incorporate longitudinal context from distinct visits (t_1, t_2) and extends Region-Aware Diffusion (RAD) to the medical domain, focusing the generative process on pathological regions without altering surrounding healthy tissue. We evaluated our model against state-of-the-art baselines on longitudinal brain MRI from 93 patients. Our model significantly outperforms the leading baseline (FastSurfer-LIT) in terms of perceptual fidelity, reducing the Learned Perceptual Image Patch Similarity (LPIPS) distance from 0.07 to 0.03 while effectively eliminating inter-slice discontinuities. Furthermore, our model demonstrates high longitudinal stability with a Temporal Fidelity Index (TFI) of 1.024, closely approaching the ideal value of 1.0 and substantially narrowing the gap compared to LIT's TFI of 1.22. Notably, the RAD mechanism provides a substantial gain in efficiency; our framework achieves an average processing time of 2.53 min per volume, representing an approximately 10x speedup over the 24.30 min required by LIT. By leveraging longitudinal priors and region-specific denoising, our framework provides a highly reliable and efficient preprocessing step for the study of progressive neurodegenerative diseases. A derivative dataset consisting of 93 pre-processed scans used for testing will be available upon request after acceptance. Code will be released upon acceptance.

2603.05692 2026-03-09 cs.DC cs.LG cs.PF

Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks

Burak Topcu, Musa Oguzhan Cim, Poovaiah Palangappa, Meena Arunachalam, Mahmut Taylan Kandemir

Comments 17 pages, 8 figures, 3 tables


Breakthroughs in the generative AI domain have fueled an explosion of large language model (LLM)-powered applications, whose workloads fundamentally consist of sequences of inferences through transformer architectures. Within this rapidly expanding ecosystem, dense LLMs--those that activate all model parameters for each token generation--form the foundation for advanced expert-based variants. Dense models continue to dominate because of their strong generalization ability, scalability, ease of fine-tuning, and versatility across diverse tasks. In LLM inference systems, performance is mainly characterized by latency, response time, and throughput (i.e., tokens generated per unit of time). Latency and throughput are inherently coupled: optimizing for one often comes at the expense of the other. Moreover, batching strategies and parallelism configurations, which are essential when dense model parameters exceed device memory capacity, can significantly affect both latency and overall system throughput. This paper (i) investigates the workloads of two representative dense LLMs--Llama-3.1-70B and Llama-3.1-405B, focusing in particular on intra-node parallelization schemes, (ii) analyzes how input characteristics, batching, and parallelism strategies influence latency flexibility and the latency-throughput tradeoff, and (iii) identifies key performance bottlenecks that inform design choices for meeting service-level agreements (SLAs) and sustaining inference quality. Our empirical evaluations reveal that Tensor Parallelism (TP) improves the latency objectives while Pipeline Parallelism (PP) is better-suited for throughput-oriented applications. We highlight that their hybrid usage by controlling the TP and PP degrees provides control over the latency-throughput interplay.

2603.05689 2026-03-09 cs.CR cs.AI

SecureRAG-RTL: A Retrieval-Augmented, Multi-Agent, Zero-Shot LLM-Driven Framework for Hardware Vulnerability Detection

Touseef Hasan, Blessing Airehenbuwa, Nitin Pundir, Souvika Sarkar, Ujjwal Guin


Large language models (LLMs) have shown remarkable capabilities in natural language processing tasks, yet their application in hardware security verification remains limited due to the scarcity of publicly available hardware description language (HDL) datasets. This knowledge gap constrains LLM performance in detecting vulnerabilities within HDL designs. To address this challenge, we propose SecureRAG-RTL, a novel Retrieval-Augmented Generation (RAG)-based approach that significantly enhances LLM-based security verification of hardware designs. Our approach integrates domain-specific retrieval with generative reasoning, enabling models to overcome inherent limitations in hardware security expertise. We establish baseline vulnerability detection rates using prompt-only methods and then demonstrate that SecureRAG-RTL achieves substantial improvements across diverse LLM architectures, regardless of size. On average, our method increases detection accuracy by about 30%, highlighting its effectiveness in bridging domain knowledge gaps. For evaluation, we curated and annotated a benchmark dataset of 14 HDL designs containing real-world security vulnerabilities, which we will release publicly to support future research. These findings underscore the potential of RAG-driven augmentation to enable scalable, efficient, and accurate hardware security verification workflows.

2603.05681 2026-03-09 eess.IV cs.CV

Gabor Primitives for Accelerated Cardiac Cine MRI Reconstruction

Wenqi Huang, Veronika Spieker, Nil Stolt-Ansó, Natascha Niessen, Maik Dannecker, Sevgi Gokce Kafali, Sila Kurugol, Julia A. Schnabel, Daniel Rueckert


Accelerated cardiac cine MRI requires reconstructing spatiotemporal images from highly undersampled k-space data. Implicit neural representations (INRs) enable scan-specific reconstruction without large training datasets, but encode content implicitly in network weights without physically interpretable parameters. Gaussian primitives provide an explicit and geometrically interpretable alternative, but their spectra are confined near the k-space origin, limiting high-frequency representation. We propose Gabor primitives for MRI reconstruction, modulating each Gaussian envelope with a complex exponential to place its spectral support at an arbitrary k-space location, enabling efficient representation of both smooth structures and sharp boundaries. To exploit spatiotemporal redundancy in cardiac cine, we decompose per-primitive temporal variation into a low-rank geometry basis capturing cardiac motion and a signal-intensity basis modeling contrast changes. Experiments on cardiac cine data with Cartesian and radial trajectories show that Gabor primitives consistently outperform compressed sensing, Gaussian primitives, and hash-grid INR baselines, while providing a compact, continuous-resolution representation with physically meaningful parameters.
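The spectral argument in this abstract (a plain Gaussian sits at the k-space origin, while modulation by a complex exponential moves it elsewhere) can be illustrated with a minimal 1D sketch. This is not the authors' implementation: the function names, the 64-sample grid, and the parameter values are all assumptions chosen for illustration, and the paper's primitives are spatiotemporal rather than 1D.

```python
import cmath
import math

def gabor_1d(x, center, sigma, k0):
    """Gaussian envelope modulated by a complex exponential.

    k0 is the modulation frequency in cycles per sample; k0 = 0 gives a
    plain Gaussian primitive. All parameter names are illustrative.
    """
    envelope = math.exp(-0.5 * ((x - center) / sigma) ** 2)
    return envelope * cmath.exp(2j * math.pi * k0 * x)

def dft_magnitudes(signal):
    """Magnitude of a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal)))
        for k in range(n)
    ]

def peak_bin(signal):
    """Index of the frequency bin with the largest magnitude."""
    mags = dft_magnitudes(signal)
    return max(range(len(mags)), key=mags.__getitem__)

N = 64
gaussian = [gabor_1d(x, center=32, sigma=6.0, k0=0.0) for x in range(N)]
gabor = [gabor_1d(x, center=32, sigma=6.0, k0=10 / N) for x in range(N)]

print(peak_bin(gaussian), peak_bin(gabor))
```

With the same envelope, the spectral peak moves from bin 0 for the Gaussian to bin 10 for the Gabor primitive, which is the modulation property of the Fourier transform that the abstract relies on.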

2603.05612 2026-03-09 q-bio.NC cs.LG stat.AP stat.ML

Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior

Eva Yezerets, En Yang, Misha B. Ahrens, Adam S. Charles


Brain-wide recordings of large-scale networks of neurons now provide an unprecedented view into how the brain drives behavior. However, brain activity contains both information directly related to behavior as well as the potential for many internal computations. Moreover, observable behavior is executed not only by the brain, but also by the spinal cord and peripheral nervous system. Behavior is a coarse-grained product of neural activity, and we thus take the view that it can be best represented by lower-dimensional latent neural dynamics. Capturing this indirect relationship while disambiguating behavior-generating networks from internal computations running in parallel requires new modeling approaches that can embody the parallel and distributed nature of large-scale neural populations. We thus present behavior-decomposed linear dynamical systems (b-dLDS) to disentangle simultaneously recorded subsystems and identify how the latent neural subsystems relate to behavior. We demonstrate the ability of b-dLDS to decouple behavioral vs. internal computations on controlled, simulated data, showing improvements over a state-of-the-art model that uses behavior to supervise all dynamics. We then show that b-dLDS can further scale up to tens of thousands of neurons by applying our model to a large-scale recording of a zebrafish hindbrain during the complex positional homeostasis behavior, wherein b-dLDS highlights behavior-related dynamic connectivity networks.

2603.05578 2026-03-09 cs.SE cs.AI

Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent

Bowei Xia, Mengkang Hu, Shijian Wang, Jiarui Jin, Wenxiang Jiao, Yuan Lu, Kexin Li, Ping Luo

Comments 25 pages, 10 figures, 2 tables


Research on self-evolving language agents has accelerated, drawing increasing attention to their ability to create, adapt, and maintain tools from task requirements. However, existing benchmarks predominantly rely on predefined specifications, which limits scalability and hinders truly autonomous evolution. While recent studies attempt to dynamically generate tools, they primarily emphasize downstream performance, resulting in a "black-box" evaluation that makes it difficult to attribute failures to specific causes. To address this, we propose Tool-Genesis, a diagnostic benchmark designed to quantify agent capabilities across multiple dimensions, including interface compliance, functional correctness, and downstream utility. Tool-Genesis evaluates whether agents can construct task-relevant tools solely from abstract requirements (without preset specifications) and use them to solve realistic problems. Crucially, we find that even state-of-the-art models struggle to produce precise tool interfaces or executable logic in a one-shot setting. These minor initial flaws are amplified through the pipeline, leading to a sharp degradation in downstream metrics. We hope Tool-Genesis will guide future research toward training and steering models to synthesize persistent, general-purpose tools that better address real-world challenges.

2603.05575 2026-03-09 stat.ML cs.LG

Prediction-Powered Conditional Inference

Yang Sui, Jin Zhou, Hua Zhou, Xiaowu Dai


We study prediction-powered conditional inference in the setting where labeled data are scarce, unlabeled covariates are abundant, and a black-box machine-learning predictor is available. The goal is to perform statistical inference on conditional functionals evaluated at a fixed test point, such as conditional means, without imposing a parametric model for the conditional relationship. Our approach combines localization with prediction-based variance reduction. First, we introduce a reproducing kernel-based localization method that learns a data-adaptive weight function from covariates and reformulates the target conditional moment at the test point as a weighted unconditional moment. Second, we incorporate machine-learning predictions through a correction-based decomposition of this localized moment, yielding a prediction-powered estimator and confidence interval that reduce variance when the predictor is informative while preserving validity regardless of predictor accuracy. We establish nonasymptotic error bounds and minimax-optimal convergence rates for the resulting estimator, prove pointwise asymptotic normality with consistent variance estimation, and provide an explicit variance decomposition that characterizes how machine-learning predictions and unlabeled covariates improve statistical efficiency. Numerical experiments on simulated and real datasets demonstrate valid conditional coverage and substantially sharper confidence intervals than alternative methods.
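The two ingredients named in this abstract, localization and a prediction-based correction, can be sketched roughly as follows. This is a simplified illustration and not the paper's estimator: a fixed Gaussian kernel stands in for the learned reproducing-kernel weight function, the self-normalization step is an added assumption, and every name (`gaussian_weight`, `pp_local_mean`, `bandwidth`) is hypothetical.

```python
import math

def gaussian_weight(x, x0, bandwidth):
    """Illustrative localization weight around the test point x0 (a stand-in, not the learned weight)."""
    return math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)

def pp_local_mean(x0, labeled, unlabeled_x, predictor, bandwidth=0.5):
    """Weighted prediction term from unlabeled covariates plus a labeled-data correction.

    labeled: list of (x, y) pairs; unlabeled_x: list of covariates.
    """
    w_unl = [gaussian_weight(x, x0, bandwidth) for x in unlabeled_x]
    # Localized moment of the predictions, using the abundant unlabeled covariates.
    pred_term = sum(w * predictor(x) for w, x in zip(w_unl, unlabeled_x)) / len(unlabeled_x)
    # Correction term: localized moment of the prediction residuals on labeled data.
    correction = sum(
        gaussian_weight(x, x0, bandwidth) * (y - predictor(x)) for x, y in labeled
    ) / len(labeled)
    # Self-normalize so the weights average to one (an assumption of this sketch).
    normalizer = sum(w_unl) / len(unlabeled_x)
    return (pred_term + correction) / normalizer

def predictor(x):
    """Hypothetical black-box predictor."""
    return 3.0 * x + 0.5

labeled = [(0.1, 0.9), (0.4, 1.7), (0.8, 2.9), (1.2, 4.1)]
unlabeled_x = [i / 20 for i in range(31)]

print(pp_local_mean(0.5, labeled, unlabeled_x, predictor))
```

In this sketch the correction term removes the predictor's bias on the labeled sample, so a poor predictor costs variance rather than validity. As a sanity check, when the unlabeled pool equals the labeled covariates the predictor cancels and the result is an ordinary kernel-weighted average of the labels.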

2603.05572 2026-03-09 q-bio.GN cs.LG

Machine Learning for analysis of Multiple Sclerosis cross-tissue bulk and single-cell transcriptomics data

Francesco Massafra, Samuele Punzo, Silvia Giulia Galfré, Alessandro Maglione, Simone Pernice, Stefano Forti, Simona Rolla, Marco Beccuti, Marinella Clerico, Corrado Priami, Alina Sîrbu


Multiple Sclerosis (MS) is a chronic autoimmune disease of the central nervous system whose molecular mechanisms remain incompletely understood. In this study, we developed an end-to-end machine learning pipeline to analyze transcriptomic data from peripheral blood mononuclear cells and cerebrospinal fluid, integrating both bulk microarray and single-cell RNA sequencing datasets (concentrating on CD4+ and B-cells). After rigorous preprocessing, batch correction, and gene declustering, XGBoost classifiers were trained to distinguish MS patients from healthy controls. Explainable AI tools, namely SHapley Additive exPlanations (SHAP), were employed to identify key genes driving classification, and results were compared with Differential Expression Analysis (DEA). SHAP-prioritized genes were further investigated through interaction networks and pathway enrichment analyses. The models achieved strong performance, particularly in CSF B-cells (AUC=0.94) and microarray (AUC=0.86). SHAP gene selection proved to be complementary to classical DEA. Gene clusters identified across multiple datasets highlighted immune activation, non-canonical immune checkpoints (ITK, CLEC2D, KLRG1, CEACAM1), ribosomal and translational programs, ubiquitin-proteasome regulation, lipid trafficking, and Epstein-Barr virus-related pathways. Our integrative and explainable framework reveals complementary insights beyond conventional analysis and provides novel mechanistic hypotheses and potential biomarkers for MS pathogenesis.

2603.05568 2026-03-09 stat.ML cs.LG

Learning Optimal Distributionally Robust Individualized Treatment Rules Integrating Multi-Source Data

Wenhai Cui, Wen Su, Xingqiu Zhao


Integrative analysis of multiple datasets for estimating optimal individualized treatment rules (ITRs) can enhance decision efficiency. A central challenge is posterior shift, wherein the conditional distribution of potential outcomes given covariates differs between source and target populations. We propose a prior information-based distributionally robust ITR (PDRO-ITR) that maximizes the worst-case policy value over a covariate-dependent distributional uncertainty set, ensuring robust performance under posterior shift. The uncertainty set is constructed as an individualized combination of source distributions, with weights combining prior source-membership probabilities and deviation terms constrained to the probability simplex to accommodate posterior shift. We derive a closed-form solution for the PDRO-ITR and develop an adaptive procedure to tune the uncertainty level. We establish risk bounds for the PDRO-ITR estimator, which guarantees robust performance under the worst case. Extensive simulations and two real-data applications demonstrate that the proposed method achieves superior performance compared to existing approaches.

2603.05562 2026-03-09 cs.LO cs.AI

Model Change for Description Logic Concepts

Ana Ozaki, Jandson S. Ribeiro

Comments Presented at AAAI 2026 (main track)


We consider the problem of modifying a description logic concept in light of models represented as pointed interpretations. We call this setting model change, and distinguish three main kinds of changes: eviction, which consists of only removing models; reception, which incorporates models; and revision, which combines removal with incorporation of models in a single operation. We introduce a formal notion of revision and argue that it does not reduce to a simple combination of eviction and reception, contrary to intuition. We provide positive and negative results on the compatibility of eviction and reception for EL and ALC description logic concepts and on the compatibility of revision for ALC concepts.

2603.05553 2026-03-09 cs.SE cs.AI cs.CL

EigenData: A Self-Evolving Multi-Agent Platform for Function-Calling Data Synthesis, Auditing, and Repair

Jiaao Chen, Jingyuan Qi, Mingye Gao, Wei-Chen Wang, Hanrui Wang, Di Jin


Function-calling agents -- large language models that invoke tools and APIs -- require high-quality, domain-specific training data spanning executable environments, backing databases, and diverse multi-turn trajectories. We introduce EigenData, an integrated, self-evolving platform that automates the full data lifecycle through a multi-agent architecture. A top-level orchestrator, EigenCore, coordinates three specialized sub-systems: DatabaseAgent for realistic domain database construction, CodingAgent for verified executable environment generation with iterative test-debug loops, and DataAgent for multi-turn trajectory synthesis with self-evolving prompt optimization. Cross-component feedback ensures consistency across all artifacts. We apply EigenData to audit and repair the Berkeley Function-Calling Leaderboard (BFCL-V3), identifying systematic errors in function schemas, implementations, and reference trajectories, automatically correcting them through coordinated schema refinement, code-level bug fixes, and trajectory modification, and introducing an outcome-aware evaluation protocol that assesses task success via database-state correctness rather than turn-level trajectory matching. We demonstrate that the repaired benchmark, coupled with outcome-aware metrics, produces model rankings substantially better correlated with human judgments of functional correctness.

2603.05544 2026-03-09 stat.ME cs.LG stat.AP

An intuitive rearranging of the Yates covariance decomposition for probabilistic verification of forecasts with the Brier score

Bruno Hebling Vieira

Comments 4 pages, 0 figures


Proper scoring rules are essential for evaluating probabilistic forecasts. We propose a simple algebraic rearrangement of the Yates covariance decomposition of the Brier score into three independently non-negative terms: a variance mismatch term, a correlation deficit term, and a calibration-in-the-large term. This rearrangement makes the optimality conditions for perfect forecasting transparent: the optimal forecast must simultaneously match the variance of outcomes, achieve perfect positive correlation with outcomes, and match the mean of outcomes. Any deviation from these conditions results in a positive contribution to the Brier score.
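One rearrangement consistent with the three terms described here starts from the Yates form $BS = s_f^2 + s_o^2 - 2 s_{fo} + (\bar f - \bar o)^2$ and completes the square to give $BS = (s_f - s_o)^2 + 2 s_f s_o (1 - r) + (\bar f - \bar o)^2$, where $s_f$ and $s_o$ are population standard deviations, $s_{fo}$ the covariance, and $r$ the correlation. The sketch below checks this identity numerically. The notation and function names are assumptions, and the paper's exact presentation may differ.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def brier_terms(forecasts, outcomes):
    """Return (variance mismatch, correlation deficit, calibration-in-the-large).

    Uses population (divide-by-n) moments so the terms sum exactly to the Brier score.
    """
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(outcomes) / n
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecasts) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in outcomes) / n)
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, outcomes)) / n
    variance_mismatch = (sd_f - sd_o) ** 2
    # Equals 2 * sd_f * sd_o * (1 - r); written with the covariance to avoid
    # dividing by zero when either series is constant.
    correlation_deficit = 2 * (sd_f * sd_o - cov)
    calibration_in_the_large = (mean_f - mean_o) ** 2
    return variance_mismatch, correlation_deficit, calibration_in_the_large

forecasts = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]
outcomes = [1, 0, 1, 1, 0, 0]

terms = brier_terms(forecasts, outcomes)
print(terms, sum(terms), brier_score(forecasts, outcomes))
```

Each term is non-negative on its own (the middle one by the Cauchy-Schwarz inequality), so the score is zero only when all three conditions listed in the abstract hold at once.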

2603.05535 2026-03-09 eess.IV cs.CV cs.LG

Clinical-Injection Transformer with Domain-Adapted MAE for Lupus Nephritis Prognosis Prediction

Yuewen Huang, Zhitao Ye, Guangnan Feng, Fudan Zheng, Xia Gao, Yutong Lu


Lupus nephritis (LN) is a severe complication of systemic lupus erythematosus that affects pediatric patients with significantly greater severity and worse renal outcomes compared to adults. Despite the urgent clinical need, predicting pediatric LN prognosis remains unexplored in computational pathology. Furthermore, the only existing histopathology-based approach for LN relies on multiple costly staining protocols and fails to integrate complementary clinical data. To address these gaps, we propose the first multimodal computational pathology framework for three-class treatment response prediction (complete remission, partial response, and no response) in pediatric LN, utilizing only routine PAS-stained biopsies and structured clinical data. Our framework introduces two key methodological innovations. First, a Clinical-Injection Transformer (CIT) embeds clinical features as condition tokens into patch-level self-attention, facilitating implicit and bidirectional cross-modal interactions within a unified attention space. Second, we design a decoupled representation-knowledge adaptation strategy using a domain-adapted Masked Autoencoder (MAE). This strategy explicitly separates self-supervised morphological feature learning from pathological knowledge extraction. Additionally, we introduce a multi-granularity morphological type injection mechanism to bridge distilled classification knowledge with downstream prognostic predictions at both the instance and patient levels. Evaluated on a cohort of 71 pediatric LN patients with KDIGO-standardized labels, our method achieves a three-class accuracy of 90.1% and an AUC of 89.4%, demonstrating its potential as a highly accurate and cost-effective prognostic tool.

2603.05532 2026-03-09 physics.chem-ph cs.AI

On the Reliability of AI Methods in Drug Discovery: Evaluation of Boltz-2 for Structure and Binding Affinity Prediction

Shunzhou Wan, Xibei Zhang, Xiao Xue, Peter V. Coveney


Despite continuing hype about the role of AI in drug discovery, no "AI-discovered drugs" have so far received regulatory approval. Here we assess one of the latest AI-based tools in this domain. The ability to rapidly predict protein-ligand structures and binding affinities is pivotal for accelerating drug discovery. Boltz-2, a recently developed biomolecular foundation model, aims to bridge the gap between AI efficiency and physics-based precision through a joint "co-folding" approach. In this study, we provide an extensive evaluation of Boltz-2 using two large-scale datasets: 16,780 compounds for 3CLPro and 21,702 compounds for TNKS2. We compare Boltz-2 predicted structures with traditional docking, and its predicted binding affinities with binding free energies derived from the physics-based ESMACS protocol. Structural analysis reveals significant global RMSD variations, indicating that Boltz-2 predicts multiple protein conformations and ligand binding positions rather than a single converged pose. Energetic evaluations exhibit only weak to moderate correlations across the global datasets. Furthermore, a focused analysis of the top 100 compounds yields no significant correlation between the Boltz-2 predictions and the binding free energies from fine-grained ESMACS, alongside an observed saturation difference in ligand structures. Our results show that while Boltz-2 offers substantial speed for initial screening, it lacks the energetic resolution required for lead identification. These findings highlight the necessity of employing physics-based methods for the reliability and refinement of AI-derived models.

2603.05528 2026-03-09 cs.MM cs.AI cs.CL cs.CV cs.SD eess.AS

Omni-C: Compressing Heterogeneous Modalities into a Single Dense Encoder

Kin Wai Lau, Yasar Abbas Ur Rehman, Lai-Man Po, Pedro Porto Buarque de Gusmão


Recent multimodal systems often rely on separate expert modality encoders, which cause complexity and computational overhead to scale linearly with added modalities. While unified Omni-models address this via Mixture-of-Experts (MoE) architectures with specialized experts and routing, they still inflate parameter counts and introduce routing overhead. In this paper, we propose Omni-C (Omni-Compress), a single dense Transformer-based encoder that learns competitive shared representations across heterogeneous modalities--images, audio, and text--through unimodal contrastive pretraining on large-scale unaligned data. By maximizing parameter sharing in the backbone and using lightweight modality-specific projection heads, Omni-C effectively mitigates inter-modality conflicts without requiring MoE, paired supervision, or routing. This design supports efficient deployment on memory-constrained systems via sequential modality processing and low-memory inference, eliminating the need for parallel expert loading or specialized hardware. Experiments show Omni-C achieves performance comparable to expert models in unimodal and cross-modal tasks, with modest zero-shot degradation on audio and text that is largely recovered through lightweight linear probing or parameter-efficient fine-tuning. The unified architecture substantially reduces inference memory usage compared to multi-encoder baselines, advancing efficient and scalable multimodal learning.

2603.05525 2026-03-09 physics.chem-ph cs.AI cs.CE

Molecular Representations for AI in Chemistry and Materials Science: An NLP Perspective

Sanjanasri JP, Pratiti Bhadra, N. Sukumar, Soman KP


Deep learning, a subfield of machine learning, has gained importance in various application areas in recent years. Its growing popularity has led it to enter the natural sciences as well. This has created the need for molecular representations that are both machine-readable and understandable to scientists from different fields. Over the years, many chemical molecular representations have been constructed, and new ones continue to be developed as computer technology advances and knowledge of molecular complexity increases. This paper presents some of the most popular digital molecular representations inspired by natural language processing (NLP) and used in chemical informatics. In addition, the paper discusses some notable AI-based applications that use these representations. This paper aims to provide a guide to structural representations that are important for the application of AI in chemistry and materials science from the perspective of an NLP researcher. This review is a reference tool for researchers with little experience working with chemical representations who wish to work on projects at the interface of these fields.

2603.05520 2026-03-09 cs.MA cs.CR cs.LG

Information-Theoretic Privacy Control for Sequential Multi-Agent LLM Systems

Sadia Asif, Mohammad Mohammadi Amiri


Sequential multi-agent large language model (LLM) systems are increasingly deployed in sensitive domains such as healthcare, finance, and enterprise decision-making, where multiple specialized agents collaboratively process a single user request. Although individual agents may satisfy local privacy constraints, sensitive information can still be inferred through sequential composition and intermediate representations. In this work, we study compositional privacy leakage in sequential LLM agent pipelines. We formalize leakage using mutual information and derive a theoretical bound that characterizes how locally introduced leakage can amplify across agents under sequential execution. Motivated by this analysis, we propose a privacy-regularized training framework that directly constrains information flow between agent outputs and agent-local sensitive variables. We evaluate our approach across sequential agent pipelines of varying depth on three benchmark datasets, demonstrating stable optimization dynamics and consistent, interpretable privacy-utility trade-offs. Our results show that privacy in agentic LLM systems cannot be guaranteed by local constraints alone and must instead be treated as a system-level property during both training and deployment.

2603.05518 2026-03-09 cs.HC cs.CV

CoEditor++: Instruction-based Visual Editing via Cognitive Reasoning

Minheng Ni, Yutao Fan, Zhengyuan Yang, Yeli Shen, Yuxiang Wei, Yaowen Zhang, Lijuan Wang, Lei Zhang, Wangmeng Zuo


Recent advances in large multimodal models (LMMs) have enabled instruction-based image editing, allowing users to modify visual content via natural language descriptions. However, existing approaches often struggle with high-level semantic reasoning and visual consistency, particularly under ambiguous or complex instructions. To address these challenges, we propose CoEditor++, a cognitively structured, training-free framework that decomposes editing into "what to edit" and "how to edit" through two cognitive stages with a reflective self-selection mechanism, enabling robust, fine-grained, and interpretable editing. Built entirely from open-sourced components, CoEditor++ requires no additional training or fine-tuning, ensuring transparency and cross-domain applicability. We evaluate CoEditor++ on SmartEdit, a widely used benchmark for general editing, and AltBear, a privacy and compliance-oriented benchmark. Experimental results show that CoEditor++ achieves state-of-the-art performance in both general editing and responsible editing tasks compared with open-sourced models that require training on specialized editing datasets, while maintaining significantly higher visual consistency. When compared with closed-source models such as Nano Banana Pro or GPT-4o, CoEditor++ preserves comparable instruction following while still substantially outperforming them in visual consistency. Extensive ablation studies confirm that the effectiveness of CoEditor++ stems from its structured cognitive design rather than any specific model component. Our findings point to the potential of cognitive-centric instruction-based image editing.

2603.05511 2026-03-09 cs.HC cs.AI cs.GR cs.RO

An Embodied Companion for Visual Storytelling

Patrick Tresset, Markus Wulfmeier

Comments 35 pages, 18 figures


As artificial intelligence shifts from a pure tool for delegation toward agentic collaboration, its use in the arts can shift beyond the exploration of machine autonomy toward synergistic co-creation. While our earlier robotic works utilized automation to distance the artist's intent from the final mark, we present Companion: an artistic apparatus that integrates a drawing robot with Large Language Models (LLMs) to re-center human-machine presence. By leveraging in-context learning and real-time tool use, the system engages in bidirectional interaction via speech and sketching. This approach transforms the robot from a passive executor into a playful co-creative partner capable of driving shared visual storytelling into unexpected aesthetic territories. To validate this collaborative shift, we employed the Consensual Assessment Technique (CAT) with a panel of seven art-world experts. Results confirm that the system produces works with a distinct aesthetic identity and professional exhibition merit, demonstrating the potential of AI as a highly capable artistic collaborator.

2603.05510 2026-03-09 cs.HC cs.AI cs.CY

Exploring Human-in-the-Loop Themes in AI Application Development: An Empirical Thematic Analysis

Parm Suksakul, Nathan Kittichaikoonkij, Nakhin Polthai, Aung Pyae

Comments Accepted for presentation at IEEE CON 2026


Developing and deploying AI applications in organizations is challenging when human decision authority and oversight are underspecified across the system lifecycle. Although Human-in-the-Loop (HITL) and Human-Centered AI (HCAI) principles are widely acknowledged, operational guidance for structuring roles, checkpoints, and feedback mechanisms remains fragmented. We report a multi-source qualitative study: a retrospective diary study of a customer-support chatbot and semi-structured interviews with eight AI experts from academia and industry. Through five-cycle thematic analysis of 1,435 codewords, we derive four themes: AI Governance and Human Authority, Human-in-the-Loop Iterative Refinement, AI System Lifecycle and Operational Constraints, and Human-AI Team Collaboration and Coordination. These themes provide empirical inputs for subsequent HITL framework design and validation.

2603.05035 2026-03-09 cs.CR cs.LG

Good-Enough LLM Obfuscation (GELO)

Anatoly Belikov, Ilya Fedotov


Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the TEE samples a random matrix $A$, forms $U = AH$, offloads $U$ and weights $W$ to the accelerator, and then applies $A^{-1}$ on return, so that $A^{-1}((AH)W) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyse information leakage and introduce two practical defences: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy "shield" vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.
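The mixing identity stated in this abstract, that unmixing the offloaded product recovers the plain projection, follows from associativity of matrix multiplication and can be checked with a small sketch. This is not the GELO implementation: it uses tiny integer-valued matrices with exact `Fraction` arithmetic, the helper names are hypothetical, and it omits both defences the abstract describes (the mixing-matrix design and the "shield" vectors).

```python
from fractions import Fraction
import random

def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Gauss-Jordan inverse over Fractions; raises ValueError if M is singular."""
    n = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_matrix(rows, cols, rng):
    return [[Fraction(rng.randint(-5, 5)) for _ in range(cols)] for _ in range(rows)]

def sample_invertible(n, rng):
    """Resample until the matrix is invertible; returns (A, A^-1)."""
    while True:
        A = random_matrix(n, n, rng)
        try:
            return A, inverse(A)
        except ValueError:
            continue

rng = random.Random(0)
H = random_matrix(4, 3, rng)           # hidden states: 4 tokens x 3 features
W = random_matrix(3, 2, rng)           # projection weights
A, A_inv = sample_invertible(4, rng)   # fresh mixing matrix for this batch

U = matmul(A, H)                # mixed states: what the accelerator observes
Y_mixed = matmul(U, W)          # projection computed on the untrusted side
Y = matmul(A_inv, Y_mixed)      # unmixed on the trusted side

print(Y == matmul(H, W))
```

Exact rational arithmetic is used here only so the equality check is free of rounding. In floating point the identity holds up to numerical error, which depends on the conditioning of the mixing matrix.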

2603.04672 2026-03-09 math.NA cs.LG cs.NA

Improving the accuracy of physics-informed neural networks via last-layer retraining

Saad Qadeer, Panos Stinis

Comments: Approved for release by Pacific Northwest National Laboratory

Abstract

Physics-informed neural networks (PINNs) are a versatile tool in the burgeoning field of scientific machine learning for solving partial differential equations (PDEs). However, determining suitable training strategies for them is not obvious, with the result that they typically yield moderately accurate solutions. In this article, we propose a method for improving the accuracy of PINNs by coupling them with a post-processing step that seeks the best approximation in a function space associated with the network. We find that our method yields errors four to five orders of magnitude lower than those of the parent PINNs across architectures and dimensions. Moreover, we can reuse the basis functions for the linear space in more complex settings, such as time-dependent and nonlinear problems, allowing for transfer learning. Our approach also provides a residual-based metric that allows us to optimally choose the number of basis functions employed.

2602.22289 2026-03-09 q-bio.QM cs.LG q-bio.GN

What Topological and Geometric Structure Do Biological Foundation Models Learn? Evidence from 141 Hypotheses

Ihor Kendiukhov

Abstract

When biological foundation models such as scGPT and Geneformer process single-cell gene expression, what geometric and topological structure forms in their internal representations? Is that structure biologically meaningful or a training artifact, and how confident should we be in such claims? We address these questions through autonomous large-scale hypothesis screening: an AI-driven executor-brainstormer loop that proposed, tested, and refined 141 geometric and topological hypotheses across 52 iterations, covering persistent homology, manifold distances, cross-model alignment, community structure, and directed topology, all with explicit null controls and disjoint gene-pool splits. Three principal findings emerge. First, the models learn genuine geometric structure. Gene embedding neighborhoods exhibit non-trivial topology, with persistent homology significant in 11 of 12 transformer layers at p < 0.05 in the weakest domain and 12 of 12 in the other two. A multi-level distance hierarchy shows that manifold-aware metrics outperform Euclidean distance for identifying regulatory gene pairs, and graph community partitions track known transcription factor target relationships. Second, this structure is shared across independently trained models. CCA alignment between scGPT and Geneformer yields canonical correlation of 0.80 and gene retrieval accuracy of 72 percent, yet none of 19 tested methods reliably recover gene-level correspondences. The models agree on the global shape of gene space but not on precise gene placement. Third, the structure is more localized than it first appears. Under stringent null controls applied across all null families, robust signal concentrates in immune tissue, while lung and external lung signals weaken substantially.

2602.11247 2026-03-09 cs.CR cs.AI

Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection

J Alex Corll

Abstract

Multi-turn prompt injection attacks distribute malicious intent across multiple conversation turns, exploiting the assumption that each turn is evaluated independently. While single-turn detection has been extensively studied, no published formula exists for aggregating per-turn pattern scores into a conversation-level risk score at the proxy layer -- without invoking an LLM. We identify a fundamental flaw in the intuitive weighted-average approach: it converges to the per-turn score regardless of turn count, meaning a 20-turn persistent attack scores identically to a single suspicious turn. Drawing on analogies from change-point detection (CUSUM), Bayesian belief updating, and security risk-based alerting, we propose peak + accumulation scoring -- a formula combining peak single-turn risk, persistence ratio, and category diversity. Evaluated on 10,654 multi-turn conversations -- 588 attacks sourced from WildJailbreak adversarial prompts and 10,066 benign conversations from WildChat -- the formula achieves 90.8% recall at 1.20% false positive rate with an F1 of 85.9%. A sensitivity analysis over the persistence parameter reveals a phase transition at rho ~ 0.4, where recall jumps 12 percentage points with negligible FPR increase. We release the scoring algorithm, pattern library, and evaluation harness as open source.
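
The flaw in the weighted-average approach can be shown with a few lines of arithmetic. The sketch below is illustrative only: it does not reproduce the paper's peak + accumulation formula, which the abstract does not spell out. For contrast it uses a plain one-sided CUSUM statistic, one of the analogies the abstract names, and the per-turn score of 0.6 and the `drift` value of 0.3 are arbitrary assumptions.

```python
def weighted_average(scores, weights=None):
    """Conversation score as a weighted mean of per-turn scores."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def cusum(scores, drift=0.3):
    """One-sided CUSUM: accumulate the excess of each score over `drift`."""
    s = 0.0
    for x in scores:
        s = max(0.0, s + x - drift)
    return s

single = [0.6]            # one suspicious turn (hypothetical score)
persistent = [0.6] * 20   # the same score repeated over 20 turns

avg_single = weighted_average(single)
avg_persistent = weighted_average(persistent)
cusum_single = cusum(single)
cusum_persistent = cusum(persistent)

# The averages coincide (up to rounding), so turn count is invisible to them.
print(avg_single, avg_persistent)
# The accumulating statistic grows with the number of suspicious turns.
print(cusum_single, cusum_persistent)
```

Any statistic that accumulates evidence across turns separates the two conversations, whereas an average cannot.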

2602.10473 2026-03-09 cs.HC cs.AI cs.SI

Why Human Guidance Matters in Collaborative Vibe Coding

Haoyu Hu, Raja Marjieh, Katherine M Collins, Chenyi Li, Thomas L. Griffiths, Ilia Sucholutsky, Nori Jacoby

Abstract

Writing code has been one of the most transformative ways for human societies to translate abstract ideas into tangible technologies. Modern AI is changing this process by enabling experts and non-experts alike to generate code without actually writing it, instead using natural language instructions or "vibe coding". While increasingly popular, the impact of vibe coding on productivity and collaboration, and the role of humans in this process, remains unclear. Here, we introduce a controlled experimental framework for studying collaborative vibe coding and use it to compare human-led, AI-led, and hybrid groups. Across 20 experiments involving 737 human participants, we show that people provide uniquely effective high-level instructions for vibe coding, whereas AI-provided instructions often result in performance collapse. We further demonstrate that hybrid systems perform best when humans lead by providing instructions while evaluation is delegated to AI. Although AI systems can rapidly optimize performance for specific tasks, our work highlights the importance of human guidance in shaping future hybrid societies.

2601.18047 2026-03-09 physics.optics cs.ET cs.LG

Laser interferometry as a robust neuromorphic platform for machine learning

Amanuel Anteneh, Kyungeun Kim, J. M. Schwarz, Israel Klich, Olivier Pfister

Abstract

We present a method for implementing an optical neural network using only linear optical resources, namely field displacement and interferometry applied to coherent states of light. The nonlinearity required for learning in a neural network is realized by encoding the input into phase shifts, which allows a far more straightforward experimental implementation than previous proposals for, and demonstrations of, $\textit{in situ}$ inference. Beyond $\textit{in situ}$ inference, the method enables $\textit{in situ}$ training by utilizing established techniques like parameter shift methods or physical backpropagation to extract gradients directly from measurements of the linear optical circuit. We also investigate the effect of photon losses and find the model to be very resilient to them.

2601.12436 2026-03-09 eess.AS cs.AI cs.LG cs.MM cs.SD

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Linzhi Wu, Xingyu Zhang, Hao Yuan, Yakun Zhang, Changyan Zheng, Liang Xie, Tiejun Liu, Erwei Yin

Comments: Accepted by ICASSP 2026

Abstract

Audio-visual speech recognition (AVSR) typically improves recognition accuracy in noisy environments by integrating noise-immune visual cues with audio signals. Nevertheless, high-noise audio inputs are prone to introducing adverse interference into the feature fusion process. To mitigate this, recent AVSR methods often adopt mask-based strategies to filter audio noise during feature interaction and fusion, yet such methods risk discarding semantically relevant information alongside noise. In this work, we propose an end-to-end noise-robust AVSR framework coupled with speech enhancement, eliminating the need for explicit noise mask generation. This framework leverages a Conformer-based bottleneck fusion module to implicitly refine noisy audio features with video assistance. By reducing modality redundancy and enhancing inter-modal interactions, our method preserves speech semantic integrity to achieve robust recognition performance. Experimental evaluations on the public LRS3 benchmark suggest that our method outperforms prior advanced mask-based baselines under noisy conditions.

2601.06558 2026-03-09 cs.IT cs.CV math.IT

Robust Sparse Signal Recovery with Outliers: A Hard Thresholding Pursuit Approach Based on LAD

Jiao Xu, Peng Li, Bing Zheng

Abstract

Recovering a sparse signal from outlier-contaminated measurements is a fundamental challenge in many applications. While existing algorithms predominantly address scenarios with bounded noise or assume known signal sparsity, few methods tackle the more practical problem of sparse recovery from gross outliers without prior knowledge of sparsity. To bridge this gap, we study the sparsity-constrained Least Absolute Deviations (LAD) minimization problem. This paper proposes the Graded Fast Hard Thresholding Pursuit (GFHTP$_1$) algorithm with a quantile-truncated step size for $\ell_1$-loss minimization. In contrast to most state-of-the-art methods, our GFHTP$_1$ requires no prior knowledge of the signal's sparsity level. We establish a theoretical convergence analysis under mild conditions and further prove that an $s$-sparse signal can be recovered exactly within at most $s$ iterations. To our knowledge, these results provide the first efficient recovery guarantees for sparse signal reconstruction from outlier-corrupted measurements without a sparsity prior. Numerical experiments demonstrate that GFHTP$_1$ consistently outperforms competing algorithms in robustness to varying signal sparsity and outlier support size, while also requiring less computation time.

2601.06225 2026-03-09 cs.CY cs.AI cs.CL

Classroom AI: Large Language Models as Grade-Specific Teachers

Jio Oh, Steven Euijong Whang, James Evans, Jindong Wang

Abstract

Large Language Models (LLMs) offer a promising solution to complement traditional teaching and address global teacher shortages that affect hundreds of millions of children, but they fail to provide grade-appropriate responses for students at different educational levels. We introduce a framework for finetuning LLMs to generate age-appropriate educational content across six grade levels, from lower elementary to adult education. Our framework successfully adapts explanations to match students' comprehension capacities without sacrificing factual correctness. This approach integrates seven established readability metrics through a clustering method and builds a comprehensive dataset for grade-specific content generation. Evaluations across multiple datasets with 208 human participants demonstrate substantial improvements in grade-level alignment, achieving a 35.64 percentage point increase compared to prompt-based methods while maintaining response accuracy. AI-assisted learning tailored to different grade levels has the potential to advance educational engagement and equity.

2512.07329 2026-03-09 physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph

Two-dimensional RMSD projections for reaction path visualization and validation

Rohit Goswami

Comments: 11 pages, 4 figures

Abstract

Transition state or minimum energy path finding methods constitute a routine component of the computational chemistry toolkit. Standard analysis involves trajectories conventionally plotted in terms of the relative energy to the initial state against a cumulative displacement variable, or the image number. These dimensional reductions obscure structural rearrangements in high dimensions and are often history dependent. This precludes the ability to compare optimization histories of different methods beyond the number of calculations, time taken, and final saddle geometry. We present a method mapping trajectories onto a two-dimensional projection defined by a permutation-corrected root-mean-square deviation (RMSD) from the reactant and product configurations. Energy is represented as an interpolated color-mapped surface constructed from all optimization steps using a gradient-enhanced Gaussian Process with the inverse multiquadric kernel, whose posterior variance contours delineate data-supported regions from extrapolated ones. A rotated coordinate frame decomposes the RMSD plane into reaction progress and orthogonal distance. We show the utility of the framework on a cycloaddition reaction, where a machine-learned potential saddle and density functional theory reference lie on comparable energy contours despite geometric displacements, along with the validation of the visualization for more complex reactions, a Grignard rearrangement, and a conrotatory bicyclobutane ring opening.
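
The two-dimensional RMSD projection and its rotated frame can be illustrated on a toy path. This is a rough sketch under stated assumptions, not the paper's method: the RMSD here is a plain coordinate RMSD with no alignment and no permutation correction, and the rotation (progress measured along the line joining the reactant and product points in the RMSD plane) is one plausible reading of the abstract. The function names and the three-atom geometry are hypothetical.

```python
import numpy as np

def rmsd(a, b):
    """Plain coordinate RMSD between two (n_atoms, 3) arrays (no alignment)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def project(path, reactant, product):
    """Map each image to (RMSD to reactant, RMSD to product) and rotate.

    In the RMSD plane the reactant sits at (0, d) and the product at (d, 0),
    where d is the reactant-product RMSD. The rotated frame (an assumption)
    measures progress along that line and distance perpendicular to it.
    """
    d = rmsd(reactant, product)
    x = np.array([rmsd(img, reactant) for img in path])
    y = np.array([rmsd(img, product) for img in path])
    progress = (x - y + d) / np.sqrt(2.0)
    orthogonal = (x + y - d) / np.sqrt(2.0)   # non-negative by the triangle inequality
    return x, y, progress, orthogonal

# Toy system: three atoms, path is a straight-line interpolation.
reactant = np.zeros((3, 3))
product = np.eye(3)
path = [reactant + t * (product - reactant) for t in np.linspace(0.0, 1.0, 5)]

x, y, progress, orthogonal = project(path, reactant, product)
print(np.round(progress, 3))
print(np.round(orthogonal, 3))
```

For a straight-line interpolation the orthogonal coordinate vanishes and progress increases linearly. A path that detours away from the direct line would show a positive orthogonal distance.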