arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1771
专题追踪
2603.22992 2026-03-25 eess.SY cs.RO cs.SY eess.SP

Design Guidelines for Nonlinear Kalman Filters via Covariance Compensation

Shida Jiang, Jaewoong Lee, Shengyu Tao, Scott Moura

Comments This manuscript has been accepted by ACC 2026

详情
英文摘要

Nonlinear extensions of the Kalman filter (KF), such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), are indispensable for state estimation in complex dynamical systems, yet the conditions for a nonlinear KF to provide robust and accurate estimations remain poorly understood. This work proposes a theoretical framework that identifies the causes of failure and success in certain nonlinear KFs and establishes guidelines for their improvement. Central to our framework is the concept of covariance compensation: the deviation between the covariance predicted by a nonlinear KF and that of the EKF. With this definition and detailed theoretical analysis, we derive three design guidelines for nonlinear KFs: (i) invariance under orthogonal transformations, (ii) sufficient covariance compensation beyond the EKF baseline, and (iii) selection of compensation magnitude that favors underconfidence. Both theoretical analysis and empirical validation confirm that adherence to these principles significantly improves estimation accuracy, whereas fixed parameter choices commonly adopted in the literature are often suboptimal. The codes and the proofs for all the theorems in this paper are available at https://github.com/Shida-Jiang/Guidelines-for-Nonlinear-Kalman-Filters.

2603.22987 2026-03-25 cs.CR cs.LG

A Critical Review on the Effectiveness and Privacy Threats of Membership Inference Attacks

Najeeb Jebreel, David Sánchez, Josep Domingo-Ferrer

Comments To appear in ESORICS 2026

详情
英文摘要

Membership inference attacks (MIAs) aim to determine whether a data sample was included in a machine learning (ML) model's training set and have become the de facto standard for measuring privacy leakages in ML. We propose an evaluation framework that defines the conditions under which MIAs constitute a genuine privacy threat, and review representative MIAs against it. We find that, under the realistic conditions defined in our framework, MIAs represent weak privacy threats. Thus, relying on them as a privacy metric in ML can lead to an overestimation of risk and to unnecessary sacrifices in model utility as a consequence of employing too strong defenses.

2603.22968 2026-03-25 cs.CR cs.CL

Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver, Mark Dras

Comments 22 pages, 11 figures, 5 tables

详情
英文摘要

The growing use of large language models has increased interest in sharing textual data in a privacy-preserving manner. One prominent line of work addresses this challenge through text rewriting under Local Differential Privacy (LDP), where input texts are locally obfuscated before release with formal privacy guarantees. These guarantees are typically expressed by a parameter $\varepsilon$ that upper bounds the worst-case privacy loss. However, nominal $\varepsilon$ values are often difficult to interpret and compare across mechanisms. In this work, we investigate how to empirically calibrate across text rewriting mechanisms under LDP. We propose TeDA, which formulates calibration via a hypothesis-testing framework that instantiates text distinguishability audits in both surface and embedding spaces, enabling empirical assessment of indistinguishability from privatized texts. Applying this calibration to several representative mechanisms, we demonstrate that similar nominal $\varepsilon$ bounds can imply very different levels of distinguishability. Empirical calibration thus provides a more comparable footing for evaluating privacy-utility trade-offs, as well as a practical tool for mechanism comparison and analysis in real-world LDP text rewriting deployments.

2603.22964 2026-03-25 quant-ph cond-mat.quant-gas cs.LG stat.ML

A PAC-Bayesian approach to generalization for quantum models

Pablo Rodriguez-Grasa, Matthias C. Caro, Jens Eisert, Elies Gil-Fuster, Franz J. Schreiber, Carlos Bravo-Prieto

Comments 15+29 pages, 4 figures

详情
英文摘要

Generalization is a central concept in machine learning theory, yet for quantum models, it is predominantly analyzed through uniform bounds that depend on a model's overall capacity rather than the specific function learned. These capacity-based uniform bounds are often too loose and entirely insensitive to the actual training and learning process. Previous theoretical guarantees have failed to provide non-uniform, data-dependent bounds that reflect the specific properties of the learned solution rather than the worst-case behavior of the entire hypothesis class. To address this limitation, we derive the first PAC-Bayesian generalization bounds for a broad class of quantum models by analyzing layered circuits composed of general quantum channels, which include dissipative operations such as mid-circuit measurements and feedforward. Through a channel perturbation analysis, we establish non-uniform bounds that depend on the norms of learned parameter matrices; we extend these results to symmetry-constrained equivariant quantum models; and we validate our theoretical framework with numerical experiments. This work provides actionable model design insights and establishes a foundational tool for a more nuanced understanding of generalization in quantum machine learning.

2603.22959 2026-03-25 stat.ML cs.LG

Stepwise Variational Inference with Vine Copulas

Elisabeth Griesbauer, Leiv Rønneberg, Arnoldo Frigessi, Claudia Czado, Ingrid Hobæk Haff

详情
英文摘要

We propose stepwise variational inference (VI) with vine copulas: a universal VI procedure that combines vine copulas with a novel stepwise estimation procedure of the variational parameters. Vine copulas consist of a nested sequence of trees built from copulas, where more complex latent dependence can be modeled with increasing number of trees. We propose to estimate the vine copula approximate posterior in a stepwise fashion, tree by tree along the vine structure. Further, we show that the usual backward Kullback-Leibler divergence cannot recover the correct parameters in the vine copula model, thus the evidence lower bound is defined based on the Rényi divergence. Finally, an intuitive stopping criterion for adding further trees to the vine eliminates the need to pre-define a complexity parameter of the variational distribution, as required for most other approaches. Thus, our method interpolates between mean-field VI (MFVI) and full latent dependence. In many applications, in particular sparse Gaussian processes, our method is parsimonious with parameters, while outperforming MFVI.

2603.22954 2026-03-25 cs.CR cs.LG

Privacy-Preserving EHR Data Transformation via Geometric Operators: A Human-AI Co-Design Technical Report

Maolin Wang, Beining Bao, Gan Yuan, Hongyu Chen, Bingkun Zhao, Baoshuo Kan, Jiming Xu, Qi Shi, Yinggong Zhao, Yao Wang, Wei Ying Ma, Jun Yan

详情
英文摘要

Electronic health records (EHRs) and other real-world clinical data are essential for clinical research, medical artificial intelligence, and life science, but their sharing is severely limited by privacy, governance, and interoperability constraints. These barriers create persistent data silos that hinder multi-center studies, large-scale model development, and broader biomedical discovery. Existing privacy-preserving approaches, including multi-party computation and related cryptographic techniques, provide strong protection but often introduce substantial computational overhead, reducing the efficiency of large-scale machine learning and foundation-model training. In addition, many such methods make data usable for restricted computation while leaving them effectively invisible to clinicians and researchers, limiting their value in workflows that still require direct inspection, exploratory analysis, and human interpretation. We propose a real-world-data transformation framework for privacy-preserving sharing of structured clinical records. Instead of converting data into opaque representations, our approach constructs transformed numeric views that preserve medical semantics and major statistical properties while, under a clearly specified threat model, provably breaking direct linkage between those views and protected patient-level attributes. Through collaboration between computer scientists and the AI agent \textbf{SciencePal}, acting as a constrained tool inventor under human guidance, we design three transformation operators that are non-reversible within this threat model, together with an additional mixing strategy for high-risk scenarios, supported by theoretical analysis and empirical evaluation under reconstruction, record linkage, membership inference, and attribute inference attacks.

2603.22920 2026-03-25 cs.CY cs.AI

The EU AI Act and the Rights-based Approach to Technological Governance

Georgios Pavlidis

Journal ref Review of European and Comparative Law (2026)

详情
英文摘要

The EU AI Act constitutes an important development in shaping the Union's digital regulatory architecture. The Act places fundamental rights at the heart of a risk-based governance framework. The article examines how the AI Act institutionalises a human-centric approach to AI and how the AI Act's provisions explicitly and implicitly embed the protection of rights enshrined in the EU Charter of Fundamental Rights. It argues that fundamental rights function not merely as aspirational goals, but as legal thresholds and procedural triggers across the lifecycle of an AI system. The analysis suggests that the AI Act has the potential to serve as a model for rights-preserving AI systems, while acknowledging that challenges will emerge at the level of implementation.

2603.22912 2026-03-25 cs.CY cs.AI

From the AI Act to a European AI Agency: Completing the Union's Regulatory Architecture

Georgios Pavlidis

Journal ref Croatian Yearbook of European Law and Policy (2025) 21 CYELP [ONLINE FIRST]

详情
英文摘要

As artificial intelligence (AI) technologies continue to advance, effective risk assessment, regulation, and oversight are necessary to ensure that AI development and deployment align with ethical principles while preserving innovation and economic competitiveness. The adoption of the EU AI Act marks an important step in this direction, establishing a harmonised legal framework that includes detailed provisions on AI governance, as well as the creation of the European AI Office. This paper revisits the question of whether a more robust supranational agency dedicated to AI is still warranted and explores how such a body could enhance policy coherence, improve risk assessment capacities, and foster international cooperation. It also argues that a strengthened EU-level agency would also serve the Union's strategic aim of securing digital and technological sovereignty.

2603.22900 2026-03-25 stat.ME cs.AI cs.LG stat.ML

Off-Policy Evaluation and Learning for Survival Outcomes under Censoring

Kohsuke Kubota, Mitsuhiro Takahashi, Yuta Saito

Comments Preprint

详情
英文摘要

Optimizing survival outcomes, such as patient survival or customer retention, is a critical objective in data-driven decision-making. Off-Policy Evaluation~(OPE) provides a powerful framework for assessing such decision-making policies using logged data alone, without the need for costly or risky online experiments in high-stakes applications. However, typical estimators are not designed to handle right-censored survival outcomes, as they ignore unobserved survival times beyond the censoring time, leading to systematic underestimation of the true policy performance. To address this issue, we propose a novel framework for OPE and Off-Policy Learning~(OPL) tailored for survival outcomes under censoring. Specifically, we introduce IPCW-IPS and IPCW-DR, which employ the Inverse Probability of Censoring Weighting technique to explicitly deal with censoring bias. We theoretically establish that our estimators are unbiased and that IPCW-DR achieves double robustness, ensuring consistency if either the propensity score or the outcome model is correct. Furthermore, we extend this framework to constrained OPL to optimize policy value under budget constraints. We demonstrate the effectiveness of our proposed methods through simulation studies and illustrate their practical impacts using public real-world data for both evaluation and learning tasks.

2603.22855 2026-03-25 cs.AR cs.LG

TorR: Towards Brain-Inspired Task-Oriented Reasoning via Cache-Oriented Algorithm-Architecture Co-design

Hyunwoo Oh, SungHeon Jeong, Suyeon Jang, Hanning Chen, Sanggeon Yun, Tamoghno Das, Mohsen Imani

Comments Accepted to DAC 2026

详情
英文摘要

Task-oriented object detection (TOOD) atop CLIP offers open-vocabulary, prompt-driven semantics, yet dense per-window computation and heavy memory traffic hinder real-time, power-limited edge deployment. We present \emph{TorR}, a brain-inspired \textbf{algorithm--architecture co-design} that \textbf{replaces CLIP-style dense alignment with a hyperdimensional (HDC) associative reasoner} and turns temporal coherence into reuse. On the \emph{algorithm} side, TorR reformulates alignment as HDC similarity and graph composition, introducing \emph{partial-similarity reuse} via (i) query caching with per-class score accumulation, (ii) exact $δ$-updates when only a small set of hypervector bits change, and (iii) similarity/load-gated bypass under high system load. On the \emph{architecture} side, TorR instantiates a lane-scalable, bit-sliced item memory with bank/precision gating and a lightweight controller that schedules bypass/$δ$/full paths to meet RT-30/RT-60 targets as object counts vary. Synthesized in a TSMC 28\,nm process and exercised with a cycle-accurate simulator, TorR sustains real-time throughput with millijoule-scale energy per window ($\approx$50\,mJ at 60\,FPS; $\approx$113\,mJ at 30\,FPS) and low latency jitter, while delivering competitive AP@0.5 across five task prompts (mean 44.27\%) within a bounded margin to strong VLM baselines, but at orders-of-magnitude lower energy. The design exposes deployment-time configurability (effective dimension $D'$, thresholds, precision) to trade accuracy, latency, and energy for edge budgets.

2603.22853 2026-03-25 cs.CR cs.AI

Agent Audit: A Security Analysis System for LLM Agent Applications

Haiyue Zhang, Yi Nian, Yue Zhao

详情
英文摘要

What should a developer inspect before deploying an LLM agent: the model, the tool code, the deployment configuration, or all three? In practice, many security failures in agent systems arise not from model weights alone, but from the surrounding software stack: tool functions that pass untrusted inputs to dangerous operations, exposed credentials in deployment artifacts, and over-privileged Model Context Protocol (MCP) configurations. We present Agent Audit, a security analysis system for LLM agent applications. Agent Audit analyzes Python agent code and deployment artifacts through an agent-aware pipeline that combines dataflow analysis, credential detection, structured configuration parsing, and privilege-risk checks. The system reports findings in terminal, JSON, and SARIF formats, enabling direct integration with local development workflows and CI/CD pipelines. On a benchmark of 22 samples with 42 annotated vulnerabilities, Agent Audit detects 40 vulnerabilities with 6 false positives, substantially improving recall over common SAST baselines while maintaining sub-second scan times. Agent Audit is open source and installable via pip, making security auditing accessible for agent systems. In the live demonstration, attendees scan vulnerable agent repositories and observe how Agent Audit identifies security risks in tool functions, prompts, and more. Findings are linked to source locations and configuration paths, and can be exported into VS Code and GitHub Code Scanning for interactive inspection.

2603.22842 2026-03-25 eess.IV cs.CV

L-UNet: An LSTM Network for Remote Sensing Image Change Detection

Shuting Sun, Lin Mu, Lizhe Wang, Peng Liu

详情
英文摘要

Change detection of high-resolution remote sensing images is an important task in earth observation and was extensively investigated. Recently, deep learning has shown to be very successful in plenty of remote sensing tasks. The current deep learning-based change detection method is mainly based on conventional long short-term memory (Conv-LSTM), which does not have spatial characteristics. Since change detection is a process with both spatiality and temporality, it is necessary to propose an end-to-end spatiotemporal network. To achieve this, Conv-LSTM, an extension of the Conv-LSTM structure, is introduced. Since it shares similar spatial characteristics with the convolutional layer, L-UNet, which substitutes partial convolution layers of UNet-to-Conv-LSTM and Atrous L-UNet (AL-UNet), which further using Atrous structure to multiscale spatial information is proposed. Experiments on two data sets are conducted and the proposed methods show the advantages both in quantity and quality when compared with some other methods.

2603.22790 2026-03-25 quant-ph cs.AI

Quantum Random Forest for the Regression Problem

Kamil Khadiev, Liliya Safina

Comments Accepted in Quantum Computing - Artificial Intelligence for Industry Applications and Scientific Discovery A Workshop at the IEEE International Conference on Quantum Communications, Networking, and Computing (QCNC) 2026

详情
英文摘要

The Random Forest model is one of the popular models of Machine learning. We present a quantum algorithm for testing (forecasting) process of the Random Forest machine learning model for the Regression problem. The presented algorithm is more efficient (in terms of query complexity or running time) than the classical counterpart.

2603.22787 2026-03-25 cs.HC cs.RO

DiSCo: Diffusion Sequence Copilots for Shared Autonomy

Andy Wang, Xu Yan, Brandon McMahan, Michael Zhou, Yuyang Yuan, Johannes Y. Lee, Ali Shreif, Matthew Li, Zhenghao Peng, Bolei Zhou, Yuchen Cui, Jonathan C. Kao

Comments 10 pages, 5 figures, HRI '26: Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction

详情
英文摘要

Shared autonomy combines human user and AI copilot actions to control complex systems such as robotic arms. When a task is challenging, requires high dimensional control, or is subject to corruption, shared autonomy can significantly increase task performance by using a trained copilot to effectively correct user actions in a manner consistent with the user's goals. To significantly improve the performance of shared autonomy, we introduce Diffusion Sequence Copilots (DiSCo): a method of shared autonomy with diffusion policy that plans action sequences consistent with past user actions. DiSCo seeds and inpaints the diffusion process with user-provided actions with hyperparameters to balance conformity to expert actions, alignment with user intent, and perceived responsiveness. We demonstrate that DiSCo substantially improves task performance in simulated driving and robotic arm tasks. Project website: https://sites.google.com/view/disco-shared-autonomy/

2603.22776 2026-03-25 eess.IV cs.CV

Viewport-based Neural 360° Image Compression

Jingwei Liao, Bo Chen, Klara Nahrstedt, Zhisheng Yan

详情
英文摘要

Given the popularity of 360° images on social media platforms, 360° image compression becomes a critical technology for media storage and transmission. Conventional 360° image compression pipeline projects the spherical image into a single 2D plane, leading to issues of oversampling and distortion. In this paper, we propose a novel viewport-based neural compression pipeline for 360° images. By replacing the image projection in conventional 360° image compression pipelines with viewport extraction and efficiently compressing multiple viewports, the proposed pipeline minimizes the inherent oversampling and distortion issues. However, viewport extraction impedes information sharing between multiple viewports during compression, causing the loss of global information about the spherical image. To tackle this global information loss, we design a neural viewport codec to capture global prior information across multiple viewports and maximally compress the viewport data. The viewport codec is empowered by a transformer-based ViewPort ConText (VPCT) module that can be integrated with canonical learning-based 2D image compression structures. We compare the proposed pipeline with existing 360° image compression models and conventional 360° image compression pipelines building on learning-based 2D image codecs and standard hand-crafted codecs. Results show that our pipeline saves an average of $14.01\%$ bit consumption compared to the best-performing 360° image compression methods without compromising quality. The proposed VPCT-based codec also outperforms existing 2D image codecs in the viewport-based neural compression pipeline. Our code can be found at: https://github.com/Jingwei-Liao/VPCT.

2603.22771 2026-03-25 cs.CR cs.LG

Explainable Threat Attribution for IoT Networks Using Conditional SHAP and Flow Behavior Modelling

Samuel Ozechi, Jennifer Okonkwoabutu

详情
英文摘要

As the Internet of Things (IoT) continues to expand across critical infrastructure, smart environments, and consumer devices, securing them against cyber threats has become increasingly vital. Traditional intrusion detection models often treat IoT threats as binary classification problems or rely on opaque models, thereby limiting trust. This work studies multiclass threat attribution in IoT environments using the CICIoT2023 dataset, grouping over 30 attack variants into 8 semantically meaningful classes. We utilize a combination of a gradient boosting model and SHAP (SHapley Additive exPlanations) to deliver both global and class-specific explanations, enabling detailed insight into the features driving each attack classification. The results show that the model distinguishes distinct behavioral signatures of the attacks using flow timing, packet size uniformity, TCP flag dynamics, and statistical variance. Additional analysis that exposes both feature attribution and the decision trajectory per class further validates these observed patterns. Our findings contribute to the development of more accurate and explainable intrusion detection systems, bridging the gap between high-performance machine learning and the need for trust and accountability in AI-driven cybersecurity for IoT environments.

2603.22759 2026-03-25 cs.HC cs.RO

Human vs. NAO: A Computational-Behavioral Framework for Quantifying Social Orienting in Autism and Typical Development

Vartika Narayani Srinet, Anirudha Bhattacharjee, Braj Bhushan, Bishakh Bhattacharya

详情
英文摘要

Responding to one's name is among the earliest-emerging social orienting behaviors and is one of the most prominent aspects in the detection of Autism Spectrum Disorder (ASD). Typically developing children exhibit near-reflexive orienting to their name, whereas children with ASD often demonstrate reduced frequency, increased latency, or atypical patterns of response. In this study, we examine differential responsiveness to quantify name-calling stimuli delivered by both human agents and NAO, a humanoid robot widely employed in socially assistive interventions for autism. The analysis focuses on multiple behavioral parameters, including eye contact, response latency, head and facial orientation shifts, and duration of sustained interest. Video-based computational methods were employed, incorporating face detection, eye region tracking, and spatio-temporal facial analysis, to obtain fine-grained measures of children's responses. By comparing neurotypical and neuroatypical groups under controlled human-robot conditions, this work aims to understand how the source and modality of social cues affect attentional dynamics in name-calling contexts. The findings advance both the theoretical understanding of social orienting deficits in autism and the applied development of robot-assisted assessment tools.

2603.22750 2026-03-25 stat.ML cs.LG

REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees

Simon D. Nguyen, Hayden McTavish, Kentaro Hoffman, Cynthia Rudin, Tyler H. McCormick

详情
英文摘要

Active learning reduces labeling costs by selecting samples that maximize information gain. A dominant framework, Query-by-Committee (QBC), typically relies on perturbation-based diversity by inducing model disagreement through random feature subsetting or data blinding. While this approximates one notion of epistemic uncertainty, it sacrifices direct characterization of the plausible hypothesis space. We propose the complementary approach: Rashomon Ensembled Active Learning (REAL) which constructs a committee by exhaustively enumerating the Rashomon Set of all near-optimal models. To address functional redundancy within this set, we adopt a PAC-Bayesian framework using a Gibbs posterior to weight committee members by their empirical risk. Leveraging recent algorithmic advances, we exactly enumerate this set for the class of sparse decision trees. Across synthetic and established active learning baselines, REAL outperforms randomized ensembles, particularly in moderately noisy environments where it strategically leverages expanded model multiplicity to achieve faster convergence.

2603.22741 2026-03-25 cs.DS cs.LG cs.NA math.NA math.ST stat.ML stat.TH

Algorithmic warm starts for Hamiltonian Monte Carlo

Matthew S. Zhang, Jason M. Altschuler, Sinho Chewi

详情
英文摘要

Generating samples from a continuous probability density is a central algorithmic problem across statistics, engineering, and the sciences. For high-dimensional settings, Hamiltonian Monte Carlo (HMC) is the default algorithm across mainstream software packages. However, despite the extensive line of work on HMC and its widespread empirical success, it remains unclear how many iterations of HMC are required as a function of the dimension $d$. On one hand, a variety of results show that Metropolized HMC converges in $O(d^{1/4})$ iterations from a warm start close to stationarity. On the other hand, Metropolized HMC is significantly slower without a warm start, e.g., requiring $Ω(d^{1/2})$ iterations even for simple target distributions such as isotropic Gaussians. Finding a warm start is therefore the computational bottleneck for HMC. We resolve this issue for the well-studied setting of sampling from a probability distribution satisfying strong log-concavity (or isoperimetry) and third-order derivative bounds. We prove that \emph{non-Metropolized} HMC generates a warm start in $\tilde{O}(d^{1/4})$ iterations, after which we can exploit the warm start using Metropolized HMC. Our final complexity of $\tilde{O}(d^{1/4})$ is the fastest algorithm for high-accuracy sampling under these assumptions, improving over the prior best of $\tilde{O}(d^{1/2})$. This closes the long line of work on the dimensional complexity of MHMC for such settings, and also provides a simple warm-start prescription for practical implementations.

2603.22714 2026-03-25 cs.CY cs.AI

PopResume: Causal Fairness Evaluation of LLM/VLM Resume Screeners with Population-Representative Dataset

Sumin Yu, Juhyeon Park, Taesup Moon

Comments Under Review

详情
英文摘要

We present PopResume, a population-representative resume dataset for causal fairness auditing of LLM- and VLM-based resume screening systems. Unlike existing benchmarks that rely on manually injected demographic information and outcome-level disparities, PopResume is grounded in population statistics and preserves natural attribute relationships, enabling path-specific effect (PSE)-based fairness evaluation. We decompose the effect of a protected attribute on resume scores into two paths: the business necessity path, mediated by job-relevant qualifications, and the redlining path, mediated by demographic proxies. This distinction allows auditors to separate legally permissible from impermissible sources of disparity. Evaluating four LLMs and four VLMs on PopResume's 60.8K resumes across five occupations, we identify five representative discrimination patterns that aggregate metrics fail to capture. Our results demonstrate that PSE-based evaluation reveals fairness issues masked by outcome-level measures, underscoring the need for causally-grounded auditing frameworks in AI-assisted hiring.

2603.22648 2026-03-25 cs.HC cs.AI

AwesomeLit: Towards Hypothesis Generation with Agent-Supported Literature Research

Zefei Xie, Yuhan Guo, Kai Xu

详情
英文摘要

There are different goals for literature research, from understanding an unfamiliar topic to generate hypothesis for the next research project. The nature of literature research also varies according to user's familiarity level of the topic. For inexperienced researchers, identifying gaps in the existing literature and generating feasible hypothesis are crucial but challenging. While general ``deep research'' tools can be used, they are not designed for such use case, thus often not effective. In addition, the ``black box" nature and hallucination of Large Language Models (LLMs) often lead to distrust. In this paper, we introduce a human-agent collaborative visualization system AwesomeLit to address this need. It has several novel features: a transparent user-steerable agentic workflow; a dynamically generated query exploring tree, visualizing the exploration path and provenance; and a semantic similarity view, depicting the relationships between papers. It enables users to transition from general intentions to detailed research topics. Finally, a qualitative study involving several early researchers showed that AwesomeLit is effective in helping users explore unfamiliar topics, identify promising research directions, and improve confidence in research results.

2603.22644 2026-03-25 stat.ML cs.LG

Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

Xiaohan Zhu, Mesrob I. Ohannessian, Nathan Srebro

详情
英文摘要

We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized ''posterior'' predictor with its KL divergence to a pre-specified ''prior''. This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule, to continuous priors and randomized predictions. With a balancing parameter of $λ=1$ this learning rule recovers an (empirical) Bayes posterior and a modified variant recovers the profile posterior, linking with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead a choice of $λ\gg 1$, which can be seen as using a sample-size-dependent-prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter $λ$, understanding the regimes in which this under-regularization is tempered or catastrophic. This work extends previous work by Zhu and Srebro [2025] that considered only discrete priors to PAC Bayes type learning rules and, through their rigorous Bayesian interpretation, to Bayesian prediction more generally.

2603.22634 2026-03-25 cs.HC cs.AI

Learning to Trust: How Humans Mentally Recalibrate AI Confidence Signals

ZhaoBin Li, Mark Steyvers

详情
英文摘要

Productive human-AI collaboration requires appropriate reliance, yet contemporary AI systems are often miscalibrated, exhibiting systematic overconfidence or underconfidence. We investigate whether humans can learn to mentally recalibrate AI confidence signals through repeated experience. In a behavioral experiment (N = 200), participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping. Results demonstrate robust learning across all conditions, with participants significantly improving their accuracy, discrimination, and calibration alignment over 50 trials. We present a computational model utilizing a linear-in-log-odds (LLO) transformation and a Rescorla-Wagner learning rule to explain these dynamics. The model reveals that humans adapt by updating their baseline trust and confidence sensitivity, using asymmetric learning rates to prioritize the most informative errors. While humans can compensate for monotonic miscalibration, we identify a significant boundary in the reverse confidence scenario, where a substantial proportion of participants struggled to override initial inductive biases. These findings provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience.

2603.22627 2026-03-25 eess.IV cs.CV

Single-Subject Multi-View MRI Super-Resolution via Implicit Neural Representations

Heejong Kim, Abhishek Thanki, Roel van Herten, Daniel Margolis, Mert R Sabuncu

详情
英文摘要

Clinical MRI frequently acquires anisotropic volumes with high in-plane resolution and low through-plane resolution to reduce acquisition time. Multiple orientations are therefore acquired to provide complementary anatomical information. Conventional integration of these views relies on registration followed by interpolation, which can degrade fine structural details. Recent deep learning-based super-resolution (SR) approaches have demonstrated strong performance in enhancing single-view images. However, their clinical reliability is often limited by the need for large-scale training datasets, resulting in increased dependence on cohort-level priors. Self-supervised strategies offer an alternative by learning directly from the target scans. Prior work either neglects the existence of multi-view information or assumes that in-plane information can supervise through-plane reconstruction under the assumption of pre-alignment between images. However, this assumption is rarely satisfied in clinical settings. In this work, we introduce Single-Subject Implicit Multi-View Super-Resolution for MRI (SIMS-MRI), a framework that operates solely on anisotropic multi-view scans from a single patient without requiring pre- or post-processing. Our method combines a multi-resolution hash-encoded implicit representation with learned inter-view alignment to generate a spatially consistent isotropic reconstruction. We validate the SIMS-MRI pipeline on both simulated brain and clinical prostate MRI datasets. Code will be made publicly available for reproducibility: https://github.com/abhshkt/SIMS-MRI

2603.22625 2026-03-25 cs.IR cs.CL

Leveraging Large Language Models to Extract and Translate Medical Information in Doctors' Notes for Health Records and Diagnostic Billing Codes

Peter Hartnett, Chung-Chi Huang, Sarah Hartnett, David Hartnett

Comments 45 pages, 19 figures

详情
英文摘要

Physician burnout in the United States has reached critical levels, driven in part by the administrative burden of Electronic Health Record (EHR) documentation and complex diagnostic codes. To relieve this strain and maintain strict patient privacy, this thesis explores an on-device, offline automatic medical coding system. The work focuses on using open-weight Large Language Models (LLMs) to extract clinical information from physician notes and translate it into ICD-10-CM diagnostic codes without reliance on cloud-based services. A privacy-focused pipeline was developed using Ollama, LangChain, and containerized environments to evaluate multiple open-weight models, including Llama 3.2, Mistral, Phi, and DeepSeek, on consumer-grade hardware. Model performance was assessed for zero-shot, few-shot, and retrieval-augmented generation (RAG) prompting strategies using a novel benchmark of synthetic medical notes. Results show that strict JSON schema enforcement achieved near 100% formatting compliance, but accurate generation of specific diagnostic codes remains challenging for smaller local models (7B-20B parameters). Contrary to common prompt-engineering guidance, few-shot prompting degraded performance through overfitting and hallucinations. While RAG enabled limited discovery of unseen codes, it frequently saturated context windows, reducing overall accuracy. The findings suggest that fully automated unsupervised coding with local open-source models is not yet reliable; instead, a human-in-the-loop assisted coding approach is currently the most practical path forward. This work contributes a reproducible local LLM architecture and benchmark dataset for privacy-preserving medical information extraction and coding.

2603.22617 2026-03-25 cs.HC cs.AI

Do Consumers Accept AIs as Moral Compliance Agents?

Greg Nyilasy, Abraham Ryan Ade Putra Hito, Jennifer Overbeck, Brock Bastian, Darren W. Dahl

详情
英文摘要

Consumers are generally resistant to Artificial Intelligence (AI) involvement in moral decision-making, perceiving moral agency as requiring uniquely human traits. This research investigates whether consumers might instead accept AIs in the role of moral compliance, where AI upholds pre-existing moral norms without exercising subjective discretion. Across five studies this research shows that consumers evaluate AI more positively than human agents in moral compliance roles. The findings reveal that this preference arises from inferences of AI's lack of ulterior motives, which are often attributed to human agents. While previous studies have focused on AI as a decision-maker, this work demonstrates the critical role of upholding pre-existing rules, a role in which AI is perceived to excel. These findings contribute to understanding consumer acceptance of moral AI and provide actionable insights for organizations seeking to leverage AI in ethical oversight. By positioning AI as a moral compliance agent, companies can address consumer skepticism, enhance trust, and improve perceptions of corporate ethicality.

2603.22587 2026-03-25 cs.IR cs.AI cs.DB

flexvec: SQL Vector Retrieval with Programmatic Embedding Modulation

Damian Delmas

Comments 15 pages, 1 figure, 7 tables, 4 appendices. Code available at https://github.com/damiandelmas/flexvec

详情
英文摘要

As AI agents become the primary consumers of retrieval APIs, there is an opportunity to expose more of the retrieval pipeline to the caller. flexvec is a retrieval kernel that exposes the embedding matrix and score array as a programmable surface, allowing arithmetic operations on both before selection. We refer to composing operations on this surface at query time as Programmatic Embedding Modulation (PEM). This paper describes a set of such operations and integrates them into a SQL interface via a query materializer that facilitates composable query primitives. On a production corpus of 240,000 chunks, three composed modulations execute in 19 ms end-to-end on a desktop CPU without approximate indexing. At one million chunks, the same operations execute in 82 ms.

2603.22577 2026-03-25 cs.CR cs.AI cs.MA

STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

James Hugglestone, Samuel Jacob Chacko, Dawson Stoller, Ryan Schmidt, Xiuwen Liu

Comments 8 pages, 7 pages

详情
英文摘要

Large Language Models (LLMs) have demonstrated potential in code generation, yet they struggle with the multi-step, stateful reasoning required for offensive cybersecurity operations. Existing research often relies on static benchmarks that fail to capture the dynamic nature of real-world vulnerabilities. In this work, we introduce STRIATUM-CTF (A Search-based Test-time Reasoning Inference Agent for Tactical Utility Maximization in Cybersecurity), a modular agentic framework built upon the Model Context Protocol (MCP). By standardizing tool interfaces for system introspection, decompilation, and runtime debugging, STRIATUM-CTF enables the agent to maintain a coherent context window across extended exploit trajectories. We validate this approach not merely on synthetic datasets, but in a live competitive environment. Our system participated in a university-hosted Capture-the-Flag (CTF) competition in late 2025, where it operated autonomously to identify and exploit vulnerabilities in real-time. STRIATUM-CTF secured First Place, outperforming 21 human teams and demonstrating strong adaptability in a dynamic problem-solving setting. We analyze the agent's decision-making logs to show how MCP-based tool abstraction significantly reduces hallucination compared to naive prompting strategies. These results suggest that standardized context protocols are a critical path toward robust autonomous cyber-reasoning systems.

2603.22563 2026-03-25 stat.ML cs.LG

Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling

Young Hyun Cho, Will Wei Sun

详情
英文摘要

Preference-based fine-tuning has become an important component in training large language models, and the data used at this stage may contain sensitive user information. A central question is how to design a differentially private pipeline that is well suited to the distinct structure of reinforcement learning from human feedback. We propose a privacy-preserving framework that imposes differential privacy only on reward learning and derives the final policy from the resulting private reward model. Theoretically, we study the suboptimality gap and show that privacy contributes an additional additive term beyond the usual non-private statistical error. We also establish a minimax lower bound and show that the dominant term changes with sample size and privacy level, which in turn characterizes regimes in which the upper bound is rate-optimal up to logarithmic factors. Empirically, synthetic experiments confirm the scaling predicted by the theory, and experiments on the Anthropic HH-RLHF dataset using the Gemma-2B-IT model show stronger private alignment performance than existing differentially private baseline methods across privacy budgets.

2603.22536 2026-03-25 eess.AS cs.SD

MSP-Conversation: A Corpus for Naturalistic, Time-Continuous Emotion Recognition

Luz Martinez-Lucas, Pravin Mote, Abinay Reddy Naini, Mohammed Abdelwahab, Carlos Busso

详情
英文摘要

Affective computing aims to understand and model human emotions for computational systems. Within this field, speech emotion recognition (SER) focuses on predicting emotions conveyed through speech. While early SER systems relied on limited datasets and traditional machine learning models, recent deep learning approaches demand largescale, naturalistic emotional corpora. To address this need, we introduce the MSP-Conversation corpus: a dataset of more than 70 hours of conversational audio with time-continuous emotional annotations and detailed speaker diarizations. The time-continuous annotations capture the dynamic and contextdependent nature of emotional expression. The annotations in the corpus include fine-grained temporal traces of valence, arousal, and dominance. The audio data is sourced from publicly available podcasts and overlaps with a subset of the isolated speaking turns in the MSP-Podcast corpus to facilitate direct comparisons between annotation methods (i.e., in-context versus out-of-context annotations). The paper outlines the development of the corpus, annotation methodology, analyses of the annotations, and baseline SER experiments, establishing the MSP-Conversation corpus as a valuable resource for advancing research in dynamic SER in naturalistic settings.