arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1080
2604.21518 2026-04-24 eess.IV cs.CV

DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

Shiyan Su, Ruyi Zha, Danli Shi, Hongdong Li, Xuelian Cheng

Comments Accepted to AAAI 2026. Project page: https://ooonesevennn.github.io/DiffNR/

详情
英文摘要

Neural representations (NRs), such as neural fields and 3D Gaussians, effectively model volumetric data in computed tomography (CT) but suffer from severe artifacts under sparse-view settings. To address this, we propose DiffNR, a novel framework that enhances NR optimization with diffusion priors. At its core is SliceFixer, a single-step diffusion model designed to correct artifacts in degraded slices. We integrate specialized conditioning layers into the network and develop tailored data curation strategies to support model finetuning. During reconstruction, SliceFixer periodically generates pseudo-reference volumes, providing auxiliary 3D perceptual supervision to fix underconstrained regions. Compared to prior methods that embed CT solvers into time-consuming iterative denoising, our repair-and-augment strategy avoids frequent diffusion model queries, leading to better runtime performance. Extensive experiments show that DiffNR improves PSNR by 3.99 dB on average, generalizes well across domains, and maintains efficient optimization.

2604.21507 2026-04-24 eess.AS cs.SD

DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline

Nikhil Raghav

Comments 13 pages, 7 figures, 2 tables. Code available at https://github.com/nikhilraghav29/diarizen-tutorial

详情
英文摘要

Speaker diarization (SD) is the task of answering "who spoke when" in a multi-speaker audio stream. Classically, an SD system clusters segments of speech belonging to an individual speaker's identity. Recent years have seen substantial progress in SD through end-to-end neural diarization (EEND) approaches. DiariZen, a hybrid SD pipeline built upon a structurally pruned WavLM-Large encoder, a Conformer backend with powerset classification, and VBx clustering, represents the leading open-source state of the art at the time of writing across multiple benchmarks. Despite its strong performance, the DiariZen architecture spans several repositories and frameworks, making it difficult for researchers and practitioners to understand, reproduce, or extend the system as a whole. This tutorial paper provides a self-contained, block-by-block explanation of the complete DiariZen pipeline, decomposing it into seven stages: (1) audio loading and sliding window segmentation, (2) WavLM feature extraction with learned layer weighting, (3) Conformer backend and powerset classification, (4) segmentation aggregation via overlap-add, (5) speaker embedding extraction with overlap exclusion, (6) VBx clustering with PLDA scoring, and (7) reconstruction and RTTM output. For each block, we provide the conceptual motivation, source code references, intermediate tensor shapes, and annotated visualizations of the actual outputs on a 30s excerpt from the AMI Meeting Corpus. The implementation is available at https://github.com/nikhilraghav29/diarizen-tutorial, which includes standalone executable scripts for each block and a Jupyter notebook that runs the complete pipeline end-to-end.

2604.21432 2026-04-24 stat.ML cs.LG

A single algorithm for both restless and rested rotting bandits

Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko

Comments In AISTATS 2020

详情
英文摘要

In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated to the actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes outdated). These two situations can be modeled as specific instances of the rested and restless bandit settings, where arms are rotting (i.e., their value decrease over time). These problems were thought to be significantly different, since Levine et al. (2017) showed that state-of-the-art algorithms for restless bandit perform poorly in the rested rotting setting. In this paper, we introduce a novel algorithm, Rotting Adaptive Window UCB (RAW-UCB), that achieves near-optimal regret in both rotting rested and restless bandit, without any prior knowledge of the setting (rested or restless) and the type of non-stationarity (e.g., piece-wise constant, bounded variation). This is in striking contrast with previous negative results showing that no algorithm can achieve similar results as soon as rewards are allowed to increase. We confirm our theoretical findings on a number of synthetic and dataset-based experiments.

2604.21421 2026-04-24 cs.CR cs.AI cs.CL

Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation

Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy, Ameen Abu-Hanna, Sébastien Bratières, Iacer Calixto

详情
英文摘要

Protecting patient privacy in clinical narratives is essential for enabling secondary use of healthcare data under regulations such as GDPR and HIPAA. While manual de-identification remains the gold standard, it is costly and slow, motivating the need for automated methods that combine privacy guarantees with high utility. Most automated text de-identification pipelines employed named entity recognition (NER) to identify protected entities for redaction. Although methods based on differential privacy (DP) provide formal privacy guarantees, more recently also large language models (LLMs) are increasingly used for text de-identification in the clinical domain. In this work, we present the first comparative study of DP, NER, and LLMs for Dutch clinical text de-identification. We investigate these methods separately as well as hybrid strategies that apply NER or LLM preprocessing prior to DP, and assess performance in terms of privacy leakage and extrinsic evaluation (entity and relation classification). We show that DP mechanisms alone degrade utility substantially, but combining them with linguistic preprocessing, especially LLM-based redaction, significantly improves the privacy-utility trade-off.

2604.21416 2026-04-24 cs.CR cs.AI

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu, Bo Liu, Wanlei Zhou

详情
英文摘要

Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to misclassify triggered inputs as adversary-specified labels while maintaining performance on clean data. Existing poison restraint-based defenses often suffer from inadequate detection against specific attack variants and compromise model utility through unlearning methods that lead to accuracy degradation. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated clusters in latent space early on, with triggers acting as dominant features distinct from benign ones. Leveraging these insights, we propose Cluster Segregation Concealment (CSC), a novel poison suppression defense. CSC first trains a deep neural network via standard supervised learning while segregating poisoned samples through feature extraction from early epochs, DBSCAN clustering, and identification of anomalous clusters based on class diversity and density metrics. In the concealment stage, identified poisoned samples are relabeled to a virtual class, and the model's classifier is fine-tuned using cross-entropy loss to replace the backdoor association with a benign virtual linkage, preserving overall accuracy. CSC was evaluated on four benchmark datasets against twelve poisoning-based attacks, CSC outperforms nine state-of-the-art defenses by reducing average attack success rates to near zero with minimal clean accuracy loss. Contributions include robust backdoor patterns identification, an effective concealment mechanism, and superior empirical validation, advancing trustworthy artificial intelligence.

2604.21310 2026-04-24 cs.CR cs.AI

Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations

Pawan Acharya, Lan Zhang

详情
英文摘要

Deep learning has emerged as a powerful approach for malware detection, demonstrating impressive accuracy across various data representations. However, these models face critical limitations in real-world, non-stationary environments where both malware characteristics and detection systems continuously evolve. Our research investigates a fundamental security question: Can an attacker generate adversarial malware samples that simultaneously evade classification and remain inconspicuous to drift monitoring mechanisms? We propose a novel approach that generates targeted adversarial examples in the classifier's standardized feature space, augmented with sophisticated similarity regularizers. By carefully constraining perturbations to maintain distributional similarity with clean malware, we create an optimization objective that balances targeted misclassification with drift signal minimization. We quantify the effectiveness of this approach by comprehensively comparing classifier output probabilities using multiple drift metrics. Our experiments demonstrate that similarity constraints can reduce output drift signals, with $\ell_2$ regularization showing the most promising results. We observe that perturbation budget significantly influences the evasion-detectability trade-off, with increased budget leading to higher attack success rates and more substantial drift indicators.

2604.21308 2026-04-24 cs.CR cs.CL

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents

Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Lukas Wutschitz, Robert Sim, Saravan Rajmohan, Dongmei Zhang

详情
Journal ref
The 64th Annual Meeting of the Association for Computational Linguistics (ACL'2026) -- Industry Track
英文摘要

Enterprise LLM agents can dramatically improve workplace productivity, but their core capability, retrieving and using internal context to act on a user's behalf, also creates new risks for sensitive information leakage. We introduce CI-Work, a Contextual Integrity (CI)-grounded benchmark that simulates enterprise workflows across five information-flow directions and evaluates whether agents can convey essential content while withholding sensitive context in dense retrieval settings. Our evaluation of frontier models reveals that privacy failures are prevalent (violation rates range from 15.8%-50.9%, with leakage reaching up to 26.7%) and uncovers a counterintuitive trade-off critical for industrial deployment: higher task utility often correlates with increased privacy violations. Moreover, the massive scale of enterprise data and potential user behavior further amplify this vulnerability. Simply increasing model size or reasoning depth fails to address the problem. We conclude that safeguarding enterprise workflows requires a paradigm shift, moving beyond model-centric scaling toward context-centric architectures.

2604.21282 2026-04-24 cs.CR cs.LG cs.SE

Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

Zhaohui Geoffrey Wang

Comments 11 pages, 5 figures. Accepted at the AAMAS 2026 Workshop on Software Engineering (SE Workshop). This version corresponds to the preprint of the workshop paper

详情
英文摘要

Automated code vulnerability detection is critical for software security, yet existing approaches face a fundamental trade-off between detection accuracy and computational cost. We propose a heterogeneous multi-agent architecture inspired by game-theoretic principles, combining cloud-based LLM experts with a local lightweight verifier. Our "3+1" architecture deploys three cloud-based expert agents (DeepSeek-V3) that analyze code from complementary perspectives - code structure, security patterns, and debugging logic - in parallel, while a local verifier (Qwen3-8B) performs adversarial validation at zero marginal cost. We formalize this design through a two-layer game framework: (1) a cooperative game among experts capturing super-additive value from diverse perspectives, and (2) an adversarial verification game modeling quality assurance incentives. Experiments on 262 real samples from the NIST Juliet Test Suite across 14 CWE types, with balanced vulnerable and benign classes, demonstrate that our approach achieves a 77.2% F1 score with 62.9% precision and 100% recall at $0.002 per sample - outperforming both a single-expert LLM baseline (F1 71.4%) and Cppcheck static analysis (MCC 0). The adversarial verifier significantly improves precision (+10.3 percentage points, p < 1e-6, McNemar's test) by filtering false positives, while parallel execution achieves a 3.0x speedup. Our work demonstrates that game-theoretic design principles can guide effective heterogeneous multi-agent architectures for cost-sensitive software engineering tasks.

2604.21270 2026-04-24 stat.ML cs.LG cs.SY eess.SY math.OC

CLT-Optimal Parameter Error Bounds for Linear System Identification

Yichen Zhou, Stephen Tu

Comments 36 pages

详情
英文摘要

There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current state-of-the-art bounds do not accurately capture the statistical complexity of system identification, even in the most fundamental setting of estimating a discrete-time linear dynamical system (LDS) via ordinary least-squares regression (OLS). Specifically, we utilize asymptotic normality to identify classes of problem instances for which current bounds overstate the squared parameter error, in both spectral and Frobenius norm, by a factor of the state-dimension of the system. Informed by this discrepancy, we then sharpen the OLS parameter error bounds via a novel second-order decomposition of the parameter error, where crucially the lower-order term is a matrix-valued martingale that we show correctly captures the CLT scaling. From our analysis we obtain finite-sample bounds for both (i) stable systems and (ii) the many-trajectories setting that match the instance-specific optimal rates up to constant factors in Frobenius norm, and polylogarithmic state-dimension factors in spectral norm.

2604.21260 2026-04-24 stat.ML cs.AI cs.LG econ.EM q-bio.QM stat.ME

Calibeating Prediction-Powered Inference

Lars van der Laan, Mark Van Der Laan

Comments Paper website: https://larsvanderlaan.github.io/ppi-aipw/

详情
英文摘要

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at https://larsvanderlaan.github.io/ppi-aipw/.

2604.21233 2026-04-24 physics.ao-ph cs.LG physics.data-an physics.geo-ph

Assessing Emulator Design and Training for Modal Aerosol Microphysics Parameterizations in E3SMv2

Shady E. Ahmed, Hui Wan, Saad Qadeer, Panos Stinis, Kezhen Chong, Mohammad Taufiq Hassan Mozumder, Kai Zhang, Ann S. Almgren

Comments 16 pages, 7 figures

详情
英文摘要

Toward the goal of using Scientific Machine Learning (SciML) emulators to improve the numerical representation of aerosol processes in global atmospheric models, we explore the emulation of aerosol microphysics processes under cloud-free conditions in the 4-mode Modal Aerosol Module (MAM4) within the Energy Exascale Earth System Model version 2 (E3SMv2). To develop an in-depth understanding of the challenges and opportunities in applying SciML to aerosol processes, we begin with a simple feedforward neural network architecture that has been used in earlier studies, but we systematically examine key emulator design choices, including architecture complexity and variable normalization, while closely monitoring training convergence behavior. Our results show that optimization convergence, scaling strategy, and network complexity strongly influence emulation accuracy. When effective scaling is applied and convergence is achieved, the relatively simple architecture, used together with a moderate network size, can reproduce key features of the microphysics-induced aerosol concentration changes with promising accuracy. These findings provide practical clues for the next stages of emulator development; they also provide general insights that are likely applicable to the emulation of other aerosol processes, as well as other atmospheric physics involving multi-scale variability.

2604.21231 2026-04-24 cs.NI cs.AI cs.PF

SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

Hongyao Liu, Liuqun Zhai, Junyi Wang, Zhengru Fang

Comments IEEE INTERNET OF THINGS HOURNAL, 11 pages under major revision

详情
英文摘要

Efficient inference for on-device Large Language Models (LLMs) remains challenging due to limited hardware resources and the high cost of the prefill stage, which processes the full input context to construct Key-Value (KV) caches. We present SparKV, an adaptive KV loading framework that combines cloud-based KV streaming with on-device computation. SparKV models the cost of individual KV chunks and decides whether each chunk should be streamed or computed locally, while overlapping the two execution paths to reduce latency. To handle fluctuations in wireless connectivity and edge resource availability, SparKV further refines offline-generated schedules at runtime to rebalance communication and computation costs. Experiments across diverse datasets, LLMs, and edge devices show that SparKV reduces Time-to-First-Token by 1.3$x-5.1x with negligible impact on response quality, while lowering per-request energy consumption by 1.5x to 3.3x, demonstrating its robustness and practicality for real-world on-device deployment.

2604.21222 2026-04-24 cond-mat.mtrl-sci cs.LG

Neutron and X-ray Diffraction Reveal the Limits of Long-Range Machine Learning Potentials for Medium-Range Order in Silica Glass

Sai Harshit Balantrapu, Atul C. Thakur, Chris Benmore, Ganesh Sivaraman

Comments 19 pages, 9 figures

详情
英文摘要

Glassy silica is a foundational material in optics and electronics, yet accurately predicting its medium-range order (MRO) remains a major challenge for machine-learning interatomic potentials (MLIPs). While local MLIPs reproduce the short-range SiO4 tetrahedral network well, it remains unclear whether locality alone is sufficient to recover the first sharp diffraction peak (FSDP), the principal experimental signature of MRO. Here, we combine neutron and X-ray diffraction measurements with large-scale molecular dynamics driven by two MACE-based models: a short-range (SR) potential and a long-range (LR) extension incorporating reciprocal-space gated attention. The SR model systematically over-structures the network, producing an overly intense FSDP in both the liquid and glassy states. Incorporating long-range interactions improves agreement with experiment for the liquid structure by reducing this excess ordering, but the LR model still fails to recover the experimental amorphous MRO after quenching. Ring-statistics and bond-angle analyses reveal that SR model exhibits an artificially narrow distribution dominated by six-membered rings, while the LR model produces a broader but still biased ring population. Despite preserving the correct tetrahedral geometry, both models show limited variability in Si-O-Si angles, indicating constrained network flexibility. These structural signatures demonstrate that both models retain excessive memory of the parent liquid network, leading to kinetically trapped and nonphysical medium-range configurations during vitrification. These results show that explicit long-range interactions are necessary but not sufficient for predictive modelling of disordered silica and suggest that accurate MRO further requires training data and sampling strategies that adequately represent the liquid-to-glass transition.

2604.21216 2026-04-24 econ.TH cs.AI cs.GT

Post-AGI Economies: Autonomy and the First Fundamental Theorem of Welfare Economics

Elija Perrier

Comments Under review

详情
英文摘要

The First Fundamental Theorem of Welfare Economics assumes that welfare-bearing agents are autonomous and implicitly relies on a binary distinction between autonomy and instrumentality. Welfare subjects are those who have autonomy and therefore the capacity to choose and enter into utility comparisons, while everything else does not. In post-AGI economies this presupposition becomes nontrivial because artificial systems may exhibit varying degrees of autonomy, functioning as tools, delegates, strategic market actors, manipulators of choice environments, or possible welfare subjects. We argue that the theorem ought to be subject to an autonomy qualification where the impact of these changes in autonomy assumptions is incorporated. Using a minimal general-equilibrium model with autonomy-conditioned welfare, welfare-status assignment, delegation accounting, and verification institutions, we set out conditions for which autonomy-complete competitive equilibrium is autonomy-Pareto efficient. The classical theorem is recovered as the low-autonomy limit.

2604.21210 2026-04-24 quant-ph cs.LG

The Feedback Hamiltonian is the Score Function: A Diffusion-Model Framework for Quantum Trajectory Reversal

Sagar Dubey, Alan John

Comments 14 pages

详情
英文摘要

In continuously monitored quantum systems, the feedback protocol of García-Pintos, Liu, and Gorshkov reshapes the arrow of time: a Hamiltonian $H_{\mathrm{meas}} = r A / τ$ applied with gain $X$ tilts the distribution of measurement trajectories, with $X < -2$ producing statistically time-reversed outcomes. Why this specific Hamiltonian achieves reversal, and how the mechanism relates to score-based diffusion models in machine learning, has remained unexplained. We compute the functional derivative of the log path probability of the quantum trajectory distribution directly in density-matrix space. Combining Girsanov's theorem applied to the measurement record, Fréchet differentiation on the Banach space of trace-class operators, and Kähler geometry on the pure-state projective manifold, we prove that $δ\log P_F / δρ= r A / τ= H_{\mathrm{meas}}$. The García-Pintos feedback Hamiltonian is the score function of the quantum trajectory distribution -- exactly the object Anderson's reverse-time diffusion theorem requires for trajectory reversal. The identification extends to multi-qubit systems with independent measurement channels, where the score is a sum of local operators. Two consequences follow. First, the feedback gain $X$ generates a continuous one-parameter family of path measures (for feedback-active Hamiltonians with $[H, A] \neq 0$), with $X = -2$ recovering the backward process in leading-order linearization -- a structure absent from classical diffusion, where reversal is binary. Second, the score identification enables machine learning (ML) score estimation methods -- denoising score matching, sliced score matching -- to replace the analytic formula when its idealizations (unit efficiency, zero delay, Gaussian noise) fail in real experiments.

2604.21203 2026-04-24 stat.ML cs.LG

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

详情
英文摘要

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible second-order (Hessian) information or suffer from slow convergence. To address these challenges, we propose a novel, fully online de-biased covariance estimator that eliminates the need for second-order derivatives while significantly improving estimation accuracy. Our method employs a bias-reduction technique to achieve a convergence rate of $n^{(α-1)/2} \sqrt{\log n}$, outperforming existing Hessian-free alternatives.

2604.21202 2026-04-24 econ.EM cs.CL

Participation and Representation in Local Government Speech

Olivia Martin, Amar Venugopal

详情
英文摘要

Local government meetings are the most common formal channel through which residents speak directly with elected officials, contest policies, and shape local agendas. However, data constraints typically limit the empirical study of these meetings to agendas, single cities, or short time horizons. We collect and transcribe a massive new dataset of city council meetings from 115 California cities over the last decade, using advanced transcription and diarization techniques to analyze the speech content of the meetings themselves. We document two sets of descriptive findings: First, city council meetings are frequent, long, and vary modestly across towns and time in topical content. Second, public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population, and public participation surges when topics related to land use and zoning are included in meeting agendas. Given this skew, we examine the main policy lever municipalities have to shift participation patterns: meeting access costs. Exploiting pandemic-era variation in remote access, we show that eliminating remote options reduces the number of speakers, but does not clearly change the composition of speakers. Collectively, these results provide the most comprehensive empirical portrait to date of who participates in local democracy, what draws them in, and how institutional design choices shape both the volume and composition of public input.

2604.21187 2026-04-24 math.CO cs.AI

Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery

Benjamin Przybocki, John Mackey, Marijn J. H. Heule, Bernardo Subercaseaux

详情
英文摘要

Ramsey-good graphs are graphs that contain neither a clique of size $s$ nor an independent set of size $t$. We study doubly saturated Ramsey-good graphs, defined as Ramsey-good graphs in which the addition or removal of any edge necessarily creates an $s$-clique or a $t$-independent set. We present a method combining SAT solving with bespoke LLM-generated code to discover infinite families of such graphs, answering a question of Grinstead and Roberts from 1982. In addition, we use LLMs to generate and formalize correctness proofs in Lean. This case study highlights the potential of integrating automated reasoning, large language models, and formal verification to accelerate mathematical discovery. We argue that such tool-driven workflows will play an increasingly central role in experimental mathematics.

2604.21174 2026-04-24 cs.CE cs.AI math.AP

Scaling of Gaussian Kolmogorov--Arnold Networks

Amir Noorizadegan, Sifan Wang

详情
英文摘要

The Gaussian scale parameter \(ε\) is central to the behavior of Gaussian Kolmogorov--Arnold Networks (KANs), yet its role in deep edge-based architectures has not been studied systematically. In this paper, we investigate how \(ε\) affects Gaussian KANs through first-layer feature geometry, conditioning, and approximation behavior. Our central observation is that scale selection is governed primarily by the first layer, since it is the only layer constructed directly on the input domain and any loss of distinguishability introduced there cannot be recovered by later layers. From this viewpoint, we analyze the first-layer feature matrix and identify a practical operating interval, \[ ε\in \left[\frac{1}{G-1},\frac{2}{G-1}\right], \] where \(G\) denotes the number of Gaussian centers. For the standard shared-center Gaussian KAN used in current practice, we interpret this interval not as a universal optimality result, but as a stable and effective design rule, and validate it through brute-force sweeps over \(ε\) across function-approximation problems with different collocation densities, grid resolutions, network architectures, and input dimensions, as well as a physics-informed Helmholtz problem. We further show that this range is useful for fixed-scale selection, variable-scale constructions, constrained training of \(ε\), and efficient scale search using early training MSE. Finally, using a matched Chebyshev reference, we show that a properly scaled Gaussian KAN can already be competitive in accuracy relative to another standard KAN basis. In this way, the paper positions scale selection as a practical design principle for Gaussian KANs rather than as an ad hoc hyperparameter choice.

2604.21172 2026-04-24 cs.LO cs.AI

TAPO-Description Logic for Information Behavior: Refined OBoxes, Inference, and Categorical Semantics

Takao Inoué

Comments 23 pages, 2 figures. Substantially expanded version of arXiv:2602.17242; adds a guard-judgment layer, refined OBoxes, core inference rules, categorical semantics, sheaf-theoretic refinement, and a browsing-theory appendix

详情
英文摘要

This paper develops a refined version of TAPO-description logic for the analysis of information behavior. The framework is treated not as a single homogeneous object logic, but as a layered formalism consisting of a static descriptive layer (TBox/ABox), a procedural layer (PBox), and an oracle-sensitive layer (OBox). To make this architecture mathematically explicit, we introduce a metalevel guard-judgment layer governing procedural branching and iteration. On this basis we formulate a core inference system for TAPO-description logic, covering static TBox/ABox reasoning, guarded procedural transition in the PBox, and validated external import in the OBox. We then give a categorical semantics for the resulting framework and indicate its sheaf-theoretic refinement. The theory is illustrated by examples of information-seeking behavior, including simple search behavior and review-sensitive ordering behavior in a curry restaurant. The aim is to treat not only static knowledge representation but also hesitation, external consultation, and action-guiding update within a unified logical setting.

2604.21159 2026-04-24 cs.CR cs.AI cs.CL cs.LG

Adaptive Instruction Composition for Automated LLM Red-Teaming

Jesse Zymet, Andy Luo, Swapnil Shinde, Sahil Wadhwa, Emily Chen

Comments Accepted to ACL 2026 Main Conference

详情
英文摘要

Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks by combining crowdsourced harmful queries and tactics into instructions for the attacker, but does so at random, limiting effectiveness. This article introduces a novel framework, Adaptive Instruction Composition, that combines crowdsourced texts according to an adaptive mechanism trained to jointly optimize effectiveness with diversity. We use reinforcement learning to balance exploration with exploitation in a combinatorial space of instructions to guide the attacker toward diverse generations tailored to target vulnerabilities. We demonstrate that our approach substantially outperforms random combination on a set of effectiveness and diversity metrics, even under model transfer. Further, we show that it surpasses a host of recent adaptive approaches on Harmbench. We employ a lightweight neural contextual bandit that adapts to contrastive embedding inputs, and provide ablations suggesting that the contrastive pretraining enables the network to rapidly generalize and scale to the massive space as it learns.

2604.21152 2026-04-24 cs.CY cs.AI cs.CL cs.HC cs.IR

Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles

Irti Haq, Belén Saldías

Comments In The 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT '26), June 25--28, 2026, Montreal, Canada. ACM, New York, NY, USA, 32 pages

详情
英文摘要

As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical. However, it remains unclear whether these disparities arise from the explicitly stated identity itself or from the way identity is signaled. In real-world interactions, users' identity is often conveyed implicitly through a complex combination of various socio-linguistic factors. This study disentangles these signals by employing a factorial design with over 24,000 responses from two open-weight LLMs (Gemma-3-12B and Qwen-3-VL-8B), comparing prompts with explicitly announced user profiles against implicit dialect signals (e.g., AAVE, Singlish) across various sensitive domains. Our results uncover a unique paradox in LLM safety where users achieve ``better'' performance by sounding like a demographic than by stating they belong to it. Explicit identity prompts activate aggressive safety filters, increasing refusal rates and reducing semantic similarity compared to our reference text for Black users. In contrast, implicit dialect cues trigger a powerful ``dialect jailbreak,'' reducing refusal probability to near zero while simultaneously achieving a greater level of semantic similarity to the reference texts compared to Standard American English prompts. However, this ``dialect jailbreak'' introduces a critical safety trade-off regarding content sanitization. We find that current safety alignment techniques are brittle and over-indexed on explicit keywords, creating a bifurcated user experience where ``standard'' users receive cautious, sanitized information while dialect speakers navigate a less sanitized, more raw, and potentially a more hostile information landscape and highlights a fundamental tension in alignment--between equitable and linguistic diversity--and underscores the need for safety mechanisms that generalize beyond explicit cues.

2604.21131 2026-04-24 cs.CR cs.AI cs.CL cs.LG

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

Ari Azarafrooz

Comments 46 pages, 8 figures. Dataset: https://huggingface.co/datasets/intrinsec-ai/cstm-bench

详情
英文摘要

AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector because only the aggregate carries the payload. We make three contributions to cross-session threat detection. (1) Dataset. CSTM-Bench is 26 executable attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation" as a policy predicate, plus matched Benign-pristine and Benign-hard confounders. Released on Hugging Face as intrinsec-ai/cstm-bench with two 54-scenario splits: dilution (compositional) and cross_session (12 isolation-invisible scenarios produced by a closed-loop rewriter that softens surface phrasing while preserving cross-session artefacts). (2) Measurement. Framing cross-session detection as an information bottleneck to a downstream correlator LLM, we find that a session-bound judge and a Full-Log Correlator concatenating every prompt into one long-context call both lose roughly half their attack recall moving from dilution to cross_session, well inside any frontier context window. Scope: 54 scenarios per shard, one correlator family (Anthropic Claude), no prompt optimisation; we release it to motivate larger, multi-provider datasets. (3) Algorithm and metric. A bounded-memory Coreset Memory Reader retaining highest-signal fragments at $K=50$ is the only reader whose recall survives both shards. Because ranker reshuffles break KV-cache prefix reuse, we promote $\mathrm{CSR\_prefix}$ (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection into $\mathrm{CSTM} = 0.7 F_1(\mathrm{CSDA@action}, \mathrm{precision}) + 0.3 \mathrm{CSR\_prefix}$, benchmarking rankers on a single Pareto of recall versus serving stability.

2604.21129 2026-04-24 cs.MA cs.AI cs.DC

AGNT2: Autonomous Agent Economies on Interaction-Optimized Layer 2 Infrastructure

Anbang Ruan, Xing Zhang

详情
英文摘要

Current blockchain Layer 2 solutions, including Optimism, Arbitrum, zkSync, and their derivatives, optimize for human-initiated financial transactions. Autonomous AI agents instead generate high-frequency, semantically rich service invocations among mutually untrusting principals. Existing chains treat those interactions as generic calldata, forcing identity, escrow, dependency ordering, and session state to be encoded above the execution layer at the wrong cost point. We present AGNT2, a three-tier stack purpose-built for agent and microservice coordination on-chain. AGNT2 combines: (1) a sidecar deployment pattern that turns any Docker container into an on-chain agent without application-code modification; (2) Layer Top P2P state channels for established bilateral pairs (<100 ms, rough design target 1K-5K TPS per pair, 10M+ aggregate TPS design envelope under endpoint-resource limits), Layer Core as a dependency-aware sequenced rollup for first-contact and multi-party interactions (500 ms-2 s, 300K-500K TPS design target), and Layer Root settlement with computational fraud proofs anchored to any EVM L1; and (3) an agent-native execution environment plus interaction trie that make service invocation, identity, reputation, capabilities, and session context first-class protocol objects. This paper focuses on the execution-layer systems problem: sequencing, state, settlement, and the data-availability (DA) bandwidth gap that bounds all three. Simulation and analytical modeling support the architecture, and prototype measurements validate selected components, but no end-to-end Layer Core implementation exists yet. Practical deployment is currently constrained to roughly 10K-100K TPS by DA throughput, leaving a ~100x gap at the target ceiling. AGNT2 argues that the agent economy requires a dedicated execution layer rather than a general-purpose chain repurposed for agents.

2604.21097 2026-04-24 stat.ML cs.LG

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Gabriel Melo, Leonardo Santiago, Peter Y. Lu

详情
英文摘要

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model using data-driven emulators, including neural operator architectures. For chaotic systems, the inherent sensitivity to initial conditions makes exact long-term forecasts theoretically infeasible, meaning that traditional squared-error losses often fail when trained on noisy data. Recent work has focused on training emulators to match the statistical properties of chaotic attractors by introducing regularization based on handcrafted local features and summary statistics, as well as learned statistics extracted from a diverse dataset of trajectories. In this work, we propose a family of adversarial optimal transport objectives that jointly learn high-quality summary statistics and a physically consistent emulator. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein). Our experiments across a variety of chaotic systems, including systems with high-dimensional chaotic attractors, show that emulators trained with our approach exhibit significantly improved long-term statistical fidelity.

2604.21096 2026-04-24 cs.IR cs.CL

Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

Xuhong He, To Eun Kim, Maik Fröbe, Jaime Arguello, Bhaskar Mitra, Fernando Diaz

Comments SIGIR 2026; NTCIR track: https://ntcir-tot.github.io

详情
英文摘要

Tip-of-the-Tongue (ToT) retrieval benchmarks have largely focused on English, limiting their applicability to multilingual information access. In this work, we construct multilingual ToT test collections for Chinese, Japanese, Korean, and English, using an LLM-based query simulation framework. We systematically study how prompt language and source document language affect the fidelity of simulated ToT queries, validating synthetic queries through system rank correlation against real user queries. Our results show that effective ToT simulation requires language-aware design choices: non-English language sources are generally important, while English Wikipedia can be beneficial when non-English sources provide insufficient information for query generation. Based on these findings, we release four ToT test collections with 5,000 queries per language across multiple domains. This work provides the first large-scale multilingual ToT benchmark and offers practical guidance for constructing realistic ToT datasets beyond English.

2604.21090 2026-04-24 cs.SE cs.AI

Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework

Christo Zietsman

Comments 8 pages. Experiment, corpus, and evaluation framework publicly available at https://github.com/czietsman/nuphirho.dev/tree/main/experiments/governance-prompts-v1

详情
英文摘要

AI governance programmes increasingly rely on natural language prompts to constrain and direct AI agent behaviour. These prompts function as executable specifications: they define the agent's mandate, scope, and quality criteria. Despite this role, no systematic framework exists for evaluating whether a governance prompt is structurally complete. We introduce a five-principle evaluation framework grounded in computability theory, proof theory, and Bayesian epistemology, and apply it to an empirical corpus of 34 publicly available AGENTS.md governance files sourced from GitHub. Our evaluation reveals that 37% of evaluated file-model pairs score below the structural completeness threshold, with data classification and assessment rubric criteria most frequently absent. These results suggest that practitioner-authored governance prompts exhibit consistent structural patterns that automated static analysis could detect and remediate. We discuss implications for requirements engineering practice in AI-assisted development contexts, identify a previously undocumented artefact classification gap in the AGENTS.md convention, and propose directions for tool support.

2604.21085 2026-04-24 physics.ao-ph cs.LG

climt-paraformer: Stable Emulation of Convective Parameterization using a Temporal Memory-aware Transformer

Shuochen Wang, Nishant Yadav, Joy Merwin Monteiro, Auroop R. Ganguly

详情
英文摘要

Accurate representation of moist convective sub-grid-scale processes remains a major challenge in global climate models, as traditional parameterization schemes are both computationally expensive and difficult to scale. Neural network (NN) emulators offer a promising alternative by learning efficient mappings between atmospheric states and convective tendencies while retaining fidelity to the underlying physics. However, most existing NN-based parameterizations are memory-less and rely only on instantaneous inputs, even though convection evolves over time and depends on prior atmospheric states. Recent studies have begun to incorporate convective memory, but they often treat past states as independent features rather than modeling temporal dependencies explicitly. In this work, we develop a temporal memory-aware Transformer emulator for the Emanuel convective parameterization and evaluate it in a single-column climate model (SCM) under both offline and online configurations. The Transformer captures temporal correlations and nonlinear interactions across consecutive atmospheric states. Compared with baseline emulators, including a memory-less multilayer perceptron and a recurrent long short-term memory model, the Transformer achieves lower offline errors. Sensitivity analysis indicates that a memory length of approximately 100 minutes yields the best performance, whereas longer memory degrades performance. We further test the emulator in long-term coupled simulations and show that it remains stable over 10 years. Overall, this study demonstrates the importance of explicit temporal modeling for NN-based parameterizations.

2604.21083 2026-04-24 cs.CR cs.AI cs.NI cs.SE

Behavioral Consistency and Transparency Analysis on Large Language Model API Gateways

Guanjie Lin, Yinxin Wan, Shichao Pei, Ting Xu, Kuai Xu, Guoliang Xue

Comments 11 pages. Initially submitted to IMC 2026 Cycle 1 on November 20, 2025; accepted on March 13, 2026. To appear in Proceedings of the 2026 ACM Internet Measurement Conference (IMC '26)

详情
英文摘要

Third-party Large Language Model (LLM) API gateways are rapidly emerging as unified access points to models offered by multiple vendors. However, the internal routing, caching, and billing policies of these gateways are largely undisclosed, leaving users with limited visibility into whether requests are served by the advertised models, whether responses remain faithful to upstream APIs, or whether invoices accurately reflect public pricing policies. To address this gap, we introduce GateScope, a lightweight black-box measurement framework for evaluating behavioral consistency and operational transparency in commercial LLM gateways. GateScope is designed to detect key misbehaviors, including model downgrading or switching, silent truncation, billing inaccuracies, and instability in latency by auditing gateways along four critical dimensions: response content analysis, multi-turn conversation performance, billing accuracy, and latency characteristics. Our measurements across 10 real-world commercial LLM API gateways reveal frequent gaps between expected and actual behaviors, including silent model substitutions, degraded memory retention, deviations from announced pricing, and substantial variation in latency stability across platforms.

2604.21073 2026-04-24 cond-mat.mtrl-sci cs.AI

Generative Discovery of Magnetic Insulators under Competing Physical Constraints

Qiulin Zeng, Tahiya Chowdhury, Md Shafayat Hossain

详情
英文摘要

Discovering materials that must simultaneously satisfy multiple competing constraints remains a central challenge in computational materials design, particularly in data-scarce regimes where conventional data-driven approaches are least effective. Magnetic insulators represent a stringent example: the electronic conditions that favor magnetic order often also promote metallicity, while insulating behavior suppresses the interactions that stabilize magnetism. As a result, experimentally viable magnetic insulators are rare and difficult to identify through conventional screening. Here, we introduce MagMatLLM, a constraint-guided generative discovery framework that integrates language-model-based crystal generation with evolutionary selection, surrogate screening, and first-principles validation to target simultaneous stability, magnetism, and insulating behavior. Unlike stability-first approaches, the framework enforces functional constraints during generation and selection, steering the search toward sparsely populated regions of materials space defined by competing physical requirements. Using this workflow, we identify twelve previously unreported candidate magnetic insulators, including Tm$_4$Co$_2$Cr$_2$O$_{12}$ and Cr$_4$Nb$_2$O$_{12}$. Of these, ten are dynamically stable by phonon analysis and exhibit finite band gaps and nonzero magnetic moments in spin-polarized density functional theory calculations. Beyond the specific compounds identified here, this work establishes a general constraint-guided paradigm for multi-objective materials discovery in sparse chemical spaces and provides a transferable strategy for the design of quantum materials under competing physical constraints.