arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.01977 2026-03-03 math.AP cs.LG math.OC

Quantitative Convergence of Wasserstein Gradient Flows of Kernel Mean Discrepancies

Lénaïc Chizat, Maria Colombo, Roberto Colombo, Xavier Fernández-Real

Abstract

We study the quantitative convergence of Wasserstein gradient flows of Kernel Mean Discrepancy (KMD) (also known as Maximum Mean Discrepancy (MMD)) functionals. Our setting covers in particular the training dynamics of shallow neural networks in the infinite-width and continuous time limit, as well as interacting particle systems with pairwise Riesz kernel interaction in the mean-field and overdamped limit. Our main analysis concerns the model case of KMD functionals given by the squared Sobolev distance $ \mathscr{E}^ν_{s}(μ)= \frac{1}{2}\lVert μ-ν\rVert_{\dot H^{-s}}^{2}$ for any $s\geq 1 $ and $ν$ a fixed probability measure on the $d$-dimensional torus. First, inspired by Yudovich theory for the $2d$-Euler equation, we establish existence and uniqueness in natural weak regularity classes. Next, we show that for $s=1$ the flow converges globally at an exponential rate under minimal assumptions, while for $s>1$ we prove local convergence at polynomial rates that depend explicitly on $s$ and on the Sobolev regularity of $μ$ and $ν$. These rates hold both at the energy level and in higher regularity classes and are tight for $ν$ uniform. We then consider the gradient flow of the population loss for shallow neural networks with ReLU activation, which can be cast as a Wasserstein--Fisher--Rao gradient flow on the space of nonnegative measures on the sphere $\mathbb{S}^d$. Exploiting a correspondence with the Sobolev energy case with $s=(d+3)/2$, we derive an explicit polynomial local convergence rate for this dynamics. Except for the special case $s=1$, even non-quantitative convergence was previously open in all these settings. We also include numerical experiments in dimension $d=1$ using both PDE and particle methods which illustrate our analysis.
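As a rough illustration of the model energy in this abstract, the Python sketch below evaluates a truncated Fourier-series version of the squared Sobolev distance between an empirical measure on the one-dimensional torus and the uniform measure. The function names, the truncation parameter `max_freq`, and the normalization (mode k weighted by |k|^(-2s), with no factors of 2π) are assumptions made for illustration; they are not taken from the paper.

```python
import cmath
import math


def fourier_coeff(points, k):
    """k-th Fourier coefficient of the empirical measure of `points` on [0, 1)."""
    return sum(cmath.exp(-2j * math.pi * k * x) for x in points) / len(points)


def sobolev_energy(points, s, max_freq):
    """Truncated 0.5 * ||mu - nu||^2 in the homogeneous H^{-s} norm.

    mu is the empirical measure of `points`; nu is uniform, so all of its
    nonzero Fourier modes vanish. The 2*pi-free weighting is an assumption.
    """
    total = 0.0
    for k in range(1, max_freq + 1):
        # mu is real, so modes k and -k contribute the same amount.
        total += 2 * abs(fourier_coeff(points, k)) ** 2 / k ** (2 * s)
    return 0.5 * total
```

Under this convention a single Dirac mass has the largest possible energy for every retained mode, while equally spaced particles are invisible to all frequencies below the number of particles.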

2603.01971 2026-03-03 stat.ML cs.LG

LOCUS: A Distribution-Free Loss-Quantile Score for Risk-Aware Predictions

Matheus Barreto, Mário de Castro, Thiago R. Ramos, Denis Valle, Rafael Izbicki

Comments The article contains nine pages and the appendix twelve

Abstract

Modern machine learning models can be accurate on average yet still make mistakes that dominate deployment cost. We introduce Locus, a distribution-free wrapper that produces a per-input loss-scale reliability score for a fixed prediction function. Rather than quantifying uncertainty about the label, Locus models the realized loss of the prediction function using any engine that outputs a predictive distribution for the loss given an input. A simple split-calibration step turns this function into a distribution-free interpretable score that is comparable across inputs and can be read as an upper loss level. The score is useful on its own for ranking, and it can optionally be thresholded to obtain a transparent flagging rule with distribution-free control of large-loss events. Experiments across 13 regression benchmarks show that Locus yields effective risk ranking and reduces large-loss frequency compared to standard heuristics.
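The split-calibration idea can be pictured with a generic split-conformal sketch. This is not the authors' implementation: it assumes a hypothetical engine that has already produced a loss estimate for each calibration input, and it shifts those estimates by a conformal offset so that the shifted value upper-bounds the realized loss with probability at least 1 - alpha under exchangeability. All names are illustrative.

```python
import math


def conformal_offset(cal_losses, cal_estimates, alpha):
    """Split-conformal offset for an upper bound on the realized loss.

    cal_losses: realized losses on a held-out calibration set.
    cal_estimates: the (hypothetical) engine's loss estimates on the same inputs.
    Returns the k-th smallest residual with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for the requested level.
    """
    n = len(cal_losses)
    residuals = sorted(l - e for l, e in zip(cal_losses, cal_estimates))
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return residuals[k - 1]


def loss_upper_score(estimate, offset):
    """Calibrated per-input score: engine estimate plus conformal offset."""
    return estimate + offset
```

A flagging rule in this spirit would compare `loss_upper_score` against a user-chosen loss threshold; how Locus itself defines the score is described in the paper, not here.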

2603.01942 2026-03-03 cs.HC cs.AI

Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots

Huw Day, Adrianna Jezierska, Jessica Woodgate

Comments Accepted to ICLR 2026 AI for peace workshop

Abstract

Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation. The existing literature largely focuses on platform-led moderation as a countermeasure. In this paper, we propose a user-centric view of "jailbreaking" as an emergent, non-violent de-escalation practice. Online users engage with suspected LLM-powered accounts to circumvent large language model safeguards, exposing automated behaviour and disrupting the circulation of misleading narratives.

2603.01926 2026-03-03 cs.IR cs.CV

MealRec: Multi-granularity Sequential Modeling via Hierarchical Diffusion Models for Micro-Video Recommendation

Xinxin Dong, Haokai Ma, Yuze Zheng, Yongfu Zha, Yonghui Yang, Xiaodong Wang

Abstract

Micro-video recommendation aims to capture user preferences from the collaborative and context information of the interacted micro-videos, thereby predicting the appropriate videos. This target is often hindered by the inherent noise within multimodal content and unreliable implicit feedback, which weakens the correspondence between behaviors and underlying interests. While conventional works have predominantly approached such scenarios through behavior-augmented modeling and content-centric multimodal analysis, these paradigms can inadvertently give rise to two non-trivial challenges: preference-irrelevant video representation extraction and inherent modality conflicts. To address these issues, we propose a Multi-granularity sequential modeling method via hierarchical diffusion models for micro-video Recommendation (MealRec), which simultaneously considers temporal correlations during preference modeling from intra- and inter-video perspectives. Specifically, we first propose Temporal-guided Content Diffusion (TCD) to refine video representations under intra-video temporal guidance and personalized collaborative signals to emphasize salient content while suppressing redundancy. To achieve semantically coherent preference modeling, we further design Noise-unconditional Preference Denoising (NPD) to recover informative user preferences from corrupted states under blind denoising. Extensive experiments and analyses on four micro-video datasets from two platforms demonstrate the effectiveness, universality, and robustness of MealRec, further uncovering the effective mechanism of the proposed TCD and NPD. The source code and corresponding dataset will be available upon acceptance.

2603.01923 2026-03-03 cs.LO cs.LG

Bound Propagation meets Constraint Simplification: Improving Logic-based XAI for Neural Networks

Ronaldo Gomes, Jairo Ribeiro, Luiz Queiroz, Thiago Alves Rocha

Comments Preprint version. For the final published version, see the DOI below

Abstract

Logic-based methods for explaining neural network decisions offer formal guarantees of correctness and non-redundancy, but they often suffer from high computational costs, especially for large networks. In this work, we improve the efficiency of such methods by combining bound propagation with constraint simplification. These simplifications, derived from the propagation, tighten neuron bounds and eliminate unnecessary binary variables, making the explanation process more efficient. Our experiments suggest that combining these techniques reduces explanation time by up to 89.26%, particularly for larger neural networks.

2603.01874 2026-03-03 cs.CR cs.AI

Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection

Tailai Song, Pedro Casas, Michela Meo

Abstract

Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven phishing detectors achieve strong accuracy, their reliance on external knowledge bases, cloud services, and complex multimodal pipelines fundamentally limits practicality, scalability, and reproducibility. In contrast, conventional deep learning approaches often fail to generalize to evolving phishing campaigns. We introduce SpecularNet, a novel lightweight framework for reference-free web phishing detection that demonstrates how carefully designed compact architectures can rival heavyweight systems. SpecularNet operates solely on the domain name and HTML structure, modeling the Document Object Model (DOM) as a tree and leveraging a hierarchical graph autoencoding architecture with directional, level-wise message passing. This design captures higher-order structural invariants of phishing webpages while enabling fast, end-to-end inference on standard CPUs. Extensive evaluation against 13 state-of-the-art phishing detectors, including leading reference-based systems, shows that SpecularNet achieves competitive detection performance with dramatically lower computational cost. On benchmark datasets, it reaches an F1 score of 93.9%, trailing the best reference-based method slightly while reducing inference time from several seconds to approximately 20 milliseconds per webpage. Field and robustness evaluations further validate SpecularNet in real-world deployments, on a newly collected 2026 open-world dataset, and against adversarial attacks.

2603.01870 2026-03-03 cs.LO cs.LG

Generalizing Logic-based Explanations for Machine Learning Classifiers via Optimization

Francisco Mateus Rocha Filho, Ajalmar Rêgo da Rocha Neto, Thiago Alves Rocha

Comments Preprint version. For the final published version, see the DOI below

Abstract

Machine learning models support decision-making, yet the reasons behind their predictions are opaque. Clear and reliable explanations help users make informed decisions and avoid blindly trusting model outputs. However, many existing explanation methods fail to guarantee correctness. Logic-based approaches ensure correctness but often offer overly constrained explanations, limiting coverage. Recent work addresses this by incrementally expanding explanations while maintaining correctness. This process is performed separately for each feature, adjusting both its upper and lower bounds. However, this approach faces a trade-off: smaller increments incur high computational costs, whereas larger ones may lead to explanations covering fewer instances. To overcome this, we propose two novel methods. Onestep builds upon this prior work, generating explanations in a single step for each feature and each bound, eliminating the overhead of an iterative process. Twostep takes a gradual approach, improving coverage. Experimental results show that Twostep significantly increases explanation coverage (by up to 72.60% on average across datasets) compared to Onestep and, consequently, to prior work.

2602.02734 2026-03-03 eess.AS cs.AI cs.CL

WAXAL: A Large-Scale Multilingual African Language Speech Corpus

Abdoulaye Diack, Perry Nelson, Kwaku Agbesi, Angela Nakalembe, MohamedElfatih MohamedKhair, Vusumuzi Dube, Tavonga Siyavora, Subhashini Venugopalan, Jason Hickey, Uche Okonkwo, Abhishek Bapna, Isaac Wiafe, Raynard Dodzi Helegah, Elikem Doe Atsakpo, Charles Nutrokpor, Fiifi Baffoe Payin Winful, Kafui Kwashie Solaga, Jamal-Deen Abdulai, Akon Obu Ekpezu, Audace Niyonkuru, Samuel Rutunda, Boris Ishimwe, Michael Melese, Engineer Bainomugisha, Joyce Nakatumba-Nabende, Andrew Katumba, Claire Babirye, Jonathan Mukiibi, Vincent Kimani, Samuel Kibacia, James Maina, Fridah Emmah, Ahmed Ibrahim Shekarau, Ibrahim Shehu Adamu, Yusuf Abdullahi, Howard Lakougna, Bob MacDonald, Hadar Shemtov, Aisha Walcott-Bryant, Moustapha Cisse, Avinatan Hassidim, Jeff Dean, Yossi Matias

Comments Initial dataset release with added TTS, some more to come

Abstract

The advancement of speech technology has predominantly favored high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages. To address this gap, we introduce WAXAL, a large-scale, openly accessible speech dataset for 24 languages representing over 100 million speakers. The collection consists of two main components: an Automated Speech Recognition (ASR) dataset containing approximately 1,250 hours of transcribed, natural speech from a diverse range of speakers, and a Text-to-Speech (TTS) dataset with around 235 hours of high-quality, single-speaker recordings reading phonetically balanced scripts. This paper details our methodology for data collection, annotation, and quality control, which involved partnerships with four African academic and community organizations. We provide a detailed statistical overview of the dataset and discuss its potential limitations and ethical considerations. The WAXAL datasets are released at https://huggingface.co/datasets/google/WaxalNLP under the permissive CC-BY-4.0 license to catalyze research, enable the development of inclusive technologies, and serve as a vital resource for the digital preservation of these languages.

2512.14450 2026-03-03 eess.SY cs.RO cs.SY

Nonlinear System Identification Nano-drone Benchmark

Riccardo Busetto, Elia Cereda, Marco Forgione, Gabriele Maroni, Dario Piga, Daniele Palossi

Abstract

We introduce a benchmark for system identification based on 75k real-world samples from the Crazyflie 2.1 Brushless nano-quadrotor, a sub-50g aerial vehicle widely adopted in robotics research. The platform presents a challenging testbed due to its multi-input, multi-output nature, open-loop instability, and nonlinear dynamics under agile maneuvers. The dataset comprises four aggressive trajectories with synchronized 4-dimensional motor inputs and 13-dimensional output measurements. To enable fair comparison of identification methods, the benchmark includes a suite of multi-horizon prediction metrics for evaluating both one-step and multi-step error propagation. In addition to the data, we provide a detailed description of the platform and experimental setup, as well as baseline models highlighting the challenge of accurate prediction under real-world noise and actuation nonlinearities. All data, scripts, and reference implementations are released as open-source at https://github.com/idsia-robotics/nanodrone-sysid-benchmark to facilitate transparent comparison of algorithms and support research on agile, miniaturized aerial robotics.

2511.23224 2026-03-03 quant-ph cs.LG

Nonstabilizerness Estimation using Graph Neural Networks

Vincenzo Lipardi, Domenica Dibenedetto, Georgios Stamoulis, Evert van Nieuwenburg, Mark H. M. Winands

Abstract

This article proposes a Graph Neural Network (GNN) approach to estimate nonstabilizerness in quantum circuits, measured by the stabilizer Rényi entropy (SRE). Nonstabilizerness is a fundamental resource for quantum advantage, and efficient SRE estimations are highly beneficial in practical applications. We address the nonstabilizerness estimation problem through three supervised learning formulations starting from easier classification tasks to the more challenging regression task. Experimental results show that the proposed GNN manages to capture meaningful features from the graph-based circuit representation, resulting in robust generalization performance across diverse scenarios. In classification tasks, the GNN is trained on product states and generalizes to circuits evolved under Clifford operations, entangled states, and circuits with a higher number of qubits. In the regression task, the GNN significantly improves the SRE estimation on out-of-distribution circuits with a higher number of qubits and gate counts compared to previous work, for both unstructured random quantum circuits and structured circuits derived from the transverse-field Ising model. Moreover, the graph representation of quantum circuits naturally integrates hardware-specific information. Simulations on noisy quantum hardware highlight the potential of the proposed GNN to predict the SRE measured on quantum devices.

2511.22652 2026-03-03 cond-mat.mtrl-sci cs.LG

Generative Models for Crystalline Materials

Houssam Metni, Laura Ruple, Lauren N. Walters, Luca Torresi, Jonas Teufel, Henrik Schopmans, Jona Östreicher, Yumeng Zhang, Marlen Neubert, Yuri Koide, Kevin Steiner, Paul Link, Lukas Bär, Mariana Petrova, Gerbrand Ceder, Pascal Friederich

Abstract

Understanding structure-property relationships in materials is fundamental in condensed matter physics and materials science. Over the past few years, machine learning (ML) has emerged as a powerful tool for advancing this understanding and accelerating materials discovery. Early ML approaches primarily focused on constructing and screening large material spaces to identify promising candidates for various applications. More recently, research efforts have increasingly shifted toward generating crystal structures using end-to-end generative models. This review analyzes the current state of generative modeling for crystal structure prediction and de novo generation. It examines crystal representations, outlines the generative models used to design crystal structures, and evaluates their respective strengths and limitations. Furthermore, the review highlights experimental considerations for evaluating generated structures and provides recommendations for suitable existing software tools. Emerging topics, such as modeling disorder and defects, integration in advanced characterization, incorporating synthetic feasibility constraints, and model explainability are explored. Ultimately, this work aims to inform both experimental scientists looking to adapt suitable ML models to their specific circumstances and ML specialists seeking to understand the unique challenges related to inverse materials design and discovery.

2510.07088 2026-03-03 stat.ML cs.LG

Fourier Analysis on the Boolean Hypercube via Hoeffding Functional Decomposition

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

Abstract

Fourier analysis on the Boolean hypercube is fundamentally defined as the orthogonal decomposition of the space of pseudo-Boolean functions with respect to the uniform probability measure. In this work, we propose an ANOVA-based generalization of the Fourier decomposition on the Boolean hypercube endowed with an arbitrary probability measure. We provide an explicit decomposition basis which generalizes the Walsh-Hadamard (or parity functions) basis under an arbitrary probability measure on the Boolean hypercube. We formulate the computation of the entire functional decomposition as a least squares problem and also provide a method to address the classical curse of dimensionality challenge. We provide a comprehensive generalization of Fourier analysis on the Boolean hypercube, enabling the handling of non-uniform configuration spaces inherent to real-world machine learning tasks, e.g. when dealing with one-hot encoded features. Finally, we demonstrate its practical impact in the field of explainable AI, by conducting comparative studies with feature attribution methods such as SHAP or TreeHFD.
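For orientation, the classical uniform-measure case that this abstract generalizes can be sketched in a few lines of Python. The code computes the Walsh-Hadamard (parity) coefficients of a pseudo-Boolean function by brute-force enumeration of the hypercube {-1, 1}^n. It covers only the uniform measure, not the paper's generalized basis, and the function name is illustrative.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Walsh-Hadamard coefficients of f: {-1, 1}^n -> R under the uniform measure.

    The coefficient for a subset S is the average of f(x) * prod_{i in S} x_i
    over all 2^n points. Brute force, so only practical for small n.
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs
```

The exponential cost of this enumeration is one face of the curse of dimensionality mentioned in the abstract.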

2509.16799 2026-03-03 quant-ph cs.LG

A Study on Stabilizer Rényi Entropy Estimation using Machine Learning

Vincenzo Lipardi, Domenica Dibenedetto, Georgios Stamoulis, Mark H. M. Winands

Abstract

Nonstabilizerness is a fundamental resource for quantum advantage, as it quantifies the extent to which a quantum state diverges from those states that can be efficiently simulated on a classical computer, the stabilizer states. The stabilizer Rényi entropy (SRE) is one of the most investigated measures of nonstabilizerness because of its computational properties and suitability for experimental measurements on quantum processors. Because computing the SRE for arbitrary quantum states is a computationally hard problem, we propose a supervised machine-learning approach to estimate it. In this work, we frame SRE estimation as a regression task and train a Random Forest Regressor and a Support Vector Regressor (SVR) on a comprehensive dataset, including both unstructured random quantum circuits and structured circuits derived from the physics-motivated one-dimensional transverse Ising model (TIM). We compare the machine-learning models using two different quantum circuit representations: one based on classical shadows and the other on circuit-level features. Furthermore, we assess the generalization capabilities of the models on out-of-distribution instances. Experimental results show that an SVR trained on circuit-level features achieves the best overall performance. On the random circuits dataset, our approach converges to accurate SRE estimations, but struggles to generalize out of distribution. In contrast, it generalizes well on the structured TIM dataset, even to deeper and larger circuits. In line with previous work, our experiments suggest that machine learning offers a viable path for efficient nonstabilizerness estimation.
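To make the quantity being estimated concrete, here is a minimal Python sketch of the order-2 stabilizer Rényi entropy for a single-qubit pure state, using the commonly stated formula M_2 = -log2((1/d) * sum over Pauli operators P of <P>^4) with d = 2. It is an exact brute-force evaluation for one qubit, not the machine-learning estimator studied in the paper, and the helper names are illustrative.

```python
import math

# Single-qubit Pauli matrices as nested tuples.
PAULIS = {
    "I": ((1, 0), (0, 1)),
    "X": ((0, 1), (1, 0)),
    "Y": ((0, -1j), (1j, 0)),
    "Z": ((1, 0), (0, -1)),
}


def expectation(state, op):
    """Real expectation value <psi|op|psi> for a normalized 2-component state."""
    a, b = complex(state[0]), complex(state[1])
    top = op[0][0] * a + op[0][1] * b
    bottom = op[1][0] * a + op[1][1] * b
    return (a.conjugate() * top + b.conjugate() * bottom).real


def sre2_single_qubit(state):
    """Order-2 stabilizer Renyi entropy of a single-qubit pure state (d = 2)."""
    total = sum(expectation(state, op) ** 4 for op in PAULIS.values())
    return -math.log2(total / 2)
```

A stabilizer state such as |0> gives zero, while a magic state gives a strictly positive value. For n qubits the sum runs over 4^n Pauli strings, which is why exact evaluation becomes expensive.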

2508.07757 2026-03-03 eess.AS cs.SD

Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription

Zhanhong He, Roberto Togneri, David Huang

Comments Submitted to SMC2026 Conference

Abstract

MIDI velocity is crucial for capturing expressive dynamics in human performances. In practical scenarios, a music score with inaccurate velocities may be available alongside the performance audio (e.g., music education and free online archives), enabling the task of score-informed MIDI velocity estimation. In this work, we propose a modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems. We integrate the proposed module into multiple AMT systems (HPT, HPPNet, and DynEst). Trained exclusively on the MAESTRO training split, our method consistently reduces velocity estimation errors on MAESTRO and improves cross-dataset generalization to the SMD and MAPS datasets. Under this training protocol, integrating our score-informed module with HPT (named Score-HPT) establishes new state-of-the-art performance, outperforming existing score-informed methods and velocity-enabled AMT systems while adding only 1M parameters.

2507.08986 2026-03-03 physics.flu-dyn cs.LG

Physics-Based Machine Learning Closures and Wall Models for Hypersonic Transition-Continuum Boundary Layer Predictions

Ashish S. Nair, Narendra Singh, Marco Panesi, Justin Sirignano, Jonathan F. MacArt

Journal ref Physical Review Fluids 11, 033402 (2026)

Abstract

Modeling rarefied hypersonic flows remains a fundamental challenge due to the breakdown of classical continuum assumptions in the transition-continuum regime, where the Knudsen number ranges from approximately 0.1 to 10. Conventional Navier-Stokes-Fourier (NSF) models with empirical slip-wall boundary conditions fail to accurately predict nonequilibrium effects such as velocity slip, temperature jump, and shock structure deviations. We develop a physics-constrained machine learning framework that augments transport models and boundary conditions to extend the applicability of continuum solvers in nonequilibrium hypersonic regimes. We employ deep learning PDE models (DPMs) for the viscous stress and heat flux embedded in the governing PDEs and trained via adjoint-based optimization. We evaluate these for two-dimensional supersonic flat-plate flows across a range of Mach and Knudsen numbers. Additionally, we introduce a wall model based on a mixture of skewed Gaussian approximations of the particle velocity distribution function. This wall model replaces empirical slip conditions with physically informed, data-driven boundary conditions for the streamwise velocity and wall temperature. Our results show that a trace-free anisotropic viscosity model, paired with the skewed-Gaussian distribution function wall model, achieves significantly improved accuracy, particularly at high-Mach and high-Knudsen number regimes. Strategies such as parallel training across multiple Knudsen numbers and inclusion of high-Mach data during training are shown to enhance model generalization. Increasing model complexity yields diminishing returns for out-of-sample cases, underscoring the need to balance degrees of freedom and overfitting. This work establishes data-driven, physics-consistent strategies for improving hypersonic flow modeling for regimes in which conventional continuum approaches are invalid.

2505.09365 2026-03-03 physics.space-ph astro-ph.IM astro-ph.SR cs.LG

ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections

H. T. Rüdisser, G. Nguyen, J. Le Louëdec, E. E. Davies, C. Möstl

Comments 29 pages, 10 figures, 1 table, submitted to AGU Space Weather on 14 May 2025, revised 17 October 2025, accepted 01 December 2025, published 23 February 2026

Journal ref Rüdisser, H. T., Nguyen, G., Le Louëdec, J., Davies, E. E., & Möstl, C. (2026). ARCANE--Early detection of Interplanetary Coronal Mass Ejections. Space Weather, 24, e2025SW004537

Abstract

Interplanetary coronal mass ejections (ICMEs) are major drivers of space weather disturbances, posing risks to both technological infrastructure and human activities. Automatic detection of ICMEs in solar wind in situ data is essential for early warning systems. While several methods have been proposed to identify these structures in time series data, robust real-time detection remains a significant challenge. In this work, we present ARCANE, the first framework explicitly designed for early ICME detection in streaming solar wind data under realistic operational constraints, enabling event identification without requiring observation of the full structure. Our approach evaluates the strengths and limitations of detection models by comparing a machine learning-based method to a threshold-based baseline. The ResUNet++ model, previously validated on science data, significantly outperforms the baseline, particularly in detecting high-impact events, while retaining solid performance on lower-impact cases. Notably, we find that using real-time solar wind (RTSW) data instead of high-resolution science data leads to only minimal performance degradation. Despite the challenges of operational settings, our detection pipeline achieves an F1-Score of 0.37, with an average detection delay of 24.5% of the event's duration while processing only a minimal portion of the event data. As more data becomes available, the performance increases significantly. These results mark a substantial step forward in automated space weather monitoring and lay the groundwork for enhanced real-time forecasting capabilities.

2504.03592 2026-03-03 math.OC cs.GT cs.LG

Optimistic Online Learning in Symmetric Cone Games

Anas Barakat, Wayne Lin, John Lazarsfeld, Antonios Varvitsiotis

Comments Published in Transactions on Machine Learning Research 2026

Abstract

We introduce symmetric cone games (SCGs), a broad class of multi-player games where each player's strategy lies in a generalized simplex (the trace-one slice of a symmetric cone). This framework unifies a wide spectrum of settings, including normal-form games (simplex strategies), quantum games (density matrices), and continuous games with ball-constrained strategies. It also captures several structured machine learning and optimization problems, such as distance metric learning and Fermat-Weber facility location, as two-player zero-sum SCGs. To compute approximate Nash equilibria in two-player zero-sum SCGs, we propose a single online learning algorithm: Optimistic Symmetric Cone Multiplicative Weights Updates (OSCMWU). Unlike prior methods tailored to specific geometries, OSCMWU provides closed-form updates over any symmetric cone and achieves a $\tilde{\mathcal{O}}(1/ε)$ iteration complexity for computing $ε$-saddle points. Our analysis builds on the Optimistic Follow-the-Regularized-Leader framework and hinges on a key technical contribution: We prove that the symmetric cone negative entropy is strongly convex with respect to the trace-one norm. This result extends known results for the simplex and spectraplex to all symmetric cones, and may be of independent interest.
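The simplex special case of this setting (a normal-form zero-sum game) gives a feel for the optimistic update. The Python sketch below runs optimistic multiplicative weights, written as optimistic follow-the-regularized-leader with entropy regularization, for a matrix game where the row player minimizes and the column player maximizes x^T A y. It is an illustration of the classical simplex algorithm only, not of OSCMWU over general symmetric cones; the function names and the step size are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax, i.e. the entropy-regularized leader."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def optimistic_mwu(payoff, steps, eta):
    """Return time-averaged strategies of both players after `steps` rounds."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    cum_x, last_x = [0.0] * n_rows, [0.0] * n_rows
    cum_y, last_y = [0.0] * n_cols, [0.0] * n_cols
    avg_x, avg_y = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(steps):
        # Optimism: the most recent gradient is counted twice as a prediction.
        x = softmax([-eta * (c + g) for c, g in zip(cum_x, last_x)])
        y = softmax([eta * (c + g) for c, g in zip(cum_y, last_y)])
        last_x = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        last_y = [sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        cum_x = [c + g for c, g in zip(cum_x, last_x)]
        cum_y = [c + g for c, g in zip(cum_y, last_y)]
        avg_x = [a + v / steps for a, v in zip(avg_x, x)]
        avg_y = [a + v / steps for a, v in zip(avg_y, y)]
    return avg_x, avg_y


def duality_gap(payoff, x, y):
    """Best-response gap of (x, y); zero exactly at a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    col_best = max(sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols))
    row_best = min(sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))
    return col_best - row_best
```

The duality gap of the averaged strategies is the quantity that an ε-saddle-point guarantee controls.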

2503.01441 2026-03-03 math.OC cs.LG

A Randomized Linearly Convergent Frank-Wolfe-type Method for Smooth Convex Minimization over the Spectrahedron

Dan Garber

Comments Accepted to Mathematical Programming SERIES A

Abstract

We consider the problem of minimizing a smooth and convex function over the $n$-dimensional spectrahedron -- the set of real symmetric $n\times n$ positive semidefinite matrices with unit trace, which underlies numerous applications in statistics, machine learning and additional domains. Standard first-order methods often require high-rank matrix computations which are prohibitive when the dimension $n$ is large. The well-known Frank-Wolfe method, on the other hand, requires only efficient rank-one matrix computations but suffers from slow worst-case convergence, even under conditions that enable linear convergence rates for standard methods. In this work we present the first Frank-Wolfe-based algorithm that only applies efficient rank-one matrix computations and, assuming quadratic growth and strict complementarity conditions, is guaranteed, after a finite number of iterations, to converge linearly, in expectation, and independently of the ambient dimension.
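The rank-one structure of the classical Frank-Wolfe step can be seen in a small Python sketch. It minimizes 0.5 * ||X - M||_F^2 over the 2x2 spectrahedron, where the linear minimization oracle returns v v^T for a unit eigenvector v of the gradient's smallest eigenvalue. This is the standard deterministic method with the usual 2/(t+2) step size, not the randomized linearly convergent variant of the paper. The restriction to 2x2 matrices (so the eigenvector has a closed form) and all names are assumptions made to keep the example dependency-free.

```python
import math


def min_eigenvector_2x2(g):
    """Unit eigenvector for the smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, c = g[0][0], g[0][1], g[1][1]
    if b == 0.0:
        return (1.0, 0.0) if a <= c else (0.0, 1.0)
    lam = (a + c) / 2 - math.hypot((a - c) / 2, b)
    v0, v1 = b, lam - a
    norm = math.hypot(v0, v1)
    return (v0 / norm, v1 / norm)


def objective(x, target):
    """0.5 * squared Frobenius distance between x and target."""
    return 0.5 * sum((x[i][j] - target[i][j]) ** 2 for i in range(2) for j in range(2))


def frank_wolfe(target, steps):
    """Classical Frank-Wolfe over unit-trace PSD 2x2 matrices."""
    x = [[1.0, 0.0], [0.0, 0.0]]  # feasible start: rank one, trace one
    for t in range(steps):
        grad = [[x[i][j] - target[i][j] for j in range(2)] for i in range(2)]
        v = min_eigenvector_2x2(grad)  # rank-one oracle output is v v^T
        gamma = 2.0 / (t + 2)
        x = [[(1 - gamma) * x[i][j] + gamma * v[i] * v[j] for j in range(2)]
             for i in range(2)]
    return x
```

Each iterate is a convex combination of rank-one matrices, so feasibility is maintained without any projection; the sublinear O(1/t) rate of this baseline is what the paper's method improves on under its stated assumptions.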

2501.15849 2026-03-03 eess.SY cs.LG cs.SY

Data-Driven Prediction and Control of Hammerstein-Wiener Systems with Implicit Gaussian Processes

Mingzhou Yin, Matthias A. Müller

Abstract

This work investigates data-driven prediction and control of Hammerstein-Wiener systems using physics-informed Gaussian process (GP) models that encode the block-oriented model structure. Data-driven prediction algorithms have been developed for structured nonlinear systems based on Willems' fundamental lemma. However, existing frameworks do not apply to output nonlinearities in Wiener systems and rely on a finite-dimensional dictionary of basis functions for Hammerstein systems. In this work, an implicit predictor structure is considered, leveraging the linearity for the dynamical part of the model. This implicit function is learned by GP regression, utilizing carefully designed structured kernel functions from linear model parameters and GP priors for the nonlinearities. Virtual derivative points are added to the regression by expectation propagation to encode monotonicity information of the nonlinearities. The linear model parameters are estimated as hyperparameters by assuming a stable spline hyperprior. The implicit GP model provides explicit output prediction by optimizing selected optimality criteria. The implicit model is also applied to receding horizon control with the expected control cost and chance constraint satisfaction guarantee. Numerical results demonstrate that the proposed prediction and control algorithms are superior to black-box GP models without model structure knowledge.

2501.06762 2026-03-03 q-bio.NC cs.LG cs.NE

Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics

Jie Mei, Alejandro Rodriguez-Garcia, Daigo Takeuchi, Gabriel Wainstein, Nina Hubig, Yalda Mohsenzadeh, Srikanth Ramaswamy

Abstract

Continuous, adaptive learning, the ability to adapt to the environment and keep improving performance, is a hallmark of natural intelligence. Biological organisms excel in acquiring, transferring, and retaining knowledge while adapting to volatile environments, making them a source of inspiration for artificial neural networks (ANNs). This study explores how neuromodulation, a building block of learning in biological systems, can help address catastrophic forgetting and enhance the robustness of ANNs in continual learning. Driven by neuromodulators including dopamine (DA), acetylcholine (ACh), serotonin (5-HT) and noradrenaline (NA), neuromodulatory processes in the brain operate at multiple scales, facilitating dynamic responses to environmental changes through mechanisms ranging from local synaptic plasticity to global network-wide adaptability. Importantly, the relationship between neuromodulators and their interplay in modulating sensory and cognitive processes is more complex than previously expected, demonstrating a "many-to-one" neuromodulator-to-task mapping. To inspire neuromodulation-aware learning rules, we highlight (i) how multi-neuromodulatory interactions enrich single-neuromodulator-driven learning, (ii) the impact of neuromodulators across multiple spatio-temporal scales, and correspondingly, (iii) strategies for approximating and integrating neuromodulated learning processes in ANNs. To illustrate these principles, we present a conceptual study to showcase how neuromodulation-inspired mechanisms, such as DA-driven reward processing and NA-based cognitive flexibility, can enhance ANN performance in a Go/No-Go task. Through multi-scale neuromodulation, we aim to bridge the gap between biological and artificial learning, paving the way for ANNs with greater flexibility, robustness, and adaptability.

2501.04134 2026-03-03 stat.ML cs.LG math.OC math.ST stat.TH

Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

Mario Bravo, Juan P. Flores-Mella, Cristóbal Guzmán

Comments 38 pages, 2 figures

English abstract

We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA which are, in some important cases, dimension-free and poly-logarithmic in the accuracy, closely matching the existing results in the smooth convex case. Additionally, we establish new upper bounds for the privacy curve of the subsampled noisy SGD algorithm. These bounds show a crucial dependency on the regularity of gradients, and are useful for a wide range of convex losses beyond the smooth case. Our analysis relies on a suitable extension of the Privacy Amplification by Iteration (PABI) framework (Feldman et al., 2018; Altschuler and Talwar, 2022, 2023) to noisy iterations whose gradient map is not necessarily nonexpansive. This extension is achieved by designing an optimization problem which accounts for the best possible Rényi divergence bound obtained by an application of PABI, where the tractability of the problem is crucially related to the modulus of continuity of the associated gradient mapping. We show that, in several interesting cases -- namely the nonsmooth convex, weakly smooth and (strongly) dissipative cases -- this optimization problem can be solved exactly and explicitly, yielding the tightest possible PABI-based bounds.
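For readers unfamiliar with the projected Langevin algorithm named in this abstract, the following is a minimal sketch of the textbook iteration x <- Proj(x - step * g(x) + sqrt(2 * step) * noise). It is not the authors' code: the constraint set (a Euclidean ball), the nonsmooth potential f(x) = |x_1| + ... + |x_d|, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def project_to_ball(x, radius):
    """Euclidean projection of x onto the origin-centered ball of given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def projected_langevin(grad, x0, step, n_steps, radius, rng):
    """Run x <- Proj(x - step * grad(x) + sqrt(2 * step) * gaussian_noise)."""
    x = project_to_ball(x0, radius)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad(x)
        y = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        x = project_to_ball(y, radius)
    return x


def l1_subgradient(x):
    """A subgradient of the (assumed) nonsmooth convex potential sum(|x_i|)."""
    return [(v > 0) - (v < 0) for v in x]


rng = random.Random(0)  # fixed seed so the run is reproducible
sample = projected_langevin(l1_subgradient, [0.5, -0.5], step=0.01,
                            n_steps=1000, radius=1.0, rng=rng)
sample_norm = math.sqrt(sum(v * v for v in sample))
```

Because the subgradient map of a nonsmooth potential is not nonexpansive, this toy setup falls in the regime the abstract describes, although the sketch itself says nothing about mixing times or privacy.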

2412.16031 2026-03-03 stat.ML cs.LG math.ST stat.TH

Learning sparsity-promoting regularizers for linear inverse problems

Giovanni S. Alberti, Ernesto De Vito, Tapio Helin, Matti Lassas, Luca Ratti, Matteo Santacesaria

Comments 28 pages, 4 figures

Journal ref SIAM Journal on Mathematics of Data Science 2026 8:1, 167-199

English abstract

This paper introduces a novel approach to learning sparsity-promoting regularizers for solving linear inverse problems. We develop a bilevel optimization framework to select an optimal synthesis operator, denoted as $B$, which regularizes the inverse problem while promoting sparsity in the solution. The method leverages statistical properties of the underlying data and incorporates prior knowledge through the choice of $B$. We establish the well-posedness of the optimization problem, provide theoretical guarantees for the learning process, and present sample complexity bounds. The approach is demonstrated through theoretical infinite-dimensional examples, including compact perturbations of a known operator and the problem of learning the mother wavelet, and through extensive numerical simulations. This work extends previous efforts in Tikhonov regularization by addressing non-differentiable norms and proposing a data-driven approach for sparse regularization in infinite dimensions.

2410.07612 2026-03-03 cs.CR cs.AI

A Survey for Deep Reinforcement Learning Based Network Intrusion Detection

Wanrong Yang, Alberto Acuto, Yihang Zhou, Dominik Wojtczak

Comments 17 pages, 7 figures

English abstract

Cyber-attacks are becoming increasingly sophisticated and frequent, highlighting the importance of network intrusion detection systems. This paper explores the potential and challenges of using deep reinforcement learning (DRL) in network intrusion detection. It begins by introducing key DRL concepts and frameworks, such as deep Q-networks and actor-critic algorithms, and reviews recent research utilizing DRL for intrusion detection. The study evaluates challenges related to model training efficiency, detection of minority and unknown class attacks, feature selection, and handling unbalanced datasets. The performance of DRL models is comprehensively analyzed, showing that while DRL holds promise, many recent technologies remain underexplored. Some DRL models achieve state-of-the-art results on public datasets, occasionally outperforming traditional deep learning methods. The paper concludes with recommendations for enhancing DRL deployment and testing in real-world network scenarios, with a focus on Internet of Things intrusion detection. It discusses recent DRL architectures and suggests future policy functions for DRL-based intrusion detection. Finally, the paper proposes integrating DRL with generative methods to further improve performance, addressing current gaps and supporting more robust and adaptive network intrusion detection systems.

2406.02645 2026-03-03 physics.comp-ph cs.AI cs.LG cs.NA math.NA

Astral: training physics-informed neural networks with error majorants

Vladimir Fanaskov, Tianchi Yu, Alexander Rudikov, Ivan Oseledets

Comments Accepted to ICLR 2026 workshop AI&PDE, reviewed at https://openreview.net/forum?id=TcFpJK2FcN

English abstract

The primal approach to physics-informed learning is residual minimization. We argue that the residual is, at best, an indirect measure of the error of an approximate solution and propose to train with an error majorant instead. Since the error majorant provides a direct upper bound on the error, one can reliably estimate how close the PiNN is to the exact solution and stop the optimization process when the desired accuracy is reached. We call the loss function associated with the error majorant Astral: neurAl a poSTerioRi functionAl Loss. To compare the Astral and residual loss functions, we illustrate how error majorants can be derived for various PDEs and conduct experiments with diffusion equations (including anisotropic and in the L-shaped domain), the convection-diffusion equation, temporal discretization of Maxwell's equation, magnetostatics and nonlinear elastoplasticity problems. The results indicate that the Astral loss is competitive with the residual loss, typically leading to faster convergence and lower error. The main benefit of using the Astral loss comes from its ability to estimate error, which is impossible with other loss functions. Our experiments indicate that the error estimate obtained with the Astral loss is usually tight enough, e.g., for a highly anisotropic equation, on average, Astral overestimates error by a factor of $1.5$, and for convection-diffusion by a factor of $1.7$. We further demonstrate that the Astral loss is better correlated with error than the residual and is a more reliable predictor of the error value. Moreover, unlike the residual, the error indicator obtained from the Astral loss has a superb spatial correlation with error. Backed by the empirical and theoretical results, we argue that one can productively use the Astral loss to perform reliable error analysis and approximate PDE solutions with accuracy similar to standard residual-based techniques.

2401.00664 2026-03-03 math.OC cs.LG math.PR math.ST stat.TH

Metric Entropy-Free Sample Complexity Bounds for Sample Average Approximation in Convex Stochastic Programming

Hongcheng Liu, Jindong Tong

English abstract

This paper studies sample average approximation (SAA) in solving convex or strongly convex stochastic programming (SP) problems. In estimating SAA's sample efficiency, the state-of-the-art sample complexity bounds entail metric entropy terms (such as the logarithm of the feasible region's covering number), which often grow polynomially with problem dimensionality. While it has been shown that metric entropy-free complexity rates are attainable under a uniform Lipschitz condition, such an assumption can be overly critical for many important SP problem settings. In response, this paper presents metric entropy-free sample complexity bounds for the SAA under standard SP assumptions -- in the absence of the uniform Lipschitz condition. For a $d$-dimensional problem, the new results often lead to an $O(d)$-improvement in the complexity rate compared with the state-of-the-art. From the newly established complexity bounds, an important revelation is that SAA and the canonical stochastic mirror descent (SMD) method, two mainstream solution approaches to SP, entail almost identical rates of sample efficiency, lifting a theoretical discrepancy of SAA from SMD also by a factor of $O(d)$. Furthermore, this paper explores non-Lipschitzian scenarios where SAA maintains provable efficacy but the corresponding results for SMD remain mostly unexplored, indicating the potential of SAA's better applicability in some irregular settings. The results of our numerical experiments align with our theoretical findings.
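As a reminder of what sample average approximation means in practice, here is a minimal one-dimensional sketch: the expectation in the stochastic program is replaced by an average over drawn samples, and the resulting deterministic convex problem is minimized. The quadratic loss, the Gaussian data, the search interval, and the ternary-search solver are all assumptions chosen for illustration and are not taken from the paper.

```python
import random


def saa_objective(x, samples, loss):
    """Sample average of loss(x, xi), standing in for the expectation E[loss(x, xi)]."""
    return sum(loss(x, xi) for xi in samples) / len(samples)


def minimize_convex_1d(f, lo, hi, iters=100):
    """Ternary search for a minimizer of a convex function f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)


def quadratic_loss(x, xi):
    """Strongly convex toy loss; its expected-value minimizer is E[xi]."""
    return 0.5 * (x - xi) ** 2


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.gauss(2.0, 1.0) for _ in range(500)]
x_hat = minimize_convex_1d(lambda x: saa_objective(x, samples, quadratic_loss),
                           lo=-10.0, hi=10.0)
sample_mean = sum(samples) / len(samples)
```

For this toy loss the SAA solution coincides with the sample mean, so `x_hat` and `sample_mean` agree up to solver tolerance; how fast such solutions approach the true minimizer as the sample size grows is the sample complexity question the abstract addresses.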

2603.01834 2026-03-03 cond-mat.mtrl-sci cs.LG

Probing Materials Knowledge in LLMs: From Latent Embeddings to Reliable Predictions

Vineeth Venugopal, Soroush Mahjoubi, Elsa Olivetti

Comments Under Review

English abstract

Large language models are increasingly applied to materials science, yet fundamental questions remain about their reliability and knowledge encoding. Evaluating 25 LLMs across four materials science tasks -- over 200 base and fine-tuned configurations -- we find that output modality fundamentally determines model behavior. For symbolic tasks, fine-tuning converges to consistent, verifiable answers with reduced response entropy, while for numerical tasks, fine-tuning improves prediction accuracy but models remain inconsistent across repeated inference runs, limiting their reliability as quantitative predictors. For numerical regression, we find that better performance can be obtained by extracting embeddings directly from intermediate transformer layers than from model text output, revealing an "LLM head bottleneck," though this effect is property- and dataset-dependent. Finally, we present a longitudinal study of GPT model performance in materials science, tracking four models over 18 months and observing 9-43% performance variation that poses reproducibility challenges for scientific applications.

2603.01820 2026-03-03 q-fin.TR cs.LG

Deep Learning for Financial Time Series: A Large-Scale Benchmark of Risk-Adjusted Performance

Adir Saly-Kaufmann, Kieran Wood, Jan Peter-Calliess, Stefan Zohren

Comments 43 pages, 27 figures, 11 tables

English abstract

We present a large-scale benchmark of modern deep learning architectures for a financial time series prediction and position sizing task, with a primary focus on Sharpe ratio optimization. Evaluating linear models, recurrent networks, transformer-based architectures, state space models, and recent sequence representation approaches, we assess out-of-sample performance on a daily futures dataset covering commodities, equity indices, bonds, and FX from 2010 to 2025. Our evaluation goes beyond average returns and includes statistical significance, downside and tail risk measures, breakeven transaction cost analysis, robustness to random seed selection, and computational efficiency. We find that models explicitly designed to learn rich temporal representations consistently outperform linear benchmarks and generic deep learning models, which often lead the ranking in standard time series benchmarks. Hybrid models such as VSN with LSTM, a combination of Variable Selection Networks (VSN) and LSTMs, achieve the highest overall Sharpe ratio, while VSN with xLSTM and LSTM with PatchTST exhibit superior downside-adjusted characteristics. xLSTM demonstrates the largest breakeven transaction cost buffer, indicating improved robustness to trading frictions.
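Since the benchmark is organized around the Sharpe ratio, a minimal sketch of the standard annualized computation may help. It assumes the inputs are already excess returns (risk-free rate of zero) and uses the common convention of 252 trading days per year; the function name and both conventions are assumptions, not details taken from the paper.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of returns, scaled by sqrt(periods per year).

    Assumes daily_returns are excess returns, i.e. the risk-free rate is
    already subtracted (or taken to be zero).
    """
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample standard deviation (n - 1)
    if std == 0.0:
        raise ValueError("Sharpe ratio is undefined for zero-variance returns")
    return mean / std * math.sqrt(periods_per_year)


toy_returns = [0.01, -0.01, 0.02, 0.0]  # made-up numbers for illustration only
per_period_sharpe = annualized_sharpe(toy_returns, periods_per_year=1)
yearly_sharpe = annualized_sharpe(toy_returns)
```

With `periods_per_year=1` the function returns the unscaled per-period ratio; the default multiplies it by sqrt(252).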

2603.01806 2026-03-03 cs.SI cs.GR cs.LG

GCTAM: Global and Contextual Truncated Affinity Combined Maximization Model For Unsupervised Graph Anomaly Detection

Xiong Zhang, Hong Peng, Zhenli He, Cheng Xie, Xin Jin, Hua Jiang

Comments Accepted by IJCAI 2025

English abstract

Anomalies often occur in real-world information networks/graphs, such as malevolent users, malicious comments, banned users, and fake news in social graphs. The latest graph anomaly detection methods use a novel mechanism called truncated affinity maximization (TAM) to detect anomaly nodes without using any label information and achieve impressive results. TAM maximizes the affinities among the normal nodes while truncating the affinities of the anomalous nodes to identify the anomalies. However, existing TAM-based methods truncate suspicious nodes according to a rigid threshold that ignores the specificity and high-order affinities of different nodes. This inevitably causes inefficient truncations from both normal and anomalous nodes, limiting the effectiveness of anomaly detection. To this end, this paper proposes a novel truncation model combining contextual and global affinity to truncate the anomalous nodes. The core idea of the work is to use contextual truncation to decrease the affinity of anomalous nodes, while global truncation increases the affinity of normal nodes. Extensive experiments on massive real-world datasets show that our method surpasses peer methods in most graph anomaly detection tasks. In particular, compared with previous state-of-the-art methods, the proposed method achieves improvements of +15% to +20% on two famous real-world datasets, Amazon and YelpChi. Notably, our method works well on the large datasets Amazon-all and YelpChi-all and achieves the best results, while most previous models cannot complete the tasks.

2603.01795 2026-03-03 cs.HC cs.AI cs.CL

PleaSQLarify: Visual Pragmatic Repair for Natural Language Database Querying

Robin Shing Moon Chan, Rita Sevastjanova, Mennatallah El-Assady

Comments Accepted at CHI'26, main track

English abstract

Natural language database interfaces broaden data access, yet they remain brittle under input ambiguity. Standard approaches often collapse uncertainty into a single query, offering little support for mismatches between user intent and system interpretation. We reframe this challenge through pragmatic inference: while users economize expressions, systems operate on priors over the action space that may not align with the users'. In this view, pragmatic repair -- incremental clarification through minimal interaction -- is a natural strategy for resolving underspecification. We present PleaSQLarify, which operationalizes pragmatic repair by structuring interaction around interpretable decision variables that enable efficient clarification. A visual interface complements this by surfacing the action space for exploration, requesting user disambiguation, and making belief updates traceable across turns. In a study with twelve participants, PleaSQLarify helped users recognize alternative interpretations and efficiently resolve ambiguity. Our findings highlight pragmatic repair as a design principle that fosters effective user control in natural language interfaces.

2603.01784 2026-03-03 cs.CR cs.AI

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution

Guoxin Shi, Haoyu Wang, Zaihui Yang, Yuxing Wang, Yongzhe Chang

Comments Preprint

English abstract

Adversarial behavior plays a central role in aligning large language models with human values. However, existing alignment methods largely rely on static adversarial settings, which fundamentally limit robustness, particularly in multimodal settings with a larger attack surface. In this work, we move beyond static adversarial supervision and introduce co-evolutionary alignment with evolving attacks, instantiated by CEMMA (Co-Evolutionary Multi-Modal Alignment), an automated and adaptive framework for multimodal safety alignment. We introduce an Evolutionary Attacker that decomposes adversarial prompts into method templates and harmful intents. By employing genetic operators, including mutation, crossover, and differential evolution, it enables simple seed attacks to inherit the structural efficacy of sophisticated jailbreaks. The Adaptive Defender is iteratively updated on the synthesized hard negatives, forming a closed-loop process that adapts alignment to evolving attacks. Experiments show that the Evolutionary Attacker substantially increases red-teaming jailbreak attack success rate (ASR), while the Adaptive Defender improves robustness and generalization across benchmarks with higher data efficiency, without inducing excessive benign refusal, and remains compatible with inference-time defenses such as AdaShield.