arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 2393
专题追踪
2508.08326 2026-02-17 cs.LG

Weather-Driven Agricultural Decision-Making Using Digital Twins Under Imperfect Conditions

Tamim Ahmed, Monowar Hasan

Comments ACM SIGSPATIAL 2025

详情
英文摘要

By offering a dynamic, real-time virtual representation of physical systems, digital twin technology can enhance data-driven decision-making in digital agriculture. Our research shows how digital twins are useful for detecting inconsistencies in agricultural weather data measurements, which are key attributes for various agricultural decision-making and automation tasks. We develop a modular framework named Cerealia that allows end-users to check for data inconsistencies when perfect weather feeds are unavailable. Cerealia uses neural network models to check anomalies and aids end-users in informed decision-making. We develop a prototype of Cerealia using the NVIDIA Jetson Orin platform and test it with an operational weather network established in a commercial orchard as well as publicly available weather datasets.

2508.07428 2026-02-17 cs.LG cs.AI

Lightning Prediction under Uncertainty: DeepLight with Hazy Loss

Md Sultanul Arifin, Abu Nowshed Sakib, Yeasir Rayhan, Tanzima Hashem

详情
英文摘要

Lightning, a common feature of severe meteorological conditions, poses significant risks, from direct human injuries to substantial economic losses. These risks are further exacerbated by climate change. Early and accurate prediction of lightning would enable preventive measures to safeguard people, protect property, and minimize economic losses. In this paper, we present DeepLight, a novel deep learning architecture for predicting lightning occurrences. Existing prediction models face several critical limitations: i) they often struggle to capture the dynamic spatial context and the inherent randomness of lightning events, including whether lightning occurs and its variability in location and timing even under similar meteorological conditions; ii) they underutilize key observational data, such as radar reflectivity and cloud properties; and iii) they rely heavily on Numerical Weather Prediction (NWP) systems, which are both computationally expensive and highly sensitive to parameter settings. To overcome these challenges, DeepLight leverages multi-source meteorological data, including radar reflectivity, cloud properties, and historical lightning occurrences through a dual-encoder architecture. By employing multi-branch convolution techniques, it dynamically captures spatial correlations across varying extents. Furthermore, its novel Hazy Loss function explicitly addresses the spatio-temporal uncertainty of lightning by penalizing deviations based on proximity to true events, enabling the model to better learn patterns amidst randomness. Extensive experiments show that DeepLight improves the Equitable Threat Score (ETS) by 18\%--30\% over state-of-the-art methods, establishing it as a robust solution for lightning prediction.

2507.16713 2026-02-17 cs.RO cs.AI cs.CL

A Pragmatist Robot: Learning to Plan Tasks by Experiencing the Real World

Kaixian Qu, Guowei Lan, René Zurbrügg, Changan Chen, Christopher E. Mower, Haitham Bou-Ammar, Marco Hutter

Comments Accepted to RA-L

详情
英文摘要

Large language models (LLMs) have emerged as the dominant paradigm for robotic task planning using natural language instructions. However, trained on general internet data, LLMs are not inherently aligned with the embodiment, skill sets, and limitations of real-world robotic systems. Inspired by the emerging paradigm of verbal reinforcement learning-where LLM agents improve through self-reflection and few-shot learning without parameter updates-we introduce PragmaBot, a framework that enables robots to learn task planning through real-world experience. PragmaBot employs a vision-language model (VLM) as the robot's "brain" and "eye", allowing it to visually evaluate action outcomes and self-reflect on failures. These reflections are stored in a short-term memory (STM), enabling the robot to quickly adapt its behavior during ongoing tasks. Upon task completion, the robot summarizes the lessons learned into its long-term memory (LTM). When facing new tasks, it can leverage retrieval-augmented generation (RAG) to plan more grounded action sequences by drawing on relevant past experiences and knowledge. Experiments on four challenging robotic tasks show that STM-based self-reflection increases task success rates from 35% to 84%, with emergent intelligent object interactions. In 12 real-world scenarios (including eight previously unseen tasks), the robot effectively learns from the LTM and improves single-trial success rates from 22% to 80%, with RAG outperforming naive prompting. These results highlight the effectiveness and generalizability of PragmaBot. Project webpage: https://pragmabot.github.io/

2507.11035 2026-02-17 cs.CV

Efficient Dual-domain Image Dehazing with Haze Prior Perception

Lirong Zheng, Yanshan Li, Rui Yu, Kaihao Zhang

详情
英文摘要

Transformers offer strong global modeling for single-image dehazing but come with high computational costs. Most methods rely on spatial features to capture long-range dependencies, making them less effective under complex haze conditions. Although some integrate frequency-domain cues, weak coupling between spatial and frequency branches limits their performance. To address these issues, we propose the Dark Channel Guided Frequency-aware Dehazing Network (DGFDNet), a dual-domain framework that explicitly aligns degradation across spatial and frequency domains. At its core, the DGFDBlock consists of two key modules: 1) Haze-Aware Frequency Modulator (HAFM), which uses dark channel priors to generate a haze confidence map for adaptive frequency modulation, achieving global degradation-aware spectral filtering. 2) Multi-level Gating Aggregation Module (MGAM), which fuses multi-scale features via multi-scale convolutions and a hybrid gating mechanism to recover fine-grained structures. Additionally, the Prior Correction Guidance Branch (PCGB) incorporates feedback for iterative refinement of the prior, improving haze localization accuracy, particularly in outdoor scenes. Extensive experiments on four benchmark datasets demonstrate that DGFDNet achieves state-of-the-art performance with improved robustness and real-time efficiency. Code is available at: https://github.com/Dilizlr/DGFDNet.

2506.19474 2026-02-17 cs.CV

HMSViT: A Hierarchical Masked Self-Supervised Vision Transformer for Corneal Nerve Segmentation and Diabetic Neuropathy Diagnosis

Xin Zhang, Liangxiu Han, Yue Shi, Yanlin Zheng, Uazman Alam, Maryam Ferdousi, Rayaz Malik

详情
英文摘要

Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, requiring early detection. Corneal Confocal Microscopy (CCM) enables non-invasive diagnosis, but automated methods suffer from inefficient feature extraction, reliance on handcrafted priors, and data limitations. We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer (HMSViT) designed for corneal nerve segmentation and DPN diagnosis. Unlike existing methods, HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding, enabling efficient multi-scale feature extraction by capturing fine-grained local details in early layers and integrating global context in deeper layers, all at a lower computational cost. A block-masked self supervised learning framework is designed for the HMSViT that reduces reliance on labelled data, enhancing feature robustness, while a multi-scale decoder is used for segmentation and classification by fusing hierarchical features. Experiments on clinical CCM datasets showed HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming leading hierarchical models like the Swin Transformer and HiViT by margins of up to 6.39% in segmentation accuracy while using fewer parameters. Detailed ablation studies further reveal that integrating block-masked SSL with hierarchical multi-scale feature extraction substantially enhances performance compared to conventional supervised training. Overall, these comprehensive experiments confirm that HMSViT delivers excellent, robust, and clinically viable results, demonstrating its potential for scalable deployment in real-world diagnostic applications.

2505.10685 2026-02-17 cs.CV

GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention

Lingjun Zhao, Sizhe Wei, James Hays, Lu Gan

详情
英文摘要

3D semantic occupancy prediction is essential for achieving safe, reliable autonomous driving and robotic navigation. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and fine-grained predictions. Although voxel-based scene representations are widely used for semantic occupancy prediction, 3D Gaussians have emerged as a continuous and significantly more compact alternative. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, namely GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy that provides 3D Gaussians with accurate geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism to refine these Gaussians using LiDAR-camera fusion features in a lifted 3D space. Extensive experiments on real-world on-road and off-road autonomous driving datasets demonstrate that GaussianFormer3D achieves state-of-the-art prediction performance with reduced memory consumption and improved efficiency.

2505.04338 2026-02-17 cs.LG

Riemannian Denoising Diffusion Probabilistic Models

Zichen Liu, Wei Zhang, Christof Schütte, Tiejun Li

详情
英文摘要

We propose Riemannian Denoising Diffusion Probabilistic Models (RDDPMs) for learning distributions on submanifolds of Euclidean space that are level sets of functions, including most of the manifolds relevant to applications. Existing methods for generative modeling on manifolds rely on substantial geometric information such as geodesic curves or eigenfunctions of the Laplace-Beltrami operator and, as a result, they are limited to manifolds where such information is available. In contrast, our method, built on a projection scheme, can be applied to more general manifolds, as it only requires being able to evaluate the value and the first order derivatives of the function that defines the submanifold. We provide a theoretical analysis of our method in the continuous-time limit, which elucidates the connection between our RDDPMs and score-based generative models on manifolds. The capability of our method is demonstrated on datasets from previous studies and on new datasets sampled from two high-dimensional manifolds, i.e. $\mathrm{SO}(10)$ and the configuration space of molecular system alanine dipeptide with fixed dihedral angle.

2504.20426 2026-02-17 cs.AI

RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library

Jiapeng Wang, Jinhao Jiang, Zhiqiang Zhang, Jun Zhou, Wayne Xin Zhao

Comments Accepted to EACL 2026

详情
英文摘要

The advancement of reasoning capabilities in Large Language Models (LLMs) requires substantial amounts of high-quality reasoning data, particularly in mathematics. Existing data synthesis methods, such as data augmentation from annotated training sets or direct question generation based on relevant knowledge points and documents, have expanded datasets but face challenges in mastering the inner logic of the problem during generation and ensuring the verifiability of the solutions. To address these issues, we propose RV-Syn, a novel Rational and Verifiable mathematical Synthesis approach. RV-Syn constructs a structured mathematical operation function library based on initial seed problems and generates computational graphs as solutions by combining Python-formatted functions from this library. These graphs are then back-translated into complex problems. Based on the constructed computation graph, we achieve solution-guided logic-aware problem generation. Furthermore, the executability of the computational graph ensures the verifiability of the solving process. Experimental results show that RV-Syn surpasses existing synthesis methods, including those involving human-generated problems, achieving greater efficient data scaling. This approach provides a scalable framework for generating high-quality reasoning datasets.

2504.07667 2026-02-17 cs.CV

S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion

Yujin Wang, Jiarui Wu, Yichen Bian, Fan Zhang, Tianfan Xue

Comments Accepted by ICLR 2026. Project Page:https://openimaginglab.github.io/S2R-HDR

详情
英文摘要

The generalization of learning-based high dynamic range (HDR) fusion is often limited by the availability of training data, as collecting large-scale HDR images from dynamic scenes is both costly and technically challenging. To address these challenges, we propose S2R-HDR, the first large-scale high-quality synthetic dataset for HDR fusion, with 24,000 HDR samples. Using Unreal Engine 5, we design a diverse set of realistic HDR scenes that encompass various dynamic elements, motion types, high dynamic range scenes, and lighting. Additionally, we develop an efficient rendering pipeline to generate realistic HDR images. To further mitigate the domain gap between synthetic and real-world data, we introduce S2R-Adapter, a domain adaptation designed to bridge this gap and enhance the generalization ability of models. Experimental results on real-world datasets demonstrate that our approach achieves state-of-the-art HDR fusion performance. Dataset and code are available at https://openimaginglab.github.io/S2R-HDR.

2503.17644 2026-02-17 cs.LG cs.AI

On The Sample Complexity Bounds In Bilevel Reinforcement Learning

Mudit Gaur, Utsav Singh, Amrit Singh Bedi, Raghu Pasupathu, Vaneet Aggarwal

Comments This is updated version of the paper 2410.15610

详情
英文摘要

Bilevel reinforcement learning (BRL) has emerged as a powerful framework for aligning generative models, yet its theoretical foundations, especially sample complexity bounds, remain underexplored. In this work, we present the first sample complexity bound for BRL, establishing a rate of $\mathcal{O}(ε^{-3})$ in continuous state-action spaces. Traditional MDP analysis techniques do not extend to BRL due to its nested structure and non-convex lower-level problems. We overcome these challenges by leveraging the Polyak-Łojasiewicz (PL) condition and the MDP structure to obtain closed-form gradients, enabling tight sample complexity analysis. Our analysis also extends to general bi-level optimization settings with non-convex lower levels, where we achieve state-of-the-art sample complexity results of $\mathcal{O}(ε^{-3})$ improving upon existing bounds of $\mathcal{O}(ε^{-6})$. Additionally, we address the computational bottleneck of hypergradient estimation by proposing a fully first-order, Hessian-free algorithm suitable for large-scale problems.

2503.12763 2026-02-17 cs.CV cs.LG

A Survey on Human Interaction Motion Generation

Kewei Sui, Anindita Ghosh, Inwoo Hwang, Bing Zhou, Jian Wang, Chuan Guo

Comments The repository listing relevant papers is accessible at: https://github.com/soraproducer/Awesome-Human-Interaction-Motion-Generation

Journal ref International Journal of Computer Vision, 2026

详情
英文摘要

Humans inhabit a world defined by interactions -- with other humans, objects, and environments. These interactive movements not only convey our relationships with our surroundings but also demonstrate how we perceive and communicate with the real world. Therefore, replicating these interaction behaviors in digital systems has emerged as an important topic for applications in robotics, virtual reality, and animation. While recent advances in deep generative models and new datasets have accelerated progress in this field, significant challenges remain in modeling the intricate human dynamics and their interactions with entities in the external world. In this survey, we present, for the first time, a comprehensive overview of the literature in human interaction motion generation. We begin by establishing foundational concepts essential for understanding the research background. We then systematically review existing solutions and datasets across three primary interaction tasks -- human-human, human-object, and human-scene interactions -- followed by evaluation metrics. Finally, we discuss open research directions and future opportunities.

2502.10295 2026-02-17 cs.LG

Fenchel-Young Variational Learning

Sophia Sklaviadis, Thomas Moellenhoff, Andre Martins, Mario Figueiredo

Comments Under review

详情
英文摘要

From a variational perspective, many statistical learning criteria involve seeking a distribution that balances empirical risk and regularization. In this paper, we broaden this perspective by introducing a new general class of variational methods based on Fenchel-Young (FY) losses, treated as divergences that generalize (and encompass) the familiar Kullback-Leibler divergence at the core of classical variational learning. Our proposed formulation -- FY variational learning -- includes as key ingredients new notions of FY free energy, FY evidence, FY evidence lower bound, and FY posterior. We derive alternating minimization and gradient backpropagation algorithms to compute (or lower bound) the FY evidence, which enables learning a wider class of models than previous variational formulations. This leads to generalized FY variants of classical algorithms, such as an FY expectation-maximization (FYEM) algorithm, and latent-variable models, such as an FY variational autoencoder (FYVAE). Our new methods are shown to be empirically competitive, often outperforming their classical counterparts, and most importantly, to have qualitatively novel features. For example, FYEM has an adaptively sparse E-step, while the FYVAE can support models with sparse observations and sparse posteriors.

2502.05376 2026-02-17 cs.LG

LO-BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference

Reena Elangovan, Charbel Sakr, Anand Raghunathan, Brucek Khailany

Journal ref Transactions on Machine Learning Research, 2025

详情
英文摘要

Post-training quantization (PTQ) is a promising approach to reducing the storage and computational requirements of large language models (LLMs) without additional training cost. Recent PTQ studies have primarily focused on quantizing only weights to sub-8-bits while maintaining activations at 8-bits or higher. Accurate sub-8-bit quantization for both weights and activations without relying on quantization-aware training remains a significant challenge. We propose a novel quantization method called block clustered quantization (BCQ) wherein each operand tensor is decomposed into blocks (a block is a group of contiguous scalars), blocks are clustered based on their statistics, and a dedicated optimal quantization codebook is designed for each cluster. As a specific embodiment of this approach, we propose a PTQ algorithm called Locally-Optimal BCQ (LO-BCQ) that iterates between the steps of block clustering and codebook design to greedily minimize the quantization mean squared error. When weight and activation scalars are encoded to W4A4 format (with 0.5-bits of overhead for storing scaling factors and codebook selectors), we advance the current state-of-the-art by demonstrating <1% loss in inference accuracy across several LLMs and downstream tasks.

2502.01594 2026-02-17 cs.LG math.OC

Faster Adaptive Optimization via Expected Gradient Outer Product Reparameterization

Adela DePavia, Jose Cruzado, Jiayou Liang, Vasileios Charisopoulos, Rebecca Willett

详情
英文摘要

Adaptive optimization algorithms -- such as Adagrad, Adam, and their variants -- have found widespread use in machine learning, signal processing and many other settings. Several methods in this family are not rotationally equivariant, meaning that simple reparameterizations (i.e. change of basis) can drastically affect their convergence. However, their sensitivity to the choice of parameterization has not been systematically studied; it is not clear how to identify a "favorable" change of basis in which these methods perform best. In this paper we propose a reparameterization method and demonstrate both theoretically and empirically its potential to improve their convergence behavior. Our method is an orthonormal transformation based on the expected gradient outer product (EGOP) matrix, which can be approximated using either full-batch or stochastic gradient oracles. We show that for a broad class of functions, the sensitivity of adaptive algorithms to choice-of-basis is influenced by the decay of the EGOP matrix spectrum. We illustrate the potential impact of EGOP reparameterization by presenting empirical evidence and theoretical arguments that common machine learning tasks with "natural" data exhibit EGOP spectral decay.

2501.16178 2026-02-17 cs.LG stat.ML

SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting

Wenxuan Xie, Fanpu Cao

详情
英文摘要

In recent work on time-series prediction, Transformers and even large language models have garnered significant attention due to their strong capabilities in sequence modeling. However, in practical deployments, time-series prediction often requires operation in resource-constrained environments, such as edge devices, which are unable to handle the computational overhead of large models. To address such scenarios, some lightweight models have been proposed, but they exhibit poor performance on non-stationary sequences. In this paper, we propose $\textit{SWIFT}$, a lightweight model that is not only powerful, but also efficient in deployment and inference for Long-term Time Series Forecasting (LTSF). Our model is based on three key points: (i) Utilizing wavelet transform to perform lossless downsampling of time series. (ii) Achieving cross-band information fusion with a learnable filter. (iii) Using only one shared linear layer or one shallow MLP for sub-series' mapping. We conduct comprehensive experiments, and the results show that $\textit{SWIFT}$ achieves state-of-the-art (SOTA) performance on multiple datasets, offering a promising method for edge computing and deployment in this task. Moreover, it is noteworthy that the number of parameters in $\textit{SWIFT-Linear}$ is only 25\% of what it would be with a single-layer linear model for time-domain prediction. Our code is available at https://github.com/LancelotXWX/SWIFT.

2412.11586 2026-02-17 cs.CV

StrandHead: Text to Hair-Disentangled 3D Head Avatars Using Human-Centric Priors

Xiaokun Sun, Zeyu Cai, Ying Tai, Jian Yang, Zhenyu Zhang

Comments Accepted by ICCV 2025; Project page: https://xiaokunsun.github.io/StrandHead.github.io

详情
英文摘要

While haircut indicates distinct personality, existing avatar generation methods fail to model practical hair due to the data limitation or entangled representation. We propose StrandHead, a novel text-driven method capable of generating 3D hair strands and disentangled head avatars with strand-level attributes. Instead of using large-scale hair-text paired data for supervision, we demonstrate that realistic hair strands can be generated from prompts by distilling 2D generative models pre-trained on human mesh data. To this end, we propose a meshing approach guided by strand geometry to guarantee the gradient flow from the distillation objective to the neural strand representation. The optimization is then regularized by statistically significant haircut features, leading to stable updating of strands against unreasonable drifting. These employed 2D/3D human-centric priors contribute to text-aligned and realistic 3D strand generation. Extensive experiments show that StrandHead achieves the state-of-the-art performance on text to strand generation and disentangled 3D head avatar modeling. The generated 3D hair can be applied on avatars for strand-level editing, as well as implemented in the graphics engine for physical simulation or other applications. Project page: https://xiaokunsun.github.io/StrandHead.github.io/.

2407.13010 2026-02-17 cs.LG cs.CE physics.comp-ph stat.ML

A Resolution Independent Neural Operator

Bahador Bahmani, Somdatta Goswami, Ioannis G. Kevrekidis, Michael D. Shields

Journal ref Journal of Computational Physics, 539, 2025, 114233

详情
英文摘要

The Deep Operator Network (DeepONet) is a powerful neural operator architecture that uses two neural networks to map between infinite-dimensional function spaces. This architecture allows for the evaluation of the solution field at any location within the domain but requires input functions to be discretized at identical locations, limiting practical applications. We introduce a general framework for operator learning from input-output data with arbitrary sensor locations and counts. This begins by introducing a resolution-independent DeepONet (RI-DeepONet), which handles input functions discretized arbitrarily but sufficiently finely. To achieve this, we propose two dictionary learning algorithms that adaptively learn continuous basis functions, parameterized as implicit neural representations (INRs), from correlated signals on arbitrary point clouds. These basis functions project input function data onto a finite-dimensional embedding space, making it compatible with DeepONet without architectural changes. We specifically use sinusoidal representation networks (SIRENs) as trainable INR basis functions. Similarly, the dictionary learning algorithms identify basis functions for output data, defining a new neural operator architecture: the Resolution Independent Neural Operator (RINO). In RINO, the operator learning task reduces to mapping coefficients of input basis functions to output basis functions. We demonstrate RINO's robustness and applicability in handling arbitrarily sampled input and output functions during both training and inference through several numerical examples.

2602.15021 2026-02-17 astro-ph.SR astro-ph.GA cs.LG

Generalization from Low- to Moderate-Resolution Spectra with Neural Networks for Stellar Parameter Estimation: A Case Study with DESI

Xiaosheng Zhao, Yuan-Sen Ting, Rosemary F. G. Wyse, Alexander S. Szalay, Yang Huang, László Dobos, Tamás Budavári, Viska Wei

Comments 20 pages, 13 figures, 4 tables. Submitted to AAS journals. Comments welcome

详情
英文摘要

Cross-survey generalization is a critical challenge in stellar spectral analysis, particularly in cases such as transferring from low- to moderate-resolution surveys. We investigate this problem using pre-trained models, focusing on simple neural networks such as multilayer perceptrons (MLPs), with a case study transferring from LAMOST low-resolution spectra (LRS) to DESI medium-resolution spectra (MRS). Specifically, we pre-train MLPs on either LRS or their embeddings and fine-tune them for application to DESI stellar spectra. We compare MLPs trained directly on spectra with those trained on embeddings derived from transformer-based models (self-supervised foundation models pre-trained for multiple downstream tasks). We also evaluate different fine-tuning strategies, including residual-head adapters, LoRA, and full fine-tuning. We find that MLPs pre-trained on LAMOST LRS achieve strong performance, even without fine-tuning, and that modest fine-tuning with DESI spectra further improves the results. For iron abundance, embeddings from a transformer-based model yield advantages in the metal-rich ([Fe/H] > -1.0) regime, but underperform in the metal-poor regime compared to MLPs trained directly on LRS. We also show that the optimal fine-tuning strategy depends on the specific stellar parameter under consideration. These results highlight that simple pre-trained MLPs can provide competitive cross-survey generalization, while the role of spectral foundation models for cross-survey stellar parameter estimation requires further exploration.

2602.14947 2026-02-17 eess.SY cs.LG cs.SY

Gradient Networks for Universal Magnetic Modeling of Synchronous Machines

Junyi Li, Tim Foissner, Floran Martin, Antti Piippo, Marko Hinkkanen

详情
英文摘要

This paper presents a physics-informed neural network approach for dynamic modeling of saturable synchronous machines, including cases with spatial harmonics. We introduce an architecture that incorporates gradient networks directly into the fundamental machine equations, enabling accurate modeling of the nonlinear and coupled electromagnetic constitutive relationship. By learning the gradient of the magnetic field energy, the model inherently satisfies energy balance (reciprocity conditions). The proposed architecture can universally approximate any physically feasible magnetic behavior and offers several advantages over lookup tables and standard machine learning models: it requires less training data, ensures monotonicity and reliable extrapolation, and produces smooth outputs. These properties further enable robust model inversion and optimal trajectory generation, often needed in control applications. We validate the proposed approach using measured and finite-element method (FEM) datasets from a 5.6-kW permanent-magnet (PM) synchronous reluctance machine. Results demonstrate accurate and physically consistent models, even with limited training data.

2602.14939 2026-02-17 eess.SY cs.LG cs.SY

Fault Detection in Electrical Distribution System using Autoencoders

Sidharthenee Nayak, Victor Sam Moses Babu, Chandrashekhar Narayan Bhende, Pratyush Chakraborty, Mayukha Pal

详情
英文摘要

In recent times, there has been considerable interest in fault detection within electrical power systems, garnering attention from both academic researchers and industry professionals. Despite the development of numerous fault detection methods and their adaptations over the past decade, their practical application remains highly challenging. Given the probabilistic nature of fault occurrences and parameters, certain decision-making tasks could be approached from a probabilistic standpoint. Protective systems are tasked with the detection, classification, and localization of faulty voltage and current line magnitudes, culminating in the activation of circuit breakers to isolate the faulty line. An essential aspect of designing effective fault detection systems lies in obtaining reliable data for training and testing, which is often scarce. Leveraging deep learning techniques, particularly the powerful capabilities of pattern classifiers in learning, generalizing, and parallel processing, offers promising avenues for intelligent fault detection. To address this, our paper proposes an anomaly-based approach for fault detection in electrical power systems, employing deep autoencoders. Additionally, we utilize Convolutional Autoencoders (CAE) for dimensionality reduction, which, due to its fewer parameters, requires less training time compared to conventional autoencoders. The proposed method demonstrates superior performance and accuracy compared to alternative detection approaches by achieving an accuracy of 97.62% and 99.92% on simulated and publicly available datasets.

2602.14867 2026-02-17 cond-mat.mtrl-sci cs.LG

Fast and accurate quasi-atom method for simultaneous atomistic and continuum simulation of solids

Artem Chuprov, Egor E. Nuzhin, Alexey A. Tsukanov, Nikolay V. Brilliantov

详情
英文摘要

We report a novel hybrid method of simultaneous atomistic simulation of solids in critical regions (contacts surfaces, cracks areas, etc.), along with continuum modeling of other parts. The continuum is treated in terms of quasi-atoms of different size, comprising composite medium. The parameters of interaction potential between the quasi-atoms are optimized to match elastic properties of the composite medium to those of the atomic one. The optimization method coincides conceptually with the online Machine Learning (ML) methods, making it computationally very efficient. Such an approach allows a straightforward application of standard software packages for molecular dynamics (MD), supplemented by the ML-based optimizer. The new method is applied to model systems with a simple, pairwise Lennard-Jones potential, as well with multi-body Tersoff potential, describing covalent bonds. Using LAMMPS software we simulate collision of particles of different size. Comparing simulation results, obtained by the novel method, with full-atomic simulations, we demonstrate its accuracy, validity and overwhelming superiority in computational speed. Furthermore, we compare our method with other hybrid methods, specifically, with the closest one -- AtC (Atomic to Continuum) method. We demonstrate a significant superiority of our approach in computational speed and implementation convenience. Finally, we discuss a possible extension of the method for modeling other phenomena.

2602.14833 2026-02-17 eess.SP cs.LG

RF-GPT: Teaching AI to See the Wireless World

Hang Zou, Yu Tian, Bohao Wang, Lina Bariah, Samson Lasaulce, Chongwen Huang, Mérouane Debbah

详情
英文摘要

Large language models (LLMs) and multimodal models have become powerful general-purpose reasoning systems. However, radio-frequency (RF) signals, which underpin wireless systems, are still not natively supported by these models. Existing LLM-based approaches for telecom focus mainly on text and structured data, while conventional RF deep-learning models are built separately for specific signal-processing tasks, highlighting a clear gap between RF perception and high-level reasoning. To bridge this gap, we introduce RF-GPT, a radio-frequency language model (RFLM) that utilizes the visual encoders of multimodal LLMs to process and understand RF spectrograms. In this framework, complex in-phase/quadrature (IQ) waveforms are mapped to time-frequency spectrograms and then passed to pretrained visual encoders. The resulting representations are injected as RF tokens into a decoder-only LLM, which generates RF-grounded answers, explanations, and structured outputs. To train RF-GPT, we perform supervised instruction fine-tuning of a pretrained multimodal LLM using a fully synthetic RF corpus. Standards-compliant waveform generators produce wideband scenes for six wireless technologies, from which we derive time-frequency spectrograms, exact configuration metadata, and dense captions. A text-only LLM then converts these captions into RF-grounded instruction-answer pairs, yielding roughly 12,000 RF scenes and 0.625 million instruction examples without any manual labeling. Across benchmarks for wideband modulation classification, overlap analysis, wireless-technology recognition, WLAN user counting, and 5G NR information extraction, RF-GPT achieves strong multi-task performance, whereas general-purpose VLMs with no RF grounding largely fail.

2602.14785 2026-02-17 eess.AS cs.LG

SA-SSL-MOS: Self-supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment

Fengyuan Cao, Xinyu Liang, Fredrik Cumlin, Victor Ungureanu, Chandan K. A. Reddy, Christian Schuldt, Saikat Chatterjee

Comments Accepted at ICASSP 2026

详情
英文摘要

Designing a speech quality assessment (SQA) system for estimating mean-opinion-score (MOS) of multi-rate speech with varying sampling frequency (16-48 kHz) is a challenging task. The challenge arises due to the limited availability of a MOS-labeled training dataset comprising multi-rate speech samples. While self-supervised learning (SSL) models have been widely adopted in SQA to boost performance, a key limitation is that they are pretrained on 16 kHz speech and therefore discard high-frequency information present in higher sampling rates. To address this issue, we propose a spectrogram-augmented SSL method that incorporates high-frequency features (up to 48 kHz sampling rate) through a parallel-branch architecture. We further introduce a two-step training scheme: the model is first pre-trained on a large 48 kHz dataset and then fine-tuned on a smaller multi-rate dataset. Experimental results show that leveraging high-frequency information overlooked by SSL features is crucial for accurate multi-rate SQA, and that the proposed two-step training substantially improves generalization when multi-rate data is limited.

2602.14783 2026-02-17 cs.CY cs.AI

What hackers talk about when they talk about AI: Early-stage diffusion of a cybercrime innovation

Benoît Dupont, Chad Whelan, Serge-Olivier Paquette

Comments 33 pages, 2 figures, submitted to Global Crime

详情
英文摘要

The rapid expansion of artificial intelligence (AI) is raising concerns about its potential to transform cybercrime. Beyond empowering novice offenders, AI stands to intensify the scale and sophistication of attacks by seasoned cybercriminals. This paper examines the evolving relationship between cybercriminals and AI using a unique dataset from a cyber threat intelligence platform. Analyzing more than 160 cybercrime forum conversations collected over seven months, our research reveals how cybercriminals understand AI and discuss how they can exploit its capabilities. Their exchanges reflect growing curiosity about AI's criminal applications through legal tools and dedicated criminal tools, but also doubts and anxieties about AI's effectiveness and its effects on their business models and operational security. The study documents attempts to misuse legitimate AI tools and develop bespoke models tailored for illicit purposes. Combining the diffusion of innovation framework with thematic analysis, the paper provides an in-depth view of emerging AI-enabled cybercrime and offers practical insights for law enforcement and policymakers.

2602.14735 2026-02-17 quant-ph cs.LG

The Signal Horizon: Local Blindness and the Contraction of Pauli-Weight Spectra in Noisy Quantum Encodings

Ait Haddou Marwan

详情
英文摘要

The performance of quantum classifiers is typically analyzed through global state distinguishability or the trainability of variational models. This study investigates how much class information remains accessible under locality-constrained measurements in the presence of noise. The authors formulate binary quantum classification as constrained quantum state discrimination and introduce a locality-restricted distinguishability measure quantifying the maximum bias achievable by observables acting on at most $k$ subsystems. For $n$-qubit systems subject to independent depolarizing noise, the locally accessible signal is governed by a Pauli-weight-dependent contraction mechanism. This motivates a computable predictor, the $k$-local Pauli-accessible amplitude $A_{k}(p)$, which lower bounds the optimal $k$-local classification advantage. Numerical experiments on four-qubit encodings demonstrate quantitative agreement between empirical accuracy and the prediction across noise levels. The research identifies an operational breakdown threshold where $k$-local classifiers become indistinguishable from random guessing despite persistent global distinguishability.

2602.14699 2026-02-17 cs.DB cs.AI cs.AR

Qute: Towards Quantum-Native Database

Muzhi Chen, Xuanhe Zhou, Wei Zhou, Bangrui Xu, Surui Tang, Guoliang Li, Bingsheng He, Yeye He, Yitong Song, Fan Wu

Comments Please refer our open-source prototype at: https://github.com/weAIDB/Qute

详情
英文摘要

This paper envisions a quantum database (Qute) that treats quantum computation as a first-class execution option. Unlike prior simulation-based methods that either run quantum algorithms on classical machines or adapt existing databases for quantum simulation, Qute instead (i) compiles an extended form of SQL into gate-efficient quantum circuits, (ii) employs a hybrid optimizer to dynamically select between quantum and classical execution plans, (iii) introduces selective quantum indexing, and (iv) designs fidelity-preserving storage to mitigate current qubit constraints. We also present a three-stage evolution roadmap toward quantum-native database. Finally, by deploying Qute on a real quantum processor (origin_wukong), we show that it outperforms a classical baseline at scale, and we release an open-source prototype at https://github.com/weAIDB/Qute.

2602.14689 2026-02-17 cs.CR cs.AI cs.CL cs.LG

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

Lukas Struppek, Adam Gleave, Kellin Pelrine

Comments 54 pages, 7 figures, 35 tables

详情
英文摘要

As the capabilities of large language models continue to advance, so does their potential for misuse. While closed-source models typically rely on external defenses, open-weight models must primarily depend on internal safeguards to mitigate harmful behavior. Prior red-teaming research has largely focused on input-based jailbreaking and parameter-level manipulations. However, open-weight models also natively support prefilling, which allows an attacker to predefine initial response tokens before generation begins. Despite its potential, this attack vector has received little systematic attention. We present the largest empirical study to date of prefill attacks, evaluating over 20 existing and novel strategies across multiple model families and state-of-the-art open-weight models. Our results show that prefill attacks are consistently effective against all major contemporary open-weight models, revealing a critical and previously underexplored vulnerability with significant implications for deployment. While certain large reasoning models exhibit some robustness against generic prefilling, they remain vulnerable to tailored, model-specific strategies. Our findings underscore the urgent need for model developers to prioritize defenses against prefill attacks in open-weight LLMs.

2602.14642 2026-02-17 stat.ML cs.LG

GenPANIS: A Latent-Variable Generative Framework for Forward and Inverse PDE Problems in Multiphase Media

Matthaios Chatzopoulos, Phaedon-Stelios Koutsourelakis

详情
英文摘要

Inverse problems and inverse design in multiphase media, i.e., recovering or engineering microstructures to achieve target macroscopic responses, require operating on discrete-valued material fields, rendering the problem non-differentiable and incompatible with gradient-based methods. Existing approaches either relax to continuous approximations, compromising physical fidelity, or employ separate heavyweight models for forward and inverse tasks. We propose GenPANIS, a unified generative framework that preserves exact discrete microstructures while enabling gradient-based inference through continuous latent embeddings. The model learns a joint distribution over microstructures and PDE solutions, supporting bidirectional inference (forward prediction and inverse recovery) within a single architecture. The generative formulation enables training with unlabeled data, physics residuals, and minimal labeled pairs. A physics-aware decoder incorporating a differentiable coarse-grained PDE solver preserves governing equation structure, enabling extrapolation to varying boundary conditions and microstructural statistics. A learnable normalizing flow prior captures complex posterior structure for inverse problems. Demonstrated on Darcy flow and Helmholtz equations, GenPANIS maintains accuracy on challenging extrapolative scenarios - including unseen boundary conditions, volume fractions, and microstructural morphologies, with sparse, noisy observations. It outperforms state-of-the-art methods while using 10 - 100 times fewer parameters and providing principled uncertainty quantification.

2602.14641 2026-02-17 quant-ph cs.LG

Quantum Reservoir Computing with Neutral Atoms on a Small, Complex, Medical Dataset

Luke Antoncich, Yuben Moodley, Ugo Varetto, Jingbo Wang, Jonathan Wurtz, Jing Chen, Pascal Jahan Elahi, Casey R. Myers

详情
英文摘要

Biomarker-based prediction of clinical outcomes is challenging due to nonlinear relationships, correlated features, and the limited size of many medical datasets. Classical machine-learning methods can struggle under these conditions, motivating the search for alternatives. In this work, we investigate quantum reservoir computing (QRC), using both noiseless emulation and hardware execution on the neutral-atom Rydberg processor \textit{Aquila}. We evaluate performance with six classical machine-learning models and use SHAP to generate feature subsets. We find that models trained on emulated quantum features achieve mean test accuracies comparable to those trained on classical features, but have higher training accuracies and greater variability over data splits, consistent with overfitting. When comparing hardware execution of QRC to noiseless emulation, the models are more robust over different data splits and often exhibit statistically significant improvements in mean test accuracy. This combination of improved accuracy and increased stability is suggestive of a regularising effect induced by hardware execution. To investigate the origin of this behaviour, we examine the statistical differences between hardware and emulated quantum feature distributions. We find that hardware execution applies a structured, time-dependent transformation characterised by compression toward the mean and a progressive reduction in mutual information relative to emulation.

2602.09392 2026-02-17 cs.CR cs.AI

LLMAC: A Global and Explainable Access Control Framework with Large Language Model

Sharif Noor Zisad, Ragib Hasan

Comments This paper is accepted and presented in IEEE Consumer Communications & Networking Conference (CCNC 2026)

详情
英文摘要

Today's business organizations need access control systems that can handle complex, changing security requirements that go beyond what traditional methods can manage. Current approaches, such as Role-Based Access Control (RBAC), Attribute-Based Access Control (ABAC), and Discretionary Access Control (DAC), were designed for specific purposes. They cannot effectively manage the dynamic, situation-dependent workflows that modern systems require. In this research, we introduce LLMAC, a new unified approach using Large Language Models (LLMs) to combine these different access control methods into one comprehensive, understandable system. We used an extensive synthetic dataset that represents complex real-world scenarios, including policies for ownership verification, version management, workflow processes, and dynamic role separation. Using Mistral 7B, our trained LLM model achieved outstanding results with 98.5% accuracy, significantly outperforming traditional methods (RBAC: 14.5%, ABAC: 58.5%, DAC: 27.5%) while providing clear, human readable explanations for each decision. Performance testing shows that the system can be practically deployed with reasonable response times and computing resources.