arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.00937 2026-05-05 physics.flu-dyn cs.LG

An ALE-Consistent Graph Neural Operator-Transformer Framework for Fluid-Structure Interaction

Shihang Zhao, Martín Saravia, Haokui Jiang, Zhiyang Xue, Shunxiang Cao

Comments 29 pages, 20 figures

Abstract

We propose an arbitrary Lagrangian-Eulerian (ALE)-consistent machine learning framework for long-term fluid-structure interaction (FSI) prediction on deforming unstructured meshes. Specifically, the fluid dynamics are modeled by a surrogate that combines a graph neural operator (GNO) with a vision Transformer (ViT) for spatiotemporal prediction, while a lightweight long short-term memory (LSTM) network predicts structural kinematics at the interface. The two surrogates are coupled through a standard partitioned procedure. Most importantly, kinematic compatibility at the moving interface is enforced via an ALE-consistent boundary-correction step that updates the fluid-side interface velocity with the predicted structural velocity at each coupling update, thereby improving near-interface accuracy and long-term rollout stability. To mitigate autoregressive error accumulation, a two-stage training strategy is adopted, consisting of single-step supervised pretraining followed by long-term autoregressive fine-tuning. The proposed framework is validated on the benchmark problem of a flexible beam vibration in the wake of a cylinder. Results demonstrate accurate phase-consistent predictions over long rollouts and robust generalization under inlet-profile variations in both interpolation and extrapolation settings. Systematic ablation studies further assess the respective contributions of the ViT module, ALE-consistent boundary correction, and long-term training to predictive accuracy and rollout robustness.

2605.00930 2026-05-05 q-bio.GN cs.AI

CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation

Andac Demir, Erik W. Anderson, Jeremy L. Jenkins, Srayanta Mukherjee

Journal ref
ICLR Machine Learning for Genomics Explorations Workshop 2026
Abstract

In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly incorporating MERFISH and imaging mass-cytometry data as 2D or 3D spatial-visual layers. CellxPert facilitates four key downstream tasks out of the box: (i) cell-type annotation across a broad ontology of 154 largely overlapping identities -- the largest label space addressed to date and a stringent test of fine-grained discrimination, (ii) efficient fine-tuning using Low Rank Adaptation (LoRA), (iii) genome-wide transcriptomic response prediction to in-silico perturbations (ISP), and (iv) seamless multi-omic integration across various assays and platforms. Unlike current single-cell foundation models, which approximate gene perturbations by deleting or reordering tokenized gene expression ranks, CellxPert employs a Metropolis-Hastings sampler whose proposal kernel uses the model's masked conditional distributions to transition to new transcriptomic states conditioned on the perturbed genes. This Markov-chain procedure mitigates out-of-distribution artifacts introduced by abrupt token manipulation and produces trajectories that are biologically interpretable. Evaluations on PBMC68K, Replogle Perturb-seq, Systema, and BMMC benchmarks show that CellxPert surpasses classical and state-of-the-art baselines in cell-type annotation, perturbation response prediction, and multi-omic integration.

2605.00923 2026-05-05 eess.IV cs.CV

A Proof-of-Concept Study of Multitask Learning for Cranial Synthetic CT Generation Across Heterogeneous MRI Field Strengths

Zhuoyao Xin, Yiren Zhang, Christopher Wu, Dong Liu, Chunming Gu, Elena Greco, Erik H. Middlebrooks, Jun Hua, Jia Guo

Comments Published in Medical Physics (2026). DOI: 10.1002/mp.70429

Journal ref
Medical Physics, 53(5): e70429, 2026
Abstract

Accurate synthesis of computed tomography (CT) images from magnetic resonance imaging (MRI) is clinically valuable for cranial applications such as attenuation correction, radiotherapy planning, and image-guided interventions. However, heterogeneity across MRI field strengths and acquisition protocols limits the generalizability of existing methods. In this study, we formulate cranial CT synthesis as a modular, structurally coupled problem and propose a deep learning framework to improve robustness across heterogeneous MRI conditions. The model is designed to adapt to variations in field strength and imaging protocols while preserving anatomical consistency. Experiments on multi-site datasets demonstrate improved performance and generalization compared with conventional approaches. The proposed method enables reliable CT synthesis across heterogeneous MRI settings, supporting broader clinical translation.

2605.00922 2026-05-05 cs.SE cs.AI

To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research

Katja Karhu, Kari Smolander, Jussi Kasurinen

Comments 13 pages, 2 figures. Accepted to VibeX 2026: 1st International Workshop on Vibe Coding and Vibe Researching

Abstract

There has been intense debate among qualitative researchers about whether generative AI is suitable for qualitative research. In this paper, we summarize the broader ongoing discussion of generative AI in qualitative research and its implications for software engineering researchers. The qualitative research approach, small-q (positivist or post-positivist) or Big Q (non-positivist), is among the major criteria for determining whether generative AI can be used in qualitative research. In addition to research philosophy and research approach, skills, ethics, and personal preferences also play a role in researchers' decisions about whether to use AI in qualitative research.

2605.00914 2026-05-05 cs.MA cs.AI

The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate

Blaž Bertalanič, Carolina Fortuna

Comments 19 pages, ACM Conference on AI and Agentic Systems

Abstract

Multi-agent debate, where teams of LLMs iteratively exchange rationales and vote on answers, is widely deployed under the assumption that peer review filters hallucinations. Yet the failure dynamics of homogeneous debate remain poorly understood. We therefore report findings from a controlled empirical study of teams of $N{=}10$ homogeneous agents (Qwen2.5-7B, Llama-3.1-8B, Ministral-3-8B) across $R{=}3$ debate rounds on two high-difficulty benchmarks (GSM-Hard and MMLU-Hard). We compare peer debate against isolated self-correction and a stochastic noise control that injects rationales from unrelated problems. We decompose debate failure into three model-dependent pathways: sycophantic conformity, where agents uncritically adopt majority answers (modal adoption up to 85.5%); contextual fragility, where peer rationales destabilize previously correct reasoning (vulnerability rate up to 70.0%); and consensus collapse, where plurality voting discards correct answers already present in the generation pool (oracle gap up to 32.3 percentage points). Ablations over communication density ($K \in \{2,4,9\}$) and sampling temperature ($T \in \{0.4, 0.7\}$) show that conformity reaches high levels at minimal peer exposure ($K{=}2$) and intensifies with greater initial diversity. Across all configurations, debate consumes 2.1-3.4$\times$ more tokens (up to 28,631 tokens per problem) than self-correction for equal or lower accuracy. Our results indicate that, within the 7-8B parameter class, homogeneous teams without structured roles do not benefit from unguided peer exchange, and that isolated self-correction consistently offers a more favorable cost-accuracy tradeoff.
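
The "oracle gap" mentioned in this abstract can be illustrated with a small sketch. It assumes the gap is the difference, in percentage points, between the share of problems where any agent produced the correct answer and the share where plurality voting selected it. The function names and the data layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def plurality_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def oracle_gap(problems):
    """problems: list of (agent_answers, gold) pairs (hypothetical layout).

    Returns (vote_accuracy, oracle_accuracy, gap), all in percent.
    """
    n = len(problems)
    # Plurality voting is correct only if the winning answer equals the gold answer.
    vote_hits = sum(plurality_vote(answers) == gold for answers, gold in problems)
    # The oracle is correct whenever the gold answer appears anywhere in the pool.
    oracle_hits = sum(gold in answers for answers, gold in problems)
    vote_acc = 100.0 * vote_hits / n
    oracle_acc = 100.0 * oracle_hits / n
    return vote_acc, oracle_acc, oracle_acc - vote_acc
```

A positive gap means the team generated correct answers that the vote then discarded, which is the failure mode the abstract calls consensus collapse.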

2605.00897 2026-05-05 eess.SP cs.CV eess.IV

SPAT: A Semantic Port-Aware Adaptive-Rate Transmission Protocol for Semantic Communication

Yunhao Wang, Shuai Ma, Bin Shen, Shouhan Shi, Youlong Wu, Guangming Shi, Xiang Cheng

Abstract

With the evolution of 6G, semantic communication has emerged as a promising paradigm by prioritizing the delivery of task-relevant meaning over strict bit-level correctness. However, existing transport mechanisms still rely on explicit port headers and bit-level validation, making them vulnerable to header corruption and the resulting packet loss. To address this issue, this paper proposes a Semantic Port-Aware Adaptive-Rate Transmission Protocol (SPAT) for semantic communication. The proposed framework jointly embeds source and destination port information into semantic representations, thereby reducing dependence on explicit port headers while enabling robust port-aware transmission. Furthermore, a differentiated semantic processing mechanism is developed for uplink and downlink scenarios, where port identification is introduced for uplink service recognition and destination-aware conditional gating is designed for downlink selective decoding. In addition, an adaptive-rate controller is incorporated to dynamically adjust the number of transmitted semantic channels according to channel conditions and feature importance, thereby improving both robustness and transmission efficiency. Experimental results on the AFHQ and ImageNet-10 datasets, together with real-world experimental measurements, demonstrate that SPAT consistently outperforms TCP, UDP, and SITP in reconstruction quality across different SNRs while maintaining low-latency transmission.

2605.00895 2026-05-05 eess.SP cs.AI cs.LG

Transfer Learning for Tonal Noise Prediction in VRF Units Using Thermodynamic and Vibration Signals

ZhiWei Su, Ding Wang, Yuan Guo, Yang Qiao, HongJun Cao

Abstract

The second-order harmonic (2f) component generated by the twin-rotary compressor is a dominant low-frequency noise source of variable refrigerant flow (VRF) outdoor units, yet its amplitude fluctuates strongly with environmental thermal load and valve opening, making it difficult to assess accurately using conventional mechanism-based models. This paper proposes an unsupervised transfer learning method based on Domain-invariant Partial Least Squares (Di-PLS) to accurately predict 2f noise levels under new conditions using different signals. Prediction models utilizing thermodynamic signals and acceleration signals are constructed separately, and the generalization performance of the proposed Di-PLS is systematically compared with that of traditional Partial Least Squares (PLS). Results demonstrate that Di-PLS significantly outperforms PLS by extracting cross-condition common features and minimizing the distribution discrepancy between the source and target domains. Specifically, the acceleration-based Di-PLS model achieves the best performance, maintaining prediction errors within 3 dB for all test cases. This superiority over thermodynamic-based models highlights a physical insight: while thermodynamic states drive dynamic changes, structural vibration possesses a stronger and more direct causal link to acoustic radiation.

2605.00881 2026-05-05 eess.IV cs.CV physics.med-ph

A Coupled Fourth Order Telegraph Diffusion Framework Using Grayscale Indicators for Image Despeckling

Manish Kumar, Rajendra K. Ray

Abstract

Speckle noise severely limits the quality of images acquired from coherent imaging systems such as Synthetic Aperture Radar (SAR) and medical ultrasound. Traditional second-order PDE-based despeckling approaches, although popular, often introduce staircase artifacts and blur fine details. To overcome these limitations, we present a nonlinear, fourth-order coupled hyperbolic-parabolic PDE model that effectively reduces noise while preserving the structure. The framework consists of two evolution equations: one governing fourth-order diffusion for effective speckle reduction and smooth intensity transitions, and another refining an edge indicator to protect textures and structural features. The diffusion coefficient is adaptively constructed using both the image intensity variable u and a grayscale-based indicator function, ensuring structure-aware denoising while avoiding blocky artifacts and preserving fine details. We also prove the existence of a weak solution to the proposed model by applying the Schauder fixed-point theorem. A finite-difference scheme with Gauss-Seidel iteration is employed for efficient implementation. We compare the proposed model with the existing coupled second-order PDE model (HPCPDE) and the fourth-order telegraph diffusion model (TDFM). The results show that our model consistently outperforms these approaches. Experiments on standard grayscale images, real SAR and ultrasound data, as well as speckle-corrupted color images, demonstrate that the proposed method achieves superior performance over conventional PDE-based techniques in terms of PSNR, MSSIM, and Speckle Index.

2605.00877 2026-05-05 cs.MM cs.AI cs.CL cs.CV cs.LG

OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models

Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen

Comments Work in progress

Abstract

The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inherently exhibit multi-modal, high-noise, and weakly labeled characteristics, lacking unified schemas and semantic alignment. Although Multimodal Large Language Models (MLLMs) have achieved remarkable success in general domains, their application to ocean science remains severely constrained by the absence of large-scale, well-aligned multimodal datasets tailored to marine environments. To bridge this gap, we introduce OceanPile, a large-scale multimodal corpus designed for ocean foundation models. It comprises three key components: OceanCorpus, a unified collection integrating sonar data, underwater imagery, marine science visuals, and scientific text from diverse authoritative sources; OceanInstruction, a high-quality instruction dataset synthesized via a novel pipeline guided by a hierarchical Ocean Concept Knowledge Graph; and OceanBenchmark, a manually curated evaluation benchmark for rigorous assessment. We establish a multi-stage quality control process to ensure scientific validity and alignment across modalities. Experimental validation demonstrates significant performance improvements for models trained on our data. All datasets are publicly released to advance the field of marine artificial intelligence and empower domain-specific MLLMs.

2605.00872 2026-05-05 eess.SP cs.AI cs.CV

Multi-View Hierarchical Representation Learning of Fetal Hemodynamics for Maternal Hypertension Detection at the Edge

Alireza Rafiei, Anahí Venzor Strader, Esteban Castro Aragón, Victoriana Rosibely Sut Serech, Enma Carolina Coyote Ixen, Reza Sameni, Peter Rohloff, Gari D. Clifford, Nasim Katebi

Abstract

Hypertensive disorders of pregnancy remain a leading cause of maternal and fetal morbidity worldwide, yet diagnosis relies on intermittent cuff-based blood pressure measurements that are prone to bias and fail to capture continuous physiological dynamics. Growing evidence suggests that fetal cardiovascular activity is associated with maternal-placental hemodynamics and may encode markers of maternal hypertension. To analyze this, we collected a large-scale dataset of fetal one-dimensional Doppler ultrasound recordings paired with maternal blood pressure from 3,255 pregnant women across 8,170 antenatal visits in rural Guatemala. We developed AutoHyPE, a hierarchical attention network that models short- and long-term signal structure, incorporating a novel prototype-based contrastive learning and multi-view strategy to enhance representation robustness under long-tailed class distribution and biological variability. AutoHyPE achieved an AUROC of 0.80 for maternal hypertension detection, outperforming baseline approaches while maintaining balanced performance across classes, with no performance degradation in an edge deployment scenario. Our findings demonstrated that fetal cardiac mechanical activity contains hemodynamic features indicative of maternal hypertension status. This supports a promising paradigm shift toward continuous, objective monitoring of maternal health using existing, low-cost ultrasound technology and introduces a complementary approach to traditional methods based on blood pressure measurements, advancing scalable prenatal care.

2605.00871 2026-05-05 eess.SP cs.AI cs.CV cs.LG

NAKUL-Med: Spectral-Graph State Space Models with Dynamics Kernels for Medical Signals

Badri N. Patro, Vijay S. Agneeswaran

Comments Accepted CVPR Finding Track

Abstract

State space models (SSMs) achieve linear-time complexity but struggle with multi-channel physiological signals due to three limitations: fixed kernels cannot capture multi-scale temporal dynamics (motor preparation over hundreds of milliseconds vs. execution transients in tens of milliseconds), Markovian state updates restrict global context for periodic oscillations, and channel-independent processing ignores spatial electrode topology. We introduce NAKUL, extending SSMs for medical signal analysis through three contributions: (1) Dynamic Kernel Generation: parallel SSM branches with varying kernel sizes (3, 5, 7, 11 timesteps) are weighted by a meta-network that analyzes input statistics, enabling adaptive temporal scale selection; (2) Spectral Context Modeling: FFT-based operations with learnable Gaussian frequency band filters capture global periodic patterns in $O(N \log N)$ complexity; (3) Graph-Guided Spatial Attention: fixed electrode topology provides spatial biases to multi-head attention for principled cross-channel interaction. On BCI Competition IV-2a motor imagery (our primary benchmark), NAKUL achieves 91.7$\pm$0.6\% accuracy, matching EEG-Conformer (92.1$\pm$0.7\%) while using 28\% fewer parameters (2.5M vs 3.5M) and 2.0$\times$ faster inference (4.3ms vs 8.7ms). The model generalizes to EEG emotion recognition (83.6\%), multimodal EEG-fMRI (91.4\%), and medical imaging (92.8\% on ultrasound), demonstrating architectural versatility. Ablations show dynamic kernels contribute +2.6\% and exhibit interpretable scale selection patterns correlated with known neural dynamics.

2605.00870 2026-05-05 eess.SP cs.AI cs.LG

An Algorithm for On-Sensor Agnostic Detection of Changes in Human Activity for Ultra-Low-Power Applications

Sara Rimoldi, Arianna De Vecchi, Hazem Hesham Yousef Shalby, Federica Villa

Comments Accepted to 2026 International Conference on Automatic Face and Gesture Recognition (FG)

Abstract

Wearable devices running Human Activity Recognition (HAR) on Inertial Measurement Units (IMUs) waste energy by performing continuous classification for each window, even during long periods of unchanged activity. We address this with a lightweight change-detection gate: a non-parametric algorithm based on dynamic template matching that runs continuously at only approximately 16 kFLOPs per step, requires no offline training, and does not need prior definition of target activity classes. The gate invokes the full HAR network only when it detects an activity change, reducing the computational load by over 67% in realistic monitoring settings. The algorithm is evaluated on smart glasses, smartwatch, and smartphone data, requiring only a brief device-specific calibration phase. The gate achieves 98% sensitivity on UCA-EHAR, so that very few genuine activity transitions are missed, while 75% specificity keeps unnecessary HAR invocations low. Results on WISDM are 97% sensitivity and 76% specificity, demonstrating robustness and flexibility across settings.
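
A change-detection gate of this general kind might be sketched as follows. This is only an illustration of the idea of gating a classifier with a running template. The Euclidean distance, the exponential template update, the `threshold` and `alpha` parameters, and the class name are all assumptions and do not reproduce the authors' algorithm.

```python
import math


class ChangeGate:
    """Toy change-detection gate built on a running template (illustrative only)."""

    def __init__(self, threshold, alpha=0.1):
        self.threshold = threshold  # hypothetical value set during calibration
        self.alpha = alpha          # hypothetical template update rate
        self.template = None

    def step(self, features):
        """Return True when the full HAR model should be invoked for this window."""
        if self.template is None:
            self.template = list(features)
            return True  # first window: nothing to compare against yet
        dist = math.sqrt(sum((f - t) ** 2 for f, t in zip(features, self.template)))
        if dist > self.threshold:
            self.template = list(features)  # restart the template after a change
            return True
        # Same activity: move the template slowly toward the new window.
        self.template = [t + self.alpha * (f - t)
                         for f, t in zip(features, self.template)]
        return False
```

In a stream of windows, only the calls that return `True` would trigger the expensive classifier, which is where the energy saving comes from.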

2605.00869 2026-05-05 eess.SP cs.CV cs.LG

Robust Cross-Domain WiFi Fall Detection via Physics-Driven Attention-Enhanced Transformers

Yingzhe Wang, Cunhua Pan, Ruijing Liu, Shaokai Li, Hong Ren, Kezhi Wang, Jiangzhou Wang

Abstract

Device-free fall detection utilizing WiFi Channel State Information (CSI) has emerged as a promising, privacy-preserving solution for elderly health monitoring in the Internet of Things (IoT) era. However, existing deep learning approaches suffer from severe performance degradation when deployed in unseen environments due to static background overfitting and Non-Line-of-Sight (NLoS) signal attenuation. To address these critical bottlenecks, we propose a robust, domain-generalizable framework featuring a novel Attention-Enhanced CNN-Transformer hybrid architecture. First, we design a physics-driven Dynamic Variance Gate (DVG) to dynamically calculate local temporal variance, acting as a soft-attention mask that eliminates static environmental DC components while amplifying dynamic human motion. Second, we introduce a Physics-Aware Data Augmentation strategy to force the network to learn invariant morphological signatures rather than environment-specific noise. Furthermore, a Convolutional Block Attention Module (CBAM) is integrated to refine spatiotemporal features prior to Transformer-based sequence modeling. Extensive cross-domain evaluations across four distinct indoor environments demonstrate that our method achieves 97.6% accuracy in NLoS scenarios and 98.8% in completely unseen environments without target-domain fine-tuning. Finally, we deploy the proposed framework on an edge computing system equipped with commercial WiFi NICs. Real-world live inference field tests confirm the system's robustness against unseen environmental layouts and its capability for continuous, low-latency whole-home safety monitoring.
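
The idea of using local temporal variance as a soft mask can be sketched on a one-dimensional signal. This is a minimal illustration under stated assumptions: the sliding-window size, the normalization by the peak variance, and the function name are hypothetical choices and not the paper's DVG design.

```python
def variance_gate(signal, window=5):
    """Weight each sample by its normalized local variance (soft mask in [0, 1])."""
    n = len(signal)
    half = window // 2
    variances = []
    for i in range(n):
        # Local segment centered on sample i, clipped at the signal boundaries.
        seg = signal[max(0, i - half): min(n, i + half + 1)]
        mean = sum(seg) / len(seg)
        variances.append(sum((x - mean) ** 2 for x in seg) / len(seg))
    peak = max(variances)
    if peak == 0:
        # A completely static input has no motion to keep.
        return [0.0] * n, [0.0] * n
    mask = [v / peak for v in variances]
    gated = [m * x for m, x in zip(mask, signal)]
    return mask, gated
```

Constant stretches have zero local variance and are masked out, while stretches that fluctuate receive weights close to one.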

2605.00865 2026-05-05 eess.SP cs.CL cs.CV cs.LG cs.SD q-bio.NC

How Well Can We Decode Vowels from Auditory EEG -- A Rigorous Cross-Subject Benchmark with Honest Assessment

Xiaoyang Li

Comments 31 pages, 11 figures; includes supplementary material (14 pages, additional figures and analyses)

Abstract

EEG-based phoneme decoding is promising for brain-computer interfaces, but many prior studies rely on within-subject evaluation, small cohorts, or weak leakage control. We present a reproducible cross-subject benchmark for five-class vowel decoding (a, e, i, o, u) from auditory EEG using OpenNeuro ds006104 (16 subjects, 61 channels, 256 Hz). Under strict leave-one-subject-out evaluation with training-only normalization and explicit anti-leakage checks, we compare 14 pipelines from classical machine learning, deep learning, and Riemannian methods. The best full-feature model (XGBoost) reaches 24.5 percent accuracy (chance 20 percent), while differential entropy features with LightGBM reach 25.5 percent in feature-specific analysis. After multiple-comparison correction, strong pairwise model advantages are limited. Classical methods are competitive with deep models in this low-signal regime. Additional analyses (ablation, pairwise vowels, within-subject CV, ERP, temporal generalization, and electrode importance) indicate that vowel information is real but weak and mainly carried by early transient auditory responses. We release code and evaluation scripts for full reproducibility.
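
Leave-one-subject-out evaluation with training-only normalization can be sketched in a few lines. The helper names and the single-feature layout are hypothetical; the point is only that the mean and standard deviation come from the training subjects and are then applied unchanged to the held-out subject.

```python
import statistics


def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx) for leave-one-subject-out evaluation."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def zscore_from_train(values, train_idx, test_idx):
    """Normalize one feature using statistics of the training subjects only."""
    train_vals = [values[i] for i in train_idx]
    mu = statistics.mean(train_vals)
    sigma = statistics.pstdev(train_vals) or 1.0  # guard against zero spread
    train_z = [(values[i] - mu) / sigma for i in train_idx]
    test_z = [(values[i] - mu) / sigma for i in test_idx]
    return train_z, test_z
```

Computing the statistics over all subjects instead would let information from the held-out subject leak into training, which is the kind of leakage the benchmark guards against.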

2605.00861 2026-05-05 eess.AS cs.AI eess.SP

Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment

Huanchen Cai, Sten Ternström

Abstract

This study investigates voice mapping as an evaluation framework for text-to-speech (TTS) synthesis quality. We analyze six influential TTS models, both historical and recent: Merlin, Tacotron 2, Transformer TTS, FastSpeech 2, Glow-TTS, and VITS. The metrics are crest factor, spectrum balance, and cepstral peak prominence (CPPs). The results demonstrate that voice range serves as a primary indicator of model capability, with VITS showing the largest range among the tested models. Glow-TTS exhibited superior performance in soft phonation, indicated by higher spectrum balance, despite its limited voice range. CPPs values between 7 and 8 dB indicate natural voice quality, whereas speech with CPPs exceeding 10 dB tends to sound robotic. These findings underscore the need for voice mapping to evaluate vocal effort and to capture how TTS systems handle voice dynamics and expressiveness.
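
Of the three metrics, crest factor is the simplest to illustrate. The sketch below uses the standard signal-processing definition, the ratio of peak amplitude to RMS amplitude expressed in dB. The abstract does not state the analysis window or any per-cycle handling, so computing it over a whole sample list is an assumption.

```python
import math


def crest_factor_db(samples):
    """Crest factor in dB: peak absolute amplitude divided by RMS amplitude."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)
```

A pure sine wave has a crest factor of about 3 dB and a square wave has 0 dB; waveforms with sharper peaks relative to their average level give larger values.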

2605.00860 2026-05-05 physics.ao-ph cs.LG

An Adaptive Spatiotemporal Clustering Framework for 3D Ocean Subsurface Temperature Reconstruction

Ming Shan Loo, Wengen Li, Xudong Jiang, Hailiang Cheng, Zhifei Zhang, Jihong Guan, Yichao Zhang

Abstract

The reconstruction of ocean subsurface temperature (OST) using satellite remote sensing data holds significant scientific value for advancing the understanding of ocean dynamics and climate variability. However, the scarcity of subsurface observations, combined with the high degree of nonlinearity and spatiotemporal heterogeneity in subsurface processes, poses substantial challenges to the accuracy and generalization capability of traditional reconstruction methods. To address these limitations, this study proposes an adaptive framework that captures both vertical structural dependencies and temporal variation patterns of OST via spatiotemporal clustering. By combining this framework with various deep learning models, e.g., dual-path convolutional neural networks (DP-CNN), Attention U-Net, and Vision Transformer (ViT), the OST field can be accurately reconstructed at a global scale using only surface observations, i.e., sea surface temperature (SST), sea surface salinity (SSS), sea surface height (SSH), and sea surface wind (SSW). Experimental results demonstrate that multiple deep learning methods using the proposed framework largely outperform their original counterparts, yielding improvements in RMSE ranging from 12.4% to 27.2%. This study provides a reliable solution for subsurface temperature reconstruction, offering important implications for meteorological modeling and climate change assessment.

2605.00858 2026-05-05 eess.SP cs.CE cs.LG

A Hybrid Windkessel-Neural Approach for Improved Noninvasive Blood Pressure Monitoring

Vaibhav Gollapalli, Aniruth Ananthanarayanan

Abstract

Owing to recent advances in wearable health-care devices, cuffless blood pressure (BP) estimation is becoming increasingly important. Cuff-based technologies are ill-suited to continuous BP measurement because of their inconvenient usage, invasive character, need for calibration, large size, and inability to support long-term monitoring. Cuffless BP prediction typically relies on machine learning (ML) models that follow a purely data-driven approach. However, although they show high numerical accuracy, such ML models provide no interpretability, which results in poor physiological validity and clinical applicability. We propose a combination of Windkessel and ML models that incorporates the physical aspects of the former into the latter. This is done by reformulating the Windkessel model into a form that ML models can use, which yields a system of ODEs that can be embedded in the neural network. The inclusion of physical constraints thus improves the data-driven approach by making the models consistent with physics, understandable, and robust. As an illustration, we apply the described technique to the publicly available MIMIC-II database, obtained from the UCI Machine Learning Repository.
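
For readers unfamiliar with Windkessel models, the classic two-element form relates arterial pressure P to inflow Q through a peripheral resistance R and an arterial compliance C via C dP/dt = Q(t) - P/R. The abstract does not say which Windkessel variant or which solver the authors use, so the two-element form, the forward-Euler integration, and all names below are assumptions made only for illustration.

```python
def simulate_windkessel(p0, flow, resistance, compliance, dt, steps):
    """Forward-Euler integration of C * dP/dt = Q(t) - P / R.

    p0: initial pressure; flow: function t -> Q(t); dt: time step.
    Returns the list of pressures at t = 0, dt, 2*dt, ..., steps*dt.
    """
    pressures = [p0]
    p = p0
    for k in range(steps):
        dp_dt = (flow(k * dt) - p / resistance) / compliance
        p += dt * dp_dt
        pressures.append(p)
    return pressures
```

With zero inflow the pressure decays exponentially with time constant R*C, which gives a simple sanity check on the integration.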

2605.00857 2026-05-05 eess.SP cs.AI cs.LG q-bio.NC

Foundation Model Guided Dual-Branch Co-Adaptation for Source-Free EEG Decoding

Peiliang Gong, Han Zhang, Zhen Jiang, Chenyu Liu, Ziyu Jia, Xinliang Zhou, Daoqiang Zhang, Xiaoli Li

Abstract

Source-free domain adaptation (SFDA) provides a practical solution to cross-subject EEG decoding by adapting source-pretrained models to unlabeled target domains without accessing source data. However, existing SFDA methods rely solely on the limited internal knowledge of source-pretrained models, leading to inferior cross-domain generalization and unreliable pseudo-labels. Although EEG Foundation Models (FMs) pretrained on large-scale data exhibit strong generalizability, their potential in SFDA remains largely unexplored. To this end, we propose FUSED, a Foundation-guided Source-free EEG Decoding framework that integrates a large-scale FM with a compact Specialist Model (SM) via dual-branch co-adaptation. Specifically, we introduce a Co-adaptation mechanism equipping both branches with linear and prototype views, enabling cross-branch pseudo-label generation. Additionally, we design a Consensus Filtering Mechanism that exploits the FM's inherent stability to identify high-quality samples, along with a Two-Stage Pseudo-Label Refinement scheme to suppress error accumulation through cross-branch arbitration. Finally, we calibrate the FM's decision boundaries via mutual information maximization with the SM, followed by knowledge distillation from FM to SM, forming a principled calibrate-then-distill pipeline. To our knowledge, FUSED is the first work to leverage EEG FMs within the SFDA framework for cross-subject EEG decoding. Extensive experiments across three EEG paradigms, including motor imagery, emotion recognition, and SSVEP, demonstrate consistent state-of-the-art performance, validating the effectiveness of foundation-guided synergy for robust and privacy-preserving EEG decoding.

2605.00855 2026-05-05 math.OC cs.LG stat.ML

An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions

Wei-Ting Tang, Akshay Kudva, Calvin Tsay, Joel A. Paulson

Abstract

We study the deterministic global optimization of trained Gaussian process posterior mean functions over hyperrectangular domains. Although the posterior mean function has a compact closed-form representation, its global optimization is challenging because it remains nonlinear and nonconvex. Existing exact deterministic approaches become increasingly difficult to scale as the number of training data points grows, leading to approximation-based methods that improve tractability by optimizing a modified (inexact) objective. In this work, we propose PALM-Mean, a piecewise-analytic lower-bounding framework embedded in reduced-space spatial branch-and-bound. At each node, kernel terms that are locally important are replaced by a sign-aware piecewise-linear relaxation in an appropriate scalar distance variable, while the remaining terms are bounded analytically in closed form. We show this hybrid approach yields a valid lower bound for the posterior mean, while limiting the size of the branch-and-bound subproblems. We establish validity of the node lower bounds and $\varepsilon$-global convergence of the resulting algorithm. Computational results on synthetic benchmarks and real-world application problems show that PALM-Mean improves scalability relative to representative general-purpose deterministic global solvers, particularly as the number of training data points increases.
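
The closed-form, sign-aware bounding of kernel terms can be illustrated in its simplest form. The sketch assumes a zero-mean GP with an RBF kernel, so the posterior mean is a weighted sum of kernel terms with weights `alpha`. Over a box, each term with a positive weight is bounded using the farthest point of the box and each term with a negative weight using the nearest point. This is only the basic interval bound, not the piecewise-linear relaxation of PALM-Mean, and all names are hypothetical.

```python
import math


def rbf(x, xi, length):
    """RBF kernel value between points x and xi."""
    sq = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq / (2.0 * length ** 2))


def posterior_mean(x, train_x, alpha, length):
    """Posterior mean of a zero-mean GP: sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf(x, xi, length) for a, xi in zip(alpha, train_x))


def box_lower_bound(lo, hi, train_x, alpha, length):
    """Valid lower bound of the posterior mean over the box [lo, hi]."""
    bound = 0.0
    for a, xi in zip(alpha, train_x):
        # Smallest and largest squared distance from xi to any point in the box.
        near = sum(max(l - c, 0.0, c - h) ** 2 for l, h, c in zip(lo, hi, xi))
        far = sum(max(abs(c - l), abs(c - h)) ** 2 for l, h, c in zip(lo, hi, xi))
        k_max = math.exp(-near / (2.0 * length ** 2))
        k_min = math.exp(-far / (2.0 * length ** 2))
        # Positive weights are minimized by the smallest kernel value,
        # negative weights by the largest.
        bound += a * (k_min if a >= 0 else k_max)
    return bound
```

In a spatial branch-and-bound search, a node whose lower bound exceeds the best objective value found so far can be pruned without further subdivision.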

2605.00850 2026-05-05 physics.ao-ph cs.AI cs.LG eess.IV

Earth System Foundation Model (ESFM): A unified framework for heterogeneous data integration and forecasting

Firat Ozdemir, Yun Cheng, Salman Mohebi, Fanny Lehmann, Simon Adamov, Zhenyi Zhang, Leonardo Trentini, Dana Grund, Oliver Fuhrer, Torsten Hoefler, Siddhartha Mishra, Sebastian Schemm, Benedikt Soja, Mathieu Salzmann

Comments ESFM is available on https://github.com/swiss-ai/ESFM. 48 pages, 29 figures, 18 tables

Abstract

Foundation models (FMs) for the Earth system learn statistical relationships between physical variables across massive datasets to enable versatile downstream applications through finetuning, which distinguishes them from task-specific weather models. Here, we introduce Earth System Foundation Model (ESFM), a fully open model building on the 3D Swin UNet backbone of the pioneering Aurora model. ESFM introduces extensions that increase functionality and foster adoption in climate sciences. First, the encoding scheme and training protocols have been extended to handle diverse datasets, including those containing missing values across all spatio-temporal dimensions such as satellite data, as well as station data, all under one backbone. Axial attention is introduced to capture inter-variable dependencies. As a result, ESFM skillfully predicts variables in regions or on pressure levels where no data is present at the initial time, while preserving inter-variable relationships, for example between temperature, pressure, and humidity. Individual variable tokenization enables different sets of variables to be shuffled during training and simplifies the process of building extensions for new downstream tasks. Adaptive layer norm-based ensembles allow for a simple yet effective way to transform deterministic ESFM to a probabilistic FM. We present findings using dense gridded data (ERA5, CMIP6), regionally masked dense data, sparse gridded MODIS satellite data, and station data. Results demonstrate competitive or superior performance relative to state-of-the-art benchmarks. Case studies of Super Typhoon Doksuri (2023) and 2024 sudden stratospheric warming events show accurate positional and magnitude estimations of extreme weather. ESFM retains the strengths of previous foundation models, such as long-term stability, but facilitates application to a variety of downstream tasks.

2605.00849 2026-05-05 eess.SP cs.LG cs.SY eess.SY

Deep Learning for Multi-Antenna Modulation Recognition of Radio Signals

Tao Chen, Shilian Zheng, Jiepeng Chen, Zhangbin Pei, Qi Xuan, Xiaoniu Yang

Details
English abstract

Multi-antenna receiving systems have become a prevalent technical solution in communication systems. Meanwhile, deep learning has achieved significant progress in automatic modulation recognition for single-antenna systems. However, the application of deep learning to multi-antenna modulation recognition (MAMR) is still limited. In this paper, we propose an MAMR method, MAMR-IQ, that exploits the diversity gain of a multi-antenna receiving system by concatenating the raw received in-phase and quadrature (IQ) signals of multiple antennas and feeding them into a convolutional neural network. Simulation results show that MAMR-IQ outperforms two existing deep learning-based MAMR methods, based on direct voting (DV) and weight average (WA), in both recognition accuracy and computational complexity. To address the problem of limited training data in few-shot scenarios, we further propose a data augmentation method that exchanges the IQ sequences received by any two antennas to generate augmented samples. Simulation results show that the proposed augmentation method further improves recognition accuracy.
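The two data operations described in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not specify the tensor layout, so the row ordering in `concat_iq`, the function names, and the toy sample are all hypothetical, and the convolutional network itself is omitted.

```python
from itertools import combinations


def concat_iq(sample):
    """Stack per-antenna (I, Q) sequences into one input with 2 * N rows.

    `sample` is a list with one (i_sequence, q_sequence) pair per antenna.
    The row order (I then Q, antenna by antenna) is an assumption.
    """
    rows = []
    for i_seq, q_seq in sample:
        rows.append(list(i_seq))
        rows.append(list(q_seq))
    return rows


def swap_augment(sample):
    """Return one augmented copy per antenna pair, with their IQ sequences exchanged."""
    augmented = []
    for a, b in combinations(range(len(sample)), 2):
        swapped = list(sample)
        swapped[a], swapped[b] = swapped[b], swapped[a]
        augmented.append(swapped)
    return augmented


# Toy sample: 3 antennas, 2 time steps each (made-up values).
sample = [
    ([1.0, 2.0], [0.1, 0.2]),
    ([3.0, 4.0], [0.3, 0.4]),
    ([5.0, 6.0], [0.5, 0.6]),
]
network_input = concat_iq(sample)      # 6 rows of length 2
extra_samples = swap_augment(sample)   # one copy per antenna pair
```

With N antennas, exchanging any two yields N(N-1)/2 additional samples per original sample, each carrying the same modulation label.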

2605.00845 2026-05-05 cs.DB cs.AI cs.CL

Graph Query Generation with Constraint-guided Large Language Agents

Mengying Wang, Nicolaas Jedema, Rahul Pandey, RaviKiran Krishnan, Jens Lehmann, Yinghui Wu

Comments 42nd IEEE International Conference on Data Engineering (ICDE)

Details
English abstract

Knowledge Graph Question Answering (KGQA) has advanced through structured query generation, yet most efforts target RDF/SPARQL, leaving Cypher and property graphs underexplored, despite increasing demand for unified KGQA in industry settings. We propose UniQGen, a novel constraint-based framework that employs LLM agents to dynamically extract and refine representative graph query clauses into executable, intent-aligned graph queries across query languages. The foundation of our method is a variant of Chase & Backchase, a family of algorithms for query optimization and reformulation. We extend Chase & Backchase with a dynamic reasoning process over query constraints that also interact with LLMs for query quality estimation. With a Cypher-supported Freebase graph deployed on Amazon Neptune, we extensively evaluate our approach on popular KGQA benchmarks (GraphQ, GrailQA, and WebQSP). We demonstrate that UniQGen outperforms state-of-the-art graph query generation techniques in both accuracy and efficiency, with F1 gains of 31.6% on GraphQ and 4.9% on GrailQA. Unlike prior methods, our framework does not require fine-tuning for schema matching, making it more extensible to schema-less graphs and semantics in query workloads, and is more suitable for enterprise-grade KGQA. We release Cypher outputs and a Neptune-ready Freebase snapshot to support reproducible, cross-language KGQA research.

2605.00844 2026-05-05 cs.CY cs.AI

The Oracle's Fingerprint: Correlated AI Forecasting Errors and the Limits of Bias Transmission

Theodor Spiro

Comments 23 pages, 3 figures, 5 tables

Details
English abstract

When large language models (LLMs) are consulted as forecasting tools, the independence of individual errors -- the foundation of collective intelligence -- may collapse. We test three conditions necessary for this "epistemic monoculture" to emerge. In Study 1, we show that GPT-4o, Claude, and Gemini exhibit highly correlated forecasting errors on 568 resolved binary prediction questions (mean pairwise error correlation r = 0.77, p < 0.001; r = 0.78 excluding likely-leaked questions), despite being developed independently by different organizations. In Study 2, we test whether this correlated bias has propagated into human crowd forecasts, using a within-question design that tracks community prediction shifts across the ChatGPT launch boundary (November 2022). We find that community forecasts move in the direction predicted by LLMs (r = 0.20, p = 0.007), but this shift is fully explained by rational updating toward ground truth. In Study 3, we examine whether the category-level pattern of human forecasting errors increasingly resembles the LLM bias fingerprint. We find the opposite: pre-ChatGPT human biases already strongly resembled the LLM pattern (r = 0.87), while post-ChatGPT the resemblance weakened (r = -0.28). Together, these findings reveal an epistemic monoculture that is built but not yet activated: three nominally independent AI systems share the same failure modes, amplifying precisely the biases humans already hold.
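The headline statistic of Study 1, a mean pairwise error correlation, can be illustrated with a short sketch. The abstract does not define the error measure, so taking error as forecast probability minus binary outcome is an assumption, and the model names and numbers below are made up for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pairwise_error_correlation(forecasts, outcomes):
    """Average the error correlation over every pair of forecasters.

    Error is assumed to be (forecast probability - outcome), with outcome 0 or 1.
    """
    errors = {
        name: [p - o for p, o in zip(probs, outcomes)]
        for name, probs in forecasts.items()
    }
    pairs = list(combinations(errors, 2))
    return sum(pearson(errors[a], errors[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy data: three forecasters, four resolved binary questions.
outcomes = [1, 0, 0, 1]
forecasts = {
    "model_a": [0.90, 0.20, 0.70, 0.40],
    "model_b": [0.80, 0.30, 0.60, 0.50],
    "model_c": [0.95, 0.10, 0.80, 0.30],
}
r_mean = mean_pairwise_error_correlation(forecasts, outcomes)
```

A value near 1 means the forecasters tend to be wrong in the same direction on the same questions, which is the situation where averaging them recovers little of the benefit of independent errors.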

2605.00843 2026-05-05 cs.CY cs.AI

Generative-AI and the transformation of workforce. A job postings-driven analysis

Diana Maria Popa, Simona-Vasilica Oprea, Adela Bâra

Details
English abstract

This paper investigates how generative artificial intelligence (AI) is reshaping job requirements, skill compositions and sectoral dynamics across global labor markets. It examines the evolving frequency and framing of AI-related competencies in job postings, exploring whether generative AI functions primarily as an augmentative or substitutive force in the workplace. A large-scale, multi-source corpus of over 150,000 English-language job postings (2018-2025) is compiled from twelve open-access datasets and one public API. The analytical framework integrates lexical skill extraction, semantic framing, topic modeling (BERTopic, LDA), KMeans clustering, and time-series forecasting (ARIMA). Skill mentions are categorized into five dimensions: AI_Data, Routine, Soft_Meta, Domain_Specific and Leadership, while cross-sectoral analyses and correlation matrices quantify interdependencies between competencies. Sentence-transformer embeddings and cosine similarity are used to compute a Framing Index, distinguishing augmentation- versus automation-oriented discourse. By investigating job postings, our research contributes a replicable, data-driven methodology for mapping the diffusion of AI-related skills across industries and time. Results reveal a sharp post-2021 increase in AI-related skill mentions (prompt engineering, fine-tuning and model validation), accompanied by a decline in routine tasks (data entry and manual coding). Forecasts suggest sustained growth in AI_Data and Soft_Meta skills through 2025, signaling a structural convergence toward hybrid human-AI expertise as a new foundation of employability.

2605.00838 2026-05-05 cs.NI cs.LG

Adaptive Alarm Threshold Prediction in 4G Mobile Networks: A Percentile-Guided Deep Learning Framework with Interpretable Outputs

Ayon Roy, Sadman Sharif, Shiva Prasad Sarkar

Comments 21 pages, 8 figures, preprint

Details
English abstract

In mobile telecommunications, alarms act as early warning signals. They are triggered when a cell, the basic unit of radio coverage, shuts down or behaves abnormally. This signals a degradation in service quality, which directly affects the customer experience. To fix the issue, operators rely on preset thresholds to decide when an engineer should be sent out. In practice, these thresholds are set manually and remain fixed regardless of the time of day, traffic levels, or overall network conditions. This often leads to serious faults slipping through during busy hours, while minor issues can cause unnecessary callouts when the network is quiet. This paper presents a machine learning framework that automatically predicts four alarm thresholds (audit window duration, inactive time limit, total fluctuation count, and per-hour fluctuation limit) from live network behavior. Since no ground-truth labels exist for thresholds, we introduce a percentile-guided label derivation strategy and evaluate four models on an anonymized dataset of 10,648 cells across three vendors and nine regions from a real 4G network: a Gradient Boosted Trees baseline, a CNN-BiLSTM with attention, the proposed PCTN, and an iTransformer. PCTN performs best overall on three of the four targets, outperforming a state-of-the-art iTransformer while using 83 percent fewer parameters. Its mixed output heads and dynamic alpha mechanism produce thresholds that are both accurate and interpretable, allowing operators to inspect and adjust the learned policy without retraining. All comparisons are statistically significant at p < 0.001. The framework is retrained daily on new data, which enables the thresholds to adjust continuously to changes in the network.
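One way to read "percentile-guided label derivation" is that each cell's training target is taken from a percentile of its own observed behavior. The sketch below illustrates that reading only; the abstract does not state which percentiles or which input signals are used, so the function names, the choice of the 50th percentile, and the toy history are all hypothetical.

```python
def percentile(values, q):
    """Return the q-th percentile (0..100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def derive_threshold_labels(history, q):
    """Map each cell id to a pseudo-label: the q-th percentile of its observations."""
    return {cell: percentile(observed, q) for cell, observed in history.items()}


# Hypothetical per-cell history, e.g. alarm fluctuation counts per audit window.
history = {
    "cell_a": [1, 2, 3, 4, 5],
    "cell_b": [10, 20, 30, 40],
}
labels = derive_threshold_labels(history, q=50)
```

Labels derived this way give a supervised model something to regress against even though no operator ever recorded a "correct" threshold for each cell.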

2605.00831 2026-05-05 cs.DC cs.AI cs.PF

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving

Shakya Jayakody, Youpeng Zhao, Chinmay Dhanraj Nehate, Jun Wang

Comments MLSys 2026

Details
English abstract

The rise of million-token, agent-based applications has placed unprecedented demands on large language model (LLM) inference services. The long-running nature of these tasks increases their susceptibility to hardware and software faults, leading to costly job failures, wasted resources, and degraded user experience. The stateful key-value (KV) cache, which grows with the sequence length, presents a central challenge as it is a critical and vulnerable component in distributed serving systems. In this work, we propose GhostServe, a novel checkpointing solution to facilitate fault-tolerant LLM serving. Specifically, GhostServe protects the streaming KV cache in the shadow by applying erasure coding to generate and store the parity shards in host memory. In the event of device failures, GhostServe enables fast reconstruction of the lost KV cache, allowing the inference process to resume seamlessly without costly full recomputation or state replication. Evaluations demonstrate that, in the presence of system failures, GhostServe reduces checkpointing latency by up to 2.7x and recovery latency by 2.1x for a single batch, and median response latency by 1.2x, compared to existing methods, paving the way for high-availability and cost-effective LLM serving at scale.
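The idea of rebuilding a lost cache shard from parity can be shown with the simplest erasure code, a single XOR parity shard. This is a stand-in chosen for brevity: the abstract does not say which code GhostServe uses, and the shard contents and function names here are hypothetical.

```python
def xor_parity(shards):
    """Compute one parity shard as the byte-wise XOR of equal-length shards."""
    size = len(shards[0])
    if any(len(shard) != size for shard in shards):
        raise ValueError("all shards must have the same length")
    parity = bytearray(size)
    for shard in shards:
        for index, byte in enumerate(shard):
            parity[index] ^= byte
    return bytes(parity)


def recover(shards, parity, lost_index):
    """Rebuild the shard at lost_index from the surviving shards and the parity."""
    survivors = [s for i, s in enumerate(shards) if i != lost_index]
    return xor_parity(survivors + [parity])


# Hypothetical KV-cache shards held on three devices.
shards = [b"kv-0", b"kv-1", b"kv-2"]
parity = xor_parity(shards)           # would be kept in host memory
rebuilt = recover(shards, parity, 1)  # device 1 fails; its shard is rebuilt
```

A single XOR parity tolerates the loss of exactly one shard. Codes with more parity shards, such as Reed-Solomon, tolerate more failures at the cost of extra computation and storage.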

2605.00827 2026-05-05 cs.DC cs.AI cs.SE

Separating Intelligence from Execution: A Workflow Engine for the Model Context Protocol

Abhinav Singh Parmar

Comments 16 pages, 5 figures

Details
English abstract

Large Language Model (LLM) agents increasingly interact with external systems through tool-calling protocols such as the Model Context Protocol (MCP). In prevailing architectures, the agent must reason about every tool invocation in every session, consuming tokens proportional to the number of actions performed--even when the task has been solved before. We present the MCP Workflow Engine, a novel MCP-native orchestration layer that decouples intelligence (deciding what to do) from execution (carrying it out). An agent reasons once to produce a declarative workflow blueprint--a JSON document specifying a directed sequence of MCP tool calls with parameterized templates, loops, parallel branches, and data piping. Subsequent executions are triggered by a single run_workflow tool call, consuming one invocation's worth of tokens regardless of the blueprint's internal complexity. We formalize the MCP Mediator architectural pattern--an MCP server that simultaneously acts as a client to downstream MCP servers--and implement it in TypeScript against the MCP SDK. We evaluate the engine on a production-scale Kubernetes CMDB synchronization task spanning 67 orchestrated steps across 2 MCP servers, 38 namespaces, 13 worker nodes, and 22 distinct resource types. The engine reduces per-execution token cost by over 99%, completes the full cluster graph--comprising 1,200+ nodes and 2,800+ relationships across 20 relationship types--in under 45 seconds, and achieves deterministic, idempotent execution with zero agent involvement at run time.
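The split between reasoning once and executing many times can be illustrated with a toy blueprint runner. This is a minimal sketch, not the engine described in the abstract (which is implemented in TypeScript): the blueprint schema, the `{{name}}` placeholder syntax, the `save_as` field, and the tool names are all assumptions, plain Python functions stand in for MCP tool calls, and loops and parallel branches are omitted.

```python
import json


def render(value, context):
    """Resolve a '{{name}}' placeholder against the context; pass other values through."""
    if isinstance(value, str) and value.startswith("{{") and value.endswith("}}"):
        return context[value[2:-2].strip()]
    return value


def run_workflow(blueprint, tools, params):
    """Execute the blueprint's steps in order, piping each result into the context."""
    context = dict(params)
    for step in blueprint["steps"]:
        args = {key: render(val, context) for key, val in step["args"].items()}
        context[step["save_as"]] = tools[step["tool"]](**args)
    return context


# Hypothetical blueprint, written once by the agent and stored as JSON.
blueprint = json.loads("""
{"steps": [
  {"tool": "list_namespaces", "args": {"cluster": "{{cluster}}"}, "save_as": "namespaces"},
  {"tool": "count_items", "args": {"items": "{{namespaces}}"}, "save_as": "total"}
]}
""")

# Stand-ins for downstream tools (hypothetical names and behavior).
tools = {
    "list_namespaces": lambda cluster: [f"{cluster}-default", f"{cluster}-kube-system"],
    "count_items": lambda items: len(items),
}

result = run_workflow(blueprint, tools, {"cluster": "demo"})
```

Once the blueprint exists, rerunning it needs only the parameters, so the cost to the agent no longer depends on how many steps the blueprint contains.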

2605.00826 2026-05-05 cs.IR cs.CV cs.LG cs.MM

Understanding the Performance Plateau in Text-to-Video Retrieval: A Comprehensive Empirical and Linguistic Analysis

Maria-Eirini Pegia, Dimitrios Stefanopoulos, Björn Þór Jónsson, Anastasia Moumtzidou, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris

Comments Survey, 50 pages, 15 figures, 13 tables, 154 citations

Details
English abstract

Text-to-video retrieval enables users to find relevant video content using natural language queries, a task that has grown increasingly important with the rapid expansion of online video. Over the past six years, research has produced numerous methods, such as dual encoders, attention-driven models, and multimodal fusion approaches; however, fundamental questions remain about model behavior, dataset influence, and query difficulty. In this work, we evaluate 14 state-of-the-art retrieval methods across 3 widely used datasets under a unified preprocessing and evaluation framework. We analyze caption characteristics, including length, clarity, semantic category, and Action vs. Scene balance, and link these to model performance. Our results show that short, clear, and simple captions, such as those describing single actions or color attributes, achieve higher recall, while complex events, multi-step activities, or fine-grained scene descriptions remain challenging for all existing models. Attention-driven architectures better handle temporally dependent or multi-step queries, whereas dual-encoder and multimodal fusion models perform well primarily on simpler or single-category captions. Cross-dataset generalization improves with larger, more diverse caption sets, but generative captions do not consistently enhance retrieval accuracy. Overall, our findings highlight key dataset factors, benchmark challenges, and the interplay between query content and model architecture, providing guidance for developing more effective text-to-video retrieval systems.

2604.27947 2026-05-05 cs.NE cs.AI cs.LG cs.LO

Attractor FCM

Alexis Kafantaris

Details
English abstract

In this paper, an attractor FCM is created, tested, and analyzed. This FCM is neither Hebbian-based, nor agentic, nor a hybrid; rather, it is a gradient-descent-based, physics-constrained, Jacobian version of an FCM. Moreover, the model has several quirks: it uses residual memory, backpropagation through time, and a fixed-point anchor that is recursively implemented to update its weights. The residuals update the recursive part without losing the system memory. The model's anchor enables it to converge to a fixed point, at which backpropagation through time unrolls the model and ensures that the error minimization uses an accurate gradient. Furthermore, a new learning algorithm is used. Newton's method finds the system's fixed-point attractor, and gradient descent then adaptively changes the landscape; an adaptive term is used to directly manipulate the weights through the attractor dynamics. As the adaptive term changes, the descent through the landscape is constantly adjusted according to sigmoid saturation, which prevents premature convergence to a local minimum. Lastly, the updates are filtered by a causal mask that informs the network about the physics and respects the initial expert-based opinions, so that the model reduces the error to the target efficiently.
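The step in which Newton's method locates a fixed-point attractor can be illustrated on the smallest possible case: a single concept updated by a sigmoid, x = sigmoid(w * x + b). This scalar setting, the weight and bias values, and the function names are assumptions made for illustration; the paper's model is multi-dimensional and adds residual memory, a causal mask, and the adaptive term, none of which appear here.

```python
from math import exp


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + exp(-z))


def newton_fixed_point(w, b, x0=0.5, tol=1e-12, max_iter=50):
    """Solve x = sigmoid(w * x + b) by Newton's method on g(x) = x - sigmoid(w * x + b)."""
    x = x0
    for _ in range(max_iter):
        s = sigmoid(w * x + b)
        residual = x - s
        if abs(residual) < tol:
            return x
        # g'(x) = 1 - w * s * (1 - s), using the sigmoid derivative s * (1 - s).
        slope = 1.0 - w * s * (1.0 - s)
        x -= residual / slope
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical self-connection weight and bias.
x_star = newton_fixed_point(w=1.5, b=-0.5)
```

Because the sigmoid's slope never exceeds 0.25, choosing w = 1.5 keeps g'(x) positive everywhere, so this toy map has exactly one fixed point and the iteration is well behaved.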

2604.23940 2026-05-05 cs.SE cs.AI

Constraint-Guided Multi-Agent Decompilation for Executable Binary Recovery

Yifan Zhang, Xiaohan Wang, Yueke Zhang, Yu Huang, Kevin Leach

Details
English abstract

Decompilation -- recovering source code from compiled binaries -- is essential for security analysis, malware reverse engineering, and legacy software maintenance. However, existing decompilers produce code that often fails to compile or execute correctly, limiting their practical utility. We present a multi-agent framework that transforms decompiled code into re-executable source through Multi-level Constraint-Guided Decompilation (MCGD). Our approach employs a hierarchical validation pipeline with three constraint levels: (1) syntactic correctness via parsing, (2) compilability via GCC, and (3) behavioral equivalence via LLM-generated test cases. When validation fails, specialized LLM agents iteratively refine the code using structured error feedback. We evaluate our framework on 1,641 real-world binaries from ExeBench across three decompilers (RetDec, Ghidra, and Angr). Our framework achieves 84-97% re-executability, improving baseline decompiler output by 28-89 percentage points. In comparison with state-of-the-art LLM-based decompilation methods using the same GPT-4o backbone, our approach (84.1%) outperforms LLM4Decompile (80.3%), SK2Decompile (73.9%), and SALT4Decompile (61.8%). Our ablation study reveals that execution-based validation is critical: compile-only approaches achieve 0% behavioral correctness despite 91-99% compilation rates. The system converges efficiently, with over 90% of binaries reaching correctness within 2 iterations at an average cost of $0.03-0.05 per binary. Our results demonstrate that constraint-guided agentic refinement can bridge the gap between raw decompiler output and practically useful source code.
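The validate-then-refine loop described above can be sketched as a small control structure. This is a toy illustration under stated assumptions: the three checks are simple string tests standing in for a parser, GCC, and generated test cases, `toy_refine` stands in for the LLM agents, and every name here is hypothetical.

```python
def first_failure(code, checks):
    """Run checks in order; return (level, error) for the first failure, else None."""
    for level, check in checks:
        error = check(code)
        if error is not None:
            return level, error
    return None


def refine_until_valid(code, checks, refine, max_iters=3):
    """Alternate validation and refinement until all levels pass or the budget runs out."""
    for iteration in range(max_iters + 1):
        failure = first_failure(code, checks)
        if failure is None:
            return code, iteration
        if iteration == max_iters:
            break
        level, error = failure
        code = refine(code, level, error)
    raise RuntimeError(f"still failing at level {failure[0]}: {failure[1]}")


# Toy stand-ins for the three constraint levels (not real parsing or compilation).
checks = [
    ("syntax", lambda c: None if c.count("{") == c.count("}") else "unbalanced braces"),
    ("compile", lambda c: None if "int main" in c else "missing entry point"),
    ("behavior", lambda c: None if "return 0;" in c else "wrong exit status"),
]


def toy_refine(code, level, error):
    """Stand-in for an LLM agent: apply a canned fix for the failing level."""
    if level == "syntax":
        return code + "}"
    if level == "compile":
        return code.replace("main", "int main", 1)
    return code.replace("}", "return 0; }", 1)


fixed, rounds = refine_until_valid("main() {", checks, toy_refine)
```

Ordering the checks from cheapest to most expensive means the refinement step always receives the earliest, most basic error, which mirrors the hierarchical structure the abstract describes.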