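arXivDaily: a daily academic digest synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.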

2604.08171 2026-04-10 cs.CV cs.AI

OceanMAE: A Foundation Model for Ocean Remote Sensing

Viola-Joanna Stamer, Panagiotis Agrafiotis, Behnood Rasti, Begüm Demir

Abstract

Accurate ocean mapping is essential for applications such as bathymetry estimation, seabed characterization, marine litter detection, and ecosystem monitoring. However, ocean remote sensing (RS) remains constrained by limited labeled data and by the reduced transferability of models pre-trained mainly on land-dominated Earth observation imagery. In this paper, we propose OceanMAE, an ocean-specific masked autoencoder that extends standard MAE pre-training by integrating multispectral Sentinel-2 observations with physically meaningful ocean descriptors during self-supervised learning. By incorporating these auxiliary ocean features, OceanMAE is designed to learn more informative and ocean-aware latent representations from large-scale unlabeled data. To transfer these representations to downstream applications, we further employ a modified UNet-based framework for marine segmentation and bathymetry estimation. Pre-trained on the Hydro dataset, OceanMAE is evaluated on MADOS and MARIDA for marine pollutant and debris segmentation, and on MagicBathyNet for bathymetry regression. The experiments show that OceanMAE yields the strongest gains on marine segmentation, while bathymetry benefits are competitive and task-dependent. In addition, an ablation against a standard MAE on MARIDA indicates that incorporating auxiliary ocean descriptors during pre-training improves downstream segmentation quality. These findings highlight the value of physically informed and domain-aligned self-supervised pre-training for ocean RS. Code and weights are publicly available at https://git.tu-berlin.de/joanna.stamer/SSLORS2.

2604.08169 2026-04-10 cs.AI

Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence

Niklas Herbster, Martin Zborowski, Alberto Tosato, Gauthier Gidel, Tommaso Tosato

2604.08167 2026-04-10 cs.CV

T-Gated Adapter: A Lightweight Temporal Adapter for Vision-Language Medical Segmentation

Pranjal Khadka

Comments Accepted at the PHAROS-AIF-MIH Workshop at CVPR 2026

2604.08159 2026-04-10 cs.CV cs.AI

Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection

Yushuo Zhang, Yu Cheng, Yongkang Hu, Jiuan Zhou, Jiawei Chen, Yuan Xie, Zhaoxia Yin

2604.08156 2026-04-10 cs.CL

Training Data Size Sensitivity in Unsupervised Rhyme Recognition

Petr Plecháč, Artjoms Šeļa, Silvie Cinková, Mirella De Sisto, Lara Nugues, Neža Kočnik, Antonina Martynenko, Ben Nagy, Luca Giovannini, Robert Kolár

2604.08148 2026-04-10 cs.CL

Clickbait detection: quick inference with maximum impact

Soveatin Kuntur, Panggih Kusuma Ningrum, Anna Wróblewska, Maria Ganzha, Marcin Paprzycki

Comments Accepted Student competition ICCS 2026

2604.08147 2026-04-10 cs.SD cs.CV

Semantic Noise Reduction via Teacher-Guided Dual-Path Audio-Visual Representation Learning

Linge Wang, Yingying Chen, Bingke Zhu, Lu Zhou, Jinqiao Wang

2604.08133 2026-04-10 cs.LG cs.AI cs.CL

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference

Baihui Liu, Kaiyuan Tian, Wei Wang, Zhaoning Zhang, Linbo Qiao, Dongsheng Li

Comments ACL 2026 main

2604.08131 2026-04-10 cs.CL

Graph Neural Networks for Misinformation Detection: Performance-Efficiency Trade-offs

Soveatin Kuntur, Maciej Krzywda, Anna Wróblewska, Marcin Paprzycki, Maria Ganzha, Szymon Łukasik, Amir H. Gandomi

Comments Accepted at Computational Modeling and Artificial Intelligence for Social Systems Track in ICCS 2026

2604.08126 2026-04-10 cs.CL

LLM-Based Data Generation and Clinical Skills Evaluation for Low-Resource French OSCEs

Tian Huang, Tom Bourgeade, Irina Illina

Comments 11 pages, 2 figures, to be published in LREC 2026 proceedings

2604.08124 2026-04-10 cs.AI

Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang

Comments 15 pages, ACL2026 Findings Accepted

2604.08121 2026-04-10 cs.CV cs.AI

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Luozheng Qin, Jia Gong, Qian Qiao, Tianjiao Li, Li Xu, Haoyu Pan, Chao Qu, Zhiyu Tan, Hao Li

Comments Page and Code: https://fr0zencrane.github.io/uni-vigu-page/

2604.08120 2026-04-10 cs.CV cs.AI cs.CL cs.LG

Small Vision-Language Models are Smart Compressors for Long Video Understanding

Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu

Comments Project page and demo are available at https://FeiElysia.github.io/tempo-page/

2604.08118 2026-04-10 cs.CL cs.LG

Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization

Ian W. Kennedy, Nafise Sadat Moosavi

Comments 9 pages (+ references and appendix). Under review at ACL Rolling Review

2604.08115 2026-04-10 cs.AI

Revise: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy

Gyuho Shim, Seongtae Hong, Heuiseok Lim

Comments Accepted to ACL 2025 Industry-Oral

2604.08111 2026-04-10 cs.LG cs.CV

Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?

Yunusa Haruna, Adamu Lawan, Ibrahim Haruna Abdulhamid, Hamza Mohammed Dauda, Jiaquan Zhang, Chaoning Zhang, Shamsuddeen Hassan Muhammad

2604.08106 2026-04-10 cs.CV

EPIR: An Efficient Patch Tokenization, Integration and Representation Framework for Micro-expression Recognition

Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Xuecheng Wu, Kun Hu

2604.08104 2026-04-10 cs.CL

Quantum Vision Theory Applied to Audio Classification for Deepfake Speech Detection

Khalid Zaman, Melike Sah, Anuwat Chaiwongyenc, Cem Direkoglu

Abstract

We propose Quantum Vision (QV) theory as a new perspective for deep learning-based audio classification, applied to deepfake speech detection. Inspired by particle-wave duality in quantum physics, QV theory is based on the idea that data can be represented not only in its observable, collapsed form, but also as information waves. In conventional deep learning, models are trained directly on these collapsed representations, such as images. In QV theory, inputs are first transformed into information waves using a QV block, and then fed into deep learning models for classification. QV-based models improve performance in image classification compared to their non-QV counterparts. What if QV theory is applied to speech spectrograms for audio classification tasks? This is the motivation and novelty of the proposed approach. In this work, Short-Time Fourier Transform (STFT), Mel-spectrograms, and Mel-Frequency Cepstral Coefficients (MFCC) of speech signals are converted into information waves using the proposed QV block and used to train QV-based Convolutional Neural Networks (QV-CNN) and QV-based Vision Transformers (QV-ViT). Extensive experiments are conducted on the ASVspoof dataset for deepfake speech classification. The results show that QV-CNN and QV-ViT consistently outperform standard CNN and ViT models, achieving higher classification accuracy and improved robustness in distinguishing genuine and spoofed speech. Moreover, the QV-CNN model using MFCC features achieves the best overall performance on the ASVspoof dataset, with an accuracy of 94.20% and an EER of 9.04%, while the QV-CNN with Mel-spectrograms attains the highest accuracy of 94.57%. These findings demonstrate that QV theory is an effective and promising approach for audio deepfake detection and that it opens new directions for quantum-inspired learning in audio perception tasks.

2604.08088 2026-04-10 cs.CV

Coordinate-Based Dual-Constrained Autoregressive Motion Generation

Kang Ding, Hongsong Wang, Jie Gui, Liang Wang

Comments Code is available at: https://github.com/fly-dk/CDAMD

2604.08087 2026-04-10 cs.SD cs.LG

DeepForestSound: a multi-species automatic detector for passive acoustic monitoring in African tropical forests, a case study in Kibale National Park

Gabriel Dubus, Théau d'Audiffret, Claire Auger, Raphaël Cornette, Sylvain Haupert, Innocent Kasekendi, Raymond Katumba, Hugo Magaldi, Lise Pernel, Harold Rugonge, Jérôme Sueur, John Justice Tibesigwa, Sabrina Krief

Comments 8 pages

2604.08084 2026-04-10 cs.CV

DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning

Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Ya Jing, Xuecheng Wu, Jiangbin Zheng

2604.08075 2026-04-10 cs.CL

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

2604.08074 2026-04-10 cs.CV

DinoRADE: Full Spectral Radar-Camera Fusion with Vision Foundation Model Features for Multi-class Object Detection in Adverse Weather

Christof Leitgeb, Thomas Puchleitner, Max Peter Ronecker, Daniel Watzenig

Comments Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026

2604.08072 2026-04-10 cs.CV physics.comp-ph

Tensor-Augmented Convolutional Neural Networks: Enhancing Expressivity with Generic Tensor Kernels

Chia-Wei Hsing, Wei-Lin Tu

Comments 8 pages, 2 figures, 2 tables

2604.08070 2026-04-10 cs.CV cs.AI

AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models

Imane Momayiz, Soufiane Ait Elaouad, Abdeljalil Elmajjodi, Haitame Bouanane

2604.08068 2026-04-10 cs.CV

Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning

Emanuele Balloni, Emanuele Frontoni, Chiara Matti, Marina Paolanti, Roberto Pierdicca, Emiliano Santarnecchi

Comments 17 pages, 2 figures

2604.08065 2026-04-10 cs.LG

Multimodal Latent Reasoning via Predictive Embeddings

Ashutosh Adhikari, Mirella Lapata

2604.08063 2026-04-10 cs.CV

EEG2Vision: A Multimodal EEG-Based Framework for 2D Visual Reconstruction in Cognitive Neuroscience

Emanuele Balloni, Emanuele Frontoni, Chiara Matti, Marina Paolanti, Roberto Pierdicca, Emiliano Santarnecchi

Comments 17 pages, 5 figures

2604.08056 2026-04-10 cs.LG

Automating aggregation strategy selection in federated learning

Dian S. Y. Pang, Endrias Y. Ergetu, Eric Topham, Ahmed E. Fetit

2604.08050 2026-04-10 cs.CV

ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning

Daichi Yashima, Shuhei Kurita, Yusuke Oda, Shuntaro Suzuki, Seitaro Otsuki, Komei Sugiura

Comments Accepted to ICPR 2026