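arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.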

2603.26766 2026-03-31 cs.CV

JND-Guided Neural Watermarking with Spatial Transformer Decoding for Screen-Capture Robustness

Jiayi Qin, Jingwei Li, Chuan Wu

Abstract

Screen-shooting robust watermarking aims to imperceptibly embed extractable information into host images such that the watermark survives the complex distortion pipeline of screen display and camera recapture. However, achieving high extraction accuracy while maintaining satisfactory visual quality remains an open challenge, primarily because the screen-shooting channel introduces severe and entangled degradations including Moiré patterns, color-gamut shifts, perspective warping, and sensor noise. In this paper, we present an end-to-end deep learning framework that jointly optimizes watermark embedding and extraction for screen-shooting robustness. Our framework incorporates three key innovations: (i) a comprehensive noise simulation layer that faithfully models realistic screen-shooting distortions -- notably including a physically-motivated Moiré pattern generator -- enabling the network to learn robust representations against the full spectrum of capture-channel noise through adversarial training; (ii) a Just Noticeable Distortion (JND) perceptual loss function that adaptively modulates watermark embedding strength by supervising the perceptual discrepancy between the JND coefficient map and the watermark residual, thereby concentrating watermark energy in perceptually insensitive regions to maximize visual quality; and (iii) two complementary automatic localization modules -- a semantic-segmentation-based foreground extractor for captured image rectification and a symmetric noise template mechanism for anti-cropping region recovery -- that enable fully automated watermark decoding under realistic deployment conditions. Extensive experiments demonstrate that our method achieves an average PSNR of 30.94 dB and SSIM of 0.94 on watermarked images while embedding 127-bit payloads.
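For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard peak signal-to-noise ratio definition, assuming 8-bit pixel values flattened into plain Python lists. The function name `psnr` and the toy pixel values are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
import math


def psnr(original, watermarked, max_value=255.0):
    """Standard PSNR in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(original) != len(watermarked) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between host pixels and watermarked pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is shifted by exactly one gray level, so MSE = 1.
host = [100, 100, 100, 100]
marked = [101, 99, 101, 99]
value = psnr(host, marked)
```

A larger watermark residual raises the MSE and lowers the PSNR, which is why embedding strength and visual quality trade off against each other.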

2603.26765 2026-03-31 cs.AI

Bitboard version of Tetris AI

Xingguo Chen, Pingshou Xiong, Zhenyu Luo, Mengfei Hu, Xinwen Li, Yongzhou Lü, Guang Yang, Chao Li, Shangdong Yang

2603.26764 2026-03-31 cs.CV

Low Dose CT for Stroke Diagnosis: A Dual Pipeline Deep Learning Framework for Portable Neuroimaging

Rhea Ghosal, Ronok Ghosal, Eileen Lou

Comments: 13 pages, 4 figures, 3 tables. Includes dose-level evaluation and robustness stress tests (motion and ring artifacts). Code and dataset based on RSNA Intracranial Hemorrhage Detection

2603.26761 2026-03-31 cs.CV cs.AI

Tiny-ViT: A Compact Vision Transformer for Efficient and Explainable Potato Leaf Disease Classification

Shakil Mia, Umme Habiba, Urmi Akter, SK Rezwana Quadir Raisa, Jeba Maliha, Md. Iqbal Hossain, Md. Shakhauat Hossan Sumon

Comments: Accepted and Presented Paper at the 2026 IEEE International Conference on Electrical, Computer and Telecommunication Engineering, Rajshahi, Bangladesh

2603.26760 2026-03-31 cs.CV

An Intelligent Framework for Real-Time Yoga Pose Detection and Posture Correction

Chandramouli Haldar

2603.26759 2026-03-31 cs.CV

Physics-Aware Diffusion for LiDAR Point Cloud Densification

Zeping Zhang, Robert Laganière

2603.26757 2026-03-31 cs.RO

Beyond Viewpoint Generalization: What Multi-View Demonstrations Offer and How to Synthesize Them for Robot Manipulation?

Boyang Cai, Qiwei Liang, Jiawei Li, Shihang Weng, Zhaoxin Zhang, Tao Lin, Xiangyu Chen, Wenjie Zhang, Jiaqi Mao, Weisheng Xu, Bin Yang, Jiaming Liang, Junhao Cai, Renjing Xu

2603.26756 2026-03-31 cs.CV

GradAttn: Replacing Fixed Residual Connections with Task-Modulated Attention Pathways

Soudeep Ghoshal, Himanshu Buckchash

Comments: 14 pages, 5 figures. Under review

2603.26754 2026-03-31 cs.CV cs.AI

Generating Synthetic Wildlife Health Data from Camera Trap Imagery: A Pipeline for Alopecia and Body Condition Training Data

David Brundage

2603.26753 2026-03-31 cs.RO

Reasoning Systems for Semantic Navigation in Mobile Robots

Jonathan Crespo, Ramón Barber, O. M. Mozos, Daniel Beßler, Michael Beetz

Comments: This is the authors' manuscript. The final published article is available at https://doi.org/10.1109/IROS.2018.8594271

2603.26752 2026-03-31 cs.RO cond-mat.mtrl-sci

Functionalization of Situated Robots via Vapour

Kadri-Ann Pankratov, Leonid Zinatullin, Adele Metsniit, Marie Vihmar, Indrek Must

Comments: Accepted in 9th IEEE-RAS International Conference on Soft Robotics (Robosoft 2026) as Extended Abstract (preliminary results)

2603.26751 2026-03-31 cs.CV

Survey on Remote Sensing Scene Classification: From Traditional Methods to Large Generative AI Models

Qionghao Huang, Can Hu

Comments: Accepted in Journal of King Saud University Computer and Information Sciences

Details

DOI: 10.1007/s44443-026-00694-7
Journal ref: Journal of King Saud University Computer and Information Sciences, 2026

Abstract

Remote sensing scene classification has experienced a paradigmatic transformation from traditional handcrafted feature methods to sophisticated artificial intelligence systems that now form the backbone of modern Earth observation applications. This comprehensive survey examines the complete methodological evolution, systematically tracing development from classical texture descriptors and machine learning classifiers through the deep learning revolution to current state-of-the-art foundation models and generative AI approaches. We chronicle the pivotal shift from manual feature engineering to automated hierarchical representation learning via convolutional neural networks, followed by advanced architectures including Vision Transformers, graph neural networks, and hybrid frameworks. The survey provides in-depth coverage of breakthrough developments in self-supervised foundation models and vision-language systems, highlighting exceptional performance in zero-shot and few-shot learning scenarios. Special emphasis is placed on generative AI innovations that tackle persistent challenges through synthetic data generation and advanced feature learning strategies. We analyze contemporary obstacles including annotation costs, multimodal data fusion complexities, interpretability demands, and ethical considerations, alongside current trends in edge computing deployment, federated learning frameworks, and sustainable AI practices. Based on comprehensive analysis of recent advances and gaps, we identify key future research priorities: advancing hyperspectral and multi-temporal analysis capabilities, developing robust cross-domain generalization methods, and establishing standardized evaluation protocols to accelerate scientific progress in remote sensing scene classification systems.

2603.26748 2026-03-31 cs.RO cs.AI

LARD 2.0: Enhanced Datasets and Benchmarking for Autonomous Landing Systems

Yassine Bougacha, Geoffrey Delhomme, Mélanie Ducoffe, Augustin Fuchs, Jean-Brice Ginestet, Jacques Girard, Sofiane Kraiem, Franck Mamalet, Vincent Mussot, Claire Pagetti, Thierry Sammour

2603.26746 2026-03-31 cs.CV cs.LG

TDEC: Deep Embedded Image Clustering with Transformer and Distribution Information

Ruilin Zhang, Haiyang Zheng, Hongpeng Wang

2603.26745 2026-03-31 cs.CV

Motion Semantics Guided Normalizing Flow for Privacy-Preserving Video Anomaly Detection

Yang Liu, Boan Chen, Yuanyuan Meng, Jing Liu, Zhengliang Guo, Wei Zhou, Peng Sun, Hong Chen

Comments: Accepted to IEEE ICME 2026

2603.26744 2026-03-31 cs.CV

CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering

Ruilin Zhang, Haiyang Zheng, Hongpeng Wang

2603.26743 2026-03-31 cs.CV cs.AI cs.LG

Steering Sparse Autoencoder Latents to Control Dynamic Head Pruning in Vision Transformers (Student Abstract)

Yousung Lee, Dongsoo Har

Comments: 3 pages, 5 figures. Accepted as AAAI 2026 Student Abstract. Includes additional appendix with extended analysis

2603.26742 2026-03-31 cs.CL cs.LG

Do Multilingual VLMs Reason Equally? A Cross-Lingual Visual Reasoning Audit for Indian Languages

Swastik R

Comments: 16 pages, 10 figures, 6 tables. Code and data: https://github.com/QuantumByte-01/multilingual-vlm-reasoning-audit Dataset: https://huggingface.co/datasets/Swastikr/multilingual-vlm-reasoning

2603.26741 2026-03-31 cs.CV cs.AI cs.RO

Language-Conditioned World Modeling for Visual Navigation

Yifei Dong, Fengyi Wu, Yilong Dai, Lingdong Kong, Guangyu Chen, Xu Zhu, Qiyu Hu, Tianyu Wang, Johnalbert Garnica, Feng Liu, Siyu Huang, Qi Dai, Zhi-Qi Cheng

Comments: 19 pages, 6 figures, Code: https://github.com/F1y1113/LCVN

2603.26740 2026-03-31 cs.RO

Motion as a Sensing Modality for Metric Scale in Monocular Visual-Inertial Odometry

Hadush Hailu, Bruk Gebregziabher

Comments: 10 pages

2603.26737 2026-03-31 cs.CV cs.AI

Beyond Static Visual Tokens: Structured Sequential Visual Chain-of-Thought Reasoning

Guangfu Guo, Xiaoqian Lu, Yue Feng, Mingming Sun

2603.26736 2026-03-31 cs.CV cs.AI

Ordinal Semantic Segmentation Applied to Medical and Odontological Images

Mariana Dória Prata Lima, Gilson Antonio Giraldi, Jaime S. Cardoso

Comments: 23 pages, 1 figure

2603.26735 2026-03-31 cs.CV cs.AI

Distilled Large Language Model-Driven Dynamic Sparse Expert Activation Mechanism

Qinghui Chen, Zekai Zhang, Zaigui Zhang, Kai Zhang, Dagang Li, Wenmin Wang, Jinglin Zhang, Cong Liu

2603.26731 2026-03-31 cs.CV cs.AI

Contextual inference from single objects in Vision-Language models

Martina G. Vilas, Timothy Schaumlöffel, Gemma Roig

2603.26730 2026-03-31 cs.RO

Why Cognitive Robotics Matters: Lessons from OntoAgent and LLM Deployment in HARMONIC for Safety-Critical Robot Teaming

Sanjay Oruganti, Sergei Nirenburg, Marjorie McShane, Jesse English, Michael Roberts, Christian Arndt, Ramviyas Parasuraman, Luis Sentis

2603.26727 2026-03-31 cs.CV cs.AI

The Nonverbal Gap: Toward Affective Computer Vision for Safer and More Equitable Online Dating

Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir

2603.26726 2026-03-31 cs.CV cs.AI

A Multimodal Deep Learning Framework for Edema Classification Using HCT and Clinical Data

Aram Ansary Ogholbake, Hannah Choi, Spencer Brandenburg, Alyssa Antuna, Zahraa Al-Sharshahi, Makayla Cox, Haseeb Ahmed, Jacqueline Frank, Nathan Millson, Luke Bauerle, Jessica Lee, David Dornbos, Qiang Cheng

Abstract

We propose AttentionMixer, a unified deep learning framework for multimodal detection of brain edema that combines structural head CT (HCT) with routine clinical metadata. While HCT provides rich spatial information, clinical variables such as age, laboratory values, and scan timing capture complementary context that might be ignored or naively concatenated. AttentionMixer is designed to fuse these heterogeneous sources in a principled and efficient manner. HCT volumes are first encoded using a self-supervised Vision Transformer Autoencoder (ViT-AE++), without requiring large labeled datasets. Clinical metadata are mapped into the same feature space and used as keys and values in a cross-attention module, where the HCT-derived feature vector serves as the query. This cross-attention fusion allows the network to dynamically modulate imaging features based on patient-specific context and provides an interpretable mechanism for multimodal integration. A lightweight MLP-Mixer then refines the fused representation before final classification, enabling global dependency modeling with substantially reduced parameter overhead. Missing or incomplete metadata are handled via a learnable embedding, promoting robustness to real-world clinical data quality. We evaluate AttentionMixer on a curated brain HCT cohort with expert edema annotations using five-fold cross-validation. Compared with strong HCT-only, metadata-only, and prior multimodal baselines, AttentionMixer achieves superior performance (accuracy 87.32%, precision 92.10%, F1-score 85.37%, AUC 94.14%). Ablation studies confirm the benefit of both cross-attention and MLP-Mixer refinement, and permutation-based metadata importance analysis highlights clinically meaningful variables driving predictions. These results demonstrate that structured, interpretable multimodal fusion can substantially improve edema detection in clinical practice.
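The cross-attention fusion described above can be illustrated with a minimal single-head sketch in plain Python, assuming the imaging feature is one query vector and each clinical variable is already embedded as a key/value vector. All names (`cross_attention`, `embed_metadata`, `MISSING_EMBEDDING`) and the toy numbers are hypothetical; the actual AttentionMixer layers, dimensions, and learned projections are not given in the abstract.

```python
import math

# Hypothetical stand-in for the learnable "missing metadata" embedding;
# in a trained model this vector would be a parameter, not a constant.
MISSING_EMBEDDING = [0.0, 0.0]


def embed_metadata(tokens):
    """Replace missing entries (None) with the placeholder embedding."""
    return [MISSING_EMBEDDING if t is None else t for t in tokens]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention with a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return fused, weights


image_query = [1.0, 0.0]                      # toy HCT-derived feature
metadata = embed_metadata([[1.0, 0.0], None])  # second variable is missing
fused, weights = cross_attention(image_query, metadata, metadata)
```

The attention weights show how strongly each clinical variable contributes to the fused feature, which is the kind of per-variable inspection the abstract calls interpretable.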

2603.26724 2026-03-31 cs.CV cs.RO

An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Dimitrios Chatziparaschis, Elia Scudiero, Brent Sams, Konstantinos Karydis

Comments: 7 pages, 6 figures, conference

2603.26713 2026-03-31 cs.LG eess.SP stat.ML

Boundary-aware Prototype-driven Adversarial Alignment for Cross-Corpus EEG Emotion Recognition

Guangli Li, Canbiao Wu, Na Tian, Li Zhang, Zhen Liang

Abstract

Electroencephalography (EEG)-based emotion recognition suffers from severe performance degradation when models are transferred across heterogeneous datasets due to physiological variability, experimental paradigm differences, and device inconsistencies. Existing domain adversarial methods primarily enforce global marginal alignment and often overlook class-conditional mismatch and decision boundary distortion, limiting cross-corpus generalization. In this work, we propose a unified Prototype-driven Adversarial Alignment (PAA) framework for cross-corpus EEG emotion recognition. The framework is progressively instantiated in three configurations: PAA-L, which performs prototype-guided local class-conditional alignment; PAA-C, which further incorporates contrastive semantic regularization to enhance intra-class compactness and inter-class separability; and PAA-M, the full boundary-aware configuration that integrates dual relation-aware classifiers within a three-stage adversarial optimization scheme to explicitly refine controversial samples near decision boundaries. By combining prototype-guided subdomain alignment, contrastive discriminative enhancement, and boundary-aware aggregation within a coherent adversarial architecture, the proposed framework reformulates emotion recognition as a relation-driven representation learning problem, reducing sensitivity to label noise and improving cross-domain stability. Extensive experiments on SEED, SEED-IV, and SEED-V demonstrate state-of-the-art performance under four cross-corpus evaluation protocols, with average improvements of 6.72%, 5.59%, 6.69%, and 4.83%, respectively. Furthermore, the proposed framework generalizes effectively to clinical depression identification scenarios, validating its robustness in real-world heterogeneous settings. The source code is available at https://github.com/WuCB-BCI/PAA.
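As a rough illustration of what "prototype-guided" can mean in this setting, the sketch below computes one mean feature vector (prototype) per emotion class from labeled source-domain features and assigns a target-domain sample to its nearest prototype. This is a generic nearest-prototype scheme under assumed names (`class_prototypes`, `nearest_prototype`) and toy data; it is not the PAA implementation, which the abstract describes only at a high level.

```python
def class_prototypes(features, labels):
    """Mean feature vector per class (hypothetical helper)."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feature)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feature)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def nearest_prototype(feature, prototypes):
    """Label of the prototype closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(feature, prototypes[label]))


# Toy source-domain features with emotion labels.
source_features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
source_labels = ["negative", "negative", "positive"]
prototypes = class_prototypes(source_features, source_labels)

# Pseudo-label an unlabeled target-domain sample by its nearest prototype.
pseudo_label = nearest_prototype([9.0, 9.0], prototypes)
```

Aligning target samples to class prototypes rather than to the whole source distribution is one common way to address the class-conditional mismatch the abstract mentions.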

2603.26711 2026-03-31 cs.RO cs.SY eess.SY

Surface-Constrained Offline Warping with Contact-Aware Online Pose Projection for Safe Robotic Trajectory Execution

Farong Wang, Sai Swaminathan, Fei Liu

Comments: 7 pages, 7 figures. Submitted to IROS 2026