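arXivDaily: a daily digest of new arXiv submissions, synced with the full arXiv feed and offering AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.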

2604.23688 2026-04-28 cs.CV

Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations?

Ruiqing Sun, Xingshan Yao, Zhijing Wu, Tian Lan, Chenhao Cui, Huiyang Zhao, Jialing Shi, Chen Yang, Xianling Mao


Abstract

Proactive defense methods protect portrait images from unauthorized editing or talking face generation (TFG) by introducing pixel-level protective perturbations, and they have attracted increasing attention for privacy protection. In real-world scenarios, images inevitably undergo various transformations during cross-device display and dissemination, such as scale transformations and color compression, that directly alter pixel values. However, it remains unclear whether such pixel-level modifications affect the effectiveness of existing proactive defense methods that rely on pixel-level perturbations. To answer this question, we conduct a systematic evaluation of representative proactive defenses under image transformations. The evaluated methods span different generation architectures, such as diffusion and GAN-based models, as well as defense scopes covering both portrait and natural images; they are assessed with both qualitative and quantitative metrics for subjective and objective comparison. Experimental results indicate that defense methods based on pixel-level perturbations struggle to withstand common image transformations, posing a risk of defense failure in real-world applications. To further highlight this risk, we propose a simple yet effective purification framework that exploits the vulnerabilities induced by real-world image transformations. Experimental results demonstrate that the proposed method efficiently removes protective perturbations at low computational cost, exposing a previously overlooked risk to the research community.
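Why can an ordinary scale transformation defeat a pixel-level perturbation? A minimal toy sketch, assuming (unrealistically) that the protective perturbation is a ±eps checkerboard, the highest-frequency pattern an image can carry, and that the transformation is a 2x average-pool downscale followed by nearest-neighbour upscaling. The helpers `add_checkerboard`, `down_up` and `max_abs_diff` are hypothetical illustrations, not the authors' purification framework; real perturbations are optimized and only partly high-frequency.

```python
# Toy model (assumption): the protective perturbation alternates +eps / -eps
# in a checkerboard, i.e. it lives entirely at the highest spatial frequency.

def add_checkerboard(img, eps):
    """Add a +eps/-eps checkerboard perturbation to a grayscale image (list of rows)."""
    return [[v + (eps if (i + j) % 2 == 0 else -eps) for j, v in enumerate(row)]
            for i, row in enumerate(img)]

def down_up(img):
    """2x average-pool downscale, then nearest-neighbour upscale back to the original size."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            m = (img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
            for di in (0, 1):
                for dj in (0, 1):
                    out[i + di][j + dj] = m
    return out

def max_abs_diff(a, b):
    """Largest per-pixel absolute difference between two images."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# A clean 8x8 image that is constant on every 2x2 block, so down_up leaves it unchanged.
clean = [[((i // 2) * 16 + (j // 2) * 8) / 255.0 for j in range(8)] for i in range(8)]
eps = 8.0 / 255.0
protected = add_checkerboard(clean, eps)
purified = down_up(protected)

print(max_abs_diff(protected, clean))  # perturbation amplitude before the transform (about eps)
print(max_abs_diff(purified, clean))   # after the transform: only floating-point rounding remains
```

Each 2x2 block holds two +eps and two -eps pixels, so averaging cancels the perturbation exactly; smoother perturbations survive partially, which is why the degree of failure in practice has to be measured empirically, as the abstract describes.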

2604.23685 2026-04-28 cs.CV

Reading in the Dark: Low-light Scene Text Recognition

Xuanshuo Fu, Lei Kang, Ernest Valveny, Dimosthenis Karatzas, Javier Vazquez-Corral

2604.23683 2026-04-28 cs.CV

Learning to Decipher from Pixels -- A Case Study of Copiale

Lei Kang, Giuseppe De Gregorio, Raphaela Heil, Alicia Fornés, Beáta Megyesi

Comments Accepted to HistoCrypt 2026

2604.23681 2026-04-28 cs.LG cs.CL stat.ML

Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers

Giansalvo Cirrincione

Comments 36 pages, 8 figures, 1 table. Submitted to Artificial Intelligence (Elsevier)


Abstract

A widely cited result by Dong et al. (2021) showed that Transformers built from self-attention alone, without skip connections or feed-forward layers, suffer from rapid rank collapse: all token representations converge to a single direction. The proposed remedy was the MLP. We show that this picture, while correct in the regime studied by Dong, is incomplete in ways that matter for architectural understanding. Three results are established. First, layer normalisation is precisely affine-rank-neutral: it preserves the affine rank of the token representation set exactly. The widespread claim that LN "plays no role" is imprecise; the correct statement is sharper. Second, residual connections generically obstruct rank collapse in real Transformers such as BERT-base, in a measure-theoretic sense, without contribution from the MLP. The MLP's irreplaceable function is different: generating feature directions outside the linear span of the original token embeddings, which no stack of attention layers can produce. Third, a phenomenon distinct from rank collapse is identified: head-channel non-identifiability. After multi-head attention sums per-head outputs through the output projection, individual contributions cannot be canonically attributed to a specific head; n(H-1)d_k degrees of freedom per layer remain ambiguous when recovering a single head from the mixed signal. The MLP cannot remedy this because it acts on the post-summation signal. A constructive partial remedy is proposed: a position-gated output projection (PG-OP) at parameter overhead below 1.6% of the standard output projection. The four collapse phenomena identified in the literature -- rank collapse in depth, in width, head-channel non-identifiability, and entropy collapse -- are unified under a symmetry-breaking framework, each corresponding to a distinct symmetry of the Transformer's forward pass.
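The Dong et al. (2021) baseline that this abstract starts from can be reproduced in miniature. A sketch under strong simplifying assumptions: a single head with W_Q = W_K = W_V = I, and no skip connection, MLP or LayerNorm. Each layer then multiplies X by a row-stochastic softmax matrix with strictly positive entries, so every output token is a convex combination of the input tokens and the spread across tokens can only shrink toward X = 1 x^T. The names `attention_layer` and `token_spread` are hypothetical.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_layer(x):
    """Toy self-attention with W_Q = W_K = W_V = I (assumption) and no residual/MLP/LN:
    returns softmax(X X^T / sqrt(d)) X for token matrix x of shape n x d."""
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = [scale * sum(x[i][c] * x[k][c] for c in range(d)) for k in range(n)]
        a = softmax(scores)  # row i of the attention matrix: positive, sums to 1
        out.append([sum(a[k] * x[k][c] for k in range(n)) for c in range(d)])
    return out

def token_spread(x):
    """Largest per-feature range across tokens; zero iff all tokens coincide (X = 1 x^T)."""
    n, d = len(x), len(x[0])
    return max(max(x[i][c] for i in range(n)) - min(x[i][c] for i in range(n))
               for c in range(d))

rng = random.Random(0)
x = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]  # 6 tokens, d = 4
spreads = [token_spread(x)]
for _ in range(12):
    x = attention_layer(x)
    spreads.append(token_spread(x))

for layer, s in enumerate(spreads):
    print(f"layer {layer:2d}: token spread {s:.2e}")
```

The shrinking spread illustrates only the attention-only regime. The abstract's own claims concern what changes once residual connections, the MLP and the output projection are present, which this toy deliberately leaves out.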

2604.23678 2026-04-28 cs.AI

Transferable Human Mobility Network Reconstruction with neuroGravity

Jinming Yang, Shaoyu Huang, Zongyuan Huang, Yaohui Jin, Xiaokang Yang, Marta C. Gonzalez, Yanyan Xu

2604.23674 2026-04-28 cs.AI

Vibe Medicine: Redefining Biomedical Research Through Human-AI Co-Work

Zihao Wu, Steven Xu, Bowen Chen, Shaowen Wan, Yiwei Li, Wei Ruan, Yanjun Lyu, Siyuan Li, Dajiang Zhu, Tianming Liu, Lin Zhao

2604.23670 2026-04-28 cs.CV

Deploy DINO with Many-to-Many Association

Haodong Jiang, Mingzhe Li, Junfeng Wu

2604.23665 2026-04-28 cs.CV

HAC: Parameter-Efficient Hyperbolic Adaptation of CLIP for Zero-Shot VQA

Francesco Dibitonto, Cigdem Beyan, Vittorio Murino

Comments This is the preprint version of the paper. The final version has been accepted for publication in the Proceedings of the 28th International Conference on Pattern Recognition (ICPR 2026)

2604.23662 2026-04-28 cs.CV

SolarFCD: A Large-Scale Dataset and Benchmark for Solar Fault Classification in Photovoltaic Systems

Misbah Ijaz, Saif Ur Rehman Khan, Abd Ur Rehman, Arooj Zaib, Sebastian Vollmer, Andreas Dengel, Muhammad Nabeel Asim

2604.23655 2026-04-28 cs.CV

BVI-Mamba: Video Enhancement Using a Visual State-Space Model for Low-Light and Underwater Environments

Guoxi Huang, Ruirui Lin, Yini Li, David R. Bull, Nantheera Anantrasirichai

2604.23653 2026-04-28 cs.CV cs.AI

ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine

Rabee Al-Qasem

2604.23651 2026-04-28 cs.CV

Geometry-Conditioned Diffusion for Occlusion-Robust In-Bed Pose Estimation

Navid Aslankhani Khameneh, Marco Carletti, Cigdem Beyan

Comments This is the preprint version of the paper. The final version has been accepted for publication in the Proceedings of the 20th IEEE International Conference on Automatic Face and Gesture Recognition (IEEE FG 2026)

2604.23648 2026-04-28 cs.RO

Safe Navigation in Unknown and Cluttered Environments via Direction-Aware Convex Free-Region Generation

Zhicheng Song, Yongjian Li, Kai Chen, Yulin Li, Fan Shi, Jun Ma

2604.23646 2026-04-28 cs.AI cs.CR

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

2604.23644 2026-04-28 cs.CV cs.AI

RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing

Pritesh Jha

2604.23641 2026-04-28 cs.CV

VDLF-Net: Variational Feature Fusion for Adaptive and Few-Shot Visual Learning

Jiawei Yan

2604.23636 2026-04-28 cs.CV

Discriminator-Guided Adaptive Diffusion for Source-Free Test-Time Adaptation under Image Corruptions

Francesco Olivato, Cigdem Beyan, Vittorio Murino

Comments This is the preprint (submitted version) of the paper. The final version has been accepted for publication in the Proceedings of the 28th International Conference on Pattern Recognition (ICPR 2026)

2604.23633 2026-04-28 cs.AI

Causal Discovery as Dialectical Aggregation: A Quantitative Argumentation Framework

Sheng Wei, Yulin Chen, Beishui Liao

Comments Accepted at the 23rd International Conference on Principles of Knowledge Representation and Reasoning (KR 2026). This arXiv version includes supplementary material and additional implementation details

2604.23632 2026-04-28 cs.CV cs.MM cs.SD

Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation

Chunyu Li, Jiaye Li, Ruiqiao Mei, Haoyuan Xia, Hao Zhu, Jingdong Wang, Siyu Zhu

2604.23627 2026-04-28 cs.CL cs.LG

Neural Grammatical Error Correction for Romanian

Teodor-Mihai Cotet, Stefan Ruseti, Mihai Dascalu

2604.23626 2026-04-28 cs.CL

GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs

Tao Feng, Haozhen Zhang, Zijie Lei, Peixuan Han, Jiaxuan You

2604.23623 2026-04-28 cs.AI

Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Zichuan Fu, Xian Wu, Guojing Li, Yejing Wang, Yijun Chen, Zihao Zhao, Yixuan Luo, Hanyu Yan, Yefeng Zheng, Xiangyu Zhao

Comments ACL 2026 Findings

2604.23622 2026-04-28 cs.CV

A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification

Peng Chen, Wenxuan He, Feng Qian, Guangyao Shi, Jingwen Yan

2604.23620 2026-04-28 cs.RO

Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation

Haoming Xu, Lei Lei, Jie Gu, Chu Tang, Jingmin Chen, Ruiqi Wang

Comments 15 pages, 10 figures

2604.23615 2026-04-28 cs.CL cs.AI

Applications of the Transformer Architecture in AI-Assisted English Reading Comprehension

Ping Li

Comments 9 pages, 5 figures, Conference paper for the International Conference on Big Data Applications in Education and Engineering (ICBDAEE 2026)

2604.23612 2026-04-28 cs.CV

Comparative Study of Weighted and Coupled Second- and Fourth-Order PDEs for Image Despeckling in Grayscale, Color, SAR, and Ultrasound

Manish Kumar, Rajendra K. Ray


Abstract

Partial differential equation (PDE)-based approaches have gained significant attention in image despeckling because they preserve structural details while suppressing noise. However, conventional second-order PDE models tend to generate blocky artifacts, whereas higher-order models often introduce speckle patterns. To resolve this, this paper proposes and comparatively analyzes two advanced PDE-based frameworks designed to suppress speckle noise while preserving fine edges. The first model introduces a novel weighted formulation that combines second- and fourth-order PDEs through a weighting parameter. Its second-order diffusion coefficient employs grayscale- and gradient-based indicators, while the fourth-order term is guided solely by a Laplacian-based indicator. The second model constructs a coupled PDE framework in which independent fourth- and second-order components are explicitly solved in an iterative manner. In this coupled structure, each diffusion coefficient is defined separately to improve adaptability across image regions. Both models are implemented using the explicit finite difference method. The proposed techniques are extensively evaluated on a variety of datasets, including standard grayscale, color, synthetic aperture radar (SAR), and ultrasound images. Comparative experiments with the existing Telegraph Diffusion Model (TDM) and Fourth-Order Telegraph Diffusion Model (TDFM) show that the proposed approaches reduce speckle noise more effectively while preserving fine image structures and edges. Quantitative evaluations using PSNR, SSIM, and the speckle index confirm that the proposed models yield higher image quality and better visual perception. Overall, the presented PDE-based formulations provide a reliable and efficient framework for image despeckling in both natural and medical imaging.

2604.23609 2026-04-28 cs.RO

Tube Diffusion Policy: Reactive Visual-Tactile Policy Learning for Contact-rich Manipulation

Teng Xue, Alberto Rigo, Bingjian Huang, Jiayi Shen, Zhengtong Xu, Nick Colonnese, Amirhossein H. Memar

2604.23606 2026-04-28 cs.LG math-ph math.MP

Hamiltonian Graph Inference Networks: Joint structure discovery and dynamics prediction for lattice Hamiltonian systems from trajectory data

Ru Geng, Panayotis Kevrekidis, Yixian Gao, Hong-Kun Zhang, Jian Zu

Comments 18 pages, 8 figures

2604.23605 2026-04-28 cs.AI

Thinking Like a Clinician: A Cognitive AI Agent for Clinical Diagnosis via Panoramic Profiling and Adversarial Debate

Zhiqi Lv, Duofan Tu, Jun Li, Mingyue Zhao, Heqin Zhu, Wenliang Li, Shaohua Kevin Zhou

2604.23604 2026-04-28 cs.CV cs.RO

Learning to Identify Out-of-Distribution Objects for 3D LiDAR Anomaly Segmentation

Simone Mosco, Daniel Fusaro, Alberto Pretto

Comments This paper has been accepted at the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)