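arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.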

2604.16200 2026-04-20 cs.CV

Saturation-Aware Space-Variant Blind Image Deblurring

Muhammad Z. Alam, Larry Stetsiuk, Arooba Zeshan

Comments 12 pages, 12 figures

DOI: 10.1109/TMM.2026.3680680

English abstract

This paper presents a novel saturation-aware, space-variant blind image deblurring framework designed to address the challenges that saturated pixels pose for deblurring under high-dynamic-range and low-light conditions. The proposed approach segments the image based on blur intensity and proximity to saturation, leveraging a pre-estimated Light Spread Function to mitigate stray-light effects. By accurately estimating the true radiance of saturated regions using the dark channel prior, our method enhances the deblurring process without introducing artifacts such as ringing. Experimental evaluations on both synthetic and real-world datasets demonstrate that the framework improves deblurring outcomes across various scenarios, showcasing superior performance compared to state-of-the-art saturation-aware and general-purpose methods. This adaptability highlights the framework's potential for integration with existing and emerging blind image deblurring techniques.

2604.16197 2026-04-20 cs.LG

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

Yide Ran, Jianwen Xie, Minghui Wang, Wenjin Zheng, Denghui Zhang, Chuan Li, Zhaozhuo Xu

Comments 54 pages

2604.16182 2026-04-20 cs.LG cs.AI

Synthetic data in cryptocurrencies using generative models

André Saimon S. Sousa, Otto Pires, Frank Acasiete, Oscar M. Granados, Valéria Loureiro da Silva, Hugo Saba

2604.16175 2026-04-20 cs.AI cs.CV

MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation

Yi Lin, Yihao Ding, Yonghui Wu, Yifan Peng

Comments Accepted by ACL 2026 main conference

2604.16170 2026-04-20 cs.CV cs.CE

neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing

Toby Perrett, Matthew Bouchard, William McCarthy

Comments Project page: https://autodeskailab.github.io/neuralCAD-Edit

2604.16158 2026-04-20 cs.CL cs.AI cs.LG

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency

Max Henning Höth, Kristian Kersting, Björn Deiseroth, Letitia Parcalabescu

Comments 14 pages, 8 figures, 1 table

2604.16147 2026-04-20 cs.CV cs.AI

SWNet: A Cross-Spectral Network for Camouflaged Weed Detection

Henry O. Velesaca, Luigi Miranda, Angel D. Sappa

2604.16145 2026-04-20 cs.LG cs.AI cs.DC cs.PF

Training Time Prediction for Mixed Precision-based Distributed Training

Minchul Kang, Changyong Shin, Jinwoo Jeong, Hyunho Lee, Younghun Go, Gyeongmin Kim, Gyeongsik Yang, Chuck Yoo

2604.16138 2026-04-20 cs.CL cs.LG

Sentiment Analysis of German Sign Language Fairy Tales

Fabrizio Nunnari, Siddhant Jain, Patrick Gebhard

2604.16135 2026-04-20 cs.CV

Motion-Adapter: A Diffusion Model Adapter for Text-to-Motion Generation of Compound Actions

Yue Jiang, Mingyu Yang, Liuyuxin Yang, Yang Xu, Bingxin Yun, Yuhe Zhang

Comments 12 pages, 12 figures, Under review for publication in IEEE Transactions on Visualization and Computer Graphics

2604.16132 2026-04-20 cs.CL cs.AI

Can LLMs Understand the Impact of Trauma? Costs and Benefits of LLMs Coding the Interviews of Firearm Violence Survivors

Jessica H. Zhu, Shayla Stringfield, Vahe Zaprosyan, Michael Wagner, Michel Cukier, Joseph B. Richardson

Comments Accepted to Findings of the Association for Computational Linguistics (2026)

2604.16119 2026-04-20 cs.LG

Univariate Channel Fusion for Multivariate Time Series Classification

Fernando Moro, Vinicius M. A. Souza

Comments International Conference on Pattern Recognition (ICPR 2026)

2604.16117 2026-04-20 cs.LG cs.AI

SCRIPT: Implementing an Intelligent Tutoring System for Programming in a German University Context

Alina Deriyeva, Jesper Dannath, Benjamin Paassen

Comments In: Cristea, A.I., Walker, E., Lu, Y., Santos, O.C., Isotani, S. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. AIED 2025. Communications in Computer and Information Science, vol 2590. Springer, Cham

2604.16115 2026-04-20 cs.CV

From Articles to Canopies: Knowledge-Driven Pseudo-Labelling for Tree Species Classification using LLM Experts

Michał Romaszewski, Dominik Kopeć, Michał Cholewa, Katarzyna Kołodziej, Przemysław Głomb, Jan Niedzielko, Jakub Charyton, Justyna Wylazłowska, Anna Jarocińska

2604.16114 2026-04-20 cs.CV

Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset

Yuhai Deng, Huimin She, Wei Shen, Meng Li, Ruoxi Wu, Lunxi Yuan, Xiang Li

Comments 33 pages, 14 figures

2604.16111 2026-04-20 cs.LG stat.ML

Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model

Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Comments Accepted at the 32nd International Conference on Algorithmic Learning Theory (ALT 2021)

2604.16108 2026-04-20 cs.CV

Polyglot: Multilingual Style Preserving Speech-Driven Facial Animation

Federico Nocentini, Kwanggyoon Seo, Qingju Liu, Claudio Ferrari, Stefano Berretti, David Ferman, Hyeongwoo Kim, Pablo Garrido, Akin Caliskan

Comments The project website is available at https://fedenoce.github.io/polyglot/

2604.16099 2026-04-20 cs.CV

DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

Laziz Hamdi, Amine Tamasna, Thierry Paquet

2604.16087 2026-04-20 cs.LG stat.ML

The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback

Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Michal Valko, Vianney Perchet

Comments Accepted at the 42nd International Conference on Machine Learning (ICML 2025)

2604.16086 2026-04-20 cs.CV cs.AI cs.LG stat.ML

Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance

Hamed Ouattara, Pierre Duthon, Pascal Houssam Salmane, Frédéric Bernardin, Omar Ait Aider

Comments 20 pages, 16 figures, ICPR 2026 (28th International Conference on Pattern Recognition)

English abstract

One of the dominant paradigms in self-supervised learning (SSL), illustrated by MoCo or DINO, aims to produce robust representations by capturing features that are insensitive to certain image transformations such as illumination or geometric changes. This strategy is appropriate when the objective is to recognize objects independently of their appearance. However, it becomes counterproductive as soon as appearance itself constitutes the discriminative signal. In weather analysis, for example, rain streaks, snow granularity, atmospheric scattering, as well as reflections and halos, are not noise: they carry the essential information. In critical applications such as autonomous driving, ignoring these cues is risky, since grip and visibility depend directly on ground and atmospheric conditions. We introduce ST-STORM, a hybrid SSL framework that treats appearance (style) as a semantic modality to be disentangled from content. Our architecture explicitly separates two latent streams, regulated by gating mechanisms. The Content branch aims at a stable semantic representation through a JEPA scheme coupled with a contrastive objective, promoting invariance to appearance variations. In parallel, the Style branch is constrained to capture appearance signatures (textures, contrasts, scattering) through feature prediction and reconstruction under an adversarial constraint. We evaluate ST-STORM on several tasks, including object classification (ImageNet-1K), fine-grained weather characterization, and melanoma detection (ISIC 2024 Challenge). The results show that the Style branch effectively isolates complex appearance phenomena (F1=97% on Multi-Weather and F1=94% on ISIC 2024 with 10% labeled data), without degrading the semantic performance (F1=80% on ImageNet-1K) of the Content branch, and improves the preservation of critical appearance

2604.16084 2026-04-20 cs.LG cs.AI

Unveiling Stochasticity: Universal Multi-modal Probabilistic Modeling for Traffic Forecasting

Weijiang Xiong, Robert Fonod, Nikolas Geroliminis

2604.16083 2026-04-20 cs.CV

DINOv3 Beats Specialized Detectors: A Simple Foundation Model Baseline for Image Forensics

Jieming Yu, Qiuxiao Feng, Zhuohan Wang, Xiaochen Ma

Comments Technical report

2604.16082 2026-04-20 cs.CV cs.AI cs.LG

Early Detection of Acute Myeloid Leukemia (AML) Using YOLOv12 Deep Learning Model

Enas E. Ahmed, Salah A. Aly, Mayar Moner

Comments 6 pages, 10 figures, 2 tables

2604.16070 2026-04-20 cs.CV

TableSeq: Unified Generation of Structure, Content, and Layout

Laziz Hamdi, Amine Tamasna, Pascal Boisson, Thierry Paquet

2604.16067 2026-04-20 cs.LG cs.CV

AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

Guransh Singh

2604.16060 2026-04-20 cs.CV cs.AI

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs

Sai Srinivas Kancheti, Aditya Sanjiv Kanade, Vineeth N. Balasubramanian, Tanuja Ganu

2604.16056 2026-04-20 cs.SD cs.AI

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing

Sihan Lv, Yechen Jin, Zhen Li, Jintao Chen, Jinshan Zhang, Ying Li, Jianwei Yin, Meng Xi

2604.16054 2026-04-20 cs.CV cs.AI

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs

Rohit Sinha, Aditya Kanade, Sai Srinivas Kancheti, Vineeth N Balasubramanian, Tanuja Ganu

2604.16044 2026-04-20 cs.CV

Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Meng Yu, Lei Sun, Jianhao Zeng, Xiangxiang Chu, Kun Zhan

Comments Accepted to CVPR 2026, 19 pages, with appendix

2604.16037 2026-04-20 cs.CL

Stochasticity in Tokenisation Improves Robustness

Sophie Steger, Rui Li, Sofiane Ennadir, Anya Sims, Arno Solin, Franz Pernkopf, Martin Trapp