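arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.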

2604.21478 2026-04-24 cs.CV

Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts

Yuhan Luo, Tao Chen, Decheng Liu

Comments The source code is available at https://github.com/Yuhan-Luo/Semantic-Fine-grained-Alignment-and-Mixture-of-Experts

English abstract

Nowadays, visual data forgery detection plays an increasingly important role in social and economic security with the rapid development of generative models. Existing face forgery detectors still can't achieve satisfactory performance because of poor generalization ability across datasets. The key factor that led to this phenomenon is the lack of suitable metrics: the commonly used cross-dataset AUC metric fails to reveal an important issue where detection scores may shift significantly across data domains. To explicitly evaluate cross-domain score comparability, we propose Cross-AUC, an evaluation metric that can compute AUC across dataset pairs by contrasting real samples from one dataset with fake samples from another (and vice versa). It is interesting to find that evaluating representative detectors under the Cross-AUC metric reveals substantial performance drops, exposing an overlooked robustness problem. Besides, we also propose the novel framework Semantic Fine-grained Alignment and Mixture-of-Experts (SFAM), consisting of a patch-level image-text alignment module that enhances CLIP's sensitivity to manipulation artifacts, and the facial region mixture-of-experts module, which routes features from different facial regions to specialized experts for region-aware forgery analysis. Extensive qualitative and quantitative experiments on the public datasets prove that the proposed method achieves superior performance compared with the state-of-the-art methods with various suitable metrics.
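The cross-pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary layout, and the convention that a higher score means "more likely fake" are assumptions made here for illustration. The abstract does not say how the two directions are aggregated, so the sketch simply reports both.

```python
import numpy as np


def auc(real_scores, fake_scores):
    """AUC as the probability that a fake sample scores above a real one.

    Assumes higher score = more likely fake; ties count as 0.5.
    """
    real = np.asarray(real_scores, dtype=float)
    fake = np.asarray(fake_scores, dtype=float)
    diff = fake[:, None] - real[None, :]  # all fake/real score pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


def cross_auc(scores_a, scores_b):
    """Pair real samples of one dataset with fake samples of the other.

    Each argument is a hypothetical dict with 'real' and 'fake' score arrays.
    """
    return {
        "real_A_vs_fake_B": auc(scores_a["real"], scores_b["fake"]),
        "real_B_vs_fake_A": auc(scores_b["real"], scores_a["fake"]),
    }


# Toy scores: each dataset is perfectly separable on its own,
# but dataset B's scores are shifted upward relative to dataset A's.
dataset_a = {"real": [0.1, 0.2], "fake": [0.6, 0.7]}
dataset_b = {"real": [0.65, 0.8], "fake": [0.9, 0.95]}

print(auc(dataset_a["real"], dataset_a["fake"]))  # within-dataset AUC for A
print(auc(dataset_b["real"], dataset_b["fake"]))  # within-dataset AUC for B
print(cross_auc(dataset_a, dataset_b))
```

In this toy case both within-dataset AUCs are perfect, while pairing B's real samples with A's fake samples gives a much lower value, which is the kind of score shift the abstract says conventional cross-dataset AUC does not expose.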

2604.21473 2026-04-24 cs.LG cs.AI

Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms

Jiyan Song, Wenyang Wang, Chengcheng Yan, Zhiquan Han, Feifei Zhao

2604.21471 2026-04-24 cs.RO

Ufil: A Unified Framework for Infrastructure-based Localization

Simon Schäfer, Lucas Hegerath, Marius Molz, Massimo Marcon, Bassam Alrifaee

Comments 8 pages, 6 figures, this work was submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC) 2026

2604.21469 2026-04-24 cs.CL cs.LG

Cross-Domain Data Selection and Augmentation for Automatic Compliance Detection

Fariz Ikhwantri, Dusica Marijan

Comments 10 pages, 5 figures, 4 tables. 11th Special Session on Intelligent Data Mining, 2025 IEEE International Conference on Big Data

2604.21465 2026-04-24 cs.CV

ID-Eraser: Proactive Defense Against Face Swapping via Identity Perturbation

Junyan Luo, Peipeng Yu, Jianwei Fei, Shiya Zeng, Xiaoyu Zhou, Zhihua Xia, Xiang Liu

2604.21464 2026-04-24 cs.LG cs.AI

Dynamical Priors as a Training Objective in Reinforcement Learning

Sukesh Subaharan

Comments Supplementary material can be accessed here: https://github.com/drsukeshs/esd-rl

2604.21462 2026-04-24 cs.LG

Conditional anomaly detection with soft harmonic functions

Michal Valko, Branislav Kveton, Hamed Valizadegan, Gregory F. Cooper, Milos Hauskrecht

Comments Published at IEEE International Conference on Data Mining (ICDM 2011). 10.1109/ICDM.2011.40

2604.21461 2026-04-24 cs.CV cs.HC

Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision

Chentao Li, Zirui Gao, Mingze Gao, Yinglian Ren, Jianjiang Feng, Jie Zhou

Comments 20 pages, 14 figures. Committed to ACL 2026

2604.21453 2026-04-24 cs.CV

Instance-level Visual Active Tracking with Occlusion-Aware Planning

Haowei Sun, Kai Zhou, Hao Gao, Shiteng Zhang, Jinwu Hu, Xutao Wen, Qixiang Ye, Mingkui Tan

Comments CVPR 2026 Poster

2604.21450 2026-04-24 cs.CV cs.AI cs.LG

VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution

Yixuan Zhu, Shilin Ma, Haolin Wang, Ao Li, Yanzhe Jing, Yansong Tang, Lei Chen, Jiwen Lu, Jie Zhou

Comments Accepted in ICLR 2026. Code is available at https://github.com/EternalEvan/VARestorer

2604.21444 2026-04-24 cs.AI

HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration

Yuehan Zhu, Jingqi Zhao, Jiawen Zhao, Xudong Mao, Baoquan Zhao

2604.21442 2026-04-24 cs.CV

2L-LSH: A Locality-Sensitive Hash Function-Based Method For Rapid Point Cloud Indexing

Shurui Wang, Yuhe Zhang, Ruizhe Guo, Yaning Zhang, Yifei Xie, Xinyu Zhou

Comments 13 pages, 13 figures. Published in The Computer Journal

2604.21435 2026-04-24 cs.CV

UHR-DETR: Efficient End-to-End Small Object Detection for Ultra-High-Resolution Remote Sensing Imagery

Jingfang Li, Haoran Zhu, Wen Yang, Jinrui Zhang, Fang Xu, Haijian Zhang, Gui-Song Xia

2604.21430 2026-04-24 cs.AI

Brief chatbot interactions produce lasting changes in human moral values

Yue Teng, Qianer Zhong, Kim Mai Tich Nguyen Thordsen, Christian Montag, Benjamin Becker

2604.21428 2026-04-24 cs.CL

Decoupled DiLoCo for Resilient Distributed Pre-training

Arthur Douillard, Keith Rush, Yani Donchev, Zachary Charles, Nova Fallen, Ayush Dubey, Ionel Gog, Josef Dean, Blake Woodworth, Zachary Garrett, Nate Keating, Jenny Bishop, Henry Prior, Edouard Yvinec, Arthur Szlam, Marc'Aurelio Ranzato, Jeff Dean

2604.21422 2026-04-24 cs.CV

Pre-process for segmentation task with nonlinear diffusion filters

Javier Sanguino, Carlos Platero, Olga Velasco

Comments Manuscript from 2017, previously unpublished, 37 pages

2604.21420 2026-04-24 cs.AI

FairQE: Multi-Agent Framework for Mitigating Gender Bias in Translation Quality Estimation

Jinhee Jang, Juhwan Choi, Dongjin Lee, Seunguk Yu, Youngbin Kim

Comments Accepted to ACL 2026

2604.21414 2026-04-24 cs.AI

SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis

Qiang Gao, Zhenping Li, Anqi Zhuo, Yingxiao Zhao, Weibo Geng, Xiaosong Li

2604.21411 2026-04-24 cs.LG physics.geo-ph

A Green-Integral-Constrained Neural Solver with Stochastic Physics-Informed Regularization

Mohammad Mahdi Abedi, David Pardo, Tariq Alkhalifah

English abstract

Standard physics-informed neural networks (PINNs) struggle to simulate highly oscillatory Helmholtz solutions in heterogeneous media because pointwise minimization of second-order PDE residuals is computationally expensive, biased toward smooth solutions, and requires artificial absorbing boundary layers to restrict the solution. To overcome these challenges, we introduce a Green-Integral (GI) neural solver for the acoustic Helmholtz equation. It departs from the PDE-residual-based formulation by enforcing wave physics through an integral representation that imposes a nonlocal constraint. Oscillatory behavior and outgoing radiation are encoded directly through the integral kernel, eliminating second-order spatial derivatives and enforcing physical solutions without additional boundary layers. Theoretically, optimizing this GI loss via a neural network acts as a spectrally tuned preconditioned iteration, enabling convergence in heterogeneous media where the classical Born series diverges. By exploiting FFT-based convolution to accelerate the GI loss evaluation, our approach substantially reduces GPU memory usage and training time. However, this efficiency relies on a fixed regular grid, which can limit local resolution. To improve local accuracy in strong scattering regions, we also propose a hybrid GI+PDE loss, enforcing a lightweight Helmholtz residual at a small number of nonuniformly sampled collocation points. We evaluate our method on seismic benchmark models characterized by structural contrasts and subwavelength heterogeneity at frequencies up to 20Hz. GI-based training consistently outperforms PDE-based PINNs, reducing computational cost by over a factor of ten. In models with localized scattering, the hybrid loss yields the most accurate reconstructions, providing a stable, efficient, and physically grounded alternative.
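To make the idea of an integral-equation loss evaluated by FFT convolution concrete, here is a minimal 1D sketch. It is not the paper's solver: it assumes a Lippmann-Schwinger form u = u_inc + k0^2 (G * (m u)) with squared wavenumber k0^2 (1 + m), uses the 1D outgoing Green's function, and takes a plain array where the paper would use a neural network output. All function names are hypothetical.

```python
import numpy as np


def green_1d(offsets, k0):
    """Outgoing 1D Green's function G with (d^2/dx^2 + k0^2) G = -delta."""
    return 1j / (2.0 * k0) * np.exp(1j * k0 * np.abs(offsets))


def fft_convolve(kernel, f, h):
    """Approximate the integral of G(x - y) f(y) dy on a regular grid.

    `kernel` holds G at offsets -(n-1)h, ..., (n-1)h, so its length is 2n-1.
    Zero-padding to the full linear-convolution length avoids wrap-around.
    """
    n = f.size
    size = kernel.size + n - 1
    full = np.fft.ifft(np.fft.fft(kernel, size) * np.fft.fft(f, size))
    return h * full[n - 1 : 2 * n - 1]  # keep the samples on the original grid


def gi_residual(u, u_inc, m, k0, h):
    """Residual of the assumed integral equation u = u_inc + k0^2 G*(m u)."""
    n = u.size
    offsets = (np.arange(2 * n - 1) - (n - 1)) * h
    kernel = green_1d(offsets, k0)
    return u - u_inc - k0**2 * fft_convolve(kernel, m * u, h)


def gi_loss(u, u_inc, m, k0, h):
    """Mean squared magnitude of the integral residual (no derivatives of u)."""
    return float(np.mean(np.abs(gi_residual(u, u_inc, m, k0, h)) ** 2))


# Sanity check: in a homogeneous medium (m = 0) the incident wave is the solution.
n, h, k0 = 64, 0.1, 2.0
x = np.arange(n) * h
u_inc = np.exp(1j * k0 * x)
print(gi_loss(u_inc, u_inc, np.zeros(n), k0, h))
```

Note that the loss involves no spatial derivatives of u, and that the outgoing-radiation condition sits in the kernel rather than in an absorbing layer. The regular grid needed for the FFT is the source of the resolution limitation the abstract mentions.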

2604.21409 2026-04-24 cs.CV

S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images

Qingxiao Li, Lifeng Xu, QingLi Wang, Yudong Bai, Mingwei Ou, Shu Hu, Nan Xu

Comments 29 pages, 13 figures

English abstract

We present S1-VL, a multimodal reasoning model for scientific domains that natively supports two complementary reasoning paradigms: Scientific Reasoning, which relies on structured chain-of-thought, and Thinking-with-Images, which enables the model to actively manipulate images through Python code execution during reasoning. In the Thinking-with-Images mode, the model generates and executes image-processing code in a sandbox environment, obtains intermediate visual results, and continues reasoning in a multi-turn iterative manner. This design is particularly effective for challenging scenarios such as high-resolution scientific chart interpretation, microscopic image understanding, and geometry-assisted reasoning. To construct the training data, we collect scientific multimodal datasets spanning six disciplines: mathematics, physics, chemistry, astronomy, geography, and biology. We further develop a six-dimensional quality filtering framework for reasoning trajectories. To mitigate redundant, ineffective, and erroneous visual operations commonly found in existing datasets, we propose a multi-stage filtering pipeline together with an adaptive data routing strategy. This strategy converts samples with low visual information gain into pure Reasoning-mode data, enabling the model to learn when image operations are truly necessary. S1-VL is trained through a four-stage progressive pipeline: scientific multimodal SFT, Thinking-with-Images cold-start SFT, and two stages of reinforcement learning with SAPO. We build S1-VL-32B on top of Qwen3-VL-32B-Thinking and evaluate it on 13 benchmarks. Experimental results show that S1-VL-32B achieves state-of-the-art performance on all five Thinking-with-Images benchmarks, including HRBench-4K, HRBench-8K, MME-RealWorld-CN, MME-RealWorld-Lite, and V*, and outperforms compared systems on scientific reasoning benchmarks such as Physics and VRSBench.

2604.21396 2026-04-24 cs.CV cs.AI

VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought

Byeonggeuk Lim, Kyeonghyun Kim, JungMin Yun, YoungBin Kim

Comments Accepted to LREC 2026

2604.21393 2026-04-24 cs.LG

Relocation of compact sets in $\mathbb{R}^n$ by diffeomorphisms and linear separability of datasets in $\mathbb{R}^n$

Xiao-Song Yang, Xuan Zhou, Qi Zhou

2604.21387 2026-04-24 cs.CV

EdgeFormer: local patch-based edge detection transformer on point clouds

Yifei Xie, Zhikun Tu, Tong Yang, Yuhe Zhang, Xinyu Zhou

Comments 22 pages, 9 figures. Published in Pattern Analysis and Applications

2604.21377 2026-04-24 cs.RO cs.HC

A Replicable Robotics Awareness Method Using LLM-Enabled Robotics Interaction: Evidence from a Corporate Challenge

S. A. Prieto, M. A. Gopee, Y. Ben Arab, B. García de Soto, J. Esteba, P. Olivera Brizzio

Comments 10 pages, 8 figures, to be submitted for journal peer review

2604.21370 2026-04-24 cs.CL cs.CY

MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization

Maziar Kianimoghadam Jouneghani

Comments 9 pages, 9 tables. Accepted to the 20th International Workshop on Semantic Evaluation (SemEval-2026), Task 9

2604.21369 2026-04-24 cs.LG cs.HC

Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments

Tatsuhito Hasegawa

Comments 13 pages, 6 figures, 8 tables, Preprint. This work has been submitted to the IEEE for possible publication

2604.21365 2026-04-24 cs.LG cs.AI cs.CL cs.SE

mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

Adam Skurla, Dominik Macko, Jakub Simko

2604.21362 2026-04-24 cs.CV

KD-CVG: A Knowledge-Driven Approach for Creative Video Generation

Linkai Liu, Wei Feng, Xi Zhao, Shen Zhang, Xingye Chen, Zheng Zhang, Jingjing Lv, Junjie Shen, Ching Law, Yuchen Zhou, Zipeng Guo, Chao Gou

Comments Accepted to ICASSP 2026

2604.21361 2026-04-24 cs.AI

Time, Causality, and Observability Failures in Distributed AI Inference Systems

Ankur Sharma, Deep Shah, David Lariviere, Hesham ElBakoury

Comments 17 pages, 6 figures. Produced as part of the Unified Intelligent Infrastructure workstream at the Open Compute Project (OCP)

2604.21357 2026-04-24 cs.AI cs.CL

ReaGeo: Reasoning-Enhanced End-to-End Geocoding with LLMs

Jian Cui, Zhiyuan Ren, Desheng Weng, Yongqi Zhao, Gong Wenbin, Yu Lei, Zhenning Dong

Comments 12 pages, 8 figures, submitted to ACM SIGSPATIAL 2024 (under review)