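arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.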

2601.02441 2026-03-26 cs.CV cs.AI

Understanding Pure Textual Reasoning for Blind Image Quality Assessment

Yuan Li, Shin'ya Nishida

Comments Code available at https://github.com/AnonymousUserPublish/Bridging-Image-Text-Gap-for-BIQA/tree/main. This work is accepted by ICME (IEEE International Conference on Multimedia and Expo) 2026

2512.23374 2026-03-26 cs.CV

NeXT-IMDL: Build Benchmark for NeXT-Generation Image Manipulation Detection & Localization

Yifei Li, Haoyuan He, Yu Zheng, Bingyao Yu, Wenzhao Zheng, Lei Chen, Jie Zhou, Jiwen Lu

Comments Duplicate experiment results in Table 3 (Set-1 & Set-2)

2512.17143 2026-03-26 cs.CV

Pro-Pose: Unpaired Full-Body Portrait Synthesis via Canonical UV Maps

Sandeep Mishra, Yasamin Jafarian, Andreas Lugmayr, Yingwei Li, Varsha Ramakrishnan, Srivatsan Varadharajan, Alan C. Bovik, Ira Kemelmacher-Shlizerman

2512.16917 2026-03-26 cs.AI cs.CL cs.LG

Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille

Comments Camera-ready version

2512.16456 2026-03-26 cs.CV

Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach

Masashi Hatano, Saptarshi Sinha, Jacob Chalk, Wei-Hong Li, Hideo Saito, Dima Damen

Comments Project Page: https://masashi-hatano.github.io/prime-and-reach/

2512.11715 2026-03-26 cs.CV cs.MM eess.IV

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing

Wei Chow, Linfeng Li, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu

English abstract

Recent advances in diffusion models (DMs) have achieved exceptional visual quality in image editing tasks. However, the global denoising dynamics of DMs inherently conflate local editing targets with the full-image context, leading to unintended modifications in non-target regions. In this paper, we shift our attention beyond DMs and turn to Masked Generative Transformers (MGTs) as an alternative approach to tackle this challenge. By predicting multiple masked tokens rather than holistic refinement, MGTs exhibit a localized decoding paradigm that endows them with the inherent capacity to explicitly preserve non-relevant regions during the editing process. Building upon this insight, we introduce the first MGT-based image editing framework, termed EditMGT. We first demonstrate that MGT's cross-attention maps provide informative signals for localizing edit-relevant regions and devise a multi-layer attention consolidation scheme that refines these maps to achieve fine-grained and precise localization. On top of these adaptive localization results, we introduce region-hold sampling, which restricts token flipping within low-attention areas to suppress spurious edits, thereby confining modifications to the intended target regions and preserving the integrity of surrounding non-target areas. To train EditMGT, we construct CrispEdit-2M, a high-resolution dataset spanning seven diverse editing categories. Without introducing additional parameters, we adapt a pre-trained text-to-image MGT into an image editing model through attention injection. Extensive experiments across four standard benchmarks demonstrate that, with fewer than 1B parameters, our model achieves similar performance while enabling 6 times faster editing. Moreover, it delivers comparable or superior editing quality, with improvements of 3.6% and 17.6% on style change and style transfer tasks, respectively.

2512.08334 2026-03-26 cs.CV

HybridSplat: Fast Reflection-baked Gaussian Tracing using Hybrid Splatting

Chang Liu, Hongliang Yuan, Lianghao Zhang, Sichao Wang, Jianwei Guo, Shi-Sheng Huang

Comments The authors have decided to withdraw this manuscript to undergo a comprehensive revision of the methodology and data analysis. The current version no longer accurately reflects the final scope and quality of our ongoing research

2512.07801 2026-03-26 cs.CL cs.AI cs.HC cs.LG

Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

Raunak Jain

2512.06438 2026-03-26 cs.CV

AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars

Ramazan Fazylov, Sergey Zagoruyko, Aleksandr Parkin, Stamatis Lefkimmiatis, Ivan Laptev

Comments Extended the method to support mobile devices; updated experiments, results and supplementary

2512.04480 2026-03-26 cs.AI cs.CE cs.SY eess.SY math.OC

Prescriptive Artificial Intelligence: A Formal Paradigm for Auditing Human Decisions Under Uncertainty

Pedro Passos, Patrick Moratori

Comments Preprint; suitable for AI, decision sciences, and prescriptive analytics. Short versions published in Wharton Sports Analytics Journal Fall 2025 (AI Feature Spotlight) and accepted to AAAI Bridge on LM Reasoning 2026

2512.04000 2026-03-26 cs.CV cs.AI cs.LG

Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

Jialuo Li, Bin Li, Jiahao Li, Yan Lu

Comments CVPR 2026

2512.02566 2026-03-26 cs.CV cs.AI

From Panel to Pixel: Zoom-In Vision-Language Pretraining from Biomedical Scientific Literature

Kun Yuan, Min Woo Sun, Zhen Chen, Alejandro Lozano, Xiangteng He, Shi Li, Nassir Navab, Xiaoxiao Sun, Nicolas Padoy, Serena Yeung-Levy

2512.01540 2026-03-26 cs.CV

FlashVGGT: Efficient and Scalable Visual Geometry Transformers with Compressed Descriptor Attention

Zipeng Wang, Dan Xu

Comments CVPR 2026

2511.20924 2026-03-26 cs.CV

GaINeR: Geometry-Aware Implicit Network Representation

Weronika Jakubowska, Mikołaj Zieliński, Rafał Tobiasz, Krzysztof Byrski, Maciej Zięba, Dominik Belter, Przemysław Spurek

Comments 22 pages, 16 figures

2511.20592 2026-03-26 cs.LG cs.CV

Latent Diffusion Inversion Requires Understanding the Latent Space

Mingxing Rao, Bowen Qu, Daniel Moyer

Comments 14 pages, 4 figures, 7 tables

2511.18789 2026-03-26 cs.LG stat.ML

Perturbing the Derivative: Doubly Wild Refitting for Model-Free Evaluation of Opaque Machine Learning Predictors

Haichen Hu, David Simchi-Levi

2511.18370 2026-03-26 cs.CV cs.GR

MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer

Zenghao Chai, Chen Tang, Yongkang Wong, Xulei Yang, Mohan Kankanhalli

Comments Accepted to CVPR 2026. Project page: https://mimicat3d.github.io/

2511.16148 2026-03-26 cs.LG

Enhancing Nuclear Reactor Core Simulation through Data-Based Surrogate Models

Perceval Beja-Battais, Alain Grossetête, Nicolas Vayatis

2511.13649 2026-03-26 cs.CV

Distribution Matching Distillation Meets Reinforcement Learning

Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, Bo Zhang, Mengmeng Wang, Steven Hoi, Peng Gao, Harry Yang

Comments The synergy of reinforcement learning and distribution matching distillation. See more: https://github.com/vvvvvjdy/dmdr

2511.11743 2026-03-26 cs.LG cs.AI

Uncertainty Makes It Stable: Curiosity-Driven Quantized Mixture-of-Experts

Sebastián Andrés Cajas Ordóñez, Luis Fernando Torres Torres, Mackenzie J. Meni, Carlos Andrés Duran Paredes, Eric Arazo, Cristian Bosch, Ricardo Simon Carbajo, Yuan Lai, Leo Anthony Celi

2511.06767 2026-03-26 cs.LG cs.AI

QUARK: Quantization-Enabled Circuit Sharing for Transformer Acceleration by Exploiting Common Patterns in Nonlinear Operations

Zhixiong Zhao, Haomin Li, Fangxin Liu, Yuncheng Lu, Zongwu Wang, Tao Yang, Li Jiang, Haibing Guan

Comments Accepted by ICCAD 2025

2511.06229 2026-03-26 cs.LG

Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment

Donggyu Min, Seongjin Choi, Dong-Kyu Kim

Comments 16 pages, 13 figures, 7 tables

2511.04854 2026-03-26 cs.LG q-bio.QM

SigmaDock: Untwisting Molecular Docking With Fragment-Based SE(3) Diffusion

Alvaro Prat, Leo Zhang, Charlotte M. Deane, Yee Whye Teh, Garrett M. Morris

Comments Camera-ready version for ICLR 2026

2511.01718 2026-03-26 cs.RO cs.CV

Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process

Jiayi Chen, Wenxuan Song, Pengxiang Ding, Ziyang Zhou, Han Zhao, Feilong Tang, Donglin Wang, Haoang Li

2510.27684 2026-03-26 cs.CV

Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals

Xiangyu Fan, Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang, Zhiqian Lin, Tianxiang Ren, Dahua Lin, Ruihao Gong, Lei Yang

2510.14751 2026-03-26 cs.LG cs.AI

Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries

Divyat Mahajan, Sachin Goyal, Badr Youbi Idrissi, Mohammad Pezeshki, Ioannis Mitliagkas, David Lopez-Paz, Kartik Ahuja

Comments Proceedings of the Fourteenth International Conference on Learning Representations (ICLR) 2026

2510.11306 2026-03-26 cs.RO

Rotor-Failure-Aware Quadrotors Flight in Unknown Environments

Xiaobin Zhou, Miao Wang, Chengao Li, Can Cui, Ruibin Zhang, Yongchao Wang, Chao Xu, Fei Gao

2510.10223 2026-03-26 cs.CL cs.AI cs.LG

You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs

Yijie Xu, Huizai Yao, Zhiyu Guo, Pengteng Li, Aiwei Liu, Xuming Hu, Weiyu Guo, Hui Xiong

Comments Under Review

2510.09146 2026-03-26 cs.LG

Score-Based Density Estimation from Pairwise Comparisons

Petrus Mikkola, Luigi Acerbi, Arto Klami

Comments Accepted at ICLR 2026. Camera-ready version. 36 pages, 16 figures

2510.04601 2026-03-26 cs.CL

FedSRD: Sparsify-Reconstruct-Decompose for Communication-Efficient Federated Large Language Models Fine-Tuning

Guochen Yan, Luyuan Xie, Qingni Shen, Yuejian Fang, Zhonghai Wu

Comments Accepted by WWW 2026