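arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.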

2604.24953 2026-04-30 cs.CV cs.AI

ViPO: Visual Preference Optimization at Scale

Ming Li, Jie Wu, Justin Cui, Xiaojie Li, Rui Wang, Chen Chen

Comments ICLR 2026 Paper. Project Page: https://liming-ai.github.io/ViPO ; Code: https://github.com/liming-ai/ViPO


Abstract

While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on such noisy datasets fails to learn preferences, hindering effective scaling. To enhance robustness against noise, we propose Poly-DPO, which extends the DPO objective with an additional polynomial term that dynamically adjusts model confidence based on dataset characteristics, enabling effective learning across diverse data distributions. Beyond biased patterns, existing datasets suffer from low resolution, limited prompt diversity, and imbalanced distributions. To facilitate large-scale visual preference optimization by tackling data bottlenecks, we construct ViPO, a massive-scale preference dataset with 1M image pairs at 1024px across five categories and 300K video pairs at 720p+ across three categories. State-of-the-art generative models and diverse prompts ensure reliable preference signals with balanced distributions. Remarkably, when applying Poly-DPO to our high-quality dataset, the optimal configuration converges to standard DPO. This convergence validates dataset quality and Poly-DPO's adaptive nature: sophisticated optimization becomes unnecessary with sufficient data quality, yet remains valuable for imperfect datasets. We validate our approach across visual generation models. On noisy datasets like Pick-a-Pic V2, Poly-DPO achieves 6.87 and 2.32 gains over Diffusion-DPO on GenEval for SD1.5 and SDXL, respectively. For ViPO, models achieve performance far exceeding those trained on existing open-source preference datasets. These results confirm that addressing both algorithmic adaptability and data quality is essential for scaling visual preference optimization.
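The abstract says Poly-DPO adds a polynomial term to the DPO objective but does not give its form. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a first-order, PolyLoss-style term `eps * (1 - p)`, where `p` is the sigmoid of the usual DPO margin. The names `dpo_margin`, `poly_dpo_loss`, `eps`, and `beta` are hypothetical, and the inputs are plain log-likelihoods, whereas diffusion models would typically substitute a denoising-error surrogate.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dpo_margin(logp_w: float, logp_l: float,
               ref_logp_w: float, ref_logp_l: float,
               beta: float = 0.1) -> float:
    """Scaled implicit-reward margin between winner (w) and loser (l).

    This is the standard DPO margin: beta times the difference of the
    policy-vs-reference log-ratios for the preferred and rejected samples.
    """
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def poly_dpo_loss(margin: float, eps: float = 0.0) -> float:
    """DPO loss plus an ASSUMED first-order polynomial term.

    The DPO part is -log(sigmoid(margin)), computed as softplus(-margin)
    for stability. The extra term eps * (1 - p) is a guess at the
    "polynomial term" mentioned in the abstract; eps = 0 gives plain DPO.
    """
    dpo = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    p = sigmoid(margin)
    return dpo + eps * (1.0 - p)


if __name__ == "__main__":
    m = dpo_margin(logp_w=-1.0, logp_l=-2.0,
                   ref_logp_w=-1.5, ref_logp_l=-1.5, beta=0.5)
    for eps in (-0.5, 0.0, 0.5):
        print(f"eps={eps:+.1f}  loss={poly_dpo_loss(m, eps):.4f}")
```

In this assumed form, setting `eps = 0` reduces the loss to standard DPO, which is at least consistent with the abstract's remark that the best configuration on the ViPO dataset converges to standard DPO. A nonzero `eps` reweights pairs according to how confident the model already is in the labeled winner.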


2604.24940 2026-04-30 cs.CL cs.AI

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

Orhan Demirci, Sezer Aptourachman, Aydın Kaya

Comments 13 pages (9 pages main text + 4 pages appendix), 6 tables, 1 algorithm

2604.24169 2026-04-30 cs.CV

PointTransformerX: Portable and Efficient 3D Point Cloud Processing without Sparse Algorithms

Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller

Comments This paper has been accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2026

2604.24036 2026-04-30 cs.CV eess.IV

Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues

Beomchan Park, Seongho Kim, Hyunjun Kim, Sungjune Park, Yong Man Ro

Comments 4 pages, 2 figures, ICASSP 2026

2604.24005 2026-04-30 cs.LG cs.AI

TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

Jiaqi Wang, Wenhao Zhang, Weijie Shi, Yaliang Li, James Cheng

Comments Update code, model weight

2604.22658 2026-04-30 cs.CV

PASR: Pose-Aware 3D Shape Retrieval from Occluded Single Views

Jiaxin Shi, Guofeng Zhang, Wufei Ma, Naifu Liang, Adam Kortylewski, Alan Yuille

2604.22334 2026-04-30 cs.CV

FILTR: Extracting Topological Features from Pretrained 3D Models

Louis Martinez, Maks Ovsjanikov

Comments Project page: https://filtr-topology.github.io/

2604.19653 2026-04-30 cs.AI

A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities

Aya Cherigui, Florent Guépin, Arnaud Legendre, Jean-François Couchot

2604.18112 2026-04-30 cs.CL cs.MM

Retrieval-Augmented Multimodal Model for Fake News Detection

Yiheng Li, Weihai Lu, Hanyi Yu, Yue Wang

Comments Accepted to SIGIR 26

2604.17698 2026-04-30 cs.LG cs.CL stat.ML

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

Prashant C. Raju

2604.16902 2026-04-30 cs.AI

Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

Xinru Yan, Boxi Cao, Yaojie Lu, Hongyu Lin, Weixiang Zhou, Le Sun, Xianpei Han

2604.14858 2026-04-30 cs.AI cs.SE

Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-Codex

Zhonghao Yang, Yu Li, Yanxu Zhu, Tianyi Zhou, Yuejin Xie, Haoyu Luo, Jing Shao, Xia Hu, Dongrui Liu

Comments 18 pages, 3 figures

2604.14246 2026-04-30 cs.LG cs.AI

Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations

Wentao Hu, Yanbo Zhai, Xiaohui Hu, Mingkuan Zhao, Shanhong yu, Xue Liu, Kaidong Yu, Shuangyong Song, Xuelong Li

Comments 14 pages, 6 figures, 6 tables

2604.14001 2026-04-30 cs.CL cs.AI cs.LG cs.NE

Diffusion Language Models for Speech Recognition

Davyd Naveriani, Albert Zeyer, Ralf Schlüter, Hermann Ney

2604.13127 2026-04-30 cs.CV cs.AI cs.SD

Graph Propagated Projection Unlearning: A Unified Framework for Vision and Audio Discriminative Models

Shreyansh Pathak, Jyotishman Das

Comments This submission has been withdrawn because it is posted accidentally without full author approval. A revised version may be submitted with full approval anytime soon

2604.12807 2026-04-30 cs.CV cs.AI

Rethinking Satellite Image Restoration for Onboard AI: A Lightweight Learning-Based Approach

Adrien Dorise, Marjorie Bellizzi, Omar Hlimi

Comments Accepted at AI4SPACE@CVPR conference

2604.12019 2026-04-30 cs.AI cs.HC

A Framework for Longitudinal Health AI Agents

Georgianna Lin, Rencong Jiang, Noémie Elhadad, Xuhai "Orson" Xu

Comments 10 pages, 2 figures, 5 tables

2604.06061 2026-04-30 cs.LG

PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space

Asaf Buchnick, Aviv Shamsian, Aviv Navon, Ethan Fetaya

2604.05190 2026-04-30 cs.CL cs.AI cs.IR

Retrieval-Augmented LLMs for Evidence Localization in Clinical Trial Recruitment from Longitudinal EHR Narratives

Ziyi Chen, Mengxian Lyu, Cheng Peng, Yonghui Wu

2604.03260 2026-04-30 cs.CL cs.AI

Why Attend to Everything? Focus is the Key

Hengshuai Yao, Xing Chen, Ahmed Murtadha, Jin Li, Yasin Abbasi Yadkori, Shuai Shao, Changling Liu, Guan Wang, Mingli Yuan, William Chen, Sen Song

2604.02021 2026-04-30 cs.RO

Bridging Discrete Planning and Continuous Execution for Redundant Robot

Teng Yan, Yue Yu, Yihan Liu, Bingzhuo Zhong

Comments 8 pages, 3 figures. Submitted to IFAC World Congress 2026

2604.01929 2026-04-30 cs.SD cs.AI cs.LG

Woosh: A Sound Effects Foundation Model

Gaëtan Hadjeres, Marc Ferras, Khaled Koutini, Benno Weck, Alexandre Bittar, Thomas Hummel, Zineb Lahrichi, Hakim Missoum, Joan Serrà, Yuki Mitsufuji

2604.00809 2026-04-30 cs.CV cs.HC cs.IR

Revisiting Human-in-the-Loop Object Retrieval with Pre-Trained Vision Transformers

Kawtar Zaher, Olivier Buisson, Alexis Joly

2603.25342 2026-04-30 cs.LG

From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents

Shuoling Liu, Zhiquan Tan, Kun Yi, Hui Wu, Yihan Li, Jiangpeng Yan, Liyuan Chen, Kai Chen, Qiang Yang

2603.21664 2026-04-30 cs.CV

HumanOmni-Speaker: Identifying Who said What and When

Detao Bai, Shimin Yao, Weixuan Chen, Zhiheng Ma, Xihan Wei, Jingren Zhou

2603.15262 2026-04-30 cs.AI

Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search

Mengxiang Chen, Zhouwei Zhai, Jin Li

2603.14845 2026-04-30 cs.LG cs.AI

Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting

Ziqing Ma, Kai Ying, Xinyue Gu, Tian Zhou, Tianyu Zhu, Haifan Zhang, Peisong Niu, Zheng Wang, Cong Bai, Liang Sun

2603.13589 2026-04-30 cs.LG cs.AI cs.CV

Assessing the Utility of Volumetric Motion Fields for Radar-based Precipitation Nowcasting with Physics-informed Deep Learning

Peter Pavlík, Anna Bou Ezzeddine, Viera Rozinajová

Comments To be submitted to a fitting journal

2603.12583 2026-04-30 cs.RO cs.HC cs.SY eess.SY math.OC

Skill-informed Data-driven Haptic Nudges for High-dimensional Human Motor Learning

Ankur Kamboj, Rajiv Ranganathan, Xiaobo Tan, Vaibhav Srivastava

2603.12249 2026-04-30 cs.CL cs.AI cs.CV

SciMDR: Advancing Scientific Multimodal Document Reasoning

Ziyu Chen, Yilun Zhao, Chengye Wang, Rilyn Han, Manasi Patwardhan, Arman Cohan

Comments ACL 2026