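arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.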

math/0603024 2026-04-10 math.ST cs.GL physics.soc-ph stat.TH

Towards a better list of citation superstars: compiling a multidisciplinary list of highly cited researchers

Igor Podlubny, Katarina Kassayova

Comments 15 pages, 4 tables

2604.08547 2026-04-10 cs.CV cs.GR

GaussiAnimate: Reconstruct and Rig Animatable Categories with Level of Dynamics

Jiaxin Wang, Dongxin Lyu, Zeyu Cai, Zhiyang Dou, Cheng Lin, Anpei Chen, Yuliang Xiu

Comments Page: https://cookmaker.cn/gaussianimate

2604.08546 2026-04-10 cs.CV

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

Zhengyang Sun, Yu Chen, Xin Zhou, Xiaofan Li, Xiwu Chen, Dingkang Liang, Xiang Bai

Comments Accepted by CVPR 2026. Project page: https://h-embodvis.github.io/NUMINA

2604.08545 2026-04-10 cs.CV cs.AI

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, Yixiong Zou

Comments Project Page: https://Accio-Lab.github.io/Metis

2604.08543 2026-04-10 cs.CV

E-3DPSM: A State Machine for Event-Based Egocentric 3D Human Pose Estimation

Mayur Deshmukh, Hiroyasu Akada, Helge Rhodin, Christian Theobalt, Vladislav Golyanik

Comments 20 pages; 14 figures and 14 tables; CVPR 2026; project page: https://4dqv.mpi-inf.mpg.de/E-3DPSM/

2604.08542 2026-04-10 cs.CV

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction

Tao Xie, Peishan Yang, Yudong Jin, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, Sida Peng, Xiaoyang Guo, Xiaowei Zhou

Comments Project page: https://zju3dv.github.io/scal3r

2604.08541 2026-04-10 cs.CV cs.AI cs.CL

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang, Longtao Huang, Hui Xue, Yongliang Shen, Weiming Lu, Yueting Zhuang

2604.08540 2026-04-10 cs.CV cs.AI cs.CL

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing, Yuqing Yang, Qi Dai, Lili Qiu, Chong Luo

2604.08537 2026-04-10 cs.LG q-bio.NC

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo

Comments Accepted to CVPR 2026, website: https://github.com/ezacngm/brainCodec

2604.08536 2026-04-10 cs.CV cs.AI

RewardFlow: Generate Images by Optimizing What You Reward

Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou

Comments CVPR 2026. Project page: https://plan-lab.github.io/rewardflow

2604.08535 2026-04-10 cs.RO cs.CV

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

Simon Gerstenecker, Andreas Geiger, Katrin Renz

2604.08534 2026-04-10 cs.RO

ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration

Yanwen Zou, Chenyang Shi, Wenye Yu, Han Xue, Jun Lv, Ye Pan, Chuan Wen, Cewu Lu

2604.08532 2026-04-10 cs.CV

Self-Improving 4D Perception via Self-Distillation

Nan Huang, Pengcheng Yu, Weijia Zeng, James M. Rehg, Angjoo Kanazawa, Haiwen Feng, Qianqian Wang

2604.08529 2026-04-10 cs.HC cs.AI

PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents

Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes

2604.08528 2026-04-10 cs.RO

A-SLIP: Acoustic Sensing for Continuous In-hand Slip Estimation

Uksang Yoo, Yuemin Mao, Jean Oh, Jeffrey Ichnowski

2604.08527 2026-04-10 cs.CL cs.LG

Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models

Feng Luo, Yu-Neng Chuang, Guanchu Wang, Zicheng Xu, Xiaotian Han, Tianyi Zhang, Vladimir Braverman

2604.08526 2026-04-10 cs.CV cs.GR

FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

Johanna Karras, Yuanhao Wang, Yingwei Li, Ira Kemelmacher-Shlizerman

Comments SIGGRAPH 2026

2604.08525 2026-04-10 cs.AI cs.CL cs.CY

Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths

2604.08524 2026-04-10 cs.LG cs.AI cs.CL

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal

Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha

Comments 9 pages + appendix, 7 figures

2604.08523 2026-04-10 cs.CL cs.AI

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen

Comments Project page: https://claw-bench.com

2604.08522 2026-04-10 cs.CV

UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding

Joungbin An, Agrim Jain, Kristen Grauman

Comments Project Page: https://vision.cs.utexas.edu/projects/universalvtg

2604.08521 2026-04-10 math.OC cs.SY eess.SY

Discounted MPC and infinite-horizon optimal control under plant-model mismatch: Stability and suboptimality

Robert H. Moldenhauer, Karl Worthmann, Romain Postoyan, Dragan Nešić, Mathieu Granzotto

Comments Submitted to 65th IEEE Conference on Decision and Control as a regular paper

2604.08519 2026-04-10 cs.CL stat.ML

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

Jiayuan Ye, Vitaly Feldman, Kunal Talwar

2604.08517 2026-04-10 cs.GT

Learning vs. Optimizing Bidders in Budgeted Auctions

Giannis Fikioris, Balasubramanian Sivan, Éva Tardos

2604.08516 2026-04-10 cs.CV

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna

Comments https://allenai.org/blog/molmoweb

Abstract

Web agents, autonomous systems that navigate and execute tasks on the web on behalf of users, have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, reproducibility, and community-driven progress. We believe agents for the open web should be built in the open. To this end, we introduce (1) MolmoWebMix, a large and diverse mixture of browser task demonstrations and web-GUI perception data, and (2) MolmoWeb, a family of fully open multimodal web agents. Specifically, MolmoWebMix combines over 100K synthetic task trajectories from multiple complementary generation pipelines with 30K+ human demonstrations, atomic web-skill trajectories, and GUI perception data, including referring expression grounding and screenshot question answering. MolmoWeb agents operate as instruction-conditioned visual-language action policies: given a task instruction and a webpage screenshot, they predict the next browser action, requiring no access to HTML, accessibility trees, or specialized APIs. Available in 4B and 8B sizes, MolmoWeb agents achieve state-of-the-art results on browser-use benchmarks such as WebVoyager, Online-Mind2Web, and DeepShop, outperforming open-weight-only models of similar scale such as Fara-7B, UI-Tars-1.5-7B, and Holo1-7B. MolmoWeb-8B also surpasses set-of-marks (SoM) agents built on much larger closed frontier models like GPT-4o. We further demonstrate consistent gains through test-time scaling via parallel rollouts with best-of-N selection, achieving 94.7% and 60.5% pass@4 (compared to 78.2% and 35.3% pass@1) on WebVoyager and Online-Mind2Web, respectively. We will release model checkpoints, training data, code, and a unified evaluation harness to enable reproducibility and accelerate open research on web agents.
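The pass@1 vs. pass@4 comparison above can be made concrete with a small sketch. It assumes the common definition of pass@k (a task counts as solved if at least one of k sampled rollouts succeeds) and uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k) for a task with n rollouts, c of them successful. The abstract does not spell out its exact protocol or how the best-of-N selector relates to this metric, so treat this as an illustration, not the paper's evaluation code. The function names and the rollout data in the usage example are hypothetical.

```python
from collections.abc import Sequence
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task with n rollouts, c of which succeeded.

    Probability that a size-k subset drawn without replacement from the
    n rollouts contains at least one success: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Too few failures to fill a size-k subset, so every subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Average pass@k over tasks; results[i] holds task i's per-rollout success flags."""
    scores = [pass_at_k(len(r), sum(r), k) for r in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical outcomes: 3 tasks, 4 parallel rollouts each (illustrative only).
    results = [
        [True, False, False, False],
        [False, False, False, False],
        [True, True, False, True],
    ]
    print(f"pass@1 = {mean_pass_at_k(results, 1):.3f}")  # prints "pass@1 = 0.333"
    print(f"pass@4 = {mean_pass_at_k(results, 4):.3f}")  # prints "pass@4 = 0.667"
```

With k = 1 the estimator reduces to the fraction of successful rollouts (c / n). With k = n, as in a setup that draws exactly 4 rollouts per task, it reduces to "any rollout succeeded", which is why pass@k can only grow as k increases.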

2604.08514 2026-04-10 cs.HC

"Because we are no longer ashamed of our disabilities, we are proud": Advocating and Reclaiming Next-Gen Accessibility Symbols

Karen Joy, Chris Dodge, Harsh Chavda, Alyssa Sheehan

Comments 18 pages, 10 images

2604.08513 2026-04-10 cs.CV

When Fine-Tuning Changes the Evidence: Architecture-Dependent Semantic Drift in Chest X-Ray Explanations

Kabilan Elangovan, Daniel Ting

2604.08510 2026-04-10 cs.CL

What do Language Models Learn and When? The Implicit Curriculum Hypothesis

Emmy Liu, Kaiser Sun, Millicent Li, Isabelle Lee, Lindia Tjuatja, Jen-tse Huang, Graham Neubig

2604.08509 2026-04-10 cs.CV cs.RO

Visually-grounded Humanoid Agents

Hang Ye, Xiaoxuan Ma, Fan Lu, Wayne Wu, Kwan-Yee Lin, Yizhou Wang

Comments Project page: https://alvinyh.github.io/VGHuman/

2604.08504 2026-04-10 stat.ML cs.AI cs.CL cs.DS cs.LG

Differentially Private Language Generation and Identification in the Limit

Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou