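arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.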

2601.22674 2026-03-31 cs.CV

VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration

Hanxun Yu, Wentong Li, Xuan Qu, Song Wang, Junbo Chen, Jianke Zhu

Comments ICLR2026, Code Link: https://github.com/hanxunyu/VisionTrim

Abstract

Multimodal large language models (MLLMs) suffer from high computational costs due to excessive visual tokens, particularly in high-resolution and video-based scenarios. Existing token reduction methods typically focus on isolated pipeline components and often neglect textual alignment, leading to performance degradation. In this paper, we propose VisionTrim, a unified framework for training-free MLLM acceleration that integrates two effective plug-and-play modules: 1) the Dominant Vision Token Selection (DVTS) module, which preserves essential visual tokens via a global-local view, and 2) the Text-Guided Vision Complement (TGVC) module, which performs context-aware token merging guided by textual cues. Extensive experiments across diverse image and video multimodal benchmarks demonstrate the superior performance of VisionTrim, advancing practical MLLM deployment in real-world applications. The code is available at: https://github.com/hanxunyu/VisionTrim.
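
The two-stage idea described in the abstract (keep globally dominant vision tokens, then fold the remaining ones into a few text-relevant tokens) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' DVTS or TGVC implementation: the per-token importance scores (for example, attention weights from a [CLS] token) are assumed to be precomputed, the text query is assumed to be the mean of the text token embeddings, and the function `compress_tokens` and its parameters `keep_k` and `merge_m` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def compress_tokens(
    tokens: Sequence[Sequence[float]],
    importance: Sequence[float],
    text_tokens: Sequence[Sequence[float]],
    keep_k: int,
    merge_m: int,
) -> List[Vector]:
    """Hypothetical two-stage compression: top-k selection, then text-guided merging."""
    # Stage 1 (assumed global score): keep the keep_k most important vision tokens.
    order = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    kept = [list(tokens[i]) for i in order[:keep_k]]
    leftover = order[keep_k:]
    if not leftover or merge_m <= 0:
        return kept

    # Stage 2: rank leftover tokens by similarity to a pooled text query and
    # use the merge_m most text-relevant ones as merge centres.
    text_query = mean_vector(text_tokens)
    by_text = sorted(leftover, key=lambda i: cosine(tokens[i], text_query), reverse=True)
    centres = by_text[:merge_m]
    groups: Dict[int, List[int]] = {c: [c] for c in centres}

    # Assign every other leftover token to its most similar centre.
    for i in by_text[merge_m:]:
        best = max(centres, key=lambda c: cosine(tokens[i], tokens[c]))
        groups[best].append(i)

    # Average each group into a single merged token.
    merged = [mean_vector([tokens[i] for i in groups[c]]) for c in centres]
    return kept + merged
```

For instance, with six 2-D tokens, `keep_k=2` and `merge_m=1` return three tokens: the two highest-scoring ones plus one average of all leftovers, centred on the leftover most similar to the text query. Choosing merge centres by text similarity is what makes the compression text-aware in this sketch; the actual modules may score and merge tokens differently.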

2601.21956 2026-03-31 cs.LG

Uncertainty-Aware Data-Based Method for Fast and Reliable Shape Optimization

Yunjia Yang, Runze Li, Yufei Zhang, Haixin Chen

2601.20451 2026-03-31 cs.CL

MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues

Diandian Guo, Fangfang Yuan, Cong Cao, Xixun Lin, Chuan Zhou, Hao Peng, Yanan Cao, Yanbing Liu

Comments 12 pages, 7 figures. Accepted by WWW 2026

2601.17470 2026-03-31 cs.CV

PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors

Chia-Ming Lee, Yu-Fan Lin, Yu-Jou Hsiao, Jin-Hui Jiang, Yu-Lun Liu, Chih-Chung Hsu

Comments CVPR 2026 Camera Ready; Project Page: https://ming053l.github.io/PhaSR_github

2601.11396 2026-03-31 cs.CV

SUG-Occ: Explicit Semantics and Uncertainty Guided Sparse Learning for Efficient 3D Occupancy Prediction

Hanlin Wu, Pengfei Lin, Ehsan Javanmardi, Naren Bao, Bo Qian, Hao Si, Manabu Tsukada

2601.10079 2026-03-31 cs.LG cs.AI cs.CL

Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts

Sijia Luo, Xiaokang Zhang, Yuxuan Hu, Bohan Zhang, Ke Wang, Jinbo Su, Mengshu Sun, Lei Liang, Jing Zhang

2601.08120 2026-03-31 cs.LG

Structure Detection for Contextual Reinforcement Learning

Tianyue Zhou, Jung-Hoon Cho, Cathy Wu

DOI: 10.1609/aaai.v40i34.40137
Journal ref: AAAI 2026

Abstract

Contextual Reinforcement Learning (CRL) tackles the problem of solving a set of related Contextual Markov Decision Processes (CMDPs) that vary across different context variables. Traditional approaches, namely independent training and multi-task learning, struggle with either excessive computational costs or negative transfer. A recently proposed multi-policy approach, Model-Based Transfer Learning (MBTL), has demonstrated effectiveness by strategically selecting a few tasks to train on and transferring zero-shot to the remaining tasks. However, CMDPs encompass a wide range of problems whose structural properties vary from problem to problem. As such, different task selection strategies suit different CMDPs. In this work, we introduce Structure Detection MBTL (SD-MBTL), a generic framework that dynamically identifies the underlying generalization structure of a CMDP and selects an appropriate MBTL algorithm. For instance, we observe a Mountain structure, in which generalization performance degrades from the training performance of the target task as the context difference increases. We thus propose M/GP-MBTL, which detects this structure and adaptively switches between a Gaussian Process-based approach and a clustering-based approach. Extensive experiments on synthetic data and CRL benchmarks, covering continuous control, traffic control, and agricultural management, show that M/GP-MBTL surpasses the strongest prior method by 12.49% on the aggregated metric. These results highlight the promise of online structure detection for guiding source task selection in complex CRL environments.
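
The Mountain structure can be made concrete with a toy model. The sketch below assumes (this is not stated in the abstract) that a policy trained on source context c_s and deployed on target context c_t achieves its training performance minus a linear penalty `slope * |c_s - c_t|`, and it selects source tasks greedily to maximize the mean best-case estimate over all targets. It is a simplified illustration of structure-aware source task selection, not the authors' M/GP-MBTL, which switches between a Gaussian Process-based and a clustering-based strategy; the functions `transfer_estimate`, `aggregate` and `greedy_select` and the `slope` and `budget` parameters are hypothetical.

```python
from typing import List, Sequence


def transfer_estimate(
    train_perf: Sequence[float],
    contexts: Sequence[float],
    source: int,
    target: int,
    slope: float,
) -> float:
    """Assumed Mountain-shaped model: performance peaks at the source context
    and decreases linearly with the context distance to the target."""
    return train_perf[source] - slope * abs(contexts[source] - contexts[target])


def aggregate(
    selected: Sequence[int],
    train_perf: Sequence[float],
    contexts: Sequence[float],
    slope: float,
) -> float:
    """Mean over all targets of the best estimated transfer from any selected source.
    Requires a non-empty selection."""
    best = [
        max(transfer_estimate(train_perf, contexts, s, t, slope) for s in selected)
        for t in range(len(contexts))
    ]
    return sum(best) / len(best)


def greedy_select(
    train_perf: Sequence[float],
    contexts: Sequence[float],
    slope: float,
    budget: int,
) -> List[int]:
    """Greedily add the source task that most improves the aggregated estimate."""
    selected: List[int] = []
    for _ in range(budget):
        candidates = [s for s in range(len(contexts)) if s not in selected]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda s: aggregate(selected + [s], train_perf, contexts, slope),
        )
        selected.append(best)
    return selected
```

With five evenly spaced contexts, uniform training performance and `budget=1`, the sketch picks the middle context (index 2), since it minimizes the total distance to every other target. If the true generalization profile is not Mountain-shaped, this linear model would be a poor fit, which is why the abstract argues for detecting the structure before choosing a selection strategy.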

2601.06932 2026-03-31 cs.CL cs.AI

Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching

Stephen Gadd

Comments 19 pages, 3 tables

2601.06853 2026-03-31 cs.CL cs.LG

†DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems

Zabir Al Nazi, Shubhashis Roy Dipta, Sudipta Kar

2601.05866 2026-03-31 cs.CL

FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG

Maxime Dassen, Rebecca Kotula, Kenton Murray, Andrew Yates, Dawn Lawrie, Efsun Kayi, James Mayfield, Kevin Duh

Comments Accepted at ECIR 2026. 13 pages, 2 figures

2601.03596 2026-03-31 cs.CV

Adaptive Attention Distillation for Robust Few-Shot Segmentation under Environmental Perturbations

Qianyu Guo, Jingrong Wu, Jieji Ren, Weifeng Ge, Wenqiang Zhang

Comments 12 pages, 5 figures

2601.02856 2026-03-31 cs.LG

Electricity Price Forecasting: Bridging Linear Models, Neural Networks and Online Learning

Btissame El Mahtout, Florian Ziel

2601.01146 2026-03-31 cs.LG

Self-Training the Neurochaos Learning Algorithm

Anusree M, Akhila Henry, Pramod P Nair

2601.01095 2026-03-31 cs.CV cs.LG

NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding

Hyeonjeong Ha, Jinjin Ge, Bo Feng, Kaixin Ma, Gargi Chakraborty

Comments Project Page: https://github.com/apple/ml-NarrativeTrack

2512.24143 2026-03-31 cs.CL

Activation Steering for Masked Diffusion Language Models

Adi Shnaidman, Erin Feiglin, Osher Yaari, Efrat Mentel, Amit Levi, Raz Lapid

Comments Accepted at ReALM-GEN @ ICLR 2026

2512.22065 2026-03-31 cs.CV cs.AI cs.HC

StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars

Zhiyao Sun, Ziqiao Peng, Yifeng Ma, Yi Chen, Zhengguang Zhou, Zixiang Zhou, Guozhen Zhang, Youliang Zhang, Yuan Zhou, Qinglin Lu, Yong-Jin Liu

Comments Accepted by CVPR 2026. Project page: https://streamavatar.github.io

2512.21326 2026-03-31 cs.LG cs.AI cs.CL stat.ML

Measuring all the noises of LLM Evals

Sida Wang

2512.13680 2026-03-31 cs.CV

LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction

Tianye Ding, Yiming Xie, Yiqing Liang, Moitreya Chatterjee, Pedro Miraldo, Huaizu Jiang

Comments CVPR 2026, 16 pages

2512.12812 2026-03-31 cs.CL cs.AI

Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, and LLaMA

Hanyu Cai, Binqi Shen, Lier Jin, Lan Hu, Xiaojing Fan

2512.12378 2026-03-31 cs.CV

M4Human: A Large-Scale Multimodal mmWave Radar Benchmark for Human Mesh Reconstruction

Junqiao Fan, Yunjiao Zhou, Yizhuo Yang, Xinyuan Cui, Jiarui Zhang, Lihua Xie, Jianfei Yang, Chris Xiaoxuan Lu, Fangqiang Ding

2512.10950 2026-03-31 cs.CV

E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

Qitao Zhao, Hao Tan, Qianqian Wang, Sai Bi, Kai Zhang, Kalyan Sunkavalli, Shubham Tulsiani, Hanwen Jiang

Comments CVPR 2026 Camera-ready. Project website: https://qitaozhao.github.io/E-RayZer

2512.10652 2026-03-31 cs.CV cs.CR

TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection

Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Zou, Ling Lo, Sheng-Ping Yang, Yu-Wen Tseng, Kun-Hsiang Lin, Chia-Ling Chen, Yu-Ting Ta, Yan-Tsung Wang, Po-Ching Chen, Hongxia Xie, Hong-Han Shuai, Wen-Huang Cheng

Comments CVPR 2026

2512.09804 2026-03-31 cs.CL cs.LG

OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations

Jens Albrecht, Robert Lehmann, Aleksandra Poltermann, Eric Rudolph, Philipp Steigerwald, Mara Stieler

Comments Accepted at SoCon-NLPSI@LREC 2026

2512.08503 2026-03-31 cs.CV cs.AI

Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models

Jiaming Zhang, Che Wang, Yang Cao, Longtao Huang, Wei Yang Bryan Lim

Comments ICLR 2026

2512.05790 2026-03-31 cs.LG physics.data-an

Learnability Window in Gated Recurrent Neural Networks

Lorenzo Livi

Comments clarified language and minor fixes throughout

2512.05422 2026-03-31 cs.CV

ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction

Jiangtong Tan, Lin Liu, Jie Huanng, Xiaopeng Zhang, Qi Tian, Feng Zhao

2512.03350 2026-03-31 cs.CV

SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation

Yu Yuan, Tharindu Wickremasinghe, Zeeshan Nadir, Xijun Wang, Yiheng Chi, Stanley H. Chan

Comments Accepted by CVPR 2026. Camera-Ready Version. Project Page: https://yuyuanspace.com/SeeU/

2512.03336 2026-03-31 cs.LG cs.AI stat.ML

Single-Round Scalable Analytic Federated Learning

Alan T. L. Bacellar, Mustafa Munir, Felipe M. G. França, Priscila M. V. Lima, Radu Marculescu, Lizy K. John

Comments To appear in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

2512.01022 2026-03-31 cs.RO

CycleManip: Enabling Cyclic Task Manipulation via Effective Historical Perception and Understanding

Yi-Lin Wei, Haoran Liao, Yuhao Lin, Pengyue Wang, Zhizhao Liang, Guiliang Liu, Wei-Shi Zheng

Comments Accepted by CVPR2026. Project page: https://isee-laboratory.github.io/CycleManip/

2511.22950 2026-03-31 cs.CV cs.RO

RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video

Haiyang Mei, Qiming Huang, Hai Ci, Mike Zheng Shou

Comments CVPR 2026. Project page: https://github.com/showlab/RobotSeg