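arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.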

2603.01236 2026-03-03 cs.CV cs.LG

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Changwoo Baek, Jouwon Song, Sohyeon Kim, Kyeongbo Kong

Comments Accepted to ICLR 2026

Abstract

Large Vision-Language Models (LVLMs) have adopted visual token pruning strategies to mitigate substantial computational overhead incurred by extensive visual token sequences. While prior works primarily focus on either attention-based or diversity-based pruning methods, in-depth analysis of these approaches' characteristics and limitations remains largely unexplored. In this work, we conduct a thorough empirical analysis using effective rank (erank) as a measure of feature diversity and attention score entropy to investigate visual token processing mechanisms and analyze the strengths and weaknesses of each approach. Our analysis reveals two insights: (1) Our erank-based quantitative analysis shows that many diversity-oriented pruning methods preserve substantially less feature diversity than intended; moreover, analysis using the CHAIR dataset reveals that the diversity they do retain is closely tied to increased hallucination frequency compared to attention-based pruning. (2) We further observe that attention-based approaches are more effective on simple images where visual evidence is concentrated, while diversity-based methods better handle complex images with distributed features. Building on these empirical insights, we show that incorporating image-aware adjustments into existing hybrid pruning strategies consistently improves their performance. We also provide a minimal instantiation of our empirical findings through a simple adaptive pruning mechanism, which achieves strong and reliable performance across standard benchmarks as well as hallucination-specific evaluations. Our project page is available at https://cvsp-lab.github.io/AgilePruner.
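
The two diagnostics named in the abstract can be illustrated with a minimal sketch. It assumes the common definition of effective rank (the exponential of the Shannon entropy of the normalized singular values) and plain Shannon entropy over a normalized attention vector; the paper's exact formulation may differ, and the function names `effective_rank` and `attention_entropy` are hypothetical.

```python
import numpy as np


def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank of a (tokens x dim) feature matrix.

    Assumed definition: exp(H(p)), where p is the singular-value
    distribution normalized to sum to 1 and H is Shannon entropy.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # drop numerically zero components before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))


def attention_entropy(scores: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of non-negative attention scores over visual tokens."""
    p = scores / (scores.sum() + eps)
    p = p[p > eps]
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(64, 32))  # hypothetical visual token features
    print("erank:", effective_rank(tokens))
    print("entropy:", attention_entropy(rng.random(64)))
```

Under these definitions, a higher effective rank means the retained tokens span more feature directions, and a lower attention entropy means the attention mass is concentrated on a few tokens, which matches the abstract's contrast between simple and complex images.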

2603.01225 2026-03-03 cs.CL

Can Thinking Models Think to Detect Hateful Memes?

Mohamed Bayan Kmainasi, Mucahid Kutlu, Ali Ezzat Shahroor, Abul Hasnat, Firoj Alam

2603.01224 2026-03-03 cs.CV cs.AI cs.HC cs.LG cs.RO

Monocular 3D Object Position Estimation with VLMs for Human-Robot Interaction

Ari Wahl, Dorian Gawlinski, David Przewozny, Paul Chojecki, Felix Bießmann, Sebastian Bosse

Comments Accepted at Workshop on Integrating Image Processing with Large-Scale Language/Vision Models for Advanced Visual Understanding (LVLM) at IEEE International Conference on Image Processing (ICIP) 2025

2603.01220 2026-03-03 cs.CL

Generative AI & Fictionality: How Novels Power Large Language Models

Edwin Roland, Richard Jean So

2603.01212 2026-03-03 cs.CL

XAI-enhanced Comparative Opinion Mining via Aspect-based Scoring and Semantic Reasoning

Ngoc-Quang Le, T. Thanh-Lam Nguyen, Quoc-Trung Phu, Thi-Phuong Le, Duy-Cat Can, Hoang-Quynh Le

2603.01205 2026-03-03 cs.CV

CoSMo3D: Open-World Promptable 3D Semantic Part Segmentation through LLM-Guided Canonical Spatial Modeling

Li Jin, Weikai Chen, Yujie Wang, Yingda Yin, Zeyu Hu, Runze Zhang, Keyang Luo, Shengju Qian, Xin Wang, Xueying Qin

2603.01201 2026-03-03 cs.AI

Incremental LTLf Synthesis

Giuseppe De Giacomo, Yves Lespérance, Gianmarco Parretti, Fabio Patrizi, Moshe Y. Vardi

2603.01195 2026-03-03 cs.CV cs.AI

VisNec: Measuring and Leveraging Visual Necessity for Multimodal Instruction Tuning

Mingkang Dong, Hongyi Cai, Jie Li, Sifan Zhou, Bin Ren, Kunyu Peng, Yuqian Fu

Comments 17 pages, 4 figures

2603.01194 2026-03-03 cs.CV

RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations

Mochu Xiang, Zhelun Shen, Xuesong Li, Jiahui Ren, Jing Zhang, Chen Zhao, Shanshan Liu, Haocheng Feng, Jingdong Wang, Yuchao Dai

Comments Accepted to CVPR 2026

2603.01190 2026-03-03 cs.CL

Reasoning or Rationalization? The Role of Justifications in Masked Diffusion Models for Fact Verification

Jacob Devasier

2603.01189 2026-03-03 cs.RO cs.HC

Agent-Based Simulation of Trust Development in Human-Robot Teams: An Empirically-Validated Framework

Ravi Kalluri

2603.01185 2026-03-03 cs.CL cs.AI cs.CR

Token-level Data Selection for Safe LLM Fine-tuning

Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang

Comments Accepted by ICLR 2026

2603.01184 2026-03-03 cs.LG cs.AI q-bio.NC stat.CO

Scaling of learning time for high dimensional inputs

Carlos Stein Brito

Comments 14 pages, 5 figures

2603.01178 2026-03-03 cs.RO

riMESA: Consensus ADMM for Real-World Collaborative SLAM

Daniel McGann, Michael Kaess

2603.01174 2026-03-03 cs.CV

VP-Hype: A Hybrid Mamba-Transformer Framework with Visual-Textual Prompting for Hyperspectral Image Classification

Abdellah Zakaria Sellam, Fadi Abdeladhim Zidi, Salah Eddine Bekhouche, Ihssen Houhou, Marouane Tliba, Cosimo Distante, Abdenour Hadid

2603.01171 2026-03-03 cs.LG cs.CC cs.NE

PARWiS: Winner determination under shoestring budgets using active pairwise comparisons

Shailendra Bhandari

Comments 12 pages

2603.01169 2026-03-03 cs.CV cs.AI cs.LG

TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization

Sumin Kim, Hyemin Jeong, Mingu Kang, Yejin Kim, Yoori Oh, Joonseok Lee

Comments Published as a Conference Paper at ICLR 2026

2603.01167 2026-03-03 cs.CL

DEP: A Decentralized Large Language Model Evaluation Protocol

Jianxiang Peng, Junhao Li, Hongxiang Wang, Haocheng Lyu, Hui Guo, Siyi Hao, Zhen Wang, Chuang Liu, Shaowei Zhang, Bojian Xiong, Yue Chen, Zhuowen Han, Ling Shi, Tianyu Dong, Juesi Xiao, Lei Yang, Yuqi Ren, Deyi Xiong

2603.01163 2026-03-03 cs.CV

BeautyGRPO: Aesthetic Alignment for Face Retouching via Dynamic Path Guidance and Fine-Grained Preference Modeling

Jiachen Yang, Xianhui Lin, Yi Dong, Zebiao Zheng, Xing Liu, Hong Gu, Yanmei Fang

Comments Accepted by CVPR 2026

2603.01161 2026-03-03 cs.CV cs.AI

GRAD-Former: Gated Robust Attention-based Differential Transformer for Change Detection

Durgesh Ameta, Ujjwal Mishra, Praful Hambarde, Amit Shukla

Comments This work has been submitted to the IEEE for possible publication

2603.01160 2026-03-03 cs.AI cs.CL

Semantic XPath: Structured Agentic Memory Access for Conversational AI

Yifan Simon Liu, Ruifan Wu, Liam Gallagher, Jiazhou Liang, Armin Toroghi, Scott Sanner

2603.01153 2026-03-03 cs.RO

RAG-RUSS: A Retrieval-Augmented Robotic Ultrasound for Autonomous Carotid Examination

Dianye Huang, Ziping Cong, Nassir Navab, Zhongliang Jiang

Comments Accepted by ICRA

2603.01152 2026-03-03 cs.AI

DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent

Tongzhou Wu, Yuhao Wang, Xinyu Ma, Xiuqiang He, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao

Comments 6 pages, 4 figures

2603.01151 2026-03-03 cs.RO cs.CV cs.GR

D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping

Haozhe Lou, Mingtong Zhang, Haoran Geng, Hanyang Zhou, Sicheng He, Zhiyuan Gao, Siheng Zhao, Jiageng Mao, Pieter Abbeel, Jitendra Malik, Daniel Seita, Yue Wang

Comments ICLR 2026 Poster

2603.01144 2026-03-03 cs.LG

A Decomposition Framework for Certifiably Optimal Orthogonal Sparse PCA

Difei Cheng, Qiao Hu

Comments 14 pages; 12 figures

2603.01143 2026-03-03 cs.CV cs.AI

TC-SSA: Token Compression via Semantic Slot Aggregation for Gigapixel Pathology Reasoning

Zhuo Chen, Shawn Young, Lijian Xu

Comments 8 pages, 4 figures, 2 tables

2603.01140 2026-03-03 cs.CV

Teacher-Guided Causal Interventions for Image Denoising: Orthogonal Content-Noise Disentanglement in Vision Transformers

Kuai Jiang, Zhaoyan Ding, Guijuan Zhang, Dianjie Lu, Zhuoran Zheng

2603.01137 2026-03-03 cs.LG cs.AI

A Deep Learning Framework for Heat Demand Forecasting using Time-Frequency Representations of Decomposed Features

Adithya Ramachandran, Satyaki Chatterjee, Thorkil Flensmark B. Neergaard, Maximilian Oberndoerfer, Andreas Maier, Siming Bayer

Details

DOI: 10.1016/j.egyai.2026.100704
Journal ref: Energy and AI Volume 24, May 2026, 100704

Abstract

District Heating Systems are essential infrastructure for delivering heat to consumers across a geographic region sustainably, yet efficient management relies on optimizing diverse energy sources, such as wood, gas, electricity, and solar, in response to fluctuating demand. Aligning supply with demand is critical not only for ensuring reliable heat distribution but also for minimizing carbon emissions and extending infrastructure lifespan through lower operating temperatures. However, accurate multi-step forecasting to support these goals remains challenging due to complex, non-linear usage patterns and external dependencies. In this work, we propose a novel deep learning framework for day-ahead heat demand prediction that leverages time-frequency representations of historical data. By applying Continuous Wavelet Transform to decomposed demand and external meteorological factors, our approach enables Convolutional Neural Networks to learn hierarchical temporal features that are often inaccessible to standard time domain models. We systematically evaluate this method against statistical baselines, state-of-the-art Transformers, and emerging foundation models using multi-year data from three distinct Danish districts, a Danish city, and a German city. The results show a significant advancement, reducing the Mean Absolute Error by 36% to 43% compared to the strongest baselines, achieving forecasting accuracy of up to 95% across annual test datasets. Qualitative and statistical analyses further confirm the accuracy and robustness by reliably tracking volatile demand peaks where others fail. This work contributes both a high-performance forecasting architecture and critical insights into optimal feature composition, offering a validated solution for modern energy applications.
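
The idea of feeding a CNN a time-frequency representation can be sketched as follows. This is only an illustration under stated assumptions: it uses a hand-rolled Morlet wavelet, a synthetic hourly signal, and a hypothetical helper `morlet_scalogram`. The abstract does not specify the wavelet, the scales, the decomposition step, or the network, so none of these choices should be read as the authors' pipeline.

```python
import numpy as np


def morlet_scalogram(signal, scales, w0: float = 6.0) -> np.ndarray:
    """Magnitude of a Continuous Wavelet Transform with a Morlet wavelet.

    Returns an array of shape (len(scales), len(signal)): a 2D
    time-frequency image that a CNN could take as input.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the constant offset before transforming
    max_half = (len(x) - 1) // 2  # keep each kernel no longer than the signal
    rows = []
    for s in scales:
        half = min(int(np.ceil(4 * s)), max_half)
        t = np.arange(-half, half + 1) / s
        # Complex Morlet wavelet sampled at scale s, with 1/sqrt(s) normalization.
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(s)
        rows.append(np.abs(np.convolve(x, wavelet, mode="same")))
    return np.stack(rows)


if __name__ == "__main__":
    hours = np.arange(24 * 14)  # two weeks of synthetic hourly demand
    demand = 10.0 + np.sin(2 * np.pi * hours / 24)  # daily cycle only
    scalogram = morlet_scalogram(demand, scales=[3, 6, 12, 23, 46])
    print(scalogram.shape)
```

For this synthetic daily cycle, the scale closest to a 24-hour period (23 in the list above) should carry most of the energy, which is the kind of structure a 2D convolution can pick up from the scalogram.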

2603.01135 2026-03-03 cs.AI

FCN-LLM: Empower LLM for Brain Functional Connectivity Network Understanding via Graph-level Multi-task Instruction Tuning

Xingcan Hu, Wei Wang, Li Xiao

2603.01128 2026-03-03 cs.RO

A Deployable Bio-inspired Compliant Leg Design for Enhanced Leaping in Quadruped Robots

Yiyang Chen, Yuxin Liu, Jinzheng Zhou, Fanxin Wang, Qinglei Bu, Jie Sun, Yikun Cheng