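arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.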

2503.23495 2026-03-24 cs.CV

Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning

Ashim Dahal, Saydul Akbar Murad, Nick Rahimi

Comments accepted at MIV at CVPR 2025

Details

DOI: 10.1109/CVPRW67362.2025.00469
Journal ref: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Abstract

Understanding the representation shift in Vision Language Models like CLIP under different augmentations provides valuable insights for Mechanistic Interpretability. In this study, we show the shift in CLIP's embeddings under 9 common augmentation techniques: noise, blur, color jitter, scale and rotate, flip, elastic and perspective transforms, random brightness and contrast, and coarse dropout of pixel blocks. We scrutinize the embedding shifts under similarity on attention map, patch, edge, detail preservation, cosine similarity, L2 distance, pairwise distance and dendrogram clusters and provide qualitative analysis on sample images. Our findings suggest certain augmentations like noise, perspective transform and shift scaling have a more drastic impact on embedding shift. This study provides a concrete foundation for future work on VLM's robustness for mechanical interpretation and adversarial data defense. The code implementation for this study can be found at https://github.com/ashimdahal/clip-shift-analysis.
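Two of the metrics named in the abstract, cosine similarity and L2 distance, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); the function name `embedding_shift` and the toy vectors are hypothetical stand-ins for real CLIP embeddings of an image before and after augmentation.

```python
import numpy as np


def embedding_shift(original, augmented):
    """Return cosine similarity and L2 distance between two embedding vectors.

    Both inputs are assumed to be non-zero 1-D vectors of the same length.
    """
    original = np.asarray(original, dtype=np.float64)
    augmented = np.asarray(augmented, dtype=np.float64)
    # Cosine similarity: dot product divided by the product of the norms.
    cosine = float(
        np.dot(original, augmented)
        / (np.linalg.norm(original) * np.linalg.norm(augmented))
    )
    # L2 distance: Euclidean norm of the difference vector.
    l2 = float(np.linalg.norm(original - augmented))
    return {"cosine_similarity": cosine, "l2_distance": l2}


# Toy vectors standing in for embeddings (hypothetical values).
clean = np.array([1.0, 0.0, 0.0])
noisy = np.array([0.0, 1.0, 0.0])
shift = embedding_shift(clean, noisy)
```

A lower cosine similarity or a larger L2 distance between the clean and augmented embeddings indicates a larger shift.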

2503.13401 2026-03-24 cs.CL cs.AI

Levels of Analysis for Large Language Models

Alexander Y. Ku, Declan Campbell, Xuechunzi Bai, Jiayi Geng, Ryan Liu, Raja Marjieh, R. Thomas McCoy, Andrew Nam, Ilia Sucholutsky, Veniamin Veselovsky, Liyi Zhang, Jian-Qiao Zhu, Thomas L. Griffiths

2503.10475 2026-03-24 cs.RO cs.SY eess.SY

Stratified Topological Autonomy for Long-Range Coordination (STALC)

Cora A. Duggan, Adam Goertz, Adam Polevoy, Mark Gonzales, Kevin C. Wolfe, Bradley Woosley, John G. Rogers, Joseph Moore

Comments ©2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

2503.01013 2026-03-24 cs.LG

TimeXL: Explainable Multi-modal Time Series Prediction with LLM-in-the-Loop

Yushan Jiang, Wenchao Yu, Geon Lee, Dongjin Song, Kijung Shin, Wei Cheng, Yanchi Liu, Haifeng Chen

Comments NeurIPS 2025 camera ready version

2502.18220 2026-03-24 cs.CV cs.AI

UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking

He Wang, Tianyang Xu, Zhangyong Tang, Xiao-Jun Wu, Josef Kittler

2502.16772 2026-03-24 cs.LG

Model-Based Exploration in Monitored Markov Decision Processes

Alireza Kazemipour, Simone Parisi, Matthew E. Taylor, Michael Bowling

2502.11026 2026-03-24 cs.LG cs.AI cs.CL

RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

Yuhao Du, Zhuo Li, Pengyu Cheng, Zhihong Chen, Yuejiao Xie, Xiang Wan, Anningzhe Gao

Comments Published in TMLR-2026

2502.00618 2026-03-24 cs.CV cs.AI

DesCLIP: Robust Continual Learning via General Attribute Descriptions for VLM-Based Visual Recognition

Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li

Comments IEEE Transactions on Multimedia 2026

2501.00725 2026-03-24 cs.LG cs.CV

Automatic Construction of Pattern Classifiers Capable of Continuous Incremental Learning and Unlearning Tasks Based on Compact-Sized Probabilistic Neural Network

Tetsuya Hoya, Shunpei Morita

Comments A modified version appeared in the Proceedings of the AAIML-2026

2412.07971 2026-03-24 cs.LG cs.DC stat.ML

Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models

Heng Zhu, Harsh Vardhan, Arya Mazumdar

2412.02868 2026-03-24 cs.AI

PrecLLM: A Privacy-Preserving Framework for Efficient Clinical Annotation Extraction from Unstructured EHRs using Small-Scale LLMs

Yixiang Qu, Yifan Dai, Shilin Yu, Pradham Tanikella, Malvika Pillai, Walter Chen, Jialiu Xie, Yishan Ren, Duan Wang, Yikai Wang, Sid Sheth, Guanting Chen, Yufeng Liu, Travis Schrank, Trevor Hackman, Didong Li, Di Wu

2411.18064 2026-03-24 cs.CV

Lightweight Gaze Estimation Model Via Fusion Global Information

Zhang Cheng, Yanxia Wang

2411.16196 2026-03-24 cs.CV cs.LG

Learn from Foundation Model: Fruit Detection Model without Manual Annotation

Yanan Wang, Zhenghao Fei, Ruichen Li, Yibin Ying

Comments 35 pages, 11 figures

2411.11391 2026-03-24 cs.LG cs.AI

The GECo algorithm for Graph Neural Networks Explanation

Salvatore Calderaro, Domenico Amato, Giosuè Lo Bosco, Riccardo Rizzo, Filippo Vella

2411.01259 2026-03-24 cs.CL cs.CY

Diversidade linguística e inclusão digital: desafios para uma IA brasileira (Linguistic diversity and digital inclusion: challenges for a Brazilian AI)

Raquel Meister Ko Freitag

Comments In Portuguese. Paper accepted to LAAI-Ethics 2024

2410.08947 2026-03-24 cs.LG cs.AI

Meta-Transfer Learning Powered Temporal Graph Networks for Cross-City Real Estate Appraisal

Weijia Zhang, Jindong Han, Hao Liu, Wei Fan, Hao Wang, Hui Xiong

Comments Accepted by TIST 2026

2410.05824 2026-03-24 cs.CL

Multi-Session Client-Centered Treatment Outcome Evaluation in Psychotherapy

Hongbin Na, Tao Shen, Shumao Yu, Ling Chen

Comments Accepted at LREC 2026. Camera-ready Version

2409.19437 2026-03-24 cs.LG cs.AI cs.DS math.OC

Strongly-polynomial time and validation analysis of policy gradient methods

Caleb Ju, Guanghui Lan

Comments Updated manuscript with new experiments

2407.18707 2026-03-24 cs.LG stat.ML

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Steven Adams, Andrea Patanè, Morteza Lahijanian, Luca Laurenti

Abstract

Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $ε>0$ our approach is able to return a mixture of Gaussian processes that is $ε$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
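The abstract uses the Wasserstein distance to measure closeness between probabilistic models. As a small illustration only, and not the paper's algorithm, the sketch below evaluates the closed-form 2-Wasserstein distance between two one-dimensional Gaussians, W2 = sqrt((m1 - m2)^2 + (s1 - s2)^2). The choice of order 2, the restriction to one dimension, and the function name are assumptions made for this example.

```python
import math


def wasserstein2_gaussian_1d(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between N(mean_a, std_a^2) and N(mean_b, std_b^2)."""
    if std_a < 0 or std_b < 0:
        raise ValueError("standard deviations must be non-negative")
    # Closed form for univariate Gaussians: the means and the standard
    # deviations each contribute a squared difference.
    return math.sqrt((mean_a - mean_b) ** 2 + (std_a - std_b) ** 2)


# Example: N(0, 1^2) versus N(3, 5^2) gives sqrt(3^2 + 4^2).
distance = wasserstein2_gaussian_1d(0.0, 1.0, 3.0, 5.0)
```

In this simplified setting, two models are ε-close when the returned distance is at most ε.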

2407.08626 2026-03-24 cs.LG cs.RO

RoboMorph: Evolving Robot Morphology using Large Language Models

Kevin Qiu, Władysław Pałucki, Krzysztof Ciebiera, Paweł Fijałkowski, Marek Cygan, Łukasz Kuciński

2406.06999 2026-03-24 cs.CV

Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang

2406.01914 2026-03-24 cs.CV cs.AI cs.CL

HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task

Yu Tian, Tianqi Shao, Tsukasa Demizu, Xuyang Wu, Hsin-Tai Wu

Comments Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2026. This version includes major updates in methodology and experiments. The final version is available at IEEE Xplore

Details

DOI: 10.1109/TCSVT.2026.3675940
Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, Early Access, 2026

Abstract

Head pose estimation (HPE) requires a sophisticated understanding of 3D spatial relationships to generate precise yaw, pitch, and roll angles. Previous HPE models, primarily CNN-based, rely on cropped close-up human head images as inputs and often lack robustness in real-world scenarios. Vision Language Models (VLMs) can analyze entire images while focusing on specific objects through their attention mechanisms. In this paper, we propose a novel framework to improve the HPE accuracy by leveraging the object detection grounding capability of a VLM, referred to as CogVLM. We empirically find that direct LoRA fine-tuning of this VLM for the HPE task fails to achieve desirable HPE accuracy, while some model merging methods can improve accuracy but frequently produce blended invalid response formats, struggling to handle both object detection and HPE tasks simultaneously. To integrate HPE capability into CogVLM effectively, we develop a novel LoRA layer-based model merging method. This merging approach applies a high cosine similarity threshold and a 'winner-takes-all' layer selection strategy, aligning attention to the HPE task while preserving original object detection knowledge. It successfully resolves issues with blended invalid response formats and improves accuracy. Results show that our HPE-CogVLM achieves a 31.5% reduction in Mean Absolute Error over the current state-of-the-art CNN model, 6DRepNet, in cross-dataset evaluation. Furthermore, HPE-CogVLM outperforms both directly LoRA fine-tuned and task arithmetic-based merged VLMs across all HPE metrics.

2405.13859 2026-03-24 cs.CV

Accurate Quantization for Gait Representation Learning

S. Tian, H. Gao, G. Hong, S. Wang, J. Wang, X. Yu, S. Zhang

2403.10889 2026-03-24 cs.LG stat.ML

List Sample Compression and Uniform Convergence

Steve Hanneke, Shay Moran, Tom Waknine

2402.15127 2026-03-24 cs.LG cs.IT math.IT stat.ML

Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention

Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan

Comments 36 pages

2310.15190 2026-03-24 cs.RO

Fast Path Planning for Autonomous Vehicle Parking with Safety-Guarantee using Hamilton-Jacobi Reachability

Xuemin Chi, Jun Zeng, Jihao Huang, Zhitao Liu, Hongye Su

Comments accepted by IEEE Transactions on Vehicular Technology

2308.03527 2026-03-24 cs.AI

Exploring ChatGPT's Empathic Abilities

Kristina Schaaff, Caroline Reinig, Tim Schlippe

2302.13001 2026-03-24 cs.LG cs.AI

Better Generative Replay for Continual Federated Learning

Daiqing Qi, Handong Zhao, Sheng Li

2211.16715 2026-03-24 cs.LG cs.AI math.OC

Policy Optimization over General State and Action Spaces

Caleb Ju, Guanghui Lan

Comments Writing updates and new experimental results

2209.04999 2026-03-24 cs.RO cs.AI

Multi-Step First: A Lightweight Deep Reinforcement Learning Strategy for Robust Continuous Control with Partial Observability

Lingheng Meng, Rob Gorbet, Michael Burke, Dana Kulić

Comments 21 pages, 12 figures. Published in Neural Networks, Vol. 199, 2026