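arXivDaily, a daily academic digest: it syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.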

2603.20433 2026-03-24 cs.SD cs.AI cs.CL eess.AS

ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models' In-Context Learning Ability

Yen-Ting Piao, Jay Chiehen Liao, Wei-Tang Chien, Toshiki Ogimoto, Shang-Tse Chen, Yun-Nung Chen, Chun-Yi Lee, Shao-Yuan Lo

Comments Submitted to Interspeech 2026

2603.20432 2026-03-24 cs.CL cs.AI

Coding Agents are Effective Long-Context Processors

Weili Cao, Xunjian Yin, Bhuwan Dhingra, Shuyan Zhou

2603.20428 2026-03-24 cs.CV

Benchmarking Efficient & Effective Camera Pose Estimation Strategies for Novel View Synthesis

Jhacson Meza, Martin R. Oswald, Torsten Sattler

2603.20425 2026-03-24 cs.AI

Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Making

Karan Kumar Singh, Nikita Gajbhiye

Comments 25 pages, 12 figures, 2 tables. Submitted for academic publication

2603.20422 2026-03-24 cs.CV cs.AI cs.IR

PEARL: Personalized Streaming Video Understanding Model

Yuanhong Zheng, Ruichuan An, Xiaopeng Lin, Yuxing Liu, Sihan Yang, Huanyu Zhang, Haodong Li, Qintong Zhang, Renrui Zhang, Guopeng Li, Yifan Zhang, Yuheng Li, Wentao Zhang

Comments Arxiv Submission

Abstract

Human cognition of new concepts is inherently a streaming process: we continuously recognize new objects or identities and update our memories over time. However, current multimodal personalization methods are largely limited to static images or offline videos. This disconnects continuous visual input from instant real-world feedback, limiting their ability to provide the real-time, interactive personalized responses essential for future AI assistants. To bridge this gap, we first propose and formally define the novel task of Personalized Streaming Video Understanding (PSVU). To facilitate research in this new direction, we introduce PEARL-Bench, the first comprehensive benchmark designed specifically to evaluate this challenging setting. It evaluates a model's ability to respond to personalized concepts at exact timestamps under two modes: (1) Frame-level, focusing on a specific person or object in discrete frames, and (2) a novel Video-level, focusing on personalized actions unfolding across continuous frames. PEARL-Bench comprises 132 unique videos and 2,173 fine-grained annotations with precise timestamps. Concept diversity and annotation quality are strictly ensured through a combined pipeline of automated generation and human verification. To tackle this challenging new setting, we further propose PEARL, a plug-and-play, training-free strategy that serves as a strong baseline. Extensive evaluations across 8 offline and online models demonstrate that PEARL achieves state-of-the-art performance. Notably, it brings consistent PSVU improvements when applied to 3 distinct architectures, proving to be a highly effective and robust strategy. We hope this work advances vision-language model (VLM) personalization and inspires further research into streaming personalized AI assistants. Code is available at https://github.com/Yuanhong-Zheng/PEARL.

2603.20418 2026-03-24 cs.LG cs.NA math.NA

Data-driven discovery of roughness descriptors for surface characterization and intimate contact modeling of unidirectional composite tapes

Sebastian Rodriguez, Mikhael Tannous, Jad Mounayer, Camilo Cruz, Anais Barasinski, Francisco Chinesta

2603.20406 2026-03-24 cs.LG cs.AI

Thinking in Different Spaces: Domain-Specific Latent Geometry Survives Cross-Architecture Translation

Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee

Abstract

We investigate whether independently trained language models converge to geometrically compatible latent representations, and whether this compatibility can be exploited to correct model behavior at inference time without any weight updates. We learn a linear projection matrix that maps activation vectors from a large teacher model into the coordinate system of a smaller student model, then intervene on the student's residual stream during generation by substituting its internal state with the translated teacher representation. Across a fully crossed experimental matrix of 20 heterogeneous teacher-student pairings spanning mixture-of-experts, dense, code-specialized, and synthetically trained architectures, the Ridge projection consistently achieves R^2 = 0.50 on verbal reasoning and R^2 = 0.40 on mathematical reasoning, collapsing to R^2 = -0.22 under permutation control and R^2 = 0.01 under L_1 regularization. Behavioral correction rates range from 14.0% to 50.0% on TruthfulQA (mean 25.2%) and from 8.5% to 43.3% on GSM8K arithmetic reasoning (mean 25.5%), demonstrating that the method generalizes across fundamentally different reasoning domains. We report a near-zero correlation between geometric alignment quality and behavioral correction rate (r = -0.07), revealing a dissociation between representation space fidelity and output space impact. Intervention strength is architecture-specific: student models exhibit characteristic sensitivity profiles that invert across domains, with the most steerable verbal student becoming the least steerable mathematical student. Finally, a double dissociation experiment conducted across all 20 model pairings confirms without exception that projection matrices collapse catastrophically when transferred across reasoning domains (mean R^2 = -3.83 in both transfer directions), establishing domain-specific subspace geometry as a universal property of LMs.
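The projection-plus-permutation-control idea in this abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code. The hidden sizes, the ridge strength `lam`, the pooled multi-output R^2, and every helper name (`fit_ridge`, `make_pairs`, `solve`, and so on) are assumptions made for illustration. The sketch fits a closed-form ridge map W = (X^T X + lam*I)^-1 X^T Y from "teacher" activations X to "student" activations Y and scores it by held-out R^2. It then shuffles the row pairing, refits, and re-scores, which is what a permutation control checks.

```python
import random

def solve(a, b):
    """Solve a @ x = b (a: square list of rows, b: matrix) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]  # left block is now identity; right block is x

def fit_ridge(x, y, lam):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y (teacher dims -> student dims)."""
    d_in, d_out = len(x[0]), len(y[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0) for j in range(d_in)]
           for i in range(d_in)]
    xty = [[sum(rx[i] * ry[j] for rx, ry in zip(x, y)) for j in range(d_out)]
           for i in range(d_in)]
    return solve(xtx, xty)

def predict(x, w):
    """Project each teacher activation row into the student coordinate system."""
    return [[sum(xi * w[i][j] for i, xi in enumerate(row)) for j in range(len(w[0]))] for row in x]

def r2(y_true, y_pred):
    """Pooled R^2 over all output dims: 1 - SS_res / SS_tot, with SS_tot around per-dim means."""
    k = len(y_true[0])
    means = [sum(r[j] for r in y_true) / len(y_true) for j in range(k)]
    ss_res = sum((t - p) ** 2 for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    ss_tot = sum((t - means[j]) ** 2 for rt in y_true for j, t in enumerate(rt))
    return 1.0 - ss_res / ss_tot

def make_pairs(n, d_teacher, d_student, rng, a):
    """Hypothetical paired activations: student = teacher @ a + small Gaussian noise."""
    x = [[rng.gauss(0, 1) for _ in range(d_teacher)] for _ in range(n)]
    y = [[sum(xi * a[i][j] for i, xi in enumerate(row)) + 0.1 * rng.gauss(0, 1)
          for j in range(d_student)] for row in x]
    return x, y

rng = random.Random(0)
D_T, D_S = 6, 4  # hypothetical teacher / student hidden sizes
A = [[rng.gauss(0, 1) for _ in range(D_S)] for _ in range(D_T)]
x_tr, y_tr = make_pairs(300, D_T, D_S, rng, A)
x_te, y_te = make_pairs(100, D_T, D_S, rng, A)

w = fit_ridge(x_tr, y_tr, lam=1.0)
aligned = r2(y_te, predict(x_te, w))

# Permutation control: break the teacher/student row correspondence, refit, re-score.
y_shuf = y_tr[:]
rng.shuffle(y_shuf)
w_perm = fit_ridge(x_tr, y_shuf, lam=1.0)
permuted = r2(y_te, predict(x_te, w_perm))
print(f"aligned R^2 = {aligned:.3f}, permuted R^2 = {permuted:.3f}")
```

Because the synthetic pairs are almost exactly linear, the aligned held-out R^2 should land close to 1. After shuffling, it should fall to around zero or slightly below. This mirrors the kind of collapse the abstract reports under permutation control. The abstract's own figures come from real model activations and are not reproduced by this toy setup.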

2603.20403 2026-03-24 cs.CV

FAAR: Efficient Frequency-Aware Multi-Task Fine-Tuning via Automatic Rank Selection

Maxime Fontana, Michael Spratling, Miaojing Shi

Comments CVPR 2026

2603.20397 2026-03-24 cs.LG cs.AI

KV Cache Optimization Strategies for Scalable and Efficient LLM Inference

Yichun Xu, Navjot K. Khaira, Tejinder Singh

Comments 24 pages, 14 figures

2603.20396 2026-03-24 cs.AI math.LO

Compression is all you need: Modeling Mathematics

Vitaly Aksenov, Eve Bodnia, Michael H. Freedman, Michael Mulligan

Comments 28 pages, 5 figures, 1 appendix

2603.20392 2026-03-24 cs.LG cs.AI stat.ML

SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning

Y. Sungtaek Ju

Comments 17 pages

2603.20390 2026-03-24 cs.LG cs.AI

CAMA: Exploring Collusive Adversarial Attacks in c-MARL

Men Niu, Xinxin Fan, Quanliang Jing, Shaoye Luo, Yunfeng Lu

2603.20386 2026-03-24 cs.CV

Jigsaw Regularization in Whole-Slide Image Classification

So Won Jeong, Veronika Ročková

2603.20383 2026-03-24 cs.CV

Multi-Stage Fine-Tuning of Pathology Foundation Models with Head-Diverse Ensembling for White Blood Cell Classification

Antony Gitau, Martin Paulson, Bjørn-Jostein Singstad, Karl Thomas Hjelmervik, Ola Marius Lysaker, Veralia Gabriela Sanchez

Comments Accepted to ISBI 2026

2603.20382 2026-03-24 cs.CV

Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier

Yujie Zhou, Pengyang Ling, Jiazi Bu, Bingjie Gao, Li Niu

Comments Accepted by ICME 2026

2603.20381 2026-03-24 cs.CL cs.AI cs.HC

The production of meaning in the processing of natural language

Christopher J. Agostino, Quan Le Thien, Nayan D'Souza, Louis van der Elst

Comments Submitted to HAXD 2026, 9 pages, 3 figures, 2 tables. associated package available at https://github.com/npc-worldwide/qstk

2603.20353 2026-03-24 cs.CV cs.RO eess.IV eess.SP

Scene Representation using 360° Saliency Graph and its Application in Vision-based Indoor Navigation

Preeti Meena, Himanshu Kumar, Sandeep Yadav

2603.20352 2026-03-24 cs.LG

The Multiverse of Time Series Machine Learning: an Archive for Multivariate Time Series Classification

Matthew Middlehurst, Aiden Rushbrooke, Ali Ismail-Fawaz, Maxime Devanne, Germain Forestier, Angus Dempster, Geoffrey I. Webb, Christopher Holder, Anthony Bagnall

2603.20348 2026-03-24 cs.CV

Toward a Multi-View Brain Network Foundation Model: Cross-View Consistency Learning Across Arbitrary Atlases

Jiaxing Xu, Jingying Ma, Xin Lin, Yuxiao Liu, Kai He, Qika Lin, Yiping Ke, Yang Li, Dinggang Shen, Mengling Feng

2603.20341 2026-03-24 cs.LG

Interpretable Multiple Myeloma Prognosis with Observational Medical Outcomes Partnership Data

Salma Rachidi, Aso Bozorgpanah, Eric Fey, Alexander Jung

2603.20337 2026-03-24 cs.CV

High-fidelity Multi-view Normal Integration with Scale-encoded Neural Surface Representation

Tongyu Yang, Heng Guo, Yasuyuki Matsushita, Fumio Okura, Yu Luo, Xin Fan

Comments 12 pages, 11 figures

2603.20335 2026-03-24 cs.LG

Hybrid Autoencoder-Isolation Forest approach for time series anomaly detection in C70XP cyclotron operation data at ARRONAX

F Basbous, F Poirier, F Haddad, D Mateus

2603.20333 2026-03-24 cs.LG cs.AI cs.MA

Bounded Coupled AI Learning Dynamics in Tri-Hierarchical Drone Swarms

Oleksii Bychkov

Comments 25 pages, 3 tables

2603.20327 2026-03-24 cs.LG cs.AI cs.CV

Probing the Latent World: Emergent Discrete Symbols and Physical Structure in Latent Representations

Liu hung ming

Comments 35 pages, 6 figures, 3 tables, 26 equations; independent research report; Stage 1 of a four-stage AIM--V-JEPA 2 integration roadmap; code available at https://github.com/cyrilliu1974/JEPA

2603.20326 2026-03-24 cs.CV

Prompt-Free Lightweight SAM Adaptation for Histopathology Nuclei Segmentation with Strong Cross-Dataset Generalization

Muhammad Hassan Maqsood, Yanming Zhu, Alfred Lam, Getamesay Dagnaw, Xuefei Yin, Alan Wee-Chung Liew

2603.20325 2026-03-24 cs.CV

DCG-Net: Dual Cross-Attention with Concept-Value Graph Reasoning for Interpretable Medical Diagnosis

Getamesay Dagnaw, Xuefei Yin, Muhammad Hassan Maqsood, Yanming Zhu, Alan Wee-Chung Liew

2603.20323 2026-03-24 cs.CV

NCSTR: Node-Centric Decoupled Spatio-Temporal Reasoning for Video-based Human Pose Estimation

Quang Dang Huynh, Xuefei Yin, Andrew Busch, Hugo G. Espinosa, Alan Wee-Chung Liew, Matthew T. O. Worsey, Yanming Zhu

2603.20317 2026-03-24 cs.CV cs.DC cs.NI

Which Workloads Belong in Orbit? A Workload-First Framework for Orbital Data Centers Using Semantic Abstraction

Durgendra Narayan Singh

2603.20315 2026-03-24 cs.LG

Rolling-Origin Validation Reverses Model Rankings in Multi-Step PM10 Forecasting: XGBoost, SARIMA, and Persistence

Federico Garcia Crespi, Eduardo Yubero Funes, Marina Alfosea Simon

Comments 28 pages, 4 figures. Submitted to International Journal of Forecasting

2603.20314 2026-03-24 cs.CV cs.LG

VGS-Decoding: Visual Grounding Score Guided Decoding for Hallucination Mitigation in Medical VLMs

Govinda Kolli, Adinath Madhavrao Dukre, Behzad Bozorgtabar, Dwarikanath Mahapatra, Imran Razzak