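arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.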

2603.06153 2026-03-09 cs.LG cs.AI physics.geo-ph

Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations

Alejandro J. González-Santana, Giovanny A. Cuervo-Londoño, Javier Sánchez

Comments 20 pages, 14 figures, 6 tables

Abstract

Accurate regional ocean forecasting requires models that are both computationally efficient and capable of representing predictive uncertainty. This work investigates ensemble learning strategies for sea surface temperature (SST) forecasting using Graph Neural Networks (GNNs), with a focus on how input perturbation design affects forecast skill and uncertainty representation. We adapt a GNN architecture to the Canary Islands region in the North Atlantic and implement a homogeneous ensemble approach inspired by bagging, where diversity is introduced during inference by perturbing initial ocean states rather than retraining multiple models. Several noise-based ensemble generation strategies are evaluated, including Gaussian noise, Perlin noise, and fractal Perlin noise, with systematic variation of noise intensity and spatial structure. Ensemble forecasts are assessed over a 15-day horizon using deterministic metrics (RMSE and bias) and probabilistic metrics, including the Continuous Ranked Probability Score (CRPS) and the Spread-skill ratio. Results show that, while deterministic skill remains comparable to the single-model forecast, the type and structure of input perturbations strongly influence uncertainty representation, particularly at longer lead times. Ensembles generated with spatially coherent perturbations, such as low-resolution Perlin noise, achieve better calibration and lower CRPS than purely random Gaussian perturbations. These findings highlight the critical role of noise structure and scale in ensemble GNN design and demonstrate that carefully constructed input perturbations can yield well-calibrated probabilistic forecasts without additional training cost, supporting the feasibility of ensemble GNNs for operational regional ocean prediction.
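The inference-time ensemble and the two probabilistic scores named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: `toy_model` is a hypothetical stand-in for the trained GNN, the "SST field" is synthetic, only i.i.d. Gaussian perturbations are shown (no Perlin noise), and the spread-skill ratio uses one common definition (root mean ensemble variance over RMSE of the ensemble mean), which may differ from the paper's.

```python
import math
import random
import statistics


def toy_model(state):
    """Hypothetical stand-in for a trained forecaster: relax each point toward the field mean."""
    mean = statistics.fmean(state)
    return [0.9 * x + 0.1 * mean for x in state]


def perturbed_ensemble(state, n_members, sigma, rng):
    """Run the same model on n_members Gaussian-perturbed copies of the initial state."""
    return [
        toy_model([x + rng.gauss(0.0, sigma) for x in state])
        for _ in range(n_members)
    ]


def crps_point(values, obs):
    """Empirical CRPS at one grid point: E|X - y| - 0.5 * E|X - X'|."""
    n = len(values)
    term1 = sum(abs(v - obs) for v in values) / n
    term2 = sum(abs(a - b) for a in values for b in values) / (n * n)
    return term1 - 0.5 * term2


def mean_crps(members, obs):
    """Average the point-wise CRPS over all grid points."""
    scores = [crps_point([m[i] for m in members], obs[i]) for i in range(len(obs))]
    return statistics.fmean(scores)


def spread_skill_ratio(members, obs):
    """Ensemble spread (root mean variance) divided by the RMSE of the ensemble mean."""
    columns = [[m[i] for m in members] for i in range(len(obs))]
    spread = math.sqrt(statistics.fmean([statistics.variance(c) for c in columns]))
    sq_err = [(statistics.fmean(c) - y) ** 2 for c, y in zip(columns, obs)]
    return spread / math.sqrt(statistics.fmean(sq_err))


rng = random.Random(0)
initial = [20.0 + rng.gauss(0.0, 1.0) for _ in range(16)]  # synthetic SST values (deg C)
truth = [x + rng.gauss(0.0, 0.1) for x in toy_model(initial)]  # synthetic "observation"
members = perturbed_ensemble(initial, n_members=10, sigma=0.1, rng=rng)
print(f"CRPS={mean_crps(members, truth):.3f}  "
      f"spread/skill={spread_skill_ratio(members, truth):.2f}")
```

A ratio near 1 is usually read as a well-calibrated ensemble, with values below 1 indicating under-dispersion. Swapping the Gaussian draw for a spatially correlated field would be the place to experiment with the structured perturbations the abstract describes.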

2603.06148 2026-03-09 cs.CV cs.AI

VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models

Rohit Saxena, Alessandro Suglia, Pasquale Minervini

2603.06147 2026-03-09 cs.CV

Longitudinal NSCLC Treatment Progression via Multimodal Generative Models

Massimiliano Mantegna, Elena Mulero Ayllón, Alice Natalina Caragliano, Francesco Di Feola, Claudia Tacconi, Michele Fiore, Edy Ippolito, Carlo Greco, Sara Ramella, Philippe C. Cattin, Paolo Soda, Matteo Tortora, Valerio Guarrasi

2603.06142 2026-03-09 cs.LG cond-mat.dis-nn cs.AI cs.NE stat.ML

Predictive Coding Graphs are a Superset of Feedforward Neural Networks

Björn van Zwol

Comments 11 pages, 3 figures. Accepted at the NeuroAI Workshop @ NeurIPS 2024. OpenReview: https://openreview.net/forum?id=J36z3R0sNq

2603.06141 2026-03-09 cs.CV

Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models

Nicoleta-Nina Basoc, Adrian Cosma, Emilian Radoi

2603.06140 2026-03-09 cs.CV cs.AI

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion

Bohai Gu, Taiyi Wu, Dazhao Du, Jian Liu, Shuai Yang, Xiaotong Zhao, Alan Zhao, Song Guo

Comments https://nevsnev.github.io/Place-it-R1/

2603.06138 2026-03-09 cs.LG cs.AI

Partial Policy Gradients for RL in LLMs

Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

2603.06136 2026-03-09 cs.CV

Cross-Resolution Distribution Matching for Diffusion Distillation

Feiyang Chen, Hongpeng Pan, Haonan Xu, Xinyu Duan, Yang Yang, Zhefeng Wang

2603.06131 2026-03-09 cs.LG

DQE: A Semantic-Aware Evaluation Metric for Time Series Anomaly Detection

Yuewei Li, Dalin Zhang, Huan Li, Xinyi Gong, Hongjun Chu, Zhaohui Song

2603.06130 2026-03-09 cs.RO cs.AI

A Hazard-Informed Data Pipeline for Robotics Physical Safety

Alexei Odinokov, Rostislav Yavorskiy

Comments 4th International Conference on Automation and Mechatronics Engineering (ICAME 2026)

2603.06123 2026-03-09 cs.CL cs.LG

Diffusion Language Models Are Natively Length-Aware

Vittorio Rossi, Giacomo Cirò, Davide Beltrame, Luca Gandolfi, Paul Röttger, Dirk Hovy

2603.06122 2026-03-09 cs.CV

FedARKS: Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration for Person Re-identification

Xin Xu, Binchang Ma, Zhixi Yu, Wei Liu

2603.06121 2026-03-09 cs.RO

Sticky-Glance: Robust Intent Recognition for Human Robot Collaboration via Single-Glance

Yuzhi Lai, Shenghai Yuan, Peizheng Li, Andreas Zell

2603.06120 2026-03-09 cs.LG

Dynamic Momentum Recalibration in Online Gradient Learning

Zhipeng Yao, Rui Yu, Guisong Chang, Ying Li, Yu Zhang, Dazhou Li

Comments Accepted by CVPR 2026

2603.06114 2026-03-09 cs.CL cs.AI

Making Implicit Premises Explicit in Logical Understanding of Enthymemes

Xuyao Feng, Anthony Hunter

2603.06113 2026-03-09 cs.LG physics.chem-ph

Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra

Wenjin Wu, Aleš Leonardis, Linjiang Chen, Jianbo Jiao

Comments 27 pages, 10 figures

2603.06090 2026-03-09 cs.CV cs.CL

DeepSight: Bridging Depth Maps and Language with a Depth-Driven Multimodal Model

Hao Yang, Hongbo Zhang, Yanyan Zhao, Bing Qin

2603.06088 2026-03-09 cs.CL cs.AI

Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality

Xi Wang, Mengdie Zhuang, Jiqun Liu

2603.06084 2026-03-09 cs.RO

Multimodal Behavior Tree Generation: A Small Vision-Language Model for Robot Task Planning

Cristiano Battistini, Riccardo Andrea Izzo, Gianluca Bardaro, Matteo Matteucci

2603.06081 2026-03-09 cs.CV

Lyapunov Probes for Hallucination Detection in Large Foundation Models

Bozhi Luan, Gen Li, Yalan Qin, Jifeng Guo, Yun Zhou, Faguo Wu, Hongwei Zheng, Wenjun Wu, Zhaoxin Fan

2603.06073 2026-03-09 cs.RO cs.AI

Lifelong Embodied Navigation Learning

Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han

Comments 24 pages, 7 figures

2603.06071 2026-03-09 cs.CV cs.AI

Text-Driven Emotionally Continuous Talking Face Generation

Hao Yang, Yanyan Zhao, Tian Zheng, Hongbo Zhang, Bichen Wang, Di Wu, Xing Fu, Xuda Zhi, Yongbo Huang, Hao He

2603.06067 2026-03-09 cs.AI

Aggregative Semantics for Quantitative Bipolar Argumentation Frameworks

Yann Munro, Isabelle Bloch, Marie-Jeanne Lesot

2603.06066 2026-03-09 cs.CL cs.AI

Evaluating Austrian A-Level German Essays with Large Language Models for Automated Essay Scoring

Jonas Kubesch, Lena Huber, Clemens Havas

Comments To be presented at the SAC2026 and published in its symposium proceedings

2603.06064 2026-03-09 cs.AI

Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation

Kai Göbel, Pierrick Lorang, Patrik Zips, Tobias Glück

2603.06061 2026-03-09 cs.CV cs.RO

Transforming Omnidirectional RGB-LiDAR data into 3D Gaussian Splatting

Semin Bae, Hansol Lim, Jongseong Brad Choi

Comments This work has been submitted to the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) for possible publication

2603.06058 2026-03-09 cs.RO

RODEO: RObotic DEcentralized Organization

Milan Groshev, Eduardo Castelló Ferrer

Comments 8 pages, 6 figures, Accepted at IEEE International Conference on Robotics & Automation (2026)

2603.06057 2026-03-09 cs.CV cs.AI cs.LG cs.SD

TempoSyncDiff: Distilled Temporally-Consistent Diffusion for Low-Latency Audio-Driven Talking Head Generation

Soumya Mazumdar, Vineet Kumar Rakesh

2603.06054 2026-03-09 cs.CV cs.AI

Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

Nikos Theodoridis, Reenu Mohandas, Ganesh Sistu, Anthony Scanlan, Ciarán Eising, Tim Brophy

Abstract

The use of Vision-Language Models (VLMs) in automated driving applications is becoming increasingly common, with the aim of leveraging their reasoning and generalisation capabilities to handle long tail scenarios. However, these models often fail on simple visual questions that are highly relevant to automated driving, and the reasons behind these failures remain poorly understood. In this work, we examine the intermediate activations of VLMs and assess the extent to which specific visual concepts are linearly encoded, with the goal of identifying bottlenecks in the flow of visual information. Specifically, we create counterfactual image sets that differ only in a targeted visual concept and then train linear probes to distinguish between them using the activations of four state-of-the-art (SOTA) VLMs. Our results show that concepts such as the presence of an object or agent in a scene are explicitly and linearly encoded, whereas other spatial visual concepts, such as the orientation of an object or agent, are only implicitly encoded by the spatial structure retained by the vision encoder. In parallel, we observe that in certain cases, even when a concept is linearly encoded in the model's activations, the model still fails to answer correctly. This leads us to identify two failure modes. The first is perceptual failure, where the visual information required to answer a question is not linearly encoded in the model's activations. The second is cognitive failure, where the visual information is present but the model fails to align it correctly with language semantics. Finally, we show that increasing the distance of the object in question quickly degrades the linear separability of the corresponding visual concept. Overall, our findings improve our understanding of failure cases in VLMs on simple visual tasks that are highly relevant to automated driving.
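The linear-probing idea in the abstract can be sketched as follows. This is an illustrative toy, not the authors' setup: the "activations" are synthetic Gaussian vectors in which a hypothetical binary concept shifts the first coordinate, and the probe is a plain logistic regression fitted by batch gradient descent. High held-out accuracy is what "linearly encoded" would mean for such a probe.

```python
import math
import random


def make_activations(n, dim, shift, rng):
    """Synthetic 'activations' (an assumption): the concept label moves coordinate 0 by +/- shift."""
    data = []
    for i in range(n):
        label = i % 2
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += shift if label else -shift
        data.append((vec, label))
    return data


def logit(w, b, x):
    """Linear score of the probe for one activation vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(data, dim, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = 1.0 / (1.0 + math.exp(-logit(w, b, x))) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def accuracy(data, w, b):
    """Fraction of samples whose predicted label (score > 0) matches the true label."""
    hits = sum((logit(w, b, x) > 0) == bool(y) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
train = make_activations(200, dim=8, shift=2.0, rng=rng)
held_out = make_activations(100, dim=8, shift=2.0, rng=rng)
w, b = train_probe(train, dim=8)
print(f"held-out probe accuracy: {accuracy(held_out, w, b):.2f}")
```

Rerunning with `shift=0.0` removes the concept from the data and gives a chance-level baseline for comparison. In a real study the vectors would come from a chosen layer of the model rather than from a random generator.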

2603.06049 2026-03-09 cs.CV cs.RO

Devil is in Narrow Policy: Unleashing Exploration in Driving VLA Models

Canyu Chen, Yuguang Yang, Zhewen Tan, Yizhi Wang, Ruiyi Zhan, Haiyan Liu, Xuanyao Mao, Jason Bao, Xinyue Tang, Linlin Yang, Bingchuan Sun, Yan Wang, Baochang Zhang

Comments Accepted by CVPR 2026 Findings