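arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.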

2509.16654 2026-03-04 cs.CV

Are VLMs Ready for Lane Topology Awareness in Autonomous Driving?

Xin Chen, Jia He, Maozheng Li, Dongliang Xu, Tianyu Wang, Yixiao Chen, Zhixin Lin, Yue Yao

Comments 5 pages, 5 figures

Abstract

Vision-Language Models (VLMs) have recently shown remarkable progress in multimodal reasoning, yet their applications in autonomous driving remain limited. In particular, the ability to understand road topology, a key requirement for safe navigation, has received relatively little attention. While some recent works have begun to explore VLMs in driving contexts, their performance on topology reasoning is far from satisfactory. In this work, we systematically evaluate VLMs' capabilities in road topology understanding. Specifically, multi-view images are projected into a unified ground-plane coordinate system and fused into bird's-eye-view (BEV) lanes. Based on these BEV lanes, we formulate four topology-related diagnostic VQA tasks, which together capture essential components of spatial topology reasoning. Through extensive evaluation, we find that while frontier closed-source models (e.g., GPT-4o) achieve relatively high accuracy on some tasks, they still fail on some spatial questions that humans can answer (e.g., GPT-4o achieves only 67.8% on the vector task, a binary classification problem). Furthermore, we find that open-source VLMs, even at the 30B scale, struggle significantly. These results indicate that spatial reasoning remains a fundamental bottleneck for current VLMs. We also find that model capability is positively correlated with model size, reasoning-token length, and the number of few-shot examples provided, pointing to directions for future research.
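The abstract does not spell out how the multi-view images are projected onto the ground plane. One common way to place pixels from several calibrated cameras into a single ground-plane frame is inverse perspective mapping: back-project each pixel to a viewing ray and intersect it with the road surface. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes a pinhole camera model, known intrinsics and extrinsics, and perfectly flat ground at z = 0. `PinholeCamera`, `pixel_to_ground`, and `fuse_views` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class PinholeCamera:
    """Hypothetical calibration record: pinhole intrinsics plus camera-to-ego extrinsics."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: Mat3  # rotates camera-frame vectors (x right, y down, z forward) into the ego frame
    t: Vec3  # camera centre in the ego frame (x forward, y left, z up), in metres


def _matvec(m: Mat3, v: Vec3) -> Vec3:
    """3x3 matrix times 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def pixel_to_ground(cam: PinholeCamera, u: float, v: float) -> Optional[Tuple[float, float]]:
    """Intersect the viewing ray through pixel (u, v) with the flat ground plane z = 0.

    Returns (x, y) in the ego frame, or None when the ray never reaches the ground
    in front of the camera (pixels at or above the horizon). Assumes cam.t[2] > 0.
    """
    # Back-project the pixel to a ray direction in the camera frame: K^{-1} [u, v, 1]^T.
    d_cam = ((u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy, 1.0)
    d_ego = _matvec(cam.R, d_cam)
    if d_ego[2] >= 0.0:  # ray is level or points upward: no ground intersection
        return None
    s = -cam.t[2] / d_ego[2]  # solve t_z + s * d_z = 0 for the ray parameter s > 0
    return (cam.t[0] + s * d_ego[0], cam.t[1] + s * d_ego[1])


def fuse_views(
    views: Sequence[Tuple[PinholeCamera, Sequence[Tuple[float, float]]]]
) -> List[Tuple[float, float]]:
    """Project lane pixels from every camera into one shared ground-plane (BEV) point set."""
    points: List[Tuple[float, float]] = []
    for cam, pixels in views:
        for u, v in pixels:
            p = pixel_to_ground(cam, u, v)
            if p is not None:
                points.append(p)
    return points


if __name__ == "__main__":
    # Hypothetical front camera: 1.5 m above the road, optical axis parallel to it.
    FRONT_R = ((0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0))
    front = PinholeCamera(1000.0, 1000.0, 960.0, 540.0, FRONT_R, (0.0, 0.0, 1.5))
    print(pixel_to_ground(front, 960.0, 640.0))  # a point roughly 15 m straight ahead
```

Because every camera's points land in the same ego-frame coordinates, fusing views reduces to concatenation here. A real rig would also need to handle road slope, which breaks the flat-ground assumption, and lens distortion.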

2509.11663 2026-03-04 cs.RO cs.AI cs.CV

ConEQsA: Concurrent and Asynchronous Embodied Questions Scheduling and Answering

Haisheng Wang, Dong Liu, Weiming Zhi

Comments 8 pages, 6 figures

2509.10625 2026-03-04 cs.CL cs.AI

No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes

Iván Vicente Moreno Cencerrado, Arnau Padrés Masdemont, Anton Gonzalvez Hawthorne, David Demitri Africa, Lorenzo Pacchiardi

Comments Accepted (poster) to Principled Design for Trustworthy AI at ICLR 2026

2508.18817 2026-03-04 cs.RO cs.LG

Learning Acrobatic Flight from Preferences

Colin Merk, Ismail Geles, Jiaxu Xing, Angel Romero, Giorgia Ramponi, Davide Scaramuzza

Comments 8 pages, 6 figures

2508.18264 2026-03-04 cs.CV

MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs

Sixun Dong, Juhua Hu, Mian Zhang, Ming Yin, Yanjie Fu, Qi Qian

Comments Accepted by ICLR 2026. Code: https://github.com/Ironieser/mmtok , Project Homepage: https://project.ironieser.cc/mmtok

2508.05282 2026-03-04 cs.CL

Not All Errors Are Created Equal: ASCoT Addresses Late-Stage Fragility in Efficient LLM Reasoning

Dongxu Zhang, Yujun Wu, Yiding Sun, Jinnan Yang, Ning Yang, Jihua Zhu, Miao Xin, Baoliang Tian

2508.04542 2026-03-04 cs.LG cs.CR cs.SI

Privacy Risk Predictions Based on Fundamental Understanding of Personal Data and an Evolving Threat Landscape

Haoran Niu, K. Suzanne Barber

Comments 13 pages, 10 figures, 1 table

2508.01592 2026-03-04 cs.CV cs.AI

DMTrack: Spatio-Temporal Multimodal Tracking via Dual-Adapter

Weihong Li, Shaohua Dong, Haonan Lu, Yanhao Zhang, Heng Fan, Libo Zhang

Comments Accepted by ICRA 2026

2507.16334 2026-03-04 cs.AI cs.LG math.DG

Higher Gauge Flow Models

Alexander Strunk, Roland Assam

Comments arXiv admin note: text overlap with arXiv:2507.13414

2507.13414 2026-03-04 cs.LG cs.AI math.DG

Gauge Flow Models

Alexander Strunk, Roland Assam

2506.23316 2026-03-04 cs.RO cs.CV

SceneStreamer: Continuous Scenario Generation as Next Token Group Prediction

Zhenghao Peng, Yuxin Liu, Bolei Zhou

2506.07275 2026-03-04 cs.LG cs.HC stat.AP

Tailored Behavior-Change Messaging for Physical Activity: Integrating Contextual Bandits and Large Language Models

Haochen Song, Dominik Hofer, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Zahra Hassanzadeh, Jan Smeddinck, Meredith Franklin, Joseph Jay Williams

Abstract

Contextual multi-armed bandit (cMAB) algorithms offer a promising framework for adapting behavioral interventions to individuals over time. However, cMABs often require large samples to learn effectively and typically rely on a finite, preset collection of fixed message templates. In this paper, we present a hybrid cMABxLLM approach in which the cMAB selects an intervention type and a large language model (LLM) personalizes the message content within the selected type. We deployed this approach in a 30-day physical-activity intervention, comparing four behavioral change intervention types: behavioral self-monitoring, gain-framing, loss-framing, and social comparison, delivered as daily motivational messages to support motivation and help participants achieve a daily step count. Message content is personalized using dynamic contextual factors, including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over the trial, participants received daily messages assigned by one of five models: equal randomization (RCT), cMAB only, LLM only, LLM with interaction history, or cMABxLLM. Outcomes include motivation towards physical activity and message usefulness, assessed via ecological momentary assessments (EMAs). We evaluate and compare the five delivery models using pre-specified statistical analyses that account for repeated measures and time trends. We find that the cMABxLLM approach retains the perceived acceptance of LLM-generated messages while reducing token usage and providing an explicit, reproducible decision rule for intervention selection. This hybrid approach also avoids skew in intervention delivery by improving support for under-delivered intervention types. More broadly, our approach provides a deployable template for combining Bayesian adaptive experimentation with generative models in a way that supports both personalization and interpretability.
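The abstract states only that a contextual bandit picks the intervention type and an LLM then personalizes the wording; it does not name the bandit algorithm. As one illustration of Bayesian adaptive selection, the sketch below uses Beta-Bernoulli Thompson sampling with a separate posterior per (context, arm) pair. Discretized contexts and binary rewards (for example, a message rated useful or not) are assumptions, not details from the paper. `TabularThompsonBandit` and `build_message_request` are hypothetical names, and no LLM is actually called.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# The four intervention types named in the abstract.
INTERVENTION_TYPES = ["self-monitoring", "gain-framing", "loss-framing", "social-comparison"]


class TabularThompsonBandit:
    """Beta-Bernoulli Thompson sampling with one posterior per (context, arm) pair.

    Assumptions: contexts are discretized (e.g. "low"/"high" self-efficacy) and the
    reward is binary (e.g. 1 if the participant rated the message as useful).
    """

    def __init__(self, arms: Sequence[str], seed: int = 0) -> None:
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # Uniform Beta(1, 1) prior for every (context, arm) pair, stored as [alpha, beta].
        self.posterior: Dict[Tuple[Hashable, str], List[float]] = defaultdict(lambda: [1.0, 1.0])

    def select(self, context: Hashable) -> str:
        # Draw one plausible success rate per arm; play the arm with the largest draw.
        draws = {arm: self.rng.betavariate(*self.posterior[(context, arm)]) for arm in self.arms}
        return max(draws, key=draws.get)

    def update(self, context: Hashable, arm: str, reward: int) -> None:
        params = self.posterior[(context, arm)]
        if reward:
            params[0] += 1.0  # success: increment alpha
        else:
            params[1] += 1.0  # failure: increment beta


def build_message_request(intervention: str, context: Hashable) -> str:
    """Hypothetical hand-off: the bandit fixes the type; an LLM would write the wording."""
    return (
        f"Write a one-sentence physical-activity message using {intervention} "
        f"for a participant whose current context is {context!r}."
    )


if __name__ == "__main__":
    bandit = TabularThompsonBandit(INTERVENTION_TYPES, seed=42)
    ctx = "low self-efficacy"
    arm = bandit.select(ctx)
    print(build_message_request(arm, ctx))
    bandit.update(ctx, arm, reward=1)  # e.g. the participant found the message useful
```

Keeping the type choice in an explicit posterior is what makes the selection rule inspectable and reproducible (given the seed), while the free-text generation is delegated to the LLM.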

2506.07177 2026-03-04 cs.CV cs.AI

Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Jaehong Yoon, Soo Ye Kim, Zhe Lin, Sung Ju Hwang

Comments ICLR 2026. Project page: https://frame-guidance-video.github.io/

2506.03922 2026-03-04 cs.CL cs.AI cs.CV

HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li

2506.01303 2026-03-04 cs.LG q-bio.NC

Dynamic Manifold Hopfield Networks for Context-Dependent Associative Memory

Chong Li, Taiping Zeng, Xiangyang Xue, Jianfeng Feng

2505.21813 2026-03-04 cs.LG stat.ML

Optimizing Data Augmentation through Bayesian Model Selection

Madi Matymov, Ba-Hien Tran, Michael Kampffmeyer, Markus Heinonen, Maurizio Filippone

Comments 26 pages, 3 figures

2505.19892 2026-03-04 cs.AI

OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao

2505.18996 2026-03-04 cs.LG stat.ML

Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs

Bob Junyi Zou, Lu Tian

Comments Accepted at The 14th International Conference on Learning Representations (ICLR) 2026

2505.15008 2026-03-04 cs.LG cs.AI stat.ML

Know When to Abstain: Optimal Selective Classification with Likelihood Ratios

Alvin Heng, Harold Soh

2504.10190 2026-03-04 cs.CV

Differentially Private 2D Human Pose Estimation

Kaushik Bhargav Sivangi, Paul Henderson, Fani Deligianni

Comments Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026

2503.14572 2026-03-04 cs.LG cs.AI

Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation

Justus Westerhoff, Golzar Atefi, Mario Koddenbrock, Alexei Figueroa, Alexander Löser, Erik Rodner, Felix A. Gers

2503.12567 2026-03-04 cs.CV

GAN-Based Single-Stage Defense for Traffic Sign Classification Under Adversarial Patch

Abyad Enan, Mashrur Chowdhury

Comments This work has been submitted to a peer-reviewed journal and is currently under review

2503.07348 2026-03-04 cs.CV

Cycle-Consistent Multi-Graph Matching for Self-Supervised Annotation of C.Elegans

Christoph Karg, Sebastian Stricker, Lisa Hutschenreiter, Bogdan Savchynskyy, Dagmar Kainmueller

2502.20314 2026-03-04 cs.LG

Adversarial Attacks in Weight-Space Classifiers

Tamir Shor, Ethan Fetaya, Chaim Baskin, Alex Bronstein

2502.16894 2026-03-04 cs.CL

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Chenghao Fan, Zhenyi Lu, Sichen Liu, Chengfeng Gu, Xiaoye Qu, Wei Wei, Yu Cheng

Comments Accepted by ICML 2025

2412.17558 2026-03-04 cs.CL

A Survey of Query Optimization in Large Language Models

Mingyang Song, Mao Zheng

Comments Ongoing Work

2411.11677 2026-03-04 cs.LG cs.CR cs.IR

Few-shot Model Extraction Attacks against Sequential Recommender Systems

Hui Zhang, Fu Liu

Comments There is something wrong in the formula

2409.00447 2026-03-04 cs.AI

The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts

I. de Rodrigo, A. Sanchez-Cuadrado, J. Boal, A. J. Lopez-Lopez

2407.12843 2026-03-04 cs.CL cs.AI

NutriBench: A Dataset for Evaluating Large Language Models on Nutrition Estimation from Meal Descriptions

Andong Hua, Mehak Preet Dhaliwal, Laya Pullela, Ryan Burke, Yao Qin

Comments ICLR 2025

2407.01656 2026-03-04 cs.LG cond-mat.dis-nn physics.data-an q-bio.NC stat.ML

Absolute abstraction: a renormalisation group approach

Carlo Orientale Caputo, Elias Seiffert, Enrico Frausin, Matteo Marsili

Comments 35 pages, 6 figures