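arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.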

2512.17053 2026-03-13 cs.CL cs.AI cs.DB

Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL

Khushboo Thaker, Yony Bresler

Comments: Accepted at the 39th Canadian Conference on Artificial Intelligence (Canadian AI 2026). This is the extended version, containing additional details and appendices omitted from the camera-ready proceedings due to space constraints.

2512.06002 2026-03-13 cs.RO cs.AI

POrTAL: Plan-Orchestrated Tree Assembly for Lookahead

Evan Conway, David Porfirio, David Chan, Mark Roberts, Laura M. Hiatt

Comments: Submitted to IROS 26

2512.04034 2026-03-13 cs.LG

Domain Feature Collapse: Implications for Out-of-Distribution Detection and Solutions

Hong Yang, Devroop Kar, Qi Yu, Alex Ororbia, Travis Desell

Comments: Error in theoretical assumptions

2512.02421 2026-03-13 cs.CV

Generalizing Vision-Language Models with Dedicated Prompt Guidance

Xinyao Li, Yinjie Min, Hongbo Chen, Zhekai Du, Fengling Li, Jingjing Li

Comments: Accepted to AAAI26

2511.22433 2026-03-13 cs.CV

SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition

Hongda Liu, Yunfan Liu, Changlu Wang, Yunlong Wang, Zhenan Sun

2511.22018 2026-03-13 cs.CV cs.AI

MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis

Chunzheng Zhu, Yangfang Lin, Shen Chen, Yijun Wang, Jianxin Lin

Comments: AAAI 2026, Medical Chain-of-Thought (CoT), Reinforcement Learning with Verifiable Rewards (RLVR), Multimodal Grounded Reasoning

2511.18685 2026-03-13 cs.CV cs.RO

Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents

Dayong Liu, Chao Xu, Weihong Chen, Suyu Zhang, Juncheng Wang, Jiankang Deng, Baigui Sun, Yang Liu

2511.18463 2026-03-13 cs.CV

Decoupling Perception from Reasoning for Hallucination-Resistant Video Understanding

Bowei Pu, Chuanbin Liu, Yifan Ge, Peicheng Zhou, Yiwei Sun, Zhiying Lu, Zhangchi Hu, Hongtao Xie

Comments: 17 pages, 8 figures

2511.16846 2026-03-13 cs.CL cs.AI

ConCISE: A Reference-Free Conciseness Evaluation Metric for LLM-Generated Answers

Seyed Mohssen Ghafari, Ronny Kol, Juan C. Quiroz, Nella Luan, Monika Patial, Chanaka Rupasinghe, Herman Wandabwa, Luiz Pizzato

2511.12908 2026-03-13 cs.CV cs.AI

DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning

Junbo Zou, Haotian Xia, Zhen Ye, Shengjie Zhang, Christopher Lai, Vicente Ordonez, Weining Shen, Hanjie Chen

2511.12254 2026-03-13 cs.AI cs.IR

Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation

Yuxiang Zhou, Jichang Li, Yanhao Zhang, Haonan Lu, Guanbin Li

2511.11851 2026-03-13 cs.CV cs.CR

Defending Unauthorized Model Merging via Dual-Stage Weight Protection

Wei-Jia Chen, Min-Yen Tsai, Cheng-Yi Lee, Chia-Mu Yu

Comments: Accepted at CVPR 2026, updated

2511.09921 2026-03-13 cs.AI

Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces

Leping Si, Meimei Yang, Hui Xue, Shipeng Zhu, Pengfei Fang

Comments: 13 pages, 3 figures, AAAI26 conference Camera-Ready

2511.07654 2026-03-13 cs.RO

Time as a Control Dimension in Robot Learning

Yinsen Jia, Boyuan Chen

2511.06315 2026-03-13 cs.CV

PuzLM: Solving Jigsaw Puzzles with Sequence-to-Sequence Language Models

Gur Elkin, Ofir Itzhak Shahar, Ohad Ben-Shahar

2511.04583 2026-03-13 cs.AI cs.CL cs.CV cs.LG

Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

Atsuyuki Miyai, Mashiro Toyooka, Takashi Otonari, Zaiying Zhao, Kiyoharu Aizawa

Comments: TMLR2026. Issues, comments, and questions are all welcome at https://github.com/Agent4Science-UTokyo/Jr.AI-Scientist

Abstract

Understanding the current capabilities and risks of AI Scientist systems (autoresearch) is essential for ensuring trustworthy and sustainable AI-driven scientific progress while preserving the integrity of the academic ecosystem. To this end, we develop Jr. AI Scientist, a state-of-the-art autonomous AI scientist system that mimics the core research workflow of a novice student researcher: given a baseline paper from a human mentor, it analyzes the paper's limitations, formulates novel hypotheses for improvement, experiments iteratively until improvements are achieved, and writes a paper reporting the results. Unlike previous approaches that assume full automation or operate on small-scale code, Jr. AI Scientist follows a well-defined research workflow and leverages modern coding agents to handle complex, multi-file implementations, leading to scientifically valuable contributions. In our experiments, Jr. AI Scientist successfully generated new research papers that build upon real NeurIPS, IJCV, and ICLR works by proposing and implementing novel methods. For evaluation, we conducted automated assessments using AI Reviewers, author-led evaluations, and submissions to Agents4Science, a venue dedicated to AI-driven contributions. The findings demonstrate that Jr. AI Scientist generates papers that receive higher review scores from DeepReviewer than those produced by existing fully automated systems. Nevertheless, the author evaluation and the Agents4Science reviews reveal important limitations, indicating the potential risks of directly applying current AI Scientist systems and key challenges for future research. Finally, we comprehensively report various risks identified during development. We believe this study clarifies the current role and limitations of AI Scientist systems, offering insights into the areas that still require human expertise and the risks that may emerge as these systems evolve.

2511.03400 2026-03-13 cs.RO

GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement

Minquan Gao, Xinyi Li, Qing Yan, Xiaojian Sun, Xiaopan Zhang, Chien-Ming Huang, Jiachen Li

Comments: IEEE International Conference on Robotics and Automation (ICRA 2026)

2511.00783 2026-03-13 cs.RO cs.SY eess.SY

When Semantics Connect the Swarm: LLM-Driven Fuzzy Control for Cooperative Multi-Robot Underwater Coverage

Jingzehua Xu, Weihang Zhang, Yangyang Li, Hongmiaoyi Zhang, Guanwen Xie, Jiwei Tang, Shuai Zhang, Yi Li

Comments: Withdrawn for further improvement. The final version will be released in a few months.

2511.00617 2026-03-13 cs.LG cs.AI cs.CL stat.ML

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering

Eric Bigelow, Daniel Wurgaft, YingQiao Wang, Noah Goodman, Tomer Ullman, Hidenori Tanaka, Ekdeep Singh Lubana

2510.26796 2026-03-13 cs.CV cs.GR

See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting

Dongyue Lu, Ao Liang, Tianxin Huang, Xiao Fu, Yuyang Zhao, Baorui Ma, Liang Pan, Wei Yin, Lingdong Kong, Wei Tsang Ooi, Ziwei Liu

Comments: Eurographics2026; 26 pages; 21 figures; 3 tables; project page: https://see-4d.github.io/

2510.21019 2026-03-13 cs.LG cs.CV

More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning

Wanhao Yu, Zheng Wang, Shuteng Niu, Sen Lin, Li Yang

2510.16439 2026-03-13 cs.CL

FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution

Syed Rifat Raiyan, Md Farhan Ishmam, Abdullah Al Imran, Mohammad Ali Moni

2510.13108 2026-03-13 cs.CV cs.AI cs.RO

DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models

Jingyu Song, Zhenxin Li, Shiyi Lan, Xinglong Sun, Nadine Chang, Maying Shen, Joshua Chen, Katherine A. Skinner, Jose M. Alvarez

Comments: Accepted at ICRA 2026; 8 pages, 3 figures

2510.11036 2026-03-13 cs.RO cs.AI

XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation

Yeonseo Lee, Jungwook Mun, Hyosup Shin, Guebin Hwang, Junhee Nam, Taeyeop Lee, Sungho Jo

Comments: 9 pages, 10 figures

2510.10489 2026-03-13 cs.CV

Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation

Jiaye Li, Baoyou Chen, Hui Li, Zilong Dong, Jingdong Wang, Siyu Zhu

2510.08724 2026-03-13 cs.LG

Counterfactually Fair Conformal Prediction

Ozgur Guldogan, Neeraj Sarna, Yuanyuan Li, Michael Berger

Comments: Accepted at AISTATS 2026

2510.07791 2026-03-13 cs.CV

GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models

Qinghongbing Xie, Zhaoyuan Xia, Feng Zhu, Lijun Gong, Ziyue Li, Rui Zhao, Long Zeng

Comments: ICLR 2026, 31 pages, 20 figures

Abstract

Recently, the spatial-temporal intelligence of Vision-Language Models (VLMs) has attracted much attention due to its importance for autonomous driving, embodied AI, and general AI. Existing spatial-temporal benchmarks mainly focus on egocentric (first-person) reasoning over image/video contexts, or on geographic reasoning with graphical contexts (e.g., maps). They thus fail to assess VLMs' geographic spatial-temporal intelligence, which requires integrating both image/video and graphical contexts and is crucial for real-world scenarios such as traffic management and emergency response. To address this gap, we introduce the Geo-Temporal Reasoning benchmark (GTR-Bench), a novel challenge for geographic temporal reasoning about moving targets in a large-scale camera network. GTR-Bench is more challenging because it requires multiple perspective switches between maps and videos, joint reasoning across multiple videos with non-overlapping fields of view, and inference over spatial-temporal regions that are not observed in any video context. Evaluations of more than 10 popular VLMs on GTR-Bench show that even the best proprietary model, Gemini-2.5-Pro (34.9%), lags significantly behind human performance (78.61%) on geo-temporal reasoning. Moreover, our comprehensive analysis of GTR-Bench reveals three major deficiencies of current models in geo-temporal reasoning: (1) VLMs exhibit imbalanced utilization of spatial and temporal context during reasoning; (2) they show weak temporal forecasting ability, leading to poorer performance on temporally focused tasks; and (3) they lack the capability to effectively align and integrate map data with multi-view video inputs. We believe GTR-Bench offers valuable insights and opens up new opportunities for research and applications in spatial-temporal intelligence. The benchmark and code will be released at https://github.com/X-Luffy/GTR-Bench.

2510.06754 2026-03-13 cs.RO cs.CV cs.LG

UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene

Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki

Comments: ICRA 2026. Project website: https://sites.google.com/view/uniffield

2510.04579 2026-03-13 cs.LG math.MG stat.ML

Busemann Functions in the Wasserstein Space: Existence, Closed-Forms, and Applications to Slicing

Clément Bonet, Elsa Cazelles, Lucas Drumetz, Nicolas Courty

Comments: Published as a conference paper at AISTATS 2026

2509.26489 2026-03-13 cs.CV cs.LG eess.SP

Contrastive Diffusion Guidance for Spatial Inverse Problems

Sattwik Basu, Chaitanya Amballa, Zhongweiyang Xu, Jorge Vančo Sampedro, Srihari Nelakuditi, Romit Roy Choudhury