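arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.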

2603.03704 2026-03-09 cs.RO cs.AI

Large-Language-Model-Guided State Estimation for Partially Observable Task and Motion Planning

Yoonwoo Kim, Raghav Arora, Roberto Martín-Martín, Peter Stone, Ben Abbatematteo, Yoonchang Sung

Abstract

Robot planning in partially observable environments, where not all objects are known or visible, is a challenging problem, as it requires reasoning under uncertainty through partially observable Markov decision processes. During the execution of a computed plan, a robot may unexpectedly observe task-irrelevant objects, which are typically ignored by naive planners. In this work, we propose incorporating two types of common-sense knowledge: (1) certain objects are more likely to be found in specific locations; and (2) similar objects are likely to be co-located, while dissimilar objects are less likely to be found together. Manually engineering such knowledge is complex, so we explore leveraging the powerful common-sense reasoning capabilities of large language models (LLMs). Our planning and execution framework, CoCo-TAMP, introduces a hierarchical state estimation that uses LLM-guided information to shape the belief over task-relevant objects, enabling efficient solutions to long-horizon task and motion planning problems. In experiments, CoCo-TAMP achieves an average reduction of 62.7% in planning and execution time in simulation, and 72.6% in real-world demonstrations, compared to a baseline that does not incorporate either type of common-sense knowledge.

2603.03294 2026-03-09 cs.CL cs.AI cs.LG

Fine-Tuning and Evaluating Conversational AI for Agricultural Advisory

Sanyam Singh, Naga Ganesh, Vineet Singh, Lakshmi Pedapudi, Ritesh Kumar, SSP Jyothi, Archana Karanam, Waseem Pasha, Ekta Kumari, C. Yashoda, Mettu Vijaya Rekha Reddy, Shesha Phani Debbesa, Chandan Dash

Comments 22 pages, 5 figures, 9 tables

2603.02872 2026-03-09 cs.CV

Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models

Jialiang Zhang, Junlong Tong, Junyan Lin, Hao Wu, Yirong Sun, Yunpu Ma, Xiaoyu Shen

2603.02795 2026-03-09 cs.CV

VSearcher: Long-Horizon Multimodal Search Agent via Reinforcement Learning

Ruiyang Zhang, Qianguo Sun, Chao Song, Yiyan Qi, Zhedong Zheng

Comments 23 pages, 6 figures

2603.02406 2026-03-09 cs.LG cs.AI

Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

Zhanghan Ni, Yanjing Li, Zeju Qiu, Bernhard Schölkopf, Hongyu Guo, Weiyang Liu, Shengchao Liu

Comments The Fourteenth International Conference on Learning Representations; Code available at: https://github.com/ZhanghanNi/RigidSSL.git

2603.02002 2026-03-09 cs.LG cs.AI

MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interatomic Potentials

Yuanchang Zhou, Siyu Hu, Xiangyu Zhang, Hongyu Wang, Guangming Tan, Weile Jia

Comments 28 pages, 9 figures, 12 tables

2603.01511 2026-03-09 cs.AI

Multimodal Mixture-of-Experts with Retrieval Augmentation for Protein Active Site Identification

Jiayang Wu, Jiale Zhou, Rubo Wang, Xingyi Zhang, Xun Lin, Tianxu Lv, Leong Hou U, Yefeng Zheng

2603.01034 2026-03-09 cs.CV cs.AI cs.LG

Reparameterized Tensor Ring Functional Decomposition for Multi-Dimensional Data Recovery

Yangyang Xu, Junbo Ke, You-Wei Wen, Chao Wang

Comments 22 pages, 18 figures, 12 tables. Accepted by CVPR 2026

2603.00543 2026-03-09 cs.CV

Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark

Ke Cao, Xuanhua He, Xueheng Li, Lingting Zhu, Yingying Wang, Ao Ma, Zhanjie Zhang, Man Zhou, Chengjun Xie, Jie Zhang

Comments Accepted by CVPR 2026

2603.00542 2026-03-09 cs.CV

Adaptive Dynamic Dehazing via Instruction-Driven and Task-Feedback Closed-Loop Optimization for Diverse Downstream Task Adaptation

Yafei Zhang, Shuaitian Song, Huafeng Li, Shujuan Wang, Yu Liu

Comments Accepted by AAAI 2026 (Oral)

2603.00425 2026-03-09 cs.LG

Weight Updates as Activation Shifts: A Principled Framework for Steering

Dyah Adila, John Cooper, Alexander Yun, Avi Trost, Frederic Sala

2602.23972 2026-03-09 cs.RO

Learning Robust Control Policies for Inverted Pose on Miniature Blimp Robots

Yuanlin Yang, Lin Hong, Fumin Zhang

Comments Accepted in ICRA 2026

2602.23543 2026-03-09 cs.CV

Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

Ziqi Gao, Jieyu Zhang, Wisdom Oluchi Ikezogwo, Jae Sung Park, Tario G. You, Daniel Ogbu, Chenhao Zheng, Weikai Huang, Yinuo Yang, Winson Han, Quan Kong, Rajat Saini, Ranjay Krishna

2602.23136 2026-03-09 cs.CL cs.AI cs.LG

Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

Jayadev Billa

Comments 24 pages, 11 tables, 2 figures. Code: https://github.com/jb1999/modality_collapse_paper, submitted for review COLM 2026

2602.23008 2026-03-09 cs.LG cs.AI

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Zeyuan Liu, Jeonghye Kim, Xufang Luo, Dongsheng Li, Yuqing Yang

Comments Accepted to ICLR 2026

2602.22654 2026-03-09 cs.CV

Denoising as Path Planning: Training-Free Acceleration of Diffusion Models with DPCache

Bowen Cui, Yuanbin Wang, Huajiang Xu, Biaolong Chen, Aixi Zhang, Hao Jiang, Zhengzheng Jin, Xu Liu, Pipei Huang

Comments Accepted by CVPR 2026

2602.21835 2026-03-09 cs.CV

UniVBench: Towards Unified Evaluation for Video Foundation Models

Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zuozhu Liu

2602.21273 2026-03-09 cs.CV

StoryTailor: A Zero-Shot Pipeline for Action-Rich Multi-Subject Visual Narratives

Jinghao Hu, Yuhe Zhang, GuoHua Geng, Kang Li, Han Zhang

Comments 24 pages, 19 figures, accepted by CVPR 2026

2602.18822 2026-03-09 cs.CV

Robust Self-Supervised Cross-Modal Super-Resolution against Real-World Misaligned Observations

Xiaoyu Dong, Jiahuan Li, Ziteng Cui, Naoto Yokoya

2602.17598 2026-03-09 cs.CL cs.AI eess.AS

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

Jayadev Billa

Comments 10 pages, 6 figures, 7 tables. submitted for review Interspeech 2026

2602.17424 2026-03-09 cs.CL

Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference

Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Norman Meuschke, Bela Gipp

Comments accepted to ACDSA 2026

2602.17095 2026-03-09 cs.LG cs.AI

FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment

Chuiyang Meng, Ming Tang, Vincent W. S. Wong

2602.15849 2026-03-09 cs.CL cs.AI

IntelliAsk: Learning to Ask High-Quality Research Questions via RLVR

Karun Sharma, Vidushee Vats, Shengzhi Li, Yuxiang Wang, Zhongtian Sun, Prayag Tiwari

Comments 24 Pages, v2, Abstract Modified

2602.12407 2026-03-09 cs.RO cs.CV cs.LG

MiDAS: A Multimodal Data Acquisition System and Dataset for Robot-Assisted Minimally Invasive Surgery

Keshara Weerasinghe, Seyed Hamid Reza Roodabeh, Andrew Hawkins, Zhaomeng Zhang, Zachary Schrader, Homa Alemzadeh

Comments 29 pages, 17 figures

2602.11143 2026-03-09 cs.RO

APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

Yikai Wang, Tingxuan Leng, Changyi Lin, Shiqi Liu, Shir Simon, Bingqing Chen, Jonathan Francis, Ding Zhao

Comments Project Website: https://apex-humanoid.github.io/

2602.11089 2026-03-09 cs.CL cs.AI

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Yicheng Chen, Zerun Ma, Xinchen Xie, Yining Li, Kai Chen

Comments 20 pages, 11 figures

2602.10956 2026-03-09 cs.LG

Stochastic Parroting in Temporal Attention -- Regulating the Diagonal Sink

Victoria Hankemeier, Malte Schilling

Comments Accepted at ESANN 2026, Code: https://github.com/vicky-hnk/spatio-temp-parroting

2602.10704 2026-03-09 cs.CV cs.RO

(MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization

Minglei Li, Mengfan He, Chunyu Li, Chao Chen, Xingyu Shao, Ziyang Meng

2602.10177 2026-03-09 cs.LG cs.AI cs.CL cs.CY

Towards Autonomous Mathematics Research

Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong

Comments 42 pages, updated with summary of FirstProof results. Accompanied blog post https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/

Abstract

Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems, a novel inference-time scaling law that extends beyond Olympiad-level problems, and intensive tool use to navigate the complexities of mathematical research. We demonstrate the capability of Aletheia from Olympiad problems to PhD-level exercises and, most notably, through several distinct milestones in AI-assisted mathematics research: (a) a research paper (Feng26) generated by AI without any human intervention in calculating certain structure constants in arithmetic geometry called eigenweights; (b) a research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets; and (c) an extensive semi-autonomous evaluation (Feng et al., 2026a) of 700 open problems on Bloom's Erdős Conjectures database, including autonomous solutions to four open questions. To help the public better understand the developments pertaining to AI and mathematics, we suggest quantifying standard levels of autonomy and novelty of AI-assisted results and propose a novel concept of human-AI interaction cards for transparency. We conclude with reflections on human-AI collaboration in mathematics and share all prompts as well as model outputs at https://github.com/google-deepmind/superhuman/tree/main/aletheia.

2602.08877 2026-03-09 cs.LG

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley, Benjamin M. Marlin, David Lindner

Comments Accepted at the ICLR 2026 Workshop on Principled Design for Trustworthy AI