arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2602.00477 2026-03-02 cs.CL

Intention-Adaptive LLM Fine-Tuning for Text Revision Generation

Zhexiong Liu, Diane Litman

Comments In the Conference of the European Chapter of the Association for Computational Linguistics (EACL), March 2026

详情

英文摘要

Large Language Models (LLMs) have achieved impressive capabilities in various context-based text generation tasks, such as summarization and reasoning; however, their applications in intention-based generation tasks remain underexplored. One such example is revision generation, which requires the generated text to explicitly reflect the writer's actual intentions. Identifying intentions and generating desirable revisions are challenging due to their complex and diverse nature. Although prior work has employed LLMs to generate revisions with few-shot learning, they struggle with handling entangled multi-intent scenarios. While fine-tuning LLMs using intention-based instructions appears promising, it demands large amounts of annotated data, which is expensive and scarce in the revision community. To address these challenges, we propose Intention-Tuning, an intention-adaptive layer-wise LLM fine-tuning framework that dynamically selects a subset of LLM layers to learn the intentions and subsequently transfers their representations to revision generation. Experimental results suggest that Intention-Tuning is effective and efficient on small revision corpora, outperforming several PEFT baselines.

URL PDF HTML ☆

赞 0 踩 0

2601.21961 2026-03-02 cs.AI cs.HC

How do Visual Attributes Influence Web Agents? A Comprehensive Evaluation of User Interface Design Factors

Kuai Yu, Naicheng Yu, Han Wang, Rui Yang, Huan Zhang

2601.20582 2026-03-02 cs.CL cond-mat.stat-mech math-ph math.MP

Single-Nodal Spontaneous Symmetry Breaking in NLP Models

Shalom Rosner, Ronit D. Gross, Ella Koresh, Ido Kanter

Comments 23 pages, 6 figures, 1 table

2601.20318 2026-03-02 cs.CV

CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting

Jiyuan Xu, Wenyu Zhang, Xin Jing, Shuai Chen, Shuai Zhang, Jiahao Nie

Comments 22 pages, 10 figures, ICLR 2026

2601.11556 2026-03-02 cs.LG cs.AI cs.CL cs.SD eess.AS

CSyMR: Benchmarking Compositional Music Information Retrieval in Symbolic Music Reasoning

Boyang Wang, Yash Vishe, Xin Xu, Zachary Novack, Xunyi Jiang, Julian McAuley, Junda Wu

2601.11170 2026-03-02 cs.CL

The Growing Gains and Pains of Iterative Web Corpora Crawling: Insights from South Slavic CLASSLA-web 2.0 Corpora

Taja Kuzman Pungeršek, Peter Rupnik, Vít Suchomel, Nikola Ljubešić

Comments 11 pages, 7 figures, 2 tables. Accepted at the LREC 2026 conference

2601.10553 2026-03-02 cs.CV

Inference-time Physics Alignment of Video Generative Models with Latent World Models

Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich, Nicolas Beltran-Velez, Melissa Hall, Reyhane Askari-Hemmat, Xiaochuang Han, Nicolas Ballas, Michal Drozdzal, Adriana Romero-Soriano

Comments 22 pages, 10 figures

2601.00186 2026-03-02 cs.LG cs.IR cs.NI

Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings

Moirangthem Tiken Singh, Adnan Arif

2512.24653 2026-03-02 cs.RO

RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence

Chengkai Hou, Kun Wu, Jiaming Liu, Zhengping Che, Di Wu, Fei Liao, Guangrun Li, Jingyang He, Qiuxuan Feng, Zhao Jin, Chenyang Gu, Zhuoyang Liu, Nuowei Han, Xiangju Mi, Yaoxu Lv, Yankai Fu, Gaole Dai, Langzhe Gu, Tao Li, Yuheng Zhang, Yixue Zhang, Xinhua Wang, Shichao Fan, Meng Li, Zhen Zhao, Ning Liu, Zhiyuan Xu, Pei Ren, Junjie Ji, Haonan Liu, Kuan Cheng, Shanghang Zhang, Jian Tang

2512.23075 2026-03-02 cs.LG cs.AI cs.IT math.IT stat.ML

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Yingru Li, Jiacai Liu, Jiawei Xu, Yuxuan Tong, Ziniu Li, Qian Liu, Baoxiang Wang

2512.17131 2026-03-02 cs.LG cs.AI stat.ML

Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Aaron Defazio, Konstantin Mishchenko, Parameswaran Raman, Hao-Jun Michael Shi, Lin Xiao

2512.14031 2026-03-02 cs.RO cs.AI

Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model

Zhaofeng Hu, Hongrui Yu, Vaidhyanathan Chandramouli, Ci-Jyun Liang

详情

DOI: 10.1061/JCCEE5/CPENG-7696

英文摘要

This study evaluates two leading approaches for teaching construction robots new skills to understand their applicability for construction automation: a Vision-Language-Action (VLA) model and Reinforcement Learning (RL) methods. The goal is to understand both task performance and the practical effort needed to deploy each approach on real jobs. The authors developed two teleoperation interfaces to control the robots and collect the demonstrations needed, both of which proved effective for training robots for long-horizon and dexterous tasks. In addition, the authors conduct a three-stage evaluation. First, the authors compare a Multi-Layer Perceptron (MLP) policy with a Deep Q-network (DQN) imitation model to identify the stronger RL baseline, focusing on model performance, generalization, and a pick-up experiment. Second, three different VLA models are trained in two different scenarios and compared with each other. Third, the authors benchmark the selected RL baseline against the VLA model using computational and sample-efficiency measures and then a robot experiment on a multi-stage panel installation task that includes transport and installation. The VLA model demonstrates strong generalization and few-shot capability, achieving 60% and 100% success in the pickup phase. In comparison, DQN can be made robust but needs additional noise during tuning, which increases the workload. Overall, the findings indicate that VLA offers practical advantages for changing tasks by reducing programming effort and enabling useful performance with minimal data, while DQN provides a viable baseline when sufficient tuning effort is acceptable.

URL PDF HTML ☆

赞 0 踩 0

2512.10685 2026-03-02 cs.CV cs.LG

Sharp Monocular View Synthesis in Less Than a Second

Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun

Comments Published at ICLR 2026. Code and weights available at https://github.com/apple/ml-sharp

2512.08016 2026-03-02 cs.CV cs.AI

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models

Jiyoon Pyo, Yuankun Jiao, Dongwon Jung, Zekun Li, Leeje Jang, Sofia Kirsanova, Jina Kim, Yijun Lin, Qin Liu, Junyi Xie, Hadi Askari, Nan Xu, Muhao Chen, Yao-Yi Chiang

Comments Accepted to ICLR 2026

详情

英文摘要

Cartographic reasoning is the skill of interpreting geographic relationships by aligning legends, map scales, compass directions, map texts, and geometries across one or more map images. Although essential as a concrete cognitive capability and for critical tasks such as disaster response and urban planning, it remains largely unevaluated. Building on progress in chart and infographic understanding, recent large vision language model studies on map visual question-answering often treat maps as a special case of charts. In contrast, map VQA demands comprehension of layered symbology (e.g., symbols, geometries, and text labels) as well as spatial relations tied to orientation and distance that often span multiple maps and are not captured by chart-style evaluations. To address this gap, we introduce FRIEDA, a benchmark for testing complex open-ended cartographic reasoning in LVLMs. FRIEDA sources real map images from documents and reports in various domains and geographical areas. Following classifications in Geographic Information System (GIS) literature, FRIEDA targets all three categories of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). All questions require multi-step inference, and many require cross-map grounding and reasoning. We evaluate eleven state-of-the-art LVLMs under two settings: (1) the direct setting, where we provide the maps relevant to the question, and (2) the contextual setting, where the model may have to identify the maps relevant to the question before reasoning. Even the strongest models, Gemini-2.5-Pro and GPT-5-Think, achieve only 38.20% and 37.20% accuracy, respectively, far below human performance of 84.87%. These results reveal a persistent gap in multi-step cartographic reasoning, positioning FRIEDA as a rigorous benchmark to drive progress on spatial intelligence in LVLMs.

URL PDF HTML ☆

赞 0 踩 0

2512.05651 2026-03-02 cs.CV

Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective

Nan Zhong, Mian Zou, Yiran Xu, Zhenxing Qian, Xinpeng Zhang, Baoyuan Wu, Kede Ma

2512.04576 2026-03-02 cs.CV

TARDis: Time Attenuated Representation Disentanglement for Incomplete Multi-Modal Tumor Segmentation and Classification

Zishuo Wan, Qinqin Kang, Na Li, Yi Huang, Qianru Zhang, Le Lu, Yun Bian, Dawei Ding, Ke Yan

2512.03816 2026-03-02 cs.LG cs.CR

Log Probability Tracking of LLM APIs

Timothée Chauvin, Erwan Le Merrer, François Taïani, Gilles Tredan

Comments ICLR 2026

2512.02814 2026-03-02 cs.AI

Radiologist Copilot: An Agentic Framework Orchestrating Specialized Tools for Reliable Radiology Reporting

Yongrui Yu, Zhongzhen Huang, Linjie Mu, Shaoting Zhang, Xiaofan Zhang

2512.01047 2026-03-02 cs.AI cs.LG cs.RO

Automating the Refinement of Reinforcement Learning Specifications

Tanmay Ambadkar, Đorđe Žikelić, Abhinav Verma

Comments Fourteenth International Conference on Learning Representations 2026 https://ambadkar.com/autospec

2511.21934 2026-03-02 cs.LG cs.AI

Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation

Tao Zhe, Huazhen Fang, Kunpeng Liu, Qian Lou, Tamzidul Hoque, Dongjie Wang

Comments Accepted at KDD 2026 Research Track

详情

英文摘要

Feature transformation enhances downstream task performance by generating informative features through mathematical feature crossing. Despite the advancements in deep learning, feature transformation remains essential for structured data, where deep models often struggle to capture complex feature interactions. Prior literature on automated feature transformation has achieved success but often relies on heuristics or exhaustive searches, leading to inefficient and time-consuming processes. Recent works employ reinforcement learning (RL) to enhance traditional approaches through a more effective trial-and-error way. However, two limitations remain: 1) Dynamic feature expansion during the transformation process, which causes instability and increases the learning complexity for RL agents; 2) Insufficient cooperation and communication between agents, which results in suboptimal feature crossing operations and degraded model performance. To address them, we propose a novel heterogeneous multi-agent RL framework to enable cooperative and scalable feature transformation. The framework comprises three heterogeneous agents, grouped into two types, each designed to select essential features and operations for feature crossing. To enhance communication among these agents, we implement a shared critic mechanism that facilitates information exchange during feature transformation. To handle the dynamically expanding feature space, we tailor multi-head attention-based feature agents to select suitable features for feature crossing. Additionally, we introduce a state encoding technique during the optimization process to stabilize and enhance the learning dynamics of the RL agents, resulting in more robust and reliable transformation policies. Finally, we conduct extensive experiments to validate the effectiveness, efficiency, robustness, and interpretability of our model.

URL PDF HTML ☆

赞 0 踩 0

2511.21135 2026-03-02 cs.RO cs.AI cs.CV

SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation

Ziyi Chen, Yingnan Guo, Zedong Chu, Minghua Luo, Yanfen Shen, Mingchao Sun, Junjun Hu, Shichao Xie, Kuan Yang, Pei Shi, Zhining Gu, Lu Liu, Honglin Han, Xiaolong Wu, Mu Xu, Yu Zhang, Ning Guo

2511.18825 2026-03-02 cs.CV

Q-Save: Towards Scoring and Attribution for Generated Video Evaluation

Xiele Wu, Zicheng Zhang, Mingtao Chen, Yixian Liu, Yiming Liu, Shushi Wang, Zhichao Hu, Yuhong Liu, Guangtao Zhai, Xiaohong Liu

Comments 20 pages, 11 figures

2511.18326 2026-03-02 cs.CV cs.AI

General vs Domain-Specific CNNs: Understanding Pretraining Effects on Brain MRI Tumor Classification

Helia Abedini, Saba Rahimi, Reza Vaziri

2511.18183 2026-03-02 cs.RO

Off-Road Navigation via Implicit Neural Representation of Terrain Traversability

Yixuan Jia, Qingyuan Li, Jonathan P. How

Comments Full version: 10 pages

2511.15927 2026-03-02 cs.LG cs.AI

DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone

Vaibhav Singh, Oleksiy Ostapenko, Pierre-André Noël, Eugene Belilovsky, Torsten Scholak

Comments 8 pages, 3 figures

2511.10762 2026-03-02 cs.RO cs.CV

Attentive Feature Aggregation or: How Policies Learn to Stop Worrying about Robustness and Attend to Task-Relevant Visual Cues

Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier, Sethu Vijayakumar, Alexandros Kouris, Oisin Mac Aodha, Chris Xiaoxuan Lu

Comments This paper stems from a split of our earlier work "When Pre-trained Visual Representations Fall Short: Limitations in Visuo-Motor Robot Learning." While "The Temporal Trap" replaces the original and focuses on temporal entanglement, this companion study examines policy robustness and task-relevant visual cue selection. arXiv admin note: text overlap with arXiv:2502.03270

2511.05408 2026-03-02 cs.CL cs.LG

Steering Language Models with Weight Arithmetic

Constanza Fierro, Fabien Roger

Comments ICLR 2026 camera-ready

2511.04506 2026-03-02 cs.CL

Modeling Clinical Uncertainty in Radiology Reports: from Explicit Uncertainty Markers to Implicit Reasoning Pathways

Paloma Rabaey, Jong Hak Moon, Jung-Oh Lee, Min Gwan Kim, Hangyul Yoon, Thomas Demeester, Edward Choi

2511.03772 2026-03-02 cs.CL

GRDD+: An Extended Greek Dialectal Dataset with Cross-Architecture Fine-tuning Evaluation

Stergios Chatzikyriakidis, Dimitris Papadakis, Sevasti-Ioanna Papaioannou, Erofili Psaltaki

2510.25992 2026-03-02 cs.CL cs.AI cs.LG

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Yihe Deng, I-Hung Hsu, Jun Yan, Zifeng Wang, Rujun Han, Gufeng Zhang, Yanfei Chen, Wei Wang, Tomas Pfister, Chen-Yu Lee

Comments Paper accepted by ICLR 2026. The first two authors contribute equally