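arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.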

2512.22519 2026-04-27 cs.RO

Clutter-Robust Vision-Language-Action Models through Object-Centric and Geometry Grounding

Khoa Vo, Taisei Hanyu, Yuki Ikebe, Trong Thang Pham, Nhat Chung, Minh Nhat Vu, Duy Nguyen Ho Minh, Anh Nguyen, Anthony Gunderman, Chase Rainwater, Ngan Le

Comments Under review. Project website: https://uark-aicv.github.io/OBEYED_VLA

Abstract

Recent Vision-Language-Action (VLA) models have made impressive progress toward general-purpose robotic manipulation by post-training large Vision-Language Models (VLMs) for action prediction. Yet most VLAs entangle perception and control in a monolithic pipeline optimized purely for action, which can erode language-conditioned grounding. In our real-world tabletop tests, policies over-grasp when the target is absent, are distracted by clutter, and overfit to background appearance. To address these issues, we propose OBEYED-VLA (OBject-centric and gEometrY groundED VLA), a framework that explicitly disentangles perceptual grounding from action reasoning. Instead of operating directly on raw RGB, OBEYED-VLA augments VLAs with a perception module that grounds multi-view inputs into task-conditioned, object-centric, and geometry-aware observations. This module includes a VLM-based object-centric grounding stage that selects task-relevant object regions across camera views, along with a complementary geometric grounding stage that emphasizes the 3D structure of these objects over their appearance. The resulting grounded views are then fed to a pretrained VLA policy, which we fine-tune exclusively on single-object demonstrations collected without environmental clutter or non-target objects. On a real-world UR10e tabletop setup, OBEYED-VLA substantially improves robustness over strong VLA baselines across four challenging regimes and multiple difficulty levels: distractor objects, absent-target rejection, background appearance changes, and cluttered manipulation of unseen objects. Ablation studies confirm that both semantic grounding and geometry-aware grounding are critical to these gains. Overall, the results indicate that making perception an explicit, object-centric component is an effective way to strengthen and generalize VLA-based robotic manipulation.

2512.22502 2026-04-27 cs.RO cs.GR

Topology-Preserving Scalar Field Optimization for Boundary-Conforming Spiral Toolpaths on Multiply Connected Freeform Surfaces

Shen Changqing, Xu Bingzhou, Qi Bosong, Zhang Xiaojian, Yan Sijie, Ding Han

Comments Reorganized the manuscript and added more detailed explanations of the workflow and multiple case studies

2512.21898 2026-04-27 cs.RO cs.AI

Flexible Multitask Learning with Factorized Diffusion Policy

Chaoqi Liu, Haonan Chen, Sigmund H. Høeg, Shaoxiong Yao, Yunzhu Li, Kris Hauser, Yilun Du

2512.20831 2026-04-27 cs.AI cs.LG

Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions

Rashmeet Kaur Nayyar, Naman Shah, Siddharth Srivastava

2512.20761 2026-04-27 cs.LG cs.AI

TS-Arena -- A Live Forecast Pre-Registration Platform

Marcel Meyer, Sascha Kaltenpoth, Henrik Albers, Kevin Zalipski, Oliver Müller

2512.05859 2026-04-27 cs.CV

Edit-aware RAW Reconstruction

Abhijith Punnappurath, Luxi Zhao, Ke Zhao, Hue Nguyen, Radek Grzeszczuk, Michael S. Brown

Comments Accepted to CVPR 2026

2511.22277 2026-04-27 cs.LG

TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation

Henrijs Princis, Arindam Sharma, Cristina David

Comments 30 pages, 9 figures, 13 tables

2511.21777 2026-04-27 cs.LG

Artificial intelligence for methane detection: from continuous monitoring to verified mitigation

Gonzalo Mateo-Garcia, Anna Allen, Itziar Irakulis-Loitxate, Manuel Montesino-San Martin, Marc Watine, Cynthia Randles, Tharwat Mokalled, Alma Raunak, Carol Castañeda-Martinez, Juan E. Jonhson, Javier Gorroño, James Requeima, Claudio Cifarelli, Luis Guanter, Richard E. Turner, Manfredi Caltagirone

2511.17663 2026-04-27 cs.LG cs.AI cs.SY eess.SY

AI-based framework to predict animal and pen feed intake in feedlot beef cattle

Alex S. C. Maia, John B. Hall, Hugo F. M. Milan, Izabelle A. M. A. Teixeira

2511.17388 2026-04-27 cs.CL cs.LG

Selective Rotary Position Embedding

Sajad Movahedi, Timur Carstensen, Arshia Afzal, Frank Hutter, Antonio Orvieto, Volkan Cevher

2511.14135 2026-04-27 cs.LG cs.AI cs.GT cs.MA

AdaFair-MARL: Enforcing Adaptive Fairness Constraints in Multi-Agent Reinforcement Learning

Promise Ekpo, Saesha Agarwal, Felix Grimm, Lekan Molu, Angelique Taylor

2511.13211 2026-04-27 cs.CV

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

Yijia Fan, Jusheng Zhang, Kaitong Cai, Jing Yang, Jian Wang, Keze Wang

2511.13193 2026-04-27 cs.AI

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction

Yijia Fan, Jusheng Zhang, Kaitong Cai, Jing Yang, Chengpei Tang, Jian Wang, Keze Wang

2511.07003 2026-04-27 cs.CL

NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs

Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu

Comments Accepted to ACL 2026 Main Conference. Models are available at: https://github.com/NiuTrans/LMT

2511.05456 2026-04-27 cs.LG

Parameter-Efficient Conditioning for Material Generalization in Graph-Based Simulators

Naveen Raj Manoharan, Hassan Iqbal, Krishna Kumar

Abstract

Graph network-based simulators (GNS) have demonstrated strong potential for learning particle-based physics (such as fluids, deformable solids, and granular flows) while generalizing to unseen geometries due to their inherent inductive biases. However, existing models are typically trained for a single material type and fail to generalize across distinct constitutive behaviors, limiting their applicability in real-world engineering settings. Using granular flows as a running example, we propose a parameter-efficient conditioning mechanism that makes the GNS model adaptive to material parameters. We identify that sensitivity to material properties is concentrated in the early message-passing (MP) layers, a finding we link to the local nature of constitutive models (e.g., Mohr-Coulomb) and their effects on information propagation. We empirically validate this by showing that fine-tuning only the first few (1-5) of 10 MP layers of a pretrained model achieves test performance comparable to fine-tuning the entire network. Building on this insight, we propose a parameter-efficient Feature-wise Linear Modulation (FiLM) conditioning mechanism designed specifically to target these early layers. This approach produces accurate long-term rollouts on unseen, interpolated, or moderately extrapolated values (e.g., up to 2.5 degrees for friction angle and 0.25 kPa for cohesion) when trained exclusively on as few as 12 short simulation trajectories from new materials, representing a 5-fold data reduction compared to a baseline multi-task learning method. Finally, we validate the model's utility by applying it to an inverse problem, successfully identifying unknown cohesion parameters from trajectory data. This approach enables the use of GNS in inverse design and closed-loop control tasks where material properties are treated as design variables.

2511.01411 2026-04-27 cs.CV cs.LG eess.IV

Extremal Contours: Gradient-driven contours for compact visual attribution

Reza Karimzadeh, Albert Alonso, Frans Zdyb, Julius B. Kirkegaard, Bulat Ibragimov

2510.27413 2026-04-27 cs.LG cs.AI cs.CL

Atlas-Alignment: Making Interpretability Transferable Across Language Models

Bruno Puri, Jim Berend, Sebastian Lapuschkin, Wojciech Samek

2510.27241 2026-04-27 cs.CL

Identifying the Periodicity of Information in Natural Language

Yulin Ou, Yu Wang, Yang Xu, Hendrik Buschmeier

Comments Accepted at ACL 2026 (main)

2510.25977 2026-04-27 cs.CL

NeuronMLP: Efficient LLM Inference via Singular Value Decomposition Compression and Tiling on AWS Trainium

Dinghong Song, Jierui Xu, Weichu Yang, Pengfei Su, Dong Li

Comments 12 pages, 8 figures

2510.21285 2026-04-27 cs.AI cs.CL

When Models Outthink Their Safety: Unveiling and Mitigating Self-Jailbreak in Large Reasoning Models

Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

Comments ACL 2026. The first two authors contributed equally. The main text is 9 pages, with an appendix of 28 pages. The paper contains 20 figures and 15 tables

2510.19592 2026-04-27 cs.CV

Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation

Su Ho Han, Jeongseok Hyun, Pilhyeon Lee, Minho Shim, Dongyoon Wee, Seon Joo Kim

Comments Accepted to ICLR 2026. Code is available at https://github.com/HYUNJS/DecAF

2510.18999 2026-04-27 cs.RO cs.AI cs.CV

OREN: Octree Residual Network for Real-Time Euclidean Signed Distance Mapping

Zhirui Dai, Qihao Qian, Tianxing Fan, Nikolay Atanasov

2510.11586 2026-04-27 cs.CL cs.CY

Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models

Georg Ahnert, Anna-Carolina Haensch, Barbara Plank, Markus Strohmaier

2510.10254 2026-04-27 cs.CV

Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?

Yuxiang Lai, Jike Zhong, Ming Li, Yuheng Li, Xiaofeng Yang

2510.07632 2026-04-27 cs.AI cs.CL cs.CV cs.LG

Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models

Yinglun Zhu, Jiancheng Zhang, Fuzhi Tang

Comments To appear at ICLR 2026; extended results to generative multimodal models

2510.05971 2026-04-27 cs.CV

Shaken or Stirred? An Analysis of MetaFormer's Token Mixing for Medical Imaging

Ron Keuth, Paul Kaftan, Mattias P. Heinrich

Comments Code and data: https://github.com/multimodallearning/MetaFormerMedImaging/tree/clean_code

2510.03248 2026-04-27 cs.LG cs.AI cs.CV physics.med-ph

Multimodal Neural Operators for Real-Time Biomechanical Modelling of Traumatic Brain Injury

Anusha Agarwal, Dibakar Roy Sarkar, Somdatta Goswami

2509.25003 2026-04-27 cs.LG cs.CV

Score-based Membership Inference on Diffusion Models

Mingxing Rao, Bowen Qu, Daniel Moyer

2509.24004 2026-04-27 cs.CV

SIE3D: Single-Image Expressive 3D Avatar Generation via Semantic Embedding and Perceptual Expression Loss

Zhiqi Huang, Dulongkai Cui, Jinglu Hu

Comments Published in ICASSP 2026. 5 pages, 3 figures. Project page: https://huang-zhiqi.github.io/SIE3D/

2509.22630 2026-04-27 cs.CL cs.AI cs.LG

StateX: Enhancing RNN Recall via Post-training State Expansion

Xingyu Shen, Yingfa Chen, Zhen Leng Thai, Xu Han, Zhiyuan Liu, Maosong Sun