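arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.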

2602.15828 2026-02-18 cs.RO cs.CV cs.LG

Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation

Yuxuan Kuang, Sungjae Park, Katerina Fragkiadaki, Shubham Tulsiani

Comments Project page: https://dex4d.github.io/

Abstract

Learning generalist policies capable of accomplishing a plethora of everyday tasks remains an open challenge in dexterous manipulation. In particular, collecting large-scale manipulation data via real-world teleoperation is expensive and difficult to scale. While learning in simulation provides a feasible alternative, designing multiple task-specific environments and rewards for training is similarly challenging. We propose Dex4D, a framework that instead leverages simulation for learning task-agnostic dexterous skills that can be flexibly recomposed to perform diverse real-world manipulation tasks. Specifically, Dex4D learns a domain-agnostic 3D point track conditioned policy capable of manipulating any object to any desired pose. We train this 'Anypose-to-Anypose' policy in simulation across thousands of objects with diverse pose configurations, covering a broad space of robot-object interactions that can be composed at test time. At deployment, this policy can be zero-shot transferred to real-world tasks without finetuning, simply by prompting it with desired object-centric point tracks extracted from generated videos. During execution, Dex4D uses online point tracking for closed-loop perception and control. Extensive experiments in simulation and on real robots show that our method enables zero-shot deployment for diverse dexterous manipulation tasks and yields consistent improvements over prior baselines. Furthermore, we demonstrate strong generalization to novel objects, scene layouts, backgrounds, and trajectories, highlighting the robustness and scalability of the proposed framework.

2602.15820 2026-02-18 cs.LG

Stabilizing Test-Time Adaptation of High-Dimensional Simulation Surrogates via D-Optimal Statistics

Anna Zimmel, Paul Setinek, Gianluca Galletti, Johannes Brandstetter, Werner Zellinger

2602.15817 2026-02-18 cs.LG cs.RO math.OC

Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning

Oswin So, Eric Yang Yu, Songyuan Zhang, Matthew Cleaveland, Mitchell Black, Chuchu Fan

Comments ICLR 2026. The project page can be found at https://oswinso.xyz/fge

2602.15816 2026-02-18 cs.AI cs.ET

Developing AI Agents with Simulated Data: Why, what, and how?

Xiaoran Liu, Istvan David

2602.15814 2026-02-18 cs.CL cs.AI

Avey-B

Devang Acharya, Mohammad Hammoud

2602.15813 2026-02-18 cs.RO

FAST-EQA: Efficient Embodied Question Answering with Global and Local Region Relevancy

Haochen Zhang, Nirav Savaliya, Faizan Siddiqui, Enna Sachdeva

Comments WACV 2026

2602.15799 2026-02-18 cs.LG cs.AI

The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety

Max Springer, Chung Peng Lee, Blossom Metevier, Jane Castleman, Bohdan Turbal, Hayoung Jung, Zeyu Shen, Aleksandra Korolova

Comments 27 pages, 4 figures

Abstract

Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and developers have no adversarial intent. We show that the prevailing explanation, that fine-tuning updates should be orthogonal to safety-critical directions in high-dimensional parameter space, offers false reassurance: we show this orthogonality is structurally unstable and collapses under the dynamics of gradient descent. We then resolve this through a novel geometric analysis, proving that alignment concentrates in low-dimensional subspaces with sharp curvature, creating a brittle structure that first-order methods cannot detect or defend. While initial fine-tuning updates may indeed avoid these subspaces, the curvature of the fine-tuning loss generates second-order acceleration that systematically steers trajectories into alignment-sensitive regions. We formalize this mechanism through the Alignment Instability Condition, three geometric properties that, when jointly satisfied, lead to safety degradation. Our main result establishes a quartic scaling law: alignment loss grows with the fourth power of training time, governed by the sharpness of alignment geometry and the strength of curvature coupling between the fine-tuning task and safety-critical parameters. These results expose a structural blind spot in the current safety paradigm. The dominant approaches to safe fine-tuning address only the initial snapshot of a fundamentally dynamic problem. Alignment fragility is not a bug to be patched; it is an intrinsic geometric property of gradient descent on curved manifolds. Our results motivate the development of curvature-aware methods, and we hope will further enable a shift in alignment safety analysis from reactive red-teaming to predictive diagnostics for open-weight model deployment.
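
One way to see where a fourth power of training time could come from is the following illustrative sketch. It is not the paper's derivation, and every symbol in it is a hypothetical introduced here. Assume the displacement $d(t)$ of the parameters inside an alignment-sensitive subspace starts at zero with zero initial velocity (the first-order orthogonality case), that curvature coupling acts as a constant second-order acceleration $a$, and that alignment loss is locally quadratic in $d$ with sharpness $\lambda$.

```latex
% Illustrative assumptions only: d(0) = 0, \dot d(0) = 0, constant acceleration a,
% and a locally quadratic alignment loss with sharpness \lambda.
\begin{align}
  d(t) &= \tfrac{1}{2}\, a\, t^{2}, \\
  \Delta L_{\text{align}}(t) &\approx \tfrac{1}{2}\, \lambda\, d(t)^{2}
      = \tfrac{1}{8}\, \lambda\, a^{2}\, t^{4}.
\end{align}
```

Under these assumptions the loss scales as $t^{4}$, with a prefactor set by the sharpness $\lambda$ and the coupling strength $a$, which matches the qualitative dependence described in the abstract.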

2602.15791 2026-02-18 cs.AI cs.CL

Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings

Suhyung Jang, Ghang Lee, Jaekun Lee, Hyunjun Lee

Comments 42nd International Symposium on Automation and Robotics in Construction (ISARC 2025)

2602.15785 2026-02-18 cs.AI

This human study did not involve human subjects: Validating LLM simulations as behavioral evidence

Jessica Hullman, David Broska, Huaman Sun, Aaron Shaw

2602.15783 2026-02-18 cs.CV

Context-aware Skin Cancer Epithelial Cell Classification with Scalable Graph Transformers

Lucas Sancéré, Noémie Moreau, Katarzyna Bozek

Comments 17 pages, 2 figures

Abstract

Whole-slide images (WSIs) from cancer patients contain rich information that can be used for medical diagnosis or to follow treatment progress. To automate their analysis, numerous deep learning methods based on convolutional neural networks and Vision Transformers have been developed and have achieved strong performance in segmentation and classification tasks. However, due to the large size and complex cellular organization of WSIs, these models rely on patch-based representations, losing vital tissue-level context. We propose using scalable Graph Transformers on a full-WSI cell graph for classification. We evaluate this methodology on a challenging task: the classification of healthy versus tumor epithelial cells in cutaneous squamous cell carcinoma (cSCC), where both cell types exhibit very similar morphologies and are therefore difficult to differentiate for image-based approaches. We first compared image-based and graph-based methods on a single WSI. Graph Transformer models SGFormer and DIFFormer achieved balanced accuracies of $85.2 \pm 1.5$ ($\pm$ standard error) and $85.1 \pm 2.5$ in 3-fold cross-validation, respectively, whereas the best image-based method reached $81.2 \pm 3.0$. By evaluating several node feature configurations, we found that the most informative representation combined morphological and texture features as well as the cell classes of non-epithelial cells, highlighting the importance of the surrounding cellular context. We then extended our work to train on several WSIs from several patients. To address the computational constraints of image-based models, we extracted four $2560 \times 2560$ pixel patches from each image and converted them into graphs. In this setting, DIFFormer achieved a balanced accuracy of $83.6 \pm 1.9$ (3-fold cross-validation), while the state-of-the-art image-based model CellViT256 reached $78.1 \pm 0.5$.
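
The figures above are reported as balanced accuracy with a standard error over 3-fold cross-validation. As a minimal sketch of how such a number is commonly computed, the code below takes balanced accuracy as the mean of per-class recall and the standard error as the sample standard deviation divided by the square root of the number of folds. The function names and the toy labels are hypothetical, and the abstract does not say which estimator the authors used.

```python
from math import sqrt
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of the samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return mean(recalls)


def mean_and_standard_error(scores):
    """Mean and standard error across folds.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(scores), stdev(scores) / sqrt(len(scores))


# Hypothetical toy folds: (true labels, predicted labels), 0 = healthy, 1 = tumor.
folds = [
    ([0, 0, 0, 1], [0, 0, 1, 1]),
    ([0, 0, 1, 1], [0, 0, 1, 0]),
    ([0, 1, 1, 1], [0, 1, 1, 1]),
]
scores = [balanced_accuracy(t, p) for t, p in folds]
avg, se = mean_and_standard_error(scores)
print(f"balanced accuracy: {100 * avg:.1f} +/- {100 * se:.1f}")
```

Balanced accuracy is a natural choice when healthy and tumor cells are not equally frequent, because a classifier that always predicts the majority class scores only 0.5 on a two-class problem.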

2602.15782 2026-02-18 cs.CV

Meteorological data and Sky Images meets Neural Models for Photovoltaic Power Forecasting

Ines Montoya-Espinagosa, Antonio Agudo

Comments CAI 2026

2602.15776 2026-02-18 cs.AI

GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent Systems

Yiqin Yang, Xu Yang, Yuhua Jiang, Ni Mu, Hao Hu, Runpeng Xie, Ziyou Zhang, Siyuan Li, Yuan-Hua Ni, Qianchuan Zhao, Bo Xu

2602.15775 2026-02-18 cs.CV

NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy

Laura Salort-Benejam, Antonio Agudo

Comments ISBI 2026

2602.15769 2026-02-18 cs.CL

ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution

Yahia Alqurnawi, Preetom Biswas, Anmol Rao, Tejas Anvekar, Chitta Baral, Vivek Gupta

2602.15767 2026-02-18 cs.RO cs.AI cs.HC

Robot-Assisted Social Dining as a White Glove Service

Atharva S Kashyap, Ugne Aleksandra Morkute, Patricia Alves-Oliveira

Comments 20 pages, 9 figures. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)

2602.15766 2026-02-18 cs.SD

TAC: Timestamped Audio Captioning

Sonal Kumar, Prem Seetharaman, Ke Chen, Oriol Nieto, Jiaqi Su, Zhepei Wang, Rithesh Kumar, Dinesh Manocha, Nicholas J. Bryan, Zeyu Jin, Justin Salamon

2602.15758 2026-02-18 cs.CL cs.AI

ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models

Manav Nitin Kapadnis, Lawanya Baghel, Atharva Naik, Carolyn Rosé

Comments 16 pages, 13 figures including Supplementary Material

2602.15757 2026-02-18 cs.CL cs.AI

Beyond Binary Classification: Detecting Fine-Grained Sexism in Social Media Videos

Laura De Grazia, Danae Sánchez Villegas, Desmond Elliott, Mireia Farrús, Mariona Taulé

2602.15755 2026-02-18 cs.CV cs.RO

RaCo: Ranking and Covariance for Practical Learned Keypoints

Abhiram Shenoi, Philipp Lindenberger, Paul-Edouard Sarlin, Marc Pollefeys

2602.15753 2026-02-18 cs.CL

Under-resourced studies of under-resourced languages: lemmatization and POS-tagging with LLM annotators for historical Armenian, Georgian, Greek and Syriac

Chahan Vidal-Gorène, Bastien Kindt, Florian Cafiero

2602.15750 2026-02-18 cs.LG cs.AI

UrbanVerse: Learning Urban Region Representation Across Cities and Tasks

Fengze Sun, Egemen Tanin, Shanika Karunasekera, Zuqing Li, Flora D. Salim, Jianzhong Qi

2602.15740 2026-02-18 cs.LG cs.AI q-bio.QM

MRC-GAT: A Meta-Relational Copula-Based Graph Attention Network for Interpretable Multimodal Alzheimer's Disease Diagnosis

Fatemeh Khalvandi, Saadat Izadi, Abdolah Chalechale

Comments 27 pages, 10 figures, 10 tables

2602.15734 2026-02-18 cs.CV

Language and Geometry Grounded Sparse Voxel Representations for Holistic Scene Understanding

Guile Wu, David Huang, Bingbing Liu, Dongfeng Bai

Comments Technical Report

2602.15733 2026-02-18 cs.RO cs.AI

MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction

Qiang Zhang, Jiahao Ma, Peiran Liu, Shuai Shi, Zeran Su, Zifan Wang, Jingkai Sun, Wei Cui, Jialin Yu, Gang Han, Wen Zhao, Pihai Sun, Kangning Yin, Jiaxu Wang, Jiahang Cao, Lingfeng Zhang, Hao Cheng, Xiaoshuai Hao, Yiding Ji, Junwei Liang, Jian Tang, Renjing Xu, Yijie Guo

Comments 17 pages, 6 figures

2602.15730 2026-02-18 cs.CL econ.EM

Causal Effect Estimation with Latent Textual Treatments

Omri Feldman, Amar Venugopal, Jann Spiess, Amir Feder

2602.15727 2026-02-18 cs.CV cs.AI cs.GR cs.LG eess.IV

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Hila Manor, Rinon Gal, Haggai Maron, Tomer Michaeli, Gal Chechik

Comments Code and data are in https://research.nvidia.com/labs/par/lorweb

2602.15725 2026-02-18 cs.AI cs.CL cs.LG

Recursive Concept Evolution for Compositional Reasoning in Large Language Models

Sarim Chaudhry

2602.15724 2026-02-18 cs.CV cs.AI

Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation

Shutian Gu, Chengkai Huang, Ruoyu Wang, Lina Yao

2602.15721 2026-02-18 cs.RO cs.AI

Lifelong Scalable Multi-Agent Realistic Testbed and A Comprehensive Study on Design Choices in Lifelong AGV Fleet Management Systems

Jingtian Yan, Yulun Zhang, Zhenting Liu, Han Zhang, He Jiang, Jingkai Chen, Stephen F. Smith, Jiaoyang Li

2602.15716 2026-02-18 cs.CL

Rethinking Metrics for Lexical Semantic Change Detection

Roksana Goworek, Haim Dubossarsky

Comments Accepted to the LChange 2026 Workshop, colocated with EACL 2026