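arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.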

2501.00296 2026-03-10 cs.RO cs.AI cs.CV cs.LG

From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models

Ashay Athalye, Nishanth Kumar, Tom Silver, Yichao Liang, Jiuguang Wang, Tomás Lozano-Pérez, Leslie Pack Kaelbling

Comments A version of this paper appears in the official proceedings of RA-L, Volume 11, Issue 4

Abstract

Our aim is to learn to solve long-horizon decision-making problems in complex robotics domains given low-level skills and a handful of short-horizon demonstrations containing sequences of images. To this end, we focus on learning abstract symbolic world models that facilitate zero-shot generalization to novel goals via planning. A critical component of such models is the set of symbolic predicates that define properties of and relationships between objects. In this work, we leverage pretrained vision-language models (VLMs) to propose a large set of visual predicates potentially relevant for decision-making, and to evaluate those predicates directly from camera images. At training time, we pass the proposed predicates and demonstrations into an optimization-based model-learning algorithm to obtain an abstract symbolic world model that is defined in terms of a compact subset of the proposed predicates. At test time, given a novel goal in a novel setting, we use the VLM to construct a symbolic description of the current world state, and then use a search-based planning algorithm to find a sequence of low-level skills that achieves the goal. We demonstrate empirically across experiments in both simulation and the real world that our method can generalize aggressively, applying its learned world model to solve problems with a wide variety of object types, arrangements, numbers of objects, and visual backgrounds, as well as novel goals and much longer horizons than those seen at training time.

2412.18582 2026-03-10 cs.CL cs.LG

Exploring Embedding Priors in Prompt-Tuning for Improved Interpretability and Control

Sergey Sedov, Sumanth Bharadwaj Hachalli Karanam, Venu Gopal Kadamba

2412.14744 2026-03-10 cs.LG

Finite Sample Bounds for Non-Parametric Regression: Optimal Sample Efficiency and Space Complexity

Davide Maran, Marcello Restelli

2410.02843 2026-03-10 cs.LG cs.AI physics.comp-ph

Neural delay differential equations: learning non-Markovian closures for partially known dynamical systems

Thibault Monsel, Onofrio Semeraro, Lionel Mathelin, Guillaume Charpiat

2409.16990 2026-03-10 cs.CV

Single Image, Any Face: Generalisable 3D Face Generation

Wenqing Wang, Haosen Yang, Josef Kittler, Xiatian Zhu

Comments Accepted by Pattern Recognition, March 2026

2409.11148 2026-03-10 cs.CL cs.AI

Improving the Efficiency of Visually Augmented Language Models

Paula Ontalvilla, Aitor Ormazabal, Gorka Azkune

Comments COLING 2025

2409.09787 2026-03-10 cs.LG cs.AI stat.CO stat.ML

BNEM: A Boltzmann Sampler Based on Bootstrapped Noised Energy Matching

RuiKang OuYang, Bo Qiang, José Miguel Hernández-Lobato

Comments Camera-ready version for TMLR (03/2026)

2409.08926 2026-03-10 cs.RO cs.CV

ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation

Kaixin Bai, Huajian Zeng, Lei Zhang, Yiwen Liu, Hongli Xu, Zhaopeng Chen, Jianwei Zhang

Comments 9 pages

2409.08439 2026-03-10 cs.RO cs.AI cs.LG cs.SY eess.SY

Input-to-State Stable Coupled Oscillator Networks for Closed-form Model-based Control in Latent Space

Maximilian Stölzle, Cosimo Della Santina

Comments 38th Conference on Neural Information Processing Systems (NeurIPS 2024) spotlight, 50 pages

DOI: 10.52202/079017-2607
Journal ref: Stölzle, Maximilian, and Cosimo Della Santina. "Input-to-state stable coupled oscillator networks for closed-form model-based control in latent space." Advances in Neural Information Processing Systems 37 (2024): 82010-82059

Abstract

Even though a variety of methods have been proposed in the literature, efficient and effective latent-space control (i.e., control in a learned low-dimensional space) of physical systems remains an open challenge. We argue that a promising avenue is to leverage powerful and well-understood closed-form strategies from the control theory literature, such as potential-energy shaping, in combination with learned dynamics. We identify three fundamental shortcomings in existing latent-space models that have so far prevented this powerful combination: (i) they lack the mathematical structure of a physical system, (ii) they do not inherently conserve the stability properties of the real systems, and (iii) they do not have an invertible mapping between input and latent-space forcing. This work proposes a novel Coupled Oscillator Network (CON) model that simultaneously tackles all these issues. More specifically, (i) we show analytically that CON is a Lagrangian system, i.e., it possesses well-defined potential and kinetic energy terms. Then, (ii) we provide a formal proof of global Input-to-State stability using Lyapunov arguments. Moving to the experimental side, we demonstrate that CON reaches state-of-the-art performance when learning complex nonlinear dynamics of mechanical systems directly from images. An additional methodological innovation contributing to achieving this third goal is an approximated closed-form solution for efficient integration of network dynamics, which eases training. We tackle (iii) by approximating the forcing-to-input mapping with a decoder that is trained to reconstruct the input based on the encoded latent-space force. Finally, we show how these properties enable latent-space control. We use an integral-saturated PID with potential force compensation and demonstrate high-quality performance on a soft robot using raw pixels as the only feedback information.

2408.15205 2026-03-10 cs.CV

Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong

Comments NeurIPS 2024

Abstract

Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt for improving segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer from hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we utilize hallucinations to mine task-related information from images and verify its accuracy to enhance the precision of the generated prompts. Specifically, we introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator. The prompt generator uses multi-scale chain-of-thought prompting, initially exploring hallucinations to extract extended contextual knowledge on a test image. These hallucinations are then reduced to formulate precise instance-specific prompts, directing the mask generator to produce masks that are consistent with task semantics by mask semantic alignment. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks. Experiments on 5 benchmarks demonstrate the effectiveness of ProMaC. Code is available at https://lwpyh.github.io/ProMaC/.

2402.15109 2026-03-10 cs.LG

Remaining-data-free Machine Unlearning by Suppressing Sample Contribution

Xinwen Cheng, Zhehao Huang, Wenxin Zhou, Zhengbao He, Ruikai Yang, Yingwen Wu, Xiaolin Huang

2309.09045 2026-03-10 cs.LG

Temporal Smoothness Regularisers for Neural Link Predictors

Manuel Dileo, Pasquale Minervini, Matteo Zignani, Sabrina Gaito

DOI: 10.14428/esann/2025.ES2025-87

Abstract

Most algorithms for representation learning and link prediction on relational data are designed for static data. However, the data to which they are applied typically evolves over time, as in online social networks or interactions between users and items in recommender systems. This is also the case for graph-structured knowledge bases -- knowledge graphs -- which contain facts that are valid only for specific points in time. In such contexts, it becomes crucial to correctly identify missing links at a precise time point, i.e. the temporal link prediction task. Recently, Lacroix et al. and Sadeghian et al. proposed a solution to the problem of link prediction for knowledge graphs under temporal constraints inspired by the canonical decomposition of order-4 tensors, where they regularise the representations of time steps by enforcing temporal smoothing, i.e. by learning similar transformations for adjacent timestamps. However, the impact of the choice of temporal regularisation terms is still poorly understood. In this work, we systematically analyse several choices of temporal smoothing regularisers using linear functions and recurrent architectures. In our experiments, we show that by carefully selecting the temporal smoothing regulariser and regularisation weight, a simple method like TNTComplEx can produce significantly more accurate results than state-of-the-art methods on three widely used temporal link prediction datasets. Furthermore, we evaluate the impact of a wide range of temporal smoothing regularisers on two state-of-the-art temporal link prediction models. Our work shows that simple tensor factorisation models can produce new state-of-the-art results using newly proposed temporal regularisers, highlighting a promising avenue for future research.

2210.00869 2026-03-10 cs.LG astro-ph.GA cs.AI

Explainable classification of astronomical uncertain time series

Michael Franklin Mbouopda, Emille E. O. Ishida, Engelbert Mephu Nguifo, Emmanuel Gangler

2603.08177 2026-03-10 cs.CL cs.AI cs.LG

Is continuous CoT better suited for multi-lingual reasoning?

Ali Hamza Bashir, Behzad Shomali, Markus Frey, Mehdi Ali, Rafet Sifa, David Berghaus

Comments Accepted at the ICLR latent reasoning workshop

2603.08173 2026-03-10 cs.SD cs.AI

Evolution Strategy-Based Calibration for Low-Bit Quantization of Speech Models

Lucas Rakotoarivony

Comments Submitted to INTERSPEECH 2026

2603.08171 2026-03-10 cs.AI

Evidence-Driven Reasoning for Industrial Maintenance Using Heterogeneous Data

Fearghal O'Donncha, Nianjun Zhou, Natalia Martinez, James T Rayfield, Fenno F. Heath, Abigail Langbridge, Roman Vaculin

2603.08166 2026-03-10 cs.CL

RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs

Zhijun Wang, Ling Luo, Dinghao Pan, Huan Zhuang, Lejing Yu, Yuanyuan Sun, Hongfei Lin

Comments 21 pages, 7 figures

2603.08159 2026-03-10 cs.LG

Learning Hierarchical Knowledge in Text-Rich Networks with Taxonomy-Informed Representation Learning

Yunhui Liu, Yongchao Liu, Yinfeng Chen, Chuntao Hong, Tao Zheng, Tieke He

Comments Accepted by KDD 2026. Extended version coming soon

2603.08156 2026-03-10 cs.LG stat.ML

Are We Winning the Wrong Game? Revisiting Evaluation Practices for Long-Term Time Series Forecasting

Thanapol Phungtua-eng, Yoshitaka Yamamoto

Comments First draft

2603.08154 2026-03-10 cs.SD cs.MM

Soundscapes in Spectrograms: Pioneering Multilabel Classification for South Asian Sounds

Sudip Chakrabarty, Pappu Bishwas, Rajdeep Chatterjee, Tathagata Bandyopadhyay, Digonto Biswas, Bibek Howlader

2603.08153 2026-03-10 cs.CL

Gender Bias in MT for a Genderless Language: New Benchmarks for Basque

Amaia Murillo, Olatz Perez-de-Viñaspre, Naiara Perez

2603.08150 2026-03-10 cs.CV cs.RO

Edged USLAM: Edge-Aware Event-Based SLAM with Learning-Based Depth Priors

Şebnem Sarıözkan, Hürkan Şahin, Olaya Álvarez-Tuñón, Erdal Kayacan

Comments 8 pages, 7 figures, 3 tables. Accepted to ICRA 2026. Project code and datasets available at https://github.com/sebnem-byte/Edged-USLAM

2603.08148 2026-03-10 cs.CL cs.AI

Gradually Excavating External Knowledge for Implicit Complex Question Answering

Chang Liu, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Edmund Y. Lam, Ngai Wong

Comments 13 pages, 3 figures, EMNLP findings 2023

2603.08147 2026-03-10 cs.CV

MV-Fashion: Towards Enabling Virtual Try-On and Size Estimation with Multi-View Paired Data

Hunor Laczkó, Libang Jia, Loc-Phat Truong, Diego Hernández, Sergio Escalera, Jordi Gonzalez, Meysam Madadi

2603.08137 2026-03-10 cs.LG

Mitigating Homophily Disparity in Graph Anomaly Detection: A Scalable and Adaptive Approach

Yunhui Liu, Qizhuo Xie, Yinfeng Chen, Xudong Jin, Tao Zheng, Bin Chong, Tieke He

Comments Accepted by WWW 2026

2603.08136 2026-03-10 cs.RO

POIROT: Investigating Direct Tangible vs. Digitally Mediated Interaction and Attitude Moderation in Multi-party Murder Mystery Games

Wen Chen, Rongxi Chen, Shankai Chen, Huiyang Gong, Minghui Guo, Yingri Xu, Xintong Wu, Xinyi Fu

Comments 16 pages, 7 figures. Accepted to the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI 2026)

2603.08135 2026-03-10 cs.CV

VesselFusion: Diffusion Models for Vessel Centerline Extraction from 3D CT Images

Soichi Mita, Shumpei Takezaki, Ryoma Bise

2603.08133 2026-03-10 cs.CV

Fast Low-light Enhancement and Deblurring for 3D Dark Scenes

Feng Zhang, Jinglong Wang, Ze Li, Yanghong Zhou, Yang Chen, Lei Chen, Xiatian Zhu

Comments 5 pages, 2 figures, Accepted at ICASSP 2026

2603.08131 2026-03-10 cs.RO cs.CV

UniGround: Universal 3D Visual Grounding via Training-Free Scene Parsing

Jiaxi Zhang, Yunheng Wang, Wei Lu, Taowen Wang, Weisheng Xu, Shuning Zhang, Yixiao Feng, Yuetong Fang, Renjing Xu

Comments 14 pages, 6 figures, 3 tables

2603.08130 2026-03-10 cs.LG stat.ML

Explainable Condition Monitoring via Probabilistic Anomaly Detection Applied to Helicopter Transmissions

Aurelio Raffa Ugolini, Jessica Leoni, Valentina Breschi, Damiano Paniccia, Francesco Aldo Tucci, Luigi Capone, Mara Tanelli