arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 2393
专题追踪
2409.11972 2026-02-17 cs.RO cs.LG

Generation of Uncertainty-Aware High-Level Spatial Concepts in Factorized 3D Scene Graphs via Graph Neural Networks

Jose Andres Millan-Romera, Muhammad Shaheer, Miguel Fernandez-Cortizas, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez

Comments Submitted to IEEE Robotics and Automation Letters (RA-L)

详情
英文摘要

Enabling robots to autonomously discover high-level spatial concepts (e.g., rooms and walls) from primitive geometric observations (e.g., planar surfaces) within 3D Scene Graphs is essential for robust indoor navigation and mapping. These graphs provide a hierarchical metric-semantic representation in which such concepts are organized. To further enhance graph-SLAM performance, Factorized 3D Scene Graphs incorporate these concepts as optimization factors that constrain relative geometry and enforce global consistency. However, both stages of this process remain largely manual: concepts are typically derived using hand-crafted, concept-specific heuristics, while factors and their covariances are likewise manually designed. This reliance on manual specification limits generalization across diverse environments and scalability to new concept classes. This paper presents a novel learning-based method that infers spatial concepts online from observed vertical planes and introduces them as optimizable factors within a SLAM backend, eliminating the need to handcraft concept generation, factor design, and covariance specification. We evaluate our approach in simulated environments with complex layouts, improving room detection by 20.7% and trajectory estimation by 19.2%, and further validate it on real construction sites, where room detection improves by 5.3% and map matching accuracy by 3.8%. Results confirm that learned factors can improve their handcrafted counterparts in SLAM systems and serve as a foundation for extending this approach to new spatial concepts.

2408.11438 2026-02-17 cs.LG cs.CV physics.ao-ph

Benchmarking AI-based data assimilation to advance data-driven global weather forecasting

Wuxin Wang, Weicheng Ni, Ben Fei, Tao Han, Lilan Huang, Taikang Yuan, Xiaoyong Li, Lei Bai, Boheng Duan, Kaijun Ren

Comments 32pages, 11 figures, 3 tables

详情
英文摘要

Research on Artificial Intelligence (AI)-based Data Assimilation (DA) is expanding rapidly. However, the absence of an objective, comprehensive, and real-world benchmark hinders the fair comparison of diverse methods. Here, we introduce DABench, a benchmark designed for contributing to the development and evaluation of AI-based DA methods. By integrating real-world observations, DABench provides an objective and fair platform for validating long-term closed-loop DA cycles, supporting both deterministic and ensemble configurations. Furthermore, we assess the efficacy of AI-based DA in generating initial conditions for the advanced AI-based weather forecasting model to produce accurate medium-range global weather forecasting. Our dual-validation, utilizing both reanalysis data and independent radiosonde observations, demonstrates that AI-based DA achieves performance competitive with state-of-the-art AI-driven four-dimensional variational frameworks across both global weather DA and medium-range forecasting metrics. We invite the research community to utilize DABench to accelerate the advancement of AI-based DA for global weather forecasting.

2407.11907 2026-02-17 cs.LG cs.SI

GraphFM: A generalist graph transformer that learns transferable representations across diverse domains

Divyansha Lachi, Mehdi Azabou, Vinam Arora, Eva Dyer

Journal ref Transactions on Machine Learning Research, 2025

详情
英文摘要

Graph neural networks (GNNs) are often trained on individual datasets, requiring specialized models and significant hyperparameter tuning due to the unique structures and features of each dataset. This approach limits the scalability and generalizability of GNNs, as models must be tailored for each specific graph type. To address these challenges, we introduce GraphFM, a scalable multi-graph pretraining approach designed for learning across diverse graph datasets. GraphFM uses a Perceiver-based encoder with learned latent tokens to compress domain-specific features into a shared latent space, enabling generalization across graph domains. We propose new techniques for scaling up graph training on datasets of different sizes, allowing us to train GraphFM on 152 distinct graph datasets, containing a total of 7.4 million nodes and 189 million edges. This allows us to study the effect of scale on pretraining across domains such as molecules, citation networks, and product graphs, and show that training on diverse datasets improves performance over single-source pretraining. Additionally, pretraining with a mixture of synthetic and real graphs enhances adaptability and stability, leading to competitive performance with state-of-the-art models across various node classification tasks. This approach reduces the burden of dataset-specific training and provides a single generalist model capable of performing across multiple diverse graph structures and tasks. Code is available at https://github.com/nerdslab/GraphFM.

2407.05006 2026-02-17 cs.CL

Recent Advancements and Challenges of Turkic Central Asian Language Processing

Yana Veitsman, Mareike Hartmann

详情
英文摘要

Research in NLP for Central Asian Turkic languages - Kazakh, Uzbek, Kyrgyz, and Turkmen - faces typical low-resource language challenges like data scarcity, limited linguistic resources and technology development. However, recent advancements have included the collection of language-specific datasets and the development of models for downstream tasks. Thus, this paper aims to summarize recent progress and identify future research directions. It provides a high-level overview of each language's linguistic features, the current technology landscape, the application of transfer learning from higher-resource languages, and the availability of labeled and unlabeled data. By outlining the current state, we hope to inspire and facilitate future research.

2406.19898 2026-02-17 cs.CL

Paraphrase Types Elicit Prompt Engineering Capabilities

Jan Philip Wahle, Terry Ruas, Yang Xu, Bela Gipp

Journal ref EMNLP 2024

详情
英文摘要

Much of the success of modern language models depends on finding a suitable prompt to instruct the model. Until now, it has been largely unknown how variations in the linguistic expression of prompts affect these models. This study systematically and empirically evaluates which linguistic features influence models through paraphrase types, i.e., different linguistic changes at particular positions. We measure behavioral changes for five models across 120 tasks and six families of paraphrases (i.e., morphology, syntax, lexicon, lexico-syntax, discourse, and others). We also control for other prompt engineering factors (e.g., prompt length, lexical diversity, and proximity to training data). Our results show a potential for language models to improve tasks when their prompts are adapted in specific paraphrase types (e.g., 6.7% median gain in Mixtral 8x7B; 5.5% in LLaMA 3 8B). In particular, changes in morphology and lexicon, i.e., the vocabulary used, showed promise in improving prompts. These findings contribute to developing more robust language models capable of handling variability in linguistic expression.

2406.11047 2026-02-17 cs.RO cs.AI

Enhancing Supermarket Robot Interaction: A Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents

Chandran Nandkumar, Luka Peternel

Journal ref Frontiers in Robotics and AI, Volume 12, 1576348, April 2025

详情
英文摘要

This paper presents the design and evaluation of a novel multi-level LLM interface for supermarket robots to assist customers. The proposed interface allows customers to convey their needs through both generic and specific queries. While state-of-the-art systems like OpenAI's GPTs are highly adaptable and easy to build and deploy, they still face challenges such as increased response times and limitations in strategic control of the underlying model for tailored use-case and cost optimization. Driven by the goal of developing faster and more efficient conversational agents, this paper advocates for using multiple smaller, specialized LLMs fine-tuned to handle different user queries based on their specificity and user intent. We compare this approach to a specialized GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) and qualitative participant feedback in a counterbalanced within-subjects experiment. Our findings show that our multi-LLM chatbot architecture outperformed the benchmarked GPT model across all 13 measured criteria, with statistically significant improvements in four key areas: performance, user satisfaction, user-agent partnership, and self-image enhancement. The paper also presents a method for supermarket robot navigation by mapping the final chatbot response to correct shelf numbers, enabling the robot to sequentially navigate towards the respective products, after which lower-level robot perception, control, and planning can be used for automated object retrieval. We hope this work encourages more efforts into using multiple, specialized smaller models instead of relying on a single powerful, but more expensive and slower model.

2406.04955 2026-02-17 cs.RO cs.AI

Experimental Evaluation of ROS-Causal in Real-World Human-Robot Spatial Interaction Scenarios

Luca Castri, Gloria Beraldo, Sariah Mghames, Marc Hanheide, Nicola Bellotto

Comments Published at 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

详情
英文摘要

Deploying robots in human-shared environments requires a deep understanding of how nearby agents and objects interact. Employing causal inference to model cause-and-effect relationships facilitates the prediction of human behaviours and enables the anticipation of robot interventions. However, a significant challenge arises due to the absence of implementation of existing causal discovery methods within the ROS ecosystem, the standard de-facto framework in robotics, hindering effective utilisation on real robots. To bridge this gap, in our previous work we proposed ROS-Causal, a ROS-based framework designed for onboard data collection and causal discovery in human-robot spatial interactions. In this work, we present an experimental evaluation of ROS-Causal both in simulation and on a new dataset of human-robot spatial interactions in a lab scenario, to assess its performance and effectiveness. Our analysis demonstrates the efficacy of this approach, showcasing how causal models can be extracted directly onboard by robots during data collection. The online causal models generated from the simulation are consistent with those from lab experiments. These findings can help researchers to enhance the performance of robotic systems in shared environments, firstly by studying the causal relations between variables in simulation without real people, and then facilitating the actual robot deployment in real human environments. ROS-Causal: https://lcastri.github.io/roscausal

2405.13929 2026-02-17 cs.CL cs.AI

Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian

Aleksandr Nikolich, Konstantin Korolev, Sergei Bratchikov, Igor Kiselev, Artem Shelmanov

详情
英文摘要

There has been a surge in the development of various Large Language Models (LLMs). However, text generation for languages other than English often faces significant challenges, including poor generation quality and reduced computational performance due to the disproportionate representation of tokens in the model's vocabulary. In this work, we address these issues by developing a pipeline for the adaptation of English-oriented pre-trained models to other languages and constructing efficient bilingual LLMs. Using this pipeline, we construct Vikhr, a series of bilingual open-source instruction-following LLMs designed specifically for the Russian language. ``Vikhr'' refers to the name of the Mistral LLM series and means a ``strong gust of wind.'' Unlike previous Russian-language models that typically rely on LoRA adapters on top of English-oriented models, sacrificing performance for lower training costs, Vikhr features an adapted tokenizer vocabulary and undergoes the continued pre-training and instruction tuning of all weights. This not only enhances the model's performance but also significantly improves its computational and contextual efficiency. We also expanded the instruction datasets and corpora for continued pre-training. The model weights, instruction sets, and code are publicly available.

2405.07788 2026-02-17 cs.CL

DEPTH: Discourse Education through Pre-Training Hierarchically

Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov

Comments 25 pages, 10 figures, 10 tables, accepted to NAACL 2025, Rep4NLP

Journal ref Proceedings of the 10th Workshop on Representation Learning for NLP, 2025, 1-25

详情
英文摘要

Language Models (LMs) struggle with linguistic understanding at the discourse level, even though discourse patterns such as coherence, cohesion, and narrative flow are prevalent in their pre-training data. To improve the discourse capabilities of LMs already at the pre-training stage, we introduce DEPTH, an encoder-decoder model that learns latent representations for sentences using a discourse-oriented pre-training objective. DEPTH combines hierarchical sentence representations with two objectives: (1) Sentence Un-Shuffling, and (2) Span-Corruption. Our approach trains the model to represent both sub-word-level and sentence-level dependencies over a pre-training corpora. When trained either from scratch or continuing from a pre-trained T5 checkpoint, DEPTH learns semantic and discourse-level representations faster than T5, outperforming it in span-corruption loss despite the additional sentence-un-shuffling objective. Evaluations on the GLUE, DiscoEval, and NI benchmarks demonstrate DEPTH's ability to quickly learn diverse downstream tasks, which require syntactic, semantic, and discourse capabilities. Our approach extends the discourse capabilities of T5, while minimally impacting other natural language understanding (NLU) capabilities in the resulting LM. We share our codebase for reproducibility: https://github.com/zbambergerNLP/depth.git.

2404.13895 2026-02-17 cs.LG

Optimal Design for Human Preference Elicitation

Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

Comments Advances in Neural Information Processing Systems 37

详情
英文摘要

Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study efficient human preference elicitation for learning preference models. The key idea in our work is to generalize optimal designs, an approach to computing optimal information-gathering policies, to lists of items that represent potential questions with answers. The policy is a distribution over the lists and we elicit preferences from them proportionally to their probabilities. To show the generality of our ideas, we study both absolute and ranking feedback models on items in the list. We design efficient algorithms for both and analyze them. Finally, we demonstrate that our algorithms are practical by evaluating them on existing question-answering problems.

2404.08634 2026-02-17 cs.CL cs.AI cs.LG

When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models

Sunny Sanyal, Ravid Shwartz-Ziv, Alexandros G. Dimakis, Sujay Sanghavi

Comments Published in Transactions on Machine Learning Research (TMLR)

详情
英文摘要

Large Language Models (LLMs) are known for their performance, but we uncover a significant structural inefficiency: a phenomenon we term attention collapse. In many pre-trained decoder-style LLMs, the attention matrices in deeper layers degenerate, collapsing to near rank-one structures. These underutilized layers, which we call lazy layers, are redundant and impair model efficiency. To address this, we introduce Inheritune, a simple yet powerful training recipe designed to build smaller, stronger language models. Inheritune initializes a compact model by inheriting the potent early layers from a larger pre-trained model and then progressively trains and expands it. Our experiments on various models, including the GPT-2 family, demonstrate that models trained with Inheritune can match or even surpass the performance of their larger counterparts, despite having significantly fewer layers. This work presents a novel path toward model compression by design, enabling the creation of compact, yet highly performant language models. Code is available at https://github.com/sanyalsunny111/LLM-Inheritune.

2403.15605 2026-02-17 cs.CV cs.LG

Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization

Khiem Le, Long Ho, Cuong Do, Danh Le-Phuoc, Kok-Seng Wong

Comments CVPR'24

详情
英文摘要

Domain shift is a formidable issue in Machine Learning that causes a model to suffer from performance degradation when tested on unseen domains. Federated Domain Generalization (FedDG) attempts to train a global model using collaborative clients in a privacy-preserving manner that can generalize well to unseen clients possibly with domain shift. However, most existing FedDG methods either cause additional privacy risks of data leakage or induce significant costs in client communication and computation, which are major concerns in the Federated Learning paradigm. To circumvent these challenges, here we introduce a novel architectural method for FedDG, namely gPerXAN, which relies on a normalization scheme working with a guiding regularizer. In particular, we carefully design Personalized eXplicitly Assembled Normalization to enforce client models selectively filtering domain-specific features that are biased towards local data while retaining discrimination of those features. Then, we incorporate a simple yet effective regularizer to guide these models in directly capturing domain-invariant representations that the global model's classifier can leverage. Extensive experimental results on two benchmark datasets, i.e., PACS and Office-Home, and a real-world medical dataset, Camelyon17, indicate that our proposed method outperforms other existing methods in addressing this particular problem.

2402.02644 2026-02-17 cs.LG stat.ML

Permutation-based Inference for Variational Learning of Directed Acyclic Graphs

Edwin V. Bonilla, Pantelis Elinas, He Zhao, Maurizio Filippone, Vassili Kitsios, Terry O'Kane

详情
英文摘要

Estimating the structure of Bayesian networks as directed acyclic graphs (DAGs) from observational data is a fundamental challenge, particularly in causal discovery. Bayesian approaches excel by quantifying uncertainty and addressing identifiability, but key obstacles remain: (i) representing distributions over DAGs and (ii) estimating a posterior in the underlying combinatorial space. We introduce PIVID, a method that jointly infers a distribution over permutations and DAGs using variational inference and continuous relaxations of discrete distributions. Through experiments on synthetic and real-world datasets, we show that PIVID can outperform deterministic and Bayesian approaches, achieving superior accuracy-uncertainty trade-offs while scaling efficiently with the number of variables.

2312.02355 2026-02-17 cs.LG cs.AI

When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

Comments AAMAS 2026

详情
英文摘要

Offline reinforcement learning algorithms often require careful hyperparameter tuning. Before deployment, we need to select amongst a set of candidate policies. However, there is limited understanding about the fundamental limits of this offline policy selection (OPS) problem. In this work we provide clarity on when sample efficient OPS is possible, primarily by connecting OPS to off-policy policy evaluation (OPE) and Bellman error (BE) estimation. We first show a hardness result, that in the worst case, OPS is just as hard as OPE, by proving a reduction of OPE to OPS. As a result, no OPS method can be more sample efficient than OPE in the worst case. We then connect BE estimation to the OPS problem, showing how BE can be used as a tool for OPS. While BE-based methods generally require stronger requirements than OPE, when those conditions are met they can be more sample efficient. Building on this insight, we propose a BE method for OPS, called Identifiable BE Selection (IBES), that has a straightforward method for selecting its own hyperparameters. We conclude with an empirical study comparing OPE and IBES, and by showing the difficulty of OPS on an offline Atari benchmark dataset.

2307.14397 2026-02-17 cs.CV cs.LG

A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

Milad Abdollahzadeh, Guimeng Liu, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Ngai-Man Cheung

Comments Accepted to Transactions on Machine Learning Research (TMLR)

Journal ref Transactions on Machine Learning Research (TMLR), 2025

详情
英文摘要

Generative modeling in machine learning aims to synthesize new data samples that are statistically similar to those observed during training. While conventional generative models such as GANs and diffusion models typically assume access to large and diverse datasets, many real-world applications (e.g. in medicine, satellite imaging, and artistic domains) operate under limited data availability and strict constraints. In this survey, we examine Generative Modeling under Data Constraint (GM-DC), which includes limited-data, few-shot, and zero-shot settings. We present a unified perspective on the key challenges in GM-DC, including overfitting, frequency bias, and incompatible knowledge transfer, and discuss how these issues impact model performance. To systematically analyze this growing field, we introduce two novel taxonomies: one categorizing GM-DC tasks (e.g. unconditional vs. conditional generation, cross-domain adaptation, and subject-driven modeling), and another organizing methodological approaches (e.g. transfer learning, data augmentation, meta-learning, and frequency-aware modeling). Our study reviews over 230 papers, offering a comprehensive view across generative model types and constraint scenarios. We further analyze task-approach-method interactions using a Sankey diagram and highlight promising directions for future work, including adaptation of foundation models, holistic evaluation frameworks, and data-centric strategies for sample selection. This survey provides a timely and practical roadmap for researchers and practitioners aiming to advance generative modeling under limited data. Project website: https://sutd-visual-computing-group.github.io/gmdc-survey/.

2303.09807 2026-02-17 cs.CV cs.AI

TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

Haoran Li, XiaoLu Li, Yihang Lin, Yanbin Hao, Haiyong Xie, Pengyuan Zhou, Yong Liao

详情
英文摘要

Video prediction is a complex time-series forecasting task with great potential in many use cases. However, traditional methods prioritize accuracy and overlook slow prediction speeds due to complex model structures, redundant information, and excessive GPU memory consumption. These methods often predict frames sequentially, making acceleration difficult and limiting their applicability in real-time scenarios like danger prediction and warning.Therefore, we propose a transformer-based keypoint prediction neural network (TKN). TKN extracts dynamic content from video frames in an unsupervised manner, reducing redundant feature computation. And, TKN uses an acceleration matrix to reduce the computational cost of attention and employs a parallel computing structure for prediction acceleration. To the best of our knowledge, TKN is the first real-time video prediction solution that achieves a prediction rate of 1,176 fps, significantly reducing computation costs while maintaining other performance. Qualitative and quantitative experiments on multiple datasets have demonstrated the superiority of our method, suggesting that TKN has great application potential.

2211.15549 2026-02-17 cs.CV

Realtime Data-Efficient Portrait Stylization Based On Geometric Alignment

Xinrui Wang, Zhuoru Li, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo

Comments 16 pages, 14 figures

Journal ref Neural Networks Volume 191, November 2025, 107774

详情
英文摘要

Portrait Stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization effects due to the disparity in facial feature distributions between facial photographs and stylized images, limiting the application on rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples to simplify the stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve the training efficiency and promote the consistency of facial identities. By leveraging inherent structural information of faces, e.g., facial landmarks, TPS module can establish geometric alignments between the two domains, at global and local scales, both in pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our models not only outperforms existing models in terms of fidelity and stylistic consistency, but also achieves remarkable improvements in 2x training data efficiency and 100x less computational complexity, allowing our lightweight model to achieve real-time inference (30 FPS) at 512*512 resolution on mobile devices.

2208.07035 2026-02-17 cs.RO

Model Predictive Impedance Control with Gaussian Processes for Human and Environment Interaction

Kevin Haninger, Christian Hegeler, Luka Peternel

Comments Video: https://youtu.be/Of2O3mHfM94 Code: https://gitlab.cc-asp.fraunhofer.de/hanikevi/gp-mpc-impedance-control

Journal ref Robotics and Autonomous Systems, Volume 165, 104431, July 2023

详情
英文摘要

Robotic tasks which involve uncertainty--due to variation in goal, environment configuration, or confidence in task model--may require human input to instruct or adapt the robot. In tasks with physical contact, several existing methods for adapting robot trajectory or impedance according to individual uncertainties have been proposed, e.g., realizing intention detection or uncertainty-aware learning from demonstration. However, isolated methods cannot address the wide range of uncertainties jointly present in many tasks. To improve generality, this paper proposes a model predictive control (MPC) framework which plans both trajectory and impedance online, can consider discrete and continuous uncertainties, includes safety constraints, and can be efficiently applied to a new task. This framework can consider uncertainty from: contact constraint variation, uncertainty in human goals, or task disturbances. An uncertainty-aware task model is learned from a few ($\leq3$) demonstrations using Gaussian Processes. This task model is used in a nonlinear MPC problem to optimize robot trajectory and impedance according to belief in discrete human goals, human kinematics, safety constraints, contact stability, and frequency-domain disturbance rejection. This MPC formulation is introduced, analyzed with respect to convexity, and validated in co-manipulation with multiple goals, a collaborative polishing task, and a collaborative assembly task.

2207.12381 2026-02-17 cs.CV cs.AI

LightX3ECG: A Lightweight and eXplainable Deep Learning System for 3-lead Electrocardiogram Classification

Khiem H. Le, Hieu H. Pham, Thao BT. Nguyen, Tu A. Nguyen, Tien N. Thanh, Cuong D. Do

Comments Biomedical Signal Processing and Control

详情
英文摘要

Cardiovascular diseases (CVDs) are a group of heart and blood vessel disorders that is one of the most serious dangers to human health, and the number of such patients is still growing. Early and accurate detection plays a key role in successful treatment and intervention. Electrocardiogram (ECG) is the gold standard for identifying a variety of cardiovascular abnormalities. In clinical practices and most of the current research, standard 12-lead ECG is mainly used. However, using a lower number of leads can make ECG more prevalent as it can be conveniently recorded by portable or wearable devices. In this research, we develop a novel deep learning system to accurately identify multiple cardiovascular abnormalities by using only three ECG leads.

2112.06251 2026-02-17 cs.LG stat.ML

Learning with Subset Stacking

Ş. İlker Birbil, Sinan Yıldırım, Samet Çopur, M. Hakan Akyüz

Comments 26 pages, 8 figures, 4 tables. Code available

详情
英文摘要

We propose a new regression algorithm that learns from a set of input-output pairs. Our algorithm is designed for populations where the relation between the input variables and the output variable exhibits a heterogeneous behavior across the predictor space. The algorithm starts with generating subsets that are concentrated around random points in the input space. This is followed by training a local predictor for each subset. Those predictors are then combined in a novel way to yield an overall predictor. We call this algorithm "LEarning with Subset Stacking" or LESS, due to its resemblance to the method of stacking regressors. We offer bagging and boosting variants of LESS and test against the state-of-the-art methods on several datasets. Our comparison shows that LESS is highly competitive.

2110.12433 2026-02-17 cs.RO cs.SY eess.SY

Model Predictive Control with Gaussian Processes for Flexible Multi-Modal Physical Human Robot Interaction

Kevin Haninger, Christian Hegeler, Luka Peternel

Comments Submitted, ICRA 2022. Video: https://youtu.be/0GT1pPpXvt8 Data and code: https://owncloud.fraunhofer.de/index.php/s/kmCZvlKOghclHy9

Journal ref 2022 IEEE International Conference on Robotics and Automation (ICRA), May 2022

详情
英文摘要

Physical human-robot interaction can improve human ergonomics, task efficiency, and the flexibility of automation, but often requires application-specific methods to detect human state and determine robot response. At the same time, many potential human-robot interaction tasks involve discrete modes, such as phases of a task or multiple possible goals, where each mode has a distinct objective and human behavior. In this paper, we propose a novel method for multi-modal physical human-robot interaction that builds a Gaussian process model for human force in each mode of a collaborative task. These models are then used for Bayesian inference of the mode, and to determine robot reactions through model predictive control. This approach enables optimization of robot trajectory based on the belief of human intent, while considering robot impedance and human joint configuration, according to ergonomic- and/or task-related objectives. The proposed method reduces programming time and complexity, requiring only a low number of demonstrations (here, three per mode) and a mode-specific objective function to commission a flexible online human-robot collaboration task. We validate the method with experiments on an admittance-controlled industrial robot, performing a collaborative assembly task with two modes where assistance is provided in full six degrees of freedom. It is shown that the developed algorithm robustly re-plans to changes in intent or robot initial position, achieving online control at 15 Hz.

2106.04096 2026-02-17 cs.LG math.OC stat.ML

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

Semih Cayci, Niao He, R. Srikant

Journal ref SIAM J. Optim. 34 (2024) 2729-2755

详情
英文摘要

Natural policy gradient (NPG) methods with entropy regularization achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the \emph{persistence of excitation} condition, and achieves a fast convergence rate of $\tilde{O}(1/T)$ up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.

2602.13712 2026-02-17 cs.CV cs.LG

Fine-tuned Vision Language Model for Localization of Parasitic Eggs in Microscopic Images

Chan Hao Sien, Hezerul Abdul Karim, Nouar AlDahoul

详情
英文摘要

Soil-transmitted helminth (STH) infections continuously affect a large proportion of the global population, particularly in tropical and sub-tropical regions, where access to specialized diagnostic expertise is limited. Although manual microscopic diagnosis of parasitic eggs remains the diagnostic gold standard, the approach can be labour-intensive, time-consuming, and prone to human error. This paper aims to utilize a vision language model (VLM) such as Microsoft Florence that was fine-tuned to localize all parasitic eggs within microscopic images. The preliminary results show that our localization VLM performs comparatively better than the other object detection methods, such as EfficientDet, with an mIOU of 0.94. This finding demonstrates the potential of the proposed VLM to serve as a core component of an automated framework, offering a scalable engineering solution for intelligent parasitological diagnosis.

2602.13710 2026-02-17 cs.LG

HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models

Xin Yan, Zhenglin Wan, Feiyang Ye, Xingrui Yu, Hangyu Du, Yang You, Ivor Tsang

详情
英文摘要

Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing methods fail to narrow the distribution gap between binarized and full-precision weights, causing quantization errors to accumulate under long-horizon closed-loop execution and severely degrade actions. To fill this gap, we propose HBVLA, a VLA-tailored binarization framework. First, we use a policy-aware enhanced Hessian to identify weights that are truly critical for action generation. Then, we employ a sparse orthogonal transform for non-salient weights to induce a low-entropy intermediate state. Finally, we quantize both salient and non-salient weights in the Harr domain with group-wise 1-bit quantization. We have evaluated our approach on different VLAs: on LIBERO, quantized OpenVLA-OFT retains 92.2% of full-precision performance; on SimplerEnv, quantized CogAct retains 93.6%, significantly outperforming state-of-the-art binarization methods. We further validate our method on real-world evaluation suite and the results show that HBVLA incurs only marginal success-rate degradation compared to the full-precision model, demonstrating robust deployability under tight hardware constraints. Our work provides a practical foundation for ultra-low-bit quantization of VLAs, enabling more reliable deployment on hardware-limited robotic platforms.

2602.13706 2026-02-17 cs.LG

Near-Optimal Regret for Policy Optimization in Contextual MDPs with General Offline Function Approximation

Orin Levy, Aviv Rosenberg, Alon Cohen, Yishay Mansour

详情
英文摘要

We introduce \texttt{OPO-CMDP}, the first policy optimization algorithm for stochastic Contextual Markov Decision Process (CMDPs) under general offline function approximation. Our approach achieves a high probability regret bound of $\widetilde{O}(H^4\sqrt{T|S||A|\log(|\mathcal{F}||\mathcal{P}|)}),$ where $S$ and $A$ denote the state and action spaces, $H$ the horizon length, $T$ the number of episodes, and $\mathcal{F}, \mathcal{P}$ the finite function classes used to approximate the losses and dynamics, respectively. This is the first regret bound with optimal dependence on $|S|$ and $|A|$, directly improving the current state-of-the-art (Qian, Hu, and Simchi-Levi, 2024). These results demonstrate that optimistic policy optimization provides a natural, computationally superior and theoretically near-optimal path for solving CMDPs.

2602.13701 2026-02-17 cs.CL

Metaphors' journeys across time and genre: tracking the evolution of literary metaphors with temporal embeddings

Veronica Mangiaterra, Chiara Barattieri di San Pietro, Paolo Canal, Valentina Bambini

详情
英文摘要

Metaphors are a distinctive feature of literary language, yet they remain less studied experimentally than everyday metaphors. Moreover, previous psycholinguistic and computational approaches overlooked the temporal dimension, although many literary metaphors were coined centuries apart from contemporary readers. This study innovatively applies tools from diachronic distributional semantics to assess whether the processing costs of literary metaphors varied over time and genre. Specifically, we trained word embeddings on literary and nonliterary Italian corpora from the 19th and 21st centuries, for a total of 124 million tokens, and modeled changes in the semantic similarity between topics and vehicles of 515 19th-century literary metaphors, taking this measure as a proxy of metaphor processing demands. Overall, semantic similarity, and hence metaphor processing demands, remained stable over time. However, genre played a key role: metaphors appeared more difficult (i.e., lower topic-vehicle similarity) in modern literary contexts than in 19th-century literature, but easier (i.e., higher topic-vehicle similarity) in today's nonliterary language (e.g., the Web) than in 19th-century nonliterary texts. This pattern was further shaped by semantic features of metaphors' individual terms, such as vector coherence and semantic neighborhood density. Collectively, these findings align with broader linguistic changes in Italian, such as the stylistic simplification of modern literature, which may have increased metaphor processing demands, and the high creativity of the Web's language, which seems to render metaphor more accessible.

2602.13700 2026-02-17 cs.LG

Optimal Regret for Policy Optimization in Contextual Bandits

Orin Levy, Yishay Mansour

详情
英文摘要

We present the first high-probability optimal regret bound for a policy optimization technique applied to the problem of stochastic contextual multi-armed bandit (CMAB) with general offline function approximation. Our algorithm is both efficient and achieves an optimal regret bound of $\widetilde{O}(\sqrt{ K|\mathcal{A}|\log|\mathcal{F}|})$, where $K$ is the number of rounds, $\mathcal{A}$ is the set of arms, and $\mathcal{F}$ is the function class used to approximate the losses. Our results bridge the gap between theory and practice, demonstrating that the widely used policy optimization methods for the contextual bandit problem can achieve a rigorously-proved optimal regret bound. We support our theoretical results with an empirical evaluation of our algorithm.

2602.13699 2026-02-17 cs.LG

Attention Head Entropy of LLMs Predicts Answer Correctness

Sophie Ostmeier, Brian Axelrod, Maya Varma, Asad Aali, Yabin Zhang, Magdalini Paschali, Sanmi Koyejo, Curtis Langlotz, Akshay Chaudhari

详情
英文摘要

Large language models (LLMs) often generate plausible yet incorrect answers, posing risks in safety-critical settings such as medicine. Human evaluation is expensive, and LLM-as-judge approaches risk introducing hidden errors. Recent white-box methods detect contextual hallucinations using model internals, focusing on the localization of the attention mass, but two questions remain open: do these approaches extend to predicting answer correctness, and do they generalize out-of-domains? We introduce Head Entropy, a method that predicts answer correctness from attention entropy patterns, specifically measuring the spread of the attention mass. Using sparse logistic regression on per-head 2-Renyi entropies, Head Entropy matches or exceeds baselines in-distribution and generalizes substantially better on out-of-domains, it outperforms the closest baseline on average by +8.5% AUROC. We further show that attention patterns over the question/context alone, before answer generation, already carry predictive signal using Head Entropy with on average +17.7% AUROC over the closest baseline. We evaluate across 5 instruction-tuned LLMs and 3 QA datasets spanning general knowledge, multi-hop reasoning, and medicine.

2602.13691 2026-02-17 cs.AI

PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning

Yu Li, Guangfeng Cai, Shengtian Yang, Han Luo, Shuo Han, Xu He, Dong Li, Lei Feng

详情
英文摘要

Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging, because the exploration space suffers from a combinatorial explosion. In this scenario, even when a correct tool-use path is found, it is usually considered an immediate reward for current training, which would not provide any reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns, which can be leveraged throughout the whole training process. Inspired by ant colony optimization where historically successful paths can be reflected by the pheromone, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., pheromone) from historical trajectories and then uses the learned pheromone to guide policy optimization. This learned pheromone provides explicit and reusable guidance that steers policy optimization toward historically successful tool transitions, thereby improving long-horizon tool planning. Comprehensive experimental results demonstrate the effectiveness of our proposed PhGPO.

2602.13689 2026-02-17 cs.RO cs.CV

Symmetry-Aware Fusion of Vision and Tactile Sensing via Bilateral Force Priors for Robotic Manipulation

Wonju Lee, Matteo Grimaldi, Tao Yu

Comments Accepted By ICRA2026

详情
英文摘要

Insertion tasks in robotic manipulation demand precise, contact-rich interactions that vision alone cannot resolve. While tactile feedback is intuitively valuable, existing studies have shown that naïve visuo-tactile fusion often fails to deliver consistent improvements. In this work, we propose a Cross-Modal Transformer (CMT) for visuo-tactile fusion that integrates wrist-camera observations with tactile signals through structured self- and cross-attention. To stabilize tactile embeddings, we further introduce a physics-informed regularization that encourages bilateral force balance, reflecting principles of human motor control. Experiments on the TacSL benchmark show that CMT with symmetry regularization achieves a 96.59% insertion success rate, surpassing naïve and gated fusion baselines and closely matching the privileged "wrist + contact force" configuration (96.09%). These results highlight two central insights: (i) tactile sensing is indispensable for precise alignment, and (ii) principled multimodal fusion, further strengthened by physics-informed regularization, unlocks complementary strengths of vision and touch, approaching privileged performance under realistic sensing.