arXivDaily arXiv每日学术速递 周一至周五更新
重置
2602.18434 2026-02-23 cs.CV

Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory

Vatsal Agarwal, Saksham Suri, Matthew Gwilliam, Pulkit Kumar, Abhinav Shrivastava

Comments Project page: see https://vatsalag99.github.io/memstream/

详情
英文摘要

Streaming video understanding requires models to robustly encode, store, and retrieve information from a continuous video stream to support accurate video question answering (VQA). Existing state-of-the-art approaches rely on key-value caching to accumulate frame-level information over time, but use a limited number of tokens per frame, leading to the loss of fine-grained visual details. In this work, we propose scaling the token budget to enable more granular spatiotemporal understanding and reasoning. First, we find that current methods are ill-equipped to handle dense streams: their feature encoding causes query-frame similarity scores to increase over time, biasing retrieval toward later frames. To address this, we introduce an adaptive selection strategy that reduces token redundancy while preserving local spatiotemporal information. We further propose a training-free retrieval mixture-of-experts that leverages external models to better identify relevant frames. Our method, MemStream, achieves +8.0% on CG-Bench, +8.5% on LVBench, and +2.4% on VideoMME (Long) over ReKV with Qwen2.5-VL-7B.

2602.18432 2026-02-23 cs.CV

SARAH: Spatially Aware Real-time Agentic Humans

Evonne Ng, Siwei Zhang, Zhang Chen, Michael Zollhoefer, Alexander Richard

Comments Project page: https://evonneng.github.io/sarah/

详情
英文摘要

As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn toward users, respond to their movement, and maintain natural gaze. Current methods lack this spatial awareness. We close this gap with the first real-time, fully causal method for spatially-aware conversational motion, deployable on a streaming VR headset. Given a user's position and dyadic audio, our approach produces full-body motion that aligns gestures with speech while orienting the agent according to the user. Our architecture combines a causal transformer-based VAE with interleaved latent tokens for streaming inference and a flow matching model conditioned on user trajectory and audio. To support varying gaze preferences, we introduce a gaze scoring mechanism with classifier-free guidance to decouple learning from control: the model captures natural spatial alignment from data, while users can adjust eye contact intensity at inference time. On the Embody 3D dataset, our method achieves state-of-the-art motion quality at over 300 FPS -- 3x faster than non-causal baselines -- while capturing the subtle spatial dynamics of natural conversation. We validate our approach on a live VR system, bringing spatially-aware conversational agents to real-time deployment. Please see https://evonneng.github.io/sarah/ for details.

2602.18429 2026-02-23 cs.CL cs.IR

VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Harshul Raj Surana, Arijit Maji, Aryan Vats, Akash Ghosh, Sriparna Saha, Amit Sheth

详情
英文摘要

Large Language Models (LLMs) have made significant progress in reasoning tasks across various domains such as mathematics and coding. However, their performance deteriorates in tasks requiring rich socio-cultural knowledge and diverse local contexts, particularly those involving Indian Culture. Existing Cultural benchmarks are (i) Manually crafted, (ii) contain single-hop questions testing factual recall, and (iii) prohibitively costly to scale, leaving this deficiency largely unmeasured. To address this, we introduce VIRAASAT, a novel, semi-automated multi-hop approach for generating cultural specific multi-hop Question-Answering dataset for Indian culture. VIRAASAT leverages a Knowledge Graph comprising more than 700 expert-curated cultural artifacts, covering 13 key attributes of Indian culture (history, festivals, etc). VIRAASAT spans all 28 states and 8 Union Territories, yielding more than 3,200 multi-hop questions that necessitate chained cultural reasoning. We evaluate current State-of-the-Art (SOTA) LLMs on VIRAASAT and identify key limitations in reasoning wherein fine-tuning on Chain-of-Thought(CoT) traces fails to ground and synthesize low-probability facts. To bridge this gap, we propose a novel framework named Symbolic Chain-of-Manipulation (SCoM). Adapting the Chain-of-Manipulation paradigm, we train the model to simulate atomic Knowledge Graph manipulations internally. SCoM teaches the model to reliably traverse the topological structure of the graph. Experiments on Supervised Fine-Tuning (SFT) demonstrate that SCoM outperforms standard CoT baselines by up to 20%. We release the VIRAASAT dataset along with our findings, laying a strong foundation towards building Culturally Aware Reasoning Models.

2602.18428 2026-02-23 cs.LG cs.CV eess.IV

The Geometry of Noise: Why Diffusion Models Don't Need Noise Conditioning

Mojtaba Sahraee-Ardakan, Mauricio Delbracio, Peyman Milanfar

详情
英文摘要

Autonomous (noise-agnostic) generative models, such as Equilibrium Matching and blind diffusion, challenge the standard paradigm by learning a single, time-invariant vector field that operates without explicit noise-level conditioning. While recent work suggests that high-dimensional concentration allows these models to implicitly estimate noise levels from corrupted observations, a fundamental paradox remains: what is the underlying landscape being optimized when the noise level is treated as a random variable, and how can a bounded, noise-agnostic network remain stable near the data manifold where gradients typically diverge? We resolve this paradox by formalizing Marginal Energy, $E_{\text{marg}}(\mathbf{u}) = -\log p(\mathbf{u})$, where $p(\mathbf{u}) = \int p(\mathbf{u}|t)p(t)dt$ is the marginal density of the noisy data integrated over a prior distribution of unknown noise levels. We prove that generation using autonomous models is not merely blind denoising, but a specific form of Riemannian gradient flow on this Marginal Energy. Through a novel relative energy decomposition, we demonstrate that while the raw Marginal Energy landscape possesses a $1/t^p$ singularity normal to the data manifold, the learned time-invariant field implicitly incorporates a local conformal metric that perfectly counteracts the geometric singularity, converting an infinitely deep potential well into a stable attractor. We also establish the structural stability conditions for sampling with autonomous models. We identify a ``Jensen Gap'' in noise-prediction parameterizations that acts as a high-gain amplifier for estimation errors, explaining the catastrophic failure observed in deterministic blind models. Conversely, we prove that velocity-based parameterizations are inherently stable because they satisfy a bounded-gain condition that absorbs posterior uncertainty into a smooth geometric drift.

2602.18427 2026-02-23 math.CO cs.DM math-ph math.MP math.OC

Polytopes of alternating sign matrices with dihedral-subgroup symmetry

Péter Madarasi

详情
英文摘要

We investigate the convex hulls of the eight dihedral symmetry classes of $n \times n$ alternating sign matrices, i.e., ASMs invariant under a subgroup of the symmetry group of the square. Extending the prefix-sum description of the ASM polytope, we develop a uniform core--assembly framework: each symmetry class is encoded by a set of core positions and an affine assembly map that reconstructs the full matrix from its core. This reduction transfers polyhedral questions to lower-dimensional core polytopes, which are better suited to the tool set of polyhedral combinatorics, while retaining complete information about the original symmetry class. For the vertical, vertical--horizontal, half-turn, diagonal, diagonal--antidiagonal, and total symmetry classes, we give explicit polynomial-size linear inequality descriptions of the associated polytopes. In these cases, we also determine the dimension and provide facet descriptions. The quarter-turn symmetry class behaves differently: the natural relaxation admits fractional vertices, and we need to extend the system with a structured family of parity-type Chvátal--Gomory inequalities to obtain the quarter-turn symmetric ASM polytope. Our framework leads to efficient algorithms for computing minimum-cost ASMs in each symmetry class and provides a direct link between the combinatorics of symmetric ASMs and tools from polyhedral combinatorics and combinatorial optimization.

2602.18426 2026-02-23 astro-ph.GA cs.CV

Spatio-Spectroscopic Representation Learning using Unsupervised Convolutional Long-Short Term Memory Networks

Kameswara Bharadwaj Mantha, Lucy Fortson, Ramanakumar Sankar, Claudia Scarlata, Chris Lintott, Sandor Kruk, Mike Walmsley, Hugh Dickinson, Karen Masters, Brooke Simmons, Rebecca Smethurst

Comments This manuscript was previously submitted to ICML for peer review. Reviewers noted that while the underlying VAE-based architecture builds on established methods, its application to spatially-resolved IFS data is promising for unsupervised representation learning in astronomy. This version is released for community visibility. Reviewer decisions: Weak accept and Weak reject (Final: Reject)

详情
英文摘要

Integral Field Spectroscopy (IFS) surveys offer a unique new landscape in which to learn in both spatial and spectroscopic dimensions and could help uncover previously unknown insights into galaxy evolution. In this work, we demonstrate a new unsupervised deep learning framework using Convolutional Long-Short Term Memory Network Autoencoders to encode generalized feature representations across both spatial and spectroscopic dimensions spanning $19$ optical emission lines (3800A $< λ<$ 8000A) among a sample of $\sim 9000$ galaxies from the MaNGA IFS survey. As a demonstrative exercise, we assess our model on a sample of $290$ Active Galactic Nuclei (AGN) and highlight scientifically interesting characteristics of some highly anomalous AGN.

2602.18425 2026-02-23 cs.CL cs.IR

RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering

Deniz Qian, Hung-Ting Chen, Eunsol Choi

Comments 18 pages, 12 figures, 12 tables

详情
英文摘要

Comprehensively retrieving diverse documents is crucial to address queries that admit a wide range of valid answers. We introduce retrieve-verify-retrieve (RVR), a multi-round retrieval framework designed to maximize answer coverage. Initially, a retriever takes the original query and returns a candidate document set, followed by a verifier that identifies a high-quality subset. For subsequent rounds, the query is augmented with previously verified documents to uncover answers that are not yet covered in previous rounds. RVR is effective even with off-the-shelf retrievers, and fine-tuning retrievers for our inference procedure brings further gains. Our method outperforms baselines, including agentic search approaches, achieving at least 10% relative and 3% absolute gain in complete recall percentage on a multi-answer retrieval dataset (QAMPARI). We also see consistent gains on two out-of-domain datasets (QUEST and WebQuestionsSP) across different base retrievers. Our work presents a promising iterative approach for comprehensive answer recall leveraging a verifier and adapting retrievers to a new inference scenario.

2602.18424 2026-02-23 cs.CV cs.RO

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation

Xia Su, Ruiqi Chen, Benlin Liu, Jingwei Ma, Zonglin Di, Ranjay Krishna, Jon Froehlich

详情
英文摘要

Vision-Language Models (VLMs) have shown remarkable progress in Vision-Language Navigation (VLN), offering new possibilities for navigation decision-making that could benefit both robotic platforms and human users. However, real-world navigation is inherently conditioned by the agent's mobility constraints. For example, a sweeping robot cannot traverse stairs, while a quadruped can. We introduce Capability-Conditioned Navigation (CapNav), a benchmark designed to evaluate how well VLMs can navigate complex indoor spaces given an agent's specific physical and operational capabilities. CapNav defines five representative human and robot agents, each described with physical dimensions, mobility capabilities, and environmental interaction abilities. CapNav provides 45 real-world indoor scenes, 473 navigation tasks, and 2365 QA pairs to test if VLMs can traverse indoor environments based on agent capabilities. We evaluate 13 modern VLMs and find that current VLM's navigation performance drops sharply as mobility constraints tighten, and that even state-of-the-art models struggle with obstacle types that require reasoning on spatial dimensions. We conclude by discussing the implications for capability-aware navigation and the opportunities for advancing embodied spatial reasoning in future VLMs. The benchmark is available at https://github.com/makeabilitylab/CapNav

2602.18422 2026-02-23 cs.CV

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Linxi Xie, Lisong C. Sun, Ashley Neall, Tong Wu, Shengqu Cai, Gordon Wetzstein

Comments Project page here: https://codeysun.github.io/generated-reality

详情
英文摘要

Extended reality (XR) demands generative models that respond to users' tracked real-world motion, yet current video world models accept only coarse control signals such as text or keyboard input, limiting their utility for embodied interaction. We introduce a human-centric video world model that is conditioned on both tracked head pose and joint-level hand poses. For this purpose, we evaluate existing diffusion transformer conditioning strategies and propose an effective mechanism for 3D head and hand control, enabling dexterous hand--object interactions. We train a bidirectional video diffusion model teacher using this strategy and distill it into a causal, interactive system that generates egocentric virtual environments. We evaluate this generated reality system with human subjects and demonstrate improved task performance as well as a significantly higher level of perceived amount of control over the performed actions compared with relevant baselines.

2602.18421 2026-02-23 cs.RO cond-mat.soft

Snapping Actuators with Asymmetric and Sequenced Motion

Xin Li, Ye Jin, Mohsen Jafarpour, Hugo de Souza Oliveira, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

详情
英文摘要

Snapping instabilities in soft structures offer a powerful pathway to achieve rapid and energy-efficient actuation. In this study, an eccentric dome-shaped snapping actuator is developed to generate controllable asymmetric motion through geometry-induced instability. Finite element simulations and experiments reveal consistent asymmetric deformation and the corresponding pressure characteristics. By coupling four snapping actuators in a pneumatic network, a compact quadrupedal robot achieves coordinated wavelike locomotion using only a single pressure input. The robot exhibits frequency-dependent performance with a maximum speed of 72.78~mm/s at 7.5~Hz. These findings demonstrate the potential of asymmetric snapping mechanisms for physically controlled actuation and lay the groundwork for fully untethered and efficient soft robotic systems.

2602.18420 2026-02-23 cs.CL

SPQ: An Ensemble Technique for Large Language Model Compression

Jiamin Yao, Eren Gultepe

Comments Accepted to LREC 2026 Main Conference

详情
英文摘要

This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inefficiency: i) pruning removes redundant neurons in MLP layers, ii) SVD reduces attention projections into compact low-rank factors, iii) and 8-bit quantization uniformly compresses all linear layers. At matched compression ratios, SPQ outperforms individual methods (SVD-only, pruning-only, or quantization-only) in perplexity, demonstrating the benefit of combining complementary techniques. Applied to LLaMA-2-7B, SPQ achieves up to 75% memory reduction while maintaining or improving perplexity (e.g., WikiText-2 5.47 to 4.91) and preserving accuracy on downstream benchmarks such as C4, TruthfulQA, and GSM8K. Compared to strong baselines like GPTQ and SparseGPT, SPQ offers competitive perplexity and accuracy while using less memory (6.86 GB vs. 7.16 GB for GPTQ). Moreover, SPQ improves inference throughput over GPTQ, achieving up to a 1.9x speedup, which further enhances its practicality for real-world deployment. The effectiveness of SPQ's robust compression through layer-aware and complementary compression techniques may provide practical deployment of LLMs in memory-constrained environments. Code is available at: https://github.com/JiaminYao/SPQ_LLM_Compression/

2602.18417 2026-02-23 cs.LG cs.CL

Subgroups of $U(d)$ Induce Natural RNN and Transformer Architectures

Joshua Nunley

Comments 12 pages, 3 figures, 8 tables

详情
英文摘要

This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a minimal axiomatic setup and derive recurrent and transformer templates from a shared skeleton in which subgroup choice acts as a drop-in replacement for state space, tangent projection, and update map. We then specialize to O(d) and evaluate orthogonal-state RNN and transformer models on Tiny Shakespeare and Penn Treebank under parameter-matched settings. We also report a general linear-mixing extension in tangent space, which applies across subgroup choices and improves finite-budget performance in the current O(d) experiments.

2602.18416 2026-02-23 eess.SY cs.SY math.OC

Convex Block-Cholesky Approach to Risk-Constrained Low-thrust Trajectory Design under Operational Uncertainty

Kenshiro Oguri, Gregory Lantoine

详情
英文摘要

Designing robust trajectories under uncertainties is an emerging technology that may represent a key paradigm shift in space mission design. As we pursue more ambitious scientific goals (e.g., multi-moon tours, missions with extensive components of autonomy), it becomes more crucial that missions are designed with navigation (Nav) processes in mind. The effect of Nav processes is statistical by nature, as they consist of orbit determination (OD) and flight-path control (FPC). Thus, this mission design paradigm calls for techniques that appropriately quantify statistical effects of Nav, evaluate associated risks, and design missions that ensure sufficiently low risk while minimizing a statistical performance metric; a common metric is Delta-V99: worst-case (99%-quantile) Delta-V expenditure including statistical FPC efforts. In response to the need, this paper develops an algorithm for risk-constrained trajectory optimization under operational uncertainties due to initial state dispersion, navigation error, maneuver execution error, and imperfect dynamics modeling. We formulate it as a nonlinear stochastic optimal control problem and develop a computationally tractable algorithm that combines optimal covariance steering and sequential convex programming (SCP). Specifically, the proposed algorithm takes a block-Cholesky approach for convex formulation of optimal covariance steering, and leverages a recent SCP algorithm, SCvx*, for reliable numerical convergence. We apply the developed algorithm to risk-constrained, statistical trajectory optimization for exploration of dwarf planet Ceres with a Mars gravity assist, and demonstrate the robustness of the statistically-optimal trajectory and FPC policies via nonlinear Monte Carlo simulation.

2602.18409 2026-02-23 cs.LG cs.AI cs.LO

Unifying approach to uniform expressivity of graph neural networks

Huan Luo, Jonni Virtema

详情
英文摘要

The expressive power of Graph Neural Networks (GNNs) is often analysed via correspondence to the Weisfeiler-Leman (WL) algorithm and fragments of first-order logic. Standard GNNs are limited to performing aggregation over immediate neighbourhoods or over global read-outs. To increase their expressivity, recent attempts have been made to incorporate substructural information (e.g. cycle counts and subgraph properties). In this paper, we formalize this architectural trend by introducing Template GNNs (T-GNNs), a generalized framework where node features are updated by aggregating over valid template embeddings from a specified set of graph templates. We propose a corresponding logic, Graded template modal logic (GML(T)), and generalized notions of template-based bisimulation and WL algorithm. We establish an equivalence between the expressive power of T-GNNs and GML(T), and provide a unifying approach for analysing GNN expressivity: we show how standard AC-GNNs and its recent variants can be interpreted as instantiations of T-GNNs.

2602.18405 2026-02-23 cs.IT math.IT

A Generalized Information Bottleneck Method: A Decision-Theoretic Perspective

Akira Kamatsuka, Takahiro Yoshida

详情
英文摘要

The information bottleneck (IB) method seeks a compressed representation of data that preserves information relevant to a target variable for prediction while discarding irrelevant information from the original data. In its classical formulation, the IB method employs mutual information to evaluate the compression between the original and compressed data and the utility of the representation for the target variable. In this study, we investigate a generalized IB problem, where the evaluation of utility is based on the $\mathcal{H}$-mutual information that satisfies the concave (\texttt{CV}) and averaging (\texttt{AVG}) conditions. This class of information measures admits a statistical decision-theoretic interpretation via its equivalence to the expected value of sample information. Based on this interpretation, we derive an alternating optimization algorithm to assess the tradeoff between compression and utility in the generalized IB problem.

2602.18404 2026-02-23 math.NA cs.NA

Well-posedness and time stepping adaptivity for a class of collocation discretisations of time-fractional subdiffusion equations

Sebastian Franz, Natalia Kopteva

Comments 23 pages, 9 figures

详情
英文摘要

Time-fractional parabolic equations with a Caputo time derivative of order $α\in(0,1)$ are discretised in time using collocation methods, which assume that the Caputo derivative of the computed solution is piecewise-polynomial. For such discretisations of any order $m\ge 0$, with any choice of collocation points, we give sufficient conditions for existence and uniqueness of collocation solutions. Furthermore, we investigate the applicability and performance of such schemes in the context of the a-posteriori error estimation and adaptive time stepping algorithms.

2602.18403 2026-02-23 cs.LG

Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction: A Comparative Study

Orfeas Bourchas, George Papalambrou

Comments Accepted to the KGML Bridge at AAAI 2026 (non-archival)

详情
英文摘要

Accurate prediction of main engine power is essential for vessel performance optimization, fuel efficiency, and compliance with emission regulations. Conventional machine learning approaches, such as Support Vector Machines, variants of Artificial Neural Networks (ANNs), and tree-based methods like Random Forests, Extra Tree Regressors, and XGBoost, can capture nonlinearities but often struggle to respect the fundamental propeller law relationship between power and speed, resulting in poor extrapolation outside the training envelope. This study introduces a hybrid modeling framework that integrates physics-based knowledge from sea trials with data-driven residual learning. The baseline component, derived from calm-water power curves of the form $P = cV^n$, captures the dominant power-speed dependence, while another, nonlinear, regressor is then trained to predict the residual power, representing deviations caused by environmental and operational conditions. By constraining the machine learning task to residual corrections, the hybrid model simplifies learning, improves generalization, and ensures consistency with the underlying physics. In this study, an XGBoost, a simple Neural Network, and a Physics-Informed Neural Network (PINN) coupled with the baseline component were compared to identical models without the baseline component. Validation on in-service data demonstrates that the hybrid model consistently outperformed a pure data-driven baseline in sparse data regions while maintaining similar performance in populated ones. The proposed framework provides a practical and computationally efficient tool for vessel performance monitoring, with applications in weather routing, trim optimization, and energy efficiency planning.

2602.18401 2026-02-23 cs.LG cs.AI q-bio.NC stat.ML

Leakage and Second-Order Dynamics Improve Hippocampal RNN Replay

Josue Casco-Rodriguez, Nanda H. Krishna, Richard G. Baraniuk

详情
英文摘要

Biological neural networks (like the hippocampus) can internally generate "replay" resembling stimulus-driven activity. Recent computational models of replay use noisy recurrent neural networks (RNNs) trained to path-integrate. Replay in these networks has been described as Langevin sampling, but new modifiers of noisy RNN replay have surpassed this description. We re-examine noisy RNN replay as sampling to understand or improve it in three ways: (1) Under simple assumptions, we prove that the gradients replay activity should follow are time-varying and difficult to estimate, but readily motivate the use of hidden state leakage in RNNs for replay. (2) We confirm that hidden state adaptation (negative feedback) encourages exploration in replay, but show that it incurs non-Markov sampling that also slows replay. (3) We propose the first model of temporally compressed replay in noisy path-integrating RNNs through hidden state momentum, connect it to underdamped Langevin sampling, and show that, together with adaptation, it counters slowness while maintaining exploration. We verify our findings via path-integration of 2D triangular and T-maze paths and of high-dimensional paths of synthetic rat place cell activity.

2602.18397 2026-02-23 cs.RO

How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf

Wenqi Jiang, Jason Clemons, Karu Sankaralingam, Christos Kozyrakis

详情
英文摘要

Vision-Language-Action (VLA) models have recently demonstrated impressive capabilities across various embodied AI tasks. While deploying VLA models on real-world robots imposes strict real-time inference constraints, the inference performance landscape of VLA remains poorly understood due to the large combinatorial space of model architectures and inference systems. In this paper, we ask a fundamental research question: How should we design future VLA models and systems to support real-time inference? To address this question, we first introduce VLA-Perf, an analytical performance model that can analyze inference performance for arbitrary combinations of VLA models and inference systems. Using VLA-Perf, we conduct the first systematic study of the VLA inference performance landscape. From a model-design perspective, we examine how inference performance is affected by model scaling, model architectural choices, long-context video inputs, asynchronous inference, and dual-system model pipelines. From the deployment perspective, we analyze where VLA inference should be executed -- on-device, on edge servers, or in the cloud -- and how hardware capability and network performance jointly determine end-to-end latency. By distilling 15 key takeaways from our comprehensive evaluation, we hope this work can provide practical guidance for the design of future VLA models and inference systems.

2602.18396 2026-02-23 cs.LG eess.SP math.PR stat.AP stat.ML

PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing

Ehsan Lari, Reza Arablouei, Stefan Werner

Comments 13 pages, 5 figures, 2 tables, Submitted to IEEE Transactions on Signal Processing (TSP)

详情
英文摘要

We propose PRISM-FCP (Partial shaRing and robust calIbration with Statistical Margins for Federated Conformal Prediction), a Byzantine-resilient federated conformal prediction framework that utilizes partial model sharing to improve robustness against Byzantine attacks during both model training and conformal calibration. Existing approaches address adversarial behavior only in the calibration stage, leaving the learned model susceptible to poisoned updates. In contrast, PRISM-FCP mitigates attacks end-to-end. During training, clients partially share updates by transmitting only $M$ of $D$ parameters per round. This attenuates the expected energy of an adversary's perturbation in the aggregated update by a factor of $M/D$, yielding lower mean-square error (MSE) and tighter prediction intervals. During calibration, clients convert nonconformity scores into characterization vectors, compute distance-based maliciousness scores, and downweight or filter suspected Byzantine contributions before estimating the conformal quantile. Extensive experiments on both synthetic data and the UCI Superconductivity dataset demonstrate that PRISM-FCP maintains nominal coverage guarantees under Byzantine attacks while avoiding the interval inflation observed in standard FCP with reduced communication, providing a robust and communication-efficient approach to federated uncertainty quantification.

2602.18394 2026-02-23 cs.CV

Self-Aware Object Detection via Degradation Manifolds

Stefan Becker, Simon Weiss, Wolfgang Hübner, Michael Arens

详情
英文摘要

Object detectors achieve strong performance under nominal imaging conditions but can fail silently when exposed to blur, noise, compression, adverse weather, or resolution changes. In safety-critical settings, it is therefore insufficient to produce predictions without assessing whether the input remains within the detector's nominal operating regime. We refer to this capability as self-aware object detection. We introduce a degradation-aware self-awareness framework based on degradation manifolds, which explicitly structure a detector's feature space according to image degradation rather than semantic content. Our method augments a standard detection backbone with a lightweight embedding head trained via multi-layer contrastive learning. Images sharing the same degradation composition are pulled together, while differing degradation configurations are pushed apart, yielding a geometrically organized representation that captures degradation type and severity without requiring degradation labels or explicit density modeling. To anchor the learned geometry, we estimate a pristine prototype from clean training embeddings, defining a nominal operating point in representation space. Self-awareness emerges as geometric deviation from this reference, providing an intrinsic, image-level signal of degradation-induced shift that is independent of detection confidence. Extensive experiments on synthetic corruption benchmarks, cross-dataset zero-shot transfer, and natural weather-induced distribution shifts demonstrate strong pristine-degraded separability, consistent behavior across multiple detector architectures, and robust generalization under semantic shift. These results suggest that degradation-aware representation geometry provides a practical and detector-agnostic foundation.

2602.18390 2026-02-23 cs.DB cs.LO

Dichotomy for Axiomatising Inclusion Dependencies on K-Databases

Miika Hannula, Teymur Ismikhanov, Jonni Virtema

详情
英文摘要

A relation consisting of tuples annotated by an element of a monoid K is called a K-relation. A K-database is a collection of K-relations. In this paper, we study entailment of inclusion dependencies over K-databases, where K is a positive commutative monoid. We establish a dichotomy regarding the axiomatisation of the entailment of inclusion dependencies over K-databases, based on whether the monoid K is weakly absorptive or weakly cancellative. We establish that, if the monoid is weakly cancellative then the standard axioms of inclusion dependencies are sound and complete for the implication problem. If the monoid is not weakly cancellative, it is weakly absorptive and the standard axioms of inclusion dependencies together with the weak symmetry axiom are sound and complete for the implication problem. In addition, we establish that the so-called balance axiom is further required, if one stipulates that the joint weights of each K-relation of a K-database need to be the same; this generalises the notion of a K-relation being a distribution. In conjunction with the balance axiom, weak symmetry axiom boils down to symmetry.

2602.18389 2026-02-23 cs.DS

Improved Algorithms for Clustering with Noisy Distance Oracles

Pinki Pradhan, Anup Bhattacharya, Ragesh Jaiswal

Comments 37 pages, 10 figures

详情
英文摘要

Bateni et al. has recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to the strong-oracle and weak-oracle in the weak-strong oracle model, the authors design approximation algorithms for $k$-means and $k$-center clustering problems. In this work, we design algorithms with improved guarantees for $k$-means and $k$-center clustering problems in the weak-strong oracle model. The $k$-means++ algorithm is routinely used to solve $k$-means in settings where complete distance information is available. One of the main contributions of this work is to show that $k$-means++ algorithm can be adapted to work in the weak-strong oracle model using only a small number of strong-oracle queries, which is the critical resource in this model. In particular, our $k$-means++ based algorithm gives a constant approximation for $k$-means and uses $O(k^2 \log^2{n})$ strong-oracle queries. This improves on the algorithm of Bateni et al. that uses $O(k^2 \log^4n \log^2 \log n)$ strong-oracle queries for a constant factor approximation of $k$-means. For the $k$-center problem, we give a simple ball-carving based $6(1 + ε)$-approximation algorithm that uses $O(k^3 \log^2{n} \log{\frac{\log{n}}ε})$ strong-oracle queries. This is an improvement over the $14(1 + ε)$-approximation algorithm of Bateni et al. that uses $O(k^2 \log^4{n} \log^2{\frac{\log{n}}ε})$ strong-oracle queries. To show the effectiveness of our algorithms, we perform empirical evaluations on real-world datasets and show that our algorithms significantly outperform the algorithms of Bateni et al.

2602.18386 2026-02-23 cs.RO cs.AI cs.LG cs.SY eess.SY

Learning to Tune Pure Pursuit in Autonomous Racing: Joint Lookahead and Steering-Gain Control with PPO

Mohamed Elgouhary, Amr S. El-Wakeel

详情
英文摘要

Pure Pursuit (PP) is widely used in autonomous racing for real-time path tracking due to its efficiency and geometric clarity, yet performance is highly sensitive to how key parameters-lookahead distance and steering gain-are chosen. Standard velocity-based schedules adjust these only approximately and often fail to transfer across tracks and speed profiles. We propose a reinforcement-learning (RL) approach that jointly chooses the lookahead Ld and a steering gain g online using Proximal Policy Optimization (PPO). The policy observes compact state features (speed and curvature taps) and outputs (Ld, g) at each control step. Trained in F1TENTH Gym and deployed in a ROS 2 stack, the policy drives PP directly (with light smoothing) and requires no per-map retuning. Across simulation and real-car tests, the proposed RL-PP controller that jointly selects (Ld, g) consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant, and it also exceeds a kinematic MPC raceline tracker under our evaluated settings in lap time, path-tracking accuracy, and steering smoothness, demonstrating that policy-guided parameter tuning can reliably improve classical geometry-based control.

2602.18384 2026-02-23 cs.LG cs.AI

FedZMG: Efficient Client-Side Optimization in Federated Learning

Fotios Zantalis, Evangelos Zervas, Grigorios Koulouras

详情
英文摘要

Federated Learning (FL) enables distributed model training on edge devices while preserving data privacy. However, clients tend to have non-Independent and Identically Distributed (non-IID) data, which often leads to client-drift, and therefore diminishing convergence speed and model performance. While adaptive optimizers have been proposed to mitigate these effects, they frequently introduce computational complexity or communication overhead unsuitable for resource-constrained IoT environments. This paper introduces Federated Zero Mean Gradients (FedZMG), a novel, parameter-free, client-side optimization algorithm designed to tackle client-drift by structurally regularizing the optimization space. Advancing the idea of Gradient Centralization, FedZMG projects local gradients onto a zero-mean hyperplane, effectively neutralizing the "intensity" or "bias" shifts inherent in heterogeneous data distributions without requiring additional communication or hyperparameter tuning. A theoretical analysis is provided, proving that FedZMG reduces the effective gradient variance and guarantees tighter convergence bounds compared to standard FedAvg. Extensive empirical evaluations on EMNIST, CIFAR100, and Shakespeare datasets demonstrate that FedZMG achieves better convergence speed and final validation accuracy compared to the baseline FedAvg and the adaptive optimizer FedAdam, particularly in highly non-IID settings.

2602.18382 2026-02-23 eess.SY cs.SY math.OC

Incremental Input-to-State Stability and Equilibrium Tracking for Stochastic Contracting Dynamics

Yu Kawano, Simone Betteti, Alexander Davydov, Francesco Bullo

详情
英文摘要

In this paper, we study the contractivity of nonlinear stochastic differential equations (SDEs) driven by deterministic inputs and Brownian motions. Given a weighted $\ell_2$-norm for the state space, we show that an SDE is incrementally noise- and input-to-state stable if its vector field is uniformly contracting in the state and uniformly Lipschitz in the input. This result is applied to error estimation for time-varying equilibrium tracking in the presence of noise affecting both the system dynamics and the input signals. We consider both Ornstein-Uhlenbeck processes modeling unbounded noise and Jacobi diffusion processes modeling bounded noise. Finally, we turn our attention to the associated Fokker-Planck equation of an SDE. For this context, we prove incremental input-to-state stability with respect to an arbitrary $p$-Wasserstein metric when the drift vector field is uniformly contracting in the state and uniformly Lipschitz in the input with respect to an arbitrary norm.

2602.18380 2026-02-23 cs.CC cs.GT econ.TH

The Complexity of Sparse Win-Lose Bimatrix Games

Eleni Batziou, John Fearnley, Abheek Ghosh, Rahul Savani

Comments 43 pages

详情
英文摘要

We prove that computing an $ε$-approximate Nash equilibrium of a win-lose bimatrix game with constant sparsity is PPAD-hard for inverse-polynomial $ε$. Our result holds for 3-sparse games, which is tight given that 2-sparse win-lose bimatrix games can be solved in polynomial time.

2602.18379 2026-02-23 cs.RO cond-mat.soft

Ori-Sense: origami capacitive sensing for soft robotic applications

Hugo de Souza Oliveira, Xin Li, Mohsen Jafarpour, Edoardo Milana

Comments 9th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2026)

详情
英文摘要

This work introduces Ori-Sense, a compliant capacitive sensor inspired by the inverted Kresling origami pattern. The device translates torsional deformation into measurable capacitance changes, enabling proprioceptive feedback for soft robotic systems. Using dissolvable-core molding, we fabricated a monolithic silicone structure with embedded conductive TPU electrodes, forming an integrated soft capacitor. Mechanical characterization revealed low stiffness and minimal impedance, with torque values below 0.01 N mm for axial displacements between -15 mm and 15 mm, and up to 0.03 N mm at 30 degrees twist under compression. Finite-element simulations confirmed localized stresses along fold lines and validated the measured torque-rotation response. Electrical tests showed consistent capacitance modulation up to 30%, directly correlated with the twist angle, and maximal sensitivity of S_theta ~ 0.0067 pF/deg at 5 mm of axial deformation.

2602.18376 2026-02-23 eess.SY cs.SY

Parameter Update Laws for Adaptive Control with Affine Equality Parameter Constraints

Ashwin P. Dani

详情
英文摘要

In this paper, constrained parameter update laws for adaptive control with convex equality constraint on the parameters are developed, one based on a gradient only update and the other incorporating concurrent learning (CL) update. The update laws are derived by solving a constrained optimization problem with affine equality constraints. This constrained problem is reformulated as an equivalent unconstrained problem in a new variable, thereby eliminating the equality constraints. The resulting update law is integrated with an adaptive trajectory tracking controller, enabling online learning of the unknown system parameters. Lyapunov stability of the closed-loop system with the equality-constrained parameter update law is established. The effectiveness of the proposed equality-constrained adaptive control law is demonstrated through simulations, validating its ability to maintain constraints on the parameter estimates, achieving convergence to the true parameters for CL-based update law, and achieving asymptotic and exponential tracking performance for constrained gradient and constrained CL-based update laws, respectively.

2602.18374 2026-02-23 cs.RO cs.AI

Zero-shot Interactive Perception

Venkatesh Sripada, Frank Guerin, Amir Ghalamzan

Comments Original manuscript submitted on April 24, 2025. Timestamped and publicly available on OpenReview: https://openreview.net/forum?id=7MhpFcr5Nx

详情
英文摘要

Interactive perception (IP) enables robots to extract hidden information in their workspace and execute manipulation plans by physically interacting with objects and altering the state of the environment -- crucial for resolving occlusions and ambiguity in complex, partially observable scenarios. We present Zero-Shot IP (ZS-IP), a novel framework that couples multi-strategy manipulation (pushing and grasping) with a memory-driven Vision Language Model (VLM) to guide robotic interactions and resolve semantic queries. ZS-IP integrates three key components: (1) an Enhanced Observation (EO) module that augments the VLM's visual perception with both conventional keypoints and our proposed pushlines -- a novel 2D visual augmentation tailored to pushing actions, (2) a memory-guided action module that reinforces semantic reasoning through context lookup, and (3) a robotic controller that executes pushing, pulling, or grasping based on VLM output. Unlike grid-based augmentations optimized for pick-and-place, pushlines capture affordances for contact-rich actions, substantially improving pushing performance. We evaluate ZS-IP on a 7-DOF Franka Panda arm across diverse scenes with varying occlusions and task complexities. Our experiments demonstrate that ZS-IP outperforms passive and viewpoint-based perception techniques such as Mark-Based Visual Prompting (MOKA), particularly in pushing tasks, while preserving the integrity of non-target elements.