arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2603.14659 2026-03-17 cs.CV cs.AI

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

Daeun Lee, Shoubin Yu, Yue Zhang, Mohit Bansal

Comments Project website: https://visioncoach.github.io/

Abstract

Video reasoning requires models to locate and track question-relevant evidence across frames. While reinforcement learning (RL) with verifiable rewards improves accuracy, it still struggles to achieve reliable spatio-temporal grounding during the reasoning process. Moreover, improving grounding typically relies on scaled training data or inference-time perception tools, which increases annotation or computational cost. To address this challenge, we propose VisionCoach, an input-adaptive RL framework that improves spatio-temporal grounding through visual prompting as training-time guidance. During RL training, visual prompts are selectively applied to challenging inputs to amplify question-relevant evidence and suppress distractors. The model then internalizes these improvements through self-distillation, enabling grounded reasoning directly on raw videos without visual prompting at inference. VisionCoach consists of two components: (1) Visual Prompt Selector, which predicts appropriate prompt types conditioned on the video and question, and (2) Spatio-Temporal Reasoner, optimized with RL under visual prompt guidance and object-aware grounding rewards that enforce object identity consistency and multi-region bounding-box overlap. Extensive experiments demonstrate that VisionCoach achieves state-of-the-art performance under comparable settings across diverse video reasoning, video understanding, and temporal grounding benchmarks (V-STAR, VideoMME, World-Sense, VideoMMMU, PerceptionTest, and Charades-STA), while maintaining a single efficient inference pathway without external tools. Our results show that visual prompting during training improves grounded video reasoning, while self-distillation enables the model to internalize this ability without requiring prompts at inference time.

2603.14658 2026-03-17 cs.CV cs.AI

Human-AI Ensembles Improve Deepfake Detection in Low-to-Medium Quality Videos

Marco Postiglione, Isabel Gortner, V. S. Subrahmanian

Abstract

Deepfake detection is widely framed as a machine learning problem, yet how humans and AI detectors compare under realistic conditions remains poorly understood. We evaluate 200 human participants and 95 state-of-the-art AI detectors across two datasets: DF40, a standard benchmark, and CharadesDF, a novel dataset of videos of everyday activities. CharadesDF was recorded using mobile phones leading to low/moderate quality videos compared to the more professionally captured DF40. Humans outperform AI detectors on both datasets, with the gap widening in the case of CharadesDF where AI accuracy collapses to near chance (0.537) while humans maintain robust performance (0.784). Human and AI errors are complementary: humans miss high-quality deepfakes while AI detectors flag authentic videos as fake, and hybrid human-AI ensembles reduce high-confidence errors. These findings suggest that effective real-world deepfake detection, especially in non-professionally produced videos, requires human-AI collaboration rather than AI algorithms alone.

2603.14656 2026-03-17 cs.RO

Coordinate-Independent Robot Model Identification

Yanhao Yang, Ross L. Hatton

Comments 8 pages, 7 figures, supplementary video: https://youtu.be/w2bBBV9t1fk?si=iCoJ4l51wumwvCIo

Abstract

Robot model identification is commonly performed by least-squares regression on inverse dynamics, but existing formulations measure residuals directly in coordinate force space and therefore depend on the chosen coordinate chart, units, and scaling. This paper proposes a coordinate-independent identification method that weights inverse-dynamics residuals by the dual metric induced by the system Riemannian metric. Using the force-velocity vector-covector duality, the dual metric provides a physically meaningful normalization of generalized forces, pulling coordinate residuals back into the ambient mechanical space and eliminating coordinate-induced bias. The resulting objective remains convex through an affine-metric and Schur-complement reformulation, and is compatible with physical-consistency constraints and geometric regularization. Experiments on an inertia-dominated Crazyflie-pendulum system and a drag-dominated LandSalp robot show improved identification accuracy, especially on shape coordinates, in both low-data and high-data settings.
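
In symbols, a dual-metric-weighted residual of the kind described in this abstract might look as follows. This is only a sketch under assumed notation, none of which is taken from the paper: $\tau_i$ denotes the measured generalized forces, $Y_i\theta$ the inverse-dynamics prediction for parameters $\theta$, and $M_\theta(q_i)$ the system metric at configuration $q_i$. The second line is the standard Schur-complement identity (valid for $M \succ 0$), which indicates how a term of this form can be written as a linear matrix inequality when $M$ and $r$ are affine in the parameters.

```latex
J(\theta) = \sum_i r_i(\theta)^{\top} M_\theta(q_i)^{-1} r_i(\theta),
\qquad r_i(\theta) = \tau_i - Y_i \theta

r^{\top} M^{-1} r \le t
\iff
\begin{bmatrix} M & r \\ r^{\top} & t \end{bmatrix} \succeq 0
\quad (M \succ 0)
```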

2603.14651 2026-03-17 cs.LG cs.AI cs.MA

EARCP: Self-Regulating Coherence-Aware Ensemble Architecture for Sequential Decision Making -- Ensemble Auto-Regule par Coherence et Performance

Mike Amega

Comments 13 pages, 1 table, 1 algorithm. Open-source implementation available at https://github.com/Volgat/earcp and via pip install earcp. Dual-licensed: free for academic researchers, students, and organizations with gross revenue under $100,000/year; commercial license required for organizations exceeding this threshold (contact author)

Abstract

We present EARCP (Ensemble Auto-Régulé par Cohérence et Performance), a novel ensemble architecture that dynamically weights heterogeneous expert models based on both their individual performance and inter-model coherence. Unlike traditional ensemble methods that rely on static or offline-learned combinations, EARCP continuously adapts model weights through a principled online learning mechanism that balances exploitation of high-performing models with exploration guided by consensus signals. The architecture combines theoretical foundations from multiplicative weight update algorithms with a novel coherence-based regularization term, providing both theoretical guarantees through regret bounds and practical robustness in non-stationary environments. We formalize the EARCP framework, prove sublinear regret bounds of O(sqrt(T log M)) under standard assumptions, and demonstrate its effectiveness through empirical evaluation on sequential prediction tasks including time series forecasting, activity recognition, and financial prediction. The architecture is designed as a general-purpose framework applicable to any domain requiring ensemble learning with temporal dependencies. An open-source implementation is available at https://github.com/Volgat/earcp and via PyPI (pip install earcp).
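
The weighting idea in this abstract can be illustrated with a generic multiplicative-weights (Hedge-style) update. The sketch below is not the EARCP algorithm and does not use the `earcp` package API; the coherence bonus, the parameters `eta` and `beta`, and both function names are hypothetical choices made for illustration only.

```python
import math


def pairwise_coherence(predictions):
    """Agreement of each expert with the others (assumed definition).

    Predictions are scalars in [0, 1]; coherence is 1 minus the mean
    absolute gap to the other experts' predictions.
    """
    m = len(predictions)
    return [
        1.0 - sum(abs(p - q) for j, q in enumerate(predictions) if j != i) / (m - 1)
        for i, p in enumerate(predictions)
    ]


def update_weights(weights, losses, coherence, eta=0.5, beta=0.1):
    """One multiplicative-weights step with a hypothetical coherence bonus.

    weights:   current expert weights (non-negative, sum to 1)
    losses:    per-expert loss in [0, 1] for this round
    coherence: per-expert agreement score in [0, 1]
    """
    # Lower loss and higher agreement both raise an expert's score.
    scores = [
        w * math.exp(-eta * loss + beta * coh)
        for w, loss, coh in zip(weights, losses, coherence)
    ]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    preds = [0.9, 0.8, 0.1]           # three experts, true target is 1.0
    losses = [abs(1.0 - p) for p in preds]
    weights = update_weights([1 / 3] * 3, losses, pairwise_coherence(preds))
    print(weights)
```

With `beta=0` the update reduces to plain Hedge, whose regret is known to be of order sqrt(T log M) for a suitably tuned `eta`; whether and how the coherence term preserves that rate is what the paper claims to prove.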

2603.14648 2026-03-17 cs.LG cs.AI

A Methodology for Thermal Limit Bias Predictability Through Artificial Intelligence

Anirudh Tunga, Michael J. Mueterthies, Jonathan Nistor

Journal ref Transactions of the American Nuclear Society 131 (2024) 520-523

Abstract

Nuclear power plant operators face significant challenges due to unpredictable deviations between offline and online thermal limits, a phenomenon known as thermal limit bias, which leads to conservative design margins, increased fuel costs, and operational inefficiencies. This work presents a deep learning based methodology to predict and correct this bias for Boiling Water Reactors (BWRs), focusing on the Maximum Fraction of Limiting Power Density (MFLPD) metric used to track the Linear Heat Generation Rate (LHGR) limit. The proposed model employs a fully convolutional encoder decoder architecture, incorporating a feature fusion network to predict corrected MFLPD values closer to online measurements. Evaluated across five independent fuel cycles, the model reduces the mean nodal array error by 74 percent, the mean absolute deviation in limiting values by 72 percent, and the maximum bias by 52 percent compared to offline methods. These results demonstrate the model's potential to meaningfully improve fuel cycle economics and operational planning, and a commercial variant has been deployed at multiple operating BWRs.

2603.14647 2026-03-17 cs.CV cs.AI

TopoCL: Topological Contrastive Learning for Medical Imaging

Guangyu Meng, Pengfei Gu, Peixian Liang, John P. Lalor, Erin Wolf Chambers, Danny Z. Chen

Abstract

Contrastive learning (CL) has become a powerful approach for learning representations from unlabeled images. However, existing CL methods focus predominantly on visual appearance features while neglecting topological characteristics (e.g., connectivity patterns, boundary configurations, cavity formations) that provide valuable cues for medical image analysis. To address this limitation, we propose a new topological CL framework (TopoCL) that explicitly exploits topological structures during contrastive learning for medical imaging. Specifically, we first introduce topology-aware augmentations that control topological perturbations using a relative bottleneck distance between persistence diagrams, preserving medically relevant topological properties while enabling controlled structural variations. We then design a Hierarchical Topology Encoder that captures topological features through self-attention and cross-attention mechanisms. Finally, we develop an adaptive mixture-of-experts (MoE) module to dynamically integrate visual and topological representations. TopoCL can be seamlessly integrated with existing CL methods. We evaluate TopoCL on five representative CL methods (SimCLR, MoCo-v3, BYOL, DINO, and Barlow Twins) and five diverse medical image classification datasets. The experimental results show that TopoCL achieves consistent improvements: an average gain of +3.26% in linear probe classification accuracy with strong statistical significance, verifying its effectiveness.

2603.14646 2026-03-17 cs.AI

Dynamic Theory of Mind as a Temporal Memory Problem: Evidence from Large Language Models

Thuy Ngoc Nguyen, Duy Nhat Phan, Cleotilde Gonzalez

Comments 8 pages, 4 figures, 3 tables, conference

Abstract

Theory of Mind (ToM) is central to social cognition and human-AI interaction, and Large Language Models (LLMs) have been used to help understand and represent ToM. However, most evaluations treat ToM as a static judgment at a single moment, primarily relying on tests of false beliefs. This overlooks a key dynamic dimension of ToM: the ability to represent, update, and retrieve others' beliefs over time. We investigate dynamic ToM as a temporally extended representational memory problem, asking whether LLMs can track belief trajectories across interactions rather than only inferring current beliefs. We introduce DToM-Track, an evaluation framework to investigate temporal belief reasoning in controlled multiturn conversations, testing the recall of beliefs held prior to an update, the inference of current beliefs, and the detection of belief change. Using LLMs as computational probes, we find a consistent asymmetry: models reliably infer an agent's current belief but struggle to maintain and retrieve prior belief states once updates occur. This pattern persists across LLM model families and scales, and is consistent with recency bias and interference effects well documented in cognitive science. These results suggest that tracking belief trajectories over time poses a distinct challenge beyond classical false-belief reasoning. By framing ToM as a problem of temporal representation and retrieval, this work connects ToM to core cognitive mechanisms of memory and interference and exposes the implications for LLM models of social reasoning in extended human-AI interactions.

2603.14645 2026-03-17 cs.CV

Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion

Mang Ning, Mingxiao Li, Le Zhang, Lanmiao Liu, Matthew B. Blaschko, Albert Ali Salah, Itir Onal Ertugrul

Comments We use the NIPS template for readability

Abstract

In this paper, we study the diffusability (learnability) of variational autoencoders (VAE) in latent diffusion. First, we show that pixel-space diffusion trained with an MSE objective is inherently biased toward learning low and mid spatial frequencies, and that the power-law power spectral density (PSD) of natural images makes this bias perceptually beneficial. Motivated by this result, we propose the Spectrum Matching Hypothesis: latents with superior diffusability should (i) follow a flattened power-law PSD (Encoding Spectrum Matching, ESM) and (ii) preserve frequency-to-frequency semantic correspondence through the decoder (Decoding Spectrum Matching, DSM). In practice, we apply ESM by matching the PSD between images and latents, and DSM via shared spectral masking with frequency-aligned reconstruction. Importantly, Spectrum Matching provides a unified view that clarifies prior observations of over-noisy or over-smoothed latents, and interprets several recent methods as special cases (e.g., VA-VAE, EQ-VAE). Experiments suggest that Spectrum Matching yields superior diffusion generation on CelebA and ImageNet datasets, and outperforms prior approaches. Finally, we extend the spectral view to representation alignment (REPA): we show that the directional spectral energy of the target representation is crucial for REPA, and propose a DoG-based method to further improve the performance of REPA. Our code is available at https://github.com/forever208/SpectrumMatching.

2603.14643 2026-03-17 cs.AI cs.CL

Argumentation for Explainable and Globally Contestable Decision Support with LLMs

Adam Dejl, Matthew Williams, Francesca Toni

Abstract

Large language models (LLMs) exhibit strong general capabilities, but their deployment in high-stakes domains is hindered by their opacity and unpredictability. Recent work has taken meaningful steps towards addressing these issues by augmenting LLMs with post-hoc reasoning based on computational argumentation, providing faithful explanations and enabling users to contest incorrect decisions. However, this paradigm is limited to pre-defined binary choices and only supports local contestation for specific instances, leaving the underlying decision logic unchanged and prone to repeated mistakes. In this paper, we introduce ArgEval, a framework that shifts from instance-specific reasoning to structured evaluation of general decision options. Rather than mining arguments solely for individual cases, ArgEval systematically maps task-specific decision spaces, builds corresponding option ontologies, and constructs general argumentation frameworks (AFs) for each option. These frameworks can then be instantiated to provide explainable recommendations for specific cases while still supporting global contestability through modification of the shared AFs. We investigate the effectiveness of ArgEval on treatment recommendation for glioblastoma, an aggressive brain tumour, and show that it can produce explainable guidance aligned with clinical practice.

2603.14639 2026-03-17 cs.RO cs.CV

Seeing Where to Deploy: Metric RGB-Based Traversability Analysis for Aerial-to-Ground Hidden Space Inspection

Seoyoung Lee, Shaekh Mohammad Shithil, Durgakant Pushp, Lantao Liu, Zhangyang Wang

Abstract

Inspection of confined infrastructure such as culverts often requires accessing hidden spaces whose entrances are reachable primarily from elevated viewpoints. Aerial-ground cooperation enables a UAV to deploy a compact UGV for interior exploration, but selecting a suitable deployment region from aerial observations requires metric terrain reasoning involving scale ambiguity, reconstruction uncertainty, and terrain semantics. We present a metric RGB-based geometric-semantic reconstruction and traversability analysis framework for aerial-to-ground hidden space inspection. A feed-forward multi-view RGB reconstruction backbone produces dense geometry, while temporally consistent semantic segmentation yields a 3D semantic map. To enable deployment-relevant measurements without LiDAR-based dense mapping, we introduce an embodied motion prior that recovers metric scale by enforcing consistency between predicted camera motion and onboard platform egomotion. From the metrically grounded reconstruction, we construct a confidence-aware geometric-semantic traversability map and evaluate candidate deployment zones under explicit reachability constraints. Experiments on a tethered UAV-UGV platform demonstrate reliable deployment-zone identification in hidden space scenarios.

2603.14636 2026-03-17 cs.SD cs.AI cs.CL eess.AS

Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

Lok-Lam Ieong, Chia-Chien Chen, Chih-Kai Yang, Yu-Han Huang, An-Yu Cheng, Hung-yi Lee

Comments 6 pages, 4 figures, 2 tables

Abstract

Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse information sources and evaluate them across four LALMs and four benchmarks. Results show general accuracy gains up to 4.4% over CoT prompting. Notably, we identify a cross-modal transfer where steering vectors derived from few text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to understand the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.

2603.14632 2026-03-17 cs.CV cs.IT math.IT

Continual Few-shot Adaptation for Synthetic Fingerprint Detection

Joseph Geo Benjamin, Anil K. Jain, Karthik Nandakumar

Comments Accepted in 14th International Workshop on Biometrics and Forensics (IWBF-2026)

Abstract

The quality and realism of synthetically generated fingerprint images have increased significantly over the past decade fueled by advancements in generative artificial intelligence (GenAI). This has exacerbated the vulnerability of fingerprint recognition systems to data injection attacks, where synthetic fingerprints are maliciously inserted during enrollment or authentication. Hence, there is an urgent need for methods to detect if a fingerprint image is real or synthetic. While it is straightforward to train deep neural network (DNN) models to classify images as real or synthetic, often such DNN models overfit the training data and fail to generalize well when applied to synthetic fingerprints generated using unseen GenAI models. In this work, we formulate synthetic fingerprint detection as a continual few-shot adaptation problem, where the objective is to rapidly evolve a base detector to identify new types of synthetic data. To enable continual few-shot adaptation, we employ a combination of binary cross-entropy and supervised contrastive (applied to the feature representation) losses and replay a few samples from previously known styles during fine-tuning to mitigate catastrophic forgetting. Experiments based on several DNN backbones (as feature extractors) and a variety of real and synthetic fingerprint datasets indicate that the proposed approach achieves a good trade-off between fast adaptation for detecting unseen synthetic styles and forgetting of known styles.

2603.14631 2026-03-17 cs.LG cs.AI cs.CL

Anterior's Approach to Fairness Evaluation of Automated Prior Authorization System

Sai P. Selvaraj, Khadija Mahmoud, Anuj Iravane

Abstract

Growing staffing constraints and turnaround-time pressures in prior authorization (PA) have led to increasing automation of decision systems that support PA review. Evaluating fairness in such systems poses unique challenges because legitimate clinical guidelines and medical necessity criteria often differ across demographic groups, making parity in approval rates an inappropriate fairness metric. We propose a fairness evaluation framework for prior authorization models based on model error rates rather than approval outcomes. Using 7,166 human-reviewed cases spanning 27 medical necessity guidelines, we assessed consistency across sex, age, race/ethnicity, and socioeconomic status. Our evaluation combined error-rate comparisons, tolerance-band analysis with a predefined ±5 percentage-point margin, statistical power evaluation, and protocol-controlled logistic regression. Across most demographics, model error rates were consistent, and confidence intervals fell within the predefined tolerance band, indicating no meaningful performance differences. For race/ethnicity, point estimates were small, but subgroup sample sizes were limited, resulting in wide confidence intervals and underpowered tests, so the evidence from this dataset is inconclusive. These findings illustrate a rigorous and regulator-aligned approach to fairness evaluation in administrative healthcare AI systems.

2603.14623 2026-03-17 cs.LG

Proactive Routing to Interpretable Surrogates with Distribution-Free Safety Guarantees

Iqtedar Uddin, Mazin Khider, André Bauer

Abstract

Model routing determines whether to use an accurate black-box model or a simpler surrogate that approximates it at lower cost or greater interpretability. In deployment settings, practitioners often wish to restrict surrogate use to inputs where its degradation relative to a reference model is controlled. We study proactive (input-based) routing, in which a lightweight gate selects the model before either runs, enabling distribution-free control of the fraction of routed inputs whose degradation exceeds a tolerance τ. The gate is trained to distinguish safe from unsafe inputs, and a routing threshold is chosen via Clopper-Pearson conformal calibration on a held-out set, guaranteeing that the routed-set violation rate is at most α with probability 1-δ. We derive a feasibility condition linking safe routing to the base safe rate π and risk budget α, along with sufficient AUC thresholds ensuring that feasible routing exists. Across 35 OpenML datasets and multiple black-box model families, gate-based conformal routing maintains controlled violation while achieving substantially higher coverage than regression conformal and naive baselines. We further show that probabilistic calibration primarily affects routing efficiency rather than distribution-free validity.
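
The calibration step described in this abstract can be sketched with a one-sided Clopper-Pearson bound on the violation rate of the routed set. This is a minimal illustration, not the authors' procedure: the function names, the fixed candidate `grid`, and the Bonferroni split of δ across candidates are assumptions made here so that choosing among several thresholds stays valid.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Suitable for small calibration sets."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(k, n, delta):
    """One-sided upper confidence bound on a binomial rate.

    Returns the p solving P(X <= k | n, p) = delta, found by bisection
    (the CDF is decreasing in p).
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def calibrate_threshold(scores, unsafe, alpha, delta, grid):
    """Pick the loosest gate threshold whose routed set is certified.

    scores: gate scores on the held-out set (higher = judged safer)
    unsafe: 1 if the surrogate's degradation exceeds the tolerance, else 0
    grid:   candidate thresholds fixed before looking at the data
    """
    per_test = delta / len(grid)  # Bonferroni split (an assumption of this sketch)
    passing = []
    for t in grid:
        routed = [u for s, u in zip(scores, unsafe) if s >= t]
        if routed and clopper_pearson_upper(sum(routed), len(routed), per_test) <= alpha:
            passing.append(t)
    return min(passing) if passing else None
```

A lower threshold routes more inputs to the surrogate, so returning the smallest certified threshold maximizes coverage subject to the bound.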

2603.14609 2026-03-17 cs.CV

GroundSet: A Cadastral-Grounded Dataset for Spatial Understanding with Vector Data

Roger Ferrod, Maël Lecene, Krishna Sapkota, George Leifman, Vered Silverman, Genady Beryozkin, Sylvain Lobry

Abstract

Precise spatial understanding in Earth Observation is essential for translating raw aerial imagery into actionable insights for critical applications like urban planning, environmental monitoring and disaster management. However, Multimodal Large Language Models exhibit critical deficiencies in fine-grained spatial understanding within Remote Sensing, primarily due to a reliance on limited or repurposed legacy datasets. To bridge this gap, we introduce a large-scale dataset grounded in verifiable cadastral vector data, comprising 3.8 million annotated objects across 510k high-resolution images with 135 granular semantic categories. We validate this resource through a comprehensive instruction-tuning benchmark spanning seven spatial reasoning tasks. Our evaluation establishes a robust baseline using a standard LLaVA architecture. We show that while current RS-specialized and commercial models (e.g., Gemini) struggle in zero-shot settings, high-fidelity supervision effectively bridges this gap, enabling standard architectures to master fine-grained spatial grounding without complex architectural modifications.

2603.14608 2026-03-17 cs.LG cs.AI math.OC stat.ML

Delightful Policy Gradient

Ian Osband

Abstract

Standard policy gradients weight each sampled action by advantage alone, regardless of how likely that action was under the current policy. This creates two pathologies: within a single decision context (e.g. one image or prompt), a rare negative-advantage action can disproportionately distort the update direction; across many such contexts in a batch, the expected gradient over-allocates budget to contexts the policy already handles well. We introduce the Delightful Policy Gradient (DG), which gates each term with a sigmoid of delight, the product of advantage and action surprisal (negative log-probability). For K-armed bandits, DG provably improves directional accuracy in a single context and, across multiple contexts, shifts the expected gradient strictly closer to the supervised cross-entropy oracle. This second effect is not variance reduction: it persists even with infinite samples. Empirically, DG outperforms REINFORCE, PPO, and advantage-weighted baselines across MNIST, transformer sequence modeling, and continuous control, with larger gains on harder tasks.
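
The gating rule stated in this abstract can be sketched for a softmax policy over a handful of actions. This is a minimal illustration under assumptions: it takes the gate to be a plain sigmoid of advantage times surprisal that multiplies the usual REINFORCE term, with no temperature or normalization; the paper may define these details differently, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_gradient(logits, action, advantage):
    """Gated policy-gradient term w.r.t. the logits for one sampled action.

    Returns (gradient, gate). Delight is advantage * surprisal, where
    surprisal is the negative log-probability of the sampled action.
    """
    probs = softmax(logits)
    surprisal = -math.log(probs[action])
    gate = sigmoid(advantage * surprisal)
    # d log pi(a) / d logit_j = 1[j == a] - pi(j) for a softmax policy
    grad_logp = [(1.0 if j == action else 0.0) - p for j, p in enumerate(probs)]
    return [gate * advantage * g for g in grad_logp], gate


if __name__ == "__main__":
    logits = [2.0, 0.0, 0.0]  # action 0 is likely, actions 1 and 2 are rare
    for action, adv in [(1, -1.0), (0, -1.0), (1, 1.0)]:
        _, gate = gated_gradient(logits, action, adv)
        print(action, adv, round(gate, 3))
```

Under this form, a rare action with negative advantage gets a gate well below 0.5, so it contributes less to the update, while a rare action with positive advantage gets a gate above 0.5.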

2603.14605 2026-03-17 cs.RO

CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports

Peng Ren, Chuan Qi, Haoyang Ge, Qiyuan Su, Xuguo He, Cong Huang, Pei Chi, Jiang Zhao, Kai Chen

Abstract

Dynamic ball-interaction tasks remain challenging for robots because they require tight perception-action coupling under limited reaction time. This challenge is especially pronounced in humanoid racket sports, where successful interception depends on accurate visual tracking, trajectory prediction, coordinated stepping, and stable whole-body striking. Existing robotic racket-sport systems often rely on external motion capture for state estimation or on task-specific low-level controllers that must be retrained across tasks and platforms. We present CyboRacket, a hierarchical perception-to-action framework for humanoid racket sports that integrates onboard visual perception, physics-based trajectory prediction, and large-scale pre-trained whole-body control. The framework uses onboard cameras to track the incoming object, predicts its future trajectory, and converts the estimated interception state into target end-effector and base-motion commands for whole-body execution by SONIC on the Unitree G1 humanoid robot. We evaluate the proposed framework in a vision-based humanoid tennis-hitting task. Experimental results demonstrate real-time visual tracking, trajectory prediction, and successful striking using purely onboard sensing.

2603.14604 2026-03-17 cs.RO cs.CV cs.LG

Tactile Modality Fusion for Vision-Language-Action Models

Charlotte Morissette, Amin Abyaneh, Wei-Di Chang, Anas Houssaini, David Meger, Hsiu-Chin Lin, Jonathan Tremblay, Gregory Dudek

Comments 19 pages, 5 figures

Abstract

We propose TacFiLM, a lightweight modality-fusion approach that integrates visual-tactile signals into vision-language-action (VLA) models. While recent advances in VLA models have introduced robot policies that are both generalizable and semantically grounded, these models mainly rely on vision-based perception. Vision alone, however, cannot capture the complex interaction dynamics that occur during contact-rich manipulation, including contact forces, surface friction, compliance, and shear. While recent attempts to integrate tactile signals into VLA models often increase complexity through token concatenation or large-scale pretraining, the heavy computational demands of behavioural models necessitate more lightweight fusion strategies. To address these challenges, TacFiLM outlines a post-training finetuning approach that conditions intermediate visual features on pretrained tactile representations using feature-wise linear modulation (FiLM). Experimental results on insertion tasks demonstrate consistent improvements in success rate, direct insertion performance, completion time, and force stability across both in-distribution and out-of-distribution tasks. Together, these results support our method as an effective approach to integrating tactile signals into VLA models, improving contact-rich manipulation behaviours.
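
Feature-wise linear modulation (FiLM), the conditioning mechanism named in this abstract, scales and shifts each feature using parameters predicted from a conditioning input. The sketch below shows the generic FiLM operation only; the layer shapes, the single linear projections, and all names are hypothetical and are not taken from TacFiLM.

```python
def linear(weights, bias, vec):
    """Dense layer: `weights` holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def film(visual, tactile, w_gamma, b_gamma, w_beta, b_beta):
    """Modulate visual features with a tactile embedding.

    gamma and beta are predicted from the tactile embedding, then applied
    feature-wise: out[i] = gamma[i] * visual[i] + beta[i].
    """
    gamma = linear(w_gamma, b_gamma, tactile)
    beta = linear(w_beta, b_beta, tactile)
    return [g * x + b for g, x, b in zip(gamma, visual, beta)]


if __name__ == "__main__":
    visual = [1.0, 2.0]    # two intermediate visual features
    tactile = [0.5]        # one-dimensional tactile embedding
    out = film(visual, tactile,
               w_gamma=[[2.0], [0.0]], b_gamma=[0.0, 1.0],
               w_beta=[[0.0], [4.0]], b_beta=[0.0, 0.0])
    print(out)
```

Because the modulation adds only two small projections per conditioned layer, it avoids lengthening the token sequence, which is consistent with the lightweight fusion the abstract argues for.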

2603.14603 2026-03-17 cs.RO

Latent Dynamics-Aware OOD Monitoring for Trajectory Prediction with Provable Guarantees

Tongfei Guo, Lili Su

Abstract

In safety-critical Cyber-Physical Systems (CPS), accurate trajectory prediction provides vital guidance for downstream planning and control. Although deep learning models achieve high-fidelity forecasts on validation data, their reliability degrades under out-of-distribution (OOD) scenarios caused by environmental uncertainty or rare traffic behaviors in real-world deployment. Detecting such OOD events is challenging due to evolving traffic conditions and changing interaction patterns, while safety-critical applications demand formal guarantees on detection delay and false-alarm rates. This motivates us, following recent work [1], to formulate OOD monitoring for trajectory prediction as a quickest changepoint detection (QCD) problem, which offers a principled statistical framework with established theory. We further observe that the real-world evolution of prediction errors under in-distribution (ID) conditions can be effectively modeled by a Hidden Markov Model (HMM). By leveraging this structure, we extend the cumulative Maximum Mean Discrepancy approach to enable detection without requiring explicit knowledge of the post-change distribution while still admitting provable guarantees on delay and false alarms. Experiments on three real-world driving datasets demonstrate reduced detection delay and robustness to heavy-tailed errors and unknown post-change conditions.

2603.14600 2026-03-17 cs.LG cs.AI cs.RO

A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study

Jingyi Liu, Jian Guo, Eberhard Gill

Comments Submitted to Acta Astronautica

Abstract

Reinforcement learning algorithms have been widely used in dynamic and control systems. However, interpreting their internal learning behavior remains a challenge. In the authors' previous work, a critic match loss landscape visualization method was proposed to study critic training. This study extends that method into a framework which provides a multi-perspective view of the learning dynamics, clarifying how value estimation, policy optimization, and temporal-difference (TD) signals interact during training. The proposed framework includes four complementary components: a three-dimensional reconstruction of the critic match loss surface that shows how TD targets shape the optimization geometry; an actor loss landscape under a frozen critic that reveals how the policy exploits that geometry; a trajectory combining time, Bellman error, and policy weights that indicates how updates move across the surface; and a state-TD map that identifies the state regions that drive those updates. The Action-Dependent Heuristic Dynamic Programming (ADHDP) algorithm for spacecraft attitude control is used as a case study. The framework is applied to compare several ADHDP variants and shows how training stabilizers and target updates change the optimization landscape and affect learning stability. The proposed framework therefore provides a systematic and interpretable tool for analyzing reinforcement learning behavior across algorithmic designs.

2603.14598 2026-03-17 cs.RO

SmallSatSim: A High-Fidelity Simulation and Training Toolkit for Microgravity Robotic Close Proximity Operations

David Schwartz, Alexander Hansson, Sabrina Bodmer, David Sternberg, Oliver Jia-Richards, Keenan Albee

Comments 7 pages, 7 figures

Abstract

Microgravity rendezvous and close proximity operations (RPO) is a growing area of interest for applications spanning in-space assembly and manufacturing (ISAM), orbital debris remediation, and small body exploration. Microgravity environments present unique challenges for robotic control and planning algorithms in new agile RPO mission scenarios such as free-floating manipulation, planning under failure, and estimating high-fidelity dynamics of tumbling bodies. To facilitate the development and testing of novel RPO algorithms, we introduce SmallSatSim, a high-fidelity simulation toolkit that leverages the MuJoCo physics engine to accurately model small satellite RPO dynamics in local microgravity robotic free-flight settings, including under model disturbances and perturbations. The framework includes cutting-edge out-of-the-box free-flyer control techniques. A GPU-accelerated pipeline using MuJoCo MJX and JAX is implemented for sampling- and learning-based simulation use cases. SmallSatSim also supports configurable failure models, enabling the evaluation of safe control strategies under adversarial conditions. Visualization, logging, and GPU-enabled parallelization further enhance SmallSatSim's capability for RPO testing. We outline SmallSatSim's features and intended use cases, and demonstrate its use for robotic RPO planning and control. The open-sourced toolkit aims to accelerate research in autonomous, agile robotic small satellite operations.

2603.14594 2026-03-17 cs.AI cs.LG cs.LO

Scaling the Explanation of Multi-Class Bayesian Network Classifiers

Yaofang Zhang, Adnan Darwiche

Comments To appear in the 4th World Conference on Explainable Artificial Intelligence (XAI), 2026

Abstract

We propose a new algorithm for compiling Bayesian network classifiers (BNCs) into class formulas. Class formulas are logical formulas that represent a classifier's input-output behavior, and are crucial in the recent line of work that uses logical reasoning to explain the decisions made by classifiers. Compared to prior work on compiling class formulas of BNCs, our proposed algorithm is not restricted to binary classifiers, shows significant improvement in compilation time, and outputs class formulas as negation normal form (NNF) circuits that are OR-decomposable, which is an important property when computing explanations of classifiers.
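
To make the notion of a class formula concrete, the following is a minimal sketch, not the compilation algorithm proposed in the paper. It assumes a hypothetical three-class naive Bayes classifier (the simplest kind of BNC) over two binary features, with made-up priors and likelihoods, and builds each class formula by brute-force enumeration as a list of satisfying instances (a DNF). Enumeration is exponential in the number of features, which is exactly what a compilation algorithm is meant to avoid.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration.
PRIORS = {"a": 0.5, "b": 0.3, "c": 0.2}
# P(feature_i = 1 | class) for each of the two binary features.
LIKELIHOODS = {
    "a": (0.9, 0.2),
    "b": (0.2, 0.9),
    "c": (0.1, 0.1),
}


def classify(instance):
    """Return the class with the highest joint probability for the instance."""
    def score(label):
        s = PRIORS[label]
        for value, p_one in zip(instance, LIKELIHOODS[label]):
            s *= p_one if value else 1.0 - p_one
        return s
    return max(PRIORS, key=score)


def class_formulas(num_features):
    """Map each class to the list of instances it is assigned (a DNF)."""
    formulas = {label: [] for label in PRIORS}
    for instance in product((0, 1), repeat=num_features):
        formulas[classify(instance)].append(instance)
    return formulas


formulas = class_formulas(2)
for label, instances in formulas.items():
    print(label, instances)
```

With these parameters, class "a" is selected exactly when the first feature is 1, so its class formula simplifies to the single literal x1. The three formulas are mutually exclusive and jointly cover every instance, as any set of class formulas must.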

2603.14593 2026-03-17 cs.CL

Parameter-Efficient Quality Estimation via Frozen Recursive Models

Umar Abubacar, Roman Bauer, Diptesh Kanojia

Comments Accepted to LowResLM Workshop @ EACL 2026

Abstract

Tiny Recursive Models (TRM) achieve strong results on reasoning tasks through iterative refinement of a shared network. We investigate whether these recursive mechanisms transfer to Quality Estimation (QE) for low-resource languages using a three-phase methodology. Experiments on $8$ language pairs on a low-resource QE dataset reveal three findings. First, TRM's recursive mechanisms do not transfer to QE: external iteration hurts performance, and internal recursion offers only narrow benefits. Second, representation quality dominates architectural choices. Third, frozen pretrained embeddings match fine-tuned performance while reducing trainable parameters by 37$\times$ (7M vs 262M). TRM-QE with frozen XLM-R embeddings achieves a Spearman's correlation of 0.370, matching fine-tuned variants (0.369) and outperforming an equivalent-depth standard transformer (0.336). On Hindi and Tamil, frozen TRM-QE outperforms MonoTransQuest (560M parameters) with 80$\times$ fewer trainable parameters, suggesting that weight sharing combined with frozen embeddings enables parameter efficiency for QE. We release the code publicly for further research at https://github.com/surrey-nlp/TRMQE.

2603.14592 2026-03-17 cs.LG cs.CR

A Multi-Scale Graph Learning Framework with Temporal Consistency Constraints for Financial Fraud Detection in Transaction Networks under Non-Stationary Conditions

Yiming Lei, Qiannan Shen, Junhao Song

Comments 39 pages, 13 figures

Abstract

Financial fraud detection in transaction networks involves modeling sparse anomalies, dynamic patterns, and severe class imbalance in the presence of temporal drift in the data. In real-world transaction systems, a suspicious transaction is rarely isolated: rather, legitimate and suspicious transactions are often connected through accounts, intermediaries, or temporal transaction sequences. Attribute-based or randomly partitioned learning pipelines are therefore insufficient to detect relationally structured fraud. We propose STC-MixHop, a graph-based framework combining spatial multi-resolution propagation with lightweight temporal consistency modeling for anomaly and fraud detection in dynamic transaction networks. It integrates three components: a MixHop-inspired multi-scale neighborhood diffusion encoder for learning structural patterns; a spatial-temporal attention module coupling current and preceding graph snapshots to stabilize representations; and a temporally informed self-supervised pretraining strategy exploiting unlabeled transaction interactions to improve representation quality. We evaluate the framework primarily on the PaySim dataset under strict chronological splits, supplementing the analysis with Porto Seguro and FEMA data to probe cross-domain component behavior. Results show that STC-MixHop is competitive among graph methods and achieves strong screening-oriented recall under highly imbalanced conditions. The experiments also reveal an important boundary condition: when node attributes are highly informative, tabular baselines remain difficult to outperform. Graph structure contributes most clearly where hidden relational dependencies are operationally important. These findings support a stability-focused view of graph learning for financial fraud detection.

2603.14591 2026-03-17 cs.LG cs.AI cs.IR

FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

Wilhelm Tranheden, Shahnawaz Ahmed, Devdatt Dubhashi, Jonna Matthiesen, Hannes von Essen

Comments A collection of models with FlashHead optimization can be found at: https://huggingface.co/collections/embedl/flashhead

Abstract

Language models are increasingly adopting smaller architectures optimized for consumer devices. In this setting, inference efficiency is the primary constraint. Meanwhile, vocabulary sizes continue to grow rapidly, making the classification head a critical bottleneck that accounts for up to 60% of model parameters and 50% of inference compute. We introduce FlashHead, the first efficient drop-in replacement for the dense classification head that is training-free and hardware-friendly. FlashHead builds on principles from information retrieval, reframing the computation at the output head as a retrieval problem rather than a dense classification over the full vocabulary. FlashHead introduces four key innovations: (1) a balanced clustering scheme that structures vocabulary partitions into compact hardware-efficient tensors, (2) an extension of multiprobe retrieval to language model heads, enabling thousands of clusters to be scored in parallel, (3) a novel inference-time sampling mechanism that extends retrieval beyond top tokens, enabling probabilistic sampling across the full vocabulary, and (4) selective quantization, enabling effective low-bit computation in the head. Experiments on Llama-3.2, Gemma-3, and Qwen-3 show that FlashHead delivers model-level inference speedups of up to 1.75x while maintaining output accuracy compared to the original head. By overcoming the classification head bottleneck, FlashHead establishes a new benchmark for efficient inference and removes a key barrier to developing smaller, capable models for consumer hardware.

2603.14589 2026-03-17 cs.LG cs.AI cs.RO

Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

Jingyi Liu, Jian Guo, Eberhard Gill

Comments Revised manuscript, submitted to Astrodynamics

Abstract

This work extends an established critic match loss landscape visualization method from online to off-policy reinforcement learning (RL), aiming to reveal the optimization geometry behind critic learning. Off-policy RL differs from stepwise online actor-critic learning in its replay-based data flow and target computation. Based on these two structural differences, the critic match loss landscape visualization method is adapted to the Soft Actor-Critic (SAC) algorithm by aligning the loss evaluation with its batch-based data flow and target computation, using a fixed replay batch and precomputed critic targets from the selected policy. Critic parameters recorded during training are projected onto a principal component plane, where the critic match loss is evaluated to form a 3-D landscape with an overlaid 2-D optimization path. Applied to a spacecraft attitude control problem, the resulting landscapes are analyzed both qualitatively and quantitatively using sharpness, basin area, and local anisotropy metrics, together with temporal landscape snapshots. Comparisons between convergent SAC, divergent SAC, and divergent Action-Dependent Heuristic Dynamic Programming (ADHDP) cases reveal distinct geometric patterns and optimization behaviors under different algorithmic structures. The results demonstrate that the adapted critic match loss visualization framework serves as a geometric diagnostic tool for analyzing critic optimization dynamics in replay-based off-policy RL control problems.

2603.14588 2026-03-17 cs.AI cs.IR cs.LG

SuperLocalMemory V3: Information-Geometric Foundations for Zero-LLM Enterprise Agent Memory

Varun Pratap Bhardwaj

Comments 43 pages, 5 figures, 9 tables, 3 appendices. Code: https://github.com/qualixar/superlocalmemory. Zenodo DOI: 10.5281/zenodo.19038659

Abstract

Persistent memory is a central capability for AI agents, yet the mathematical foundations of memory retrieval, lifecycle management, and consistency remain unexplored. Current systems employ cosine similarity for retrieval, heuristic decay for salience, and provide no formal contradiction detection. We establish information-geometric foundations through three contributions. First, a retrieval metric derived from the Fisher information structure of diagonal Gaussian families, satisfying Riemannian metric axioms, invariant under sufficient statistics, and computable in O(d) time. Second, memory lifecycle formulated as Riemannian Langevin dynamics with proven existence and uniqueness of the stationary distribution via the Fokker-Planck equation, replacing hand-tuned decay with principled convergence guarantees. Third, a cellular sheaf model where non-trivial first cohomology classes correspond precisely to irreconcilable contradictions across memory contexts. On the LoCoMo benchmark, the mathematical layers yield +12.7 percentage points over engineering baselines across six conversations, reaching +19.9 pp on the most challenging dialogues. A four-channel retrieval architecture achieves 75% accuracy without cloud dependency. Cloud-augmented results reach 87.7%. A zero-LLM configuration satisfies EU AI Act data sovereignty requirements by architectural design. To our knowledge, this is the first work establishing information-geometric, sheaf-theoretic, and stochastic-dynamical foundations for AI agent memory systems.

2603.14587 2026-03-17 cs.CV

Texel Splatting: Perspective-Stable 3D Pixel Art

Dylan Ebert

Comments 3 pages, 2 figures

Abstract

Rendering 3D scenes as pixel art requires that discrete pixels remain stable as the camera moves. Existing methods snap the camera to a grid. Under orthographic projection, this works: every pixel shifts by the same amount, and a single snap corrects all of them. Perspective breaks this. Pixels at different depths drift at different rates, and no single snap corrects all depths. Texel splatting avoids this entirely. Scene geometry is rendered into a cubemap from a fixed point in the world, and each texel is splatted to the screen as a world-space quad. Cubemap indexing gives rotation invariance. Grid-snapping the origin gives translation invariance. The primary limitation is that a fixed origin cannot see all geometry; disocclusion at probe boundaries remains an open tradeoff.
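
The claim that a single camera snap works under orthographic projection but not under perspective can be checked with a short worked example. This is a minimal sketch under assumed values: the pinhole model, the focal length, the pixel scale, and the grid cell size are illustrative choices and do not come from the paper.

```python
import math

PIXELS_PER_UNIT = 16.0   # assumed orthographic scale (pixels per world unit)
FOCAL_PX = 64.0          # assumed pinhole focal length in pixels


def ortho_screen_x(world_x, cam_x):
    """Orthographic projection: depth does not affect the screen position."""
    return (world_x - cam_x) * PIXELS_PER_UNIT


def persp_screen_x(world_x, depth, cam_x):
    """Pinhole perspective projection: screen position scales with 1/depth."""
    return FOCAL_PX * (world_x - cam_x) / depth


def snap(value, cell):
    """Snap a coordinate to the nearest multiple of the grid cell size."""
    return math.floor(value / cell + 0.5) * cell


def screen_shifts(cam_dx, world_x=1.0, depths=(2.0, 8.0)):
    """Screen-space shift of a point at each depth when the camera moves by cam_dx."""
    ortho = [ortho_screen_x(world_x, cam_dx) - ortho_screen_x(world_x, 0.0)
             for _ in depths]
    persp = [persp_screen_x(world_x, d, cam_dx) - persp_screen_x(world_x, d, 0.0)
             for d in depths]
    return ortho, persp


ortho, persp = screen_shifts(0.5)
print("orthographic shifts:", ortho)   # identical at every depth
print("perspective shifts:", persp)    # differ with depth
print("snapped origin:", snap(1.3, 0.5))
```

The orthographic shifts are equal at both depths, so one correction fixes every pixel. The perspective shifts differ by depth, so no single correction does. The `snap` helper shows the kind of grid snapping that, in texel splatting, is applied to the cubemap origin rather than to the camera.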

2603.14567 2026-03-17 cs.CL

Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes

Deepon Halder, Raj Dabre

Abstract

Probabilistic language generators are theoretically modeled as discrete stochastic processes, yet standard decoding strategies (Top-k, Top-p) impose static truncation rules that fail to accommodate the dynamic information density of natural language. This misalignment often forces a suboptimal trade-off: static bounds are either too restrictive for high-entropy creative generation or too permissive for low-entropy logical reasoning. In this work, we formalize the generation process as a trajectory through a relative probability manifold. We introduce Top-b (Adaptive Relative Band Sampling), a decoding strategy that regulates the candidate set via a dynamic bandwidth coefficient coupled strictly to the instantaneous Shannon entropy of the model's distribution. We provide a theoretical framework demonstrating that Top-b acts as a variance-minimizing operator on the tail distribution. Empirical validation on GPQA and GSM8K benchmarks indicates that Top-b significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy, effectively approximating a self-regulating control system for autoregressive generation.
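
The abstract does not give the Top-b formula, so the following is only a sketch of the general idea of an entropy-coupled relative probability band. It assumes a linear coupling between the band width and the normalized Shannon entropy, and the parameters `b_min` and `b_max` are hypothetical names introduced here for illustration. The paper's actual coupling may differ.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def relative_band_filter(probs, b_min=0.05, b_max=0.9):
    """Keep tokens whose probability lies within a band below the top token.

    ASSUMPTION: the band width grows linearly from b_min to b_max with the
    normalized entropy H / log(V). This is an illustrative choice, not the
    coupling defined in the paper.
    """
    vocab = len(probs)
    h_norm = shannon_entropy(probs) / math.log(vocab) if vocab > 1 else 0.0
    width = b_min + (b_max - b_min) * h_norm
    threshold = max(probs) * (1.0 - width)
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    total = sum(probs[i] for i in kept)
    # Renormalize the surviving candidates so they form a distribution.
    return {i: probs[i] / total for i in kept}


peaked = [0.9, 0.05, 0.03, 0.02]   # low entropy: narrow band
flat = [0.25, 0.25, 0.25, 0.25]    # high entropy: wide band
print(relative_band_filter(peaked))
print(relative_band_filter(flat))
```

Under this assumed coupling, the peaked distribution keeps only its top token while the flat distribution keeps all four, which mirrors the stated motivation: restrictive truncation for low-entropy steps and permissive truncation for high-entropy ones.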

2603.14563 2026-03-17 cs.CL

Multilingual TinyStories: A Synthetic Combinatorial Corpus of Indic Children's Stories for Training Small Language Models

Deepon Halder, Angira Mukherjee

Abstract

The development of robust language models for low-resource languages is frequently bottlenecked by the scarcity of high-quality, coherent, and domain-appropriate training corpora. In this paper, we introduce the Multilingual TinyStories dataset, a large-scale, synthetically generated collection of children's stories encompassing 17 Indian languages. Designed specifically for the training and evaluation of Small Language Models (SLMs), the corpus provides simple, narrative-driven text strictly localized to native scripts. We detail our hybrid curation pipeline, which leverages the Sarvam-M language model and a novel combinatorial prompt engineering framework for native generation, coupled with the Google Translate API for large-scale cross-lingual expansion. Through strict programmatic filtering, we compiled 132,942 stories and over 93.9 million tokens in our release, serving as a foundational resource for multilingual language modeling and transfer learning in the Indic linguistic sphere.