arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.00404 2026-03-03 cs.LG cs.AI

USE: Uncertainty Structure Estimation for Robust Semi-Supervised Learning

Tsao-Lun Chen, Chien-Liang Liu, Tzu-Ming Harry Hsu, Tai-Hsien Wu, Chi-Cheng Fu, Han-Yi E. Chou, Shun-Feng Su

Comments Revised mathematical derivations

In this study, we introduce Uncertainty Structure Estimation (USE), a lightweight, algorithm-agnostic procedure for semi-supervised learning (SSL) that emphasizes the often-overlooked role of unlabeled data quality. SSL has achieved impressive progress, but its reliability in deployment is limited by the quality of the unlabeled pool. In practice, unlabeled data are almost always contaminated by out-of-distribution (OOD) samples, and both near-OOD and far-OOD samples can degrade performance in different ways. We argue that the bottleneck lies not in algorithmic design but in the absence of principled mechanisms to assess and curate the quality of unlabeled data. USE trains a proxy model on the labeled set to compute entropy scores for unlabeled samples, and then derives a threshold, via statistical comparison against a reference distribution, that separates informative (structured) from uninformative (structureless) samples. This turns quality assessment into a preprocessing step that removes uninformative or harmful unlabeled data before SSL training begins. In extensive experiments on imaging (CIFAR-100) and NLP (Yelp Review) data, USE consistently improves accuracy and robustness under varying levels of OOD contamination. The approach thus reframes unlabeled data quality control as a structural assessment problem and treats it as a necessary component of reliable and efficient SSL in realistic mixed-distribution environments.
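
A minimal sketch of the filtering step, in Python. It assumes the proxy model's softmax outputs are already available and, as a hypothetical stand-in for the paper's statistical comparison, that the reference distribution is given as a list of entropy values whose lower alpha-quantile serves as the threshold. `use_filter`, `lower_quantile`, and `alpha` are illustrative names, not the authors' code.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def lower_quantile(values, q):
    """Nearest-rank lower quantile of `values` for q in [0, 1]."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[index]


def use_filter(unlabeled_probs, reference_entropies, alpha=0.05):
    """Keep unlabeled samples whose entropy falls below a reference threshold.

    unlabeled_probs: proxy-model softmax outputs, one list per unlabeled sample.
    reference_entropies: entropies from an assumed "structureless" reference
        distribution; the threshold is its lower alpha-quantile (an assumption).
    Returns the indices of retained (structured) samples and the threshold.
    """
    threshold = lower_quantile(reference_entropies, alpha)
    keep = [i for i, probs in enumerate(unlabeled_probs) if entropy(probs) < threshold]
    return keep, threshold
```

For example, `use_filter([[0.9, 0.05, 0.05], [1/3, 1/3, 1/3]], [1.05] * 20)` keeps only index 0: the confident prediction has an entropy of about 0.39 nats, while the uniform one sits at ln 3 ≈ 1.10, above the threshold.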

2603.00397 2026-03-03 cs.LG

TENG-BC: Unified Time-Evolving Natural Gradient for Neural PDE Solvers with General Boundary Conditions

Hongjie Jiang, Di Luo

Accurately solving time-dependent partial differential equations (PDEs) with neural networks remains challenging due to long-time error accumulation and the difficulty of enforcing general boundary conditions. We introduce TENG-BC, a high-precision neural PDE solver based on the Time-Evolving Natural Gradient, designed to perform under general boundary constraints. At each time step, TENG-BC performs a boundary-aware optimization that jointly enforces interior dynamics and boundary conditions, accommodating Dirichlet, Neumann, Robin, and mixed types within a unified framework. This formulation admits a natural-gradient interpretation, enabling stable time evolution without delicate penalty tuning. Across benchmarks over diffusion, transport, and nonlinear PDEs with various boundary conditions, TENG-BC achieves solver-level accuracy under comparable sampling budgets, outperforming conventional solvers and physics-informed neural network (PINN) baselines.

2603.00396 2026-03-03 cs.LG cs.AI cs.SY eess.SY math.OC

Hereditary Geometric Meta-RL: Nonlocal Generalization via Task Symmetries

Paul Nitschke, Shahriar Talebi

Comments Accepted to 2026 American Control Conference

Meta-Reinforcement Learning (Meta-RL) commonly generalizes via smoothness in the task encoding. While this enables local generalization around each training task, it requires dense coverage of the task space and leaves richer task-space structure untapped. In response, we develop a geometric perspective that endows the task space with a "hereditary geometry" induced by the inherent symmetries of the underlying system. Concretely, the agent reuses a policy learned at training time by transforming states and actions through actions of a Lie group. This converts Meta-RL into symmetry discovery rather than smooth extrapolation, enabling the agent to generalize to wider regions of the task space. We show that when the task space is inherited from the symmetries of the underlying system, the task space embeds into a subgroup of those symmetries whose actions are linearizable, connected, and compact, properties that enable efficient learning and inference at test time. To learn these structures, we develop a differential symmetry discovery method. This method collapses functional invariance constraints and thereby improves numerical stability and sample efficiency over functional approaches. Empirically, on a two-dimensional navigation task, our method efficiently recovers the ground-truth symmetry and generalizes across the entire task space, while a common baseline generalizes only near training tasks.
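
To make the policy-reuse idea concrete, here is a toy sketch for 2D navigation, assuming for illustration only that tasks are goal positions related by planar rotations (the group SO(2)). A base policy trained for a goal at (1, 0) is transferred to a goal rotated by theta: rotate the state into the training frame, act there, and rotate the action back. `base_policy` and `transferred_policy` are hypothetical names; the paper learns the symmetry instead of assuming it.

```python
import math


def rotate(vec, angle):
    """Apply the SO(2) group action (rotation by `angle` radians) to a 2D vector."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def base_policy(state):
    """Hypothetical policy learned on the training task: reach the goal (1, 0).

    Returns a unit-length action pointing from the agent toward the goal.
    """
    dx, dy = 1.0 - state[0], 0.0 - state[1]
    norm = math.hypot(dx, dy) or 1.0
    return (dx / norm, dy / norm)


def transferred_policy(state, theta):
    """Reuse base_policy for the goal rotated by theta via the group action.

    Map the state into the training frame (rotate by -theta), act there,
    then map the action back to the task frame (rotate by +theta).
    """
    action_in_base_frame = base_policy(rotate(state, -theta))
    return rotate(action_in_base_frame, theta)
```

Starting at the origin with theta = pi/2, the transferred action points along +y, toward the rotated goal (0, 1), without any retraining.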

2603.00382 2026-03-03 cs.CV

DiffSOS: Acoustic Conditional Diffusion Model for Speed-of-Sound Reconstruction in Ultrasound Computed Tomography

Yujia Wu, Shuoqi Chen, Shiru Wang, Yucheng Tang, Petr Bruza, Geoffrey P. Luke

Accurate Speed-of-Sound (SoS) reconstruction from acoustic waveforms is a cornerstone of ultrasound computed tomography (USCT), enabling quantitative velocity mapping that reveals subtle anatomical details and pathological variations often invisible in conventional imaging. However, practical utility is hindered by the limitations of existing algorithms; traditional Full Waveform Inversion (FWI) is computationally intensive, while current deep learning approaches tend to produce oversmoothed results lacking fine details. We propose DiffSOS, a conditional diffusion model that directly maps acoustic waveforms to SoS maps. Our framework employs a specialized acoustic ControlNet to strictly ground the denoising process in physical wave measurements. To ensure structural consistency, we optimize a hybrid loss function that integrates noise prediction, spatial reconstruction, and noise frequency content. To accelerate inference, we employ stochastic Denoising Diffusion Implicit Model (DDIM) sampling, achieving near real-time reconstruction with only 10 steps. Crucially, we exploit the stochastic generative nature of our framework to estimate pixel-wise uncertainty, providing a measure of reliability that is often absent in deterministic approaches. Evaluated on the OpenPros USCT benchmark, DiffSOS significantly outperforms state-of-the-art networks, achieving an average Multi-scale Structural Similarity of 0.957. Our approach provides high-fidelity SoS maps with a principled measure of confidence, facilitating safer and faster clinical interpretation.
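
The pixel-wise uncertainty described above follows a generic recipe: draw several stochastic reconstructions for the same measurement and summarize them per pixel. The sketch assumes the sampled SoS maps are given as equally sized nested lists (values in m/s, say); it does not implement the diffusion model, and `pixelwise_mean_std` is a hypothetical helper.

```python
import statistics


def pixelwise_mean_std(samples):
    """Per-pixel mean and standard deviation over stochastic reconstructions.

    samples: list of K maps (K >= 2), each an H x W nested list of floats.
    Returns (mean_map, std_map); a high std marks pixels where the sampled
    reconstructions disagree, i.e. where confidence is lower.
    """
    height, width = len(samples[0]), len(samples[0][0])
    mean_map = [[0.0] * width for _ in range(height)]
    std_map = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [s[i][j] for s in samples]
            mean_map[i][j] = statistics.fmean(values)
            std_map[i][j] = statistics.stdev(values)  # sample standard deviation
    return mean_map, std_map
```

The mean map serves as the point estimate and the std map as a reliability overlay; more samples give a steadier estimate at the cost of more denoising runs.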

2603.00374 2026-03-03 cs.AI cs.MA

Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning

Austin A. Nguyen, Michael P. Wellman

Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We first frame this problem in terms of selecting among candidate equilibria. Since datasets may inform only a small fraction of game dynamics, it is generally infeasible in offline game-solving even to verify that a proposed solution is a true equilibrium. Therefore, we consider the relative probability of low regret (i.e., closeness to equilibrium) across candidates based on the information available. Specifically, we extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty and modifying the RL objective to skew towards solutions more likely to have low regret in the true game. We further propose a novel meta-strategy solver, tailored for the offline setting, to guide strategy exploration in PSRO. Our incorporation of Conservatism principles from Offline reinforcement learning approaches for strategy Exploration gives our approach its name: COffeE-PSRO. Experiments demonstrate COffeE-PSRO's ability to extract lower-regret solutions than state-of-the-art offline approaches and reveal relationships between algorithmic components, empirical game fidelity, and overall performance.
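
Regret, the closeness-to-equilibrium measure used above, can be illustrated on a small two-player normal-form game. For a mixed-strategy profile, each player's regret is the gain from deviating to their best pure strategy; the profile is a Nash equilibrium exactly when total regret is zero. This sketch ignores the offline uncertainty modeling that COffeE-PSRO adds, and the function names are hypothetical.

```python
def expected_payoff(payoff, row_mix, col_mix):
    """Expected payoff sum_ij x_i * y_j * payoff[i][j] under the mixed profile."""
    return sum(x * y * payoff[i][j]
               for i, x in enumerate(row_mix)
               for j, y in enumerate(col_mix))


def profile_regret(row_payoff, col_payoff, row_mix, col_mix):
    """Sum of both players' regrets for the mixed profile (row_mix, col_mix).

    row_payoff[i][j] and col_payoff[i][j] are the payoffs to the row and column
    player when the row player plays i and the column player plays j.
    """
    row_value = expected_payoff(row_payoff, row_mix, col_mix)
    col_value = expected_payoff(col_payoff, row_mix, col_mix)
    # Best pure-strategy deviation for each player against the other's mixture.
    best_row = max(sum(y * row_payoff[i][j] for j, y in enumerate(col_mix))
                   for i in range(len(row_mix)))
    best_col = max(sum(x * col_payoff[i][j] for i, x in enumerate(row_mix))
                   for j in range(len(col_mix)))
    return (best_row - row_value) + (best_col - col_value)
```

In matching pennies, the uniform profile has zero regret, while both players always choosing their first action leaves the column player a regret of 2. In the offline setting, the payoff entries are themselves uncertain, which is why the paper reasons about the probability of low regret rather than its exact value.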

2603.00372 2026-03-03 cs.CV

Unsupervised Semantic Segmentation in Synchrotron Computed Tomography with Self-Correcting Pseudo Labels

Austin Yunker, Peter Kenesei, Hemant Sharma, Jun-Sang Park, Antonino Miceli, Rajkumar Kettimuthu

X-ray computed tomography (CT) is a widely used imaging technique that provides detailed views of the internal structure of an object; synchrotron CT (SR-CT) improves data quality by using higher-energy, monochromatic X-rays. While SR-CT allows for improved resolution, time-resolved experimentation, and reduced imaging artifacts, it also produces significantly larger datasets than conventional CT. Accurate and efficient evaluation of these datasets is a critical component of these workflows, yet it is often done manually, which represents a major bottleneck in the analysis phase. While deep learning has emerged as a powerful tool capable of providing a wide range of purely data-driven solutions, it requires a substantial amount of labeled data for training, and manual annotation of SR-CT datasets is impractical. In this paper, we introduce a novel framework that enables automatic segmentation of large, high-resolution SR-CT datasets by eliminating the need to hand-label images for deep learning training. First, we generate pseudo labels by clustering the voxel values, identifying regions in the volume with similar attenuation coefficients to produce an initial semantic map. We then train a segmentation model on the pseudo labels and use the Unbiased Teacher approach to self-correct them, ensuring accurate final segmentations. On a magnesium crystal SR-CT sample, our approach improves pixel-wise accuracy and mIoU by 13.31% and 15.94%, respectively, over the baseline pseudo labels. Additionally, we extensively evaluate the different components of our workflow, including the segmentation model, loss function, pseudo-labeling strategy, and input type. Finally, we evaluate our approach on two additional samples, highlighting our framework's ability to produce segmentations that are considerably better than the original pseudo labels.
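
The initial pseudo-labeling step (clustering voxel values so that voxels with similar attenuation share a label) can be sketched with plain 1D k-means on intensities. This is an illustrative stand-in: the abstract does not name the clustering algorithm, and `kmeans_1d` with its rank-based initialization is an assumption.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar voxel intensities into k groups; returns (labels, centers).

    Centers start at evenly spaced ranks of the sorted intensities (an assumed,
    deterministic initialization) and are refined with Lloyd iterations.
    """
    ordered = sorted(values)
    centers = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each voxel takes the label of its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean intensity of its voxels.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: assignments will not change any further
        centers = new_centers
    return labels, centers
```

For a volume, `values` would be the flattened voxel intensities, and reshaping `labels` back to the volume shape gives the initial semantic map that the segmentation model then refines.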

2603.00369 2026-03-03 cs.CL

Policy Compliance of User Requests in Natural Language for AI Systems

Pedro Cisneros-Velarde

Consider an organization whose users send requests in natural language to an AI system that fulfills them by carrying out specific tasks. In this paper, we consider the problem of ensuring that such user requests comply with a list of diverse policies set by the organization to guarantee the safe and reliable use of the AI system. We propose what is, to the best of our knowledge, the first benchmark of user requests annotated with diverse compliance outcomes with respect to a list of policies. Our benchmark is related to industrial applications in the technology sector. We then use our benchmark to evaluate the performance of various LLMs on policy compliance assessment under different solution methods. We analyze the differences in performance metrics across the models and solution methods, showcasing the challenging nature of our problem.

2603.00368 2026-03-03 cs.LG cs.CV eess.IV

Deep Learning-Based Meat Freshness Detection with Segmentation and OOD-Aware Classification

Hutama Arif Bramantyo, Mukarram Ali Faridi, Rui Chen, Clarissa Harris, Yin Sun

In this study, we present a meat freshness classification framework from Red-Green-Blue (RGB) images that supports both packaged and unpackaged meat datasets. The system classifies four in-distribution (ID) meat classes and uses an out-of-distribution (OOD)-aware abstention mechanism that flags low-confidence samples as No Result. The pipeline combines U-Net-based segmentation with deep feature classifiers. Segmentation is used as a preprocessing step to isolate the meat region and reduce background, producing more consistent inputs for classification. The segmentation module achieved an Intersection over Union (IoU) of 75% and a Dice coefficient of 82%, producing standardized inputs for the classification stage. For classification, we benchmark five backbones: Residual Network-50 (ResNet-50), Vision Transformer-Base/16 (ViT-B/16), Swin Transformer-Tiny (Swin-T), EfficientNet-B0, and MobileNetV3-Small. We use nested 5x3 cross-validation (CV) for model selection and hyperparameter tuning. On the held-out ID test set, EfficientNet-B0 achieves the highest accuracy (98.10%), followed by ResNet-50 and MobileNetV3-Small (both 97.63%) and Swin-T (97.51%), while ViT-B/16 is lower (94.42%). We additionally evaluate OOD scoring and thresholding using standard OOD metrics and sensitivity analysis over the abstention threshold. Finally, we report on-device latency using TensorFlow Lite (TFLite) on a smartphone, highlighting practical accuracy-latency trade-offs for future deployment.
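
One common way to implement an abstention rule like this is to threshold the maximum softmax probability (MSP). The sketch below assumes that score. The paper evaluates several OOD scoring and thresholding choices, so the score, the 0.7 default threshold, and the placeholder class names here are hypothetical.

```python
import math

# Placeholder names for the four in-distribution classes (not from the paper).
CLASSES = ["class_a", "class_b", "class_c", "class_d"]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, threshold=0.7):
    """Return the predicted class, or "No Result" if the top probability is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "No Result"
    return CLASSES[best]
```

Raising `threshold` trades coverage for reliability: more samples are flagged as No Result, but the accepted predictions are more confident. This is the trade-off that a sensitivity analysis over the abstention threshold measures.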

2603.00364 2026-03-03 cs.CL

Distribution-Aware Companding Quantization of Large Language Models

Athul Radhakrishnan, Siddhant Mohan, Mahima Sachdeva

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads operating on top of a shared model trunk. Treating multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method becomes increasingly useful at larger model sizes and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B-parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3× faster at inference, even with large batch sizes.
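
The multi-token objective described above changes how training targets are laid out: at position t, head k predicts token t + 1 + k. Here is a minimal sketch of that target construction. It is not the authors' training code, and it assumes a padding id of -100 for targets that run past the end of the sequence, which is a common convention rather than something the abstract specifies.

```python
def multi_token_targets(tokens, n, pad_id=-100):
    """Return, for every position t, the list of the next n target tokens.

    Head k (0-indexed) at position t is trained to predict tokens[t + 1 + k];
    targets beyond the end of the sequence are filled with pad_id, which an
    assumed loss function would ignore.
    """
    targets = []
    for t in range(len(tokens)):
        row = []
        for k in range(n):
            idx = t + 1 + k
            row.append(tokens[idx] if idx < len(tokens) else pad_id)
        targets.append(row)
    return targets
```

With n = 1, this reduces to ordinary next-token targets, so the first head plays the role of the standard language-modeling head.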

2603.00363 2026-03-03 cs.LG cs.AI

Quantifying Catastrophic Forgetting in IoT Intrusion Detection Systems

Sourasekhar Banerjee, David Bergqvist, Salman Toor, Christian Rohner, Andreas Johnsson

Comments 6 pages, 4 figures

Distribution shifts in attack patterns within RPL-based IoT networks pose a critical threat to the reliability and security of large-scale connected systems. Intrusion Detection Systems (IDS) trained on static datasets often fail to generalize to unseen threats and suffer from catastrophic forgetting when updated with new attacks. Ensuring continual adaptability of IDS is therefore essential for maintaining robust IoT network defence. In this focused study, we formulate intrusion detection as a domain continual learning problem and propose a method-agnostic IDS framework that can integrate diverse continual learning strategies. We systematically benchmark five representative approaches across multiple domain-ordering sequences using a comprehensive multi-attack dataset comprising 48 domains. Results show that continual learning mitigates catastrophic forgetting while maintaining a balance between plasticity, stability, and efficiency, a crucial aspect for resource-constrained IoT environments. Among the methods, Replay-based approaches achieve the best overall performance, while Synaptic Intelligence (SI) delivers near-zero forgetting with high training efficiency, demonstrating strong potential for stable and sustainable IDS deployment in dynamic IoT networks.
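
Catastrophic forgetting is often quantified with an average-forgetting score: for each earlier domain, take the drop from the best accuracy ever reached on it to the accuracy after training on the final domain. This widely used definition is assumed here for illustration; the abstract does not state the paper's exact metrics, and `average_forgetting` is a hypothetical name.

```python
def average_forgetting(acc):
    """Average forgetting from a lower-triangular accuracy matrix.

    acc[i][j] = accuracy on domain j after training on domains 0..i (j <= i).
    For every domain j except the last, forgetting is the best accuracy seen
    before the final step minus the final accuracy; the result is their mean.
    """
    final = len(acc) - 1
    drops = []
    for j in range(final):
        best_before = max(acc[i][j] for i in range(j, final))
        drops.append(best_before - acc[final][j])
    return sum(drops) / len(drops) if drops else 0.0
```

A value near zero means earlier attack domains are still detected after later updates, the "near-zero forgetting" regime reported for SI. Evaluating it over several domain orderings shows how sensitive a method is to the sequence in which attacks appear.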

2603.00362 2026-03-03 cs.CV

Percept-Aware Surgical Planning for Visual Cortical Prostheses with Vascular Avoidance

Galen Pogoncheff, Alvin Wang, Jacob Granley, Michael Beyeler

Cortical visual prostheses aim to restore sight by electrically stimulating neurons in early visual cortex (V1). With the emergence of high-density and flexible neural interfaces, electrode placement within three-dimensional cortex has become a critical surgical planning problem. Existing strategies emphasize visual field coverage and anatomical heuristics but do not directly optimize predicted perceptual outcomes under safety constraints. We present a percept-aware framework for surgical planning of cortical visual prostheses that formulates electrode placement as a constrained optimization problem in anatomical space. Electrode coordinates are treated as learnable parameters and optimized end-to-end using a differentiable forward model of prosthetic vision. The objective minimizes task-level perceptual error while incorporating vascular avoidance and gray matter feasibility constraints. Evaluated on simulated reading and natural image tasks using realistic folded cortical geometry (FreeSurfer fsaverage), percept-aware optimization consistently improves reconstruction fidelity relative to coverage-based placement strategies. Importantly, vascular safety constraints eliminate margin violations while preserving perceptual performance. The framework further enables co-optimization of multi-electrode thread configurations under fixed insertion budgets. These results demonstrate how differentiable percept models can inform anatomically grounded, safety-aware computer-assisted planning for cortical neural interfaces and provide a foundation for optimizing next-generation visual prostheses.

2603.00355 2026-03-03 cs.LG cs.SD eess.AS

StethoLM: Audio Language Model for Cardiopulmonary Analysis Across Clinical Tasks

Yishan Wang, Tsai-Ning Wang, Mathias Funk, Aaqib Saeed

Comments To be published in TMLR

Listening to heart and lung sounds - auscultation - is one of the first and most fundamental steps in a clinical examination. Despite being fast and non-invasive, it demands years of experience to interpret subtle audio cues. Recent deep learning methods have made progress in automating cardiopulmonary sound analysis, yet most are restricted to simple classification and offer little clinical interpretability or decision support. We present StethoLM, the first audio-language model specialized for cardiopulmonary auscultation, capable of performing instruction-driven clinical tasks across the full spectrum of auscultation analysis. StethoLM integrates audio encoding with a medical language model backbone and is trained on StethoBench, a comprehensive benchmark comprising 77,027 instruction-response pairs synthesized from 16,125 labeled cardiopulmonary recordings spanning seven clinical task categories: binary classification, detection, reporting, reasoning, differential diagnosis, comparison, and location-based analysis. Through multi-stage training that combines supervised fine-tuning and direct preference optimization, StethoLM achieves substantial gains in performance and robustness on out-of-distribution data. Our work establishes a foundation for instruction-following AI systems in clinical auscultation.

2603.00351 2026-03-03 cs.RO cs.AI cs.LG cs.SD

Acoustic Sensing for Universal Jamming Grippers

Lion Weber, Theodor Wienert, Martin Splettstößer, Alexander Koenig, Oliver Brock

Comments Accepted at ICRA 2026, supplementary material at https://rbo.gitlab-pages.tu-berlin.de/papers/acoustic-jamming-icra26/

Journal ref IEEE International Conference on Robotics and Automation (ICRA) 2026

Universal jamming grippers excel at grasping unknown objects due to their compliant bodies. Traditional tactile sensors can compromise this compliance, reducing grasping performance. We present acoustic sensing as a form of morphological sensing, where the gripper's soft body itself becomes the sensor. A speaker and microphone are placed inside the gripper cavity, away from the deformable membrane, fully preserving compliance. Sound propagates through the gripper and object, encoding object properties, which are then reconstructed via machine learning. Our sensor achieves high spatial resolution in sensing object size (2.6 mm error) and orientation (0.6 deg error), remains robust to external noise levels of 80 dBA, and discriminates object materials (up to 100% accuracy) and 16 everyday objects (85.6% accuracy). We validate the sensor in a realistic tactile object sorting task, achieving 53 minutes of uninterrupted grasping and sensing, confirming the preserved grasping performance. Finally, we demonstrate that disentangled acoustic representations can be learned, improving robustness to irrelevant acoustic variations.

2603.00350 2026-03-03 cs.AI

Monotropic Artificial Intelligence: Toward a Cognitive Taxonomy of Domain-Specialized Language Models

Antonio de Sousa Leitão Filho, Allan Kardec Duailibe Barros Filho, Fabrício Saul Lima, Selby Mykael Lima dos Santos, Rejani Bandeira Vieira Sousa

The prevailing paradigm in artificial intelligence research equates progress with scale: larger models trained on broader datasets are presumed to yield superior capabilities. This assumption, while empirically productive for general-purpose applications, obscures a fundamental epistemological tension between breadth and depth of knowledge. We introduce the concept of Monotropic Artificial Intelligence: language models that deliberately sacrifice generality to achieve extraordinary precision within narrowly circumscribed domains. Drawing on the cognitive theory of monotropism developed to understand autistic cognition, we argue that intense specialization represents not a limitation but an alternative cognitive architecture with distinct advantages for safety-critical applications. We formalize the defining characteristics of monotropic models, contrast them with conventional polytropic architectures, and demonstrate their viability through Mini-Enedina, a 37.5-million-parameter model that achieves near-perfect performance on Timoshenko beam analysis while remaining deliberately incompetent outside its domain. Our framework challenges the implicit assumption that artificial general intelligence constitutes the sole legitimate aspiration of AI research, proposing instead a cognitive ecology in which specialized and generalist systems coexist complementarily.

2603.00338 2026-03-03 cs.RO

Layered Safety: Enhancing Autonomous Collision Avoidance via Multistage CBF Safety Filters

Erina Yamaguchi, Ryan M. Bena, Gilbert Bahati, Aaron D. Ames

This paper presents a general end-to-end framework for constructing robust and reliable layered safety filters that can be leveraged to perform dynamic collision avoidance over a broad range of applications using only local perception data. Given a robot-centric point cloud, we begin by constructing an occupancy map which is used to synthesize a Poisson safety function (PSF). The resultant PSF is employed as a control barrier function (CBF) within two distinct safety filtering stages. In the first stage, we propose a predictive safety filter to compute optimal safe trajectories based on nominal potentially-unsafe commands. The resultant short-term plans are constrained to satisfy the CBF condition along a finite prediction horizon. In the second stage, instantaneous velocity commands are further refined by a real-time CBF-based safety filter and tracked by the full-order low-level robot controller. Assuming accurate tracking of velocity commands, we obtain formal guarantees of safety for the full-order system. We validate the optimality and robustness of our multistage architecture, in comparison to traditional single-stage safety filters, via a detailed Pareto analysis. We further demonstrate the effectiveness and generality of our collision avoidance methodology on multiple legged robot platforms across a variety of real-world dynamic scenarios.
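
In its simplest form, a CBF safety filter solves a quadratic program that minimally modifies a nominal command so that the barrier condition grad_h(x) · u >= -alpha * h(x) holds. The sketch below assumes single-integrator dynamics (x_dot = u) and a single constraint, for which the QP has a closed-form solution. It does not model the paper's Poisson safety function, predictive stage, or full-order tracking, and the wall example is hypothetical.

```python
def cbf_filter(u_nom, h, grad_h, alpha=1.0):
    """Minimally modify u_nom so that grad_h . u >= -alpha * h.

    Solves min ||u - u_nom||^2 s.t. grad_h . u >= -alpha * h in closed form
    (one affine constraint, single-integrator dynamics x_dot = u).
    """
    slack = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0:
        return list(u_nom)  # nominal command already satisfies the CBF condition
    norm_sq = sum(g * g for g in grad_h)
    lam = -slack / norm_sq
    # Shift the command along grad_h just enough to reach the constraint boundary.
    return [u + lam * g for u, g in zip(u_nom, grad_h)]
```

For a wall at x = 1 with h(x) = 1 - x, a robot at x = 0.9 commanded to move at (1, 0) is slowed to roughly (0.1, 0), while a command moving away from the wall passes through unchanged. The paper's second stage applies this kind of instantaneous correction to velocity commands, and its first stage enforces the same condition along a prediction horizon.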

2603.00337 2026-03-03 cs.CV

Diffusion-Based Low-Light Image Enhancement with Color and Luminance Priors

Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral

Low-light images often suffer from low contrast, noise, and color distortion, degrading visual quality and impairing downstream vision tasks. We propose a novel conditional diffusion framework for low-light image enhancement that incorporates a Structured Control Embedding Module (SCEM). SCEM decomposes a low-light image into four informative components: illumination, illumination-invariant features, shadow priors, and color-invariant cues. These components serve as control signals that condition a U-Net-based diffusion model trained with a simplified noise-prediction loss. The SCEM-equipped diffusion model thus performs structured enhancement guided by physical priors. In experiments, our model is trained only on the LOLv1 dataset and evaluated without fine-tuning on LOLv2-real, LSRW, DICM, MEF, and LIME. The method achieves state-of-the-art performance in quantitative and perceptual metrics, demonstrating strong generalization across benchmarks. See https://casted.github.io/scem/ for details.

2603.00326 2026-03-03 cs.LG cs.DC cs.PF

Vectorized Adaptive Histograms for Sparse Oblique Forests

Ariel Lubonja, Jungsang Yoon, Haoyin Xu, Yue Wan, Yilin Xu, Richard Stotz, Mathieu Guillame-Bert, Joshua T. Vogelstein, Randal Burns

Classification using sparse oblique random forests provides guarantees on uncertainty and confidence while controlling for specific error types. However, these forests use more data and more compute than other tree ensembles because they create deep trees and need to sort or histogram linear combinations of data at runtime. We provide a method for dynamically switching between histograms and sorting to find the best split. We further optimize histogram construction using vector intrinsics. Evaluated on large datasets, our optimizations speed up training by 1.7-2.5x compared to existing oblique forests and 1.5-2x compared to standard random forests. We also provide a GPU implementation and a hybrid CPU-GPU implementation.
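
The switch between sorting and histogramming can be illustrated on a single projected feature (in an oblique forest, a sparse random linear combination of columns). The sketch finds a Gini-minimizing split exactly by sorting when the node is small, and approximately over equal-width histogram bins otherwise. `SORT_CUTOFF`, the bin count, and the function names are hypothetical, and no vectorization is attempted.

```python
from collections import Counter

SORT_CUTOFF = 256  # hypothetical node size above which histograms are used


def gini(counts, total):
    """Gini impurity from a class-count mapping with the given total."""
    return 1.0 - sum((c / total) ** 2 for c in counts.values()) if total else 0.0


def sweep(groups, totals, n):
    """Scan ordered (threshold, class-counts) groups; return (threshold, impurity).

    Splitting after a group sends it and all earlier groups to the left child.
    """
    left, n_left = Counter(), 0
    best = (None, float("inf"))
    for threshold, counts in groups[:-1]:
        left.update(counts)
        n_left += sum(counts.values())
        right, n_right = totals - left, n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


def best_split(values, labels, n_bins=32):
    """Best split point on one projected feature, by sorting or by histogram."""
    n, totals = len(values), Counter(labels)
    if n <= SORT_CUTOFF:
        # Exact path: sort once; each distinct value is a candidate threshold.
        per_value = {}
        for v, y in zip(values, labels):
            per_value.setdefault(v, Counter())[y] += 1
        groups = sorted(per_value.items())
    else:
        # Histogram path: one binning pass; candidates are the bin upper edges.
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0
        bins = [Counter() for _ in range(n_bins)]
        for v, y in zip(values, labels):
            bins[min(int((v - lo) / width), n_bins - 1)][y] += 1
        groups = [(lo + width * (b + 1), c) for b, c in enumerate(bins)]
    return sweep(groups, totals, n)
```

Sorting costs O(n log n) but evaluates every distinct threshold. The histogram costs O(n + bins) and only considers bin edges, so it pays off on large nodes where a slightly coarser split is acceptable.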

2603.00325 2026-03-03 cs.RO cs.SY eess.SY

Geometric Look-Angle Shaping Strategy for Enclosed Inspection

Amit Shivam, Manuel C. R. M. Fernandes, Sergio Vinha, Fernando A. C. C. Fontes

Comments Preprint submitted to ICUAS 2026

This paper introduces GLASS, a Geometric Look-Angle Shaping Strategy for inspecting enclosed regions with unmanned aerial vehicles. The vehicle's guidance command is constructed through a bounded, geometry-consistent shaping of the look angle relative to a desired standoff path. By embedding a smooth, hyperbolic-tangent-type shaping function within a polar geometric framework, GLASS ensures global existence of the guidance dynamics and avoids the far-field limitations inherent to conventional formulations. Lyapunov stability analysis establishes asymptotic convergence to a prescribed inspection standoff under explicit curvature feasibility conditions, along with analytical settling-time characteristics. The proposed strategy incorporates maximum turn-rate constraints without inducing singularities anywhere in the workspace. High-fidelity six-degree-of-freedom quadrotor simulations demonstrate the effectiveness of GLASS in representative enclosed inspection scenarios, highlighting a practically viable guidance framework for autonomous enclosed inspection missions.

2603.00324 2026-03-03 cs.CV

Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees

Arya Fayyazi, Haleh Akrami

Journal ref CVPR 2026

We present Proof-of-Perception (PoP), a tool-using framework that casts multimodal reasoning as an executable graph with explicit reliability guarantees. Each perception or logic node outputs a conformal set, yielding calibrated, stepwise uncertainty; a lightweight controller uses these certificates to allocate compute under a budget, expanding with extra tool calls only when needed and stopping early otherwise. This grounds answers in verifiable evidence, reduces error compounding and hallucinations, and enables principled accuracy-compute trade-offs. Across document, chart, and multi-image QA benchmarks, PoP improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently.
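
Each PoP node outputs a conformal set. A generic split-conformal recipe for a classification node looks like the sketch below. It assumes calibration softmax scores and the common nonconformity score 1 - p(true label), which is not necessarily the paper's choice. `conformal_threshold` and `prediction_set` are hypothetical names.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of nonconformity scores s = 1 - p(true label).

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)), so prediction sets
    contain the true label with probability >= 1 - alpha under exchangeability.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is included
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]
```

A controller in the spirit of PoP could treat set size as the signal: a singleton set lets it stop early, while a large set suggests spending budget on another tool call.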

2603.00319 2026-03-03 cs.RO cs.SY eess.SY

Modeling PWM-Time-SOC Interaction in a Simulated Robot

Vidyut Pradeep, Shirantha Welikala

详情
英文摘要

Accurate prediction of battery state of charge (SOC) is needed for autonomous robots to plan movements without using up all available power. This work develops a physics- and data-informed model, derived from simulation, that predicts SOC depletion as a function of time and PWM duty cycle for a simulated 4-wheel Arduino robot. A forward-motion simulation incorporating motor electrical characteristics (resistance, inductance, back-EMF, torque constant) and mechanical dynamics (mass, drag, rolling resistance, wheel radius) was used to generate SOC time-series data across PWM duty cycles from 1% to 100%. Sparse Identification of Nonlinear Dynamics (SINDy), combined with least-squares regression, was applied to construct a unified nonlinear model that captures SOC(t, p). The framework enables energy-aware planning for similar robots and can be extended to incorporate arbitrary initial SOC levels and environment-dependent parameters for real-world deployment.

2603.00307 2026-03-03 cs.CL

From Prerequisites to Predictions: Validating a Geometric Hallucination Taxonomy Through Controlled Induction

Matic Korun

Comments 9 pages, 2 figures, appendices (reproducibility, sample generation, additional figures)

We test whether a geometric hallucination taxonomy, which classifies failures as center-drift (Type 1), wrong-well convergence (Type 2), or coverage gaps (Type 3), can distinguish hallucination types through controlled induction in GPT-2. Using a two-level statistical design with prompts (N = 15 per group) as the unit of inference, we run each experiment 20 times with different generation seeds to quantify result stability. In static embeddings, Type 3 norm separation is robust (significant in 18/20 runs, Holm-corrected in 14/20, median r = +0.61). In contextual hidden states, the Type 3 norm effect direction is stable (19/20 runs) but underpowered at N = 15 (significant in 4/20, median r = -0.28). Types 1 and 2 do not separate in either space (≤ 3/20 runs). Token-level tests inflate significance by 4-16× through pseudoreplication, a finding replicated across all 20 runs. The results establish coverage-gap hallucinations as the most geometrically distinctive failure mode, carried by magnitude rather than direction, and confirm the Type 1/2 non-separation as genuine at 124M parameters.
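
The Holm correction mentioned above is a step-down procedure: sort the m p-values, compare the k-th smallest (0-indexed) with alpha / (m - k), and stop at the first failure. A minimal sketch follows; the example p-values are made up, not the paper's.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns booleans in the same order as p_values: True where the null
    hypothesis is rejected while controlling the family-wise error rate at alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject
```

For p-values (0.01, 0.04, 0.03, 0.005), only 0.005 and 0.01 survive, because 0.03 fails against 0.05 / 2 = 0.025. This is why a result can be significant in 18/20 runs but Holm-corrected in only 14/20.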

2603.00306 2026-03-03 cs.LG

When does Chain-of-Thought Help: A Markovian Perspective

Zihan Wang, Yijun Dong, Qi Lei

Chain-of-Thought (CoT) prompting is a widely used inference-time technique for improving reasoning, yet its gains are uneven across tasks. We analyze when and why CoT helps by modeling the step-wise reasoning trajectory as a Markov chain. Each intermediate step is a state and the dependence between steps is captured by a transition kernel. Our theory identifies transition alignment, whether instances share a common step-wise transition kernel, as the key determinant of CoT's effectiveness. When transitions are identical across steps, CoT reduces inference-time sample complexity: fewer context sample trajectories suffice to recover the final decision. In contrast, when transitions differ across steps, these gains can vanish. We further quantify how noise in intermediate steps modulates CoT's benefit. Beyond theory, we design synthetic benchmarks that isolate these factors to complement prior results on real-world tasks and to empirically validate our predictions.

2603.00302 2026-03-03 cs.LG cs.AI cs.LO

Polynomial Surrogate Training for Differentiable Ternary Logic Gate Networks

Sai Sandeep Damera, Ryan Matheu, Aniruddh G. Puranic, John S. Baras

Comments 28 pages, 13 figures. Submitted to the 3rd International Conference on Neuro-Symbolic Systems (NeuS) 2026

Abstract

Differentiable logic gate networks (DLGNs) learn compact, interpretable Boolean circuits via gradient-based training, but all existing variants are restricted to the 16 two-input binary gates. It is desirable to extend DLGNs to ternary Kleene $K_3$ logic and to train differentiable ternary logic gate networks (DTLGNs), in which the UNKNOWN state enables principled abstention under uncertainty. However, the set of candidate gates per neuron then explodes to $3^9 = 19{,}683$, making the established softmax-over-gates training approach intractable. We introduce Polynomial Surrogate Training (PST), which represents each ternary neuron as a degree-$(2,2)$ polynomial with 9 learnable coefficients (a $2{,}187\times$ parameter reduction), and we prove that the gap between the trained network and its discretized logic circuit is bounded by a data-independent commitment loss that vanishes at convergence. Scaling experiments from 48K to 512K neurons on CIFAR-10 demonstrate that this hardening gap contracts with overparameterization. Ternary networks train $2$--$3\times$ faster than binary DLGNs and discover true ternary gates that are functionally diverse. On synthetic and tabular tasks, we find that the UNKNOWN output acts as a Bayes-optimal uncertainty proxy, enabling selective prediction in which ternary circuits surpass binary accuracy once low-confidence predictions are filtered out. More broadly, PST establishes a general polynomial-surrogate methodology whose parameterization cost grows only quadratically with logic valence, opening the door to many-valued differentiable logic.
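The counting in the abstract can be checked with a small sketch: a two-input ternary gate is one of $3^9 = 19{,}683$ functions on a $3\times 3$ input grid, and a polynomial built from the monomials $x^i y^j$ with $i, j \in \{0,1,2\}$ has exactly 9 coefficients, enough to interpolate any of them. The sketch solves for the coefficients directly rather than learning them by gradient descent, assumes the encoding FALSE $=-1$, UNKNOWN $=0$, TRUE $=+1$, and does not claim to match the authors' basis or their commitment loss.

```python
import itertools

import numpy as np

# Assumed encoding of Kleene K3 truth values: FALSE = -1, UNKNOWN = 0, TRUE = +1.
VALUES = (-1.0, 0.0, 1.0)
GRID = list(itertools.product(VALUES, VALUES))  # the 9 possible input pairs


def monomials(x: float, y: float) -> np.ndarray:
    """The 9 monomials x^i * y^j with i, j in {0, 1, 2}: a degree-(2, 2) basis."""
    return np.array([x**i * y**j for i in range(3) for j in range(3)])


# 9 x 9 design matrix; it is invertible because it equals the Kronecker product
# of two 3-node Vandermonde matrices with distinct nodes.
DESIGN = np.stack([monomials(x, y) for x, y in GRID])


def fit_coefficients(gate) -> np.ndarray:
    """Solve for the 9 coefficients that reproduce a two-input ternary gate exactly."""
    targets = np.array([gate(x, y) for x, y in GRID])
    return np.linalg.solve(DESIGN, targets)


def neuron(coeffs: np.ndarray, x: float, y: float) -> float:
    """Evaluate the polynomial surrogate at (possibly non-ternary) inputs."""
    return float(monomials(x, y) @ coeffs)


def harden(value: float) -> float:
    """Snap a surrogate output to the nearest ternary value."""
    return float(np.clip(np.rint(value), -1.0, 1.0))


kleene_and = fit_coefficients(min)  # K3 AND is min under this encoding
kleene_or = fit_coefficients(max)   # K3 OR is max under this encoding

n_gates = 3 ** 9          # 19,683 possible two-input ternary gates
reduction = n_gates // 9  # 2,187x fewer parameters than one logit per candidate gate
```

A softmax over gates needs one parameter per candidate gate, so its cost grows with the number of gates; the polynomial needs one coefficient per monomial, which is why the reduction factor is $19{,}683 / 9$.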

2603.00296 2026-03-03 cs.CL cs.AI cs.LG

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning

Xintong Li, Sha Li, Rongmei Lin, Hongye Jin, Linwei Li, Hejie Cui, Sarah Zhang, Chia-Yuan Chang, Kewei Cheng, Besnik Fetahu, Priyanka Nigam, Jingbo Shang, Bing Yin

Comments Preprint

Abstract

Large reasoning models improve with more test-time computation, but often overthink, producing unnecessarily long chains-of-thought that raise cost without improving accuracy. Prior reinforcement learning approaches typically rely on a single outcome reward with trajectory-level length penalties, which cannot distinguish essential from redundant reasoning steps and therefore yield blunt compression. Although recent work incorporates step-level signals, such as offline pruning, supervised data construction, or verifier-based intermediate rewards, reasoning length is rarely treated as an explicit step-level optimization objective during RL. We propose Step-wise Adaptive Penalization (SWAP), a fine-grained framework that allocates length reduction across steps based on intrinsic contribution. We estimate step importance from the model's on-policy log-probability improvement toward the correct answer, then treat excess length as a penalty mass redistributed to penalize low-importance steps more heavily while preserving high-importance reasoning. We optimize with a unified outcome-process advantage within group-relative policy optimization. Extensive experiments demonstrate that SWAP reduces reasoning length by 64.3% on average while improving accuracy by 5.7% relative to the base model.
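A rough sketch of the pipeline described above, under loudly hypothetical choices: step importance is taken as the increase in the log-probability of the correct answer after each step, the excess over a length budget is split across steps with weights proportional to $\ell_k e^{-I_k/T}$ (step length times a decaying function of importance), and advantages are standardized within a sampled group as in GRPO. The abstract does not give SWAP's actual weighting or how the penalty enters the unified advantage; all function names, the budget, and the temperature are assumptions.

```python
import math


def step_importance(logp_answer: list[float]) -> list[float]:
    """Importance of step k = gain in log p(correct answer) after appending step k.

    Hypothetical input: logp_answer[k] = log p(answer | question, steps 1..k),
    with logp_answer[0] conditioned on the question alone.
    """
    return [after - before for before, after in zip(logp_answer, logp_answer[1:])]


def allocate_penalty(step_lengths: list[int], importance: list[float],
                     budget: int, temperature: float = 1.0) -> list[float]:
    """Spread the excess length over steps, weighting low-importance steps more.

    Hypothetical weighting: each step's share is proportional to
    length * exp(-importance / temperature).
    """
    excess = max(0, sum(step_lengths) - budget)
    if excess == 0:
        return [0.0] * len(step_lengths)
    raw = [n * math.exp(-imp / temperature) for n, imp in zip(step_lengths, importance)]
    total = sum(raw)
    return [excess * w / total for w in raw]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward standardized within its group of samples."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


imp = step_importance([-3.0, -1.0, -0.9])             # step 1 helps a lot, step 2 barely
penalty = allocate_penalty([50, 50], imp, budget=60)  # 40 excess tokens to distribute
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Both steps here have the same length, so the difference in penalty comes entirely from importance: the low-importance second step absorbs most of the 40-token excess, while a trajectory-level penalty would charge both steps equally.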

2603.00290 2026-03-03 cs.LG

Scalable Gaussian process modeling of parametrized spatio-temporal fields

Srinath Dama, Prasanth B. Nair

Abstract

We introduce a scalable Gaussian process (GP) framework with deep product kernels for data-driven learning of parametrized spatio-temporal fields over fixed or parameter-dependent domains. The proposed framework learns a continuous representation, enabling predictions at arbitrary spatio-temporal coordinates, independent of the training data resolution. We leverage Kronecker matrix algebra to formulate a computationally efficient training procedure with complexity that scales nearly linearly with the total number of spatio-temporal grid points. A key feature of our approach is the efficient computation of the posterior variance at essentially the same computational cost as the posterior mean (exactly for Cartesian grids and via rigorous bounds for unstructured grids), thereby enabling scalable uncertainty quantification. Numerical studies on a range of benchmark problems demonstrate that the proposed method achieves accuracy competitive with operator learning methods such as Fourier neural operators and deep operator networks. On the one-dimensional unsteady Burgers' equation, our method surpasses the accuracy of projection-based reduced-order models. These results establish the proposed framework as an effective tool for data-driven surrogate modeling, particularly when uncertainty estimates are required for downstream tasks.
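The Kronecker trick mentioned above can be sketched for its simplest case: a product kernel over space and time evaluated on a Cartesian grid, with homoscedastic Gaussian noise. Eigendecomposing each small factor lets one solve against the full $n_s n_t \times n_s n_t$ covariance using only $n_s \times n_s$ and $n_t \times n_t$ operations. This is the standard Kronecker GP identity, not the authors' deep-product-kernel training procedure or their bounds for unstructured grids; the RBF factors, length scales, and grid sizes are assumptions.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, lengthscale: float) -> np.ndarray:
    """Squared-exponential kernel matrix on a 1-D coordinate vector."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def kron_solve(K1: np.ndarray, K2: np.ndarray, Y: np.ndarray, noise: float) -> np.ndarray:
    """Solve (K1 kron K2 + noise * I) vec(A) = vec(Y) without forming the big matrix.

    Y has shape (n1, n2); vec() is row-major flattening, matching np.kron(K1, K2).
    Cost is two small eigendecompositions plus matrix products, instead of an
    O((n1 * n2)^3) dense solve.
    """
    e1, Q1 = np.linalg.eigh(K1)
    e2, Q2 = np.linalg.eigh(K2)
    Z = Q1.T @ Y @ Q2                   # rotate data into the joint eigenbasis
    Z = Z / (np.outer(e1, e2) + noise)  # eigenvalues of K1 kron K2 are e1[i] * e2[j]
    return Q1 @ Z @ Q2.T                # rotate back


rng = np.random.default_rng(0)
x_space = np.linspace(0.0, 1.0, 30)  # spatial grid
x_time = np.linspace(0.0, 2.0, 20)   # temporal grid
K_s, K_t = rbf_kernel(x_space, 0.2), rbf_kernel(x_time, 0.5)
Y = rng.standard_normal((30, 20))    # field values on the 30 x 20 Cartesian grid

alpha = kron_solve(K_s, K_t, Y, noise=0.1)
dense = np.linalg.solve(np.kron(K_s, K_t) + 0.1 * np.eye(600), Y.ravel())
print(np.allclose(alpha.ravel(), dense))  # expected: True
```

The dense solve is included only as a check; for realistic grids it is exactly the computation the factorized version avoids.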

2603.00289 2026-03-03 cs.CV

Seeking Necessary and Sufficient Information from Multimodal Medical Data

Boyu Chen, Weiye Bao, Junjie Liu, Michael Shen, Bo Peng, Paul Taylor, Zhu Li, Mengyue Yang

Comments 11 pages, 1 figure. Submitted to MICCAI 2026

Abstract

Learning multimodal representations from medical images and other data sources can provide richer information for decision-making. While various multimodal models have been developed for this purpose, they overlook learning features that are both necessary (must be present for the outcome to occur) and sufficient (enough to determine the outcome). We argue that learning such features is crucial: they can improve model performance by capturing essential predictive information, and they can enhance robustness to missing modalities because each modality then provides adequate predictive signal on its own. Such features can be learned by using the Probability of Necessity and Sufficiency (PNS) as a learning objective, an approach that has proven effective in unimodal settings. However, extending PNS to multimodal scenarios remains underexplored and is non-trivial because key conditions for PNS estimation are violated. We address this by decomposing multimodal representations into modality-invariant and modality-specific components and then deriving tractable PNS objectives for each. Experiments on synthetic and real-world medical datasets demonstrate our method's effectiveness. Code will be available on GitHub.
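For readers unfamiliar with PNS, the classical bounds of Tian and Pearl show what quantity a PNS objective targets: with $P(y_x)$ the outcome probability when the feature is set present and $P(y_{x'})$ when it is set absent, $\max\{0, P(y_x) - P(y_{x'})\} \le \mathrm{PNS} \le \min\{P(y_x), 1 - P(y_{x'})\}$, and under monotonicity the lower bound is attained. The sketch takes these interventional probabilities as given; how the paper estimates them from modality-invariant and modality-specific representations is not shown, and the function name is hypothetical.

```python
def pns_bounds(p_y_do_x: float, p_y_do_not_x: float) -> tuple[float, float]:
    """Tian-Pearl bounds on the probability of necessity and sufficiency.

    p_y_do_x     = P(y | do(x)):  outcome probability when the feature is present
    p_y_do_not_x = P(y | do(x')): outcome probability when the feature is absent
    """
    lower = max(0.0, p_y_do_x - p_y_do_not_x)      # sufficiency minus spurious success
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)      # cannot exceed either marginal
    return lower, upper


# A feature that usually yields the outcome and rarely co-occurs with it when absent.
low, high = pns_bounds(0.9, 0.2)  # approximately (0.7, 0.8)
```

A feature that is sufficient but not necessary (high $P(y_{x'})$) gets a low PNS, which is why a PNS objective favors features carrying both properties.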

2603.00285 2026-03-03 cs.AI

TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?

Xiaochuang Yuan, Hui Xu, Silvia Xu, Cui Zou, Jing Xiong

Comments Equal Contribution: Xiaochuang Yuan and Hui Xu contributed equally to this work. All correspondence should be directed to yxc20098@gmail.com. Submitted to the Agents in the Wild Workshop, ICLR 2026

Abstract

Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific tasks. We introduce TraderBench, a benchmark that addresses both issues. It combines expert-verified static tasks (knowledge retrieval, analytical reasoning) with adversarial trading simulations scored purely on realized performance (Sharpe ratio, returns, and drawdown), eliminating judge variance entirely. The framework features two novel tracks: crypto trading under four progressive market-manipulation transforms, and options derivatives, scored on P&L accuracy, Greeks, and risk management. Trading scenarios can be refreshed with new market data to prevent benchmark contamination. Evaluating 13 models (from 8B open-source models to frontier models) on ~50 tasks, we find that (1) 8 of the 13 models score ~33 on crypto with <1-point variation across adversarial conditions, exposing fixed, non-adaptive strategies; and (2) extended thinking helps retrieval (+26 points) but has essentially no impact on trading (+0.3 on crypto, -0.1 on options). These findings reveal that current agents lack genuine market adaptation, underscoring the need for performance-grounded evaluation in finance.
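The three realized-performance scores named above can be computed directly from a strategy's per-period returns. The conventions below (simple returns, sample standard deviation, annualization by $\sqrt{252}$ for daily periods, zero risk-free rate, drawdown measured from the running equity peak) are common defaults but are assumptions; the abstract does not specify how TraderBench normalizes or combines them.

```python
import math


def sharpe_ratio(returns: list[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio from per-period simple returns (sample std)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(var)
    return 0.0 if std == 0.0 else mean / std * math.sqrt(periods_per_year)


def total_return(returns: list[float]) -> float:
    """Compounded return over the whole episode."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough decline of the equity curve, as a fraction of the peak."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


episode = [0.10, -0.50, 0.20]  # hypothetical three-period episode
print(total_return(episode), max_drawdown(episode))
```

Because these scores depend only on the realized return path, two runs of the same agent on the same scenario receive the same score, which is the property that removes judge variance.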

2603.00273 2026-03-03 cs.CV eess.SP

Ozone Cues Mitigate Reflected Downwelling Radiance in LWIR Absorption-Based Ranging

Unay Dorken Gallastegi, Wentao Shangguan, Vaibhav Choudhary, Akshay Agarwal, Hoover Rueda-Chacón, Martin J. Stevens, Vivek K Goyal

Comments 15 pages, 10 figures

Abstract

Passive long-wave infrared (LWIR) absorption-based ranging relies on atmospheric absorption to estimate distances to objects from their emitted thermal radiation. First demonstrated decades ago for objects much hotter than the air and recently extended to scenes with low temperature variations, this ranging has depended on reflected radiance being negligible. Reflected downwelling radiance is especially problematic, sometimes causing large inaccuracies. In two new ranging methods, we use characteristic features of ozone absorption to estimate the contribution of reflected downwelling radiance. The quadspectral method gives a simple closed-form range estimate from four narrowband measurements, two at a water vapor absorption line and two at an ozone absorption line. The hyperspectral method uses a broader spectral range to improve accuracy while also providing estimates of temperature, emissivity profiles, and downwelling contributions from a range of zenith angles. Experimental results demonstrate improved ranging accuracy, in one case reducing the error from over 100 m when reflected light is not modeled to 6.8 m with the quadspectral method and 1.2 m with the hyperspectral method.
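The basic principle of absorption-based ranging, and why an unmodeled reflected term hurts it, can be sketched with a single absorption line and a homogeneous atmosphere: the path transmits a fraction $\tau = e^{-\kappa d}$ of the object's radiance and adds $(1-\tau)$ times the air's own radiance, so $d$ follows from $\tau$. This simplified Beer-Lambert model assumes the object and air radiances are known and ignores reflection; it is not the quadspectral or hyperspectral method, which use ozone features to estimate the reflected downwelling contribution. All numbers are hypothetical.

```python
import math


def path_radiance(obj_radiance: float, air_radiance: float,
                  absorption: float, distance: float) -> float:
    """Radiance at the sensor after a homogeneous absorbing path (Beer-Lambert).

    Transmittance tau = exp(-absorption * distance); the path attenuates the
    object's radiance and adds (1 - tau) times the air's own radiance.
    """
    tau = math.exp(-absorption * distance)
    return tau * obj_radiance + (1.0 - tau) * air_radiance


def range_from_radiance(measured: float, obj_radiance: float,
                        air_radiance: float, absorption: float) -> float:
    """Invert path_radiance for distance, assuming no reflected radiance."""
    tau = (measured - air_radiance) / (obj_radiance - air_radiance)
    if not 0.0 < tau <= 1.0:
        raise ValueError("measurement inconsistent with the single-path model")
    return -math.log(tau) / absorption


# Hypothetical numbers: arbitrary radiance units, absorption coefficient in 1/m.
true_d = 350.0
clean = path_radiance(10.0, 4.0, 0.002, true_d)
d_clean = range_from_radiance(clean, 10.0, 4.0, 0.002)
# An unmodeled extra radiance term (standing in for reflected downwelling)
# makes the object look less attenuated, biasing the range estimate short.
d_biased = range_from_radiance(clean + 0.5, 10.0, 4.0, 0.002)
```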

2603.00267 2026-03-03 cs.AI cs.IR cs.SI

Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking

Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris, Preslav Nakov, Zhuohan Xie

Abstract

Misinformation spreading over the Internet poses a significant threat to both societies and individuals, necessitating robust and scalable fact-checking that relies on retrieving accurate and trustworthy evidence. Previous methods rely on semantic and social-contextual patterns learned from training data, which limits their generalization to new data distributions. Recently, Retrieval-Augmented Generation (RAG) methods have been proposed that exploit the reasoning capability of LLMs over retrieved grounding evidence documents. However, these methods largely rely on textual similarity for evidence retrieval and struggle to retrieve evidence that captures multi-hop semantic relations within rich document contents. As a result, they overlook subtle factual correlations between the evidence and the claims being fact-checked, leading to inaccurate veracity predictions. To address these issues, we propose WKGFC, which uses an authorized open knowledge graph as a core source of evidence. LLM-enabled retrieval assesses the claims and retrieves the most relevant knowledge subgraphs, forming structured evidence for fact verification. To augment the knowledge-graph evidence, we also retrieve web content to fill in missing information. The above process is implemented as an automatic Markov decision process (MDP): a reasoning LLM agent decides which actions to take based on the current evidence and the claims. To adapt the MDP to fact-checking, we use prompt optimization to tune the agentic LLM.

2603.00266 2026-03-03 cs.CV

Adversarial Patch Generation for Visual-Infrared Dense Prediction Tasks via Joint Position-Color Optimization

He Li, Wenyue He, Weihang Kong, Xingchen Zhang

Comments 12 pages, 8 figures

Abstract

Multimodal adversarial attacks for dense prediction remain largely underexplored. In particular, visual-infrared (VI) perception systems introduce unique challenges due to heterogeneous spectral characteristics and modality-specific intensity distributions. Existing adversarial patch methods are primarily designed for single-modal inputs and fail to account for cross-spectral inconsistencies, leading to reduced attack effectiveness and poor stealthiness when applied to VI dense prediction models. To address these challenges, we propose a joint position-color optimization framework (AP-PCO) for generating adversarial patches in visual-infrared settings. The proposed method optimizes patch placement and color composition simultaneously using a fitness function derived from model outputs, enabling a single patch to perturb both the visible and infrared modalities. To further bridge spectral discrepancies, we introduce a cross-modal color adaptation strategy that constrains patch appearance according to infrared grayscale characteristics while maintaining strong perturbations in the visible domain, thereby reducing cross-spectral saliency. The optimization procedure requires no internal model information, supporting flexible black-box attacks. Extensive experiments on visual-infrared dense prediction tasks demonstrate that AP-PCO achieves consistently strong attack performance across multiple architectures, providing a practical benchmark for robustness evaluation of VI perception systems.
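The black-box joint search described above can be sketched with plain random search over patch position and color, standing in for whatever optimizer the authors use (the abstract does not name it). The search only queries a fitness score, so no gradients or model internals are needed. The fitness function, the use of BT.601 luma as a proxy for the patch's infrared gray level, and the tolerance-based color constraint are all hypothetical.

```python
import random


def luma(rgb: tuple[int, int, int]) -> float:
    """ITU-R BT.601 luma, used here as a stand-in for the patch's infrared gray level."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def random_search_patch(fitness, img_w: int, img_h: int, patch: int,
                        ir_gray: float, tol: float, iters: int = 500, seed: int = 0):
    """Black-box joint search over patch position (x, y) and RGB color.

    `fitness(x, y, color)` only needs model outputs (higher = stronger attack).
    Colors whose luma deviates from `ir_gray` by more than `tol` are rejected,
    a hypothetical version of the cross-modal color constraint.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        x = rng.randrange(img_w - patch + 1)  # top-left corner keeps the patch in bounds
        y = rng.randrange(img_h - patch + 1)
        color = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
        if abs(luma(color) - ir_gray) > tol:
            continue  # would look too salient relative to the infrared gray level
        score = fitness(x, y, color)
        if score > best_score:
            best, best_score = (x, y, color), score
    return best, best_score


def toy_fitness(x: int, y: int, color: tuple[int, int, int]) -> float:
    """Toy stand-in for an attack loss computed from model predictions."""
    return -(abs(x - 10) + abs(y - 5))


best, score = random_search_patch(toy_fitness, img_w=32, img_h=32, patch=8,
                                  ir_gray=128.0, tol=40.0)
```

Searching position and color in the same loop, rather than fixing one and then tuning the other, is what makes the optimization joint; any black-box optimizer (evolutionary, Bayesian) could replace the random sampler.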