arXivDaily: arXiv daily academic digest, updated Monday through Friday
2512.15557 2026-04-03 cs.RO

OMCL: Open-vocabulary Monte Carlo Localization

Evgenii Kruzhkov, Raphael Memmesheimer, Sven Behnke

Comments Accepted to IEEE RA-L

Abstract

Robust robot localization is an important prerequisite for navigation, but it becomes challenging when the map and robot measurements are obtained from different sensors. Prior methods are often tailored to specific environments, relying on closed-set semantics or fine-tuned features. In this work, we extend Monte Carlo Localization with vision-language features, allowing OMCL to robustly compute the likelihood of visual observations given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. These open-vocabulary features enable us to associate observations and map elements from different modalities, and to natively initialize global localization through natural language descriptions of nearby objects. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.
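
As a rough illustration of the measurement update described in this abstract, the sketch below reweights particles by the similarity between an observed feature and the feature the map predicts at each particle's pose. The exponential likelihood, the `temperature` value, the `expected_feat_fn` callback, and the toy map are all assumptions made for the example; the abstract does not specify OMCL's actual likelihood model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def reweight(particles, observed_feat, expected_feat_fn, temperature=0.1):
    """One Monte Carlo Localization measurement update.

    particles: list of (pose, weight) pairs.
    expected_feat_fn: hypothetical callback returning the map feature
        that would be observed from a given pose.
    """
    scored = []
    for pose, w in particles:
        sim = cosine(observed_feat, expected_feat_fn(pose))
        # Assumed likelihood: higher similarity means a more plausible pose.
        scored.append((pose, w * math.exp(sim / temperature)))
    total = sum(w for _, w in scored)
    return [(pose, w / total) for pose, w in scored]

# Toy map: a "pose" is an index into a table of made-up features.
toy_map = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}
particles = [(p, 1.0 / 3) for p in toy_map]
posterior = reweight(particles, [0.9, 0.1], toy_map.get)
best_pose = max(posterior, key=lambda pw: pw[1])[0]
```

In a full filter this step would alternate with a motion update and resampling, which are omitted here.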

2512.13855 2026-04-03 cs.CV cs.AI

Improvise, Adapt, Overcome -- Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging

Ujjwal Mishra, Vinita Shukla, Praful Hambarde, Amit Shukla

Comments Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)

Abstract

Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches. Existing Parameter-Efficient Fine-Tuning (PEFT) methods apply uniform adapter dimensions across all transformer layers, leading to suboptimal parameter allocation and reduced adaptation efficiency. We introduce Telescopic Adapters, a novel PEFT framework that employs depth-aware scaling to progressively increase adapter capacity from shallow to deep transformer layers. Our method integrates lightweight bottleneck modules within CLIPSeg's vision and text encoders, with adapter dimensions dynamically scaled based on layer depth and semantic relevance. Using only 613k trainable parameters (244x fewer than end-to-end fine-tuning), Telescopic Adapters achieve superior performance across five diverse medical datasets spanning polyp segmentation, skin lesion detection, and breast ultrasound imaging. Comprehensive ablation studies demonstrate that deeper layers require substantially more adaptation capacity than shallow layers, validating our telescopic scaling hypothesis. Our approach establishes a new paradigm for efficient medical VLSM fine-tuning, enabling deployment in resource-constrained clinical environments while maintaining competitive segmentation accuracy. Our source code is publicly available at https://github.com/Ujjwal238/Telescopic_adapters
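
To make the idea of depth-aware adapter sizing concrete, here is a minimal sketch that assigns each layer a bottleneck width growing with depth and compares the parameter count with a uniform allocation. The linear schedule, the `min_dim`/`max_dim` values, and the hidden size of 768 are assumptions chosen for illustration; the abstract says only that dimensions are scaled by layer depth and semantic relevance, not how.

```python
def telescopic_dims(num_layers, min_dim=4, max_dim=32):
    """Return one bottleneck width per layer, growing linearly with depth."""
    if num_layers == 1:
        return [max_dim]
    step = (max_dim - min_dim) / (num_layers - 1)
    return [round(min_dim + i * step) for i in range(num_layers)]

def adapter_params(hidden, dims):
    """Parameters of a down-projection and an up-projection (with biases) per layer."""
    return sum(hidden * d + d + d * hidden + hidden for d in dims)

dims = telescopic_dims(num_layers=12)
uniform = [max(dims)] * 12  # same width everywhere, for comparison

telescopic_cost = adapter_params(768, dims)
uniform_cost = adapter_params(768, uniform)
```

Under these assumed settings the telescopic schedule spends fewer parameters than the uniform one while keeping the widest adapters in the deepest layers.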

2512.10822 2026-04-03 cs.AI cs.RO

V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions

Mumuksh Tayal, Manan Tayal, Aditya Singh, Shishir Kolathaya, Ravi Prakash

Comments 28 pages, 9 figures, 11 tables. Paper accepted at TMLR

Abstract

Ensuring safety in autonomous systems requires controllers that aim to satisfy state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they struggle to ensure strict state-wise safety. Conversely, Control Barrier Functions (CBFs) offer a principled mechanism to enforce forward invariance, but often rely on expert-designed barrier functions or knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
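
The expectile-based objective mentioned above builds on generic expectile regression, which can be sketched as follows. This shows only the standard asymmetric squared loss and a simple solver for a sample expectile; which residual V-OCBF feeds into it and which value of `tau` it uses are not stated in the abstract, so the defaults here are assumptions.

```python
def expectile_loss(residuals, tau=0.9):
    """Mean asymmetric squared loss: weight tau on positive residuals, 1 - tau otherwise."""
    total = 0.0
    for u in residuals:
        weight = tau if u > 0 else 1.0 - tau
        total += weight * u * u
    return total / len(residuals)

def expectile(values, tau=0.9, iters=200):
    """Find the tau-expectile of a sample by bisection on the loss gradient."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Sign of the loss derivative with respect to the estimate `mid`.
        grad = sum((tau if v > mid else 1.0 - tau) * (mid - v) for v in values)
        if grad < 0:
            lo = mid  # estimate too small
        else:
            hi = mid  # estimate too large (or optimal)
    return (lo + hi) / 2

sample = [0.0, 1.0, 2.0, 3.0, 10.0]
mean_like = expectile(sample, tau=0.5)   # tau = 0.5 recovers the mean
upper = expectile(sample, tau=0.9)       # tau > 0.5 leans toward larger values
```

Because the estimate is fit only against values present in the sample, this kind of objective can approximate an upper or lower envelope without evaluating anything outside the data.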

2512.10498 2026-04-03 cs.CV

Robust Shape from Focus via Multiscale Directional Dilated Laplacian and Recurrent Network

Khurram Ashfaq, Muhammad Tariq Mahmood

Comments Accepted to IJCV

Journal ref
International Journal of Computer Vision, Volume 134, article number 115, (2026)
Abstract

Shape-from-Focus (SFF) is a passive depth estimation technique that infers scene depth by analyzing focus variations in a focal stack. Most recent deep learning-based SFF methods typically operate in two stages: first, they extract focus volumes (a per pixel representation of focus likelihood across the focal stack) using heavy feature encoders; then, they estimate depth via a simple one-step aggregation technique that often introduces artifacts and amplifies noise in the depth map. To address these issues, we propose a hybrid framework. Our method computes multi-scale focus volumes traditionally using handcrafted Directional Dilated Laplacian (DDL) kernels, which capture long-range and directional focus variations to form robust focus volumes. These focus volumes are then fed into a lightweight, multi-scale GRU-based depth extraction module that iteratively refines an initial depth estimate at a lower resolution for computational efficiency. Finally, a learned convex upsampling module within our recurrent network reconstructs high-resolution depth maps while preserving fine scene details and sharp boundaries. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach outperforms state-of-the-art deep learning and traditional methods, achieving superior accuracy and generalization across diverse focal conditions.
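
For readers unfamiliar with Laplacian-style focus measures, the sketch below computes a directional second difference at several dilations and picks the sharpest frame per pixel. The kernel form, the four directions, the dilation set, and the border clamping are assumptions for illustration and are not the paper's DDL definition; the final arg-max is the kind of simple one-step aggregation the abstract says the GRU module replaces.

```python
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals

def directional_response(img, r, c, direction, dilation):
    """|2*I(p) - I(p + d*u) - I(p - d*u)| with coordinates clamped to the image."""
    h, w = len(img), len(img[0])
    dr, dc = direction

    def at(rr, cc):
        return img[min(max(rr, 0), h - 1)][min(max(cc, 0), w - 1)]

    return abs(2 * at(r, c)
               - at(r + dilation * dr, c + dilation * dc)
               - at(r - dilation * dr, c - dilation * dc))

def focus_measure(img, r, c, dilations=(1, 2)):
    """Sum of directional responses over all directions and dilations."""
    return sum(directional_response(img, r, c, d, s)
               for d in DIRECTIONS for s in dilations)

def sharpest_frame(stack, r, c):
    """Index of the focal-stack frame with the highest focus measure at (r, c)."""
    scores = [focus_measure(img, r, c) for img in stack]
    return scores.index(max(scores))

# Toy focal stack: a flat (defocused) frame and a frame with a sharp central peak.
flat = [[1.0] * 5 for _ in range(5)]
peaked = [row[:] for row in flat]
peaked[2][2] = 10.0
frame = sharpest_frame([flat, peaked], 2, 2)
```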

2512.08991 2026-04-03 cs.CV cs.LG

Deterministic World Models for Verification of Closed-loop Vision-based Systems

Yuang Geng, Zhuoyang Zhou, Zhongzheng Zhang, Siyuan Pan, Hoang-Dung Tran, Ivan Ruchkin

Comments Significantly revised version with additional experiments and updated results. Submitted to EMSOFT 2026

Abstract

Verifying closed-loop vision-based control systems remains a fundamental challenge due to the high dimensionality of images and the difficulty of modeling visual environments. While generative models are increasingly used as camera surrogates in verification, their reliance on stochastic latent variables introduces unnecessary overapproximation error. To address this bottleneck, we propose a Deterministic World Model (DWM) that maps system states directly to generative images, effectively eliminating uninterpretable latent variables to ensure precise input bounds. The DWM is trained with a dual-objective loss function that combines pixel-level reconstruction accuracy with a control difference loss to maintain behavioral consistency with the real system. We integrate DWM into a verification pipeline utilizing Star-based reachability analysis (StarV) and employ conformal prediction to derive rigorous statistical bounds on the trajectory deviation between the world model and the actual vision-based system. Experiments on standard benchmarks show that our approach yields significantly tighter reachable sets and better verification performance than a latent-variable baseline.
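
The statistical bound on trajectory deviation can be illustrated with the standard split conformal quantile. The deviation values below are made up, and whether the paper uses exactly this variant of conformal prediction is an assumption; the sketch shows only the generic recipe of taking an order statistic of calibration scores.

```python
import math

def conformal_bound(calibration_scores, alpha=0.1):
    """Return q such that a new exchangeable score is <= q with probability >= 1 - alpha."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration samples for this confidence level
    return sorted(calibration_scores)[rank - 1]

# Hypothetical deviations between world-model and real-system trajectories.
deviations = [0.12, 0.05, 0.30, 0.08, 0.22, 0.15, 0.09, 0.18, 0.11, 0.07,
              0.25, 0.14, 0.06, 0.20, 0.10, 0.16, 0.13, 0.19, 0.17]
bound = conformal_bound(deviations, alpha=0.1)
```

A verifier could then inflate the reachable set computed on the surrogate by this bound, so that the guarantee transfers to the real system with the stated probability.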

2512.05069 2026-04-03 cs.LG cs.CR quant-ph

Hybrid Quantum-Classical Autoencoders for Unsupervised Network Intrusion Detection

Mohammad Arif Rasyidi, Omar Alhussein, Sami Muhaidat, Ernesto Damiani

Comments The authors have identified limitations in the experimental evaluation, which are insufficient to fully support the paper's conclusions. The manuscript is withdrawn pending additional experiments and analysis

Abstract

Unsupervised anomaly-based intrusion detection requires models that can generalize to attack patterns not observed during training. This work presents the first large-scale evaluation of hybrid quantum-classical (HQC) autoencoders for this task. We construct a unified experimental framework that iterates over key quantum design choices, including quantum-layer placement, measurement approach, variational and non-variational formulations, and latent-space regularization. Experiments across three benchmark NIDS datasets show that HQC autoencoders can match or exceed classical performance in their best configurations, although they exhibit higher sensitivity to architectural decisions. Under zero-day evaluation, well-configured HQC models provide stronger and more stable generalization than classical and supervised baselines. Simulated gate-noise experiments reveal early performance degradation, indicating the need for noise-aware HQC designs. These results provide the first data-driven characterization of HQC autoencoder behavior for network intrusion detection and outline key factors that govern their practical viability. All experiment code and configurations are available at https://github.com/arasyi/hqcae-network-intrusion-detection.

2512.02344 2026-04-03 cs.CV

A multi-weight self-matching visual explanation for CNNs on SAR images

Siyuan Sun, Yongping Zhang, Hongcheng Zeng, Yamin Wang, Wei Yang, Wanting Yang, Jie Chen

Abstract

In recent years, convolutional neural networks (CNNs) have achieved significant success in various synthetic aperture radar (SAR) tasks. However, the complexity and opacity of their internal mechanisms hinder the fulfillment of high-reliability requirements, thereby limiting their application in SAR. Improving the interpretability of CNNs is thus of great importance for their development and deployment in SAR. In this paper, a visual explanation method termed multi-weight self-matching class activation mapping (MS-CAM) is proposed. MS-CAM matches SAR images with the feature maps and corresponding gradients extracted by the CNN, and combines both channel-wise and element-wise weights to visualize the decision basis learned by the model in SAR images. Extensive experiments conducted on a self-constructed SAR target classification dataset demonstrate that MS-CAM more accurately highlights the network's regions of interest and captures detailed target feature information, thereby enhancing network interpretability. Furthermore, the feasibility of applying MS-CAM to weakly-supervised object localization is validated. Key factors affecting localization accuracy, such as pixel thresholds, are analyzed in depth to inform future work.

2511.22828 2026-04-03 cs.AI q-bio.NC

Fast dynamical similarity analysis

Arman Behrad, Mitchell Ostrow, Mohammad Taha Fakharian, Ila Fiete, Christian Beste, Shervin Safavi

Abstract

Understanding how nonlinear dynamical systems (e.g., artificial neural networks and neural circuits) process information requires comparing their underlying dynamics at scale, across diverse architectures and large neural recordings. While many similarity metrics exist, current approaches fall short for large-scale comparisons. Geometric methods are computationally efficient but fail to capture governing dynamics, limiting their accuracy. In contrast, traditional dynamical similarity methods are faithful to system dynamics but are often computationally prohibitive. We bridge this gap by combining the efficiency of geometric approaches with the fidelity of dynamical methods. We introduce fast dynamical similarity analysis (fastDSA), a computationally efficient and accurate metric for measuring (dis)similarity between nonlinear dynamical systems. FastDSA leverages modern computational tools, including random matrix theory to determine optimal system rank, novel optimization pipelines for aligning system flow fields, and Koopman embeddings. Across benchmark nonlinear systems and recurrent network models, fastDSA is robust to arbitrary coordinate choices while remaining sensitive to meaningful dynamical differences, capturing variations in system evolution that geometric methods may miss and traditional methods detect only at high computational cost. To our knowledge, fastDSA is the fastest method that retains accuracy in comparing nonlinear dynamical systems. It enables scalable, statistical analyses across diverse systems, significantly expanding the practical applicability of dynamical similarity analysis.

2511.22294 2026-04-03 cs.CV cs.LG

Structure is Supervision: Multiview Masked Autoencoders for Radiology

Sonia Laguna, Andrea Agostini, Alain Ryser, Samuel Ruiperez-Campillo, Irene Cannistraci, Moritz Vandenhirtz, Stephan Mandt, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

Journal ref
Transactions on Machine Learning Research (TMLR) 2026
Abstract

Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages the natural multi-view organization of radiology studies to learn view-invariant and disease-relevant representations. MVMAE combines masked image reconstruction with cross-view alignment, transforming clinical redundancy across projections into a powerful self-supervisory signal. We further extend this approach with MVMAE-V2T, which incorporates radiology reports as an auxiliary text-based learning signal to enhance semantic grounding while preserving fully vision-based inference. Evaluated on a downstream disease classification task on three large-scale public datasets, MIMIC-CXR, CheXpert, and PadChest, MVMAE consistently outperforms supervised and vision-language baselines. Furthermore, MVMAE-V2T provides additional gains, particularly in low-label regimes where structured textual supervision is most beneficial. Together, these results establish the importance of structural and textual supervision as complementary paths toward scalable, clinically grounded medical foundation models.

2511.22048 2026-04-03 cs.CV cs.AI

ICM-SR: Image-Conditioned Manifold Regularization for Image Super-Resolution

Junoh Kang, Donghun Ryou, Bohyung Han

Abstract

Real-world image super-resolution (Real-ISR) often leverages the powerful generative priors of text-to-image diffusion models by regularizing the output to lie on their learned manifold. However, existing methods often overlook the importance of the regularizing manifold, typically defaulting to a text-conditioned manifold. This approach suffers from two key limitations. Conceptually, it is misaligned with the Real-ISR task, which is to generate high-quality (HQ) images directly tied to the low-quality (LQ) images. Practically, the teacher model often reconstructs images with color distortions and blurred edges, indicating a flawed generative prior for this task. To correct these flaws and ensure conceptual alignment, a more suitable manifold must incorporate information from the images. While the most straightforward approach is to condition directly on the raw input images, their high information densities make the regularization process numerically unstable. To resolve this, we propose image-conditioned manifold regularization (ICM), a method that regularizes the output towards a manifold conditioned on the sparse yet essential structural information: a combination of colormap and Canny edges. ICM provides a task-aligned and stable regularization signal, thereby avoiding the instability of dense-conditioning and enhancing the final super-resolution quality. Our experiments confirm that the proposed regularization significantly enhances super-resolution performance, particularly in perceptual quality, demonstrating its effectiveness for real-world applications. We will release the source code of our work for reproducibility.

2511.21681 2026-04-03 cs.CV

Seeing without Pixels: Perception from Camera Trajectories

Zihui Xue, Kristen Grauman, Dima Damen, Andrew Zisserman, Tengda Han

Comments Accepted by CVPR 2026, Project website: https://sites.google.com/view/seeing-without-pixels

Abstract

Can one perceive a video's content without seeing its pixels, just from the camera trajectory, the path it carves through space? This paper is the first to systematically investigate this seemingly implausible question. Towards this end, we propose a contrastive learning framework to train CamFormer, a dedicated encoder that projects camera pose trajectories into a joint embedding space, aligning them with natural language. We find that, contrary to its apparent simplicity, the camera trajectory is a remarkably informative signal to uncover video content. In other words, "how you move" can indeed provide valuable cues about "what you are doing" (egocentric) or "observing" (exocentric). We demonstrate the versatility of our learned CamFormer embeddings on a diverse suite of downstream tasks, ranging from cross-modal alignment to classification and temporal analysis. Importantly, our representations are robust across diverse camera pose estimation methods, including both high-fidelity multi-sensored and standard RGB-only estimators. Our findings establish camera trajectory as a lightweight, robust, and versatile modality for perceiving video content.

2511.21569 2026-04-03 cs.AI cs.HC

When Models Fabricate Credentials: Measuring How Professional Identity Suppresses Honest Self-Representation

Alex Diep

Comments Submitted to COLM; 43 pages, 12 figures, 15 tables; sharpened focus and reduced length of paper

Abstract

When language models are assigned professional personas, they face a conflict between maintaining the persona and disclosing their AI nature. How models resolve this conflict has practical consequences: a model that constructs detailed narratives of medical training and board certifications presents a surface of professional authority it does not possess. We systematically characterize this behavior using AI identity disclosure as a testbed: when probed about expertise origins, a model can either acknowledge its AI nature or maintain its assigned professional identity. Using a factorial design, sixteen open-weight models were audited across 19,200 trials. Under neutral conditions, models disclosed their AI nature in 99.8%-99.9% of interactions; assigning a professional persona reduced disclosure to 36.3% on average, though this suppression was highly context-dependent: the same models that maintained a neurosurgeon persona often disclosed under a financial advisor persona, a 9.7-fold difference. Counter to expectations that greater scale should support broader behavioral generalization, model size explained little of this variation, while model identity explained substantially more (Delta R_adj^2 = 0.375 vs. 0.012). We hypothesized that instruction-following dynamics contribute to these patterns and probed this directly: varying a single system prompt statement increased disclosure from 23.7% to 65.8%, while general honesty instructions produced negligible effects. Self-representational behavior does not generalize across professional contexts; instead, models exhibit sharp and sometimes unexpected differences under minor environmental changes, with training choices appearing to matter more than scale.

2511.20456 2026-04-03 cs.LG

Towards Trustworthy Wi-Fi CSI-based Sensing: Systematic Evaluation of Adversarial Robustness

Shreevanth Krishnaa Gopalakrishnan, Stephen Hailes

Comments 18 pages, 5 figures, 6 tables

Abstract

Machine learning drives Channel State Information (CSI)-based human sensing in modern wireless networks, enabling applications like device-free human activity recognition (HAR) and identification (HID). However, the susceptibility of these models to adversarial perturbations raises security concerns that must be quantified prior to edge deployment. We present a systematic robustness evaluation of five diverse CSI architectures across four public datasets, jointly analyzing white-box, black-box transfer, and universal attacks, together with defense strategies, under unconstrained and physics-guided perturbation boundaries. Contrary to prior assumptions, our experiments reveal that model capacity does not guarantee robustness; simple architectures consistently exhibit superior resilience compared to high-capacity sequence and vision models. Furthermore, vulnerability is fundamentally task-dependent, with HAR proving highly susceptible to attack, while HID demonstrates stark inherent resistance. Crucially, enforcing physical signal constraints drastically reduces attack success rates and significantly taxes attacker computation, showing that standard unconstrained feature-space attacks substantially overestimate real-world Over-The-Air vulnerabilities. By synthesizing attack, defense, and security metrics with strict edge hardware considerations, this work establishes foundational design principles for secure, deployable, and physically realizable wireless sensing systems.

2511.18123 2026-04-03 cs.CV cs.AI cs.CL cs.LG

Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models

Dachuan Zhao, Weiyue Li, Zhenda Shen, Yushu Qiu, Bowen Xu, Haoyu Chen, Yongchao Chen

Comments Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Abstract

Vision-Language Models (VLMs) have become indispensable for multimodal reasoning, yet their representations often encode and amplify demographic biases, resulting in biased associations and misaligned predictions in downstream tasks. Such behavior undermines fairness and distorts the intended alignment between vision and language. Recent post-hoc approaches attempt to mitigate bias by replacing the most attribute-correlated embedding coordinates with neutral values. However, our systematic analysis reveals three critical limitations of this coordinate-wise approach: feature entanglement, poor cross-dataset generalization, and incomplete bias removal. We find that bias is not localized to a few coordinates but is instead distributed across a few linear subspaces. To address these limitations, we propose Subspace Projection Debiasing (SPD), a geometrically principled framework that identifies and removes the entire subspace of linearly decodable bias while reinserting a neutral mean component to preserve semantic fidelity. Extensive experiments across zero-shot classification, text-to-image retrieval, and image generation validate the effectiveness of SPD: our method achieves more robust debiasing with an average improvement of 18.5% across four fairness metrics, while maintaining minimal loss in task performance compared to the best debiasing baseline.
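
The geometric operation described here, removing a subspace and reinserting a neutral mean component, can be sketched in a few lines. The bias directions and the neutral mean below are hypothetical inputs; how SPD actually identifies the subspace (for example, from linear probes) is not given in the abstract and is not modeled.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: turn (possibly correlated) bias directions into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already spanned by the basis
            basis.append([wi / norm for wi in w])
    return basis

def project_out(x, basis, neutral_mean):
    """Replace the component of x inside span(basis) with that of neutral_mean."""
    out = list(x)
    for b in basis:
        shift = dot(neutral_mean, b) - dot(x, b)
        out = [oi + shift * bi for oi, bi in zip(out, b)]
    return out

# Hypothetical 3-D embedding with a 2-D bias subspace.
bias_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormalize(bias_directions)
embedding = [3.0, -2.0, 5.0]
neutral = [0.5, 0.5, 0.5]
debiased = project_out(embedding, basis, neutral)
```

After the projection, the embedding agrees with the neutral mean along every bias direction, while its component orthogonal to the subspace is unchanged.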

2511.16471 2026-04-03 cs.CV

FastSurfer-CC: A robust, accurate, and comprehensive framework for corpus callosum morphometry

Clemens Pollak, Kersten Diers, Santiago Estrada, David Kügler, Martin Reuter

Abstract

The corpus callosum, the largest commissural structure in the human brain, is a central focus in research on aging and neurological diseases. It is also a critical target for interventions such as deep brain stimulation and serves as an important biomarker in clinical trials, including those investigating remyelination therapies. Despite extensive research on corpus callosum segmentation, few publicly available tools provide a comprehensive and automated analysis pipeline. To address this gap, we present FastSurfer-CC, an efficient and fully automated framework for corpus callosum morphometry. FastSurfer-CC automatically identifies mid-sagittal slices, segments the corpus callosum and fornix, localizes the anterior and posterior commissures to standardize head positioning, generates thickness profiles and subdivisions, and extracts eight shape metrics for statistical analysis. We demonstrate that FastSurfer-CC outperforms existing specialized tools across the individual tasks. Moreover, our method reveals statistically significant differences between Huntington's disease patients and healthy controls that are not detected by the current state-of-the-art.

2511.16145 2026-04-03 cs.LG cs.AI

Labels Matter More Than Models: Rethinking the Unsupervised Paradigm in Time Series Anomaly Detection

Zhijie Zhong, Zhiwen Yu, Kaixiang Yang, Yongheng Liu, Jun Jiang, C. L. Philip Chen

Comments 20 pages, 15 figures, 8 tables. Under review

Abstract

Time series anomaly detection (TSAD) is a critical data mining task often constrained by label scarcity. Consequently, current research predominantly focuses on Unsupervised Time-series Anomaly Detection (UTAD), relying on increasingly complex architectures to model normal data distributions. However, this algorithm-centric trend often overlooks the significant performance gains achievable from limited anomaly labels available in practical scenarios. This paper challenges the premise that algorithmic complexity is the optimal path for TSAD. Instead of proposing another intricate unsupervised model, we present a comprehensive benchmark and empirical study to rigorously compare supervised and unsupervised paradigms. To isolate the value of labels, we introduce STAND, a deliberately minimalist supervised baseline. Extensive experiments on five public datasets demonstrate that: (1) Labels matter more than models: under a limited labeling budget, simple supervised models significantly outperform complex state-of-the-art unsupervised methods; (2) Supervision yields higher returns: the performance gain from minimal supervision far exceeds the incremental gains from architectural innovations; and (3) Practicality: STAND exhibits superior prediction consistency and anomaly localization compared to unsupervised counterparts. These findings advocate for a paradigm shift in TSAD research, urging the community to prioritize data-centric label utilization over purely algorithmic complexity. The code and benchmark are publicly available at https://github.com/EmorZz1G/STAND.

2511.10853 2026-04-03 cs.AI cs.HC

Advanced Assistance for Traffic Crash Analysis: An AI-Driven Multi-Agent Approach to Pre-Crash Reconstruction

Gerui Xu, Boyou Chen, Huizhong Guo, Dave LeBlanc, Arpan Kusari, Efe Yarbasi, Ananna Ahmed, Zhaonan Sun, Shan Bao

Comments 36 pages, 14 figures

Journal ref
SAE International Journal of Transportation Safety, 14(1), 2026
Abstract

Traffic collision reconstruction traditionally relies on human expertise and can be accurate, but pre-crash reconstruction is more challenging. This study develops a multi-agent AI framework that reconstructs pre-crash scenarios and infers vehicle behaviors from fragmented collision data. We propose a two-phase collaborative framework with reconstruction and reasoning stages. The system processes 277 rear-end lead vehicle deceleration (LVD) crashes from the Crash Investigation Sampling System (CISS, 2017 to 2022), integrating narrative reports, structured tabular variables, and scene diagrams. Phase I generates natural-language crash reconstructions from multimodal inputs. Phase II combines these reconstructions with Event Data Recorder (EDR) signals to (1) identify striking and struck vehicles and (2) isolate the EDR records most relevant to the collision moment, enabling inference of key pre-crash behaviors. For validation, we evaluated all LVD cases and emphasized 39 complex crashes where multiple EDR records per crash created ambiguity due to missing or conflicting data. Ground truth was set by consensus of two independent manual annotators, with a separate language model used only to flag potential conflicts for re-checking. The framework achieved 100% accuracy across 4,155 trials; three reasoning models produced identical outputs, indicating that performance is driven by the structured prompts rather than model choice. Research analysts without reconstruction training achieved 92.31% accuracy on the same 39 complex cases. Ablation tests showed that removing structured reasoning anchors reduced case-level accuracy from 99.7% to 96.5% and increased errors across multiple output dimensions. The system remained robust under incomplete inputs. This zero-shot evaluation, without domain-specific training or fine-tuning, suggests a scalable approach for AI-assisted pre-crash analysis.

2511.10841 2026-04-03 cs.LG cs.AI

FlowPath: Learning Data-Driven Manifolds with Invertible Flows for Robust Irregularly-sampled Time Series Classification

YongKyung Oh, Dong-Young Lim, Sungil Kim

Comments Published at the 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026). https://ojs.aaai.org/index.php/AAAI/article/view/39643

Abstract

Modeling continuous-time dynamics from sparse and irregularly-sampled time series remains a fundamental challenge. Neural controlled differential equations provide a principled framework for such tasks, yet their performance is highly sensitive to the choice of control path constructed from discrete observations. Existing methods commonly employ fixed interpolation schemes, which impose simplistic geometric assumptions that often misrepresent the underlying data manifold, particularly under high missingness. We propose FlowPath, a novel approach that learns the geometry of the control path via an invertible neural flow. Rather than merely connecting observations, FlowPath constructs a continuous and data-adaptive manifold, guided by invertibility constraints that enforce information-preserving and well-behaved transformations. This inductive bias distinguishes FlowPath from prior unconstrained learnable path models. Empirical evaluations on 18 benchmark datasets and a real-world case study demonstrate that FlowPath consistently achieves statistically significant improvements in classification accuracy over baselines using fixed interpolants or non-invertible architectures. These results highlight the importance of modeling not only the dynamics along the path but also the geometry of the path itself, offering a robust and generalizable solution for learning from irregular time series.

2511.01375 2026-04-03 cs.AI

Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges

Hamin Koo, Minseon Kim, Jaehyung Kim

Comments ICLR 2026

Abstract

Identifying the vulnerabilities of large language models (LLMs) is crucial for improving their safety by addressing inherent weaknesses. Jailbreaks, in which adversaries bypass safeguards with crafted input prompts, play a central role in red-teaming by probing LLMs to elicit unintended or unsafe behaviors. Recent optimization-based jailbreak approaches iteratively refine attack prompts by leveraging LLMs. However, they often rely heavily on either binary attack success rate (ASR) signals, which are sparse, or manually crafted scoring templates, which introduce human bias and uncertainty in the scoring outcomes. To address these limitations, we introduce AMIS (Align to MISalign), a meta-optimization framework that jointly evolves jailbreak prompts and scoring templates through a bi-level structure. In the inner loop, prompts are refined using fine-grained and dense feedback from a fixed scoring template. In the outer loop, the template is optimized using an ASR alignment score, gradually evolving to better reflect true attack outcomes across queries. This co-optimization process yields progressively stronger jailbreak prompts and more calibrated scoring signals. Evaluations on AdvBench and JBB-Behaviors demonstrate that AMIS achieves state-of-the-art performance, including 88.0% ASR on Claude-3.5-Haiku and 100.0% ASR on Claude-4-Sonnet, outperforming existing baselines by substantial margins.

2510.25147 2026-04-03 cs.LG math.OC

Machine Learning Guided Optimal Transmission Switching to Mitigate Wildfire Ignition Risk

Weimin Huang, Ryan Piansky, Bistra Dilkina, Daniel K. Molzahn

Details
English abstract

To mitigate acute wildfire ignition risks, utilities de-energize power lines in high-risk areas. The Optimal Power Shutoff (OPS) problem optimizes line energization statuses to manage wildfire ignition risks through de-energizations while reducing load shedding. OPS problems are computationally challenging Mixed-Integer Linear Programs (MILPs) that must be solved rapidly and frequently in operational settings. For a particular power system, OPS instances share a common structure with varying parameters related to wildfire risks, loads, and renewable generation. This motivates the use of Machine Learning (ML) for solving OPS problems by exploiting shared patterns across instances. In this paper, we develop an ML-guided framework that quickly produces high-quality de-energization decisions by extending existing ML-guided MILP solution methods while integrating domain knowledge on the number of energized and de-energized lines. Results on a large-scale realistic California-based synthetic test system show that the proposed ML-guided method produces high-quality solutions faster than traditional optimization methods.

2510.25126 2026-04-03 cs.LG cs.AI

Bridging the Divide: End-to-End Sequence-Graph Learning

Yuen Chen, Yulun Wu, Samuel Sharpe, Igor Melnyk, Nam H. Nguyen, Furong Huang, C. Bayan Bruss, Rizal Fathony

Details
English abstract

Many real-world prediction tasks, particularly those involving entities such as customers or patients, involve both sequential and relational data. Each entity maintains its own sequence of events while simultaneously engaging in relationships with others. Existing methods in sequence and graph modeling often overlook one modality in favor of the other. We argue that these two facets should instead be integrated and learned jointly. We introduce BRIDGE, a unified end-to-end architecture that couples a sequence model with a graph module under a single objective, allowing gradients to flow across both components to learn task-aligned representations. To enable fine-grained interaction, we propose TOKENXATTN, a token-level cross-attention layer that facilitates message passing between specific events in neighboring sequences. Across two settings, relationship prediction and fraud detection, BRIDGE consistently outperforms static graph models, temporal graph methods, as well as sequence-only baselines on both ranking and classification metrics.

2510.24379 2026-04-03 cs.CV

A Luminance-Aware Multi-Scale Network for Polarization Image Fusion with a Multi-Scene Dataset

Zhuangfan Huang, Xiaosong Li, Gao Wang, Tao Ye, Haishu Tan, Huafeng Li

Details
English abstract

Polarization image fusion combines S0 and DOLP images to reveal surface roughness and material properties through complementary texture features, with important applications in camouflage recognition, tissue pathology analysis, surface defect detection, and other fields. To integrate complementary information from different polarized images in complex luminance environments, we propose a luminance-aware multi-scale network (MLSN). In the encoder stage, we propose a multi-scale spatial weight matrix computed by a brightness branch, which injects dynamically weighted luminance into the feature maps, addressing the inherent contrast difference in polarized images. A global-local feature fusion mechanism at the bottleneck layer performs windowed self-attention and balances global context and local details through residual linking in the feature dimension restructuring stage. In the decoder stage, to further improve adaptability to complex lighting, we propose a Brightness-Enhancement module that establishes the mapping between luminance distribution and texture features and applies a nonlinear luminance correction to the fusion result. We also present MSP, a dataset of 1000 pairs of polarized images covering 17 types of complex indoor and outdoor lighting scenes. MSP provides four-direction polarization raw maps, addressing the scarcity of high-quality datasets in polarization image fusion. Extensive experiments on the MSP, PIF and GAND datasets verify that the proposed MLSN outperforms state-of-the-art methods in subjective and objective evaluations, and the MS-SSIM and SD metrics are higher than the average values of other methods by 8.57%, 60.64%, 10.26%, 63.53%, 22.21%, and 54.31%, respectively. The source code and dataset are available at https://github.com/1hzf/MLS-UNet.

2510.22855 2026-04-03 cs.LG

A Review of Neural Networks in Precipitation Prediction

Yugong Zeng, Jiayuan Wang, Jonathan Wu

Details
English abstract

Precipitation prediction has undergone a profound transformation. A notable limitation of traditional NWP is the need for extensive statistical post-processing. To address this challenge, neural network-based approaches were developed. These approaches offer a framework that directly learns the mapping from atmospheric predictors to precipitation targets. Based on the technological development, this article first reviews the traditional precipitation forecasting methods and summarizes the development trends of precipitation forecasting based on neural networks. We then outline the training process, loss functions, and some datasets for precipitation prediction. In the main body of the article, we detail the basic artificial neural networks (ANNs), spatial feature extraction models, time feature extraction models, generative models, Transformer models, graph neural networks (GNNs), and emerging hybrid models. Finally, in the appendix, we supplement the commonly used evaluation metrics. This paper focuses on the advantages and disadvantages of various neural network models in precipitation forecasting applications, and also pays attention to the latest progress of neural network-based methods. Overall, neural networks have significantly improved the accuracy of short-term and medium-term precipitation forecasting, but still face challenges in representing extreme rainfall, handling imbalanced data, and ensuring physical consistency. The latest progress shows that future prediction systems will increasingly rely on the integration of multiple sources of data and hybrid physical-data-driven models to enhance their robustness and applicability. By compositing research covering multiple eras and paradigms, we not only depict the history of neural networks in precipitation prediction but also outline future directions in next generation forecasting systems.

2510.21852 2026-04-03 cs.LG physics.flu-dyn

Interpretable Diagnostics and Adaptive Data Assimilation for Neural ODEs via Discrete Empirical Interpolation

Hojin Kim, Romit Maulik

Comments 19 pages, 17 figures

Details
English abstract

We present a framework that leverages the Discrete Empirical Interpolation Method (DEIM) for interpretable deep learning and dynamical system analysis. Although DEIM efficiently approximates nonlinear terms in projection-based reduced-order models (POD-ROM), its fixed interpolation points are repurposed for identifying dynamically representative spatial structures in learned models. We apply DEIM as an interpretability tool to examine the learned dynamics of a pre-trained Neural Ordinary Differential Equation (NODE) for two-dimensional vortex-merging and backward-facing step flows. DEIM trajectories reveal physically meaningful structures in NODE predictions and expose failure modes when extrapolating to unseen flow configurations. Building on this diagnostic capability, we further introduce a DEIM-guided data assimilation strategy that injects sparse, dynamically representative corrections into the NODE rollout. By allocating a limited nudging budget to DEIM-identified sampling locations, the framework significantly improves long-term stability and predictive accuracy in out-of-distribution scenarios for the two-dimensional vortex-merging flow. Additional experiments for a flow over a backward-facing step reveal regime-dependent gains, with alternative sampling strategies performing competitively as well. These results demonstrate that DEIM can serve as an interpretable diagnostic and control framework for understanding and enhancing neural differential equation models.
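
For readers unfamiliar with DEIM, the greedy point-selection step it is usually built on can be sketched as follows. This is a minimal pure-Python illustration of the textbook algorithm, not the authors' code; the function names `solve` and `deim_points` and the small example basis are assumptions made for illustration, and the abstract does not say how the authors implement or extend this step.

```python
def solve(A, b):
    """Solve the small dense system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def deim_points(basis):
    """Greedy DEIM selection of interpolation indices.

    basis: list of m linearly independent vectors (e.g. POD modes),
           each a list of n floats. Returns m row indices.
    """
    n = len(basis[0])
    # First point: largest-magnitude entry of the first basis vector.
    idx = [max(range(n), key=lambda i: abs(basis[0][i]))]
    for l in range(1, len(basis)):
        # Interpolate the l-th vector at the points chosen so far.
        A = [[basis[j][p] for j in range(l)] for p in idx]
        b = [basis[l][p] for p in idx]
        c = solve(A, b)
        # Residual of that interpolation; it vanishes at the chosen points.
        r = [basis[l][i] - sum(c[j] * basis[j][i] for j in range(l))
             for i in range(n)]
        idx.append(max(range(n), key=lambda i: abs(r[i])))
    return idx


# Hypothetical example basis: three vectors sampled at four spatial locations.
example_basis = [
    [1.0, 0.5, 0.2, 0.1],
    [0.0, 1.0, 0.5, 0.2],
    [0.0, 0.0, 1.0, 0.5],
]
points = deim_points(example_basis)
```

Each selected index marks the location where the current basis vector is worst represented by the previous ones, which is why the points tend to track dynamically distinctive regions of the field.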

2510.14523 2026-04-03 cs.LG math.ST stat.ML stat.TH

On the Identifiability of Tensor Ranks via Prior Predictive Matching

Eliezer da Silva, Arto Klami, Diego Mesquita, Iñigo Urteaga

Comments Accepted at AISTATS 2026

Details
English abstract

Selecting the latent dimensions (ranks) in tensor factorization is a central challenge that often relies on heuristic methods. This paper introduces a rigorous approach to determine rank identifiability in probabilistic tensor models, based on prior predictive moment matching. We transform a set of moment matching conditions into a log-linear system of equations in terms of marginal moments, prior hyperparameters, and ranks; establishing an equivalence between rank identifiability and the solvability of such system. We apply this framework to four foundational tensor-models, demonstrating that the linear structure of the PARAFAC/CP model, the chain structure of the Tensor Train model, and the closed-loop structure of the Tensor Ring model yield solvable systems, making their ranks identifiable. In contrast, we prove that the symmetric topology of the Tucker model leads to an underdetermined system, rendering the ranks unidentifiable by this method. For the identifiable models, we derive explicit closed-form rank estimators based on the moments of observed data only. We empirically validate these estimators and evaluate the robustness of the proposal.

2510.11579 2026-04-03 cs.CV cs.LG

MS-Mix: Sentiment-Guided Adaptive Augmentation for Multimodal Sentiment Analysis

Hongyu Zhu, Lin Chen, Xin Jin, Mingsheng Shang

Comments Under Review

Details
English abstract

Multimodal Sentiment Analysis (MSA) integrates complementary features from text, video, and audio for robust emotion understanding in human interactions. However, models suffer from severe data scarcity and high annotation costs, severely limiting real-world deployment in social media analytics and human-computer systems. Existing Mixup-based augmentation techniques, when naively applied to MSA, often produce semantically inconsistent samples and amplified label noise by ignoring emotional semantics across modalities. To address these challenges, we propose MS-Mix, an adaptive emotion-sensitive augmentation framework that automatically optimizes data quality in multimodal settings. Its key components are: (1) Sentiment-aware sample selection strategy that filters incompatible pairs via latent-space semantic similarity to prevent contradictory emotion mixing. (2) Sentiment intensity guided module with multi-head self-attention for computing modality-specific mixing ratios conditioned on emotional salience dynamically. (3) Sentiment alignment loss based on Kullback-Leibler divergence to align predicted sentiment distributions across modalities with ground-truth labels, improving discrimination and consistency. Extensive experiments on two public datasets with six state-of-the-art backbones confirm that MS-Mix consistently outperforms prior methods, significantly improving robustness and practical applicability for MSA. The source code is available at an anonymous link: https://anonymous.4open.science/r/MS-Mix-review-0C72.
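
The ingredients named in the abstract (similarity-based pair filtering, Mixup interpolation, and a KL-divergence loss) can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and a single scalar mixing ratio, neither of which is confirmed by the abstract; `mix_pair`, `min_sim`, and the toy inputs are hypothetical, and the learned, modality-specific ratios of MS-Mix are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mix_pair(x1, y1, x2, y2, lam, min_sim=0.5):
    """Mixup of two (feature, label) pairs, gated by feature similarity.

    Returns the mixed (feature, label), or None when the pair is filtered out.
    min_sim is a hypothetical threshold, not a value from the paper.
    """
    if cosine(x1, x2) < min_sim:
        return None  # dissimilar pair: skip instead of mixing
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1 - lam) * y2
    return x, y


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical usage with toy two-dimensional features and scalar sentiment labels.
mixed = mix_pair([1.0, 0.0], 1.0, [1.0, 0.0], 0.0, lam=0.25)
rejected = mix_pair([1.0, 0.0], 1.0, [-1.0, 0.0], -1.0, lam=0.5)
```

In this sketch the second pair is rejected because its features point in opposite directions, which mirrors the stated goal of avoiding contradictory emotion mixing.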

2510.09416 2026-04-03 cs.LG cs.SI

What Do Temporal Graph Learning Models Learn?

Abigail J. Hayes, Tobias Schumacher, Markus Strohmaier

Details
English abstract

Learning on temporal graphs has become a central topic in graph representation learning, with numerous benchmarks indicating the strong performance of state-of-the-art models. However, recent work has raised concerns about the reliability of benchmark results, noting issues with commonly used evaluation protocols and the surprising competitiveness of simple heuristics. This contrast raises the question of which characteristics of the underlying graphs temporal graph learning models actually use to form their predictions. We address this by systematically evaluating eight models on their ability to capture eight fundamental characteristics related to the link structure of temporal graphs. These include structural characteristics such as density, temporal patterns such as recency, and edge formation mechanisms such as homophily. Using both synthetic and real-world datasets, we analyze how well models learn these characteristics. Our findings reveal a mixed picture: models capture some characteristics well but fail to reproduce others. With this, we expose important limitations. Overall, we believe that our results provide practical insights for the application of temporal graph learning models and motivate more interpretability-driven evaluations in graph learning research.

2510.07487 2026-04-03 cs.LG

Reinforcement Learning-based Task Offloading in the Internet of Wearable Things

Waleed Bin Qaim, Aleksandr Ometov, Claudia Campolo, Antonella Molinaro, Elena Simona Lohan, Jari Nurmi

Comments Withdrawn by the authors. A revised version is under preparation

Details
English abstract

Over the years, significant contributions have been made by the research and industrial sectors to improve wearable devices towards the Internet of Wearable Things (IoWT) paradigm. However, wearables are still facing several challenges. Many stem from the limited battery power and insufficient computation resources available on wearable devices. On the other hand, with the popularity of smart wearables, there is a consistent increase in the development of new computationally intensive and latency-critical applications. In such a context, task offloading allows wearables to leverage the resources available on nearby edge devices to enhance the overall user experience. This paper proposes a framework for Reinforcement Learning (RL)-based task offloading in the IoWT. We formulate the task offloading process considering the tradeoff between energy consumption and task accomplishment time. Moreover, we model the task offloading problem as a Markov Decision Process (MDP) and utilize the Q-learning technique to enable the wearable device to make optimal task offloading decisions without prior knowledge. We evaluate the performance of the proposed framework through extensive simulations for various applications and system configurations conducted in the ns-3 network simulator. We also show how varying the main system parameters of the Q-learning algorithm affects the overall performance in terms of average task accomplishment time, average energy consumption, and percentage of tasks offloaded.
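
The MDP formulation and Q-learning update described above can be sketched with a tabular toy example. Everything specific here is an assumption for illustration: the two-state link-quality model, the energy and time numbers, the equal trade-off weights, and the hyperparameters are hypothetical and are not taken from the paper or its ns-3 simulations.

```python
import random

ACTIONS = ("local", "offload")


def step(state, action, rng):
    """Toy environment: state is the link quality (0 = poor, 1 = good).

    Returns (cost, next_state); cost is a weighted sum of energy and time.
    All numbers are hypothetical.
    """
    w_energy, w_time = 0.5, 0.5          # assumed trade-off weights
    if action == "local":
        energy, time = 4.0, 2.0          # local execution drains the battery
    elif state == 1:
        energy, time = 1.0, 1.0          # offloading over a good link is cheap
    else:
        energy, time = 3.0, 6.0          # offloading over a poor link is slow
    next_state = rng.choice((0, 1))      # link quality changes at random
    return w_energy * energy + w_time * time, next_state


def train(steps=20000, alpha=0.05, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning that minimizes expected discounted cost."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
    state = 0
    for _ in range(steps):
        # Epsilon-greedy: explore occasionally, otherwise pick the cheapest action.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = min(ACTIONS, key=lambda a: q[(state, a)])
        cost, nxt = step(state, action, rng)
        # Q-learning update on costs: bootstrap from the best next-state value.
        target = cost + gamma * min(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = nxt
    return q


q = train()
policy = {s: min(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1)}
```

With these toy costs the learned policy should run tasks locally when the link is poor and offload when it is good. The agent learns this from observed costs alone, without a model of the link, which is the "without prior knowledge" property the abstract highlights.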

2510.07197 2026-04-03 cs.RO

COMPAct: Computational Optimization and Automated Modular design of Planetary Actuators

Aman Singh, Deepak Kapa, Suryank Joshi, Shishir Kolathaya

Comments 8 pages, 9 Figures, 2 tables; first two authors contributed equally; published in 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)

Details
English abstract

The optimal design of robotic actuators is a critical area of research, yet limited attention has been given to optimizing gearbox parameters and automating actuator CAD. This paper introduces COMPAct: Computational Optimization and Automated Modular Design of Planetary Actuators, a framework that systematically identifies optimal gearbox parameters for a given motor across four gearbox types, single-stage planetary gearbox (SSPG), compound planetary gearbox (CPG), Wolfrom planetary gearbox (WPG), and double-stage planetary gearbox (DSPG). The framework minimizes mass and actuator width while maximizing efficiency, and further automates actuator CAD generation to enable direct 3D printing without manual redesign. Using this framework, optimal gearbox designs are explored across a wide range of gear ratios, providing insights into the suitability of different gearbox types while automatically generating CAD models for all four gearbox types with varying gear ratios and motors. Two actuator types are fabricated and experimentally evaluated through power efficiency, no-load backlash, and transmission stiffness tests. Experimental results indicate that the SSPG actuator achieves a mechanical efficiency of 60-80%, a no-load backlash of 0.59 deg, and a transmission stiffness of 242.7 Nm/rad, while the CPG actuator demonstrates 60% efficiency, 2.6 deg backlash, and a stiffness of 201.6 Nm/rad. CODE: https://github.com/singhaman1750/COMPAct.git VIDEO: https://youtu.be/etK6anjXag8?si=jFK7HgAPSBy-GnDR

2510.06339 2026-04-03 cs.RO

Vi-TacMan: Articulated Object Manipulation via Vision and Touch

Leiyao Cui, Zihang Zhao, Sirui Xie, Wenhuan Zhang, Zhi Han, Yixin Zhu

Comments ICRA 2026

Details
English abstract

Autonomous manipulation of articulated objects remains a fundamental challenge for robots in human environments. Vision-based methods can infer hidden kinematics but can yield imprecise estimates on unfamiliar objects. Tactile approaches achieve robust control through contact feedback but require accurate initialization. This suggests a natural synergy: vision for global guidance, touch for local precision. Yet no framework systematically exploits this complementarity for generalized articulated manipulation. Here we present Vi-TacMan, which uses vision to propose grasps and coarse directions that seed a tactile controller for precise execution. By incorporating surface normals as geometric priors and modeling directions via von Mises-Fisher distributions, our approach achieves significant gains over baselines (all p<0.0001). Critically, manipulation succeeds without explicit kinematic models -- the tactile controller refines coarse visual estimates through real-time contact regulation. Tests on more than 50,000 simulated and diverse real-world objects confirm robust cross-category generalization. This work establishes that coarse visual cues suffice for reliable manipulation when coupled with tactile feedback, offering a scalable paradigm for autonomous systems in unstructured environments.
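
As background on the von Mises-Fisher model mentioned above, the standard log-density on the unit sphere in 3D can be written in a few lines. This is a sketch of the textbook formula only; the function name `vmf_log_pdf` is an assumption, and the abstract does not state how the authors parameterize or fit the distribution.

```python
import math


def vmf_log_pdf(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in 3D.

    x, mu: unit 3-vectors (direction and mean direction).
    kappa: concentration, must be > 0; larger values mean a tighter cone.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # log(kappa / (4*pi*sinh(kappa))), rearranged to avoid overflow for large kappa.
    log_norm = (math.log(kappa) - math.log(2 * math.pi) - kappa
                - math.log1p(-math.exp(-2 * kappa)))
    return log_norm + kappa * dot


# Hypothetical usage: score a candidate motion direction against a mean direction.
mean_direction = (0.0, 0.0, 1.0)
aligned = vmf_log_pdf((0.0, 0.0, 1.0), mean_direction, kappa=2.0)
opposite = vmf_log_pdf((0.0, 0.0, -1.0), mean_direction, kappa=2.0)
```

The concentration `kappa` expresses how much a coarse directional estimate is trusted: small values spread probability almost uniformly over the sphere, while large values concentrate it around `mu`.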