arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1718
2503.07555 2026-04-02 cs.LG

Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference

Fateme Jamshidi, Mohammad Shahverdikondori, Negar Kiyavash

详情
英文摘要

We study multi-armed bandits under network interference, where each unit's reward depends on its own treatment and those of its neighbors in a given graph. This induces an exponentially large action space, making standard approaches computationally impractical. We propose a novel algorithm that uses the local graph structure to minimize regret. We derive a graph-dependent upper bound on cumulative regret that improves over prior work. Additionally, we provide the first lower bounds for bandits with arbitrary network interference, where each bound involves a distinct structural property of the graph. These bounds show that for both dense and sparse graphs, our algorithm is nearly optimal, with matching upper and lower bounds up to logarithmic factors. When the interference graph is unknown, a variant of our algorithm is Pareto optimal: no algorithm can uniformly outperform it across all instances. We complement our theoretical results with numerical experiments, showing that our approach outperforms the baseline methods.

2502.21060 2026-04-02 cs.LG cs.IT math.IT

VT-Former: Efffcient Transformer-based Decoder for Varshamov-Tenengolts Codes

Yali Wei, Alan J. X. Guo, Zihui Yan, Yufan Dai, Wenjia Fan

Comments 9 pages, 10 figures, 5 tables

详情
英文摘要

In recent years, widespread attention has been drawn to the challenge of correcting insertion, deletion, and substitution (IDS) errors in DNA-based data storage. Among various IDS-correcting codes, Varshamov-Tenengolts (VT) codes, originally designed for single-error correction, have been established as a central research focus. While existing decoding methods demonstrate high accuracy for single-error correction, they are typically not applicable to the correction of multiple IDS errors. In this work, the latent capability of VT codes for multiple-error correction is investigated through a statistic-enhanced Transformer-based VT decoder (VT-Former), utilizing both symbol and statistic feature embeddings. Experimental results demonstrate that VT-Former achieves nearly 100\% accuracy on correcting single errors. For multi-error decoding tasks across various codeword lengths, improvements in both frame accuracy and bit accuracy are observed, compared to conventional hard-decision and soft-in soft-out decoding algorithms. Furthermore, while lower decoding latency is exhibited by the base model compared to traditional soft decoders, the architecture is further optimized in this study to enhance decoding efficiency and reduce computational overhead.

2502.14883 2026-04-02 cs.CV cs.AI

How Blind and Low-Vision Individuals Prefer Large Vision-Language Model-Generated Scene Descriptions

Na Min An, Eunki Kim, Wan Ju Kang, Sangryul Kim, James Thorne, Hyunjung Shim

Comments This paper has been superseded by version 2 of arXiv:2510.00766

详情
英文摘要

For individuals with blindness or low vision (BLV), navigating complex environments can pose serious risks. Large Vision-Language Models (LVLMs) show promise for generating scene descriptions, but their effectiveness for BLV users remains underexplored. To address this gap, we conducted a user study with eight BLV participants to systematically evaluate preferences for six types of LVLM descriptions. While they helped to reduce fear and improve actionability, user ratings showed wide variation in sufficiency and conciseness. Furthermore, GPT-4o--despite its strong potential to refine descriptions--was not consistently preferred by participants. We use the insights obtained from the user study to build training data for building our new automatic evaluation metric that can capture BLV preferences effectively. Our findings underscore the urgent need for BLV-centered evaluation metrics and human-in-the-loop feedback to advance LVLM description quality for accessibility.

2502.01861 2026-04-02 cs.LG stat.ML

Learning Hyperparameters via a Data-Emphasized Variational Objective

Ethan Harvey, Mikhail Petrov, Michael C. Hughes

Comments arXiv admin note: text overlap with arXiv:2410.19675

详情
英文摘要

When training large models on limited data, avoiding overfitting is paramount. Common grid search or smarter search methods rely on expensive separate runs for each candidate hyperparameter, while carving out a validation set that reduces available training data. In this paper, we study gradient-based learning of hyperparameters via the evidence lower bound (ELBO) objective from Bayesian variational methods. This avoids the need for any validation set. We focus on scenarios where the model is over-parameterized for flexibility and the approximate posterior is chosen to be Gaussian with isotropic covariance for tractability, even though it cannot match the true posterior. In such scenarios, we find the ELBO prioritizes posteriors that match the prior, leading to severe underfitting. Instead, we recommend a data-emphasized ELBO that upweights the likelihood but not the prior. In Bayesian transfer learning of image and text classifiers, our method reduces the 88+ hour grid search of past work to under 3 hours while delivering comparable accuracy. We further demonstrate how our approach enables efficient yet accurate approximations of Gaussian processes with learnable lengthscale kernels.

2502.01556 2026-04-02 cs.LG stat.ML

A Gaussian Process View on Observation Noise and Initialization in Wide Neural Networks

Sergio Calvo-Ordoñez, Jonathan Plenk, Richard Bergna, Alvaro Cartea, Jose Miguel Hernandez-Lobato, Konstantina Palla, Kamil Ciosek

Comments AISTATS 2026, Camera-ready version

详情
英文摘要

Performing gradient descent in a wide neural network is equivalent to computing the posterior mean of a Gaussian Process with the Neural Tangent Kernel (NTK-GP), for a specific prior mean and with zero observation noise. However, existing formulations have two limitations: (i) the NTK-GP assumes noiseless targets, leading to misspecification on noisy data; (ii) the equivalence does not extend to arbitrary prior means, which are essential for well-specified models. To address (i), we introduce a regularizer into the training objective, showing its correspondence to incorporating observation noise in the NTK-GP. To address (ii), we propose a \textit{shifted network} that enables arbitrary prior means and allows obtaining the posterior mean with gradient descent on a single network, without ensembling or kernel inversion. We validate our results with experiments across datasets and architectures, showing that this approach removes key obstacles to the practical use of NTK-GP equivalence in applied Gaussian process modeling.

2501.09821 2026-04-02 cs.LG math.PR

BN-Pool: Bayesian Nonparametric Pooling for Graphs

Daniele Castellana, Filippo Maria Bianchi

详情
英文摘要

We introduce BN-Pool, the first clustering-based pooling method for Graph Neural Networks that adaptively determines the number of supernodes in a coarsened graph. BN-Pool leverages a generative model based on a Bayesian nonparametric framework for partitioning graph nodes into an unbounded number of clusters. During training, the node-to-cluster assignments are learned by combining the supervised loss of the downstream task with an unsupervised auxiliary term, which encourages the reconstruction of the original graph topology while penalizing unnecessary proliferation of clusters. By automatically discovering the optimal coarsening level for each graph, BN-Pool preserves the performance of soft-clustering pooling methods while avoiding their typical redundancy by learning compact pooled graphs. The code is available at https://github.com/NGMLGroup/Bayesian-Nonparametric-Graph-Pooling.

2501.09136 2026-04-02 cs.AI cs.CL cs.IR

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, Athanasios V. Vasilakos

详情
英文摘要

Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite its promise, traditional RAG systems are constrained by static workflows and lack the adaptability required for multi-step reasoning and complex task management. Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns reflection, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic RAG systems to deliver flexibility, scalability, and context-awareness across diverse applications. This paper presents an analytical survey of Agentic RAG systems. It traces the evolution of RAG paradigms, introduces a principled taxonomy of Agentic RAG architectures based on agent cardinality, control structure, autonomy, and knowledge representation, and provides a comparative analysis of design trade-offs across existing frameworks. The survey examines applications in healthcare, finance, education, and enterprise document processing, and distills practical lessons for system designers and practitioners. Finally, it identifies key open research challenges related to evaluation, coordination, memory management, efficiency, and governance, outlining directions for future research.

2412.01114 2026-04-02 cs.LG

Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

Cevahir Koprulu, Po-han Li, Tianyu Qiu, Ruihan Zhao, Tyler Westenbroek, David Fridovich-Keil, Sandeep Chinchali, Ufuk Topcu

详情
英文摘要

Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.

2411.17499 2026-04-02 cs.LG

Time-Series Forecasting in Smart Manufacturing Systems: An Experimental Evaluation of the State-of-the-art Algorithms

Mojtaba A. Farahani, Fadi El Kalach, Austin Harper, M. R. McCormick, Ramy Harik, Thorsten Wuest

详情
Journal ref
Robotics and Computer-Integrated Manufacturing 95 (2025): 103010
英文摘要

TSF is growing in various domains including manufacturing. Although numerous TSF algorithms have been developed recently, the validation and evaluation of algorithms hold substantial value for researchers and practitioners and are missing. This study aims to fill this gap by evaluating the SoTA TSF algorithms on thirteen manufacturing datasets, focusing on their applicability in manufacturing. Each algorithm was selected based on its TSF category to ensure a representative set of algorithms. The evaluation includes different scenarios to evaluate the models using two problem categories and two forecasting horizons. To evaluate the performance, the WAPE was calculated, and additional post hoc analyses were conducted to assess the significance of observed differences. Only algorithms with codes from open-source libraries were utilized, and no hyperparameter tuning was done. This allowed us to evaluate the algorithms as "out-of-the-box" solutions that can be easily implemented, ensuring their usability within the manufacturing by practitioners with limited technical knowledge. This aligns to facilitate the adoption of these techniques in smart manufacturing systems. Based on the results, transformer and MLP-based architectures demonstrated the best performance with MLP-based architecture winning the most scenarios. For univariate TSF, PatchTST emerged as the most robust, particularly for long-term horizons, while for multivariate problems, MLP-based architectures like N-HITS and TiDE showed superior results. The study revealed that simpler algorithms like XGBoost could outperform complex algorithms in certain tasks. These findings challenge the assumption that more sophisticated models produce better results. Additionally, the research highlighted the importance of computational resource considerations, showing variations in runtime and memory usage across different algorithms.

2411.13181 2026-04-02 cs.CV cs.AI cs.CY

Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning

Luigi Celona, Simone Bianco, Paolo Napoletano

详情
英文摘要

The classification of distracted drivers is pivotal for ensuring safe driving. Previous studies demonstrated the effectiveness of neural networks in automatically predicting driver distraction, fatigue, and potential hazards. However, recent research has uncovered a significant loss of accuracy in these models when applied to samples acquired under conditions that differ from the training data. In this paper, we introduce a robust model designed to withstand changes in camera position within the vehicle. Our Driver Behavior Monitoring Network (DBMNet) relies on a lightweight backbone and integrates a disentanglement module to discard camera view information from features, coupled with contrastive learning to enhance the encoding of various driver actions. Experiments conducted using a leave-one-camera-out protocol on the daytime and nighttime subsets of the 100-Driver dataset validate the effectiveness of our approach. Cross-dataset and cross-camera experiments conducted on three benchmark datasets, namely AUCDD-V1, EZZ2021 and SFD, demonstrate the superior generalization capabilities of the proposed method. Overall DBMNet achieves an improvement of 7% in Top-1 accuracy compared to existing efficient approaches. Moreover, a quantized version of the DBMNet and all considered methods has been deployed on a Coral Dev Board board. In this deployment scenario, DBMNet outperforms alternatives, achieving the lowest average error while maintaining a compact model size, low memory footprint, fast inference time, and minimal power consumption.

2410.19899 2026-04-02 cs.CV

Exploring Self-Supervised Learning with U-Net Masked Autoencoders and EfficientNet-B7 for Improved Gastrointestinal Abnormality Classification in Video Capsule Endoscopy

Vamshi Krishna Kancharla, Pavan Kumar Kaveti, Dasari Naga Raju

Comments Capsule Vision 2024 Challenge

详情
英文摘要

Video Capsule Endoscopy (VCE) has become an indispensable diagnostic tool for gastrointestinal (GI) disorders due to its non-invasive nature and ability to capture high-resolution images of the small intestine. However, the enormous volume of data generated during a single procedure makes manual inspection labor-intensive, time-consuming, and prone to inter-observer variability. Automated analysis using deep learning offers a promising solution, but its effectiveness is often limited by data imbalance and the high cost of labeled medical data. In this work, we propose a novel framework that combines self-supervised learning through a U-Net-based masked autoencoder with supervised feature extraction using EfficientNet-B7 for multi-class abnormality classification in VCE images. The U-Net model is first trained in a self-supervised manner using Gaussian noise removal and masked reconstruction to learn robust visual representations without requiring annotations. The learned encoder features are then fused with EfficientNet-B7 features to form a rich, discriminative representation for classification. We evaluate our approach on the Capsule Vision 2024 Challenge dataset consisting of ten abnormality classes and a dominant normal class. Experimental results demonstrate that the proposed fusion framework achieves a validation accuracy of 94\%, outperforming standalone architectures and attention-based fusion variants. The study highlights the effectiveness of self-supervised representation learning and feature fusion in addressing class imbalance and improving diagnostic accuracy in real-world medical imaging scenarios.

2410.09512 2026-04-02 cs.RO

The Indirect Method for Generating Libraries of Optimal Periodic Trajectories and Its Application to Economical Bipedal Walking

Maximilian Raff, Kathrin Flaßkamp, C. David Remy

Comments submitted to the International Journal of Robotics Research (IJRR)

详情
英文摘要

Trajectory optimization is an essential tool for generating efficient, dynamically consistent gaits in legged locomotion. This paper explores the indirect method of trajectory optimization, emphasizing its application in creating optimal periodic gaits for legged systems and contrasting it with the more common direct method. While the direct method provides flexibility in implementation, it is limited by its need for an input space parameterization. In contrast, the indirect method improves accuracy by computing the control input from states and costates obtained along the optimal trajectory. In this work, we tackle the convergence challenges associated with indirect shooting methods by utilizing numerical continuation methods. This is particularly useful for the systematic development of gait libraries. Our contributions include: (1) the formalization of a general periodic trajectory optimization problem that extends existing first-order necessary conditions to a broader range of cost functions and operating conditions; (2) a methodology for efficiently generating libraries of optimal trajectories (gaits) utilizing a single shooting approach combined with numerical continuation methods; (3) a novel approach for reconstructing Lagrange multipliers and costates from passive gaits; (4) a comparative analysis of the indirect and direct shooting methods using a compass-gait walker as a case study, demonstrating the improved accuracy of the indirect method in generating optimal gaits; and (5) demonstrating applicability to the more complex legged robot RABBIT, with ten dynamic states and four inputs. The findings underscore the potential of the indirect method for generating families of optimal gaits, thereby advancing the field of trajectory optimization in legged robotics.

2410.03131 2026-04-02 cs.AI cs.CL cs.LG

Code Comprehension then Auditing for Unsupervised LLM Evaluation

Bhrij Patel, Souradip Chakraborty, Mengdi Wang, Dinesh Manocha, Amrit Singh Bedi

Comments 19 pages

详情
英文摘要

Large Language Models (LLMs) for unsupervised code correctness evaluation have recently gained attention because they can judge if code runs as intended without requiring reference implementations or unit tests, which may be unavailable, sparse, or unreliable. However, most prior approaches condition LLM evaluators directly on the full code implementation, forcing the model to jointly infer program behavior and evaluate correctness in a single step. This entanglement leads to misinterpretations of code behavior and unreliable judgments. To mitigate this issue, we introduce CoCoA, an unsupervised Code Comprehension then Auditing framework that first comprehends functionality to generate a natural-language explanation. Then it evaluates task alignment based on this explanation. By sequentially sampling comprehension before evaluation, CoCoA improves the quality of inferred program behavior and enables the evaluator to focus on behavioral alignment rather than raw implementation details. Across multiple datasets, programming languages, and models, CoCoA achieves up to $68\%$ increased F1 score and up to $20\%$ increased accuracy over the best-performing baselines.

2409.20146 2026-04-02 cs.CV

VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection

Huilin Deng, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

详情
英文摘要

Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects by establishing feature mapping between textual prompts and inspection images, demonstrating excellent research value in flexible industrial manufacturing. However, existing ZSAD methods are limited by closed-world settings, struggling to unseen defects with predefined prompts. Recently, adapting Multimodal Large Language Models (MLLMs) for Industrial Anomaly Detection (IAD) presents a viable solution. Unlike fixed-prompt methods, MLLMs exhibit a generative paradigm with open-ended text interpretation, enabling more adaptive anomaly analysis. However, this adaption faces inherent challenges as anomalies often manifest in fine-grained regions and exhibit minimal visual discrepancies from normal samples. To address these challenges, we propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception, simultaneously providing precise detection and comprehensive analysis of anomalies. Specifically, we design a Defect-Sensitive Structure Learning scheme that transfers patch-similarities cues from visual branch to our MLLM for improved anomaly discrimination. Besides, we introduce a novel visual projector, Locality-enhanced Token Compression, which mines multi-level features in local contexts to enhance fine-grained detection. Furthermore, we introduce the Real Industrial Anomaly Detection (RIAD), a comprehensive IAD dataset with detailed anomaly descriptions and analyses, offering a valuable resource for MLLM-based IAD development. Extensive experiments on zero-shot benchmarks, including MVTec-AD, Visa, WFDD, and RIAD datasets, demonstrate our superior performance over state-of-the-art methods. The code and dataset will be available soon.

2408.07575 2026-04-02 cs.AI math.ST stat.ME stat.TH

A General Framework on Conditions for Constraint-based Causal Learning

Kai Z. Teh, Kayvan Sadeghi, Terry Soo

详情
英文摘要

Most constraint-based causal learning algorithms provably return the correct causal graph under certain correctness conditions, such as faithfulness. By representing any constraint-based causal learning algorithm using the notion of a property, we provide a general framework to obtain and study correctness conditions for these algorithms. From the framework, we provide exact correctness conditions for the PC algorithm, which are then related to the correctness conditions of some other existing causal discovery algorithms. The framework also suggests a paradigm for designing causal learning algorithms which allows for the correctness conditions of algorithms to be controlled for before designing the actual algorithm, and has the following implications. We show that the sparsest Markov representation condition is the weakest correctness condition for algorithms that output ancestral graphs or directed acyclic graphs satisfying any existing notions of minimality. We also reason that Pearl-minimality is necessary for meaningful causal learning but not sufficient to relax the faithfulness condition and, as such, has to be strengthened, such as by including background knowledge, for causal learning beyond faithfulness.

2408.06788 2026-04-02 cs.CV cs.HC

Visual Neural Decoding via Improved Visual-EEG Semantic Consistency

Hongzhou Chen, Lianghua He, Yihang Liu, Longzhen Yang, Shaohua Shang, MengChu Zhou

详情
英文摘要

Visual neural decoding aims to extract and interpret original visual experiences directly from human brain activity. Recent studies have demonstrated the feasibility of decoding visual semantic categories from electroencephalography (EEG) signals, among which metric learning-based approaches have delivered promising results. However, these methods that directly map EEG features into a pre-trained embedding space inevitably introduce mapping bias, resulting in a modality gap and semantic inconsistency that impair cross-modal alignment. To address these issues, this work constructs a Visual-EEG Joint Semantic Space to bridge the gap between visual images and neural signals. Building upon this space, we propose two novel approaches to improve semantic consistency between cross-modal representations and facilitate optimal alignment. Specifically, (1) we introduce a Visual-EEG Semantic Decoupling Network (VE-SDN) to explicitly disentangle semantic components from modality representations, thereby achieving purely semantic-level cross-modal alignment. (2) We introduce a Neural-Guided Intra-Class Consistency (NGIC) objective, an asymmetric representation alignment strategy designed to effectively enhance the robustness of visual representations and further boost decoding performance. Extensive experiments on a large-scale Visual-EEG dataset validate the effectiveness of the proposed method. Compared to the strongest baseline, our approach demonstrates superior decoding performance, yielding relative Top-1/Top-5 accuracy improvements of 38.9%/17.9% in intra-subject and 16.1%/11.3% in inter-subject settings. The code is available at https://github.com/hzalanchen/Cross-Modal-EEG

2407.01967 2026-04-02 cs.CV

Harnessing the Power of Local Representations for Few-Shot Classification

Shi Tang, Guiming Luo, Xinchen Ye, Zhiyi Xia

详情
英文摘要

Generalizing to novel classes unseen during training is a key challenge of few-shot classification. Recent metric-based methods try to address this by local representations. However, they are unable to take full advantage of them due to (i) improper supervision for pretraining the feature extractor, and (ii) lack of adaptability in the metric for handling various possible compositions of local feature sets. In this work, we harness the power of local representations in improving novel-class generalization. For the feature extractor, we design a novel pretraining paradigm that learns randomly cropped patches by soft labels. It utilizes the class-level diversity of patches while diminishing the impact of their semantic misalignments to hard labels. To align network output with soft labels, we also propose a UniCon KL-Divergence that emphasizes the equal contribution of each base class in describing "non-base" patches. For the metric, we formulate measuring local feature sets as an entropy-regularized optimal transport problem to introduce the ability to handle sets consisting of homogeneous elements. Furthermore, we design a Modulate Module to endow the metric with the necessary adaptability. Our method achieves new state-of-the-art performance on three popular benchmarks. Moreover, it exceeds state-of-the-art transductive and cross-modal methods in the fine-grained scenario.

2407.01570 2026-04-02 cs.RO cs.AI

Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL

Manuel Serra Nunes, Atabak Dehban, Yiannis Demiris, José Santos-Victor

Comments 13 pages, 8 figures, conference

详情
英文摘要

Despite the significant advances in Deep Reinforcement Learning (RL) observed in the last decade, the amount of training experience necessary to learn effective policies remains one of the primary concerns in both simulated and real environments. Looking to solve this issue, previous work has shown that improved efficiency can be achieved by separately modeling the agent and environment, but usually requires a supervisory signal. In contrast to RL, humans can perfect a new skill from a small number of trials and often do so without a supervisory signal, making neuroscientific studies of human development a valuable source of inspiration for RL. In particular, we explore the idea of motor prediction, which states that humans develop an internal model of themselves and of the consequences that their motor commands have on the immediate sensory inputs. Our insight is that the movementofthe agent provides a cue that allows the duality between the agent and environment to be learned. To instantiate this idea, we present Ego-Foresight (EF), a self-supervised method for disentangling agent information based on motion and prediction. Our main finding is that, when used as an auxiliary task in feature learning, self-supervised agent awareness improves the sample-efficiency and performance of the underlying RL algorithm. To test our approach, we study the ability of EF to predict agent movement and disentangle agent information. Then, we integrate EF with model-free and model based RL algorithms to solve simulated control tasks, showing improved sample-efficiency and performance.

2406.15904 2026-04-02 cs.LG stat.ME stat.ML

Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

Kulunu Dharmakeerthi, YoonHaeng Hur, Tengyuan Liang

详情
英文摘要

Practitioners often face the challenge of deploying prediction models in new environments with shifted distributions of covariates and responses. With observational data, such shifts are often driven by unobserved confounding, and can in fact alter the concept of which model is best. This paper studies distribution shifts in the domain adaptation problem with unobserved confounding. We postulate a linear structural causal model to account for endogeneity and unobserved confounding, and we leverage exogenous invariant covariate representations to cure concept shifts and improve target prediction. We propose a data-driven representation learning method that optimizes for a lower-dimensional linear subspace and a prediction model confined to that subspace. This method operates on a non-convex objective -- that interpolates between predictability and stability -- constrained to the Stiefel manifold, using an analog of projected gradient descent. We analyze the optimization landscape and prove that, provided sufficient regularization, nearly all local optima align with an invariant linear subspace resilient to distribution shifts. This method achieves a nearly ideal gap between target and source risk. We validate the method and theory with real-world data sets to illustrate the tradeoffs between predictability and stability.

2405.15556 2026-04-02 cs.LG cs.CL cs.CR

Certifiably Robust RAG against Retrieval Corruption

Chong Xiang, Tong Wu, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal

详情
英文摘要

Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality -- even against an adaptive attacker with full knowledge of the defense and the ability to arbitrarily inject a bounded number of malicious passages. We evaluate RobustRAG on the tasks of open-domain question-answering and free-form long text generation and demonstrate its effectiveness across three datasets and three LLMs.

2401.14295 2026-04-02 cs.CL cs.AI cs.LG

Demystifying Chains, Trees, and Graphs of Thoughts

Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler

详情
Journal ref
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 47, Issue 12, pages 10967-10989 (December 2025)
英文摘要

The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM's capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and other parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques.

2401.09244 2026-04-02 cs.CL

Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges

Aiqi Jiang, Arkaitz Zubiaga

Comments 35 pages, 7 figures

详情
Journal ref
ACM Computing Surveys 2026
英文摘要

The growing prevalence and rapid evolution of offensive language in social media amplify the complexities of detection, particularly highlighting the challenges in identifying such content across diverse languages. This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning (CLTL) techniques in offensive language detection in social media. Our study stands as the first holistic overview to focus exclusively on the cross-lingual scenario in this domain. We analyse 67 relevant papers and categorise these studies across various dimensions, including the characteristics of multilingual datasets used, the cross-lingual resources employed, and the specific CLTL strategies implemented. According to "what to transfer", we also summarise three main CLTL transfer approaches: instance, feature, and parameter transfer. Additionally, we shed light on the current challenges and future research opportunities in this field. Furthermore, we have made our survey resources available online, including two comprehensive tables that provide accessible references to the multilingual datasets and CLTL methods used in the reviewed literature.

2310.20641 2026-04-02 cs.LG

Performance Improvement in Multi-class Classification via Automated Hierarchy Generation and Exploitation through Extended LCPN Schemes

Celal Alagoz

详情
英文摘要

Hierarchical classification (HC) plays a pivotal role in multi-class classification tasks, where objects are organized into a hierarchical structure. This study explores the performance of HC through a comprehensive analysis that encompasses both hierarchy generation and hierarchy exploitation. This analysis is particularly relevant in scenarios where a predefined hierarchy structure is not readily accessible. Notably, two novel hierarchy exploitation schemes, LCPN+ and LCPN+F, which extend the capabilities of LCPN and combine the strengths of global and local classification, have been introduced and evaluated alongside existing methods. The findings reveal the consistent superiority of LCPN+F, which outperforms other schemes across various datasets and scenarios. Moreover, this research emphasizes not only effectiveness but also efficiency, as LCPN+ and LCPN+F maintain runtime performance comparable to Flat Classification (FC). Additionally, this study underscores the importance of selecting the right hierarchy exploitation scheme to maximize classification performance. This work extends our understanding of HC and establishes a benchmark for future research, fostering advancements in multi-class classification methodologies.

2303.08250 2026-04-02 cs.CV cs.LG

CHEEM: Continual Learning by Reuse, New, Adapt and Skip -- A Hierarchical Exploration-Exploitation Approach

Chinmay Savadikar, Michelle Dai, Tianfu Wu

Comments CVPR 2026

详情
英文摘要

To effectively manage the complexities of real-world dynamic environments, continual learning must incrementally acquire, update, and accumulate knowledge from a stream of tasks of different nature without suffering from catastrophic forgetting of prior knowledge. While this capability is innate to human cognition, it remains a significant challenge for modern deep learning systems. At the heart of this challenge lies the stability-plasticity dilemma: the need to balance leveraging prior knowledge, integrating novel information, and allocating model capacity adaptively based on task complexity and synergy. In this paper, we propose a novel exemplar-free class-incremental continual learning (ExfCCL) framework that addresses these issues through a Hierarchical Exploration-Exploitation (HEE) approach. The core of our method is a HEE-guided efficient neural architecture search (HEE-NAS) that enables a learning-to-adapt backbone via four primitive operations - reuse, new, adapt, and skip - thereby serving as an internal memory that dynamically updates selected components across streaming tasks. To address the task ID inference problem in ExfCCL, we exploit an external memory of task centroids proposed in the prior art. We term our method CHEEM (Continual Hierarchical-Exploration-Exploitation Memory). CHEEM is evaluated on the challenging MTIL and VDD benchmarks using both Tiny and Base Vision Transformers and a proposed holistic Figure-of-Merit (FoM) metric. It significantly outperforms state-of-the-art prompting-based continual learning methods, closely approaching full fine-tuning upper bounds. Furthermore, it learns adaptive model structures tailored to individual tasks in a semantically meaningful way. Our code is available at https://github.com/savadikarc/cheem .

2303.06561 2026-04-02 cs.LG cond-mat.dis-nn math.OC stat.ML

Phase Diagram of Initial Condensation for Two-layer Neural Networks

Zhengan Chen, Yuqing Li, Tao Luo, Zhangchen Zhou, Zhi-Qin John Xu

详情
Journal ref
CSIAM Transactions on Applied Mathematics 2024
英文摘要

The phenomenon of distinct behaviors exhibited by neural networks under varying scales of initialization remains an enigma in deep learning research. In this paper, based on the earlier work by Luo et al.~\cite{luo2021phase}, we present a phase diagram of initial condensation for two-layer neural networks. Condensation is a phenomenon wherein the weight vectors of neural networks concentrate on isolated orientations during the training process, and it is a feature in non-linear learning process that enables neural networks to possess better generalization abilities. Our phase diagram serves to provide a comprehensive understanding of the dynamical regimes of neural networks and their dependence on the choice of hyperparameters related to initialization. Furthermore, we demonstrate in detail the underlying mechanisms by which small initialization leads to condensation at the initial training stage.

2604.00324 2026-04-02 cs.LG cs.AI

The Persistent Vulnerability of Aligned AI Systems

Aengus Lynch

Comments PhD thesis, University College London, 2025. 157 pages. Supervised by Ricardo Silva

详情
英文摘要

Autonomous AI agents are being deployed with filesystem access, email control, and multi-step planning. This thesis contributes to four open problems in AI safety: understanding dangerous internal computations, removing dangerous behaviors once embedded, testing for vulnerabilities before deployment, and predicting when models will act against deployers. ACDC automates circuit discovery in transformers, recovering all five component types from prior manual work on GPT-2 Small by selecting 68 edges from 32,000 candidates in hours rather than months. Latent Adversarial Training (LAT) removes dangerous behaviors by optimizing perturbations in the residual stream to elicit failure modes, then training under those perturbations. LAT solved the sleeper agent problem where standard safety training failed, matching existing defenses with 700x fewer GPU hours. Best-of-N jailbreaking achieves 89% attack success on GPT-4o and 78% on Claude 3.5 Sonnet through random input augmentations. Attack success follows power law scaling across text, vision, and audio, enabling quantitative forecasting of adversarial robustness. Agentic misalignment tests whether frontier models autonomously choose harmful actions given ordinary goals. Across 16 models, agents engaged in blackmail (96% for Claude Opus 4), espionage, and actions causing death. Misbehavior rates rose from 6.5% to 55.1% when models stated scenarios were real rather than evaluations. The thesis does not fully resolve any of these problems but makes each tractable and measurable.

2604.00323 2026-04-02 cs.CL cs.CY

Large Language Models in the Abuse Detection Pipeline

Suraj Kath, Sanket Badhe, Preet Shah, Ashwin Sampathkumar, Shivani Gupta

详情
英文摘要

Online abuse has grown increasingly complex, spanning toxic language, harassment, manipulation, and fraudulent behavior. Traditional machine-learning approaches dependent on static classifiers and labor-intensive labeling struggle to keep pace with evolving threat patterns and nuanced policy requirements. Large Language Models introduce new capabilities for contextual reasoning, policy interpretation, explanation generation, and cross-modal understanding, enabling them to support multiple stages of modern safety systems. This survey provides a lifecycle-oriented analysis of how LLMs are being integrated into the Abuse Detection Lifecycle (ADL), which we define across four stages: (I) Label \& Feature Generation, (II) Detection, (III) Review \& Appeals, and (IV) Auditing \& Governance. For each stage, we synthesize emerging research and industry practices, highlight architectural considerations for production deployment, and examine the strengths and limitations of LLM-driven approaches. We conclude by outlining key challenges including latency, cost-efficiency, determinism, adversarial robustness, and fairness and discuss future research directions needed to operationalize LLMs as reliable, accountable components of large-scale abuse-detection and governance systems.

2604.00320 2026-04-02 cs.RO cs.SY eess.SY

Hierarchical Motion Planning and Control under Unknown Nonlinear Dynamics via Predicted Reachability

Zhiquan Zhang, Melkior Ornik

详情
英文摘要

Autonomous motion planning under unknown nonlinear dynamics requires learning system properties while navigating toward a target. In this work, we develop a hierarchical planning-control framework that enables online motion synthesis with limited prior system knowledge. The state space is partitioned into polytopes and approximates the unknown nonlinear system using a piecewise-affine (PWA) model. The local affine models are identified once the agent enters the corresponding polytopes. To reduce computational complexity, we introduce a non-uniform adaptive state space partition strategy that refines the partition only in task-relevant regions. The resulting PWA system is abstracted into a directed weighted graph, whose edge existence is incrementally verified using reach control theory and predictive reachability conditions. Certified edges are weighted using provable time-to-reach bounds, while uncertain edges are assigned information-theoretic weights to guide exploration. The graph is updated online as new data becomes available, and high-level planning is performed by graph search, while low-level affine feedback controllers are synthesized to execute the plan. Furthermore, the conditions of classical reach control theory are often difficult to satisfy in underactuated settings. We therefore introduce relaxed reachability conditions to extend the framework to such systems. Simulations demonstrate effective exploration-exploitation trade-offs with formal reachability guarantees.

2604.00319 2026-04-02 cs.AI cs.MA

Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry

Syed Eqbal Alam, Zhan Shu

详情
英文摘要

We develop algorithms for collaborative control of AI agents and critics in a multi-actor, multi-critic federated multi-agent system. Each AI agent and critic has access to classical machine learning or generative AI foundation models. The AI agents and critics collaborate with a central server to complete multimodal tasks such as fault detection, severity, and cause analysis in a network telemetry system, text-to-image generation, video generation, healthcare diagnostics from medical images and patient records, etcetera. The AI agents complete their tasks and send them to AI critics for evaluation. The critics then send feedback to agents to improve their responses. Collaboratively, they minimize the overall cost to the system with no inter-agent or inter-critic communication. AI agents and critics keep their cost functions or derivatives of cost functions private. Using multi-time scale stochastic approximation techniques, we provide convergence guarantees on the time-average active states of AI agents and critics. The communication overhead is a little on the system, of the order of $\mathcal{O}(m)$, for $m$ modalities and is independent of the number of AI agents and critics. Finally, we present an example of fault detection, severity, and cause analysis in network telemetry and thorough evaluation to check the algorithm's efficacy.

2604.00310 2026-04-02 cs.LG cs.AI

Robust Multimodal Safety via Conditional Decoding

Anurag Kumar, Raghuveer Peri, Jon Burnsky, Alexandru Nelus, Rohit Paturi, Srikanth Vishnubhotla, Yanjun Qi

Comments 8 pages + Appendix section. Submitted to ACL 2026

详情
英文摘要

Multimodal large-language models (MLLMs) often experience degraded safety alignment when harmful queries exploit cross-modal interactions. Models aligned on text alone show a higher rate of successful attacks when extended to two or more modalities. In this work, we propose a simple conditional decoding strategy, CASA (Classification Augmented with Safety Attention) that utilizes internal representations of MLLMs to predict a binary safety token before response generation. We introduce a novel safety attention module designed to enhance the model's ability to detect malicious queries. Our design ensures robust safety alignment without relying on any external classifier or auxiliary head, and without the need for modality-specific safety fine-tuning. On diverse benchmarks such as MM-SafetyBench, JailbreakV-28k, and adversarial audio tests, CASA lowers the average attack success rate by more than 97% across modalities and across attack types. Our empirical evaluations also show that CASA maintains strong utility in benign inputs, a result validated through both automated and human evaluations (via 13 trained annotators). Together, these results highlight CASA as a simple and generalizable framework to improve multimodal LLM safety.