arXivDaily: arXiv daily academic digest, updated Monday to Friday
2503.04404 2026-05-01 cs.LG cs.CR cs.NI

Temporal Analysis of NetFlow Datasets for Network Intrusion Detection Systems

Majed Luay, Siamak Layeghy, Seyedehfaezeh Hosseininoorbin, Mohanad Sarhan, Nour Moustafa, Marius Portmann

Abstract

This paper investigates the temporal analysis of NetFlow datasets for machine learning (ML)-based network intrusion detection systems (NIDS). Although many previous studies have highlighted the critical role of temporal features, such as inter-packet arrival time and flow length/duration, in NIDS, the currently available NetFlow datasets for NIDS lack these temporal features. This study addresses this gap by creating and making publicly available a set of NetFlow datasets that incorporate these temporal features [1]. With these temporal features, we provide a comprehensive temporal analysis of NetFlow datasets by examining the distribution of various features over time and presenting time-series representations of NetFlow features. This temporal analysis has not been previously provided in the existing literature. We also borrowed an idea from signal processing, time frequency analysis, and tested it to see how different the time frequency signal presentations (TFSPs) are for various attacks. The results indicate that many attacks have unique patterns, which could help ML models to identify them more easily.
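As a rough illustration of the temporal features named in the abstract (inter-packet arrival time and flow duration), the sketch below derives them from the packet timestamps of a single flow. The function name `temporal_features` and the output keys are hypothetical and are not taken from the published datasets.

```python
from statistics import mean


def temporal_features(packet_times):
    """Return flow duration and inter-packet arrival time (IAT) statistics.

    packet_times: packet timestamps in seconds for ONE flow, in any order.
    The feature names below are illustrative assumptions.
    """
    ts = sorted(packet_times)
    if not ts:
        raise ValueError("a flow needs at least one packet")
    # Gaps between consecutive packets of the same flow.
    iats = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "duration": ts[-1] - ts[0],
        "iat_min": min(iats) if iats else 0.0,
        "iat_max": max(iats) if iats else 0.0,
        "iat_mean": mean(iats) if iats else 0.0,
    }


features = temporal_features([10.0, 10.5, 12.0, 12.25])
print(features)
```

A single-packet flow has no gaps, so the sketch reports zero for every IAT statistic; a real exporter may choose a different convention.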

2503.01817 2026-05-01 cs.LG

Gradient-Based Optimization on Gödel Logic as Discrete Local Search

Alessandro Daniele, Emile van Krieken

Abstract

A fundamental challenge in neurosymbolic systems is applying continuous gradient-based optimization to discrete logical domains. While fuzzy relaxations provide differentiability, they often lack a formal structural alignment with classical logic. In this work, we show that Gödel semantics addresses this limitation through a homomorphism that maps its continuous interpretations to Boolean ones, allowing discrete variables to be encoded while maintaining full differentiability. Building on this foundation, we show that gradient-based optimization on Gödel logic instantiates a discrete local search for Boolean satisfiability. Our formal analysis proves that each optimization step identifies and modifies a single variable within an unsatisfied clause, precisely mimicking the steps of a discrete solver. We identify local optima as the primary limitation of such dynamics and introduce the Gödel Trick, a stochastic reparameterization technique designed to improve the exploration of the solution space. We further show a formal connection between this approach, probabilistic inference, and the Gumbel-Max trick. Experimental results on SAT benchmarks and the Visual Sudoku task validate our theoretical findings, demonstrating that our approach effectively navigates complex combinatorial landscapes and provides a solid foundation for differentiable discrete search.
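The two claims in this abstract can be sketched on a toy CNF formula, assuming the standard Gödel connectives (AND as min, OR as max, NOT as 1 - x) and a 0.5 threshold as the map to Boolean values. The clause encoding, function names, and step size are illustrative assumptions, and the Gödel Trick itself is not shown.

```python
def literal_value(x, lit):
    """Gödel value of a literal: x[i] for +i, 1 - x[i] for -i (1-based)."""
    v = x[abs(lit) - 1]
    return v if lit > 0 else 1.0 - v


def formula_value(x, clauses):
    """Gödel semantics for CNF: OR is max, AND is min."""
    return min(max(literal_value(x, l) for l in c) for c in clauses)


def to_boolean(x):
    """Threshold map to Boolean values (assumes no entry equals 0.5)."""
    return [v > 0.5 for v in x]


def boolean_value(b, clauses):
    """Classical truth value of the same CNF under assignment b."""
    return all(any(b[abs(l) - 1] == (l > 0) for l in c) for c in clauses)


def gradient_step(x, clauses, lr=0.2):
    """One (sub)gradient ascent step on formula_value.

    The min/max composition passes gradient to exactly one literal:
    the largest literal of the lowest-valued clause.
    """
    worst = min(clauses, key=lambda c: max(literal_value(x, l) for l in c))
    lit = max(worst, key=lambda l: literal_value(x, l))
    i = abs(lit) - 1
    new_x = list(x)
    new_x[i] += lr if lit > 0 else -lr
    return new_x, i


clauses = [[1, 2], [-1, 3]]  # (x1 OR x2) AND (NOT x1 OR x3)
x = [0.7, 0.2, 0.4]
new_x, changed = gradient_step(x, clauses)
print(formula_value(x, clauses), formula_value(new_x, clauses), changed)
```

In this example the step raises only the third variable, which flips the single unsatisfied clause, much as a flip-based local search solver would.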

2502.14698 2026-05-01 cs.LG cs.AI stat.AP stat.ML

General Uncertainty Estimation with Delta Variances

Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

Abstract

Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is however challenging to estimate efficiently for large neural networks. To this end we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification that is computationally efficient and convenient to implement. It can be applied to neural networks and more general functions composed of neural networks. As an example we consider a weather simulator with a neural-network-based step function inside -- here Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically, noting that special cases recover popular techniques, and present a unified perspective on multiple related methods. Finally, we observe that this general perspective gives rise to a natural extension and empirically show its benefit.
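The abstract does not spell out the estimator, but the name suggests the classical delta method, which approximates the variance of f(θ) as ∇f(θ)ᵀ Σ ∇f(θ) for a parameter covariance Σ. The sketch below illustrates only that textbook approximation on a toy function; it is not the paper's algorithm, and the finite-difference gradient and all names are assumptions made to keep the example dependency-free.

```python
def numerical_gradient(f, theta, h=1e-5):
    """Central finite-difference gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def delta_variance(f, theta, cov):
    """Delta-method variance: grad^T * cov * grad, using one gradient of f."""
    g = numerical_gradient(f, theta)
    n = len(theta)
    return sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))


# Toy quantity of interest: the product of two uncertain parameters.
theta = [2.0, 3.0]
cov = [[0.01, 0.0], [0.0, 0.04]]  # assumed parameter covariance
variance = delta_variance(lambda t: t[0] * t[1], theta, cov)
print(variance)  # close to 3^2 * 0.01 + 2^2 * 0.04 = 0.25
```

For a neural network, the gradient would normally come from automatic differentiation rather than finite differences, which is where the cost of a single gradient computation arises.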

2502.14541 2026-05-01 cs.CL

LLM-based User Profile Management for Recommender System

Seunghwan Bang, Hwanjun Song

Comments Accepted at SIGIR 2025 Workshop

Abstract

The rapid advancement of Large Language Models (LLMs) has opened new opportunities in recommender systems by enabling zero-shot recommendation without conventional training. Despite their potential, most existing works rely solely on users' purchase histories, leaving significant room for improvement by incorporating user-generated textual data, such as reviews and product descriptions. Addressing this gap, we propose PURE, a novel LLM-based recommendation framework that builds and maintains evolving user profiles by systematically extracting and summarizing key information from user reviews. PURE consists of three core components: a Review Extractor for identifying user preferences and key product features, a Profile Updater for refining and updating user profiles, and a Recommender for generating personalized recommendations using the most current profile. To evaluate PURE, we introduce a continuous sequential recommendation task that reflects real-world scenarios by adding reviews over time and updating predictions incrementally. Our experimental results on Amazon datasets demonstrate that PURE outperforms existing LLM-based methods, effectively leveraging long-term user information while managing token limitations.

2502.09282 2026-05-01 cs.CV cs.HC cs.LG

MsEdF: A Multi-stream Encoder-decoder Framework for Remote Sensing Image Captioning

Swadhin Das, Raksha Sharma

Abstract

Remote sensing images contain complex spatial patterns and semantic structures, which make it difficult for captioning models to describe them accurately. Encoder-decoder architectures have become a widely used approach for remote sensing image captioning (RSIC), translating visual content into descriptive text. However, many existing methods rely on a single-stream architecture, which weakens the model's ability to describe the image accurately. Such single-stream architectures typically struggle to extract diverse spatial features or capture complex semantic relationships, limiting their effectiveness in scenes with high intraclass similarity or contextual ambiguity. In this work, we propose a novel Multi-stream Encoder-decoder Framework (MsEdF) which improves the performance of RSIC by optimizing both the spatial representation and language generation of the encoder-decoder architecture. The encoder fuses information from two complementary image encoders, thereby promoting feature diversity through the integration of multiscale and structurally distinct cues. To improve the capture of context-aware descriptions, we refine the input sequence's semantic modeling on the decoder side using a stacked GRU architecture with an element-wise aggregation scheme. Experiments on three benchmark RSIC datasets show that MsEdF outperforms several baseline models.

2502.07189 2026-05-01 cs.LG stat.ML

Exploring Vision Neural Network Pruning via Screening Methodology

Mingyuan Wang, Yangzi Guo, Sida Liu, Yuhang Liu

Abstract

The remarkable performance of modern deep neural networks (DNNs) is largely driven by their massive scale, often comprising tens to hundreds of millions, or even billions, of parameters. However, such a scale incurs substantial storage and computational costs, hindering deployment on platforms such as edge devices that require energy-efficient and real-time processing. In this paper, we propose a network pruning framework that reduces both storage and computation requirements by an order of magnitude while preserving model accuracy. Our approach eliminates non-essential parameters through a statistical analysis of component significance across classification categories. Specifically, we employ an F-statistic-based screening technique combined with a weighted evaluation scheme to quantify the contributions of connections and channels, enabling both unstructured and structured pruning within a unified framework. Extensive experiments on real-world vision datasets, covering both fully connected neural networks (FNNs) and convolutional neural networks (CNNs), demonstrate that the proposed framework produces compact and efficient models that are highly competitive with state-of-the-art approaches.
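To make the screening idea concrete, the sketch below computes a one-way ANOVA F-statistic for each component from its values grouped by class, then keeps the components whose score exceeds a cutoff. This is the textbook F-statistic only; the paper's weighted evaluation scheme is not reproduced, and the channel names, data, and threshold of 4.0 are hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F-statistic for values grouped by class.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical activations of two channels, grouped by three classes.
activations = {
    "channel_a": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],
    "channel_b": [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
}
scores = {name: f_statistic(groups) for name, groups in activations.items()}
kept = [name for name, score in scores.items() if score > 4.0]  # assumed cutoff
print(scores, kept)
```

A channel whose values do not differ between classes scores zero and becomes a pruning candidate, while a channel that separates the classes scores high and is retained.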

2501.19143 2026-05-01 cs.AI cs.CR cs.CV

Imitation Game for Adversarial Disillusion with Chain-of-Thought Reasoning in Generative AI

Ching-Chun Chang, Fan-Yun Chen, Shih-Hong Gu, Kai Gao, Hanrui Wang, Isao Echizen

Journal ref
in IEEE Access, vol. 13, pp. 95085-95093, 2025
Abstract

As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model's general decision logic, and inductive illusion, where the victim model's general decision logic is shaped by specific stimuli. The former exploits the model's decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework, addressing vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluate the methodology under a variety of attack scenarios. Experimental results demonstrate that the proposed framework consistently neutralises both deductive and inductive adversarial illusions across diverse white-box and black-box attack scenarios.

2501.12632 2026-05-01 cs.CV cs.LG

TeD-Loc: Text Distillation for Weakly Supervised Object Localization

Shakeeb Murtaza, Soufiane Belharbi, Alexis Guichemerre, Marco Pedersoli, Eric Granger

Abstract

Weakly supervised object localization (WSOL) models are trained using only image-level class labels. They can predict both the object class and spatial regions corresponding to the object, without requiring explicit bounding box annotations. Given their reliance on classification objectives, traditional WSOL methods, like class activation mapping, tend to focus on the most discriminative object regions, often missing the full spatial extent. Although vision-language models such as CLIP encode rich semantic priors, they are not directly suited for WSOL because global text and class-token embeddings are not explicitly aligned with local patch embeddings, making patch-level localization difficult without additional mechanisms. Recent methods such as GenPrompt address this limitation, but at the cost of increased complexity, as they rely on conditional denoising and elaborate prompt-learning strategies. We propose Text Distillation for Localization (TeD-Loc), which transfers knowledge from CLIP text embeddings to patch embeddings through contrastive alignment, thereby enabling patch-level foreground/background localization. A localization-guided classification module is also introduced that uses localization scores to aggregate foreground patch embeddings for joint classification and localization in a single model. In addition, a QR-based orthogonalization of class text embeddings is applied before distillation to improve discrimination for semantically similar classes. Extensive experiments show that TeD-Loc improves Top-1 Loc by ~5% on CUB and ILSVRC, and PxAP by ~31% on histopathology benchmarks, while achieving more efficient inference than GenPrompt.
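The orthogonalization step mentioned in this abstract can be illustrated with Gram-Schmidt, which yields the Q factor of a QR factorization (up to sign conventions). How TeD-Loc applies QR in practice is not described in the abstract, so the sketch is a generic illustration with made-up embeddings, and the function names are hypothetical.

```python
import math


def orthonormalize(vectors):
    """Modified Gram-Schmidt; assumes the input vectors are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of w that lies along q.
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two made-up embeddings for semantically similar classes.
class_embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
orthogonal = orthonormalize(class_embeddings)
print(cosine(*class_embeddings), cosine(*orthogonal))
```

Before orthogonalization the two embeddings are nearly parallel; afterwards their cosine similarity is zero, which is the sense in which the step can help separate similar classes.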

2501.08469 2026-05-01 cs.RO cs.SY eess.SY

Electrostatic Clutch-Based Mechanical Multiplexer with Increased Force Capability

Timothy E. Amish, Jeffrey T. Auletta, Chad C. Kessens, Joshua R. Smith, Jeffrey I. Lipton

Abstract

Robotic systems with many degrees of freedom (DoF) are constrained by the demands of dedicating a motor to each joint, and while mechanical multiplexing reduces actuator count, existing clutch designs are bulky, force-limited, or restricted to one output at a time. The problem addressed in this study is how to achieve high-force multiplexing that supports both simultaneous and sequential control from a single motor. Here we show an electrostatic capstan clutch-based transmission that enables both single-input-single-output (SISO) and single-input-multiple-output (SIMO) multiplexing. We demonstrated these on a four-DoF tendon-driven robotic hand where a single motor achieved output forces of up to 212 N, increased vertical grip strength by 4.09 times, and raised horizontal carrying capacity to 111.2 N, the highest currently among five-fingered tendon-driven robotic hands. These results demonstrate that electrostatic-based multiplexing provides versatile actuation, overcoming the limitations of prior systems.

2501.04066 2026-05-01 cs.LG cs.AR

Federated Knowledge Distillation for Multi-Model Architectures Lithography Hotspot Detection

Yuqi Li, Xingyou Lin, Yanli Li, Kai Zhang, Chuanguang Yang, Zhongliang Guo, Jianping Gou, Tingwen Huang, Yingli Tian

Comments Accepted by ICME 2026

Abstract

As a special type of multimedia data, Lithography Hotspot Detection (LHD) training often requires stronger privacy protection than conventional multimedia data, and federated learning provides a promising potential solution to this challenge. However, existing approaches rely solely on either parameter aggregation or Knowledge Distillation (KD), failing to fully exploit the potential of collaborative learning. To address this, we propose FedKD-hybrid, a novel framework that synergizes the strengths of both paradigms. Specifically, FedKD-hybrid utilizes a public dataset to facilitate consensus, where clients exchange both parameters of agreed-upon layers and logits. This hybrid information is aggregated to refine local models, enhancing knowledge transfer. Extensive experiments on ICCAD-2012 and real-world FAB datasets demonstrate that FedKD-hybrid consistently outperforms state-of-the-art methods in both effectiveness and robustness.

2501.02770 2026-05-01 cs.AI cs.MA cs.RO

Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic Leading

Hoang-Dung Bui, Erion Plaku, Gregory J. Stein

Journal ref
Journal of Artificial Intelligence Research, 85(46) (2026)
Abstract

This paper proposes a novel planning framework to handle a multi-agent pathfinding problem under a team-connected communication constraint, where all agents must have a connected communication channel to the rest of the team during their entire movements. Standard multi-agent pathfinding approaches (e.g., priority-based search) have potential in this domain but fail when neighboring configurations at start and goal differ. Their single-expansion approach -- computing each agent's path from the start to the goal in just a single expansion -- cannot reliably handle planning under communication constraints for agents as their neighbors change during navigation. Similarly, leader-follower approaches (e.g., platooning) are effective at maintaining team communication, but fixing the leader at the outset of planning can cause planning to become stuck in dense-clutter environments, limiting their practical utility. To overcome this limitation, we propose a novel two-level multi-agent pathfinding framework that integrates two techniques: adaptive path expansion to expand agent paths to their goals in multiple stages; and a dynamic leading technique that enables the reselection of the leading agent during each agent path expansion whenever progress cannot be made. Simulation experiments show the efficiency of our planners, which can handle up to 25 agents across five environment types under a limited communication range constraint and up to 11-12 agents on three environment types under a line-of-sight communication constraint, exceeding a 90% success rate where baselines routinely fail.

2412.16720 2026-05-01 cs.AI

OpenAI o1 System Card

OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, Andrey Mishchenko, Andy Applebaum, Angela Jiang, Ashvin Nair, Barret Zoph, Behrooz Ghorbani, Bohan Zhang, Ben Rossen, Benjamin Sokolowsky, Boaz Barak, Bob McGrew, Borys Minaiev, Botao Hao, Bowen Baker, Brandon Houghton, Brandon McKinzie, Brydon Eastman, Camillo Lugaresi, Cary Bassin, Cary Hudson, Chak Ming Li, Charles de Bourcy, Chelsea Voss, Chen Shen, Chong Zhang, Chris Koch, Chris Orsinger, Christopher Hesse, Claudia Fischer, Clive Chan, Dan Roberts, Daniel Kappler, Daniel Levy, Daniel Selsam, David Dohan, David Farhi, David Mely, David Robinson, Dimitris Tsipras, Doug Li, Dragos Oprica, Eben Freeman, Eddie Zhang, Edmund Wong, Elizabeth Proehl, Enoch Cheung, Eric Mitchell, Eric Wallace, Erik Ritter, Evan Mays, Fan Wang, Felipe Petroski Such, Filippo Raso, Florencia Leoni, Foivos Tsimpourlas, Francis Song, Fred von Lohmann, Freddie Sulit, Geoff Salmon, Giambattista Parascandolo, Gildas Chabot, Grace Zhao, Greg Brockman, Guillaume Leclerc, Hadi Salman, Haiming Bao, Hao Sheng, Hart Andrin, Hessam Bagherinezhad, Hongyu Ren, Hunter Lightman, Hyung Won Chung, Ian Kivlichan, Ian O'Connell, Ian Osband, Ignasi Clavera Gilaberte, Ilge Akkaya, Ilya Kostrikov, Ilya Sutskever, Irina Kofman, Jakub Pachocki, James Lennon, Jason Wei, Jean Harb, Jerry Twore, Jiacheng Feng, Jiahui Yu, Jiayi Weng, Jie Tang, Jieqi Yu, Joaquin Quiñonero Candela, Joe Palermo, Joel Parish, Johannes Heidecke, John Hallman, John Rizzo, Jonathan Gordon, Jonathan Uesato, Jonathan Ward, Joost Huizinga, Julie Wang, Kai Chen, Kai Xiao, Karan Singhal, Karina Nguyen, Karl Cobbe, Katy Shi, Kayla Wood, Kendra Rimbach, Keren Gu-Lemberg, Kevin Liu, Kevin Lu, Kevin Stone, Kevin Yu, Lama Ahmad, Lauren Yang, Leo Liu, Leon Maksin, Leyton Ho, Liam Fedus, Lilian Weng, Linden Li, Lindsay McCallum, Lindsey Held, Lorenz Kuhn, Lukas Kondraciuk, Lukasz Kaiser, Luke Metz, Madelaine Boyd, Maja Trebacz, Manas Joglekar, Mark Chen, Marko Tintor, Mason Meyer, Matt Jones, Matt Kaufer, Max Schwarzer, Meghan Shah, Mehmet Yatbaz, Melody Y. Guan, Mengyuan Xu, Mengyuan Yan, Mia Glaese, Mianna Chen, Michael Lampe, Michael Malek, Michele Wang, Michelle Fradin, Mike McClay, Mikhail Pavlov, Miles Wang, Mingxuan Wang, Mira Murati, Mo Bavarian, Mostafa Rohaninejad, Nat McAleese, Neil Chowdhury, Nick Ryder, Nikolas Tezak, Noam Brown, Ofir Nachum, Oleg Boiko, Oleg Murk, Olivia Watkins, Patrick Chao, Paul Ashbourne, Pavel Izmailov, Peter Zhokhov, Rachel Dias, Rahul Arora, Randall Lin, Rapha Gontijo Lopes, Raz Gaon, Reah Miyara, Reimar Leike, Renny Hwang, Rhythm Garg, Robin Brown, Roshan James, Rui Shu, Ryan Cheu, Ryan Greene, Saachi Jain, Sam Altman, Sam Toizer, Sam Toyer, Samuel Miserendino, Sandhini Agarwal, Santiago Hernandez, Sasha Baker, Scott McKinney, Scottie Yan, Shengjia Zhao, Shengli Hu, Shibani Santurkar, Shraman Ray Chaudhuri, Shuyuan Zhang, Siyuan Fu, Spencer Papay, Steph Lin, Suchir Balaji, Suvansh Sanjeev, Szymon Sidor, Tal Broda, Aidan Clark, Tao Wang, Taylor Gordon, Ted Sanders, Tejal Patwardhan, Thibault Sottiaux, Thomas Degry, Thomas Dimson, Tianhao Zheng, Timur Garipov, Tom Stasi, Trapit Bansal, Trevor Creech, Troy Peterson, Tyna Eloundou, Valerie Qi, Vineet Kosaraju, Vinnie Monaco, Vitchyr Pong, Vlad Fomenko, Weiyi Zheng, Wenda Zhou, Wenting Zhan, Wes McCabe, Wojciech Zaremba, Yann Dubois, Yinghai Lu, Yining Chen, Young Cha, Yu Bai, Yuchen He, Yuchen Zhang, Yunyun Wang, Zheng Shao, Zhuohan Li

Abstract

The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

2411.18929 2026-05-01 cs.CV cs.AI cs.LG

VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference

Sakshi Agarwal, Gabriel Hope, Jimin Heo, Erik B. Sudderth

Comments Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2026, Tangier, Morocco. PMLR Volume 300

Abstract

Diffusion probabilistic models learn to remove noise added during training, generating novel data (e.g., images) from Gaussian noise through sequential denoising. However, conditioning the generative process on corrupted or masked images is challenging. While various methods have been proposed for inpainting masked images with diffusion priors, they often fail to produce samples from the true conditional distribution, especially for large masked regions. Many baselines also cannot be applied to latent diffusion models which generate high-quality images with much lower computational cost. We propose a hierarchical variational inference algorithm that optimizes a non-Gaussian Markov approximation of the true diffusion posterior. Our VIPaint method outperforms existing approaches to inpainting, producing diverse high-quality imputations even for state-of-the-art text-conditioned latent diffusion models, and is also effective for other inverse problems like deblurring and superresolution.

2411.18796 2026-05-01 cs.LG q-bio.QM

Graph-Based Biomarker Discovery and Interpretation for Alzheimer's Disease

Maryam Khalid, Fadeel Sher Khan, John Broussard, Arko Barman

Abstract

Early diagnosis and discovery of therapeutic drug targets are crucial objectives for effective management of Alzheimer's Disease (AD). Current approaches for AD diagnosis and treatment planning are based on radiological imaging and largely inaccessible for population-level screening due to prohibitive costs and limited availability. Recently, blood tests have shown promise in diagnosing AD and highlighting possible biomarkers that can be used as drug targets for AD management. Blood tests are significantly more accessible to disadvantaged populations, cost-effective, and minimally invasive. However, biomarker discovery in the context of AD diagnosis is complex as there exist important associations between various biomarkers. Here, we introduce BRAIN (Biomarker Representation, Analysis, and Interpretation Network), a novel machine learning (ML) framework to jointly optimize diagnostic accuracy and biomarker discovery processes to identify all relevant biomarkers that contribute to AD diagnosis. Using a holistic graph-based representation for biomarkers, we highlight their interdependencies and explain why different ML models identify different discriminative biomarkers. We apply BRAIN to a publicly available blood biomarker dataset, revealing three novel biomarker subnetworks whose interactions vary between the control and AD groups, offering a new paradigm for drug discovery and biomarker analysis for AD.

2409.16808 2026-05-01 cs.CV cs.AR cs.DC cs.LG cs.SE

Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices

Daghash K. Alqahtani, Aamir Cheema, Adel N. Toosi

Abstract

Modern applications, such as autonomous vehicles, require deploying deep learning algorithms on resource-constrained edge devices for real-time image and video processing. However, there is limited understanding of the efficiency and performance of various object detection models on these devices. In this paper, we evaluate state-of-the-art object detection models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet). We deployed these models on popular edge devices like the Raspberry Pi 3, 4, and 5 with/without TPU accelerators, and Jetson Orin Nano, collecting key performance metrics such as energy consumption, inference time, and Mean Average Precision (mAP). Our findings highlight that lower mAP models such as SSD MobileNet V1 are more energy-efficient and faster in inference, whereas higher mAP models like YOLOv8 Medium generally consume more energy and have slower inference, though with exceptions when accelerators like TPUs are used. Among the edge devices, Jetson Orin Nano stands out as the fastest and most energy-efficient option for request handling, despite having the highest idle energy consumption. These results emphasize the need to balance accuracy, speed, and energy efficiency when deploying deep learning models on edge devices, offering valuable guidance for practitioners and researchers selecting models and devices for their applications.

2408.13122 2026-05-01 cs.LG cs.AI cs.IT math.IT

Semantic Variational Bayes Based on Semantic Information G Theory for Solving Latent Variables

Chenguang Lu

Comments 22 pages, 6 figures, 38 references

Abstract

The Variational Bayesian method (VB) is used to solve the probability distributions of latent variables with the minimum free energy criterion. This criterion is not easy to understand, and the computation is complex. For these reasons, this paper proposes the Semantic Variational Bayes' method (SVB). The Semantic Information Theory the author previously proposed extends the rate-distortion function R(D) to the rate-fidelity function R(G), where R is the minimum mutual information for given semantic mutual information G. SVB came from the parameter solution of R(G), where the variational and iterative methods originated from Shannon et al.'s research on the rate-distortion function. The constraint functions SVB uses include likelihood, truth, membership, similarity, and distortion functions. SVB uses the maximum information efficiency (G/R) criterion, including the maximum semantic information criterion for optimizing model parameters and the minimum mutual information criterion for optimizing the Shannon channel. For the same tasks, SVB is computationally simpler than VB. The computational experiments in the paper include 1) using a mixture model as an example to show that the mixture model converges as G/R increases; 2) demonstrating the application of SVB in data compression with a group of error ranges as the constraint; 3) illustrating how the semantic information measure and SVB can be used for maximum entropy control and reinforcement learning in control tasks with given range constraints, providing numerical evidence for balancing control's purposiveness and efficiency. Further research is needed to apply SVB to neural networks and deep learning.

2408.12622 2026-05-01 cs.AI cs.CR cs.ET cs.LG cs.SY eess.SY

The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence

Peter Slattery, Alexander K. Saeri, Emily A. C. Grundy, Jess Graham, Michael Noetel, Risto Uuk, James Dao, Soroush Pour, Stephen Casper, Neil Thompson

Journal ref
Patterns 101517 (2026)
Abstract

The risks posed by Artificial Intelligence (AI) are of considerable concern to academics, auditors, policymakers, AI companies, and the public. However, a lack of shared understanding of AI risks can impede our ability to comprehensively discuss, research, and react to them. This paper addresses this gap by creating an AI Risk Repository to serve as a common frame of reference. This comprises a living database of 777 risks extracted from 43 taxonomies, which can be filtered based on two overarching taxonomies and easily accessed, modified, and updated via our website and online spreadsheets. We construct our Repository with a systematic review of taxonomies and other structured classifications of AI risk followed by an expert consultation. We develop our taxonomies of AI risk using a best-fit framework synthesis. Our high-level Causal Taxonomy of AI Risks classifies each risk by its causal factors (1) Entity: Human, AI; (2) Intentionality: Intentional, Unintentional; and (3) Timing: Pre-deployment; Post-deployment. Our mid-level Domain Taxonomy of AI Risks classifies risks into seven AI risk domains: (1) Discrimination & toxicity, (2) Privacy & security, (3) Misinformation, (4) Malicious actors & misuse, (5) Human-computer interaction, (6) Socioeconomic & environmental, and (7) AI system safety, failures, & limitations. These are further divided into 23 subdomains. The AI Risk Repository is, to our knowledge, the first attempt to rigorously curate, analyze, and extract AI risk frameworks into a publicly accessible, comprehensive, extensible, and categorized risk database. This creates a foundation for a more coordinated, coherent, and complete approach to defining, auditing, and managing the risks posed by AI systems.

2408.02679 2026-05-01 cs.LG cs.GR cs.HC stat.ME

Visual Analysis of Multi-outcome Causal Graphs

Mengjie Fan, Jinlu Yu, Daniel Weiskopf, Nan Cao, Huai-Yu Wang, Liang Zhou

Journal ref
IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 656-666, 2025
Abstract

We introduce a visual analysis method for multiple causal graphs with different outcome variables, namely, multi-outcome causal graphs. Multi-outcome causal graphs are important in healthcare for understanding multimorbidity and comorbidity. To support the visual analysis, we collaborated with medical experts to devise two comparative visualization techniques at different stages of the analysis process. First, a progressive visualization method is proposed for comparing multiple state-of-the-art causal discovery algorithms. The method can handle mixed-type datasets comprising both continuous and categorical variables and assist in the creation of a fine-tuned causal graph of a single outcome. Second, a comparative graph layout technique and specialized visual encodings are devised for the quick comparison of multiple causal graphs. In our visual analysis approach, analysts start by building individual causal graphs for each outcome variable, and then, multi-outcome causal graphs are generated and visualized with our comparative technique for analyzing differences and commonalities of these causal graphs. Evaluation includes quantitative measurements on benchmark datasets, a case study with a medical expert, and expert user studies with real-world health research data.

2407.19001 2026-05-01 cs.CV

Effective Prompt Pool Learning for Continual Category Discovery

Fernando Julio Cendra, Xinghui Li, Kai Han

Comments Under review. Extended version of our ECCV 2024 paper, see arXiv:2407.19001v2

Abstract

This paper studies effective prompt pool learning for Continual Category Discovery (CCD), a challenging open-world setting where a model must discover novel categories from a continuous stream of unlabelled data containing both known and novel classes, while mitigating catastrophic forgetting of previously learned concepts. We introduce a series of novel prompt-pool-based frameworks for CCD, each exploring a different design of prompt pools. First, we propose PromptCCD, which focuses on global class prototypes via a Gaussian Mixture Prompt (GMP) module. GMP fits a generative Gaussian mixture model over feature embeddings, where each mixture component serves as both a class prototype and a dynamic prompt that conditions the backbone's representations. This design enables label-free prompt selection and on-the-fly estimation of the number of emerging categories. Through a systematic spectrum study, we then show that category count, rather than sample size, is the primary bottleneck for discovery performance, motivating the need for finer-grained representations. Building on this finding, we propose PromptCCD++, which focuses on object-part prototypes via Part-level Prompting (PLP) modules. PLP decomposes the prompt pool into multiple specialized part-level prompt pools. During the discovery phase, these pools dynamically assign part-specific prompts to local object regions without the need for manual part annotations, enabling the model to learn object-part representations that boost category discovery. Extensive evaluations on both generic and fine-grained benchmarks, supported by comprehensive ablation studies, demonstrate the effectiveness of our framework for CCD.

2406.04808 2026-05-01 cs.LG cs.HC

VERA: Generating Visual Explanations of Two-Dimensional Embeddings via Region Annotation

Pavlin G. Poličar, Blaž Zupan

Abstract

Two-dimensional embeddings obtained from dimensionality reduction techniques such as MDS, t-SNE, or UMAP, are widely used to visualize high-dimensional data and support researchers in visually identifying clusters, outliers, and other interesting patterns in the data. However, the main challenge is not only to detect such patterns, but to explain what they represent in terms of the original, human-interpretable features of the data. Existing approaches often rely on interactive exploration or direct feature encodings, requiring substantial manual inspection that can be time-consuming and repetitive. As an alternative, we propose VERA (Visual Explanations via Region Annotation), a general-purpose method for explaining two-dimensional embeddings through automatically generated, static, region-based visual explanations. VERA identifies informative regions in the embedding space and associates them with user-provided human-interpretable features, producing concise visual annotations that summarize the structure of the embedding landscape at a glance. Rather than merely showing where feature values occur, VERA automatically filters, merges, and ranks candidate explanations, enabling users to focus on the most informative embedding structures without manual exploration. We demonstrate VERA's utility on several real-world datasets and evaluate its effectiveness in a user study comparing it with the utility of a comprehensive interactive data mining toolkit. Our results show that VERA's generated static explanations can convey the essential insights of complex embeddings and support users in typical exploratory data analysis tasks, while requiring significantly less time and user effort.

2305.02251 2026-05-01 cs.AI cs.LG

Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems

Stefan Kramer, Mattia Cerrato, Jannis Brugger, Sašo Džeroski, Ross King

Comments 19 pages plus references

Journal ref
Machine Learning (2026) 115:109
Abstract

The paper surveys automated scientific discovery, from equation discovery and symbolic regression to autonomous discovery systems and agents. It discusses the individual approaches from a "big picture" perspective and in context, but also discusses open issues and recent topics like the various roles of deep neural networks in this area, aiding in the discovery of human-interpretable knowledge. Further, we will present closed-loop scientific discovery systems, starting with the pioneering work on the Adam system up to current efforts in fields from material science to astronomy. Finally, we will elaborate on autonomy from a machine learning perspective, but also in analogy to the autonomy levels in autonomous driving. The maximal level, level five, is defined to require no human intervention at all in the production of scientific knowledge. Achieving this is one step towards solving the Nobel Turing Grand Challenge to develop AI Scientists: AI systems capable of making Nobel-quality scientific discoveries highly autonomously at a level comparable, and possibly superior, to the best human scientists by 2050.

2211.06762 2026-05-01 cs.RO

Adaptive Nonlinear MPC for Trajectory Tracking of An Overactuated Tiltrotor Hexacopter

Yueqian Liu, Fengyu Quan, Haoyao Chen

Comments (1) Eq. (10) sign error, inconsistent with Eq. (14). (2) Eq. (15) spurious Coriolis term (skips transport theorem). (3) typo before Eq. (21): _Bω_dot_EKF → _Bτ_dot_EKF. (4) Sec. IV comparison lacks systematic tuning and does not support its claims. (5) the open-source release at github.com/HITSZ-NRSL/omniHex will not happen

Abstract

Omnidirectional micro aerial vehicles (OMAVs) are more capable of doing environmentally interactive tasks due to their ability to exert full wrenches while maintaining stable poses. However, OMAVs often incorporate additional actuators and complex mechanical structures to achieve omnidirectionality. Obtaining precise mathematical models is difficult, and the mismatch between the model and the real physical system is not trivial. The large model-plant mismatch significantly degrades overall system performance if a non-adaptive model predictive controller (MPC) is used. This work presents the $\mathcal{L}_1$-MPC, an adaptive nonlinear model predictive controller for accurate 6-DOF trajectory tracking of an overactuated tiltrotor hexacopter in the presence of model uncertainties and external disturbances. The $\mathcal{L}_1$-MPC adopts a cascaded system architecture in which a nominal MPC is followed and augmented by an $\mathcal{L}_1$ adaptive controller. The proposed method is evaluated against the non-adaptive MPC, the EKF-MPC, and the PID method in both numerical and PX4 software-in-the-loop simulations with Gazebo. The $\mathcal{L}_1$-MPC reduces the tracking error by around 90% compared to a non-adaptive MPC, and it has lower tracking errors, higher uncertainty estimation rates, and fewer tuning requirements than the EKF-MPC. We will make the implementations, including the hardware-verified PX4 firmware and Gazebo plugins, open-source at https://github.com/HITSZ-NRSL/omniHex.

1907.11158 2026-05-01 cs.CL

Cross-Lingual Transfer for Distantly Supervised and Low-resources Indonesian NER

Fariz Ikhwantri

Abstract

Manually annotated corpora for low-resource languages are usually small in quantity (gold), or large but distantly supervised (silver). Inspired by recent progress in applying pre-trained language models (LMs) to many Natural Language Processing (NLP) tasks, we propose to fine-tune pre-trained language models from high-resource languages to low-resource languages to improve performance in both scenarios. Our empirical experiments demonstrate significant improvement when fine-tuning a pre-trained language model in cross-lingual transfer scenarios for a small gold corpus, and competitive results on a large silver corpus compared to supervised cross-lingual transfer, which is useful when there is no parallel annotation in the same task to begin with. We compare our proposed method of cross-lingual transfer using a pre-trained LM to different sources of transfer, such as a mono-lingual LM and Part-of-Speech (POS) tagging, in the downstream task on both large silver and small gold NER datasets by exploiting the character-level input of a bi-directional language model task.

2604.28186 2026-05-01 cs.GT cs.AI cs.CC cs.LG econ.TH

Computing Equilibrium beyond Unilateral Deviation

Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

Abstract

Most familiar equilibrium concepts, such as Nash and correlated equilibrium, guarantee only that no single player can improve their utility by deviating unilaterally. They offer no guarantees against profitable coordinated deviations by coalitions. Although the literature proposes solution concepts that provide stability against multilateral deviations (e.g., strong Nash and coalition-proof equilibrium), these generally fail to exist. In this paper, we study an alternative solution concept that minimizes coalitional deviation incentives, rather than requiring them to vanish, and is therefore guaranteed to exist. Specifically, we focus on minimizing the average gain of a deviating coalition, and extend the framework to weighted-average and maximum-within-coalition gains. In contrast, the minimum-gain analogue is shown to be computationally intractable. For the average-gain and maximum-gain objectives, we prove a lower bound on the complexity of computing such an equilibrium and present an algorithm that matches this bound. Finally, we use our framework to solve the Exploitability Welfare Frontier (EWF), the maximum attainable social welfare subject to a given exploitability (the maximum gain over all unilateral deviations).
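
To make the two quantities behind the EWF concrete, here is a minimal sketch that is not taken from the paper. It assumes a two-player normal-form game with independent mixed strategies and computes exploitability exactly as the abstract defines it (the maximum gain over all unilateral deviations) alongside social welfare. The function names and the Prisoner's Dilemma payoffs are illustrative assumptions, and the coalition-level objectives studied in the paper are not implemented.

```python
def expected_payoff(payoff, x, y):
    """Expected payoff when row plays mixed strategy x and column plays y."""
    return sum(x[i] * y[j] * payoff[i][j]
               for i in range(len(x)) for j in range(len(y)))


def exploitability(A, B, x, y):
    """Maximum gain any single player can obtain by deviating unilaterally.

    A[i][j] and B[i][j] are the row and column player's payoffs (assumed layout).
    """
    u_row = expected_payoff(A, x, y)
    u_col = expected_payoff(B, x, y)
    # A best pure deviation suffices: payoffs are linear in a player's own strategy.
    best_row = max(sum(y[j] * A[i][j] for j in range(len(y)))
                   for i in range(len(x)))
    best_col = max(sum(x[i] * B[i][j] for i in range(len(x)))
                   for j in range(len(y)))
    return max(best_row - u_row, best_col - u_col)


def social_welfare(A, B, x, y):
    """Sum of both players' expected payoffs."""
    return expected_payoff(A, x, y) + expected_payoff(B, x, y)


# Toy Prisoner's Dilemma (illustrative payoffs): action 0 = cooperate, 1 = defect.
A = [[3.0, 0.0], [5.0, 1.0]]
B = [[3.0, 5.0], [0.0, 1.0]]
cooperate = [1.0, 0.0]
defect = [0.0, 1.0]

print(exploitability(A, B, cooperate, cooperate), social_welfare(A, B, cooperate, cooperate))
print(exploitability(A, B, defect, defect), social_welfare(A, B, defect, defect))
```

In this toy game, mutual cooperation has higher welfare but positive exploitability, while mutual defection has zero exploitability and lower welfare, which is the kind of trade-off a welfare-versus-exploitability frontier describes.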

2604.28176 2026-05-01 quant-ph cs.LG

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders

Emma Andrews, Sahan Sanjaya, Prabhat Mishra

Abstract

Machine learning models can learn from data samples to carry out various tasks efficiently. When data samples are adversarially manipulated, such as by insertion of carefully crafted noise, the model can be led to make mistakes. Quantum machine learning models are also vulnerable to such adversarial attacks, especially in image classification using variational quantum classifiers. While there are promising defenses against these adversarial perturbations, such as training with adversarial samples, they face practical limitations. For example, they are not applicable in scenarios where training with adversarial samples is either not possible or can overfit the models on one type of attack. In this paper, we propose an adversarial training-free defense framework that utilizes a quantum autoencoder to purify the adversarial samples through reconstruction. Moreover, our defense framework provides a confidence metric to identify potentially adversarial samples that cannot be purified by the quantum autoencoder. Extensive evaluation demonstrates that our defense framework can significantly outperform the state of the art in prediction accuracy (up to 68%) under adversarial attacks.

2604.28167 2026-05-01 cond-mat.soft cs.LG

Mapping the Phase Diagram of the Vicsek Model with Machine Learning

Grace T. Bai, Brandon B. Le

Comments 8 pages, 3 figures

Abstract

In this study, we use machine learning to classify and interpolate the phase structure of the Vicsek flocking model across the three-dimensional parameter space $(η,ρ,v_0)$. We construct a dataset of simulated parameter points and characterize each point using long-time dynamical observables. These observables are then used as inputs to a K-Means clustering procedure, which assigns each point to a disorder, order, or coexistence phase. Using these clustered labels, we train a neural-network classifier to learn the mapping from model parameters to phase behavior, achieving a classification accuracy of 0.92. The resulting phase map resolves a narrow coexistence region separating the ordered and disordered phases and extends the inferred phase boundaries beyond the originally sampled simulation points. More broadly, this approach provides a systematic way to convert sparse simulation data into a global phase diagram for collective-motion models.
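
The clustering step described above can be sketched as follows. This is only an illustration under stated assumptions: the two observables per parameter point (a mean order parameter and a fluctuation measure) and their values are made up, the initial centroids are chosen by hand for determinism, and the plain Lloyd's-algorithm `kmeans` function is a hypothetical stand-in for whatever K-Means implementation the authors used. The neural-network classifier that follows in the paper is not shown.

```python
import math


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up observables per simulated parameter point:
# (mean order parameter, fluctuation of the order parameter).
observables = [
    (0.05, 0.01), (0.08, 0.02),   # low order, low fluctuation
    (0.50, 0.30), (0.55, 0.28),   # intermediate order, high fluctuation
    (0.95, 0.02), (0.97, 0.01),   # high order, low fluctuation
]
init = [observables[0], observables[2], observables[4]]

labels, centroids = kmeans(observables, init)
print(labels)
```

The resulting cluster indices would then serve as training labels for a classifier that maps the model parameters to a phase.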

2604.28163 2026-05-01 eess.SP cs.LG stat.CO stat.ML

Sequential Inference for Gaussian Processes: A Signal Processing Perspective

Daniel Waxman, Fernando Llorente, Petar M. Djurić

Comments 53 pages, 7 figures. Accepted to IEEE Signal Processing Magazine

Abstract

The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in its nearly 100-year history. ML models support the development of SP systems that represent complex, nonlinear relationships with high predictive accuracy. Adapting these models often requires sequential inference, which differs both theoretically and methodologically from the usual paradigm of ML, where data are often assumed independent and identically distributed. Gaussian processes (GPs) are a flexible yet principled framework for modeling random functions, and they have become increasingly relevant to SP as statistical and ML methods assume a more prominent role. We provide a self-contained, tutorial-style overview of GPs, with a particular focus on recent methodological advances in sequential, incremental, or streaming inference. We introduce these techniques from a signal-processing perspective while bridging them to recent advances in ML. Many of the developments we survey have direct applications to state-space modeling, sequential regression and forecasting, anomaly detection in time series, sequential Bayesian optimization, adaptive and active sensing, and sequential detection and decision-making. By organizing these advances from a signal-processing perspective, we intend to equip practitioners with practical tools and a coherent roadmap for deploying sequential GP models in real-world systems.

2604.28142 2026-05-01 cs.IR cs.LG

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

Comments 6 pages, 2 figures, SIGIR 2026

Abstract

Multivector retrieval models achieve state-of-the-art effectiveness through fine-grained token-level representations, but their deployment incurs substantial computational and memory costs. Current solutions, based on the well-known k-means clustering algorithm, group similar vectors together to enable both effective compression and efficient retrieval. However, standard k-means scales poorly with the number of clusters and dataset size, and favours frequent tokens during training while underrepresenting rare, discriminative ones. In this work, we introduce TACHIOM, a multivector retrieval system that exploits token-level structure to significantly accelerate both clustering and retrieval. By accounting for tokens' distribution during centroid allocation, TACHIOM easily scales to millions of centroids, enabling highly accurate document scoring using only centroids, avoiding expensive token-level computation. TACHIOM combines a graph-based index over centroids with an optimized Product Quantization layout for efficient final scoring. Experiments on MS-MARCOv1 and LoTTE show that TACHIOM achieves up to $247\times$ faster clustering than k-means and up to $9.8\times$ retrieval speedup over state-of-the-art systems while maintaining comparable or superior effectiveness.

2604.28138 2026-05-01 cs.OS cs.AI

Crab: A Semantics-Aware Checkpoint/Restore Runtime for Agent Sandboxes

Tianyuan Wu, Chaokun Chang, Lunxi Cao, Wei Gao, Wei Wang

Comments 15 pages, 21 figures

Abstract

Autonomous agents act through sandboxed containers and microVMs whose state spans filesystems, processes, and runtime artifacts. Checkpoint and restore (C/R) of this state is needed for fault tolerance, spot execution, RL rollout branching, and safe rollback, yet existing approaches fall into two extremes: application-level recovery preserves chat history but misses OS-side effects, while full per-turn checkpointing is correct but too expensive under dense co-location. The root cause is an agent-OS semantic gap: agent frameworks see tool calls but not their OS effects; the OS sees state changes but lacks turn-level context to judge recovery relevance. This gap hides massive sparsity: over 75% of agent turns produce no recovery-relevant state, so most checkpoints are unnecessary. Crab (Checkpoint-and-Restore for Agent SandBoxes) is a transparent host-side runtime that bridges this gap without modifying agents or C/R backends. An eBPF-based inspector classifies each turn's OS-visible effects to decide checkpoint granularity; a coordinator aligns checkpoints with turn boundaries and overlaps C/R with LLM wait time; and a host-scoped engine schedules checkpoint traffic across co-located sandboxes. On shell-intensive and code-repair workloads, Crab raises recovery correctness from 8% (chat-only) to 100%, cuts checkpoint traffic by up to 87%, and stays within 1.9% of fault-free execution time.

2604.28129 2026-05-01 cs.CR cs.AI

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Prashant Kulkarni

Abstract

Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks where individual turns appear benign. We show this attack path leaves an activation-level signature in the model's residual stream: each phase shift moves the activation, producing a total path length far exceeding that of benign conversations. We call this adversarial restlessness. Five scalar trajectory features capturing this signal lift conversation-level detection from 76.2% to 93.8% on synthetic held-out data. The signal replicates across four model families (24B-70B); probes are model-specific and do not transfer across architectures. Generalization is source-dependent: leave-one-source-out evaluation shows each of synthetic, LMSYS-Chat-1M, and SafeDialBench captures distinct attack distributions, with detection on real-world LMSYS reaching 47-71% when its distribution is represented in training. Combined three-source training achieves 89.4% detection at 2.4% false positive rate on a held-out mixed set. We further show that three-phase turn-level labels (benign/pivoting/adversarial) unique to our synthetic dataset are essential: binary conversation-level labels produce 50-59% false positives. These results establish adversarial restlessness as a reliable activation-level signal and characterize the data requirements for practical deployment.
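
The "total path length" idea can be illustrated with a small sketch. It assumes one activation vector per conversation turn and measures path length as the sum of Euclidean distances between consecutive turns; the abstract does not specify the exact metric or the five trajectory features, so the `path_length` function and the 2-D toy vectors below are hypothetical.

```python
import math


def path_length(trajectory):
    """Sum of Euclidean distances between consecutive per-turn vectors."""
    total = 0.0
    for prev, curr in zip(trajectory, trajectory[1:]):
        total += math.dist(prev, curr)
    return total


# Toy 2-D "activations", one vector per turn (made-up values).
benign = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1)]   # small drift between turns
attack = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]   # large jumps at phase shifts

print(path_length(benign))
print(path_length(attack))
```

A trajectory that jumps at each phase shift accumulates a much longer path than one that drifts slowly, even when the start and end points coincide, which is why path length can separate the two cases where endpoint distance alone would not.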