arXivDaily: daily arXiv digest, updated Monday to Friday
2411.01076 2026-02-12 cs.CL cs.AI cs.CR cs.DC cs.LG

When Speculation Spills Secrets: Side Channels via Speculative Decoding In LLMs

Jiankun Wei, Abdulrahman Abdulrazzag, Tianchen Zhang, Adel Muursepp, Gururaj Saileshwar

Abstract

Deployed large language models (LLMs) often rely on speculative decoding, a technique that generates and verifies multiple candidate tokens in parallel, to improve throughput and latency. In this work, we reveal a new side channel whereby input-dependent patterns of correct and incorrect speculations can be inferred by monitoring per-iteration token counts or packet sizes. In evaluations using research prototypes and production-grade vLLM serving frameworks, we show that an adversary monitoring these patterns can fingerprint user queries (from a set of 50 prompts) with over 75% accuracy across four speculative-decoding schemes at temperature 0.3: REST (100%), LADE (91.6%), BiLD (95.2%), and EAGLE (77.6%). Even at temperature 1.0, accuracy remains far above the 2% random baseline: REST (99.6%), LADE (61.2%), BiLD (63.6%), and EAGLE (24%). We also show that the attacker can leak confidential datastore contents used for prediction at rates exceeding 25 tokens/sec. To defend against these attacks, we propose and evaluate a suite of mitigations, including packet padding and iteration-wise token aggregation.
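The fingerprinting idea can be illustrated with a toy nearest-neighbour classifier over observed traces. This is a minimal sketch, assuming the adversary has already recorded one reference trace of accepted tokens per decoding iteration for each candidate prompt; the prompt labels, the distance function, and `pad_trace` are hypothetical and are not the authors' attack or defense code.

```python
def trace_distance(a, b):
    """Count per-iteration mismatches, plus a penalty for differing trace lengths."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def fingerprint(observed, reference_traces):
    """Return the label of the reference trace closest to the observed trace."""
    return min(reference_traces,
               key=lambda label: trace_distance(observed, reference_traces[label]))


def pad_trace(trace, max_tokens):
    """Padding-style mitigation (hypothetical): every iteration reports max_tokens."""
    return [max_tokens] * len(trace)


# Hypothetical per-iteration accepted-token counts for three known prompts.
references = {"q1": [1, 3, 1, 2], "q2": [2, 2, 2, 2], "q3": [4, 1, 1]}
print(fingerprint([1, 3, 1, 1], references))  # prints q1
```

After padding, `pad_trace(references["q1"], 4)` and `pad_trace(references["q2"], 4)` are identical, so per-iteration counts no longer separate those prompts; in this toy model the number of iterations still distinguishes `q3`, so padding alone does not hide everything.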

2410.18970 2026-02-12 cs.AI cs.LG

Bridging Explainability and Embeddings: BEE Aware of Spuriousness

Cristian Daniel Păduraru, Antonio Bărbălau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu

Comments ICLR 2026

Abstract

Current methods for detecting spurious correlations rely on analyzing dataset statistics or error patterns, leaving many harmful shortcuts invisible when counterexamples are absent. We introduce BEE (Bridging Explainability and Embeddings), a framework that shifts the focus from model predictions to the weight space and to the embedding geometry underlying decisions. By analyzing how fine-tuning perturbs pretrained representations, BEE uncovers spurious correlations that remain hidden from conventional evaluation pipelines. We use linear probing as a transparent diagnostic lens, revealing spurious features that not only persist after full fine-tuning but also transfer across diverse state-of-the-art models. Our experiments cover numerous datasets and domains: vision (Waterbirds, CelebA, ImageNet-1k), language (CivilComments, MIMIC-CXR medical notes), and multiple embedding families (CLIP, CLIP-DataComp.XL, mGTE, BLIP2, SigLIP2). BEE consistently exposes spurious correlations, from concepts that reduce ImageNet accuracy by up to 95% to clinical shortcuts in MIMIC-CXR notes that induce dangerous false negatives. Together, these results position BEE as a general, principled tool for diagnosing spurious correlations in weight space, enabling systematic dataset auditing and more trustworthy foundation models. The source code is publicly available at https://github.com/bit-ml/bee.
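To make "analyzing the weight space with a linear probe" concrete, the toy below ranks candidate concepts by how well they align with a probe's decision direction. It is an illustrative sketch only: the class-mean difference stands in for a trained linear probe, and the 3-D embeddings and concept vectors are hand-written assumptions, not BEE's procedure or data.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def probe_direction(class0, class1):
    """Class-mean difference, used here as a stand-in for a trained linear probe."""
    return [b - a for a, b in zip(mean(class0), mean(class1))]


def rank_concepts(direction, concepts):
    """Sort concept names by cosine similarity to the probe direction."""
    return sorted(concepts, key=lambda name: cosine(direction, concepts[name]), reverse=True)


# Toy 3-D embeddings: dim 0 ~ bird shape, dim 1 ~ water background, dim 2 ~ noise.
landbirds = [[0.1, -1.0, 0.2], [0.2, -0.9, -0.1]]
waterbirds = [[0.3, 1.0, 0.1], [0.2, 1.1, -0.2]]
concepts = {"water background": [0, 1, 0], "land background": [0, -1, 0], "feathers": [1, 0, 0]}
w = probe_direction(landbirds, waterbirds)
print(rank_concepts(w, concepts))
```

In this toy, the background concept dominates the probe direction while the object concept barely registers, which is the kind of shortcut signal a weight-space audit would flag.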

2410.05762 2026-02-12 cs.CV

Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading

Fang Gao, Xuetao Li, Jiabao Wang, Shengheng Ma, Jun Yu

Journal ref IEEE Transactions on Human-Machine Systems, 1-13, 2026

Abstract

With the development of steel materials, metallographic analysis has become increasingly important. Unfortunately, grain size analysis is a manual process that requires experts to evaluate metallographic photographs, which is unreliable and time-consuming. To address this problem, we propose a novel classification method based on deep learning, namely GSNets, a family of hybrid models that introduce guided self-attention for classifying grain size. Concretely, we build our models on three insights: (1) our novel guided self-attention module helps the model find generalized necessarily distinct vectors that retain intricate relational connections and rich local feature information; (2) by improving the pixel-wise linear independence of the feature map, the model captures a highly condensed semantic representation; (3) our novel triple-stream merging module significantly improves the generalization capability and efficiency of the model. Experiments show that our GSNet achieves a classification accuracy of 90.1%, surpassing the state-of-the-art Swin Transformer V2 by 1.9% on the steel grain size dataset, which comprises 3,599 images with 14 grain size levels. Furthermore, we believe our approach may be applicable to broader tasks such as object detection and semantic segmentation.
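One hedged way to quantify "pixel-wise linear independence" is the numerical rank of the feature map flattened to an (H·W) × C matrix of pixel vectors. The sketch below assumes that reading; the normalisation and the function names are illustrative choices, not the GSNet module.

```python
def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def pixelwise_independence(feature_map):
    """Rank of the (H*W) x C pixel matrix, normalised to [0, 1]."""
    pixels = [vec for row in feature_map for vec in row]
    return matrix_rank(pixels) / min(len(pixels), len(pixels[0]))


# A 2x2 feature map with C = 2 channels whose pixel vectors span both channels,
# versus one whose pixel vectors are all multiples of a single direction.
rich = [[[1, 0], [0, 1]], [[1, 1], [2, 2]]]
collapsed = [[[1, 2], [2, 4]], [[3, 6], [0, 0]]]
print(pixelwise_independence(rich), pixelwise_independence(collapsed))
```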

2410.03847 2026-02-12 cs.LG cs.AI

Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping

Simon Sinong Zhan, Philip Wang, Qingyuan Wu, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

Abstract

In this paper, we aim to tackle a limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments, where its theoretical results do not hold and its performance degrades. To address this issue, we propose a novel method that infuses dynamics information into reward shaping, with a theoretical guarantee for the induced optimal policy in stochastic environments. Incorporating these model-enhanced rewards, we present a Model-Enhanced AIRL framework, which integrates transition model estimation directly into reward shaping. Furthermore, we provide a comprehensive theoretical analysis of the reward error bound and performance difference bound for our method. Experimental results on MuJoCo benchmarks show that our method achieves superior performance in stochastic environments and competitive performance in deterministic environments, with significant improvement in sample efficiency compared to existing baselines.
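AIRL's shaped reward has the form g(s,a) + γh(s') − h(s), which depends on the single sampled next state s'. One plausible way to "infuse dynamics information" is to replace h(s') by its expectation under an estimated transition model T̂. The sketch below assumes that reading; the tabular dictionaries and values are hypothetical, not the paper's formulation.

```python
def airl_shaped_reward(g, h, s, a, s_next, gamma):
    """AIRL-style shaping with the sampled next state: g(s,a) + gamma*h(s') - h(s)."""
    return g[(s, a)] + gamma * h[s_next] - h[s]


def model_enhanced_reward(g, h, T_hat, s, a, gamma):
    """Hypothetical model-enhanced variant: replace h(s') with its expectation
    under an estimated transition model T_hat[(s, a)] = {s': probability}."""
    expected_h = sum(p * h[s2] for s2, p in T_hat[(s, a)].items())
    return g[(s, a)] + gamma * expected_h - h[s]


g = {(0, "a"): 1.0}                    # reward approximator g(s, a)
h = {0: 0.0, 1: 2.0, 2: -2.0}          # shaping potential h(s)
T_hat = {(0, "a"): {1: 0.5, 2: 0.5}}   # learned stochastic transition

print(airl_shaped_reward(g, h, 0, "a", 1, 0.9), airl_shaped_reward(g, h, 0, "a", 2, 0.9))
print(model_enhanced_reward(g, h, T_hat, 0, "a", 0.9))
```

With this stochastic transition, the sampled shaping term swings between 2.8 and -0.8 depending on which next state is drawn, while the model-based version returns their expectation, 1.0.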

2408.01253 2026-02-12 cs.AI cs.SY eess.SY q-bio.NC

Metareasoning in uncertain environments: a meta-BAMDP framework

Prakhar Godara, Tilman Diego Alemán

Abstract

\textit{Reasoning} may be viewed as an algorithm $P$ that chooses an action $a^* \in \mathcal{A}$, aiming to optimize some outcome. However, executing $P$ itself incurs costs (time, energy, limited capacity, etc.), which must be weighed against the explicit utility obtained from the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, a setting that encompasses a far larger and more realistic set of planning problems faced by humans and AI systems. As a first step, we apply the framework to Bernoulli bandit tasks. Owing to the meta problem's complexity, our solutions are necessarily approximate. However, we introduce two novel theorems that significantly enhance the tractability of the problem, enabling stronger approximations that are robust within a range of assumptions grounded in realistic human decision-making scenarios. These results offer a resource-rational perspective and a normative framework for understanding human exploration under cognitive constraints, and they provide experimentally testable predictions about human behavior in Bernoulli bandit tasks.
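For Bernoulli bandits, the object-level Bayes-adaptive MDP has a compact form: the belief state is the success/failure count of each arm under a Beta prior, and the Bayes-optimal value follows from dynamic programming over those counts. The sketch below assumes a uniform Beta(1, 1) prior and a finite horizon; it deliberately omits the meta level (the cost of computation), which is the paper's actual contribution.

```python
from functools import lru_cache


def bayes_optimal_value(horizon, n_arms=2, prior=(1, 1)):
    """Expected total reward of the Bayes-optimal policy in a Bernoulli bandit."""

    @lru_cache(maxsize=None)
    def value(counts, steps_left):
        if steps_left == 0:
            return 0.0
        best = float("-inf")
        for arm in range(n_arms):
            s, f = counts[arm]
            # Posterior mean of the arm's success probability under Beta(prior).
            p = (prior[0] + s) / (prior[0] + prior[1] + s + f)
            win = list(counts)
            win[arm] = (s + 1, f)
            lose = list(counts)
            lose[arm] = (s, f + 1)
            q = p * (1 + value(tuple(win), steps_left - 1)) \
                + (1 - p) * value(tuple(lose), steps_left - 1)
            best = max(best, q)
        return best

    return value(tuple((0, 0) for _ in range(n_arms)), horizon)


print(bayes_optimal_value(1), bayes_optimal_value(2))
```

The state space grows polynomially in the horizon for each arm, which hints at why the meta problem, which reasons about whether to spend computation on such lookahead, quickly becomes intractable without approximations.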

2407.06454 2026-02-12 cs.RO cs.MA

Simplification of Robotic System Model Analysis by Petri Net Meta-Model Property Transfer

Maksym Figat, Cezary Zieliński

Comments 16 pages

Journal ref IEEE Access, 2026

Abstract

This paper presents a simplification of robotic system model analysis achieved by transferring Robotic System Hierarchical Petri Net (RSHPN) meta-model properties onto the model of a designed system. Key contributions include: 1) analysis of RSHPN meta-model properties; 2) decomposition of RSHPN analysis into the analysis of individual Petri nets, thereby reducing state-space explosion; and 3) transfer of RSHPN meta-model properties onto the produced models, thereby eliminating the need for full re-analysis of the RSHPN model when creating new robotic systems. Only the task-dependent parts of the model need to be analysed. This approach streamlines the analysis, thus reducing design time. Moreover, it produces a specification that is a solid foundation for the implementation of the system. The obtained results highlight the potential of Petri nets as a valuable formal framework for analysing robotic system properties.
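Why decomposition curbs state-space explosion can be seen with a generic reachability enumeration. The sketch below uses a plain place/transition net (not RSHPN) with two independent two-state components: analysed separately they have 2 + 2 reachable markings, while the flat net has 2 × 2; with k such components the flat count grows as 2^k against 2k.

```python
from collections import deque


def reachable_markings(initial, transitions):
    """Breadth-first enumeration of the reachability set of a bounded Petri net.

    initial:     tuple with the token count of each place.
    transitions: list of (consume, produce) vectors over the same places.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for consume, produce in transitions:
            if all(tokens >= need for tokens, need in zip(marking, consume)):
                nxt = tuple(t - c + p for t, c, p in zip(marking, consume, produce))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


# One two-state component: a token moves between places p0 and p1.
component = [((1, 0), (0, 1)), ((0, 1), (1, 0))]

# Two such components side by side, analysed as one flat net.
flat = [
    ((1, 0, 0, 0), (0, 1, 0, 0)), ((0, 1, 0, 0), (1, 0, 0, 0)),
    ((0, 0, 1, 0), (0, 0, 0, 1)), ((0, 0, 0, 1), (0, 0, 1, 0)),
]
print(len(reachable_markings((1, 0), component)))   # 2 markings per component
print(len(reachable_markings((1, 0, 1, 0), flat)))  # 2 * 2 = 4 markings in the flat net
```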

2406.14986 2026-02-12 cs.AI cs.CL

Implicit Probabilistic Reasoning Does Not Reflect Explicit Answers in Large Language Models

Manuel Mondal, Ljiljana Dolamic, Gérôme Bovet, Philippe Cudré-Mauroux, Julien Audiffren

Comments Published in Transactions on Machine Learning Research

Abstract

The handling of probabilities, in the form of uncertainty or partial information, is an essential task for LLMs in many settings and applications. A common approach to evaluating an LLM's probabilistic reasoning capabilities is to assess its ability to answer probability-related questions through multiple-choice questions (MCQs). However, this paradigm, which we refer to as explicit probabilistic reasoning, has been shown in the literature to have significant limitations (e.g., sensitivity to answer ordering). In this work, we introduce an alternative approach, named implicit probabilistic reasoning, which evaluates the models' ability to integrate probabilistic reasoning into their text generation process. To achieve this, we rephrase MCQs as text-completion scenarios with a fixed set of possible outcomes and compare the model's next-token probability assignments to the true likelihood of the outcomes. In line with previous work, we find that models exhibit solid performance in explicit probabilistic reasoning (i.e., answers to MCQs). However, during text completion (i.e., implicit probabilistic reasoning), where the same information must be taken into account to generate text, the models' predictions often diverge significantly from the known ground truth. For instance, our evaluation method reveals that implicit probabilistic reasoning is improperly influenced by many factors, such as independent prior events, partial observations about a result, or statistical background information. All of these issues can cause erroneous results in text generation that conventional MCQ-based evaluation does not detect.
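Comparing next-token probabilities with the true outcome likelihoods can be sketched as: restrict the model's distribution to the outcome tokens, renormalise, and measure the gap to the ground truth. The log-probabilities below are hypothetical, and total variation distance is one assumed choice of metric, not necessarily the paper's.

```python
import math


def outcome_distribution(logprobs, outcomes):
    """Renormalize next-token log-probabilities over a fixed set of outcome tokens."""
    weights = {o: math.exp(logprobs[o]) for o in outcomes}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


def total_variation(p, q):
    """Total variation distance between two distributions over the same outcomes."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


# Hypothetical log-probabilities for the token completing "The fair coin landed on ..."
logprobs = {"heads": math.log(0.6), "tails": math.log(0.2), "the": math.log(0.1)}
model = outcome_distribution(logprobs, ["heads", "tails"])
truth = {"heads": 0.5, "tails": 0.5}
print(model, total_variation(model, truth))
```

Here the renormalised model puts 0.75 on "heads" for a fair coin, a 0.25 total-variation gap that a multiple-choice question about the same coin would not reveal.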

2404.00443 2026-02-12 cs.RO cs.SY eess.SY

Dynamic Motion/Force Control of Mobile Manipulators via Extended-UDE

Songqun Gao, Wendi Ding, Maotong Cheng, Qinyuan Ren, Ben M. Chen

Journal ref Control Engineering Practice, Volume 167, 2026, 106650

Abstract

Mobile manipulators are known for their superior mobility over fixed-base manipulators, offering promising applications in smart industry and housekeeping scenarios. The dynamic coupling between the mobile base and the manipulator presents challenges for force-interactive tasks of the mobile manipulator, yet current strategies often fail to account for this coupling in such scenarios. To address this, this paper presents a dynamic coupling-integrated manipulator model that requires only the manipulator dynamics and the mobile base kinematics, which simplifies the modeling process. In addition, building on this dynamic model, we propose an extended uncertainty and disturbance estimator (UDE) for the mobile manipulator, which separately estimates the dynamic coupling terms and other unmodeled uncertainties and incorporates them into the feedforward and feedback control loops, respectively. The proposed approach speeds up the system's response and improves the dynamic robot-environment interaction (REI) performance of the mobile manipulator. A series of simulations and experiments on a wall-cleaning task are conducted to verify the effectiveness of the proposed approach. Ablation studies demonstrate that the proposed approach significantly improves motion/force tracking performance when the mobile base is in dynamic motion.
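The core idea of a UDE, independent of the paper's extension, is to low-pass filter the mismatch between measured and modelled dynamics to recover the lumped disturbance, then cancel it in the control law. Below is a minimal simulation of a classic UDE on a scalar plant ẋ = ax + bu + d; the gains, filter time constant T, and constant disturbance are arbitrary illustrative values, and ẋ is read directly from the simulator, whereas practical UDE implementations rearrange the filter to avoid numerical differentiation.

```python
def simulate_ude(a=1.0, b=1.0, d=2.0, k=5.0, T=0.05, dt=1e-3, steps=5000, x0=1.0):
    """Euler simulation of x_dot = a*x + b*u + d with an unknown constant d.

    The UDE estimates d by low-pass filtering (first-order filter, time
    constant T) the mismatch x_dot - a*x - b*u, and the controller cancels it.
    """
    x, d_hat = x0, 0.0
    for _ in range(steps):
        # Cancel the known dynamics and the estimated disturbance; add feedback -k*x.
        u = (-a * x - d_hat - k * x) / b
        x_dot = a * x + b * u + d  # true plant; d is not used by the controller
        # First-order filter: d_hat tracks the lumped disturbance.
        d_hat += dt / T * ((x_dot - a * x - b * u) - d_hat)
        x += dt * x_dot
    return x, d_hat


x_final, d_estimate = simulate_ude()
print(x_final, d_estimate)
```

After 5 simulated seconds, the estimate has converged to the true disturbance and the state has been driven to the origin; in the paper's extended version, coupling terms and residual uncertainties are estimated separately and routed to feedforward and feedback paths.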

2402.01070 2026-02-12 cs.LG

FedShift: Robust Federated Learning Aggregation Scheme in Resource Constrained Environment via Weight Shifting

Jungwon Seo, Minhoe Kim, Chunming Rong

Abstract

Federated Learning (FL) commonly relies on a central server to coordinate training across distributed clients. While effective, this paradigm suffers from significant communication overhead, impacting overall training efficiency. To mitigate this, prior work has explored compression techniques such as quantization. However, in heterogeneous FL settings, clients may employ different quantization levels based on their hardware or network constraints, necessitating a mixed-precision aggregation process at the server. This introduces additional challenges, exacerbating client drift and leading to performance degradation. In this work, we propose FedShift, a novel aggregation methodology designed to mitigate performance degradation in FL scenarios with mixed quantization levels. FedShift employs a statistical matching mechanism based on weight shifting to align mixed-precision models, thereby reducing model divergence and addressing quantization-induced bias. Our approach functions as an add-on to existing FL optimization algorithms, enhancing their robustness and improving convergence. Empirical results demonstrate that FedShift effectively mitigates the negative impact of mixed-precision aggregation, yielding superior performance across various FL benchmarks.
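The bias that mixed-precision aggregation can introduce, and a shift-based correction, can be illustrated with a deliberately simple rule: a floor quantizer biases one client's weights downward, and the server shifts that client's weights so their mean matches the full-precision clients before averaging. This mean-matching rule and all names are hypothetical illustrations of "statistical matching via weight shifting", not FedShift's actual algorithm.

```python
import math


def quantize_floor(weights, step):
    """Biased uniform quantizer: rounds every weight down to a multiple of step."""
    return [math.floor(w / step) * step for w in weights]


def mean(values):
    return sum(values) / len(values)


def shift_to_reference(weights, reference_mean):
    """Shift a model's weights so that their mean matches reference_mean."""
    offset = reference_mean - mean(weights)
    return [w + offset for w in weights]


def aggregate(full_precision_models, low_precision_models):
    """Average all clients after aligning low-precision models to the
    mean weight of the full-precision clients (hypothetical rule)."""
    reference = mean([mean(w) for w in full_precision_models])
    aligned = [shift_to_reference(w, reference) for w in low_precision_models]
    models = full_precision_models + aligned
    return [mean(list(column)) for column in zip(*models)]


fp_client = [0.32, 0.71, -0.18]
q_client = quantize_floor([0.33, 0.69, -0.17], step=0.25)  # -> [0.25, 0.5, -0.25]
global_model = aggregate([fp_client], [q_client])
print(global_model)
```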

2401.01273 2026-02-12 cs.LG

Learning-based agricultural management in partially observable environments subject to climate variability

Zhaoan Wang, Shaoping Xiao, Junchao Li, Jun Wang

Abstract

Agricultural management, with a particular focus on fertilization strategies, plays a central role in shaping crop yield, economic profitability, and environmental sustainability. While conventional guidelines offer valuable insights, their efficacy diminishes under extreme weather conditions such as heatwaves and droughts. In this study, we introduce a framework that integrates Deep Reinforcement Learning (DRL) with Recurrent Neural Networks (RNNs). Using the Gym-DSSAT simulator, we train an agent to learn optimal nitrogen fertilization management. Through a series of simulation experiments on corn crops in Iowa, we compare Partially Observable Markov Decision Process (POMDP) models with Markov Decision Process (MDP) models. Our results highlight the advantages of using sequential observations to develop more efficient nitrogen input policies. Additionally, we explore the impact of climate variability, particularly during extreme weather events, on agricultural outcomes and management. Our findings demonstrate that fertilization policies can adapt to varying climate conditions. Notably, a fixed policy remains robust to minor climate fluctuations, maintaining good corn yields, cost-effectiveness, and environmental conservation. However, under extreme weather events the agent must be retrained to acquire new optimal policies. This research points toward adaptable fertilization strategies that align with dynamic climate scenarios, contributing to the optimization of crop management practices.

2310.03111 2026-02-12 cs.LG q-bio.NC

Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data

Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley

Comments Updated version published in ICLR 2024

Journal ref In The Twelfth International Conference on Learning Representations. (2024)

Abstract

Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically designed for only a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM that extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with a temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared across modalities or specific to each modality. We parameterize the latents of our model in the Fourier domain and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) not only identifies the shared and independent latent structure across modalities accurately but also provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework in two real-world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
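The shared/independent partition can be sketched as a generative model: three smooth latents are drawn from a GP prior with an RBF kernel; spikes depend on the shared and neural-only latents through a Poisson observation model, and behavior depends on the shared and behavior-only latents. This is a toy forward sampler under stated assumptions: `tanh` stands in for the deep decoder, the linear weights are arbitrary, and the Fourier-domain parameterization and variational inference are not shown.

```python
import math
import random


def rbf_kernel(times, length_scale, variance=1.0):
    """Squared-exponential covariance, the usual GPFA-style smoothness prior."""
    return [[variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2) for t2 in times]
            for t1 in times]


def cholesky(K, jitter=1e-4):
    """Lower-triangular L with L L^T = K + jitter * I."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] + jitter - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def sample_gp(times, length_scale, rng):
    """Draw one smooth latent trajectory from a zero-mean GP prior."""
    L = cholesky(rbf_kernel(times, length_scale))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]


def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for small rates)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_multimodal(n_steps=20, length_scale=3.0, seed=0):
    rng = random.Random(seed)
    times = list(range(n_steps))
    shared = sample_gp(times, length_scale, rng)         # drives both modalities
    neural_only = sample_gp(times, length_scale, rng)    # drives spikes only
    behavior_only = sample_gp(times, length_scale, rng)  # drives behavior only
    rates = [math.exp(0.5 * s + 0.5 * n) for s, n in zip(shared, neural_only)]
    spikes = [poisson(r, rng) for r in rates]
    # tanh is a placeholder for the deep decoder used by a GP-VAE.
    behavior = [math.tanh(s + b) for s, b in zip(shared, behavior_only)]
    return spikes, behavior


spikes, behavior = sample_multimodal()
print(spikes)
```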

2305.05311 2026-02-12 cs.CL cs.AI

Structured Sentiment Analysis as Transition-based Dependency Graph Parsing

Daniel Fernández-González

Comments Final peer-reviewed manuscript accepted for publication in Artificial Intelligence Review

Abstract

Structured sentiment analysis (SSA) aims to automatically extract people's opinions from natural-language text and represent that information in a graph structure. One of the most accurate recent approaches casts SSA as a dependency graph parsing task. Although transition-based algorithms have been shown to excel in various dependency graph parsing tasks in terms of accuracy and efficiency, all previous attempts to tackle SSA with this formulation relied on graph-based models. In this article, we present the first transition-based method to address SSA as dependency graph parsing. Specifically, we design a transition system that processes the input text in a left-to-right pass, incrementally generating the graph structure containing all identified opinions. To implement our final transition-based model, we use a Pointer Network architecture as a backbone. An extensive evaluation shows that our model offers the best performance to date in practically all cases among prior dependency-based methods, and surpasses recent task-specific techniques on the most challenging datasets. We additionally include an in-depth analysis and empirically show that the average-case time complexity of our approach is quadratic in the sentence length, making it more efficient than top-performing graph-based parsers.
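A left-to-right transition system that builds a (possibly multi-headed) opinion graph can be sketched with two hypothetical actions: `ARC(head, label)` attaches an incoming arc to the current token, and `SHIFT` moves to the next token. The action names, the oracle, and the example labels are assumptions for illustration, not the paper's transition system; in a pointer network each ARC decision scores up to n candidate heads at each of n positions, which gives one intuition for quadratic cost.

```python
def oracle(n_tokens, gold_arcs):
    """Derive a left-to-right action sequence for a gold graph.

    Tokens are numbered 1..n_tokens and 0 is an artificial root. For the current
    token, emit one ARC action per incoming arc, then SHIFT to the next token.
    """
    actions = []
    for dep in range(1, n_tokens + 1):
        for head, d, label in sorted(gold_arcs):
            if d == dep:
                actions.append(("ARC", head, label))
        actions.append(("SHIFT",))
    return actions


def run(actions):
    """Apply actions left to right, building a (possibly multi-headed) graph."""
    arcs, current = set(), 1
    for action in actions:
        if action[0] == "ARC":
            _, head, label = action
            arcs.add((head, current, label))
        else:  # SHIFT
            current += 1
    return arcs


# Hypothetical SSA-style graph for "I really love this phone" (tokens 1..5).
gold = {(0, 3, "exp:positive"), (3, 1, "holder"), (3, 2, "exp"),
        (3, 5, "target"), (5, 4, "target")}
print(oracle(5, gold))
```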

2207.04196 2026-02-12 cs.RO

Robotic Depowdering for Additive Manufacturing Via Pose Tracking

Zhenwei Liu, Junyi Geng, Xikai Dai, Tomasz Swierzewski, Kenji Shimada

Comments Github link: https://github.com/zhenweil/Robotic-Depowdering-for-Additive-Manufacturing-Via-Pose-Tracking Video link: https://www.youtube.com/watch?v=AUIkyULAhqM

Journal ref 2022 IEEE Robotics and Automation Letters

Abstract

With the rapid development of powder-based additive manufacturing, depowdering, the process of removing unfused powder that covers 3D-printed parts, has become a major bottleneck to further improving productivity. Traditional manual depowdering is extremely time-consuming and costly, and some prior automated systems either require pre-depowdering or lack adaptability to different 3D-printed parts. To solve these problems, we introduce a robotic system that automatically removes unfused powder from the surface of 3D-printed parts. The key component is a visual perception system, which consists of a pose-tracking module that tracks the 6D pose of powder-occluded parts in real time, and a progress estimation module that estimates the depowdering completion percentage. The tracking module runs efficiently on a laptop CPU at up to 60 FPS. Experiments show that our depowdering system can remove unfused powder from the surface of various 3D-printed parts without causing any damage. To the best of our knowledge, this is one of the first vision-based robotic depowdering systems that adapt to parts with various shapes without the need for pre-depowdering.
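A completion percentage can be defined once the part's pose is tracked: project the part model into the image and count how much of that projection is no longer covered by powder. The sketch assumes this mask-based definition and pre-computed boolean masks; it is a hypothetical illustration, not the paper's progress estimation module.

```python
def depowdering_progress(part_mask, powder_mask):
    """Fraction of the part's projected pixels that are no longer covered by powder.

    part_mask:   2-D list of bools, True where the tracked part model projects.
    powder_mask: 2-D list of bools, True where powder is detected in the image.
    """
    part_pixels = 0
    clean_pixels = 0
    for part_row, powder_row in zip(part_mask, powder_mask):
        for is_part, is_powder in zip(part_row, powder_row):
            if is_part:
                part_pixels += 1
                if not is_powder:
                    clean_pixels += 1
    return clean_pixels / part_pixels if part_pixels else 0.0


part = [[True, True, False],
        [True, True, False]]
powder = [[True, False, True],
          [False, False, True]]
print(depowdering_progress(part, powder))  # 3 of 4 part pixels are clean: 0.75
```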

2602.11141 2026-02-12 cs.HC cs.LG

LCIP: Loss-Controlled Inverse Projection of High-Dimensional Image Data

Yu Wang, Frederik L. Dennig, Michael Behrisch, Alexandru Telea

Abstract

Projection (or dimensionality reduction) methods $P$ map high-dimensional data, typically to 2D scatterplots, for visual exploration. Inverse projection methods $P^{-1}$ map this 2D space back to the data space to support tasks such as data augmentation, classifier analysis, and data imputation. Current $P^{-1}$ methods suffer from a fundamental limitation: they can only generate a fixed surface-like structure in data space, which poorly covers the richness of this space. We address this with a new method that can "sweep" the data space under user control. Our method works generically for any $P$ technique and dataset, is controlled by two intuitive user-set parameters, and is simple to implement. We demonstrate it through an extensive application involving image manipulation for style transfer.
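To see the "fixed surface" limitation concretely, consider a simple inverse projection by inverse-distance weighting (IDW) of data points whose 2D projections are known. This is a generic baseline chosen for brevity, not LCIP: every 2D query maps deterministically to one data-space point, so the image of the whole 2D plane is a single 2D surface in data space, which is exactly what LCIP's user-controlled sweep is meant to go beyond.

```python
def inverse_project(point, anchors_2d, anchors_nd, power=2.0, eps=1e-12):
    """Map a 2-D point back to data space by inverse-distance weighting of the
    data points whose projections are known (anchors)."""
    px, py = point
    weights = []
    for (x, y), high in zip(anchors_2d, anchors_nd):
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        if dist_sq < eps:
            return list(high)  # exactly on an anchor: return its data point
        weights.append(dist_sq ** (-power / 2))
    total = sum(weights)
    dim = len(anchors_nd[0])
    return [sum(w * high[i] for w, high in zip(weights, anchors_nd)) / total
            for i in range(dim)]


anchors_2d = [(0.0, 0.0), (2.0, 0.0)]
anchors_nd = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]
print(inverse_project((1.0, 0.0), anchors_2d, anchors_nd))  # midpoint: [1.0, 2.0, 3.0]
```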

2602.11125 2026-02-12 cs.DC cs.RO

Min-Sum Uniform Coverage Problem by Autonomous Mobile Robots

Animesh Maiti, Abhinav Chakraborty, Bibhuti Das, Subhash Bhagat, Krishnendu Mukhopadhyaya

Abstract

We study the \textit{min-sum uniform coverage} problem for a swarm of $n$ mobile robots on a given finite line segment and on a given circle of finite positive radius. The robots must coordinate their movements to reach a uniformly spaced configuration that minimizes the total distance traveled by all robots. The robots are autonomous, anonymous, identical, and homogeneous, and operate under the \textit{Look-Compute-Move} (LCM) model with \textit{non-rigid} motion controlled by a fair asynchronous scheduler. They are oblivious and silent, possessing neither persistent memory nor a means of explicit communication. In the \textbf{line-segment setting}, the \textit{min-sum uniform coverage} problem requires placing the robots at uniformly spaced points along the segment so as to minimize the total distance traveled by all robots. In the \textbf{circle setting}, the robots have to arrange themselves uniformly around the given circle to form a regular $n$-gon. There is no fixed orientation or designated starting vertex, and the goal is to minimize the total distance traveled by all the robots. We present a deterministic distributed algorithm that achieves uniform coverage in the line-segment setting with minimum total movement cost. For the circle setting, we characterize all initial configurations for which the \textit{min-sum uniform coverage} problem is deterministically unsolvable under the considered robot model. For all remaining configurations, we provide a deterministic distributed algorithm that achieves uniform coverage while minimizing the total distance traveled. These results characterize the deterministic solvability of min-sum coverage for oblivious robots and achieve optimal cost whenever the problem is solvable.
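For the line-segment setting, the optimal target assignment has a simple centralized form that a distributed algorithm must reproduce: on a line, matching the k-th leftmost robot to the k-th leftmost target minimizes the sum of absolute displacements. The sketch assumes the uniformly spaced targets include both endpoints and requires n ≥ 2; it is a centralized reference computation, not the paper's oblivious, asynchronous algorithm.

```python
def min_sum_line_assignment(positions, left, right):
    """Centralized reference: assign robots to n evenly spaced points on
    [left, right] (endpoints included) with minimal total travel distance."""
    n = len(positions)
    if n < 2:
        raise ValueError("this sketch assumes at least two robots")
    step = (right - left) / (n - 1)
    targets = [left + i * step for i in range(n)]
    # Sorted-to-sorted matching avoids crossings and is optimal for |.| cost on a line.
    order = sorted(range(n), key=lambda i: positions[i])
    assignment = [0.0] * n
    for target, robot in zip(targets, order):
        assignment[robot] = target
    cost = sum(abs(assignment[i] - positions[i]) for i in range(n))
    return assignment, cost


assignment, cost = min_sum_line_assignment([0.9, 0.1, 0.4], 0.0, 1.0)
print(assignment, cost)
```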

2602.11116 2026-02-12 eess.SY cs.RO cs.SY math.OC

Multi-UAV Trajectory Optimization for Bearing-Only Localization in GPS Denied Environments

Alfonso Sciacchitano, Liraz Mudrik, Sean Kragelund, Isaac Kaminer

Comments 38 pages, 7 figures, and 6 tables

Abstract

Accurate localization of maritime targets by unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments. UAVs equipped with gimballed electro-optical sensors are typically used to localize targets; however, reliance on these sensors increases mechanical complexity, cost, and susceptibility to single-point failures, limiting scalability and robustness in multi-UAV operations. This work presents a new trajectory optimization framework that enables cooperative target localization using UAVs with fixed, non-gimballed cameras operating in coordination with a surface vessel. This estimation-aware optimization generates dynamically feasible trajectories that explicitly account for mission constraints, platform dynamics, and out-of-frame events. Estimation-aware trajectories outperform heuristic paths, reducing localization error by more than a factor of two, which motivates their use in cooperative operations. Results further demonstrate that coordinated UAVs with fixed, non-gimballed cameras achieve localization accuracy that meets or exceeds that of single gimballed systems, while substantially lowering system complexity and cost, enabling scalability, and enhancing mission resilience.
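The basic estimation step behind bearing-only localization is triangulation: each bearing defines a line through the observer, and the target estimate is their least-squares intersection. The sketch below is planar and noise-free and assumes observer positions are known; it is not the paper's estimation-aware trajectory optimization. One intuition for why trajectory planning matters: if all bearings are nearly parallel, the normal-equation determinant approaches zero and the target becomes unobservable.

```python
import math


def triangulate(observers, bearings):
    """Least-squares intersection of 2-D bearing lines.

    observers: list of (x, y) sensor positions (assumed known).
    bearings:  angles in radians, measured from the x-axis toward the target.
    Each bearing gives sin(t)*x - cos(t)*y = sin(t)*px - cos(t)*py.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta in zip(observers, bearings):
        s, c = math.sin(theta), math.cos(theta)
        rhs = s * px - c * py
        # Accumulate the normal equations A^T A p = A^T b for row [s, -c].
        a11 += s * s
        a12 -= s * c
        a22 += c * c
        b1 += s * rhs
        b2 -= c * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are (nearly) parallel; target is unobservable")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


target = (3.0, 4.0)
observers = [(0.0, 0.0), (6.0, 0.0), (0.0, 8.0)]
bearings = [math.atan2(target[1] - py, target[0] - px) for px, py in observers]
print(triangulate(observers, bearings))
```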

2602.11114 2026-02-12 cs.MA cs.AI cs.LG cs.SE

Learning to Compose for Cross-domain Agentic Workflow Generation

Jialiang Wang, Shengxiang Xu, Hanmo Liu, Jiachuan Wang, Yuyu Luo, Shimin Di, Min-Ling Zhang, Lei Chen

Abstract

Automatically generating agentic workflows -- executable operator graphs or code that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily on the task distribution and the available operators. Under domain shift, current systems typically rely on iterative workflow refinement to discover a feasible workflow from a large workflow space, incurring high iteration costs and yielding unstable, domain-specific behavior. In response, we internalize a decompose-recompose-decide mechanism into an open-source LLM for cross-domain workflow generation. To decompose, we learn a compact set of reusable workflow capabilities across diverse domains. To recompose, we map each input task to a sparse composition over these bases to generate a task-specific workflow in a single pass. To decide, we attribute the success or failure of workflow generation to counterfactual contributions from learned capabilities, thereby capturing which capabilities actually drive success by their marginal effects. Across stringent multi-domain, cross-domain, and unseen-domain evaluations, our 1-pass generator surpasses SOTA refinement baselines that consume 20 iterations, while substantially reducing generation latency and cost.
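The "recompose" and "decide" steps can be illustrated with two small functions: a top-k selection standing in for a sparse composition over capability bases, and leave-one-out credit that measures each capability's marginal effect on success. The capability names, scores, and the toy success function are hypothetical; the paper learns these inside an LLM.

```python
def compose(scores, k):
    """Sparse composition (illustrative): keep the k highest-scoring capabilities."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def marginal_contributions(success_rate, capabilities):
    """Counterfactual credit: drop in success when each capability is removed."""
    full = frozenset(capabilities)
    base = success_rate(full)
    return {c: base - success_rate(full - {c}) for c in capabilities}


def toy_success_rate(capabilities):
    # Hypothetical evaluator: verification matters most, repair a little, the rest not at all.
    return 0.2 + 0.5 * ("verify" in capabilities) + 0.1 * ("repair" in capabilities)


scores = {"verify": 0.9, "repair": 0.4, "decompose": 0.7}
print(compose(scores, 2))
print(marginal_contributions(toy_success_rate, ["verify", "repair", "decompose"]))
```

Leave-one-out credit assigns zero to `decompose` here even though it was selected, which is the kind of signal that lets a generator learn which capabilities actually drive success.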

2602.11107 2026-02-12 stat.ME cs.LG stat.ML

Renet: Principled and Efficient Relaxation for the Elastic Net via Dynamic Objective Selection

Albert Dorador

Abstract

We introduce Renet, a principled generalization of the Relaxed Lasso to the Elastic Net family of estimators. Although $\ell_1$-regularization is a standard tool for variable selection in high-dimensional regimes and the $\ell_2$ penalty provides stability and solution uniqueness through strict convexity, the standard Elastic Net suffers from shrinkage bias that frequently yields suboptimal prediction accuracy. We propose to address this limitation through a framework called \textit{relaxation}. Existing relaxation implementations rely on naive linear interpolations of penalized and unpenalized solutions, which ignore the non-linear geometry that characterizes the entire regularization path and risk violating the Karush-Kuhn-Tucker conditions. Renet addresses these limitations by enforcing sign consistency through an adaptive relaxation procedure that dynamically dispatches between convex blending and efficient sub-path refitting. Furthermore, we identify and formalize a synergy between relaxation and the one-standard-error (1-SE) rule: relaxation serves as a robust debiasing mechanism, allowing practitioners to leverage the parsimony of the 1-SE rule without the traditional loss in predictive fidelity. Our theoretical framework incorporates automated stability safeguards for ultra-high dimensional regimes and is supported by a comprehensive benchmarking suite across 20 synthetic and real-world datasets, demonstrating that Renet consistently outperforms the standard Elastic Net and provides a more robust alternative to the Adaptive Elastic Net in high-dimensional, low signal-to-noise ratio, and high-multicollinearity regimes. By leveraging an adaptive solver backend, Renet delivers these statistical gains while offering a computational profile that remains competitive with state-of-the-art coordinate descent implementations.
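The shrinkage bias and the naive relaxation that Renet improves upon can be shown on a tiny noise-free problem: coordinate descent for the Elastic Net, then an OLS refit on the selected support blended with the penalized solution as φ·β_enet + (1 − φ)·β_refit. This sketch is the baseline linear interpolation the abstract criticizes, not Renet's sign-consistent adaptive procedure; the data and penalty values are illustrative.

```python
def soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            rho = sum(c * r for c, r in zip(col, residual)) / n + z * beta[j]
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            if new != beta[j]:
                delta = new - beta[j]
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new
    return beta


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]


def relax(X, y, beta, phi):
    """Naive relaxation: phi * penalized + (1 - phi) * OLS refit on the active set."""
    support = [j for j, b in enumerate(beta) if b != 0.0]
    refit = [0.0] * len(beta)
    if support:
        XtX = [[sum(row[j] * row[k] for row in X) for k in support] for j in support]
        Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in support]
        for j, value in zip(support, solve(XtX, Xty)):
            refit[j] = value
    return [phi * b + (1 - phi) * r for b, r in zip(beta, refit)]


# Orthogonal toy design with true coefficients [3, 0, -2] and no noise.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [1, 1, 5, 5]
beta_enet = elastic_net(X, y, lam=0.5, alpha=0.5)  # shrunk towards zero
beta_relaxed = relax(X, y, beta_enet, phi=0.0)     # OLS on the selected support
print(beta_enet, beta_relaxed)
```

The Elastic Net selects the right support but shrinks the coefficients to about [2.2, 0, -1.4]; refitting on that support removes the bias, recovering [3, 0, -2].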

2602.11076 2026-02-12 eess.SY cs.AI cs.SY eess.SP

Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing

Kavan Fatehi, Mostafa Rahmani Ghourtani, Amir Sonee, Poonam Yadav, Alessandra M Russo, Hamed Ahmadi, Radu Calinescu

Comments This work has been accepted to appear in the IEEE International Conference on Communications (ICC)

Abstract

Sixth-generation (6G) radio access networks (RANs) must enforce strict service-level agreements (SLAs) for heterogeneous slices, yet sudden latency spikes remain difficult to diagnose and resolve with conventional deep reinforcement learning (DRL) or explainable RL (XRL). We propose \emph{Attention-Enhanced Multi-Agent Proximal Policy Optimization (AE-MAPPO)}, which integrates six specialized attention mechanisms into multi-agent slice control and surfaces them as zero-cost, faithful explanations. The framework operates across O-RAN timescales with a three-phase strategy: predictive, reactive, and inter-slice optimization. A URLLC case study shows AE-MAPPO resolves a latency spike in $18$ms, restores latency to $0.98$ms with $99.9999\%$ reliability, and reduces troubleshooting time by $93\%$ while maintaining eMBB and mMTC continuity. These results confirm AE-MAPPO's ability to combine SLA compliance with inherent interpretability, enabling trustworthy and real-time automation for 6G RAN slicing.
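Attention-based explanations are "zero-cost" in the sense that the attention weights are already computed in the forward pass and can simply be read out and ranked. The sketch shows scaled dot-product attention over named inputs; the KPI names and embeddings are hypothetical, and AE-MAPPO's six specialized attention mechanisms are not reproduced.

```python
import math


def attention_explanation(query, keys, names):
    """Scaled dot-product attention weights over named inputs, sorted by weight.

    The weights are already computed in the forward pass, so reading them out
    as an explanation needs no extra model evaluations.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sorted(zip(names, (e / total for e in exps)), key=lambda nw: nw[1], reverse=True)


# Hypothetical per-slice KPI embeddings attended to by a URLLC slice agent.
names = ["urllc_queue_delay", "prb_utilization", "embb_throughput"]
keys = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.0]]
ranked = attention_explanation([1.0, 0.0], keys, names)
print(ranked)
```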

2602.11059 2026-02-12 stat.ML cs.LG stat.AP

A Gibbs posterior sampler for inverse problem based on prior diffusion model

Jean-François Giovannelli

Abstract

This paper addresses inversion in cases where (1) the observation system is modeled by a linear transformation and additive noise, (2) the problem is ill-posed and regularization is introduced in a Bayesian framework through a prior density, and (3) this prior is modeled by a diffusion process fitted to an available large set of examples. In this context, posterior sampling is known to be a thorny issue. This paper introduces a Gibbs algorithm for this task. To our knowledge, this avenue has not been explored, and we show that the approach is particularly effective and remarkably simple. In addition, it offers a convergence guarantee in a clearly identified situation. The results are confirmed by numerical simulations.
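The alternating structure of a Gibbs sampler for a linear observation model can be shown on a scalar problem y_i = h·x + n_i with unknown noise precision γ. To keep both conditionals in closed form, the diffusion-model prior is replaced here by a conjugate Gaussian prior on x and a Gamma prior on γ; this illustrates Gibbs sampling for a linear inverse problem, not the paper's algorithm.

```python
import random


def gibbs_linear_gaussian(y, h, prior_precision=1.0, a0=1.0, b0=1.0,
                          n_samples=2000, burn_in=200, seed=0):
    """Gibbs sampler for y_i = h * x + n_i with n_i ~ N(0, 1/gamma).

    Stand-in priors: x ~ N(0, 1/prior_precision); gamma ~ Gamma(a0, rate=b0).
    Returns posterior means of x and gamma.
    """
    rng = random.Random(seed)
    n = len(y)
    x, gamma = 0.0, 1.0
    xs, gammas = [], []
    for it in range(n_samples + burn_in):
        # x | gamma, y is Gaussian (conjugate update).
        precision = prior_precision + gamma * h * h * n
        mean = gamma * h * sum(y) / precision
        x = rng.gauss(mean, precision ** -0.5)
        # gamma | x, y is Gamma; random.gammavariate takes (shape, scale = 1/rate).
        rate = b0 + 0.5 * sum((yi - h * x) ** 2 for yi in y)
        gamma = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)
        if it >= burn_in:
            xs.append(x)
            gammas.append(gamma)
    return sum(xs) / len(xs), sum(gammas) / len(gammas)


data_rng = random.Random(1)
true_x, h = 2.0, 1.5
y = [h * true_x + data_rng.gauss(0.0, 0.5) for _ in range(50)]
print(gibbs_linear_gaussian(y, h))
```

With 50 observations and true noise standard deviation 0.5 (precision 4), the posterior means should land near x = 2 and γ = 4.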

2602.11016 2026-02-12 cs.AR cs.AI

From Buffers to Registers: Unlocking Fine-Grained FlashAttention with Hybrid-Bonded 3D NPU Co-Design

Jinxin Yu, Yudong Pan, Mengdi Wang, Huawei Li, Yinhe Han, Xiaowei Li, Ying Wang

Comments Accepted to DATE 2026

Abstract

Transformer-based models dominate modern AI workloads but exacerbate memory bottlenecks due to their quadratic attention complexity and ever-growing model sizes. Existing accelerators, such as Groq and Cerebras, mitigate off-chip traffic with large on-chip caches, while algorithmic innovations such as FlashAttention fuse operators to avoid materializing large attention matrices. However, as off-chip traffic decreases, our measurements show that on-chip SRAM accesses account for over 60% of energy in long-sequence workloads, making cache access the new bottleneck. We propose 3D-Flow, a hybrid-bonded, 3D-stacked spatial accelerator that enables register-to-register communication across vertically partitioned PE tiers. Unlike 2D multi-array architectures limited by NoC-based router-to-router transfers, 3D-Flow leverages sub-10 µm vertical TSVs to sustain cycle-level operator pipelining with minimal overhead. On top of this architecture, we design 3D-FlashAttention, a fine-grained scheduling method that balances latency across tiers, forming a bubble-free vertical dataflow without on-chip SRAM roundtrips. Evaluations on Transformer workloads (OPT and QWEN models) show that our 3D spatial accelerator reduces energy consumption by 46-93% and achieves 1.4x-7.6x speedups compared to state-of-the-art 2D and 3D designs.
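FlashAttention avoids materializing the attention matrix by consuming keys and values tile by tile with an "online softmax": only a running maximum, a running normalizer, and an output accumulator are kept, which is what allows partial results to live in small buffers or registers. The sketch below shows this recurrence for a single query and checks it against standard attention; the 3D-Flow tier scheduling itself is not modelled.

```python
import math
import random


def attention_reference(q, K, V):
    """Standard softmax attention for a single query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(len(V[0]))]


def attention_tiled(q, K, V, tile):
    """FlashAttention-style online softmax: keys/values are consumed tile by tile,
    keeping only a running max, running normalizer, and output accumulator."""
    d = len(q)
    running_max, normalizer = float("-inf"), 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), tile):
        k_tile, v_tile = K[start:start + tile], V[start:start + tile]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_tile]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescale earlier partial sums
        normalizer *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_tile):
            p = math.exp(s - new_max)
            normalizer += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / normalizer for a in acc]


rng = random.Random(0)
q = [rng.uniform(-1, 1) for _ in range(4)]
K = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
V = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
print(attention_reference(q, K, V), attention_tiled(q, K, V, tile=3))
```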

2602.11015 2026-02-12 cs.CR cs.AI

CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data

Valery Khvatov, Alexey Neyman

Comments 53 pages, 9 figures, 6 appendices. Code: https://github.com/DGT-Network/cvpl

详情
英文摘要

Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geometric framework for post-hoc assessment of linkage risk between original and protected tabular data. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation, yielding continuous, scenario-dependent risk estimates rather than binary compliance verdicts. We formally define CVPL under an explicit threat model and introduce threshold-aware risk surfaces, R(λ, τ), that capture the joint effects of protection strength and attacker strictness. We establish a progressive blocking strategy with monotonicity guarantees, enabling anytime risk estimation with valid lower bounds. We demonstrate that the classical Fellegi-Sunter linkage emerges as a special case of CVPL under restrictive assumptions, and that violations of these assumptions can lead to systematic over-linking bias. Empirical validation on 10,000 records across 19 protection configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability, with a significant portion arising from non-quasi-identifier behavioral patterns. CVPL provides interpretable diagnostics identifying which features drive linkage feasibility, supporting privacy impact assessment, protection mechanism comparison, and utility-risk trade-off analysis.
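
The toy pipeline below makes the blocking, vectorization, projection, and similarity stages concrete. It is hypothetical and is not the CVPL code from the linked repository. It assumes:

- blocking on one exact-match attribute;
- an identity "projection";
- a similarity of 1/(1 + Euclidean distance);
- risk defined as the share of protected records whose best in-block match is their true source with similarity at least τ.

Evaluating it for several τ at a fixed protection level traces one slice of a risk surface like R(λ, τ). The records and names are made up for illustration.

```python
import math

# Made-up toy data: income in thousands; protected records are perturbed copies.
ORIGINAL = {1: {"zip": "A", "age": 30, "income": 50},
            2: {"zip": "A", "age": 45, "income": 80},
            3: {"zip": "B", "age": 30, "income": 52}}
PROTECTED = {1: {"zip": "A", "age": 31, "income": 50},
             2: {"zip": "A", "age": 44, "income": 79},
             3: {"zip": "B", "age": 29, "income": 53}}


def similarity(a, b):
    """Distance-based similarity in (0, 1]; 1.0 means identical feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def linkage_risk(original, protected, block_key, features, tau):
    """Share of protected records correctly re-linked with similarity >= tau."""
    # Blocking: group original record ids by the value of the blocking attribute
    blocks = {}
    for rid, rec in original.items():
        blocks.setdefault(rec[block_key], []).append(rid)
    linked = 0
    for rid, rec in protected.items():
        # Vectorization (the projection step is the identity in this sketch)
        vec = [rec[f] for f in features]
        best_id, best_sim = None, -1.0
        for cand in blocks.get(rec[block_key], []):
            sim = similarity(vec, [original[cand][f] for f in features])
            if sim > best_sim:
                best_id, best_sim = cand, sim
        # Similarity evaluation against the attacker's strictness threshold tau
        if best_id == rid and best_sim >= tau:
            linked += 1
    return linked / len(protected)
```

On these three records, the risk falls from 1.0 at τ = 0 to 1/3 at τ = 0.45 and to 0 at τ = 0.6. A stricter attacker threshold lowers the estimated linkability.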

2602.11000 2026-02-12 cs.DC cs.AI cs.LG

Fine-Tuning GPT-5 for GPU Kernel Generation

Ali Tehrani, Yahya Emara, Essam Wissam, Wojciech Paluch, Waleed Atallah, Łukasz Dudziak, Mohamed S. Abdelfattah

Abstract

Developing efficient GPU kernels is essential for scaling modern AI systems, yet it remains a complex task due to intricate hardware architectures and the need for specialized optimization expertise. Although Large Language Models (LLMs) demonstrate strong capabilities in general sequential code generation, they face significant challenges in GPU code generation because of the scarcity of high-quality labeled training data, compiler biases when generating synthetic solutions, and limited generalization across hardware generations. This precludes supervised fine-tuning (SFT) as a scalable methodology for improving current LLMs. In contrast, reinforcement learning (RL) offers a data-efficient and adaptive alternative but requires access to relevant tools, careful selection of training problems, and a robust evaluation environment. We present Makora's environment and tools for reinforcement learning fine-tuning of frontier models and report our results from fine-tuning GPT-5 for Triton code generation. In the single-attempt setting, our fine-tuned model improves kernel correctness from 43.7% to 77.0% (+33.3 percentage points) and increases the fraction of problems outperforming TorchInductor from 14.8% to 21.8% (+7 percentage points) compared to baseline GPT-5, while exceeding prior state-of-the-art models on KernelBench. When integrated into a full coding agent, it solves up to 97.4% of problems in an expanded KernelBench suite, outperforming the PyTorch TorchInductor compiler on 72.9% of problems with a geometric mean speedup of 2.12x. Our work demonstrates that targeted post-training with reinforcement learning can unlock LLM capabilities in highly specialized technical domains where traditional supervised learning is limited by data availability, opening new pathways for AI-assisted accelerator programming.
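
The headline numbers combine three quantities: a correctness rate, the share of problems where the kernel beats a baseline such as TorchInductor, and a geometric-mean speedup. Below is a hypothetical sketch of how such a summary can be computed from per-problem results. The KernelResult fields and the choice to exclude incorrect kernels from the geometric mean are assumptions, not the paper's evaluation code.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelResult:
    correct: bool       # passed the correctness check against the reference
    runtime_ms: float   # runtime of the generated kernel
    baseline_ms: float  # runtime of the baseline (e.g. TorchInductor) on the same problem


def summarize(results):
    """Correctness rate, share of problems beating the baseline, geometric-mean speedup.

    Assumption: incorrect kernels never count as beating the baseline and are
    excluded from the geometric mean.
    """
    n = len(results)
    correct = [r for r in results if r.correct]
    speedups = [r.baseline_ms / r.runtime_ms for r in correct]
    beats = sum(1 for s in speedups if s > 1.0)
    # Geometric mean = exp(mean(log(speedup)))
    geo = (math.exp(sum(math.log(s) for s in speedups) / len(speedups))
           if speedups else float("nan"))
    return {"correct_rate": len(correct) / n,
            "beats_baseline_rate": beats / n,
            "geomean_speedup": geo}
```

The geometric mean is the natural average for ratios. A 2x speedup on one problem and a 0.5x slowdown on another average to 1x, whereas the arithmetic mean would report 1.25x.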

2602.10996 2026-02-12 cs.MA cs.CL cs.LG

The emergence of numerical representations in communicating artificial agents

Daniela Mihai, Lucas Weber, Francesca Franzon

Comments In the Sixteenth International Conference on the Evolution of Language

Abstract

Human languages provide efficient systems for expressing numerosities, but whether the sheer pressure to communicate is enough for numerical representations to arise in artificial agents, and whether the emergent codes resemble human numerals at all, remains an open question. We study two neural network-based agents that must communicate numerosities in a referential game using either discrete tokens or continuous sketches, thus exploring both symbolic and iconic representations. Without any predefined numeric concepts, the agents achieve high in-distribution communication accuracy in both communication channels and converge on high-precision symbol-meaning mappings. However, the emergent code is non-compositional: the agents fail to derive systematic messages for unseen numerosities, typically reusing the symbol of the highest trained numerosity (discrete) or collapsing extrapolated values onto a single sketch (continuous). We conclude that communication pressure alone suffices for precise transmission of learned numerosities, but additional pressures are needed to yield compositional codes and generalisation abilities.

2602.10975 2026-02-12 cs.SE cs.AI

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Qixing Zhou, Jiacheng Zhang, Haiyang Wang, Rui Hao, Jiahe Wang, Minghao Han, Yuxue Yang, Shuzhe Wu, Feiyang Pan, Lue Fan, Dandan Tu, Zhaoxiang Zhang

Comments Accepted by ICLR 2026

Abstract

Agents powered by large language models (LLMs) are increasingly adopted in the software industry, contributing code as collaborators or even autonomous developers. As their presence grows, it becomes important to assess the current boundaries of their coding abilities. Existing agentic coding benchmarks, however, cover a limited task scope, e.g., bug fixing within a single pull request (PR), and often rely on non-executable evaluations or lack an automated approach for continually updating the evaluation coverage. To address these issues, we propose FeatureBench, a benchmark designed to evaluate agentic coding performance in end-to-end, feature-oriented software development. FeatureBench incorporates an execution-based evaluation protocol and a scalable test-driven method that automatically derives tasks from code repositories with minimal human effort. By tracing from unit tests along a dependency graph, our approach can identify feature-level coding tasks spanning multiple commits and PRs scattered across the development timeline, while ensuring that other features still function correctly after the feature is separated. Using this framework, we curated 200 challenging evaluation tasks and 3825 executable environments from 24 open-source repositories in the first version of our benchmark. Empirical evaluation reveals that even a state-of-the-art agentic model such as Claude 4.5 Opus, which achieves a 74.4% resolved rate on SWE-bench, succeeds on only 11.0% of tasks, opening new opportunities for advancing agentic coding. Moreover, thanks to our automated task collection toolkit, FeatureBench can be easily scaled and updated over time to mitigate data leakage. The inherent verifiability of the constructed environments also makes our method potentially valuable for agent training.
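
The task-extraction idea of tracing from unit tests along a dependency graph can be sketched as a reachability computation. First collect everything a target feature's test depends on. Then subtract what other tests also need, so that removing the remainder should leave the other features working. This is a hypothetical simplification, not FeatureBench's toolkit; the graph format, the toy graph, and the function names are assumptions.

```python
from collections import deque

# Hypothetical dependency graph: node -> functions it calls or imports.
GRAPH = {
    "test_export": ["export_csv"],
    "export_csv": ["format_row", "open_file"],
    "format_row": ["escape_field"],
    "test_import": ["import_csv"],
    "import_csv": ["parse_row", "open_file"],
}


def reachable(graph, start):
    """All nodes reachable from start by following dependency edges (BFS)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def feature_code(graph, target_test, other_tests):
    """Code the target test depends on that no other test depends on.

    Removing only this set keeps the dependencies of the other tests intact.
    """
    needed = reachable(graph, target_test)
    shared = set()
    for t in other_tests:
        shared |= reachable(graph, t)
    return needed - shared
```

In the toy graph, open_file is shared with the import feature, so it stays in the repository. Only the export-specific functions become the task.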

2602.10963 2026-02-12 eess.SY cs.NA cs.RO cs.SY math.NA

Lie Group Variational Integrator for the Geometrically Exact Rod with Circular Cross-Section Incorporating Cross-Sectional Deformation

Srishti Siddharth, Vivek Natarajan, Ravi N. Banavar

Comments Submitted to: Computers and Mathematics with Applications

Abstract

In this paper, we derive the continuous space-time equations of motion of a three-dimensional geometrically exact rod, or Cosserat rod, incorporating planar cross-sectional deformation. We then adopt the Lie group variational integrator technique to obtain a discrete model of the rod that captures both rotational motion and cross-sectional deformation. The resulting discrete model has several desirable features. It conserves the volume of the discrete elements by modeling cross-sectional deformation through a local dilatation factor. It also inherits the beneficial properties of the variational integrator technique, such as preservation of the rotational configuration and energy conservation up to a bounded error. An exhaustive set of numerical results under various initial conditions of the rod demonstrates the model's efficacy in replicating the physics of the system.
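
The property that a Lie group integrator preserves the rotational configuration comes from updating rotations through the exponential map, R_{k+1} = R_k exp(h ω^), which stays exactly on SO(3). The sketch below contrasts that update with an explicit Euler update R_{k+1} = R_k (I + h ω^). It assumes a rigid rotation with constant body angular velocity. A full Lie group variational integrator instead obtains the relative rotation from implicit discrete equations of motion, which are not reproduced here. All function names are illustrative.

```python
import math


def identity():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def hat(w):
    """Skew-symmetric matrix of a 3-vector, so hat(w) v = w x v."""
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def expm_so3(w):
    """Rodrigues' formula: exp(hat(w)) = I + sin(t)/t hat(w) + (1 - cos t)/t^2 hat(w)^2, t = |w|."""
    t = math.sqrt(sum(c * c for c in w))
    I = identity()
    if t < 1e-12:
        return I
    W = hat(w)
    W2 = matmul(W, W)
    a, b = math.sin(t) / t, (1.0 - math.cos(t)) / (t * t)
    return [[I[i][j] + a * W[i][j] + b * W2[i][j] for j in range(3)] for i in range(3)]


def orthogonality_error(R):
    """max |R^T R - I|; zero for an exact rotation matrix."""
    RtR = [[sum(R[k][i] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return max(abs(RtR[i][j] - (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))


def propagate(omega, h, steps, lie_group=True):
    """Integrate dR/dt = R hat(omega) from R = I with constant body angular velocity."""
    if lie_group:
        step = expm_so3([h * c for c in omega])  # exact relative rotation on SO(3)
    else:
        W, I = hat(omega), identity()
        step = [[I[i][j] + h * W[i][j] for j in range(3)] for i in range(3)]  # explicit Euler
    R = identity()
    for _ in range(steps):
        R = matmul(R, step)
    return R
```

With ω = (0.3, -0.5, 1.0) and h = 0.01 over 1000 steps, the exponential-map update stays orthogonal up to round-off. The Euler update drifts away from orthogonality by an amount on the order of 0.1.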

2602.10935 2026-02-12 cs.HC cs.AI

What do people want to fact-check?

Bijean Ghafouri, Dorsaf Sallami, Luca Luceri, Taylor Lynn Curtis, Jean-Francois Godbout, Emilio Ferrara, Reihaneh Rabbany

Abstract

Research on misinformation has focused almost exclusively on supply, asking what falsehoods circulate, who produces them, and whether corrections work. A basic demand-side question remains unanswered. When ordinary people can fact-check anything they want, what do they actually ask about? We provide the first large-scale evidence on this question by analyzing close to 2,500 statements submitted by 457 participants to an open-ended AI fact-checking system. Each claim is classified along five semantic dimensions (domain, epistemic form, verifiability, target entity, and temporal reference), producing a behavioral map of public verification demand. Three findings stand out. First, users range widely across topics but default to a narrow epistemic repertoire, overwhelmingly submitting simple descriptive claims about present-day observables. Second, roughly one in four requests concerns statements that cannot be empirically resolved, including moral judgments, speculative predictions, and subjective evaluations, revealing a systematic mismatch between what users seek from fact-checking tools and what such tools can deliver. Third, comparison with the FEVER benchmark dataset exposes sharp structural divergences across all five dimensions, indicating that standard evaluation corpora encode a synthetic claim environment that does not resemble real-world verification needs. These results reframe fact-checking as a demand-driven problem and identify where current AI systems and benchmarks are misaligned with the uncertainty people actually experience.

2602.10916 2026-02-12 cs.CY cs.AI

Traceable, Enforceable, and Compensable Participation: A Participation Ledger for People-Centered AI Governance

Rashid Mushkani

Comments Presented at PAIRS: Participatory AI Research & Practice Symposium

Abstract

Participatory approaches are widely invoked in AI governance, yet participation rarely translates into durable influence. In public-sector and civic AI systems, community contributions such as deliberations, annotations, prompts, and incident reports are often recorded informally, weakly linked to system updates, and disconnected from enforceable rights or sustained compensation. As a result, participation is frequently symbolic rather than accountable. We introduce the Participation Ledger, a machine-readable and auditable framework that operationalizes participation as traceable influence, enforceable authority, and compensable labor. The ledger represents participation as an influence graph that links contributed artifacts to verified changes in AI systems, including datasets, prompts, adapters, policies, guardrails, and evaluation suites. It integrates three elements: a Participation Evidence Standard documenting consent, privacy, compensation, and reuse terms; an influence tracing mechanism that connects system updates to replayable before-and-after tests, enabling longitudinal monitoring of commitments; and encoded rights and incentives. Capability Vouchers allow authorized community stewards to request or constrain specific system capabilities within defined boundaries, while Participation Credits support ongoing recognition and compensation when contributed tests continue to provide value. We ground the framework in four urban AI and public space governance deployments and provide a machine-readable schema, templates, and an evaluation plan for assessing traceability, enforceability, and compensation in practice.
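
To make the influence-graph idea concrete, below is a minimal, hypothetical data-structure sketch. Artifacts carry a contributor and a reuse-consent flag, and system changes link back to the artifacts they were derived from. A change counts as verified influence only when its replayable test went from failing to passing. The class names, fields, and the fail-to-pass rule are assumptions for illustration, not the paper's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    artifact_id: str
    contributor: str
    kind: str            # e.g. "annotation", "prompt", "incident_report"
    reuse_consent: bool  # stand-in for the consent/reuse terms of an evidence standard


@dataclass
class SystemChange:
    change_id: str
    component: str                                    # e.g. "guardrail", "dataset"
    derived_from: list = field(default_factory=list)  # ids of contributed artifacts
    passed_before: bool = False                       # replayable test result before the change
    passed_after: bool = False                        # same test after the change


class ParticipationLedger:
    def __init__(self):
        self.artifacts = {}
        self.changes = {}

    def add_artifact(self, artifact):
        self.artifacts[artifact.artifact_id] = artifact

    def add_change(self, change):
        # Only registered artifacts cleared for reuse may be linked to a change
        for aid in change.derived_from:
            if aid not in self.artifacts or not self.artifacts[aid].reuse_consent:
                raise ValueError(f"artifact {aid!r} is unknown or not cleared for reuse")
        self.changes[change.change_id] = change

    def verified_influence(self, contributor):
        """Changes built on this contributor's artifacts whose test flipped from fail to pass."""
        return sorted(
            c.change_id for c in self.changes.values()
            if not c.passed_before and c.passed_after
            and any(self.artifacts[a].contributor == contributor for a in c.derived_from)
        )


def build_demo_ledger():
    """Hypothetical example: one consented incident report reused in two changes."""
    ledger = ParticipationLedger()
    ledger.add_artifact(Artifact("a1", "resident_group", "incident_report", True))
    ledger.add_artifact(Artifact("a2", "transit_union", "annotation", False))
    ledger.add_change(SystemChange("c1", "guardrail", ["a1"], passed_before=False, passed_after=True))
    ledger.add_change(SystemChange("c2", "dataset", ["a1"], passed_before=True, passed_after=True))
    return ledger
```

In the demo ledger, only c1 counts as verified influence for resident_group, because its before-and-after test flipped from failing to passing. c2 uses the same artifact but produced no observable change, and linking a2 is rejected because it lacks reuse consent.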

2602.10891 2026-02-12 cs.NE cs.AI

Interactive LLM-assisted Curriculum Learning for Multi-Task Evolutionary Policy Search

Berfin Sakallioglu, Giorgia Nadizar, Eric Medvet

Comments 8 pages, 7 figures, with Appendix

Abstract

Multi-task policy search is a challenging problem because policies are required to generalize beyond training cases. Curriculum learning has proven effective in this setting, as it introduces complexity progressively. However, designing effective curricula is labor-intensive and requires extensive domain expertise. LLM-based curriculum generation has only recently emerged as a potential solution, but it has been limited to static, offline modes that do not leverage real-time feedback from the optimizer. Here we propose an interactive LLM-assisted framework for online curriculum generation, in which the LLM adaptively designs training cases based on real-time feedback from the evolutionary optimization process. We investigate how different feedback modalities, ranging from numeric metrics alone to combinations with plots and behavior visualizations, influence the LLM's ability to generate meaningful curricula. Through a 2D robot navigation case study, tackled with genetic programming as the optimizer, we evaluate our approach against static LLM-generated curricula and expert-designed baselines. We show that interactive curriculum generation outperforms static approaches, with multimodal feedback incorporating both progression plots and behavior visualizations yielding performance competitive with expert-designed curricula. This work contributes to understanding how LLMs can serve as interactive curriculum designers for embodied AI systems, with potential extensions to broader evolutionary robotics applications.

2602.10888 2026-02-12 eess.SY cs.LG cs.SY

Anomaly Detection with Machine Learning Algorithms in Large-Scale Power Grids

Marc Gillioz, Guillaume Dubuis, Étienne Voutaz, Philippe Jacquod

Comments 12 pages, 9 figures

Abstract

We apply several machine learning algorithms to the problem of anomaly detection in operational data for large-scale, high-voltage electric power grids. We observe important differences in the performance of the algorithms. Neural networks typically outperform classical algorithms such as k-nearest neighbors and support vector machines, which we explain by the strong contextual nature of the anomalies. We show that unsupervised learning algorithms work remarkably well and that their predictions are robust against simultaneous, concurrent anomalies.
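
As a reference point for the classical baselines mentioned above, below is a minimal unsupervised k-nearest-neighbor anomaly score: the distance from each point to its k-th nearest neighbor. It illustrates the generic technique only. The paper's features, preprocessing, and models are not reproduced, and the toy points are made up.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Distance from each point to its k-th nearest other point; larger means more anomalous."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Sorted distances to every other point; no labels are needed (unsupervised)
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores
```

On a tight cluster around (1, 1) plus the point (5, 5), the isolated point receives the largest score. A score like this looks only at distances in feature space, so any context must be encoded in the features themselves.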