arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1125
专题追踪 全部专题
2411.15605 2026-02-23 cs.CV

GIFT: A Framework Towards Global Interpretable Faithful Textual Explanations of Vision Classifiers

Éloi Zablocki, Valentin Gerard, Amaia Cardiel, Eric Gaussier, Matthieu Cord, Eduardo Valle

Comments TMLR 2026 (featured certification)

详情
英文摘要

Understanding the decision processes of deep vision models is essential for their safe and trustworthy deployment in real-world settings. Existing explainability approaches, such as saliency maps or concept-based analyses, often suffer from limited faithfulness, local scope, or ambiguous semantics. We introduce GIFT, a post-hoc framework that aims to derive Global, Interpretable, Faithful, and Textual explanations for vision classifiers. GIFT begins by generating a large set of faithful, local visual counterfactuals, then employs vision-language models to translate these counterfactuals into natural-language descriptions of visual changes. These local explanations are aggregated by a large language model into concise, human-readable hypotheses about the model's global decision rules. Crucially, GIFT includes a verification stage that quantitatively assesses the causal effect of each proposed explanation by performing image-based interventions, ensuring that the final textual explanations remain faithful to the model's true reasoning process. Across diverse datasets, including the synthetic CLEVR benchmark, the real-world CelebA faces, and the complex BDD driving scenes, GIFT reveals not only meaningful classification rules but also unexpected biases and latent concepts driving model behavior. Altogether, GIFT bridges the gap between local counterfactual reasoning and global interpretability, offering a principled approach to causally grounded textual explanations for vision models.

2411.08875 2026-02-23 cs.AI

Causal Explanations for Image Classifiers

Hana Chockler, David A. Kelly, Daniel Kroening, Youcheng Sun

Comments Under submission in a journal. A subset of the contribution was published in ICCV 2021

详情
英文摘要

Existing algorithms for explaining the output of image classifiers use different definitions of explanations and a variety of techniques to find them. However, none of the existing tools use a principled approach based on formal definitions of cause and explanation. In this paper we present a novel black-box approach to computing explanations grounded in the theory of actual causality. We prove relevant theoretical results and present an algorithm for computing approximate explanations based on these definitions. We prove termination of our algorithm and discuss its complexity and the amount of approximation compared to the precise definition. We implemented the framework in a tool ReX and we present experimental results and a comparison with state-of-the-art tools. We demonstrate that ReX is the most efficient black-box tool and produces the smallest explanations, in addition to outperforming other black-box tools on standard quality measures.

2410.11688 2026-02-23 cs.CV cs.NE

Visual Fixation-Based Retinal Prosthetic Simulation

Yuli Wu, Do Dinh Tan Nguyen, Henning Konermann, Rüveyda Yilmaz, Peter Walter, Johannes Stegmaier

Journal ref 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)

详情
英文摘要

This study proposes a retinal prosthetic simulation framework driven by visual fixations, inspired by the saccade mechanism, and assesses performance improvements through end-to-end optimization in a classification task. Salient patches are predicted from input images using the self-attention map of a vision transformer to mimic visual fixations. These patches are then encoded by a trainable U-Net and simulated using the pulse2percept framework to predict visual percepts. By incorporating a learnable encoder, we aim to optimize the visual information transmitted to the retinal implant, addressing both the limited resolution of the electrode array and the distortion between the input stimuli and resulting phosphenes. The predicted percepts are evaluated using the self-supervised DINOv2 foundation model, with an optional learnable linear layer for classification accuracy. On a subset of the ImageNet validation set, the fixation-based framework achieves a classification accuracy of 87.72%, using computational parameters based on a real subject's physiological data, significantly outperforming the downsampling-based accuracy of 40.59% and approaching the healthy upper bound of 92.76%. Our approach shows promising potential for producing more semantically understandable percepts with the limited resolution available in retinal prosthetics.

2409.06857 2026-02-23 cs.CL

What is the Role of Small Models in the LLM Era: A Survey

Lihu Chen, Gaël Varoquaux

Comments a survey paper of small models

详情
英文摘要

Large Language Models (LLMs) have made significant progress in advancing artificial general intelligence (AGI), leading to the development of increasingly large models such as GPT-4 and LLaMA-405B. However, scaling up model sizes results in exponentially higher computational costs and energy consumption, making these models impractical for academic researchers and businesses with limited resources. At the same time, Small Models (SMs) are frequently used in practical settings, although their significance is currently underestimated. This raises important questions about the role of small models in the era of LLMs, a topic that has received limited attention in prior research. In this work, we systematically examine the relationship between LLMs and SMs from two key perspectives: Collaboration and Competition. We hope this survey provides valuable insights for practitioners, fostering a deeper understanding of the contribution of small models and promoting more efficient use of computational resources. The code is available at https://github.com/tigerchen52/role_of_small_models

2408.07238 2026-02-23 cs.CL cs.AI cs.LG

Beyond Mimicry to Contextual Guidance: Knowledge Distillation for Interactive AI

Tong Wang, K. Sudhir

详情
英文摘要

As large language models increasingly mediate firm - customer interactions, firms face a tradeoff: the most capable models perform well but are costly and difficult to control at scale. Existing knowledge distillation methods address this challenge by training weaker, deployable models to imitate frontier outputs; however, such open-loop approaches are poorly suited to interactive, multi-turn settings where responses must be sequenced coherently across conversational states. We propose a shift in what knowledge is distilled - from output imitation to contextual guidance. We develop a framework in which a superior teacher model constructs a reusable library of strategic textual guidance for particular scenarios likely to be encountered by the student. When deployed, the student retrieves the context-specific guidance at inference time, enabling adaptive behavior without retraining. Using customer-service interactions, we show that this approach improves service quality and customer satisfaction relative to standard fine-tuning while maintaining alignment with firm policies. The results position inference-time textual guidance as a scalable and controllable approach to distillation for interactive AI agents in marketing settings.

2408.03099 2026-02-23 cs.CL cs.LG

Topic Modeling with Fine-tuning LLMs and Bag of Sentences

Johannes Schneider

Comments This is the submitted journal version of enhanced with the novel fine-tuning part of "Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences'' which appeared at the International Conference on Agents and Artificial Intelligence(ICAART) in 2024

详情
英文摘要

Large language models (LLMs) are increasingly used for topic modeling, outperforming classical topic models such as LDA. Commonly, pre-trained LLM encoders such as BERT are used out-of-the-box despite the fact that fine-tuning is known to improve LLMs considerably. The challenge lies in obtaining a suitable labeled dataset for fine-tuning. In this paper, we build on the recent idea of using bags of sentences as the elementary unit for computing topics. Based on this idea, we derive an approach called FT-Topic to perform unsupervised fine-tuning, relying primarily on two steps for constructing a training dataset in an automatic fashion. First, a heuristic method identifies pairs of sentence groups that are assumed to belong either to the same topic or to different topics. Second, we remove sentence pairs that are likely labeled incorrectly. The resulting dataset is then used to fine-tune an encoder LLM, which can be leveraged by any topic modeling approach that uses embeddings. In this work, we demonstrate its effectiveness by deriving a novel state-of-the-art topic modeling method called SenClu. The method achieves fast inference through an expectation-maximization algorithm and hard assignments of sentence groups to a single topic, while allowing users to encode prior knowledge about the topic-document distribution. Code is available at https://github.com/JohnTailor/FT-Topic

2404.13567 2026-02-23 cs.AI

On the Value of Labeled Data and Symbolic Methods for Hidden Neuron Activation Analysis

Abhilekha Dalal, Rushrukh Rayan, Adrita Barua, Eugene Y. Vasserman, Md Kamruzzaman Sarker, Pascal Hitzler

详情
英文摘要

A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would help answer the question of what a deep learning system internally detects as relevant in the input, demystifying the otherwise black-box nature of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. This is particularly the case for approaches that can both draw explanations from substantial background knowledge, and that are based on inherently explainable (symbolic) methods. In this paper, we introduce a novel model-agnostic post-hoc Explainable AI method demonstrating that it provides meaningful interpretations. Our approach is based on using a Wikipedia-derived concept hierarchy with approximately 2 million classes as background knowledge, and utilizes OWL-reasoning-based Concept Induction for explanation generation. Additionally, we explore and compare the capabilities of off-the-shelf pre-trained multimodal-based explainable methods. Our results indicate that our approach can automatically attach meaningful class expressions as explanations to individual neurons in the dense layer of a Convolutional Neural Network. Evaluation through statistical analysis and degree of concept activation in the hidden layer show that our method provides a competitive edge in both quantitative and qualitative aspects compared to prior work.

2403.08802 2026-02-23 cs.AI cs.CY cs.LG

Governance of Generative Artificial Intelligence for Companies

Johannes Schneider, Pauline Kuss, Rene Abraham, Christian Meske

Comments This paper is under submission

详情
英文摘要

Generative Artificial Intelligence (GenAI), specifically large language models (LLMs) like ChatGPT, has swiftly entered organizations without adequate governance, posing both opportunities and risks. Despite extensive debate on GenAI's transformative potential and emerging regulatory measures, limited research addresses organizational governance from both technical and business perspectives. While frameworks for AI governance exist, it remains unclear to what extent they apply to GenAI. This review paper fills this gap by surveying recent literature to better understand the fundamental characteristics of GenAI and to adapt existing governance frameworks specifically to GenAI within organizations. To this end, it extends Nickerson's framework development process by incorporating prior conceptualizations. The resulting framework delineates scope, objectives, and governance mechanisms designed to both harness business opportunities and mitigate risks associated with GenAI integration. Overall, this research advances a focused approach to GenAI governance, offering practical guidance for companies navigating the challenges of GenAI adoption and highlighting research gaps.

2402.08621 2026-02-23 cs.LG math.OC stat.ML

A Unified Framework for Analyzing Meta-algorithms in Online Convex Optimization

Mohammad Pedramfar, Vaneet Aggarwal

Comments in Proc. AAMAS 2026

Journal ref Proc. AAMAS 2026

详情
英文摘要

In this paper, we analyze the problem of online convex optimization in different settings, including different feedback types (full-information/semi-bandit/bandit/etc) in either stochastic or non-stochastic setting and different notions of regret (static adversarial regret/dynamic regret/adaptive regret). This is done through a framework which allows us to systematically propose and analyze meta-algorithms for the various settings described above. We show that any algorithm for online linear optimization with deterministic gradient feedback against fully adaptive adversaries is an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed to an algorithm with semi-bandit feedback with comparable regret bound. We further show that algorithms that are designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms to convert first order algorithms to zeroth order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, recovers several results in the literature with a simplified proof technique, and provides new results.

2203.16462 2026-02-23 cs.LG cs.NE math.OC math.PR stat.ML

Convergence of gradient descent for deep neural networks

Sourav Chatterjee

Comments 30 pages, 3 figures. Numerical experiments added in this revision

详情
英文摘要

We give a simple local Polyak-Lojasiewicz (PL) criterion that guarantees linear (exponential) convergence of gradient flow and gradient descent to a zero-loss solution of a nonnegative objective. We then verify this criterion for the squared training loss of a feedforward neural network with smooth, strictly increasing activation functions, in a regime that is complementary to the usual over-parameterized analyses: the network width and depth are fixed, while the input data vectors are assumed to be linearly independent (in particular, the ambient input dimension is at least the number of data points). A notable feature of the verification is that it is constructive: it leads to a simple "positive" initialization (zero first-layer weights, strictly positive hidden-layer weights, and sufficiently large output-layer weights) under which gradient descent provably converges to an interpolating global minimizer of the training loss. We also discuss a probabilistic corollary for random initializations, clarify its dependence on the probability of the required initialization event, and provide numerical experiments showing that this theory-guided initialization can substantially accelerate optimization relative to standard random initializations at the same width.

2106.14989 2026-02-23 cs.CV

Object Detection Based Handwriting Localization

Yuli Wu, Yucheng Hu, Suting Miao

Comments ICDAR 2021 Workshop: Industrial Applications of Document Analysis and Recognition

Journal ref Document Analysis and Recognition - ICDAR 2021 Workshops

详情
英文摘要

We present an object detection based approach to localize handwritten regions from documents, which initially aims to enhance the anonymization during the data transmission. The concatenated fusion of original and preprocessed images containing both printed texts and handwritten notes or signatures are fed into the convolutional neural network, where the bounding boxes are learned to detect the handwriting. Afterwards, the handwritten regions can be processed (e.g. replaced with redacted signatures) to conceal the personally identifiable information (PII). This processing pipeline based on the deep learning network Cascade R-CNN works at 10 fps on a GPU during the inference, which ensures the enhanced anonymization with minimal computational overheads. Furthermore, the impressive generalizability has been empirically showcased: the trained model based on the English-dominant dataset works well on the fictitious unseen invoices, even in Chinese. The proposed approach is also expected to facilitate other tasks such as handwriting recognition and signature verification.

2007.06660 2026-02-23 cs.CV

Improving Pixel Embedding Learning through Intermediate Distance Regression Supervision for Instance Segmentation

Yuli Wu, Long Chen, Dorit Merhof

Comments ECCV 2020 Workshop: Computer Vision Problems in Plant Phenotyping (CVPPP 2020)

Journal ref Computer Vision - ECCV 2020 Workshops: Glasgow, UK, August 23-28, 2020, Proceedings, Part VI

详情
英文摘要

As a proposal-free approach, instance segmentation through pixel embedding learning and clustering is gaining more emphasis. Compared with bounding box refinement approaches, such as Mask R-CNN, it has potential advantages in handling complex shapes and dense objects. In this work, we propose a simple, yet highly effective, architecture for object-aware embedding learning. A distance regression module is incorporated into our architecture to generate seeds for fast clustering. At the same time, we show that the features learned by the distance regression module are able to promote the accuracy of learned object-aware embeddings significantly. By simply concatenating features of the distance regression module to the images as inputs of the embedding module, the mSBD scores on the CVPPP Leaf Segmentation Challenge can be further improved by more than 8% compared to the identical set-up without concatenation, yielding the best overall result amongst the leaderboard at CodaLab.

2602.18426 2026-02-23 astro-ph.GA cs.CV

Spatio-Spectroscopic Representation Learning using Unsupervised Convolutional Long-Short Term Memory Networks

Kameswara Bharadwaj Mantha, Lucy Fortson, Ramanakumar Sankar, Claudia Scarlata, Chris Lintott, Sandor Kruk, Mike Walmsley, Hugh Dickinson, Karen Masters, Brooke Simmons, Rebecca Smethurst

Comments This manuscript was previously submitted to ICML for peer review. Reviewers noted that while the underlying VAE-based architecture builds on established methods, its application to spatially-resolved IFS data is promising for unsupervised representation learning in astronomy. This version is released for community visibility. Reviewer decisions: Weak accept and Weak reject (Final: Reject)

详情
英文摘要

Integral Field Spectroscopy (IFS) surveys offer a unique new landscape in which to learn in both spatial and spectroscopic dimensions and could help uncover previously unknown insights into galaxy evolution. In this work, we demonstrate a new unsupervised deep learning framework using Convolutional Long-Short Term Memory Network Autoencoders to encode generalized feature representations across both spatial and spectroscopic dimensions spanning $19$ optical emission lines (3800A $< λ<$ 8000A) among a sample of $\sim 9000$ galaxies from the MaNGA IFS survey. As a demonstrative exercise, we assess our model on a sample of $290$ Active Galactic Nuclei (AGN) and highlight scientifically interesting characteristics of some highly anomalous AGN.

2602.18372 2026-02-23 cs.HC cs.AI

"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations

Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson

Comments 14 pages, 2 figures

详情
英文摘要

Providing scaffolding through educational chatbots built on Large Language Models (LLM) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study, and summative assessed coursework. We analysed 6,113 messages from both learning contexts, using 11 different LLMs and three human raters to classify student questions using four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for classification of student questions. However, we identify clear limitations in both the ability to classify with schemas and the value of doing so: schemas are limited and thus struggle to accommodate the semantic richness of composite prompts, offering only partial understanding the wider risks and benefits of chatbot integration. In the future, we recommend an analysis approach that captures the nuanced, multi-turn nature of conversation, for example, by applying methods from conversation analysis in discursive psychology.

2602.18350 2026-02-23 quant-ph cs.CV cs.LG

Quantum-enhanced satellite image classification

Qi Zhang, Anton Simen, Carlos Flores-Garrigós, Gabriel Alvarado Barrios, Paolo A. Erdman, Enrique Solano, Aaron C. Kemp, Vincent Beltrani, Vedangi Pathak, Hamed Mohammadbagherpoor

详情
英文摘要

We demonstrate the application of a quantum feature extraction method to enhance multi-class image classification for space applications. By harnessing the dynamics of many-body spin Hamiltonians, the method generates expressive quantum features that, when combined with classical processing, lead to quantum-enhanced classification accuracy. Using a strong and well-established ResNet50 baseline, we achieved a maximum classical accuracy of 83%, which can be improved to 84% with a transfer learning approach. In contrast, applying our quantum-classical method the performance is increased to 87% accuracy, demonstrating a clear and reproducible improvement over robust classical approaches. Implemented on several of IBM's quantum processors, our hybrid quantum-classical approach delivers consistent gains of 2-3% in absolute accuracy. These results highlight the practical potential of current and near-term quantum processors in high-stakes, data-driven domains such as satellite imaging and remote sensing, while suggesting broader applicability in real-world machine learning tasks.

2602.18319 2026-02-23 cs.GR cs.AI cs.HC cs.LG

Robo-Saber: Generating and Simulating Virtual Reality Players

Nam Hee Kim, Jingjing May Liu, Jaakko Lehtinen, Perttu Hämäläinen, James F. O'Brien, Xue Bin Peng

Comments 13 pages, 15 figures. Accepted to Eurographics 2026. Project page: https://robo-saber.github.io/

详情
英文摘要

We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework on the popular VR game Beat Saber. The resulting model Robo-Saber produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and movement patterns specified by input style exemplars. Robo-Saber demonstrates promise in synthesizing rich gameplay data for predictive applications and enabling a physics-based whole-body VR playtesting agent.

2602.18313 2026-02-23 physics.chem-ph cs.LG

Clapeyron Neural Networks for Single-Species Vapor-Liquid Equilibria

Jan Pavšek, Alexander Mitsos, Elvis J. Sim, Jan G. Rittig

详情
英文摘要

Machine learning (ML) approaches have shown promising results for predicting molecular properties relevant for chemical process design. However, they are often limited by scarce experimental property data and lack thermodynamic consistency. As such, thermodynamics-informed ML, i.e., incorporating thermodynamic relations into the loss function as regularization term for training, has been proposed. We herein transfer the concept of thermodynamics-informed graph neural networks (GNNs) from the Gibbs-Duhem to the Clapeyron equation, predicting several pure component properties in a multi-task manner, namely: vapor pressure, liquid molar volume, vapor molar volume and enthalpy of vaporization. We find improved prediction accuracy of the Clapeyron-GNN compared to the single-task learning setting, and improved approximation of the Clapeyron equation compared to the purely data-driven multi-task learning setting. In fact, we observe the largest improvement in prediction accuracy for the properties with the lowest availability of data, making our model promising for practical application in data scarce scenarios of chemical engineering practice.

2602.18307 2026-02-23 cs.SE cs.CL cs.LG cs.PL

VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean

Yutong Xin, Qiaochu Chen, Greg Durrett, Işil Dillig

详情
英文摘要

Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebases with substantial project-specific libraries. We introduce VeriSoftBench, a benchmark of 500 Lean 4 proof obligations drawn from open-source formal-methods developments and packaged to preserve realistic repository context and cross-file dependencies. Our evaluation of frontier LLMs and specialized provers yields three observations. First, provers tuned for Mathlib-style mathematics transfer poorly to this repository-centric setting. Second, success is strongly correlated with transitive repository dependence: tasks whose proofs draw on large, multi-hop dependency closures are less likely to be solved. Third, providing curated context restricted to a proof's dependency closure improves performance relative to exposing the full repository, but nevertheless leaves substantial room for improvement. Our benchmark and evaluation suite are released at https://github.com/utopia-group/VeriSoftBench.

2602.18283 2026-02-23 cs.IR cs.AI

HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation

Lei Xin, Yuhao Zheng, Ke Cheng, Changjiang Jiang, Zifan Zhang, Fanhu Zeng

Comments Preprint

详情
英文摘要

Modeling long sequences of user behaviors has emerged as a critical frontier in generative recommendation. However, existing solutions face a dilemma: linear attention mechanisms achieve efficiency at the cost of retrieval precision due to limited state capacity, while softmax attention suffers from prohibitive computational overhead. To address this challenge, we propose HyTRec, a model featuring a Hybrid Attention architecture that explicitly decouples long-term stable preferences from short-term intent spikes. By assigning massive historical sequences to a linear attention branch and reserving a specialized softmax attention branch for recent interactions, our approach restores precise retrieval capabilities within industrial-scale contexts involving ten thousand interactions. To mitigate the lag in capturing rapid interest drifts within the linear layers, we furthermore design Temporal-Aware Delta Network (TADN) to dynamically upweight fresh behavioral signals while effectively suppressing historical noise. Empirical results on industrial-scale datasets confirm the superiority that our model maintains linear inference speed and outperforms strong baselines, notably delivering over 8% improvement in Hit Rate for users with ultra-long sequences with great efficiency.

2602.18213 2026-02-23 cond-mat.str-el cs.LG physics.comp-ph

Machine-learning force-field models for dynamical simulations of metallic magnets

Gia-Wei Chern, Yunhao Fan, Sheng Zhang, Puhan Zhang

Comments 9 pages, 5 figures

Journal ref AIP Advances 16, 025031 (2026)

详情
英文摘要

We review recent advances in machine learning (ML) force-field methods for Landau-Lifshitz-Gilbert (LLG) simulations of itinerant electron magnets, focusing on scalability and transferability. Built on the principle of locality, a deep neural network model is developed to efficiently and accurately predict the electron-mediated forces governing spin dynamics. Symmetry-aware descriptors constructed through a group-theoretical approach ensure rigorous incorporation of both lattice and spin-rotation symmetries. The framework is demonstrated using the prototypical s-d exchange model widely employed in spintronics. ML-enabled large-scale simulations reveal novel nonequilibrium phenomena, including anomalous coarsening of tetrahedral spin order on the triangular lattice and the freezing of phase separation dynamics in lightly hole-doped, strong-coupling square-lattice systems. These results establish ML force-field frameworks as scalable, accurate, and versatile tools for modeling nonequilibrium spin dynamics in itinerant magnets.

2602.17037 2026-02-23 cs.SE cs.AI cs.HC cs.PL

Wink: Recovering from Misbehaviors in Coding Agents

Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, Satish Chandra

详情
英文摘要

Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. These failures disrupt the development workflow and often require resource-intensive manual intervention. In this paper, we present a system for automatically recovering from agentic misbehaviors at scale. We first introduce a taxonomy of misbehaviors grounded in an analysis of production traffic, identifying three primary categories: Specification Drift, Reasoning Problems, and Tool Call Failures, which we find occur in about 30% of all agent trajectories. To address these issues, we developed a lightweight, asynchronous self-intervention system named Wink. Wink observes agent trajectories and provides targeted course-correction guidance to nudge the agent back to a productive path. We evaluated our system on over 10,000 real world agent trajectories and found that it successfully resolves 90% of the misbehaviors that require a single intervention. Furthermore, a live A/B test in our production environment demonstrated that our system leads to a statistically significant reduction in Tool Call Failures, Tokens per Session and Engineer Interventions per Session. We present our experience designing and deploying this system, offering insights into the challenges of building resilient agentic systems at scale.

2602.08227 2026-02-23 cs.HC cs.AI

Investigating Writing Professionals' Relationships with Generative AI: How Combined Perceptions of Rivalry and Collaboration Shape Work Practices and Outcomes

Rama Adithya Varanasi, Oded Nov, Batia Mishan Wiesenfeld

Comments CHI'2026

详情
英文摘要

This study investigates how professional writers' complex relationship with GenAI shapes their work practices and outcomes. Through a cross-sectional survey with writing professionals (n=403) in diverse roles, we show that collaboration and rivalry orientation are associated with differences in work practices and outcomes. Rivalry is primarily associated with relational crafting and skill maintenance. Collaboration is primarily associated with task crafting, productivity, and satisfaction, at the cost of long-term skill deterioration. Combination of the orientations (high rivalry and high collaboration) reconciles these differences, while boosting the association with the outcomes. Our findings argue for a balanced approach where high levels of rivalry and collaboration are essential to shape work practices and generate outcomes aimed at the long-term success of the job. We present key design implications on how to increase friction (rivalry) and reduce over-reliance (collaboration) to achieve a more balanced relationship with GenAI.

2511.18554 2026-02-23 cs.DS cs.LG cs.SY eess.SY

Online Smoothed Demand Management

Adam Lechowicz, Nicolas Christianson, Mohammad Hajiesmaili, Adam Wierman, Prashant Shenoy

Comments Accepted to SIGMETRICS '26. 65 pages, 11 figures

详情
英文摘要

We introduce and study a class of online problems called online smoothed demand management $(\texttt{OSDM})$, motivated by paradigm shifts in grid integration and energy storage for large energy consumers such as data centers. In $\texttt{OSDM}$, an operator makes two decisions at each time step: an amount of energy to be purchased, and an amount of energy to be delivered (i.e., used for computation). The difference between these decisions charges (or discharges) the operator's energy storage (e.g., a battery). Two types of demand arrive online: base demand, which must be covered at the current time, and flexible demand, which can be satisfied at any time before a demand-specific deadline $Δ_t$. The operator's goal is to minimize a cost (subject to above constraints) that combines a cost of purchasing energy, a cost for delivering energy (if applicable), and smoothness penalties on the purchasing and delivery rates to discourage fluctuations and encourage ``grid healthy'' decisions. $\texttt{OSDM}$ generalizes several problems in the online algorithms literature while being the first to fully model applications of interest. We propose a competitive algorithm for $\texttt{OSDM}$ called $\texttt{PAAD}$ (partitioned accounting & aggregated decisions) and show it achieves the optimal competitive ratio. To overcome the pessimism typical of worst-case analysis, we also propose a novel learning framework that provides guarantees on the worst-case competitive ratio (i.e., to provide robustness against nonstationarity) while allowing end-to-end differentiable learning of the best algorithm on historical instances of the problem. We evaluate our algorithms in a case study of a grid-integrated data center with battery storage, showing that $\texttt{PAAD}$ effectively solves the problem and end-to-end learning achieves substantial performance improvements compared to $\texttt{PAAD}$.

2510.21087 2026-02-23 cs.HC cs.CL

Designing and Evaluating Chain-of-Hints for Scientific Question Answering

Anubhav Jangra, Smaranda Muresan

详情
英文摘要

LLMs are reshaping education, with students increasingly relying on them for learning. Implemented using general-purpose models, these systems are likely to give away the answers, potentially undermining conceptual understanding and critical thinking. Prior work shows that hints can effectively promote cognitive engagement. Building on this insight, we evaluate 18 open-source LLMs on chain-of-hints generation that scaffold users toward the correct answer. We compare two distinct hinting strategies: static hints, pre-generated for each problem, and dynamic hints, adapted to a learners' progress. We evaluate these systems on five pedagogically grounded automatic metrics for hint quality. Using the best performing LLM as the backbone of a quantitative study with 41 participants, we uncover distinct user preferences across hinting strategies, and identify the limitations of automatic evaluation metrics to capture them. Our findings highlight key design considerations for future research on tutoring systems and contribute toward the development of more learner-centered educational technologies.

2510.13887 2026-02-23 eess.IV cs.AI cs.LG stat.ML

Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

Xiaojian Ding, Lin Zhao, Xian Li, Xiaoying Zhu

Comments 13 pages, conference paper. Accepted to the Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS 2025)

详情
英文摘要

Incomplete multi-view data, where certain views are entirely missing for some samples, poses significant challenges for traditional multi-view clustering methods. Existing deep incomplete multi-view clustering approaches often rely on static fusion strategies or two-stage pipelines, leading to suboptimal fusion results and error propagation issues. To address these limitations, this paper proposes a novel incomplete multi-view clustering framework based on Hierarchical Semantic Alignment and Cooperative Completion (HSACC). HSACC achieves robust cross-view fusion through a dual-level semantic space design. In the low-level semantic space, consistency alignment is ensured by maximizing mutual information across views. In the high-level semantic space, adaptive view weights are dynamically assigned based on the distributional affinity between individual views and an initial fused representation, followed by weighted fusion to generate a unified global representation. Additionally, HSACC implicitly recovers missing views by projecting aligned latent representations into high-dimensional semantic spaces and jointly optimizes reconstruction and clustering objectives, enabling cooperative learning of completion and clustering. Experimental results demonstrate that HSACC significantly outperforms state-of-the-art methods on five benchmark datasets. Ablation studies validate the effectiveness of the hierarchical alignment and dynamic weighting mechanisms, while parameter analysis confirms the model's robustness to hyperparameter variations. The code is available at https://github.com/XiaojianDing/2025-NeurIPS-HSACC.

2510.06170 2026-02-23 eess.IV cs.AI cs.CV

Smartphone-based iris recognition through high-quality visible-spectrum iris image capture.V2

Naveenkumar G Venkataswamy, Yu Liu, Soumyabrata Dey, Stephanie Schuckers, Masudul H Imtiaz

Comments The new version is available at arXiv:2512.15548

详情
英文摘要

Smartphone-based iris recognition in the visible spectrum (VIS) remains difficult due to illumination variability, pigmentation differences, and the absence of standardized capture controls. This work presents a compact end-to-end pipeline that enforces ISO/IEC 29794-6 quality compliance at acquisition and demonstrates that accurate VIS iris recognition is feasible on commodity devices. Using a custom Android application performing real-time framing, sharpness evaluation, and feedback, we introduce the CUVIRIS dataset of 752 compliant images from 47 subjects. A lightweight MobileNetV3-based multi-task segmentation network (LightIrisNet) is developed for efficient on-device processing, and a transformer matcher (IrisFormer) is adapted to the VIS domain. Under a standardized protocol and comparative benchmarking against prior CNN baselines, OSIRIS attains a TAR of 97.9% at FAR=0.01 (EER=0.76%), while IrisFormer, trained only on UBIRIS.v2, achieves an EER of 0.057% on CUVIRIS. The acquisition app, trained models, and a public subset of the dataset are released to support reproducibility. These results confirm that standardized capture and VIS-adapted lightweight models enable accurate and practical iris recognition on smartphones.

2509.12253 2026-02-23 eess.IV cs.AI cs.LG

Physics-Informed Neural Networks vs. Physics Models for Non-Invasive Glucose Monitoring: A Comparative Study Under Noise-Stressed Synthetic Conditions

Riyaadh Gani

详情
英文摘要

Non-invasive glucose monitoring outside controlled settings is dominated by low signal-to-noise ratio (SNR): hardware drift, environmental variation, and physiology suppress the glucose signature in NIR signals. We present a noise-stressed NIR simulator that injects 12-bit ADC quantisation, LED drift, photodiode dark noise, temperature/humidity variation, contact-pressure noise, Fitzpatrick I-VI melanin, and glucose variability to create a low-correlation regime (rho_glucose-NIR = 0.21). Using this platform, we benchmark six methods: Enhanced Beer-Lambert (physics-engineered ridge regression), Original PINN, Optimised PINN, RTE-inspired PINN, Selective RTE PINN, and a shallow DNN. The physics-engineered Beer Lambert model achieves the lowest error (13.6 mg/dL RMSE) with only 56 parameters and 0.01 ms inference, outperforming deeper PINNs and the SDNN baseline under low-SNR conditions. The study reframes the task as noise suppression under weak signal and shows that carefully engineered physics features can outperform higher-capacity models in this regime.

2508.16189 2026-02-23 cs.CR cs.AI

A Relay-Chain-Powered Ciphertext-Policy Attribute-Based Encryption in Intelligent Transportation Systems

Aparna Singh, Geetanjali Rathee, Chaker Abdelaziz Kerrache, Mohamed Chahine Ghanem

详情
英文摘要

The very high growth of Intelligent Transportation Systems (ITS) has generated an urgent requirement for secure, effective, and context-aware data sharing mechanisms, especially over heterogeneous and geographically dispersed settings. This work suggests a new architecture that combines a relay chain-driven encryption system with a modified Ciphertext-Policy Attribute-Based Encryption (CP-ABE) scheme to tackle the double impediment of dynamic access and low-latency communication. The model proposes a context-aware smart contract on a worldwide relay chain that checks against data properties, including event type, time, and geographical region, to specify the suitable level of encryption policy. From such relay-directed judgment, On-Board Units (OBUs) encrypt data end-to-end by utilising CP-ABE and store ciphertext inside localised regional blockchains, preventing dependence on symmetric encryption or off-chain storage. High-sensitivity events are secured with firm, multi-attribute access rules, whereas common updates use light policies to help reduce processing burdens. The crypto system also adds traceability and low-latency revocation, with global enforcement managed through the relay chain. This distributed, scalable model provides a proper balance between responsiveness in real time and security and is extremely apt for next-gen vehicular networks that function across multi-jurisdictional domains.

2507.11551 2026-02-23 eess.IV cs.AI cs.CV

Landmark Detection for Medical Images using a General-purpose Segmentation Model

Ekaterina Stansfield, Jennifer A. Mitterer, Abdulrahman Altahhan

Comments 13 pages, 8 figures, 2 tables. Submitted to ICONIP 2025

详情
英文摘要

Radiographic images are a cornerstone of medical diagnostics in orthopaedics, with anatomical landmark detection serving as a crucial intermediate step for information extraction. General-purpose foundational segmentation models, such as SAM (Segment Anything Model), do not support landmark segmentation out of the box and require prompts to function. However, in medical imaging, the prompts for landmarks are highly specific. Since SAM has not been trained to recognize such landmarks, it cannot generate accurate landmark segmentations for diagnostic purposes. Even MedSAM, a medically adapted variant of SAM, has been trained to identify larger anatomical structures, such as organs and their parts, and lacks the fine-grained precision required for orthopaedic pelvic landmarks. To address this limitation, we propose leveraging another general-purpose, non-foundational model: YOLO. YOLO excels in object detection and can provide bounding boxes that serve as input prompts for SAM. While YOLO is efficient at detection, it is significantly outperformed by SAM in segmenting complex structures. In combination, these two models form a reliable pipeline capable of segmenting not only a small pilot set of eight anatomical landmarks but also an expanded set of 72 landmarks and 16 regions with complex outlines, such as the femoral cortical bone and the pelvic inlet. By using YOLO-generated bounding boxes to guide SAM, we trained the hybrid model to accurately segment orthopaedic pelvic radiographs. Our results show that the proposed combination of YOLO and SAM yields excellent performance in detecting anatomical landmarks and intricate outlines in orthopaedic pelvic radiographs.

2506.16552 2026-02-23 cs.IR cs.CL

Revela: Dense Retriever Learning via Language Modeling

Fengyu Cai, Tong Chen, Xinran Zhao, Sihao Chen, Hongming Zhang, Sherry Tongshuang Wu, Iryna Gurevych, Heinz Koeppl

Comments Accepted to ICLR 2026 (Oral). Camera-ready version

详情
英文摘要

Dense retrievers play a vital role in accessing external and specialized knowledge to augment language models (LMs). Training dense retrievers typically requires annotated query-document pairs, which are costly to create and scarce in specialized domains (e.g., code) or in complex settings (e.g., requiring reasoning). These practical challenges have sparked growing interest in self-supervised retriever learning. Since LMs are trained to capture token-level dependencies through a self-supervised learning objective (i.e., next token prediction), we can analogously cast retrieval as learning dependencies among chunks of tokens. This analogy naturally leads to the question: How can we adapt self-supervised learning objectives in the spirit of language modeling to train retrievers? To answer this question, we introduce Revela, a unified and scalable training framework for self-supervised retriever learning via language modeling. Revela models semantic dependencies among documents by conditioning next token prediction on local and cross-document context through an in-batch attention mechanism. This attention is weighted by retriever-computed similarity scores, enabling the retriever to be optimized as part of language modeling. We evaluate Revela on domain-specific (CoIR), reasoning-intensive (BRIGHT), and general-domain (BEIR) benchmarks across various retriever backbones. Without annotated or synthetic query-document pairs, Revela surpasses larger supervised models and proprietary APIs on both CoIR and BRIGHT. It achieves BEIR's unsupervised SoTA with ~1000x less training data and 10x less compute. Performance increases with batch size and model size, highlighting Revela's scalability and its promise for self-supervised retriever learning.