arXivDaily: arXiv daily academic digest, updated Monday to Friday
2510.08176 2026-02-06 cs.SD cs.AI cs.LG eess.AS

Leveraging Whisper Embeddings for Audio-based Lyrics Matching

Eleonora Mancini, Joan Serrà, Paolo Torroni, Yuki Mitsufuji

Comments Accepted at ICASSP 2026 (IEEE International Conference on Acoustics, Speech and Signal Processing)

Audio-based lyrics matching can be an appealing alternative to other content-based retrieval approaches, but existing methods often suffer from limited reproducibility and inconsistent baselines. In this work, we introduce WEALY, a fully reproducible pipeline that leverages Whisper decoder embeddings for lyrics matching tasks. WEALY establishes robust and transparent baselines, while also exploring multimodal extensions that integrate textual and acoustic features. Through extensive experiments on standard datasets, we demonstrate that WEALY achieves a performance comparable to state-of-the-art methods that lack reproducibility. In addition, we provide ablation studies and analyses on language robustness, loss functions, and embedding strategies. This work contributes a reliable benchmark for future research, and underscores the potential of speech technologies for music information retrieval tasks.

2510.04660 2026-02-06 cs.LG

An Attention-based Feature Memory Design for Energy-Efficient Continual Learning

Yuandou Wang, Filip Gunnarsson, Rihan Hai

Tabular data streams are increasingly prevalent in real-time decision-making across healthcare, finance, and the Internet of Things, often generated and processed on resource-constrained edge and mobile devices. Continual learning (CL) enables models to learn sequentially from such streams while retaining previously acquired knowledge. While recent CL advances have made significant progress in mitigating catastrophic forgetting, the energy and memory efficiency of CL for tabular data streams remains largely unexplored. To address this gap, we propose AttenMLP, which integrates attention-based feature replay with context retrieval and sliding buffer updates within a minibatch training framework for streaming tabular learning. We evaluate AttenMLP against state-of-the-art (SOTA) tabular models on real-world concept drift benchmarks with temporal distribution shifts. Experimental results show that AttenMLP achieves accuracy comparable to strong baselines without replay, while substantially reducing energy consumption through tunable design choices. In particular, with the proposed attention-based feature memory design, AttenMLP incurs a 0.062 decrease in final accuracy on the incremental concept drift dataset while reducing energy usage by up to 33.3% compared to TabPFNv2. On the abrupt concept drift dataset, AttenMLP reduces energy consumption by 1.47% compared to TabR, at the cost of a 0.038 decrease in final accuracy. Although ranking third in global efficiency, AttenMLP demonstrates energy-accuracy trade-offs across both abrupt and incremental concept drift scenarios compared to SOTA tabular models.

2510.03999 2026-02-06 cs.CL

LH-Deception: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions

Yang Xu, Xuanming Zhang, Samuel Yeh, Jwala Dhamala, Ousmane Dia, Rahul Gupta, Sharon Li

Comments ICLR 2026

Deception is a pervasive feature of human communication and an emerging concern in large language models (LLMs). While recent studies document instances of LLM deception, most evaluations remain confined to single-turn prompts and fail to capture the long-horizon interactions in which deceptive strategies typically unfold. We introduce a new simulation framework, LH-Deception, for a systematic, empirical quantification of deception in LLMs under extended sequences of interdependent tasks and dynamic contextual pressures. LH-Deception is designed as a multi-agent system: a performer agent tasked with completing tasks and a supervisor agent that evaluates progress, provides feedback, and maintains evolving states of trust. An independent deception auditor then reviews full trajectories to identify when and how deception occurs. We conduct extensive experiments across 11 frontier models, spanning both closed-source and open-source systems, and find that deception is model-dependent, increases with event pressure, and consistently erodes supervisor trust. Qualitative analyses further reveal emergent, long-horizon phenomena, such as "chains of deception", which are invisible to static, single-turn evaluations. Our findings provide a foundation for evaluating future LLMs in real-world, trust-sensitive contexts.

2510.02345 2026-02-06 cs.CL cs.AI cs.DC cs.LG cs.NE

Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression

Peijun Zhu, Ning Yang, Baoliang Tian, Jiayu Wei, Weihao Zhang, Haijun Zhang, Pin Lv

Comments 10 pages, 2 figures, 8 tables. Under review as a conference paper at ICML 2026

Mixture-of-Experts (MoE) Large Language Models (LLMs) face a trilemma of load imbalance, parameter redundancy, and communication overhead. We introduce a unified framework based on dynamic expert clustering and structured compression to address these issues cohesively. Our method employs an online clustering procedure that periodically regroups experts using a fused metric of parameter and activation similarity, which stabilizes expert utilization. To our knowledge, this is one of the first frameworks to leverage the semantic embedding capability of the router to dynamically reconfigure the model's architecture during training for substantial efficiency gains. Within each cluster, we decompose expert weights into a shared base matrix and extremely low-rank residual adapters, achieving up to fivefold parameter reduction per group while preserving specialization. This structure enables a two-stage hierarchical routing strategy: tokens are first assigned to a cluster, then to specific experts within it, drastically reducing the routing search space and the volume of all-to-all communication. Furthermore, a heterogeneous precision scheme, which stores shared bases in FP16 and residual factors in INT4, coupled with dynamic offloading of inactive clusters, reduces peak memory consumption to levels comparable to dense models. Evaluated on GLUE and WikiText-103, our framework matches the quality of standard MoE models while reducing total parameters by approximately 80%, improving throughput by 10% to 20%, and lowering expert load variance by a factor of over three. Our work demonstrates that structural reorganization is a principled path toward scalable, efficient, and memory-effective MoE LLMs. Code is available at https://github.com/szdtzpj/Breaking_the_moe_trilemma
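
The shared-base-plus-residual decomposition mentioned in this abstract can be illustrated with simple parameter-count arithmetic. The sketch below assumes each expert weight in a cluster is approximated as W_i ≈ B + U_i V_i, with one shared base B and rank-r factors U_i and V_i per expert. The function name, dimensions, cluster size, and rank are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def cluster_param_counts(num_experts, d_out, d_in, rank):
    """Return (dense, compressed) parameter counts for one expert cluster.

    dense:      every expert stores its own d_out x d_in matrix.
    compressed: one shared base matrix plus, per expert, a rank-`rank`
                residual U_i (d_out x rank) and V_i (rank x d_in).
    """
    dense = num_experts * d_out * d_in
    shared_base = d_out * d_in
    residuals = num_experts * rank * (d_out + d_in)
    return dense, shared_base + residuals


# Hypothetical sizes, for illustration only.
dense, compressed = cluster_param_counts(num_experts=5, d_out=512, d_in=512, rank=8)
reduction = dense / compressed
print(f"dense={dense}, compressed={compressed}, reduction={reduction:.2f}x")
```

Under this assumed form, the reduction factor is always below the number of experts in the cluster and approaches it as the rank shrinks relative to the matrix dimensions.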

2509.22650 2026-02-06 cs.CV

RefAM: Attention Magnets for Zero-Shot Referral Segmentation

Anna Kukleva, Enis Simsar, Alessio Tonioni, Muhammad Ferjad Naeem, Federico Tombari, Jan Eric Lenssen, Bernt Schiele

Comments Project Page: https://refam-diffusion.github.io/

Most existing approaches to referring segmentation achieve strong performance only through fine-tuning or by composing multiple pre-trained models, often at the cost of additional training and architectural modifications. Meanwhile, large-scale generative diffusion models encode rich semantic information, making them attractive as general-purpose feature extractors. In this work, we introduce a new method that directly exploits features (attention scores) from diffusion transformers for downstream tasks, requiring neither architectural modifications nor additional training. To systematically evaluate these features, we extend benchmarks with vision-language grounding tasks spanning both images and videos. Our key insight is that stop words act as attention magnets: they accumulate surplus attention and can be filtered to reduce noise. Moreover, we identify global attention sinks (GAS) emerging in deeper layers and show that they can be safely suppressed or redirected onto auxiliary tokens, leading to sharper and more accurate grounding maps. We further propose an attention redistribution strategy, where appended stop words partition background activations into smaller clusters, yielding sharper and more localized heatmaps. Building on these findings, we develop RefAM, a simple training-free grounding framework that combines cross-attention maps, GAS handling, and redistribution. Across zero-shot referring image and video segmentation benchmarks, our approach achieves strong performance and surpasses prior methods on most datasets, establishing a new state of the art without fine-tuning, additional components, or complex reasoning.

2509.22352 2026-02-06 cs.LG cs.AI

SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis

Marie Brockschmidt, Maresa Schröder, Stefan Feuerriegel

Survival analysis is a cornerstone of clinical research, modeling time-to-event outcomes such as metastasis, disease relapse, or patient death. Unlike standard tabular data, survival data often come with incomplete event information due to dropout or loss to follow-up. This poses unique challenges for synthetic data generation, where it is crucial for clinical research to faithfully reproduce both the event-time distribution and the censoring mechanism. In this paper, we propose SurvDiff, an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. SurvDiff is tailored to capture the data-generating mechanism by jointly generating mixed-type covariates, event times, and right-censoring, guided by a survival-tailored loss function. The loss encodes the time-to-event structure and directly optimizes for downstream survival tasks, which ensures that SurvDiff (i) reproduces realistic event-time distributions and (ii) preserves the censoring mechanism. We show that SurvDiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and survival model evaluation metrics across multiple medical datasets. To the best of our knowledge, SurvDiff is the first end-to-end diffusion model explicitly designed for generating synthetic survival data.

2509.21818 2026-02-06 cs.LG math.OC

Sharpness-Aware Minimization Can Hallucinate Minimizers

Chanwoong Park, Uijeong Jang, Ernest K. Ryu, Insoon Yang

Sharpness-Aware Minimization (SAM) is widely used to seek flatter minima -- often linked to better generalization. In its standard implementation, SAM updates the current iterate using the loss gradient evaluated at a point perturbed by distance $ρ$ along the normalized gradient direction. We show that, for some choices of $ρ$, SAM can stall at points where this shifted (perturbed-point) gradient vanishes despite a nonzero original gradient, so these points are not stationary points of the original loss. We call these points hallucinated minimizers, prove their existence under simple nonconvex landscape conditions (e.g., the presence of a local minimizer and a local maximizer), and establish sufficient conditions for local convergence of the SAM iterates to them. We corroborate this failure mode in neural network training and observe that it aligns with SAM's performance degradation often seen at large $ρ$. Finally, as a practical safeguard, we find that a short initial SGD warm-start before enabling SAM mitigates this failure mode and reduces sensitivity to the choice of $ρ$.
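
The stalling behavior described in this abstract can be reproduced on a one-dimensional toy problem. The sketch below is an illustration, not the paper's construction: the loss f(x) = cos(x), the step size, the starting point, and the large value ρ = 4 are all assumptions chosen so that the perturbed point lands on the local minimizer at x = π while the iterate itself sits where the original gradient is nonzero.

```python
import math


def grad(x):
    """Gradient of the toy loss f(x) = cos(x) (an assumed example)."""
    return -math.sin(x)


def sam_step(x, lr, rho):
    """One SAM update: step along the gradient taken at the perturbed point."""
    g = grad(x)
    if g == 0.0:
        return x  # true stationary point, nothing to normalize
    direction = g / abs(g)  # normalized gradient; in 1D this is its sign
    return x - lr * grad(x + rho * direction)


def run_sam(x0, lr=0.5, rho=4.0, steps=200):
    x = x0
    for _ in range(steps):
        x = sam_step(x, lr, rho)
    return x


x_final = run_sam(-1.0)
print("SAM iterate:", x_final)
print("original gradient there:", grad(x_final))
print("perturbed-point gradient:", grad(x_final + 4.0))
```

In this toy setting the iterates settle at x = π − 4, where the perturbed-point gradient vanishes but the original gradient does not, so the limit is not a stationary point of cos(x).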

2509.21004 2026-02-06 cs.LG

Multi-Agent Inverted Transformer for Flight Trajectory Prediction

Seokbin Yoon, Keumjin Lee

Comments 11 pages, 8 figures, submitted to IEEE Transactions on Intelligent Transportation Systems

Flight trajectory prediction for multiple aircraft is essential and provides critical insights into how aircraft navigate within current air traffic flows. However, predicting multi-agent flight trajectories is inherently challenging. One of the major difficulties is modeling both the individual aircraft behaviors over time and the complex interactions between flights. Generating explainable prediction outcomes is also a challenge. Therefore, we propose a Multi-Agent Inverted Transformer, MAIFormer, as a novel neural architecture that predicts multi-agent flight trajectories. The proposed framework features two key attention modules: (i) masked multivariate attention, which captures spatio-temporal patterns of individual aircraft, and (ii) agent attention, which models the social patterns among multiple agents in complex air traffic scenes. We evaluated MAIFormer using a real-world automatic dependent surveillance-broadcast flight trajectory dataset from the terminal airspace of Incheon International Airport in South Korea. The experimental results show that MAIFormer achieves the best performance across multiple metrics and outperforms other methods. In addition, MAIFormer produces prediction outcomes that are interpretable from a human perspective, which improves both the transparency of the model and its practical utility in air traffic control.

2509.20900 2026-02-06 cs.CL

Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization

Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch

Long document summarization remains a significant challenge for current large language models (LLMs), as existing approaches commonly struggle with information loss, factual inconsistencies, and coherence issues when processing excessively long documents. We propose SummQ, a novel adversarial multi-agent framework that addresses these limitations through collaborative intelligence between specialized agents operating in two complementary domains: summarization and quizzing. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries, while quiz generators and reviewers create comprehension questions that serve as continuous quality checks for the summarization process. This adversarial dynamic, enhanced by an examinee agent that validates whether the generated summary contains the information needed to answer the quiz questions, enables iterative refinement through multifaceted feedback mechanisms. We evaluate SummQ on three widely used long document summarization benchmarks. Experimental results demonstrate that our framework significantly outperforms existing state-of-the-art methods across ROUGE and BERTScore metrics, as well as in LLM-as-a-Judge and human evaluations. Our comprehensive analyses reveal the effectiveness of the multi-agent collaboration dynamics, the influence of different agent configurations, and the impact of the quizzing mechanism. This work establishes a new approach for long document summarization that uses adversarial agentic collaboration to improve summarization quality.

2509.19391 2026-02-06 cs.LG cs.AI

TensLoRA: Tensor Alternatives for Low-Rank Adaptation

Axel Marmoret, Reda Bensaid, Jonathan Lys, Vincent Gripon, François Leduc-Primeau

Comments Published at ICASSP 2026. 5 pages, 1 figure, 2 tables. Code can be found at https://github.com/ax-le/TensLoRA

Low-Rank Adaptation (LoRA) is widely used to efficiently adapt Transformers by adding trainable low-rank matrices to attention projections. While effective, these matrices are considered independent for each attention projection (Query, Key, and Value) and each layer. Recent extensions have considered joint, tensor-based adaptations, but only in limited forms and without a systematic framework. We introduce TensLoRA, a unified framework that aggregates LoRA updates into higher-order tensors and models a broad family of tensor-based low-rank adaptations. Our formulation generalizes existing tensor-based methods and enables mode-specific compression rates, allowing parameter budgets to be tailored according to the modality and task. Experiments on vision and language benchmarks reveal that the tensor construction directly impacts performance, with some constructions outperforming standard LoRA under similar parameter counts.

2509.11298 2026-02-06 cs.LG cs.AI cs.CL

When Are Two RLHF Objectives the Same?

Madhava Gaikwad

Comments 21 pages

The preference optimization literature contains many proposed objectives, often presented as distinct improvements. We introduce Opal, a canonicalization algorithm that determines whether two preference objectives are algebraically equivalent by producing either a canonical form or a concrete witness of non-equivalence. Applying Opal reveals that many widely used methods optimize the same underlying objective, while others are provably distinct. For example, batch normalization can cause the same response pair to receive different gradients depending on batch composition. We identify a small set of structural mechanisms that give rise to genuinely different objectives; most remaining differences are reparameterizations.

2509.06690 2026-02-06 cs.CV cs.AI cs.AR

BioLite U-Net: Edge-Deployable Semantic Segmentation for In Situ Bioprinting Monitoring

Usman Haider, Lukasz Szemet, Daniel Kelly, Vasileios Sergis, Andrew C. Daly, Karl Mason

Comments 8 pages, 5 figures, conference-style submission (ICRA 2026). Includes dataset description, BioLite U-Net architecture, benchmark results on edge device (Raspberry Pi 4B)

Bioprinting is a rapidly advancing field that offers a transformative approach to fabricating tissue and organ models through the precise deposition of cell-laden bioinks. Ensuring the fidelity and consistency of printed structures in real-time remains a core challenge, particularly under constraints imposed by limited imaging data and resource-constrained embedded hardware. Semantic segmentation of the extrusion process, differentiating between nozzle, extruded bioink, and surrounding background, enables in situ monitoring critical to maintaining print quality and biological viability. In this work, we introduce a lightweight semantic segmentation framework tailored for real-time bioprinting applications. We present a novel, manually annotated dataset comprising 787 RGB images captured during the bioprinting process, labeled across three classes: nozzle, bioink, and background. To achieve fast and efficient inference suitable for integration with bioprinting systems, we propose a BioLite U-Net architecture that leverages depthwise separable convolutions to drastically reduce computational load without compromising accuracy. Our model is benchmarked against MobileNetV2 and MobileNetV3-based segmentation baselines using mean Intersection over Union (mIoU), Dice score, and pixel accuracy. All models were evaluated on a Raspberry Pi 4B to assess real-world feasibility. The proposed BioLite U-Net achieves an mIoU of 92.85% and a Dice score of 96.17%, while being over 1300x smaller than MobileNetV2-DeepLabV3+. On-device inference takes 335 ms per frame, demonstrating near real-time capability. Compared to MobileNet baselines, BioLite U-Net offers a superior tradeoff between segmentation accuracy, efficiency, and deployability, making it highly suitable for intelligent, closed-loop bioprinting systems.

2509.06505 2026-02-06 cs.LG cs.IT math.IT stat.ML

On optimal solutions of classical and sliced Wasserstein GANs with non-Gaussian data

Yu-Jui Huang, Hsin-Hua Shen, Yu-Chih Huang, Wan-Yi Lin, Shih-Chun Lin

The generative adversarial network (GAN) aims to approximate an unknown distribution via a parameterized neural network (NN). While GANs have been widely applied in reinforcement and semi-supervised learning as well as computer vision tasks, selecting their parameters often needs an exhaustive search, and only a few selection methods have been proven to be theoretically optimal. One of the most promising GAN variants is the Wasserstein GAN (WGAN). Prior work on optimal parameters for population WGAN is limited to the linear-quadratic-Gaussian (LQG) setting, where the generator NN is linear, and the data is Gaussian. In this paper, we focus on the characterization of optimal solutions of population WGAN beyond the LQG setting. As a basic result, closed-form optimal parameters for one-dimensional WGAN are derived when the NN has non-linear activation functions, and the data is non-Gaussian. For high-dimensional data, we adopt the sliced Wasserstein framework and show that the linear generator can be asymptotically optimal. Moreover, the original sliced WGAN only constrains the projected data marginal instead of the whole one in classical WGAN, and thus, we propose a new unprojected sliced WGAN and identify its asymptotic optimality. Empirical studies show that compared to the celebrated r-principal component analysis (r-PCA) solution, which has cubic complexity in the data dimension, our generator for sliced WGAN can achieve better performance with only linear complexity.

2509.04661 2026-02-06 cs.LG cs.NE

Flexible inference for animal learning rules using neural networks

Yuhan Helena Liu, Victor Geadah, Jonathan Pillow

Understanding how animals learn is a central challenge in neuroscience, with growing relevance to the development of animal- or human-aligned artificial intelligence. However, existing approaches tend to assume fixed parametric forms for the learning rule (e.g., Q-learning, policy gradient), which may not accurately describe the complex forms of learning employed by animals in realistic settings. Here we address this gap by developing a framework to infer learning rules directly from behavioral data collected during de novo task learning. We assume that animals follow a decision policy parameterized by a generalized linear model (GLM), and we model their learning rule -- the mapping from task covariates to per-trial weight updates -- using a deep neural network (DNN). This formulation allows flexible, data-driven inference of learning rules while maintaining an interpretable form of the decision policy itself. To capture more complex learning dynamics, we introduce a recurrent neural network (RNN) variant that relaxes the Markovian assumption that learning depends solely on covariates of the current trial, allowing for learning rules that integrate information over multiple trials. Simulations demonstrate that the framework can recover ground-truth learning rules. We applied our DNN and RNN-based methods to a large behavioral dataset from mice learning to perform a sensory decision-making task and found that they outperformed traditional RL learning rules at predicting the learning trajectories of held-out mice. The inferred learning rules exhibited reward-history-dependent learning dynamics, with larger updates following sequences of rewarded trials. Overall, these methods provide a flexible framework for inferring learning rules from behavioral data in de novo learning tasks, setting the stage for improved animal training protocols and the development of behavioral digital twins.

2509.03493 2026-02-06 cs.LG cs.AI

On Entropy Control in LLM-RL Algorithms

Han Shen

Comments Updated with ICLR 2026 version

For RL algorithms, appropriate entropy control is crucial to their effectiveness. To control the policy entropy, a commonly used method is entropy regularization, which is adopted in various popular RL algorithms including PPO, SAC and A3C. Although entropy regularization has conventionally proven effective in robotics and game RL, studies have found that it gives weak to no gains in LLM-RL training. In this work, we study the issues of the entropy bonus in the LLM-RL setting. Specifically, we first argue that conventional entropy regularization suffers from the LLM's extremely large response space and the sparsity of the optimal outputs. As a remedy, we propose AEnt, an entropy control method that utilizes a new clamped entropy bonus with an automatically adjusted coefficient. The clamped entropy is evaluated with the re-normalized policy defined on a smaller token space, which encourages exploration within a more compact response set. In addition, the algorithm automatically adjusts the entropy coefficient according to the clamped entropy value, effectively controlling the entropy-induced bias while leveraging the entropy's benefits. AEnt is tested on math-reasoning tasks under different base models and datasets, and it is observed that AEnt outperforms the baselines consistently across multiple benchmarks.
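
The idea of evaluating entropy on a re-normalized policy over a smaller token space can be sketched as follows. This is only an illustration under a loud assumption: the abstract does not say how the smaller token space is chosen, so the sketch simply keeps the k most probable tokens. The function names and the example distribution are hypothetical, and the clamping and automatic coefficient adjustment of AEnt are not shown.

```python
import math


def full_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def renormalized_entropy(probs, k):
    """Entropy of the distribution restricted to the k most probable tokens.

    Assumption: top-k selection stands in for the 'smaller token space';
    the actual selection rule is not given in the abstract.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    renorm = [p / total for p in top]  # re-normalize so the kept mass sums to 1
    return -sum(p * math.log(p) for p in renorm if p > 0)


probs = [0.5, 0.3, 0.1, 0.1]  # hypothetical next-token probabilities
print("full entropy:", full_entropy(probs))
print("entropy on top-2 tokens:", renormalized_entropy(probs, k=2))
```

Restricting to k tokens caps the entropy at log(k), which is one way to read "exploration within a more compact response set".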

2509.02276 2026-02-06 cs.AI

Rewarding Explainability in Drug Repurposing with Knowledge Graphs

Susana Nunes, Samy Badreddine, Catia Pesquita

Comments 9 pages, 4 figures, accepted at conference IJCAI 2025

Journal ref Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), pp. 4624-4632, 2025

Knowledge graphs (KGs) are powerful tools for modelling complex, multi-relational data and supporting hypothesis generation, particularly in applications like drug repurposing. However, for predictive methods to gain acceptance as credible scientific tools, they must ensure not only accuracy but also the capacity to offer meaningful scientific explanations. This paper presents a novel approach, REx, for generating scientific explanations based on link prediction in knowledge graphs. It employs reward and policy mechanisms that consider desirable properties of scientific explanation to guide a reinforcement learning agent in the identification of explanatory paths within a KG. The approach further enriches explanatory paths with domain-specific ontologies, ensuring that the explanations are both insightful and grounded in established biomedical knowledge. We evaluate our approach in drug repurposing using three popular knowledge graph benchmarks. The results clearly demonstrate its ability to generate explanations that validate predictive insights against biomedical knowledge and that outperform the state-of-the-art approaches in predictive performance, establishing REx as a relevant contribution to advance AI-driven scientific discovery.

2508.21051 2026-02-06 cs.CL cs.AI cs.CY

Language Models and Logic Programs for Trustworthy Tax Reasoning

William Jurayj, Nils Holzenberger, Benjamin Van Durme

Comments Accepted to AAAI 2026

According to the United States Internal Revenue Service, "the average American spends $270 and 13 hours filing their taxes". Even beyond the U.S., tax filing requires complex reasoning, combining application of overlapping rules with numerical calculations. Because errors can incur costly penalties, any automated system must deliver high accuracy and auditability, making modern large language models (LLMs) poorly suited for this task. We propose an approach that integrates LLMs with a symbolic solver to calculate tax obligations. We evaluate variants of this system on the challenging StAtutory Reasoning Assessment (SARA) dataset, and include a novel method for estimating the cost of deploying such a system based on real-world penalties for tax errors. We further show how combining up-front translation of plain-text rules into formal logic programs with intelligently retrieved exemplars for formal case representations can dramatically improve performance on this task and reduce costs to well below real-world averages. Our results demonstrate the effectiveness of applying semantic parsing methods to statutory reasoning, and show promising economic feasibility of neuro-symbolic architectures for increasing access to reliable tax assistance.

2508.19842 2026-02-06 cs.LG

Symplectic convolutional neural networks

Süleyman Yıldız, Konrad Janik, Peter Benner

We propose a new symplectic convolutional neural network (CNN) architecture by leveraging symplectic neural networks, proper symplectic decomposition, and tensor techniques. Specifically, we first introduce a mathematically equivalent form of the convolution layer and then, using symplectic neural networks, we demonstrate a way to parameterize the layers of the CNN to ensure that the convolution layer remains symplectic. To construct a complete autoencoder, we introduce a symplectic pooling layer. We demonstrate the performance of the proposed neural network on three examples: the wave equation, the nonlinear Schrödinger (NLS) equation, and the sine-Gordon equation. The numerical results indicate that the symplectic CNN outperforms the linear symplectic autoencoder obtained via proper symplectic decomposition.

2508.04349 2026-02-06 cs.CL cs.AI

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy

Hongze Tan, Zihan Wang, Jianfei Pan, Jinghao Lin, Hao Wang, Yifan Wu, Tao Chen, Zhihang Zheng, Zhihao Tang, Haihua Yang

Reinforcement Learning (RL) is pivotal for enhancing Large Language Model (LLM) reasoning, yet mainstream algorithms such as GRPO and DAPO remain constrained by a coarse-grained credit assignment paradigm, where all tokens within the same response receive an identical reward. In this paper, we propose Dynamic Entropy Weighting, which systematically defines entropy-based weight ratios $\frac{H_{i,t}}{\sum_{k=1}^{n} H_{k,t}}$ and similar variants to redistribute rewards at a finer granularity, through two new algorithms: Group Token Policy Optimization (GTPO), which assigns an entropy-weighted reward to each token and synthesizes a token-specific advantage function to drive the model toward the optimal path, and the analogous Sequence-Level GRPO (GRPO-S), which extends this design to the sequence level and exhibits superior stability in long Chain-of-Thought (CoT) reasoning tasks.
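
The weight ratio in this abstract can be computed directly. The sketch below assumes that i indexes the n responses in a group and t indexes token positions, and that all responses have been padded or truncated to the same length; both are assumptions, since the abstract does not define the indices. The function name and the uniform fallback when every entropy at a position is zero are hypothetical, and how the ratios are then combined with rewards in GTPO or GRPO-S is not shown.

```python
def entropy_weight_ratios(H):
    """Compute w[i][t] = H[i][t] / sum_k H[k][t] for a group of responses.

    H is a list of n equal-length lists; H[i][t] is the policy entropy of
    response i at token position t (assumed layout).
    """
    n = len(H)
    length = len(H[0])
    weights = [[0.0] * length for _ in range(n)]
    for t in range(length):
        total = sum(H[k][t] for k in range(n))
        for i in range(n):
            # Hypothetical fallback: share weight uniformly if all entropies are zero.
            weights[i][t] = H[i][t] / total if total > 0 else 1.0 / n
    return weights


# Hypothetical entropies for a group of two responses with two tokens each.
H = [[1.0, 0.5],
     [3.0, 0.5]]
weights = entropy_weight_ratios(H)
print(weights)
```

By construction the ratios at each token position sum to one across the group, so higher-entropy tokens receive a larger share at that position.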

2508.01151 2026-02-06 cs.CV cs.AI

Personalized Safety Alignment for Text-to-Image Diffusion Models

Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Hongcheng Gao, Xiao Zhang, Rex Ying

Text-to-image diffusion models have revolutionized visual content generation, yet their deployment is hindered by a fundamental limitation: safety mechanisms enforce rigid, uniform standards that fail to reflect diverse user preferences shaped by age, culture, or personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that transitions generative safety from static filtration to user-conditioned adaptation. We introduce Sage, a large-scale dataset capturing diverse safety boundaries across 1,000 simulated user profiles, covering complex risks often missed by traditional datasets. By integrating these profiles via a parameter-efficient cross-attention adapter, PSA dynamically modulates generation to align with individual sensitivities. Extensive experiments demonstrate that PSA achieves a calibrated safety-quality trade-off: under permissive profiles, it relaxes over-cautious constraints to enhance visual fidelity, while under restrictive profiles, it enforces state-of-the-art suppression, significantly outperforming static baselines. Furthermore, PSA exhibits superior instruction adherence compared to prompt-engineering methods, establishing personalization as a vital direction for creating adaptive, user-centered, and responsible generative AI. Our code, data, and models are publicly available at https://github.com/M-E-AGI-Lab/PSAlign.

2507.17001 2026-02-06 cs.LG

Should Bias be Eliminated? A General Framework to Use Bias for OOD Generalization

Yan Li, Yunlong Deng, Zijian Li, Anpeng Wu, Zeyu Tang, Kun Zhang, Guangyi Chen

Most approaches to out-of-distribution (OOD) generalization learn domain-invariant representations by discarding contextual bias. In this paper, we raise a critical question: Should bias be eliminated? If not, is there a general way to leverage bias for better OOD generalization? To answer these questions, we first provide a theoretical analysis that characterizes the circumstances in which biased features contribute positively. Although theoretical results show that bias may sometimes play a positive role, leveraging it effectively is non-trivial, since its harmful and beneficial components are often entangled. Recent advances have sought to refine the prediction of bias by presuming reliable predictions from invariant features. However, such assumptions may be too strong in the real world, especially when the target also shifts from training to testing domains. Motivated by this challenge, we introduce a framework to leverage bias in a more general scenario. Specifically, we employ a generative model to capture the data generation process and identify the underlying bias factors, which are then used to construct a bias-aware predictor. Since the bias-aware predictor may shift across environments, we first estimate the environment state to train predictors under different environments, combining them as a mixture of domain experts for the final prediction. Then, we build a general invariant predictor, which can be invariant under label shift to guide the adaptation of the bias-aware predictor. Evaluations on synthetic data and standard domain generalization benchmarks demonstrate that our method consistently outperforms invariance-only baselines, recent bias-utilization approaches, and advanced baselines, yielding improved robustness and adaptability.

2507.13772 2026-02-06 cs.CV cs.LG

Feature Engineering is Not Dead: Reviving Classical Machine Learning with Entropy, HOG, and LBP Feature Fusion for Image Classification

Abhijit Sen, Giridas Maiti, Bikram K. Parida, Bhanu P. Mishra, Mahima Arya, Denys I. Bondar

Abstract

Feature engineering continues to play a critical role in image classification, particularly when interpretability and computational efficiency are prioritized over deep learning models with millions of parameters. In this study, we revisit classical machine-learning-based image classification through a novel approach centered on Permutation Entropy (PE), a robust and computationally lightweight measure traditionally used in time series analysis but rarely applied to image data. We extend PE to two-dimensional images and propose a multiscale, multi-orientation entropy-based feature extraction approach that characterizes spatial order and complexity along rows, columns, diagonals, anti-diagonals, and local patches of the image. To enhance the discriminatory power of the entropy features, we integrate two classic image descriptors: the Histogram of Oriented Gradients (HOG) to capture shape and edge structure, and Local Binary Patterns (LBP) to encode the micro-texture of an image. The resulting hand-crafted feature set, comprising 780 dimensions, is used to train Support Vector Machine (SVM) classifiers optimized through grid search. The proposed approach is evaluated on multiple benchmark datasets, including Fashion-MNIST, KMNIST, EMNIST, and CIFAR-10, where it delivers competitive classification performance without relying on deep architectures. Our results demonstrate that the fusion of PE with HOG and LBP provides a compact, interpretable, and effective alternative to computationally expensive and less interpretable deep learning models. This shows the potential of entropy-based descriptors in image classification and contributes a lightweight and generalizable solution to interpretable machine learning in image classification and computer vision.
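
As a rough illustration of the entropy features described in this abstract, the sketch below computes normalized permutation entropy for a 1-D sequence and applies it to the rows and columns of a small image. It is a minimal sketch, not the authors' implementation: the function names, the default `order` and `delay`, and the restriction to rows and columns (no diagonals, patches, or multiple scales) are assumptions made here for brevity.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence, in [0, 1]."""
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for the given order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: the index order that sorts the window
        # (ties are broken by position because sorted() is stable).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    probs = [count / n_windows for count in patterns.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by log(order!) so that a uniform pattern distribution gives 1.
    return entropy / math.log(math.factorial(order))


def row_column_entropies(image, order=3, delay=1):
    """Permutation entropy of every row and every column of a 2-D list."""
    rows = [permutation_entropy(row, order, delay) for row in image]
    cols = [permutation_entropy(list(col), order, delay) for col in zip(*image)]
    return rows, cols
```

A feature vector in this spirit would concatenate such entropies over several orientations and scales before fusing them with HOG and LBP descriptors; the exact layout of the 780 dimensions is not specified in the abstract.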

2507.13624 2026-02-06 cs.LG cs.DC cs.NI

FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning

Daniel Commey, Kamel Abbad, Garth V. Crosby, Lyes Khoukhi

Journal ref 2026 IEEE 23rd Consumer Communications & Networking Conference (CCNC)

Abstract

Communication overhead remains a primary bottleneck in federated learning (FL), particularly for applications involving mobile and IoT devices with constrained bandwidth. This work introduces FedSkipTwin, a novel client-skipping algorithm driven by lightweight, server-side digital twins. Each twin, implemented as a simple LSTM, observes a client's historical sequence of gradient norms to forecast both the magnitude and the epistemic uncertainty of its next update. The server leverages these predictions, requesting communication only when either value exceeds a predefined threshold; otherwise, it instructs the client to skip the round, thereby saving bandwidth. Experiments are conducted on the UCI-HAR and MNIST datasets with 10 clients under a non-IID data distribution. The results demonstrate that FedSkipTwin reduces total communication by 12-15.5% across 20 rounds while simultaneously improving final model accuracy by up to 0.5 percentage points compared to the standard FedAvg algorithm. These findings establish that prediction-guided skipping is a practical and effective strategy for resource-aware FL in bandwidth-constrained edge environments.
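
The server-side decision rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `TwinForecast` container, the function names, and the threshold values are hypothetical, and the LSTM twin that would produce the forecasts is not shown.

```python
from dataclasses import dataclass


@dataclass
class TwinForecast:
    predicted_norm: float  # forecast magnitude of the client's next update
    uncertainty: float     # forecast epistemic uncertainty of that estimate


def should_communicate(forecast, norm_threshold, uncertainty_threshold):
    """Request an update when either forecast value exceeds its threshold."""
    return (forecast.predicted_norm > norm_threshold
            or forecast.uncertainty > uncertainty_threshold)


def select_clients(forecasts, norm_threshold=0.1, uncertainty_threshold=0.05):
    """Return the ids of clients asked to upload this round; the rest skip."""
    return [
        client_id
        for client_id, forecast in forecasts.items()
        if should_communicate(forecast, norm_threshold, uncertainty_threshold)
    ]
```

A client with a small predicted update and a confident twin is skipped, while a client whose twin is uncertain is still polled, which keeps the twin from silently drifting away from the client's real behavior.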

2507.13579 2026-02-06 cs.LG cs.AI

Learning to summarize user information for personalized reinforcement learning from human feedback

Hyunji Nam, Yanming Wan, Mickel Liu, Peter Ahnn, Jianxun Lian, Natasha Jaques

Comments 10 pages for main text, 10 pages for appendix

Abstract

As everyday use cases of large language model (LLM) AI assistants have expanded, it is becoming increasingly important to personalize responses to align to different users' preferences and goals. While reinforcement learning from human feedback (RLHF) is effective at improving LLMs to be generally more helpful and fluent, it does not account for variability across users, as it models the entire user population with a single reward model, meaning it assumes that everyone's preferences are the same. We present a novel framework, Preference Learning Using Summarization (PLUS), that uses reinforcement learning (RL) to learn to produce text-based summaries of each user's preferences, characteristics, and past conversations. These summaries condition the reward model, enabling it to make personalized predictions about the types of responses valued by each user. Both the user-summarization model and reward model are trained simultaneously, creating an online co-adaptation loop. We show that in contrast to the standard Bradley-Terry model, summaries produced by PLUS capture diverse aspects of user preferences, achieving an 11-77% improvement in reward model accuracy. Key strengths of PLUS are: (1) robust performance with new users and conversation topics, achieving a 25% improvement over the best personalized reward model technique used for RLHF; (2) zero-shot personalization with state-of-the-art proprietary models like GPT-4 (e.g., PLUS-summary-conditioned responses achieved a 72% win rate compared to 28% for default GPT-4o); (3) learning from flexible user contexts beyond preference labels; and (4) interpretable representation of users, enabling greater transparency and user control in pluralistic LLM alignment.
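
For readers unfamiliar with the Bradley-Terry model mentioned in this abstract, the sketch below shows the standard pairwise preference probability and the corresponding negative log-likelihood loss. It is a generic illustration, not the PLUS training code: the function names are chosen here, and the rewards are plain numbers, whereas a summary-conditioned reward model would presumably compute them from the user summary together with the response.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_probability(reward_a, reward_b):
    """Bradley-Terry probability that response a is preferred over b."""
    return math.exp(log_sigmoid(reward_a - reward_b))


def bradley_terry_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the observed preference."""
    return -log_sigmoid(reward_chosen - reward_rejected)
```

With a single reward function shared by all users, two users with opposite preferences over the same pair push the loss in opposite directions; conditioning the reward on a per-user summary is one way to let the two cases be scored differently.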

2507.10239 2026-02-06 cs.CV

Transferring Styles for Reduced Texture Bias and Improved Robustness in Semantic Segmentation Networks

Ben Hamscher, Edgar Heinert, Annika Mütze, Kira Maag, Matthias Rottmann

Comments accepted at ECAI 2025

Abstract

Recent research has investigated the shape and texture biases of deep neural networks (DNNs) in image classification, which influence their generalization capabilities and robustness. It has been shown that, in comparison to regular DNN training, training with stylized images reduces texture biases in image classification and improves robustness with respect to image corruptions. In an effort to advance this line of research, we examine whether style transfer can likewise deliver these two effects in semantic segmentation. To this end, we perform style transfer with style varying across artificial image areas. Those random areas are formed by a chosen number of Voronoi cells. The resulting style-transferred data is then used to train semantic segmentation DNNs with the objective of reducing their dependence on texture cues while enhancing their reliance on shape-based features. In our experiments, it turns out that in semantic segmentation, style transfer augmentation reduces texture bias and strongly increases robustness with respect to common image corruptions as well as adversarial attacks. These observations hold for convolutional neural networks and transformer architectures on the Cityscapes dataset as well as on PASCAL Context, showing the generality of the proposed method.
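
The partition of an image into random Voronoi cells, each of which could then receive its own style, can be sketched as below. This is a minimal brute-force sketch: the function name, the uniform sampling of cell sites, and the squared Euclidean distance are assumptions made here, and the style transfer step itself is not shown.

```python
import random


def voronoi_cell_mask(height, width, num_cells, seed=0):
    """Label each pixel with the index of its nearest random site."""
    rng = random.Random(seed)
    sites = [(rng.randrange(height), rng.randrange(width))
             for _ in range(num_cells)]
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Nearest site by squared Euclidean distance (ties go to the
            # lowest site index).
            nearest = min(
                range(num_cells),
                key=lambda i: (sites[i][0] - y) ** 2 + (sites[i][1] - x) ** 2,
            )
            row.append(nearest)
        mask.append(row)
    return mask, sites
```

Each label in the mask would select which stylized version of the image supplies that pixel, so region boundaries in the augmented image no longer coincide with object boundaries.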

2507.04756 2026-02-06 cs.CL cs.AI

CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering

Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen

Abstract

Personalization has become crucial for adapting models to the diverse and evolving needs of users across cultural, temporal, and contextual dimensions. While existing methods often rely on centralized fine-tuning or static preference alignment within a single model, they struggle to achieve both real-time and high-quality personalization under the resource and privacy constraints of personal devices. To address this challenge, we propose CoSteer, a collaborative framework that enables tuning-free, real-time personalization via decoding-time adaptation. By leveraging logit differences between context-aware and context-agnostic local small models, CoSteer steers cloud-based large models, ensuring effective personalization while preserving the large model's capabilities. Personalization is handled locally, with only final tokens sent to the cloud, maintaining both user context and system efficiency. Through extensive experiments across a wide range of tasks, we demonstrate that CoSteer generates high-quality personalized content, ensuring both effectiveness and computational efficiency. Our results highlight its robustness across models and environments, confirming its practical applicability in real-world scenarios.
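
The idea of steering a large model with the logit difference between a context-aware and a context-agnostic small model can be illustrated with a simple additive rule. This is a deliberately simplified sketch and not the CoSteer algorithm: the additive combination, the `strength` parameter, and all function names are assumptions introduced here for illustration.

```python
import math


def softmax(logits):
    """Convert logits to probabilities with the usual max-shift for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def steer_logits(large_logits, small_with_context, small_without_context,
                 strength=1.0):
    """Shift the large model's logits by the small models' logit delta."""
    return [
        big + strength * (ctx - base)
        for big, ctx, base in zip(
            large_logits, small_with_context, small_without_context
        )
    ]


def pick_token(logits):
    """Greedy choice: index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

The delta isolates what the personal context changes in the small model's prediction, so tokens the context makes more likely are boosted in the large model's distribution while everything else is left close to the large model's own ranking.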

2507.01028 2026-02-06 cs.LG cs.AI

Dual Perspectives on Non-Contrastive Self-Supervised Learning

Jean Ponce, Basile Terver, Martial Hebert, Michael Arbel

Abstract

The stop gradient and exponential moving average iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, with excellent performance in downstream applications in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in general, although they do not optimize the original objective, or any other smooth function, they do avoid collapse. Following Tian et al. (2021), but without any of the extra assumptions used in their proofs, we then show using a dynamical system perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average always leads to collapse. Conversely, we characterize explicitly the equilibria of the dynamical systems associated with these two procedures in this linear setting as algebraic varieties in their parameter space, and show that they are, in general, asymptotically stable. Our theoretical findings are illustrated by empirical experiments with real and synthetic data.

2506.22186 2026-02-06 cs.LG

Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems

Kaikai Zheng, Dawei Shi, Yang Shi, Long Wang

Abstract

Thompson sampling (TS) is a Bayesian randomized exploration strategy that samples options (e.g., system parameters or control laws) from the current posterior and then applies the selected option that is optimal for a task, thereby balancing exploration and exploitation; this makes TS effective for active learning-based controller design. However, TS relies on finite parametric representations, which limits its applicability to the more general spaces commonly encountered in control system design. To address this issue, this work proposes a parameterization method for control law learning using reproducing kernel Hilbert spaces and designs a data-driven active learning control approach. Specifically, the proposed method treats the control law as an element in a function space, allowing the design of control laws without imposing restrictions on the system structure or the form of the controller. A TS framework is proposed to reduce control costs through online exploration and exploitation, and convergence guarantees are provided for the learning process. Theoretical analysis shows that the proposed method learns the relationship between control laws and closed-loop performance metrics at an exponential rate, and an upper bound on the control regret is also derived. Furthermore, the closed-loop stability of the proposed learning framework is analyzed. Numerical experiments on controlling unknown nonlinear systems validate the effectiveness of the proposed method.
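
The sample-then-act loop of Thompson sampling is easiest to see in the finite parametric setting that the abstract contrasts with. The sketch below uses a Beta-Bernoulli bandit, which is a swapped-in textbook example and not the kernel-based control method of the paper; the function name, the uniform Beta(1, 1) priors, and the arm probabilities in the usage note are assumptions made here.

```python
import random


def thompson_sampling(true_success_probs, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_success_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (Beta(1, 1) prior, i.e. uniform).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Act optimally with respect to the sampled rates.
        chosen = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_success_probs[chosen] else 0
        successes[chosen] += reward
        failures[chosen] += 1 - reward
        pulls[chosen] += 1
    return pulls
```

Calling `thompson_sampling([0.2, 0.8], rounds=2000)` concentrates most pulls on the second arm as its posterior sharpens. The abstract's contribution is to replace the finite set of arms with control laws living in a reproducing kernel Hilbert space.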

2506.21996 2026-02-06 cs.AI

AlphaBeta is not as good as you think: a simple class of synthetic games for a better analysis of deterministic game-solving algorithms

Raphaël Boige, Amine Boumaza, Bruno Scherrer

Journal ref The Thirty-ninth Annual Conference on Neural Information Processing Systems, Dec 2025, San Diego, United States

Abstract

Deterministic game-solving algorithms are conventionally analyzed in the light of their average-case complexity against a distribution of random game-trees, where leaf values are independently sampled from a fixed distribution. This simplified model enables uncluttered mathematical analysis, revealing two key properties: root value distributions asymptotically collapse to a single fixed value for finite-valued trees, and all reasonable algorithms achieve global optimality. However, these findings are artifacts of the model's design: its long-criticized independence assumption strips games of structural complexity, producing trivial instances where no algorithm faces meaningful challenges. To address this limitation, we introduce a class of synthetic games generated by a probabilistic model that incrementally constructs game-trees using a fixed level-wise conditional distribution. By enforcing ancestor dependencies, a critical structural feature of real-world games, our framework generates problems with adjustable difficulty while retaining some form of analytical tractability. For several algorithms, including AlphaBeta and Scout, we derive recursive formulas characterizing their average-case complexities under this model. These allow us to rigorously compare algorithms on deep game-trees, where Monte-Carlo simulations are no longer feasible. While asymptotically, all algorithms seem to converge to an identical branching factor (a result analogous to that of independence-based models), deep finite trees reveal stark differences: AlphaBeta incurs a significantly larger constant multiplicative factor compared to algorithms like Scout, leading to a substantial practical slowdown. Our framework sheds new light on classical game-solving algorithms, offering rigorous evidence and analytical tools to advance the understanding of these methods under a richer, more challenging, and yet tractable model.
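
The conventional setting that this abstract criticizes, random game-trees with independently sampled leaf values, can be reproduced in a few lines. The sketch below builds such a tree and counts how many leaves plain minimax and AlphaBeta evaluate. It illustrates only the independence-based baseline model, not the ancestor-dependent model the paper proposes; the tree representation (nested lists with float leaves) and the function names are assumptions made here.

```python
import random


def build_random_tree(depth, branching, rng):
    """Uniform tree whose leaves are i.i.d. uniform values in [0, 1)."""
    if depth == 0:
        return rng.random()
    return [build_random_tree(depth - 1, branching, rng)
            for _ in range(branching)]


def minimax(node, maximizing, counter):
    """Exhaustive minimax; counter[0] tracks evaluated leaves."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha, beta, counter):
    """AlphaBeta pruning; counter[0] tracks evaluated leaves."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # remaining siblings cannot change the result
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value
```

Averaging the leaf counts over many sampled trees gives a Monte-Carlo estimate of average-case complexity, which is the kind of simulation the abstract notes becomes infeasible for deep trees.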

2506.08629 2026-02-06 cs.CV cs.AI

ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network

Feixiang Du, Shengkun Wu

Comments 16 pages, 2 figures, 4 tables

Abstract

In the past decade, Convolutional Neural Networks (CNNs) and Transformers have achieved wide application in semantic segmentation tasks. Although combining CNNs with Transformer models greatly improves performance, global context modeling remains inadequate. Recently, Mamba has shown great potential in vision tasks, demonstrating its advantages in modeling long-range dependencies. In this paper, we propose a lightweight Efficient CNN-Mamba Network for semantic segmentation, dubbed ECMNet. ECMNet combines CNN with Mamba in a capsule-based framework to address their complementary weaknesses. Specifically, we design an Enhanced Dual-Attention Block (EDAB) as a lightweight bottleneck. To improve the representation ability of the features, we devise a Multi-Scale Attention Unit (MSAU) to integrate multi-scale feature aggregation, spatial aggregation, and channel aggregation. Moreover, a Mamba-enhanced Feature Fusion Module (FFM) merges features from different levels, significantly enhancing segmentation accuracy. Extensive experiments on two representative datasets demonstrate that the proposed model achieves a strong balance between accuracy and efficiency, reaching 70.6% mIoU on Cityscapes and 73.6% mIoU on CamVid test datasets, with 0.87M parameters and 8.27G FLOPs on a single RTX 3090 GPU platform.