arXivDaily: arXiv daily academic digest, updated Monday through Friday
2507.12988 2026-04-02 cs.CV cs.LG

Variance-Based Pruning for Accelerating and Compressing Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache

Comments Accepted as Oral at ICCV'25 (IEEE/CVF International Conference on Computer Vision)

Abstract

Increasingly expensive training of ever larger models such as Vision Transformers motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational costs and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these factors, they often require costly retraining, sometimes for up to hundreds of epochs, or even training from scratch to recover the lost accuracy resulting from the structural modifications. Maintaining the provided performance of trained models after structured pruning and thereby avoiding extensive retraining remains a challenge. To solve this, we introduce Variance-Based Pruning, a simple and structured one-shot pruning technique for efficiently compressing networks, with minimal fine-tuning. Our approach first gathers activation statistics, which are used to select neurons for pruning. Simultaneously, the mean activations are integrated back into the model to preserve a high degree of performance. On ImageNet-1k recognition tasks, we demonstrate that directly after pruning DeiT-Base retains over 70% of its original performance and requires only 10 epochs of fine-tuning to regain 99% of the original accuracy while simultaneously reducing MACs by 35% and model size by 36%, thus speeding up the model by 1.44x. The code is available at: https://github.com/boschresearch/variance-based-pruning
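
The two steps named in this abstract (select neurons from activation statistics, then fold mean activations back into the model) can be illustrated with a small sketch. This is not the authors' implementation, which lives in the linked repository. It assumes a single hidden layer followed by a linear output layer, takes "low variance" as the pruning criterion, and uses hypothetical names (`variance_prune`, `keep`).

```python
from statistics import fmean, pvariance


def variance_prune(hidden_acts, w_out, b_out, keep):
    """Prune low-variance hidden neurons and fold their means into the bias.

    hidden_acts: calibration samples, each a list of hidden activations.
    w_out: output weights, where w_out[o][h] maps hidden neuron h to output o.
    b_out: output bias, one value per output.
    keep: number of hidden neurons to keep (hypothetical parameter).
    Returns (kept_indices, new_w_out, new_b_out).
    """
    n_hidden = len(hidden_acts[0])
    columns = [[sample[h] for sample in hidden_acts] for h in range(n_hidden)]
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]

    # Keep the `keep` neurons whose activations vary the most.
    ranked = sorted(range(n_hidden), key=lambda h: variances[h], reverse=True)
    kept = sorted(ranked[:keep])
    pruned = [h for h in range(n_hidden) if h not in kept]

    # A pruned neuron is replaced by its mean activation. That mean is a
    # constant, so its contribution can be absorbed into the output bias.
    new_b = [
        b + sum(row[h] * means[h] for h in pruned)
        for row, b in zip(w_out, b_out)
    ]
    new_w = [[row[h] for h in kept] for row in w_out]
    return kept, new_w, new_b


# Toy calibration set: neuron 1 always outputs 5.0, so it has zero variance.
acts = [[1.0, 5.0, 0.0], [3.0, 5.0, 2.0]]
kept, new_w, new_b = variance_prune(acts, [[1.0, 2.0, 3.0]], [0.5], keep=2)
print(kept, new_w, new_b)
```

In this toy case the pruned neuron has zero variance, so the smaller layer reproduces the original outputs exactly. With non-zero variance the match is only approximate.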

2507.11737 2026-04-02 cs.AI

Auto-Formulating Dynamic Programming Problems with Large Language Models

Chenyu Zhou, Jingyuan Yang, Linwei Xin, Yitian Chen, Ziyan He, Dongdong Ge

Abstract

Dynamic programming (DP) is a fundamental method in operations research, but formulating DP models has traditionally required expert knowledge of both the problem context and DP techniques. Large Language Models (LLMs) offer the potential to automate this process. However, DP problems pose unique challenges due to their inherently stochastic transitions and the limited availability of training data. These factors make it difficult to directly apply existing LLM-based models or frameworks developed for other optimization problems, such as linear or integer programming. We introduce DP-Bench, the first benchmark covering a wide range of textbook-level DP problems to enable systematic evaluation. We present Dynamic Programming Language Model (DPLM), a 7B-parameter specialized model that achieves performance comparable to state-of-the-art LLMs like OpenAI's o1 and DeepSeek-R1, and surpasses them on hard problems. Central to DPLM's effectiveness is DualReflect, our novel synthetic data generation pipeline, designed to scale up training data from a limited set of initial examples. DualReflect combines forward generation for diversity and backward generation for reliability. Our results reveal a key insight: backward generation is favored in low-data regimes for its strong correctness guarantees, while forward generation, though lacking such guarantees, becomes increasingly valuable at scale for introducing diverse formulations. This trade-off highlights the complementary strengths of both approaches and the importance of combining them.

2506.23053 2026-04-02 cs.LG

Double-Diffusion: ODE-Prior Accelerated Diffusion Models for Spatio-Temporal Graph Forecasting

Hanlin Dong, Arian Prabowo, Hao Xue, Ao Shuang, Tianyi Zhou, Flora D. Salim

Abstract

Forecasting over graph-structured sensor networks demands models that capture both deterministic spatial trends and stochastic variability, while remaining efficient enough for repeated inference as new observations arrive. We propose Double-Diffusion, a denoising diffusion probabilistic model that integrates a parameter-free graph diffusion Ordinary Differential Equation (ODE) forecast as a structural prior throughout the generative process. Unlike standard diffusion approaches that generate predictions from pure noise, Double-Diffusion uses the ODE prediction as both (1) a residual learning target in the forward process via the Resfusion framework, and (2) an explicit conditioning input for the reverse denoiser, shifting the generation task from full synthesis to guided refinement. This dual integration enables accelerated sampling by initializing from an intermediate diffusion step where the ODE prior is already close to the target distribution. We further introduce the Factored Spectral Denoiser (FSD), which adopts the divided attention principle to decompose spatio-temporal-channel modeling into three efficient axes: temporal self-attention, cross-channel attention, and spectral graph convolution via the Graph Fourier Transform. Extensive experiments on four real-world sensor-network datasets spanning two domains, urban air quality (Beijing, Athens) and traffic flow (PEMS08, PEMS04), demonstrate that Double-Diffusion achieves the best probabilistic calibration (CRPS) across all datasets while scaling sublinearly in inference time, achieving a 3.8x speedup compared to a standard diffusion model setup through a substantial reduction in required sampling steps.
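
The abstract reports calibration in terms of CRPS (continuous ranked probability score). As a reminder of what that metric measures, the sketch below uses the standard sample-based estimator, CRPS = E|X - y| - 0.5 * E|X - X'|, for one scalar target. The function name is illustrative, and how the paper aggregates the score over nodes and horizons is not stated here.

```python
def crps_from_samples(samples, observation):
    """Estimate CRPS for one scalar observation from forecast samples.

    Lower is better. A forecast concentrated on the observed value scores 0.
    """
    n = len(samples)
    # Average distance between forecast samples and the observed value.
    term_obs = sum(abs(s - observation) for s in samples) / n
    # Average pairwise distance between samples (rewards justified spread).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


print(crps_from_samples([0.0, 2.0], 1.0))
```

With a single sample the estimator reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.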

2506.21997 2026-04-02 cs.LG cs.AI

Binned semiparametric Bayesian networks for efficient kernel density estimation

Rafael Sojo, Javier Díaz-Rozo, Concha Bielza, Pedro Larrañaga

Comments Major revision after reviewer comments. Title changed based on reviewer suggestion. Improved introduction, complexity analysis and experiments. Submitted to Information Sciences

Abstract

This paper introduces a new type of probabilistic semiparametric model that takes advantage of data binning to reduce the computational cost of kernel density estimation in nonparametric distributions. Two new conditional probability distributions are developed for the new binned semiparametric Bayesian networks, the sparse binned kernel density estimation and the Fourier kernel density estimation. These two probability distributions address the curse of dimensionality, which typically impacts binned models, by using sparse tensors and restricting the number of parent nodes in conditional probability calculations. To evaluate the proposal, we perform a complexity analysis and conduct several comparative experiments using synthetic data and datasets from the UCI Machine Learning repository. The experiments include different binning rules, parent restrictions, grid sizes, and number of instances to get a holistic view of the model's behavior. As a result, our binned semiparametric Bayesian networks achieve structural learning and log-likelihood estimations with no statistically significant differences compared to the semiparametric Bayesian networks, but at a much higher speed. Thus, the new binned semiparametric Bayesian networks prove to be a reliable and more efficient alternative to their non-binned counterparts.
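
The cost saving from data binning mentioned in this abstract can be seen in a one-dimensional sketch: the kernel sum runs over occupied bins instead of over every data point. This is a generic binned Gaussian KDE with simple (nearest-bin) binning, not the paper's sparse binned or Fourier estimators. The function and parameter names are hypothetical, and the data are assumed to lie inside `[lo, hi]`.

```python
import math


def binned_kde(data, lo, hi, n_bins, bandwidth):
    """Return a density function built from bin counts instead of raw points."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Simple binning: each point goes to the bin that contains it.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    centers = [lo + (j + 0.5) * width for j in range(n_bins)]
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Only occupied bins contribute, so the cost depends on the number
        # of non-empty bins rather than on the number of data points.
        return norm * sum(
            c * math.exp(-0.5 * ((x - m) / bandwidth) ** 2)
            for c, m in zip(counts, centers)
            if c
        )

    return density


density = binned_kde([0.5, 0.5, 0.5, 0.5], lo=0.0, hi=1.0, n_bins=1, bandwidth=1.0)
print(density(0.5))
```

Skipping empty bins is the one-dimensional analogue of storing the grid sparsely. In higher dimensions the grid size grows exponentially with the number of variables, which is the curse of dimensionality the abstract refers to.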

2506.19846 2026-04-02 cs.AI

HiMA-Ecom: Enabling Joint Training of Hierarchical Multi-Agent E-commerce Assistants

Junxing Hu, Ai Han, Haolan Zhan, Pu Wei, Zhiqian Zhang, Yuhang Guo, Jiawei Lu, Zhen Chen, Haoran Li, Zicheng Zhang

Comments 39 pages, 10 figures, under review

Abstract

Hierarchical multi-agent systems based on large language models (LLMs) have become a common paradigm for building AI assistants in vertical domains such as e-commerce, where a master agent coordinates multiple specialized sub-agents. Despite their practical importance, realistic benchmarks for training and evaluating such systems remain scarce, and joint optimization across functionally distinct agents is still challenging. To address this gap, we introduce HiMA-Ecom, the first hierarchical multi-agent benchmark tailored for e-commerce scenarios. HiMA-Ecom contains 22.8K instances, including agent-specific supervised fine-tuning samples with memory and system-level input-output pairs for joint multi-agent reinforcement learning. Building upon it, a joint training method named HiMA-R1 is proposed. It presents Variance-Reduction Group Relative Policy Optimization (VR-GRPO), which employs initial trajectory-based Monte Carlo sampling to mitigate the exponential joint action space and selects informative agent groups for efficient updates based on reward variance. Furthermore, an adaptive memory evolution mechanism that repurposes GRPO rewards as cost-free supervisory signals is designed to eliminate repetitive reasoning and accelerate convergence. Experiments on HiMA-Ecom demonstrate that our method, built upon smaller 3B/7B open-source models, achieves performance comparable to that of larger LLMs, such as DeepSeek-R1, and surpasses DeepSeek-V3 by an average of 6%.
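
To make "selects informative agent groups based on reward variance" concrete, here is a rough sketch of the general idea, not the VR-GRPO algorithm itself. It assumes the usual GRPO-style advantage (reward minus group mean, divided by group standard deviation) and a hypothetical top-k selection rule. The function names and the `top_k` parameter are illustrative.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, GRPO-style."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def select_groups_by_variance(groups, top_k):
    """Keep the top_k groups whose rewards spread the most.

    groups: dict mapping a group name to its list of rewards.
    A group with identical rewards has all-zero advantages, so it
    contributes no learning signal and is a natural candidate to skip.
    """
    ranked = sorted(groups, key=lambda name: pstdev(groups[name]), reverse=True)
    return ranked[:top_k]


rewards_by_group = {
    "search_agent": [1.0, 1.0, 1.0],
    "cart_agent": [0.0, 1.0, 0.0, 1.0],
    "faq_agent": [0.4, 0.6],
}
print(select_groups_by_variance(rewards_by_group, top_k=2))
print(group_relative_advantages([0.0, 2.0]))
```

The group names above are made up for the example. The point is only that reward variance is a cheap proxy for how much gradient signal a group can provide.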

2506.18919 2026-04-02 cs.CL cs.AI cs.CV

MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection

Hexiang Gu, Qifan Yu, Yuan Liu, Zikang Li, Saihui Hou, Jian Zhao, Zhaofeng He

Abstract

As a multimodal medium combining images and text, memes frequently convey implicit harmful content through metaphors and humor, rendering the detection of harmful memes a complex and challenging task. Although recent studies have made progress in detection accuracy and interpretability, large-scale, high-quality datasets for harmful memes remain scarce, and current methods still struggle to capture implicit risks and nuanced semantics. Thus, we construct MemeMind, a large-scale harmful meme dataset. Aligned with international standards and the context of the internet, MemeMind provides detailed Chain-of-Thought (CoT) reasoning annotations to support fine-grained analysis of implicit intentions in memes. Based on this dataset, we further propose MemeGuard, a reasoning-oriented multimodal detection framework that significantly improves both the accuracy of harmful meme detection and the interpretability of model decisions. Extensive experimental results demonstrate that MemeGuard outperforms existing state-of-the-art methods on the MemeMind dataset, establishing a solid foundation for future research in harmful meme detection. The complete dataset and code will be released upon acceptance.

2506.08915 2026-04-02 cs.CV cs.AI

Two-stage Vision Transformers and Hard Masking offer Robust Object Representations

Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Diego Marcos

Comments Accepted at ICPR 2026

Abstract

Context can strongly affect object representations, sometimes leading to undesired biases, particularly when objects appear in out-of-distribution backgrounds at inference. At the same time, many object-centric tasks require leveraging the context to identify the relevant image regions. We posit that this conundrum, in which context is simultaneously needed and a potential nuisance, can be addressed by an attention-based approach that uses learned binary attention masks to ensure that only attended image regions influence the prediction. To test this hypothesis, we evaluate a two-stage framework: stage 1 processes the full image to discover object parts and identify task-relevant regions, for which context cues are likely to be needed, while stage 2 leverages input attention masking to restrict its receptive field to these regions, enabling a focused analysis while filtering out potentially spurious information. Both stages are trained jointly, allowing stage 2 to refine stage 1. The explicit nature of the semantic masks also makes the model's reasoning auditable, enabling powerful test-time interventions to further enhance robustness. Extensive experiments across diverse benchmarks demonstrate that this approach significantly improves robustness against spurious correlations and out-of-distribution backgrounds. Code: https://github.com/ananthu-aniraj/ifam

2506.04822 2026-04-02 cs.CL

Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms

Nurul Aisyah, Muhammad Dehan Al Kautsar, Arif Hidayat, Raqib Chowdhury, Fajri Koto

Comments Accepted at AIED 2026

Abstract

Despite rapid progress in vision-language and large language models (VLMs and LLMs), their effectiveness for AI-driven educational assessment in real-world, underrepresented classrooms remains largely unexplored. We evaluate state-of-the-art VLMs and LLMs on over 14K handwritten answers from grade-4 classrooms in Indonesia, covering Mathematics and English aligned with the local national curriculum. Unlike prior work on clean digital text, our dataset features naturally curly, diverse handwriting from real classrooms, posing realistic visual and linguistic challenges. Assessment tasks include grading and generating personalized Indonesian feedback guided by rubric-based evaluation. Results show that the VLM struggles with handwriting recognition, causing error propagation in LLM grading, yet LLM feedback remains pedagogically useful despite imperfect visual inputs, revealing limits in personalization and contextual relevance.

2506.03753 2026-04-02 cs.CV

HUMOF: Human Motion Forecasting in Interactive Social Scenes

Caiyi Sun, Yujing Sun, Xiao Han, Zemin Yang, Jiawei Liu, Xinge Zhu, Siu Ming Yiu, Yuexin Ma

Comments Accepted by ICLR 2026

Abstract

Complex scenes present significant challenges for predicting human behaviour due to the abundance of interaction information, such as human-human and human-environment interactions. These factors complicate the analysis and understanding of human behaviour, thereby increasing the uncertainty in forecasting human motions. Existing motion prediction methods thus struggle in these complex scenarios. In this paper, we propose an effective method for human motion forecasting in interactive scenes. To achieve a comprehensive representation of interactions, we design a hierarchical interaction feature representation so that high-level features capture the overall context of the interactions, while low-level features focus on fine-grained details. Besides, we propose a coarse-to-fine interaction reasoning module that leverages both spatial and frequency perspectives to efficiently utilize hierarchical features, thereby enhancing the accuracy of motion predictions. Our method achieves state-of-the-art performance across four public datasets. The source code will be available at https://github.com/scy639/HUMOF.

2506.02768 2026-04-02 cs.RO cs.SY eess.SY

Geometric Visual Servo Via Optimal Transport

Ethan Canzini, Simon Pope, Ashutosh Tiwari

Comments 19 pages, 5 figures. Accepted to Control Engineering Practice

Abstract

When developing control laws for robotic systems, the principal factor when examining their performance is choosing inputs that allow smooth tracking to a reference input. In the context of robotic manipulation, this involves translating an object or end-effector from an initial pose to a target pose. Robotic manipulation control laws frequently use vision systems as an error generator to track features and produce control inputs. However, current control algorithms don't take into account the probabilistic features that are extracted and instead rely on hand-tuned feature extraction methods. Furthermore, the target features can exist in a static pose thus allowing a combined pose and feature error for control generation. We present a geometric control law for the visual servoing problem for robotic manipulators. The input from the camera constitutes a probability measure on the 3-dimensional Special Euclidean task-space group, where the Wasserstein distance between the current and desired poses is analogous to the geometric geodesic. From this, we develop a controller that allows for both pose and image-based visual servoing by combining classical PD control with gravity compensation with error minimization through the use of geodesic flows on a 3-dimensional Special Euclidean group. We present our results on a set of test cases demonstrating the generalisation ability of our approach to a variety of initial positions.

2505.23459 2026-04-02 cs.LG

On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments

Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

Comments Preprint

Abstract

We provide global convergence rates for vanilla and entropy-regularized federated softmax stochastic policy gradient (FedPG) with local training. We show that FedPG converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity. Remarkably, we obtain the first convergence rates for entropy-regularized policy gradient with explicit constants, leveraging a projection-like operator. Our results build upon a new analysis of federated averaging for non-convex objectives, based on the observation that the Łojasiewicz-type inequalities from the single-agent setting (Mei et al., 2020) do not hold for the federated objective. This uncovers a fundamental difference between single-agent and federated reinforcement learning: while single-agent optimal policies can be deterministic, federated objectives may inherently require stochastic policies.

2505.22337 2026-04-02 cs.CV

Learning to Infer Parameterized Representations of Plants from 3D Scans

Samara Ghrer, Christophe Godin, Stefanie Wuhrer

Abstract

Plants frequently contain numerous organs, organized in 3D branching systems defining the plant's architecture. Reconstructing the architecture of plants from unstructured observations is challenging because of self-occlusion and spatial proximity between organs, which are often thin structures. To address this challenging task, we propose an approach that infers a parameterized representation of the plant's architecture from a given 3D scan of a plant. In addition to the plant's branching structure, this representation contains parametric information for each plant organ, and can therefore be used directly in a variety of tasks. In this data-driven approach, we train a recursive neural network with virtual plants generated using a procedural model. After training, the network infers a parametric tree-like representation from an input 3D point cloud. Our method is applicable to any plant that can be represented as a binary axial tree. We quantitatively evaluate our approach on Chenopodium album plants on reconstruction, segmentation and skeletonization, which are important problems in plant phenotyping. In addition to carrying out several tasks at once, our method achieves results on par with strong baselines for each task. We apply our method, trained exclusively on synthetic data, to 3D scans and show that it generalizes well.

2505.21505 2026-04-02 cs.CL cs.AI

How Does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective

Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen

Comments AAAI 2026 (Oral)

Abstract

Multilingual Alignment is an effective and representative paradigm to enhance LLMs' multilingual capabilities, which transfers the capabilities from the high-resource languages to the low-resource languages. Meanwhile, some research on language-specific neurons provides a new perspective to analyze and understand LLMs' mechanisms. However, we find that there are many neurons that are shared by multiple but not all languages and cannot be correctly classified. In this work, we propose a ternary classification methodology that categorizes neurons into three types: language-specific neurons, language-related neurons, and general neurons. We also propose a corresponding identification algorithm to distinguish these different types of neurons. Furthermore, based on the distributional characteristics of different types of neurons, we divide the LLMs' internal process for multilingual inference into four parts: (1) multilingual understanding, (2) shared semantic space reasoning, (3) multilingual output space transformation, and (4) vocabulary space outputting. Additionally, we systematically analyze the models before and after alignment with a focus on different types of neurons. We also analyze the phenomenon of "Spontaneous Multilingual Alignment". Overall, our work conducts a comprehensive investigation based on different types of neurons, providing empirical results and valuable insights to better understand multilingual alignment and multilingual capabilities of LLMs.

2505.19715 2026-04-02 cs.CL cs.AI cs.LG

Graceful Forgetting in Generative Language Models

Chunyang Jiang, Chi-min Chan, Yiyang Cai, Yulong Liu, Wei Xue, Yike Guo

Comments 8 pages, 6 figures. EMNLP 2025

Abstract

Recently, the pretrain-finetune paradigm has become a cornerstone in various deep learning areas. While in general the pre-trained model would promote both the effectiveness and efficiency of downstream task fine-tuning, studies have shown that not all knowledge acquired during pre-training is beneficial. Some of the knowledge may actually bring detrimental effects to the fine-tuning tasks, which is also known as negative transfer. To address this problem, graceful forgetting has emerged as a promising approach. The core principle of graceful forgetting is to enhance the learning plasticity of the target task by selectively discarding irrelevant knowledge. However, this approach remains underexplored in the context of generative language models, and it is often challenging to migrate existing forgetting algorithms to these models due to architecture incompatibility. To bridge this gap, in this paper we propose a novel framework, Learning With Forgetting (LWF), to achieve graceful forgetting in generative language models. With Fisher Information Matrix weighting the intended parameter updates, LWF computes forgetting confidence to evaluate self-generated knowledge regarding the forgetting task, and consequently, knowledge with high confidence is periodically unlearned during fine-tuning. Our experiments demonstrate that, although thoroughly uncovering the mechanisms of knowledge interaction remains challenging in pre-trained language models, applying graceful forgetting can contribute to enhanced fine-tuning performance.

2505.19574 2026-04-02 cs.RO cs.AI cs.LG math.OC

Situationally-Aware Dynamics Learning

Alejandro Murillo-Gonzalez, Lantao Liu

Journal ref
The International Journal of Robotics Research (IJRR) 2026
Abstract

Autonomous robots operating in complex, unstructured environments face significant challenges due to latent, unobserved factors that obscure their understanding of both their internal state and the external world. Addressing this challenge would enable robots to develop a more profound grasp of their operational context. To tackle this, we propose a novel framework for online learning of hidden state representations, with which the robots can adapt in real-time to uncertain and dynamic conditions that would otherwise be ambiguous and result in suboptimal or erroneous behaviors. Our approach is formalized as a Generalized Hidden Parameter Markov Decision Process, which explicitly models the influence of unobserved parameters on both transition dynamics and reward structures. Our core innovation lies in learning online the joint distribution of state transitions, which serves as an expressive representation of latent ego- and environmental-factors. This probabilistic approach supports the identification and adaptation to different operational situations, improving robustness and safety. Through a multivariate extension of Bayesian Online Changepoint Detection, our method segments changes in the underlying data generating process governing the robot's dynamics. The robot's transition model is then informed with a symbolic representation of the current situation derived from the joint distribution of latest state transitions, enabling adaptive and context-aware decision-making. To showcase the real-world effectiveness, we validate our approach in the challenging task of unstructured terrain navigation, where unmodeled and unmeasured terrain characteristics can significantly impact the robot's motion. Extensive experiments in both simulation and real world reveal significant improvements in data efficiency, policy performance, and the emergence of safer, adaptive navigation strategies.

2505.17899 2026-04-02 cs.LG

Universal Domain Adaptation Benchmark for Time Series Data Representation

Romain Mussard, Fannia Pacheco, Maxime Berar, Gilles Gasso, Paul Honeine

Journal ref
2025 33rd European Signal Processing Conference (EUSIPCO)
Abstract

Deep learning models have significantly improved the ability to detect novelties in time series (TS) data. This success is attributed to their strong representation capabilities. However, due to the inherent variability in TS data, these models often struggle with generalization and robustness. To address this, a common approach is to perform Unsupervised Domain Adaptation, particularly Universal Domain Adaptation (UniDA), to handle domain shifts and emerging novel classes. While extensively studied in computer vision, UniDA remains underexplored for TS data. This work provides a comprehensive implementation and comparison of state-of-the-art TS backbones in a UniDA framework. We propose a reliable protocol to evaluate their robustness and generalization across different domains. The goal is to provide practitioners with a framework that can be easily extended to incorporate future advancements in UniDA and TS architectures. Our results highlight the critical influence of backbone selection in UniDA performance and enable a robustness analysis across various datasets and architectures.

2505.17870 2026-04-02 cs.CL

Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods

Shaina Raza, Rizwan Qureshi, Azib Farooq, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis

Abstract

Large language models (LLMs) reproduce misinformation not by memorizing false facts alone, but by learning the linguistic patterns that make falsehoods persuasive, such as hedging, false presuppositions, and fabricated citations. We propose model immunization, a training paradigm based on supervised fine-tuning over curated (false claim, correction) pairs, injected as small vaccine doses (5 to 10% of tokens) alongside truthful data. Unlike post-hoc filtering or preference-based alignment, immunization introduces direct negative supervision on labeled falsehoods. Across four open-weight model families, this approach improves TruthfulQA accuracy by 12 points and increases misinformation rejection rates by 30 points, while preserving overall model capability. We further outline key design requirements, including dosage, labeling, quarantine, and diversity, and advocate for standardized vaccine corpora and benchmarks to evaluate generalization. These findings position immunization as a practical and scalable component of responsible LLM development.

2505.17760 2026-04-02 cs.LG cs.AI

But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors

Leon Eshuijs, Archie Chaudhury, Alan McBeth, Ethan Nguyen

Abstract

LLM-as-a-judge is widely used as a scalable substitute for human evaluation, yet current approaches rely on black-box access and struggle to detect subtle dishonesty, such as sycophancy and manipulation. We introduce Judge Using Safety-Steered Alternatives (JUSSA), a framework that leverages a model's internal representations to optimize an honesty-promoting steering vector from a single training example, generating contrastive alternatives that give judges a reference point for detecting dishonesty. We test JUSSA on a novel manipulation benchmark with human-validated response pairs at varying dishonesty levels, finding AUROC improvements across both GPT-4.1 (0.893 → 0.946) and Claude Haiku (0.859 → 0.929) judges, though performance degrades when task complexity is mismatched to judge capability, suggesting contrastive evaluation helps most when the task is challenging but within the judge's reach. Layer-wise analysis further shows that steering is most effective in middle layers, where model representations begin to diverge between honest and dishonest prompt processing. Our work demonstrates that steering vectors can serve as tools for evaluation rather than for improving model outputs at inference, opening a new direction for thorough white-box auditing.

2505.16619 2026-04-02 cs.AI q-bio.OT

Open and Sustainable AI: challenges, opportunities and the road ahead in the life sciences (October 2025 -- Version 2)

Gavin Farrell, Eleni Adamidi, Rafael Andrade Buono, Mihail Anton, Omar Abdelghani Attafi, Salvador Capella Gutierrez, Emidio Capriotti, Leyla Jael Castro, Davide Cirillo, Lisa Crossman, Christophe Dessimoz, Alexandros Dimopoulos, Raul Fernandez-Diaz, Styliani-Christina Fragkouli, Carole Goble, Wei Gu, John M. Hancock, Alireza Khanteymoori, Tom Lenaerts, Fabio G. Liberante, Peter Maccallum, Alexander Miguel Monzon, Magnus Palmblad, Lucy Poveda, Ovidiu Radulescu, Denis C. Shields, Shoaib Sufi, Thanasis Vergoulis, Fotis Psomopoulos, Silvio C. E. Tosatto

Comments 1 PDF, 24 Pages, 2 figures within. Co-corresponding authors: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece and Department of Biomedical Sciences, University of Padova, Padova, Italy. E-mails: fpsom[@]certh.gr, silvio.tosatto[@]unipd.it

Abstract

Artificial intelligence (AI) has recently seen transformative breakthroughs in the life sciences, expanding possibilities for researchers to interpret biological information at an unprecedented capacity, with novel applications and advances being made almost daily. In order to maximise return on the growing investments in AI-based life science research and accelerate this progress, it has become urgent to address the exacerbation of long-standing research challenges arising from the rapid adoption of AI methods. We review the increased erosion of trust in AI research outputs, driven by the issues of poor reusability and reproducibility, and highlight their consequent impact on environmental sustainability. Furthermore, we discuss the fragmented components of the AI ecosystem and lack of guiding pathways to best support Open and Sustainable AI (OSAI) model development. In response, this perspective introduces a practical set of OSAI recommendations directly mapped to over 300 components of the AI ecosystem. Our work connects researchers with relevant AI resources, facilitating the implementation of sustainable, reusable and transparent AI. Built upon life science community consensus and aligned to existing efforts, the outputs of this perspective are designed to aid the future development of policy and structured pathways for guiding AI implementation.

2505.15808 2026-04-02 cs.LG cs.AI math.PR stat.AP stat.ML

Neural Conditional Transport Maps

Carlos Rodriguez-Pardo, Leonardo Chiani, Emanuele Borgonovo, Massimo Tavoni

Comments Published in Transactions on Machine Learning Research

Abstract

We present a neural framework for learning conditional optimal transport (OT) maps between probability distributions. Our approach introduces a conditioning mechanism capable of processing both categorical and continuous conditioning variables simultaneously. At the core of our method lies a hypernetwork that generates transport layer parameters based on these inputs, creating adaptive mappings that outperform simpler conditioning methods. Comprehensive ablation studies demonstrate the superior performance of our method over baseline configurations. Furthermore, we showcase an application to global sensitivity analysis, offering high performance in computing OT-based sensitivity indices. This work advances the state-of-the-art in conditional optimal transport, enabling broader application of optimal transport principles to complex, high-dimensional domains such as generative modeling and black-box model explainability.

2505.15160 2026-04-02 cs.CV

Lossless Token Merging Even Without Fine-Tuning in Vision Transformers

Jaeyeon Lee, Dong-Wan Choi

Comments ECAI 2025

Details
Journal ref
Proceedings of ECAI 2025, FAIA
English abstract

Although Vision Transformers (ViTs) have become the standard architecture in computer vision, their massive sizes lead to significant computational overhead. Token compression techniques have attracted considerable attention to address this issue, but they often suffer from severe information loss, requiring extensive additional training to achieve practical performance. In this paper, we propose Adaptive Token Merging (ATM), a novel method that ensures lossless token merging, eliminating the need for fine-tuning while maintaining competitive performance. ATM adaptively reduces tokens across layers and batches by carefully adjusting layer-specific similarity thresholds, thereby preventing the undesirable merging of less similar tokens with respect to each layer. Furthermore, ATM introduces a novel token matching technique that considers not only similarity but also merging sizes, particularly for the final layers, to minimize the information loss incurred from each merging operation. We empirically validate our method across a wide range of pretrained models, demonstrating that ATM not only outperforms all existing training-free methods but also surpasses most training-intensive approaches, even without additional training. Remarkably, training-free ATM achieves over a 30% reduction in FLOPs for the DeiT-T and DeiT-S models without any drop in their original accuracy.

2505.14222 2026-04-02 cs.SD cs.GR cs.MM eess.AS

MATHDance: Mamba-Transformer Architecture with Uniform Tokenization for High-Quality 3D Dance Generation

Kaixing Yang, Xulong Tang, Ziqiao Peng, Yuxuan Hu, Xiangyue Zhang, Puwei Wang, Hongyan Liu, Jun He, Zhaoxin Fan

Details
English abstract

Music-to-dance generation represents a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitations in achieving choreographic consistency. To address the challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representation to enhance choreographic consistency. MatchDance employs a two-stage design: (1) a Kinematic-Dynamic-based Quantization Stage (KDQS), which encodes dance motions into a latent representation by Finite Scalar Quantization (FSQ) with kinematic-dynamic constraints and reconstructs them with high fidelity, and (2) a Hybrid Music-to-Dance Generation Stage (HMDGS), which uses a Mamba-Transformer hybrid architecture to map music into the latent representation, followed by the KDQS decoder to generate 3D dance motions. Additionally, a music-dance retrieval framework and comprehensive metrics are introduced for evaluation. Extensive experiments on the FineDance dataset demonstrate state-of-the-art performance.

2505.12189 2026-04-02 cs.AI cs.CL

Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering

Marco Valentino, Geonhee Kim, Dhairya Dalal, Zhixue Zhao, André Freitas

Comments AAAI 2026

Details
English abstract

Large language models (LLMs) exhibit reasoning biases, often conflating content plausibility with formal logical validity. This can lead to wrong inferences in critical domains, where plausible arguments are incorrectly deemed logically valid or vice versa. This paper investigates how content biases on reasoning can be mitigated through activation steering, an inference-time technique that modulates internal activations. Specifically, after localising the layers responsible for formal and plausible inference, we investigate activation steering on a controlled syllogistic reasoning task, designed to disentangle formal validity from content plausibility. An extensive empirical analysis reveals that contrastive steering methods consistently support linear control over content biases. However, a static approach is insufficient to debias all the tested models. We then investigate how to control content effects by dynamically determining the steering parameters through fine-grained conditional methods. By introducing a novel kNN-based conditional approach (K-CAST), we demonstrate that conditional steering can effectively reduce biases on unresponsive models, achieving up to 15% absolute improvement in formal reasoning accuracy. Finally, we found that steering for content effects is robust to prompt variations, incurs minimal side effects on multilingual language modeling capabilities, and can partially generalize to different reasoning tasks. In practice, we demonstrate that activation-level interventions offer a scalable inference-time strategy for enhancing the robustness of LLMs, contributing towards more systematic and unbiased reasoning capabilities.

2505.08614 2026-04-02 cs.CV

WaveGuard: Robust Deepfake Detection and Source Tracing via Dual-Tree Complex Wavelet and Graph Neural Networks

Ziyuan He, Zhiqing Guo, Liejun Wang, Gaobo Yang, Yunfeng Diao, Dan Ma

Comments 14 pages, 6 figures, 7 tables

Details
English abstract

Deepfake technology poses increasing risks such as privacy invasion and identity theft. To address these threats, we propose WaveGuard, a proactive watermarking framework that enhances robustness and imperceptibility via frequency-domain embedding and graph-based structural consistency. Specifically, we embed watermarks into high-frequency sub-bands using Dual-Tree Complex Wavelet Transform (DT-CWT) and employ a Structural Consistency Graph Neural Network (SC-GNN) to preserve visual quality. We also design an attention module to refine embedding precision. Experimental results on face swap and reenactment tasks demonstrate that WaveGuard outperforms state-of-the-art methods in both robustness and visual quality. Code is available at https://github.com/vpsg-research/WaveGuard.

2505.04738 2026-04-02 cs.LG

SetONet: A Set-Based Operator Network for Solving PDEs with Variable-Input Sampling

Stepan Tretiakov, Xingjian Li, Krishna Kumar

Details
English abstract

Most neural-operator surrogates for PDEs inherit from DeepONet-style formulations the requirement that the input function be sampled at a fixed, ordered set of sensors. This assumption limits applicability to problems with variable sensor layouts, missing data, point sources, and sample-based representations of densities. We propose SetONet, which addresses this gap by recasting the operator input as an unordered set of coordinate-value observations and encoding it with permutation-invariant aggregation inside a standard branch-trunk operator network while preserving the DeepONet synthesis mechanism and lightweight end-to-end training. A structured variant, SetONet-Key, aggregates sensor information through learnable query tokens and a position-only key pathway, thereby decoupling sampling geometry from sensor values. The method is assessed on four classical operator-learning benchmarks under fixed layouts, variable layouts, and evaluation-time sensor drop-off, and on four problems with inherently unstructured point-cloud inputs, including heat conduction with multiple point sources, advection-diffusion, phase-screen diffraction, and optimal transport problems. In parameter-matched studies, SetONet-Key achieves lower error than the DeepONet baseline on fixed-sensor benchmarks and remains reliable when layouts vary or sensors are dropped at evaluation. Comparisons across pooling rules show that attention-based aggregation is typically more robust than mean or sum pooling. On the point-cloud problems, SetONet operates directly on the native input representation, without rasterization or multi-stage preprocessing, and outperforms the larger VIDON baseline.

2505.04645 2026-04-02 cs.CL cs.LG stat.CO

ChatGPT for automated grading of short answer questions in mechanical ventilation

Tejas Jade, Alex Yartsev

Details
Journal ref
Focus on Health Professional Education: A Multi-Professional Journal, 27(1), 90-104 (2026)
English abstract

Standardised tests using short answer questions (SAQs) are common in postgraduate education. Large language models (LLMs) simulate conversational language and interpret unstructured free-text responses in ways aligning with applying SAQ grading rubrics, making them attractive for automated grading. We evaluated ChatGPT 4o to grade SAQs in a postgraduate medical setting using data from 215 students (557 short-answer responses) enrolled in an online course on mechanical ventilation (2020–2024). Deidentified responses to three case-based scenarios were presented to ChatGPT with a standardised grading prompt and rubric. Outputs were analysed using mixed-effects modelling, variance component analysis, intraclass correlation coefficients (ICCs), Cohen's kappa, Kendall's W, and Bland–Altman statistics. ChatGPT awarded systematically lower marks than human graders with a mean difference (bias) of -1.34 on a 10-point scale. ICC values indicated poor individual-level agreement (ICC1 = 0.086), and Cohen's kappa (-0.0786) suggested no meaningful agreement. Variance component analysis showed minimal variability among the five ChatGPT sessions (G-value = 0.87), indicating internal consistency but divergence from the human grader. The poorest agreement was observed for evaluative and analytic items, whereas checklist and prescriptive rubric items had less disagreement. We caution against the use of LLMs in grading postgraduate coursework. Over 60% of ChatGPT-assigned grades differed from human grades by more than acceptable boundaries for high-stakes assessments.

2505.00213 2026-04-02 cs.RO math.OC

A Player Selection Network for Scalable Game-Theoretic Prediction and Planning

Tianyu Qiu, Eric Ouano, Fernando Palafox, Christian Ellis, David Fridovich-Keil

Details
English abstract

While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game, a learning-based, game-theoretic prediction and planning framework that reduces game size by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete-information games where other agents' intentions are unknown to the ego agent. A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of players included in the game, PSN shrinks the corresponding optimization problems, leading to faster solve times. Experiments in both simulated scenarios and real-world pedestrian trajectory datasets show that PSN is competitive with, and often improves upon, the evaluated explicit game-theoretic selection baselines in 1) prediction accuracy and 2) planning safety. Across scenarios, PSN typically selects substantially fewer players than are present in the full game, thereby reducing game size and planning complexity. PSN also generalizes to settings in which agents' objectives are unknown, via the GIN, without test-time fine-tuning. By selecting only the most relevant players for decision-making, PSN Game provides a practical mechanism for reducing planning complexity that can be integrated into existing multi-agent planning frameworks.

2504.16624 2026-04-02 cs.LG cs.FL

A Detailed Account of Compositional Automata Learning through Alphabet Refinement

Leo Henry, Thomas Neele, Mohammad Reza Mousavi, Matteo Sammartino

Comments Extended version of "Compositional Active Learning of Synchronizing Systems Through Automated Alphabet Refinement" (CONCUR 2025, DOI: 10.4230/LIPIcs.CONCUR.2025.20), submitted to the CONCUR 2025 special issue of Logical Methods in Computer Science. Incorporates and extends results from "Compositional Automata Learning of Synchronous Systems" (FASE 2023, DOI: 10.1007/978-3-031-30826-0_3)

Details
English abstract

Active automata learning infers automaton models of systems from behavioral observations, a technique successfully applied to a wide range of domains. Compositional approaches have recently emerged to address scalability to concurrent systems. We take a significant step beyond available results, including those by the authors, and develop a general technique for compositional learning of a synchronizing parallel system with an unknown decomposition. Our approach automatically refines the global alphabet into component alphabets while learning the component models. We develop a theoretical treatment of distributions of alphabets, i.e., sets of possibly overlapping component alphabets, characterize counter-examples that reveal inconsistencies with global observations, and show how to systematically update the distribution to restore consistency. We extend $L^{\star}$ to handle partial and potentially spurious information arising when learning components from global observations only. We establish correctness and termination of the full algorithm. We provide an implementation, called CoalA, using the state-of-the-art active learning library LearnLib. Our experiments on more than 630 subject systems show that CoalA delivers up to five orders of magnitude fewer membership queries than monolithic learning, and achieves better scalability in equivalence queries on systems with significant concurrency.

2503.22244 2026-04-02 cs.LG math.OC

Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch

Weizhen Wang, Jianping He, Xiaoming Duan

Details
English abstract

Policy gradient methods are one of the most successful approaches for solving challenging reinforcement learning problems. Despite their empirical successes, many state-of-the-art policy gradient algorithms for discounted problems deviate from the theoretical policy gradient theorem due to the existence of a distribution mismatch. In this work, we analyze the impact of this mismatch on policy gradient methods. Specifically, we first show that in the case of tabular parameterizations, the biased gradient induced by the mismatch still yields a valid first-order characterization of global optimality. Then, we extend this analysis to more general parameterizations by deriving explicit bounds on both the state distribution mismatch and the resulting gradient mismatch in episodic and continuing MDPs, which are shown to vanish at least linearly as the discount factor approaches one. Building on these bounds, we further establish guarantees for the biased policy gradient iterates, showing that they approach approximate stationary points with respect to the exact gradient, with asymptotic residuals depending on the discount factor. Our findings offer insights into the robustness of policy gradient methods as well as the gap between theoretical foundations and practical implementations.

2503.11175 2026-04-02 cs.CV cs.AI

Zero-TIG: Temporal Consistency-Aware Zero-Shot Illumination-Guided Low-light Video Enhancement

Yini Li, Nantheera Anantrasirichai

Details
English abstract

Low-light and underwater videos suffer from poor visibility, low contrast, and high noise, necessitating enhancements in visual quality. However, existing approaches typically rely on paired ground truth, which limits their practicality and often fails to maintain temporal consistency. To overcome these obstacles, this paper introduces a novel zero-shot learning approach named Zero-TIG, leveraging the Retinex theory and optical flow techniques. The proposed network consists of an enhancement module and a temporal feedback module. The enhancement module comprises three subnetworks: low-light image denoising, illumination estimation, and reflection denoising. The temporal feedback module ensures temporal consistency by incorporating histogram equalization, optical flow computation, and image warping to align the enhanced previous frame with the current frame, thereby maintaining continuity. Additionally, we address color distortion in underwater data by adaptively balancing RGB channels. The experimental results demonstrate that our method achieves low-light video enhancement without the need for paired training data, making it a promising and applicable method for real-world scenario enhancement.