arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1139
2508.18695 2026-01-26 cs.CV

A Novel Deep Hybrid Framework with Ensemble-Based Feature Optimization for Robust Real-Time Human Activity Recognition

Wasi Ullah, Yasir Noman Khalid, Saddam Hussain Khan

Comments 35 pages, 25 figures, 11 tables

详情
英文摘要

Real-time Human Activity Recognition (HAR) has wide-ranging applications in areas such as context-aware environments, public safety, assistive technologies, and autonomous monitoring and surveillance systems. However, existing real-time HAR systems face significant challenges, including limited scalability and high computational costs arising from redundant features. To address these issues, the Inception-V3 model was customized with region-based and boundary-aware operations, using average pooling and max pooling, respectively, to enhance region homogeneity, suppress noise, and capture discriminative local features, while improving robustness through down-sampling. Furthermore, to effectively encode motion dynamics, an Attention-Augmented Long Short-Term Memory (AA-LSTM) network was employed to learn temporal dependencies across video frames. Features are extracted from video dataset and are then optimized through a novel proposed dynamic composite feature selection method called Adaptive Dynamic Fitness Sharing and Attention (ADFSA). This ADFSA mechanism is embedded within a genetic algorithm to select a compact, optimized subset of features by dynamically balancing multiple objectives, accuracy, redundancy reduction, feature uniqueness, and complexity minimization. As a result, the selected subset of diverse and discriminative features enables lightweight machine learning classifiers to achieve accurate and robust HAR in heterogeneous environments. Experimental results demonstrate up to 99.65\% accuracy using as few as seven selected features, with improved inference time on the challenging UCF-YouTube dataset, which includes factors such as occlusion, cluttered backgrounds, complex motion dynamics, and poor illumination conditions.

2508.13680 2026-01-26 cs.CL cs.LG

VMMU: A Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark

Vy Tuong Dang, An Vo, Emilio Villa-Cueva, Quang Tau, Duc Dm, Thamar Solorio, Daeyoung Kim

详情
英文摘要

We introduce VMMU, a Vietnamese Multitask Multimodal Understanding and Reasoning Benchmark designed to evaluate how vision-language models (VLMs) interpret and reason over visual and textual information beyond English. VMMU consists of 2.5k multimodal questions across 7 tasks, covering a diverse range of problem contexts, including STEM problem solving, data interpretation, rule-governed visual reasoning, and abstract visual reasoning. All questions require genuine multimodal integration, rather than reliance on text-only cues or OCR-based shortcuts. We evaluate a diverse set of state-of-the-art proprietary and open-source VLMs on VMMU. Despite strong Vietnamese OCR performance, proprietary models achieve only 66% mean accuracy. Further analysis shows that the primary source of failure is not OCR, but instead multimodal grounding and reasoning over text and visual evidence. Code and data are available at https://vmmu-bench.github.io/

2508.13661 2026-01-26 cs.LG cs.MA

MACTAS: Self-Attention-Based Inter-Agent Communication in Multi-Agent Reinforcement Learning with Action-Value Function Decomposition

Maciej Wojtala, Bogusz Stefańczyk, Dominik Bogucki, Łukasz Lepak, Jakub Strykowski, Paweł Wawrzyński

Comments Submitted for IJCAI 2026

详情
英文摘要

Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication method that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The method can be seamlessly integrated with any action-value function decomposition algorithm and can be viewed as an orthogonal extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents, which makes it scalable to large systems. Experimental results on the SMACv2 benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps. makes it scalable to large systems. Experimental results on the SMACv2 benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.

2508.12623 2026-01-26 cs.LG cs.AI

How can we trust opaque systems? Criteria for robust explanations in XAI

Florian J. Boge, Annika Schuster

Comments 8 pages, 1 figure

详情
英文摘要

Deep learning (DL) algorithms are becoming ubiquitous in everyday life and in scientific research. However, the price we pay for their impressively accurate predictions is significant: their inner workings are notoriously opaque - it is unknown to laypeople and researchers alike what features of the data a DL system focuses on and how it ultimately succeeds in predicting correct outputs. A necessary criterion for trustworthy explanations is that they should reflect the relevant processes the algorithms' predictions are based on. The field of eXplainable Artificial Intelligence (XAI) presents promising methods to create such explanations. But recent reviews about their performance offer reasons for skepticism. As we will argue, a good criterion for trustworthiness is explanatory robustness: different XAI methods produce the same explanations in comparable contexts. However, in some instances, all methods may give the same, but still wrong, explanation. We therefore argue that in addition to explanatory robustness (ER), a prior requirement of explanation method robustness (EMR) has to be fulfilled by every XAI method. Conversely, the robustness of an individual method is in itself insufficient for trustworthiness. In what follows, we develop and formalize criteria for ER as well as EMR, providing a framework for explaining and establishing trust in DL algorithms. We also highlight interesting application cases and outline directions for future work.

2508.08450 2026-01-26 cs.LG stat.ME stat.ML

Differentiable Cyclic Causal Discovery Under Unmeasured Confounders

Muralikrishnna G. Sethuraman, Faramarz Fekri

详情
英文摘要

Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems, such as biological networks. Existing methods that account for confounders either assume linearity or struggle with scalability. To address these limitations, we propose DCCD-CONF, a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders using interventional data. Our approach alternates between optimizing the graph structure and estimating the confounder distribution by maximizing the log-likelihood of the data. Through experiments on synthetic data and real-world gene perturbation datasets, we show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification. Additionally, we also provide consistency guarantees for our framework, reinforcing its theoretical soundness.

2508.08172 2026-01-26 cs.LG cs.AI cs.LO

Neural Logic Networks for Interpretable Classification

Vincent Perreault, Katsumi Inoue, Richard Labib, Alain Hertz

Comments 25 pages, 8 figures, pre-print, code available at https://github.com/VincentPerreault0/NeuralLogicNetworks

详情
英文摘要

Traditional neural networks have an impressive classification performance, but what they learn cannot be inspected, verified or extracted. Neural Logic Networks on the other hand have an interpretable structure that enables them to learn a logical mechanism relating the inputs and outputs with AND and OR operations. We generalize these networks with NOT operations and biases that take into account unobserved data and develop a rigorous logical and probabilistic modeling in terms of concept combinations to motivate their use. We also propose a novel factorized IF-THEN rule structure for the model as well as a modified learning algorithm. Our method improves the state-of-the-art in Boolean networks discovery and is able to learn relevant, interpretable rules in tabular classification, notably on examples from the medical and industrial fields where interpretability has tangible value.

2508.06263 2026-01-26 cs.AI cs.LG

Symmetry breaking for inductive logic programming

Andrew Cropper, David M. Cerna, Matti Järvisalo

Comments AAAI26

详情
英文摘要

The goal of inductive logic programming is to search for a hypothesis that generalises training data and background knowledge. The challenge is searching vast hypothesis spaces, which is exacerbated because many logically equivalent hypotheses exist. To address this challenge, we introduce a method to break symmetries in the hypothesis space. We implement our idea in answer set programming. Our experiments on multiple domains, including visual reasoning and game playing, show that our approach can reduce solving times from over an hour to just 17 seconds.

2508.06230 2026-01-26 cs.AI

Learning Logical Rules using Minimum Message Length

Ruben Sharma, Sebastijan Dumančić, Ross D. King, Andrew Cropper

详情
英文摘要

Unifying probabilistic and logical learning is a key challenge in AI. We introduce a Bayesian inductive logic programming approach that learns minimum message length hypotheses from noisy data. Our approach balances hypothesis complexity and data fit through priors, which favour more general programs, and a likelihood, which favours accurate programs. Our experiments on several domains, including game playing and drug design, show that our method significantly outperforms previous methods, notably those that learn minimum description length programs. Our results also show that our approach is data-efficient and insensitive to example balance, including the ability to learn from exclusively positive examples.

2508.04494 2026-01-26 cs.CL

CALE : Concept-Aligned Embeddings for Both Within-Lemma and Inter-Lemma Sense Differentiation

Bastien Liétard, Gabriel Loiseau

Comments Accepted at EACL 2026

详情
英文摘要

Lexical semantics is concerned with both the multiple senses a word can adopt in different contexts, and the semantic relations that exist between meanings of different words. To investigate them, Contextualized Language Models are a valuable tool that provides context-sensitive representations that can be used to investigate lexical meaning. Recent works like XL-LEXEME have leveraged the task of Word-in-Context to fine-tune them to get more semantically accurate representations, but Word-in-Context only compares occurrences of the same lemma, limiting the range of captured information. In this paper, we propose an extension, Concept Differentiation, to include inter-words scenarios. We provide a dataset for this task, derived from SemCor data. Then we fine-tune several representation models on this dataset. We call these models Concept-Aligned Embeddings (CALE). By challenging our models and other models on various lexical semantic tasks, we demonstrate that the proposed models provide efficient multi-purpose representations of lexical meaning that reach best performances in our experiments. We also show that CALE's fine-tuning brings valuable changes to the spatial organization of embeddings.

2508.02283 2026-01-26 cs.LG q-fin.CP q-fin.RM

An Enhanced Focal Loss Function to Mitigate Class Imbalance in Auto Insurance Fraud Detection with Explainable AI

Francis Boabang, Samuel Asante Gyamerah

Comments 15 pages, 4 figures, 2 tables

详情
英文摘要

Detecting fraudulent auto-insurance claims remains a challenging classification problem, largely due to the extreme imbalance between legitimate and fraudulent cases. Standard learning algorithms tend to overfit to the majority class, resulting in poor detection of economically significant minority events. This paper proposes a structured three-stage training framework that integrates a convex surrogate of focal loss for stable initialization, a controlled non-convex intermediate loss to improve feature discrimination, and the standard focal loss to refine minority-class sensitivity. We derive conditions under which the surrogate retains convexity in the prediction space and show how this facilitates more reliable optimization when combined with deep sequential models. Using a proprietary auto-insurance dataset, the proposed method improves minority-class F1-scores and AUC relative to conventional focal-loss training and resampling baselines. The approach also provides interpretable feature-attribution patterns through SHAP analysis, offering transparency for actuarial and fraud-analytics applications.

2507.21616 2026-01-26 cs.LG

Categorical Distributions are Effective Neural Network Outputs for Event Prediction

Kevin Doran, Tom Baden

Comments 42 pages, 33 figures

详情
英文摘要

We demonstrate the effectiveness of the categorical distribution as a neural network output for next event prediction. This is done for both discrete-time and continuous-time event sequences. To model continuous-time processes, the categorical distribution is interpreted as a piecewise-constant density function and is shown to be competitive across a range of datasets. We then argue for the importance of studying discrete-time processes by introducing a neuronal spike prediction task motivated by retinal prosthetics, where discretization of event times is consequent on the task description. Separately, we show evidence that commonly used datasets favour smaller models. Finally, we introduce new synthetic datasets for testing larger models, as well as synthetic datasets with discrete event times.

2507.12884 2026-01-26 cs.CV eess.SP

From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation

Mengxi Liu, Lala Shakti Swarup Ray, Sizhen Bian, Ko Watanabe, Ankur Bhatt, Joanna Sorysz, Russel Torah, Bo Zhou, Paul Lukowicz

详情
英文摘要

We present NeckSense, a novel wearable system for head pose tracking that leverages multi-channel bio-impedance sensing with soft, dry electrodes embedded in a lightweight, necklace-style form factor. NeckSense captures dynamic changes in tissue impedance around the neck, which are modulated by head rotations and subtle muscle activations. To robustly estimate head pose, we propose a deep learning framework that integrates anatomical priors, including joint constraints and natural head rotation ranges, into the loss function design. We validate NeckSense on 7 participants using the current SOTA pose estimation model as ground truth. Our system achieves a mean per-vertex error of 25.9 mm across various head movements with a leave-one-person-out cross-validation method, demonstrating that a compact, line-of-sight-free bio-impedance wearable can deliver head-tracking performance comparable to SOTA vision-based methods.

2507.12814 2026-01-26 cs.LG cs.CE cs.NA math.NA

RONOM: Reduced-Order Neural Operator Modeling

Sven Dummer, Dongwei Ye, Christoph Brune

详情
英文摘要

Time-dependent partial differential equations are ubiquitous in physics-based modeling, but they remain computationally intensive in many-query scenarios, such as real-time forecasting, optimal control, and uncertainty quantification. Reduced-order modeling (ROM) addresses these challenges by constructing a low-dimensional surrogate model but relies on a fixed discretization, which limits flexibility across varying meshes during evaluation. Operator learning approaches, such as neural operators, offer an alternative by parameterizing mappings between infinite-dimensional function spaces, enabling adaptation to data across different resolutions. Whereas ROM provides rigorous numerical error estimates, neural operator learning largely focuses on discretization convergence and invariance without quantifying the error between the infinite-dimensional and the discretized operators. This work introduces the reduced-order neural operator modeling (RONOM) framework, which bridges concepts from ROM and operator learning. We establish a discretization error bound analogous to those in ROM, and get insights into RONOM's discretization convergence and discretization robustness. Moreover, three numerical examples are presented that compare RONOM to existing neural operators for solving partial differential equations. The results demonstrate that RONOM using standard vector-to-vector neural networks can achieve comparable performance in input generalization and achieves superior performance in both spatial super-resolution and discretization robustness, while also offering novel insights into temporal super-resolution scenarios and ROM-based approaches for learning on time-dependent data.

2507.10121 2026-01-26 cs.RO

Digital twins for the design, interactive control, and deployment of modular, fiber-reinforced soft continuum arms

Seung Hyun Kim, Jiamiao Guo, Arman Tekinalp, Heng-Sheng Chang, Ugur Akcal, Tixian Wang, Darren Biskup, Benjamin Walt, Girish Chowdhary, Girish Krishnan, Prashant G. Mehta, Mattia Gazzola

Comments 8 pages, 4 figures This work has been submitted for possible publication

详情
英文摘要

Soft continuum arms (SCAs) promise versatile manipulation through mechanical compliance, for assistive devices, agriculture, search applications, or surgery. However, the strong nonlinear coupling between materials, morphology, and actuation renders design and control challenging, hindering real-world deployment. In this context, a modular fabrication strategy paired with reliable, interactive simulations would be highly beneficial, streamlining prototyping and control design. Here, we present a digital twin framework for modular SCAs realized using pneumatic Fiber-Reinforced Elastomeric Enclosures (FREEs). The approach models assemblies of FREE actuators through networks of Cosserat rods, favoring the accurate simulation of three-dimensional arm reconfigurations, while explicitly preserving internal modular architecture. This enables the quantitative analysis and scalable development of composite soft robot arms, overcoming limitations of current monolithic continuum models. To validate the framework, we introduce a three-dimensional reconstruction pipeline tailored to soft, slender, small-volume, and highly deformable structures, allowing reliable recovery of arm kinematics and strain distributions. Experimental results across multiple configurations and actuation regimes demonstrate close agreement with simulations. Finally, we embed the digital twins in a virtual environment to allow interactive control design and sim-to-real deployment, establishing a foundation for principled co-design and remote operation of modular soft continuum manipulators.

2507.05964 2026-01-26 cs.CV

T-LoRA: Single Image Diffusion Model Customization Without Overfitting

Vera Soboleva, Aibek Alanov, Andrey Kuznetsov, Konstantin Sobolev

Comments AAAI 2026

详情
英文摘要

While diffusion model fine-tuning offers a powerful approach for customizing pre-trained models to generate specific objects, it frequently suffers from overfitting when training samples are limited, compromising both generalization capability and output diversity. This paper tackles the challenging yet most impactful task of adapting a diffusion model using just a single concept image, as single-image customization holds the greatest practical potential. We introduce T-LoRA, a Timestep-Dependent Low-Rank Adaptation framework specifically designed for diffusion model personalization. We show that higher diffusion timesteps are more prone to overfitting than lower ones, necessitating a timestep-sensitive fine-tuning strategy. T-LoRA incorporates two key innovations: (1) a dynamic fine-tuning strategy that adjusts rank-constrained updates based on diffusion timesteps, and (2) a weight parametrization technique that ensures independence between adapter components through orthogonal initialization. Extensive experiments on SD-XL and FLUX-1.dev show that T-LoRA and its individual components outperform standard LoRA and other diffusion model personalization techniques, achieving a superior balance between concept fidelity and text alignment. Project page is available at https://controlgenai.github.io/T-LoRA/.

2507.04037 2026-01-26 cs.AI

Ready Jurist One: Benchmarking Language Agents for Legal Intelligence in Dynamic Environments

Zheng Jia, Shengbin Yue, Wei Chen, Siyuan Wang, Yidong Liu, Zejun Li, Yun Song, Zhongyu Wei

详情
英文摘要

The gap between static benchmarks and the dynamic nature of real-world legal practice poses a key barrier to advancing legal intelligence. To this end, we introduce J1-ENVS, the first interactive and dynamic legal environment tailored for LLM-based agents. Guided by legal experts, it comprises six representative scenarios from Chinese legal practices across three levels of environmental complexity. We further introduce J1-EVAL, a fine-grained evaluation framework, designed to assess both task performance and procedural compliance across varying levels of legal proficiency. Extensive experiments on 17 LLM agents reveal that, while many models demonstrate solid legal knowledge, they struggle with procedural execution in dynamic settings. Even the SOTA model, GPT-4o, falls short of 60% overall performance. These findings highlight persistent challenges in achieving dynamic legal intelligence and offer valuable insights to guide future research.

2507.00243 2026-01-26 cs.CV

VOCAL: Visual Odometry via ContrAstive Learning

Chi-Yao Huang, Zeel Bhatt, Yezhou Yang

Comments Accepted to WACV 2026

详情
英文摘要

Breakthroughs in visual odometry (VO) have fundamentally reshaped the landscape of robotics, enabling ultra-precise camera state estimation that is crucial for modern autonomous systems. Despite these advances, many learning-based VO techniques rely on rigid geometric assumptions, which often fall short in interpretability and lack a solid theoretical basis within fully data-driven frameworks. To overcome these limitations, we introduce VOCAL (Visual Odometry via ContrAstive Learning), a novel framework that reimagines VO as a label ranking challenge. By integrating Bayesian inference with a representation learning framework, VOCAL organizes visual features to mirror camera states. The ranking mechanism compels similar camera states to converge into consistent and spatially coherent representations within the latent space. This strategic alignment not only bolsters the interpretability of the learned features but also ensures compatibility with multimodal data sources. Extensive evaluations on the KITTI dataset highlight VOCAL's enhanced interpretability and flexibility, pushing VO toward more general and explainable spatial intelligence.

2506.21771 2026-01-26 cs.LG cs.NE

Gradient-Based Neuroplastic Adaptation for Concurrent Optimization of Neuro-Fuzzy Networks

John Wesley Hostetter, Min Chi

Comments 52 pages

详情
英文摘要

Neuro-fuzzy networks (NFNs) are transparent, symbolic, and universal function approximations that perform as well as conventional neural architectures, but their knowledge is expressed as linguistic IF-THEN rules. Despite these advantages, their systematic design process remains a challenge. Existing work will often sequentially build NFNs by inefficiently isolating parametric and structural identification, leading to a premature commitment to brittle and subpar architecture. We propose a novel application-independent approach called gradient-based neuroplastic adaptation for the concurrent optimization of NFNs' parameters and structure. By recognizing that NFNs' parameters and structure should be optimized simultaneously as they are deeply conjoined, settings previously unapproachable for NFNs are now accessible, such as the online reinforcement learning of NFNs for vision-based tasks. The effectiveness of concurrently optimizing NFNs is empirically shown as it is trained by online reinforcement learning to proficiently play challenging scenarios from a vision-based video game called DOOM.

2506.13911 2026-01-26 cs.LG cs.AI cs.LO

Logical Expressiveness of Graph Neural Networks with Hierarchical Node Individualization

Arie Soeteman, Balder ten Cate

Comments Submitted to NeurIPS 2025, 28 pages, 5 figures

详情
英文摘要

We propose and study Hierarchical Ego Graph Neural Networks (HEGNNs), an expressive extension of graph neural networks (GNNs) with hierarchical node individualization, inspired by the Individualization-Refinement paradigm for isomorphism testing. HEGNNs generalize subgraph-GNNs and form a hierarchy of increasingly expressive models that, in the limit, distinguish graphs up to isomorphism. We show that, over graphs of bounded degree, the separating power of HEGNN node classifiers equals that of graded hybrid logic. This characterization enables us to relate the separating power of HEGNNs to that of higher-order GNNs, GNNs enriched with local homomorphism count features, and color refinement algorithms based on Individualization-Refinement. Our experimental results confirm the practical feasibility of HEGNNs and show benefits in comparison with traditional GNN architectures, both with and without local homomorphism count features.

2506.11777 2026-01-26 cs.CV cs.AI cs.CY cs.LG

Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation

Divyanshu Mishra, Mohammadreza Salehi, Pramit Saha, Olga Patey, Aris T. Papageorghiou, Yuki M. Asano, J. Alison Noble

Comments Accepted in NeurIPS 2025

详情
英文摘要

Self-supervised learning (SSL) has achieved major advances in natural images and video understanding, but challenges remain in domains like echocardiography (heart ultrasound) due to subtle anatomical structures, complex temporal dynamics, and the current lack of domain-specific pre-trained models. Existing SSL approaches such as contrastive, masked modeling, and clustering-based methods struggle with high intersample similarity, sensitivity to low PSNR inputs common in ultrasound, or aggressive augmentations that distort clinically relevant features. We present DISCOVR (Distilled Image Supervision for Cross Modal Video Representation), a self-supervised dual branch framework for cardiac ultrasound video representation learning. DISCOVR combines a clustering-based video encoder that models temporal dynamics with an online image encoder that extracts fine-grained spatial semantics. These branches are connected through a semantic cluster distillation loss that transfers anatomical knowledge from the evolving image encoder to the video encoder, enabling temporally coherent representations enriched with fine-grained semantic understanding.Evaluated on six echocardiography datasets spanning fetal, pediatric, and adult populations, DISCOVR outperforms both specialized video anomaly detection methods and state-of-the-art video-SSL baselines in zero-shot and linear probing setups,achieving superior segmentation transfer and strong downstream performance on clinically relevant tasks such as LVEF prediction. Code available at: https://github.com/mdivyanshu97/DISCOVR

2506.10805 2026-01-26 cs.LG

Detecting High-Stakes Interactions with Activation Probes

Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep Singh Lubana, Dmitrii Krasheninnikov

Comments Accepted at NeurIPS 2025

详情
英文摘要

Monitoring is an important aspect of safely deploying Large Language Models (LLMs). This paper examines activation probes for detecting ``high-stakes'' interactions -- where the text indicates that the interaction might lead to significant harm -- as a critical, yet underexplored, target for such monitoring. We evaluate several probe architectures trained on synthetic data, and find them to exhibit robust generalization to diverse, out-of-distribution, real-world data. Probes' performance is comparable to that of prompted or finetuned medium-sized LLM monitors, while offering computational savings of six orders-of-magnitude. These savings are enabled by reusing activations of the model that is being monitored. Our experiments also highlight the potential of building resource-aware hierarchical monitoring systems, where probes serve as an efficient initial filter and flag cases for more expensive downstream analysis. We release our novel synthetic dataset and the codebase at https://github.com/arrrlex/models-under-pressure.

2506.04772 2026-01-26 cs.CL

Identifying Reliable Evaluation Metrics for Scientific Text Revision

Léane Jourdan, Florian Boudin, Richard Dufour, Nicolas Hernandez

Comments V5 contains the English version, (ACL 2025 main, 26 pages) and V4 contains the French version (TALN 2025, 32 pages), both with corrected results for cramer's v and pairwise accuracy

详情
英文摘要

Evaluating text revision in scientific writing remains a challenge, as traditional metrics such as ROUGE and BERTScore primarily focus on similarity rather than capturing meaningful improvements. In this work, we analyse and identify the limitations of these metrics and explore alternative evaluation methods that better align with human judgments. We first conduct a manual annotation study to assess the quality of different revisions. Then, we investigate reference-free evaluation metrics from related NLP domains. Additionally, we examine LLM-as-a-judge approaches, analysing their ability to assess revisions with and without a gold reference. Our results show that LLMs effectively assess instruction-following but struggle with correctness, while domain-specific metrics provide complementary insights. We find that a hybrid approach combining LLM-as-a-judge evaluation and task-specific metrics offers the most reliable assessment of revision quality.

2505.22928 2026-01-26 cs.AI cs.CL

Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning

Massimiliano Pronesti, Michela Lorandi, Paul Flanagan, Oisin Redmond, Anya Belz, Yufang Hou

Comments Accepted at EMNLP 2025 Main Conference. This revision corrects a minor typo in the camera-ready version

详情
英文摘要

Systematic reviews in medicine play a critical role in evidence-based decision-making by aggregating findings from multiple studies. A central bottleneck in automating this process is extracting numeric evidence and determining study-level conclusions for specific outcomes and comparisons. Prior work has framed this problem as a textual inference task by retrieving relevant content fragments and inferring conclusions from them. However, such approaches often rely on shallow textual cues and fail to capture the underlying numeric reasoning behind expert assessments. In this work, we conceptualise the problem as one of quantitative reasoning. Rather than inferring conclusions from surface text, we extract structured numerical evidence (e.g., event counts or standard deviations) and apply domain knowledge informed logic to derive outcome-specific conclusions. We develop a numeric reasoning system composed of a numeric data extraction model and an effect estimate component, enabling more accurate and interpretable inference aligned with the domain expert principles. We train the numeric data extraction model using different strategies, including supervised fine-tuning (SFT) and reinforcement learning (RL) with a new value reward model. When evaluated on the CochraneForest benchmark, our best-performing approach -- using RL to train a small-scale number extraction model -- yields up to a 21% absolute improvement in F1 score over retrieval-based systems and outperforms general-purpose LLMs of over 400B parameters by up to 9% on the RCTs benchmark. Our results demonstrate the promise of reasoning-driven approaches for automating systematic evidence synthesis.

2505.21548 2026-01-26 cs.CL cs.AI cs.CY physics.soc-ph

Fluent but Foreign: Even Regional LLMs Lack Cultural Alignment

Dhruv Agarwal, Anya Shukla, Sunayana Sitaram, Aditya Vashistha

Comments Under review

详情
英文摘要

Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. Many countries are now building ``regional'' or ``sovereign'' LLMs, but it remains unclear whether they reflect local values and practices or merely speak local languages. Using India as a case study, we evaluate six Indic and six global LLMs on two dimensions -- values and practices -- grounded in nationally representative surveys and community-sourced QA datasets. Across tasks, Indic models do not align better with Indian norms than global models; in fact, a U.S. respondent is a closer proxy for Indian values than any Indic model. We further run a user study with 115 Indian users and find that writing suggestions from both global and Indic LLMs introduce Westernized or exoticized writing. Prompting and regional fine-tuning fail to recover alignment and can even degrade existing knowledge. We attribute this to scarce culturally grounded data, especially for pretraining. We position cultural evaluation as a first-class requirement alongside multilingual benchmarks and offer a reusable, community-grounded methodology. We call for native, community-authored corpora and thickxwide evaluations to build truly sovereign LLMs.

2505.16004 2026-01-26 cs.LG cs.AI cs.CL

Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders

Aaron J. Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju

详情
英文摘要

Sparse autoencoders (SAEs) are commonly used to interpret the internal activations of large language models (LLMs) by mapping them to human-interpretable concept representations. While existing evaluations of SAEs focus on metrics such as the reconstruction-sparsity tradeoff, human (auto-)interpretability, and feature disentanglement, they overlook a critical aspect: the robustness of concept representations to input perturbations. We argue that robustness must be a fundamental consideration for concept representations, reflecting the fidelity of concept labeling. To this end, we formulate robustness quantification as input-space optimization problems and develop a comprehensive evaluation framework featuring realistic scenarios in which adversarial perturbations are crafted to manipulate SAE representations. Empirically, we find that tiny adversarial input perturbations can effectively manipulate concept-based interpretations in most scenarios without notably affecting the base LLM's activations. Overall, our results suggest that SAE concept representations are fragile and without further denoising or postprocessing they might be ill-suited for applications in model monitoring and oversight.

2505.15638 2026-01-26 cs.LG stat.CO stat.ME stat.ML

Bayesian Ensembling: Insights from Online Optimization and Empirical Bayes

Daniel Waxman, Fernando Llorente, Petar M. Djurić

Comments 28 pages, 10 figures; Accepted to Transactions on Machine Learning Research (TMLR)

Journal ref Transactions on Machine Learning Research (TMLR), 2026

详情
英文摘要

We revisit the classical problem of Bayesian ensembles and address the challenge of learning optimal combinations of Bayesian models in an online, continual learning setting. To this end, we reinterpret existing approaches such as Bayesian model averaging (BMA) and Bayesian stacking through a novel empirical Bayes lens, shedding new light on the limitations and pathologies of BMA. Further motivated by insights from online optimization, we propose Online Bayesian Stacking (OBS), a method that optimizes the log-score over predictive distributions to adaptively combine Bayesian models. A key contribution of our work is establishing a novel connection between OBS and portfolio selection, bridging Bayesian ensemble learning with a rich, well-studied theoretical framework that offers efficient algorithms and extensive regret analysis. We further clarify the relationship between OBS and online BMA, showing that they optimize related but distinct cost functions. Through theoretical analysis and empirical evaluation, we identify scenarios where OBS outperforms online BMA and provide principled methods and guidance on when practitioners should prefer one approach over the other.

2505.10072 2026-01-26 cs.CV

ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars

Rui-Yang Ju, Sheng-Yen Huang, Yi-Ping Hung

Comments 2-Page Version Accepted as a Poster at IEEE VR 2026

详情
英文摘要

The introduction of 3D Gaussian blendshapes has enabled the real-time reconstruction of animatable head avatars from monocular video. Toonify, a StyleGAN-based method, has become widely used for facial image stylization. To extend Toonify for synthesizing diverse stylized 3D head avatars using Gaussian blendshapes, we propose an efficient two-stage framework, ToonifyGB. In Stage 1 (stylized video generation), we adopt an improved StyleGAN to generate the stylized video from the input video frames, which overcomes the limitation of cropping aligned faces at a fixed resolution as preprocessing for normal StyleGAN. This process provides a more stable stylized video, which enables Gaussian blendshapes to better capture the high-frequency details of the video frames, facilitating the synthesis of high-quality animations in the next stage. In Stage 2 (Gaussian blendshapes synthesis), our method learns a stylized neutral head model and a set of expression blendshapes from the generated stylized video. By combining the neutral head model with expression blendshapes, ToonifyGB can efficiently render stylized avatars with arbitrary expressions. We validate the effectiveness of ToonifyGB on benchmark datasets using two representative styles: Arcane and Pixar.

2505.06980 2026-01-26 cs.RO cs.CV

VALISENS: A Validated Innovative Multi-Sensor System for Cooperative Automated Driving

Lei Wan, Prabesh Gupta, Andreas Eich, Marcel Kettelgerdes, Hannan Ejaz Keen, Michael Klöppel-Gersdorf, Alexey Vinel

Comments 8 pages, 5 figures, submitted to IEEE VNC

详情
英文摘要

Reliable perception remains a key challenge for Connected Automated Vehicles (CAVs) in complex real-world environments, where varying lighting conditions and adverse weather degrade sensing performance. While existing multi-sensor solutions improve local robustness, they remain constrained by limited sensing range, line-of-sight occlusions, and sensor failures on individual vehicles. This paper introduces VALISENS, a validated cooperative perception system that extends multi-sensor fusion beyond a single vehicle through Vehicle-to-Everything (V2X)-enabled collaboration between Connected Automated Vehicles (CAVs) and intelligent infrastructure. VALISENS integrates onboard and roadside LiDARs, radars, RGB cameras, and thermal cameras within a unified multi-agent perception framework. Thermal cameras enhances the detection of Vulnerable Road Users (VRUs) under challenging lighting conditions, while roadside sensors reduce occlusions and expand the effective perception range. In addition, an integrated sensor monitoring module continuously assesses sensor health and detects anomalies before system degradation occurs. The proposed system is implemented and evaluated in a dedicated real-world testbed. Experimental results show that VALISENS improves pedestrian situational awareness by up to 18% compared with vehicle-only sensing, while the sensor monitoring module achieves over 97% accuracy, demonstrating its effectiveness and its potential to support future Cooperative Intelligent Transport Systems (C-ITS) applications.

2505.05855 2026-01-26 cs.CV

Decoupling Multi-Contrast Super-Resolution: Self-Supervised Implicit Re-Representation for Unpaired Cross-Modal Synthesis

Yinzhe Wu, Hongyu Rui, Fanwen Wang, Jiahao Huang, Zhenxuan Zhang, Haosen Zhang, Zi Wang, Guang Yang

详情
英文摘要

Multi-contrast super-resolution (MCSR) is crucial for enhancing MRI but current deep learning methods are limited. They typically require large, paired low- and high-resolution (LR/HR) training datasets, which are scarce, and are trained for fixed upsampling scales. While recent self-supervised methods remove the paired data requirement, they fail to leverage valuable population-level priors. In this work, we propose a novel, decoupled MCSR framework that resolves both limitations. We reformulate MCSR into two stages: (1) an unpaired cross-modal synthesis (uCMS) module, trained once on unpaired population data to learn a robust anatomical prior; and (2) a lightweight, patient-specific implicit re-representation (IrR) module. This IrR module is optimized in a self-supervised manner to fuse the population prior with the subject's own LR target data. This design uniquely fuses population-level knowledge with patient-specific fidelity without requiring any paired LR/HR or paired cross-modal training data. By building the IrR module on an implicit neural representation, our framework is also inherently scale-agnostic. Our method demonstrates superior quantitative performance on different datasets, with exceptional robustness at extreme scales (16x, 32x), a regime where competing methods fail. Our work presents a data-efficient, flexible, and computationally lightweight paradigm for MCSR, enabling high-fidelity, arbitrary-scale

2505.01391 2026-01-26 cs.LG

Learning and Transferring Physical Models through Derivatives

Alessandro Trenta, Andrea Cossu, Davide Bacciu

Comments Accepted at Transactions on Machine Learning Research (TMLR) in January 2026

详情
英文摘要

We propose Derivative Learning (DERL), a supervised approach that models physical systems by learning their partial derivatives. We also leverage DERL to build physical models incrementally, by designing a distillation protocol that effectively transfers knowledge from a pre-trained model to a student one. We provide theoretical guarantees that DERL can learn the true physical system, being consistent with the underlying physical laws, even when using empirical derivatives. DERL outperforms state-of-the-art methods in generalizing an ODE to unseen initial conditions and a parametric PDE to unseen parameters. We also design a method based on DERL to transfer physical knowledge across models by extending them to new portions of the physical domain and a new range of PDE parameters. This introduces a new pipeline to build physical models incrementally in multiple stages.