arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1125
2602.02089 2026-02-23 cs.CV

UrbanGS: A Scalable and Efficient Architecture for Geometrically Accurate Large-Scene Reconstruction

Changbai Li, Haodong Zhu, Hanlin Chen, Xiuping Liang, Tongfei Chen, Shuwei Shao, Linlin Yang, Huobin Tan, Baochang Zhang

Comments ICLR 2026

详情
英文摘要

While 3D Gaussian Splatting (3DGS) enables high-quality, real-time rendering for bounded scenes, its extension to large-scale urban environments gives rise to critical challenges in terms of geometric consistency, memory efficiency, and computational scalability. To address these issues, we present UrbanGS, a scalable reconstruction framework that effectively tackles these challenges for city-scale applications. First, we propose a Depth-Consistent D-Normal Regularization module. Unlike existing approaches that rely solely on monocular normal estimators, which can effectively update rotation parameters yet struggle to update position parameters, our method integrates D-Normal constraints with external depth supervision. This allows for comprehensive updates of all geometric parameters. By further incorporating an adaptive confidence weighting mechanism based on gradient consistency and inverse depth deviation, our approach significantly enhances multi-view depth alignment and geometric coherence, which effectively resolves the issue of geometric accuracy in complex large-scale scenes. To improve scalability, we introduce a Spatially Adaptive Gaussian Pruning (SAGP) strategy, which dynamically adjusts Gaussian density based on local geometric complexity and visibility to reduce redundancy. Additionally, a unified partitioning and view assignment scheme is designed to eliminate boundary artifacts and optimize computational load. Extensive experiments on multiple urban datasets demonstrate that UrbanGS achieves superior performance in rendering quality, geometric accuracy, and memory efficiency, providing a systematic solution for high-fidelity large-scale scene reconstruction.

2601.20198 2026-02-23 cs.LG

DeRaDiff: Denoising Time Realignment of Diffusion Models

Ratnavibusena Don Shahain Manujith, Teoh Tze Tzun, Kenji Kawaguchi, Yang Zhang

详情
英文摘要

Recent advances align diffusion models with human preferences to increase aesthetic appeal and mitigate artifacts and biases. Such methods aim to maximize a conditional output distribution aligned with higher rewards whilst not drifting far from a pretrained prior. This is commonly enforced by KL (Kullback Leibler) regularization. As such, a central issue still remains: how does one choose the right regularization strength? Too high of a strength leads to limited alignment and too low of a strength leads to "reward hacking". This renders the task of choosing the correct regularization strength highly non-trivial. Existing approaches sweep over this hyperparameter by aligning a pretrained model at multiple regularization strengths and then choose the best strength. Unfortunately, this is prohibitively expensive. We introduce DeRaDiff, a denoising time realignment procedure that, after aligning a pretrained model once, modulates the regularization strength during sampling to emulate models trained at other regularization strengths without any additional training or finetuning. Extending decoding-time realignment from language to diffusion models, DeRaDiff operates over iterative predictions of continuous latents by replacing the reverse step reference distribution by a geometric mixture of an aligned and reference posterior, thus giving rise to a closed form update under common schedulers and a single tunable parameter, lambda, for on the fly control. Our experiments show that across multiple text image alignment and image-quality metrics, our method consistently provides a strong approximation for models aligned entirely from scratch at different regularization strengths. Thus, our method yields an efficient way to search for the optimal strength, eliminating the need for expensive alignment sweeps and thereby substantially reducing computational costs.

2601.19318 2026-02-23 cs.RO cs.CV

Perception-to-Pursuit: Track-Centric Temporal Reasoning for Open-World Drone Detection and Autonomous Chasing

Venkatakrishna Reddy Oruganti

Comments 7 pages, 2 figures, 3 tables, 15 references. Intended for submission to ICCV 2027

详情
英文摘要

Autonomous drone pursuit requires not only detecting drones but also predicting their trajectories in a manner that enables kinematically feasible interception. Existing tracking methods optimize for prediction accuracy but ignore pursuit feasibility, resulting in trajectories that are physically impossible to intercept 99.9% of the time. We propose Perception-to-Pursuit (P2P), a track-centric temporal reasoning framework that bridges detection and actionable pursuit planning. Our method represents drone motion as compact 8-dimensional tokens capturing velocity, acceleration, scale, and smoothness, enabling a 12-frame causal transformer to reason about future behavior. We introduce the Intercept Success Rate (ISR) metric to measure pursuit feasibility under realistic interceptor constraints. Evaluated on the Anti-UAV-RGBT dataset with 226 real drone sequences, P2P achieves 28.12 pixel average displacement error and 0.597 ISR, representing a 77% improvement in trajectory prediction and 597x improvement in pursuit feasibility over tracking-only baselines, while maintaining perfect drone classification accuracy (100%). Our work demonstrates that temporal reasoning over motion patterns enables both accurate prediction and actionable pursuit planning.

2601.10161 2026-02-23 cs.CL cs.AI cs.IR

AWED-FiNER: Agents, Web applications, and Expert Detectors for Fine-grained Named Entity Recognition across 36 Languages for 6.6 Billion Speakers

Prachuryya Kaushik, Ashish Anand

Comments Submitted to SIGIR'26 Low-resource Environments Track

详情
英文摘要

Named Entity Recognition (NER) is a foundational task in Natural Language Processing (NLP) and Information Retrieval (IR), which facilitates semantic search and structured data extraction. We introduce \textbf{AWED-FiNER}, an open-source collection of agentic tool, web application, and 53 state-of-the-art expert models that provide Fine-grained Named Entity Recognition (FgNER) solutions across 36 languages spoken by more than 6.6 billion people. The agentic tool enables routing multilingual text to specialized expert models to fetch FgNER annotations within seconds. The web-based platform provides a ready-to-use FgNER annotation service for non-technical users. Moreover, the collection of language-specific extremely small open-source state-of-the-art expert models facilitates offline deployment in resource-constrained scenarios, including edge devices. AWED-FiNER covers languages spoken by over 6.6 billion people, ranging from global languages like English, Chinese, Spanish, and Hindi, to low-resource languages like Assamese, Santali, and Odia, along with a specific focus on extremely low-resource vulnerable languages such as Bodo, Manipuri, Bishnupriya, and Mizo. The resources can be accessed here: Agentic Tool (https://github.com/PrachuryyaKaushik/AWED-FiNER), Web Application (https://hf.co/spaces/prachuryyaIITG/AWED-FiNER), and 53 Expert Detector Models (https://hf.co/collections/prachuryyaIITG/awed-finer).

2601.10160 2026-02-23 cs.CL cs.AI cs.LG

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

Cameron Tice, Puria Radmard, Samuel Ratnam, Andy Kim, David Africa, Kyle O'Brien

详情
英文摘要

Pretraining corpora contain extensive discourse about AI systems, yet the causal influence of this discourse on downstream alignment remains poorly understood. If prevailing descriptions of AI behaviour are predominantly negative, LLMs may internalise corresponding behavioural priors, giving rise to self-fulfilling misalignment. This paper provides the first controlled study of this hypothesis by pretraining 6.9B-parameter LLMs with varying amounts of (mis)alignment discourse. We find that discussion of AI contributes to misalignment. Upsampling synthetic training documents about AI misalignment leads to a notable increase in misaligned behaviour. Conversely, upsampling documents about aligned behaviour reduces misalignment scores from 45% to 9%. We consider this evidence of self-fulfilling alignment. These effects are dampened, but persist through post-training. Our findings establish the study of how pretraining data shapes alignment priors, or alignment pretraining, as a complement to post-training. We recommend practitioners consider pretraining for alignment alongside capabilities. We share our models, data, and evaluations at AlignmentPretraining.ai.

2601.09920 2026-02-23 cs.RO

SyncTwin: Fast Digital Twin Construction and Synchronization for Safe Robotic Manipulation

Ruopeng Huang, Boyu Yang, Wenlong Gui, Jeremy Morgan, Erdem Biyik, Jiachen Li

Comments Project Website: https://sync-twin.github.io/

详情
英文摘要

Accurate and safe robotic manipulation under dynamic and visually occluded conditions remains a core challenge in real-world deployment. We introduce SyncTwin, a novel digital twin framework that unifies fast 3D scene reconstruction and real-to-sim synchronization for robust and safety-aware robotic manipulation in such environments. In the offline stage, we employ VGGT to rapidly reconstruct object-level 3D assets from RGB images, forming a reusable geometry library. During execution, SyncTwin continuously synchronizes the digital twin by tracking real-world object states via point cloud segmentation updates and aligning them through colored-ICP registration. The synchronized twin enables motion planners to compute collision-free and dynamically feasible trajectories in simulation, which are safely executed on the real robot through a closed real-to-sim-to-real loop. Experiments in dynamic and occluded scenes show that SyncTwin improves manipulation performance and motion safety, demonstrating the effectiveness of digital twin synchronization for real-world robotic execution. The video demos and code can be found on the project website: https://sync-twin.github.io/.

2601.09282 2026-02-23 cs.AI cs.DC cs.LG cs.SE

Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing

Leszek Sliwko, Jolanta Mizeria-Pietraszko

Comments This is the accepted version of the paper published in IEEE Access (2026). The final version is available at: https://doi.org/10.1109/ACCESS.2026.3665989

详情
英文摘要

Cluster workload allocation often requires complex configurations, creating a usability gap. This paper introduces a semantic, intent-driven scheduling paradigm for cluster systems using Natural Language Processing. The system employs a Large Language Model (LLM) integrated via a Kubernetes scheduler extender to interpret natural language allocation hint annotations for soft affinity preferences. A prototype featuring a cluster state cache and an intent analyzer (using AWS Bedrock) was developed. Empirical evaluation demonstrated high LLM parsing accuracy (>95% Subset Accuracy on an evaluation ground-truth dataset) for top-tier models like Amazon Nova Pro/Premier and Mistral Pixtral Large, significantly outperforming a baseline engine. Scheduling quality tests across six scenarios showed the prototype achieved superior or equivalent placement compared to standard Kubernetes configurations, particularly excelling in complex and quantitative scenarios and handling conflicting soft preferences. The results validate using LLMs for accessible scheduling but highlight limitations like synchronous LLM latency, suggesting asynchronous processing for production readiness. This work confirms the viability of semantic soft affinity for simplifying workload orchestration and presents a proof-of-concept design.

2601.06500 2026-02-23 cs.AI cs.CY

The AI Pyramid A Conceptual Framework for Workforce Capability in the Age of AI

Alok Khatri, Bishesh Khanal

Comments 14 pages

详情
英文摘要

Artificial intelligence (AI) represents a qualitative shift in technological change by extending cognitive labor itself rather than merely automating routine tasks. Recent evidence shows that generative AI disproportionately affects highly educated, white collar work, challenging existing assumptions about workforce vulnerability and rendering traditional approaches to digital or AI literacy insufficient. This paper introduces the concept of AI Nativity, the capacity to integrate AI fluidly into everyday reasoning, problem solving, and decision making, and proposes the AI Pyramid, a conceptual framework for organizing human capability in an AI mediated economy. The framework distinguishes three interdependent capability layers: AI Native capability as a universal baseline for participation in AI augmented environments; AI Foundation capability for building, integrating, and sustaining AI enabled systems; and AI Deep capability for advancing frontier AI knowledge and applications. Crucially, the pyramid is not a career ladder but a system level distribution of capabilities required at scale. Building on this structure, the paper argues that effective AI workforce development requires treating capability formation as infrastructure rather than episodic training, centered on problem based learning embedded in work contexts and supported by dynamic skill ontologies and competency based measurement. The framework has implications for organizations, education systems, and governments seeking to align learning, measurement, and policy with the evolving demands of AI mediated work, while addressing productivity, resilience, and inequality at societal scale.

2512.19223 2026-02-23 cs.LG

Phase-space entropy at acquisition reflects downstream learnability

Xiu-Cheng Wang, Jun-Jie Zhanga, Nan Cheng, Long-Gang Pang, Taijiao Du, Deyu Meng

Comments 22 pages 6 figures

详情
英文摘要

Modern learning systems work with data that vary widely across domains, but they all ultimately depend on how much structure is already present in the measurements before any model is trained. This raises a basic question: is there a general, modality-agnostic way to quantify how acquisition itself preserves or destroys the information that downstream learners could use? Here we propose an acquisition-level scalar $ΔS_{\mathcal B}$ based on instrument-resolved phase space. Unlike pixelwise distortion or purely spectral errors that often saturate under aggressive undersampling, $ΔS_{\mathcal B}$ directly quantifies how acquisition mixes or removes joint space--frequency structure at the instrument scale. We show theoretically that \(ΔS_{\mathcal B}\) correctly identifies the phase-space coherence of periodic sampling as the physical source of aliasing, recovering classical sampling-theorem consequences. Empirically, across masked image classification, accelerated MRI, and massive MIMO (including over-the-air measurements), $|ΔS_{\mathcal B}|$ consistently ranks sampling geometries and predicts downstream reconstruction/recognition difficulty \emph{without training}. In particular, minimizing $|ΔS_{\mathcal B}|$ enables zero-training selection of variable-density MRI mask parameters that matches designs tuned by conventional pre-reconstruction criteria. These results suggest that phase-space entropy at acquisition reflects downstream learnability, enabling pre-training selection of candidate sampling policies and as a shared notion of information preservation across modalities.

2512.12717 2026-02-23 cs.RO

HMPCC: Human-Aware Model Predictive Coverage Control

Mattia Catellani, Marta Gabbi, Lorenzo Sabattini

详情
英文摘要

We address the problem of coordinating a team of robots to cover an unknown environment while ensuring safe operation and avoiding collisions with non-cooperative agents. Traditional coverage strategies often rely on simplified assumptions, such as known or convex environments and static density functions, and struggle to adapt to real-world scenarios, especially when humans are involved. In this work, we propose a human-aware coverage framework based on Model Predictive Control (MPC), namely HMPCC, where human motion predictions are integrated into the planning process. By anticipating human trajectories within the MPC horizon, robots can proactively coordinate their actions %avoid redundant exploration, and adapt to dynamic conditions. The environment is modeled as a Gaussian Mixture Model (GMM), representing regions of interest. Team members operate in a fully decentralized manner, without relying on explicit communication, an essential feature in hostile or communication-limited scenarios. Our results show that human trajectory forecasting enables more efficient and adaptive coverage, improving coordination between human and robotic agents.

2512.06896 2026-02-23 cs.RO cs.SY eess.SY

Control of Powered Ankle-Foot Prostheses on Compliant Terrain: A Quantitative Approach to Stability Enhancement

Chrysostomos Karakasis, Camryn Scully, Robert Salati, Panagiotis Artemiadis

Journal ref Actuators 2026, 15, 107

详情
英文摘要

Walking on compliant terrain presents a substantial challenge for individuals with lower-limb amputation, further elevating their already high risk of falling. While powered ankle-foot prostheses have demonstrated adaptability across speeds and rigid terrains, control strategies optimized for soft or compliant surfaces remain underexplored. This work experimentally validates an admittance-based control strategy that dynamically adjusts the quasi-stiffness of powered prostheses to enhance gait stability on compliant ground. Human subject experiments were conducted with three healthy individuals walking on two bilaterally compliant surfaces with ground stiffness values of 63 and 25 kN/m, representative of real-world soft environments. Controller performance was quantified using phase portraits and two walking stability metrics, offering a direct assessment of fall risk. Compared to a standard phase-variable controller developed for rigid terrain, the proposed admittance controller consistently improved gait stability across all compliant conditions. These results demonstrate the potential of adaptive, stability-aware prosthesis control to reduce fall risk in real-world environments and advance the robustness of human-prosthesis interaction in rehabilitation robotics.

2512.04954 2026-02-23 cs.LG hep-ex hep-ph physics.comp-ph physics.data-an

Amortized Inference of Multi-Modal Posteriors using Likelihood-Weighted Normalizing Flows

Rajneil Baruah

Comments 14 pages, 8 figures, 3 Tables

详情
英文摘要

We present a novel technique for amortized posterior estimation using Normalizing Flows trained with likelihood-weighted importance sampling. This approach allows for the efficient inference of theoretical parameters in high-dimensional inverse problems without the need for posterior training samples. We implement the method on multi-modal benchmark tasks in 2D and 3D to check for the efficacy. A critical observation of our study is the impact of the topology of the base distributions on the modelled posteriors. We find that standard unimodal base distributions fail to capture disconnected support, resulting in spurious probability bridges between modes. We demonstrate that initializing the flow with a Gaussian Mixture Model that matches the cardinality of the target modes significantly improves reconstruction fidelity, as measured by some distance and divergence metrics.

2511.19269 2026-02-23 cs.LG cs.CL

CDLM: Consistency Diffusion Language Models For Faster Sampling

Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami

Comments Accepted to MLSys 2026

详情
英文摘要

Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language Models), a training-based acceleration method that simultaneously tackles both bottlenecks. CDLM integrates consistency modeling to drastically reduce the number of required sampling steps by enabling multi-token finalization. Furthermore, we enforce a block-wise causal attention mask during fine-tuning, making the model fully compatible with KV caching. Experiments show CDLM achieves 3.6x-14.5x lower latency while maintaining competitive accuracy on math and coding tasks. The full training and evaluation code is available at https://github.com/SqueezeAILab/CDLM.

2511.11924 2026-02-23 cs.AI

A Neuromorphic Architecture for Scalable Event-Based Control

Yongkang Huo, Fulvio Forni, Rodolphe Sepulchre

详情
英文摘要

This paper introduces the ``rebound Winner-Take-All (RWTA)" motif as the basic element of a scalable neuromorphic control architecture. From the cellular level to the system level, the resulting architecture combines the reliability of discrete computation and the tunability of continuous regulation: it inherits the discrete computation capabilities of winner-take-all state machines and the continuous tuning capabilities of excitable biophysical circuits. The proposed event-based framework addresses continuous rhythmic generation and discrete decision-making in a unified physical modeling language. We illustrate the versatility, robustness, and modularity of the architecture through the nervous system design of a snake robot.

2511.09044 2026-02-23 cs.AI

Advancing Autonomous Emergency Response Systems: A Generative AI Perspective

Yousef Emami, Radha Reddy, Azadeh Pourkabirian, Miguel Gutierrez Gaitan

Comments 8 pages, 3 figures, 2 tables

详情
英文摘要

Autonomous Vehicles (AVs) are poised to revolutionize emergency services by enabling faster, safer, and more efficient responses. This transformation is driven by advances in Artificial Intelligence (AI), particularly Reinforcement Learning (RL), which allows AVs to navigate complex environments and make critical decisions in real time. However, conventional RL paradigms often suffer from poor sample efficiency and lack adaptability in dynamic emergency scenarios. This paper reviews next-generation AV optimization strategies to address these limitations. We analyze the shift from conventional RL to Diffusion Model (DM)-augmented RL, which enhances policy robustness through synthetic data generation, albeit with increased computational cost. Additionally, we explore the emerging paradigm of Large Language Model (LLM)-assisted In-Context Learning (ICL), which offers a lightweight and interpretable alternative by enabling rapid, on-the-fly adaptation without retraining. By reviewing the state of the art in AV intelligence, DM-augmented RL, and LLM-assisted ICL, this paper provides a critical framework for understanding the next generation of autonomous emergency response systems from a Generative AI perspective.

2511.04108 2026-02-23 cs.CL

Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models

Saurabh Srivastava, Janit Bidhan, Hao Yan, Abhishek Dey, Tanu Kansal, Paras Kath, Sina Mansouri, Mohit Marvania, Vamsi Shankar Simhadri, Gaurav Singh

Comments New version with updated author list

详情
英文摘要

Large Reasoning Models (LRMs) achieve strong performance through explicit chain-of-thought reasoning but suffer from \textit{overthinking}: generating excessive reasoning tokens even for trivial queries. {Beyond inflating cost, overthinking can be self-defeating: models enter recursive self-doubt loops that exhaust token budgets without producing an answer, causing API timeouts that directly hurt accuracy.} We present an empirical study showing that \textbf{batch prompting}, originally introduced for throughput optimization, effectively suppresses overthinking at inference time. Across 13 diverse benchmarks with DeepSeek-R1 and OpenAI-o1, batch prompting {reduces reasoning tokens by 76\% (2{,}950$\mapsto$710), on average, while preserving or improving accuracy}. Through behavioral analysis, we find that batching induces three beneficial effects: (1) it reduces per-query reasoning effort when multiple queries share a context; (2) it enables pattern induction, where models generalize from earlier examples to solve later ones; and (3) it suppresses hedging behavior (e.g., ``\texttt{wait,}'' ``\texttt{let me double-check}'') that signals metacognitive loops. We also show that explicit prompt constraints (``\texttt{Use no more than 100 tokens in thinking.}'') fail to reduce overthinking; models either ignore them or sacrifice accuracy. These findings reframe batch prompting as more than a cost optimization: it is a practical inference-time technique that improves efficiency and reliability without model modification.

2511.03988 2026-02-23 cs.CV q-bio.NC

Simple 3D Pose Features Support Human and Machine Social Scene Understanding

Wenshuo Qin, Leyla Isik

Comments 28 pages, 6 figures

详情
英文摘要

Humans effortlessly recognize social interactions from visual input, yet the underlying computations remain unknown, and social interaction recognition challenges even the most advanced deep neural networks (DNNs). Here, we hypothesized that humans rely on 3D visuospatial pose information to make social judgments, and that this information is largely absent from most vision DNNs. To test these hypotheses, we used a novel pose and depth estimation pipeline to automatically extract 3D body joint positions from short video clips. We compared the ability of these body joints to predict human social judgments in the videos with embeddings from over 350 vision DNNs. We found that body joints predicted social judgments better than most DNNs. We then reduced the 3D body joints to an even more compact feature set describing only the 3D position and direction of people in the videos. We found that this minimal 3D feature set, but not its 2D counterpart, was necessary and sufficient to explain the prediction performance of the full set of body joints. These minimal 3D features also predicted the extent to which DNNs aligned with human social judgments and significantly improved their performance on these tasks. Together, these findings demonstrate that human social perception depends on simple, explicit 3D pose information.

2510.26752 2026-02-23 cs.AI cs.LG

The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy

William Overman, Mohsen Bayati

详情
英文摘要

As increasingly capable agents are deployed, a central safety challenge is how to retain meaningful human control without modifying the underlying system. We study a minimal control interface in which an agent chooses whether to act autonomously (play) or defer (ask), while a human simultaneously chooses whether to be permissive (trust) or engage in oversight (oversee), and model this interaction as a two-player Markov game. When this game forms a Markov Potential Game, we prove an alignment guarantee: any increase in the agent's utility from acting more autonomously cannot decrease the human's value. This establishes a form of intrinsic alignment where the agent's incentive to seek autonomy is structurally coupled to the human's welfare. Practically, the framework induces a transparent control layer that encourages the agent to defer when risky and act when safe. While we use gridworld simulations to illustrate the emergence of this collaboration, our primary validation involves an agentic tool-use task in which two 30B parameter language models are fine-tuned via independent policy gradient. We demonstrate that even as the agents learn to coordinate on the fly, this framework effectively reduces safety violations in realistic, open-ended environments.

2510.25860 2026-02-23 cs.AI cs.CL cs.HC

Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters

Xingjian Zhang, Tianhong Gao, Suliang Jin, Tianhao Wang, Teng Ye, Eytan Adar, Qiaozhu Mei

详情
英文摘要

Large language models (LLMs) are increasingly used as raters for evaluation tasks. However, their reliability is often limited for subjective tasks, when human judgments involve subtle reasoning beyond annotation labels. Thinking traces, the reasoning behind a judgment, are highly informative but challenging to collect and curate. We present a human-LLM collaborative framework to infer thinking traces from label-only annotations. The proposed framework uses a simple and effective rejection sampling method to reconstruct these traces at scale. These inferred thinking traces are applied to two complementary tasks: (1) fine-tuning open LLM raters; and (2) synthesizing clearer annotation guidelines for proprietary LLM raters. Across multiple datasets, our methods lead to significantly improved LLM-human agreement. Additionally, the refined annotation guidelines increase agreement among different LLM models. These results suggest that LLMs can serve as practical proxies for otherwise unrevealed human thinking traces, enabling label-only corpora to be extended into thinking-trace-augmented resources that enhance the reliability of LLM raters.

2510.18322 2026-02-23 cs.LG stat.ML

Uncertainty Estimation by Flexible Evidential Deep Learning

Taeseong Yoon, Heeyoung Kim

Comments NeurIPS 2025

详情
英文摘要

Uncertainty quantification (UQ) is crucial for deploying machine learning models in high-stakes applications, where overconfident predictions can lead to serious consequences. An effective UQ method must balance computational efficiency with the ability to generalize across diverse scenarios. Evidential deep learning (EDL) achieves efficiency by modeling uncertainty through the prediction of a Dirichlet distribution over class probabilities. However, the restrictive assumption of Dirichlet-distributed class probabilities limits EDL's robustness, particularly in complex or unforeseen situations. To address this, we propose \textit{flexible evidential deep learning} ($\mathcal{F}$-EDL), which extends EDL by predicting a flexible Dirichlet distribution -- a generalization of the Dirichlet distribution -- over class probabilities. This approach provides a more expressive and adaptive representation of uncertainty, significantly enhancing UQ generalization and reliability under challenging scenarios. We theoretically establish several advantages of $\mathcal{F}$-EDL and empirically demonstrate its state-of-the-art UQ performance across diverse evaluation settings, including classical, long-tailed, and noisy in-distribution scenarios.

2510.17999 2026-02-23 cs.CV

Investigating Demographic Bias in Brain MRI Segmentation: A Comparative Study of Deep-Learning and Non-Deep-Learning Methods

Ghazal Danaee, Marc Niethammer, Jarrett Rushmore, Sylvain Bouix

Comments 17 pages, 2 figures, Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2025:035

Journal ref Machine.Learning.for.Biomedical.Imaging. " (2026)

详情
英文摘要

Deep-learning-based segmentation algorithms have substantially advanced the field of medical image analysis, particularly in structural delineations in MRIs. However, an important consideration is the intrinsic bias in the data. Concerns about unfairness, such as performance disparities based on sensitive attributes like race and sex, are increasingly urgent. In this work, we evaluate the results of three different segmentation models (UNesT, nnU-Net, and CoTr) and a traditional atlas-based method (ANTs), applied to segment the left and right nucleus accumbens (NAc) in MRI images. We utilize a dataset including four demographic subgroups: black female, black male, white female, and white male. We employ manually labeled gold-standard segmentations to train and test segmentation models. This study consists of two parts: the first assesses the segmentation performance of models, while the second measures the volumes they produce to evaluate the effects of race, sex, and their interaction. Fairness is quantitatively measured using a metric designed to quantify fairness in segmentation performance. Additionally, linear mixed models analyze the impact of demographic variables on segmentation accuracy and derived volumes. Training on the same race as the test subjects leads to significantly better segmentation accuracy for some models. ANTs and UNesT show notable improvements in segmentation accuracy when trained and tested on race-matched data, unlike nnU-Net, which demonstrates robust performance independent of demographic matching. Finally, we examine sex and race effects on the volume of the NAc using segmentations from the manual rater and from our biased models. Results reveal that the sex effects observed with manual segmentation can also be observed with biased models, whereas the race effects disappear in all but one model.

2510.09658 2026-02-23 cs.LG cs.AI cs.CV

Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models

Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Fengyuan Liu, Marco Ciccone, Angelo Porrello, Simone Calderara

Comments Accepted at ICLR 2026

详情
英文摘要

When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-training. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning. We further show that transporting task vectors improves multi-task and multi-source model merging. Code is available at https://github.com/fillo-rinaldi/GradFix.

2510.08570 2026-02-23 cs.LG

Who Said Neural Networks Aren't Linear?

Nimrod Berman, Assaf Hallak, Assaf Shocher

详情
英文摘要

Neural networks are famously nonlinear. However, linearity is defined relative to a pair of vector spaces, $f:X \to Y$. Leveraging the algebraic concept of transport of structure, we propose a method to explicitly identify non-standard vector spaces where a neural network acts as a linear operator. When sandwiching a linear operator $A$ between two invertible neural networks, $f(x)=g_y^{-1}(A g_x(x))$, the corresponding vector spaces $X$ and $Y$ are induced by newly defined addition and scaling actions derived from $g_x$ and $g_y$. We term this kind of architecture a Linearizer. This framework makes the entire arsenal of linear algebra, including SVD, pseudo-inverse, orthogonal projection and more, applicable to nonlinear mappings. Furthermore, we show that the composition of two Linearizers that share a neural network is also a Linearizer. We leverage this property and demonstrate that training diffusion models using our architecture makes the hundreds of sampling steps collapse into a single step. We further utilize our framework to enforce idempotency (i.e. $f(f(x))=f(x)$) on networks leading to a globally projective generative model and to demonstrate modular style transfer.

2510.08059 2026-02-23 cs.LG

Mitigating Subject Dependency in EEG Decoding with Subject-Specific Low-Rank Adapters

Timon Klein, Piotr Minakowski, Sebastian Sager, Steffen Schotthöfer

详情
英文摘要

Subject-specific distribution shifts represent a fundamental obstacle to developing foundation models for brain decoding. We propose the Subject-Specific Low-Rank Adapter (SuLoRA), a drop-in replacement for standard linear or convolutional layers that captures inter-subject variability by decomposing weights into a shared, subject-invariant component and a lightweight, low-rank correction unique to each subject. This explicit separation enables existing architectures to become robust to subject shifts without architectural redesign. We evaluate SuLoRA on MEG speech perception and EEG motor imagery tasks across CNN and transformer architectures. In the speech decoding task, SuLoRA exceeds the baseline performance with half of the parameters. On motor imagery dataset, SuLoRA outperforms both subject-agnostic models and independently trained subject-specific models. SuLoRA offers a practical path towards effective cross-subject foundation models for brain signal applications.

2509.25740 2026-02-23 cs.CV

Dragging with Geometry: From Pixels to Geometry-Guided Image Editing

Xinyu Pu, Hongsong Wang, Jie Gui, Pan Zhou

详情
英文摘要

Interactive point-based image editing serves as a controllable editor, enabling precise and flexible manipulation of image content. However, most drag-based methods operate primarily on the 2D pixel plane with limited use of 3D cues. As a result, they often produce imprecise and inconsistent edits, particularly in geometry-intensive scenarios such as rotations and perspective transformations. To address these limitations, we propose a novel geometry-guided drag-based image editing method-GeoDrag, which addresses three key challenges: 1) incorporating 3D geometric cues into pixel-level editing, 2) mitigating discontinuities caused by geometry-only guidance, and 3) resolving conflicts arising from multi-point dragging. Built upon a unified displacement field that jointly encodes 3D geometry and 2D spatial priors, GeoDrag enables coherent, high-fidelity, and structure-consistent editing in a single forward pass. In addition, a conflict-free partitioning strategy is introduced to isolate editing regions, effectively preventing interference and ensuring consistency. Extensive experiments across various editing scenarios validate the effectiveness of our method, showing superior precision, structural consistency, and reliable multi-point editability. Project page: https://xinyu-pu.github.io/projects/geodrag.

2509.23592 2026-02-23 cs.LG cs.AI

Toward a Holistic Approach to Continual Model Merging

Hoang Phan, Sungmin Cha, Tung Lam Tran, Qi Lei

Comments Accepted to Workshop on Continual Learning in Computer Vision, ICCV 2025

详情
英文摘要

We present a holistic framework for Continual Model Merging (CMM) that intervenes at three critical stages: pre-merging, during merging, and post-merging-to address two fundamental challenges in continual learning. In particular, conventional approaches either maintain a growing list of per-domain task vectors, leading to scalability issues or rely solely on weight-space merging when old data is inaccessible, thereby losing crucial functional information. Our method overcomes these limitations by first fine-tuning the main model within its tangent space on domain-specific data; this linearization amplifies per-task weight disentanglement, effectively mitigating across-task interference. During merging, we leverage functional information from available optimizer states beyond mere parameter averages to avoid the need to revisit old data. Finally, a post-merging correction aligns the representation discrepancy between pre- and post-merged models, reducing bias and enhancing overall performance-all while operating under constant memory constraints without accessing historical data. Extensive experiments on standard class-incremental and domain-incremental benchmarks demonstrate that our approach not only achieves competitive performance but also provides a scalable and efficient solution to the catastrophic forgetting problem.

2509.20263 2026-02-23 cs.RO

HL-IK: A Lightweight Implementation of Human-Like Inverse Kinematics in Humanoid Arms

Bingjie Chen, Zihan Wang, Zhe Han, Guoping Pan, Yi Cheng, Houde Liu

详情
英文摘要

Traditional IK methods for redundant humanoid manipulators emphasize end-effector (EE) tracking, frequently producing configurations that are valid mechanically but not human-like. We present Human-Like Inverse Kinematics (HL-IK), a lightweight IK framework that preserves EE tracking while shaping whole-arm configurations to appear human-like, without full-body sensing at runtime. The key idea is a learned elbow prior: using large-scale human motion data retargeted to the robot, we train a FiLM-modulated spatio-temporal attention network (FiSTA) to predict the next-step elbow pose from the EE target and a short history of EE-elbow states.This prediction is incorporated as a small residual alongside EE and smoothness terms in a standard Levenberg-Marquardt optimizer, making HL-IK a drop-in addition to numerical IK stacks. Over 183k simulation steps, HL-IK reduces arm-similarity position and direction error by 30.6% and 35.4% on average, and by 42.2% and 47.4% on the most challenging trajectories. Hardware teleoperation on a robot distinct from simulation further confirms the gains in anthropomorphism. HL-IK is simple to integrate, adaptable across platforms via our pipeline, and adds minimal computation, enabling human-like motions for humanoid robots.

2509.06974 2026-02-23 cs.LG cs.AI

Individualized and Interpretable Sleep Forecasting via a Two-Stage Adaptive Spatial-Temporal Model

Xueyi Wang, Claudine J. C. Lamoth, Elisabeth Wilhelm

详情
英文摘要

Sleep quality impacts well-being. Therefore, healthcare providers and individuals need accessible and reliable forecasting tools for preventive interventions. This paper introduces an interpretable, individualized adaptive spatial-temporal model for predicting sleep quality. We designed a hierarchical architecture, consisting of parallel 1D convolutions with varying kernel sizes and dilated convolution, which extracts multi-resolution temporal patterns-short kernels capture rapid physiological changes, while larger kernels and dilation model slower trends. The extracted features are then refined through channel attention, which learns to emphasize the most predictive variables for each individual, followed by bidirectional LSTM and self-attention that jointly model both local sequential dynamics and global temporal dependencies. Finally, a two-stage adaptation strategy ensures the learned representations transfer effectively to new users. We conducted various experiments with five input window sizes (3, 5, 7, 9, and 11 days) and five prediction window sizes (1, 3, 5, 7, and 9 days). Our model consistently outperformed time series forecasting baseline approaches, including LSTM, Informer, PatchTST, and TimesNet. The best performance was achieved with a three-day input window and a one-day prediction window, yielding an RMSE of 0.216. Furthermore, the model demonstrated good predictive performance even for longer forecasting horizons (e.g., with a 0.257 RMSE for a three-day prediction window), highlighting its practical utility for real-world applications. We also conducted an explainability analysis to examine how different features influence sleep quality. These findings proved that the proposed framework offers a robust, adaptive, and explainable solution for personalized sleep forecasting using sparse data from commercial wearable devices.

2509.00663 2026-02-23 cs.LG cs.NE

Morephy-Net: An Evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed Neural Operator Learning Networks

Binghang Lu, Changhong Mou, Guang Lin

详情
英文摘要

We propose an evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed operator-learning Networks (Morephy-Net) to solve parametric partial differential equations (PDEs) in noisy data regimes, for both forward prediction and inverse identification. Existing physics-informed neural networks and operator-learning models (e.g., DeepONets and Fourier neural operators) often face three coupled challenges: (i) balancing data/operator and physics residual losses, (ii) maintaining robustness under noisy or sparse observations, and (iii) providing reliable uncertainty quantification. Morephy-Net addresses these issues by integrating: (i) evolutionary multi-objective optimization that treats data/operator and physics residual terms as separate objectives and searches the Pareto front, thereby avoiding ad hoc loss weighting; (ii) replica-exchange stochastic gradient Langevin dynamics to enhance global exploration and stabilize training in non-convex landscapes; and (iii) Bayesian uncertainty quantification obtained from stochastic sampling. We validate Morephy-Net on representative forward and inverse problems, including the one-dimensional Burgers equation and the time-fractional mixed diffusion--wave equation. The results demonstrate consistent improvements in accuracy, noise robustness, and calibrated uncertainty estimates over standard operator-learning baselines.

2508.15637 2026-02-23 cs.LG cs.CL stat.AP

Classification errors distort findings in automated speech processing: examples and solutions from child-development research

Lucas Gautheron, Evan Kidd, Anton Malko, Marvin Lavechin, Alejandrina Cristia

详情
英文摘要

With the advent of wearable recorders, scientists are increasingly turning to automated methods of analysis of audio and video data in order to measure children's experience, behavior, and outcomes, with a sizable literature employing long-form audio-recordings to study language acquisition. While numerous articles report on the accuracy and reliability of the most popular automated classifiers, less has been written on the downstream effects of classification errors on measurements and statistical inferences (e.g., the estimate of correlations and effect sizes in regressions). This paper's main contributions are drawing attention to downstream effects of confusion errors, and providing an approach to measure and potentially recover from these errors. Specifically, we use a Bayesian approach to study the effects of algorithmic errors on key scientific questions, including the effect of siblings on children's language experience and the association between children's production and their input. By fitting a joint model of speech behavior and algorithm behavior on real and simulated data, we show that classification errors can significantly distort estimates for both the most commonly used \gls{lena}, and a slightly more accurate open-source alternative (the Voice Type Classifier from the ACLEW system). We further show that a Bayesian calibration approach for recovering unbiased estimates of effect sizes can be effective and insightful, but does not provide a fool-proof solution.