arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 2981
2603.28321 2026-03-31 cs.LG

FairGC: Fairness-aware Graph Condensation

Yihan Gao, Chenxi Huang, Wen Shi, Ke Sun, Ziqi Xu, Xikun Zhang, Mingliang Hou, Renqiang Luo

Comments 6 pages, IJCNN 2026 accepted

详情
英文摘要

Graph condensation (GC) has become a vital strategy for scaling Graph Neural Networks by compressing massive datasets into small, synthetic node sets. While current GC methods effectively maintain predictive accuracy, they are primarily designed for utility and often ignore fairness constraints. Because these techniques are bias-blind, they frequently capture and even amplify demographic disparities found in the original data. This leads to synthetic proxies that are unsuitable for sensitive applications like credit scoring or social recommendations. To solve this problem, we introduce FairGC, a unified framework that embeds fairness directly into the graph distillation process. Our approach consists of three key components. First, a Distribution-Preserving Condensation module synchronizes the joint distributions of labels and sensitive attributes to stop bias from spreading. Second, a Spectral Encoding module uses Laplacian eigen-decomposition to preserve essential global structural patterns. Finally, a Fairness-Enhanced Neural Architecture employs multi-domain fusion and a label-smoothing curriculum to produce equitable predictions. Rigorous evaluations on four real-world datasets, show that FairGC provides a superior balance between accuracy and fairness. Our results confirm that FairGC significantly reduces disparity in Statistical Parity and Equal Opportunity compared to existing state-of-the-art condensation models. The codes are available at https://github.com/LuoRenqiang/FairGC.

2603.28319 2026-03-31 cs.CV

Beyond Scanpaths: Graph-Based Gaze Simulation in Dynamic Scenes

Luke Palmer, Petar Palasek, Hazem Abdelkawy

详情
英文摘要

Accurately modelling human attention is essential for numerous computer vision applications, particularly in the domain of automotive safety. Existing methods typically collapse gaze into saliency maps or scanpaths, treating gaze dynamics only implicitly. We instead formulate gaze modelling as an autoregressive dynamical system and explicitly unroll raw gaze trajectories over time, conditioned on both gaze history and the evolving environment. Driving scenes are represented as gaze-centric graphs processed by the Affinity Relation Transformer (ART), a heterogeneous graph transformer that models interactions between driver gaze, traffic objects, and road structure. We further introduce the Object Density Network (ODN) to predict next-step gaze distributions, capturing the stochastic and object-centric nature of attentional shifts in complex environments. We also release Focus100, a new dataset of raw gaze data from 30 participants viewing egocentric driving footage. Trained directly on raw gaze, without fixation filtering, our unified approach produces more natural gaze trajectories, scanpath dynamics, and saliency maps than existing attention models, offering valuable insights for the temporal modelling of human attention in dynamic environments.

2603.28316 2026-03-31 cs.LG

Taming the Instability: A Robust Second-Order Optimizer for Federated Learning over Non-IID Data

Yuanqiao Zhang, Tiantian He, Yuan Gao, Yixin Wang, Yew-Soon Ong, Maoguo Gong, A. K. Qin, Hui Li

Comments 33 pages, preprint, under review

详情
英文摘要

In this paper, we present Federated Robust Curvature Optimization (FedRCO), a novel second-order optimization framework designed to improve convergence speed and reduce communication cost in Federated Learning systems under statistical heterogeneity. Existing second-order optimization methods are often computationally expensive and numerically unstable in distributed settings. In contrast, FedRCO addresses these challenges by integrating an efficient approximate curvature optimizer with a provable stability mechanism. Specifically, FedRCO incorporates three key components: (1) a Gradient Anomaly Monitor that detects and mitigates exploding gradients in real-time, (2) a Fail-Safe Resilience protocol that resets optimization states upon numerical instability, and (3) a Curvature-Preserving Adaptive Aggregation strategy that safely integrates global knowledge without erasing the local curvature geometry. Theoretical analysis shows that FedRCO can effectively mitigate instability and prevent unbounded updates while preserving optimization efficiency. Extensive experiments show that FedRCO achieves superior robustness against diverse non-IID scenarios while achieving higher accuracy and faster convergence than both state-of-the-art first-order and second-order methods.

2603.28315 2026-03-31 cs.CV cs.LG

Prototype-Enhanced Multi-View Learning for Thyroid Nodule Ultrasound Classification

Yangmei Chen, Zhongyuan Zhang, Xikun Zhang, Xinyu Hao, Mingliang Hou, Renqiang Luo, Ziqi Xu

Comments 6 pages, IWCMC 2026 accepted

详情
英文摘要

Thyroid nodule classification using ultrasound imaging is essential for early diagnosis and clinical decision-making; however, despite promising performance on in-distribution data, existing deep learning methods often exhibit limited robustness and generalisation when deployed across different ultrasound devices or clinical environments. This limitation is mainly attributed to the pronounced heterogeneity of thyroid ultrasound images, which can lead models to capture spurious correlations rather than reliable diagnostic cues. To address this challenge, we propose PEMV-thyroid, a Prototype-Enhanced Multi-View learning framework that accounts for data heterogeneity by learning complementary representations from multiple feature perspectives and refining decision boundaries through a prototype-based correction mechanism with mixed prototype information. By integrating multi-view representations with prototype-level guidance, the proposed approach enables more stable representation learning under heterogeneous imaging conditions. Extensive experiments on multiple thyroid ultrasound datasets demonstrate that PEMV-thyroid consistently outperforms state-of-the-art methods, particularly in cross-device and cross-domain evaluation scenarios, leading to improved diagnostic accuracy and generalisation performance in real-world clinical settings. The source code is available at https://github.com/chenyangmeii/Prototype-Enhanced-Multi-View-Learning.

2603.28301 2026-03-31 cs.LG

LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models

Chanyoung Kim, Minwoo Kim, Minseok Kang, Hyunwoo Kim, Dahuin Jung

Comments 32 pages, 28 figures

详情
英文摘要

Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leaving robustness to paraphrased instructions underexplored. To study this gap, we introduce LIBERO-Para, a controlled benchmark that independently varies action expressions and object references for fine-grained analysis of linguistic generalization. Across seven VLA configurations (0.6B-7.5B), we observe consistent performance degradation of 22-52 pp under paraphrasing. This degradation is primarily driven by object-level lexical variation: even simple synonym substitutions cause large drops, indicating reliance on surface-level matching rather than semantic grounding. Moreover, 80-96% of failures arise from planning-level trajectory divergence rather than execution errors, showing that paraphrasing disrupts task identification. Binary success rate treats all paraphrases equally, obscuring whether models perform consistently across difficulty levels or rely on easier cases. To address this, we propose PRIDE, a metric that quantifies paraphrase difficulty using semantic and syntactic factors. Our benchmark and corresponding code are available at: https://github.com/cau-hai-lab/LIBERO-Para

2603.28300 2026-03-31 cs.LG cs.AI

NeiGAD: Augmenting Graph Anomaly Detection via Spectral Neighbor Information

Qing Qing, Huafei Huang, Mingliang Hou, Renqiang Luo, Mohsen Guizani

Comments 6 pages, IWCMC 2026 accepted

详情
英文摘要

Graph anomaly detection (GAD) aims to identify irregular nodes or structures in attributed graphs. Neighbor information, which reflects both structural connectivity and attribute consistency with surrounding nodes, is essential for distinguishing anomalies from normal patterns. Although recent graph neural network (GNN)-based methods incorporate such information through message passing, they often fail to explicitly model its effect or interaction with attributes, limiting detection performance. This work introduces NeiGAD, a novel plug-and-play module that captures neighbor information through spectral graph analysis. Theoretical insights demonstrate that eigenvectors of the adjacency matrix encode local neighbor interactions and progressively amplify anomaly signals. Based on this, NeiGAD selects a compact set of eigenvectors to construct efficient and discriminative representations. Experiments on eight real-world datasets show that NeiGAD consistently improves detection accuracy and outperforms state-of-the-art GAD methods. These results demonstrate the importance of explicit neighbor modeling and the effectiveness of spectral analysis in anomaly detection. Code is available at: https://github.com/huafeihuang/NeiGAD.

2603.28297 2026-03-31 cs.CV

DinoDental: Benchmarking DINOv3 as a Unified Vision Encoder for Dental Image Analysis

Kun Tang, Xinquan Yang, Mianjie Zheng, Xuefen Liu, Xuguang Li, Xiaoqi Guo, Ruihan Chen, Linlin Shen, He Meng

详情
英文摘要

The scarcity and high cost of expert annotations in dental imaging present a significant challenge for the development of AI in dentistry. DINOv3, a state-of-the-art, self-supervised vision foundation model pre-trained on 1.7 billion images, offers a promising pathway to mitigate this issue. However, its reliability when transferred to the dental domain, with its unique imaging characteristics and clinical subtleties, remains unclear. To address this, we introduce DinoDental, a unified benchmark designed to systematically evaluate whether DINOv3 can serve as a reliable, off-the-shelf encoder for comprehensive dental image analysis without requiring domain-specific pre-training. Constructed from multiple public datasets, DinoDental covers a wide range of tasks, including classification, detection, and instance segmentation on both panoramic radiographs and intraoral photographs. We further analyze the model's transfer performance by scaling its size and input resolution, and by comparing different adaptation strategies, including frozen features, full fine-tuning, and the parameter-efficient Low-Rank Adaptation (LoRA) method. Our experiments show that DINOv3 can serve as a strong unified encoder for dental image analysis across both panoramic radiographs and intraoral photographs, remaining competitive across tasks while showing particularly clear advantages for intraoral image understanding and boundary-sensitive dense prediction. Collectively, DinoDental provides a systematic framework for comprehensively evaluating DINOv3 in dental analysis, establishing a foundational benchmark to guide efficient and effective model selection and adaptation for the dental AI community.

2603.28295 2026-03-31 cs.AI

Evaluating LLMs for Answering Student Questions in Introductory Programming Courses

Thomas Van Mullem, Bart Mesuere, Peter Dawyndt

详情
英文摘要

The rapid emergence of Large Language Models (LLMs) presents both opportunities and challenges for programming education. While students increasingly use generative AI tools, direct access often hinders the learning process by providing complete solutions rather than pedagogical hints. Concurrently, educators face significant workload and scalability challenges when providing timely, personalized feedback. This study investigates the capabilities of LLMs to safely and effectively assist educators in answering student questions within a CS1 programming course. To achieve this, we established a rigorous, reproducible evaluation process by curating a benchmark dataset of 170 authentic student questions from a learning management system, paired with ground-truth responses authored by subject matter experts. Because traditional text-matching metrics are insufficient for evaluating open-ended educational responses, we developed and validated a custom LLM-as-a-Judge metric optimized for assessing pedagogical accuracy. Our findings demonstrate that models, such as Gemini 3 flash, can surpass the quality baseline of typical educator responses, achieving high alignment with expert pedagogical standards. To mitigate persistent risks like hallucination and ensure alignment with course-specific context, we advocate for a "teacher-in-the-loop" implementation. Finally, we abstract our methodology into a task-agnostic evaluation framework, advocating for a shift in the development of educational LLM tools from ad-hoc, post-deployment testing to a quantifiable, pre-deployment validation process.

2603.28290 2026-03-31 cs.LG cs.AR

OptINC: Optical In-Network-Computing for Scalable Distributed Learning

Sijie Fei, Grace Li Zhang, Bing Li, Ulf Schlichtmann

详情
英文摘要

Distributed learning is widely used for training large models on large datasets by distributing parts of the model or dataset across multiple devices and aggregating the computed results for subsequent computations or parameter updates. Existing communication algorithms for distributed learning such as ring all-reduce result in heavy communication overhead between servers. Since communication in large-scale systems uses optical fibers, we propose an Optical In-Network-Computing (OptINC) architecture to offload the computation in servers onto the optical interconnects. To execute gradient averaging and quantization in the optical domain, we incorporate optical devices such as Mach-Zehnder-Interferometers (MZIs) into the interconnects. Such a de facto optical neural network (ONN) can effectively reduce the communication overhead in existing distributed training solutions. To reduce dataset complexity for training this neural network, a preprocessing algorithm implemented in the optical domain is also proposed. Hardware cost is lowered by approximating the weight matrices of the optical neural network with unitary and diagonal matrices, while the accuracy is maintained by a proposed hardware-aware training algorithm. The proposed solution was evaluated on real distributed learning tasks, including ResNet50 on CIFAR-100, and a LLaMA-based network on Wikipedia-1B. In both cases, the proposed framework can achieve comparable training accuracy to the ring all-reduce baseline, while eliminating communication overhead.

2603.28288 2026-03-31 cs.LG cs.AI cs.NA math.NA

FI-KAN: Fractal Interpolation Kolmogorov-Arnold Networks

Gnankan Landry Regis N'guessan

Comments 37 pages, 20 figures, 14 tables. Code available at: https://github.com/ReFractals/fractal-interpolation-kan

详情
英文摘要

Kolmogorov-Arnold Networks (KAN) employ B-spline bases on a fixed grid, providing no intrinsic multi-scale decomposition for non-smooth function approximation. We introduce Fractal Interpolation KAN (FI-KAN), which incorporates learnable fractal interpolation function (FIF) bases from iterated function system (IFS) theory into KAN. Two variants are presented: Pure FI-KAN (Barnsley, 1986) replaces B-splines entirely with FIF bases; Hybrid FI-KAN (Navascues, 2005) retains the B-spline path and adds a learnable fractal correction. The IFS contraction parameters give each edge a differentiable fractal dimension that adapts to target regularity during training. On a Holder regularity benchmark ($α\in [0.2, 2.0]$), Hybrid FI-KAN outperforms KAN at every regularity level (1.3x to 33x). On fractal targets, FI-KAN achieves up to 6.3x MSE reduction over KAN, maintaining 4.7x advantage at 5 dB SNR. On non-smooth PDE solutions (scikit-fem), Hybrid FI-KAN achieves up to 79x improvement on rough-coefficient diffusion and 3.5x on L-shaped domain corner singularities. Pure FI-KAN's complementary behavior, dominating on rough targets while underperforming on smooth ones, provides controlled evidence that basis geometry must match target regularity. A fractal dimension regularizer provides interpretable complexity control whose learned values recover the true fractal dimension of each target. These results establish regularity-matched basis design as a principled strategy for neural function approximation.

2603.28282 2026-03-31 cs.LG cs.AI cs.DC

Pre-Deployment Complexity Estimation for Federated Perception Systems

KMA Solaiman, Shafkat Islam, Ruy de Oliveira, Bharat Bhargava

Comments Accepted and presented at Edge AI Research Symposium 2026 (EdgeAI2026), San Diego, CA

详情
英文摘要

Edge AI systems increasingly rely on federated learning to train perception models in distributed, privacy-preserving, and resource-constrained environments. Yet, before training begins, practitioners often lack practical tools to estimate how difficult a federated learning task will be in terms of achievable accuracy and communication cost. This paper presents a classifier-agnostic, pre-deployment framework for estimating learning complexity in federated perception systems by jointly modeling intrinsic properties of the data and characteristics of the distributed environment. The proposed complexity metric integrates dataset attributes such as dimensionality, sparsity, and heterogeneity with factors related to the composition of participating clients. Using federated learning as a representative distributed training setting, we examine how learning difficulty varies across different federated configurations. Experiments on multiple variants of the MNIST dataset and CIFAR dataset show that the proposed metric strongly correlates with federated learning performance and the communication effort required to reach fixed accuracy targets. These findings suggest that complexity estimation can serve as a practical diagnostic tool for resource planning, dataset assessment, and feasibility evaluation in edge-deployed perception systems.

2603.28272 2026-03-31 cs.RO

Point of View: How Perspective Affects Perceived Robot Sociability

Subham Agrawal, Aftab Akthar, Nils Dengler, Maren Bennewitz

详情
英文摘要

Ensuring that robot navigation is safe and socially acceptable is crucial for comfortable human-robot interaction in shared environments. However, existing validation methods often rely on a bird's-eye (allocentric) perspective, which fails to capture the subjective first-person experience of pedestrians encountering robots in the real world. In this paper, we address the perceptual gap between allocentric validation and egocentric experience by investigating how different perspectives affect the perceived sociability and disturbance of robot trajectories. Our approach uses an immersive VR environment to evaluate identical robot trajectories across allocentric, egocentric-proximal, and egocentric-distal viewpoints in a user study. We perform this analysis for trajectories generated from two different navigation policies to understand if the observed differences are unique to a single type of trajectory or more generalizable. We further examine whether augmenting a trajectory with a head-nod gesture can bridge the perceptual gap and improve human comfort. Our experiments suggest that trajectories rated as sociable from an allocentric view may be perceived as significantly more disturbing when experienced from a first-person perspective in close proximity. Our results also demonstrate that while passing distance affects perceived disturbance, communicative social signaling, such as a head-nod, can effectively enhance the perceived sociability of the robot's behavior.

2603.28271 2026-03-31 cs.RO

osmAG-Nav: A Hierarchical Semantic Topometric Navigation Stack for Robust Lifelong Indoor Autonomy

Yongqi Zhang, Jiajie Zhang, Chengqian Li, Fujing Xie, Sören Schwertfeger

Comments 42 pages, 10 figures

详情
英文摘要

The deployment of mobile robots in large-scale, multi-floor environments demands navigation systems that achieve spatial scalability without compromising local kinematic precision. Traditional navigation stacks, reliant on monolithic occupancy grid maps, face severe bottlenecks in storage efficiency, cross-floor reasoning, and long-horizon planning. To address these limitations, this paper presents osmAG-Nav, a complete, open-source ROS2 navigation stack built upon the hierarchical semantic topometric OpenStreetMap Area Graph (osmAG) map standard. The system follows a "System of Systems" architecture that decouples global topological reasoning from local metric execution. A Hierarchical osmAG planner replaces dense grid searches with an LCA-anchored pipeline on a passage-centric graph whose edge costs derive from local raster traversability rather than Euclidean distance, yielding low-millisecond planning on long campus-scale routes. A Rolling Window mechanism rasterizes a fixed-size local metric grid around the robot, keeping the local costmap memory footprint independent of the total mapped area, while a Segmented Execution strategy dispatches intermediate goals to standard ROS2 controllers for smooth handoffs. System robustness is reinforced by a structure-aware LiDAR localization framework that filters dynamic clutter against permanent architectural priors. Extensive experiments on a real-world multi-story indoor-outdoor campus (>11,025 m^2) show that, on the same-floor benchmark subset, osmAG-Nav delivers up to 7816x lower planning latency than a grid-based baseline on long routes while maintaining low path-length overhead and lifelong localization stability. A single-floor long-range robot mission further validates the integrated stack reliability. The full stack is released as modular ROS2 Lifecycle Nodes.

2603.28263 2026-03-31 cs.CL cs.AI

Merge and Conquer: Instructing Multilingual Models by Adding Target Language Weights

Eneko Valero, Maria Ribalta i Albado, Oscar Sainz, Naiara Perez, German Rigau

Comments This paper was accepted at the 15th edition of the Language Resources and Evaluation Conference (LREC 2026)

详情
英文摘要

Large Language Models (LLMs) remain heavily centered on English, with limited performance in low-resource languages. Existing adaptation approaches, such as continual pre-training, demand significant computational resources. In the case of instructed models, high-quality instruction data is also required, both of which are often inaccessible for low-resource language communities. Under these constraints, model merging offers a lightweight alternative, but its potential in low-resource contexts has not been systematically explored. In this work, we explore whether it is possible to transfer language knowledge to an instruction-tuned LLM by merging it with a language-specific base model, thereby eliminating the need of language-specific instructions and repeated fine-tuning processes whenever stronger instructed variants become available. Through experiments covering four Iberian languages (Basque, Catalan, Galician, and Spanish) and two model families, we show that merging enables effective instruction following behavior in new languages and even supports multilingual capability through the combination of multiple language-specific models. Our results indicate that model merging is a viable and efficient alternative to traditional adaptation methods for low-resource languages, achieving competitive performance while greatly reducing computational cost.

2603.28261 2026-03-31 cs.CL

Coconstructions in spoken data: UD annotation guidelines and first results

Ludovica Pannitto, Sylvain Kahane, Kaja Dobrovoljc, Elena Battaglia, Bruno Guillaume, Caterina Mauri, Eleonora Zucchini

详情
英文摘要

The paper proposes annotation guidelines for syntactic dependencies that span across speaker turns - including collaborative coconstructions proper, wh-question answers, and backchannels - in spoken language treebanks within the Universal Dependencies framework. Two representations are proposed: a speaker-based representation following the segmentation into speech turns, and a dependency-based representation with dependencies across speech turns. New propositions are also put forward to distinguish between reformulations and repairs, and to promote elements in unfinished phrases.

2603.28248 2026-03-31 cs.AI

Reasoning as Energy Minimization over Structured Latent Trajectories

David K. Johansson

Comments 7 pages

详情
英文摘要

Single-shot neural decoders commit to answers without iterative refinement, while chain-of-thought methods introduce discrete intermediate steps but lack a scalar measure of reasoning progress. We propose Energy-Based Reasoning via Structured Latent Planning (EBRM), which models reasoning as gradient-based optimization of a multi-step latent trajectory $z_{1:T}$ under a learned energy function $E(h_x, z)$. The energy decomposes into per-step compatibility, transition consistency, and trajectory smoothness terms. Training combines supervised encoder-decoder learning with contrastive energy shaping using hard negatives, while inference performs gradient descent or Langevin dynamics over $z$ and decodes from $z_T$. We identify a critical failure mode: on CNF logic satisfaction, latent planning reduces accuracy from $\approx 95\%$ to $\approx 56\%$. This degradation arises from a distribution mismatch, where the decoder is trained on encoder outputs $h_x$ but evaluated on planner outputs $z_T$ that drift into unseen latent regions. We analyze this behavior through per-step decoding, latent drift tracking, and gradient decomposition. To address it, we propose dual-path decoder training and latent anchoring. We further introduce a six-part ablation protocol covering component contributions, trajectory length, planner dynamics, initialization, decoder training distribution, and anchor weight. Experiments on three synthetic tasks show that energy decreases monotonically and induces structured latent trajectories on graph and logic tasks, while remaining flat on arithmetic ($r = 0.073$), indicating a negative result. Code is available at https://github.com/dkjo8/ebr-via-structured-latent-planning.

2603.28243 2026-03-31 cs.RO cs.SY eess.SY

Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion

Wenqi Cai, Kyriakos G. Vamvoudakis, Sébastien Gros, Anthony Tzes

详情
英文摘要

In this paper, we propose a cost-matching approach for optimal humanoid locomotion within a Model Predictive Control (MPC)-based Reinforcement Learning (RL) framework. A parameterized MPC formulation with centroidal dynamics is trained to approximate the action-value function obtained from high-fidelity closed-loop data. Specifically, the MPC cost-to-go is evaluated along recorded state-action trajectories, and the parameters are updated to minimize the discrepancy between MPC-predicted values and measured returns. This formulation enables efficient gradient-based learning while avoiding the computational burden of repeatedly solving the MPC problem during training. The proposed method is validated in simulation using a commercial humanoid platform. Results demonstrate improved locomotion performance and robustness to model mismatch and external disturbances compared with manually tuned baselines.

2603.28240 2026-03-31 cs.RO

Off-Axis Compliant RCM Joint with Near-Isotropic Stiffness and Minimal Parasitic Error

Federico Mariano, Elena De Momi, Giovanni Berselli, Jovana Jovanova, Just L. Herder, Leonardo S. Mattos

详情
英文摘要

This paper presents an off-axis, monolithic compliant Remote Center of Motion (RCM) joint for neuroendoscopic manipulation, combining near-isotropic stiffness with minimal parasitic motion. Based on the Tetra II concept, the end-effector is placed outside the tetrahedral flexure to improve line of sight, facilitate sterilization, and allow rapid tool release. Design proceeds in two stages: mobility panels are sized with a compliance-based isotropy objective, then constraining panels are synthesized through finite-element feasibility exploration to trade stiffness isotropy against RCM drift. The joint is modeled with beam elements and validated via detailed finite-element analyses, including fatigue-bounded stress constraints. A PA12 prototype is fabricated by selective laser sintering and characterized on a benchtop: a 2 N radial load is applied at the end-effector while a 6-DOF electromagnetic sensor records pose. The selected configuration produces a stiffness-ellipse principal axis ratio (PAR) of 1.37 and a parasitic-to-useful rotation ratio (PRR) of 0.63%. Under a 4.5° commanded rotation, the predicted RCM drift remains sub-millimetric (0.015-0.172 mm). Fatigue analysis predicts a usable rotational workspace of 12.1°-34.4° depending on direction. Experiments reproduce the simulated directional stiffness trend with typical deviations of 6-30%, demonstrating a compact, fabrication-ready RCM module for constrained surgical access.

2603.28233 2026-03-31 cs.CV cs.AI

TwinMixing: A Shuffle-Aware Feature Interaction Model for Multi-Task Segmentation

Minh-Khoi Do, Huy Che, Dinh-Duy Phan, Duc-Khai Lam, Duc-Lung Vu

详情
Journal ref
Results in Engineering 30 (2026) 109982
英文摘要

Accurate and efficient perception is essential for autonomous driving, where segmentation tasks such as drivable-area and lane segmentation provide critical cues for motion planning and control. However, achieving high segmentation accuracy while maintaining real-time performance on low-cost hardware remains a challenging problem. To address this issue, we introduce TwinMixing, a lightweight multi-task segmentation model designed explicitly for drivable-area and lane segmentation. The proposed network features a shared encoder and task-specific decoders, enabling both feature sharing and task specialization. Within the encoder, we propose an Efficient Pyramid Mixing (EPM) module that enhances multi-scale feature extraction through a combination of grouped convolutions, depthwise dilated convolutions and channel shuffle operations, effectively expanding the receptive field while minimizing computational cost. Each decoder adopts a Dual-Branch Upsampling (DBU) Block composed of a learnable transposed convolution-based Fine detailed branch and a parameter-free bilinear interpolation-based Coarse grained branch, achieving detailed yet spatially consistent feature reconstruction. Extensive experiments on the BDD100K dataset validate the effectiveness of TwinMixing across three configurations - tiny, base, and large. Among them, the base configuration achieves the best trade-off between accuracy and computational efficiency, reaching 92.0% mIoU for drivable-area segmentation and 32.3% IoU for lane segmentation with only 0.43M parameters and 3.95 GFLOPs. Moreover, TwinMixing consistently outperforms existing segmentation models on the same tasks, as illustrated in Fig. 1. Thanks to its compact and modular design, TwinMixing demonstrates strong potential for real-time deployment in autonomous driving and embedded perception systems. The source code: https://github.com/Jun0se7en/TwinMixing.

2603.28224 2026-03-31 cs.CV

Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal

Kazuma Ikeda, Ryosei Hara, Rokuto Nagata, Ozora Sako. Zihao Ding, Takahiro Kado, Ibuki Fujioka, Taro Beppu, Mariko Isogawa, Kentaro Yoshioka

Comments Accepted to CVPR 2026 (Main)

详情
英文摘要

LiDAR has become an essential sensing modality in autonomous driving, robotics, and smart-city applications. However, ghost points (or ghosts), which are false reflections caused by multi-path laser returns from glass and reflective surfaces, severely degrade 3D mapping and localization accuracy. Prior ghost removal relies on geometric consistency in dense point clouds, failing on mobile LiDAR's sparse, dynamic data. We address this by exploiting full-waveform LiDAR (FWL), which captures complete temporal intensity profiles rather than just peak distances, providing crucial cues for distinguishing ghosts from genuine reflections in mobile scenarios. As this is a new task, we present Ghost-FWL, the first and largest annotated mobile FWL dataset for ghost detection and removal. Ghost-FWL comprises 24K frames across 10 diverse scenes with 7.5 billion peak-level annotations, which is 100x larger than existing annotated FWL datasets. Benefiting from this large-scale dataset, we establish a FWL-based baseline model for ghost detection and propose FWL-MAE, a masked autoencoder for efficient self-supervised representation learning on FWL data. Experiments show that our baseline outperforms existing methods in ghost removal accuracy, and our ghost removal further enhances downstream tasks such as LiDAR-based SLAM (66% trajectory error reduction) and 3D object detection (50x false positive reduction). The dataset and code is publicly available and can be accessed via the project page: https://keio-csg.github.io/Ghost-FWL

2603.28219 2026-03-31 cs.LG

Variational Neurons in Transformers for Language Modeling

Yves Ruffenach

Comments 11 pages, 3 figures

详情
英文摘要

Transformers for language modeling usually rely on deterministic internal computation, with uncertainty expressed mainly at the output layer. We introduce variational neurons into Transformer feed-forward computation so that uncertainty becomes part of the internal computation itself. Concretely, we replace deterministic feed-forward units with local variational units based on EVE while preserving the overall Transformer backbone. We evaluate this design in compact next-token language-modeling settings. We compare deterministic and variational variants with both predictive and probabilistic criteria. Alongside negative log-likelihood, perplexity and accuracy, we analyze calibration, conditional variance, mutual information and latent-usage statistics. The resulting picture is clear. Variational neurons integrate stably into Transformers, preserve strong predictive performance and produce informative uncertainty signals. The experiments also show that task quality, useful depth and internal stability are distinct properties. These results establish variational Transformers as a practical form of uncertainty-aware language modeling. They show that Transformers can predict with an explicit internal structure of uncertainty, which supports stronger probabilistic evaluation and a more informative analysis of model behavior.

2603.28213 2026-03-31 cs.CL

\textit{Versteasch du mi?} Computational and Socio-Linguistic Perspectives on GenAI, LLMs, and Non-Standard Language

Verena Platzgummer, John McCrae, Sina Ahmadi

详情
英文摘要

The design of Large Language Models and generative artificial intelligence has been shown to be "unfair" to less-spoken languages and to deepen the digital language divide. Critical sociolinguistic work has also argued that these technologies are not only made possible by prior socio-historical processes of linguistic standardisation, often grounded in European nationalist and colonial projects, but also exacerbate epistemologies of language as "monolithic, monolingual, syntactically standardized systems of meaning". In our paper, we draw on earlier work on the intersections of technology and language policy and bring our respective expertise in critical sociolinguistics and computational linguistics to bear on an interrogation of these arguments. We take two different complexes of non-standard linguistic varieties in our respective repertoires--South Tyrolean dialects, which are widely used in informal communication in South Tyrol, Italy, as well as varieties of Kurdish--as starting points to an interdisciplinary exploration of the intersections between GenAI and linguistic variation and standardisation. We discuss both how LLMs can be made to deal with nonstandard language from a technical perspective, and whether, when or how this can contribute to "democratic and decolonial digital and machine learning strategies", which has direct policy implications.

2603.28211 2026-03-31 cs.CV

Explaining CLIP Zero-shot Predictions Through Concepts

Onat Ozdemir, Anders Christensen, Stephan Alaniz, Zeynep Akata, Emre Akbas

Comments Accepted to CVPR 2026

详情
英文摘要

Large-scale vision-language models such as CLIP have achieved remarkable success in zero-shot image recognition, yet their predictions remain largely opaque to human understanding. In contrast, Concept Bottleneck Models provide interpretable intermediate representations by reasoning through human-defined concepts, but they rely on concept supervision and lack the ability to generalize to unseen classes. We introduce EZPC that bridges these two paradigms by explaining CLIP's zero-shot predictions through human-understandable concepts. Our method projects CLIP's joint image-text embeddings into a concept space learned from language descriptions, enabling faithful and transparent explanations without additional supervision. The model learns this projection via a combination of alignment and reconstruction objectives, ensuring that concept activations preserve CLIP's semantic structure while remaining interpretable. Extensive experiments on five benchmark datasets, CIFAR-100, CUB-200-2011, Places365, ImageNet-100, and ImageNet-1k, demonstrate that our approach maintains CLIP's strong zero-shot classification accuracy while providing meaningful concept-level explanations. By grounding open-vocabulary predictions in explicit semantic concepts, our method offers a principled step toward interpretable and trustworthy vision-language models. Code is available at https://github.com/oonat/ezpc.

2603.28209 2026-03-31 cs.SD

On the Usefulness of Diffusion-Based Room Impulse Response Interpolation to Microphone Array Processing

Sagi Della Torre, Mirco Pezzoli, Fabio Antonacci, Sharon Gannot

详情
英文摘要

Room Impulse Responses estimation is a fundamental problem in spatial audio processing and speech enhancement. In this paper, we build upon our previously introduced diffusion-based inpainting framework for Room Impulse Response interpolation and demonstrate its applicability to enhancing the performance of practical multi-microphone array processing tasks. Furthermore, we validate the robustness of this method in interpolating real-world Room Impulse Responses.

2603.28203 2026-03-31 cs.AI cs.LG

Differentiable Power-Flow Optimization

Muhammed Öz, Jasmin Hörter, Kaleb Phipps, Charlotte Debus, Achim Streit, Markus Götz

详情
英文摘要

With the rise of renewable energy sources and their high variability in generation, the management of power grids becomes increasingly complex and computationally demanding. Conventional AC-power-flow simulations, which use the Newton-Raphson (NR) method, suffer from poor scalability, making them impractical for emerging use cases such as joint transmission-distribution modeling and global grid analysis. At the same time, purely data-driven surrogate models lack physical guarantees and may violate fundamental constraints. In this work, we propose Differentiable Power-Flow (DPF), a reformulation of the AC power-flow problem as a differentiable simulation. DPF enables end-to-end gradient propagation from the physical power mismatches to the underlying simulation parameters, thereby allowing these parameters to be identified efficiently using gradient-based optimization. We demonstrate that DPF provides a scalable alternative to NR by leveraging GPU acceleration, sparse tensor representations, and batching capabilities available in modern machine-learning frameworks such as PyTorch. DPF is especially suited as a tool for time-series analyses due to its efficient reuse of previous solutions, for N-1 contingency-analyses due to its ability to process cases in batches, and as a screening tool by leveraging its speed and early stopping capability. The code is available in the authors' code repository.

2603.28200 2026-03-31 cs.RO cs.LG q-bio.PE

A Deep Reinforcement Learning Framework for Closed-loop Guidance of Fish Schools via Virtual Agents

Takato Shibayama, Hiroaki Kawashima

Comments 18 pages, 8 figures

详情
英文摘要

Guiding collective motion in biological groups is a fundamental challenge in understanding social interaction rules and developing automated systems for animal management. In this study, we propose a deep reinforcement learning (RL) framework for the closed-loop guidance of fish schools using virtual agents. These agents are controlled by policies trained via Proximal Policy Optimization (PPO) in simulation and deployed in physical experiments with rummy-nose tetras (Petitella bleheri), enabling real-time interaction between artificial agents and live individuals. To cope with the stochastic behavior of live individuals, we design a composite reward function to balance directional guidance with social cohesion. Our systematic evaluation of visual parameters shows that a white background and larger stimulus sizes maximize guidance efficacy in physical trials. Furthermore, evaluation across group sizes revealed that while the system demonstrates effective guidance for groups of five individuals, this capability markedly degrades as group size increases to eight. This study highlights the potential of deep RL for automated guidance of biological collectives and identifies challenges in maintaining artificial influence in larger groups.

2603.28198 2026-03-31 cs.LG q-fin.ST

Policy-Controlled Generalized Share: A General Framework with a Transformer Instantiation for Strictly Online Switching-Oracle Tracking

Hongkai Hu

Comments 44 pages, 6 figures, 5 tables, 1 algorithm. Includes appendix and reproducibility-oriented experiments

详情
英文摘要

Static regret to a single expert is often the wrong target for strictly online prediction under non-stationarity, where the best expert may switch repeatedly over time. We study Policy-Controlled Generalized Share (PCGS), a general strictly online framework in which the generalized-share recursion is fixed while the post-loss update controls are allowed to vary adaptively. Its principal instantiation in this paper is PCGS-TF, which uses a causal Transformer as an update controller: after round t finishes and the loss vector is observed, the Transformer outputs the controls that map w_t to w_{t+1} without altering the already committed decision w_t. Under admissible post-loss update controls, we obtain a pathwise weighted regret guarantee for general time-varying learning rates, and a standard dynamic-regret guarantee against any expert path with at most S switches under the constant-learning-rate specialization. Empirically, on a controlled synthetic suite with exact dynamic-programming switching-oracle evaluation, PCGS-TF attains the lowest mean dynamic regret in all seven non-stationary families, with its advantage increasing for larger expert pools. On a reproduced household-electricity benchmark, PCGS-TF also achieves the lowest normalized dynamic regret for S = 5, 10, and 20.

2603.28197 2026-03-31 cs.AI

EpiPersona: Persona Projection and Episode Coupling for Pluralistic Preference Modeling

Yujie Zhang, Weikang Yuan, Zhuoren Jiang, Pengwei Yan

详情
英文摘要

Pluralistic alignment is essential for adapting large language models (LLMs) to the diverse preferences of individuals and minority groups. However, existing approaches often mix stable personal traits with episode-specific factors, limiting their ability to generalize across episodes. To address this challenge, we introduce EpiPersona, a framework for explicit persona-episode coupling. EpiPersona first projects noisy preference feedback into a low-dimensional persona space, where similar personas are aggregated into shared discrete codes. This process separates enduring personal characteristics from situational signals without relying on predefined preference dimensions. The inferred persona representation is then coupled with the current episode, enabling episode-aware preference prediction. Extensive experiments show that EpiPersona consistently outperforms the baselines. It achieves notable performance gains in hard episodic-shift scenarios, while remaining effective with sparse preference data.

2603.28191 2026-03-31 cs.CL

DongYuan: An LLM-Based Framework for Integrative Chinese and Western Medicine Spleen-Stomach Disorders Diagnosis

Hua Li, Yingying Li, Xiaobin Feng, Xinyi Fu, Lifeng Dong, Qingfeng Yang, Yanzhe Chen, Xiaoju Feng, Zhidong Cao, Jianbin Guo, Yanru Du

Comments 13 pages, 6 figures

详情
英文摘要

The clinical burden of spleen-stomach disorders is substantial. While large language models (LLMs) offer new potential for medical applications, they face three major challenges in the context of integrative Chinese and Western medicine (ICWM): a lack of high-quality data, the absence of models capable of effectively integrating the reasoning logic of traditional Chinese medicine (TCM) syndrome differentiation with that of Western medical (WM) disease diagnosis, and the shortage of a standardized evaluation benchmark. To address these interrelated challenges, we propose DongYuan, an ICWM spleen-stomach diagnostic framework. Specifically, three ICWM datasets (SSDF-Syndrome, SSDF-Dialogue, and SSDF-PD) were curated to fill the gap in high-quality data for spleen-stomach disorders. We then developed SSDF-Core, a core diagnostic LLM that acquires robust ICWM reasoning capabilities through a two-stage training regimen of supervised fine-tuning. tuning (SFT) and direct preference optimization (DPO), and complemented it with SSDF-Navigator, a pluggable consultation navigation model designed to optimize clinical inquiry strategies. Additionally, we established SSDF-Bench, a comprehensive evaluation benchmark focused on ICWM diagnosis of spleen-stomach disorders. Experimental results demonstrate that SSDF-Core significantly outperforms 12 mainstream baselines on SSDF-Bench. DongYuan lays a solid methodological foundation and provides practical technical references for the future development of intelligent ICWM diagnostic systems.

2603.28182 2026-03-31 cs.CV

A Closer Look at Cross-Domain Few-Shot Object Detection: Fine-Tuning Matters and Parallel Decoder Helps

Xuanlong Yu, Youyang Sha, Longfei Liu, Xi Shen, Di Yang

Comments Accepted at CVPR 2026

详情
英文摘要

Few-shot object detection (FSOD) is challenging due to unstable optimization and limited generalization arising from the scarcity of training samples. To address these issues, we propose a hybrid ensemble decoder that enhances generalization during fine-tuning. Inspired by ensemble learning, the decoder comprises a shared hierarchical layer followed by multiple parallel decoder branches, where each branch employs denoising queries either inherited from the shared layer or newly initialized to encourage prediction diversity. This design fully exploits pretrained weights without introducing additional parameters, and the resulting diverse predictions can be effectively ensembled to improve generalization. We further leverage a unified progressive fine-tuning framework with a plateau-aware learning rate schedule, which stabilizes optimization and achieves strong few-shot adaptation without complex data augmentations or extensive hyperparameter tuning. Extensive experiments on CD-FSOD, ODinW-13, and RF100-VL validate the effectiveness of our approach. Notably, on RF100-VL, which includes 100 datasets across diverse domains, our method achieves an average performance of 41.9 in the 10-shot setting, significantly outperforming the recent approach SAM3, which obtains 35.7. We further construct a mixed-domain test set from CD-FSOD to evaluate robustness to out-of-distribution (OOD) samples, showing that our proposed modules lead to clear improvement gains. These results highlight the effectiveness, generalization, and robustness of the proposed method. Code is available at: https://github.com/Intellindust-AI-Lab/FT-FSOD.