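arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems science.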

2603.09316 2026-03-11 cs.CV cs.AI cs.LG

CLoE: Expert Consistency Learning for Missing Modality Segmentation

Xinyu Tong, Meihua Zhou, Bowu Fan, Haitao Li

Abstract

Multimodal medical image segmentation often faces missing modalities at inference, which induces disagreement among modality experts and makes fusion unstable, particularly on small foreground structures. We propose Consistency Learning of Experts (CLoE), a consistency-driven framework for missing-modality segmentation that preserves strong performance when all modalities are available. CLoE formulates robustness as decision-level expert consistency control and introduces a dual-branch Expert Consistency Learning objective. Modality Expert Consistency enforces global agreement among expert predictions to reduce case-wise drift under partial inputs, while Region Expert Consistency emphasizes agreement on clinically critical foreground regions to avoid background-dominated regularization. We further map consistency scores to modality reliability weights using a lightweight gating network, enabling reliability-aware feature recalibration before fusion. Extensive experiments on BraTS 2020 and MSD Prostate demonstrate that CLoE outperforms state-of-the-art methods in incomplete multimodal segmentation, while exhibiting strong cross-dataset generalization and improving robustness on clinically critical structures.

2603.09312 2026-03-11 cs.CV

IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework

Feiyu Wang, Jiayuan Yang, Zhiyuan Zhao, Da Zhang, Bingyu Li, Peng Liu, Junyu Gao

2603.09310 2026-03-11 cs.LG math.PR stat.ML

A Gaussian Comparison Theorem for Training Dynamics in Machine Learning

Ashkan Panahi

2603.09307 2026-03-11 cs.SD

Paralinguistic Emotion-Aware Validation Timing Detection in Japanese Empathetic Spoken Dialogue

Zi Haur Pang, Yahui Fu, Yuan Gao, Tatsuya Kawahara

Comments Accepted to ICASSP 2026

2603.09298 2026-03-11 cs.RO

CORAL: Scalable Multi-Task Robot Learning via LoRA Experts

Yuankai Luo, Woping Chen, Tong Liang, Zhenguo Li

2603.09291 2026-03-11 cs.CV cs.AI

DenoiseSplat: Feed-Forward Gaussian Splatting for Noisy 3D Scene Reconstruction

Fuzhen Jiang, Zhuoran Li, Yinlin Zhang

2603.09287 2026-03-11 cs.CV

Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking

Shilei Wang, Pujian Lai, Dong Gao, Jifeng Ning, Gong Cheng

2603.09285 2026-03-11 cs.CV

Learning Convex Decomposition via Feature Fields

Yuezhi Yang, Qixing Huang, Mikaela Angelina Uy, Nicholas Sharp

Comments 14 pages, 12 figures

2603.09277 2026-03-11 cs.CV

Speeding Up the Learning of 3D Gaussians with Much Shorter Gaussian Lists

Jiaqi Liu, Zhizhong Han

Comments Accepted to CVPR 2026. Project page: https://github.com/MachinePerceptionLab/ShorterSplatting

2603.09274 2026-03-11 cs.LG cs.AI cs.AR cs.ET cs.NE

DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data

Jann Krausse, Zhe Su, Kyrus Mama, Maryada, Klaus Knobloch, Giacomo Indiveri, Jürgen Becker

Comments Currently under review

Abstract

Spatiotemporal information is at the core of diverse sensory processing and computational tasks. Feed-forward spiking neural networks can be used to solve these tasks while offering potential benefits in terms of energy efficiency through event-based computation. However, they have trouble decoding temporal information with high accuracy. Thus, they commonly resort to recurrence or delays to enhance their temporal computing ability, which, however, brings downsides in terms of hardware efficiency. In the brain, dendrites are computational powerhouses that have only recently started to be acknowledged in such machine learning systems. In this work, we focus on a sequence detection mechanism present in branches of dendrites and translate it into a novel type of neural network by introducing a dendrocentric neural network, DendroNN. DendroNNs identify unique incoming spike sequences as spatiotemporal features. This work further introduces a rewiring phase to train the non-differentiable spike sequences without the use of gradients. During the rewiring, the network memorizes frequently occurring sequences and additionally discards those that do not contribute any discriminative information. The networks display competitive accuracies across various event-based time series datasets. We also propose an asynchronous digital hardware architecture using a time-wheel mechanism that builds on the event-driven design of DendroNNs, eliminating per-step global updates typical of delay- or recurrence-based models. By leveraging a DendroNN's dynamic and static sparsity along with intrinsic quantization, it achieves up to 4x higher efficiency than state-of-the-art neuromorphic hardware at comparable accuracy on the same audio classification task, demonstrating its suitability for spatiotemporal event-based computing. This work offers a novel approach to low-power spatiotemporal processing on event-driven hardware.

2603.09268 2026-03-11 cs.AI

Logos: An evolvable reasoning engine for rational molecular design

Haibin Wen, Zhe Zhao, Fanfu Wang, Tianyi Xu, Hao Zhang, Chao Yang, Ye Wei

Abstract

The discovery and design of functional molecules remain central challenges across chemistry, biology, and materials science. While recent advances in machine learning have accelerated molecular property prediction and candidate generation, existing models tend to excel either in physical fidelity without transparent reasoning, or in flexible reasoning without guarantees of chemical validity. This imbalance limits the reliability of artificial intelligence systems in real scientific design workflows. Here we present Logos, a compact molecular reasoning model that integrates multi-step logical reasoning with strict chemical consistency. Logos is trained using a staged strategy that first exposes the model to explicit reasoning examples linking molecular descriptions to structural decisions, and then progressively aligns these reasoning patterns with molecular representations. In a final training phase, chemical rules and invariants are incorporated directly into the optimization objective, guiding the model toward chemically valid outputs. Across multiple benchmark datasets, Logos achieves strong performance in both structural accuracy and chemical validity, matching or surpassing substantially larger general-purpose language models while operating with a fraction of their parameters. Beyond benchmark evaluation, the model exhibits stable behaviour in molecular optimization tasks involving multiple, potentially conflicting constraints. By explicitly exposing intermediate reasoning steps, Logos enables human inspection and assessment of the design logic underlying each generated structure. These results indicate that jointly optimizing for reasoning structure and physical consistency offers a practical pathway toward reliable and interpretable AI systems for molecular science, supporting closer integration of artificial intelligence into scientific discovery processes.

2603.09259 2026-03-11 cs.CV cs.RO

Implicit Geometry Representations for Vision-and-Language Navigation from Web Videos

Mingfei Han, Haihong Hao, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang, Ivan Laptev

Comments Extension of CVPR 2025 RoomTour3D with implicit geometric representations

2603.09258 2026-03-11 cs.CV

Multimodal Graph Representation Learning with Dynamic Information Pathways

Xiaobin Hong, Mingkai Lin, Xiaoli Wang, Chaoqun Wang, Wenzhong Li

Comments 12 pages, 6 figures, 6 tables

2603.09257 2026-03-11 cs.LG stat.ML

Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim

2603.09255 2026-03-11 cs.CV cs.AI

Multi-model approach for autonomous driving: A comprehensive study on traffic sign-, vehicle- and lane detection and behavioral cloning

Kanishkha Jaisankar, Pranav M. Pawar, Diana Susane Joseph, Raja Muthalagu, Mithun Mukherjee

Comments 35 pages, 40 figures

Abstract

Deep learning and computer vision techniques have become increasingly important in the development of self-driving cars. These techniques play a crucial role in enabling self-driving cars to perceive and understand their surroundings, allowing them to safely navigate and make decisions in real time. Using neural networks, self-driving cars can accurately identify and classify objects such as pedestrians, other vehicles, and traffic signals. Using deep learning and analyzing data from sensors such as cameras and radar, self-driving cars can predict the likely movement of other objects and plan their own actions accordingly. In this study, a novel approach is provided to enhance the performance of self-driving cars by using pre-trained and custom-made neural networks for key tasks, including traffic sign classification, vehicle detection, lane detection, and behavioral cloning. The methodology integrates several innovative techniques, such as geometric and color transformations for data augmentation, image normalization, and transfer learning for feature extraction. These techniques are applied to diverse datasets, including the German Traffic Sign Recognition Benchmark (GTSRB), road and lane segmentation datasets, vehicle detection datasets, and data collected using the Udacity self-driving car simulator to evaluate the model efficacy. The primary objective of the work is to review the state-of-the-art in deep learning and computer vision for self-driving cars. The findings of the work are effective in solving various challenges related to self-driving cars like traffic sign classification, lane prediction, vehicle detection, and behavioral cloning, and provide valuable insights into improving the robustness and reliability of autonomous systems, paving the way for future research and deployment of safer and more efficient self-driving technologies.

2603.09253 2026-03-11 cs.LG

Efficient Reasoning at Fixed Test-Time Cost via Length-Aware Attention Priors and Gain-Aware Training

Rian Atri

Comments 19 pages, 6 tables, 1 figure. NeurIPS 2025 Workshop on Efficient Reasoning

2603.09249 2026-03-11 cs.AI

Social-R1: Towards Human-like Social Reasoning in LLMs

Jincenzi Wu, Yuxuan Lei, Jianxun Lian, Yitian Huang, Lexin Zhou, Haotian Li, Xing Xie, Helen Meng

Comments 27 pages. Code and dataset will be released upon acceptance

2603.09245 2026-03-11 cs.CV

Towards Instance Segmentation with Polygon Detection Transformers

Jiacheng Sun, Jiaqi Lin, Wenlong Hu, Haoyang Li, Xinghong Zhou, Chenghai Mao, Yan Peng, Xiaomao Li

2603.09241 2026-03-11 cs.CV cs.RO

RAE-NWM: Navigation World Model in Dense Visual Representation Space

Mingkun Zhang, Wangtian Shen, Fan Zhang, Haijian Qin, Zihao Pei, Ziyang Meng

Comments Code is available at: https://github.com/20robo/raenwm

2603.09237 2026-03-11 cs.RO

MO-Playground: Massively Parallelized Multi-Objective Reinforcement Learning for Robotics

Neil Janwani, Ellen Novoseller, Vernon J. Lawhern, Maegan Tucker

Comments 8 pages, 4 figures, 3 tables

2603.09236 2026-03-11 cs.CV cs.AI

BridgeDiff: Bridging Human Observations and Flat-Garment Synthesis for Virtual Try-Off

Shuang Liu, Ao Yu, Linkang Cheng, Xiwen Huang, Li Zhao, Junhui Liu, Zhiting Lin, Yu Liu

Comments 33 pages, 16 figures

2603.09235 2026-03-11 cs.CV

HelixTrack: Event-Based Tracking and RPM Estimation of Propeller-like Objects

Radim Spetlik, Michal Pliska, Vojtěch Vrba, Jiri Matas

2603.09232 2026-03-11 cs.SD cs.CL eess.AS

How Contrastive Decoding Enhances Large Audio Language Models?

Tzu-Quan Lin, Wei-Ping Huang, Yi-Cheng Lin, Hung-yi Lee

Comments Submitted to INTERSPEECH 2026. Code and additional analysis results are provided in our repository: https://github.com/nervjack2/LALM-Contrastive-Decoding-Error-Profiles

2603.09231 2026-03-11 cs.AI

Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness

Ding Linghu, Cheng Wang, Da Fan, Wei Shi, Kaifeng Yin, Xiaoliang Xue, Fan Yang, Haiyi Ren, Cong Zhang

2603.09226 2026-03-11 cs.RO

TRIP-Bag: A Portable Teleoperation System for Plug-and-Play Robotic Arms and Leaders

Noboru Myers, Sankalp Yamsani, Obin Kwon, Joohyung Kim

2603.09223 2026-03-11 cs.CV

UniField: A Unified Field-Aware MRI Enhancement Framework

Yiyang Lin, Chenhui Wang, Zhihao Peng, Yixuan Yuan

Abstract

Magnetic Resonance Imaging (MRI) field-strength enhancement holds immense value for both clinical diagnostics and advanced research. However, existing methods typically focus on isolated enhancement tasks, such as specific 64mT-to-3T or 3T-to-7T transitions using limited subject cohorts, thereby failing to exploit the shared degradation patterns inherent across different field strengths and severely restricting model generalization. To address this challenge, we propose UniField, a unified framework integrating multiple modalities and enhancement tasks to mutually promote representation learning by exploiting these shared degradation characteristics. Specifically, our main contributions are threefold. Firstly, to overcome MRI data scarcity and capture continuous anatomical structures, UniField departs from conventional methods that treat 3D MRI volumes as independent 2D slices. Instead, we directly exploit comprehensive 3D volumetric information by leveraging pre-trained 3D foundation models, thereby embedding generalized and robust structural representations to significantly boost enhancement performance. In addition, to mitigate the spectral bias of mainstream flow-matching models that often over-smooth high-frequency details, we explicitly incorporate the physical mechanisms of magnetic fields to introduce a Field-Aware Spectral Rectification Mechanism (FASRM), tailoring customized spectral corrections to distinct field strengths. Finally, to resolve the fundamental data bottleneck, we organize and publicly release a comprehensive paired multi-field MRI dataset, which is an order of magnitude larger than existing datasets. Extensive experiments demonstrate our method's superiority over state-of-the-art approaches, achieving an average improvement of approximately 1.81 dB in PSNR and 9.47% in SSIM. Code will be released upon acceptance.

2603.09222 2026-03-11 cs.CL

LooComp: Leverage Leave-One-Out Strategy to Encoder-only Transformer for Efficient Query-aware Context Compression

Thao Do, Dinh Phu Tran, An Vo, Seon Kwon Kim, Daeyoung Kim

2603.09220 2026-03-11 cs.CV

Distributed Convolutional Neural Networks for Object Recognition

Liang Sun

2603.09218 2026-03-11 cs.RO cs.AI

Embodied Human Simulation for Quantitative Design and Analysis of Interactive Robotics

Chenhui Zuo, Jinhao Xu, Michael Qian Vergnolle, Yanan Sui

2603.09215 2026-03-11 cs.CL eess.AS

SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models

Hsiao-Ying Huang, Cheng-Han Chiang, Hung-yi Lee

Comments 6 pages, 1 figure, 2 tables