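arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.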

2603.06254 2026-03-09 cs.CV cs.RO eess.IV

NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving

Kai Luo, Xu Wang, Rui Fan, Kailun Yang

Comments Code will be available at https://github.com/xifen523/NOVA

Abstract

Generalizing across unknown targets is critical for open-world perception, yet existing 3D Multi-Object Tracking (3D MOT) pipelines remain limited by closed-set assumptions and "semantic-blind" heuristics. To address this, we propose Next-step Open-Vocabulary Autoregression (NOVA), an innovative paradigm that shifts 3D tracking from traditional fragmented distance-based matching toward generative spatio-temporal semantic modeling. NOVA reformulates 3D trajectories as structured spatio-temporal semantic sequences, enabling the simultaneous encoding of physical motion continuity and deep linguistic priors. By leveraging the autoregressive capabilities of Large Language Models (LLMs), we transform the tracking task into a principled process of next-step sequence completion. This mechanism allows the model to explicitly utilize the hierarchical structure of language space to resolve fine-grained semantic ambiguities and maintain identity consistency across complex long-range sequences through high-level commonsense reasoning. Extensive experiments on nuScenes, V2X-Seq-SPD, and KITTI demonstrate the superior performance of NOVA. Notably, on the nuScenes dataset, NOVA achieves an AMOTA of 22.41% for Novel categories, yielding a significant 20.21% absolute improvement over the baseline. These gains are realized through a compact 0.5B autoregressive model. Code will be available at https://github.com/xifen523/NOVA.

2603.06252 2026-03-09 cs.LG stat.ML

Synthetic Monitoring Environments for Reinforcement Learning

Leonard Pleiss, Carolin Schmidt, Maximilian Schiffer

2603.06250 2026-03-09 cs.CV

Hierarchical Collaborative Fusion for 3D Instance-aware Referring Expression Segmentation

Keshen Zhou, Runnan Chen, Mingming Gong, Tongliang Liu

2603.06248 2026-03-09 cs.LG math.OC stat.ML

Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions

Aditya Varre, Mark Rofin, Nicolas Flammarion

Comments 35 pages, 21 figures

2603.06231 2026-03-09 cs.CV cs.AI cs.RO

TaPD: Temporal-adaptive Progressive Distillation for Observation-Adaptive Trajectory Forecasting in Autonomous Driving

Mingyu Fan, Yi Liu, Hao Zhou, Deheng Qian, Mohammad Haziq Khan, Matthias Raetsch

Abstract

Trajectory prediction is essential for autonomous driving, enabling vehicles to anticipate the motion of surrounding agents to support safe planning. However, most existing predictors assume fixed-length histories and suffer substantial performance degradation when observations are variable or extremely short in real-world settings (e.g., due to occlusion or a limited sensing range). We propose TaPD (Temporal-adaptive Progressive Distillation), a unified plug-and-play framework for observation-adaptive trajectory forecasting under variable history lengths. TaPD comprises two cooperative modules: an Observation-Adaptive Forecaster (OAF) for future prediction and a Temporal Backfilling Module (TBM) for explicit reconstruction of the past. OAF is built on progressive knowledge distillation (PKD), which transfers motion pattern knowledge from long-horizon "teachers" to short-horizon "students" via hierarchical feature regression, enabling short observations to recover richer motion context. We further introduce a cosine-annealed distillation weighting scheme to balance forecasting supervision and feature alignment, improving optimization stability and cross-length consistency. For extremely short histories where implicit alignment is insufficient, TBM backfills missing historical segments conditioned on scene evolution, producing context-rich trajectories that strengthen PKD and thereby improve OAF. We employ a decoupled pretrain-reconstruct-finetune protocol to preserve real-motion priors while adapting to backfilled inputs. Extensive experiments on Argoverse 1 and Argoverse 2 show that TaPD consistently outperforms strong baselines across all observation lengths, delivers especially large gains under very short inputs, and improves other predictors (e.g., HiVT) in a plug-and-play manner. Code will be available at https://github.com/zhouhao94/TaPD.
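
The abstract mentions a cosine-annealed distillation weighting scheme but does not give its formula. The following is a minimal sketch of one common form of cosine annealing, not the authors' implementation. The function names, the endpoints `w_start` and `w_end`, the decay direction, and the additive combination of the two losses are all assumptions made for illustration.

```python
import math


def cosine_annealed_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Cosine schedule moving from w_start (step 0) to w_end (final step).

    Endpoints and direction are assumptions; the abstract does not state them.
    """
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    t = min(max(step, 0), total_steps) / total_steps
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * t))


def total_loss(forecast_loss, distill_loss, step, total_steps):
    """Hypothetical combination: forecasting loss plus weighted feature-alignment loss."""
    lam = cosine_annealed_weight(step, total_steps)
    return forecast_loss + lam * distill_loss
```

Under these assumptions the distillation term dominates early in training and fades smoothly, so that late optimization is driven mainly by the forecasting supervision.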

2603.06228 2026-03-09 cs.CV

Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention

Haiqing Hao, Zhipeng Sui, Rong Zou, Zijia Dai, Nikola Zubić, Davide Scaramuzza, Wenhui Wang

2603.06224 2026-03-09 cs.LG

FedSCS-XGB -- Federated Server-centric surrogate XGBoost for continual health monitoring

Felix Walger, Mehdi Ejtehadi, Anke Schmeink, Diego Paez-Granados

Comments Submitted to IEEE EMBC 2026

2603.06222 2026-03-09 cs.CL

SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models

Yunlong Chu, Minglai Shao, Yuhang Liu, Bing Hao, Yumeng Lin, Jialu Wang, Ruijie Wang

2603.06220 2026-03-09 cs.CV

Word-Anchored Temporal Forgery Localization

Tianyi Wang, Xi Shao, Harry Cheng, Yinglong Wang, Mohan Kankanhalli

Comments Submitted for review

2603.06217 2026-03-09 cs.AI cs.MA cs.SY eess.SY

Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI

Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl, Hans Auer

Comments 6 pages, 2 figures. Code available at: https://github.com/RedaElMakroum/cdr

2603.06216 2026-03-09 cs.CV

EntON: Eigenentropy-Optimized Neighborhood Densification in 3D Gaussian Splatting

Miriam Jäger, Boris Jutzi

Comments Submitted to ISPRS Journal of Photogrammetry and Remote Sensing on 20 February 2026

Abstract

We present a novel Eigenentropy-optimized neighborhood densification strategy, EntON, in 3D Gaussian Splatting (3DGS) for geometrically accurate and high-quality rendered 3D reconstruction. While standard 3DGS produces Gaussians whose centers and surfaces are poorly aligned with the underlying object geometry, surface-focused reconstruction methods frequently sacrifice photometric accuracy. In contrast to the conventional densification strategy, which relies on the magnitude of the view-space position gradient, our approach introduces a geometry-aware strategy to guide adaptive splitting and pruning. Specifically, we compute the 3D shape feature Eigenentropy from the eigenvalues of the covariance matrix in the k-nearest neighborhood of each Gaussian center, which quantifies the local structural order. These Eigenentropy values are integrated into an alternating optimization framework: during the optimization process, the algorithm alternates between (i) standard gradient-based densification, which refines regions via view-space gradients, and (ii) Eigenentropy-aware densification, which preferentially densifies Gaussians in low-Eigenentropy (ordered, flat) neighborhoods to better capture fine geometric details on the object surface, and prunes those in high-Eigenentropy (disordered, spherical) regions. We provide quantitative and qualitative evaluations on two benchmark datasets: the small-scale DTU dataset and the large-scale TUM2TWIN dataset, covering man-made objects and urban scenes. Experiments demonstrate that our Eigenentropy-aware alternating densification strategy improves geometric accuracy by up to 33% and rendering quality by up to 7%, while reducing the number of Gaussians by up to 50% and training time by up to 23%. Overall, EntON achieves a favorable balance between geometric accuracy, rendering quality and efficiency by avoiding unnecessary scene expansion.
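
The Eigenentropy feature described above can be illustrated with a short sketch. It assumes the definition commonly used in point-cloud analysis, the Shannon entropy of the covariance eigenvalues normalized to sum to one; the abstract does not spell out the formula. The function name `eigenentropy`, the brute-force neighbor search, and the default `k` are illustrative choices, not the authors' code.

```python
import numpy as np


def eigenentropy(points, k=8):
    """Eigenentropy of the k-nearest neighborhood of each 3D point.

    Assumed definition: -sum(e_i * ln(e_i)), where e_i are the eigenvalues
    of the neighborhood covariance matrix normalized to sum to 1.
    """
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Indices of the k nearest neighbors (each point counts as its own neighbor).
    idx = np.argsort(d2, axis=1)[:, :k]
    out = np.empty(len(points))
    for i, nbrs in enumerate(idx):
        cov = np.cov(points[nbrs].T)  # 3x3 covariance of the neighborhood
        # Clip to avoid log(0) for perfectly flat neighborhoods.
        evals = np.clip(np.linalg.eigvalsh(cov), 1e-12, None)
        e = evals / evals.sum()
        out[i] = -np.sum(e * np.log(e))
    return out
```

With this definition, points sampled on a plane have at most two significant eigenvalues, so their Eigenentropy stays near or below ln 2, while an isotropic blob approaches the maximum of ln 3. This matches the abstract's reading of low values as ordered, flat neighborhoods and high values as disordered, spherical ones.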

2603.06213 2026-03-09 cs.CV cs.AI

Cut to the Chase: Training-free Multimodal Summarization via Chain-of-Events

Xiaoxing You, Qiang Huang, Lingyu Li, Xiaojun Chang, Jun Yu

Comments Accepted to CVPR 2026

2603.06212 2026-03-09 cs.LG stat.AP

Topological descriptors of foot clearance gait dynamics improve differential diagnosis of Parkinsonism

Jhonathan Barrios, Wolfram Erlhagen, Miguel F. Gago, Estela Bicho, Flora Ferreira

Comments 17 pages, 12 figures, Under review

2603.06210 2026-03-09 cs.CV cs.RO

VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction

Xiaoyang Yan, Muleilan Pei, Shaojie Shen

2603.06205 2026-03-09 cs.RO

KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference

Jiwon Choi, Hogyun Kim, Geonmo Yang, Juhui Lee, Younggun Cho

Comments 8 pages, 9 figures

2603.06201 2026-03-09 cs.CV

Point-Supervised Skeleton-Based Human Action Segmentation

Hongsong Wang, Yiqin Shen, Pengbo Yan, Jie Gui

2603.06200 2026-03-09 cs.CV

Adaptive Language-Aware Image Reflection Removal Network

Siyan Fang, Yuntao Wang, Jinpu Zhang, Ziwen Li, Yuehuan Wang

Comments IJCAI 2025

2603.06199 2026-03-09 cs.CL cs.AI

FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling

Qihang Fan, Huaibo Huang, Zhiying Wu, Juqiu Wang, Bingning Wang, Ran He

2603.06197 2026-03-09 cs.CL

Wisdom of the AI Crowd (AI-CROWD) for Ground Truth Approximation in Content Analysis: A Research Protocol & Validation Using Eleven Large Language Models

Luis de-Marcos, Manuel Goyanes, Adrián Domínguez-Díaz

2603.06194 2026-03-09 cs.CL cs.AI

MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue

Naifan Zhang, Ruihan Sun, Jinwei Su, Hengjie Yang, Zhengyuan Pan, Zhaohan Chen, Xiaofan Zhang

2603.06193 2026-03-09 cs.SD cs.AI eess.AS

Whisper-CD: Accurate Long-Form Speech Recognition using Multi-Negative Contrastive Decoding

Hoseong Ahn, Jeongyun Chae, Yoonji Park, Kyuhong Shim

Comments Submitted to Interspeech 2026

2603.06190 2026-03-09 cs.RO

DreamToNav: Generalizable Navigation for Robots via Generative Video Planning

Valerii Serpiva, Jeffrin Sam, Chidera Simon, Hajira Amjad, Iana Zhura, Artem Lykov, Dzmitry Tsetserukou

Comments Submitted to conference

2603.06186 2026-03-09 cs.CV

SpaCRD: Multimodal Deep Fusion of Histology and Spatial Transcriptomics for Cancer Region Detection

Shuailin Xue, Jun Wan, Lihua Zhang, Wenwen Min

Comments Accepted by AAAI-2026-Oral

2603.06181 2026-03-09 cs.CV

Towards Motion Turing Test: Evaluating Human-Likeness in Humanoid Robots

Mingzhe Li, Mengyin Liu, Zekai Wu, Xincheng Lin, Junsheng Zhang, Ming Yan, Zengye Xie, Changwang Zhang, Chenglu Wen, Lan Xu, Siqi Shen, Cheng Wang

Comments 13 pages, 10 figures, conference

2603.06180 2026-03-09 cs.CV cs.AI cs.CL cs.LG

Contrastive-to-Self-Supervised: A Two-Stage Framework for Script Similarity Learning

Claire Roman, Philippe Meyer

2603.06173 2026-03-09 cs.CV

Optimizing 3D Diffusion Models for Medical Imaging via Multi-Scale Reward Learning

Yueying Tian, Xudong Han, Meng Zhou, Rodrigo Aviles-Espinosa, Rupert Young, Philip Birch

Comments Preprint

2603.06167 2026-03-09 cs.CV

A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement

Ruili Li, Jiayi Ding, Ruiyu Li, Yilun Jin, Shiwen Ge, Yuwen Zeng, Xiaoyong Zhang, Eichi Takaya, Jan Vrba, Noriyasu Homma

2603.06166 2026-03-09 cs.CV

FreeOcc: Training-free Panoptic Occupancy Prediction via Foundation Models

Andrew Caunes, Thierry Chateau, Vincent Fremont

Comments 14 pages

2603.06164 2026-03-09 cs.SD cs.AI cs.CL

Do Compact SSL Backbones Matter for Audio Deepfake Detection? A Controlled Study with RAPTOR

Ajinkya Kulkarni, Sandipana Dowerah, Atharva Kulkarni, Tanel Alumäe, Mathew Magimai Doss

Comments Submitted to Interspeech 2026, 4 pages, 2 figures

2603.06163 2026-03-09 cs.RO

Dual-Agent Multiple-Model Reinforcement Learning for Event-Triggered Human-Robot Co-Adaptation in Decoupled Task Spaces

Yaqi Li, Zhengqi Han, Huifang Liu, Steven W. Su