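arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and more.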

2603.11404 2026-03-16 cs.RO cs.CV

Real-time Rendering-based Surgical Instrument Tracking via Evolutionary Optimization

Hanyang Hu, Zekai Liang, Florian Richter, Michael C. Yip

Abstract

Accurate and efficient tracking of surgical instruments is fundamental for Robot-Assisted Minimally Invasive Surgery. Although vision-based robot pose estimation has enabled markerless calibration without tedious physical setups, reliable tool tracking for surgical robots remains challenging due to partial visibility and specialized articulation design of surgical instruments. Previous works in the field are usually prone to unreliable feature detections under degraded visual quality and data scarcity, whereas rendering-based methods often struggle with computational costs and suboptimal convergence. In this work, we incorporate CMA-ES, an evolutionary optimization strategy, into a versatile tracking pipeline that jointly estimates surgical instrument pose and joint configurations. Using batch rendering to efficiently evaluate multiple pose candidates in parallel, the method significantly reduces inference time and improves convergence robustness. The proposed framework further generalizes to joint angle-free and bi-manual tracking settings, making it suitable for both vision feedback control and online surgery video calibration. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed method significantly outperforms prior approaches in both accuracy and runtime.
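
The abstract describes CMA-ES scoring a batch of rendered pose candidates per iteration. As a rough illustration of that sample, batch-evaluate, recombine loop only, the sketch below uses a simplified (mu/mu_w, lambda) evolution strategy in plain Python. It is not CMA-ES: there is no covariance-matrix or cumulative step-size adaptation, and the step size simply decays geometrically. `batch_loss` is a hypothetical squared-error stand-in for the paper's rendering-based loss, and all names and parameter values are illustrative assumptions.

```python
import math
import random


def batch_loss(candidates, target):
    # Hypothetical stand-in for batch rendering: score every candidate
    # parameter vector (e.g. pose + joint angles) against a known target.
    return [sum((c - t) ** 2 for c, t in zip(cand, target)) for cand in candidates]


def simple_es(loss_fn, x0, sigma=0.5, popsize=16, iters=200, decay=0.98, seed=0):
    """Simplified (mu/mu_w, lambda) evolution strategy (NOT full CMA-ES):
    isotropic Gaussian sampling, truncation selection, weighted recombination."""
    rng = random.Random(seed)
    mean = list(x0)
    mu = popsize // 2
    # Log-decreasing recombination weights (same shape CMA-ES uses), normalized.
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    weights = [w / sum(raw) for w in raw]
    for _ in range(iters):
        # Sample a whole population around the current mean.
        cands = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                 for _ in range(popsize)]
        losses = loss_fn(cands)  # one batched evaluation per generation
        order = sorted(range(popsize), key=lambda k: losses[k])
        elite = [cands[k] for k in order[:mu]]
        # New mean = weighted average of the best mu candidates.
        mean = [sum(w * c[d] for w, c in zip(weights, elite))
                for d in range(len(mean))]
        sigma *= decay  # crude fixed step-size schedule (assumption)
    return mean


target = [1.0, -2.0, 0.5]
best = simple_es(lambda cs: batch_loss(cs, target), [0.0, 0.0, 0.0])
```

Because `loss_fn` receives the whole population at once, swapping the toy loss for a renderer that draws all candidates in a single batch is where the kind of parallel speedup the abstract mentions would plausibly come from.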

2603.11277 2026-03-16 cs.AI

COMPASS: The explainable agentic framework for Sovereignty, Sustainability, Compliance, and Ethics

Jean-Sébastien Dessureault, Alain-Thierry Iliho Manzi, Soukaina Alaoui Ismaili, Khadim Lo, Mireille Lalancette, Éric Bélanger

Comments: 22 pages, 4 figures

2603.11041 2026-03-16 cs.CV cs.RO

DynVLA: Learning World Dynamics for Action Reasoning in Autonomous Driving

Shuyao Shang, Bing Zhan, Yunfei Yan, Yuqi Wang, Yingyan Li, Yasong An, Xiaoman Wang, Jierui Liu, Lu Hou, Lue Fan, Zhaoxiang Zhang, Tieniu Tan

Comments: 18 pages, 10 figures. Project Page: https://yaoyao-jpg.github.io/dynvla

2603.10767 2026-03-16 cs.CL

mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR

Konstantin Dobler, Simon Lehnerer, Federico Scozzafava, Jonathan Janke, Mohamed Ali

2603.10301 2026-03-16 cs.LG

What do near-optimal learning rate schedules look like?

Hiroki Naganuma, Atish Agarwala, Priya Kasimbeg, George E. Dahl

2603.10240 2026-03-16 cs.SD eess.AS

nlm: Real-Time Non-linear Modal Synthesis in Max

Rodrigo Diaz, Rodrigo Constanzo, Mark Sandler

Comments: accepted to PdMaxCon25~ (https://music.illinois.edu/pd-max-con/)

2603.10212 2026-03-16 cs.CV cs.LG

FusionNet: a frame interpolation network for 4D heart models

Chujie Chang, Shoko Miyauchi, Ken'ichi Morooka, Ryo Kurazume, Oscar Martinez Mozos

Comments: This is the authors' version. The final authenticated version is available online at https://doi.org/10.1007/978-3-031-47425-5_4. Published in Medical Image Computing and Computer Assisted Intervention - MICCAI 2023 Workshops

2603.09619 2026-03-16 cs.AI cs.MA

Context Engineering: From Prompts to Corporate Multi-Agent Architecture

Vera V. Vishnyakova

Comments: 25 pages, 1 figure

2603.09217 2026-03-16 cs.CV

TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy

Yaoyu Liu, Minghui Zhang, Xin You, Hanxiao Zhang, Yun Gu

Comments: 18 pages, 12 figures

2603.08605 2026-03-16 cs.CV cs.AI

Weakly Supervised Teacher-Student Framework with Progressive Pseudo-mask Refinement for Gland Segmentation

Hikmat Khan, Wei Chen, Muhammad Khalid Khan Niazi

2603.06688 2026-03-16 cs.CV cs.AI

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning

Zhengjian Yao, Yongzhi Li, Xinyuan Gao, Quan Chen, Peng Jiang, Yanye Lu

Comments: Accepted by CVPR 2026

2603.05869 2026-03-16 cs.CV

PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues

Yukun Qi, Pei Fu, Hang Li, Yuhan Liu, Chao Jiang, Bin Qin, Zhenbo Luo, Jian Luan

2603.05819 2026-03-16 cs.SD

Which Data Matter? Embedding-Based Data Selection for Speech Recognition

Zakaria Aldeneh, Skyler Seto, Maureen de Seyssel, Jie Chi, Zijin Gu, Takuya Higuchi, Jee-weon Jung, Shinji Watanabe, David Grangier, Barry-John Theobald, Tatiana Likhomanenko

2603.05344 2026-03-16 cs.AI

Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

Nghi D. Q. Bui

Comments: Work in progress, new versions will be updated continuously

2603.02631 2026-03-16 cs.CL

Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models

Shubhangi Upasani, Ravi Shanker Raju, Bo Li, Mengmeng Ji, John Long, Chen Wu, Urmish Thakker, Guangtao Wang

Journal ref: ICLR 2026 (WS)

Abstract

Prompt length is a major bottleneck in agentic large language model (LLM) workloads, where repeated inference steps and multi-call loops incur substantial prefill cost. Recent work on speculative prefill demonstrates that attention-based token importance estimation can enable training-free prompt compression, but this assumes the existence of a draft model that shares the same tokenizer as the target model. In practice, however, agentic pipelines frequently employ models without any smaller in-family draft model. In this work, we study cross-family speculative prefill, where a lightweight draft model from one model family is used to perform prompt compression for a target model from a different family. Using the same speculative prefill mechanism as prior work, we evaluate a range of cross-family draft-target combinations, including Qwen, LLaMA, and DeepSeek models. Across a broad diversity of tasks, we find that attention-based token importance estimation transfers reliably across different model families despite differences in model architectures and tokenizers between draft and target models. Cross-model prompt compression largely retains 90–100% of full-prompt baseline performance and, in some cases, slightly improves accuracy due to denoising effects, while delivering substantial reductions in time to first token (TTFT). These results suggest that speculative prefill depends mainly on task priors and semantic structure, thus serving as a generalizable prompt compression primitive. We discuss the implications of our findings for agentic systems, where repeated long-context inference and heterogeneous model stacks make cross-model prompt compression both necessary and practical.
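
The core step summarized above is: score prompt tokens with a small draft model, keep the most important ones, and hand the shortened prompt to a target model with a different tokenizer. The sketch below illustrates only that selection and hand-off step, under stated assumptions: the draft model is assumed to have already produced `(start, end, score)` character spans with attention-based importance scores; scores are aggregated (by max) onto whitespace-delimited words as a tokenizer-agnostic unit, which is an illustrative choice and not necessarily how the paper bridges tokenizers; and the kept words are returned as plain text in their original order so the target can re-tokenize it. `compress_prompt` and its inputs are hypothetical.

```python
import re


def compress_prompt(text, draft_tokens, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of words by draft-model importance.

    draft_tokens: hypothetical list of (start, end, score) character spans
    from a draft model's tokenizer; score = attention-based importance.
    """
    # Tokenizer-agnostic units: whitespace-delimited words with char offsets.
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    # Word importance = max score of any draft token overlapping the word.
    scores = [
        max((s for ts, te, s in draft_tokens if ts < we and te > ws), default=0.0)
        for ws, we in words
    ]
    k = max(1, round(keep_ratio * len(words)))
    top = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore original order so the compressed prompt still reads left to right.
    return " ".join(text[words[i][0]:words[i][1]] for i in sorted(top))


text = "Paris is the capital of France"
# Hypothetical draft-model subword spans and importance scores.
toks = [(0, 3, 0.9), (3, 5, 0.2), (5, 8, 0.1), (8, 12, 0.05),
        (12, 20, 0.7), (20, 23, 0.1), (23, 30, 0.8)]
print(compress_prompt(text, toks, keep_ratio=0.5))  # Paris capital France
```

Passing text rather than token IDs is what lets the draft and target tokenizers differ; the cost is that the target re-tokenizes the compressed prompt, so token boundaries on the two sides need not line up.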

2603.02435 2026-03-16 cs.AI cs.LG

VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Comments: Published in Proceedings of the ACM Web Conference 2026 (WWW '26). This arXiv version includes extended supplementary material

2603.00947 2026-03-16 cs.CV

Mobile-VTON: High-Fidelity On-Device Virtual Try-On

Zhenchen Wan, Ce Chen, Runqi Lin, Jiaxin Huang, Tianxi Chen, Yanwu Xu, Tongliang Liu, Mingming Gong

Comments: The project page is available at: https://zhenchenwan.github.io/Mobile-VTON/

2603.00016 2026-03-16 cs.RO cs.AI cs.HC

Beyond Static Instruction: A Multi-agent AI Framework for Adaptive Augmented Reality Robot Training

Nicolas Leins, Jana Gonnermann-Müller, Malte Teichmann, Sebastian Pokutta

2602.23951 2026-03-16 cs.CV

AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors

Xiaozhen Qiao, Wenjia Wang, Zhiyuan Zhao, Jiacheng Sun, Ping Luo, Hongyuan Zhang, Xuelong Li

2602.23790 2026-03-16 cs.CV

Fourier Angle Alignment for Oriented Object Detection in Remote Sensing

Changyu Gu, Linwei Chen, Lin Gu, Ying Fu

Comments: Accepted by CVPR 2026

2602.23228 2026-03-16 cs.CV cs.AI

MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction

Yizhi Li, Xiaohan Chen, Miao Jiang, Wentao Tang, Gaoang Wang

Comments: 6 pages, CSCWD 2026

2602.20673 2026-03-16 cs.CV

GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation

Hao Zhang, Lue Fan, Qitai Wang, Wenbo Li, Zehuan Wu, Lewei Lu, Zhaoxiang Zhang, Hongsheng Li

2602.19531 2026-03-16 cs.LG cs.AI

A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations

Dingyi Nie, Yixing Wu, C. -C. Jay Kuo

Comments: Accepted for publication in APSIPA Transactions on Signal and Information Processing

DOI: 10.1108/ATSIP-02-2026-002
Journal ref: APSIPA Transactions on Signal and Information Processing, 15(1): 61-75, 2026

Abstract

Irregular multivariate time series with missing values present significant challenges for predictive modeling in domains such as healthcare. While deep learning approaches often focus on temporal interpolation or complex architectures to handle irregularities, we propose a simpler yet effective alternative: extracting time-agnostic summary statistics to eliminate the temporal axis. Our method computes four key features per variable (the mean and standard deviation of observed values, and the mean and variability of changes between consecutive observations) to create a fixed-dimensional representation. These features are then used with standard classifiers, such as logistic regression and XGBoost. Evaluated on four biomedical datasets (PhysioNet Challenge 2012, 2019, PAMAP2, and MIMIC-III), our approach achieves state-of-the-art performance, surpassing recent transformer and graph-based models by 0.5-1.7% in AUROC/AUPRC and 1.1-1.7% in accuracy/F1-score, while reducing computational complexity. Ablation studies demonstrate that feature extraction, not classifier choice, drives performance gains, and our summary statistics outperform raw/imputed input in most benchmarks. In particular, we identify scenarios where missing patterns themselves encode predictive signals, as in sepsis prediction (PhysioNet 2019), where missing indicators alone can achieve 94.2% AUROC with XGBoost, only 1.6% lower than using original raw data as input. Our results challenge the necessity of complex temporal modeling when task objectives permit time-agnostic representations, providing an efficient and interpretable solution for irregular time series classification.
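
The four per-variable features described above are simple to compute. What follows is a minimal sketch in plain Python, assuming each variable is a list of values with `None` marking missing observations, "changes between consecutive observations" meaning differences between successive *observed* values, "variability" meaning population standard deviation, and `NaN` where a statistic is undefined (tree models such as XGBoost accept missing values natively). The function name and input layout are hypothetical, not taken from the paper.

```python
import math
from statistics import fmean, pstdev


def summary_features(series):
    """series: dict mapping variable name -> list of values (None = missing).

    Returns a fixed-length vector with 4 features per variable, in sorted
    variable-name order: mean, std of observed values, mean, std of the
    differences between consecutive observed values.
    """
    feats = []
    for name in sorted(series):
        obs = [v for v in series[name] if v is not None]
        diffs = [b - a for a, b in zip(obs, obs[1:])]
        feats += [
            fmean(obs) if obs else math.nan,
            pstdev(obs) if obs else math.nan,
            fmean(diffs) if diffs else math.nan,
            pstdev(diffs) if diffs else math.nan,
        ]
    return feats


record = {"hr": [80, None, 84, 90], "temp": [None, 37.0, None, None]}
f = summary_features(record)
# hr:   mean 84.67, std ~4.11, diff-mean 5.0, diff-std 1.0
# temp: mean 37.0, std 0.0, diff-mean nan, diff-std nan
```

Every record maps to the same vector length regardless of how many time steps it has, which is what allows standard tabular classifiers to be used directly.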

2602.17908 2026-03-16 cs.RO

WHED: A Wearable Hand Exoskeleton for Natural, High-Quality Demonstration Collection

Mingzhang Zhu, Alvin Zhu, Jose Victor S. H. Ramos, Beom Jun Kim, Yike Shi, Yufeng Wu, Ruochen Hou, Quanyou Wang, Eric Song, Tony Fan, Yuchen Cui, Dennis W. Hong

Comments: This manuscript is withdrawn because the work is being substantially revised for submission to a peer-reviewed venue. The current version may be incomplete or misleading

2602.17077 2026-03-16 cs.CV

Cross Pseudo Labeling For Weakly Supervised Video Anomaly Detection

Dayeon Lee, Donghyeong Kim, Chaewon Park, Sungmin Woo, Sangyoun Lee

Comments: ICASSP 2026, https://github.com/eastbrother87/CPLVAD

2602.16891 2026-03-16 cs.AI cs.CR cs.SE

OpenSage: Self-programming Agent Generation Engine

Hongwei Li, Zhun Wang, Qinrun Dai, Yuzhou Nie, Jinjun Peng, Ruitong Liu, Jingyang Zhang, Kaijie Zhu, Jingxuan He, Lun Wang, Yangruibo Ding, Yueqi Chen, Wenbo Guo, Dawn Song

2602.14041 2026-03-16 cs.CV cs.AI

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Yali Wang, Huaibo Huang, Xiangyu Yue, Hao Chen

Comments: Code and models: https://github.com/shallowdream204/BitDance

2602.12740 2026-03-16 cs.CV cs.GR

SPRig: Self-Supervised Pose-Invariant Rigging from Mesh Sequences

Ruipeng Wang, Langkun Zhong, Miaowei Wang

Comments: Code: https://github.com/WANG-Ruipeng/SPRig

2602.08820 2026-03-16 cs.CV

Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing

Hao Yang, Zhiyu Tan, Jia Gong, Luozheng Qin, Hesen Chen, Xiaomeng Yang, Yuqing Sun, Yuetan Lin, Mengping Yang, Hao Li

Comments: Technical Report, Project: https://howellyoung-s.github.io/Omni-Video2-project/

2602.08116 2026-03-16 cs.RO

From Ellipsoids to Midair Control of Dynamic Hitches

Jiawei Xu, Subhrajit Bhattacharya, David Saldaña