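arXivDaily daily academic digest: synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.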

2603.05874 2026-03-09 cs.LG cs.SI

Stochastic Event Prediction via Temporal Motif Transitions

İbrahim Bahadır Altun, Ahmet Erdem Sarıyüce

English abstract

Networks of timestamped interactions arise across social, financial, and biological domains, where forecasting future events requires modeling both evolving topology and temporal ordering. Temporal link prediction methods typically frame the task as binary classification with negative sampling, discarding the sequential and correlated nature of real-world interactions. We introduce STEP (STochastic Event Predictor), a framework that reformulates temporal link prediction as a sequential forecasting problem in continuous time. STEP models event dynamics through discrete temporal motif transitions governed by Poisson processes, maintaining a set of open motif instances that evolve as new interactions arrive. At each step, the framework decides whether to initiate a new temporal motif or extend an existing one, selecting the most probable event via Bayesian scoring of temporal likelihoods and structural priors. STEP also produces compact, temporal motif-based feature vectors that can be concatenated with existing temporal graph neural network outputs, enriching their representations without architectural modifications. Experiments on five real-world datasets demonstrate up to 21% average precision gains over state-of-the-art baselines in classification and 0.99 precision in next $k$ sequential forecasting, with consistently lower runtime than competing motif-aware methods.
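The selection step described in the abstract (scoring candidate events by a temporal likelihood and a structural prior) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` class, the rates, and the priors are hypothetical, and the temporal likelihood is assumed to be the exponential waiting-time density of a Poisson process.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str    # hypothetical description of the candidate event
    rate: float   # assumed Poisson rate (events per unit time) of this transition
    prior: float  # assumed structural prior probability of this transition


def log_score(c: Candidate, dt: float) -> float:
    # log prior + log density of an exponential waiting time dt with rate c.rate
    return math.log(c.prior) + math.log(c.rate) - c.rate * dt


def most_probable(candidates, dt):
    # Pick the candidate with the highest (unnormalized) log posterior score.
    return max(candidates, key=lambda c: log_score(c, dt))


candidates = [
    Candidate("extend open motif u->v with v->w", rate=2.0, prior=0.6),
    Candidate("start new motif at x->y", rate=0.5, prior=0.4),
]

# A short gap favors the fast transition; a long gap favors the slow one.
best_short = most_probable(candidates, dt=0.1)
best_long = most_probable(candidates, dt=3.0)
print(best_short.label)
print(best_long.label)
```

Under these assumed numbers, the fast "extend" transition wins after a short gap, while a long gap makes the slower "start new motif" transition more plausible, which is the kind of trade-off a likelihood-times-prior score captures.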

2603.05873 2026-03-09 cs.CV

Shifting Adaptation from Weight Space to Memory Space: A Memory-Augmented Agent for Medical Image Segmentation

Bowen Chen, Qiaohui Gao, Shaowen Wan, Shanhui Sun, Wei Liu, Xiang Li, Tianming Liu, Lin Zhao

2603.05868 2026-03-09 cs.RO

AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust Vision-Language-Action Models

Hyeongjun Heo, Seungyeon Woo, Sang Min Kim, Junho Kim, Junho Lee, Yonghyeon Lee, Young Min Kim

Comments Under review, Project Page: https://heo0224.github.io/AnyCamVLA/

2603.05861 2026-03-09 cs.RO

DexEMG: Towards Dexterous Teleoperation System via EMG2Pose Generalization

Qianyou Zhao, Wenqiao Li, Chiyu Wang, Kaifeng Zhang

2603.05860 2026-03-09 cs.AI cs.CV

Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery

Lin Fan, Pengyu Dai, Zhipeng Deng, Haolin Wang, Xun Gong, Yefeng Zheng, Yafei Ou

Comments 18 pages, 4 figures, 3 tables

2603.05852 2026-03-09 cs.SD

How Well Do Current Speech Deepfake Detection Methods Generalize to the Real World?

Daixian Li, Jun Xue, Yanzhen Ren, Zhuolin Yi, Yihuan Huang, Guanxiang Feng, Yi Chai

Comments Submitted to Interspeech 2026

2603.05851 2026-03-09 cs.CV

VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction

Muhua Zhu, Xinhao Jin, Yu Zhang, Yifei Xue, Tie Ji, Yizhen Lao

2603.05845 2026-03-09 cs.CV

Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation

Haonan Wang, Hanyu Zhou, Haoyue Liu, Tao Gu, Luxin Yan

2603.05844 2026-03-09 cs.CV cs.AI

Remote Sensing Image Classification Using Deep Ensemble Learning

Niful Islam, Md. Rayhan Ahmed, Nur Mohammad Fahad, Salekul Islam, A. K. M. Muzahidul Islam, Saddam Mukta, Swakkhar Shatabda

2603.05842 2026-03-09 cs.RO

Expert Knowledge-driven Reinforcement Learning for Autonomous Racing via Trajectory Guidance and Dynamics Constraints

Bo Leng, Weiqi Zhang, Zhuoren Li, Lu Xiong, Guizhe Jin, Ran Yu, Chen Lv

2603.05837 2026-03-09 cs.RO

Terrain characterization and locomotion adaptation in a small-scale lizard-inspired robot

Duncan Andrews, Landon Zimmerman, Evan Martin, Joe DiGennaro, Baxi Chong

Comments 7 pages. 9 figures. IROS 2026 Conference

2603.05830 2026-03-09 cs.RO

OpenHEART: Opening Heterogeneous Articulated Objects with a Legged Manipulator

Seonghyeon Lim, Hyeonwoo Lee, Seunghyun Lee, I Made Aswin Nahrendra, Hyun Myung

Comments 8 pages

2603.05828 2026-03-09 cs.CL

HART: Data-Driven Hallucination Attribution and Evidence-Based Tracing for Large Language Models

Shize Liang, Hongzhi Wang

2603.05822 2026-03-09 cs.LG

Self-Auditing Parameter-Efficient Fine-Tuning for Few-Shot 3D Medical Image Segmentation

Son Thai Ly, Hien V. Nguyen

2603.05818 2026-03-09 cs.CL

RouteGoT: Node-Adaptive Routing for Cost-Efficient Graph of Thoughts Reasoning

Yuhang Liu, Ruijie Wang, Yunlong Chu, Bing Hao, Yumeng Lin, Shengzhong Liu, Minglai Shao

English abstract

Large Language Models (LLMs) excel at multi-step reasoning, yet increasing the structural complexity of inference does not consistently improve system-level returns. Methods such as Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Adaptive Graph of Thoughts (AGoT) can boost accuracy on some benchmarks, but often introduce substantial overhead in token consumption and latency, and their gains can be unstable across task distributions, sometimes underperforming simpler Chain-of-Thought (CoT) or direct input-output (IO) prompting. We attribute this inefficiency to stage-wise and node-wise heterogeneity inside GoT-style reasoning pipelines: high-quality planning and final synthesis are globally coupled and typically benefit from strong models, whereas many intermediate subtasks are localized and can be solved accurately by lighter models with far fewer tokens. Motivated by these observations, we propose RouteGoT, a budget-controllable, node-adaptive routing framework for graph-structured reasoning. RouteGoT performs in-graph routing by prioritizing strong models for planning and synthesis, while dynamically allocating lightweight models and cost-effective strategies to leaf subtasks based on predicted difficulty. It further integrates explicit budget constraints into a global inference scheduler to control graph expansion under a user-specified token budget, enabling predictable performance-cost trade-offs. Experiments across reasoning, retrieval, and multi-hop QA benchmarks show that RouteGoT matches or improves accuracy while substantially reducing token usage; specifically, it achieves an average accuracy improvement of 8.1 percentage points and a 79.1% reduction in output tokens compared to AGoT. Furthermore, RouteGoT outperforms existing routing baselines by maintaining a superior cost-accuracy trade-off, demonstrating improved robustness under varying budget targets and tasks.
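The routing idea in the abstract (strong models for planning and synthesis, lighter models for easy leaf subtasks, all under a token budget) can be sketched as follows. This is a toy illustration rather than RouteGoT itself: the `Node` class, the two cost constants, the difficulty scores, and the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

STRONG_COST = 800  # assumed tokens per strong-model call (illustrative)
LIGHT_COST = 150   # assumed tokens per light-model call (illustrative)


@dataclass
class Node:
    name: str
    role: str          # "plan", "leaf", or "synthesis"
    difficulty: float  # hypothetical predicted difficulty in [0, 1]


def route(nodes, budget, threshold=0.7):
    """Assign a model tier to every node without exceeding the token budget."""
    core = [n for n in nodes if n.role != "leaf"]
    reserved = STRONG_COST * len(core)  # planning and synthesis always get the strong model
    if budget < reserved:
        raise ValueError("budget too small for planning and synthesis")
    assignments = {n.name: "strong" for n in core}
    remaining = budget - reserved
    for n in nodes:
        if n.role != "leaf":
            continue
        if n.difficulty >= threshold and remaining >= STRONG_COST:
            tier, cost = "strong", STRONG_COST
        elif remaining >= LIGHT_COST:
            tier, cost = "light", LIGHT_COST
        else:
            tier, cost = "skipped", 0  # no budget left: do not expand this leaf
        assignments[n.name] = tier
        remaining -= cost
    return assignments, budget - remaining


graph = [
    Node("plan", "plan", 0.0),
    Node("a", "leaf", 0.2),
    Node("b", "leaf", 0.9),
    Node("c", "leaf", 0.4),
    Node("synthesize", "synthesis", 0.0),
]
roomy, spent_roomy = route(graph, budget=2700)
tight, spent_tight = route(graph, budget=2000)
print(roomy, spent_roomy)
print(tight, spent_tight)
```

With the larger budget the hard leaf `b` is sent to the strong model; with the smaller budget it is downgraded to the light model and the last leaf is skipped, so spending never exceeds the limit.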

2603.05815 2026-03-09 cs.RO

Hierarchical Latent Action Model

Hanjung Kim, Lerrel Pinto, Seon Joo Kim

Comments ICLR 2026 Workshop - 2nd Workshop on World Models: Understanding, Modelling and Scaling

2603.05812 2026-03-09 cs.CV cs.AI cs.LG

Margin and Consistency Supervision for Calibrated and Robust Vision Models

Salim Khazem

2603.05807 2026-03-09 cs.CV

EventGeM: Global-to-Local Feature Matching for Event-Based Visual Place Recognition

Adam D. Hines, Gokul B. Nair, Nicolás Marticorena, Michael Milford, Tobias Fischer

Comments 10 pages, 4 figures, 5 tables, under review

2603.05806 2026-03-09 cs.LG

MoE Lens -- An Expert Is All You Need

Marmik Chaudhari, Idhant Gulati, Nishkal Hundia, Pranav Karra, Shivam Raval

Comments 15 pages, 10 figures, ICLR 2025 Workshop on Sparsity in LLMs (SLLM)

2603.05805 2026-03-09 cs.LG

Sparse Crosscoders for diffing MoEs and Dense models

Marmik Chaudhari, Nishkal Hundia, Idhant Gulati

Comments 5 pages, 3 figures

2603.05804 2026-03-09 cs.RO

CDF-Glove: A Cable-Driven Force Feedback Glove for Dexterous Teleoperation

Huayue Liang, Ruochong Li, Yaodong Yang, Long Zeng, Yuanpei Chen, Xueqian Wang

2603.05787 2026-03-09 cs.CV

Spectral Probing of Feature Upsamplers in 2D-to-3D Scene Reconstruction

Ling Xiao, Yuliang Xiu, Yue Chen, Guoming Wang, Toshihiko Yamasaki

English abstract

A typical 2D-to-3D pipeline takes multi-view images as input, where a Vision Foundation Model (VFM) extracts features that are spatially upsampled to dense representations for 3D reconstruction. If dense features across views preserve geometric consistency, differentiable rendering can recover an accurate 3D representation, making the feature upsampler a critical component. Recent learnable upsampling methods mainly aim to enhance spatial details, such as sharper geometry or richer textures, yet their impact on 3D awareness remains underexplored. To address this gap, we introduce a spectral diagnostic framework with six complementary metrics that characterize amplitude redistribution, structural spectral alignment, and directional stability. Across classical interpolation and learnable upsampling methods on CLIP and DINO backbones, we observe three key findings. First, structural spectral consistency (SSC/CSC) is the strongest predictor of novel view synthesis (NVS) quality, whereas High-Frequency Spectral Slope Drift (HFSS) often correlates negatively with reconstruction performance, indicating that emphasizing high-frequency details alone does not necessarily improve 3D reconstruction. Second, geometry and texture respond to different spectral properties: Angular Energy Consistency (ADC) correlates more strongly with geometry-related metrics, while SSC/CSC influence texture fidelity slightly more than geometric accuracy. Third, although learnable upsamplers often produce sharper spatial features, they rarely outperform classical interpolation in reconstruction quality, and their effectiveness depends on the reconstruction model. Overall, our results indicate that reconstruction quality is more closely related to preserving spectral structure than to enhancing spatial detail, highlighting spectral consistency as an important principle for designing upsampling strategies in 2D-to-3D pipelines.
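To make the notion of a spectral diagnostic concrete, the sketch below compares how two classical upsamplers redistribute spectral energy in a 1D feature row. It does not reproduce any of the paper's six metrics: `high_freq_fraction` is a hypothetical stand-in (the share of non-DC power above a quarter of the sampling rate), and the input row is made up.

```python
import cmath


def dft_power(signal):
    # Power spectrum |X[k]|^2 from a direct (O(n^2)) discrete Fourier transform.
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))) ** 2
        for k in range(n)
    ]


def high_freq_fraction(signal):
    # Share of non-DC power whose folded frequency index exceeds n / 4.
    power = dft_power(signal)
    n = len(signal)
    low = high = 0.0
    for k in range(1, n):  # k = 0 is the DC term and is skipped
        if min(k, n - k) > n / 4:
            high += power[k]
        else:
            low += power[k]
    total = low + high
    return high / total if total else 0.0


def upsample_nearest(x):
    # 2x nearest-neighbor upsampling: repeat every sample.
    return [v for v in x for _ in range(2)]


def upsample_linear(x):
    # 2x linear upsampling with circular wrap-around at the boundary.
    out = []
    for i, v in enumerate(x):
        out += [v, (v + x[(i + 1) % len(x)]) / 2]
    return out


row = [0.0, 1.0, 0.0, 0.0, 2.0, 1.0]  # made-up low-resolution feature row
hf_nearest = high_freq_fraction(upsample_nearest(row))
hf_linear = high_freq_fraction(upsample_linear(row))
print(hf_nearest, hf_linear)
```

Nearest-neighbor upsampling leaves a larger share of energy at high frequencies than linear interpolation, since the linear kernel attenuates the spectral images more strongly. A diagnostic of this kind measures how an upsampler reshapes the spectrum, independent of how sharp the output looks.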

2603.05783 2026-03-09 cs.RO

Task-Level Decisions to Gait Level Control: A Hierarchical Policy Approach for Quadruped Navigation

Sijia Li, Haoyu Wang, Shenghai Yuan, Yizhuo Yang, Thien-Minh Nguyen

Comments Submitted to IROS 2026

2603.05781 2026-03-09 cs.CV cs.AI

Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval

Donghoon Han, Eunhwan Park, Seunghyeon Seo

2603.05778 2026-03-09 cs.CL

Tutor Move Taxonomy: A Theory-Aligned Framework for Analyzing Instructional Moves in Tutoring

Zhuqian Zhou, Kirk Vanacore, Tamisha Thompson, Jennifer St John, Rene Kizilcec

2603.05776 2026-03-09 cs.CL cs.AI

PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models

Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Ashley Hagaman, Sarah Lowe, Aimee Roundtree

2603.05774 2026-03-09 cs.LG cs.DC

First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints

Zhankun Luo, Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi

2603.05769 2026-03-09 cs.CV

Layer-wise Instance Binding for Regional and Occlusion Control in Text-to-Image Diffusion Transformers

Ruidong Chen, Yancheng Bai, Xuanpu Zhang, Jianhao Zeng, Lanjun Wang, Dan Song, Lei Sun, Xiangxiang Chu, Anan Liu

Comments Accepted by CVPR26

English abstract

Region-instructed layout control in text-to-image generation is highly practical, yet existing methods suffer from two limitations: (i) training-based approaches inherit data bias and often degrade image quality, and (ii) current techniques struggle with occlusion order, limiting real-world usability. To address these issues, we propose LayerBind. By modeling regional generation as distinct layers and binding them during generation, our method enables precise regional and occlusion controllability. Our motivation stems from the observation that spatial layout and occlusion are established at a very early denoising stage, suggesting that rearranging the early latent structure is sufficient to modify the final output. Building on this, we structure the scheme into two phases: instance initialization and subsequent semantic nursing. (1) First, leveraging the contextual sharing mechanism in multimodal joint attention, Layer-wise Instance Initialization creates per-instance branches that attend to their own regions while anchoring to the shared background. At a designated early step, these branches are fused according to the layer order to form a unified latent with a pre-established layout. (2) Then, Layer-wise Semantic Nursing reinforces regional details and maintains the occlusion order via a layer-wise attention enhancement. Specifically, a sequential layered attention path operates alongside the standard global path, with updates composited under a layer-transparency scheduler. LayerBind is training-free and plug-and-play, serving as a regional and occlusion controller across Diffusion Transformers. Beyond generation, it natively supports editable workflows, allowing for flexible modifications like changing instances or rearranging visible orders. Both qualitative and quantitative results demonstrate LayerBind's effectiveness, highlighting its strong potential for creative applications.
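The fusion step mentioned in the abstract (per-instance branches merged according to layer order) can be pictured as back-to-front compositing. The sketch below is a deliberately simplified assumption: it works on plain 2D grids with binary region masks, whereas the method described operates on diffusion latents and attention paths, and `fuse_layers` is a hypothetical helper name.

```python
def fuse_layers(background, layers):
    """Composite (latent, mask) pairs over the background, ordered back to front."""
    fused = [row[:] for row in background]  # copy so the background is not mutated
    for latent, mask in layers:
        for r, mask_row in enumerate(mask):
            for c, inside in enumerate(mask_row):
                if inside:
                    fused[r][c] = latent[r][c]  # later layers overwrite earlier ones
    return fused


background = [[0, 0, 0, 0]]
instance_a = ([[1, 1, 1, 1]], [[1, 1, 1, 0]])  # occupies columns 0-2
instance_b = ([[2, 2, 2, 2]], [[0, 1, 1, 1]])  # occupies columns 1-3

b_in_front = fuse_layers(background, [instance_a, instance_b])
a_in_front = fuse_layers(background, [instance_b, instance_a])
print(b_in_front)  # [[1, 2, 2, 2]]
print(a_in_front)  # [[1, 1, 1, 2]]
```

Swapping the layer order changes which instance wins in the overlapping columns, which is the occlusion behavior that a layer-ordered fusion is meant to control.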

2603.05767 2026-03-09 cs.RO

Multi-Robot Trajectory Planning via Constrained Bayesian Optimization and Local Cost Map Learning with STL-Based Conflict Resolution

Sourav Raxit, Abdullah Al Redwan Newaz, Jose Fuentes, Paulo Padrao, Ana Cavalcanti, Leonardo Bobadilla

Comments Accepted to ICRA 2026

2603.05764 2026-03-09 cs.LG cs.AI

TML-Bench: Benchmark for Data Science Agents on Tabular ML Tasks

Mykola Pinchuk

Comments 19 pages, 16 tables and figures