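arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.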

2603.01864 2026-03-03 cs.CV cs.RO

Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling

Alexander Prutsch, David Schinagl, Horst Possegger

Comments WACV 2026 Oral. Project Page at https://a-pru.github.io/seam/

Abstract

Future trajectories of neighboring traffic agents have a significant influence on the path planning and decision-making of autonomous vehicles. While trajectory forecasting is a well-studied field, research mainly focuses on snapshot-based prediction, where each scenario is treated independently of its global temporal context. However, real-world autonomous driving systems need to operate in a continuous setting, requiring real-time processing of data streams with low latency and consistent predictions over successive timesteps. We leverage this continuous setting to propose a lightweight yet highly accurate streaming-based trajectory forecasting approach. We integrate valuable information from previous predictions with a novel endpoint-aware modeling scheme. Our temporal context propagation uses the trajectory endpoints of the previous forecasts as anchors to extract targeted scenario context encodings. Our approach efficiently guides its scene encoder to extract highly relevant context information without needing refinement iterations or segment-wise decoding. Our experiments highlight that our approach effectively relays information across consecutive timesteps. Unlike methods using multi-stage refinement processing, our approach significantly reduces inference latency, making it well-suited for real-world deployment. We achieve state-of-the-art streaming trajectory prediction results on the Argoverse 2 multi-agent and single-agent benchmarks, while requiring substantially fewer resources.

2603.01863 2026-03-03 cs.LG cs.AI

Tide: A Customisable Dataset Generator for Anti-Money Laundering Research

Montijn van den Beukel, Jože Martin Rožanec, Ana-Lucia Varbanescu

Comments Synthetic AML transaction datasets (Tide, HI and LI variants) are available at https://doi.org/10.5281/zenodo.18804069

2603.01850 2026-03-03 cs.RO cs.CV cs.SY eess.SY

Tiny-DroNeRF: Tiny Neural Radiance Fields aboard Federated Learning-enabled Nano-drones

Ilenia Carboni, Elia Cereda, Lorenzo Lamberti, Daniele Malpetti, Francesco Conti, Daniele Palossi

2602.23697 2026-03-03 cs.CV

Towards Source-Aware Object Swapping with Initial Noise Perturbation

Jiahui Zhan, Xianbing Sun, Xiangnan Zhu, Yikun Ji, Ruitong Liu, Liqing Zhang, Jianfu Zhang

Comments This paper is accepted by CVPR 2026 Findings

2602.23589 2026-03-03 cs.CV cs.AI

Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models

Hiroshi Sasaki

Comments 9 pages, 3 figures

2602.22948 2026-03-03 cs.CV

ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization

Jiayu Chen, Ruoyu Lin, Zihao Zheng, Jingxin Li, Maoliang Li, Guojie Luo, Xiang Chen

Comments ToProVAR is honored to be accepted by ICLR 2026

2602.21101 2026-03-03 cs.CV cs.RO

Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones

Rong Zou, Marco Cannici, Davide Scaramuzza

2602.20160 2026-03-03 cs.CV

tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Chen Wang, Hao Tan, Wang Yifan, Zhiqin Chen, Yuheng Liu, Kalyan Sunkavalli, Sai Bi, Lingjie Liu, Yiwei Hu

Comments Accepted by CVPR 2026. Project Page: https://cwchenwang.github.io/tttLRM

2602.19385 2026-03-03 cs.CV cs.CL cs.LG

Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition

Minxue Tang, Yangyang Yu, Aolin Ding, Maziyar Baran Pouyan, Taha Belkhouja, Yujia Bao

2602.19096 2026-03-03 cs.LG

The Power of Decaying Steps: Enhancing Attack Stability and Transferability for Sign-based Optimizers

Wei Tao, Yang Dai, Jincai Huang, Qing Tao

Comments CVPR 2026

2602.18873 2026-03-03 cs.CV cs.AI

BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation

Miaowei Wang, Qingxuan Yan, Zhi Cao, Yayuan Li, Oisin Mac Aodha, Jason J. Corso, Amir Vaxman

Comments Accepted to CVPR 2026

2602.18726 2026-03-03 cs.CV cs.LG

WiCompass: Oracle-driven Data Scaling for mmWave Human Pose Estimation

Bo Liang, Chen Gong, Haobo Wang, Qirui Liu, Rungui Zhou, Fengzhi Shao, Yubo Wang, Wei Gao, Kaichen Zhou, Guolong Cui, Chenren Xu

Comments This paper has been accepted by The 32nd Annual International Conference on Mobile Computing and Networking (MobiCom'26)

2602.17616 2026-03-03 cs.LG cs.AI

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

Luke J. Huang, Zhuoyang Zhang, Qinghao Hu, Shang Yang, Song Han

2602.12021 2026-03-03 cs.LG

Improved state mixing in higher-order and block diagonal linear recurrent networks

Igor Dubinin, Antonio Orvieto, Felix Effenberger

2602.00654 2026-03-03 cs.LG

PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting

Jiaming Ma, Qihe Huang, Haofeng Ma, Guanjun Wang, Sheng Huang, Zhengyang Zhou, Pengkun Wang, Binwu Wang, Yang Wang

2601.22308 2026-03-03 cs.LG cs.AI cs.CR

Stealthy Poisoning Attacks Bypass Defenses in Regression Settings

Javier Carnerero-Cano, Luis Muñoz-González, Phillippa Spencer, Emil C. Lupu

2601.21895 2026-03-03 cs.CL cs.AI stat.ML

Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text

Hongyi Zhou, Jin Zhu, Kai Ye, Ying Yang, Erhan Xu, Chengchun Shi

Comments Accepted by ICLR2026

2601.18753 2026-03-03 cs.LG cs.AI

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Xinyue Zeng, Junhong Lin, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou

Comments Accepted by The Fourteenth International Conference on Learning Representations (ICLR'26)

2601.14492 2026-03-03 cs.RO

UNCLE-Grasp: Uncertainty-Aware Grasping of Leaf-Occluded Strawberries

Malak Mansour, Ali Abouzeid, Zezhou Sun, Qinbo Sun, Dezhen Song, Abdalla Swikir

2601.08133 2026-03-03 cs.CV cs.AI

How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?

Yujian Lee, Peng Gao, Yongqi Xu, Wentao Fan

2512.20745 2026-03-03 cs.AI cs.CL cs.LG

AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent

Haipeng Luo, Huawen Feng, Qingfeng Sun, Can Xu, Kai Zheng, Yufei Wang, Tao Yang, Han Hu, Yansong Tang

Comments This paper has been accepted to ICLR 2026

Abstract

Large Reasoning Models (LRMs) like o3 and DeepSeek-R1 have achieved remarkable progress in reasoning tasks with long chain-of-thought (CoT). However, they remain computationally inefficient and struggle with accuracy when solving problems requiring complex mathematical operations. In this work, we present AgentMath, an agent framework that seamlessly integrates language models' reasoning capabilities with code interpreters' computational precision to efficiently tackle complex mathematical problems. Our approach introduces three key innovations: (1) An automated method that converts natural language chain-of-thought into structured tool-augmented trajectories, generating high-quality supervised fine-tuning (SFT) data to alleviate data scarcity; (2) A novel agentic reinforcement learning (RL) paradigm that dynamically interleaves natural language generation with real-time code execution. This enables models to autonomously learn optimal tool-use strategies through multi-round interactive feedback, while fostering emergent capabilities in code refinement and error correction; (3) An efficient training system incorporating innovative techniques, including request-level asynchronous rollout scheduling, agentic partial rollout, and prefix-aware weighted load balancing, achieving a 4-5x speedup and making efficient RL training feasible on ultra-long sequences in scenarios with massive tool invocation. The evaluations show that AgentMath achieves state-of-the-art performance on challenging mathematical competition benchmarks including AIME24, AIME25, and HMMT25. Specifically, AgentMath-30B-A3B attains 90.6%, 86.4%, and 73.8% accuracy respectively, surpassing OpenAI-o3-mini and Claude-Opus-4.0-Thinking while remaining competitive with OpenAI-o3, Gemini-2.5-Pro, and DeepSeek-R1-671B-0528. These results validate the effectiveness of our approach and pave the way for building scalable mathematical reasoning agents.

2512.13989 2026-03-03 cs.LG

A Single Architecture for Representing Invariance Under Any Space Group

Cindy Y. Zhang, Elif Ertekin, Peter Orbanz, Ryan P. Adams

Comments 24 pages, 7 figures. ICLR 2026

2512.12046 2026-03-03 cs.LG cs.RO cs.SY eess.SY stat.ML

Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

Vittorio Giammarino, Ahmed H. Qureshi

2512.10683 2026-03-03 cs.CV cs.LG

Optimal transport unlocks end-to-end learning for single-molecule localization

Romain Seailles, Jean-Baptiste Masson, Jean Ponce, Julien Mairal

2512.10477 2026-03-03 cs.RO cs.NE

Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots

Timur Ishuov, Michele Folgheraiter, Madi Nurmanov, Goncalo Gordo, Richárd Farkas, József Dombi

Comments https://github.com/SuspensionRailway/symphony

Abstract

In our work we implicitly suggest that it is a misconception to think that humans learn fast. The learning process takes time. Babies start learning to move in the restricted fluid environment of the womb. Children are often limited by an underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for tens of millions of steps. "Swaddling" regularization restrains an agent during rapid but unstable development by penalizing action strength in a specific way that does not affect actions directly. Symphony, a Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas aimed at making it possible to train humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is well known that a continuous increase in Gaussian noise without appropriate smoothing is harmful to motors and gearboxes. In contrast to stochastic algorithms, we set limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are submerged in weaker noise. When actions require more extreme values, they rise above the weak noise. Training becomes empirically much safer for both the surrounding environment and the robot's mechanisms. We use a Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability so that the memory contains a recent memory and a long-term memory trail. The Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update the Actor and Critic in one pass, as well as to combine the Actor and Critic in one Object and implement their Losses in one line.
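
To make the replay-buffer idea in the abstract above more concrete, here is a minimal sketch of age-dependent sampling driven by a hyperbolic tangent. The abstract does not give the paper's formula, so everything here is an assumption for illustration only: the weight function, the `sharpness` and `floor` parameters, and the function names `fading_weights` and `sample_batch` are hypothetical and are not taken from the Symphony code.

```python
import math
import random


def fading_weights(size, sharpness=2.0, floor=0.05):
    """Return unnormalized sampling weights for a buffer ordered oldest to newest.

    Hypothetical formula (an assumption, not the paper's):
        w(x) = floor + (1 - floor) * tanh(sharpness * x) / tanh(sharpness)
    where x is the position in the buffer scaled to [0, 1].
    The oldest entry keeps weight `floor` (a long-term trail) and the
    newest entry gets weight 1 (recent memory).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if size == 1:
        return [1.0]
    scale = math.tanh(sharpness)
    return [
        floor + (1.0 - floor) * math.tanh(sharpness * i / (size - 1)) / scale
        for i in range(size)
    ]


def sample_batch(buffer, batch_size, rng=random):
    """Draw a batch with replacement, favoring more recent transitions."""
    weights = fading_weights(len(buffer))
    return rng.choices(buffer, weights=weights, k=batch_size)


# Usage: integers stand in for stored transitions, oldest first.
transitions = list(range(1000))
batch = sample_batch(transitions, batch_size=8, rng=random.Random(0))
print(batch)
```

In this sketch old transitions never drop to zero probability, which is one plausible reading of the "long-term memory trail" mentioned in the abstract; the actual scheme may differ.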

2512.01351 2026-03-03 cs.AI

Benchmarking Overton Pluralism in LLMs

Elinor Poole-Dayan, Jiayi Wu, Taylor Sorensen, Jiaxin Pei, Michiel A. Bakker

Comments Paper accepted to ICLR 2026

2511.21722 2026-03-03 cs.CL cs.AI cs.CY

German General Social Survey Personas: A Survey-Derived Persona Prompt Collection for Population-Aligned LLM Studies

Jens Rupprecht, Leon Fröhling, Claudia Wagner, Markus Strohmaier

Comments 20 pages, 7 figures

2511.06499 2026-03-03 cs.CV

SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports

Haotian Xia, Haonan Ge, Junbo Zou, Hyun Woo Choi, Xuebin Zhang, Danny Suradja, Botao Rui, Ethan Tran, Wendy Jin, Zhen Ye, Xiyang Lin, Christopher Lai, Shengjie Zhang, Junwen Miao, Shichao Chen, Rhys Tracy, Vicente Ordonez, Weining Shen, Hanjie Chen

Abstract

Deeply understanding sports requires an intricate blend of fine-grained visual perception and rule-based reasoning - a challenge that pushes the limits of current multimodal models. To succeed, models must master three critical capabilities: perceiving nuanced visual details, applying abstract sport rule knowledge, and grounding that knowledge in specific visual evidence. Current sports benchmarks either cover single sports or lack the detailed reasoning chains and precise visual grounding needed to robustly evaluate these core capabilities in a multi-sport context. To address this gap, we introduce SportR, the first multi-sports large-scale benchmark designed to train and evaluate MLLMs on the fundamental reasoning required for sports intelligence. Our benchmark provides a dataset of 4,789 images and 2,052 videos. To enable granular evaluation, we structure our benchmark around a progressive hierarchy of question-answer pairs designed to probe reasoning at increasing depths - from simple infraction identification to complex penalty prediction. For the most advanced tasks requiring multi-step reasoning, such as determining penalties or explaining tactics, we provide 6,841 high-quality, human-authored Chain of Thought annotations. In addition, our benchmark incorporates both image and video modalities and provides manual bounding box annotations to test visual grounding in the image part directly. Extensive experiments demonstrate the profound difficulty of our benchmark. State-of-the-art baseline models perform poorly on our most challenging tasks. While training on our data via Supervised Fine-Tuning and Reinforcement Learning improves these scores, they remain relatively low, highlighting a significant gap in current model capabilities. SportR presents a new challenge for the community, providing a critical resource to drive future research in multimodal sports reasoning.

2511.01199 2026-03-03 cs.RO

Closed-loop Control of Steerable Balloon Endoscopes for Robot-assisted Transcatheter Intracardiac Procedures

Max McCandless, Jonathan Hamid, Sammy Elmariah, Nathaniel Langer, Pierre E. Dupont

Comments 8 pages, 11 figures

2510.24482 2026-03-03 cs.LG cs.AI cs.RO

Sample-efficient and Scalable Exploration in Continuous-Time RL

Klemens Iten, Lenart Treven, Bhavya Sukhija, Florian Dörfler, Andreas Krause

Comments 28 pages, 8 figures, 6 tables. Published as a conference paper at ICLR 2026