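arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.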

2509.08743 2026-03-24 cs.RO

Parallel, Asymptotically Optimal Algorithms for Moving Target Traveling Salesman Problems

Anoop Bhat, Geordan Gutow, Bhaskar Vundurthy, Zhongqiang Ren, Sivakumar Rathinam, Howie Choset

Abstract

The Moving Target Traveling Salesman Problem (MT-TSP) seeks a trajectory that intercepts several moving targets, within a particular time window for each target. When generic nonlinear target trajectories or kinematic constraints on the agent are present, no prior algorithm guarantees convergence to an optimal MT-TSP solution. Therefore, we introduce the Iterated Random Generalized (IRG) TSP framework. The idea behind IRG is to alternate between randomly sampling a set of agent configuration-time points, corresponding to interceptions of targets, and finding a sequence of interception points by solving a generalized TSP (GTSP). This alternation asymptotically converges to the optimum. We introduce two parallel algorithms within the IRG framework. The first algorithm, IRG-PGLNS, solves GTSPs using PGLNS, our parallelized extension of the state-of-the-art solver GLNS. The second algorithm, Parallel Communicating GTSPs (PCG), solves GTSPs for several sets of points simultaneously. We present numerical results for three MT-TSP variants: one where intercepting a target only requires coming within a particular distance, another where the agent is a variable-speed Dubins car, and a third where the agent is a robot arm. We show that IRG-PGLNS and PCG converge faster than a baseline based on prior work. We further validate our framework with physical robot experiments.
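To make the sample-then-solve alternation concrete, here is a toy sketch under loudly stated assumptions: targets move along straight lines at constant velocity, the agent moves in the plane at unit speed and may wait, interception means being at the target's position at the sampled time, and the GTSP is solved exactly by brute force, which only works for a handful of targets. It is not IRG-PGLNS or PCG, and every name and number in it (TARGETS, sample_points, solve_gtsp, irg) is hypothetical rather than taken from the paper.

```python
import itertools
import math
import random

# Hypothetical toy instance (not from the paper): each target moves along a straight
# line at constant velocity and can be intercepted at any time in [t_open, t_close].
TARGETS = [
    # (x0, y0, vx, vy, t_open, t_close)
    (5.0, 0.0, 0.0, 0.5, 0.0, 60.0),
    (0.0, 6.0, 0.3, 0.0, 0.0, 60.0),
    (-4.0, -3.0, 0.0, 0.2, 0.0, 60.0),
]
AGENT_SPEED = 1.0          # assumed maximum agent speed
START = ((0.0, 0.0), 0.0)  # (position, time): the agent starts at the origin at t = 0


def target_position(target, t):
    x0, y0, vx, vy, _, _ = target
    return (x0 + vx * t, y0 + vy * t)


def sample_points(target, k, rng):
    """Randomly sample k configuration-time points (position, time) on a target's path."""
    t_open, t_close = target[4], target[5]
    return [(target_position(target, t), t)
            for t in (rng.uniform(t_open, t_close) for _ in range(k))]


def reachable(a, b):
    """True if the agent can move from point a to point b in time (waiting is allowed)."""
    (pa, ta), (pb, tb) = a, b
    return tb >= ta and math.dist(pa, pb) <= AGENT_SPEED * (tb - ta)


def solve_gtsp(clusters):
    """Exact brute-force GTSP: pick one point per cluster and an order, minimizing the
    time of the last interception. Only viable for a handful of targets."""
    best_time, best_tour = math.inf, None
    for order in itertools.permutations(range(len(clusters))):
        frontier = {START: []}  # reachable point -> tour (list of (cluster, point)) reaching it
        for c in order:
            nxt = {}
            for p in clusters[c]:
                for q, tour in frontier.items():
                    if reachable(q, p):
                        nxt[p] = tour + [(c, p)]
                        break
            frontier = nxt
            if not frontier:
                break  # this order is infeasible with the current samples
        for p, tour in frontier.items():
            if p[1] < best_time:
                best_time, best_tour = p[1], tour
    return best_time, best_tour


def irg(targets, iterations=10, samples_per_iter=10, seed=0):
    """Alternate between sampling new points for every target and re-solving the GTSP
    over all points sampled so far; returns the best completion time after each round."""
    rng = random.Random(seed)
    clusters = [[] for _ in targets]
    history = []
    for _ in range(iterations):
        for cluster, target in zip(clusters, targets):
            cluster.extend(sample_points(target, samples_per_iter, rng))
        best_time, _ = solve_gtsp(clusters)
        history.append(best_time)
    return history


history = irg(TARGETS)
print(history)
```

Because this sketch keeps every sampled point, the previous best tour stays feasible in each new round, so the reported completion time can only stay the same or decrease. The paper's own sampling scheme, kinematic models, and GLNS-based parallel solvers are not reproduced here.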

2508.09176 2026-03-24 cs.LG cs.AI

DQT: Dynamic Quantization Training via Dequantization-Free Nested Integer Arithmetic

Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Francesca Palermo, Diana Trojaniello, Manuel Roveri


DOI: 10.1609/aaai.v40i30.39717

Abstract

The deployment of deep neural networks on resource-constrained devices relies on quantization. While static, uniform quantization applies a fixed bit-width to all inputs, it fails to adapt to their varying complexity. Dynamic, instance-based mixed-precision quantization promises a superior accuracy-efficiency trade-off by allocating higher precision only when needed. However, a critical bottleneck remains: existing methods require a costly dequantize-to-float and requantize-to-integer cycle to change precision, breaking the integer-only hardware paradigm and compromising performance gains. This paper introduces Dynamic Quantization Training (DQT), a novel framework that removes this bottleneck. At the core of DQT is a nested integer representation where lower-precision values are bit-wise embedded within higher-precision ones. This design, coupled with custom integer-only arithmetic, allows for on-the-fly bit-width switching through a near-zero-cost bit-shift operation. This makes DQT the first quantization framework to enable both dequantization-free static mixed-precision of the backbone network, and truly efficient dynamic, instance-based quantization through a lightweight controller that decides at runtime how to quantize each layer. We demonstrate DQT's state-of-the-art performance on ResNet18 on CIFAR-10 and ResNet50 on ImageNet. On ImageNet, our 4-bit dynamic ResNet50 achieves 77.00% top-1 accuracy, an improvement over leading static (LSQ, 76.70%) and dynamic (DQNET, 76.94%) methods at a comparable BitOPs budget. Crucially, DQT achieves this with a bit-width transition cost of only 28.3M simple bit-shift operations, a drastic improvement over the 56.6M costly Multiply-Accumulate (MAC) floating-point operations required by previous dynamic approaches, unlocking a new frontier in efficient, adaptive AI.
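The idea that "lower-precision values are bit-wise embedded within higher-precision ones" can be illustrated with a small sketch. It assumes unsigned codes, a single per-tensor scale, and truncation (dropping low-order bits) when switching precision; the abstract does not say whether DQT uses signed codes, rounding, or per-layer scales, and the helper names and numbers below are hypothetical.

```python
# Hypothetical illustration of a "nested" unsigned integer code: the b-bit code of a
# value is assumed to be the top b bits of its 8-bit code, so switching precision is a
# right shift (truncation) with no dequantize/requantize round trip through floats.

def switch_bit_width(codes, high_bits, low_bits):
    """Derive low_bits-bit codes from high_bits-bit codes by dropping the low-order bits."""
    shift = high_bits - low_bits
    return [q >> shift for q in codes]


def int_dot(a, b):
    """Integer-only dot product; the combined scale is applied once, afterwards."""
    return sum(x * y for x, y in zip(a, b))


scale8 = 0.01                     # hypothetical step size: real value ≈ code * scale
weights8 = [182, 15, 240, 97]     # hypothetical 8-bit weight codes
acts8 = [33, 200, 128, 64]        # hypothetical 8-bit activation codes

weights4 = switch_bit_width(weights8, 8, 4)   # [11, 0, 15, 6]
acts4 = switch_bit_width(acts8, 8, 4)         # [2, 12, 8, 4]
scale4 = scale8 * (1 << 4)                    # dropping 4 bits multiplies the step by 16

# The same dot product at two precisions; only the final rescale uses floats.
print(int_dot(weights8, acts8) * scale8 * scale8)
print(int_dot(weights4, acts4) * scale4 * scale4)
```

In this sketch the 4-bit code is literally the upper nibble of the 8-bit code, which is why a shift suffices; the price is a coarser step (16 times larger), so the two printed approximations differ.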

2508.07392 2026-03-24 cs.LG math.ST stat.ML stat.TH

Tight Bounds for Schrödinger Potential Estimation in Unpaired Data Translation

Nikita Puchkin, Denis Suchkov, Alexey Naumov, Denis Belomestny

Comments The 14th International Conference on Learning Representations (ICLR 2026)

2508.06931 2026-03-24 cs.AI cs.LG

Automated Formalization via Conceptual Retrieval-Augmented LLMs

Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu

2508.05984 2026-03-24 cs.LG

Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning

Ankur Naskar, Gugan Thoppe, Vijay Gupta

2508.05264 2026-03-24 cs.CV cs.AI

SGDFuse: SAM-Guided Diffusion Model for High-Fidelity Infrared and Visible Image Fusion

Xiaoyang Zhang, Jinjiang Li, Guodong Fan, Yakun Ju, Linwei Fan, Jun Liu, Alex C. Kot

Comments Published in Information Fusion

2508.04753 2026-03-24 cs.LG

InfoQ: Mixed-Precision Quantization via Global Information Flow

Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

2508.03243 2026-03-24 cs.CV

MVTOP: Multi-View Transformer-based Object Pose-Estimation

Lukas Ranftl, Felix Brendel, Bertram Drost, Carsten Steger

Comments 9 pages, 7 figures

2508.01192 2026-03-24 cs.RO

Unified Generation-Refinement Planning: Bridging Guided Flow Matching and Sampling-Based MPC for Social Navigation

Kazuki Mizuta, Karen Leung

2507.19408 2026-03-24 cs.LG cs.AI

On Arbitrary Predictions from Equally Valid Models

Sarah Lockfisch, Kristian Schwethelm, Martin Menten, Rickmer Braren, Daniel Rueckert, Alexander Ziller, Georgios Kaissis

2507.18983 2026-03-24 cs.LG

KASPER: Kolmogorov Arnold Networks for Stock Prediction and Explainable Regimes

Vidhi Oad, Param Pathak, Nouhaila Innan, Shalini D, Muhammad Shafique

Comments 11 pages, 7 figures, 3 tables

2507.13677 2026-03-24 cs.CV cs.AI cs.LG cs.MM

HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors

Chuheng Wei, Ziye Qin, Walter Zimmer, Guoyuan Wu, Matthew J. Barth

Comments Ranked first in the CVPR DriveX workshop TUM-Traf V2X challenge. Accepted by ITSC 2025


DOI: 10.1109/ITSC60802.2025.11423237
Journal ref: Proceedings of the 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), pp. 1214-1221, 2025

Abstract

Real-world Vehicle-to-Everything (V2X) cooperative perception systems often operate under heterogeneous sensor configurations due to cost constraints and deployment variability across vehicles and infrastructure. This heterogeneity poses significant challenges for feature fusion and perception reliability. To address these issues, we propose HeCoFuse, a unified framework designed for cooperative perception across mixed sensor setups where nodes may carry Cameras (C), LiDARs (L), or both. By introducing a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention, HeCoFuse can tackle critical challenges such as cross-modality feature misalignment and imbalanced representation quality. In addition, an adaptive spatial resolution adjustment module is employed to balance computational cost and fusion effectiveness. To enhance robustness across different configurations, we further implement a cooperative learning strategy that dynamically adjusts fusion type based on available modalities. Experiments on the real-world TUMTraf-V2X dataset demonstrate that HeCoFuse achieves 43.22% 3D mAP under the full sensor configuration (LC+LC), outperforming the CoopDet3D baseline by 1.17%, and reaches an even higher 43.38% 3D mAP in the L+LC scenario, while maintaining 3D mAP in the range of 21.74% to 43.38% across nine heterogeneous sensor configurations. These results, validated by our first-place finish in the CVPR 2025 DriveX challenge, establish HeCoFuse as the current state-of-the-art on the TUM-Traf V2X dataset while demonstrating robust performance across diverse sensor deployments.

2507.13340 2026-03-24 cs.RO cs.AI cs.LG

Latent Policy Steering with Embodiment-Agnostic Pretrained World Models

Yiqi Wang, Mrinal Verghese, Jeff Schneider

2507.05671 2026-03-24 cs.LG

Canine Clinical Gait Analysis for Orthopedic and Neurological Disorders: An Inertial Deep-Learning Approach

Netta Palez, Léonie Straß, Sebastian Meller, Holger Volk, Anna Zamansky, Itzik Klein

Comments 20 pages, 11 figures (one combines 2 images), 7 tables, 41 references

2507.00761 2026-03-24 cs.LG

A Probabilistic Approach to Wildfire Spread Prediction Using a Denoising Diffusion Surrogate Model

Wenbo Yu, Anirbit Ghosh, Tobias Sebastian Finn, Rossella Arcucci, Marc Bocquet, Sibo Cheng

2506.13925 2026-03-24 cs.CV cs.AI

Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation

Numair Nadeem, Saeed Anwar, Muhammad Hamza Asad, Abdul Bais

2506.02459 2026-03-24 cs.CV

ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing

Martin JJ. Bucher, Iro Armeni

Comments 36 pages, 19 figures, 11 tables (incl. appendix)

2506.02426 2026-03-24 cs.CL cs.AI

Comparative Analysis of AI Agent Architectures for Entity Relationship Classification

Maryam Berijanian, Kuldeep Singh, Amin Sehati

2506.00835 2026-03-24 cs.AI cs.CV

SynPO: Synergizing Descriptiveness and Preference Optimization for Video Detailed Captioning

Jisheng Dang, Yizhou Zhang, Hao Ye, Teng Wang, Siming Chen, Huicheng Zheng, Yulan Guo, Jianhuang Lai, Bin Hu

2505.16474 2026-03-24 cs.CV

Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models

Yu Zhang, Xingzhuo Guo, Haoran Xu, Jialong Wu, Mingsheng Long

Comments Accepted at ICLR 2026

2505.15340 2026-03-24 cs.LG

SSR: Speculative Parallel Scaling Reasoning in Test-time

Yuanlin Chu, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

2505.12656 2026-03-24 cs.CV

SPKLIP: Aligning Spike Video Streams with Natural Language

Yongchang Gao, Meiling Jin, Zhaofei Yu, Tiejun Huang, Guozhang Chen

Comments A dataset partitioning error occurred and is being corrected

2505.12299 2026-03-24 cs.CL cs.AI

MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning

Kun Huang, Weikai Xu, Yuxuan Liu, Quandong Wang, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang, Bo An

Comments 9 pages, 8 figures, 7 tables

2505.12224 2026-03-24 cs.RO cs.AI

RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction

Zewei Ye, Weifeng Lu, Minghao Ye, Tao Lin, Shuo Yang, Junchi Yan, Bo Zhao

2505.09855 2026-03-24 cs.LG cs.AI cs.CL

An evolutionary perspective on modes of learning in Transformers

Alexander Y. Ku, Thomas L. Griffiths, Stephanie C. Y. Chan

2505.00651 2026-03-24 cs.AI cs.ET cs.LG

Open-Source LLM-Driven Federated Transformer for Predictive IoV Management

Yazan Otoum, Arghavan Asad, Ishtiaq Ahmad

Comments Preprint version; submitted for academic peer review


DOI: 10.1109/GLOBECOM59602.2025.11431973

Abstract

The proliferation of connected vehicles within the Internet of Vehicles (IoV) ecosystem presents critical challenges in ensuring scalable, real-time, and privacy-preserving traffic management. Existing centralized IoV solutions often suffer from high latency, limited scalability, and reliance on proprietary Artificial Intelligence (AI) models, creating significant barriers to widespread deployment, particularly in dynamic and privacy-sensitive environments. Meanwhile, integrating Large Language Models (LLMs) in vehicular systems remains underexplored, especially concerning prompt optimization and effective utilization in federated contexts. To address these challenges, we propose the Federated Prompt-Optimized Traffic Transformer (FPoTT), a novel framework that leverages open-source LLMs for predictive IoV management. FPoTT introduces a dynamic prompt optimization mechanism that iteratively refines textual prompts to enhance trajectory prediction. The architecture employs a dual-layer federated learning paradigm, combining lightweight edge models for real-time inference with cloud-based LLMs to retain global intelligence. A Transformer-driven synthetic data generator is incorporated to augment training with diverse, high-fidelity traffic scenarios in the Next Generation Simulation (NGSIM) format. Extensive evaluations demonstrate that FPoTT, utilizing EleutherAI Pythia-1B, achieves 99.86% prediction accuracy on real-world data while maintaining high performance on synthetic datasets. These results underscore the potential of open-source LLMs in enabling secure, adaptive, and scalable IoV management, offering a promising alternative to proprietary solutions in smart mobility ecosystems.

2504.21247 2026-03-24 cs.CV

Subject Information Extraction for Novelty Detection with Domain Shifts

Yangyang Qu, Dazhi Fu, Jicong Fan

2504.11289 2026-03-24 cs.CV cs.MM

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer

Xiang Wang, Shiwei Zhang, Longxiang Tang, Yingya Zhang, Changxin Gao, Yuehuan Wang, Nong Sang

Comments The training and inference code (based on Wan2.1) is available at https://github.com/ali-vilab/UniAnimate-DiT

2504.03792 2026-03-24 cs.LG cs.AI

DP-LET: An Efficient Spatio-Temporal Network Traffic Prediction Framework

Xintong Wang, Haihan Nan, Ruidong Li, Huaming Wu

Comments Accepted for presentation to the 2025 IEEE Global Communications Conference (IEEE GLOBECOM)


DOI: 10.1109/GLOBECOM59602.2025.11432477
Journal ref: GLOBECOM 2025 - 2025 IEEE Global Communications Conference

Abstract

Accurately predicting spatio-temporal network traffic is essential for dynamically managing computing resources in modern communication systems and minimizing energy consumption. Although spatio-temporal traffic prediction has received extensive research attention, further improvements in prediction accuracy and computational efficiency remain necessary. In particular, existing decomposition-based methods or hybrid architectures often incur heavy overhead when capturing local and global feature correlations, necessitating novel approaches that optimize accuracy and complexity. In this paper, we propose an efficient spatio-temporal network traffic prediction framework, DP-LET, which consists of a data processing module, a local feature enhancement module, and a Transformer-based prediction module. The data processing module is designed for high-efficiency denoising of network data and spatial decoupling. The local feature enhancement module leverages multiple Temporal Convolutional Networks (TCNs) to capture fine-grained local features. Meanwhile, the prediction module utilizes a Transformer encoder to model long-term dependencies and assess feature relevance. A case study on real-world cellular traffic prediction demonstrates the practicality of DP-LET, which maintains low computational complexity while achieving state-of-the-art performance, significantly reducing MSE by 31.8% and MAE by 23.1% compared to baseline models.
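The abstract names TCNs as the local feature extractor without giving their configuration. For readers unfamiliar with the technique, here is a minimal pure-Python sketch of the standard TCN building block: dilated causal convolutions stacked with ReLU and residual connections. It assumes a single channel, one fixed kernel shared by all layers, and no learned parameters; it is a generic illustration, not DP-LET's module, and the function names are hypothetical.

```python
# Minimal single-channel TCN sketch (hypothetical; not DP-LET's configuration):
# dilated causal convolutions stacked with residual connections and ReLU.

def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x[i] = 0 for i < 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t - k * dilation
            if i >= 0:          # causal: only current and past samples contribute
                acc += w * x[i]
        y.append(acc)
    return y


def tcn_stack(x, kernel, dilations):
    """Apply one dilated causal convolution per dilation, each with ReLU and a residual add."""
    h = list(x)
    for d in dilations:
        conv = dilated_causal_conv1d(h, kernel, d)
        h = [hi + max(0.0, ci) for hi, ci in zip(h, conv)]
    return h


def receptive_field(kernel_size, dilations):
    """Number of past-and-present samples that can influence one output."""
    return 1 + (kernel_size - 1) * sum(dilations)


impulse = [1.0] + [0.0] * 9
out = tcn_stack(impulse, kernel=[0.5, 0.5], dilations=[1, 2, 4])
print(receptive_field(2, [1, 2, 4]))  # 8
print(out)  # nonzero only for the first 8 time steps
```

Doubling the dilation at each layer grows the receptive field exponentially with depth while the kernel stays small, which is the usual reason TCNs are considered cheap for capturing local temporal patterns.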

2504.01396 2026-03-24 cs.CV

All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning

Zheng Yang, Ruoxin Chen, Zhiyuan Yan, Ke-Yue Zhang, Xinghe Fu, Shuang Wu, Xiujun Shu, Taiping Yao, Shouhong Ding, Zequn Qin, Xi Li