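arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.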

2602.21203 2026-02-25 cs.RO cs.CV cs.LG

Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics

Abdulaziz Almuzairee, Henrik I. Christensen

Comments For website and code, see https://aalmuzairee.github.io/squint

Abstract

Visual reinforcement learning is appealing for robotics but expensive -- off-policy methods are sample-efficient yet slow; on-policy methods parallelize well but waste samples. Recent work has shown that off-policy methods can train faster than on-policy methods in wall-clock time for state-based control. Extending this to vision remains challenging, where high-dimensional input images complicate training dynamics and introduce substantial storage and encoding overhead. To address these challenges, we introduce Squint, a visual Soft Actor Critic method that achieves faster wall-clock training than prior visual off-policy and on-policy methods. Squint achieves this via parallel simulation, a distributional critic, resolution squinting, layer normalization, a tuned update-to-data ratio, and an optimized implementation. We evaluate on the SO-101 Task Set, a new suite of eight manipulation tasks in ManiSkill3 with heavy domain randomization, and demonstrate sim-to-real transfer to a real SO-101 robot. We train policies for 15 minutes on a single RTX 3090 GPU, with most tasks converging in under 6 minutes.

2602.21202 2026-02-25 cs.IR cs.CL cs.CV

Multi-Vector Index Compression in Any Modality

Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme

Comments 12 pages, 4 figures

2602.21197 2026-02-25 math.NA cs.NA

Variants of Raviart-Thomas mixed elements for curved domains using straight-edged tetrahedra

Vittoriano Ruas

Comments This pre-publication is the same as the initial version of the manuscript submitted to (Springer) Journal of Scientific Computing, except for its abridged title and the dedication. It was accepted for publication by this journal in revised and improved form on February 21, 2026

2602.21196 2026-02-25 cs.LG cs.DC

Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking

Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin

Comments 14 pages, 6 figures

2602.21195 2026-02-25 cs.CV

Region of Interest Segmentation and Morphological Analysis for Membranes in Cryo-Electron Tomography

Xingyi Cheng, Julien Maufront, Aurélie Di Cicco, Daniël M. Pelt, Manuela Dezi, Daniel Lévy

2602.21193 2026-02-25 cs.CL

On Data Engineering for Scaling LLM Terminal Capabilities

Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping

2602.21191 2026-02-25 cs.LG cs.DS stat.ML

Statistical Query Lower Bounds for Smoothed Agnostic Learning

Ilias Diakonikolas, Daniel M. Kane

2602.21188 2026-02-25 cs.CV

Human Video Generation from a Single Image with 3D Pose and View Control

Tiantian Wang, Chun-Han Yao, Tao Hu, Mallikarjun Byrasandra Ramalinga Reddy, Ming-Hsuan Yang, Varun Jampani

2602.21186 2026-02-25 cs.CV

Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning

Haoyi Jiang, Liu Liu, Xinjie Wang, Yonghao He, Wei Sui, Zhizhong Su, Wenyu Liu, Xinggang Wang

2602.21183 2026-02-25 cs.SD

823-OLT @ BUET DL Sprint 4.0: Context-Aware Windowing for ASR and Fine-Tuned Speaker Diarization in Bengali Long Form Audio

Ratnajit Dhar, Arpita Mallik

2602.21182 2026-02-25 cs.DC

Circumventing the CAP Theorem with Open Atomic Ethernet

Paul Borrill

Comments 23 pages, 14 figures

2602.21180 2026-02-25 cs.CY

Memory Undone: Between Knowing and Not Knowing in Data Systems

Viktoriia Makovska, George Fletcher, Julia Stoyanovich, Tetiana Zakharchenko

Comments Undone Computer Science 2026

2602.21179 2026-02-25 cs.CV

Mask-HybridGNet: Graph-based segmentation with emergent anatomical correspondence from pixel-level supervision

Nicolás Gaggion, Maria J. Ledesma-Carbayo, Stergios Christodoulidis, Maria Vakalopoulou, Enzo Ferrante

Abstract

Graph-based medical image segmentation represents anatomical structures using boundary graphs, providing fixed-topology landmarks and inherent population-level correspondences. However, clinical adoption of these models has been hindered by a major requirement: training datasets with manually annotated landmarks that maintain point-to-point correspondences across patients rarely exist in practice. We introduce Mask-HybridGNet, a framework that trains graph-based models directly using standard pixel-wise masks, eliminating the need for manual landmark annotations. Our approach aligns variable-length ground truth boundaries with fixed-length landmark predictions by combining Chamfer distance supervision and edge-based regularization to ensure local smoothness and regular landmark distribution, further refined via differentiable rasterization. A significant emergent property of this framework is that predicted landmark positions become consistently associated with specific anatomical locations across patients without explicit correspondence supervision. This implicit atlas learning enables temporal tracking, cross-slice reconstruction, and morphological population analyses. Beyond direct segmentation, Mask-HybridGNet can extract correspondences from existing segmentation masks, allowing it to generate stable anatomical atlases from any high-quality pixel-based model. Experiments across chest radiography, cardiac ultrasound, cardiac MRI, and fetal imaging demonstrate that our model achieves competitive results against state-of-the-art pixel-based methods, while ensuring anatomical plausibility by enforcing boundary connectivity through a fixed graph adjacency matrix. This framework leverages the vast availability of standard segmentation masks to build structured models that maintain topological integrity and provide implicit correspondences.
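
The Chamfer distance mentioned in the abstract above can compare point sets of different sizes, which is what makes it usable for matching a fixed number of landmarks against a variable-length boundary. The following is a minimal sketch of the generic symmetric Chamfer distance, not the paper's implementation: the function name, the sample coordinates, and the choice of unsquared Euclidean distance averaged per set are all assumptions made for illustration.

```python
import math


def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two 2D point sets of any sizes.

    Assumption: unsquared Euclidean distance, averaged over each set.
    Other variants (squared distances, sums instead of means) also exist.
    """
    def one_way(src, dst):
        # Mean distance from each point in src to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(pred, target) + one_way(target, pred)


# Hypothetical data: 4 predicted landmarks vs. a 6-point ground-truth contour.
landmarks = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
contour = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0),
           (1.0, 1.0), (0.5, 1.0), (0.0, 1.0)]

loss = chamfer_distance(landmarks, contour)
print(loss)
```

In this toy case every landmark coincides with a contour point, so only the two contour midpoints contribute to the distance. A training loss would need a differentiable version of the same computation in an autodiff framework.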

2602.21178 2026-02-25 cs.CV cs.AI

XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence

Sepehr Salem Ghahfarokhi, M. Moein Esfahani, Raj Sunderraman, Vince Calhoun, Mohammed Alser

Comments Accepted in ICCABS 2026: The 14th International Conference on Computational Advances in Bio and Medical Sciences

2602.21175 2026-02-25 cs.CV

Seeing Through Words: Controlling Visual Retrieval Quality with Language Models

Jianglin Lu, Simon Jenni, Kushal Kafle, Jing Shi, Handong Zhao, Yun Fu

2602.21174 2026-02-25 cs.RO cs.AI

Efficient Hierarchical Any-Angle Path Planning on Multi-Resolution 3D Grids

Victor Reijgwart, Cesar Cadena, Roland Siegwart, Lionel Ott

Comments 12 pages, 9 figures, 4 tables, accepted to RSS 2025, code is open-source: https://github.com/ethz-asl/wavestar

2602.21168 2026-02-25 cs.LG

Sequential Counterfactual Inference for Temporal Clinical Data: Addressing the Time Traveler Dilemma

Jingya Cheng, Alaleh Azhir, Jiazi Tian, Hossein Estiri

2602.21167 2026-02-25 cs.IT math.IT

Wireless-Fed Pinching-Antenna Systems with Horn Antennas

Hao Feng, Ming Zeng, Ebrahim Bedeer, Xingwang Li, Octavia A. Dobre, Zhiguo Ding

Comments 4 pages; 1 figure; submitted to IEEE journals

2602.21165 2026-02-25 cs.CL cs.AI

PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data

Samah Fodeh, Linhai Ma, Yan Wang, Srivani Talakokkul, Ganesh Puthiaraju, Afshan Khan, Ashley Hagaman, Sarah Lowe, Aimee Roundtree

2602.21162 2026-02-25 cs.IT math.IT

Phase-Aware Localization in Pinching Antenna Systems: CRLB Analysis and ML Estimation

Hao Feng, Ebrahim Bedeer, Ming Zeng, Xingwang Li, Shimin Gong, Quoc-Viet Pham

Comments 4 pages, 2 figures; submitted to IEEE journals

2602.21161 2026-02-25 cs.RO

ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking

Guangming Wang, Qizhen Ying, Yixiong Jing, Olaf Wysocki, Brian Sheil

Comments 8 pages, 5 figures, accepted by the 2026 IEEE International Conference on Robotics and Automation

2602.21154 2026-02-25 cs.AI

CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning

Ziwei Niu, Hao Sun, Shujun Bian, Xihong Yang, Lanfen Lin, Yuxin Liu, Yueming Jin

Comments Accepted by ICASSP 2026

2602.21153 2026-02-25 cs.CV

SPRITETOMESH: Automatic Mesh Generation for 2D Skeletal Animation Using Learned Segmentation and Contour-Aware Vertex Placement

Bastien Gimbert

Comments 11 pages, 17 figures. Code available at https://github.com/BastienGimbert/SpriteToMesh

2602.21148 2026-02-25 cs.RO cs.MA

A Micro-Macro Model of Encounter-Driven Information Diffusion in Robot Swarms

Davis S. Catherman, Carlo Pinciroli

Comments 10 pages, 5 figures, published at ANTS 2026

2602.21146 2026-02-25 cs.IT math.IT

TCDA: Robust 2D-DOA Estimation for Defective L-Shaped Arrays

Wenlong Wang, Tianyang Zhang, Tailun Dong, Lei Zhang

Comments 5 pages, 2 figures

2602.21144 2026-02-25 cs.DC cs.LG

Scaling State-Space Models on Multiple GPUs with Tensor Parallelism

Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi

Comments Submitted to 46th IEEE International Conference on Distributed Computing Systems (ICDCS 2026)

2602.21143 2026-02-25 cs.AI cs.CL cs.IR cs.LG

A Benchmark for Deep Information Synthesis

Debjit Paul, Daniel Murphy, Milan Gritta, Ronald Cardenas, Victor Prokhorov, Lena Sophia Bolliger, Aysim Toker, Roy Miles, Andreea-Maria Oncescu, Jasivan Alex Sivakumar, Philipp Borchert, Ismail Elezi, Meiru Zhang, Ka Yiu Lee, Guchun Zhang, Jun Wang, Gerasimos Lampouras

Comments Accepted at ICLR 2026

2602.21142 2026-02-25 cs.CV cs.LG

LUMEN: Longitudinal Multi-Modal Radiology Model for Prognosis and Diagnosis

Zhifan Jiang, Dong Yang, Vishwesh Nath, Abhijeet Parida, Nishad P. Kulkarni, Ziyue Xu, Daguang Xu, Syed Muhammad Anwar, Holger R. Roth, Marius George Linguraru

Comments Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2026

2602.21140 2026-02-25 cs.DC

ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments

Haley Li, Xinglu Wang, Cong Feng, Chunxu Zuo, Yanan Wang, Hei Lo, Yufei Cui, Bingji Wang, Duo Cui, Shuming Jing, Yizhou Shan, Ying Xiong, Jiannan Wang, Yong Zhang, Zhenan Fan

Comments 21 pages, 6 figures

2602.21137 2026-02-25 cs.CV

UDVideoQA: A Traffic Video Question Answering Dataset for Multi-Object Spatio-Temporal Reasoning in Urban Dynamics

Joseph Raj Vishal, Nagasiri Poluri, Katha Naik, Rutuja Patil, Kashyap Hegde Kota, Krishna Vinod, Prithvi Jai Ramesh, Mohammad Farhadi, Yezhou Yang, Bharatesh Chakravarthi

Abstract

Understanding the complex, multi-agent dynamics of urban traffic remains a fundamental challenge for video language models. This paper introduces Urban Dynamics VideoQA, a benchmark dataset that captures the unscripted real-world behavior of dynamic urban scenes. UDVideoQA is curated from 16 hours of traffic footage recorded at multiple city intersections under diverse traffic, weather, and lighting conditions. It employs an event-driven dynamic blur technique to ensure privacy preservation without compromising scene fidelity. Using a unified annotation pipeline, the dataset contains 28K question-answer pairs generated across 8 hours of densely annotated video, averaging one question per second. Its taxonomy follows a hierarchy of reasoning levels, spanning from basic understanding and attribution to event reasoning, reverse reasoning, and counterfactual inference, enabling systematic evaluation of both visual grounding and causal reasoning. Comprehensive experiments benchmark 10 SOTA VideoLMs on UDVideoQA and 8 models on a complementary video question generation benchmark. Results reveal a persistent perception-reasoning gap, showing that models that excel in abstract inference often fail at fundamental visual grounding. While models like Gemini Pro achieve the highest zero-shot accuracy, fine-tuning the smaller Qwen2.5-VL 7B model on UDVideoQA bridges this gap, achieving performance comparable to proprietary systems. In VideoQGen, Gemini 2.5 Pro and Qwen3 Max generate the most relevant and complex questions, though all models exhibit limited linguistic diversity, underscoring the need for human-centric evaluation. The UDVideoQA suite, including the dataset, annotation tools, and benchmarks for both VideoQA and VideoQGen, provides a foundation for advancing robust, privacy-aware, and real-world multimodal reasoning. UDVideoQA is available at https://ud-videoqa.github.io/UD-VideoQA/UD-VideoQA/.
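
The "one question per second" figure in the abstract above can be checked with a quick calculation. This sketch assumes that "28K" means exactly 28,000 pairs and that the full 8 hours count as annotated time; neither number is stated more precisely in the abstract.

```python
# Assumption: "28K" is taken as exactly 28,000 question-answer pairs.
qa_pairs = 28_000

# 8 hours of densely annotated video, expressed in seconds.
annotated_seconds = 8 * 60 * 60

questions_per_second = qa_pairs / annotated_seconds
print(f"{annotated_seconds} seconds, "
      f"{questions_per_second:.2f} questions per second")
```

Under these assumptions the rate comes out slightly below one question per second, which is consistent with the rounded figure quoted in the abstract.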
