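arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.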

2603.23284 2026-04-17 cs.CV

WaveSFNet: A Wavelet-Based Codec and Spatial-Frequency Dual-Domain Gating Network for Spatiotemporal Prediction

Xinyong Cai, Runming Xie, Hu Chen, Yuankai Wu

Comments Accepted to IJCNN 2026

Abstract

Spatiotemporal predictive learning aims to forecast future frames from historical observations in an unsupervised manner, and is critical to a wide range of applications. The key challenge is to model long-range dynamics while preserving high-frequency details for sharp multi-step predictions. Existing efficient recurrent-free frameworks typically rely on strided convolutions or pooling for sampling, which tends to discard textures and boundaries, while purely spatial operators often struggle to balance local interactions with global propagation. To address these issues, we propose WaveSFNet, an efficient framework that unifies a wavelet-based codec with a spatial-frequency dual-domain gated spatiotemporal translator. The wavelet-based codec preserves high-frequency subband cues during downsampling and reconstruction. Meanwhile, the translator first injects adjacent-frame differences to explicitly enhance dynamic information, and then performs dual-domain gated fusion between large-kernel spatial local modeling and frequency-domain global modulation, together with gated channel interaction for cross-channel feature exchange. Extensive experiments demonstrate that WaveSFNet achieves competitive prediction accuracy on Moving MNIST, TaxiBJ, and WeatherBench, while maintaining low computational complexity. Our code is available at https://github.com/fhjdqaq/WaveSFNet.
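As a rough illustration of why a wavelet transform can keep high-frequency cues that strided sampling drops, here is a minimal sketch of a one-level 2D Haar transform in plain Python. It is not taken from the WaveSFNet code: the function names `haar_dwt2` and `haar_idwt2`, the subband names `ll`, `dx`, `dy`, `dxy`, and the toy frame are all assumptions made for this example, and the paper may use a different wavelet or implementation.

```python
def haar_dwt2(frame):
    """One-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution subbands:
    ll (local average), dx (difference along columns),
    dy (difference along rows), dxy (diagonal difference).
    Hypothetical helper for illustration only.
    """
    ll, dx, dy, dxy = [], [], [], []
    for i in range(0, len(frame), 2):
        row_ll, row_dx, row_dy, row_dxy = [], [], [], []
        for j in range(0, len(frame[0]), 2):
            # The four pixels of one 2x2 block.
            a, b = frame[i][j], frame[i][j + 1]
            c, d = frame[i + 1][j], frame[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)
            row_dx.append((a - b + c - d) / 2)
            row_dy.append((a + b - c - d) / 2)
            row_dxy.append((a - b - c + d) / 2)
        ll.append(row_ll)
        dx.append(row_dx)
        dy.append(row_dy)
        dxy.append(row_dxy)
    return ll, dx, dy, dxy


def haar_idwt2(ll, dx, dy, dxy):
    """Invert haar_dwt2, rebuilding the full-resolution frame."""
    rows, cols = len(ll), len(ll[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, h, v, g = ll[i][j], dx[i][j], dy[i][j], dxy[i][j]
            out[2 * i][2 * j] = (s + h + v + g) / 2
            out[2 * i][2 * j + 1] = (s - h + v - g) / 2
            out[2 * i + 1][2 * j] = (s + h - v - g) / 2
            out[2 * i + 1][2 * j + 1] = (s - h - v + g) / 2
    return out


# Toy 4x4 frame with a sharp vertical edge between columns 0 and 1.
frame = [[0, 1, 1, 1] for _ in range(4)]

ll, dx, dy, dxy = haar_dwt2(frame)
restored = haar_idwt2(ll, dx, dy, dxy)

# Stride-2 sampling keeps only every other pixel; here the edge disappears.
strided = [row[1::2] for row in frame[1::2]]
```

In this toy case `strided` is a constant 2x2 block, so the edge cannot be recovered from it, whereas the `dx` subband is nonzero in the blocks that contain the edge and `restored` matches the input frame. A learned codec would operate on feature maps rather than raw pixels, which this sketch does not attempt to model.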

2603.22564 2026-04-17 cs.LG

MIOFlow 2.0: A unified framework for inferring cellular stochastic dynamics from single cell and spatial transcriptomics data

Xingzhi Sun, João Felipe Rocha, Brett Phelan, Dhananjay Bhaskar, Guillaume Huguet, Yanlei Zhang, Alexander Tong, Ke Xu, Oluwadamilola Fasina, Mark Gerstein, Natalia Ivanova, Christine L. Chaffer, Guy Wolf, Smita Krishnaswamy

2603.21094 2026-04-17 cs.CL

ReasonScaffold: A Scaffolded Reasoning-based Annotation Protocol for Human-AI Co-Annotation

Smitha Muthya Sudheendra, Jaideep Srivastava

2603.20997 2026-04-17 cs.LG

When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models

Abhinaba Basu

2603.18294 2026-04-17 cs.AI

The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition

Alvin Rajkomar, Pavan Sudarshan, Angela Lai, Lily Peng

2603.17512 2026-04-17 cs.CL

Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Mengyu Bu, Yang Feng

Comments ACL 2026 Main Conference. Code: https://github.com/ictnlp/XBridge | Models: https://huggingface.co/collections/ICTNLP/xbridge

2603.16024 2026-04-17 cs.CV

Speak, Segment, Track, Navigate: An Interactive System for Video-Guided Skull-Base Surgery

Jecia Z. Y. Mao, Francis X. Creighton, Russell H. Taylor, Manish Sahu

Abstract

We introduce a speech-guided embodied agent framework for video-guided skull base surgery that dynamically executes perception and image-guidance tasks in response to surgeon queries. The proposed system integrates natural language interaction with real-time visual perception directly on live intraoperative video streams, thereby enabling surgeons to request computational assistance without disengaging from operative tasks. Unlike conventional image-guided navigation systems that rely on external optical trackers and additional hardware setup, the framework operates purely on intraoperative video. The system begins with interactive segmentation and labeling of the surgical instrument. The segmented instrument is then used as a spatial anchor that is autonomously tracked in the video stream to support downstream workflows, including anatomical segmentation, interactive registration of preoperative 3D models, monocular video-based estimation of the surgical tool pose, and image guidance through real-time anatomical overlays. We evaluate the proposed system in video-guided skull base surgery scenarios and benchmark its tracking performance against a commercially available optical tracking system. Across three experimental trials, the hybrid vision-based method achieved a mean absolute tool-tip position error of 2.32 ± 1.10 mm in the camera frame, with inter-frame yaw and pitch propagation discrepancies of 0.18 ± 0.25° and 0.21 ± 0.30°, respectively. The system completes tool segmentation and anatomy registration within approximately two minutes, substantially reducing setup complexity relative to conventional tracking workflows. These results demonstrate that speech-guided embodied agents can provide accurate spatial guidance while improving workflow integration and enabling rapid deployment of video-guided surgical systems.

2603.13933 2026-04-17 cs.CL

OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset

Wenbin Hu, Huihao Jing, Haochen Shi, Changxuan Fan, Haoran Li, Yangqiu Song

Comments Accepted to ACL 2026 Findings

2603.04738 2026-04-17 cs.CL

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

Bosi Wen, Yilin Niu, Cunxiang Wang, Xiaoying Ling, Ying Zhang, Pei Ke, Hongning Wang, Minlie Huang

Comments ACL 2026

2603.03686 2026-04-17 cs.AI

AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment

Jiangyu Chen

2602.22842 2026-04-17 cs.AI cs.CE cs.NA math.NA

The AI Research Assistant: Promise, Peril, and a Proof of Concept

Tan Bui-Thanh

Comments 11 pages

2602.20328 2026-04-17 cs.CV eess.IV math.OC

GSNR: Graph Smooth Null-Space Representation for Inverse Problems

Romario Gualdrón-Hurtado, Roman Jacome, Rafael S. Suarez, Henry Arguello

Comments Accepted to The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR 2026)

2602.20091 2026-04-17 cs.CL

How Retrieved Context Shapes Internal Representations in RAG

Samuel Yeh, Sharon Li

Comments ACL 2026 Findings

2602.10069 2026-04-17 cs.RO

Humanoid Factors: Design Principles for AI Humanoids in Human Worlds

Xinyuan Liu, Eren Sadikoglu, Ransalu Senanayake, Lixiao Huang

2602.08698 2026-04-17 cs.CL

Challenges in Translating Technical Lectures: Insights from the NPTEL

Basudha Raje, Sadanand Venkatraman, Nandana TP, Soumyadeepa Das, Polkam Poojitha, M. Vijaykumar, Tanima Bagchi, Hema A. Murthy

Comments It was uploaded by the first author without concurrence from other authors. Additional experiments need to be done to confirm the results that are presented in the paper

2602.07529 2026-04-17 cs.LG

MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution

Jianwen Chen, Xinyu Yang, Peng Xia, Arian Azarang, Yueh Z Lee, Gang Li, Hongtu Zhu, Yun Li, Beidi Chen, Huaxiu Yao

2602.07069 2026-04-17 cs.CV cs.AI

Bird-SR: Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution

Zihao Fan, Xin Lu, Yidi Liu, Jie Huang, Dong Li, Xueyang Fu, Baocai Yin

2602.03295 2026-04-17 cs.CL cs.AI cs.CV

POP: Prefill-Only Pruning for Efficient Large Model Inference

Junhui He, Zhihui Fu, Jun Wang, Qingan Li

2602.02010 2026-04-17 cs.CL

NEAT: Neuron-Based Early Exit for Large Reasoning Models

Kang Liu, Yongkang Liu, Xiaocui Yang, Peidong Wang, Wen Zhang, Shi Feng, Yifei Zhang, Daling Wang

2601.21262 2026-04-17 cs.CL

CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding

Jiahao Huo, Yu Huang, Yibo Yan, Ye Pan, Kening Zheng, Wei-Chieh Huang, Yi Cao, Mingdong Ou, Philip S. Yu, Xuming Hu

Comments Under review

2601.15123 2026-04-17 cs.CV cs.AI cs.HC

BREPS: Bounding-Box Robustness Evaluation of Promptable Segmentation

Andrey Moskalenko, Danil Kuznetsov, Irina Dudko, Anastasiia Iasakova, Nikita Boldyrev, Denis Shepelev, Andrei Spiridonov, Andrey Kuznetsov, Vlad Shakhuro

Comments Accepted by AAAI 2026

2601.13503 2026-04-17 cs.CL

Anonpsy: A Graph-Based Framework for Structure-Preserving De-identification of Psychiatric Narratives

Kyung Ho Lim, Byung-Hoon Kim

Comments ACL 2026 Findings

2601.11227 2026-04-17 cs.CL cs.CY

Language of Thought Shapes Output Diversity in Large Language Models

Shaoyang Xu, Wenxuan Zhang

Comments ACL 2026

2601.09240 2026-04-17 cs.CV eess.IV

DeTracker: Motion-decoupled Vehicle Detection and Tracking in Unstabilized Satellite Videos

Jiajun Chen, Jing Xiao, Shaohan Cao, Yuming Zhu, Liang Liao, Jun Pan, Mi Wang

2601.07338 2026-04-17 cs.CL

Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation

Yanzhi Tian, Cunxiang Wang, Zeming Liu, Heyan Huang, Wenbo Yu, Dawei Song, Jie Tang, Yuhang Guo

Comments Accepted to ACL 2026 Main Conference

2601.04588 2026-04-17 cs.CV

3D Conditional Image Synthesis of Left Atrial LGE MRI from Composite Semantic Masks

Yusri Al-Sanaani, Rebecca Thornhill, Sreeraman Rajan

Comments This work has been published in the Proceedings of the 2025 IEEE International Conference on Imaging Systems and Techniques (IST). The final published version is available via IEEE Xplore

2601.04567 2026-04-17 cs.CV

All Changes May Have Invariant Principles: Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction

Ziyou Jiang, Mingyang Li, Junjie Wang, Yuekai Huang, Jie Huang, Zhiyuan Chang, Zhaoyang Li, Qing Wang

Comments 19 pages, 11 figures, 9 tables; accepted to the ACL 2026 main conference

2601.03448 2026-04-17 cs.CL

Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

Atsuki Yamaguchi, Maggie Mi, Nikolaos Aletras

Comments Accepted to ACL 2026 Main Conference

2601.03416 2026-04-17 cs.CV

GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models

Xiangdong Hu, Yangyang Jiang, Qin Hu, Xiaojun Jia

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), Main Conference

2601.03236 2026-04-17 cs.AI

MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

Dongming Jiang, Yi Li, Guanpeng Li, Bingzhe Li

Comments ACL 2026 Main