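arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.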

2406.10985 2026-03-23 cs.CL

Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, Zhifang Sui

Abstract

Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate the chunk's information into the corresponding <SR> token. This enables LLMs to interpret information not only from historical individual tokens but also from the <SR> token, which aggregates the chunk's semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
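
The chunk-and-sentinel step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `insert_sentinels` and `build_mask` are hypothetical, the fixed `chunk_size` is an assumption, and the masking rule (an <SR> position attends only to its own chunk, all other positions use plain causal attention) is one plausible reading of "modify the attention mask" that the abstract does not spell out.

```python
SR = "<SR>"  # sentinel token named in the abstract


def insert_sentinels(tokens, chunk_size):
    """Append SR after every `chunk_size` tokens (the last chunk may be shorter)."""
    out = []
    for start in range(0, len(tokens), chunk_size):
        out.extend(tokens[start:start + chunk_size])
        out.append(SR)
    return out


def build_mask(seq):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption (not from the abstract): ordinary tokens use causal attention,
    while each SR token is restricted to the tokens of its own chunk and itself.
    """
    n = len(seq)
    # Start from a standard causal (lower-triangular) mask.
    mask = [[j <= i for j in range(n)] for i in range(n)]
    chunk_start = 0
    for i, tok in enumerate(seq):
        if tok == SR:
            # Hide everything before the current chunk from this SR token.
            for j in range(chunk_start):
                mask[i][j] = False
            chunk_start = i + 1
    return mask


seq = insert_sentinels(["a", "b", "c", "d", "e"], chunk_size=2)
mask = build_mask(seq)
```

Under these assumptions, the second <SR> (index 5 of `seq`) attends only to "c", "d", and itself, so its representation summarizes just that chunk, while later ordinary tokens can still attend to every earlier position, including the <SR> tokens.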

2404.10318 2026-03-23 cs.CV

SRGS: Super-Resolution 3D Gaussian Splatting

Xiang Feng, Yongbo He, Linxi Chen, Yan Yang, Chengkai Wang, Yifei Chen, Yixuan Zhong, Zhenzhong Kuang, Jiajun Ding, Xufei Yin, Yanming Zhu

Comments The first to focus on the HRNVS of 3DGS

2401.06344 2026-03-23 cs.CV cs.LG

Hyper-STTN: Hypergraph Augmented Spatial-Temporal Transformer Network for Trajectory Prediction

Weizheng Wang, Baijian Yang, Sungeun Hong, Wenhai Sun, Byung-Cheol Min

Comments To appear in ICRA2026

2305.14594 2026-03-23 cs.LG

torchgfn: A PyTorch GFlowNet library

Joseph D. Viviano, Omar G. Younis, Sanghyeok Choi, Victor Schmidt, Yoshua Bengio, Salem Lahlou

Comments 15 pages, 3 figures, 3 tables. Submitted

2211.02831 2026-03-23 cs.CV

Deep Face Restoration: A Survey

Tao Wang, Kaihao Zhang, Jiankang Deng, Tong Lu, Wei Liu, Stefanos Zafeiriou

Comments Accepted by ACM Computing Surveys, 39 pages, 14 figures

2603.19571 2026-03-23 cs.CV

CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

Chao Wang, Xudong Tan, Jianjian Cao, Kangcong Li, Tao Chen

2603.19570 2026-03-23 cs.CV

Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation

Chuhan Wang, Hao Chen

2603.19567 2026-03-23 cs.CV

Efficiency Follows Global-Local Decoupling

Zhenyu Yang, Gensheng Pei, Tao Chen, Yichao Zhou, Tianfei Zhou, Yazhou Yao, Fumin Shen

2603.19565 2026-03-23 cs.CV cs.AI cs.LG

PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition

Minghe Xu, Rouying Wu, ChiaWei Chu, Xiao Wang, Yu Li

2603.19564 2026-03-23 cs.LG

Wearable Foundation Models Should Go Beyond Static Encoders

Yu Yvonne Wu, Yuwei Zhang, Hyungjun Yoon, Ting Dang, Dimitris Spathis, Tong Xia, Qiang Yang, Jing Han, Dong Ma, Sung-Ju Lee, Cecilia Mascolo

Comments 13 pages

2603.19563 2026-03-23 cs.CV cs.AI

Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search

Haoyu Zhang, Zhihao Yu, Rui Wang, Yaochu Jin, Qiqi Liu, Ran Cheng

Abstract

Modern computer vision requires balancing predictive accuracy with real-time efficiency, yet the high inference cost of large vision models (LVMs) limits deployment on resource-constrained edge devices. Although Evolutionary Neural Architecture Search (ENAS) is well suited for multi-objective optimization, its practical use is hindered by two issues: expensive candidate evaluation and ranking inconsistency among subnetworks. To address them, we propose EvoNAS, an efficient distributed framework for multi-objective evolutionary architecture search. We build a hybrid supernet that integrates Vision State Space and Vision Transformer (VSS-ViT) modules, and optimize it with a Cross-Architecture Dual-Domain Knowledge Distillation (CA-DDKD) strategy. By coupling the computational efficiency of VSS blocks with the semantic expressiveness of ViT modules, CA-DDKD improves the representational capacity of the shared supernet and enhances ranking consistency, enabling reliable fitness estimation during evolution without extra fine-tuning. To reduce the cost of large-scale validation, we further introduce a Distributed Multi-Model Parallel Evaluation (DMMPE) framework based on GPU resource pooling and asynchronous scheduling. Compared with conventional data-parallel evaluation, DMMPE improves efficiency by over 70% through concurrent multi-GPU, multi-model execution. Experiments on COCO, ADE20K, KITTI, and NYU-Depth v2 show that the searched architectures, termed EvoNets, consistently achieve Pareto-optimal trade-offs between accuracy and efficiency. Compared with representative CNN-, ViT-, and Mamba-based models, EvoNets deliver lower inference latency and higher throughput under strict computational budgets while maintaining strong generalization on downstream tasks such as novel view synthesis. Code is available at https://github.com/EMI-Group/evonas
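
The notion of a Pareto-optimal trade-off between accuracy and efficiency mentioned in the abstract can be illustrated with a small non-dominated filter. This is a generic sketch, not EvoNAS code: the function name `pareto_front` and the candidate tuples are hypothetical, and the numbers are made up purely for illustration.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated by any other candidate.

    Each candidate is (name, accuracy, latency_ms): higher accuracy is better,
    lower latency is better. A candidate is dominated if another one is at
    least as good on both objectives and strictly better on at least one.
    """
    front = []
    for name, acc, lat in candidates:
        dominated = any(
            (a >= acc and l <= lat) and (a > acc or l < lat)
            for _, a, l in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical architectures with invented scores (not results from the paper).
candidates = [
    ("A", 0.80, 10.0),
    ("B", 0.75, 12.0),  # worse than A on both objectives
    ("C", 0.85, 20.0),
    ("D", 0.70, 5.0),
]
front = pareto_front(candidates)
```

Here "B" is dropped because "A" is both more accurate and faster, while "A", "C", and "D" each represent a different accuracy/latency trade-off that no other candidate beats on both objectives.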

2603.19558 2026-03-23 cs.CL

TextReasoningBench: Does Reasoning Really Improve Text Classification in Large Language Models?

Xinyu Guo, Yazhou Zhang, Jing Qin

Comments 20 pages

2603.19552 2026-03-23 cs.CV

StreetForward: Perceiving Dynamic Street with Feedforward Causal Attention

Zhongrui Yu, Zhao Wang, Yijia Xie, Yida Wang, Xueyang Zhang, Yifei Zhan, Kun Zhan

2603.19547 2026-03-23 cs.CV

SeeClear: Reliable Transparent Object Depth Estimation via Generative Opacification

Xiaoying Wang, Yumeng He, Jingkai Shi, Jiayin Lu, Yin Yang, Ying Jiang, Chenfanfu Jiang

Comments Project page: https://heyumeng.com/SeeClear-web/. 19 pages, 12 figures

2603.19546 2026-03-23 cs.LG cs.AI cs.CV

Subspace Kernel Learning on Tensor Sequences

Lei Wang, Xi Ding, Yongsheng Gao, Piotr Koniusz

Comments Accepted at the Fourteenth International Conference on Learning Representations (ICLR 2026)

2603.19544 2026-03-23 cs.LG

Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers

Yijiang Li, Zilinghan Li, Kyle Chard, Ian Foster, Todd Munson, Ravi Madduri, Kibaek Kim

2603.19543 2026-03-23 cs.RO

Zero Shot Deformation Reconstruction for Soft Robots Using a Flexible Sensor Array and Cage Based 3D Gaussian Modeling

Linrui Shou, Zilang Chen, Wenjia Xu, Yiyue Luo, Tingyu Cheng

2603.19539 2026-03-23 cs.CL cs.AI

FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment

Betty Xiong, Jillian Fisher, Benjamin Newman, Meng Hu, Shivangi Gupta, Yejin Choi, Lanyan Fang, Russ B Altman

Comments 4 pages, 2 figures

2603.19533 2026-03-23 cs.CV cs.RO

Pedestrian Crossing Intent Prediction via Psychological Features and Transformer Fusion

Sima Ashayer, Hoang H. Nguyen, Yu Liang, Mina Sartipi

Comments Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026. 8 pages, 3 figures

2603.19532 2026-03-23 cs.CL cs.IR cs.LG

EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models

J. Ben Tamo, Yuxing Lu, Benoit L. Marteau, Micky C. Nnamdi, May D. Wang

2603.19531 2026-03-23 cs.CV cs.AI

dinov3.seg: Open-Vocabulary Semantic Segmentation with DINOv3

Saikat Dutta, Biplab Banerjee, Hamid Rezatofighi

2603.19529 2026-03-23 cs.CV cs.HC cs.LG

SurfaceXR: Fusing Smartwatch IMUs and Egocentric Hand Pose for Seamless Surface Interactions

Vasco Xu, Brian Chen, Eric J. Gonzalez, Andrea Colaço, Henry Hoffmann, Mar Gonzalez-Franco, Karan Ahuja

Comments Accepted to IEEE VR 2026 as a TVCG journal paper

2603.19523 2026-03-23 cs.CV

Recognising BSL Fingerspelling in Continuous Signing Sequences

Alyssa Chan, Taein Kwon, Andrew Zisserman

Comments 11 pages, 15 figures

2603.19517 2026-03-23 cs.CV cs.LG

ReXInTheWild: A Unified Benchmark for Medical Photograph Understanding

Oishi Banerjee, Sung Eun Kim, Alexandra N. Willauer, Julius M. Kernbach, Abeer Rihan Alomaish, Reema Abdulwahab S. Alghamdi, Hassan Rayhan Alomaish, Mohammed Baharoon, Xiaoman Zhang, Julian Nicolas Acosta, Christine Zhou, Pranav Rajpurkar

Comments 11 pages, 4 figures

2603.19515 2026-03-23 cs.AI

ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models

Tianlong Wang, Pinqiao Wang, Weili Shi, Sheng Li

2603.19514 2026-03-23 cs.AI

Learning to Disprove: Formal Counterexample Generation with Large Language Models

Zenan Li, Zhaoyu Li, Kaiyu Yang, Xiaoxing Ma, Zhendong Su

2603.19512 2026-03-23 cs.CV cs.AI

FedAgain: A Trust-Based and Robust Federated Learning Strategy for an Automated Kidney Stone Identification in Ureteroscopy

Ivan Reyes-Amezcua, Francisco Lopez-Tiro, Clément Larose, Christian Daul, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz

Comments Paper submitted for peer review

2603.19501 2026-03-23 cs.LG eess.SP

Stochastic Sequential Decision Making over Expanding Networks with Graph Filtering

Zhan Gao, Bishwadeep Das, Elvin Isufi

2603.19497 2026-03-23 cs.LG

ICLAD: In-Context Learning for Unified Tabular Anomaly Detection Across Supervision Regimes

Jack Yi Wei, Narges Armanfard

Comments 33 pages, 17 figures

2603.19496 2026-03-23 cs.CV

VeloxNet: Efficient Spatial Gating for Lightweight Embedded Image Classification

Md Meftahul Ferdaus, Elias Ioup, Mahdi Abdelguerfi, Anton Netchaev, Steven Sloan, Ken Pathak, Kendall N. Niles

Comments This work has been submitted to the IEEE for possible publication