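arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.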

2603.21493 2026-03-24 cs.CV cs.MM

StreamingEval: A Unified Evaluation Protocol towards Realistic Streaming Video Understanding

Guowei Tang, Tianwen Qian, Huanran Zheng, Yifei Wang, Xiaoling Wang

Abstract

Real-time, continuous understanding of visual signals is essential for real-world interactive AI applications, and poses a fundamental system-level challenge. Existing research on streaming video understanding, however, typically focuses on isolated aspects such as question-answering accuracy under limited visual context or improvements in encoding efficiency, while largely overlooking practical deployability under realistic resource constraints. To bridge this gap, we introduce StreamingEval, a unified evaluation framework for assessing the streaming video understanding capabilities of Video-LLMs under realistic constraints. StreamingEval benchmarks both mainstream offline models and recent online video models under a standardized protocol, explicitly characterizing the trade-off between efficiency, storage, and accuracy. Specifically, we adopt a fixed-capacity memory bank to normalize accessible historical visual context, and jointly evaluate visual encoding efficiency, text decoding latency, and task performance to quantify overall system deployability. Extensive experiments across multiple datasets reveal substantial gaps between current Video-LLMs and the requirements of realistic streaming applications, providing a systematic basis for future research in this direction. Code will be released at https://github.com/wwgTang-111/StreamingEval1.
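The fixed-capacity memory bank and the separate encoding/decoding measurements described above can be illustrated with a minimal Python sketch. It assumes first-in-first-out eviction and times each frame's encoding separately from a single decoding call; the abstract does not specify StreamingEval's eviction policy or timing procedure, and the names `FixedCapacityMemoryBank`, `run_stream`, `encode`, and `answer` are hypothetical stand-ins, not the released code.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class FixedCapacityMemoryBank:
    """Holds at most `capacity` per-frame visual feature entries.

    Assumption for illustration: the oldest entry is evicted first (FIFO).
    StreamingEval's actual eviction or compression policy is not stated in the abstract.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # A deque with maxlen drops its leftmost (oldest) item when appending at capacity.
        self._slots: Deque[Sequence[float]] = deque(maxlen=capacity)

    def add(self, features: Sequence[float]) -> None:
        self._slots.append(features)

    def context(self) -> List[Sequence[float]]:
        # Snapshot of the visual history the model may access, oldest first.
        return list(self._slots)


def run_stream(
    frames: Iterable[T],
    encode: Callable[[T], Sequence[float]],
    answer: Callable[[List[Sequence[float]]], str],
    bank: FixedCapacityMemoryBank,
) -> Tuple[str, float, float]:
    """Stream frames through `encode` into `bank`, then query the decoder once.

    Returns (answer, mean per-frame encoding seconds, decoding seconds).
    `encode` and `answer` are hypothetical stand-ins for a Video-LLM's
    vision encoder and text decoder.
    """
    encode_times: List[float] = []
    for frame in frames:
        start = time.perf_counter()
        bank.add(encode(frame))
        encode_times.append(time.perf_counter() - start)

    start = time.perf_counter()
    output = answer(bank.context())
    decode_time = time.perf_counter() - start

    mean_encode = sum(encode_times) / len(encode_times) if encode_times else 0.0
    return output, mean_encode, decode_time


if __name__ == "__main__":
    demo_bank = FixedCapacityMemoryBank(capacity=4)
    reply, enc_s, dec_s = run_stream(
        range(10), lambda f: [float(f)], lambda ctx: f"{len(ctx)} frames", demo_bank
    )
    print(reply)                # 4 frames
    print(demo_bank.context())  # [[6.0], [7.0], [8.0], [9.0]]
```

Capping the bank at a fixed size means every model under test can access at most the same number of stored frames, however it would otherwise handle long videos; FIFO is simply the most basic policy that enforces such a cap.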

2603.21492 2026-03-24 cs.LG math.OC

Multinoulli Extension: A Lossless Continuous Relaxation for Partition-Constrained Subset Selection

Qixin Zhang, Wei Huang, Yan Sun, Yao Shu, Yi Yu, Dacheng Tao

Comments 45 pages, 4 figures

2603.21491 2026-03-24 cs.LG

Learning Can Converge Stably to the Wrong Belief under Latent Reliability

Zhipeng Zhang, Zhenjie Yao, Kai Li, Lei Yang

Comments 15 pages, 6 figures. Extended and refocused version of arXiv:2601.09261

2603.21489 2026-03-24 cs.CL cs.AI

Effective Strategies for Asynchronous Software Engineering Agents

Jiayi Geng, Graham Neubig

2603.21488 2026-03-24 cs.CV

Learning Trajectory-Aware Multimodal Large Language Models for Video Reasoning Segmentation

Jingnan Luo, Mingqi Gao, Jun Liu, Bin-Bin Gao, Feng Zheng

2603.21487 2026-03-24 cs.RO cs.LG

GaussianSSC: Triplane-Guided Directional Gaussian Fields for 3D Semantic Completion

Ruiqi Xian, Jing Liang, He Yin, Xuewei Qi, Dinesh Manocha

2603.21485 2026-03-24 cs.LG

Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies

Koichi Tanaka, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Yuki Sasamoto, Kei Tateno, Takuma Udagawa, Wei-Wei Du, Yuta Saito

Comments Published as a conference paper at ICLR 2026

2603.21484 2026-03-24 cs.CV

Which Concepts to Forget and How to Refuse? Decomposing Concepts for Continual Unlearning in Large Vision-Language Models

Hyundong Jin, Dongyoon Han, Eunwoo Kim

Comments Accepted to CVPR 2026

2603.21478 2026-03-24 cs.CL cs.LG eess.AS

TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild

Kai-Wei Chang, Yi-Cheng Lin, Huang-Cheng Chou, Wenze Ren, Yu-Han Huang, Yun-Shao Tsai, Chien-Cheng Chen, Yu Tsao, Yuan-Fu Liao, Shrikanth Narayanan, James Glass, Hung-yi Lee

Comments submitted to Interspeech 2026

2603.21475 2026-03-24 cs.AI

Unified-MAS: Universally Generating Domain-Specific Nodes for Empowering Automatic Multi-Agent Systems

Hehai Lin, Yu Yan, Zixuan Wang, Bo Xu, Sudong Wang, Weiquan Huang, Ruochen Zhao, Minzhi Li, Chengwei Qin

Comments Code is available at https://github.com/linhh29/Unified-MAS

2603.21473 2026-03-24 cs.AI cs.CL cs.LG

Beyond Correlation: Refutation-Validated Aspect-Based Sentiment Analysis for Explainable Energy Market Returns

Wihan van der Heever, Keane Ong, Ranjan Satapathy, Erik Cambria

Comments 13 pages, 6 figures, submitted to Expert Systems with Applications

2603.21463 2026-03-24 cs.CV

EpiMask: Leveraging Epipolar Distance Based Masks in Cross-Attention for Satellite Image Matching

Rahul Deshmukh, Aditya Chauhan, Avinash Kak

2603.21461 2026-03-24 cs.LG cs.AI cs.CL

DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment

James Wedgwood, Aashiq Muhamed, Mona T. Diab, Virginia Smith

2603.21301 2026-03-24 cs.CL cs.AI

Enhancing reasoning accuracy in large language models during inference time

Vinay Sharma, Manish Jain

2603.21245 2026-03-24 cs.CV

CornOrb: A Multimodal Dataset of Orbscan Corneal Topography and Clinical Annotations for Keratoconus Detection

Mohammed El Amine Lazouni, Leila Ryma Lazouni, Zineb Aziza Elaouaber, Mohammed Ammar, Sofiane Zehar, Mohammed Youcef Bouayad Agha, Ahmed Lazouni, Amel Feroui, Ali H. Al-Timemy, Siamak Yousefi, Mostafa El Habib Daho

Comments Preprint, 9 pages, 4 figures, dataset paper. Corresponding author: mostafa.elhabibdaho@univ-brest.fr

2603.19972 2026-03-24 cs.LG

Model-Driven Learning-Based Physical Layer Authentication for Mobile Wi-Fi Devices

Yijia Guo, Junqing Zhang, Yao-Win Peter Hong, Stefano Tomasin

2603.19964 2026-03-24 cs.CV

2K Retrofit: Entropy-Guided Efficient Sparse Refinement for High-Resolution 3D Geometry Prediction

Tianbao Zhang, Zhenyu Liang, Zhenbo Song, Nana Wang, Xiaomei Zhang, Xudong Cai, Zheng Zhu, Kejian Wu, Gang Wang, Zhaoxin Fan

Comments 15 pages

2603.19610 2026-03-24 cs.CV

ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding

Quan Kong, Yuhao Shen, Yicheng Ji, Huan Li, Cong Wang

2603.19575 2026-03-24 cs.CV

MagicSeg: Open-World Segmentation Pretraining via Counterfactual Diffusion-Based Auto-Generation

Kaixin Cai, Pengzhen Ren, Jianhua Han, Yi Zhu, Hang Xu, Jianzhuang Liu, Xiaodan Liang

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2026

Abstract

Open-world semantic segmentation presently relies significantly on extensive image-text pair datasets, which often suffer from a lack of fine-grained pixel annotations on sufficient categories. The acquisition of such data is rendered economically prohibitive due to the substantial investments of both human labor and time. In light of the formidable image generation capabilities of diffusion models, we introduce a novel diffusion model-driven pipeline for automatically generating datasets tailored to the needs of open-world semantic segmentation, named "MagicSeg". Our MagicSeg initiates from class labels and proceeds to generate high-fidelity textual descriptions, which in turn serve as guidance for the diffusion model to generate images. Rather than only generating positive samples for each label, our process encompasses the simultaneous generation of corresponding negative images, designed to serve as paired counterfactual samples for contrastive training. Then, to provide a self-supervised signal for open-world segmentation pretraining, our MagicSeg integrates an open-vocabulary detection model and an interactive segmentation model to extract object masks as precise segmentation labels from images based on the provided category labels. By applying our dataset to the contrastive language-image pretraining model with the pseudo mask supervision and the auxiliary counterfactual contrastive training, the downstream model obtains strong performance on open-world semantic segmentation. We evaluate our model on PASCAL VOC, PASCAL Context, and COCO, achieving SOTA with performance of 62.9%, 26.7%, and 40.2%, respectively, demonstrating our dataset's effectiveness in enhancing open-world semantic segmentation capabilities. Project website: https://github.com/ckxhp/magicseg.
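The paired counterfactual training signal described above can be sketched as a generic InfoNCE-style loss in which a label's text embedding must score its positive image above the paired counterfactual (negative) image. This is an illustrative assumption: the abstract does not give MagicSeg's exact loss, and the function names and the temperature default of 0.07 are hypothetical choices.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def counterfactual_contrastive_loss(
    text_emb: Sequence[float],
    positive_img: Sequence[float],
    counterfactual_img: Sequence[float],
    extra_negatives: Sequence[Sequence[float]] = (),
    temperature: float = 0.07,
) -> float:
    """-log softmax score of the positive image among {positive, counterfactual, extra negatives}.

    Generic InfoNCE form chosen for illustration (hypothetical, not MagicSeg's
    published loss); the temperature default is an assumption.
    """
    candidates = [positive_img, counterfactual_img, *extra_negatives]
    logits = [cosine(text_emb, img) / temperature for img in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_partition = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_partition - logits[0]


if __name__ == "__main__":
    label = [1.0, 0.0]
    aligned = counterfactual_contrastive_loss(label, [1.0, 0.0], [0.0, 1.0])
    swapped = counterfactual_contrastive_loss(label, [0.0, 1.0], [1.0, 0.0])
    print(aligned < swapped)  # True
```

Pairing each positive with a counterfactual image generated for the same label gives the loss a hard negative by construction, which is the intuition behind generating negatives alongside positives rather than sampling them at random.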

2603.19415 2026-03-24 cs.CL cs.AI cs.LG

Scalable Prompt Routing via Fine-Grained Latent Task Discovery

Yunyi Zhang, Soji Adeshina, Sheng Guan, Ashwin Ganesh, Zhen Han, Vassilis N. Ioannidis, Huzefa Rangwala, George Karypis

2603.19201 2026-03-24 cs.RO

OmniVTA: Visuo-Tactile World Modeling for Contact-Rich Robotic Manipulation

Yuhang Zheng, Songen Gu, Weize Li, Yupeng Zheng, Yujie Zang, Shuai Tian, Xiang Li, Ce Hao, Chen Gao, Si Liu, Haoran Li, Yilun Chen, Shuicheng Yan, Wenchao Ding

Comments TARS Robotics Project Page: https://mrsecant.github.io/OmniVTA

2603.17779 2026-03-24 cs.CV

CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image

Yizheng Song, Yiyu Zhuang, Qipeng Xu, Haixiang Wang, Jiahe Zhu, Jing Tian, Siyu Zhu, Hao Zhu

Comments Accepted by CVPR 2026

2603.17775 2026-03-24 cs.CL cs.AI cs.LG

CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

Teng Pan, Yuchen Yan, Zixuan Wang, Ruiqing Zhang, Guiyang Hou, Wenqi Zhang, Weiming Lu, Jun Xiao, Yongliang Shen

Comments Project Page: https://zju-real.github.io/CoVerRL Code: https://github.com/ZJU-REAL/CoVerRL

2603.17535 2026-03-24 cs.LG

PCA-Based Interpretable Knowledge Representation and Analysis of Geometric Design Parameters

Alexander Köhler, Michael Breuß

Comments 20 pages, 6 figures, 1 table, preprint to IntelliSys-Artificial Intelligence Conference 2026

2603.16666 2026-03-24 cs.CV cs.AI

Fast-WAM: Do World Action Models Need Test-time Future Imagination?

Tianyuan Yuan, Zibin Dong, Yicheng Liu, Hang Zhao

2603.14803 2026-03-24 cs.SD cs.AI cs.CL

VorTEX: Various overlap ratio for Target speech EXtraction

Ro-hoon Oh, Jihwan Seol, Bugeun Kim

Comments Submitted to InterSpeech 2026 (under review)

2603.13406 2026-03-24 cs.CV cs.AI

Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection

Liang Tang, Hongda Li, Jiayu Zhang, Long Chen, Shuxian Li, Siqi Pei, Tiaonan Duan, Yuhao Cheng

Comments 5 pages, 1 figure

2603.12469 2026-03-24 cs.CV

Unleashing Video Language Models for Fine-grained HRCT Report Generation

Yingying Fang, Huichi Zhou, KinHei Lee, Yijia Wang, Zhenxuan Zhang, Jiahao Huang, Guang Yang

2603.10446 2026-03-24 cs.CV

SignSparK: Efficient Multilingual Sign Language Production via Sparse Keyframe Learning

Jianhe Low, Alexandre Symeonidis-Herzig, Maksym Ivashechkin, Ozge Mercanoglu Sincan, Richard Bowden

2603.09418 2026-03-24 cs.CV

CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation

Bohao Li, Zhicheng Cao, Huixian Li, Yangming Guo

Comments The paper is accepted by CVPR 2026