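arXivDaily daily academic digest: synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.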

2602.21741 2026-02-26 cs.CL cs.LG cs.SD

Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization

MD. Sagor Chowdhury, Adiba Fairooz Chowdhury

Comments 6 pages, 5 figures, 3 tables; system paper submitted to DL Sprint 4.0 (Kaggle)


Abstract

We describe our end-to-end system for Bengali long-form speech recognition (ASR) and speaker diarization submitted to the DL Sprint 4.0 competition on Kaggle. Bengali presents substantial challenges for both tasks: a large phoneme inventory, significant dialectal variation, frequent code-mixing with English, and a relative scarcity of large-scale labelled corpora. For ASR, we achieve a best private Word Error Rate (WER) of 0.37738 and public WER of 0.36137 by combining a BengaliAI fine-tuned Whisper medium model with Demucs source separation for vocal isolation, silence-boundary chunking, and carefully tuned generation hyperparameters. For speaker diarization, we reach a best private Diarization Error Rate (DER) of 0.27671 and public DER of 0.20936 by replacing the default segmentation model inside the pyannote.audio pipeline with a Bengali-fine-tuned variant, pairing it with wespeaker-voxceleb-resnet34-LM embeddings and centroid-based agglomerative clustering. Our experiments demonstrate that domain-specific fine-tuning of the segmentation component, vocal source separation, and natural silence-aware chunking are the three most impactful design choices for low-resource Bengali speech processing.
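The abstract names silence-boundary chunking as one of the three most impactful choices but does not spell out the procedure. The following is a minimal, hypothetical sketch of the general idea, not the authors' implementation: it assumes a simple RMS-energy silence detector, and the function names `frame_rms` and `silence_chunks`, the 25 ms frame size, and the 0.01 silence threshold are all illustrative assumptions. The 30 s default cap is chosen because Whisper models consume 30-second input windows.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (last frame may be short)."""
    energies = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def silence_chunks(
    samples: Sequence[float],
    sample_rate: int,
    max_chunk_s: float = 30.0,      # assumption: matches Whisper's 30 s input window
    frame_s: float = 0.025,         # assumption: 25 ms analysis frames
    silence_thresh: float = 0.01,   # assumption: RMS level treated as silence
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, each at most max_chunk_s long.

    Each cut is placed at the end of the latest quiet frame before the length
    limit; if the window contains no quiet frame, cut exactly at the limit.
    """
    frame_len = max(1, round(frame_s * sample_rate))
    max_len = max(1, round(max_chunk_s * sample_rate))
    rms = frame_rms(samples, frame_len)
    n = len(samples)
    chunks: List[Tuple[int, int]] = []
    start = 0
    while start < n:
        if n - start <= max_len:                    # remainder fits in one chunk
            chunks.append((start, n))
            break
        end = start + max_len                       # fallback: hard cut at the limit
        first = start // frame_len                  # first frame ending after `start`
        last = (start + max_len) // frame_len - 1   # last frame ending by the limit
        for f in range(last, first - 1, -1):        # search the latest frame first
            if rms[f] < silence_thresh:
                end = (f + 1) * frame_len           # cut where the quiet frame ends
                break
        chunks.append((start, end))
        start = end
    return chunks
```

Each (start, end) range would then be sliced out and transcribed independently. Preferring a cut inside a pause over a fixed 30 s boundary is meant to avoid splitting a word across two chunks, which would otherwise surface as substitution or deletion errors in the WER.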


2602.21740 2026-02-26 cs.CV

Structure-to-Image: Zero-Shot Depth Estimation in Colonoscopy via High-Fidelity Sim-to-Real Adaptation

Juan Yang, Yuyan Zhang, Han Jia, Bing Hu, Wanzhong Song

Comments © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

2602.21736 2026-02-26 cs.RO

Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild

Hao Luo, Ye Wang, Wanpeng Zhang, Haoqi Yuan, Yicheng Feng, Haiweng Xu, Sipeng Zheng, Zongqing Lu

Comments CVPR2026

2602.21735 2026-02-26 cs.CV

SigVLP: Sigmoid Volume-Language Pre-Training for Self-Supervised CT-Volume Adaptive Representation Learning

Jiayi Wang, Hadrien Reynaud, Ibrahim Ethem Hamamci, Sezgin Er, Suprosanna Shit, Bjoern Menze, Bernhard Kainz

2602.21728 2026-02-26 cs.CL

Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling

Shiqi Yan, Yubo Chen, Ruiqi Zhou, Zhengxi Yao, Shuai Chen, Tianyi Zhang, Shijie Zhang, Wei Qiang Zhang, Yongfeng Huang, Haixin Duan, Yunqi Zhang

Comments Published as a conference paper at ICLR 2026

2602.21723 2026-02-26 cs.RO

LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations

Yutang Lin, Jieming Cui, Yixuan Li, Baoxiong Jia, Yixin Zhu, Siyuan Huang

2602.20871 2026-02-26 cs.RO

GeCo-SRT: Geometry-aware Continual Adaptation for Robotic Cross-Task Sim-to-Real Transfer

Wenbo Yu, Wenke Xia, Weitao Zhang, Di Hu

Comments Accepted by CVPR 2026

2602.20659 2026-02-26 cs.AI

Recursive Belief Vision Language Action Models

Vaidehi Bagaria, Bijo Sebastian, Nirav Kumar Patel

2602.20070 2026-02-26 cs.LG

Training-Free Generative Modeling via Kernelized Stochastic Interpolants

Florentin Coeurdoux, Etienne Lempereur, Nathanaël Cuvelle-Magar, Thomas Eboli, Stéphane Mallat, Anastasia Borovykh, Eric Vanden-Eijnden

2602.19430 2026-02-26 cs.CV

TherA: Thermal-Aware Visual-Language Prompting for Controllable RGB-to-Thermal Infrared Translation

Dong-Guw Lee, Tai Hyoung Rhee, Hyunsoo Jang, Young-Sik Shin, Ukcheol Shin, Ayoung Kim

2602.18292 2026-02-26 cs.LG cs.AI

Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer, Haitham Bou-Ammar

2602.18022 2026-02-26 cs.CV cs.AI

Dual-Channel Attention Guidance for Training-Free Image Editing Control in Diffusion Transformers

Guandong Li

2602.17849 2026-02-26 cs.LG cs.IT math.IT

Quad Length Codes for Lossless Compression of e4m3

Aditya Agrawal, Albert Magyar, Hiteshwar Eswaraiah, Patrick Sheridan, Pradeep Janedula, Ravi Krishnan Venkatesan, Krishna Nair, Ravi Iyer

Comments The first version proposed lossless compression of BFloat16 using dual length codes. This version proposes lossless compression of e4m3 using quad length codes. The versions will be merged later

2602.16642 2026-02-26 cs.LG

Optimizer choice matters for the emergence of Neural Collapse

Jim Zhao, Tin Sum Cheng, Wojciech Masarczyk, Aurelien Lucchi

Comments Published as a conference paper at ICLR 2026

2602.14903 2026-02-26 cs.AI

The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics

Gregor Bachmann, Yichen Jiang, Seyed Mohsen Moosavi Dezfooli, Moin Nabi

2602.13477 2026-02-26 cs.AI

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Akshat Naik, Jay Culligan, Yarin Gal, Philip Torr, Rahaf Aljundi, Alasdair Paren, Adel Bibi

Comments Preprint; corrected typos

2602.11020 2026-02-26 cs.LG q-fin.ST

When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging

Rui Ma

Comments Added sensitivity analysis at tau=0.008 for adversarial robustness; corrected the author affiliation

2602.10953 2026-02-26 cs.CL cs.AI

Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models

Mingyu Cao, Alvaro H. C. Correia, Christos Louizos, Shiwei Liu, Lu Yin

Comments 11 pages, 8 figures

2602.06834 2026-02-26 cs.RO

Perception-Control Coupled Visual Servoing for Textureless Objects Using Keypoint-Based EKF

Allen Tao, Jun Yang, Stanko Oparnica, Wenjie Xue

2602.06034 2026-02-26 cs.CV

V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

Dongyang Chen, Chaoyang Wang, Dezhao Su, Xi Xiao, Zeyu Zhang, Jing Xiong, Qing Li, Yuzhang Shang, Shichao Kan

Comments Project page: https://github.com/chendy25/V-Retrver

2602.03594 2026-02-26 cs.CV

TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection

Alireza Salehi, Ehsan Karami, Sepehr Noey, Sahand Noey, Makoto Yamada, Reshad Hosseini, Mohammad Sabokrou

Comments This is the extended version of the paper accepted in ICASSP'26, which will be publicly available in May. Authors' contributions may vary among the versions

2602.00288 2026-02-26 cs.CV cs.AI

TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs

Baiqi Li, Kangyi Zhao, Ce Zhang, Chancharik Mitra, Jean de Dieu Nyandwi, Gedas Bertasius

Comments For code and data, see https://baiqi-li.github.io/timeblind_project/

2601.22074 2026-02-26 cs.RO

mjlab: A Lightweight Framework for GPU-Accelerated Robot Learning

Kevin Zakka, Qiayuan Liao, Brent Yi, Louis Le Lay, Koushil Sreenath, Pieter Abbeel

Comments 11 pages; Code is available at https://github.com/mujocolab/mjlab ; Expanded sensor and domain randomization sections, added references, minor edits

2601.21405 2026-02-26 cs.CV

Rectifying Geometry-Induced Similarity Distortions for Real-World Aerial-Ground Person Re-Identification

Kailash A. Hambarde, Hugo Proença

2601.20218 2026-02-26 cs.CV

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Haoyou Deng, Keyu Yan, Chaojie Mao, Xiang Wang, Yu Liu, Changxin Gao, Nong Sang

Comments Accepted by ICLR 2026

2601.18970 2026-02-26 cs.CV

Pay Attention to Where You Looked

Alex Berian, JhihYang Wu, Daniel Brignac, Natnael Daba, Abhijit Mahalanobis

Comments ICIP 2025 Workshop on Generative AI for World Simulations and Communications

2601.15715 2026-02-26 cs.CL cs.AI

RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Zhitao He, Zongwei Lyu, Yi R Fung

Comments Accepted by ICLR 2026

2601.07524 2026-02-26 cs.LG

Stagewise Reinforcement Learning and the Geometry of the Regret Landscape

Chris Elliott, Einar Urdshals, David Quarel, Matthew Farrugia-Roberts, Daniel Murfet

Comments 48 pages, 10 figures

2512.16902 2026-02-26 cs.CL cs.LG

In-Context Algebra

Eric Todd, Jannik Brinkmann, Rohit Gandikota, David Bau

Comments ICLR 2026. 35 pages, 22 figures. Code and data at https://algebra.baulab.info

2511.13065 2026-02-26 cs.CV

RobustGait: Robustness Analysis for Appearance Based Gait Recognition

Reeshoon Sayera, Akash Kumar, Sirshapan Mitra, Prudvi Kamtam, Yogesh S Rawat

Comments IEEE WACV'26 Main Conference