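arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.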

Wei Wang, Wangyou Zhang, Chenda Li, Jiahe Wang, Samuele Cornell, Marvin Sach, Kohei Saijo, Yihui Fu, Zhaoheng Ni, Bing Han, Xun Gong, Mengxiao Bi, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian

2601.18415 2026-01-27 cs.CL cs.SD eess.AS

Pisets: A Robust Speech Recognition System for Lectures and Interviews

Ivan Bondarenko, Daniil Grebenkin, Oleg Sedukhin, Mikhail Klementev, Roman Derunets, Lyudmila Budneva

Journal ref Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pp. 988-997

2601.18409 2026-01-27 cs.LG

Frequency-Based Hyperparameter Selection in Games

Aniket Sanyal, Baraah A. M. Sidahmed, Rebekka Burkholz, Tatjana Chavdarova

2601.18407 2026-01-27 cs.CV

Larger than memory image processing

Jon Sporring, David Stansby

Comments 10 pages

English abstract

This report addresses larger-than-memory image analysis for petascale datasets such as 1.4 PB electron-microscopy volumes and 150 TB human-organ atlases. We argue that performance is fundamentally I/O-bound. We show that structuring analysis as streaming passes over data is crucial. For 3D volumes, two representations are popular: stacks of 2D slices (e.g., directories or multi-page TIFF) and 3D chunked layouts (e.g., Zarr/HDF5). While for a few algorithms, chunked layout on disk is crucial to keep disk I/O at a minimum, we show how the slice-based streaming architecture can be built on top of either image representation in a manner that minimizes disk I/O. This is in particular advantageous for algorithms relying on neighbouring values, since the slicing streaming architecture is 1D, which implies that there are only 2 possible sweeping orders, both of which are aligned with the order in which images are read from the disk. This is in contrast to 3D chunks, in which any sweep cannot be done without accessing each chunk at least 9 times. We formalize this with sweep-based execution (natural 2D/3D orders), windowed operations, and overlap-aware tiling to minimize redundant access. Building on these principles, we introduce a domain-specific language (DSL) that encodes algorithms with intrinsic knowledge of their optimal streaming and memory use; the DSL performs compile-time and run-time pipeline analyses to automatically select window sizes, fuse stages, tee and zip streams, and schedule passes for limited-RAM machines, yielding near-linear I/O scans and predictable memory footprints. The approach integrates with existing tooling for segmentation and morphology but reframes pre/post-processing as pipelines that privilege sequential read/write patterns, delivering substantial throughput gains for extremely large images without requiring full-volume residency in memory.
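The slice-streaming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' DSL: the function names `sliding_z_mean` and `fake_volume`, the list-of-lists slice type, and the choice of a z-direction mean as the windowed operation are all assumptions made for illustration. The point is only that a windowed operation over neighbouring slices can run in a single sequential pass while keeping just `window` slices in memory.

```python
from collections import deque
from typing import Iterable, Iterator, List

Slice = List[List[float]]  # one 2D slice, stored as rows of pixel values


def sliding_z_mean(slices: Iterable[Slice], window: int = 3) -> Iterator[Slice]:
    """Yield the per-pixel mean over `window` consecutive slices.

    Only `window` slices are resident at any time, and the input is read
    once, in order (a single sequential sweep along z).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: deque = deque(maxlen=window)  # oldest slice is dropped automatically
    for sl in slices:
        buf.append(sl)
        if len(buf) == window:
            rows, cols = len(sl), len(sl[0])
            yield [
                [sum(s[r][c] for s in buf) / window for c in range(cols)]
                for r in range(rows)
            ]


def fake_volume(depth: int, rows: int, cols: int) -> Iterator[Slice]:
    """Hypothetical slice source: slice z is filled with the value z.

    In practice this would be replaced by a reader that loads one slice
    at a time from a TIFF stack or a chunked store.
    """
    for z in range(depth):
        yield [[float(z)] * cols for _ in range(rows)]


# Stream a 5-slice volume through a 3-slice window: 3 output slices.
result = list(sliding_z_mean(fake_volume(5, 2, 2), window=3))
```

Because both the source and the filter are generators, stages of this kind can be chained without materialising the full volume, which is the property the abstract attributes to sweep-based pipelines.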

2601.18401 2026-01-27 cs.LG

Superlinear Multi-Step Attention

Yufeng Huang

Comments 30 pages, 6 figures

2601.18393 2026-01-27 cs.SD cs.CL eess.AS

OCR-Enhanced Multimodal ASR Can Read While Listening

Junli Chen, Changli Tang, Yixuan Li, Guangzhi Sun, Chao Zhang

Comments 4 pages, 2 figures. Submitted to ICASSP 2026

2601.18392 2026-01-27 cs.CV

Efficient Complex-Valued Vision Transformers for MRI Classification Directly from k-Space

Moritz Rempe, Lukas T. Rotkopf, Marco Schlimbach, Helmut Becker, Fabian Hörst, Johannes Haubold, Philipp Dammann, Kevin Kröninger, Jens Kleesiek

2601.18386 2026-01-27 cs.CV

ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks

Gabriel Lee Jun Rong, Christos Korgialas, Dion Jia Xu Ho, Pai Chet Ng, Xiaoxiao Miao, Konstantinos N. Plataniotis

2601.18385 2026-01-27 cs.CV

Estimation of geometric transformation matrices using grid-shaped pilot signals

Rinka Kawano, Masaki Kawamura

Journal ref APSIPA Transactions on Signal and Information Processing (2025) 14 (1)

2601.18380 2026-01-27 cs.CL cs.CY cs.IR

Corpus-Based Approaches to Igbo Diacritic Restoration

Ignatius Ezeani

Comments 270 pages. Ph.D. Thesis. The University of Sheffield

Journal ref 2019 White Rose eTheses Online

2601.18375 2026-01-27 cs.CL

Hierarchical Text Classification with LLM-Refined Taxonomies

Jonas Golde, Nicolaas Jedema, Ravi Krishnan, Phong Le

2601.18372 2026-01-27 cs.CV

Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues

Christos Petrou, Harris Partaourides, Athanasios Balomenos, Yannis Kopsinis, Sotirios Chatzis

2601.18356 2026-01-27 cs.LG

Making medical vision-language models think causally across modalities with retrieval-augmented cross-modal reasoning

Weiqin Yang, Haowen Xue, Qingyi Peng, Hexuan Hu, Qian Huang, Tingbo Zhang

2601.18353 2026-01-27 cs.AI cs.CL cs.HC

Can Good Writing Be Generative? Expert-Level AI Writing Emerges through Fine-Tuning on High-Quality Books

Tuhin Chakrabarty, Paramveer S. Dhillon

Comments Proceedings of CHI 2026 Conference (To Appear)

2601.18346 2026-01-27 cs.CV

Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception

Sijing Wu, Yunhao Li, Zicheng Zhang, Qi Jia, Xinyue Li, Huiyu Duan, Xiongkuo Min, Guangtao Zhai

2601.18342 2026-01-27 cs.LG

Structural Gender Bias in Credit Scoring: Proxy Leakage

Navya SD, Sreekanth D, SS Uma Sankari

2601.18335 2026-01-27 cs.SD cs.AI

Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification

Zexia Fan, Yu Chen, Qiquan Zhang, Kainan Chen, Xinyuan Qian

Comments Accepted by ICASSP26

2601.18334 2026-01-27 cs.CL

Overalignment in Frontier LLMs: An Empirical Study of Sycophantic Behaviour in Healthcare

Clément Christophe, Wadood Mohammed Abdul, Prateek Munjal, Tathagata Raha, Ronnie Rajan, Praveenkumar Kanithi

2601.18330 2026-01-27 cs.CV cs.LG

A Tumor Aware DenseNet Swin Hybrid Learning with Boosted and Hierarchical Feature Spaces for Large-Scale Brain MRI Classification

Muhammad Ali Shah, Muhammad Mansoor Alam, Saddam Hussain Khan

Comments 33 Pages, 8 Tables, Figures 16

2601.18329 2026-01-27 cs.LG

Discriminability-Driven Spatial-Channel Selection with Gradient Norm for Drone Signal OOD Detection

Chuhan Feng, Jing Li, Jie Li, Lu Lv, Fengkui Gong

2601.18326 2026-01-27 cs.LG

Cognitive Fusion of ZC Sequences and Time-Frequency Images for Out-of-Distribution Detection of Drone Signals

Jie Li, Jing Li, Lu Lv, Zhanyu Ju, Fengkui Gong

2601.18323 2026-01-27 cs.RO

TC-IDM: Grounding Video Generation for Executable Zero-shot Robot Motion

Weishi Mi, Yong Bao, Xiaowei Chi, Xiaozhu Ju, Zhiyuan Qin, Kuangzhi Ge, Kai Tang, Peidong Jia, Shanghang Zhang, Jian Tang

2601.18320 2026-01-27 cs.CL cs.AI cs.DB

MultiVis-Agent: A Multi-Agent Framework with Logic Rules for Reliable and Comprehensive Cross-Modal Data Visualization

Jinwei Lu, Yuanfeng Song, Chen Zhang, Raymond Chi-Wing Wong

Comments Accepted to SIGMOD 2026

2601.18314 2026-01-27 cs.LG

A Master Class on Reproducibility: A Student Hackathon on Advanced MRI Reconstruction Methods

Lina Felsner, Sevgi G. Kafali, Hannah Eichhorn, Agnes A. J. Leth, Aidas Batvinskas, Andre Datchev, Fabian Klemm, Jan Aulich, Puntika Leepagorn, Ruben Klinger, Daniel Rueckert, Julia A. Schnabel

2601.18308 2026-01-27 cs.AI cs.SI cs.SY eess.SY

A Generative AI-Driven Reliability Layer for Action-Oriented Disaster Resilience

Geunsik Lim

Comments 19 pages

2601.18306 2026-01-27 cs.CL cs.AI

Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLM

Everlyn Asiko Chimoto, Mostafa Elhoushi, Bruce A. Bassett

Comments Accepted to EACL 2026 Main Conference

2601.18305 2026-01-27 cs.CV

SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis

Xuan Wang, Siyuan Su, Quantong Fu, Yongxiang Hu, Yangfan Zhou

Comments 15 pages, 3 figures. Under review. Code and dataset will be released upon acceptance

2601.18302 2026-01-27 cs.CL

Suppressing Final Layer Hidden State Jumps in Transformer Pretraining

Keigo Shibata, Kazuki Yano, Ryosuke Takahashi, Jaesung Lee, Wataru Ikeda, Jun Suzuki

Comments Accepted to the Findings of EACL 2026

2601.18289 2026-01-27 cs.RO

Quest2ROS2: A ROS 2 Framework for Bi-manual VR Teleoperation

Jialong Li, Zhenguo Wang, Tianci Wang, Maj Stenmark, Volker Krueger

Comments HRI 2026

2601.18285 2026-01-27 cs.CL

U-Fold: Dynamic Intent-Aware Context Folding for User-Centric Agents

Jin Su, Runnan Fang, Yeqiu Li, Xiaobin Wang, Shihao Cai, Pengjun Xie, Ningyu Zhang, Fajie Yuan

UrgentMOS: Unified Multi-Metric and Preference Learning for Robust Speech Quality Assessment