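arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems science.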

2405.15216 2026-03-18 cs.LG cs.CL cs.SD eess.AS

Revisiting ASR Error Correction with Specialized Models

Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

Comments: under review

2405.08979 2026-03-18 cs.LG q-bio.MN q-bio.QM

drGT: Attention-Guided Gene Assessment of Drug Response Utilizing a Drug-Cell-Gene Heterogeneous Network

Yoshitaka Inoue, Hunmin Lee, Tianfan Fu, Rui Kuang, Augustin Luna

Abstract

For translational impact, both accurate drug response prediction and biological plausibility of predictive features are needed. We present drGT, a heterogeneous graph deep learning model over drugs, genes, and cell lines that couples prediction with mechanism-oriented interpretability via attention coefficients (ACs). We assess both predictive generalization (random, unseen-drug, unseen-cell, and zero-shot splits) and biological plausibility (use of text-mined PubMed gene-drug co-mentions and comparison to a structure-based DTI predictor) on GDSC, NCI60, and CTRP datasets. Across benchmarks, drGT consistently delivers top regression performance while maintaining competitive classification accuracy for drug sensitivity. Under random 5-fold cross-validation, drGT attains an AUROC of up to 0.945 (3rd overall) and an $R^2$ up to 0.690, outperforming all baselines on regression. In leave-one-out tests for unseen cell lines and drugs, drGT achieves AUROCs of 0.706 and 0.844, and $R^2$ values of 0.692 and 0.022, the only model yielding positive $R^2$ for unseen drugs. In zero-shot prediction, drGT achieves an AUROC of 0.786 and a regression $R^2$ of 0.334, both representing the highest scores among all models. For interpretability, AC-derived drug-gene links recover known biology: among 976 drugs with known DTIs, 36.9% of predicted links match established DTIs, and 63.7% are supported by either PubMed abstracts or a structure-based predictive model. Enrichment analyses of AC-prioritized genes reveal drug-perturbed biological processes, providing pathway-level explanations. drGT advances predictive generalization and mechanism-centered interpretability, offering state-of-the-art regression accuracy and literature-supported biological hypotheses that demonstrate the use of graph learning from heterogeneous input data for biological discovery. Code: https://github.com/sciluna/drGT
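The abstract reports that drGT is the only model with a positive $R^2$ for unseen drugs, which can look odd if $R^2$ is read as a squared quantity. A minimal sketch of the standard coefficient of determination shows why it can go negative: a model scores below zero whenever it does worse than always predicting the mean. This is not the authors' evaluation code, and the toy response values are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy drug-response values (not taken from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score(y_true, y_true)                   # exact predictions
mean_only = r2_score(y_true, [2.5] * 4)              # always predict the mean
reversed_ = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])   # worse than the mean
```

Here `perfect` is 1, `mean_only` is 0, and `reversed_` is negative, so an $R^2$ of 0.022 on unseen drugs means only slightly better than the mean predictor.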

2405.00168 2026-03-18 cs.CV

Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Solution

Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, Chunyang Cheng, Xiao-Jun Wu, Josef Kittler

2403.16169 2026-03-18 cs.CV

Gaze-guided Hand-Object Interaction Synthesis: Dataset and Method

Jie Tian, Ran Ji, Lingxiao Yang, Suting Ni, Yuexin Ma, Lan Xu, Jingyi Yu, Ye Shi, Jingya Wang

Comments: Accepted by IEEE Transactions on Multimedia (TMM), 2026. Project Page: https://takiee.github.io/gaze-hoi/

2403.09551 2026-03-18 cs.CV

Label-supervised surgical instrument segmentation using temporal equivariance and semantic continuity

Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou

2312.03442 2026-03-18 cs.CV

High-Quality Facial Geometry and Appearance Capture at Home

Yuxuan Han, Junfeng Lyu, Feng Xu

Comments: CVPR 2024. Project page: https://yxuhan.github.io/CoRA/index.html ; Github repo: https://github.com/yxuhan/CoRA

2310.12032 2026-03-18 cs.LG stat.ML

Exact and general decoupled solutions of the LMC Multitask Gaussian Process model

Olivier Truffinet, Karim Ammar, Jean-Philippe Argaud, Bertrand Bouriquet

Comments: 78 pages, 12 figures, submitted to Neurocomputing

2310.02641 2026-03-18 cs.CV cs.AI eess.IV

Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis

Han Zhang, Qiguang Chen, Lok Ming Lui

Abstract

Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually perform poorly on geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework for imaging tasks on geometrically distorted images. The DINN outputs consistent latent features for images that are geometrically distorted but represent the same underlying object or scene. The idea of DINN is to incorporate a simple component, called the quasiconformal transformer network (QCTN), into existing deep networks for imaging tasks. The QCTN is a deep neural network that outputs a quasiconformal map, which can be used to transform a geometrically distorted image into an improved version that is closer to the distribution of natural or good images. It first outputs a Beltrami coefficient, which measures the quasiconformality of the output deformation map. By controlling the Beltrami coefficient, the local geometric distortion under the quasiconformal mapping can be controlled. The QCTN is lightweight and simple, and can be readily integrated into existing deep neural networks to enhance their performance. Leveraging our framework, we have developed an image classification network that achieves accurate classification of distorted images. Our proposed framework has been applied to restore images geometrically distorted by atmospheric turbulence and water turbulence. DINN outperforms existing GAN-based restoration methods under these scenarios, demonstrating the effectiveness of the proposed framework. Additionally, we apply our proposed framework to the 1-1 verification of human face images under atmospheric turbulence and achieve satisfactory performance, further demonstrating the efficacy of our approach.
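To make the role of the Beltrami coefficient concrete, here is a minimal sketch for the simplest case, a linear map of the plane. It uses the standard definition $\mu = f_{\bar z} / f_z$ and is not the QCTN itself; the function names and example maps are hypothetical. A map with $\mu = 0$ is conformal (no local shape distortion), and a larger $|\mu| < 1$ means small circles are stretched into more elongated ellipses.

```python
def beltrami_coefficient(a, b, c, d):
    """Beltrami coefficient of the linear map (x, y) -> (a*x + b*y, c*x + d*y).

    Writing f = u + i*v, the complex derivatives are
    f_z = (f_x - i*f_y) / 2 and f_zbar = (f_x + i*f_y) / 2.
    Assumes an orientation-preserving map (a*d - b*c > 0), so |mu| < 1.
    """
    f_x = complex(a, c)
    f_y = complex(b, d)
    f_z = (f_x - 1j * f_y) / 2
    f_zbar = (f_x + 1j * f_y) / 2
    return f_zbar / f_z


def dilatation(mu):
    """Major-to-minor axis ratio of the ellipse a small circle is mapped to."""
    return (1 + abs(mu)) / (1 - abs(mu))


mu_rotation = beltrami_coefficient(0.0, -1.0, 1.0, 0.0)  # 90-degree rotation
mu_stretch = beltrami_coefficient(2.0, 0.0, 0.0, 1.0)    # stretch x by 2
```

The rotation has zero Beltrami coefficient, while the horizontal stretch gives $|\mu| = 1/3$ and a dilatation of 2, matching the 2:1 stretch. Bounding $|\mu|$ therefore bounds how much local shape distortion the deformation map may introduce.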

2212.02007 2026-03-18 cs.RO cs.SY eess.SY

Mixed Cloud Control Testbed: Validating Vehicle-Road-Cloud Integration via Mixed Digital Twin

Jianghong Dong, Qing Xu, Jiawei Wang, Chunying Yang, Mengchi Cai, Chaoyi Chen, Jianqiang Wang, Keqiang Li

Comments: 13 pages, 13 figures

2110.00675 2026-03-18 cs.LG cs.RO cs.SY eess.SY math.OC

Contraction Theory for Nonlinear Stability Analysis and Learning-based Control: A Tutorial Overview

Hiroyasu Tsukamoto, Soon-Jo Chung, Jean-Jacques E. Slotine

Comments: Annual Reviews in Control, Preprint Version, Accepted, Oct. 1st

DOI: 10.1016/j.arcontrol.2021.10.001
Journal ref: Annual Reviews in Control; Volume 52; 2021; Pages 135-169; ISSN 1367-5788

Abstract

Contraction theory is an analytical tool to study differential dynamics of a non-autonomous (i.e., time-varying) nonlinear system under a contraction metric defined with a uniformly positive definite matrix, the existence of which results in a necessary and sufficient characterization of incremental exponential stability of multiple solution trajectories with respect to each other. By using a squared differential length as a Lyapunov-like function, its nonlinear stability analysis boils down to finding a suitable contraction metric that satisfies a stability condition expressed as a linear matrix inequality, indicating that many parallels can be drawn between well-known linear systems theory and contraction theory for nonlinear systems. Furthermore, contraction theory takes advantage of a superior robustness property of exponential stability used in conjunction with the comparison lemma. This yields much-needed safety and stability guarantees for neural network-based control and estimation schemes, without resorting to a more involved method of using uniform asymptotic stability for input-to-state stability. Such distinctive features permit the systematic construction of a contraction metric via convex optimization, thereby obtaining an explicit exponential bound on the distance between a time-varying target trajectory and solution trajectories perturbed externally due to disturbances and learning errors. The objective of this paper is, therefore, to present a tutorial overview of contraction theory and its advantages in nonlinear stability analysis of deterministic and stochastic systems, with an emphasis on deriving formal robustness and stability guarantees for various learning-based and data-driven automatic control methods. In particular, we provide a detailed review of techniques for finding contraction metrics and associated control and estimation laws using deep neural networks.
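As a rough illustration of the linear-matrix-inequality view mentioned in the abstract: for a system with Jacobian $A = \partial f / \partial x$ and contraction metric $M \succ 0$, the usual contraction condition is $\dot{M} + MA + A^\top M \preceq -2\alpha M$, which gives exponential convergence of neighbouring trajectories at rate $\alpha$. The sketch below checks this condition in the simplest setting, a constant 2x2 matrix $A$ with the identity metric $M = I$. Both simplifications and the example matrix are assumptions made for illustration, not taken from the tutorial.

```python
import math


def max_eig_sym2(p, q, r):
    """Largest eigenvalue of the symmetric 2x2 matrix [[p, q], [q, r]]."""
    return (p + r) / 2 + math.hypot((p - r) / 2, q)


def is_contracting(A, alpha):
    """Check A + A^T + 2*alpha*I <= 0 for a constant 2x2 matrix A (metric M = I)."""
    p = 2 * A[0][0] + 2 * alpha
    q = A[0][1] + A[1][0]
    r = 2 * A[1][1] + 2 * alpha
    return max_eig_sym2(p, q, r) <= 0


# Hypothetical stable linear system dx/dt = A x.
A = [[-1.0, 2.0], [0.0, -3.0]]

slow_rate_ok = is_contracting(A, alpha=0.5)
fast_rate_ok = is_contracting(A, alpha=1.0)
```

With the identity metric this system is certified contracting at rate 0.5 but not at rate 1.0. A different metric $M$ could certify a faster rate, which is why searching for a good metric (for example by convex optimization) matters.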

2603.15847 2026-03-18 cs.CV cs.LG cs.RO

FEEL (Force-Enhanced Egocentric Learning): A Dataset for Physical Action Understanding

Eadom Dessalene, Botao He, Michael Maynord, Yonatan Tussa, Pavan Mantripragada, Yianni Karabati, Nirupam Roy, Yiannis Aloimonos

Comments: 14 pages, 7 figures

2603.15840 2026-03-18 cs.LG cs.AI cs.CL stat.ML

When Stability Fails: Hidden Failure Modes of LLMs in Data-Constrained Scientific Decision-Making

Nazia Riasat

Comments: 13 pages, 5 figures. Accepted at ICLR 2026 Workshop: I Can't Believe It's Not Better (ICBINB 2026). OpenReview: https://openreview.net/pdf?id=vf8vs2ibso

2603.15831 2026-03-18 cs.AI cs.CL

Persona-Conditioned Risk Behavior in Large Language Models: A Simulated Gambling Study with GPT-4.1

Sankalp Dubedy

Comments: 21 pages, 13 figures, 9 tables. Independent research. Submitted to arXiv for open dissemination

2603.15826 2026-03-18 cs.RO

Robust Dynamic Object Detection in Cluttered Indoor Scenes via Learned Spatiotemporal Cues

Juan Rached, Yixuan Jia, Kota Kondo, Jonathan P. How

2603.15822 2026-03-18 cs.CV

Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation

Renjie Liang, Yiling Ma, Yang Xing, Zhengkang Fan, Jinqian Pan, Chengkun Sun, Li Li, Kuang Gong, Jie Xu

2603.15818 2026-03-18 cs.CV

Conflict-Aware Multimodal Fusion for Ambivalence and Hesitancy Recognition

Salah Eddine Bekhouche, Hichem Telli, Azeddine Benlamoudi, Salah Eddine Herrouz, Abdelmalik Taleb-Ahmed, Abdenour Hadid

2603.15814 2026-03-18 cs.LG stat.AP

Longitudinal Risk Prediction in Mammography with Privileged History Distillation

Banafsheh Karimian, Alexis Guichemerre, Soufiane Belharbi, Natacha Gillet, Luke McCaffrey, Mohammadhadi Shateri, Eric Granger

Abstract

Breast cancer remains a leading cause of cancer-related mortality worldwide. Longitudinal mammography risk prediction models improve multi-year breast cancer risk prediction based on prior screening exams. However, in real-world clinical practice, longitudinal histories are often incomplete, irregular, or unavailable due to missed screenings, first-time examinations, heterogeneous acquisition schedules, or archival constraints. The absence of prior exams degrades the performance of longitudinal risk models and limits their practical applicability. While substantial longitudinal history is available during training, prior exams are commonly absent at test time. In this paper, we address missing history at inference time and propose a longitudinal risk prediction method that uses mammography history as privileged information during training and distills its prognostic value into a student model that only requires the current exam at inference time. The key idea is a privileged multi-teacher distillation scheme with horizon-specific teachers: each teacher is trained on the full longitudinal history to specialize in one prediction horizon, while the student receives only a reconstructed history derived from the current exam. This allows the student to inherit horizon-dependent longitudinal risk cues without requiring prior screening exams at deployment. Our new Privileged History Distillation (PHD) method is validated on a large longitudinal mammography dataset with multi-year cancer outcomes, CSAW-CC, comparing full-history and no-history baselines to their distilled counterparts. Using time-dependent AUC across horizons, our privileged history distillation method markedly improves the performance of long-horizon prediction over no-history models and is comparable to that of full-history models, while using only the current exam at inference time.
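The multi-teacher idea can be sketched as a per-horizon loss in which the student is fitted to the outcome labels and also pulled toward the prediction of the teacher for that horizon. The abstract does not give the actual objective, so the form below (binary cross-entropy plus a Bernoulli KL term weighted by a hypothetical `lam`) is a generic distillation loss chosen for illustration, not the PHD loss. All names and numbers are assumptions.

```python
import math

EPS = 1e-7  # keeps probabilities away from 0 and 1 before taking logs


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def bce(p, y):
    """Binary cross-entropy of predicted risk p against label y in {0, 1}."""
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bernoulli_kl(t, s):
    """KL divergence from teacher risk t to student risk s (both Bernoulli)."""
    t, s = _clip(t), _clip(s)
    return t * math.log(t / s) + (1 - t) * math.log((1 - t) / (1 - s))


def distillation_loss(student, teachers, labels, lam=1.0):
    """Average over horizons h of BCE(student[h], labels[h]) + lam * KL(teachers[h] || student[h])."""
    total = 0.0
    for s, t, y in zip(student, teachers, labels):
        total += bce(s, y) + lam * bernoulli_kl(t, s)
    return total / len(student)


# Hypothetical risks at 1-, 3-, and 5-year horizons.
teachers = [0.05, 0.20, 0.40]   # one full-history teacher per horizon
student = [0.10, 0.15, 0.25]    # current-exam-only student
labels = [0, 0, 1]

loss = distillation_loss(student, teachers, labels, lam=1.0)
```

When the student already matches every teacher, the KL term vanishes and only the label loss remains, so the extra term penalizes exactly the disagreement with the history-aware teachers.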

2603.15811 2026-03-18 cs.CV

Feed-forward Gaussian Registration for Head Avatar Creation and Editing

Malte Prinzler, Paulo Gotardo, Siyu Tang, Timo Bolkart

Comments: Website: https://malteprinzler.github.io/projects/match ; Video: https://youtu.be/Z3xoXQ648sE

2603.15803 2026-03-18 cs.LG

Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs

Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang

Comments: Ongoing work

2603.15802 2026-03-18 cs.LG

Time-Aware Prior Fitted Networks for Zero-Shot Forecasting with Exogenous Variables

Andres Potapczynski, Ravi Kiran Selvam, Tatiana Konstantinova, Shankar Ramasubramanian, Malcolm Wolff, Kin G. Olivares, Ruijun Ma, Mengfei Cao, Michael W. Mahoney, Andrew Gordon Wilson, Boris N. Oreshkin, Dmitry Efimov

2603.15800 2026-03-18 cs.CV cs.CL cs.CR

Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory

Ce Zhang, Jinxi He, Junyi He, Katia Sycara, Yaqi Xie

Comments: Accepted at CVPR 2026. Project page: https://echosafe-mllm.github.io

2603.15799 2026-03-18 cs.AI

Prose2Policy (P2P): A Practical LLM Pipeline for Translating Natural-Language Access Policies into Executable Rego

Vatsal Gupta, Darshan Sreenivasamurthy

2603.15798 2026-03-18 cs.AI

CUBE: A Standard for Unifying Agent Benchmarks

Alexandre Lacoste, Nicolas Gontier, Oleh Shliazhko, Aman Jaiswal, Kusha Sareen, Shailesh Nanisetty, Joan Cabezas, Manuel Del Verme, Omar G. Younis, Simone Baratta, Matteo Avalle, Imene Kerboua, Xing Han Lù, Elron Bandel, Michal Shmueli-Scheuer, Asaf Yehudai, Leshem Choshen, Jonathan Lebensold, Sean Hughes, Massimo Caccia, Alexandre Drouin, Siva Reddy, Tao Yu, Yu Su, Graham Neubig, Dawn Song

Comments: Position paper. 10 pages. Reference implementation: https://github.com/The-AI-Alliance/cube-standard

2603.15780 2026-03-18 cs.CV cs.AI cs.GR cs.LG

Parallelised Differentiable Straightest Geodesics for 3D Meshes

Hippolyte Verninas, Caner Korkmaz, Stefanos Zafeiriou, Tolga Birdal, Simone Foti

Comments: Accepted to CVPR 2026

2603.15774 2026-03-18 cs.CV

Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis

Umar Marikkar, Muhammad Awais, Sara Atito

Abstract

Computational methods for analyzing Whole Slide Images (WSIs) enable early diagnosis and treatment by supporting pathologists in the detection and classification of tumors. However, the extremely high resolution of WSIs makes end-to-end training impractical compared to typical image analysis tasks. To address this, most approaches use pre-trained feature extractors to obtain fixed representations of whole slides, which are then combined with Multiple Instance Learning (MIL) for downstream tasks. These feature extractors are typically pre-trained on natural image datasets such as ImageNet, which fail to capture domain-specific characteristics. Although domain-specific pre-training on histopathology data yields more relevant feature representations, it remains computationally expensive and fails to capture task-specific characteristics within the domain. To address the computational cost and lack of task-specificity in domain-specific pre-training, we propose EfficientWSI (eWSI), a careful integration of Parameter-Efficient Fine-Tuning (PEFT) and Multiple Instance Learning (MIL) that enables end-to-end training on WSI tasks. We evaluate eWSI on seven WSI-level tasks over the Camelyon16, TCGA, and BRACS datasets. Our results show that eWSI, when applied with ImageNet feature extractors, yields strong classification performance, matching or outperforming MILs with in-domain feature extractors and alleviating the need for extensive in-domain pre-training. Furthermore, when eWSI is applied with in-domain feature extractors, it further improves classification performance in most cases, demonstrating its ability to capture task-specific information where beneficial. Our findings suggest that eWSI provides a task-targeted, computationally efficient path for WSI tasks, offering a promising direction for task-specific learning in computational pathology.
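For readers unfamiliar with Multiple Instance Learning, the sketch below shows one common way a slide (a "bag" of patch feature vectors) is reduced to a single slide-level vector: attention pooling, where each patch gets a score and the bag embedding is the softmax-weighted average of the patch features. The abstract does not say which MIL aggregator eWSI uses, so this choice is an assumption. In a real model the scores would come from a small learned network, whereas here they are passed in as hypothetical values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, scores):
    """Return (bag_embedding, weights): the softmax-weighted mean of patch features."""
    weights = softmax(scores)
    dim = len(patch_features[0])
    bag = [
        sum(w * f[i] for w, f in zip(weights, patch_features))
        for i in range(dim)
    ]
    return bag, weights


# Hypothetical 2-D features for two patches of one slide.
patches = [[1.0, 0.0], [0.0, 1.0]]

uniform_bag, uniform_w = attention_pool(patches, [0.0, 0.0])
focused_bag, focused_w = attention_pool(patches, [100.0, 0.0])
```

Equal scores reduce to mean pooling, while a dominant score makes the slide embedding follow a single patch, which is how a few tumor patches can drive a slide-level prediction.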

2603.15773 2026-03-18 cs.CL cs.AI

Morphemes Without Borders: Evaluating Root-Pattern Morphology in Arabic Tokenizers and LLMs

Yara Alakeel, Chatrine Qwaider, Hanan Aldarmaki, Sawsan Alqahtani

Comments: Accepted at LREC 2026

2603.15771 2026-03-18 cs.RO cs.AI

CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving

Yihong Guo, Dongqiangzi Ye, Sijia Chen, Anqi Liu, Xianming Liu

2603.15767 2026-03-18 cs.CV

CLRNet: Targetless Extrinsic Calibration for Camera, Lidar and 4D Radar Using Deep Learning

Marcell Kegl, Andras Palffy, Csaba Benedek, Dariu M. Gavrila

Comments: Submitted to IEEE Transactions on Intelligent Vehicles

2603.15726 2026-03-18 cs.CL cs.AI cs.IR cs.LG

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

MiroMind Team, S. Bai, L. Bing, L. Lei, R. Li, X. Li, X. Lin, E. Min, L. Su, B. Wang, L. Wang, L. Wang, S. Wang, X. Wang, Y. Zhang, Z. Zhang, G. Chen, L. Chen, Z. Cheng, Y. Deng, Z. Huang, D. Ng, J. Ni, Q. Ren, X. Tang, B. L. Wang, H. Wang, N. Wang, C. Wei, Q. Wu, J. Xia, Y. Xiao, H. Xu, X. Xu, C. Xue, Z. Yang, Z. Yang, F. Ye, H. Ye, J. Yu, C. Zhang, W. Zhang, H. Zhao, P. Zhu

Comments: 23 pages

2603.15724 2026-03-18 cs.LG cs.AI

Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models

Lit Sin Tan, Junzhe Chen, Xiaolong Fu, Lichen Ma, Junshi Huang, Jianzhong Shi, Yan Li, Lijie Wen

Comments: 8 pages