arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2602.17664 2026-02-20 cs.CL cs.AI cs.LG

Sink-Aware Pruning for Diffusion Language Models

Aidar Myrzakhan, Tianyi Li, Bowei Guo, Shengkun Tang, Zhiqiang Shen

Comments Code at: https://github.com/VILA-Lab/Sink-Aware-Pruning

详情

英文摘要

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full generation trajectory (measured by how the dominant sink locations shift across timesteps), indicating that sinks are often transient and less structurally essential than in AR models. Based on this observation, we propose ${\bf \texttt{Sink-Aware Pruning}}$, which automatically identifies and prunes unstable sinks in DLMs (prior studies usually keep sinks for AR LLMs). Without retraining, our method achieves a better quality-efficiency trade-off and outperforms strong prior pruning baselines under matched compute. Our code is available at https://github.com/VILA-Lab/Sink-Aware-Pruning.

URL PDF HTML ☆

赞 0 踩 0

2602.17663 2026-02-20 cs.AI cs.CL cs.IR

CLEF HIPE-2026: Evaluating Accurate and Efficient Person-Place Relation Extraction from Multilingual Historical Texts

Juri Opitz, Corina Raclé, Emanuela Boros, Andrianos Michail, Matteo Romanello, Maud Ehrmann, Simon Clematide

Comments ECIR 2026. CLEF Evaluation Lab. Registration DL: 2026/04/23. Task Homepage at https://hipe-eval.github.io/HIPE-2026/

2602.17659 2026-02-20 cs.CV cs.RO

When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs

Yu Fang, Yuchun Feng, Dong Jing, Jiaqi Liu, Yue Yang, Zhenyu Wei, Daniel Szafir, Mingyu Ding

Comments Website: https://vla-va.github.io/

2602.17655 2026-02-20 cs.CL

What Language is This? Ask Your Tokenizer

Clara Meister, Ahmetcan Yavuz, Pietro Lesci, Tiago Pimentel

2602.17645 2026-02-20 cs.LG cs.AI cs.CL cs.CV

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen

Comments Code at: https://github.com/vila-lab/M-Attack-V2

2602.17642 2026-02-20 cs.LG

A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning

Dhruv Talwar, Harsh Desai, Wendong Yin, Goutam Mohanty, Rafael Reveles

2602.17641 2026-02-20 cs.LG cs.AI

FAMOSE: A ReAct Approach to Automated Feature Discovery

Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, Bo Li

Comments 23 pages, 6 figures

2602.17639 2026-02-20 cs.CV

IntRec: Intent-based Retrieval with Contrastive Refinement

Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Yue Lu

2602.17636 2026-02-20 cs.CV

CORAL: Correspondence Alignment for Improved Virtual Try-On

Jiyoung Kim, Youngjin Shin, Siyoon Jin, Dahyun Chung, Jisu Nam, Tongmin Kim, Jongjae Park, Hyeonwoo Kang, Seungryong Kim

Comments 32 pages, 25 figures

2602.17634 2026-02-20 cs.LG cs.AI

Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting

Xinghong Fu, Yanhong Li, Georgios Papaioannou, Yoon Kim

2602.17633 2026-02-20 cs.LG cs.AI stat.ML

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani

2602.17625 2026-02-20 cs.LG cs.DC

Catastrophic Forgetting Resilient One-Shot Incremental Federated Learning

Obaidullah Zaland, Zulfiqar Ahmad Khan, Monowar Bhuyan

Comments Accepted for publication in the IEEE International Conference on Big Data (IEEE BigData) 2025

2602.17623 2026-02-20 cs.CL

Unmasking the Factual-Conceptual Gap in Persian Language Models

Alireza Sakhaeirad, Ali Ma'manpoosh, Arshia Hemmat

2602.17614 2026-02-20 cs.LG cs.DC

Guarding the Middle: Protecting Intermediate Representations in Federated Split Learning

Obaidullah Zaland, Sajib Mistry, Monowar Bhuyan

Comments Accepted for Publication in IEEE International Conference on Big Data (IEEE BigData) 2025

2602.17608 2026-02-20 cs.LG cs.AI stat.ML

Towards Anytime-Valid Statistical Watermarking

Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan

2602.17607 2026-02-20 cs.AI cs.LG cs.NA math.NA

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

Jianda Du, Youran Sun, Haizhao Yang

2602.17602 2026-02-20 cs.AI

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong

2602.17599 2026-02-20 cs.CV cs.MM cs.SD

Art2Mus: Artwork-to-Music Generation via Visual Conditioning and Large-Scale Cross-Modal Alignment

Ivan Rinaldi, Matteo Mendula, Nicola Fanelli, Florence Levé, Matteo Testi, Giovanna Castellano, Gennaro Vessio

详情

英文摘要

Music generation has advanced markedly through multimodal deep learning, enabling models to synthesize audio from text and, more recently, from images. However, existing image-conditioned systems suffer from two fundamental limitations: (i) they are typically trained on natural photographs, limiting their ability to capture the richer semantic, stylistic, and cultural content of artworks; and (ii) most rely on an image-to-text conversion stage, using language as a semantic shortcut that simplifies conditioning but prevents direct visual-to-audio learning. Motivated by these gaps, we introduce ArtSound, a large-scale multimodal dataset of 105,884 artwork-music pairs enriched with dual-modality captions, obtained by extending ArtGraph and the Free Music Archive. We further propose ArtToMus, the first framework explicitly designed for direct artwork-to-music generation, which maps digitized artworks to music without image-to-text translation or language-based semantic supervision. The framework projects visual embeddings into the conditioning space of a latent diffusion model, enabling music synthesis guided solely by visual information. Experimental results show that ArtToMus generates musically coherent and stylistically consistent outputs that reflect salient visual cues of the source artworks. While absolute alignment scores remain lower than those of text-conditioned systems-as expected given the substantially increased difficulty of removing linguistic supervision-ArtToMus achieves competitive perceptual quality and meaningful cross-modal correspondence. This work establishes direct visual-to-music generation as a distinct and challenging research direction, and provides resources that support applications in multimedia art, cultural heritage, and AI-assisted creative practice. Code and dataset will be publicly released upon acceptance.

URL PDF HTML ☆

赞 0 踩 0

2602.17596 2026-02-20 cs.LG

Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks

Saveliy Baturin

2602.17594 2026-02-20 cs.AI

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths, Katherine M. Collins, José Hernández-Orallo, Phillip Isola, Samuel J. Gershman, Joshua B. Tenenbaum

Comments 29 pages, 14 figures

2602.17586 2026-02-20 cs.RO cs.AI cs.LG

Conditional Flow Matching for Continuous Anomaly Detection in Autonomous Driving on a Manifold-Aware Spectral Space

Antonio Guillen-Perez

详情

英文摘要

Safety validation for Level 4 autonomous vehicles (AVs) is currently bottlenecked by the inability to scale the detection of rare, high-risk long-tail scenarios using traditional rule-based heuristics. We present Deep-Flow, an unsupervised framework for safety-critical anomaly detection that utilizes Optimal Transport Conditional Flow Matching (OT-CFM) to characterize the continuous probability density of expert human driving behavior. Unlike standard generative approaches that operate in unstable, high-dimensional coordinate spaces, Deep-Flow constrains the generative process to a low-rank spectral manifold via a Principal Component Analysis (PCA) bottleneck. This ensures kinematic smoothness by design and enables the computation of the exact Jacobian trace for numerically stable, deterministic log-likelihood estimation. To resolve multi-modal ambiguity at complex junctions, we utilize an Early Fusion Transformer encoder with lane-aware goal conditioning, featuring a direct skip-connection to the flow head to maintain intent-integrity throughout the network. We introduce a kinematic complexity weighting scheme that prioritizes high-energy maneuvers (quantified via path tortuosity and jerk) during the simulation-free training process. Evaluated on the Waymo Open Motion Dataset (WOMD), our framework achieves an AUC-ROC of 0.766 against a heuristic golden set of safety-critical events. More significantly, our analysis reveals a fundamental distinction between kinematic danger and semantic non-compliance. Deep-Flow identifies a critical predictability gap by surfacing out-of-distribution behaviors, such as lane-boundary violations and non-normative junction maneuvers, that traditional safety filters overlook. This work provides a mathematically rigorous foundation for defining statistical safety gates, enabling objective, data-driven validation for the safe deployment of autonomous fleets.

URL PDF HTML ☆

赞 0 踩 0

2602.17584 2026-02-20 cs.LG

Canonicalizing Multimodal Contrastive Representation Learning

Sharut Gupta, Sanyam Kansal, Stefanie Jegelka, Phillip Isola, Vikas Garg

Comments 78 pages, 57 figures

2602.17574 2026-02-20 cs.RO cs.SY eess.SY

Hybrid System Planning using a Mixed-Integer ADMM Heuristic and Hybrid Zonotopes

Joshua A. Robbins, Andrew F. Thompson, Jonah J. Glunt, Herschel C. Pangborn

2602.17573 2026-02-20 cs.RO cs.CV

FR-GESTURE: An RGBD Dataset For Gesture-based Human-Robot Interaction In First Responder Operations

Konstantinos Foteinos, Georgios Angelidis, Aggelos Psiris, Vasileios Argyriou, Panagiotis Sarigiannidis, Georgios Th. Papadopoulos

2602.17568 2026-02-20 cs.LG cs.AI

Be Wary of Your Time Series Preprocessing

Sofiane Ennadir, Tianze Wang, Oleg Smirnov, Sahar Asadi, Lele Cao

Comments Accepted at the AI4TS workshop at AAAI-26

2602.17566 2026-02-20 cs.AI

A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN

Asif Hasan Chowdhury, Md. Fahim Islam, M Ragib Anjum Riad, Faiyaz Bin Hashem, Md Tanzim Reza, Md. Golam Rabiul Alam

2602.17559 2026-02-20 cs.LG

Revisiting Weight Regularization for Low-Rank Continual Learning

Yaoyue Zheng, Yin Zhang, Joost van de Weijer, Gido M van de Ven, Shaoyi Du, Xuetao Zhang, Zhiqiang Tian

Comments Accepted by ICLR 2026

2602.17558 2026-02-20 cs.CV

RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward

Qiucheng Wu, Jing Shi, Simon Jenni, Kushal Kafle, Tianyu Wang, Shiyu Chang, Handong Zhao

Comments 10 pages, 6 figures

2602.17544 2026-02-20 cs.AI cs.CL cs.IR

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability

Shashank Aggarwal, Ram Vikas Mishra, Amit Awekar

2602.17537 2026-02-20 cs.RO cs.LG

IRIS: Learning-Driven Task-Specific Cinema Robot Arm for Visuomotor Motion Control

Qilong Cheng, Matthew Mackay, Ali Bereyhi