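arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.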

2512.21204 2026-04-21 cs.CL cs.AI

SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

Mahi Luthra, Jiayi Shen, Maxime Poli, Angelo Ortiz, Yosuke Higuchi, Youssef Benchekroun, Martin Gleize, Charles-Eric Saint-James, Dongyan Lin, Phillip Rust, Angel Villar, Surya Parimi, Vanessa Stark, Rashel Moritz, Juan Pino, Yann LeCun, Emmanuel Dupoux

Abstract

Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to the data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol which formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), avoiding heavy computation costs. Finally, we stabilize meta-training by using a robust initialization through interleaved supervision which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and downstream spoken language modeling scores (sWUGGY, sBLIMP, tSC), surpassing in-domain toplines after training on less than 1h of target-language audio and delivering $100\times$ greater data efficiency than standard multi-task training. These findings highlight a practical, architecture-agnostic path toward biologically inspired, data-efficient representations. We open-source the training code and model checkpoints at https://github.com/facebookresearch/spidr-adapt.

2512.20626 2026-04-21 cs.AI cs.CL cs.CV cs.IR

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation

Chi-Hsiang Hsiao, Yi-Cheng Wang, Tzung-Sheng Lin, Yi-Ren Yeh, Chu-Song Chen

Comments: ACL 2026

2512.15948 2026-04-21 cs.AI q-bio.NC

Subjective functions

Samuel J. Gershman

2512.12643 2026-04-21 cs.CL

LexRel: Benchmarking Legal Relation Extraction for Chinese Civil Cases

Yida Cai, Ranjuexiao Hu, Huiyuan Xie, Chenyang Li, Yun Liu, Yuxiao Ye, Zhenghao Liu, Weixing Shen, Zhiyuan Liu

Comments: Accepted to ACL 2026 (main conference). 17 pages, 7 figures

2512.12642 2026-04-21 cs.LG

Torch Geometric Pool: the PyTorch library for pooling in Graph Neural Networks

Carlo Abate, Ivan Marisca, Filippo Maria Bianchi

2512.11108 2026-04-21 cs.CL cs.AI

Explanation Bias is a Product: Revealing the Hidden Lexical and Position Preferences in Post-Hoc Feature Attribution

Jonathan Kamp, Roos Bakker, Dominique Blok

Comments: 9 pages

2512.10687 2026-04-21 cs.AI cs.CY

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

Manon Kempermann, Sai Suresh Macharla Vasu, Mahalakshmi Raveenthiran, Theo Farrell, Ingmar Weber

Comments: Paper accepted at IASEAI'26; please cite that peer-reviewed version instead

Abstract

Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD's AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory study, we evaluated advice on finance and health from GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across user profiles of varying vulnerability. First, we demonstrate that evaluators must have access to rich user context: identical LLM responses were rated significantly safer by context-blind evaluators than by those aware of user circumstances, with safety scores for high-vulnerability users dropping from safe (5/7) to somewhat unsafe (3/7). One might assume this gap could be addressed by creating realistic user prompts containing key contextual information. However, our second study challenges this: we reran the evaluation on prompts containing the context users report they would disclose, finding no significant improvement. Our work establishes that effective user-welfare safety evaluation requires evaluators to assess responses against diverse user profiles, as realistic user context disclosure alone proves insufficient, particularly for vulnerable populations. By demonstrating a methodology for context-aware evaluation, this study provides both a starting point for such assessments and foundational evidence that evaluating individual welfare demands approaches distinct from existing universal-risk frameworks. We publish our code and dataset to aid future developments.

2512.06987 2026-04-21 cs.LG cond-mat.mtrl-sci

OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction

Emily Jin, Andrei Cristian Nica, Mikhail Galkin, Jarrid Rector-Brooks, Kin Long Kelvin Lee, Santiago Miret, Frances H. Arnold, Michael Bronstein, Avishek Joey Bose, Alexander Tong, Cheng-Hao Liu

2512.05623 2026-04-21 cs.LG

Bounded Graph Clustering with Graph Neural Networks

Kibidi Neocosmos, Diego Baptista, Nicole Ludwig

Comments: 20 pages, 11 figures

2512.04677 2026-04-21 cs.CV

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Yubo Huang, Hailong Guo, Fangtai Wu, Weiqiang Wang, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, Steven Hoi

2512.01643 2026-04-21 cs.CV

ViT$^3$: Unlocking Test-Time Training in Vision

Dongchen Han, Yining Li, Tianyu Li, Zixuan Cao, Ziming Wang, Jun Song, Yu Cheng, Bo Zheng, Gao Huang

Comments: CVPR 2026, oral

2511.23170 2026-04-21 cs.CV

PowerCLIP: Powerset Alignment for Contrastive Pre-Training

Masaki Kawamura, Nakamasa Inoue, Rintaro Yanagi, Hirokatsu Kataoka, Rio Yokota

2511.21064 2026-04-21 cs.AI cs.CV

OVOD-Agent: A Markov-Bandit Framework for Proactive Visual Reasoning and Self-Evolving Detection

Chujie Wang, Jianyu Lu, Zhiyuan Luo, Xi Chen, Chu He

2511.19202 2026-04-21 cs.CV cs.GR

NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting

Brent Zoomers, Florian Hahlbohm, Joni Vanherck, Lode Jorissen, Marcus Magnor, Nick Michiels

Comments: 17 pages, 15 figures

2511.18850 2026-04-21 cs.CL

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

Fengyuan Liu, Yi Huang, Sichun Luo, Yuqi Wang, Yazheng Yang, Xinye Li, Zefa Hu, Junlan Feng, Qi Liu

2511.17699 2026-04-21 cs.CV cs.AI

Understanding Counting Mechanisms in Large Language and Vision-Language Models

Hosein Hasani, Amirmohammad Izadi, Fatemeh Askari, Mobin Bagherian, Sadegh Mohammadian, Mohammad Izadi, Mahdieh Soleymani Baghshah

Comments: Accepted to CVPR 2026

2511.16698 2026-04-21 cs.CL cs.AI

Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT

Jonathon Dilworth, Hui Yang, Jiaoyan Chen, Yongsheng Gao, Ernesto Jimenez-Ruiz

Comments: 21 pages, 5 figures, 8 tables; submitted to the Transactions on Graph Data and Knowledge (TGDK) journal

2511.15669 2026-04-21 cs.LG cs.AI cs.RO

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

Cheng Yin, Yankai Lin, Wang Xu, Sikyuen Tam, Xiangrui Zeng, Zhiyuan Liu, Zhouping Yin

Comments: 19 pages, 6 figures, conference

2511.14582 2026-04-21 cs.CV

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang

Comments: CVPR 2026. Code: https://github.com/KD-TAO/OmniZip

2511.12676 2026-04-21 cs.CV cs.AI

BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections

Subin Varghese, Joshua Gao, Asad Ur Rahman, Vedhus Hoskere

2511.12554 2026-04-21 cs.CV

EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis

Yijie Guo, Dexiang Hong, Weidong Chen, Zihan She, Cheng Ye, Xiaojun Chang, Zhendong Mao

Comments: 11 pages, 7 figures. This is a preprint version of a paper submitted to CVPR 2026

2511.11113 2026-04-21 cs.CV cs.AI cs.LG

VIDEOP2R: Video Understanding from Perception to Reasoning

Yifan Jiang, Yueying Wang, Rui Zhao, Toufiq Parag, Zhimin Chen, Zhenyu Liao, Jayakrishnan Unnikrishnan

Comments: CVPR Findings 2026

2511.10370 2026-04-21 cs.CV cs.AI cs.LG

SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation

Maria Gonzalez-Calabuig, Kai-Hendrik Cohrs, Vishal Nedungadi, Zuzanna Osika, Ruben Cartuyvels, Steffen Knoblauch, Joppe Massant, Shruti Nath, Patrick Ebel, Vasileios Sitokonstantinou

Comments: Accepted for proceedings at CVPR EarthVision 2026

2511.09818 2026-04-21 cs.CV

Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration

Hanzhou Liu, Peng Jiang, Jia Huang, Mi Lu

2511.07129 2026-04-21 cs.CL cs.AI cs.LG

LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

Seungeon Lee, Soumi Das, Manish Gupta, Krishna P. Gummadi

Comments: Accepted as a main conference paper at ACL 2026

2511.05152 2026-04-21 cs.CV cs.GR cs.MM

Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges

Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

Comments: Accepted to the IEEE International Conference on 3DV (2026)

2511.00868 2026-04-21 cs.LG

FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management

Nazmul Takbir, Hamidreza Alikhani, Nikil Dutt, Sangeetha Abdu Jyothi

Comments: Accepted at MLSys 2026

2510.26721 2026-04-21 cs.AI cs.MM

MaLoRA: Gated Modality LoRA for Key-Space Alignment in Multimodal LLM Fine-Tuning

Xinhan Zheng, Huyu Wu, Xueting Wang, Duo Su, Haiyun Jiang

2510.24235 2026-04-21 cs.LG cs.AI

PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling

Ai Jian, Jingqing Ruan, Xing Ma, Xiaoyun Zhang, Dailin Li, Weipeng Zhang, Ke Zeng, Xunliang Cai

Comments: ACL Main

2510.19410 2026-04-21 cs.CL cs.AI

ToMMeR -- Efficient Entity Mention Detection from Large Language Models

Victor Morand, Nadi Tomeh, Josiane Mothe, Benjamin Piwowarski

Comments: Accepted at ACL 2026. Code: https://github.com/VictorMorand/llm2ner