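arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.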

2604.12911 2026-04-15 cs.CL cs.AI

Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss

Ronald Skorobogat, Ameya Prabhu, Matthias Bethge

Abstract

Multilingual benchmarks guide the development of frontier models. Yet multilingual evaluations reported by frontier models are structured similarly to popular reasoning and knowledge benchmarks, but across many languages. We show such benchmarks, and consequently multilingual evaluations, measure mathematical reasoning and factual recall, not multilingual proficiency. For example, thinking variants dramatically outperform instruct variants on these benchmarks, yet often perform worse on real-world multilingual tasks, such as LMArena. We propose a simple alternative: evaluate multilingual capability via round-trip translation. Given text in a source language, translate it to a target language and back; semantic gaps between the original and result expose failures in multilingual generation capabilities. Round-trip translation correlates almost perfectly ($\rho = 0.94$) with user ratings on LMArena with our benchmark, requires no human reference translations, and does not require a more capable multilingual judge than tested models. Lastly, we introduce Lost in Translation (LiT), a challenging round-trip translation benchmark spanning widely spoken languages worldwide, for realistic evaluation of multilingual frontier models.
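
The round-trip idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the LiT benchmark or its scoring: `toy_translate` is a hypothetical dictionary "model" standing in for a real system, and `token_f1` is a crude word-overlap stand-in for whatever semantic-gap measure the authors use.

```python
from typing import Callable, Dict

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, src: str, tgt: str, translate: Translator) -> str:
    """Translate src -> tgt -> src and return the reconstructed source text."""
    forward = translate(text, src, tgt)
    return translate(forward, tgt, src)


def token_f1(original: str, reconstructed: str) -> float:
    """Crude stand-in for a semantic-gap score: F1 over lowercased word sets."""
    a = set(original.lower().split())
    b = set(reconstructed.lower().split())
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Toy dictionary 'model' (an assumption) that drops words it does not know."""
    table: Dict[str, Dict[str, str]] = {
        "en->de": {"the": "der", "dog": "hund", "runs": "rennt"},
        "de->en": {"der": "the", "hund": "dog", "rennt": "runs"},
    }
    mapping = table[f"{src}->{tgt}"]
    return " ".join(mapping[w] for w in text.lower().split() if w in mapping)


sentence = "The dog runs quickly"
back = round_trip(sentence, "en", "de", toy_translate)
score = token_f1(sentence, back)
print(back, round(score, 3))
```

The toy model loses "quickly" on the forward pass, so the reconstruction is shorter than the original and the score falls below 1. No reference translation in the target language is needed to detect the loss, which is the property the abstract highlights.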

2604.12909 2026-04-15 cs.RO

Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots

Yifei Yan, Linqi Ye

2604.12908 2026-04-15 cs.RO

Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

Zijian Song, Qichang Li, Jiawei Zhou, Zhenlong Yuan, Tianshui Chen, Liang Lin, Guangrun Wang

Comments 18 pages, 10 figures

2604.12905 2026-04-15 cs.RO cs.LG

Frequency-aware Decomposition Learning for Sensorless Wrench Forecasting on a Vibration-rich Hydraulic Manipulator

Hyeonbeen Lee, Min-Jae Jung, Tae-Kyeong Yeu, Jong-Boo Han, Daegil Park, Jin-Gyun Kim

Comments 11 pages, 6 figures, submitted to IEEE/ASME Transactions on Mechatronics

2604.12904 2026-04-15 cs.CV

A Sanity Check on Composed Image Retrieval

Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

2604.12898 2026-04-15 cs.AI math.CO

BEAM: Bi-level Memory-adaptive Algorithmic Evolution for LLM-Powered Heuristic Design

Chuyang Xiang, Yichen Wei, Jiale Ma, Handing Wang, Junchi Yan

Comments 24 pages, 11 figures

2604.12896 2026-04-15 cs.CV cs.LG

Don't Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs

Muhammad Kamran Janjua, Hugo Silva, Di Niu, Bahador Rashidi

Comments Accepted to CVPR 2026

2604.12894 2026-04-15 cs.CV

Representing 3D Faces with Learnable B-Spline Volumes

Prashanth Chandran, Daoye Wang, Timo Bolkart

Comments Accepted to CVPR 2026 (Highlight)

2604.12891 2026-04-15 cs.LG cs.AR

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

Comments introduces TCL framework for cross-hardware tensor program optimization with active learning, Mamba-based cost model, and continual knowledge distillation; includes extensive experiments on CPU and GPU platforms

2604.12887 2026-04-15 cs.CV cs.LG

VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization

Andrei Atanov, Jesse Allardice, Roman Bachmann, Oğuzhan Fatih Kar, R Devon Hjelm, David Griffiths, Peter Fu, Afshin Dehghan, Amir Zamir

Comments project page at https://videoflextok.epfl.ch/

2604.12879 2026-04-15 cs.RO cs.AI

FastGrasp: Learning-based Whole-body Control method for Fast Dexterous Grasping with Mobile Manipulators

Heng Tao, Yiming Zhong, Zemin Yang, Yuexin Ma

2604.12875 2026-04-15 cs.AI

AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance

Abiodun A. Solanke

Comments 11 pages, 4 figures

Abstract

The rapid expansion of large language model (LLM) safety evaluation has produced a substantial benchmark ecosystem, but not a correspondingly coherent measurement ecosystem. We present AISafetyBenchExplorer, a structured catalogue of 195 AI safety benchmarks released between 2018 and 2026, organized through a multi-sheet schema that records benchmark-level metadata, metric-level definitions, benchmark-paper metadata, and repository activity. This design enables meta-analysis not only of what benchmarks exist, but also of how safety is operationalized, aggregated, and judged across the literature. Using the updated catalogue, we identify a central structural problem: benchmark proliferation has outpaced measurement standardization. The current landscape is dominated by medium-complexity benchmarks (94/195), while only 7 benchmarks occupy the Popular tier. The workbook further reports strong concentration around English-only evaluation (165/195), evaluation-only resources (170/195), stale GitHub repositories (137/195), stale Hugging Face datasets (96/195), and heavy reliance on arXiv preprints among benchmarks with known venue metadata. At the metric level, the catalogue shows that familiar labels such as accuracy, F1 score, safety score, and aggregate benchmark scores often conceal materially different judges, aggregation rules, and threat models. We argue that the field's main failure mode is fragmentation rather than scarcity. Researchers now have many benchmark artifacts, but they often lack a shared measurement language, a principled basis for benchmark selection, and durable stewardship norms for post-publication maintenance. AISafetyBenchExplorer addresses this gap by providing a traceable benchmark catalogue, a controlled metadata schema, and a complexity taxonomy that together support more rigorous benchmark discovery, comparison, and meta-evaluation.
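
The fractions quoted in the abstract are easier to compare as percentages. The short script below only redoes that arithmetic; the counts are copied from the abstract, while the dictionary labels are wording chosen here for illustration and are not the catalogue's schema.

```python
TOTAL = 195  # catalogue size stated in the abstract

# Counts copied from the abstract; the keys are labels chosen for this sketch.
counts = {
    "medium-complexity": 94,
    "Popular tier": 7,
    "English-only evaluation": 165,
    "evaluation-only resources": 170,
    "stale GitHub repositories": 137,
    "stale Hugging Face datasets": 96,
}

# Share of the whole catalogue, in percent.
shares = {label: 100 * n / TOTAL for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:30s} {counts[label]:3d}/{TOTAL}  {pct:5.1f}%")
```

Read this way, roughly half of the catalogued benchmarks are medium-complexity, and more than four in five are English-only or evaluation-only.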

2604.12874 2026-04-15 cs.AI

LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems

Anne Lee, Gurudutt Hosangadi

Comments 9 pages, 4 figures, 4 tables

2604.12872 2026-04-15 cs.RO

OVAL: Open-Vocabulary Augmented Memory Model for Lifelong Object Goal Navigation

Jiahua Pei, Yi Liu, Guoping Pan, Yuanhao Jiang, Houde Liu, Xueqian Wang

Comments 8 pages, 5 figures

2604.12857 2026-04-15 cs.AI cs.RO cs.SY eess.SY

Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic

Saeed Rahmani, Shiva Rasouli, Daphne Cornelisse, Eugene Vinitsky, Bart van Arem, Simeon C. Calvert

Comments This work has been submitted to the IEEE for possible publication

2604.12855 2026-04-15 cs.RO

Evolving the Complete Muscle: Efficient Morphology-Control Co-design for Musculoskeletal Locomotion

Lidong Sun, Wentao Zhao, Ye Wang, Huaping Liu, Fuchun Sun

2604.12852 2026-04-15 cs.RO

PAINT: Partner-Agnostic Intent-Aware Cooperative Transport with Legged Robots

Zhihao Cao, Tianxu An, Chenhao Li, Stelian Coros, Marco Hutter

2604.12837 2026-04-15 cs.RO

GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

Yi Liu, Haoxuan Xu, Hongbo Duan, Keyu Fan, Zhengyang Zhang, Peiyu Zhuang, Pengting Luo, Houde Liu

Comments 8 pages, Accepted by ICRA 2026

2604.12833 2026-04-15 cs.CV

Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks

Yingying Zhao, Chengyin Hu, Qike Zhang, Xin Li, Xin Wang, Yiwei Wei, Jiujiang Guo, Jiahuan Long, Tingsong Jiang, Wen Yao

2604.12832 2026-04-15 cs.CV cs.AI

Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models

Iman Islam, Bram Ruijsink, Andrew J. Reader, Andrew P. King

Comments 5 pages, 3 figures, 2 tables, International Symposium on Biomedical Imaging 2026

2604.12831 2026-04-15 cs.RO

VULCAN: Vision-Language-Model Enhanced Multi-Agent Cooperative Navigation for Indoor Fire-Disaster Response

Shengding Liu, Qiben Yan

Comments INFOCOM EIN Workshop 2026

2604.12820 2026-04-15 cs.AI cs.CL

RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Jagadeesh Rachapudi, Pranav Singh, Ritali Vatsi, Praful Hambarde, Amit Shukla

Abstract

Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora, with no native mechanism for selective removal. While machine unlearning offers a principled solution, existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs), thereby excluding end users from controlling their own data. We introduce Interactive Machine Unlearning (IMU), a new paradigm in which users can instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, a prompt-aware model repair framework comprising (i) a watchdog model for unlearning intent detection, (ii) a surgeon model for generating repair procedures, and (iii) a patient model whose parameters are updated autonomously. At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 * d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines. Extensive experiments across harmful knowledge suppression, misinformation correction, and personal data erasure demonstrate that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88), outperforming six state-of-the-art baselines. These results establish RePAIR as an effective and practical framework for user-driven model editing, advancing transparent and on-device control over learned knowledge, with potential extensions to multimodal foundation models.
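
To give a feel for what a closed-form pseudoinverse update looks like, here is a generic rank-one weight edit. It is not the authors' STAMP procedure, whose details the abstract does not give: it is the textbook minimum-norm update that maps a single activation exactly onto a chosen target. The names `pinv_rank_one_update` and `refusal_direction` and the toy matrices are all assumptions made for this sketch.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def pinv_rank_one_update(w: Matrix, a: Vector, target: Vector) -> Matrix:
    """Return W' = W + (target - W a) a^+, with a^+ = a^T / (a^T a).

    For a single non-zero activation `a`, this is the smallest
    (Frobenius-norm) change to W that sends `a` exactly to `target`.
    """
    norm_sq = sum(x * x for x in a)
    if norm_sq == 0.0:
        raise ValueError("activation must be non-zero")
    residual = [t - y for t, y in zip(target, matvec(w, a))]
    return [
        [wij + r * aj / norm_sq for wij, aj in zip(row, a)]
        for row, r in zip(w, residual)
    ]


# Toy layer and a single "forget" sample (all values are illustrative).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
a = [1.0, 2.0, 2.0]
refusal_direction = [0.5, -0.5]  # hypothetical target output

W_new = pinv_rank_one_update(W, a, refusal_direction)
print(matvec(W_new, a))  # approximately the target

# Inputs orthogonal to `a` are unaffected by the rank-one edit.
orthogonal = [2.0, -1.0, 0.0]
print(matvec(W, orthogonal), matvec(W_new, orthogonal))
```

The edit needs one sample and no gradient steps, which is the sense in which such updates are training-free. Because the correction is proportional to `a`, inputs orthogonal to it pass through unchanged, which gives an intuition for how utility on unrelated inputs might be preserved.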

2604.12817 2026-04-15 cs.LG cs.CR stat.ML

Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory

Shaopeng Fu, Di Wang

Comments The Fourteenth International Conference on Learning Representations (ICLR 2026)

2604.12816 2026-04-15 cs.CL

The role of System 1 and System 2 semantic memory structure in human and LLM biases

Katherine Abramski, Giulio Rossetti, Massimo Stella

Comments 31 pages, 5 figures, 9 appendix figures

2604.12811 2026-04-15 cs.LG cs.AI cs.NE

Algorithmic Analysis of Dense Associative Memory: Finite-Size Guarantees and Adversarial Robustness

Madhava Gaikwad

Comments 21 pages, 9 figures, Accepted in New Frontiers in Associative Memory workshop at ICLR 2026

2604.12806 2026-04-15 cs.LG

Interpretable Relational Inference with LLM-Guided Symbolic Dynamics Modeling

Xiaoxiao Liang, Juyuan Zhang, Liming Pan, Linyuan Lü

Comments Submitted to conference

2604.12805 2026-04-15 cs.CV

Image-to-Image Translation Framework Embedded with Rotation Symmetry Priors

Feiyu Tan, Heran Yang, Qihong Duan, Kai Ye, Qi Xie, Deyu Meng

Comments 17 pages, 8 figures, submitting to TPAMI

2604.12803 2026-04-15 cs.CV cs.LG

Generative Anonymization in Event Streams

Adam T. Müller, Mihai Kocsis, Nicolaj C. Stache

Comments Accepted to the 1st Workshop on Low-Level Vision Frontiers (LoViF) at IEEE/CVF CVPR 2026

2604.12798 2026-04-15 cs.LG cs.AI

VFA: Relieving Vector Operations in Flash Attention with Global Maximum Pre-computation

Yupeng Sun, Yanzhao Li, Zhiqiang Zou, Bai Du, Zhiyuan Zhang, Hui Dong, Gaoyige Fan, Hui Wang

Abstract

FlashAttention-style online softmax enables exact attention computation with linear memory by streaming score tiles through on-chip memory and maintaining a running maximum and normalizer. However, as attention kernels approach peak tensor-core/cube-core throughput on modern accelerators, non-matmul components of online softmax -- especially per-tile rowmax and rowsum reductions and rescale chains -- can become vector or SIMD limited and dominate latency. This paper revisits FlashAttention and proposes Vector Relieved Flash Attention (VFA), a hardware-friendly method that reduces rowmax-driven updates of the running maximum while retaining the online-softmax structure. VFA initializes the running maximum via a cheap approximation from key-block representations, reorders key-block traversal to prioritize high-impact sink and local blocks, and freezes the maximum for remaining blocks to avoid repeated reductions and rescaling. We further integrate VFA with block-sparse skipping methods such as BLASST to form Vector Relieved Sparse Attention (VSA), which reduces both block count and per-block overhead. Notably, VFA and VSA completely avoid the conditional rescale operation in the update stage used in FA4.0. Extensive evaluations on benchmarks including MMLU and MATH500, together with attention statistics, verify our design: (i) sink and local reordering stabilizes the running maximum early; (ii) simple Q and K block summaries fail due to intra-block heterogeneity; (iii) m-initialization is required when maxima appear in middle blocks. Overall, VFA and VSA efficiently alleviate online-softmax reduction bottlenecks without performance loss. Compared to the C16V32 baseline, C8V32, C4V32 and C4V16 achieve nearly two times speedup on modern hardware while hitting the vector bottleneck. With upcoming architecture improvements, C4V16 will deliver six times speedup by enhancing exponent capacity.
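
The online-softmax bookkeeping described in the abstract (a running maximum and normalizer, with a rescale whenever the maximum changes) can be sketched for a single query row with scalar values. This is a plain-Python illustration, not the VFA kernel: the `frozen_max` option only shows why fixing the shift up front removes the per-tile max and rescale steps, and the function names and toy data are assumptions.

```python
import math
from typing import Optional, Sequence


def reference_attention(scores: Sequence[float], values: Sequence[float]) -> float:
    """Softmax-weighted average of the values, with all scores available at once."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def streamed_attention(
    scores: Sequence[float],
    values: Sequence[float],
    tile: int,
    frozen_max: Optional[float] = None,
) -> float:
    """Process the scores tile by tile.

    frozen_max=None: standard online softmax (running max, rescale on change).
    frozen_max=c:    the shift is fixed up front, so no per-tile max or
                     rescale is needed.
    """
    m = -math.inf if frozen_max is None else frozen_max
    normalizer = 0.0  # running sum of exp(score - m)
    acc = 0.0         # running sum of exp(score - m) * value
    for start in range(0, len(scores), tile):
        s_tile = scores[start:start + tile]
        v_tile = values[start:start + tile]
        if frozen_max is None:
            m_new = max(m, max(s_tile))
            scale = math.exp(m - m_new)  # exp(-inf) is 0.0 on the first tile
            normalizer *= scale
            acc *= scale
            m = m_new
        for s, v in zip(s_tile, v_tile):
            p = math.exp(s - m)
            normalizer += p
            acc += p * v
    return acc / normalizer


scores = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

ref = reference_attention(scores, values)
online = streamed_attention(scores, values, tile=2)
frozen = streamed_attention(scores, values, tile=2, frozen_max=3.0)
print(ref, online, frozen)
```

Softmax is invariant to the constant subtracted from the scores, so the frozen variant matches the reference even though 3.0 is below the true maximum of 3.5. The quality of the estimate matters only for numerical range: one that is far too low risks overflow in the exponentials, and one that is far too high risks underflow.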

2604.12782 2026-04-15 cs.LG cs.AI

OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

Zhiyuan Zhang, Yanzhao Li, Zhiqiang Zou, Bai Du, Yupeng Sun, Hui Dong, Hui Wang