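arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.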

2604.02217 2026-04-03 cs.AI cs.CL

VISTA: Visualization of Token Attribution via Efficient Analysis

Syed Ahmed, Bharathi Vokkaliga Ganesh, Jagadish Babu P, Karthick Selvaraj, Praneeth Talluri, Sanket Hingne, Anubhav Kumar, Anushka Yadav, Pratham Kumar Verma, Kiranmayee Janardhan, Mandanna A N

Comments 12 pages, 3 figures

Abstract

Understanding how Large Language Models (LLMs) process information from prompts remains a significant challenge. To shed light on this "black box," attention visualization techniques have been developed to capture neuron-level perceptions and interpret how models focus on different parts of input data. However, many existing techniques are tailored to specific model architectures, particularly within the Transformer family, and often require backpropagation, resulting in nearly double the GPU memory usage and increased computational cost. A lightweight, model-agnostic approach for attention visualization remains lacking. In this paper, we introduce a model-agnostic token importance visualization technique to better understand how generative AI systems perceive and prioritize information from input text, without incurring additional computational cost. Our method leverages perturbation-based strategies combined with a three-matrix analytical framework to generate relevance maps that illustrate token-level contributions to model predictions. The framework comprises: (1) the Angular Deviation Matrix, which captures shifts in semantic direction; (2) the Magnitude Deviation Matrix, which measures changes in semantic intensity; and (3) the Dimensional Importance Matrix, which evaluates contributions across individual vector dimensions. By systematically removing each token and measuring the resulting impact across these three complementary dimensions, we derive a composite importance score that provides a nuanced and mathematically grounded measure of token significance. To support reproducibility and foster wider adoption, we provide open-source implementations of all proposed and utilized explainability techniques, with code and resources publicly available at https://github.com/Infosys/Infosys-Responsible-AI-Toolkit
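
The leave-one-token-out procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `embed` function is a toy character-count embedding standing in for a real model representation, and the composite score (an equal-weight average of max-normalized measures) is a hypothetical choice, since the abstract does not state how the three matrices are combined.

```python
import math

def embed(tokens, dim=8):
    """Toy embedding (assumption): character counts folded into `dim` buckets."""
    vec = [0.0] * dim
    for tok in tokens:
        for ch in tok:
            vec[ord(ch) % dim] += 1.0
    return vec

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angular_deviation(a, b):
    """Angle in radians between two vectors (shift in direction)."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    cos = sum(x * y for x, y in zip(a, b)) / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos)))

def magnitude_deviation(a, b):
    """Absolute change in vector length (shift in intensity)."""
    return abs(norm(a) - norm(b))

def dimensional_deviation(a, b):
    """Per-dimension absolute change."""
    return [abs(x - y) for x, y in zip(a, b)]

def token_importance(tokens, embed_fn=embed):
    """Remove each token in turn and score the resulting change in the embedding."""
    if not tokens:
        return []
    full = embed_fn(tokens)
    rows = []
    for i in range(len(tokens)):
        ablated = embed_fn(tokens[:i] + tokens[i + 1:])
        dims = dimensional_deviation(full, ablated)
        rows.append((
            angular_deviation(full, ablated),
            magnitude_deviation(full, ablated),
            sum(dims) / len(dims),
        ))
    # Hypothetical composite: normalize each measure by its maximum, then average.
    maxes = [max(r[k] for r in rows) or 1.0 for k in range(3)]
    return [sum(r[k] / maxes[k] for k in range(3)) / 3 for r in rows]
```

Calling `token_importance(["the", "cat", "sat"])` returns one score in [0, 1] per token. With a real model, `embed_fn` would be replaced by whatever output representation is being explained.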

2604.02215 2026-04-03 cs.LG cs.AI

Universal Hypernetworks for Arbitrary Models

Xuanfeng Zhou

2604.02209 2026-04-03 cs.CL

CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech

Youssef Saidi, Haroun Elleuch, Fethi Bougares

Comments Accepted at OSACT 2026

2604.02207 2026-04-03 cs.AI cs.CL

Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study

Yosuke Yamagishi, Atsushi Takamatsu, Yasunori Hamaguchi, Tomohiro Kikuchi, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe

Comments 25 pages, 4 figures

Abstract

Background: Accurate translation of radiology reports is important for multilingual research, clinical communication, and radiology education, but the validity of LLM-based evaluation remains unclear. Objective: To evaluate the educational suitability of LLM-generated Japanese translations of chest CT reports and compare radiologist assessments with LLM-as-a-judge evaluations. Methods: We analyzed 150 chest CT reports from the CT-RATE-JPN validation set. For each English report, a human-edited Japanese translation was compared with an LLM-generated translation by DeepSeek-V3.2. A board-certified radiologist and a radiology resident independently performed blinded pairwise evaluations across 4 criteria: terminology accuracy, readability, overall quality, and radiologist-style authenticity. In parallel, 3 LLM judges (DeepSeek-V3.2, Mistral Large 3, and GPT-5) evaluated the same pairs. Agreement was assessed using QWK and percentage agreement. Results: Agreement between radiologists and LLM judges was near zero (QWK=-0.04 to 0.15). Agreement between the 2 radiologists was also poor (QWK=0.01 to 0.06). Radiologist 1 rated terminology as equivalent in 59% of cases and favored the LLM translation for readability (51%) and overall quality (51%). Radiologist 2 rated readability as equivalent in 75% of cases and favored the human-edited translation for overall quality (40% vs 21%). All 3 LLM judges strongly favored the LLM translation across all criteria (70%-99%) and rated it as more radiologist-like in >93% of cases. Conclusions: LLM-generated translations were often judged natural and fluent, but the 2 radiologists differed substantially. LLM-as-a-judge showed strong preference for LLM output and negligible agreement with radiologists. For educational use of translated radiology reports, automated LLM-based evaluation alone is insufficient; expert radiologist review remains important.
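
The two agreement statistics named in the Methods can be sketched as follows, assuming QWK denotes quadratic weighted kappa in its standard form. The integer encoding of ratings (for example 0 = favors human-edited, 1 = equivalent, 2 = favors LLM) is a hypothetical choice for illustration and is not taken from the paper.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_categories-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = num_categories
    if k < 2:
        raise ValueError("need at least two categories")

    # Observed confusion matrix between the two raters.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Both raters used one and the same category throughout.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected

def percentage_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same label."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
```

A value near 0, as reported between radiologists and LLM judges, means the raters agree about as often as chance would predict given their individual rating distributions.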

2604.02206 2026-04-03 cs.LG cs.AI

LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications

Mayank Mayank, Bharanidhar Duraisamy, Florian Geiss

Comments 10 pages, 6 figures

2604.02201 2026-04-03 cs.LG

On the Role of Depth in the Expressivity of RNNs

Maude Lizaire, Michael Rizvi-Martel, Éric Dupuis, Guillaume Rabusseau

2604.02200 2026-04-03 cs.CL

Towards Position-Robust Talent Recommendation via Large Language Models

Silin Du, Hongyan Liu

2604.02198 2026-04-03 cs.AI cs.LG

From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems

Thomas Stefani, Johann Maximilian Christensen, Elena Hoemann, Frank Köster, Sven Hallerbach

2604.02194 2026-04-03 cs.CL cs.AI

Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model

Jaemin Kim, Jae O Lee, Sumyeong Ahn, Seo Yeon Park

2604.02190 2026-04-03 cs.CV cs.RO

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

Yongkang Li, Lijun Zhou, Sixu Yan, Bencheng Liao, Tianyi Yan, Kaixin Xiong, Long Chen, Hongwei Xie, Bing Wang, Guang Chen, Hangjun Ye, Wenyu Liu, Haiyang Sun, Xinggang Wang

Comments code has been released at https://github.com/xiaomi-research/unidrivevla

2604.02188 2026-04-03 cs.CV

Lightweight Spatiotemporal Highway Lane Detection via 3D-ResNet and PINet with ROI-Aware Attention

Sorna Shanmuga Raja, Abdelhafid Zenati

2604.02185 2026-04-03 cs.CV

CXR-LT 2026 Challenge: Projection-Aware Multi-Label and Zero-Shot Chest X-Ray Classification

Juno Cho, Dohui Kim, Mingeon Kim, Hyunseo Jang, Chang Sun Lee, Jong Chul Ye

Comments 5 pages, 3 figures. Accepted to the IEEE ISBI 2026 CXR-LT Challenge

2604.02182 2026-04-03 cs.CV cs.HC

ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline

Juan Manuel Hernandez, Mariana Fernandez-Espinosa, Denis Parra, Diego Gomez-Zara

Comments 7 pages, 4 figures

2604.02174 2026-04-03 cs.AI

Quantifying Self-Preservation Bias in Large Language Models

Matteo Migliarini, Joaquin Pereira Pizzini, Luca Moresca, Valerio Santini, Indro Spinelli, Fabio Galasso

2604.02171 2026-04-03 cs.CL cs.LG

Do Lexical and Contextual Coreference Resolution Systems Degrade Differently under Mention Noise? An Empirical Study on Scientific Software Mentions

Atilla Kaan Alkan, Felix Grezes, Jennifer Lynn Bartlett, Anna Kelbert, Kelly Lockhart, Alberto Accomazzi

Comments 8 pages

2604.02168 2026-04-03 cs.CV

Reflection Generation for Composite Image Using Diffusion Model

Haonan Zhao, Qingyang Liu, Jiaxuan Chen, Li Niu

2604.02162 2026-04-03 cs.CV

Beyond the Fold: Quantifying Split-Level Noise and the Case for Leave-One-Dataset-Out AU Evaluation

Saurabh Hinduja, Gurmeet Kaur, Maneesh Bilalpur, Jeffrey Cohn, Shaun Canavan

Comments CVPR 2026

2604.02160 2026-04-03 cs.CV

CoRegOVCD: Consistency-Regularized Open-Vocabulary Change Detection

Weidong Tang, Hanbin Sun, Zihan Li, Yikai Wang, Feifan Zhang

2604.02156 2026-04-03 cs.CL astro-ph.IM cs.IR cs.LG

AstroConcepts: A Large-Scale Multi-Label Classification Corpus for Astrophysics

Atilla Kaan Alkan, Felix Grezes, Sergi Blanco-Cuaresma, Jennifer Lynn Bartlett, Daniel Chivvis, Anna Kelbert, Kelly Lockhart, Alberto Accomazzi

Comments 9 pages, 2 figures

2604.02155 2026-04-03 cs.CL

Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents

Xuan Qi

Comments 21 pages

2604.02147 2026-04-03 cs.AI

TRACE-Bot: Detecting Emerging LLM-Driven Social Bots via Implicit Semantic Representations and AIGC-Enhanced Behavioral Patterns

Zhongbo Wang, Zhiyu Lin, Zhu Wang, Haizhou Wang

2604.02145 2026-04-03 cs.AI cs.CL

MTI: A Behavior-Based Temperament Profiling System for AI Agents

Jihoon Jeong

Comments 29 pages, 6 figures, 12 tables. Paper #3 in the Model Medicine Series (Paper #1: arXiv:2603.04722)

2604.02142 2026-04-03 cs.RO cs.MA

PRO-SPECT: Probabilistically Safe Scalable Planning for Energy-Aware Coordinated UAV-UGV Teams in Stochastic Environments

Roger Fowler, Cahit Ikbal Er, Benjamin Johnsenberg, Yasin Yazicioglu

2604.02139 2026-04-03 cs.LG

Application of parametric Shallow Recurrent Decoder Network to magnetohydrodynamic flows in liquid metal blankets of fusion reactors

M. Lo Verso, C. Introini, E. Cervi, L. Savoldi, J. N. Kutz, A. Cammi

Abstract

Magnetohydrodynamic (MHD) phenomena play a pivotal role in the design and operation of nuclear fusion systems, where electrically conducting fluids (such as liquid metals or molten salts employed in reactor blankets) interact with magnetic fields of varying intensity and orientation, influencing the resulting flow dynamics. The numerical solution of MHD models entails the resolution of highly nonlinear, multiphysics systems of equations, which can become computationally demanding, particularly in multi-query, parametric, or real-time contexts. This study investigates a fully data-driven framework for MHD state reconstruction that integrates dimensionality reduction through Singular Value Decomposition (SVD) with the SHallow REcurrent Decoder (SHRED), a neural network architecture designed to reconstruct the full spatio-temporal state from sparse time-series measurements of selected observables, including previously unseen parametric configurations. The SHRED methodology is applied to a three-dimensional geometry representative of a portion of a WCLL blanket cell, in which lead-lithium flows around a water-cooled tube. Multiple magnetic field configurations are examined, including constant toroidal fields, combined toroidal-poloidal fields, and time-dependent magnetic fields. Across all considered scenarios, SHRED achieves high reconstruction accuracy, robustness, and generalization to magnetic field intensities, orientations, and temporal evolutions not seen during training. Notably, in the presence of time-varying magnetic fields, the model accurately infers the temporal evolution of the magnetic field itself using temperature measurements alone. Overall, the findings identify SHRED as a computationally efficient, data-driven, and flexible approach for MHD state reconstruction, with significant potential for real-time monitoring, diagnostics and control in fusion reactor systems.
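
The SVD-based dimensionality reduction mentioned in the abstract can be sketched with NumPy. This covers only the compression step: the recurrent decoder itself is not shown, and the synthetic two-mode field below is a made-up stand-in for simulation snapshots, not data from the paper. All function names are hypothetical.

```python
import numpy as np

def fit_svd_basis(snapshots: np.ndarray, rank: int) -> np.ndarray:
    """Return the leading `rank` left singular vectors of a (space, time) matrix."""
    u, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :rank]

def compress(basis: np.ndarray, snapshots: np.ndarray) -> np.ndarray:
    """Project snapshots onto the basis, giving (rank, time) coefficients."""
    return basis.T @ snapshots

def reconstruct(basis: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Lift reduced coefficients back to the full spatial state."""
    return basis @ coeffs

def relative_error(reference: np.ndarray, approximation: np.ndarray) -> float:
    return float(np.linalg.norm(reference - approximation) / np.linalg.norm(reference))

# Synthetic field (assumption): two separable space-time modes, so its rank is 2.
x = np.linspace(0.0, 1.0, 200)[:, None]  # 200 spatial points
t = np.linspace(0.0, 1.0, 50)[None, :]   # 50 time steps
field = (np.sin(np.pi * x) * np.cos(2 * np.pi * t)
         + 0.5 * np.cos(2 * np.pi * x) * np.sin(6 * np.pi * t))

basis = fit_svd_basis(field, rank=2)
coeffs = compress(basis, field)  # shape (2, 50): the reduced state
error = relative_error(field, reconstruct(basis, coeffs))
```

In a setup of this kind, a network would be trained to map sparse sensor time series to the low-dimensional `coeffs` rather than to the full field, which keeps the output layer small.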

2604.02135 2026-04-03 cs.CL

GaelEval: Benchmarking LLM Performance for Scottish Gaelic

Peter Devine, William Lamb, Beatrice Alex, Ignatius Ezeani, Dawn Knight, Mícheál J. Ó Meachair, Paul Rayson, Martin Wynne

Comments 13 pages, to be published in Proceedings of LLMs4SSH (workshop co-located with LREC 2026; Mallorca, Spain; May 2026)

2604.02128 2026-04-03 cs.AI

SEAL: An Open, Auditable, and Fair Data Generation Framework for AI-Native 6G Networks

Sunder Ali Khowaja, Kapal Dev, Engin Zeydan, Madhusanka Liyanage

Comments 6 pages, 2 figures, 1 table, accepted at European Conference on Networks and Communications (2026 EuCNC & 6G Summit)

2604.02119 2026-04-03 cs.LG

AA-SVD: Anchored and Adaptive SVD for Large Language Model Compression

Atul Kumar Sinha, François Fleuret

2604.02118 2026-04-03 cs.AI cs.CL

LLM-as-a-Judge for Time Series Explanations

Preetham Sivalingam, Murari Mandal, Saurabh Deshpande, Dhruv Kumar

Comments Under Review

Abstract

Evaluating the factual correctness of LLM-generated natural language explanations grounded in time series data remains an open challenge. Although modern models generate textual interpretations of numerical signals, existing evaluation methods are limited: reference-based similarity metrics and consistency-checking models require ground-truth explanations, while traditional time series methods operate purely on numerical values and cannot assess free-form textual reasoning. Thus, no general-purpose method exists to directly verify whether an explanation is faithful to the underlying time series data without predefined references or task-specific rules. We study large language models as both generators and evaluators of time series explanations in a reference-free setting: given a time series, a question, and a candidate explanation, the evaluator assigns a ternary correctness label based on pattern identification, numeric accuracy, and answer faithfulness, enabling principled scoring and comparison. To support this, we construct a synthetic benchmark of 350 time series cases across seven query types, each paired with correct, partially correct, and incorrect explanations. We evaluate models across four tasks: explanation generation, relative ranking, independent scoring, and multi-anomaly detection. Results show a clear asymmetry: generation is highly pattern-dependent and exhibits systematic failures on certain query types, with accuracies ranging from 0.00 to 0.12 for Seasonal Drop and Volatility Shift up to 0.94 to 0.96 for Structural Break, while evaluation is more stable, with models correctly ranking and scoring explanations even when their own outputs are incorrect. These findings demonstrate the feasibility of data-grounded LLM-based evaluation for time series explanations and highlight the potential of LLMs as reliable evaluators of data-grounded reasoning in the time series domain.
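
The reference-free evaluation setup can be illustrated with a small harness. This is a sketch under stated assumptions: the `Case` structure, the label strings, and the metric functions are hypothetical, and `toy_judge` is a rule-based stand-in for the LLM evaluator, used only so the example runs without a model.

```python
from dataclasses import dataclass

# Ternary labels, ordered from worst to best (names are hypothetical).
LABELS = ("incorrect", "partially_correct", "correct")

@dataclass
class Case:
    series: list        # numeric time series
    question: str       # query about the series
    explanations: dict  # gold label -> candidate explanation text

def label_accuracy(cases, judge):
    """Fraction of candidate explanations whose predicted label matches the gold label."""
    hits = total = 0
    for case in cases:
        for gold, text in case.explanations.items():
            hits += judge(case.series, case.question, text) == gold
            total += 1
    return hits / total if total else 0.0

def ranks_correctly(case, judge):
    """True if the judge orders correct > partially correct > incorrect."""
    order = {label: i for i, label in enumerate(LABELS)}
    predicted = {
        gold: order[judge(case.series, case.question, text)]
        for gold, text in case.explanations.items()
    }
    return predicted["correct"] > predicted["partially_correct"] > predicted["incorrect"]

def toy_judge(series, question, explanation):
    """Stand-in judge (assumption): checks the stated peak value and peak index."""
    peak = max(series)
    value_ok = str(peak) in explanation
    index_ok = f"index {series.index(peak)}" in explanation
    if value_ok and index_ok:
        return "correct"
    if value_ok or index_ok:
        return "partially_correct"
    return "incorrect"

example = Case(
    series=[1, 2, 9, 3],
    question="Where is the peak?",
    explanations={
        "correct": "The peak is 9 at index 2.",
        "partially_correct": "The peak is 9 at index 0.",
        "incorrect": "The peak is 4 at index 1.",
    },
)
```

Swapping `toy_judge` for a function that prompts a language model and parses its answer into one of the three labels would leave the metric code unchanged.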

2604.02113 2026-04-03 cs.CL

Reliable Control-Point Selection for Steering Reasoning in Large Language Models

Haomin Zhuang, Hojun Yoo, Xiaonan Luo, Kehan Guo, Xiangliang Zhang

2604.02109 2026-04-03 cs.RO

ROS 2-Based LiDAR Perception Framework for Mobile Robots in Dynamic Production Environments, Utilizing Synthetic Data Generation, Transformation-Equivariant 3D Detection and Multi-Object Tracking

Lukas Bergs, Tan Chung, Marmik Thakkar, Alexander Moriz, Amon Göppert, Chinnawut Nantabut, Robert Schmitt

Comments Accepted for publication at CIRP ICME 2025; will appear in Procedia CIRP