arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.11421 2026-03-13 cs.CV

ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation

Songlin Yang, Zhe Wang, Xuyi Yang, Songchun Zhang, Xianghao Kong, Taiyi Wu, Xiaotong Zhao, Ran Zhang, Alan Zhao, Anyi Rao

详情

英文摘要

Text-driven video generation has democratized film creation, but camera control in cinematic multi-shot scenarios remains a significant block. Implicit textual prompts lack precision, while explicit trajectory conditioning imposes prohibitive manual overhead and often triggers execution failures in current models. To overcome this bottleneck, we propose a data-centric paradigm shift, positing that aligned (Caption, Trajectory, Video) triplets form an inherent joint distribution that can connect automated plotting and precise execution. Guided by this insight, we present ShotVerse, a "Plan-then-Control" framework that decouples generation into two collaborative agents: a VLM (Vision-Language Model)-based Planner that leverages spatial priors to obtain cinematic, globally aligned trajectories from text, and a Controller that renders these trajectories into multi-shot video content via a camera adapter. Central to our approach is the construction of a data foundation: we design an automated multi-shot camera calibration pipeline aligns disjoint single-shot trajectories into a unified global coordinate system. This facilitates the curation of ShotVerse-Bench, a high-fidelity cinematic dataset with a three-track evaluation protocol that serves as the bedrock for our framework. Extensive experiments demonstrate that ShotVerse effectively bridges the gap between unreliable textual control and labor-intensive manual plotting, achieving superior cinematic aesthetics and generating multi-shot videos that are both camera-accurate and cross-shot consistent.

URL PDF HTML ☆

赞 0 踩 0

2603.11415 2026-03-13 cs.CL

BLooP: Zero-Shot Abstractive Summarization using Large Language Models with Bigram Lookahead Promotion

Varun Iyer, Cornelia Caragea

Comments LREC 2026

2603.11414 2026-03-13 cs.CL cond-mat.mtrl-sci

MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large language models

Michiko Yoshitake, Yuta Suzuki, Ryo Igarashi, Yoshitaka Ushiku, Keisuke Nagato

Comments 27 pages, 4 tables, 6 figures

详情

英文摘要

We present MaterialFigBench, a benchmark dataset designed to evaluate the ability of multimodal large language models (LLMs) to solve university-level materials science problems that require accurate interpretation of figures. Unlike existing benchmarks that primarily rely on textual representations, MaterialFigBench focuses on problems in which figures such as phase diagrams, stress-strain curves, Arrhenius plots, diffraction patterns, and microstructural schematics are indispensable for deriving correct answers. The dataset consists of 137 free-response problems adapted from standard materials science textbooks, covering a broad range of topics including crystal structures, mechanical properties, diffusion, phase diagrams, phase transformations, and electronic properties of materials. To address unavoidable ambiguity in reading numerical values from images, expert-defined answer ranges are provided where appropriate. We evaluate several state-of-the-art multimodal LLMs, including ChatGPT and GPT models accessed via OpenAI APIs, and analyze their performance across problem categories and model versions. The results reveal that, although overall accuracy improves with model updates, current LLMs still struggle with genuine visual understanding and quantitative interpretation of materials science figures. In many cases, correct answers are obtained by relying on memorized domain knowledge rather than by reading the provided images. MaterialFigBench highlights persistent weaknesses in visual reasoning, numerical precision, and significant-digit handling, while also identifying problem types where performance has improved. This benchmark provides a systematic and domain-specific foundation for advancing multimodal reasoning capabilities in materials science and for guiding the development of future LLMs with stronger figure-based understanding.

URL PDF HTML ☆

赞 0 踩 0

2603.11409 2026-03-13 cs.AI cs.CL

Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue

Kratika Bhagtani, Mrinal Anand, Yu Chen Xu, Amit Kumar Singh Yadav

Comments Submitted for review to Interspeech 2026

2603.11403 2026-03-13 cs.CV

DeepHistoViT: An Interpretable Vision Transformer Framework for Histopathological Cancer Classification

Ravi Mosalpuri, Mohammed Abdelsamea, Ahmed Karam Eldaly

2603.11400 2026-03-13 cs.RO cs.AI cs.LG

Deployment-Time Reliability of Learned Robot Policies

Christopher Agia

Comments Stanford University PhD dissertation, 2026. 182 pages, 37 figures. Available from Stanford Digital Repository

2603.11399 2026-03-13 cs.AI

Entropy Guided Diversification and Preference Elicitation in Agentic Recommendation Systems

Dat Tran, Yongce Li, Hannah Clay, Negin Golrezaei, Sajjad Beygi, Amin Saberi

Comments In proceeding to 2026 Association for the Advancement of Artificial Intelligence Spring Symposia

2603.11397 2026-03-13 cs.SD

Edge-Cloud Collaborative Speech Emotion Captioning via Token-Level Speculative Decoding in Audio-Language Models

Xiangyuan Xue, Jiajun Lu, Yan Gao, Gongping Huang, Ting Dang, Hong Jia

2603.11396 2026-03-13 cs.LG cs.CV

Harnessing Data Asymmetry: Manifold Learning in the Finsler World

Thomas Dagès, Simon Weber, Daniel Cremers, Ron Kimmel

2603.11389 2026-03-13 cs.CV

High-Precision 6DOF Pose Estimation via Global Phase Retrieval in Fringe Projection Profilometry for 3D Mapping

Sehoon Tak, Keunhee Cho, Sangpil Kim, Jae-Sang Hyun

2603.11388 2026-03-13 cs.AI

Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment

Zhiyu Xue, Zimo Qi, Guangliang Liu, Bocheng Chen, Ramtin Pedarsani

2603.11378 2026-03-13 cs.SD cs.LG eess.AS

Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data

Hillary Mutisya, John Mugane

2603.11372 2026-03-13 cs.LG

Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification

Hang Yu, Huidong Liu, Qingchen Zhang, William Joy, Kateryna Nikulina, Andreas A. Schuppert, Sina Saffaran, Declan Bates

详情

英文摘要

Mechanical ventilation (MV) is a life-saving intervention for patients with acute respiratory failure (ARF) in the ICU. However, inappropriate ventilator settings could cause ventilator-induced lung injury (VILI). Also, clinicians workload is shown to be directly linked to patient outcomes. Hence, MV should be personalized and automated to improve patient outcomes. Previous attempts to incorporate personalization and automation in MV include traditional supervised learning and offline reinforcement learning (RL) approaches, which often neglect temporal dependencies and rely excessively on mortality-based rewards. As a result, early stage physiological deterioration and the risk of VILI are not adequately captured. To address these limitations, we propose Transformer-based Conservative Q-Learning (T-CQL), a novel offline RL framework that integrates a Transformer encoder for effective temporal modeling of patient dynamics, conservative adaptive regularization based on uncertainty quantification to ensure safety, and consistency regularization for robust decision-making. We build a clinically informed reward function that incorporates indicators of VILI and a score for severity of patients illness. Also, previous work predominantly uses Fitted Q-Evaluation (FQE) for RL policy evaluation on static offline data, which is less responsive to dynamic environmental changes and susceptible to distribution shifts. To overcome these evaluation limitations, interactive digital twins of ARF patients were used for online "at the bedside" evaluation. Our results demonstrate that T-CQL consistently outperforms existing state-of-the-art offline RL methodologies, providing safer and more effective ventilatory adjustments. Our framework demonstrates the potential of Transformer-based models combined with conservative RL strategies as a decision support tool in critical care.

URL PDF HTML ☆

赞 0 踩 0

2603.11369 2026-03-13 cs.LG q-bio.PE

abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance

Joyce Lee, Seth Blumberg

Comments 10 pages, 3 figures

2603.11365 2026-03-13 cs.RO

D-SLAMSpoof: An Environment-Agnostic LiDAR Spoofing Attack using Dynamic Point Cloud Injection

Rokuto Nagata, Kenji Koide, Kazuma Ikeda, Ozora Sako, Kentaro Yoshioka

2603.11364 2026-03-13 cs.RO

MirrorDrift: Actuated Mirror-Based Attacks on LiDAR SLAM

Rokuto Nagata, Kenji Koide, Kazuma Ikeda, Ozora Sako, Shion Horie, Kentaro Yoshioka

2603.11358 2026-03-13 cs.LG

Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study

Mohammad Shihab Uddin, Md Hasibul Amin, Nusrat Jahan Ema, Bushra Uddin, Tanvir Ahmed, Arif Hassan Zidan

2603.11355 2026-03-13 cs.LG stat.AP

Teleodynamic Learning a new Paradigm For Interpretable AI

Enrique ter Horst, Juan Diego Zambrano

2603.11353 2026-03-13 cs.AI

The Artificial Self: Characterising the landscape of AI identity

Raymond Douglas, Jan Kulveit, Ondrej Havlicek, Theia Pearson-Vogel, Owen Cotton-Barratt, David Duvenaud

Comments 72 pages, 9 figures

2603.11352 2026-03-13 cs.AI

TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

Sravan Kumar Ankireddy, Nikita Seleznev, Nam H. Nguyen, Yulun Wu, Senthil Kumar, Furong Huang, C. Bayan Bruss

Comments 21 pages, 14 figures

2603.11351 2026-03-13 cs.RO cs.AI

Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning

Hong Lu, Pierrick Lorang, Timothy R. Duggan, Jivko Sinapov, Matthias Scheutz

2603.11342 2026-03-13 cs.CL cs.AI

Evaluating Explainable AI Attribution Methods in Neural Machine Translation via Attention-Guided Knowledge Distillation

Aria Nourbakhsh, Salima Lamsiyah, Adelaide Danilov, Christoph Schommer

Comments 37 pages, 11 figures

详情

英文摘要

The study of the attribution of input features to the output of neural network models is an active area of research. While numerous Explainable AI (XAI) techniques have been proposed to interpret these models, the systematic and automated evaluation of these methods in sequence-to-sequence (seq2seq) models is less explored. This paper introduces a new approach for evaluating explainability methods in transformer-based seq2seq models. We use teacher-derived attribution maps as a structured side signal to guide a student model, and quantify the utility of different attribution methods through the student's ability to simulate targets. Using the Inseq library, we extract attribution scores over source-target sequence pairs and inject these scores into the attention mechanism of a student transformer model under four composition operators (addition, multiplication, averaging, and replacement). Across three language pairs (de-en, fr-en, ar-en) and attributions from Marian-MT and mBART models, Attention, Value Zeroing, and Layer Gradient $\times$ Activation consistently yield the largest gains in BLEU (and corresponding improvements in chrF) relative to baselines. In contrast, other gradient-based methods (Saliency, Integrated Gradients, DeepLIFT, Input $\times$ Gradient, GradientShap) lead to smaller and less consistent improvements. These results suggest that different attribution methods capture distinct signals and that attention-derived attributions better capture alignment between source and target representations in seq2seq models. Finally, we introduce an Attributor transformer that, given a source-target pair, learns to reconstruct the teacher's attribution map. Our findings demonstrate that the more accurately the Attributor can reproduce attribution maps, the more useful an injection of those maps is for the downstream task. The source code can be found on GitHub.

URL PDF HTML ☆

赞 0 踩 0

2603.11340 2026-03-13 cs.AI cs.PF

Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI

Yonas Atinafu, Henry Lin, Robin Cohen

2603.11339 2026-03-13 cs.AI cs.CE cs.LG

FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles

Arun Vignesh Malarkkan, Manan Roy Choudhury, Guangwei Zhang, Vivek Gupta, Qingyun Wang, Yanjie Fu, Denghui Zhang

Comments 8 pages + Ethics Statement + References + Appendix

2603.11337 2026-03-13 cs.AI

RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents

Yonas Atinafu, Robin Cohen

2603.11335 2026-03-13 cs.RO

ADMM-based Continuous Trajectory Optimization in Graphs of Convex Sets

Lukas Pries, Jon Arrizabalaga, Zachary Manchester, Markus Ryll

2603.11328 2026-03-13 cs.RO cs.SY eess.SY

Distributed Kalman--Consensus Filtering with Adaptive Uncertainty Weighting for Multi-Object Tracking in Mobile Robot Networks

Niusha Khosravi, Rodrigo Ventura, Meysam Basiri

Comments Presented at ICARA 2026. To appear in the IEEE conference proceedings

2603.11325 2026-03-13 cs.CV

Towards Trustworthy Selective Generation: Reliability-Guided Diffusion for Ultra-Low-Field to High-Field MRI Synthesis

Zhenxuan Zhang, Peiyuan Jing, Ruicheng Yuan, Liwei Hu, Anbang Wang, Fanwen Wang, Yinzhe Wu, Kh Tohidul Islam, Zhaolin Chen, Zi Wang, Peter Lally, Guang Yang

2603.11323 2026-03-13 cs.CV

UNet-AF: An alias-free UNet for image restoration

Jérémy Scanvic, Quentin Barthélemy, Julián Tachella

2603.11320 2026-03-13 cs.CV

UniCompress: Token Compression for Unified Vision-Language Understanding and Generation

Ziyao Wang, Chen Chen, Jingtao Li, Weiming Zhuang, Jiabo Huang, Ang Li, Lingjuan Lyu