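arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.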

2510.05652 2026-05-08 cs.CV

SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets

Manolis Mylonas, Charalampia Zerva, Evlampios Apostolidis, Vasileios Mezaris

Comments Under review

Abstract

In this work, we present a method and two large-scale datasets for Script-Driven Multimodal Video Summarization. The proposed method, SD-MVSum, builds on our earlier SD-VSum method for script-driven video summarization, which considered just the visual content of the video. SD-MVSum takes into account, in addition to the visual modality, the relevance of the user-provided script to the spoken content (i.e., audio transcript) of the video. The dependence between each considered pair of data modalities, i.e., script-video and script-transcript, is modeled using a new weighted cross-modal attention mechanism. This mechanism explicitly exploits the semantic similarity between the paired modalities in order to promote the parts of the full-length video with the highest relevance to the user-provided script. Furthermore, we extend two large-scale datasets for script-driven (S-VideoXum) and generic (MrHiSum) video summarization, to make them suitable for training and evaluation of script-driven multimodal video summarization methods. Experimental comparisons document the competitiveness of the proposed SD-MVSum method against other SotA approaches for script-driven and generic video summarization. Our new method and extended datasets are available at: https://github.com/IDT-ITI/SD-MVSum.

2510.02339 2026-05-08 cs.CL cs.AI

Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models

Kevin Zhou, Adam Dejl, Gabriel Freedman, Lihu Chen, Antonio Rago, Francesca Toni

Comments Accepted at EMNLP Findings 2025

2510.02312 2026-05-08 cs.LG

KaVa: Latent Reasoning via Compressed KV-Cache Distillation

Anna Kuzina, Maciej Pioro, Paul N. Whatmough, Babak Ehteshami Bejnordi

Comments ICLR 2026

2509.24382 2026-05-08 cs.CV cs.AI

REMAP: Regularized Matching and Partial Alignment of Video Embeddings

Soumyadeep Chandra, Kaushik Roy

Comments 9 pages, 4 figures, 6 tables

2509.23765 2026-05-08 cs.CL cs.AI cs.LG

Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality

Junliang Li, Yucheng Wang, Yan Chen, Yu Ran, Ruiqing Zhang, Jing Liu, Hua Wu, Haifeng Wang

Comments 32 pages

2509.04112 2026-05-08 cs.LG cs.IT math.IT

Synthetic Counterfactual Labels for Efficient Conformal Counterfactual Inference

Amirmohammad Farzaneh, Matteo Zecchin, Osvaldo Simeone

2509.03238 2026-05-08 cs.RO cs.SY eess.SY

Vibration Damping in Underactuated Cable-suspended Artwork -- Flying Belt Motion Control

Martin Goubej, Lauria Clarke, Martin Hrabačka, David Tolar

Comments 10 pages, 10 figures

2508.16745 2026-05-08 cs.LG cs.AI

Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

Ivan Rodkin, Daniil Orel, Konstantin Smirnov, Arman Bolatov, Bilal Elbouardi, Besher Hassan, Yuri Kuratov, Aydar Bulatov, Preslav Nakov, Timothy Baldwin, Artem Shelmanov, Mikhail Burtsev

2508.15119 2026-05-08 cs.AI cs.CL cs.LG cs.RO

Flexible Agent Alignment with Goal Inference from Open-Ended Dialog

Rachel Ma, Jingyi Qu, Andreea Bobu, Dylan Hadfield-Menell

Comments Previous version of the paper was titled: Open-Universe Assistance Games

2508.14482 2026-05-08 cs.LG

On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines

Alexander Geiger, Lars Wagner, Daniel Rueckert, Dirk Wilhelm, Alissa Jell

Abstract

The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are essential for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline that represents the absence of informative features, a notion commonly referred to as missingness. Standard baselines, such as all-zero inputs, are often semantically meaningless in medical contexts, where intensity values carry clinical significance. In this work, we revisit the notion of missingness for medical imaging, expose the limitations of standard baselines in this setting, and formalize a stricter missingness we term semantic missingness: a baseline must not merely lack signal, but must represent a clinically plausible state in which the disease-related features are absent. This formulation motivates a counterfactual-guided approach to baseline selection, in which a synthetically generated counterfactual (i.e. a clinically normal variant of the pathological input) serves as a principled and semantically meaningful reference. We derive theoretical guarantees showing that counterfactual baselines yield more faithful attributions than standard alternatives, and empirically validate this with two complementary counterfactual generative models, a VAE and a diffusion model, though the concept is model-agnostic and compatible with any suitable counterfactual method. Across three diverse medical datasets, counterfactual baselines produce more faithful and medically relevant attributions, outperforming standard baseline choices as well as related methods. Notably, we also compare against using the counterfactual directly as an explanation (an established paradigm in its own right) and show that employing it as a baseline for Integrated Gradients yields superior results, thereby bridging two complementary explainability paradigms.

2507.22832 2026-05-08 cs.LG cs.CV cs.NE

Pulling Back the Curtain on Deep Networks

Maciej Satkiewicz, Roberto Corizzo, Marcin Pietroń

Comments Preprint; 9 pages, 23-page appendix, 12 figures, 6 Tables; v6 changes: slight reframing of the presentation

2507.02466 2026-05-08 cs.LG

Variational Kolmogorov-Arnold Network

Francesco Alesiani, Henrik Christiansen, Federico Errica

Comments Preprint

2507.01833 2026-05-08 cs.AI

Refining Gelfond Rationality Principle: Towards More Comprehensive Foundational Principles for Answer Set Semantics

Yi-Dong Shen, Thomas Eiter

Comments 76 pages. This article is a significantly extended version of a paper presented by the authors at IJCAI-2022

2506.18682 2026-05-08 cs.CV cs.AI

Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios

Imad Ali Shah, Jiarong Li, Tim Brophy, Martin Glavin, Edward Jones, Enda Ward, Brian Deegan

2506.13727 2026-05-08 cs.LG cs.AI cs.CL

Attribution-Guided Pruning for Insight and Control: Circuit Discovery and Targeted Correction in Small-scale LLMs

Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Alexander Binder, Sebastian Lapuschkin

Comments Work in progress (9 pages manuscript, 3 pages references, 16 pages appendix)

2505.24437 2026-05-08 cs.SD eess.AS

SwitchCodec: A High-Fidelity Neural Audio Codec With Sparse Quantization

Jin Wang, Wenbin Jiang, Xiangbo Wang, Yubo You, Sheng Fang

Comments There is some technical error in this paper's method

2505.15064 2026-05-08 cs.LG math.DS stat.ML

Why and When Deep is Better than Shallow: Implementation-Agnostic State-Transition Model of Deep Learning

Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda

2505.13100 2026-05-08 cs.LG

Time series saliency maps: explaining models across multiple domains

Christodoulos Kechris, Jonathan Dan, David Atienza

2504.04202 2026-05-08 cs.LG

Local-Order Auxiliary Losses Can Improve Autoencoder Reconstruction

Harvey Dam, Martin Burtscher, Tripti Agarwal, Ganesh Gopalakrishnan

2503.06624 2026-05-08 cs.CV

Chameleon: Benchmarking Detection and Backtracking on Commercial-Grade AI-Generated Videos

Xingming Liao, Meiyu Zeng, Canyu Chen, Nankai Lin, Zhuowei Wang, Aimin Yang

Comments Accepted by ICMR 2026

2503.02379 2026-05-08 cs.LG cs.CV

Teaching Metric Distance to Discrete Autoregressive Language Models

Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu

2502.19918 2026-05-08 cs.AI cs.LG

Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi

Comments Accepted by ACL'2026

2502.03725 2026-05-08 cs.LG

Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach

Dimitris Bertsimas, Cheol Woo Kim, José Niño-Mora

2501.09238 2026-05-08 cs.LG

Mono-Forward: Revisiting Forward-Forward through Objective-Locality Decomposition

James Gong, Bruce Li, Waleed Abdulla

Comments 26 pages

2412.09125 2026-05-08 cs.AI cs.DB cs.LO

Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality

Efthymia Tsamoura, Boris Motik

Comments 47 pages

2412.08110 2026-05-08 cs.CV cs.CL cs.LG

The ART of Composition: Attention-Regularized Training for Compositional Visual Grounding

Jiayun Luo, Mir Rayat Imtiaz Hossain, Pritam Sarkar, Boyang Li, Leonid Sigal

2411.18954 2026-05-08 cs.LG cs.AI

ReMAP: Neural Reparameterization for Scalable MAP Inference in Arbitrary-Order Markov Random Fields

Yaomin Wang, Chaolong Ying, Xiaodong Luo, Tianshu Yu

2411.12220 2026-05-08 cs.LG cs.AI cs.CR

DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

Kichang Lee, Yujin Shin, Jonghyuk Yun, Songkuk Kim, Jun Han, JeongGil Ko

Comments 21 pages

2411.03962 2026-05-08 cs.CL cs.IR

How Does A Text Preprocessing Pipeline Affect Ontology Matching?

Zhangcheng Qiang, Kerry Taylor, Weiqing Wang

Comments 14 pages, 16 figures, 3 tables

2411.02740 2026-05-08 cs.LG cond-mat.mtrl-sci physics.app-ph physics.comp-ph physics.data-an

An information-matching approach to optimal experimental design and active learning

Yonatan Kurniawan, Tracianne B. Neilsen, Benjamin L. Francis, Alex M. Stankovic, Mingjian Wen, Ilia Nikiforov, Ellad B. Tadmor, Vasily V. Bulatov, Vincenzo Lordi, Mark K. Transtrum