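arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.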

2508.00500 2026-03-30 cs.AI cs.SE

ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety

Haoyu Wang, Christopher M. Poskitt, Jiali Wei, Jun Sun

Abstract

Large Language Model (LLM) agents increasingly operate across domains such as robotics, virtual assistants, and web automation. However, their stochastic decision-making introduces safety risks that are difficult to anticipate during execution. Existing runtime monitoring frameworks, such as AgentSpec, primarily rely on reactive safety rules that detect violations only when unsafe behavior is imminent or has already occurred, limiting their ability to handle long-horizon dependencies. We present ProbGuard, a proactive runtime monitoring framework for LLM agents that anticipates safety violations through probabilistic risk prediction. ProbGuard abstracts agent executions into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces to model behavioral dynamics. At runtime, the monitor estimates the probability that future executions will reach unsafe states and triggers interventions when this risk exceeds a user-defined threshold. To improve robustness, ProbGuard incorporates semantic validity constraints in the abstraction and provides PAC-style guarantees on the learned model under standard assumptions. We evaluate ProbGuard in two safety-critical domains: autonomous driving and embodied household agents. Across evaluated scenarios, ProbGuard consistently predicts traffic law violations and collisions in advance, with warnings up to 38.66 seconds ahead of occurrence. In embodied agent tasks, ProbGuard reduces unsafe behavior by up to 65.37% while preserving up to 80.4% task completion. ProbGuard is implemented as an extensible open-source runtime monitor integrated with the LangChain agent framework and introduces minimal runtime overhead.
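The monitoring loop described in this abstract (learn a DTMC from traces, estimate the probability of reaching an unsafe state, intervene above a threshold) can be illustrated with a minimal sketch. This is not the ProbGuard implementation: the function names, the plain frequency-count estimator, the bounded step horizon, and the toy state labels are all assumptions made for illustration, and the symbolic abstraction, semantic validity constraints, and PAC-style guarantees mentioned above are omitted.

```python
from collections import defaultdict


def learn_dtmc(traces):
    """Estimate DTMC transition probabilities by counting observed moves.

    `traces` is a list of state sequences (hypothetical symbolic states).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    dtmc = {}
    for src, row in counts.items():
        total = sum(row.values())
        dtmc[src] = {dst: n / total for dst, n in row.items()}
    return dtmc


def reach_probability(dtmc, start, unsafe, horizon):
    """Probability of reaching any state in `unsafe` within `horizon` steps."""
    states = set(dtmc) | {d for row in dtmc.values() for d in row}
    states |= set(unsafe) | {start}
    # prob[s]: probability of hitting an unsafe state from s in k steps (k = 0 here)
    prob = {s: 1.0 if s in unsafe else 0.0 for s in states}
    for _ in range(horizon):
        nxt = {}
        for s in states:
            if s in unsafe:
                nxt[s] = 1.0  # unsafe states are treated as absorbing
            else:
                # states with no observed successors contribute probability 0
                nxt[s] = sum(p * prob[d] for d, p in dtmc.get(s, {}).items())
        prob = nxt
    return prob[start]


def should_intervene(dtmc, state, unsafe, horizon, threshold):
    """Trigger an intervention when the estimated risk exceeds the threshold."""
    return reach_probability(dtmc, state, unsafe, horizon) > threshold


# Toy traces with made-up state names
traces = [
    ["idle", "move", "unsafe"],
    ["idle", "move", "done"],
    ["idle", "done"],
]
model = learn_dtmc(traces)
risk = reach_probability(model, "idle", {"unsafe"}, horizon=2)
print(f"estimated risk from 'idle': {risk:.3f}")
```

In this toy model the monitor would warn at `move` for a threshold of 0.4 but not at `idle`, which shows how the same threshold yields different decisions depending on how far the current state is from the unsafe one.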


2507.03745 2026-03-30 cs.CV cs.AI cs.LG eess.IV

StreamDiT: Real-Time Streaming Text-to-Video Generation

Akio Kodaira, Tingbo Hou, Ji Hou, Markos Georgopoulos, Felix Juefei-Xu, Masayoshi Tomizuka, Yue Zhao

Comments: CVPR 2026

2507.03005 2026-03-30 cs.CL q-bio.PE

Beyond cognacy

Gerhard Jäger

Comments: 9 pages, 2 figures

2506.21011 2026-03-30 cs.CV

Score2Instruct: Scaling Up Video Quality-Centric Instructions via Automated Dimension Scoring

Qizhi Xie, Kun Yuan, Yunpeng Qu, Jiachao Gong, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

Comments: 16 pages, 5 figures. Accepted by CVPR 2026 main conference

Abstract

Classical video quality assessment methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet, a score fails to describe the video's complex quality dimensions, restricting its applicability. Benefiting from the human-friendly linguistic output, adapting video large multimodal models to VQA via instruction tuning has the potential to address this issue. The core of the approach lies in the video quality-centric instruction data. Previous explorations mainly focus on the image domain, and their data generation processes heavily rely on human quality annotations and proprietary systems, limiting data scalability and effectiveness. To address these challenges, we propose the Score-based Instruction Generation pipeline. Specifically, SIG first scores multiple quality dimensions of an unlabeled video and maps scores to text-defined levels. It then explicitly incorporates a hierarchical Chain-of-Thought to model the correlation between specific dimensions and overall quality, mimicking the human visual system's reasoning process. The automated pipeline eliminates the reliance on expert-written quality descriptions and proprietary systems, ensuring data scalability and generation efficiency. To this end, the resulting Score2Instruct dataset contains over 320K diverse instruction-response pairs, laying the basis for instruction tuning. Moreover, to advance video LMMs' quality scoring and justification abilities simultaneously, we devise a progressive tuning strategy to fully unleash the power of S2I. Built upon SIG, we further curate a benchmark termed S2I-Bench with 400 open-ended questions to better evaluate the quality justification capacity of video LMMs. Experimental results on the S2I-Bench and existing benchmarks indicate that our method consistently improves quality scoring and justification capabilities across multiple video LMMs.
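The first SIG step named in this abstract, mapping per-dimension scores to text-defined levels, can be pictured with a small sketch. Everything concrete here is hypothetical: the abstract does not give the level names, the cut points, the score range, or the dimension names, so the five-level scale on scores in [0, 1] below is an assumption chosen only to make the idea tangible.

```python
from bisect import bisect_right

# Hypothetical level names and cut points; the abstract does not specify them.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
BOUNDS = [0.2, 0.4, 0.6, 0.8]  # assumed thresholds for scores in [0, 1]


def score_to_level(score):
    """Map a numeric quality score to a text-defined level."""
    return LEVELS[bisect_right(BOUNDS, score)]


def describe(scores):
    """Convert per-dimension scores into per-dimension text levels."""
    return {dim: score_to_level(s) for dim, s in scores.items()}


# Made-up dimension names and scores for one unlabeled video
levels = describe({"sharpness": 0.95, "noise": 0.5, "flicker": 0.1})
print(levels)
```

Text levels of this kind could then be slotted into instruction-response templates, which is the role the abstract assigns to them before the Chain-of-Thought step.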


2506.13633 2026-03-30 cs.LG cs.NA math.AP math.NA math.OC

Global Convergence of Adjoint-Optimized Neural PDEs

Konstantin Riedl, Justin Sirignano, Konstantinos Spiliopoulos

Comments: 81 pages, 2 figures

Journal ref: Journal of Machine Learning Research 26(295):1-94, 2025

Abstract

Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks, which requires solving the inverse problem of learning neural network terms from observed data in order to approximate missing or unresolved physics in the PDE model. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to the available ground truth data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer). The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses due to (i) the neural network training dynamics involving a non-local neural network kernel operator in the infinite-width hidden layer limit where the kernel lacks a spectral gap for its eigenvalues and (ii) the nonlinearity of the limit PDE system, which leads to a non-convex optimization problem in the neural network function even in the infinite-width hidden layer limit (unlike in typical neural network training cases where the optimization problem becomes convex in the large neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.


2506.06909 2026-03-30 cs.CV

Gaussian Mapping for Evolving Scenes

Vladimir Yugay, Thies Kersten, Luca Carlone, Theo Gevers, Martin R. Oswald, Lukas Schmid

2505.20353 2026-03-30 cs.LG cs.AI cs.CV cs.MM cs.PF

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

Dong Liu, Yanxuan Yu, Jiayi Zhang, Yifan Li, Ben Lengerich, Ying Nian Wu

2505.17080 2026-03-30 cs.CL

Not Minds, but Signs: Reframing LLMs through Semiotics

Davide Picca

2505.17006 2026-03-30 cs.CV cs.RO

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

Jiange Yang, Yansong Shi, Haoyi Zhu, Mingyu Liu, Kaijing Ma, Yating Wang, Gangshan Wu, Tong He, Limin Wang

Comments: CVPR 2026

2504.09472 2026-03-30 cs.CV

Zero-Shot Personalized Camera Motion Control for Image-to-Video Synthesis

Pooja Guhan, Divya Kothandaraman, Geonsun Lee, Tsung-Wei Huang, Guan-Ming Su, Dinesh Manocha

2503.03399 2026-03-30 cs.LG

Robust Predictive Modeling Under Unseen Data Distribution Shifts: A Methodological Commentary

Hanyu Duan, Yi Yang, Ahmed Abbasi, Kar Yan Tam

Comments: Forthcoming in Information Systems Research

2502.13592 2026-03-30 cs.CL

Don't Stop the Multi-Party! On Generating Synthetic Written Multi-Party Conversations with Constraints

Nicolò Penzo, Marco Guerini, Bruno Lepri, Goran Glavaš, Sara Tonelli

Comments: Accepted at AAAI 2026

2502.12896 2026-03-30 cs.CL

None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks

Eva Sánchez Salido, Julio Gonzalo, Guillermo Marco

2502.01557 2026-03-30 cs.LG math.DS stat.ML

How iteration order influences convergence and stability in deep learning

Benoit Dherin, Benny Avelin, Anders Karlsson, Hanna Mazzawi, Javier Gonzalvo, Michael Munn

2502.00262 2026-03-30 cs.CV cs.AI

INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation

Dianwei Chen, Zifan Zhang, Lei Cheng, Yuchen Liu, Xianfeng Terry Yang

2501.16919 2026-03-30 cs.LG

Projection-free Algorithms for Online Convex Optimization with Adversarial Constraints

Dhruv Sarkar, Aprameyo Chakrabartty, Subhamon Supantha, Palash Dey, Abhishek Sinha

Comments: To appear in the Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026, Tangier, Morocco

2501.05765 2026-03-30 cs.AI cs.LO

Deontic Temporal Logic for Formal Verification of AI Ethics

Priya T. V., Shrisha Rao

2501.04828 2026-03-30 cs.CL

Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models

Şaziye Betül Özateş, Tarık Emre Tıraş, Ece Elif Adak, Berat Doğan, Fatih Burak Karagöz, Efe Eren Genç, Esma F. Bilgin Taşdemir

2410.19733 2026-03-30 cs.AI

ReMe: Scaffolding Personalized Cognitive Training via Controllable LLM-Mediated Conversations

Zilong Wang, Nan Chen, Luna K. Qiu, Ling Yue, Geli Guo, Yang Ou, Shiqi Jiang, Yuqing Yang, Lili Qiu

2409.18602 2026-03-30 cs.CL

Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations

Nicolò Penzo, Maryam Sajedinia, Bruno Lepri, Sara Tonelli, Marco Guerini

Comments: Accepted to EMNLP 2024 main conference

2408.00949 2026-03-30 cs.LG math.GR math.RT stat.ML

Equivariant neural networks and piecewise linear representation theory

Joel Gibson, Daniel Tubbenhauer, Geordie Williamson

Comments: 23 pages, many figures, revision, to appear in Contemp. Math., comments welcome

2407.16541 2026-03-30 cs.CV cs.MM

QPT V2: Masked Image Modeling Advances Visual Scoring

Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

Comments: 8 pages, 6 figures. Accepted by ACM MM 24

2405.12944 2026-03-30 cs.CV

AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection

Zizhao Chen, Yeqiang Qian, Xiaoxiao Yang, Chunxiang Wang, Ming Yang

Comments: Accepted by IEEE Transactions on Multimedia

2405.00181 2026-03-30 cs.CV cs.AI

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao

Comments: Accepted in CVPR2024, Codebase: https://github.com/fesvhtr/CUVA

2403.18425 2026-03-30 cs.CV cs.AI cs.LG

U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models

Ilias Mitsouras, Eleftherios Tsonis, Paraskevi Tzouveli, Athanasios Voulodimos

DOI: 10.1109/ACCESS.2026.3677054
Journal ref: IEEE Access 2026

Abstract

Diffusion models have demonstrated remarkable performance in text-to-image synthesis, producing realistic and high resolution images that faithfully adhere to the corresponding text-prompts. Despite their great success, they still fall behind in sketch-to-image synthesis tasks, where in addition to text-prompts, the spatial layout of the generated images has to closely follow the outlines of certain reference sketches. Employing an MLP latent edge predictor to guide the spatial layout of the synthesized image by predicting edge maps at each denoising step has been recently proposed. Despite yielding promising results, the pixel-wise operation of the MLP does not take into account the spatial layout as a whole, and demands numerous denoising iterations to produce satisfactory images, leading to time inefficiency. To this end, we introduce U-Sketch, a framework featuring a U-Net type latent edge predictor, which is capable of efficiently capturing both local and global features, as well as spatial correlations between pixels. Moreover, we propose the addition of a sketch simplification network that offers the user the choice of preprocessing and simplifying input sketches for enhanced outputs. The experimental results, corroborated by user feedback, demonstrate that our proposed U-Net latent edge predictor leads to more realistic results, that are better aligned with the spatial outlines of the reference sketches, while drastically reducing the number of required denoising steps and, consequently, the overall execution time.


2402.02975 2026-03-30 cs.CL

Putting Context in Context: the Impact of Discussion Structure on Text Classification

Nicolò Penzo, Antonio Longa, Bruno Lepri, Sara Tonelli, Marco Guerini

Comments: Accepted to EACL 2024 main conference

2312.10666 2026-03-30 cs.RO cs.LG math.OC

CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization

Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini, Justin Carpentier, Andrea Del Prete

2310.20093 2026-03-30 cs.CL cs.AI

Evaluating Neural Language Models as Cognitive Models of Language Acquisition

Héctor Javier Vázquez Martínez, Annika Lea Heuser, Charles Yang, Jordan Kodner

Comments: To appear in the GenBench 2023 workshop proceedings, the first workshop on (benchmarking) generalisation in NLP. GenBench 2023 will be held at EMNLP 2023 on December 6, 2023

2307.00106 2026-03-30 cs.LG

Distance Functions and Normalization Under Stream Scenarios

Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto, Rafael M. O. Cruz

Comments: Paper accepted to the 2023 International Joint Conference on Neural Networks

2203.16263 2026-03-30 cs.SD cs.LG eess.AS

Does Audio Deepfake Detection Generalize?

Nicolas M. Müller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar, Konstantin Böttinger

Comments: Interspeech 2022