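arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.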

2603.06577 2026-03-09 cs.CV

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

Lijiang Li, Zuwei Long, Yunhang Shen, Heting Gao, Haoyu Cao, Xing Sun, Caifeng Shan, Ran He, Chaoyou Fu

Comments Project page: https://omni-diffusion.github.io

Abstract

While recent multimodal large language models (MLLMs) have made impressive strides, they predominantly employ a conventional autoregressive architecture as their backbone, leaving significant room to explore effective and efficient alternatives in architectural design. Concurrently, recent studies have successfully applied discrete diffusion models to various domains, such as visual understanding and image generation, revealing their considerable potential as a promising backbone for multimodal systems. Drawing inspiration from this pioneering research, we introduce Omni-Diffusion, the first any-to-any multimodal language model built entirely on mask-based discrete diffusion models, which unifies understanding and generation across text, speech, and images. Omni-Diffusion employs a unified mask-based discrete diffusion model to directly capture the joint distribution over discrete multimodal tokens. This approach supports not only bimodal tasks but also more complex scenarios involving multiple modalities. On a diverse set of benchmarks, our method outperforms or performs on par with existing multimodal systems that process two or more modalities, highlighting the significant promise of diffusion models in powering the next generation of multimodal foundation models. Project webpage: https://omni-diffusion.github.io.

2603.06576 2026-03-09 cs.CV cs.AI cs.LG cs.RO

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding

Comments 4 figures, 6 tables in the main paper, 32 pages in total

2603.06573 2026-03-09 cs.RO cs.AI

Fly360: Omnidirectional Obstacle Avoidance within Drone View

Xiangkai Zhang, Dizhe Zhang, WenZhuo Cao, Zhaoliang Wan, Yingjie Niu, Lu Qi, Xu Yang, Zhiyong Liu

Comments 16 pages, 10 figures

2603.06570 2026-03-09 cs.CV cs.AI

SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

Alejandra Perez, Anita Rau, Lee White, Busisiwe Mlambo, Chinedu Nwoye, Muhammad Abdullah Jamal, Omid Mohareri

2603.06567 2026-03-09 cs.LG cond-mat.mtrl-sci cs.CE physics.chem-ph q-bio.QM

A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention

Eric Qu, Brandon M. Wood, Aditi S. Krishnapriyan, Zachary W. Ulissi

2603.06565 2026-03-09 cs.AI cs.LG

Boosting deep Reinforcement Learning using pretraining with Logical Options

Zihan Ye, Phil Chau, Raban Emunds, Jannis Blüml, Cedric Derstroff, Quentin Delfosse, Oleg Arenz, Kristian Kersting

2603.06562 2026-03-09 quant-ph cs.CR

Radio-Frequency Side-Channel Analysis of a Trapped-Ion Quantum Computer

Giorgio Grigolo, Dorian Schiffer, Lukas Gerster, Martin Ringbauer, Paul Erker

Comments 11 pages, 8 figures, 1 table

2603.06557 2026-03-09 cs.LG q-bio.NC

Causal Interpretation of Neural Network Computations with Contribution Decomposition

Joshua Brendan Melander, Zaki Alaoui, Shenghua Liu, Surya Ganguli, Stephen A. Baccus

Comments 32 pages, 19 figures. ICLR 2026 poster

2603.06556 2026-03-09 cs.HC

Capability at a Glance: Design Guidelines for Intuitive Avatars Communicating Augmented Actions in Virtual Reality

Yang Lu, Tianyu Zhang, Jiamu Tang, Yanna Lin, Jiankun Yang, Longyu Zhang, Shijian Luo, Yukang Yan

2603.06555 2026-03-09 cs.LG

Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations

Harshavardhan Kamarthi, Shangqing Xu, Xinjie Tong, Xingyu Zhou, James Peters, Joseph Czyzyk, B. Aditya Prakash

Abstract

Hierarchical time-series forecasting is essential for demand prediction across various industries. While machine learning models have obtained significant accuracy and scalability on such forecasting tasks, the interpretability of their predictions, informed by application, is still largely unexplored. To bridge this gap, we introduce a novel interpretability method for large hierarchical probabilistic time-series forecasting, adapting generic interpretability techniques while addressing challenges associated with hierarchical structures and uncertainty. Our approach offers valuable interpretative insights in response to real-world industrial supply chain scenarios, including 1) the significance of various time-series within the hierarchy and external variables at specific time points, 2) the impact of different variables on forecast uncertainty, and 3) explanations for forecast changes in response to modifications in the training dataset. To evaluate the explainability method, we generate semi-synthetic datasets based on real-world scenarios of explaining hierarchical demands for over ten thousand products at a large chemical company. The experiments showed that our explainability method successfully explained state-of-the-art industrial forecasting methods with significantly higher explainability accuracy. Furthermore, we provide multiple real-world case studies that show the efficacy of our approach in identifying important patterns and explanations that help stakeholders better understand the forecasts. Additionally, our method facilitates the identification of key drivers behind forecasted demand, enabling more informed decision-making and strategic planning. Our approach helps build trust and confidence among users, ultimately leading to better adoption and utilization of hierarchical forecasting models in practice.

2603.06551 2026-03-09 cs.SE

Understanding and Finding JIT Compiler Performance Bugs

Zijian Yi, Cheng Ding, August Shi, Milos Gligoric

Comments Accepted to OOPSLA 2026

2603.06548 2026-03-09 cs.RO cs.SY eess.SY

Uncertainty-Aware Adaptive Dynamics For Underwater Vehicle-Manipulator Robots

Edward Morgan, Nenyi K Dadson, Corina Barbalata

2603.06544 2026-03-09 cs.CV

Modeling and Measuring Redundancy in Multisource Multimodal Data for Autonomous Driving

Yuhan Zhou, Mehri Sattari, Haihua Chen, Kewei Sha

Comments This paper has been accepted by the Fourth IEEE International Conference on Mobility: Operations, Services, and Technologies (MOST) 2026

2603.06543 2026-03-09 cs.CV

SurgFormer: Scalable Learning of Organ Deformation with Resection Support and Real-Time Inference

Ashkan Shahbazi, Elaheh Akbari, Kyvia Pereira, Jon S. Heiselman, Annie C. Benson, Garrison L. H. Johnston, Jie Ying Wu, Nabil Simaan, Michael I. Miga, Soheil Kolouri

2603.06542 2026-03-09 cs.SD cs.AI

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo

2603.06541 2026-03-09 eess.SP cs.SY eess.SY

Codebook Design and Baseband Precoding for Pragmatic Array-Fed RIS Hybrid Multiuser MIMO

Krishan Kumar Tiwari, Giuseppe Caire

2603.06540 2026-03-09 cs.CR

Proteus: A Practical Framework for Privacy-Preserving Device Logs

Sanket Goutam, Hunter Kippen, Mike Grace, Amir Rahmati

2603.06538 2026-03-09 cs.RO

Unified Learning of Temporal Task Structure and Action Timing for Bimanual Robot Manipulation

Christian Dreher, Patrick Dormanns, Andre Meixner, Tamim Asfour

Comments This work has been submitted to the IEEE for possible publication

2603.06536 2026-03-09 eess.SY cs.SY

Adaptive Data-Driven Min-Max MPC for Linear Time-Varying Systems

Yifan Xie, Julian Berberich, Frank Allgöwer

2603.06533 2026-03-09 cs.CV

NEGATE: Constrained Semantic Guidance for Linguistic Negation in Text-to-Video Diffusion

Taewon Kang, Ming C. Lin

Comments 50 pages, 32 figures

2603.06531 2026-03-09 cs.CV cs.RO

Spatial Calibration of Diffuse LiDARs

Nikhil Behari, Ramesh Raskar

2603.06530 2026-03-09 cs.CV

AV-Unified: A Unified Framework for Audio-visual Scene Understanding

Guangyao Li, Xin Wang, Wenwu Zhu

Comments Accepted by IEEE Transactions on Multimedia (TMM)

2603.06525 2026-03-09 cs.RO

Underactuated multimodal jumping robot for extraterrestrial exploration

Neil R. Wagner, Justin K. Yim

Comments 8 pages, 14 figures, Accepted for ICRA 2026

2603.06523 2026-03-09 cs.CV

SCAN: Visual Explanations with Self-Confidence and Analysis Networks

Gwanghee Lee, Sungyoon Jeong, Kyoungson Jhang

Comments 14 pages, 9 figures, IEEE Transactions on Artificial Intelligence

2603.06522 2026-03-09 cs.CV cs.AI cs.LG

Artificial Intelligence for Detecting Fetal Orofacial Clefts and Advancing Medical Education

Yuanji Zhang, Yuhao Huang, Haoran Dou, Xiliang Zhu, Chen Ling, Zhong Yang, Lianying Liang, Jiuping Li, Siying Liang, Rui Li, Yan Cao, Yuhan Zhang, Jiewei Lai, Yongsong Zhou, Hongyu Zheng, Xinru Gao, Cheng Yu, Liling Shi, Mengqin Yuan, Honglong Li, Xiaoqiong Huang, Chaoyu Chen, Jialin Zhang, Wenxiong Pan, Alejandro F. Frangi, Guangzhi He, Xin Yang, Yi Xiong, Linliang Yin, Xuedong Deng, Dong Ni

Comments 28 pages, 10 figures, 11 tables

2603.06512 2026-03-09 cs.RO cs.CV

SG-DOR: Learning Scene Graphs with Direction-Conditioned Occlusion Reasoning for Pepper Plants

Rohit Menon, Niklas Mueller-Goldingen, Sicong Pan, Gokul Krishna Chenchani, Maren Bennewitz

2603.06508 2026-03-09 cs.LG

When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models

Qitong Wang, Haoran Dai, Haotian Zhang, Christopher Rasmussen, Binghui Wang

Comments Accepted to the ICLR 2026 Workshop on Principled Design for Trustworthy AI. The first two authors contributed equally

2603.06507 2026-03-09 cs.CV

Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis

Hila Chefer, Patrick Esser, Dominik Lorenz, Dustin Podell, Vikash Raja, Vinh Tong, Antonio Torralba, Robin Rombach

Comments project webpage: https://bfl.ai/research/self-flow

2603.06505 2026-03-09 cs.CL

Speak in Context: Multilingual ASR with Speech Context Alignment via Contrastive Learning

Yuchen Zhang, Haralambos Mouratidis, Ravi Shekhar

Comments Accepted at LREC 2026

2603.06503 2026-03-09 cs.CL

Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul