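arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.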

2604.03329 2026-04-07 cs.CV cs.AI cs.LG cs.SD

CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection

Damith Chamalke Senadeera, Dimitrios Kollias, Gregory Slabaugh

Abstract

Violence detection benefits from audio, but real-world soundscapes can be noisy or only weakly related to the visible scene. We present CoLoRSMamba, a directional video-to-audio multimodal architecture that couples VideoMamba and AudioMamba through CLS-guided conditional LoRA. At each layer, the VideoMamba CLS token produces a channel-wise modulation vector and a stabilization gate that adapt the AudioMamba projections responsible for the selective state-space parameters (Delta, B, C), including the step-size pathway, yielding scene-aware audio dynamics without token-level cross-attention. Training combines binary classification with a symmetric AV-InfoNCE objective that aligns clip-level audio and video embeddings. To support fair multimodal evaluation, we curate audio-filtered, clip-level subsets of the NTU-CCTV and DVD datasets from their temporal annotations, retaining only clips with available audio. On these subsets, CoLoRSMamba outperforms representative audio-only, video-only, and multimodal baselines, achieving 88.63% accuracy / 86.24% F1-V on NTU-CCTV and 75.77% accuracy / 72.94% F1-V on DVD. It also offers a favorable accuracy-efficiency trade-off, surpassing several larger models while using fewer parameters and FLOPs.
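In its generic form, a symmetric InfoNCE objective like the AV-InfoNCE loss mentioned above is a CLIP-style contrastive loss. Audio and video embeddings from the same clip are positives, every other pairing in the batch is a negative, and the cross-entropy is averaged over both the audio-to-video and video-to-audio directions. The NumPy sketch below shows only that generic loss and is not the authors' implementation. The cosine-similarity normalization and the temperature of 0.07 are assumptions chosen for illustration.

```python
import numpy as np


def _log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(audio_emb: np.ndarray, video_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Generic symmetric InfoNCE over N paired clip-level embeddings.

    Row i of audio_emb and row i of video_emb are assumed to come from the
    same clip (positive pair); all other rows in the batch act as negatives.
    The temperature default is an illustrative assumption.
    """
    # L2-normalize so the dot product is cosine similarity (assumption).
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature          # (N, N) scaled similarities
    idx = np.arange(logits.shape[0])
    # Audio -> video: each row is a distribution over candidate videos.
    loss_a2v = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Video -> audio: each column is a distribution over candidate audios.
    loss_v2a = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_a2v + loss_v2a))


if __name__ == "__main__":
    matched = np.eye(4)
    print(symmetric_info_nce(matched, matched))                      # near zero
    print(symmetric_info_nce(matched, np.roll(matched, 1, axis=0)))  # large
```

When pairs are perfectly matched and well separated, the loss approaches zero. When every embedding is identical, each softmax is uniform and the loss equals log N for a batch of N clips. Averaging the two directions makes the objective symmetric in its arguments.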

2604.03328 2026-04-07 cs.CV cs.RO

Review and Evaluation of Point-Cloud based Leaf Surface Reconstruction Methods for Agricultural Applications

Arif Ahmed, Parikshit Maini

2604.03325 2026-04-07 cs.CV cs.AI cs.RO

Safety-Aligned 3D Object Detection: Single-Vehicle, Cooperative, and End-to-End Perspectives

Brian Hsuan-Cheng Liao, Chih-Hong Cheng, Hasan Esen, Alois Knoll

Comments 10 pages, 9 figures, 6 tables

Abstract

Perception plays a central role in connected and autonomous vehicles (CAVs), underpinning not only conventional modular driving stacks but also cooperative perception systems and recent end-to-end driving models. While deep learning has greatly improved perception performance, its statistical nature makes perfect predictions difficult to attain. Meanwhile, standard training objectives and evaluation benchmarks treat all perception errors equally, even though only a subset of them is safety-critical. In this paper, we investigate safety-aligned evaluation and optimization for 3D object detection that explicitly characterize high-impact errors. Building on our previously proposed safety-oriented metric, NDS-USC, and safety-aware loss function, EC-IoU, we make three contributions. First, we present an expanded study of single-vehicle 3D object detection models across diverse neural network architectures and sensing modalities, showing that gains under standard metrics such as mAP and NDS may not translate to the safety-oriented criteria represented by NDS-USC. With EC-IoU, we reaffirm the benefit of safety-aware fine-tuning for improving safety-critical detection performance. Second, we conduct an ego-centric, safety-oriented evaluation of AV-infrastructure cooperative object detection models, underscoring their superiority over vehicle-only models, and present a safety impact analysis that illustrates the potential contribution of cooperative models to "Vision Zero." Third, we integrate EC-IoU into SparseDrive and show that safety-aware perception hardening can reduce the collision rate by nearly 30% and improve system-level safety directly in an end-to-end perception-to-planning framework. Overall, our results indicate that safety-aligned perception evaluation and optimization offer a practical path toward enhancing CAV safety across single-vehicle, cooperative, and end-to-end autonomy settings.

2604.03322 2026-04-07 cs.CV cs.AI cs.RO

VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing

Junyi Zong, Qingxuan Jia, Meixian Shi, Tong Li, Jiayuan Li, Zihang Lv, Gang Chen, Fang Deng

Comments 11 pages, 6 figures

2604.03321 2026-04-07 cs.LG cs.AI math.AP physics.med-ph

General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations

Genwei Ma, Ting Luo, Ping Yang, Xing Zhao

2604.03320 2026-04-07 cs.CV

Robust Multi-Source Covid-19 Detection in CT Images

Asmita Yuki Pritha, Jason Xu, Daniel Ding, Justin Li, Aryana Hou, Xin Wang, Shu Hu

Comments 8 pages, 5 figures, 3 tables. Accepted at the 3rd Workshop on New Trends in AI-Generated Media and Security (AIMS) @ CVPR 2026

2604.03316 2026-04-07 cs.CV

When Sinks Help or Hurt: Unified Framework for Attention Sink in Large Vision-Language Models

Jiho Choi, Jaemin Kim, Sanghwan Kim, Seunghoon Hong, Jin-Hwi Park

Comments preprint

2604.03315 2026-04-07 cs.CV cs.AI

StoryBlender: Inter-Shot Consistent and Editable 3D Storyboard with Spatial-temporal Dynamics

Bingliang Li, Zhenhong Sun, Jiaming Bian, Yuehao Wu, Yifu Wang, Hongdong Li, Yatao Bian, Huadong Mo, Daoyi Dong

2604.03314 2026-04-07 cs.CV cs.CL

CoLA: Cross-Modal Low-rank Adaptation for Multimodal Downstream Tasks

Wish Suharitdamrong, Tony Alex, Muhammad Awais, Sara Ahmed

Comments 14 pages, 6 Figures

2604.03313 2026-04-07 cs.CV

CardioSAM: Topology-Aware Decoder Design for High-Precision Cardiac MRI Segmentation

Ujjwal Jain

2604.03311 2026-04-07 cs.CV physics.ao-ph

PollutionNet: A Vision Transformer Framework for Climatological Assessment of NO$_2$ and SO$_2$ Using Satellite-Ground Data Fusion

Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan

Comments This manuscript is currently under review at Theoretical and Applied Climatology (Springer)

2604.03310 2026-04-07 cs.CV

Diffusion Path Alignment for Long-Range Motion Generation and Domain Transitions

Haichao Wang, Alexander Okupnik, Yuxing Han, Gene Wen, Johannes Schneider, Kyriakos Flouris

2604.03309 2026-04-07 cs.CV cs.AI

TreeGaussian: Tree-Guided Cascaded Contrastive Learning for Hierarchical Consistent 3D Gaussian Scene Segmentation and Understanding

Jingbin You, Zehao Li, Hao Jiang, Xinzhu Ma, Shuqin Gao, Honglong Zhao, Congcong Zheng, Tianlu Mao, Feng Dai, Yucheng Zhang, Zhaoqi Wang

2604.03308 2026-04-07 cs.CV

Edge-Based Standing-Water Detection via FSM-Guided Tiering and Multi-Model Consensus

Oliver Aleksander Larsen, Mahyar T. Moghaddam

Comments Accepted at the In Practice Track of IEEE ICSA 2026. 10 pages

2604.03306 2026-04-07 cs.CV

Deep Image Clustering Based on Curriculum Learning and Density Information

Haiyang Zheng, Ruilin Zhang, Hongpeng Wang

2604.03305 2026-04-07 cs.CV

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

Mingjin Chen, Junhao Chen, Zhaoxin Fan, Yujian Lee, Zichen Dang, Lili Wang, Yawen Cui, Lap-Pui Chau, Yi Wang

Comments Project page: https://hvg3d.github.io/

2604.03302 2026-04-07 cs.CV cs.AI

Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models

Nanxi Li, Xiang Wang, Yuanjie Chen, Haode Zhang, Hong Li, Yong-Lu Li

2604.03301 2026-04-07 cs.CV cs.AI

Embedding-Only Uplink for Onboard Retrieval Under Shift in Remote Sensing

Sangcheol Sim

Comments Accepted at the Machine Learning for Remote Sensing (ML4RS) Workshop, ICLR 2026

2604.03299 2026-04-07 cs.CV cs.AI

MoViD: View-Invariant 3D Human Pose Estimation via Motion-View Disentanglement

Yejia Liu, Hengle Jiang, Haoxian Liu, Runxi Huang, Xiaomin Ouyang

2604.03297 2026-04-07 cs.CV cs.AI cs.LG

XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation

Xinyu Liu, Qing Xu, Zhen Chen

2604.03296 2026-04-07 cs.CV cs.AI

3D-IDE: 3D Implicit Depth Emergent

Chushan Zhang, Ruihan Lu, Jinguang Tong, Yikai Wang, Hongdong Li

Comments CVPR 2026 accepted. Project page: https://chushanzhang.github.io/3D-IDE/

2604.03286 2026-04-07 cs.AI cond-mat.mtrl-sci cs.HC

Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models

Yong Xie, Kexin He, Andres Castellanos-Gomez

Comments 16 pages, 5 figures. Accepted manuscript published in Small Structures. Supporting data and code available at https://doi.org/10.5281/zenodo.15065601

2604.03277 2026-04-07 cs.CV cs.AI cs.LG

Event-Driven Neuromorphic Vision Enables Energy-Efficient Visual Place Recognition

Geoffroy Keime, Nicolas Cuperlier, Benoit R. Cottereau

Comments 40 pages single column, v1

2604.03270 2026-04-07 cs.CL

Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection

Andrey Pustovit

Comments 12 pages, 3 figures, 8 tables. Code: https://github.com/cnails/kv-knowledge-packs

2604.03267 2026-04-07 cs.CV cs.AI

A reconfigurable smart camera implementation for jet flames characterization based on an optimized segmentation model

Gerardo Valente Vazquez-Garcia, Carmina Perez Guerrero, Eduardo Garduño, Miguel Gonzalez-Mendoza, Adriana Palacios, Gerardo Rodriguez-Hernandez, Vahid Foroughi, Alba Àgueda, Elsa Pastor, Gilberto Ochoa-Ruiz

Comments Paper submitted to EAAI (Elsevier) for peer review

Abstract

In this work, we present a novel framework for fire safety management in industrial settings through the implementation of a smart camera platform for jet-flame characterization. The approach seeks to address the lack of real-time solutions for early fire segmentation and characterization in industry. As a case study, we demonstrate how an SoC FPGA running optimized Artificial Intelligence (AI) models can be leveraged to implement a full edge-processing pipeline for jet-flame analysis. We extend previous work on computer-vision jet-fire segmentation by creating a novel experimental set-up and system implementation that can be replicated for other fire safety applications. The proposed platform is designed to carry out image processing tasks in real time and on device, reducing video processing overheads and thus overall latency. This is achieved by optimizing a UNet segmentation model to make it amenable to SoC FPGA implementation; the optimized model can then be efficiently mapped onto the SoC reconfigurable logic for massively parallel execution. For our experiments, we chose the Ultra96 platform, as it also provides the means to implement full-fledged intelligent systems using the SoC peripherals, as well as Operating System (OS) capabilities (e.g., multi-threading) for systems management. To optimize the model, we used the Vitis (Xilinx) framework, which enabled us to reduce the full-precision model from 7.5 million parameters to 59,095 parameters (about 125x fewer), translating into a 2.9x reduction in processing latency. Further optimization (multi-threading and batch normalization) led to a 7.5x latency improvement, yielding 30 Frames Per Second (FPS) without sacrificing accuracy on the evaluated metric (Dice score).
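The short Python sketch below checks the arithmetic behind the figures quoted above. It computes the parameter-reduction ratio and the per-frame time budget at 30 FPS, and it uses only numbers stated in the abstract. Because "7.5 million" is itself rounded, the ratio comes out at roughly 127x, which is consistent with the approximate 125x reported. The helper names are hypothetical, introduced only for this check.

```python
def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before` (hypothetical helper)."""
    return before / after


def frame_time_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    full_params = 7_500_000    # "7.5 million" as stated (already rounded)
    optimized_params = 59_095  # optimized model size as stated
    print(f"parameter reduction: {reduction_factor(full_params, optimized_params):.1f}x")
    print(f"frame budget at 30 FPS: {frame_time_ms(30):.1f} ms")
```

Running it prints a parameter reduction of 126.9x and a frame budget of 33.3 ms. At 30 FPS, the full on-device pipeline has about 33 ms per frame.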

2604.03264 2026-04-07 cs.CV cs.AI cs.CR

SafeScreen: A Safety-First Screening Framework for Personalized Video Retrieval for Vulnerable Users

Wenzheng Zhao, Madhava Kalyan Gadiputi, Fengpei Yuan

Comments 11 pages, 3 figures, 7 tables. Under review for ACM ICMI 2026

2604.03258 2026-04-07 cs.CL cs.AI

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Xinhao Huang, You-Liang Huang, Zeyi Wen

2604.03257 2026-04-07 cs.CL cs.AI

Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation

Minghe Shen, Ananth Balashankar, Adam Fisch, David Madras, Miguel Rodrigues

2604.03253 2026-04-07 cs.CL cs.LG

Self-Execution Simulation Improves Coding Models

Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi

2604.03242 2026-04-07 cs.LG

DRAFT: Task Decoupled Latent Reasoning for Agent Safety

Lin Wang, Junfeng Fang, Dan Zhang, Fei Shen, Xiang Wang, Tat-Seng Chua