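arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.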

2603.21108 2026-03-24 cs.LG cs.AI

DMMRL: Disentangled Multi-Modal Representation Learning via Variational Autoencoders for Molecular Property Prediction

Long Xu, Junping Guo, Jianbo Zhao, Jianbo Lu, Yuzhong Peng

Comments 9 pages, 1 figure

Abstract

Molecular property prediction constitutes a cornerstone of drug discovery and materials science, necessitating models capable of disentangling complex structure-property relationships across diverse molecular modalities. Existing approaches frequently exhibit entangled representations--conflating structural, chemical, and functional factors--thereby limiting interpretability and transferability. Furthermore, conventional methods inadequately exploit complementary information from graphs, sequences, and geometries, often relying on naive concatenation that neglects inter-modal dependencies. In this work, we propose DMMRL, which employs variational autoencoders to disentangle molecular representations into shared (structure-relevant) and private (modality-specific) latent spaces, enhancing both interpretability and predictive performance. The proposed variational disentanglement mechanism effectively isolates the most informative features for property prediction, while orthogonality and alignment regularizations promote statistical independence and cross-modal consistency. Additionally, a gated attention fusion module adaptively integrates shared representations, capturing complex inter-modal relationships. Experimental validation across seven benchmark datasets demonstrates DMMRL's superior performance relative to state-of-the-art approaches. The code and data underlying this article are freely available at https://github.com/xulong0826/DMMRL.
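The abstract names orthogonality and alignment regularizers but does not give their form. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the squared-Frobenius-norm orthogonality term, and the mean-squared-distance alignment term are all illustrative choices, and latents are plain Python lists rather than tensors.

```python
from typing import List

Vector = List[float]
Batch = List[Vector]  # one latent vector per molecule


def orthogonality_penalty(shared: Batch, private: Batch) -> float:
    """Mean squared entry of shared^T @ private over the batch (assumed form).

    The value is zero when every shared dimension has zero dot product
    with every private dimension across the batch.
    """
    d_s, d_p = len(shared[0]), len(private[0])
    total = 0.0
    for i in range(d_s):
        for j in range(d_p):
            dot = sum(s[i] * p[j] for s, p in zip(shared, private))
            total += dot * dot
    return total / (d_s * d_p)


def alignment_penalty(shared_a: Batch, shared_b: Batch) -> float:
    """Mean squared distance between two modalities' shared latents (assumed form)."""
    n = len(shared_a)
    return sum(
        sum((x - y) ** 2 for x, y in zip(a, b))
        for a, b in zip(shared_a, shared_b)
    ) / n


# Hypothetical latents for two molecules from a graph and a sequence encoder.
graph_shared = [[1.0, 0.0], [0.0, 1.0]]
graph_private = [[0.0, 1.0], [1.0, 0.0]]
seq_shared = [[1.0, 0.0], [0.0, 0.0]]

print(orthogonality_penalty(graph_shared, graph_private))
print(alignment_penalty(graph_shared, seq_shared))
```

In a training loop, terms like these would typically be added to the VAE objective with small weights. The weights and the exact combination are not stated in the abstract.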

2603.21105 2026-03-24 cs.LG

ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models

Xu Li, Yi Zheng, Yuxuan Liang, Zhe Liu, Xiaolei Chen, Haotian Chen, Rui Zhu, Xiangyang Xue

2603.21104 2026-03-24 cs.RO cs.CV

CounterScene: Counterfactual Causal Reasoning in Generative World Models for Safety-Critical Closed-Loop Evaluation

Bowen Jing, Ruiyang Hao, Weitao Zhou, Haibao Yu

Comments 28 pages, 7 figures

2603.21100 2026-03-24 cs.CV cs.AI

Learning Progressive Adaptation for Multi-Modal Tracking

He Wang, Tianyang Xu, Zhangyong Tang, Xiao-Jun Wu, Josef Kittler

2603.21096 2026-03-24 cs.LG cs.AI cs.CL

Mixture of Chapters: Scaling Learnt Memory in Transformers

Tasmay Pankaj Tibrewal, Pritish Saha, Ankit Meda, Kunal Singh, Pradeep Moturi

Comments 20 pages, 2 figures, 8 tables. Accepted at ICLR 2026 New Frontiers in Associative Memory Workshop. Code available at https://github.com/Tasmay-Tibrewal/Memory

2603.21095 2026-03-24 cs.CV cs.AI

Representation-Level Adversarial Regularization for Clinically Aligned Multitask Thyroid Ultrasound Assessment

Dina Salama, Mohamed Mahmoud, Nourhan Bayasi, David Liu, Ilker Hacihaliloglu

2603.21086 2026-03-24 cs.CV

DGRNet: Disagreement-Guided Refinement for Uncertainty-Aware Brain Tumor Segmentation

Bahram Mohammadi, Yanqiu Wu, Vu Minh Hieu Phan, Sam White, Minh-Son To, Jian Yang, Michael Sheng, Yang Song, Yuankai Qi

Comments 10 pages, 3 figures, 4 tables

2603.21085 2026-03-24 cs.CV

Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models

Qifan Li, Xingyu Zhou, Jinhua Zhang, Weiyi You, Shuhang Gu

Comments Accepted to CVPR 2026

2603.21084 2026-03-24 cs.CL cs.AI cs.LG

ViCLSR: A Supervised Contrastive Learning Framework with Natural Language Inference for Natural Language Understanding Tasks

Tin Van Huynh, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

2603.21083 2026-03-24 cs.CV

Hierarchical Text-Guided Brain Tumor Segmentation via Sub-Region-Aware Prompts

Bahram Mohammadi, Ta Duc Huy, Afrouz Sheikholeslami, Qi Chen, Vu Minh Hieu Phan, Sam White, Minh-Son To, Xuyun Zhang, Amin Beheshti, Luping Zhou, Yuankai Qi

Comments 10 pages, 3 figures, 4 tables

2603.21078 2026-03-24 cs.CL cs.AI cs.SD

Assessing the Ability of Neural TTS Systems to Model Consonant-Induced F0 Perturbation

Tianle Yang, Chengzhe Sun, Phil Rose, Cassandra L. Jacobs, Siwei Lyu

Comments Accepted for publication in Computer Speech & Language

2603.21069 2026-03-24 cs.CV

NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection

Yupeng Zhang, Ruize Han, Zhiwei Chen, Wei Feng, Liang Wan

Comments Accepted to CVPR 2026

2603.21065 2026-03-24 cs.AI cs.CL

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Jianing Wang, Jianfei Zhang, Qi Guo, Linsen Guo, Rumei Li, Chao Zhang, Chong Peng, Cunguang Wang, Dengchang Zhao, Jiarong Shi, Jingang Wang, Liulin Feng, Mengxia Shen, Qi Li, Shengnan An, Shun Wang, Wei Shi, Xiangyu Xi, Xiaoyu Li, Xuezhi Cao, Yi Lu, Yunke Zhao, Zhengyu Chen, Zhimin Lin, Wei Wang, Peng Pei, Xunliang Cai

Comments 43 pages, 5 figures

2603.21061 2026-03-24 cs.CV

Single-Eye View: Monocular Real-time Perception Package for Autonomous Driving

Haixi Zhang, Aiyinsi Zuo, Zirui Li, Chunshu Wu, Tong Geng, Zhiyao Duan

Comments 9 pages, 5 figures

2603.21056 2026-03-24 cs.LG

Semi-Supervised Learning with Balanced Deep Representation Distributions

Changchun Li, Ximing Li, Bingjie Zhang, Wenting Wang, Jihong Ouyang

2603.21055 2026-03-24 cs.CV

SGAD-SLAM: Splatting Gaussians at Adjusted Depth for Better Radiance Fields in RGBD SLAM

Pengchong Hu, Zhizhong Han

Comments CVPR 2026

2603.21054 2026-03-24 cs.LG cs.AI cs.MM

Harmful Visual Content Manipulation Matters in Misinformation Detection Under Multimedia Scenarios

Bing Wang, Ximing Li, Changchun Li, Jinjin Chi, Tianze Li, Renchu Guan, Shengsheng Wang

Abstract

Nowadays, the widespread dissemination of misinformation across numerous social media platforms has led to severe negative effects on society. To address this challenge, the automatic detection of misinformation, particularly under multimedia scenarios, has gained significant attention from both academic and industrial communities, leading to the emergence of a research task known as Multimodal Misinformation Detection (MMD). Typically, current MMD approaches focus on capturing the semantic relationships and inconsistency between various modalities but often overlook certain critical indicators within multimodal content. Recent research has shown that manipulated features within visual content in social media articles serve as valuable clues for MMD. Meanwhile, we argue that the potential intentions behind the manipulation, e.g., harmful versus harmless, also matter in MMD. Therefore, in this study, we aim to identify such multimodal misinformation by capturing two types of features: manipulation features, which indicate whether visual content has been manipulated, and intention features, which assess the nature of these manipulations, distinguishing between harmful and harmless intentions. Unfortunately, the manipulation and intention labels needed to supervise these features and make them discriminative are unknown. To address this, we introduce two weakly supervised indicators as substitutes by incorporating supplementary datasets focused on image manipulation detection and framing two different classification tasks as positive-unlabeled learning problems. With this framework, we introduce an innovative MMD approach, titled Harmful Visual Content Manipulation Matters in MMD (HAVC-M4D). Comprehensive experiments conducted on four prevalent MMD datasets indicate that HAVC-M4D significantly and consistently enhances the performance of existing MMD methods.
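The abstract frames its two auxiliary tasks as positive-unlabeled learning but does not say which estimator is used. As a hedged illustration only, the sketch below implements the widely used non-negative PU risk with a sigmoid surrogate loss. The function names, the choice of loss, and the class prior value are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid surrogate loss; close to 0 when score agrees in sign with label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def nn_pu_risk(
    pos_scores: Sequence[float],
    unl_scores: Sequence[float],
    prior: float,
) -> float:
    """Non-negative PU risk from positive and unlabeled classifier scores.

    prior is the assumed fraction of positives among the unlabeled data.
    """
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimated risk on the hidden negatives; clipped at zero so the
    # estimate cannot go negative when the model overfits.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


# Hypothetical scores: manipulated images (positive) and unlabeled images.
print(nn_pu_risk(pos_scores=[2.0, 1.5], unl_scores=[-1.0, 0.5, -2.0], prior=0.3))
```

The class prior has to be supplied or estimated separately. Note that `math.exp` raises `OverflowError` for very large arguments, so a production version would use a numerically stable sigmoid.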

2603.21051 2026-03-24 cs.RO

Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation

Xuening Zhang, Qi Lv, Xiang Deng, Miao Zhang, Xingbo Liu, Liqiang Nie

Comments Published as a conference paper at ICLR 2026. 10 pages, 4 figures. Appendix included

2603.21048 2026-03-24 cs.CV cs.AI

A Two-stage Transformer Framework for Temporal Localization of Distracted Driver Behaviors

Gia-Bao Doan, Nam-Khoa Huynh, Minh-Nhat-Huy Ho, Khanh-Thanh-Khoa Nguyen, Thanh-Hai Le

Comments 25 pages, 14 figures

2603.21047 2026-03-24 cs.CV

When Minor Edits Matter: LLM-Driven Prompt Attack for Medical VLM Robustness in Ultrasound

Yasamin Medghalchi, Milad Yazdani, Amirhossein Dabiriaghdam, Moein Heidari, Mojan Izadkhah, Zahra Kavian, Giuseppe Carenini, Lele Wang, Dena Shahriari, Ilker Hacihaliloglu

2603.21046 2026-03-24 cs.CV cs.AI

SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments

Wen Jiang, Kangyao Huang, Li Wang, Wang Xu, Wei Fan, Jinyuan Liu, Shaoyu Liu, Hanfang Liang, Hongwei Duan, Bin Xu, Xiangyang Ji

2603.21043 2026-03-24 cs.LG

Confidence Freeze: Early Success Induces a Metastable Decoupling of Metacognition and Behaviour

Zhipeng Zhang, Hongshun He

2603.21038 2026-03-24 cs.CL cs.HC

Reading Between the Lines: How Electronic Nonverbal Cues shape Emotion Decoding

Taara Kumar, Kokil Jaidka

Comments Accepted at AAAI ICWSM 2026

2603.21036 2026-03-24 cs.CL

Left Behind: Cross-Lingual Transfer as a Bridge for Low-Resource Languages in Large Language Models

Abdul-Salem Beibitkhan

2603.21034 2026-03-24 cs.LG

Fuel Consumption Prediction: A Comparative Analysis of Machine Learning Paradigms

Ali Akram

2603.21030 2026-03-24 cs.LG cs.HC

Deep Attention-based Sequential Ensemble Learning for BLE-Based Indoor Localization in Care Facilities

Minh Triet Pham, Quynh Chi Dang, Le Nhat Tan

Comments 8 pages, 9 figures, IEEE format. Best Challenge Paper Award at the ABC 2026 Activity and Location Recognition Challenge (ABC 2026)

2603.21029 2026-03-24 cs.AI

KLDrive: Fine-Grained 3D Scene Reasoning for Autonomous Driving based on Knowledge Graph

Ye Tian, Jingyi Zhang, Zihao Wang, Xiaoyuan Ren, Xiaofan Yu, Onat Gungor, Tajana Rosing

2603.21022 2026-03-24 cs.AI cs.CL cs.LG

Knowledge Boundary Discovery for Large Language Models

Ziquan Wang, Zhongqi Lu

Comments 9 pages, 4 figures

2603.21017 2026-03-24 cs.RO

Dreaming the Unseen: World Model-regularized Diffusion Policy for Out-of-Distribution Robustness

Ziou Hu, Xiangtong Yao, Yuan Meng, Zhenshan Bing, Alois Knoll

Comments Under review

2603.21014 2026-03-24 cs.LG cs.CL

CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs

Florent Draye, Abir Harrasse, Vedant Palit, Tung-Yu Wu, Jiarui Liu, Punya Syon Pandey, Roderick Wu, Terry Jingchen Zhang, Zhijing Jin, Bernhard Schölkopf

Comments 9 pages, 2 figures, code: https://github.com/LLM-Interp/CLT-Forge