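arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.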

2506.00030 2026-03-20 cs.LG

Modality Equilibrium Matters: Minor-Modality-Aware Adaptive Alternating for Cross-Modal Memory Enhancement

Xiang Shi, Rui Zhang, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

Comments Accepted by TPAMI

Details

English abstract

Multimodal fusion is susceptible to modality imbalance, where dominant modalities overshadow weak ones, easily leading to biased learning and suboptimal fusion, especially under incomplete-modality conditions. To address this problem, we propose a Shapley-guided alternating training framework that adaptively prioritizes minor modalities to balance and thus enhance the fusion. Our method leverages Shapley Value-based scheduling to adapt the training sequence, ensuring that under-optimized modalities receive sufficient learning. Additionally, we introduce a memory module to refine and inherit modality-specific representations, with a cross-modal mapping mechanism to align features at both the feature and sample levels. To further validate the adaptability of the proposed approach, the encoder module adopts both conventional and LLM-based backbones. We also build a novel multimodal equilibrium metric, the equilibrium deviation metric (EDM), and evaluate both balance and accuracy across four multimodal benchmark datasets, where our method achieves state-of-the-art (SOTA) results. Robustness analysis under missing modalities further highlights its strong generalization capabilities. Our findings reveal the untapped potential of alternating training, demonstrating that strategic modality prioritization balances and promotes multimodal learning and offers a new paradigm for optimizing multimodal training dynamics.
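
The abstract above does not spell out how Shapley values drive the schedule, so the following is only a minimal sketch of the underlying idea: compute exact Shapley values for each modality from a per-subset score, then treat the modality with the smallest value as the "minor" one to prioritize. The modality names, the accuracy table `ACC`, and the `min`-based selection rule are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player under a coalition value function."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Average p's marginal contribution over all coalitions of the others.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical validation accuracy for every subset of two modalities.
ACC = {
    frozenset(): 0.50,
    frozenset({"audio"}): 0.60,
    frozenset({"video"}): 0.80,
    frozenset({"audio", "video"}): 0.84,
}

phi = shapley_values(["audio", "video"], ACC.__getitem__)

# Assumed rule: the modality contributing least is the one to train next.
minor = min(phi, key=phi.get)
print(phi, minor)
```

With these made-up numbers the contributions sum to the gain of the full model over the empty coalition (0.84 - 0.50), which is the efficiency property of Shapley values.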

2505.17847 2026-03-20 cs.LG cs.AI cs.SY eess.SY

Time-o1: Time-Series Forecasting Needs Transformed Label Alignment

Hao Wang, Licheng Pan, Zhichao Chen, Xu Chen, Qingyang Dai, Lei Wang, Haoxuan Li, Zhouchen Lin

Comments Accepted as poster in NeurIPS 2025

2505.10294 2026-03-20 cs.CV q-bio.TO

MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models

Guillaume Balezo, Roger Trullo, Albert Pla Planas, Etienne Decenciere, Thomas Walter

Comments Accepted manuscript, 24 pages, 9 figures, 5 tables. Published in Computers in Biology and Medicine (DOI: https://doi.org/10.1016/j.compbiomed.2026.111564)

Details

DOI: 10.1016/j.compbiomed.2026.111564
Journal ref: Computers in Biology and Medicine, vol. 206, 2026, 111564

English abstract

Histopathological analysis is a cornerstone of cancer diagnosis, with Hematoxylin and Eosin (H&E) staining routinely acquired for every patient to visualize cell morphology and tissue architecture. On the other hand, multiplex immunofluorescence (mIF) enables more precise cell type identification via proteomic markers, but has yet to achieve widespread clinical adoption due to cost and logistical constraints. To bridge this gap, we introduce MIPHEI (Multiplex Immunofluorescence Prediction from H&E Images), a U-Net-inspired architecture that leverages a ViT pathology foundation model as encoder to predict mIF signals from H&E images using rich pretrained representations. MIPHEI targets a comprehensive panel of markers spanning nuclear content, immune lineages (T cells, B cells, myeloid), epithelium, stroma, vasculature, and proliferation. We train our model using the publicly available OrionCRC dataset of restained H&E and mIF images from colorectal cancer tissue, and validate it on five independent datasets: HEMIT, PathoCell, IMMUcan, Lizard and PanNuke. On the OrionCRC test set, MIPHEI achieves accurate cell-type classification from H&E alone, with F1 scores of 0.93 for Pan-CK, 0.83 for alpha-SMA, 0.68 for CD3e, 0.36 for CD20, and 0.28 for CD68, substantially outperforming both a state-of-the-art baseline and a random classifier for most markers. Our results indicate that, for some molecular markers, our model captures the complex relationships between nuclear morphologies in their tissue context, as visible in H&E images, and the molecular markers defining specific cell types. MIPHEI offers a promising step toward enabling cell-type-aware analysis of large-scale H&E datasets, with a view to uncovering relationships between spatial cellular organization and patient outcomes.

2504.00992 2026-03-20 cs.CV

SuperDec: 3D Scene Decomposition with Superquadric Primitives

Elisabetta Fedele, Boyang Sun, Leonidas Guibas, Marc Pollefeys, Francis Engelmann

Comments Project page: https://super-dec.github.io

2503.21782 2026-03-20 cs.CV

Mobile-VideoGPT: Fast and Accurate Model for Mobile Video Understanding

Abdelrahman Shaker, Muhammad Maaz, Chenhui Gou, Hamid Rezatofighi, Salman Khan, Fahad Shahbaz Khan

Comments Technical Report. Project: https://amshaker.github.io/Mobile-VideoGPT

2503.16252 2026-03-20 cs.CL

Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning

Zhaowei Liu, Xin Guo, Zhi Yang, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Mengping Li, Qi Qi, Zhiqiang Liu, Yiyang Han, Dongpo Cheng, Ronghao Chen, Huacan Wang, Xingdong Feng, Huixia Judy Wang, Chengchun Shi, Liwen Zhang

2503.13194 2026-03-20 cs.AI cs.LG

A representational framework for learning and encoding structurally enriched trajectories in complex agent environments

Corina Catarau-Cotutiu, Esther Mondragon, Eduardo Alonso

2503.08890 2026-03-20 cs.CL

PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization

Zhiwen You, Yue Guo

Comments Accepted by Journal of Biomedical Informatics

Details

DOI: 10.1016/j.jbi.2026.105019

English abstract

Hallucinated outputs from large language models (LLMs) pose risks in the medical domain, especially for lay audiences making health-related decisions. Existing automatic factual consistency evaluation methods, such as entailment-based and question-answering (QA)-based approaches, struggle with plain language summarization (PLS) due to the elaborative explanation phenomenon, which introduces external content (e.g., definitions, background, examples) absent from the scientific abstract to enhance comprehension. To address this, we introduce PlainQAFact, an automatic factual consistency evaluation metric trained on PlainFact, a fine-grained, human-annotated dataset, for evaluating the factual consistency of both source-simplified and elaborately explained sentences. PlainQAFact first classifies sentence type, then applies a retrieval-augmented QA scoring method. Empirical results show that existing evaluation metrics fail to evaluate factual consistency in PLS, especially for elaborative explanations, whereas PlainQAFact consistently outperforms them across all evaluation settings. We further analyze PlainQAFact's effectiveness across external knowledge sources, answer extraction strategies, answer overlap measures, and document granularity levels, refining its overall factual consistency assessment. Taken together, our work presents a sentence-aware, retrieval-augmented metric targeted at elaborative explanations in biomedical PLS tasks, providing the community with both a new benchmark and a practical evaluation tool to advance reliable and safe plain language communication in the medical domain. PlainQAFact and PlainFact are available at: https://github.com/zhiwenyou103/PlainQAFact

2502.16116 2026-03-20 cs.LG physics.ao-ph

Integrating Weather Station Data and Radar for Precipitation Nowcasting: SmaAt-fUsion and SmaAt-Krige-GNet

Jie Shi, Aleksej Cornelissen, Siamak Mehrkanoon

Comments 13 pages, 6 figures

2502.09340 2026-03-20 cs.LG

This looks like what? Challenges and Future Research Directions for Part-Prototype Models

Khawla Elhadri, Tomasz Michalski, Adam Wróbel, Jörg Schlötterer, Bartosz Zieliński, Christin Seifert

Comments Accepted at the 4th World Conference on eXplainable Artificial Intelligence (XAI-2026)

2501.09749 2026-03-20 cs.CL cs.IR

Enhancing Lexicon-Based Text Embeddings with Large Language Models

Yibin Lei, Tao Shen, Yu Cao, Andrew Yates

Comments ACL 2025

2412.09465 2026-03-20 cs.CV

OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs

Yuanzhi Zhu, Ruiqing Wang, Shilin Lu, Junnan Li, Hanshu Yan, Kai Zhang

Comments ICLR 2026

2412.08973 2026-03-20 cs.CV cs.AI

Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?

Yifan Zhang, Junhui Hou

Comments Accepted to IJCV 2026

2410.15825 2026-03-20 cs.CL

Did somebody say "Gest-IT"? A pilot exploration of multimodal data management

Ludovica Pannitto, Lorenzo Albanesi, Laura Marion, Federica Maria Martines, Carmelo Caruso, Claudia S. Bianchini, Francesca Masini, Caterina Mauri

2410.09252 2026-03-20 cs.CL cs.AI cs.HC

DAVIS: Planning Agent with Knowledge Graph-Powered Inner Monologue

Minh Pham Dinh, Munira Syed, Michael G Yankoski, Trenton W. Ford

Comments Accepted to EMNLP 2025 Findings

2409.17833 2026-03-20 cs.LG

ODE-Constrained Generative Modeling of Cardiac Dynamics for 12-Lead ECG Synthesis

Yakir Yehuda, Kira Radinsky

2407.17454 2026-03-20 cs.AI

Automated Explanation Selection for Scientific Discovery

Ashlin Iser

Comments Composite AI Workshop at ECAI 2024 (accepted for publication)

2403.17210 2026-03-20 cs.LG cs.AI cs.IR q-bio.BM q-bio.MN

CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions

Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Serbetar Karlo, Dong-Kyu Chae

Comments Preliminary version; full version accepted to the IEEE Transactions on Computational Biology and Bioinformatics (IEEE TCBB) (https://doi.org/10.1109/TCBBIO.2026.3675142). Code: https://github.com/azminewasi/cadgl

2403.02482 2026-03-20 cs.AI

Heuristic Multiobjective Discrete Optimization using Restricted Decision Diagrams

Rahul Patel, Elias B. Khalil, David Bergman

Comments To appear in the proceedings of CPAIOR 2026

2402.15315 2026-03-20 cs.LG cs.DM math.CO

On Minimal Depth in Neural Networks

Juan L. Valerdi

Comments 16 pages

2312.08531 2026-03-20 cs.LG math.OC stat.ML

Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods

Zijian Liu, Zhengyuan Zhou

Comments The preliminary version has been accepted at ICLR 2024. For the update history, please refer to the PDF

2311.04095 2026-03-20 cs.CV

Multi-Scale Distillation for RGB-D Anomaly Detection on the PD-REAL Dataset

Jianjian Qin, Chao Zhang, Chunzhi Gu, Zi Wang, Jun Yu, Yijin Wei, Hui Xiao, Xin Yu

2310.01536 2026-03-20 cs.AI

Algebras of actions in an agent's representations of the world

Alexander Dean, Eduardo Alonso, Esther Mondragon

2603.18593 2026-03-20 cs.CL

Language Model Maps for Prompt-Response Distributions via Log-Likelihood Vectors

Yusuke Takase, Momose Oyama, Hidetoshi Shimodaira

2603.18589 2026-03-20 cs.RO

Benchmarking Visual Feature Representations for LiDAR-Inertial-Visual Odometry Under Challenging Conditions

Eunseon Choi, Junwoo Hong, Daehan Lee, Sanghyun Park, Hyunyoung Jo, Sunyoung Kim, Changho Kang, Seongsam Kim, Yonghan Jung, Jungwook Park, Seul Koo, Soohee Han

Comments 14 pages, Published in IEEE Access, 2026

Details

DOI: 10.1109/ACCESS.2026.3667112
Journal ref: E. Choi et al., "Benchmarking Visual Feature Representations for LiDAR-Inertial-Visual Odometry Under Challenging Conditions," in IEEE Access, vol. 14, pp. 30186-30199, 2026

English abstract

Accurate localization in autonomous driving is critical for successful missions including environmental mapping and survivor searches. In visually challenging environments, including low-light conditions, overexposure, illumination changes, and high parallax, the performance of conventional visual odometry methods degrades significantly, undermining robust robotic navigation. Researchers have recently proposed LiDAR-inertial-visual odometry (LIVO) frameworks that integrate LiDAR, IMU, and camera sensors to address these challenges. This paper extends the FAST-LIVO2-based framework by introducing a hybrid approach that integrates direct photometric methods with descriptor-based feature matching. For the descriptor-based feature matching, this work proposes four pairings: ORB with the Hamming distance, SuperPoint with SuperGlue, SuperPoint with LightGlue, and XFeat with mutual nearest-neighbor matching. The proposed configurations are benchmarked by accuracy, computational cost, and feature tracking stability, enabling a quantitative comparison of the adaptability and applicability of visual descriptors. The experimental results reveal that the proposed hybrid approach outperforms the conventional sparse-direct method. Although the sparse-direct method often fails to converge in regions where photometric inconsistency arises due to illumination changes, the proposed approach maintains robust performance under the same conditions. Furthermore, the hybrid approach with learning-based descriptors enables robust and reliable visual state estimation across challenging environments.
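
The abstract above names the Hamming distance and mutual nearest-neighbor matching as two of the matching strategies. As a rough illustration only, the sketch below combines the two on toy 8-bit binary descriptors encoded as Python integers: a pair is kept only if each descriptor is the other's nearest neighbor. The paper pairs these strategies with different detectors (ORB and XFeat respectively), so this combination, the function names, and the toy descriptors are assumptions for illustration, not the paper's pipeline.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary descriptors."""
    return bin(a ^ b).count("1")


def mutual_nearest_neighbors(desc_a, desc_b, dist=hamming):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] pick each other."""
    if not desc_a or not desc_b:
        return []
    # Nearest neighbor in B for every descriptor in A, and vice versa.
    nn_ab = [min(range(len(desc_b)), key=lambda j: dist(a, desc_b[j]))
             for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: dist(desc_a[i], b))
             for b in desc_b]
    # Keep only the pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


# Toy 8-bit descriptors for two frames (hypothetical values).
frame_a = [0b10110010, 0b01001101, 0b11110000]
frame_b = [0b01001111, 0b10110011, 0b00001111]

matches = mutual_nearest_neighbors(frame_a, frame_b)
print(matches)
```

The mutual check discards one-sided matches, such as the third descriptor of `frame_a` here, whose nearest neighbor in `frame_b` prefers a different descriptor.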

2603.18586 2026-03-20 cs.CV

Color image restoration based on nonlocal saturation-value similarity

Wei Wang, Yakun Li

2603.18585 2026-03-20 cs.CV

HAViT: Historical Attention Vision Transformer

Swarnendu Banik, Manish Das, Shiv Ram Dubey, Satish Kumar Singh

2603.18579 2026-03-20 cs.CL cs.AI cs.LG

ICE: Intervention-Consistent Explanation Evaluation with Statistical Grounding for LLMs

Abhinaba Basu, Pavan Chakraborty

2603.18573 2026-03-20 cs.AI cs.IR

Interplay: Training Independent Simulators for Reference-Free Conversational Recommendation

Jerome Ramos, Feng Xia, Xi Wang, Shubham Chatterjee, Xiao Fu, Hossein A. Rahmani, Aldo Lipani

Comments Accepted at ECIR 2026

2603.18571 2026-03-20 cs.AI cs.CE q-bio.QM

CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization

Yicheng Hu, Xinyu Lin, Shulin Li, Wenjie Wang, Fengbin Zhu, Fuli Feng

Comments Accepted to ICLR 2026