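arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.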

2605.06157 2026-05-08 cs.CL cs.AI cs.CV

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

Esra Dönmez, Pascal Tilli, Hsiu-Yu Yang, Thang Vu, Carina Silberer

Details

DOI: 10.18653/v1/2023.conll-1.24
Journal ref: Association for Computational Linguistics (2023)

Abstract (English)

Image-Text-Matching (ITM) is one of the de facto methods of learning generalized representations from a large corpus in Vision and Language (VL). However, due to the weak association between the web-collected image-text pairs, models fail to show a fine-grained understanding of the combined semantics of these modalities. To address this issue, we propose Hard Negative Captions (HNC): an automatically created dataset containing foiled hard negative captions for ITM training towards achieving fine-grained cross-modal comprehension in VL. Additionally, we provide a challenging manually-created test set for benchmarking models on a fine-grained cross-modal mismatch task with varying levels of compositional complexity. Our results show the effectiveness of training on HNC by improving the models' zero-shot capabilities in detecting mismatches on diagnostic tasks and performing robustly under noisy visual input scenarios. Also, we demonstrate that HNC models yield a comparable or better initialization for fine-tuning

2605.06154 2026-05-08 cs.AI cs.LG

Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models

Kossi Amouzouvi, Robert Wardenga, Jens Lehmann, Sahar Vahdati

2605.06149 2026-05-08 cs.LG cs.AI

AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning

Yaomin Wang, Jianting Pan, Ran Tian, Xiaoyang Li, Yu Zhang, Hengle Qin, Tianshu YU

Comments: 22 pages, 9 figures

2605.06148 2026-05-08 cs.CV cs.AI cs.LG

Learning Discrete Autoregressive Priors with Wasserstein Gradient Flow

Bowen Zheng, Yihong Luo, Tianyang Hu

2605.06145 2026-05-08 cs.LG cs.AI cs.SY eess.SY

Unifying Goal-Conditioned RL and Unsupervised Skill Learning via Control-Maximization

Alireza Modirshanechi, Benjamin Eysenbach, Peter Dayan, Eric Schulz

2605.06143 2026-05-08 cs.CV cs.AI

AI-Generated Images: What Humans and Machines See When They Look at the Same Image

Silvia Poletti, Justin Ilyes, Marcel Hasenbalg, David Fischinger, Martin Boyer

Comments: Included in the main conference proceedings published by Springer Nature (CCIS Series)

2605.06142 2026-05-08 cs.CL cs.AI

IRC-Bench: Recognizing Entities from Contextual Cues in First-Person Reminiscences

Yehudit Aperstein, Eden Moran, Alexander Apartsin

Comments: 29 pages, 8 figures

2605.06141 2026-05-08 cs.LG

Matrix-Valued Optimism is Matrix-Valued Augmentation: Additive Hybrid Designs for Constrained Optimization

Jiayi Zhao

2605.06140 2026-05-08 cs.LG cs.AI

SymDrift: One-Shot Generative Modeling under Symmetries

Samir Darouich, Vinh Tong, Lluís Pastor-Pérez, Tanja Bien, Loay Mualem, Mathias Niepert

2605.06127 2026-05-08 cs.CV cs.AI

Continuous Expert Assembly: Instance-Conditioned Low-Rank Residuals for All-in-One Image Restoration

Haisen He, Xiangyu Zou, SongLin Dong, Heng Li, Yihong Gong, Zhiheng Ma

2605.06124 2026-05-08 cs.AI

P-Guide: Parameter-Efficient Prior Steering for Single-Pass CFG Inference

Xin Peng, Ang Gao

2605.06123 2026-05-08 cs.AI

Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

Nguyen Viet Tuan Kiet, Bui Dinh Pham, Dao Van Tung, Tran Cong Dao, Huynh Thi Thanh Binh

Comments: 75 pages

2605.06121 2026-05-08 cs.CV

Pest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learning

Xueheng Li, Yu Wang, Tao Hu, Ji Huang, Ke Cao, Qize Yang, Rui Li, Jie Zhang, Chengjun Xie

Comments: 10 pages, 5 figures

2605.06116 2026-05-08 cs.AI

Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning

Wenwen Si, Insup Lee, Osbert Bastani

2605.06112 2026-05-08 cs.CV cs.AI

Dynamic Pondering Sparsity-aware Mixture-of-Experts Transformer for Event Stream based Visual Object Tracking

Shiao Wang, Xiao Wang, Duoqing Yang, Wenhao Zhang, Bo Jiang, Lin Zhu, Yonghong Tian, Bin Luo

2605.06105 2026-05-08 cs.AI

Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

Jungsuk Oh, Hyeseo Jeon, Hyunjune Ji, Kyongmin Kong, Jay-Yoon Lee

2605.06104 2026-05-08 cs.LG cs.AI

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

Yongyi Wang, Hanyu Liu, Lingfeng Li, Bozhou Chen, Ang Li, Qirui Zheng, Xionghui Yang, Chucai Wang, Wenxin Li

2605.06096 2026-05-08 cs.CL cs.CV

Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

Shu Wu, Xiaotian Ye, Xinyu Mou, Dongsheng Liu, Xiaohan Wang, Mengqi Zhang

2605.06095 2026-05-08 cs.CV

Metonymy in vision models undermines attention-based interpretability

Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Massimiliano Mancini, Diego Marcos

Comments: 28 pages, preprint

2605.06092 2026-05-08 cs.CV

Boosting Self-Supervised Tracking with Contextual Prompts and Noise Learning

Yaozong Zheng, Qihua Liang, Bineng Zhong, Shuimu Zeng, Yuanliang Xue, Ning Li, Shuxiang Song

Comments: CVPR 2026

2605.06087 2026-05-08 cs.AI cs.SY eess.SY

Safety Certification is Classification

Oliver Schön, Licio Romao, Sadegh Soudjani

Comments: 32 pages, 18 figures

2605.06086 2026-05-08 cs.CV

LARGO: Low-Rank Hypernetwork for Handling Missing Modalities

Niels Vyncke, Pooya Ashtari, Aleksandra Pižurica

2605.06084 2026-05-08 cs.CV

AMIEOD: Adaptive Multi-Experts Image Enhancement for Object Detection in Low-Illumination Scenes

Xiaochen Huang, Honggang Chen, Weicheng Zhang, Xiaobo Dai, Yongyi Li, Linbo Qing, Xiaohai He

Comments: Accepted at IEEE Transactions on Multimedia

2605.06083 2026-05-08 cs.CV cs.IR cs.LG cs.MM

Revisiting Uncertainty: On Evidential Learning for Partially Relevant Video Retrieval

Jun Li, Peifeng Lai, Xuhang Lou, Jinpeng Wang, Yuting Wang, Ke Chen, Yaowei Wang, Shu-Tao Xia

Comments: Accepted by ICML 2026. 16 pages, 6 figures, 3 tables

2605.06081 2026-05-08 cs.LG

Fast Gauss-Newton for Multiclass Cross-Entropy

Mikalai Korbit, Mario Zanon

Comments: 29 pages, 3 figures, 1 table, 1 algorithm

2605.06080 2026-05-08 cs.CV

MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation

Shichao Kan, Xuyang Zhang, Haojie Zhang, Zhe Zhu, Yigang Cen, Yixiong Liang, Lianlei Shan, Linna Zhang, Zhe Qu, Jiazhi Xia

Comments: Preprint. 17 pages, 10 figures. Code is available at: https://steinsgatesg.github.io/MSDScore/

2605.06078 2026-05-08 cs.CL cs.AI

Milestone-Guided Policy Learning for Long-Horizon Language Agents

Zixuan Wang, Yuchen Yan, Hongxing Li, Teng Pan, Dingming Li, Ruiqing Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

2605.06077 2026-05-08 cs.LG

Understanding diffusion models requires rethinking (again) generalization

Pierre Marion, Yu-Han Wu

2605.06076 2026-05-08 cs.CL

Navigating by Old Maps: The Pitfalls of Static Mechanistic Localization in LLM Post-Training

Hang Chen, Jiaying Zhu, Hongyang Chen, Hongxu Liu, Xinyu Yang, Wenya Wang

Comments: 26 pages

2605.06073 2026-05-08 cs.LG

PRISM: Iterative Cross-Modal Posterior Refinement for Dynamic Text-Attributed Graphs

Trimble Chang, Yihang Liu, Mingjing Han, Han Zhang