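arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.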

2510.17422 2026-04-21 cs.CV

DeepDetect: Learning All-in-One Dense Keypoints

Shaharyar Ahmed Khan Tareen, Filza Khan Tareen, Xiaojing Yuan

Comments 8 pages, 8 figures, 3 tables, 6 equations

Abstract (English)

Keypoint detection is the foundation of many computer vision tasks, including image registration, structure-from-motion, 3D reconstruction, visual odometry, and SLAM. Traditional detectors (SIFT, ORB, BRISK, FAST, etc.) and learning-based methods (SuperPoint, R2D2, QuadNet, LIFT, etc.) have shown strong performance gains, yet they suffer from key limitations: sensitivity to photometric changes, low keypoint density and repeatability, limited adaptability to challenging scenes, and a lack of semantic understanding, so they often fail to prioritize visually important regions. We present DeepDetect, an intelligent, all-in-one, dense detector that uses deep learning to unify the strengths of classical detectors. First, we create ground-truth masks by fusing the outputs of 7 keypoint detectors and 2 edge detectors, extracting diverse visual cues that range from corners and blobs to prominent edges and textures. Next, a lightweight and efficient model, ESPNet, is trained with the fused masks as labels, enabling DeepDetect to focus semantically on images while producing highly dense keypoints that adapt to diverse and visually degraded conditions. Evaluations on the Oxford, HPatches, and Middlebury datasets show that DeepDetect surpasses other detectors, achieving maximum values of 0.5143 (average keypoint density), 0.9582 (average repeatability), 338,118 (correct matches), and 842,045 (voxels in stereo 3D reconstruction).
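
The mask-fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so this sketch assumes a simple pixel-wise union of binary detector outputs; the names `fuse_masks` and `density`, the toy 3x3 masks, and the density definition (marked pixels divided by total pixels) are all hypothetical and are not taken from the paper.

```python
from typing import List

Mask = List[List[int]]  # binary image: 1 marks a detected keypoint or edge pixel


def fuse_masks(masks: List[Mask]) -> Mask:
    """Pixel-wise union (logical OR) of equally sized binary masks.

    Assumption: the paper's actual fusion rule is not given in the abstract.
    """
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    for m in masks:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all masks must share the same shape")
    return [
        [1 if any(m[y][x] for m in masks) else 0 for x in range(width)]
        for y in range(height)
    ]


def density(mask: Mask) -> float:
    """Fraction of pixels marked in the mask (an assumed density definition)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Toy 3x3 outputs standing in for a corner, a blob, and an edge detector.
corners = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
blobs = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
edges = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

fused = fuse_masks([corners, blobs, edges])
```

In a setup like the one the abstract describes, a fused mask of this kind would serve as the training label for a segmentation-style network; a real pipeline would operate on full-resolution detector outputs rather than hand-written toy grids.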


2510.17001 2026-04-21 cs.CL

Vocab Diet: Reshaping the Vocabulary of LLMs via Vector Arithmetic

Yuval Reif, Guy Kaplan, Roy Schwartz

Comments ACL 2026 Findings

2510.16756 2026-04-21 cs.AI cs.CL cs.CV cs.RO eess.AS

End-to-end Listen, Look, Speak and Act

Siyin Wang, Wenyi Yu, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Lu Lu, Chao Zhang

Comments 22 pages, 8 figures

2510.16458 2026-04-21 cs.CL

Agree, Disagree, Explain: Decomposing Human Label Variation in NLI through the Lens of Explanations

Pingjun Hong, Beiduo Chen, Siyao Peng, Marie-Catherine de Marneffe, Benjamin Roth, Barbara Plank

Comments Accepted by ACL 2026 Findings, 13 pages, 6 figures

2510.15253 2026-04-21 cs.CL cs.CV

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding

Sensen Gao, Shanshan Zhao, Xu Jiang, Lunhao Duan, Yong Xien Chng, Qing-Guo Chen, Weihua Luo, Kaifu Zhang, Jia-Wang Bian, Mingming Gong

Comments Accepted by ACL 2026 Main Conference; Project is available at https://github.com/SensenGao/Multimodal-RAG-Survey-For-Document

2510.15218 2026-04-21 cs.LG

Ensemble Deep Learning Models for Early Detection of Meningitis in ICU: Multi-center Study

Han Ouyang, Ayush Singhal, Jesse Hamilton, Saeed Amal

2510.12047 2026-04-21 cs.AI cs.SE

ContractEval: A Benchmark for Evaluating Contract-Satisfying Assertions in Code Generation

Soohan Lim, Joonghyuk Hahn, Hyunwoo Park, Sang-Ki Ko, Yo-Sub Han

Comments 18 pages, 10 figures, 11 tables

2510.11288 2026-04-21 cs.CL

Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs

Nikita Afonin, Nikita Andriianov, Vahagn Hovhannisyan, Nikhil Bageshpura, Kyle Liu, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Oleg Rogov, Elena Tutubalina, Alexander Panchenko, Mikhail Seleznyov

2510.09741 2026-04-21 cs.CV cs.LG

Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping

Dwip Dalal, Gautam Vashishtha, Utkarsh Mishra, Jeonghwan Kim, Madhav Kanda, Hyeonjeong Ha, Svetlana Lazebnik, Heng Ji, Unnat Jain

Comments Accepted at ICLR 2026

2510.09351 2026-04-21 cs.CL

ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering

Francesco Maria Molfese, Luca Moroni, Ciro Porcaro, Simone Conia, Roberto Navigli

Comments Accepted at ACL 2026 Main Conference

2510.09275 2026-04-21 cs.CL cs.AI

Inflated Excellence or True Performance? Rethinking Medical Diagnostic Benchmarks with Dynamic Evaluation

Xiangxu Zhang, Lei Li, Yanyun Zhou, Xiao Zhou, Yingying Zhang, Xian Wu

Comments Accepted by ACL 2026 Main Conference

2510.08878 2026-04-21 cs.SD cs.AI cs.CL eess.AS

ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling

Yuxuan Jiang, Zehua Chen, Zeqian Ju, Yusheng Dai, Weibei Dou, Jun Zhu

Comments Accepted at ACL 2026 Main

2510.07761 2026-04-21 cs.CL

Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Nishant Balepur, Atrey Desai, Rachel Rudinger

Comments ACL 2026

2510.07739 2026-04-21 cs.LG cs.AI

MeSH: Memory-as-State-Highways for Recursive Transformers

Chengting Yu, Xiaobo Shu, Yadao Wang, Yizhen Zhang, Haoyi Wu, Jiaang Li, Rujiao Long, Ziheng Chen, Yuchi Xu, Wenbo Su, Bo Zheng

Comments Accepted by ICLR 2026

2510.07591 2026-04-21 cs.CL

Creating ConLangs to Probe the Metalinguistic Grammatical Knowledge of LLMs

Chihiro Taguchi, Richard Sproat

Comments 53 pages, 18 tables, 3 figures. Accepted at ACL 2026

2510.07143 2026-04-21 cs.CV

Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods

Chenfei Liao, Wensong Wang, Zichen Wen, Xu Zheng, Yiyu Wang, Haocong He, Yuanhuiyi Lyu, Lutao Jiang, Xin Zou, Yuqian Fu, Bin Ren, Linfeng Zhang, Xuming Hu

Comments Accepted by ACL 2026 Main. Code: https://github.com/Chenfei-Liao/VTC-Bench; Project Page: https://chenfei-liao.github.io/VTC-Bench-Page/

2510.06700 2026-04-21 cs.CL

How Language Models Conflate Logical Validity with Plausibility: A Representational Analysis of Content Effects

Leonardo Bertolazzi, Sandro Pezzelle, Raffaella Bernardi

Comments ACL 2026 Findings

2510.05643 2026-04-21 cs.CV

Combined Hyperbolic and Euclidean Soft Triple Loss Beyond the Single Space Deep Metric Learning

Shozo Saeki, Minoru Kawahara, Hirohisa Aman

Comments 12 pages, 3 figures

2510.02034 2026-04-21 cs.CV

SemMorph3D: Unsupervised Semantic-Aware 3D Morphing via Mesh-Guided Gaussians

Mengtian Li, Yunshu Bai, Yimin Chu, Xinru Guo, Haolin Liu, Zhifeng Xie, Chaofeng Chen

Comments Project page: https://baiyunshu.github.io/GAUSSIANMORPHING.github.io/

2510.02025 2026-04-21 cs.CL

Style over Story: Measuring LLM Narrative Preferences via Structured Selection

Donghoon Jung, Jiwoo Choi, Songeun Chae, Seohyon Jung

Comments Accepted to ACL 2026 (Findings), camera-ready version

2510.00861 2026-04-21 cs.CL cs.AI cs.IR

Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

Ziliang Wang, Kang An, Xuhui Zheng, Faqiang Qian, Weikun Zhang, Cijun Ouyang, Jialu Cai, Yuhang Wang, Yichao Wu

Comments 10 pages, 5 figures

2510.00546 2026-04-21 cs.CL

ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding

Sangjun Song, Minjae Oh, Seungkyu Lee, Sungmin Jo, Yohan Jo

2509.26010 2026-04-21 cs.CV

New Fourth-Order Grayscale Indicator-Based Telegraph Diffusion Model for Image Despeckling

Rajendra K. Ray, Manish Kumar

2509.23808 2026-04-21 cs.LG cs.CL

Semantic-Space Exploration and Exploitation in RLVR for LLM Reasoning

Fanding Huang, Guanbo Huang, Xiao Fan, Yi He, Xiao Liang, Xiao Chen, Qinting Jiang, Faisal Nadeem Khan, Jingyan Jiang, Zhi Wang

Comments Accepted as an ACL 2026 Findings paper

2509.23724 2026-04-21 cs.CV cs.AI

Video Panels for Long Video Understanding

Lars Doorenbos, Federico Spurio, Juergen Gall

Comments CVPR 2026

2509.21042 2026-04-21 cs.CL cs.LG

LayerNorm Induces Recency Bias in Transformer Decoders

Junu Kim, Xiao Liu, Zhenghao Lin, Lei Ji, Yeyun Gong, Edward Choi

Comments Code available at: https://github.com/starmpcc/layernorm_recency_bias

2509.18964 2026-04-21 cs.LG math.OC stat.ML

Central Limit Theorems for Asynchronous Averaged Q-Learning

Xingtu Liu

2509.18169 2026-04-21 cs.LG cs.CE cs.CL

PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning

Hengbo Xiao, Jingyuan Fan, Xin Tong, Jingzhao Zhang, Chao Lu, Guannan He

2509.15336 2026-04-21 cs.AI

Knowledge-Driven Hallucination in Large Language Models: An Empirical Study on Process Modeling

Humam Kourani, Anton Antonov, Alessandro Berti, Wil M. P. van der Aalst

Comments The Version of Record of this contribution will be published in the proceedings of the 2nd International Workshop on Generative AI for Process Mining (GenAI4PM 2025). This preprint has not undergone peer review or any post-submission improvements or corrections

2509.14804 2026-04-21 cs.SD eess.AS

Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages

Mingchen Shao, Bingshen Mu, Chengyou Wang, Hai Li, Ying Yan, Zhonghua Fu, Lei Xie