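arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translation, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.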

2511.19996 2026-04-15 cs.LG

RankOOD -- Class Ranking-based Out-of-Distribution Detection

Dishanika Denipitiyage, Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla

Details

English abstract

We propose RankOOD, a rank-based Out-of-Distribution (OOD) detection approach based on training a model with the Plackett-Luce loss, which is now extensively used for preference alignment tasks in foundational models. Our approach is based on the insight that with a deep learning model trained using the Cross Entropy Loss, in-distribution (ID) class prediction induces a ranking pattern for each ID class prediction. The RankOOD framework formalizes the insight by first extracting a rank list for each class using an initial classifier and then performing another round of training with the Plackett-Luce loss, where the class rank, a fixed permutation for each class, is the predicted variable. An OOD example may get assigned with high probability to an ID example, but the probability of it respecting the ranking classification is likely to be small. RankOOD achieves SOTA performance on the near-OOD TinyImageNet evaluation benchmark, reducing FPR95 by 4.3%.
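
As a rough illustration of the Plackett-Luce model the abstract refers to, the sketch below computes the log-probability of a full class ranking given a vector of logits. It is not the authors' implementation: the function name `plackett_luce_log_prob`, the example logits, and the example ranking are hypothetical, and how RankOOD extracts its per-class rank lists or turns this quantity into an OOD score is not specified here.

```python
import math


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (best class first) under a Plackett-Luce
    model whose per-class logits are `scores`.

    At each position, the chosen class competes via a softmax against all
    classes that have not been placed yet.
    """
    log_prob = 0.0
    for i in range(len(ranking)):
        remaining = [scores[j] for j in ranking[i:]]
        # Numerically stable log-sum-exp over the classes still unplaced.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        log_prob += scores[ranking[i]] - log_norm
    return log_prob


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]   # hypothetical model outputs for 3 classes
    class_ranking = [0, 1, 2]   # hypothetical fixed rank list for class 0
    print(plackett_luce_log_prob(logits, class_ranking))
```

Negating this value gives a loss that could be minimized during the second round of training. Under the assumption above, an input whose logits disagree with the rank list of its predicted class would receive a low log-probability, which matches the intuition in the abstract.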

2511.19820 2026-04-15 cs.CV cs.AI cs.CL cs.LG

CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception

Miguel Carvalho, Helder Dias, Bruno Martins

Comments Accepted to the GRAIL-V Workshop at CVPR 2026

2511.17714 2026-04-15 cs.AI cs.GT

Learning the Value of Value Learning

Alex John London, Aydin Mohseni

Comments 19 pages, 6 figures, mathematical appendix

2511.11973 2026-04-15 cs.LG

Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression

Xinming Gao, Shangzhe Li, Yujin Cai, Wenwu Yu

Comments Accepted by TMLR 2026; Code available at: https://github.com/yunqianevergarden/Quantile-Q-Learning

2511.08439 2026-04-15 cs.AI

Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance

Alireza Abbaspour, Tejaskumar Balgonda Patil, B Ravi Kiran, Russel Mohr, Senthil Yogamani

2511.05477 2026-04-15 cs.CV

GroupKAN: Efficient Kolmogorov-Arnold Networks via Grouped Spline Modeling

Guojie Li, Tianyi Liu, Anwar P. P. Abdul Majeed, Muhammad Ateeq, Anh Nguyen, Fan Zhang

Details

English abstract

Medical image segmentation demands models that achieve high accuracy while maintaining computational efficiency and clinical interpretability. While recent Kolmogorov-Arnold Networks (KANs) offer powerful adaptive non-linearities, their full-channel spline transformations incur a quadratic parameter growth of $\mathcal{O}(C^{2}(G+k))$ with respect to the channel dimension $C$, where $G$ and $k$ denote the number of grid intervals and spline polynomial order, respectively. Moreover, unconstrained spline mappings lack structural constraints, leading to excessive functional freedom, which may cause overfitting under limited medical annotations. To address these challenges, we propose GroupKAN (Grouped Kolmogorov-Arnold Networks), an efficient architecture driven by group-structured spline modeling. Specifically, we introduce: (1) Grouped KAN Transform (GKT), which restricts spline interactions to intra-group channel mappings across $g$ groups, effectively reducing the spline-induced quadratic expansion to $\mathcal{O}(C^{2}(\frac{G+k}{g} + 1))$, thereby significantly lowering the effective quadratic coefficient; and (2) Grouped KAN Activation (GKA), which applies shared spline functions within each group to enable efficient token-wise non-linearities. By imposing structured constraints on channel interactions, GroupKAN achieves a substantial reduction in parameter redundancy without sacrificing expressive capacity. Extensive evaluations on three medical benchmarks (BUSI, GlaS, and CVC) demonstrate that GroupKAN achieves an average IoU of 79.80%, outperforming the strong U-KAN baseline by +1.11% while requiring only 47.6% of the parameters (3.02M vs. 6.35M). Qualitative results further reveal that GroupKAN produces sharply localized activation maps that better align with the ground truth than MLPs and KANs, significantly enhancing clinical interpretability.
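
To make the two growth terms in the abstract concrete, the sketch below evaluates $C^{2}(G+k)$ and $C^{2}(\frac{G+k}{g}+1)$ for one setting. The function names and the values of `C`, `G`, `k`, and `g` are hypothetical, and the expressions are only the leading terms from the big-O statements, so the results are not exact parameter counts for GroupKAN or U-KAN.

```python
def full_kan_terms(C, G, k):
    """Leading term C^2 (G + k) for full-channel spline transforms."""
    return C ** 2 * (G + k)


def grouped_kan_terms(C, G, k, g):
    """Leading term C^2 ((G + k) / g + 1) when splines are limited to g groups."""
    return C ** 2 * ((G + k) / g + 1)


if __name__ == "__main__":
    # Hypothetical setting: 64 channels, 5 grid intervals, cubic splines, 4 groups.
    C, G, k, g = 64, 5, 3, 4
    full = full_kan_terms(C, G, k)
    grouped = grouped_kan_terms(C, G, k, g)
    print(full, grouped, grouped / full)
```

For this assumed setting the grouped term is 12288 against 32768 for the full-channel term, a ratio of 0.375. The ratio depends only on $G$, $k$, and $g$, and the additive $+1$ means it approaches $1/(G+k)$ rather than zero as $g$ grows.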

2511.00710 2026-04-15 cs.AI

Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

Minghe Shen, Zhuo Zhi, Chonghan Liu, Shuo Xing, Zhengzhong Tu, Che Liu

2510.25512 2026-04-15 cs.LG cs.AI cs.CV

FaCT: Faithful Concept Traces for Explaining Neural Network Decisions

Amin Parchami-Araghi, Sukrut Rao, Jonas Fischer, Bernt Schiele

Comments 35 pages, 23 figures, 2 tables, Neural Information Processing Systems (NeurIPS) 2025; Code is available at https://github.com/m-parchami/FaCT

2510.24168 2026-04-15 cs.AI

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

Weihua Cheng, Junming Liu, Yifei Sun, Botian Shi, Yirong Chen, Ding Wang

Comments Submitted to ACM MM 2026

2510.23026 2026-04-15 cs.AI cs.RO

Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution

Crimson Stambaugh, Rajesh P. N. Rao

Comments European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN, 2026)

2510.20635 2026-04-15 cs.CL cs.AI

Why Did Apple Fall: Evaluating Curiosity in Large Language Models

Haoyu Wang, Sihang Jiang, Yuyan Chen, Xiaojun Meng, Jiansheng Wei, Yitong Wang, Yanghua Xiao

Comments ACL 2026 findings paper

2510.20093 2026-04-15 cs.CV cs.AI

StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback

Jiho Park, Sieun Choi, Jaeyoon Seo, Jihie Kim

Comments Under review at IEEE Access. Author-submitted preprint. Not the IEEE-published version

2510.19644 2026-04-15 cs.CL

CoRoVA: Compressed Representations for Vector-Augmented Code Completion

Daria Cherniuk, Nikita Sukhorukov, Danil Gusak, Nikita Sushko, Danil Sivtsov, Elena Tutubalina, Evgeny Frolov

2510.17902 2026-04-15 cs.AI

Activation Manifold Projection: Liberating Task-Specific Behaviors from LLM Architectures

Al Kari

2510.15552 2026-04-15 cs.CL cs.AI

Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation

Jinliang Liu, Jiale Bai, Shaoning Zeng

2510.09087 2026-04-15 cs.AI

The Stackelberg Speaker: Optimizing Persuasive Communication in Social Deduction Games

Zhang Zheng, Deheng Ye, Peilin Zhao, Hao Wang

Comments Accepted by ACL 2026

2510.07285 2026-04-15 cs.LG cs.AI

GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection

Tianxiang Xu, Zhichao Wen, Xinyu Zhao, Qi Hu, Yan Li, Chang Liu

2510.04705 2026-04-15 cs.CV

Label-Efficient Cross-Modality Generalization for Liver Segmentation in Multi-Phase MRI

Quang-Khai Bui-Tran, Minh-Toan Dinh, Thanh-Huy Nguyen, Ba-Thinh Lam, Mai-Anh Vu, Ulas Bagci

Comments Accepted at MICCAI 2025 Workshop

2510.03174 2026-04-15 cs.CL cs.AI

LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability

Xuan Xu, Zhongliang Yang, Haolun Li, Beilin Chu, Rui Tian, Yu Li, Shaolin Tan, Linna Zhou

2510.00919 2026-04-15 cs.CL cs.AI

Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

Shunfeng Zheng, Yudi Zhang, Meng Fang, Zihan Zhang, Zhitan Wu, Mykola Pechenizkiy, Ling Chen

Comments Accepted to EMNLP 2025 (Findings)

2509.25749 2026-04-15 cs.CV cs.AI

ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On

Junseo Park, Hyeryung Jang

Comments 21 pages

2509.22220 2026-04-15 cs.CL cs.SD

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Yuhan Song, Linhao Zhang, Chuhan Wu, Aiwei Liu, Wei Jia, Houfeng Wang, Xiao Zhou

Comments Accepted to ICLR 2026

2509.19742 2026-04-15 cs.CL cs.AI cs.IR

HiCoLoRA: Addressing Context-Prompt Misalignment via Hierarchical Collaborative LoRA for Zero-Shot DST

Shuyu Zhang, Yifan Wei, Xinru Wang, Yanmin Zhu, Yangfan He, Yixuan Weng, Bin Li, Yujie Liu

Comments Accepted to ACL 2026 Findings

2509.19695 2026-04-15 cs.CL cs.AI cs.IR

DyBBT: Dynamic Balance via Bandit-inspired Targeting for Dialog Policy with Cognitive Dual-Systems

Shuyu Zhang, Yifan Wei, Jialuo Yuan, Xinru Wang, Yanmin Zhu, Bin Li, Yujie Liu

Comments Accepted to ACL 2026 (main conference)

2509.18367 2026-04-15 cs.LG cs.AI cs.DC

Multi-Worker Selection based Distributed Swarm Learning for Edge IoT with Non-i.i.d. Data

Zhuoyu Yao, Yue Wang, Songyang Zhang, Yingshu Li, Zhipeng Cai, Zhi Tian

2509.18127 2026-04-15 cs.LG cs.AI cs.CL

Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework

Jiaqi Weng, Han Zheng, Hanyu Zhang, Ej Zhou, Qinqin He, Jialing Tao, Hui Xue, Zhixuan Chu, Xiting Wang

2509.17995 2026-04-15 cs.CL cs.AI cs.LG

Variation in Verification: Understanding Verification Dynamics in Large Language Models

Yefan Zhou, Austin Xu, Yilun Zhou, Janvijay Singh, Jiang Gui, Shafiq Joty

Comments ICLR 2026

2509.16615 2026-04-15 cs.RO

LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning

Jelle Luijkx, Runyu Ma, Zlatan Ajanović, Jens Kober

Comments 8 pages, 7 figures, ICRA 2026

2509.15406 2026-04-15 cs.CV

Causal Fingerprints of AI Generative Models

Hui Xu, Chi Liu, Congcong Zhu, Minghao Wang, Youyang Qu, Longxiang Gao

Comments 5 pages, accepted for presentation at IEEE ICASSP 2026

2509.10026 2026-04-15 cs.CV

LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA

Jing Huang, Zhiya Tan, Shutao Gong, Fanwei Zeng, Joey Tianyi Zhou, Changtao Miao, Huazhe Tan, Weibin Yao, Jianshu Li

Comments Accepted by WWW 2026 Industry Track - Oral

Details

English abstract

As large vision language models (VLMs) advance, their capabilities in multilingual visual question answering (mVQA) have significantly improved. Chain-of-thought (CoT) reasoning has been proven to enhance interpretability and complex reasoning. However, most existing approaches rely primarily on textual CoT and provide limited support for multilingual multimodal reasoning, constraining their deployment in real-world applications. To address this gap, we introduce LaV-CoT, the first Language-aware Visual CoT framework with Multi-Aspect Reward Optimization. LaV-CoT incorporates an interpretable multi-stage reasoning pipeline consisting of Text Summary with Bounding Box (BBox), Language Identification, Spatial Object-level Captioning, and Step-by-step Logical Reasoning. Following this reasoning pipeline, we design an automated data curation method that generates multilingual CoT annotations through iterative generation, correction, and refinement, enabling scalable and high-quality training data. To improve reasoning and generalization, LaV-CoT adopts a two-stage training paradigm combining Supervised Fine-Tuning (SFT) with Language-aware Group Relative Policy Optimization (GRPO), guided by verifiable multi-aspect rewards including language consistency, structural accuracy, and semantic alignment. Extensive evaluations on public datasets including MMMB, Multilingual MMBench, and MTVQA show that LaV-CoT achieves up to ~9.5% accuracy improvements over open-source baselines of similar size and even surpasses models with 2$\times$ larger scales by ~2.6%. Moreover, LaV-CoT outperforms advanced proprietary models such as GPT-4o-0513 and Gemini-2.5-flash. We further conducted an online A/B test to validate our method on real-world data, highlighting its effectiveness for industrial deployment. Our code is available at this link: https://github.com/HJNVR/LaV-CoT
