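arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.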

2604.15010 2026-04-17 cs.LG cs.AI cs.CL

What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

Éric Jacopin

Comments 24 pages, 3 figures. Under review at COLM 2026. Independent replication of the rhyme-planning finding from Lindsey et al. (2025) on open-weights models; extended to factual recall

2604.15009 2026-04-17 cs.AI cs.LG

Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching

Aihua Li

2604.14991 2026-04-17 cs.AI

Predicting Power-System Dynamic Trajectories with Foundation Models

Haoran Li, Lihao Mai, Chenhan Xiao, Erik Blasch, Yang Weng

Comments 10 pages

2604.14990 2026-04-17 cs.AI

The Possibility of Artificial Intelligence Becoming a Subject and the Alignment Problem

Till Mossakowski, Helena Esther Grass

2604.14987 2026-04-17 cs.AI eess.SP

AI-Enabled Covert Channel Detection in RF Receiver Architectures

Abdelrahman Emad Abdelazim, Alan Rodrigo Diaz-Rizo, Hassan Aboushady, Haralampos-G. Stratigopoulos

2604.14986 2026-04-17 cs.RO

Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios

Yuting Zeng, Zhiwen Zheng, Jingya Wang, You Zhou, JiaLing Xiao, Yongbin Yu, Manping Fan, Bo Gong, Liyong Ren

Comments 24 pages, 14 figures. arXiv admin note: text overlap with arXiv:2509.15582

2604.14974 2026-04-17 cs.LG

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

Jean-Bastien Grill, Michal Valko, Rémi Munos

Comments Published in Neural Information Processing Systems 2016

2604.14970 2026-04-17 cs.CL

Explain the Flag: Contextualizing Hate Speech Beyond Censorship

Jason Liartis, Eirini Kaldeli, Lambrini Gyftokosta, Eleftherios Chelioudakis, Orfeas Menis Mastromichalakis

Comments Accepted in the Findings of ACL 2026

2604.14969 2026-04-17 cs.AI

Discovering Novel LLM Experts via Task-Capability Coevolution

Andrew Dai, Boris Meinardus, Ciaran Regan, Yingtao Tian, Yujin Tang

Comments ICLR 2026

2604.14965 2026-04-17 cs.RO

POMDP-based Object Search with Growing State Space and Hybrid Action Domain

Yongbo Chen, Hesheng Wang, Shoudong Huang, Hanna Kurniawati

Abstract

Efficiently locating target objects in complex indoor environments with diverse furniture, such as shelves, tables, and beds, is a significant challenge for mobile robots. This difficulty arises from factors like localization errors, limited fields of view, and visual occlusion. We address this by framing the object-search task as a high-dimensional Partially Observable Markov Decision Process (POMDP) with a growing state space and hybrid (continuous and discrete) action spaces in 3D environments. Based on a meticulously designed perception module, a novel online POMDP solver named the growing neural process filtered k-center clustering tree (GNPF-kCT) is proposed to tackle this problem. Optimal actions are selected using Monte Carlo Tree Search (MCTS) with belief tree reuse for growing state space, a neural process network to filter useless primitive actions, and k-center clustering hypersphere discretization for efficient refinement of high-dimensional action spaces. A modified upper-confidence bound (UCB), informed by belief differences and action value functions within cells of estimated diameters, guides MCTS expansion. Theoretical analysis validates the convergence and performance potential of our method. To address scenarios with limited information or rewards, we also introduce a guessed target object with a grid-world model as a key strategy to enhance search efficiency. Extensive Gazebo simulations with Fetch and Stretch robots demonstrate faster and more reliable target localization than POMDP-based baselines and state-of-the-art (SOTA) non-POMDP-based solvers, especially large language model (LLM) based methods, in object search under the same computational constraints and perception systems. Real-world tests in office environments confirm the practical applicability of our approach. Project page: https://sites.google.com/view/gnpfkct.
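The abstract mentions k-center clustering as a way to discretize a continuous action space. As a rough illustration only, the sketch below uses the classic greedy farthest-first traversal for the k-center problem; it is not taken from the paper, and the function name `greedy_k_center` and the sample `actions` list are hypothetical.

```python
import math

def greedy_k_center(points, k):
    """Farthest-first traversal: return indices of up to k chosen centers.

    Assumption: this is the textbook greedy heuristic, not the
    GNPF-kCT procedure described in the abstract.
    """
    if not points or k <= 0:
        return []
    centers = [0]  # arbitrary start: the first point
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return centers

# Hypothetical 2D continuous action candidates.
actions = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
chosen = greedy_k_center(actions, 3)
print([actions[i] for i in chosen])
```

Each selected center can then stand in for the cell of nearby actions, so a tree search only has to branch over k representatives instead of the full continuous set.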

2604.14961 2026-04-17 cs.LG cs.AI

Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits

Maksim Pershin, Ivan Golovanov, Pavel Baltabaev, Natalia Trankova

2604.14958 2026-04-17 cs.CV

Frequency-Enhanced Dual-Subspace Networks for Few-Shot Fine-Grained Image Classification

Meijia Wang, Guochao Wang, Haozhen Chu, Bin Yao, Weichuan Zhang, Yuan Wang, Junpo Yang

Abstract

Few-shot fine-grained image classification aims to recognize subcategories with high visual similarity using only a limited number of annotated samples. Existing metric learning-based methods typically rely solely on spatial domain features. Confined to this single perspective, models inevitably suffer from inherent texture biases, entangling essential structural details with high-frequency background noise. Furthermore, lacking cross-view geometric constraints, single-view metrics tend to overfit this noise, resulting in structural instability under few-shot conditions. To address these issues, this paper proposes the Frequency-Enhanced Dual-Subspace Network (FEDSNet). Specifically, FEDSNet utilizes the Discrete Cosine Transform (DCT) and a low-pass filtering mechanism to explicitly isolate low-frequency global structural components from spatial features, thereby suppressing background interference. Truncated Singular Value Decomposition (SVD) is employed to construct independent, low-rank linear subspaces for both spatial texture and frequency structural features. An adaptive gating mechanism is designed to dynamically fuse the projection distances from these dual views. This strategy leverages the structural stability of the frequency subspace to prevent the spatial subspace from overfitting to background features. Extensive experiments on four benchmark datasets - CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft - demonstrate that FEDSNet exhibits excellent classification performance and robustness, achieving highly competitive results compared to existing metric learning algorithms. Complexity analysis further confirms that the proposed network achieves a favorable balance between high accuracy and computational efficiency, providing an effective new paradigm for few-shot fine-grained visual recognition.
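The abstract describes isolating low-frequency structure with the Discrete Cosine Transform and a low-pass filter. The toy sketch below shows that idea on a 1D signal in plain Python, assuming an unnormalized DCT-II and a hard cutoff; it is not FEDSNet's implementation (which operates on spatial feature maps), and the names `dct2`, `idct2`, `low_pass`, and the parameter `keep` are hypothetical.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a 1D sequence."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def idct2(c):
    """Inverse of dct2 (a scaled DCT-III)."""
    n = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi * (t + 0.5) * k / n)
                            for k in range(1, n))) * 2 / n
            for t in range(n)]

def low_pass(x, keep):
    """Zero every DCT coefficient with index >= keep, then invert."""
    coeffs = dct2(x)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct2(filtered)

# Hypothetical signal: a constant level of 2.0 plus a fast alternating pattern.
signal = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
smooth = low_pass(signal, 1)  # keep only the zero-frequency term
print([round(v, 6) for v in smooth])
```

Keeping only the first coefficient returns the signal mean at every position, which is the extreme case of discarding high-frequency detail; larger `keep` values retain progressively finer structure.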

2604.14953 2026-04-17 cs.CV

Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation

Hassan Ali, Doreen Jirak, Luca Müller, Stefan Wermter

Comments Accepted at 2026 International Conference on Automatic Face and Gesture Recognition (FG)

2604.14951 2026-04-17 cs.CV cs.AI cs.CL cs.MM

RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models

Gabriele Mattioli, Evelyn Turri, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Comments ICPR 2026

2604.14944 2026-04-17 cs.RO cs.CV

HRDexDB: A Large-Scale Dataset of Dexterous Human and Robotic Hand Grasps

Jongbin Lim, Taeyun Ha, Mingi Choi, Jisoo Kim, Byungjun Kim, Subin Jeon, Hanbyul Joo

2604.14941 2026-04-17 cs.CL

Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions

Shivank Garg, Sankalp Mittal, Manish Gupta

Comments ICLR 2026 Poster

2604.14933 2026-04-17 cs.CV

Generative Data Augmentation for Skeleton Action Recognition

Xu Dong, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert

Comments Accepted at IEEE FG 2026

2604.14932 2026-04-17 cs.AI

WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training

Yifu Chen, Shengpeng Ji, Qian Chen, Tianle Liang, Yangzhuo Li, Ziqing Wang, Wen Wang, Jingyu Lu, Haoxiao Wang, Xueyi Pu, Fan Zhuo, Zhou Zhao

2604.14930 2026-04-17 cs.CL

IE as Cache: Information Extraction Enhanced Agentic Reasoning

Hang Lv, Sheng Liang, Hongchao Gu, Wei Guo, Defu Lian, Yong Liu, Hao Wang, Enhong Chen

Comments 8 pages, 2 figures

2604.14925 2026-04-17 cs.LG cs.AI

Improving Sparse Autoencoder with Dynamic Attention

Dongsheng Wang, Jinsen Zhang, Dawei Su, Hui Huang

2604.14922 2026-04-17 cs.LG cs.CL

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

Bowen Ping, Zijun Chen, Tingfeng Hui, Qize Yu, Chenxuan Li, Junchi Yan, Baobao Chang

2604.14920 2026-04-17 cs.AI

Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models

Yifu Chen, Shengpeng Ji, Zhengqing Liu, Qian Chen, Wen Wang, Ziqing Wang, Yangzhuo Li, Tianle Liang, Zhou Zhao

2604.14914 2026-04-17 cs.CV

Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

Victoria Yue Chen, Emery Pierson, Léopold Maillard, Maks Ovsjanikov

2604.14908 2026-04-17 cs.LG cs.SY eess.SY stat.ML

Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

Emre Özyıldırım, Barış Yaycı, Umut Eren Akturk, Cem Tekin

2604.14907 2026-04-17 cs.CL cs.LG

Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task

Evaldas Vaiciukynas, Paulius Danenas, Linas Ablonskis, Algirdas Sukys, Edgaras Dambrauskas, Voldemaras Zitkus, Rita Butkiene, Rimantas Butleris

Comments Submitted to Applied Soft Computing (Status: Decision in Process)

2604.14149 2026-04-17 cs.CV

One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding

Zheyu Zhang, Ziqi Pang, Shixing Chen, Xiang Hao, Vimal Bhat, Yu-Xiong Wang

Comments Appears in the proceedings of NeurIPS 2025

2604.14141 2026-04-17 cs.CV

Geometric Context Transformer for Streaming 3D Reconstruction

Lin-Zhuo Chen, Jian Gao, Yihang Chen, Ka Leong Cheng, Yipengjing Sun, Liangxiao Hu, Nan Xue, Xing Zhu, Yujun Shen, Yao Yao, Yinghao Xu

Comments Project page: https://technology.robbyant.com/lingbot-map Code: https://github.com/robbyant/lingbot-map

2604.14137 2026-04-17 cs.CL cs.AI cs.LG

From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs

Itay Itzhak, Eliya Habba, Gabriel Stanovsky, Yonatan Belinkov

Comments Under review. 42 pages, 18 figures. Code and data at https://technion-cs-nlp.github.io/vibe-testing-llms

2604.13315 2026-04-17 cs.CV cs.LG

The Spectrascapes Dataset: Street-view imagery beyond the visible captured using a mobile platform

Akshit Gupta, Joris Timmermans, Filip Biljecki, Remko Uijlenhoet

Comments Submitted, under review

2604.13183 2026-04-17 cs.CV cs.MM

GeoLink: A 3D-Aware Framework Towards Better Generalization in Cross-View Geo-Localization

Hongyang Zhang, Yinhao Liu, Haitao Zhang, Zhongyi Wen, Zhenyu Kuang, Shuxian Liang, Xiansheng Hua