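arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.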

2604.11585 2026-04-14 cs.CV cs.RO

GeomPrompt: Geometric Prompt Learning for RGB-D Semantic Segmentation Under Missing and Degraded Depth

Krishna Jaganathan, Patricio Vela

Comments Accepted to the CVPR 2026 URVIS Workshop. Project page: https://geomprompt.github.io

Abstract

Multimodal perception systems for robotics and embodied AI often assume reliable RGB-D sensing, but in practice, depth is frequently missing, noisy, or corrupted. We thus present GeomPrompt, a lightweight cross-modal adaptation module that synthesizes a task-driven geometric prompt from RGB alone for the fourth channel of a frozen RGB-D semantic segmentation model, without depth supervision. We further introduce GeomPrompt-Recovery, an adaptation module that compensates for degraded depth by predicting a fourth-channel correction tailored to the frozen segmenter. Both modules are trained solely with downstream segmentation supervision, so they recover the geometric prior that is useful for segmentation rather than estimating depth itself. On SUN RGB-D, GeomPrompt improves over RGB-only inference by +6.1 mIoU on DFormer and +3.0 mIoU on GeminiFusion, while remaining competitive with strong monocular depth estimators. For degraded depth, GeomPrompt-Recovery consistently improves robustness, yielding gains of up to +3.6 mIoU under severe depth corruptions. GeomPrompt is also substantially more efficient than monocular depth baselines, reaching 7.8 ms latency versus 38.3 ms and 71.9 ms for those baselines. These results suggest that task-driven geometric prompting is an efficient mechanism for cross-modal compensation under missing and degraded depth inputs in RGB-D perception.
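A toy sketch of the training setup described above, under assumptions that are ours and not the paper's: the "frozen RGB-D segmenter" is a per-pixel logistic regression over (R, G, B, D) with hypothetical fixed weights `FROZEN_W`, the prompt generator is a hypothetical linear map from RGB to the fourth channel, and "RGB-only inference" is approximated by zero-filling the fourth channel. Only the prompt parameters are updated, and the only training signal is the segmentation label; no depth target appears anywhere. This illustrates the idea of task-driven prompting, not the GeomPrompt architecture.

```python
import math
import random

# Hypothetical frozen "RGB-D segmenter": per-pixel logistic regression over (R, G, B, D).
# These weights are fixed for illustration and are never updated below.
FROZEN_W = (0.2, -0.1, 0.1, 2.0)  # leans heavily on the 4th (geometry) channel
FROZEN_B = -1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def segment(rgb, d):
    """Foreground probability from the frozen segmenter for one pixel."""
    z = FROZEN_B + sum(w * x for w, x in zip(FROZEN_W[:3], rgb)) + FROZEN_W[3] * d
    return sigmoid(z)


def prompt(params, rgb):
    """Hypothetical lightweight prompt generator: linear map RGB -> 4th channel."""
    return sum(a * x for a, x in zip(params[:3], rgb)) + params[3]


def bce(p, y, eps=1e-9):
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def mean_loss(data, fourth_channel):
    """Average segmentation loss when the 4th channel is filled by `fourth_channel(rgb)`."""
    return sum(bce(segment(rgb, fourth_channel(rgb)), y) for rgb, y in data) / len(data)


def train_prompt(data, steps=500, lr=0.5):
    """Gradient descent on the prompt parameters only, driven by segmentation labels."""
    params = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0] * 4
        for rgb, y in data:
            p = segment(rgb, prompt(params, rgb))
            # dBCE/dz = p - y, dz/dd = FROZEN_W[3], dd/da_i = rgb_i, dd/dc = 1
            g = (p - y) * FROZEN_W[3]
            for i in range(3):
                grad[i] += g * rgb[i]
            grad[3] += g
        params = [w - lr * gi / len(data) for w, gi in zip(params, grad)]
    return params


# Toy pixels: the label depends on the red channel, which the frozen segmenter underweights.
rng = random.Random(0)
data = []
for _ in range(200):
    rgb = (rng.random(), rng.random(), rng.random())
    data.append((rgb, 1 if rgb[0] > 0.5 else 0))

params = train_prompt(data)
rgb_only = mean_loss(data, lambda rgb: 0.0)  # missing depth approximated by zeros
prompted = mean_loss(data, lambda rgb: prompt(params, rgb))
print(f"RGB-only loss {rgb_only:.3f} -> prompted loss {prompted:.3f}")
```

The learned fourth channel need not look like depth; it only has to be whatever input the frozen segmenter responds to. GeomPrompt-Recovery would differ in that the generator also sees the degraded depth, and one plausible reading of "fourth-channel correction" is an additive correction to that input; that variant is not shown here.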

2604.11579 2026-04-14 cs.CV

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

Seongyu Kim, Seungwoo Lee, Hyeonggon Ryu, Joon Son Chung, Arda Senocak

Comments CVPR 2026. Project page: https://mm.kaist.ac.kr/projects/SeeingThroughTouch/

2604.11576 2026-04-14 cs.CV

Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models

Songlong Xing, Weijie Wang, Zhengyu Zhao, Jindong Gu, Philip Torr, Nicu Sebe

Comments Accepted to CVPR Findings Track 2026

2604.11575 2026-04-14 cs.CL

MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts

Chen Hu, Yintao Tai, Antonio Vergari, Frank Keller, Alessandro Suglia

2604.11572 2026-04-14 cs.RO cs.MM

DA-PTQ: Drift-Aware Post-Training Quantization for Efficient Vision-Language-Action Models

Siyuan Xu, Tianshi Wang, Fengling Li, Lei Zhu, Heng Tao Shen

Comments 13 pages, 6 figures

2604.11565 2026-04-14 cs.CL cond-mat.stat-mech cs.IT math.IT physics.soc-ph

Phonological distances for linguistic typology and the origin of Indo-European languages

Marius Mavridis, Juan De Gregorio, Raul Toral, David Sanchez

Comments 27 pages, 7 figures, 2 appendices

2604.11563 2026-04-14 cs.CL cs.AI cs.LG

Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo

Artem Gadzhiev, Andrew Kislov

2604.11562 2026-04-14 cs.CV

The Impact of Federated Learning on Distributed Remote Sensing Archives

Anand Umashankar, Karam Tomotaki-Dawoud, Nicolai Schneider

Comments This work was completed in 2021. It is posted as a historical record and reference baseline

2604.11560 2026-04-14 cs.LG cs.AI

bacpipe: a Python package to make bioacoustic deep learning models accessible

Vincent S. Kather, Sylvain Haupert, Burooj Ghani, Dan Stowell

2604.11559 2026-04-14 cs.CV physics.med-ph

Progressively Texture-Aware Diffusion for Contrast-Enhanced Sparse-View CT

Tianqi Wang, Wenchao Du, Hongyu Yang

Comments ICASSP2026

2604.11548 2026-04-14 cs.AI

SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering

Ningyan Zhu, Huacan Wang, Jie Zhou, Feiyu Chen, Shuo Zhang, Ge Chen, Chen Liu, Jiarou Wu, Wangyi Chen, Xiaofeng Mou, Yi Xu

2604.11547 2026-04-14 cs.LG cs.CL

Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

Haolin Li, Shuyang Jiang, Ruipeng Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang

Comments Accepted to ACL 2026 as a Findings paper

2604.11544 2026-04-14 cs.CL cs.AI

Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory

Weixian Waylon Li, Jiaxin Zhang, Xianan Jim Yang, Tiejun Ma, Yiwen Guo

2604.11543 2026-04-14 cs.CL cs.AI cs.DL cs.IR

NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment

Wenqing Wu, Yi Zhao, Yuzhuo Wang, Siyou Li, Juexi Shao, Yunfei Long, Chengzhi Zhang

Comments ACL 2026

2604.11540 2026-04-14 cs.AI

A collaborative agent with two lightweight synergistic models for autonomous crystal materials research

Tongyu Shi, Yutang Li, Zhanyuan Li, Qian Liu, Jie Zhou, Wenhe Xu, Yang Li, Dawei Dai, Rui He, Wenhua Zhou, Jiahong Wang, Xue-Feng Yu

2604.11539 2026-04-14 cs.CV cs.AI

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space

Sohwi Lim, Lee Hyoseok, Jungjoon Park, Tae-Hyun Oh

Comments CVPR 2026, Project page: https://sohwi-lim.github.io/CLAY

2604.11524 2026-04-14 cs.AI cs.DS

Limited Perfect Monotonical Surrogates constructed using low-cost recursive linkage discovery with guaranteed output

M. W. Przewozniczek, F. Chicano, R. Tinós, M. M. Komarnicki

2604.11523 2026-04-14 cs.AI cs.MA

PAC-BENCH: Evaluating Multi-Agent Collaboration under Privacy Constraints

Minjun Park, Donghyun Kim, Hyeonjong Ju, Seungwon Lim, Dongwook Choi, Taeyoon Kwon, Minju Kim, Jinyoung Yeo

2604.11522 2026-04-14 cs.CL

Triviality Corrected Endogenous Reward

Xinda Wang, Zhengxu Hou, Yangshijie Zhang, Bingren Yan, Jialin Liu, Chenzhuo Zhao, Zhibo Yang, Bin-Bin Yang, Feng Xiao

2604.11521 2026-04-14 cs.LG cs.CV

Continuous Adversarial Flow Models

Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan

2604.11519 2026-04-14 cs.LG math-ph math.MP

Generative Path-Finding Method for Wasserstein Gradient Flow

Chengyu Liu, Xiang Zhou

Comments Due to the arXiv notice that "The Abstract field cannot be longer than 1,920 characters", the abstract shown here is shortened. For the full abstract, please download the article

2604.11504 2026-04-14 cs.AI math.AP math.AT math.DG

Lectures on AI for Mathematics

Xiaoyang Chen, Xiaoyang Chen

2604.11501 2026-04-14 cs.LG cs.AI cs.CL

Quantization Dominates Rank Reduction for KV-Cache Compression

Samuel Salfati

Comments 16 pages, 3 figures

2604.11498 2026-04-14 cs.CV

TAG-Head: Time-Aligned Graph Head for Plug-and-Play Fine-grained Action Recognition

Imtiaz Ul Hassan, Nik Bessis, Ardhendu Behera

Comments 15 pages, 3 figures, to appear in ICPR 2026

2604.11487 2026-04-14 cs.CV

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya, Artem Filippov, Georgii Bychkov, Sergey Lavrushkin, Mikhail Erofeev, Anastasia Antsiferova, Changsheng Chen, Shunquan Tan, Radu Timofte, Dmitry Vatolin, Chuanbiao Song, Zijian Yu, Hao Tan, Jun Lan, Zhiqiang Yang, Yongwei Tang, Zhiqiang Wu, Jia Wen Seow, Hong Vin Koay, Haodong Ren, Feng Xu, Shuai Chen, Ruiyang Xia, Qi Zhang, Yaowen Xu, Zhaofan Zou, Hao Sun, Dagong Lu, Mufeng Yao, Xinlei Xu, Fei Wu, Fengjun Guo, Cong Luo, Hardik Sharma, Aashish Negi, Prateek Shaily, Jayant Kumar, Sachin Chaudhary, Akshay Dudhane, Praful Hambarde, Amit Shukla, Zhilin Tu, Fengpeng Li, Jiamin Zhang, Jianwei Fei, Kemou Li, Haiwei Wu, Bilel Benjdira, Anas M. Ali, Wadii Boulila, Chenfan Qu, Junchi Li

Comments CVPR 2026 NTIRE Workshop Paper, Robust AI-Generated Image Detection Technical Report

2604.11484 2026-04-14 cs.CV

PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery

Weidong Tang, Bohan Zhang, Zhixiang Chi, ZiZhang Wu, Yang Wang, Yanan Wu

Comments 16 pages, 6 figures, 7 tables, 1 algorithm

Abstract

On-the-Fly Category Discovery (OCD) requires a model, trained on an offline support set, to recognize known classes while discovering new ones from an online streaming sequence. Existing methods focus heavily on offline training. They aim to learn discriminative representations on the support set so that novel classes can be separated at test time. However, their discovery mechanism at inference is typically reduced to a single threshold. We argue that this paradigm is fundamentally flawed, because OCD is not a static classification problem but a dynamic process. The model must continuously decide whether a sample 1) belongs to a known class, 2) matches an existing novel category, or 3) should initiate a new one. Moreover, prior methods treat the support set as fixed knowledge. They do not update their decision boundaries as new evidence arrives during inference. This leads to unstable and inconsistent category formation. Our experiments confirm these issues: with properly calibrated and adaptive thresholds, substantial improvements can be achieved even without changing the representation. Motivated by this, we propose PACO, a support-set-calibrated, tree-structured online decision framework. The framework models inference as a sequence of hierarchical decisions, including known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. Furthermore, during offline training we simulate a proxy discovery process to initialize the thresholds so that they align with inference. Thresholds are continuously updated during inference using mature novel prototypes. Importantly, PACO requires no heavy training and no dataset-specific tuning. It can be directly integrated into existing OCD pipelines as an inference-time module. Extensive experiments show significant improvements over SOTA baselines across seven benchmarks.
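The three-way decision described above (known class, attach to an existing novel category, or create a new one) can be sketched as a small online loop over a prototype memory. Everything here is an assumption for illustration: cosine similarity on plain vectors, the class name `OnlineDiscoverer`, the thresholds `tau_known` and `tau_novel`, and especially the threshold-update rule, which is a hypothetical exponential moving average over "mature" prototypes (those with at least `mature_count` members). Support-set calibration of the initial thresholds and birth-aware assignment are not modeled, so this is not PACO itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class OnlineDiscoverer:
    """Hypothetical hierarchical online decision: known / attach / create."""

    def __init__(self, known_protos, tau_known, tau_novel,
                 mature_count=3, momentum=0.9, margin=0.05):
        assert mature_count >= 2, "need at least one attach to measure similarity"
        self.known = known_protos         # {label: prototype} from the offline support set
        self.tau_known = tau_known        # known-class routing threshold
        self.tau_novel = tau_novel        # attach-vs-create threshold, adapted online
        self.mature_count = mature_count  # members before a novel prototype is "mature"
        self.momentum = momentum
        self.margin = margin
        self.novel = []                   # entries: [sum_vector, member_count, attach_sim_sum]

    def centroid(self, idx):
        s, n, _ = self.novel[idx]
        return [x / n for x in s]

    def assign(self, x):
        # Step 1: known-class routing against the support-set prototypes.
        label, known_sim = max(((k, cosine(x, p)) for k, p in self.known.items()),
                               key=lambda t: t[1])
        if known_sim >= self.tau_known:
            return ("known", label)
        # Step 2: attach to the most similar novel prototype if it clears tau_novel.
        if self.novel:
            idx, novel_sim = max(((i, cosine(x, self.centroid(i)))
                                  for i in range(len(self.novel))), key=lambda t: t[1])
            if novel_sim >= self.tau_novel:
                s, n, sim_sum = self.novel[idx]
                self.novel[idx] = [[a + b for a, b in zip(s, x)], n + 1, sim_sum + novel_sim]
                self._update_threshold()
                return ("novel", idx)
        # Step 3: otherwise create a new novel category seeded by this sample.
        self.novel.append([list(x), 1, 0.0])
        return ("novel", len(self.novel) - 1)

    def _update_threshold(self):
        # Hypothetical rule: move tau_novel toward (mean attach similarity of
        # mature prototypes - margin) with an exponential moving average.
        mature = [e for e in self.novel if e[1] >= self.mature_count]
        if not mature:
            return
        avg_sim = sum(e[2] / (e[1] - 1) for e in mature) / len(mature)
        target = avg_sim - self.margin
        self.tau_novel = self.momentum * self.tau_novel + (1 - self.momentum) * target


d = OnlineDiscoverer({"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]},
                     tau_known=0.8, tau_novel=0.8)
stream = [[0.95, 0.05, 0.0], [0.0, 0.0, 1.0], [0.05, 0.1, 1.0], [0.7, 0.0, 0.7]]
decisions = [d.assign(x) for x in stream]
print(decisions)
```

The contrast with a single fixed threshold is that `tau_novel` here shifts once novel clusters accumulate enough members, so later attach-versus-create decisions use evidence gathered during inference.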

2604.11483 2026-04-14 cs.LG q-bio.QM

CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation

Yanting Li, Zhuoyang Jiang, Enyan Dai, Lei Wang, Wen-Cai Ye, Li Liu

2604.11477 2026-04-14 cs.AI cs.SE q-fin.TR

OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems

Kun Liu, Liqun Chen

Comments 13 pages, 3 figures

2604.11473 2026-04-14 cs.LG

Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification

Jiajun Zhou, Yadong Li, Xuanze Chen, Chen Ma, Chuang Zhao, Shanqing Yu, Qi Xuan

2604.11470 2026-04-14 cs.CV

Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution

Yang Ji, Zonghao Chen, Zhihao Xue, Junqin Hu