arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.07436 2026-04-15 cs.CV

RPG-SAM: Reliability-Weighted Prototypes and Geometric Adaptive Threshold Selection for Training-Free One-Shot Polyp Segmentation

Weikun Lin, Yunhao Bai, Yan Wang

Comments 8 pages, 3 figures

详情

英文摘要

Training-free one-shot segmentation offers a scalable alternative to expert annotations where knowledge is often transferred from support images and foundation models. But existing methods often treat all pixels in support images and query response intensities models in a homogeneous way. They ignore the regional heterogeity in support images and response heterogeity in query.To resolve this, we propose RPG-SAM, a framework that systematically tackles these heterogeneity gaps. Specifically, to address regional heterogeneity, we introduce Reliability-Weighted Prototype Mining (RWPM) to prioritize high-fidelity support features while utilizing background anchors as contrastive references for noise suppression. To address response heterogeneity, we develop Geometric Adaptive Selection (GAS) to dynamically recalibrate binarization thresholds by evaluating the morphological consensus of candidates. Finally, an iterative refinement loop method is designed to polishes anatomical boundaries. By accounting for multi-layered information heterogeneity, RPG-SAM achieves a 5.56\% mIoU improvement on the Kvasir dataset. Code will be released.

URL PDF HTML ☆

赞 0 踩 0

2603.05295 2026-04-15 cs.AI cs.CV

WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

Sicheng Fan, Rui Wan, Yifei Leng, Gaoning Liang, Li Ling, Yanyi Shang, Dehan Kong

2603.05044 2026-04-15 cs.AI

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

Sicheng Fan, Qingyun Shi, Shengze Xu, Shengbo Cai, Tieyong Zeng, Li Ling, Yanyi Shang, Dehan Kong

2603.03249 2026-04-15 cs.CL

Using Learning Progressions to Guide AI Feedback for Science Learning

Xin Xia, Nejla Yuruk, Yun Wang, Xiaoming Zhai

Comments 15pages, 4 figures

2603.02104 2026-04-15 cs.RO

ACDC: Adaptive Curriculum Planning with Dynamic Contrastive Control for Goal-Conditioned Reinforcement Learning in Robotic Manipulation

Xuerui Wang, Guangyu Ren, Tianhong Dai, Bintao Hu, Shuangyao Huang, Wenzhang Zhang, Hengyan Liu

Comments 13 pages (including references and appendix), 12 figures. Accepted to ICAPS 2026. Code available at https://github.com/Xuerui-Wang-oss/Adaptive-Curriculum-Learning-and-Dynamic-Contrastive-Control

2603.01591 2026-04-15 cs.LG cs.AI cs.CV

FAST-DIPS: Adjoint-Free Analytic Steps and Hard-Constrained Likelihood Correction for Diffusion-Prior Inverse Problems

Minwoo Kim, Seunghyeok Shin, Hongki Lim

2603.00137 2026-04-15 cs.LG cs.AI

MAML-KT: Addressing Cold Start Problem in Knowledge Tracing for New Students via Few-Shot Model-Agnostic Meta Learning

Indronil Bhattacharjee, Christabel Wayllace

2602.18109 2026-04-15 cs.LG cs.OS cs.SY eess.SY

TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs

Rong Fu, Yibo Meng, Guangzhen Yao, Jiaxuan Lu, Zeyu Zhang, Zhaolu Kang, Ziming Guo, Jia Yee Tan, Xiaojing Du, Simon James Fong

Comments 43 pages, 12 figures

2602.17276 2026-04-15 cs.LG math.CO

RLGT: A reinforcement learning framework for extremal graph theory

Ivan Damnjanović, Uroš Milivojević, Irena Đorđević, Dragan Stevanović

2602.15355 2026-04-15 cs.CV

DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles

Rong Fu, Jiekai Wu, Haiyun Wei, Yee Tan Jia, Yang Li, Xiaowen Ma, Wangyu Wu, Simon Fong

Comments 16 pages, 7 figures

2602.11236 2026-04-15 cs.CV cs.CL cs.RO

ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

Yandan Yang, Shuang Zeng, Tong Lin, Xinyuan Chang, Dekang Qi, Junjin Xiao, Haoyun Liu, Ronghan Chen, Yuzhi Chen, Dongjie Huo, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu

Comments Project website: https://amap-cvlab.github.io/ABot-Manipulation/ . Code: https://github.com/amap-cvlab/ABot-Manipulation . 22 pages, 10 figures, 10 tables

详情

英文摘要

Building general-purpose embodied agents across diverse hardware remains a central challenge in robotics, often framed as the ''one-brain, many-forms'' paradigm. Progress is hindered by fragmented data, inconsistent representations, and misaligned training objectives. We present ABot-M0, a framework that builds a systematic data curation pipeline while jointly optimizing model architecture and training strategies, enabling end-to-end transformation of heterogeneous raw data into unified, efficient representations. From six public datasets, we clean, standardize, and balance samples to construct UniACT-dataset, a large-scale dataset with over 6 million trajectories and 9,500 hours of data, covering diverse robot morphologies and task scenarios. Unified pre-training improves knowledge transfer and generalization across platforms and tasks, supporting general-purpose embodied intelligence. To improve action prediction efficiency and stability, we propose the Action Manifold Hypothesis: effective robot actions lie not in the full high-dimensional space but on a low-dimensional, smooth manifold governed by physical laws and task constraints. Based on this, we introduce Action Manifold Learning (AML), which uses a DiT backbone to predict clean, continuous action sequences directly. This shifts learning from denoising to projection onto feasible manifolds, improving decoding speed and policy stability. ABot-M0 supports modular perception via a dual-stream mechanism that integrates VLM semantics with geometric priors and multi-view inputs from plug-and-play 3D modules such as VGGT and Qwen-Image-Edit, enhancing spatial understanding without modifying the backbone and mitigating standard VLM limitations in 3D reasoning. Experiments show components operate independently with additive benefits. We will release all code and pipelines for reproducibility and future research.

URL PDF HTML ☆

赞 0 踩 0

2602.10137 2026-04-15 cs.CV cs.AI

Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation

Leo Thomas Ramos, Angel D. Sappa

Comments This is an extended version of the study presented at IEEE SoutheastCon2025. It presents substantial new content and original contributions beyond the previous version, including an expanded and enhanced background, new architectural refinements, additional experiments conducted on a broader range of datasets and experimental scenarios, and a more comprehensive analysis of results

详情

DOI: 10.1016/j.neucom.2026.133533
Journal ref: Neurocomputing, vol. 685, pages 133533, 2026

英文摘要

This work proposes MeCSAFNet, a multi-branch encoder-decoder architecture for land cover segmentation in multispectral imagery. The model separately processes visible and non-visible channels through dual ConvNeXt encoders, followed by individual decoders that reconstruct spatial information. A dedicated fusion decoder integrates intermediate features at multiple scales, combining fine spatial cues with high-level spectral representations. The feature fusion is further enhanced with CBAM attention, and the ASAU activation function contributes to stable and efficient optimization. The model is designed to process different spectral configurations, including a 4-channel (4c) input combining RGB and NIR bands, as well as a 6-channel (6c) input incorporating NDVI and NDWI indices. Experiments on the Five-Billion-Pixels (FBP) and Potsdam datasets demonstrate significant performance gains. On FBP, MeCSAFNet-base (6c) surpasses U-Net (4c) by +19.21%, U-Net (6c) by +14.72%, SegFormer (4c) by +19.62%, and SegFormer (6c) by +14.74% in mIoU. On Potsdam, MeCSAFNet-large (4c) improves over DeepLabV3+ (4c) by +6.48%, DeepLabV3+ (6c) by +5.85%, SegFormer (4c) by +9.11%, and SegFormer (6c) by +4.80% in mIoU. The model also achieves consistent gains over several recent state-of-the-art approaches. Moreover, compact variants of MeCSAFNet deliver notable performance with lower training time and reduced inference cost, supporting their deployment in resource-constrained environments.

URL PDF HTML ☆

赞 0 踩 0

2602.08590 2026-04-15 cs.LG cs.DB

SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning

Yicheng Di, Wei Yuan, Tieke He, Yuan Liu, Hongzhi Yin

Comments The article contains content that requires significant revision, therefore it is being retracted

2602.05971 2026-04-15 cs.CL cs.LG q-bio.NC

Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space

Felipe D. Toro-Hernández, Jesuino Vieira Filho, Rodrigo M. Cabral-Carvalho

Comments 10 pages, 6 figures (excluding refs/appendix). Accepted to ICLR 2026

详情

Journal ref: International Conference on Learning Representations (ICLR) 2026

英文摘要

Semantic representations can be framed as a structured, dynamic knowledge space through which humans navigate to retrieve and manipulate meaning. To investigate how humans traverse this geometry, we introduce a framework that represents concept production as navigation through embedding space. Using different transformer text embedding models, we construct participant-specific semantic trajectories based on cumulative embeddings and extract geometric and dynamical metrics, including distance to next, distance to centroid, entropy, velocity, and acceleration. These measures capture both scalar and directional aspects of semantic navigation, providing a computationally grounded view of semantic representation search as movement in a geometric space. We evaluate the framework on four datasets across different languages, spanning different property generation tasks: Neurodegenerative, Swear verbal fluency, Property listing task in Italian, and in German. Across these contexts, our approach distinguishes between clinical groups and concept types, offering a mathematical framework that requires minimal human intervention compared to typical labor-intensive linguistic pre-processing methods. Comparison with a non-cumulative approach reveals that cumulative embeddings work best for longer trajectories, whereas shorter ones may provide too little context, favoring the non-cumulative alternative. Critically, different embedding models yielded similar results, highlighting similarities between different learned representations despite different training pipelines. By framing semantic navigation as a structured trajectory through embedding space, bridging cognitive modeling with learned representation, thereby establishing a pipeline for quantifying semantic representation dynamics with applications in clinical research, cross-linguistic analysis, and the assessment of artificial cognition.

URL PDF HTML ☆

赞 0 踩 0

2602.04204 2026-04-15 cs.CV cs.LG

AGMA: Adaptive Gaussian Mixture Anchors for Prior-Guided Multimodal Human Trajectory Forecasting

Chao Li, Rui Zhang, Siyuan Huang, Xian Zhong, Hongbo Jiang

Comments Withdrawn for substantial revision and will be re-uploaded as a new manuscript

2602.03901 2026-04-15 cs.LG cs.NE

NeuroPareto: Calibrated Acquisition for Costly Many-Goal Search in Vast Parameter Spaces

Rong Fu, Chunlei Meng, Youjin Wang, Haoyu Zhao, Jiaxuan Lu, Kun Liu, JiaBao Dou, Simon James Fong

Comments 39 pages, 19 figures

2601.19917 2026-04-15 cs.CL

PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models

Haoyu Zheng, Yun Zhu, Yuqian Yuan, Bo Yuan, Wenqiao Zhang, Siliang Tang, Jun Xiao

2601.18081 2026-04-15 cs.LG

DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal

Peixuan Han, Yingjie Yu, Jingjun Xu, Jiaxuan You

2601.14004 2026-04-15 cs.CL

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

Hengyuan Zhang, Zhihao Zhang, Mingyang Wang, Zunhai Su, Yiwei Wang, Qianli Wang, Shuzhou Yuan, Ercong Nie, Xufeng Duan, Feijiang Han, Qibo Xue, Zeping Yu, Chenming Shang, Xiao Liang, Jing Xiong, Hui Shen, Chaofan Tao, Zhengwu Liu, Senjie Jin, Zhiheng Xi, Dongdong Zhang, Sophia Ananiadou, Tao Gui, Ruobing Xie, Hayden Kwok-Hay So, Hinrich Schütze, Xuanjing Huang, Qi Zhang, Ngai Wong

2601.08209 2026-04-15 cs.CL

Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models

Rongji Li, Jian Xu, Yi Chen, Xueqing Chen, Yisheng Yang, Jiayi Wang, Xingyu Chen, Chunyu Xie, Dawei Leng, Xu-Yao Zhang

详情

英文摘要

In domains such as materials science, biomedicine, and finance, high-stakes deployment of large language models (LLMs) requires injecting private, domain-specific knowledge that is proprietary, fast-evolving, and under-represented in public pretraining. However, the two dominant paradigms for private knowledge injection each have clear drawbacks: fine-tuning is expensive to iterate under continual updates that can induce catastrophic forgetting and general-capability regression; retrieval-augmented generation (RAG) keeps the base model intact but remains brittle in specialized private corpora due to chunk-induced evidence fragmentation, retrieval mismatch, and long-context pressure. Inspired by how multimodal LLMs align heterogeneous modalities into a shared semantic space, we propose Generation-Augmented Generation (GAG), which treats private expertise as an auxiliary modality and injects it into a frozen base model through a compact, constant-budget latent interface. Concretely, GAG distills question-conditioned specialist knowledge from lightweight domain experts into multi-slot latent memories, integrates multi-layer expert signals via per-slot cross-layer fusion, and aligns them to the frozen base model through gated residual projection, while supporting scalable mixed-domain deployment with reliable selective activation. In a unified mixed-domain evaluation spanning two scientific private-domain QA benchmarks (catalytic materials and immunology adjuvant) together with general-domain queries, GAG consistently outperforms strong retrieval-based and parameter-efficient fine-tuning baselines on specialist QA, while preserving general-domain capability, achieving highly reliable routing, and offering a favorable efficiency--effectiveness trade-off. Code and datasets are provided in the supplementary material. Code is publicly available at https://github.com/360CVGroup/GAG.

URL PDF HTML ☆

赞 0 踩 0

2601.05524 2026-04-15 cs.CL

Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism

Yuhao Shen, Tianyu Liu, Junyi Shen, Jinyang Wu, Quan Kong, Li Huan, Cong Wang

Comments Accepted by ACL2026 Main

2512.22799 2026-04-15 cs.CV

VPTracker: Global Vision-Language Tracking via Visual Prompt

Jingchao Wang, Kaiwen Zhou, Zhijian Wu, Kunhua Ji, Dingjiang Huang, Yefeng Zheng

Comments 7 pages

2512.19728 2026-04-15 cs.LG

Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

Haocheng Lu, Minjun Zhu, Henry Yu

2512.14732 2026-04-15 cs.LG cs.AI cs.CV eess.IV

INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT

Idan Tankel, Nir Mazor, Rafi Brada, Christina LeBedis, Guy ben-Yosef

Comments Accepted for Spotlight presentation at MIDL 2026

2512.13726 2026-04-15 cs.LG cs.AI

Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce

Sayak Chakrabarty, Souradip Pal

Comments 9 pages, 5 figures

2512.10226 2026-04-15 cs.CV cs.RO

Latent Chain-of-Thought World Modeling for End-to-End Driving

Shuhan Tan, Kashyap Chitta, Yuxiao Chen, Ran Tian, Yurong You, Yan Wang, Wenjie Luo, Yulong Cao, Philipp Krahenbuhl, Marco Pavone, Boris Ivanovic

Comments Accepted to CVPR 2026

2512.08217 2026-04-15 cs.LG

Correction of Decoupled Weight Decay

Jason Chuan-Chih Chou

Comments v3 improves derivation and adds the Nesterov momentum case & numerical simulations for Scion-like updates

2512.07178 2026-04-15 cs.AI cs.HC cs.LG

ContextualSHAP : Enhancing SHAP Explanations Through Contextual Language Generation

Latifa Dwiyanti, Sergio Ryan Wibisono, Hidetaka Nambo

Comments This paper was accepted and presented at the 7th World Symposium on Software Engineering (WSSE) 2025 on 25 October 2025 in Okayama, Japan, and is currently awaiting publication

详情

DOI: 10.1145/3779657.3779691
Journal ref: WSSE '25: Proceedings of the 2025 7th World Symposium on Software Engineering

英文摘要

Explainable Artificial Intelligence (XAI) has become an increasingly important area of research, particularly as machine learning models are deployed in high-stakes domains. Among various XAI approaches, SHAP (SHapley Additive exPlanations) has gained prominence due to its ability to provide both global and local explanations across different machine learning models. While SHAP effectively visualizes feature importance, it often lacks contextual explanations that are meaningful for end-users, especially those without technical backgrounds. To address this gap, we propose a Python package that extends SHAP by integrating it with a large language model (LLM), specifically OpenAI's GPT, to generate contextualized textual explanations. This integration is guided by user-defined parameters (such as feature aliases, descriptions, and additional background) to tailor the explanation to both the model context and the user perspective. We hypothesize that this enhancement can improve the perceived understandability of SHAP explanations. To evaluate the effectiveness of the proposed package, we applied it in a healthcare-related case study and conducted user evaluations involving real end-users. The results, based on Likert-scale surveys and follow-up interviews, indicate that the generated explanations were perceived as more understandable and contextually appropriate compared to visual-only outputs. While the findings are preliminary, they suggest that combining visualization with contextualized text may support more user-friendly and trustworthy model explanations.

URL PDF HTML ☆

赞 0 踩 0

2512.05812 2026-04-15 cs.RO cs.CV

Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann, Christoph Stiller

Comments This is the author's accepted version of a paper to appear in the IEEE International Conference on Robotics & Automation (ICRA 2026)

2511.21211 2026-04-15 cs.LG

Robust gene prioritization for Dietary Restriction via Fast-mRMR Feature Selection techniques

Rubén Fernández-Farelo, Jorge Paz-Ruza, Bertha Guijarro-Berdiñas, Amparo Alonso-Betanzos, Alex A. Freitas