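arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.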

2510.13907 2026-04-10 cs.CL stat.ML

LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

Yuanchen Wu, Saurabh Verma, Justin Lee, Fangzhou Xiong, Poppy Zhang, Amel Awadelkarim, Xu Chen, Yubai Yuan, Shawndra Hill

Comments: Accepted to Findings of ACL 2026. Camera-ready version

2510.12710 2026-04-10 cs.RO

Reflection-Based Task Adaptation for Self-Improving VLA

Baicheng Li, Dong Wu, Zike Yan, Xinchen Liu, Lusong Li, Zecui Zeng, Hongbin Zha

2510.12184 2026-04-10 cs.CV cs.AI

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs

Jiwan Kim, Kibum Kim, Sangwoo Seo, Chanyoung Park

Comments: ICLR'26, Project Page: https://ptkjw1997.github.io/CompoDistill-page/

2510.07048 2026-04-10 cs.CL cs.AI

Search-R3: Unifying Reasoning and Embedding in Large Language Models

Yuntao Gui, James Cheng

Comments: CHANGELOG: (1) Completed training of Search-R3-Large; (2) Corrected error formulation; (3) Added a `Discussion` section to the appendix. We express our appreciation to the anonymous reviewers

2510.06670 2026-04-10 cs.CL

PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch

Shangjian Yin, Shining Liang, Wenbiao Ding, Yuli Qian, Zhouxing Shi, Hongzhi Li, Yutao Xie

2510.05524 2026-04-10 cs.CL cs.IR

KEO: Knowledge Extraction on OMIn via Knowledge Graphs and RAG for Safety-Critical Aviation Maintenance

Kuangshi Ai, Jonathan A. Karr, Meng Jiang, Nitesh V. Chawla, Chaoli Wang

2510.04641 2026-04-10 cs.CL cs.CY cs.LG

Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study

Ayan Majumdar, Feihao Chen, Jinghui Li, Xiaozhen Wang

Comments: 19 pages

2510.04628 2026-04-10 cs.CV

A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification

Hao Liu, Yunhao Gao, Wei Li, Mingyang Zhang, Maoguo Gong, Lorenzo Bruzzone

2509.23322 2026-04-10 cs.CV

Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework

Hongrui Jia, Chaoya Jiang, Shikun Zhang, Wei Ye

Details

English abstract

With the continuous expansion of Large Language Models (LLMs) and advances in reinforcement learning, LLMs have demonstrated exceptional reasoning capabilities, enabling them to address a wide range of complex problems. Inspired by these achievements, researchers have extended related techniques to Large Multimodal Models (LMMs). However, a critical limitation has emerged, reflected in the progressive loss of visual grounding. As the reasoning chain grows longer, LMMs tend to rely increasingly on the textual information generated in earlier steps, while the initially extracted visual information is rarely revisited or incorporated. This phenomenon often causes the reasoning process to drift away from the actual image content, resulting in visually implausible or even erroneous conclusions. To overcome this fundamental limitation, we propose a novel, training-free agentic paradigm that Decouples cognitive Reasoning from visual Perception (DRP). In this framework, a powerful LLM serves as a strategic Reasoner, orchestrating the inference process by explicitly querying an LMM, acting as a dedicated Observer, to retrieve fine-grained visual details on demand. This approach is lightweight, model-agnostic, and plug-and-play, necessitating no additional training or architectural modifications. Extensive experiments demonstrate DRP's efficacy in regulating the visual reasoning trajectory, significantly mitigating reasoning drift, and enforcing robust visual grounding. Notably, on the MathVision benchmark, the integration of Qwen2.5-VL-7B and Qwen3-32B achieves an accuracy of 47.2%, outperforming GPT-4o's 40.6%. These findings underscore the potential of our approach to enhance multimodal reasoning reliability without the need for costly retraining. Our code is publicly available at https://github.com/hongruijia/DRP.
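
The Reasoner/Observer split described in this abstract can be pictured as a simple query loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `drp_loop`, the `("query" | "answer", text)` return convention, and the toy stand-ins for the two models are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# the reasoner sees the question plus all (query, observation) notes so far
# and returns either ("query", visual_question) or ("answer", final_answer);
# the observer answers one visual question about a fixed image.
Notes = List[Tuple[str, str]]
Reasoner = Callable[[str, Notes], Tuple[str, str]]
Observer = Callable[[str], str]


def drp_loop(question: str, reasoner: Reasoner, observer: Observer,
             max_steps: int = 8) -> Tuple[Optional[str], Notes]:
    """Alternate reasoning and perception until the reasoner answers."""
    notes: Notes = []
    for _ in range(max_steps):
        kind, text = reasoner(question, notes)
        if kind == "answer":
            return text, notes
        # The reasoner asked for a visual detail: fetch it from the observer.
        notes.append((text, observer(text)))
    return None, notes  # step budget exhausted without a final answer


# Toy stand-ins for the two models, for illustration only.
FACTS = {"How many triangles are in the image?": "3"}


def toy_observer(query: str) -> str:
    return FACTS.get(query, "unknown")


def toy_reasoner(question: str, notes: Notes) -> Tuple[str, str]:
    if not notes:
        return "query", "How many triangles are in the image?"
    return "answer", notes[-1][1]


answer, notes = drp_loop("Count the triangles.", toy_reasoner, toy_observer)
```

In this sketch the reasoner never sees the image directly; every visual fact reaches it through an explicit observer query, which is the decoupling the abstract describes.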

2509.23310 2026-04-10 cs.CV

Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification

Hao Liu, Yongjie Zheng, Yuhan Kang, Mingyang Zhang, Maoguo Gong, Lorenzo Bruzzone

2509.18455 2026-04-10 cs.RO

Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands

Yunshuang Li, Yiyang Ling, Gaurav S. Sukhatme, Daniel Seita

Comments: Published at International Conference on Robotics and Automation (ICRA) 2026

2509.08016 2026-04-10 cs.CV cs.LG

Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs

Hyungjin Chung, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byung-Hoon Kim

Comments: CVPR Findings 2026; code: https://github.com/hyungjin-chung/VPS

2509.07673 2026-04-10 cs.CV cs.LG

Nearest Neighbor Projection Removal Adversarial Training

Himanshu Singh, A. V. Subramanyam, Shivank Rajput, Mohan Kankanhalli

2509.06713 2026-04-10 cs.CV cs.AI

MRI-Based Brain Tumor Detection through an Explainable EfficientNetV2 and MLP-Mixer-Attention Architecture

Mustafa Yurdakul, Şakir Taşdemir

Details

DOI: 10.1007/s13246-026-01728-0
Journal ref: Physical and Engineering Sciences in Medicine, 2026

English abstract

Brain tumors are serious health problems that require early diagnosis due to their high mortality rates. Diagnosing tumors by examining Magnetic Resonance Imaging (MRI) images is a process that requires expertise and is prone to error. Therefore, the need for automated diagnosis systems is increasing day by day. In this context, a robust and explainable Deep Learning (DL) model for the classification of brain tumors is proposed. In this study, a publicly available Figshare dataset containing 3,064 T1-weighted contrast-enhanced brain MRI images of three tumor types was used. First, the classification performance of nine well-known CNN architectures was evaluated to determine the most effective backbone. Among these, EfficientNetV2 demonstrated the best performance and was selected as the backbone for further development. Subsequently, an attention-based MLP-Mixer architecture was integrated into EfficientNetV2 to enhance its classification capability. The performance of the final model was comprehensively compared with basic CNNs and the methods in the literature. Additionally, Grad-CAM visualization was used to interpret and validate the decision-making process of the proposed model. The proposed model's performance was evaluated using the five-fold cross-validation method. The proposed model demonstrated superior performance with 99.50% accuracy, 99.47% precision, 99.52% recall and 99.49% F1 score. The results obtained show that the model outperforms the studies in the literature. Moreover, Grad-CAM visualizations demonstrate that the model effectively focuses on relevant regions of MRI images, thus improving interpretability and clinical reliability. A robust deep learning model for clinical decision support systems has been obtained by combining EfficientNetV2 and attention-based MLP-Mixer, providing high accuracy and interpretability in brain tumor classification.

2509.03758 2026-04-10 cs.LG cs.NA math.NA

A Data-Driven Interpolation Method on Smooth Manifolds via Diffusion Processes and Voronoi Tessellations

Alvaro Almeida Gomez

Comments: Comments are welcome

2509.01878 2026-04-10 cs.RO cs.CV cs.LG

AI-Driven Marine Robotics: Emerging Trends in Underwater Perception and Ecosystem Monitoring

Scarlett Raine, Tobias Fischer

Comments: 9 pages, 3 figures, Accepted for Oral Presentation at AAAI Conference on Artificial Intelligence 2026

2508.19982 2026-04-10 cs.CL cs.AI

Diffusion Language Models Know the Answer Before Decoding

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Soroush Vosoughi, Shiwei Liu

2508.13993 2026-04-10 cs.CL cs.AI

Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization

Shaohua Duan, Pengcheng Huang, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun

Comments: 17 pages

2508.09521 2026-04-10 cs.CL cs.AI

PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning

Yunxiao Wang, Meng Liu, Kaiyu Jiang, Bin Wen, Fan Yang, Tingting Gao, Lizi Liao

2508.00768 2026-04-10 cs.LG

Evaluating Angle and Amplitude Encoding Strategies for Variational Quantum Machine Learning: their impact on model's accuracy

Antonio Tudisco, Andrea Marchesin, Maurizio Zamboni, Mariagrazia Graziano, Giovanna Turvani

2507.13662 2026-04-10 cs.RO

Iteratively Learning Muscle Memory for Legged Robots to Master Adaptive and High Precision Locomotion

Jing Cheng, Yasser G. Alqaham, Zhenyu Gan, Amit K. Sanyal

2507.09211 2026-04-10 cs.LG physics.ao-ph physics.data-an physics.geo-ph stat.ML

Capturing Unseen Spatial Heat Extremes Through Dependence-Aware Generative Modeling

Xinyue Liu, Xiao Peng, Shuyue Yan, Yuntian Chen, Dongxiao Zhang, Zhixiao Niu, Hui-Min Wang, Xiaogang He

2507.04678 2026-04-10 cs.CV

ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing

Zhenghui Zhao, Chen Wu, Xiangyong Cao, Di Wang, Hongruixuan Chen, Datao Tang, Liangpei Zhang, Zhuo Zheng

Comments: Accepted by CVPR 2026

2506.17212 2026-04-10 cs.CV cs.AI cs.LG cs.RO

Part²GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting

Tianjiao Yu, Vedant Shah, Muntasir Wahed, Ying Shen, Kiet A. Nguyen, Ismini Lourentzou

2506.12040 2026-04-10 cs.LG cs.AI cs.CV

BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook

Hao Gu, Lujun Li, Hao Wang, Lei Wang, Zheyu Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Sirui Han, Yike Guo

2506.04500 2026-04-10 cs.AI cs.RO

"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation

Amin Seffo, Aladin Djuhera, Masataro Asai, Holger Boche

Comments: ICLR 2026 Workshop -- Agentic AI in the Wild: From Hallucinations to Reliable Autonomy

2505.24499 2026-04-10 cs.CV

Reason-SVG: Enhancing Structured Reasoning for Vector Graphics Generation with Reinforcement Learning

Ximing Xing, Ziteng Xue, Yandong Guan, Jing Zhang, Dong Xu, Qian Yu

Comments: Accepted by CVPR 2026. 16 pages, 7 figures

2505.20579 2026-04-10 cs.LG cs.AI cs.MA

The challenge of hidden gifts in multi-agent reinforcement learning

Dane Malenfant, Blake A. Richards

Comments: Increased analysis of LOLA baselines and moved to main section. Cleaned up proof and fixed error where gradient symbol was left in front of the log(policy). Self correction becomes more intuitive

Details

English abstract

Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These "hidden gifts" represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus this act for others is a "hidden gift". We show that several different state-of-the-art MARL algorithms, including MARL-specific architectures, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that decentralized actor-critic policy gradient agents can succeed when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for policy gradient agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of "hidden gifts", and demonstrate that self learning-awareness in decentralized agents can benefit these settings.

2505.17732 2026-04-10 cs.CV cs.AI cs.LG

RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection

Ozsel Kilinc, Cem Tarhan

Comments: To appear in proceedings of CVPR Findings 2026

Details

English abstract

Accurate, fast, and reliable 3D perception is essential for autonomous driving. Recently, bird's-eye view (BEV)-based perception approaches have emerged as superior alternatives to perspective-based solutions, offering enhanced spatial understanding and more natural outputs for planning. Existing BEV-based 3D object detection methods, typically using an angle-based representation, directly estimate the size and orientation of rotated bounding boxes. We observe that BEV-based 3D object detection is analogous to aerial oriented object detection, where angle-based methods are known to suffer from discontinuities in their loss functions. Drawing inspiration from this domain, we propose Restricted Quadrilateral Representation to define 3D regression targets (RQR3D). RQR3D regresses the smallest horizontal bounding box encapsulating the oriented box, along with the offsets between the corners of these two boxes, thereby transforming the oriented object detection problem into a keypoint regression task. We employ RQR3D within an anchor-free single-stage object detection method achieving state-of-the-art performance. We show that the proposed architecture is compatible with different object detection approaches. Furthermore, we introduce a simplified radar fusion backbone that applies standard 2D convolutions to radar features. This backbone leverages the inherent 2D structure of the data for efficient and geometrically consistent processing without over-parameterization, thereby eliminating the need for voxel grouping and sparse convolutions. Extensive evaluations on the nuScenes dataset show that RQR3D achieves SotA camera-radar 3D object detection performance despite its lightweight design, reaching 67.5 NDS and 59.7 mAP with reduced translation and orientation errors, which are crucial for safe autonomous driving.
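
To make the regression target described in this abstract more concrete, the following is a minimal sketch of encoding a rotated BEV box as its enclosing horizontal box plus per-corner offsets, and decoding it back. It is an illustration only: the function names, the corner ordering, and the index-wise pairing of oriented corners with horizontal-box corners are assumptions made here, and the paper's exact parametrization may differ.

```python
import math


def oriented_box_corners(cx, cy, length, width, yaw):
    """Return the four BEV corners of a rotated box as (x, y) tuples."""
    c, s = math.cos(yaw), math.sin(yaw)
    half = [(length / 2, width / 2), (-length / 2, width / 2),
            (-length / 2, -width / 2), (length / 2, -width / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def hbox_corners(hbox):
    """Corners of an axis-aligned box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = hbox
    return [(x_max, y_max), (x_min, y_max), (x_min, y_min), (x_max, y_min)]


def encode(corners):
    """Encode corners as the smallest enclosing horizontal box plus offsets.

    Assumption: oriented corner i is paired with horizontal corner i.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    hbox = (min(xs), min(ys), max(xs), max(ys))
    offsets = [(x - hx, y - hy)
               for (x, y), (hx, hy) in zip(corners, hbox_corners(hbox))]
    return hbox, offsets


def decode(hbox, offsets):
    """Recover the oriented corners from the horizontal box and offsets."""
    return [(hx + ox, hy + oy)
            for (hx, hy), (ox, oy) in zip(hbox_corners(hbox), offsets)]


corners = oriented_box_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
hbox, offsets = encode(corners)
restored = decode(hbox, offsets)
```

The encoded target contains only box extents and point offsets, with no explicit angle term, which is what lets the problem be treated as keypoint regression.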

2505.17556 2026-04-10 cs.LG cs.CV

Wildfire spread forecasting with Deep Learning

Nikolaos Anastasiou, Spyros Kondylatos, Ioannis Papoutsis

Comments: 10 pages, 9 figures