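arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.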

2603.12624 2026-03-16 cs.CV eess.IV

Prompt-Driven Lightweight Foundation Model for Instance Segmentation-Based Fault Detection in Freight Trains

Guodong Sun, Qihang Liang, Xingyu Pan, Moyun Liu, Yang Zhang

Comments 14 pages, 9 figures

Abstract

Accurate visual fault detection in freight trains remains a critical challenge for intelligent transportation system maintenance, due to complex operational environments, structurally repetitive components, and frequent occlusions or contaminations in safety-critical regions. Conventional instance segmentation methods based on convolutional neural networks and Transformers often suffer from poor generalization and limited boundary accuracy under such conditions. To address these challenges, we propose a lightweight self-prompted instance segmentation framework tailored for freight train fault detection. Our method leverages the Segment Anything Model by introducing a self-prompt generation module that automatically produces task-specific prompts, enabling effective knowledge transfer from foundation models to domain-specific inspection tasks. In addition, we adopt a Tiny Vision Transformer backbone to reduce computational cost, making the framework suitable for real-time deployment on edge devices in railway monitoring systems. We construct a domain-specific dataset collected from real-world freight inspection stations and conduct extensive evaluations. Experimental results show that our method achieves 74.6 $AP^{\text{box}}$ and 74.2 $AP^{\text{mask}}$ on the dataset, outperforming existing state-of-the-art methods in both accuracy and robustness while maintaining low computational overhead. This work offers a deployable and efficient vision solution for automated freight train inspection, demonstrating the potential of foundation model adaptation in industrial-scale fault diagnosis scenarios. Project page: https://github.com/MVME-HBUT/SAM_FTI-FDet.git
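
The abstract reports $AP^{\text{box}}$ and $AP^{\text{mask}}$ without defining them. Under the common COCO convention (an assumption here; the paper's exact evaluation protocol is not stated in this listing), both average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05, and they differ only in whether IoU is computed between bounding boxes or between pixel masks. The sketch below shows only that IoU step, not a full AP computation; the helper names are hypothetical.

```python
# Hypothetical helpers illustrating the IoU step behind AP^box vs AP^mask
# (COCO-style thresholds assumed; this is not the paper's evaluation code).

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mask_iou(m1, m2):
    """IoU of two equal-sized binary masks given as lists of 0/1 rows."""
    pairs = [(p, q) for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2)]
    inter = sum(1 for p, q in pairs if p and q)
    union = sum(1 for p, q in pairs if p or q)
    return inter / union if union else 0.0

# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

def fraction_matched(iou, thresholds=COCO_IOU_THRESHOLDS):
    """Fraction of IoU thresholds at which a prediction counts as a true positive."""
    return sum(iou >= t for t in thresholds) / len(thresholds)
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7, and a detection with IoU 0.72 counts as a match at half of the ten thresholds. Using the same matching logic with `mask_iou` instead of `box_iou` is what separates a mask metric from a box metric.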

2603.12618 2026-03-16 cs.LG

Human-AI Collaborative Autonomous Experimentation With Proxy Modeling for Comparative Observation

Arpan Biswas, Hiroshi Funakubo, Yongtao Liu

Comments 14 pages, 7 figures

Abstract

Optimization for tasks such as material characterization, synthesis, and tuning functional properties for desired applications over multi-dimensional control parameters requires a rapid, strategic search through active learning methods such as Bayesian optimization (BO). However, high-dimensional experimental physical descriptors are complex and noisy, so reducing them to low-dimensional scalar metrics or objective functions can introduce errors. Moreover, in traditional purely data-driven autonomous exploration, such objective functions often ignore subtle variations and key features of the physical descriptors and can therefore fail to discover unknown phenomena in the material system. To address this, we present proxy-modelled Bayesian optimization (px-BO) via on-the-fly teaming between human and AI agents. Within the BO loop, instead of defining a mathematical objective function directly from the experimental data, we introduce an on-the-fly voting system in which each new experimental outcome is compared with existing experiments and human agents choose the preferred samples. These human-guided comparisons are then transformed into a proxy objective function by fitting a Bradley-Terry (BT) model. To minimize human interaction, this iteratively trained proxy model also acts as an AI agent that casts surrogate votes in place of the human agents. Finally, these surrogate votes are periodically validated by human agents, and the proxy model learns the corrections on the fly. We demonstrate the performance of the proposed px-BO framework on simulated data and on BEPS data generated from a PTO sample. We find that our approach gives domain experts better control and yields an improved search compared with traditional data-driven exploration, signifying the importance of human-AI teaming for accelerated and meaningful exploration of material space.
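
The abstract states that pairwise human choices are converted into a proxy objective by fitting a Bradley-Terry (BT) model, in which the probability that sample $i$ is preferred over sample $j$ is $\sigma(s_i - s_j)$ for latent scores $s$. The following is a minimal sketch, assuming each already-measured sample simply gets one latent score fitted by gradient ascent on the L2-regularized log-likelihood. The paper's proxy model presumably maps experimental descriptors to scores so that it can vote on new samples; that part is not reproduced. The function names, learning rate, penalty weight, and iteration count are hypothetical choices.

```python
import math

def fit_bradley_terry(n_items, votes, lr=0.1, l2=0.01, iters=2000):
    """Fit Bradley-Terry scores by gradient ascent (hypothetical sketch).

    votes: list of (winner, loser) index pairs, e.g. human preferences.
    Model: P(i preferred over j) = sigmoid(s_i - s_j).
    The L2 penalty keeps scores finite for samples that never win.
    """
    s = [0.0] * n_items
    for _ in range(iters):
        # Gradient of the penalty term -0.5 * l2 * ||s||^2
        grad = [-l2 * si for si in s]
        for w, l in votes:
            p_w = 1.0 / (1.0 + math.exp(-(s[w] - s[l])))
            # d/ds_w log sigmoid(s_w - s_l) = 1 - p_w; the loser gets the opposite sign
            grad[w] += 1.0 - p_w
            grad[l] -= 1.0 - p_w
        s = [si + lr * gi for si, gi in zip(s, grad)]
    return s

def preference_prob(s, i, j):
    """Surrogate 'vote': modeled probability that sample i is preferred over j."""
    return 1.0 / (1.0 + math.exp(-(s[i] - s[j])))
```

With votes `[(0, 1), (0, 1), (1, 2), (1, 2), (0, 2), (1, 0)]`, the fitted scores come out ordered `s[0] > s[1] > s[2]`, and `preference_prob` then plays the role of a surrogate voter for any pair of scored samples. In a BO loop, the scores (or a model fitted to them) would stand in for the hand-defined objective.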

2603.12617 2026-03-16 cs.LG cs.AI

When Drafts Evolve: Speculative Decoding Meets Online Learning

Yu-Yang Qian, Hao-Cong Wu, Yichao Fu, Hao Zhang, Peng Zhao

2603.12612 2026-03-16 cs.LG cs.AI

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

Jun Xue, Junze Wang, Xinming Zhang, Shanze Wang, Yanjun Chen, Wei Zhang

2603.12607 2026-03-16 cs.RO cs.AI

CarPLAN: Context-Adaptive and Robust Planning with Dynamic Scene Awareness for Autonomous Driving

Junyong Yun, Jungho Kim, ByungHyun Lee, Dongyoung Lee, Sehwan Choi, Seunghyeop Nam, Kichun Jo, Jun Won Choi

Comments 10 pages, 6 figures. Under review at IEEE Transactions on Intelligent Transportation Systems

Abstract

Imitation learning (IL) is widely used for motion planning in autonomous driving due to its data efficiency and access to real-world driving data. For safe and robust real-world driving, IL-based planning requires capturing the complex driving contexts inherent in real-world data and enabling context-adaptive decision-making, rather than relying solely on expert trajectory imitation. In this paper, we propose CarPLAN, a novel IL-based motion planning framework that explicitly enhances driving context understanding and enables adaptive planning across diverse traffic scenarios. Our contributions are twofold: We introduce Displacement-Aware Predictive Encoding (DPE) to improve the model's spatial awareness by predicting future displacement vectors between the Autonomous Vehicle (AV) and surrounding scene elements. This allows the planner to account for relational spacing when generating trajectories. In addition to the standard imitation loss, we incorporate an augmented loss term that captures displacement prediction errors, ensuring planning decisions consider relative distances from other agents. To improve the model's ability to handle diverse driving contexts, we propose Context-Adaptive Multi-Expert Decoder (CMD), which leverages the Mixture of Experts (MoE) framework. CMD dynamically selects the most suitable expert decoders based on scene structure at each Transformer layer, enabling adaptive and context-aware planning in dynamic environments. We evaluate CarPLAN on the nuPlan benchmark and demonstrate state-of-the-art performance across all closed-loop simulation metrics. In particular, CarPLAN exhibits robust performance on challenging scenarios such as Test14-Hard, validating its effectiveness in complex driving conditions. Additional experiments on the Waymax benchmark further demonstrate its generalization capability across different benchmark settings.
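
The abstract describes Displacement-Aware Predictive Encoding as predicting future displacement vectors between the AV and surrounding scene elements, trained with an extra displacement-error term added to the imitation loss. It does not give the loss form. The sketch below assumes 2D positions, squared Euclidean error for both terms, and a hypothetical weight `lam`; all function names are illustrative, not CarPLAN's code.

```python
# Hypothetical sketch of an "imitation + displacement" training objective.
# Assumptions: 2D (x, y) positions, squared-L2 errors, fixed weight lam.

def displacement_targets(av_future, element_futures):
    """Target displacement vectors from the AV to each scene element.

    av_future: list of (x, y) AV positions over T future steps.
    element_futures: per-element lists of (x, y) positions over the same T steps.
    Returns, per element, a list of (dx, dy) vectors per step.
    """
    return [
        [(ex - ax, ey - ay) for (ex, ey), (ax, ay) in zip(elem, av_future)]
        for elem in element_futures
    ]

def mean_sq_dist(pred, target):
    """Mean squared Euclidean distance between two equal-length point lists."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(target)

def augmented_planning_loss(pred_plan, expert_plan, pred_disp, target_disp, lam=0.5):
    """Imitation loss plus a weighted displacement-prediction loss."""
    imitation = mean_sq_dist(pred_plan, expert_plan)
    displacement = sum(mean_sq_dist(p, t)
                       for p, t in zip(pred_disp, target_disp)) / len(target_disp)
    return imitation + lam * displacement
```

For an AV path `[(0.0, 0.0), (1.0, 0.0)]` and one element at `[(2.0, 0.0), (3.0, 1.0)]`, the targets are `[[(2.0, 0.0), (2.0, 1.0)]]`. A planned path `[(0.0, 0.0), (2.0, 0.0)]` with perfect displacement predictions gives an imitation error of 0.5 and no displacement penalty. The extra term penalizes a planner whose internal picture of relative spacing is wrong even when its trajectory happens to match the expert.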

2603.12606 2026-03-16 cs.CV cs.AI

Mastering Negation: Boosting Grounding Models via Grouped Opposition-Based Learning

Zesheng Yang, Xi Jiang, Bingzhang Hu, Weili Guan, Runmin Cong, Guo-Jun Qi, Feng Zheng

Comments 12 pages, 6 figures

2603.12605 2026-03-16 cs.CV

A2Z-10M+: Geometric Deep Learning with A-to-Z BRep Annotations for AI-Assisted CAD Modeling and Reverse Engineering

Pritham Kumar Jena, Bhavika Baburaj, Tushar Anand, Vedant Dutta, Vineeth Ulavala, Sk Aziz Ali

Comments 27 pages, accepted to IEEE/CVF CVPR 2026

2603.12599 2026-03-16 cs.CV

A Prediction-as-Perception Framework for 3D Object Detection

Song Zhang, Haoyu Chen, Ruibo Wang

2603.12598 2026-03-16 cs.CV

Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating

Xiangkui Cao, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen

2603.12597 2026-03-16 cs.LG cs.AI cs.HC cs.MA cs.SE

Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs

Zixin Wen, Yifu Cai, Kyle Lee, Sam Estep, Josh Sunshine, Aarti Singh, Yuejie Chi, Wode Ni

Comments A previous version was submitted to ICLR 2025

2603.12596 2026-03-16 cs.LG cs.AI

Optimize Wider, Not Deeper: Consensus Aggregation for Policy Optimization

Zelal Su Mustafaoglu, Sungyoung Lee, Eshan Balachandar, Risto Miikkulainen, Keshav Pingali

2603.12595 2026-03-16 cs.LG cs.AI

Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback

Gihoon Kim, Euntai Kim

Comments ICLR 2026

2603.12594 2026-03-16 cs.LG

Maximizing Incremental Information Entropy for Contrastive Learning

Jiansong Zhang, Zhuoqin Yang, Xu Wu, Xiaoling Luo, Peizhong Liu, Linlin Shen

Comments ICLR 2026 (The Fourteenth International Conference on Learning Representations) https://openreview.net/forum?id=XL7ValpExh

2603.12591 2026-03-16 cs.LG cs.AI

CA-HFP: Curvature-Aware Heterogeneous Federated Pruning with Model Reconstruction

Gang Hu, Yinglei Teng, Pengfei Wu, Shijun Ma

2603.12587 2026-03-16 cs.CV

MRGeo: Robust Cross-View Geo-Localization of Corrupted Images via Spatial and Channel Feature Enhancement

Le Wu, Lv Bo, Songsong Ouyang, Yingying Zhu

2603.12582 2026-03-16 cs.CL cs.CR

RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection

He Zhu, Yanshu Li, Wen Liu, Haitian Yang

Comments 15 pages, 4 figures

2603.12579 2026-03-16 cs.CV

DINOLight: Robust Ambient Light Normalization with Self-supervised Visual Prior Integration

Youngjin Oh, Junhyeong Kwon, Nam Ik Cho

Comments Submitted to ICPR 2026 (under review)

2603.12577 2026-03-16 cs.CL cs.CV

Expert Pyramid Tuning: Efficient Parameter Fine-Tuning for Expertise-Driven Task Allocation

Jia-Chen Zhang, Zhen-Wei Yan, Yu-Jie Xiong, Chun-Ming Xia

2603.12576 2026-03-16 cs.LG

A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric

Keru Wang, Yixin Deng, Yao Lyu, Stephen Redmond, Shengbo Eben Li

2603.12575 2026-03-16 cs.CV

AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation

Xuanhua Yin, Chuanzhi Xu, Haoxian Zhou, Boyu Wei, Weidong Cai

Comments 32 pages, 13 tables, 12 figures

2603.12574 2026-03-16 cs.RO

From Woofs to Words: Towards Intelligent Robotic Guide Dogs with Verbal Communication

Yohei Hayamizu, David DeFazio, Hrudayangam Mehta, Zainab Altaweel, Jacqueline Choe, Chao Lin, Jake Juettner, Furui Xiao, Jeremy Blackburn, Shiqi Zhang

Comments 10 pages, 6 figures, AAAI 2026

2603.12565 2026-03-16 cs.SD cs.CL

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

Mengjie Zhao, Lianbo Liu, Yusuke Fujita, Hao Shi, Yuan Gao, Roman Koshkin, Yui Sudo

2603.12557 2026-03-16 cs.LG cs.CV

Lyapunov Stable Graph Neural Flow

Haoyu Chu, Xiaotong Chen, Wei Zhou, Wenjun Cui, Kai Zhao, Shikui Wei, Qiyu Kang

2603.12556 2026-03-16 cs.LG cs.NA math.NA physics.comp-ph

Scaling Laws and Pathologies of Single-Layer PINNs: Network Width and PDE Nonlinearity

Faris Chaudhry

Comments Accepted at the Machine Learning and Physical Sciences Workshop (NeurIPS 2025)

2603.12553 2026-03-16 cs.RO cs.CV

Beyond Dense Futures: World Models as Structured Planners for Robotic Manipulation

Minghao Jin, Mozheng Liao, Mingfei Han, Zhihui Li, Xiaojun Chang

2603.12552 2026-03-16 cs.LG math.OC stat.ML

Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE

Faris Chaudhry

Comments Accepted at the Optimization for Machine Learning Workshop (NeurIPS 2025)

2603.12551 2026-03-16 cs.CV

CVGL: Causal Learning and Geometric Topology

Songsong Ouyang, Yingying Zhu

2603.12547 2026-03-16 cs.CV

Decoding Matters: Efficient Mamba-Based Decoder with Distribution-Aware Deep Supervision for Medical Image Segmentation

Fares Bougourzi, Fadi Dornaika, Abdenour Hadid

2603.12544 2026-03-16 cs.LG

Deep Distance Measurement Method for Unsupervised Multivariate Time Series Similarity Retrieval

Susumu Naito, Kouta Nakata, Yasunori Taguchi

Comments Workshop of Artificial Intelligence for Time Series Analysis (AI4TS): Theory, Algorithms, and Applications at 2025 IEEE International Conference on Data Mining (ICDM), 2025

2603.12543 2026-03-16 cs.LG cs.AI

CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning

Carlos Purves, Pietro Liò