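arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.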

2603.22946 2026-03-25 cs.CV

Caption Generation for Dongba Paintings via Prompt Learning and Semantic Fusion

Shuangwu Qian, Xiaochan Yuan, Pengfei Liu

Abstract

Dongba paintings, the treasured pictorial legacy of the Naxi people in southwestern China, feature richly layered visual elements, vivid color palettes, and pronounced ethnic and regional cultural symbolism, yet their automatic textual description remains largely unexplored owing to severe domain shift when mainstream captioning models are applied directly. This paper proposes PVGF-DPC (Prompt and Visual Semantic-Generation Fusion-based Dongba Painting Captioning), an encoder-decoder framework that integrates a content prompt module with a novel visual semantic-generation fusion loss to bridge the gap between generic natural-image captioning and the culturally specific imagery found in Dongba art. A MobileNetV2 encoder extracts discriminative visual features, which are injected into the layer normalization of a 10-layer Transformer decoder initialized with pretrained BERT weights; meanwhile, the content prompt module maps the image feature vector to culture-aware labels (such as deity, ritual pattern, or hell ghost) and constructs a post-prompt that steers the decoder toward thematically accurate descriptions. The visual semantic-generation fusion loss jointly optimizes the cross-entropy objectives of both the prompt predictor and the caption generator, encouraging the model to extract key cultural and visual cues and to produce captions that are semantically aligned with the input image. We construct a dedicated Dongba painting captioning dataset comprising 9,408 augmented images with culturally grounded annotations spanning seven thematic categories.
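
The abstract says only that the fusion loss "jointly optimizes" two cross-entropy objectives. The following is a minimal sketch of one common way to do that, assuming a weighted sum with a hypothetical weight `lam`, a single-label prompt predictor, and token-averaged caption cross-entropy; the helper names `cross_entropy` and `fusion_loss` are illustrative, and the paper may weight or combine the terms differently.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def fusion_loss(
    prompt_logits: Sequence[float],
    prompt_label: int,
    caption_logits: List[Sequence[float]],
    caption_tokens: List[int],
    lam: float = 1.0,
) -> float:
    """Hypothetical joint objective: mean caption CE plus lam * prompt CE.

    The weighted-sum form and the weight `lam` are assumptions for illustration.
    """
    if not caption_tokens or len(caption_logits) != len(caption_tokens):
        raise ValueError("need one logit vector per (non-empty) caption token")
    # Caption generator term: average cross-entropy over decoding steps.
    caption_ce = sum(
        cross_entropy(step, tok) for step, tok in zip(caption_logits, caption_tokens)
    ) / len(caption_tokens)
    # Prompt predictor term: cross-entropy of the culture-aware label.
    prompt_ce = cross_entropy(prompt_logits, prompt_label)
    return caption_ce + lam * prompt_ce


if __name__ == "__main__":
    # With uniform logits, each term equals log(number of classes).
    loss = fusion_loss([0.0] * 3, 0, [[0.0] * 5] * 4, [1, 2, 3, 4], lam=0.5)
    print(round(loss, 4))
```

Because both terms share the same image features upstream, minimizing the sum pushes the encoder toward features that serve label prediction and caption generation at once, which is the intuition the abstract gives for the fusion loss.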

2603.22943 2026-03-25 cs.AI

PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference

Qirui Wang, Qi Guo, Yiding Sun, Junkai Yang, Dongxu Zhang, Shanmin Pang, Qing Guo

Comments Accepted in ICME 2026

2603.22942 2026-03-25 cs.AI

Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning

Anshul Solanki, Sanchit Latawa, Koushik Chakraborty, Navneet Kamboj

Comments 9 pages, 3 figures

2603.22939 2026-03-25 cs.CV cs.LG

FixationFormer: Direct Utilization of Expert Gaze Trajectories for Chest X-Ray Classification

Daniel Beckmann, Benjamin Risse

2603.22935 2026-03-25 cs.AI cs.HC

Ran Score: a LLM-based Evaluation Score for Radiology Report Generation

Ran Zhang, Yucong Lin, Zhaoli Su, Bowen Liu, Danni Ai, Tianyu Fu, Deqiang Xiao, Jingfan Fan, Yuanyuan Wang, Mingwei Gao, Yuwan Hu, Shuya Gao, Jingtao Li, Jian Yang, Hong Song, Hongliang Sun

Comments 4 pages, 5 figures

2603.22922 2026-03-25 cs.CL

Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion

Qi Sun, Kejun Xiao, Huaipeng Zhao, Tao Luo, Xiaoyi Zeng

Comments Submitted to ACL 2026 Industry Track

2603.22915 2026-03-25 cs.CV

When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse

Yihuan Huang, Jun Xue, Liu Jiajun, Daixian Li, Tong Zhang, Zhuolin Yi, Yanzhen Ren, Kai Li

2603.22904 2026-03-25 cs.AI

Separating Diagnosis from Control: Auditable Policy Adaptation in Agent-Based Simulations with LLM-Based Diagnostics

Shaoxin Zhong, Yuchen Su, Michael Witbrock

Comments This paper has been accepted at AAMAS 2026 Workshop MABS

2603.22903 2026-03-25 cs.RO

Task-Aware Positioning for Improvisational Tasks in Mobile Construction Robots via an AI Agent with Multi-LMM Modules

Seongju Jang, Francis Baek, SangHyun Lee

2603.22899 2026-03-25 cs.RO

Agile-VLA: Few-Shot Industrial Pose Rectification via Implicit Affordance Anchoring

Teng Yan, Zhengyang Pei, Chengyu Shi, Yue Yu, Yikun Chen, Zilong Zhu, Zelin Fang, Kaile Guo, Zihang Wang, Peigen Tian, Bingzhuo Zhong

Comments 8 pages. Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

2603.22892 2026-03-25 cs.LG

VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents

Pengsen Liu, Maosen Zeng, Nan Tang, Kaiyuan Li, Jing-Cheng Pang, Yunan Liu, Yang Yu

2603.22886 2026-03-25 cs.LG q-fin.GN q-fin.ST

Conditionally Identifiable Latent Representation for Multivariate Time Series with Structural Dynamics

Minkey Chang, Jae-Young Kim

Comments Accepted paper for 2026 ICLR FINAI workshop

2603.22882 2026-03-25 cs.LG cs.CV

TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration

Chunxiao Li, Lijun Li, Jing Shao

Comments CVPR 2026

2603.22879 2026-03-25 cs.LG cs.AI

Confidence Calibration under Ambiguous Ground Truth

Linwei Tao, Haoyang Luo, Minjing Dong, Chang Xu

2603.22877 2026-03-25 cs.AI

Continuous Optimization for Satisfiability Modulo Theories on Linear Real Arithmetic

Yunuo Cen, Daniel Ebler, Xuanyao Fong

2603.22876 2026-03-25 cs.RO cs.AI

Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

Ruixing Jin, Zicheng Zhu, Ruixiang Ouyang, Sheng Xu, Bo Yue, Zhizheng Wu, Guiliang Liu

2603.22874 2026-03-25 cs.CV

Template-Based Feature Aggregation Network for Industrial Anomaly Detection

Wei Luo, Haiming Yao, Wenyong Yu

Comments Accepted by Engineering Applications of Artificial Intelligence

2603.22872 2026-03-25 cs.CV

ForeSea: AI Forensic Search with Multi-modal Queries for Video Surveillance

Hyojin Park, Yi Li, Janghoon Cho, Sungha Choi, Jungsoo Lee, Taotao Jing, Shuai Zhang, Munawar Hayat, Dashan Gao, Ning Bi, Fatih Porikli

2603.22871 2026-03-25 cs.AI cs.LG math.DS

Dynamical Systems Theory Behind a Hierarchical Reasoning Model

Vasiliy A. Es'kin, Mikhail E. Smorkalov

2603.22870 2026-03-25 cs.CV

Designing to Forget: Deep Semi-parametric Models for Unlearning

Amber Yijia Zheng, Yu-Shan Tai, Raymond A. Yeh

Comments CVPR 2026

2603.22861 2026-03-25 cs.CV

A Feature Shuffling and Restoration Strategy for Universal Unsupervised Anomaly Detection

Wei Luo, Haiming Yao, Zhenfeng Qiang, Xiaotian Zhang, Weihang Zhang

Comments Accepted by Knowledge-Based Systems

Abstract

Unsupervised anomaly detection is vital in industrial fields, and reconstruction-based methods are favored for their simplicity and effectiveness. However, reconstruction methods often encounter an identical shortcut issue, where both normal and anomalous regions are reconstructed well, so outliers go unidentified. The severity of this problem increases with the complexity of the normal data distribution. Consequently, existing methods may exhibit excellent detection performance in a specific scenario, but their performance declines sharply when transferred to another scenario. This paper focuses on establishing a universal model applicable to anomaly detection tasks across different settings, termed universal anomaly detection. In this work, we introduce a novel, straightforward yet efficient framework for universal anomaly detection: Feature Shuffling and Restoration (FSR), which can alleviate the identical shortcut issue across different settings. First, FSR employs multi-scale features with rich semantic information as reconstruction targets, rather than raw image pixels. Next, these multi-scale features are partitioned into non-overlapping feature blocks, which are randomly shuffled and then restored to their original arrangement by a restoration network. This simple paradigm encourages the model to focus more on global contextual information. Additionally, we introduce a novel concept, the shuffling rate, to regulate the complexity of the FSR task, thereby alleviating the identical shortcut across different settings. Furthermore, we provide theoretical explanations for the effectiveness of the FSR framework from two perspectives: network structure and mutual information. Extensive experimental results validate the superiority and efficiency of the FSR framework across different settings. Code is available at https://github.com/luow23/FSR.
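
The block-shuffling step can be pictured with a small sketch. It assumes a single-channel 2D feature map (the paper uses multi-scale, multi-channel features), square blocks whose size divides the map, and a shuffling rate defined as the fraction of blocks whose positions are permuted; the abstract does not define the rate precisely, and the names `shuffle_blocks` and `unshuffle_blocks` are hypothetical. `unshuffle_blocks` computes the ideal output a restoration network would be trained to approximate, not the network itself.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]


def _copy_block(src: Grid, dst: Grid, s: int, d: int, block: int, bw: int) -> None:
    """Copy block number s of `src` into block slot d of `dst` (row-major order)."""
    sr, sc = divmod(s, bw)
    dr, dc = divmod(d, bw)
    for i in range(block):
        for j in range(block):
            dst[dr * block + i][dc * block + j] = src[sr * block + i][sc * block + j]


def shuffle_blocks(feat: Grid, block: int, rate: float, seed: int = 0) -> Tuple[Grid, List[int]]:
    """Permute a `rate` fraction of non-overlapping blocks; return (shuffled, perm).

    perm[d] is the original index of the block now sitting in slot d.
    Chosen blocks are permuted among themselves, so a block may land back in its own slot.
    """
    h, w = len(feat), len(feat[0])
    if h % block or w % block:
        raise ValueError("feature size must be divisible by the block size")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    bw = w // block
    n = (h // block) * bw
    rng = random.Random(seed)
    chosen = rng.sample(range(n), round(rate * n))  # blocks selected for shuffling
    targets = chosen[:]
    rng.shuffle(targets)  # new slots for the chosen blocks
    perm = list(range(n))  # unchosen blocks stay in place
    for s, d in zip(chosen, targets):
        perm[d] = s
    out = [[0.0] * w for _ in range(h)]
    for d, s in enumerate(perm):
        _copy_block(feat, out, s, d, block, bw)
    return out, perm


def unshuffle_blocks(shuffled: Grid, block: int, perm: List[int]) -> Grid:
    """Invert `shuffle_blocks`: the target a restoration network would learn to produce."""
    h, w = len(shuffled), len(shuffled[0])
    bw = w // block
    out = [[0.0] * w for _ in range(h)]
    for d, s in enumerate(perm):
        _copy_block(shuffled, out, d, s, block, bw)
    return out


if __name__ == "__main__":
    feat = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    shuffled, perm = shuffle_blocks(feat, block=2, rate=1.0, seed=3)
    print(perm)
    print(unshuffle_blocks(shuffled, block=2, perm=perm) == feat)
```

In this reading, raising the rate moves more blocks and makes restoration harder, which matches the abstract's description of the shuffling rate as a knob on task complexity.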

2603.22858 2026-03-25 cs.LG cs.AI cs.NE q-bio.NC

The Coordinate System Problem in Persistent Structural Memory for Neural Architectures

Abhinaba Basu

2603.22854 2026-03-25 cs.CL cs.AI

Avoiding Over-smoothing in Social Media Rumor Detection with Pre-trained Propagation Tree Transformer

Chaoqun Cui, Caiyan Jia

Comments 14 pages, 6 figures

2603.22852 2026-03-25 cs.CV

Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction

Chengxin Lv, Yihui Li, Hongyu Yang, YunHong Wang

2603.22851 2026-03-25 cs.CV cs.AI

UniQueR: Unified Query-based Feedforward 3D Reconstruction

Chensheng Peng, Quentin Herau, Jiezhi Yang, Yichen Xie, Yihan Hu, Wenzhao Zheng, Matthew Strong, Masayoshi Tomizuka, Wei Zhan

2603.22847 2026-03-25 cs.CV

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Yunheng Li, Hangyi Kuang, Hengrui Zhang, Jiangxia Cao, Zhaojie Liu, Qibin Hou, Ming-Ming Cheng

2603.22841 2026-03-25 cs.CV cs.AI

UAV-DETR: DETR for Anti-Drone Target Detection

Jun Yang, Dong Wang, Hongxu Yin, Hongpeng Li, Jianxiong Yu

2603.22840 2026-03-25 cs.CV cs.AI

URA-Net: Uncertainty-Integrated Anomaly Perception and Restoration Attention Network for Unsupervised Anomaly Detection

Wei Luo, Peng Xing, Yunkang Cao, Haiming Yao, Weiming Shen, Zechao Li

Comments Accepted by IEEE TCSVT

2603.22839 2026-03-25 cs.CV

MultiCam: On-the-fly Multi-Camera Pose Estimation Using Spatiotemporal Overlaps of Known Objects

Shiyu Li, Hannah Schieber, Kristoffer Waldow, Benjamin Busam, Julian Kreimeier, Daniel Roth

2603.22837 2026-03-25 cs.CL

Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts

Maida Aizaz, Quang Minh Nguyen

Comments EACL 2026 Student Research Workshop