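arXivDaily daily academic digest: synchronizes the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.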

2603.20012 2026-03-26 cs.CV

Diffusion-Based Makeup Transfer with Facial Region-Aware Makeup Features

Zheng Gao, Debin Meng, Yunqi Miao, Zhensong Zhang, Songcen Xu, Ioannis Patras, Jifei Song

Comments Accepted by CVPR'26

Abstract

Current diffusion-based makeup transfer methods commonly use makeup information encoded by off-the-shelf foundation models (e.g., CLIP) as a condition to preserve the makeup style of the reference image during generation. Although effective, these works have two main limitations: (1) foundation models pre-trained for generic tasks struggle to capture makeup styles; (2) the makeup features of the reference image are injected into the diffusion denoising model as a whole for global makeup transfer, overlooking facial region-aware makeup features (e.g., eyes, mouth) and limiting regional controllability for region-specific makeup transfer. To address these issues, we propose Facial Region-Aware Makeup features (FRAM), which has two stages: (1) makeup CLIP fine-tuning; (2) identity and facial region-aware makeup injection. For makeup CLIP fine-tuning, unlike prior works that use off-the-shelf CLIP, we synthesize annotated makeup style data using GPT-o3 and a text-driven image editing model, and then use the data to train a makeup CLIP encoder through self-supervised and image-text contrastive learning. For identity and facial region-aware makeup injection, we construct before-and-after makeup image pairs from the edited images in stage 1 and use them to learn to inject the identity of the source image and the makeup of the reference image into the diffusion denoising model for makeup transfer. Specifically, we use learnable tokens to query the makeup CLIP encoder and extract facial region-aware makeup features for makeup injection; these are learned via an attention loss to enable regional control. For identity injection, we use a ControlNet Union to encode the source image and its 3D mesh simultaneously. Experimental results verify the superior regional controllability and makeup transfer performance of our method. Code is available at https://github.com/zaczgao/Facial_Region-Aware_Makeup.
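
The image-text contrastive stage can be illustrated with a standard CLIP-style symmetric InfoNCE loss, in which the i-th image and the i-th caption in a batch form the positive pair. This is a minimal, dependency-free sketch, not the authors' training code: the function names, the toy 2-D embeddings, and the temperature of 0.07 are assumptions for illustration, and neither the self-supervised term nor the attention loss is modeled.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_contrastive_loss(image_embs: List[Vector], text_embs: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: the i-th image and i-th text form the positive pair."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should peak at its own caption
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should peak at its own image
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[0.9, 0.1], [0.1, 0.9]]
    print(clip_contrastive_loss(images, captions))        # small: pairs aligned
    print(clip_contrastive_loss(images, captions[::-1]))  # large: pairs swapped
```

Averaging both directions keeps the objective symmetric, so neither encoder is favored; a real implementation would batch this with tensor operations and usually learn the temperature.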

2603.19808 2026-03-26 cs.LG math.AP stat.ML

Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training

Giacomo Borghi, Hyesung Im, Lorenzo Pareschi

2603.17872 2026-03-26 cs.CL cs.AI

Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval

Md. Asraful Haque, Aasar Mehdi, Maaz Mahboob, Tamkeen Fatima

Comments 14 Pages, 5 Figures, 4 Tables; v2: Updated Table 3 and Figure 4 to address minor data inconsistencies and revised the relevant content

2603.16661 2026-03-26 cs.LG stat.ML

Self-Aware Markov Models for Discrete Reasoning

Gregor Kornhardt, Jannis Chemseddine, Christian Wald, Gabriele Steidl

2603.14185 2026-03-26 cs.AI

Relationship-Aware Safety Unlearning for Multimodal LLMs

Vishnu Narayanan Anilkumar, Abhijith Sreesylesh Babu, Trieu Hai Vo, Mohankrishna Kolla, Alexander Cuneo

Comments 9 pages, 4 figures

2603.12703 2026-03-26 cs.CV

VCBench: A Streaming Counting Benchmark for Spatial-Temporal State Maintenance in Long Videos

Pengyiang Liu, Zhongyue Shi, Hongye Hao, Qi Fu, Xueting Bi, Siwei Zhang, Xiaoyang Hu, Zitian Wang, Linjiang Huang, Si Liu

2603.11649 2026-03-26 cs.RO

A Hybrid Neural-Assisted Unscented Kalman Filter for Unmanned Ground Vehicle Navigation

Gal Versano, Itzik Klein

2603.07659 2026-03-26 cs.CV

Scaling Test-Time Robustness of Vision-Language Models via Self-Critical Inference Framework

Kaihua Tang, Jiaxin Qi, Jinli Ou, Yuhua Zheng, Jianqiang Huang

Comments Accepted to CVPR 2026. Code: https://github.com/KaihuaTang/Self-Critical-Inference-Framework

2603.03072 2026-03-26 cs.AI cs.CL cs.CV

TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Christian Greisinger, Steffen Eger

2603.01853 2026-03-26 cs.CL

Let the Agent Search: Autonomous Exploration Beats Rigid Workflows in Temporal Question Answering

Xufei Lv, Jiahui Yang, Haoyuan Sun, Xialin Su, Zhiliang Tian, Yifu Gao, Linbo Qiao, Houde Liu

Comments Revised version with three added authors and additional experiments

2603.01601 2026-03-26 cs.CV

Dehallu3D: Hallucination-Mitigated 3D Generation from Single Image via Cyclic View Consistency Refinement

Xiwen Wang, Shichao Zhang, Hailun Zhang, Ruowei Wang, Mao Li, Chenyu Zhou, Qijun Zhao, Ji-Zhe Zhou

2602.24055 2026-03-26 cs.AI cs.SE

CIRCLE: A Framework for Evaluating AI from a Real-World Lens

Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, Fariza Rashid, Maya Carlyle, Afaf Taïk, Kyra Wilson, Peter Douglas, Theodora Skeadas, Gabriella Waters, Rumman Chowdhury, Thiago Lacerda

Comments Accepted at Intelligent Systems Conference (IntelliSys) 2026

2602.23481 2026-03-26 cs.CL

IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation

Md Mofijul Islam, Md Sirajus Salekin, Joe King, Priyashree Roy, Vamsi Thilak Gudi, Spencer Romo, Akhil Nooney, David Kaleko, Boyi Xie, Bob Strahan, Diego A. Socolinsky

2602.21591 2026-03-26 cs.CV

CADC: Content Adaptive Diffusion-Based Generative Image Compression

Xihua Sheng, Lingyu Zhu, Tianyu Zhang, Dong Liu, Shiqi Wang, Jing Wang

Abstract

Diffusion-based generative image compression has demonstrated remarkable potential for achieving realistic reconstruction at ultra-low bitrates. The key to unlocking this potential lies in making the entire compression process content-adaptive, ensuring that the encoder's representation and the decoder's generative prior are dynamically aligned with the semantic and structural characteristics of the input image. However, existing methods suffer from three critical limitations that prevent effective content adaptation. First, isotropic quantization applies a uniform quantization step, failing to adapt to the spatially varying complexity of image content and creating a misalignment with the diffusion model's noise-dependent prior. Second, the information concentration bottleneck, arising from the dimensional mismatch between the high-dimensional noisy latent and the diffusion decoder's fixed input, prevents the model from adaptively preserving essential semantic information in the primary channels. Third, existing textual conditioning strategies either incur significant textual bitrate overhead or rely on generic, content-agnostic textual prompts, thereby failing to provide adaptive semantic guidance efficiently. To overcome these limitations, we propose a content-adaptive diffusion-based image codec with three technical innovations: 1) an Uncertainty-Guided Adaptive Quantization method that learns spatial uncertainty maps to adaptively align quantization distortion with content characteristics; 2) an Auxiliary Decoder-Guided Information Concentration method that uses a lightweight auxiliary decoder to enforce content-aware information preservation in the primary latent channels; and 3) a Bitrate-Free Adaptive Textual Conditioning method that derives content-aware textual descriptions from the auxiliary reconstructed image, enabling semantic guidance without bitrate cost.
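
The difference between isotropic and spatially adaptive quantization can be sketched as scalar quantization with a per-element step size. This is a hypothetical illustration, not CADC's method: the uncertainty values are supplied by hand rather than learned, and the mapping `step = base_step * (1 + u)` (higher uncertainty, coarser step) is an assumption chosen only to show the mechanism.

```python
from typing import List, Tuple


def isotropic_steps(n: int, step: float) -> List[float]:
    """Baseline: one uniform step for every latent element."""
    return [step] * n


def steps_from_uncertainty(uncertainty: List[float], base_step: float = 1.0) -> List[float]:
    """Hypothetical mapping from a non-negative uncertainty map to per-element steps."""
    return [base_step * (1.0 + max(u, 0.0)) for u in uncertainty]


def quantize(latent: List[float], steps: List[float]) -> Tuple[List[int], List[float]]:
    """Round each element to the nearest multiple of its own step.

    Returns the integer symbols (what an entropy coder would transmit) and the
    dequantized reconstruction; each element's error is at most half its step.
    """
    symbols = [round(y / s) for y, s in zip(latent, steps)]
    recon = [q * s for q, s in zip(symbols, steps)]
    return symbols, recon


if __name__ == "__main__":
    latent = [0.9, 0.9, -2.6, 3.1]
    uncertainty = [0.0, 1.0, 0.0, 3.0]  # toy map, not a learned one
    print(quantize(latent, isotropic_steps(len(latent), 1.0)))
    print(quantize(latent, steps_from_uncertainty(uncertainty)))
```

With a uniform step every position gets the same distortion budget; with per-element steps the distortion bound `step / 2` varies across positions, which is the knob a learned uncertainty map would control.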

2602.19083 2026-03-26 cs.CV

ChordEdit: One-Step Low-Energy Transport for Image Editing

Liangsi Lu, Xuhang Chen, Minzhe Guo, Shichu Li, Jingchao Wang, Yang Shi

Comments Accepted by CVPR 2026

2602.17665 2026-03-26 cs.CV

OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents

Akashah Shabbir, Muhammad Umer Sheikh, Muhammad Akhtar Munir, Hiyam Debary, Mustansar Fiaz, Muhammad Zaigham Zaheer, Paolo Fraccaro, Fahad Shahbaz Khan, Muhammad Haris Khan, Xiao Xiang Zhu, Salman Khan

2602.16485 2026-03-26 cs.CL cs.AI cs.MA

Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling

Jeffrey T. H. Wong, Zixi Zhang, Junyi Liu, Yiren Zhao

Comments 8 pages

2602.14844 2026-03-26 cs.LG

Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment

Elias Malomgré, Pieter Simoens

Comments Accepted for the AAMAS 2026 Blue Sky Ideas track

2602.07058 2026-03-26 cs.CV cs.AI cs.LG

SPARE: Self-distillation for PARameter-Efficient Removal

Natnael Mola, Leonardo S. B. Pereira, Carolina R. Kelsch, Luis H. Arribas, Juan C. S. M. Avedillo

2602.07047 2026-03-26 cs.CV cs.LG

ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees

Muhammad Rashid, Elvio G. Amparore, Enrico Ferrari, Damiano Verda

Comments Presented at the AAAI-26 conference and published in the Proceedings of the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26)

DOI: 10.1609/aaai.v40i30.39699
Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2026

Abstract

Pixel-level feature attributions are an important tool in eXplainable AI for Computer Vision (XCV), providing visual insights into how image features influence model predictions. The Owen formula for hierarchical Shapley values has been widely used to interpret machine learning (ML) models and their learned representations. However, existing hierarchical Shapley approaches do not exploit the multiscale structure of image data, leading to slow convergence and weak alignment with actual morphological features. Moreover, no prior Shapley method has leveraged data-aware hierarchies for Computer Vision tasks, leaving a gap in the interpretability of models on structured visual data. To address this, this paper introduces ShapBPT, a novel data-aware XCV method based on the hierarchical Shapley formula. ShapBPT assigns Shapley coefficients to the Binary Partition Tree (BPT), a multiscale hierarchical structure tailored for images. By using this data-aware hierarchical partitioning, ShapBPT ensures that feature attributions align with intrinsic image morphology, effectively prioritizing relevant regions while reducing computational overhead. This advancement connects hierarchical Shapley methods with image data, providing a more efficient and semantically meaningful approach to visual interpretability. Experimental results confirm ShapBPT's effectiveness, demonstrating superior alignment with image structures and improved efficiency over existing XCV methods, and a 20-subject user study confirms that humans prefer ShapBPT explanations.
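
How hierarchical (Owen-style) Shapley values behave on a binary hierarchy can be sketched in a few lines. At every split of a binary tree a leaf's only competing group is its sibling subtree, which enters the coalition either entirely or not at all, each with weight 1/2; a leaf's value is therefore its marginal contribution averaged over all sibling configurations along its root path. This is an illustrative sketch, not ShapBPT's implementation: the tree is hand-built instead of being derived from image regions, the value function is a toy game rather than a model prediction, and the enumeration is exponential in depth with no reuse of evaluations.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]
ValueFn = Callable[[FrozenSet[int]], float]


def leaves(node: Tree) -> FrozenSet[int]:
    """All feature indices (regions) under a node of the binary tree."""
    if isinstance(node, int):
        return frozenset([node])
    left, right = node
    return leaves(left) | leaves(right)


def owen_values(tree: Tree, value_fn: ValueFn) -> Dict[int, float]:
    """Hierarchical Shapley (Owen-style) values on a binary partition hierarchy."""
    phi: Dict[int, float] = {}

    def recurse(node: Tree, siblings: List[FrozenSet[int]]) -> None:
        if isinstance(node, int):
            total = 0.0
            # Each ancestor-level sibling subtree is either fully in or fully out
            for mask in product((False, True), repeat=len(siblings)):
                coalition = frozenset().union(*(s for s, on in zip(siblings, mask) if on))
                total += value_fn(coalition | {node}) - value_fn(coalition)
            phi[node] = total / 2 ** len(siblings)
            return
        left, right = node
        recurse(left, siblings + [leaves(right)])
        recurse(right, siblings + [leaves(left)])

    recurse(tree, [])
    return phi


if __name__ == "__main__":
    tree = ((0, 1), (2, 3))  # hand-built hierarchy; a BPT would be built from the image
    # Toy game: the "prediction" fires only when regions 0 and 1 are both visible
    v = lambda s: 1.0 if {0, 1} <= s else 0.0
    print(owen_values(tree, v))  # {0: 0.5, 1: 0.5, 2: 0.0, 3: 0.0}
```

The attributions sum to `v(all) - v(empty)` (efficiency), and because only sibling groups are enumerated, a shallow, well-aligned tree needs far fewer coalitions than the flat Shapley formula over all regions.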

2602.06088 2026-03-26 cs.LG cs.AI

Transformer-Based Reinforcement Learning for Autonomous Orbital Collision Avoidance in Partially Observable Environments

Thomas Georges, Adam Abdin

2601.20732 2026-03-26 cs.LG cs.CV

Continual GUI Agents

Ziwei Liu, Borui Kang, Hangjie Yuan, Zixiang Zhao, Wei Li, Yifan Zhu, Tao Feng

Comments Code is available at: https://github.com/xavierliu34/GUI-AiF

2601.17535 2026-03-26 cs.CV

Will It Zero-Shot?: Predicting Zero-Shot Classification Performance For Arbitrary Queries

Kevin Robbins, Xiaotong Liu, Yu Wu, Le Sun, Grady McPeak, Abby Stylianou, Robert Pless

2601.16212 2026-03-26 cs.RO

Point Bridge: 3D Representations for Cross Domain Policy Learning

Siddhant Haldar, Lars Johannsmeier, Lerrel Pinto, Abhishek Gupta, Dieter Fox, Yashraj Narang, Ajay Mandlekar

2601.15368 2026-03-26 cs.CV eess.IV

Aligned Stable Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang, Junqiu Yu, Chenjie Cao, Xiangyang Xue, Yanwei Fu

Comments Extension of our CVPR 2025 highlight paper: arXiv:2312.04831. The paper was submitted to cs.CV but was classified under eess.IV. The authors appealed but had not received a response after one month; we therefore updated the comment to clarify the category.

2601.10402 2026-03-26 cs.AI

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Yuzhi Zhang, Linfeng Zhang, Weinan E, Siheng Chen, Yanfeng Wang

Comments 25 pages, 5 figures

2512.18128 2026-03-26 cs.CV

SERA-H: Beyond Native Sentinel Spatial Limits for High-Resolution Canopy Height Mapping

Thomas Boudras, Martin Schwartz, Rasmus Fensholt, Martin Brandt, Ibrahim Fayad, Jean-Pierre Wigneron, Gabriel Belouze, Fajwel Fogel, Philippe Ciais

Comments 17 pages, 8 figures, 3 tables

2512.16371 2026-03-26 cs.CV

Anchored Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models

Mariam Hassan, Bastien Van Delft, Wuyang Li, Alexandre Alahi

2512.08699 2026-03-26 cs.LG

A Dynamic Time Warping-Transfer Learning Approach to Transferring Knowledge in Stress-strain Behaviors from Polymers to Metals: An Affordable and Generalizable Additive Manufacturing Part Qualification Framework

Chenglong Duan, Dazhong Wu

DOI: 10.1016/j.aei.2026.104538

Abstract

Part qualification in additive manufacturing (AM) ensures that additively manufactured parts can be consistently produced and reliably used in critical applications. One crucial aspect of part qualification is determining the complex stress-strain behavior of additively manufactured parts. However, conventional part qualification techniques, such as destructive and non-destructive testing, are costly and time-consuming, especially for metal AM. To address this challenge, we develop a dynamic time warping (DTW)-transfer learning (TL) framework for AM part qualification that transfers knowledge gained from the stress-strain behaviors of low-cost additively manufactured polymers to expensive, high-performance metals. Specifically, the framework uses DTW to select, from multiple polymer datasets (Nylon, PLA, CF-ABS, and Resin), the single polymer dataset most similar to the metal dataset in the target domain. A long short-term memory (LSTM) model is then trained on this optimal polymer dataset and tested on one of three target metal datasets: AlSi10Mg, Ti6Al4V, and carbon steel. Experimental results show that the Resin dataset is selected as the optimal source-domain polymer dataset for the AlSi10Mg and Ti6Al4V datasets, while the Nylon dataset is selected for the carbon steel dataset. The DTWTL model trained on the single optimal polymer dataset as the source domain achieves the best predictive performance, with an average mean absolute percentage error of 12.41%, an average root mean squared error of 63.75, and an average coefficient of determination of 0.96 across the three target metals, outperforming both the vanilla LSTM model without TL and the TL model trained on all four polymer datasets as the source domain.
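
The source-selection step can be sketched with the classic dynamic-programming DTW distance, picking the candidate curve with the smallest distance to the target. This is an assumption-laden illustration rather than the authors' pipeline: each dataset is reduced to a single 1-D stress sequence, absolute difference is used as the local cost, and the curves and names below are made-up toy values, not data from the paper. The LSTM transfer-learning stage is not shown.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best path by a diagonal match, a step in a, or a step in b
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def select_source(target: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate curve closest to `target` under DTW."""
    return min(candidates, key=lambda name: dtw_distance(candidates[name], target))


if __name__ == "__main__":
    metal = [0.0, 120.0, 240.0, 300.0, 310.0]  # toy target curve
    polymers = {
        "polymer_a": [0.0, 60.0, 120.0, 240.0, 300.0, 310.0],
        "polymer_b": [0.0, 20.0, 40.0, 50.0, 55.0],
    }
    print(select_source(metal, polymers))  # polymer_a
```

DTW is a natural fit here because curves recorded at different strain rates or sampling densities can share a shape while misaligning point by point; the warping path absorbs those timing differences before similarity is scored.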

2512.03923 2026-03-26 cs.LG cs.NA math.NA physics.comp-ph

Quantum-Classical Physics-Informed Neural Networks for Solving Reservoir Seepage Equations

Xiang Rao, Yina Liu, Yuxuan Shen