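arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.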

2603.04307 2026-03-05 cs.CV

Dual Diffusion Models for Multi-modal Guided 3D Avatar Generation

Hong Li, Yutang Feng, Minqi Meng, Yichen Yang, Xuhui Liu, Baochang Zhang

Comments 18 pages, 10 figures

English abstract

Generating high-fidelity 3D avatars from text or image prompts is highly sought after in virtual reality and human-computer interaction. However, existing text-driven methods often rely on iterative Score Distillation Sampling (SDS) or CLIP optimization, which struggle with fine-grained semantic control and suffer from excessively slow inference. Meanwhile, image-driven approaches are severely bottlenecked by the scarcity and high acquisition cost of high-quality 3D facial scans, limiting model generalization. To address these challenges, we first construct a novel, large-scale dataset comprising over 100,000 pairs across four modalities: fine-grained textual descriptions, in-the-wild face images, high-quality light-normalized texture UV maps, and 3D geometric shapes. Leveraging this comprehensive dataset, we propose PromptAvatar, a framework featuring dual diffusion models. Specifically, it integrates a Texture Diffusion Model (TDM) that supports flexible multi-condition guidance from text and/or image prompts, alongside a Geometry Diffusion Model (GDM) guided by text prompts. By learning the direct mapping from multi-modal prompts to 3D representations, PromptAvatar eliminates the need for time-consuming iterative optimization, successfully generating high-fidelity, shading-free 3D avatars in under 10 seconds. Extensive quantitative and qualitative experiments demonstrate that our method significantly outperforms existing state-of-the-art approaches in generation quality, fine-grained detail alignment, and computational efficiency.

2603.04305 2026-03-05 cs.RO

Perception-Aware Time-Optimal Planning for Quadrotor Waypoint Flight

Chao Qin, Jiaxu Xing, Rudolf Reiter, Angel Romero, Yifan Lin, Hugh H.-T. Liu, Davide Scaramuzza

2603.04304 2026-03-05 cs.CL

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Harman Singh, Xiuyu Li, Kusha Sareen, Monishwaran Maheswaran, Sijun Tan, Xiaoxia Wu, Junxiong Wang, Alpay Ariyak, Qingyang Wu, Samir Khaki, Rishabh Tiwari, Long Lian, Yucheng Lu, Boyi Li, Alane Suhr, Ben Athiwaratkun, Kurt Keutzer

2603.04302 2026-03-05 cs.CV

Motion Manipulation via Unsupervised Keypoint Positioning in Face Animation

Hong Li, Boyu Liu, Xuhui Liu, Baochang Zhang

Comments 19 pages, 15 figures

2603.04301 2026-03-05 cs.RO

Compliant In-hand Rolling Manipulation Using Tactile Sensing

Huan Weng, Yifei Chen, Kevin M. Lynch

2603.04293 2026-03-05 cs.SD cs.AI cs.IR cs.LG

LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

Ioannis Prokopiou, Ioannis Sina, Agisilaos Kounelis, Pantelis Vikatos, Themos Stafylakis

Comments Accepted at NLP4MusA 2026 (4th Workshop on NLP for Music and Audio)

2603.04292 2026-03-05 cs.CL

Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models

Liangwei Yang, Shiyu Wang, Haolin Chen, Rithesh Murthy, Ming Zhu, Jielin Qiu, Zixiang Chen, Juntao Tan, Jianguo Zhang, Zhiwei Liu, Wenting Zhao, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke

2603.04291 2026-03-05 cs.CV cs.AI

CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

Lingen Li, Guangzhi Wang, Xiaoyu Li, Zhaoyang Zhang, Qi Dou, Jinwei Gu, Tianfan Xue, Ying Shan

Comments Accepted to CVPR 2026

2603.04289 2026-03-05 cs.LG cs.AI

IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning

Yihao Qin, Yuanfei Wang, Hang Zhou, Peiran Liu, Hao Dong, Yiding Ji

2603.04288 2026-03-05 cs.CV

A multi-center analysis of deep learning methods for video polyp detection and segmentation

Noha Ghatwary, Pedro Chavarias Solano, Mohamed Ramzy Ibrahim, Adrian Krenzer, Frank Puppe, Stefano Realdon, Renato Cannizzaro, Jiacheng Wang, Liansheng Wang, Thuy Nuong Tran, Lena Maier-Hein, Amine Yamlahi, Patrick Godau, Quan He, Qiming Wan, Mariia Kokshaikyna, Mariia Dobko, Haili Ye, Heng Li, Ragu B, Antony Raj, Hanaa Nagdy, Osama E Salem, James E. East, Dominique Lamarque, Thomas de Lange, Sharib Ali

Comments 17 pages

2603.04284 2026-03-05 cs.RO

OmniPlanner: Universal Exploration and Inspection Path Planning across Robot Morphologies

Angelos Zacharia, Mihir Dharmadhikari, Mohit Singh, Kostas Alexis

Comments The code for this paper is open-sourced and released at: https://github.com/ntnu-arl/gbplanner_ros/tree/gbplanner3

2603.04277 2026-03-05 cs.RO cs.AI

VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

Yifei Chen, Xupeng Chen, Feng Wang, Niangang Jiao, Jiayin Liu

2603.04276 2026-03-05 cs.LG cs.AI cs.CL econ.EM

Causality Elicitation from Large Language Models

Takashi Kameyama, Masahiro Kato, Yasuko Hio, Yasushi Takano, Naoto Minakawa

2603.04272 2026-03-05 cs.CV

SSR: A Generic Framework for Text-Aided Map Compression for Localization

Mohammad Omama, Po-han Li, Harsh Goel, Minkyu Choi, Behdad Chalaki, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Ehsan Moradi Pari, Sandeep P. Chinchali

2603.04265 2026-03-05 cs.CV

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

Luigi Seminara, Davide Moltisanti, Antonino Furnari

Comments Accepted at CVPR 2026

2603.04257 2026-03-05 cs.CL cs.LG

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei

2603.04254 2026-03-05 cs.CV

EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

Seungjun Lee, Zihan Wang, Yunsong Wang, Gim Hee Lee

Comments CVPR 2026, Project Page: https://0nandon.github.io/EmbodiedSplat/

2603.04249 2026-03-05 cs.RO

RoboLight: A Dataset with Linearly Composable Illumination for Robotic Manipulation

Shutong Jin, Jin Yang, Muhammad Zahid, Florian T. Pokorny

2603.04247 2026-03-05 cs.LG cs.AI

Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Haoran Zhang, Seohyeon Cha, Hasan Burhan Beytur, Kevin S Chan, Gustavo de Veciana, Haris Vikalo

Comments preprint

2603.04241 2026-03-05 cs.AI cs.LG

Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows

Alfio Massimiliano Gliozzo, Junkyu Lee, Nahuel Defosse

Comments 14 pages, 4 figures

2603.04240 2026-03-05 cs.CV

DeNuC: Decoupling Nuclei Detection and Classification in Histopathology

Zijiang Yang, Chen Kuang, Dongmei Fu

Comments 10 pages

2603.04238 2026-03-05 cs.CL

Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG

Martin Asenov, Kenza Benkirane, Dan Goldwater, Aneiss Ghodsi

Comments ICLR 2026 Workshop I Can't Believe It's Not Better: Where Large Language Models Need to Improve

2603.04225 2026-03-05 cs.RO

AMP2026: A Multi-Platform Marine Robotics Dataset for Tracking and Mapping

Edwin Meriaux, Shuo Wen, David Widhalm, Zhizun Wang, Junming Shi, Mariana Sosa Guzmán, Kalvik Jakkala, Bennett Carley, Elias Sokolova, Yogesh Girdhar, Monika Roznere, Jason O'Kane, Junaed Sattar, Gregory Dudek

2603.04224 2026-03-05 cs.LG cs.CV

Nearest-Neighbor Density Estimation for Dependency Suppression

Kathleen Anderson, Thomas Martinetz

2603.04222 2026-03-05 cs.RO cs.AI

PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving

Yi Zhang, Xian Zhang, Saisi Zhao, Yinglei Song, Chengdong Wu, Nenad Petrovic, Alois Knoll

2603.04219 2026-03-05 cs.SD cs.AI eess.AS

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

Youngwon Choi, Jinwoo Oh, Hwayeon Kim, Hyeonyu Kim

Comments 6 pages, submitted to INTERSPEECH 2026

2603.04217 2026-03-05 cs.CL

When Do Language Models Endorse Limitations on Human Rights Principles?

Keenan Samway, Nicole Miu Takagi, Rada Mihalcea, Bernhard Schölkopf, Ilias Chalkidis, Daniel Hershcovich, Zhijing Jin

Comments EACL Findings 2026

2603.04209 2026-03-05 cs.LG

Beyond Edge Deletion: A Comprehensive Approach to Counterfactual Explanation in Graph Neural Networks

Matteo De Sanctis, Riccardo De Sanctis, Stefano Faralli, Paola Velardi, Bardh Prenkaj

2603.04208 2026-03-05 cs.RO

GSeg3D: A High-Precision Grid-Based Algorithm for Safety-Critical Ground Segmentation in LiDAR Point Clouds

Muhammad Haider Khan Lodhi, Christoph Hertzberg

2603.04205 2026-03-05 cs.CV

Real5-OmniDocBench: A Full-Scale Physical Reconstruction Benchmark for Robust Document Parsing in the Wild

Changda Zhou, Ziyue Gao, Xueqing Wang, Tingquan Gao, Cheng Cui, Jing Tang, Yi Liu