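arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.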

2404.00636 2026-03-05 cs.CV cs.AI cs.MM

Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

Taekyung Ki, Dongchan Min, Gyeongsu Chae

Comments ECCV 2024. Project page: https://export3d.github.io

Abstract

In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that is able to control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator with an effective expression conditioning method, which directly generates a tri-plane of 3D prior by transferring the expression parameter of 3DMM into the source image. The tri-plane is then decoded into images of different views through differentiable volume rendering. Existing portrait animation methods rely heavily on image warping to transfer the expression in the motion space, which makes it challenging to disentangle appearance and expression. In contrast, we propose a contrastive pre-training framework for appearance-free expression parameters, eliminating undesirable appearance swap when transferring a cross-identity expression. Extensive experiments show that our pre-training framework can learn the appearance-free expression representation hidden in 3DMM, and our model can generate 3D-aware expression controllable portrait images without appearance swap in a cross-identity manner.
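
As a rough illustration of the tri-plane representation the abstract refers to, the sketch below looks up a 3D point in three axis-aligned feature planes. It is a generic, minimal sketch and not the Export3D implementation: the function names (`bilinear`, `query_triplane`, `constant_plane`), the summation used to merge the three plane features, and the toy constant planes are all assumptions made for illustration. Expression conditioning and volume rendering are not shown.

```python
import math


def bilinear(plane, u, v):
    """Bilinearly sample a 2D grid of feature vectors at (u, v) in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    # Map normalized coordinates to continuous pixel coordinates, clamped to the grid.
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1.0)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - tx) + plane[y0][x1][c] * tx
        bottom = plane[y1][x0][c] * (1 - tx) + plane[y1][x1][c] * tx
        out.append(top * (1 - ty) + bottom * ty)
    return out


def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ, and YZ planes and sum the sampled features.

    Summation is an assumed aggregation choice for this sketch.
    """
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def constant_plane(value, size=2, channels=2):
    """Hypothetical helper: a size x size plane whose features all equal `value`."""
    return [[[value] * channels for _ in range(size)] for _ in range(size)]


# Toy planes standing in for the output of a tri-plane generator.
planes = {
    "xy": constant_plane(1.0),
    "xz": constant_plane(2.0),
    "yz": constant_plane(3.0),
}
feature = query_triplane(planes, (0.0, 0.5, -0.5))
print(feature)
```

In a full pipeline, features like these would typically be decoded into density and color at points along camera rays before volume rendering; that step is omitted here.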

2311.16157 2026-03-05 cs.CV cs.LG eess.IV math.GT

GeoTop: Advancing Image Classification with Geometric-Topological Analysis

Mariem Abaach, Ian Morilla

Comments 37 pages, 6 figures

2306.17544 2026-03-05 cs.RO

Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle

Václav Pritzl, Matouš Vrba, Petr Štěpán, Martin Saska

Comments Accepted version

DOI: 10.1109/ACCESS.2026.3666998
Journal ref: IEEE Access, vol. 14, pp. 31269-31285, 2026

Abstract

A novel relative localization approach for guidance of a micro-scale Unmanned Aerial Vehicle (UAV) by a well-equipped aerial robot fusing Visual-Inertial Odometry (VIO) with Light Detection and Ranging (LiDAR) is proposed in this paper. LiDAR-based localization is accurate and robust to challenging environmental conditions, but 3D LiDARs are relatively heavy and require large UAV platforms, in contrast to lightweight cameras. However, visual-based self-localization methods exhibit lower accuracy and can suffer from significant drift with respect to the global reference frame. To benefit from both sensory modalities, we focus on cooperative navigation in a heterogeneous team of a primary LiDAR-equipped UAV and a secondary micro-scale camera-equipped UAV. We propose a novel cooperative approach combining LiDAR relative localization data with VIO output on board the primary UAV to obtain an accurate pose of the secondary UAV. The pose estimate is used to precisely and reliably guide the secondary UAV along trajectories defined in the primary UAV reference frame. The experimental evaluation has shown the superior accuracy of our method over the raw VIO output, reaching an average 3D Absolute Trajectory Error (ATE) of 0.28 m, and demonstrated its capability to guide the secondary UAV along desired trajectories while mitigating VIO drift. Thus, such a heterogeneous system can explore large areas with LiDAR precision, as well as visit locations inaccessible to the large LiDAR-carrying UAV platforms, as was showcased in a real-world cooperative mapping scenario.
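
For readers unfamiliar with the metric quoted in the abstract, the sketch below shows one common way to compute a 3D Absolute Trajectory Error. It is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes the estimated and ground-truth positions are already time-synchronized and expressed in the same reference frame, and the function name `absolute_trajectory_error` and the sample positions are hypothetical. Because ATE is reported both as a mean and as a root-mean-square value in practice, the function returns both.

```python
import math


def absolute_trajectory_error(estimated, ground_truth):
    """Return (mean, RMSE) of per-pose 3D position errors in meters.

    Assumes both trajectories are already aligned and sampled at the same times.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between each estimated position and its ground-truth match.
    errors = [math.dist(e, g) for e, g in zip(estimated, ground_truth)]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return mean_error, rmse


# Hypothetical sample positions (x, y, z) in meters, for illustration only.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.6)]

mean_error, rmse = absolute_trajectory_error(estimated, ground_truth)
print(f"mean ATE: {mean_error:.3f} m, RMSE ATE: {rmse:.3f} m")
```

The mean weights all poses equally, while the RMSE penalizes occasional large deviations more heavily, so the two can differ noticeably on trajectories with drift.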

2306.09459 2026-03-05 cs.LG cs.AI

Recurrent Action Transformer with Memory

Egor Cherepanov, Alexey Staroverov, Alexey K. Kovalev, Aleksandr I. Panov

Comments 29 pages, 22 figures, 13 tables

2303.07510 2026-03-05 cs.CV quant-ph

Schrödinger's Camera: First Steps Towards a Quantum-Based Privacy Preserving Camera

Hannah Kirkland, Sanjeev J. Koppal

2603.03597 2026-03-05 cs.LG

NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

Hadi Mohaghegh Dolatabadi, Thalaiyasingam Ajanthan, Sameera Ramasinghe, Chamin P Hewa Koneputugodage, Shamane Siriwardhana, Violetta Shevchenko, Karol Pajak, James Snewin, Gil Avraham, Alexander Long

Comments 47 pages, 22 figures, 18 tables

2603.03595 2026-03-05 cs.LG

Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration

Danish Rizvi, David Boyle

2603.03584 2026-03-05 cs.CV

Hazard-Aware Traffic Scene Graph Generation

Yaoqi Huang, Julie Stephany Berrio, Mao Shan, Stewart Worrall

2603.03583 2026-03-05 cs.CL cs.LG

ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer

Chunyuan Deng, Sanket Lokegaonkar, Colin Lockard, Besnik Fetahu, Nasser Zalmout, Xian Li

Comments ICLR 2026

2603.03580 2026-03-05 cs.CV

An Effective Data Augmentation Method by Asking Questions about Scene Text Images

Xu Yao, Lei Kang

Comments Accepted to ICASSP 2026

2603.03578 2026-03-05 cs.LG

Transport Clustering: Solving Low-Rank Optimal Transport via Clustering

Henri Schmidt, Peter Halmos, Ben Raphael

2603.03571 2026-03-05 cs.CV

Confidence-aware Monocular Depth Estimation for Minimally Invasive Surgery

Muhammad Asad, Emanuele Colleoni, Pritesh Mehta, Nicolas Toussaint, Ricardo Sanchez-Matilla, Maria Robu, Faisal Bashir, Rahim Mohammadi, Imanol Luengo, Danail Stoyanov

Comments 12 pages, 4 figures

2603.03564 2026-03-05 cs.CV

Modeling Cross-vision Synergy for Unified Large Vision Model

Shengqiong Wu, Lanhu Wu, Mingyang Bao, Wenhao Xu, Hanwang Zhang, Shuicheng Yan, Hao Fei, Tat-Seng Chua

Comments 21 pages, 9 figures, 16 tables, CVPR

2603.03556 2026-03-05 cs.RO cs.LG cs.SY eess.SY

Real-time tightly coupled GNSS and IMU integration via Factor Graph Optimization

Radu-Andrei Cioaca, Paul Irofti, Cristian Rusu, Gianluca Caparra, Andrei-Alexandru Marinache, Florin Stoican

2603.03546 2026-03-05 cs.RO cs.LG cs.SY eess.SY

Real-time loosely coupled GNSS and IMU integration via Factor Graph Optimization

Radu-Andrei Cioaca, Cristian Rusu, Paul Irofti, Gianluca Caparra, Andrei-Alexandru Marinache, Florin Stoican

2603.03544 2026-03-05 cs.CV

PinCLIP: Large-scale Foundational Multimodal Representation at Pinterest

Josh Beal, Eric Kim, Jinfeng Rao, Rex Wu, Dmitry Kislyuk, Charles Rosenberg

2603.03543 2026-03-05 cs.CL cs.AI

Tucano 2 Cool: Better Open Source LLMs for Portuguese

Nicholas Kluge Corrêa, Aniket Sen, Shiza Fatimah, Sophia Falk, Lennard Landgraf, Julia Kastner, Lucie Flek

2603.03541 2026-03-05 cs.CL cs.AI

RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

Aswini Sivakumar, Vijayan Sugumaran, Yao Qiang

Comments 7 pages, 1 figure

2603.03537 2026-03-05 cs.RO

Passive Phase-Oriented Impedance Shaping for Rapid Acceleration in Soft Robotic Swimmers

Qimin Feng, Orion A. Roberts, Qiang Zhong

Comments Submitted to the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

2603.03536 2026-03-05 cs.CL cs.AI cs.IR

SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

Haochang Hao, Yifan Xu, Xinzhuo Li, Yingqiang Ge, Lu Cheng

Comments 14 pages, 4 figures

2603.03535 2026-03-05 cs.LG

Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

Sanae Lotfi, Lucas Caccia, Alessandro Sordoni, Jordan T. Ash, Miroslav Dudik

2603.03531 2026-03-05 cs.LG cs.AI

Role-Aware Conditional Inference for Spatiotemporal Ecosystem Carbon Flux Prediction

Yiming Sun, Runlong Yu, Rongchao Dong, Shuo Chen, Licheng Liu, Youmi Oh, Qianlai Zhuang, Yiqun Xie, Xiaowei Jia

2603.03530 2026-03-05 cs.LG cs.AI

Directional Neural Collapse Explains Few-Shot Transfer in Self-Supervised Learning

Achleshwar Luthra, Yash Salunkhe, Tomer Galanti

2603.03529 2026-03-05 cs.LG cs.AI cs.NE

mlx-snn: Spiking Neural Networks on Apple Silicon via MLX

Jiahao Qin

Comments 11 pages 3 figures

2603.03527 2026-03-05 cs.LG

Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

Betul Yurdem, Ferhat Ozgur Catak, Murat Kuzlu, Mehmet Kemal Gullu

Comments 10 pages, 6 figures, 4 tables

2603.03523 2026-03-05 cs.LG math.OC

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

Shengbo Wang

2603.03517 2026-03-05 cs.LG cs.AI cs.CL

MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery

Maksim Kuznetsov, Zulfat Miftahutdinov, Rim Shayakhmetov, Mikolaj Mizera, Roman Schutski, Bogdan Zagribelnyy, Ivan Ilin, Nikita Bondarev, Thomas MacDougall, Mathieu Reymond, Mihir Bafna, Kaeli Kaymak-Loveless, Eugene Babin, Maxim Malkov, Mathias Lechner, Ramin Hasani, Alexander Amini, Vladimir Aladinskiy, Alex Aliper, Alex Zhavoronkov

2603.03514 2026-03-05 cs.RO

Sampling-Based Motion Planning with Scene Graphs Under Perception Constraints

Qingxi Meng, Emiliano Flores, Thai Duong, Vaibhav Unhelkar, Lydia E. Kavraki

Comments 8 pages, 5 figures, Accepted to R-AL

2603.03508 2026-03-05 cs.CL cs.AI

Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi

Shiza Fatimah, Aniket Sen, Sophia Falk, Florian Mai, Lucie Flek, Nicholas Kluge Corrêa

2603.03505 2026-03-05 cs.CV cs.AI

PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation

Shang Wu, Chenwei Xu, Zhuofan Xia, Weijian Li, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Han Liu