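arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.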

2603.17396 2026-04-08 cs.CV

Gesture-Aware Pretraining and Token Fusion for 3D Hand Pose Estimation

Rui Hong, Jana Kosecka

Comments 6 pages, 6 figures

Abstract

Estimating 3D hand pose from monocular RGB images is fundamental for applications in AR/VR, human-computer interaction, and sign language understanding. In this work we focus on a scenario where a discrete set of gesture labels is available and show that gesture semantics can serve as a powerful inductive bias for 3D pose estimation. We present a two-stage framework: gesture-aware pretraining that learns an informative embedding space using coarse and fine gesture labels from InterHand2.6M, followed by a per-joint token Transformer guided by gesture embeddings as intermediate representations for final regression of MANO hand parameters. Training is driven by a layered objective over parameters, joints, and structural constraints. Experiments on InterHand2.6M demonstrate that gesture-aware pretraining consistently improves single-hand accuracy over the state-of-the-art EANet baseline, and that the benefit transfers across architectures without any modification.

2603.14093 2026-04-08 cs.LG cs.AI

Not All Latent Spaces Are Flat: Hyperbolic Concept Control

Maria Rosaria Briglia, Simone Facchiano, Paolo Cursi, Alessio Sampieri, Emanuele Rodolà, Guido Maria D'Amely di Melendugno, Luca Franco, Fabio Galasso, Iacopo Masi

2603.11911 2026-04-08 cs.CV

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model

InSpatio Team, Donghui Shen, Guofeng Zhang, Haomin Liu, Haoyu Ji, Jialin Liu, Jing Guo, Nan Wang, Siji Pan, Weihong Pan, Weijian Xie, Xiaojun Xiang, Xiaoyu Zhang, Xianbin Liu, Yifu Wang, Yipeng Chen, Zhewen Le, Zhichao Ye, Ziqiang Zhao

Comments Project page: https://inspatio.github.io/worldfm/ Code: https://github.com/inspatio/worldfm

2603.05414 2026-04-08 cs.AI cs.CL

Emergent Introspection in AI is Content-Agnostic

Harvey Lederman, Kyle Mahowald

Comments This version supersedes the earlier posted preprint, as discussed in this version

2602.07064 2026-04-08 cs.CV

OmniFysics: Towards Physical Intelligence Evolution via Omni-Modal Signal Processing and Network Optimization

Minghao Han, Dingkang Yang, Yue Jiang, Yizhou Liu, Lihua Zhang

Comments This work has been submitted to the IEEE for possible publication

2602.00913 2026-04-08 cs.CL cs.AI cs.LG

Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? A Study of Hierarchical Gating and Calibration

Víctor Yeste, Paolo Rosso

Comments Code: https://github.com/VictorMYeste/human-value-detection, models: https://huggingface.co/papers/2602.00913, 27 pages, 4 figures

2601.18336 2026-04-08 cs.CV cs.GR

PPISP: Physically-Plausible Compensation and Control of Photometric Variations in Radiance Field Reconstruction

Isaac Deutsch, Nicolas Moënne-Loccoz, Gavriel State, Zan Gojcic

Comments For more details and updates, please visit our project website: https://research.nvidia.com/labs/sil/projects/ppisp/

2601.16211 2026-04-08 cs.CV cs.AI

Why Can't I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

Geo Ahn, Inwoong Lee, Taeoh Kim, Minho Shim, Dongyoon Wee, Jinwoo Choi

Comments The code is available at https://github.com/KHU-VLL/RCORE

2601.14690 2026-04-08 cs.CV

FeedbackSTS-Det: Sparse Frames-Based Spatio-Temporal Semantic Feedback Network for Moving Infrared Small Target Detection

Yian Huang, Qing Qin, Aji Mao, Xiangyu Qiu, Liang Xu, Xian Zhang, Zhenming Peng

Comments Submitted to Journal IEEE Transactions on Circuits and Systems for Video Technology

2601.10075 2026-04-08 cs.CV cs.GR cs.LG

Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting

Lebin Zhou, Jingchuan Xiao, Zhendong Wang, Jinhao Wang, Rongduo Han, Nam Ling, Cihan Ruan

Comments 7 pages, 8 figures

2601.10073 2026-04-08 cs.CV cs.AI

ReaMIL: Reasoning- and Evidence-Aware Multiple Instance Learning for Whole-Slide Histopathology

Hyun Do Jung, Jungwon Choi, Hwiyoung Kim

Comments Accepted at LFMBio Workshop, WACV 2026. Oral Presentation

2601.09726 2026-04-08 cs.CL

Forgetting as a Feature: Cognitive Alignment of Large Language Models

Alexandros Christoforos

Comments arXiv admin note: This submission has been withdrawn by arXiv administrators due to incorrect authorship. Author list truncated

2601.05811 2026-04-08 cs.LG

Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem

Enrique Feito-Casares, Francisco M. Melgarejo-Meseguer, José-Luis Rojo-Álvarez

2601.04462 2026-04-08 cs.LG

Meta-probabilistic Modeling

Kevin Zhang, Yixin Wang

2601.03054 2026-04-08 cs.CV cs.AI

IBISAgent: Reinforcing Pixel-Level Visual Reasoning in MLLMs for Universal Biomedical Object Referring and Segmentation

Yankai Jiang, Qiaoru Li, Binlu Xu, Haoran Sun, Chao Ding, Junting Dong, Yuxiang Cai, Xuhong Zhang, Jianwei Yin

2512.20157 2026-04-08 cs.CV

SigLino: Efficient Multi-Teacher Distillation for Agglomerative Vision Foundation Models

Sofian Chaybouti, Sanath Narayan, Yasser Dahou, Phúc H. Lê Khac, Ankit Singh, Ngoc Dung Huynh, Wamiq Reyaz Para, Hilde Kuehne, Hakim Hacid

Comments 17 pages, 8 figures, 11 tables

2512.16811 2026-04-08 cs.CV cs.RO

GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation

Jingjing Qian, Boyao Han, Chen Shi, Lei Xiao, Long Yang, Shaoshuai Shi, Li Jiang

2512.11013 2026-04-08 cs.CL

PIAST: Rapid Prompting with In-context Augmentation for Scarce Training data

Pawel Batorski, Paul Swoboda

2511.15424 2026-04-08 cs.CL

LLM-MemCluster: Empowering Large Language Models with Dynamic Memory for Text Clustering

Yuanjie Zhu, Liangwei Yang, Ke Xu, Weizhi Zhang, Zihe Song, Jindong Wang, Philip S. Yu

2511.07969 2026-04-08 cs.CL

Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker

Matthias De Lange, Jens-Joris Decorte, Jeroen Van Hautte

Comments Preprint, 9 pages

2511.03819 2026-04-08 cs.CV q-bio.QM

SiLVi: Simple Interface for Labeling Video Interactions

Ozan Kanbertay, Richard Vogg, Elif Karakoc, Peter M. Kappeler, Claudia Fichtel, Alexander S. Ecker

Comments Documentation link updated, Linux version added

2511.01594 2026-04-08 cs.RO cs.CV

MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence

Renjun Gao

Comments 3 figures, 1 table

2511.00503 2026-04-08 cs.CV

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models

Panwang Pan, Chenguo Lin, Jingjing Zhao, Chenxin Li, Yuchen Lin, Haopeng Li, Honglei Yan, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu

Comments Accepted to CVPR 2026. Project page: https://paulpanwang.github.io/Diff4Splat

2510.23409 2026-04-08 cs.LG cs.AI

Eigen-Value: Efficient Domain-Robust Data Valuation via Eigenvalue-Based Approach

Youngjun Choi, Joonseong Kang, Sungjun Lim, Kyungwoo Song

2510.18117 2026-04-08 cs.CV

Online In-Context Distillation for Low-Resource Vision Language Models

Zhiqi Kang, Rahaf Aljundi, Vaggelis Dorovatas, Karteek Alahari

2510.17018 2026-04-08 cs.CL cs.LG

CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification

Noor Islam S. Mohammad

2510.12957 2026-04-08 cs.LG cs.AI

Reveal-to-Revise: Explainable Bias-Aware Generative Modeling with Multimodal Attention

Noor Islam S. Mohammad, Md Muntaqim Meherab

Comments We are recently authors in conflict with this work; I am heartily requesting to withdraw this paper as soon as possible

2510.07310 2026-04-08 cs.CV

MATRIX: Mask Track Alignment for Interaction-aware Video Generation

Siyoon Jin, Seongchan Kim, Dahyun Chung, Jaeho Lee, Hyunwook Choi, Jisu Nam, Jiyoung Kim, Seungryong Kim

Comments Project Page is available at: https://cvlab-kaist.github.io/MATRIX/, ICLR 2026

2510.01025 2026-04-08 cs.AI cs.CL

Hypothesis-Driven Feature Manifold Analysis in LLMs via Supervised Multi-Dimensional Scaling

Federico Tiblias, Irina Bigoulaeva, Jingcheng Niu, Simone Balloccu, Iryna Gurevych

Comments Published in TMLR (March 2026) | OpenReview: https://openreview.net/forum?id=vCKZ40YYPr | Code: https://github.com/UKPLab/tmlr2026-manifold-analysis

2509.18633 2026-04-08 cs.AI q-fin.RM

Modelling Cascading Physical Climate Risk in Supply Chains with Adaptive Firms: A Spatial Agent-Based Framework

Yara Mohajerani

Comments V1 presented at NeurIPS 2025 Tackling Climate Change with Machine Learning workshop. V4 replaces evolutionary learning with explicit firm continuity adaptation, adds stock-flow consistency, matched-seed ensembles, cascade diagnostics, and internal validations. Code: https://github.com/yaramohajerani/spatial-climate-ABM