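arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.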

2603.26938 2026-03-31 cs.CV

From 3D Pose to Prose: Biomechanics-Grounded Vision-Language Coaching

Yuyang Ji, Yixuan Shen, Shengjie Zhu, Yu Kong, Feng Liu

Abstract

We present BioCoach, a biomechanics-grounded vision-language framework for fitness coaching from streaming video. BioCoach fuses visual appearance and 3D skeletal kinematics through a novel three-stage pipeline: an exercise-specific degree-of-freedom selector that focuses analysis on salient joints; a structured biomechanical context that pairs individualized morphometrics with cycle and constraint analysis; and a vision-biomechanics conditioned feedback module that applies cross-attention to generate precise, actionable text. Using parameter-efficient training that freezes the vision and language backbones, BioCoach yields transparent, personalized reasoning rather than pattern matching. To enable learning and fair evaluation, we augment QEVD-fit-coach with biomechanics-oriented feedback to create QEVD-bio-fit-coach, and we introduce a biomechanics-aware LLM judge metric. BioCoach delivers clear gains on QEVD-bio-fit-coach across lexical and judgment metrics while maintaining temporal triggering; on the original QEVD-fit-coach, it improves text quality and correctness with near-parity timing, demonstrating that explicit kinematics and constraints are key to accurate, phase-aware coaching.

2603.25716 2026-03-31 cs.CV cs.AI

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Kaijin Chen, Dingkang Liang, Xin Zhou, Yikang Ding, Xiaoqiang Liu, Pengfei Wan, Xiang Bai

Comments Project Page: https://kj-chen666.github.io/Hybrid-Memory-in-Video-World-Models/ Code: https://github.com/H-EmbodVis/HyDRA

2603.25053 2026-03-31 cs.CV

GaussFusion: Improving 3D Reconstruction in the Wild with A Geometry-Informed Video Generator

Liyuan Zhu, Manjunath Narayana, Michal Stary, Will Hutchcroft, Gordon Wetzstein, Iro Armeni

Comments CVPR 2026 main paper camera-ready. Project page: http://research.zhuliyuan.net/projects/GaussFusion/

2603.24984 2026-03-31 cs.CV

MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models

Dohwan Ko, Jinyoung Park, Seoung Choi, Sanghyeok Lee, Seohyun Lee, Hyunwoo J. Kim

Comments Accepted at CVPR 2026

2603.24903 2026-03-31 cs.CV

Self-Supervised Learning for Knee Osteoarthritis: Diagnostic Limitations and Prognostic Value of Hospital Data

Haresh Rengaraj Rajamohan, Yuxuan Chen, Kyunghyun Cho, Cem M. Deniz

2603.24749 2026-03-31 cs.CV

TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval

David G. Shatwell, Sirnam Swetha, Mubarak Shah

Comments Accepted in CVPR 2026

2603.24691 2026-03-31 cs.CV

BCMDA: Bidirectional Correlation Maps Domain Adaptation for Mixed Domain Semi-Supervised Medical Image Segmentation

Bentao Song, Jun Huang, Qingfeng Wang

Comments Accepted at Neural Networks

DOI: 10.1016/j.neunet.2026.108877

Abstract

In mixed domain semi-supervised medical image segmentation (MiDSS), achieving superior performance under domain shift and limited annotations is challenging. This scenario presents two primary issues: (1) distributional differences between labeled and unlabeled data hinder effective knowledge transfer, and (2) inefficient learning from unlabeled data causes severe confirmation bias. In this paper, we propose the bidirectional correlation maps domain adaptation (BCMDA) framework to overcome these issues. On the one hand, we employ knowledge transfer via virtual domain bridging (KTVDB) to facilitate cross-domain learning. First, to construct a distribution-aligned virtual domain, we leverage bidirectional correlation maps between labeled and unlabeled data to synthesize both labeled and unlabeled images, which are then mixed with the original images to generate virtual images using two strategies, a fixed ratio and a progressive dynamic MixUp. Next, dual bidirectional CutMix is used to enable initial knowledge transfer within the fixed virtual domain and gradual knowledge transfer from the dynamically transitioning labeled domain to the real unlabeled domains. On the other hand, to alleviate confirmation bias, we adopt prototypical alignment and pseudo label correction (PAPLC), which utilizes learnable prototype cosine similarity classifiers for bidirectional prototype alignment between the virtual and real domains, yielding smoother and more compact feature representations. Finally, we use prototypical pseudo label correction to generate more reliable pseudo labels. Empirical evaluations on three public multi-domain datasets demonstrate the superiority of our method, particularly showing excellent performance even with very limited labeled samples. Code available at https://github.com/pascalcpp/BCMDA.

2603.24569 2026-03-31 cs.CV

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kumar Das, Monorama Swain, Yufang Hou, Elisabeth Andre, Khalid Mahmood Malik, Markus Schedl, Shah Nawaz

Comments Grand challenge at ACM MM 2026

2603.22339 2026-03-31 cs.LG cs.CL stat.ML

Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits

Eric Czech, Zhiwei Xu, Yael Elmatad, Yixin Wang, William Held

2603.22054 2026-03-31 cs.CV

FontCrafter: High-Fidelity Element-Driven Artistic Font Creation with Visual In-Context Generation

Wuyang Luo, Chengkai Tan, Chang Ge, Binye Hong, Su Yang, Yongjiu Ma

Comments To appear in CVPR 2026

2603.22041 2026-03-31 cs.CV

DTVI: Dual-Stage Textual and Visual Intervention for Safe Text-to-Image Generation

Binhong Tan, Zhaoxin Wang, Handing Wang

2603.22015 2026-03-31 cs.CL

Retrieving Climate Change Disinformation by Narrative

Max Upravitelev, Veronika Solopova, Charlott Jakob, Premtim Sahitaj, Sebastian Möller, Vera Schmitt

2603.21879 2026-03-31 cs.LG cs.AI

SmaAT-QMix-UNet: A Parameter-Efficient Vector-Quantized UNet for Precipitation Nowcasting

Nikolas Stavrou, Siamak Mehrkanoon

Comments 6 pages, 5 figures

2603.21876 2026-03-31 cs.CV

Thermal Topology Collapse: Universal Physical Patch Attacks on Infrared Vision Systems

Chengyin Hu, Yikun Guo, Yuxian Dong, Qike Zhang, Kalibinuer Tiliwalidi, Yiwei Wei, Haitao Shi, Jiujiang Guo, Jiahuan Long, Xiang Chen

2603.21806 2026-03-31 cs.CV

Anatomical Token Uncertainty for Transformer-Guided Active MRI Acquisition

Lev Ayzenberg, Shady Abu-Hussein, Raja Giryes, Hayit Greenspan

2603.21636 2026-03-31 cs.AI cs.CL

Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks

Yiliang Song, Hongjun An, Jiangan Chen, Xuanchen Yan, Huan Song, Jiawei Shao, Xuelong Li

Comments Remove the NeurIPS 2026 template

2603.21356 2026-03-31 cs.CV

FluidGaussian: Propagating Simulation-Based Uncertainty Toward Functionally-Intelligent 3D Reconstruction

Yuqiu Liu, Jialin Song, Marissa Ramirez de Chanlatte, Rochishnu Chowdhury, Rushil Paresh Desai, Wuyang Chen, Daniel Martin, Michael W. Mahoney

Comments Accepted by CVPR 2026

2603.21332 2026-03-31 cs.CV

EmoTaG: Emotion-Aware Talking Head Synthesis on Gaussian Splatting with Few-Shot Personalization

Haolan Xu, Keli Cheng, Lei Wang, Ning Bi, Xiaoming Liu

Comments Accepted by CVPR 2026. Page: https://emotag26.github.io/

2603.20957 2026-03-31 cs.CL cs.AI cs.CY

Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Xinyue Liu, Niloofar Mireshghallah, Jane C. Ginsburg, Tuhin Chakrabarty

Comments Preprint Under Review

2603.20339 2026-03-31 cs.LG cs.CR

Graph-Aware Stealthy Poison-Text Backdoors for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng

Comments 13 pages

2603.19278 2026-03-31 cs.CL cs.AI

HypeLoRA: Hyper-Network-Generated LoRA Adapters for Calibrated Language Model Fine-Tuning

Bartosz Trojan, Filip Gębala

Comments 12 pages, 2 figures, 2 tables

2603.19013 2026-03-31 cs.CV

GenHOI: Generalized Hand-Object Pose Estimation with Occlusion Awareness

Hui Yang, Wei Sun, Jian Liu, Jian Xiao, Tao Xie, Hossein Rahmani, Ajmal Saeed Mian, Nicu Sebe, Gim Hee Lee

Comments 25 pages, 7 figures

2603.18532 2026-03-31 cs.RO cs.AI cs.LG

Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds

Andrew Choi, Xinjie Wang, Zhizhong Su, Wei Xu

2603.17388 2026-03-31 cs.CV

Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis

Rui Hong, Jana Kosecka

Comments 8 pages, 4 figures

2603.17227 2026-03-31 cs.CV

Adaptive Anchor Policies for Efficient 4D Gaussian Streaming

Ashim Dahal, Rabab Abdelfattah, Nick Rahimi

2603.16649 2026-03-31 cs.CV

Mixture of Style Experts for Diverse Image Stylization

Shihao Zhu, Ziheng Ouyang, Yijia Kang, Qilong Wang, Mi Zhou, Bo Li, Ming-Ming Cheng, Qibin Hou

Comments 24 pages, 16 figures

2603.16249 2026-03-31 cs.CV

Synergizing Deep Learning and Biological Heuristics for Extreme Long-Tail White Blood Cell Classification

Duc T. Nguyen, Hoang-Long Nguyen, Huy-Hieu Pham

Comments Accepted at IEEE ISBI 2026

2603.14903 2026-03-31 cs.CL

ExPosST: Explicit Positioning with Adaptive Masking for LLM-Based Simultaneous Machine Translation

Yuzhe Shang, Pengzhi Gao, Yazheng Yang, Jiayao Ma, Wei Liu, Jian Luan, Jinsong Su

2603.14575 2026-03-31 cs.LG cs.CL stat.ML

CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad

Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang

Comments Preprint of ongoing work; Yongqiang and Chenxi contributed equally.

2603.14498 2026-03-31 cs.RO cs.CV

R3DP: Real-Time 3D-Aware Policy for Embodied Manipulation

Yuhao Zhang, Wanxi Dong, Yue Shi, Yi Liang, Jingnan Gao, Qiaochu Yang, Yaxing Lyu, Zhixuan Liang, Yibin Liu, Congsheng Xu, Xianda Guo, Wei Sui, Yaohui Jin, Xiaokang Yang, Yanyan Xu, Yao Mu

Comments Project Page: https://dazazh.github.io/r3dp-project-page/ Github Repo: https://github.com/dazazh/R3DP