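arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.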

2505.05589 2026-03-06 cs.CV cs.AI cs.LG

ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation

Jingzhong Lin, Xinru Li, Yuanyuan Qi, Bohao Zhang, Wenxiang Liu, Kecheng Tang, Wenxuan Huang, Xiangfeng Xu, Bangyan Li, Changbo Wang, Gaoqi He

Abstract

Reactive dance generation (RDG), the task of generating a dance conditioned on a lead dancer's motion, holds significant promise for enhancing human-robot interaction and immersive digital entertainment. Despite progress in duet synchronization and motion-music alignment, two key challenges remain: generating fine-grained spatial interactions and ensuring long-term temporal coherence. In this work, we introduce ReactDance, a diffusion framework that operates on a novel hierarchical latent space to address these spatiotemporal challenges in RDG. First, for high-fidelity spatial expression and fine-grained control, we propose Hierarchical Finite Scalar Quantization (HFSQ). This multi-scale motion representation effectively disentangles coarse body posture from subtle limb dynamics, enabling independent and detailed control over both aspects through a layered guidance mechanism. Second, to efficiently generate long sequences with high temporal coherence, we propose Blockwise Local Context (BLC), a non-autoregressive sampling strategy. Departing from slow, frame-by-frame generation, BLC partitions the sequence into blocks and synthesizes them in parallel via periodic causal masking and positional encodings. Coherence across these blocks is ensured by a dense sliding-window training approach that enriches the representation with local temporal context. Extensive experiments show that ReactDance substantially outperforms state-of-the-art methods in motion quality, long-term coherence, and sampling efficiency. Project page: https://ripemangobox.github.io/ReactDance.
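
The abstract mentions partitioning a sequence into blocks with periodic causal masking and positional encodings. The sketch below is only one plausible reading of that idea, not the authors' implementation: it assumes a frame may attend to earlier frames inside its own block only, and that position indices restart at each block boundary. The function names `block_causal_mask` and `block_positions` and the fixed `block_size` are hypothetical.

```python
from typing import List


def block_causal_mask(seq_len: int, block_size: int) -> List[List[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumption (not from the paper): attention is causal inside a block
    and fully blocked across block boundaries.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            same_block = (i // block_size) == (j // block_size)
            row.append(same_block and j <= i)
        mask.append(row)
    return mask


def block_positions(seq_len: int, block_size: int) -> List[int]:
    """Position index that restarts at 0 at the start of every block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [i % block_size for i in range(seq_len)]
```

With `seq_len=6` and `block_size=3`, frame 3 opens the second block, so under these assumptions it attends only to itself and its position index is 0. Because no block depends on another block's output, all blocks could in principle be sampled in parallel.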

2505.04997 2026-03-06 cs.AI cs.MA

Foam-Agent: Towards Automated Intelligent CFD Workflows

Ling Yue, Nithin Somasekharan, Tingwen Zhang, Yadi Cao, Zhangze Chen, Shimin Di, Shaowu Pan

2505.03621 2026-03-06 cs.CV

PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing

Yiping Xie, Bo Zhao, Mingtong Dai, Jian-Ping Zhou, Yue Sun, Tao Tan, Weicheng Xie, Linlin Shen, Zitong Yu

Comments Accepted by International Conference on Learning Representations (ICLR) 2026

2504.13596 2026-03-06 cs.CV cs.RO

Collaborative Learning of Local 3D Occupancy Prediction and Versatile Global Occupancy Mapping

Shanshuai Yuan, Julong Wei, Muer Tie, Xiangyun Ren, Zhongxue Gan, Wenchao Ding

Comments Accepted by ICRA 2026

2504.11190 2026-03-06 cs.AI cs.CL

Enhancing multimodal analogical reasoning with Logic Augmented Generation

Anna Sofia Lippolis, Andrea Giovanni Nuzzolese, Aldo Gangemi

2504.10288 2026-03-06 cs.CV cs.LG physics.data-an

Noise2Ghost: Self-supervised deep convolutional reconstruction for ghost imaging

Mathieu Manni, Dmitry Karpov, K. Joost Batenburg, Sharon Shwartz, Nicola Viganò

2504.07654 2026-03-06 cs.LG cs.AI

ms-Mamba: Multi-scale Mamba for Time-Series Forecasting

Yusuf Meric Karadag, Ismail Talaz, Ipek Gursel Dino, Sinan Kalkan

Comments 14 pages. Accepted for publication in Neurocomputing

2503.15664 2026-03-06 cs.CL

Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation

Hisashi Johno, Yuki Johno, Akitomo Amakawa, Junichi Sato, Ryota Tozuka, Atsushi Komaba, Hiroaki Watanabe, Hiroki Watanabe, Chihiro Goto, Hiroyuki Morisaka, Hiroshi Onishi, Kazunori Nakamoto

Comments 11 pages, 6 figures, 2 tables, 6 supplementary files

DOI: 10.1007/s12194-026-01026-0

Abstract

Purpose: Retrieval-augmented generation (RAG) is a technology to enhance the functionality and reliability of large language models (LLMs) by retrieving relevant information from reliable external knowledge (REK). RAG has gained interest in radiology, and we previously reported the utility of NotebookLM, an LLM with RAG (RAG-LLM), for lung cancer staging. However, since the comparator LLM differed from NotebookLM's internal model, it remained unclear whether its advantage stemmed from RAG or inherent model differences. To better isolate RAG's impact and assess its utility across different cancers, we compared NotebookLM with its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment. Materials and Methods: A summary of Japan's pancreatic cancer staging guidelines was used as REK. We compared three groups - REK+/RAG+ (NotebookLM with REK), REK+/RAG- (Gemini 2.0 Flash with REK), and REK-/RAG- (Gemini 2.0 Flash without REK) - in staging 100 fictional pancreatic cancer cases based on CT findings. Staging criteria included TNM classification, local invasion factors, and resectability classification. In REK+/RAG+, retrieval accuracy was quantified based on the sufficiency of retrieved REK excerpts. Results: REK+/RAG+ achieved a staging accuracy of 70%, outperforming REK+/RAG- (38%) and REK-/RAG- (35%). For TNM classification, REK+/RAG+ attained 80% accuracy, exceeding REK+/RAG- (55%) and REK-/RAG- (50%). Additionally, REK+/RAG+ explicitly presented retrieved REK excerpts, achieving a retrieval accuracy of 92%. Conclusion: NotebookLM, a RAG-LLM, outperformed its internal LLM, Gemini 2.0 Flash, in a pancreatic cancer staging experiment, suggesting that RAG may improve LLM's staging accuracy. Furthermore, its ability to retrieve and present REK excerpts provides transparency for physicians, highlighting its applicability for clinical diagnosis and classification.

2503.11730 2026-03-06 cs.LG cs.AI

BACE-RUL: A Bi-directional Adversarial Network with Covariate Encoding for Machine Remaining Useful Life Prediction

Zekai Zhang, Dan Li, Shunyu Wu, Junya Cai, Bo Zhang, See Kiong Ng, Zibin Zheng

Comments This paper has been received as a research paper at CollaborateCom 2024

2503.07928 2026-03-06 cs.AI cs.HC

The StudyChat Dataset: Analyzing Student Dialogues With ChatGPT in an Artificial Intelligence Course

Hunter McNichols, Fareya Ikram, Andrew Lan

Comments LAK '26

2502.17100 2026-03-06 cs.LG cs.AI

Generative Models in Decision Making: A Survey

Xinyu Shao, Jianping Zhang, Haozhi Wang, Leo Maxime Brunswic, Kaiwen Zhou, Jiqian Dong, Kaiyang Guo, Zhitang Chen, Jun Wang, Jianye Hao, Xiu Li, Yinchuan Li

Comments Project page: https://github.com/xyshao23/Awesome-Generative-Models-for-Decision-Making-Taxonomy

2502.11682 2026-03-06 cs.LG math.OC stat.ML

Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy

Rustem Islamov, Samuel Horvath, Aurelien Lucchi, Peter Richtarik, Eduard Gorbunov

2502.05360 2026-03-06 cs.LG math.OC stat.ML

Curse of Dimensionality in Neural Network Optimization

Sanghoon Na, Haizhao Yang

Comments Accepted for publication in Information and Inference: A Journal of the IMA. 32 pages, 1 figure

2501.18864 2026-03-06 cs.CV

Flatness Guided Test-Time Adaptation for Vision-Language Models

Aodi Li, Liansheng Zhuang, Xiao Long, Houqiang Li, Shafei Wang

2412.02852 2026-03-06 cs.CV

Learnable Sparsity for Vision Generative Models

Yang Zhang, Er Jin, Wenzhong Liang, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, Kenji Kawaguchi

Comments Project page: https://yangzhang-v5.github.io/EcoDiff

2412.01639 2026-03-06 cs.RO

Vision-based Tactile Image Generation via Contact Condition-guided Diffusion Model

Xi Lin, Weiliang Xu, Yixian Mao, Jing Wang, Meixuan Lv, Lu Liu, Xihui Luo, Xinming Li

2412.00711 2026-03-06 cs.RO

GenTact Toolbox: A Computational Design Pipeline to Procedurally Generate Context-Driven 3D Printed Whole-Body Artificial Skins

Carson Kohlbrenner, Caleb Escobedo, S. Sandra Bae, Alexander Dickhans, Alessandro Roncone

Comments Camera ready accepted at the IEEE International Conference on Robotics and Automation (ICRA) 2025

2411.16758 2026-03-06 cs.CV

Motion-Aware Animatable Gaussian Avatars Deblurring

Muyao Niu, Yifan Zhan, Qingtian Zhu, Zhuoxiao Li, Wei Wang, Zhihang Zhong, Xiao Sun, Yinqiang Zheng

Comments Accepted at CVPR 2026, Codes: https://github.com/MyNiuuu/MAD-Avatar

2411.09847 2026-03-06 cs.LG stat.ML

Towards a Fairer Non-negative Matrix Factorization

Lara Kassab, Erin George, Deanna Needell, Haowen Geng, Nika Jafar Nia, Aoxi Li

2409.14545 2026-03-06 cs.AI

Why Is Anything Conscious?

Michael Timothy Bennett, Sean Welsh, Anna Ciaunica

2407.15738 2026-03-06 cs.LG cs.AI cs.DC

Parallel Split Learning with Global Sampling

Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink

Comments Accepted at the 2025 IEEE 3rd International Conference on Foundation and Large Language Models (FLLM). This version corresponds to the accepted manuscript

2406.14777 2026-03-06 cs.LG math.OC

Learning to Cover: Online Learning and Optimization with Irreversible Decisions

Alexandre Jacquillat, Michael Lingzhi Li

2405.18991 2026-03-06 cs.CV cs.CL cs.MM

EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation

Jiaqi Xu, Kunzhe Huang, Xinyi Zou, Yunkuo Chen, Bo Liu, MengLi Cheng, Jun Huang, Xing Shi

Comments 10 pages, 8 figures, ACM MM 2025

2404.16721 2026-03-06 cs.AI cs.LG

Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods

Min Kyu Shin, Su-Jeong Park, Seung-Keol Ryu, Heeyeon Kim, Han-Lim Choi

Comments Results have severe errors

2404.09982 2026-03-06 cs.CL

INMS: Memory Sharing for Large Language Model based Agents

Hang Gao, Yongfeng Zhang

2404.04037 2026-03-06 cs.CV cs.MM

InstructHumans: Editing Animated 3D Human Textures with Instructions

Jiayin Zhu, Linlin Yang, Angela Yao

Comments Accepted for publication in IEEE Transactions on Multimedia (TMM), 2025

2404.03759 2026-03-06 cs.LG eess.SP math.OC

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

Ege C. Kaya, Abolfazl Hashemi

Comments 29 pages, 7 figures. This work was presented in part at the 2023 Annual Conference on Communication, Control, and Computing (Allerton). The full work was published in IEEE Transactions on Signal Processing, 2024

2403.14000 2026-03-06 cs.RO

Visual Imitation Learning of Task-Oriented Object Grasping and Rearrangement

Yichen Cai, Jianfeng Gao, Christoph Pohl, Tamim Asfour

2403.08211 2026-03-06 cs.CL cs.AI

Large Language Models are Contrastive Reasoners

Liang Yao

2403.01977 2026-03-06 cs.RO cs.AI cs.CV

Seeing Through Uncertainty: A Free-Energy Approach for Real-Time Perceptual Adaptation in Robust Visual Navigation

Maytus Piriyajitakonkij, Rishabh Dev Yadav, Mingfei Sun, Mengmi Zhang, Wei Pan