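arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.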

2603.08709 2026-03-10 cs.CV cs.AI

Scale Space Diffusion

Soumik Mukhopadhyay, Prateksha Udhayanan, Abhinav Shrivastava

Comments: Project website: https://prateksha.github.io/projects/scale-space-diffusion/. The first two authors contributed equally.

Abstract

Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images, which raises the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations and practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. Our project website ( https://prateksha.github.io/projects/scale-space-diffusion/ ) is publicly available.
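To make the idea of a diffusion process with a linear degradation concrete, here is a minimal sketch, assuming a DDPM-style forward process x_t = sqrt(abar_t) * D_t(x_0) + sqrt(1 - abar_t) * eps in which the degradation operator D_t is repeated 2x2 average pooling, so the noisy state shrinks as t grows. The cosine noise schedule, the rule mapping t to a number of downsampling steps, and all function names are illustrative assumptions, not the paper's formulation; images are plain nested lists to keep the example dependency-free.

```python
import math
import random


def avg_pool2x(img):
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    return [
        [(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def degrade(x0, t, T, levels):
    """Hypothetical linear degradation D_t: downsample (t * levels) // T times."""
    x = x0
    for _ in range((t * levels) // T):
        x = avg_pool2x(x)
    return x


def forward_sample(x0, t, T, levels, rng):
    """Sample x_t = sqrt(abar_t) * D_t(x0) + sqrt(1 - abar_t) * eps.

    eps is drawn at the resolution of D_t(x0), so the noisy state
    shrinks as t grows instead of staying at full resolution.
    """
    abar = math.cos(0.5 * math.pi * t / T) ** 2  # cosine schedule (assumed)
    mean = degrade(x0, t, T, levels)
    a, s = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in row] for row in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [[float((i + j) % 4) for j in range(16)] for i in range(16)]
    for t in (0, 500, 1000):
        xt = forward_sample(x0, t, T=1000, levels=3, rng=rng)
        print(t, len(xt), len(xt[0]))  # timestep, height, width
```

Running it prints `0 16 16`, `500 8 8`, and `1000 2 2`: in this toy setup the most heavily noised states live on a 2x2 grid, echoing the abstract's point that such states need not be processed at full resolution.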

2603.08708 2026-03-10 cs.CV

FVG-PT: Adaptive Foreground View-Guided Prompt Tuning for Vision-Language Models

Haoyang Li, Liang Wang, Siyu Zhou, Jiacheng Sun, Jing Jiang, Chao Wang, Guodong Long, Yan Peng

Comments: 27 pages, 9 figures, 15 tables

2603.08706 2026-03-10 cs.AI cs.CL cs.LG

Agentic Critical Training

Weize Liu, Minghui Liu, Sy-Tuyen Ho, Souradip Chakraborty, Xiyao Wang, Furong Huang

Comments: Project page: https://attention-is-all-i-need.github.io/ACT/

2603.08704 2026-03-10 cs.AI

Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines

Akshay Gulati, Kanha Singhania, Tushar Banga, Parth Arora, Anshul Verma, Vaibhav Kumar Singh, Agyapal Digra, Jayant Singh Bisht, Danish Sharma, Varun Singla, Shubh Garg

Comments: 12 pages, 6 figures, 5 tables

2603.08703 2026-03-10 cs.CV

HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising

Kai Zou, Dian Zheng, Hongbo Liu, Tiankai Hang, Bin Liu, Nenghai Yu

Comments: Project page: https://jacky-hate.github.io/HiAR/, code: https://github.com/Jacky-hate/HiAR

2603.08692 2026-03-10 cs.AI

A Multi-Objective Optimization Approach for Sustainable AI-Driven Entrepreneurship in Resilient Economies

Anas ALsobeh, Raneem Alkurdi

Comments: 35 pages

2603.08687 2026-03-10 cs.LG cs.AI

Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

Yiannis Papageorgiou, Yannis Thomas, Ramin Khalili, Iordanis Koutsopoulos

2603.08681 2026-03-10 cs.CV

ER-Pose: Rethinking Keypoint-Driven Representation Learning for Real-Time Human Pose Estimation

Nanjun Li, Pinqi Cheng, Zean Liu, Minghe Tian, Xuanyin Wang

Abstract

Single-stage multi-person pose estimation aims to jointly perform human localization and keypoint prediction within a unified framework, offering advantages in inference efficiency and architectural simplicity. Consequently, multi-scale real-time detection architectures, such as YOLO-like models, are widely adopted for real-time pose estimation. However, these approaches typically inherit a box-driven modeling paradigm from object detection, in which pose estimation is implicitly constrained by bounding-box supervision during training. This formulation introduces biases in sample assignment and feature representation, resulting in task misalignment and ultimately limiting pose estimation accuracy. In this work, we revisit box-driven single-stage pose estimation from a keypoint-driven perspective and identify semantic conflicts among parallel objectives as a key source of performance degradation. To address this issue, we propose a keypoint-driven learning paradigm that elevates pose estimation to a primary prediction objective. Specifically, we remove bounding-box prediction and redesign the prediction head to better accommodate the high-dimensional structured representations required for pose estimation. We further introduce a keypoint-driven dynamic sample assignment strategy that aligns training objectives with pose evaluation metrics, enabling dense supervision during training and efficient NMS-free inference. In addition, we propose a smooth OKS-based loss function to stabilize optimization in regression-based pose estimation. Based on these designs, we develop a single-stage multi-person pose estimation framework, termed ER-Pose. Compared with the YOLO-Pose baseline, ER-Pose-n improves AP on MS COCO and CrowdPose by 3.2 and 6.7, respectively, without pre-training, and by 7.4 and 4.9, respectively, with pre-training. These improvements are achieved with fewer parameters and higher inference efficiency.
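The abstract mentions a smooth OKS-based loss without defining it. As a hedged reference point, the sketch below computes the standard COCO object keypoint similarity (OKS), where each labelled keypoint contributes exp(-d_i^2 / (2 * s^2 * k_i^2)) with s^2 the object area and k_i = 2 * sigma_i, and turns it into a plain 1 - OKS loss in the style of YOLO-Pose. The smoothing that ER-Pose adds is not described in the abstract and is not reproduced here; the function names and the example sigmas are illustrative assumptions.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """COCO-style object keypoint similarity between one predicted and one GT pose.

    pred, gt: lists of (x, y) keypoints; visible: GT visibility flags;
    area: GT object area (plays the role of s^2); sigmas: per-keypoint constants.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if not v:  # unlabelled keypoints do not contribute
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2  # COCO uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


def oks_loss(pred, gt, visible, area, sigmas):
    """Plain 1 - OKS regression loss (not ER-Pose's smoothed variant)."""
    return 1.0 - oks(pred, gt, visible, area, sigmas)
```

A perfect prediction on every visible keypoint gives OKS 1 and loss 0, and errors are scaled by object area and per-keypoint sigma, which is why an OKS-based loss tracks the COCO evaluation metric more closely than a raw L1 distance does.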

2603.08679 2026-03-10 cs.LG cs.AI cs.GT econ.TH

A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search

Yang Cai, Vineet Gupta, Zun Li, Aranyak Mehta

2603.08674 2026-03-10 cs.CV

Talking Together: Synthesizing Co-Located 3D Conversations from Audio

Mengyi Shan, Shouchieh Chang, Ziqian Bai, Shichen Liu, Yinda Zhang, Luchuan Song, Rohit Pandey, Sean Fanello, Zeng Huang

Comments: Accepted to CVPR 2026

2603.08668 2026-03-10 cs.RO

Exp-Force: Experience-Conditioned Pre-Grasp Force Selection with Vision-Language Models

Siqi Shang, Minchao Huang, Bill Fan, Lillian Chin

2603.08661 2026-03-10 cs.CV

ImprovedGS+: A High-Performance C++/CUDA Re-Implementation Strategy for 3D Gaussian Splatting

Jordi Muñoz Vicente

Comments: 6 pages, 1 figure. Technical report. This work introduces ImprovedGS+, a library-free C++/CUDA implementation of 3D Gaussian Splatting within the LichtFeld-Studio framework. Source code is available at https://github.com/jordizv/ImprovedGS-Plus

2603.08660 2026-03-10 cs.LG cs.CL

How Far Can Unsupervised RLVR Scale LLM Training?

Bingxiang He, Yuxin Zuo, Zeyuan Liu, Shangziqi Zhao, Zixuan Fu, Junlin Yang, Cheng Qian, Kaiyan Zhang, Yuchen Fan, Ganqu Cui, Xiusi Chen, Youbang Sun, Xingtai Lv, Xuekai Zhu, Li Sheng, Ran Li, Huan-ang Gao, Yuchen Zhang, Bowen Zhou, Zhiyuan Liu, Ning Ding

Comments: Accepted to ICLR 2026

2603.08658 2026-03-10 cs.LG

Context-free Self-Conditioned GAN for Trajectory Forecasting

Tiago Rodrigues de Almeida, Eduardo Gutierrez Maestro, Oscar Martinez Mozos

Comments: Accepted at the 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)

2603.08655 2026-03-10 cs.AI cs.CL cs.IR

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen Oertell, Jacob Portes, Sam Havens, Erich Elsen, Michael Bendersky, Matei Zaharia, Xing Chen

Comments: 24 pages, 16 figures. Introduces the OfficeQA Pro benchmark for grounded reasoning over enterprise documents.

2603.08652 2026-03-10 cs.AI

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

Haodong Li, Chunmei Qing, Huanyu Zhang, Dongzhi Jiang, Yihang Zou, Hongbo Peng, Dingming Li, Yuhong Dai, ZePeng Lin, Juanxi Tian, Yi Zhou, Siqi Dai, Jingwei Wu

Comments: 21 pages, 7 figures, 7 tables

2603.08649 2026-03-10 cs.LG

Divide and Predict: An Architecture for Input Space Partitioning and Enhanced Accuracy

Fenix W. Huang, Henning S. Mortveit, Christian M. Reidys

Comments: Under review; 24 pages; 8 figures

2603.08648 2026-03-10 cs.CV

CAST: Modeling Visual State Transitions for Consistent Video Retrieval

Yanqing Liu, Yingcheng Liu, Fanghong Dong, Budianto Budianto, Cihang Xie, Yan Jiao

2603.08647 2026-03-10 cs.LG

Grow, Don't Overwrite: Fine-tuning Without Forgetting

Dyah Adila, Hanna Mazzawi, Benoit Dherin, Xavier Gonzalvo

2603.08645 2026-03-10 cs.CV cs.GR cs.LG

Retrieval-Augmented Gaussian Avatars: Improving Expression Generalization

Matan Levy, Gavriel Habib, Issar Tzachor, Dvir Samuel, Rami Ben-Ari, Nir Darshan, Or Litany, Dani Lischinski

2603.08620 2026-03-10 cs.CV

StreamReady: Learning What to Answer and When in Long Streaming Videos

Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat

Comments: Accepted to CVPR 2026

2603.08619 2026-03-10 cs.RO

Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery

Nehar Poddar, Stephen McCrory, Luigi Penco, Geoffrey Clark, Hakki Erhan Svil, Robert Griffin

2603.08617 2026-03-10 cs.RO

Diff-Muscle: Efficient Learning for Musculoskeletal Robotic Table Tennis

Wentao Zhao, Jun Guo, Kangyao Huang, Xin Liu, Huaping Liu

Comments: 8 pages, 7 figures

2603.08611 2026-03-10 cs.CV cs.RO

FOMO-3D: Using Vision Foundation Models for Long-Tailed 3D Object Detection

Anqi Joyce Yang, James Tu, Nikita Dvornik, Enxu Li, Raquel Urtasun

Comments: Published at the 9th Annual Conference on Robot Learning (CoRL 2025)

2603.08600 2026-03-10 cs.LG cs.AI

Don't Look Back in Anger: MAGIC Net for Streaming Continual Learning with Temporal Dependence

Federico Giannini, Sandro D'Andrea, Emanuele Della Valle

2603.08599 2026-03-10 cs.RO

Bilevel Planning with Learned Symbolic Abstractions from Interaction Data

Fatih Dogangun, Burcu Kilic, Serdar Bahar, Emre Ugur

2603.08589 2026-03-10 cs.CV

CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing

Yucheng Wang, Zedong Wang, Yuetong Wu, Yue Ma, Dan Xu

Comments: Accepted by CVPR 2026. Project page: https://care-edit.github.io/

2603.08583 2026-03-10 cs.LG cs.CV

DualFlexKAN: Dual-stage Kolmogorov-Arnold Networks with Independent Function Control

Andrés Ortiz, Nicolás J. Gallego-Molina, Carmen Jiménez-Mesa, Juan M. Górriz, Javier Ramírez

Comments: 22 pages, 12 figures

2603.08578 2026-03-10 cs.LG cs.CL

Drift-to-Action Controllers: Budgeted Interventions with Online Risk Certificates

Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh

Comments: Published as a conference paper at the CAO Workshop at ICLR 2026

2603.08575 2026-03-10 cs.AI cs.LG

Trust via Reputation of Conviction

Aravind R. Iyengar

Comments: 19 pages, 4 figures