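arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.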

2603.05276 2026-03-06 cs.LG cs.AI

Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev

Abstract

In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper introduces the Whisperer, a novel visual prompting framework that learns diffusion-based preprocessors to adapt inputs in pixel space, effectively "whispering" enhancements to frozen downstream models like EasyOCR. By framing the process as behavioral cloning of stochastically discovered improvement policies, our method achieves an 8% absolute (10.6% relative) reduction in Character Error Rate (CER) on a challenging dataset of 300k degraded synthetic text images, surpassing hand-engineered baselines such as CLAHE. The key innovation is a four-stage training curriculum that uses behavioral cloning to amplify "lucky" improvements discovered through the stochastic exploration of a partially trained diffusion model. This approach is highly sample-efficient and avoids the pitfalls of traditional reinforcement learning. Crucially, we frame this not as naive reinforcement learning, but as behavioral cloning of an exploration policy: we stochastically sample intermediate diffusion outputs, select those that improve CER by chance, and then train the model to reproduce them. This bootstrapping curriculum (4 stages over 60 GPU-hours) amplifies random successes into a systematic strategy. In summary, by whispering to the frozen OCR through its inputs, we improve an imperfect classifier without touching its weights.
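The selection step described in the abstract (sample candidate outputs, keep the ones that happen to lower CER, then use them as training targets) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `select_lucky` and the `ocr` callable are hypothetical names, strings stand in for images, and CER is computed with a plain Levenshtein distance. As a side note, if the 8% absolute and 10.6% relative reductions refer to the same baseline, that baseline CER would be roughly 0.08 / 0.106, or about 75%.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, reference) / len(reference)


def select_lucky(
    original: str,
    candidates: Sequence[str],
    reference: str,
    ocr: Callable[[str], str],
) -> List[Tuple[str, float]]:
    """Keep candidates whose OCR output has strictly lower CER than the
    OCR output of the original input. `ocr` is a hypothetical stand-in
    for a frozen recognizer."""
    baseline = cer(ocr(original), reference)
    kept = []
    for cand in candidates:
        score = cer(ocr(cand), reference)
        if score < baseline:
            kept.append((cand, score))
    return kept


if __name__ == "__main__":
    # Toy setup: the "OCR" is the identity function on strings.
    kept = select_lucky(
        original="he1lo w0rld",
        candidates=["hello w0rld", "he11o w0r1d", "hello world"],
        reference="hello world",
        ocr=lambda s: s,
    )
    for cand, score in kept:
        print(f"{cand!r}: CER={score:.3f}")
```

In a behavioral-cloning setup, the kept candidates would serve as regression targets for the preprocessor; how the paper weights or filters them across its four stages is not stated in the abstract.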

2603.05268 2026-03-06 cs.RO cs.SY eess.SY

Curve-Induced Dynamical Systems on Riemannian Manifolds and Lie Groups

Saray Bakker, Martin Schonger, Tobias Löw, Javier Alonso-Mora, Sylvain Calinon

Comments Preprint, 14 pages, video linked in the paper, Saray Bakker and Martin Schonger contributed equally as first authors and are listed alphabetically

2603.05267 2026-03-06 cs.LG

Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Ting-Hui Cheng, Line H. Clemmensen, Sneha Das

Comments Submitted to the Interspeech 2026

2603.05263 2026-03-06 cs.LG

A Behaviour-Aware Federated Forecasting Framework for Distributed Stand-Alone Wind Turbines

Bowen Li, Xiufeng Liu, Maria Sinziiana Astefanoaei

2603.05262 2026-03-06 cs.CL

VietJobs: A Vietnamese Job Advertisement Dataset

Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj

Comments 10 pages

2603.05256 2026-03-06 cs.CV

Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum

Shan Ning, Longtian Qiu, Xuming He

Comments Accepted by ICLR 26, code and weights are publicly available

2603.05255 2026-03-06 cs.CV

CATNet: Collaborative Alignment and Transformation Network for Cooperative Perception

Gong Chen, Chaokun Zhang, Tao Tang, Pengcheng Lv, Feng Li, Xin Xie

Comments Accepted by CVPR26

2603.05252 2026-03-06 cs.RO

Rethinking the Role of Collaborative Robots in Rehabilitation

Vivek Gupte, Shalutha Rajapakshe, Emmanuel Senft

Comments 5 pages, 1 figure

2603.05240 2026-03-06 cs.AI

GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

Zijie Meng, Zheyong Xie, Zheyu Ye, Chonggang Lu, Zuozhu Liu, Zihan Niu, Yao Hu, Shaosheng Cao

2603.05235 2026-03-06 cs.AI

Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning

Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li

Comments CVPR 2026

2603.05234 2026-03-06 cs.LG cs.AI

Recursive Inference Machines for Neural Reasoning

Mieszko Komisarczyk, Saurabh Mathur, Maurice Kraus, Sriraam Natarajan, Kristian Kersting

2603.05232 2026-03-06 cs.LG

SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity

Hanyong Shao, Yingbo Hao, Ting Song, Yan Xia, Di Zhang, Shaohan Huang, Xun Wu, Songchen Xu, Le Xu, Li Dong, Zewen Chi, Yi Zou, Furu Wei

2603.05231 2026-03-06 cs.SD cs.AI cs.LG

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards

Linghan Fang, Tianxin Xie, Li Liu

2603.05225 2026-03-06 cs.AI cs.AR

AI+HW 2035: Shaping the Next Decade

Deming Chen, Jason Cong, Azalia Mirhoseini, Christos Kozyrakis, Subhasish Mitra, Jinjun Xiong, Cliff Young, Anima Anandkumar, Michael Littman, Aron Kirschen, Sophia Shao, Serge Leef, Naresh Shanbhag, Dejan Milojicic, Michael Schulte, Gert Cauwenberghs, Jerry M. Chow, Tri Dao, Kailash Gopalakrishnan, Richard Ho, Hoshik Kim, Kunle Olukotun, David Z. Pan, Mark Ren, Dan Roth, Aarti Singh, Yizhou Sun, Yusu Wang, Yann LeCun, Ruchir Puri

Comments 35 pages, 4 figures

Abstract

Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined. The global research community lacks a cohesive, long-term vision to strategically coordinate the development of AI and HW. This fragmentation constrains progress toward holistic, sustainable, and adaptive AI systems capable of learning, reasoning, and operating efficiently across cloud, edge, and physical environments. The future of AI depends not only on scaling intelligence, but on scaling efficiency, achieving exponential gains in intelligence per joule, rather than unbounded compute consumption. Addressing this grand challenge requires rethinking the entire computing stack. This vision paper lays out a 10-year roadmap for AI+HW co-design and co-development, spanning algorithms, architectures, systems, and sustainability. We articulate key insights that redefine scaling around energy efficiency, system-level integration, and cross-layer optimization. We identify key challenges and opportunities, candidly assess potential obstacles and pitfalls, and propose integrated solutions grounded in algorithmic innovation, hardware advances, and software abstraction. Looking ahead, we define what success means in 10 years: achieving a 1000x improvement in efficiency for AI training and inference; enabling energy-aware, self-optimizing systems that seamlessly span cloud, edge, and physical AI; democratizing access to advanced AI infrastructure; and embedding human-centric principles into the design of intelligent systems. Finally, we outline concrete action items for academia, industry, government, and the broader community, calling for coordinated national initiatives, shared infrastructure, workforce development, cross-agency collaboration, and sustained public-private partnerships to ensure that AI+HW co-design becomes a unifying long-term mission.

2603.05219 2026-03-06 cs.CV cs.AI

SPyCer: Semi-Supervised Physics-Guided Contextual Attention for Near-Surface Air Temperature Estimation from Satellite Imagery

Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai

2603.05218 2026-03-06 cs.AI cs.LG

KARL: Knowledge Agents via Reinforcement Learning

Jonathan D. Chang, Andrew Drozdov, Shubham Toshniwal, Owen Oertell, Alexander Trott, Jacob Portes, Abhay Gupta, Pallavi Koppol, Ashutosh Baheti, Sean Kulinski, Ivan Zhou, Irene Dea, Krista Opsahl-Ong, Simon Favreau-Lessard, Sean Owen, Jose Javier Gonzalez Ortiz, Arnav Singhvi, Xabi Andrade, Cindy Wang, Kartik Sreenivasan, Sam Havens, Jialu Liu, Peyton DeNiro, Wen Sun, Michael Bendersky, Jonathan Frankle

Comments 77 pages, 43 figures, 17 tables

2603.05210 2026-03-06 cs.CL cs.AI cs.LG

Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding

Ofir Ben Shoham

2603.05204 2026-03-06 cs.LG cs.AI

Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation

Yize Wu, Ke Gao, Ling Li, Yanjun Wu

2603.05202 2026-03-06 cs.CV

Semantic Class Distribution Learning for Debiasing Semi-Supervised Medical Image Segmentation

Yingxue Su, Yiheng Zhong, Keying Zhu, Zimu Zhang, Zhuoru Zhang, Yifang Wang, Yuxin Zhang, Jingxin Liu

Comments 9 pages, 2 figures

2603.05201 2026-03-06 cs.LG stat.ML

Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics

Jay Raut, Daniel N. Wilke, Stephan Schmidt

Comments 21 pages, 9 figures, 5 tables

2603.05198 2026-03-06 cs.CL cs.SC

Distilling Formal Logic into Neural Spaces: A Kernel Alignment Approach for Signal Temporal Logic

Sara Candussio, Gabriele Sarti, Gaia Saveri, Luca Bortolussi

2603.05197 2026-03-06 cs.CL

Diffusion LLMs can think EoS-by-EoS

Sarah Breckner, Sebastian Schuster

2603.05185 2026-03-06 cs.RO

Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation

Pengfei Yi, Yingjie Ma, Wenjiang Xu, Yanan Hao, Shuai Gan, Wanting Li, Shanlin Zhong

2603.05184 2026-03-06 cs.CV cs.AI

Logi-PAR: Logic-Infused Patient Activity Recognition via Differentiable Rule

Muhammad Zarar, MingZheng Zhang, Xiaowang Zhang, Zhiyong Feng, Sofonias Yitagesu, Kawsar Farooq

2603.05175 2026-03-06 cs.LG

Incentive Aware AI Regulations: A Credal Characterisation

Anurag Singh, Julian Rodemann, Rajeev Verma, Siu Lun Chau, Krikamol Muandet

2603.05172 2026-03-06 cs.LG

Trainable Bitwise Soft Quantization for Input Feature Compression

Karsten Schrödter, Jan Stenkamp, Nina Herrmann, Fabian Gieseke

Comments Accepted to CPAL 2026

2603.05168 2026-03-06 cs.CL

Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

Di Zhang, Xun Wu, Shaohan Huang, Yudong Wang, Hanyong Shao, Yingbo Hao, Zewen Chi, Li Dong, Ting Song, Yan Xia, Zhifang Sui, Furu Wei

2603.05160 2026-03-06 cs.RO cs.AI

Lifelong Language-Conditioned Robotic Manipulation Learning

Xudong Wang, Zebin Han, Zhiyu Liu, Gan Li, Jiahua Dong, Baichen Liu, Lianqing Liu, Zhi Han

Comments 14 pages, 7 figures

2603.05158 2026-03-06 cs.LG

Balancing Privacy-Quality-Efficiency in Federated Learning through Round-Based Interleaving of Protection Techniques

Yenan Wang, Carla Fabiana Chiasserini, Elad Michael Schiller

2603.05157 2026-03-06 cs.CV cs.LG eess.IV

The Impact of Preprocessing Methods on Racial Encoding and Model Robustness in CXR Diagnosis

Dishantkumar Sutariya, Eike Petersen

Comments Preprint accepted for publication at BVM 2026 (https://www.bvm-conf.org/)