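arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.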

Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Xingzhang Ren, Fei Huang, Huiqiang Jiang, Junyang Lin, Dayiheng Liu, Linfeng Zhang

Comments 26 pages, 9 figures, 15 tables

2509.23759 2026-02-04 cs.SD cs.LG

VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation

Ting-Kang Wang, Yueh-Po Peng, Li Su, Vincent K. M. Cheung

2509.23286 2026-02-04 cs.CL cs.AI

A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models

Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, Albert No

Comments Accepted at ICLR 2026. Code and models are available at https://ai-isl.github.io/A2D

2509.22984 2026-02-04 cs.AI cs.CL

From Deferral to Learning: Online In-Context Knowledge Distillation for LLM Cascades

Yu Wu, Shuo Wu, Ye Tao, Yansong Li, Anand D. Sarwate

Comments 32 pages, 6 figures, 23 tables, under review

2509.22840 2026-02-04 cs.LG

A Capacity-Based Rationale for Multi-Head Attention

Micah Adler

2509.21984 2026-02-04 cs.CV cs.CL

Beyond the Vision Encoder: Identifying and Mitigating Spatial Bias in Large Vision-Language Models

Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Youcheng Pan, Yongshuai Hou, Weili Guan, Jun Yu, Min Zhang

2509.21875 2026-02-04 cs.CL

LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals

Samuel Yeh, Sharon Li, Tanwi Mallick

Comments ICLR 2026

2509.21249 2026-02-04 cs.CV cs.AI cs.LG

Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations

Zhijian Yang, Noel DSouza, Istvan Megyeri, Xiaojian Xu, Amin Honarmandi Shandiz, Farzin Haddadpour, Krisztian Koos, Laszlo Rusko, Emanuele Valeriano, Bharadwaj Swaninathan, Lei Wu, Parminder Bhatia, Taha Kass-Hout, Erhan Bas

2509.20510 2026-02-04 cs.RO

MELEGROS: Monolithic Elephant-inspired Gripper with Optical Sensors

Petr Trunin, Diana Cafiso, Anderson Brazil Nardin, Trevor Exley, Lucia Beccai

Comments 15 pages, 6 figures. SI 18 pages, 19 figures. Submitted to Wiley Advanced Science

2509.19664 2026-02-04 cs.CV cs.AI

MoTiC: Momentum Tightness and Contrast for Few-Shot Class-Incremental Learning

Zeyu He, Shuai Huang, Yuwu Lu, Ming Zhao

Journal ref Pattern Recognition, Vol. 173, pp. 112753, May 2026

2509.16832 2026-02-04 cs.CV cs.RO eess.IV

L2M-Reg: Building-level Uncertainty-aware Registration of Outdoor LiDAR Point Clouds and Semantic 3D City Models

Ziyang Xu, Benedikt Schwab, Yihui Yang, Thomas H. Kolbe, Christoph Holst

Comments Accepted version by ISPRS Journal of Photogrammetry and Remote Sensing

2509.15090 2026-02-04 cs.LG cs.GT econ.TH

Emergent Alignment via Competition

Natalie Collina, Surbhi Goel, Aaron Roth, Emily Ryu, Mirah Shi

2509.14863 2026-02-04 cs.LG cs.AI

Exploring the Global-to-Local Attention Scheme in Graph Transformers: An Empirical Study

Gang Wu, Zhengwei Wang

Comments The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-51718-4

2509.14565 2026-02-04 cs.CV

DiffVL: Diffusion-Based Visual Localization on 2D Maps via BEV-Conditioned GPS Denoising

Li Gao, Hongyang Sun, Liu Liu, Yunhao Li, Yang Cai

2509.12203 2026-02-04 cs.CV

LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

Zixin Yin, Xili Dai, Duomin Wang, Xianfang Zeng, Lionel M. Ni, Gang Yu, Heung-Yeung Shum

Comments https://zxyin.github.io/LazyDrag

2509.11361 2026-02-04 cs.AI

MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization

Yichen Han, Yuhang Han, Siteng Huang, Guanyu Liu, Zhengpeng Zhou, Bojun Liu, Yujia Zhang, Isaac N Shi, Lewei He, Tianyu Shi

2509.00060 2026-02-04 cs.RO

Correspondence-Free, Function-Based Sim-to-Real Learning for Deformable Surface Control

Yingjun Tian, Guoxin Fang, Renbo Su, Aoran Lyu, Neelotpal Dutta, Weiming Wang, Simeon Gill, Andrew Weightman, Charlie C. L. Wang

Comments arXiv admin note: text overlap with arXiv:2405.08935

2508.18742 2026-02-04 cs.LG

Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming

Jiajun Li, Yixuan Li, Ran Hou, Yu Ding, Shisi Guan, Jiahui Duan, Xiongwei Han, Tao Zhong, Vincent Chau, Weiwei Wu, Wanyuan Wang

Comments Accepted by ICLR 2026

2508.14330 2026-02-04 cs.LG

Multi-view Graph Condensation via Tensor Decomposition

Nícolas Roque dos Santos, Dawon Ahn, Diego Minatel, Alneu de Andrade Lopes, Evangelos E. Papalexakis

Comments Accepted at WSDM 2026

2508.10801 2026-02-04 cs.CV

Object Fidelity Diffusion for Remote Sensing Image Generation

Ziqi Ye, Shuran Ma, Jie Yang, Xiaoyi Yang, Yi Yang, Ziyang Gong, Xue Yang, Haipeng Wang

2508.07468 2026-02-04 cs.AI cs.CL cs.LG cs.SE

CP-Agent: Agentic Constraint Programming

Stefan Szeider

2508.04568 2026-02-04 cs.CV

DDTracking: A Deep Generative Framework for Diffusion MRI Tractography with Streamline Local-Global Spatiotemporal Modeling

Yijie Li, Wei Zhang, Xi Zhu, Ye Wu, Yogesh Rathi, Lauren J. O'Donnell, Fan Zhang

Comments Preprint version. The content may be updated in the future

2508.04136 2026-02-04 cs.CV cs.AI

UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal Retrieval

Hongyu Guo, Xiangzhao Hao, Jiarui Guo, Haiyun Guo, Jinqiao Wang, Tat-Seng Chua

2508.03516 2026-02-04 cs.CV

DSKC: Domain Style Modeling with Adaptive Knowledge Consolidation for Exemplar-free Lifelong Person Re-Identification

Shiben Liu, Mingyue Xu, Huijie Fan, Qiang Wang, Liangqiong Qu, Zhi Han

Comments 11 pages, 6 figures

2508.01725 2026-02-04 cs.LG cs.CV

Imbalance-Robust and Sampling-Efficient Continuous Conditional GANs via Adaptive Vicinity and Auxiliary Regularization

Xin Ding, Yun Chen, Yongwei Wang, Kao Zhang, Sen Zhang, Peibei Cao, Xiangxue Wang

2507.23642 2026-02-04 cs.CV cs.AI

Efficient Masked Attention Transformer for Few-Shot Classification and Segmentation

Dustin Carrión-Ojeda, Stefan Roth, Simone Schaub-Meyer

Comments Accepted for GCPR 2025. Project page: https://visinf.github.io/emat

Journal ref In Proceedings of the 47th German Conference on Pattern Recognition (GCPR 2025)

2507.21129 2026-02-04 cs.AI

Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic Metrics

Jae Wan Shim

2507.20534 2026-02-04 cs.LG cs.AI cs.CL

Kimi K2: Open Agentic Intelligence

Kimi Team, Yifan Bai, Yiping Bao, Y. Charles, Cheng Chen, Guanduo Chen, Haiting Chen, Huarong Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen Feng, Kelin Fu, Bofei Gao, Chenxiao Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, Yuyao Ge, Shangyi Geng, Qizheng Gu, Xinran Gu, Longyu Guan, Haiqing Guo, Jianhang Guo, Xiaoru Hao, Tianhong He, Weiran He, Wenyang He, Yunjia He, Chao Hong, Hao Hu, Yangyang Hu, Zhenxing Hu, Weixiao Huang, Zhiqi Huang, Zihao Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yongsheng Kang, Guokun Lai, Cheng Li, Fang Li, Haoyang Li, Ming Li, Wentao Li, Yang Li, Yanhao Li, Yiwei Li, Zhaowei Li, Zheming Li, Hongzhan Lin, Xiaohan Lin, Zongyu Lin, Chengyin Liu, Chenyu Liu, Hongzhang Liu, Jingyuan Liu, Junqi Liu, Liang Liu, Shaowei Liu, T. Y. Liu, Tianwei Liu, Weizhou Liu, Yangyang Liu, Yibo Liu, Yiping Liu, Yue Liu, Zhengying Liu, Enzhe Lu, Haoyu Lu, Lijun Lu, Yashuo Luo, Shengling Ma, Xinyu Ma, Yingwei Ma, Shaoguang Mao, Jie Mei, Xin Men, Yibo Miao, Siyuan Pan, Yebo Peng, Ruoyu Qin, Zeyu Qin, Bowen Qu, Zeyu Shang, Lidong Shi, Shengyuan Shi, Feifan Song, Jianlin Su, Zhengyuan Su, Lin Sui, Xinjie Sun, Flood Sung, Yunpeng Tai, Heyi Tang, Jiawen Tao, Qifeng Teng, Chaoran Tian, Chensi Wang, Dinglu Wang, Feng Wang, Hailong Wang, Haiming Wang, Jianzhou Wang, Jiaxing Wang, Jinhong Wang, Shengjie Wang, Shuyi Wang, Si Wang, Xinyuan Wang, Yao Wang, Yejie Wang, Yiqin Wang, Yuxin Wang, Yuzhi Wang, Zhaoji Wang, Zhengtao Wang, Zhengtao Wang, Zhexu Wang, Chu Wei, Qianqian Wei, Haoning Wu, Wenhao Wu, Xingzhe Wu, Yuxin Wu, Chenjun Xiao, Jin Xie, Xiaotong Xie, Weimin Xiong, Boyu Xu, Jinjing Xu, L. H. Xu, Lin Xu, Suting Xu, Weixin Xu, Xinran Xu, Yangchuan Xu, Ziyao Xu, Jing Xu, Jing Xu, Junjie Yan, Yuzi Yan, Hao Yang, Xiaofei Yang, Yi Yang, Ying Yang, Zhen Yang, Zhilin Yang, Zonghan Yang, Haotian Yao, Xingcheng Yao, Wenjie Ye, Zhuorui Ye, Bohong Yin, Longhui Yu, Enming Yuan, Hongbang Yuan, Mengjie Yuan, Siyu Yuan, Haobing Zhan, Dehao Zhang, Hao Zhang, Wanlu Zhang, Xiaobin Zhang, Yadong Zhang, Yangkun Zhang, Yichi Zhang, Yizhi Zhang, Yongting Zhang, Yu Zhang, Yutao Zhang, Yutong Zhang, Zheng Zhang, Haotian Zhao, Yikai Zhao, Zijia Zhao, Huabin Zheng, Shaojie Zheng, Longguang Zhong, Jianren Zhou, Xinyu Zhou, Zaida Zhou, Jinguo Zhu, Zhen Zhu, Weiyu Zhuang, Xinxing Zu

Comments tech report of Kimi K2, with minor updates

2507.14111 2026-02-04 cs.AI cs.DC cs.LG

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum

Comments Accepted by ICLR 2026

Details

English abstract

The exponential growth in demand for GPU computing resources has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization that employs a novel contrastive RL algorithm. CUDA-L1 achieves significant performance improvements on the CUDA optimization task: trained on A100, it delivers an average speedup of x3.12 with a median speedup of x1.42 against default baselines across all 250 CUDA kernels of KernelBench, with peak speedups reaching x120. In addition to the default baseline provided by KernelBench, CUDA-L1 demonstrates x2.77 over Torch Compile, x2.88 over Torch Compile with reduce overhead, x2.81 over CUDA Graph implementations, and x7.72 over cuDNN libraries. Furthermore, the model also demonstrates portability across different GPU architectures. Beyond these benchmark results, CUDA-L1 demonstrates several properties: it 1) discovers a variety of CUDA optimization techniques and learns to combine them strategically to achieve optimal performance; 2) uncovers fundamental principles of CUDA optimization, such as the multiplicative nature of optimizations; 3) identifies non-obvious performance bottlenecks and rejects seemingly beneficial optimizations that actually harm performance. These capabilities demonstrate that RL can transform an initially poor-performing LLM into an effective CUDA optimizer through speedup-based reward signals alone, without human expertise or domain knowledge. This paradigm opens possibilities for automated optimization of CUDA operations, and holds promise to substantially promote GPU efficiency and alleviate the rising pressure on GPU computing resources. Project: deepreinforce-ai.github.io/cudal1_blog

2507.03545 2026-02-04 cs.LG

DOME: Improving Signal-to-Noise in Stochastic Gradient Descent via Sharp-Direction Subspace Filtering

Julien Nicolas, Mohamed Maouche, Sonia Ben Mokhtar, Mark Coates

Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning