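arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.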

2603.21331 2026-03-24 cs.LG cs.PF

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

Jaber Jaber, Osama Jaber

Comments 11 pages, 5 tables, 2 figures. Code: https://github.com/RightNow-AI/autokernel

Details

English abstract

Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering. We present AutoKernel, an open-source framework that applies an autonomous agent loop to GPU kernel optimization for arbitrary PyTorch models. Given a model, AutoKernel profiles it to identify computational bottlenecks, ranks them by Amdahl's law impact, and iteratively refines Triton or CUDA C++ kernel implementations through hundreds of experiments without human intervention. A five-stage correctness harness covering smoke tests, shape sweeps, numerical stability, determinism verification, and edge-case coverage ensures every candidate kernel is validated before any speedup is recorded. The system comprises over 9,000 lines of Python, 18 starter kernel implementations across two backends, a six-tier optimization playbook, and integration with the KernelBench benchmark suite. AutoKernel covers nine kernel types spanning the dominant operations in modern transformer architectures. On an NVIDIA H100, our Triton kernels outperform both PyTorch eager and torch.compile (max-autotune) on the majority of tested configurations: 5.29x over eager on RMSNorm, 2.82x on softmax, and 2.21x on cross-entropy, while beating torch.compile by 2.83x, 3.44x, and 2.94x respectively. In community deployment, an AutoKernel-optimized kernel achieved first place on the vectorsum_v2 B200 leaderboard. The full system is available at https://github.com/RightNow-AI/autokernel.
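
The abstract says bottlenecks are ranked by Amdahl's law impact. A minimal sketch of what such a ranking might look like follows. The profile timings, the `matmul` entry, the uniform 2x assumed per-kernel speedup, and the function names are all hypothetical illustrations, not AutoKernel's API or measured data.

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """End-to-end speedup when `fraction` of total runtime is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def rank_bottlenecks(profile_ms: dict, assumed_speedup: float = 2.0) -> list:
    """Sort kernels by the end-to-end speedup each would yield (largest first).

    `assumed_speedup` is a hypothetical per-kernel gain applied uniformly.
    """
    total = sum(profile_ms.values())
    scored = [
        (name, amdahl_speedup(t / total, assumed_speedup))
        for name, t in profile_ms.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-kernel time (ms) from a profiler run; not from the paper.
profile_ms = {"rmsnorm": 12.0, "softmax": 30.0, "cross_entropy": 8.0, "matmul": 50.0}
ranking = rank_bottlenecks(profile_ms)

for name, speedup in ranking:
    print(f"{name}: {speedup:.3f}x end-to-end")
```

Under this model a kernel that takes a small share of runtime yields little end-to-end gain even with a large local speedup, which is why ranking by whole-model impact rather than by per-kernel speedup can be useful.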

2603.21327 2026-03-24 cs.CV

KHMP: Frequency-Domain Kalman Refinement for High-Fidelity Human Motion Prediction

Wenhan Wu, Zhishuai Guo, Chen Chen, Srijan Das, Hongfei Xue, Pu Wang, Aidong Lu

2603.21321 2026-03-24 cs.AI cs.CL

Improving Coherence and Persistence in Agentic AI for System Optimization

Pantea Karimi, Kimia Noorbakhsh, Mohammad Alizadeh, Hari Balakrishnan

2603.21319 2026-03-24 cs.LG

Active Inference Agency Formalization, Metrics, and Convergence Assessments

Eduard Kapelko

2603.21317 2026-03-24 cs.LG

Stream separation improves Bregman conditioning in transformers

James Clayton Kerce

2603.21316 2026-03-24 cs.SD cs.LG eess.AS

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit

Khushiyant, Param Thakkar

Comments 10 Pages, 8 Figures

2603.21315 2026-03-24 cs.LG

FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

Fabien Polly

Comments 18 pages, 16 figures, 4 tables. Code available at https://github.com/infinition/FluidWorld/

2603.21308 2026-03-24 cs.LG

Direct Interval Propagation Methods using Neural-Network Surrogates for Uncertainty Quantification in Physical Systems Surrogate Model

Ghifari Adam Faza, Jolan Wauters, Fabio Cuzzolin, Hans Hallez, David Moens

2603.21305 2026-03-24 cs.CV

Privacy-Preserving Federated Action Recognition via Differentially Private Selective Tuning and Efficient Communication

Idris Zakariyya, Pai Chet Ng, Kaushik Bhargav Sivangi, S. Mohammad Sheikholeslami, Konstantinos N. Plataniotis, Fani Deligianni

2603.21299 2026-03-24 cs.CV

Identity-Consistent Video Generation under Large Facial-Angle Variations

Bin Hu, Zipeng Qi, Guoxi Huang, Zunnan Xu, Ruicheng Zhang, Chongjie Ye, Jun Zhou, Xiu Li, Jingdong Wang

2603.21295 2026-03-24 cs.CV

Text-Image Conditioned 3D Generation

Jiazhong Cen, Jiemin Fang, Sikuang Li, Guanjun Wu, Chen Yang, Taoran Yi, Zanwei Zhou, Zhikuan Bao, Lingxi Xie, Wei Shen, Qi Tian

Comments CVPR 2026. Project page: https://jumpat.github.io/tigon-page Code: https://github.com/Jumpat/tigon

2603.21287 2026-03-24 cs.CV

Focus on Background: Exploring SAM's Potential in Few-shot Medical Image Segmentation with Background-centric Prompting

Yuntian Bo, Yazhou Zhu, Piotr Koniusz, Haofeng Zhang

Comments Accepted by CVPR26

2603.21284 2026-03-24 cs.LG cs.AI cs.CV physics.ao-ph

Sonny: Breaking the Compute Wall in Medium-Range Weather Forecasting

Minjong Cheon

2603.21282 2026-03-24 cs.LG cs.AI cs.SD

Fusing Memory and Attention: A study on LSTM, Transformer and Hybrid Architectures for Symbolic Music Generation

Soudeep Ghoshal, Sandipan Chakraborty, Pradipto Chowdhury, Himanshu Buckchash

Comments 20 pages, 6 figures. Published in Expert Systems with Applications (Elsevier), 2026. DOI: https://doi.org/10.1016/j.eswa.2026.131173

Details

DOI: 10.1016/j.eswa.2026.131173
Journal ref: Expert Systems with Applications 308 (2026) 131173

English abstract

Machine learning techniques, such as Transformers and Long Short-Term Memory (LSTM) networks, play a crucial role in Symbolic Music Generation (SMG). Existing literature indicates a difference between LSTMs and Transformers regarding their ability to model local melodic continuity versus maintaining global structural coherence. However, their specific properties within the context of SMG have not been systematically studied. This paper addresses this gap by providing a fine-grained comparative analysis of LSTMs versus Transformers for SMG, examining local and global properties in detail using 17 musical quality metrics on the Deutschl dataset. We find that LSTM networks excel at capturing local patterns but fail to preserve long-range dependencies, while Transformers model global structure effectively but tend to produce irregular phrasing. Based on this analysis and leveraging their respective strengths, we propose a Hybrid architecture combining a Transformer Encoder with an LSTM Decoder and evaluate it against both baselines. We evaluated 1,000 generated melodies from each of the three architectures on the Deutschl dataset. The results show that the hybrid method achieves better local and global continuity and coherence compared to the baselines. Our work highlights the key characteristics of these models and demonstrates how their properties can be leveraged to design superior models. We also supported the experiments with ablation studies and human perceptual evaluations, which statistically support the findings and provide robust validation for this work.

2603.21278 2026-03-24 cs.CL cs.AI cs.HC

Conversation Tree Architecture: A Structured Framework for Context-Aware Multi-Branch LLM Conversations

Pranav Hemanth, Sampriti Saha

Comments 6 pages, 1 figure. Prototype available at https://the-conversation-tree.vercel.app/app

2603.21276 2026-03-24 cs.LG cs.AI

Aggregation Alignment for Federated Learning with Mixture-of-Experts under Data Heterogeneity

Zihan Fang, Qianru Wang, Haonan An, Zheng Lin, Yiqin Deng, Xianhao Chen, Yuguang Fang

Comments 14 pages, 14 figures

Details

English abstract

Large language models (LLMs) increasingly adopt Mixture-of-Experts (MoE) architectures to scale model capacity while reducing computation. Fine-tuning these MoE-based LLMs often requires access to distributed and privacy-sensitive data, making centralized fine-tuning impractical. Federated learning (FL) therefore provides a paradigm to collaboratively fine-tune MoE-based LLMs, enabling each client to integrate diverse knowledge without compromising data privacy. However, the integration of MoE-based LLM fine-tuning into FL encounters two critical aggregation challenges due to inherent data heterogeneity across clients: (i) divergent local data distributions drive clients to develop distinct gating preferences for localized expert selection, causing direct parameter aggregation to produce a "one-size-fits-none" global gating network, and (ii) same-indexed experts develop disparate semantic roles across clients, leading to expert semantic blurring and the degradation of expert specialization. To address these challenges, we propose FedAlign-MoE, a federated aggregation alignment framework that jointly enforces routing consistency and expert semantic alignment. Specifically, FedAlign-MoE aggregates gating behaviors by aligning routing distributions through consistency weighting and optimizes local gating networks through distribution regularization, maintaining cross-client stability without overriding discriminative local preferences. Meanwhile, FedAlign-MoE explicitly quantifies semantic consistency among same-indexed experts across clients and selectively aggregates updates from semantically aligned clients, ensuring stable and specialized functional roles for global experts. Extensive experiments demonstrate that FedAlign-MoE outperforms state-of-the-art benchmarks, achieving faster convergence and superior accuracy in non-IID federated environments.

2603.21272 2026-03-24 cs.AI cs.CL cs.DS cs.LG

The Library Theorem: How External Organization Governs Agentic Reasoning Capacity

Zachary F. Mainen

Comments 19 pages, 6 figures

Details

English abstract

Externalized reasoning is already exploited by transformer-based agents through chain-of-thought, but structured retrieval -- indexing over one's own reasoning state -- remains underexplored. We formalize the transformer context window as an I/O page and prove that tool-augmented agents with indexed external memory achieve exponentially lower retrieval cost than agents restricted to sequential scanning: $O(\log_b N)$ versus $\Omega(N)$ page reads per query, and $O(T \log_b T)$ versus $\Theta(T^2)$ cumulative cost over $T$ reasoning steps -- a gap that widens as deliberation deepens. We test these predictions on a controlled lookup benchmark across three content types -- random hashes, ordered integers, and encyclopedia entries -- varying store size from 50 to 5,000 items, and replicate key conditions across two model generations (GPT-4o-mini and GPT-5.4). On abstract content, the indexed agent achieves median 1 page read regardless of store size, confirming the $O(1)$ prediction. Sorted pages without an index fail to close the gap: the weaker model cannot sustain binary search at scale, and the stronger model achieves near-optimal $\log_2 N$ search but still loses to the index by $5\times$. On familiar content (encyclopedia entries), a competing failure mode emerges: the model recognizes the domain, bypasses the retrieval protocol, and generates answers from parametric memory, producing catastrophic token expenditure even when the index is sound. This parametric memory competition dissociates the two cognitive operations that indexing combines: understanding content (where language models excel) and following navigational protocols (where they fail when understanding tempts them to shortcut). The result argues for a separation of concerns: use language models for index construction, where semantic understanding helps, and deterministic algorithms for index traversal, where it hurts.
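
The per-query and cumulative bounds quoted in the abstract can be illustrated with a toy cost model. This is a sketch under stated assumptions, not the paper's benchmark: the page size `b = 10`, the rule that the store grows by one item per reasoning step, and all function names are hypothetical choices made for illustration.

```python
import math


def scan_reads(n_items: int, page_size: int) -> int:
    """Worst-case page reads for a sequential scan over n_items."""
    return math.ceil(n_items / page_size)


def index_reads(n_items: int, fanout: int) -> int:
    """Depth of a b-ary index over n_items: smallest d >= 1 with fanout**d >= n_items."""
    depth, reach = 0, 1
    while reach < n_items:
        reach *= fanout
        depth += 1
    return max(depth, 1)


def cumulative(reads_fn, steps: int, b: int) -> int:
    """Total page reads over `steps` queries, assuming the store holds t items at step t."""
    return sum(reads_fn(t, b) for t in range(1, steps + 1))


b = 10  # hypothetical items per page / index fanout
for n in (50, 500, 5000):  # store sizes within the range quoted in the abstract
    print(n, scan_reads(n, b), index_reads(n, b))

# Cumulative cost over T steps: roughly quadratic for scanning, T log_b T for the index.
for T in (100, 1000):
    print(T, cumulative(scan_reads, T, b), cumulative(index_reads, T, b))
```

With a fixed page size the scan cost grows linearly in the store size while the index depth grows logarithmically, so the cumulative gap widens as the number of steps increases, matching the direction of the stated bounds.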

2603.21269 2026-03-24 cs.RO

DyGeoVLN: Infusing Dynamic Geometry Foundation Model into Vision-Language Navigation

Xiangchen Liu, Hanghan Zheng, Jeil Jeong, Minsung Yoon, Lin Zhao, Zhide Zhong, Haoang Li, Sung-Eui Yoon

2603.21248 2026-03-24 cs.CL cs.IR

Graph Fusion Across Languages using Large Language Models

Kaung Myat Kyaw, Khush Agarwal, Jonathan Chan

2603.21244 2026-03-24 cs.LG eess.SP

Amortized Variational Inference for Logistic Regression with Missing Covariates

M. Cherifi, Aude Sportisse, Xujia Zhu, Mohammed Nabil El Korso, A. Mesloub

Comments 25 pages, 12 figures, submitted to Statistics and Computing

2603.21237 2026-03-24 cs.AI

ConsRoute: Consistency-Aware Adaptive Query Routing for Cloud-Edge-Device Large Language Models

Haoyu Qiao, Hao Zhang, Shanwen Mao, Siyao Cheng, Jie Liu

2603.21234 2026-03-24 cs.CV

Enhancing Brain Tumor Classification Using Vision Transformers with Colormap-Based Feature Representation on BRISC2025 Dataset

Faisal Ahmed

Comments 11 pages, 3 figures

2603.21233 2026-03-24 cs.CV

DepthTCM: High Efficient Depth Compression via Physics-aware Transformer-CNN Mixed Architecture

Young-Seo Chang, Yatong An, Jae-Sang Hyun

2603.21232 2026-03-24 cs.CV cs.AI

QMoP: Query Guided Mixture-of-Projector for Efficient Visual Token Compression

Zhongyang Li, Yaqian Li, Faming Fang, Rinyoichi Takezoe, Zi-Hao Bo, Cheng Qian, Mo Guang, Guixu Zhang, Kaiwen Long

2603.21229 2026-03-24 cs.CV

Plant Taxonomy Meets Plant Counting: A Fine-Grained, Taxonomic Dataset for Counting Hundreds of Plant Species

Jinyu Xu, Tianqi Hu, Xiaonan Hu, Letian Zhou, Songliang Cao, Meng Zhang, Hao Lu

Comments Accepted by CVPR 2026. Project page: https://github.com/tiny-smart/TPC-268

2603.21228 2026-03-24 cs.AI

Does AI Homogenize Student Thinking? A Multi-Dimensional Analysis of Structural Convergence in AI-Augmented Essays

Keito Inoshita, Michiaki Omura, Tsukasa Yamanaka, Go Maeda, Kentaro Tsuji

2603.21224 2026-03-24 cs.SD

Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation

Haoguang Zhou, Siyi Wang, Jingyao Wu, James Bailey, Ting Dang

2603.21222 2026-03-24 cs.CV

A Large-Scale Remote Sensing Dataset and VLM-based Algorithm for Fine-Grained Road Hierarchy Classification

Ting Han, Xiangyi Xie, Yiping Chen, Yumeng Du, Jin Ma, Aiguang Li, Jiaan Liu, Yin Gao

2603.21217 2026-03-24 cs.CV

Reframing Long-Tailed Learning via Loss Landscape Geometry

Shenghan Chen, Yiming Liu, Yanzhen Wang, Yujia Wang, Xiankai Lu

Comments Accepted to CVPR 2026. 11 pages, 6 figures, 5 tables

2603.21213 2026-03-24 cs.CV cs.AI

Positional Segmentor-Guided Counterfactual Fine-Tuning for Spatially Localized Image Synthesis

Tian Xia, Matthew Sinclair, Andreas Schuh, Fabio De Sousa Ribeiro, Raghav Mehta, Rajat Rasal, Esther Puyol-Antón, Samuel Gerber, Kersten Petersen, Michiel Schaap, Ben Glocker