2506.05233
2026-06-04
cs.LG
cs.AI
cs.CL
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Johannes von Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, Sarthak Mittal, Maximilian Schlegel, Kaitlin Maile, Yanick Schimpf, Oliver Sieberling, Alexander Meulemans, Rif A. Saurous, Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento
Affiliations
Google; Paradigms of Intelligence Team; Google DeepMind; MIT CSAIL
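AI summary
Proposes the Mesa layer, which performs locally optimal test-time training using a conjugate gradient solver. While keeping inference cost constant, it outperforms existing RNN models on language modeling perplexity and on downstream benchmarks.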