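arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.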

2512.18248 2026-05-12 cs.LG

On the Convergence Rate of LoRA Gradient Descent

Siqiao Mu, Diego Klabjan

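AI summary: This paper studies the convergence rate of vanilla LoRA gradient descent, which is widely used to fine-tune large models because it is computationally efficient and performs well. Because the LoRA objective lacks Lipschitz smoothness, its convergence is hard to analyze, and existing theory either relies on strong assumptions or only characterizes asymptotic behavior. This paper gives the first non-asymptotic convergence analysis of LoRA gradient descent without such assumptions, proving a convergence rate of $O\left(\frac{1}{\log T}\right)$, and validates the theory with numerical experiments.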

Comments ICML 2026
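
For readers unfamiliar with the setting, here is a minimal sketch of what "vanilla LoRA gradient descent" refers to: a frozen weight W0 is adapted as W0 + B @ A, and plain gradient descent updates both factors simultaneously. The least-squares objective, the row-vector convention (rows of X are inputs), the dimensions, the step size, and the function name `lora_gd` are illustrative assumptions, not the paper's experimental setup. The sketch also hints at why smoothness fails: the loss is a quartic polynomial in (A, B), so its gradients G @ A.T and B.T @ G scale with the factors themselves and have no global Lipschitz constant.

```python
import numpy as np

def lora_gd(X, Y, W0, rank=2, lr=0.01, steps=1000, seed=0):
    """Plain gradient descent on the LoRA factors of f(W) = ||X W - Y||_F^2 / (2n).

    W0 (d_in x d_out) stays frozen; the trainable update is B @ A with
    B of shape (d_in, rank) and A of shape (rank, d_out).
    """
    rng = np.random.default_rng(seed)
    d_in, d_out = W0.shape
    n = X.shape[0]
    A = rng.normal(scale=0.1, size=(rank, d_out))  # small random init
    B = np.zeros((d_in, rank))                      # zero init, so W0 + B @ A == W0 at step 0
    losses = []
    for _ in range(steps):
        R = X @ (W0 + B @ A) - Y                    # residuals
        losses.append(0.5 * np.sum(R ** 2) / n)
        G = X.T @ R / n                             # gradient with respect to the full weight W
        # Chain rule through W = W0 + B A; both gradients use the current A and B.
        grad_B, grad_A = G @ A.T, B.T @ G
        B = B - lr * grad_B
        A = A - lr * grad_A
    return A, B, losses

# Usage: a synthetic task whose target differs from W0 by an exact rank-2 update.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
W0 = rng.normal(size=(8, 4))
Y = X @ (W0 + rng.normal(size=(8, 2)) @ rng.normal(size=(2, 4)))
A, B, losses = lora_gd(X, Y, W0, rank=2, lr=0.01, steps=3000)
print(losses[0], losses[-1])
```

The zero initialization of B is the usual LoRA choice; it means only B receives a nonzero gradient on the first step.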

2512.11470 2026-05-12 cs.LG cs.CL

Rethinking Expert Trajectory Utilization in LLM Post-training for Mathematical Reasoning

Bowen Ding, Yuhan Chen, Jiayang Lyv, Jiyao Yuan, Qi Zhu, Shuangshuang Tian, Dantong Zhu, Futing Wang, Heyuan Deng, Fei Mi, Lifeng Shang, Tao Lin

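AI summary: This work examines how to use expert trajectories more effectively when post-training large language models for mathematical reasoning. The authors propose a "Plasticity-Ceiling" framework that decomposes final performance into the foundational supervised fine-tuning (SFT) performance and the subsequent reinforcement learning (RL) plasticity, which explains the performance advantage of the sequential SFT-then-RL pipeline. The study also gives concrete training guidelines: transition to RL once SFT has stabilized or is mildly overfitting; data scale determines post-training potential; and validation loss is a reliable indicator for selecting expert trajectories. Together these offer practical guidance for getting the most out of expert trajectories.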

Comments ACL-26, Main Conference

2512.06571 2026-05-12 cs.RO

Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

Zifan Xu, Myoungkyu Seo, Dongmyeong Lee, Hao Fu, Jiaheng Hu, Jiaxun Cui, Yuqian Jiang, Zhihan Wang, Anastasiia Brund, Joydeep Biswas, Peter Stone

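AI summary: This paper studies how humanoid soccer robots can learn fast, robust kicking skills from noisy sensory input. The authors propose a reinforcement learning system that extends the teacher-student training framework into four training stages, enabling the robot to adapt to varied ball-goal configurations and to kick continuously. Combining tailored reward functions, realistic noise modeling, and online constrained reinforcement learning, the method narrows the sim-to-real gap and achieves high kicking accuracy and goal-scoring success both in simulation and on a real robot.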

2511.19279 2026-05-12 cs.LG cs.CL

MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings

Victor Rambaud, Salvador Mascarenhas, Yair Lakretz

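AI summary: This work proposes MapFormer, a new Transformer architecture that learns cognitive maps from observational data through self-supervised learning, aiming for the strong generalization seen in humans and animals. Its core idea is to disentangle input content from its structural relationships using input-dependent positional-encoding matrices, which capture abstract relations and support path integration. Experiments show that MapFormer substantially outperforms existing models on several cognitive tasks and achieves near-perfect out-of-distribution generalization; it also improves performance on naturalistic data and scales well.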

Comments 19 pages (29 with appendix), 8 figures

Abstract

A cognitive map is an internal model which encodes the abstract relationships among entities in the world, giving humans and animals the flexibility to adapt to new situations, with a strong out-of-distribution (OOD) generalization that current AI systems still do not possess. To bridge this gap, we introduce $\textit{MapFormers}$, new Transformer-based architectures, which can learn cognitive maps from observational data and perform path-integration without supervision. Cognitive maps are learned in the model by disentangling structural relationships in the inputs from their specific content, a property that can be achieved by updating position encodings with input-dependent matrices, built as exponentials of learned combinations of Lie-algebra generators. We developed two variants of $\textit{MapFormers}$ that unify absolute and relative positional encoding to model episodic (EM) and working memory (WM), respectively. We tested $\textit{MapFormers}$ on several formal tasks targeting distinct cognitive capacities, including gating, 2D navigation and nested hierarchies (Dyck Languages). Our results demonstrate that $\textit{MapFormers}$ significantly outperform current AI architectures, achieving near-perfect OOD generalization where standard models fail. Furthermore, we show that $\textit{MapFormers}$ are scalable; evaluations on naturalistic data yield perplexity improvements over baselines, suggesting that these principles extend to large-scale, real-world domains. These results are obtained through efficient parallel computation on commutative maps, though our models can also learn non-commutative cognitive maps via sequential path-integration. Overall, these results suggest that input-dependent matrices provide a critical structural bias, by disentangling abstract relations from content in order to drive robust OOD generalization.
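
The abstract's key mechanism, updating a positional state with input-dependent matrices built as exponentials of Lie-algebra generators, can be illustrated in the simplest commutative case. The sketch below is a toy, not the MapFormer architecture: it assumes a single so(2) generator (planar rotation), a hypothetical projection `w` from each input vector to a rotation angle, and NumPy. It shows why commutative maps allow parallel computation: because all generated matrices commute, the product of per-step exponentials equals the exponential of the cumulative sum of the Lie-algebra coefficients, and a prefix sum can be computed in parallel.

```python
import numpy as np

def rot(theta):
    """exp(theta * J) for the so(2) generator J = [[0, -1], [1, 0]], i.e. a rotation matrix.
    Accepts a scalar or an array of angles; returns shape (..., 2, 2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.stack([np.stack([c, -s], -1), np.stack([s, c], -1)], -2)

def sequential_positions(x, w, p0):
    """Path integration one step at a time: p_t = exp(a_t J) p_{t-1}, with a_t = x_t . w."""
    p, out = p0, []
    for a in x @ w:                  # input-dependent Lie-algebra coefficient per step
        p = rot(a) @ p
        out.append(p)
    return np.array(out)

def parallel_positions(x, w, p0):
    """Same states from a prefix sum. Valid only because so(2) is commutative:
    exp(a_t J) ... exp(a_1 J) = exp((a_1 + ... + a_t) J)."""
    cum = np.cumsum(x @ w)           # prefix sums can be computed with a parallel scan
    return rot(cum) @ p0             # (T, 2, 2) @ (2,) -> (T, 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 5))         # 16 steps of 5-dimensional input features
w = rng.normal(size=5)               # hypothetical learned projection onto the generator
p0 = np.array([1.0, 0.0])
print(np.allclose(sequential_positions(x, w, p0), parallel_positions(x, w, p0)))  # True
```

With non-commuting generators (for example so(3) rotations about different axes) the prefix-sum shortcut no longer holds, which matches the abstract's remark that non-commutative maps are learned via sequential path integration.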

2511.18374 2026-05-12 cs.RO cs.SY eess.SY math.DS

Explicit Bounds on the Hausdorff Distance for Truncated mRPI Sets via Norm-Dependent Contraction Rates

Jiaxun Sun, Hengyu Xue, Yuyang Zhao

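AI summary: This paper derives explicit upper bounds on the Hausdorff distance between a truncated minimal robust positively invariant (mRPI) set and its infinite-horizon limit. Using an induced-norm contraction factor of the system matrix and a size measure of the disturbance set, it gives a computable closed-form bound, together with an explicit, non-iterative rule for choosing the truncation horizon that guarantees a prescribed approximation accuracy. Choosing different vector norms can further improve the contraction factor and the approximation accuracy, which matters for robust invariant-set approximation and for constraint tightening in tube-based model predictive control.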

Comments 6 pages, 5 figures. Accepted at the 2026 IEEE Conference on Control Technology and Applications (CCTA), Vancouver, BC, Canada, August 12-14, 2026
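
As a rough illustration of this kind of result (a sketch under stated assumptions, not the paper's exact bound), consider $x_{k+1} = A x_k + w_k$ with $w_k$ in a compact set $W$ containing the origin, truncated set $F_k = \bigoplus_{i=0}^{k-1} A^i W$, and limit $F_\infty$. If $\lambda = \|A\|$ (an induced norm) is below 1 and $r = \sup_{w \in W}\|w\|$, then $F_\infty = F_k \oplus A^k F_\infty$ gives the standard geometric-series bound $d_H(F_k, F_\infty) \le \lambda^k r / (1 - \lambda)$, so any $k \ge \ln(\varepsilon(1-\lambda)/r) / \ln\lambda$ guarantees accuracy $\varepsilon$ without iterating set computations. The code assumes a box-shaped $W$ and uses NumPy's induced matrix norms (`ord=1`, `2`, `np.inf`); the function names are hypothetical.

```python
import math
import numpy as np

def box_radius(b, p):
    """sup of ||w||_p over the box W = {w : |w_j| <= b_j}; attained at a vertex, so it equals ||b||_p."""
    return float(np.linalg.norm(np.asarray(b, dtype=float), ord=p))

def truncation_horizon(A, b, eps, p):
    """Smallest k with lam**k * r / (1 - lam) <= eps, where lam = ||A||_p (induced norm)
    and r = sup_{w in W} ||w||_p. Returns None if A is not a contraction in this norm."""
    lam = np.linalg.norm(A, ord=p)
    if lam >= 1:
        return None
    bound0 = box_radius(b, p) / (1 - lam)   # the bound at k = 0
    if bound0 <= eps:
        return 0
    return math.ceil(math.log(eps / bound0) / math.log(lam))

A = np.array([[0.5, 0.4], [0.0, 0.5]])
b = [0.1, 0.1]                              # W = {w : |w_1| <= 0.1, |w_2| <= 0.1}
for p in (1, 2, np.inf):
    print(p, truncation_horizon(A, b, eps=1e-3, p=p))
```

For this matrix $\|A\|_1 = \|A\|_\infty = 0.9$ while $\|A\|_2 \approx 0.74$, so the 2-norm bound reaches $\varepsilon = 10^{-3}$ after 21 steps versus 66 for the $\infty$-norm and 73 for the 1-norm. Each guarantee is stated in its own norm, so the horizons are not strictly interchangeable.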

2511.17879 2026-05-12 cs.LG cs.SD

Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

Yusong Wu, Stephen Brade, Aleksandra Teng Ma, Tia-Jane Fowler, Enning Yang, Berker Banar, Aaron Courville, Natasha Jaques, Cheng-Zhi Anna Huang

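AI summary: This paper studies how generative adversarial post-training can mitigate the reward hacking that arises when reinforcement learning is used to post-train models for live human-AI musical collaboration. The authors propose an adversarial training method applied to trajectories generated by the policy, improving the diversity and adaptivity of melody-to-harmony accompaniment generation. Experiments show that the method improves output diversity, harmonic coherence, and users' interactive experience.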

Comments v3: fix the Figure numbering bugs

2511.12878 2026-05-12 cs.CV cs.RO

Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Yu Zheng, Erhang Zhang, Xieyuanli Chen, Hesheng Wang

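AI summary: This paper proposes Uni-Hand, a universal hand motion forecasting framework that addresses four problems in egocentric hand motion forecasting: insufficient prediction targets, modality gaps, entangled hand and head motion, and limited validation on downstream tasks. By fusing vision and language, incorporating global context, and injecting task-aware text embeddings, the method performs multi-target forecasting of hand keypoints in both 2D and 3D space, and it additionally predicts hand-object interaction states to improve downstream performance. Experiments show that Uni-Hand achieves state-of-the-art forecasting performance on multiple public datasets and on newly built benchmarks, and shows strong potential in downstream tasks such as robot policy transfer and action recognition.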

Comments Accepted by T-PAMI 2026. Code and data: https://github.com/IRMVLab/UniHand

Abstract

Forecasting how human hands move in egocentric views is critical for applications like augmented reality and human-robot policy transfer. Recently, several hand trajectory prediction (HTP) methods have been developed to generate future possible hand waypoints, which still suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in downstream tasks. To address these limitations, we present a universal hand motion forecasting framework considering multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances for downstream applications. We harmonize multiple modalities by vision-language fusion, global context incorporation, and task-aware text embedding injection, to forecast hand waypoints in both 2D and 3D spaces. A novel dual-branch diffusion is proposed to concurrently predict human head and hand movements, capturing their motion synergy in egocentric vision. By introducing target indicators, the prediction model can forecast the specific joint waypoints of the wrist or the fingers, besides the widely studied hand center points. In addition, we enable Uni-Hand to additionally predict hand-object interaction states (contact/separation) to facilitate downstream tasks better. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms. The experimental results on multiple publicly available datasets and our newly proposed benchmarks demonstrate that Uni-Hand achieves the state-of-the-art performance in multi-dimensional and multi-target hand motion forecasting. Extensive validation in multiple downstream tasks also presents its impressive human-robot policy transfer to enable robotic manipulation, and effective feature enhancement for action anticipation/recognition.

2511.11159 2026-05-12 cs.LG

Adaptive Symmetrization of the KL Divergence

Omri Ben-Dov, Luiz F. O. Chamon

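AI summary: This work addresses the asymmetry of the KL divergence in distribution fitting by proposing a non-adversarial method for minimizing the Jeffreys divergence. It introduces a proxy model that approximates the reverse KL divergence of the main model and optimizes the main and proxy models jointly, keeping the objective tractable while improving stability and fitting accuracy. Experiments show that the method outperforms maximum likelihood estimation and generative adversarial networks on density estimation and simulation-based inference tasks, especially when data are scarce.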
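
The asymmetry that motivates this work, and the symmetric Jeffreys divergence $J(p, q) = \mathrm{KL}(p \| q) + \mathrm{KL}(q \| p)$ it targets, can be checked numerically on discrete distributions. Maximum likelihood corresponds to minimizing only the forward term $\mathrm{KL}(p_{\text{data}} \| q_{\text{model}})$. This is an illustration of the definitions with made-up probability vectors; the paper's proxy-model training procedure is not shown.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i) for discrete distributions,
    assuming q_i > 0 wherever p_i > 0. Terms with p_i = 0 contribute 0 by convention."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffreys(p, q):
    """Symmetric Jeffreys divergence: forward KL plus reverse KL."""
    return kl(p, q) + kl(q, p)

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(kl(p, q), kl(q, p))               # about 0.184 and 0.192: KL is asymmetric
print(jeffreys(p, q) == jeffreys(q, p))  # True: J is symmetric
```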

2511.06216 2026-05-12 cs.LG

Adaptive Multi-view Graph Contrastive Learning via Fractional-order Neural Diffusion Networks

Yanan Zhao, Feng Ji, Jingyang Dai, Jiaze Ma, Keyue Jiang, Kai Zhao, Wee Peng Tay

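AI summary: This paper proposes an adaptive multi-view graph contrastive learning method based on fractional-order neural diffusion networks. By varying the order $\alpha$ of the fractional derivative, it automatically generates diverse local and global views without hand-designed data augmentations. The method learns the optimal diffusion scale from the data, producing more expressive and robust graph representations. Experiments show that it outperforms existing graph contrastive learning methods on several benchmark datasets.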

Comments Machine learning, diffusion neural networks. arXiv admin note: text overlap with arXiv:2504.16748

2511.01008 2026-05-12 cs.CL

MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL

Haolin Yang, Jipeng Zhang, Zhitao He, Alexander Zhou, Yi R. Fung

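AI summary: MARS-SQL is a multi-agent reinforcement learning framework for Text-to-SQL that targets the weaknesses of large language models in logical precision and schema alignment on complex tasks. It decomposes the task into three specialized roles (schema alignment, query generation, and solution validation) and trains them with a multi-turn reinforcement learning strategy, so that the agents iteratively refine SQL generation through interaction with the database. Experiments show that MARS-SQL achieves leading execution accuracy on several benchmarks and generalizes well.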

2511.00560 2026-05-12 cs.CV

4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Gaussian Splatting

Chun-Tin Wu, Jun-Cheng Chen

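AI summary: Although 3D Gaussian Splatting (3D-GS) renders efficiently for novel view synthesis, extending it to dynamic scenes incurs large memory overhead because Gaussians are replicated for every frame. This paper proposes 4D Neural Voxel Splatting (4D-NVS), which combines a voxel representation with neural Gaussian splatting to model dynamic scenes efficiently. It models temporal dynamics by learning a compact set of neural voxels with deformation fields, significantly reducing memory consumption and speeding up training while maintaining high-quality rendering. Experiments show that the method uses less memory and trains faster than existing methods, enabling real-time rendering with better visual quality.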

Comments 10 pages, 7 figures

2511.00371 2026-05-12 cs.CL cs.CY cs.SE

Reasoning Trajectories for Socratic Debugging of Student Code: From Misconceptions to Contradictions and Updated Beliefs

Erfan Al-Hossami, Razvan Bunescu

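AI summary: This paper studies how Socratic debugging can help students find and fix bugs in their code on their own. The core problem is generating reasoning trajectories (RTs) that lead a student from a misconception, to a contradiction, to an updated belief. The authors introduce the reasoning-trajectory generation task, build a dataset containing both human-written and LLM-generated trajectories, and design LLM-based solutions for generating the associated conversations. Experiments show that large language models can generate up to 91% correct reasoning trajectories and 98.7% valid conversation turns, demonstrating the approach's potential for programming education.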

Comments 25 pages, 2 tables, 13 figures

2510.26067 2026-05-12 cs.RO

Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion

Chi Zhang, Mingrui Li, Wenzhe Tong, Xiaonan Huang

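AI summary: This paper studies how morphology-aware graph reinforcement learning can improve locomotion control for tensegrity robots. The method integrates a graph neural network (GNN) into the Soft Actor-Critic (SAC) algorithm and uses a graph representation of the robot's physical structure to capture coupling among its components, yielding more efficient and stable locomotion learning. Experiments show that the method outperforms conventional approaches in sample efficiency, robustness to noise and stiffness variations, and trajectory accuracy, and that it transfers directly from simulation to real hardware without fine-tuning, achieving stable physical locomotion.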

Comments 8 pages, 10 figures. Project page: https://tensegrity-graph-rl.github.io/

2510.22767 2026-05-12 cs.LG cs.CL

TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

Omar Naim, Krish Sharma, Niyar R Barman, Nicholas Asher

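AI summary: TELL-TALE is an inference-time, task-aware layer-elimination method for improving large language model performance on specific tasks. By removing layers that are irrelevant or harmful to the task, it streamlines the model architecture, improving task performance and reducing compute cost without retraining. Experiments show that TELL-TALE matches or exceeds baseline performance across a range of tasks and model families, and that combining it with fine-tuning brings further gains, giving it practical deployment value.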

Comments ACL 2026 Findings

2510.21954 2026-05-12 cs.CL

Model-Aware Tokenizer Transfer

Mykola Haltiuk, Aleksander Smywinski-Pohl

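AI summary: Large language models (LLMs) have made progress in supporting many languages, but their predefined tokenizers remain a bottleneck when adapting to low-resource languages or languages with different scripts. This paper proposes MATT, a model-aware tokenizer transfer method that introduces an Attention Influence Modeling (AIM) objective to transfer the inter-token communication patterns of the source model to a target model that uses the new tokenizer, improving the effectiveness of tokenizer transfer. Experiments show that MATT recovers most of the original model's performance within a short training time and outperforms conventional heuristic methods, demonstrating the value of using model-internal signals for tokenizer transfer.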

2510.20797 2026-05-12 cs.CL cs.AI cs.LG

No Mean Feat: Simple, Strong Baselines for Context Compression

Yair Feldman, Yoav Artzi

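AI summary: This paper studies context compression, which reduces the inference cost of Transformer models by replacing long inputs with short precomputed representations, particularly for tasks such as retrieval-augmented generation (RAG). The authors propose BenchPress, a standardized evaluation framework, and design two simple baselines that significantly outperform the commonly used causal compression methods. They find that bidirectional attention is advantageous for producing compressed representations, and that simple pooling is also an effective way to compress context.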

Comments Code available at https://github.com/lil-lab/benchpress
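
As an illustration of what "simple pooling" compression can mean, here is a minimal sketch that mean-pools a sequence of n hidden-state vectors into ceil(n / r) vectors for a compression ratio r. The window layout, the handling of a shorter final window, and the function name `mean_pool_compress` are assumptions for illustration; the paper and its repository define the actual baselines.

```python
import numpy as np

def mean_pool_compress(h, r):
    """Compress h of shape (n, d) to (ceil(n / r), d) by averaging consecutive windows of r rows.
    The last window may contain fewer than r rows; it is averaged over the rows it actually has."""
    n, d = h.shape
    m = -(-n // r)                         # ceil(n / r) with integer arithmetic
    pad = m * r - n
    padded = np.vstack([h, np.zeros((pad, d))])
    counts = np.full(m, r, dtype=float)
    counts[-1] -= pad                      # true number of rows in the last window
    return padded.reshape(m, r, d).sum(axis=1) / counts[:, None]

h = np.arange(20.0).reshape(10, 2)         # 10 "token states" of dimension 2
print(mean_pool_compress(h, 4))            # rows: [3, 4], [11, 12], [17, 18]
```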

2510.17671 2026-05-12 cs.LG cs.AI cs.CL

LILO: Bayesian Optimization with Natural Language Feedback

Katarzyna Kobalczyk, Zhiyuan Jerry Lin, Benjamin Letham, Zhuokai Zhao, Maximilian Balandat, Eytan Bakshy

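AI summary: LILO is an optimization framework that combines Bayesian optimization with large language models to tackle real-world optimization problems guided by complex, subjective preferences. It uses an LLM to turn natural-language feedback into structured preference signals, lifting the restriction of conventional preference optimization to scalar or pairwise feedback. By incorporating these preferences into a Gaussian process surrogate model, LILO retains sample efficiency and stability while offering a more flexible feedback interface, and it outperforms conventional methods and pure LLM-based optimizers on several benchmarks.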

2510.09592 2026-05-12 cs.CL

Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

Donghang Wu, Haoyang Zhang, Jun Chen, Xiangyu, Zhang, Hexin Liu, Eng Siong Chng, Fei Tian, Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Gang Yu

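AI summary: This paper proposes Mind-Paced Speaking (MPS), a framework that addresses a key obstacle for real-time spoken language models: chain-of-thought (CoT) reasoning adds so much generation latency that efficient real-time reasoning becomes impractical. Inspired by the division of labor in the human brain, the method uses a "dual-brain" architecture in which high-level reasoning and speech generation are handled by two separate modules, avoiding mode switching and keeping the reasoning process intact. Experiments show that MPS outperforms existing methods in both reasoning accuracy and real-time responsiveness, offering an effective way to combine high-quality reasoning with real-time interaction.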

2510.09580 2026-05-12 cs.AI cs.CL

GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data

Margarita Belova, Jiaxin Xiao, Shikhar Tuli, Niraj K. Jha

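AI summary: This paper proposes GraphMERT, an efficient and scalable model for distilling reliable knowledge graphs (KGs) from unstructured text. The model uses a graphical encoder-only architecture to produce domain-specific KGs that are factual and semantically consistent, addressing the scalability and reliability shortcomings of traditional neurosymbolic systems. Experiments show that on medical-domain text (PubMed papers on diabetes), the KG produced by the 80M-parameter GraphMERT scores substantially higher than one produced by a 32B-parameter LLM on factuality and validity metrics (69.8% vs. 40.2% FActScore; 68.8% vs. 43.0% ValidityScore).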

Comments Camera-ready version. Published in Transactions on Machine Learning Research (TMLR), 2026. Reviewed on OpenReview: https://openreview.net/forum?id=tnXSdDhvqc

Journal ref: Transactions on Machine Learning Research, 2026

Abstract

Researchers have pursued neurosymbolic artificial intelligence (AI) applications for nearly three decades. A marriage of the neural and symbolic components can lead to rapid advancements in AI. Yet, the field has not realized this promise since most neurosymbolic AI frameworks fail to scale. In addition, the implicit representations and approximate reasoning of purely neural approaches limit interpretability and trust. Knowledge graphs (KGs), a gold-standard representation of explicit semantic knowledge, can address the symbolic side of the problem. However, automatically deriving reliable KGs from text corpora remains an open problem. We address these challenges by introducing GraphMERT, a tiny graphical encoder-only model that distills high-quality KGs from unstructured text corpora and its own internal representations. GraphMERT and its equivalent KG form a modular neurosymbolic stack: neural learning of abstractions; symbolic KGs for verifiable reasoning. GraphMERT + KG is the first efficient and scalable neurosymbolic model to achieve state-of-the-art benchmark accuracy along with superior symbolic representations relative to baselines. Concretely, we target reliable domain-specific KGs that are both (1) factual (with provenance) and (2) valid (ontology-consistent relations with domain-appropriate semantics). When a large language model (LLM), e.g., Qwen3-32B, generates domain-specific KGs, it falls short on reliability due to prompt sensitivity, shallow domain expertise, and hallucinated relations. On text obtained from PubMed papers on diabetes, our 80M-parameter GraphMERT yields a KG with a 69.8% FActScore; a 32B-parameter baseline LLM yields a KG that achieves only 40.2% FActScore. The GraphMERT KG also attains a higher ValidityScore of 68.8%, versus 43.0% for the LLM baseline.

2510.04988 2026-05-12 cs.LG

Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization

Kristi Topollai, Anna Choromanska

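AI summary: Targeting the fixed momentum coefficients commonly used in deep learning optimization, this paper proposes a model-based adaptive memory momentum mechanism that adjusts the momentum coefficient online to improve optimization performance. The method builds two planes approximating the objective, one from the current gradient and one from the accumulated history of gradients, and derives an adaptive momentum update rule from them, with no extra assumptions or hyperparameter tuning. Experiments show that the method outperforms standard SGD and Adam with hand-tuned momentum across a variety of learning tasks, opening a new direction for research on adaptive optimizers.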

2510.03648 2026-05-12 cs.LG

SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network

Huijing Zhang, Muyang Cao, Linshan Jiang, Xin Du, Di Yu, Changze Lv, Shuiguang Deng

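AI summary: This paper proposes SAFA-SNN, an on-device few-shot class-incremental learning method based on spiking neural networks (SNNs), addressing the challenge of continually learning new classes on edge devices with limited data. It mitigates catastrophic forgetting through sparsity-aware neuron dynamics and a fast-adaptive network structure, uses zeroth-order optimization to handle the non-differentiability of spikes, and applies orthogonal subspace projection to make class prototypes more discriminative. Experiments show that SAFA-SNN outperforms existing methods on several benchmark datasets, with higher accuracy and lower energy consumption.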

Comments Published as a conference paper at ICLR 2026
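
The summary says SAFA-SNN uses zeroth-order optimization to sidestep the non-differentiability of spikes. As a generic illustration (not necessarily the estimator the paper uses), a two-point Gaussian-smoothing estimator needs only function evaluations, so it can be applied even when f contains a step-like spiking nonlinearity, where it estimates the gradient of a smoothed version of f. The smoothing radius `mu`, the sample count, and the name `zo_grad` are illustrative choices.

```python
import numpy as np

def zo_grad(f, x, mu=1e-3, n_samples=1000, rng=None):
    """Two-point zeroth-order gradient estimate that uses only values of f:
    g = mean over u ~ N(0, I) of (f(x + mu u) - f(x - mu u)) / (2 mu) * u."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_samples

# Sanity check on a smooth function whose true gradient is known (grad f(z) = z).
f = lambda z: 0.5 * np.sum(z ** 2)
x = np.array([1.0, -2.0, 0.5])
g = zo_grad(f, x, n_samples=20000, rng=np.random.default_rng(0))
print(g)                                   # close to x, up to Monte Carlo noise
```

The estimate is unbiased for the gradient of the Gaussian-smoothed function; its variance grows with dimension, which is the usual price of avoiding backpropagation.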

2509.25742 2026-05-12 cs.LG

Less is More: Towards Simple Graph Contrastive Learning

Yanan Zhao, Feng Ji, Jingyang Dai, Jiaze Ma, Wee Peng Tay

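AI summary: This paper examines the effectiveness of graph contrastive learning (GCL) on heterophilic graphs and proposes a simple, efficient GCL method. Analyzing the relationship between graph structure and node features, the authors find that using structural features to reduce noise in node features improves contrastive learning. Building on this, they design a simple model that needs neither data augmentation nor negative samples; it performs strongly on heterophilic graphs and is also efficient in computation and memory.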

2509.25646 2026-05-12 cs.LG cs.NA math.NA

Deep set based operator learning with uncertainty quantification

Lei Ma, Ling Guo, Hao Wu, Tao Zhou

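AI summary: This paper proposes UQ-SONet, a deep-set-based operator learning framework with uncertainty quantification for learning operators in scientific computing from data. It uses a set transformer embedding to handle sparse and variable sensor locations, and a conditional variational autoencoder to approximate the conditional distribution of the solution operator, providing principled uncertainty estimates while maintaining predictive accuracy. Experiments show that the framework is robust and effective on both deterministic and stochastic partial differential equations.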

2509.22196 2026-05-12 cs.LG stat.ML

Mechanistic Independence: A Principle for Identifiable Disentangled Representations

Stefan Matthes, Zhiwei Han, Hao Shen

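AI summary: This paper proposes a unified framework based on "mechanistic independence" for identifiable disentangled representations. Its key idea is to characterize latent factors by how they act on the observed variables rather than by properties of their distribution. The approach remains valid when the latent density changes and even when statistical dependencies are introduced among the latents. The authors propose several independence criteria and prove that the latent space is identifiable even under nonlinear, non-invertible mixing. They also establish a hierarchy among these criteria and give a graph-theoretic characterization of the latent-space structure, providing a new theoretical foundation for identifiable disentangled representations.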

2509.21743 2026-05-12 cs.AI cs.LG

Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts

Ammar Ahmed, Azal Ahmad Khan, Ayaan Ahmad, Sheng Di, Zirui Liu, Ali Anwar

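AI summary: This paper proposes Retrieval-of-Thought (RoT), an efficient reasoning method that addresses the latency and cost incurred when large models generate long reasoning traces. RoT retrieves and reuses "thought" steps from earlier reasoning, organizing them into a composable thought graph from which it quickly assembles reasoning templates for new problems, reducing redundant exploration. Experiments show that RoT maintains reasoning accuracy while substantially reducing output tokens, inference latency, and compute cost.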

2509.21671 2026-05-12 cs.LG q-bio.NC

Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli

Andrii Zahorodnii, Christopher Wang, Geeling Chau, Bennett Stankovits, Charikleia Moraitaki, Eli Gross, Alexander Brady, Andrei Barbu, Boris Katz, Ila R Fiete

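AI summary: This paper introduces Neuroprobe, a suite of decoding tasks for evaluating brain responses to naturalistic stimuli recorded with intracranial EEG (iEEG). Built on the BrainTreebank dataset, which contains over 40 hours of iEEG recordings from 10 subjects watching movies, it is designed to systematically determine when and where in the brain the features of language processing become decodable. Neuroprobe helps reveal how language and auditory information are processed in the brain, and it also provides a standardized evaluation platform for comparing the architectures and training methods of neural foundation models.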

Comments 38 pages, 7 main figures, 16 supplementary figures, 13 tables

Abstract

High-resolution neural datasets enable foundation models for the next generation of brain-computer interfaces and neurological treatments. The community requires rigorous benchmarks to discriminate between competing modeling approaches, yet no standardized evaluation frameworks exist for intracranial EEG (iEEG) recordings. To address this gap, we present Neuroprobe: a suite of decoding tasks for studying multi-modal language processing in the brain. Unlike scalp EEG, intracranial EEG requires invasive surgery to implant electrodes that record neural activity directly from the brain with minimal signal distortion. Neuroprobe is built on the BrainTreebank dataset, which consists of over 40 hours of iEEG recordings from 10 human subjects performing a naturalistic movie viewing task. Neuroprobe serves two critical functions. First, it is a source from which neuroscience insights can be drawn. The high temporal and spatial resolution of the labeled iEEG allows researchers to systematically determine when and where computations for each aspect of language processing occur in the brain by measuring the decodability of each feature across time and all electrode locations. Using Neuroprobe, we visualize how information flows from key language and audio processing sites in the superior temporal gyrus to sites in the prefrontal cortex. We also demonstrate the time evolution of processing from simple auditory features (e.g., pitch and volume) to more complex language features (e.g., part of speech) in a purely data-driven manner. Second, as the field moves toward neural foundation models trained on large-scale datasets, Neuroprobe provides a rigorous framework for comparing competing architectures and training protocols. We make the code for Neuroprobe openly available, aiming to enable rapid progress in the field of iEEG foundation models. Public leaderboard: https://neuroprobe.dev/

2509.15816 2026-05-12 cs.LG

On the Convergence of Muon and Beyond

Da Chang, Yongxiang Liu, Ganzhao Yuan

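AI summary: This paper studies the convergence of the Muon optimizer in nonconvex stochastic optimization. To close the gap between its theoretical analysis and its practical effectiveness, it proposes two momentum-based variance-reduced variants, Muon-MVR1 and Muon-MVR2. A rigorous analysis shows that, with a horizon-free learning-rate schedule, Muon-MVR2 attains the optimal anytime convergence rate $\widetilde{\mathcal{O}}(T^{-1/3})$, and the paper also provides convergence guarantees under the Polyak–Łojasiewicz condition. Experiments on CIFAR-10 and C4 show that the proposed methods work well in practice.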
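
For context on the optimizer being analyzed: Muon updates each 2-D weight matrix with an orthogonalized momentum, approximating the orthogonal polar factor $UV^\top$ of the momentum matrix $M = U S V^\top$ by Newton-Schulz iterations. The sketch below uses the classic cubic iteration $X \leftarrow 1.5X - 0.5XX^\top X$ for clarity (public Muon implementations use a tuned quintic polynomial instead) and shows a single plain Muon step; the paper's variance-reduced Muon-MVR1/MVR2 updates are not shown, and the learning rate, momentum value, and iteration count are illustrative assumptions.

```python
import numpy as np

def newton_schulz_orth(G, iters=50, eps=1e-12):
    """Approximate the orthogonal polar factor U V^T of G = U S V^T
    with the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X."""
    X = G / (np.linalg.norm(G) + eps)    # Frobenius scaling puts all singular values in (0, 1]
    transpose = X.shape[0] > X.shape[1]
    if transpose:                        # iterate on the wide orientation so X X^T is the smaller Gram matrix
        X = X.T
    for _ in range(iters):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X   # each singular value s -> 1.5 s - 0.5 s^3, converging to 1
    return X.T if transpose else X

def muon_step(W, grad, M, lr=0.02, beta=0.95):
    """One simplified Muon step: accumulate momentum, then apply the orthogonalized update."""
    M = beta * M + grad
    W = W - lr * newton_schulz_orth(M)
    return W, M

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
M = np.zeros_like(W)
grad = rng.normal(size=(4, 6))           # stand-in for a stochastic gradient
W, M = muon_step(W, grad, M)
O = newton_schulz_orth(grad)
print(np.allclose(O @ O.T, np.eye(4), atol=1e-6))   # checks that the rows are orthonormal
```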

2509.13484 2026-05-12 cs.CV cs.CY

MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes

Liu Liu, Alexandra Kudaeva, Marco Cipriano, Fatimeh Al Ghannam, Freya Tan, Gerard de Melo, Andres Sevtsuk

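AI summary: This paper proposes MINGLE, a vision-language-model approach for detecting semantically complex social-group regions in urban scenes. It combines human detection, depth estimation, vision-language model reasoning, and a spatial aggregation algorithm to identify and localize regions of social interaction in images. The authors also build a new dataset of 100,000 urban street-view images annotated with bounding boxes and labels for individuals and social groups, providing a valuable resource for related research.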

Comments 13 pages, 4 figures. Updated with the camera-ready version after acceptance

2509.13332 2026-05-12 cs.AI cs.CL

Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness

Pratik Jayarao, Himanshu Gupta, Neeraj Varshney, Chaitanya Dwivedi

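AI summary: This paper systematically compares explicit-reasoning ("thinking") and non-reasoning ("non-thinking") models in the LLM-as-a-judge setting in terms of accuracy, efficiency, and robustness. In comparative experiments with the open-source Qwen 3 model family, explicit-reasoning models significantly improve judgment accuracy at relatively low computational overhead and are more stable under a range of bias conditions. The study also finds that the advantage of explicit reasoning is not limited to English tasks but holds in multilingual settings as well.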

Comments Accepted in 2025 NeurIPS Foundations of Reasoning in Language Models Workshop

2509.12635 2026-05-12 cs.CL cs.AI

Positional Encoding via Token-Aware Phase Attention

Yu Wang, Sheng Shen, Rémi Munos, Hongyuan Zhan, Yuandong Tian

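AI summary: This paper studies the intrinsic distance-dependent bias of rotary position embeddings (RoPE) in long contexts and proposes a new positional encoding method, Token-Aware Phase Attention (TAPA). TAPA introduces a learnable phase function into the attention mechanism, preserving long-range token interactions and supporting direct, lightweight continued pretraining; in long-context settings it achieves lower perplexity and stronger retrieval performance than RoPE baselines.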

Comments 28 pages
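
The starting point of this work is that RoPE makes attention scores depend on the relative offset between query and key positions. A minimal sketch of standard RoPE (pairing adjacent dimensions and using the common base 10000, both conventional choices rather than anything specific to this paper) makes that property checkable: shifting both positions by the same amount leaves the score unchanged, because $R(m\theta)q \cdot R(n\theta)k = q \cdot R((n-m)\theta)k$ for each 2-D pair. TAPA itself, with its learnable phase function, is not shown; the function names are hypothetical.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) of a 1-D vector x (even length d)
    by the angle pos * base**(-2i/d)."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)   # one frequency per 2-D pair
    c, s = np.cos(pos * freqs), np.sin(pos * freqs)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * c - x2 * s
    out[1::2] = x1 * s + x2 * c
    return out

def score(q, k, m, n):
    """Unscaled attention logit between a query at position m and a key at position n."""
    return float(rope(q, m) @ rope(k, n))

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
print(np.isclose(score(q, k, 3, 10), score(q, k, 103, 110)))   # True: only m - n matters
```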