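arXivDaily: arXiv Daily Academic Digest, updated Monday to Friday

AI Large Models

Code LLMs / AI Programming

Code generation, software engineering agents, program repair, test generation, and developer tools.

Entries for today/current date: 2. Signal sources: cs.SE, cs.CL, cs.AI, cs.LG, cs.PL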
2602.06774 2026-06-18 cs.AI Version update 85%

Towards Understanding What State Space Models Learn About Code


Jiali Wu, Abhinav Anand, Shweta Verma, Mira Mezini

Affiliations: TU Darmstadt (Technical University of Darmstadt); Hessian Center for Artificial Intelligence; National Research Center for Applied Cybersecurity ATHENE

Topic match: Code evaluation: analysis of SSM code-understanding mechanisms

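AI summary: This paper presents the first systematic analysis of what state space models (SSMs) learn about code. It finds that SSMs capture syntactic and semantic structure more effectively than Transformers during pretraining but forget certain relations during fine-tuning on some tasks. It proposes the SSM-Interpret framework and architectural modifications that improve MRR on NLCodeSearch by up to +6.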


English abstract

State Space Models (SSMs) have emerged as an efficient alternative to the Transformer architecture. Prior work shows that, when trained under comparable conditions, SSMs can match or surpass Transformers on code understanding tasks. However, their internal mechanisms remain a black box. We present the first systematic analysis of what SSM-based code models learn, along with a direct comparison between SSM and Transformer models in this domain. Our analysis shows that SSMs capture syntactic and semantic structure more effectively than Transformers during pretraining but forget certain relations during fine-tuning on some tasks. To investigate this behavior, we introduce SSM-Interpret, a frequency-domain framework that exposes a spectral shift toward short-range dependencies during fine-tuning. Guided by these findings, we propose architectural modifications that significantly improve the performance of SSM-based code models by up to +6 MRR on NLCodeSearch. This demonstrates that our analysis not only explains model behavior but also leads directly to better designs.
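For readers unfamiliar with the metric quoted above, the following is a minimal sketch of mean reciprocal rank (MRR), the standard retrieval measure that averages 1/rank of the first correct result over all queries. It is not the paper's evaluation code: the function name, the toy rankings, and the snippet ids are hypothetical, and the "+6 MRR" figure is presumably reported on a 0-100 scale, which is an assumption here.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the relevant item per query (0 if it is absent)."""
    if len(rankings) != len(relevant):
        raise ValueError("need exactly one relevant item per ranked list")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, relevant):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break  # only the first hit counts
    return total / len(rankings)


# Hypothetical toy data: three queries, each with a ranked list of snippet ids.
# The correct snippet "a" is ranked 1st, 2nd, and 3rd respectively.
toy_rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
toy_relevant = ["a", "a", "a"]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)
```

On this toy input the reciprocal ranks are 1, 1/2, and 1/3, so the mean is 11/18. Multiplying by 100 gives the point scale on which a gain such as "+6" would be read under the assumption stated above.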

2604.00730 2026-06-18 cs.CY cs.AI cs.LG cs.SE Version update 75%

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch


Ricardo Hidalgo-Aragón, Jesús M. González-Barahona, Gregorio Robles

Affiliations: Universidad Rey Juan Carlos (King Juan Carlos University)

Topic match: Code evaluation: assessing Scratch programming skills with Fuzzy C-Means clustering

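AI summary: Proposes a CEFR-based framework for assessing Scratch projects that uses Fuzzy C-Means clustering to grade more than 2 million projects, identifies a B2 bottleneck, and introduces a classification-certainty metric to balance automated feedback with human review.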

Comments: Best Paper Award, CSEDU 2026 (minor change: FPC fix)


English abstract

Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implementing an ordinal criterion to map clusters to CEFR levels (A1-C2), and introducing enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps, notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic, Synchronization, and Data Representation, while providing certainty-based triggers for human intervention.
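The method described in the abstract can be illustrated with a minimal sketch of Fuzzy C-Means followed by an ordinal mapping of clusters to levels. This is not the authors' implementation. The one-dimensional scores are made up, the ordinal criterion used here (sorting clusters by centroid value) is an assumption, and so is the certainty measure (the largest membership of each project), along with the hypothetical `review_threshold` that flags low-certainty cases for a human.

```python
import numpy as np

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def fuzzy_c_means(points, n_clusters, fuzziness=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means; returns (centroids, memberships)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(points, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Random initial memberships, each row normalised to sum to 1.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** fuzziness
        centroids = (w.T @ x) / w.sum(axis=0)[:, None]
        # Distance from every point to every centroid, floored to avoid 0-division.
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        inv = dist ** (-2.0 / (fuzziness - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centroids, u


def assign_levels(scores, levels=CEFR_LEVELS, review_threshold=0.5, seed=0):
    """Map 1-D scores to ordered levels; flag low-certainty cases for review."""
    centroids, u = fuzzy_c_means(scores, n_clusters=len(levels), seed=seed)
    # Assumed ordinal criterion: the cluster with the lowest centroid gets
    # the lowest level, and so on upwards.
    order = np.argsort(centroids[:, 0])
    level_of_cluster = {int(c): levels[rank] for rank, c in enumerate(order)}
    best = u.argmax(axis=1)
    certainty = u.max(axis=1)  # assumed certainty measure: top membership
    assigned = [level_of_cluster[int(c)] for c in best]
    needs_review = certainty < review_threshold
    return assigned, certainty, needs_review


# Made-up scores, sorted ascending, used only to exercise the sketch.
toy_scores = [1, 2, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21]
toy_levels, toy_certainty, toy_review = assign_levels(toy_scores)
```

Because memberships are soft, a project that sits between two centroids receives a low top membership. That is the kind of transitional case a certainty-based trigger would route to an instructor instead of grading automatically.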