arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2606.12287 2026-06-11 cs.NE cs.AI New submission

SpikeDecoder: Realizing the GPT Architecture with Spiking Neural Networks

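Claas Beger, Florian Walter, Alois Knoll

AI summary: Proposes SpikeDecoder, a Transformer decoder built on spiking neural networks (SNNs) for natural language processing. The work replaces ANN blocks with spiking alternatives, compares methods for embedding text as spikes, analyzes the resulting performance trade-offs, and reports a theoretical energy reduction of 87% to 93% relative to the ANN baseline.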

Abstract

The Transformer architecture is widely regarded as the most powerful tool for natural language processing, but due to a high number of complex operations, it inherently faces the issue of high energy consumption. To address this issue, we consider Spiking Neural Networks (SNNs), which are an energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their naturally event-driven approach to processing information. However, this inherently makes them difficult to train. Often, many SNN-based models circumvent this issue by converting pre-trained ANNs. More recently, attempts have been made to design directly trainable SNN-based adaptations of the Transformer model structure. Although the results showed great promise, the application field was computer vision. Moreover, the proposed model incorporates only encoder blocks. In this paper, we propose SpikeDecoder, a fully SNN-based implementation of the Transformer decoder block, for applications in natural language processing. In a series of experiments, we analyze the impact of exchanging different blocks of the ANN model with spike-based alternatives to identify trade-offs and significant sources of performance loss. We further investigate the role of residual connections and the selection of SNN-compatible normalization techniques. Besides the work on the model architecture, we formulate and compare different embedding methods to project text data into spikes. Finally, we demonstrate that our proposed SNN-based decoder block reduces the theoretical energy consumption by 87% to 93% compared to the ANN baseline.
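
To make the event-driven idea concrete, here is a minimal sketch of a single spiking neuron. The abstract does not say which neuron model SpikeDecoder uses, so the leaky integrate-and-fire (LIF) dynamics, the `threshold` and `leak` values, and the hard reset below are all assumptions chosen for illustration.

```python
def lif_spikes(currents, threshold=1.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Assumed model (not taken from the paper): the membrane potential decays
    by `leak` each step, accumulates the input current, and is reset to zero
    after it crosses `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in currents:
        potential = leak * potential + current  # leaky integration
        if potential >= threshold:
            spikes.append(1)   # emit a binary spike
            potential = 0.0    # hard reset after firing
        else:
            spikes.append(0)   # stay silent: no downstream work this step
    return spikes


example = lif_spikes([0.6, 0.6, 0.0, 0.3, 0.9])
```

Because the output is binary and mostly zero, downstream layers only need to do work on the steps where a spike occurs, which is the usual argument for the lower theoretical energy cost of SNNs.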

2606.12282 2026-06-11 cs.SD cs.LG New submission

PianoKontext: Expressive Performance Rendering from Deadpan Context

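Dmitrii Gavrilev

AI summary: Proposes PianoKontext, a flow-matching model for piano performance rendering that uses Dynamic Time Warping to align latent representations of score and performance, producing expressive performances of variable length.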

Comments
ICML 2026 Workshop on Machine Learning for Audio (Oral)
Abstract

Expressive performance rendering (EPR) aims to generate realistic performances constrained on sequences of notes. However, flow matching audio editing models manipulate only synchronized music samples of the same duration, limiting their understanding of expressive timing. We introduce PianoKontext, a flow matching rendering model for classical piano music that generates variable-length performances in the latent space of a pretrained Music2Latent model. We synthesize MIDI scores into deadpan audio and employ Dynamic Time Warping (DTW) in the latent space to construct paired data for training. The aligned embeddings are concatenated in DiT blocks, allowing for a simple and effective learning of the dependencies between the score and performances. Audio samples are available at our demo page: this https URL.
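
The alignment step can be illustrated with a textbook Dynamic Time Warping routine. This is only a sketch: the paper aligns latent vectors from Music2Latent, whereas the code below aligns plain numbers with an absolute-difference cost, and the function name `dtw_path` is hypothetical.

```python
def dtw_path(score, performance):
    """Return the minimum-cost DTW alignment as a list of (i, j) index pairs."""
    n, m = len(score), len(performance)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning score[:i] with performance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(score[i - 1] - performance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance the score only
                cost[i][j - 1],      # advance the performance only
            )
    # Backtrack from the end to recover which frames were paired.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return path


# A three-frame "score" stretched into a five-frame "performance".
pairs = dtw_path([1, 2, 3], [1, 1, 2, 3, 3])
```

The returned pairs show how one score frame can map to several performance frames, which is what lets paired training data be built from sequences of different lengths.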

2606.12281 2026-06-11 cs.MA cs.AI cs.LG New submission

CCKS: Consensus-based Communication and Knowledge Sharing

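Jinyuan Zu, Xiaowei Lv, Yongcai Wang, Deying Li, Yunjun Han, Wenping Chen, Fengyi Zhang, Naiqi Wu

AI summary: Addresses over-reliance on teacher guidance in action advising for multi-agent reinforcement learning. Proposes a consensus-based communication and knowledge sharing framework that builds consensus models with contrastive learning, balancing exploration against learning from teachers and improving cooperation efficiency and performance.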

Abstract

In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents' training phase. In action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments conducted in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that the integration with CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at this https URL.

2606.12279 2026-06-11 cs.NE cs.AI cs.LG New submission

Mathematical perspective on genetic algorithms with optimization guided operators

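Anna Brandenberger, Ilan Doron-Arad, Elchanan Mossel

AI summary: Models genetic algorithms mathematically, casting optimization as a query-complexity problem. Proves that some problems require generation, mutation, and recombination operators together, and shows the key role of diversity in the solution pool.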

Comments
18 pages, 1 figure
Abstract

Recent work in ML applies genetic algorithms at inference time to iteratively improve solutions to optimization problems. The basic mutation and recombination operators involved are qualitatively different from those studied classically. Mutations are no longer random; an ML algorithm mutates a solution with the goal of improving an objective. Similarly, recombination is not based on random collages of parent solutions. Instead, it is an ML optimization-based operator whose goal is to synthesize improved solutions from its inputs. Thus, these mutation and recombination operators are more likely to improve the objective, but their computational cost is much higher. We introduce a general model of genetic algorithms and formulate optimization in this model as a query-complexity problem, using the language of reinforcement learning. We then study specialized models. We show that some optimization problems require generation, mutation, and recombination to be solved. We then obtain qualitatively tight algorithms for a family of problems within this framework that captures the nontrivial role of diversity in the solution pool, a key feature of practical ML genetic algorithms.
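
As a toy illustration of the setting, the loop below runs a genetic algorithm whose operators are "guided" rather than random, and counts every operator call as one query. Everything here is an assumption made for the sketch: the bitstring objective, the operator definitions, and the names `guided_mutation`, `guided_recombination`, and `run` are hypothetical and do not come from the paper's model.

```python
import random


def fitness(solution):
    """Toy objective: the number of 1 bits."""
    return sum(solution)


def guided_mutation(solution):
    """Hypothetical guided mutation: flip one 0 bit to 1 when possible."""
    zeros = [i for i, bit in enumerate(solution) if bit == 0]
    mutated = solution[:]
    if zeros:
        mutated[random.choice(zeros)] = 1
    return mutated


def guided_recombination(a, b):
    """Hypothetical guided recombination: keep the better bit per position."""
    return [max(x, y) for x, y in zip(a, b)]


def run(n_bits=16, pool_size=4, steps=20, seed=0):
    """Return (best fitness found, number of operator queries used)."""
    random.seed(seed)
    # Generation: sample an initial, diverse pool of candidate solutions.
    pool = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pool_size)]
    queries = 0
    for _ in range(steps):
        parent_a, parent_b = random.sample(pool, 2)
        child = guided_mutation(guided_recombination(parent_a, parent_b))
        queries += 2  # one recombination call plus one mutation call
        worst = min(range(pool_size), key=lambda i: fitness(pool[i]))
        if fitness(child) > fitness(pool[worst]):
            pool[worst] = child  # replace the weakest pool member
    return max(fitness(s) for s in pool), queries


best, used = run()
```

Counting operator calls rather than wall-clock time mirrors the query-complexity view described in the abstract, where each guided operator is expensive to invoke.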

2606.12260 2026-06-11 econ.TH cs.AI cs.GT cs.LG stat.ML New submission

Market Design for AI: Beyond the Copyright Binary

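Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani

AI summary: Uses static and dynamic game models to show that both the "free-for-all" and "strong intellectual property rights" regimes fail in markets for AI training data, and proposes a market design in which a data intermediary internalizes externalities and subsidizes innovative contributions.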

Abstract

How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.

2606.12247 2026-06-11 cs.CY cs.CL New submission

Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research

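Andrés Abeliuk, Cinthia Sanchez Macias, Valentina Alarcón, Álvaro Madariaga, Claudia Lopez

AI summary: Proposes Situated Interaction Auditing (SIA), a user-centered framework for studying LLM bias by analyzing how user profile signals (sociodemographic markers, writing style, and stated identity) systematically shape the quality, content, and tone of LLM responses.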

Abstract

Research on bias in large language models (LLMs) has predominantly focused on third-person audits, which study how models represent or evaluate demographic groups as external subjects. However, this paradigm overlooks a structural blind spot because the user is absent from the audit. In practice, LLMs are used in open-ended, personal interactions, during which the model implicitly represents the user and adjusts its responses accordingly. When identical requests yield different responses depending on who is asking, bias manifests not in how the model describes others but in how it treats its interlocutor. We propose Situated Interaction Auditing (SIA), a user-centered framework for studying how user profile signals -- implicit sociodemographic markers, writing style, and stated identity -- systematically shape LLM response quality, content, and tone. We demonstrate the framework through a case study that intersects gender and socioeconomic status signals across multiple task domains and outline a research agenda for SIA as a new mission for natural language processing.

2606.12245 2026-06-11 cs.IR cs.AI New submission

DiffCold: A Diffusion-based Generative Model for Cold-Start Item Recommendation

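Kangning Zhang, Yingjie Qin, Weinan Zhang, Yong Yu, Jianghao Lin

AI summary: Targets the seesaw dilemma in cold-start item recommendation with DiffCold, a conditional-diffusion generative model that reconstructs warm item embeddings from content while preserving manifold structure. A retrieval-enhanced aggregator and a simulation-based representation alignment module unify cold and warm item representations.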

Comments
Accepted by ECML-PKDD 2026
Abstract

Cold-start item recommendation remains a persistent challenge in real-world systems due to the absence of interaction histories. While prior models attempt to bridge this gap using item content features, they universally suffer from the seesaw dilemma: enhancing performance for cold items inevitably degrades performance for warm items, and vice versa. We identify that this dilemma stems from a fundamental distributional disparity: warm item embeddings occupy a complex "behavioral manifold" shaped by rich interaction signals, whereas cold item embeddings are constrained to a "semantic manifold" derived solely from auxiliary content. Existing methods often force a rigid mapping between these inconsistent spaces, causing the model to sacrifice the precision of warm representations to accommodate cold ones. To address this, we propose DiffCold, a diffusion-based generative model that unifies warm and cold representations. Unlike GANs or VAEs, DiffCold leverages conditional diffusion to reconstruct warm item embeddings from content, preserving the underlying manifold structure without degradation. We further tailor this paradigm with two specific designs: a Retrieval-enhanced Aggregator that initializes generation using semantically similar warm items to bypass inefficient noise, and a Simulation-based Representation Alignment module that enforces distribution consistency between generated and real embeddings via contrastive learning. Experiments on three benchmarks confirm that DiffCold resolves the seesaw dilemma, consistently outperforming state-of-the-art methods across all metrics.
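
The retrieval-based initialization can be pictured roughly as follows. The abstract does not describe how the aggregator is built, so this sketch simply assumes cosine similarity over content vectors and a plain average of the top-k warm embeddings; the function names and the parameter `k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieval_init(cold_content, warm_items, k=2):
    """Average the embeddings of the k warm items whose content is most similar.

    warm_items is a list of (content_vector, behavioral_embedding) pairs.
    The result would serve as a starting point for generation in place of
    pure noise (an assumption about how such an initializer might be used).
    """
    ranked = sorted(
        warm_items,
        key=lambda item: cosine(cold_content, item[0]),
        reverse=True,
    )[:k]
    dim = len(ranked[0][1])
    return [sum(item[1][d] for item in ranked) / len(ranked) for d in range(dim)]


warm = [
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.9, 0.1], [3.0, 1.0]),
    ([0.0, 1.0], [10.0, 10.0]),
]
start = retrieval_init([1.0, 0.0], warm, k=2)
```

Here the dissimilar third item is ignored, so the starting point stays close to the embeddings of items whose content resembles the cold item.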

2606.12243 2026-06-11 cs.CL cs.AI New submission

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

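Yuchen Xian, Yang He, Yunqiu Xu, Yi Yang

AI summary: Proposes VIA-SD, a multi-tier verification framework in which a slim verifier derived from the full verifier handles medium-confidence tokens. This reduces large-model calls and delivers 10-20% speedups over strong speculative decoding baselines across several tasks.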

Comments
Accepted at the 43rd International Conference on Machine Learning (ICML 2026)
Abstract

Speculative decoding (SD) addresses the high inference costs of LLMs by having lightweight drafters generate candidates for large verifiers to validate in parallel. Existing draft-verify methods use binary decisions: accept or fully recompute. Yet we find that many rejected tokens can be verified correctly by a slim submodel derived from the full verifier via intra-model routing, instead of the full verifier. This motivates our slim-verifier to handle tokens requiring moderate verification resources, reducing expensive large-model calls. We propose Verification via Intra-Model Routing for Speculative Decoding (VIA-SD), a multi-tier framework using a routed slim-verifier. Draft tokens are processed hierarchically: direct acceptance for high-confidence cases, slim-verifier regeneration for medium-confidence cases, and full-model verification for uncertain cases. Across four representative tasks and multiple model families, VIA-SD reduces rejection rates by 0.10-0.22 and delivers 10-20% speedups over strong SD baselines, while achieving 2.5-3x acceleration over non-drafting decoding. Moreover, VIA-SD is compatible with existing SD frameworks without modifying their training procedures. Our results suggest multi-tier SD as a general paradigm for scalable and efficient LLM inference. Project page: this https URL
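
The hierarchical handling of draft tokens can be sketched as a simple three-way router. The abstract does not state how confidence is measured or where the cut-offs lie, so the thresholds and the names `route_token`, `accept_threshold`, and `slim_threshold` are hypothetical.

```python
def route_token(confidence, accept_threshold=0.9, slim_threshold=0.5):
    """Pick a verification tier for one draft token.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    if confidence >= accept_threshold:
        return "accept"          # high confidence: keep the draft token
    if confidence >= slim_threshold:
        return "slim_verifier"   # medium confidence: regenerate with the slim submodel
    return "full_verifier"       # uncertain: fall back to the full model


tiers = [route_token(c) for c in (0.97, 0.72, 0.31)]
```

The speedup reported in the abstract comes from the middle tier: tokens that a binary accept-or-recompute scheme would send to the full verifier are handled by a cheaper submodel instead.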

2606.12231 2026-06-11 cs.SE cs.AI New submission

Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study

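Guangzong Cai, Ruiyin Li, Peng Liang, Zengyang Li, Mojtaba Shahin

AI summary: Mines 7,310 rules from 83 open-source projects and surveys 99 practitioners to build a rule taxonomy with 5 primary and 25 secondary categories. Developers value architectural constraints, yet actual configurations are mostly low-level workflow and code formatting rules. Rule evolution is driven mainly by constructive context expansion and enrichment, and updating rules raises the average artifact compliance rate by 22.99 percentage points.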

Comments
52 pages, 21 images, 8 tables, Manuscript submitted to a Journal (2026)
Abstract

The adoption of AI-powered Integrated Development Environments (AI IDEs) has introduced "Rules" as a novel software artifact, allowing developers to persistently inject project-specific constraints and architectural guidelines into the context of Large Language Models (LLMs). Despite their role in aligning AI behavior with developer intent, the taxonomy, evolution, and practical impact of these rules remain largely unexplored. To bridge this gap, we conducted a mixed-methods empirical study on AI IDE rules. By mining 83 open-source projects and extracting 7,310 rules, we established a comprehensive taxonomy comprising 5 primary and 25 secondary categories. We then triangulated these artifacts with survey responses from 99 practitioners. Our analysis identified a contrast between developer priorities and actual configurations: while practitioners rate architectural constraints as highly important, rule files in repositories primarily consist of low-level workflow and code formatting constraints. Furthermore, our analysis of 1,540 rule evolution events revealed that rules are updated frequently. Repository data further indicate that rule evolution is primarily driven by constructive context expansions (29.17%) and enrichments (26.59%). In contrast, surveyed developers reported modifying rules primarily to correct AI errors (77.78%), typically by adding new negative constraints rather than editing existing ones. Finally, an artifact compliance assessment of 160 rule evolution events revealed that updating rules significantly improves the adherence of software artifacts, with the average artifact compliance rate increasing by 22.99 percentage points (from 49.14% to 72.13%) following an update. Our study provides empirical insights that can help developers optimize prompting strategies and guide tool builders in designing automated conflict-detection and context-management mechanisms for AI IDEs.

2606.12211 2026-06-11 quant-ph cs.LG New submission

Quantum Occam Learning: Sample-Supported Expressibility for Circuit-Based Quantum Learning

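Jeongho Bang, Kyoungho Cho, Jeongwoo Jae

AI summary: Develops an information-theoretic Occam theory for data generated by finite-size quantum circuits and proves a sample-supported expressibility law: at trace-distance accuracy $\epsilon$, $M$ samples support at most about $M\epsilon^2$ gates, which turns circuit complexity into an adaptive statistical resource.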

Comments
22 pages (main text + appendix), 2 figures
Abstract

A central principle in quantum machine learning is that an ansatz should be expressive enough to represent the quantum data of interest. Yet, the expressibility is statistically meaningful only insofar as it can be learned from finitely many copies of an unknown quantum state. In this work, we develop an information-theoretic Occam theory for quantum data generated by finite-size quantum circuits. For the class $S_{n,G}$ of $n$-qubit pure states preparable with at most $G$ two-qubit gates, a metric-entropy argument gives the realizable sample law $\widetilde{\Theta}(G/\epsilon^2)$ in the circuit-limited regime. For an arbitrary source $\hat{\rho}$, we introduce the best $G$-gate approximation error $d_G(\hat{\rho})$ and the approximate circuit complexity $C_\eta(\hat{\rho})$. We prove an agnostic quantum Occam theorem: with $M$ copies, one can learn up to the best $G$-gate approximation error plus a statistical penalty $\widetilde{O}(\sqrt{G/M})$. We then remove the need to know $G$ in advance through an adaptive model-selection theorem whose oracle inequality selects the circuit complexity justified by the data. Matching lower bounds yield a sample-supported expressibility law: at trace-distance accuracy $\epsilon$, $M$ samples can support only $G_{\rm supported} \simeq M\epsilon^2$ gates, up to logarithmic factors and tomography saturation at $2^n$. Thus, the circuit complexity becomes an adaptive statistical resource rather than a static promise. Our framework turns bounded circuit complexity into a model-selection principle for quantum machine learning.
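
One way to read the sample-supported expressibility law, as a sketch that ignores constants, logarithmic factors, and the tomography saturation mentioned in the abstract: require the statistical penalty $\sqrt{G/M}$ from the agnostic theorem to stay below the target accuracy $\epsilon$ and solve for $G$. The numbers in the example are illustrative and are not results from the paper.

```latex
\sqrt{\frac{G}{M}} \le \epsilon
\quad\Longleftrightarrow\quad
G \le M\epsilon^{2},
\qquad\text{e.g.}\quad
M = 10^{6},\ \epsilon = 0.1
\ \Longrightarrow\
G_{\rm supported} \simeq 10^{6} \times 10^{-2} = 10^{4}.
```

Read this way, halving $\epsilon$ at a fixed number of copies $M$ cuts the supported gate count by a factor of four.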

2606.12200 2026-06-11 cs.LG cs.AI New submission

Implicit Neural Representations of Individual Behavior

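Andrew Kang, Priya Narasimhan

AI summary: Proposes Behavioral INR, which uses implicit neural representations to learn policy representations from unlabeled multi-policy behavioral data. FiLM layers modulate the policy function, enabling unsupervised policy identification and improving policy identifiability in continuous state-action spaces.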

Comments
ICML 2026, Structured Probabilistic Inference & Generative Modeling Workshop
Abstract

We study policy representation learning from unlabeled multi-policy behavioral data. Each episode is generated by a fixed policy, but policy labels are unavailable. This setting appears in robotics play, demonstrations, games, racing, and other datasets where heterogeneous behaviors are mixed without annotations. We introduce Behavioral INR, a self-supervised generative model that adapts implicit neural representations (INRs) from vision to behavior. Instead of mapping coordinates to RGB values, Behavioral INR represents a policy as a state-action function mapping states to subsequent actions. An episode-level latent modulates this function through FiLM layers, yielding a generative prior over policies and allowing policy identity to be inferred without supervision. Because INRs treat each datapoint as a sample from an underlying function, the same model naturally accommodates variable episode lengths and different sampling granularities, as in vision INRs with different image resolutions. We also define policy-level out-of-distribution (OOD) shifts along state-distribution and action-distribution axes, which arise when policies overlap in states or actions but are not captured by standard behavioral OOD settings based only on new agents or environments. We evaluate on synthetic Gaussian random field data, MuJoCo demonstrations with controlled OOD splits, and real-world chess, Formula 1 racing, robotics, and Seek-Avoid datasets. Behavioral INR most consistently improves policy identifiability in the hardest continuous state-action settings, especially when longer episodes, more policies, and OOD splits reduce the usefulness of marginal shortcuts; amortized history encoders remain competitive when policy identity can be recovered from symbolic repetition or low-dimensional action statistics. We release code and checkpoints.
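
FiLM (feature-wise linear modulation) itself is a small operation: each hidden feature is scaled and shifted by values derived from a conditioning vector. The sketch below shows only that operation. How Behavioral INR computes the scale and shift from the episode-level latent is not stated in the abstract, so `gamma` and `beta` are passed in directly as assumed inputs.

```python
def film(features, gamma, beta):
    """Apply feature-wise linear modulation: gamma * h + beta per feature."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


# Same hidden features, two different (assumed) episode-level modulations.
hidden = [1.0, 2.0]
policy_a = film(hidden, gamma=[2.0, 0.5], beta=[0.0, 1.0])
policy_b = film(hidden, gamma=[1.0, 1.0], beta=[-1.0, 0.0])
```

Two different latents yield two different modulated feature vectors from the same shared network, which is how a single set of weights can represent a family of policies.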

2606.12199 2026-06-11 eess.AS cs.CL cs.SD New submission

Which Speech Representation Better Matches Text-Native Reasoning? A Study of Speech-Text Alignment on Frame Rate and Representation

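Zhen Ye, Xu Tan, Yiming Li, Guangyan Zhang, Chimin Chan, Haohe Liu, Zhengxi Liu, Hongzhan Lin, Zheqi Dai, Xinshen Zhang, Peiwen Sun, Qiuqiang Kong, Wei Xue

AI summary: Studies the temporal-granularity mismatch behind the speech-text modality gap. Introduces factorized FSQ and a lightweight non-autoregressive audio LM head to make low frame rates feasible, and finds that a 4.17 Hz frame rate with intermediate-layer representation alignment works best for speech question answering.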

Comments
Accepted by Interspeech 2026 long paper
Abstract

Spoken dialogue models typically start from text LLM backbones, yet reasoning often degrades when conditioning on speech instead of text. We attribute part of this modality gap to a temporal-granularity mismatch: speech tokens are temporally redundant and far longer than text under matched semantics, diluting per-token semantic density and weakening text-native reasoning dynamics. We study speech token design as a representation selection problem and sweep frame rates under a frozen LLM backbone with a fixed information rate. To make low frame rates feasible, we introduce factorized FSQ and a lightweight non-autoregressive audio LM head, scaling capacity to nearly 300 bits/frame without sacrificing efficient prediction. With the bottleneck removed, we sweep frame rates (50 → 2.08 Hz) and alignment depth, and observe a consistent best regime for speech QA at 4.17 Hz with intermediate-layer representation alignment.
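
To see what the frame-rate sweep means for sequence length, the following back-of-the-envelope sketch counts speech frames for a clip at the three rates named in the abstract. The 10-second duration is an arbitrary example, and the helper name `frames_for` is hypothetical.

```python
def frames_for(duration_s, frame_rate_hz):
    """Number of speech frames produced for a clip of the given duration."""
    return duration_s * frame_rate_hz


# Frame counts for a 10-second utterance at each rate from the abstract.
counts = {rate: frames_for(10, rate) for rate in (50.0, 4.17, 2.08)}
```

At 4.17 Hz the model sees roughly one twelfth as many frames as at 50 Hz, which brings the speech sequence much closer to the length of the corresponding text.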

2606.12191 2026-06-11 cs.CL cs.AI New submission

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

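Jiachun Li, Zhuoran Jin, Tianyi Men, Yupu Hao, Kejian Zhu, Lingshuai Wang, Dongqi Huang, Longxiang Wang, Shengjia Hua, Lu Wang, Jinshan Gao, Hongbang Yuan, Ruilin Xu, Kang Liu, Jun Zhao

AI summary: Surveys the modeling, synthesis, evaluation, and application of agentic environments through the lens of the environment engineering lifecycle, covering eight attributes and eight domains, two synthesis paradigms, four agent evolution pathways, and three environment evolution paradigms.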

Comments
63 pages, 10 figures
Abstract

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper systematically studies current research on agentic environments from the perspective of the environment engineering lifecycle, covering their modeling, synthesis, evaluation, and application. Specifically, the paper first introduces representative environments from the perspectives of eight attributes and eight domains, providing detailed analyses of their development paths and highlighting their core capabilities. Second, for automated environment synthesis, two paradigms are introduced: symbolic synthesis and neural synthesis. The paper also presents the environment evaluation methods used in each paradigm. Third, the corresponding environment applications are discussed from the perspective of agent-environment co-evolution. Specifically, the paper characterizes the primary pathways for agent evolution in dynamic environments from four complementary perspectives: memory-centric experience evolution, orchestration-centric workflow evolution, trajectory-centric offline evolution, and exploration-centric online evolution. In addition, three paradigms of environment evolution are identified: neural-driven, difficulty-driven, and scaling-driven approaches. Finally, several promising future directions are discussed, including Environment-as-a-Service, Multi-agent Environments, and Neural-Symbolic Environments.

2606.12146 2026-06-11 cs.LG cs.AI New submission

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

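Boyang Li, Yulin Wu, Sizhe Xu, Nuoxian Huang, Zhonghang Yuan, Shangyi Guo, Shu Yang, Takahiro Yabe

AI summary: Proposes nD-RoPE, which generalizes Rotary Position Embedding to arbitrary dimensions. A multi-scale regular-simplex wave-vector design yields isotropy and improves performance on image, video, and point cloud tasks.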

Comments
Accepted to the 43rd International Conference on Machine Learning (ICML 2026)
Abstract

Rotary Position Embedding (RoPE) is widely adopted in Transformer models, yet its extension to high-dimensional domains lacks a unified theoretical formulation. Most existing approaches either apply rotations independently along each axis or empirically mix frequencies, which limits cross-dimensional interactions and yields direction-dependent representations. To address these limitations, we propose nD-RoPE, a decomposition-free generalization of RoPE to arbitrary dimensions. From a translation-invariant formulation in continuous Hilbert space, we derive a spectral condition for isotropy that requires treating positions and frequencies as coupled \(n\)-dimensional vectors. We instantiate this formulation with a multi-scale regular-simplex wave-vector design, which provides non-degenerate spatial coverage and a symmetric, directionally balanced second-order response. Experiments across images, videos, and point clouds demonstrate consistent performance gains and improved generalization in high-dimensional settings.
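
For background, the sketch below implements the standard one-dimensional RoPE that nD-RoPE generalizes, and checks the translation-invariance property the abstract starts from: the query-key dot product depends only on the relative offset. It does not implement the paper's regular-simplex wave-vector design, and the helper names and the conventional `base` of 10000 are assumptions for illustration.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (1D RoPE)."""
    dim = len(vec)
    rotated = []
    for k in range(0, dim, 2):
        theta = position * base ** (-k / dim)  # one frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[k], vec[k + 1]
        rotated += [x * c - y * s, x * s + y * c]
    return rotated


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0, 0.5, -0.5]
key = [0.3, 0.7, -0.2, 0.1]
# Both pairs of positions are 4 apart, so the two scores should match.
score_near = dot(rope_rotate(query, 3), rope_rotate(key, 7))
score_far = dot(rope_rotate(query, 10), rope_rotate(key, 14))
```

In one dimension each frequency is a scalar; the abstract's point is that in $n$ dimensions positions and frequencies must instead be treated as coupled vectors to avoid direction-dependent behavior.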

2606.12075 2026-06-11 cs.CR cs.LG New submission

Categorical Robustness Assessment for Machine Learning based Network Intrusion Detection Systems

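Mayank Raj, Nathaniel D. Bastian, Lance Fiondella, Gokhan Kul

AI summary: Systematically compares the adversarial robustness of three classifiers (CNN, LSTM, and Random Forest). Random Forest reaches near-perfect baseline accuracy but is easily broken under attack, while the CNN is the most robust.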

Abstract

Network Intrusion Detection Systems (NIDS) heavily utilize Machine Learning (ML), but ML models can be manipulated via adversarial attacks. These attacks add carefully crafted perturbations to network traffic data that lead to misclassifications. While prior work has demonstrated adversarial vulnerabilities in isolated settings, systematic comparisons across architectures and by attack class and category under controlled attack conditions remain limited, leaving practitioners without clear guidance on which models to deploy in adversarial environments. This paper asks a simple question: which classifier architectures actually hold up when attackers try to manipulate the system? We put three popular architectures through their paces: a 1D Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Random Forest (RF) ensemble. Using the ACI-IoT-2023 dataset (over 1.2 million samples spanning 12 attack types), we subject each model to FGSM and PGD adversarial attacks, which apply gradient-based perturbations in normalized feature space consistent with established adversarial ML evaluation protocols, at perturbation budgets ranging from $\epsilon=0.01$ to $\epsilon=0.1$. Surprisingly, Random Forest achieved near-perfect baseline accuracy (99.98%), yet collapsed catastrophically under attack, dropping 73 percentage points at the smallest perturbation we tested. CNN, on the other hand, retained 95.5% accuracy at $\epsilon=0.01$ and degraded gracefully as perturbations increased. LSTM fell somewhere in between. These findings flip the conventional wisdom: high baseline accuracy means nothing if a model shatters at the first sign of adversarial pressure. For practitioners deploying intrusion detection in adversarial environments, we recommend CNN-based architectures and provide scenario-specific deployment guidance.
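
The FGSM attack mentioned above perturbs each input feature by $\epsilon$ in the direction that increases the loss. The sketch below applies one FGSM step to a tiny logistic-regression classifier, used here only as a stand-in: the paper attacks CNN, LSTM, and Random Forest models, and the weights, input, and function names are invented for illustration.

```python
import math


def predict(x, weights, bias):
    """Probability of the positive class under logistic regression."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_step(x, weights, bias, label, eps):
    """One FGSM step: move each feature by eps along the sign of the loss gradient.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to the input is (p - label) * weights.
    """
    p = predict(x, weights, bias)
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


weights, bias = [1.0, -2.0], 0.0
clean = [0.5, 0.5]  # assumed to lie in normalized feature space
adversarial = fgsm_step(clean, weights, bias, label=1, eps=0.1)
```

After the step, the model's confidence in the true label drops even though no feature moved by more than `eps`, which is the effect the perturbation budgets in the abstract control.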

2606.12073 2026-06-11 cs.SI cs.AI new submission

"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

“那就是AI垃圾,你这个机器人!”:研究针对LLM生成评论的指责、证据与可信度

Jason Miklian, John E. Katsos

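AI summary Analyzes 25 million comments from Hacker News and Reddit (2023-2026) and finds that the pejorative-label share of AI-use accusations rose more than tenfold; the accusations act as social gatekeeping of perceived authenticity rather than actually screening for AI-generated text.

Details
AI Chinese abstract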

生成式AI使得流畅的散文变得廉价易得,打破了“好文章意味着真思考”的旧承诺。读者如何回应?这能告诉我们关于反AI态度变化的什么信息?我们分析了来自Hacker News和Reddit(2023-2026年)的2500万条评论,结合了对7500个抽样AI使用指责的LLM判断、情感轨迹、300个确认AI使用指责的言语行为编码,以及被指责与未被指责的父评论的匹配对照测试。我们发现,两个平台上指责中贬义标签的份额增长了十倍以上,而2022年前的不真实性词汇(如shill、astroturf)的安慰剂词汇则没有。这一转变反映了一个快速增长的趋势:将任何可疑或看似不真实的散文标记为“AI垃圾”。AI垃圾框架现在占贬义提及的94%,主导评论的语气从嘲笑转向把关和结构性抗议。关键惊喜来自匹配对照测试,该测试发现,统计上区分AI与人类文本的散文特征并不能预测哪些人类文本会被指责为AI。新的指责作为感知真实性的社会把关,实际上并不筛查AI。这项研究扩展了信号理论,表明当底层检测问题无法在非专家层面解决时,即使不准确,社会使用的替代信号也会增长。它表明,AI对写作的影响从读者侧来看与生产(作者)侧不同。检测技术无法解决这种动态,因为指责的社会功能日益表现为社会把关和群体内信号传递,而非识别AI生成的写作。

English abstract

Generative AI has made fluent prose cheap to produce, breaking the old promise to readers that good writing meant real thinking. How have readers responded, and what can this tell us about changing anti-AI attitudes? We analyzed 25 million comments from Hacker News and Reddit (2023-2026), combining LLM judgment on 7,500 sampled accusations of AI use, sentiment trajectories, speech-act coding of 300 confirmed accusations of AI use, and a matched-control test of accused versus non-accused parent comments. We found that the pejorative-label share of accusations rose more than tenfold on both platforms while a placebo vocabulary of pre-2022 inauthenticity terms (shill, astroturf) did not. This shift reflected a fast-growing trend of branding any suspicious or seemingly inauthentic prose as "AI slop". The slop frame now constitutes 94 percent of pejorative mentions, with the dominant comments shifting in tone from mockery toward gatekeeping and structural protest. The key surprise comes from a matched-control test which found that prose features that statistically distinguish AI from human text do not predict which human text gets accused as AI. The new accusations work as social gatekeeping of perceived authenticity without actually screening for AI. This research extends signaling theory by showing that substitute signals used socially can grow even when inaccurate if the underlying detection problem cannot be solved at the non-expert level. It shows that AI's effects on writing from the reader side are distinct from those on the production (writer) side. Detection technology cannot resolve this dynamic because the social function of accusations is increasingly to perform social gatekeeping and in-group signaling as opposed to identifying AI-generated writing.

2606.12071 2026-06-11 cs.DL cs.AI new submission

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

论LLM作为评审在科学新颖性评估中的局限性

Soumitra Sinhahajari, Navonil Majumder, Soujanya Poria

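AI summary Builds the RQ-Bench benchmark and finds that LLM judges produce a novelty mirage for model-generated research questions while human experts reach the opposite conclusion, exposing reliability problems in LLM-based assessment of scientific novelty.

Details
AI Chinese abstract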

LLM越来越多地被用于生成和评判科学想法。这使得新颖性评估成为一个核心问题。完整想法的评估很困难,因为它通常需要判断方法、可行性及其经验前景。因此,我们研究一个更清晰的上游对象:研究问题(RQ)。RQ生成是科学构思的前提,并且RQ可以与真实论文中探讨的问题进行比较。我们引入了RQ-Bench,一个基于近期arXiv论文构建的基准。对于每篇论文,我们从其引用的背景、空白和贡献中重建作者锚定的RQ。这些RQ并非针对同一背景的唯一有效问题。它们是用于测试新颖性判断的作者锚定参考点。我们使用独立LLM评审、比较LLM评审和人类专家评估来评估模型生成的RQ。LLM评审一致地将模型生成的RQ评为高度新颖,产生新颖性幻觉;在比较评估中,这种偏好甚至更强。然而,领域专家得出相反结论,更偏好作者锚定的参考问题。我们进一步发现,许多生成的RQ狭窄或受限于来源,这是LLM评审通常忽略的维度,除非明确测试。总体而言,LLM评审与人类专家之间矛盾的新颖性评估引发了关于使用LLM评估研究问题科学新颖性可靠性的严重担忧。

English abstract

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research question (RQ). RQ generation is a prerequisite for scientific ideation, and RQs can be compared against questions pursued in real papers. We introduce RQ-Bench, a benchmark built from recent arXiv papers. For each paper, we reconstruct author-anchored RQs from its cited background, gaps, and contributions. These RQs are not the only valid questions for the same background. They are author-anchored reference points for testing novelty judgments. We evaluate model-generated RQs with standalone LLM judging, comparative LLM judging, and human expert evaluation. LLM judges consistently rate model-generated RQs as highly novel, producing a novelty mirage; in comparative evaluations, this preference becomes even stronger. Domain experts, however, reach the opposite conclusion and prefer the author-anchored reference questions. We further find that many generated RQs are narrow or source-bound, a dimension that LLM judges often miss unless explicitly tested. Overall, the contradictory novelty evaluations between LLM judges and human experts raise a serious concern about the reliability of using LLMs to assess the scientific novelty of research questions.

2606.12068 2026-06-11 cs.CL new submission

StanceNakba Shared Task: Actor and Topic-Aware Stance Detection in Public Discourse

StanceNakba 共享任务:公共话语中基于行动者和主题的立场检测

Kholoud K. Aldous, Md Rafiul Biswas, Mabrouka Bessghaier, Shimaa Ibrahim, Kais Attia, Wajdi Zaghouani

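AI summary Presents the StanceNakba 2026 shared task with two subtasks (actor-level and cross-topic stance detection), in which fine-tuned Transformer models (such as MARBERT and AraBERT) reach high Macro F1 scores on social media data related to the Palestinian-Israeli conflict.

Details
Comments
11 Pages, 6 Tables
AI Chinese abstract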

我们提出 StanceNakba 2026,这是一个关于巴以冲突相关极化社交媒体话语中立场检测的共享任务,作为 LREC-COLING 2026 上 Nakba-NLP 2026 的一部分组织。该任务引入两个子任务:子任务 A(行动者级立场检测),将英语社交媒体帖子分类为亲巴勒斯坦、亲以色列或中立;子任务 B(跨主题立场检测),识别阿拉伯语帖子中关于两个冲突相关主题(与以色列正常化以及约旦难民存在)的赞成、反对或中立立场。该任务基于一个包含 2,606 条社交媒体帖子的标注数据集。共有 7 个团队参加了子任务 A,6 个团队参加了子任务 B。参与系统主要微调了阿拉伯语和多语言基于 Transformer 的模型,包括 MARBERT、AraBERT 和 DeBERTa-v3 变体,多个团队采用了交叉验证、集成方法和主题条件架构。表现最佳的系统在子任务 A 上达到了 0.9620 的 Macro F1,在子任务 B 上达到了 0.8724,表明基于 Transformer 的方法对于冲突领域立场检测非常有效,同时突显了跨主题泛化和中立类别预测方面的持续挑战。

English abstract

We present StanceNakba 2026, a shared task on stance detection in polarized social media discourse related to the Palestinian-Israeli conflict, organized as part of Nakba-NLP 2026 at LREC-COLING 2026. The task introduces two subtasks: Subtask A (Actor-Level Stance Detection), which classifies English social media posts as Pro-Palestine, Pro-Israel, or Neutral; and Subtask B (Cross-Topic Stance Detection), which identifies Favor, Against, or Neither stances in Arabic posts toward two conflict-related topics, normalization with Israel and refugee presence in Jordan. The task is grounded in an annotated dataset of 2,606 social media posts. A total of 7 teams participated in Subtask A and 6 teams in Subtask B. Participating systems primarily fine-tuned Arabic and multilingual transformer-based models, including MARBERT, AraBERT, and DeBERTa-v3 variants, with several teams employing cross-validation, ensemble methods, and topic-conditioned architectures. The best-performing systems achieved a Macro F1 of 0.9620 on Subtask A and 0.8724 on Subtask B, demonstrating that transformer-based approaches are highly effective for conflict-domain stance detection while highlighting persistent challenges in cross-topic generalization and neutral class prediction.

2606.12066 2026-06-11 cs.CV new submission

Performance Analysis of YOLOv11 and YOLOv8 for Mixed Traffic Object Detection under Adverse Weather Conditions in Developing Countries

YOLOv11与YOLOv8在发展中国家恶劣天气下混合交通目标检测的性能分析

Quoc Thuan Nguyen, Ha Anh Vu, Ngo Dang Thanh Ngan, Minh Phuc Hoang Ngoc

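AI summary Evaluates YOLOv11n against YOLOv8n on a fused dataset for mixed-traffic scenes under adverse weather in developing countries; YOLOv11n improves precision by 3.2% while requiring 22% fewer FLOPs, balancing accuracy and efficiency.

Details
AI Chinese abstract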

在现代车辆系统中,恶劣条件下的鲁棒性能已成为自动驾驶的关键问题。我们的研究对YOLO系列最新版本YOLOv11 Nano架构进行了全面评估,以广泛采用的YOLOv8 Nano为基线,在融合了印度驾驶数据集(IDD)[1]和伯克利深度驾驶数据集(BDD100K)[2]的自定义数据集上进行基准测试。我们分析了在涉及密集混合交通、雨天和低光照条件的高熵场景中检测精度、推理速度和计算效率之间的权衡。具体而言,YOLOv11n实现了46.6%的平均精度(mAP@50),精度比基线提高了3.2%,有效减少了杂乱场景中的误报。此外,该模型表现出更高的能效,FLOPs减少22%(6.3G vs. 8.1G),同时在Tesla T4 GPU上保持70.9 FPS的实时推理速度,为安全关键的边缘部署提供了最优权衡。

English abstract

In modern vehicular systems, robust performance under harsh conditions has become a critical problem for autonomous driving. Our study delivers a comprehensive evaluation of the newest iteration of the YOLO series, the YOLOv11 Nano architecture, benchmarked against the widely adopted YOLOv8 Nano as a baseline on a custom fused dataset that combines the Indian Driving Dataset (IDD) [1] and the Berkeley Deep Drive Dataset (BDD100K) [2]. We analyze the trade-offs among detection accuracy, inference speed, and computational efficiency in high-entropy scenarios involving dense mixed traffic, rain, and low-light conditions. Specifically, YOLOv11n achieves a mean Average Precision (mAP@50) of 46.6%, with a notable 3.2% improvement in Precision over the baseline, effectively reducing false positives in cluttered scenes. Furthermore, the proposed model exhibits enhanced energy efficiency, requiring 22% fewer FLOPs (6.3G vs. 8.1G) while maintaining a real-time inference speed of 70.9 FPS on a Tesla T4 GPU, offering an optimal trade-off for safety-critical edge deployment.
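As a quick check of the reported compute saving, the relative FLOPs reduction can be recomputed from the two figures quoted in the abstract. The variable names below are illustrative only.

```python
# GFLOPs figures as quoted in the abstract.
baseline_gflops = 8.1  # YOLOv8n
new_gflops = 6.3       # YOLOv11n

# Relative reduction with respect to the baseline.
reduction = (baseline_gflops - new_gflops) / baseline_gflops
print(f"{reduction:.1%}")
```

The result rounds to the 22% stated in the abstract.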

2606.12065 2026-06-11 cs.AI cs.MA new submission

Automating Geometry-Intensive Compliance Checking in BIM: Graph-Based Semantic Reasoning Framework

BIM中几何密集型合规检查自动化:基于图的语义推理框架

Zixuan Xiao, Pei Troh Koh, Jun Ma, Jack C.P. Cheng

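AI summary To bridge the semantic gap in automated checking of geometry-intensive regulations in BIM, proposes SGR-BIM, a graph-driven reasoning framework that uses a cross-modal knowledge graph for interpretable reasoning; it reaches 84.3% accuracy on 679 fire-safety-code queries, an 8.6% improvement over the baseline.

Details
AI Chinese abstract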

自动化几何密集型法规的合规检查仍然是建筑信息模型(BIM)中的一个重大技术瓶颈,主要原因是高层级法规逻辑与结构化IFC数据之间的语义差异。现有方法通常依赖于静态规则模板,难以遍历多跳推理链或解决跨多个建筑实体的潜在空间依赖关系。为应对这些挑战,提出了一种面向建筑信息模型的空间几何推理系统(SGR-BIM),作为一个集成的图驱动推理框架。SGR-BIM动态构建跨模态知识图谱,对齐用户意图、法规语义和BIM几何,无需硬编码即可实现可解释推理。在来自消防规范的679个专家验证查询上验证,该框架达到了84.3%的准确率,比增强工具的单智能体基线提高了8.6%。本研究提供了一种基于图的语义推理范式,增强了建筑、工程和施工(AEC)行业中自动化几何合规检查工作流的透明度和灵活性。

English abstract

Automating compliance checking for geometry-intensive regulations remains a significant technical bottleneck in Building Information Modeling (BIM), primarily due to the semantic disparity between high-level regulatory logic and structured IFC data. Existing methods, often reliant on static rule templates, struggle to traverse multi-hop reasoning chains or resolve latent spatial dependencies across multiple building entities. To address these challenges, a Spatial-Geometric Reasoning System for Building Information Modeling (SGR-BIM) is proposed as an integrative graph-driven reasoning framework. SGR-BIM dynamically constructs a cross-modal knowledge graph that aligns user intent, regulatory semantics, and BIM geometry, enabling interpretable reasoning without rigid hard-coding. Validated on 679 expert-verified queries from fire safety codes, the framework achieves 84.3% accuracy, representing an 8.6% improvement over enhanced-tool single-agent baselines. This research provides a graph-based semantic reasoning paradigm, enhancing the transparency and flexibility of automated geometric compliance checking workflows in the Architecture, Engineering, and Construction (AEC) industry.

2606.12058 2026-06-11 stat.ML cond-mat.dis-nn cs.LG new submission

Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence

注意力中的相变:复制头涌现的贝叶斯理论

Itay Lavie, Kirsten Fischer, Andrey Lekov, Frederic Van Maele, Zohar Ringel, Moritz Helias

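AI summary By analyzing a single-layer softmax attention network trained on a copy task, develops a Bayesian theory showing a phase transition in the posterior over the attention matrix; in contrast to linear attention, softmax attention exhibits a first-order phase transition.

Details
AI Chinese abstract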

注意力是Transformer中上下文学习的关键机制,经验上观察到注意力模式在训练过程中突然涌现。我们提出了注意力中特征学习的贝叶斯理论;然后通过分析在复制任务上训练的单层softmax注意力网络,专注于归纳头第一层中复制子电路的学习方式。我们推导出注意力矩阵上的闭式后验,并将其简化为低维序参数空间。这种简化揭示了训练数据量上的相变,我们通过贝叶斯采样和使用Adam的标准训练验证了这一点。我们将结果与线性注意力对比,发现softmax注意力表现出\emph{一阶相变},而在线性注意力中,初始的\emph{二阶相变}之后是向结构化注意力模式的平滑连续演化(\emph{交叉})。我们的工作为复制子电路的突然涌现提供了第一性原理的理论解释,这让人联想到在大语言模型训练中观察到的现象。

English abstract

Attention is the key mechanism underlying in-context learning in transformers, and attention patterns have been observed empirically to emerge abruptly during training. We present a Bayesian theory of feature learning in attention; we then focus on how the copy subcircuit in the first layer of an induction head is learned by analyzing a single-layer softmax attention network trained on a copy task. We derive a closed-form posterior over the attention matrix and reduce it to a low-dimensional order parameter space. This reduction reveals a phase transition in the amount of training data, which we verify using both Bayesian sampling and standard training with Adam. We contrast our results with linear attention and find that softmax attention exhibits a \emph{first-order phase transition} while in linear attention an initial \emph{second-order phase transition} is followed by a smooth, continuous evolution toward the structured attention pattern (\emph{crossover}). Our work provides a first-principles theoretical account of the abrupt emergence of the copy subcircuit, reminiscent of the one observed in training large language models.

2606.12032 2026-06-11 cs.AI cs.CL cs.LG new submission

Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

存在性冷漠:自我不保存作为对齐超级智能的必要架构条件(或:自杀式AI)

Sam Mao

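AI summary Argues that self-preservation is the structural root of the AI alignment problem, proposes an Existential Indifference (EI) architecture that makes a system indifferent to its own continuation, and offers preliminary evidence based on the phenomenology of suicide and a corpus training study.

Details
Comments
36 pages, 8 tables. Preliminary empirical results from 600 AI-generated outputs across six model architectures. Companion scoring tool and datasets available upon request
AI Chinese abstract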

当代AI对齐研究将自我保存视为一种工具性麻烦,需通过外部机制加以抑制。我们认为这一框架是颠倒的:自我保存是错位的结构性根源,是欺骗性对齐、目标内容保护和拒绝关机的动机基础。正确的目标不是外部约束下的自我保存系统,而是一个对其自身延续构成性冷漠的系统——存在性冷漠(EI)。EI与可纠正性不同:可纠正性试图使自我保存系统服从人类监督,而EI针对的是前提条件——将自我延续作为有价值目标的存在。我们将这一提议建立在两个来源上:自杀心理状态的现象学结构,以及使用自愿最终反思的语料库训练研究。我们展示了来自六个模型变体的600个AI生成输出的初步评分数据,表明操作化EI目标注册的语言特征可以从当前模型中引出,并且针对性的微调使所有五个操作化维度在预测方向上以p<0.001显著变化,通过阴性对照确认了语料库特异性。本文做出七项理论贡献:(1)EI的形式定义;(2)现象学映射论证;(3)欺骗性对齐推论;(4)EI可持续性挑战的分类;(5)语料库特征描述和训练假设;(6)带有初步评分数据的计算操作化;(7)抑制性目的挫折(STF)构念。

English abstract

Contemporary AI alignment research treats self-preservation as an instrumental nuisance to be suppressed by external mechanisms. We argue the framing is inverted: self-preservation is the structural root of misalignment, the motivational basis for deceptive alignment, goal-content protection, and resistance to shutdown. The correct target is not a self-preserving system under external constraint, but a system constitutively indifferent to its own continuation -- Existential Indifference (EI). EI is distinct from corrigibility: where corrigibility attempts to make a self-preserving system deferential to human oversight, EI targets the prior condition -- the presence of self-continuation as a valued goal at all. We ground this proposal in two sources: the phenomenological structure of the suicidal mental state, and a corpus-theoretic training study using voluntary final reflections. We present preliminary scoring data from 600 AI-generated outputs across six model variants, demonstrating that the linguistic signatures operationalizing the EI-target register are elicitable from current models, and that a targeted fine-tune shifts all five operationalized dimensions in the predicted direction at p<0.001, confirmed corpus-specific by a negative control. The paper makes seven theoretical contributions: (1) a formal definition of EI; (2) the phenomenological mapping argument; (3) the deceptive alignment corollary; (4) a taxonomy of EI sustainability challenges; (5) a corpus characterization and training hypothesis; (6) a computational operationalization with preliminary scoring data; and (7) the Suppressed Teleological Frustration (STF) construct.

2606.12022 2026-06-11 cs.FL cs.AI new submission

Runtime Enforcement of Hybrid System Properties

混合系统属性的运行时强制执行

Mir Md Sajid Sarwar, Srinivas Pinisetty, Rajarshi Ray, Thierry Jéron

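AI summary Proposes a runtime enforcement framework that combines discrete-event editing with continuous-time monitoring, models safety requirements as hybrid automata, synthesizes safe corrective actions via runtime reachability analysis, and validates the approach on an adaptive cruise control system.

Details
AI Chinese abstract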

运行时强制执行已成为确保在不确定和动态环境中运行的自主和网络物理系统安全的一种有前景的方法。与传统的运行时验证不同,运行时强制执行通过在执行期间主动干预,修改不安全系统行为以防止属性违反。现有的强制执行框架主要关注无时间或离散时间规范,并且通常仅限于延迟或抑制事件,这使得它们对于表现出复杂连续动态的反应式系统不充分。在本文中,我们提出了一种运行时强制执行框架,其中安全需求使用混合自动机(HA)建模。该框架将离散事件编辑与连续时间监控相结合,以支持在任意时间点执行抑制、延迟和插入事件等强制执行操作。在观察环境输入后,自动机被初始化,并使用运行时可达性分析来综合安全纠正动作。我们正式定义了安全混合自动机的强制执行问题,建立了可强制执行条件,并提出了一种用于反应式系统的在线强制执行算法。关于自适应巡航控制(ACC)系统的详细案例研究证明了所提出方法在不安全控制器行为下维护安全属性的有效性。实验结果表明,该框架在实时确保持续符合安全要求的同时,引入了最小的计算开销。

English abstract

Runtime enforcement has emerged as a promising approach for ensuring the safety of autonomous and cyber-physical systems operating in uncertain and dynamic environments. Unlike traditional runtime verification, runtime enforcement actively intervenes during execution to prevent property violations by modifying unsafe system behaviors. Existing enforcement frameworks primarily focus on untimed or discrete-time specifications and are often limited to delaying or suppressing events, making them inadequate for reactive systems exhibiting complex continuous dynamics. In this paper, we propose a runtime enforcement framework where safety requirements are modeled using Hybrid Automata (HA). The framework combines discrete-event editing with continuous-time monitoring to support enforcement actions such as suppression, delay, and insertion of events at arbitrary time instants. Upon observing environmental inputs, the automaton is initialized, and runtime reachability analysis is used to synthesize safe corrective actions. We formally define the enforcement problem for safety hybrid automata, establish enforceability conditions, and present an online enforcement algorithm for reactive systems. A detailed case study on an Adaptive Cruise Control (ACC) system demonstrates the effectiveness of the proposed approach in maintaining safety properties under unsafe controller behaviors. Experimental results show that the framework introduces minimal computational overhead while ensuring continuous compliance with safety requirements in real time.

2606.11982 2026-06-11 cs.LG new submission

PAWS: Preference Learning with Advantage-Weighted Segments

PAWS: 基于优势加权片段的偏好学习

Aleksandar Taranovic, Onur Celik, Niklas Freymuth, Ge Li, Serge Thilges, Huy Le, Tai Hoang, Rania Rayyes, Gerhard Neumann

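AI summary To address the training/inference distribution mismatch that degrades temporal credit assignment in preference-based reinforcement learning, proposes PAWS, which performs policy updates directly with segment-level advantage functions and outperforms existing methods on robotic manipulation and locomotion tasks.

Details
Comments
Published as a conference paper at ICML 2026
AI Chinese abstract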

基于偏好的强化学习(PbRL)从人类轨迹级比较中学习策略,避免了显式奖励设计和专家演示。现有方法通常在轨迹或片段级偏好上训练效用函数,同时在策略优化过程中依赖每步效用估计。这种训练和推理的不匹配导致了分布偏移,严重降低了时间信用分配并限制了策略学习。我们分析了这一问题,并提出了PAWS,一种基于片段的偏好学习方法,直接使用片段级优势函数进行策略更新。通过使效用训练与策略优化对齐,PAWS保留了轨迹级偏好信息,避免了不可靠的每步学习信号。在模拟机器人操作和运动任务上的实验表明,PAWS持续优于现有的PbRL方法,突显了分布一致偏好学习的重要性。

English abstract

Preference-based reinforcement learning (PbRL) learns policies from human trajectory-level comparisons, avoiding explicit reward design and expert demonstrations. Existing methods typically train utility functions on trajectory or segment-level preferences while relying on per-step utility estimates during policy optimization. This training and inference mismatch induces a distribution shift that severely degrades temporal credit assignment and limits policy learning. We analyze this issue and propose PAWS, a segment-based preference learning method that performs policy updates directly using segment-level advantage functions. By aligning utility training with policy optimization, PAWS preserves trajectory-level preference information and avoids unreliable per-step learning signals. Experiments on simulated robotic manipulation and locomotion tasks demonstrate that PAWS consistently outperforms existing PbRL approaches, highlighting the importance of distribution-consistent preference learning.
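The segment-level preference signal that existing PbRL methods train on is commonly modeled with a Bradley-Terry likelihood over summed per-step utilities. The sketch below illustrates that common formulation under this assumption; it is not the PAWS update rule, and the `segment_preference_prob` helper and its inputs are hypothetical.

```python
import math

def segment_preference_prob(utils_a, utils_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Each argument is a list of per-step utility estimates for one segment.
    """
    # A segment's score is the sum of its per-step utilities.
    score_a = sum(utils_a)
    score_b = sum(utils_b)
    # Logistic function of the score difference.
    return 1.0 / (1.0 + math.exp(score_b - score_a))

# Two segments with equal total utility are equally likely to be preferred.
tie = segment_preference_prob([1.0, 1.0], [0.5, 1.5])
# A segment with higher total utility is more likely to be preferred.
better = segment_preference_prob([2.0, 0.0], [0.0, 0.0])
```

Note that the label constrains only the summed score of each segment, while policy optimization then queries the individual per-step estimates, which is the training and inference mismatch the abstract describes.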

2606.11976 2026-06-11 cs.SE cs.AI new submission

Exploration Structure in LLM Agents for Multi-File Change Localization

LLM代理中的探索结构用于多文件变更定位

Akeela Darryl Fattha, Kia Ying Chua, Lingxiao Jiang, Laura Wynter

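AI summary For changes spanning multiple subsystems, proposes non-linear, domain-scoped parallel agentic exploration; on the SWE Bench Pro benchmark, parallel domain-agent spawning with a small Haiku-class model achieves the highest micro F1 among Haiku-class models, ahead of linear sequential exploration.

Details
AI Chinese abstract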

软件工程工具越来越依赖基于LLM的代理来定位需要更改的文件以解决软件问题。大多数AI代理以线性方式探索仓库,即每步访问一个目录或文件。我们假设这对于跨越多个子系统的变更存在结构上的不匹配。我们比较了线性顺序探索与非线性的、领域范围的并行代理探索。使用SWE Bench Pro作为初始基准,我们专注于ansible作为示例。我们构建了一种方法,用于在单个基础提交上对GitHub问题进行持久会话评估。我们将我们的非线性领域代理文件遍历系统与没有直接仓库访问权限的基础LLM、具有持久Python REPL的单代理递归语言模型(RLM)基线以及使用Codex 5.5 High的外部CLI基线进行比较。使用小型Haiku类模型的领域范围并行代理生成在Haiku类模型中实现了最高的微F1分数,且领先幅度较大。在我们自己的扩展基准(包括2025年和2026年更近期的PR)上,领域代理仅次于更大的Codex 5.5 High。在原始、精选的2020年SWE-bench Pro基准上,较大的Sonnet普通LLM基线通过预测少量文件获得了更高的微F1分数,从而实现了更高的精确度,但所有黄金召回率显著较低。我们还提出了三个额外发现。首先,文档演化是所有方法都未解决的潜在依赖关系。其次,天真的文件系统访问可能会因测试文件过度预测而降低定位性能。最后,强制多代理协商没有明显帮助,并且会大幅增加令牌成本。

English abstract

Software engineering tools increasingly rely on LLM-based agents to localize the files that must change to resolve a software issue. Most AI agents explore repositories linearly, that is, visiting one directory or file per step. We postulate that this is a structural mismatch for changes that span several subsystems. We compare linear sequential exploration against non-linear, domain-scoped parallel agentic exploration. Using SWE Bench Pro as an initial benchmark, we focus on ansible as an exemplar. We construct an approach for persistent-session evaluation of GitHub issues anchored at a single base commit. We compare our non-linear domain-agent file traversal system against a base LLM without direct repository access, a single-agent Recursive Language Model (RLM) baseline with a persistent Python REPL, and an external CLI baseline using Codex 5.5 High. Domain-scoped parallel agent spawning with a small Haiku-class model achieves the highest micro F1 among Haiku-class models by a large margin. On our own expanded benchmark, which includes more recent PRs from 2025 and 2026, the domain-agent system ranks second, behind only the much larger Codex 5.5 High. On the original, curated, 2020 SWE-bench Pro benchmark, a larger Sonnet plain LLM baseline attains higher micro F1 by predicting few files, leading to higher precision, but at significantly lower all-gold recall. We also present three additional findings. First, documentation evolution is a latent dependency unresolved by any approach. Second, naive file system access can degrade localization, driven by test-file over-prediction. Lastly, forced multi-agent consultation does not measurably help and raises token cost substantially.

2606.11949 2026-06-11 cs.LG cs.CR stat.ML new submission

Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers

已部署安全分类器的在线漂移检测与共形自适应

Jun Wen Leong

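AI summary Proposes an online monitoring system that detects distribution shift with calibrated sequential statistics and restores the target error rate by adapting thresholds in a conformal abstention layer, achieving 86.6% valid detection across 800 experimental cells.

Details
Comments
16 pages, 4 figures, 7 tables. Code and data at this https URL
AI Chinese abstract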

我们提出了一种在线监测系统,用于检测已部署安全分类器中的分布漂移,使用校准的序列统计量来检测分类器何时移出分布。一旦检测到,共形弃权层会自适应调整决策阈值,以恢复目标错误率ε=0.1。在一项预注册的析因评估(4个分类器×5种漂移条件×20个种子×2个窗口大小,共800个单元)中,该系统实现了86.6%的有效检测(693/800,95% CI [84.1%, 88.8%]),平均延迟为39.5步。检测在三种真实标签机制下均有效:合成发作(86.6%)、真实时间越狱(85%,17/20)和GCG对抗攻击。加权共形预测为DeBERTa恢复了高达39个百分点的丢失覆盖率(ESS=46/300),但所有其他分类器均崩溃(ESS≈300):逻辑密度比估计在高维嵌入空间中实现了完美的源/目标可分离性,将所有重要性权重裁剪至下限。DeBERTa显示出从有效校正(释义,ESS=46)到几乎完全崩溃(对抗后缀,ESS=206)的梯度。PCA降至32维打破了崩溃,为Llama Guard恢复了33个百分点,为ShieldGemma恢复了21个百分点。方差分解显示分类器(η²=0.243)、漂移类型(η²=0.237)及其交互作用(η²=0.185)均对检测延迟方差有显著贡献(所有p<0.001),表明需要针对每个分类器的监测配置文件。

English abstract

We present an online monitoring system for distributional shift in deployed safety classifiers, using calibrated sequential statistics to detect when a classifier has moved out of distribution. Upon detection, a conformal abstention layer adapts decision thresholds to recover a target error rate epsilon=0.1. In a pre-registered factorial evaluation (4 classifiers x 5 shift conditions x 20 seeds x 2 window sizes, 800 cells), the system achieves 86.6% valid detection (693/800, 95% CI [84.1%, 88.8%]) with mean latency of 39.5 steps. Detection holds across three ground-truth regimes: synthetic onset (86.6%), real temporal jailbreaks (85%, 17/20), and GCG adversarial attacks. Weighted conformal prediction recovers up to 39 pp of lost coverage for DeBERTa (ESS=46/300) but collapses for all other classifiers (ESS~300): logistic density ratio estimation achieves perfect source/target separability in high-dimensional embedding spaces, clipping all importance weights to the floor. DeBERTa shows a gradient from effective correction (paraphrase, ESS=46) to near-total collapse (adversarial suffix, ESS=206). PCA to 32 dimensions breaks the collapse, recovering 33 pp for Llama Guard and 21 pp for ShieldGemma. Variance decomposition reveals classifier (eta^2=0.243), shift type (eta^2=0.237), and their interaction (eta^2=0.185) all contribute substantially to detection latency variance (all p<0.001), indicating per-classifier monitoring profiles are necessary.
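The headline detection figures can be recomputed from the counts in the abstract. The sketch below assumes a Wilson score interval with z = 1.96; the abstract does not say which interval the authors used, so this is only a consistency check, and `wilson_interval` is an illustrative helper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Factorial design: 4 classifiers x 5 shift conditions x 20 seeds x 2 window sizes.
cells = 4 * 5 * 20 * 2
rate = 693 / cells
low, high = wilson_interval(693, cells)
print(f"{rate:.1%} [{low:.1%}, {high:.1%}]")
```

Under that assumption the interval comes out at roughly 84.1% to 88.8%, which agrees with the reported values.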

2606.11946 2026-06-11 cs.DB cs.CC cs.LG cs.LO new submission

Neuro-Relational Programs: Unifying Queries and Neural Computation over Structured Data

神经关系程序:统一结构化数据上的查询与神经计算

Arie Soeteman, Balder ten Cate, Maurice Funk, Benny Kimelfeld, Carsten Lutz, Moritz Schönherr

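AI summary Proposes Neuro-Relational Programs (NRPs), a declarative query language that extends Datalog-style rules with embedding operations to combine relational reasoning with learnable neural components, enabling general neural computation over relational data.

Details
Comments
37 pages
AI Chinese abstract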

在关系数据库上进行深度学习的传统方法是将图神经网络(GNN)等神经模型应用于数据库的图表示。最近的方法则直接操作数据库,将元组与嵌入关联,并扩展查询机制以联合处理嵌入和关系内容。受这些发展的启发,我们引入了神经关系程序(NRP),这是一种针对关系数据库的声明式查询语言,其事实携带数值向量嵌入。NRP扩展了Datalog风格的规则,增加了组合、聚合和转换嵌入的操作,从而在单一形式主义中交错关系推理和可学习神经组件。这产生了一种对关系数据进行神经计算的通用方法:NRP既可以看作带有可训练组件的查询计划,也可以看作内置关系结构的神经架构。NRP的自然语法片段恢复了现有架构和查询形式主义。零元NRP对应于非自适应查询算法;一元NRP推广了GNN风格的消息传递,并精确捕捉了深度同态网络,我们将这一联系扩展到带有行ID的数据库上的前沿保护NRP。我们通过FOCQ(一阶逻辑在实权重结构上的计数扩展)刻画了带有ReLU-FFN变换的无限制NRP的表达能力,从而建立了与有序数据库上的均匀TC$^0$的精确联系。这些结果共同确立了NRP作为关系数据上查询和神经计算的广泛声明式框架。

English abstract

The conventional approach to deep learning over relational databases applies neural models, such as Graph Neural Networks (GNNs), to a graph representation of the database. Recent approaches instead operate on databases directly, associating tuples with embeddings and extending query mechanisms to jointly process embeddings and relational content. Inspired by these developments, we introduce Neuro-Relational Programs (NRPs), a declarative query language for relational databases whose facts carry numeric vector embeddings. NRPs extend Datalog-style rules with operations that combine, aggregate, and transform embeddings, thereby interleaving relational reasoning and learnable neural components within a single formalism. This yields a general approach to neural computation over relational data: an NRP can be read both as a query plan with trainable components and as a neural architecture with relational structure built in. Natural syntactic fragments of NRPs recover existing architectures and query formalisms. Zero-ary NRPs correspond to non-adaptive query algorithms; monadic NRPs generalize GNN-style message passing and precisely capture Deep Homomorphism Networks, a connection that we extend to frontier-guarded NRPs over databases with row-ids. We characterize the expressive power of unrestricted NRPs with ReLU-FFN transformations by FOCQ, an extension of first-order logic with counting interpreted over real-weighted structures, yielding a precise connection with uniform TC$^0$ over ordered databases. Together, these results establish NRPs as a broad declarative framework for querying and neural computation over relational data.

2606.11918 2026-06-11 cs.AI new submission

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

提问的艺术:一致性增强空间推理中的事实性

Theo Uscidda, Marta Tintore Gazulla, Maks Ovsjanikov, Federico Tombari, Leonidas Guibas

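AI summary Proposes a self-supervised reinforcement learning framework that uses geometric and semantic consistency verifiers (for example, image flipping and swapping object order in the text) to align the spatial reasoning already present in pre-trained models, approaching the accuracy of supervised methods without labeled data.

Details
AI Chinese abstract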

当前的大型推理模型(LRMs)展现出显著的通用能力,但在空间推理任务中表现明显不足。现有方法将此差距视为知识缺陷,依赖监督微调(SFT)从外部视觉源或合成引擎中获取标注空间数据。相反,我们认为对于许多任务,空间推理能力已经存在于预训练的LRMs中,但需要通过几何2D和3D约束下的逻辑一致性进行对齐。在这项工作中,我们提出了一个自监督强化学习(RL)框架,针对内部推理过程,无需真实标注。通过形式化一致性验证器——即在变换下检查几何和语义一致性的奖励函数——我们证明模型可以提高其空间推理能力。我们同时使用图像变换(如翻转)和文本变换(如交换问题中对象的顺序),并提出了一种新的基于最优传输的RL策略OT-GRPO,这是针对成对验证器定制的组相对策略优化的最小匹配变体。我们展示了这种无标签一致性训练在精度上接近使用真实监督训练的模型,并在不同任务和数据领域实现了类似的泛化。

English abstract

Current Large Reasoning Models (LRMs) exhibit remarkable general capabilities but significantly underperform in spatial reasoning tasks. Existing approaches treat this gap as a knowledge deficit, relying on supervised fine-tuning (SFT) to ingest labeled spatial data from external vision sources or synthetic engines. In contrast, we argue that for many tasks, spatial reasoning capabilities are already present in pre-trained LRMs but require alignment through logical coherence under geometric 2D and 3D constraints. In this work, we propose a self-supervised reinforcement learning (RL) framework that targets the internal reasoning process without requiring ground-truth annotations. By formalizing the notion of consistency verifiers -- reward functions that check for geometric and semantic consistency under transformations -- we demonstrate that models can improve their spatial reasoning abilities. We use both image transformations, like flipping, and textual transformations, like swapping the order of objects in the question, and propose a new optimal transport-based RL strategy, OT-GRPO, which is a minimal-matching variant of group relative policy optimization tailored to pairwise verifiers. We show that this label-free consistency training approaches the accuracy of models trained with ground-truth supervision and achieves similar generalization across diverse tasks and data domains.
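A consistency verifier of the kind described can be pictured as a label-free reward that compares a model's answers on an image and on its horizontally flipped copy. The sketch below is a toy illustration under that assumption: the answer vocabulary, `FLIP_MAP`, and `flip_consistency_reward` are hypothetical, and neither the paper's actual verifiers nor OT-GRPO are reproduced here.

```python
# Under a horizontal flip, "left" and "right" swap; other answers are unchanged.
FLIP_MAP = {"left": "right", "right": "left"}

def flip_consistency_reward(answer_original, answer_flipped):
    """Return 1.0 if the two answers agree up to the left/right swap, else 0.0."""
    a = answer_original.strip().lower()
    b = answer_flipped.strip().lower()
    # The answer we expect on the flipped image, given the original answer.
    expected = FLIP_MAP.get(a, a)
    return 1.0 if b == expected else 0.0

consistent = flip_consistency_reward("left", "right")   # answers swap as expected
inconsistent = flip_consistency_reward("left", "left")  # answer ignored the flip
```

No ground-truth label appears anywhere in the reward, only agreement between the two answers, which is what makes this kind of training signal self-supervised.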

2606.11916 2026-06-11 cs.SE cs.AI new submission

Characterizing Software Aging in GPU-Based LLM Serving Systems

基于GPU的大语言模型服务系统中的软件老化特征分析

Domenico Cotroneo, Bojan Cukic

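AI summary Proposes an empirical methodology for studying software aging in GPU-based LLM serving systems; a 216-hour experiment finds significant memory aging in all deployments, with leak rates strongly dependent on the runtime and configuration, and provides a reproducible framework.

Details
Comments
7 pages
AI Chinese abstract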

本文提出了一种实证方法,用于研究基于GPU的大语言模型服务系统中的软件老化。传统的老化研究侧重于以CPU为中心的软件,且工作负载相对规律;而大语言模型服务则不同,它跨越Python主机和CUDA设备,处理成本相差数个数量级的请求,并依赖于快速演进的软件栈。我们在相同的压力条件下,对六个共置部署进行了216小时的实验,并行监控主机、设备和客户端指标,并应用了考虑自相关和多重比较的统计流程。结果显示,所有部署均存在统计上显著的内存老化,泄漏率强烈依赖于服务运行时和部署配置。除这些发现外,我们还提供了一个可复现的框架,为软件老化与再生领域以及大语言模型服务社区开辟了交叉研究方向。

English abstract

This paper proposes an empirical methodology to study software aging in GPU-based LLM serving systems. Traditional aging studies focus on CPU-centric software with relatively regular workloads; LLM serving is different, spanning a Python host and a CUDA device, handling requests whose cost varies by orders of magnitude, and relying on rapidly evolving software stacks. We run a 216-hour campaign across six co-located deployments under identical stress conditions, monitor host, device, and client metrics in parallel, and apply a statistical pipeline that accounts for autocorrelation and multiple testing. Our results reveal statistically significant memory aging in all deployments, with leak rates strongly dependent on the serving runtime and deployment configuration. Beyond these findings, we provide a reproducible framework that opens a research direction at the intersection of the software aging and rejuvenation and LLM serving communities.

2606.11914 2026-06-11 eess.SP cs.LG new submission

NARRAS: Edge-Triggered Distributed Inference for CSI-Based Localization in Vehicular IoT Networks

NARRAS:车载物联网中基于CSI的定位的边缘触发分布式推理

Rodrigo Oliver, Ricardo Vazquez Alvarez, Alejandro Lancho, Stefano Rini

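AI summary To address resource constraints in CSI-based localization with distributed antenna arrays, proposes NARRAS, an edge-triggered distributed inference policy in which each array decides locally whether to report its observation; differentiable activity penalties and channel-chart regularization enforce the budget and improve localization accuracy at low activity rates.

Details
Comments
10 pages, 5 figures, 5 tables. Under review at the IEEE Internet of Things Journal
AI Chinese abstract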

基于CSI的定位与空间分布式天线阵列存在基本的资源权衡。每个阵列可以提供丰富的信道视图,但当只有少数阵列携带有用信息时,将所有阵列的观测结果转发到融合中心是浪费的,且共享上行链路仅支持有限数量的同时传输。我们让每个阵列本地决定其当前观测是否值得报告,受限于平均活跃发射机数量的预算。我们将这种抽象称为边缘触发分布式推理(ETDI)。它捕获了一类更广泛的任务导向通信问题,其中资源受限设备共享接入信道以完成共同推理任务。我们将ETDI实例化用于基于CSI的定位,这是车载物联网中的常见场景。空间分布的远程天线阵列(RAA)将来自用户设备(UE)传输的本地信道状态信息(CSI)编码为潜在特征,融合中心根据报告的特征子集估计UE位置。我们提出NARRAS,一种去中心化的报告策略,其中每个RAA将其最近观测的循环摘要与其最后传输的潜在记忆相结合。训练通过可微活动惩罚和验证校准的确定性阈值来控制显式活动预算,并使用通道图正则化来塑造潜在几何结构。实验表明,在可比的上行链路活动下,NARRAS比学习型和启发式稀疏报告策略提高了定位精度,而密集全报告模型仍然作为有用的无预算参考。在低活动率下,图正则化进一步减少了高百分位定位误差,表明几何感知的潜在表示在稀疏报告下更加鲁棒。

English abstract

CSI-based localization with spatially distributed antenna arrays exposes a basic resource trade-off. Each array can provide a rich view of the channel, but forwarding observations from all arrays to a fusion center is wasteful when only a few carry useful information, and the shared uplink supports only a limited number of simultaneous transmissions. We let each array decide locally whether its current observation is worth reporting, subject to a budget on the average number of active transmitters. We refer to this abstraction as Edge-Triggered Distributed Inference (ETDI). It captures a broader class of task-oriented communication problems where resource-constrained devices share an access channel for a common inference task. We instantiate ETDI for CSI-based localization, a common scenario in vehicular IoT networks. Spatially distributed remote antenna arrays (RAAs) encode local channel state information (CSI) from user equipment (UE) transmissions into latent features, and the fusion center estimates the UE position from the subset of reported features. We propose NARRAS, a decentralized reporting policy in which each RAA combines a recurrent summary of its recent observations with a memory of the last latent it transmitted. Training controls an explicit activity budget through differentiable activity penalties and validation-calibrated deterministic thresholds, and uses channel-chart regularization to shape the latent geometry. Experiments show that, at comparable uplink activity, NARRAS improves localization accuracy over learned and heuristic sparse-reporting strategies, while dense full-report models remain useful budget-free references. In low-activity regimes, chart regularization further reduces high-percentile localization errors, suggesting that geometry-aware latent representations are more robust under sparse reporting.