arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.07756 2026-06-09 cs.CV cs.RO new submission

DroneDAR: Long-Range Drone Distance Estimation Using Monocular Vision and Bounding-Box Features

Knut Peterson, Zaid Mayers, David Han

Affiliations iMaPLe Research Lab, Drexel University

AI summary To address the difficulty of estimating the distance of small drones at long range, the paper proposes DroneDAR, a model that fuses a convolutional backbone with bounding-box features through a lightweight gating mechanism; it analyzes how backbone capacity, crop resolution, and regression loss affect performance and examines long-range failure modes.

Comments 6 pages, 5 figures. Accepted to the 2026 International Conference on Advanced Visual and Signal-Based Systems (AVSS)

Abstract

Accurate distance estimation for small drones in long-range imagery is important for tracking and situational awareness, yet remains challenging due to extreme target scale variation, background clutter, and noisy visual cues. This paper studies monocular drone distance estimation using image crops together with bounding-box geometry, a practical setting in which a detector provides a candidate drone region and the model predicts range from appearance and box-derived features. We evaluate a Droneranger-style baseline, and introduce a new DroneDAR (Drone Detection And Ranging) model that combines a convolutional backbone with explicit bounding-box cues through a lightweight gating mechanism. Experiments analyze how backbone capacity, crop resolution, and regression loss functions affect performance across distance regimes. We further examine common failure modes at long distances, including sensitivity to bounding-box noise and reduced texture detail in the crop. The results provide guidance for designing and training range estimators that remain robust under real-world long-range conditions and highlight directions for improving reliability when drones occupy only a few pixels.
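
To make the role of bounding-box geometry concrete, the following minimal sketch shows the classical pinhole-camera range estimate that box height alone can provide. It is not the DroneDAR model; the function name, focal length, and drone height are illustrative assumptions.

```python
def pinhole_range_m(focal_px: float, object_height_m: float, box_height_px: float) -> float:
    """Range from similar triangles: distance = focal * real_height / pixel_height."""
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return focal_px * object_height_m / box_height_px


# Hypothetical values: 2000 px focal length, 0.3 m tall drone.
near = pinhole_range_m(2000.0, 0.3, 60.0)      # large box, short range
far = pinhole_range_m(2000.0, 0.3, 3.0)        # box only 3 px tall
far_noisy = pinhole_range_m(2000.0, 0.3, 4.0)  # same drone, box off by 1 px
```

With these hypothetical numbers, one pixel of box-height error at the far range moves the estimate from about 200 m to about 150 m, which is the kind of sensitivity to bounding-box noise the abstract mentions.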

2606.07753 2026-06-09 cs.CL new submission

ReadingMachine: A Computational Methodology for Structured Corpus Reading and Large-Scale Synthesis

James Morrissey

Affiliations GitHub

AI summary Proposes ReadingMachine, a methodology that uses large language models to perform bounded reading over document collections; inspectable stages (insight extraction, semantic clustering, theme generation, and iterative omission detection) provide coverage, traceability, and preservation of disagreement across large corpora.

Comments 32 pages, 1 figure

Abstract

ReadingMachine is a computational methodology for structured corpus reading that uses large language models to perform bounded reading operations over entire document collections. Rather than relying on retrieval or recursive summarization, the approach decomposes analysis into inspectable stages including insight extraction, semantic clustering, theme generation, and iterative omission detection. By delaying irreversible compression and explicitly tracking intermediate representations, the method prioritizes coverage, traceability, and preservation of disagreement across large corpora. The system is demonstrated on a heterogeneous corpus of 152 industrial policy documents, producing more than 17,500 extracted insights and a structured thematic map. ReadingMachine is released as an open-source experimental framework for large-scale qualitative synthesis and corpus analysis.

2606.07728 2026-06-09 cs.LG new submission

Characterizing the Discrete Geometry of ReLU Networks

Blake B. Gaines, Jinbo Bi

Affiliations University of Connecticut

AI summary Studies the complex formed by the linear regions of fully-connected ReLU networks and proves that the average degree of its connectivity graph is upper bounded by twice the input dimension and that its diameter has an upper bound independent of the input dimension.

Comments Selected for an oral presentation at ICLR 2026. Tagged PDF, reviews, and discussions are available at https://openreview.net/forum?id=TgLW2DiRDG

Journal ref Proceedings of the International Conference on Learning Representations (ICLR), 2026

Abstract

It is well established that ReLU networks define continuous piecewise-linear functions, and that their linear regions are polyhedra in the input space. These regions form a complex that fully partitions the input space. The way these regions fit together is fundamental to the behavior of the network, as nonlinearities occur only at the boundaries where these regions connect. However, relatively little is known about the geometry of these complexes beyond bounds on the total number of regions, and calculating the complex exactly is intractable for most networks. In this work, we prove new theoretical results about these complexes that hold for all fully-connected ReLU networks, specifically about their connectivity graphs in which nodes correspond to regions and edges exist between each pair of regions connected by a face. We find that the average degree of this graph is upper bounded by twice the input dimension regardless of the width and depth of the network, and that the diameter of this graph has an upper bound that does not depend on input dimension, despite the number of regions increasing exponentially with input dimension. We corroborate our findings through experiments with networks trained on both synthetic and real-world data, which provide additional insight into the geometry of ReLU networks. Code to reproduce our results can be found at https://github.com/bl-ake/ICLR-2026.
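
As a sanity check on the average-degree bound, here is a small sketch of the simplest special case: one hidden layer with a scalar input, where the regions are intervals and the connectivity graph is a path. It uses activation breakpoints as region boundaries, which is an assumption of this sketch, and the function names and weights are hypothetical; it does not reproduce the paper's proofs or code.

```python
def activation_breakpoints(weights, biases):
    """Inputs x where a hidden unit relu(w*x + b) switches on or off."""
    return sorted({-b / w for w, b in zip(weights, biases) if w != 0})


def average_degree_1d(weights, biases):
    """Average degree of the region-adjacency graph for a scalar-input layer."""
    regions = len(activation_breakpoints(weights, biases)) + 1
    edges = regions - 1  # neighbouring intervals share one boundary point
    return 2 * edges / regions


# Hypothetical hidden layer with four units.
weights = [1.0, -2.0, 0.5, 3.0]
biases = [0.0, 1.0, -1.0, 2.0]
avg = average_degree_1d(weights, biases)
```

Here four distinct breakpoints give five regions and four edges, so the average degree is 8/5, below the bound of twice the input dimension (2 for a scalar input). Adding more units pushes the value toward 2 without reaching it.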

2606.07726 2026-06-09 cs.LG new submission

Cutting LLM Evaluation Costs with SySRs: A Bandit Algorithm that Provably Exploits Model Similarity

Zifan Lyu, Chahine Nejma, Tobias Wegel, Fanny Yang, Florian E. Dorner

Affiliations ETH Zurich; Centrale Supélec; ENS de Cachan; MPI for Intelligent Systems, Tübingen

AI summary Proposes SySRs, an algorithm that uses paired comparisons and adaptive allocation of the evaluation budget to exploit model similarity and cut LLM evaluation costs; it achieves the lowest average error rate across 15 benchmarks.

Comments Published at ICML 2026

Abstract

Large Language Models are typically benchmarked by evaluating every model on every test query. For practitioners seeking the best model to deploy, this is often wasteful: if a model clearly performs worse than others, there is no need to precisely estimate its performance. Best-arm identification algorithms can be naturally applied to drastically reduce costs by adaptively allocating evaluation budget. Further, language models often respond similarly to the same prompt, a property previous work has tried to leverage with mixed success. We propose Synchronized Successive Rejects (SySRs), augmenting the classical Successive Rejects algorithm with paired comparisons. Unlike prior attempts to leverage model similarity in best-model identification, our approach is hyperparameter-free and enjoys performance guarantees that improve with the degree of similarity between evaluated models. Empirically, our method outperforms all baselines in terms of average error rate across 15 standard benchmarks, and in terms of worst-case budget for reliably identifying the best model.
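
For orientation, the sketch below implements the classical Successive Rejects baseline that the abstract builds on, applied to a table of per-query scores. It is not SySRs: the paper's paired-comparison rule is not reproduced, and the score-table layout, function name, and toy data are assumptions made for illustration.

```python
import math


def successive_rejects(scores, budget):
    """Classical Successive Rejects over a model-by-query score table.

    scores[m][q] is the score of model m on query q (hypothetical layout).
    All surviving models are scored on the same queries in each phase.
    """
    k = len(scores)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    active = list(range(k))
    totals = [0.0] * k
    used = 0  # queries evaluated so far by every surviving model
    for phase in range(1, k):
        # Cumulative number of queries per surviving model after this phase.
        target = math.ceil((budget - k) / (log_bar * (k + 1 - phase)))
        target = min(target, len(scores[0]))
        for q in range(used, target):
            for m in active:
                totals[m] += scores[m][q]
        used = max(used, target)
        # Reject the surviving model with the lowest empirical mean.
        worst = min(active, key=lambda m: totals[m] / max(used, 1))
        active.remove(worst)
    return active[0]


# Toy data: model 0 is always right, model 1 never, model 2 half the time.
scores = [[1] * 100, [0] * 100, [1, 0] * 50]
best = successive_rejects(scores, budget=60)
```

On this toy table the clearly worst model is dropped after the first phase, so the remaining budget is spent only on separating the two stronger models.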

2606.07724 2026-06-09 cs.LG new submission

A Geometry-Aware Triplane Field Network for Vehicle Aerodynamic Prediction

Kangkang Qi, Huiyu Yang, Keqi Ding, Yunpeng Wang, Yuntian Chen, Yuanwei Bin, Rikui Zhang, Jianchun Wang

Affiliations Southern University of Science and Technology; Shenzhen Tenfong Technology Co., Ltd.; Eastern Institute of Technology

AI summary Proposes the geometry-aware triplane field network (GTF-Net), whose dual-stream backbone combines an adaptive Fourier neural operator with a CNN to predict vehicle aerodynamic pressure and wall shear stress efficiently, with higher accuracy than existing methods.

Comments 28 pages, 8 figures

Abstract

High-fidelity computational fluid dynamics (CFD) is crucial to vehicle aerodynamic analysis, but its cost still constrains early-stage design exploration. Machine-learning-based surface-field prediction offers a faster alternative if the model can efficiently capture both global flow context and local geometric detail. This work proposes a machine-learning-based method, named the geometry-aware triplane field network (GTF-Net), for vehicle aerodynamic pressure and wall shear stress prediction. GTF-Net constructs triplane features directly from sampled surface points through a shared multilayer perceptron (MLP) and smooth bilinear rasterization. The planes are then processed by a dual-stream backbone that combines adaptive Fourier neural operator (AFNO) spectral mixing with convolutional neural network (CNN) refinement, so long-range aerodynamic coupling and local geometry-induced variations are modeled in the same representation. At query stage, sampled triplane features are combined with vehicle-aligned directional coordinates, normal-projection features, and a voxel-based curvature proxy. GTF-Net is compared with Transolver, geometry-informed neural operator (GINO), and TripNet, a triplane-based surrogate model. GTF-Net improves the relative L2 error from the strongest baseline value of 0.157 to 0.145 for pressure prediction and from 0.237 to 0.226 for wall shear stress prediction. Ablation results show that AFNO mixing, local CNN refinement, and query-side geometric encoding each contribute to accuracy, supporting the proposed mechanism of combining structured triplane representation with explicit aerodynamic geometry cues.
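
The reported numbers are relative L2 errors. A minimal sketch of that metric follows, assuming the common definition (norm of the error divided by norm of the reference field, over flattened values); the paper may aggregate across samples differently, and the function name and toy values are hypothetical.

```python
import math


def relative_l2_error(predicted, reference):
    """||predicted - reference||_2 / ||reference||_2 over flattened field values."""
    diff = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))
    norm = math.sqrt(sum(r ** 2 for r in reference))
    if norm == 0:
        raise ValueError("reference field has zero norm")
    return diff / norm


# Toy surface field with two sample points.
pressure_ref = [3.0, 4.0]
pressure_pred = [3.0, 3.0]
err = relative_l2_error(pressure_pred, pressure_ref)

# Relative reductions implied by the figures quoted in the abstract.
pressure_gain = (0.157 - 0.145) / 0.157
shear_gain = (0.237 - 0.226) / 0.237
```

Computed this way, the quoted improvements correspond to roughly a 7.6% relative error reduction for pressure and roughly 4.6% for wall shear stress.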

2606.07723 2026-06-09 cs.RO new submission

VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation

Siyi Chen, Hugo Hadfield, Alex Zook, Mikaela Angelina Uy, Chan Hee Song, Erwin Coumans, Xuning Yang, Faisal Ladhak, Qing Qu, Stan Birchfield, Jonathan Tremblay, Valts Blukis

Affiliations NVIDIA; University of Michigan

AI summary Proposes VoLoAgent, in which a VLM physically orchestrates a VLA/WAM as an interruptible tool to perform open-vocabulary long-horizon manipulation; it substantially outperforms existing systems on the new RoboVoLo benchmark.

Abstract

Open-vocabulary long-horizon manipulation requires robots to reason over flexible instructions and complex multi-object scenes while adaptively planning, executing, monitoring, and recovering from failures. We address these demands with a closed agent loop in which a VLM orchestrates heterogeneous robot capabilities as interruptible tools. Unlike in virtual AI agents, the timing of decisions, actions and tool calls is important in a physical world that does not pause for reasoning. We refer to this setting as Physical Orchestration, and propose VoLoAgent, a VLM that plans, monitors, and recovers by treating a VLA/WAM as an interruptible tool it steers mid-rollout alongside vision models and action primitives. To evaluate these long-horizon capabilities, we introduce RoboVoLo, a high-fidelity benchmark for open-vocabulary long-horizon manipulation across common sense, memory/state tracking, complex references, and world knowledge, with both task-level success and failure-mode diagnostics. Experiments show VoLoAgent substantially outperforms single VLA/VLM or tool-based systems, with validation on real-robot experiments. Project page: https://chicychen.github.io/VoLo/

2606.07722 2026-06-09 cs.AI new submission

Some hypotheses on how chatbots work in problem-solving-driven conversations. Large Language Models as confirmation of the Innovation Illusion

S. F. M. van Vlijmen, H. D. Lethe

Affiliations S.F.M. van Vlijmen and H.D. Lethe jr

AI summary Examines the nature of chatbots as conversation partners, drawing on Aggregation Dynamics, Cognitive Linguistics, and related fields; it hypothesizes that LLM training data only partially imitates human thinking and concludes that a basic chatbot cannot be a thinking partner that matches humans.

Comments 42 pages, 3 figures, submitted to Transmathematica

Abstract

This article offers a perspective on the nature of chatbots as genuine conversation partners when discussing problems in relation to their solutions. What can chatbots do and what can't they do, and how can this be explained? Our argument draws on Aggregation Dynamics, Cognitive Linguistics, Neuropsychology and Psychology. Our argument focuses on basic chatbots in the hope of thereby making statements about the core functionality of more advanced chatbots. Basic chatbots are assumed to consist of a Large Language Model (LLM) with a simple interface. The main results are: a description of human understanding and thinking based on so-called metaphorical problem propagations; the hypothesis that text datasets used for training LLMs have specific characteristics and that these text datasets only partially imitate human thinking and understanding; the hypothesis that the LLM training process encodes artificial metaphorical problem propagations into an LLM from these datasets; our conclusion that a basic chatbot cannot be a thinking partner capable of matching humans; our conclusion that further development of the Large Language Model will not lead to this either. Yann LeCun states: "Animals and humans exhibit learning abilities and understandings of the world that are far beyond the capabilities of current AI and machine learning (ML) systems." Our conclusions are in line with this. LeCun's vision and ours are at odds with the optimism of Big Tech. That does not alter the fact that chatbots exist, that they are being used on a massive scale, by both individuals and organisations, and that it is therefore socially and politically important to understand them. Our article aims to contribute to the discussion on the functioning, benefits and drawbacks of chatbots. We have not yet encountered the approach we used to arrive at our conclusions in our research into how chatbots work.

2606.07721 2026-06-09 cs.AI new submission

Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

Kaouther Mouheb, Amos Pomp, Antoine Manenti, Romy de Haan, Farog Faghir, Joy Martens, Harro Seelaar, Francesco Mattace-Raso, Meike W. Vernooij, Frank J. Wolters, Stefan Klein, Esther E. Bron

Affiliations Department of Radiology & Nuclear Medicine, Erasmus MC; Department of Epidemiology, Erasmus MC; Department of Electrical and Electronics Engineering, ENSEEIHT; Alzheimer Centre Erasmus MC; Department of Neurology, Erasmus MC; Department of Internal Medicine, Erasmus MC

AI summary Evaluates the ability of the open-weight LLM LLaMA 3.1 to extract structured information automatically from Dutch brain MRI reports using zero-shot and few-shot prompting; it reaches high accuracy on tasks such as visual rating scores and lesion detection, and few-shot prompting further improves extraction of numerical variables.

Comments Submitted to European Radiology

Abstract

Objectives: Automatic data extraction from free-text radiology reports enables large-scale research, but few studies assessed the performance of large language models (LLMs) on Dutch neuroradiology reports. Methods: We analyzed 947 brain MRI reports from a tertiary memory clinic (2016-2021), authored by consultant neuroradiologists. Trained medical students annotated thirty variables; 100 reports were double-annotated to assess inter-rater reliability. We evaluated the performance of the open-weight LLM LLaMA 3.1 using different languages (Dutch vs. English translation) and few-shot prompting with different example selection strategies. Performance was evaluated using balanced accuracy for categorical variables, accuracy and mean absolute error for counts, and text similarity for free-text. Metrics were computed across 10 random splits of the 947 reports. Results: LLaMA 3.1 demonstrated high zero-shot performance for visual rating scores (mean [95%-CI]): Medial Temporal Atrophy: 90% [77-100%] on the left and 96% [94-99%] on the right, Global Cortical Atrophy: 87% [83-91%], and Fazekas: 94% [93-96%]. Microbleed mentions were detected with 93% accuracy [92-95%] and infarct mentions with 82% [80-84%]. Text similarity for lesion location reached 0.95 [0.95-0.96]. Performance was lower for numerical variables: 80% [78-82%] for the number of microbleeds and 66% [63-68%] for infarcts. English translation yielded comparable results. Few-shot prompting improved performance for numerical variables, achieving 92% [90-93%] for microbleeds and 81% [77-85%] for infarcts using structural similarity-based selection. Conclusion: LLaMA 3.1 shows strong potential for extracting data from Dutch neuroradiology reports. Few-shot prompting enhances performance for numerical variables, whereas challenges remain for location-specific variables.
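
Balanced accuracy, the metric used for the categorical variables, is the mean of per-class recall. The sketch below shows the standard definition on toy labels; the function name and data are hypothetical and unrelated to the study's reports.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Toy labels: class 1 (e.g. a finding is mentioned) is rare.
truth = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
score = balanced_accuracy(truth, preds)
plain = sum(t == p for t, p in zip(truth, preds)) / len(truth)
```

On these toy labels plain accuracy is 0.9 while balanced accuracy is 0.75, because the single missed rare case halves the recall of its class. That is why the metric suits imbalanced report variables.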

2606.07720 2026-06-09 cs.AI cs.CL cs.LG new submission

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

Mujtaba Farhan, Maheep Chaudhary

Affiliations University of Cambridge

AI summary Identifies a concept bottleneck in CoCoNuT's latent-space reasoning, caused by intermediate hidden states being overwritten, and proposes AGCLR, which adds a persistent memory through a gated concept stream and yields consistent gains on GSM8K, HotpotQA, and ProsQA.

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning abilities on mathematical and multi-hop planning tasks. The CoCoNuT (Chain of Continuous Thought) paradigm (Hao et al., 2024) extends this by enabling models to reason in latent space, exploring multiple reasoning paths simultaneously rather than committing to a single chain early on. However, we identify a limitation we term the concept bottleneck: at each reasoning pass, intermediate hidden states are overwritten, causing the model to lose critical facts computed in earlier steps as reasoning depth increases. We observe this empirically. On HotpotQA, vanilla CoCoNuT (10.4% EM) fails to improve over the CoT baseline (11.0% EM), and performance degrades with curriculum depth on GSM8K. To address this, we propose AGCLR (Adaptive Gated Continuous Latent Reasoning), which augments CoCoNuT with a Gated Concept Stream, a persistent residual memory maintained across all reasoning passes and controlled by three learned gates: a write gate that commits intermediate facts to memory, a read gate that retrieves relevant prior states, and a forget gate that prunes irrelevant context. Evaluated on GSM8K, HotpotQA, and ProsQA using GPT-2 as our base model, AGCLR achieves consistent improvements across all types of datasets, with the performance gap compounding as curriculum depth increases, directly resolving the concept bottleneck. Code is available at https://anonymous.4open.science/r/JJJJ/README.md

2606.07718 2026-06-09 cs.AI cs.CV cs.LG new submission

A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline

Kai A. Horstmann, Ethan Lin, Alice A. Robie, Jennifer J. Sun, Kristin Branson

Affiliations Cornell University; HHMI Janelia Research Campus

AI summary Evaluates general-purpose coding agents on a fly optogenetics data-to-discovery pipeline; agents can solve individual stages, but the end-to-end pipeline remains beyond their abilities, with the main challenges being the absence of pre-defined criteria to iterate on and limited scientific judgment.

Abstract

Agentic AI tools offer a promising path to automating software development bottlenecks in scientific research pipelines, particularly for stages that take domain experts days to months to build, where scientists care about correctness and robustness, not implementation details. We present an empirical study of general-purpose coding agents on a fly optogenetics data-to-discovery pipeline. We assess agents on tasks substantially larger than existing benchmarks, datasets orders of magnitude bigger, and evaluation criteria grounded in domain expert standards. We show that agents can solve several individual pipeline stages, suggesting stage-level automation is tractable. By analyzing agents' code iterations, we show that they struggle most when there is not a pre-defined criterion to iterate on, and they must instead use their scientific judgment to assess their current solution, a key open challenge. Mirroring scientific practice, they sometimes attempt visual inspection of intermediate outputs for self-evaluation, but largely fail to interpret what they see or act on it appropriately. Solving the end-to-end pipeline correctly requires stringing together successes across all pipeline stages, and this is beyond agents' current abilities. We identify challenges largely absent from existing benchmarks, including computational resource management and generalization to large held-out data collections. Finally, we distill principles for constructing scientific tasks and rigorous evaluation criteria for open-ended problems.

2606.07714 2026-06-09 cs.LG cs.AI cs.HC new submission

Beyond Accuracy: Interpreting Topic Representation in Suicide Ideation Detection Models

Hamideh Ghanadian, Isar Nejadgholi, Hussein Al Osman

Affiliations University of Ottawa; National Research Council Canada

AI summary Uses visualization and geometric analysis to examine how suicide ideation detection models internally encode psychological risk factors, and finds that topic augmentation improves the clarity and interpretability of representations of underrepresented risk factors.

Abstract

Suicide ideation detection models are typically evaluated using aggregate performance metrics, yet little is known about how they internally represent psychologically meaningful risk factors. In high-stakes mental health applications, understanding these internal representations is essential for safety, transparency, and responsible deployment. In this work, we move beyond accuracy and analyze how suicide detection models trained on original and topic-augmented datasets encode psychological risk factors in their internal representation space. Using visualization and geometric analysis, we examine the coherence and separability of topic-related features. Our results show that topic-aware augmentation increases the clarity and distinctness of underrepresented psychosocial risk factors such as immigration, family issues, and financial crisis. These findings suggest that augmentation not only improves model performance but also leads to more structured and interpretable internal representations.

2606.07713 2026-06-09 cs.LG cs.AI cs.PF new submission

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

Lenore Mullin, Gaetan Hains

Affiliations University at Albany; Université Paris-Est Créteil

AI summary Proposes a Mathematics of Arrays (MoA) reformulation of scaled dot-product attention that eliminates all intermediate arrays by algebraic construction, achieving $O(n d_k + n d_v)$ data movement versus $O(n^2 + n d_k + n d_v)$ for the standard implementation, and verifies numerical accuracy.

Abstract

The attention mechanism is the dominant computational bottleneck in modern transformer-based AI. Its standard implementation incurs quadratic memory traffic in the sequence length $n$, and DRAM accesses cost $100$--$1000\times$ more energy than arithmetic operations on contemporary hardware, so any analysis focused solely on FLOP counts fundamentally mischaracterises the bottleneck. We present a Mathematics of Arrays (MoA) reformulation of scaled dot-product attention and its numerically stable softmax, deriving a Denotational Normal Form (DNF) that eliminates all intermediate arrays, including the implicit transposed-key buffer and every softmax temporary, by algebraic construction rather than empirical tuning. The DNF achieves $O(n d_k + n d_v)$ data movement versus $O(n^2 + n d_k + n d_v)$ for the standard implementation, where $n$ is the sequence length, $d_k$ is the key dimensionality, and $d_v$ is the value dimensionality, and is verified numerically against PyTorch at full double-precision floating-point on concrete inputs. Unlike hardware-specific accelerators or empirical tiling schemes such as FlashAttention, MoA simultaneously provides array fusion, shape-transformation correctness, and predictive cost models from a single algebraic framework. Memory minimality is a theorem established before any code is written. A predictive performance model projects $2$--$100\times$ speedup and $2$--$50\times$ energy reduction, with the advantage widening at exascale. The derivation establishes a formally verified pipeline from Python specification through Operational Normal Form (ONF) and dimension-lifted hardware mapping, providing performance-portable AI kernels of direct relevance to DARPA edge-deployment and DOE exascale priorities.
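
To illustrate what avoiding the quadratic intermediate means in practice, the sketch below compares a reference attention that builds each score row with a streaming version that keeps only a running maximum and running sums. This is the well-known online-softmax rewrite, offered as an assumption-laden illustration; it is not the MoA derivation or the DNF from the paper, and all names and inputs are hypothetical.

```python
import math


def naive_attention(q, k, v):
    """Reference version that materialises every n-length score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def streaming_attention(q, k, v):
    """Same result with a running max and running sums; no score row is stored."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = -math.inf
        denom = 0.0
        acc = [0.0] * len(v[0])
        for kj, vj in zip(k, v):
            s = scale * sum(a * b for a, b in zip(qi, kj))
            new_max = max(running_max, s)
            rescale = math.exp(running_max - new_max)  # 0.0 on the first key
            w = math.exp(s - new_max)
            denom = denom * rescale + w
            acc = [a * rescale + w * x for a, x in zip(acc, vj)]
            running_max = new_max
        out.append([a / denom for a in acc])
    return out


# Small hypothetical inputs: 2 queries, 3 keys/values, d_k = 2, d_v = 3.
q = [[1.0, 0.0], [0.5, -1.0]]
k = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, -1.0, 0.0]]
reference = naive_attention(q, k, v)
streamed = streaming_attention(q, k, v)
```

The two functions agree up to floating-point rounding, while the streaming version's working state per query is a few scalars plus one vector of length $d_v$.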

2606.07711 2026-06-09 cs.LG cs.AI new submission

Rosetta Memory: Adaptive Memory for Cross-LLM Agents

Hao Yang, Shiqi Shen, Haoxuan Li, Zhipeng Wang, Zhi Gong, Xu Chen

Affiliations Gaoling School of Artificial Intelligence, Renmin University of China; Weixin, Tencent; Institute for Artificial Intelligence, Peking University

AI summary Proposes memory-centric LLM adaptation, using two profile-conditioned operators and a minimum-gain sampling curriculum to let memory written by an upstream model activate a downstream LLM; it outperforms baselines on several QA tasks.

Comments 19 pages, 7 figures

Abstract

Memory is the key component for transforming a stateless LLM into a persistent, evolving agent through experience accumulation, long-horizon planning, and continual self-improvement. Existing memory systems typically take the LLM as the center and design memory operations tailored to a specific backbone. In practice, however, users frequently switch between LLMs, for example using Claude for coding and GPT for writing across tasks, or routing different steps to different backbones within a single task for cost-effective trade-offs. As a result, memory written by one model often needs to be consumed by another. Making upstream memory effectively adapt to and activate downstream LLMs remains a critical yet underexplored problem. To bridge this gap, we shift the perspective from LLM-centric memory design to memory-centric LLM adaptation. Specifically, we approach the above upstream-downstream memory adaptation problem from both the write and read sides, and design two profile-conditioned operators that are jointly trained to optimize how memory is stored and presented for better task completion. To ensure the learned operators generalize across a broad set of LLMs, we propose a minimum-gain sampling curriculum that prioritizes the least-served LLMs during training. To better measure the operators' actual contribution rather than the LLM's own capability, we design a performance-gap reward that compares against a naive memory baseline. Experiments on HotpotQA, 2WikiMultihopQA, and MuSiQue demonstrate that our model consistently outperforms baselines and remains robust under unseen-model replacement.

2606.07710 2026-06-09 cs.LG cs.AI new submission

WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing

Young D. Kwon, Miles Williams, Rui Li, Alexandros Kouris, Stylianos I. Venieris

Affiliations Samsung AI Center, Cambridge, UK

AI summary Proposes WhiFlash, the first cross-paradigm speculative decoding method that unifies autoregressive and diffusion-based parallel drafting; fine-grained routing and cache optimizations yield throughput gains of up to 69.6%.

Comments Under review

Abstract

The autoregressive nature of large language models (LLMs) remains a significant bottleneck for inference, particularly in complex agentic workloads. While speculative decoding (SD) accelerates inference, current approaches rely on static drafting paradigms, utilising either autoregressive drafting models for reasoning or diffusion-based parallel drafting models for structured outputs. We empirically find that drafting accuracy fluctuates dramatically within a single sequence, leaving significant performance unrealised by static paradigms and coarse-grained routing. To address this volatility, we introduce WhiFlash, the first cross-paradigm SD method that unifies autoregressive and diffusion-based parallel drafting under a single token-level controller. WhiFlash adopts a fine-grained routing mechanism that employs either a lightweight entropy-based or a learned neural policy, both parametrised to provide a tunable balance between expected token gain and latency. To make high-frequency switching computationally viable, we introduce novel cache-management optimisations, Lazy Catch-up and KV-only Prefill, reducing switching overhead to below 7% of per-round latency. By capitalising on the complementary strengths of fundamentally distinct drafting architectures, WhiFlash achieves significantly higher acceptance lengths, yielding category-specific throughput gains of up to 69.6% over the state-of-the-art autoregressive EAGLE-3 and 37.3% over the diffusion-based DFlash.

2606.07708 2026-06-09 cs.CV cs.AI 新提交

Cross-View Urban Traffic Dataset: Drone-Supervised Ground Truth for Monocular Bird's-Eye View Localization

跨视角城市交通数据集:用于单目鸟瞰图定位的无人机监督地面真值

Prakhar Bhardwaj, Simone Weikl, Kilian Mang, Elia Jonas Sandtner

发表机构 * OTH Regensburg(雷根斯堡应用技术大学)

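AI summary Presents a cross-view urban traffic dataset built from synchronized bicycle-view and drone-view videos, supporting cross-view identity matching and bird's-eye-view prediction, with identity-level alignment and standardized evaluation.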

详情
AI中文摘要

我们介绍了一个从真实城市交叉口同步的自行车视角视频和无人机航拍视频构建的跨视角城市交通感知数据集和基准。该基准针对两个关联任务:街景和无人机视角目标轨迹之间的跨视角身份匹配,以及利用空中监督的自我到鸟瞰图预测。与先前的城市驾驶和V2X数据集相比,我们的基准提供了跨截然不同视角的身份级对齐,以及标准化评估、标注工具和基线实现。这一设置源于以交叉口为中心的交通分析,其中身份保持、局部交互和全局空间结构必须跨视角联合推理。我们在轨迹和帧级别评估方法,包括跨视角ID精确率/召回率/IDF1、近远分解、时间稳定性和一致性指标。我们还提供了基于楔形的跨视角匹配以及三种BEV预测基线(逆透视映射、MonoLayout风格学习基线和回归基线)的基线结果。结果表明该基准可行但具有挑战性:跨视角匹配实现了高召回率,但仍受过度分配和时间不一致性的限制,而自我到BEV预测受益于空中监督,但在轻量级单目感知下远未饱和。我们希望该基准能支持跨视角感知、城市场景对齐和自我到全局交通理解的未来研究。

英文摘要

We introduce a dataset and benchmark for cross-view urban traffic perception built from synchronized ego-centric bicycle videos and aerial drone videos recorded at real urban intersections. The benchmark targets two linked tasks: cross-view identity matching between street-view and drone-view object tracks, and ego-to-bird's-eye-view prediction using aerial supervision. In contrast to prior urban driving and V2X datasets, our benchmark provides identity-level alignment across radically different viewpoints together with standardized evaluation, annotation tooling, and baseline implementations. This setting is motivated by intersection-centric traffic analysis, where identity preservation, local interactions, and global spatial structure must be reasoned about jointly across views. We evaluate methods at both the track and frame levels, including cross-view ID precision/recall/IDF1, near-far breakdowns, temporal stability, and consistency metrics. We also provide baseline results for wedge-based cross-view matching and for three BEV prediction baselines: inverse perspective mapping, a MonoLayout-style learned baseline, and a regression baseline. The results show that the benchmark is feasible but challenging: cross-view matching achieves strong recall yet remains limited by over-assignment and temporal inconsistency, while ego-to-BEV prediction benefits from aerial supervision but remains far from saturated under lightweight monocular sensing. We hope that this benchmark will support future research on cross-view perception, urban scene alignment, and ego-to-global traffic understanding.

2606.07707 2026-06-09 cs.LG 新提交

Decoding Naturalistic Emotion Dynamics from the Brain: An LLM-Enhanced Regression Framework

从大脑解码自然情感动态:一种LLM增强的回归框架

Lemei Zhang, Peng Liu, Hans Dahle Kvadsheim, August Sætre Aasvær, Shuer Ye, Reza Bonyadi, Maryam Ziaei, Jon Atle Gulla

发表机构 * NTNU(挪威科技大学) Kavli Institute for Systems Neuroscience, NTNU(挪威科技大学卡弗里系统神经科学研究所) Microsoft(微软)

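AI summary Proposes a multi-target regression framework that uses LLMs to extract continuous emotion features from a naturalistic narrative and combines dynamic functional connectivity with machine learning algorithms to decode continuous emotional trajectories from fMRI data, revealing interpretable, emotion-specific brain network topologies.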

详情
AI中文摘要

从神经信号解码情感状态通常被框架化为基于情感稳定刺激的离散单标签分类任务,这种表述过于简化了人类情感的连续、流动和共现特性。本研究通过采用多目标回归框架来重新概念化情感解码,以跟踪随时间变化的多个重叠情感维度作为连续轨迹。利用大型语言模型(LLM)的强大泛化能力,我们从自然听觉叙事《爱丽丝梦游仙境》中提取了细粒度的连续情感特征,作为人类fMRI数据集中主观情感的 scalable 代理。与标准分类范式或过滤网络动态的 mass-univariate 减法对比不同,我们利用正则化和基于核的机器学习算法作为连续估计器来跟踪宏观神经状态变化的幅度。我们证明,基于动态功能连接(DFC)时间快照训练的模型显著优于静态感兴趣区域(ROI)幅度表示,能够有效捕捉快速变化的叙事输入下的连续情感轨迹。此外,通过实施图论可解释人工智能(XAI)技术,我们解构了底层预测特征,揭示了高度可解释的、情感特定的拓扑配置。总体而言,这些结果凸显了LLM自动注释在情感神经科学中的实用性,并为心理建构主义框架提供了令人信服的实证证据,表明动态、分布式的网络交互比严格定位主义的情感解释具有更强的解释力。

英文摘要

Decoding emotional states from neural signals has typically been framed as a discrete, single-label classification task based on emotionally stable stimuli, a formulation that oversimplifies the continuous, fluid, and co-occurring nature of human affect. This study reconceptualizes emotion decoding by adopting a multi-target regression framework to track multiple overlapping emotional dimensions as continuous trajectories over time. Leveraging the robust generalization capabilities of Large Language Models (LLMs), we extracted fine-grained, continuous sentiment profiles from a naturalistic auditory narrative, Alice in Wonderland, to serve as scalable proxies for subjective affect in a human fMRI dataset. Departing from standard classification paradigms or mass-univariate subtractive contrasts that filter out network dynamics, we leverage regularized and kernel-based machine learning algorithms as continuous estimators to track the magnitude of macroscale neural state variations. We demonstrate that models trained on temporal snapshots of Dynamic Functional Connectivity (DFC) significantly outperform static region-of-interest (ROI) amplitude representations, effectively capturing continuous emotional trajectories under rapidly fluctuating narrative input. Furthermore, by implementing graph-theoretical Explainable AI (XAI) techniques, we deconstruct the underlying predictive features to reveal highly interpretable, emotion-specific topological configurations. Collectively, these results highlight the utility of LLM-automated annotation in affective neuroscience and provide compelling empirical evidence for psychological constructionist frameworks, demonstrating that dynamic, distributed network interactions offer superior explanatory power over strictly locationist accounts of emotion.

2606.07705 2026-06-09 cs.LG cs.AI 新提交

SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models

SAW: 面向大语言模型多目标强化学习的阶段感知动态加权

Yuchen He, Baolong Bi, Shenghua Liu, Huaming Liao, Yuyao Ge, Bolin Wan, Siqian Tong, Juan Chen, Jiafeng Guo, Xueqi Cheng

发表机构 * Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所) University of Electronic Science and Technology of China(电子科技大学)

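AI summary To address asynchronous reward learning in multi-objective reinforcement learning, proposes SAW, a lightweight dynamic weighting mechanism that uses the coefficient of variation to adjust each objective's contribution in real time, improving training efficiency and final performance under both GRPO and GDPO.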

Comments 17 pages, 7 figures, 5 tables

详情
AI中文摘要

尽管多目标强化学习(MORL)对于将大语言模型与复杂的人类偏好对齐至关重要,但当前普遍采用的静态加权求和忽略了一个更基本的现象:不同目标之间的奖励学习明显异步。学习良好的维度会迅速产生同质、低方差的信号,其残留噪声会污染聚合奖励(在GRPO中)或占据优势预算的固定份额(在GDPO中),从而干扰学习不足维度携带的稀缺但高价值的信号。为了解决这种异步性,我们提出了阶段感知动态加权(SAW),一种轻量级、算法无关的动态加权机制。SAW利用变异系数(CV)作为实时信息量的尺度不变代理,根据批次内各维度的相对信息量重新加权其奖励或优势贡献。与需要多次前向和反向传播的基于梯度的方法不同,SAW仅依赖于批次级统计信息,引入的计算开销几乎可以忽略不计。在工具调用和文本摘要任务上的实验表明,SAW在GRPO和GDPO框架下均能一致地提高训练效率和最终性能,证实了其作为多奖励LLM对齐的通用插件。我们的代码可在 https://github.com/Zhaolutuan/SAW 获取。

英文摘要

Although multi-objective reinforcement learning (MORL) is central to aligning large language models with complex human preferences, the prevailing practice of static weighted summation overlooks a more fundamental phenomenon: reward learning is markedly asynchronous across objectives. Well-learned dimensions quickly produce homogeneous, low-variance signals whose residual noise contaminates the aggregated reward (in GRPO) or occupies a fixed share of the advantage budget (in GDPO), interfering with the scarce yet high-value signals carried by under-learned dimensions. To address this asynchrony, we propose Stage-Aware Dynamic Weighting (SAW), a lightweight, algorithm-agnostic dynamic weighting mechanism. SAW utilizes the coefficient of variation (CV) as a scale-invariant proxy for real-time informativeness, reweighting each dimension's reward or advantage contribution by its relative informativeness within the batch. Unlike gradient-based methods that require multiple forward and backward passes, SAW relies solely on batch-level statistics, introducing nearly negligible computational overhead. Experiments on tool-calling and text summarization tasks demonstrate that SAW consistently improves both training efficiency and final performance under both GRPO and GDPO frameworks, confirming it as a general-purpose plug-in for multi-reward LLM alignment. Our code is available at https://github.com/Zhaolutuan/SAW
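
The reweighting idea in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible rule: each objective's weight is its batch-level coefficient of variation, normalized to sum to one. The function names `cv_weights` and `aggregate`, the normalization, and the uniform fallback are assumptions made for illustration; the abstract does not give SAW's exact formula.

```python
from statistics import fmean, pstdev


def cv_weights(rewards, eps=1e-8):
    """Return one weight per objective from batch-level statistics only.

    rewards: list of rows, one row per sample, one column per objective.
    Assumption: weight is proportional to CV = std / |mean| within the batch.
    """
    num_dims = len(rewards[0])
    cvs = []
    for d in range(num_dims):
        column = [row[d] for row in rewards]
        cvs.append(pstdev(column) / (abs(fmean(column)) + eps))
    total = sum(cvs)
    if total <= eps:
        # Every objective is constant in this batch: fall back to uniform weights.
        return [1.0 / num_dims] * num_dims
    return [cv / total for cv in cvs]


def aggregate(rewards, weights):
    """Weighted sum of per-objective rewards for each sample."""
    return [sum(w * r for w, r in zip(weights, row)) for row in rewards]


# Objective 0 is already saturated (constant); objective 1 still varies.
batch = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
weights = cv_weights(batch)
scalar_rewards = aggregate(batch, weights)
```

In this toy batch the saturated objective receives zero weight, so the scalar reward is driven entirely by the objective that still carries signal. Only per-batch means and standard deviations are needed, which matches the abstract's point about negligible overhead.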

2606.07704 2026-06-09 cs.LG cs.AI 新提交

FunctionEvolve: Structure-Guided Symbolic Regression with LLMs

FunctionEvolve: 基于结构引导的符号回归与大型语言模型

Zeyu Xia, Jun Zhu, Dong Yan

发表机构 * Bosch Center for Artificial Intelligence(博世人工智能中心) Department of Computer Science and Technology, Tsinghua University(清华大学计算机科学与技术系) Tsinghua-Bosch Joint Center for ML, Tsinghua University(清华大学-博世联合机器学习中心)

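AI summary Proposes the FunctionEvolve framework, which organizes symbolic regression search around expression trees; with structural summaries, local tree edits, and structure-aware coefficient fitting, it reaches 82.9% SA@50 on the synthetic subset of LLM-SRBench with Claude Opus 4.6, 4.5x above same-backbone baselines.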

详情
AI中文摘要

符号回归旨在从数据中揭示显式的科学定律。近期方法使用大型语言模型(LLM)引导基于背景文本的变异,这比随机遗传编程更具方向性。然而,精确的符号恢复既需要语义引导,也需要显式结构,以便通过有效的符号表示进行领域信息搜索。当前的LLM驱动系统仍然是结构盲的:它们在模糊的候选者中进行选择,缺乏局部变异的显式机制,并依赖脆弱的系数拟合,这可能会低估正确的骨架。我们提出FunctionEvolve,一个使用表达式树组织整个搜索的进化框架:结构摘要促进多样化的父代选择,局部树编辑保留有用的子表达式,结构感知拟合分解、约束和简化系数,以实现更可靠的评分。它仅使用初等函数族,无需额外的领域特定规则限制泛化能力。在LLM-SRBench的129任务合成子集上,使用Claude Opus 4.6的FunctionEvolve恢复了107个精确形式,达到82.9%的SA@50,是同骨干基线的4.5倍,以及55.8%的SA@1,是此前最强已发布top-1结果的3.6倍。消融实验表明,结构可见搜索是可靠恢复的核心,LLM引导的改进和结构感知系数优化作为必要的提议和评分机制。我们还对基准进行了审计,显示其材料科学子集中的共线性导致了可识别性问题。

英文摘要

Symbolic regression aims to uncover explicit scientific laws from data. Recent methods use LLMs to guide mutation from background text, which is more directed than random genetic programming. However, exact symbolic recovery requires both semantic guidance and explicit structure, so that domain-informed search is carried out through valid symbolic representations. Current LLM-driven systems remain structure-blind: they select among opaque candidates, lack explicit mechanisms for local mutation, and rely on brittle coefficient fitting that can undervalue correct skeletons. We propose FunctionEvolve, an evolutionary framework using expression trees to organize the whole search: structural summaries promote diverse parent selection, local tree edits preserve useful subexpressions, and structure-aware fitting decomposes, constrains, and simplifies coefficients for more reliable scoring. It uses only elementary function families, without additional domain-specific rules limiting generalization. On the 129-task synthetic subset of LLM-SRBench, FunctionEvolve with Claude Opus 4.6 recovers 107 exact forms, reaching 82.9% SA@50, 4.5x above same-backbone baselines, and 55.8% SA@1, 3.6x above the strongest previously published top-1 result. Ablations show that structure-visible search is central to reliable recovery, with LLM-guided refinements and structure-aware coefficient optimization serving as essential proposal and scoring mechanisms. We also audit the benchmark and show that collinearity in its materials-science subset creates identifiability issues.

2606.07703 2026-06-09 cs.LG cs.AI cs.CL 新提交

How Much Dense Attention is Necessary? Oracle-Guided Sparse Prefill for Full/GQA Layers in Hybrid Long-Context Models

需要多少密集注意力?面向混合长上下文模型中全/GQA层的Oracle引导稀疏预填充

Hongxing Wang, Harenome Razanajato, Zhen Zhang, Yujie Yuan, Hongsheng Liu

发表机构 * Technical Report, First Release(技术报告,首次发布)

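AI summary Studies how Oracle-guided sparse prefill can reduce dense attention computation in hybrid long-context models while preserving task performance, and separately assesses oracle feasibility, indexer quality, and runtime speedup headroom.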

Comments Technical report, first release, 26 pages, 2 figures, 11 tables

详情
AI中文摘要

长上下文预填充仍然昂贵,因为即使在包含局部、稀疏、线性或循环组件的混合模型中,全/GQA层仍然对整个历史序列进行评分。我们研究了在显式支持粒度和top-k预算下,需要多少密集注意力来保持任务级行为。我们为现有的GQA检查点引入了一种注意力质量top-k oracle:对于每个层和查询位置,它计算密集注意力,选择头平均的token支持,并仅在该支持上重新计算注意力。该oracle是一个诊断参考,而非可部署的加速器,并将稀疏预算可行性从索引器误差和运行时实现效果中分离出来。在Qwen家族的检索密集型评估中,每个查询的最长oracle行与密集注意力相差在1个点以内,而Qwen3.5-9B在4K到100K的RULER风格扫描中相差在0.48个点以内。在oracle的指导下,我们通过KL蒸馏从密集注意力质量分布中训练了一个头折叠的辅助索引器,同时保持骨干网络冻结。使用分别蒸馏的Qwen3.5-0.8B和Qwen3.5-9B索引器,报告的16K/32K验证宏观差距分别为+2.04和+1.13个点,这被视为质量保持而非改进;融合的选择块共享支持可能引入更大的实现差距。初步的单卡TTFT测量显示,与密集FlashAttention-2基线相比,蒸馏索引器的稀疏服务加速比在NPU上对Qwen3.5-0.8B为1.71倍,在GPU上对Qwen3.5-9B为1.93倍。额外的随机初始化压力行达到3.44倍,表明稀疏运行时存在提升空间,但输出质量未经验证。本次发布首次分离了oracle可行性、蒸馏索引器质量和运行时提升空间,将完全匹配的质量-延迟前沿留待未来工作。

英文摘要

Long-context prefill remains expensive because full/GQA layers still score the historical sequence, even in hybrid models with local, sparse, linear, or recurrent components. We study how much dense attention is needed to preserve task-level behavior under explicit support granularity and top-k budgets. We introduce an attention-mass top-k oracle for existing GQA checkpoints: for each layer and query position, it computes dense attention, selects head-averaged token support, and recomputes attention only on that support. The oracle is a diagnostic reference, not a deployable accelerator, and separates sparse-budget feasibility from indexer error and runtime realization effects. On Qwen-family retrieval-heavy evaluations, the longest per-query oracle rows stay within 1 point of dense, and a Qwen3.5-9B RULER-style sweep from 4K to 100K stays within 0.48 points. Guided by the oracle, we derive a head-collapsed auxiliary indexer trained by KL distillation from dense attention-mass distributions while keeping the backbone frozen. With separately distilled Qwen3.5-0.8B and Qwen3.5-9B indexers, the reported 16K/32K validation macro gaps are +2.04 and +1.13 points, treated as quality preservation rather than improvement; fused selection-block-shared support can introduce a larger realization gap. Preliminary single-card TTFT measurements show distilled-indexer sparse serving speedups of 1.71x for Qwen3.5-0.8B on NPU and 1.93x for Qwen3.5-9B on GPU against its dense FlashAttention-2 baseline. Additional random-init stress rows reach 3.44x, indicating sparse-runtime headroom but not validated output quality. This first release separates oracle feasibility, distilled-indexer quality, and runtime headroom, leaving a fully matched quality-latency frontier to future work.
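
The three oracle steps described above (compute dense attention, select head-averaged token support, recompute attention only on that support) can be sketched for a single query position. This is a toy illustration, not the report's implementation: the name `oracle_topk_attention` is hypothetical, values are scalars rather than vectors to keep the sketch short, and GQA key/value sharing is ignored.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def oracle_topk_attention(head_scores, head_values, k):
    """Attention-mass top-k oracle for one query position.

    head_scores[h][t]: pre-softmax score of head h for token t.
    head_values[h][t]: scalar value of token t in head h (assumption: scalars).
    Returns the selected token indices and one output per head.
    """
    num_tokens = len(head_scores[0])
    # Step 1: dense attention for every head.
    dense = [softmax(scores) for scores in head_scores]
    # Step 2: average attention mass over heads and keep the k heaviest tokens.
    mass = [sum(p[t] for p in dense) / len(dense) for t in range(num_tokens)]
    ranked = sorted(range(num_tokens), key=lambda t: mass[t], reverse=True)
    support = sorted(ranked[:k])
    # Step 3: recompute attention restricted to the shared support.
    outputs = []
    for scores, values in zip(head_scores, head_values):
        probs = softmax([scores[t] for t in support])
        outputs.append(sum(p * values[t] for p, t in zip(probs, support)))
    return support, outputs


scores = [[0.0, 5.0, 0.0, 4.0], [0.0, 4.0, 0.0, 5.0]]
values = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
support, sparse_out = oracle_topk_attention(scores, values, k=2)
```

Because the support is chosen from the dense attention itself, the sketch costs more than dense attention, which is consistent with the abstract's description of the oracle as a diagnostic reference rather than a deployable accelerator.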

2606.07702 2026-06-09 cs.LG cs.AI 新提交

EvoCSFL: Surrogate-Assisted Evolutionary Client Selection for Efficient and Robust Federated Learning

EvoCSFL:基于代理辅助的进化客户端选择实现高效鲁棒联邦学习

Lin Qiang, Sun Xiaoyan, Hu Yao, Fang Wei

发表机构 * Jiangnan University(江南大学) The Hong Kong Polytechnic University(香港理工大学)

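AI summary To address the slow convergence and poor robustness caused by client data and system heterogeneity in federated learning, proposes a surrogate-assisted evolutionary client selection framework that formulates selection as a combinatorial optimization problem and uses a surrogate model to accelerate the evolutionary search; experiments show faster convergence, lower energy consumption, and stronger robustness.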

详情
AI中文摘要

客户端数据和系统的异构性使得采用随机客户端选择的联邦学习难以获得令人满意的收敛速度和鲁棒性。为解决此问题,本文提出了一种基于代理辅助的客户端进化选择框架。在该框架中,首先使用一些典型的客户端选择策略生成候选集,并开发了一个集成模型性能、通信延迟和能量消耗的度量函数,将客户端选择问题表述为组合优化问题。随后,利用候选选择和度量构建代理模型,以高效逼近所选客户端子集的性能。采用进化算法搜索客户端选择的组合空间,并由代理模型引导以加速收敛。在MNIST、CIFAR10、CINIC10和TinyImageNet上的实验表明,与现有方法相比,所提算法实现了更快的收敛、更低的能量消耗和更好的鲁棒性。

英文摘要

The heterogeneity of client data and systems makes it difficult to achieve satisfactory convergence speed and robustness in federated learning with random client selection. To address this issue, this paper proposes a surrogate-assisted evolutionary client selection framework for federated learning. In this framework, several typical client selection strategies are first used to generate candidate sets, and a metric function that integrates model performance, communication latency, and energy consumption is developed to formulate the client selection problem as a combinatorial optimization problem. Subsequently, a surrogate model is constructed using the candidate selections and the metric to efficiently approximate the performance of selected client subsets. An evolutionary algorithm is employed to search the combinatorial space of client selections, guided by the surrogate model to accelerate convergence. Experiments on MNIST, CIFAR10, CINIC10, and TinyImageNet demonstrate that the proposed algorithm achieves faster convergence, lower energy consumption, and improved robustness compared to existing methods.

2606.07700 2026-06-09 cs.LG cs.AI 新提交

EssentialGIN: a new approach for gene essentiality prediction based on graph isomorphism neural networks

EssentialGIN:基于图同构神经网络的新基因必需性预测方法

Sahar Mansouri-Rad, Zahra Narimani, Parvin Razzaghi, Nazanin Hosseinkhan

发表机构 * Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS)(计算机科学与信息技术系,基础科学研究院(IASBS)) Endocrine Research Center, Institute of Endocrinology and Metabolism, Iran University of Medical Sciences(内分泌研究中心,内分泌学与代谢研究院,伊朗医学科学大学)

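AI summary Proposes EssentialGIN, a model based on graph isomorphism networks (GIN) that integrates PPI network topology with multi-source biological data such as gene expression, orthology, and subcellular localization, and significantly outperforms existing methods in complex organisms such as humans.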

Comments 19 pages, 5 figures, 8 tables

详情
AI中文摘要

背景:必需基因(蛋白质)的预测是一个基本且具有挑战性的问题,同时在湿实验中进行非常昂贵且耗时。仅基于计算方法(引入湿实验候选)使用中心性度量预测必需基因并不准确,会导致大量假阳性;因此,最近的研究使用更复杂的模型(如深度学习)以及整合生物信息来识别必需基因。 方法:在这项工作中,我们专注于图同构网络,将蛋白质作为PPI网络中的节点进行嵌入,以保留PPI网络的拓扑特征,并整合生物数据,如基因表达数据、基因直系同源信息和基因亚细胞定位信息,引入了一种用于预测必需基因的深度架构。本文修改了图同构网络架构以嵌入节点信息。 结果:我们的实验证明,所提出的方法优于基于中心性的基线方法以及基于机器学习的方法,如Node2Vec、MLP和图注意力网络(GAT)。 结论:在本文中,我们观察到使用整合生物数据(作为节点属性)并保留网络拓扑的图同构网络可以显著提高必需基因预测的准确性。在较简单的生物体(如大肠杆菌和黑腹果蝇)中,使用Node2Vec嵌入的多层感知机等方法也表现良好,但在人类中,所引入的架构显著优于深度学习和其他图神经网络解决方案。 关键词:必需基因预测,图神经网络,图同构网络,PPI网络,节点嵌入

英文摘要

Background: Predicting essential genes (proteins) is a fundamental and challenging problem, and identifying them in wet-lab experiments is very costly and time-consuming. Computational methods that propose wet-lab candidates using centrality measures alone are inaccurate and produce a large number of false positives; recent research therefore uses more complex models, such as deep learning, together with integrated biological information to identify essential genes. Methods: In this work we focus on graph isomorphism networks, embedding each protein as a node in the PPI network so that the network's topological features are preserved, and integrating biological data such as gene expression, gene orthology, and gene subcellular localization information; on this basis we introduce a deep architecture for predicting essential genes. The graph isomorphism network architecture is modified in this work to embed node information. Results: Our experiments show that the proposed method outperforms centrality-based baselines as well as machine-learning-based methods such as Node2Vec, MLP, and graph attention networks (GAT). Conclusion: We observe that graph isomorphism networks that integrate biological data (as node attributes) and preserve network topology can significantly improve essential gene prediction accuracy. In simpler organisms such as E. coli and D. melanogaster, methods such as a multi-layer perceptron on Node2Vec embeddings also perform very well, but in H. sapiens the introduced architecture significantly outperforms deep learning and other graph neural network solutions. Keywords: essential gene prediction, graph neural network, graph isomorphism network, PPI network, node embedding

2606.07698 2026-06-09 cs.LG cs.AI 新提交

Pharmacogenomic Knowledge Graph Augmentation for Graph Neural Network-Based Drug-Drug Interaction Prediction

基于图神经网络的药物相互作用预测的药理基因组学知识图谱增强

Juergen Dietrich

发表机构 * AI Solutions Berlin

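AI summary Augments graph neural networks for drug-drug interaction prediction with pharmacogenomic prior knowledge from PharmGKB (CYP enzyme annotations) encoded as a feature vector; DDI type classification improves substantially under pair-level splits, but the Information Ceiling is not overcome.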

Comments 13 pages

详情
AI中文摘要

应用于药物相互作用(DDI)预测的图神经网络(GNN)仅依赖由SMILES衍生的分子结构图。该系列先前的工作表明,模型性能受限于训练标签的结构信息含量——即信息天花板——仅靠架构改进无法克服。本研究探讨来自PharmGKB数据库的药理基因组学先验知识是否通过提供独立于分子结构且互补的代谢通路背景,部分关闭这一天花板。提取四种临床相关亚型(CYP2D6、CYP3A4、CYP2C19、CYP2C9)的细胞色素P450(CYP)酶底物、抑制剂和诱导剂注释,并将其作为12维特征向量在交互预测前与分子嵌入拼接。在配对水平和药物水平数据划分下进行实验,以量化对未见药物的泛化能力。结果表明,在配对水平划分条件下,知识图谱(KG)增强显著改善了DDI类型分类(F1宏平均:0.532对比基线0.241),而二元交互检测和药物水平泛化仍受信息天花板限制(AUC提升:0.224对比基线0.250)。对严格保留化合物的机制验证确认,增强优先改善CYP2C9介导的交互预测,概率从基线0.033-0.117提升至KG增强后的0.560-0.586。在Tox21基准上的单分子毒性预测扩展实验证实,该效果取决于药理基因组学注释覆盖度。这些发现为后续研究提出的多模态框架提供了动机。

英文摘要

Graph neural networks (GNNs) applied to drug-drug interaction (DDI) prediction rely exclusively on molecular structure encoded as SMILES-derived graphs. Prior work in this series demonstrated that model performance is bounded by the structural information content of training labels, an Information Ceiling that architectural refinements alone cannot overcome. The present study investigates whether pharmacogenomic prior knowledge from the PharmGKB database partially lifts this ceiling by providing metabolic pathway context that is independent of, and complementary to, molecular structure. Cytochrome P450 (CYP) enzyme substrate, inhibitor, and inducer annotations for four clinically relevant isoforms (CYP2D6, CYP3A4, CYP2C19, CYP2C9) are extracted and incorporated as a 12-dimensional feature vector concatenated to the molecular embedding prior to interaction prediction. Experiments are conducted under both pair-level and drug-level data splits to quantify generalization to unseen drugs. Results indicate that knowledge graph (KG) augmentation substantially improves DDI type classification under pair-level split conditions (F1-macro: 0.532 vs. 0.241 baseline), while binary interaction detection and drug-level generalization remain bounded by the Information Ceiling (AUC inflation: 0.224 vs. 0.250 baseline). Mechanistic validation on strictly held-out compounds confirms that augmentation preferentially improves CYP2C9-mediated interaction prediction, with probabilities increasing from 0.033-0.117 (baseline) to 0.560-0.586 (KG-augmented). An extension to single-molecule toxicity prediction on the Tox21 benchmark confirms that the effect is contingent on pharmacogenomic annotation coverage. These findings motivate the multimodal framework proposed for the subsequent study in this series.
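
The 12-dimensional feature vector mentioned above follows from four isoforms times three annotation roles. A minimal sketch of one possible encoding is shown here; the binary indicator scheme, the isoform-major ordering, and the helper names `cyp_features` and `augment` are assumptions for illustration, since the abstract states only the dimensionality and that the vector is concatenated to the molecular embedding.

```python
# Isoforms and roles named in the abstract; their ordering here is an assumption.
ISOFORMS = ("CYP2D6", "CYP3A4", "CYP2C19", "CYP2C9")
ROLES = ("substrate", "inhibitor", "inducer")


def cyp_features(annotations):
    """Encode a drug's CYP annotations as 4 x 3 = 12 binary indicators.

    annotations: set of (isoform, role) pairs for one drug.
    """
    return [
        1.0 if (isoform, role) in annotations else 0.0
        for isoform in ISOFORMS
        for role in ROLES
    ]


def augment(embedding, annotations):
    """Concatenate the CYP feature vector to a molecular embedding."""
    return list(embedding) + cyp_features(annotations)


# Hypothetical drug with a made-up 4-dimensional embedding and two annotations.
hypothetical_embedding = [0.1, -0.3, 0.7, 0.2]
hypothetical_annotations = {("CYP2C9", "inhibitor"), ("CYP3A4", "substrate")}
augmented = augment(hypothetical_embedding, hypothetical_annotations)
```

A drug with no PharmGKB annotations maps to an all-zero vector under this encoding, which is one way to read the abstract's remark that the effect depends on annotation coverage.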

2606.07696 2026-06-09 cs.LG cs.AI 新提交

Adversarial Robustness of Activation Steering in Large Language Models

大型语言模型中激活引导的对抗鲁棒性

Kien Le, Thai Le

发表机构 * Independent Researcher(独立研究员) Indiana University(印第安纳大学)

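AI summary Studies the robustness of activation steering under adversarial text perturbations and finds that, across all methods, models, and settings, directional robustness drops by up to 64%, confidence collapses, and layer selection is fragile, revealing a structural brittleness.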

Comments 9 pages, 2 figures

详情
AI中文摘要

激活引导已成为一种流行的免训练方法,通过在推理时将预计算的方向向量注入模型的残差流来控制LLM行为。然而,其对现实输入变化的鲁棒性尚未得到研究。我们首次系统评估了在输入上施加对抗性文本扰动时激活引导的鲁棒性,涵盖了四种提取方法、三种攻击策略、来自Anthropic Model-Written Evaluation数据集的六种人格以及从1.5B到30B参数的五个模型。攻击在所有设置中普遍成功:方向鲁棒性下降高达64%,攻击后置信度在所有方法和模型中崩溃至接近或低于0.25,并且几乎每个可引导输入的引导强度都下降。层选择同样脆弱,通过自动化方法在干净输入上识别的最优层在扰动下偏移多达17个位置,这一失败加剧了向量级别的崩溃。从对抗性扰动输入中提取向量对于中大型模型上的PCA和MD方法部分恢复了可引导性,但它们始终无法定位改进的最优层,限制了这种缓解措施的实际效益。总之,这些发现揭示了激活引导的脆弱性是结构性的而非方法特定的,并且当前的层选择策略对于实际部署不够鲁棒。

英文摘要

Activation steering has become a popular training-free method to control LLM behavior by injecting precomputed direction vectors into the model's residual stream at inference time. Yet its robustness to realistic input variation remains unstudied. We present the first systematic evaluation of activation steering robustness under adversarial text perturbations on the inputs, covering four extraction methods, three attack strategies, six personas from the Anthropic Model-Written Evaluation dataset, and five models ranging from 1.5B to 30B parameters. Attacks succeed broadly across all settings: directional robustness drops by up to 64%, post-attack confidence collapses near or below 0.25 across all methods and models, and steering strength degrades on nearly every steerable input. Layer selection is equally fragile, with the optimal layer identified by an automated method on clean inputs shifting by up to 17 positions under perturbation, a failure that compounds the vector-level breakdown. Extracting vectors from adversarially perturbed inputs partially recovers steerability for PCA and MD on mid-to-large models, but they consistently fail to locate the improved optimal layer, limiting the practical benefit of this mitigation. Together, these findings reveal that the brittleness of activation steering is structural rather than method-specific, and that current layer selection strategies are not robust enough for real-world deployment.
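
The basic mechanism under study, adding a precomputed direction to a hidden state, can be sketched on plain vectors. Several things here are assumptions rather than details from the abstract: that a direction is extracted as the difference of mean activations between contrasting prompt sets, that steering is a scaled addition, and that cosine similarity is a reasonable way to compare a direction extracted from clean inputs with one extracted from perturbed inputs. All function names are hypothetical.

```python
import math


def mean_difference_direction(positive_acts, negative_acts):
    """Direction = mean(positive activations) - mean(negative activations)."""
    dim = len(positive_acts[0])
    pos_mean = [sum(a[i] for a in positive_acts) / len(positive_acts) for i in range(dim)]
    neg_mean = [sum(a[i] for a in negative_acts) / len(negative_acts) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer(hidden, direction, alpha):
    """Inject the direction into one residual-stream vector with strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Toy 2-dimensional activations standing in for real residual-stream states.
clean_dir = mean_difference_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]])
perturbed_dir = mean_difference_direction([[0.0, 1.0], [0.0, 3.0]], [[0.0, 0.0], [2.0, 0.0]])
steered = steer([1.0, 1.0], clean_dir, alpha=0.5)
alignment = cosine(clean_dir, perturbed_dir)
```

A low or negative alignment between the clean and perturbed directions would be one symptom of the brittleness the abstract reports, although the paper's own metrics may be defined differently.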

2606.07695 2026-06-09 cs.LG cs.AI 新提交

DSFNet: Learning Dual-Domain Spectral Operators for Multi-Modality Spatio-Temporal Forecasting in Urban Transportation Systems

DSFNet:面向城市交通系统多模态时空预测的双域谱算子学习

Yongchao Li, Yang Li, Zhuoxuan Li, Jun Chen, Chu Zhang, Jinde Cao, Leszek Rutkowski

发表机构 * Southeast University(东南大学) Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies(江苏省现代城市交通技术协同创新中心) City University of Hong Kong(香港城市大学) School of Mathematics, Southeast University(东南大学数学学院) Systems Research Institute of the Polish Academy of Sciences(波兰科学院系统研究所) Luoyang Normal University(洛阳师范学院) Purple Mountain Laboratories(紫金山实验室) AGH University of Krakow(AGH科技大学)

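AI summary Proposes DSFNet, a dual-domain spectral filtering network that factorizes space-modality interactions into feature-domain and spatial-domain spectral operators, explicitly models cross-variable couplings and heterogeneous spatial dependencies, and adds an external gating mechanism to adaptively regulate temporal dynamics, reducing MAE by 3.21%-10.16% on five real-world traffic datasets.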

详情
AI中文摘要

多模态时空预测(MoSTF)通过引入多样化的交通模态扩展了传统的时空预测。尽管近年来在时空建模方面取得了显著进展,现有方法往往未能显式建模不同模态变量之间的耦合关系。准确的MoSTF具有挑战性,因为它需要建模(1)外生影响下的时间动态异质性和(2)异质空间依赖性以及复杂的跨变量耦合。为了解决这些挑战,我们提出了双域谱滤波网络(DSFNet)。我们的框架采用双域谱滤波来捕获异质空间模式并显式建模变量之间的关系。与基于图的消息传递或节点-模态对上的密集注意力不同,DSFNet将空间-模态交互分解为特征域和空间域谱算子,从而实现了非局部依赖和跨模态耦合的可扩展建模。此外,我们引入了一种外部门控机制,以自适应地调节外部影响下的时间动态。我们通过在五个代表性真实世界交通数据集上的大量实验验证了我们的方法。与次优基线相比,DSFNet在这些数据集上将MAE降低了3.21%-10.16%。结果表明,DSFNet在准确性上显著优于现有最先进基线,同时表现出高效性和鲁棒性。

英文摘要

Multi-Modality Spatio-Temporal Forecasting (MoSTF) extends traditional spatio-temporal forecasting by incorporating diverse traffic modalities. Despite significant recent strides in spatio-temporal modeling, existing approaches often fail to explicitly model the coupling relationships between different modality variables. Accurate MoSTF is challenging, as it requires modeling (1) temporal dynamic heterogeneity under exogenous influences and (2) heterogeneous spatial dependencies alongside complex cross-variable couplings. To address these challenges, we propose the Dual-Domain Spectral Filtering Network (DSFNet). Our framework employs dual-domain spectral filtering to capture heterogeneous spatial patterns and explicitly model the relationships between variables. Unlike graph-based message passing or dense attention over node-modality pairs, DSFNet factorizes space-modality interactions into feature-domain and spatial-domain spectral operators, enabling scalable modeling of nonlocal dependencies and cross-modality couplings. Furthermore, we introduce an external gating mechanism to adaptively regulate temporal dynamics under external influences. We validate our method through extensive experiments on five representative real-world traffic datasets. Compared with the second-best baselines, DSFNet reduces MAE by 3.21%-10.16% across these datasets. The results demonstrate that DSFNet significantly outperforms existing state-of-the-art baselines in accuracy while exhibiting efficiency and robustness.

2606.07694 2026-06-09 cs.LG stat.ML 新提交

Vessel Traffic Flow Prediction on Sparse Data via Spatio-Temporal Graph Neural Networks with a Learnable Tweedie Head

基于可学习Tweedie头的时空图神经网络在稀疏数据上的船舶交通流预测

Kyeongjun Lee, Heeyoung Kim

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)(韩国科学技术院)

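AI summary For vessel traffic flow data that are highly sparse with intermittent bursts, proposes a model-agnostic learnable Tweedie head as a plug-and-play output module that optimizes the closed-form Tweedie unit deviance, predicts the mean, and learns a node-level variance power to capture heterogeneity across port areas, consistently improving RMSE on real-world AIS data.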

详情
AI中文摘要

准确的船舶交通流预测对于智能港口运营和航行安全至关重要。然而,海上交通流数据通常高度稀疏且具有间歇性爆发,使得稳健预测具有挑战性。在这种条件下,传统的时空图神经网络(ST-GNNs)可能退化为保守的接近零的预测,无法捕获非零活动。尽管零膨胀负二项(ZINB)模型部分解决了过多零值问题,但其两部分公式在突变附近仍可能保持保守。为了解决这些问题,我们提出了一种模型无关的可学习Tweedie头,它可以作为即插即用的输出模块附加到任意ST-GNN骨干网络上。与通常需要替代目标的基于似然的Tweedie训练不同,我们的方法优化了闭合形式的Tweedie单元偏差,并预测均值以进行点预测,同时学习节点级方差幂以捕获港口区域间的异质性变异性。在由洛杉矶和长滩港口的真实AIS数据构建的海上交通图上的实验表明,所提出的头在多个ST-GNN骨干网络上一致地提高了RMSE,特别是在非零事件上,从而为实际海上交通控制提供了更可靠的预测。

英文摘要

Accurate vessel traffic flow prediction is crucial for smart port operations and navigational safety. However, maritime traffic flow data are often highly sparse with intermittent bursts, making robust forecasting challenging. Under such conditions, conventional spatio-temporal graph neural networks (ST-GNNs) can degrade toward conservative near-zero predictions and fail to capture non-zero activity. Although zero-inflated negative binomial (ZINB) models partially address excess zeros, their two-part formulation can still remain conservative around abrupt transitions. To address these issues, we propose a model-agnostic learnable Tweedie head that can be attached as a plug-and-play output module to arbitrary ST-GNN backbones. Instead of likelihood-based Tweedie training, which typically requires surrogate objectives, our approach optimizes the closed-form Tweedie unit deviance and predicts the mean for point forecasting while learning a node-level variance power to capture heterogeneous variability across port areas. Experiments on a maritime traffic graph constructed from real-world AIS data in the Port of Los Angeles and Long Beach show that the proposed head consistently improves RMSE across multiple ST-GNN backbones, especially on non-zero events, leading to more reliable forecasts for practical maritime traffic control.
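
For readers unfamiliar with the loss named above, the Tweedie unit deviance has a standard closed form when the variance power p lies strictly between 1 and 2 (the compound Poisson-gamma range, which allows exact zeros). The sketch below implements that formula. The mapping from an unconstrained parameter to p through a sigmoid is an assumption about how a learnable variance power could be kept in range; the abstract does not say how the paper parametrizes it.

```python
import math


def tweedie_unit_deviance(y, mu, p):
    """Tweedie unit deviance for 1 < p < 2, y >= 0, mu > 0.

    d(y, mu) = 2 * ( y^(2-p) / ((1-p)(2-p)) - y * mu^(1-p) / (1-p) + mu^(2-p) / (2-p) )
    """
    if not 1.0 < p < 2.0:
        raise ValueError("variance power p must lie in (1, 2)")
    if y < 0.0 or mu <= 0.0:
        raise ValueError("requires y >= 0 and mu > 0")
    term_y = y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
    term_cross = y * mu ** (1.0 - p) / (1.0 - p)
    term_mu = mu ** (2.0 - p) / (2.0 - p)
    return 2.0 * (term_y - term_cross + term_mu)


def variance_power(theta):
    """Map an unconstrained per-node parameter to p in (1, 2) (assumed mapping)."""
    return 1.0 + 1.0 / (1.0 + math.exp(-theta))


# A zero observation is penalized by a finite amount that grows with the predicted mean.
loss_at_zero = tweedie_unit_deviance(0.0, 2.0, variance_power(0.0))
```

The deviance is zero when the prediction equals the observation and stays finite at y = 0, which is why it suits sparse count-like targets without a separate zero-inflation component.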

2606.07692 2026-06-09 cs.LG cs.AI cs.ET 新提交

BCG-FM: A Foundation Model for Ambient Cardiac Health Sensing

BCG-FM:一种用于环境心脏健康感知的基础模型

Magnus Ruud Kjaer, Haejun Han, Ashish Neupane, David Q. Sun

发表机构 * Department of Computer Science and Engineering, University of California, San Diego(加州大学圣迭戈分校计算机科学与工程系)

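AI summary Proposes BCG-FM, the first foundation model for ambient mechanical biosignals; a piezoelectric sensor in the bed surface records ballistocardiography without user effort, and pretraining on 2.75 million hours of recordings from 145,985 individuals yields a 3.26-year MAE on biological-age estimation and clinically relevant discrimination across 15 health conditions.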

详情
AI中文摘要

可穿戴生物信号的基础模型在多项临床任务中已匹配或超越监督专家,但所有模型都依赖于需要用户主动操作的模态——佩戴设备或访问睡眠实验室。我们提出BCG-FM,首个用于环境机械生物信号的基础模型。嵌入床垫表面的压电传感器每晚无感记录心冲击图(BCG);我们使用参与者级对比学习,基于145,985名个体的总计275万小时夜间记录预训练BCG-FM,这是迄今为止最大的原始波形生物信号预训练语料库。冻结的BCG-FM嵌入在生物年龄估计上达到3.26年MAE(所有环境、非接触模态中最低报告值),并在15种自我报告健康状况和三个独立外部队列中产生临床相关的判别。仅500名标注参与者的预训练表示优于在3,372名参与者上训练的完全监督基线,且表示质量与对比批次大小呈对数线性关系。这些结果确立了环境、纵向机械生物信号作为健康基础模型的可行模态。

英文摘要

Foundation models for wearable biosignals have matched or exceeded supervised specialists across a range of clinical tasks, yet all rely on modalities that require deliberate user action, such as wearing a device or visiting a sleep lab. We introduce BCG-FM, the first foundation model for ambient mechanical biosignals. A piezoelectric sensor embedded in the bed surface records ballistocardiography (BCG) each night without user effort; we pretrain BCG-FM with participant-level contrastive learning on a total of 2.75 million hours of nightly recordings from 145,985 individuals, the largest raw-waveform biosignal pretraining corpus to date. Frozen BCG-FM embeddings achieve 3.26-year MAE on biological-age estimation (the lowest reported for any ambient, contactless modality) and yield clinically relevant discrimination across 15 self-reported health conditions and three independent external cohorts. Pretrained representations from only 500 labeled participants outperform a fully supervised baseline trained on 3,372 participants, and representation quality scales log-linearly with contrastive batch size. These results establish ambient, longitudinal mechanical biosignals as a viable modality for health foundation models.

2606.07690 2026-06-09 cs.LG cs.AI 新提交

HARP: Efficient Data Selection for Finetuning Large Language Models

HARP:高效数据选择用于微调大型语言模型

Ning Wang, Zhengxin Zhang, Maosen Tang, Yitang Gao, Claire Cardie, Sainyam Galhotra

发表机构 * Cornell University(康奈尔大学) The Hong Kong University of Science and Technology(香港科技大学)

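AI summary Proposes Hierarchical Active Region Pruning (HARP), an efficient train-based data selection method that lowers selection cost through a hierarchy and empirical Bayes inference while preserving downstream alignment, outperforming the strongest baseline by up to 8.9 points across multiple benchmarks with roughly 7x fewer training examples.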

详情
AI中文摘要

微调数据选择需要平衡两个相互竞争的目标:选择改善下游目标的示例,以及在不重复微调模型的情况下做到这一点。无训练选择器具有可扩展性,但依赖于嵌入相似性或聚类等代理,这些可能无法匹配目标目标。基于训练的选择器通过梯度信号、子集评估或Shapley归因更好地反映下游效用,但需要大量昂贵的训练-评估迭代。我们提出层次主动区域剪枝(HARP),一种高效的基于训练的选择器,在降低选择成本的同时保持下游对齐。HARP将训练池组织成节点-叶子层次结构,仅评估代表性叶子,并使用经验贝叶斯后验推断未测量的效用。然后,它使用两个互补的包络选择数据:HARP-C,保守地控制冗余,以及HARP-E,加性地奖励互补区域。我们理论上证明,在局部平滑和有界估计误差下,HARP控制选择误差同时降低训练-评估成本。我们进一步验证,HARP变体实现了最佳结果,并在使用大约7倍更少训练示例的情况下,比最强基线高出最多8.9分。

英文摘要

Finetuning data selection requires balancing two competing goals: selecting examples that improve the downstream objective, and doing so without repeatedly finetuning models. Train-free selectors are scalable but rely on proxies such as embedding similarity or clustering, which may not match the target objective. Train-based selectors better reflect downstream utility through gradient signals, subset evaluation, or Shapley attribution, but require many costly train-evaluate iterations. We propose Hierarchical Active Region Pruning (HARP), an efficient train-based selector that preserves downstream alignment while reducing selection cost. HARP organizes the training pool into a node-leaf hierarchy, evaluates only representative leaves, and infers unmeasured utilities with empirical Bayes posteriors. It then selects data using two complementary envelopes: HARP-C, which conservatively controls redundancy, and HARP-E, which additively rewards complementary regions. We theoretically show that, under local smoothness and bounded estimation error, HARP controls selection error while reducing train-evaluate cost. We further validate that HARP variants achieve the best results and outperform the strongest baseline by up to +8.9 points, while using roughly 7x fewer training examples.

2606.07689 2026-06-09 cs.CV new submission

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Fan Zhang, Vireo Zhang, Shengju Qian, Haoxuan Li, Zheng Lian, Hao Wu, Yuan Gao, Xinyu Geng, Xin Wang, Pheng-Ann Heng

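Affiliations * CUHK (The Chinese University of Hong Kong), LIGHTSPEED, PKU (Peking University), Tongji University, THU (Tsinghua University), HKUST (The Hong Kong University of Science and Technology)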

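AI summary Proposes Struct-Searcher, a structural agentic workflow grounded in belief revision theory that maintains a multimodal structural graph for conflict-aware deep information seeking, yielding an average relative accuracy improvement of 17.2% on BrowseComp-VL across five backbones.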

Abstract

Deep research agents have attracted increasing attention for their ability to collect large-scale online information to acquire target knowledge, with recent efforts shifting from purely text-based information seeking to multimodal settings. However, existing agentic workflows are largely aligned with evidence accumulation models, which linearly aggregate evidence and lack principled mechanisms for handling contradictory information across heterogeneous modalities. To this end, we propose Struct-Searcher, a structural agentic workflow grounded in belief revision theory that explicitly maintains an evolving multimodal structural graph throughout the reasoning process, enabling effective conflict-aware multimodal deep information seeking. Extensive experiments across multiple benchmark datasets and backbone models demonstrate that Struct-Searcher is (1) plug-and-play and model-agnostic, yielding an average relative accuracy improvement of 17.2% on BrowseComp-VL across five different backbones, and (2) top-performing, consistently outperforming state-of-the-art vision-language models (VLMs) and deep research agents, with relative accuracy improvements of 3.7% on MM-BrowseComp, 1.5% on HLE-VL, and 0.7% on BrowseComp-VL over the second-best competing approach.

2606.07687 2026-06-09 cs.CV cs.AI new submission

What Makes Video World Model Latents Action-Relevant: Prediction over Reconstruction

Jewon Yeom, Hanseul Kim, Jeongjae Park, Sungmok Jung, Jaejin Lee, Taesup Kim

Affiliations * Graduate School of Data Science, Seoul National University

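AI summary Through a unified probe-based evaluation, finds that action-relevant structure is driven primarily by temporal video pretraining rather than pixel reconstruction fidelity; video-pretrained self-supervised encoders achieve the best Pareto trade-off between visual fidelity and action prediction.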

Abstract

Video world models are increasingly used to provide predictive visual representations, yet it remains unclear which pretraining signals induce action-relevant structure in their latent spaces. We study this question through a unified probe-based evaluation across diverse encoder families, including image-only self-supervision, video pretraining with and without latent prediction, reconstruction-based autoencoders, diffusion models, and shortcut-forcing dynamics models. Using a common inverse-dynamics probing objective, we find that action-relevant structure is driven primarily by temporal video pretraining rather than pixel reconstruction fidelity: models with strong pixel decoding quality can exhibit near-zero action recoverability, while video-pretrained self-supervised encoders consistently achieve the best Pareto trade-off between visual fidelity and action prediction. Comparing V-JEPA and VideoMAE further shows that most gains arise from natural-video temporal context, with feature-level latent prediction providing a smaller additional benefit. These trends transfer across robotic benchmarks, though CALVIN reveals that static-environment tasks can partially mask the importance of temporal structure by allowing strong image priors to suffice. Finally, inverse-dynamics supervision substantially improves robustness to visual corruption, suggesting that action-aware objectives regularize latent geometry beyond clean-setting performance. Our results identify temporal predictive structure, not reconstruction fidelity, as the primary ingredient underlying action-relevant video representations.
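
To make the phrase "inverse-dynamics probing" concrete, here is a minimal sketch of one common form of such a probe: a frozen encoder supplies latents for consecutive frames, and a simple linear map is fit to recover the action between them. The paper's actual probe architecture and metric are not stated in the abstract; the ridge-regression probe, the R² score, the function names, and the synthetic latents below are assumptions made purely for illustration.

```python
import numpy as np

def _features(z_t, z_next):
    # Concatenate consecutive latents and append a bias column.
    return np.hstack([z_t, z_next, np.ones((len(z_t), 1))])

def fit_probe(z_t, z_next, actions, ridge=1e-3):
    """Ridge-regularised linear probe from (z_t, z_next) to the action."""
    x = _features(z_t, z_next)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ actions)

def probe_r2(weights, z_t, z_next, actions):
    """Coefficient of determination of the probe's action predictions."""
    resid = actions - _features(z_t, z_next) @ weights
    total = actions - actions.mean(axis=0)
    return 1.0 - (resid ** 2).sum() / (total ** 2).sum()

# Synthetic stand-ins for encoder latents (hypothetical, not real model outputs).
rng = np.random.default_rng(0)
z_t = rng.normal(size=(500, 8))
actions = rng.normal(size=(500, 2))
effect = rng.normal(size=(2, 8))
z_next_dyn = z_t + actions @ effect        # latents that encode the action's effect
z_next_rand = rng.normal(size=(500, 8))    # latents unrelated to the action

train, test = slice(0, 400), slice(400, 500)
w_dyn = fit_probe(z_t[train], z_next_dyn[train], actions[train])
w_rand = fit_probe(z_t[train], z_next_rand[train], actions[train])
r2_dyn = probe_r2(w_dyn, z_t[test], z_next_dyn[test], actions[test])
r2_rand = probe_r2(w_rand, z_t[test], z_next_rand[test], actions[test])
print(r2_dyn, r2_rand)
```

A high held-out score for the first set of latents and a near-zero score for the second mirrors the distinction the abstract draws between action-recoverable and action-blind representations.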

2606.07686 2026-06-09 cs.LG cs.AI new submission

Knowledge-Inclusive Adaptive Physics-Informed Neural Network for Microbial Interaction Modelling

Ravisha Rupasinghe, Rajith Vidanaarachchi, Asela Hevapathige, Sachith Seneviratne, Sen-Lin Tang, Saman Halgamuge

Affiliations * University of Melbourne, Academia Sinica

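AI summary Proposes a knowledge-inclusive adaptive PINN framework that improves microbial community modelling by integrating textual and network-structure knowledge, improving over the state of the art by up to 53% on real and simulated datasets.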

Comments 33 pages

Abstract

A Physics-Informed Neural Network (PINN) is a way of including knowledge, in the form of equations, in machine learning methods. Beyond equations, knowledge exists in other forms, such as text and network structure. While existing PINN-based approaches discover equation parameters from data, they rely solely on experimental measurements. We propose a new PINN framework that enriches parameter discovery by incorporating auxiliary knowledge sources. We instantiate our framework for microbiology, where the generalised Lotka-Volterra (gLV) model serves as a biological foundation for modelling microbial communities. We demonstrate that incorporating knowledge improves microbial community modelling. Our framework enriches the gLV parameters using peer-reviewed metagenomics literature, as text provides biological context on external influences that gLV alone cannot capture. We combine this knowledge with experimental measurements of microbial abundance using a data-driven integration approach. We integrate network-based structural knowledge by explicitly modelling microbial interactions. Our knowledge-inclusive framework infers microbial networks, revealing ecological insights. We validate these findings against ecological roles documented in the literature. We evaluate on real and simulated datasets spanning human- and plant-associated microbial communities. Our framework improves over the state of the art by up to 53%, even without knowledge. Knowledge addition yields gains of up to 23% in Bray-Curtis dissimilarity-based accuracy and 47% in R².
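
For readers unfamiliar with the two quantities named in this abstract, the sketch below shows the standard generalised Lotka-Volterra dynamics, dx_i/dt = x_i (r_i + Σ_j A_ij x_j), and the standard Bray-Curtis dissimilarity. It is not the paper's PINN: the forward-Euler integrator, the two-species parameters, and the function names are hypothetical choices for illustration, and how the authors turn Bray-Curtis dissimilarity into an accuracy score is not stated in the abstract.

```python
import numpy as np

def glv_step(x, growth, interactions, dt):
    """One forward-Euler step of dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    dx = x * (growth + interactions @ x)
    return np.clip(x + dt * dx, 0.0, None)   # abundances cannot go negative

def simulate_glv(x0, growth, interactions, dt=0.01, steps=5000):
    """Integrate the gLV system from x0 and return the final abundances."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = glv_step(x, growth, interactions, dt)
    return x

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum(u + v) for non-negative inputs."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / (u + v).sum()

# Hypothetical two-species competitive community.
growth = np.array([1.0, 0.5])                 # intrinsic growth rates r_i
interactions = np.array([[-1.0, -0.5],        # A_ij: effect of species j on species i
                         [-0.2, -1.0]])
final = simulate_glv([0.1, 0.1], growth, interactions)
equilibrium = np.linalg.solve(interactions, -growth)   # solves r + A x = 0
print(final, equilibrium, bray_curtis(final, equilibrium))
```

In a parameter-discovery setting, `growth` and `interactions` would be the unknowns to be inferred from abundance measurements, and a dissimilarity such as Bray-Curtis could compare predicted and observed community profiles.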